diff options
| author | Juan Marín Noguera <juan@mnpi.eu> | 2025-02-22 21:32:43 +0100 |
|---|---|---|
| committer | Juan Marín Noguera <juan@mnpi.eu> | 2025-02-22 21:32:43 +0100 |
| commit | 9ad815299e875247f8dab86cee99e5636aebab9e (patch) | |
| tree | 9939392a08fdc1d0b278df1f59d6a745bbf95abb | |
| parent | 2b61f0afdbc5bfbe285b7d68512010d41c5b7f6d (diff) | |
3.3.2 Empirical Tests
| -rw-r--r-- | 3.3.2.lyx | 1009 | ||||
| -rw-r--r-- | index.lyx | 23 |
2 files changed, 1032 insertions, 0 deletions
diff --git a/3.3.2.lyx b/3.3.2.lyx new file mode 100644 index 0000000..03a3027 --- /dev/null +++ b/3.3.2.lyx @@ -0,0 +1,1009 @@ +#LyX 2.4 created this file. For more info see https://www.lyx.org/ +\lyxformat 620 +\begin_document +\begin_header +\save_transient_properties true +\origin unavailable +\textclass book +\begin_preamble +\input defs +\end_preamble +\use_default_options true +\maintain_unincluded_children no +\language english +\language_package default +\inputencoding utf8 +\fontencoding auto +\font_roman "default" "default" +\font_sans "default" "default" +\font_typewriter "default" "default" +\font_math "auto" "auto" +\font_default_family default +\use_non_tex_fonts false +\font_sc false +\font_roman_osf false +\font_sans_osf false +\font_typewriter_osf false +\font_sf_scale 100 100 +\font_tt_scale 100 100 +\use_microtype false +\use_dash_ligatures true +\graphics default +\default_output_format default +\output_sync 0 +\bibtex_command default +\index_command default +\float_placement class +\float_alignment class +\paperfontsize default +\spacing single +\use_hyperref false +\papersize default +\use_geometry false +\use_package amsmath 1 +\use_package amssymb 1 +\use_package cancel 1 +\use_package esint 1 +\use_package mathdots 1 +\use_package mathtools 1 +\use_package mhchem 1 +\use_package stackrel 1 +\use_package stmaryrd 1 +\use_package undertilde 1 +\cite_engine basic +\cite_engine_type default +\biblio_style plain +\use_bibtopic false +\use_indices false +\paperorientation portrait +\suppress_date false +\justification true +\use_refstyle 1 +\use_formatted_ref 0 +\use_minted 0 +\use_lineno 0 +\index Index +\shortcut idx +\color #008000 +\end_index +\secnumdepth 3 +\tocdepth 3 +\paragraph_separation indent +\paragraph_indentation default +\is_math_indent 0 +\math_numbering_side default +\quotes_style english +\dynamic_quotes 0 +\papercolumns 1 +\papersides 1 +\paperpagestyle default +\tablestyle default +\tracking_changes false +\output_changes false +\change_bars false +\postpone_fragile_content false +\html_math_output 0 +\html_css_as_file 0 +\html_be_strict false +\docbook_table_output 0 +\docbook_mathml_prefix 1 +\end_header + +\begin_body + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +exerc1[10] +\end_layout + +\end_inset + +Why should the serial test described in part B be applied to +\begin_inset Formula +\[ +(Y_{0},Y_{1}),(Y_{2},Y_{3}),\dots,(Y_{2n-2},Y_{2n-1}) +\] + +\end_inset + +instead of to +\begin_inset Formula $(Y_{0},Y_{1}),(Y_{1},Y_{2}),\dots,(Y_{n-1},Y_{n})$ +\end_inset + +? +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +answer +\end_layout + +\end_inset + +Otherwise the points wouldn't be independent. +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +exerc2[10] +\end_layout + +\end_inset + +State an appropriate way to generalize the serial test to triples, + quadruples, + etc., + instead of pairs. +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +answer +\end_layout + +\end_inset + +For +\begin_inset Formula $k$ +\end_inset + +-tuples, + use points +\begin_inset Formula $(Y_{0},\dots,Y_{k-1}),(Y_{k},\dots,Y_{2k-1}),\dots,(Y_{k(n-1)},\dots,Y_{kn-1})$ +\end_inset + +. + The chi-square method is applied to the +\begin_inset Formula $d^{k}$ +\end_inset + + possible categories and at least +\begin_inset Formula $5d^{k}$ +\end_inset + + values of +\begin_inset Formula $U$ +\end_inset + + should be taken. +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +rexerc3[M20] +\end_layout + +\end_inset + +How many +\begin_inset Formula $U$ +\end_inset + +'s need to be examined in the gap test (Algorithm G) before +\begin_inset Formula $n$ +\end_inset + + gaps have been found, + on average, + assuming that the sequence is random? + What is the standard deviation of this quantity? +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +answer +\end_layout + +\end_inset + +The probability of a given value being the end of a gap is +\begin_inset Formula $p\coloneqq\frac{1}{\beta-\alpha}$ +\end_inset + +, + so the gap lengths have approximately exponential distribution and therefore we should take about +\begin_inset Formula $np=\frac{n}{\beta-\alpha}$ +\end_inset + + +\begin_inset Formula $U$ +\end_inset + +'s. + More precisely, + let +\begin_inset Formula $q\coloneqq1-p$ +\end_inset + +, + the probability of a gap with length +\begin_inset Formula $k$ +\end_inset + + is +\begin_inset Formula $q^{k-1}p$ +\end_inset + +, + so the average is +\begin_inset Formula +\[ +\sum_{k=1}^{\infty}q^{k-1}pk=p\sum_{k=1}^{\infty}kq^{k-1}=p\sum_{j=1}^{\infty}\sum_{k=j}^{\infty}q^{k-1}=p\sum_{j=1}^{\infty}\frac{q^{j-1}}{1-q}=\sum_{j=0}^{\infty}q^{j}=\frac{1}{1-q}=\frac{1}{p}, +\] + +\end_inset + +so the approximation above is actually the exact value of the mean gap length and the average is exactly +\begin_inset Formula $\frac{n}{\beta-\alpha}$ +\end_inset + +. +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +exerc7[08] +\end_layout + +\end_inset + +Apply the coupon collector's test procedure (Algorithm C), + with +\begin_inset Formula $d=3$ +\end_inset + + and +\begin_inset Formula $n=7$ +\end_inset + +, + to the sequence 1101221022120202001212201010201121. + What length do the seven subsequences have? +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +answer +\end_layout + +\end_inset + + +\begin_inset Formula $5,3,5,6,5,5,4$ +\end_inset + +. +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +rexerc8[M22] +\end_layout + +\end_inset + +How many +\begin_inset Formula $U$ +\end_inset + +'s need to be examined in the coupon collector's test, + on the average, + before +\begin_inset Formula $n$ +\end_inset + + complete sets have been found by Algorithm C, + assuming that the sequence is random? + What is the standard deviation? +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +answer +\end_layout + +\end_inset + +Given that +\begin_inset Formula $\stirlb rd=0$ +\end_inset + + for +\begin_inset Formula $r<d$ +\end_inset + +, + we can write the generating function corresponding to the coupon collector's probability distribution as +\begin_inset Formula +\begin{align*} +F(z) & \coloneqq\sum_{r\geq1}\frac{d!}{d^{r}}\stirlb{r-1}{d-1}z^{r}=d!\frac{z}{d}\sum_{r\geq0}\stirlb r{d-1}\left(\frac{z}{d}\right)^{r}=\frac{(d-1)!z\left(\frac{z}{d}\right)^{d-1}}{(1-\tfrac{1}{d}z)(1-\tfrac{2}{d}z)\cdots(1-\tfrac{d-1}{d}z)}\\ + & =\frac{(d-1)!z^{d}}{(d-z)(d-2z)\cdots(d-(d-1)z)}=\frac{z^{d}}{(d-z)(\frac{d}{2}-z)\cdots(\frac{d}{d-1}-z)}. +\end{align*} + +\end_inset + +using Eq. + 1.2.9(28). + We derive this to get +\begin_inset Formula +\begin{align*} +F'(z) & =\frac{dz^{d-1}}{(d-z)\cdots(\frac{d}{d-1}-z)}-\frac{z^{d}(d-z)\cdots(\frac{d}{d-1}-z)\left(-\frac{1}{d-z}-\dots-\frac{1}{\frac{d}{d-1}-z}\right)}{(d-z)^{2}\cdots(\tfrac{d}{d-1}-z)^{2}}\\ + & =\frac{dz^{d-1}+z^{d}\left(\frac{1}{d-z}+\dots+\frac{1}{\frac{d}{d-1}-z}\right)}{(d-z)\cdots(\frac{d}{d-1}-z)};\\ +F''(z) & =\frac{d(d-1)z^{d-2}+dz^{d-1}\left(\frac{1}{d-z}+\dots+\frac{1}{\frac{d}{d-1}-z}\right)+z^{d}\left(\frac{1}{(d-z)^{2}}+\dots+\frac{1}{(\frac{d}{d-1}-z)^{2}}\right)}{(d-z)\cdots(\frac{d}{d-1}-z)}+\\ + & +\left(\frac{1}{d-z}+\dots+\frac{1}{\frac{d}{d-1}-z}\right)\frac{dz^{d-1}+z^{d}\left(\frac{1}{d-z}+\dots+\frac{1}{\frac{d}{d-1}-z}\right)}{(d-z)\cdots(\frac{d}{d-1}-z)}. +\end{align*} + +\end_inset + +We are interested in +\begin_inset Formula $F'(1)$ +\end_inset + + and +\begin_inset Formula $F''(1)$ +\end_inset + +, + for which we should note that +\begin_inset Formula +\begin{align*} +\frac{1}{\frac{d}{k}-1} & =\frac{k}{d-k}, & (d-1)(\tfrac{d}{2}-1)\cdots(\tfrac{d}{d-2}-1)(\tfrac{d}{d-1}-1) & =(d-1)\tfrac{d-2}{2}\cdots\tfrac{2}{d-2}\tfrac{1}{d-1}=1. +\end{align*} + +\end_inset + + +\end_layout + +\begin_layout Standard +Thus, + if +\begin_inset Formula $D_{1}\coloneqq\sum_{k=1}^{d}\frac{k}{d-k}$ +\end_inset + + and +\begin_inset Formula $D_{2}\coloneqq\sum_{k=1}^{d}\left(\frac{k}{d-k}\right)^{2}$ +\end_inset + +, +\begin_inset Formula +\begin{align*} +F'(1)= & D_{1}+d; & F''(1)= & d(d-1)+dD_{1}+dD_{1}+D_{1}^{2}+D_{2}=(d+D_{1})^{2}-d+D_{2}. +\end{align*} + +\end_inset + +With this, +\begin_inset Formula +\begin{align*} +\mu & =F'(1)=D_{1}+d;\\ +\sigma & =\sqrt{F''(1)+F'(1)-F'(1)^{2}}=\sqrt{D_{2}+D_{1}}. +\end{align*} + +\end_inset + + +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +exerc11[00] +\end_layout + +\end_inset + +The +\begin_inset Quotes eld +\end_inset + +runs up +\begin_inset Quotes erd +\end_inset + + in a particular permutation are displayed in (9); + what are the +\begin_inset Quotes eld +\end_inset + +runs down +\begin_inset Quotes erd +\end_inset + + in that permutation? +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +answer +\end_layout + +\end_inset + + +\begin_inset Formula +\[ +\begin{array}{|c|c|cccc|c|cc|c|} +1 & 2 & 9 & 8 & 5 & 3 & 6 & 7 & 0 & 4\end{array}. +\] + +\end_inset + + +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +rexerc14[M15] +\end_layout + +\end_inset + +If we +\begin_inset Quotes eld +\end_inset + +throw away +\begin_inset Quotes erd +\end_inset + + the element that immediately follows a run, + so that when +\begin_inset Formula $X_{j}$ +\end_inset + + is greater than +\begin_inset Formula $X_{j+1}$ +\end_inset + + we start the next run with +\begin_inset Formula $X_{j+2}$ +\end_inset + +, + the run lengths are independent, + and a simple chi-square test may be used (instead of the horribly complicated method derived in the text). + What are the appropriate run-length probabilities for this simple run test? +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +answer +\end_layout + +\end_inset + +When the +\begin_inset Formula $X_{j}$ +\end_inset + + are real numbers, + the probability of a repeated element in a finite sample is 0, + so we can unambiguously take a permutation of the elements in such a way that all permutations are equally likely. + Let +\begin_inset Formula $p_{r}$ +\end_inset + + be the probability that +\begin_inset Formula $U_{1},U_{2},U_{3},\dots$ +\end_inset + + starts with a run of length +\begin_inset Formula $r$ +\end_inset + + for +\begin_inset Formula $1\leq r<t$ +\end_inset + +, + and let +\begin_inset Formula $p_{t}$ +\end_inset + + be the probability that it starts with a run of length +\begin_inset Formula $t$ +\end_inset + + or more, + we have +\begin_inset Formula $p_{r}=\frac{r}{(r+1)!}$ +\end_inset + +, + since +\begin_inset Formula $U_{1},\dots,U_{r}$ +\end_inset + + must be ordered like +\begin_inset Formula $U_{1}<\dots<U_{r}$ +\end_inset + + and +\begin_inset Formula $U_{r+1}$ +\end_inset + + must be inserted in this permutation in any of the +\begin_inset Formula $r$ +\end_inset + + places that is not the last one (that is, + +\begin_inset Formula $U_{r+1}<U_{r}$ +\end_inset + +). + Similarly, + +\begin_inset Formula $p_{t}=\frac{1}{t!}$ +\end_inset + +. +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +exerc15[M10] +\end_layout + +\end_inset + +In the maximum-of- +\begin_inset Formula $t$ +\end_inset + + test, + why are +\begin_inset Formula $V_{0}^{t},V_{1}^{t},\dots,V_{n-1}^{t}$ +\end_inset + + supposed to be uniformly distributed between zero and one? +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +answer +\end_layout + +\end_inset + + +\begin_inset Formula $P(V_{0}^{t}\leq x)=P(V_{0}\leq\sqrt[t]{x})=\sqrt[t]{x}^{t}=x$ +\end_inset + +. +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +rexerc6[15] +\end_layout + +\end_inset + +Mr. + J. + H. + Quick (a student) wanted to perform the maximum-of- +\begin_inset Formula $t$ +\end_inset + + test for several different values of +\begin_inset Formula $t$ +\end_inset + +. +\end_layout + +\begin_layout Enumerate +Letting +\begin_inset Formula $Z_{jt}=\max(U_{j},U_{j+1},\dots,U_{j+t-1})$ +\end_inset + + he found a clever way to go from the sequence +\begin_inset Formula $Z_{0(t-1)},Z_{1(t-1)},\dots$ +\end_inset + +, + to the sequence +\begin_inset Formula $Z_{0t},Z_{1t},\dots$ +\end_inset + +, + using very little time and space. + What was his bright idea? +\end_layout + +\begin_layout Enumerate +He decided to modify the maximum-of- +\begin_inset Formula $t$ +\end_inset + + method so that the +\begin_inset Formula $j$ +\end_inset + +th observation would be +\begin_inset Formula $\max(U_{j},\dots,U_{j+t-1})$ +\end_inset + +; + in other words, + he took +\begin_inset Formula $V_{j}=Z_{jt}$ +\end_inset + + instead of +\begin_inset Formula $V_{j}=Z_{(tj)t}$ +\end_inset + + as the text says. + He reasoned that +\emph on +all +\emph default + of the +\begin_inset Formula $Z$ +\end_inset + +'s should have the same distribution, + so the test is even stronger if each +\begin_inset Formula $Z_{jt}$ +\end_inset + +, + +\begin_inset Formula $0\leq j<n$ +\end_inset + +, + is used instead of just every +\begin_inset Formula $t$ +\end_inset + +th one. + But when he tried a chi-square equidistribution test on the values of +\begin_inset Formula $V_{j}^{t}$ +\end_inset + +, + he got extremely high values of the statistic +\begin_inset Formula $V$ +\end_inset + +, + which got even higher as +\begin_inset Formula $t$ +\end_inset + + increased. + Why did this happen? +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +answer +\end_layout + +\end_inset + + +\end_layout + +\begin_layout Enumerate +\begin_inset Formula +\begin{multline*} +Z_{jt}=\max\{U_{j},\dots,U_{j+t-1}\}=\\ +=\max\{\max\{U_{j},\dots,U_{j+t-2}\},\max\{U_{j+1},\dots,U_{j+t-1}\}\}=\max\{Z_{j,t-1},Z_{j+1,t-1}\}. +\end{multline*} + +\end_inset + + +\end_layout + +\begin_layout Enumerate +The values of the +\begin_inset Formula $Z_{jt}$ +\end_inset + + for fixed +\begin_inset Formula $t$ +\end_inset + + and for all +\begin_inset Formula $j$ +\end_inset + + are not independent, + as they represent overlapping ranges of elements in +\begin_inset Formula $(U_{j})_{j}$ +\end_inset + +. +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +rexerc31[M21] +\end_layout + +\end_inset + +The recurrence +\begin_inset Formula $Y_{n}=(Y_{n-24}+Y_{n-55})\bmod2$ +\end_inset + +, + which describe the least significant bits of the lagged Fibonacci generator 3.2.2–(7) as well as the second-least significant bits of 3.2.2–(7'), + is known to have period length +\begin_inset Formula $2^{55}-1$ +\end_inset + +; + hence every possible nonzero pattern of bits +\begin_inset Formula $(Y_{n},Y_{n+1},\dots,Y_{n+54})$ +\end_inset + + occurs equally often. + Nevertheless, + prove that if we generate 79 consecutive random bits +\begin_inset Formula $Y_{n},\dots,Y_{n+78}$ +\end_inset + + starting at a random point in the period, + the probability is more than +\begin_inset Formula $\unit[51]{\%}$ +\end_inset + + that there are more 1s than 0s. + If we use such bits to define a +\begin_inset Quotes eld +\end_inset + +random walk +\begin_inset Quotes erd +\end_inset + + that moves to the right when the bit is 1 and to the left when the bit is 0, + we'll finish to the right of our starting point significantly more than half of the time. +\end_layout + +\begin_layout Standard +\begin_inset ERT +status open + +\begin_layout Plain Layout + + +\backslash +answer +\end_layout + +\end_inset + +The values +\begin_inset Formula $(Y_{n},\dots,Y_{n+54})$ +\end_inset + + can be any nonzero pattern of bits with equal probability. + It is enough to prove this when also include the zero pattern, + as the probability of finding more 1s than 0s is even higher when we don't. + +\end_layout + +\begin_layout Standard +In this case, + each +\begin_inset Formula $Y_{n+j}$ +\end_inset + +, + +\begin_inset Formula $0\leq j\leq54$ +\end_inset + +, + can be 0 or 1 with equal probability, + each independent of the other, + and +\begin_inset Formula $Y_{n+j}=(Y_{n+j-55}+Y_{n+j-24})\bmod2$ +\end_inset + + for +\begin_inset Formula $55\leq j\leq78$ +\end_inset + +. + Then, + for +\begin_inset Formula $0\leq j\leq23$ +\end_inset + +, + +\begin_inset Formula $Z_{n+j}\coloneqq Y_{n+j}+Y_{n+31+j}+Y_{n+55+j}$ +\end_inset + + is 2 with probability +\begin_inset Formula $\frac{3}{4}$ +\end_inset + + and 0 with probability +\begin_inset Formula $\frac{1}{4}$ +\end_inset + +, + and for +\begin_inset Formula $24\leq j\leq30$ +\end_inset + +, + +\begin_inset Formula $Y_{n+j}$ +\end_inset + + is 1 or 0 with equal probability +\begin_inset Formula $\frac{1}{2}$ +\end_inset + +. + Furthermore, + these probabilities are now independent. +\end_layout + +\begin_layout Standard +With this in mind, +\begin_inset Formula +\begin{align*} +F(z) & \coloneqq\sum_{k}P(Y_{n}+\dots+Y_{n+78}=k)z^{k}\\ + & =\sum_{k_{0},\dots,k_{30}}P(Z_{n}=k_{0})\cdots P(Z_{n+23}=k_{23})P(Y_{24}=k_{24})\cdots P(Y_{30}=k_{30})z^{k_{1}+\dots+k_{30}}\\ + & =\left(\frac{3}{4}z^{2}+\frac{1}{4}\right)^{24}\left(\frac{1}{2}(z+1)\right)^{7}. +\end{align*} + +\end_inset + + +\end_layout + +\begin_layout Standard +In Maxima, + we can type +\begin_inset listings +inline false +status open + +\begin_layout Plain Layout + +p:''(expand((3/4*z^2+1/4)^24*(1/2*(z+1))^7))$ +\end_layout + +\begin_layout Plain Layout + +sum(coeff(p,z,j),j,40,79),numer; +\end_layout + +\end_inset + +giving us a probability of having 1s than 0s of +\begin_inset Formula $0.5137...$ +\end_inset + + +\end_layout + +\end_body +\end_document @@ -2016,6 +2016,29 @@ A10+R25 Empirical Tests \end_layout +\begin_layout Standard +\begin_inset CommandInset include +LatexCommand input +filename "3.3.2.lyx" +literal "false" + +\end_inset + + +\begin_inset Note Note +status open + +\begin_layout Plain Layout + +\family typewriter +A10+R25 +\end_layout + +\end_inset + + +\end_layout + \begin_layout Subsection Theoretical Tests \end_layout |
