Rick Wicklin on the SAS blog made a post today on how to tell if a sequence of coin flips were random. I figured it was only fair to port the SAS IML code over to R. Just like Rick Wicklin did in his example this is the Wald-Wolfowitz test for randomness. I tried to match his code line-for-line.
flips = matrix(c('H','T','T','H','H','H','T','T','T','T','T','T','T','H','H','H','T','H','T','H','H','H','T','H','H','H','T','H','T','H')) RunsTest = function(flip.seq){ u = unique(flip.seq) # unique value (should be two) d = rep(-1, nrow(flip.seq)*ncol(flip.seq)) # recode as vector of -1, +1 d[flip.seq==u[1]] = 1 n = sum(d > 0) # count +1's m = sum(d < 0) # count -1's dif = c(ifelse(d[1] < 0, 2, -2), diff( sign(d) )) # take the lag and find differences R = sum(dif==2 | dif==-2) # count up the number of runs ww.mu = 2*n*m / (n+m) + 1 # get the mean ww.var = (ww.mu-1)*(ww.mu-2)/(n+m-1) # get the variance sigma = sqrt(ww.var) # standard deviation # compute test statistics if((n+m) > 50){ Z = (R-ww.mu) / sigma } else if ((R-ww.mu) < 0){ Z = (R-ww.mu+0.5) / sigma } else { Z = (R-ww.mu-0.5)/sigma } pval = 2*(1-pnorm(abs(Z))) # compute a two-sided p-value ret.val = list(Z=Z, p.value=pval) return(ret.val) } runs.test = RunsTest(flips) runs.test > runs.test $Z [1] -0.1617764 $p.value [1] 0.8714819
Nice. I often convert programs from R to SAS/IML. The languages are similar enough that most computations port with minimal effort, as you’ve shown here.
By typing
runs Test in R
into a search engine, I found at least two R packages that support the runs test. Searching for “Wald-Wolfowitz test” might give more. Best wishes!
Well… Define “random.” You guys are checking that it fits a binomial, but you failed to use the critical word “fair” to define your coin. Ask any mathematician — he’ll tell you that an unfair coin can do anything; similarly tests with actual coins show a bias because they are not perfectly symmetric (center of mass and all that). So “true random” might require fitting to the expected outcome of a Not_Fair coin.
(BTW, my captcha was (?)*9 = 72, which based on HHGTTG should be more like 10.3 than 8 🙂 )
I run into a little problem: After changing ret.val = pval and replicating i get a nearly uniform distribution of p-values. What went wrong (assuming that the sampler is random)
loops = 1000
mcRes = rep(0, loops)
flips= matrix(sample(c(“H”,”T”), 1000*loops, T), ncol=1000)
for(iter in 1:loops)
{
mcRes[iter] = RunsTest(matrix(flips[iter,]))
}
summary(mcRes)
ps: summary(mcRes)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0007974 0.2688000 0.5321000 0.5145000 0.7565000 0.9999000
This is screaming out for an application of the “rle” (run length encoding) function. Especially so considering the ‘number of runs’ is explicitly mentioned. From a quick search, it doesn’t look like SAS IML has an “rle” function, which might explain why it wasn’t used in the original code.
Yes! My favorite R-function! So much so that (shameless plug) I published a version which finds linear sequences as well as runs. See cgwtools::seqle at CRAN