Like most of my posts, these code snippets derive from various other projects. This example simulates how one can determine whether a set of t statistics is distributed properly. This can be useful when sampling known populations (e.g. the U.S. census or hospital populations) or populations that will soon be known (e.g. pre-election or exit polling). The example is simple, but the concept can be expanded to include varying sample sizes and varying known mean values. When collecting data in real life the *nsims* value will likely be only a handful of random samples rather than a million. Here a fixed sample size of 50 is used.
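The "varying sample sizes" extension mentioned above can be sketched by repeating the same simulation at several values of *n*. This is only an illustration, not part of the original code: the sample sizes, the smaller number of replications, and the `t.by.n` name are all assumptions.

```r
# Sketch: the same t-statistic simulation at several sample sizes.
# Sample sizes, nsims, and variable names here are illustrative assumptions.
set.seed(42)
sizes <- c(10, 25, 50, 100)  # varying sample sizes
rate  <- 5                   # known exponential rate; population mean = 1/rate
nsims <- 1000                # kept small for illustration

t.by.n <- lapply(sizes, function(n) {
  replicate(nsims, {
    s <- rexp(n, rate)
    # t statistic against the known population mean
    (mean(s) - 1/rate) / (sd(s) / sqrt(n))
  })
})
names(t.by.n) <- sizes

sapply(t.by.n, mean)  # each should sit near 0, more closely as n grows
```

Comparing the columns of means (or variances) across sample sizes shows how quickly the t scores settle toward their reference distribution.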

If you’re collecting data and the distribution of your t scores begins to deviate from the known distribution, it might be time to tweak some of the algorithms.

```r
set.seed(1234)

library(e1071)  # for skewness() and kurtosis()

nsims <- 1000000
n     <- 50  # fixed sample size
rate  <- 5   # known exponential rate; population mean = 1/rate

# Draw nsims samples of size n and compute a t statistic for each,
# testing against the known population mean of 1/rate.
x      <- replicate(nsims, rexp(n, rate))
x.sd   <- apply(x, 2, sd)
x.mean <- apply(x, 2, mean)
x.t    <- (x.mean - 1/rate) / (x.sd / sqrt(nrow(x)))

qqnorm(x.t)  # approximately follows a normal distribution

(x.grand.mean <- mean(x.t))  # ~0
median(x.t)                  # ~0
var(x.t)                     # ~ v/(v-2), with v = n - 1 degrees of freedom
skewness(x.t)                # ~0 (slightly negative: the parent is skewed)
kurtosis(x.t, type = 1)      # ~ 6/(v-4), the excess kurtosis of a t distribution

# Compare the simulated density with the reference t density (df = n - 1)
theta <- seq(-4, 4, by = .01)
p     <- dt(theta, n - 1)
d     <- density(x.t)
plot(d, main = "Simulated t scores")
lines(theta, p, lty = 2, lwd = 3)
abline(v = x.grand.mean, col = "red")
```
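One way to make "begins to deviate from the known distribution" concrete is a goodness-of-fit test against the reference t CDF. This is a sketch, not part of the original code: the replication count and variable names are assumptions, and with a skewed parent like the exponential the t approximation is imperfect, so a very large simulation will eventually flag the departure.

```r
# Sketch: quantify departure from the reference t distribution with a
# Kolmogorov-Smirnov test. Replication count and names are assumptions.
set.seed(1234)
n    <- 50
rate <- 5

t.scores <- replicate(2000, {
  s <- rexp(n, rate)
  (mean(s) - 1/rate) / (sd(s) / sqrt(n))
})

# Compare the empirical t scores with the t CDF on n - 1 degrees of freedom
ks <- ks.test(t.scores, "pt", df = n - 1)
ks$p.value  # a small p-value flags a departure worth investigating
```

In a live data-collection setting, rerunning this test as batches arrive gives a simple tripwire for when the sampling algorithms need tweaking.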