Like most of my posts, these code snippets derive from various other projects. This example shows a simulation for checking whether a set of t statistics follows its expected distribution. This can be useful when sampling known populations (e.g. the U.S. census or hospital populations) or populations that will soon be known (e.g. pre-election or exit polling). The example is simple, but the concept can be expanded to include varying sample sizes and varying known mean values. When collecting data in real life the *nsim* value will likely be only a handful of random samples rather than a million. Here a fixed sample size of 50 is used.

If, while collecting data, you see your distribution of t scores begin to deviate from the known distribution, it might be time to tweak some of the algorithms.
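As a sketch of what that real-life check might look like with only a handful of samples (the names and cutoff below are my own illustration, not from the original post), one could compute a t score for each incoming sample against the known mean, here 1/5 for an Exponential(rate = 5) population, and flag any score outside the central 95% of the t distribution with n - 1 degrees of freedom:

```r
set.seed(42)

n <- 50  # observations per sample
k <- 12  # only a handful of field samples, not a million

# Each column is one sample from the known population
samples <- replicate(k, rexp(n, 5))

# t score of each sample mean against the known mean of 1/5
t.scores <- (apply(samples, 2, mean) - 1/5) / (apply(samples, 2, sd) / sqrt(n))

# Flag samples whose t score falls outside the central 95%
# of the t distribution with n - 1 degrees of freedom
cutoff <- qt(0.975, df = n - 1)
flagged <- abs(t.scores) > cutoff
sum(flagged)  # count of suspicious samples
```

A steady stream of flagged samples, rather than the occasional one expected by chance, would be the signal to start tweaking.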

[sourcecode language="r"]
library(e1071)  # for skewness() and kurtosis()

set.seed(1234)

nsims <- 1000000
n <- 50

# Draw nsims samples of size n from an Exponential(rate = 5)
# population, whose known mean is 1/5
x <- replicate(nsims, rexp(n, 5))
x.mean <- apply(x, 2, mean)
x.sd <- apply(x, 2, sd)

# t statistic for each sample against the known mean of 1/5
x.t <- (x.mean - 1/5) / (x.sd / sqrt(n))

qqnorm(x.t)  # approximately normal for n = 50
(x.grand.mean <- mean(x.t))  # ~0
median(x.t)  # ~0
var(x.t)  # ~ v/(v-2), where v = n - 1 degrees of freedom
skewness(x.t)  # near 0
kurtosis(x.t, type = 1)

# Overlay the empirical density of the t scores on the
# theoretical t density with n - 1 degrees of freedom
theta <- seq(-4, 4, by = .01)
p <- dt(theta, n - 1)
d <- density(x.t)
plot(theta, p, type = "l", ylab = "Density", lty = 2, lwd = 3)
lines(d, col = "blue")
abline(v = x.grand.mean, col = "red")
[/sourcecode]
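To go beyond the visual checks, one option (my own sketch, not part of the original post) is a Kolmogorov-Smirnov test of the simulated t scores against the theoretical t distribution with n - 1 degrees of freedom, again subtracting the known population mean of 1/5 when forming each t:

```r
set.seed(1234)

n <- 50
nsims <- 10000  # smaller than the post's 1e6, for speed

# Simulated t scores against the known Exponential(rate = 5) mean of 1/5
x <- replicate(nsims, rexp(n, 5))
x.t <- (apply(x, 2, mean) - 1/5) / (apply(x, 2, sd) / sqrt(n))

# H0: the t scores follow a t distribution with n - 1 df;
# a tiny p-value flags a deviation worth investigating
ks <- ks.test(x.t, "pt", df = n - 1)
print(ks)
```

With a skewed parent distribution like the exponential, some deviation at n = 50 is expected; the test quantifies whether it is large enough to matter for your sampling scheme.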