Here is an example of nonparametric bootstrapping. It’s a powerful technique that is similar to the jackknife. Where the jackknife recomputes the statistic with one observation left out at a time, the bootstrap re-samples the data with replacement. It’s clearly not as good as parametric approaches but it gets the job done. It can be used in a variety of situations ranging from variance estimation to model selection. John Tukey, as the story goes, suggested the name “the shotgun” because you can blow the head off any statistical problem.

The code below is for illustrative purposes and compares a couple of different approaches for bootstrapping. The bootstrap distribution of the mean is nicely symmetric, but that of a statistic like the median is not so symmetrical. The code below can easily be changed to allow for any single statistic (e.g. any percentile). With a little alteration, bivariate statistics (e.g. correlation) can be bootstrapped as well. The confidence interval is simple to obtain directly: with *nboot*=10000 (or any other number that divides evenly), take the *alpha/2* and *(1-alpha/2)* percentiles of the sorted bootstrap replicates; with *alpha*=.01 as below, these are positions 50 and 9950.

    library(boot)

    nboot <- 10000 # Number of simulations
    alpha <- .01 # alpha level
    n <- 1000 # sample size

    bootThetaQuantile <- function(x,i) {
      quantile(x[i], probs=.5)
    }
    bootThetaMean <- function(x,i) {
      mean(x[i])
    }

    raw <- rnorm(n,0, 1) # raw data

    ( theta.boot.median <- boot(raw, bootThetaQuantile, R=nboot) )
    boot.ci(theta.boot.median, conf=(1-alpha))
    ( theta.boot.mean <- boot(raw, bootThetaMean, R=nboot) )
    boot.ci(theta.boot.mean, conf=(1-alpha))

    my.replicate <- replicate(nboot, raw[sample(1:length(raw), n, replace=TRUE)]) # Bootstrap
    theta.median <- apply(my.replicate, 2, bootThetaQuantile)
    theta.mean <- apply(my.replicate, 2, bootThetaMean)

    hist(theta.median, xlim=c(-.2,.2), nclass=50, col=3,
         main="Histogram of Bootstrap Confidence Intervals for Median")
    hist(theta.mean, xlim=c(-.2,.2), nclass=50, col=3,
         main="Histogram of Bootstrap Confidence Intervals for Mean")

    sort(theta.median)[nboot*alpha/2]
    sort(theta.median)[nboot*(1-alpha/2)]
    sort(theta.mean)[nboot*alpha/2]
    sort(theta.mean)[nboot*(1-alpha/2)]

    ### Randomly generated data
    my.replicate <- replicate(nboot, rnorm(n,0,1))
    theta.rand.median <- apply(my.replicate, 2, bootThetaQuantile)
    theta.rand.mean <- apply(my.replicate, 2, bootThetaMean)

    ci.u <- mean(theta.rand.mean)+qnorm(1-alpha/2)*sd(raw)/sqrt(n)
    ci.l <- mean(theta.rand.mean)-qnorm(1-alpha/2)*sd(raw)/sqrt(n)

    hist(theta.rand.median, xlim=c(-.2,.2), nclass=100, col=3,
         main="Histogram of Randomly Generated Data for Medians")
    hist(theta.rand.mean, xlim=c(-.2,.2), nclass=50, col=3,
         main="Histogram of Randomly Generated Data for Means")
    abline(v=c(ci.u,ci.l))
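
As for the "little bit of alteration" needed for a bivariate statistic, here is a minimal sketch of one way it might look for a correlation. The data are simulated, and the names `xy` and `bootThetaCor`, the slope of 0.6, the seed and the choice of `R = 2000` are all assumptions made for illustration. The only real change is that `boot` is handed a data frame, so the index vector `i` picks whole rows and each (x, y) pair stays together.

```r
library(boot)

set.seed(1)                              # assumed seed, only for reproducibility
n.pairs <- 200                           # hypothetical sample size
x <- rnorm(n.pairs)
y <- 0.6 * x + rnorm(n.pairs, sd = 0.8)  # simulated so that y is correlated with x
xy <- data.frame(x = x, y = y)

# Statistic: correlation computed on the resampled rows.
# boot() passes the data frame as d and the resampled row indices as i.
bootThetaCor <- function(d, i) {
  cor(d$x[i], d$y[i])
}

theta.boot.cor <- boot(xy, bootThetaCor, R = 2000)

# Percentile interval at the same 99% level used for the mean and median
cor.ci <- boot.ci(theta.boot.cor, conf = 0.99, type = "perc")
cor.ci
```

Resampling `x` and `y` independently would break the pairing and push the bootstrapped correlations toward zero, which is why the rows are resampled as units.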

Hi,

Nice post. A few comments:

(1) minor text correction: “…code below can easily be easily by changed to..”

(2) “It’s clearly not as good as parametric approaches but it gets the job done”. I actually think therein lies the power, since you can relax any distributional assumption.

(3) There is a similar example in a regression context here: http://eranraviv.com/blog/bootstrap-example/

It also refers to the price you pay when you use a non-parametric bootstrap where you could use, say, a Normal-distribution bootstrap (not high).

Cheers

E

Thanks for the comments. The text error has been corrected.

Agreed on item 2. Non-parametric methods can be more powerful than parametric ones, as they give you more flexibility. I guess I should have clarified a bit more that if the distribution of the data is known, then one may be able to use that to their advantage.
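
To make that last point a bit more concrete, here is a rough sketch of a parametric bootstrap of the median, assuming the data really are normal. It is not the code from the post: the seed, the smaller `nboot` of 2000 (chosen only to keep it quick) and the names `mu.hat`, `sigma.hat`, `ci.param` and `ci.nonparam` are assumptions for illustration. Instead of resampling the observed values, it fits a normal distribution to them and simulates new samples from the fit.

```r
set.seed(1)            # assumed seed, for reproducibility only
nboot <- 2000          # fewer replicates than in the post, to keep the sketch fast
alpha <- .01
n <- 1000
raw <- rnorm(n, 0, 1)  # stand-in for the raw data

# Parametric bootstrap: estimate the normal parameters, then simulate from the fit
mu.hat <- mean(raw)
sigma.hat <- sd(raw)
theta.param.median <- replicate(nboot, median(rnorm(n, mu.hat, sigma.hat)))

# Nonparametric bootstrap of the same statistic, for comparison
theta.nonparam.median <- replicate(nboot, median(sample(raw, n, replace = TRUE)))

# Percentile intervals from each set of replicates
ci.param <- quantile(theta.param.median, probs = c(alpha/2, 1 - alpha/2))
ci.nonparam <- quantile(theta.nonparam.median, probs = c(alpha/2, 1 - alpha/2))
ci.param
ci.nonparam
```

If the normal assumption is wrong, the parametric interval inherits that error, which is the flexibility trade-off discussed in item 2.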

Also, great post on regression bootstrapping on your blog. Thanks.