This post spawned from a discussion I had the other day. Confidence intervals are a notoriously difficult topic for those unfamiliar with statistics. I can’t really think of another statistical topic that is so widely published in newspapers, on television, and elsewhere, yet that so few people really understand. It’s been this way since the moment Jerzy Neyman proposed the idea (in an appendix, no less) in 1937.

**What the Confidence Interval is Not**

There are a lot of things that the confidence interval is not. Unfortunately, many of these are often used to define the confidence interval.

- It is not the probability that the true value is in the confidence interval.
- It is not that we will obtain the true value 95% of the time.
- We are not 95% sure that the true value lies within the interval constructed from the one sample at hand.
- It is not the probability that we are correct.
- It does not say anything about how accurate the current estimate is.
- It does not mean that if we calculate a 95% confidence interval then the true value is, with certainty, contained within that one interval.

**The Confidence Interval**

There are several core assumptions that need to be met to use confidence intervals; these often include random selection and independent and identically distributed (IID) data, among others. When one computes a 95% confidence interval repeatedly, the true value will lie within the computed intervals 95 percent of the time. That means, in the long run, if we keep on computing these confidence intervals, then 95% of those intervals will contain the true value.
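This long-run behavior can be checked with a short simulation (a minimal sketch; the seed, sample size, and replicate count here are arbitrary illustrative choices, not values from the post's own example further below):

```r
set.seed(42)               # arbitrary seed for reproducibility
reps = 10000
n = 100
true.mean = 0

# For each replicate, build a 95% interval and record whether it covers the true mean
covered = replicate(reps, {
  x = rnorm(n, mean = true.mean, sd = 1)
  moe = qnorm(0.975) * sd(x) / sqrt(n)
  (mean(x) - moe) <= true.mean & true.mean <= (mean(x) + moe)
})
mean(covered)  # close to 0.95 in the long run
```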

**The other 5%**

When we have a “95% confidence interval” it means that if we repeatedly conduct this survey using the exact same procedures, then 95% of the intervals would contain the actual “true value” in the long run. But that leaves a remaining 5%. Where did that go? This gets into hypothesis testing, rejecting the null hypothesis (H0) and concluding the alternative (Ha). That 5% is known as the Type I error rate and is identified by the Greek letter alpha (α). It is the probability of making a Type I error and is often called the significance level. This means that the probability of rejecting the null hypothesis when the null hypothesis is in fact true is 5%.

**The Population**

Simply looking at the formulas used to calculate a confidence interval, we can see that it is a function of the data (the mean and variance). Unless the finite population correction (FPC) is used, it is not related to the population size. Whether we have a population of one hundred thousand or one hundred million, the confidence interval will be the same. And with populations of that size the FPC is so close to one that it won’t really change anything anyway.
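To see how little the population size matters, we can compare the FPC, sqrt((N - n) / (N - 1)), for two very different population sizes (a quick sketch assuming a sample of n = 1000; the function name is my own):

```r
# Finite population correction for a sample of size n from a population of size N
fpc = function(N, n) sqrt((N - n) / (N - 1))
fpc(1e5, 1000)  # ~0.995
fpc(1e8, 1000)  # ~0.999995
```

In both cases the correction multiplies the margin of error by a factor so close to one that the interval is essentially unchanged.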

**The Margin of Error**

A direct component of the confidence interval is the margin of error. This is the number most widely seen in the news, whether in print, on TV, or otherwise. Often, however, the confidence level is not mentioned in these articles; one can normally assume a 95% confidence level. What makes this difficult is that the margin of error could instead be based on a 90% confidence level, making the margin of error smaller and giving an artificial impression of the survey’s accuracy. The first graph below shows the sample size needed for a given margin of error. It is based on the conservative 50% proportion: .5*.5 maximizes p(1-p) and hence the margin of error, and any other proportion will decrease it, as the second graph shows. Often the “magic number” for sample size seems to be in the neighborhood of 1000 respondents (with, according to Pew, a 9% response rate for telephone surveys).
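The effect of the confidence level on the reported margin of error is easy to verify with the usual formula z * sqrt(p * (1 - p) / n) (a sketch assuming n = 1000 respondents and the conservative 50% proportion; the helper function is my own):

```r
n = 1000
p = 0.5

# Margin of error for a proportion at a given confidence level
moe = function(conf) qnorm(1 - (1 - conf) / 2) * sqrt(p * (1 - p) / n)
moe(0.95)  # ~0.031, i.e. +/- 3.1 percentage points
moe(0.90)  # ~0.026, i.e. +/- 2.6 percentage points
```

So the same survey can be reported with a noticeably tighter margin of error simply by quoting a 90% rather than a 95% confidence level.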

**The Other Error**

Margin of error isn’t the only error. Keep in mind that the word error should not be confused with there being a mistake in the research. Error simply means random variation due to sampling. So when a survey or other study indicates a margin of error of +/- 3%, that is simply the error (variation) due to random sampling. There are all sorts of other types of error that can work their way into the research including, but not limited to, differential response, question wording on surveys, and weather; the list could go on. Many books have been written on this topic.

**Some Examples**

alpha = .01
reps = 100000
true.mean = 0
true.sd = 1   # note: rnorm() takes the standard deviation (here equal to the variance, 1)
true.prop = .25

raw = replicate(reps, rnorm(100, true.mean, true.sd))

# Calculate the mean and standard error for each of the replicates
raw.mean = apply(raw, 2, mean)
raw.se = apply(raw, 2, sd)/sqrt( nrow(raw) )

# Calculate the margin of error
raw.moe = raw.se * qnorm(1-alpha/2)

# Set up upper and lower bound matrix. This format is useful for the graphs
raw.moe.mat = rbind(raw.mean+raw.moe, raw.mean-raw.moe)
row.names(raw.moe.mat) = c(alpha/2, 1-alpha/2)

# Calculate the simulated confidence level: the share of intervals covering the true mean
( raw.CI = (1-sum(
  as.numeric( apply(raw.moe.mat, 2, min) > 0 | apply(raw.moe.mat, 2, max) < 0 )
)/reps)*100 )

# Try some binomial distribution data
raw.bin.mean = rbinom(reps, 50, prob=true.prop)/50
raw.bin.moe = sqrt(raw.bin.mean*(1-raw.bin.mean)/50)*qnorm(1-alpha/2)
raw.bin.moe.mat = rbind(raw.bin.mean+raw.bin.moe, raw.bin.mean-raw.bin.moe)
row.names(raw.bin.moe.mat) = c(alpha/2, 1-alpha/2)
( raw.bin.CI = (1-sum(
  as.numeric( apply(raw.bin.moe.mat, 2, min) > true.prop | apply(raw.bin.moe.mat, 2, max) < true.prop )
)/reps)*100 )

# Plot the first 100 confidence intervals against the true mean
par(mfrow=c(1,1))
ind = 1:100
ind.odd = seq(1, 100, by=2)
ind.even = seq(2, 100, by=2)
matplot(rbind(ind,ind), raw.moe.mat[,1:100], type="l", lty=1, col=1,
  xlab="Sample Identifier", ylab="Response Value",
  main=expression(paste("Confidence Intervals with ",alpha,"=.01")),
  sub=paste("Simulated Confidence Level: ", raw.CI, "%", sep=""),
  xaxt='n')
axis(side=1, at=ind.odd, tcl = -1.0, lty = 1, lwd = 0.5, labels=ind.odd, cex.axis=.75)
axis(side=1, at=ind.even, tcl = -0.7, lty = 1, lwd = 0.5, labels=rep("",length(ind.even)), cex.axis=.75)
points(ind, raw.mean[1:100], pch=19, cex=.4)
abline(h=0, col="#0000FF")

# Margin of error as a function of sample size (conservative 50% proportion)
size.seq = seq(0, 10000, by=500)[-1]
moe.seq = sqrt( (.5*(1-.5))/size.seq ) * qnorm(1-alpha/2)
plot(size.seq, moe.seq, xaxt='n', yaxt='n',
  main='Margin of Error and Sample Size',
  ylab='Margin of Error', xlab='Sample Size',
  sub='Based on 50% Proportion')
lines(size.seq, moe.seq)
axis(side=1, at=size.seq, tcl = -1.0, lty = 1, lwd = 0.5, labels=size.seq, cex.axis=.75)
axis(side=2, at=seq(0,15, by=.005), tcl = -0.7, lty = 1, lwd = 0.5, labels=seq(0,15, by=.005), cex.axis=.75)
abline(h=seq(0,15, by=.005), col='#CCCCCC')
abline(v=size.seq, col='#CCCCCC')

# Margin of error as a function of the proportion (n = 1000)
prop.seq = seq(0, 1, by=.01)
moe.seq = sqrt( (prop.seq*(1-prop.seq))/1000 ) * qnorm(1-alpha/2)
plot(prop.seq, moe.seq, xaxt='n', yaxt='n',
  main='Margin of Error and Proportion',
  ylab='Margin of Error', xlab='Proportion',
  sub='Based on n = 1000')
lines(prop.seq, moe.seq)
axis(side=1, at=prop.seq, tcl = -1.0, lty = 1, lwd = 0.5, labels=prop.seq, cex.axis=.75)
axis(side=2, at=seq(0,15, by=.005), tcl = -0.7, lty = 1, lwd = 0.5, labels=seq(0,15, by=.005), cex.axis=.75)
abline(h=seq(0,15, by=.005), col='#CCCCCC')
abline(v=.5, col="#CCCCCC")

being confident (HT: D Giles) http://t.co/06yi1DI2Li

Nice post. Would be interesting if you could extend the discussion to include Bayesian credible intervals.

It would be interesting to explore whether the assertions to which CIs are not equal are themselves explicable more clearly than are CIs (e.g., the probability the true value is in the interval, or the degree to which we are sure is such and such.)

It seems to me that CIs have a very straightforward meaning, which becomes even more relevant by stating several benchmarks (at different confidence levels). For example, to assert mu > CI-lower at level .025 tells us that were mu smaller than CI-lower, then .975 of the observed proportions would have exceeded the value we observed. Therefore, this is a good indication that mu > CI-lower (where mu is the population proportion in your ex.) On the other hand the data do not indicate that mu > observed proportion, say, because even smaller values of mu would readily generate observations in excess of what we observed. (Well, this is clearly in my published work, I hope.)

The Confidence Interval – what it is, what it is not, and what do you tell others about it.

http://t.co/LabJYV5hwr #CI #statistics


I understand that the real meaning of the 95% confidence interval is that in the case we have let’s say 30 samples and then we construct 30 CI’s from these samples, the theory says approximately 95% of these intervals will contain the true value. Now, let’s say I have only one sample from the same population(with random selection, iidness, etc). Now, what is the probability that this CI I get from this sample is one of those 95% CI’s that contain the true value?

Confidence intervals are a funny thing. By obtaining a confidence interval we say that the true parameter is either in the interval or it is not (either 0 or 1). This does not mean that we would expect, with 95% probability, that the mean from another sample is in this interval (it either is or it isn’t). In that case we would be comparing the differences between two sample means. In order to have any idea on the probability we would need to know additional information and that brings us to the topic of Bayesian statistics and credible intervals, which quite frankly, can be more intuitive than frequentist statistics (e.g. confidence intervals).

Thanks for your reply. So, I think the interpretation that should be given to a confidence interval should always be “the true parameter value can either be covered by this interval or not”. Which states more clearly that knowing that in the long run 95% of your intervals will contain the true parameter value is useless.


Nice!!!

It is not the probability that the true value is in the confidence interval.

But that is precisely what it is. Whenever you draw a random sample, the true value will lie within the interval with that probability.

A frequentist would argue that the true value is a fixed parameter and therefore lacks a distribution. Consequently, there is no probability associated with the non-random (and unknown) parameter. In frequentist terms the interval either does or does not contain the parameter (0 or 1). However, a Bayesian credible interval can have those properties by attaching a probability to it.

“When Discussing Confidence Level With Others…” http://t.co/cjLYsqKjSm

You should provide some references (one such: http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm ), since plenty of published books on statistics do not get it right, either.

Confidence intervals (When Discussing Confidence Level With Others… | Statistical Research) http://t.co/OTjgbDdjjv

This post needs an edit.

This statement: “We are not 95% sure that the true value lies within the interval.” can be read to be contradicted by this one: “When one computes a confidence interval repeatedly they will find that the true value lies within the computed interval 95 percent of the time.”

The first should be qualified as referring to just one sample.

Thanks. I clarified the statement so it reads that it is referring specifically to the one sample.

That is better, but even better would be “We are not 95% sure that the true value lies within the interval constructed around the mean drawn from one sample.”

Nice post. Thanks.

Since you refer to Jerzy Neyman, and to touch a can of worms related to statistical epistemology…

“This 5% is the probability of making a Type I error and is often called significance level. This means that the probability of an error and rejecting the null hypothesis when the null hypothesis is in fact true is 5%.”

Some argue that “the p-value is not the probability of falsely rejecting the null hypothesis,” and that this is representative of flawed reasoning coming from artificially conflating Fisher’s p-values with the Neyman–Pearson hypothesis-testing formalism:

http://en.wikipedia.org/wiki/P-value#Misunderstandings

http://en.wikipedia.org/wiki/Statistical_hypothesis_testing#Origins_and_early_controversy

I have no useful opinion on the matter, but what do you think?