There are all sorts of problems with p-values and confidence intervals and I have no intention (or the time) to cover all those problems right now. However, a big problem is that most people have no idea what p-values really mean. Here is one example of a common problem with p-values and how it relates to confidence intervals. The problem arises when there are two random variables (from independent populations), each with a mean and variance. A confidence interval can be constructed around each sample mean. Using these confidence intervals might be a tempting way to explain whether two values are statistically different. The issue is that a person may see that the confidence intervals overlap and therefore declare that there is no difference. Simply put this is not one of those **iff **(if and only if) situations. If the confidence intervals **do not** overlap then one can conclude that there is a statistical difference between means. However, the opposite can not be concluded. When the confidence intervals **do** overlap then there still may be a difference.

Here are simulated data from two independent normally distributed populations testing the confidence intervals and the p-values. It can easily be seen that this is a fairly frequent event. So don’t make the mistake and make conclusions solely on confidence intervals.

### #Set some constants ### alpha = .05 m = 15 n = 15 nsim = 20000 ### #Function to calculate the t statistic. Same as t.test(x,y, var.eqal=T) # `spooled` can easiliy be modified. ### tstatfunc=function(x,y){ m = length(x) n = length(y) spooled = sqrt(((m-1)*sd(x)^2+(n-1)*sd(y)^2)/(m+n-2)) tstat = (mean(x)-mean(y))/(spooled*sqrt(1/m+1/n)) return(tstat) } calcInterval = function(){ x = rnorm(m,0,1) y = rnorm(n,1,.8) se.x = sd(x)/sqrt(m) se.y = sd(y)/sqrt(n) t.stat = tstatfunc(x,y) p.val = (1-pt(abs(t.stat),m+n-2))*2 #Pooled Variance, two sided hypothesis ci.x.ll = mean(x)-abs(qt(alpha/2,m-1))*se.x ci.x.ul = mean(x)+abs(qt(alpha/2,m-1))*se.x ci.y.ll = mean(y)-abs(qt(alpha/2,n-1))*se.y ci.y.ul = mean(y)+abs(qt(alpha/2,n-1))*se.y TTest = t.test(x,y, var.equal=TRUE) #Run the t.test() function for comparison ret.val = c(p=p.val, t.p=TTest$p.value, ci.x.ll=ci.x.ll, ci.x.ul=ci.x.ul, ci.y.ll=ci.y.ll, ci.y.ul=ci.y.ul) return(ret.val) } ### #Replicate a few times ### my.sims = as.data.frame( t(replicate(nsim,calcInterval())) ) ### #Do the intervals overlap and it's significant? ### ci.vals = cbind(my.sims$ci.x.ll - my.sims$ci.y.ul, my.sims$ci.x.ul-my.sims$ci.y.ul, my.sims$ci.x.ul-my.sims$ci.y.ll) overlapTest = (ci.vals[,1] > 0 & ci.vals[,2] > 0 & ci.vals[,3] > 0) | (ci.vals[,1] < 0 & ci.vals[,2] < 0 & ci.vals[,3] < 0) my.sims = cbind(my.sims,ci.vals,NotOverlap=overlapTest) sum(my.sims$CI.p)/nsim hist( rbeta(100000,sum(my.sims$CI.p)+1, nsim-sum(my.sims$CI.p)+1) ##CI.p is a binomial distribution. , nclass=100, xlab="Proportion", freq=F, main=expression("Histogram of Overlapping Confidence Intervals and p<"*alpha*" from Beta Distribution")) ### #What percent have overlapping CI and p < alpha? -- "Significant" but yet CI indicate otherwise. #Multiple comparisons ### my.sims$CI.p = as.numeric( !my.sims$NotOverlap & my.sims$p < alpha ) my.sims$CI.p2 = as.numeric( my.sims$NotOverlap & my.sims$p > alpha ) my.sims$my.diff = my.sims$p-my.sims$t.p # Check calculations for consistency sum(my.sims$CI.p)/length(my.sims$CI.p) ### #How many have confidence intervals that do not overlap but yet are still "Significant"? ### my.sims$CI.p2 = as.numeric( my.sims$NotOverlap & my.sims$p > alpha ) sum(my.sims$CI.p2) ### #Histograms of the intervals ### hist(my.sims$t.p, nclass=100) subset(my.sims,my.sims$p > .5) ### #Histograms of the intervals ### hist(my.sims$ci.y.ul, nclass=100, xlim=c(-2,3), ylim=c(0,2), col=4, freq=F, main="Histogram of Confidence Intervals", xlab="Value") hist(my.sims$ci.y.ll, nclass=100, add=T, col=4, freq=F) hist(my.sims$ci.x.ul, nclass=100, add=T, col=2, freq=F) hist(my.sims$ci.x.ll, nclass=100, add=T, col=2, freq=F)

Nice post. Here’s a relevant paper:

http://amstat.tandfonline.com/doi/abs/10.1198/000313001317097960

Hi Wesley, nice illustration.

Coincidentally, (as with your bootstrap post I guess..) I just wrote something related couple of weeks back: http://eranraviv.com/blog/on-p-value/ Took a hypothesis testing approach which is strongly linked to the CI overlapping you present.

Thanks. Indeed it was purely coincidental that they are similar posts. It also just so happened that I was getting ready to make the post and realized that Andrew Gelman posted something along the same lines earlier that day with a link to some commentary he published earlier this year.

Frequentist caveat! Beware the error of concluding “non-significance” on the basis of overlapping CIs http://t.co/9zsrwff6Yc