Significant P-Values and Overlapping Confidence Intervals

There are all sorts of problems with p-values and confidence intervals and I have no intention (or the time) to cover all those problems right now.  However, a big problem is that most people have no idea what p-values really mean. Here is one example of a common problem with p-values and how it relates to confidence intervals.  The problem arises when there are two  random variables (from independent populations), each with a mean and variance.  A confidence interval can be constructed around each sample mean.  Using these confidence intervals might be a tempting way to explain whether two values are statistically different.  The issue is that a person may see that the confidence intervals overlap and therefore declare that there is no difference.  Simply put this is not one of those iff (if and only if) situations.  If the confidence intervals do not overlap then one can conclude that there is a statistical difference between means.  However, the opposite can not be concluded.  When the confidence intervals do overlap then there still may be a difference.

Here are simulated data from two independent normally distributed populations testing the confidence intervals and the p-values.  It can easily be seen that this is a fairly frequent event. So don’t make the mistake and make conclusions solely on confidence intervals.Histogram of Confidence Intervals

Beta Distribution for Overlapping Confidence Intervals


###
#Set some constants
###
alpha = .05
m = 15
n = 15
nsim = 20000

###
#Function to calculate the t statistic. Same as t.test(x,y, var.eqal=T)
# `spooled` can easiliy be modified.
###
tstatfunc=function(x,y){
m = length(x)
n = length(y)
spooled = sqrt(((m-1)*sd(x)^2+(n-1)*sd(y)^2)/(m+n-2))
tstat = (mean(x)-mean(y))/(spooled*sqrt(1/m+1/n))

return(tstat)
}

calcInterval = function(){
x = rnorm(m,0,1)
y = rnorm(n,1,.8)
se.x = sd(x)/sqrt(m)
se.y = sd(y)/sqrt(n)

t.stat = tstatfunc(x,y)

p.val = (1-pt(abs(t.stat),m+n-2))*2 #Pooled Variance, two sided hypothesis
ci.x.ll = mean(x)-abs(qt(alpha/2,m-1))*se.x
ci.x.ul = mean(x)+abs(qt(alpha/2,m-1))*se.x
ci.y.ll = mean(y)-abs(qt(alpha/2,n-1))*se.y
ci.y.ul = mean(y)+abs(qt(alpha/2,n-1))*se.y
TTest = t.test(x,y, var.equal=TRUE) #Run the t.test() function for comparison
ret.val = c(p=p.val, t.p=TTest$p.value,
ci.x.ll=ci.x.ll, ci.x.ul=ci.x.ul,
ci.y.ll=ci.y.ll, ci.y.ul=ci.y.ul)

return(ret.val)
}

###
#Replicate a few times
###
my.sims = as.data.frame( t(replicate(nsim,calcInterval())) )

###
#Do the intervals overlap and it's significant?
###
ci.vals = cbind(my.sims$ci.x.ll - my.sims$ci.y.ul, my.sims$ci.x.ul-my.sims$ci.y.ul, my.sims$ci.x.ul-my.sims$ci.y.ll)
overlapTest = (ci.vals[,1] > 0 & ci.vals[,2] > 0 & ci.vals[,3] > 0) |
(ci.vals[,1] < 0 & ci.vals[,2] < 0 & ci.vals[,3] < 0)
my.sims = cbind(my.sims,ci.vals,NotOverlap=overlapTest)

sum(my.sims$CI.p)/nsim

hist(
rbeta(100000,sum(my.sims$CI.p)+1, nsim-sum(my.sims$CI.p)+1) ##CI.p is a binomial distribution.
, nclass=100, xlab="Proportion", freq=F, main=expression("Histogram of Overlapping Confidence Intervals and p<"*alpha*" from Beta Distribution"))
###
#What percent have overlapping CI and p < alpha? -- "Significant" but yet CI indicate otherwise.
#Multiple comparisons
###
my.sims$CI.p = as.numeric( !my.sims$NotOverlap & my.sims$p < alpha )
my.sims$CI.p2 = as.numeric( my.sims$NotOverlap & my.sims$p > alpha )
my.sims$my.diff = my.sims$p-my.sims$t.p # Check calculations for consistency

sum(my.sims$CI.p)/length(my.sims$CI.p)

###
#How many have confidence intervals that do not overlap but yet are still "Significant"?
###
my.sims$CI.p2 = as.numeric( my.sims$NotOverlap & my.sims$p > alpha )
sum(my.sims$CI.p2)

###
#Histograms of the intervals
###
hist(my.sims$t.p, nclass=100)
subset(my.sims,my.sims$p > .5)

###
#Histograms of the intervals
###
hist(my.sims$ci.y.ul, nclass=100, xlim=c(-2,3), ylim=c(0,2), col=4, freq=F,
main="Histogram of Confidence Intervals", xlab="Value")
hist(my.sims$ci.y.ll, nclass=100, add=T, col=4, freq=F)
hist(my.sims$ci.x.ul, nclass=100, add=T, col=2, freq=F)
hist(my.sims$ci.x.ll, nclass=100, add=T, col=2, freq=F)

m4s0n501
Leave a comment

4 Comments

  1. Hi Wesley, nice illustration.
    Coincidentally, (as with your bootstrap post I guess..) I just wrote something related couple of weeks back: http://eranraviv.com/blog/on-p-value/ Took a hypothesis testing approach which is strongly linked to the CI overlapping you present.

    Reply
    • Wesley

      Thanks. Indeed it was purely coincidental that they are similar posts. It also just so happened that I was getting ready to make the post and realized that Andrew Gelman posted something along the same lines earlier that day with a link to some commentary he published earlier this year.

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *


seven − 4 =

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

%d bloggers like this: