P-values seem to be the bane of a statistician’s existence. I’ve seen situations where entire narratives are written without p-values and only provide the effects. It can also be used as a data reduction tool but ultimately it reduces the world into a binary system: yes/no, accept/reject. Not only that but the binary threshold is determined on a roughly subjective criterion. It reminds me of Jon Stewart’s recent discussion “good thing/bad thing“. Taking the most complex of issues and reducing them down to two outcomes.

Below is a simple graph (nothing elaborate) that shows how p-values alone don’t tell the whole story. Sometimes, data is reduced so much that solid decisions are difficult to make. The graph on the left shows a simulated situation where there are identical p-values but very different effects. The graph on the right shows where the p-values are very different, and one is quite low, but the simulated effects are the same.

P-values and confidence intervals have quite a bit in common and when interpreted incorrectly can be misleading. Simply put a p-value is the probability of the observed data (e.g. ), or more extreme data, given the null hypothesis is true[1, 2, 3, see also Moore's book The Basic Practice of Statistics, 2nd ed. p 321-322].

Ultimately, I could write a fairly lengthy discussion, and there are several others (e.g. Gelman), on the topic. However, for this blog post I’ll highlight this one idea that with p-values it’s possible to get differing effects with the same p-values and differing p-values with the same effect. In the example below I opted for extreme data to highlight the point. Here’s a quick little matplot of the example…

In conclusion, don’t get me wrong. P-values are useful and I actually use them quite often. But we sometimes need to look beyond the test statistic and p-value to understand the data. Below is some code to simulate some extreme datasets.

set.seed(1234) # fix the one sample. However, replicate is randomized. So exact replication of these data not possible. Need a lot and sometimes it doesn't always work so it may need to be rerun.
x1 = rnorm(10, 0, 1)
x2 = replicate(500000, rnorm(10, 0, 5))
set.seed(1234) # same as previous seed.
x3 = rnorm(50, 0, 1)
x4 = replicate(500000, rnorm(10, 0, 4))
get.ttest = function(x){
## This is equivelent to the one sample t-test from t.test()
## just explicitly showing the formula
t.x1 = abs( ( mean(x) - 0 ) / ( sd(x)/sqrt(length(x)) ) )
p.value = pt(t.x1, 9, lower.tail=FALSE)*2 # two-sided
return(p.value)
}
get.ttest.ci = function(x){
## This is equivelent to the one sample t-test from t.test()
## just explicitly showing the formula
me = qt(1-.05/2, length(x)-1) * sd(x)/sqrt(length(x))
ll = mean(x)-me
ul = mean(x)+me
return(rbind(ll, ul))
}
### Find data with the greatest difference in effect but yet the same p-value.
## No real reason for this approach it just helps finding extreme sets of data.
## Need a very high number of simulations to ensure the constraints on effect.match are meet.
sim.p = apply(x2, 2, get.ttest)
sim.ci = apply(x2, 2, get.ttest.ci)
sim.dif = sim.ci[1,]-sim.ci[2,]
effect.match = x2[,round(get.ttest(x1),3) == round(sim.p,3) & sim.dif==min(sim.dif)]
sim.max.effect = apply(effect.match, 2, mean) - mean(x1)
pick.max.effect = which( sim.max.effect == max(sim.max.effect) )
pick.small.ci = effect.match[,pick.max.effect]
ci.matrix = cbind(
get.ttest.ci(x1),
get.ttest.ci(pick.small.ci)
)
### Find data with the same effect and has the greatest difference in p-value
sim.mean = apply(x4, 2, mean)
effect.match.mean = x4[, round(mean(x3),1) == round(sim.mean, 1)]
sim.max.p = apply(effect.match.mean, 2, get.ttest) - get.ttest(x3)
pick.max.p = which( sim.max.p == max(sim.max.p) )
pick.small.effect = effect.match.mean[,pick.max.p]
ci.matrix.effect = cbind(
get.ttest.ci(x3),
get.ttest.ci(pick.small.effect)
)
###Plot the graph
par(mfrow=c(1,2))
ind=1:ncol( ci.matrix )
ind.odd=seq(1,ncol( ci.matrix ), by=1)
ind.even=seq(2,ncol( ci.matrix ), by=1)
matplot(rbind(ind,ind),ci.matrix,type="l",lty=1, lwd=1, col=1,
xlab="Group",ylab="Response Variable, y", main=paste("Comparison of data with the same p-value of ", round(get.ttest(x1),2),"\nbut different effects", sep="")
, xaxt='n')
axis(side=1, at=ind.odd, tcl = -1.0, lty = 1, lwd = 0.5, labels=ind.odd, cex.axis=.75)
axis(side=1, at=ind.even, tcl = -0.7, lty = 1, lwd = 0.5, labels=rep("",length(ind.even)), cex.axis=.75)
points(ind,c(mean(x1),mean(pick.small.ci)),pch=19, cex=1, col='red')
###Plot the graph
ind=1:ncol( ci.matrix.effect )
ind.odd=seq(1,ncol( ci.matrix.effect ), by=1)
ind.even=seq(2,ncol( ci.matrix.effect ), by=1)
matplot(rbind(ind,ind),ci.matrix.effect,type="l",lty=1, lwd=1, col=1,
xlab="Group",ylab="Response Variable, y", main=paste("Comparison of data with the same effect of ", round(mean(x3),1), "\n but different p-values ", sprintf("%.3f", get.ttest(x3)), " and ", sprintf("%.3f", get.ttest(x4) ) , sep="")
, xaxt='n')
axis(side=1, at=ind.odd, tcl = -1.0, lty = 1, lwd = 0.5, labels=ind.odd, cex.axis=.75)
axis(side=1, at=ind.even, tcl = -0.7, lty = 1, lwd = 0.5, labels=rep("",length(ind.even)), cex.axis=.75)
points(ind,c(mean(x3),mean(pick.small.effect)),pch=19, cex=1, col='red')

### Like this:

Like Loading...