P-values have been an issue for statistician for an extremely long time. I’ve seen situations where entire narratives are written without p-values and only provide the effects. It can also be used as a data reduction tool but ultimately it reduces the world into a binary system: yes/no, accept/reject. Not only that but the binary threshold is determined on a roughly subjective criterion. It reminds me of Jon Stewart’s recent discussion “good thing/bad thing“. Taking the most complex of issues and reducing them down to two outcomes.

Below is a simple graph (nothing elaborate) that shows how p-values alone don’t tell the whole story. Sometimes, data is reduced so much that solid decisions are difficult to make. The graph on the left shows a simulated situation where there are identical p-values but very different effects. The graph on the right shows where the p-values are very different, and one is quite low, but the simulated effects are the same.

P-values and confidence intervals have quite a bit in common and when interpreted incorrectly can be misleading. Simply put a p-value is the probability of the observed data (e.g. ), or more extreme data, given the null hypothesis is true[1, 2, 3, see also Moore’s book The Basic Practice of Statistics, 2nd ed. p 321-322].

Ultimately, I could write a fairly lengthy discussion, and there are several others (e.g. Gelman), on the topic. However, for this blog post I’ll highlight this one idea that with p-values it’s possible to get differing effects with the same p-values and differing p-values with the same effect. In the example below I opted for extreme data to highlight the point. Here’s a quick matplot of the example…

In conclusion, P-values are useful and can be useful. But we sometimes need to look beyond the test statistic and p-value to understand the data. Below is some code to simulate some extreme datasets.

set.seed(1234) # fix the one sample. However, replicate is randomized. So exact replication of these data not possible. Need a lot and sometimes it doesn't always work so it may need to be rerun. x1 = rnorm(10, 0, 1) x2 = replicate(500000, rnorm(10, 0, 5)) set.seed(1234) # same as previous seed. x3 = rnorm(50, 0, 1) x4 = replicate(500000, rnorm(10, 0, 4)) get.ttest = function(x){ ## This is equivelent to the one sample t-test from t.test() ## just explicitly showing the formula t.x1 = abs( ( mean(x) - 0 ) / ( sd(x)/sqrt(length(x)) ) ) p.value = pt(t.x1, 9, lower.tail=FALSE)*2 # two-sided return(p.value) } get.ttest.ci = function(x){ ## This is equivelent to the one sample t-test from t.test() ## just explicitly showing the formula me = qt(1-.05/2, length(x)-1) * sd(x)/sqrt(length(x)) ll = mean(x)-me ul = mean(x)+me return(rbind(ll, ul)) } ### Find data with the greatest difference in effect but yet the same p-value. ## No real reason for this approach it just helps finding extreme sets of data. ## Need a very high number of simulations to ensure the constraints on effect.match are meet. sim.p = apply(x2, 2, get.ttest) sim.ci = apply(x2, 2, get.ttest.ci) sim.dif = sim.ci[1,]-sim.ci[2,] effect.match = x2[,round(get.ttest(x1),3) == round(sim.p,3) & sim.dif==min(sim.dif)] sim.max.effect = apply(effect.match, 2, mean) - mean(x1) pick.max.effect = which( sim.max.effect == max(sim.max.effect) ) pick.small.ci = effect.match[,pick.max.effect] ci.matrix = cbind( get.ttest.ci(x1), get.ttest.ci(pick.small.ci) ) ### Find data with the same effect and has the greatest difference in p-value sim.mean = apply(x4, 2, mean) effect.match.mean = x4[, round(mean(x3),1) == round(sim.mean, 1)] sim.max.p = apply(effect.match.mean, 2, get.ttest) - get.ttest(x3) pick.max.p = which( sim.max.p == max(sim.max.p) ) pick.small.effect = effect.match.mean[,pick.max.p] ci.matrix.effect = cbind( get.ttest.ci(x3), get.ttest.ci(pick.small.effect) ) ###Plot the graph par(mfrow=c(1,2)) ind=1:ncol( ci.matrix ) ind.odd=seq(1,ncol( ci.matrix ), by=1) ind.even=seq(2,ncol( ci.matrix ), by=1) matplot(rbind(ind,ind),ci.matrix,type="l",lty=1, lwd=1, col=1, xlab="Group",ylab="Response Variable, y", main=paste("Comparison of data with the same p-value of ", round(get.ttest(x1),2),"\nbut different effects", sep="") , xaxt='n') axis(side=1, at=ind.odd, tcl = -1.0, lty = 1, lwd = 0.5, labels=ind.odd, cex.axis=.75) axis(side=1, at=ind.even, tcl = -0.7, lty = 1, lwd = 0.5, labels=rep("",length(ind.even)), cex.axis=.75) points(ind,c(mean(x1),mean(pick.small.ci)),pch=19, cex=1, col='red') ###Plot the graph ind=1:ncol( ci.matrix.effect ) ind.odd=seq(1,ncol( ci.matrix.effect ), by=1) ind.even=seq(2,ncol( ci.matrix.effect ), by=1) matplot(rbind(ind,ind),ci.matrix.effect,type="l",lty=1, lwd=1, col=1, xlab="Group",ylab="Response Variable, y", main=paste("Comparison of data with the same effect of ", round(mean(x3),1), "\n but different p-values ", sprintf("%.3f", get.ttest(x3)), " and ", sprintf("%.3f", get.ttest(x4) ) , sep="") , xaxt='n') axis(side=1, at=ind.odd, tcl = -1.0, lty = 1, lwd = 0.5, labels=ind.odd, cex.axis=.75) axis(side=1, at=ind.even, tcl = -0.7, lty = 1, lwd = 0.5, labels=rep("",length(ind.even)), cex.axis=.75) points(ind,c(mean(x3),mean(pick.small.effect)),pch=19, cex=1, col='red')