The Uncertainty of Predictions

There are many kinds of intervals in statistics.  To name a few of the common intervals: confidence intervals, prediction intervals, credible intervals, and tolerance intervals. Each are useful and serve their own purpose.

I’ve been recently working on a couple of projects that involve making predictions from a regression model and I’ve been doing some work on estimation and the variability of the estimates. Recently, in my case, I need to predict an individual outcome drawn from the distribution (this differs from estimating the mean of that distribution). Often I end up estimating several simultaneous individual outcomes. It follows that we need to assume the underlying regression model still applies and is appropriate for this new individual observation.

Prediction vs. Confidence Intervals

Prediction Intervals and Confidence Intervals

Prediction intervals are useful in that it gives us an understanding of likely values that an individual observation may be.  There is a subtle, but big, difference between prediction intervals and confidence intervals.  With a confidence interval we are trying to obtain an upper and lower limit for the parameter.  For a prediction interval we are interested in the upper and lower limits on an individual observation.  One key item to note is that prediction intervals will always be wider than the corresponding confidence intervals.  The reason for this is that when making a new prediction on an observation ( \hat{y_{i}} ) we encounter the variability in the observation estimate as well as variation within the probability distribution of y.

These intervals often get confused with confidence intervals (which those alone have many nuances). But they really need to be kept separate because calling a prediction interval a confidence interval (or vice versa) will provide a false understanding of the variability.

Doing Many Prediction Intervals

Sometimes it is necessary to estimate several new observations.  In this case we need an overall family confidence level of (1-\alpha). To compensate for simultaneously estimating many observations there are a couple of difference procedures: ScheffĂ© and Bonferroni.  It seems, based on various conferences that I’ve attended, that Bonferroni is most common, probably because it is not only conservative in its application but because it is so simple to implement.

Bayesian Posterior Predictive Distribution

It’s worth mentioning the posterior predictive distribution using a Bayesian approach.  In Bayesian statistics the parameter is treated as random and often unknown (as compared to frequentists where it’s fixed and often unknown).   So here we can use the posterior predictive distribution for new data. It also happens to be a good way to see how well the model performed.

Posterior Prediction Distribution

Example Code

library(LearnBayes)
alpha = 0.05
y = rnorm(100,0, 1)
x1= y+rnorm(100,1,1)
x2= y+rnorm(100,2,2)

##### Least-squares fit
fit=lm(y~x1+x2,x=TRUE,y=TRUE)

new = data.frame(x1 = seq(-5, 5, 0.5), x2=seq(-3, 7, 0.5))

pred.w.plim = predict(fit, new, interval = "prediction")
pred.w.clim = predict(fit, new, interval = "confidence")
matplot(new$x1, cbind(pred.w.clim, pred.w.plim[,-1]),
        lty = c(1,2,2,3,3), col=c(1,2,2,3,3), type = "l", 
        ylab = "predicted y", xlab= "x1", main="Confidence vs. Prediction Intervals")
legend("topleft", c("Confidence","Prediction"), lwd=2, lty=c(2,3), col=c(2,3), cex=.7)

##### Sampling from posterior
theta.sample=blinreg(fit$y,fit$x,5000)


######### Model checking via posterior predictive distribution
pred.draws=blinregpred(fit$x,theta.sample)
pred.sum=apply(pred.draws,2,quantile,c(alpha/2,1-alpha/2))

par(mfrow=c(1,1))
ind=1:ncol( pred.sum )
ind.odd=seq(1,ncol( pred.sum ), by=2)
ind.even=seq(2,ncol( pred.sum ), by=2)
matplot(rbind(ind,ind),pred.sum,type="l",lty=1,col=1,
        xlab="Identifier",ylab="Response Variable y", main="Posterior Predictive Distribution"
        , xaxt='n')
axis(side=1, at=ind.odd, tcl = -1.0, lty = 1, lwd = 0.5, labels=ind.odd, cex.axis=.75)
axis(side=1, at=ind.even, tcl = -0.7, lty = 1, lwd = 0.5, labels=rep("",length(ind.even)), cex.axis=.75)
points(ind,y,pch=19, cex=.4)

out=(y>pred.sum[2,] | y<pred.sum[1,])
text(ind[out], y[out], label=ind[out], pos = 1, cex=1, col=2)
Leave a comment

1 Comment

  1. The Uncertainty of Predictions | Frontiers of J...

Leave a Reply

Your email address will not be published. Required fields are marked *


seven × 9 =

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

%d bloggers like this: