While at the Joint Statistical Meeting a few weeks ago I was talking to a friend about various aspects to clinical trials. He indicated that no current R package was able to perfectly reproduce Passing-Bablok (PB) regression so that it exactly matched SAS. He ultimately wrote a couple of functions and kindly shared them with me so that I could share them here.

There is currently an R package (mcr) that will do the Passing-Bablok regression but, as it turns out, the output does not precisely match SAS. Sadly, I don’t currently have a copy of SAS available to use for this purpose so I can’t independently run code and comparisons. Though I would be interested in an equivalent comparison. However, I ran some examples below to compare the R *mcr package.*

These code snippets that he shared with me will only calculate the slope, intercept and the confidence interval for each coefficient. It’s not CRAN package ready but the short function and output are convenient and easy to use.

One interesting characteristic comparing the two PB regressions (at least in the simulation below) is that the *mcr package* always has a slightly smaller coefficient. In the simulation below the maximum difference is .00054. The minimum difference is somewhere along the lines of zero (1.59e-12). The details of the differences is something I must look into at a later date.

library(mcr) # Function to generate some correlated data genData = function(numobs=100,randval){ R = matrix(cbind(1,.80, .80,1),nrow=2) U = t(chol(R)) nvars = dim(U)[1] set.seed(randval) random.normal = matrix(rnorm(nvars*numobs,100,10), nrow=nvars, ncol=numobs); X = U %*% random.normal newX = t(X) raw = newX return(raw) } PB.reg = function(X,Y,alpha=.05) { ## (1) calculate all pairwise slopes x < - X y <- Y dat <- cbind(x,y) n <- length(x) S <- array(NA,dim=rep(n,2)) for(i in 1:(n-1)){ for(j in (i+1):n) { if(i != j) { S[i,j] <- (y[i] - y[j])/(x[i] - x[j]) } } } S <- sort(na.exclude(as.vector(S))) K <- sum(S <= -1) - .5 * sum(S == -1) N <- length(S) b <- ifelse(N%%2,S[(N+1)/2+K],mean(S[N/2+K+0:1])) ### CI for b C.gamma <- qnorm(1-alpha/2) * sqrt(n*(n-1)*(2*n+5)/18) M1 <- round((N-C.gamma)/2,0) M2 <- N - M1 + 1 CI.b <- c(LB=S[M1+K],UB=S[M2+K]) a <- median(y - b*x) ### CI for a CI.a <- c(LB=median(y - CI.b["UB"]*x), UB=median(y - CI.b["LB"]*x)) ## return(list(a=a, CI.a=CI.a, b=b, CI.b=CI.b)) } # A quick function to give only b coefficient PB.fit.red = function(X,Y){ fit.lm = PB.reg(X, Y) fit.lm$b } # Generate some one-time use correlated data simData = genData(numobs=20,c(12345)) # Updated function pb.fit = PB.reg(simData[,1], simData[,2]) # Look at just the b coefficient, single example PB.fit.red(simData[,1], simData[,2]) # mcreg function, single example pb.fit2 = mcreg(simData[,1], simData[,2], method.reg="PaBa", method.ci="bootstrap") slot(pb.fit2, "glob.coef")[2] &amp;amp;nbsp; # Run a small simulation nsims = 250 sim.data = matrix(NA, nrow=nsims, ncol=2) # for() loop used for demonstration purposes. # Not efficient code applying a function over the vectors is more efficient for(i in 1:nsims){ simData = genData( numobs=20, i ) # Look at just the b coefficient, multiple simulations sim.data[i,1] = PB.fit.red(simData[,1], simData[,2]) # mcreg function, multiple simulations pb.fit2 = mcreg(simData[,1], simData[,2], method.reg="PaBa", method.ci="bootstrap") sim.data[i,2] = slot(pb.fit2, "glob.coef")[2] } delta = sim.data[,1] - sim.data[,2] mean(delta) max(delta) min(delta) hist(delta, nclass=50, main="Distribution of Differences in PB Regression") plot(sim.data, main="Difference Between Two Types of Passing-Bablok Non-Parametric Regression", xlab="Improved PB Function", ylab="mcr Package")

## anspiess

/ September 3, 2013For what specific does R have to be gauged against SAS? I know that the FDA favors the latter, but is SAS gauged against anything, maybe something like calibrator datasets that are independently (non-company) evaluated?

## Wesley

/ September 3, 2013That’s an interesting question. I am by no means an expert in FDA regulatory submissions, so maybe someone else with experience in the history of SAS and the FDA can chime in, but it seems that SAS has a monopoly on the market due to tradition. I would be interested in a first-person account of the history. But my take is that in comparison all other statistical software are effectively children (or grandchildren) compared to SAS. SAS came around in the mid 60′s while R was the late 90′s. Though SPSS came out around the same time it never took hold due to the fact that the software was a Statistical Package for the Social Science (SPSS). Even S started up in the mid-70′s. It is likely the SAS format (and output) is something that the FDA are very familiar and providing anything else would throw off their groove. I know that the FDA will only accept for archive SAS XPORT transport format.

## Fabian

/ February 13, 2014Good catch – the default behavior of mcr Passing-Bablok regression is indeed slightly different from the original publication. The major difference is that mcr estimates the regression slope as the shifted median of pairwise angles atan(dy/dx) and not slopes dy/dx. This makes only a difference for data sets with even number of data points and pairwise slopes – then the central two angles/slopes are averaged and here the used metric for slope actually matters. For most applications an approximation based on angles makes more sense than averaging slopes which generates a positive bias from a geometrical point of view.

The current version of mcr offers the option “slope.measure” which when set to “tangent” will provide the classical Passing-Bablok estimates. To reproduce the default (slope.measure=”radian”) mcr behavior you can change line 035 in your example code to:

b <- ifelse(N%%2,S[(N+1)/2+K],tan(mean(atan(S[N/2+K+0:1]))))

In most practical examples the difference between radian and tangent metric is negligible. Here is an artificial example where the difference matters (from mcreg help page):

## Pathological example of Passing-Bablok regression where measure for slope angle matters

library(mcr)

x1 <- 1:10; y1 <- 0.5*x1; x <- c(x1,y1); y <- c(y1,x1)

m1 <- mcreg(x, y, method.reg="PaBa", method.ci="analytical", slope.measure="radian", mref.name="X", mtest.name="Y")

m2 <- mcreg(x, y, method.reg="PaBa", method.ci="analytical", slope.measure="tangent", mref.name="X",mtest.name="Y")

plot(m1, add.legend=FALSE, identity=FALSE, main="Radian vs. tangent slope measures in Passing-Bablok regression\n(pathological example)", ci.area=FALSE, add.cor=FALSE)

plot(m2, ci.area=FALSE, reg.col="darkgreen", reg.lty=2, identity=FALSE, add.legend=FALSE, draw.points=FALSE, add=TRUE, add.cor=FALSE)

includeLegend(place="topleft", models=list(m1,m2), model.names=c("PaBa Radian","PaBa Tangent"), colors=c("darkblue","darkgreen"), lty=c(1,2), design="1", digits=2)

The mcr PaBa implementation has some additional extensions to the original equations:

We extended the algorithm to enable unbiased estimation of anti-correlation.

Very small x and y differences (rel difference <1E-12) used for calculating slopes are set to 0 to avoid meaningless slope estimates due to limited floating point accuracy. See mcr:::calcDiff.

Shift index K estimates of N/2 (very rare but can occur in only weakly correlated data and in combination with bootstrap) lead to invalid index calculations for data sets with even N (N/2+K+1 = N/2+N/2+1 would be invalid) and are handled by truncating to the valid index range.

Hope this sheds some light on the mcr PaBa implementation.