Question

[ComBat] ComBat outputting NAs with Large Batches - Switch to Log Likelihood

0

Entering edit mode

Jason Stein ▴ 10

@jason-stein-5621

Last seen 9.6 years ago

Hi fellow ComBat fans, Not sure if my previous message posted to the listserv because of the attachment. Now re-sending without attachment: I was having a problem with ComBat when using large batch sizes and the non-parametric parameter estimation. Almost all probe intensity values after ComBat correction would be NAs. I've traced the source of this problem and have a solution (switching from likelihood to log likelihood) described below. I hope you all find it helpful, or that you would let me know if you find any errors. Also, I can send the attachment which shows why I changed what I changed if anyone would like to take a look. ComBat correction ComBat was sometimes outputting NAs for all probe values using non-parametric parameter estimates. I traced this to the likelihood calculation in the int.eprior function. The original function is pasted below: # Monte Carlo integration function to find the nonparametric adjustments for a particular batch int.eprior <- function(sdat,g.hat,d.hat){ g.star <- d.star <- NULL r <- nrow(sdat) for(i in 1:r){ g <- g.hat[-i] d <- d.hat[-i] x <- sdat[i,!is.na(sdat[i,])] n <- length(x) j <- numeric(n)+1 dat <- matrix(as.numeric(x),length(g),n,byrow=T) resid2 <- (dat-g)^2 sum2 <- resid2%*%j LH <- 1/(2*pi*d)^(n/2)*exp(-sum2/(2*d)) LH[LH=="NaN"]=0 g.star <- c(g.star,sum(g*LH)/sum(LH)) d.star <- c(d.star,sum(d*LH)/sum(LH)) ##if(i%%1000==0){cat(i,'\n')} } adjust <- rbind(g.star,d.star) rownames(adjust) <- c("g.star","d.star") adjust } The likelihood calculation LH here goes to zero when sum2 is very large due to numerical precision errors. This causes the parameter estimates g.star and d.star to become NaN from a divide by zero error, and consequently for the output of the program to be NAs. This can happen when there are a lot of samples in a batch or when there are large residuals (or both). To correct this, can use the log-likelihood, which is less susceptible to numerical precision errors since it is not so small. I changed the function to use log-likelihood and the entire edited function is now pasted below: # Monte Carlo integration function to find the nonparametric adjustments for a particular batch int.eprior <- function(sdat,g.hat,d.hat){ #Get the number of probes r <- nrow(sdat); #The number of probes removing the probe of interest G = r - 1; #g.star and d.star are the output matrices g.star <- d.star <- matrix(NA,nrow=G,ncol=1); #Loop over each probe for(i in 1:r){ #Remove the beta value of the probe of interest g <- g.hat[-i]; #Remove the variance of the probe of interest d <- d.hat[-i]; #Isolate the value of this probe across all samples x <- sdat[i,!is.na(sdat[i,])]; #Number of samples in this batch n <- length(x) #A matrix of dimension gxn which contains the probe values repeated in each row dat <- matrix(as.numeric(x),length(g),n,byrow=T) #The residual between the probe values and the regression coefficient resid2 <- (dat-g)^2 #Calculate Log-likelihood using Gaussian LL = matrix(0,nrow=length(G),ncol=1); for (j in 1:n) { LL = LL + -0.5*log(2*pi*d)-(resid2[,j]/(2*d)); } #Order by the log-likelihood values to avoid Inf in parameter estimates orderind = order(LL,decreasing=TRUE); LL = LL[orderind]; g = g[orderind]; d = d[orderind]; #Determine the parameter estimates numerator.g = 1; denominator = 1; numerator.d = 1; for (k in 2:G) { numerator.g = numerator.g + (g[k]/g[1]) * exp(LL[k]-LL[1]); denominator = denominator + exp(LL[k]-LL[1]); numerator.d = numerator.d + (d[k]/d[1]) * exp(LL[k]-LL[1]); } g.star[i] = g[1]*(numerator.g/denominator); d.star[i] = d[1]*(numerator.d/denominator); } adjust <- rbind(g.star,d.star) rownames(adjust) <- c("g.star","d.star") adjust }

probe probe • 1.9k views

ADD COMMENT • link updated 6.1 years ago by jbloomstein • 0 • written 11.4 years ago by Jason Stein ▴ 10

score 0 · Answer 1 · 2018-03-27

0

Entering edit mode

jbloomstein • 0

@jbloomstein-15356

Last seen 6.1 years ago

Hi Jason,

I realize that you wrote this over 5 years ago, but I am having a similar 'NA' issue with Combat now. Did this help to solve the issue you witnessed? How would you implement this code into Combat? Are you aware of other threads that speak to this topic that I might check out?

Thanks,

Josh

ADD COMMENT • link 6.1 years ago jbloomstein • 0

0

Entering edit mode

Hi Josh, I am having the same problem. The developers' github says it is a problem with gene rows having zero variance. Supposedly they fixed it, with 2 pull-requests at https://github.com/jtleek/sva-devel/issues/14 and https://github.com/jtleek/sva-devel/pull/35. Downloading the new version from https://github.com/zhangyuqing/sva-devel is suppose to fix this problem, but all I get is function not found errors.

ADD REPLY • link 4.8 years ago manninm • 0