I am working on a project where I am comparing how different statistical methods for finding differential abundance perform. Over the course of this project I have found that the zero-inflated, feature-specific lognormal model (Feature model,
fitFeatureModel function) generally performs better than the zero-inflated Gaussian model (ZIG model,
fitZig function), but the Feature model is unable to handle confounding covariates unlike the ZIG model. Both the manual and vignette clearly indicate that confounders can be incorporated into the model design matrix for the ZIG model but there is no similar mention for the Feature model (see example code from the vignette below).
## Feature model data(lungData) lungData = lungData[, -whichis.na(pData(lungData)$SmokingStatus))] lungData = filterData(lungData, present = 30, depth = 1) lungData <- cumNorm(lungData, p = 0.5) pd <- pData(lungData) ## No covariates in the model matrix mod <- model.matrix(~1 + SmokingStatus, data = pd) lungres1 = fitFeatureModel(lungData, mod)
## ZIG model data(lungData) controls = grep("Extraction.Control", pData(lungData)$SampleType) lungTrim = lungData[, -controls] rareFeatures = which(rowSums(MRcounts(lungTrim) > 0) < 10) lungTrim = lungTrim[-rareFeatures, ] lungp = cumNormStat(lungTrim, pFlag = TRUE, main = "Trimmed lung data") lungTrim = cumNorm(lungTrim, p = lungp) smokingStatus = pData(lungTrim)$SmokingStatus bodySite = pData(lungTrim)$SampleType normFactor = normFactors(lungTrim) normFactor = log2(normFactor/median(normFactor) + 1) ## bodySite and normFactor are the covariates mod = model.matrix(~smokingStatus + bodySite + normFactor) settings = zigControl(maxit = 10, verbose = TRUE) fit = fitZig(obj = lungTrim, mod = mod, useCSSoffset = FALSE, control = settings)
Whenever I try to use covariates with the Feature model, either with the example data or my own data, I keep getting an error saying
Can't analyze currently. I have also tried to fit the zero-inflated lognormal model directly myself using code from the package to no avail. Is there a mathematical / statistical explanation as to why the Feature model cannot handle covariates that I am just missing?
Let me know if you need any additional information from me. Thanks in advance!