Hello,
I analyzed a biosynthetic gene cluster dataset using metagenomeSeq but I am confused about the resulting table(s). Perhaps someone can help me understanding what is going on here. When performing the fitZig function (i.e. Zero-inflated Gaussian Model), using a model matrix with an intercept, as such:
# model matrix
mod <- model.matrix(~Treatment)
colnames(mod) = levels(as.factor(Treatment))
# fitZig function
settings <- zigControl(tol=1e-6, maxit = 18, verbose = TRUE)
fit <- fitZig(obj = object, mod = mod, useCSSoffset = TRUE, control = settings)
zigFit = fit@fit
finalMod = zigFit$design
contrast.matrix = makeContrasts(paste(make.names(lvls.contrast[2]), "-", make.names(lvls.contrast[1])), levels = finalMod)
fit2 = contrasts.fit(zigFit, contrast.matrix)
fit3 = eBayes(fit2)
topTable(fit3, coef=paste(lvls.contrast[2], "-", lvls.contrast[1]))
res <- topTable(fit3,coef=1,adjust="BH",n=Inf)
my resulting topTable shows very high logFolds (30+) and relatively low number of significant hits (range 1 - 20).
However, when I perform the same but without intercept, as such:
# model matrix without intercept
mod <- model.matrix(~ 0 + Treatment)
colnames(mod) = levels(as.factor(Treatment))
I get realistic logFolds (max. 4) but the number of significant hits are in the range of 400-500. Also, the two resulting tables from 'with intercept' and 'without intercept', rarely have the same logFold direction (positive or negative).
THEN, I tried the fitFeatureModel (see below) and I get huge amounts of significant hits (range of 500-600) with a 'realistic' logFold (max. 4)
fit <- fitFeatureModel(object, mod)
So, my confusion is about which models to use. I also don't understand how it's possible that the resulting table from the different model.matrix gives opposite directions (logFolds) for same features.
If someone could explain this, I would be very grateful.