Question: Limma: The log2FC of topTable is not equal to Mean(log2CPM) of Basal - Mean(log2CPM) of LP
0
zhouqz0 wrote:

I am confused about the log2FC when I use the limma to analyze the RNA-seq data. In the paper (Law et al. 2016; https://f1000research.com/articles/5-1408 ),  Log2FC of the gene (ENTREZID:12759) is -5.44 in basal.vs.lp, getting from topTable (Page 15).  However, the mean expression getting from voom is 5.58 and 10.89 in group basal and lp, respectively. 5.58 - 10.89 = -5.31, is not equal to -5.44. How to get the final expression value match to the log2FC of topTable ?

limma • 974 views  modified 13 months ago by Aaron Lun25k • written 13 months ago by zhouqz0
Answer: Limma: The log2FC of topTable is not equal to Mean(log2CPM) of Basal - Mean(log2
2
Aaron Lun25k wrote:

Weights.

set.seed(1000)
y <- matrix(rnbinom(10000, mu=1:1000, size=1), ncol=10)
design <- model.matrix(~gl(2, 5))
v <- voom(y, design)
fit <- lmFit(v, design)
fit <- eBayes(fit)
topTable(fit, coef=2, sort.by="none")

# Comparing to manual calculation:
head(rowMeans(v$E[,6:10]) - rowMeans(v$E[,1:5]))
weighted <- v$E * v$weights
head(rowSums(weighted[,6:10])/rowSums(v$weights[,6:10]) - rowSums(weighted[,1:5])/rowSums(v$weights[,1:5]))

I can only assume you're asking out of curiosity. Don't do the above in actual analyses, lmFit and topTable have been very thoroughly tested and will be much less buggy than any manual calculation.

ADD COMMENTlink modified 13 months ago • written 13 months ago by Aaron Lun25k

Note also that the analysis in Law et al (2016), from which OP has taken the logFC, included a batch correction term, another reason why naive subtraction of means will not match the actual computed logFC.

With a batch effect, there is no way to compute logFCs other than to fit the weighted linear model.

ADD REPLYlink modified 13 months ago • written 13 months ago by Gordon Smyth38k

Dear Gordon,

Thank you very much for your reply. I have removed the batch effect, but the subtraction of means still does not matches the actual logFC. I have displayed the test code. Is it ok?

Dear Aaron,

Thank you very much for your answer. I saw the code of lm.wfit. I can get the coefficient base on the v$E and v$weight. But I do not know how to get the final expression for each sample matching to logFC of topTable.  In Law et al(2016) paper, the v$E was considered as the final expression to do the next analysis such as heatmap (Page 16). It was confused for me. Base on the Manual, the v$E should remove the batch effect using the function removeBatchEffect. Moreover, I follow your code to do the manual calculation of logFC using the data in Law et al(2016) paper. The manual logFC is also not equal to the actual logFC of topTable.  I also used the removeBatchEffect to remove the batch effect, but after removing the batch effects, the subtraction of means still not matches the actual computed logFC. Below is the test code using the same data in Law's paper. Thanks again.

files <- c("GSM1545535_10_6_5_11.txt", "GSM1545536_9_6_5_11.txt",
"GSM1545538_purep53.txt","GSM1545539_JMS8-2.txt",
"GSM1545540_JMS8-3.txt","GSM1545541_JMS8-4.txt",
"GSM1545542_JMS8-5.txt","GSM1545544_JMS9-P7c.txt",
"GSM1545545_JMS9-P8c.txt")

x <- readDGE(files, columns=c(1,3))
class(x)

samplenames <- substring(colnames(x), 12, nchar(colnames(x)))
samplenames

colnames(x) <- samplenames
group <- as.factor(c("LP", "ML", "Basal", "Basal", "ML", "LP", "Basal", "ML", "LP"))
x$samples$group <- group
lane <- as.factor(rep(c("L004","L006","L008"), c(3,4,2)))
x$samples$lane <- lane
x$samples library(Mus.musculus) geneid <- rownames(x) genes <- select(Mus.musculus, keys=geneid, columns=c("SYMBOL", "TXCHROM"), keytype="ENTREZID") dim(genes) head(genes) genes <- genes[!duplicated(genes$ENTREZID),]

x$genes <- genes cpm <- cpm(x) lcpm <- cpm(x, log=TRUE) table(rowSums(x$counts==0)==9)

keep.exprs <- rowSums(cpm>1)>=3
x <- x[keep.exprs,, keep.lib.sizes=FALSE]
dim(x)
x <- calcNormFactors(x, method = "TMM")

design <- model.matrix(~0+group+lane)
colnames(design) <- gsub("group", "", colnames(design))

contr.matrix <- makeContrasts(
BasalvsLP = Basal-LP,
BasalvsML = Basal - ML,
LPvsML = LP - ML,
levels = colnames(design))
contr.matrix
v <- voom(x, design, plot=TRUE)
vfit <- lmFit(v, design)
vfit <- contrasts.fit(vfit, contrasts=contr.matrix)
efit <- eBayes(vfit)

basal.vs.lp <- topTable(efit, coef=1, sort.by="none")

ENTREZID  SYMBOL TXCHROM       logFC    AveExpr           t      P.Value    adj.P.Val          B
497097   497097    Xkr4    chr1  7.45743870 -1.7775010  13.9455600 7.111517e-07 1.332370e-05  6.0632317
27395     27395  Mrpl15    chr1  0.48326620  4.2082183   2.3879867 4.414848e-02 7.017880e-02 -5.2737955
18777     18777  Lypla1    chr1  0.29313803  5.2818178   1.9143098 9.210726e-02 1.321884e-01 -6.0961967

#basal.vs.lp
weighted=v$E*v$weights
head(rowSums(weighted[,c(1,6,9)])/rowSums(v$weights[,c(1,6,9)]) - rowSums(weighted[,c(3,4,7)])/rowSums(v$weights[,c(3,4,7)]))

#Output

ENTREZID    497097      27395      18777
logFC       -7.5681082 -0.4334995 -0.2368517

#reomve the Batch Effect
rv <- voom(x, design, plot=FALSE)

design.BatchEffect=model.matrix(~0+group)

rBE.exp =removeBatchEffect(rv$E,covariates = = x$samples$lane,design=design.BatchEffect) head(rowMeans(rBE.exp[,c(1,6,9)])-rowMeans(rBE.exp[,c(3,4,7)])) #output ENTREZID 497097 27395 18777 logFC -7.5033660 -0.4798565 -0.3034289 ADD REPLYlink written 13 months ago by zhouqz0 Again, weights. Your naive application of removeBatchEffect doesn't account for the observation-level weights. These weights are considered by lmFit and affect the estimation of all coefficients (including the batch effect). ADD REPLYlink modified 13 months ago • written 13 months ago by Aaron Lun25k I manually remove the batch effect with weights base on the function of removeBatchEffect. However, the result still does not match to the autal logFC of topTable. rv <- voom(x, design, plot=FALSE) design.BatchEffect=model.matrix(~0+group) batch <- as.factor(x$samples$lane) contrasts(batch) <- contr.sum(levels(batch)) batch <- model.matrix(~batch)[, -1, drop = FALSE] batch2 = NULL covariates = NULL X.batch <- cbind(batch, batch2, covariates) rm.fit <- lmFit(rv$E, cbind(design.BatchEffect, X.batch), weights = v$weights) beta <- rm.fit$coefficients[, -(1:ncol(design.BatchEffect)), drop = FALSE]
beta[is.na(beta)] <- 0
rm.w.adj=as.matrix(v\$E) - beta %*% t(X.batch)

#output

ENTREZID   497097      27395      18777      21399      58175     108664
logFC       7.4586913  0.4733701  0.3017084  0.3788963 -5.4797058 -0.8135124

logFC of topTable:

ENTREZID   497097      27395      18777      21399      58175     108664

logFC    7.4574387  0.4832662  0.2931380  0.3711194 -5.4879042 -0.8143231

Why do you insist on using rowMeans? As Gordon has said, naive subtraction of means will not give you the correct result. This is because part of the residuals are driven by the batch effects, which will not cancel out when the number of samples in each batch are not the same between groups.