Search
Question: Limma: The log2FC of topTable is not equal to Mean(log2CPM) of Basal - Mean(log2CPM) of LP
0
gravatar for zhouqz
9 weeks ago by
zhouqz0
zhouqz0 wrote:

I am confused about the log2FC when I use the limma to analyze the RNA-seq data. In the paper (Law et al. 2016; https://f1000research.com/articles/5-1408 ),  Log2FC of the gene (ENTREZID:12759) is -5.44 in basal.vs.lp, getting from topTable (Page 15).  However, the mean expression getting from voom is 5.58 and 10.89 in group basal and lp, respectively. 5.58 - 10.89 = -5.31, is not equal to -5.44. How to get the final expression value match to the log2FC of topTable ? 

ADD COMMENTlink modified 9 weeks ago by Aaron Lun21k • written 9 weeks ago by zhouqz0
2
gravatar for Aaron Lun
9 weeks ago by
Aaron Lun21k
Cambridge, United Kingdom
Aaron Lun21k wrote:

Weights.

set.seed(1000)
y <- matrix(rnbinom(10000, mu=1:1000, size=1), ncol=10)
design <- model.matrix(~gl(2, 5))
v <- voom(y, design)
fit <- lmFit(v, design)
fit <- eBayes(fit)
topTable(fit, coef=2, sort.by="none")

# Comparing to manual calculation:
head(rowMeans(v$E[,6:10]) - rowMeans(v$E[,1:5]))
weighted <- v$E * v$weights
head(rowSums(weighted[,6:10])/rowSums(v$weights[,6:10])
    - rowSums(weighted[,1:5])/rowSums(v$weights[,1:5]))

I can only assume you're asking out of curiosity. Don't do the above in actual analyses, lmFit and topTable have been very thoroughly tested and will be much less buggy than any manual calculation.

ADD COMMENTlink modified 9 weeks ago • written 9 weeks ago by Aaron Lun21k

Note also that the analysis in Law et al (2016), from which OP has taken the logFC, included a batch correction term, another reason why naive subtraction of means will not match the actual computed logFC.

With a batch effect, there is no way to compute logFCs other than to fit the weighted linear model.

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by Gordon Smyth35k

Dear Gordon,

Thank you very much for your reply. I have removed the batch effect, but the subtraction of means still does not matches the actual logFC. I have displayed the test code. Is it ok? 

 

ADD REPLYlink written 9 weeks ago by zhouqz0

Dear Aaron,

Thank you very much for your answer. I saw the code of lm.wfit. I can get the coefficient base on the v$E and v$weight. But I do not know how to get the final expression for each sample matching to logFC of topTable.  In Law et al(2016) paper, the v$E was considered as the final expression to do the next analysis such as heatmap (Page 16). It was confused for me. Base on the Manual, the v$E should remove the batch effect using the function removeBatchEffect. Moreover, I follow your code to do the manual calculation of logFC using the data in Law et al(2016) paper. The manual logFC is also not equal to the actual logFC of topTable.  I also used the removeBatchEffect to remove the batch effect, but after removing the batch effects, the subtraction of means still not matches the actual computed logFC. Below is the test code using the same data in Law's paper. Thanks again.


files <- c("GSM1545535_10_6_5_11.txt", "GSM1545536_9_6_5_11.txt",
           "GSM1545538_purep53.txt","GSM1545539_JMS8-2.txt",
           "GSM1545540_JMS8-3.txt","GSM1545541_JMS8-4.txt",
           "GSM1545542_JMS8-5.txt","GSM1545544_JMS9-P7c.txt",
           "GSM1545545_JMS9-P8c.txt")
read.delim(files[1], nrow=5)

x <- readDGE(files, columns=c(1,3))
class(x)

samplenames <- substring(colnames(x), 12, nchar(colnames(x)))
samplenames

colnames(x) <- samplenames
group <- as.factor(c("LP", "ML", "Basal", "Basal", "ML", "LP", "Basal", "ML", "LP"))
x$samples$group <- group
lane <- as.factor(rep(c("L004","L006","L008"), c(3,4,2)))
x$samples$lane <- lane
x$samples

library(Mus.musculus)
geneid <- rownames(x)
genes <- select(Mus.musculus, keys=geneid, columns=c("SYMBOL", "TXCHROM"),
                keytype="ENTREZID")
dim(genes)
head(genes)
genes <- genes[!duplicated(genes$ENTREZID),]

x$genes <- genes
cpm <- cpm(x)
lcpm <- cpm(x, log=TRUE)
table(rowSums(x$counts==0)==9)

keep.exprs <- rowSums(cpm>1)>=3
x <- x[keep.exprs,, keep.lib.sizes=FALSE]
dim(x)
x <- calcNormFactors(x, method = "TMM")

design <- model.matrix(~0+group+lane)
colnames(design) <- gsub("group", "", colnames(design))

contr.matrix <- makeContrasts(
  BasalvsLP = Basal-LP,
  BasalvsML = Basal - ML,
  LPvsML = LP - ML,
  levels = colnames(design))
contr.matrix
v <- voom(x, design, plot=TRUE)
vfit <- lmFit(v, design)
vfit <- contrasts.fit(vfit, contrasts=contr.matrix)
efit <- eBayes(vfit)

basal.vs.lp <- topTable(efit, coef=1, sort.by="none")

head(basal.vs.lp)

            ENTREZID  SYMBOL TXCHROM       logFC    AveExpr           t      P.Value    adj.P.Val          B
497097   497097    Xkr4    chr1  7.45743870 -1.7775010  13.9455600 7.111517e-07 1.332370e-05  6.0632317
27395     27395  Mrpl15    chr1  0.48326620  4.2082183   2.3879867 4.414848e-02 7.017880e-02 -5.2737955
18777     18777  Lypla1    chr1  0.29313803  5.2818178   1.9143098 9.210726e-02 1.321884e-01 -6.0961967

#basal.vs.lp
weighted=v$E*v$weights
head(rowSums(weighted[,c(1,6,9)])/rowSums(v$weights[,c(1,6,9)])
     - rowSums(weighted[,c(3,4,7)])/rowSums(v$weights[,c(3,4,7)]))

#Output

ENTREZID    497097      27395      18777 
logFC       -7.5681082 -0.4334995 -0.2368517 

#reomve the Batch Effect
rv <- voom(x, design, plot=FALSE)

design.BatchEffect=model.matrix(~0+group)

rBE.exp =removeBatchEffect(rv$E,covariates =  = x$samples$lane,design=design.BatchEffect)
head(rowMeans(rBE.exp[,c(1,6,9)])-rowMeans(rBE.exp[,c(3,4,7)]))
#output
ENTREZID      497097      27395       18777
logFC       -7.5033660 -0.4798565 -0.3034289

ADD REPLYlink written 9 weeks ago by zhouqz0

Again, weights. Your naive application of removeBatchEffect doesn't account for the observation-level weights. These weights are considered by lmFit and affect the estimation of all coefficients (including the batch effect).

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by Aaron Lun21k

I manually remove the batch effect with weights base on the function of removeBatchEffect. However, the result still does not match to the autal logFC of topTable. 

rv <- voom(x, design, plot=FALSE)
design.BatchEffect=model.matrix(~0+group)
batch <- as.factor(x$samples$lane)
contrasts(batch) <- contr.sum(levels(batch))
batch <- model.matrix(~batch)[, -1, drop = FALSE]
batch2 = NULL
covariates = NULL
X.batch <- cbind(batch, batch2, covariates)
rm.fit <- lmFit(rv$E, cbind(design.BatchEffect, X.batch), weights = v$weights)
beta <- rm.fit$coefficients[, -(1:ncol(design.BatchEffect)), drop = FALSE]
beta[is.na(beta)] <- 0
rm.w.adj=as.matrix(v$E) - beta %*% t(X.batch)
head(rowMeans(rm.w.adj[,c(3,4,7)])-rowMeans(rm.w.adj[,c(1,6,9)]))

#output

ENTREZID   497097      27395      18777      21399      58175     108664 
logFC       7.4586913  0.4733701  0.3017084  0.3788963 -5.4797058 -0.8135124

logFC of topTable:

ENTREZID   497097      27395      18777      21399      58175     108664 

logFC    7.4574387  0.4832662  0.2931380  0.3711194 -5.4879042 -0.8143231

 

ADD REPLYlink written 9 weeks ago by zhouqz0

Why do you insist on using rowMeans? As Gordon has said, naive subtraction of means will not give you the correct result. This is because part of the residuals are driven by the batch effects, which will not cancel out when the number of samples in each batch are not the same between groups.

ADD REPLYlink written 9 weeks ago by Aaron Lun21k

I just want to know why base on the definition of logFC. You and Gordon are right. the actual logFC is base on the coefficient. Thanks again.

ADD REPLYlink written 8 weeks ago by zhouqz0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 162 users visited in the last hour