Question

Limma: The log2FC of topTable is not equal to Mean(log2CPM) of Basal - Mean(log2CPM) of LP

0

Entering edit mode

zhouqz • 0

@zhouqz-17283

Last seen 3.7 years ago

I am confused about the log2FC when I use the limma to analyze the RNA-seq data. In the paper (Law et al. 2016; https://f1000research.com/articles/5-1408 ), Log2FC of the gene (ENTREZID:12759) is -5.44 in basal.vs.lp, getting from topTable (Page 15). However, the mean expression getting from voom is 5.58 and 10.89 in group basal and lp, respectively. 5.58 - 10.89 = -5.31, is not equal to -5.44. How to get the final expression value match to the log2FC of topTable ?

limma • 2.0k views

ADD COMMENT • link updated 5.6 years ago by Aaron Lun ★ 28k • written 5.6 years ago by zhouqz • 0

score 2 · Answer 1 · 2018-09-12

2

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 8 hours ago

The city by the bay

Weights.

set.seed(1000)
y <- matrix(rnbinom(10000, mu=1:1000, size=1), ncol=10)
design <- model.matrix(~gl(2, 5))
v <- voom(y, design)
fit <- lmFit(v, design)
fit <- eBayes(fit)
topTable(fit, coef=2, sort.by="none")

# Comparing to manual calculation:
head(rowMeans(v$E[,6:10]) - rowMeans(v$E[,1:5]))
weighted <- v$E * v$weights
head(rowSums(weighted[,6:10])/rowSums(v$weights[,6:10])
    - rowSums(weighted[,1:5])/rowSums(v$weights[,1:5]))

I can only assume you're asking out of curiosity. Don't do the above in actual analyses, lmFit and topTable have been very thoroughly tested and will be much less buggy than any manual calculation.

ADD COMMENT • link 5.6 years ago Aaron Lun ★ 28k

0

Entering edit mode

Note also that the analysis in Law et al (2016), from which OP has taken the logFC, included a batch correction term, another reason why naive subtraction of means will not match the actual computed logFC.

With a batch effect, there is no way to compute logFCs other than to fit the weighted linear model.

ADD REPLY • link 5.6 years ago Gordon Smyth 50k

0

Entering edit mode

Dear Gordon,

Thank you very much for your reply. I have removed the batch effect, but the subtraction of means still does not matches the actual logFC. I have displayed the test code. Is it ok?

ADD REPLY • link 5.6 years ago zhouqz • 0

0

Entering edit mode

Dear Aaron,

Thank you very much for your answer. I saw the code of lm.wfit. I can get the coefficient base on the v$E and v$weight. But I do not know how to get the final expression for each sample matching to logFC of topTable. In Law et al(2016) paper, the v$E was considered as the final expression to do the next analysis such as heatmap (Page 16). It was confused for me. Base on the Manual, the v$E should remove the batch effect using the function removeBatchEffect. Moreover, I follow your code to do the manual calculation of logFC using the data in Law et al(2016) paper. The manual logFC is also not equal to the actual logFC of topTable. I also used the removeBatchEffect to remove the batch effect, but after removing the batch effects, the subtraction of means still not matches the actual computed logFC. Below is the test code using the same data in Law's paper. Thanks again.

files <- c("GSM1545535_10_6_5_11.txt", "GSM1545536_9_6_5_11.txt",
"GSM1545538_purep53.txt","GSM1545539_JMS8-2.txt",
"GSM1545540_JMS8-3.txt","GSM1545541_JMS8-4.txt",
"GSM1545542_JMS8-5.txt","GSM1545544_JMS9-P7c.txt",
"GSM1545545_JMS9-P8c.txt")
read.delim(files[1], nrow=5)

x <- readDGE(files, columns=c(1,3))
class(x)

samplenames <- substring(colnames(x), 12, nchar(colnames(x)))
samplenames

colnames(x) <- samplenames
group <- as.factor(c("LP", "ML", "Basal", "Basal", "ML", "LP", "Basal", "ML", "LP"))
x$samples$group <- group
lane <- as.factor(rep(c("L004","L006","L008"), c(3,4,2)))
x$samples$lane <- lane
x$samples

library(Mus.musculus)
geneid <- rownames(x)
genes <- select(Mus.musculus, keys=geneid, columns=c("SYMBOL", "TXCHROM"),
keytype="ENTREZID")
dim(genes)
head(genes)
genes <- genes[!duplicated(genes$ENTREZID),]

x$genes <- genes
cpm <- cpm(x)
lcpm <- cpm(x, log=TRUE)
table(rowSums(x$counts==0)==9)

keep.exprs <- rowSums(cpm>1)>=3
x <- x[keep.exprs,, keep.lib.sizes=FALSE]
dim(x)
x <- calcNormFactors(x, method = "TMM")

design <- model.matrix(~0+group+lane)
colnames(design) <- gsub("group", "", colnames(design))

contr.matrix <- makeContrasts(
BasalvsLP = Basal-LP,
BasalvsML = Basal - ML,
LPvsML = LP - ML,
levels = colnames(design))
contr.matrix
v <- voom(x, design, plot=TRUE)
vfit <- lmFit(v, design)
vfit <- contrasts.fit(vfit, contrasts=contr.matrix)
efit <- eBayes(vfit)

basal.vs.lp <- topTable(efit, coef=1, sort.by="none")

head(basal.vs.lp)

ENTREZID SYMBOL TXCHROM logFC AveExpr t P.Value adj.P.Val B
497097 497097 Xkr4 chr1 7.45743870 -1.7775010 13.9455600 7.111517e-07 1.332370e-05 6.0632317
27395 27395 Mrpl15 chr1 0.48326620 4.2082183 2.3879867 4.414848e-02 7.017880e-02 -5.2737955
18777 18777 Lypla1 chr1 0.29313803 5.2818178 1.9143098 9.210726e-02 1.321884e-01 -6.0961967

#basal.vs.lp
weighted=v$E*v$weights
head(rowSums(weighted[,c(1,6,9)])/rowSums(v$weights[,c(1,6,9)])
- rowSums(weighted[,c(3,4,7)])/rowSums(v$weights[,c(3,4,7)]))

#Output

ENTREZID 497097 27395 18777
logFC -7.5681082 -0.4334995 -0.2368517

#reomve the Batch Effect
rv <- voom(x, design, plot=FALSE)

design.BatchEffect=model.matrix(~0+group)

rBE.exp =removeBatchEffect(rv$E,covariates = = x$samples$lane,design=design.BatchEffect)
head(rowMeans(rBE.exp[,c(1,6,9)])-rowMeans(rBE.exp[,c(3,4,7)]))
#output
ENTREZID 497097 27395 18777
logFC -7.5033660 -0.4798565 -0.3034289

ADD REPLY • link 5.6 years ago zhouqz • 0

0

Entering edit mode

Again, weights. Your naive application of removeBatchEffect doesn't account for the observation-level weights. These weights are considered by lmFit and affect the estimation of all coefficients (including the batch effect).

ADD REPLY • link 5.6 years ago Aaron Lun ★ 28k

0

Entering edit mode

I manually remove the batch effect with weights base on the function of removeBatchEffect. However, the result still does not match to the autal logFC of topTable.

rv <- voom(x, design, plot=FALSE)
design.BatchEffect=model.matrix(~0+group)
batch <- as.factor(x$samples$lane)
contrasts(batch) <- contr.sum(levels(batch))
batch <- model.matrix(~batch)[, -1, drop = FALSE]
batch2 = NULL
covariates = NULL
X.batch <- cbind(batch, batch2, covariates)
rm.fit <- lmFit(rv$E, cbind(design.BatchEffect, X.batch), weights = v$weights)
beta <- rm.fit$coefficients[, -(1:ncol(design.BatchEffect)), drop = FALSE]
beta[is.na(beta)] <- 0
rm.w.adj=as.matrix(v$E) - beta %*% t(X.batch)
head(rowMeans(rm.w.adj[,c(3,4,7)])-rowMeans(rm.w.adj[,c(1,6,9)]))

#output

ENTREZID 497097 27395 18777 21399 58175 108664
logFC 7.4586913 0.4733701 0.3017084 0.3788963 -5.4797058 -0.8135124

logFC of topTable:

ENTREZID 497097 27395 18777 21399 58175 108664

logFC 7.4574387 0.4832662 0.2931380 0.3711194 -5.4879042 -0.8143231

ADD REPLY • link 5.6 years ago zhouqz • 0

0

Entering edit mode

Why do you insist on using rowMeans? As Gordon has said, naive subtraction of means will not give you the correct result. This is because part of the residuals are driven by the batch effects, which will not cancel out when the number of samples in each batch are not the same between groups.

ADD REPLY • link 5.6 years ago Aaron Lun ★ 28k

0

Entering edit mode

I just want to know why base on the definition of logFC. You and Gordon are right. the actual logFC is base on the coefficient. Thanks again.

ADD REPLY • link 5.6 years ago zhouqz • 0