Question: Limma voom or trend?
0
2.3 years ago by
b.nota320
Netherlands
b.nota320 wrote:

Hello,

This might have been asked before, but I couldn't find the answer. So I try to ask here what the difference between voom and trend is for RNAseq analysis.

I have a study with four groups, and a batch effect. The lib sizes are between 8M to 11M reads. In the user's guide it says that when the lib sizes are not more than 3 fold different, trend is the method to choose. So when I do this I get significant genes. However, when I do the analysis with voom, I get many more significantly expressed genes. Why is that, and how reliable are these extra genes? Does this mean that the extra genes from limma voom are false positive?

trend

> DGE1 <- DGEList(counts, group=Group)
> keep <- rowSums(cpm(DGE1)>1) >= 3
> DGE2 <- DGE1[keep,]
> y1 <- calcNormFactors(DGE2)
> logCPM <- cpm(y1, log=TRUE, prior.count=0.5)
> design <- model.matrix(~0 + targets$Group + targets$Donor)
> fit <- lmFit(logCPM, design)
> fit2 <- eBayes(contrasts.fit(fit, contr))
> summary(decideTests(fit2, method="separate"))
PPvsPN PPvsNN PPvsNP PNvsNN PNvsNP NPvsNN
-1     54    140     86     27    433    137
0   15897  15654  15796  16039  15095  15803
1     154    311    223     39    577    165

voom

> DGE1 <- DGEList(counts_minus, group=Group)
> keep <- rowSums(cpm(DGE1)>1) >= 3
> DGE2 <- DGE1[keep,]
> y1 <- calcNormFactors(DGE2)
> design <- model.matrix(~0 + targets$Group + targets$Donor)
> v <- voom(y1, design, plot=TRUE)
> fit <- lmFit(v, design)
> fit2 <- eBayes(contrasts.fit(fit, contr))
> summary(decideTests(fit2, method="separate"))
PPvsPN PPvsNN PPvsNP PNvsNN PNvsNP NPvsNN
-1    363    579    428    277    873    465
0   15166  14647  15023  15469  14252  15087
1     576    879    654    359    980    553



limma voom limma-trend • 2.2k views
modified 15 months ago by max.moldovan0 • written 2.3 years ago by b.nota320
2
2.3 years ago by
United States
James W. MacDonald50k wrote:

The code for limma trend isn't doing limma trend. You have to include trend = TRUE in the call to eBayes, or else you are doing 'regular' limma. This probably has a lot to do with the difference in numbers of significant genes.

In addition, if I do a google search like 'limma trend vs voom site:support.bioconductor.org', the first Limma-voom vs limma-trend seems relevant.

Well, that's embarrassing... Forgetting the "trend=T" part in the limma trend analysis! Must have something to do with Mondays...

Thanks for your quick help, the figures are more alike now with the real trend included:

> fit2 <- eBayes(contrasts.fit(fit, contr),trend=TRUE)

> summary(decideTests(fit2, method="separate"))
PPvsPN PPvsNN PPvsNP PNvsNN PNvsNP NPvsNN
-1    257    428    303    193    796    372
0   15339  14793  15212  15544  14403  15250
1     509    884    590    368    906    483

As you've discovered, voom and limma-trend tend to give about the same number of DE genes, although it will vary from one dataset to another.

0
15 months ago by
max.moldovan0 wrote:

I was googling for this exact question and this what here is an excellent numerical illustration. What is the difference methodologically? It is not clear from the manual etc. Isn't it that by not applying voom we don't account for the common mean-variance dependance issue?

1

I would disagree with your statement regarding clarity of the 'manual', although I am not sure what you mean by that. There are help pages, and they seem abundantly clear to me. If you read ?voom, you will see this:

    'voom' is an acronym for mean-variance modelling at the
observational level. The key concern is to estimate the
mean-variance relationship in the data, then use this to compute
appropriate weights for each observation. Count data almost show
non-trivial mean-variance relationships. Raw counts show
increasing variance with increasing count size, while log-counts
typically show a decreasing mean-variance trend. This function
estimates the mean-variance trend for log-counts, then assigns a
weight to each observation based on its predicted variance. The
weights are then used in the linear modelling process to adjust
for heteroscedasticity.

'voom' performs the following specific calculations. First, the
counts are converted to logCPM values, adding 0.5 to all the
counts to avoid taking the logarithm of zero. The matrix of logCPM
values is then optionally normalized. The 'lmFit' function is used
to fit row-wise linear models. The 'lowess' function is then used
to fit a trend to the square-root-standard-deviations as a
function of average logCPM. The trend line is then used to predict
the variance of each logCPM value as a function of its fitted
value, and the inverse variances become the estimated precision
weights.

And if you look at ?eBayes, you will see

     If 'trend=TRUE' then an intensity-dependent trend is fitted to the
prior variances 's2.prior'. Specifically, 'squeezeVar' is called
with the 'covariate' equal to 'Amean', the average log2-intensity
for each gene. See 'squeezeVar' for more details.

There are also multiple references in each man page that you can reference if you really want to get into the weeds.