Limma voom or trend?
2
1
Entering edit mode
b.nota ▴ 360
@bnota-7379
Last seen 3.6 years ago
Netherlands

Hello,

This might have been asked before, but I couldn't find the answer. So I try to ask here what the difference between voom and trend is for RNAseq analysis.

I have a study with four groups, and a batch effect. The lib sizes are between 8M to 11M reads. In the user's guide it says that when the lib sizes are not more than 3 fold different, trend is the method to choose. So when I do this I get significant genes. However, when I do the analysis with voom, I get many more significantly expressed genes. Why is that, and how reliable are these extra genes? Does this mean that the extra genes from limma voom are false positive?

trend

> DGE1 <- DGEList(counts, group=Group)
> keep <- rowSums(cpm(DGE1)>1) >= 3
> DGE2 <- DGE1[keep,]
> y1 <- calcNormFactors(DGE2)
> logCPM <- cpm(y1, log=TRUE, prior.count=0.5)
> design <- model.matrix(~0 + targets$Group + targets$Donor)
> fit <- lmFit(logCPM, design)
> fit2 <- eBayes(contrasts.fit(fit, contr))
> summary(decideTests(fit2, method="separate"))
   PPvsPN PPvsNN PPvsNP PNvsNN PNvsNP NPvsNN
-1     54    140     86     27    433    137
0   15897  15654  15796  16039  15095  15803
1     154    311    223     39    577    165

voom

> DGE1 <- DGEList(counts_minus, group=Group)
> keep <- rowSums(cpm(DGE1)>1) >= 3
> DGE2 <- DGE1[keep,]
> y1 <- calcNormFactors(DGE2)
> design <- model.matrix(~0 + targets$Group + targets$Donor)
> v <- voom(y1, design, plot=TRUE)
> fit <- lmFit(v, design)
> fit2 <- eBayes(contrasts.fit(fit, contr))
> summary(decideTests(fit2, method="separate"))
   PPvsPN PPvsNN PPvsNP PNvsNN PNvsNP NPvsNN
-1    363    579    428    277    873    465
0   15166  14647  15023  15469  14252  15087
1     576    879    654    359    980    553


 

 

 

 

 

limma voom limma-trend • 7.2k views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

The code for limma trend isn't doing limma trend. You have to include trend = TRUE in the call to eBayes, or else you are doing 'regular' limma. This probably has a lot to do with the difference in numbers of significant genes.

In addition, if I do a google search like 'limma trend vs voom site:support.bioconductor.org', the first Limma-voom vs limma-trend seems relevant.

ADD COMMENT
0
Entering edit mode

Well, that's embarrassing... Forgetting the "trend=T" part in the limma trend analysis! Must have something to do with Mondays...

Thanks for your quick help, the figures are more alike now with the real trend included:

> fit2 <- eBayes(contrasts.fit(fit, contr),trend=TRUE)

> summary(decideTests(fit2, method="separate"))
   PPvsPN PPvsNN PPvsNP PNvsNN PNvsNP NPvsNN
-1    257    428    303    193    796    372
0   15339  14793  15212  15544  14403  15250
1     509    884    590    368    906    483

 

ADD REPLY
0
Entering edit mode

As you've discovered, voom and limma-trend tend to give about the same number of DE genes, although it will vary from one dataset to another.

ADD REPLY
0
Entering edit mode
@maxmoldovan-15036
Last seen 6.2 years ago

I was googling for this exact question and this what here is an excellent numerical illustration. What is the difference methodologically? It is not clear from the manual etc. Isn't it that by not applying voom we don't account for the common mean-variance dependance issue?

ADD COMMENT
1
Entering edit mode

First, unless you are answering a question, don't use the 'Add your answer' box. Because you aren't answering a question. Use the ADD REPLY link to add replies or additional questions.

I would disagree with your statement regarding clarity of the 'manual', although I am not sure what you mean by that. There are help pages, and they seem abundantly clear to me. If you read ?voom, you will see this:

    'voom' is an acronym for mean-variance modelling at the
     observational level. The key concern is to estimate the
     mean-variance relationship in the data, then use this to compute
     appropriate weights for each observation. Count data almost show
     non-trivial mean-variance relationships. Raw counts show
     increasing variance with increasing count size, while log-counts
     typically show a decreasing mean-variance trend. This function
     estimates the mean-variance trend for log-counts, then assigns a
     weight to each observation based on its predicted variance. The
     weights are then used in the linear modelling process to adjust
     for heteroscedasticity.

     'voom' performs the following specific calculations. First, the
     counts are converted to logCPM values, adding 0.5 to all the
     counts to avoid taking the logarithm of zero. The matrix of logCPM
     values is then optionally normalized. The 'lmFit' function is used
     to fit row-wise linear models. The 'lowess' function is then used
     to fit a trend to the square-root-standard-deviations as a
     function of average logCPM. The trend line is then used to predict
     the variance of each logCPM value as a function of its fitted
     value, and the inverse variances become the estimated precision
     weights.

And if you look at ?eBayes, you will see

     If 'trend=TRUE' then an intensity-dependent trend is fitted to the
     prior variances 's2.prior'. Specifically, 'squeezeVar' is called
     with the 'covariate' equal to 'Amean', the average log2-intensity
     for each gene. See 'squeezeVar' for more details.

There are also multiple references in each man page that you can reference if you really want to get into the weeds.

ADD REPLY

Login before adding your answer.

Traffic: 512 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6