Question

Performance of different flavors of limma and edgeR as the sample size goes up

1

Entering edit mode

Nik Tuzov ▴ 80

@nik-tuzov-8783

Last seen 3 months ago

United States

Hello:

I performed some limma analysis using simulated data. For each gene, I fitted a “plain” linear model (lm() in R), limma with no trend, limma with trend, and voom.

I expected the results of all four models to converge to the same thing (in terms of p-values) as the sample size goes up. Indeed, it is the case for the first three models. However, voom stands apart. It looks like the weights computed in voom do not become equal to each other as the sample size goes to infinity.

edgeR has a similar problem. I fitted Negative Binomial regression separately for each gene, and the results are consistent with edgeR output for a large sample. However, edgeR_Robust stands apart, just like voom.

I find this queer because, if weighing is a way to dampen outliers, then for a very large sample size the sample is essentially representative of the population and there are no outliers to speak of. Hence, the observation weights should become equal to each other. Could you explain why it doesn't happen?

Regards,

Nik

limma edgeR • 1.7k views

ADD COMMENT • link 8.6 years ago • updated 8.5 years ago Nik Tuzov ▴ 80

score 1 · Answer 1 · 2015-09-15

The voom weights are intended to capture the magnitude of the Poisson noise, or any other heteroskedasticity in the data that is related to sequencing depth per sample, not the number of samples, so Increasing the number of samples while holding the sequencing depth per sample constant should not affect the weights very much. Regardless, as the sample size increases, I would expect voom to converge with the other methods in its estimation of p-values and log fold changes. The voom paper shows that voom and limma-trend are nearly equivalent when sample sequencing depth is consistent, and so voom is primarily necessary to account for variation in sequencing depth, which is of course a near inevitability in real data, hence the need for the method. You don't mention whether the methods resulted in similar p-values or fold changes.

As for edgeR_robust, I have less experience with this method. However, a large sample size doesn't necessarily eliminate outliers. If I recall correctly, the goal of edgeR_robust is to identify and down-weight samples that do not follow the distributional assumptions of edgeR. If there is some mechanism generating such outliers in the data, increasing the sample size will not make it disappear, and a method that successfully detects and compensates for these outliers will always perform better than the same method without outlier detection, no matter how many samples are used. The edgeR_robust paper claims that in the absence of outliers, edgeR_robust performs essentially the same as unweighted edgeR. Again, you don't mention how your comparisons performed at the level of p-values and fold changes, so I can't comment on any differences you may have seen there.

On a related note, my recent experience on real data has been that limma-voom and edgeR's quasi-likelihood method perform almost identically for most genes. When I plot their p-values against each other on a log-log plot, they cluster tightly around the identity line. The LRT (and DESeq2's Wald test), in contrast, is way too liberal with most genes: https://mneme.homenet.org/~ryan/resume/examples/Salomon/globin/pval-comparisons.pdf

score 0 · Answer 2 · 2015-11-05

0

Entering edit mode

Nik Tuzov ▴ 80

@nik-tuzov-8783

Last seen 3 months ago

United States

It turns out that I didn't take into account that, unlike the other two versions, voom performs CPM normalization inside the call. After I neutralized the normalization, the results of all three methods look similar for a large sample size.

ADD COMMENT • link 8.5 years ago Nik Tuzov ▴ 80