Voom normalization lowess span
@gregory-warnes-2155
Last seen 7.8 years ago
United States

I've seen voom normalization plots that have an S-shaped form, where the default lowess span in limma::voom appears to be substantially too large. For instance, applying limma::voom to the 48-replicate WT data from Schurch et al. (2016)

Schurch NJ, Schofield P, Gierliński M, Cole C, Sherstnev A, Singh V, Wrobel N, Gharbi K, Simpson GG, Owen-Hughes T, Blaxter M, Barton GJ. (2016) How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA

yields:

I have three  questions:

1) What is the probable cause of the initial upward trending curve?

2) What is the expected impact on the results of limma+voom of using this over-smoothed mean-variance relationship?

3) Is there a better (automated?) mechanism to select an appropriate span?

FWIW, here is the voom plot with a span of 0.05, which looks to be a much better fit to the data:


Aaron Lun ★ 28k
@alun
Last seen 8 hours ago
The city by the bay

That dip on the left is invariably a result of discreteness, where the variance decreases because the majority of the counts are zero (and thus, many of the log-CPMs are the same, or nearly the same after the continuity correction and division by library size). If you filter on abundance as recommended, the plot would get truncated on the left which should remove this dip. In fact, the removal of strange trends at low abundances is one of the main reasons for filtering, to ensure that they don't compromise the inferences for higher-abundance genes.

Of course, if you're explicitly interested in low-abundance genes, then filtering would not be a solution. However, if that's the case, I'd argue that voom should not be used at all; normality isn't appropriate for low, discrete counts, and it's also difficult to model the mean-variance relationship when you're only focusing on a limited covariate range at low abundance. If you really need to get inferences for low-abundance genes, then I'd suggest switching to edgeR, as it better handles the count-based nature of the data.
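
The discreteness effect described above can be reproduced with a small simulation. What follows is only a sketch under stated assumptions, not what voom itself computes: counts are pure Poisson with no biological variation, library sizes are ignored, and `average_log_sd` is a hypothetical helper name. It shows that the spread of log2(count + 0.5) across 48 replicates is small when most counts are zero, rises as counts start to vary between 0, 1 and 2, and then falls again at higher means.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (fine for modest lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def average_log_sd(lam, n_samples=48, n_genes=200, seed=1):
    """Average per-gene SD of log2(count + 0.5) for simulated genes with mean lam.

    Assumption: Poisson counts only, equal library sizes, no biological variation.
    """
    rng = random.Random(seed)
    sds = []
    for _ in range(n_genes):
        logs = [math.log2(poisson(rng, lam) + 0.5) for _ in range(n_samples)]
        sds.append(statistics.stdev(logs))
    return statistics.mean(sds)


# Walk from very low to moderately high mean counts and report the spread.
for lam in [0.05, 0.5, 2.0, 10.0, 50.0]:
    print(f"mean count {lam:6.2f} -> average SD of log2(count + 0.5): "
          f"{average_log_sd(lam):.3f}")
```

Dropping the rows with the smallest means from this toy example, which is the analogue of filtering on abundance, leaves only the decreasing part of the trend.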

@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

As Aaron has said, the voom trend always has a J-shape at low counts because of discreteness effects. When you filter the low-count genes out, this J-shape will disappear; see for example my advice to this poster:

  voom for spectral counts

The voom lowess span is deliberately chosen to be large so that the curve will not follow artifacts like this. Voom is currently conservative for very low counts, which I'm reasonably happy with.
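
To illustrate why a large span does not follow a localized artifact, here is a minimal sketch on made-up data. It is an assumption-laden stand-in rather than limma's code: `local_linear_fit` is a hypothetical helper that uses the same nearest-neighbour window and tricube weights as lowess but skips the robustness iterations, and `toy_trend` is an invented noise-free curve with a dip at the low end.

```python
import math


def local_linear_fit(xs, ys, x0, span):
    """Tricube-weighted local linear fit at x0 using the nearest span * n points.

    Simplified stand-in for lowess: no robustness iterations.
    """
    n = len(xs)
    k = max(3, math.ceil(span * n))
    # Bandwidth: distance from x0 to its k-th nearest neighbour.
    h = sorted(abs(x - x0) for x in xs)[k - 1]
    weights = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0
               for x in xs]
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    if sxx < 1e-12:
        # Degenerate window: fall back to the weighted mean.
        return my
    return my + (sxy / sxx) * (x0 - mx)


def toy_trend(x):
    """Invented curve: a slowly decreasing line with a dip near x = 0."""
    return 1.0 - 0.05 * x - 0.6 * math.exp(-(x / 0.4) ** 2)


xs = [i * 0.05 for i in range(201)]
ys = [toy_trend(x) for x in xs]

for span in (0.05, 0.5):
    at_dip = local_linear_fit(xs, ys, 0.0, span)
    away = local_linear_fit(xs, ys, 8.0, span)
    print(f"span {span:4.2f}: fit at x=0 is {at_dip:.3f} (data {ys[0]:.3f}), "
          f"fit at x=8 is {away:.3f}")
```

The small span tracks the dip closely, while the large span mostly ignores it and stays near the underlying line; away from the dip the two spans agree. Whether tracking the dip is desirable is exactly the point under discussion in this thread.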

@gregory-warnes-2155
Last seen 7.8 years ago
United States

Thanks Aaron.

I suppose that using the wider span would increase the estimated variability dramatically, which would have the side effect of forcing these low-significance genes to be non-significant.


For the low-abundance genes, yes. For other genes, the effect is harder to predict, as the estimate of the prior degrees of freedom (a measure of the variability of the variances around the trend) gets distorted when you don't fit the trend right. In fact, even when you do fit the trend correctly, the prior d.f. estimate will probably be a bit strange; you can see that the spread is a lot tighter at low abundances due to discreteness, whereas it's generally more consistent throughout the higher abundances.
