How to know if I should use voomWithQualityWeights() or not?
pedrodcb ▴ 10

Hello Everyone,

I'm running an RNA-Seq analysis with 72 mouse samples (6 replicates per observed group). The main covariates are sex (M vs. F) and cell type (A, B, C) from different tissues. Using voomWithQualityWeights() instead of voom() returns many more differentially expressed genes (2-3 fold more). How can I know for sure whether I should use voomWithQualityWeights() or not?

I understand it has to do with sample heterogeneity and outliers. How can I know whether my samples truly are heterogeneous, and whether my outliers are significant enough to justify using voomWithQualityWeights() instead of regular voom()? Only 1-2 samples seem to be acting as outliers on the MDS plots. Since I have 72 samples, I think it should be OK to simply remove them and run the analysis with regular voom(). Is there a better way to test or decide which voom method I should use?

Thank you!

limma RNA-Seq voom voomWithQualityWeights edgeR • 2.8k views

Already asked on Biostars, where I linked to answers on Bioconductor:

WEHI, Melbourne, Australia

In the situation you describe, it is generally better to use voomWithQualityWeights() than to remove outlier samples by hand. The automatic weighting procedure is more objective and systematic.

It would be ok to remove outlier samples if you can identify some clear-cut experimental reason why those samples are not of good quality. Otherwise, it is not correct to remove samples merely because they look like outliers on a plot or they don't fit your hypotheses -- that would be a type of cherry-picking.
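One way to see which samples the automatic procedure treats as lower quality is to look at the estimated sample weights instead of dropping samples by eye. The following is only a minimal sketch on simulated counts: the gene and sample counts, the `S1`..`S12` names, and the deliberately noisy twelfth sample are all hypothetical, and the fallback to `vwts$sample.weights` is an assumption to cover older limma versions.

```r
library(limma)

set.seed(1)

# Hypothetical toy data: 1000 genes, 12 samples in two groups of 6
ngenes <- 1000
nsamp  <- 12
group  <- factor(rep(c("ctrl", "trt"), each = 6))
design <- model.matrix(~ group)

# Per-gene mean counts, recycled down each column of the matrix
mu <- rexp(ngenes, rate = 1 / 200) + 5
counts <- matrix(rnbinom(ngenes * nsamp, mu = mu, size = 10), ngenes, nsamp)

# Make the last sample much noisier to mimic a low-quality library
counts[, nsamp] <- rnbinom(ngenes, mu = mu, size = 1)

rownames(counts) <- paste0("gene", seq_len(ngenes))
colnames(counts) <- paste0("S", seq_len(nsamp))

# Estimate observation weights and sample-level quality weights together
vwts <- voomWithQualityWeights(counts, design, plot = FALSE)

# Sample weights are stored in the targets data.frame
# (assumption: older limma versions kept them in vwts$sample.weights)
w <- vwts$targets$sample.weights
if (is.null(w)) w <- vwts$sample.weights
names(w) <- colnames(counts)

# Samples with weights well below 1 are being downweighted
round(sort(w), 2)
```

Weights well below 1 flag samples that are downweighted rather than discarded, so a borderline sample still contributes some information to the fit.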

Just ignoring the sample heterogeneity also doesn't seem to be an option. The fact that voomWithQualityWeights() gives 2-3 times as many DE genes clearly tells you that running voom() on the complete dataset is not sufficient.

In simulations, voomWithQualityWeights() still controls the FDR correctly even when no heterogeneity is present, so if you are in doubt about whether to use voom() or voomWithQualityWeights(), use the latter.
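The comparison described in the question can be reproduced side by side. This is a rough sketch on simulated data, not the poster's dataset: the two-group design, the 100 genes with a 2-fold change, the noisy twelfth sample, and the helper `count_de()` are all hypothetical choices made for illustration.

```r
library(limma)

set.seed(2)

# Hypothetical toy data: 2000 genes, 12 samples in two groups of 6
ngenes <- 2000
nsamp  <- 12
group  <- factor(rep(c("ctrl", "trt"), each = 6))
design <- model.matrix(~ group)

# Baseline means; the first 100 genes are 2-fold up in the treated group
mu <- rexp(ngenes, rate = 1 / 200) + 5
mu_mat <- matrix(mu, ngenes, nsamp)
mu_mat[1:100, group == "trt"] <- 2 * mu_mat[1:100, group == "trt"]

counts <- matrix(rnbinom(ngenes * nsamp, mu = mu_mat, size = 10), ngenes, nsamp)

# One deliberately noisy sample to create heterogeneity
counts[, nsamp] <- rnbinom(ngenes, mu = mu_mat[, nsamp], size = 1)
rownames(counts) <- paste0("gene", seq_len(ngenes))

# Hypothetical helper: fit the linear model and count genes called DE
# for the treatment coefficient at the default 5% FDR
count_de <- function(v) {
  fit <- eBayes(lmFit(v, design))
  sum(decideTests(fit)[, "grouptrt"] != 0)
}

n_voom <- count_de(voom(counts, design))
n_vwqw <- count_de(voomWithQualityWeights(counts, design))

c(voom = n_voom, voomWithQualityWeights = n_vwqw)
```

With real data the counts would normally come from a filtered and normalized DGEList (for example via edgeR's filterByExpr() and calcNormFactors()) rather than a raw matrix; that step is left out here to keep the sketch short.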


Hello Gordon,

Thank you very much for taking the time to answer. I just wanted to add that on the MDS plots, the 3 cell types (A, B, C) cluster very nicely (without visible outliers) on the first dimension. The sexes also cluster well (with only 1 or 2 visible outliers), but on the fifth dimension. Treatment vs. control, however, does not separate at all on any dimension.

Would this further justify using voomWithQualityWeights() on the grounds of data heterogeneity, at least for the variable of interest (since I want to compare treatment with control)?

Thank you again!
