8 months ago by
Cambridge, United Kingdom
The first consideration is how you will be normalizing the samples. Library size normalization would be unwise if you're expecting DE in your rRNA genes, given that these make up the bulk of your reads. If you have enough coverage of the rest of the transcriptome, you could reasonably assume that most of those genes are not DE and use them to calculate normalization/size factors, e.g., with calcNormFactors. If not... you're in trouble.
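As a rough sketch of that idea in edgeR: compute the TMM factors from the non-rRNA genes only, then copy the resulting library sizes and normalization factors onto the full object. The counts here are simulated, and the names counts, is.rrna and y.sub are placeholders of mine rather than anything from your data.

```r
library(edgeR)

# Simulated counts purely for illustration: 1000 ordinary genes plus 4
# very highly expressed rRNA genes, across 6 samples in 2 groups.
set.seed(1)
n.samples <- 6
counts <- rbind(
    matrix(rnbinom(1000 * n.samples, mu = 50, size = 10), ncol = n.samples),
    matrix(rnbinom(4 * n.samples, mu = 1e6, size = 10), ncol = n.samples)
)
rrna.names <- c("5S", "5.8S", "18S", "28S")
rownames(counts) <- c(paste0("gene", 1:1000), rrna.names)
is.rrna <- rownames(counts) %in% rrna.names
group <- factor(rep(c("A", "B"), each = 3))

# Full object, including the rRNA genes that we actually want to test.
y <- DGEList(counts, group = group)

# Normalization factors computed from the non-rRNA genes only, on the
# assumption that most of them are not DE.
y.sub <- DGEList(counts[!is.rrna, ], group = group)
y.sub <- calcNormFactors(y.sub)

# Transfer both the library sizes and the factors, so that the effective
# library sizes (lib.size * norm.factors) in 'y' no longer depend on rRNA.
y$samples$lib.size <- y.sub$samples$lib.size
y$samples$norm.factors <- y.sub$samples$norm.factors
y$samples
```

Copying lib.size as well as norm.factors matters here, because the factors are only meaningful relative to the library sizes they were computed with.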
The next question is whether you have enough features for empirical Bayes shrinkage to work. I can't remember how many rRNA genes there are, but I didn't think there were that many - 5S, 5.8S, 18S and 28S, for eukaryotes (excluding repeats)? This limits the advantage of sharing information between genes. It's unlikely that the rest of the transcriptome will be of much help here, as rRNA genes are probably so highly expressed that they will be in a different part of the mean-dispersion trend compared to all other genes. Nonetheless, if you can get a stable estimate of the trended dispersion (i.e., without overfitting at high abundances for the few rRNA genes), you're good to go.
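One way to check this, sketched on simulated counts (so the numbers themselves mean nothing), is to run estimateDisp and look at where the rRNA genes sit relative to the fitted trend. The object names are placeholders, and I've skipped the normalization step to keep the example short.

```r
library(edgeR)

# Simulated data for illustration only: 1000 ordinary genes and 4 rRNA genes.
set.seed(2)
rrna.names <- c("5S", "5.8S", "18S", "28S")
counts <- rbind(
    matrix(rnbinom(1000 * 6, mu = 50, size = 10), ncol = 6),
    matrix(rnbinom(4 * 6, mu = 1e6, size = 10), ncol = 6)
)
rownames(counts) <- c(paste0("gene", 1:1000), rrna.names)
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~group)

y <- DGEList(counts, group = group)
y <- estimateDisp(y, design)

# Abundance and dispersion estimates for the rRNA genes. A trended value
# that swings wildly at the high-abundance end would suggest that the
# trend is being overfitted to these few genes.
idx <- match(rrna.names, rownames(y))
disp.check <- data.frame(
    gene = rrna.names,
    AveLogCPM = y$AveLogCPM[idx],
    trended = y$trended.dispersion[idx],
    tagwise = y$tagwise.dispersion[idx]
)
disp.check

# Visual check of the mean-dispersion trend across all genes.
plotBCV(y)
```

If the trend looks unstable at the far right of the BCV plot, that would be the overfitting I'm referring to.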
If you've overcome the two challenges above, then the rest is easy. The fact that the rRNA genes have high abundance is not a problem for hypothesis testing, as it just gives you more power to detect differential expression. This is usually a good thing - and in fact, sometimes too good, as one consequence of increasing power is that you may get significant DE with (very) small log-fold changes. If you don't want that, consider using things like glmTreat to impose a minimum threshold on the log-fold change.
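For example, a minimal sketch with the quasi-likelihood pipeline, again on simulated counts. The threshold of log2(1.5) is an arbitrary choice for illustration, not a recommendation, and the normalization step is omitted for brevity.

```r
library(edgeR)

# Simulated data for illustration only.
set.seed(3)
rrna.names <- c("5S", "5.8S", "18S", "28S")
counts <- rbind(
    matrix(rnbinom(1000 * 6, mu = 50, size = 10), ncol = 6),
    matrix(rnbinom(4 * 6, mu = 1e6, size = 10), ncol = 6)
)
rownames(counts) <- c(paste0("gene", 1:1000), rrna.names)
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~group)

y <- DGEList(counts, group = group)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

# Test against a minimum absolute log2-fold change rather than against
# zero, so that tiny but "significant" changes are not reported.
res <- glmTreat(fit, coef = 2, lfc = log2(1.5))

# Results for the rRNA genes only.
tab <- topTags(res, n = Inf)$table
tab[rrna.names, ]
```

The p-values from glmTreat refer to the null hypothesis that the absolute log-fold change is no greater than the threshold, so they will generally be larger than those from a standard test against zero.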
modified 8 months ago
8 months ago by
Aaron Lun • 21k