What are the different strategies to filter out low count genes?
Arindam ▴ 80
@ag1805x-15215
University of Eastern Finland

What are the different strategies to filter low count genes? Some of the ones I have seen in different papers are:

rowSums(count)>0

rowVariance(count)>1

rnaseq deseq2 differential gene expression • 2.6k views
@mikelove
United States

To filter out low-count genes in DESeq2, whether to make the pipeline faster or to reduce the size of the object in memory, I recommend something like:

keep <- rowSums(counts(dds) >= x) >= y
dds <- dds[keep, ]

Here x may be 5 or 10, and y could be the size of the smallest group, so the filter keeps genes that have a count of at least x in at least y samples. This is similar to the filtering recommended by edgeR and limma.

Filtering is not technically required, but it certainly speeds things up if there are many genes with very low counts.
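As a rough illustration of how this filter behaves, here is a minimal base R sketch on a made-up count matrix. The matrix `cts`, the `group` factor and the choice x = 10 are assumptions for the example only; in a real analysis the matrix would come from `counts(dds)`.

```r
# Toy count matrix: 6 genes (rows) x 6 samples (columns). Values are made up.
cts <- matrix(
  c( 0,  0,  1,  0,  0,  2,    # gene1: almost no reads
    12, 15, 11,  0,  1,  0,    # gene2: expressed in group A only
    30, 25, 40, 35, 28, 33,    # gene3: expressed everywhere
     9,  9,  9,  9,  9,  9,    # gene4: always just below the threshold
   100,  0,  0,  0,  0,  0,    # gene5: one outlying sample
    10, 10,  0,  0,  0,  0),   # gene6: at threshold in only two samples
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("s", 1:6))
)
group <- factor(c("A", "A", "A", "B", "B", "B"))  # hypothetical design

x <- 10                  # minimum count in a sample (assumed value)
y <- min(table(group))   # size of the smallest group, 3 here

# Keep genes with a count of at least x in at least y samples.
keep <- rowSums(cts >= x) >= y
cts_filtered <- cts[keep, , drop = FALSE]

# For comparison, the rowSums(count) > 0 filter from the question.
keep_nonzero <- rowSums(cts) > 0

print(keep)
print(sum(keep_nonzero))
```

In this toy example only gene2 and gene3 pass, whereas the `rowSums(count) > 0` filter keeps all six genes, including gene5, whose reads all come from a single sample.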

I have never recommended a variance filter. We don't want to look only at the high-variance genes, but at all genes that are above a minimal count.

@wolfgang-huber-3550
EMBL European Molecular Biology Laborat…

Besides the reasons that Mike mentioned (saving memory or compute time), another motivation for filtering can be to improve your multiple testing computations. If that is the motivation, then you might in fact be better off not filtering but weighting, as e.g. in https://bioconductor.org/packages/release/bioc/vignettes/IHW/inst/doc/introduction_to_ihw.html
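To make the link between filtering and weighting concrete, here is a small base R sketch of a weighted Benjamini-Hochberg adjustment. It is not the IHW package's interface or algorithm: IHW learns the weights from a covariate, whereas the p-values, the `keep` vector and the hand-set weights below are made up for illustration. The sketch only shows that a hard filter corresponds to the special case where each weight is either zero or a common constant.

```r
set.seed(1)
m <- 1000
p <- runif(m)                               # made-up p-values
keep <- rep(c(TRUE, FALSE), c(600, 400))    # made-up filter decision

# Filtering: adjust only the genes that pass the filter.
padj_filter <- rep(NA_real_, m)
padj_filter[keep] <- p.adjust(p[keep], method = "BH")

# Weighting: non-negative weights that average to 1 over all genes.
# A hard filter gives weight 0 to dropped genes and m / k to the k kept genes.
w <- ifelse(keep, m / sum(keep), 0)
stopifnot(isTRUE(all.equal(mean(w), 1)))

# Weighted BH: divide each p-value by its weight, then apply ordinary BH
# to all m genes. A weight of 0 means the gene can never be rejected.
p_weighted <- ifelse(w > 0, pmin(p / w, 1), 1)
padj_weight <- p.adjust(p_weighted, method = "BH")

# On the kept genes the two approaches agree.
print(all.equal(padj_filter[keep], padj_weight[keep]))
```

Replacing the 0/constant weights with weights that vary smoothly with a covariate such as mean count is the general idea behind weighting; the vignette linked above documents how IHW actually chooses them.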
