Question

What are the different strategies to filter out low count genes?



Arindam ▴ 80

@ag1805x-15215


University of Eastern Finland

What are the different strategies for filtering out low-count genes? Some that I have seen in different papers are:

rowSums(count) > 0

matrixStats::rowVars(count) > 1
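To make the two filters from the question concrete, here is a minimal base-R sketch on a tiny hypothetical count matrix (gene names and numbers are made up). The variance filter is computed with `apply()` and `var()` so the example needs no extra packages.

```r
# Hypothetical toy count matrix: 3 genes x 4 samples (numbers are made up)
count <- matrix(
  c(0, 0, 0, 0,     # geneA: no reads in any sample
    0, 1, 0, 0,     # geneB: a single stray read
    5, 40, 3, 60),  # geneC: expressed, variable across samples
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4))
)

# Filter 1: keep genes with at least one read in any sample
keep_nonzero <- rowSums(count) > 0

# Filter 2: keep genes whose across-sample variance exceeds 1
# (var() is applied row by row, so no extra package is needed)
keep_var <- apply(count, 1, var) > 1

keep_nonzero  # geneA FALSE, geneB TRUE, geneC TRUE
keep_var      # geneA FALSE, geneB FALSE, geneC TRUE
```

A single stray read (geneB) is enough to pass the first filter, so on real data it tends to remove only genes that are completely undetected.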

Tags: rnaseq, deseq2, differential gene expression

Written 6.4 years ago by Arindam; updated 6.4 years ago by Wolfgang Huber.

score 0 · Answer 1 · 2018-07-09

To filter out low-count genes in DESeq2, whether to speed up the pipeline or to reduce the size of the object in memory, I recommend something like:

keep <- rowSums(counts(dds) >= x) >= y
dds <- dds[keep, ]

Here x might be 5 or 10, and y could be the sample size of the smallest group. This is similar to the filtering recommended for edgeR and limma.

Pre-filtering is not technically required, but it certainly speeds things up when many genes have very low counts.
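Here is a minimal base-R sketch of the same rule applied to a plain count matrix, so it runs without DESeq2. The matrix, the group labels and the choice x = 10 are hypothetical; with a DESeqDataSet you would use `counts(dds)` in place of `count`. Setting y to the size of the smallest group lets a gene that is expressed in only one condition (gene4 below) still pass.

```r
# Hypothetical counts: 5 genes x 5 samples (3 control, 2 treated)
count <- matrix(
  c(  0,   0,  0,   0,  0,   # gene1: never detected
      3,  11,  2,   1,  0,   # gene2: only one sample reaches 10
      9,   9,  9,   9,  9,   # gene3: consistently just below 10
      0,   0,  0,  30, 40,   # gene4: expressed only in the treated group
    100, 120, 90, 110, 95),  # gene5: well expressed everywhere
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("s", 1:5))
)
group <- factor(c("ctrl", "ctrl", "ctrl", "trt", "trt"))

x <- 10                  # minimum count per sample (assumed value)
y <- min(table(group))   # smallest group size, here 2

# Keep genes with at least y samples having a count of at least x
keep <- rowSums(count >= x) >= y
filtered <- count[keep, , drop = FALSE]

keep  # only gene4 and gene5 are TRUE
```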

I have never recommended a variance filter. We don't want to look only at the high-variance genes, but at all genes that are above a minimal count.
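The difference between the two kinds of filter can be seen on two hypothetical genes (made-up numbers, with x = 10 and an assumed smallest group size of y = 3): a highly expressed but very stable gene fails a variance cutoff of 1 yet passes the count filter, while a mostly-zero gene with a couple of sporadic reads does the opposite.

```r
# Hypothetical counts: 2 genes x 6 samples
count <- matrix(
  c(500, 501, 500, 499, 500, 500,   # geneX: highly expressed, very stable
      0,  14,   0,   0,  11,   0),  # geneY: mostly zero, sporadic reads
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneX", "geneY"), paste0("s", 1:6))
)

# Variance filter: geneX has variance 0.4, so it is dropped
keep_var <- apply(count, 1, var) > 1

# Minimal-count filter (x = 10, y = 3): geneY reaches 10 in only 2 samples
keep_count <- rowSums(count >= 10) >= 3

keep_var    # geneX FALSE, geneY TRUE
keep_count  # geneX TRUE,  geneY FALSE
```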

score 0 · Answer 2 · 2018-07-11

Besides the reasons that Mike mentioned (saving memory or compute time), another motivation for filtering can be to improve your multiple-testing results. If that is the motivation, you might in fact be better off not filtering but weighting, as in, e.g., https://bioconductor.org/packages/release/bioc/vignettes/IHW/inst/doc/introduction_to_ihw.html
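A small base-R sketch shows why filtering affects multiple testing. The p-values and mean counts are made up, and the filter uses only the mean count, which is assumed to be independent of the p-values under the null. Dropping low-mean genes before the Benjamini-Hochberg adjustment (`p.adjust(..., method = "BH")`) reduces the number of tests, so the remaining genes get smaller adjusted p-values. This illustrates hard filtering only; it is not IHW, which, as the linked vignette describes, uses such a covariate to weight hypotheses instead of discarding them.

```r
# Hypothetical per-gene results: raw p-values and mean normalized counts
pvalue   <- c(0.001, 0.01, 0.02, 0.04, 0.3, 0.5, 0.6, 0.9)
basemean <- c(100,   80,   120,  90,   2,   1,   3,   0.5)

# BH adjustment over all 8 genes
adj_all <- p.adjust(pvalue, method = "BH")

# Independent filter on mean count (threshold of 10 is an assumption),
# then BH adjustment over the 4 remaining genes only
keep     <- basemean >= 10
adj_filt <- p.adjust(pvalue[keep], method = "BH")

sum(adj_all  < 0.05)  # 2 genes significant without filtering
sum(adj_filt < 0.05)  # 4 genes significant after filtering
```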