What are the different strategies to filter low count genes? Some of the ones I have seen in different papers are:
rowSums(count) > 0
matrixStats::rowVars(count) > 1
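To make the two filters concrete, here they are applied to a tiny count matrix. The numbers are invented for illustration. Base `apply(..., var)` stands in for `matrixStats::rowVars()` so the snippet has no dependencies. Note that the two filters can disagree: the third gene has steady, moderate counts but low variance.

```r
# Invented 4-gene x 6-sample count matrix (illustration only)
count <- matrix(
  c(0,  0,  0,  0,  0,  0,    # gene1: never detected
    0,  1,  0,  2,  0,  1,    # gene2: very low counts
    10, 11, 9,  10, 10, 10,   # gene3: steady, moderate counts
    5,  50, 8,  60, 4,  55),  # gene4: highly variable
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:6))
)

# Filter 1: drop genes with zero reads across all samples
keep_nonzero <- rowSums(count) > 0

# Filter 2: keep genes whose per-gene sample variance exceeds 1
row_var <- apply(count, 1, var)
keep_var <- row_var > 1

keep_nonzero  # gene1 FALSE, the others TRUE
keep_var      # only gene4 TRUE; gene3 is dropped despite steady expression
```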
To filter out low-count genes in DESeq2 (to make the pipeline faster, or to reduce the size of the object in memory), I recommend something like:
keep <- rowSums(counts(dds) >= x) >= y
dds <- dds[keep, ]
Here x might be 5 or 10, and y could be the sample size of the smallest group. This is similar to the recommendation from edgeR and limma.
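The same rule can be written as a plain-R function on a count matrix. With a `DESeqDataSet` you would pass `counts(dds)` instead. The helper name `filter_low_counts`, the simulated counts, and the 3-vs-4 design are illustrative assumptions. The sketch uses x = 10 and sets y to the size of the smallest group.

```r
# Hypothetical helper: keep genes with at least `min_count` reads
# in at least `min_samples` samples, i.e. rowSums(counts >= x) >= y
filter_low_counts <- function(counts, min_count, min_samples) {
  rowSums(counts >= min_count) >= min_samples
}

set.seed(1)
# Simulated counts: 1000 genes x 7 samples; per-gene means drawn so that
# a sizeable fraction of genes have low counts
mu <- rexp(1000, rate = 1 / 20)
counts <- matrix(rpois(1000 * 7, lambda = rep(mu, times = 7)), nrow = 1000)

# Made-up design: 3 controls and 4 treated; y = size of the smallest group
group <- factor(c(rep("control", 3), rep("treated", 4)))
y <- min(table(group))

keep <- filter_low_counts(counts, min_count = 10, min_samples = y)
counts_filtered <- counts[keep, , drop = FALSE]
dim(counts_filtered)
```

Tying y to the smallest group means a gene expressed in only one condition can still pass, as long as that condition has enough samples.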
It's not technically required, but it certainly speeds things up if there are many genes with very low counts.
I have never recommended a variance filter. We don't want to look only at the high-variance genes, but at all genes that are above a minimal count.
Besides the reasons that Mike mentioned (saving memory or compute time), another motivation for filtering can be to improve your multiple-testing computations. If that is the motivation, then you might in fact be better off not filtering but weighting, as in IHW (independent hypothesis weighting): https://bioconductor.org/packages/release/bioc/vignettes/IHW/inst/doc/introduction_to_ihw.html
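IHW learns covariate-dependent weights from the data. The underlying idea, weighted Benjamini-Hochberg, can be sketched in base R: divide each p-value by a weight (weights averaging 1), then apply BH. This is not the IHW algorithm. The p-values, the mean-count covariate, the cutoff of 10, and the fixed two-level weights are all invented assumptions. They serve only to contrast dropping low-count tests with down-weighting them.

```r
set.seed(2)
m <- 2000
# Invented covariate: mean normalized count per gene
base_mean <- rexp(m, rate = 1 / 50)
# Invented truth: some genes are DE, only among genes with mean count > 20
is_de <- base_mean > 20 & runif(m) < 0.15
pvals <- ifelse(is_de, rbeta(m, 0.1, 1), runif(m))

low <- base_mean < 10  # assumed "low count" cutoff

# Filtering: drop low-count genes entirely, BH-adjust the rest
padj_filter <- rep(NA_real_, m)
padj_filter[!low] <- p.adjust(pvals[!low], method = "BH")

# Weighting: keep every gene but down-weight low-count ones
w <- ifelse(low, 0.25, 1)
w <- w / mean(w)  # weighted BH's FDR guarantee assumes weights average 1
padj_weight <- p.adjust(pmin(1, pvals / w), method = "BH")

c(filtered = sum(padj_filter < 0.1, na.rm = TRUE),
  weighted = sum(padj_weight < 0.1))
```

With weighting, low-count genes still receive an adjusted p-value instead of `NA`. IHW goes further by choosing the weights from the data rather than fixing them by hand.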