Question: What are the different strategies to filter out low count genes?
4 months ago, ag1805x10 wrote:

What are the different strategies to filter low count genes? Some of the ones I have seen in different papers are:

rowSums(count)>0

rowVariance(count)>1

modified 4 months ago by Wolfgang Huber • written 4 months ago by ag1805x10

4 months ago, Michael Love (United States) wrote:

To filter out low-count genes in DESeq2, either to speed up the pipeline or to reduce the size of the object in memory, I recommend something like:

keep <- rowSums(counts(dds) >= x) >= y

Here x may be 5 or 10, and y could be the smallest group's sample size, so a gene is kept if it has a count of at least x in at least y samples. This is similar to the recommendations from edgeR and limma.

It's not technically required, but it certainly speeds things up if there are many genes with very low counts.
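As a rough illustration of the rule above, here is a minimal base-R sketch on a made-up count matrix. The matrix `counts_mat`, the gene and sample names, the `group` factor and the choice x = 10 are all assumptions for the example, not values from this thread. With a real DESeq2 object you would compute `keep` from `counts(dds)` as shown above and then subset with `dds <- dds[keep, ]`.

```r
# Hypothetical toy data: 6 genes x 6 samples, two groups of 3 samples each.
counts_mat <- matrix(
  c( 0,  0,  1,  0,  2,  0,   # geneA: near zero everywhere
    12, 15, 11,  0,  1,  0,   # geneB: expressed in the first group only
    50, 40, 60, 55, 45, 70,   # geneC: expressed everywhere
     3,  4,  2,  5,  3,  4,   # geneD: low counts in every sample
    10,  9, 11,  2,  1,  0,   # geneE: only 2 samples reach the threshold
     0,  0,  0, 20, 25, 30),  # geneF: expressed in the second group only
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:6]), paste0("s", 1:6))
)
group <- factor(rep(c("ctrl", "trt"), each = 3))

x <- 10                   # minimal count (assumed; 5 is another common choice)
y <- min(table(group))    # size of the smallest group, here 3

# Keep genes with a count of at least x in at least y samples.
keep <- rowSums(counts_mat >= x) >= y
filtered <- counts_mat[keep, , drop = FALSE]

print(rownames(filtered))
```

Setting y to the smallest group size means a gene expressed in only one condition (geneB or geneF here) still passes, whereas a filter on the row total or on the variance would not make that guarantee explicit.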

I have never recommended a variance filter. We don't want to look only at the high-variance genes, but at all genes that are above a minimal count.

4 months ago, Wolfgang Huber (EMBL European Molecular Biology Laboratory) wrote:

Besides the reasons that Mike mentioned (saving memory or compute time), another motivation for filtering can be to improve your multiple testing computations. If that is the motivation, then you might in fact be better off not filtering but weighting, as e.g. in https://bioconductor.org/packages/release/bioc/vignettes/IHW/inst/doc/introduction_to_ihw.html