Question: What are the different strategies to filter out low count genes?
ag1805x wrote (10 weeks ago):

What are the different strategies to filter low count genes? Some of the ones I have seen in different papers are:

rowSums(count)>0

rowVariance(count)>1

Michael Love wrote (10 weeks ago):

To filter out low-count genes in DESeq2, either to speed up the pipeline or to reduce the size of the object in memory, I recommend something like:

keep <- rowSums(counts(dds) >= x) >= y

Here x may be 5 or 10, and y could be the sample size of the smallest group. This is similar to the recommendation from edgeR and limma.
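As a minimal sketch of how that filter behaves, the example below applies it to a made-up count matrix in base R. The matrix, the sample names, the `group` factor and the choice x = 10 are all assumptions for illustration, not values from this thread.

```r
# Toy count matrix: 5 genes x 6 samples (made-up numbers).
counts_mat <- matrix(
  c( 0,  0,  1,  0,  0,  0,   # geneA: almost no reads
    12, 15,  9, 11, 14, 10,   # geneB: expressed in nearly every sample
     0,  1,  0, 25, 30, 22,   # geneC: expressed in one group only
     3,  4,  2,  5,  3,  4,   # geneD: always below the threshold
    10,  0,  0,  0,  0, 11),  # geneE: reaches the threshold in only 2 samples
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:5]), paste0("s", 1:6))
)

# Hypothetical design: 3 controls and 3 treated samples.
group <- factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt"))

x <- 10                  # minimum count required in a sample (assumed)
y <- min(table(group))   # size of the smallest group, 3 here

# Keep genes with at least x reads in at least y samples.
keep <- rowSums(counts_mat >= x) >= y
filtered <- counts_mat[keep, , drop = FALSE]
print(rownames(filtered))
```

Only geneB and geneC pass. geneC is kept even though it is silent in the controls, because the rule asks for y samples anywhere in the design rather than in every group. With a DESeqDataSet, the same logic would use `counts(dds)` in place of `counts_mat`, followed by `dds <- dds[keep, ]`.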

It's not technically required, but it certainly speeds things up if there are many genes with very low counts.

I have never recommended a variance filter. We don't want to look only at the high-variance genes, but at all genes that are above a minimal count.
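To illustrate the difference between the two kinds of filter, here is a small base R sketch on made-up counts. The gene names, the numbers and the thresholds (variance > 1, x = 10, y = 3) are assumptions chosen for the example.

```r
# Made-up counts: 3 genes x 6 samples.
counts_mat <- rbind(
  steady   = c(500, 501, 500, 499, 500, 501),  # high count, very low variance
  noisy    = c(  0,   3,   0,   2,   0,   4),  # low count, variance above 1
  variable = c( 20,  80,  35,  60,  15,  90)   # high count, high variance
)

# Variance filter, as in the question (base R stand-in for a row-variance helper).
var_keep <- apply(counts_mat, 1, var) > 1

# Count filter with assumed x = 10 and y = 3.
count_keep <- rowSums(counts_mat >= 10) >= 3

print(rbind(var_keep, count_keep))
```

In this toy case the variance filter discards the well-expressed `steady` gene and keeps the nearly empty `noisy` gene, while the count filter does the opposite.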

Wolfgang Huber wrote (10 weeks ago):

Besides the reasons that Mike mentioned (saving memory or compute time), another motivation for filtering can be to improve your multiple testing computations. If that is the motivation, then you might in fact be better off not filtering but weighting, as done for example by IHW: https://bioconductor.org/packages/release/bioc/vignettes/IHW/inst/doc/introduction_to_ihw.html
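The idea of weighting rather than filtering can be sketched in base R with a weighted Benjamini-Hochberg adjustment. This is not the IHW package, which learns its weights from a covariate; here the p-values and the weights are hand-picked, hypothetical numbers, and the weights are assumed to have been chosen without looking at the p-values (for example from mean counts).

```r
# Made-up p-values for 6 genes.
p <- c(0.001, 0.012, 0.030, 0.040, 0.200, 0.600)

# Hypothetical weights: up-weight the first three genes, down-weight the rest.
# Weighted BH requires the weights to average to 1.
w <- c(1.5, 1.5, 1.5, 0.5, 0.5, 0.5)
stopifnot(isTRUE(all.equal(mean(w), 1)))

# Ordinary BH adjustment.
padj_bh <- p.adjust(p, method = "BH")

# Weighted BH: divide each p-value by its weight (capped at 1), then apply BH.
padj_wbh <- p.adjust(pmin(p / w, 1), method = "BH")

print(data.frame(p, w, padj_bh, padj_wbh))
```

No gene is removed, yet the genes with larger weights gain power: at a 0.05 threshold, this toy example yields three discoveries with weights against two without.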
