BaseMean threshold

I have an RNA-seq dataset and I am using DESeq2 to find differentially expressed genes between two groups. I also want to remove genes with low counts by applying a baseMean threshold. I have already pre-filtered to remove genes with zero counts or only a single count across the samples, but I would also like to remove genes whose counts are low compared to the rest of the genes. Is there a commonly used baseMean threshold, or a way to work out what this threshold should be?

Thank you

basemean DESeq2 • 318 views
ATpoint ★ 1.2k

I would not filter on the baseMean because it is (at least to me) hard to interpret. You cannot tell why the baseMean is low: either there is no difference between groups and the gene is simply lowly expressed (and/or short), or it is moderately expressed in one group but off in the other. The baseMean could be the same in both scenarios. If you filter, I would do it on the counts. For example, you could require that all, or a fraction, of the samples in at least one group have 10 or more counts. That removes genes with low counts or zeros across all groups while keeping genes whose low counts are confined to one group; the latter are good DE candidates and should not be removed. Alternatively, automate the filtering, e.g. with the edgeR function filterByExpr.
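To make the two scenarios concrete, here is a small base R sketch with invented counts. The gene names, the 3 vs 3 sample layout and the "every sample in at least one group" rule are illustrative assumptions, and plain row means stand in for the baseMean (which DESeq2 computes from normalized counts), so treat it as a rough illustration only.

```r
# Toy counts: 2 genes x 6 samples (3 control, 3 treated); all values are invented.
counts <- rbind(
  geneA = c(5, 5, 5, 5, 5, 5),    # uniformly low in both groups
  geneB = c(0, 0, 0, 10, 10, 10)  # off in control, expressed in treated
)
colnames(counts) <- paste0("s", 1:6)
group <- factor(rep(c("control", "treated"), each = 3))

# Row means as a stand-in for baseMean (assumes equal size factors).
row_means <- rowMeans(counts)

# Group-aware filter: keep a gene if every sample of at least one group
# has min_count or more counts.
min_count <- 10
passes_by_group <- sapply(levels(group), function(g) {
  in_group <- group == g
  rowSums(counts[, in_group, drop = FALSE] >= min_count) == sum(in_group)
})
keep <- rowSums(passes_by_group) >= 1

print(row_means)
print(keep)
```

Both genes have the same mean, so a mean-based threshold cannot separate them, whereas the group-aware count filter drops geneA and keeps geneB.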


Yes, agreed: you can use filterByExpr, or I commonly just use something like:

keep <- rowSums(counts(dds) >= 10) >= x
dds <- dds[keep, ]

where x is the minimum number of samples that must have a count of 10 or more. For example, you can use the size of the smallest group.
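As a rough sketch of how x could be derived from the design, the following base R example uses an invented count matrix `cts` as a stand-in for counts(dds) and a made-up `group` factor; neither comes from the original question.

```r
# Invented count matrix standing in for counts(dds): 4 genes x 5 samples.
cts <- rbind(
  g1 = c(0, 1, 0, 2, 1),
  g2 = c(25, 30, 18, 0, 1),
  g3 = c(12, 3, 4, 15, 2),
  g4 = c(40, 55, 38, 60, 47)
)
colnames(cts) <- paste0("s", 1:5)
group <- factor(c("A", "A", "A", "B", "B"))

# x = size of the smallest group (group B has 2 samples here).
x <- min(table(group))

# Keep genes with a count of 10 or more in at least x samples.
keep <- rowSums(cts >= 10) >= x
filtered <- cts[keep, , drop = FALSE]

print(rownames(filtered))
```

Note that this filter ignores the group labels themselves: g3 passes with one qualifying sample from each group, and g2 passes although it is only expressed in group A, which is the kind of gene you want to keep.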

