cpm filtering edgeR
1
0
Entering edit mode
es874 ▴ 20
@es874-11802

I have read the edgeR User's Guide, and Section 2.6 recommends that a gene have a count of at least 5-10 in a library to be considered expressed in that library. If I set 10 counts as my threshold for expression, the corresponding CPM is 0.8, based on the smallest library size in my sample set:

     group lib.size norm.factors
SM01    T0 12559041            1

(10 x 1,000,000) / 12,559,041 ≈ 0.8. Is this correct? Am I being too stringent using 0.8 instead of 1?

Also, in what instances would the recommended 5-10 counts change (+ or -)?

Thanks

 

Aaron Lun ★ 28k
@alun

First, your calculation looks fine to me. You could probably just use a threshold of 1 instead; it's not much of a difference. But if you've taken the time to work it out, you might as well use 0.8.
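
For anyone who wants to check the arithmetic, here is a minimal sketch in plain Python (not edgeR code). The function names `count_to_cpm` and `cpm_to_count` are made up for illustration; the library size is the one quoted in the question.

```python
def count_to_cpm(count, lib_size):
    """Convert a raw read count to counts per million (CPM)."""
    return count * 1_000_000 / lib_size


def cpm_to_count(cpm, lib_size):
    """Convert a CPM value back to the raw count it implies."""
    return cpm * lib_size / 1_000_000


# lib.size of SM01, the smallest library in the question
smallest_lib = 12_559_041

# CPM that corresponds to a raw count of 10 in the smallest library
cpm_cutoff = count_to_cpm(10, smallest_lib)
print(round(cpm_cutoff, 3))  # 0.796

# Raw count that a CPM cutoff of 1 would imply in the same library
print(round(cpm_to_count(1, smallest_lib), 2))  # 12.56
```

So in this library a CPM cutoff of 0.8 corresponds to about 10 reads and a cutoff of 1 to about 12.6 reads, which is why the two choices behave so similarly.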

The 5-10 recommendation seems to do well in a variety of situations (for routine RNA-seq, at least). The issue is that, at counts lower than 5, we get problems with discreteness and some statistical approximations become inaccurate. On the other hand, we don't want to increase the filter beyond 10, because we might start filtering out interesting genes. So the 5-10 choice represents a compromise between these two considerations.
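
To make the filter concrete, here is a rough sketch in plain Python (again, not edgeR code). The gene names, counts, and two of the three library sizes are hypothetical. Requiring a gene to pass the cutoff in at least `min_libs` libraries is an assumption added for illustration; it is not something stated in this thread.

```python
def cpm(count, lib_size):
    """Counts per million for one gene in one library."""
    return count * 1_000_000 / lib_size


def keep_gene(counts, lib_sizes, cpm_cutoff, min_libs):
    """True if the gene reaches cpm_cutoff in at least min_libs libraries."""
    n_pass = sum(cpm(c, n) >= cpm_cutoff for c, n in zip(counts, lib_sizes))
    return n_pass >= min_libs


# Hypothetical library sizes; only the first comes from the question
lib_sizes = [12_559_041, 15_000_000, 20_000_000]

# Hypothetical raw counts, one row per gene, one column per library
counts = {
    "geneA": [0, 2, 1],     # low everywhere
    "geneB": [25, 40, 31],  # comfortably above the cutoff
    "geneC": [12, 3, 30],   # passes in two of three libraries
}

cpm_cutoff = 0.8  # roughly 10 reads in the smallest library
min_libs = 2      # assumed requirement, chosen for illustration

kept = [gene for gene, row in counts.items()
        if keep_gene(row, lib_sizes, cpm_cutoff, min_libs)]
print(kept)  # ['geneB', 'geneC']
```

Raising `cpm_cutoff` would start to drop genes like `geneC`, which is the trade-off described above.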

Lower thresholds are sometimes used when you're explicitly interested in low-abundance genes, e.g., certain ncRNAs or repeat elements that don't get a lot of reads. Higher thresholds are used in other applications like ChIP-seq, where there is a certain level of background enrichment and we need to set the filter above that background. This gets rid of uninteresting genomic regions, even if they have large read counts.
