Do we include unclassified reads (and other ambiguous reads) in normalization
a.sulit • 0
@asulit-15700

I am trying to modify the protocol to find significant differences in microbe composition between samples (so I am using microbial species in place of genes). However, I have reads that did not map to the species level, as well as reads that are unclassified. I was wondering whether these should be included in the cpm calculation used for filtering 'lowly expressed' features, and in the library sizes used by calcNormFactors in edgeR. From what I have seen in the protocol, only classified genes are taken into consideration, and when lowly expressed genes are removed, the library sizes used by calcNormFactors are recalculated to reflect this removal (effectively excluding those reads from the downstream analysis). I am unsure, though, whether this translates to what I am attempting to do. I would appreciate any insight you might have, thank you.

edger normalization • 454 views
Aaron Lun ★ 28k
@alun

Whenever you run calcNormFactors, the question that you have to ask yourself is, "Are most of my features non-DE between samples?" Here, your features are microbial species, so if you are happy to assume that most of your (successfully classified) species do not change in abundance across conditions, then you can use calcNormFactors directly as described in the user's guide.

You do not need to consider unclassified reads and ambiguous reads. In fact, including them would probably make life more difficult. Can you reasonably assume that the number of ambiguous reads should be constant between samples? I'm not sure what that would even mean from an experimental or biological perspective.
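As a rough illustration of the advice above, here is a minimal sketch that drops the unclassified and ambiguous rows before filtering and normalization. The count matrix is simulated, and the row names `unclassified` and `ambiguous`, the sample names, and the group labels are all hypothetical placeholders for whatever your classifier and design actually produce.

```r
library(edgeR)

# Simulated toy data (hypothetical): 20 species plus two non-species rows,
# 6 samples in two groups of 3.
set.seed(1)
feature_names <- c(paste0("species", 1:20), "unclassified", "ambiguous")
counts <- matrix(rnbinom(22 * 6, mu = 200, size = 5), nrow = 22,
                 dimnames = list(feature_names, paste0("sample", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))

# Keep only reads assigned to a species; the other rows are discarded up front.
classified <- counts[!rownames(counts) %in% c("unclassified", "ambiguous"), ]

# Library sizes are the column sums of the classified counts only.
y <- DGEList(counts = classified, group = group)

# Filter low-abundance species, then recompute library sizes from what remains.
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes = FALSE]

# TMM normalization; this assumes most remaining species are non-DE.
y <- calcNormFactors(y)
y$samples
```

With this ordering, neither the cpm-based filter nor the normalization factors ever see the unclassified or ambiguous reads.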

If you cannot assume that most features are non-DE, then the use of calcNormFactors is inherently problematic, regardless of what you do. There's not much that can be done here. Either you find some known constant control features and normalize based on those, or you accept that you cannot do proper DE testing and test for differential proportions instead. That might be good enough for your purposes; I don't know.
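The two fallbacks above could be sketched roughly as follows. This is only one possible way to set them up, not a prescribed recipe: the data are simulated, and the choice of `species1` to `species5` as "known constant" controls is purely hypothetical. Normalization factors estimated from so few features will be noisy.

```r
library(edgeR)

# Simulated toy data (hypothetical): 20 classified species, 6 samples.
set.seed(2)
species <- paste0("species", 1:20)
counts <- matrix(rnbinom(20 * 6, mu = 200, size = 5), nrow = 20,
                 dimnames = list(species, paste0("sample", 1:6)))
group <- factor(rep(c("control", "treated"), each = 3))
y <- DGEList(counts = counts, group = group)

# Option 1: estimate normalization factors from control features only,
# while keeping the library sizes computed from all classified species.
control_species <- paste0("species", 1:5)  # hypothetical control set
y_ctrl <- y
y_ctrl$samples$norm.factors <- calcNormFactors(
  counts[control_species, ], lib.size = y$samples$lib.size)

# Option 2: skip calcNormFactors entirely. The normalization factors stay
# at their default of 1, so only library size is corrected for and
# downstream tests compare proportions of the classified reads.
y_prop <- y
prop_cpm <- cpm(y_prop)  # each column sums to one million
```

Either object can then go through the usual dispersion estimation and testing steps; the difference lies only in what a "change" is measured against.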
