I am trying to adapt the protocol to find significant differences in microbial composition between samples, using microbial species in place of genes. However, some of my reads are not classified down to the species level, and others are unclassified altogether. Should these reads be included in the library size used by the cpm function when filtering out 'lowly expressed' features, and in the library size used by calcNormFactors in edgeR? From what I have seen in the protocol, only classified genes are taken into consideration, and when lowly expressed genes are removed, the library size used by calcNormFactors is adjusted to reflect this removal (effectively excluding them from the downstream analysis). I am unsure, though, whether this translates to what I am attempting to do. I would appreciate any insight you might have. Thank you.
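
To make the two options concrete, here is a minimal Python sketch of the arithmetic behind a counts-per-million calculation (count divided by library size, times one million). It is not edgeR itself, and the taxa names and read counts are made up purely for illustration. It only shows how the choice of library size changes the CPM values that a 'lowly expressed' filter would see.

```python
# Hypothetical read counts for one sample; taxa names and numbers are made up.
species_counts = {"species_A": 900, "species_B": 90, "species_C": 10}
genus_only_reads = 500     # classified, but not down to species level
unclassified_reads = 500   # no taxonomic assignment at all


def cpm(counts, library_size):
    """Counts per million: count / library size * 1e6."""
    return {taxon: n / library_size * 1e6 for taxon, n in counts.items()}


# Option 1: library size = species-level reads only.
lib_species_only = sum(species_counts.values())
# Option 2: library size = every read in the sample.
lib_all_reads = lib_species_only + genus_only_reads + unclassified_reads

cpm_species_only = cpm(species_counts, lib_species_only)
cpm_all_reads = cpm(species_counts, lib_all_reads)

# Print both CPM values side by side for each species.
for taxon in species_counts:
    print(taxon, round(cpm_species_only[taxon]), round(cpm_all_reads[taxon]))
```

With these made-up numbers, counting the extra reads doubles the library size, so every species' CPM is halved and a fixed CPM cutoff would remove more species under the second option.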