I'm trying to incorporate breadth of coverage into my metatranscriptomic analysis. For each gene I have the number of mapped reads and the fraction of base pairs that are covered (breadth of coverage). In the example gene set:
gene  counts  breadth
A     10      0.8
B     20      0.1
C     15      1.0
Normally only the counts would go into DESeq, but I'm wondering whether it would work to adjust these counts based on breadth of coverage. In this example, gene B has the most counts, but the reads cover only a small portion of that gene, likely due to sequencing bias or ambiguous mapping. I was thinking about scaling the count values by breadth. It would just be breadth_adjusted = counts * breadth, which would essentially give the average number of counts per bp in a gene. So in the above example:
gene  counts  breadth  breadth_adjusted
A     10      0.8      8
B     20      0.1      2
C     15      1.0      15
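
To make the calculation concrete, here is a minimal Python sketch of the scaling using the three example genes. The `breadth_adjust` helper and the dictionary layout are just names I made up for illustration, and it assumes breadth is stored as a fraction between 0 and 1.

```python
# Example values copied from the table in the post.
genes = {
    "A": {"counts": 10, "breadth": 0.8},
    "B": {"counts": 20, "breadth": 0.1},
    "C": {"counts": 15, "breadth": 1.0},
}


def breadth_adjust(counts, breadth):
    """Scale a raw read count by breadth of coverage (a fraction in [0, 1])."""
    if not 0.0 <= breadth <= 1.0:
        raise ValueError("breadth must be a fraction between 0 and 1")
    return counts * breadth


# Apply the scaling to every gene.
breadth_adjusted = {
    gene: breadth_adjust(v["counts"], v["breadth"]) for gene, v in genes.items()
}

for gene, value in breadth_adjusted.items():
    print(gene, round(value, 2))
```

Note that the adjusted values are generally not whole numbers for real data, so they would need rounding before they look like counts again.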
I'm completely aware that DESeq is meant for non-transformed count values. That said, could this breadth-adjusted value be used with DESeq? I think this could be a good way of accounting for breadth of coverage prior to normalization. I also thought about setting all genes with low breadth of coverage to zero, but this would both affect the scaling and over-inflate the expression of genes.
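
For comparison, this is roughly what the zeroing alternative would look like on the same three example genes. The 0.5 cutoff (`MIN_BREADTH`) and the `zero_low_breadth` helper are hypothetical choices for illustration only, not values I have settled on.

```python
# Example values from the post.
counts = {"A": 10, "B": 20, "C": 15}
breadth = {"A": 0.8, "B": 0.1, "C": 1.0}

MIN_BREADTH = 0.5  # hypothetical cutoff, chosen only for this illustration


def zero_low_breadth(counts, breadth, min_breadth):
    """Set the count to zero for any gene whose breadth is below the cutoff."""
    return {
        gene: (c if breadth[gene] >= min_breadth else 0)
        for gene, c in counts.items()
    }


filtered = zero_low_breadth(counts, breadth, MIN_BREADTH)

# Total counts per sample change, which is what feeds into the scaling.
total_before = sum(counts.values())
total_after = sum(filtered.values())
print(filtered, total_before, total_after)
```

With this cutoff gene B drops to zero and the sample total shrinks, which is the effect on scaling I am worried about.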