4 months ago by
Cambridge, United Kingdom
Multi-mapping reads (or the lack thereof) should not affect how TMM normalization behaves. TMM normalization on binned counts assumes that most regions of the genome are unmarked, i.e., background, and that background regions are not differentially bound (DB) between conditions. The normalization factors are then computed to remove any systematic differences in the background counts between samples. These differences are estimated empirically from the counts themselves, so the normalization automatically accounts for the fact that multi-mapping reads are not counted.
Of course, this assumes that you're using the same set of reads for normalization as for the rest of the DB analysis. You shouldn't compute normalization factors with the uniquely mapped reads and then perform the rest of the analysis with multi-mapping reads (i.e., use the same readParam object for both steps). I'm also assuming that you're using bins over the entire genome; don't just use the telomere regions for normalization.
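To make the normalization argument above concrete, here is a minimal simulation sketch in plain Python. It is not csaw or edgeR code: `tmm_like_factor` is a hypothetical, simplified stand-in for TMM (an unweighted trimmed mean of per-bin log-ratios, without the precision weights or abundance trimming that edgeR uses), and the bin counts, depths and retention fractions are made-up values. The idea is that discarding multi-mapping reads removes the same fraction of reads from a given bin in every sample, so the background log-ratios, and hence the estimated scaling, are essentially unchanged.

```python
import math
import random

def tmm_like_factor(sample, reference, trim=0.3):
    """Simplified TMM-style factor: trimmed mean of per-bin log2 ratios (M-values)."""
    lib_s, lib_r = sum(sample), sum(reference)
    # M-values are computed on library-size-scaled counts; bins with a zero count are skipped.
    m_values = sorted(
        math.log2((s / lib_s) / (r / lib_r))
        for s, r in zip(sample, reference)
        if s > 0 and r > 0
    )
    k = int(len(m_values) * trim)            # number of bins trimmed from each tail
    kept = m_values[k:len(m_values) - k]
    return 2 ** (sum(kept) / len(kept))

def simulate(rng, depth, marked_mean, retained, n_marked=100, background_mean=50.0):
    """Noisy bin counts; the first n_marked bins are marked, the rest are background."""
    counts = []
    for i, keep in enumerate(retained):
        mean = (marked_mean if i < n_marked else background_mean) * depth * keep
        # Gaussian approximation to count noise, floored at 1 to avoid zero counts.
        counts.append(max(1, round(rng.gauss(mean, math.sqrt(mean)))))
    return counts

def effective_depth_ratio(rng, retained):
    """Treatment/control ratio of effective library sizes (library size x factor)."""
    control = simulate(rng, depth=1.0, marked_mean=500.0, retained=retained)
    # Treatment has 1.5x the background coverage and 3x stronger marking per bin.
    treatment = simulate(rng, depth=1.5, marked_mean=1500.0, retained=retained)
    factor = tmm_like_factor(treatment, control)
    return factor * sum(treatment) / sum(control)

rng = random.Random(0)
n_bins = 2000
# Fraction of reads kept per bin: everything, or half lost in "repetitive" bins.
all_reads = [1.0] * n_bins
unique_only = [rng.choice([1.0, 0.5]) for _ in range(n_bins)]

ratio_all = effective_depth_ratio(rng, all_reads)
ratio_unique = effective_depth_ratio(rng, unique_only)
print(f"effective depth ratio, all reads:          {ratio_all:.3f}")
print(f"effective depth ratio, unique reads only:  {ratio_unique:.3f}")
```

Under these assumptions, both ratios should land close to the simulated background depth ratio of 1.5, even though the raw library sizes differ by much more because of the stronger marking in the treatment. The same retention vector is applied to both samples within each run, which mirrors using one read-extraction setting for the whole analysis.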
There are probably other issues you should consider as well. For example, if the telomere length changes between control and treatment, changes in marking would be confounded with changes in coverage due to copy number. This would be pretty hard to resolve from the ChIP-seq data alone; you'd probably need some other technique. There may also be other biases, e.g., differences in IP efficiency between control and treatment conditions (this can be checked by ensuring that other marked non-telomere loci are not DB).
modified 4 months ago
4 months ago by
Aaron Lun • 18k