Differential ChIP-seq with csaw: How to normalise counts on repetitive regions (telomeres)?


I am interested in H3K9me2 signal (in S. pombe), which is abundant on telomeres and centromeres. These regions are notorious for being highly repetitive. I clearly see a difference in telomeric coverage between treated and control samples, and I would like to quantify it using csaw. However, I think the default normalisation (TMM) is problematic: if I have more signal from the telomeres, then I also have more multi-mapping reads (because they fall on repeats). The multi-mappers are not counted, which affects the overall count normalisation.

Any thoughts on how to solve this? Maybe skip TMM and divide the counts by the total number of mapped reads (not just the uniquely mapped)?
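A minimal sketch of the library-size scaling proposed here, written in Python purely for illustration (csaw itself is an R package). It divides each bin count by a single library size, either the uniquely mapped total or the total including multi-mappers. The function name `cpm` and all read totals are hypothetical toy values, not data from this experiment.

```python
def cpm(counts, library_size):
    """Counts per million: scale every bin by one library size."""
    return [c / library_size * 1e6 for c in counts]


# Hypothetical toy numbers: [background bin, telomeric bin] per sample.
control_counts = [100, 500]
treated_counts = [100, 2000]

# Library sizes: uniquely mapped reads vs. all mapped reads (incl. multi-mappers).
control_unique, control_all = 105_000, 107_000
treated_unique, treated_all = 120_000, 128_000

for label, lib_c, lib_t in [("unique", control_unique, treated_unique),
                            ("all mapped", control_all, treated_all)]:
    bg_c = cpm(control_counts, lib_c)[0]
    bg_t = cpm(treated_counts, lib_t)[0]
    print(f"{label:>10}: background CPM control={bg_c:.1f} treated={bg_t:.1f}")
```

With these toy numbers, a background bin with identical raw coverage ends up with a lower CPM in the treated sample under either choice of library size, because the extra telomeric reads inflate the treated library.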


Gil Hornung


Aaron Lun

The multi-mapping reads (or lack thereof) should not affect how TMM normalization behaves. TMM normalization on binned counts assumes that most regions of the genome are not marked, i.e., they are background, and that background regions are not DB between conditions. The normalization factors are then computed to remove any systematic differences in the background counts between samples. Such differences are estimated empirically, so the normalization automatically accounts for the fact that multi-mapping reads are not counted.
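To make the argument concrete, here is a simplified Python re-implementation of the trimmed mean of M-values (TMM) on binned counts, for illustration only. csaw computes TMM factors in R, building on edgeR, and edgeR's implementation differs in details (rank-based trimming boundaries with tie handling, rescaling factors to a geometric mean of one). The function `tmm_factor` and the toy counts are hypothetical. Reads dropped as multi-mappers are simply absent from both count vectors, and the factor is estimated from whatever background counts remain.

```python
import math


def tmm_factor(obs, ref, logratio_trim=0.3, sum_trim=0.05):
    """Simplified TMM normalization factor of `obs` relative to `ref`.

    `obs` and `ref` are per-bin counts for the same bins in two libraries.
    Extreme log-ratios (M) and abundances (A) are trimmed, then a
    precision-weighted mean of the remaining M values is taken.
    """
    n_obs, n_ref = sum(obs), sum(ref)
    rows = []
    for y_k, y_r in zip(obs, ref):
        if y_k == 0 or y_r == 0:
            continue  # M and A are undefined when either count is zero
        p_k, p_r = y_k / n_obs, y_r / n_ref
        m = math.log2(p_k / p_r)        # log-fold change between libraries
        a = 0.5 * math.log2(p_k * p_r)  # average log-abundance
        # Inverse of the approximate variance of M, used as a weight.
        w = 1.0 / ((n_obs - y_k) / (n_obs * y_k) + (n_ref - y_r) / (n_ref * y_r))
        rows.append((m, a, w))

    n = len(rows)
    k_m = math.floor(n * logratio_trim)  # rows trimmed at each end by M
    k_a = math.floor(n * sum_trim)       # rows trimmed at each end by A
    by_m = sorted(range(n), key=lambda i: rows[i][0])
    by_a = sorted(range(n), key=lambda i: rows[i][1])
    keep = set(by_m[k_m:n - k_m]) & set(by_a[k_a:n - k_a])
    if not keep:
        raise ValueError("no bins left after trimming")

    num = sum(rows[i][2] * rows[i][0] for i in keep)
    den = sum(rows[i][2] for i in keep)
    return 2 ** (num / den)


# Hypothetical toy data: 1000 background bins plus 10 telomeric bins.
control = [100] * 1000 + [500] * 10    # 105,000 counted reads
treated = [100] * 1000 + [2000] * 10   # 120,000 counted reads

f = tmm_factor(treated, control)
eff_treated = sum(treated) * f  # effective library size, treated
eff_control = sum(control)      # reference library keeps a factor of 1
print(round(f, 3))  # 0.875
```

The telomeric bins are trimmed away as extreme M values, so the factor comes from background bins alone. Both effective library sizes come out at about 105,000, background bins get equal normalized abundance, and the telomeric bins keep their 4-fold difference.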

Of course, this assumes that you're using the same set of reads for normalization as for the rest of the DB analysis. You shouldn't compute normalization factors from the uniquely mapped reads and then perform the rest of the analysis with multi-mapping reads included (i.e., use the same readParam object for both). I'm also assuming you're using bins over the entire genome; don't use only the telomere regions for normalization.
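The sketch below illustrates the two requirements above: one shared read filter for every counting step, and bins that tile every chromosome. In csaw this role is played by a single readParam object passed to the counting functions (e.g. windowCounts() with bin=TRUE for the normalization bins). Here `ReadFilter` and `bin_counts` are hypothetical Python stand-ins, and filtering multi-mappers by a MAPQ threshold is an assumption (many aligners assign low MAPQ to reads with multiple best hits). Chromosome sizes and reads are toy values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadFilter:
    """Hypothetical stand-in for one shared set of read-filtering rules."""
    min_mapq: int = 10  # assumption: multi-mappers carry low MAPQ

    def keep(self, mapq):
        return mapq >= self.min_mapq


def bin_counts(reads, chrom_sizes, width, read_filter):
    """Count filtered reads in fixed-width bins tiling every chromosome."""
    hits = Counter()
    for chrom, pos, mapq in reads:  # pos is a 0-based read position
        if read_filter.keep(mapq):
            hits[(chrom, pos // width)] += 1
    counts = {}
    for chrom, size in chrom_sizes.items():
        n_bins = -(-size // width)  # ceiling division: last bin may be partial
        counts[chrom] = [hits[(chrom, b)] for b in range(n_bins)]
    return counts


shared = ReadFilter(min_mapq=10)
chrom_sizes = {"chrI": 25, "chrII": 12}  # toy sizes, not real S. pombe lengths
reads = [("chrI", 3, 42), ("chrI", 4, 0), ("chrI", 21, 30), ("chrII", 11, 42)]

# Wide bins for normalization, narrow windows for DB: both use `shared`.
norm_bins = bin_counts(reads, chrom_sizes, width=10, read_filter=shared)
db_windows = bin_counts(reads, chrom_sizes, width=5, read_filter=shared)
print(norm_bins)
```

Because both calls receive the same filter object, the low-MAPQ read is excluded consistently, and the total count is identical in the normalization bins and the DB windows.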

There are also probably other issues you should consider. For example, if telomere length changes between control and treatment, changes in marking would be confounded with changes in coverage due to copy number. This would be pretty hard to resolve from the ChIP-seq data alone; you'd probably need some other technique. There may also be other biases, e.g., differences in IP efficiency between control and treatment conditions (this can be checked by ensuring that other marked, non-telomeric loci are not DB).
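A minimal sketch of the IP-efficiency check suggested above: compute log2 fold changes of normalized abundances at marked regions and summarize those that are not telomeric. The function names, the prior count of 0.5 and the toy counts are hypothetical; the effective library sizes are assumed to be library sizes multiplied by normalization factors. A median far from zero hints at a global shift rather than a telomere-specific change, but no particular cutoff is implied here.

```python
import math
from statistics import median


def log2_fold_changes(ctrl, trt, lib_ctrl, lib_trt, prior=0.5):
    """Per-region log2 fold change (treated/control) of normalized abundances."""
    return [
        math.log2(((t + prior) / lib_trt) / ((c + prior) / lib_ctrl))
        for c, t in zip(ctrl, trt)
    ]


def ip_efficiency_check(ctrl, trt, is_telomeric, lib_ctrl, lib_trt):
    """Median log2 fold change over marked regions that are NOT telomeric."""
    lfc = log2_fold_changes(ctrl, trt, lib_ctrl, lib_trt)
    others = [x for x, tel in zip(lfc, is_telomeric) if not tel]
    return median(others)


# Hypothetical marked regions: three non-telomeric, two telomeric.
ctrl = [200, 150, 300, 80, 50]
trt = [200, 150, 300, 320, 200]
is_tel = [False, False, False, True, True]
lib = 105_000  # assumed equal effective library sizes

print(ip_efficiency_check(ctrl, trt, is_tel, lib, lib))
```

In this toy case only the telomeric regions change, so the non-telomeric median is zero; doubling the non-telomeric counts in the treated sample would push it to roughly 1.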


Thanks, Aaron!