Question

csaw normalization and library size


Nicolas Servant ▴ 260

@nicolas-servant-1466


France

Hi all,

I have a short question about TMM normalization in csaw.

I would like to normalize histone mark ChIP-seq data and compare samples (WT vs KO).

Before starting, I have my BAM files and the results from peak calling. Here is my plan:

1- Count reads over large genomic bins (10 kb) from which the peak locations have been removed. The idea is to count over the background only (using windowCounts)

2- Count the reads in the peak regions (using regionCounts)

3- Normalize the peak counts using the scaling factors calculated on background only

counts.peaks <- normOffsets(counts.bg, se.out=counts.peaks)

However, here, I have an issue with the library size.

Error in .local(object, ...) :
library sizes of 'se.out' and 'object' are not identical

The message is clear, but how can I fix it?

Any feedback is welcome.

Best. Nicolas

chip-seq normalization • 1.0k views

ADD COMMENT • link updated 5.7 years ago by Aaron Lun ★ 28k • written 5.7 years ago by Nicolas Servant ▴ 260

score 0 · Answer 1 · 2018-07-30


Aaron Lun ★ 28k

@alun


The city by the bay

This shouldn't happen if you used the same readParam object in both regionCounts and windowCounts. In fact, that's the primary purpose of having the readParam class in the first place. The only possibility I can think of is if one of the totals is integer and the other is double-precision. You can check this easily enough with:

identical(counts.bg$totals, counts.peaks$totals)
all.equal(counts.bg$totals, counts.peaks$totals)

If the latter is TRUE and the former is not, then it's a type issue. If neither is TRUE, then I would guess that you didn't use the same readParam in the two *Counts calls. If both are TRUE... well, then we wouldn't have any problems.
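To illustrate the type-issue case, here is a minimal base R sketch. The library sizes are made-up values, not taken from this thread, and the vectors only stand in for counts.bg$totals and counts.peaks$totals.

```r
# Hypothetical library sizes with the same values but different storage types.
totals.int <- c(1200000L, 1500000L)  # stored as integer
totals.dbl <- c(1200000, 1500000)    # stored as double

# identical() is strict about the storage type, so this is FALSE.
strict <- identical(totals.int, totals.dbl)

# all.equal() compares the numeric values, so this is TRUE.
loose <- isTRUE(all.equal(totals.int, totals.dbl))

# Coercing both to the same type makes the strict comparison pass as well.
coerced <- identical(as.numeric(totals.int), totals.dbl)

print(c(strict=strict, loose=loose, coerced=coerced))
```

If the two checks disagree in this way, coercing both totals to a common type is one plausible way to reconcile them, though whether that is appropriate for your objects is an assumption to verify.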

ADD COMMENT • link 5.7 years ago Aaron Lun ★ 28k


Thanks for your feedback.

In the end, I changed my code a bit to run windowCounts on all genomic bins and then remove the bins overlapping peaks with the overlap function. That way, I'm sure the lib.size is always based on the full genome information, with the same lib.size.
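A rough sketch of that filtering step on toy data, assuming the GenomicRanges and SummarizedExperiment packages are installed. The bins, peaks, counts and totals are all hypothetical, the hand-built object only stands in for what windowCounts would return, and overlapsAny is one possible choice of overlap function rather than necessarily the one used here.

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Toy 10 kb bins and peak calls on a made-up chromosome (hypothetical data).
bins <- GRanges("chr1", IRanges(start=seq(1, 40001, by=10000), width=10000))
peaks <- GRanges("chr1", IRanges(start=c(12000, 35000), end=c(13000, 36000)))

# Toy count matrix and library sizes standing in for windowCounts() output.
mat <- matrix(c(5L, 40L, 6L, 55L, 4L,
                7L, 20L, 8L, 25L, 6L), ncol=2)
counts.all <- SummarizedExperiment(
    assays=list(counts=mat),
    rowRanges=bins,
    colData=DataFrame(totals=c(1200000L, 1500000L))
)

# Keep only the bins that do not overlap any peak.
is.bg <- !overlapsAny(rowRanges(counts.all), peaks)
counts.bg <- counts.all[is.bg, ]

# Row subsetting leaves the per-library totals in colData untouched.
same.totals <- identical(counts.bg$totals, counts.all$totals)
print(same.totals)
```

Because the background object is a row subset of the full one, its totals are carried over unchanged, which is why the library sizes stay consistent in this approach.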

Thanks again

ADD REPLY • link 5.7 years ago Nicolas Servant ▴ 260