Question: csaw normalization and library size
16 months ago by
Nicolas Servant wrote:

Hi all,

I have a short question about TMM normalization in csaw.

I would like to normalize ChIP-seq data for histone marks and compare samples (WT vs KO).

Before starting, I have my BAM files and the results from peak calling. Here is my plan:

1- Count reads over large genomic bins (10 kb) from which the peak locations have been removed. The idea is to count over the background only (using windowCounts).

2- Count the reads in the peak regions (using regionCounts).

3- Normalize the peak counts using the scaling factors calculated from the background only:

counts.peaks <- normOffsets(, se.out=counts.peaks)

However, here, I have an issue with the library size.

Error in .local(object, ...) :
  library sizes of 'se.out' and 'object' are not identical

The message is clear, but how can I fix it?

Any feedback is welcome.

Best. Nicolas


normalization chip-seq • 309 views
modified 16 months ago by Aaron Lun • written 16 months ago by Nicolas Servant
Answer: csaw normalization and library size
16 months ago by
Aaron Lun (Cambridge, United Kingdom) wrote:

This shouldn't happen if you used the same readParam object in both regionCounts and windowCounts. In fact, that's the primary purpose of having the readParam class in the first place. The only possibility I can think of is if one of the totals is integer and the other is double-precision. You can check this easily enough with:

identical($totals, counts.peaks$totals)
all.equal($totals, counts.peaks$totals)

If the latter is TRUE and the former is not, then it's a type issue. If neither is TRUE, then I would guess that you didn't use the same readParam in the two *Counts calls. If both are TRUE... well, then we wouldn't have any problems.
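
To illustrate what the type issue looks like, here is a minimal base R sketch. The library sizes are made-up numbers, not values from this thread, and it only shows how identical() and all.equal() disagree when one vector is integer and the other is double. Whether that is the cause in any particular analysis still has to be checked on the real totals.

```r
# Made-up library sizes: same values, different storage types.
totals.int <- c(12500000L, 9800000L)   # stored as integer
totals.dbl <- c(12500000, 9800000)     # stored as double

identical(totals.int, totals.dbl)          # FALSE: the types differ
isTRUE(all.equal(totals.int, totals.dbl))  # TRUE: the values agree

# Coercing one vector to the other's type removes the mismatch.
storage.mode(totals.int) <- "double"
identical(totals.int, totals.dbl)          # TRUE
```

all.equal() compares numeric values within a tolerance and ignores the integer/double distinction, whereas identical() requires the same type as well as the same values.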

written 16 months ago by Aaron Lun

Thanks for your feedback.

In the end, I changed my code a bit to run windowCounts on all genomic bins, and then remove the bins overlapping peaks with the overlap function. That way, I'm sure the lib.size is always based on the full genome information, and is the same for the background bins and the peaks.

Thanks again
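
For readers who want to try the approach described above, here is a rough sketch rather than the exact code used in this thread. The function name normalize_peaks_by_background and its arguments are hypothetical, the 10 kb bin width comes from the original question, and it assumes a csaw release that provides normFactors() for scaling normalization (the thread itself used normOffsets() with se.out for the same step).

```r
library(csaw)

# Hypothetical helper (not part of csaw): estimate scaling factors from
# background bins and store them in the peak-level counts.
#   bam.files: character vector of BAM file paths
#   peaks:     GRanges of called peak regions
#   param:     one readParam object, shared by both counting calls
normalize_peaks_by_background <- function(bam.files, peaks,
                                          param = readParam(),
                                          bin.width = 10000) {
    # Count reads in contiguous bins across the whole genome.
    bins <- windowCounts(bam.files, bin = TRUE, width = bin.width,
                         param = param)

    # Drop bins that overlap a peak. Subsetting rows leaves the column
    # data, including the library sizes in 'totals', unchanged.
    background <- bins[!overlapsAny(rowRanges(bins), peaks), ]

    # Count reads in the peak regions with the same readParam.
    counts.peaks <- regionCounts(bam.files, peaks, param = param)

    # Stop early if the library sizes disagree.
    stopifnot(identical(background$totals, counts.peaks$totals))

    # Assumption: normFactors() is available; older csaw releases used
    # normOffsets(background, se.out = counts.peaks) here instead.
    normFactors(background, se.out = counts.peaks)
}
```

A call would look like `counts.peaks <- normalize_peaks_by_background(c("wt.bam", "ko.bam"), peaks)`, where the file names are placeholders. Because the background bins are a subset of a genome-wide windowCounts result, their library sizes are computed from the same reads as the peak counts, provided both calls share one readParam.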


written 16 months ago by Nicolas Servant