Question: csaw normalization and library size
Asked 9 months ago by Nicolas Servant (France):

Hi all,

I have a short question about TMM normalization in csaw.

I would like to normalize histone mark ChIP-seq data and compare samples (WT vs KO).

Before starting, I have my BAM files and the results from peak calling. Here is my plan:

1- Count reads over large genomic bins (10 kb) from which peak locations have been removed. So basically, the idea is to count over the background only (using windowCounts)

2- Count the reads in peak regions (using regionCounts)

3- Normalize the peak counts using the scaling factors calculated on background only

counts.peaks <- normOffsets(counts.bg, se.out=counts.peaks)

However, here, I have an issue with the library size.

Error in .local(object, ...) :
  library sizes of 'se.out' and 'object' are not identical

The message is clear, but how can I fix it?

Any feedback is welcome.

Best. Nicolas

 

Answer: csaw normalization and library size
Answered 9 months ago by Aaron Lun (Cambridge, United Kingdom):

This shouldn't happen if you used the same readParam object in both regionCounts and windowCounts. In fact, that's the primary purpose of having the readParam class in the first place. The only possibility I can think of is if one of the totals is integer and the other is double-precision. You can check this easily enough with:

identical(counts.bg$totals, counts.peaks$totals)
all.equal(counts.bg$totals, counts.peaks$totals)

If the latter is TRUE and the former is not, then it's a type issue. If neither is TRUE, then I would guess that you didn't use the same readParam in the two *Counts calls. If both are TRUE... well, then we wouldn't have any problems.
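To illustrate the "type issue" case, here is a minimal base R sketch. The two vectors are made-up stand-ins for counts.bg$totals and counts.peaks$totals (the values and the names totals.int, totals.dbl and fixed are assumptions for illustration, not output from csaw). It only shows how identical and all.equal behave when the same numbers are stored as integer and as double.

```r
# Hypothetical library sizes; the values are invented for illustration.
totals.int <- c(1200000L, 980000L)  # stored as integer
totals.dbl <- c(1200000, 980000)    # same values stored as double

# identical() compares type as well as value, so it reports a difference here.
same.type.and.value <- identical(totals.int, totals.dbl)

# all.equal() compares numeric values within a tolerance and ignores
# the integer/double distinction.
same.value <- isTRUE(all.equal(totals.int, totals.dbl))

print(same.type.and.value)
print(same.value)

# Coercing both vectors to one type removes the discrepancy.
fixed <- as.integer(totals.dbl)
print(identical(totals.int, fixed))
```

If the real totals behave like these vectors, the values agree and only the storage type differs.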


Thanks for your feedback.

In the end, I changed my code slightly to run windowCounts on all genomic bins and then remove the bins overlapping peaks with the overlap function. That way, I'm sure the lib.size is always based on the full genome information, with the same lib.size.

Thanks again


 

Reply written 9 months ago by Nicolas Servant