bias correction (an example from ChIP-seq)
Bogdan ▴ 670
@bogdan-2367
Last seen 14 months ago
Palo Alto, CA, USA

Dear all, I have a general question about bias correction in ChIP-seq, and I would appreciate your suggestions.

Let's assume that someone did a ChIP-seq experiment for a protein in two conditions (a. CTL and b. KD), and we compare: 1. all the genes in the genome ("all") vs. 2. a specific subset ("subset").

After KD, there is some increase in ChIP-seq intensity genome-wide (for "all" genes), although the increase on the genes in "subset" is much stronger. More specifically, looking at the MEDIAN values in each experiment, let's assume:

CTL -- "all" genes : 10

KD  --  "all" genes : 20

CTL -- "subset" genes : 15

KD -- "subset" genes : 45

The question is: what type of scaling procedure (besides computing Z-scores) would you recommend in order to bring CTL "all" genes and KD "all" genes to approximately the same median value?

Thank you very much,

-- bogdan

 

 

normalization limma edgeR DESeq2
@ryan-c-thompson-5618
Last seen 12 weeks ago
Icahn School of Medicine at Mount Sinai…

You should read through the normalization section of the csaw User's Guide. It gives a fairly in-depth discussion of normalization issues for ChIP-seq data. I'm not exactly sure how you're defining a "gene" for ChIP-seq data, which can detect binding anywhere in the genome, not just in gene bodies or promoters. I'll assume that you're counting all the reads that overlap each gene body, or something similar.

If this is the case, you're probably going to want to use the same normalization strategy as for RNA-seq: either the edgeR or DESeq2 normalization methods should be fine. Broadly, the goal of these normalizations is to set the log fold change of the average gene to zero. Hence, this will normalize out the average genome-wide difference in coverage, and all log fold change values will now have an interpretation of "deviation from the genome-wide average log fold change", which sounds like what you want.
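To make the "set the log fold change of the average gene to zero" idea concrete, here is a minimal sketch using the four medians quoted in the question. It uses plain median scaling, which is a simplification and not the TMM (edgeR) or median-of-ratios (DESeq2) procedure; the function name `scaling_factors` and the choice of CTL as the reference are assumptions made for illustration only.

```python
import math

# Median ChIP-seq intensities quoted in the question (illustrative values).
medians = {
    ("CTL", "all"): 10.0,
    ("KD", "all"): 20.0,
    ("CTL", "subset"): 15.0,
    ("KD", "subset"): 45.0,
}

def scaling_factors(medians, reference="CTL"):
    """Per-condition factor that maps each 'all' median onto the reference's.

    Hypothetical helper: simple median scaling, not TMM or median-of-ratios.
    """
    ref = medians[(reference, "all")]
    return {
        cond: ref / value
        for (cond, group), value in medians.items()
        if group == "all"
    }

factors = scaling_factors(medians)

# Apply each condition's factor to every value measured in that condition.
scaled = {key: value * factors[key[0]] for key, value in medians.items()}

# After scaling, the genome-wide change is zero by construction, and the
# subset's log fold change is its deviation from the genome-wide change.
log2fc_all = math.log2(scaled[("KD", "all")] / scaled[("CTL", "all")])
log2fc_subset = math.log2(scaled[("KD", "subset")] / scaled[("CTL", "subset")])

print(factors)
print(log2fc_all, log2fc_subset)
```

With these numbers the KD sample is scaled by 0.5, so the subset goes from a raw 3-fold increase (15 to 45) to a 1.5-fold increase relative to the genome-wide shift (15 to 22.5).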

Bogdan ▴ 670
@bogdan-2367
Last seen 14 months ago
Palo Alto, CA, USA

Hi Ryan, thank you for your reply and the suggestion about csaw. I am working with specific regions that are 1 kb long, where I count the reads.

The question is: how can I obtain the matrix of normalized counts from csaw? (At the moment, we are not looking much into differential binding.) I'm reading the documentation. Thanks a lot ;)!

 


Yes, I just noticed this old post about cpm(): "How can I get the normalized read counts from TMM?". It helps ;)
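For anyone wondering what "normalized counts" means arithmetically here, the sketch below shows counts per million computed against an effective library size (library size times a per-sample normalization factor), which is my understanding of what edgeR's cpm() does once normalization factors are set. This is an illustration in Python rather than the edgeR code itself; the counts and the factors are made up, and in practice the factors would be estimated from the data (for example by TMM) rather than typed in.

```python
def cpm(counts, norm_factors=None):
    """Counts per million using effective library sizes.

    counts: list of rows (one per region), each a list of per-sample counts.
    norm_factors: optional per-sample scaling factors; defaults to 1.0 each.
    """
    n_samples = len(counts[0])
    # Library size = total count per sample (column sum).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    if norm_factors is None:
        norm_factors = [1.0] * n_samples
    # Effective library size = library size * normalization factor.
    effective = [size * factor for size, factor in zip(lib_sizes, norm_factors)]
    return [
        [row[j] / effective[j] * 1e6 for j in range(n_samples)]
        for row in counts
    ]

# Hypothetical counts for three 1 kb regions in two samples (CTL, KD).
counts = [[10, 40], [30, 60], [60, 100]]
# Hypothetical normalization factors, chosen only to keep the arithmetic simple.
norm_factors = [0.5, 2.0]

raw_cpm = cpm(counts)
normalized_cpm = cpm(counts, norm_factors)

for region, values in enumerate(normalized_cpm, start=1):
    print(region, values)
```

The resulting matrix has one row per region and one column per sample, so it can be used directly for plotting or for comparing medians between conditions.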
