Question

bias correction (an example from ChIP-seq)

Bogdan ▴ 670

@bogdan-2367

Last seen 6 months ago

Palo Alto, CA, USA

Dear all, I have a general question about bias correction in ChIP-seq, and I would appreciate your suggestions.

Let's assume that someone did a ChIP-seq experiment for a protein in 2 conditions (a. CTL and b. KD), and we compare: 1. all the genes in the genome ("all") vs. 2. a specific subset ("subset").

After KD, there is some increase in ChIP-seq intensity genome-wide (for "all" genes), although the increase on the genes in "subset" is much stronger. More specifically, looking at the MEDIAN values in each experiment, let's assume:

CTL -- "all" genes : 10

KD -- "all" genes : 20

CTL -- "subset" genes : 15

KD -- "subset" genes : 45

The question is: what type of scaling procedure (besides computing Z-scores) would you recommend to bring CTL "all" genes and KD "all" genes to approximately the same median value?

thank you very much,

-- bogdan

normalization limma edgeR DEseq2 • 919 views

ADD COMMENT • link 7.1 years ago Bogdan ▴ 670

Hi Ryan, thank you for your reply and for the suggestion about csaw. I am working with specific regions that are 1 kb long, where I count the reads.

My question is how I can obtain the matrix of normalized counts from csaw (at the moment, we are not looking much into differential binding). I'm reading the documentation. Thanks a lot ;) !

ADD COMMENT • link 7.1 years ago Bogdan ▴ 670

Yes, I just noticed this old post about cpm(): "How can I get the normalized read counts from TMM?". It helps ;)

ADD REPLY • link 7.1 years ago Bogdan ▴ 670
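For readers who want to see the arithmetic behind normalized counts-per-million, here is a minimal sketch in plain Python. It assumes the usual convention that each sample's effective library size is its library size multiplied by a normalization factor (as TMM-style factors are applied in edgeR, to my understanding). The function name `cpm_normalized`, the counts, the library sizes, and the factors are all made up for illustration; this is not the edgeR or csaw API.

```python
def cpm_normalized(counts, lib_sizes, norm_factors):
    """Counts per million using effective library sizes.

    counts:       list of rows (one per region), each a list with one count per sample
    lib_sizes:    total reads per sample
    norm_factors: per-sample normalization factors (hypothetical values here)
    """
    # Effective library size = library size * normalization factor
    effective = [size * factor for size, factor in zip(lib_sizes, norm_factors)]
    return [
        [count * 1e6 / eff for count, eff in zip(row, effective)]
        for row in counts
    ]


# Two 1 kb regions (rows) counted in two samples (columns: CTL, KD); toy numbers
counts = [[10, 40],
          [15, 90]]
lib_sizes = [1_000_000, 2_000_000]
norm_factors = [0.8, 1.25]  # assumed factors, not estimated from data

normalized = cpm_normalized(counts, lib_sizes, norm_factors)
for row in normalized:
    print(row)
```

The point of the sketch is only that the normalization factor rescales the denominator, so a sample with a larger factor gets smaller normalized values for the same raw count.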

score 4 · Accepted Answer · 2017-03-14

You should read through the normalization section of the csaw User's Guide. It gives a fairly in-depth discussion of normalization issues for ChIP-Seq data. I'm not exactly sure how you're defining a "gene" for ChIP-Seq data, which can detect binding anywhere in the genome, not just in gene bodies or promoters. I'll assume that you're counting all the reads that overlap each gene body, or something similar.

If this is the case, you're probably going to want to use the same normalization strategy as for RNA-seq: either the edgeR or DESeq2 normalization methods should be fine. Broadly, the goal of these normalizations is to set the log fold change of the average gene to zero. Hence, this will normalize out the average genome-wide difference in coverage, and all log fold change values will now have an interpretation of "deviation from the genome-wide average log fold change", which sounds like what you want.
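To make the "deviation from the genome-wide average" interpretation concrete, here is a small worked sketch in plain Python using the medians from the question. It uses simple median scaling as a stand-in, which is an assumption on my part: it is not the TMM (edgeR) or median-of-ratios (DESeq2) procedure, and the per-gene values and the helper `median_scale_factor` are invented so that the medians match the question.

```python
import math
from statistics import median


def median_scale_factor(reference, sample):
    """Factor that moves the median of `sample` onto the median of `reference`."""
    return median(reference) / median(sample)


# Toy per-gene signal values, chosen so the medians match the question
ctl_all = [5, 10, 15]    # median 10
kd_all = [10, 20, 45]    # median 20
ctl_subset_median = 15
kd_subset_median = 45

# Scale KD so that its genome-wide median equals the CTL genome-wide median
factor = median_scale_factor(ctl_all, kd_all)      # 10 / 20 = 0.5
kd_subset_scaled = kd_subset_median * factor       # 45 * 0.5 = 22.5

# After scaling, the genome-wide median shows no change ...
lfc_all = math.log2(median(kd_all) * factor / median(ctl_all))   # log2(1) = 0.0
# ... and the subset keeps only its excess over the genome-wide shift
lfc_subset = math.log2(kd_subset_scaled / ctl_subset_median)     # log2(1.5), about 0.585

print(factor, kd_subset_scaled, lfc_all, round(lfc_subset, 3))
```

With these numbers the raw subset change is 3-fold (15 to 45), the genome-wide change is 2-fold (10 to 20), and the scaled subset change is 1.5-fold, which is the part of the effect specific to the subset.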