Question

normalization of POL2 ChIP-seq for the calculation of POL2 Pausing Ratios

Bogdan ▴ 670

@bogdan-2367

Palo Alto, CA, USA

Dear all,

please could you advise about a way (i.e. a package/function in BioC) to normalize ChIP-seq data for the calculation of the POLYMERASE 2 PAUSING RATIO (PR) for all the genes in the genome?

We have a POL2 ChIP-seq dataset, and we compute the PAUSING RATIO (PR) as the ratio of the following two quantities (a divided by b):

a. -- the read density in a region (-100,+300) around the TSS

b. -- the read density in a region (+1kb,+5kb) downstream of TSS

I know that we could normalize the ChIP-seq data in edgeR or DESeq2.

Another question: should we normalize the read counts in regions a) and b) above separately or together, or would it be legitimate to normalize the PR (Pausing Ratio) itself?

thanks,

bogdan

edgeR deseq2 normalization • 766 views

updated 6.6 years ago by Aaron Lun ★ 28k • written 6.6 years ago by Bogdan ▴ 670

score 1 · Answer 1 · 2017-08-29

Check out my answer to a related question here:

A: Differential Pol2 pausing analysis

Briefly: you shouldn't need to perform sample-specific normalization, as the PR is computed within each sample. Any sample-specific (scaling) bias will affect both the upstream and downstream regions and thus cancel out*. However, there is a need to perform region-specific normalization. One should at least correct for the sizes of the regions involved, but there are also other things like GC content, mappability, etc. that are harder to deal with.
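To make the two points above concrete, here is a minimal sketch in plain Python (not a Bioconductor function). It assumes the region definitions from the question, (-100,+300) and (+1kb,+5kb) relative to the TSS; the function names and the read counts are hypothetical and only for illustration. It corrects for region size by dividing each count by the region width, and shows that a sample-wide scaling factor leaves the PR unchanged.

```python
def read_density(count, start, end):
    """Reads per base pair for a region [start, end) given relative to the TSS."""
    width = end - start
    if width <= 0:
        raise ValueError("region must have positive width")
    return count / width


def pausing_ratio(promoter_count, body_count,
                  promoter=(-100, 300), body=(1000, 5000)):
    """Pausing ratio = promoter-proximal density / gene-body density.

    The default coordinates are the regions from the question; the counts
    are assumed to be raw read counts from a single sample.
    """
    promoter_density = read_density(promoter_count, *promoter)
    body_density = read_density(body_count, *body)
    if body_density == 0:
        return float("nan")  # PR is undefined without gene-body signal
    return promoter_density / body_density


# Hypothetical counts for one gene in one sample.
pr = pausing_ratio(200, 400)

# A sample-specific scaling bias (here 3x) hits both regions and cancels.
pr_scaled = pausing_ratio(200 * 3, 400 * 3)

print(pr, pr_scaled)
```

Only the size correction is handled here; GC content and mappability would need a separate treatment.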

More generally: the PR is easiest to interpret in a relative sense, when you're comparing between PRs of different conditions. This is because any region-specific biases in the calculation of the PR will cancel out between conditions. It's harder to interpret the PR in an absolute sense due to the presence of these biases. I guess you could hope that these biases are mostly the same between genes (probably not true), and just use the PR for a rough comparison across genes.
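The cancellation between conditions can be illustrated with a small sketch, again in plain Python with made-up numbers. The helper name `log2_pr_change` and the bias factor are hypothetical. The assumption is that a region-specific bias multiplies the count of that region by the same factor in both conditions.

```python
import math


def log2_pr_change(promoter_a, body_a, promoter_b, body_b):
    """log2(PR_B / PR_A) for one gene in conditions A and B.

    The same regions are used in both conditions, so the region widths
    cancel and the counts can be used directly.
    """
    pr_a = promoter_a / body_a
    pr_b = promoter_b / body_b
    return math.log2(pr_b / pr_a)


# Hypothetical counts: promoter signal doubles in condition B.
unbiased = log2_pr_change(100, 400, 200, 400)

# Assumed promoter-specific bias (e.g. from GC content), identical in A and B.
bias = 1.7
biased = log2_pr_change(100 * bias, 400, 200 * bias, 400)

print(unbiased, biased)
```

The absolute PR in each condition changes with the bias, but the log-fold change between conditions does not, which is why the relative comparison is easier to interpret.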

*: In theory. In practice, there are likely to be situations where Pol2 binding occurs only in one of the regions, which may then be affected by efficiency biases across samples. You may be able to diagnose this with MA plots of all upstream/downstream regions, and if there is a trend with increasing abundance, perform normalization with normOffsets from csaw.
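As a rough illustration of the diagnostic, the sketch below computes M and A values for a set of regions between two samples and compares the median M in the high-abundance half with the low-abundance half. This is plain Python, not csaw; the function names, the prior count, and the counts are assumptions for illustration, and the crude half-versus-half comparison merely stands in for inspecting the MA plot by eye.

```python
import math
from statistics import median


def ma_values(counts_1, counts_2, prior=1.0):
    """Per-region M (log2 ratio) and A (mean log2 abundance) between two samples.

    A prior count is added to avoid taking the log of zero.
    """
    m, a = [], []
    for c1, c2 in zip(counts_1, counts_2):
        l1 = math.log2(c1 + prior)
        l2 = math.log2(c2 + prior)
        m.append(l2 - l1)
        a.append((l1 + l2) / 2)
    return m, a


def trend_gap(m, a):
    """Median M in the high-abundance half minus median M in the low half."""
    order = sorted(range(len(a)), key=lambda i: a[i])
    half = len(order) // 2
    low = [m[i] for i in order[:half]]
    high = [m[i] for i in order[half:]]
    return median(high) - median(low)


# Hypothetical counts over six regions (upstream and downstream pooled).
sample_1 = [7, 15, 31, 63, 127, 255]
sample_2 = [7, 15, 31, 127, 255, 511]

m, a = ma_values(sample_1, sample_2)
gap = trend_gap(m, a)
print(gap)
```

A gap far from zero suggests an abundance-dependent trend of the kind that would call for the trended normalization mentioned above.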