normalization of POL2 ChIP-seq for the calculation of POL2 Pausing Ratios
Bogdan ▴ 670
@bogdan-2367
Palo Alto, CA, USA

Dear all,

Could you please advise on a way (i.e. a package or function in Bioconductor) to normalize ChIP-seq data for calculating the Polymerase II pausing ratio (PR) for all genes in the genome?

We have a Pol2 ChIP-seq dataset, and we compute the pausing ratio (PR) as the ratio of the read densities in two regions, a) divided by b):

a. -- the read density in the region (-100, +300) around the TSS

b. -- the read density in the region (+1kb, +5kb) downstream of the TSS

I know that we could normalize the ChIP-seq data in edgeR or DESeq2.

Another question: should we normalize the read counts in regions a) and b) above separately or together, or would it be legitimate to normalize the PR (pausing ratio) itself?

 

thanks,

bogdan

edgeR deseq2 normalization
Aaron Lun ★ 28k
@alun
The city by the bay

Check out my answer to a related question here:

A: Differential Pol2 pausing analysis

Briefly: you shouldn't need to perform sample-specific normalization, as the PR is computed within each sample. Any sample-specific (scaling) bias will affect both the upstream and downstream regions and thus cancel out*. However, there is a need to perform region-specific normalization. One should at least correct for the sizes of the regions involved, but there are also other things like GC content, mappability, etc. that are harder to deal with.
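To make the cancellation concrete, here is a minimal sketch in plain Python. The region coordinates come from the question; the function names and the read counts are made up for illustration, and only the region widths are corrected for (not GC content or mappability).

```python
# Region coordinates relative to the TSS, as given in the question.
PROMOTER = (-100, 300)    # region a)
GENE_BODY = (1000, 5000)  # region b)


def region_width(region):
    """Width of a (start, end) region in bp (hypothetical helper)."""
    start, end = region
    return end - start


def pausing_ratio(promoter_count, body_count):
    """Ratio of read densities (reads per bp) in regions a) and b)."""
    promoter_density = promoter_count / region_width(PROMOTER)
    body_density = body_count / region_width(GENE_BODY)
    return promoter_density / body_density


# Toy counts for one gene in one sample (invented numbers).
pr = pausing_ratio(promoter_count=200, body_count=400)

# A sample-wide scaling bias (say, 3x more reads overall) multiplies both
# counts by the same factor, so the ratio does not change.
pr_scaled = pausing_ratio(promoter_count=3 * 200, body_count=3 * 400)

print(pr, pr_scaled)
```

Dividing by the region widths is the "at least" part of the region-specific correction; without it, the 4 kb gene-body window would collect ten times more reads than the 400 bp promoter window at equal density.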

More generally: the PR is easiest to interpret in a relative sense, when you're comparing between PRs of different conditions. This is because any region-specific biases in the calculation of the PR will cancel out between conditions. It's harder to interpret the PR in an absolute sense due to the presence of these biases. I guess you could hope that these biases are mostly the same between genes (probably not true), and just use the PR for a rough comparison across genes.
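As a rough illustration of why the relative comparison is easier, the sketch below assumes a gene-specific bias that multiplies the observed PR by the same unknown factor in both conditions. All numbers and names are hypothetical, and real biases need not be this well behaved.

```python
import math


def log2_pr_change(pr_treated, pr_control):
    """log2 fold change in pausing ratio between two conditions for one gene."""
    return math.log2(pr_treated / pr_control)


# Hypothetical "true" pausing ratios for one gene in two conditions.
true_pr_control = 4.0
true_pr_treated = 8.0

# Hypothetical region-specific bias (GC content, mappability, ...) that
# distorts the observed PR by the same factor in both conditions.
region_bias = 1.7

observed_control = true_pr_control * region_bias
observed_treated = true_pr_treated * region_bias

# The absolute PRs are off by the bias factor, but the factor cancels in the
# between-condition ratio, so the true two-fold change is recovered.
change = log2_pr_change(observed_treated, observed_control)
print(observed_control, observed_treated, change)
```

Comparing the observed PR of this gene to that of another gene would not enjoy the same cancellation, since each gene would carry its own bias factor.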

*: In theory. In practice, there are likely to be situations where Pol2 binding occurs only in one of the regions, which may then be affected by efficiency biases across samples. You may be able to diagnose this with MA plots of all upstream/downstream regions, and if there is a trend with increasing abundance, perform normalization with normOffsets from csaw.
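For the diagnostic, the M and A values behind such a plot can be sketched as below. This is plain Python with invented counts and a hypothetical helper name; it only computes the values you would plot and does not reproduce the loess-based correction that csaw performs.

```python
import math


def ma_values(counts_x, counts_y, prior=0.5):
    """Return (M, A) pairs comparing the same regions in two samples.

    M is the log2 ratio of counts and A is the average log2 count. The small
    prior count is an arbitrary choice to avoid taking the log of zero.
    """
    pairs = []
    for x, y in zip(counts_x, counts_y):
        log_x = math.log2(x + prior)
        log_y = math.log2(y + prior)
        pairs.append((log_x - log_y, (log_x + log_y) / 2))
    return pairs


# Toy counts for the same four regions (promoter-proximal and gene-body
# regions pooled together) in two samples.
sample_1 = [10, 50, 200, 1000]
sample_2 = [10, 40, 120, 400]

# Sort by abundance (A) and look at how M behaves along that axis.
ma = sorted(ma_values(sample_1, sample_2), key=lambda pair: pair[1])
m_by_abundance = [m for m, _ in ma]
print(m_by_abundance)
```

In this toy data M climbs steadily with A, which is the kind of abundance-dependent trend that a single scaling factor cannot remove. A flat cloud of M values shifted away from zero would instead point to a plain scaling difference, which cancels in the PR anyway.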

Thanks, Aaron, for your reply and for the link to potential solutions.
