Subtracting INPUT reads from ChIP read counts.
xie186 • 0
@xie186-11029
Last seen 22 months ago
USA

Hi,

I have recently started using DiffBind.

After running dba.count, I can use tamoxifen$binding to get a table.

> head(tamoxifen$binding)
  CHR  START    END       Sample1       Sample2       Sample3    Sample4     Sample5
1   1  10400  10999  29.5902750  26.3166045  44.5083833  30.57152   2.491209
2   1  15200  16199   0.6295803   0.7740178   0.6743694   1.09184   2.491209
3   1  97600  99799 130.3231261 181.8941783 195.5671390 230.37824 196.805481
4   1 103000 106799 239.2405214 288.7086320 328.4179196 276.23551  52.315381
5   1 108000 109599  33.9973372  20.1244623  22.2541917  30.57152  12.456043
6   1 110400 111799  23.9240521  24.7685690  18.2079750  21.83680   4.982417

I'm wondering what these values are. DiffBind author Rory Stark mentioned that DiffBind subtracts INPUT reads from ChIP reads and deals with negative values. I'd like some details about how these values are calculated in the first place. Can someone help me with this and/or show me an example?

I tried to read the code, but it's hard for me to follow.

Thanks. 

diffbind ChIP-seq • 946 views
Rory Stark ★ 4.1k
@rory-stark-5741
Last seen 1 day ago
CRUK, Cambridge, UK

There are a number of possible "scoring" methods in DiffBind (see the man page for dba.count()). The default values are calculated as follows:

  1. For each consensus interval, for each sample, count the overlapping ChIP reads.
  2. Subtract the number of overlapping Input reads.
  3. Set any resulting value below 1 (including negative values) to a minimum of 1 read.
  4. Compute normalization factors using the edgeR TMM method, based on the total number of reads in each bam file.
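
As a rough illustration of steps 1 to 3, here is a toy sketch in base R. It is not DiffBind's internal code: the count matrices, library sizes, and variable names are all made up for the example. The last part uses plain library-size scaling as a simplified stand-in for step 4; it is not the edgeR TMM calculation, and is only there to show how a floored count of 1 can become a non-integer value below 1 once the counts are normalized.

```r
# Toy data (hypothetical): reads overlapping 3 consensus intervals in 2 samples.
chip <- matrix(c(40, 3, 150,    # Sample1 ChIP counts
                 35, 1, 210),   # Sample2 ChIP counts
               nrow = 3,
               dimnames = list(paste0("interval", 1:3),
                               c("Sample1", "Sample2")))

# Matching Input (control) counts for the same intervals (hypothetical).
input <- matrix(c(10, 5, 20,
                  8, 4, 30),
                nrow = 3,
                dimnames = dimnames(chip))

# Step 2: subtract the overlapping Input reads from the ChIP reads.
subtracted <- chip - input

# Step 3: anything below 1 (including negatives) is set to 1 read.
floored <- pmax(subtracted, 1)

# Simplified stand-in for step 4 (NOT edgeR TMM): scale each sample by its
# library size relative to the mean library size. Library sizes are made up.
lib_sizes <- c(Sample1 = 20e6, Sample2 = 25e6)
scale_factors <- lib_sizes / mean(lib_sizes)
normalized <- sweep(floored, 2, scale_factors, "/")

print(subtracted)
print(floored)
print(normalized)
```

In this toy example interval2 ends up negative after subtraction in both samples, is raised to 1 read, and is then rescaled, so its final value in Sample2 is below 1.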

These are the values you are seeing in the $binding matrix. Some notes:

  • These values are not necessarily the ones used for a differential analysis. For example, when invoking dba.analyze(), you can specify whether the Input reads should be subtracted, and whether only the number of reads overlapping consensus peaks should be used for normalization. This would not change the values in the global $binding matrix. If you use DESeq2 for analysis (currently the default method), a different normalization will be used as well.
  • Instead of accessing the $binding values directly, you can use dba.peakset() with bRetrieve=TRUE.
  • You can change the scoring method without recounting by calling dba.count() with peaks=NULL and setting the score parameter to a valid score, for example dba.count(tamoxifen, peaks=NULL, score=DBA_SCORE_RPKM).
  • In most cases that involve normalized count values, such as dba.report() with bCounts=TRUE, you can view non-normalized values by setting bNormalized=FALSE.

Hope this helps-

Rory


Thank you so much for your response, Rory.
