Subtracting INPUT reads from ChIP read counts.
1
0
Entering edit mode
xie186 • 0
@xie186-11029
Last seen 3.3 years ago
USA

Hi,

Recently I'm using DiffBind.

After running dba.count, I can use tamoxifen$binding to get a table.

> head(tamoxifen$binding)
  CHR  START    END       Sample1       Sample2       Sample3    Sample4     Sample5
1   1  10400  10999  29.5902750  26.3166045  44.5083833  30.57152   2.491209
2   1  15200  16199   0.6295803   0.7740178   0.6743694   1.09184   2.491209
3   1  97600  99799 130.3231261 181.8941783 195.5671390 230.37824 196.805481
4   1 103000 106799 239.2405214 288.7086320 328.4179196 276.23551  52.315381
5   1 108000 109599  33.9973372  20.1244623  22.2541917  30.57152  12.456043
6   1 110400 111799  23.9240521  24.7685690  18.2079750  21.83680   4.982417

I'm wondering what are these values.  DiffBind author Rory Stark mentioned DiffBind will subtract INPUT reads from ChIP reads and deal with negative values. I'd like to have some details about how these values are calculated from the first place. Can someone help me on this and/or show me an example? 

I tried to read the code. But it's hard for me to follow.

Thanks. 

 

 

 

diffbind ChIP-seq • 2.4k views
ADD COMMENT
0
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 18 hours ago
Cambridge, UK

There are a number of possible "scoring" methods in DiffBind (see man page for dba.count()). The default values are calculated as follows:

  1. For each consensus interval, for each sample, count the overlapping ChIP reads.
  2. Subtract the number of overlapping Input reads.
  3. Set the values to a minimum value of 1 read.
  4. Compute normalization factors using edgeR TMM method, based on the total number of reads in each bam file.

These are the values you are seeing in the $binding matrix. Some notes:

  • These values are not necessarily the ones used for a differential analysis. For example, when invoking dba.analyze(), you can  specify if the Input reads should be subtracted, and if only the number of reads overlapping consensus peaks should be used for normalization. This would not change the values in the global $binding matrix. If you use DESeq2 for analysis (currently the default method), a different normalization will be used as well.
  • Instead of accessing the $binding values directly, you can use dba.peakset() with bRetrieve=TRUE
  • You can change the scoring method without recounting by calling dba.count() with peaks=NULL and setting the score parameter to a valid score, for example dba.count(tamoxifen, peaks=NULL, score=DBA_SCORE_RPKM).
  • In most cases that involve normalized count values, such as dba.report() with bCounts=TRUE, you can view non-normalized values by setting bNormalized=FALSE.

Hope this helps-

Rory

ADD COMMENT
0
Entering edit mode

Thank you so muck for your response. Rory. 

ADD REPLY
0
Entering edit mode

Hi Rory!

I'm trying to scale my bigwig files so the visualization on IGV matches what the DiffBind analysis results, being that the samples are all scaled to be on the same sequencing depth with the input reads subtracted. I know I can scale the coverage in the bedgraph files using the normFactors computed by the analysis; however I can't figure out how to deal with the subtraction of input reads. I was just wondering, by default while using bSubControl=TRUE, do you subtract the raw input reads from the raw ChIP reads? I tried changing the scoring method to both score=DBA_SCORE_RPKM_MINUS or score=DBA_SCORE_READS_MINUS, but in both cases, I get the same library size for each sample. I'm using method=DBA_DESEQ2 as well as the full library sizes for normalization.

I'm a bit confused as you can see! Your help would be much appreciated.

Here is just the bit of test that I did to try to figure out how the scoring impacts the library sizes.

> counts <- dba.count(counts, peaks=NULL, score=DBA_SCORE_READS_MINUS)
> counts <- dba.normalize(counts, method=DBA_DESEQ2)
> norm <- dba.normalize(counts, method=DBA_DESEQ2, bRetrieve=TRUE)
> norm
$norm.method
[1] "lib"

$norm.factors
[1] 1.4139824 0.7911595 0.9840277 0.8108305

$lib.method
[1] "full"

$lib.sizes
[1] 58097868 32507249 40431841 33315494

$control.subtract
[1] TRUE

$filter.value
[1] 1

> counts <- dba.count(counts, peaks=NULL, score=DBA_SCORE_RPKM_MINUS)
> counts <- dba.normalize(counts, method=DBA_DESEQ2)
> norm <- dba.normalize(counts, method=DBA_DESEQ2, bRetrieve=TRUE)
> norm
$norm.method
[1] "lib"

$norm.factors
[1] 1.4139824 0.7911595 0.9840277 0.8108305

$lib.method
[1] "full"

$lib.sizes
[1] 58097868 32507249 40431841 33315494

$control.subtract
[1] TRUE

$filter.value
[1] 1
ADD REPLY
0
Entering edit mode

As you have seen, the library sizes are based on the number of aligned reads and are not altered by subtracting control reads. Control read subtraction only impacts the read counts themselves.

In general, DiffBind has moved away from subtracting control reads, preferring greylists, as this has a more principled statistical justification.

ADD REPLY

Login before adding your answer.

Traffic: 794 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6