Question

Subtracting INPUT reads from ChIP read counts.

0

Entering edit mode

xie186 • 0

@xie186-11029

Last seen 2.8 years ago

USA

Hi,

Recently I'm using DiffBind.

After running dba.count, I can use tamoxifen$binding to get a table.

> head(tamoxifen$binding)
CHR START END Sample1 Sample2 Sample3 Sample4 Sample5
1 1 10400 10999 29.5902750 26.3166045 44.5083833 30.57152 2.491209
2 1 15200 16199 0.6295803 0.7740178 0.6743694 1.09184 2.491209
3 1 97600 99799 130.3231261 181.8941783 195.5671390 230.37824 196.805481
4 1 103000 106799 239.2405214 288.7086320 328.4179196 276.23551 52.315381
5 1 108000 109599 33.9973372 20.1244623 22.2541917 30.57152 12.456043
6 1 110400 111799 23.9240521 24.7685690 18.2079750 21.83680 4.982417

I'm wondering what are these values. DiffBind author Rory Stark mentioned DiffBind will subtract INPUT reads from ChIP reads and deal with negative values. I'd like to have some details about how these values are calculated from the first place. Can someone help me on this and/or show me an example?

I tried to read the code. But it's hard for me to follow.

Thanks.

diffbind ChIP-seq • 2.1k views

ADD COMMENT • link updated 6 months ago by Rory Stark ★ 5.2k • written 8.1 years ago by xie186 • 0

score 0 · Answer 1 · 2016-07-05

0

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 3 days ago

Cambridge, UK

There are a number of possible "scoring" methods in DiffBind (see man page for dba.count()). The default values are calculated as follows:

For each consensus interval, for each sample, count the overlapping ChIP reads.
Subtract the number of overlapping Input reads.
Set the values to a minimum value of 1 read.
Compute normalization factors using edgeR TMM method, based on the total number of reads in each bam file.

These are the values you are seeing in the $binding matrix. Some notes:

These values are not necessarily the ones used for a differential analysis. For example, when invoking dba.analyze(), you can specify if the Input reads should be subtracted, and if only the number of reads overlapping consensus peaks should be used for normalization. This would not change the values in the global $binding matrix. If you use DESeq2 for analysis (currently the default method), a different normalization will be used as well.
Instead of accessing the $binding values directly, you can use dba.peakset() with bRetrieve=TRUE.
You can change the scoring method without recounting by calling dba.count() with peaks=NULL and setting the score parameter to a valid score, for example dba.count(tamoxifen, peaks=NULL, score=DBA_SCORE_RPKM).
In most cases that involve normalized count values, such as dba.report() with bCounts=TRUE, you can view non-normalized values by setting bNormalized=FALSE.

Hope this helps-

Rory

ADD COMMENT • link 8.1 years ago Rory Stark ★ 5.2k

0

Entering edit mode

Thank you so muck for your response. Rory.

ADD REPLY • link 8.1 years ago xie186 • 0

0

Entering edit mode

Hi Rory!

I'm trying to scale my bigwig files so the visualization on IGV matches what the DiffBind analysis results, being that the samples are all scaled to be on the same sequencing depth with the input reads subtracted. I know I can scale the coverage in the bedgraph files using the normFactors computed by the analysis; however I can't figure out how to deal with the subtraction of input reads. I was just wondering, by default while using bSubControl=TRUE, do you subtract the raw input reads from the raw ChIP reads? I tried changing the scoring method to both score=DBA_SCORE_RPKM_MINUS or score=DBA_SCORE_READS_MINUS, but in both cases, I get the same library size for each sample. I'm using method=DBA_DESEQ2 as well as the full library sizes for normalization.

I'm a bit confused as you can see! Your help would be much appreciated.

Here is just the bit of test that I did to try to figure out how the scoring impacts the library sizes.

> counts <- dba.count(counts, peaks=NULL, score=DBA_SCORE_READS_MINUS)
> counts <- dba.normalize(counts, method=DBA_DESEQ2)
> norm <- dba.normalize(counts, method=DBA_DESEQ2, bRetrieve=TRUE)
> norm
$norm.method
[1] "lib"

$norm.factors
[1] 1.4139824 0.7911595 0.9840277 0.8108305

$lib.method
[1] "full"

$lib.sizes
[1] 58097868 32507249 40431841 33315494

$control.subtract
[1] TRUE

$filter.value
[1] 1

> counts <- dba.count(counts, peaks=NULL, score=DBA_SCORE_RPKM_MINUS)
> counts <- dba.normalize(counts, method=DBA_DESEQ2)
> norm <- dba.normalize(counts, method=DBA_DESEQ2, bRetrieve=TRUE)
> norm
$norm.method
[1] "lib"

$norm.factors
[1] 1.4139824 0.7911595 0.9840277 0.8108305

$lib.method
[1] "full"

$lib.sizes
[1] 58097868 32507249 40431841 33315494

$control.subtract
[1] TRUE

$filter.value
[1] 1

ADD REPLY • link 8 months ago ann-sophie.gironne • 0

0

Entering edit mode

As you have seen, the library sizes are based on the number of aligned reads and are not altered by subtracting control reads. Control read subtraction only impacts the read counts themselves.

In general, DiffBind has moved away from subtracting control reads, preferring greylists, as this has a more principled statistical justification.

ADD REPLY • link 6 months ago Rory Stark ★ 5.2k