
DiffBind confusions

skyyks123

Dear friends,

While using the latest release (3.13), I ran into a few problems and would appreciate your help.

I repeated the whole process following the vignette with its example data sets, but my final result (246 significantly differentially bound sites; see the attached file) differs from the number (249) reported in the vignette.
Those 246 sites were found with the DESeq2 method (the default), but if I set method = DBA_EDGER in the analysis step, I get 0 significantly differentially bound sites, which is quite confusing. What is going wrong here (see the attached file)?
As for the normalization step: it is new, since older versions had no normalization function. Is this step necessary? DESeq2 and edgeR each have their own way of normalizing data, so should we first normalize the data with dba.normalize() and then normalize it again with DESeq2 or edgeR via dba.analyze()? Or does dba.analyze() take the normalization (i.e. the size factors) from dba.normalize() and carry on from there?

Many thanks for your kind help!

My code is shown below:


> tamoxifen
11 Samples, 2845 sites in matrix:
       ID Tissue Factor  Condition  Treatment Replicate   Reads FRiP
1  BT4741  BT474     ER  Resistant Full-Media         1  652697 0.16
2  BT4742  BT474     ER  Resistant Full-Media         2  663370 0.15
3   MCF71   MCF7     ER Responsive Full-Media         1  346429 0.31
4   MCF72   MCF7     ER Responsive Full-Media         2  368052 0.19
5   MCF73   MCF7     ER Responsive Full-Media         3  466273 0.25
6   T47D1   T47D     ER Responsive Full-Media         1  399879 0.11
7   T47D2   T47D     ER Responsive Full-Media         2 1475415 0.06
8  MCF7r1   MCF7     ER  Resistant Full-Media         1  616630 0.22
9  MCF7r2   MCF7     ER  Resistant Full-Media         2  593224 0.14
10  ZR751   ZR75     ER Responsive Full-Media         1  706836 0.33
11  ZR752   ZR75     ER Responsive Full-Media         2 2575408 0.22
> plot(tamoxifen)
> #3.Normalizing the data
> tamoxifen <- dba.normalize(tamoxifen)
> tamoxifen
11 Samples, 2845 sites in matrix:
       ID Tissue Factor  Condition  Treatment Replicate   Reads FRiP
1  BT4741  BT474     ER  Resistant Full-Media         1  652697 0.16
2  BT4742  BT474     ER  Resistant Full-Media         2  663370 0.15
3   MCF71   MCF7     ER Responsive Full-Media         1  346429 0.31
4   MCF72   MCF7     ER Responsive Full-Media         2  368052 0.19
5   MCF73   MCF7     ER Responsive Full-Media         3  466273 0.25
6   T47D1   T47D     ER Responsive Full-Media         1  399879 0.11
7   T47D2   T47D     ER Responsive Full-Media         2 1475415 0.06
8  MCF7r1   MCF7     ER  Resistant Full-Media         1  616630 0.22
9  MCF7r2   MCF7     ER  Resistant Full-Media         2  593224 0.14
10  ZR751   ZR75     ER Responsive Full-Media         1  706836 0.33
11  ZR752   ZR75     ER Responsive Full-Media         2 2575408 0.22
> tamoxifen <- dba.contrast(tamoxifen)
Computing results names...
> tamoxifen
11 Samples, 2845 sites in matrix:
       ID Tissue Factor  Condition  Treatment Replicate   Reads FRiP
1  BT4741  BT474     ER  Resistant Full-Media         1  652697 0.16
2  BT4742  BT474     ER  Resistant Full-Media         2  663370 0.15
3   MCF71   MCF7     ER Responsive Full-Media         1  346429 0.31
4   MCF72   MCF7     ER Responsive Full-Media         2  368052 0.19
5   MCF73   MCF7     ER Responsive Full-Media         3  466273 0.25
6   T47D1   T47D     ER Responsive Full-Media         1  399879 0.11
7   T47D2   T47D     ER Responsive Full-Media         2 1475415 0.06
8  MCF7r1   MCF7     ER  Resistant Full-Media         1  616630 0.22
9  MCF7r2   MCF7     ER  Resistant Full-Media         2  593224 0.14
10  ZR751   ZR75     ER Responsive Full-Media         1  706836 0.33
11  ZR752   ZR75     ER Responsive Full-Media         2 2575408 0.22

Design: [~Condition] | 1 Contrast:
     Factor      Group Samples    Group2 Samples2
1 Condition Responsive       7 Resistant        4
> tamoxifen <- dba.analyze(tamoxifen)
Applying Blacklist/Greylists...
Genome detected: Hsapiens.UCSC.hg19
Applying blacklist...
Removed: 1 of 2845 intervals.
Counting control reads for greylist...
Building greylist: C:/Users/Tao/Documents/Test/DiffBind_Vignette/reads/Chr18_BT474_input.bam
coverage: 166912 bp (0.21%)
Building greylist: C:/Users/Tao/Documents/Test/DiffBind_Vignette/reads/Chr18_MCF7_input.bam
coverage: 106495 bp (0.14%)
Building greylist: C:/Users/Tao/Documents/Test/DiffBind_Vignette/reads/Chr18_T47D_input.bam
coverage: 56832 bp (0.07%)
Building greylist: C:/Users/Tao/Documents/Test/DiffBind_Vignette/reads/Chr18_TAMR_input.bam
coverage: 122879 bp (0.16%)
Building greylist: C:/Users/Tao/Documents/Test/DiffBind_Vignette/reads/Chr18_ZR75_input.bam
coverage: 68608 bp (0.09%)
BT474c: 58 ranges, 166912 bases
MCF7c: 14 ranges, 106495 bases
T47Dc: 11 ranges, 56832 bases
TAMRc: 10 ranges, 122879 bases
ZR75c: 12 ranges, 68608 bases
Master greylist: 69 ranges, 251391 bases
Removed: 50 of 2844 intervals.
Removed 51 (of 2845) consensus peaks.
Normalize DESeq2 with defaults...
Forming default model design and contrast(s)...
Computing results names...
Analyzing...
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
> dba.show(tamoxifen, bContrasts=TRUE)
     Factor      Group Samples    Group2 Samples2 DB.DESeq2
1 Condition Responsive       7 Resistant        4       249
> tamoxifen <- dba.analyze(tamoxifen,method = DBA_EDGER)
Normalize edgeR with defaults...
Analyzing...
> dba.show(tamoxifen, bContrasts=TRUE)
     Factor      Group Samples    Group2 Samples2 DB.edgeR DB.DESeq2
1 Condition Responsive       7 Resistant        4        0       249
sessionInfo( )

Written 2.9 years ago by skyyks123; link updated 2.9 years ago by Rory Stark

Answer · 2021-06-08

The difference arises because the vignette calls dba.blacklist() before dba.count(). In your run, the blacklists and greylists were instead applied by dba.analyze(), after counting, i.e. to the re-centered consensus set, which changes things just enough to give slightly different results. I'll update the vignette to make this clear. To get the same results as in the vignette, call dba.blacklist(tamoxifen) before dba.count(tamoxifen). BTW, I think you got the numbers backwards: the attached file shows 249, while the vignette shows 246.
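For reference, the reordered workflow might look like the sketch below. It assumes the working directory is the unpacked vignette data folder (the one containing reads/ and peaks/, as in the paths in your log) and that the tamoxifen.csv sample sheet shipped in the package's extra/ directory is used; adjust the paths if your setup differs. The only substantive change from your session is the explicit dba.blacklist() call before dba.count().

```r
library(DiffBind)

# Assumption: working directory is the unpacked vignette data folder,
# and the sample sheet is the one shipped with the DiffBind package.
samples <- read.csv(file.path(system.file("extra", package = "DiffBind"),
                              "tamoxifen.csv"))
tamoxifen <- dba(sampleSheet = samples)

# Apply blacklists/greylists to the peaksets *before* counting, instead of
# letting dba.analyze() apply them to the counted consensus set afterwards
tamoxifen <- dba.blacklist(tamoxifen)
tamoxifen <- dba.count(tamoxifen)

# Remaining steps as in your session
tamoxifen <- dba.normalize(tamoxifen)
tamoxifen <- dba.contrast(tamoxifen)
tamoxifen <- dba.analyze(tamoxifen)

# Contrast summary, including the DB.DESeq2 count
res <- dba.show(tamoxifen, bContrasts = TRUE)
print(res)
```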

You are correct that edgeR finds zero differentially bound sites. In this case the cause is the model itself: edgeR is more sensitive to having MCF7 cells on both sides of the contrast, and needs a multi-factor design formula, "~Tissue + Condition", as discussed in Section 5 of the vignette (Example: Multi-factor designs).
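A minimal sketch of that multi-factor setup follows, under the same assumptions about file locations as the vignette workflow (working directory is the vignette data folder, sample sheet from the package). It passes the design formula to dba.contrast() together with an explicit (factor, group1, group2) contrast on Condition, then runs edgeR and DESeq2 in one dba.analyze() call so the two DB counts appear side by side. The exact numbers you get are not asserted here.

```r
library(DiffBind)

# Assumption: same vignette data layout as in the workflow above
samples <- read.csv(file.path(system.file("extra", package = "DiffBind"),
                              "tamoxifen.csv"))
tamoxifen <- dba(sampleSheet = samples)
tamoxifen <- dba.blacklist(tamoxifen)
tamoxifen <- dba.count(tamoxifen)

# Include Tissue in the design so the tissue effect is modelled separately
# from Condition (MCF7 samples are present in both Condition groups)
tamoxifen <- dba.contrast(tamoxifen,
                          design   = "~Tissue + Condition",
                          contrast = c("Condition", "Resistant", "Responsive"))

# Run both edgeR and DESeq2 (normalization defaults are applied if
# dba.normalize() has not been called)
tamoxifen <- dba.analyze(tamoxifen, method = DBA_ALL_METHODS)

res <- dba.show(tamoxifen, bContrasts = TRUE)
print(res)
```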

dba.normalize() does not actually normalize anything; it only computes the size factors and normalization factors (possibly using facilities built into edgeR and/or DESeq2). dba.analyze() then passes these factors to the underlying analysis package, so the data are not normalized twice. In previous versions of DiffBind, normalization was controlled by parameters to dba.analyze(). If you do not call dba.normalize() explicitly, dba.analyze() applies the default normalization, which is the same as the default in previous versions. The default bases the normalization factors purely on library sizes, and does not rely on either of the "native" methods used in edgeR or DESeq2.
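To see what dba.normalize() stores without changing the counts, something like the sketch below may help. It assumes the same vignette data layout as above, and relies on the bRetrieve argument of dba.normalize() (present in DiffBind 3.x as far as I know) to return the computed factors as a list instead of a DBA object; the exact list element names may differ between versions, so the test only checks that a list comes back.

```r
library(DiffBind)

# Assumption: same vignette data layout as in the workflow above
samples <- read.csv(file.path(system.file("extra", package = "DiffBind"),
                              "tamoxifen.csv"))
tamoxifen <- dba(sampleSheet = samples)
tamoxifen <- dba.blacklist(tamoxifen)
tamoxifen <- dba.count(tamoxifen)

# Compute (not apply) normalization factors; default is library-size based
tamoxifen <- dba.normalize(tamoxifen)

# Retrieve the stored library sizes / normalization factors for inspection
norm <- dba.normalize(tamoxifen, bRetrieve = TRUE)
print(norm)

# dba.analyze() hands the stored factors to DESeq2 rather than
# running a second, independent normalization
tamoxifen <- dba.analyze(tamoxifen)
```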

Basically, dba.analyze() will run any "skipped" steps in default mode if they haven't been run explicitly. These steps include blacklists/greylists, counting, normalizing, and setting up a contrast. Then it calls DESeq2 and/or edgeR.