Question

Diffbind: differences between design vs no design (same comparison)?

0

Entering edit mode

Roger • 0

@roger-12673

Last seen 4.2 years ago

Netherlands

Hi,

One of the many changes brought by DiffBind version 3.0 is the possibility of setting a model design in the same way one does in DESEq2. However, when I compare the results of this new approach ("design") with those of the pre-version 3.0 analysis ("no-design"), I see some differences. I used the following commands:

diffobject <- dba(sampleSheet=samplesheet,minOverlap=2, bRemoveRandom=T,scoreCol=9,filter=10,bRemoveRandom=T)
diffobject$config$doBlacklist <- F 
diffobject$config$doGreylist <- F 
diffobject <- dba.count(diffobject,bRemoveDuplicates=T,minCount=0,minOverlap=2, score=DBA_SCORE_NORMALIZED,mapQCth = 20,bParallel = T)
diffobject <- dba.normalize(diffobject,normalize=DBA_NORM_DEFAULT, library=DBA_LIBSIZE_FULL,background = F)

I tried to normalize using the same settings as in the pre-3.00 version according to the vignette. This is also the reason I set greylist and blacklist to FALSE and why I set bSubControl to T. Next, I defined the contrasts as follows:

# Design approach (3.X)
diffobject.3 <- dba.contrast(diffobject,design="~Condition", contrast=c("Condition","TREAT","CONTROL"),reorderMeta = list(Condition="CONTROL"))
diffobject.3 <- dba.analyze(diffobject.3,design=T)
# No-design approach (pre-3.00)
diffobject.2 <- dba.contrast(diffobject,categories="Condition",group1=dba.mask(diffobject,"Condition","TREAT"),
  group2=dba.mask(diffobject,"Condition","CONTROL"),name1="TREAT",name2="CONTROL")
diffobject.2 <- dba.normalize(diffobject.2,normalize=DBA_NORM_DEFAULT, library=DBA_LIBSIZE_FULL,background = F)
diffobject.2 <- dba.analyze(diffobject.2)

These are the contrasts with each approach:

# diffbind.2 (no-design)
1 Contrast:
    Group Samples  Group2 Samples2 DB.DESeq2
1 TREAT       4 CONTROL        3     11496

# diffbind.3 (design)
Design: [~Condition] | 1 Contrast:
     Factor   Group Samples  Group2 Samples2 DB.DESeq2
1 Condition TREAT       4 CONTROL        3     12283

Of note, when I ran dba.analyze on diffobject.3 with design=F I got the exact same results as in diffobject.2, in which groups had been set manually in dba.contrast.

These are the main differences between the design vs no-design approaches, an example follows:

Different number of regions with differential binding: 11496 vs 12283
Different p-values for the same regions, probably related to the above
Differences in fold change: In the no-design approach I obtain fold changes that are the result of subtracting group 1 - group 2, whereas in the design approach I get numbers that are a few decimals higher or lower. In the example below, you would expect a Fold of -9.12193 ( 2.28187-11.4038), but only the analysis with the "no-design" approach returns that.

# No design
dba.report(diffobject.3)
      seqnames              ranges strand |      Conc Conc_TREAT Conc_CONTROL      Fold      p-value         FDR
           <Rle>           <IRanges>  <Rle> | <numeric>    <numeric>    <numeric> <numeric>    <numeric>   <numeric>
  48116     chr7   27195628-27196028      * |  10.18484      2.28187      11.4038  -8.91646 1.20932e-101 6.67834e-97
# Design
dba.report(diffobject.2)
      seqnames            ranges strand |      Conc Conc_TREAT Conc_CONTROL      Fold      p-value         FDR
           <Rle>         <IRanges>  <Rle> | <numeric>    <numeric>    <numeric> <numeric>    <numeric>   <numeric>
  48116     chr7 27195628-27196028      * |  10.18484      2.28187      11.4038  -9.12192 1.83179e-101 1.01095e-96

Moreover, in the "design" approach there are some extreme cases with a Fold change that is not only very different from (Conc_1 - Conc_2), but in fact extremely large. As a result, the MA plot looks "flattened" due to the inclusion of these extreme values.

I have also run DESeq2 separately, after extracting the raw counts from the DiffBind object using dba.count(diffobject,peaks=NULL,score=DBA_SCORE_READS_MINUS). The results look quite similar to the "design" approach, including those extreme values I was talking about. Skimming through the DiffBind code, what I believe is going on here is that in the "design" approach the fold change is taken directly from DESeq2, whereas in the "no-design" mode, the fold is always calculated as Conc_1 - Conc_2. Since Conc values < 0 are converted to 0, that avoids extreme differences.

Still, that does not explain why there are also differences in the p-values and the total number of differential regions. Are there any parameters that I should adjust differently? It have been using the "no design" approach for a while and I would like new results to be comparable. Furthermore, it would be ideal if Fold change was presented in the same manner in both cases.

P.S. I have also noticed that the vignette states that bSubControl is FALSE by default unless a greylist has been specified, but the dba.count states the opposite. Looking at the function, it looks like the vignette is wrong: bSubControl = is.null(DBA$greylist).

DiffBind • 1.5k views

ADD COMMENT • link updated 4.3 years ago by Rory Stark ★ 5.2k • written 4.3 years ago by Roger • 0

score 2 · Accepted Answer · 2021-09-05

The older, pre-3.0 analysis method, only the samples directly involved in a contrast are included in the model, so if you have samples not included in the contrast, you will get different results when using a full design. I suspect that is not what is happening in your case (where I assume all your samples are either TREAT or CONTROL).

It is however the case that the newer design-based method is preferred to the older one. The previous method has been retained for backward compatibility, emphasizing getting similar results as you would have obtained prior to 3.0, rather than having the old method and new method match in 3.0. It is likely that the old method will be obsoleted at some point.

One thing to note (which is not entirely clear in the help pages) is that not all of the options in dba.normalize() are supported for the old method. Only library=DBA_LIBSIZE_FULL and library=DBA_LIBSIZE_PEAKREADS are recognized (and not any of the options for normalize, background, etc.) In your case things should be working as you expect. The only difference should be the value for libFun. Previously the default was min (normalizing to the smallest library size, so that all other samples are down weighted) while now the default is mean (normalizing to the mean library size, will those smaller being upweighted and those larger being down weighted). Try setting libFun=mean and see if you get the same results for the old way and the new way.

You are correct regarding the Fold changes in the report. The older method does a simple subtraction while the design-based method takes the Fold values computed by DESeq2::lfcShrink. This is the preferred way to compute Fold changes, but you can always do the simple subtraction of the fold changes for each group yourself in a single line of code after getting a report.