One of the many changes brought by DiffBind version 3.0 is the possibility of setting a model design in the same way one does in DESEq2. However, when I compare the results of this new approach ("design") with those of the pre-version 3.0 analysis ("no-design"), I see some differences. I used the following commands:
diffobject <- dba(sampleSheet=samplesheet,minOverlap=2, bRemoveRandom=T,scoreCol=9,filter=10,bRemoveRandom=T) diffobject$config$doBlacklist <- F diffobject$config$doGreylist <- F diffobject <- dba.count(diffobject,bRemoveDuplicates=T,minCount=0,minOverlap=2, score=DBA_SCORE_NORMALIZED,mapQCth = 20,bParallel = T) diffobject <- dba.normalize(diffobject,normalize=DBA_NORM_DEFAULT, library=DBA_LIBSIZE_FULL,background = F)
I tried to normalize using the same settings as in the pre-3.00 version according to the vignette. This is also the reason I set greylist and blacklist to
FALSE and why I set bSubControl to T. Next, I defined the contrasts as follows:
# Design approach (3.X) diffobject.3 <- dba.contrast(diffobject,design="~Condition", contrast=c("Condition","TREAT","CONTROL"),reorderMeta = list(Condition="CONTROL")) diffobject.3 <- dba.analyze(diffobject.3,design=T) # No-design approach (pre-3.00) diffobject.2 <- dba.contrast(diffobject,categories="Condition",group1=dba.mask(diffobject,"Condition","TREAT"), group2=dba.mask(diffobject,"Condition","CONTROL"),name1="TREAT",name2="CONTROL") diffobject.2 <- dba.normalize(diffobject.2,normalize=DBA_NORM_DEFAULT, library=DBA_LIBSIZE_FULL,background = F) diffobject.2 <- dba.analyze(diffobject.2)
These are the contrasts with each approach:
# diffbind.2 (no-design) 1 Contrast: Group Samples Group2 Samples2 DB.DESeq2 1 TREAT 4 CONTROL 3 11496 # diffbind.3 (design) Design: [~Condition] | 1 Contrast: Factor Group Samples Group2 Samples2 DB.DESeq2 1 Condition TREAT 4 CONTROL 3 12283
Of note, when I ran
design=F I got the exact same results as in
diffobject.2, in which groups had been set manually in
These are the main differences between the design vs no-design approaches, an example follows:
- Different number of regions with differential binding: 11496 vs 12283
- Different p-values for the same regions, probably related to the above
- Differences in fold change: In the no-design approach I obtain fold changes that are the result of subtracting group 1 - group 2, whereas in the design approach I get numbers that are a few decimals higher or lower. In the example below, you would expect a Fold of -9.12193 ( 2.28187-11.4038), but only the analysis with the "no-design" approach returns that.
# No design dba.report(diffobject.3) seqnames ranges strand | Conc Conc_TREAT Conc_CONTROL Fold p-value FDR <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> 48116 chr7 27195628-27196028 * | 10.18484 2.28187 11.4038 -8.91646 1.20932e-101 6.67834e-97 # Design dba.report(diffobject.2) seqnames ranges strand | Conc Conc_TREAT Conc_CONTROL Fold p-value FDR <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> 48116 chr7 27195628-27196028 * | 10.18484 2.28187 11.4038 -9.12192 1.83179e-101 1.01095e-96
Moreover, in the "design" approach there are some extreme cases with a Fold change that is not only very different from (Conc_1 - Conc_2), but in fact extremely large. As a result, the MA plot looks "flattened" due to the inclusion of these extreme values.
I have also run DESeq2 separately, after extracting the raw counts from the DiffBind object using
dba.count(diffobject,peaks=NULL,score=DBA_SCORE_READS_MINUS). The results look quite similar to the "design" approach, including those extreme values I was talking about. Skimming through the DiffBind code, what I believe is going on here is that in the "design" approach the fold change is taken directly from DESeq2, whereas in the "no-design" mode, the fold is always calculated as Conc_1 - Conc_2. Since Conc values < 0 are converted to 0, that avoids extreme differences.
Still, that does not explain why there are also differences in the p-values and the total number of differential regions. Are there any parameters that I should adjust differently? It have been using the "no design" approach for a while and I would like new results to be comparable. Furthermore, it would be ideal if Fold change was presented in the same manner in both cases.
P.S. I have also noticed that the vignette states that
bSubControl is FALSE by default unless a greylist has been specified, but the
dba.count states the opposite. Looking at the function, it looks like the vignette is wrong:
bSubControl = is.null(DBA$greylist).