Question

DiffBind does not use DESeq2 MLE estimation of fold change

0

Entering edit mode

Sam ▴ 10

@sam-21502

Last seen 11 weeks ago

Jerusalem

In this post I have described that DiffBind gives a different number of DE genes when using dba.analyze, and when returning the results as a DESeq2 object. It seems that I have found the reason. Underlying that is DiffBind (at least by default) not using DESeq2's MLE estimation when calculating (log) fold change. So the concentration of DiffBind equals precisely log2(mean across condition), while the DESeq2 log fold change does not.

> suppressPackageStartupMessages(library(DESeq2))
> suppressPackageStartupMessages(library(DiffBind))
> data(tamoxifen_counts)
> tamoxifen <- dba.contrast(tamoxifen, design="~Tissue")
Computing results names...
> tamoxifen <- dba.contrast(tamoxifen, group1=tamoxifen$masks$MCF7, group2=tamoxifen$masks$BT474, name1="MCF7", name2="BT474")
> tamoxifen <- dba.analyze(tamoxifen,method = DBA_DESEQ2)
Normalize DESeq2 with defaults...
Analyzing...
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
converting counts to integer mode
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
> 
> dba.show(tamoxifen,bContrasts = T,th=0.1)
  Group Samples Group2 Samples2 DB.DESeq2
1  MCF7       5  BT474        2       954
> 
> 
> dds <- dba.analyze(tamoxifen,bRetrieveAnalysis = DBA_DESEQ2)
> 
> res <- results(dds, independentFiltering = T, contrast=c("Tissue","MCF7","BT474"))
> 
> summary(res)

out of 2845 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 738, 26%
LFC < 0 (down)     : 437, 15%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 4)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Above, we see that the number of DE genes is different. Here's why :

 peak="1"
> dba.report(tamoxifen,th = 1,bNormalized = T,contrast=1,bCounts=T)[peak,]
GRanges object with 1 range and 13 metadata columns:
    seqnames      ranges strand |      Conc Conc_MCF7 Conc_BT474      Fold   p-value       FDR     MCF71
       <Rle>   <IRanges>  <Rle> | <numeric> <numeric>  <numeric> <numeric> <numeric> <numeric> <numeric>
  1    chr18 90841-91241      * |         0         0    0.88128  -0.88128  0.373987         1         0
        MCF72     MCF73    MCF7r1    MCF7r2    BT4741    BT4742
    <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
  1         0         0         0         0      2.47      1.21
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> counts(dds,normalized=T)[peak,]
    BT4741     BT4742      MCF71      MCF72      MCF73      T47D1      T47D2     MCF7r1     MCF7r2 
 2.4692546  1.2147633  0.0000000  0.0000000  0.0000000  0.0000000  0.5461769  0.0000000  0.0000000 
     ZR751      ZR752 
25.0813852 10.9513965

DiffBind fold change = Conc_MCF7-Conc_BT474

ConcBT474 is precisely log2(mean(BT4741,BT4742)) = log2(mean(c(2.4692546,1.2147633))) = 0.8812801

Which means that it is estimated without using MLE, as far as I understand. DESeq2 fold change is different :

> res[peak,]
log2 fold change (MLE): Tissue MCF7 vs BT474 
Wald test p-value: Tissue MCF7 vs BT474 
DataFrame with 1 row and 6 columns
   baseMean log2FoldChange     lfcSE      stat    pvalue      padj
  <numeric>      <numeric> <numeric> <numeric> <numeric> <numeric>
1   3.66027       -2.52726   1.28297  -1.96985 0.0488553  0.114492

This relationship holds for all peaks. The log2 fold change that appears in dba.report is precisely log2(mean of condition), without MLE estimation. That is different from DESeq2 fold change, and thus the results are different.

What is the solution to that? Which of the results is the correct one to use?

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_IL.UTF-8       LC_NUMERIC=C               LC_TIME=en_IL.UTF-8       
 [4] LC_COLLATE=en_IL.UTF-8     LC_MONETARY=en_IL.UTF-8    LC_MESSAGES=en_IL.UTF-8   
 [7] LC_PAPER=en_IL.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_IL.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DiffBind_3.0.13             DESeq2_1.30.1               SummarizedExperiment_1.20.0
 [4] Biobase_2.50.0              MatrixGenerics_1.2.1        matrixStats_0.58.0         
 [7] GenomicRanges_1.42.0        GenomeInfoDb_1.26.2         IRanges_2.24.1             
[10] S4Vectors_0.28.1            BiocGenerics_0.36.0        

loaded via a namespace (and not attached):
  [1] GOstats_2.56.0           backports_1.2.1          BiocFileCache_1.14.0     plyr_1.8.6              
  [5] GSEABase_1.52.1          splines_4.0.4            BiocParallel_1.24.1      ggplot2_3.3.3           
  [9] amap_0.8-18              digest_0.6.27            invgamma_1.1             htmltools_0.5.1.1       
 [13] GO.db_3.12.1             SQUAREM_2021.1           fansi_0.4.2              magrittr_2.0.1          
 [17] checkmate_2.0.0          memoise_2.0.0            BSgenome_1.58.0          base64url_1.4           
 [21] limma_3.46.0             Biostrings_2.58.0        annotate_1.68.0          systemPipeR_1.24.3      
 [25] bdsmatrix_1.3-4          askpass_1.1              prettyunits_1.1.1        jpeg_0.1-8.1            
 [29] colorspace_2.0-0         blob_1.2.1               rappdirs_0.3.3           apeglm_1.12.0           
 [33] ggrepel_0.9.1            xfun_0.21                dplyr_1.0.4              crayon_1.4.1            
 [37] RCurl_1.98-1.2           jsonlite_1.7.2           graph_1.68.0             genefilter_1.72.1       
 [41] brew_1.0-6               survival_3.2-7           VariantAnnotation_1.36.0 glue_1.4.2              
 [45] gtable_0.3.0             zlibbioc_1.36.0          XVector_0.30.0           DelayedArray_0.16.1     
 [49] V8_3.4.0                 Rgraphviz_2.34.0         scales_1.1.1             mvtnorm_1.1-1           
 [53] pheatmap_1.0.12          DBI_1.1.1                edgeR_3.32.1             Rcpp_1.0.6              
 [57] xtable_1.8-4             progress_1.2.2           emdbook_1.3.12           bit_4.0.4               
 [61] rsvg_2.1                 truncnorm_1.0-8          AnnotationForge_1.32.0   httr_1.4.2              
 [65] gplots_3.1.1             RColorBrewer_1.1-2       ellipsis_0.3.1           pkgconfig_2.0.3         
 [69] XML_3.99-0.5             dbplyr_2.1.0             locfit_1.5-9.4           utf8_1.1.4              
 [73] tidyselect_1.1.0         rlang_0.4.10             AnnotationDbi_1.52.0     munsell_0.5.0           
 [77] tools_4.0.4              cachem_1.0.4             generics_0.1.0           RSQLite_2.2.3           
 [81] evaluate_0.14            stringr_1.4.0            fastmap_1.1.0            yaml_2.2.1              
 [85] knitr_1.31               bit64_4.0.5              caTools_1.18.1           purrr_0.3.4             
 [89] RBGL_1.66.0              xml2_1.3.2               biomaRt_2.46.3           compiler_4.0.4          
 [93] rstudioapi_0.13          curl_4.3                 png_0.1-7                tibble_3.0.6            
 [97] geneplotter_1.68.0       stringi_1.5.3            GenomicFeatures_1.42.1   lattice_0.20-41         
[101] Matrix_1.3-2             vctrs_0.3.6              pillar_1.5.0             lifecycle_1.0.0         
[105] irlba_2.3.3              data.table_1.14.0        bitops_1.0-6             rtracklayer_1.50.0      
[109] R6_2.5.0                 latticeExtra_0.6-29      hwriter_1.3.2            ShortRead_1.48.0        
[113] KernSmooth_2.23-18       MASS_7.3-53.1            gtools_3.8.2             assertthat_0.2.1        
[117] openssl_1.4.3            Category_2.56.0          rjson_0.2.20             withr_2.4.1             
[121] GenomicAlignments_1.26.0 batchtools_0.9.15        Rsamtools_2.6.0          GenomeInfoDbData_1.2.4  
[125] hms_1.0.0                grid_4.0.4               DOT_0.1                  coda_0.19-4             
[129] rmarkdown_2.7            GreyListChIP_1.22.0      ashr_2.2-47              mixsqp_0.3-43           
[133] bbmle_1.0.23.1           numDeriv_2016.8-1.1      tinytex_0.29

DiffBind • 1.4k views

ADD COMMENT • link updated 16 months ago by Rory Stark ★ 5.2k • written 3.7 years ago by Sam ▴ 10

score 2 · Answer 1 · 2021-04-07

2

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 5 weeks ago

Cambridge, UK

The answer here is the same for this related issue. If you use the contrastparameter to dba.contrast, the results of the DESeq2 analysis will report fold changes computed using DESeq2::lfcShrink().

ADD COMMENT • link 3.7 years ago Rory Stark ★ 5.2k

0

Entering edit mode

Hi Rory Stark ,

I wanted to discuss the "DESeq2::lfcShrink()" function mentioned in your previous explanation. While examining the source code of the latest version of Diffbind, I came across the "shrink" function, which contains the following relevant code snippets:

 if(shrink) {
if(contrast$contrastType=="byname") {
  res$de <- suppressMessages(DESeq2::lfcShrink(pv$DESeq2$DEdata,
                                               coef=contrast$contrast,
                                               lfcThreshold=lfc,
                                               res=res$de,type="apeglm"))
} else  if(contrast$contrastType=="results1") {
  res$de <- suppressMessages(DESeq2::lfcShrink(pv$DESeq2$DEdata,
                                               coef=contrast$contrast[[1]],
                                               lfcThreshold=lfc,
                                               res=res$de,type="apeglm"))
} else {
  res$de <- suppressMessages(DESeq2::lfcShrink(pv$DESeq2$DEdata,
                                               contrast=contrast$contrast,
                                               lfcThreshold=lfc,
                                               res=res$de,type="ashr"))
}

I have a few questions regarding this implementation. Firstly, when should I use "byname" or "results1" for the contrastType parameter? Secondly, I noticed that Diffbind employs two shrinkage methods, namely "apeglm" and "ashr." Could you please provide further explanation on this matter?

Thanks!

Joe

ADD REPLY • link 16 months ago Yizhou • 0

0

Entering edit mode

This is a repeat and answered here: Question regarding lfcShrink types in DiffBind

ADD REPLY • link 16 months ago Rory Stark ★ 5.2k