I have a TF which is cytoplasmic in control samples & nuclear under treatment and am intentionally trying to demonstrate that TMM is an inappropriate method in this situation. However, I was quite surprised to see that using DBA_EDGER, the results from DBA_NORM_LIB and DBA_NORM_TMM are 100% identical. Fortunately, the expected differences are evident when using DESeq2, but this seems like there might be a bug somewhere in the call to edgeR.
My code
dba <- dba(sampleSheet = db_sample_sheet) %>%
dba.count() %>%
dba.blacklist()
dba_ls <- dba.normalize(dba, normalize = DBA_NORM_LIB) %>%
dba.analyze(method = DBA_ALL_METHODS)
dba_tmm <- dba.normalize(dba, normalize = DBA_NORM_TMM) %>%
dba.analyze(method = DBA_ALL_METHODS)
When I apply the edgeR method, I get completely identical results
> db_ls_edger <- dba.report(dba_ls, method = DBA_EDGER, th = 1)
> db_ls_edger
GRanges object with 14743 ranges and 6 metadata columns:
seqnames ranges strand | Conc Conc_E2DHT Conc_E2 Fold p-value FDR
<Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
13505 chr8 124417709-124418109 * | 6.81798 7.75970 3.159221 4.51547 5.15647e-38 7.60218e-34
2623 chr12 52520253-52520653 * | 6.51184 7.46118 2.654935 4.71332 2.79160e-36 2.05783e-32
7982 chr22 30330993-30331393 * | 6.38990 7.34084 2.487531 4.75724 2.56122e-35 1.25867e-31
2961 chr12 111505344-111505744 * | 5.82414 6.79857 0.993156 5.65514 1.57190e-33 5.79361e-30
4003 chr14 81072798-81073198 * | 7.50892 8.39092 4.838411 3.48057 3.41593e-33 1.00722e-29
> db_tmm_edger <- dba.report(dba_tmm, method = DBA_EDGER, th = 1)
> db_tmm_edger
GRanges object with 14743 ranges and 6 metadata columns:
seqnames ranges strand | Conc Conc_E2DHT Conc_E2 Fold p-value FDR
<Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
13505 chr8 124417709-124418109 * | 6.81798 7.75970 3.159221 4.51547 5.15647e-38 7.60218e-34
2623 chr12 52520253-52520653 * | 6.51184 7.46118 2.654935 4.71332 2.79160e-36 2.05783e-32
7982 chr22 30330993-30331393 * | 6.38990 7.34084 2.487531 4.75724 2.56122e-35 1.25867e-31
2961 chr12 111505344-111505744 * | 5.82414 6.79857 0.993156 5.65514 1.57190e-33 5.79361e-30
4003 chr14 81072798-81073198 * | 7.50892 8.39092 4.838411 3.48057 3.41593e-33 1.00722e-29
Hashing shows they are identical
> rlang::hash(db_ls_edger)
[1] "b4e5536150b45108684eef43ac0a4994"
> rlang::hash(db_tmm_edger)
[1] "b4e5536150b45108684eef43ac0a4994"
However, when I apply the identical strategy using DESeq2, I can see the differences that should be there
> db_ls_deseq2 <- dba.report(dba_ls, method = DBA_DESEQ2, th = 1)
> db_ls_deseq2
GRanges object with 14743 ranges and 6 metadata columns:
seqnames ranges strand | Conc Conc_E2DHT Conc_E2 Fold p-value FDR
<Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
4003 chr14 81072798-81073198 * | 7.47974 8.35825 4.84957 3.43505 2.24075e-46 3.30353e-42
12575 chr7 107452412-107452812 * | 7.22213 8.10589 4.53080 3.48964 2.54621e-43 1.87694e-39
2260 chr11 114049984-114050384 * | 8.09689 8.84925 6.43244 2.39092 2.81037e-41 1.38111e-37
2313 chr11 129895232-129895632 * | 7.03702 7.93384 4.18029 3.66639 3.28305e-39 1.21005e-35
13505 chr8 124417709-124418109 * | 6.78672 7.72723 3.15704 4.40111 5.06715e-37 1.49410e-33
> db_tmm_deseq2 <- dba.report(dba_tmm, method = DBA_DESEQ2, th = 1)
> db_tmm_deseq2
GRanges object with 14743 ranges and 6 metadata columns:
seqnames ranges strand | Conc Conc_E2DHT Conc_E2 Fold p-value FDR
<Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
13942 chr9 89254953-89255353 * | 6.97872 5.50465 7.69245 -2.10114 2.12071e-39 3.12656e-35
7674 chr20 45637812-45638212 * | 7.21142 6.12668 7.82355 -1.64274 1.08626e-29 8.00739e-26
6201 chr2 18981633-18982033 * | 7.38869 6.29292 8.00422 -1.64607 1.51376e-27 7.43913e-24
10071 chr4 167201738-167202138 * | 6.20587 4.74095 6.91758 -2.05670 4.54453e-26 1.67500e-22
5251 chr17 50496602-50497002 * | 6.74841 5.57290 7.38744 -1.73168 1.03115e-25 3.04045e-22
(I've truncated the output above)
And as expected the hashes are different
> rlang::hash(db_ls_deseq2)
[1] "dcf3c928594ec0d9802c8e2f09f28a9f"
> rlang::hash(db_tmm_deseq2)
[1] "61d0d8b2fb383ea9442244385568bfce"
This error seems to occur in DiffBind_3.10.0
> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /data/tki_bodl/sw/R/4.3.0-gcc-blas-fullsupport/lib64/R/lib/libRblas.so
LAPACK: /data/tki_bodl/sw/R/4.3.0-gcc-blas-fullsupport/lib64/R/lib/libRlapack.so; LAPACK version 3.11.0
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8 LC_PAPER=en_AU.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
time zone: Australia/Perth
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.42.4 limma_3.56.2 DiffBind_3.10.0 pander_0.6.5
[5] plyranges_1.20.0 glue_1.6.2 scales_1.2.1 rtracklayer_1.60.0
[9] Rsamtools_2.16.0 Biostrings_2.68.1 XVector_0.40.0 extraChIPs_1.5.6
[13] SummarizedExperiment_1.30.2 Biobase_2.60.0 MatrixGenerics_1.12.2 matrixStats_1.0.0
[17] ggside_0.2.2 GenomicRanges_1.52.0 GenomeInfoDb_1.36.0 IRanges_2.34.0
[21] S4Vectors_0.38.1 BiocGenerics_0.46.0 BiocParallel_1.34.2 lubridate_1.9.2
[25] forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2 purrr_1.0.1
[29] readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.2
[33] tidyverse_2.0.0 workflowr_1.7.0
loaded via a namespace (and not attached):
[1] fs_1.6.2 ProtGenerics_1.32.0 bitops_1.0-7 httr_1.4.6
[5] RColorBrewer_1.1-3 doParallel_1.0.17 InteractionSet_1.28.1 numDeriv_2016.8-1.1
[9] tools_4.3.0 backports_1.4.1 utf8_1.2.3 R6_2.5.1
[13] mgcv_1.8-42 lazyeval_0.2.2 Gviz_1.44.0 apeglm_1.22.1
[17] GetoptLong_1.0.5 withr_2.5.0 prettyunits_1.1.1 gridExtra_2.3
[21] VennDiagram_1.7.3 cli_3.6.1 formatR_1.14 labeling_0.4.2
[25] SQUAREM_2021.1 mvtnorm_1.2-2 mixsqp_0.3-48 foreign_0.8-84
[29] dichromat_2.0-0.1 BSgenome_1.68.0 invgamma_1.1 bbmle_1.0.25
[33] rstudioapi_0.14 RSQLite_2.3.1 generics_0.1.3 shape_1.4.6
[37] BiocIO_1.10.0 hwriter_1.3.2.1 gtools_3.9.4 vroom_1.6.3
[41] Matrix_1.5-4 interp_1.1-4 futile.logger_1.4.3 fansi_1.0.4
[45] lifecycle_1.0.3 whisker_0.4.1 yaml_2.3.7 gplots_3.1.3
[49] BiocFileCache_2.8.0 grid_4.3.0 blob_1.2.4 promises_1.2.0.1
[53] crayon_1.5.2 bdsmatrix_1.3-6 lattice_0.21-8 ComplexUpset_1.3.3
[57] GenomicFeatures_1.52.0 KEGGREST_1.40.0 pillar_1.9.0 knitr_1.43
[61] ComplexHeatmap_2.16.0 metapod_1.8.0 rjson_0.2.21 systemPipeR_2.6.3
[65] codetools_0.2-19 ShortRead_1.58.0 getPass_0.2-2 GreyListChIP_1.32.0
[69] data.table_1.14.8 vctrs_0.6.3 png_0.1-8 gtable_0.3.3
[73] amap_0.8-19 emdbook_1.3.13 cachem_1.0.8 xfun_0.39
[77] S4Arrays_1.0.4 coda_0.19-4 iterators_1.0.14 GenomicInteractions_1.34.0
[81] nlme_3.1-162 bit64_4.0.5 progress_1.2.2 filelock_1.0.2
[85] rprojroot_2.0.3 irlba_2.3.5.1 KernSmooth_2.23-20 rpart_4.1.19
[89] colorspace_2.1-0 DBI_1.1.3 Hmisc_5.1-0 nnet_7.3-18
[93] DESeq2_1.40.2 tidyselect_1.2.0 processx_3.8.1 bit_4.0.5
[97] compiler_4.3.0 curl_5.0.1 git2r_0.32.0 csaw_1.34.0
[101] htmlTable_2.4.1 xml2_1.3.4 DelayedArray_0.26.3 checkmate_2.2.0
[105] caTools_1.18.2 callr_3.7.3 rappdirs_0.3.3 digest_0.6.31
[109] rmarkdown_2.22 htmltools_0.5.5 pkgconfig_2.0.3 jpeg_0.1-10
[113] base64enc_0.1-3 dbplyr_2.3.2 fastmap_1.1.1 ensembldb_2.24.0
[117] rlang_1.1.1 GlobalOptions_0.1.2 htmlwidgets_1.6.2 EnrichedHeatmap_1.30.0
[121] farver_2.1.1 VariantAnnotation_1.46.0 RCurl_1.98-1.12 magrittr_2.0.3
[125] Formula_1.2-5 GenomeInfoDbData_1.2.10 patchwork_1.1.2 munsell_0.5.0
[129] Rcpp_1.0.10 stringi_1.7.12 zlibbioc_1.46.0 MASS_7.3-58.4
[133] plyr_1.8.8 parallel_4.3.0 ggrepel_0.9.3 deldir_1.0-9
[137] splines_4.3.0 hms_1.1.3 circlize_0.4.15 locfit_1.5-9.8
[141] ps_1.7.5 igraph_1.5.0 biomaRt_2.56.0 futile.options_1.0.1
[145] XML_3.99-0.14 evaluate_0.21 latticeExtra_0.6-30 biovizBase_1.48.0
[149] lambda.r_1.2.4 BiocManager_1.30.21 tzdb_0.4.0 foreach_1.5.2
[153] tweenr_2.0.2 httpuv_1.6.11 polyclip_1.10-4 clue_0.3-64
[157] ashr_2.2-54 ggforce_0.4.1 broom_1.0.5 restfulr_0.0.15
[161] AnnotationFilter_1.24.0 later_1.3.1 truncnorm_1.0-9 memoise_2.0.1
[165] AnnotationDbi_1.62.1 GenomicAlignments_1.36.0 cluster_2.1.4 timechange_0.2.0
[169] here_1.0.1
Lovely. Thank you!