DiffBind Using edgeR: TMM Normalisation produces identical results to Library Size normalisation
1
0
Entering edit mode
@steve-pederson-23427
Last seen 7 months ago
Australia

I have a TF which is cytoplasmic in control samples & nuclear under treatment and am intentionally trying to demonstrate that TMM is an inappropriate method in this situation. However, I was quite surprised to see that using DBA_EDGER, the results from DBA_NORM_LIB and DBA_NORM_TMM are 100% identical. Fortunately, the expected differences are evident when using DESeq2, but this seems like there might be a bug somewhere in the call to edgeR.

My code

dba <- dba(sampleSheet = db_sample_sheet) %>% 
  dba.count() %>% 
  dba.blacklist()
dba_ls <- dba.normalize(dba, normalize = DBA_NORM_LIB) %>% 
  dba.analyze(method = DBA_ALL_METHODS)
dba_tmm <- dba.normalize(dba, normalize = DBA_NORM_TMM) %>% 
  dba.analyze(method = DBA_ALL_METHODS)

When I apply the edgeR method, I get completely identical results

> db_ls_edger <-  dba.report(dba_ls, method = DBA_EDGER, th = 1) 
> db_ls_edger
GRanges object with 14743 ranges and 6 metadata columns:
        seqnames              ranges strand |      Conc Conc_E2DHT   Conc_E2        Fold     p-value         FDR
           <Rle>           <IRanges>  <Rle> | <numeric>  <numeric> <numeric>   <numeric>   <numeric>   <numeric>
  13505     chr8 124417709-124418109      * |   6.81798    7.75970  3.159221     4.51547 5.15647e-38 7.60218e-34
   2623    chr12   52520253-52520653      * |   6.51184    7.46118  2.654935     4.71332 2.79160e-36 2.05783e-32
   7982    chr22   30330993-30331393      * |   6.38990    7.34084  2.487531     4.75724 2.56122e-35 1.25867e-31
   2961    chr12 111505344-111505744      * |   5.82414    6.79857  0.993156     5.65514 1.57190e-33 5.79361e-30
   4003    chr14   81072798-81073198      * |   7.50892    8.39092  4.838411     3.48057 3.41593e-33 1.00722e-29
> db_tmm_edger <-  dba.report(dba_tmm, method = DBA_EDGER, th = 1) 
> db_tmm_edger
GRanges object with 14743 ranges and 6 metadata columns:
        seqnames              ranges strand |      Conc Conc_E2DHT   Conc_E2        Fold     p-value         FDR
           <Rle>           <IRanges>  <Rle> | <numeric>  <numeric> <numeric>   <numeric>   <numeric>   <numeric>
  13505     chr8 124417709-124418109      * |   6.81798    7.75970  3.159221     4.51547 5.15647e-38 7.60218e-34
   2623    chr12   52520253-52520653      * |   6.51184    7.46118  2.654935     4.71332 2.79160e-36 2.05783e-32
   7982    chr22   30330993-30331393      * |   6.38990    7.34084  2.487531     4.75724 2.56122e-35 1.25867e-31
   2961    chr12 111505344-111505744      * |   5.82414    6.79857  0.993156     5.65514 1.57190e-33 5.79361e-30
   4003    chr14   81072798-81073198      * |   7.50892    8.39092  4.838411     3.48057 3.41593e-33 1.00722e-29

Hashing shows they are identical

> rlang::hash(db_ls_edger)
[1] "b4e5536150b45108684eef43ac0a4994"
> rlang::hash(db_tmm_edger)
[1] "b4e5536150b45108684eef43ac0a4994"

However, when I apply the identical strategy using DESeq2, I can see the differences that should be there

> db_ls_deseq2 <-  dba.report(dba_ls, method = DBA_DESEQ2, th = 1) 
> db_ls_deseq2
GRanges object with 14743 ranges and 6 metadata columns:
        seqnames              ranges strand |      Conc Conc_E2DHT   Conc_E2         Fold     p-value         FDR
           <Rle>           <IRanges>  <Rle> | <numeric>  <numeric> <numeric>    <numeric>   <numeric>   <numeric>
   4003    chr14   81072798-81073198      * |   7.47974    8.35825   4.84957      3.43505 2.24075e-46 3.30353e-42
  12575     chr7 107452412-107452812      * |   7.22213    8.10589   4.53080      3.48964 2.54621e-43 1.87694e-39
   2260    chr11 114049984-114050384      * |   8.09689    8.84925   6.43244      2.39092 2.81037e-41 1.38111e-37
   2313    chr11 129895232-129895632      * |   7.03702    7.93384   4.18029      3.66639 3.28305e-39 1.21005e-35
  13505     chr8 124417709-124418109      * |   6.78672    7.72723   3.15704      4.40111 5.06715e-37 1.49410e-33

> db_tmm_deseq2 <-  dba.report(dba_tmm, method = DBA_DESEQ2, th = 1) 
> db_tmm_deseq2
GRanges object with 14743 ranges and 6 metadata columns:
        seqnames              ranges strand |      Conc Conc_E2DHT   Conc_E2         Fold     p-value         FDR
           <Rle>           <IRanges>  <Rle> | <numeric>  <numeric> <numeric>    <numeric>   <numeric>   <numeric>
  13942     chr9   89254953-89255353      * |   6.97872    5.50465   7.69245     -2.10114 2.12071e-39 3.12656e-35
   7674    chr20   45637812-45638212      * |   7.21142    6.12668   7.82355     -1.64274 1.08626e-29 8.00739e-26
   6201     chr2   18981633-18982033      * |   7.38869    6.29292   8.00422     -1.64607 1.51376e-27 7.43913e-24
  10071     chr4 167201738-167202138      * |   6.20587    4.74095   6.91758     -2.05670 4.54453e-26 1.67500e-22
   5251    chr17   50496602-50497002      * |   6.74841    5.57290   7.38744     -1.73168 1.03115e-25 3.04045e-22

(I've truncated the output above)

And as expected the hashes are different

> rlang::hash(db_ls_deseq2)
[1] "dcf3c928594ec0d9802c8e2f09f28a9f"
> rlang::hash(db_tmm_deseq2)
[1] "61d0d8b2fb383ea9442244385568bfce"

This error seems to occur in DiffBind_3.10.0

> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /data/tki_bodl/sw/R/4.3.0-gcc-blas-fullsupport/lib64/R/lib/libRblas.so 
LAPACK: /data/tki_bodl/sw/R/4.3.0-gcc-blas-fullsupport/lib64/R/lib/libRlapack.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C               LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8    LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

time zone: Australia/Perth
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] edgeR_3.42.4                limma_3.56.2                DiffBind_3.10.0             pander_0.6.5               
 [5] plyranges_1.20.0            glue_1.6.2                  scales_1.2.1                rtracklayer_1.60.0         
 [9] Rsamtools_2.16.0            Biostrings_2.68.1           XVector_0.40.0              extraChIPs_1.5.6           
[13] SummarizedExperiment_1.30.2 Biobase_2.60.0              MatrixGenerics_1.12.2       matrixStats_1.0.0          
[17] ggside_0.2.2                GenomicRanges_1.52.0        GenomeInfoDb_1.36.0         IRanges_2.34.0             
[21] S4Vectors_0.38.1            BiocGenerics_0.46.0         BiocParallel_1.34.2         lubridate_1.9.2            
[25] forcats_1.0.0               stringr_1.5.0               dplyr_1.1.2                 purrr_1.0.1                
[29] readr_2.1.4                 tidyr_1.3.0                 tibble_3.2.1                ggplot2_3.4.2              
[33] tidyverse_2.0.0             workflowr_1.7.0            

loaded via a namespace (and not attached):
  [1] fs_1.6.2                   ProtGenerics_1.32.0        bitops_1.0-7               httr_1.4.6                
  [5] RColorBrewer_1.1-3         doParallel_1.0.17          InteractionSet_1.28.1      numDeriv_2016.8-1.1       
  [9] tools_4.3.0                backports_1.4.1            utf8_1.2.3                 R6_2.5.1                  
 [13] mgcv_1.8-42                lazyeval_0.2.2             Gviz_1.44.0                apeglm_1.22.1             
 [17] GetoptLong_1.0.5           withr_2.5.0                prettyunits_1.1.1          gridExtra_2.3             
 [21] VennDiagram_1.7.3          cli_3.6.1                  formatR_1.14               labeling_0.4.2            
 [25] SQUAREM_2021.1             mvtnorm_1.2-2              mixsqp_0.3-48              foreign_0.8-84            
 [29] dichromat_2.0-0.1          BSgenome_1.68.0            invgamma_1.1               bbmle_1.0.25              
 [33] rstudioapi_0.14            RSQLite_2.3.1              generics_0.1.3             shape_1.4.6               
 [37] BiocIO_1.10.0              hwriter_1.3.2.1            gtools_3.9.4               vroom_1.6.3               
 [41] Matrix_1.5-4               interp_1.1-4               futile.logger_1.4.3        fansi_1.0.4               
 [45] lifecycle_1.0.3            whisker_0.4.1              yaml_2.3.7                 gplots_3.1.3              
 [49] BiocFileCache_2.8.0        grid_4.3.0                 blob_1.2.4                 promises_1.2.0.1          
 [53] crayon_1.5.2               bdsmatrix_1.3-6            lattice_0.21-8             ComplexUpset_1.3.3        
 [57] GenomicFeatures_1.52.0     KEGGREST_1.40.0            pillar_1.9.0               knitr_1.43                
 [61] ComplexHeatmap_2.16.0      metapod_1.8.0              rjson_0.2.21               systemPipeR_2.6.3         
 [65] codetools_0.2-19           ShortRead_1.58.0           getPass_0.2-2              GreyListChIP_1.32.0       
 [69] data.table_1.14.8          vctrs_0.6.3                png_0.1-8                  gtable_0.3.3              
 [73] amap_0.8-19                emdbook_1.3.13             cachem_1.0.8               xfun_0.39                 
 [77] S4Arrays_1.0.4             coda_0.19-4                iterators_1.0.14           GenomicInteractions_1.34.0
 [81] nlme_3.1-162               bit64_4.0.5                progress_1.2.2             filelock_1.0.2            
 [85] rprojroot_2.0.3            irlba_2.3.5.1              KernSmooth_2.23-20         rpart_4.1.19              
 [89] colorspace_2.1-0           DBI_1.1.3                  Hmisc_5.1-0                nnet_7.3-18               
 [93] DESeq2_1.40.2              tidyselect_1.2.0           processx_3.8.1             bit_4.0.5                 
 [97] compiler_4.3.0             curl_5.0.1                 git2r_0.32.0               csaw_1.34.0               
[101] htmlTable_2.4.1            xml2_1.3.4                 DelayedArray_0.26.3        checkmate_2.2.0           
[105] caTools_1.18.2             callr_3.7.3                rappdirs_0.3.3             digest_0.6.31             
[109] rmarkdown_2.22             htmltools_0.5.5            pkgconfig_2.0.3            jpeg_0.1-10               
[113] base64enc_0.1-3            dbplyr_2.3.2               fastmap_1.1.1              ensembldb_2.24.0          
[117] rlang_1.1.1                GlobalOptions_0.1.2        htmlwidgets_1.6.2          EnrichedHeatmap_1.30.0    
[121] farver_2.1.1               VariantAnnotation_1.46.0   RCurl_1.98-1.12            magrittr_2.0.3            
[125] Formula_1.2-5              GenomeInfoDbData_1.2.10    patchwork_1.1.2            munsell_0.5.0             
[129] Rcpp_1.0.10                stringi_1.7.12             zlibbioc_1.46.0            MASS_7.3-58.4             
[133] plyr_1.8.8                 parallel_4.3.0             ggrepel_0.9.3              deldir_1.0-9              
[137] splines_4.3.0              hms_1.1.3                  circlize_0.4.15            locfit_1.5-9.8            
[141] ps_1.7.5                   igraph_1.5.0               biomaRt_2.56.0             futile.options_1.0.1      
[145] XML_3.99-0.14              evaluate_0.21              latticeExtra_0.6-30        biovizBase_1.48.0         
[149] lambda.r_1.2.4             BiocManager_1.30.21        tzdb_0.4.0                 foreach_1.5.2             
[153] tweenr_2.0.2               httpuv_1.6.11              polyclip_1.10-4            clue_0.3-64               
[157] ashr_2.2-54                ggforce_0.4.1              broom_1.0.5                restfulr_0.0.15           
[161] AnnotationFilter_1.24.0    later_1.3.1                truncnorm_1.0-9            memoise_2.0.1             
[165] AnnotationDbi_1.62.1       GenomicAlignments_1.36.0   cluster_2.1.4              timechange_0.2.0          
[169] here_1.0.1
DiffBind • 922 views
ADD COMMENT
1
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 5 weeks ago
Cambridge, UK

By default, dba.normalize() only establishes normalization parameters for the default analysis method, which is DESeq2. As you are not specifying that it should also establish normalization parameters for edgeR, it uses the default normalization for the edgeR analysis.

You can fix this by changing the code to:

dba_tmm <- dba.normalize(dba, method=DBA_ALL_METHODS, 
                         normalize = DBA_NORM_TMM) %>% 
  dba.analyze(method = DBA_ALL_METHODS)

Note that you can also examine the normalization parameters to see what is going on:

dba.normalize(dba_tmm, bRetrieve=TRUE, method=DBA_ALL_METHODS)
ADD COMMENT
0
Entering edit mode

Lovely. Thank you!

ADD REPLY

Login before adding your answer.

Traffic: 539 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6