Question

DiffBind always reverts to default normalization

0

Entering edit mode

bers.anthrope ▴ 10

@b748553d

Last seen 3.5 years ago

United Kingdom

I am trying out different DiffBind normalization methods for my ATAC-seq dataset in order to select the most suitable one. However, it looks like no matter which normalization method from the official manual I use, it always reverts to default normalization method.

To illustrate (starting with the default - normalization using full library sizes, DESeq2 method):

> normbyRD <- dba.normalize(non_norm_data, normalize=DBA_NORM_LIB)
> normbyRD
6 Samples, 54620 sites in matrix:
   ID Tissue Treatment Replicate    Reads FRiP
1  1a  Muscle       WT         1 19852956 0.35
2  2a  Muscle       WT         2 32243037 0.26
3  3a  Muscle       WT         3 36275376 0.29
4 1ab  Muscle    Treated        1 37432844 0.26
5 2ab  Muscle    Treated        2 40729414 0.28
6 3ab  Muscle    Treated        3 37136858 0.27

Design: [~Treatment] | 1 Contrast:
     Factor Group Samples Group2 Samples2
1 Treatment Treated       3     WT        3

> dba.normalize(normbyRD, method = DBA_DESEQ2, bRetrieve = TRUE)
$norm.method
[1] "lib"
$norm.factors
[1] 0.5848552 0.9498589 1.0686490 1.1027472 1.1998620 1.0940277
$lib.method
[1] "full"
$lib.sizes
[1] 19852956 32243037 36275376 37432844 40729414 37136858
$control.subtract
[1] TRUE
$filter.value
[1] 1

> normbyRD <- dba.analyze(normbyRD)
Applying Blacklist/Greylists...
Genome detected: Hsapiens.UCSC.hg38
Applying blacklist...
Removed: 136 of 54620 intervals.
Removed 136 (of 54620) consensus peaks.
Normalize DESeq2 with defaults...
Analyzing...
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates

When I try to normalize with both edgeR and DESeq2 approaches but using Reads in Peaks

> normbyRIP <- dba.normalize(non_norm_data, method=DBA_ALL_METHODS, normalize=DBA_NORM_NATIVE, library=DBA_LIBSIZE_PEAKREADS)
> normbyRIP 
...[output here same as above for > normbyRD]
> dba.normalize(normbyRIP, method=DBA_DESEQ2, bRetrieve=TRUE)
$norm.method
[1] "RLE"
$norm.factors
       1a        2a        3a       1ab       2ab       3ab 
0.6154949 0.8637777 1.0765341 1.1687352 1.3005265 1.1629528 
$lib.method
[1] "RiP"
$lib.sizes
      1a       2a       3a      1ab      2ab      3ab 
 6918883  8231661 10701169  9759950 11282510  9910829 
$control.subtract
[1] TRUE
$filter.value
[1] 1

> dba.normalize(normbyRIP, method=DBA_EDGER, bRetrieve=TRUE)
$norm.method
[1] "TMM"
$norm.factors
[1] 0.8358383 0.9751437 0.9405268 1.1106895 1.0755978 1.0919314
$lib.method
[1] "RiP"
$lib.sizes
      1a       2a       3a      1ab      2ab      3ab 
 6918883  8231661 10701169  9759950 11282510  9910829 
$control.subtract
[1] TRUE
$filter.value
[1] 1

So far everything looks like it's working.

But after I run the following, peaks and their numbers from "DESeq2 full library" method are the same as from "DESeq2 RiP" method

> normbyRIP <- dba.analyze(normbyRIP, method=DBA_ALL_METHODS)
Applying Blacklist/Greylists...
Genome detected: Hsapiens.UCSC.hg38
Applying blacklist...
Removed: 136 of 54620 intervals.
Removed 136 (of 54620) consensus peaks.
Normalize edgeR with defaults...
Normalize DESeq2 with defaults...
Analyzing...
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates

Here is the Venn diagram (MA plots also look identical)

enter image description here

Same happens when I try to use other normalization methods (background etc.)

I do not know what is causing this but these lines suggest that dba.analyze somehow overwrites the method that I chose for normalization.

Normalize edgeR with defaults...
Normalize DESeq2 with defaults...

Would you be able to explain that, please, Rory Stark?

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8   
 [6] LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ChIPpeakAnno_3.28.1         edgeR_3.36.0                limma_3.50.3                BiocParallel_1.28.3         DiffBind_3.4.11            
 [6] SummarizedExperiment_1.24.0 Biobase_2.54.0              MatrixGenerics_1.6.0        matrixStats_0.62.0          GenomicRanges_1.46.1       
[11] GenomeInfoDb_1.30.1         IRanges_2.28.0              S4Vectors_0.32.4            BiocGenerics_0.40.0        

loaded via a namespace (and not attached):
  [1] amap_0.8-18              colorspace_2.0-3         rjson_0.2.21             hwriter_1.3.2.1          ellipsis_0.3.2          
  [6] futile.logger_1.4.3      XVector_0.34.0           ggrepel_0.9.1            bit64_4.0.5              AnnotationDbi_1.56.2    
 [11] fansi_1.0.3              mvtnorm_1.1-3            apeglm_1.16.0            xml2_1.3.3               splines_4.1.3           
 [16] cachem_1.0.6             geneplotter_1.72.0       Rsamtools_2.10.0         annotate_1.72.0          dbplyr_2.1.1            
 [21] ashr_2.2-54              png_0.1-7                graph_1.72.0             GreyListChIP_1.26.0      compiler_4.1.3          
 [26] httr_1.4.2               lazyeval_0.2.2           assertthat_0.2.1         Matrix_1.4-1             fastmap_1.1.0           
 [31] cli_3.2.0                formatR_1.12             prettyunits_1.1.1        htmltools_0.5.2          tools_4.1.3             
 [36] coda_0.19-4              gtable_0.3.0             glue_1.6.2               GenomeInfoDbData_1.2.7   systemPipeR_2.0.8       
 [41] dplyr_1.0.8              rappdirs_0.3.3           ShortRead_1.52.0         Rcpp_1.0.8.3             bbmle_1.0.24            
 [46] vctrs_0.4.1              Biostrings_2.62.0        multtest_2.50.0          rtracklayer_1.54.0       stringr_1.4.0           
 [51] lifecycle_1.0.1          irlba_2.3.5              ensembldb_2.18.4         restfulr_0.0.13          gtools_3.9.2            
 [56] InteractionSet_1.22.0    XML_3.99-0.9             zlibbioc_1.40.0          MASS_7.3-56              scales_1.2.0            
 [61] BSgenome_1.62.0          ProtGenerics_1.26.0      hms_1.1.1                RBGL_1.70.0              parallel_4.1.3          
 [66] AnnotationFilter_1.18.0  lambda.r_1.2.4           RColorBrewer_1.1-3       curl_4.3.2               yaml_2.3.5              
 [71] memoise_2.0.1            ggplot2_3.3.5            emdbook_1.3.12           biomaRt_2.50.3           bdsmatrix_1.3-4         
 [76] latticeExtra_0.6-29      stringi_1.7.6            RSQLite_2.2.12           SQUAREM_2021.1           genefilter_1.76.0       
 [81] BiocIO_1.4.0             filelock_1.0.2           GenomicFeatures_1.46.5   caTools_1.18.2           truncnorm_1.0-8         
 [86] rlang_1.0.2              pkgconfig_2.0.3          bitops_1.0-7             lattice_0.20-45          invgamma_1.1            
 [91] purrr_0.3.4              GenomicAlignments_1.30.0 htmlwidgets_1.5.4        bit_4.0.4                tidyselect_1.1.2        
 [96] plyr_1.8.7               magrittr_2.0.3           DESeq2_1.34.0            R6_2.5.1                 gplots_3.1.1            
[101] generics_0.1.2           DelayedArray_0.20.0      DBI_1.1.2                pillar_1.7.0             survival_3.3-1          
[106] KEGGREST_1.34.0          RCurl_1.98-1.6           mixsqp_0.3-43            tibble_3.1.6             crayon_1.5.1            
[111] futile.options_1.0.1     KernSmooth_2.23-20       utf8_1.2.2               BiocFileCache_2.2.1      jpeg_0.1-9              
[116] progress_1.2.2           locfit_1.5-9.5           grid_4.1.3               blob_1.2.3               digest_0.6.29           
[121] VennDiagram_1.7.3        xtable_1.8-4             regioneR_1.26.1          numDeriv_2016.8-1.1      munsell_0.5.0

ATACSeq Normalization DiffBind • 1.6k views

ADD COMMENT • link 3.6 years ago • updated 3.5 years ago bers.anthrope ▴ 10

score 2 · Accepted Answer · 2022-05-11

2

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 10 months ago

Cambridge, UK

There is a workaround, which is to call dba.blacklist() prior to calling dba.normalize().

What is happening is that normalizing with library=DBA_LIBSIZE_PEAKREADS depends on the consensus peakset, and by applying the blacklist after normalizing, the consensus set is changing (136 fewer peaks). Right now it detects that the consensus set has changed and clears the normalization, which ends up resulting in the default normalization being used. I'll look at fixing this by renormalizing using the same normalization parameters, or at least issuing a warning.

ADD COMMENT • link 3.5 years ago Rory Stark ★ 5.2k

0

Entering edit mode

I've checked in a fix for this (DiffBind_3.6.1). DiffBind will now re-normalize after applying blacklists/greylists, using the originally specified parameters.