Question

Scores used for DiffBind's dba.plotHeatmap

0

Entering edit mode

Sam ▴ 10

@sam-21502

Last seen 2 days ago

Jerusalem

I am retrieving the normalized counts of reads in peaks (scores) from DiffBind, as explained in this Bioconductor post :

dba.peakset(dbObj_norm, bRetrieve=TRUE ) %>% as.data.frame() %>% head()

  seqnames   start     end width strand Samp1    Samp2    Samp3    Samp4
1     chr1  905219  905619   401      * 30.96286 26.57474 43.41694 39.82934
2     chr1  959509  959909   401      * 46.86271 36.36544 45.02497 45.13992
3     chr1  966465  966865   401      * 66.94673 70.63287 36.98480 31.86347
4     chr1 1062705 1063105   401      * 51.88371 44.75746 41.80890 37.17405
5     chr1 1231475 1231875   401      * 36.82070 37.06477 28.94462 45.13992
6     chr1 1890661 1891061   401      * 65.27306 65.73752 30.55266 31.86347

For some reason, the results are very different from the values used for plotting by dba.plotHeatmap

 z <- dba.plotHeatmap(dbObj_norm, correlations = F, score = DBA_SCORE_NORMALIZED, maxSites = 10000) 
> z %>% as.data.frame() %>% arrange(seqnames, start, end) %>% head()
  seqnames   start     end width strand   Samp4  Samp3    Samp1    Samp2
1     chr1  905219  905619   401      * 5.315760 5.440186 4.952467 4.731984
2     chr1  959509  959909   401      * 5.496332 5.492653 5.550368 5.184496
3     chr1  966465  966865   401      * 4.993831 5.208860 6.064942 6.142268
4     chr1 1062705 1063105   401      * 5.216224 5.385738 5.697210 5.484056
5     chr1 1231475 1231875   401      * 5.496332 4.855224 5.202445 5.211977
6     chr1 1890661 1891061   401      * 4.993831 4.933226 6.028416 6.03864

  -------
  seqinfo: 23 sequences from an unspecified genome; no seqlengths

The order of the columns has changed due to clustering of course, but my point is that the values themselves are different by an order of magnitude. What is the cause of the difference? Are not the values of normalized scores (as in the first code block) the correct ones to be used for heatmap plotting?

> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_IL.UTF-8       LC_NUMERIC=C               LC_TIME=en_IL.UTF-8        LC_COLLATE=en_IL.UTF-8     LC_MONETARY=en_IL.UTF-8   
 [6] LC_MESSAGES=en_IL.UTF-8    LC_PAPER=en_IL.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_IL.UTF-8 LC_IDENTIFICATION=C       

time zone: Asia/Jerusalem
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.1-3          dplyr_1.1.2                 pheatmap_1.0.12             DiffBind_3.10.1             rgl_1.2.1                  
 [6] limma_3.56.2                DESeq2_1.40.1               SummarizedExperiment_1.30.2 Biobase_2.60.0              MatrixGenerics_1.12.2      
[11] matrixStats_1.0.0           GenomicRanges_1.52.0        GenomeInfoDb_1.36.0         IRanges_2.34.0              S4Vectors_0.38.1           
[16] BiocGenerics_0.46.0        

loaded via a namespace (and not attached):
 [1] bitops_1.0-7             deldir_1.0-9             rlang_1.1.1              magrittr_2.0.3           compiler_4.3.0           png_0.1-8               
 [7] vctrs_0.6.3              stringr_1.5.0            pkgconfig_2.0.3          crayon_1.5.2             fastmap_1.1.1            XVector_0.40.0          
[13] caTools_1.18.2           utf8_1.2.3               Rsamtools_2.16.0         rmarkdown_2.22           xfun_0.39                cachem_1.0.8            
[19] zlibbioc_1.46.0          jsonlite_1.8.5           DelayedArray_0.26.3      BiocParallel_1.34.2      jpeg_0.1-10              irlba_2.3.5.1           
[25] parallel_4.3.0           R6_2.5.1                 bslib_0.5.0              stringi_1.7.12           SQUAREM_2021.1           rtracklayer_1.60.0      
[31] jquerylib_0.1.4          numDeriv_2016.8-1.1      Rcpp_1.0.10              knitr_1.43               base64enc_0.1-3          Matrix_1.5-4.1          
[37] tidyselect_1.2.0         rstudioapi_0.14          yaml_2.3.7               gplots_3.1.3             codetools_0.2-19         hwriter_1.3.2.1         
[43] lattice_0.21-8           tibble_3.2.1             plyr_1.8.8               withr_2.5.0              ShortRead_1.58.0         evaluate_0.21           
[49] coda_0.19-4              Biostrings_2.68.1        pillar_1.9.0             KernSmooth_2.23-21       generics_0.1.3           invgamma_1.1            
[55] RCurl_1.98-1.12          truncnorm_1.0-9          emdbook_1.3.12           ggplot2_3.4.2            munsell_0.5.0            scales_1.2.1            
[61] ashr_2.2-54              gtools_3.9.4             glue_1.6.2               tools_4.3.0              apeglm_1.22.1            interp_1.1-4            
[67] BiocIO_1.10.0            BSgenome_1.68.0          locfit_1.5-9.8           GenomicAlignments_1.36.0 systemPipeR_2.6.1        mvtnorm_1.2-2           
[73] XML_3.99-0.14            grid_4.3.0               bbmle_1.0.25             amap_0.8-19              bdsmatrix_1.3-6          latticeExtra_0.6-30     
[79] colorspace_2.1-0         GenomeInfoDbData_1.2.10  restfulr_0.0.15          cli_3.6.1                GreyListChIP_1.32.0      fansi_1.0.4             
[85] mixsqp_0.3-48            S4Arrays_1.0.4           gtable_0.3.3             sass_0.4.6               digest_0.6.31            ggrepel_0.9.3           
[91] rjson_0.2.21             htmlwidgets_1.6.2        htmltools_0.5.5          lifecycle_1.0.3          MASS_7.3-60

DiffBind • 232 views

ADD COMMENT • link 6 weeks ago Sam ▴ 10

0

Entering edit mode

For heatmaps one usually uses standardized (aka Z-scored, which this almost certainly is) values which represent the deviation from the mean per gene. If you cluster expression values directly, even on log2 scale, then your heatmap will simply cluster by expression value, since expression level differences between genes are usually much larger than differential expression between samples.

ADD REPLY • link 6 weeks ago ATpoint ★ 4.0k

0

Entering edit mode

That makes sense to use standardized values for heatmaps.

In this specific example, the values that are returned from dba.plotHeatmap are not the z-score values. It can be seen from the fact that they are all positive. It cannot be the case that all values are higher than the mean of these very values.

I am still interested to know what are the values that dba.plotHeatmap returns.

ADD REPLY • link 6 weeks ago Sam ▴ 10