Hello,
I want to start by saying that I am still relatively green when it comes to the intricacies of R packages. I have read through the DiffBind vignette a few times and have tried going through the source code, but I am still trying to understand the meaning of some of the plots that come out of the package, namely dba.plotHeatmap() and dba.plotPCA(). I can run through a pipeline with the package without issue, but I really want to understand what is happening behind the scenes. Specifically, I want to know which features are being used for the clustering.
For example with the heatmaps:
tamoxifen <- dba(sampleSheet="tamoxifen.csv",
                 dir=system.file("extra", package="DiffBind"))
plot(tamoxifen)
The plot that is produced is: [heatmap image]
And when I run this code:
tamoxifen_counts <- dba.count(tamoxifen, summits=250)
plot(tamoxifen_counts)
The plot that is produced is: [heatmap image]
What features does each function use for clustering, both for the correlation heatmap and for the PCA? I'm looking for a more in-depth explanation than what is stated in the vignette.
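To make the question concrete, here is a minimal base-R sketch of my current mental model. It is only an assumption on my part and is not taken from the DiffBind source: the toy matrix `scores` and the names `sample_cor`, `sample_tree` and `pca` are made up for illustration, and I do not know whether DiffBind uses the same score, correlation method, or scaling.

```r
# Toy score matrix: rows = peaks (the "features"), columns = samples.
# Values are simulated for illustration; they are NOT DiffBind output.
set.seed(1)
scores <- matrix(rpois(200 * 4, lambda = 20), nrow = 200,
                 dimnames = list(paste0("peak", 1:200),
                                 paste0("sample", 1:4)))

# Assumed correlation heatmap: correlate samples using their per-peak
# scores, then cluster the samples on 1 - correlation.
sample_cor  <- cor(scores, method = "pearson")
sample_tree <- hclust(as.dist(1 - sample_cor))
heatmap(sample_cor, Rowv = as.dendrogram(sample_tree), symm = TRUE)

# Assumed PCA: samples are the observations, peaks are the variables.
pca <- prcomp(t(scores), center = TRUE, scale. = FALSE)
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2")
```

If this is roughly right, then my question becomes: what plays the role of `scores` in each of the two plots above, before and after dba.count()?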
Thank you!
Joseph
sessionInfo():
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.6 (Maipo)
Matrix products: default
BLAS: /usr/local/gcc-6_3_0/lapack/3.7.0/lib/libblas.so.3.7.0
LAPACK: /usr/local/gcc-6_3_0/lapack/3.7.0/lib/liblapack.so.3.7.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
[9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] XLConnect_0.2-15 XLConnectJars_0.2-15
[3] DiffBind_2.10.0 SummarizedExperiment_1.12.0
[5] DelayedArray_0.8.0 BiocParallel_1.16.6
[7] matrixStats_0.53.1 Biobase_2.40.0
[9] GenomicRanges_1.34.0 GenomeInfoDb_1.18.2
[11] IRanges_2.16.0 S4Vectors_0.20.1
[13] BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] Category_2.48.1 bitops_1.0-6 bit64_0.9-7
[4] RColorBrewer_1.1-2 progress_1.2.0 httr_1.4.0
[7] Rgraphviz_2.26.0 backports_1.1.2 tools_3.5.0
[10] R6_2.2.2 KernSmooth_2.23-15 DBI_1.0.0
[13] lazyeval_0.2.1 colorspace_1.3-2 tidyselect_0.2.5
[16] prettyunits_1.0.2 bit_1.1-13 compiler_3.5.0
[19] sendmailR_1.2-1 graph_1.60.0 rtracklayer_1.42.2
[22] checkmate_1.8.5 caTools_1.17.1.2 scales_0.5.0
[25] BatchJobs_1.8 genefilter_1.64.0 RBGL_1.58.2
[28] stringr_1.3.1 digest_0.6.15 Rsamtools_1.34.1
[31] AnnotationForge_1.24.0 XVector_0.22.0 base64enc_0.1-3
[34] pkgconfig_2.0.1 limma_3.36.1 rlang_0.4.5
[37] RSQLite_2.1.1 BBmisc_1.11 GOstats_2.48.0
[40] hwriter_1.3.2 gtools_3.8.1 dplyr_0.8.5
[43] RCurl_1.95-4.10 magrittr_1.5 GO.db_3.6.0
[46] GenomeInfoDbData_1.2.0 Matrix_1.2-14 Rcpp_1.0.4
[49] munsell_0.4.3 stringi_1.2.2 edgeR_3.22.1
[52] zlibbioc_1.28.0 gplots_3.0.3 plyr_1.8.4
[55] grid_3.5.0 blob_1.1.1 ggrepel_0.8.2
[58] gdata_2.18.0 crayon_1.3.4 lattice_0.20-35
[61] Biostrings_2.50.2 splines_3.5.0 GenomicFeatures_1.34.8
[64] annotate_1.58.0 hms_0.4.2 locfit_1.5-9.1
[67] pillar_1.4.3 rjson_0.2.20 systemPipeR_1.16.1
[70] biomaRt_2.38.0 XML_3.98-1.11 glue_1.3.0
[73] ShortRead_1.40.0 latticeExtra_0.6-28 data.table_1.11.2
[76] gtable_0.2.0 purrr_0.2.5 amap_0.8-11
[79] assertthat_0.2.0 ggplot2_3.1.0 xtable_1.8-2
[82] survival_2.42-3 tibble_2.1.3 pheatmap_1.0.12
[85] rJava_0.9-11 GenomicAlignments_1.18.1 AnnotationDbi_1.44.0
[88] memoise_1.1.0 brew_1.0-6 GSEABase_1.44.0