Hi,
I am trying to generate small files to include in my package for regression tests. One of them is a small DESeqDataSet object (object dds_small below, the first 50 features from a complete analysis stored in object dds). However, when I save the small object, its size remains very large:
> dds <- readRDS("2018-02-12_all_tissues/dds.rds")
> object.size(dds)
12579016 bytes
> dds_small <- dds[1:50,]
> object.size(dds_small)
111056 bytes
> length(serialize(dds_small, NULL))
[1] 45625706
Once serialized, the small object is even larger than the original object as reported by object.size()! The design slot seems to be what uses so much space, as there appears to be an environment attached to it:
> dds_small@design
~(Tissue/Age)/Genotype
<environment: 0x3e64708>
> object.size(dds_small@design)
1344 bytes
> length(serialize(dds_small@design, NULL))
[1] 45353218
This environment probably stores a bunch of packages that were in use when the original object was created, because the sessionInfo (below) reports many loaded packages, although I only ran a readRDS command in a fresh R session.
As I am not familiar with environments or with DESeqDataSet internals, my question is: what should I do to keep the size of my subset object small?
Thanks for your help,
Eric
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /home/eblanc/R/R-3.5.1/lib/libRblas.so
LAPACK: /home/eblanc/R/R-3.5.1/lib/libRlapack.so

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
 [1] Biobase_2.40.0              bit64_0.9-7
 [3] splines_3.5.1               Formula_1.2-3
 [5] assertthat_0.2.0            stats4_3.5.1
 [7] latticeExtra_0.6-28         blob_1.1.1
 [9] GenomeInfoDbData_1.1.0      pillar_1.3.0
[11] RSQLite_2.1.1               backports_1.1.2
[13] lattice_0.20-35             glue_1.3.0
[15] digest_0.6.17               GenomicRanges_1.32.7
[17] RColorBrewer_1.1-2          XVector_0.20.0
[19] checkmate_1.8.5             colorspace_1.3-2
[21] htmltools_0.3.6             Matrix_1.2-14
[23] plyr_1.8.4                  DESeq2_1.20.0
[25] XML_3.98-1.16               pkgconfig_2.0.2
[27] rseqCP_0.1.0                genefilter_1.62.0
[29] zlibbioc_1.26.0             purrr_0.2.5
[31] xtable_1.8-3                scales_1.0.0
[33] BiocParallel_1.14.2         htmlTable_1.12
[35] tibble_1.4.2                annotate_1.58.0
[37] IRanges_2.14.12             ggplot2_3.0.0
[39] SummarizedExperiment_1.10.1 nnet_7.3-12
[41] BiocGenerics_0.26.0         lazyeval_0.2.1
[43] survival_2.42-3             magrittr_1.5
[45] crayon_1.3.4                memoise_1.1.0
[47] foreign_0.8-70              tools_3.5.1
[49] data.table_1.11.6           matrixStats_0.54.0
[51] stringr_1.3.1               S4Vectors_0.18.3
[53] locfit_1.5-9.1              munsell_0.5.0
[55] cluster_2.0.7-1             DelayedArray_0.6.6
[57] AnnotationDbi_1.42.1        bindrcpp_0.2.2
[59] compiler_3.5.1              GenomeInfoDb_1.16.0
[61] rlang_0.2.2                 grid_3.5.1
[63] RCurl_1.95-4.11             rstudioapi_0.7
[65] htmlwidgets_1.2             bitops_1.0-6
[67] base64enc_0.1-3             gtable_0.2.0
[69] DBI_1.0.0                   R6_2.2.2
[71] gridExtra_2.3               knitr_1.20
[73] dplyr_0.7.6                 bit_1.1-14
[75] bindr_0.1.1                 Hmisc_4.1-1
[77] stringi_1.2.4               parallel_3.5.1
[79] Rcpp_0.12.18                geneplotter_1.58.0
[81] rpart_4.1-13                acepack_1.4.1
[83] tidyselect_0.2.4
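The behaviour described in the question can be reproduced with base R alone, which may help narrow down the cause. The sketch below is only an illustration under assumptions: make_formula and big are hypothetical names, and the large vector stands in for whatever happened to live in the environment where the original design formula was created. It shows that object.size() does not count a formula's environment while serialize() (and therefore saveRDS()) writes it out, and that pointing the formula at the global environment, which R serializes as a reference rather than by content, makes the serialized size small again.

```r
# A formula keeps a reference to the environment in which it was created.
# make_formula() and big are hypothetical, for illustration only.
make_formula <- function() {
  big <- rnorm(1e6)            # large object living in the function's frame
  ~ (Tissue / Age) / Genotype  # the formula captures that frame
}

f <- make_formula()

# object.size() ignores the attached environment ...
print(object.size(f))
# ... but serialization (and hence saveRDS) includes its contents.
size_before <- length(serialize(f, NULL))
print(size_before)

# Re-point the formula at the global environment, which is serialized
# as a reference only, not by content.
f_small <- f
environment(f_small) <- globalenv()
size_after <- length(serialize(f_small, NULL))
print(size_after)
```

For the DESeqDataSet in the question, the analogous step would presumably be to give the design formula a clean environment in the same way and assign it back with design(dds_small) <- ..., then check length(serialize(dds_small, NULL)) again. This has not been tested against DESeq2 1.20.0.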
Thanks Michael, and sorry I wasn't able to find the relevant thread...
I have a hard time finding old threads myself! And (1) is new as of the last two versions, since I got tired of dealing with formula() and its greedy behavior.