Hi,
I have encountered unexpectedly large disk images when exporting DESeqDataSet (a subclass of SummarizedExperiment) objects with the save function. The exported file is much larger than expected, which is a problem for large experiments. In the example below, a test object is created and exported. For comparison, the actual count values (an integer matrix) are also extracted and saved.
To test compression, all count values are then set to zero and the resulting object and count matrix are exported again.
The DESeqDataSet disk image (about 10 MB) is much larger than the expected size (about 4 MB), and the counts do not appear to be compressed by save. In comparison, the exported count matrix is highly compressed (1.3 MB, and 0.03 MB with zero count values). See the following code:
library(DESeq2) # DESeq2_1.6.3

# create example data set
dds <- makeExampleDESeqDataSet(n=10000, m=100)

# size should be about 4 MB (10000 x 100 x 4 bytes (integers))
print(object.size(dds@assays$data@listData$counts)) # 4000200 bytes

# extract counts for comparison
countMat <- counts(dds)

# set counts to zero to test compression
zeroDds <- dds
counts(zeroDds)[] <- 0L
zeroCountMat <- counts(zeroDds)

# save to disk
save(dds, file="dds.RData")
save(countMat, file="countMat.RData")
save(zeroCountMat, file="zeroCountMat.RData")
save(zeroDds, file="zeroDds.RData")

# report file size
for (f in c("dds.RData", "zeroDds.RData", "countMat.RData", "zeroCountMat.RData"))
    cat(f, ": ", file.info(f)$size / 2^20, " MB\n", sep="")
# dds.RData: 10.47201 MB
# zeroDds.RData: 10.50408 MB
# countMat.RData: 1.269134 MB
# zeroCountMat.RData: 0.02977085 MB
Could there be a conversion to double somewhere?
Thanks for any answers.
The large size comes from the 'design' slot
which I think captures the global environment(!).
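This effect can be reproduced without DESeq2: an R formula carries a reference to the environment in which it was created, and save() serializes that environment (and everything in it) along with the formula. A minimal sketch, not using DESeq2; `bigObject` and the file names are made up for illustration:

```r
# A formula created inside local() captures that local environment,
# including any (possibly large) objects defined there.
f <- local({
    bigObject <- rnorm(1e6)   # ~8 MB of local data
    ~ condition               # formula created in this environment
})

save(f, file="formula.RData")
file.info("formula.RData")$size       # several MB, because bigObject is serialized too

# Resetting the formula's environment drops the captured data.
# save() stores the global environment as a reference, not by content.
environment(f) <- globalenv()
save(f, file="formula_small.RData")
file.info("formula_small.RData")$size # tiny
```

For a DESeqDataSet, the analogous workaround would be to reset the environment of the design formula before saving, e.g. `environment(design(dds))` via the accessor or directly on the slot; whether the design<- setter re-validates the object may depend on the DESeq2 version, so this should be checked against your installation.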