Hi,
I have encountered unexpectedly large disk images when exporting DESeqDataSet (a subclass of SummarizedExperiment) objects using the save function. The exported file is much larger than expected, which is a problem for large experiments. In the following example, a test object is created and exported. For comparison, the actual count values (a matrix of integers) are also extracted and saved.
In order to test compression, all count values are set to zero and the corresponding object and count matrix are exported once again.
The DESeqDataSet disk image (about 10 MB) is much larger than the expected size (about 4 MB), and the counts do not seem to be compressed by the save function. In comparison, the exported count matrix is highly compressed (1.3 MB, and 0.03 MB when all count values are zero). See the following code:
library(DESeq2)
# DESeq2_1.6.3
# create example data set
dds <- makeExampleDESeqDataSet(n=10000, m=100)
# size should be about 4 MB (10000 x 100 x 4 Bytes (integers))
print(object.size(dds@assays$data@listData$counts))
# 4000200 bytes
# extract counts for comparison
countMat <- counts(dds)
# set counts to zero to test compression
zeroDds <- dds; counts(zeroDds)[] <- 0L
zeroCountMat <- counts(zeroDds)
# save to disk
save(dds, file="dds.RData")
save(countMat, file="countMat.RData")
save(zeroCountMat, file="zeroCountMat.RData")
save(zeroDds, file="zeroDds.RData")
# report file size
for (f in c("dds.RData", "zeroDds.RData", "countMat.RData", "zeroCountMat.RData"))
    cat(f, ": ", file.info(f)$size / 2^20, " MB\n", sep="")
# dds.RData: 10.47201 MB
# zeroDds.RData: 10.50408 MB
# countMat.RData: 1.269134 MB
# zeroCountMat.RData: 0.02977085 MB
Maybe there is a conversion to double somewhere?
Thanks for any answers.

The large size comes from the 'design' slot
owd = setwd(tempdir())
sort(sapply(slotNames(dds), function(nm) {
    x = slot(dds, nm); save(x, file=nm); file.size(nm)
}))
setwd(owd)
## dispersionFunction           exptData            colData
##                 79                168                316
##            rowData             assays             design
##             206618            1316594           10771848
which I think captures the global environment(!).
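To illustrate how a formula can drag an environment along when saved, here is a minimal base R sketch. It does not use DESeq2, and the names make_formula and big are hypothetical. It assumes the design formula was created inside a function, so its attached environment holds large local objects; whether that is exactly what happens in DESeq2 1.6.3 is a guess on my part.

```r
# Hypothetical helper: builds a formula inside a function, so the
# formula's environment is the function's execution frame.
make_formula <- function() {
    big <- runif(1e5)   # large local object; it lives in the frame
    ~ condition         # the formula keeps a reference to that frame
}

f <- make_formula()
ls(environment(f))      # lists the captured local variable(s)

# Saving the formula serializes its environment, including 'big'.
tmp_fun <- tempfile(fileext = ".RData")
save(f, file = tmp_fun)

# Same formula, re-pointed at the global environment, which save()
# records as a reference rather than writing out its contents.
g <- f
environment(g) <- globalenv()
tmp_glob <- tempfile(fileext = ".RData")
save(g, file = tmp_glob)

# Compare the two file sizes in bytes.
c(captured = file.size(tmp_fun), global = file.size(tmp_glob))
```

If this is the cause, resetting the environment of the design formula before saving might shrink the file, but I have not checked that against the DESeqDataSet itself.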