We have a number of serialized GRanges objects in RData format that cannot seem to be handled by updateObject() in Bioconductor 3.0. These all seem to have been created with Bioconductor 2.8. The version 2.9 objects appear to be working fine. Here's the output from calling updateObject():
> load("samples/toll10b_k27ac_2to4h_1.ranges.RData")
> updateObject(toll10b_k27ac_2to4h_1.ranges, verbose=TRUE)
updateObject(object = 'GRanges')
Error in names(ans) <- seqnames(x) :
  'names' attribute [12] must be the same length as the vector [0]
I can provide a download link for the saved object if necessary. The seqnames(), start(), end() and strand() accessors all work on the loaded object, so I am able to recreate it. But this issue has prompted me to re-examine whether serializing Bioconductor objects to disk and expecting them to be accessible in all future versions is realistic.

We prefer saving sequencing results as serialized GRanges because it is extremely fast to load them back into R, as opposed to re-importing them from their source BAM. Also, we often perform some read filtering so that the resulting GRanges differ from the source BAM files. Since our projects span multiple Bioconductor releases, we easily end up with collections of GRanges objects from various versions of Bioconductor over time.

Until now updateObject() prevented us from running into any issues, and this first issue might be due to a simple bug, but I would like to hear if anyone thinks there might be a better format for storing GRanges-type information on-disk over the long term. We really only have two major requirements:
1. The format can be quickly loaded into R as a GRanges object
2. The resulting GRanges object has the correct seqlengths() set
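For reference, the recreation through the accessors that still work could look roughly like the sketch below. The helper name rebuild_granges() and its seqlens argument are hypothetical, not part of any package: seqlens is assumed to be a named integer vector of chromosome lengths supplied from elsewhere (for example a BSgenome package), since the stored sequence information is the part that fails to update. Metadata columns are not carried over in this sketch.

```r
library(GenomicRanges)

# Hypothetical helper: rebuild a GRanges from an object whose basic
# accessors (seqnames, start, end, strand) still work.
# 'seqlens' is an assumed input: a named integer vector of chromosome
# lengths that the caller supplies.
rebuild_granges <- function(old_gr, seqlens) {
  gr <- GRanges(
    seqnames = as.character(seqnames(old_gr)),
    ranges   = IRanges(start = start(old_gr), end = end(old_gr)),
    strand   = as.character(strand(old_gr))
  )
  # Fail early if a chromosome present in the data has no known length.
  stopifnot(all(seqlevels(gr) %in% names(seqlens)))
  # Assign lengths in the order of the rebuilt object's seqlevels.
  seqlengths(gr) <- seqlens[seqlevels(gr)]
  gr
}
```

After load(), something like rebuild_granges(toll10b_k27ac_2to4h_1.ranges, seqlens) would return a fresh object that can be saved again under the current release.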
Thanks,
Jeff
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] GenomicRanges_1.18.1 GenomeInfoDb_1.2.2 IRanges_2.0.0 S4Vectors_0.4.0 BiocGenerics_0.12.0 setwidth_1.0-3

loaded via a namespace (and not attached):
[1] XVector_0.6.0
Thanks Hervé. Here is a link for the object: https://dl.dropboxusercontent.com/u/726183/bioconductor/toll10b_k27ac_2to4h_1.ranges.RData
We certainly have the ability to recreate all of these objects under a new version of Bioconductor using the original source data; the serialized GRanges objects are solely for convenience.
Hi Jeff,
Thanks for the link.
I just committed a fix in release (BioC 3.0) and devel (BioC 3.1). The fix is in the GenomeInfoDb package (version 1.2.3 in release and 1.3.7 in devel) and will propagate to our public package repositories tomorrow morning (Seattle time). Make sure you run biocLite() with no arguments tomorrow morning to get it (besides GenomeInfoDb, that will also update any other package that is out of sync). If you can't wait, you can grab and install the latest GenomeInfoDb directly from svn:
Release:
Devel:
Login/password: readonly/readonly
Cheers,
H.