updateObject() fails for old serialized GRanges objects
@jeff-johnston-6497
Last seen 6.3 years ago
United States

We have a number of serialized GRanges objects in RData format that cannot seem to be handled by updateObject() in Bioconductor 3.0. These all seem to have been created with Bioconductor 2.8. The version 2.9 objects appear to be working fine. Here's the output from calling updateObject():

> load("samples/toll10b_k27ac_2to4h_1.ranges.RData")
> updateObject(toll10b_k27ac_2to4h_1.ranges, verbose=TRUE)
updateObject(object = 'GRanges')
Error in names(ans) <- seqnames(x) :
  'names' attribute [12] must be the same length as the vector [0]

I can provide a download link for the saved object if necessary. The seqnames(), start(), end() and strand() accessors all work on the loaded object, so I am able to recreate it.

However, this issue has prompted me to re-examine whether it is realistic to serialize Bioconductor objects to disk and expect them to remain accessible in all future versions. We prefer saving sequencing results as serialized GRanges because they are extremely fast to load back into R, compared with re-importing them from their source BAM. We also often perform some read filtering, so the resulting GRanges differ from the source BAM files. Since our projects span multiple Bioconductor releases, we easily end up with collections of GRanges objects from various versions of Bioconductor.

Until now, updateObject() has kept us from running into any issues, and this first problem might be due to a simple bug, but I would like to hear if anyone thinks there might be a better format for storing GRanges-type information on disk over the long term. We really only have two major requirements:

1. The format can be quickly loaded into R as a GRanges object

2. The resulting GRanges object has the correct seqlengths() set

Thanks,

Jeff

 

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicRanges_1.18.1 GenomeInfoDb_1.2.2   IRanges_2.0.0        S4Vectors_0.4.0      BiocGenerics_0.12.0  setwidth_1.0-3      

loaded via a namespace (and not attached):
[1] XVector_0.6.0
@herve-pages-1542
Last seen 1 day ago
Seattle, WA, United States

Hi Jeff,

Please provide a download link for the saved object.

Serializing objects to disk and expecting them to be accessible in all future versions is probably a reasonable expectation for standard R objects like atomic vectors (character, numeric, etc.), factors, lists, and data.frames, and also for S3 objects. (But even that statement might need confirmation from the R core team.) For S4 objects in general, you cannot expect this. Bioconductor objects are mostly S4 objects, and keeping them compatible with future versions of Bioconductor requires a long-term commitment from the maintainer of the class. Core classes like eSet, GRanges, SummarizedExperiment, and DNAStringSet are maintained by core members of the project who are committed to keeping them compatible with as many future versions as possible. However, note that some classes that were once considered core (e.g. RangedData, RangedDataList, GenomeData, GenomeDataList) no longer are, because they have been superseded by other classes (e.g. GRanges, GRangesList). Hence they will go away at some point in the future.
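Following the point above about standard R objects, one possible approach (a minimal sketch, not an established Bioconductor convention) is to store only base R structures on disk and rebuild the GRanges on load. The helper names granges_to_plain() and plain_to_granges() are hypothetical, the sequence lengths in the example are illustrative values only, and this version drops any metadata columns. Whether it loads quickly enough for large read sets would need to be benchmarked.

```r
library(GenomicRanges)

# Hypothetical helper: flatten a GRanges into base R objects only
# (a data.frame plus a named integer vector of sequence lengths).
# Metadata columns are NOT preserved in this sketch.
granges_to_plain <- function(gr) {
  list(
    ranges = data.frame(
      seqnames = as.character(seqnames(gr)),
      start    = start(gr),
      end      = end(gr),
      strand   = as.character(strand(gr)),
      stringsAsFactors = FALSE
    ),
    seqlengths = seqlengths(gr)
  )
}

# Hypothetical helper: rebuild a GRanges from the plain structure,
# restoring the seqlevels (in their original order) and seqlengths.
plain_to_granges <- function(x) {
  GRanges(
    seqnames   = factor(x$ranges$seqnames, levels = names(x$seqlengths)),
    ranges     = IRanges(start = x$ranges$start, end = x$ranges$end),
    strand     = x$ranges$strand,
    seqlengths = x$seqlengths
  )
}

# Round trip through an .rds file containing only base R objects.
gr <- GRanges(
  seqnames   = c("chr2L", "chr2L", "chrX"),
  ranges     = IRanges(start = c(100L, 500L, 42L), width = 36L),
  strand     = c("+", "-", "+"),
  seqlengths = c(chr2L = 23011544L, chrX = 22422827L)  # illustrative lengths
)
path <- tempfile(fileext = ".rds")
saveRDS(granges_to_plain(gr), path)
gr2 <- plain_to_granges(readRDS(path))
```

Because the file contains no S4 instances, reading it back does not depend on the GRanges class definition that was current when it was written; only the rebuild step uses the installed version of GenomicRanges.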

IMO a good practice is to always keep around the source data and recipe that were used to generate the serialized object. The recipe is more important than the object itself. It not only allows you to regenerate the object when the class definition has changed (after maybe some adjustments to the recipe) but it's also the ultimate reference for knowing exactly what went into the object (e.g. what kind of filtering was applied to the data).
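As an illustration of what such a recipe might look like, here is a short sketch. It assumes the reads come from a BAM file and that the filtering is a simple mapping-quality cutoff; the function name build_ranges() and the default threshold are hypothetical stand-ins for whatever filtering was actually applied.

```r
library(GenomicAlignments)  # readGAlignments()
library(Rsamtools)          # ScanBamParam()

# Hypothetical recipe: regenerate the GRanges from the source BAM.
# The mapping-quality cutoff stands in for the real filtering steps.
build_ranges <- function(bam_path, min_mapq = 10L) {
  param <- ScanBamParam(mapqFilter = min_mapq)
  aln <- readGAlignments(bam_path, param = param)
  # granges() keeps the seqlengths that were read from the BAM header
  granges(aln)
}
```

Keeping a script like this under version control next to the saved object means the object can be rebuilt under any later Bioconductor release, for example with gr <- build_ranges("sample.bam"), where the file name is a placeholder.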

Cheers,

H.


Thanks Hervé. Here is a link for the object: https://dl.dropboxusercontent.com/u/726183/bioconductor/toll10b_k27ac_2to4h_1.ranges.RData

We certainly have the ability to recreate all of these objects under a new version of Bioconductor using the original source data; the serialized GRanges objects are solely for convenience.


Hi Jeff,

Thanks for the link.

I just committed a fix in release (BioC 3.0) and devel (BioC 3.1). The fix is in the GenomeInfoDb package (version 1.2.3 in release and 1.3.7 in devel) and will propagate to our public package repositories tomorrow morning (Seattle time). Make sure you run biocLite() with no arguments tomorrow morning to get it (besides GenomeInfoDb, that will also update any other package that is out of sync). If you can't wait, you can grab and install the latest GenomeInfoDb directly from svn:

Release:

svn co https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_0/madman/Rpacks/GenomeInfoDb

Devel:

svn co https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/GenomeInfoDb

Login/password: readonly/readonly

Cheers,

H.

 
