annotationhub: data update
@kasper-daniel-hansen-2979

If I run the following code to get RefSeq from UCSC:

library(AnnotationHub)
ahub <- AnnotationHub()
ahub <- subset(ahub, species == "Homo sapiens" & genome == "hg19")
query(ahub, "refseq")[1]

I see

> query(ahub, "refseq")[1]
AnnotationHub with 1 record
# snapshotDate(): 2018-08-01 
# names(): AH5040
# $dataprovider: UCSC
# $species: Homo sapiens
# $rdataclass: GRanges
# $rdatadateadded: 2013-03-26
# $title: RefSeq Genes
<SNIP>

suggesting to me that this data is from 2013. Additionally, if I change hg19 to hg38 my query comes up empty, despite the fact that UCSC at least right now has RefSeq genes released in 2018 for hg38 (as everyone would expect).

Question: I am surprised that AnnotationHub has these (to me) outdated resources. I expected that with each new Bioc release, the hub gets updated. But perhaps I am wrong?

shepherl 3.8k
@lshep

We keep all records, including old ones, for data reproducibility. As for updating: we update Bioconductor-maintained resources every release; for user-contributed annotations, we rely on the maintainers to submit proper updates. It would be impossible for us to regenerate user-contributed resources.

In this case, this is an interface to a UCSC resource which I believe is provided by the core team (I could be wrong though). The implication (to me) is that as time goes on, the AnnotationHub version of the resource deviates from UCSC. I understand short-term deviations, in the sense that UCSC has a different release cycle that does not match ours, but ultimately I think this is a potentially big problem, since it implies that the version of RefSeq I get from AnnotationHub appears to be more than 5 years old. That's a long time.

Best,
Kasper
Looking at the recipes in AnnotationHubData, I do not see anything for RefSeq data, which leads me to believe that it is not a core-provided resource. I can investigate this further and will bring it up at the core meeting later today.

These were all generated by Marc Carlson in 2013, when he was part of BioC core, so they seem to be core-provided?

> hub <- AnnotationHub()
snapshotDate(): 2018-04-30
> minihub <- query(hub, c("Homo sapiens","refGene"))

> mcols(minihub)[,c("title","genome","maintainer", "rdatadateadded")]
DataFrame with 8 rows and 4 columns
              title      genome                        maintainer rdatadateadded
        <character> <character>                       <character>    <character>
AH5040 RefSeq Genes        hg19 Marc Carlson <mcarlson@fhcrc.org>     2013-03-26
AH5041 Other RefSeq        hg19 Marc Carlson <mcarlson@fhcrc.org>     2013-03-26
AH5155 RefSeq Genes        hg18 Marc Carlson <mcarlson@fhcrc.org>     2013-03-26
AH5156 Other RefSeq        hg18 Marc Carlson <mcarlson@fhcrc.org>     2013-03-26
AH5306 RefSeq Genes        hg17 Marc Carlson <mcarlson@fhcrc.org>     2013-03-26
AH5307 Other RefSeq        hg17 Marc Carlson <mcarlson@fhcrc.org>     2013-03-26
AH5431 RefSeq Genes        hg16 Marc Carlson <mcarlson@fhcrc.org>     2013-03-26
AH5432 Other RefSeq        hg16 Marc Carlson <mcarlson@fhcrc.org>     2013-03-26
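
A rough way to put a number on how old such records are, assuming the metadata columns shown above have been copied into a plain data frame. The helper name `flag_stale` and the two-year threshold are hypothetical choices for illustration and are not part of AnnotationHub; the example rows are typed in from the table above rather than fetched from the hub.

```r
# Hypothetical helper (not an AnnotationHub function): compute the age of each
# record from its 'rdatadateadded' column and flag records older than a cutoff.
flag_stale <- function(meta, as_of = Sys.Date(), max_age_years = 2) {
  added <- as.Date(meta$rdatadateadded)
  # Age in years, using 365.25 days per year as an approximation.
  age_years <- as.numeric(difftime(as_of, added, units = "days")) / 365.25
  meta$age_years <- round(age_years, 1)
  meta$stale <- age_years > max_age_years
  meta
}

# Two rows copied by hand from the output above (assumption: with a live hub,
# a similar data frame could be built from mcols(minihub)).
meta <- data.frame(
  title = c("RefSeq Genes", "Other RefSeq"),
  genome = c("hg19", "hg19"),
  rdatadateadded = c("2013-03-26", "2013-03-26"),
  row.names = c("AH5040", "AH5041"),
  stringsAsFactors = FALSE
)

# Evaluate ages as of the date of this thread.
flagged <- flag_stale(meta, as_of = as.Date("2018-08-16"))
print(flagged)
```

Note that `rdatadateadded` records when the resource was added to the hub, so this only bounds how old the underlying UCSC data is; it says nothing about whether UCSC has since changed the track.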

The core team will come up with a strategy for updating these resources as part of our regular workflows.
