Programmatically download the correct genome-wide annotation package for a given organism/assembly
s1437643 ▴ 20

I'm looking for a way to programmatically download the correct genome-wide annotation package, given one or a combination of the following arguments to a script:

Organism: Mmusculus
Assembly: mm10

The result would be that the org.Mm.eg.db package is downloaded.

annotationdbi organismdbi
Johannes Rainer ★ 2.1k

You could use AnnotationHub to query for available resources:

library(AnnotationHub)
ah <- AnnotationHub()
query(ah, "org.Mm.eg")
AnnotationHub with 1 record
# snapshotDate(): 2017-01-05
# names(): AH52234
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Mus musculus
# $rdataclass: OrgDb
# $title: org.Mm.eg.db.sqlite
# $description: NCBI gene ID based annotations about Mus musculus
# $taxonomyid: 10090
# $genome: NCBI genomes
# $sourcetype: NCBI/ensembl
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.ensembl.org/p...
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: c("NCBI", "Gene", "Annotation")
# retrieve record with 'object[["AH52234"]]'

The problem seems to be that this resource is not tagged with mm10. If you want all resources for mm10, you can query for that too.
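
Because the OrgDb record carries no assembly tag, one option is to look it up by species and data class instead. The following is only a sketch under that assumption: fetch_orgdb is a hypothetical helper (not part of AnnotationHub), and taking the first hit is an arbitrary choice that should be checked against the query result.

```r
# Sketch: fetch an OrgDb from AnnotationHub by species rather than by assembly.
# fetch_orgdb is a hypothetical helper, not an AnnotationHub function.
fetch_orgdb <- function(species, hub = AnnotationHub::AnnotationHub()) {
  # With the default pattern.op, every search term must match the record.
  hits <- AnnotationHub::query(hub, c("OrgDb", species))
  if (length(hits) == 0L) {
    stop("No OrgDb record found for species: ", species)
  }
  if (length(hits) > 1L) {
    message(length(hits), " OrgDb records matched; using ", names(hits)[1])
  }
  # Assumption: the first hit is the wanted one; inspect the query result to be sure.
  hits[[names(hits)[1]]]
}

# Usage (needs network access; the record is cached after the first download):
# orgdb <- fetch_orgdb("Mus musculus")
# AnnotationDbi::select(orgdb, keys = "Trp53", keytype = "SYMBOL",
#                       columns = "ENTREZID")
```

The hub is passed as an argument so that a script can create it once and reuse it across several lookups.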

The UCSC mm10 assembly corresponds to GRCm38, so you could use that to fetch e.g. GTF files to build either a TxDb or an EnsDb, but unfortunately that is not an OrgDb resource.

query(ah, c("GRCm38", "gtf"))
AnnotationHub with 23 records
# snapshotDate(): 2017-01-05
# $dataprovider: Ensembl
# $species: Mus musculus
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH7567"]]'

            title                                          
  AH7567  | Mus_musculus.GRCm38.70.gtf                     
  AH7628  | Mus_musculus.GRCm38.69.gtf                     
  AH7675  | Mus_musculus.GRCm38.71.gtf                     
  ...       ...                                            
  AH51038 | Mus_musculus.GRCm38.85.chr.gtf                 
  AH51039 | Mus_musculus.GRCm38.85.chr_patch_hapl_scaff.gtf
  AH51040 | Mus_musculus.GRCm38.85.gtf    

That could work - I'd simply have to take the first two letters of the organism parameter and perform the query you suggested.
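
A minimal sketch of that idea, assuming the organism is given in the BSgenome-style form shown in the question ("Mmusculus"). Both function names are hypothetical, and the installation step assumes the BiocManager package is available. The two-letter rule does not hold for every organism: rhesus macaque is org.Mmu.eg.db, and yeast and Arabidopsis use org.Sc.sgd.db and org.At.tair.db, so a lookup table may be safer for anything beyond the common cases.

```r
# Hypothetical helper: derive an org.*.eg.db package name from a
# BSgenome-style organism string such as "Mmusculus" or "Hsapiens".
org_pkg_name <- function(organism) {
  stopifnot(is.character(organism), length(organism) == 1L,
            nchar(organism) >= 2L)
  # First letter upper case (genus), second letter lower case (species).
  prefix <- paste0(toupper(substr(organism, 1, 1)),
                   tolower(substr(organism, 2, 2)))
  paste0("org.", prefix, ".eg.db")
}

# Hypothetical helper: install the package if it is missing, then return
# the OrgDb object it exports (the object has the same name as the package).
load_org_db <- function(organism) {
  pkg <- org_pkg_name(organism)
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Assumes BiocManager is installed; needs network access.
    BiocManager::install(pkg, ask = FALSE, update = FALSE)
  }
  getExportedValue(pkg, pkg)
}

# Usage:
# org_pkg_name("Mmusculus")
# orgdb <- load_org_db("Mmusculus")
```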
