Question: Programmatically download correct genome-wide annotation package for given organism/assembly
2.7 years ago, s1437643 wrote:

I'm looking for a way to programmatically download the correct genome-wide annotation package, given one or a combination of the following arguments to a script:

Organism: Mmusculus
Assembly: mm10

The expected result is that the org.Mm.eg.db package is downloaded.

annotationdbi organismdbi • 603 views
Answer: Programmatically download correct genome-wide annotation package for given organi
2.7 years ago, Johannes Rainer (Italy) wrote:

You could use AnnotationHub to query for available resources:

library(AnnotationHub)
ah <- AnnotationHub()
query(ah, "org.MM.eg")
AnnotationHub with 1 record
# snapshotDate(): 2017-01-05
# names(): AH52234
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Mus musculus
# $rdataclass: OrgDb
# $title: org.Mm.eg.db.sqlite
# $description: NCBI gene ID based annotations about Mus musculus
# $taxonomyid: 10090
# $genome: NCBI genomes
# $sourcetype: NCBI/ensembl
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.ensembl.org/p...
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: c("NCBI", "Gene", "Annotation")
# retrieve record with 'object[["AH52234"]]'

The problem seems to be that this resource is not tagged with mm10. If you want all resources for mm10, you can query for that too.
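As a rough sketch of that suggestion, the helpers below wrap two searches: one for every record that mentions an assembly name, and one that works around the missing mm10 tag by searching on species plus the OrgDb class. The function names `find_assembly_resources` and `find_orgdb` are made up for illustration; only `AnnotationHub()` and `query()` come from the AnnotationHub package. The species search may return more than one record, so inspect the hits before picking one.

```r
# Hypothetical helpers (names are illustrative, not part of AnnotationHub).
find_assembly_resources <- function(assembly) {
  ah <- AnnotationHub::AnnotationHub()
  # All records whose metadata mention the assembly, e.g. "mm10"
  AnnotationHub::query(ah, assembly)
}

find_orgdb <- function(species) {
  ah <- AnnotationHub::AnnotationHub()
  # Every search term must match, so this keeps OrgDb records for the species
  AnnotationHub::query(ah, c(species, "OrgDb"))
}

# Usage (needs network access and the AnnotationHub package):
# find_assembly_resources("mm10")
# hits  <- find_orgdb("Mus musculus")
# orgdb <- hits[[names(hits)[1]]]  # retrieve the first matching record
```

Record IDs such as AH52234 belong to a specific hub snapshot, so searching by species and class is likely to be more robust in a script than hard-coding an ID.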

The UCSC mm10 assembly corresponds to GRCm38, so you could possibly use that to fetch, e.g., GTF files to build either a TxDb or an EnsDb, but unfortunately that is not an OrgDb resource.

query(ah, c("GRCm38", "gtf"))
AnnotationHub with 23 records
# snapshotDate(): 2017-01-05
# $dataprovider: Ensembl
# $species: Mus musculus
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH7567"]]'

            title                                          
  AH7567  | Mus_musculus.GRCm38.70.gtf                     
  AH7628  | Mus_musculus.GRCm38.69.gtf                     
  AH7675  | Mus_musculus.GRCm38.71.gtf                     
  ...       ...                                            
  AH51038 | Mus_musculus.GRCm38.85.chr.gtf                 
  AH51039 | Mus_musculus.GRCm38.85.chr_patch_hapl_scaff.gtf
  AH51040 | Mus_musculus.GRCm38.85.gtf    
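
A minimal sketch of the TxDb route, assuming one of the GTF records listed above is still present in the current hub snapshot. The helper name `gtf_record_to_txdb` is hypothetical. `makeTxDbFromGRanges()` was provided by GenomicFeatures at the time of this thread; recent Bioconductor releases ship it in the txdbmaker package, so the namespace may need adjusting.

```r
# Hypothetical helper: turn one of the GTF records listed above into a TxDb.
gtf_record_to_txdb <- function(record_id = "AH51040") {
  ah <- AnnotationHub::AnnotationHub()
  # Downloads the record and loads the GTF as a GRanges object
  gtf <- ah[[record_id]]
  # Assumption: GenomicFeatures still exports makeTxDbFromGRanges();
  # on newer Bioconductor releases use txdbmaker::makeTxDbFromGRanges()
  GenomicFeatures::makeTxDbFromGRanges(gtf)
}

# Usage (needs network access):
# txdb <- gtf_record_to_txdb("AH51040")
```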

That could work - I'd simply have to parse the organism parameter for the first two letters and perform the query search you suggested.

Reply written 2.7 years ago by s1437643
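
The parsing idea from the reply could look roughly like this. The helper `orgdb_name` is hypothetical and assumes the organism is given in the "Mmusculus" style (genus initial followed by the species name). It also assumes the package follows the `org.Xx.eg.db` pattern, which does not hold for every organism (yeast uses org.Sc.sgd.db and Arabidopsis uses org.At.tair.db).

```r
# Hypothetical helper: derive an OrgDb package name from an organism string
# such as "Mmusculus", using its first two letters.
orgdb_name <- function(organism) {
  stopifnot(is.character(organism), length(organism) == 1,
            nchar(organism) >= 2)
  # Upper-case genus initial plus lower-case species initial, e.g. "Mm"
  code <- paste0(toupper(substr(organism, 1, 1)),
                 tolower(substr(organism, 2, 2)))
  paste0("org.", code, ".eg.db")
}

orgdb_name("Mmusculus")  # "org.Mm.eg.db"
orgdb_name("Hsapiens")   # "org.Hs.eg.db"

# The same two-letter code can be used as a search term, or the package can
# be installed directly (needs network access and the BiocManager package):
# BiocManager::install(orgdb_name("Mmusculus"))
```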