Question

Error downloading NCBI data with makeOrgPackageFromNCBI?

lesneski • 0

@lesneski-22963

Last seen 4.2 years ago

Boston University

Hello, I have been trying to use makeOrgPackageFromNCBI for what I believe is a well-annotated organism (Acropora millepora, taxon ID 45264). After that was unsuccessful, I tried to run the example by Marc Carlson from this page:

https://bioconductor.org/packages/release/bioc/vignettes/AnnotationForge/inst/doc/MakingNewOrganismPackages.html

That is, using

makeOrgPackageFromNCBI(version = "0.1",
                       author = "Some One <so@someplace.org>",
                       maintainer = "Some One <so@someplace.org>",
                       outputDir = ".",
                       tax_id = "59729",
                       genus = "Taeniopygia",
                       species = "guttata")

But I get this return with an error at the bottom:

If files are not cached locally this may take awhile to assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] geneinfo.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
Error: no such table: gene2pubmeddate

I tried this same approach using inputs that I found from a published article that successfully did it with another organism:

makeOrgPackageFromNCBI(version = "0.1",
                       author = "Some One <so@someplace.org>",
                       maintainer = "Some One <so@someplace.org>",
                       outputDir = ".",
                       tax_id = "12993",
                       genus = "Cassiopea",
                       species = "xamachana")

But again, same error.

Does anyone know why this is happening, and is there a potential solution? Thanks in advance!

annotationforge makeOrgPackageFromNCBI • 1.1k views


score 0 · Answer 1 · 2020-02-20


James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 5 hours ago

United States

Rather than generating your own, it's easier to get one from AnnotationHub:

> library(AnnotationHub)
> hub <- AnnotationHub()
  |======================================================================| 100%

snapshotDate(): 2019-10-29
> query(hub, c("taeniopygia guttata","orgdb"))
AnnotationHub with 1 record
# snapshotDate(): 2019-10-29 
# names(): AH76439
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Taeniopygia guttata
# $rdataclass: OrgDb
# $rdatadateadded: 2019-10-29
# $title: org.Taeniopygia_guttata.eg.sqlite
# $description: NCBI gene ID based annotations about Taeniopygia guttata
# $taxonomyid: 59729
# $genome: NCBI genomes
# $sourcetype: NCBI/UniProt
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.uniprot.org/p...
# $sourcesize: NA
# $tags: c("NCBI", "Gene", "Annotation") 
# retrieve record with 'object[["AH76439"]]' 
> orgdb <- hub[["AH76439"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%


> orgdb
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Taeniopygia guttata
| SPECIES: Taeniopygia guttata
| CENTRALID: GID
| Taxonomy ID: 59729
| Db type: OrgDb
| Supporting package: AnnotationDbi

Please see: help('select') for usage information
>
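The same lookup should carry over to other species, including the Acropora millepora case from the question. Below is a minimal sketch of that idea. The helper name `find_orgdb` is made up for illustration, and it assumes the AnnotationHub and AnnotationDbi packages are installed, that you have network access, and that the hub actually holds an OrgDb record for the species you ask for.

```r
# Hypothetical helper: fetch the first OrgDb record matching a species name.
# Requires the Bioconductor package AnnotationHub and network access.
find_orgdb <- function(species, hub = AnnotationHub::AnnotationHub()) {
  # Same query pattern as in the session above: species name plus "orgdb".
  hits <- AnnotationHub::query(hub, c(species, "orgdb"))
  if (length(hits) == 0L) {
    stop("no OrgDb record found for ", species)
  }
  # Download the first matching record (or load it from the local cache).
  hits[[1]]
}

# Example use (commented out because it downloads data):
# orgdb <- find_orgdb("Acropora millepora")
# AnnotationDbi::keytypes(orgdb)  # which ID types can be used as keys
# AnnotationDbi::columns(orgdb)   # which annotation columns can be retrieved
```

Checking `keytypes()` and `columns()` first is worthwhile because these NCBI-derived OrgDb objects do not necessarily carry the same set of columns for every organism.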


Hi, thanks so much for your quick response. I was able to get that to work. Now I am stuck getting this new OrgDb to work with clusterProfiler on my DESeq2 data for a GO enrichment analysis.

Here is what I have done:

created and sorted a geneList:

head(geneList)
   NP_612819.1 XP_029213287.1 XP_029179546.1 XP_015773080.1 XP_029193592.1 XP_015748232.1
      15.90933       12.44382       12.35324       12.29142       12.21528       12.07018

since these are in REFSEQ format, add ENTREZID with bitr

gene.df <- bitr(gene, fromType = "REFSEQ",
                toType = c("ENTREZID"),
                OrgDb = orgdb)

head(gene.df)
           REFSEQ  ENTREZID
2  XP_029213287.1 114976909
3  XP_029179546.1 114947034
5  XP_029193592.1 114959662
10 XP_029187547.1 114954991
11 XP_029198384.1 114963355
12 XP_029179359.1 114946884

which seems to look good
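One way to double-check the mapping step: as far as I know, bitr() drops input IDs it cannot map (with a warning), and the row names in head(gene.df) skip 1 and 4, which suggests some inputs were lost. The sketch below uses a made-up helper, `report_unmapped`, and toy data copied from the two head() outputs; it assumes `gene` holds the names of geneList in the same order, which the post does not state.

```r
# Hypothetical helper (not part of clusterProfiler): list the input IDs
# that are absent from a bitr()-style mapping table.
report_unmapped <- function(input_ids, mapped, from = "REFSEQ") {
  input_ids <- unique(input_ids)
  unmapped <- setdiff(input_ids, mapped[[from]])
  list(n_input = length(input_ids),
       n_unmapped = length(unmapped),
       unmapped = unmapped)
}

# Toy data taken from the post (RefSeq accessions written with their underscore).
ids <- c("NP_612819.1", "XP_029213287.1", "XP_029179546.1",
         "XP_015773080.1", "XP_029193592.1", "XP_015748232.1")
mapped <- data.frame(
  REFSEQ = c("XP_029213287.1", "XP_029179546.1", "XP_029193592.1"),
  ENTREZID = c("114976909", "114947034", "114959662"),
  stringsAsFactors = FALSE
)

res <- report_unmapped(ids, mapped)
print(res$unmapped)  # the IDs that did not make it into the mapping table
```

With the real objects this would be `report_unmapped(gene, gene.df)`. A large share of unmapped IDs would shrink the gene set that reaches the enrichment step.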

try to do enrichGO with this gene.df and this new orgdb

ego2 <- enrichGO(gene = gene.df$ENTREZID,
                 OrgDb = orgdb,
                 keyType = 'REFSEQ',
                 ont = "CC",
                 pAdjustMethod = "BH",
                 pvalueCutoff = 0.01,
                 qvalueCutoff = 0.05)

but, the error that comes out is

Error in testForValidKeytype(x, keytype) : Invalid keytype: GOALL. Please use the keytypes method to see a listing of valid arguments.

so I tried playing around with different keyType inputs, but still get the same error:

ego2 <- enrichGO(gene = gene.df$ENTREZID,
                 OrgDb = orgdb,
                 keyType = 'ENTREZID',
                 ont = "CC",
                 pAdjustMethod = "BH",
                 pvalueCutoff = 0.01,
                 qvalueCutoff = 0.05)
Error in testForValidKeytype(x, keytype) : Invalid keytype: GOALL. Please use the keytypes method to see a listing of valid arguments.

Not sure where this GOALL is coming from. Any ideas how to pass both this new orgdb and my DESeq2 results through enrichGO?

Many thanks again.

written 4.2 years ago by lesneski
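The error message itself points at a check worth running: list the keytypes of the OrgDb and see whether GOALL is among them. The sketch below assumes `orgdb` is the object retrieved from AnnotationHub earlier in the thread. `check_keytypes` is a made-up helper, and the idea that enrichGO looks up GOALL internally is an inference from the error text, not something confirmed in this thread.

```r
# Hypothetical helper: compare the keytypes an OrgDb offers with the ones
# an analysis step is expected to need.
check_keytypes <- function(available, needed = c("ENTREZID", "GOALL")) {
  missing <- setdiff(needed, available)
  list(ok = length(missing) == 0L, missing = missing)
}

# Run against the real object only if it exists in this session.
if (exists("orgdb") && requireNamespace("AnnotationDbi", quietly = TRUE)) {
  res <- check_keytypes(AnnotationDbi::keytypes(orgdb))
  if (!res$ok) {
    message("OrgDb has no keytype(s): ", paste(res$missing, collapse = ", "))
  }
}

# Stand-alone illustration with a made-up keytype listing (assumption):
demo <- check_keytypes(c("GID", "ENTREZID", "REFSEQ", "SYMBOL"))
print(demo$missing)
```

If GOALL does turn out to be missing, the OrgDb probably carries no GO annotation for this organism, and changing keyType will not help. One possible route in that case (an assumption, not tested here) is clusterProfiler's enricher() with your own TERM2GENE table. Separately, keyType should describe the IDs passed in `gene`, so `gene.df$ENTREZID` goes with `keyType = 'ENTREZID'`, not 'REFSEQ'.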