Hello, I am having issues running the `makeOrgPackageFromNCBI()` function. Since there is no OrgDb available for Candida albicans, I am trying to build one from the Candida albicans data available at NCBI under tax_id 237561:
library(AnnotationForge)
library(biomaRt)
makeOrgPackageFromNCBI(version = "0.1",
                       author = "Some One <some@one.org>",
                       maintainer = "Some One <some@one.org>",
                       outputDir = ".",
                       tax_id = "237561",
                       genus = "Candida",
                       species = "albicans",
                       rebuildCache = FALSE)
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
Error: no such table: main.gene2pubmed
Backtrace:
- AnnotationForge::makeOrgPackageFromNCBI(...)
- AnnotationForge:::NEW_makeOrgPackageFromNCBI(...)
- AnnotationForge:::prepareDataFromNCBI(...)
- AnnotationForge:::.makeBaseDBFromDLs(...)
- AnnotationForge:::.downloadData(...) ...
- DBI::dbExecute(NCBIcon, sql)
- DBI::dbSendStatement(conn, statement, ...)
- RSQLite::dbSendQuery(conn, statement, ...)
- RSQLite (local) .local(conn, statement, ...)
- RSQLite:::result_create(conn@ptr, statement)
Thanks for your comment! I am struggling to reproduce this code on my side, and don't know what I am doing wrong. When I run the code, I keep getting "Error: no such table: gene2pubmed". Also, I am interested in running this analysis on the SC5314 strain of Candida albicans, and looking at the taxonomic IDs available, I have only found one or two, so I am curious as to where you saw all the tax_ids you mentioned.
I went to the source.

Also, what code are you talking about? The `makeOrgPackageFromNCBI` that you originally posted? If so, my previous post was meant to explain to you why it isn't working (and won't ever work) for you - if there are no annotations for your species at NCBI, you cannot make an `OrgDb` package using `makeOrgPackageFromNCBI`, because, well, there are no data at NCBI with which to do so.

I mean, in time there may be some annotations on the species you care about, but the way NCBI works is that people identify things they think are interesting and then submit. If many people are working on your strain (or even one dedicated person/lab), then NCBI may end up with a bunch of strain-specific information that they will populate their databases with, but until that happens you won't have any data. Unless you are willing to consider that the genes in the 'main' strain are the same/good enough for your purposes (although there are hardly any for that strain as it is).
However! You might be able to do an end-around, depending on what you are really trying to do (simply having an `OrgDb` is not likely your end goal). UniProt appears to have lots of data for this strain, so you could either A.) use the `UniProt.ws` package to get whatever data you want and use `makeOrgPackage` to make an `OrgDb`, or B.) just use the `UniProt.ws` package directly to do whatever annotations you are trying to do.

That's completely fair. I meant to say that I am unable to reproduce the code block you've provided above. I try running this on my system, but I get an error when I get to the `dbGetQuery` function. The `UniProt.ws` approach is very interesting. I have briefly skimmed through and tried the package, and there seem to be a bunch of options available for my Candida strain. In the end, I am aiming to run an over-representation analysis on an enriched gene set from my data. I am unsure what you are referring to when you say I can "directly use the package to do whatever annotations I am trying to do", so I'd really appreciate some guidance on that point. Thank you for your time by the way, you have no idea how helpful all this really is!
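
For orientation, the over-representation analysis mentioned above reduces, for a single GO term, to a one-sided hypergeometric test. The base-R sketch below uses made-up gene IDs; `ora_pvalue` is a hypothetical helper, not a function from any package, and a real analysis would loop over all terms and adjust for multiple testing (for example with `p.adjust`).

```r
# Hypothetical helper: over-representation p-value for one annotation term.
# hits       : genes of interest (e.g. the enriched gene set)
# universe   : all genes that could have been detected
# term_genes : genes annotated to the term being tested
ora_pvalue <- function(hits, universe, term_genes) {
  universe   <- unique(universe)
  hits       <- intersect(unique(hits), universe)
  term_genes <- intersect(unique(term_genes), universe)

  k <- length(intersect(hits, term_genes))  # hits annotated to the term
  m <- length(term_genes)                   # annotated genes in the universe
  n <- length(universe) - m                 # genes not annotated to the term

  # P(X >= k) when drawing length(hits) genes from the universe
  phyper(k - 1, m, n, length(hits), lower.tail = FALSE)
}

# Toy data (entirely made up): 100 genes, 10 annotated, 5 hits all annotated
universe   <- paste0("g", 1:100)
term_genes <- paste0("g", 1:10)
hits       <- paste0("g", 1:5)
print(ora_pvalue(hits, universe, term_genes))
```

The only inputs are character vectors of gene IDs, which is why a gene-to-GO table from any source (an `OrgDb` or a direct UniProt download) is enough to run this kind of test.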
If the error is that the function can't be found, you need to either load (or install and then load) `DBI`.

When I run `dbGetQuery`, it says: "Error: no such table: gene2pubmed", so I don't think it is unable to find the function. I thought maybe my data is incomplete, but at the very least I was hoping to get the same output as you.
What does `dbListTables(con)` produce? This is on the NCBI.sqlite db, right?

I run the `con <- dbConnect(SQLite(), "NCBI.sqlite")` line, but `dbListTables(con)` yields `character(0)`, so I'm guessing this list is not populated when I try `dbConnect`.
You have to do that in the directory that contains your NCBI.sqlite database.
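
A quick way to check this is sketched below. It assumes the `DBI` and `RSQLite` packages are installed; `inspect_sqlite` is a hypothetical helper written for this illustration. Note that `dbConnect(SQLite(), path)` on a path that does not exist creates a new, empty database there, in which case `dbListTables()` returns `character(0)`, so the sketch checks for the file before connecting.

```r
library(DBI)
library(RSQLite)

# Hypothetical helper: report whether a SQLite file exists, how large it is,
# and which tables it contains, without creating the file as a side effect.
inspect_sqlite <- function(path) {
  if (!file.exists(path)) {
    return(list(exists = FALSE, size = NA_real_, tables = character(0)))
  }
  con <- dbConnect(SQLite(), path)
  on.exit(dbDisconnect(con))
  list(exists = TRUE, size = file.size(path), tables = dbListTables(con))
}

print(getwd())                       # the directory R is currently working in
str(inspect_sqlite("NCBI.sqlite"))   # path is relative to that directory
```

If `exists` is `FALSE`, or `size` is 0 and `tables` is empty, the database was never populated, and queries against tables such as `gene2pubmed` will fail with "no such table".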
I believe my `NCBI.sqlite` database is located in the same directory, but for some reason it seems to be empty. Even in the File Explorer it shows the file as having 0B size.

Ah, ok. Doesn't really matter though. Try using `UniProt.ws`.

Sounds good. Thanks again!
You can use `UniProt.ws` to annotate things. Say you have the Candida IDs and you want to know what the gene symbol is or whatever.

You can use `UniProt.ws` to download the GO table as well as other annotations and then make an `OrgDb` using `makeOrgPackage`.

Okay, this is a great explanation, thank you. I will use this and share results once I get some.
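
The `UniProt.ws` plus `makeOrgPackage` route described above might look roughly like the sketch below. Several things here are assumptions rather than anything confirmed in this thread: `make_go_table` is a hypothetical helper, the toy IDs are made up, the `"IEA"` evidence code is only a placeholder (a plain list of GO IDs carries no evidence codes), and the UniProt field and keytype names (`"gene_names"`, `"go_id"`, `"UniProtKB"`) can differ between `UniProt.ws` versions, so check `columns(up)` and `keytypes(up)` first. The part that needs Bioconductor packages and network access is wrapped in `if (FALSE)` so the rest runs in plain R.

```r
# Hypothetical helper: turn one "; "-separated GO string per gene into the
# long GID / GO / EVIDENCE layout that makeOrgPackage() expects for GO data.
make_go_table <- function(ids, go_strings, evidence = "IEA") {
  go_list <- strsplit(go_strings, ";\\s*")
  gid <- rep(ids, lengths(go_list))
  go <- data.frame(GID = gid,
                   GO = as.character(unlist(go_list)),
                   EVIDENCE = rep(evidence, length(gid)),  # placeholder code
                   stringsAsFactors = FALSE)
  go <- go[!is.na(go$GO) & nzchar(go$GO), ]  # drop genes without GO terms
  unique(go)                                  # makeOrgPackage rejects duplicate rows
}

# Toy input (made-up gene IDs) in the shape such a download might have
toy <- data.frame(id = c("geneA", "geneB"),
                  symbol = c("symA", "symB"),
                  go_id = c("GO:0008150; GO:0003674", "GO:0005575"),
                  stringsAsFactors = FALSE)
go <- make_go_table(toy$id, toy$go_id)
gene_info <- data.frame(GID = toy$id, SYMBOL = toy$symbol,
                        stringsAsFactors = FALSE)
print(go)

if (FALSE) {  # not run: needs UniProt.ws, AnnotationForge and network access
  library(UniProt.ws)
  library(AnnotationForge)

  up <- UniProt.ws(taxId = 237561)
  columns(up)  # inspect which fields this version offers
  # my_ids: hypothetical character vector of your own UniProt accessions
  ann <- select(up, keys = my_ids,
                columns = c("gene_names", "go_id"), keytype = "UniProtKB")
  # ...reshape `ann` into gene_info and go as done for `toy` above...

  makeOrgPackage(gene_info = gene_info, go = go,
                 version = "0.1",
                 maintainer = "Some One <some@one.org>",
                 author = "Some One <some@one.org>",
                 outputDir = ".",
                 tax_id = "237561",
                 genus = "Candida",
                 species = "albicans",
                 goTable = "go")
}
```

Each data frame passed to `makeOrgPackage` needs `GID` as its first column, and the table named in `goTable` needs exactly the `GID`, `GO` and `EVIDENCE` columns, which is why the helper reshapes the data before the call.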