RSQLite::dbGetPreparedQuery() is deprecated in AnnotationForge
@mod-12330
Teagasc Dublin

Hi,

I'm trying to create an annotation database for Agaricus bisporus from NCBI data with AnnotationForge, but I get an error and several warnings:

Error in makeOrgDbFromDataFrames(data, tax_id, genus, species, dbFileName,  :
'goTable' GO Ids must be formatted like 'GO:XXXXXXX'
In addition: Warning messages:
1: RSQLite::dbGetPreparedQuery() is deprecated, please switch to DBI::dbGetQuery(params = bind.data).
2: Named parameters not used in query: genes
3: Named parameters not used in query: name, value

How do I work around the deprecated RSQLite::dbGetPreparedQuery() function? The full script is given below along with the sessionInfo() output. Furthermore, when I open the gene2go file the GO IDs seem fine, so I'm not sure why the goTable is not recognizing them. Does anybody have an idea why the GO IDs are not recognized? (I have pasted the top rows of the gene2go file that AnnotationForge obtained from NCBI at the bottom of this post.)

My script is:

> library(AnnotationDbi)
> library(GenomeInfoDb)
> library(biomaRt)
> library(survival)
> library(UniProt.ws)
> library(knitr)
> library(DBI)
> library(mclust)

> makeOrgPackageFromNCBI(version = "0.1",
+ author = "my name",
+ maintainer = "email.com",
+ outputDir = ".",
+ tax_id = "936046",
+ genus = "Agaricus",
+ species = "bisporus")

If files are not cached locally this may take awhile to assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
preparing data from NCBI ...
getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
rebuilding the cache
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
rebuilding the cache
extracting data for our organism from : gene_info
getting data for gene2go.gz
rebuilding the cache
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Please be patient while we work out which organisms can be annotated with
ensembl IDs.
making the OrgDb package ...
Populating genes table:
genes table filled
Populating pubmed table:
pubmed table filled
Populating chromosomes table:
chromosomes table filled
Populating gene_info table:
gene_info table filled
Populating entrez_genes table:
entrez_genes table filled
Populating alias table:
alias table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating go table:
go table filled
Error in makeOrgDbFromDataFrames(data, tax_id, genus, species, dbFileName,  :
'goTable' GO Ids must be formatted like 'GO:XXXXXXX'
In addition: Warning messages:
1: RSQLite::dbGetPreparedQuery() is deprecated, please switch to DBI::dbGetQuery(params = bind.data).
2: Named parameters not used in query: genes
3: Named parameters not used in query: name, value

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_Ireland.1252  LC_CTYPE=English_Ireland.1252
[3] LC_MONETARY=English_Ireland.1252 LC_NUMERIC=C
[5] LC_TIME=English_Ireland.1252

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods
[9] base

other attached packages:
[1] mclust_5.2.2           DBI_0.5-1              knitr_1.15.1
[4] UniProt.ws_2.14.0      RCurl_1.95-4.8         bitops_1.0-6
[7] survival_2.40-1        biomaRt_2.30.0         GenomeInfoDb_1.10.3
[10] AnnotationHub_2.6.4    AnnotationForge_1.16.0 AnnotationDbi_1.36.2
[13] IRanges_2.8.1          S4Vectors_0.12.1       Biobase_2.34.0
[16] BiocGenerics_0.20.0    RSQLite_1.1-2

loaded via a namespace (and not attached):
[1] Rcpp_0.12.9                   splines_3.3.2
[3] lattice_0.20-34               xtable_1.8-2
[5] R6_2.2.0                      httr_1.2.1
[7] tools_3.3.2                   grid_3.3.2
[9] htmltools_0.3.5               yaml_2.1.14
[11] digest_0.6.12                 interactiveDisplayBase_1.12.0
[13] Matrix_1.2-8                  shiny_1.0.0
[15] memoise_1.0.0                 mime_0.5
[17] BiocInstaller_1.24.0          XML_3.98-1.5
[19] httpuv_1.3.3

An example of the gene2go file obtained from NCBI is:

#tax_id GeneID GO_ID Evidence Qualifier GO_term PubMed Category
3702 814629 GO:0005634 ISM - nucleus - Component
3702 814629 GO:0008150 ND - biological_process - Process
3702 814630 GO:0003677 IEA - DNA binding - Function
3702 814630 GO:0003700 ISS - transcription factor activity, sequence-specific DNA binding 11118137 Function
3702 814630 GO:0005634 IEA - nucleus - Component
3702 814630 GO:0005634 ISM - nucleus - Component
3702 814630 GO:0006351 IEA - transcription, DNA-templated - Process

@james-w-macdonald-5106
United States

Your post title is misleading, as the real problem here is the error, not the warning. The error arises for species that have no GO data at NCBI. As a fail-over we then parse data from Blast2GO, and if that results in no data, then it fails because of a small bug. That's fixed now, and the updated version (1.16.1) should make its way through the build servers in the next day or so.

The warning is a long-standing issue that has to do with changes that were made in the RSQLite package, which AnnotationForge depends on. This doesn't stop anything from working - it's just letting us know that a function we are depending on is probably going to disappear in the future.

The devel version of AnnotationForge is now updated to remove the warnings, so once we have the new release in April, those warnings will go away as well.
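For anyone who hits the same deprecation warning in their own code, the replacement that the warning text suggests looks roughly like the sketch below. It is a minimal illustration, not the change made inside AnnotationForge: the table name `genes`, the column `gene_id` and the placeholder `:id` are made up for the example, and it assumes the DBI and RSQLite packages are installed.

```r
library(DBI)

# In-memory SQLite database so the example leaves no files behind.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Toy table (hypothetical); the IDs are the Entrez Gene IDs quoted from
# gene2go earlier in this thread.
dbWriteTable(con, "genes",
             data.frame(gene_id = c("814629", "814630"),
                        stringsAsFactors = FALSE))

# Deprecated style, which triggers the warning from the question:
#   RSQLite::dbGetPreparedQuery(con, sql, bind.data = data.frame(id = "814630"))
# Style suggested by the warning: pass the bound values through `params`.
sql <- "SELECT gene_id FROM genes WHERE gene_id = :id"
res <- dbGetQuery(con, sql, params = list(id = "814630"))

dbDisconnect(con)
```

Here `res` is a data frame holding the rows that match the bound value. Using a named list for `params` keeps each value tied to its named placeholder in the SQL string.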

@mod-12330
Teagasc Dublin

Ok, thanks for the info and your reply, James. I'll keep an eye out for the update. I had thought the warning was part of the reason the GO IDs were not recognized. The gene2go file did appear to have GO IDs, though (see the end of my original question for the first few lines of the gene2go data frame), and I was wondering why the program was not parsing those data into the goTable?


If you want to comment on a post, please click the ADD COMMENT link and type in the box that appears. The 'Add your answer' box below is intended for answers.

While you did show some rows from gene2go, you should note that the taxonomic ID for those rows (the first column) is 3702, which is Arabidopsis thaliana, not Agaricus bisporus. There are no rows in the gene2go file that have 936046 in the first column, hence no data parsed out for your GO table.
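A quick way to check this yourself is to tabulate the first column of gene2go and count the rows for your taxonomy ID. The sketch below uses three of the rows quoted in the question as inline text so that it runs on its own; with the real download you would point `read.delim()` at the file instead. It assumes the file is tab-delimited with a header line that starts with `#`.

```r
# Three rows copied from the excerpt in the question, written tab-delimited.
gene2go_txt <- paste(
  "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory",
  "3702\t814629\tGO:0005634\tISM\t-\tnucleus\t-\tComponent",
  "3702\t814629\tGO:0008150\tND\t-\tbiological_process\t-\tProcess",
  "3702\t814630\tGO:0003677\tIEA\t-\tDNA binding\t-\tFunction",
  sep = "\n"
)

# comment.char = "" keeps the '#'-prefixed header line; check.names = FALSE
# stops R from mangling "#tax_id" into a syntactic name.
go <- read.delim(text = gene2go_txt, quote = "", comment.char = "",
                 check.names = FALSE, stringsAsFactors = FALSE)
names(go) <- sub("^#", "", names(go))

# How many rows does each taxonomy ID have?
tax_counts <- table(go$tax_id)

# Rows for Agaricus bisporus (the tax_id used in the question).
n_target <- sum(go$tax_id == 936046)
```

In this excerpt every row belongs to 3702 and `n_target` is zero, which is the situation described above for the full file.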


OK, thanks. I did not see that. Any idea why it obtained Arabidopsis thaliana GO IDs and not Agaricus bisporus ones? I'll try to see if I can source the GO IDs somewhere else and use makeOrgPackage(). Thanks again for your help.


The gene2go file that is downloaded is a generic file that contains Entrez Gene ID -> GO ID mappings for all the species that NCBI has currently annotated. It just so happens that A. thaliana is at the top of the file. The function makeOrgPackageFromNCBI downloads all these generic files, then extracts data that are specific to whatever species you are interested in, and uses those data to build the orgDb package.

In the case of GO mappings, there are no mappings for your species in gene2go. So the function then queries blast2go, and gets all the mappings they have. It so happens that there are 42 (or 44? I forget) mappings for your species in blast2go, but unfortunately there aren't any Entrez Gene IDs associated with those GO terms, so they get dropped as well. In the end, there aren't any Entrez Gene -> GO mappings that makeOrgPackageFromNCBI can find, so you end up with an orgDb package that has everything but the GO table.


OK, thanks for the information. I really appreciate it. I have found GO annotation for Agaricus bisporus on the JGI website. I've downloaded it and will attempt to construct a database from it.
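A rough sketch of that route with makeOrgPackage() follows. The gene IDs, symbols and GO rows are hypothetical placeholders, not JGI data, and the column layout of the JGI download is not assumed here: the point is only the shape makeOrgPackage() expects (a first column named GID in every data frame, and GID, GO and EVIDENCE columns in the table named by `goTable`). The helper `build_org_db()` is an illustrative wrapper, and the maintainer and author strings are placeholders.

```r
# Hypothetical gene-level table; the first column must be named GID.
gene_info_df <- data.frame(
  GID    = c("gene_0001", "gene_0002"),
  SYMBOL = c("symA", "symB"),
  stringsAsFactors = FALSE
)

# Hypothetical GO table with the three columns expected for `goTable`.
go_df <- data.frame(
  GID      = c("gene_0001", "gene_0001", "gene_0002"),
  GO       = c("GO:0005634", "GO:0003677", "GO:0006351"),
  EVIDENCE = c("IEA", "IEA", "IEA"),
  stringsAsFactors = FALSE
)

# Check the GO ID format named in the error message ('GO:XXXXXXX') and
# drop duplicated rows before handing the table over.
go_ok <- grepl("^GO:[0-9]{7}$", go_df$GO)
stopifnot(nrow(go_df) > 0, all(go_ok))
go_df <- unique(go_df)

# Illustrative wrapper; defined here but not called.
build_org_db <- function(gene_info, go) {
  AnnotationForge::makeOrgPackage(
    gene_info  = gene_info,
    go         = go,
    version    = "0.1",
    maintainer = "Your Name <you@example.org>",
    author     = "Your Name <you@example.org>",
    outputDir  = ".",
    tax_id     = "936046",
    genus      = "Agaricus",
    species    = "bisporus",
    goTable    = "go"
  )
}
```

Calling `build_org_db(gene_info_df, go_df)` would then attempt the build in the current directory. The IDs in GID need to be consistent across all the data frames you pass in, since that column is what ties the tables together.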