Question

RSQLite::dbGetPreparedQuery() is deprecated in AnnotationForge

MOD • 0

@mod-12330

Last seen 8.1 years ago

Teagasc Dublin

Hi,

I'm trying to create an annotation database for Agaricus bisporus from NCBI data with AnnotationForge, but I get an error and several warnings:

Error in makeOrgDbFromDataFrames(data, tax_id, genus, species, dbFileName, :
'goTable' GO Ids must be formatted like 'GO:XXXXXXX'
In addition: Warning messages:
1: RSQLite::dbGetPreparedQuery() is deprecated, please switch to DBI::dbGetQuery(params = bind.data).
2: Named parameters not used in query: genes
3: Named parameters not used in query: name, value

How do I work around the deprecated RSQLite::dbGetPreparedQuery() function? The full script is given below, along with sessionInfo(). Also, when I open the gene2go file the GO IDs look fine, so I'm not sure why the goTable does not recognize them. Does anybody know why the GO IDs are not recognized? (I have pasted the top rows of the gene2go file that AnnotationForge obtained from NCBI at the bottom of this post.)

My script is:

> library(AnnotationDbi)
> library(GenomeInfoDb)
> library(biomaRt)
> library(survival)
> library(UniProt.ws)
Loading required package: RCurl
Loading required package: bitops
> library(knitr)
> library(DBI)
> library(mclust)

> makeOrgPackageFromNCBI(version = "0.1",
+ author = "my name",
+ maintainer = "email.com",
+ outputDir = ".",
+ tax_id = "936046",
+ genus = "Agaricus",
+ species = "bisporus")

If files are not cached locally this may take awhile to assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
preparing data from NCBI ...
starting download for 5 data files
getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
rebuilding the cache
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
rebuilding the cache
extracting data for our organism from : gene_info
getting data for gene2go.gz
rebuilding the cache
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Please be patient while we work out which organisms can be annotated with
ensembl IDs.
making the OrgDb package ...
Populating genes table:
genes table filled
Populating pubmed table:
pubmed table filled
Populating chromosomes table:
chromosomes table filled
Populating gene_info table:
gene_info table filled
Populating entrez_genes table:
entrez_genes table filled
Populating alias table:
alias table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating go table:
go table filled
table metadata filled
Error in makeOrgDbFromDataFrames(data, tax_id, genus, species, dbFileName, :
'goTable' GO Ids must be formatted like 'GO:XXXXXXX'
In addition: Warning messages:
1: RSQLite::dbGetPreparedQuery() is deprecated, please switch to DBI::dbGetQuery(params = bind.data).
2: Named parameters not used in query: genes
3: Named parameters not used in query: name, value

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_Ireland.1252 LC_CTYPE=English_Ireland.1252
[3] LC_MONETARY=English_Ireland.1252 LC_NUMERIC=C
[5] LC_TIME=English_Ireland.1252

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods
[9] base

other attached packages:
[1] mclust_5.2.2           DBI_0.5-1              knitr_1.15.1
[4] UniProt.ws_2.14.0      RCurl_1.95-4.8         bitops_1.0-6
[7] survival_2.40-1        biomaRt_2.30.0         GenomeInfoDb_1.10.3
[10] AnnotationHub_2.6.4    AnnotationForge_1.16.0 AnnotationDbi_1.36.2
[13] IRanges_2.8.1          S4Vectors_0.12.1       Biobase_2.34.0
[16] BiocGenerics_0.20.0    RSQLite_1.1-2

loaded via a namespace (and not attached):
[1] Rcpp_0.12.9                   splines_3.3.2
[3] lattice_0.20-34               xtable_1.8-2
[5] R6_2.2.0                      httr_1.2.1
[7] tools_3.3.2                   grid_3.3.2
[9] htmltools_0.3.5               yaml_2.1.14
[11] digest_0.6.12                 interactiveDisplayBase_1.12.0
[13] Matrix_1.2-8                  shiny_1.0.0
[15] memoise_1.0.0                 mime_0.5
[17] BiocInstaller_1.24.0          XML_3.98-1.5
[19] httpuv_1.3.3

An example of the gene2go file obtained from NCBI is:

#tax_id	GeneID	GO_ID	Evidence	Qualifier	GO_term	PubMed	Category
3702	814629	GO:0005634	ISM	-	nucleus	-	Component
3702	814629	GO:0008150	ND	-	biological_process	-	Process
3702	814630	GO:0003677	IEA	-	DNA binding	-	Function
3702	814630	GO:0003700	ISS	-	transcription factor activity, sequence-specific DNA binding	11118137	Function
3702	814630	GO:0005634	IEA	-	nucleus	-	Component
3702	814630	GO:0005634	ISM	-	nucleus	-	Component
3702	814630	GO:0006351	IEA	-	transcription, DNA-templated	-	Process

annotation microarray annotate annotationforge • 3.0k views

ADD COMMENT • link 8.1 years ago MOD • 0

score 1 · Answer 1 · 2017-02-10

Your post title is misleading, as the real problem here is the error, not the warning. The error arises for species that have no GO data at NCBI. As a fail-over we then parse data from Blast2GO, and if that results in no data, then it fails because of a small bug. That's fixed now, and the updated version (1.16.1) should make its way through the build servers in the next day or so.

The warning is a long-standing issue that has to do with changes that were made in the RSQLite package, which AnnotationForge depends on. This doesn't stop anything from working - it's just letting us know that a function we are depending on is probably going to disappear in the future.

The devel version of AnnotationForge is now updated to remove the warnings, so once we have the new release in April, those warnings will go away as well.
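
For anyone curious what the replacement named in the warning looks like, here is a minimal sketch against a throwaway in-memory SQLite database. The `genes` table and its `gene_id` column are made up for illustration; this is not AnnotationForge's internal code, only the general shape of the `params` form.

```r
library(DBI)

# Throwaway in-memory database; the 'genes' table is purely illustrative.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "genes",
             data.frame(gene_id = c("814629", "814630"),
                        stringsAsFactors = FALSE))

# Deprecated form, shown for comparison only:
#   RSQLite::dbGetPreparedQuery(con, sql, bind.data = df)
# Current form: bind values through the 'params' argument.
sql <- "SELECT gene_id FROM genes WHERE gene_id = :id"
res <- dbGetQuery(con, sql, params = list(id = "814630"))

dbDisconnect(con)
```

As an end user there is nothing to change in your own script; the deprecated call sits inside the package.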

score 0 · Answer 2 · 2017-02-12

OK, thanks for the info and your reply, James. I'll keep an eye out for the update. I had thought that the warning was part of the reason the GO IDs were not being seen. The gene2go file did appear to have GO IDs, though (see the end of my original question for the first few lines of the gene2go data frame), and I was wondering why the program was not parsing that data into the goTable?

ADD COMMENT • link 8.1 years ago MOD • 0

If you want to comment on a post, please click the ADD COMMENT link and type in the box that appears. The 'Add your answer' box below is intended for answers.

While you did show some rows from gene2go, you should note that the taxonomic ID for those rows (the first column) is 3702, which is Arabidopsis thaliana, not Agaricus bisporus. There are no rows in the gene2go file that have 936046 in the first column, hence no data parsed out for your GO table.
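
A quick way to check this yourself is to tabulate the first column. The sketch below uses only the first three rows pasted in the question, so it is an excerpt rather than the real file, and the renamed `tax_id` column is just for convenience.

```r
# First rows of gene2go as pasted in the question (tab-delimited excerpt).
gene2go_txt <- paste(
  "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory",
  "3702\t814629\tGO:0005634\tISM\t-\tnucleus\t-\tComponent",
  "3702\t814629\tGO:0008150\tND\t-\tbiological_process\t-\tProcess",
  "3702\t814630\tGO:0003677\tIEA\t-\tDNA binding\t-\tFunction",
  sep = "\n")

gene2go <- read.delim(text = gene2go_txt, colClasses = "character",
                      check.names = FALSE, comment.char = "", quote = "")
names(gene2go)[1] <- "tax_id"   # drop the leading '#'

# Which taxa are present, and how many rows each has
tax_counts <- table(gene2go$tax_id)
print(tax_counts)

# Number of rows for Agaricus bisporus (tax_id 936046); zero in this excerpt
n_agaricus <- sum(gene2go$tax_id == "936046")
print(n_agaricus)
```

On the full download the same idea should apply by reading from `gzfile("gene2go.gz")` instead of `text =`, though that file is large and will take a while to load.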

ADD REPLY • link 8.1 years ago James W. MacDonald 68k

OK, thanks. I did not see that. Any idea why it obtained Arabidopsis thaliana GO IDs and not Agaricus bisporus ones? I'll try to source the GO IDs somewhere else and use makeOrgPackage(). Thanks again for your help.

ADD REPLY • link 8.1 years ago MOD • 0

The gene2go file that is downloaded is a generic file that contains Entrez Gene ID -> GO ID mappings for all the species that NCBI has currently annotated. It just so happens that A. thaliana is at the top of the file. The function makeOrgPackageFromNCBI downloads all these generic files, then extracts data that are specific to whatever species you are interested in, and uses those data to build the orgDb package.

In the case of GO mappings, there are no mappings for your species in gene2go. So the function then queries blast2go, and gets all the mappings they have. It so happens that there are 42 (or 44? I forget) mappings for your species in blast2go, but unfortunately there aren't any Entrez Gene IDs associated with those GO terms, so they get dropped as well. In the end, there aren't any Entrez Gene -> GO mappings that makeOrgPackageFromNCBI can find, so you end up with an orgDb package that has everything but the GO table.

ADD REPLY • link 8.1 years ago James W. MacDonald 68k

OK, thanks for the information. I really appreciate it. I have found GO annotation for Agaricus bisporus on the JGI website. I've downloaded it and will attempt to construct a database using that.
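
A rough sketch of how that might go with makeOrgPackage(), in case it helps someone else. The JGI column names (`proteinId`, `goAcc`) and the three rows are assumptions for illustration, the "IEA" evidence code is only a placeholder, and `build_org_package()` is a hypothetical wrapper, so adjust all of these to the real download.

```r
# Hypothetical JGI-style annotation; column names and rows are assumptions.
jgi <- data.frame(
  proteinId = c("1001", "1001", "1002"),
  goAcc     = c("GO:0005634", "GO:0003677", "GO:0006351"),
  stringsAsFactors = FALSE)

# makeOrgPackage() expects the GO data frame to have columns GID, GO, EVIDENCE.
# Evidence codes are not assumed to be in the download, so "IEA" is a placeholder.
go_df <- unique(data.frame(
  GID      = jgi$proteinId,
  GO       = jgi$goAcc,
  EVIDENCE = "IEA",
  stringsAsFactors = FALSE))

# Check the 'GO:XXXXXXX' format up front, since that is what the error mentions.
stopifnot(nrow(go_df) > 0, all(grepl("^GO:[0-9]{7}$", go_df$GO)))

# Wrapper so nothing is built until it is called; needs AnnotationForge installed.
build_org_package <- function(go_df, out_dir = ".") {
  AnnotationForge::makeOrgPackage(
    go         = go_df,
    version    = "0.1",
    maintainer = "Some One <so@someplace.org>",
    author     = "Some One <so@someplace.org>",
    outputDir  = out_dir,
    tax_id     = "936046",
    genus      = "Agaricus",
    species    = "bisporus",
    goTable    = "go")
}
```

Calling `build_org_package(go_df)` would then attempt the build. Other tables (gene symbols, chromosomes and so on) can be passed as additional named data frames, each with GID as the first column.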

ADD REPLY • link 8.1 years ago MOD • 0