Hello,
I am getting an error when using the Bioconductor packages AnnotationDbi and AnnotationForge. I am generating an organism package with the makeOrgPackage function in the AnnotationForge package. To do so, I am using two data frames: gids, with two columns, "GID" and "SYMBOL", and go_info, with three columns, "GID", "GO" and "EVIDENCE". The latter data frame is the goTable.
After generating and installing the package, I try to use the godata function from the GOSemSim package, but I get an error saying that two fields in the source DB have the same name. The same error is raised when I try to get the keys for "GO" and "EVIDENCE".
Could you please help me to solve this issue? Thank you very much!
I copy the code and error messages below.
> head(go_info)
GID GO EVIDENCE
1 RL4439 GO:0005975 IEA
2 RL4439 GO:0019752 IEA
3 RL4439 GO:0055114 IEA
4 RL4439 GO:0003824 IEA
5 RL4439 GO:0016491 IEA
6 RL4439 GO:0016616 IEA
> head(gids)
GID SYMBOL
1 RL4439 RL4439
2 pRL120793 pRL120793
3 pRL120792 pRL120792
4 pRL120791 pRL120791
5 pRL120790 pRL120790
6 pRL120789 pRL120789
> makeOrgPackage(gene_info=gids,gocodes=go_info,
+ version="0.1",
+ maintainer="Javier Pardo-Diaz <jdiaz@stats.ox.ac.uk>",
+ author="Javier Pardo-Diaz <jdiaz@stats.ox.ac.uk>",
+ outputDir = ".",
+ tax_id="216596",
+ genus="Rhizobium",
+ species="leguminosarum.bv.viciae.2.3841",
+ goTable = "gocodes")
Populating genes table:
genes table filled
Populating gene_info table:
gene_info table filled
Populating gocodes table:
gocodes table filled
table metadata filled
'select()' returned many:1 mapping between keys and columns
Dropping GO IDs that are too new for the current GO.db
Populating go table:
go table filled
Populating go_bp table:
go_bp table filled
Populating go_cc table:
go_cc table filled
Populating go_mf table:
go_mf table filled
'select()' returned many:1 mapping between keys and columns
Populating go_bp_all table:
go_bp_all table filled
Populating go_cc_all table:
go_cc_all table filled
Populating go_mf_all table:
go_mf_all table filled
Populating go_all table:
go_all table filled
Creating package in ./org.Rleguminosarum.bv.viciae.2.3841.eg.db
Now deleting temporary database file
[1] "./org.Rleguminosarum.bv.viciae.2.3841.eg.db"
There were 50 or more warnings (use warnings() to see the first 50)
>
> install.packages("./org.Rleguminosarum.bv.viciae.2.3841.eg.db", repos=NULL)
Installing package into ‘/home/javier/R/x86_64-pc-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)
* installing *source* package ‘org.Rleguminosarum.bv.viciae.2.3841.eg.db’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (org.Rleguminosarum.bv.viciae.2.3841.eg.db)
>
> library(org.Rleguminosarum.bv.viciae3.3841.eg.db)
> library(GOSemSim)
> hsGO <- godata("org.Rleguminosarum.bv.viciae.2.3841.eg.db" , keytype="GID", ont="MF")
Loading required package: org.Rleguminosarum.bv.viciae.2.3841.eg.db
preparing gene to GO mapping data...
Error in FUN(X[[i]], ...) :
Two fields in the source DB have the same name.
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] org.Rleguminosarum.bv.viciae.2.3841.eg.db_0.1 org.Rleguminosarum.bv.viciae.1.3841.eg.db_0.1 org.Rleguminosarum.bv.viciae3.3841.eg.db_0.1
[4] org.Rleguminosarum.bv.viciae2.3841.eg.db_0.1 org.Rleguminosarum.bv.viciae.3841.eg.db_0.1 stringr_1.4.0
[7] org.Rleguminosarum3841.eg.db_0.2 org.Hs.eg.db_3.10.0 org.Rleguminosarumbvvc3841.eg.db_0.2
[10] AnnotationForge_1.28.0 org.Rleguminosarumbvviciae3841.eg.db_0.1 AnnotationDbi_1.48.0
[13] IRanges_2.20.2 S4Vectors_0.24.4 Biobase_2.46.0
[16] BiocGenerics_0.32.0 GOSemSim_2.12.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 magrittr_1.5 bit_1.1-15.2 rlang_0.4.6 blob_1.2.1 tools_3.6.3 DBI_1.1.0 bit64_0.9-7 digest_0.6.25 vctrs_0.2.4
[11] bitops_1.0-6 RCurl_1.98-1.2 memoise_1.1.0 RSQLite_2.2.0 stringi_1.4.6 compiler_3.6.3 GO.db_3.10.0 XML_3.99-0.3 pkgconfig_2.0.3
> keytypes(org.Rleguminosarum.bv.viciae.2.3841.eg.db)
[1] "EVIDENCE" "EVIDENCEALL" "GID" "GO" "GOALL" "ONTOLOGY" "ONTOLOGYALL" "SYMBOL"
> keys(org.Rleguminosarum.bv.viciae.2.3841.eg.db,"GO")
Error in .deriveTableNameFromField(field = keytype, x) :
Two fields in the source DB have the same name.
For any further solutions from Bioconductor, please follow the open GitHub issue https://github.com/Bioconductor/AnnotationForge/issues/14 or the mailing list thread https://stat.ethz.ch/pipermail/bioc-devel/2020-May/016785.html
Thank you very much! It works now :)
Hello James,
Would you be willing to submit your change as a PR to AnnotationDbi? Thanks!
Hi Kayla,
Done. I submitted the PR against RELEASE311.
I made the change as a simple hack to get around the fact that the NOSCHEMA DBs now have multiple GO tables.
There used to only be a go and go_all table for the NOSCHEMA DBs, but now we have these extra tables that are subsets of the existing tables (e.g., go_bp is simply a subset of the go table, containing only the BP ontology terms).
These tables already existed in the 'regular' DB packages, but I don't know why, nor can I see any code in AnnotationDbi that makes use of them. The tables also aren't so big that pre-subsetted versions seem useful. But maybe there is a rationale that I am unaware of.
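To illustrate why the extra GO tables can trip up a field-to-table lookup, here is a minimal toy sketch in base R. It is not the AnnotationDbi code: the helper derive_table_from_field, the column layout in schema, and the filtering step at the end are all assumptions made for illustration. Only the table names are taken from the build log in the original post.

```r
# Toy schema: table name -> column names.
# Table names follow the makeOrgPackage log; the columns are assumed.
schema <- list(
  genes     = c("_id", "GID"),
  gene_info = c("_id", "SYMBOL"),
  go        = c("_id", "GO", "EVIDENCE", "ONTOLOGY"),
  go_bp     = c("_id", "GO", "EVIDENCE"),
  go_cc     = c("_id", "GO", "EVIDENCE"),
  go_mf     = c("_id", "GO", "EVIDENCE")
)

# Hypothetical helper: find the single table that holds a given field.
# It fails when the field name appears in more than one table.
derive_table_from_field <- function(field, schema) {
  has_field <- vapply(schema, function(cols) field %in% cols, logical(1))
  hits <- names(schema)[has_field]
  if (length(hits) == 0L) stop("Field not found: ", field)
  if (length(hits) > 1L) {
    stop("Field '", field, "' appears in several tables with the same name: ",
         paste(hits, collapse = ", "))
  }
  hits
}

# "SYMBOL" lives in exactly one table, so the lookup succeeds.
print(derive_table_from_field("SYMBOL", schema))

# "GO" lives in go and in each per-ontology subset, so the lookup is ambiguous.
ambiguous <- tryCatch(
  derive_table_from_field("GO", schema),
  error = function(e) conditionMessage(e)
)
print(ambiguous)

# One possible way around it in this toy setting (an assumption, not
# necessarily what the actual change does): ignore the subset tables.
core_schema <- schema[!grepl("^go_(bp|cc|mf)$", names(schema))]
print(derive_table_from_field("GO", core_schema))
```

In this sketch the ambiguity disappears once the per-ontology subsets are left out of the lookup, which is consistent with them being plain subsets of the go table.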