makeOrgPackage - ERROR: Two fields in the source DB have the same name
jdiaz • 0
@jdiaz-23502

Hello,

I am getting an error when using the Bioconductor packages AnnotationDbi and AnnotationForge. I am generating an organism package with the makeOrgPackage function from AnnotationForge. To do so, I am using two data frames: gids, with two columns ("GID" and "SYMBOL"), and go_info, with three columns ("GID", "GO" and "EVIDENCE"). The latter data frame is the goTable.

After generating and installing the package, I try to use the godata function from the GOSemSim package, but I get an error saying that two fields in the source DB have the same name. The same error is raised when I try to get the keys for "GO" and "EVIDENCE".

Could you please help me to solve this issue? Thank you very much!

I have copied the code and error messages below.

> head(go_info)
     GID         GO EVIDENCE
1 RL4439 GO:0005975      IEA
2 RL4439 GO:0019752      IEA
3 RL4439 GO:0055114      IEA
4 RL4439 GO:0003824      IEA
5 RL4439 GO:0016491      IEA
6 RL4439 GO:0016616      IEA
> head(gids)
        GID    SYMBOL
1    RL4439    RL4439
2 pRL120793 pRL120793
3 pRL120792 pRL120792
4 pRL120791 pRL120791
5 pRL120790 pRL120790
6 pRL120789 pRL120789
> makeOrgPackage(gene_info=gids,gocodes=go_info,
+                version="0.1",
+                maintainer="Javier Pardo-Diaz <jdiaz@stats.ox.ac.uk>",
+                author="Javier Pardo-Diaz <jdiaz@stats.ox.ac.uk>",
+                outputDir = ".",
+                tax_id="216596",
+                genus="Rhizobium",
+                species="leguminosarum.bv.viciae.2.3841",
+                goTable = "gocodes")
Populating genes table:
genes table filled
Populating gene_info table:
gene_info table filled
Populating gocodes table:
gocodes table filled
table metadata filled
'select()' returned many:1 mapping between keys and columns
Dropping GO IDs that are too new for the current GO.db
Populating go table:
go table filled
Populating go_bp table:
go_bp table filled
Populating go_cc table:
go_cc table filled
Populating go_mf table:
go_mf table filled
'select()' returned many:1 mapping between keys and columns
Populating go_bp_all table:
go_bp_all table filled
Populating go_cc_all table:
go_cc_all table filled
Populating go_mf_all table:
go_mf_all table filled
Populating go_all table:
go_all table filled
Creating package in ./org.Rleguminosarum.bv.viciae.2.3841.eg.db 
Now deleting temporary database file
[1] "./org.Rleguminosarum.bv.viciae.2.3841.eg.db"
There were 50 or more warnings (use warnings() to see the first 50)
> 
> install.packages("./org.Rleguminosarum.bv.viciae.2.3841.eg.db", repos=NULL)
Installing package into ‘/home/javier/R/x86_64-pc-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)
* installing *source* package ‘org.Rleguminosarum.bv.viciae.2.3841.eg.db’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (org.Rleguminosarum.bv.viciae.2.3841.eg.db)
> 
> library(org.Rleguminosarum.bv.viciae3.3841.eg.db)
> library(GOSemSim)
> hsGO <- godata("org.Rleguminosarum.bv.viciae.2.3841.eg.db" , keytype="GID", ont="MF")
Loading required package: org.Rleguminosarum.bv.viciae.2.3841.eg.db

preparing gene to GO mapping data...
Error in FUN(X[[i]], ...) : 
  Two fields in the source DB have the same name.
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Rleguminosarum.bv.viciae.2.3841.eg.db_0.1 org.Rleguminosarum.bv.viciae.1.3841.eg.db_0.1 org.Rleguminosarum.bv.viciae3.3841.eg.db_0.1 
 [4] org.Rleguminosarum.bv.viciae2.3841.eg.db_0.1  org.Rleguminosarum.bv.viciae.3841.eg.db_0.1   stringr_1.4.0                                
 [7] org.Rleguminosarum3841.eg.db_0.2              org.Hs.eg.db_3.10.0                           org.Rleguminosarumbvvc3841.eg.db_0.2         
[10] AnnotationForge_1.28.0                        org.Rleguminosarumbvviciae3841.eg.db_0.1      AnnotationDbi_1.48.0                         
[13] IRanges_2.20.2                                S4Vectors_0.24.4                              Biobase_2.46.0                               
[16] BiocGenerics_0.32.0                           GOSemSim_2.12.1                              

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6    magrittr_1.5    bit_1.1-15.2    rlang_0.4.6     blob_1.2.1      tools_3.6.3     DBI_1.1.0       bit64_0.9-7     digest_0.6.25   vctrs_0.2.4    
[11] bitops_1.0-6    RCurl_1.98-1.2  memoise_1.1.0   RSQLite_2.2.0   stringi_1.4.6   compiler_3.6.3  GO.db_3.10.0    XML_3.99-0.3    pkgconfig_2.0.3
> keytypes(org.Rleguminosarum.bv.viciae.2.3841.eg.db)
[1] "EVIDENCE"    "EVIDENCEALL" "GID"         "GO"          "GOALL"       "ONTOLOGY"    "ONTOLOGYALL" "SYMBOL"     
> keys(org.Rleguminosarum.bv.viciae.2.3841.eg.db,"GO")
Error in .deriveTableNameFromField(field = keytype, x) : 
  Two fields in the source DB have the same name.

@james-w-macdonald-5106

This is a bug in AnnotationDbi that doesn't account for changes made to AnnotationForge for the NOSCHEMA OrgDb packages last year. We are looking into the changes and should have a permanent fix in the coming days.

I have pushed a temporary fix to a personal repo that you can use in the interim.

## current situation
> library(org.Tguttata.eg.db)
> select(org.Tguttata.eg.db, head(keys(org.Tguttata.eg.db)), "GO")
Error in FUN(X[[i]], ...) : 
  Two fields in the source DB have the same name.

## to fix, do this
library(BiocManager)
install("jmacdon/AnnotationDbi") ## you may need to install the remotes package for this to work
> library(org.Tguttata.eg.db)
> select(org.Tguttata.eg.db, head(keys(org.Tguttata.eg.db)), "GO")
'select()' returned 1:many mapping between keys and columns
       GID         GO
1   751582 GO:0000287
2   751582 GO:0001774
3   751582 GO:0001921
4   751582 GO:0001956
5   751582 GO:0001963
6   751582 GO:0005507
7   751582 GO:0005509
8   751582 GO:0005515
9   751582 GO:0005634
10  751582 GO:0005737
11  751582 GO:0005739
12  751582 GO:0005829
13  751582 GO:0005856
14  751582 GO:0005886


Please follow any further Bioconductor solutions on the open GitHub issue https://github.com/Bioconductor/AnnotationForge/issues/14 or on the mailing list thread https://stat.ethz.ch/pipermail/bioc-devel/2020-May/016785.html


Thank you very much! It works now :)


Hello James,

Would you be willing to submit your change as a PR to AnnotationDbi?

Thanks!


Hi Kayla,

Done. I submitted the PR against RELEASE_3_11.

I made the change as a simple hack to get around the fact that the NOSCHEMA DBs now have multiple GO tables.

> orgdb
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Macaca fascicularis
| SPECIES: Macaca fascicularis
| CENTRALID: GID
| Taxonomy ID: 9541
| Db type: OrgDb
| Supporting package: AnnotationDbi

Please see: help('select') for usage information
> dbListTables(dbconn(orgdb))
 [1] "accessions"   "alias"        "chromosomes"  "entrez_genes" "gene_info"
 [6] "genes"        "go"           "go_all"       "go_bp"        "go_bp_all"
[11] "go_cc"        "go_cc_all"    "go_mf"        "go_mf_all"    "map_counts"
[16] "map_metadata" "metadata"     "pubmed"       "refseq"       "unigene"

The NOSCHEMA DBs used to have only the go and go_all tables, but now they also have extra tables that are subsets of those (e.g., go_bp is simply a subset of the go table, containing only the BP ontology terms).
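
To make the collision concrete, here is a toy model in base R. It is only a sketch: the table names follow the listing above, but the column names per table are assumed, and `derive_table_from_field()` is a hypothetical stand-in, not the actual AnnotationDbi code. It shows how a field-to-table lookup that expects a unique match fails once several tables carry a GO column, and how ignoring the per-ontology subset tables makes the lookup unique again.

```r
# Toy schema: table name -> column names.
# Column names are assumed for illustration only.
schema <- list(
  genes     = c("GID"),
  gene_info = c("GID", "SYMBOL"),
  go        = c("GID", "GO", "EVIDENCE", "ONTOLOGY"),
  go_bp     = c("GID", "GO", "EVIDENCE"),
  go_cc     = c("GID", "GO", "EVIDENCE"),
  go_mf     = c("GID", "GO", "EVIDENCE")
)

# Hypothetical lookup (NOT AnnotationDbi's implementation):
# find the single table that owns a given field.
derive_table_from_field <- function(field, schema, ignore = character()) {
  candidates <- setdiff(names(schema), ignore)
  has_field <- vapply(schema[candidates],
                      function(cols) field %in% cols,
                      logical(1))
  hits <- candidates[has_field]
  if (length(hits) > 1L) {
    stop("Two fields in the source DB have the same name.")
  }
  if (length(hits) == 0L) {
    stop("Field not found: ", field)
  }
  hits
}

# With the per-ontology tables present, "GO" is ambiguous.
res <- tryCatch(derive_table_from_field("GO", schema),
                error = function(e) conditionMessage(e))
print(res)

# Ignoring the subset tables leaves a single owner for "GO".
subset_tables <- grep("^go_(bp|cc|mf)$", names(schema), value = TRUE)
print(derive_table_from_field("GO", schema, ignore = subset_tables))
```

Fields that live in only one table, such as SYMBOL in this toy schema, resolve without any special handling.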

These tables already existed in the 'regular' DB packages, but I don't know why, nor can I find any code in AnnotationDbi that makes use of them. And the tables aren't so big that having pre-subsetted versions appears useful anyway. But maybe there is a rationale that I am unaware of.
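
As a rough illustration of why the pre-subsetted tables look redundant, a per-ontology table can be derived from the full go table with a one-line filter. The data frame below is a made-up stand-in: the column names and the placeholder evidence codes are assumptions, and `go_subset()` is a hypothetical helper, not part of any package.

```r
# Made-up stand-in for the `go` table (columns assumed for illustration).
go <- data.frame(
  GID      = rep("751582", 5),
  GO       = c("GO:0005634", "GO:0005737", "GO:0005515",
               "GO:0000287", "GO:0001774"),
  EVIDENCE = rep("IEA", 5),  # placeholder evidence codes
  ONTOLOGY = c("CC", "CC", "MF", "MF", "BP"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build the equivalent of go_bp / go_cc / go_mf
# on the fly by filtering the full table on its ontology column.
go_subset <- function(go, ontology) {
  out <- go[go$ONTOLOGY == ontology, c("GID", "GO", "EVIDENCE")]
  rownames(out) <- NULL
  out
}

go_cc <- go_subset(go, "CC")
print(go_cc)
```

In a real OrgDb the same filter would more likely be expressed as a SQL WHERE clause on the go table.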
