Question

makeOrgPackage error: Error in .deriveTableNameFromField(field = keytype, x) : Two fields in the source DB have the same name.

Eduardo Andres-Leon ▴ 30

@eduardo-andres-leon-14572

Last seen 3.8 years ago

Granada (Spain)

Hi all, I'm trying to create an organism package (I've done it in the past with no problems), but now I'm experiencing a weird error.

I've used the example from the vignette in order to explain everything as clearly as possible:

I copied and pasted the code from the vignette:

library("AnnotationForge")

## Makes an organism package for Zebra Finch data.frames:
finchFile <- system.file("extdata","finch_info.txt",package="AnnotationForge")
finch <- read.table(finchFile,sep="\t")

## not that this is how it should always be, but that it *could* be this way.
fSym <- finch[,c(2,3,9)]
fSym <- fSym[fSym[,2]!="-",]
fSym <- fSym[fSym[,3]!="-",]
colnames(fSym) <- c("GID","SYMBOL","GENENAME")

fChr <- finch[,c(2,7)]
fChr <- fChr[fChr[,2]!="-",]
colnames(fChr) <- c("GID","CHROMOSOME")

finchGOFile <- system.file("extdata","GO_finch.txt",package="AnnotationForge")
fGO <- read.table(finchGOFile,sep="\t")
fGO <- fGO[fGO[,2]!="",]
fGO <- fGO[fGO[,3]!="",]
colnames(fGO) <- c("GID","GO","EVIDENCE")

makeOrgPackage(gene_info=fSym, chromosome=fChr, go=fGO,
               version="0.1",
               maintainer="Some One <so@someplace.org>",
               author="Some One <so@someplace.org>",
               outputDir = ".",
               tax_id="59729",
               genus="Taeniopygia",
               species="guttata",
               goTable="go",
               verbose=T)

Once it finished, I obtained the following:

Creating package in ./org.Tguttata.eg.db 
Now deleting temporary database file
[1] "./org.Tguttata.eg.db"
There were 50 or more warnings (use warnings() to see the first 50)

The warnings are:

Warning messages:
1: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
2: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
3: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
4: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
5: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
6: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
7: In result_fetch(res@ptr, n = n) :
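
For context on these warnings: the message itself separates SQL that returns rows (queries) from SQL that does not (statements such as CREATE or INSERT). The sketch below only illustrates that distinction on a throwaway in-memory SQLite database. The table `demo` and its contents are hypothetical and are not taken from AnnotationForge's internals.

```r
library(DBI)

# Throwaway in-memory database; `demo` is a hypothetical table
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Statements (no result set) go through dbExecute(), which returns the
# number of rows affected
dbExecute(con, "CREATE TABLE demo (gid TEXT, go TEXT)")
n_inserted <- dbExecute(
  con,
  "INSERT INTO demo VALUES ('1', 'GO:0003677'), ('2', 'GO:0006271')"
)

# Queries (a result set) go through dbGetQuery(), which returns a data.frame
res <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM demo")

dbDisconnect(con)
```

As far as this thread shows, the package is still written to disk despite the warnings, so they look unrelated to the keys() error discussed here.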

Then I load the library and try some keytypes:

library(org.Tguttata.eg.db)
head( keys(org.Tguttata.eg.db) )
ls("package:org.Tguttata.eg.db")
columns(org.Tguttata.eg.db)

head(keys(org.Tguttata.eg.db, keytype="CHROMOSOME"))
head(keys(org.Tguttata.eg.db, keytype="GID"))

But once I use GO or EVIDENCE as the keytype, I obtain the following errors:

> columns(org.Tguttata.eg.db)
 [1] "CHROMOSOME"  "EVIDENCE"    "EVIDENCEALL" "GENENAME"    "GID"         "GO"          "GOALL"       "ONTOLOGY"    "ONTOLOGYALL" "SYMBOL"     
> head(keys(org.Tguttata.eg.db, keytype="GO"))

Error in .deriveTableNameFromField(field = keytype, x) : 
  Two fields in the source DB have the same name.
> head(keys(org.Tguttata.eg.db, keytype="EVIDENCE"))

Error in .deriveTableNameFromField(field = keytype, x) : 
  Two fields in the source DB have the same name.

So I check the sqlite db:

bash$ sqlite3 ./org.Psp.PAO1.eg.db/inst/extdata/org.Psp.PAO1.eg.sqlite
SQLite version 3.24.0 2018-06-04 14:10:15
Enter ".help" for usage hints.
sqlite> .tables
chromosome    go            go_bp_all     go_mf         map_metadata
gene_info     go_all        go_cc         go_mf_all     metadata    
genes         go_bp         go_cc_all     map_counts  
sqlite> .schema go
CREATE TABLE go  (
            _id INTEGER NOT NULL,                         -- REFERENCES genes
         GO  VARCHAR( 25 ) NOT NULL,    -- data
       EVIDENCE  VARCHAR( 25 ) NOT NULL,    -- data
       ONTOLOGY  VARCHAR( 25 ) NOT NULL,    -- data 
        FOREIGN KEY (_id)
        REFERENCES genes (_id));
CREATE INDEX go_GO_ind ON go (GO);
CREATE INDEX go_EVIDENCE_ind ON go (EVIDENCE);
CREATE INDEX go_ONTOLOGY_ind ON go (ONTOLOGY);
CREATE INDEX go__id_ind ON go (_id);
sqlite> select * from go limit 5;
1|GO:0003677|IEA|MF
1|GO:0003688|IEA|MF
1|GO:0006260|IEA|BP
1|GO:0043565|IEA|MF
2|GO:0006271|IEA|BP
sqlite>

I've checked the code and it seems that the error is related to this function:

## Keys method ##
.deriveTableNameFromField <- function(field, x){
  con <- dbconn(x)
  tables <- .getDataTables(con)
  colTabs <- lapply(tables, FUN=RSQLite::dbListFields, con=con)
  m <- unlist2(lapply(colTabs, match, field))
  tab <- names(m)[!is.na(m)]
  if(length(tab) > 1){stop("Two fields in the source DB have the same name.")}
  if(length(tab) == 0){stop("Did not find a field in the source DB.")}
  tab
}
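
To make the check in this function concrete, here is a small base-R sketch of the same idea: collect the column names of every data table and flag the case where the requested field shows up in more than one of them. The table layout is hypothetical (in particular `go_dup`) and is chosen only to trigger the duplicate case; it is not the actual schema of the package. `tables_with_field` is an illustrative helper, not part of AnnotationDbi.

```r
# Hypothetical column listing per table; in a real package these would come
# from something like DBI::dbListFields() on the connection returned by dbconn()
fields_by_table <- list(
  chromosome = c("_id", "CHROMOSOME"),
  go         = c("_id", "GO", "EVIDENCE", "ONTOLOGY"),
  go_dup     = c("_id", "GO", "EVIDENCE")   # made-up second table exposing GO
)

# Illustrative helper: names of all tables that contain `field`
tables_with_field <- function(field, fields_by_table) {
  has_field <- vapply(fields_by_table,
                      function(cols) field %in% cols,
                      logical(1))
  names(fields_by_table)[has_field]
}

chrom_tabs <- tables_with_field("CHROMOSOME", fields_by_table)  # one table
go_tabs    <- tables_with_field("GO", fields_by_table)          # two tables

# Mirrors the length(tab) > 1 branch of the function above
if (length(go_tabs) > 1) {
  message("Field 'GO' found in: ", paste(go_tabs, collapse = ", "))
}
```

Running the same kind of listing against the real SQLite file would show which tables the failing keytype is being matched in.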

But I'm not sure how to proceed ...

So I hope that you guys can help me, because I've done everything I could with no success :(

Thanks in advance !!!!

Here is my sessionInfo():

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] org.Tguttata.eg.db_0.1 AnnotationForge_1.28.0 AnnotationDbi_1.48.0   IRanges_2.20.2         S4Vectors_0.24.4       Biobase_2.46.0         BiocGenerics_0.32.0   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3      GO.db_3.10.0    XML_3.99-0.3    digest_0.6.23   bitops_1.0-6    DBI_1.1.0       RSQLite_2.2.0   rlang_0.4.4     blob_1.2.1      vctrs_0.2.2    
[11] tools_3.6.2     bit64_0.9-7     RCurl_1.98-1.1  bit_1.1-15.1    yaml_2.2.0      compiler_3.6.2  pkgconfig_2.0.3 memoise_1.1.0

makeOrgPackage • 1.5k views

ADD COMMENT • link updated 3.2 years ago by James W. MacDonald 65k • written 4.0 years ago by Eduardo Andres-Leon ▴ 30

This is remarkably similar to another recent issue - are you the same person? - https://support.bioconductor.org/p/130238/#130265

I provide an indirect solution in my answer.

ADD REPLY • link 4.0 years ago Kevin Blighe ★ 3.9k

Hi, thanks for your reply. I'm not Cei. I was searching for a while for similar errors and I could not find anything ....

The problem with your solution is that you "remove" the GO column, which is the one I need. I've created the db package in order to use clusterProfiler or similar to perform GO enrichment in an RNA-Seq study.

ADD REPLY • link 4.0 years ago Eduardo Andres-Leon ▴ 30

You could find a combination of columns to retain / remove such that GO is retained. This is just a simple fix, though.

ADD REPLY • link 4.0 years ago Kevin Blighe ★ 3.9k

score 0 · Answer 1 · 2020-04-25

Martin Morgan 25k

@martin-morgan-1513

Last seen 1 day ago

United States

Also not-an-answer but https://support.bioconductor.org/p/130160/#130185 shows that this information is already available in AnnotationHub?

ADD COMMENT • link 4.0 years ago Martin Morgan 25k

Hi Martin, thanks for your post, it was the right answer for me. I'll try to explain it:

I'm working with Pseudomonas. As I did not find the X package, I tried to create my own (as I've done dozens of times before). The idea of the whole process is to use DE genes from an RNA-Seq experiment to study GO/KEGG enrichment, so I needed the mapping from genes to GO and from genes to KEGG. I use clusterProfiler in my pipeline, as it makes it straightforward to obtain my results and several graphs. For KEGG processing I just need the locus name of the organism (which is in the GFF file), so it took me 2 seconds to obtain the KEGG-enriched pathways. But the GO part was a nightmare. As I failed to get the GO keys, I posted this question ...

I tried Kevin's suggestion: I removed different GO tables from the database, but I did not find the correct combination. I checked the Mm and Hs org dbs and saw that both organisms have the same GO tables that I have, and the keys work as expected there .... So instead of drowning my frustration in liters of beer, I checked the post provided by Martin again ....

I searched for the GeneID of each Pseudomonas locus (I found a GFF in GenBank with the locus<=>GeneID mapping).... So I repeated the whole process using the GeneID as GID .... and it failed.... Then I tried duplicating this GID field by creating a copy called ENTREZID ... and it worked!

So apart from using the GenBank GeneID as GID, I created a field called ENTREZID in the org package. The error reported in this post still appears:

head(keys(org.Paeruginosa.eg.db, keytype="GO"))

Error in .deriveTableNameFromField(field = keytype, x) : Two fields in the source DB have the same name.

But the enrichGO function from clusterProfiler works perfectly fine! I hope this solution can help more people, and that the developers of makeOrgPackage find a proper fix.
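
A minimal sketch of the workaround described above, in base R only. The gene IDs, symbols and names are made up for illustration, and `gene_info` is just a stand-in for whatever table is passed to makeOrgPackage. This only shows how the extra column is built. It does not fix keys(), which, as noted above, still raises the error. The workaround likely helps because enrichGO looks for an ENTREZID keytype by default.

```r
# Hypothetical gene table keyed by Entrez Gene IDs (all values are made up)
gene_info <- data.frame(
  GID      = c("100001", "100002", "100003"),
  SYMBOL   = c("geneA", "geneB", "geneC"),
  GENENAME = c("hypothetical protein A",
               "hypothetical protein B",
               "hypothetical protein C"),
  stringsAsFactors = FALSE
)

# Duplicate the central ID under the name ENTREZID, as described in the reply
gene_info$ENTREZID <- gene_info$GID

# gene_info would then be supplied as makeOrgPackage(gene_info = gene_info, ...)
# together with the chromosome and GO tables built as in the question.
```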

ADD REPLY • link 4.0 years ago Eduardo Andres-Leon ▴ 30

Glad that you got it working, but seems like a problem that will re-occur for others. Perhaps reporting this as an issue on GitHub would help.

From what I can see, you simply created a new column called ENTREZID, which actually contains the GIDs for your organism?

ADD REPLY • link 4.0 years ago Kevin Blighe ★ 3.9k

Yes, that was what I did (I guess GO.db internally works only with ENTREZID)

ADD REPLY • link 4.0 years ago Eduardo Andres-Leon ▴ 30

Hi Eduardo, I tried what you suggested and added the new field (below, along with other commands), but it didn't work.

fSym$ENTREZID <- fSym$GID

Can you elaborate on what you did? I also tried looking for answers in other posts but had no luck; maybe you are the only one in the world who has solved the issue. Please help. I've spent too much time on it, and I feel like you suffered too. lol

ADD REPLY • link 3.2 years ago Cheng • 0

You need to update your R/Bioconductor install. This was fixed almost a year ago. All of the OrgDb packages mentioned in this post are NOSCHEMA_DB packages, like for instance this one:

> z
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Macaca fascicularis
| SPECIES: Macaca fascicularis
| CENTRALID: GID
| Taxonomy ID: 9541
| Db type: OrgDb
| Supporting package: AnnotationDbi

Please see: help('select') for usage information

And this works:

> head(keys(z, "GO"))
[1] "GO:0000002" "GO:0000003" "GO:0000009" "GO:0000010" "GO:0000012"
[6] "GO:0000014"

> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] DBI_1.1.0            AnnotationHub_2.22.0 BiocFileCache_1.14.0
[4] dbplyr_2.0.0         AnnotationDbi_1.52.0 IRanges_2.24.1      
[7] S4Vectors_0.28.1     Biobase_2.50.0       BiocGenerics_0.36.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5                    later_1.1.0.1                
 [3] compiler_4.0.0                pillar_1.4.7                 
 [5] BiocManager_1.30.10           tools_4.0.0                  
 [7] digest_0.6.27                 bit_4.0.4                    
 [9] RSQLite_2.2.1                 memoise_1.1.0                
[11] lifecycle_0.2.0               tibble_3.0.4                 
[13] pkgconfig_2.0.3               rlang_0.4.10                 
[15] shiny_1.5.0                   curl_4.3                     
[17] yaml_2.2.1                    fastmap_1.0.1                
[19] withr_2.3.0                   dplyr_1.0.2                  
[21] httr_1.4.2                    generics_0.1.0               
[23] vctrs_0.3.6                   rappdirs_0.3.1               
[25] bit64_4.0.5                   tidyselect_1.1.0             
[27] glue_1.4.2                    R6_2.5.0                     
[29] purrr_0.3.4                   blob_1.2.1                   
[31] magrittr_2.0.1                promises_1.1.1               
[33] htmltools_0.5.0               ellipsis_0.3.1               
[35] assertthat_0.2.1              xtable_1.8-4                 
[37] mime_0.9                      interactiveDisplayBase_1.28.0
[39] httpuv_1.5.4                  crayon_1.3.4                 
[41] BiocVersion_3.12.0
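
One quick way to see whether an install predates the working setup is to compare package versions. The sketch below uses only the two AnnotationDbi versions that appear in the sessionInfo() listings in this thread (1.48.0 in the question, 1.52.0 above). Treating 1.52.0 as the reference is an assumption for illustration; the thread does not say in which version the fix landed. `is_older_than` is a hypothetical helper name.

```r
# Versions taken from the two sessionInfo() listings in this thread
failing_version <- "1.48.0"   # AnnotationDbi in the original question
working_version <- "1.52.0"   # AnnotationDbi in the session above

# Hypothetical helper: TRUE when `installed` is older than `reference`
is_older_than <- function(installed, reference) {
  package_version(installed) < package_version(reference)
}

needs_update <- is_older_than(failing_version, working_version)

# With AnnotationDbi installed, the first argument could instead be
# as.character(packageVersion("AnnotationDbi"))
```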

ADD REPLY • link 3.2 years ago James W. MacDonald 65k

Hey James, I was following http://bioconductor.org/packages/release/bioc/vignettes/AnnotationForge/inst/doc/MakingNewOrganismPackages.html, but I'm still seeing the issue. Do you think it is an R version related issue (mine is 3.6.2)? I'm worried that I would have to reinstall all my other packages, so I'm hesitant to upgrade R to 4.0.

> org.Tguttata.eg.db
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Taeniopygia guttata
| SPECIES: Taeniopygia guttata
| CENTRALID: GID
| Taxonomy ID: 59729
| Db type: OrgDb
| Supporting package: AnnotationDbi

> keytypes(org.Tguttata.eg.db)

[1] "CHROMOSOME" "EVIDENCE" "EVIDENCEALL" "GENENAME" "GID" "GO" "GOALL"
[8] "ONTOLOGY" "ONTOLOGYALL" "SYMBOL"

> head(keys(org.Tguttata.eg.db, "GO"))

Error in .deriveTableNameFromField(field = keytype, x) : Two fields in the source DB have the same name.

> sessionInfo()

R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

ADD REPLY • link 3.2 years ago Cheng • 0

When we fix things we only fix devel, and possibly release. So unless you upgrade to the newest version (and we don't support old versions anyway - if you choose to use old versions, getting them to work is on you, not us) you cannot expect the fix to exist for you.

It is true that you have to re-install all your old packages. But that's how it works. We upgrade and fix bugs, and you have to reinstall in order to get the updated versions.

If you want everything you had in the old install, then it's simple enough. Something like

## in your existing R
z <- row.names(installed.packages())
## you can save this as a text file or an RDS. Let's use text
cat(z, file = "myoldpackages.txt", sep = "\n")

## upgrade to new R. After doing so, 
install.packages("BiocManager")
library(BiocManager)
## here you might need to point to the directory where you saved this file
z <- scan("myoldpackages.txt", what = character())
install(z, ask = FALSE)
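
After the reinstall, it can be worth checking which packages from the saved list are still absent, since some may have been renamed or removed between releases. A minimal sketch, assuming a list file like the one written above. Here a temporary file with a made-up package name (`notARealPackage123`) stands in for `myoldpackages.txt`.

```r
# Stand-in for myoldpackages.txt: two base packages plus a made-up name
list_file <- tempfile(fileext = ".txt")
writeLines(c("stats", "utils", "notARealPackage123"), list_file)

# Read the saved package names back in
old_pkgs <- readLines(list_file)

# Anything in the old list that is not currently installed
still_missing <- setdiff(old_pkgs, rownames(installed.packages()))

if (length(still_missing) > 0) {
  message("Not installed: ", paste(still_missing, collapse = ", "))
}
```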

ADD REPLY • link 3.2 years ago James W. MacDonald 65k