Hi all, I'm trying to create an organism package (I've done it in the past with no problems), but now I'm experiencing a weird error.
I've used the example in the vignette in order to explain everything as clearly as possible.
I copied and pasted the code from the vignette:
library("AnnotationForge")
## Makes an organism package for Zebra Finch data.frames:
finchFile <- system.file("extdata","finch_info.txt",package="AnnotationForge")
finch <- read.table(finchFile,sep="\t")
## not that this is how it should always be, but that it *could* be this way.
fSym <- finch[,c(2,3,9)]
fSym <- fSym[fSym[,2]!="-",]
fSym <- fSym[fSym[,3]!="-",]
colnames(fSym) <- c("GID","SYMBOL","GENENAME")
fChr <- finch[,c(2,7)]
fChr <- fChr[fChr[,2]!="-",]
colnames(fChr) <- c("GID","CHROMOSOME")
finchGOFile <- system.file("extdata","GO_finch.txt",package="AnnotationForge")
fGO <- read.table(finchGOFile,sep="\t")
fGO <- fGO[fGO[,2]!="",]
fGO <- fGO[fGO[,3]!="",]
colnames(fGO) <- c("GID","GO","EVIDENCE")
makeOrgPackage(gene_info=fSym, chromosome=fChr, go=fGO,
               version="0.1",
               maintainer="Some One <so@someplace.org>",
               author="Some One <so@someplace.org>",
               outputDir = ".",
               tax_id="59729",
               genus="Taeniopygia",
               species="guttata",
               goTable="go",
               verbose=TRUE)
Once it finished, I obtained the following:
Creating package in ./org.Tguttata.eg.db
Now deleting temporary database file
[1] "./org.Tguttata.eg.db"
There were 50 or more warnings (use warnings() to see the first 50)
The warnings are:
Warning messages:
1: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
2: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
3: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
4: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
5: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
6: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
7: In result_fetch(res@ptr, n = n) :
Then I load the library and try some keytypes:
library(org.Tguttata.eg.db)
head( keys(org.Tguttata.eg.db) )
ls("package:org.Tguttata.eg.db")
columns(org.Tguttata.eg.db)
head(keys(org.Tguttata.eg.db, keytype="CHROMOSOME"))
head(keys(org.Tguttata.eg.db, keytype="GID"))
But when I use the GO or EVIDENCE keytype, I get the following errors:
> columns(org.Tguttata.eg.db)
[1] "CHROMOSOME" "EVIDENCE" "EVIDENCEALL" "GENENAME" "GID" "GO" "GOALL" "ONTOLOGY" "ONTOLOGYALL" "SYMBOL"
> head(keys(org.Tguttata.eg.db, keytype="GO"))
Error in .deriveTableNameFromField(field = keytype, x) :
Two fields in the source DB have the same name.
> head(keys(org.Tguttata.eg.db, keytype="EVIDENCE"))
Error in .deriveTableNameFromField(field = keytype, x) :
Two fields in the source DB have the same name.
So I checked the SQLite database:
>bash$>sqlite3 ./org.Psp.PAO1.eg.db/inst/extdata/org.Psp.PAO1.eg.sqlite
SQLite version 3.24.0 2018-06-04 14:10:15
Enter ".help" for usage hints.
sqlite> .tables
chromosome go go_bp_all go_mf map_metadata
gene_info go_all go_cc go_mf_all metadata
genes go_bp go_cc_all map_counts
sqlite> .schema go
CREATE TABLE go (
_id INTEGER NOT NULL, -- REFERENCES genes
GO VARCHAR( 25 ) NOT NULL, -- data
EVIDENCE VARCHAR( 25 ) NOT NULL, -- data
ONTOLOGY VARCHAR( 25 ) NOT NULL, -- data
FOREIGN KEY (_id)
REFERENCES genes (_id));
CREATE INDEX go_GO_ind ON go (GO);
CREATE INDEX go_EVIDENCE_ind ON go (EVIDENCE);
CREATE INDEX go_ONTOLOGY_ind ON go (ONTOLOGY);
CREATE INDEX go__id_ind ON go (_id);
sqlite> select * from go limit 5;
1|GO:0003677|IEA|MF
1|GO:0003688|IEA|MF
1|GO:0006260|IEA|BP
1|GO:0043565|IEA|MF
2|GO:0006271|IEA|BP
sqlite>
I've checked the code, and the error seems to come from this function:
## Keys method ##
.deriveTableNameFromField <- function(field, x){
con <- dbconn(x)
tables <- .getDataTables(con)
colTabs <- lapply(tables, FUN=RSQLite::dbListFields, con=con)
m <- unlist2(lapply(colTabs, match, field))
tab <- names(m)[!is.na(m)]
if(length(tab) > 1){stop("Two fields in the source DB have the same name.")}
if(length(tab) == 0){stop("Did not find a field in the source DB.")}
tab
}
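The function above stops as soon as more than one table reports a column matching the requested keytype. A minimal sketch for checking that by hand follows, assuming the DBI and RSQLite packages are installed. It builds a small in-memory database with toy tables (`go_copy` is invented purely for illustration) and lists every table that contains a given field. `tables_with_field` is a hypothetical helper, not part of AnnotationDbi, and it may not match what `.getDataTables()` considers a data table.

```r
library(DBI)

# Hypothetical helper: return the names of all tables that have a column
# called `field`. The comparison is case-insensitive, which is an assumption
# of this sketch and may differ from what AnnotationDbi does internally.
tables_with_field <- function(con, field) {
  tabs <- dbListTables(con)
  has_field <- vapply(
    tabs,
    function(tab) toupper(field) %in% toupper(dbListFields(con, tab)),
    logical(1)
  )
  tabs[has_field]
}

# Toy in-memory database standing in for the real .sqlite file.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE genes (_id INTEGER, GID TEXT)")
dbExecute(con, "CREATE TABLE go (_id INTEGER, GO TEXT, EVIDENCE TEXT)")
dbExecute(con, "CREATE TABLE go_copy (_id INTEGER, GO TEXT, EVIDENCE TEXT)")

# Which tables expose a column named GO?
dup_tables <- tables_with_field(con, "GO")
print(dup_tables)

dbDisconnect(con)
```

Against the real package, the same helper could be called as `tables_with_field(dbconn(org.Tguttata.eg.db), "GO")`; in that case the connection belongs to the package and should not be disconnected.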
But I'm not sure how to proceed...
I hope you can help me, because I've tried everything I could without success :(
Thanks in advance!
Here is my sessionInfo():
> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] org.Tguttata.eg.db_0.1 AnnotationForge_1.28.0 AnnotationDbi_1.48.0 IRanges_2.20.2 S4Vectors_0.24.4 Biobase_2.46.0 BiocGenerics_0.32.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 GO.db_3.10.0 XML_3.99-0.3 digest_0.6.23 bitops_1.0-6 DBI_1.1.0 RSQLite_2.2.0 rlang_0.4.4 blob_1.2.1 vctrs_0.2.2
[11] tools_3.6.2 bit64_0.9-7 RCurl_1.98-1.1 bit_1.1-15.1 yaml_2.2.0 compiler_3.6.2 pkgconfig_2.0.3 memoise_1.1.0
This is remarkably similar to another recent issue - are you the same person? - https://support.bioconductor.org/p/130238/#130265
I provide an indirect solution in my answer.
Hi, thanks for your reply. I'm not Cei. I searched for similar errors for a while and could not find anything.
The problem with your solution is that you "remove" the GO column, which is the one I need. I created the db package in order to use clusterProfiler or similar tools to perform GO enrichment in an RNA-seq study.
You could find a combination of columns to retain / remove such that GO is retained. This is just a simple fix, though.
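If `keys()` keeps failing for the GO keytype, one possible way around it is to read the gene-to-GO mapping straight from the SQLite tables. The sketch below assumes DBI and RSQLite are installed and that the `genes` table has `_id` and `GID` columns (its schema is not shown in this thread, so that is an assumption). The `go` columns follow the schema posted in the question; the gene IDs and rows are invented toy data.

```r
library(DBI)

# Toy in-memory database standing in for the package's .sqlite file.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE genes (_id INTEGER PRIMARY KEY, GID TEXT)")
dbExecute(con, "CREATE TABLE go (_id INTEGER, GO TEXT, EVIDENCE TEXT, ONTOLOGY TEXT)")
dbExecute(con, "INSERT INTO genes VALUES (1, 'geneA'), (2, 'geneB')")
dbExecute(con, "INSERT INTO go VALUES
  (1, 'GO:0003677', 'IEA', 'MF'),
  (1, 'GO:0006260', 'IEA', 'BP'),
  (2, 'GO:0006271', 'IEA', 'BP')")

# Join the internal _id back to the gene identifier to get one row per
# gene/GO-term pair.
gene2go <- dbGetQuery(con, "
  SELECT g.GID, go.GO, go.EVIDENCE, go.ONTOLOGY
  FROM go
  JOIN genes AS g ON g._id = go._id
  ORDER BY g.GID, go.GO")
print(gene2go)

dbDisconnect(con)
```

With the real package, `con` would be `dbconn(org.Tguttata.eg.db)` (and should not be disconnected). The resulting data frame could then be reordered into a term-to-gene table for tools that accept custom annotation, for example the `TERM2GENE` argument of clusterProfiler's `enricher()`.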