AnnotationForge not working for building custom org packages
Cei • 0
@cei-23383
Langebio - Mexico

Dear Marc and Hervé,

I believe the AnnotationForge package is currently not working correctly. Perhaps due to an update in SQL functions? To keep it as simple as possible, I simply tried to reproduce the worked example in Marc's "Making Organism packages" vignette. It required one small modification on the last line to run (type="source"), so I'll paste in the whole code here:

library(AnnotationForge)
## Makes an organism package for Zebra Finch data.frames:
finchFile <- system.file("extdata","finch_info.txt",
                 package="AnnotationForge")
finch <- read.table(finchFile,sep="\t")

## Now prepare some data.frames
fSym <- finch[,c(2,3,9)]
fSym <- fSym[fSym[,2]!="-",]
fSym <- fSym[fSym[,3]!="-",]
colnames(fSym) <- c("GID","SYMBOL","GENENAME")

fChr <- finch[,c(2,7)]
fChr <- fChr[fChr[,2]!="-",]
colnames(fChr) <- c("GID","CHROMOSOME")

finchGOFile <- system.file("extdata","GO_finch.txt",
               package="AnnotationForge")
fGO <- read.table(finchGOFile,sep="\t")
fGO <- fGO[fGO[,2]!="",]
fGO <- fGO[fGO[,3]!="",]
colnames(fGO) <- c("GID","GO","EVIDENCE")

## Then call the function
makeOrgPackage(gene_info=fSym, chromosome=fChr, go=fGO,
               version="0.1",
               maintainer="Some One <so@someplace.org>",
               author="Some One <so@someplace.org>",
               outputDir = ".",
               tax_id="59729",
               genus="Taeniopygia",
               species="guttata",
               goTable="go")

## then you can call install.packages based on the return value
install.packages("./org.Tguttata.eg.db", repos=NULL, type="source")

No errors, but 50+ warnings like this:

In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
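
For context, the warning text refers to the DBI distinction between SQL that changes data (CREATE, INSERT, and so on) and SQL that returns rows (SELECT). The sketch below is not AnnotationForge code; it assumes the DBI and RSQLite packages are installed, uses a throwaway in-memory database, and the function name `count_rows_demo` is made up for illustration.

```r
# Hypothetical demo function (not part of AnnotationForge or DBI).
count_rows_demo <- function() {
  # Throwaway in-memory SQLite database.
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  on.exit(DBI::dbDisconnect(con))

  # Statements that return no result set go through dbExecute().
  DBI::dbExecute(con, "CREATE TABLE genes (GID TEXT, SYMBOL TEXT)")
  n <- DBI::dbExecute(con, "INSERT INTO genes VALUES ('751582', 'SNCA')")

  # Queries that return rows go through dbGetQuery().
  rows <- DBI::dbGetQuery(con, "SELECT GID, SYMBOL FROM genes")

  list(inserted = n, rows = rows)
}

# Only run the demo when RSQLite is available.
if (requireNamespace("RSQLite", quietly = TRUE)) {
  out <- count_rows_demo()
  print(out$rows)
}
```

Here `dbExecute()` returns the number of affected rows, while `dbGetQuery()` returns a data.frame.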

I can load the resulting package, but the database seems to be empty:

library("org.Tguttata.eg.db")
org.Tguttata.eg()
Quality control information for org.Tguttata.eg:

This package has the following mappings:

Additional Information about this package:

DB schema: NOSCHEMA_DB
DB schema version: 2.1
Organism: Taeniopygia guttata

At the end of this message, I'm posting my sessionInfo().

Many thanks for all the great work, and I hope this post helps get others unstuck as well.

Cei

P.S. Thinking that the RSQLite package might have been updated and become incompatible, I also ran the same code on an old R (3.4) installation on Linux. I got similar warnings, although they mentioned different SQL functions, and the resulting database was equally unpopulated.

R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] org.Tguttata.eg.db_0.1 AnnotationForge_1.28.0 AnnotationDbi_1.48.0   IRanges_2.20.2         S4Vectors_0.24.4      
[6] Biobase_2.46.0         BiocGenerics_0.32.0   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4      GO.db_3.10.0    XML_3.99-0.3    digest_0.6.25   bitops_1.0-6    DBI_1.1.0       RSQLite_2.2.0  
 [8] rlang_0.4.5     blob_1.2.1      vctrs_0.2.4     tools_3.6.2     bit64_0.9-7     RCurl_1.98-1.2  bit_1.1-15.2   
[15] compiler_3.6.2  pkgconfig_2.0.3 memoise_1.1.0  

I know it's less fun, but does

library(AnnotationHub)
hub = AnnotationHub()
query(hub, c("OrgDb", "Taeniopygia guttata"))
orgdb = hub[["AH76439"]]

get you what you want?

@james-w-macdonald-5106
United States

That's just the example for makeOrgPackage, a cut-down version of what you would normally do. It appears to work just fine:

> example(makeOrgPackage)

mkOrgP> if(interactive()){
mkOrgP+ 
mkOrgP+ ## Makes an organism package for Zebra Finch data.frames:
mkOrgP+ finchFile <- system.file("extdata","finch_info.txt",package="AnnotationForge")
mkOrgP+ finch <- read.table(finchFile,sep="\t")
mkOrgP+ 
mkOrgP+ ## not that this is how it should always be, but that it *could* be this way.
mkOrgP+ fSym <- finch[,c(2,3,9)]
mkOrgP+ fSym <- fSym[fSym[,2]!="-",]
mkOrgP+ fSym <- fSym[fSym[,3]!="-",]
mkOrgP+ colnames(fSym) <- c("GID","SYMBOL","GENENAME")
mkOrgP+ 
mkOrgP+ fChr <- finch[,c(2,7)]
mkOrgP+ fChr <- fChr[fChr[,2]!="-",]
mkOrgP+ colnames(fChr) <- c("GID","CHROMOSOME")
mkOrgP+ 
mkOrgP+ finchGOFile <- system.file("extdata","GO_finch.txt",package="AnnotationForge")
mkOrgP+ fGO <- read.table(finchGOFile,sep="\t")
mkOrgP+ fGO <- fGO[fGO[,2]!="",]
mkOrgP+ fGO <- fGO[fGO[,3]!="",]
mkOrgP+ colnames(fGO) <- c("GID","GO","EVIDENCE")
mkOrgP+ 
mkOrgP+ makeOrgPackage(gene_info=fSym, chromosome=fChr, go=fGO,
mkOrgP+                version="0.1",
mkOrgP+                maintainer="Some One <so@someplace.org>",
mkOrgP+                author="Some One <so@someplace.org>",
mkOrgP+                outputDir = ".",
mkOrgP+                tax_id="59729",
mkOrgP+                genus="Taeniopygia",
mkOrgP+                species="guttata",
mkOrgP+                goTable="go")
mkOrgP+ 
mkOrgP+ ## then you can call install.packages based on the return value
mkOrgP+ install.packages("./org.Tguttata.eg.db", repos=NULL)
mkOrgP+ 
mkOrgP+ }
Populating genes table:
genes table filled
Populating gene_info table:
gene_info table filled
Populating chromosome table:
chromosome table filled
Populating go table:
go table filled
table metadata filled

'select()' returned many:1 mapping between keys and columns
Dropping GO IDs that are too new for the current GO.db
Populating go table:
go table filled
Populating go_bp table:
go_bp table filled
Populating go_cc table:
go_cc table filled
Populating go_mf table:
go_mf table filled
'select()' returned many:1 mapping between keys and columns
Populating go_bp_all table:
go_bp_all table filled
Populating go_cc_all table:
go_cc_all table filled
Populating go_mf_all table:
go_mf_all table filled
Populating go_all table:
go_all table filled
Creating package in ./org.Tguttata.eg.db 
Now deleting temporary database file
## the install doesn't work out of the box for Windows
> install.packages("org.Tguttata.eg.db/",type = "source", repos = NULL)
Installing package into 'C:/Users/jmacdon/AppData/Roaming/R/win-library/3.6'
(as 'lib' is unspecified)
* installing *source* package 'org.Tguttata.eg.db' ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
Warning messages:
<snip>
* DONE (org.Tguttata.eg.db)
> library(org.Tguttata.eg.db)

Warning message:
call dbDisconnect() when finished working with a connection 

> select(org.Tguttata.eg.db, head(keys(org.Tguttata.eg.db)), c("GENENAME","GID", "SYMBOL"))
'select()' returned 1:1 mapping between keys and columns
     GID                                                 GENENAME SYMBOL
1 751582 synuclein, alpha (non A4 component of amyloid precursor)   SNCA
2 751583                                        neurocalcin delta  NCALD
3 751584                        brain-derived neurotrophic factor   BDNF
4 751585                cAMP responsive element binding protein 1  CREB1
5 751586                                    melatonin receptor 1A MTNR1A
6 751588                                    melatonin receptor 1B MTNR1B
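
The same kind of check can be widened to list what the package exposes through the AnnotationDbi select() interface. This is a minimal sketch that assumes the org.Tguttata.eg.db package built above is installed; `describe_orgdb` is a hypothetical helper, not a function from AnnotationForge or AnnotationDbi.

```r
# Hypothetical helper: summarise what an OrgDb object exposes via the
# standard AnnotationDbi accessors.
describe_orgdb <- function(db) {
  list(
    columns  = AnnotationDbi::columns(db),      # fields that select() can return
    keytypes = AnnotationDbi::keytypes(db),     # fields that can be used as keys
    n_keys   = length(AnnotationDbi::keys(db))  # number of central IDs
  )
}

# Only run when the package built above is actually installed.
if (requireNamespace("org.Tguttata.eg.db", quietly = TRUE)) {
  info <- describe_orgdb(org.Tguttata.eg.db::org.Tguttata.eg.db)
  print(info$keytypes)
  print(info$n_keys)
}
```

A non-zero key count and a populated keytypes vector indicate that the tables were filled.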


Hi James (and Martin),

Many thanks for your quick replies. My fault for not giving the correct context. I am trying to use GOSemSim to reduce and play around with GO mappings from a custom annotation that we have of the Monarch butterfly. The GOSemSim manual requires an org.XXX.eg.db package, so I ended up trying AnnotationForge (that's why not AnnotationHub, thanks Martin!). Since the package I built did not work with GOSemSim, I went back to the worked example of T. guttata, but got the same error! So, after building with example(makeOrgPackage) and installing, I ran the following from the GOSemSim manual:

library(GOSemSim)
tgGO <- godata('org.Tguttata.eg.db', ont="BP")

I got the following error:

Error in testForValidKeytype(x, keytype) : Invalid keytype: ENTREZID. Please use the keytypes method to see a listing of valid arguments.

Because org.Tguttata.eg() displayed no mapping info, I thought the problem was with AnnotationForge. But the select() code James sent works for me as well. So I tried adding a dummy ENTREZID column to the fSym table in the example code:

fSym$ENTREZID <- paste0("ent",fSym$GID)

After reinstalling, I tried the same code again:

tgGO <- godata('org.Tguttata.eg.db', ont="BP")

and I now get the following error:

preparing gene to GO mapping data... Error in FUN(X[[i]], ...) : Two fields in the source DB have the same name.

If you have any suggestions I would really appreciate it, otherwise I'll try a new post to attract the attention of the GOSemSim developers and see if they can help.

Many thanks to all,

Cei


Since the central ID of the database may or may not be an NCBI Gene ID, it's hard-coded to be something not very descriptive (GID). That allows you to use any unique ID, which is nice, but it is problematic for other tools that assume the central ID of any OrgDb will be 'ENTREZID' (this isn't a valid assumption, by the way; neither org.Sc.sgd.db nor org.At.tair.db uses NCBI Gene IDs as the central ID).

Ideally you would be able to convince the maintainer of GOSemSim to add an argument for the central ID, which would make their package more portable.
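
In the meantime, it may be worth checking whether the installed GOSemSim already lets you choose the key type. The sketch below is untested against this package and rests on two assumptions: that your version of godata() has a `keytype` argument (the code checks for it rather than taking it for granted), and that passing "GID" is enough for the GO lookup to succeed. `has_formal` is a hypothetical helper.

```r
# Hypothetical helper: does a function have a formal argument of this name?
has_formal <- function(fun, arg) arg %in% names(formals(fun))

# Only run when both packages are installed.
if (requireNamespace("GOSemSim", quietly = TRUE) &&
    requireNamespace("org.Tguttata.eg.db", quietly = TRUE)) {

  db <- org.Tguttata.eg.db::org.Tguttata.eg.db

  # Is the key type that godata() asked for actually present?
  print("ENTREZID" %in% AnnotationDbi::keytypes(db))

  # Assumption: godata() accepts `keytype`; skip the call if it does not.
  if (has_formal(GOSemSim::godata, "keytype")) {
    tgGO <- tryCatch(
      GOSemSim::godata("org.Tguttata.eg.db", keytype = "GID", ont = "BP"),
      error = function(e) conditionMessage(e)  # keep the message for inspection
    )
  }
}
```

If the argument is missing or the call still fails, a request to the GOSemSim maintainer remains the cleaner route.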
