topGO::annFUN.org() has the following lines:
## function annFUN.org() to work with the "org.XX.eg" annotations
annFUN.org <- function(whichOnto, feasibleGenes = NULL, mapping, ID = "entrez") {
  # [some lines have been omitted]
  geneID <- keyName[tolower(ID)]
  .sql <- paste("SELECT DISTINCT ", geneID, ", go_id FROM ", tableName[tolower(ID)],
                " INNER JOIN ", paste("go", tolower(whichOnto), sep = "_"),
                " USING(_id)", sep = "")
  retVal <- dbGetQuery(get(paste(mapping, "dbconn", sep = "_"))(), .sql)
  ## restric to the set of feasibleGenes
  if(!is.null(feasibleGenes))
    retVal <- retVal[retVal[[geneID]] %in% feasibleGenes, ]
  ## split the table into a named list of GOs
  return(split(retVal[[geneID]], retVal[["go_id"]]))
}
I created a custom OrgDb (for a non-model organism) using AnnotationForge, which I wanted to use with topGO.
The custom OrgDb didn't have tables like go_bp, so topGO wouldn't work at first (which I posted about in https://support.bioconductor.org/p/118713/), but I was able to create that table and get around that problem.
Using my custom OrgDb, topGO failed on the last line of annFUN.org() with the call to split(), because there is no data in retVal[[geneID]].
I debugged this and found out that custom OrgDb files created using AnnotationForge seem to have uppercase field names like SYMBOL and ENSEMBL in the SQLite tables. In contrast, the standard OrgDb packages like org.Hs.eg.db have uppercase columns(), but lowercase field names in the SQLite tables.
The weird thing is that SQL is case-insensitive about field names, so the SQL query above returns data whether the field name stored in geneID is uppercase or lowercase. But R's [[ lookup is case-sensitive, so retVal[[geneID]] returns NULL when the SQLite table field names are uppercase: the line geneID <- keyName[tolower(ID)] gives a lowercase name, while the columns in retVal keep the uppercase names from the table. That is why my custom OrgDb failed here.
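Here is a minimal sketch of the mismatch that needs no database at all. The data frame stands in for the result of dbGetQuery(), and the gene names and GO IDs are invented for illustration.

```r
# Stand-in for the data frame returned by dbGetQuery(); values are invented.
retVal <- data.frame(
  SYMBOL = c("geneA", "geneB", "geneC"),
  go_id  = c("GO:0008150", "GO:0008150", "GO:0003674"),
  stringsAsFactors = FALSE
)

# Lowercase name, as assumed to come out of keyName[tolower(ID)].
geneID <- "symbol"

# `[[` matches column names exactly, so the lowercase lookup finds nothing.
genes <- retVal[[geneID]]
print(is.null(genes))

# split() then fails because its first argument is NULL rather than a vector.
result <- tryCatch(
  split(genes, retVal[["go_id"]]),
  error = function(e) e
)
print(inherits(result, "error"))
```

The error only shows up at split(), which is why it took a while to trace back to the field names.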
My question is: would it be possible for topGO::annFUN.org to handle uppercase and lowercase field names more gracefully?
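To illustrate what I mean, here is a rough sketch of a case-insensitive column lookup. The helper get_column_ci() is hypothetical (it is not part of topGO), and the data frame is a made-up stand-in for the query result, so this is only one possible approach rather than a proposed patch.

```r
# Hypothetical helper (not part of topGO): fetch a column by name, ignoring case.
get_column_ci <- function(df, name) {
  idx <- match(tolower(name), tolower(names(df)))
  if (is.na(idx)) {
    stop("no column matching '", name, "' among: ",
         paste(names(df), collapse = ", "))
  }
  df[[idx]]
}

# Made-up stand-in for the query result from a custom OrgDb with uppercase fields.
retVal <- data.frame(
  SYMBOL = c("geneA", "geneB", "geneC"),
  go_id  = c("GO:0008150", "GO:0008150", "GO:0003674"),
  stringsAsFactors = FALSE
)

# The lowercase name now resolves to the uppercase column, so split() works.
go2genes <- split(get_column_ci(retVal, "symbol"),
                  get_column_ci(retVal, "go_id"))
print(go2genes)
```

A simpler alternative might be to lowercase the column names of retVal right after the query, assuming nothing else relies on their original case.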
Sorry, I am new to Bioconductor and annotation packages, and it is overwhelming at times, since it seems a lot more difficult to work with a non-model organism. It took me a long time to figure out why split() was throwing an error.