Hi everyone! I'm trying to solve an issue with 'makeOrgPackage' so that I can use the gseGO function of the clusterProfiler package. Any help would be much appreciated.
I need to run a GSEA on GO terms for my RNA-seq expression study in Quercus suber. First, I looked for an available OrgDb on NCBI, and there is one, but sadly it doesn't include any GO annotations. Second, I prepared GO annotation files to build another OrgDb with makeOrgPackage, using these columns: GID, CHROMOSOME, START, END, STRAND, GOALL, and GO, ONTOLOGY, EVIDENCE. However, it seems that the GOALL column, which is what allows you to perform the analysis, cannot be integrated by this tool, as was reported before in: Use of clusterProfiler : Error in testForValidKeytype(x, keytype)
So, do you know any other way to build a new OrgDb, or to extend the existing one with the GO terms I already have? Thanks,
Nuri
library(AnnotationHub)
# Is Quercus suber already in the hub database?
# Load the AnnotationHub metadata
hub <- AnnotationHub()
query(hub, c("suber", "orgdb"))
#AnnotationHub with 1 record
QS2 <- hub[["AH114342"]]
keytypes(QS2)
[1] "ACCNUM" "ALIAS" "ENTREZID" "GENENAME" "GID"
[6] "PMID" "REFSEQ" "SYMBOL"
#no GO annotations
library(AnnotationDbi)
AnnotationDbi::keytypes(QS2)
AnnotationDbi::columns(QS2)
library(AnnotationForge)
a <- read.csv(file = "gene_info.tsv", sep = "\t")
b <- read.csv(file = "go.tsv", sep = "\t")
c <- read.csv(file = "goall.tsv", sep = "\t")
makeOrgPackage(
  gene_info = a,
  go = b,
  goall = c,
  tax_id = "58331", # Taxonomy ID for Quercus suber
  genus = "Quercus",
  species = "suber",
  version = "0.99.0",
  outputDir = "."
)
# Invalid keytype: GOALL. Please use the keytypes method to see a listing of valid arguments.
sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8
[2] LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8
[8] LC_NAME=C
[9] LC_ADDRESS=C
[10] LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8
[12] LC_IDENTIFICATION=C
time zone: Europe/Madrid
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats
[3] graphics grDevices
[5] utils datasets
[7] methods base
other attached packages:
[1] tidyr_1.3.1
[2] dplyr_1.1.4
[3] biomaRt_2.60.1
[4] org.Qsuber.eg.db_0.99.0
[5] AnnotationForge_1.46.0
[6] ggridges_0.5.6
[7] AnnotationDbi_1.66.0
[8] IRanges_2.38.1
[9] S4Vectors_0.42.1
[10] Biobase_2.64.0
[11] clusterProfiler_4.12.3
[12] AnnotationHub_3.12.0
[13] BiocFileCache_2.12.0
[14] dbplyr_2.5.0
[15] BiocGenerics_0.50.0
[16] BiocManager_1.30.23
Ugh. Sent the last one prematurely...
Thanks for your help and time, James. I followed the steps you showed me, but I encountered problems with fully downloading the files due to URL access issues. So, I've been trying to download the files first and then run the function, avoiding rebuildCache. However, I'm still not successful because, even though the download was complete, I'm getting an error with one of the tables. Do you have any idea what might be causing this? Also, do you know if there is a way to use gseGO with the gene2go.gz file?
Many thanks!
There are two steps involved in this process. The first is downloading all the files and putting all the data into an omnibus SQLite DB called 'NCBI.sqlite'. The second step involves parsing the data from that DB and putting it into a smaller organism-specific DB that then goes in the package.
If you already have a (good, complete) NCBI.sqlite DB, then you can say rebuildCache = FALSE, which means 'skip the first step and just parse the data from my NCBI.sqlite DB'. But if the NCBI.sqlite DB isn't good or complete (yours isn't complete: the error says you are missing the gene2accession table) you will get an error. In that situation you should delete the NCBI.sqlite file, and then re-run with rebuildCache = TRUE (the default), which will re-generate the NCBI.sqlite DB using the files you downloaded.
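To make that concrete, here is a rough sketch of how one might check the cache before re-running. It is only an illustration, not a tested recipe: it assumes the DBI and RSQLite packages are installed and that NCBI.sqlite sits in the directory you pass as NCBIFilesDir. The helper names missing_cache_tables and drop_incomplete_cache are made up for this example, and the maintainer/author strings are placeholders. By default it only checks for gene2accession, because that is the table named in your error; add other table names if you want a stricter check.

```r
# Sketch only: helper names, paths and contact strings are illustrative assumptions.
library(DBI)

# Return the required tables that are absent from an existing SQLite cache file.
missing_cache_tables <- function(db_path, required = "gene2accession") {
  con <- dbConnect(RSQLite::SQLite(), db_path)
  on.exit(dbDisconnect(con), add = TRUE)
  setdiff(required, dbListTables(con))
}

# Delete the cache file if it exists but lacks a required table.
# Returns TRUE if the file was removed, FALSE otherwise.
drop_incomplete_cache <- function(db_path, required = "gene2accession") {
  if (!file.exists(db_path)) return(FALSE)
  missing <- missing_cache_tables(db_path, required)
  if (length(missing) == 0) return(FALSE)
  message("Incomplete cache, missing: ", paste(missing, collapse = ", "))
  file.remove(db_path)
}

# Example call, wrapped in if (FALSE) so it is not run when the helpers are sourced.
if (FALSE) {
  files_dir <- "."  # assumed location of the downloaded files and NCBI.sqlite
  drop_incomplete_cache(file.path(files_dir, "NCBI.sqlite"))
  AnnotationForge::makeOrgPackageFromNCBI(
    version = "0.99.0",
    maintainer = "Your Name <you@example.org>",  # placeholder
    author = "Your Name <you@example.org>",      # placeholder
    outputDir = ".",
    tax_id = "58331",
    genus = "Quercus",
    species = "suber",
    NCBIFilesDir = files_dir,
    rebuildCache = TRUE  # rebuild NCBI.sqlite (step one) before parsing
  )
}
```

If drop_incomplete_cache() returns FALSE and the file exists, the cache already contains every table you listed, and rebuildCache = FALSE should be safe to try.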