Yes, you can build an OrgDb for Tetrahymena thermophila yourself using the function makeOrgPackageFromNCBI() from the package AnnotationForge. You only need the taxonomy ID of T. thermophila, which appears to be 312017. Please note that this OrgDb contains only the annotation information available at NCBI (for Tetrahymena thermophila SB210).
 
# download files from NCBI and create an annotation database named "org.Tthermophila.eg.db"
# you will need ~25GB of disk space for this.
# the function ran for ~7 hrs on my computer; leave the R session untouched during that time.
# you can ignore the warning about removing the file './org.Tthermophila.eg.sqlite'
> # set working dir to a location with sufficient HDD space
> setwd("D:\\my\\favorite\\directory")
> # load required libraries.
> library(AnnotationForge)
> library(AnnotationDbi)
> library(GenomeInfoDb)
> # the step below takes a long time!
> makeOrgPackageFromNCBI(version = "0.0.1", author = "First Last Name <email@address.com>", maintainer = "First Last Name <email@address.com>", outputDir = ".", tax_id = "312017", genus = "Tetrahymena", species = "thermophila")
> # Next, install the generated org.Tthermophila.eg.db for use in R.
> install.packages(pkgs="./org.Tthermophila.eg.db", repos=NULL, type="source")
* installing *source* package 'org.Tthermophila.eg.db' ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (org.Tthermophila.eg.db)
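Since the build step above takes many hours, it can be worth checking whether the package directory already exists before calling makeOrgPackageFromNCBI() again. The following is only a minimal sketch in base R; the helper name needs_build() is hypothetical and not part of AnnotationForge, and it assumes the package was written to the current working directory as in the call above.

```r
# Hypothetical helper (not part of AnnotationForge): returns TRUE when the
# generated package directory cannot be found in output_dir.
needs_build <- function(pkg_name, output_dir = ".") {
  !dir.exists(file.path(output_dir, pkg_name))
}

if (needs_build("org.Tthermophila.eg.db")) {
  message("Package directory not found; makeOrgPackageFromNCBI() still has to be run.")
} else {
  message("Package directory found; the long build step can be skipped.")
}
```

If the directory is found, you can go straight to the install.packages() call shown above.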
#check
> library(org.Tthermophila.eg.db)
> org.Tthermophila.eg.db
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Tetrahymena thermophila
| SPECIES: Tetrahymena thermophila
| CENTRALID: GID
| Taxonomy ID: 312017
| Db type: OrgDb
| Supporting package: AnnotationDbi
> columns(org.Tthermophila.eg.db)
 [1] "ACCNUM"      "ALIAS"       "ENTREZID"    "EVIDENCE"    "EVIDENCEALL"
 [6] "GENENAME"    "GID"         "GO"          "GOALL"       "ONTOLOGY"   
[11] "ONTOLOGYALL" "PMID"        "REFSEQ"      "SYMBOL"     
> keytypes(org.Tthermophila.eg.db)
 [1] "ACCNUM"      "ALIAS"       "ENTREZID"    "EVIDENCE"    "EVIDENCEALL"
 [6] "GENENAME"    "GID"         "GO"          "GOALL"       "ONTOLOGY"   
[11] "ONTOLOGYALL" "PMID"        "REFSEQ"      "SYMBOL"     
> head(keys(org.Tthermophila.eg.db))
[1] "7822955" "7822974" "7823109" "7823219" "7823307" "7823613"
> # Check: the number of keys (genes) indeed matches the number of genes listed at NCBI (26997)
> length(keys(org.Tthermophila.eg.db))
[1] 26997
> mykeys <- keys(org.Tthermophila.eg.db)[1:25]
> anno.result <- select(org.Tthermophila.eg.db, keys=mykeys, columns=c("ENTREZID","SYMBOL","GENENAME","ALIAS","GO"),keytype="ENTREZID")
'select()' returned 1:many mapping between keys and columns
> head(anno.result)
  ENTREZID          SYMBOL                   GENENAME           ALIAS         GO
1  7822955 TTHERM_00136120   60S ribosomal protein L6 TTHERM_00136120 GO:0022625
2  7822955 TTHERM_00136120   60S ribosomal protein L6 TTHERM_00136120 GO:0003735
3  7822955 TTHERM_00136120   60S ribosomal protein L6 TTHERM_00136120 GO:0002181
4  7822955 TTHERM_00136120   60S ribosomal protein L6 TTHERM_00136120 GO:0000027
5  7822974 TTHERM_00134940 60S ribosomal protein L23a TTHERM_00134940 GO:0022625
6  7822974 TTHERM_00134940 60S ribosomal protein L23a TTHERM_00134940 GO:0019843
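Because select() returns a 1:many mapping here, each gene occupies several rows (one per GO ID). If one row per gene is more convenient, the GO IDs can be collapsed with base R. This is a sketch only: the small data frame anno.example is typed in by hand from the rows printed above (with the GENENAME and ALIAS columns left out for brevity), and it stands in for the real anno.result.

```r
# Rows copied by hand from the head(anno.result) output above;
# this stands in for the real select() result.
anno.example <- data.frame(
  ENTREZID = c("7822955", "7822955", "7822955", "7822955", "7822974", "7822974"),
  SYMBOL   = c(rep("TTHERM_00136120", 4), rep("TTHERM_00134940", 2)),
  GO       = c("GO:0022625", "GO:0003735", "GO:0002181", "GO:0000027",
               "GO:0022625", "GO:0019843"),
  stringsAsFactors = FALSE
)

# One row per gene, with all of its GO IDs joined into a single string.
go.per.gene <- aggregate(GO ~ ENTREZID + SYMBOL, data = anno.example,
                         FUN = paste, collapse = ";")
print(go.per.gene)
```

With the real OrgDb loaded, mapIds() from AnnotationDbi with its multiVals argument may be an alternative way to control how multiple matches per key are returned.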
 
> sessionInfo()
R version 3.4.0 Patched (2017-05-10 r72670)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils    
[7] datasets  methods   base     
other attached packages:
[1] org.Tthermophila.eg.db_0.0.1 GenomeInfoDb_1.12.0         
[3] AnnotationForge_1.18.0       AnnotationDbi_1.38.0        
[5] IRanges_2.10.2               S4Vectors_0.14.2            
[7] Biobase_2.36.2               AnnotationHub_2.8.1         
[9] BiocGenerics_0.22.0         
                    
Thank you so much for your response!
I also needed to load library(biomaRt); other than that, everything works exactly as you said.
However, when I run
makeOrgPackageFromNCBI(version="0.0.1", author = "First Last Name <email@address.com>", maintainer = "First Last Name <email@address.com>", ".", tax_id = "312017", genus = "Tetrahymena", species= "thermophila")
I get the following output and error:
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
rebuilding the cache
Error in result_create(conn@ptr, statement) :
no such table: main.gene2pubmed
Could you give some advice?
Many thanks!!
Well, the above line of code (still) works for me on the current version of R/BioC (although the run time was longer than before, now ~20 hrs). If needed, I can send you the OrgDb package I made.
> # set working dir to a location with sufficient HDD space
> # needed >35GB
> setwd("D:\\my\\favorite\\directory")
>
> # load required libraries.
> library(AnnotationForge)
> library(AnnotationDbi)
> library(GenomeInfoDb)
> # create OrgDb package.
> # Note: takes very long time (~20hrs)
> makeOrgPackageFromNCBI(version="0.0.1", author = "First Last Name <email@address.com>", maintainer = "First Last Name <email@address.com>", ".", tax_id = "312017", genus = "Tetrahymena", species= "thermophila")
If files are not cached locally this may take awhile to assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
rebuilding the cache
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
rebuilding the cache
extracting data for our organism from : gene_info
getting data for gene2go.gz
rebuilding the cache
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
Please be patient while we work out which organisms can be annotated with ensembl IDs.
making the OrgDb package ...
Populating genes table:
genes table filled
Populating pubmed table:
pubmed table filled
Populating gene_info table:
gene_info table filled
Populating entrez_genes table:
entrez_genes table filled
Populating alias table:
alias table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating go table:
go table filled
table metadata filled
'select()' returned many:1 mapping between keys and columns
Dropping GO IDs that are too new for the current GO.db
Populating go table:
go table filled
'select()' returned many:1 mapping between keys and columns
Populating go_all table:
go_all table filled
Creating package in ./org.Tthermophila.eg.db
Now deleting temporary database file
complete!
[1] "org.Tthermophila.eg.sqlite"
There were 50 or more warnings (use warnings() to see the first 50)
>
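Regarding the "no such table: main.gene2pubmed" error reported earlier in this thread: one possible cause, which is not confirmed here, is an incomplete cache left behind by an interrupted or failed earlier run. A minimal sketch for clearing such leftovers before retrying is given below. The helper remove_stale_cache() is hypothetical, and the file names are assumptions: "NCBI.sqlite" is assumed to be the default name of the cache database, and "org.Tthermophila.eg.sqlite" is the temporary file named in the log above.

```r
# Hypothetical helper: delete leftover cache/database files from an earlier,
# possibly interrupted run so that the next build starts from a clean state.
# The default file names are assumptions (see the text above).
remove_stale_cache <- function(dir = ".",
                               files = c("NCBI.sqlite",
                                         "org.Tthermophila.eg.sqlite")) {
  paths <- file.path(dir, files)
  stale <- paths[file.exists(paths)]   # only files that are actually present
  removed <- file.remove(stale)        # logical vector, one entry per file
  invisible(stale[removed])            # paths that were successfully deleted
}
```

Calling remove_stale_cache() from the working directory used for the build, and then re-running makeOrgPackageFromNCBI(), forces the files to be downloaded and the cache to be rebuilt from scratch, so expect the full run time again.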