Question

OrgDb for Tetrahymena thermophila

Asked by lin.ying.zhang

I am trying to use clusterProfiler, but there is no OrgDb object available for Tetrahymena thermophila. The only thing I found through AnnotationHub is an Inparanoid8Db object. Can I build an OrgDb for Tetrahymena thermophila myself? How do I do that?

Tags: annotation, clustering

Answer (2017-05-29)

Yes, you can build an OrgDb for Tetrahymena thermophila yourself with the function makeOrgPackageFromNCBI() from the AnnotationForge package. You only need the NCBI taxonomy ID of T. thermophila, which is 312017. Please note that this OrgDb contains only the annotation information available at NCBI (for Tetrahymena thermophila SB210).

# download files from NCBI and create an annotation package named "org.Tthermophila.eg.db"
# you will need ~25 GB of disk space for this.
# the function took ~7 h to run on my computer; leave the R session untouched meanwhile.
# you can ignore the warning about removing the file './org.Tthermophila.eg.sqlite'

> # set working dir to a location with sufficient HDD space
> setwd("D:\\my\\favorite\\directory")

> # load required libraries.
> library(AnnotationForge)
> library(AnnotationDbi)
> library(GenomeInfoDb)

> # the step below takes a long time!
> makeOrgPackageFromNCBI(version = "0.0.1",
+                        author = "First Last Name <email@address.com>",
+                        maintainer = "First Last Name <email@address.com>",
+                        outputDir = ".",
+                        tax_id = "312017",
+                        genus = "Tetrahymena",
+                        species = "thermophila")

# Next, install the generated org.Tthermophila.eg.db package so it can be loaded in R.
> install.packages(pkgs="./org.Tthermophila.eg.db", repos=NULL, type="source")
* installing *source* package 'org.Tthermophila.eg.db' ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (org.Tthermophila.eg.db)

# check the installed package
> library(org.Tthermophila.eg.db)
> org.Tthermophila.eg.db
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Tetrahymena thermophila
| SPECIES: Tetrahymena thermophila
| CENTRALID: GID
| Taxonomy ID: 312017
| Db type: OrgDb
| Supporting package: AnnotationDbi

> columns(org.Tthermophila.eg.db)
 [1] "ACCNUM"      "ALIAS"       "ENTREZID"    "EVIDENCE"    "EVIDENCEALL"
 [6] "GENENAME"    "GID"         "GO"          "GOALL"       "ONTOLOGY"   
[11] "ONTOLOGYALL" "PMID"        "REFSEQ"      "SYMBOL"     
> keytypes(org.Tthermophila.eg.db)
 [1] "ACCNUM"      "ALIAS"       "ENTREZID"    "EVIDENCE"    "EVIDENCEALL"
 [6] "GENENAME"    "GID"         "GO"          "GOALL"       "ONTOLOGY"   
[11] "ONTOLOGYALL" "PMID"        "REFSEQ"      "SYMBOL"     
> head(keys(org.Tthermophila.eg.db))
[1] "7822955" "7822974" "7823109" "7823219" "7823307" "7823613"

> # check: the number of keys (genes) matches the number of genes listed at NCBI (26997)
> length(keys(org.Tthermophila.eg.db))
[1] 26997

> mykeys <- keys(org.Tthermophila.eg.db)[1:25]
> anno.result <- select(org.Tthermophila.eg.db, keys=mykeys, columns=c("ENTREZID","SYMBOL","GENENAME","ALIAS","GO"),keytype="ENTREZID")
'select()' returned 1:many mapping between keys and columns
> head(anno.result)
  ENTREZID          SYMBOL                   GENENAME           ALIAS         GO
1  7822955 TTHERM_00136120   60S ribosomal protein L6 TTHERM_00136120 GO:0022625
2  7822955 TTHERM_00136120   60S ribosomal protein L6 TTHERM_00136120 GO:0003735
3  7822955 TTHERM_00136120   60S ribosomal protein L6 TTHERM_00136120 GO:0002181
4  7822955 TTHERM_00136120   60S ribosomal protein L6 TTHERM_00136120 GO:0000027
5  7822974 TTHERM_00134940 60S ribosomal protein L23a TTHERM_00134940 GO:0022625
6  7822974 TTHERM_00134940 60S ribosomal protein L23a TTHERM_00134940 GO:0019843
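The message "'select()' returned 1:many mapping" means that one gene can come back as several rows, one per GO term. If one row per gene is more convenient, the rows can be collapsed in base R. Below is a minimal sketch that uses the six rows shown above as toy input; the names toy.anno and collapse_go are illustrative only and are not part of AnnotationDbi.

```r
# Toy input: ENTREZID and GO columns of the first six rows of anno.result above.
# In practice, use the full data frame returned by select().
toy.anno <- data.frame(
  ENTREZID = c("7822955", "7822955", "7822955", "7822955", "7822974", "7822974"),
  GO       = c("GO:0022625", "GO:0003735", "GO:0002181", "GO:0000027",
               "GO:0022625", "GO:0019843"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse the 1:many gene-to-GO mapping into one row
# per gene, joining that gene's unique, non-missing GO IDs with ";".
collapse_go <- function(df) {
  go.by.gene <- tapply(df$GO, df$ENTREZID,
                       function(x) paste(unique(x[!is.na(x)]), collapse = ";"))
  data.frame(ENTREZID = names(go.by.gene),
             GO = as.character(go.by.gene),
             stringsAsFactors = FALSE)
}

go.per.gene <- collapse_go(toy.anno)
print(go.per.gene)
```

As an alternative that stays inside AnnotationDbi, mapIds() with multiVals = "list" returns one list element per key directly from the OrgDb.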

> sessionInfo()
R version 3.4.0 Patched (2017-05-10 r72670)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1


attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
[1] org.Tthermophila.eg.db_0.0.1 GenomeInfoDb_1.12.0         
[3] AnnotationForge_1.18.0       AnnotationDbi_1.38.0        
[5] IRanges_2.10.2               S4Vectors_0.14.2            
[7] Biobase_2.36.2               AnnotationHub_2.8.1         
[9] BiocGenerics_0.22.0
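Because the original goal was clusterProfiler, one option besides enrichGO() is the generic enricher() function, which takes a two-column TERM2GENE table (term first, gene second). The sketch below builds such a table in base R from select()-style output. The toy rows are copied from anno.result above, and make_term2gene is a hypothetical helper name.

```r
# Toy input: ENTREZID and GO columns of the first six rows of anno.result above.
# In practice, call select() on all keys of org.Tthermophila.eg.db.
toy.anno <- data.frame(
  ENTREZID = c("7822955", "7822955", "7822955", "7822955", "7822974", "7822974"),
  GO       = c("GO:0022625", "GO:0003735", "GO:0002181", "GO:0000027",
               "GO:0022625", "GO:0019843"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep annotated rows, put the term column first,
# and drop duplicated term/gene pairs.
make_term2gene <- function(df) {
  t2g <- df[!is.na(df$GO), c("GO", "ENTREZID")]
  t2g <- unique(t2g)
  rownames(t2g) <- NULL
  t2g
}

term2gene <- make_term2gene(toy.anno)
print(term2gene)

# A term shared by several genes appears once per gene.
print(table(term2gene$GO))
```

With clusterProfiler installed, the table could then be passed as enricher(gene = my.genes, TERM2GENE = term2gene), where my.genes is a hypothetical vector of Entrez IDs of interest. Taking the GOALL column instead of GO would also include ancestor terms. Alternatively, enrichGO() can use the OrgDb directly through its OrgDb and keyType = "ENTREZID" arguments.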