Question: OrgDb for Tetrahymena thermophila
6 months ago, lin.ying.zhang wrote:

I am trying to use clusterProfiler, but there is no OrgDb object available for Tetrahymena thermophila. I only found an Inparanoid8Db object through AnnotationHub. Can I build an OrgDb for Tetrahymena thermophila myself, and if so, how?

5 months ago, Guido Hooiveld (Wageningen University, Wageningen, the Netherlands) wrote:

Yes, you can build an OrgDb for Tetrahymena thermophila yourself using the function makeOrgPackageFromNCBI() from the package AnnotationForge. You only need the taxonomy ID of T. thermophila, which apparently is 312017. Please note that this OrgDb contains only the annotation information available at the NCBI (for Tetrahymena thermophila SB210).

 

# download files from NCBI and create an annotation database named "org.Tthermophila.eg.db"
# you will need ~25 GB of disk space for this.
# the function took ~7 hr to run on my computer; leave the R session untouched during that time.
# you can ignore the warning about removing the file './org.Tthermophila.eg.sqlite'

> # set working dir to a location with sufficient HDD space
> setwd("D:\\my\\favorite\\directory")

> # load required libraries.
> library(AnnotationForge)
> library(AnnotationDbi)
> library(GenomeInfoDb)

> # the step below takes a long time!
> makeOrgPackageFromNCBI(version = "0.0.1", author = "First Last Name <email@address.com>", maintainer = "First Last Name <email@address.com>", outputDir = ".", tax_id = "312017", genus = "Tetrahymena", species = "thermophila")

# Next, install the generated org.Tthermophila.eg.db for use in R.
> install.packages(pkgs="./org.Tthermophila.eg.db", repos=NULL, type="source")
* installing *source* package 'org.Tthermophila.eg.db' ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (org.Tthermophila.eg.db)

#check
> library(org.Tthermophila.eg.db)
> org.Tthermophila.eg.db
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Tetrahymena thermophila
| SPECIES: Tetrahymena thermophila
| CENTRALID: GID
| Taxonomy ID: 312017
| Db type: OrgDb
| Supporting package: AnnotationDbi

> columns(org.Tthermophila.eg.db)
 [1] "ACCNUM"      "ALIAS"       "ENTREZID"    "EVIDENCE"    "EVIDENCEALL"
 [6] "GENENAME"    "GID"         "GO"          "GOALL"       "ONTOLOGY"   
[11] "ONTOLOGYALL" "PMID"        "REFSEQ"      "SYMBOL"     
> keytypes(org.Tthermophila.eg.db)
 [1] "ACCNUM"      "ALIAS"       "ENTREZID"    "EVIDENCE"    "EVIDENCEALL"
 [6] "GENENAME"    "GID"         "GO"          "GOALL"       "ONTOLOGY"   
[11] "ONTOLOGYALL" "PMID"        "REFSEQ"      "SYMBOL"     
> head(keys(org.Tthermophila.eg.db))
[1] "7822955" "7822974" "7823109" "7823219" "7823307" "7823613"

> #Check: number of keys (genes) indeed corresponds to # genes listed @ NCBI [=26997]
> length(keys(org.Tthermophila.eg.db))
[1] 26997
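
A gene count that matches NCBI does not yet say how many of those genes actually carry GO terms, which is what matters for an enrichment analysis. Below is a minimal sketch of such a check. The helper `annotated_fraction()` is hypothetical (it is not part of any package), and it assumes that `select()` returns `NA` in the GO column for genes without any GO annotation, which is how I understand AnnotationDbi to behave.

```r
# Hypothetical helper: fraction of distinct IDs that have at least one
# non-NA value in `value_col` of a select()-style data frame.
annotated_fraction <- function(anno, id_col = "ENTREZID", value_col = "GO") {
  ids <- unique(anno[[id_col]])
  annotated <- unique(anno[[id_col]][!is.na(anno[[value_col]])])
  length(annotated) / length(ids)
}

# Usage once org.Tthermophila.eg.db is installed (not run here):
# library(org.Tthermophila.eg.db)
# go_map <- select(org.Tthermophila.eg.db,
#                  keys = keys(org.Tthermophila.eg.db),
#                  columns = "GO", keytype = "ENTREZID")
# annotated_fraction(go_map)
```

A low fraction would suggest that enrichment results only reflect a small, annotated subset of the genome.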

> mykeys <- keys(org.Tthermophila.eg.db)[1:25]
> anno.result <- select(org.Tthermophila.eg.db, keys=mykeys, columns=c("ENTREZID","SYMBOL","GENENAME","ALIAS","GO"),keytype="ENTREZID")
'select()' returned 1:many mapping between keys and columns
> head(anno.result)
  ENTREZID          SYMBOL                   GENENAME           ALIAS         GO
1  7822955 TTHERM_00136120   60S ribosomal protein L6 TTHERM_00136120 GO:0022625
2  7822955 TTHERM_00136120   60S ribosomal protein L6 TTHERM_00136120 GO:0003735
3  7822955 TTHERM_00136120   60S ribosomal protein L6 TTHERM_00136120 GO:0002181
4  7822955 TTHERM_00136120   60S ribosomal protein L6 TTHERM_00136120 GO:0000027
5  7822974 TTHERM_00134940 60S ribosomal protein L23a TTHERM_00134940 GO:0022625
6  7822974 TTHERM_00134940 60S ribosomal protein L23a TTHERM_00134940 GO:0019843
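
Since the original question was about clusterProfiler, here is a sketch of how the installed package might be passed to it. It assumes clusterProfiler is installed and that `enrichGO()` accepts this OrgDb with `keyType = "ENTREZID"` (that keytype is listed above, but I have not run this for T. thermophila). `my_genes` is only a placeholder for your own gene list. The helper `go_term2gene()` is hypothetical: it reshapes `select()` output into the two-column term/gene table that `clusterProfiler::enricher()` takes as `TERM2GENE`, in case you prefer that route.

```r
# Hypothetical helper: reshape select() output into a TERM2GENE table
# (first column = term, second column = gene), dropping rows without a
# GO term and removing duplicate term/gene pairs.
go_term2gene <- function(anno, id_col = "ENTREZID", go_col = "GO") {
  keep <- !is.na(anno[[go_col]])
  out <- data.frame(term = anno[[go_col]][keep],
                    gene = anno[[id_col]][keep],
                    stringsAsFactors = FALSE)
  unique(out)
}

# The enrichment call only runs if both packages are available.
if (requireNamespace("clusterProfiler", quietly = TRUE) &&
    requireNamespace("org.Tthermophila.eg.db", quietly = TRUE)) {
  library(org.Tthermophila.eg.db)
  my_genes <- keys(org.Tthermophila.eg.db)[1:25]  # placeholder gene list
  ego <- clusterProfiler::enrichGO(gene    = my_genes,
                                   OrgDb   = org.Tthermophila.eg.db,
                                   keyType = "ENTREZID",
                                   ont     = "BP")
  print(head(as.data.frame(ego)))
}
```

With an arbitrary 25-gene list like this placeholder, the result may well be empty; the point is only to show where the OrgDb object goes.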

 

> sessionInfo()
R version 3.4.0 Patched (2017-05-10 r72670)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1


attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
[1] org.Tthermophila.eg.db_0.0.1 GenomeInfoDb_1.12.0         
[3] AnnotationForge_1.18.0       AnnotationDbi_1.38.0        
[5] IRanges_2.10.2               S4Vectors_0.14.2            
[7] Biobase_2.36.2               AnnotationHub_2.8.1         
[9] BiocGenerics_0.22.0         


Thank you so much for your response!

I also needed to load library(biomaRt); other than that, everything works exactly as you said.

written 5 months ago by lin.ying.zhang
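
Because the build runs for hours, it may be worth confirming up front that every package mentioned in this thread is installed, including biomaRt as reported in the reply above. A minimal sketch; `missing_packages()` is a hypothetical helper, not something AnnotationForge provides:

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages named in this thread as needed for makeOrgPackageFromNCBI().
needed <- c("AnnotationForge", "AnnotationDbi", "GenomeInfoDb", "biomaRt")
not_installed <- missing_packages(needed)
if (length(not_installed) > 0) {
  message("Install before building: ", paste(not_installed, collapse = ", "))
}
```

This only checks that the packages can be loaded; it does not check disk space or network access to NCBI.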