Question: Generating an annotation database for a non-model species to be used by ReportingTools
cagenet34 (Toulouse, France, INRA) wrote, 2.5 years ago:


I tried to annotate my list of differentially expressed genes with ReportingTools.

My problem is that I'm working on a non-model species (Ovis aries, sheep).

I'm stuck because I can't create my own annotation database for sheep, either via AnnotationHub or with the ensembldb package.

Here is my script with AnnotationHub:

query(ah, c("OrgDb", "sheep"))


select(sheep, ensoa,c("SYMBOL", "GENENAME"), "ENSEMBL")
Error in ensDbFromAH(sheep) : 
  Argument 'ah' has to be a (single) AnnotationHub object.

Here is my script with ensembldb:

> fetchTablesFromEnsembl(84, species = "sheep")
Error in fetchTablesFromEnsembl(84, species = "sheep") : 
  Something went wrong! I'm missing some of the txt files the perl script should have generated.
In addition: Warning message:
running command 'perl C:/Users/cagenet/Documents/R/win-library/3.3/ensembldb/perl/ -s sheep -e 84 -U anonymous -H -p 5306 -P ' had status 127 


R version 3.3.1 RC (2016-06-17 r70798)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] AnnotationHub_2.4.2        ensembldb_1.4.6            GenomicFeatures_1.24.2    
 [4] HTSFilter_1.12.0           biomaRt_2.28.0             ReportingTools_2.12.2     
 [7] knitr_1.13                 DESeq2_1.12.3              SummarizedExperiment_1.2.3
[10] GenomicRanges_1.24.1       GenomeInfoDb_1.8.2         AnnotationDbi_1.34.3      
[13] IRanges_2.6.0              S4Vectors_0.10.1           Biobase_2.32.0            
[16] BiocGenerics_0.18.0       

loaded via a namespace (and not attached):
 [1] httr_1.2.0                    edgeR_3.14.0                  splines_3.3.1                
 [4] R.utils_2.3.0                 Formula_1.2-1                 shiny_0.13.2                 
 [7] interactiveDisplayBase_1.10.3 latticeExtra_0.6-28           RBGL_1.48.1                  
[10] BSgenome_1.40.1               Rsamtools_1.24.0              Category_2.38.0              
[13] RSQLite_1.0.0                 lattice_0.20-33               biovizBase_1.20.0            
[16] limma_3.28.6                  chron_2.3-47                  digest_0.6.9                 
[19] RColorBrewer_1.1-2            XVector_0.12.0                colorspace_1.2-6             
[22] ggbio_1.20.1                  R.oo_1.20.0                   httpuv_1.3.3                 
[25] htmltools_0.3.5               Matrix_1.2-6                  plyr_1.8.4                   
[28] OrganismDbi_1.14.1            GSEABase_1.34.0               XML_3.98-1.4                 
[31] genefilter_1.54.2             zlibbioc_1.18.0               xtable_1.8-2                 
[34] GO.db_3.3.0                   scales_0.4.0                  BiocParallel_1.6.2           
[37] annotate_1.50.0               ggplot2_2.1.0                 PFAM.db_3.3.0                
[40] nnet_7.3-12                   mime_0.4                      survival_2.39-4              
[43] magrittr_1.5                  R.methodsS3_1.7.1             GGally_1.1.0                 
[46] hwriter_1.3.2                 foreign_0.8-66                GOstats_2.38.0               
[49] BiocInstaller_1.22.2          graph_1.50.0                  tools_3.3.1                  
[52] data.table_1.9.6              stringr_1.0.0                 munsell_0.4.3                
[55] locfit_1.5-9.1                cluster_2.0.4                 Biostrings_2.40.2            
[58] DESeq_1.24.0                  grid_3.3.1                    RCurl_1.95-4.8               
[61] dichromat_2.0-0               VariantAnnotation_1.18.1      AnnotationForge_1.14.2       
[64] bitops_1.0-6                  gtable_0.2.0                  curl_0.9.7                   
[67] DBI_0.4-1                     reshape_0.8.5                 reshape2_1.4.1               
[70] R6_2.1.2                      GenomicAlignments_1.8.1       gridExtra_2.2.1              
[73] rtracklayer_1.32.0            Hmisc_3.17-4                  stringi_1.1.1                
[76] Rcpp_0.12.5                   geneplotter_1.50.0            rpart_4.1-10                 
[79] acepack_1.3-3.3              



Johannes Rainer wrote, 2.5 years ago:


You are on the right track! To create the EnsDb database for sheep, I would suggest the AnnotationHub approach. The fetchTablesFromEnsembl call requires Perl and the correct Ensembl Perl API to be installed on your system. So, for starters (and if you don't mind missing the NCBI EntrezGene IDs), it's easier to use ensDbFromAH:
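
As a side note, the "had status 127" warning in the question is what R's system() typically reports when the command (here perl) could not be run at all. A quick check from within R, as a minimal sketch using only base R (has_perl is a hypothetical helper name, not part of ensembldb):

```r
## Returns TRUE if a 'perl' executable can be found on the PATH.
has_perl <- function() {
    unname(nzchar(Sys.which("perl")))
}

if (has_perl()) {
    message("perl found at: ", Sys.which("perl"))
} else {
    message("perl not found; fetchTablesFromEnsembl() cannot run its Perl script")
}
```

Finding perl is necessary but not sufficient: the Ensembl Perl API must be installed as well, which this check does not verify.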


library(AnnotationHub)
library(ensembldb)

ah <- AnnotationHub()
## Query all GTF files from Ensembl for sheep
query(ah, c("ensembl", "ovis", "gtf"))

AnnotationHub with 10 records
# snapshotDate(): 2016-06-06
# $dataprovider: Ensembl
# $species: Ovis aries
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
#   sourcetype
# retrieve records with, e.g., 'object[["AH8773"]]'

  AH8773  | Ovis_aries.Oar_v3.1.74.gtf
  AH10704 | Ovis_aries.Oar_v3.1.75.gtf
  AH28626 | Ovis_aries.Oar_v3.1.78.gtf
  AH28694 | Ovis_aries.Oar_v3.1.76.gtf
  AH28763 | Ovis_aries.Oar_v3.1.79.gtf
  AH28832 | Ovis_aries.Oar_v3.1.77.gtf
  AH47086 | Ovis_aries.Oar_v3.1.80.gtf
  AH47983 | Ovis_aries.Oar_v3.1.81.gtf
  AH50328 | Ovis_aries.Oar_v3.1.82.gtf
  AH50397 | Ovis_aries.Oar_v3.1.83.gtf

## So, we're using the Ensembl 83 GTF here:
dbFile <- ensDbFromAH(ah["AH50397"])  ## Note the single [ !

## Now, this is the SQLite file:
dbFile
[1] "./Ovis_aries.Oar_v3.1.83.sqlite"

## To use it:
edb <- EnsDb(dbFile)

## Or alternatively make a package using the makeEnsembldbPackage function.

Now you can use the `EnsDb` object with methods such as `genes`, or with the `select` method. The columns available for `select` are listed by `columns`:

columns(edb)

 [1] "ENTREZID"       "EXONID"         "EXONIDX"        "EXONSEQEND"    
[13] "SEQLENGTH"      "SEQNAME"        "SEQSTRAND"      "TXBIOTYPE"     
[17] "TXCDSSEQEND"    "TXCDSSEQSTART"  "TXID"           "TXNAME"        
[21] "TXSEQEND"       "TXSEQSTART"   

## To get the Gene ID and the gene name (symbol):

head(select(edb, columns=c("GENEID", "GENENAME", "GENEBIOTYPE")))
              GENEID GENENAME    GENEBIOTYPE
1 ENSOARG00000000001     <NA>        Mt_tRNA
2 ENSOARG00000000002     <NA>        Mt_rRNA
3 ENSOARG00000000003     <NA>        Mt_tRNA
4 ENSOARG00000000004     <NA>        Mt_rRNA
5 ENSOARG00000000005     <NA>        Mt_tRNA
6 ENSOARG00000000006      ND1 protein_coding
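
To carry these annotations over to a list of differentially expressed genes, one option is to look up only the IDs of interest and join the result to the results table. The following is a minimal sketch in base R, not a definitive recipe. It assumes a hypothetical results data frame `de` whose row names are Ensembl gene IDs, and the small `anno` data frame stands in for what `select(edb, keys = rownames(de), keytype = "GENEID", columns = c("GENEID", "GENENAME", "GENEBIOTYPE"))` would return:

```r
## Hypothetical DE results (made-up numbers); row names are Ensembl gene IDs.
de <- data.frame(
    log2FoldChange = c(1.8, -2.3, 0.9),
    padj = c(0.001, 0.02, 0.04),
    row.names = c("ENSOARG00000000006", "ENSOARG00000000001",
                  "ENSOARG00000000002")
)

## Stand-in for the data.frame returned by select() on the EnsDb
## (annotation values copied from the output above).
anno <- data.frame(
    GENEID = c("ENSOARG00000000001", "ENSOARG00000000002",
               "ENSOARG00000000006"),
    GENENAME = c(NA, NA, "ND1"),
    GENEBIOTYPE = c("Mt_tRNA", "Mt_rRNA", "protein_coding"),
    stringsAsFactors = FALSE
)

## match() keeps the row order of 'de' and gives NA for IDs without annotation.
idx <- match(rownames(de), anno$GENEID)
de$GENENAME <- anno$GENENAME[idx]
de$GENEBIOTYPE <- anno$GENEBIOTYPE[idx]
de
```

Using match() rather than merge() keeps the original row order and row names of the results table, which is convenient if the table is passed on to a reporting step afterwards.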


Hope that helps.




cagenet34 replied, 2.4 years ago:

OK, thank you. I'm a newbie and your advice helped me ;-)