Question: Generating a database for a non-model species to be used by ReportingTools
cagenet34 (Toulouse, France, INRA) wrote, 17 months ago:

Hello,

I tried to annotate my list of differentially expressed genes with ReportingTools.

My problem is that I'm working on a non-model species (Ovis aries, sheep).

I'm stuck because I could not create my own annotation database for sheep, either via AnnotationHub or with the ensembldb package.

Here is my script with AnnotationHub:

library("AnnotationHub")
ah<-AnnotationHub()
query(ah, c("OrgDb", "sheep"))
sheep<-ah[["AH48021"]]
keytypes(sheep)

ensoa<-head(keys(sheep,"ENSEMBL"))

select(sheep, ensoa,c("SYMBOL", "GENENAME"), "ENSEMBL")
DbFile<-ensDbFromAH(sheep)
Error in ensDbFromAH(sheep) : 
  Argument 'ah' has to be a (single) AnnotationHub object.

And here is my script with ensembldb:

library(ensembldb)
> fetchTablesFromEnsembl(84, species = "sheep")
Error in fetchTablesFromEnsembl(84, species = "sheep") : 
  Something went wrong! I'm missing some of the txt files the perl script should have generated.
In addition: Warning message:
running command 'perl C:/Users/cagenet/Documents/R/win-library/3.3/ensembldb/perl/get_gene_transcript_exon_tables.pl -s sheep -e 84 -U anonymous -H ensembldb.ensembl.org -p 5306 -P ' had status 127 

 

R version 3.3.1 RC (2016-06-17 r70798)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] AnnotationHub_2.4.2        ensembldb_1.4.6            GenomicFeatures_1.24.2    
 [4] HTSFilter_1.12.0           biomaRt_2.28.0             ReportingTools_2.12.2     
 [7] knitr_1.13                 DESeq2_1.12.3              SummarizedExperiment_1.2.3
[10] GenomicRanges_1.24.1       GenomeInfoDb_1.8.2         AnnotationDbi_1.34.3      
[13] IRanges_2.6.0              S4Vectors_0.10.1           Biobase_2.32.0            
[16] BiocGenerics_0.18.0       

loaded via a namespace (and not attached):
 [1] httr_1.2.0                    edgeR_3.14.0                  splines_3.3.1                
 [4] R.utils_2.3.0                 Formula_1.2-1                 shiny_0.13.2                 
 [7] interactiveDisplayBase_1.10.3 latticeExtra_0.6-28           RBGL_1.48.1                  
[10] BSgenome_1.40.1               Rsamtools_1.24.0              Category_2.38.0              
[13] RSQLite_1.0.0                 lattice_0.20-33               biovizBase_1.20.0            
[16] limma_3.28.6                  chron_2.3-47                  digest_0.6.9                 
[19] RColorBrewer_1.1-2            XVector_0.12.0                colorspace_1.2-6             
[22] ggbio_1.20.1                  R.oo_1.20.0                   httpuv_1.3.3                 
[25] htmltools_0.3.5               Matrix_1.2-6                  plyr_1.8.4                   
[28] OrganismDbi_1.14.1            GSEABase_1.34.0               XML_3.98-1.4                 
[31] genefilter_1.54.2             zlibbioc_1.18.0               xtable_1.8-2                 
[34] GO.db_3.3.0                   scales_0.4.0                  BiocParallel_1.6.2           
[37] annotate_1.50.0               ggplot2_2.1.0                 PFAM.db_3.3.0                
[40] nnet_7.3-12                   mime_0.4                      survival_2.39-4              
[43] magrittr_1.5                  R.methodsS3_1.7.1             GGally_1.1.0                 
[46] hwriter_1.3.2                 foreign_0.8-66                GOstats_2.38.0               
[49] BiocInstaller_1.22.2          graph_1.50.0                  tools_3.3.1                  
[52] data.table_1.9.6              stringr_1.0.0                 munsell_0.4.3                
[55] locfit_1.5-9.1                cluster_2.0.4                 Biostrings_2.40.2            
[58] DESeq_1.24.0                  grid_3.3.1                    RCurl_1.95-4.8               
[61] dichromat_2.0-0               VariantAnnotation_1.18.1      AnnotationForge_1.14.2       
[64] bitops_1.0-6                  gtable_0.2.0                  curl_0.9.7                   
[67] DBI_0.4-1                     reshape_0.8.5                 reshape2_1.4.1               
[70] R6_2.1.2                      GenomicAlignments_1.8.1       gridExtra_2.2.1              
[73] rtracklayer_1.32.0            Hmisc_3.17-4                  stringi_1.1.1                
[76] Rcpp_0.12.5                   geneplotter_1.50.0            rpart_4.1-10                 
[79] acepack_1.3-3.3              
 

Johannes Rainer (Italy) wrote, 17 months ago:

Hi,

You are on the right track! To create the EnsDb database for sheep, I suggest using the AnnotationHub approach. The fetchTablesFromEnsembl call requires that Perl and the correct Ensembl Perl API are installed on your system. So, for starters (and if you don't mind missing the NCBI EntrezGene IDs), it's easier to use the ensDbFromAH approach:

library(ensembldb)
library(AnnotationHub)

ah <- AnnotationHub()
## Query all GTF files from Ensembl for sheep
query(ah, c("ensembl", "ovis", "gtf"))

AnnotationHub with 10 records
# snapshotDate(): 2016-06-06
# $dataprovider: Ensembl
# $species: Ovis aries
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
#   sourcetype
# retrieve records with, e.g., 'object[["AH8773"]]'

            title                     
  AH8773  | Ovis_aries.Oar_v3.1.74.gtf
  AH10704 | Ovis_aries.Oar_v3.1.75.gtf
  AH28626 | Ovis_aries.Oar_v3.1.78.gtf
  AH28694 | Ovis_aries.Oar_v3.1.76.gtf
  AH28763 | Ovis_aries.Oar_v3.1.79.gtf
  AH28832 | Ovis_aries.Oar_v3.1.77.gtf
  AH47086 | Ovis_aries.Oar_v3.1.80.gtf
  AH47983 | Ovis_aries.Oar_v3.1.81.gtf
  AH50328 | Ovis_aries.Oar_v3.1.82.gtf
  AH50397 | Ovis_aries.Oar_v3.1.83.gtf

## So, we're using the Ensembl 83 GTF here.
## Note the single [ : it keeps an AnnotationHub object with one record,
## which is what ensDbFromAH requires.
dbFile <- ensDbFromAH(ah["AH50397"])

## Now, this is the SQLite file:
dbFile
[1] "./Ovis_aries.Oar_v3.1.83.sqlite"

## To use it:
edb <- EnsDb(dbFile)

## Or alternatively make a package using the makeEnsembldbPackage function.
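
The single-bracket note above follows ordinary R subsetting semantics. As a rough analogy only (a plain named list stands in for the hub here; it is not an AnnotationHub object, and the variable names are made up for illustration), `[` returns a container of the same kind holding one record, while `[[` extracts the record's content. That is consistent with the error in the question, where the object returned by `ah[["AH48021"]]` was passed to ensDbFromAH.

```r
## Plain named list used as a stand-in for the hub (assumption: this is only
## an analogy). The IDs and titles are taken from the query output above.
hub <- list(
    AH50328 = "Ovis_aries.Oar_v3.1.82.gtf",
    AH50397 = "Ovis_aries.Oar_v3.1.83.gtf"
)

## Single bracket: still a list, now holding exactly one record.
one_record <- hub["AH50397"]

## Double bracket: the content of that record, no longer a list.
resource <- hub[["AH50397"]]

class(one_record)
class(resource)
```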

Now you can use the `EnsDb` object with methods such as `genes`, or with the `select` method:

columns(edb)
 [1] "ENTREZID"       "EXONID"         "EXONIDX"        "EXONSEQEND"    
 [5] "EXONSEQSTART"   "GENEBIOTYPE"    "GENEID"         "GENENAME"      
 [9] "GENESEQEND"     "GENESEQSTART"   "ISCIRCULAR"     "SEQCOORDSYSTEM"
[13] "SEQLENGTH"      "SEQNAME"        "SEQSTRAND"      "TXBIOTYPE"     
[17] "TXCDSSEQEND"    "TXCDSSEQSTART"  "TXID"           "TXNAME"        
[21] "TXSEQEND"       "TXSEQSTART"   

## To get the Gene ID and the gene name (symbol):

head(select(edb, columns=c("GENEID", "GENENAME", "GENEBIOTYPE")))
              GENEID GENENAME    GENEBIOTYPE
1 ENSOARG00000000001     <NA>        Mt_tRNA
2 ENSOARG00000000002     <NA>        Mt_rRNA
3 ENSOARG00000000003     <NA>        Mt_tRNA
4 ENSOARG00000000004     <NA>        Mt_rRNA
5 ENSOARG00000000005     <NA>        Mt_tRNA
6 ENSOARG00000000006      ND1 protein_coding
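
To get these names next to your differentially expressed genes, one option is to join the annotation table to the results table by gene ID before building the report. The following is a minimal base R sketch, not the ReportingTools workflow itself: `annot` is a toy stand-in for the data.frame returned by `select`, and `de`, its `log2FoldChange` values, and the `LABEL` column are hypothetical and invented for illustration.

```r
## Toy stand-in for the select(edb, ...) result; the two rows reuse IDs
## from the output above.
annot <- data.frame(
    GENEID = c("ENSOARG00000000005", "ENSOARG00000000006"),
    GENENAME = c(NA, "ND1"),
    GENEBIOTYPE = c("Mt_tRNA", "protein_coding"),
    stringsAsFactors = FALSE
)

## Hypothetical differential expression table; the numbers are made up.
de <- data.frame(
    GENEID = c("ENSOARG00000000006", "ENSOARG00000000005"),
    log2FoldChange = c(1.5, -0.7),
    stringsAsFactors = FALSE
)

## Left join: keep every DE gene, even if it has no annotation row.
de_annot <- merge(de, annot, by = "GENEID", all.x = TRUE, sort = FALSE)

## Fall back to the Ensembl gene ID when no gene name is available.
de_annot$LABEL <- ifelse(is.na(de_annot$GENENAME),
                         de_annot$GENEID,
                         de_annot$GENENAME)
de_annot
```

Many sheep genes have no name in Ensembl (see the `<NA>` entries above), so a fallback label like this keeps every row identifiable in the report.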

 

Hope that helps.

 

jo


OK, thank you. I'm a newbie and your advice helps me ;-)

(reply written 16 months ago by cagenet34)