GenomicFeatures makeTxDbFromBiomart fails with "unknown species" error
kaur.alasoo ▴ 30
@kauralasoo-12123
Last seen 5.4 years ago
University of Tartu, Tartu, Estonia

I tried to construct a TxDb object from the latest version of Ensembl (v87):

txdb87 = makeTxDbFromBiomart( biomart = "ENSEMBL_MART_ENSEMBL", 
dataset = "hsapiens_gene_ensembl", host="dec2016.archive.ensembl.org")

But I got the following error:

Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... Error in FUN(X[[i]], ...) : 
  1 unknown species: ‘Human genes’ Please use 'available.species' to see viable species names or tax Ids

Session info:

R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12.2 (Sierra)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocInstaller_1.24.0   dplyr_0.5.0            biomaRt_2.30.0         GenomicFeatures_1.26.2 AnnotationDbi_1.36.2  
 [6] Biobase_2.34.0         GenomicRanges_1.26.2   GenomeInfoDb_1.10.3    IRanges_2.8.1          S4Vectors_0.12.1      
[11] BiocGenerics_0.20.0   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.9                magrittr_1.5               XVector_0.14.0             zlibbioc_1.20.0            GenomicAlignments_1.10.0  
 [6] BiocParallel_1.8.1         R6_2.2.0                   tools_3.3.1                SummarizedExperiment_1.4.0 DBI_0.5-1                 
[11] lazyeval_0.2.0             assertthat_0.1             tibble_1.2                 rtracklayer_1.34.1         bitops_1.0-6              
[16] RCurl_1.95-4.8             RSQLite_1.1-2              Biostrings_2.42.1          Rsamtools_1.26.1           XML_3.98-1.5  

 

@james-w-macdonald-5106
Last seen 2 days ago
United States

Works for me:

> txdb87 = makeTxDbFromBiomart( biomart = "ENSEMBL_MART_ENSEMBL",
+ dataset = "hsapiens_gene_ensembl", host="dec2016.archive.ensembl.org")
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK

> txdb87
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: BioMart
# Organism: Homo sapiens
# Taxonomy ID: 9606
# Resource URL: www.ensembl.org:80
# BioMart database: ENSEMBL_MART_ENSEMBL
# BioMart database version: Ensembl Genes 87
# BioMart dataset: hsapiens_gene_ensembl
# BioMart dataset description: hsapiens_gene_ensembl
# BioMart dataset version: GRCh38.p7
# Full dataset: yes
# miRBase build ID: NA
# transcript_nrow: 215929
# exon_nrow: 737982
# cds_nrow: 295719
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2017-02-11 13:08:58 -0800 (Sat, 11 Feb 2017)
# GenomicFeatures version at creation time: 1.26.2
# RSQLite version at creation time: 1.1-2
# DBSCHEMAVERSION: 1.1

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] GenomicFeatures_1.26.2 AnnotationDbi_1.36.1   Biobase_2.34.0        
[4] GenomicRanges_1.26.2   GenomeInfoDb_1.10.2    IRanges_2.8.1         
[7] S4Vectors_0.12.1       BiocGenerics_0.20.0    biomaRt_2.30.0        

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.9                XVector_0.14.0            
 [3] zlibbioc_1.20.0            GenomicAlignments_1.10.0  
 [5] BiocParallel_1.8.1         lattice_0.20-34           
 [7] tools_3.3.1                grid_3.3.1                
 [9] SummarizedExperiment_1.4.0 DBI_0.5-1                 
[11] digest_0.6.12              Matrix_1.2-8              
[13] rtracklayer_1.34.1         bitops_1.0-6              
[15] RCurl_1.95-4.8             memoise_1.0.0             
[17] RSQLite_1.1-2              compiler_3.3.1            
[19] Biostrings_2.42.1          Rsamtools_1.26.1          
[21] XML_3.98-1.5              
>

Maybe try again?

 


Yes, you are right. Seems to be working now.

Thanks!


I'm getting the same error with the current host (transcript below).

Weirdly, it works if I use an archived host, host = "jul2016.archive.ensembl.org".

> CanFam.txdb <- makeTxDbFromBiomart(biomart = "ENSEMBL_MART_ENSEMBL",
+                                    dataset = "cfamiliaris_gene_ensembl",
+                                    host = "ensembl.org")
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... Error in FUN(X[[i]], ...) : 
  1 unknown species: ‘Dog genes’ Please use 'available.species' to see viable species names or tax Ids
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.1 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Rsubread_1.22.3        VennDiagram_1.6.17     futile.logger_1.4.3    GenomicFeatures_1.24.5 AnnotationDbi_1.34.4   Biobase_2.32.0         biomaRt_2.28.0        
 [8] gridExtra_2.2.1        tidyr_0.6.1            knitr_1.15.1           DT_0.2                 RColorBrewer_1.1-2     ggplot2_2.2.1          BiocInstaller_1.22.3  
[15] GenomicRanges_1.24.3   GenomeInfoDb_1.8.7     IRanges_2.6.1          S4Vectors_0.10.3       BiocGenerics_0.18.0   

loaded via a namespace (and not attached):
 [1] SummarizedExperiment_1.2.3 colorspace_1.3-2           htmltools_0.3.5            rtracklayer_1.32.2         yaml_2.1.14                XML_3.98-1.5              
 [7] DBI_0.5-1                  BiocParallel_1.6.6         lambda.r_1.1.9             plyr_1.8.4                 stringr_1.1.0              zlibbioc_1.18.0           
[13] Biostrings_2.40.2          munsell_0.4.3              gtable_0.2.0               htmlwidgets_0.8            evaluate_0.10              memoise_1.0.0             
[19] labeling_0.3               highr_0.6                  Rcpp_0.12.9                scales_0.4.1               backports_1.0.5            jsonlite_1.2              
[25] XVector_0.12.1             Rsamtools_1.24.0           digest_0.6.12              stringi_1.1.2              dplyr_0.5.0                rprojroot_1.2             
[31] tools_3.3.1                bitops_1.0-6               magrittr_1.5               lazyeval_0.2.0             RCurl_1.95-4.8             tibble_1.2                
[37] RSQLite_1.1-2              futile.options_1.0.0       rsconnect_0.7              assertthat_0.1             rmarkdown_1.3              R6_2.2.0                  
[43] GenomicAlignments_1.8.4   
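
The archive-host workaround mentioned above could be sketched roughly as follows. `try_hosts` is a hypothetical helper name used only for illustration; it tries each host in turn and returns the first result that builds without an error. Note that an archive host serves a different Ensembl release than the current one, so the resulting TxDb will not be identical.

```r
# Hypothetical helper (not part of GenomicFeatures): call `build(host)` for
# each host in turn and return the first result that does not raise an error.
try_hosts <- function(hosts, build) {
  for (h in hosts) {
    res <- tryCatch(build(h), error = function(e) {
      message("host ", h, " failed: ", conditionMessage(e))
      NULL
    })
    if (!is.null(res)) return(res)
  }
  stop("all hosts failed")
}

# Only hit the network when run interactively with GenomicFeatures installed.
if (interactive() && requireNamespace("GenomicFeatures", quietly = TRUE)) {
  CanFam.txdb <- try_hosts(
    c("ensembl.org", "jul2016.archive.ensembl.org"),
    function(h) GenomicFeatures::makeTxDbFromBiomart(
      biomart = "ENSEMBL_MART_ENSEMBL",
      dataset = "cfamiliaris_gene_ensembl",
      host = h)
  )
}
```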

Hi,

Maybe the problem is that you're not using the latest released version of Bioconductor (which is 3.4). Some fixes were applied recently to makeTxDbFromBiomart() in BioC 3.4 (and in BioC devel) to work around some issues introduced by some changes on the Ensembl Mart side.

Try to load the BiocInstaller package. You should see something like this:

> library(BiocInstaller)
Bioconductor version 3.3 (BiocInstaller 1.22.3), ?biocLite for help
A newer version of Bioconductor is available for this version of R,
  ?BiocUpgrade for help

I strongly suggest that you upgrade your installation to use BioC 3.4 so you get these fixes.
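
In case it helps, here is a minimal sketch of how one might check the installed Bioconductor release and upgrade it. `needs_upgrade` is a made-up helper name for illustration only; `biocVersion()` and `biocLite("BiocUpgrade")` are the BiocInstaller-era calls (more recent Bioconductor releases use BiocManager instead, so treat this as specific to the versions discussed here).

```r
# Hypothetical helper: TRUE if the installed Bioconductor release is older
# than the one we want (3.4 in this thread).
needs_upgrade <- function(installed, required = "3.4") {
  package_version(installed) < package_version(required)
}

# Only touch the installation when run interactively.
if (interactive() && requireNamespace("BiocInstaller", quietly = TRUE)) {
  current <- as.character(BiocInstaller::biocVersion())
  if (needs_upgrade(current)) {
    # Upgrade route for BiocInstaller-based releases (see ?BiocUpgrade)
    source("https://bioconductor.org/biocLite.R")
    biocLite("BiocUpgrade")
  }
}
```

After upgrading, restart R and check sessionInfo() again to confirm that GenomicFeatures and biomaRt were actually updated.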

Cheers,

H.


This solved the problem, thank you


This error seems to come from GenomicFeatures:::.prepareBiomartMetadata, an internal function that isn't really intended for users to call. Still, we can call it directly to check whether it reproduces the error you see:

> mart <- useMart("ENSEMBL_MART_ENSEMBL", "cfamiliaris_gene_ensembl")
> GenomicFeatures:::.prepareBiomartMetadata(mart, TRUE, "ensembl.org", "80", "9615", "5")
Prepare the 'metadata' data frame ... OK
                          name                    value
1                  Data source                  BioMart
2                     Organism         Canis familiaris
3                  Taxonomy ID                     9615
4                 Resource URL       www.ensembl.org:80
5             BioMart database     ENSEMBL_MART_ENSEMBL
6     BioMart database version         Ensembl Genes 87
7              BioMart dataset cfamiliaris_gene_ensembl
8  BioMart dataset description cfamiliaris_gene_ensembl
9      BioMart dataset version                CanFam3.1
10                Full dataset                      yes
11            miRBase build ID                        5

And as before, I can't reproduce the error. What happens if you try to do this?
