Error with makeTxDbFromBiomart
anavmar1
Spain

Hello everyone! I'm trying to run an analysis with DEXSeq, and for that I first need to connect to BioMart: I use the makeTxDbFromBiomart function to generate the TxDb object, but I always get the same error. I have already solved the firewall problem and I should be able to connect to the internet, so I don't know what is failing. Maybe someone can help me.

Here is the command I ran and the error it returns:

hse = makeTxDbFromBiomart(biomart="ENSEMBL_MART_ENSEMBL", dataset="mmusculus_gene_ensembl", host="https://www.ensembl.org")

Ensembl site unresponsive, trying useast mirror
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... FAILED! (=> skipped)
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... Error in function (type, msg, asError = TRUE) :
  Failed to connect to ftp.ensembl.org port 21: Operation timed out
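The last line shows that the step that failed was an FTP connection (port 21) to ftp.ensembl.org, while the earlier downloads over HTTPS succeeded, so a firewall that lets web traffic through may still block FTP. Below is a minimal base-R sketch for checking which ports are reachable from your machine. `port_reachable()` is a hypothetical helper, and the assumption that ensembldb.ensembl.org (the public Ensembl MySQL server) listens on port 3306 is worth double-checking against the Ensembl documentation.

```r
# Hypothetical helper: TRUE if a TCP connection to host:port can be opened
# within `timeout` seconds, FALSE otherwise (refused, timed out, DNS failure).
port_reachable <- function(host, port, timeout = 5) {
  tryCatch({
    con <- socketConnection(host = host, port = port, open = "r+",
                            blocking = TRUE, timeout = timeout)
    close(con)
    TRUE
  },
  # socketConnection() warns and then errors when it cannot connect
  warning = function(w) FALSE,
  error   = function(e) FALSE)
}

# Example (needs network access):
# port_reachable("ftp.ensembl.org", 21)          # FTP, the port in the error above
# port_reachable("ensembldb.ensembl.org", 3306)  # assumed public Ensembl MySQL port
```

If the first check returns FALSE while HTTPS downloads work, the FTP step will keep failing no matter how often the command is rerun.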

Thanks in advance! Alejandra

@james-w-macdonald-5106
United States

Are you sure you need that TxDb? The easier thing to do would be to use an EnsDb that you can get from the AnnotationHub.

> library(AnnotationHub)

> hub <- AnnotationHub()
snapshotDate(): 2023-04-24
> query(hub, c("mus musculus","ensdb"))
AnnotationHub with 38 records
# snapshotDate(): 2023-04-24
# $dataprovider: Ensembl
# $species: Mus musculus, Mus musculus musculus, Mus musculus domesticus, Mu...
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH53222"]]' 

             title                                        
  AH53222  | Ensembl 87 EnsDb for Mus Musculus            
  AH53726  | Ensembl 88 EnsDb for Mus Musculus            
  AH56691  | Ensembl 89 EnsDb for Mus Musculus            
  AH57770  | Ensembl 90 EnsDb for Mus Musculus            
  AH60788  | Ensembl 91 EnsDb for Mus Musculus            
  ...        ...                                          
  AH109651 | Ensembl 109 EnsDb for Mus musculus           
  AH109652 | Ensembl 109 EnsDb for Mus musculus           
  AH109653 | Ensembl 109 EnsDb for Mus musculus musculus  
  AH109654 | Ensembl 109 EnsDb for Mus musculus domesticus
  AH109655 | Ensembl 109 EnsDb for Mus musculus           
> ensdb <- hub[["AH109655"]]
loading from cache
require("ensembldb")
> ensdb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.3.10
|Creation time: Fri Feb 17 05:48:03 2023
|ensembl_version: 109
|ensembl_host: localhost
|Organism: Mus musculus
|taxonomy_id: 10090
|genome_build: GRCm39
|DBSCHEMAVERSION: 2.2
|common_name: mouse
|species: mus_musculus
| No. of genes: 57010.
| No. of transcripts: 149443.
|Protein data available.
>

You might need to try several versions to make sure you match whatever Ensembl version was used to align your data.
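Picking the record by hand gets tedious when comparing releases, especially since several records share the same title. A minimal sketch, assuming that the query result's `names()` give the AH IDs and its `mcols()` carry `title` and `genome` columns (as the listing above suggests), and that the `genome` column separates the GRCm39 reference from other mouse records with the same title; `ensdb_ids_for_release()` is a hypothetical helper.

```r
# Hypothetical helper: return the AnnotationHub IDs whose title names the
# requested Ensembl release and whose genome matches the requested build.
ensdb_ids_for_release <- function(ids, titles, genomes, release,
                                  genome = "GRCm39") {
  # The trailing space stops release 10 from matching "Ensembl 109 ..."
  pattern <- paste0("^Ensembl ", release, " EnsDb ")
  # %in% treats a missing genome as "no match" instead of returning NA
  ids[grepl(pattern, titles) & genomes %in% genome]
}

# Example (needs network access and AnnotationHub):
# library(AnnotationHub)
# res  <- query(AnnotationHub(), c("mus musculus", "ensdb"))
# meta <- mcols(res)
# id   <- ensdb_ids_for_release(names(res), meta$title, meta$genome, 109)
# ensdb <- res[[id]]
```

If the helper returns more than one ID, inspect `mcols(res)[id, ]` before picking one, since `[[` needs a single ID.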


Thanks James, I will try this.

@herve-pages-1542
Seattle, WA, United States

The Ensembl Mart is notoriously unreliable, but it looks like it's working today:

txdb <- makeTxDbFromBiomart(biomart="ENSEMBL_MART_ENSEMBL", dataset="mmusculus_gene_ensembl") 
# Download and preprocess the 'transcripts' data frame ... OK
# Download and preprocess the 'chrominfo' data frame ... OK
# Download and preprocess the 'splicings' data frame ... OK
# Download and preprocess the 'genes' data frame ... OK
# Prepare the 'metadata' data frame ... OK
# Make the TxDb object ... OK

txdb
# TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: BioMart
# Organism: Mus musculus
# Taxonomy ID: 10090
# Resource URL: www.ensembl.org:443
# BioMart database: ENSEMBL_MART_ENSEMBL
# BioMart database version: Ensembl Genes 110
# BioMart dataset: mmusculus_gene_ensembl
# BioMart dataset description: Mouse genes (GRCm39)
# BioMart dataset version: GRCm39
# Full dataset: yes
# miRBase build ID: NA
# Nb of transcripts: 149547
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2023-08-11 18:57:14 -0700 (Fri, 11 Aug 2023)
# GenomicFeatures version at creation time: 1.52.1
# RSQLite version at creation time: 2.3.1
# DBSCHEMAVERSION: 1.2
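Because Mart availability varies from day to day, one option is to retry a few times before giving up. A minimal sketch, assuming the failure is a transient outage rather than a blocked port (retrying will not help if port 21 is firewalled); `with_retries()` is a hypothetical helper, not part of GenomicFeatures.

```r
# Hypothetical helper: call f() up to `tries` times, waiting `wait`
# seconds between failed attempts; raise an error if every attempt fails.
with_retries <- function(f, tries = 3, wait = 30) {
  stopifnot(tries >= 1)
  for (i in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", i, " of ", tries, " failed: ", conditionMessage(result))
    if (i < tries) Sys.sleep(wait)
  }
  stop("All ", tries, " attempts failed; last error: ", conditionMessage(result))
}

# Example (needs network access and GenomicFeatures):
# txdb <- with_retries(function()
#   GenomicFeatures::makeTxDbFromBiomart(biomart = "ENSEMBL_MART_ENSEMBL",
#                                        dataset = "mmusculus_gene_ensembl"))
```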

Note that an alternative to makeTxDbFromBiomart() is to use makeTxDbFromEnsembl(). The latter will query the Ensembl MySQL server _directly_ (and thus bypass the Mart service completely):

txdb2 <- makeTxDbFromEnsembl("Mus musculus")
# Fetch transcripts and genes from Ensembl ... OK
#   (fetched 149547 transcripts from 56941 genes)
# Fetch exons and CDS from Ensembl ... OK
# Fetch chromosome names and lengths from Ensembl ...OK
# Gather the metadata ... OK
# Make the TxDb object ... OK

txdb2
# TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: Ensembl
# Organism: Mus musculus
# Ensembl release: 110
# Ensembl database: mus_musculus_core_110_39
# MySQL server: ensembldb.ensembl.org
# Full dataset: yes
# Nb of transcripts: 149547
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2023-08-11 18:50:42 -0700 (Fri, 11 Aug 2023)
# GenomicFeatures version at creation time: 1.52.1
# RSQLite version at creation time: 2.3.1
# DBSCHEMAVERSION: 1.2

See ?makeTxDbFromEnsembl for more information, including how to specify a given Ensembl release.
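If you want a script that does not depend on the Mart being up, the two routes can be chained. A minimal sketch, assuming the GenomicFeatures version used in this thread (in newer Bioconductor releases these functions, I believe, live in the txdbmaker package). `txdb_with_fallback()` is a hypothetical helper, and its builder arguments exist only so the logic can be exercised without a network. Note that the fallback also fails if the MySQL port is blocked, and that both routes use the current Ensembl release by default; for a fixed release, call `makeTxDbFromEnsembl("Mus musculus", release = 109)` directly.

```r
# Hypothetical helper: try the BioMart route first and, if it errors
# (Mart outage, blocked FTP, ...), fall back to the Ensembl MySQL server.
# Default builders are evaluated lazily, so GenomicFeatures is only
# needed when the defaults are actually used.
txdb_with_fallback <- function(
    organism   = "Mus musculus",
    dataset    = "mmusculus_gene_ensembl",
    from_mart  = GenomicFeatures::makeTxDbFromBiomart,
    from_mysql = GenomicFeatures::makeTxDbFromEnsembl) {
  tryCatch(
    from_mart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = dataset),
    error = function(e) {
      message("BioMart route failed: ", conditionMessage(e))
      message("Trying makeTxDbFromEnsembl() instead")
      from_mysql(organism)
    }
  )
}

# Example (needs network access and GenomicFeatures):
# txdb <- txdb_with_fallback()
```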

Best,

H.
