biomaRt getSequence through genomic position
@steffenstatberkeleyedu-2907
Last seen 9.7 years ago
Hi Paul,

To retrieve sequences with biomaRt and mysql=TRUE, the package actually connects to two BioMarts: one is Ensembl and the other is the sequence BioMart. However, the user only needs to connect to the Ensembl BioMart; under the hood, getSequence will also connect to the sequence BioMart. It looks like it doesn't disconnect, and this causes the error when you apply this in a loop. I'll try to provide a fix as soon as possible.

Unfortunately it is not possible to retrieve genomic sequences with mysql=FALSE. We need to discuss with the Ensembl developers and ask them if they could make this available through their BioMart web service.

Cheers,
Steffen

> Dear Paul,
>
> and what is the output of sessionInfo()?
>
> bw Wolfgang
>
> Paul Hammer wrote:
>> hi all,
>>
>> i am trying to get sequences via the getSequence function from biomaRt.
>> Specifically, i would like to have the last 5 bases of an exon and the
>> last 5 bases of the following intron. my approach is the following:
>>
>> library(biomaRt)
>> ensembl_rat = useMart("ensembl", dataset="rnorvegicus_gene_ensembl")
>> filter_rat = listFilters(ensembl_rat)
>> rat_exonsLocs = getBM(attributes=c("ensembl_exon_id", "exon_chrom_start", "exon_chrom_end"),
>>     filter=filter_rat[c(14,45,12),1],
>>     values=list(chromosome="1", status="KNOWN", biotype="protein_coding"),
>>     mart=ensembl_rat)
>> laenge = dim(rat_exonsLocs)[1]
>>
>> ensembl_rat2 = useMart("ensembl", dataset="rnorvegicus_gene_ensembl", mysql=TRUE)
>> for(i in 1:laenge){
>>     gseqs_exon = getSequence(chromosome = 1, start=rat_exonsLocs[i,3]-5,
>>         end = rat_exonsLocs[i,3], mart = ensembl_rat2)
>>     seqs_introns = getSequence(chromosome = 1, start=rat_exonsLocs[i+1,2]-5,
>>         end=rat_exonsLocs[i+1,2], mart = ensembl_rat2)
>> }
>>
>> but i always get this error message: "Error in mysqlNewConnection(drv, ...) :
>> RS-DBI driver: (cannot allocate a new connection -- maximum of 16
>> connections already opened)"
>>
>> Is there a way to use useMart without mysql=TRUE to get sequences only
>> via genomic position? when i connect without mysql=TRUE
>> (useMart("ensembl", dataset="rnorvegicus_gene_ensembl")) i always have
>> to set seqType and type. when i do this i don't get the 5 bases that i
>> want!
>>
>> any help would be great!
>> thanks in advance
>> paul
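A side note on the coordinates in the quoted loop, separate from the connection problem: assuming Ensembl's 1-based, inclusive coordinates, a window from `end - 5` to `end` spans six bases rather than five, and `rat_exonsLocs[i+1, 2]` runs past the table on the last iteration. The sketch below computes all windows up front in base R. The toy table and its coordinates are made up, and it assumes the rows are consecutive exons of a single forward-strand transcript sorted by start, which the original query does not guarantee.

```r
# Toy exon table standing in for the getBM() result (made-up coordinates).
# Assumption: rows are consecutive exons of one forward-strand transcript,
# sorted by start position.
exons <- data.frame(
  ensembl_exon_id  = c("exonA", "exonB", "exonC"),
  exon_chrom_start = c(1000L, 2000L, 3000L),
  exon_chrom_end   = c(1200L, 2300L, 3100L)
)

n   <- nrow(exons)
idx <- seq_len(n - 1)  # the last exon has no following intron in this table

# Last 5 bases of each exon: end-4 .. end (inclusive coordinates).
exon_tail <- data.frame(
  start = exons$exon_chrom_end[idx] - 4L,
  end   = exons$exon_chrom_end[idx]
)

# Last 5 bases of the following intron: the 5 bases just before the next exon.
intron_tail <- data.frame(
  start = exons$exon_chrom_start[idx + 1] - 5L,
  end   = exons$exon_chrom_start[idx + 1] - 1L
)

print(exon_tail)
print(intron_tail)
```

Having the windows in two data frames also means the sequence lookup can be done in one pass afterwards instead of opening a connection per exon.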
biomaRt biomaRt • 921 views
@sean-davis-490
Last seen 4 months ago
United States
On Tue, Dec 2, 2008 at 11:18 PM, steffen at stat.berkeley.edu wrote:
> Hi Paul,
>
> To retrieve sequences with biomaRt and mysql=TRUE, the package actually
> connects to two BioMarts one is Ensembl and the other is the sequence
> BioMart. However the user only needs to connect to the Ensembl BioMart.
> Under the hood getSequence will also connect to the sequence BioMart. It
> looks like it doesn't disconnect and this causes the error when you apply
> this in a loop. I'll try to provide a fix as soon as possible.
>
> Unfortunately it is not possible to retrieve genomic sequences with mysql=F.
> We need to discuss with the Ensembl developers and ask them if they could
> make this available through their BioMart web service.
>
> Cheers,
> Steffen
>
> [...]
>
>>> Is there a way to use useMart without mysql=TRUE to get sequences only
>>> via genomic position? when i connect without mysql=TRUE
>>> (useMart("ensembl", dataset="rnorvegicus_gene_ensembl") ) i always have
>>> to set seqType and type. when i do this i don't get the 5 bases that i
>>> want!

Just an FYI, genomic sequence is also available via the BSgenome package and its associated data packages. Install that package, load it, and then issue the available.genomes() command. This will list the available genomes. I imagine that rnorvegicus is one of them. Install and load that package, also. Then follow the BSgenome vignette to get the sequences.

Sean
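A rough sketch of what that could look like, under some assumptions that are not from the thread: the data package name `BSgenome.Rnorvegicus.UCSC.rn6` is just one of the rat packages that `available.genomes()` may list, the window coordinates are made up, and the assembly has to match the Ensembl release the exon coordinates came from. UCSC-based packages also name the chromosome "chr1" rather than "1". The code only fetches a sequence if the data package is already installed.

```r
# Assumption: this particular rat data package; pick whichever assembly
# matches the coordinates returned by getBM().
pkg <- "BSgenome.Rnorvegicus.UCSC.rn6"

# Hypothetical 5-base window (1-based, inclusive) on chromosome 1.
win_start <- 1196L
win_end   <- 1200L

if (requireNamespace(pkg, quietly = TRUE)) {
  # Attaching the data package also attaches BSgenome, which provides getSeq().
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
  # BSgenome::available.genomes()  # lists installable genomes (needs network)
  genome <- get(pkg)  # the genome object carries the same name as the package
  tail5  <- getSeq(genome, names = "chr1", start = win_start, end = win_end)
  print(tail5)
} else {
  message("Data package ", pkg, " is not installed; skipping sequence lookup.")
}
```

Since `getSeq()` accepts vectors for `names`, `start` and `end`, the whole set of exon and intron windows can be fetched in a single call with no database connection involved.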