Question: Retrieving Upstream Sequences With biomaRt
12.8 years ago, Peter Robinson wrote:
Hi Conductors,

I have been experimenting with the very nice biomaRt package and noticed in the vignette (section 5) that sequence retrieval appears to be restricted to cDNA (possibly only UTR) or peptide sequences. From an earlier posting on the mailing list, I saw a way to retrieve upstream sequences:

library(biomaRt)
ens<-useMart("ensembl",dataset="hsapiens_gene_ensembl", mysql=TRUE)
entrez <- c("100","330")
gene <- getGene(id=entrez, type="entrezgene", mart=ens)
getSequence(chromosome = gene$chromosome, start = gene$start - 2000, end= gene$end + 1000, mart=ens)

However, when I try the following line:

gene <- getGene(id=entrez, type="entrezgene", mart=ens)

I get the error message

Error in mysqlExecStatement(conn, statement, ...) : RS-DBI driver: (could not run statement: Can't create/write to file '/ensembldb1b-1/data/#sql_21c2_0.MYI' (Errcode: 13))

The examples in the vignette all seem to work. What is wrong here? Where is the file mentioned in the error message supposed to live?

Thanks,
Peter

Answer: Retrieving Upstream Sequences With biomaRt

12.8 years ago, Stephen Henderson wrote:

The first code that you show doesn't work, for many reasons.

>library(biomaRt)
>ens<-useMart("ensembl",dataset="hsapiens_gene_ensembl", mysql=TRUE)
>entrez <- c("100","330")
>gene <- getGene(id=entrez, type="entrezgene", mart=ens)
>getSequence(chromosome = gene$chromosome, start = gene$start - 2000, end= gene$end + 1000, mart=ens)

1. gene$start and gene$end don't exist.
2. seqType is not specified.
3. seqType can't be specified as the type you want.

>However, when I try the following line:
>gene <- getGene(id=entrez, type="entrezgene", mart=ens)

That appears to be a problem with your database installation, because if you do it over the web (i.e. drop mysql=TRUE):

ens<-useMart("ensembl",dataset="hsapiens_gene_ensembl")
entrez <- c("100","330")
gene <- getGene(id=entrez, type="entrezgene", mart=ens)

it works fine.

>The examples in the vignette all seem to work. What is wrong here? Where is
>the file mentioned in the error message supposed to live?

Stephen Henderson
Wolfson Inst. for Biomedical Research
Cruciform Bldg., Gower Street
University College London
United Kingdom, WC1E 6BT
+44 (0)207 679 6827
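Stephen's first point is easy to miss because R does not complain about it: `$` on a data frame returns `NULL` for a column that does not exist (unless a partial name match applies), and arithmetic on `NULL` gives an empty vector, so `gene$start - 2000` silently becomes `numeric(0)`. The sketch below uses a toy data frame (not real `getGene()` output) and a hypothetical helper, `require_columns()`, that is not part of biomaRt, to fail early with a readable message instead.

```r
# Toy stand-in for a result table; it is NOT real getGene() output and
# deliberately lacks coordinate columns.
gene <- data.frame(id = c("100", "330"))

print(gene$start)         # NULL: `$` finds no column called "start"
print(gene$start - 2000)  # numeric(0): arithmetic on NULL gives an empty vector

# Hypothetical helper (not part of biomaRt): stop early, listing what is missing
require_columns <- function(df, cols) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "),
         "; available: ", paste(names(df), collapse = ", "))
  }
  invisible(df)
}

tryCatch(
  require_columns(gene, c("chromosome", "start", "end")),
  error = function(e) message("Caught: ", conditionMessage(e))
)
```

Calling the check right after a query makes a renamed or missing column show up as an explicit error rather than as an empty argument passed on to the next call.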
Peter Robinson replied (12.8 years ago) to Stephen Henderson's message of Tue, Feb 20, 2007 at 12:11:02PM -0000:

Hmm..., I take it that the API of biomaRt has changed quite a bit. I got the code from a previous message to this list (I think), but perhaps it is easier to ask how to do things properly than to ask how to fix the code. Is there a way of retrieving upstream sequences with biomaRt?

Thanks, Peter

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Steffen Durinck wrote (Mon, Feb 26, 2007 at 09:08:50AM -0500):

Hi Peter,

As many of you have recently noticed, the implementation of the getSequence function differs between MySQL mode and the default webservice mode. In MySQL mode you can only retrieve sequences based on chromosomal coordinates, so this should enable you to retrieve upstream sequences if you have the exact positions of the sequences you want.

In webservice mode there are more options; however, upstream sequences are not yet available. One can retrieve 5'utr, 3'utr, protein and cdna sequences based on a set of identifiers, or, if a chromosomal location is given, any annotated sequence of the requested type (e.g. 5'utr) between these positions will be returned.

In webservice mode the getSequence function should actually be able to retrieve more types of sequences, such as exons only or upstream regions, but this requires some more development, which I will try to get working as soon as possible now that there is a clear interest in the getSequence function of biomaRt.

best,
Steffen

--
Steffen Durinck, Ph.D.
Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
Phone: 301-402-8103
Address: Advanced Technology Center, 8717 Grovemont Circle, Gaithersburg, MD 20877
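Coordinate-based retrieval, as Steffen describes for MySQL mode, leaves it to the caller to work out the upstream window, and orientation matters: the `start - 2000` in the quoted call only points upstream for genes on the forward strand, while for a reverse-strand gene the upstream region lies past `end`. The sketch below is a hypothetical helper, `upstream_window()`, not a biomaRt function; it assumes 1-based inclusive coordinates and strand coded as 1 / -1, and uses toy coordinates rather than real genes.

```r
# Sketch: strand-aware upstream windows from gene coordinates.
# Assumptions (not from biomaRt): coordinates are 1-based and inclusive,
# and strand is coded 1 (forward) / -1 (reverse).
upstream_window <- function(chromosome, start, end, strand, upstream = 2000) {
  stopifnot(length(start) == length(end),
            length(start) == length(strand),
            all(strand %in% c(1, -1)))
  plus <- strand == 1
  # Forward strand: the upstream region ends just before `start`.
  # Reverse strand: it begins just after `end`.
  win_start <- ifelse(plus, start - upstream, end + 1)
  win_end   <- ifelse(plus, start - 1, end + upstream)
  data.frame(chromosome = chromosome,
             # clip at position 1; a forward gene starting at 1 yields end < start
             start = pmax(win_start, 1),
             end = win_end,
             strand = strand)
}

# Toy coordinates for illustration only, not real genes
genes <- data.frame(chromosome = c("1", "X"),
                    start = c(10000, 50000),
                    end = c(12000, 53000),
                    strand = c(1, -1))
win <- with(genes, upstream_window(chromosome, start, end, strand, upstream = 2000))
print(win)  # chr 1: 8000-9999 (forward); chr X: 53001-55000 (reverse)
```

Each row supplies chromosome, start and end values for a coordinate-based getSequence call; check ?getSequence for the other arguments your biomaRt version requires. Windows are not clipped at the chromosome's far end, since the helper does not know chromosome lengths.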
Peter Robinson replied to Steffen Durinck's message of Mon, Feb 26, 2007 at 09:08:50AM -0500:

Steffen, thanks for your work on this!

best,
Peter