R: why biomaRt cannot extract 3UTR sequences for 1941 ENSGxxxxx ?
@mauedealiceit-3511
Last seen 9.6 years ago
Those instructions worked fine in many other 3'UTR extractions that requested a smaller number of ENST IDs. I would be surprised if the query vector length were the problem. A few months ago I was advised to download large amounts of data through a single query and then parse the output locally.

I recently updated R and all the installed packages automatically. Should I take it that Bioconductor packages are not upgraded automatically? I haven't checked that. In any case, I will try to upgrade biomaRt on Monday. Should I uninstall it first, or may I proceed as I did the first time, following the instructions posted for installing Bioconductor packages?

Thank you,
Maura

-----Original Message-----
From: Steffen Durinck [mailto:sdurinck@lbl.gov]
Sent: Fri 28/05/2010 23:16
To: michael watson (IAH-C)
Cc: mauede@alice.it; Bioconductor List
Subject: Re: [BioC] why biomaRt cannot extract 3UTR sequences for 1941 ENSGxxxxx ?

Hi Maura,

This also works for me, and duplicate transcript IDs shouldn't cause problems; you'll only get unique results back, though.
What version of biomaRt are you running?
Would you be able to send me your complete transcript ID list as an rda so I can try the complete list?

Cheers,
Steffen

On Fri, May 28, 2010 at 1:54 PM, michael watson (IAH-C) <michael.watson@bbsrc.ac.uk> wrote:

> The following (small) code works for me:
>
> library(biomaRt)
> mart <- useMart("ensembl","hsapiens_gene_ensembl")
> ids <- c("ENST00000262187","ENST00000296271")
> seq <- getSequence(id=ids, type="ensembl_transcript_id", mart=mart,
>                    seqType="3utr")
> seq
> ________________________________________
> From: bioconductor-bounces@stat.math.ethz.ch [bioconductor-bounces@stat.math.ethz.ch] On Behalf Of mauede@alice.it [mauede@alice.it]
> Sent: 28 May 2010 21:41
> To: Bioconductor List
> Subject: [BioC] why biomaRt cannot extract 3UTR sequences for 1941 ENSGxxxxx ?
>
> I executed the following lines several times, both from a script and by pasting them into an R shell.
> biomaRt fails systematically.
> The task is to extract the 3'UTR sequences corresponding to a vector containing 1941 Ensembl transcript IDs (some are duplicated; is this a problem?).
> Please find the failing instructions below, including the ENST vector.
>
> Any suggestion is welcome. Thank you,
> Maura
>
> > hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
> Checking attributes ... ok
> Checking filters ... ok
>
> > genes_map[,"ensembl_transcript_id"]
>    [1] "ENST00000262187" "ENST00000296271" "ENST00000346166" "ENST00000381570"
>
> <snip>
>
> [1937] "ENST00000400907" "ENST00000400908" "ENST00000440864" "ENST00000309042"
> [1941] "ENST00000254325"
>
> > genes_seq <- getSequence(id=genes_map[,"ensembl_transcript_id"],type="ensembl_transcript_id",
> +                          seqType="3utr",mart=hmart)
> Error in value[[3L]](cond) :
>   Request to BioMart web service failed. Verify if you are still connected to the internet. Alternatively the BioMart web service is temporarily down.
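Steffen's two requests (the biomaRt version in use and the ID list as an rda file) could be answered along the lines below. This is only a minimal sketch: `transcript_ids` and the file name `transcript_ids.rda` are placeholders that do not appear in the thread, and the two IDs stand in for the full 1941-element vector.

```r
# Placeholder for the real ID vector, e.g. genes_map[, "ensembl_transcript_id"].
transcript_ids <- c("ENST00000262187", "ENST00000296271")

# Report the installed biomaRt version, if the package is available.
if (requireNamespace("biomaRt", quietly = TRUE)) {
  print(packageVersion("biomaRt"))
}

# Save the vector to an .rda file (written to a temporary directory here;
# the file name is an assumption).
rda_path <- file.path(tempdir(), "transcript_ids.rda")
save(transcript_ids, file = rda_path)

# The recipient can restore the object under its original name with load().
restored <- new.env()
load(rda_path, envir = restored)
round_trip_ok <- identical(restored$transcript_ids, transcript_ids)
```

Loading into a separate environment, as done here, avoids overwriting an object of the same name in the recipient's workspace.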
biomaRt biomaRt • 1.3k views
@michael-watson-iah-c-378
Last seen 9.6 years ago
Hi,

I parsed your entire ID list and used it in the same code I posted earlier in this thread, and it worked. My ID list was a vector. I wonder whether your construct genes_map[,"ensembl_transcript_id"] produces a factor rather than a vector, which might cause problems for biomaRt? I'm just guessing, though.

Bioc packages are not updated automatically, no. The best way to install Bioconductor is through biocLite().

Mick

On 29 May 2010 15:40, mauede@alice.it wrote:

> Recently I have updated R and all the installed packages automatically. Shall I take it that Bioconductor packages are not automatically upgraded ?
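Mick's guess about a factor can be checked directly in base R. The sketch below uses a small made-up `genes_map` data frame (the real object is not shown in the thread) built with `stringsAsFactors = TRUE` so that the ID column is a factor; it then converts the column to a plain character vector and drops duplicates before the IDs would be passed on.

```r
# Hypothetical stand-in for the poster's genes_map; the real object is not shown.
genes_map <- data.frame(
  ensembl_transcript_id = c("ENST00000262187", "ENST00000296271", "ENST00000262187"),
  stringsAsFactors = TRUE
)

ids <- genes_map[, "ensembl_transcript_id"]

# A factor carries "levels" and "class" attributes, so is.factor() is TRUE
# and is.vector() is FALSE for it.
was_factor <- is.factor(ids)

# Coerce to character and remove duplicates, keeping first-appearance order.
clean_ids <- unique(as.character(ids))
```

Whether a factor actually caused the failure is not established in the thread; the conversion is simply a cheap way to rule it out.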
@mauedealiceit-3511
Last seen 9.6 years ago
This morning I updated biomaRt (through biocLite()) as advised, but now I cannot even connect:

> hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')

hangs. Any idea?

Thank you,
Maura
@mauedealiceit-3511
Last seen 9.6 years ago
I reinstalled all Bioconductor packages and reran my R script, which extracts the 3'UTR sequences of validated gene targets.
Back to the "hsa-mir-1" gene targets, I performed the following checks and tests:

> is.list(genes_map)
[1] TRUE
> is.vector(genes_map[,"ensembl_transcript_id"])
[1] TRUE
> length(genes_map[,"ensembl_transcript_id"])
[1] 1941

> genes_seq <- getSequence(id=genes_map[,"ensembl_transcript_id"],type="ensembl_transcript_id",
+                          seqType="3utr",mart=hmart)
Error in value[[3L]](cond) :
  Request to BioMart web service failed. Verify if you are still connected to the internet. Alternatively the BioMart web service is temporarily down.
> genes_seq <- getSequence(id=genes_map[1:100,"ensembl_transcript_id"],type="ensembl_transcript_id",
+                          seqType="3utr",mart=hmart)
> dim(genes_seq)
[1] 100   2
> genes_seq <- getSequence(id=genes_map[1:1000,"ensembl_transcript_id"],type="ensembl_transcript_id",
+                          seqType="3utr",mart=hmart)
Error in value[[3L]](cond) :
  Request to BioMart web service failed. Verify if you are still connected to the internet. Alternatively the BioMart web service is temporarily down.
> genes_seq <- getSequence(id=genes_map[1:500,"ensembl_transcript_id"],type="ensembl_transcript_id",
+                          seqType="3utr",mart=hmart)
> dim(genes_seq)
[1] 500   2
> genes_seq <- getSequence(id=genes_map[1:800,"ensembl_transcript_id"],type="ensembl_transcript_id",
+                          seqType="3utr",mart=hmart)
> dim(genes_seq)
[1] 800   2
> genes_seq <- getSequence(id=genes_map[1:900,"ensembl_transcript_id"],type="ensembl_transcript_id",
+                          seqType="3utr",mart=hmart)
> dim(genes_seq)
[1] 900   2

The results above show that my query succeeds as long as fewer than 1000 3'UTR sequences are requested: the requests for 100, 500, 800 and 900 IDs worked, while those for 1000 and for all 1941 failed. Why? Is this a *magic number*?

Thank you,
Maura
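If large requests keep failing while smaller ones succeed, one possible workaround is to split the IDs into batches and combine the results. The thread does not establish that the service enforces a size limit, so treat this as a sketch under that assumption. The helper name `fetch_in_batches`, the batch size of 500 (chosen only because a 500-ID request worked in the tests above), and the stand-in `fake_fetch` function are all hypothetical.

```r
# Apply `fetch` to successive batches of IDs and row-bind the results.
# `fetch` must take a character vector and return a data frame.
fetch_in_batches <- function(ids, batch_size, fetch) {
  ids <- unique(as.character(ids))           # drop duplicates, force character
  groups <- ceiling(seq_along(ids) / batch_size)
  batches <- unname(split(ids, groups))      # list of character vectors
  do.call(rbind, lapply(batches, fetch))
}

# Offline stand-in for a real query, so the sketch runs without a network.
fake_fetch <- function(batch) {
  data.frame(id = batch, n_char = nchar(batch), stringsAsFactors = FALSE)
}

# Made-up IDs for illustration: 1200 unique values plus 5 duplicates.
example_ids <- sprintf("ENST%011d", 1:1200)
out <- fetch_in_batches(c(example_ids, example_ids[1:5]), 500, fake_fetch)
```

With biomaRt, `fetch` could be `function(batch) getSequence(id = batch, type = "ensembl_transcript_id", seqType = "3utr", mart = hmart)`, reusing the `hmart` object and arguments from the post above.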
On Mon, May 31, 2010 at 11:07 AM, <mauede@alice.it> wrote:

> The above results show that my query is successful as long as the number of 3UTR sequences requested is less than 1000. How come ? Is this a *magic number* ?

I don't see that 1000 is a magic number in your example. Could you explain how you came to that conclusion? With the exception of the first query which failed, your other queries worked. Perhaps if you tried your longer query again, it would work. If not, I would follow the instructions given in the error message each time your query fails: make sure that you are still connected to the internet and that the BioMart web service is still working.

Also, I have to point out that you have been on this list long enough to know that you MUST include the output of sessionInfo() and a reproducible example in order to get the best help. Steffen (the author of the biomaRt package) has also offered to take your list of IDs and check it. Perhaps you should try following up on some of the answers you receive before proceeding. Just a thought...

And to be clear, everyone here is trying hard to get you your answers as quickly as time permits. Help us to help you by trying to do as folks suggest rather than simply following up with more questions.

Sean
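Sean's two suggestions, retrying a failed query and reporting `sessionInfo()`, might look like the sketch below. The `retry` helper is hypothetical (it is not part of biomaRt), and `flaky` is a simulated function that fails twice before succeeding, used here so the example runs without a network connection.

```r
# Call `fun` up to `attempts` times, returning its first successful result.
# If every attempt fails, the last error is re-raised.
retry <- function(fun, attempts = 3, wait_seconds = 0) {
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    message(sprintf("Attempt %d failed: %s", i, conditionMessage(result)))
    if (i < attempts) Sys.sleep(wait_seconds)
  }
  stop(last_error)
}

# Simulated unreliable request: fails on the first two calls.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated web service failure")
  "ok"
}
answer <- retry(flaky, attempts = 3)

# Session details to paste into a help request.
info <- sessionInfo()
print(info)
```

For a real query, `fun` could wrap the failing `getSequence()` call from the earlier posts, with a non-zero `wait_seconds` so the service is not hit again immediately.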