Re: Re: why biomaRt cannot extract 3UTR sequences for 1941 ENSGxxxxx?
@mauedealiceit-3511
-----Original Message-----
From: seandavi@gmail.com on behalf of Sean Davis
Sent: Mon 31/05/2010 23:47
To: mauede@alice.it
Cc: Steffen Durinck; michael watson (IAH-C); Stefano Rovetta; Giuseppe Russo; Bioconductor List
Subject: Re: [BioC] Re: why biomaRt cannot extract 3UTR sequences for 1941 ENSGxxxxx?

On Mon, May 31, 2010 at 11:07 AM, <mauede@alice.it> wrote:
> I reinstalled all Bioconductor packages and ran my R script for
> extracting the 3' UTR sequences of validated gene targets again.
> Back to the "hsa-mir-1" gene targets, I performed the following
> checks and tests:
>
> > is.list(genes_map)
> [1] TRUE
> > is.vector(genes_map[,"ensembl_transcript_id"])
> [1] TRUE
> > length(genes_map[,"ensembl_transcript_id"])
> [1] 1941
>
> > genes_seq <- getSequence(id=genes_map[,"ensembl_transcript_id"],type="ensembl_transcript_id",
> + seqType="3utr",mart=hmart)
> Error in value[[3L]](cond) :
>   Request to BioMart web service failed. Verify if you are still connected
>   to the internet. Alternatively the BioMart web service is temporarily down.
>
> > genes_seq <- getSequence(id=genes_map[1:100,"ensembl_transcript_id"],type="ensembl_transcript_id",
> + seqType="3utr",mart=hmart)
> > dim(genes_seq)
> [1] 100 2
>
> > genes_seq <- getSequence(id=genes_map[1:1000,"ensembl_transcript_id"],type="ensembl_transcript_id",
> + seqType="3utr",mart=hmart)
> Error in value[[3L]](cond) :
>   Request to BioMart web service failed. Verify if you are still connected
>   to the internet. Alternatively the BioMart web service is temporarily down.
>
> > genes_seq <- getSequence(id=genes_map[1:500,"ensembl_transcript_id"],type="ensembl_transcript_id",
> + seqType="3utr",mart=hmart)
> > dim(genes_seq)
> [1] 500 2
>
> > genes_seq <- getSequence(id=genes_map[1:800,"ensembl_transcript_id"],type="ensembl_transcript_id",
> + seqType="3utr",mart=hmart)
> > dim(genes_seq)
> [1] 800 2
>
> > genes_seq <- getSequence(id=genes_map[1:900,"ensembl_transcript_id"],type="ensembl_transcript_id",
> + seqType="3utr",mart=hmart)
> > dim(genes_seq)
> [1] 900 2
>
> The above results show that my query succeeds as long as fewer than 1000
> 3' UTR sequences are requested. How come? Is this a *magic number*?

I don't see that 1000 is a magic number in your example. Could you explain how you came to that conclusion? With the exception of the first query, which failed, your other queries worked. Perhaps if you tried your longer query again, it would work. If not, I would follow the instructions given in each case where your query fails and make sure that you are still connected to the internet and that the BioMart web service is still working.

Also, I have to point out that you have been on this list long enough to know that you MUST include the output of sessionInfo() and a reproducible example in order to get the best help. Also, Steffen (the author of the biomaRt package) has offered to take your list of ids and check it. Perhaps you should try following up on some of the answers you receive before proceeding. Just a thought...

And to be clear, everyone here is trying hard to get you your answers as quickly as time permits. Help us to help you by trying to do as folks suggest rather than simply following up with more questions.

Sean

I ran and posted the results of some tests in which I tried to find the upper bound on the ENST list length.
The results I posted show (at least, that was my intention) that the query fails when I ask for 1000 3' UTR sequences, with the same error message that appears when I ask for all 1941 ENST sequences, but it works fine if I only ask for 100, 500, 800, or 900 ENST sequences.
I did send the full list of 1941 ENST IDs in a previous email. What else can I provide to reproduce the error I am getting?

While it's true that I failed to include the sessionInfo() output, it's also true that the answers I got boil down to "... it works for me". So my further question is: "why am I not so lucky?" As I said, and showed, the same query seems to work for me only up to a limited ENST list length. I'd like to find out whether this upper limit depends on the local network configuration, on my computer configuration, or on something else.

I can work around this obstacle by patching my script so that no single query exceeds the limit. Still, if I cannot figure out the cause of the limit, I wonder whether it may be time dependent (network or traffic load, etc.).

As for the connection to BioMart: after running the long query on its own and getting the error, I could still successfully run commands like listAttributes() and listMarts(), which *I think* would fail if the connection were down.

Thank you,
Maura E

> -----Original Message-----
> From: Steffen Durinck [mailto:sdurinck@lbl.gov]
> Sent: Fri 28/05/2010 23:16
> To: michael watson (IAH-C)
> Cc: mauede@alice.it; Bioconductor List
> Subject: Re: [BioC] why biomaRt cannot extract 3UTR sequences for 1941 ENSGxxxxx?
>
> Hi Maura,
>
> This also works for me, and duplicate transcript ids shouldn't cause
> problems; you'll only get unique results back, though.
> What version of biomaRt are you running?
> Would you be able to send me your complete transcript id list as an rda so I
> can try the complete list?
>
> Cheers,
> Steffen
>
> On Fri, May 28, 2010 at 1:54 PM, michael watson (IAH-C) <
> michael.watson@bbsrc.ac.uk> wrote:
>
> > The following (small) code works for me:
> >
> > library(biomaRt)
> > mart <- useMart("ensembl","hsapiens_gene_ensembl")
> > ids <- c("ENST00000262187","ENST00000296271")
> > seq <- getSequence(id=ids, type="ensembl_transcript_id", mart=mart,
> >                    seqType="3utr")
> > seq
> > ________________________________________
> > From: bioconductor-bounces@stat.math.ethz.ch [bioconductor-bounces@stat.math.ethz.ch] On Behalf Of mauede@alice.it [mauede@alice.it]
> > Sent: 28 May 2010 21:41
> > To: Bioconductor List
> > Subject: [BioC] why biomaRt cannot extract 3UTR sequences for 1941 ENSGxxxxx?
> >
> > I executed the following lines several times, both from a script and by
> > pasting them into an R shell. biomaRt fails every time.
> > The problem is extracting the 3' UTR sequences corresponding to a vector
> > containing 1941 Ensembl transcript IDs (some are duplicated; is this a problem?).
> > Please find the failing commands below, including the ENST vector.
> >
> > Any suggestion is welcome. Thank you,
> > Maura
> >
> > > hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
> > Checking attributes ... ok
> > Checking filters ... ok
> >
> > > genes_map[,"ensembl_transcript_id"]
> > [1] "ENST00000262187" "ENST00000296271" "ENST00000346166" "ENST00000381570"
> >
> > <snip>
> >
> > [1937] "ENST00000400907" "ENST00000400908" "ENST00000440864" "ENST00000309042"
> > [1941] "ENST00000254325"
> >
> > > genes_seq <- getSequence(id=genes_map[,"ensembl_transcript_id"],type="ensembl_transcript_id",
> > + seqType="3utr",mart=hmart)
> > Error in value[[3L]](cond) :
> >   Request to BioMart web service failed. Verify if you are still connected
> >   to the internet. Alternatively the BioMart web service is temporarily down.
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor@stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
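Maura's workaround, patching the script so that no single query exceeds the size that worked, could look roughly like the sketch below. It is a minimal sketch under stated assumptions and does not explain why the long request fails: `fetch_in_batches` is a hypothetical helper (not part of biomaRt), the default batch size of 500 is an arbitrary choice based only on the sizes that succeeded in this thread, and the retry count and pause are likewise assumptions. The query is passed in as a function, so the batching and retry logic can be checked without a network connection.

```r
# Hypothetical helper (not part of biomaRt): split a long ID vector into
# fixed-size batches, run `fetch` on each batch, retry a failed batch a few
# times, and row-bind the per-batch data frames into one result.
# batch_size = 500, max_tries = 3 and wait = 5 are assumed defaults chosen
# for illustration, not documented BioMart limits.
fetch_in_batches <- function(ids, fetch, batch_size = 500,
                             max_tries = 3, wait = 5) {
  # Duplicated IDs only come back once anyway, so drop them up front;
  # as.character() guards against a factor column from a data frame.
  ids <- unique(as.character(ids))
  # Consecutive groups of at most `batch_size` IDs, in original order.
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    for (attempt in seq_len(max_tries)) {
      res <- tryCatch(fetch(batches[[i]]), error = function(e) e)
      if (!inherits(res, "error")) break
      message(sprintf("Batch %d, attempt %d failed: %s",
                      i, attempt, conditionMessage(res)))
      # Pause before the next attempt (skipped after the last one).
      if (attempt < max_tries) Sys.sleep(wait)
    }
    if (inherits(res, "error")) {
      stop(sprintf("Batch %d still failing after %d attempts", i, max_tries))
    }
    results[[i]] <- res
  }
  do.call(rbind, results)
}

# Example use with the objects from this thread (needs biomaRt and a
# working connection, so it is left commented out):
# library(biomaRt)
# hmart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# genes_seq <- fetch_in_batches(
#   genes_map[, "ensembl_transcript_id"],
#   function(x) getSequence(id = x, type = "ensembl_transcript_id",
#                           seqType = "3utr", mart = hmart))
```

Because duplicated IDs are dropped before querying, the 1941 IDs in this thread may yield fewer than 1941 rows, in line with Steffen's note that only unique results come back.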