Question: libraries or commands to help with parsing or handling web based database queries
ALAN SMITH wrote, 12.5 years ago:
Hello Bioconductors,

I am having a very hard time figuring out how to turn web-based database query results into a nice neat table (if such a thing is possible in R). I am constantly searching the metabolite database METLIN by copying and pasting addresses. I often have to search this database with several hundred entries, and would like to automate the process to remove the HUGE amount of time I spend on this carpal-tunnel-inducing routine. I have found several ways to get the page source, such as:

    library(RCurl)
    test <- getURL("http://metlin.scripps.edu/metabo_list.php?mass_min=112.04885&mass_max=112.0555")
    # OR
    url.show("http://metlin.scripps.edu/metabo_list.php?mass_min=112.04885&mass_max=112.0555")

Once I get the URL info I notice that the data I am interested in is between and .

Are there any packages or methods in R to extract the information I am interested in? I am having problems manipulating STRINGS in R, like selecting all of the text between two strings. I am not a programmer.

Thanks,
Alan

Note: I am able to use KEGGSOAP without any trouble.
Answer: libraries or commands to help with parsing or handling web based database queries
Thomas Girke (United States) wrote, 12.5 years ago:
Alan,

To do this you will need some basic knowledge of regular expressions as used by R's grep() and gsub() functions. Other useful functions are paste() and Sys.sleep().

RCurl also provides some useful utilities for this approach.

Below is a short example for a similar problem: obtaining peptide MW information from the Expasy site (http://ca.expasy.org/tools/pi_tool.html).

###################################################################
myentries <- c("MKWVTFISLLFLFSSAYS", "MWVTFISLL", "MFISLLFLFSSAYS")
myresult <- NULL
for(i in myentries) {
    # build the query URL for one peptide
    myurl <- paste("http://ca.expasy.org/cgi-bin/pi_tool?protein=", i, "&resolution=monoisotopic", sep="")
    x <- url(myurl)
    res <- readLines(x)  # page source, one element per line
    close(x)
    # keep the first line that reports the result (NA if none is found)
    mylines <- res[grep('Theoretical pI/Mw:', res)][1]
    # strip everything up to the "/ " so that only the MW remains
    myresult <- c(myresult, as.numeric(gsub('.*/ ', '', mylines)))
    print(myresult)
    Sys.sleep(1) # halts process for one sec to give database a break
}
final <- data.frame(Pep=myentries, MW=myresult)
cat("\n The MW values for my peptides are:\n")
print(final)
###################################################################

Thomas

In reply to ALAN SMITH, Mon 02/19/07 11:41.

--
Thomas Girke, Ph.D.
1008 Noel T. Keen Hall
Center for Plant Cell Biology (CEPCEB)
University of California
Riverside, CA 92521
E-mail: thomas.girke at ucr.edu
Website: http://faculty.ucr.edu/~tgirke
Ph: 951-827-2469
Fax: 951-827-4437

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
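For Alan's "text between two strings" problem, a minimal base-R sketch might look like the following. The helper name text_between and the page snippet are hypothetical; the actual METLIN markup is not shown in this thread, so the two marker strings would have to be replaced with whatever surrounds the data in the real page source.

```r
# Return the text between the first occurrence of `start` and the next
# occurrence of `end`. Both markers are treated as literal strings
# (fixed = TRUE), so no regular-expression escaping is needed.
text_between <- function(text, start, end) {
  s <- as.integer(regexpr(start, text, fixed = TRUE))
  if (s == -1) return(NA_character_)        # start marker not found
  rest <- substr(text, s + nchar(start), nchar(text))
  e <- as.integer(regexpr(end, rest, fixed = TRUE))
  if (e == -1) return(NA_character_)        # end marker not found
  substr(rest, 1, e - 1)
}

# Hypothetical page source; the real markup will differ.
page <- "<html><body>junk<table><tr><td>112.05</td></tr></table>more junk</body></html>"

inner <- text_between(page, "<table>", "</table>")
print(inner)

# Remove any remaining tags with gsub() to leave only the value
value <- as.numeric(gsub("<[^>]+>", "", inner))
print(value)
```

getURL() returns the page as a single string, which can be passed in directly. readLines() returns one element per line, so its result would first need to be joined with paste(res, collapse = "\n").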
Hi Alan,

Which parts are you interested in exactly? Looking at the page, there are MID, MASS, Name and Formula fields, which seem fairly easy to extract from the page source. The structure, however, seems a bit trickier to me.

Regards,
Benjamin

In reply to Thomas Girke, 19 February 2007 19:34.
Written 12.5 years ago by Benjamin Otto
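As a rough illustration of pulling the fields Benjamin mentions (MID, MASS, Name, Formula) out of one row of page source with regular expressions, here is a base-R sketch. The row markup and its values are made up for the example; it assumes each field sits in its own plain <td> cell, which may not hold for the real page.

```r
# Hypothetical single result row; the values are placeholders.
row <- "<tr><td>42</td><td>112.0501</td><td>example compound</td><td>CxHyOz</td></tr>"

# Find every <td>...</td> cell in the row
m <- gregexpr("<td>[^<]*</td>", row)
cells <- regmatches(row, m)[[1]]

# Drop the opening and closing tags, keeping only the cell contents
values <- gsub("</?td>", "", cells)

# Assemble a one-row table with the column names from Benjamin's message
hit <- data.frame(MID = values[1],
                  MASS = as.numeric(values[2]),
                  Name = values[3],
                  Formula = values[4],
                  stringsAsFactors = FALSE)
print(hit)
```

Rows built this way for each query can be combined with rbind() to give the "nice neat table" Alan asks for.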
Alan,

Have you looked at using the XML package? Depending on how malformed the HTML is, it may be useful, as it is designed to parse these types of data.

Sean

In reply to Benjamin Otto, Tuesday 20 February 2007 08:07.
Written 12.5 years ago by Sean Davis
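A small sketch of what Sean's suggestion could look like, assuming the XML package is installed. The page below is a made-up stand-in with placeholder values, not real METLIN output, and the XPath expressions assume the results sit in an ordinary HTML table with a header row.

```r
# Assumes the XML package is installed, e.g. install.packages("XML")
library(XML)

# Hypothetical results page. The column names follow Benjamin's description;
# the values are placeholders, not real METLIN records.
page <- paste(
  "<html><body><table>",
  "<tr><th>MID</th><th>MASS</th><th>Name</th><th>Formula</th></tr>",
  "<tr><td>1</td><td>112.0501</td><td>compound A</td><td>CxHyOz</td></tr>",
  "<tr><td>2</td><td>112.0520</td><td>compound B</td><td>CxHyNz</td></tr>",
  "</table></body></html>",
  sep = "\n")

# htmlParse() tolerates sloppy HTML; asText = TRUE says that `page` holds
# the markup itself rather than a file name or URL
doc <- htmlParse(page, asText = TRUE)

# Column names from the header cells, then one character vector per data row
header <- xpathSApply(doc, "//table//tr/th", xmlValue)
rows <- getNodeSet(doc, "//table//tr[td]")
cells <- lapply(rows, function(r) xpathSApply(r, "./td", xmlValue))

# Stack the rows into a data frame and convert the mass column to numbers
hits <- as.data.frame(do.call(rbind, cells), stringsAsFactors = FALSE)
names(hits) <- header
hits$MASS <- as.numeric(hits$MASS)
print(hits)
```

Compared with grep() and gsub(), this approach does not depend on how the page is split into lines, which may matter if the HTML layout changes.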