Voke AO
Hi Duncan and Martin,
My bad, no bug whatsoever...it was me. Got my code sorted for the most
part. Thanks again for all the help. It's much appreciated.
-Avoks
On Wed, Feb 29, 2012 at 12:19 PM, Ovokeraye Achinike-Oduaran
<ovokeraye at gmail.com> wrote:
> Hi Morgan,
>
> Thanks. I think there's possibly a bug with the
> getHTMLFormDescription() but I do understand what you've explained.
>
> Thanks again.
>
>
> -Avoks
>
> On Tue, Feb 28, 2012 at 6:19 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> On 02/28/2012 06:14 AM, Ovokeraye Achinike-Oduaran wrote:
>>>
>>> Hi Duncan,
>>>
>>> My understanding is that xpathSApply() combines both the getNodeSet()
>>> and the sapply(). I hope that this is a correct assumption. In
>>> attempting to retrieve nodes in general from the pathway, I used both
>>>
>>> xpathSApply(doc, "//li/node()", xmlGetAttr, "href")
>>> and
>>> xpathSApply(doc, "//li/a/node()", xmlGetAttr, "href")
>>>
>>> and then I get nothing (null) back even though no visible error pops
>>> up. Is something wrong with the way I'm using the path or do I just not
>>> yet grasp the whole XPath concept (I did read the online tutorial)?
>>
>>
>> the NULL means that no nodes match your xpath query.
>>
>>
>>>
>>> Sorry to drag this on, but please help.
>>
>>
>> I used Duncan's RHTMLForms suggestion
>>
>>  library(RHTMLForms)
>>  url = "http://www.genome.jp/kegg/tool/map_pathway1.html"
>>  u = "http://www.genome.jp/kegg-bin/search_pathway_object"
>>  ff = getHTMLFormDescription(url)
>>
>>  fun = createFunction(ff[[1]])
>>  txt = fun(unclassified = "ko:K01803 cpd:C00111 cpd:C00118 K00134 C00236",
>>            target = "alias", .url = u)
>>
>> to retrieve the text and then
>>
>>  library(XML)
>>  xml = htmlTreeParse(txt, asText=TRUE, useInternalNodes=TRUE)
>>
>> to parse to xml (maybe there is a more direct way, using the reader argument
>> to createFunction?). If I experiment a little, I see for instance that
>>
>>  getNodeSet(xml, "//li/a")
>>
>> returns the 'a' elements nested directly inside 'li' elements, and
>>
>>  getNodeSet(xml, "//li/a[@target]")
>>
>> returns the subset of those elements that have a 'target' attribute. Finally
>>
>> > head(xpathSApply(xml, "//li/a[@target]", xmlValue))
>> [1] "ko00010 Glycolysis / Gluconeogenesis"
>> [2] "ko01100 Metabolic pathways"
>> [3] "ko01110 Biosynthesis of secondary metabolites"
>> [4] "ko01120 Microbial metabolism in diverse environments"
>> [5] "ko00710 Carbon fixation in photosynthetic organisms"
>> [6] "ko00562 Inositol phosphate metabolism"
>>
>> seems to be about what you want, or
>>
>>
>> > head(xpathSApply(xml, "//li/a/@href"))
>>                                                 href
>> "/kegg-bin/show_pathway?13304448561022/ko00010.args"
>>                                                 href
>>                      "javascript:display('ko00010')"
>>                                                 href
>> "/kegg-bin/show_pathway?13304448561022/ko01100.args"
>>                                                 href
>>                      "javascript:display('ko01100')"
>>                                                 href
>> "/kegg-bin/show_pathway?13304448561022/ko01110.args"
>>                                                 href
>>                      "javascript:display('ko01110')"
>>
>> Maybe the KEGGSOAP package already does what you're interested in? The web
>> scraping you're doing is going to break as soon as the web site tweaks its
>> presentation.
>>
>> Or maybe
>>
>> > library(org.Hs.eg.db)
>> > head(toTable(revmap(org.Hs.egPATH)[c("00232", "04142")]))
>>   gene_id path_id
>> 1       9   00232
>> 2      10   00232
>> 3      20   04142
>> 4      53   04142
>> 5      54   04142
>> 6     162   04142
>>
>> (The KEGG information in the org.* and KEGG packages dates to the last free
>> public release, and so is starting to be dated.)
>>
>> Martin
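
The XPath queries in Martin's reply can be tried offline on a small inline document. The sketch below is only an illustration: the HTML is a hypothetical stand-in for the KEGG result page (the real markup may differ), and it assumes the XML package is installed.

```r
library(XML)

# Hypothetical stand-in for the KEGG result page; the real markup may differ.
html <- '
<html><body><ul>
  <li><a href="/kegg-bin/show_pathway?ko00010.args" target="_map">ko00010 Glycolysis</a>
      <a href="javascript:void(0)">show</a></li>
  <li><a href="/kegg-bin/show_pathway?ko01100.args" target="_map">ko01100 Metabolic pathways</a></li>
</ul></body></html>'

doc <- htmlParse(html, asText = TRUE)

# All <a> elements that are direct children of an <li>
all_links <- getNodeSet(doc, "//li/a")

# Text of only those <a> elements that carry a target attribute
labels <- xpathSApply(doc, "//li/a[@target]", xmlValue)

# The href attribute values themselves, with the "href" names dropped
hrefs <- as.character(xpathSApply(doc, "//li/a/@href"))

# A path that matches nothing gives an empty node set rather than an error
none <- getNodeSet(doc, "//li/b")
```

Checking `length()` of a node set before extracting attributes is one way to tell "the path matched nothing" apart from "the nodes matched but lack the attribute".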
>>
>>
>>>
>>> Thanks.
>>>
>>> Avoks
>>>
>>> On Mon, Feb 27, 2012 at 4:09 PM, Ovokeraye Achinike-Oduaran
>>> <ovokeraye at gmail.com> wrote:
>>>>
>>>> Thank you so very much, Duncan. I will go get myself enlightened:).
>>>> Thanks again.
>>>>
>>>> Avoks
>>>>
>>>> On Mon, Feb 27, 2012 at 3:50 PM, Duncan Temple Lang
>>>> <duncan at wald.ucdavis.edu> wrote:
>>>>>
>>>>>
>>>>> Use
>>>>>
>>>>>   target = "alias"
>>>>>
>>>>> in the call.
>>>>>
>>>>> If you don't know how to map form elements to parameters in the request,
>>>>> you can either read a tutorial on HTML forms, or alternatively, use
>>>>> the RHTMLForms package which you have loaded according to your search
>>>>> path, e.g.
>>>>>
>>>>> # read the form and then turn the information into an R function.
>>>>> ff = getHTMLFormDescription("http://www.genome.jp/kegg/tool/map_pathway1.html")
>>>>> fun = createFunction(ff[[1]])
>>>>>
>>>>> # Since the action in the form is javascript, we'll provide the
>>>>> # URL manually.
>>>>> u = "http://www.genome.jp/kegg-bin/search_pathway_object"
>>>>> out = fun(unclassified = "ko:K01803 cpd:C00111 cpd:C00118 K00134 C00236",
>>>>>           target = "alias", .url = u)
>>>>>
>>>>> The benefits of RHTMLForms include using the same defaults
>>>>> as the form on the Web page, adding hidden parameters, and identifying
>>>>> the names of the parameters.
>>>>>
>>>>>   D
>>>>>
>>>>>
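
One way to see why `target = "alias"` is the right parameter, rather than `checkbox = "alias"`, is to list the `name` and `value` attributes of the form's fields. The sketch below is an illustration only: the form HTML is a hypothetical, simplified stand-in (the real KEGG page has more fields and its `action` value is not reproduced here), and it assumes the XML package is installed.

```r
library(XML)

# Hypothetical, simplified stand-in for the KEGG form; not the real markup.
form_html <- '
<form action="javascript:submitForm()" method="post">
  <input type="text" name="org_name" value="hsadd">
  <textarea name="unclassified"></textarea>
  <input type="checkbox" name="target" value="alias">
  <input type="submit" name="submit" value="Exec">
</form>'

doc <- htmlParse(form_html, asText = TRUE)

# The request parameter is the element's name attribute, not its type.
field_names <- xpathSApply(doc, "//form//*[@name]", xmlGetAttr, "name")

# For the checkbox, name and value give the pair to send: target = "alias".
checkbox_name  <- xpathSApply(doc, "//input[@type='checkbox']", xmlGetAttr, "name")
checkbox_value <- xpathSApply(doc, "//input[@type='checkbox']", xmlGetAttr, "value")

# The action attribute shows whether the form posts to a URL or hands off to JavaScript.
action <- xpathSApply(doc, "//form", xmlGetAttr, "action")
```

When `action` starts with `javascript:`, as in this stand-in, the processing URL has to be found in the script and supplied by hand, which is what the `.url = u` argument does in the code quoted above.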
>>>>> On 2/27/12 3:08 AM, Ovokeraye Achinike-Oduaran wrote:
>>>>>>
>>>>>> Hi Duncan,
>>>>>>
>>>>>> I noticed that with the script as is, it doesn't take into
>>>>>> consideration the "include alias" checkbox. I tried modifying the
>>>>>> script to force include that option but it still did not work. Any
>>>>>> ideas?
>>>>>>
>>>>>> u = "http://www.genome.jp/kegg-bin/search_pathway_object"
>>>>>> data = postForm(u,
>>>>>>                 .params = list(org_name = "hsadd",
>>>>>>                 unclassified = paste(readLines(file.choose()), collapse = "\n"),
>>>>>>                 file = "", checkbox = "alias", submit = "Exec"))
>>>>>>
>>>>>>
>>>>>> Thanks again.
>>>>>>
>>>>>> Avoks
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 27, 2012 at 10:24 AM, Ovokeraye Achinike-Oduaran
>>>>>> <ovokeraye at gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Duncan,
>>>>>>>
>>>>>>> Thanks a bunch.
>>>>>>>
>>>>>>> -Avoks
>>>>>>>
>>>>>>> On Fri, Feb 24, 2012 at 11:09 PM, Duncan Temple Lang
>>>>>>> <duncan at wald.ucdavis.edu> wrote:
>>>>>>>>
>>>>>>>> Hi Avoks
>>>>>>>>
>>>>>>>> While the form is provided by KEGG and so bio-related,
>>>>>>>> you might have been better posting this to the more general r-help
>>>>>>>> mailing list.
>>>>>>>>
>>>>>>>>
>>>>>>>> You are posting the HTTP request to the wrong URL. That is the URL
>>>>>>>> of the Web page that displays the form, not the URL that processes
>>>>>>>> the input from the form.
>>>>>>>> You have to look at the JavaScript that is referenced in the action
>>>>>>>> attribute of the HTML form element.
>>>>>>>>
>>>>>>>> The second issue is that you are submitting the name of a local file.
>>>>>>>> This won't work as is. You either need to identify this as the name
>>>>>>>> of a file, and not the contents of the file to send, or else send
>>>>>>>> the contents. In this form, you can send the contents via the
>>>>>>>> unclassified parameter.
>>>>>>>>
>>>>>>>>
>>>>>>>> u = "http://www.genome.jp/kegg-bin/search_pathway_object"
>>>>>>>> data = postForm(u,
>>>>>>>>                 .params = list(org_name = "hsadd",
>>>>>>>>                                unclassified = "hsa:7167 hsa:GPI cpd:C00118\nALDOA 1.2.1.12 C00236",
>>>>>>>>                                file = "", submit = "Exec"))
>>>>>>>>
>>>>>>>>
>>>>>>>> If your input is in a file, you can use
>>>>>>>>
>>>>>>>>  unclassified = paste(readLines(file.choose()), collapse = "\n")
>>>>>>>>
>>>>>>>> as the value for the unclassified parameter.
>>>>>>>>
>>>>>>>>
>>>>>>>> There are additional parameters that the form accepts that may be
>>>>>>>> relevant for your search.
>>>>>>>>
>>>>>>>>
>>>>>>>> As for processing the results, you will want to use
>>>>>>>>
>>>>>>>>  doc = htmlParse(data, asText = TRUE)
>>>>>>>>
>>>>>>>> and then use getNodeSet()/xpathSApply() or direct tree extraction to
>>>>>>>> access the nodes you want, e.g.
>>>>>>>>
>>>>>>>>  xpathSApply(doc, "//li/a", xmlGetAttr, "href")
>>>>>>>>
>>>>>>>>
>>>>>>>>  D.
>>>>>>>>
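
The `paste(readLines(...), collapse = "\n")` step suggested above can be checked without contacting KEGG. This is a minimal sketch under stated assumptions: the temporary file and its two lines are made up for illustration (they mirror the inline string in Duncan's example), and `params` is just a hypothetical name for the list that would be handed to `postForm(u, .params = params)`; no request is sent.

```r
# Write a small example input file (illustrative contents) to a temporary path
path <- tempfile(fileext = ".txt")
writeLines(c("hsa:7167 hsa:GPI cpd:C00118", "ALDOA 1.2.1.12 C00236"), path)

# Read the lines and join them into a single newline-separated string,
# so the file contents (not the file name) become the parameter value
unclassified <- paste(readLines(path), collapse = "\n")

# The list that would be passed as .params; nothing is posted here
params <- list(org_name = "hsadd",
               unclassified = unclassified,
               file = "",
               submit = "Exec")
```

Replacing `path` with `file.choose()` gives the interactive version used in the thread.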
>>>>>>>>
>>>>>>>> On 2/24/12 6:09 AM, Ovokeraye Achinike-Oduaran wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I am trying to use postForm() with the KEGG website but I am stuck
>>>>>>>>> on how to get my results. Is it possible (code below) or am I using
>>>>>>>>> postForm() wrongly? The code appears to run but I'm not quite sure
>>>>>>>>> how to read the results assuming there are any. Please help.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Avoks
>>>>>>>>> ____
>>>>>>>>>
>>>>>>>>> data = postForm("http://www.genome.jp/kegg/tool/map_pathway1.html",
>>>>>>>>>                 org_name = "hsadd",
>>>>>>>>>                 file = file.choose(),
>>>>>>>>>                 submit = "Exec")
>>>>>>>>>
>>>>>>>>> > sessionInfo()
>>>>>>>>>
>>>>>>>>> R version 2.14.1 (2011-12-22)
>>>>>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>>>>>>
>>>>>>>>> locale:
>>>>>>>>> [1] LC_COLLATE=English_xxx.1252  LC_CTYPE=English_xxx.1252
>>>>>>>>> [3] LC_MONETARY=English_xxx.1252 LC_NUMERIC=C
>>>>>>>>> [5] LC_TIME=English_xxx.1252
>>>>>>>>>
>>>>>>>>> attached base packages:
>>>>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>>>>
>>>>>>>>> other attached packages:
>>>>>>>>> [1] RHTMLForms_0.5-1 XML_3.9-4.1      RCurl_1.91-1.1   bitops_1.0-4.1
>>>>>>>>>
>>>>>>>>> loaded via a namespace (and not attached):
>>>>>>>>> [1] tools_2.14.1
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioconductor mailing list
>>>>>>>>> Bioconductor at r-project.org
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>>> Search the archives:
>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>>
>>>>>>>>
>>
>>
>>
>> --
>> Computational Biology
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>
>> Location: M1-B861
>> Telephone: 206 667-2793