Andrew Yee
Thinking about this more broadly, is there a Bioconductor package that
lets you parse out the different features listed in a GenBank record,
somewhat akin to this:
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
Thanks,
Andrew
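
One possible approach to the question above, sketched in R with the XML package: work from GenBank's GBSeq XML layout, which mirrors the flat-file feature table (each feature is a `GBFeature` element with a `GBFeature_key` and a `GBFeature_location`). The inline document below is a hand-made stand-in, not real data for NM_000610 or any other accession, and its coordinates are invented. The helper name `features_by_key` is hypothetical. Requesting this layout from efetch with `rettype=gb&retmode=xml` is an assumption worth checking against the NCBI E-utilities documentation.

```r
library(XML)

# Hand-made stand-in for a GBSeq-style record.
# The coordinates are invented for illustration only.
gb_text <- '<GBSet>
  <GBSeq>
    <GBSeq_feature-table>
      <GBFeature>
        <GBFeature_key>CDS</GBFeature_key>
        <GBFeature_location>10..120</GBFeature_location>
      </GBFeature>
      <GBFeature>
        <GBFeature_key>sig_peptide</GBFeature_key>
        <GBFeature_location>10..69</GBFeature_location>
      </GBFeature>
    </GBSeq_feature-table>
  </GBSeq>
</GBSet>'

doc <- xmlParse(gb_text, asText = TRUE)

# Hypothetical helper: return the location strings of every feature
# whose GBFeature_key equals `key`.
features_by_key <- function(doc, key) {
  xpath <- sprintf("//GBFeature[GBFeature_key='%s']/GBFeature_location", key)
  xpathSApply(doc, xpath, xmlValue)
}

sig <- features_by_key(doc, "sig_peptide")
print(sig)
```

The XPath predicate `[GBFeature_key='...']` does the filtering, so the same helper should work for other feature keys such as `CDS`.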
On Wed, Sep 21, 2011 at 5:41 PM, Andrew Yee <yee at post.harvard.edu>
wrote:
> Thanks for the reply. I guess on a broader level, is there a way to
> extract the sig_peptide field from
>
> http://www.ncbi.nlm.nih.gov/nuccore/NM_000610.3
>
> I'm trying to figure out why the document reference in Carey's example
> doesn't contain "sig_peptide" yet is visible on that web page.
>
> Perhaps there is another method of getting the annotation for
> sig_peptide within GenBank?
>
> Thanks,
> Andrew
>
> On Wed, Sep 21, 2011 at 4:07 PM, Vincent Carey
> <stvjc at channing.harvard.edu> wrote:
>> I don't see a sig_peptide field. You should have a look at
>>
>> http://www.omegahat.org/RSXML/shortIntro.html
>>
>> and references therein.
>>
>> It has been a long time since I did anything with XML per se. We did a
>> certain amount of exposition in Chapter 8
>> of the 2005 Springer monograph. Since then more XPath support has come in
>> and many new ideas help distance users from
>> details of XML processing. To illustrate a bit with your example, I trapped
>> the actual document reference
>>
>> zz = xmlInternalTreeParse("http://www.ncbi.nih.gov/entrez/eutils/efetch.fcgi?tool=bioconductor&rettype=xml&retmode=text&db=Nucleotide&id=NM_000610")
>>
>> and then performed an XPath query
>>
>> > getNodeSet(zz, "//Seq-interval_from")
>> [[1]]
>> <Seq-interval_from>3244</Seq-interval_from>
>>
>> [[2]]
>> <Seq-interval_from>3328</Seq-interval_from>
>>
>> [[3]]
>> <Seq-interval_from>5695</Seq-interval_from>
>>
>> and so on. I don't recall how to do a relatively simple task like
>> "enumerate all tags in use in a document" but it can be done with the XML
>> package tools. I think it will be more effective to isolate the use case
>> and see how to use eutils to solve it fairly directly as opposed to wading
>> through XML, but perhaps wading is inevitable.
>>
>>
>> On Wed, Sep 21, 2011 at 12:29 PM, Andrew Yee <yee at post.harvard.edu> wrote:
>>>
>>> Hi, I'm looking for some guidance in terms of parsing the XML output
>>> from a genbank query.
>>>
>>> result <- genbank('NM_000610', disp='data', type='uid')
>>>
>>> I'm trying to figure out how to use the XML package in order to parse
>>> out the "sig_peptide" field from the XML output from the genbank
>>> query.
>>>
>>> Any pointers or suggestions would be appreciated, as I'm new to XML.
>>>
>>> Thanks,
>>> Andrew
>>>
>>>
>>>
>>> > sessionInfo()
>>> R version 2.13.0 (2011-04-13)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] XML_3.2-0             annotate_1.29.4       AnnotationDbi_1.13.21
>>> [4] Biobase_2.11.10
>>>
>>> loaded via a namespace (and not attached):
>>> [1] DBI_0.2-5     RSQLite_0.9-4 tools_2.13.0  xtable_1.5-6
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>
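
The task mentioned in the thread, "enumerate all tags in use in a document", can be sketched with the same XML package tools. This is a minimal illustration on a toy inline document rather than the real efetch response: the root element `toy` and the `Seq-interval_to` values are invented, while the two `Seq-interval_from` values are copied from the `getNodeSet` output quoted above.

```r
library(XML)

# Toy document. Element names echo the thread; the root element and
# the Seq-interval_to values are invented for illustration.
xml_text <- '<toy>
  <Seq-interval>
    <Seq-interval_from>3244</Seq-interval_from>
    <Seq-interval_to>3300</Seq-interval_to>
  </Seq-interval>
  <Seq-interval>
    <Seq-interval_from>3328</Seq-interval_from>
    <Seq-interval_to>3400</Seq-interval_to>
  </Seq-interval>
</toy>'

doc <- xmlParse(xml_text, asText = TRUE)

# "//*" selects every element node; xmlName returns each node's tag name.
all_tags <- xpathSApply(doc, "//*", xmlName)

# Tabulate how often each tag occurs in the document.
tag_counts <- table(all_tags)
print(tag_counts)

# Pull the text of one tag and convert it to integers.
from <- as.integer(xpathSApply(doc, "//Seq-interval_from", xmlValue))
print(from)
```

Running the tag tabulation on a real record and searching its names for the feature of interest is one way to check whether a string such as "sig_peptide" appears as an element name at all, or only as element text.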