Help with genbank
@jean-yee-hwa-yang-104
Hi all,

I would like to extract the sequence information corresponding to an accession number. I use the genbank function in annotate,

x <- genbank("AK008608")

to retrieve the XML file, and I hope someone can point me to some functions (or directions) for extracting the sequence information from x$doc$children.

Thank you.

Jean

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jean Yee Hwa Yang                 jean@biostat.ucsf.edu
Division of Biostatistics,        Tel: (415) 476-3368
University of California,         Fax: (415) 476-6014
500 Parnassus Avenue, MU 420-W, San Francisco, CA 94143-0560
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@vincent-j-carey-jr-4
> Hi all,
>
> I would like to extract the sequence information corresponding to an
> accession number. I use the function in annotate
> x <- genbank("AK008608")
> to extract the XML file and hope someone can point me to some functions
> (or directions) to further extract the sequence info from
> x$doc$children.

The short answer is to look at the XML library by Duncan, and in particular at xmlTreeParse or xmlEventParse. If you know which fields you want to retain and will do this repeatedly, we can consider adding a function to the annotate package that saves those fields, to work in concert with the genbank function. Writing handlers for the xml*Parse functions is a little unintuitive at first, but ultimately it is not too hard.
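As a rough illustration of the handler approach with xmlEventParse, here is a minimal sketch. It parses a small inline toy record rather than a real GenBank download, and the element names (GBSet, GBSeq, GBSeq_sequence) are assumptions about the record layout, not something confirmed in this thread. The function sequence_handlers is a hypothetical helper written only for this example.

```r
library(XML)

# Toy record standing in for a GenBank XML download.
# ASSUMPTION: the element names below are illustrative; inspect a real
# record to find the names that actually hold the sequence.
txt <- '<GBSet>
  <GBSeq>
    <GBSeq_locus>AK008608</GBSeq_locus>
    <GBSeq_sequence>acgtacgtgg</GBSeq_sequence>
  </GBSeq>
</GBSet>'

# Hypothetical helper: builds a set of handlers that share state through
# a closure and collect the text found inside the chosen element.
sequence_handlers <- function(tag = "GBSeq_sequence") {
  inside <- FALSE          # TRUE while the parser is inside <tag>
  chunks <- character()    # text may arrive in several pieces
  list(
    startElement = function(name, attrs, ...) {
      if (name == tag) inside <<- TRUE
    },
    text = function(x, ...) {
      if (inside) chunks <<- c(chunks, x)
    },
    endElement = function(name, ...) {
      if (name == tag) inside <<- FALSE
    },
    # Extra accessor, not an XML event: returns what was collected.
    result = function() paste(chunks, collapse = "")
  )
}

h <- sequence_handlers()
invisible(xmlEventParse(txt, handlers = h, asText = TRUE))
sequence <- h$result()
print(sequence)
```

The closure is what makes the handlers feel unintuitive at first: each callback sees only one event, so anything that must survive between events (here the inside flag and the collected chunks) has to live in the enclosing function.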
Jeff Gentry
@jeff-gentry-12
> I would like to extract the sequence information corresponding to an
> accession number. I use the function in annotate
> x <- genbank("AK008608")
> to extract the XML file and hope someone can point me to some functions
> (or directions) to further extract the sequence info from
> x$doc$children.

Unfortunately, we don't really have any notion of a genbank object as we do for pubmed (the pubMedAbst class), nor do we have any convenience functions to go in and retrieve that information. I'm not familiar enough with the XML structure to say which parts correspond to what you're looking for, but functions like buildPubMedAbst show how to get at the pieces that you want.

-J
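For the tree-based route, a minimal sketch of pulling named pieces out of a parsed document might look like the following. It assumes the object is the kind of document that xmlTreeParse returns, and it uses a toy record whose element names (GBSet, GBSeq, GBSeq_locus, GBSeq_sequence) are assumptions rather than the confirmed layout of what genbank() returns. The helper get_fields is hypothetical and is not part of annotate.

```r
library(XML)

# Toy record; ASSUMPTION: element names are illustrative only.
txt <- '<GBSet>
  <GBSeq>
    <GBSeq_locus>AK008608</GBSeq_locus>
    <GBSeq_sequence>acgtacgtgg</GBSeq_sequence>
  </GBSeq>
</GBSet>'

doc  <- xmlTreeParse(txt, asText = TRUE)
root <- xmlRoot(doc)

# Look at the child element names first to learn the structure.
print(names(xmlChildren(root)))
print(names(xmlChildren(root[["GBSeq"]])))

# Hypothetical helper: return the text of the requested child elements
# of one record, or NA where a field is absent.
get_fields <- function(record, fields) {
  sapply(fields, function(f) {
    node <- record[[f]]
    if (is.null(node)) NA_character_ else xmlValue(node)
  })
}

info <- get_fields(root[["GBSeq"]],
                   c("GBSeq_locus", "GBSeq_sequence", "GBSeq_missing"))
print(info)
```

Printing the child names at each level is usually the quickest way to find which element holds the sequence in a real record before hard-coding any field names.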