Help with genbank


Jean Yee Hwa Yang ▴ 920

@jean-yee-hwa-yang-104

Last seen 11.4 years ago

Hi all,

I would like to extract the sequence information corresponding to an accession number. I use the genbank function in annotate,

x <- genbank("AK008608")

to retrieve the XML file, and I hope someone can point me to some functions (or directions) for extracting the sequence information from x$doc$children.

Thank you.

Jean

Jean Yee Hwa Yang, jean@biostat.ucsf.edu
Division of Biostatistics, University of California
Tel: (415) 476-3368, Fax: (415) 476-6014
500 Parnassus Avenue, MU 420-W, San Francisco, CA 94143-0560


Updated 22.9 years ago by Jeff Gentry • written 22.9 years ago by Jean Yee Hwa Yang


Vincent J. Carey, Jr. 6.7k

@vincent-j-carey-jr-4

Last seen 10 days ago

United States

> Hi all,
>
> I would like to extract the sequence information corresponding to an
> accession number. I use the function in annotate
> x <- genbank("AK008608")

The short answer is to look at the XML library by Duncan, in particular at xmlTreeParse or xmlEventParse. If you know which fields you want to retain and will do this repeatedly, we can consider adding a function to the annotate package that saves those fields and works in concert with the genbank function. Writing handlers for the xml*Parse functions is a little unintuitive at first, but ultimately it is not too hard.

> to extract the XML file and hope someone can point me to some functions
> (or directions) to further extract the sequence info from
> x$doc$children.
>
> Thank you.
>
> Jean
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
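As a rough illustration of the xmlTreeParse route suggested above, the sketch below pulls a few named fields out of a small XML string. The XML here is a made-up stand-in, not real output from genbank("AK008608"), and the element names (GBSet, GBSeq, GBSeq_locus, GBSeq_length, GBSeq_sequence) are assumptions borrowed from NCBI's GBSeq layout; check the names in the document you actually get back before relying on them.

```r
library(XML)

# Hypothetical stand-in for the XML a GenBank query might return.
# The element names are assumed, not taken from the thread.
gb_xml <- "<GBSet>
  <GBSeq>
    <GBSeq_locus>DEMO0001</GBSeq_locus>
    <GBSeq_length>12</GBSeq_length>
    <GBSeq_sequence>acgtacgtacgt</GBSeq_sequence>
  </GBSeq>
</GBSet>"

# Parse into an internal (C-level) document so XPath queries can be used.
doc <- xmlTreeParse(gb_xml, asText = TRUE, useInternalNodes = TRUE)

# Fields to retain; each is looked up anywhere in the document by tag name.
wanted <- c("GBSeq_locus", "GBSeq_length", "GBSeq_sequence")
fields <- sapply(wanted, function(tag) {
  xpathSApply(doc, paste0("//", tag), xmlValue)
})

# fields is a character vector named by tag.
print(fields[["GBSeq_sequence"]])
```

If the same fields are needed for many accession numbers, the sapply call could be wrapped in a small function that takes the parsed document and the vector of tag names.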

Written 22.9 years ago by Vincent J. Carey, Jr.


Jeff Gentry ★ 3.9k

@jeff-gentry-12

Last seen 11.4 years ago

> I would like to extract the sequence information corresponding to an
> accession number. I use the function in annotate
> x <- genbank("AK008608")
> to extract the XML file and hope someone can point me to some functions
> (or directions) to further extract the sequence info from
> x$doc$children.

Unfortunately, we don't really have any notion of a genbank object the way we do for PubMed (the pubMedAbst class), nor do we have any convenience functions to go in and retrieve that information. I'm not familiar enough with the XML structure to say which parts correspond to what you're looking for, but functions like buildPubMedAbst show how to get at the pieces that you want.

-J
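For walking the parsed tree by hand, roughly in the spirit of what is described above, a minimal sketch follows. It uses only accessor functions from the XML package (xmlRoot, xmlChildren, xmlValue, and [[ indexing by child name). The XML string is again a made-up stand-in, and the element names GBSet, GBSeq and GBSeq_sequence are assumptions; inspect the child names of your own document first.

```r
library(XML)

# Hypothetical stand-in document; element names are assumed.
gb_xml <- "<GBSet>
  <GBSeq>
    <GBSeq_locus>DEMO0001</GBSeq_locus>
    <GBSeq_sequence>acgtacgtacgt</GBSeq_sequence>
  </GBSeq>
</GBSet>"

# Default xmlTreeParse builds an R-level tree, the kind of object
# whose top-level nodes live in x$doc$children.
x <- xmlTreeParse(gb_xml, asText = TRUE)

# xmlRoot() returns the top-level node without touching x$doc directly.
root <- xmlRoot(x)

# Step down one level at a time, indexing children by element name.
record <- root[["GBSeq"]]

# Listing the child names is a quick way to discover what is available.
child_names <- names(xmlChildren(record))
print(child_names)

# Extract the text of the sequence element, guarding against its absence.
seq_node <- record[["GBSeq_sequence"]]
sequence <- if (is.null(seq_node)) NA_character_ else xmlValue(seq_node)
print(sequence)
```

Printing the child names at each level before indexing is usually the quickest way to find where the field of interest sits in an unfamiliar document.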

Written 22.9 years ago by Jeff Gentry
