stubben (@stubben-4185)
I've been using EFetch to get some full-text articles from PubMed Central, which works fine:
library(XML)  # provides xmlParse(), xpathSApply() and xmlValue()
url <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=PMC2784878"
x <- readLines(url)
doc <- xmlParse(x)
xpathSApply(doc, "//abstract", xmlValue)
[1] "The majority of all genes have so far been identified and annotated systematically through in silico gene finding. Here we report the finding of 3662 strand-specific transcriptionally active regions (TARs) in the genome of Bacillus subtilis by the use of tiling arrays.
I recently noticed that the PMC copyright notice says to use the FTP or OAI service for any "automated" retrievals, so I thought I would try OAI, but I can't get the same XPath queries to work:
url <- "http://www.pubmedcentral.nih.gov/oai/oai.cgi?verb=GetRecord&metadataPrefix=pmc&identifier=oai:pubmedcentral.nih.gov:2784878"
x2 <- readLines(url)  # warns about an incomplete final line
doc2 <- xmlParse(x2)
xpathSApply(doc2, "//abstract", xmlValue)
list()
This query does work, so I know there is an abstract tag:
table(xpathSApply(doc2, "//*", xmlName))
           abstract                 ack           addr-line
                  1                   1                   1
                aff             article  article-categories
                  1                   1                   1
         article-id        article-meta       article-title
                  3                   1                  79
       author-notes                back                body
                  1                   1                   1
            caption             contrib       contrib-group
                  7                   3                   1
copyright-statement             corresp                date
                  1                   1                   1
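A likely cause, offered as an assumption rather than something the output above proves: the OAI response may place the article in a default XML namespace. In XPath 1.0 an unprefixed name such as //abstract only matches elements that are in no namespace, whereas //* matches elements in any namespace, which would explain why the table query succeeds while the abstract query returns list(). The sketch below reproduces that situation offline with the XML package; the tiny document and its namespace URI are made up for illustration, so check the xmlns attribute on <article> in x2 for the real value.

```r
library(XML)  # provides xmlParse(), xpathSApply() and xmlValue()

# Hypothetical namespace URI standing in for whatever the real OAI record uses.
ns_uri <- "http://example.org/hypothetical-article-ns"

# Made-up document: <article> declares a default namespace, so <front>
# and <abstract> inherit it even though they carry no prefix.
txt <- paste0(
  '<record><article xmlns="', ns_uri, '">',
  '<front><abstract>Example abstract text</abstract></front>',
  '</article></record>'
)
doc2 <- xmlParse(txt, asText = TRUE)

# Unprefixed XPath names only match elements in no namespace,
# so this finds nothing (an empty list).
miss <- xpathSApply(doc2, "//abstract", xmlValue)

# Option 1: bind a prefix of our choosing ("a") to the namespace URI.
hit1 <- xpathSApply(doc2, "//a:abstract", xmlValue,
                    namespaces = c(a = ns_uri))

# Option 2: ignore namespaces and match on the local element name only.
hit2 <- xpathSApply(doc2, "//*[local-name()='abstract']", xmlValue)

print(length(miss))
print(hit1)
print(hit2)
```

Option 1 is stricter, since it only matches abstract elements from that one vocabulary; option 2 is shorter but would also pick up any same-named element from another namespace in the record.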
Thanks for any help.
Chris Stubben