Question

Trouble querying pubmed on strings


Ken Termiso ▴ 250

@ken-termiso-1087

Last seen 9.6 years ago

Hi all,

I'm trying to get a function working that queries PubMed with any string and returns pubMedAbst objects corresponding to the PubMed article hits for the query string.

This is my code so far, based partly on annotate's 'query.pdf' and partly on the Perl script from NCBI at http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html :

    library(annotate)
    library(XML)

    query <- "trk"

    pmSrch <- function(query) {
        utils <- "http://www.ncbi.nlm.nih.gov/entrez/eutils"

        esearch <- paste(utils, "/esearch.fcgi?",
                         "report=xml&mode=text&tool=bioconductor&",
                         "db=Pubmed&retmax=1&usehistory=y&term=", query)
        esearch <- gsub(" ", "", esearch)
        cat(esearch, "\n")

        #return(esearch) # returns URL
        return(.handleXML(esearch))
    }

    pms <- pmSrch(query)
    a <- xmlRoot(pms)
    numAbst <- length(xmlChildren(a))
    numAbst

    arts <- vector("list", length = numAbst)
    absts <- rep(NA, numAbst)
    for (i in 1:numAbst) {
        arts[[i]] <- buildPubMedAbst(a[[i]])
        absts[i] <- abstText(arts[[i]])
    }

I don't know Perl, and I end up with numAbst = 8 (regardless of the search string) and

    esearch = http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?report=xml&mode=text&tool=bioconductor&db=Pubmed&retmax=1&usehistory=y&term=trk

But typing

    > arts[1]
    [[1]]
    An object of class 'pubMedAbst':
    Title: No Title Provided
    PMID: No PMID Provided
    Authors: No Author Information Provided
    Journal: No Journal Provided
    Date: Month Year

simply gives me empty objects.

I'd appreciate any help anyone can give. I am not familiar with XML.

Thanks in advance,
Ken

• 972 views

updated 18.5 years ago by Seth Falcon ★ 7.4k • written 18.5 years ago by Ken Termiso ▴ 250

score 0 · Answer 1 · 2005-11-06

Hi Ken,

On 4 Nov 2005, jerk_alert at hotmail.com wrote:
> hi all,
>
> i'm trying to get a function working that queries pubmed with any
> string and returns pubMedAbst objects corresponding to the pubmed
> article hits from the query string...
>
> this is my code so far, based partly from annotate's 'query.pdf' and
> also from the perl script from NCBI at
> http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

> pmSrch <- function(query)
> {
> utils <- "http://www.ncbi.nlm.nih.gov/entrez/eutils"
>
> esearch <- paste(utils, "/esearch.fcgi?" ,
> "report=xml&mode=text&tool=bioconductor&",
> "db=Pubmed&retmax=1&usehistory=y&term=", query)
> esearch <- gsub(" ", "", esearch)

You might find the sep and collapse arguments to paste useful here. With them there is no need for gsub, and the query string becomes a bit easier to read.

> i don't know perl and i end up with numAbst = 8 (regardless of the
> search string) and esearch =

Look at what you get back:

    lapply(xmlChildren(xmlRoot(pms)), xmlValue)

If you compare that with the last part of the Perl example [1], you will see that the search results have to be fetched in two steps. The esearch call only returns bookkeeping elements (including WebEnv and QueryKey), not the articles themselves, which is why numAbst does not change with the search string and why buildPubMedAbst produces empty objects.

Here is a very rough cut of a function to fetch results after the first query:

    pmExtract <- function(pmSrchResult) {
        dom <- xmlRoot(pmSrchResult)
        searchData <- lapply(xmlChildren(dom), xmlValue)
        webEnv <- searchData$WebEnv
        queryKey <- searchData$QueryKey

        utils <- "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"
        args <- c("rettype=abstract",
                  "retmode=xml",
                  "retstart=0",
                  "retmax=3",
                  "db=pubmed",
                  paste("query_key", queryKey, sep="="),
                  paste("WebEnv", webEnv, sep="="))
        args <- paste(args, collapse="&")
        utils <- paste(utils, args, sep="")
        cat(utils, "\n")
        return(.handleXML(utils))
    }

So then you would do:

    res1 <- pmSrch("trk")
    res2 <- pmExtract(res1)
    ## process res2 to extract the XML abstracts, etc

Hope that helps to get you going.

Best,

+ seth

[1] http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_example.pl
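To illustrate the sep/collapse suggestion, here is a minimal sketch of building the esearch URL without gsub. The helper name buildEutilsUrl and its argument layout are made up for this example and are not part of annotate; the base URL and parameter names are the ones used in the post. One side effect of the original gsub(" ", "", ...) is that it also strips spaces inside the query itself, so a multi-word search such as "trk receptor" would be sent as "trkreceptor". The sketch percent-encodes each value with utils::URLencode instead.

```r
# Hypothetical helper (not part of annotate): build an E-utilities URL
# from a script name and a named list of parameters.
buildEutilsUrl <- function(script, params) {
    base <- "http://www.ncbi.nlm.nih.gov/entrez/eutils"

    # Percent-encode each value so spaces and reserved characters in the
    # search term survive intact, instead of being deleted by gsub().
    values <- vapply(params,
                     function(v) utils::URLencode(as.character(v), reserved = TRUE),
                     character(1))

    # sep joins each name to its value; collapse joins the pairs with "&".
    args <- paste(names(params), values, sep = "=", collapse = "&")

    # sep = "" avoids the stray spaces that paste() inserts by default.
    paste(base, "/", script, "?", args, sep = "")
}

esearchUrl <- buildEutilsUrl("esearch.fcgi",
                             list(db = "pubmed",
                                  retmax = 1,
                                  usehistory = "y",
                                  tool = "bioconductor",
                                  term = "trk receptor"))
cat(esearchUrl, "\n")
```

The resulting string could be passed to .handleXML in place of the URL built in pmSrch, and the same helper could assemble the efetch URL in pmExtract by passing query_key and WebEnv as parameters.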