Fetching documents from PubMed
@kaustubh-patil-1544
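Hi,

I want to fetch documents from PubMed. So first I get all the PMIDs and then use the "pubmed" function from the "annotate" package. But does this function take care of the NCBI rule for waiting 3 seconds between queries?

Also, I have a list of 718 PMIDs, but the function retrieves only 377 of them. I don't understand why. Suggestions appreciated.

Thank you and regards,
Kaustubh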
rgentleman ★ 5.5k
@rgentleman-7725
United States
Hi,

pubmed makes precisely one request, so there is no issue with timing. In many cases you can make a single request for lots of things, rather than lots of requests for one thing. If you stick it in a for loop then there could be problems, but so far not a single person has reported hitting this particular wall.

As for why only 377 came back, did you check to see what happens if you request one of the missing ones by itself? Or go to the website at NLM and see if your PubMed ID is valid?

Also, please do read the posting guide and tell us something about your system.

thanks
Robert

Kaustubh Patil wrote:
> Hi,
>
> I want to fetch documents from PubMed. So first I get all the PMIDs and then use the "pubmed" function from the "annotate" package. But does this function take care of the NCBI rule for waiting 3 seconds between queries?
>
> Also, I have a list of 718 PMIDs, but the function retrieves only 377 of them. I don't understand why. Suggestions appreciated.
>
> Thank you and regards,
> Kaustubh

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
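Robert's point is that `pubmed()` sends all the IDs in a single request, so the 3-second rule only becomes a concern when many requests are issued, for example from a `for` loop. If you do decide to split a long ID list, a minimal sketch that keeps the number of requests small and pauses only between them could look like this. `fetchInBatches()` is a hypothetical helper, not part of annotate; the `batch.size` of 200 and the 3-second `pause` are illustrative assumptions, and `fetch` is whatever retrieval function you pass in (for example annotate's `pubmed()`).

```r
# Hypothetical helper (not part of annotate): send `ids` in batches of
# `batch.size`, one request per batch, sleeping `pause` seconds between
# consecutive requests. `fetch` receives a character vector of IDs and
# its return value for each batch is collected in a list.
fetchInBatches <- function(ids, fetch, batch.size = 200, pause = 3) {
  ids <- as.character(ids)
  # Group consecutive IDs: 1..batch.size -> batch 1, and so on
  batches <- split(ids, ceiling(seq_along(ids) / batch.size))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    if (i > 1) Sys.sleep(pause)   # wait only *between* requests
    results[[i]] <- fetch(batches[[i]])
  }
  results
}
```

For example, `docs <- fetchInBatches(ids, pubmed)` would return a list with one parsed XML document per batch. Keeping batches large means the loop issues only a handful of requests rather than one per PMID, which is the situation Robert warns about.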
[An embedded text attachment was scrubbed from this reply by the mailing-list archive. URL: https://stat.ethz.ch/pipermail/bioconductor/attachments/20060222/425de84a/attachment.pl]
Morten ▴ 300
@morten-929
Kaustubh Patil wrote:
> Hi,
>
> I want to fetch documents from PubMed. So first I get all the PMIDs and then use the "pubmed" function from the "annotate" package. But does this function take care of the NCBI rule for waiting 3 seconds between queries?

I don't know about the "pubmed" function from annotate, but I've seen a function that does exactly this in the MedlineR package (I'm just pasting the code below):

    pauseBetweenQueries <- function(
        sleep.peak = 15,   # pause (in seconds) during peak hours
        sleep.offpeak = 3  # pause (in seconds) during off-peak
    ) {
        # sleep.peak <- 15; sleep.offpeak <- 3
        # Date example:
        # "Thu" "Jan" "15" "16:46:11" "2004"
        result.date <- unlist(strsplit(date(), split = " "))
        hour <- as.numeric(unlist(strsplit(result.date[4], split = ":"))[1])
        # off peak hours are Sat, Sun or anytime between 9 pm and 5 am
        if ((result.date[1] == "Sat") | (result.date[1] == "Sun") |
            (hour > 21) | (hour < 5)) {
            off.peak <- T
        } else {
            off.peak <- F
        }
        # perform the sleep
        if (off.peak) {
            Sys.sleep(sleep.offpeak)
        } else {
            Sys.sleep(sleep.peak)
        }
    }

You may want to try more code from MedlineR. You can find the complete code here:
http://www.dbsr.duke.edu/pub/MedlineR/MedlineR_v30.txt

Hope this can be useful :)
morten
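The pasted MedlineR function finds the hour by splitting the output of `date()` on single spaces. On days 1 to 9, `date()` typically pads the day of the month with an extra space (e.g. "Mon Jan  5 ..."), which shifts the fields so that `result.date[4]` is the day rather than the time. It also tests `hour > 21`, which starts the off-peak window at 10 pm rather than the 9 pm stated in its comment. A minimal sketch that reads the fields from `as.POSIXlt()` instead is below; `isOffPeak()` and `pauseBetweenQueriesSafe()` are hypothetical names, not MedlineR or annotate functions. It judges the time in whatever zone `time` carries (local time for `Sys.time()`); NCBI's guidance was stated in US Eastern time, so pass a time in that zone if that matters for you.

```r
# Hypothetical helpers (not part of MedlineR or annotate).
# isOffPeak(): TRUE on weekends or between 9 pm and 5 am, judged in the
# time zone carried by `time` (the local zone for Sys.time()).
isOffPeak <- function(time = Sys.time()) {
  lt <- as.POSIXlt(time)
  weekend <- lt$wday %in% c(0, 6)           # 0 = Sunday, 6 = Saturday
  night   <- lt$hour >= 21 || lt$hour < 5   # 21:00 up to 04:59
  weekend || night
}

# Same idea as pauseBetweenQueries(), but using isOffPeak().
pauseBetweenQueriesSafe <- function(sleep.peak = 15, sleep.offpeak = 3,
                                    time = Sys.time()) {
  Sys.sleep(if (isOffPeak(time)) sleep.offpeak else sleep.peak)
}
```

Call `pauseBetweenQueriesSafe()` between successive requests in a loop. Using the numeric `wday` field also avoids comparing against weekday-name strings.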
@kaustubh-patil-1544
Hi,

I forgot to attach the file. It's here.

Kaustubh

Kaustubh Patil <kaustubhp_in at yahoo.com> wrote:
Dear Robert,

Thanks for your reply. First of all, something about my system: I have a Celeron 2.5 with 512 MB RAM, running Fedora Core 4, R version 2.2.1 (2005-12-20 r36812) with RSXML 0.99.

I am attaching a file that contains 2665 PMIDs that I want to fetch. Load this file using load("ids") and it will create a variable named ids. Then, if I use the following code, I get only 363 abstracts:

    library(annotate)   # pubmed(), buildPubMedAbst(), abstText()
    library(XML)        # xmlRoot(), xmlApply()
    load("ids")         # creates the variable `ids` (2665 PMIDs)
    docs  <- pubmed(ids)
    root  <- xmlRoot(docs)
    arts  <- xmlApply(root, buildPubMedAbst)
    absts <- sapply(arts, abstText)
    length(absts)
    # [1] 363

Interestingly, those are the first 363 abstracts. The 364th abstract ("12136003") could be fetched manually as well as with the MedlineR library. Am I missing something here?
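Following Robert's suggestion to check the missing IDs individually, a quick way to see exactly which PMIDs did not come back, and where the gap starts, is to compare the requested IDs with the returned ones. A minimal sketch, assuming the returned IDs have already been extracted as a vector; `missingPmids()` is a hypothetical helper, not an annotate function.

```r
# Hypothetical helper: list requested PMIDs that are absent from the
# returned ones, together with their position in the original request.
missingPmids <- function(requested, returned) {
  requested <- as.character(requested)
  returned  <- as.character(returned)
  gone <- !(requested %in% returned)
  data.frame(position = which(gone), pmid = requested[gone],
             stringsAsFactors = FALSE)
}
```

With annotate, the returned IDs could probably be taken from the parsed abstracts with `sapply(arts, pmid)` (assuming the `pmid()` accessor for `pubMedAbst` objects), so `missingPmids(ids, sapply(arts, pmid))` would list every dropped ID. Each of those can then be requested on its own, or looked up on the NLM website, as Robert suggested.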