Re: Can an R script be run through a cron job?

mauede@alice.it

I have reattached my script. I had attached it to an earlier message, which may have been overlooked.

As you can see, I scan a large data set, named hsTargets, that contains many target-gene transcript IDs, each linked to its miRNA. I process this data set one miRNA at a time: I gather all the transcript IDs for the current miRNA and query biomaRt for the 3'UTRs of all those transcripts, passing their ENST IDs to the query as a vector. So I do use the vectorized capabilities of R, don't I?

My mistake is keeping the connection to biomaRt open while processing as many miRNAs as I can. I acknowledge that I have to improve my script: catch the exception, delete the file currently being written (since in general it will be incomplete), and have the script exit gracefully. I also have to make the script pause and disconnect from biomaRt regularly, to avoid hammering the service. Alternatively, the process could even terminate itself instead of sleeping, after saving its current state. However, I would then need to set up the task scheduler to restart it some time later...

Regards,
Maura

-----Original message-----
From: Kasper Daniel Hansen [mailto:khansen at stat.berkeley.edu]
Sent: Fri 20/11/2009 15:12
To: mauede at alice.it
Cc: Bioconductor List
Subject: Re: [BioC] Can an R script be run through a cron job?

Maura

Unfortunately you never showed us your code, despite repeated requests to do so. That makes it hard to help (and frankly, ignoring requests for information from people trying to help you is extremely counterproductive).

Your comments in your last email in the previous thread indicate that you have code that essentially does this:

    for(i in 1:100)
        getBM(...)

If this is true (which we would know if we could see the code), this is why your script fails. There are two problems with it: (1) you are not using the vectorized capabilities of R, but, more importantly, (2) you are sending many requests to BioMart, and such behaviour can typically get your IP address banned temporarily. They don't like people hammering their services with repeated requests.

Instead, you should create a query that asks for all your return objects in one request. That should be easy to write, and will be much faster. You might think that processing the output is slightly harder, but that is the thing to do (and with more R experience, processing a big output is actually easier).

Regarding your actual question in this email, you seem to be very confused about the meaning of a batch job. The term has many different interpretations (not related to R), so it is hard to google for. What you are specifically asking for has everything to do with which operating system you are using (Windows, Linux, OS X) and nothing to do with R.

Kasper

On Nov 19, 2009, at 18:24, mauede at alice.it wrote:

> I am running a script that extracts many long strings from remote
> databases. Every now and then the remote database gets out of sync
> and closes the connection.
> I have been advised to implement an R script that queries the
> database in batch mode.
> I have never run an R script in batch mode. I think I have to use
> R CMD BATCH or something similar.
> Given the amount of data I am extracting, I am concerned about
> having to parse a huge data file looking for the information I need.
> The least painful modification would be to run the R script as is,
> but through a cron job, so that the script sleeps at a set frequency
> and, when awakened, resumes from where it was interrupted.
> Is such a scheme doable in R? If it is, what are the most important
> commands to make a script sleep and wake up on a regular basis?
>
> Thank you in advance,
> Maura
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Written 14.5 years ago by mauede@alice.it; updated 14.5 years ago by Francois Pepin
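The checkpoint-and-restart plan in the first post (catch the error, delete the incomplete file, exit, and let a scheduler start the script again later) can be sketched in R as below. This is a minimal sketch under stated assumptions, not Maura's actual script: `run_pending_units()`, the caller-supplied `fetch_unit` function, the one-`.rds`-file-per-unit layout, and the crontab path are all hypothetical. Each unit of work (for example, the transcript IDs of one miRNA) is saved to a temporary file and renamed only once it is complete, so an interrupted run never leaves a half-written result under the final name.

```r
# Minimal sketch (hypothetical names throughout): a driver that a scheduler can
# start repeatedly. Each run handles at most `max_units` pending units and then
# returns; the next run skips units whose result file already exists.
#
# Hypothetical crontab entry (Linux/OS X) running the script every 30 minutes:
#   */30 * * * * Rscript /path/to/fetch_utrs.R

run_pending_units <- function(units, fetch_unit, out_dir,
                              max_units = 5, pause_sec = 0) {
  # `units`: named list of character vectors (e.g. transcript IDs per miRNA);
  #          names must be unique and usable as file names.
  # `fetch_unit`: function(ids) returning a data frame, e.g. a wrapper around
  #               biomaRt::getSequence() (assumption, supplied by the caller).
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  finals <- file.path(out_dir, paste0(names(units), ".rds"))
  done_this_run <- 0L
  for (k in seq_along(units)) {
    if (file.exists(finals[k])) next          # completed in an earlier run
    if (done_this_run >= max_units) break     # leave the rest for the next run
    tmp <- paste0(finals[k], ".part")
    ok <- tryCatch({
      saveRDS(fetch_unit(units[[k]]), tmp)
      file.rename(tmp, finals[k])             # only complete files get the final name
    }, error = function(e) {
      message("unit ", names(units)[k], " failed: ", conditionMessage(e))
      FALSE
    })
    if (file.exists(tmp)) file.remove(tmp)    # never keep a partial file
    if (!isTRUE(ok)) break                    # stop gracefully; retry on the next run
    done_this_run <- done_this_run + 1L
    Sys.sleep(pause_sec)                      # pause between requests to the server
  }
  all(file.exists(finals))                    # TRUE once every unit has been saved
}
```

The units could be the per-miRNA ID vectors of the original script or, following the advice in the replies, a few large chunks of unique IDs. As Kasper points out, how the script gets restarted (cron, Windows Task Scheduler) is an operating-system matter; the R side only has to be safe to run again.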

Francois Pepin

Hi Maura,

Your attachment was scrubbed by the list software; it wasn't overlooked. You would be better off putting the relevant parts in the body of your e-mail instead.

Kasper is referring to the fact that you are sending a separate query for each miRNA. Group everything together so that you only send a single query. This is basically Cei's suggestion, although I would suggest limiting yourself to the Ensembl transcript IDs of interest rather than querying all unique IDs.

Francois

On 11/20/2009 09:49 AM, mauede at alice.it wrote:
> I reattached my script. I had attached it to an earlier message that maybe was overlooked.

Written 14.5 years ago by Francois Pepin
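Francois's point, one request for all transcript IDs of interest instead of one request per miRNA, might look like the sketch below. It assumes, hypothetically, that hsTargets is a data frame with one row per miRNA-transcript pair and columns named `mirna` and `ensembl_transcript_id`; the real column names may differ. The remote call is passed in as a function so the mapping logic can be checked offline. `fetch_utrs_biomart()` is one plausible wrapper around biomaRt's `useEnsembl()` and `getSequence()`; its output is assumed to carry the columns `ensembl_transcript_id` and `3utr`.

```r
# One plausible biomaRt wrapper (needs network access and the biomaRt package).
fetch_utrs_biomart <- function(ids) {
  mart <- biomaRt::useEnsembl(biomart = "genes",
                              dataset = "hsapiens_gene_ensembl")
  biomaRt::getSequence(id = ids, type = "ensembl_transcript_id",
                       seqType = "3utr", mart = mart)
}

# Fetch the 3'UTRs for all targets with a single request, then map them back
# to their miRNAs locally instead of querying once per miRNA.
utrs_for_targets <- function(targets, fetch_utrs = fetch_utrs_biomart) {
  ids <- unique(targets$ensembl_transcript_id)   # only the IDs of interest
  utrs <- fetch_utrs(ids)                        # one remote request
  # Inner join: transcripts with no returned UTR are dropped (all.x = TRUE keeps them).
  merge(targets, utrs, by = "ensembl_transcript_id")
}
```

With the assumed layout, `res <- utrs_for_targets(hsTargets)` followed by `split(res, res$mirna)` would give the per-miRNA view the original script worked with, at the cost of one request rather than one per miRNA.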

Cei Abreu-Goodger

May I suggest the following:

1) First, get all unique Ensembl transcript IDs.
2) If there are too many, split them into groups of roughly 1,000 to 5,000 (I don't know what the optimum would be).
3) For each group of IDs, use getSequence() to retrieve the 3'UTRs.
4) rbind the results, check them, and save.

Cheers,

Cei

Written 14.5 years ago by Cei Abreu-Goodger
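Cei's four steps translate fairly directly into R. The sketch below is one way to write them, not Cei's code: `fetch_in_chunks()` is a hypothetical helper, `fetch_chunk` would in practice be a small wrapper around `biomaRt::getSequence(id = ids, type = "ensembl_transcript_id", seqType = "3utr", mart = mart)`, and the default chunk size of 2,000 is simply a value inside the suggested range. The check step assumes the combined result has an `ensembl_transcript_id` column.

```r
# Minimal sketch of "unique IDs -> groups -> one request per group -> rbind, check".
fetch_in_chunks <- function(ids, fetch_chunk, chunk_size = 2000, pause_sec = 1) {
  ids <- unique(ids)                                          # 1) unique IDs
  stopifnot(length(ids) > 0)
  groups <- split(ids, ceiling(seq_along(ids) / chunk_size))  # 2) groups of IDs
  parts <- lapply(groups, function(g) {                       # 3) one request per group
    res <- fetch_chunk(g)
    Sys.sleep(pause_sec)                                      # short pause between requests
    res
  })
  out <- do.call(rbind, parts)                                # 4) rbind the results...
  rownames(out) <- NULL
  missing <- setdiff(ids, out$ensembl_transcript_id)          # ...check them...
  if (length(missing) > 0)
    warning(length(missing), " transcript IDs returned no sequence")
  out                                                         # ...and let the caller save
}
```

Saving is left to the caller, for example `saveRDS(utrs, "utrs_3prime.rds")` (hypothetical file name). The per-group pause is optional and only reduces the load on the server between requests.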

Hello,

Going back to the original question, I just wanted to point out littler (http://dirk.eddelbuettel.com/code/littler.html), which works just fine for me, also with the getopt package.

Steffen

Written 14.5 years ago by Steffen Moeller
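littler (the `r` front end) and R's own `Rscript` can both run an R file directly from a shell or from cron, and the getopt package can parse command-line options for such a script. The sketch below is a hypothetical script header, not taken from the thread: the option names `--chunk` and `--outdir` and their defaults are illustrative assumptions. It reads littler's `argv` when that variable exists and otherwise falls back to `commandArgs(trailingOnly = TRUE)`, which is what Rscript provides.

```r
#!/usr/bin/env r
# Hypothetical command-line front end for a fetch script. Runs under littler
# (`r`) or, with the shebang changed to `#!/usr/bin/env Rscript`, under Rscript.
library(getopt)

# getopt spec columns: long name, short name, argument flag
# (0 = no argument, 1 = required, 2 = optional), type, description.
spec <- matrix(c(
  "chunk",  "c", 1, "integer",   "transcript IDs per request",
  "outdir", "o", 1, "character", "directory for the results",
  "help",   "h", 0, "logical",   "print usage and exit"
), byrow = TRUE, ncol = 5)

parse_args <- function(args) {
  opt <- getopt(spec, opt = args)
  if (!is.null(opt$help)) {
    cat(getopt(spec, usage = TRUE))
    q(status = 1)
  }
  if (is.null(opt$chunk))  opt$chunk  <- 2000L          # hypothetical default
  if (is.null(opt$outdir)) opt$outdir <- "utr_chunks"   # hypothetical default
  opt
}

# littler defines `argv`; under Rscript the arguments come from commandArgs().
args <- if (exists("argv")) argv else commandArgs(trailingOnly = TRUE)
opts <- parse_args(args)
message("chunk size: ", opts$chunk, ", output directory: ", opts$outdir)
# ... pass opts$chunk and opts$outdir to the fetching code here ...
```

A crontab line could then call the script with explicit options, for example `r /path/to/fetch_utrs.R --chunk 1000 -o /path/to/out` (hypothetical paths), so the schedule and the R code stay separate.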
