foreach, doMC, and GOstats problems
Tarca, Adi ▴ 570
Hi all,

I am using the foreach and doMC packages to do some parallel GO analyses with GOstats.

If I use 3 or fewer processors at the same time (i.e. the foreach loop goes up to n=3), all works fine, but when I want to use n>3 I get this error:

"RSQLite driver: (RS_SQLite_fetch: failed: database disk image is malformed)"

I assume that the error comes from the multiple connections to the annotation packages used by GOstats. Is there a way to overcome this problem?

Thanks,
Adi L. Tarca

> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] doMC_1.2.1           multicore_0.1-3      foreach_1.3.0
 [4] codetools_0.2-2      iterators_1.0.3      GOstats_2.16.0
 [7] RSQLite_0.9-3        DBI_0.2-5            graph_1.28.0
[10] Category_2.16.0      AnnotationDbi_1.12.0 Biobase_2.10.0
[13] limma_3.6.6

loaded via a namespace (and not attached):
 [1] annotate_1.28.0   genefilter_1.32.0 GO.db_2.4.5       GSEABase_1.12.1
 [5] RBGL_1.26.0       splines_2.12.0    survival_2.35-8   tools_2.12.0
 [9] XML_3.2-0         xtable_1.5-6
GO GOstats • 2.5k views
Dan Tenenbaum ★ 8.2k
On Thu, Mar 24, 2011 at 9:29 AM, Tarca, Adi <atarca@med.wayne.edu> wrote:
> I assume that the error comes from the multiple connections to the
> annotation packages used by GOstats. Is there a way to overcome this
> problem?

Are you doing any writes to a SQLite database? SQLite is not so good at write concurrency but it should be able to handle multiple (near-)simultaneous reads.

Also, how many processors do you have?

Dan
Thanks Dan,

I only have/want to use 8 processors, and I do not do any writes with SQLite. Each processor runs sequentially multiple calls to the hyperGTest function. This is the only (indirect) interaction I have with SQLite.

Best,
Adi
Since SQLite is an in-process database it might not like being accessed by different processes. Do you ensure that each process has its own unique database connection?

If that doesn't fix it, can you post a simple reproducible example?

Thanks
Dan
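As an illustration of what "each process has its own unique database connection" can look like in plain DBI/RSQLite code, here is a minimal sketch. It is not taken from the thread: the throwaway database, the `genes` table, and the `count_rows` helper are hypothetical, and it uses `parallel::mclapply` (the base R successor of multicore) rather than doMC. The point is only that the parent closes its connection before forking and every worker opens its own read-only one.

```r
library(DBI)
library(parallel)

# Hypothetical throwaway database so the sketch is self-contained.
db_path <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), db_path)
dbWriteTable(con, "genes",
             data.frame(id = 1:10, symbol = paste0("G", 1:10)))
dbDisconnect(con)  # close in the parent BEFORE any worker is forked

# Hypothetical worker: opens its own read-only connection, queries, closes.
count_rows <- function(worker_id, path) {
  con <- dbConnect(RSQLite::SQLite(), path, flags = RSQLite::SQLITE_RO)
  on.exit(dbDisconnect(con))
  dbGetQuery(con, "SELECT COUNT(*) AS n FROM genes")$n
}

# mclapply forks on Unix-alikes; on Windows mc.cores must be 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
counts <- mclapply(1:4, count_rows, path = db_path, mc.cores = n_cores)
print(unlist(counts))
```

With annotation packages the connection is opened for you when the package is loaded, so this pattern cannot be applied directly to them; the sketch only shows the underlying idea.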
Also, note this from the RSQLite NEWS file:

- The SQLite driver handle validation code, is_ValidHandle, no longer requires the driver ID to be equal to the current process ID. SQLite supports multiple processes accessing the same SQLite file via locking (however, results are known to be unreliable on NFS). This change should make using RSQLite with the multicore package easier. For an example of the issue that the PID check causes see: https://stat.ethz.ch/pipermail/r-sig-hpc/2009-August/000335.html

Dan
Hi Dan,

You are right, it seems that SQLite does not like being accessed by different processes and this is what causes the problem. The example below shows how 3 processors can each call hyperGTest 2 times in sequence, as long as the annotation database is not shared between processors. To make the code fail and reproduce my error, just uncomment this line:

    # arraystouse <- rep("hgu133plus2.db", 3)

In this way all 3 processors will attempt to use the same hgu133plus2.db annotation package.

Thanks,
Adi

    # code starts here:
    library(GOstats)
    library(foreach)
    library(doMC)
    registerDoMC()

    # one annotation package per worker (works)
    arraystouse <- c("hgu133plus2.db", "hgu133a.db", "illuminaHumanv3BeadID.db")
    # same annotation package for every worker (fails)
    # arraystouse <- rep("hgu133plus2.db", 3)

    objres <- foreach(i = 1:3) %dopar% {
      require(arraystouse[i], character.only = TRUE)
      # name of the probe-to-Entrez map, e.g. "hgu133plus2ENTREZID"
      anpack <- paste(unlist(strsplit(arraystouse[i], split = ".db", fixed = TRUE)),
                      "ENTREZID", sep = "")
      allG <- unlist(as.list(get(anpack)))

      res <- NULL
      for (ite in 1:2) {
        DEG <- sample(allG, 500)  # random "differentially expressed" genes
        params <- new("GOHyperGParams", geneIds = DEG,
                      universeGeneIds = allG, annotation = arraystouse[i],
                      ontology = "BP", pvalueCutoff = 0.05, conditional = FALSE,
                      testDirection = "over")
        hgCondOver <- hyperGTest(params)
        tmp <- summary(hgCondOver)
        res <- rbind(res, tmp)
        cat(paste(i, ite, "\n"))
      }
      res
    }
OK, It's GOstats which loads AnnotationDbi (which loads RSQLite), so put library(GOstats) inside your %dopar% loop. And then start with a fresh session, so GOstats is not loaded outside the loop. It worked for me.

Dan
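A sketch of how the suggested fix could be applied to the reproducible example is given below. It is an illustration, not code posted in the thread: the helper names `go_one_chip` and `is_installed` are hypothetical, and the package check uses `system.file()` on the assumption that, unlike `requireNamespace()`, it does not load GOstats or the annotation package in the parent session. Run it in a fresh R session.

```r
# Hypothetical helper: everything that touches GOstats or the annotation
# database is loaded inside the worker, never in the parent session.
go_one_chip <- function(chip, n_iter = 2, n_genes = 500) {
  library(GOstats)                       # attached inside the %dopar% body
  library(chip, character.only = TRUE)   # annotation package, also inside
  map_name <- paste0(sub("\\.db$", "", chip), "ENTREZID")
  allG <- unlist(as.list(get(map_name)))
  res <- NULL
  for (ite in seq_len(n_iter)) {
    DEG <- sample(allG, n_genes)
    params <- new("GOHyperGParams", geneIds = DEG,
                  universeGeneIds = allG, annotation = chip,
                  ontology = "BP", pvalueCutoff = 0.05,
                  conditional = FALSE, testDirection = "over")
    res <- rbind(res, summary(hyperGTest(params)))
  }
  res
}

# Check installation without loading any namespace in the parent.
is_installed <- function(pkg) nzchar(system.file(package = pkg))
required <- c("foreach", "doMC", "GOstats", "hgu133plus2.db")
have_all <- all(vapply(required, is_installed, logical(1)))

if (have_all) {
  library(foreach)
  library(doMC)
  registerDoMC(cores = 3)
  # The previously failing case: every worker uses the same package.
  chips <- rep("hgu133plus2.db", 3)
  objres <- foreach(chip = chips) %dopar% go_one_chip(chip)
} else {
  message("Skipping: foreach, doMC, GOstats or hgu133plus2.db not installed")
}
```

If `sessionInfo()` in the parent session already lists GOstats, AnnotationDbi or RSQLite before the loop runs, the session is not fresh in the sense described above and the sketch is not expected to help.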
Great, that makes sense. It works for me too.

Thanks,
Adi
I've seen errors like this when reading from SQLite dbs stored on network devices (not running parallel jobs) and having them on local disks seemed to address the issue.

b
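For a standalone SQLite file, the "keep it on a local disk" workaround can be sketched in base R as below. The helper name `localize_db` and the stand-in file are hypothetical, and the sketch assumes that `tempdir()` lives on a local disk, which depends on how TMPDIR is set on your machine. How to point a particular annotation package at such a copy is not covered in the thread and is left out here.

```r
# Hypothetical helper: copy a database file to a local directory and
# return the path of the copy.
localize_db <- function(db_path, local_dir = tempdir()) {
  stopifnot(file.exists(db_path))
  local_path <- file.path(local_dir, basename(db_path))
  ok <- file.copy(db_path, local_path, overwrite = TRUE)
  if (!ok) stop("could not copy ", db_path, " to ", local_dir)
  local_path
}

# Demo with a stand-in file playing the role of a database on a network share.
src_dir <- tempfile("networkshare")
dir.create(src_dir)
src <- file.path(src_dir, "annotation.sqlite")
writeBin(as.raw(1:16), src)

local_copy <- localize_db(src)
print(local_copy)
```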