QuasR on Linux Cluster


Ugo Borello ▴ 340

@ugo-borello-5753

Last seen 7.6 years ago

France

Hi all,

I am trying to use QuasR on a Linux cluster (one machine, multiple cores). I run:

    library(QuasR)
    library(BSgenome.Mmusculus.UCSC.mm10)

    cl <- makeCluster(1)
    sampleFile <- "sampleFile.txt"
    genomeName <- "BSgenome.Mmusculus.UCSC.mm10"
    proj <- qAlign(sampleFile, genome=genomeName, splicedAlignment=TRUE, clObj=cl)

And I get:

    > proj <- qAlign(sampleFile, genome=genomeName, splicedAlignment=TRUE, clObj=cl)
    alignment files missing - need to:
        create 1 genomic alignment(s)
    Testing the compute nodes...OK
    Loading QuasR on the compute nodes...OK
    Available cores:
    nodeNames
    ccwsge0155
             1
    Performing genomic alignments for 1 samples. See progress in the log file:
    /scratch/4401022.1.huge/QuasR_log_41394115a102.txt
    Error in unserialize(node$con) : error reading from connection
    Calls: qAlign ... FUN -> recvData -> recvData.SOCKnode -> unserialize
    Execution halted

I also tried changing the number of cores:

    cl <- makeCluster(detectCores())

and my job was killed because it used more memory (Max vmem = 17.118G) than allowed (16G).

Any suggestions? I am pretty stuck. Thank you in advance for your help.

Ugo

    > sessionInfo()
    R version 3.0.2 (2013-09-25)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
    [1] C

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
     [1] TxDb.Mmusculus.UCSC.mm10.knownGene_2.10.1
     [2] GenomicFeatures_1.14.0
     [3] AnnotationDbi_1.24.0
     [4] Biobase_2.22.0
     [5] BSgenome.Mmusculus.UCSC.mm10_1.3.19
     [6] BSgenome_1.30.0
     [7] Biostrings_2.30.0
     [8] QuasR_1.2.0
     [9] Rbowtie_1.2.0
    [10] GenomicRanges_1.14.1
    [11] XVector_0.2.0
    [12] IRanges_1.20.0
    [13] BiocGenerics_0.8.0

    loaded via a namespace (and not attached):
     [1] BiocInstaller_1.12.0 DBI_0.2-7            RColorBrewer_1.0-5
     [4] RCurl_1.95-4.1       RSQLite_0.11.4       Rsamtools_1.14.1
     [7] ShortRead_1.20.0     XML_3.98-1.1         biomaRt_2.18.0
    [10] bitops_1.0-6         grid_3.0.2           hwriter_1.3
    [13] lattice_0.20-24      latticeExtra_0.6-26  rtracklayer_1.22.0
    [16] stats4_3.0.2         tools_3.0.2          zlibbioc_1.8.0

BSgenome QuasR • 3.3k views

updated 12.3 years ago by Michael Stadler ▴ 350 • written 12.3 years ago by Ugo Borello ▴ 340


Michael Stadler ▴ 350

@michael-stadler-5887

Last seen 9 days ago

Switzerland

Hi Ugo,

On 18.10.2013 13:56, Ugo Borello wrote:
> Error in unserialize(node$con) : error reading from connection
> Calls: qAlign ... FUN -> recvData -> recvData.SOCKnode -> unserialize
> Execution halted

The error that you get is not created within QuasR; my guess is that it comes from the "parallel" package, indicating that something goes wrong when using your cluster object "cl".

I would suggest testing whether your cluster object works fine. It would help to know if the error message appears immediately after you call qAlign(), or if it takes some time to process. Also, it would be great to see the content of the QuasR log file.

Here is a simple test you could try to check your cluster object/connection:

    parLapply(cl, seq_along(cl), function(i) Sys.info())

As a result, you should get Sys.info() output from each of the cluster nodes.

> I also tried to modify the multicore option
>
> cl <- makeCluster(detectCores())
>
> And my job is killed because it uses more memory (Max vmem = 17.118G) than
> allowed (16G)

With splicedAlignment=TRUE, QuasR will run SpliceMap for aligning your reads, which may require several GB of memory per node in your cluster object. You can avoid the memory overflow by reducing the number of nodes in your cluster object, e.g. by:

    cl <- makeCluster(4)

which should run through on your machine with 16GB of memory.

Best,
Michael
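The sizing advice above can be written down as a small calculation. This is only a sketch: `nodes_for_memory` is a hypothetical helper, not a QuasR or parallel function, and the figure of about 4 GB per node is an assumption read off the suggestion of 4 nodes on a 16 GB machine. Actual SpliceMap memory use will depend on the genome and the data. The second half repeats the Sys.info() check on a cluster of the computed size.

```r
library(parallel)

# Hypothetical helper (not a QuasR function): how many worker nodes fit
# into a memory limit if each node may need `per_node_gb` of RAM.
nodes_for_memory <- function(limit_gb, per_node_gb, max_nodes = detectCores()) {
  if (is.na(max_nodes)) max_nodes <- 1L          # detectCores() may return NA
  n <- floor(limit_gb / per_node_gb)
  as.integer(max(1, min(n, max_nodes)))
}

# Assumed figures: 16 GB job limit, roughly 4 GB per node, 32 cores on the host.
n_nodes <- nodes_for_memory(limit_gb = 16, per_node_gb = 4, max_nodes = 32)

# Start the cluster and confirm that every node answers.
cl <- makeCluster(n_nodes)
node_names <- unlist(parLapply(cl, seq_along(cl),
                               function(i) Sys.info()[["nodename"]]))
stopCluster(cl)

length(node_names) == n_nodes   # TRUE when all nodes responded
```

If the scheduler reports a different limit, or a test run shows higher per-node use, adjusting the two numbers is enough; the cluster check itself does not change.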

written 12.3 years ago by Michael Stadler ▴ 350


Thank you Michael,

My bad, I am not able to find the QuasR log at the moment. Anyway, the last step reached was the .sam file: QuasR did not go on to convert the .sam file to a .bam file. Attached is some other information on the running job before it died. It refers to a run with cl <- makeCluster(1).

I ran your test and got:

    > library(parallel)
    > cl <- makeCluster(detectCores())
    > info <- parLapply(cl, seq_along(cl), function(i) Sys.info())
    > info
    [[1]]
                                 sysname                              release
                                 "Linux"                 "2.6.18-348.3.1.el5"
                                 version                             nodename
    "#1 SMP Tue Mar 5 13:19:32 EST 2013"                         "ccwsge0053"
                                 machine                                login
                                "x86_64"                            "unknown"
                                    user                       effective_user
                              "uborello"                           "uborello"

The same for all 32 nodes.

Then I ran:

    > library(parallel)
    > type <- if (exists("mcfork", mode="function")) "FORK" else "PSOCK"
    > type
    [1] "PSOCK"
    > cores <- getOption("mc.cores", detectCores())
    > cl <- makeCluster(cores, type=type)
    > cl
    socket cluster with 32 nodes on host 'localhost'
    > results <- parLapply(cl, 1:100, sqrt)
    > sum(unlist(results))
    [1] 671.4629
    > stopCluster(cl)

I don't know if this helps. Any suggestions?

Ugo

written 12.3 years ago by Ugo Borello ▴ 340


Your cluster object seems functional now.

Another possible problem could be the available disk space in R's tempdir(). It is used by qAlign to temporarily store the uncompressed fastq files, the sam files and the bam files (and thus needs several-fold more free capacity than the size of your fastq.gz files). For more information, see vignette section 4.1 "File storage locations".

If tempdir() is too small, you can redirect R's tempdir() by setting the TMPDIR environment variable, or, for a single qAlign call, by using the "cacheDir" parameter of qAlign.

If you are sure that disk space is not the issue, could you give qAlign() another try, using a cluster object with only 4 nodes to avoid any memory issues?

Michael
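A quick way to check the disk-space hypothesis is to compare the free space under tempdir() with the size of the input files. The sketch below makes some assumptions: it relies on a POSIX `df` command being on the path (usual on Linux nodes), `free_gb` and `enough_space` are hypothetical helper names, and the factor of 5 is only a placeholder for "several-fold", not a number taken from the QuasR vignette.

```r
# Free space (in GB) on the file system holding `path`.
# Assumes a POSIX `df` command is available (true on typical Linux nodes).
free_gb <- function(path = tempdir()) {
  out <- system2("df", c("-Pk", shQuote(path)), stdout = TRUE)
  fields <- strsplit(trimws(out[length(out)]), "[[:space:]]+")[[1]]
  as.numeric(fields[4]) / 1024^2      # 4th column: available 1K-blocks
}

# Rough requirement: several times the combined size of the input files.
# The factor 5 is an assumed placeholder, not a value from the vignette.
enough_space <- function(input_files, path = tempdir(), factor = 5) {
  need_gb <- factor * sum(file.size(input_files)) / 1024^3
  free_gb(path) >= need_gb
}

free_gb(tempdir())

# If space is short, point qAlign at a larger directory for this one call
# (the path below is hypothetical):
# proj <- qAlign(sampleFile, genome = genomeName, splicedAlignment = TRUE,
#                clObj = cl, cacheDir = "/path/to/large/scratch")
```

Note that tempdir() is fixed when the R session starts, so TMPDIR has to be set in the job script before R is launched rather than from inside the running session.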

written 12.3 years ago by Michael Stadler ▴ 350


Dear Michael,

I think that disk space is not an issue; anyway, I will double check with the administrator.

I used 4 nodes and QuasR stopped at the .sam file. See the attached output files.

When I use fewer than 4 nodes, it stops at the beginning of the process:

    [1] "Writing BSgenome to disk on ccwsge0144 : /scratch/4847271.1.huge/Rtmp7nHkpp/file5971727e49b5.fa"

What am I missing?

Thank you
Ugo

Attachments:
QuasR.out 4800353.txt: https://stat.ethz.ch/pipermail/bioconductor/attachments/20131022/d1ebb2e4/attachment.txt
QuasR_log_37446ea52111.txt: https://stat.ethz.ch/pipermail/bioconductor/attachments/20131022/d1ebb2e4/attachment-0001.txt

written 12.3 years ago by Ugo Borello ▴ 340


I can see from the intermediate files that SpliceMap was stopped halfway through, before it could create the single sam file with spliced alignments.

QuasR tries to detect such cases in the child R process (one of the R processes spawned in your cluster object) and throws an error with a descriptive message. However, you do not get this error message. Rather, you get an error indicating that the parent R process lost its connection to the child R process.

It's hard to diagnose this from afar, so I'll have to guess. Could it be that the child R process is terminated and is therefore able neither to signal failure nor to communicate with the parent R process? Can you give more details about your setup, e.g. whether you are running a batch or queueing system that controls job execution?

Other things that may help to narrow down the problem are to rerun qAlign() on a subset of the dataset, or without a cluster object. It may also help to know a bit more about the sample you are trying to analyse (read length, read number, sequence file format).

Michael
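For the suggestion to rerun on a subset, a small test file can be cut from the original FASTQ with base R alone. This is a sketch under stated assumptions: the input is uncompressed and uses exactly four lines per read, `subset_fastq` is a hypothetical helper, and the toy file stands in for the real data. The helper also reports the read count and read-length range, which covers part of the sample information asked for above.

```r
# Copy the first `n_reads` records of an uncompressed FASTQ file.
# Assumes the standard layout of exactly four lines per read.
subset_fastq <- function(infile, outfile, n_reads = 100000) {
  lines <- readLines(infile, n = 4 * n_reads)
  stopifnot(length(lines) >= 4, length(lines) %% 4 == 0)
  writeLines(lines, outfile)
  seqs <- lines[seq(2, length(lines), by = 4)]   # sequence lines
  list(reads = length(seqs), read_length = range(nchar(seqs)))
}

# Toy input standing in for the real FASTQ file: 10 identical 8-base reads.
toy <- tempfile(fileext = ".fastq")
writeLines(rep(c("@read", "ACGTACGT", "+", "IIIIIIII"), times = 10), toy)

small <- tempfile(fileext = ".fastq")
stats <- subset_fastq(toy, small, n_reads = 3)
stats$reads
stats$read_length

# Then list the subset in a sample file (hypothetical name below) and run
# qAlign without a cluster object, so any error is raised in this session:
# proj <- qAlign("sampleFile_small.txt", genome = genomeName,
#                splicedAlignment = TRUE)
```

If the subset aligns cleanly without a cluster object, the problem is more likely related to resources or the job scheduler than to the data format.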

written 12.3 years ago by Michael Stadler ▴ 350


I will run more tests to understand where is the problem. But I don't know if, in the meantime, this could help: when I run qAlign from the R console on the server I get: > proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE, clObj=cl) alignment files missing - need to: create 1 genomic alignment(s) will start in ..9s..8s..7s..6s..5s..4s..3s..2s..1s Testing the compute nodes...OK Loading QuasR on the compute nodes...OK Available cores: nodeNames ccage014 4 Performing genomic alignments for 1 samples. See progress in the log file: /sps/inter/isc/uborello/input/QuasR_log_14c95dfa8d40.txt Error in checkForRemoteErrors(val) : one node produced an error: Error on ccage014 processing sample /sps/inter/isc/uborello/input/133het.fastq : error in evaluating the argument 'file' in selecting a method for function 'scanFaIndex': Error in value[[3L]](cond) : 'open' index failed file: /tmp/RtmpnzdpVP/file722251e6f03c.fa Calls: open ... tryCatch -> tryCatchList -> tryCatchOne -> <anonymous> And when I drop the argument ' splicedAlignment=TRUE' I get: Performing genomic alignments for 1 samples. 
See progress in the log file: /sps/inter/isc/uborello/input/QuasR_log_14c9af4e6be.txt sh: line 1: 7369 Aborted (core dumped) '/sps/inter/isc/uborello/software/lib64/R/library/Rbowtie/bowtie' '/sps/inter/isc/uborello/software/lib64/R/library/BSgenome.Mmusculus.U CSC.mm 10.Rbowtie/alignmentIndex/bowtieIndex' '/sps/inter/isc/uborello/input/133het.fastq' -m 1 --best --strata --phred33-quals -S -p 4 '/tmp/RtmpnzdpVP/133het.fastq7222669b51e6.sam' 2>&1 Error in checkForRemoteErrors(val) : one node produced an error: Error on ccage014 processing sample /sps/inter/isc/uborello/input/133het.fastq : bowtie failed to perform the alignments Thank you Ugo > From: Michael Stadler <michael.stadler at="" fmi.ch=""> > Date: Tue, 22 Oct 2013 11:50:05 +0200 > To: Ugo Borello <ugo.borello at="" inserm.fr="">, <bioconductor at="" r-project.org=""> > Subject: Re: [BioC] QuasR on Linux Cluster > > I can see from the intermediate files that SpliceMap was stopped halfway > through, before it could create the single sam file with spliced alignments. > > QuasR tries to detect such cases in the child R process (one of the R > processes spawned in your cluster object) and throws an error with a > descriptive message. However, you do not get this error message. Rather, > you get an error indicating that the parent R process lost it's > connection to the child R process. > > It's hard to get at this from far, so I'll have to wildly guess. Could > it be that the child R process is terminated and therefore neither able > to signal failure, nor to communicate with the parent R process? Can you > give more details about your setup, e.g. if you are running some batch > or queueing system that controls job execution? > > Other things that may help to narrow down the problem is to rerun > qAlign() on a subset of the dataset, or without a cluster object. It may > also help to know a bit more about the sample you try to analyse (read > length, read number, sequence file format). 
>
> Michael
>
>
> On 22.10.2013 10:43, Ugo Borello wrote:
>> Dear Michael,
>> I think that the disk space is not an issue; anyway, I will double check
>> with the administrator.
>>
>> I used 4 nodes and QuasR stopped at the .sam file. See the output files in
>> attachment.
>>
>> When I use less than 4 nodes, it stops at the beginning of the process:
>>
>> [1] "Writing BSgenome to disk on ccwsge0144 :
>> /scratch/4847271.1.huge/Rtmp7nHkpp/file5971727e49b5.fa"
>>
>> What am I missing?
>>
>> Thank you
>>
>> Ugo
>>
>>> From: Michael Stadler <michael.stadler at fmi.ch>
>>> Date: Mon, 21 Oct 2013 17:48:53 +0200
>>> To: Ugo Borello <ugo.borello at inserm.fr>, <bioconductor at r-project.org>
>>> Subject: Re: [BioC] QuasR on Linux Cluster
>>>
>>> Your cluster object seems functional now.
>>>
>>> Another possible problem could be available diskspace in R's tempdir().
>>> It is used by qAlign to temporarily store the uncompressed fastq files,
>>> the sam files and the bam files (and thus needs several-fold more free
>>> capacity than the size of your fastq.gz files). For more information,
>>> see vignette section 4.1 "File storage locations".
>>>
>>> If tempdir() is too small, you can redirect R's tempdir() by setting
>>> the TMPDIR environment variable, or just for one qAlign call by using
>>> the "cacheDir" parameter of qAlign.
>>>
>>> If you are sure that diskspace is not the issue, could you give qAlign()
>>> another try, using a cluster object with only 4 nodes to avoid any
>>> memory issues?
>>>
>>> Michael
>>>
>>> On 21.10.2013 15:09, Ugo Borello wrote:
>>>> Thank you Michael,
>>>> My bad, I am not able to find the QuasR_log at the moment. Anyway the last
>>>> step was the .sam file. QuasR was not proceeding in converting the .sam
>>>> file to a .bam file.
>>>> In attachment some other info on the running job before death.
>>>> Those refer to a case where cl<- makeCluster(1).
>>>>
>>>> I run your test and I got:
>>>> > library(parallel)
>>>> > cl<- makeCluster(detectCores())
>>>> > info<- parLapply(cl, seq_along(cl), function(i) Sys.info())
>>>> > info
>>>> [[1]]
>>>>                              sysname                              release
>>>>                              "Linux"                 "2.6.18-348.3.1.el5"
>>>>                              version                             nodename
>>>> "#1 SMP Tue Mar 5 13:19:32 EST 2013"                         "ccwsge0053"
>>>>                              machine                                login
>>>>                             "x86_64"                            "unknown"
>>>>                                 user                       effective_user
>>>>                           "uborello"                           "uborello"
>>>>
>>>> The same for the 32 nodes.
>>>>
>>>> Then I run:
>>>> > library(parallel)
>>>> > type <- if (exists("mcfork", mode="function")) "FORK" else "PSOCK"
>>>> > type
>>>> [1] "PSOCK"
>>>> > cores <- getOption("mc.cores", detectCores())
>>>> > cl <- makeCluster(cores, type=type)
>>>> > cl
>>>> socket cluster with 32 nodes on host 'localhost'
>>>> > results <- parLapply(cl, 1:100, sqrt)
>>>> > sum(unlist(results))
>>>> [1] 671.4629
>>>> > stopCluster(cl)
>>>>
>>>> I don't know if this could help.
>>>>
>>>> Any suggestions?
>>>>
>>>> Ugo
>>>>
>>>>> From: Michael Stadler <michael.stadler at fmi.ch>
>>>>> Date: Mon, 21 Oct 2013 11:30:27 +0200
>>>>> To: <bioconductor at r-project.org>
>>>>> Subject: Re: [BioC] QuasR on Linux Cluster
>>>>>
>>>>> Hi Ugo,
>>>>>
>>>>> On 18.10.2013 13:56, Ugo Borello wrote:
>>>>>> Hi all,
>>>>>> I am trying to use QuasR on a Linux Cluster: 1 machine/multiple cores.
>>>>>>
>>>>>> I run:
>>>>>> library(QuasR)
>>>>>> library(BSgenome.Mmusculus.UCSC.mm10)
>>>>>>
>>>>>> cl <- makeCluster(1)
>>>>>>
>>>>>> sampleFile <- "sampleFile.txt"
>>>>>>
>>>>>> genomeName <- "BSgenome.Mmusculus.UCSC.mm10"
>>>>>>
>>>>>> proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
>>>>>> clObj=cl)
>>>>>>
>>>>>> And I get
>>>>>> > proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
>>>>>> clObj=cl)
>>>>>> alignment files missing - need to:
>>>>>> create 1 genomic alignment(s)
>>>>>> Testing the compute nodes...OK
>>>>>> Loading QuasR on the compute nodes...OK
>>>>>> Available cores:
>>>>>> nodeNames
>>>>>> ccwsge0155
>>>>>>          1
>>>>>> Performing genomic alignments for 1 samples. See progress in the log
>>>>>> file:
>>>>>> /scratch/4401022.1.huge/QuasR_log_41394115a102.txt
>>>>>> Error in unserialize(node$con) : error reading from connection
>>>>>> Calls: qAlign ... FUN -> recvData -> recvData.SOCKnode -> unserialize
>>>>>> Execution halted
>>>>>
>>>>> The error that you get is not created within QuasR; my guess is that it
>>>>> comes from the "parallel" package, indicating that something goes wrong
>>>>> when using your cluster object "cl".
>>>>>
>>>>> I would suggest testing whether your cluster object works fine. It would
>>>>> help to know if the error message appears immediately after you call
>>>>> qAlign(), or if it takes some time to process. Also, it would be great
>>>>> to see the content of the QuasR log file.
>>>>>
>>>>> Here is a simple test you could try to check your cluster
>>>>> object/connection:
>>>>> parLapply(cl, seq_along(cl), function(i) Sys.info())
>>>>>
>>>>> As a result, you should get Sys.info() output from each of the cluster
>>>>> nodes.
>>>>>
>>>>>> I also tried to modify the multicore option
>>>>>>
>>>>>> cl <- makeCluster(detectCores())
>>>>>>
>>>>>> And my job is killed because it uses more memory ( Max vmem = 17.118G)
>>>>>> than allowed (16G)
>>>>>
>>>>> With splicedAlignment=TRUE, QuasR will run spliceMap for aligning your
>>>>> reads, which may require several GB of memory per node in your cluster
>>>>> object. You can avoid the memory overflow by reducing the number of
>>>>> nodes in your cluster object, e.g. by:
>>>>>
>>>>> cl <- makeCluster(4)
>>>>>
>>>>> which should run through on your machine with 16GB of memory.
>>>>>
>>>>> Best,
>>>>> Michael
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
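The storage advice quoted above (check the space behind tempdir(), or point qAlign at another location through its "cacheDir" parameter) could be tried along these lines. This is only a sketch under assumptions: `free_gib` and `cache_dir` are hypothetical names, the free-space figure is parsed from the POSIX `df -Pk` command (so Linux/Unix only), and the cache directory is created under tempdir() purely so the example runs anywhere. In practice it would be a directory on a filesystem with plenty of room. Note that TMPDIR has to be set before R starts, since tempdir() is fixed at startup.

```r
# Free space (GiB) on the filesystem holding `dir`, parsed from `df -Pk`.
# Hypothetical helper, not part of QuasR.
free_gib <- function(dir) {
  out <- system2("df", c("-Pk", shQuote(dir)), stdout = TRUE)
  fields <- strsplit(trimws(out[length(out)]), "[[:space:]]+")[[1]]
  as.numeric(fields[4]) / 1024^2   # column 4 = available 1K-blocks
}

# Placeholder location; replace with a directory on a large filesystem.
cache_dir <- file.path(tempdir(), "quasr_cache")
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
stopifnot(file.access(cache_dir, mode = 2) == 0)   # must be writable

message(sprintf("tempdir(): %.1f GiB free; cache_dir: %.1f GiB free",
                free_gib(tempdir()), free_gib(cache_dir)))

# With QuasR loaded and sampleFile, genomeName and cl defined as in the thread:
# proj <- qAlign(sampleFile, genome = genomeName, splicedAlignment = TRUE,
#                clObj = cl, cacheDir = cache_dir)
```

If the reported free space is not several-fold larger than the fastq input, that would be consistent with the disk-space explanation given in the quoted message.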

ADD REPLY • link 12.3 years ago Ugo Borello ▴ 340


Thanks Ugo, running the script directly on the server was a good idea -
something seems to have eaten the error messages before.

On 22.10.2013 15:08, Ugo Borello wrote:
> I will run more tests to understand where the problem is.
>
> But I don't know if, in the meantime, this could help:
> when I run qAlign from the R console on the server I get:
>
> > proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
> clObj=cl)
> alignment files missing - need to:
> create 1 genomic alignment(s)
> will start in ..9s..8s..7s..6s..5s..4s..3s..2s..1s
> Testing the compute nodes...OK
> Loading QuasR on the compute nodes...OK
> Available cores:
> nodeNames
> ccage014
>        4
> Performing genomic alignments for 1 samples. See progress in the log file:
> /sps/inter/isc/uborello/input/QuasR_log_14c95dfa8d40.txt
> Error in checkForRemoteErrors(val) :
> one node produced an error: Error on ccage014 processing sample
> /sps/inter/isc/uborello/input/133het.fastq : error in evaluating the
> argument 'file' in selecting a method for function 'scanFaIndex': Error in
> value[[3L]](cond) : 'open' index failed
> file: /tmp/RtmpnzdpVP/file722251e6f03c.fa
> Calls: open ... tryCatch -> tryCatchList -> tryCatchOne -> <anonymous>

The failure is in Rsamtools::scanFaIndex, which cannot open
/tmp/RtmpnzdpVP/file722251e6f03c.fa or its index. As mentioned before,
my guess is that /tmp/ has run out of disk space. Could you check that?

> And when I drop the argument 'splicedAlignment=TRUE' I get:
>
> Performing genomic alignments for 1 samples. See progress in the log file:
> /sps/inter/isc/uborello/input/QuasR_log_14c9af4e6be.txt
> sh: line 1: 7369 Aborted (core dumped)
> '/sps/inter/isc/uborello/software/lib64/R/library/Rbowtie/bowtie'
> '/sps/inter/isc/uborello/software/lib64/R/library/BSgenome.Mmusculus.UCSC.mm10.Rbowtie/alignmentIndex/bowtieIndex'
> '/sps/inter/isc/uborello/input/133het.fastq' -m 1 --best --strata
> --phred33-quals -S -p 4 '/tmp/RtmpnzdpVP/133het.fastq7222669b51e6.sam' 2>&1
> Error in checkForRemoteErrors(val) :
> one node produced an error: Error on ccage014 processing sample
> /sps/inter/isc/uborello/input/133het.fastq : bowtie failed to perform the
> alignments

Assuming that bowtie and the genome index are fine, this could also be
related to the disk being full, since it fails to open the output file
/tmp/RtmpnzdpVP/133het.fastq7222669b51e6.sam

Michael

>> From: Michael Stadler <michael.stadler at fmi.ch>
>> Date: Tue, 22 Oct 2013 11:50:05 +0200
>> To: Ugo Borello <ugo.borello at inserm.fr>, <bioconductor at r-project.org>
>> Subject: Re: [BioC] QuasR on Linux Cluster
>>
>> I can see from the intermediate files that SpliceMap was stopped halfway
>> through, before it could create the single sam file with spliced alignments.
>>
>> QuasR tries to detect such cases in the child R process (one of the R
>> processes spawned in your cluster object) and throws an error with a
>> descriptive message. However, you do not get this error message. Rather,
>> you get an error indicating that the parent R process lost its
>> connection to the child R process.
>>
>> It's hard to get at this from afar, so I'll have to wildly guess. Could
>> it be that the child R process is terminated and therefore neither able
>> to signal failure, nor to communicate with the parent R process? Can you
>> give more details about your setup, e.g. if you are running some batch
>> or queueing system that controls job execution?
>>
>> Other things that may help to narrow down the problem are to rerun
>> qAlign() on a subset of the dataset, or without a cluster object. It may
>> also help to know a bit more about the sample you try to analyse (read
>> length, read number, sequence file format).
>>
>> Michael
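The suggestion in the quoted message to rerun qAlign() on a subset of the data and without a cluster object might look roughly like this. It is a sketch under assumptions rather than a tested QuasR recipe: `subset_fastq` is a hypothetical helper, the input is assumed to be an uncompressed FASTQ file with the standard 4 lines per read, the three toy reads are made up so the example runs on its own, and the sample file uses the tab-delimited FileName/SampleName layout that QuasR expects for single-end reads.

```r
# Copy the first `n_reads` records of an uncompressed FASTQ file
# (assumes exactly 4 lines per record). Hypothetical helper.
subset_fastq <- function(infile, outfile, n_reads = 100000L) {
  con <- file(infile, "r")
  on.exit(close(con))
  writeLines(readLines(con, n = 4L * n_reads), outfile)
  invisible(outfile)
}

# Toy input standing in for the real fastq: three made-up reads.
toy <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGT", "+", "IIII",
             "@r2", "TTGA", "+", "IIII",
             "@r3", "GGCC", "+", "IIII"), toy)
small <- subset_fastq(toy, tempfile(fileext = ".fastq"), n_reads = 2L)
length(readLines(small))   # 8 lines = 2 reads

# Sample file pointing at the reduced fastq (tab-delimited, with header).
sample_small <- tempfile(fileext = ".txt")
write.table(data.frame(FileName = small, SampleName = "test"),
            sample_small, sep = "\t", quote = FALSE, row.names = FALSE)

# With QuasR loaded and genomeName defined as in the thread; no clObj,
# so the alignment runs in the current R process:
# proj <- qAlign(sample_small, genome = genomeName)
```

Running without `clObj` keeps everything in one R process, so an error raised during alignment should reach the console directly instead of being lost with a worker connection.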

ADD REPLY • link 12.3 years ago Michael Stadler ▴ 350
