rna seq
@rna-seq-4145
Hello,
I am trying to run the Bioconductor package QuasR on multiple nodes. However, when I use more than 3 nodes (e.g. "cl <- makeCluster(4)"), I get the following error:
Error in checkForRemoteErrors(val) : one node produced an error: Error on ip-172-31-10-100 processing sample /home/ubuntu/RtmpFNTEl1/M1_L001_R1_001.fastq.gz6bcc701726.fastq : failed to open SAM/BAM file.
After running the following commands:
library(QuasR)
library(snow)
library(parallel)
cl <- makeCluster(4)
sampleFile2 <- "sample_file.txt"
genomeFile <- "reference_genome.fa"
proj2 <- qAlign(sampleFile2, genomeFile, paired="no", cacheDir="/home/ubuntu", clObj=cl)
Having read threads about similar errors on this list, the explanation that makes the most sense to me is that QuasR is losing its connection to the child R process, or that multiple nodes are trying to read/write SAM files in the same folder.
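One way to probe the shared-folder hypothesis is to ask each worker where its temporary files go. The sketch below uses only the parallel package and makes no QuasR calls; it assumes a local cluster of 4 workers like the one above and relies on each R process having its own Rtmp* session directory from tempdir(). It does not show where QuasR itself places its intermediate SAM files, so treat it as a first sanity check rather than a diagnosis.

```r
library(parallel)

# Diagnostic sketch: start 4 local workers, as in the failing run
cl <- makeCluster(4)

# Each worker reports its host name, process ID and per-session temp directory
info <- clusterEvalQ(cl, c(host = Sys.info()[["nodename"]],
                           pid  = as.character(Sys.getpid()),
                           tmp  = tempdir()))
stopCluster(cl)

# Combine into one row per worker
info <- do.call(rbind, info)
print(info)

# TRUE when no two workers share a temp directory
distinct_tmp <- anyDuplicated(info[, "tmp"]) == 0
print(distinct_tmp)
```

If distinct_tmp is TRUE, the workers at least start out with separate scratch folders, which narrows down where a clash could come from.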
Hi
I am not really familiar with the parallel packages, so I am just adding a comment:
Have you tried:
cl <- makePSOCKcluster(4)
Also, can you confirm that the alignment worked for M1_L001_R1_001.fastq.gz6bcc701726.fastq when using 3 nodes?
It will also be easier to help you if you include your sessionInfo().
Regards, Hans-Rudolf
Have you tried:
cl <- makePSOCKcluster(4)
Yes, same error
also, can you confirm, that the alignment worked for M1_L001_R1_001.fastq.gz6bcc701726.fastq, when using 3 nodes?
Yes, the alignment worked.
Here is my session info:
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] snow_0.4-1 QuasR_1.10.0 Rbowtie_1.10.0
[4] GenomicRanges_1.22.3 GenomeInfoDb_1.6.1 IRanges_2.4.6
[7] S4Vectors_0.8.7 BiocGenerics_0.16.1
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.32.3 XVector_0.10.0
[3] GenomicAlignments_1.6.3 zlibbioc_1.16.0
[5] BiocParallel_1.4.3 lattice_0.20-33
[7] BSgenome_1.38.0 hwriter_1.3.2
[9] tools_3.2.3 grid_3.2.3
[11] SummarizedExperiment_1.0.2 Biobase_2.30.0
[13] DBI_0.3.1 latticeExtra_0.6-26
[15] lambda.r_1.1.7 futile.logger_1.4.1
[17] GenomicFiles_1.6.2 RColorBrewer_1.1-2
[19] rtracklayer_1.30.1 futile.options_1.0.0
[21] bitops_1.0-6 biomaRt_2.26.1
[23] RCurl_1.95-4.7 RSQLite_1.0.0
[25] BiocInstaller_1.20.1 GenomicFeatures_1.22.8
[27] Rsamtools_1.22.0 Biostrings_2.38.3
[29] ShortRead_1.28.0 XML_3.98-1.3
Can you run other things (i.e. not R) on more than 3 nodes?
And just to double-check: what happens when you load QuasR but don't load snow?
Hans-Rudolf
I can do:
library(parallel)
cl <- makeCluster(14)
myfunc <- function(x = 2) { x + 1 }
myfunc_argument <- 5
clusterCall(cl, myfunc, myfunc_argument)
I get the same error if I don't load snow.
I am sorry, your last message is a little bit confusing to me (the lines look mixed up):
Do you get the error when calling "clusterCall(cl, myfunc, myfunc_argument)"?
- if 'yes', it is not a QuasR issue;
- if 'no', could you please provide a reproducible example that the QuasR developers/maintainers can look into.
Regards, Hans-Rudolf
The cluster call works fine, with no error.
As you say, I think the problem is with QuasR.
produces the error:
Your example isn't reproducible, because 'we' do not have access to sample_file.txt, reference_genome.fa, or your FASTQ files. The following is modified from ?qCount. It uses files that come with QuasR, and so is reproducible. The output is
Does this example work for you? If so, then try to modify it, piece by piece, to be more like your data, until it no longer works for you. This will help to identify what the problem is. If you cannot narrow the problem, then you will have to arrange, somehow, to make a reproducible example from your own data. The worst case would be to share your data, but probably you can create small bam files / reference genomes that illustrate the problem.
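Following the suggestion to build small reference genomes, a FASTA file can be cut down to its first n sequences with base R. subset_fasta() is a hypothetical helper, not part of QuasR; it reads the whole file into memory (fine for a few thousand reference sequences), keeps sequences in file order, and drops any text before the first ">" header.

```r
# Hypothetical helper (not part of QuasR): copy the first `n` sequences of a
# FASTA file into a new file and return how many sequences were written.
subset_fasta <- function(in_path, out_path, n) {
  lines <- readLines(in_path)
  is_header <- grepl("^>", lines)
  # Index of the sequence each line belongs to (0 = text before the first ">")
  seq_index <- cumsum(is_header)
  keep <- seq_index >= 1 & seq_index <= n
  writeLines(lines[keep], out_path)
  sum(is_header[keep])
}

# Self-contained demo with three toy sequences
toy_in  <- tempfile(fileext = ".fa")
toy_out <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "ACGT", ">seq2", "GGCC", "TTAA", ">seq3", "AAAA"), toy_in)
n_written <- subset_fasta(toy_in, toy_out, 2)
print(n_written)
print(readLines(toy_out))
```

On real data, something like subset_fasta("reference_genome.fa", "ref_500.fa", 500) would give a 500-sequence reference to pass to qAlign() as genomeFile; repeating this with different n is one way to bracket the size at which the error appears. Biostrings, already loaded as a QuasR dependency, provides readDNAStringSet() and writeXStringSet() for the same job.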
Thanks Martin,
Your code works fine.
I think I have identified the problem with my code. When my genomeFile is small (~500 sequences), the code works fine. When my genome file has more than 1200 sequences, I get the error:
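To confirm how many sequences each reference actually contains (the ~500 vs. >1200 comparison above), the ">" header lines can be counted with base R. count_fasta_seqs() is a hypothetical helper; it assumes an uncompressed FASTA file in which every sequence starts with a ">" header line, and it reads in chunks so large files do not need to fit in memory.

```r
# Hypothetical helper: count sequences in a FASTA file by counting ">" header
# lines, reading `chunk` lines at a time from an open connection.
count_fasta_seqs <- function(path, chunk = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break
    n <- n + sum(grepl("^>", lines))
  }
  n
}

# Self-contained demo on a toy file with three sequences
toy_fa <- tempfile(fileext = ".fa")
writeLines(c(">chr1", "ACGT", ">chr2", "GG", ">chr3", "TT"), toy_fa)
n_seqs <- count_fasta_seqs(toy_fa)
print(n_seqs)
```

Calling count_fasta_seqs(genomeFile) before qAlign() records the exact size of the reference used in each test run.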
one node produced an error: Error on ip-172-31-10-100 processing sample /tmp/Rtmp4VPRy5/MAQC-1_S14_L001_R1_001.fastq.gzfbb2dbd9863.fastq : failed to open SAM/BAM file
file: '/tmp/Rtmp4VPRy5/samToBam_fbb1927e6ca/5982_RPS18.sam'
Below is the code that generates this:
library(QuasR)
library(parallel)
cl <- makeCluster(4)
sampleFile2 <- "sample_file.txt"
# reference genome with > 1200 sequences
genomeFile <- "ref_1250.fa"
proj2 <- qAlign(sampleFile2, genomeFile, paired="no", alignmentsDir="/tmp", clObj=cl)
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] QuasR_1.10.0 Rbowtie_1.10.0 GenomicRanges_1.22.3
[4] GenomeInfoDb_1.6.1 IRanges_2.4.6 S4Vectors_0.8.7
[7] BiocGenerics_0.16.1
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.32.3 XVector_0.10.0
[3] GenomicAlignments_1.6.3 zlibbioc_1.16.0
[5] BiocParallel_1.4.3 lattice_0.20-33
[7] BSgenome_1.38.0 hwriter_1.3.2
[9] tools_3.2.3 grid_3.2.3
[11] SummarizedExperiment_1.0.2 Biobase_2.30.0
[13] DBI_0.3.1 latticeExtra_0.6-26
[15] lambda.r_1.1.7 futile.logger_1.4.1
[17] GenomicFiles_1.6.2 RColorBrewer_1.1-2
[19] rtracklayer_1.30.1 futile.options_1.0.0
[21] bitops_1.0-6 biomaRt_2.26.1
[23] RCurl_1.95-4.7 RSQLite_1.0.0
[25] BiocInstaller_1.20.1 GenomicFeatures_1.22.8
[27] Rsamtools_1.22.0 Biostrings_2.38.3
[29] ShortRead_1.28.0 XML_3.98-1.3
Reposting new question under title: "error when genomeFile too big when running Quasr on >3 clusters"