Problem using >3 clusters with QuasR
rna seq ▴ 90
@rna-seq-4145
Last seen 8.3 years ago

Hello,

I am trying to run the Bioconductor QuasR package on multiple nodes. However, when I use more than 3 nodes, e.g. with the command "cl<-makeCluster(4)", I get the following error:

Error in checkForRemoteErrors(val) :
  one node produced an error: Error on ip-172-31-10-100 processing sample /home/ubuntu/RtmpFNTEl1/M1_L001_R1_001.fastq.gz6bcc701726.fastq : failed to open SAM/BAM file.

After running the following commands:

library(QuasR)
library(snow)
library(parallel)
cl<-makeCluster(4)
sampleFile2<-("sample_file.txt")

genomeFile<-("reference_genome.fa")

proj2 <- qAlign(sampleFile2, genomeFile, paired="no", cacheDir="/home/ubuntu",clObj=cl)

From reading threads about similar errors on this list, the explanation that makes the most sense to me is that QuasR is losing its connection to the child R process, or that multiple nodes are trying to access or write SAM files in the same folder.
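One way to probe the second guess is to ask every worker where it runs and whether it can write to its own temporary directory. This is only a minimal sketch using the parallel package: it uses 2 local workers instead of 4, the names probe, node_info and node_table are made up for illustration, and it does not reproduce what QuasR does internally.

```r
library(parallel)

# Start a small local cluster (2 workers here; the thread uses 4).
cl <- makeCluster(2)

# Ask every worker for its host name, process id, temp directory,
# and whether it can create a file in that temp directory.
node_info <- clusterEvalQ(cl, {
  probe <- tempfile(pattern = "probe_", fileext = ".txt")
  ok <- tryCatch({
    writeLines("test", probe)
    file.exists(probe)
  }, error = function(e) FALSE)
  unlink(probe)
  list(host = Sys.info()[["nodename"]],
       pid = Sys.getpid(),
       tmp = tempdir(),
       writable = ok)
})

# Combine the per-worker answers into one row per worker.
node_table <- do.call(rbind, lapply(node_info, as.data.frame))
print(node_table)

stopCluster(cl)
```

If all workers report the same host but different temporary directories, a clash between workers writing into one shared temp folder becomes less likely, although directories passed explicitly (such as cacheDir) are still shared.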

quasr parallel • 2.5k views

Hi

I am not really familiar with the parallel packages, so I am just adding a comment:

Have you tried:

cl <- makePSOCKcluster(4)

 

Also, can you confirm that the alignment worked for M1_L001_R1_001.fastq.gz6bcc701726.fastq when using 3 nodes?

And it will be easier to help you if you include the output of sessionInfo().

 

Regards, Hans-Rudolf

 


Have you tried:

cl <- makePSOCKcluster(4)

Yes, I get the same error.

Also, can you confirm that the alignment worked for M1_L001_R1_001.fastq.gz6bcc701726.fastq when using 3 nodes?

Yes, the alignment worked.

Here is my session info:

> sessionInfo()

R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] snow_0.4-1           QuasR_1.10.0         Rbowtie_1.10.0      
[4] GenomicRanges_1.22.3 GenomeInfoDb_1.6.1   IRanges_2.4.6       
[7] S4Vectors_0.8.7      BiocGenerics_0.16.1

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.32.3       XVector_0.10.0            
 [3] GenomicAlignments_1.6.3    zlibbioc_1.16.0           
 [5] BiocParallel_1.4.3         lattice_0.20-33           
 [7] BSgenome_1.38.0            hwriter_1.3.2             
 [9] tools_3.2.3                grid_3.2.3                
[11] SummarizedExperiment_1.0.2 Biobase_2.30.0            
[13] DBI_0.3.1                  latticeExtra_0.6-26       
[15] lambda.r_1.1.7             futile.logger_1.4.1       
[17] GenomicFiles_1.6.2         RColorBrewer_1.1-2        
[19] rtracklayer_1.30.1         futile.options_1.0.0      
[21] bitops_1.0-6               biomaRt_2.26.1            
[23] RCurl_1.95-4.7             RSQLite_1.0.0             
[25] BiocInstaller_1.20.1       GenomicFeatures_1.22.8    
[27] Rsamtools_1.22.0           Biostrings_2.38.3         
[29] ShortRead_1.28.0           XML_3.98-1.3          

Can you run other things (i.e., not R) on more than 3 nodes?

And just to double check: what happens when you load only QuasR and do not load snow?

Hans-Rudolf

 


I can do

cl<-makeCluster(14)

myfunc <- function(x=2){x+1}

> myfunc_argument <- 5


I get the same error if I don't load snow

clusterCall(cl, myfunc, myfunc_argument)

 


I am sorry, your last message is a little bit confusing to me (the lines look mixed up):

do you get the error when calling "clusterCall(cl, myfunc, myfunc_argument)" ?

- if 'yes', it is not a QuasR issue,

- if 'no', could you please provide a reproducible example so that the QuasR developers/maintainers can look into it.
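The two-way check above can be sketched roughly as follows. This is only an illustration under assumptions: it uses 2 local workers, the names basic_ok and quasr_ok are made up, and the second step only tests whether each worker can find the QuasR package, not whether qAlign itself would succeed.

```r
library(parallel)

cl <- makeCluster(2)

# Step 1: a trivial function call on every worker.
# A failure here points at the cluster setup rather than at QuasR.
basic_ok <- tryCatch({
  res <- clusterCall(cl, function(x) x + 1, 5)
  all(unlist(res) == 6)
}, error = function(e) FALSE)

# Step 2: can each worker find the QuasR package?
# requireNamespace() returns FALSE (no error) if the package is missing.
quasr_ok <- unlist(clusterEvalQ(cl, requireNamespace("QuasR", quietly = TRUE)))

stopCluster(cl)

cat("basic cluster call works:", basic_ok, "\n")
cat("QuasR available on each worker:", quasr_ok, "\n")
```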

Regards, Hans-Rudolf

 

 


The cluster call works fine:

clusterCall(cl, myfunc, myfunc_argument)

No error.

As you say, I think the problem is with QuasR. The following code:

library(QuasR)
library(parallel)
cl<-makeCluster(4)
sampleFile2<-("sample_file.txt")

genomeFile<-("reference_genome.fa")

proj2 <- qAlign(sampleFile2, genomeFile, paired="no", cacheDir="/home/ubuntu",clObj=cl)

produces the error:

alignment files missing - need to:
    create alignment index for the genome
    create 6 genomic alignment(s)
will start in ..9s..8s..7s..6s..5s..4s..3s..2s..1s
Creating an Rbowtie index for /home/ubuntu/bios0087/reference_tox_0.3_leiden.fa
Finished creating index
Testing the compute nodes...FAILED
Error: The cluster object does not work properly on this system. Please consult the manual of the package 'parallel'

 

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] QuasR_1.10.0         Rbowtie_1.10.0       GenomicRanges_1.22.3
[4] GenomeInfoDb_1.6.1   IRanges_2.4.6        S4Vectors_0.8.7     
[7] BiocGenerics_0.16.1

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.32.3       XVector_0.10.0            
 [3] GenomicAlignments_1.6.3    zlibbioc_1.16.0           
 [5] BiocParallel_1.4.3         lattice_0.20-33           
 [7] BSgenome_1.38.0            hwriter_1.3.2             
 [9] tools_3.2.3                grid_3.2.3                
[11] SummarizedExperiment_1.0.2 Biobase_2.30.0            
[13] DBI_0.3.1                  latticeExtra_0.6-26       
[15] lambda.r_1.1.7             futile.logger_1.4.1       
[17] GenomicFiles_1.6.2         RColorBrewer_1.1-2        
[19] rtracklayer_1.30.1         futile.options_1.0.0      
[21] bitops_1.0-6               biomaRt_2.26.1            
[23] RCurl_1.95-4.7             RSQLite_1.0.0             
[25] BiocInstaller_1.20.1       GenomicFeatures_1.22.8    
[27] Rsamtools_1.22.0           Biostrings_2.38.3         
[29] ShortRead_1.28.0           XML_3.98-1.3  

Your example isn't reproducible, because 'we' do not have access to sample_file.txt, reference_genome.fa, or your FASTQ files. The following is modified from ?qCount.

library(QuasR)
library(parallel)

file.copy(system.file(package="QuasR", "extdata"), ".", recursive=TRUE)
genomeFile <- "extdata/hg19sub.fa"
sampleFile <- "extdata/samples_rna_paired.txt"
     
cl<-makeCluster(4)
proj <- qAlign(sampleFile, genomeFile, splicedAlignment=TRUE,
               alignmentsDir="/tmp", clObj=cl)

It uses files that come with QuasR, and so is reproducible. The output is

> proj <- qAlign(sampleFile, genomeFile, splicedAlignment=TRUE,
+                alignmentsDir="/tmp", clObj=cl)
alignment files missing - need to:
    create 2 genomic alignment(s)
will start in ..9s..8s..7s..6s..5s..4s..3s..2s..1s
Testing the compute nodes...OK
Loading QuasR on the compute nodes...OK
Available cores:
nodeNames
HP-ZB 
    4 
Performing genomic alignments for 2 samples. See progress in the log file:
/tmp/QuasR_log_4c946b99a8af.txt
Genomic alignments have been created successfully

> 

Does this example work for you? If so, then try to modify it, piece by piece, to be more like your data, until it no longer works for you. This will help to identify what the problem is. If you cannot narrow the problem, then you will have to arrange, somehow, to make a reproducible example from your own data. The worst case would be to share your data, but probably you can create small bam files / reference genomes that illustrate the problem.
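As a rough sketch of how a small reference genome could be carved out of a larger one for such a test, the base R helper below writes the first n records of a FASTA file to a new file. The function name subset_fasta and the demo file contents are invented for illustration; for large genomes a dedicated package would be more appropriate than readLines().

```r
# Write the first `n` records of a FASTA file to `outfile` (base R only).
subset_fasta <- function(infile, outfile, n) {
  lines <- readLines(infile)
  # Record index for every line: increases by one at each header line.
  record <- cumsum(grepl("^>", lines))
  keep <- record >= 1 & record <= n
  writeLines(lines[keep], outfile)
  # Return the number of records actually written.
  sum(grepl("^>", lines[keep]))
}

# Small demonstration on a made-up three-record FASTA file.
demo_in <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "ACGT", ">seq2", "GGCC", "TTAA", ">seq3", "AAAA"), demo_in)
demo_out <- tempfile(fileext = ".fa")
n_written <- subset_fasta(demo_in, demo_out, 2)
```

Running qAlign against subsets of increasing size would then show at which point, if any, the alignment starts to fail.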


Thanks Martin,

Your code works fine.

I think I have identified the problem with my code. When my genomeFile is small (~500 sequences), the code works fine. When my genomeFile has more than 1200 sequences, I get the error:

 

  one node produced an error: Error on ip-172-31-10-100 processing sample /tmp/Rtmp4VPRy5/MAQC-1_S14_L001_R1_001.fastq.gzfbb2dbd9863.fastq : failed to open SAM/BAM file
  file: '/tmp/Rtmp4VPRy5/samToBam_fbb1927e6ca/5982_RPS18.sam'

 

Below is the code that generates this:

library(QuasR)
library(parallel)

cl <- makeCluster(4)

sampleFile2 <- "sample_file.txt"
genomeFile <- "ref_1250.fa"

proj2 <- qAlign(sampleFile2, genomeFile, paired="no", alignmentsDir="/tmp", clObj=cl)
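Since the failing file name looks like one SAM file per reference sequence, one unconfirmed guess (an assumption, not something established in this thread) is that the number of sequences runs into the limit on open files per process. A minimal sketch for comparing the two numbers on a Unix-like system follows; count_fasta_records and the demo file are made up for illustration.

```r
# Count FASTA records (header lines starting with ">") using base R.
count_fasta_records <- function(path) {
  sum(grepl("^>", readLines(path)))
}

# Soft limit on open files for processes started from this R session.
# Unix-like systems only: "ulimit" is a shell builtin run via system().
open_file_limit <- system("ulimit -n", intern = TRUE)

# Demonstration on a made-up two-record FASTA file;
# replace `demo` with the real reference to compare the numbers.
demo <- tempfile(fileext = ".fa")
writeLines(c(">a", "ACGT", ">b", "GGCC"), demo)
n_seqs <- count_fasta_records(demo)

cat("sequences:", n_seqs, " open-file limit:", open_file_limit, "\n")
```

If the number of sequences in the real reference is close to or above the reported limit, that would be worth mentioning in a follow-up question; if it is far below, this guess can be set aside.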



Reposting new question under title: "error when genomeFile too big when running Quasr on >3 clusters"
