summarizeOverlaps bplapply error in DeSeq2 pipeline
0
0
Entering edit mode
cavery • 0
@cavery-20903
Last seen 5.6 years ago
University of Utah

Hello all,

Hoping to get some fresh insight on the error detailed below.

I've tried suggestions raised in other forums, including adding BPPARAM=SerialParam() to the summarizeOverlaps command, which consistently crashes R Studio. Without defining BPPARAM, the process appears to be split between the six samples. If I run the command on just one .bam, R studio crashes.

What's even more curious is that my pipeline worked a few weeks ago when I was working with a different set of .bams. I tried to go back to those .bams and my script no longer works (same error as detailed below). As far as I know, the server I'm running R on has not been changed in any significant way.

I'm at a total loss and welcome any suggestions.

Thanks


library("Rsamtools")
library("GenomicFeatures")

gtffile <- file.path("/ybod2/cavery/Resources/gencode.v30.annotation.gtf")
txdb <- makeTxDbFromGFF(gtffile, format = "gtf", circ_seqs = character())
ebg <- exonsBy(txdb, by="gene")

library("GenomicAlignments")

bamX1 <- BamFile("/ybod2/cavery/Bin/STAR-2.7.0f/bin/Linux_x86_64/15735X1_190404_A00421_0051_AHK3J5DSXX_S2Aligned.samtoolsout.bam", index="/ybod2/cavery/Bin/STAR-2.7.0f/bin/Linux_x86_64/15735X1_190404_A00421_0051_AHK3J5DSXX_S2Aligned.samtoolsout.bam.bai", asMates=TRUE, yieldSize=100000)

...

bamlist <- BamFileList(bamX1, bamX2, bamX3, bamX4, bamX5, bamX6, yieldSize=100000)

se <- summarizeOverlaps(features=ebg, reads=bamlist, mode="Union", singleEnd=FALSE, ignore.strand=FALSE, fragments=TRUE)

**Error: 'bplapply' receive data failed:
  error reading from connection**

*> traceback()*
16: stop(.error_worker_comm(e, sprintf("'%s' receive data failed", 
        id)))
15: value[[3L]](cond)
14: tryCatchOne(expr, names, parentenv, handlers[[1L]])
13: tryCatchList(expr, classes, parentenv, handlers)
12: tryCatch({
        parallel:::recvOneData(cluster)
    }, error = function(e) {
        stop(.error_worker_comm(e, sprintf("'%s' receive data failed", 
            id)))
    })

*> sessionInfo()*
R version 3.5.2 (2018-12-20)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GenomicAlignments_1.18.1    SummarizedExperiment_1.12.0 DelayedArray_0.8.0         
 [4] BiocParallel_1.16.6         matrixStats_0.54.0          GenomicFeatures_1.34.8     
 [7] AnnotationDbi_1.44.0        Biobase_2.42.0              Rsamtools_1.34.1           
[10] Biostrings_2.50.2           XVector_0.22.0              GenomicRanges_1.34.0       
[13] GenomeInfoDb_1.18.2         IRanges_2.16.0              S4Vectors_0.20.1           
[16] BiocGenerics_0.28.0        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1             compiler_3.5.2         prettyunits_1.0.2      bitops_1.0-6          
 [5] tools_3.5.2            zlibbioc_1.28.0        progress_1.2.2         biomaRt_2.38.0        
 [9] digest_0.6.19          bit_1.1-14             lattice_0.20-38        RSQLite_2.1.1         
[13] memoise_1.1.0          pkgconfig_2.0.2        rlang_0.3.4            Matrix_1.2-15         
[17] DBI_1.0.0              rstudioapi_0.10        GenomeInfoDbData_1.2.0 rtracklayer_1.42.2    
[21] stringr_1.4.0          httr_1.4.0             hms_0.4.2              grid_3.5.2            
[25] bit64_0.9-7            R6_2.4.0               XML_3.98-1.19          blob_1.1.1            
[29] magrittr_1.5           assertthat_0.2.1       stringi_1.4.3          RCurl_1.95-4.12       
[33] crayon_1.3.4
summarizeOverlaps R GenomicAlignments • 1.7k views
ADD COMMENT
0
Entering edit mode

Can you confirm that this fails with BPPARAM=SerialParam(), or BiocParallel::register(BiocParallel::SerialParam()) ? summarizeOverlaps() should just revert to doing an lapply() in this situation.

ADD REPLY
0
Entering edit mode

I'm getting a new error that I'm not sure how to interpret...

> se <- summarizeOverlaps(features=ebg, reads=bamlist, mode="Union", singleEnd=FALSE, ignore.strand=FALSE, fragments=TRUE, BiocParallel::register(BiocParallel::SerialParam()))

**Error: BiocParallel errors
  element index: 1, 2, 3, 4, 5, 6
  first error: argument is not interpretable as logical**

> traceback()
9: stop(.error_bplist(res))
8: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = BPPARAM)
7: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = BPPARAM)
6: bplapply(setNames(seq_along(reads), names(reads)), function(i, 
       FUN, reads, features, mode, ignore.strand, inter.feature, 
       param, preprocess.reads, ...) {
       bf <- reads[[i]]
       .countWithYieldSize(FUN, features, bf, mode, ignore.strand, 
           inter.feature, param, preprocess.reads, ...)
   }, FUN, reads, features, mode = match.fun(mode), ignore.strand = ignore.strand, 
       inter.feature = inter.feature, param = param, preprocess.reads = preprocess.reads, 
       ...)
5: bplapply(setNames(seq_along(reads), names(reads)), function(i, 
       FUN, reads, features, mode, ignore.strand, inter.feature, 
       param, preprocess.reads, ...) {
       bf <- reads[[i]]
       .countWithYieldSize(FUN, features, bf, mode, ignore.strand, 
           inter.feature, param, preprocess.reads, ...)
   }, FUN, reads, features, mode = match.fun(mode), ignore.strand = ignore.strand, 
       inter.feature = inter.feature, param = param, preprocess.reads = preprocess.reads, 
       ...)
4: .dispatchBamFiles(features, reads, mode, ignore.strand, inter.feature = inter.feature, 
       singleEnd = singleEnd, fragments = fragments, param = param, 
       preprocess.reads = preprocess.reads, ...)
3: .local(features, reads, mode, ignore.strand, ...)
2: summarizeOverlaps(features = ebg, reads = bamlist, mode = "Union", 
       singleEnd = FALSE, ignore.strand = FALSE, fragments = TRUE, 
       BiocParallel::register(BiocParallel::SerialParam()))
1: summarizeOverlaps(features = ebg, reads = bamlist, mode = "Union", 
       singleEnd = FALSE, ignore.strand = FALSE, fragments = TRUE, 
       BiocParallel::register(BiocParallel::SerialParam()))
ADD REPLY
0
Entering edit mode

I wasn't clear; the command BiocParallel::register(BiocParallel::SerialParam() belongs outside the function call. Also, when submitting questions / comments / answers you can select your R code and click the icon with 0's and 1's to format it in a more legible manner.

ADD REPLY
0
Entering edit mode

Thank you for the clarification. Upon correction and re-running summarizeOverlaps, my session is aborted with a forced restart of the session so I am unable to provide any specific error messages...

ADD REPLY
0
Entering edit mode

Thank you for the clarification. Upon correction and re-running summarizeOverlaps, my session is aborted with a forced restart of the session so I am unable to provide any specific error messages...

ADD REPLY
0
Entering edit mode

are you running out of memory? if so try reducing yield size. Also, make sure that you are working in an R session that does not have extraneous packages attached. And make sure that you have a 'valid' Bioconductor installation, as returned by BiocManager::valid().

ADD REPLY
0
Entering edit mode

I have been monitoring memory usage in the command line and it doesn't appear to be using much memory at all. But I tried reducing yield size all the way down to 10,000 and still running into the same bplapply error. BiocManager::valid() did reveal two packages that were out of date and I promptly updated them, but still no luck.

ADD REPLY
0
Entering edit mode

It's a little confusing when you say 'still no luck', since you have been unlucky in three different ways (bplapply receive data failure; argument is not interpretable as logical; session aborted).

Can you confirm that

se <- summarizeOverlaps(
    features=ebg, reads=bamlist, mode="Union", 
    singleEnd=FALSE, ignore.strand=FALSE, fragments=TRUE,
    BPPARAM = SerialParam()
)

results in an error

**Error: 'bplapply' receive data failed:
  error reading from connection**

This is not expected at all, because with SerialParam there is no connection involved, it is just an lapply over each file.

I guess you managed to use SerialParam and the session aborted with no error message. This implies a fundamental problem reading the BAM file. The code does this using something like

open(bamX1)
repeat {
    aln <- readGAlignmentsList(bamX1)
    if (length(aln) == 0L)
        break
}
close(bamX1)

You could try running this for each BAM file.

ADD REPLY
0
Entering edit mode

Yes, I can confirm that attempting to use SerialParam resulted in the session being terminated with no error message. I decided to remake the BAM files and now summarizeOverlaps completes successfully. It's unclear what the error with the files was, but regenerating the BAMs seems to have corrected the issue.

Thank you for all of your help- I was completely stumped by the errors I was receiving.

ADD REPLY

Login before adding your answer.

Traffic: 612 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6