Hello all,
Hoping to get some fresh insight on the error detailed below.
I've tried suggestions raised in other forums, including adding BPPARAM=SerialParam() to the summarizeOverlaps command, which consistently crashes R Studio. Without defining BPPARAM, the process appears to be split between the six samples. If I run the command on just one .bam, R studio crashes.
What's even more curious is that my pipeline worked a few weeks ago when I was working with a different set of .bams. I tried to go back to those .bams and my script no longer works (same error as detailed below). As far as I know, the server I'm running R on has not been changed in any significant way.
I'm at a total loss and welcome any suggestions.
Thanks
library("Rsamtools")
library("GenomicFeatures")
gtffile <- file.path("/ybod2/cavery/Resources/gencode.v30.annotation.gtf")
txdb <- makeTxDbFromGFF(gtffile, format = "gtf", circ_seqs = character())
ebg <- exonsBy(txdb, by="gene")
library("GenomicAlignments")
bamX1 <- BamFile("/ybod2/cavery/Bin/STAR-2.7.0f/bin/Linux_x86_64/15735X1_190404_A00421_0051_AHK3J5DSXX_S2Aligned.samtoolsout.bam", index="/ybod2/cavery/Bin/STAR-2.7.0f/bin/Linux_x86_64/15735X1_190404_A00421_0051_AHK3J5DSXX_S2Aligned.samtoolsout.bam.bai", asMates=TRUE, yieldSize=100000)
...
bamlist <- BamFileList(bamX1, bamX2, bamX3, bamX4, bamX5, bamX6, yieldSize=100000)
se <- summarizeOverlaps(features=ebg, reads=bamlist, mode="Union", singleEnd=FALSE, ignore.strand=FALSE, fragments=TRUE)
**Error: 'bplapply' receive data failed:
error reading from connection**
*> traceback()*
16: stop(.error_worker_comm(e, sprintf("'%s' receive data failed",
id)))
15: value[[3L]](cond)
14: tryCatchOne(expr, names, parentenv, handlers[[1L]])
13: tryCatchList(expr, classes, parentenv, handlers)
12: tryCatch({
parallel:::recvOneData(cluster)
}, error = function(e) {
stop(.error_worker_comm(e, sprintf("'%s' receive data failed",
id)))
})
*> sessionInfo()*
R version 3.5.2 (2018-12-20)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicAlignments_1.18.1 SummarizedExperiment_1.12.0 DelayedArray_0.8.0
[4] BiocParallel_1.16.6 matrixStats_0.54.0 GenomicFeatures_1.34.8
[7] AnnotationDbi_1.44.0 Biobase_2.42.0 Rsamtools_1.34.1
[10] Biostrings_2.50.2 XVector_0.22.0 GenomicRanges_1.34.0
[13] GenomeInfoDb_1.18.2 IRanges_2.16.0 S4Vectors_0.20.1
[16] BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 compiler_3.5.2 prettyunits_1.0.2 bitops_1.0-6
[5] tools_3.5.2 zlibbioc_1.28.0 progress_1.2.2 biomaRt_2.38.0
[9] digest_0.6.19 bit_1.1-14 lattice_0.20-38 RSQLite_2.1.1
[13] memoise_1.1.0 pkgconfig_2.0.2 rlang_0.3.4 Matrix_1.2-15
[17] DBI_1.0.0 rstudioapi_0.10 GenomeInfoDbData_1.2.0 rtracklayer_1.42.2
[21] stringr_1.4.0 httr_1.4.0 hms_0.4.2 grid_3.5.2
[25] bit64_0.9-7 R6_2.4.0 XML_3.98-1.19 blob_1.1.1
[29] magrittr_1.5 assertthat_0.2.1 stringi_1.4.3 RCurl_1.95-4.12
[33] crayon_1.3.4
Can you confirm that this fails with
BPPARAM=SerialParam()
, orBiocParallel::register(BiocParallel::SerialParam())
?summarizeOverlaps()
should just revert to doing an lapply() in this situation.I'm getting a new error that I'm not sure how to interpret...
I wasn't clear; the command
BiocParallel::register(BiocParallel::SerialParam()
belongs outside the function call. Also, when submitting questions / comments / answers you can select your R code and click the icon with 0's and 1's to format it in a more legible manner.Thank you for the clarification. Upon correction and re-running summarizeOverlaps, my session is aborted with a forced restart of the session so I am unable to provide any specific error messages...
Thank you for the clarification. Upon correction and re-running summarizeOverlaps, my session is aborted with a forced restart of the session so I am unable to provide any specific error messages...
are you running out of memory? if so try reducing yield size. Also, make sure that you are working in an R session that does not have extraneous packages attached. And make sure that you have a 'valid' Bioconductor installation, as returned by
BiocManager::valid()
.I have been monitoring memory usage in the command line and it doesn't appear to be using much memory at all. But I tried reducing yield size all the way down to 10,000 and still running into the same bplapply error.
BiocManager::valid()
did reveal two packages that were out of date and I promptly updated them, but still no luck.It's a little confusing when you say 'still no luck', since you have been unlucky in three different ways (bplapply receive data failure; argument is not interpretable as logical; session aborted).
Can you confirm that
results in an error
This is not expected at all, because with SerialParam there is no connection involved, it is just an lapply over each file.
I guess you managed to use SerialParam and the session aborted with no error message. This implies a fundamental problem reading the BAM file. The code does this using something like
You could try running this for each BAM file.
Yes, I can confirm that attempting to use SerialParam resulted in the session being terminated with no error message. I decided to remake the BAM files and now summarizeOverlaps completes successfully. It's unclear what the error with the files was, but regenerating the BAMs seems to have corrected the issue.
Thank you for all of your help- I was completely stumped by the errors I was receiving.