When trying to create a SummarizedExperiment object from approx. 50 BAM files with GenomicAlignments::summarizeOverlaps, I get the following error:
Error in result[[njob]] <- value :
attempt to select less than one element in OneIndex
Calls: <Anonymous> ... bplapply -> bplapply -> bplapply -> bploop -> bploop.lapply
In addition: Warning message:
In parallel::mccollect(wait = FALSE, timeout = 1) :
1 parallel job did not deliver a
This error doesn't occur if I restrict the input to only 5 BAM files. I am running the following code on our HPC as a Slurm job with 250 GB of memory and 12 cores assigned to it:
# Create a transcript data base
txdb = GenomicFeatures::makeTxDbFromGFF(gtf_file)
# Collapse gene models into counting bins
exonic_parts = GenomicFeatures::exonicParts(txdb, linked.to.single.gene.only = TRUE)
# Create references to BAM files
bam_filelist = Rsamtools::BamFileList(bam_files, index=character(), yieldSize=100000, obeyQname=TRUE)
# Count the reads overlapping the bins
ese = GenomicAlignments::summarizeOverlaps(exonic_parts, bam_filelist, mode="Union", singleEnd=FALSE, ignore.strand=TRUE, inter.feature=FALSE, fragments=TRUE)
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)
Matrix products: default
BLAS/LAPACK: /opt/conda/envs/rnaseq/lib/libopenblasp-r0.3.10.so
locale:
[1] C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GenomicAlignments_1.24.0 Rsamtools_2.4.0
[3] Biostrings_2.56.0 XVector_0.28.0
[5] SummarizedExperiment_1.18.1 DelayedArray_0.14.0
[7] matrixStats_0.58.0 GenomicFeatures_1.40.0
[9] AnnotationDbi_1.50.0 Biobase_2.48.0
[11] GenomicRanges_1.40.0 GenomeInfoDb_1.24.0
[13] IRanges_2.22.1 S4Vectors_0.26.0
[15] BiocGenerics_0.34.0 dplyr_1.0.5
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 lattice_0.20-41 prettyunits_1.1.1
[4] assertthat_0.2.1 utf8_1.2.1 BiocFileCache_1.12.0
[7] R6_2.5.0 RSQLite_2.2.4 httr_1.4.2
[10] pillar_1.5.1 zlibbioc_1.34.0 rlang_0.4.10
[13] progress_1.2.2 curl_4.3 rstudioapi_0.13
[16] blob_1.2.1 Matrix_1.3-2 BiocParallel_1.22.0
[19] stringr_1.4.0 RCurl_1.98-1.3 bit_4.0.4
[22] biomaRt_2.44.0 compiler_4.0.2 rtracklayer_1.48.0
[25] pkgconfig_2.0.3 askpass_1.1 openssl_1.4.3
[28] tidyselect_1.1.0 tibble_3.1.0 GenomeInfoDbData_1.2.4
[31] XML_3.99-0.6 fansi_0.4.2 crayon_1.4.1
[34] dbplyr_2.1.0 bitops_1.0-6 rappdirs_0.3.3
[37] grid_4.0.2 lifecycle_1.0.0 DBI_1.1.1
[40] magrittr_2.0.1 stringi_1.4.6 cachem_1.0.4
[43] ellipsis_0.3.1 generics_0.1.0 vctrs_0.3.6
[46] tools_4.0.2 bit64_4.0.5 glue_1.4.2
[49] purrr_0.3.4 hms_1.0.0 fastmap_1.1.0
[52] memoise_2.0.0
[1] "gtf_file: /local/1385061/tmp.N2RyL48FNO/by_id/arpez-4zz18-y930zhe7xllejrh/gencode.v35.annotation.gtf.gz"
[1] "bam_dir: /local/1385061/tmp.N2RyL48FNO/by_id/arpez-j7d0g-hhwwhnva9l5xsu4/Hisat2"
[1] "meta_dir: /local/1385061/tmp.N2RyL48FNO/by_id/arpez-j7d0g-hhwwhnva9l5xsu4/Metadata"
[1] "output_dir: /pstore/scratch/u/munzm1/rnaseq-myopathay-small/bash_online/dexseq_preprocOutputxyz"
[1] "multiqc_file: /local/1385061/tmp.N2RyL48FNO/by_id/arpez-j7d0g-hhwwhnva9l5xsu4/MultiQC/multiqc_data/multiqc_general_stats.txt"
Just saw that GenomicAlignments::summarizeOverlaps allows controlling the number of parallel processes with BiocParallel::register, e.g. register(MulticoreParam(workers=6))
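
For reference, a minimal sketch of how that registration could look before the counting step. Whether it actually resolves the error above is not established here; it only caps the number of forked workers. The helper choose_workers and its max_workers default are hypothetical names made up for this illustration, and reading SLURM_CPUS_PER_TASK assumes the job was submitted with --cpus-per-task.

```r
# Hypothetical helper (not part of any package): pick a worker count that
# exceeds neither the cores assigned to the job, the number of BAM files,
# nor a user-chosen cap.
choose_workers <- function(n_files, n_cores, max_workers = 6L) {
  stopifnot(n_files >= 1, n_cores >= 1, max_workers >= 1)
  as.integer(min(n_files, n_cores, max_workers))
}

# Assumption: Slurm exports SLURM_CPUS_PER_TASK for this job.
# Fall back to a single core when the variable is unset or not a number.
slurm_cores <- suppressWarnings(
  as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
)
if (is.na(slurm_cores)) slurm_cores <- 1L

# 50 is the approximate number of BAM files mentioned in the question.
n_workers <- choose_workers(n_files = 50L, n_cores = slurm_cores)

# Register the back-end only if BiocParallel is installed.
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  param <- if (n_workers > 1L) {
    BiocParallel::MulticoreParam(workers = n_workers)
  } else {
    # Serial evaluation; also handy for debugging, because errors are not
    # hidden behind forked worker processes.
    BiocParallel::SerialParam()
  }
  BiocParallel::register(param)
  # A subsequent summarizeOverlaps() call on the BamFileList should then
  # use the registered back-end.
}
```

Registering SerialParam() for a single test run may also help to check whether the failure comes from one specific BAM file rather than from the parallel back-end itself.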