I am following the ATACseqQC vignette (https://bioconductor.org/packages/release/bioc/vignettes/ATACseqQC/inst/doc/ATACseqQC.html) to analyze several ATAC-seq samples, and I keep getting a segfault when I run:
outPath <- "splited"
dir.create(outPath)
## shift the coordinates of 5'ends of alignments in the bam file
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
## if you don't have an available TxDb, please refer
## GenomicFeatures::makeTxDbFromGFF to create one from gff3 or gtf file.
seqlev <- c("chr1", "chr2", "chr3", "chr4", "chr5",
"chr6", "chr7", "chr8", "chr9", "chr10",
"chr11", "chr12", "chr13", "chr14", "chr15",
"chr16", "chr17", "chr18", "chr19", "chr20",
"chr21", "chr22")
seqinformation <- seqinfo(TxDb.Hsapiens.UCSC.hg38.knownGene)
which <- as(seqinformation[seqlev], "GRanges")
gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE)
shiftedBamfile <- file.path(outPath, "shifted.bam")
gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)
I changed the original vignette code to use all of the 'normal' chromosomes (chr1 through chr22) instead of just "chr1". I am also using TxDb.Hsapiens.UCSC.hg38.knownGene rather than TxDb.Hsapiens.UCSC.hg19.knownGene. The error occurs when shifting the alignments with shiftGAlignmentsList, and the error I get is:
*** caught segfault ***
address 0xffffffff9a378e20, cause 'memory not mapped'
Traceback:
1: .Call(.merge_bam, files, destination, overwrite, header, region, byQname, addRG, compressLevel1)
2: doTryCatch(return(expr), name, parentenv, handler)
3: tryCatchOne(expr, names, parentenv, handlers[[1L]])
4: tryCatchList(expr, classes, parentenv, handlers)
5: tryCatch({ files <- sapply(files, .normalizePath) destination <- .normalizePath(destination) region <- local({ x <- as(region, "GRanges") if (1L < length(x)) stop("'region' must specify one range") sprintf("%s:%d-%d", as.character(seqnames(x)), start(x), end(x)) }) if (!overwrite && file.exists(destination)) { msg <- sprintf("'%s' exists, '%s' is FALSE\n %s: %s", "destination", "overwrite", "destination", destination) stop(msg) } header <- .normalizePath(header) destination <- .Call(.merge_bam, files, destination, overwrite, header, region, byQname, addRG, compressLevel1) if (indexDestination) indexBam(destination) destination}, error = function(err) { msg <- sprintf("'mergeBam' %s", conditionMessage(err)) stop(msg)})
6: .local(files, destination, ...)
7: mergeBam(outfile, destination = tempfile(fileext = ".bam"), indexDestination = TRUE, header = meta$file)
8: mergeBam(outfile, destination = tempfile(fileext = ".bam"), indexDestination = TRUE, header = meta$file)
9: shiftGAlignmentsList(gal, outbam = shiftedBamfile)
I have referred to this Stack Overflow post (https://stackoverflow.com/questions/49190251/caught-segfault-memory-not-mapped-error-in-r), which describes a similar error, but reinstalling and updating my packages did not help. I would love any help fixing what should be a fairly straightforward analysis! Here is my sessionInfo:
sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux
Matrix products: default
BLAS/LAPACK: /zapps7/intel_parallel_studio_xe/2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so
locale:
[1] C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] TxDb.Hsapiens.UCSC.hg38.knownGene_3.10.0
[2] GenomicFeatures_1.42.3
[3] AnnotationDbi_1.52.0
[4] Biobase_2.50.0
[5] Rsamtools_2.6.0
[6] Biostrings_2.58.0
[7] XVector_0.30.0
[8] GenomicRanges_1.42.0
[9] GenomeInfoDb_1.26.7
[10] IRanges_2.24.1
[11] ggplot2_3.3.3
[12] ATACseqQC_1.14.4
[13] S4Vectors_0.28.1
[14] BiocGenerics_0.36.1
loaded via a namespace (and not attached):
[1] colorspace_2.0-1 ellipsis_0.3.2
[3] futile.logger_1.4.3 rstudioapi_0.13
[5] farver_2.1.0 ChIPpeakAnno_3.24.2
[7] bit64_4.0.5 interactiveDisplayBase_1.28.0
[9] fansi_0.5.0 xml2_1.3.2
[11] motifStack_1.34.0 splines_4.0.0
[13] cachem_1.0.5 ade4_1.7-16
[15] polynom_1.4-0 dbplyr_2.1.1
[17] png_0.1-7 graph_1.68.0
[19] shiny_1.6.0 HDF5Array_1.18.1
[21] BiocManager_1.30.15 compiler_4.0.0
[23] httr_1.4.2 assertthat_0.2.1
[25] Matrix_1.3-4 fastmap_1.1.0
[27] lazyeval_0.2.2 limma_3.46.0
[29] later_1.2.0 formatR_1.11
[31] htmltools_0.5.1.1 prettyunits_1.1.1
[33] tools_4.0.0 gtable_0.3.0
[35] glue_1.4.2 GenomeInfoDbData_1.2.4
[37] dplyr_1.0.6 rappdirs_0.3.3
[39] Rcpp_1.0.6 vctrs_0.3.8
[41] rhdf5filters_1.2.1 multtest_2.46.0
[43] rtracklayer_1.50.0 stringr_1.4.0
[45] mime_0.10 lifecycle_1.0.0
[47] ensembldb_2.14.1 XML_3.99-0.6
[49] AnnotationHub_2.22.1 edgeR_3.32.1
[51] zlibbioc_1.36.0 MASS_7.3-54
[53] scales_1.1.1 BSgenome_1.58.0
[55] hms_1.1.0 promises_1.2.0.1
[57] MatrixGenerics_1.2.1 ProtGenerics_1.22.0
[59] SummarizedExperiment_1.20.0 RBGL_1.66.0
[61] rhdf5_2.34.0 AnnotationFilter_1.14.0
[63] lambda.r_1.2.4 yaml_2.2.1
[65] curl_4.3.1 memoise_2.0.0
[67] biomaRt_2.46.3 stringi_1.6.2
[69] RSQLite_2.2.7 BiocVersion_3.12.0
[71] randomForest_4.6-14 BiocParallel_1.24.1
[73] rlang_0.4.11 pkgconfig_2.0.3
[75] matrixStats_0.59.0 bitops_1.0-7
[77] lattice_0.20-41 purrr_0.3.4
[79] Rhdf5lib_1.12.1 labeling_0.4.2
[81] htmlwidgets_1.5.3 GenomicAlignments_1.26.0
[83] bit_4.0.4 tidyselect_1.1.1
[85] magrittr_2.0.1 R6_2.5.0
[87] generics_0.1.0 DelayedArray_0.16.3
[89] DBI_1.1.1 withr_2.4.2
[91] preseqR_4.0.0 pillar_1.6.1
[93] survival_3.2-11 KEGGREST_1.30.1
[95] RCurl_1.98-1.3 tibble_3.1.2
[97] crayon_1.4.1 futile.options_1.0.1
[99] KernSmooth_2.23-20 utf8_1.2.1
[101] BiocFileCache_1.14.0 progress_1.2.2
[103] locfit_1.5-9.4 grid_4.0.0
[105] blob_1.2.1 GenomicScores_2.2.0
[107] digest_0.6.27 xtable_1.8-4
[109] VennDiagram_1.6.20 httpuv_1.6.1
[111] regioneR_1.22.0 openssl_1.4.4
[113] munsell_0.5.0 askpass_1.1
Haibo, thank you for the quick response. I wrote the analysis as a bash script and submitted the job on a login node of my department's server.
where the contents of sample1.sh were
I tried to submit a Slurm job with more resources, but, for security purposes, our compute nodes do not have internet access. ATACseqQC calls the biomaRt package internally, and biomaRt requires an internet connection. When I run the Slurm job with more resources, I get the following error:
I referred to https://www.bioconductor.org/packages/devel/bioc/vignettes/biomaRt/inst/doc/accessing_ensembl.html#connection-troubleshooting for connection help, but none of the suggestions worked or applied in this case. Do you have any suggestions for this issue?