problem reading bam files into nucleR
0
0
Entering edit mode
efoss ▴ 10
@efoss-8908
Last seen 15 hours ago
United States

I have run into trouble reading bam files into nucleR using readBAM. I aligned my fastq files using bwa, processed them with Picard Tools' "CleanSam", "SortSam", and "FixMateInformation", and then confirmed that there are no errors using "ValidateSamFile". When I run "readBAM", it "thinks" for a while but then crashes as follows:

WT_reads <- readBAM(files = WT_bams, type = 'paired')

The messages that come back are as follows: reading file XXXX.bam processing flags processing strand + processing strand - Error in .processStrand("-", bam) : ERROR: Mate selection for - strand is invalid

"WT_bams" is just a character vector with the names of bam files.

It's challenging for me to figure out even which read in the bam file is causing problems, which seems like a good first step toward fixing things. I would very much appreciate any suggestions about what I am doing wrong.

Thanks.

Eric

``` sessionInfo() R version 4.4.0 (2024-04-24) Platform: x86_64-pc-linux-gnu Running under: Ubuntu 18.04.6 LTS

Matrix products: default BLAS/LAPACK: FlexiBLAS OPENBLAS; LAPACK version 3.11.0

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/Los_Angeles tzcode source: system (glibc)

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] nucleR_2.36.0 BiocManager_1.30.23

loaded via a namespace (and not attached): [1] SummarizedExperiment_1.34.0 gtable_0.3.5 hwriter_1.3.2.1 ggplot2_3.5.1
[5] latticeExtra_0.6-30 Biobase_2.64.0 lattice_0.22-6 vctrs_0.6.5
[9] tools_4.4.0 bitops_1.0-7 generics_0.1.3 stats4_4.4.0
[13] parallel_4.4.0 tibble_3.2.1 fansi_1.0.6 pkgconfig_2.0.3
[17] Matrix_1.7-0 RColorBrewer_1.1-3 S4Vectors_0.42.0 lifecycle_1.0.4
[21] GenomeInfoDbData_1.2.12 compiler_4.4.0 deldir_2.0-4 Rsamtools_2.20.0
[25] Biostrings_2.72.0 munsell_0.5.1 codetools_0.2-20 GenomeInfoDb_1.40.0
[29] pillar_1.9.0 crayon_1.5.2 BiocParallel_1.38.0 DelayedArray_0.30.1
[33] ShortRead_1.62.0 abind_1.4-5 tidyselect_1.2.1 dplyr_1.1.4
[37] grid_4.4.0 colorspace_2.1-0 cli_3.6.2 SparseArray_1.4.3
[41] magrittr_2.0.3 S4Arrays_1.4.0 utf8_1.2.4 scales_1.3.0
[45] UCSC.utils_1.0.0 pwalign_1.0.0 XVector_0.44.0 httr_1.4.7
[49] matrixStats_1.3.0 jpeg_0.1-10 interp_1.1-6 png_0.1-8
[53] GenomicRanges_1.56.0 IRanges_2.38.0 rlang_1.1.3 Rcpp_1.0.12
[57] glue_1.7.0 BiocGenerics_0.50.0 pkgload_1.3.4 rstudioapi_0.16.0
[61] jsonlite_1.8.8 R6_2.5.1 MatrixGenerics_1.16.0 GenomicAlignments_1.40.0
[65] zlibbioc_1.50.0

nucleR • 54 views
ADD COMMENT
0
Entering edit mode

I thought I might be able to track down a problematic line in the bam file by repeatedly splitting the bam file in and then running the command on both halves. I did this, splitting the file, seeing that one half caused a crash while the other half didn't, then splitting the half that caused the crash, etc. After 13 or so splits, neither half caused a crash. I then reassembled the last split file, saw that it caused the crash, split it again, and saw that neither half caused the crash. I had hoped that bwa used some sort of syntax that nucleR didn't like, and I would be able to track things down this way, but that doesn't seem to be the case. Perhaps my failure offers a clue about what is going wrong.

Eric

ADD REPLY

Login before adding your answer.

Traffic: 811 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6