Entering edit mode
Hello - I'm able to use the ATACseqQC package for handling mouse and human ATACseq data, but have run into an issue specifically with the bamSplit function from this package. This function works just fine with human data, following section 2.4.5 of the vignette here. However, when I pass a .bam file generated from mouse data to this function, I get the following error:
Error in asMethod(object) :
cannot create a GRanges object from a Seqinfo object with NA seqlengths
running traceback()
shows:
4: stop(wmsg("cannot create a GRanges object ", "from a Seqinfo object with NA seqlengths"))
3: asMethod(object)
2: as(seqinfo(genome)[seqlev], "GRanges")
1: splitBam("myMouseSample.bam", tags = tags,
outPath = outPath, txs = txs.ms, genome = genome.ms)
Here's is my code:
library(ATACseqQC)
## input is bamFile
bamfile <- system.file("extdata", "GL1.bam",
package="ATACseqQC", mustWork=TRUE)
## bamfile tags
tags <- c("AS", "XN", "XM", "XO", "XG", "NM", "MD", "YS", "YT")
## files will be output into outPath
outPath <- "splited"
dir.create(outPath)
## shift the bam file by the 5'ends
#using human .bam from above, works fine
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txs.hs <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
txs.hs <- txs.hs[seqnames(txs.hs) %in% "chr1"]
genome.hs <- Hsapiens
objs <- splitBam(bamfile, tags=tags, outPath=outPath,
txs=txs.hs, genome=genome.hs)
#using mouse .bam, produces the error about 'NA seqlengths'
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
txs.ms <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
txs.ms <- txs.ms[seqnamestxs.ms) %in% "chr1"]
genome.ms <- Mmusculus
objs <- splitBam("myMouseSample.bam", tags=tags, outPath=outPath,
txs=txs.ms, genome=genome.ms)
Session info is:
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.7 BSgenome.Mmusculus.UCSC.mm10_1.4.0 BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.52.0
[5] rtracklayer_1.44.2 Biostrings_2.52.0 XVector_0.24.0 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[9] GenomicFeatures_1.36.4 AnnotationDbi_1.46.0 Biobase_2.44.0 GenomicRanges_1.36.0
[13] GenomeInfoDb_1.20.0 IRanges_2.18.1 ATACseqQC_1.8.5 S4Vectors_0.22.0
[17] BiocGenerics_0.30.0 edgeR_3.26.5 biomaRt_2.40.3 limma_3.40.6
loaded via a namespace (and not attached):
[1] colorspace_1.4-1 seqinr_3.4-5 grImport2_0.1-5 futile.logger_1.4.3 base64enc_0.1-3 rstudioapi_0.10
[7] rGADEM_2.32.0 ChIPpeakAnno_3.18.2 bit64_0.9-7 interactiveDisplayBase_1.22.0 splines_3.6.1 motifStack_1.28.0
[13] zeallot_0.1.0 ade4_1.7-13 polynom_1.4-0 Rsamtools_2.0.0 seqLogo_1.50.0 GO.db_3.8.2
[19] dbplyr_1.4.2 png_0.1-7 graph_1.62.0 shiny_1.3.2 BiocManager_1.30.4 compiler_3.6.1
[25] httr_1.4.0 backports_1.1.4 assertthat_0.2.1 Matrix_1.2-17 lazyeval_0.2.2 later_0.8.0
[31] formatR_1.7 htmltools_0.3.6 prettyunits_1.0.2 tools_3.6.1 gtable_0.3.0 glue_1.3.1
[37] GenomeInfoDbData_1.2.1 dplyr_0.8.3 rappdirs_0.3.1 Rcpp_1.0.2 vctrs_0.2.0 multtest_2.40.0
[43] stringr_1.4.0 mime_0.7 ensembldb_2.8.0 XML_3.98-1.20 idr_1.2 AnnotationHub_2.16.0
[49] zlibbioc_1.30.0 MASS_7.3-51.4 scales_1.0.0 hms_0.5.0 promises_1.0.1 ProtGenerics_1.16.0
[55] SummarizedExperiment_1.14.0 RBGL_1.60.0 AnnotationFilter_1.8.0 lambda.r_1.2.3 yaml_2.2.0 curl_4.0
[61] memoise_1.1.0 ggplot2_3.2.0 MotIV_1.40.0 stringi_1.4.3 RSQLite_2.1.2 randomForest_4.6-14
[67] BiocParallel_1.18.0 rlang_0.4.0 pkgconfig_2.0.2 matrixStats_0.54.0 bitops_1.0-6 lattice_0.20-38
[73] purrr_0.3.2 GenomicAlignments_1.20.1 htmlwidgets_1.3 bit_1.1-14 tidyselect_0.2.5 magrittr_1.5
[79] R6_2.4.0 DelayedArray_0.10.0 DBI_1.0.0 preseqR_4.0.0 pillar_1.4.2 survival_2.44-1.1
[85] RCurl_1.95-4.12 tibble_2.1.3 crayon_1.3.4 futile.options_1.0.1 KernSmooth_2.23-15 BiocFileCache_1.8.0
[91] jpeg_0.1-8 progress_1.2.2 locfit_1.5-9.1 grid_3.6.1 blob_1.2.0 GenomicScores_1.8.1
[97] digest_0.6.20 xtable_1.8-4 VennDiagram_1.6.20 httpuv_1.5.1 regioneR_1.16.2 munsell_0.5.0
Correction: the function name is
splitBam
, not 'bamSplit'Hello,
splitBam(bamfile, tags, index = bamfile, outPath = NULL, txs, genome, conservation, positive = 4L, negative = 5L, breaks = c(0, 100, 180, 247, 315, 473, 558, 615, Inf), labels = c("NucleosomeFree", "inter1", "mononucleosome", "inter2", "dinucleosome", "inter3", "trinucleosome", "others"), seqlev = paste0("chr", c(1:22, "X", "Y")), cutoff = 0.8)
So the argument for seqlev is default to the human genome. Since there are 20 pairs of chromosome in mice only. You may reset this parameter.
Another possibility is that you got your mouse ref genome from other source instead of UCSC, which has different chromsome naming rules. Can you just display the BAM header using samtools: samtools view -h XXX.bam to check?
Haibo