ATACseqQC function bamSplit returns error when input .bam is from mouse
0
0
Entering edit mode
beiting ▴ 30
@beiting-7489
Last seen 5.3 years ago
United States

Hello - I'm able to use the ATACseqQC package for handling mouse and human ATACseq data, but have run into an issue specifically with the bamSplit function from this package. This function works just fine with human data, following section 2.4.5 of the vignette here. However, when I pass a .bam file generated from mouse data to this function, I get the following error:

Error in asMethod(object) : 
  cannot create a GRanges object from a Seqinfo object with NA seqlengths

running traceback() shows:

4: stop(wmsg("cannot create a GRanges object ", "from a Seqinfo object with NA seqlengths"))
3: asMethod(object)
2: as(seqinfo(genome)[seqlev], "GRanges")
1: splitBam("myMouseSample.bam", tags = tags, 
       outPath = outPath, txs = txs.ms, genome = genome.ms)

Here's is my code:

library(ATACseqQC)
## input is bamFile
bamfile <- system.file("extdata", "GL1.bam", 
                       package="ATACseqQC", mustWork=TRUE)
## bamfile tags
tags <- c("AS", "XN", "XM", "XO", "XG", "NM", "MD", "YS", "YT")
## files will be output into outPath
outPath <- "splited"
dir.create(outPath)
## shift the bam file by the 5'ends

#using human .bam from above, works fine
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txs.hs <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
txs.hs <- txs.hs[seqnames(txs.hs) %in% "chr1"]
genome.hs <- Hsapiens
objs <- splitBam(bamfile, tags=tags, outPath=outPath,
                 txs=txs.hs, genome=genome.hs)

#using mouse .bam, produces the error about 'NA seqlengths'
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
txs.ms <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
txs.ms <- txs.ms[seqnamestxs.ms) %in% "chr1"]
genome.ms <- Mmusculus
objs <- splitBam("myMouseSample.bam", tags=tags, outPath=outPath,
                 txs=txs.ms, genome=genome.ms)

Session info is:

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.7 BSgenome.Mmusculus.UCSC.mm10_1.4.0       BSgenome.Hsapiens.UCSC.hg19_1.4.0        BSgenome_1.52.0                         
 [5] rtracklayer_1.44.2                       Biostrings_2.52.0                        XVector_0.24.0                           TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 
 [9] GenomicFeatures_1.36.4                   AnnotationDbi_1.46.0                     Biobase_2.44.0                           GenomicRanges_1.36.0                    
[13] GenomeInfoDb_1.20.0                      IRanges_2.18.1                           ATACseqQC_1.8.5                          S4Vectors_0.22.0                        
[17] BiocGenerics_0.30.0                      edgeR_3.26.5                             biomaRt_2.40.3                           limma_3.40.6                            

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1              seqinr_3.4-5                  grImport2_0.1-5               futile.logger_1.4.3           base64enc_0.1-3               rstudioapi_0.10              
  [7] rGADEM_2.32.0                 ChIPpeakAnno_3.18.2           bit64_0.9-7                   interactiveDisplayBase_1.22.0 splines_3.6.1                 motifStack_1.28.0            
 [13] zeallot_0.1.0                 ade4_1.7-13                   polynom_1.4-0                 Rsamtools_2.0.0               seqLogo_1.50.0                GO.db_3.8.2                  
 [19] dbplyr_1.4.2                  png_0.1-7                     graph_1.62.0                  shiny_1.3.2                   BiocManager_1.30.4            compiler_3.6.1               
 [25] httr_1.4.0                    backports_1.1.4               assertthat_0.2.1              Matrix_1.2-17                 lazyeval_0.2.2                later_0.8.0                  
 [31] formatR_1.7                   htmltools_0.3.6               prettyunits_1.0.2             tools_3.6.1                   gtable_0.3.0                  glue_1.3.1                   
 [37] GenomeInfoDbData_1.2.1        dplyr_0.8.3                   rappdirs_0.3.1                Rcpp_1.0.2                    vctrs_0.2.0                   multtest_2.40.0              
 [43] stringr_1.4.0                 mime_0.7                      ensembldb_2.8.0               XML_3.98-1.20                 idr_1.2                       AnnotationHub_2.16.0         
 [49] zlibbioc_1.30.0               MASS_7.3-51.4                 scales_1.0.0                  hms_0.5.0                     promises_1.0.1                ProtGenerics_1.16.0          
 [55] SummarizedExperiment_1.14.0   RBGL_1.60.0                   AnnotationFilter_1.8.0        lambda.r_1.2.3                yaml_2.2.0                    curl_4.0                     
 [61] memoise_1.1.0                 ggplot2_3.2.0                 MotIV_1.40.0                  stringi_1.4.3                 RSQLite_2.1.2                 randomForest_4.6-14          
 [67] BiocParallel_1.18.0           rlang_0.4.0                   pkgconfig_2.0.2               matrixStats_0.54.0            bitops_1.0-6                  lattice_0.20-38              
 [73] purrr_0.3.2                   GenomicAlignments_1.20.1      htmlwidgets_1.3               bit_1.1-14                    tidyselect_0.2.5              magrittr_1.5                 
 [79] R6_2.4.0                      DelayedArray_0.10.0           DBI_1.0.0                     preseqR_4.0.0                 pillar_1.4.2                  survival_2.44-1.1            
 [85] RCurl_1.95-4.12               tibble_2.1.3                  crayon_1.3.4                  futile.options_1.0.1          KernSmooth_2.23-15            BiocFileCache_1.8.0          
 [91] jpeg_0.1-8                    progress_1.2.2                locfit_1.5-9.1                grid_3.6.1                    blob_1.2.0                    GenomicScores_1.8.1          
 [97] digest_0.6.20                 xtable_1.8-4                  VennDiagram_1.6.20            httpuv_1.5.1                  regioneR_1.16.2               munsell_0.5.0       
ATACseqQC • 1.6k views
ADD COMMENT
0
Entering edit mode

Correction: the function name is splitBam, not 'bamSplit'

ADD REPLY
1
Entering edit mode

Hello,

   If you check the document for the function splitBam, you will see:

splitBam(bamfile, tags, index = bamfile, outPath = NULL, txs, genome, conservation, positive = 4L, negative = 5L, breaks = c(0, 100, 180, 247, 315, 473, 558, 615, Inf), labels = c("NucleosomeFree", "inter1", "mononucleosome", "inter2", "dinucleosome", "inter3", "trinucleosome", "others"), seqlev = paste0("chr", c(1:22, "X", "Y")), cutoff = 0.8)

So the argument for seqlev is default to the human genome. Since there are 20 pairs of chromosome in mice only. You may reset this parameter.

Another possibility is that you got your mouse ref genome from other source instead of UCSC, which has different chromsome naming rules. Can you just display the BAM header using samtools: samtools view -h XXX.bam to check?

Haibo

ADD REPLY

Login before adding your answer.

Traffic: 592 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6