Question: ATACseqQC function bamSplit returns error when input .bam is from mouse
0
gravatar for beiting
3 months ago by
beiting30
United States
beiting30 wrote:

Hello - I'm able to use the ATACseqQC package for handling mouse and human ATACseq data, but have run into an issue specifically with the bamSplit function from this package. This function works just fine with human data, following section 2.4.5 of the vignette here. However, when I pass a .bam file generated from mouse data to this function, I get the following error:

Error in asMethod(object) : 
  cannot create a GRanges object from a Seqinfo object with NA seqlengths

running traceback() shows:

4: stop(wmsg("cannot create a GRanges object ", "from a Seqinfo object with NA seqlengths"))
3: asMethod(object)
2: as(seqinfo(genome)[seqlev], "GRanges")
1: splitBam("myMouseSample.bam", tags = tags, 
       outPath = outPath, txs = txs.ms, genome = genome.ms)

Here's is my code:

library(ATACseqQC)
## input is bamFile
bamfile <- system.file("extdata", "GL1.bam", 
                       package="ATACseqQC", mustWork=TRUE)
## bamfile tags
tags <- c("AS", "XN", "XM", "XO", "XG", "NM", "MD", "YS", "YT")
## files will be output into outPath
outPath <- "splited"
dir.create(outPath)
## shift the bam file by the 5'ends

#using human .bam from above, works fine
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txs.hs <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
txs.hs <- txs.hs[seqnames(txs.hs) %in% "chr1"]
genome.hs <- Hsapiens
objs <- splitBam(bamfile, tags=tags, outPath=outPath,
                 txs=txs.hs, genome=genome.hs)

#using mouse .bam, produces the error about 'NA seqlengths'
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
txs.ms <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
txs.ms <- txs.ms[seqnamestxs.ms) %in% "chr1"]
genome.ms <- Mmusculus
objs <- splitBam("myMouseSample.bam", tags=tags, outPath=outPath,
                 txs=txs.ms, genome=genome.ms)

Session info is:

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.7 BSgenome.Mmusculus.UCSC.mm10_1.4.0       BSgenome.Hsapiens.UCSC.hg19_1.4.0        BSgenome_1.52.0                         
 [5] rtracklayer_1.44.2                       Biostrings_2.52.0                        XVector_0.24.0                           TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 
 [9] GenomicFeatures_1.36.4                   AnnotationDbi_1.46.0                     Biobase_2.44.0                           GenomicRanges_1.36.0                    
[13] GenomeInfoDb_1.20.0                      IRanges_2.18.1                           ATACseqQC_1.8.5                          S4Vectors_0.22.0                        
[17] BiocGenerics_0.30.0                      edgeR_3.26.5                             biomaRt_2.40.3                           limma_3.40.6                            

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1              seqinr_3.4-5                  grImport2_0.1-5               futile.logger_1.4.3           base64enc_0.1-3               rstudioapi_0.10              
  [7] rGADEM_2.32.0                 ChIPpeakAnno_3.18.2           bit64_0.9-7                   interactiveDisplayBase_1.22.0 splines_3.6.1                 motifStack_1.28.0            
 [13] zeallot_0.1.0                 ade4_1.7-13                   polynom_1.4-0                 Rsamtools_2.0.0               seqLogo_1.50.0                GO.db_3.8.2                  
 [19] dbplyr_1.4.2                  png_0.1-7                     graph_1.62.0                  shiny_1.3.2                   BiocManager_1.30.4            compiler_3.6.1               
 [25] httr_1.4.0                    backports_1.1.4               assertthat_0.2.1              Matrix_1.2-17                 lazyeval_0.2.2                later_0.8.0                  
 [31] formatR_1.7                   htmltools_0.3.6               prettyunits_1.0.2             tools_3.6.1                   gtable_0.3.0                  glue_1.3.1                   
 [37] GenomeInfoDbData_1.2.1        dplyr_0.8.3                   rappdirs_0.3.1                Rcpp_1.0.2                    vctrs_0.2.0                   multtest_2.40.0              
 [43] stringr_1.4.0                 mime_0.7                      ensembldb_2.8.0               XML_3.98-1.20                 idr_1.2                       AnnotationHub_2.16.0         
 [49] zlibbioc_1.30.0               MASS_7.3-51.4                 scales_1.0.0                  hms_0.5.0                     promises_1.0.1                ProtGenerics_1.16.0          
 [55] SummarizedExperiment_1.14.0   RBGL_1.60.0                   AnnotationFilter_1.8.0        lambda.r_1.2.3                yaml_2.2.0                    curl_4.0                     
 [61] memoise_1.1.0                 ggplot2_3.2.0                 MotIV_1.40.0                  stringi_1.4.3                 RSQLite_2.1.2                 randomForest_4.6-14          
 [67] BiocParallel_1.18.0           rlang_0.4.0                   pkgconfig_2.0.2               matrixStats_0.54.0            bitops_1.0-6                  lattice_0.20-38              
 [73] purrr_0.3.2                   GenomicAlignments_1.20.1      htmlwidgets_1.3               bit_1.1-14                    tidyselect_0.2.5              magrittr_1.5                 
 [79] R6_2.4.0                      DelayedArray_0.10.0           DBI_1.0.0                     preseqR_4.0.0                 pillar_1.4.2                  survival_2.44-1.1            
 [85] RCurl_1.95-4.12               tibble_2.1.3                  crayon_1.3.4                  futile.options_1.0.1          KernSmooth_2.23-15            BiocFileCache_1.8.0          
 [91] jpeg_0.1-8                    progress_1.2.2                locfit_1.5-9.1                grid_3.6.1                    blob_1.2.0                    GenomicScores_1.8.1          
 [97] digest_0.6.20                 xtable_1.8-4                  VennDiagram_1.6.20            httpuv_1.5.1                  regioneR_1.16.2               munsell_0.5.0       
atacseqqc • 122 views
ADD COMMENTlink written 3 months ago by beiting30

Correction: the function name is splitBam, not 'bamSplit'

ADD REPLYlink written 3 months ago by beiting30
1

Hello,

   If you check the document for the function splitBam, you will see:

splitBam(bamfile, tags, index = bamfile, outPath = NULL, txs, genome, conservation, positive = 4L, negative = 5L, breaks = c(0, 100, 180, 247, 315, 473, 558, 615, Inf), labels = c("NucleosomeFree", "inter1", "mononucleosome", "inter2", "dinucleosome", "inter3", "trinucleosome", "others"), seqlev = paste0("chr", c(1:22, "X", "Y")), cutoff = 0.8)

So the argument for seqlev is default to the human genome. Since there are 20 pairs of chromosome in mice only. You may reset this parameter.

Another possibility is that you got your mouse ref genome from other source instead of UCSC, which has different chromsome naming rules. Can you just display the BAM header using samtools: samtools view -h XXX.bam to check?

Haibo

ADD REPLYlink written 3 months ago by haibol90
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 306 users visited in the last hour