ATACseqQC shiftGAlignmentsList() problem
1
0
Entering edit mode
jfmay • 0
@518122bc
Last seen 10 weeks ago
Canada

I am trying to follow the tutorial found here . The tutorial uses a subset of the sample data (chromosome 1) to make the demo run faster, but I am trying to shift my bam file for the whole mouse genome (so I changed "seqlev" from the tutorial code to "mmstdchr" when I applied it to my own). Here is my code:

library(ATACseqQC)
library(BSgenome.Mmusculus.UCSC.mm10)
mouse <- BSgenome.Mmusculus.UCSC.mm10 # doing this subsets the mouse genome assembly to only include "standard" sequences and not alternates
mmstdchr <- standardChromosomes(mouse)

tags <- c("AS", "XN", "XM", "XO", "XG", "NM", "MD", "YS", "YT")
outPath <- "splited"
dir.create(outPath)

bamfile <- "D:/All open chromatin bam files for counting/Sorted_3-AL1_filtered.bam"
bamfile.labels <- gsub(".bam", "", basename(bamfile))
which <- as(seqinfo(Mmusculus)
            [mmstdchr]
            , "GRanges")
gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE)
gal1 <- shiftGAlignmentsList(gal)

The issue is that when the shiftGAlignmentsList(gal) completes, gal1 is generated into a large GAlignments object 54 GB in size, whereas the bam file I'm shifting (Sorted_3-AL1_filtered.bam) is only 4.6 GB. This seems very wrong. And I can't even export gal1 because it's too large. Can anyone tell me what I'm doing wrong, and how to properly generate a shift bam file?

ATACseqQC ATACSeq • 397 views
ADD COMMENT
0
Entering edit mode

Hi,

Thank you for trying ATACseqQC. Could you let me know the version of ATACseqQC you are using? And did you try to set bigFile=TRUE when you call readBamFile?

Jianhong.

ADD REPLY
0
Entering edit mode

Thanks for your response. The version of ATACseqQC I'm using is 1.26.0. I tried to set bigFile=TRUE. The resulting object gal1 still ends up the same huge size. Here is what happens when I try to export this file:

export(gal1, "3-AL1_Tn5_shift.bam")

Error: cannot allocate vector of size 987.8 Mb

I'm not sure I fully understand, but I think R is running out of memory at this stage?

ADD REPLY
0
Entering edit mode

try: gal1 <- shiftGAlignmentsList(gal, outbam='3-AL1_Tn5_shift.bam'). I will try to figure out why the memory is so high.

ADD REPLY
0
Entering edit mode
Ou, Jianhong ★ 1.3k
@ou-jianhong-4539
Last seen 4 days ago
United States

Hi jfmay,

I updated the development version to fix the issue that caused by readGAlignmentsList does not support read by chunk. This updates will decrease the memory cost to about 10Gb for a 5Gb bam file (not fully tested) for single session. If you want to process the file with multiple process, the memory cost will be about 8Gb*(#process+1). You can set a smaller slidingWindowSize to decrease the memory cost. The current process still have lot of space to improve for how to handle the seq and qual in the meta data of GAlignments. But it will take time. I will continues work on that. To install the development version of ATACseqQC, you may want to try: BiocManager::install("jianhong/ATACseqQC")

Let me know if you still have any question.

Jianhong.

ADD COMMENT

Login before adding your answer.

Traffic: 364 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6