ATACseqQC - error when splitting files using mm10
@ritamonteiro-23554

Hi,

I've been trying for over a week to run ATACseqQC on my data with little success. I managed to generate the shifted.bam file, but I cannot split it into different categories based on fragment size. I managed to run the package on the example samples, and I think (I'm not completely sure at this point, after so many trials) that I also managed to run it on a subsample (chr1) of my data.

Here's the code and the error that I get:


packageVersion("ATACseqQC")
[1] '1.10.4'

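library(ATACseqQC)
library(BSgenome.Mmusculus.UCSC.mm10)  # provides the Mmusculus object used below
library(TxDb.Mmusculus.UCSC.mm10.knownGene)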
library(ChIPpeakAnno)
library(Rsamtools)
library(GenomicRanges)
library(GenomicScores)

bamfile <- "*.bam"
bamfile.labels <- gsub(".bam", "", basename(bamfile))
outPath <- "./2h_B2_Test"
dir.create(outPath)

possibleTag <- combn(LETTERS, 2)
possibleTag <- c(paste0(possibleTag[1, ], possibleTag[2, ]),
                 paste0(possibleTag[2, ], possibleTag[1, ]))
bamTop100 <- scanBam(BamFile(bamfile, yieldSize = 100),
                     param = ScanBamParam(tag=possibleTag))[[1]]$tag
tags <- names(bamTop100)[lengths(bamTop100)==100]

seqlev <- paste0("chr", c(1:19, "X", "Y"))
which <- as(seqinfo(Mmusculus)[seqlev], "GRanges")
genome <- Mmusculus
txs <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
gal <- readBamFile(bamfile, tag=tags, asMates=TRUE, which = which, bigFile=TRUE)
shiftedBamfile <- file.path(outPath, "shifted.bam")
gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)
objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome, outPath = outPath)

And here's the error:


[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:1:13208:3290:2190" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:1:22303:21110:16221" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:2:13112:7733:15765" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:4:11601:9523:4754" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:4:13606:5705:12385" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:3:22511:22886:7021" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:2:12208:11077:15338" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:3:23507:19885:11109" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:1:22204:5857:1791" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:4:21511:7543:2465" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:3:23408:6559:11844" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:3:13502:3326:18935" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:2:22102:9488:3937" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:1:12202:22122:12780" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:2:11209:3361:14587" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:2:13103:9747:4114" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:1:12207:23014:16371" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:3:11508:8191:4892" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:2:21101:10496:13188" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:2:11310:18186:7800" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "NB501779:62:HG3TYBGX5:4:23509:16389:11592" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[E::sam_parse1] unrecognized type N
[W::sam_read1] Parse error at line 22
[E::sam_parse1] unrecognized type N
[W::sam_read1] Parse error at line 22
[E::sam_parse1] unrecognized type N
[W::sam_read1] Parse error at line 22
[E::sam_parse1] unrecognized type N
[W::sam_read1] Parse error at line 22
[E::sam_parse1] unrecognized type N
[W::sam_read1] Parse error at line 22
[E::sam_parse1] unrecognized type N
[W::sam_read1] Parse error at line 22
[...]

Hi Rita,

Thank you for trying ATACseqQC to analyze your ATAC-seq data. This is a known issue: when one or more tags are missing from some reads, NA values are generated when the alignments are written to the SAM file, which in turn triggers the error when the SAM file is converted to a BAM file. Could you first try removing the PG tag from your tags, like this:

tags <- tags[tags!="PG"]

Let me know if you still have trouble.

I am planning to rewrite the export function so that it handles NA values.
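
A minimal sketch of the idea behind that workaround, in base R only. The list `bamTop` is a made-up stand-in for the `$tag` element returned by `scanBam()`, and it assumes that a tag missing on some reads appears as `NA` in that tag's vector. The helper name `completeTags` is hypothetical and is not part of ATACseqQC or Rsamtools.

```r
# Toy stand-in for scanBam(...)[[1]]$tag: one vector per requested tag.
# Assumption: reads lacking a tag contribute NA; tags never seen are NULL.
bamTop <- list(
  NM = c(0L, 1L, 0L, 2L),
  PG = c("MarkDuplicates", NA, "MarkDuplicates", NA),
  XS = NULL
)

# Hypothetical helper: keep only tags that have a non-NA value
# for every sampled read.
completeTags <- function(tagList, nReads) {
  ok <- vapply(tagList,
               function(v) length(v) == nReads && !anyNA(v),
               logical(1))
  names(tagList)[ok]
}

tags <- completeTags(bamTop, nReads = 4L)
# Drop PG explicitly as well, as suggested above.
tags <- tags[tags != "PG"]
print(tags)
```

On real data, `nReads` would be the `yieldSize` used when sampling the BAM file (100 in the code in the question).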

Jianhong.


Hi Jianhong,

Thank you so much, it fixed the problem!

I'm running into another error when trying to plot the heatmaps and the normalised signals. When I run this code:

sigs <- enrichedFragments(gal=objs[c("NucleosomeFree",
                                     "mononucleosome",
                                     "dinucleosome",
                                     "trinucleosome")],
                          TSS=TSS,
                          librarySize=librarySize,
                          seqlev=seqlev,
                          TSS.filter=0.5,
                          n.tile = NTILE,
                          upstream = ups,
                          downstream = dws)
## log2 transformed signals
sigs.log2 <- lapply(sigs, function(.ele) log2(.ele+1))
## plot heatmap
featureAlignedHeatmap(sigs.log2, reCenterPeaks(TSS, width=ups+dws),
                      zeroAt=.5, n.tile=NTILE)

I get this error:

Warning messages:
1: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 1 out-of-bound range located on sequence
  chr4_JH584295_random. Note that ranges located on a sequence whose
  length is unknown (NA) or on a circular sequence are not considered
  out-of-bound (use seqlengths() and isCircular() to get the lengths and
  circularity flags of the underlying sequences). You can use trim() to
  trim these ranges. See ?`trim,GenomicRanges-method` for more
  information.
2: In reCenterPeaks(TSS, width = ups + dws) :
  Some start position of the peaks are less than 1!
3: In reCenterPeaks(TSS, width = ups + dws) :
  Some end position of the peaks are out of bound!

Do you have any tips on how to solve this?
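
One thing that may be worth checking, sketched here under assumptions: the first warning names `chr4_JH584295_random`, which is not in `seqlev`, so `TSS` apparently still contains ranges on unplaced scaffolds. The toy example below uses made-up coordinates and sequence lengths (not real mm10 values). It restricts a GRanges to the chosen chromosomes with `keepSeqlevels()` and then drops fixed-width windows that run past a chromosome end. Whether this changes the heatmap behaviour is not confirmed in this thread.

```r
library(GenomicRanges)

# Toy sequence info: one regular chromosome and one unplaced scaffold.
si <- Seqinfo(seqnames = c("chr1", "chr4_JH584295_random"),
              seqlengths = c(1000L, 200L))

# Toy TSS positions (width 1), including one on the scaffold.
TSS <- GRanges(seqnames = c("chr1", "chr1", "chr4_JH584295_random"),
               ranges = IRanges(start = c(50, 500, 150), width = 1),
               strand = c("+", "-", "+"),
               seqinfo = si)

seqlev <- "chr1"

# Keep only the chromosomes of interest; ranges elsewhere are dropped.
TSS.std <- keepSeqlevels(TSS, seqlev, pruning.mode = "coarse")

# Build fixed-width windows around each TSS; some may fall out of bounds,
# which is what triggers warnings like the ones above.
win <- suppressWarnings(resize(TSS.std, width = 200, fix = "center"))

# Keep only windows that lie fully inside their chromosome.
chrLen <- unname(seqlengths(win)[as.character(seqnames(win))])
inBounds <- start(win) >= 1 & end(win) <= chrLen
win.ok <- win[inBounds]
print(win.ok)
```

Dropping out-of-bound windows, rather than trimming them, keeps all remaining windows at the same width.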

Thanks again


Hi,

Did you see the heatmap? I don't see an error message in your description, only warning messages. Could you please show me the error message?
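
To tell the two apart in practice, a base R sketch such as the following can help: it runs an expression, collects any warnings, and reports whether an actual error occurred. The helper name `runAndReport` is hypothetical, and the warning text is copied from the messages above only as a stand-in for the real call.

```r
# Hypothetical helper: evaluate an expression, record warnings without
# stopping, and flag whether a real error was raised.
runAndReport <- function(expr) {
  warningsSeen <- character()
  result <- withCallingHandlers(
    tryCatch(expr,
             error = function(e) {
               structure(conditionMessage(e), class = "runError")
             }),
    warning = function(w) {
      warningsSeen <<- c(warningsSeen, conditionMessage(w))
      invokeRestart("muffleWarning")  # continue after the warning
    }
  )
  list(failed = inherits(result, "runError"),
       value = result,
       warnings = warningsSeen)
}

# Stand-in for a call that warns but still returns a value.
res <- runAndReport({
  warning("Some start position of the peaks are less than 1!")
  42
})
print(res$failed)
print(res$warnings)
```

If `failed` is `FALSE`, the call completed and only warnings were issued, so a missing plot would have to be explained by something other than an error in that call.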

Jianhong.


Hi, when I run the command I don't get any heatmap. I only get the warning messages and there is no output. I've tried the code several more times with different files and I keep getting the same error. I don't know what I'm doing wrong.

Thank you.


Hi, could you share your sigs.log2 and TSS objects with me? I will check what happened.

Jianhong.
