ATACseqQC R package for QC scores (TSS, FRIP, NFR, IDR)
1
0
Entering edit mode
achaillon • 0
@achaillon-14117
Last seen 8 months ago

Der All I am new with ATAC-Seq data and I have some QC questions We ran our first ATACseqQC experiments on MiSeq and got results that I would like to perform QC on… I cleaned up and mapped the fastq to the hg19 and was able to run most of your code for each sample, following this nice step by step approach: https://www.bioconductor.org/packages/release/bioc/vignettes/ATACseqQC/inst/doc/ATACseqQC.html#transcription-start-site-tss-enrichment-score

I am trying to get several ‘standard’ scores and results I got from one sample using the nice R package ATACseqQC See my questions below…

# 1 TSS Score

My results seem off with inf…. (according to what I understood, it should be >6 to be acceptable) Min. 0 1st Qu. 4.54695122 Median Inf Mean Inf 3rd Qu. Inf Max. Inf NA 24655

I did the following for the entire genome

gal <- readBamFile(bamfile, tag=character(0),asMates=TRUE, bigFile=TRUE)
shiftedBamfile <- file.path(outPath, "shifted.bam")
gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)
tsse <- TSSEscore(gal1, txs)


and I also tried with the tag identification for one chromosome

possibleTag <- combn(LETTERS, 2)
possibleTag <- c(paste0(possibleTag[1, ], possibleTag[2, ]),
paste0(possibleTag[2, ], possibleTag[1, ]))

bamTop100 <- scanBam(BamFile(bamfile, yieldSize = 100),
param = ScanBamParam(tag=possibleTag))[[1]]$tag tags <- names(bamTop100)[lengths(bamTop100)==100] tags seqlev <- "chr1" which <- as(seqinfo(Hsapiens)[seqlev], "GRanges") gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE) shiftedBamfile <- file.path(outPath, "shifted.bam") gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)  # 2 NFR Score: I tried to compile an overall NFR score (all chrom) by computing the mean(nfr2$NFR_score) -0.54 [-0.58--0.49]

I am probably missing something as I don’t think this score can be negative? Should it be performed chrom by chrom?

# 3 The fraction of reads in called peak regions FRIP Score

It is the ratio nucleosome free/total obtained from libsize? NucleosomeFree.bam 601935 mononucleosome.bam 50970 dinucleosome.bam 33860 trinucleosome.bam 2018

# 4 IDR score

Finally, for the IDR score (for reproducibility) : is it something that can be extracted from the corr plot?

thank you!

ATACseqQC atacseq R • 458 views
0
Entering edit mode
Ou, Jianhong ★ 1.2k
@ou-jianhong-4539
Last seen 10 weeks ago
United States

Hi achaillon,

Thank you for using ATACseqQC.

1. TSSE score. I am updating the formula for that. Please refer: post#133036.

2. The NFR score for each TSS is calculated as NFR-score = log2(nf) - log2((n1+n2)/2). The negative value indicates that your open region is shifted from TSS. See review: Gene regulation by nucleosome positioning. You can do it chrom by chrom. You can also do it by whole genome.

3. We did not provide function to calculate the FRIP score. I will add this into TODO list.

4. IDR score. We did not provide function for that. I don't think you can extract IDR score from corr plot. Basically, IDR score is indicates the correlation of peak ranks in duplicates. So the IDR score is calculated for each peak. When I design the package, I did take the calling peaks into account. I am thinking if we can use nucleosome free region and nucleosome region to replace the peaks for IDR calculation.

Jianhong.

0
Entering edit mode

Hi Jianhong,

Since TSSE score is a bug fix, could you please fix it in the released version as well. Thanks!

Best, Julie

0
Entering edit mode

thank you so much Jianhong. Sorry for not getting back to you earlier. Looking forward to testing the FRIP score function (any ETA?). Best,

0
Entering edit mode

Dear Jianhong I've updated ATACseqQC to version 1.13.5 and tried to rerun the TSSE Score for one sample, full region. I still have these scores with infinite values... Am I missing something?

thank you

gal <- readBamFile(bamfile, tag=character(0),asMates=TRUE, bigFile=TRUE)
shiftedBamfile <- file.path(outPath, "shifted.bam")
gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)

txs <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
tsse <- TSSEscore(gal1, txs)

> summary(tsse\$TSS.enrichment.score)
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
0.000   2.921     Inf     Inf     Inf     Inf   21701