The meaning of the results produced by the bamQC function in ATACseqQC
Gary

Hi,

I use bamQC in ATACseqQC for the quality analysis of our ATAC-Seq data, and I have some questions about its output, listed below. Could you help me? Many thanks.

Best,

Gary

My questions:

(1) The meaning of $totalQNAMEs

(2) The meaning of $nonRedundantFraction

(3) The meaning of $MAPQ. Is it mapping quality?

(4) How does ATACseqQC define low mapping quality in $notPassingQualityControlsRate?

(5) The difference between $duplicateRate, $PCRbottleneckCoefficient_1, and $PCRbottleneckCoefficient_2

bamQC results

> library(ATACseqQC)
> bamfile <- "Bulbul.bam"
> bamfile.label <- sub("\\.bam$", "", basename(bamfile))
> bamQC(bamfile, doubleCheckDup = TRUE, mitochondria = "chrM", outPath=NULL)
$totalQNAMEs
[1] 21916261

$duplicateRate
[1] 0.3664853

$mitochondriaRate
[1] 0.1325265

$properPairRate
[1] 0.8650588

$unmappedRate
[1] 0

$hasUnmappedMateRate
[1] 0.009750733

$notPassingQualityControlsRate
[1] 0

$nonRedundantFraction
[1] 0.4259382

$PCRbottleneckCoefficient_1
[1] 0.6878547

$PCRbottleneckCoefficient_2
[1] 3.223372

$MAPQ
   Var1     Freq
0     0  1084726
1     1  2871341
11   11  3289972
12   12   104817
14   14   816882
16   16    93679
17   17   501750
18   18   393854
19   19    54317
2     2   885517
21   21   276105
22   22  2311572
24   24  1988876
25   25   157419
28   28  1968735
31   31   157705
32   32   120801
33   33   119436
34   34   123993
35   35    44634
36   36  2199923
37   37   158525
38   38   179100
39   39   203899
40   40   359444
41   41  1728179
42   42  1664509
44   44 19352277
9     9   197263

$idxstats
   seqnames seqlength  mapped unmapped
1      chr1 148872119 4128800        0
2      chr2 196923045 5417894        0
3      chr3 131294613 3780349        0
4      chr4  91954985 2632716        0
5      chr5  70255457 2138911        0
6      chr6  47769531 1717115        0
7      chr7  48434409 1320034        0
8      chr8  38169314 1097923        0
9      chr9  32183676  909768        0
10    chr10  26112686  759545        0
11    chr11  29734599  926547        0
12    chr12  26924181  784131        0
13    chr13  20766281  587163        0
14    chr14  23539961  695308        0
15    chr15  18414836  502454        0
16    chr16  86149285 2480089        0
17    chr17  18671425  534943        0
18    chr18  16791751  470999        0
19    chr19  13212593  386971        0
20    chr20  20610846  610399        0
21    chr21  10574128  301684        0
22    chr22   7051356  305518        0
23    chr23   8322055  267565        0
24    chr24   9986730  340468        0
25    chr25   2816729  118446        0
26    chr26   8520532  272092        0
27    chr27   8282467  251498        0
28    chr28   7839907  262123        0
29    chr29   1612305   44851        0
30    chr30  23086460  897442        0
31    chr31   1266911   40603        0
32    chr32  81982468 2672024        0
33     chrM     17011 5752877        0

Ou, Jianhong

Hi Gary,

(1) The meaning of $totalQNAMEs
The total number of reads (single-end) or read pairs (paired-end), i.e. the number of query names (QNAMEs) in the BAM file.

(3) The meaning of $MAPQ. Is it mapping quality?
Yes. It is the count of reads at each mapping quality value.

(4) How does ATACseqQC define low mapping quality in $notPassingQualityControlsRate?
ATACseqQC does not define low mapping quality itself. The rate is taken from what is recorded in your BAM file, namely the SAM flag that marks a read as not passing quality controls.

(2) The meaning of $nonRedundantFraction, and (5) the difference between $duplicateRate, $PCRbottleneckCoefficient_1, and $PCRbottleneckCoefficient_2
You can refer to https://www.encodeproject.org/data-standards/terms/

Jianhong.
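
To make answers (3) and (4) a bit more concrete, here is a minimal base R sketch. It assumes the QC-fail status is read from SAM FLAG bit 0x200 (decimal 512), which is how the SAM specification marks reads not passing quality controls; the flag values below are made up for illustration, and how bamQC itself computes the rate internally is not shown here. The second part reorders a small excerpt of the $MAPQ table above numerically, assuming Var1 is stored as a factor or character (which would explain why 11 is listed before 2).

```r
# Hypothetical SAM FLAG values for five alignments (not from Bulbul.bam).
flags <- c(99L, 147L, 611L, 659L, 83L)

# Bit 0x200 (decimal 512) marks a read as not passing quality controls.
qc_fail <- bitwAnd(flags, 512L) != 0L
qc_fail_rate <- mean(qc_fail)

# Excerpt of the $MAPQ table above, in the order it was printed.
mapq <- data.frame(
  Var1 = factor(c("0", "1", "11", "2", "9")),
  Freq = c(1084726, 2871341, 3289972, 885517, 197263)
)

# Sort by the numeric MAPQ value instead of alphabetically.
mapq_sorted <- mapq[order(as.integer(as.character(mapq$Var1))), ]
print(qc_fail_rate)
print(mapq_sorted)
```

A $notPassingQualityControlsRate of 0, as in the output above, would then simply mean that no record in the BAM file has that flag bit set.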

Hi Jianhong,

Thanks a lot. May I ask an additional question? Using ENCODE's terms and definitions for ATAC-Seq library complexity, I don't understand why (1) my bottlenecking level is "Severe" based on my PBC1 value (0.6878547 < 0.7), but (2) my bottlenecking level is "None" based on my PBC2 value (3.223372 > 3). Could you help me? Many thanks.

Best,

Gary

Julie Zhu

Gary,

PBC1 = number of genomic locations with exactly one uniquely mapped read / number of genomic locations with at least one uniquely mapped read.

PBC2 = number of genomic locations with exactly one uniquely mapped read / number of genomic locations with exactly two uniquely mapped reads.
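
The two ratios can be sketched in base R as follows. This is only an illustration under stated assumptions: `reads_per_location` is a made-up vector giving the number of uniquely mapped reads at each distinct genomic location, `pbc_metrics` is a hypothetical helper (not part of ATACseqQC), and the non-redundant fraction is computed following ENCODE's definition (distinct locations divided by total reads). The exact implementation inside bamQC may differ.

```r
# Hypothetical toy data: uniquely mapped reads per distinct genomic location.
reads_per_location <- c(1, 1, 1, 1, 1, 1, 2, 3, 4, 5)

# Hypothetical helper, not an ATACseqQC function.
pbc_metrics <- function(counts) {
  counts <- counts[counts > 0]      # keep locations with at least one read
  m_distinct <- length(counts)      # locations with >= 1 read
  m1 <- sum(counts == 1)            # locations with exactly one read
  m2 <- sum(counts == 2)            # locations with exactly two reads
  list(
    NRF  = m_distinct / sum(counts),              # distinct / total reads
    PBC1 = m1 / m_distinct,
    PBC2 = if (m2 > 0) m1 / m2 else NA_real_      # undefined when m2 is 0
  )
}

metrics <- pbc_metrics(reads_per_location)
print(metrics)
```

In this toy example PBC1 is 6/10 = 0.6 while PBC2 is 6/1 = 6: only one location has exactly two reads, but several locations have three or more, which lowers PBC1 without affecting PBC2.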

If either PCR bottlenecking coefficient indicates a problem with library complexity, there is a problem regardless of the value of the other coefficient. In your case, PBC2 shows that the number of genomic locations with exactly two uniquely mapped reads is not a concern. However, PBC1 shows that there are too many genomic locations with more than one uniquely mapped read.
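
The "worst coefficient wins" rule can be written down as a small sketch. The cut points PBC1 < 0.7 ("severe") and PBC2 > 3 ("none") come from Gary's message above; the remaining boundaries (0.9 for PBC1, 1 for PBC2) are my assumption of the ENCODE ATAC-seq table and should be checked against the ENCODE page. All function names are hypothetical.

```r
# Assumed ENCODE-style cut points; verify against the ENCODE terms page.
classify_pbc1 <- function(x) {
  if (x < 0.7) "severe" else if (x <= 0.9) "moderate" else "none"
}
classify_pbc2 <- function(x) {
  if (x < 1) "severe" else if (x <= 3) "moderate" else "none"
}

# Bottlenecking levels ordered from best to worst.
levels_ordered <- c("none", "moderate", "severe")

# The overall call is the worse of the two individual calls.
overall_bottleneck <- function(pbc1, pbc2) {
  calls <- c(classify_pbc1(pbc1), classify_pbc2(pbc2))
  levels_ordered[max(match(calls, levels_ordered))]
}

# Values from the bamQC output above.
overall <- overall_bottleneck(0.6878547, 3.223372)
print(overall)
```

With the values from this thread, PBC1 is classified as "severe" and PBC2 as "none", so the overall call is "severe".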

Hope this answers your question.

Best regards,

Julie

Dear Julie,

Your explanation is very helpful. Thank you so much.

Best,

Gary

Dear Gary,

You are very welcome! Thanks for letting me know! It is a good question! This thread will very likely help others to evaluate their ATAC-seq data.

Best regards,

Julie
