Hi,
Apologies if the information is already written somewhere and I missed it.
I read the ChIPQC vignette, its manual, the ChIPQC Bioc2014 presentation, but could not find any more details about the "Reads" metrics and how the peaks numbers are calculated.
It says in the vignette (page 9) that Reads metrics corresponds to "Total number of reads in the bam file for each sample".
Here is my data obtained with the following commands:
> exp <- ChIPQC(metadata, annotation = "mm10")
> exp
Samples: 6 : samp1 samp2 ... samp5 samp6
Tissue Factor Condition Treatment Replicate Peaks
samp1 brain TF WT treat1 1 621
...
Reads Map% Filt% Dup% ReadL FragL RelCC SSD RiP%
samp1 2227533 100 12.9 0 51 207 1.59 0.842 0.02210
> samp1 <- QCsample(exp, 1)
> samp1
ChIPQCsample
Number of Mapped reads: 2227533
Number of Mapped reads passing MapQ filter: 1939371
Percentage Of Reads as Non-Duplicates (NRF): 100(0)
Percentage Of Reads in Blacklisted Regions: NA
SSD: 0.842153930509319
Fragment Length Cross-Coverage: 0.000468866812610841
Relative Cross-Coverage: 1.58670988654781
Percentage Of Reads in GenomicFeature:
ProportionOfCounts
Peaks 0.0002536905
LongPromoter20000to2000 0.0000000000
Promoters2000to500 0.0000000000
Promoters500 0.0000000000
All5utrs 0.0000000000
Alltranscripts 0.0000000000
Allcds 0.0000000000
Allintrons 0.0000000000
All3utrs 0.0000000000
Percentage Of Reads in Peaks: 0.03
Number of Peaks: 8
...
The output from samtools flagstat is:
32041419 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
30769799 + 0 mapped (96.03%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
So my questions would be:
1 - where do the "Number of Mapped reads: 2227533
" given by ChIPQC come from and why do they differ from samtools total number of reads?
2 - where does "Number of Peaks: 8
" come from and why does it differ from the "Peaks = 621" given in exp summary (that's what I see as well in the bed file).
3- the RIP% thus corresponds to the number of reads in the peaks N=8, or N=621?
Thanks a lot in advance for you help,
kind regards,
H.
ps: thanks a lot for making this package available, it is extremely helpful!
ps2: this ChIP does not seem to have worked anyway (very low SSD, very few specific peaks etc), hence the need of thorough QC checks.
Hi,
I had already a look at
flagtagcounts
(), but this still does not explain the discrepancy in reads counts between ChIPQC and samtools flagstat.flagtagcounts(exp)
samp1
UnMapped 0
Mapped 2227533
Duplicates 0
MapQPass 1939371
MapQPassAndDup 0
What worries me is that ChIPQC counts ~2M reads in total, while there are ~30M reads in total in the input bam file.
Why are there ~28M reads missing? or did I miss something important of ChIPQC?
Ok, makes sense now. I did see that parameter description, but somehow, it did not hit my brain, apologies.
Thanks a lot again,
Kind regards,
H.