Question: ChIPQC: reads counts in bam do not match ones from samtools flagstat
0
gravatar for atisou
4.7 years ago by
atisou0
Switzerland
atisou0 wrote:

Hi,

 

Apologies if the information is already written somewhere and I missed it.

I read the ChIPQC vignette, its manual, the ChIPQC Bioc2014 presentation, but could not find any more details about the "Reads" metrics and how the peaks numbers are calculated.

It says in the vignette (page 9) that Reads metrics corresponds to "Total number of reads in the bam file for each sample".

Here is my data obtained with the following commands:

> exp <- ChIPQC(metadata, annotation = "mm10")

> exp

Samples: 6 : samp1 samp2 ... samp5 samp6
           Tissue Factor Condition Treatment Replicate Peaks
samp1 brain  TF        WT    treat1         1   621

...
        Reads Map% Filt% Dup% ReadL FragL RelCC   SSD    RiP%
samp1 2227533  100  12.9    0    51   207  1.59 0.842 0.02210

> samp1 <- QCsample(exp, 1)

> samp1

                    ChIPQCsample
Number of Mapped reads: 2227533
Number of Mapped reads passing MapQ filter: 1939371
Percentage Of Reads as Non-Duplicates (NRF): 100(0)
Percentage Of Reads in Blacklisted Regions: NA
SSD: 0.842153930509319
Fragment Length Cross-Coverage: 0.000468866812610841
Relative Cross-Coverage: 1.58670988654781
Percentage Of Reads in GenomicFeature:
                        ProportionOfCounts
Peaks                         0.0002536905
LongPromoter20000to2000       0.0000000000
Promoters2000to500            0.0000000000
Promoters500                  0.0000000000
All5utrs                      0.0000000000
Alltranscripts                0.0000000000
Allcds                        0.0000000000
Allintrons                    0.0000000000
All3utrs                      0.0000000000
Percentage Of Reads in Peaks: 0.03
Number of Peaks: 8

...

The output from samtools flagstat is:

32041419 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
30769799 + 0 mapped (96.03%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

 

So my questions would be:

1 - where do the "Number of Mapped reads: 2227533" given by ChIPQC come from and why do they differ from samtools total number of reads?

2 - where does "Number of Peaks: 8" come from and why does it differ from the "Peaks = 621" given in exp summary (that's what I see as well in the bed file).

3- the RIP% thus corresponds to the number of reads in the peaks N=8, or N=621?

 

Thanks a lot in advance for you help,

 

kind regards,

 

H.

 

ps: thanks a lot for making this package available, it is extremely helpful!

ps2: this ChIP does not seem to have worked anyway (very low SSD, very few specific peaks etc), hence the need of thorough QC checks.

 

 

 

 

 

 

 

 

chipqc • 1.2k views
ADD COMMENTlink modified 4.7 years ago by Thomas Carroll400 • written 4.7 years ago by atisou0
Answer: ChIPQC: reads counts in bam do not match ones from samtools flagstat
0
gravatar for Thomas Carroll
4.7 years ago by
United States/New York/The Rockefeller University
Thomas Carroll400 wrote:
Hi, Sorry, I think the vignette could do with some clarification here. The result in "Mapped reads" should be the number of read mappings within chromosomes included in analysis. Small chromosomes may be omitted when smaller than shift size used for cross-coverage calculations. You can use the flagtagcounts() function on the ChIPQC object to get alittle more information on counts of different BAM flags. Worth noting that ChIPQC relies on having read duplicates pre-marked for its calculation of duplication rates. I would run Picards MarkDuplicates tool on your Bam files so you can see your duplication rates in ChIPQC. I will add a method for calculating duplicates which isn't dependent on the pre-marking in the new release hope that helps, tom On Tue, Mar 17, 2015 at 11:25 AM, atisou [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User atisou <https: support.bioconductor.org="" u="" 7468=""/> wrote Question: > ChIPQC: reads counts in bam do not match ones from samtools flagstat > <https: support.bioconductor.org="" p="" 65738=""/>: > > Hi, > > > > Apologies if this was written somewhere and I missed it. > > I read the ChIPQC vignette, its manual, the ChIPQC Bioc2014 presentation, > but could not find any more details about the "Reads" metrics. > > It says in the vignette (page 9) that Reads metrics corresponds to "Total > number of reads in the bam file for each sample". > > Here is my data obtained with the following commands: > > > exp <- ChIPQC(metadata, annotation = "mm10") > > > exp > > Samples: 6 : samp1 samp2 ... samp5 samp6 > Tissue Factor Condition Treatment Replicate Peaks > samp1 brain TF WT treat1 1 621 > ... > Reads Map% Filt% Dup% ReadL FragL RelCC SSD RiP% > samp12227533 100 12.9 0 51 207 1.59 0.842 0.02210 > > > samp1 <- QCsample(exp, 1) > > > samp1 > > ChIPQCsample > Number of Mapped reads: 2227533 > Number of Mapped reads passing MapQ filter: 1939371 > Percentage Of Reads as Non-Duplicates (NRF): 100(0) > Percentage Of Reads in Blacklisted Regions: NA > SSD: 0.842153930509319 > Fragment Length Cross-Coverage: 0.000468866812610841 > Relative Cross-Coverage: 1.58670988654781 > Percentage Of Reads in GenomicFeature: > ProportionOfCounts > Peaks 0.0002536905 > LongPromoter20000to2000 0.0000000000 > Promoters2000to500 0.0000000000 > Promoters500 0.0000000000 > All5utrs 0.0000000000 > Alltranscripts 0.0000000000 > Allcds 0.0000000000 > Allintrons 0.0000000000 > All3utrs 0.0000000000 > Percentage Of Reads in Peaks: 0.03 > Number of Peaks: 8 > > ... > > > > > > > > > > > > > > > > ------------------------------ > > You may reply via email or visit ChIPQC: reads counts in bam do not match ones from samtools flagstat >
ADD COMMENTlink modified 4.7 years ago • written 4.7 years ago by Thomas Carroll400

Hi,

I had already a look at flagtagcounts(), but this still does not explain the discrepancy in reads counts between ChIPQC and samtools flagstat.

flagtagcounts(exp)
samp1
UnMapped             0
Mapped         2227533
Duplicates           0
MapQPass       1939371
MapQPassAndDup       0

 

What worries me is that ChIPQC counts ~2M reads in total, while there are ~30M reads in total in the input bam file.

Why are there ~28M reads missing? or did I miss something important of ChIPQC?

 

 

ADD REPLYlink modified 4.7 years ago • written 4.7 years ago by atisou0
1
Hi, In your case here, ChIPQC is analysing only 1 chromosome, the first chromsome in peak file. This was done to save time as working across all chromosomes from multiple files which can take a longer time. So your counts will be only from the chromosome analysed From the manual: chromosomes Specification of which chromosomes to use for computing QC statistics. If missing, the first chromosome which has a peak is checked. If NULL, all chromosomes will be checked (which may be time-consuming). This can be a character string (e.g. “chr18”) or a vector or list of character strings. If it is an integer or vector of integers, the chromosomes will be checked based on the order that they are listed in a peak set. To run across all chromosomes, then use something like. exp <- ChIPQC(metadata, annotation = "mm10",chromosomes=NULL). I will clarify the entry from manual above and change the default here to something sensible (i.e missing will run all chromosomes not just 1 example chromsome) best, tom On Tue, Mar 17, 2015 at 3:38 PM, atisou [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User atisou <https: support.bioconductor.org="" u="" 7468=""/> wrote Comment: > ChIPQC: reads counts in bam do not match ones from samtools flagstat > <https: support.bioconductor.org="" p="" 65738="" #65748="">: > > Hi, > > I had already a look at flagtagcounts(), but this still not explain the > discrepancy in reads counts between ChIPQC and samtools flagstat. > > flagtagcounts(exp) > samp1 > UnMapped 0 > Mapped 2227533 > Duplicates 0 > MapQPass 1939371 > MapQPassAndDup 0 > > > > What worries me is that ChIPQC counts ~2M reads in total, while there are > ~30M reads in total in the input bam file. > > Why are there ~28M reads missing? or did I miss something important of > ChIPQC? > > > > > > ------------------------------ > > You may reply via email or visit > C: ChIPQC: reads counts in bam do not match ones from samtools flagstat >
ADD REPLYlink written 4.7 years ago by Thomas Carroll400

Ok, makes sense now. I did see that parameter description, but somehow, it did not hit my brain, apologies.

Thanks a lot again,

Kind regards,

H.

 

ADD REPLYlink written 4.7 years ago by atisou0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 172 users visited in the last hour