I am processing paired-end RNA-Seq data. I map them using Hisat2.0.3.Next I convert SAM files to BAM, sort and index the BAM file using samtools 1.3. I subsequently use the sorted BAM files to aggregate mapped reads based on annotation using featureCounts (Rsubread package v1.20.6).
After mapping the reads using Hisat2, we get some summary statistics like this:
20358841 reads; of these:
20358841 (100.00%) were paired; of these:
5046482 (24.79%) aligned concordantly 0 times
12600084 (61.89%) aligned concordantly exactly 1 time
2712275 (13.32%) aligned concordantly >1 times
----
5046482 pairs aligned concordantly 0 times; of these:
57755 (1.14%) aligned discordantly 1 time
----
4988727 pairs aligned 0 times concordantly or discordantly; of these:
9977454 mates make up the pairs; of these:
8723119 (87.43%) aligned 0 times
914540 (9.17%) aligned exactly 1 time
339795 (3.41%) aligned >1 times
78.58% overall alignment rate
After sorting and indexing, featureCounts on this BAM file tells me
Process BAM file xxxx.bam... ||
|| Paired-end reads are included. ||
|| Assign fragments (read pairs) to features... ||
|| ||
|| WARNING: reads from the same pair were found not adjacent to each ||
|| other in the input (due to read sorting by location or ||
|| reporting of multi-mapping read pairs). ||
|| ||
|| Read re-ordering is performed. ||
|| ||
|| Total fragments : 21770038 ||
|| Successfully assigned fragments : 10375129 (47.7%)
Which shows that the total fragments (i.e. paired-end reads) do not correspond with the number of paired-end reads that is given in the summary statistics from Hisat2. featureCounts tells me I have 21M fragments and Hisat2 tells me I have 20M fragments. How can this be possible?
Thank you for your reply, Wei.
So the larger number of fragments reported by featureCounts may be the result of multi-mapping reads?
However, similar, the number of counted fragments is higher than the number of concordantly mapped reads (exactly one time as in the example above) for some of the samples. This happens when we are not counting multi-mapping fragments, not counting multi-overlapping fragments and only counting fragments where both reads are mapped. How can this be explained?
Best,
Koen