Search
Question: Hisat2 vs featureCounts summary statistics
0
gravatar for Koen Van den Berge
18 months ago by
Ghent University, Belgium
Koen Van den Berge90 wrote:

I am processing paired-end RNA-Seq data. I map them using Hisat2.0.3.Next I convert SAM files to BAM, sort and index the BAM file using samtools 1.3. I subsequently use the sorted BAM files to aggregate mapped reads based on annotation using featureCounts (Rsubread package v1.20.6).

After mapping the reads using Hisat2, we get some summary statistics like this:

20358841 reads; of these:
  20358841 (100.00%) were paired; of these:
    5046482 (24.79%) aligned concordantly 0 times
    12600084 (61.89%) aligned concordantly exactly 1 time
    2712275 (13.32%) aligned concordantly >1 times
    ----
    5046482 pairs aligned concordantly 0 times; of these:
      57755 (1.14%) aligned discordantly 1 time
    ----
    4988727 pairs aligned 0 times concordantly or discordantly; of these:
      9977454 mates make up the pairs; of these:
        8723119 (87.43%) aligned 0 times
        914540 (9.17%) aligned exactly 1 time
        339795 (3.41%) aligned >1 times
78.58% overall alignment rate

 

 

After sorting and indexing, featureCounts on this BAM file tells me

Process BAM file xxxx.bam...                                        ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||    Total fragments : 21770038                                              ||
||    Successfully assigned fragments : 10375129 (47.7%)                  

 

Which shows that the total fragments (i.e. paired-end reads) do not correspond with the number of paired-end reads that is given in the summary statistics from Hisat2. featureCounts tells me I have 21M fragments and Hisat2 tells me I have 20M fragments. How can this be possible?

 

ADD COMMENTlink modified 18 months ago by Wei Shi2.7k • written 18 months ago by Koen Van den Berge90
2
gravatar for Wei Shi
18 months ago by
Wei Shi2.7k
Australia
Wei Shi2.7k wrote:

The total number of fragments reported by featureCounts is the total number of alignments included in your bam files. If there are fragments that were reported more than once in your bam file, the total number of alignments will be greater than the total number of fragments included in your data.

ADD COMMENTlink written 18 months ago by Wei Shi2.7k

Thank you for your reply, Wei.

So the larger number of fragments reported by featureCounts may be the result of multi-mapping reads?

However, similar, the number of counted fragments is higher than the number of concordantly mapped reads (exactly one time as in the example above) for some of the samples. This happens when we are not counting multi-mapping fragments, not counting multi-overlapping fragments and only counting fragments where both reads are mapped. How can this be explained?

Best,

Koen

ADD REPLYlink written 18 months ago by Koen Van den Berge90
0
gravatar for Wei Shi
18 months ago by
Wei Shi2.7k
Australia
Wei Shi2.7k wrote:

Can you provide the parameters you used when running featureCounts?

The discrepancy might be due to how you define "concordantly mapped reads". Although you count fragments where both ends are mapped, the fragments with paired end distance greater than the nominal template length are counted by featureCounts as well and this will give you larger counts. If you use -d and -D options to specify the minimum and maximum fragment length, you will get reduced number of counts. So this is all dependent on how you count the reads.

ADD COMMENTlink written 18 months ago by Wei Shi2.7k

Dear Wei, thanks again.

I used featureCounts (Rsubread_1.20.6) with the following arguments:

featureCounts ( f i l e s = bamFiles , annot . ext=”xxx.gff3”, isGTFAnnotationFile = TRUE, useMetaFeatures = TRUE, isPairedEnd=TRUE, requireBothEndsMapped = TRUE, GTF . featureType = ”CDS” , GTF. attrType = ”ID” , allowMultiOverlap=FALSE) 

The genome is an in-house draft for a non-model organism.

For the mapping, Hisat2 has been used with default parameters, and I basically compared the number of concordantly mapped reads reported there with the number of assigned fragments in featureCounts.

ADD REPLYlink modified 18 months ago • written 18 months ago by Koen Van den Berge90

Thanks for the information. I think you will need to understand how Hisat2 defines "concordantly mapped reads". What is the limit on the distance between the two reads from the same pair?

If you set 'checkFragLength=TRUE'  in your featureCounts command, you will get less assigned reads.

ADD REPLYlink written 18 months ago by Wei Shi2.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 314 users visited in the last hour