featureCounts -L: high unassigned:mapq fraction
1
0
Entering edit mode
sikora • 0
@694bee18
Last seen 3 days ago
Germany

Dear Community,

I'd like to clarify unexpected behaviour of featureCounts when counting on ONT cDNA bam files, generated with minimap2 with the splice alignment preset. FeatureCounts version: 2.1.1. When running

featureCounts -L -C -Q 10 --primary -M -O --fraction -t transcript


aired-end : no                                               ||
||        Count read pairs : no                                               ||
||              Annotation : genes.filtered.gtf (GTF)                         ||
||      Dir for temp files : /scratch/local/snakepipes.khA6aeMgrY             ||
||                                                                            ||
||                 Threads : 8                                                ||
||                   Level : meta-feature level                               ||
||      Multimapping reads : counted (fractional)                             ||
||     Multiple alignments : primary alignment only                           ||
|| Multi-overlapping reads : counted                                          ||
||   Min overlapping bases : 1                                                ||
||          Long read mode : yes                                              ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file genes.filtered.gtf ...                                ||
||    Features : 267506                                                       ||
||    Meta-features : 67928                                                   ||
||    Chromosomes/contigs : 439                                               ||
||                                                                            ||
|| Process BAM file sorted.bam...          ||
||    Single-end reads are included.                                          ||
||    Total alignments : 9121703                                              ||
||    Successfully assigned alignments : 3639166 (39.9%)                      ||
||    Running time : 1.37 minutes                                             ||

i.e. ~ 40% of the fragments are counted. In the summary file, I notice 4481575 fragments (~ 49%) are Unassigned:Mapping Quality.

Samtools stats run on the same bam file on the genic regions tells me that <12% of alignments have MAPQ<10.

What could be the reason for the discrepancy between low MAPQ fraction calculated by samtools stats and by featureCounts ?

Best wishes,

Katarzyna

Bioconductor • 144 views
ADD COMMENT
0
Entering edit mode
Kevin Blighe ★ 4.0k
@kevin
Last seen 4 hours ago
The Cave, 181 Longwood Avenue, Boston, …

Hi Katarzyna,

The discrepancy arises because your samtools stats were restricted to genic regions only, where alignments are typically of higher quality and thus exhibit fewer low-MAPQ scores (<12% below 10). FeatureCounts, however, applies the -Q 10 filter to all input alignments upfront - regardless of whether they overlap features - before assessing overlaps with transcripts. In long-read ONT cDNA data aligned via minimap2's splice preset, a substantial fraction (~49%) of total alignments likely have MAPQ <10 overall, often due to multi-mappers, off-target hits, or lower-confidence placements outside genic areas.

To verify, re-run samtools stats on the full BAM (without genic restriction):

samtools stats sorted.bam | grep "mapped ("

This should reveal a higher overall low-MAPQ fraction aligning with featureCounts' unassigned tally. If it doesn't, inspect the MAPQ distribution directly:

samtools view sorted.bam | cut -f 5 | sort -n | uniq -c

Kevin

ADD COMMENT

Login before adding your answer.

Traffic: 1089 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6