featureCounts -L: high unassigned:mapq fraction
1
0
Entering edit mode
sikora • 0
@694bee18
Last seen 19 days ago
Germany

Dear Community,

I'd like to clarify unexpected behaviour of featureCounts when counting on ONT cDNA bam files, generated with minimap2 with the splice alignment preset. FeatureCounts version: 2.1.1. When running

featureCounts -L -C -Q 10 --primary -M -O --fraction -t transcript


aired-end : no                                               ||
||        Count read pairs : no                                               ||
||              Annotation : genes.filtered.gtf (GTF)                         ||
||      Dir for temp files : /scratch/local/snakepipes.khA6aeMgrY             ||
||                                                                            ||
||                 Threads : 8                                                ||
||                   Level : meta-feature level                               ||
||      Multimapping reads : counted (fractional)                             ||
||     Multiple alignments : primary alignment only                           ||
|| Multi-overlapping reads : counted                                          ||
||   Min overlapping bases : 1                                                ||
||          Long read mode : yes                                              ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file genes.filtered.gtf ...                                ||
||    Features : 267506                                                       ||
||    Meta-features : 67928                                                   ||
||    Chromosomes/contigs : 439                                               ||
||                                                                            ||
|| Process BAM file sorted.bam...          ||
||    Single-end reads are included.                                          ||
||    Total alignments : 9121703                                              ||
||    Successfully assigned alignments : 3639166 (39.9%)                      ||
||    Running time : 1.37 minutes                                             ||

i.e. ~ 40% of the fragments are counted. In the summary file, I notice 4481575 fragments (~ 49%) are Unassigned:Mapping Quality.

Samtools stats run on the same bam file on the genic regions tells me that <12% of alignments have MAPQ<10.

What could be the reason for the discrepancy between low MAPQ fraction calculated by samtools stats and by featureCounts ?

Best wishes,

Katarzyna

Bioconductor • 221 views
ADD COMMENT
0
Entering edit mode

FeatureCounts can behave differently with ONT cDNA BAM files due to long read splice patterns, and many users compare settings or presets much like gamers explore different modes with q789 game free Download this app. Alignment gaps, secondary reads, and strand settings often influence counting accuracy, so adjusting parameters is essential for reliable results.

0
Entering edit mode
Kevin Blighe ★ 4.0k
@kevin
Last seen 22 hours ago
The Cave, 181 Longwood Avenue, Boston, …

Hi Katarzyna,

The discrepancy arises because your samtools stats were restricted to genic regions only, where alignments are typically of higher quality and thus exhibit fewer low-MAPQ scores (<12% below 10). FeatureCounts, however, applies the -Q 10 filter to all input alignments upfront - regardless of whether they overlap features - before assessing overlaps with transcripts. In long-read ONT cDNA data aligned via minimap2's splice preset, a substantial fraction (~49%) of total alignments likely have MAPQ <10 overall, often due to multi-mappers, off-target hits, or lower-confidence placements outside genic areas.

To verify, re-run samtools stats on the full BAM (without genic restriction):

samtools stats sorted.bam | grep "mapped ("

This should reveal a higher overall low-MAPQ fraction aligning with featureCounts' unassigned tally. If it doesn't, inspect the MAPQ distribution directly:

samtools view sorted.bam | cut -f 5 | sort -n | uniq -c

Kevin

ADD COMMENT

Login before adding your answer.

Traffic: 797 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6