Hi-
I've noticed what i think is a bug in featureCounts with paired-end reads and --countReadPairs
.
When a bam is used that has been filtered post alignment so that some reads that were originally paired are no longer paired, featureCounts still treats them like a complete read pair and counts them. The example attached was run using subread v2.0.6
featureCounts -p -B -O --countReadPairs -R BAM -a features.gtf -s 0 -o sample.featureCounts.txt sample.bam
Here is the input sample.bam (the input). There are three reads: one read pair and one unpaired read (but the flag indicates that it is paired, because it was a paired read who's mate was filtered out in post-processing)
A01700:8:HKCHLDMXY:2:1416:12409:1188:GTAGT_GCACT 99 chr1 228212618 42 96M = 228212620 98 ACCGAGGGCCGGCGCCACGTGGTGTACGAGGACGCGCAGGAGAACTTCGTGCTCAAGATCCTCTTCTGCAAGCAGTCGGACCGCGGCCTCTACACC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MD:Z:96 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YS:i:0 YT:Z:CP cr:i:11 cf:i:11 PG:Z:MarkDuplicates.8
A01700:8:HKCHLDMXY:2:1416:12409:1188:GTAGT_GCACT 147 chr1 228212620 42 96M = 228212618 -98 CGAGGGCCGGCGCCACGTGGTGTACGAGGACGCGCAGGAGAACTTCGTGCTCAAGATCCTCTTCTGCAAGCAGTCGGACCGCGGCCTCTACACCTG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MD:Z:96 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YS:i:0 YT:Z:CP cr:i:11 cf:i:11 PG:Z:MarkDuplicates.8
A01700:8:HKCHLDMXY:2:1349:1958:12806:TCGTT_TAGTG 99 chr9 94946976 34 96M = 94947018 138 ATTCTTTTTTTTTTTTTTCGAGATAGAGTCTCGCTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCGATCTCGGCTCACTGCAAGCTCCGCCTCCTGG FFF,FFFFFFFFFFFFFFF,F::FF,F,F,F,FFF:::F::FFFFFFF:,FF,F,:F:F,FFFFF:F,F::F:F::::FFFF,F:FF,FF,,F,:F MD:Z:96 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 XS:i:-52 YS:i:0 YT:Z:CP cr:i:7 cf:i:9 PG:Z:MarkDuplicates.8-2B7A7784
In sample.featureCounts.txt.summary, you can see that there are 2 fragments counted, even though the input bam only has 1 complete read pair:
Status sample.bam
Assigned 0
Unassigned_Unmapped 0
Unassigned_Read_Type 0
Unassigned_Singleton 0
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 0
Unassigned_Secondary 0
Unassigned_NonSplit 0
Unassigned_NoFeatures 2
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 0
The log also indicates the same thing
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
v2.0.6
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| ||
|| sample.bam ||
|| ||
|| Output file : sample.featureCounts.txt ||
|| Summary : sample.featureCounts.txt.summary ||
|| Paired-end : yes ||
|| Count read pairs : yes ||
|| Annotation : features.gtf (GTF) ||
|| Dir for temp files : ./ ||
|| Assignment details : <input_file>.featureCounts.bam ||
|| (Note that files are saved to the output directory) ||
|| ||
|| ||
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Multimapping reads : not counted ||
|| Multi-overlapping reads : counted ||
|| Min overlapping bases : 1 ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file features.gtf ... ||
|| Features : 2 ||
|| Meta-features : 2 ||
|| Chromosomes/contigs : 1 ||
|| ||
|| Process BAM file sample.bam... ||
|| Paired-end reads are included. ||
|| Total alignments : 2 ||
|| Successfully assigned alignments : 0 (0.0%) ||
|| Running time : 0.00 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
|| Summary of counting results can be found in file "sample.featureCounts.tx ||
|| t.summary" ||
|| ||
\\============================================================================//
And the detailed read count report in sample.bam.featureCounts.bam also shows that all three reads are counted even though one is not paired:
A01700:8:HKCHLDMXY:2:1416:12409:1188:GTAGT_GCACT 147 chr1 228212620 42 96M = 228212618 -98 CGAGGGCCGGCGCCACGTGGTGTACGAGGACGCGCAGGAGAACTTCGTGCTCAAGATCCTCTTCTGCAAGCAGTCGGACCGCGGCCTCTACACCTG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MD:Z:96 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YS:i:0 YT:Z:CP cr:i:11 cf:i:11 PG:Z:MarkDuplicates.8 XS:Z:Unassigned_NoFeatures
A01700:8:HKCHLDMXY:2:1416:12409:1188:GTAGT_GCACT 99 chr1 228212618 42 96M = 228212620 98 ACCGAGGGCCGGCGCCACGTGGTGTACGAGGACGCGCAGGAGAACTTCGTGCTCAAGATCCTCTTCTGCAAGCAGTCGGACCGCGGCCTCTACACC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MD:Z:96 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YS:i:0 YT:Z:CP cr:i:11 cf:i:11 PG:Z:MarkDuplicates.8 XS:Z:Unassigned_NoFeatures
A01700:8:HKCHLDMXY:2:1349:1958:12806:TCGTT_TAGTG 99 chr9 94946976 34 96M = 94947018 138 ATTCTTTTTTTTTTTTTTCGAGATAGAGTCTCGCTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCGATCTCGGCTCACTGCAAGCTCCGCCTCCTGG FFF,FFFFFFFFFFFFFFF,F::FF,F,F,F,FFF:::F::FFFFFFF:,FF,F,:F:F,FFFFF:F,F::F:F::::FFFF,F:FF,FF,,F,:F MD:Z:96 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 XS:i:-52 YS:i:0 YT:Z:CP cr:i:7 cf:i:9 PG:Z:MarkDuplicates.8-2B7A7784 XS:Z:Unassigned_NoFeatures