featureCounts bug with unpaired reads
1
0
Entering edit mode
@1cf96ea4
Last seen 9 months ago
United States

Hi- I've noticed what i think is a bug in featureCounts with paired-end reads and --countReadPairs.

When a bam is used that has been filtered post alignment so that some reads that were originally paired are no longer paired, featureCounts still treats them like a complete read pair and counts them. The example attached was run using subread v2.0.6

featureCounts -p -B -O --countReadPairs -R BAM -a features.gtf -s 0 -o sample.featureCounts.txt sample.bam

Here is the input sample.bam (the input). There are three reads: one read pair and one unpaired read (but the flag indicates that it is paired, because it was a paired read who's mate was filtered out in post-processing)

A01700:8:HKCHLDMXY:2:1416:12409:1188:GTAGT_GCACT    99  chr1    228212618   42  96M =   228212620   98  ACCGAGGGCCGGCGCCACGTGGTGTACGAGGACGCGCAGGAGAACTTCGTGCTCAAGATCCTCTTCTGCAAGCAGTCGGACCGCGGCCTCTACACC    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF    MD:Z:96 XG:i:0  NM:i:0  XM:i:0  XN:i:0  XO:i:0  AS:i:0  YS:i:0  YT:Z:CP cr:i:11 cf:i:11 PG:Z:MarkDuplicates.8
A01700:8:HKCHLDMXY:2:1416:12409:1188:GTAGT_GCACT    147 chr1    228212620   42  96M =   228212618   -98 CGAGGGCCGGCGCCACGTGGTGTACGAGGACGCGCAGGAGAACTTCGTGCTCAAGATCCTCTTCTGCAAGCAGTCGGACCGCGGCCTCTACACCTG    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF    MD:Z:96 XG:i:0  NM:i:0  XM:i:0  XN:i:0  XO:i:0  AS:i:0  YS:i:0  YT:Z:CP cr:i:11 cf:i:11 PG:Z:MarkDuplicates.8
A01700:8:HKCHLDMXY:2:1349:1958:12806:TCGTT_TAGTG    99  chr9    94946976    34  96M =   94947018    138 ATTCTTTTTTTTTTTTTTCGAGATAGAGTCTCGCTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCGATCTCGGCTCACTGCAAGCTCCGCCTCCTGG    FFF,FFFFFFFFFFFFFFF,F::FF,F,F,F,FFF:::F::FFFFFFF:,FF,F,:F:F,FFFFF:F,F::F:F::::FFFF,F:FF,FF,,F,:F    MD:Z:96 XG:i:0  NM:i:0  XM:i:0  XN:i:0  XO:i:0  AS:i:0  XS:i:-52    YS:i:0  YT:Z:CP cr:i:7  cf:i:9  PG:Z:MarkDuplicates.8-2B7A7784

In sample.featureCounts.txt.summary, you can see that there are 2 fragments counted, even though the input bam only has 1 complete read pair:

Status  sample.bam
Assigned    0
Unassigned_Unmapped 0
Unassigned_Read_Type    0
Unassigned_Singleton    0
Unassigned_MappingQuality   0
Unassigned_Chimera  0
Unassigned_FragmentLength   0
Unassigned_Duplicate    0
Unassigned_MultiMapping 0
Unassigned_Secondary    0
Unassigned_NonSplit 0
Unassigned_NoFeatures   2
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    0

The log also indicates the same thing


        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
      v2.0.6

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                                                                            ||
||                           sample.bam                                       ||
||                                                                            ||
||             Output file : sample.featureCounts.txt                         ||
||                 Summary : sample.featureCounts.txt.summary                 ||
||              Paired-end : yes                                              ||
||        Count read pairs : yes                                              ||
||              Annotation : features.gtf (GTF)                               ||
||      Dir for temp files : ./                                               ||
||      Assignment details : <input_file>.featureCounts.bam                   ||
||                      (Note that files are saved to the output directory)   ||
||                                                                            ||
||                                                                            ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : counted                                          ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file features.gtf ...                                      ||
||    Features : 2                                                            ||
||    Meta-features : 2                                                       ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Process BAM file sample.bam...                                             ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 2                                                    ||
||    Successfully assigned alignments : 0 (0.0%)                             ||
||    Running time : 0.00 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
|| Summary of counting results can be found in file "sample.featureCounts.tx  ||
|| t.summary"                                                                 ||
||                                                                            ||
\\============================================================================//

And the detailed read count report in sample.bam.featureCounts.bam also shows that all three reads are counted even though one is not paired:

A01700:8:HKCHLDMXY:2:1416:12409:1188:GTAGT_GCACT    147 chr1    228212620   42  96M =   228212618   -98 CGAGGGCCGGCGCCACGTGGTGTACGAGGACGCGCAGGAGAACTTCGTGCTCAAGATCCTCTTCTGCAAGCAGTCGGACCGCGGCCTCTACACCTG    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF    MD:Z:96 XG:i:0  NM:i:0  XM:i:0  XN:i:0  XO:i:0  AS:i:0  YS:i:0  YT:Z:CP cr:i:11 cf:i:11 PG:Z:MarkDuplicates.8   XS:Z:Unassigned_NoFeatures
A01700:8:HKCHLDMXY:2:1416:12409:1188:GTAGT_GCACT    99  chr1    228212618   42  96M =   228212620   98  ACCGAGGGCCGGCGCCACGTGGTGTACGAGGACGCGCAGGAGAACTTCGTGCTCAAGATCCTCTTCTGCAAGCAGTCGGACCGCGGCCTCTACACC    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF    MD:Z:96 XG:i:0  NM:i:0  XM:i:0  XN:i:0  XO:i:0  AS:i:0  YS:i:0  YT:Z:CP cr:i:11 cf:i:11 PG:Z:MarkDuplicates.8   XS:Z:Unassigned_NoFeatures
A01700:8:HKCHLDMXY:2:1349:1958:12806:TCGTT_TAGTG    99  chr9    94946976    34  96M =   94947018    138 ATTCTTTTTTTTTTTTTTCGAGATAGAGTCTCGCTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCGATCTCGGCTCACTGCAAGCTCCGCCTCCTGG    FFF,FFFFFFFFFFFFFFF,F::FF,F,F,F,FFF:::F::FFFFFFF:,FF,F,:F:F,FFFFF:F,F::F:F::::FFFF,F:FF,FF,,F,:F    MD:Z:96 XG:i:0  NM:i:0  XM:i:0  XN:i:0  XO:i:0  AS:i:0  XS:i:-52    YS:i:0  YT:Z:CP cr:i:7  cf:i:9  PG:Z:MarkDuplicates.8-2B7A7784  XS:Z:Unassigned_NoFeatures
Rsubread featureCounts • 654 views
ADD COMMENT
1
Entering edit mode
Yang Liao ▴ 380
@yang-liao-6075
Last seen 4 weeks ago
Australia

This is not a bug. It is the intended behaviour of featureCounts. When counting read pairs, featureCounts will count all read pairs no matter they include both reads or only include one of the two reads.

ADD COMMENT

Login before adding your answer.

Traffic: 546 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6