Reads mistakenly being assigned to Unassigned_NoFeatures category when using featureCounts
1
0
Entering edit mode
jma1991 ▴ 70
@jma1991-11856
Last seen 20 months ago
Cumbernauld

:: I originally posted my question on Biostars, but according to the featureCounts website, I should post on this forum for help ::

I am using featureCounts (version 1.5.2) to count the number of reads within bins along the genome:

    featureCounts -R -F SAF --fracOverlap 1 -Q 1 --primary -T 20 -a Bins.saf -o readCounts.txt Input.bam

A small percentage of my reads are not being counted:

    Status    Input.bam
    Assigned    25677313
    Unassigned_Ambiguity    0
    Unassigned_MultiMapping    0
    Unassigned_NoFeatures    344088
    Unassigned_Unmapped    0
    Unassigned_MappingQuality    0
    Unassigned_FragmentLength    0
    Unassigned_Chimera    0
    Unassigned_Secondary    0
    Unassigned_Nonjunction    0
    Unassigned_Duplicate    0

However, when I remove the --fracOverlap command, 100% of my reads are assigned. When I view the location of the unassigned reads in IGV alongside my bins, they are mapped completely within the bins. Therefore shouldn't they be counted?

Here is the SAM output of some of the unassigned reads:

    HISEQ:132:C87JAANXX:8:2102:21334:46258    1040    chr1    64313502    60    47M3S    *    0    0    GCCTGTAGAGCATTTTCTTAAATAGTGATTGATGGGGGAGAGCCCAGANA    DGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGBGE<<3#3    MD:Z:47    PG:Z:MarkDuplicates    RG:Z:C87JAANXX.8    NM:i:0    AS:i:47    XS:i:32
    HISEQ:287:C8785ANXX:2:2108:10903:32326    16    chr1    64348161    60    46M4S    *    0    0    TAATATATCAATCTTGTTTTTCATAGGTAAACAGGGTGAAAGATTTACTC    GGGEGGGFGGGGGBEGBGGGEEG>GGGGGGGGEGGGGGGGGGGGGCCCCC    MD:Z:46    PG:Z:MarkDuplicates    RG:Z:C8785ANXX.2    NM:i:0    AS:i:46    XS:i:19
    HISEQ:132:C87JAANXX:8:2306:5138:50069    0    chr1    64416635    60    48M2S    *    0    0    AGTCACAAGTGCACAGGTTCCAGAGGTCAGGCCTTGAGCAGTTTGGGGCT    CBBBBFGGGGGFGGGGGGGGGEG1FGGGGGG1=FGGEGGGGGGGGGGFGG    MD:Z:31A16    PG:Z:MarkDuplicates    RG:Z:C87JAANXX.8    NM:i:1    AS:i:43    XS:i:19
    HISEQ:132:C87JAANXX:8:1310:2745:62463    1040    chr1    64441532    60    4S46M    *    0    0    AGTTCTCTCAAAAACAACATCTGTAATGTTGTTAGGACCTGGGTCCTAAC    :11:E::=/0=1E<1=1111>F:E1=0<1E=1@GGGF0>GBGFGGBBBBB    MD:Z:46    PG:Z:MarkDuplicates    RG:Z:C87JAANXX.8    NM:i:0    AS:i:46    XS:i:0
    HISEQ:287:C8785ANXX:1:2301:12294:65428    1024    chr1    64540417    14    2S48M    *    0    0    GTTGTGTGTGTGTGTGTGGGTGTGTGTGTGTGTGTTTGTGTTATATGTGT    BABABGGGGGGGGGGGGG/=CCGGGGGGGFGGG>FGGGGF?==F:FGGGG    MD:Z:16T31    PG:Z:MarkDuplicates    RG:Z:C8785ANXX.1    NM:i:1    AS:i:43    XS:i:39
    HISEQ:287:C8785ANXX:5:2304:7860:85658    0    chr1    64540417    18    2S48M    *    0    0    GTTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGTGTTATATGTGT    BCCBBGGGGEGFGGGGGGGGFDGGGGGGGGGGGGGBGGGGGGGD@GGGFG    MD:Z:48    PG:Z:MarkDuplicates    RG:Z:C8785ANXX.5    NM:i:0    AS:i:48    XS:i:43
    HISEQ:287:C8785ANXX:5:2101:11743:10794    16    chr1    64575425    60    3M1I46M    *    0    0    TGATTTTTTTTTTTCACATATGGAGATTGTATGGTGAGGAAAACCCAAAC    CEFEE<BGGGGEEGEGEGGGGGGGGGEGGGGGGGGGGGGGGGGGGCCCCC    MD:Z:49    PG:Z:MarkDuplicates    RG:Z:C8785ANXX.5    NM:i:1    AS:i:46    XS:i:21

And here is the flagstat output from the BAM file:

   26021401 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 secondary
    0 + 0 supplementary
    8931396 + 0 duplicates
    26021401 + 0 mapped (100.00% : N/A)
    0 + 0 paired in sequencing
    0 + 0 read1
    0 + 0 read2
    0 + 0 properly paired (N/A : N/A)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (N/A : N/A)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)


  [1]: https://preview.ibb.co/fOGhT5/igv_snapshot.png

subread featurecounts • 2.7k views
ADD COMMENT
3
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 1 day ago
Australia/Melbourne

Your command requires each read to 100% overlap with a feature ('--fracOverlap 1') but the reads you provided do not 100% overlap with any feature. These reads have soft-clipped bases ('S' section in CIGAR) or inserted bases ('I' section in CIGAR), making them have the fraction of overlapping bases being less than 1. This is the reason why they were assigned to the "Unassigned_NoFeatures" category. 

ADD COMMENT
0
Entering edit mode

Ah that makes sense, thank you for replying

ADD REPLY

Login before adding your answer.

Traffic: 598 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6