Search
Question: Reads mistakenly being assigned to Unassigned_NoFeatures category when using featureCounts
0
gravatar for jma1991
6 months ago by
jma199130
jma199130 wrote:

:: I originally posted my question on Biostars, but according to the featureCounts website, I should post on this forum for help ::

I am using featureCounts (version 1.5.2) to count the number of reads within bins along the genome:

    featureCounts -R -F SAF --fracOverlap 1 -Q 1 --primary -T 20 -a Bins.saf -o readCounts.txt Input.bam

A small percentage of my reads are not being counted:

    Status    Input.bam
    Assigned    25677313
    Unassigned_Ambiguity    0
    Unassigned_MultiMapping    0
    Unassigned_NoFeatures    344088
    Unassigned_Unmapped    0
    Unassigned_MappingQuality    0
    Unassigned_FragmentLength    0
    Unassigned_Chimera    0
    Unassigned_Secondary    0
    Unassigned_Nonjunction    0
    Unassigned_Duplicate    0

However, when I remove the --fracOverlap command, 100% of my reads are assigned. When I view the location of the unassigned reads in IGV alongside my bins, they are mapped completely within the bins. Therefore shouldn't they be counted?

Here is the SAM output of some of the unassigned reads:

    HISEQ:132:C87JAANXX:8:2102:21334:46258    1040    chr1    64313502    60    47M3S    *    0    0    GCCTGTAGAGCATTTTCTTAAATAGTGATTGATGGGGGAGAGCCCAGANA    DGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGBGE<<3#3    MD:Z:47    PG:Z:MarkDuplicates    RG:Z:C87JAANXX.8    NM:i:0    AS:i:47    XS:i:32
    HISEQ:287:C8785ANXX:2:2108:10903:32326    16    chr1    64348161    60    46M4S    *    0    0    TAATATATCAATCTTGTTTTTCATAGGTAAACAGGGTGAAAGATTTACTC    GGGEGGGFGGGGGBEGBGGGEEG>GGGGGGGGEGGGGGGGGGGGGCCCCC    MD:Z:46    PG:Z:MarkDuplicates    RG:Z:C8785ANXX.2    NM:i:0    AS:i:46    XS:i:19
    HISEQ:132:C87JAANXX:8:2306:5138:50069    0    chr1    64416635    60    48M2S    *    0    0    AGTCACAAGTGCACAGGTTCCAGAGGTCAGGCCTTGAGCAGTTTGGGGCT    CBBBBFGGGGGFGGGGGGGGGEG1FGGGGGG1=FGGEGGGGGGGGGGFGG    MD:Z:31A16    PG:Z:MarkDuplicates    RG:Z:C87JAANXX.8    NM:i:1    AS:i:43    XS:i:19
    HISEQ:132:C87JAANXX:8:1310:2745:62463    1040    chr1    64441532    60    4S46M    *    0    0    AGTTCTCTCAAAAACAACATCTGTAATGTTGTTAGGACCTGGGTCCTAAC    :11:E::=/0=1E<1=1111>F:E1=0<1E=1@GGGF0>GBGFGGBBBBB    MD:Z:46    PG:Z:MarkDuplicates    RG:Z:C87JAANXX.8    NM:i:0    AS:i:46    XS:i:0
    HISEQ:287:C8785ANXX:1:2301:12294:65428    1024    chr1    64540417    14    2S48M    *    0    0    GTTGTGTGTGTGTGTGTGGGTGTGTGTGTGTGTGTTTGTGTTATATGTGT    BABABGGGGGGGGGGGGG/=CCGGGGGGGFGGG>FGGGGF?==F:FGGGG    MD:Z:16T31    PG:Z:MarkDuplicates    RG:Z:C8785ANXX.1    NM:i:1    AS:i:43    XS:i:39
    HISEQ:287:C8785ANXX:5:2304:7860:85658    0    chr1    64540417    18    2S48M    *    0    0    GTTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGTGTTATATGTGT    BCCBBGGGGEGFGGGGGGGGFDGGGGGGGGGGGGGBGGGGGGGD@GGGFG    MD:Z:48    PG:Z:MarkDuplicates    RG:Z:C8785ANXX.5    NM:i:0    AS:i:48    XS:i:43
    HISEQ:287:C8785ANXX:5:2101:11743:10794    16    chr1    64575425    60    3M1I46M    *    0    0    TGATTTTTTTTTTTCACATATGGAGATTGTATGGTGAGGAAAACCCAAAC    CEFEE<BGGGGEEGEGEGGGGGGGGGEGGGGGGGGGGGGGGGGGGCCCCC    MD:Z:49    PG:Z:MarkDuplicates    RG:Z:C8785ANXX.5    NM:i:1    AS:i:46    XS:i:21

And here is the flagstat output from the BAM file:

   26021401 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 secondary
    0 + 0 supplementary
    8931396 + 0 duplicates
    26021401 + 0 mapped (100.00% : N/A)
    0 + 0 paired in sequencing
    0 + 0 read1
    0 + 0 read2
    0 + 0 properly paired (N/A : N/A)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (N/A : N/A)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)


  [1]: https://preview.ibb.co/fOGhT5/igv_snapshot.png

ADD COMMENTlink modified 6 months ago by Wei Shi2.7k • written 6 months ago by jma199130
3
gravatar for Wei Shi
6 months ago by
Wei Shi2.7k
Australia
Wei Shi2.7k wrote:

Your command requires each read to 100% overlap with a feature ('--fracOverlap 1') but the reads you provided do not 100% overlap with any feature. These reads have soft-clipped bases ('S' section in CIGAR) or inserted bases ('I' section in CIGAR), making them have the fraction of overlapping bases being less than 1. This is the reason why they were assigned to the "Unassigned_NoFeatures" category. 

ADD COMMENTlink written 6 months ago by Wei Shi2.7k

Ah that makes sense, thank you for replying

ADD REPLYlink written 6 months ago by jma199130
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 181 users visited in the last hour