Question: featureCounts creating fragments and high NoFeatures value
0
gravatar for gregory.l.stone
2.7 years ago by
gregory.l.stone10 wrote:

I apologize in advance for my ignorance--this is my first time using both STAR and featureCounts.

my STAR command is STAR --runThreadN 4 --genomeDir $REF --readFilesIn ${FILE1} ${FILE2} --sjdbGTFfile $GTF --outSAMtype BAM Unsorted --outFileNamePrefix ${OUT_STAR}

my featureCounts command is featureCounts -T 2 -p -t exon -g gene_id -a ${GTF} -o counts.txt ${FILE1}

I am using STAR 2.5.2b and featureCounts 1.5.1

I am using human samples and the hg19 gtf annotation file. I ran featureCounts after getting unreliable results from htseq-count. My samples are paired, and featureCounts was appealing after struggling with htseq-count because I read that featureCounts handles paired-end reads very well.

 

The output from featureCounts is as follows:

Assigned        17798778
Unassigned_Ambiguity    316936
Unassigned_MultiMapping 3099021
Unassigned_NoFeatures   13022732
Unassigned_Unmapped     0
Unassigned_MappingQuality       0
Unassigned_FragmentLength       0
Unassigned_Chimera      0
Unassigned_Secondary    0
Unassigned_Nonjunction  0
Unassigned_Duplicate    0

 

These values are from 34237467 total fragments, and corresponds to a success rate of 52.0%.

I am confused because my STAR output is as follows:

Number of input reads |       33385184
                      Average input read length |       175
                                    UNIQUE READS:
                   Uniquely mapped reads number |       31138446
                        Uniquely mapped reads % |       93.27%
                          Average mapped length |       174.80
                       Number of splices: Total |       13176899
            Number of splices: Annotated (sjdb) |       12931737
                       Number of splices: GT/AG |       12993469
                       Number of splices: GC/AG |       93376
                       Number of splices: AT/AC |       7197
               Number of splices: Non-canonical |       82857
                      Mismatch rate per base, % |       0.25%
                         Deletion rate per base |       0.01%
                        Deletion average length |       1.62
                        Insertion rate per base |       0.01%
                       Insertion average length |       1.33
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       1399714
             % of reads mapped to multiple loci |       4.19%
        Number of reads mapped to too many loci |       6341
             % of reads mapped to too many loci |       0.02%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches |       0.00%
                 % of reads unmapped: too short |       2.50%
                     % of reads unmapped: other |       0.02%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%

 

Prior to running featureCounts I sorted my bam file by position.

I am confused primarily for 2 reasons. 1) why does featureCounts identify more input fragments (34237467) as were mapped by STAR (31138446). 2) Why is featureCounts returning such a high Unassigned_NoFeatures value?

 

Any help and advice would be invaluable and much appreciated.

 

Thanks,

Greg

 

tl;dr featureCounts assigning high number of no features and reading more input fragments than what was produced by STAR alignemnt.

ADD COMMENTlink modified 2.7 years ago by Wei Shi3.2k • written 2.7 years ago by gregory.l.stone10
Answer: featureCounts creating fragments and high NoFeatures value
1
gravatar for WouterDeCoster
2.7 years ago by
Antwerp, Belgium
WouterDeCoster110 wrote:

Based on my understanding:

STAR is reporting percentages on the number of reads received as input of STAR (so in the fastq files). In contrast, featureCounts is reporting percentages on the number of alignments, which is, due to multimapping, higher than the number of reads.

For STAR one fastq is multimapping in 20 locations, for featureCounts this read results in 20 multimapping alignments (as such inflating the number of Unassigned_NoFeatures).

ADD COMMENTlink written 2.7 years ago by WouterDeCoster110

Thank you for your thoughtful response. That definitely makes sense, I should've thought more closely about what the values meant before just comparing the numbers.

ADD REPLYlink written 2.7 years ago by gregory.l.stone10
Answer: featureCounts creating fragments and high NoFeatures value
0
gravatar for Wei Shi
2.7 years ago by
Wei Shi3.2k
Australia
Wei Shi3.2k wrote:

As @WouterDeCoster pointed, featureCounts counts alignments not reads. The total number of alignments will be greater than total number of reads if there are reads in your bam file that were reported multiple times.

The reporting of multi-mapping reads may increase the number in every category in featureCounts counting summary including 'Unassigned_NoFeatures' and 'Assigned'. I've typically seen ~50-70 percent of reads assigned to genes for human/mouse RNA-seq data. So your assignment rate of 52% looks OK to me.

ADD COMMENTlink written 2.7 years ago by Wei Shi3.2k

Great, thank you very much for your help

ADD REPLYlink written 2.7 years ago by gregory.l.stone10

Also you do not need to sort your bam files by position before you feed them to featureCounts. featureCounts accepts both name-sorted and location-sorted bam files. It processes these files in the same speed and returns you identical counting results.

ADD REPLYlink written 2.7 years ago by Wei Shi3.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 339 users visited in the last hour