Question: Subread FeatureCounts for Exon level quantification: Length estimate for Exons as low as 1 base-pair
gravatar for jennyl.smith12
8 months ago by
jennyl.smith120 wrote:


I am working on quantifying RNASeq data at the exon level, which is intended for use in the DEXSeq package. I followed the documentation provided in the DEXSeq tutorial (, page 23), as well as the commands on Subread to DEXSeq helper code (

First to produce the GTF File, I used the following code: 


python2 -f Homo_sapiens.GRCh37.87_collapsed.gtf $GTF \ Homo_sapiens.GRCh37.87_collapsed.gff


Then I ran subread on the command line as submitted jobs on our shared HPC. We use the SLURM job scheduler. 

#Program:featureCounts v1.5.3; Command:

featureCounts -p -B -s 2 -T 16 -Q 10 -f -O -F GTF -a Homo_sapiens.GRCh37.87_collapsed.gtf -t exon -g gene_id  \ -o sample_withJunctionsOnGenome_dupsFlagged_stranded.exon.cts  \ sample_withJunctionsOnGenome_dupsFlagged.bam


I have two questions:

1) Why would I have exons for a gene that are only a few base-pairs long? I have some entries, in the example output below, where the length is calculated as only 1 base-pair.  This does not seem representative of the true expression of a real gene, which would not have exons of such short length. Could it be there was some error in the commands run above or in the way the GTF was produced? 

2) The number of assigned features for this quantification was particularly low.  I have two sets of RNASeq data; 1) created with Poly-A enrichment in the library prep and 2) produced with the ribodepletion (RBS) library protocol. Both sets had paired-end, strand-specific, 75bp reads, and were processed through identical bioinformatic pipeline from raw FASTQ to BAM.  I did check that the orientation of the paired reads were the same for both library preparation protocols.  It was these BAMs that I input to Subread featurecounts() function.  

For my poly-A library, I was getting on average 60% of my total reads assigned to features using the same commands as above, and same GTF reference. 

For my RBS library, I am getting on average 30-40% of my total reads assigned to features, again using the same commands as above and same GTF reference. The vast majority of reads, from the summary files produced by Subread and then summarized into one figure by MultiQC, for all samples were classified as "Unassigned_NoFeatures".  

I was thinking that this might make sense due to many additional non-protein coding RNAs that the Ribodepletion protocol will sequence.  Has anyone seen this type of difference before? Again, should I be optimizing my commands for ribodeplete RNASeq that I am not aware of?  


Thank you so much, 




Example of my output for one sample is below:

Geneid Chr Start End Strand Length Sample_withJunctionsOnGenome_dupsFlagged.bam
ENSG00000223972 1 11869 11871 + 3 0
ENSG00000223972 1 11872 11873 + 2 0
ENSG00000223972 1 11874 12009 + 136 1
ENSG00000223972 1 12010 12057 + 48 2
ENSG00000223972 1 12058 12178 + 121 3
ENSG00000223972 1 12179 12227 + 49 9
ENSG00000223972 1 12595 12612 + 18 1
ENSG00000223972 1 12613 12697 + 85 20
ENSG00000223972 1 12698 12721 + 24 10
ENSG00000223972 1 12975 13052 + 78 1
ENSG00000223972 1 13221 13224 + 4 3
ENSG00000223972 1 13225 13374 + 150 3
ENSG00000223972 1 13375 13402 + 28 0
ENSG00000223972 1 13403 13452 + 50 10
ENSG00000223972 1 13453 13655 + 203 7
ENSG00000223972 1 13656 13660 + 5 1
ENSG00000223972 1 13661 13670 + 10 1
ENSG00000223972 1 13671 14409 + 739 1
ENSG00000223972 1 14410 14412 + 3 0
ENSG00000227232 1 14363 14403 - 41 0
ENSG00000227232 1 14404 14410 - 7 0
ENSG00000227232 1 14411 14501 - 91 9
ENSG00000227232 1 14502 14502 - 1 8
ENSG00000227232 1 14503 14829 - 327 51




ADD COMMENTlink modified 8 months ago by Gordon Smyth36k • written 8 months ago by jennyl.smith120
Answer: Subread FeatureCounts for Exon level quantification: Length estimate for Exons a
gravatar for Gordon Smyth
8 months ago by
Gordon Smyth36k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth36k wrote:

In answer to your first question, that's what does. It collapses the original exons into smaller segments according to the intersections between different exons. If two original exons overlap except for one base, then it will output a collapsed exon consisting of that one base. This is presumably explained in the tutorial or documentation.

ADD COMMENTlink modified 8 months ago • written 8 months ago by Gordon Smyth36k
Answer: Subread FeatureCounts for Exon level quantification: Length estimate for Exons a
gravatar for Wei Shi
8 months ago by
Wei Shi3.0k
Wei Shi3.0k wrote:

Have you looked into your GTF file to see if these exons are indeed very short? featureCounts just uses your provided annotation for counting and it does not modify your annotation.

The counting percentage for your polyA sample from featureCounts sounds reasonable. I dont have much experience with RBS sequencing, but I cant think of any ways to improve your counting since apparently most of your reads do not overlap exons.

ADD COMMENTlink written 8 months ago by Wei Shi3.0k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 249 users visited in the last hour