Why is it better to count genes at the exon level using Rsubread::featureCounts()?
Sara • 0
United States


I am using Rsubread's featureCounts() to quantify the genes in my RNA-Seq data. There are two options: count "genes" directly, or count "exons", which are then aggregated to gene counts when counting "meta-features" instead of features. Why is summing exon counts preferred to counting genes directly? I am having trouble finding this answer.

Thank You, Sara

RNASeq Rsubread featureCounts Quantification • 1.3k views
WEHI, Melbourne, Australia

I feel that I answered your question previously: Quantification of Genes with RSubread::featureCounts() at exon-level vs gene-level? but I will try again. You are still misinterpreting how the read counting works, and I hope that a few more words will clarify things for you. You don't say which code options you are considering, but I will assume they are the same as in your previous question.

featureCounts does not count reads at the exon level and then add them up to get gene-level counts. The options you are referring to are instead a choice between

  1. Counting reads over whole gene bodies (from TSS to TES), including both exons and introns
  2. Or counting only reads that arise from the expressed part of each gene, i.e. its exons.

We recommend the latter approach because counting reads that map entirely to introns, as the first approach does, tends to increase noise. Neither of these approaches is equivalent to exon-level counting.
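
To make the distinction concrete, here is a toy sketch in plain Python. It is not featureCounts itself: the gene, the coordinates, the reads, and the function names are all made up for illustration, and each read is simplified to a single interval. It contrasts gene-body counting, exon-based gene counting, and a naive "sum of per-exon counts".

```python
# Toy illustration only: one hypothetical gene and four hypothetical reads.
# Reads and features are closed intervals (start, end) on the same strand.
GENE_BODY = (100, 500)              # the whole gene range, TSS to TES
EXONS = [(100, 200), (400, 500)]    # the exons of the same gene

READS = [
    (120, 170),  # lies inside exon 1
    (250, 300),  # lies entirely inside the intron
    (180, 420),  # touches both exons
    (600, 650),  # lies outside the gene
]


def overlaps(read, feature):
    """Return True if the two closed intervals share at least one base."""
    return read[0] <= feature[1] and feature[0] <= read[1]


def count_gene_body(reads):
    """Option 1: count every read that overlaps the gene range."""
    return sum(overlaps(r, GENE_BODY) for r in reads)


def count_exonic_per_gene(reads):
    """Option 2: count a read once for the gene if it overlaps any exon."""
    return sum(any(overlaps(r, e) for e in EXONS) for r in reads)


def sum_of_exon_counts(reads):
    """Naive exon-level counting followed by summing (not what option 2 does)."""
    return sum(overlaps(r, e) for r in reads for e in EXONS)


print("gene body:         ", count_gene_body(READS))
print("exonic, per gene:  ", count_exonic_per_gene(READS))
print("sum of exon counts:", sum_of_exon_counts(READS))
```

In this toy example the intronic read is picked up only by gene-body counting, and the read that touches both exons is counted once for the gene under option 2 but twice if per-exon counts are summed, which is why the two are not equivalent.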


Yes, I was confused weeks ago about several aspects of quantification, and now that I am reviewing my work I wanted to know exactly why it is better to quantify at the "exon level". You are right: I had looked into this and misinterpreted it. Thank you for this detailed answer. Now I understand, and it is clear why I was having trouble finding the answer to my faulty question.


If you open up the GTF file in an editor you will see why the featureCounts options are as they are, even though they might not seem intuitive at first glance. featureCounts() needs to know which rows of the GTF to use. For each gene, the GTF has a row called "gene", which specifies the entire gene range from TSS to TES, and one or more rows labelled "exon". For almost all purposes the "exon" rows are what are important, but for completeness featureCounts gives the option of using the "gene" rows instead by setting the GTF.featureType option. The names "gene" and "exon" refer here to row names of the GTF file, not to whether featureCounts is returning gene or exon-level counts.
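
As a rough sketch of what "choosing which rows of the GTF to use" means, the plain-Python snippet below filters a tiny, made-up GTF by feature type and groups the selected rows by gene. The `select_features` helper and the toy annotation are hypothetical; its two arguments are only named after the GTF.featureType and GTF.attrType options of featureCounts() and do not reproduce its implementation.

```python
import re
from collections import defaultdict

# A made-up, tab-separated GTF fragment: one "gene" row and two "exon" rows.
TOY_GTF = "\n".join([
    'chr1\ttoy\tgene\t100\t500\t.\t+\t.\tgene_id "G1";',
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\ttoy\texon\t400\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
])


def select_features(gtf_text, feature_type="exon", attr_type="gene_id"):
    """Keep rows whose third column equals feature_type, grouped by attr_type."""
    groups = defaultdict(list)
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if fields[2] != feature_type:
            continue  # this row is not of the requested feature type
        match = re.search(rf'{attr_type} "([^"]+)"', fields[8])
        if match:
            groups[match.group(1)].append((int(fields[3]), int(fields[4])))
    return dict(groups)


print(select_features(TOY_GTF, feature_type="exon"))
print(select_features(TOY_GTF, feature_type="gene"))
```

Either way the result is keyed by gene: selecting "exon" rows gives each gene its set of exon intervals, while selecting "gene" rows gives each gene a single interval spanning TSS to TES.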

