Question

Clarification on counting in Rsubread (featureCounts)

0

Entering edit mode

Luca • 0

@lucapiacentini-9597

Last seen 22 hours ago

Italy

Dear Subread developers,

I am analyzing RNA sequencing data from ribo-depleted RNA samples generated using 150 bp paired-end, stranded sequencing. Reads were aligned to the human reference genome GRCh38 (Ensembl release 115) using STAR, with more than 95 percent of reads successfully aligned. I then quantified gene expression using featureCounts with the corresponding Ensembl GTF (release 115).

When running featureCounts with -t exon -g gene_id, approximately 20 to 30 percent of the aligned reads are assigned, which is expected given that this setting effectively quantifies mature (exonic) RNA only. In contrast, when using -t gene -g gene_id, the proportion of assigned reads increases to about 70 to 85 percent, consistent with aggregation across the full gene body, including intronic and other non exonic regions present in ribo-depleted libraries.

However, I observe an unexpected behavior: for some genes that have non-zero counts across all samples when using -t exon, the corresponding counts are zero when using -t gene. Intuitively, I would expect these genes to retain at least the same counts (or even higher counts) when switching from exon-level to gene-level features, not to drop to zero.

Is there a plausible explanation for this behavior?

Thanks in advance for your help.

Best regards.

featureCounts Rsubread • 251 views

ADD COMMENT • link updated 15 hours ago by Gordon Smyth 53k • written 4 weeks ago by Luca • 0

1

Entering edit mode

Have you told featureCounts to do strand-specific counting?

As pointed out by Frances Turner, you can lose reads if they overlap the gene bodies of more than one gene. Such overlaps are increased if you consider full gene bodies, but are greatly reduced if strand is taken into account. Most overlapping genes are on opposite strands.

ADD REPLY • link 1 day ago Gordon Smyth 53k

score 2 · Answer 1 · 2026-01-12

2

Entering edit mode

Frances Turner ▴ 20

@6bce1540

Last seen 1 day ago

United Kingdom

You should look at the summary files produced by feature counts. This gives a breakdown of what happened to reads that were not assigned. You may find the number of read in the 'ambigious' category has increased . This could happen if the introns (but not exons) of the genes in question overlap with another gene

ADD COMMENT • link 1 day ago Frances Turner ▴ 20

0

Entering edit mode

Thanks to both of you for your helpful suggestions.

The counting was already strand-specific at both the exon and gene levels. I resolved the issue by specifying the -O option, which allows reads to be assigned to overlapping features. As expected, when switching from exon-level to gene-level features, I now obtain the same or higher counts.

I would like to ask a follow-up question, if possible.

I am working with ribo-depleted bulk RNA-seq data and plan to perform a conventional differential gene expression analysis using a count matrix generated by featureCounts with the -t exon and -g gene_id options (without -O). This approach focuses on mature transcripts but uses only about one third of the total aligned reads.

As a secondary aim, to better exploit the full sequencing depth, I would also like to analyze all RNA species (including nascent or immature RNAs), perform intron retention analyses, and carry out isoform-level analyses (for which I am aware that dedicated tools exist). In this context, I am wondering whether it would be more appropriate to use -t gene together with -O for these secondary analyses, or whether I should also consider using the -M and/or --fraction options when counting with featureCounts.

The recommended strategy in this situation is not entirely clear to me.

Thank you very much for your time and consideration.

ADD REPLY • link 1 day ago Luca • 0

0

Entering edit mode

Your follow-up question is very much wider than the first, and is essentially a research question rather a software question. As one of the featureCounts authors, we don't make recommendations for the sort of analyses you are proposing. Things like fractional counting are a bit ad hoc, and only you can judge whether they would be useful in the context of your own analyses. You can use featureCounts in conjunction with subjunc to count exon-exon junctions, which may help with isoform-level analyses.

ADD REPLY • link 16 hours ago • updated 15 hours ago Gordon Smyth 53k