Dear Subread developers,
I am analyzing RNA sequencing data from ribo-depleted RNA samples generated using 150 bp paired-end, stranded sequencing. Reads were aligned to the human reference genome GRCh38 (Ensembl release 115) using STAR, with more than 95 percent of reads successfully aligned. I then quantified gene expression using featureCounts with the corresponding Ensembl GTF (release 115).
When running featureCounts with -t exon -g gene_id, approximately 20 to 30 percent of the aligned reads are assigned, which is expected given that this setting effectively quantifies mature (exonic) RNA only. In contrast, when using -t gene -g gene_id, the proportion of assigned reads increases to about 70 to 85 percent, consistent with aggregation across the full gene body, including the intronic and other non-exonic regions that ribo-depleted libraries capture.
However, I observe unexpected behavior: some genes that have non-zero counts in every sample with -t exon have zero counts with -t gene. Intuitively, I would expect these genes to retain at least the same counts (or gain counts) when switching from exon-level to gene-level features, not to drop to zero.
Is there a plausible explanation for this behavior?
Thanks in advance for your help.
Best regards.

Have you told featureCounts to do strand-specific counting (the -s option)?
As pointed out by Frances Turner, you can lose reads if they overlap the gene bodies of more than one gene: by default featureCounts does not assign a read that overlaps multiple genes to any of them. Such overlaps become more common when you count over full gene bodies, but they are greatly reduced if strand is taken into account, because most overlapping genes are on opposite strands.
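To make the mechanism concrete, here is a minimal toy sketch in Python. It is not featureCounts itself, only a simplified model of its default rule that a read overlapping more than one gene is left unassigned. The gene names, coordinates, and reads are all hypothetical: geneA sits inside an intron of geneB on the opposite strand, and each read is labelled with the strand of the transcript it came from.

```python
from collections import Counter

# Hypothetical annotation rows: (gene_id, start, end, strand).
# Exon-level view: geneB's intron (91-899) is not a feature.
EXON_FEATURES = [
    ("geneA", 100, 200, "+"),
    ("geneB", 50, 90, "-"),
    ("geneB", 900, 1000, "-"),
]
# Gene-level view: geneB's full gene body now spans geneA.
GENE_FEATURES = [
    ("geneA", 100, 200, "+"),
    ("geneB", 50, 1000, "-"),
]
# Hypothetical reads: (start, end, strand of originating transcript).
READS = [
    (110, 150, "+"),  # from geneA
    (160, 190, "+"),  # from geneA
    (500, 540, "-"),  # intronic read from geneB
]


def count_reads(features, reads, stranded):
    """Count reads per gene; reads hitting more than one gene are ambiguous."""
    counts = Counter({gene_id: 0 for gene_id, *_ in features})
    ambiguous = 0
    for r_start, r_end, r_strand in reads:
        # Collect the distinct genes whose features overlap this read.
        hits = {
            gene_id
            for gene_id, f_start, f_end, f_strand in features
            if r_start <= f_end and r_end >= f_start
            and (not stranded or r_strand == f_strand)
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif len(hits) > 1:
            ambiguous += 1  # left unassigned, as in the default behaviour
    return dict(counts), ambiguous


exon_unstranded = count_reads(EXON_FEATURES, READS, stranded=False)
gene_unstranded = count_reads(GENE_FEATURES, READS, stranded=False)
gene_stranded = count_reads(GENE_FEATURES, READS, stranded=True)

print("exon, unstranded:", exon_unstranded)
print("gene, unstranded:", gene_unstranded)
print("gene, stranded:  ", gene_stranded)
```

In this toy case geneA has 2 reads at exon level, drops to 0 at gene level without strand information (both reads become ambiguous with geneB), and recovers its 2 reads once strand is used. In featureCounts itself strandedness is set with -s (0 unstranded, 1 stranded, 2 reversely stranded); which of 1 or 2 applies depends on the library prep, so it is worth checking rather than assuming.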