Dealing with multi-overlapping and multi-mapping reads
ag1805x ▴ 80
Last seen 16 days ago
University of Allahabad

Should multi-overlapping and multi-mapping reads be counted during read quantification in human RNA-seq/miRNA-seq data analysis?

The Rsubread manual says that, for RNA-seq experiments, reads or fragments overlapping more than one gene are not counted, because any single fragment must originate from only one of the target genes but the identity of the true target gene cannot be confidently determined. But genuinely overlapping genes do exist.

With stranded RNA-seq, overlapping genes are better resolved, which suggests that for such data we can avoid counting multi-overlapping reads. But stranded RNA-seq only solves the problem for genes on different strands. Ignoring multi-overlapping reads even with stranded RNA-seq can still lose reads from overlapping genes on the same strand.
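To make the strand argument concrete, here is a minimal sketch of overlap-based assignment. The `Gene` tuple, the coordinates, and the `overlapping_genes` helper are all made up for illustration; this is not how featureCounts or HTSeq is implemented, just the idea in a few lines.

```python
from collections import namedtuple

# Hypothetical toy annotation: closed intervals on a single chromosome.
Gene = namedtuple("Gene", "name start end strand")

def overlapping_genes(read_start, read_end, read_strand, genes, stranded=True):
    """Return the names of genes that the read overlaps.

    With stranded=True, only genes on the read's strand are considered.
    """
    hits = []
    for g in genes:
        overlaps = read_start <= g.end and g.start <= read_end
        strand_ok = (not stranded) or g.strand == read_strand
        if overlaps and strand_ok:
            hits.append(g.name)
    return hits

genes = [
    Gene("geneA", 100, 500, "+"),
    Gene("geneB", 400, 800, "-"),  # overlaps geneA on the opposite strand
    Gene("geneC", 450, 900, "+"),  # overlaps geneA on the same strand
]

# A "+" strand read that falls in the region shared by all three genes.
unstranded_hits = overlapping_genes(460, 490, "+", genes, stranded=False)
stranded_hits = overlapping_genes(460, 490, "+", genes, stranded=True)

print(unstranded_hits)  # all three genes are candidates
print(stranded_hits)    # geneB is ruled out, geneA vs geneC stays ambiguous
```

Strand information removes geneB from the candidates, but the read is still multi-overlapping between geneA and geneC, which is exactly the same-strand case described above.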

The same applies to multi-mapping reads. A read from a gene can map both to the parent gene and to a similar pseudogene. Removing such ambiguous reads will under-represent the parent gene even though it was expressed, while counting them will over-represent the pseudogene.
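A small sketch of the trade-off, using invented numbers: 6 reads map only to the parent gene and 4 map equally well to the parent and to the pseudogene. The `count_reads` helper and its three modes are hypothetical; they are loosely analogous to ignoring multi-mappers, counting them everywhere, and splitting them fractionally (featureCounts exposes similar behaviour through its `-M` and `--fraction` options, but this code does not reproduce that tool).

```python
from collections import Counter

def count_reads(read_hits, mode):
    """Count reads per gene under one of three toy strategies.

    read_hits: list of lists, each inner list holds the genes a read maps to.
    mode: "unique"   -> ambiguous reads are dropped
          "all"      -> ambiguous reads add 1 to every gene they hit
          "fraction" -> ambiguous reads add 1/n to each of the n genes
    """
    counts = Counter()
    for hits in read_hits:
        if len(hits) == 1:
            counts[hits[0]] += 1.0
        elif mode == "all":
            for gene in hits:
                counts[gene] += 1.0
        elif mode == "fraction":
            for gene in hits:
                counts[gene] += 1.0 / len(hits)
        # mode == "unique": ambiguous reads contribute nothing
    return dict(counts)

# Invented example: only the parent gene is truly expressed.
reads = [["parent"]] * 6 + [["parent", "pseudo"]] * 4

unique_counts = count_reads(reads, "unique")
all_counts = count_reads(reads, "all")
fraction_counts = count_reads(reads, "fraction")

print(unique_counts)    # parent loses its 4 ambiguous reads
print(all_counts)       # pseudogene gains 4 reads it never produced
print(fraction_counts)  # a compromise, still wrong in both directions
```

None of the three modes recovers the truth here (10 reads from the parent, 0 from the pseudogene), which is the dilemma the question is about.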

So counting these ambiguous reads may produce false positives, while ignoring them loses reads?

What is the general consensus in dealing with ambiguous reads in RNA-seq data analysis?

RNA-seq featureCounts ht-seq • 3.1k views
swbarnes2 ★ 1.3k
Last seen 1 day ago
San Diego

The simple answer: don't use subread. Probably not what you want to hear, but the best way to deal with multi-mappers is to use a quantifier that understands them and handles them explicitly.

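Pseudoaligners like kallisto and Salmon will figure out how to intelligently distribute ambiguous reads. RSEM can do much the same thing from a BAM, though note that it expects alignments in transcript coordinates (for example, the transcriptome output STAR can produce) rather than a plain genome-aligned BAM.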
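For intuition only, here is a stripped-down sketch of the expectation-maximization idea these tools build on: ambiguous reads are split in proportion to the current abundance estimates, and the estimates are then updated from the split. The `em_abundance` function and the read counts are invented for illustration, and the sketch ignores transcript length, fragment bias, and everything else the real tools model.

```python
def em_abundance(read_hits, targets, n_iter=50):
    """Toy EM: estimate expected read counts per target.

    read_hits: list of lists, each inner list holds the targets a read
               is compatible with.
    targets:   list of all target names.
    """
    n_reads = len(read_hits)
    # Start from a uniform guess of relative abundance.
    theta = {t: 1.0 / len(targets) for t in targets}
    for _ in range(n_iter):
        expected = {t: 0.0 for t in targets}
        # E-step: split each read among its targets by current abundance.
        for hits in read_hits:
            total = sum(theta[t] for t in hits)
            for t in hits:
                expected[t] += theta[t] / total
        # M-step: new abundance is the share of reads assigned to each target.
        theta = {t: expected[t] / n_reads for t in targets}
    return {t: theta[t] * n_reads for t in targets}

# Invented example: 6 reads unique to the parent, 2 unique to the
# pseudogene, and 4 that are compatible with both.
reads = [["parent"]] * 6 + [["pseudo"]] * 2 + [["parent", "pseudo"]] * 4

estimates = em_abundance(reads, ["parent", "pseudo"])
print(estimates)
```

In this toy case the ambiguous reads end up split roughly 3:1, following the ratio of the uniquely mapped evidence, instead of being dropped or split evenly.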

