Should multi-overlapping and multi-mapping reads be counted during read quantification in human RNA-seq/miRNA-seq data analysis?
The Rsubread manual mentions that reads or fragments overlapping more than one gene are not counted for RNA-seq experiments, because any single fragment must originate from only one of the target genes but the identity of the true target gene cannot be confidently determined. However, we must also remember that genes can genuinely overlap each other in the genome.
With stranded RNA-seq, overlapping genes are better resolved, which suggests that for such data we can avoid counting multi-overlapping reads. But strandedness only solves the problem for overlapping genes on opposite strands. Ignoring multi-overlapping reads, even with stranded RNA-seq, can still lose reads from genes that overlap on the same strand, so those genes may be under-counted or missed entirely.
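To make the overlap problem concrete, here is a toy sketch in Python. The gene coordinates, the reads, and the function names (`overlapping_genes`, `count_reads`) are all made up for illustration and are not any counting tool's API; it only mimics the general idea of dropping or keeping reads that overlap more than one gene.

```python
from collections import Counter

# Hypothetical toy annotation: gene -> (start, end, strand)
GENES = {
    "geneA": (100, 500, "+"),
    "geneB": (400, 800, "-"),  # overlaps geneA on the opposite strand
    "geneC": (450, 900, "+"),  # overlaps geneA on the same strand
}

# Hypothetical reads: (start, end, strand)
READS = [
    (120, 170, "+"),  # inside geneA only
    (420, 440, "+"),  # inside the geneA/geneB overlap
    (460, 490, "+"),  # inside the geneA/geneB/geneC overlap
    (600, 650, "-"),  # inside the geneB/geneC overlap
]


def overlapping_genes(start, end, strand, stranded):
    """Return every gene the read overlaps, optionally requiring the same strand."""
    hits = []
    for name, (g_start, g_end, g_strand) in GENES.items():
        overlaps = start <= g_end and end >= g_start
        if overlaps and (not stranded or strand == g_strand):
            hits.append(name)
    return hits


def count_reads(reads, stranded, allow_multi_overlap):
    """Count reads per gene; multi-overlap reads are dropped unless allowed."""
    counts = Counter()
    for start, end, strand in reads:
        hits = overlapping_genes(start, end, strand, stranded)
        if len(hits) == 1 or (allow_multi_overlap and hits):
            counts.update(hits)
    return dict(counts)


for stranded in (False, True):
    for allow in (False, True):
        print(stranded, allow, count_reads(READS, stranded, allow))
```

In this toy setup, strand information rescues the reads shared between geneA and geneB, but the read in the same-strand geneA/geneC overlap is still dropped unless multi-overlap counting is allowed, and geneC ends up with no counts at all.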
The same applies to multi-mapping reads. A read from a gene can map both to the parent gene and to a similar pseudogene. Removing such ambiguous reads will under-represent the parent gene even though it was expressed, while counting them will over-represent the pseudogene.
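A small Python sketch of this trade-off, under made-up assumptions: suppose all four reads truly come from the parent gene, but three of them align equally well to the pseudogene. The `quantify` function and its policy names ("unique", "all", "fractional") are hypothetical and only illustrate three common ways of treating multi-mappers, not any specific tool's behaviour.

```python
from collections import defaultdict

# Hypothetical toy data: each read lists every gene it aligns to equally well.
# Assumption for this example: all four reads truly originate from "parent".
ALIGNMENTS = [
    ["parent"],
    ["parent", "pseudo"],
    ["parent", "pseudo"],
    ["parent", "pseudo"],
]


def quantify(alignments, policy):
    """Count reads per gene under one of three multi-mapper policies.

    "unique":     discard any read that maps to more than one gene
    "all":        give every mapped gene a full count
    "fractional": split one count equally among the mapped genes
    """
    if policy not in ("unique", "all", "fractional"):
        raise ValueError(f"unknown policy: {policy}")
    counts = defaultdict(float)
    for genes in alignments:
        if len(genes) == 1:
            counts[genes[0]] += 1.0
        elif policy == "all":
            for gene in genes:
                counts[gene] += 1.0
        elif policy == "fractional":
            for gene in genes:
                counts[gene] += 1.0 / len(genes)
        # policy == "unique": the multi-mapping read is dropped
    return dict(counts)


for policy in ("unique", "all", "fractional"):
    print(policy, quantify(ALIGNMENTS, policy))
```

With these toy numbers, the "unique" policy leaves the parent gene with 1 of its 4 reads, "all" gives the unexpressed pseudogene 3 counts, and "fractional" sits in between without recovering the truth either.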
So counting these ambiguous reads may lead to false positives, but ignoring them may discard reads from genes that are genuinely expressed.
What is the general consensus in dealing with ambiguous reads in RNA-seq data analysis?