In short, the biochemistry and bioinformatics that you perform should both be aligned with the scientific questions you seek to ask.
The degree to which RNA-Seq can be taken as an indirect or proxy measure of protein abundance has been the subject of much research.
In a particular study, if RNA-Seq is used as a proxy for "gene expression", which is itself understood as a proxy for "protein abundance", then enriching for "mature" mRNA (i.e. spliced and ready for translation into protein) by any method (e.g. polyA-enrichment) is a way of strengthening your indirect readout of protein abundance.
In such a study, intronic reads are perforce taken to result from either:
- gDNA contamination
- artifacts of mapping reads to the genome/transcriptome (incorrect alignments, multi-mapping, etc.)
- readout from (un-annotated) nested genes
- pre-mRNA (e.g. escaping mature mRNA enrichment)
- intron retention likely to introduce a premature stop codon, leading to nonsense-mediated decay
- recursive splicing
- or some other artifact which is not the aim of the study.
However, if studying regulatory aspects of RNA is the aim of the study, then interrogating intronic reads can be informative. In particular, the papers you cite make observations about the timing of splicing w.r.t. transcription, alternative splicing, and other aspects of transcriptional control:
- "the pattern of intronic sequence read coverage is explained by nascent transcription in combination with co-transcriptional splicing"
- "intronic levels are a proxy for nascent transcription"
- "comparison of exonic and intronic expression changes can separate transcriptional and post-transcriptional effects"
The reason to exclude introns is (roughly) that you are using gene expression analysis of mRNA-Seq data as a proxy for protein abundance.
Your finding that DESeq results correlate with versus without dropping intronic reads does not address the research questions of your cited papers. If you instead identified genes where the relative abundance of intronic to exonic reads (consistently) changed between experimental conditions for selected (classes of) genes, you might have something to say about the relationship between those conditions and, say, co-transcriptional splicing or intron retention or transcriptional elongation rates or ...
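To make that last suggestion concrete, here is a rough Python sketch of what "relative abundance of intronic to exonic reads changing between conditions" could look like as a calculation. Everything in it is an assumption for illustration: the function names, the layout of the `counts` dictionary, the sample names, the toy counts, and the pseudocount of 1 are all made up, and you would need to produce per-gene intronic and exonic counts with your own counting tool. It only computes a per-gene shift in the mean log2 intron/exon ratio; it is not a statistical test.

```python
import math
from statistics import mean

PSEUDOCOUNT = 1.0  # assumption: guards against log2(0) when a gene has no intronic reads


def log2_intron_exon_ratio(intron_count, exon_count, pseudocount=PSEUDOCOUNT):
    """log2 of intronic over exonic read counts for one gene in one sample."""
    return math.log2((intron_count + pseudocount) / (exon_count + pseudocount))


def ratio_shift(counts, condition_a, condition_b):
    """Per-gene change in mean log2(intron/exon) from condition A to condition B.

    counts: {gene: {sample: (intron_count, exon_count)}}  (hypothetical layout)
    condition_a, condition_b: lists of sample names
    """
    shifts = {}
    for gene, per_sample in counts.items():
        ratios_a = [log2_intron_exon_ratio(*per_sample[s]) for s in condition_a]
        ratios_b = [log2_intron_exon_ratio(*per_sample[s]) for s in condition_b]
        shifts[gene] = mean(ratios_b) - mean(ratios_a)
    return shifts


# Toy counts, invented for illustration only.
counts = {
    "geneA": {"ctrl1": (15, 255), "ctrl2": (31, 511),
              "trt1": (63, 255), "trt2": (127, 511)},
    "geneB": {"ctrl1": (7, 63), "ctrl2": (15, 127),
              "trt1": (7, 63), "trt2": (15, 127)},
}

shifts = ratio_shift(counts, ["ctrl1", "ctrl2"], ["trt1", "trt2"])
for gene, shift in shifts.items():
    print(gene, round(shift, 2))
```

In this toy data, geneA gains intronic signal relative to exonic signal in the treated samples while geneB does not. Because the ratio is taken within each sample, library size largely cancels out (apart from the pseudocount), and intron/exon lengths cancel when the same gene is compared across conditions. For real data you would want a proper model over replicates rather than a difference of means.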