Question

Small-RNA analysis using DESeq2

lehouri • 0

Hi,

We are currently using the DESeq2 R package to analyse differential expression of small RNAs over genes, and we would very much appreciate your help with a question about the treatment of multi-mapping reads:

As I understand it, when dealing with mRNA-seq the consensus is either to count only the one location that received the highest alignment score (whether by the aligner's scoring system, by "the rich get richer" methods, and so on) or to discard these reads completely.

While dealing with small-RNAs, however, the situation might be a bit different, since small-RNAs act in trans and have the potential of targeting each location they align to.

One option is to consider and count all the locations a read aligns to, but this will obviously affect the statistics greatly.

Another option is, as in mRNA-seq, to count only one location per read or to discard the multi-mapping reads completely. This option is not ideal either, since a large portion of small-RNA reads are actually derived from repetitive elements, and ignoring them will also shift the results.
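To make the options concrete, here is a minimal sketch in plain Python (not DESeq2 and not any real counting tool). The `alignments` table, the feature names, and the three helper functions are all hypothetical, and "first listed location" merely stands in for whatever best-hit rule an aligner would apply.

```python
from collections import Counter

# Hypothetical toy data: read id -> features (genes) the read aligns to.
alignments = {
    "r1": ["geneA"],
    "r2": ["geneA"],
    "r3": ["geneA", "geneB"],           # multi-mapping read
    "r4": ["geneB", "geneC", "geneD"],  # multi-mapping read
    "r5": ["geneC"],
}

def count_all_locations(alignments):
    """Count a read once for every feature it aligns to."""
    counts = Counter()
    for targets in alignments.values():
        counts.update(targets)
    return counts

def count_one_location(alignments):
    """Count each read once, at a single chosen location.

    Assumption: the first listed target plays the role of the best hit.
    """
    counts = Counter()
    for targets in alignments.values():
        counts[targets[0]] += 1
    return counts

def count_unique_only(alignments):
    """Discard multi-mapping reads; count only uniquely aligned reads."""
    counts = Counter()
    for targets in alignments.values():
        if len(targets) == 1:
            counts[targets[0]] += 1
    return counts

all_counts = count_all_locations(alignments)
one_counts = count_one_location(alignments)
unique_counts = count_unique_only(alignments)
```

With these five toy reads, counting all locations yields 8 counts in total (more counts than reads), counting one location per read yields 5, and keeping only unique reads yields 3 and drops geneB and geneD entirely.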

This leads me to the question itself: given that we intend to use DESeq2 for differential expression analysis, which counting method would you recommend for small-RNA-seq?

Thank you!

Leah

rnaseq deseq2 • 3.0k views

updated 8.9 years ago by Michael Love 41k • written 8.9 years ago by lehouri • 0

score 0 · Answer 1 · 2015-06-02

I don't have much experience with small RNA analysis, so I might defer to someone else on best practices. I can tell you the statistical effect, though. Simon has explained this in depth on the list, but it's easier for me to repeat it briefly here than to search for those posts (you might try searching "DESeq" and "multi-mapping" etc. if you want to read more details).

By discarding multi-mapping reads, you lose "theoretical" power, that is, the power you would have if you could uniquely assign these reads to a genomic location (which you cannot). However, if you include them, you can potentially introduce false positives. For example, suppose you have a set of RNAs which share repetitive sequence, and one of them is DE across conditions. By sharing reads, you could end up calling all RNAs in the set as DE, because you spread the DE signal amongst all the RNAs in this set.

This is why we recommend the first approach (only using reads/fragments which can be uniquely assigned to a gene).
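A toy calculation may help illustrate how the signal spreads. This is only a sketch in plain Python under made-up assumptions, not a DESeq2 analysis: the three RNA names, the expression values, and `SHARED_FRACTION` (the share of each RNA's reads that fall in the common repeat and therefore align to every family member) are all hypothetical.

```python
# Hypothetical family of three small RNAs sharing a repetitive sequence.
family = ["sRNA1", "sRNA2", "sRNA3"]

# Assumed true number of reads originating from each RNA.
# Only sRNA1 actually changes between conditions (4-fold up).
true_expr = {
    "control":   {"sRNA1": 100, "sRNA2": 100, "sRNA3": 100},
    "treatment": {"sRNA1": 400, "sRNA2": 100, "sRNA3": 100},
}

# Assumption: half of each RNA's reads come from the shared repeat
# and so align equally well to every member of the family.
SHARED_FRACTION = 0.5

def observed_counts(expr, include_multimappers):
    """Counts per RNA when multi-mapping reads are kept or discarded."""
    shared = {rna: int(n * SHARED_FRACTION) for rna, n in expr.items()}
    shared_total = sum(shared.values())
    counts = {}
    for rna, n in expr.items():
        unique = n - shared[rna]
        # If multi-mappers are counted at every location, each RNA
        # receives the shared reads of the whole family.
        counts[rna] = unique + (shared_total if include_multimappers else 0)
    return counts

def fold_changes(include_multimappers):
    """Treatment / control ratio of observed counts for each RNA."""
    ctrl = observed_counts(true_expr["control"], include_multimappers)
    trt = observed_counts(true_expr["treatment"], include_multimappers)
    return {rna: trt[rna] / ctrl[rna] for rna in family}

unique_fc = fold_changes(include_multimappers=False)
shared_fc = fold_changes(include_multimappers=True)
```

Under these assumptions, unique-only counting gives fold changes of 4.0 for sRNA1 and 1.0 for the other two, while counting shared reads everywhere gives 2.5 for sRNA1 and 1.75 for sRNA2 and sRNA3, even though the latter two did not change at all.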