Question

limma-voom for miRNA seq

Jake ▴ 90

I have several questions about analyzing miRNA sequencing data using limma voom. Most of these questions relate to the fact that the composition/distribution of miRNA sequencing data is very different from mRNA sequencing data.

1) In mRNA sequencing, thousands of transcripts are often similarly expressed across many tissue types. In contrast, only about 100-150 miRNAs make up over 95% of the miRNAs expressed in a given cell type, and just a few of these can account for up to 50% of the total. Can limma voom still be used for differential expression of miRNAs between two different cell types when the composition of miRNAs changes drastically? What if I am interested in how a particular treatment affects the expression of the most highly expressed miRNAs?

2) Following up on question 1, does limma voom calculate the mean variance trend for a miRNA across all samples or just across replicates? If I have many different treatments or cell types that drastically change the expression of highly expressed/dominant miRNAs does that change the variance calculation for these miRNAs and affect the differential expression call between treatments/cell types?

3) The manual says that low counts should be filtered out prior to voom. As discussed above, if 100-150 miRNAs make up 95% of the total expression, should I just filter everything else out? If there are a number of cases where a miRNA is highly expressed in one cell type but very lowly expressed in another, does that affect things? Is there a more rigorous way to decide what is too lowly expressed and should be filtered out?

4) Voom applies a pseudocount of 0.5 directly to the counts. I realize that this is a small count, but if library sizes differ a lot, this pseudocount will have a different influence on each library. Can the pseudocount be applied after CPM? I don't believe that I can pass CPM+pseudocount directly to limma bypassing voom since voom models the mean variance trend?

5) I have seen that TMM normalization is often recommended for limma voom rather than normalizing to CPM. I may be misunderstanding TMM normalization, but it essentially compares samples to see what changes. It trims the most extreme differences at either end, leaving genes/miRNAs that aren't really changing, and normalizes library size to them. This accounts for cases where a few transcripts change a lot and shift the distribution of the entire library under plain CPM. However, for miRNA libraries whose distribution can change a lot between samples, it seems like TMM normalization might not be a good idea?

Thanks

limma voom • 3.1k views

updated 8.2 years ago by Gordon Smyth 50k • written 8.2 years ago by Jake ▴ 90

score 4 · Accepted Answer · 2016-02-19

Jake, I think you may be perceiving more problems than there actually are. Maybe your data is different, but we have found voom to work well for miRNA data using much the same analysis pipeline as we would use for mRNA-seq. The read count composition of miRNA data is not so different to that of regular RNA-seq. A relatively small proportion of the genes account for most of the reads in regular RNA-seq also. The main difference is that there are relatively few miRs (just a few hundred) and the counts for individual miRs can therefore be quite large.

Have you actually tried it and found a problem, or are you just worrying about things in advance?

Here are brief responses to your questions:

1) It is not a problem that some miRNAs have large counts, nor is it a problem that some miRNAs have larger counts than others.

2) Obviously voom computes variances over replicates. That's why you have to give it a design matrix.

3) The idea is to keep all miRNAs that have reasonable counts (say around 10 or so) in a reasonable number of samples. This does not require that you filter out everything except the top few.

4) No you can't hack voom by offsetting cpm, nor is there a need to do so. Just leave it as it is. The original voom paper showed that voom worked the best of all methods when some libraries had 10 times as many reads as others. Yes, the offset of 0.5 is relatively larger for small libraries than large, but that is by design -- the smaller libraries need relatively larger moderation because they contain relatively less information.

5) Again, I don't see the problem. You will only have a problem if all or most miRNAs are DE in the same direction.
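
To make points 3) and 4) concrete, here is a minimal sketch in plain Python. It is not limma or edgeR code, only an illustration of the arithmetic. The function names, the miRNA names and the counts are hypothetical. The threshold of 10 reads comes from "around 10 or so" above, and `min_samples=3` is an assumption standing in for "a reasonable number of samples". The log-CPM formula is the one voom uses, log2((count + 0.5) / (lib.size + 1) * 1e6).

```python
import math


def keep_mirna(counts, min_count=10, min_samples=3):
    """Keep a miRNA if at least `min_samples` libraries have >= `min_count` reads."""
    return sum(c >= min_count for c in counts) >= min_samples


def voom_log_cpm(count, lib_size):
    """log2-CPM with voom's offsets: 0.5 added to the count, 1 to the library size."""
    return math.log2((count + 0.5) / (lib_size + 1.0) * 1e6)


# Hypothetical counts for three miRNAs across six libraries (two groups of three).
counts = {
    "miR-A": [50000, 48000, 52000, 51000, 47000, 49500],  # dominant in every library
    "miR-B": [0, 1, 0, 35, 42, 28],                       # expressed in one group only
    "miR-C": [0, 2, 1, 0, 3, 1],                          # low everywhere
}

# Point 3: filtering keeps anything with reasonable counts in enough samples,
# not just the few dominant miRNAs. miR-B survives because one group expresses it.
kept = [name for name, row in counts.items() if keep_mirna(row)]

# Point 4: a zero count maps to a higher log-CPM in a small library than in a
# large one, so the 0.5 offset moderates small libraries relatively more.
small = voom_log_cpm(0, 1_000_000)    # library of 1 million reads
large = voom_log_cpm(0, 10_000_000)   # library of 10 million reads

print(kept)
print(round(small, 3), round(large, 3))
```

A miRNA that is high in one cell type and nearly absent in another passes this kind of filter, which is the behaviour you want for differential expression. In practice, edgeR's `filterByExpr` applies a rule in this spirit using the design matrix, so you would normally use that rather than a hand-written filter.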