I have several questions about analyzing miRNA sequencing data with limma voom. Most of them stem from the fact that the composition and distribution of miRNA-seq data are very different from those of mRNA-seq data.
1) In mRNA sequencing, thousands of transcripts are often expressed at similar levels across many tissue types. In contrast, only about 100-150 miRNAs make up over 95% of total miRNA expression in a given cell type, and just a few of these can account for up to 50% of it. Can limma voom still be used for differential expression of miRNAs between two different cell types when the miRNA composition changes drastically? And what if I am specifically interested in how a particular treatment affects the expression of the most highly expressed miRNAs?
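To make the compositional concern concrete, here is a minimal Python sketch with made-up counts for four miRNAs (the numbers and the `cpm` helper are hypothetical, for illustration only). When only the dominant miRNA doubles, plain CPM makes the three unchanged miRNAs look about one third lower, simply because the library total grew.

```python
def cpm(counts):
    """Counts per million: each count divided by the library total, times 1e6."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

# Hypothetical libraries: miRNA 0 dominates, and only miRNA 0 changes between them.
lib_a = [500_000, 100_000, 100_000, 300_000]
lib_b = [1_000_000, 100_000, 100_000, 300_000]

cpm_a, cpm_b = cpm(lib_a), cpm(lib_b)
for i in range(1, 4):
    # Every unchanged miRNA gets the same ratio (2/3), purely from the larger total.
    print(f"miRNA {i}: CPM ratio B/A = {cpm_b[i] / cpm_a[i]:.3f}")
```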
2) Following up on question 1, does limma voom calculate the mean-variance trend for a miRNA across all samples or only across replicates? If I have many different treatments or cell types that drastically change the expression of the highly expressed (dominant) miRNAs, does that change the variance calculation for these miRNAs and affect the differential expression calls between treatments/cell types?
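As I understand voom's documented procedure, each miRNA contributes one point to the trend: the square root of its residual standard deviation after fitting the design matrix, plotted against its average log-count across all samples, with a lowess curve fitted through all miRNAs. The Python sketch below (numpy only, hypothetical values; voom itself is R) reproduces just the residual-SD step, to illustrate why a large shift between cell types need not inflate a miRNA's variance estimate when cell type is in the design.

```python
import numpy as np

def residual_sd(y, design):
    """Residual SD of one miRNA's log-CPM values after an ordinary least-squares fit."""
    coef, _, rank, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return np.sqrt(resid @ resid / (len(y) - rank))

# Hypothetical design: intercept plus an indicator for cell type B (3 vs 3 samples).
design = np.column_stack([np.ones(6), [0, 0, 0, 1, 1, 1]])
# Hypothetical log-CPM of a dominant miRNA that drops ~3 log2 units in cell type B.
y = np.array([15.0, 15.1, 14.9, 12.0, 12.1, 11.9])

sd_with_group = residual_sd(y, design)             # within-cell-type scatter only
sd_ignoring_group = residual_sd(y, design[:, :1])  # intercept-only fit: the shift ends up in the residuals
print(round(sd_with_group, 3), round(sd_ignoring_group, 3))
# In voom, roughly sqrt(sd_with_group) would be this miRNA's y-coordinate in the trend plot.
```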
3) The manual says that low counts should be filtered out before running voom. As discussed above, if 100-150 miRNAs make up 95% of the total expression, should I simply filter out everything else? If a number of miRNAs are highly expressed in one cell type but very lowly expressed in another, does that affect things? Is there a more rigorous way to decide what is too lowly expressed and should be filtered out?
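One data-driven option I have seen is a rule in the spirit of edgeR's `filterByExpr`: convert a minimum count (default 10) into a CPM cutoff using the median library size, and keep a miRNA if it passes that cutoff in at least as many samples as the smallest group. The Python sketch below is a simplified, hypothetical version (not the actual `filterByExpr` code, which has additional conditions such as a minimum total count); with made-up counts it shows that a miRNA expressed in only one cell type is retained.

```python
import numpy as np

def keep_expressed(counts, group, min_count=10):
    """Simplified expression filter loosely modelled on edgeR::filterByExpr.

    counts: miRNAs x samples matrix of raw counts.
    group:  one label per sample.
    Keeps a miRNA if its CPM reaches min_count / median(library size) * 1e6
    in at least as many samples as the smallest group.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)
    cpm = counts / lib_size * 1e6
    cutoff = min_count / np.median(lib_size) * 1e6
    _, group_sizes = np.unique(group, return_counts=True)
    return (cpm >= cutoff).sum(axis=1) >= group_sizes.min()

# Hypothetical counts: rows are miRNAs, columns are samples A1, A2, B1, B2.
counts = [
    [900_000, 1_100_000, 950_000, 1_050_000],  # dominant in both cell types
    [5_000, 4_000, 3, 0],                      # expressed in cell type A only
    [2, 0, 1, 3],                              # low everywhere
]
keep = keep_expressed(counts, ["A", "A", "B", "B"])
print(keep.tolist())
```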
4) Voom adds a pseudocount of 0.5 directly to the counts. I realize this is small, but if library sizes differ a lot, this pseudocount will have a different influence on each library. Can the pseudocount be applied after converting to CPM instead? I don't believe I can pass CPM plus a pseudocount directly to limma and bypass voom, since voom is what models the mean-variance trend. Is that correct?
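For reference, voom computes log-CPM as log2((count + 0.5) / (library size + 1) * 1e6). The Python sketch below compares that with the hypothetical alternative from my question (convert to CPM first, then add 0.5; `cpm_then_offset` is my own illustration, not a limma option) for a zero count in a small and a large library. With voom's formula the floor for a zero count differs by about log2(100) between the two libraries, whereas the CPM-then-offset version gives the same floor for both.

```python
import math

def voom_style_logcpm(count, lib_size):
    """log-CPM as voom computes it: log2((count + 0.5) / (lib_size + 1) * 1e6)."""
    return math.log2((count + 0.5) / (lib_size + 1) * 1e6)

def cpm_then_offset(count, lib_size, offset=0.5):
    """Hypothetical alternative: convert to CPM first, then add the offset."""
    return math.log2(count / lib_size * 1e6 + offset)

for lib in (100_000, 10_000_000):
    # Zero count in each library: compare the resulting log2 floor.
    print(lib, round(voom_style_logcpm(0, lib), 2), round(cpm_then_offset(0, lib), 2))
```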
5) I have seen TMM normalization often recommended for limma voom rather than plain CPM normalization. I may be misunderstanding TMM, but as I understand it, it essentially compares samples to see what changes, trims the most extreme differences at either end, and normalizes library size to the remaining genes/miRNAs that aren't really changing. This accounts for cases where a few transcripts change a lot and shift the distribution of the entire library, which plain CPM would not correct. However, for miRNA libraries, whose distribution can change a lot between samples, it seems like TMM normalization might not be a good idea. Is that a valid concern?
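To check my understanding of the trimming, here is a simplified, unweighted TMM-style calculation in Python. It borrows edgeR's default trim fractions (30% of log-ratios, 5% of average abundances) but, as an assumption of this sketch, omits the precision weighting, tie handling and final rescaling of factors that `calcNormFactors` performs, so it is not edgeR's implementation. With hypothetical data where only the dominant miRNA doubles, the trimmed mean recovers the unchanged bulk, and the effective library size matches the reference; this relies on most miRNAs not changing, which is exactly what I am unsure holds for miRNA data.

```python
import numpy as np

def tmm_factor_unweighted(obs, ref, logratio_trim=0.3, sum_trim=0.05):
    """Simplified, unweighted TMM-style scale factor of `obs` relative to `ref`."""
    obs = np.asarray(obs, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n_obs, n_ref = obs.sum(), ref.sum()   # library sizes use all miRNAs
    ok = (obs > 0) & (ref > 0)            # log-ratios are undefined for zero counts
    obs, ref = obs[ok], ref[ok]
    m = np.log2((obs / n_obs) / (ref / n_ref))                # log fold change (M)
    a = 0.5 * (np.log2(obs / n_obs) + np.log2(ref / n_ref))   # average log abundance (A)
    n = len(m)
    rank_m = np.argsort(np.argsort(m)) + 1  # ordinal ranks, 1..n (ties not averaged)
    rank_a = np.argsort(np.argsort(a)) + 1
    lo_m = int(np.floor(n * logratio_trim)) + 1
    lo_a = int(np.floor(n * sum_trim)) + 1
    keep = ((rank_m >= lo_m) & (rank_m <= n + 1 - lo_m)
            & (rank_a >= lo_a) & (rank_a <= n + 1 - lo_a))
    return 2 ** m[keep].mean()

# Hypothetical reference and observed libraries: only miRNA 0 (dominant) doubles.
ref = [500_000, 100_000, 80_000, 60_000, 50_000, 40_000, 30_000, 20_000, 10_000, 5_000]
obs = [1_000_000] + ref[1:]

factor = tmm_factor_unweighted(obs, ref)
effective_lib = sum(obs) * factor  # effective library size relative to ref, before any rescaling
print(round(factor, 4), round(effective_lib))
```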