Question: EdgeR on miRNA data
2.5 years ago, melmogy wrote:

Hi,

1- In miRNA analysis, a threshold of CPM > 1 corresponds to 10 raw reads when the library size (miRNA counts) is 10 million. However, in some samples miRNAs represent only a small fraction of the reads, so the miRNA library size can be 0.1 to 0.5 million or less. Is it valid to use a threshold of CPM > 20, CPM > 100 or more for filtering?

2- Is the library size on which we base the CPM filtering threshold the total mapped count or only the miRNA count?

Thanks,

Mohamed

Answer: EdgeR on miRNA data
2.5 years ago, Aaron Lun (Cambridge, United Kingdom) wrote:

For your first question: yes, it's fine to adjust the CPM threshold. The important thing is how big the underlying counts are, because this determines the detection power of the downstream DE analysis. For example, I would be fairly assured that I could detect DE if I had average counts of ~20 across my samples. If I had average counts of 2 across samples instead, my detection power would be a lot lower, and I doubt I would be able to consistently detect DE. miRNAs in the latter category should be removed during filtering to reduce the severity of the BH correction, as well as to ensure that the discreteness of low counts does not interfere with normalization and trend fitting.
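
As a rough illustration of how the CPM threshold scales with library size, here is a minimal sketch in plain Python (not edgeR code). The function name `cpm_threshold` and the library sizes are made up for the example; the target of 10 raw reads comes from the question.

```python
def cpm_threshold(min_count, lib_size):
    """CPM value equivalent to `min_count` raw reads in a library of `lib_size` reads."""
    return min_count * 1e6 / lib_size


# Hypothetical miRNA library sizes (total miRNA counts per sample).
lib_sizes = [10_000_000, 500_000, 100_000]

# CPM cutoff that corresponds to 10 raw reads in each library.
thresholds = [cpm_threshold(10, n) for n in lib_sizes]

for n, t in zip(lib_sizes, thresholds):
    print(f"library size {n:>10,d}: 10 reads = CPM {t:g}")
```

With these numbers, 10 reads correspond to CPM 1, 20 and 100 respectively, which is why a higher CPM cutoff is reasonable when the miRNA library is small.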

For the second question: it depends on whether you can assume that most miRNAs are not DE across samples. If you are expecting a global up- or downregulation of miRNAs between conditions, you should not use the total miRNA count as a normalizing factor. This is because it will change between conditions for biological reasons, such that normalizing on it would remove the biology of interest. On the other hand, if you do assume that most miRNAs are not DE, then normalizing on the miRNA counts is the preferred approach, as it will eliminate any uninteresting biases in miRNA representation between samples (e.g., due to differences in miRNA capture efficiency).
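
To make the point about global shifts concrete, here is a toy sketch in plain Python. All counts are invented, and it uses simple library-size scaling rather than edgeR's own normalization. Sample B has every miRNA at half its level in sample A, while the total number of mapped reads is assumed to be the same in both.

```python
def cpm(counts, lib_size):
    """Counts per million for one sample, given the library size to scale by."""
    return [c * 1e6 / lib_size for c in counts]


# Hypothetical counts for three miRNAs; B is a global 2-fold downregulation of A.
mirna_a = [400, 200, 100]
mirna_b = [200, 100, 50]

# Assumed total mapped reads (miRNA plus all other small RNAs), equal in both samples.
total_mapped_a = 10_000
total_mapped_b = 10_000

# Scaling by the total miRNA count: the global shift disappears.
by_mirna_a = cpm(mirna_a, sum(mirna_a))
by_mirna_b = cpm(mirna_b, sum(mirna_b))

# Scaling by the total mapped count: the 2-fold difference is kept.
by_mapped_a = cpm(mirna_a, total_mapped_a)
by_mapped_b = cpm(mirna_b, total_mapped_b)

ratios_mirna = [b / a for a, b in zip(by_mirna_a, by_mirna_b)]
ratios_mapped = [b / a for a, b in zip(by_mapped_a, by_mapped_b)]

print("B/A ratios, scaled by miRNA total: ", ratios_mirna)
print("B/A ratios, scaled by mapped total:", ratios_mapped)
```

Scaling by the miRNA total gives B/A ratios of about 1 for every miRNA, so the global downregulation is normalized away. Scaling by the mapped total keeps the ratios at about 0.5.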


Thank you Aaron,

Does that mean we should include all small RNAs that we have counts for (i.e., piRNAs, tRNAs, rRNAs) in the library size used for normalization if we are expecting high differential expression between samples?

Best regards

written 2.5 years ago by darrinformatics

Hi,

Aaron is completely right about using miRNA counts if you don't expect total deregulation. I would say that detecting this is pretty complicated, because using everything can introduce a lot of bias: the library preparation itself could cause different amounts of some specific small RNAs, such as rRNA or those on the right side of the size distribution.

I would check whether some other kind of small RNA shows a difference in the total number of reads. For instance, assuming a total deregulation of miRNAs, you would see that one group has half the number of reads mapping to miRNAs. If you see that tRNA reads are constant, then you have a good reason to use miRNA / tRNA for the normalization. But if you always see a difference in the number of reads for every type of small RNA, then it's more complicated to decide what to do.
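
A quick way to do this check is to compare the mean number of reads per small RNA class between groups. The sketch below is plain Python with invented numbers, sample names and group labels, and it assumes the samples were sequenced to a similar depth, since raw read totals also depend on depth.

```python
# Hypothetical reads per small RNA class for each sample.
reads = {
    "ctrl_1": {"miRNA": 500_000, "tRNA": 200_000},
    "ctrl_2": {"miRNA": 520_000, "tRNA": 210_000},
    "trt_1": {"miRNA": 250_000, "tRNA": 205_000},
    "trt_2": {"miRNA": 260_000, "tRNA": 205_000},
}
groups = {"ctrl": ["ctrl_1", "ctrl_2"], "trt": ["trt_1", "trt_2"]}


def group_mean(rna_class, sample_names):
    """Mean number of reads of one small RNA class across the given samples."""
    return sum(reads[s][rna_class] for s in sample_names) / len(sample_names)


# Ratio of treated to control mean reads for each class.
class_ratios = {
    rna_class: group_mean(rna_class, groups["trt"]) / group_mean(rna_class, groups["ctrl"])
    for rna_class in ("miRNA", "tRNA")
}

for rna_class, ratio in class_ratios.items():
    print(f"{rna_class}: trt/ctrl mean read ratio = {ratio:.2f}")
```

In this made-up example the miRNA reads halve while the tRNA reads stay constant, which is the situation described above where tRNA looks like a stable reference.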

I would say that even if half of them are deregulated, you can still use the edgeR/DESeq2/limma-voom options.

You can also read more about this here: https://www.researchgate.net/publication/315091565_Modeling_bias_and_variation_in_the_stochastic_processes_of_small_RNA_sequencing?ev=prf_high

Hope this helps.

written 2.5 years ago by Lorena Pantano