Question

Down/Up-regulated genes asymmetry in edgeR differential expression analysis


Pauly Lin ▴ 150

@pauly-lin-7537

Last seen 8.5 years ago

University of New South Wales, Australia

Dear all,

I have performed an edgeR differential expression analysis on RNA-Seq data with six samples (3 vs 3). edgeR finds around 30 up-regulated genes but more than 100 down-regulated genes. Should I be concerned about this large difference? I have also used limma to perform a differential expression analysis on microarray data from the same individuals, and there is no such asymmetry between the numbers of down- and up-regulated genes. I have been told that edgeR assumes that the number of up-regulated genes is similar to the number of down-regulated genes. Is that true?

Thanks!

Paul

edgeR rnaseq • 2.4k views

9.1 years ago · Pauly Lin ▴ 150

score 1 · Answer 1 · 2015-03-31

In the grand scheme of things, this asymmetry isn't particularly dramatic. If you had, say, 30 up-regulated genes and 3000 down-regulated genes, that would be a bit more interesting. As it stands, I wouldn't worry about it, as the numbers involved are too low to be of concern.

More generally, it's worth pointing out that asymmetry isn't a problem in most cases. The part of the analysis that it can affect is TMM normalization, which is done by the calcNormFactors function. In TMM normalization, the 30% most extreme M-values on either side (i.e., the most strongly up- or down-regulated genes) are trimmed away by default, and normalization is performed with the M-values of the remaining (presumably non-DE) genes. As long as the proportion of DE genes on either side does not exceed 30%, normalization will be okay.
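To make the 30% rule concrete, here is a minimal Python sketch of a trimmed mean of M-values. It is a simplification for illustration only, not edgeR's implementation: the real TMM also trims on A-values and uses precision weights, and the function name and toy data below are assumptions of mine.

```python
def trimmed_mean_m(m_values, trim=0.3):
    """Unweighted trimmed mean: drop the `trim` fraction of the
    lowest and of the highest M-values, then average the rest."""
    ordered = sorted(m_values)
    n = len(ordered)
    k = int(round(n * trim))      # genes trimmed from each tail
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)

# Toy M-values (log2 fold changes between two libraries).
# Non-DE genes sit at 0, up-regulated genes at 3.
mild = [0.0] * 80 + [3.0] * 20    # 20% of genes up-regulated
heavy = [0.0] * 60 + [3.0] * 40   # 40% of genes up-regulated

est_mild = trimmed_mean_m(mild)    # all DE genes trimmed -> 0.0
est_heavy = trimmed_mean_m(heavy)  # 10 DE genes survive  -> 0.75
print(est_mild, est_heavy)
```

With 20% of genes up-regulated, every DE gene falls inside the trimmed upper tail and the estimate stays at 0. With 40%, ten DE genes survive the trimming and drag the estimate away from 0.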

So, what you've been told (or at least, how you've phrased it) is mostly wrong. There are still slivers of truth, though. Firstly, at the maximum number of DE genes that TMM normalization can tolerate (60% of the total), they must be split evenly between up- and down-regulation in order to avoid exceeding the 30% threshold on either side. Secondly, if you have pronounced asymmetry, normalization will become less accurate, as trimming will start eating into non-DE genes on the side without any DE genes. This will distort the M-value distribution of the remaining non-DE genes, leading to a biased estimate of the normalization factor. However, the asymmetry needs to be fairly extreme to have an effect.
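Both points can be checked with a toy calculation. The Python sketch below again uses a plain unweighted trimmed mean as a stand-in for TMM (an assumption; edgeR's calcNormFactors does more than this), and the helper names and made-up M-values are hypothetical.

```python
def trimmed_mean(values, trim=0.3):
    """Drop the `trim` fraction from each tail and average the rest."""
    ordered = sorted(values)
    k = int(round(len(ordered) * trim))
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def non_de(n, spread=0.5):
    """n evenly spaced M-values in [-spread, spread], centred on 0,
    standing in for the noise of non-DE genes."""
    return [-spread + 2 * spread * i / (n - 1) for i in range(n)]

# 60% DE, split evenly: 30 down, 40 non-DE, 30 up.
balanced = [-3.0] * 30 + non_de(40) + [3.0] * 30

# 30% DE, all on one side: 70 non-DE, 30 up.
one_sided = non_de(70) + [3.0] * 30

bias_balanced = trimmed_mean(balanced)    # trimming removes only DE genes
bias_one_sided = trimmed_mean(one_sided)  # lower trim removes non-DE genes
print(bias_balanced, bias_one_sided)
```

In the balanced case, trimming removes exactly the DE genes and the estimate is essentially 0. In the one-sided case, the lower trim removes the 30 lowest non-DE genes, so the surviving non-DE genes are skewed upwards and the estimate comes out at about 0.22 rather than 0. The bias is bounded by the spread of the non-DE genes, which is why the asymmetry has to be extreme before it matters.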

score 0 · Answer 2 · 2015-03-31

0