Hi everyone,
I heard that edgeR and DESeq2 assume that most genes are equally expressed across the samples and that only a small fraction of genes are differentially expressed. I was wondering how to compare two very different RNA samples, for example one from muscle and the other from liver. I know some people simply use more stringent criteria (e.g. a 4-fold difference and an FDR-adjusted p-value < 0.001) to reduce the number of significant genes. I have two questions related to this issue:
Q1. How different do the samples need to be before we should start worrying about this issue? For example, should we worry when more than 10 percent of genes are found to be differentially expressed using edgeR or DESeq2?
Q2. Is there a more statistically sound way to fix this problem?
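To illustrate the concern behind both questions, here is a minimal Python sketch under simplifying assumptions. It estimates the sequencing-depth ratio between two samples as the median of per-gene count ratios, a simplified two-sample stand-in for median-of-ratios scaling, not the actual edgeR or DESeq2 code. The function names, the simulated counts, and the 4-fold change are all hypothetical choices made for illustration.

```python
import statistics


def estimate_depth_ratio(counts_a, counts_b):
    """Median of per-gene ratios b/a (simplified two-sample scaling estimate)."""
    ratios = [b / a for a, b in zip(counts_a, counts_b) if a > 0 and b > 0]
    return statistics.median(ratios)


def simulate(n_genes, frac_up, fold, true_depth_ratio):
    """Hypothetical data: the first `frac_up` of genes are `fold` times higher in B."""
    counts_a = [100.0 + i for i in range(n_genes)]
    n_up = int(n_genes * frac_up)
    counts_b = [
        a * true_depth_ratio * (fold if i < n_up else 1.0)
        for i, a in enumerate(counts_a)
    ]
    return counts_a, counts_b


TRUE_DEPTH_RATIO = 2.0  # sample B was sequenced twice as deeply (assumed)

# 10% of genes up in B: the median still sits on an unchanged gene.
few_de = estimate_depth_ratio(*simulate(1000, 0.10, 4.0, TRUE_DEPTH_RATIO))

# 60% of genes up in B: the median now sits on a changed gene.
many_de = estimate_depth_ratio(*simulate(1000, 0.60, 4.0, TRUE_DEPTH_RATIO))

print(few_de)   # 2.0, the true depth ratio is recovered
print(many_de)  # 8.0, the depth ratio is overestimated 4-fold
```

In the second case the unchanged genes would look 4-fold down after scaling, which is the kind of distortion I am worried about when comparing tissues as different as muscle and liver.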
Thanks a lot!
p.s. I also saw this concept in a paper:
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-91
"Still, it is important to keep in mind that even these methods are based on an assumption that most genes are equivalently expressed in the samples, and that the differentially expressed genes are divided more or less equally between up- and downregulation"
These questions have bothered me for a long time. Thanks so much, Aaron. I really appreciate your help.