Question

DESeq2 vs edgeR results analysis

0

adR ▴ 30

@do-it-23093

Germany, München

Hi, thank you for this platform. May I take a few minutes of your time with a question? I used both DESeq2 and edgeR to analyze my RNA-seq data, and I found more significant genes with DESeq2 than with edgeR. The difference is about 1000, which seems high to me. I have posted the code I used below; please point out my mistake in case I missed something. My variable (Sample) is continuous.

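## edgeR
library(edgeR)
x <- DGEList(counts = muscle, group = Sample)
design <- model.matrix(~Sample)
x <- estimateDisp(x, design = design, robust = TRUE)  # estimate dispersions on the DGEList
fit <- glmQLFit(x, design = design)                   # fit the quasi-likelihood GLM
QL <- glmQLFTest(fit, coef = 2)                       # QL F-test for the Sample coefficient
table(p.adjust(QL$table$PValue, method="BH")<0.05) #### 5928 genes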

### DESeq2
library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = countData,
                              colData = colData,
                              design = ~ Sample)
dds <- DESeq(dds, fitType = "mean")
resultsNames(dds)
res <- results(dds)  # defaults to the last variable in the design (Sample)
sum(res$padj < 0.05, na.rm = TRUE) #### 6042 genes

Thank you so much! Best, Amare

deseq2 edgeR • 8.1k views

ADD COMMENT • link updated 2.3 years ago by Gordon Smyth 50k • written 4.0 years ago by adR ▴ 30

1

Gordon Smyth 50k

@gordon-smyth

WEHI, Melbourne, Australia

I answered your question two weeks ago: Design edger with one or more continues variables

I am astonished that you find a 2% difference in the number of DE genes to be large or surprising, especially considering that the edgeR QL method is specifically designed to offer more rigorous error rate control (be more conservative) than negative binomial DE pipelines based on likelihood ratio tests or Wald tests. The difference in DE genes is about 100, not 1000 as you say in your question. To me the results from the two packages seem remarkably consistent.
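For reference, the arithmetic behind the "about 100" and "2%" figures can be checked from the two counts reported in the question. This is only a sketch: the variable names are made up for illustration, and the numbers are copied from the comments in the posted code.

```r
# Counts reported in the question (BH-adjusted p < 0.05)
n_edger  <- 5928   # edgeR QL pipeline
n_deseq2 <- 6042   # DESeq2 pipeline

diff_abs <- n_deseq2 - n_edger            # absolute difference in DE genes
diff_pct <- 100 * diff_abs / n_deseq2     # difference relative to the DESeq2 count

cat("Absolute difference:", diff_abs, "genes\n")
cat("Relative difference:", round(diff_pct, 1), "%\n")
```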

ADD COMMENT • link 4.0 years ago • updated 2.3 years ago Gordon Smyth 50k

0

Thank you so much!
It is now corrected and, as you said, the difference is about 100-150. My problem is solved! Best!

ADD REPLY • link 4.0 years ago adR ▴ 30

score 2 · Accepted Answer · 2020-04-20

2

swbarnes2 ★ 1.3k

@swbarnes2-14086

San Diego

They are different algorithms, so they are going to return different answers. Without knowing how many genes overlap between those two sets, I'd say both programs returned the same result: about 6000 genes.
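One way to check the overlap between the two sets is with base R set operations. The gene IDs below are hypothetical placeholders; in a real analysis the two vectors would hold the row names of the genes passing the 0.05 cutoff in each package's results table.

```r
# Hypothetical significant-gene IDs, for illustration only
edger_sig  <- c("geneA", "geneB", "geneC", "geneD")
deseq2_sig <- c("geneA", "geneB", "geneC", "geneD", "geneE")

shared      <- intersect(edger_sig, deseq2_sig)  # called DE by both
only_edger  <- setdiff(edger_sig, deseq2_sig)    # called DE by edgeR only
only_deseq2 <- setdiff(deseq2_sig, edger_sig)    # called DE by DESeq2 only

overlap_summary <- c(shared      = length(shared),
                     only_edgeR  = length(only_edger),
                     only_DESeq2 = length(only_deseq2))
print(overlap_summary)
```

If most genes fall in the shared group, the two lists are telling essentially the same story even when the totals differ.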

ADD COMMENT • link 4.0 years ago swbarnes2 ★ 1.3k

0

I'm going to echo "swbarnes2". This has been asked and answered, even recently, on the support site. The methods are different, and it's a lot less interesting or surprising when you note that a method might call a gene DE because adj p = 0.04, and another method might call a gene not DE because adj p = 0.06.
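A toy illustration of that threshold effect, using made-up adjusted p-values (not taken from any real analysis): two methods that agree closely on every gene can still disagree on the DE call for a gene sitting near the cutoff.

```r
# Made-up adjusted p-values for three genes from two hypothetical methods
padj_method1 <- c(g1 = 0.001, g2 = 0.04, g3 = 0.30)
padj_method2 <- c(g1 = 0.002, g2 = 0.06, g3 = 0.28)
alpha <- 0.05

de_method1 <- padj_method1 < alpha   # DE calls from method 1
de_method2 <- padj_method2 < alpha   # DE calls from method 2

# Genes whose DE call differs between the two methods
discordant <- names(padj_method1)[de_method1 != de_method2]
print(discordant)
```

Only the gene with values on either side of 0.05 is discordant, even though its two adjusted p-values are nearly the same.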

Our recommendation on this site (repeated in many previous threads) is to pick a tool and use it. It's not a good idea to cycle through various methods on the same dataset you intend to analyze. You can certainly do this on another dataset in order to choose which method to use, or just look at the dozens of papers comparing methods systematically on simulated and real datasets.

ADD REPLY • link 4.0 years ago Michael Love 41k

0

Thank you so much for your reply. All of the DE genes (adjp < 0.05) I found in the edgeR analysis are also present in the DESeq2 result (adjp < 0.05). Thanks!

ADD REPLY • link 4.0 years ago adR ▴ 30