Question

Building the correct model for DESeq2

allisondaly13 • 0

Hi,

I am trying to find differentially expressed genes in my RNA-Seq data. I have noticed that I get different p-values depending on whether the count matrix I provide contains only the samples in the direct comparison or the whole data set.

For example, I am comparing 0.5h stimulated vs 0.5h unstimulated, and 1.0h stimulated vs 1.0h unstimulated.

1) I can provide a count matrix that has counts for all four conditions, and then look at the p-values for 0.5h stimulated vs 0.5h unstimulated.

2) I can provide a count matrix that has counts only for the two conditions I am directly comparing (0.5h stimulated vs 0.5h unstimulated).

When I compare the p-values from the two methods, they do not match.

Which approach is better, i.e. more accurate and truer to the data?

Thanks! Allison

deseq2 • 356 views

updated 5.3 years ago by James W. MacDonald • written 5.3 years ago by allisondaly13

score 1 · Answer 1 · 2019-01-23

In general, if you have more data in your model, the variance (dispersion) estimates are better, which tends to give more accurate results. That is also why the p-values differ between your two approaches: the samples you include change the variance estimate used for the test. As with all things, there is a tradeoff here, because you are borrowing information from data that is not part of the comparison you are making. If there is good reason to think that the different groups have very different variabilities, then you might not want to combine them, but that's up to you as the analyst.
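The reasoning above can be illustrated with a small sketch. This is not DESeq2 and not its negative binomial model; it is a simplified stand-in that uses a pooled-variance t statistic on made-up, log-scale values for a single gene. The group names, the numbers, and the helper functions `pooled_variance` and `t_statistic` are all hypothetical. It only shows the mechanism: the difference in means between the two groups of interest stays the same, while the variance estimate and its degrees of freedom change when the other groups are included.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical log-scale expression values for ONE gene, 3 replicates per group.
# These numbers are invented for illustration only.
groups = {
    "unstim_0.5h": [5.0, 5.4, 5.2],
    "stim_0.5h": [6.1, 6.5, 6.0],
    "unstim_1.0h": [5.1, 5.3, 5.6],
    "stim_1.0h": [7.0, 6.6, 7.1],
}


def pooled_variance(samples):
    """Pool within-group variances, weighting each by its degrees of freedom.

    Returns the pooled variance and the total residual degrees of freedom.
    """
    df = sum(len(s) - 1 for s in samples)
    ss = sum((len(s) - 1) * variance(s) for s in samples)
    return ss / df, df


def t_statistic(a, b, var):
    """t statistic for mean(a) - mean(b), given a variance estimate."""
    se = sqrt(var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se


a = groups["stim_0.5h"]
b = groups["unstim_0.5h"]

# Approach 2 in the question: only the two groups being compared.
var_subset, df_subset = pooled_variance([a, b])

# Approach 1 in the question: all four groups inform the variance estimate.
var_full, df_full = pooled_variance(list(groups.values()))

t_subset = t_statistic(a, b, var_subset)
t_full = t_statistic(a, b, var_full)

print(f"subset: variance={var_subset:.4f}, df={df_subset}, t={t_subset:.3f}")
print(f"full:   variance={var_full:.4f}, df={df_full}, t={t_full:.3f}")
```

Both runs test the same difference in means, but the test statistic and the degrees of freedom differ, so the p-values differ too. The tradeoff mentioned above is visible here as well: if the 1.0h groups were much noisier than the 0.5h groups, pooling would inflate the variance used for the 0.5h comparison.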