Building the correct model for DESeq2
@allisondaly13-19568

Hi,

I am trying to find differentially expressed genes in my RNA-Seq data. I have noticed that I get different p-values depending on whether the count matrix I provide contains only the samples for the direct comparison or additional samples as well.

For example, I am comparing 0.5h stimulated vs. 0.5h unstimulated, and 1.0h stimulated vs. 1.0h unstimulated. I can either:

1) provide a count matrix with counts for all four conditions, and then look at the p-values for 0.5h stimulated vs. 0.5h unstimulated, or
2) provide a count matrix with counts only for the two conditions I am directly comparing (0.5h stimulated vs. 0.5h unstimulated).

When I compare the p-values from the two approaches, they do not match.

Which approach is better, i.e. more accurate and more true to the data?

Thanks! Allison

deseq2 • 356 views
@james-w-macdonald-5106

In general, the more data you have in your model, the better the variance (dispersion) estimates, which tends to give more accurate results. As with all things there are tradeoffs here: you are borrowing information from samples that are not part of the comparison you are making. If there is a good reason to think that the different groups have very different variabilities, then you might not want to combine them, but that's up to you as the analyst.
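
To make the point about borrowing information concrete, here is a toy sketch in plain Python. It is not DESeq2 and does not reproduce its method (DESeq2 fits a negative binomial model with per-gene dispersion estimates); it only shows, with an ordinary pooled-variance t statistic, why the same two-group comparison can give a different test statistic when the variance is estimated from all four groups instead of two. The group names, the simulated values, and the helper functions `pooled_variance` and `t_statistic` are all made up for illustration.

```python
import random
import statistics


def pooled_variance(groups):
    """Return the pooled within-group variance and its residual degrees of freedom."""
    sum_sq = 0.0
    df = 0
    for values in groups:
        mean = statistics.fmean(values)
        sum_sq += sum((v - mean) ** 2 for v in values)
        df += len(values) - 1
    return sum_sq / df, df


def t_statistic(a, b, variance):
    """t statistic for mean(a) - mean(b), using a supplied variance estimate."""
    std_err = (variance * (1 / len(a) + 1 / len(b))) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / std_err


rng = random.Random(1)

# Hypothetical log-scale expression values for ONE gene, 3 replicates per group.
groups = {
    "unstim_0.5h": [rng.gauss(5.0, 0.5) for _ in range(3)],
    "stim_0.5h": [rng.gauss(6.0, 0.5) for _ in range(3)],
    "unstim_1.0h": [rng.gauss(5.0, 0.5) for _ in range(3)],
    "stim_1.0h": [rng.gauss(6.5, 0.5) for _ in range(3)],
}
a, b = groups["stim_0.5h"], groups["unstim_0.5h"]

# Approach 2 from the question: only the two groups being compared.
var_subset, df_subset = pooled_variance([a, b])
# Approach 1 from the question: all four groups contribute to the variance estimate.
var_full, df_full = pooled_variance(list(groups.values()))

t_subset = t_statistic(a, b, var_subset)
t_full = t_statistic(a, b, var_full)

print(f"two groups : variance={var_subset:.3f}, df={df_subset}, t={t_subset:.3f}")
print(f"four groups: variance={var_full:.3f}, df={df_full}, t={t_full:.3f}")
```

The difference in means is identical in both cases; only the variance estimate and its degrees of freedom change, so the test statistic and p-value change with it. If the 1.0h groups were much more (or less) variable than the 0.5h groups, the pooled estimate would be pulled away from the variability of the groups actually being compared, which is the tradeoff described above.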
