For multiple level variables, different results with or without subsetting
Xianjun Dong ▴ 10
@xianjun-dong-7069
United States

Hi,

I have a simple case, design(dds) = ~ sex + batch + condition, where condition is a factor with three levels (A, B, and C). I tried two different ways of testing for differentially expressed genes between conditions B and C:

Method 1: Run DESeq() on the whole dataset, and extract the comparison between B and C with results(dds, contrast = c("condition","B","C")).

Method 2: Subset the dataset to the samples with condition B or C, relevel condition, run DESeq() on the subset, and get the result via results(dds_subset, contrast = c("condition","B","C")).

I expected the two methods to give the same results, but they don't. I got different numbers of DE genes at the same FDR cutoff.

Am I misunderstanding something? Please advise.

Thanks,

-Xianjun

deseq2 • 384 views
@james-w-macdonald-5106
United States

You should expect different results. If you fit a linear model and then make comparisons, the statistic you use will incorporate an estimate of variance based on the data in the model. If you change what data are used in the model, you should expect to get different statistics as well.
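A minimal base R sketch of this effect, using an ordinary linear model on one simulated gene rather than DESeq2 itself. The data, group sizes, and variable names are all made up for illustration. The estimated B vs C difference is identical in both fits, but the residual variance (and so the standard error and test statistic) changes once the condition A samples are dropped.

```r
set.seed(1)

# Hypothetical data: one gene, three conditions, four samples each.
# Condition A is deliberately noisier than B and C.
condition <- factor(rep(c("A", "B", "C"), each = 4))
expr <- c(rnorm(4, mean = 5, sd = 3),
          rnorm(4, mean = 6, sd = 1),
          rnorm(4, mean = 8, sd = 1))
d <- data.frame(expr = expr, condition = condition)

# Use C as the reference level so the "conditionB" coefficient is B vs C.
d$condition <- relevel(d$condition, ref = "C")

# Method 1: fit on all samples, then read off the B vs C coefficient.
fit_full <- lm(expr ~ condition, data = d)
full <- summary(fit_full)$coefficients["conditionB", ]

# Method 2: drop condition A, then fit on the subset only.
d_sub <- droplevels(d[d$condition != "A", ])
fit_sub <- lm(expr ~ condition, data = d_sub)
sub <- summary(fit_sub)$coefficients["conditionB", ]

# Same estimate, different standard error / t statistic / p-value.
print(rbind(full = full, subset = sub))

# The residual standard deviation is pooled over different samples.
print(c(full = summary(fit_full)$sigma, subset = summary(fit_sub)$sigma))
```

In DESeq2 the analogous quantity is the per-gene dispersion, which is estimated from all samples in the object passed to DESeq(), so the subset and the full dataset will generally not give the same p-values.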


This is also one of the FAQs in the DESeq2 vignette.
