Question

Confusing results in DESeq2 when comparing multiple groups

诗友 • 0

I recently ran into a question while using DESeq2. I have a dataset with four groups (ctrl, a, b, c), and each group has three samples. I want the DEGs for a vs ctrl, b vs ctrl, and c vs ctrl, and I see two ways to compute them.

Way 1: I build one count matrix with 12 columns (samples) and 20,000 rows (protein-coding genes), run DESeq() once, and then call results() for each of a vs ctrl, b vs ctrl, and c vs ctrl.

Way 2: For a vs ctrl, I build a matrix with 6 columns (ctrl1, ctrl2, ctrl3, a1, a2, a3) and 20,000 rows (protein-coding genes), then run DESeq() and results() to get the DEGs. I repeat this for the other comparisons.

The p-value for the same gene is very different between way 1 and way 2.

So I have two questions:

1.) Does including more samples influence how the GLM is fitted for each gene? Until now I assumed the GLM was fitted for each group independently. Is a different GLM the reason the p-values differ?
2.) Samples within the same group show a strong batch effect. Given that, which approach suits my data better, way 1 or way 2?

Thank you!

DESeq2

updated 2.4 years ago by James W. MacDonald • written 2.4 years ago by 诗友

score 0 · Answer 1 · 2023-08-16

1.) Yes. DESeq2 fits one model per gene using all of the samples you give it, so adding more samples increases the residual degrees of freedom and increases power, as long as the within-group variability is relatively similar across groups.

2.) It depends. If you have the same batches for all groups (e.g., you ran two samples each of a, b, c, and ctrl in one batch and then ran one additional sample of each in a second batch), then you should just include a batch factor in your model. If you don't have the same batches for all groups, then it's more complicated.
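
To make point 1 concrete, here is a minimal sketch in plain Python. It is not DESeq2 code. It uses an ordinary pooled-variance calculation as a stand-in for the per-gene dispersion estimate, and the expression values are made-up numbers for one hypothetical gene. DESeq2 itself uses a negative binomial GLM with dispersion shrinkage, so the sketch only shows the general idea: the variability estimate behind the a vs ctrl test changes depending on which samples go into the model, and so does the number of residual degrees of freedom.

```python
from statistics import variance


def pooled_variance(groups):
    """Pool within-group sample variances, weighting each by its degrees of freedom.

    Returns (pooled variance, residual degrees of freedom).
    """
    df = sum(len(g) - 1 for g in groups)
    ss = sum((len(g) - 1) * variance(g) for g in groups)
    return ss / df, df


# Hypothetical log-scale expression values for a single gene (illustration only).
ctrl = [5.0, 5.2, 4.8]
a = [6.1, 5.9, 6.3]
b = [5.1, 4.7, 5.5]
c = [7.0, 7.4, 6.6]

# Way 1: all 12 samples contribute to the variability estimate.
var_all, df_all = pooled_variance([ctrl, a, b, c])

# Way 2: only the 6 samples from ctrl and a contribute.
var_pair, df_pair = pooled_variance([ctrl, a])

print("way 1: residual df =", df_all, "pooled variance =", round(var_all, 3))
print("way 2: residual df =", df_pair, "pooled variance =", round(var_pair, 3))
```

In this toy example, way 1 has 8 residual degrees of freedom (12 samples minus 4 group means) and way 2 has 4 (6 samples minus 2 group means). The pooled variance also differs, because groups b and c are noisier than ctrl and a. That is the situation the caveat about similar within-group variability refers to: pooling across groups helps when the groups are comparably variable and can hurt a particular comparison when they are not.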