Confusing results in DESeq2 with multiple groups
1
0
Entering edit mode
诗友 • 0
@efda635c
Last seen 13 months ago
United States

I recently ran into a question with DESeq2. I have a dataset with four groups (ctrl, a, b, c), each with three samples. I want the DEGs for a vs ctrl, b vs ctrl, and c vs ctrl, and I see two ways to compute them.

Way 1: I build a count matrix with 12 columns (samples) and 20000 rows (protein-coding genes), then run DESeq() and results() to get the DEGs for a vs ctrl, b vs ctrl, and c vs ctrl.

Way 2: for a vs ctrl, I build a matrix with 6 columns (ctrl1, ctrl2, ctrl3, a1, a2, a3) and 20000 rows (protein-coding genes), then run DESeq() and results() to get the DEGs. I repeat this for the other comparisons.
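For concreteness, the two approaches might be sketched as follows. This is only an illustration under stated assumptions: the counts are simulated with `makeExampleDESeqDataSet()` rather than taken from the real data, only 1000 genes are used to keep it quick, and the column name `group` is a hypothetical choice.

```r
library(DESeq2)

set.seed(1)
# Simulated counts stand in for the real data (assumption): 1000 genes, 12 samples
dds_all <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical sample annotation: four groups of three, ctrl as reference level
dds_all$group <- factor(rep(c("ctrl", "a", "b", "c"), each = 3),
                        levels = c("ctrl", "a", "b", "c"))
design(dds_all) <- ~ group

# Way 1: fit one model on all 12 samples, then extract a vs ctrl
dds_way1 <- DESeq(dds_all)
res_way1 <- results(dds_way1, contrast = c("group", "a", "ctrl"))

# Way 2: keep only the 6 ctrl and a samples, drop unused levels, and refit
dds_sub <- dds_all[, dds_all$group %in% c("ctrl", "a")]
dds_sub$group <- droplevels(dds_sub$group)
dds_way2 <- DESeq(dds_sub)
res_way2 <- results(dds_way2, contrast = c("group", "a", "ctrl"))

# Side-by-side p-values for the same genes
pvals <- data.frame(way1 = res_way1$pvalue, way2 = res_way2$pvalue,
                    row.names = rownames(res_way1))
head(pvals)
```

In way 1 the size factors and per-gene dispersions are estimated from all 12 samples, while in way 2 they are estimated from only 6, so the two columns are not expected to match.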

The p-value for the same gene is very different between way 1 and way 2.

So I have two questions:

  1. Does including more samples influence how the GLM is fitted for each gene? Until now I thought the GLM was fitted for each group independently. Is the difference in the fitted GLM the reason the p-values differ?
  2. The samples within each group have a strong batch effect. Given that, which approach is more suitable for me: way 1 or way 2?

Thank you!

DESeq2 • 508 views
@james-w-macdonald-5106
Last seen 2 days ago
United States

1.) Yes, adding more samples to your model increases the degrees of freedom and increases power, as long as the within-group variability is relatively similar across groups.

2.) It depends. If you have the same batches for all groups (e.g., you ran 2 samples for a, b, c, and ctrl in one batch and then ran an additional sample of each in the second batch), then you should just include a batch factor in your model. If you don't have the same batches for all groups, then it's more complicated.
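As a rough sketch of the balanced case, assuming two samples of every group were run in batch 1 and one in batch 2, the batch factor could be added to the design like this. The counts are simulated with `makeExampleDESeqDataSet()` and the column names `group` and `batch` are hypothetical, so treat it as an illustration rather than a recipe for the actual data.

```r
library(DESeq2)

set.seed(1)
# Simulated counts stand in for the real data (assumption): 1000 genes, 12 samples
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical annotation: four groups of three samples, ctrl as reference
dds$group <- factor(rep(c("ctrl", "a", "b", "c"), each = 3),
                    levels = c("ctrl", "a", "b", "c"))
# Assumed layout: within each group, samples 1-2 are batch 1 and sample 3 is batch 2
dds$batch <- factor(rep(c("1", "1", "2"), times = 4))

# Put batch before the variable of interest so group effects are adjusted for batch
design(dds) <- ~ batch + group
dds <- DESeq(dds)

# One fit on all 12 samples, three contrasts against ctrl
res_a <- results(dds, contrast = c("group", "a", "ctrl"))
res_b <- results(dds, contrast = c("group", "b", "ctrl"))
res_c <- results(dds, contrast = c("group", "c", "ctrl"))

resultsNames(dds)  # lists the coefficients fitted by the model
```

With this layout the design matrix has 5 columns (intercept, one batch term, three group terms), leaving 7 residual degrees of freedom from 12 samples. Fitting a vs ctrl alone with the same batch term would leave only 3 from 6 samples.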
