Question

Multifactor model design for DE analysis (DESeq2 & edgeR)


Mathieu Bahin ▴ 30

@mathieu-bahin-6488

Last seen 11.2 years ago

Hi all,

I am using DESeq2 and edgeR to perform DE analysis on paired samples on a dog cancer project. Sorry if the question is redundant but I can't find one very similar to my case.

I have been designing models with 2 factors: condition (control / tumor) and patient ID (to match the paired samples). I used the model '~sample_id + condition' until now but I would like to add a third factor, the breed.

Is that then correct to use '~sample_id + breed + condition' if my goal is to analyse the DE between control and tumor samples taking into account the individual variabilities (with the sample ID factor) and the breed variability (with the breed factor).

Here is an example of a sample table I could have:

          Patient ID   Condition   Breed
Sample1   1            Control     Breed1
Sample2   2            Control     Breed2
Sample3   3            Control     Breed1
Sample4   4            Control     Breed2
Sample5   1            Tumor       Breed1
Sample6   2            Tumor       Breed2
Sample7   3            Tumor       Breed1
Sample8   4            Tumor       Breed2

Cancer edgeR DESeq2 • 2.2k views

updated 11.2 years ago by Simon Anders ★ 3.8k • written 11.2 years ago by Mathieu Bahin ▴ 30

score 0 · Answer 1 · 2014-08-19

Hi Mathieu

On 19/08/14 10:01, Mathieu Bahin wrote:
> I have been designing models with 2 factors: condition (control /
> tumor) and patient ID (to match the paired samples). I used the model
> '~sample_id + condition' until now but I would like to add a third
> factor, the breed.
> Is that then correct to use '~sample_id + breed + condition' if my
> goal is to analyse the DE between control and tumor samples taking
> into account the individual variabilities (with the sample ID factor)
> and the breed variability (with the breed factor).

No. This would make breed another blocking factor, besides patient_id. But it does not offer any new information, because all samples from the same patient are from the same breed, so the patient_id factor already captures all variation associated with this.

Therefore, there is no need to account for breed if you just want to see the overall effect of cancer.

If, however, you want to know for which genes the expression change due to cancer _depends_ on breed, you are looking for an _interaction_ between breed and condition and should hence use:

    ~ patient_id + condition + breed:condition

(BTW, I renamed your factor from "sample_id" to "patient_id": After all, you have two samples from each patient.)

> Another question: If I use the pairwise information, I don't have
> replicates because I only have two samples (one control, one tumor)
> for each patient. Is it better to use it (and then have no replicates)
> or not (and then have replicates for 'control' and 'tumor' samples)?

Of course, you still have replicates. You have several dogs. This is the whole point of the paired design. If you omitted the "patient_id" factor, you would drastically lose inferential power.

Simon
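The point that breed adds no new information once patient_id is in the model can be checked directly on the sample table from the question. The following is a minimal base-R sketch and not part of the original answer: the object names (`samples`, `paired`, `with_breed`) are assumptions chosen for illustration, and it only inspects the design matrices without fitting any DESeq2 or edgeR model.

```r
# Sample table from the question: 4 dogs, each with one control and one tumor sample.
# Object and column names here are illustrative assumptions.
samples <- data.frame(
  patient_id = factor(c(1, 2, 3, 4, 1, 2, 3, 4)),
  condition  = factor(rep(c("Control", "Tumor"), each = 4)),
  breed      = factor(rep(c("Breed1", "Breed2"), times = 4)),
  row.names  = paste0("Sample", 1:8)
)

# Each patient has exactly one breed, i.e. breed is nested within patient_id.
breeds_per_patient <- tapply(samples$breed, samples$patient_id,
                             function(b) length(unique(b)))
print(breeds_per_patient)

# Paired design: patient_id as blocking factor plus the condition effect.
paired <- model.matrix(~ patient_id + condition, data = samples)

# Same design with breed added as a further main effect.
with_breed <- model.matrix(~ patient_id + breed + condition, data = samples)

# Compare the number of columns with the rank of each design matrix.
rank_paired     <- qr(paired)$rank
rank_with_breed <- qr(with_breed)$rank

cat("paired:     ", ncol(paired), "columns, rank", rank_paired, "\n")
cat("with breed: ", ncol(with_breed), "columns, rank", rank_with_breed, "\n")
# paired:      5 columns, rank 5
# with breed:  6 columns, rank 5
```

The breed column is the sum of the patient_id columns for the dogs of that breed, so it raises the column count without raising the rank. The same column-versus-rank comparison can be run on any candidate formula before handing it to DESeq2 or edgeR.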