Question

Designing a contrast in DESeq2

mrigaya.mehra ▴ 10

@mrigayamehra-12427

Hello,

I am new to R and DESeq2. I am using DESeq2 to identify DE genes in my experiment. The experiment has a 2x2 design: treatment or control, each administered in one of two tissues. This gives 4 combinations:

1. treatment_tissue1

2. treatment_tissue2

3. control_tissue1

4. control_tissue2

I want to find the DE genes for the following comparisons:

1. treatment_tissue1 vs treatment_tissue2

2. control_tissue1 vs treatment_tissue1

3. control_tissue2 vs treatment_tissue2

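I have set up my dds object as below:

dds <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ treatment)

# Join with "_" so the levels read "control_tissue1", "treatment_tissue2", etc.
dds$group <- factor(paste(dds$treatment, dds$tissue, sep = "_"))

# Make control_tissue1 the reference level
dds$group <- relevel(dds$group, ref = "control_tissue1")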

I am not sure whether I am doing this correctly; please advise. I also do not understand how to set up the contrasts.

Someone suggested that I build a model matrix and then define contrasts against it. I have set up a model matrix, but I am stuck at the point where I have to set up the contrasts. Please help.
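
For the model-matrix route, one common pattern is to average the model-matrix rows of each group and subtract one average from the other, which gives a numeric contrast vector. The following is a minimal base R sketch, not the asker's actual data: the 8-sample `coldata` and the helper `row_for` are made up for illustration.

```r
# Hypothetical sample sheet: 2 samples per treatment x tissue combination.
coldata <- data.frame(
  treatment = rep(c("control", "treatment"), each = 4),
  tissue    = rep(c("tissue1", "tissue2"), times = 4)
)
coldata$group <- factor(paste(coldata$treatment, coldata$tissue, sep = "_"))
coldata$group <- relevel(coldata$group, ref = "control_tissue1")

# One column per coefficient: intercept plus the three non-reference groups.
mm <- model.matrix(~ group, data = coldata)

# Hypothetical helper: the average model-matrix row for one group.
row_for <- function(g) colMeans(mm[coldata$group == g, , drop = FALSE])

# Numerator minus denominator gives the weight on each coefficient.
contrast_t1_vs_t2 <- row_for("treatment_tissue1") - row_for("treatment_tissue2")
print(contrast_t1_vs_t2)
```

In DESeq2, a numeric vector like this, with one element per entry of `resultsNames(dds)`, can be passed as `results(dds, contrast = contrast_t1_vs_t2)`, provided the design of `dds` matches the formula used to build `mm`.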

deseq2 design and contrast matrix model matrix • 31k views

ADD COMMENT • link 8.9 years ago mrigaya.mehra ▴ 10

Dear Michael,

Thanks for your reply. I have 40 samples from 10 patients. I followed the example in the link and created a group like this:

# Join with "_" so the levels read "control_tissue1", "treatment_tissue2", etc.
dds$group <- factor(paste(dds$treatment, dds$tissue, sep = "_"))

dds$group <- relevel(dds$group, ref = "control_tissue1")
design(dds) <- ~ group

dds <- DESeq(dds)

With this design I got ~600 DE genes.

But now I also have to take one more factor into account.

My design is like this:

patient   treatment   tissue   group
patient1   treatment1   tissue1   control
patient1   treatment2   tissue1   control
patient1   treatment1   tissue2   control
patient1   treatment2   tissue2   control
patient2   treatment1   tissue1   treated
patient2   treatment2   tissue1   treated
patient2   treatment1   tissue2   treated
patient2   treatment2   tissue2   treated
patient3   treatment1   tissue1   control
patient3   treatment2   tissue1   control
patient3   treatment1   tissue2   control
patient3   treatment2   tissue2   control
patient4   treatment1   tissue1   treated
patient4   treatment2   tissue1   treated
patient4   treatment1   tissue2   treated
patient4   treatment2   tissue2   treated
patient5   treatment1   tissue1   control
patient5   treatment2   tissue1   control
patient5   treatment1   tissue2   control
patient5   treatment2   tissue2   control
patient6   treatment1   tissue1   treated
patient6   treatment2   tissue1   treated
patient6   treatment1   tissue2   treated
patient6   treatment2   tissue2   treated
patient7   treatment1   tissue1   control
patient7   treatment2   tissue1   control
patient7   treatment1   tissue2   control
patient7   treatment2   tissue2   control
patient8   treatment1   tissue1   treated
patient8   treatment2   tissue1   treated
patient8   treatment1   tissue2   treated
patient8   treatment2   tissue2   treated
patient9   treatment1   tissue1   treated
patient9   treatment2   tissue1   treated
patient9   treatment1   tissue2   treated
patient9   treatment2   tissue2   treated
patient10   treatment1   tissue1   treated
patient10   treatment2   tissue1   treated
patient10   treatment1   tissue2   treated
patient10   treatment2   tissue2   treated

I also want to account for the variation between patients.

So I set up a second design:

design(dds) <- ~ group + patient

and with it I got ~1000 DE genes. I do not understand why the number of DE genes differs, or whether I am doing this the right way. Please help.

ADD REPLY • link 8.9 years ago mrigaya.mehra ▴ 10

Adding patient as a term should help remove the variation across patients and improve sensitivity. This is potentially why you have more DE genes.

Note that you should follow the recommendations in the vignette and always put the variable of interest (group) at the END of the design formula: ~ patient + group.

ADD REPLY • link 8.9 years ago Michael Love 43k

Why should variables of interest go at the end of design formulae?

I've briefly read about sequential sums of squares. Are sequential sums of squares calculated in DESeq2, and (if so) are they calculated in the order that terms appear in the design formula?

ADD REPLY • link 7.0 years ago galen.seilis • 0

Nearly all R packages will test the last coefficient by default if you don't tell the software which coefficient to test. You can also specify which coefficient to test, and then the order doesn't matter.
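
To see how the formula order decides which coefficient comes last, here is a small base R sketch. It assumes a made-up sample sheet of 10 patients, each measured in all four treatment and tissue combinations (40 samples); the level names are illustrative and not taken from the real data.

```r
# Hypothetical sample sheet: 10 patients x 2 treatments x 2 tissues = 40 rows.
coldata <- expand.grid(
  tissue    = c("tissue1", "tissue2"),
  treatment = c("control", "treatment"),
  patient   = paste0("patient", 1:10),
  stringsAsFactors = FALSE
)
coldata$patient <- factor(coldata$patient, levels = paste0("patient", 1:10))
coldata$group   <- factor(paste(coldata$treatment, coldata$tissue, sep = "_"))
coldata$group   <- relevel(coldata$group, ref = "control_tissue1")

# Same terms, different order: only the column order changes.
mm_group_last   <- model.matrix(~ patient + group, data = coldata)
mm_patient_last <- model.matrix(~ group + patient, data = coldata)

# The last column is what a "test the last coefficient" default would pick.
print(tail(colnames(mm_group_last), 1))
print(tail(colnames(mm_patient_last), 1))
```

With `~ patient + group` the last coefficient is a group effect, while with `~ group + patient` it is a patient effect, which is rarely the comparison of interest. In DESeq2, calling `results(dds)` with no further arguments reports the last variable in the design; passing `name =` or `contrast =` picks the comparison explicitly, whatever the order.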

ADD REPLY • link 7.0 years ago Michael Love 43k

score 2 · Answer 1 · 2017-02-22

Michael Love 43k

@mikelove

United States

How many samples do you have total?

Why don't you follow the example in the vignette, at the beginning of the interactions section:

https://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions
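
Applied to the 2x2 design in the question, the grouped-factor approach from that vignette section might look like the sketch below. It is only an illustration: simulated counts from `makeExampleDESeqDataSet` stand in for the real data, the 12-sample layout is made up, and the level names are assumed to be `control`/`treatment` and `tissue1`/`tissue2`.

```r
library(DESeq2)

# Simulated counts stand in for the real data (12 samples, 3 per group).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds$treatment <- factor(rep(c("control", "treatment"), each = 6))
dds$tissue    <- factor(rep(rep(c("tissue1", "tissue2"), each = 3), times = 2))

# Combine the two factors into a single factor with four levels.
dds$group <- factor(paste(dds$treatment, dds$tissue, sep = "_"))
dds$group <- relevel(dds$group, ref = "control_tissue1")
design(dds) <- ~ group
dds <- DESeq(dds)

# contrast = c(factor name, numerator level, denominator level)
res_t1_vs_t2 <- results(
  dds, contrast = c("group", "treatment_tissue1", "treatment_tissue2"))
res_trt_vs_ctl_t1 <- results(
  dds, contrast = c("group", "treatment_tissue1", "control_tissue1"))
res_trt_vs_ctl_t2 <- results(
  dds, contrast = c("group", "treatment_tissue2", "control_tissue2"))
```

Each log2 fold change is reported as numerator over denominator, so the last two calls give treatment relative to control within each tissue. Swap the last two elements of `contrast` for the opposite direction.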

ADD COMMENT • link 8.9 years ago Michael Love 43k