Question: Multi-factorial design in DEseq2: defining reduced model
0
gravatar for laianavarromartin
2.8 years ago by
Spain
laianavarromartin10 wrote:

Hi, 

I have an experimental design where I have 2 factors (A: 4 levels, B: 2 levels). I'm interested to look at both factor effects and its interaction. I'm having troubles understanding how to define the reduced model to run DEseq.        

I define the full model as : A + B + A:B

My question is:

reduced model as: A+B means that I'm looking at DE genes of the interaction A:B?

reduced model as: A+A:B means that I'm looking at DE genes considering factor B?

reduced model as: B+A:B means that I'm looking at DE genes considering factor A?

Thanks!

 

 

 

 

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by laianavarromartin10

Dear Mike,

Thanks for your quick answer. I have 3 replicates per each condition, the problem is that these are data from 2 different experiments, and the model is not full rank. I was considering to not include replicate as a factor.

This is the info of my ColData

Dosis Window Replicate
Ctrl 0a5 dpf R1
1ppm 0a5 dpf R1
4ppm 0a5 dpf R1
Ctrl 0a5 dpf R2
1ppm 0a5 dpf R2
4ppm 0a5 dpf R2
Ctrl 0a5 dpf R3
1ppm 0a5 dpf R3
4ppm 0a5 dpf R3
Ctrl 2a5 dpf R4
1ppm 2a5 dpf R4
4ppm 2a5 dpf R4
Ctrl 2a5 dpf R5
1ppm 2a5 dpf R5
4ppm 2a5 dpf R5
Ctrl 2a5 dpf R6
1ppm 2a5 dpf R6
4ppm 2a5 dpf R6
ADD REPLYlink written 2.8 years ago by laianavarromartin10

To be more concise. My data come from two different experiments (carried out at different times). Experiment 1 was done at a different experimental windows (from 0 to 5 days post fertilization) than experiment 2 (from 2 to 5 days post fertilization). For that reason I have named them differently as replicates, since the experiment nor the library preparation and sequencing where not done at the same time.

My questions are:

Which genes are DE due to dose?

Does Experimental window affect how those genes are DE by the different doses?

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by laianavarromartin10

Note, i moved your post to a Comment, instead of an Answer.

What is the significance of replicate? What makes the three samples labelled R1 related to each other?

ADD REPLYlink written 2.8 years ago by Michael Love26k

In these experiments we treated zebrafish embryo to different doses of a compound. All those labelled with the same replicate name come from the same 6-well plate (in which one treatment-dose was place in a different well of the same plate). So all the zebrafish from the same Replicate were exposed to the same conditions of light, temperature... This is done to account for differences between plates (tank effect)

ADD REPLYlink written 2.8 years ago by laianavarromartin10

Got it. You can follow an example listed in the vignette, using a design of ~window + window:rep.nested + window:dose.

You should re-code replicate to take values R1-R3. So R4 become R1, R5 becomes R2, R6 becomes R3.

The example in the vignette starts with "Consider an experiment with grouped individuals, where we seek to test the group-specific effect of a treatment, while controlling for individual effects."

 

 

ADD REPLYlink written 2.8 years ago by Michael Love26k

But, animals from the R1 and R4 do not come from the same plate. Is that correct? 

That means that in our case is a nested design, but Replicates from different experiments are not the same. 

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by laianavarromartin10

The model I suggested takes into account that the R1 and R4, etc. are different because of the interaction between window and rep.nested. Each replicate within each window is accounted for separately.

ADD REPLYlink written 2.8 years ago by Michael Love26k

Mike, thanks this is clear now.

ADD REPLYlink written 2.8 years ago by laianavarromartin10

Apart from choosing the best design, I still do not understand how should we do the LRT. So I come back to my original question:

Once I define the full model, should i leave out the variable of interest in the reduced model to be looking at the DE genes that are defined by that variable?

So in the case that my final design is: ~window + window:rep.nested + window:dose

reduced = ~window + window:rep.nested  will tell me which genes are DE by doses depending on window?

reduced = ~window:rep.nested + window:dose which genes are DE depending on window?

ADD REPLYlink written 2.8 years ago by laianavarromartin10

Note, i moved your post to a Comment, instead of an Answer. Answers are for answering the person asking the question, who is you.

The recommendation I am making is to use Wald tests on the interaction terms, which will perform a test of the dose effect for each window, and also allow you to contrast the dose effect across windows. After following the example in the vignette, and running DESeq(), you will then be able to build results tables with, e.g.

results(dds, name="window0a5.dose1ppm")

...for the 1ppm vs control effect for window 0a5. And so on for the others. See resultsNames(dds) for the exact names of the coefficients. (I'd recommend you use "0a5" as a factor level instead of "0a5 dpf", because it's easier to write out and it doesn't contain a space.)

A results table for the contrast of 1ppm vs control across window is:

results(dds, contrast=list("window2a5.dose1ppm", "window0a5.dose1ppm"))
ADD REPLYlink written 2.8 years ago by Michael Love26k

ok, just to make sure I understand correctly:

1. I should run the DE as 

dds <- DESeqDataSet(se, design = ~window + window:rep.nested + window:dose )

dds$dose <- relevel(dds$dose, "Control")

dds <- DESeq (dds)

2. Then I can extract the results for each window separately (comparing 2 dose at each time):

results(dds, name="window0a5.dose1ppm")

3. I obtain the DE genes across windows for each comparison created in 2 by:

results(dds, contrast=list("window2a5.dose1ppm", "window0a5.dose1ppm"))
ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by laianavarromartin10

Yes.             

ADD REPLYlink written 2.8 years ago by Michael Love26k

And why do not include Dose as a factor in the design? I also want to find the genes that are altered by dose regardless of the window of exposure.

On the other hand, why the walt test is better than the LRT? To make my life easier I always compare the Walt and LRT to a t-test or to an ANOVA. Does not my experimental design resemble better an ANOVA type of stats, where I look at the contribution of the 2 factors (genes that are DE by Window or dose) and then their interactions (genes that respond differently upon exposures to different doses depending on the window of exposure?

 

ADD REPLYlink written 2.8 years ago by laianavarromartin10

You can't really ask for the window-less effect of dose and at the same time control for replicate. This is maybe a subtle issue that you could discuss with a statistician, if you can find someone locally to sit down with.

In short, we don't have contrasts implemented for LRT in DESeq2, so you have to use the Wald to perform contrasts with DESeq2. All I can say is that the code I've suggested will enable you to discover the genes you want to find, and with high sensitivity. But the analysis is in your hands, so you are free of course to use whatever other methods you like.

 

ADD REPLYlink written 2.8 years ago by Michael Love26k

I have another experiment a little more complicated.  Have fish exposed to 3 different compounds at equivalent doses. I want to see which genes are specifically regulated by each compound, and which are common to all of them. 

Compound Dose Rep.nested
1 Control 1
1 NOEC/30 1
1 NOEC/3 1
1 NOEC 1
1 Control 2
1 NOEC/30 2
1 NOEC/3 2
1 NOEC 2
1 Control 3
1 NOEC/30 3
1 NOEC/3 3
1 NOEC 3
2 Control 1
2 NOEC/30 1
2 NOEC/3 1
2 NOEC 1
2 Control 2
2 NOEC/30 2
2 NOEC/3 2
2 NOEC 2
2 Control 3
2 NOEC/30 3
2 NOEC/3 3
2 NOEC 3
3 Control 1
3 NOEC/30 1
3 NOEC/3 1
3 NOEC 1
3 Control 2
3 NOEC/30 2
3 NOEC/3 2
3 NOEC 2
3 Control 3
3 NOEC/30 3
3 NOEC/3 3
3 NOEC 3

 

In that case, could I run LRT as follow?

dds <- DESeqDataSet(se, design = ~Compound + Dose + Compound:Rep.nested + Compound:Dose )
dds$Dose <- relevel(dds$Dose, "Control")
dds$Compound <- relevel(dds$Compound, "1")

dds <- DESeq(dds, test="LRT", full= ~Compound + Dose + Compound:Rep.nested + Compound:Dose, reduced = Dose + Compound:Rep.nested + Compound:Dose ))

Will give me the genes that are differentially expressed by each compound?

dds <- DESeq(dds, test="LRT", full= ~Compound + Dose + Compound:Rep.nested + Compound:Dose, reduced = ~Compound + Compound:Rep.nested + Compound:Dose ))

Will give me the genes that are common to all compounds?

 

Or should I run a Wald test as follows:

dds <- DESeqDataSet(se, design = ~ Dose + Compound:Rep.nested + Compound:Dose )
dds$Dose <- relevel(dds$Dose, "Control")
dds <- DESeq (dds)

Then extract the results for each dose separately (comparing 2 dose at each time). This will give me the genes that are commonly altered by the compounds

results(dds, name="doseControl.doseNOEC")

and Obtain the DE genes across compounds for each comparison created:

results(dds, contrast=list("Compound1.doseNOEC", "Compound2.doseNOEC"))
ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by laianavarromartin10

I'd focus on the Wald testing code you have. The design is correct and the second results table is correct, but not the first results table. With your experimental design, the only way to get the "common" effects across compound at a given dose it to average the interactions. You can do this with:

results(dds, contrast=list(c("Compound1.doseNOEC","Compound2.doseNOEC","Compound3.doseNOEC")), listValues=c(1/3,-1))

This code is taking 1/3 of the NOEC vs Control effect across all three compounds, which gives the average effect. There isn't any other way to get the "common" effect and at the same time control for replicate, because replicate is confounded with compound. Again this is a subtle issue which may be best explained by working with a local statistician.

ADD REPLYlink written 2.8 years ago by Michael Love26k
Answer: Multi-factorial design in DEseq2: defining reduced model
0
gravatar for Michael Love
2.8 years ago by
Michael Love26k
United States
Michael Love26k wrote:

Are those the three questions you want to ask? It's usually easier for me to figure out what questions you want to ask and then proceed from there. Do you have replicates? Can you show a table indicating how many replicates you have for each of A x B?

ADD COMMENTlink written 2.8 years ago by Michael Love26k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 270 users visited in the last hour