**40**wrote:

Hello,

I am new to RNAseq, new to DESeq2 and new to writing design formulas/matrix (not a statistician)...

I am trying to understand the meaning of the different examples I find on the internet and so far I am far from a clear understanding. Could someone help please? Is there a documentation for dummies that I could look. I have looked at DESeq2 and Limma documentation so far. I do get that the formula to use really depends on the question i want to answer but still not clear... Here are some of my questions:

1 - I have 27 samples. Two different treatments + control and 3 different times. I have 3 replicates for each. I would like to know what are the genes differentially expressed at each time for each treatment versus control (e.g. treated versus control at time 1, treated versus control at time 4 ...etc). I also would like to know what are the genes differentially expressed at each time for one treatment versus the other.

So far I have used: ~time*treatment

From DESeq2, I get the following column names for the design:

`resultsNames(dds)`

[1] "Intercept" "time1"

[3] "time2" "time3"

[5] "treatmentControl" "treatment1"

[7] "treatment2" "time1.treatmentControl"

[9] "time2.treatmentControl" "time3.treatmentControl"

[11] "time1.treatment1" "time2.treatment1"

[13] "time3.treatment1" "time1.treatment2"

[15] "time2.treatment2" "time3.treatment2"

In order to get more insight on what was happening I tried the same formula but using` design<- model.matrix(~time*treatment,meta )`

and the colnames I have are completely different, I just get:

`colnames(design)`

[1] "(Intercept)" "time2"

[3] "time3" "treatment1"

[5] "treatment2" "time3:treatment1"

[7] "time3:treatment1" "time2:treatment2"

[9] "time3:treatment2"

I am missing every column related to control and time 1.

I am lost. What DESeq2 is doing in the background please? Is it changing the formula? Is it possible to access the design matrix used by DESeq2?

2 - In the Limma documentation (http://www.statsci.org/smyth/pubs/limma-biocbook-reprint.pdf, p413), there is an example for a 2 x 2 factorial design.

`FileName Strain Treatment`

`File1 WT U`

`File2 WT S`

`File3 Mu U`

`File4 Mu S`

`File5 Mu S`

`TS <- paste(targets$Strain, targets$Treatment, sep = ".") `

`TS <- factor(TS, levels = c("WT.U", "WT.S", "Mu.U", "Mu.S")) `

`design <- model.matrix(~0 + TS)`

`colnames(design) <- levels(TS)`

`fit <- lmFit(eset, design)`

`cont.matrix <- makeContrasts(`

`+ WT.SvsU = WT.S - WT.U,`

`+ Mu.SvsU = Mu.S - Mu.U,`

`+ Diff = (Mu.S - Mu.U) - (WT.S - WT.U),`

`+ levels = design)`

` fit2 <- contrasts.`

Is it possible to do the same thing in DESeq2 (i.e. to merge the two factors in one)? It does seem more intuitive this way for the subsequent comparisons. But does it really gives the same results as ~strain * Treatment as they seems to explain? Could I do the same with my time*treatment experiment?

3 - Looking at this example for instance: Yet another DESeq2 nested design contrast matrix question

sample |
stage |
region |
animal |

E1_1 |
E |
Region1 |
1 |

E1_2 |
E |
Region1 |
2 |

E1_3 |
E |
Region1 |
3 |

E2_1 |
E |
Region2 |
1 |

E2_2 |
E |
Region2 |
2 |

E2_3 |
E |
Region2 |
7 |

P1_1 |
P |
Region1 |
4 |

P1_2 |
P |
Region1 |
5 |

P1_3 |
P |
Region1 |
6 |

P2_1 |
P |
Region2 |
4 |

P2_2 |
P |
Region2 |
5 |

P2_3 |
P |
Region2 |
8 |

Darya tried `design = ~ Animals + Stage + Region and design = ~ Animals + Stage + Region + Stage:Region`

But Mickeal Love suggested

`dds$animal.nested <- factor(c(1,2,3,1,2,4,1,2,3,1,2,4))`

`design =~ stage + stage:animal + stage:region`

What was wrong with Darya ones please?

Why not use design=~stage *animal *region so that all possible comparisons are available and then only use the constrasts needed?

How does the formula impact on what is calculated in DESeq2 before doing any contrast please?

Thanks in advance for any help!