Question: Importance of Order in Design Formula
3 months ago by
dennism92510 wrote:

Can someone quickly clarify whether the order of terms matters in a design formula? In this post, Anderson nicely explains that the design below gives the "Effect of treatment, accounting for the sample pairing":

~ Patient.ID + Treatment


But I thought the variables included in a design formula (specifically, in Wilkinson notation) simply listed the factors you wanted to take into account when fitting the linear regression model.

What would be the difference between:

"Effect of treatment, accounting for the sample pairing"

~ Patient.ID + Treatment


and "Effect of sample pairing, accounting for the treatment"

~ Treatment + Patient.ID


The same question extends to interactions (Patient.ID:Treatment and Treatment:Patient.ID).


Also, apart from the order of its columns, the design matrix does not change when the terms are given to model.matrix() in a different order.
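A quick base-R check of this, as a sketch on made-up sample data (three hypothetical patients, each with one Ctrl and one Trt sample; the `coldata` table and its level names are assumptions, not taken from the thread). It suggests the two formulas produce the same columns, only arranged differently:

```r
# Hypothetical paired design: 3 patients, each sampled under Ctrl and Trt.
coldata <- data.frame(
  Patient.ID = factor(rep(c("P1", "P2", "P3"), each = 2)),
  Treatment  = factor(rep(c("Ctrl", "Trt"), times = 3), levels = c("Ctrl", "Trt"))
)

mm1 <- model.matrix(~ Patient.ID + Treatment, data = coldata)
mm2 <- model.matrix(~ Treatment + Patient.ID, data = coldata)

colnames(mm1)  # intercept, patient columns, then the treatment column
colnames(mm2)  # intercept, the treatment column, then patient columns

# Compare the matrices after putting the columns in the same order.
same_columns <- setequal(colnames(mm1), colnames(mm2))
same_values  <- all(mm1[, colnames(mm2)] == mm2)
print(c(same_columns = same_columns, same_values = same_values))
```

Since the columns span the same space, a model fitted on either matrix has the same fitted values; only the position of each coefficient in the output differs.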

Answer: Importance of Order in Design Formula
3 months ago by
swbarnes2340 wrote:

https://rdrr.io/bioc/DESeq2/man/results.html

If results is run without specifying contrast or name, it will return the comparison of the last level of the last variable in the design formula over the first level of this variable.

If you specify the contrast you want, order doesn't matter.
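To see why the default differs between the two orderings, here is a small sketch that only reads the formula with base R. It does not call DESeq2, and `last_variable` is a hypothetical helper written for illustration, not part of any package:

```r
# Return the last term of a design formula (base R only; DESeq2 is not called).
last_variable <- function(design) {
  labels <- attr(terms(design), "term.labels")
  labels[length(labels)]
}

last_variable(~ Patient.ID + Treatment)  # "Treatment"
last_variable(~ Treatment + Patient.ID)  # "Patient.ID"
```

So with no contrast or name given, the first design would report a Treatment comparison and the second a Patient.ID comparison, which is why specifying the contrast explicitly removes the dependence on order.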

Thank you for your answer, but the focus of this question isn't really results() or choosing the correct resultsNames()/contrast. I was wondering whether calling results() with the same contrast or name on two DESeqDataSets whose designs differ only in term order would change the outcome. To clarify my question:

design(dds1) <- ~ Patient.ID + Treatment
design(dds2) <- ~ Treatment + Patient.ID
dds1 <- DESeq(dds1)
dds2 <- DESeq(dds2)
res1 <- results(dds1)  # Yes, I know there is no contrast/results name
res2 <- results(dds2)


Is res1 == res2?

Also, while on the topic of specifying contrasts: is there a difference between the two calls shown below?

results(dds, contrast=c("condition", "Trt", "Ctrl"))
results(dds, name="condition_Trt_vs_Ctrl")


As swbarnes2 points out above, the order of terms in the design doesn't matter once you specify the contrast or coefficient you want.

When someone pulls out the coefficient associated with variable x with a design formula ~z + x, they will often write about "the effect of x, controlling for z" or "...while adjusting for z", etc. Or you could be more explicit and say "the coefficient associated with x, in a linear model including terms for z and x".
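As a rough illustration of that last phrasing, the sketch below fits an ordinary linear model with `lm()` on simulated data (the variables `x`, `z`, `y` and the effect sizes are made up). This is not DESeq2's negative binomial GLM, but the same reasoning about term order should carry over: the coefficient for x is the same whether the formula is written `~ z + x` or `~ x + z`.

```r
set.seed(1)
n <- 40
z <- factor(rep(c("a", "b"), each = n / 2))        # nuisance factor
x <- factor(rep(c("ctrl", "trt"), times = n / 2))  # factor of interest
y <- 1 + 0.5 * (z == "b") + 2 * (x == "trt") + rnorm(n)

fit_zx <- lm(y ~ z + x)
fit_xz <- lm(y ~ x + z)

# The coefficient for x, extracted by name, does not depend on term order.
coef(fit_zx)["xtrt"]
coef(fit_xz)["xtrt"]
all.equal(coef(fit_zx)["xtrt"], coef(fit_xz)["xtrt"])
```

Extracting the coefficient by name rather than by position is what makes the comparison independent of how the formula was written.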