Question: DESeq2 contrast statement syntax and design statement question
21 months ago, grashow0 wrote:

Hi,

Thank you for answering our previous questions. We've made progress addressing our shared controls and "model matrix not full rank" issues. As a reminder, we had 8 chemicals (plus control) tested at 4 concentrations both with and without estrogen (E2). We had three biological replicates and three technical replicates per plate. We assigned samples labeled "control" to each chemical and assigned a concentration of 0 uM.

We are mostly interested in the three-way interaction between chemical, concentration, and E2. We've explored two ways of modeling this.

1) Michael Love suggested the following:

design= ~ bio_rep + E2 + E2:Conc_uM + E2:new_chem:Conc_uM

2) We have also used paste0 to create a mega-variable that combines chemical, concentration, and E2:

design = ~ bio_rep + mega_variable

We are getting very different results with these two approaches. Can you articulate what the differences might be?

As a second question, we've also seen two different syntax styles for contrast statements:

1) Tam_results_1 <- results(ddsColl, contrast= list(c("mega_varTam_0.1uM_E2_0","mega_varcontrol_0uM_E2_0")),alpha=0.05)

2) Tam_results_2 <- results(ddsColl, contrast= (c("mega_var","Tam_0.1uM_E2_0", "control_0uM_E2_0")),alpha=0.05)

These two yield different results. When would each be appropriate to use?

Thank you in advance,

Rachel

Answer: DESeq2 contrast statement syntax and design statement question
21 months ago, Michael Love (United States) wrote:

For my own tracking, here is the link back to the previous post:

DESeq2- what to do when two conditions share controls?

Re: very different results, this makes sense. Treating concentration as a numeric variable (as before) and pasting it into a string, so that each concentration becomes a categorical level, are very different modeling choices.
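To make the difference concrete, here is a minimal base R sketch that builds both kinds of design matrix for a toy sample table. The column names follow the thread, but the table itself is made up (one chemical only, for brevity), so treat it as an illustration under those assumptions rather than a reconstruction of the actual experiment.

```r
# Toy sample table: values are invented, column names mirror the thread.
samples <- expand.grid(
  Conc_uM = c(0, 0.1, 1, 10),
  E2      = factor(c(0, 1)),
  bio_rep = factor(1:3)
)

# (a) Concentration as a numeric covariate:
#     the model fits one slope over concentration per E2 group.
mm_numeric <- model.matrix(~ bio_rep + E2 + E2:Conc_uM, data = samples)

# (b) Concentration pasted into a string and treated as a factor:
#     the model fits a separate coefficient for each combination.
samples$mega_var <- factor(paste0(samples$Conc_uM, "uM_E2_", samples$E2))
mm_categorical <- model.matrix(~ bio_rep + mega_var, data = samples)

# Compare the coefficients each design estimates.
colnames(mm_numeric)
colnames(mm_categorical)
```

The numeric design assumes the effect changes linearly with Conc_uM on the scale of the linear predictor, while the pasted-string design places no constraint on how the levels relate to each other. Tests built on the two designs therefore answer different questions.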

In my last reply, I recommended that you work with a local statistician: you have a very complex experimental setup, and I think it's not trivial to translate the tests you want from English into an R formula and contrasts. With such a design, statistical modeling is an iterative process that involves checking the assumptions of candidate models (how to encode the numeric variable of concentration, how to deal with shared controls). I'll reiterate my recommendation that you partner with someone with a background in R's linear model formulas.

Re: the two syntaxes, see the help page for ?results:

contrast: this argument specifies what comparison to extract from the
‘object’ to build a results table. one of either:

• a character vector with exactly three elements: the name
of a factor in the design formula, the name of the
numerator level for the fold change, and the name of the
denominator level for the fold change (simplest case)

• a list of 2 character vectors: the names of the fold
changes for the numerator, and the names of the fold
changes for the denominator. these names should be
elements of ‘resultsNames(object)’. if the list is length
1, a second element is added which is the empty character
vector, ‘character()’. (more general case, can be to
combine interaction terms and main effects)

You are using the contrast correctly in (2), but for (1) above, you are only giving one character vector:

list( c("mega_varTam_0.1uM_E2_0", "mega_varcontrol_0uM_E2_0") )

This adds the two coefficients together rather than taking their difference, because both names land in the numerator and the denominator is left empty. To contrast these two levels, provide them as separate elements of the list:

list( "mega_varTam_0.1uM_E2_0", "mega_varcontrol_0uM_E2_0" )

They look similar, but one is adding together coefficients while the other is calculating their difference.
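One way to see the difference is to write out the numeric weights each list implies. The sketch below uses a hypothetical helper, contrast_weights(), which is not part of DESeq2; it only mimics the rule quoted from ?results (numerator names get +1, denominator names get -1, and a length-1 list gets an empty denominator). The coefficient names are stand-ins copied from the thread; in a real analysis they would come from resultsNames(ddsColl).

```r
# Hypothetical helper (NOT a DESeq2 function): convert a list-style contrast
# into numeric weights over the model coefficients.
contrast_weights <- function(contrast, coef_names) {
  # A list of length 1 gets an empty denominator, as described in ?results.
  if (length(contrast) == 1) {
    contrast <- c(contrast, list(character()))
  }
  w <- setNames(numeric(length(coef_names)), coef_names)
  w[contrast[[1]]] <- 1    # numerator coefficients
  w[contrast[[2]]] <- -1   # denominator coefficients
  w
}

# Stand-in coefficient names taken from the thread.
coefs <- c("Intercept",
           "mega_varTam_0.1uM_E2_0",
           "mega_varcontrol_0uM_E2_0")

# One character vector inside the list: both weights are +1 (a sum).
w_sum <- contrast_weights(
  list(c("mega_varTam_0.1uM_E2_0", "mega_varcontrol_0uM_E2_0")), coefs)

# Two list elements: +1 for the first, -1 for the second (a difference).
w_diff <- contrast_weights(
  list("mega_varTam_0.1uM_E2_0", "mega_varcontrol_0uM_E2_0"), coefs)

print(w_sum)
print(w_diff)
```

The +1 and -1 weights correspond to the default listValues = c(1, -1) argument of results().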