Question

Struggling with use and interpretation of "Group-specific condition effects, individuals nested within groups"

jpkarl

Hi,

I've read the excellent vignettes and the DESeq2 paper, and discussed this with colleagues, but I am still struggling with the correct approach to using DESeq2 to analyze a repeated-measures experiment with 2 separate groups. I want to be sure I do this correctly, so any guidance is greatly appreciated.

I am analyzing an experiment with 2 groups (x and y) of volunteers measured at 4 separate time points (a, b, c and d). I want to know A) whether changes in gene counts over time differ between groups (i.e., a group-by-time interaction), and B) for genes that don't show a significant interaction, whether gene counts differ over time when both groups are combined.

For objective A, the "Group-specific condition effects, individuals nested within groups" design explained here: http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups seems most appropriate. There, the model is given as ~ grp + grp:ind.n + grp:cnd.
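To see what this formula produces for a layout like the one described (two groups, four time points), the model matrix can be built with base R's model.matrix(). This is a minimal sketch assuming a hypothetical layout of three volunteers per group, each sampled at every time point, with time points coded as levels A to D. It shows one time effect (relative to A) per group and no shared cnd column; DESeq2's resultsNames() lists the same interaction coefficients with '.' in place of ':' (e.g. grpY.cndB, as in the vignette).

```r
# Hypothetical layout: 2 groups (X, Y), 3 volunteers per group,
# each volunteer sampled at 4 time points (A, B, C, D)
coldata <- data.frame(
  grp   = factor(rep(c("X", "Y"), each = 12)),
  ind.n = factor(rep(rep(1:3, each = 4), times = 2)),  # volunteer number within group
  cnd   = factor(rep(c("A", "B", "C", "D"), times = 6))
)

mm <- model.matrix(~ grp + grp:ind.n + grp:cnd, data = coldata)

# Per-volunteer baselines (Intercept, grpY, grp:ind.n columns)
# plus one B/C/D-vs-A effect per group (grpX:cndB, grpY:cndB, ...)
colnames(mm)

# 12 columns, all estimable (full rank)
qr(mm)$rank == ncol(mm)
```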

Question 1: why is there no cnd term (i.e., no main effect of time/condition), as there would be in a general linear model? In other words, why isn't the correct model ~ grp + cnd + grp:ind.n + grp:cnd?

To test for an interaction, I plan to use a likelihood ratio test (LRT) comparing the full model with a reduced model that does not include the grp:cnd term.

Question 2: is that approach correct?

To compare whether condition/timepoint effects (e.g., b vs a) also differ across groups, I can use the code: results(dds, contrast=list("grpY.cndB","grpX.cndB")).
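In context, that call might look like the sketch below. It runs on simulated counts from DESeq2's makeExampleDESeqDataSet() and assumes a hypothetical layout of three volunteers per group with time points A to D; with real data, the DESeqDataSet would come from your own count matrix and sample table.

```r
library(DESeq2)

# Simulated counts for 24 samples; the sample layout below is hypothetical
dds <- makeExampleDESeqDataSet(n = 1000, m = 24)
dds$grp   <- factor(rep(c("X", "Y"), each = 12))
dds$ind.n <- factor(rep(rep(1:3, each = 4), times = 2))  # volunteer within group
dds$cnd   <- factor(rep(c("A", "B", "C", "D"), times = 6))
design(dds) <- ~ grp + grp:ind.n + grp:cnd

dds <- DESeq(dds)
resultsNames(dds)  # per-group time effects such as grpX.cndB and grpY.cndB

# B vs A within each group
res_x <- results(dds, name = "grpX.cndB")
res_y <- results(dds, name = "grpY.cndB")

# Does the B vs A effect differ between groups?
res_int <- results(dds, contrast = list("grpY.cndB", "grpX.cndB"))
```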

Question 3: how do I interpret those results? Do they give log2(grpY.cndB/grpX.cndB) while controlling for cndA, or log2(grpY.cndB/grpY.cndA) - log2(grpX.cndB/grpX.cndA), or something else?

For objective B, my initial thought was to use a likelihood ratio test comparing the models ~ grp + cnd + grp:ind.n + grp:cnd and ~ grp + grp:ind.n + grp:cnd. However, if the first model is incorrect, that obviously doesn't work. My next thought was to use the paired-samples approach described in the vignette above. The vignette gives the model as ~ subject + cnd, so my idea was to use ~ subject + cnd + group. I also looked at the vignettes for time-course experiments, but those models don't appear to account for the repeated measurements taken from the same individual (i.e., measurements over time are not independent), so I didn't think the time-course example applies here.
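One reason the paired-samples design cannot simply be extended with a group term is that group is fully determined by subject: every subject belongs to exactly one group, so the group column is a sum of subject columns. DESeq2 refuses to fit a design whose model matrix is not full rank, and the check can be reproduced in base R by comparing the rank of the model matrix with its number of columns. This is a sketch assuming a hypothetical layout of three volunteers per group at time points A to D, with subject IDs unique across groups; is_full_rank() is a helper written for illustration.

```r
# Hypothetical layout: subjects 1-3 in group X, 4-6 in group Y, 4 time points each
coldata <- data.frame(
  subject = factor(rep(1:6, each = 4)),
  grp     = factor(rep(c("X", "Y"), each = 12)),
  ind.n   = factor(rep(rep(1:3, each = 4), times = 2)),  # volunteer number within group
  cnd     = factor(rep(c("A", "B", "C", "D"), times = 6))
)

# Hypothetical helper: TRUE if every column of the model matrix is estimable
is_full_rank <- function(formula, data) {
  mm <- model.matrix(formula, data = data)
  qr(mm)$rank == ncol(mm)
}

# grpY equals the sum of the dummy columns for subjects 4-6: rank-deficient
is_full_rank(~ subject + cnd + grp, coldata)

# Coding individuals within groups (ind.n) lets grp appear without redundancy
is_full_rank(~ grp + grp:ind.n + cnd, coldata)
```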

Question 4: which approach, if any, is correct?

Thank you for the help,

Phil

deseq2

written 6.4 years ago by jpkarl • updated 6.3 years ago by Michael Love

Answer · 2017-12-12

"Question 1 is why is there no cnd term"

This is a consequence of how design formulas and model.matrix() work in R. Without a condition term, group + group:condition produces a condition effect for each group. With a condition term, group + condition + group:condition produces a main effect for condition (the effect in the reference group) and then interaction terms for the non-reference group (how its condition effect differs from that of the reference group). Leaving out the condition term makes it easier to extract the condition effect in each group and to contrast them using the 'list' style of contrast.
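The two formulas describe the same model with different coefficients. The sketch below checks this with an ordinary least-squares fit (lm) on a simulated response. That is an assumption made for illustration: DESeq2 fits a negative binomial GLM with coefficients on the log2 scale, but because both designs span the same column space, an unpenalized fit gives identical fitted values and the same linear relationship between the two sets of coefficients. The layout (three volunteers per group, time points A to D) is hypothetical.

```r
set.seed(1)

# Hypothetical layout: 2 groups x 3 volunteers x 4 time points
coldata <- data.frame(
  grp   = factor(rep(c("X", "Y"), each = 12)),
  ind.n = factor(rep(rep(1:3, each = 4), times = 2)),
  cnd   = factor(rep(c("A", "B", "C", "D"), times = 6))
)
coldata$y <- rnorm(nrow(coldata))  # simulated response standing in for log expression

# Without a condition term: one B-vs-A effect per group
fit1 <- lm(y ~ grp + grp:ind.n + grp:cnd, data = coldata)
# With a condition term: B-vs-A in the reference group X, plus the Y-minus-X difference
fit2 <- lm(y ~ grp + cnd + grp:ind.n + grp:cnd, data = coldata)

b1 <- coef(fit1)
b2 <- coef(fit2)

# Identical fitted values: one model, two parameterizations
all.equal(fitted(fit1), fitted(fit2))
# cndB (with condition term) is the B-vs-A effect in group X
all.equal(unname(b2["cndB"]), unname(b1["grpX:cndB"]))
# grpY:cndB (with condition term) is the difference of the per-group effects
all.equal(unname(b2["grpY:cndB"]), unname(b1["grpY:cndB"] - b1["grpX:cndB"]))
```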

"Question 2: is that approach correct?"

Not really. For this design (without the condition term), you can use contrast=list(..., ...) to compare the two condition effects and see whether they differ. If you want to use an LRT instead, you would use full = ~ group + condition + group:ind.n + group:condition and reduced = ~ group + condition + group:ind.n. The two approaches should give similar results, but they are slightly different statistical approaches, so the results will not be identical. I recommend the first approach in the vignette because it makes it simpler to extract multiple results tables.
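One way to see why the choice of reduced model matters is to count the degrees of freedom each LRT tests, i.e. the difference in rank between the full and reduced model matrices. In the base R sketch below (hypothetical layout: three volunteers per group, time points A to D; lrt_df() is a helper written for illustration), dropping grp:cnd from the design without a condition term removes every time effect, so that LRT asks whether counts change over time in either group. The full/reduced pair from the answer drops only the group-by-time interaction. In DESeq2, such a pair is passed as DESeq(dds, test = "LRT", reduced = ...) after setting the full design.

```r
coldata <- data.frame(
  grp   = factor(rep(c("X", "Y"), each = 12)),
  ind.n = factor(rep(rep(1:3, each = 4), times = 2)),
  cnd   = factor(rep(c("A", "B", "C", "D"), times = 6))
)

# Hypothetical helper: degrees of freedom of an LRT = rank(full) - rank(reduced)
lrt_df <- function(full, reduced, data) {
  rank_of <- function(f) qr(model.matrix(f, data = data))$rank
  rank_of(full) - rank_of(reduced)
}

# Reduced model from the question: drops all time effects (6 df)
df_question <- lrt_df(~ grp + grp:ind.n + grp:cnd, ~ grp + grp:ind.n, coldata)

# Full/reduced pair from the answer: drops only the interaction (3 df)
df_answer <- lrt_df(~ grp + cnd + grp:ind.n + grp:cnd, ~ grp + cnd + grp:ind.n, coldata)

c(question = df_question, answer = df_answer)
```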

"Question 3 is how do I interpret those results?"

Using the design without the condition term, you can pull out the condition effect for each group using 'name' in results(), and you can contrast the two effects using a 'list' style of contrast. You may want to meet or consult with a local statistician though, if the coefficients are not recognizable, or the interpretation is not clear after reading over the relevant vignette sections. I think this is always a good idea, as you learn much more from a face-to-face interaction.
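For reference, in the design without a condition term each per-group coefficient is a log2 fold change over the reference time point A; the volunteer-specific terms cancel because every volunteer is measured at both time points. The 'list' contrast is then the difference of the two fold changes, i.e. the second reading offered in Question 3. Here mu_{g,i,c} is notation introduced for this sketch: the expected normalized count of a gene for volunteer i of group g at time point c (the ratios do not depend on i).

```latex
\begin{aligned}
\text{grpX.cndB} &= \log_2 \frac{\mu_{X,i,B}}{\mu_{X,i,A}}, \qquad
\text{grpY.cndB} = \log_2 \frac{\mu_{Y,i,B}}{\mu_{Y,i,A}} \\[4pt]
\text{grpY.cndB} - \text{grpX.cndB} &= \log_2 \frac{\mu_{Y,i,B}}{\mu_{Y,i,A}} - \log_2 \frac{\mu_{X,i,B}}{\mu_{X,i,A}}
\end{aligned}
```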