Any help would be much appreciated. I'm having trouble understanding how to compare all facets of my data.
I am analysing samples from two groups of patients ("A" or "B"). All patients are treated identically but only some respond positively to treatment ("P"). Samples are taken "before" and "after" treatment. The goal is to determine which genes are differentially expressed after treatment compared with before, and to assess whether group, response, or both affect this change in gene expression. I have searched these question pages for a few days now but cannot find a comparable example.
In essence, the following data:
L = 6 # no of patients, 18 in the real data set
df <- data.frame("sample" = rep(paste("sample_", 1:L, sep=""),2),
                 "time" = rep(c("before","after"),1,each=L),
                 "group" = rep(c("A", "B")[round(runif(L,1,2))],2),
                 "response" = rep(c("P", "N")[round(runif(L,1,2))],2)
)
> df
     sample   time group response
1  sample_1 before     B        P
2  sample_2 before     B        N
3  sample_3 before     A        N
4  sample_4 before     B        P
5  sample_5 before     B        P
6  sample_6 before     A        P
7  sample_1  after     B        P
8  sample_2  after     B        N
9  sample_3  after     A        N
10 sample_4  after     B        P
11 sample_5  after     B        P
12 sample_6  after     A        P
The base factor levels are "before", "A" and "N" (in df[2:4]) after re-levelling.
Previously I combined the factors into single grouping variables and performed contrasts between the resulting groups:
df$c1 <- factor(paste(df$time, df$group, sep="."))
df$c2 <- factor(paste(df$time, df$response, sep="."))
df$c3 <- factor(paste(df$time, df$group, df$response, sep="."))
However, I may have lost power with this approach as some of the subsets only contain a few samples (there are only 18 patients shared between the two time-points). I have resorted to including interaction terms in the design but find it hard to interpret the results with three factors. Am I using the correct design?
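The concern about thin subsets can be checked directly by counting samples per combined level. A minimal sketch in base R, assuming the toy metadata printed above (the group and response vectors are typed in by hand instead of being drawn with runif, so the counts are reproducible; `cell_counts` is an illustrative name):

```r
# Toy metadata copied from the printed data frame (6 patients, 2 time points).
group    <- c("B", "B", "A", "B", "B", "A")
response <- c("P", "N", "N", "P", "P", "P")
df <- data.frame(
  sample   = rep(paste0("sample_", 1:6), 2),
  time     = rep(c("before", "after"), each = 6),
  group    = rep(group, 2),
  response = rep(response, 2)
)

# Number of samples in each time.group.response cell (the c3 factor).
cell_counts <- table(paste(df$time, df$group, df$response, sep = "."))
print(cell_counts)
```

In this toy data some cells hold a single sample, which is the situation where pairwise contrasts between combined levels have little to work with.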
The current DESeq design:
design(dds) <- formula(~ response + group + time + group:time + response:time)
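To see which coefficients this formula estimates, it can be expanded with base R's model.matrix. This is only a sketch on the toy metadata (typed in by hand, with the reference levels "before", "A" and "N"); it does not run DESeq2, which reports the same terms under slightly different names, such as "time_after_vs_before" and "groupB.timeafter".

```r
# Toy metadata with explicit reference levels: before, A, N.
df <- data.frame(
  time     = factor(rep(c("before", "after"), each = 6),
                    levels = c("before", "after")),
  group    = factor(rep(c("B", "B", "A", "B", "B", "A"), 2),
                    levels = c("A", "B")),
  response = factor(rep(c("P", "N", "N", "P", "P", "P"), 2),
                    levels = c("N", "P"))
)

# Expand the design formula into its model matrix and list the coefficients.
mm <- model.matrix(~ response + group + time + group:time + response:time,
                   data = df)
coef_names <- colnames(mm)
print(coef_names)
```

Each column is one coefficient: an intercept, three main effects, and the two interactions with time.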
I am interested in several questions but am unsure how each one is answered with this design. Are any of these approaches correct? If not, how should I formulate them?
1) Which genes are DE after treatment?
Given the other base factors, is this actually asking "Which genes are differentially expressed after treatment in response N, group A samples?"?
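One way to check this reading without running DESeq2 is to build the design row for every combination of factor levels and subtract the "before" row from the "after" row. The sketch below uses base R only and assumes the default treatment-contrast coding with reference levels "before", "A" and "N"; `grid`, `mm` and `time_effect` are illustrative names, not part of any package.

```r
fml <- ~ response + group + time + group:time + response:time

# One row per combination of response, group and time.
grid <- expand.grid(
  response = factor(c("N", "P"), levels = c("N", "P")),
  group    = factor(c("A", "B"), levels = c("A", "B")),
  time     = factor(c("before", "after"), levels = c("before", "after"))
)
mm <- model.matrix(fml, data = grid)
rownames(mm) <- paste(grid$time, grid$group, grid$response, sep = ".")

# Coefficients contributing to "after minus before" within one cell,
# where cell is "<group>.<response>", e.g. "A.N".
time_effect <- function(cell) {
  mm[paste0("after.", cell), ] - mm[paste0("before.", cell), ]
}

print(time_effect("A.N"))  # reference group and reference response
print(time_effect("B.N"))  # group B, reference response
```

Under this coding only the `timeafter` coefficient enters the after-versus-before difference for the group A, response N cell; the other cells pick up interaction coefficients as well.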
2) Which genes are DE after treatment in P vs N samples, ignoring groupings?
Or, does this refer to group A only?
3) Which genes are DE after treatment in response P vs N, in group B vs A?
results(dds, contrast = list(c(
"time_after_vs_before", "responseP.timeafter", "groupB.timeafter")))
Will this give the genes that are DE between "before" and "after" treatment, in Positive vs Negative patients, in group B against group A? I.e. the main treatment effect + the additional effect of positive-responding patients + the additional effect of group B?
In addition to the treatment only effect, is this the DE result of only the positive response effect?
Likewise, does this show only the additional effect of group B on the treatment?
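The coefficients behind these three questions can be laid out with design rows in base R. The sketch below is not a DESeq2 run; it assumes treatment contrasts with reference levels "before", "A" and "N", and all object names are illustrative. It shows which coefficients make up the after-versus-before change for a group B, response P patient, and which remain when the change in an N patient is subtracted from the change in a P patient.

```r
fml <- ~ response + group + time + group:time + response:time

# One design row per combination of factor levels.
grid <- expand.grid(
  response = factor(c("N", "P"), levels = c("N", "P")),
  group    = factor(c("A", "B"), levels = c("A", "B")),
  time     = factor(c("before", "after"), levels = c("before", "after"))
)
mm <- model.matrix(fml, data = grid)
rownames(mm) <- paste(grid$time, grid$group, grid$response, sep = ".")

# "after minus before" within one "<group>.<response>" cell.
change <- function(cell) {
  mm[paste0("after.", cell), ] - mm[paste0("before.", cell), ]
}

bp          <- change("B.P")                  # change in group B, response P
p_vs_n_in_A <- change("A.P") - change("A.N")  # P vs N change, group A
p_vs_n_in_B <- change("B.P") - change("B.N")  # P vs N change, group B

print(names(bp)[bp != 0])
print(names(p_vs_n_in_A)[p_vs_n_in_A != 0])
print(all(p_vs_n_in_A == p_vs_n_in_B))
```

With this formula the P-versus-N difference in the treatment change reduces to the single `responseP:timeafter` coefficient in both groups, because the design contains no three-way term that would let it differ between A and B.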
4) Would this design answer any questions about the additional effect of group B upon the contrast of P vs N when not comparing "after" to "before" or do I need to add to the design with "+ response:group" to achieve this?
results(dds, contrast = list(c(
Apologies if this is confusing.
Thanks for the help,