Understanding the design matrix
@nitesh-kumar-singh-10653

Hi,

I always get confused by the design matrix, and even when I figure out a solution I am not very confident in it, so I thought I would ask for help.

I have 3 experimental groups (Control, Wild and Knockout) and 2 time points (2 hr and 4 hr), with 3 replicates for each combination.

I am not sure which of the following designs and contrasts is best suited to getting DEGs for the scenarios noted in the comments.

Case 1:

dds <- DESeqDataSet(se, design = ~Time + Experiment + Time*Experiment)
dds <- dds[rowSums(counts(dds))>1,]
dds$Experiment <- relevel(dds$Experiment, ref="Control")
dds <- DESeq(dds, parallel = TRUE)

resultsNames(dds)

[1] "Intercept"             "Time_4_vs_2"           "Experiment_Wild_vs_Control"  "Experiment_Knockout_vs_Control" "Time4.ExperimentWild"     "Time4.ExperimentKnockout"

res.05 <- results(dds,c(0,0,-1,1,-1,1), alpha=0.05)     #For knockout vs wild (both 2hr and 4hr)

res.05 <- results(dds,c(0,0,0,1,0,1), alpha=0.05)     #For knockout vs control (both 2hr and 4hr)

res.05 <- results(dds,c(0,0,1,0,0,0), alpha=0.05)     #For wild vs control 2hr

res.05 <- results(dds,c(0,0,0,-1,0,1), alpha=0.05)     #For knockout 4h vs knockout 2hr

Case 2:

dds <- DESeqDataSet(se, design = ~Time + Experiment)
dds <- dds[rowSums(counts(dds))>1,]
dds$Experiment <- relevel(dds$Experiment, ref="Control")
dds <- DESeq(dds, parallel = TRUE)

resultsNames(dds)

[1] "Intercept"     "Time2"         "Time4"         "ExperimentControl" "ExperimentWild"   "ExperimentKnockout"

res.05 <- results(dds,c(0,0,0,0,-1,1), alpha=0.05)     #For knockout vs wild (both 2hr and 4hr)

res.05 <- results(dds,c(0,0,0,-1,0,1), alpha=0.05)     #For knockout vs control (both 2hr and 4hr)

res.05 <- results(dds,c(0,1,0,-1,1,0), alpha=0.05)     #For wild vs control 2hr

res.05 <- results(dds,c(0,-1,1,0,0,1), alpha=0.05)     #For knockout 4h vs knockout 2hr

Can anyone explain which case/contrast to use for the above scenarios? A little explanation would be very much appreciated.

Thanks

Nitesh

deseq2 design and contrast matrix
@mikelove
What do you mean by this exactly: "For knockout vs wild (both 2hr and 4hr)"? Can you describe what kind of null and alternative you want to be testing?

Hi Michael,

By "knockout vs wild (both 2hr and 4hr)" I meant testing for DEGs between knockout and wild while considering both time points (2hr and 4hr), so it is the overall change in expression pattern between experiments. The null hypothesis would be that there is no significant difference in the expression pattern of genes between knockout and wild, considering both time points, so it takes into account all the samples I have. Does this make sense?

Thanks

Nitesh

So a change in only one time point or in both would count?

A change at both time points should count. Is it the same if I take the union of the DEGs between knockout and wild at 2hr and at 4hr? I thought doing it in a single test would have more statistical power.


(1) Requiring DE at both time points has less power than (2) DE at either time point or both. Can you say which of 1 or 2 you are interested in?

Hi, I am interested in the first one, DE at both time points. But can you explain how I can obtain DE for both of the above cases, for learning purposes? Also, is there any link that would help me understand what I want to do?
@mikelove

I'm starting a new thread to have more space. All of your contrasts of interest except for the "both 2hr and 4hr" ones are pairwise comparisons between groups, if you define the groups as unique combinations of the factors. So I would recommend you follow the recommendation in the vignette about creating such a variable:

dds$group <- factor(paste0(dds$time, dds$experiment))
design(dds) <- ~ group

And then use contrast=c("group","2hrknockout","2hrwild"), etc. There isn't a way to express with a contrast the requirement that something be DE both at 2 hours and at 4 hours, but you can look at the genes which are in the intersection of the DE lists from both pairwise contrasts.
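As a rough illustration of the intersection approach, here is a minimal sketch. It runs on simulated counts from makeExampleDESeqDataSet rather than the real data, and the column names (Time, Experiment), the "hr" separator in the group labels and the 0.05 cutoff are assumptions taken from the question, not something DESeq2 requires.

```r
library(DESeq2)

# Simulated stand-in for the real data:
# 18 samples = 2 time points x 3 experiments x 3 replicates.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 18)
dds$Time <- factor(rep(c("2", "4"), each = 9))
dds$Experiment <- factor(
  rep(rep(c("Control", "Wild", "Knockout"), each = 3), times = 2)
)

# One level per Time x Experiment combination, e.g. "2hrKnockout"
# (the label format is an assumption made for this sketch).
dds$group <- factor(paste0(dds$Time, "hr", dds$Experiment))
design(dds) <- ~ group
dds <- DESeq(dds)

alpha <- 0.05
res2 <- results(dds, contrast = c("group", "2hrKnockout", "2hrWild"), alpha = alpha)
res4 <- results(dds, contrast = c("group", "4hrKnockout", "4hrWild"), alpha = alpha)

# which() drops genes whose adjusted p-value is NA.
sig2 <- rownames(res2)[which(res2$padj < alpha)]
sig4 <- rownames(res4)[which(res4$padj < alpha)]

# Genes called DE for knockout vs wild at both time points.
both <- intersect(sig2, sig4)
length(both)
```

The simulated counts carry no real group differences, so `both` will usually be empty or nearly so here; with real data, the same steps would be applied to your own dds in place of the simulated one.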

The way to specify DE at either 2 hours or 4 hours, or both, would be to form a full and a reduced model matrix, and use a likelihood ratio test. The reduced model matrix would remove the column corresponding to the main effect at time 2hr as well as the interaction effect.
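One possible way to build those two matrices is sketched below with base R's model.matrix. It assumes Wild is set as the reference level of Experiment, so that the "ExperimentKnockout" column is the knockout vs wild difference at 2hr and "Time4:ExperimentKnockout" is the extra difference at 4hr. The sample ordering and the simulated counts are also assumptions made only so the sketch runs.

```r
# Sample table mirroring the layout in the question (assumed sample order).
coldata <- data.frame(
  Time = factor(rep(c("2", "4"), each = 9)),
  Experiment = factor(
    rep(rep(c("Control", "Wild", "Knockout"), each = 3), times = 2),
    levels = c("Wild", "Control", "Knockout")  # Wild as reference (assumption)
  )
)

# Full model: main effects plus the Time x Experiment interaction.
full <- model.matrix(~ Time + Experiment + Time:Experiment, data = coldata)

# Reduced model: drop the knockout vs wild main effect (difference at 2hr)
# and the matching interaction column (additional difference at 4hr).
drop_cols <- c("ExperimentKnockout", "Time4:ExperimentKnockout")
reduced <- full[, !(colnames(full) %in% drop_cols)]

colnames(full)
colnames(reduced)

# The likelihood ratio test itself, run on simulated counts if DESeq2 is installed.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  set.seed(1)
  dds <- makeExampleDESeqDataSet(n = 500, m = 18)
  dds <- DESeq(dds, test = "LRT", full = full, reduced = reduced)
  res_lrt <- results(dds, alpha = 0.05)
}
```

A small adjusted p-value from this test says the knockout differs from wild at 2hr, at 4hr, or at both. The rows of the matrices need to be in the same order as the columns of the count matrix.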
