Hello,
I have a question about the validity of using the GLM strategy in edgeR if the design matrix is not symmetric, i.e. when the groups are unbalanced.
I am dealing with RNA-seq data from cultured T-cells from mice that are WT or KO'd for my gene of interest. In addition to the WT vs. KO variable, there are also "batch" effects with this data such that the profiles cluster by genotype (as expected), but also by litter of mice. Typically, I have 3 biological replicates for each genotype, all collected on different days (i.e. from different litters). To deal with this, I have been using the GLM method in edgeR with the following design matrix:
    day <- factor(c(1,2,3,1,2,3))
    condition <- factor(c("WT", "WT", "WT", "KO", "KO", "KO"), levels=c("WT", "KO"))
    design <- model.matrix(~day+condition)
The DE analysis with this method was more sensitive in detecting differences due to genotype than the "classic" exactTest method.
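For readers who want to see the blocked GLM analysis end to end, here is a minimal sketch. It assumes the edgeR package is installed, and the count matrix is simulated placeholder data (hypothetical, not from the experiment described), so the resulting gene list is meaningless. Only the design follows the post.

```r
library(edgeR)

set.seed(1)
# Hypothetical simulated counts: 1000 genes x 6 samples (stand-in for real data).
counts <- matrix(rnbinom(1000 * 6, mu = 100, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))

# Balanced design from the post: three litters (days), one WT and one KO each.
day <- factor(c(1, 2, 3, 1, 2, 3))
condition <- factor(c("WT", "WT", "WT", "KO", "KO", "KO"), levels = c("WT", "KO"))
design <- model.matrix(~ day + condition)

y <- DGEList(counts = counts)
y <- calcNormFactors(y)       # TMM normalization factors
y <- estimateDisp(y, design)  # dispersions estimated under the blocked design

fit <- glmFit(y, design)
# The last column of the design is conditionKO, i.e. KO vs WT adjusted for day.
lrt <- glmLRT(fit, coef = ncol(design))
top <- topTags(lrt)
print(top)
```

Testing the last coefficient gives the KO vs. WT comparison within litters, which is what makes this analysis differ from exactTest on the two genotype groups.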
My problem arises in a new experiment where one of the KO replicates failed, but the matched WT was fine. The corresponding design matrix is:
    day <- factor(c(1,2,3,2,3))
    condition <- factor(c("WT", "WT", "WT", "KO", "KO"), levels=c("WT", "KO"))
    design <- model.matrix(~day+condition)
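One quick sanity check on a design like this, sketched here with base R only, is whether the matrix has full column rank and how many residual degrees of freedom remain. The variable names `design_rank`, `is_full_rank` and `residual_df` are introduced for illustration. Full rank only shows that the coefficients can be estimated; it does not say how much each sample contributes to the genotype comparison.

```r
# Unbalanced design from the post: the day 1 KO sample is missing.
day <- factor(c(1, 2, 3, 2, 3))
condition <- factor(c("WT", "WT", "WT", "KO", "KO"), levels = c("WT", "KO"))
design <- model.matrix(~ day + condition)

# Full column rank means every coefficient, including conditionKO, is estimable.
design_rank <- qr(design)$rank
is_full_rank <- design_rank == ncol(design)

# Degrees of freedom left over for estimating the dispersion.
residual_df <- nrow(design) - design_rank

print(design)
print(is_full_rank)
print(residual_df)
```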
Is such a design valid?
Thanks very much for your time.
Michael
Hi Michael,
As a postscript, there is a way to recover at least partial information from the day1 WT sample, even when the day1 KO sample has been lost. This requires a random effect or correlation approach instead of a paired t-test type analysis. If the litter effect is not very strong, this can be a useful approach. To do such an analysis, you would need to switch to a voom-limma analysis pipeline and use the duplicateCorrelation function of the limma package.
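A minimal sketch of what such an analysis might look like is below. It assumes the limma, edgeR and statmod packages are installed (duplicateCorrelation relies on statmod). The counts are simulated placeholder data (hypothetical), so the output only illustrates the workflow. Day is treated as a random block rather than as a term in the design matrix.

```r
library(edgeR)
library(limma)

set.seed(1)
# Hypothetical simulated counts: 1000 genes x 5 samples (stand-in for real data).
counts <- matrix(rnbinom(1000 * 5, mu = 100, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:5)))

# Unbalanced layout from the question: the day 1 KO sample is missing.
day <- factor(c(1, 2, 3, 2, 3))
condition <- factor(c("WT", "WT", "WT", "KO", "KO"), levels = c("WT", "KO"))

# Only genotype goes in the design; day (litter) is handled as a random block.
design <- model.matrix(~ condition)

y <- DGEList(counts = counts)
y <- calcNormFactors(y)

# voom converts counts to log-CPM with precision weights.
v <- voom(y, design)

# Estimate the consensus correlation between samples from the same day.
corfit <- duplicateCorrelation(v, design, block = day)

fit <- lmFit(v, design, block = day,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)
print(topTable(fit, coef = "conditionKO"))
```

Because day is no longer a fixed effect, the day 1 WT sample is not used up estimating its own day coefficient and can contribute to the WT mean.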
Best wishes
Gordon