Question

GLM design matrix not symmetric

Michael Moore ▴ 10

Hello,

I have a question about the validity of using the GLM strategy in edgeR if the design matrix is not symmetric.

I am dealing with RNA-seq data from cultured T-cells from mice that are either WT or KO for my gene of interest. In addition to the WT vs. KO variable, there are "batch" effects in this data: the profiles cluster by genotype (as expected), but also by litter of mice. Typically, I have 3 biological replicates for each genotype, each collected on a different day (i.e. from a different litter). To deal with this, I have been using the GLM method in edgeR with the following design matrix:

day <- factor(c(1,2,3,1,2,3))
condition <- factor(c("WT", "WT", "WT", "KO", "KO", "KO"), levels=c("WT", "KO"))
design <- model.matrix(~day+condition)

The DE analysis with this method was more sensitive in detecting differences due to genotype than the "classic" exactTest method.

My problem arises in a new experiment where one of the KO replicates failed, but the matched WT was fine. The corresponding design matrix is:

day <- factor(c(1,2,3,2,3))
condition <- factor(c("WT", "WT", "WT", "KO", "KO"), levels=c("WT", "KO"))
design <- model.matrix(~day+condition)

Is such a design valid?

Thanks very much for your time.

Michael

edgeR • 1.3k views

updated 9.5 years ago by Gordon Smyth 50k • written 10.3 years ago by Michael Moore ▴ 10

score 1 · Answer 1 · 2014-01-25

Dear Michael,

Intuitively, I'm sure you will appreciate that, if you lose one member of a matched pair in a paired analysis, then it becomes impossible to make any comparisons using the remaining member of that pair.
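The same point can be read off the design matrix. The following is a minimal base R sketch (not part of edgeR) that checks the rank and residual degrees of freedom of the five-sample design and computes ordinary least-squares leverages. The variable name `leverage` is introduced here for illustration only, and the linear-model leverage is used just as a convenient way to show which sample is fitted exactly.

```r
# The five-sample design from the question: day 1 has a WT sample only
day <- factor(c(1, 2, 3, 2, 3))
condition <- factor(c("WT", "WT", "WT", "KO", "KO"), levels = c("WT", "KO"))
design <- model.matrix(~ day + condition)

# Full column rank: all four coefficients can be estimated
qr(design)$rank == ncol(design)

# Residual degrees of freedom: 5 samples minus 4 coefficients
nrow(design) - ncol(design)

# Leverage of each sample (diagonal of the least-squares hat matrix).
# A leverage of 1 means the fit passes exactly through that sample.
leverage <- diag(design %*% solve(crossprod(design)) %*% t(design))
round(leverage, 3)
```

The unmatched day 1 WT sample has leverage 1, so it is reproduced exactly by the fit and leaves nothing over for the KO vs WT comparison, while the four paired samples share the single residual degree of freedom.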

From a mathematical point of view, edgeR has no requirement for equal numbers of samples in the different genotype groups, so the analysis approach does the right thing and remains valid. In the simple example you give, the day 1 WT sample is the only sample from its litter, so the day effect for day 1 fits it exactly and it carries no information about the KO vs WT comparison. edgeR will in effect remove that sample from the analysis, so you will get identical KO vs WT DE results from either:

  day <- factor(c(1,2,3,2,3))
  condition <- factor(c("WT","WT","WT","KO","KO"),levels=c("WT","KO"))
  design <- model.matrix(~day+condition)

or

  day <- factor(c(2,3,2,3))
  condition <- factor(c("WT","WT","KO","KO"),levels=c("WT","KO"))
  design <- model.matrix(~day+condition)

with the day1 WT sample removed.
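As a rough illustration of why the two designs agree, here is a base R sketch that fits both to a single made-up gene. It uses a Poisson `glm()` as a stand-in for edgeR's negative binomial fit, so it only illustrates the coefficient estimate and does not cover normalization or dispersion estimation. The counts and the names `samples`, `paired`, `fit5` and `fit4` are invented for this example.

```r
# Made-up counts for one gene, in the order WT1, WT2, WT3, KO2, KO3
samples <- data.frame(
  y = c(50, 80, 120, 40, 70),
  day = factor(c(1, 2, 3, 2, 3)),
  condition = factor(c("WT", "WT", "WT", "KO", "KO"), levels = c("WT", "KO"))
)

# Fit with all five samples
fit5 <- glm(y ~ day + condition, data = samples, family = poisson())

# Fit with the unmatched day 1 WT sample removed
paired <- droplevels(samples[samples$day != "1", ])
fit4 <- glm(y ~ day + condition, data = paired, family = poisson())

# The KO vs WT log fold change is the same in both fits
coef(fit5)["conditionKO"]
coef(fit4)["conditionKO"]

# The day 1 WT sample is fitted exactly in the five-sample model
fitted(fit5)[1]
```

The two `conditionKO` estimates agree up to numerical tolerance, and the fitted value for the day 1 WT sample equals its observed count.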

Best wishes
Gordon