GLM design matrix not symmetric
@michael-moore-6356

Hello,

I have a question about the validity of using the GLM approach in edgeR when the design is not symmetric, i.e. when the genotype groups have unequal numbers of samples.

I am dealing with RNA-seq data from cultured T-cells from mice that are either WT or KO for my gene of interest. In addition to the WT vs. KO variable, there are also "batch" effects in these data: the profiles cluster by genotype (as expected), but also by litter. Typically, I have 3 biological replicates for each genotype, all collected on different days (i.e. from different litters). To deal with this, I have been using the GLM method in edgeR with the following design matrix:

day <- factor(c(1,2,3,1,2,3))
condition <- factor(c("WT", "WT", "WT", "KO", "KO", "KO"), levels=c("WT", "KO"))
design <- model.matrix(~day+condition)

The DE analysis with this method was more sensitive in detecting genotype differences than the "classic" exactTest method.

My problem arises in a new experiment where one of the KO replicates failed, but the matched WT was fine. The corresponding design matrix is:

day <- factor(c(1,2,3,2,3))
condition <- factor(c("WT", "WT", "WT", "KO", "KO"), levels=c("WT", "KO"))
design <- model.matrix(~day+condition)

Is such a design valid?

Thanks very much for your time.

Michael

@gordon-smyth
WEHI, Melbourne, Australia

Dear Michael,

Intuitively, I'm sure you will appreciate that, if you lose one member of a matched pair in a paired analysis, then it becomes impossible to make any comparisons using the remaining member of that pair.

From a mathematical point of view, edgeR has no requirement for equal numbers of samples in the different genotype groups, so the analysis remains valid. In the simple example you give, edgeR will in effect remove the first sample from the analysis: the day1 WT sample is the only sample from day 1, so the model fits it exactly and it carries no information about the KO vs WT comparison. So you will get identical KO vs WT DE results from either:

  day <- factor(c(1,2,3,2,3))
  condition <- factor(c("WT","WT","WT","KO","KO"),levels=c("WT","KO"))
  design <- model.matrix(~day+condition)

or

  day <- factor(c(2,3,2,3))
  condition <- factor(c("WT","WT","KO","KO"),levels=c("WT","KO"))
  design <- model.matrix(~day+condition)

with the day1 WT sample removed.
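A minimal base-R sketch of why the day1 WT sample drops out. It uses ordinary least squares (lm) on a single gene's simulated log-expression values as a stand-in for edgeR's negative binomial GLM, so it illustrates the design-matrix algebra rather than edgeR itself; the data frame and values are hypothetical, not Michael's data.

```r
# Hypothetical log-scale expression values for one gene (simulated, illustration only)
set.seed(1)
samples <- data.frame(
  day = factor(c(1, 2, 3, 2, 3)),
  condition = factor(c("WT", "WT", "WT", "KO", "KO"), levels = c("WT", "KO")),
  expr = rnorm(5, mean = 5)
)

# 5-sample fit: the unmatched day1 WT sample is included
fit5 <- lm(expr ~ day + condition, data = samples)

# 4-sample fit: the day1 WT sample is removed and the empty day level dropped
samples4 <- droplevels(subset(samples, day != "1"))
fit4 <- lm(expr ~ day + condition, data = samples4)

# The KO vs WT estimates agree
ko5 <- unname(coef(fit5)["conditionKO"])
ko4 <- unname(coef(fit4)["conditionKO"])
print(c(five_samples = ko5, four_samples = ko4))

# The day1 WT sample has leverage 1: it is fitted exactly (residual ~ 0),
# so it adds nothing to the genotype estimate or to the residual degrees of freedom
print(hatvalues(fit5)[1])
print(residuals(fit5)[1])
print(c(df5 = fit5$df.residual, df4 = fit4$df.residual))
```

Because the day1 baseline is informed by that single sample alone, the remaining four samples form the same balanced 2 x 2 layout in both fits, which is why the genotype coefficient and the residuals of the other samples coincide.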

Best wishes
Gordon


Hi Michael,

As a postscript, there is a way to recover at least partial information from the day1 WT sample, even when the day1 KO sample has been lost. This requires a random effect or correlation approach instead of a paired t-test type analysis. If the litter effect is not very strong, this can be a useful approach. To do such an analysis, you would need to switch to a voom-limma analysis pipeline and use the duplicateCorrelation function of the limma package.
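A minimal sketch of such a voom/duplicateCorrelation analysis, assuming the limma package (Bioconductor) is installed, together with statmod, which duplicateCorrelation relies on. The counts and gene/sample names are simulated and hypothetical; with real data you would normally start from a filtered, normalised DGEList. The key change is that litter (day) leaves the design matrix and becomes the block argument, so the day1 WT sample contributes to the WT group, discounted according to the estimated within-litter correlation.

```r
library(limma)

# Hypothetical simulated counts: 2000 genes x 5 samples (illustration only)
set.seed(1)
counts <- matrix(rnbinom(2000 * 5, mu = 100, size = 10), nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000), paste0("s", 1:5)))

day <- factor(c(1, 2, 3, 2, 3))
condition <- factor(c("WT", "WT", "WT", "KO", "KO"), levels = c("WT", "KO"))

# Litter is no longer a fixed effect in the design; it is treated as a block instead
design <- model.matrix(~ condition)

# First pass: voom precision weights, then estimate the within-litter correlation
v <- voom(counts, design)
corfit <- duplicateCorrelation(v, design, block = day)

# Second pass: recompute the voom weights allowing for the correlation, then re-estimate it
v <- voom(counts, design, block = day, correlation = corfit$consensus.correlation)
corfit <- duplicateCorrelation(v, design, block = day)

# Linear model with litter as a correlated block, then moderated t-statistics
fit <- lmFit(v, design, block = day, correlation = corfit$consensus.correlation)
fit <- eBayes(fit)
tt <- topTable(fit, coef = "conditionKO", number = Inf)
head(tt)
```

The consensus correlation is a rough gauge of how strong the litter effect is: the closer it is to 1, the closer this analysis gets to the paired analysis, and the less the unmatched day1 WT sample adds.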

Best wishes
Gordon
