Question

edgeR paired design + batch effect

assaf www ▴ 140

@assaf-www-6709

Hi Gordon and all,

I'm analysing two experiments, 'experiment1' and 'experiment2', whose samples cluster separately in PCA.
In each experiment, two individuals (1, 2) were tested repeatedly under 4 different conditions: 'A24', 'C24', 'A30', 'C30'.
But please note: the two individuals tested in the first experiment are NOT the same individuals tested in the second experiment.

So, in summary:

experiment    group    individual
experiment1    'A24'    1
experiment1    'C24'    1
experiment1    'A30'    1
experiment1    'C30'    1
experiment1    'A24'    2
experiment1    'C24'    2
experiment1    'A30'    2
experiment1    'C30'    2

experiment2    'A24'    1
experiment2    'C24'    1
experiment2    'A30'    1
experiment2    'C30'    1
experiment2    'A24'    2
experiment2    'C24'    2
experiment2    'A30'    2
experiment2    'C30'    2

I want to test how 'group' affects expression, specifically to compare 'A24' to 'C24', and 'A30' to 'C30'.
So, based on the above, is the following design matrix correct?

design <- model.matrix(~group+experiment+experiment:individual, data=y$samples)

Below is what the design matrix looks like.
Many thanks, Assaf

> design
      (Intercept) groupA30 groupC24 groupC30 experiment2
xA24B           1        0        0        0           0
xA24C           1        0        0        0           0
xC24B           1        0        1        0           0
xC24C           1        0        1        0           0
xA30B           1        1        0        0           0
xA30C           1        1        0        0           0
xC30B           1        0        0        1           0
xC30C           1        0        0        1           0
yA24A           1        0        0        0           1
yA24B           1        0        0        0           1
yC24A           1        0        1        0           1
yC24B           1        0        1        0           1
yA30A           1        1        0        0           1
yA30B           1        1        0        0           1
yC30A           1        0        0        1           1
yC30B           1        0        0        1           1
      experiment1:individual2 experiment2:individual2
xA24B                       0                       0
xA24C                       1                       0
xC24B                       0                       0
xC24C                       1                       0
xA30B                       0                       0
xA30C                       1                       0
xC30B                       0                       0
xC30C                       1                       0
yA24A                       0                       0
yA24B                       0                       1
yC24A                       0                       0
yC24B                       0                       1
yA30A                       0                       0
yA30B                       0                       1
yC30A                       0                       0
yC30B                       0                       1

updated 8.8 years ago by Aaron Lun ★ 28k • written 8.8 years ago by assaf www ▴ 140

score 1 · Answer 1 · 2015-07-06

Your design is basically correct. However, for what it's worth, the parametrization of the model could be simplified. Consider this alternative:

> individual <- factor(paste0(sample_data$experiment, ".", sample_data$individual))
> group <- factor(sample_data$group)
> design2 <- model.matrix(~ 0 + group + individual)
> colnames(design2) <- c(levels(group), levels(individual)[-1])
> colnames(design2)
[1] "A24"           "A30"           "C24"           "C30"
[5] "experiment1.2" "experiment2.1" "experiment2.2"

In this design, the first four columns represent the average log-expression of each group, while the last three columns represent the individual-specific effect (individual 1 from experiment 1 is treated as the baseline, such that each of the other three individuals has his/her own column). This set-up is fairly intuitive, as each coefficient in the design has a clearly defined purpose in the model.
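To make the column layout concrete, here is a minimal base-R sketch that rebuilds the sample table from the question and sets up the two comparisons of interest as contrast vectors. The `sample_data` table is reconstructed by hand from the layout in the question, and the helper `make_contrast` and the sign convention (C minus A) are assumptions for illustration, not part of the original analysis.

```r
# Sample table rebuilt by hand from the question (an assumption about
# what the real sample annotation looks like).
sample_data <- data.frame(
  experiment = rep(c("experiment1", "experiment2"), each = 8),
  individual = rep(rep(c(1, 2), each = 4), times = 2),
  group      = rep(c("A24", "C24", "A30", "C30"), times = 4)
)

# One blocking factor with four levels: one per real individual.
individual <- factor(paste0(sample_data$experiment, ".", sample_data$individual))
group <- factor(sample_data$group)

design2 <- model.matrix(~ 0 + group + individual)
colnames(design2) <- c(levels(group), levels(individual)[-1])

# Hypothetical helper: a contrast vector with +1 and -1 on two named columns.
make_contrast <- function(plus, minus, cols) {
  v <- setNames(numeric(length(cols)), cols)
  v[plus] <- 1
  v[minus] <- -1
  v
}

# C24 vs A24 and C30 vs A30 (direction is an assumption; flip the sign
# if the opposite comparison is wanted).
con_24 <- make_contrast("C24", "A24", colnames(design2))
con_30 <- make_contrast("C30", "A30", colnames(design2))

print(colnames(design2))
print(rbind(con_24, con_30))
```

Vectors like these are what one would typically hand to the `contrast` argument of edgeR's testing functions after fitting the model with `design2`. The individual columns get a zero in both contrasts because the blocking effect cancels out in a within-individual comparison.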

The fitted model with design2 will give the same results as your original design, but it does simplify some things. In particular, it avoids construction and interpretation of interaction terms, which I find somewhat annoying. It also clarifies to the reader that individuals 1 and 2 from experiment 1 are not the same as individuals 1 and 2 from experiment 2.
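For anyone who wants to convince themselves of the equivalence, the following base-R sketch checks that the two design matrices span the same column space, and that the C24 vs A24 comparison comes out identical under both parametrizations. It uses an ordinary least-squares fit on simulated numbers purely as a stand-in (edgeR fits a negative binomial GLM, but the fitted values there also depend only on the column space of the design). The `samples` table, the `blocked` column and the simulated response `y_sim` are assumptions made for illustration.

```r
# Sample table rebuilt by hand from the question (an assumption).
samples <- data.frame(
  experiment = factor(rep(c("experiment1", "experiment2"), each = 8)),
  individual = factor(rep(rep(c(1, 2), each = 4), times = 2)),
  group      = factor(rep(c("A24", "C24", "A30", "C30"), times = 4))
)
# One blocking factor with a level per real individual.
samples$blocked <- factor(paste0(samples$experiment, ".", samples$individual))

# Original parametrization from the question, and the simplified one.
design1 <- model.matrix(~ group + experiment + experiment:individual, data = samples)
design2 <- model.matrix(~ 0 + group + blocked, data = samples)

# Same column space: stacking the two matrices side by side adds no rank.
rank_of <- function(m) qr(m)$rank
ranks <- c(design1 = rank_of(design1),
           design2 = rank_of(design2),
           both    = rank_of(cbind(design1, design2)))
print(ranks)

# Stand-in linear fit on simulated data: fitted values should agree.
set.seed(1)
y_sim <- rnorm(nrow(samples))
fit1 <- lm.fit(design1, y_sim)
fit2 <- lm.fit(design2, y_sim)
print(all.equal(unname(fit1$fitted.values), unname(fit2$fitted.values)))

# C24 vs A24: a single coefficient in design1 (A24 is the reference level),
# a difference of two group coefficients in design2.
c24_vs_a24_1 <- unname(fit1$coefficients["groupC24"])
c24_vs_a24_2 <- unname(fit2$coefficients["groupC24"] - fit2$coefficients["groupA24"])
print(all.equal(c24_vs_a24_1, c24_vs_a24_2))
```

Note that under the original design, C30 vs A30 is not a single coefficient either: it is `groupC30 - groupA30`, so a contrast is needed for that comparison in both parametrizations.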