edgeR paired design + batch effect
assaf www ▴ 140

Hi Gordon and all,

I'm analysing two experiments, 'experiment1' and 'experiment2', whose samples cluster separately by experiment in a PCA.
In each experiment, two individuals (1,2) were tested repeatedly in 4 different conditions: 'A24','C24','A30','C30'.
But please note: the two individuals tested in the first experiment are NOT the same individuals tested in the second experiment.

so, in summary:

experiment    group    individual
experiment1    'A24'    1
experiment1    'C24'    1
experiment1    'A30'    1
experiment1    'C30'    1
experiment1    'A24'    2
experiment1    'C24'    2
experiment1    'A30'    2
experiment1    'C30'    2

experiment2    'A24'    1
experiment2    'C24'    1
experiment2    'A30'    1
experiment2    'C30'    1
experiment2    'A24'    2
experiment2    'C24'    2
experiment2    'A30'    2
experiment2    'C30'    2

I want to test how 'group' affects expression, specifically by comparing 'A24' to 'C24' and 'A30' to 'C30'.
So, based on the above, is the following design matrix correct?

design <- model.matrix(~group+experiment+experiment:individual, data=y$samples)


Below is what the design matrix looks like.
Many thanks, Assaf

 

 

> design
      (Intercept) groupA30 groupC24 groupC30 experiment2
xA24B           1        0        0        0           0
xA24C           1        0        0        0           0
xC24B           1        0        1        0           0
xC24C           1        0        1        0           0
xA30B           1        1        0        0           0
xA30C           1        1        0        0           0
xC30B           1        0        0        1           0
xC30C           1        0        0        1           0
yA24A           1        0        0        0           1
yA24B           1        0        0        0           1
yC24A           1        0        1        0           1
yC24B           1        0        1        0           1
yA30A           1        1        0        0           1
yA30B           1        1        0        0           1
yC30A           1        0        0        1           1
yC30B           1        0        0        1           1
      experiment1:individual2 experiment2:individual2
xA24B                       0                       0
xA24C                       1                       0
xC24B                       0                       0
xC24C                       1                       0
xA30B                       0                       0
xA30C                       1                       0
xC30B                       0                       0
xC30C                       1                       0
yA24A                       0                       0
yA24B                       0                       1
yC24A                       0                       0
yC24B                       0                       1
yA30A                       0                       0
yA30B                       0                       1
yC30A                       0                       0
yC30B                       0                       1

 

Aaron Lun ★ 28k

Your design is basically correct. However, for what it's worth, the parametrization of the model could be simplified. Consider this alternative:

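# sample_data is assumed to be the per-sample table (e.g. y$samples in the question).
# Give every individual a unique label across experiments.
individual <- factor(paste0(sample_data$experiment, ".", sample_data$individual))
group <- factor(sample_data$group)
# No intercept: one column per group, plus one column per non-baseline individual.
design2 <- model.matrix(~ 0 + group + individual)
colnames(design2) <- c(levels(group), levels(individual)[-1])
colnames(design2)
# [1] "A24"           "A30"           "C24"           "C30"
# [5] "experiment1.2" "experiment2.1" "experiment2.2"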

In this design, the first four columns represent the average log-expression of each group, while the last three columns represent the individual-specific effect (individual 1 from experiment 1 is treated as the baseline, such that each of the other three individuals has his/her own column). This set-up is fairly intuitive, as each coefficient in the design has a clearly defined purpose in the model.
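Because the first four columns are per-group averages, the two comparisons asked about in the question become simple differences between columns. The sketch below is illustrative only: the `sample_data` table is a hypothetical reconstruction of the layout in the question, `make_contrast` is a helper invented here, and the direction of each comparison (A minus C) is an assumption. It uses base R only; the edgeR calls in the trailing comments are not run.

```r
# Hypothetical sample table mirroring the layout described in the question.
sample_data <- data.frame(
  experiment = rep(c("experiment1", "experiment2"), each = 8),
  group      = rep(c("A24", "C24", "A30", "C30"), times = 4),
  individual = rep(rep(c(1, 2), each = 4), times = 2)
)

individual <- factor(paste0(sample_data$experiment, ".", sample_data$individual))
group <- factor(sample_data$group)
design2 <- model.matrix(~ 0 + group + individual)
colnames(design2) <- c(levels(group), levels(individual)[-1])

# Hypothetical helper: +1 on one column, -1 on another, 0 elsewhere.
make_contrast <- function(plus, minus, design) {
  con <- setNames(numeric(ncol(design)), colnames(design))
  con[plus] <- 1
  con[minus] <- -1
  con
}

con_24 <- make_contrast("A24", "C24", design2)  # A24 minus C24 (assumed direction)
con_30 <- make_contrast("A30", "C30", design2)  # A30 minus C30 (assumed direction)
print(rbind(A24vsC24 = con_24, A30vsC30 = con_30))

# With a DGEList `y` and edgeR loaded (not run here), each vector could be used as:
#   fit <- glmQLFit(y, design2)
#   res <- glmQLFTest(fit, contrast = con_24)
```

The individual columns get a zero in both contrasts, since the individual effects cancel out when two groups measured on the same individuals are compared.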

The fitted model with design2 will give the same results as your original design, but it does simplify some things. In particular, it avoids construction and interpretation of interaction terms, which I find somewhat annoying. It also clarifies to the reader that individuals 1 and 2 from experiment 1 are not the same as individuals 1 and 2 from experiment 2.
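One way to convince yourself that the two parametrizations are equivalent is to check that both design matrices span the same column space. The following is a minimal base R sketch, again using a hypothetical `sample_data` table rebuilt from the layout in the question (with `individual` stored as a factor, which is an assumption about the original data).

```r
# Hypothetical sample table; all three columns are stored as factors.
sample_data <- data.frame(
  experiment = factor(rep(c("experiment1", "experiment2"), each = 8)),
  group      = factor(rep(c("A24", "C24", "A30", "C30"), times = 4)),
  individual = factor(rep(rep(c(1, 2), each = 4), times = 2))
)

# Original parametrization from the question.
design1 <- model.matrix(~ group + experiment + experiment:individual,
                        data = sample_data)

# Alternative parametrization with one label per unique individual.
individual <- factor(paste0(sample_data$experiment, ".", sample_data$individual))
group <- factor(sample_data$group)
design2 <- model.matrix(~ 0 + group + individual)

# If stacking the two matrices side by side does not raise the rank,
# every column of one design is a linear combination of the other's columns.
rank1 <- qr(design1)$rank
rank2 <- qr(design2)$rank
rank_both <- qr(cbind(design1, design2))$rank
print(c(design1 = rank1, design2 = rank2, combined = rank_both))
```

If all three ranks agree, the designs describe the same set of fitted values, so any comparison between groups yields the same test in either model; only the meaning of the individual coefficients differs.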
