Question: Difference between two design matrices: how to choose?
kippec0 wrote:

Dear All,

I have an experiment in which the main factor of interest has four levels: Early, Mid, Late and Last (exposure to the treatment).

I would like to know which genes are significantly differentially expressed between the (Late + Last) group and the (Early + Mid) group.

I chose two different ways to build the design matrix (see below), and they gave very different results.

I'm not sure which one is better to use, so I would be very grateful for any advice about these two designs.

Best

--

Kippec Taman

library(limma)
design <- model.matrix(~ 0 + factor(target$Status))
colnames(design) <- levels(factor(target$Status))

   Early Mid Late Last
1      1   0    0    0
2      1   0    0    0
3      1   0    0    0
4      0   1    0    0
5      0   1    0    0
6      0   1    0    0
7      0   0    1    0
8      0   0    1    0
9      0   0    1    0
10     0   0    0    1
11     0   0    0    1
12     0   0    0    1

# X: the expression matrix (genes in rows, samples in columns)
fit <- lmFit(X, design)

contrast.matrix <- makeContrasts(
    (Late + Last) - (Mid + Early),
    levels = design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
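As a sanity check on what this contrast measures, here is a minimal base-R sketch for a single made-up gene (the values in `y` are toy numbers, not data from this experiment). It fits the same no-intercept four-group design with `lm.fit()` instead of limma, so there is no empirical Bayes step; the point is only that each coefficient is a group mean and that, with three samples per group, the contrast of means equals the difference between the pooled means. It uses halved weights, i.e. the mean-based form of the contrast rather than the sum written in `makeContrasts()` above.

```r
# Toy data for ONE hypothetical gene: 12 samples, 3 per group (made-up values).
status <- factor(rep(c("Early", "Mid", "Late", "Last"), each = 3),
                 levels = c("Early", "Mid", "Late", "Last"))
y <- c(5.0, 5.2, 4.9,  5.8, 6.1, 6.0,  7.0, 7.2, 6.9,  7.5, 7.4, 7.7)

# Same no-intercept parameterisation as the first design: one column per group.
design <- model.matrix(~ 0 + status)
colnames(design) <- levels(status)

# Ordinary least squares; coefficients follow the column order of `design`.
beta <- as.vector(lm.fit(design, y)$coefficients)

# In this parameterisation each coefficient is just its group mean.
group_means <- as.vector(tapply(y, status, mean))

# Weights for (Late + Last)/2 - (Early + Mid)/2, in the order Early, Mid, Late, Last.
cvec <- c(-0.5, -0.5, 0.5, 0.5)
estimate <- sum(cvec * beta)

# With balanced groups this equals the difference between the pooled means.
pooled_diff <- mean(y[status %in% c("Late", "Last")]) -
  mean(y[status %in% c("Early", "Mid")])
c(estimate = estimate, pooled_diff = pooled_diff)
```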

The second way was to build another factor with two levels:

f <- as.vector(target$Status)
f[which(f == "Early")] <- "E.M"
f[which(f == "Mid")] <- "E.M"
f[which(f == "Late")] <- "La.Ls"
f[which(f == "Last")] <- "La.Ls"
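The four `which()` assignments can also be collapsed into a single lookup. A minimal sketch, assuming `target$Status` holds exactly the labels Early, Mid, Late and Last; `status` below is a hypothetical stand-in for it and `pool` is an illustrative name. The `as.character()` call matters: indexing a named vector with a factor would use the factor's integer codes instead of its labels.

```r
# Hypothetical stand-in for target$Status (12 samples, 3 per group).
status <- rep(c("Early", "Mid", "Late", "Last"), each = 3)

# Named lookup vector: original level -> pooled group.
pool <- c(Early = "E.M", Mid = "E.M", Late = "La.Ls", Last = "La.Ls")

# Index by label (as.character() guards against factor integer codes).
f <- factor(unname(pool[as.character(status)]), levels = c("E.M", "La.Ls"))

# Cross-tabulate to check that every sample landed in the intended group.
table(status, f)
```

An unexpected label would show up as `NA` in `f`, which is easier to spot than a value silently left unchanged.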

Then:

design <- model.matrix(~0+f)
colnames(design) = levels(factor(f))

   E.M La.Ls
1    1     0
2    1     0
3    1     0
4    1     0
5    1     0
6    1     0
7    0     1
8    0     1
9    0     1
10   0     1
11   0     1
12   0     1

fit <- lmFit(X, design)

contrast.matrix <- makeContrasts(
La.Ls - E.M,
levels=design
)

fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

Answer: Difference between two design matrices: how to choose?
James W. MacDonald wrote:

If you have four different groups, then you should probably use the first design matrix rather than the second. The difference is that in the second case you are artificially collapsing the four groups into two. If you really had only two groups, I imagine you wouldn't have split them into four in the first place.
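To illustrate what the artificial grouping does, here is a base-R sketch on one made-up gene in which Early differs from Mid and Late differs from Last (toy values chosen for illustration). It computes ordinary, unmoderated t-statistics with `lm.fit()`, not limma's moderated ones, and `contrast_t()` is a hypothetical helper. With three samples per group both designs give the same log fold change, but in the two-group design the Early/Mid and Late/Last differences end up in the residuals and inflate the standard error.

```r
# Toy data for ONE hypothetical gene (made-up values, 3 samples per group).
status <- factor(rep(c("Early", "Mid", "Late", "Last"), each = 3),
                 levels = c("Early", "Mid", "Late", "Last"))
pooled <- factor(ifelse(status %in% c("Early", "Mid"), "E.M", "La.Ls"))
y <- c(4.0, 4.1, 3.9,  6.0, 6.1, 5.9,  7.0, 7.1, 6.9,  9.0, 9.1, 8.9)

# Hypothetical helper: OLS contrast estimate, standard error, t and residual df.
contrast_t <- function(design, cvec, y) {
  fit <- lm.fit(design, y)
  df_resid <- nrow(design) - ncol(design)
  s2 <- sum(fit$residuals^2) / df_resid        # residual variance
  XtX_inv <- solve(crossprod(design))          # (X'X)^-1
  est <- sum(cvec * fit$coefficients)
  se <- sqrt(s2 * drop(t(cvec) %*% XtX_inv %*% cvec))
  c(estimate = est, se = se, t = est / se, df = df_resid)
}

# Four-group design: (Late + Last)/2 - (Early + Mid)/2.
four <- contrast_t(model.matrix(~ 0 + status), c(-0.5, -0.5, 0.5, 0.5), y)
# Two-group design: La.Ls - E.M.
two <- contrast_t(model.matrix(~ 0 + pooled), c(-1, 1), y)
rbind(four = four, two = two)
```

In this toy case the gain in precision far outweighs the loss of two residual degrees of freedom (8 versus 10).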

In addition, the contrast (Early + Mid) - (Late + Last) is usually written as (Early + Mid)/2 - (Late + Last)/2, because you are testing whether the mean of the first two groups differs from the mean of the second two, rather than comparing sums. That said, when the two sides are balanced like that, I don't think it makes much difference: dividing by 2 halves the estimated log fold change but leaves the t-statistics and p-values unchanged.
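A quick base-R check of the scaling point, on toy data (made-up values for one hypothetical gene, ordinary rather than moderated t-statistics; `contrast_stats()` is an illustrative helper): multiplying the contrast weights by 1/2 halves the estimate and its standard error alike, so the t-statistic is unchanged.

```r
# Toy data for ONE hypothetical gene (made-up values, 3 samples per group).
status <- factor(rep(c("Early", "Mid", "Late", "Last"), each = 3),
                 levels = c("Early", "Mid", "Late", "Last"))
y <- c(5.0, 5.2, 4.9,  5.8, 6.1, 6.0,  7.0, 7.2, 6.9,  7.5, 7.4, 7.7)

# No-intercept four-group design, fitted by ordinary least squares.
design <- model.matrix(~ 0 + status)
fit <- lm.fit(design, y)
s2 <- sum(fit$residuals^2) / (nrow(design) - ncol(design))  # residual variance
XtX_inv <- solve(crossprod(design))

# Estimate and ordinary t-statistic for a contrast vector (Early, Mid, Late, Last).
contrast_stats <- function(cvec) {
  est <- sum(cvec * fit$coefficients)
  se <- sqrt(s2 * drop(t(cvec) %*% XtX_inv %*% cvec))
  c(estimate = est, t = est / se)
}

sums  <- contrast_stats(c(-1, -1, 1, 1))          # (Late + Last) - (Early + Mid)
means <- contrast_stats(c(-0.5, -0.5, 0.5, 0.5))  # (Late + Last)/2 - (Early + Mid)/2
rbind(sums = sums, means = means)
```

In limma the mean-based version can be written directly, e.g. `makeContrasts((Late + Last)/2 - (Early + Mid)/2, levels = design)`; the reported logFC is then a difference of averaged group means.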

Answer: Difference between two design matrices: how to choose?
kippec0 wrote:

Dear James,

Indeed, I have four different groups. But I would like to test a coarser organisation of the samples by pooling the two "early" groups and the two "late" groups.

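And you are right: the two design methods give quite different results:

the two-level design (pooling the levels into E.M and La.Ls) found 141 genes;

the four-level design with the contrast (Early + Mid)/2 - (Late + Last)/2 found 283 genes, 132 of which are shared with the two-level design.

I guess the variance estimation with the four-level design is more accurate, since the differences between Early and Mid (and between Late and Last) no longer inflate the residual variance, even though this design leaves fewer residual degrees of freedom (12 - 4 = 8 instead of 12 - 2 = 10).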

Best