Difference between two design matrices: how to choose?
kippec • 0
@kippec-7290
Last seen 9.2 years ago
France

Dear All,

I have an experiment in which the main factor of interest has 4 levels: Early, Mid, Late and Last (exposure to the treatment).

I would like to know which genes are significantly differentially expressed between the (Late + Last) group and the (Early + Mid) group.

I tried two different ways of building the design matrix (see below), which gave quite different results.

I'm not sure which one is best to use, so I would be very grateful if someone could give me some advice on these two designs.

Best,

--

Kippec Taman

 

 

design <- model.matrix(~ 0+factor(target$Status))

colnames(design) = levels(factor(target$Status))


   Early Mid Late Last
1      1   0    0    0
2      1   0    0    0
3      1   0    0    0
4      0   1    0    0
5      0   1    0    0
6      0   1    0    0
7      0   0    1    0
8      0   0    1    0
9      0   0    1    0
10     0   0    0    1
11     0   0    0    1
12     0   0    0    1
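A side note on the column order shown above: `factor()` sorts levels alphabetically by default (Early, Last, Late, Mid), so getting Early, Mid, Late, Last requires the levels to be set explicitly. A minimal sketch, assuming three replicates per group in the order shown; the `status` vector is a hypothetical stand-in for `target$Status`:

```r
# Hypothetical stand-in for target$Status: 3 replicates per group
status <- rep(c("Early", "Mid", "Late", "Last"), each = 3)

# Set the level order explicitly; the default would be alphabetical
grp <- factor(status, levels = c("Early", "Mid", "Late", "Last"))

# One indicator column per group, no intercept
design <- model.matrix(~ 0 + grp)
colnames(design) <- levels(grp)
design
```

Naming the columns from `levels(grp)` is safe here because `model.matrix` builds the columns in level order.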


fit <- lmFit(X, design)

contrast.matrix <- makeContrasts(
    (Late + Last) - (Mid + Early),
    levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

 

The second way was to build another factor with two levels:

f = as.vector(target$Status)
f[which(f == "Early")] = "E.M"
f[which(f == "Mid")] = "E.M"
f[which(f == "Late")] = "La.Ls"
f[which(f == "Last")] = "La.Ls"
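The same recoding can probably be written in one step. A small sketch, again using a hypothetical `status` vector in place of `target$Status`:

```r
# Hypothetical stand-in for target$Status
status <- rep(c("Early", "Mid", "Late", "Last"), each = 3)

# Collapse the four levels into two in a single vectorised call
f <- ifelse(status %in% c("Early", "Mid"), "E.M", "La.Ls")
table(f)
```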

 

Then I built the design matrix and fitted the model in the same way:

design <- model.matrix(~0+f)
colnames(design) = levels(factor(f))


   E.M La.Ls
1    1     0
2    1     0
3    1     0
4    1     0
5    1     0
6    1     0
7    0     1
8    0     1
9    0     1
10   0     1
11   0     1
12   0     1

 



fit <- lmFit(X, design)

contrast.matrix <- makeContrasts(
        La.Ls - E.M,
        levels=design
)

fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

limma design matrix • 1.7k views
@james-w-macdonald-5106
Last seen 11 hours ago
United States

If you have four different groups, then you should probably use the first design matrix rather than the second. The difference is that in the second case you are artificially collapsing the four groups into two. If you really had only two groups, I imagine you wouldn't have split them into four in the first place.
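To illustrate what collapsing the groups does, here is a rough sketch for a single gene using ordinary `lm` in base R rather than limma's moderated statistics. The expression values are made up for illustration and are not from this thread:

```r
# Made-up expression values for ONE gene, 3 replicates per group
status <- factor(rep(c("Early", "Mid", "Late", "Last"), each = 3),
                 levels = c("Early", "Mid", "Late", "Last"))
pooled <- factor(ifelse(status %in% c("Early", "Mid"), "E.M", "La.Ls"))
y <- c(5.0, 5.2, 4.8,   # Early
       7.0, 7.1, 6.9,   # Mid
       8.0, 8.2, 7.8,   # Late
       6.0, 6.1, 5.9)   # Last

fit4 <- lm(y ~ 0 + status)  # four-group model
fit2 <- lm(y ~ 0 + pooled)  # two-group model

# Residual degrees of freedom: 12 - 4 versus 12 - 2
c(four = df.residual(fit4), two = df.residual(fit2))

# Residual standard deviation: the two-group model pushes the
# Early-vs-Mid and Late-vs-Last differences into the error term
c(four = sigma(fit4), two = sigma(fit2))

# The estimated difference itself agrees in this balanced layout
est4 <- sum(coef(fit4) * c(-0.5, -0.5, 0.5, 0.5))
est2 <- unname(coef(fit2)["pooledLa.Ls"] - coef(fit2)["pooledE.M"])
c(four = est4, two = est2)
```

In this toy example the two-group model has more residual degrees of freedom but a much larger residual variance, because the subgroups within each half genuinely differ. How large that effect is in real data depends on how different Early is from Mid, and Late from Last.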

In addition, the contrast (Early + Mid) - (Late + Last) is usually computed as (Early + Mid)/2 - (Late + Last)/2, because you are testing whether the mean of the first two differs from the mean of the second two, rather than comparing the sums. That said, when the two sides are balanced like this, I don't think it makes a real difference to the test: the sum version simply reports a log fold change twice as large.
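The effect of the /2 scaling can be checked on a single gene. This is only a sketch with made-up values and plain `lm`, not limma itself, and `contrast_t` is a hypothetical helper written for this example. The sign follows the (Late + Last) - (Early + Mid) direction used in the question:

```r
# Made-up expression values for ONE gene, 3 replicates per group
status <- factor(rep(c("Early", "Mid", "Late", "Last"), each = 3),
                 levels = c("Early", "Mid", "Late", "Last"))
y <- c(5.0, 5.2, 4.8, 7.0, 7.1, 6.9, 8.0, 8.2, 7.8, 6.0, 6.1, 5.9)
fit <- lm(y ~ 0 + status)

# Hypothetical helper: estimate, standard error and t-statistic
# for a contrast vector w over the four group means
contrast_t <- function(fit, w) {
  est <- sum(w * coef(fit))
  se  <- sqrt(drop(t(w) %*% vcov(fit) %*% w))
  c(estimate = est, se = se, t = est / se)
}

sums  <- contrast_t(fit, c(-1, -1, 1, 1))          # difference of sums
means <- contrast_t(fit, c(-0.5, -0.5, 0.5, 0.5))  # difference of averages
rbind(sums, means)
```

The estimate and its standard error both halve, so the t-statistic is unchanged. The same scaling argument should apply to limma's moderated t, since the contrast weights scale the numerator and denominator alike, but the averaged form gives a log fold change that is easier to interpret.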

kippec • 0
@kippec-7290

Dear James,

Indeed, I have 4 different groups. But I would like to test a more macroscopic organisation of the samples by grouping the two "early" groups and the two "late" groups.

And you are right: the two design methods gave broadly similar results:

method 1 (grouping the levels into two) gave 141 genes

method 2 ((Early + Mid)/2 - (Late + Last)/2) gave 283 genes, 132 of which are in common with method 1.

I guess the variance estimation with method 2 is more accurate (maybe also more degrees of freedom).

Thanks again for your answer!

best

