Question

Creating a design matrix

mahm ▴ 20

I'm creating a design matrix as follows:

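library(GEOquery)  # provides getGEO(); also loads Biobase for pData()
library(gcrma)
library(limma)
# Download the series; [[1]] takes the first ExpressionSet
gseEset <- getGEO('GSE20966')[[1]]
# Column 45 of the phenotype data is used as the grouping variable
DesignMatrixInput <- pData(gseEset)[,45,drop=FALSE]
colnames(DesignMatrixInput)[1] <- "CellType"
CellTypeLabel <- unique(DesignMatrixInput$CellType)
# One indicator column per group, no intercept
Design.mat <- model.matrix(~0+DesignMatrixInput$CellType)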

Output:

          non-diabetic control type 2 diabetes
GSM524151                    1               0
GSM524152                    1               0
GSM524153                    1               0
GSM524154                    1               0
GSM524155                    1               0
GSM524156                    1               0
GSM524157                    1               0
GSM524158                    1               0
GSM524159                    1               0
GSM524160                    1               0
GSM524161                    0               1
GSM524162                    0               1
GSM524163                    0               1
GSM524164                    0               1
GSM524165                    0               1
GSM524166                    0               1
GSM524167                    0               1
GSM524168                    0               1
GSM524169                    0               1
GSM524170                    0               1

In section 9.1 of the limma User's Guide, the following example is given with Array1, Array2, ... as the row names.

      WT MU
Array1 1 0
Array2 1 0
Array3 0 1
Array4 0 1
Array5 0 1

In my case, I have created the design matrix with the sample names as the row names.

Could someone explain what Array1, Array2, ... stand for in the example shown in the manual?

design matrix limma • 1.2k views

updated 6.3 years ago by Gordon Smyth 52k • written 6.4 years ago by mahm ▴ 20

score 0 · Answer 1 · 2018-11-02

It isn't compulsory to set the row names of the design matrix but, yes, in general the row names should be the sample IDs. The row names of the design matrix should match the column names of the expression data.
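As a rough illustration of that matching, here is a minimal base R sketch. The expression values, the sample IDs (`GSM1` to `GSM4`), and the group labels are all made up for the example; with real data the column names would come from your own expression object.

```r
# Toy expression matrix: 3 probes x 4 samples (hypothetical values and IDs).
expr <- matrix(rnorm(12), nrow = 3,
               dimnames = list(paste0("probe", 1:3),
                               c("GSM1", "GSM2", "GSM3", "GSM4")))

# Hypothetical group labels, one per column of expr, in the same order.
cell_type <- factor(c("control", "control", "diabetic", "diabetic"))

# One indicator column per group (no intercept).
design <- model.matrix(~ 0 + cell_type)
colnames(design) <- levels(cell_type)

# Label the rows with the sample IDs, then confirm they line up
# with the columns of the expression matrix.
rownames(design) <- colnames(expr)
stopifnot(identical(rownames(design), colnames(expr)))
design
```

The `stopifnot()` line is only a sanity check: it stops with an error if the design rows and expression columns are labelled differently or in a different order.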

The row names "Array1", "Array2" from the example in the limma User's Guide are just arbitrary example sample names. They simply mean "microarray number 1", "microarray number 2", etc. In practice, you can use any sample names that are appropriate for your study. Using GSM sample IDs from GEO is a perfectly appropriate choice.
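To make the point about arbitrary names concrete, the sketch below rebuilds a WT/MU design like the one in the guide using base R only, then labels the same rows two ways. The `GSM1` to `GSM5` IDs are hypothetical placeholders, not real GEO accessions.

```r
# Two WT arrays followed by three MU arrays, as in the guide's example.
group <- factor(c("WT", "WT", "MU", "MU", "MU"), levels = c("WT", "MU"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Two labelings of the same rows: generic names vs. GEO-style IDs.
design_array <- design
rownames(design_array) <- paste0("Array", 1:5)
design_gsm <- design
rownames(design_gsm) <- paste0("GSM", 1:5)

# The 0/1 entries are identical; only the row labels differ.
stopifnot(all(design_array == design_gsm))
design_array
```

Either labeling describes the same experiment, so the choice of row names is a matter of bookkeeping rather than modelling.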