Question

How does the function model.matrix (used to define an experimental design) really work?

Aurora

Good morning,

I am working with edgeR to perform differential expression analysis on RNA-seq data.

I have a design with two variables: type (obese/normal) and treatment.

type is a vector with two levels (obese/normal).

treatment is a vector with 5 levels (5 different treatments).

But when I run this line of code: design_matrix = model.matrix(~0+type+treatment)

the resulting design_matrix has only 6 columns: typeLean, typeObese and only 4 columns for treatment (the treatment that is the first level of the treatment vector is not in my design_matrix!).

Does anyone know why? How could I have all the treatments in my design_matrix?
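A minimal reproduction of the behaviour described above. The sample layout is an assumption for illustration only: 20 hypothetical samples (2 types x 5 treatments x 2 replicates), with type levels named Lean/Obese to match the reported column names and hypothetical treatment levels T1 to T5.

```r
# Hypothetical layout: 2 types x 5 treatments x 2 replicates = 20 samples
type <- factor(rep(c("Lean", "Obese"), each = 10))
treatment <- factor(rep(paste0("T", 1:5), times = 4))

design_matrix <- model.matrix(~0 + type + treatment)
colnames(design_matrix)
# "typeLean" "typeObese" "treatmentT2" "treatmentT3" "treatmentT4" "treatmentT5"
```

The first treatment level (T1 here, the first level in alphabetical order) gets no column of its own, exactly as in the question.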

Thank you,

Have a good day


Accepted Answer · 2018-03-27

Ryan C. Thompson (Scripps Research, La Jolla, CA)

It is important to note that a factor with K levels adds only K-1 coefficients to the design matrix, because there are only K-1 independent differences between K groups. So, counting the intercept, your design matrix should have 1 + (5-1) + (2-1) = 6 coefficients. For more information on how factors are encoded into a design matrix, have a look here. You are most likely using "dummy coding" (called treatment contrasts in R, contr.treatment), since that is the default for unordered factors.
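A short sketch of the K-1 rule, using an assumed balanced layout of 20 hypothetical samples (levels Lean/Obese and hypothetical treatments T1 to T5). It counts the columns of the intercept model, then shows that forcing a column for every level of both factors gives a rank-deficient matrix, which is why one level has to be dropped.

```r
# Hypothetical layout: 2 types x 5 treatments x 2 replicates = 20 samples
type <- factor(rep(c("Lean", "Obese"), each = 10))
treatment <- factor(rep(paste0("T", 1:5), times = 4))

# Default treatment (dummy) coding with an intercept: 1 + (5-1) + (2-1) columns
with_intercept <- model.matrix(~ type + treatment)
ncol(with_intercept)  # 6

# Force an indicator column for every level of both factors
full <- cbind(model.matrix(~0 + type), model.matrix(~0 + treatment))
ncol(full)     # 7 columns
qr(full)$rank  # 6: typeLean + typeObese equals the sum of the 5 treatment columns
```

The seventh column adds no information, so a model using all seven could not be estimated uniquely.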

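You can sidestep the factor-coding problem for one of your factors by fitting a model with no intercept (i.e. ~0), but the remaining factors must still be coded as usual. That is what happened in your design: ~0 gives type a full set of two columns (typeLean and typeObese), so the count is 2 + (5-1) = 6, and treatment still contributes only 4 columns. The first treatment level is not lost: it is the reference level, absorbed into the typeLean and typeObese coefficients, and each remaining treatment coefficient estimates the difference from it.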
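Which factor receives the full set of columns under ~0 depends on term order: it is the first factor in the formula. A sketch with the same kind of hypothetical layout (levels Lean/Obese, treatments T1 to T5 assumed for illustration): listing treatment first gives every treatment its own column, and type is reduced to a single typeObese column instead.

```r
# Hypothetical layout: 2 types x 5 treatments x 2 replicates = 20 samples
type <- factor(rep(c("Lean", "Obese"), each = 10))
treatment <- factor(rep(paste0("T", 1:5), times = 4))

# Same additive model, terms swapped: treatment now gets all 5 columns
design_swapped <- model.matrix(~0 + treatment + type)
colnames(design_swapped)
# "treatmentT1" "treatmentT2" "treatmentT3" "treatmentT4" "treatmentT5" "typeObese"
```

This is the same model with a different parameterisation, still 6 columns. Differences between treatments (including against T1) are then obtained as contrasts between coefficients, for example with limma's makeContrasts, rather than read off a single coefficient.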