Question: Limma DesignMatrix, dropping terms
1
13 months ago by
Keifa10
Keifa10 wrote:

Hello guys,

I was surfing the blog and I stopped in this post C: Design and contrast question limma (additive or nested or duplicateCorrelation(). The guy had this experimental design:

Filename    Sexe    Diet    Litter
sample1    M    LFD    L37
sample2    M    LFD    L37
sample3    M    LFD    L49
sample4    M    LFD    L49
sample5    M    LFD    L50
sample6    M    LFD    L50
sample7    M    WD    L48
sample8    M    WD    L48
sample9    M    WD    L48
sample10    M    WD    L48
sample11    M    WD    L40
sample12    M    WD    L40
sample13    F    LFD    L49
sample14    F    LFD    L50
sample15    F    LFD    L37
sample16    F    LFD    L37
sample17    F    LFD    L37
sample18    F    LFD    L49
sample19    F    WD    L39
sample20    F    WD    L39
sample21    F    WD    L40
sample22    F    WD    L40
sample23    F    WD    L48
sample24    F    WD    L48


Aaron suggested to create a design matrix in this way:

design <- model.matrix(~0+Litter+paste(Sexe,Diet,sep="."))

Then he suggested to drop term number seven. I have doubt why dropping term number 7 the final two terms represent the average log-fold change for male mice over female, in the LFD or WD-fed mice. The design matrix would be:

 SDM.LFD SDM.WD Filename Sexe Diet 0 0 sample13 F LFD 0 0 sample14 F LFD 0 0 sample15 F LFD 0 0 sample16 F LFD 0 0 sample17 F LFD 0 0 sample18 F LFD 0 0 sample19 F WD 0 0 sample20 F WD 0 0 sample21 F WD 0 0 sample22 F WD 0 0 sample23 F WD 0 0 sample24 F WD 1 0 sample1 M LFD 1 0 sample2 M LFD 1 0 sample3 M LFD 1 0 sample4 M LFD 1 0 sample5 M LFD 1 0 sample6 M LFD 0 1 sample7 M WD 0 1 sample8 M WD 0 1 sample9 M WD 0 1 sample10 M WD 0 1 sample11 M WD 0 1 sample12 M WD

Do the last two terms represent SDM.LFD-SDF.LFD and SDM.WD-SDF.LFD? Am I wrong?

Best,

Keifa

limma designmatrix • 249 views
modified 13 months ago by Aaron Lun24k • written 13 months ago by Keifa10
0
13 months ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

Your code isn't exactly what I wrote:

SD <- as.factor(paste(targets$Sexe, targets$Diet, sep="."))
Litter <- targets\$Litter
design2 <- model.matrix(~Litter + SD)


... and your "design matrix" is not correct, either. My last two coefficients are:

   SDM.LFD SDM.WD
1        1      0
2        1      0
3        1      0
4        1      0
5        1      0
6        1      0
7        0      1
8        0      1
9        0      1
10       0      1
11       0      1
12       0      1
13       0      0
14       0      0
15       0      0
16       0      0
17       0      0
18       0      0
19       0      0
20       0      0
21       0      0
22       0      0
23       0      0
24       0      0

Dropping the 7th term in design2 is necessary to achieve full column rank, otherwise there is no unique least-squares solution to the system of linear equations. Once dropped, the last two coefficients represent the male/female log-fold change in each diet. There is no "SDF.LFD" term in the final matrix, so I don't know what you're referring to there.

Hi Aaron,

Maybe I am wrong, but this is what you wrote (https://support.bioconductor.org/p/68916/#110986):

> design2 <- model.matrix(~0 + Litter + SD)
> design2 <- design2[,-7]
> colnames(design2)
[1] "LitterL37" "LitterL39" "LitterL40" "LitterL48" "LitterL49" "LitterL50"
[7] "SDM.LFD"   "SDM.WD"  

I am referring to design2.

Keifa

Yes, I know. (I've edited my response to avoid confusion.) What I wrote then and now is correct. You'll have to be much clearer about what you don't understand.

Hi Aaron,

thank you for your comment. "SDF.LFD" term seems to be the reference level for factor SD and if you do not drop the coefficient 7 (you need to achieve full column rank), the three terms "SDM.LFD", "SDM.WD" and "SDF.WD" (the once dropped) should represent the log-fold change respect the reference level (SDF.LFD), am I right? What is not clear to me is why once dropped, the last two coefficients (SDM.LFD and SDM.WD) represent the male/female log-fold change in each diet?

Keifa

1

Because the SD levels are nested within Litter. Once you drop the 7th term, the reference level in the WD litters becomes "SDF.WD". You can convince yourself of this by looking at the design matrix. For example, let's look at sample 24. In the linear model described by design2 (after dropping coefficient 7), sample 24 has the terms:

(Intercept)   LitterL39   LitterL40   LitterL48   LitterL49   LitterL50
1           0           0           1           0           0
SDM.LFD      SDM.WD
0           0

You can see that sample 24's expression is equal to the sum of the intercept and LitterL48. This means that sample 24's group (i.e., "SDF.WD") is the baseline for all litter 48 samples. By comparison, if we look at sample 10, we get:

(Intercept)   LitterL39   LitterL40   LitterL48   LitterL49   LitterL50
1           0           0           1           0           0
SDM.LFD      SDM.WD
0           1

... which demonstrates that SDM.WD represents the difference in (log-)expression between sample 10's group (i.e., "SDM.WD") and sample 24's group, i.e., the log-fold change between male and female mice in the WD group.

Hi Aaron,

I get it, thank you very much.

Keifa