Question

limma, reference group, covariate

1

Entering edit mode

alakatos ▴ 130

@alakatos-6983

Last seen 6.4 years ago

United States

Hello,

I would like to use “limma” for my Affy gene expression analysis. My dataset is in “expressionset” format.

I would like to include two variables in my design matrix: 4-level treatment and 2-level sex. The goal is to test the treatment groups against “WT" while controlling for sex ; (treatment effect independent from sex).

Dataset:

> class(ExpressionSet)
[1] "ExpressionSet"
attr(,"package")
[1] “Biobase"

Includes expression data, phenoData and annotation.

Execution:

pd <- pData(phenoData(ExpressionSet)) 
design <- model.matrix(~0+Treatment+Sex, pd, ref="WT")
colnames(design) <- gsub("Treatment|Sex", "", colnames(design))
fit <- lmFit(exprs(ExpressionSet), design)
ebFit <- eBayes(fit)

Design 
                    S_S   S_Veh   WT_S            WT            M
1112F-1         0       1          0                   0               0
1112F-2          0       1         0                  0                1
1112F-3          0       1         0                  0                1
1112F-4          0       1         0                  0                0
1112F-5          0       0         1                  0                 0 
Etc..

[1] 1 1 1 1 2
attr(,"contrasts")
attr(,"contrasts”)$Treatment
[1] "contr.treatment"
attr(,"contrasts")$Sex
[1] "contr.treatment"

It seems “WT” is not recognized as reference. My understanding is that I should not see “WT" in the design matrix if it is a reference.

1. Would you please advise how to fix the reference?

2. I am also wondering if my setup is correct for testing treatment and controlling for sex.

3. What do I test without the reference group in the design above? What is the interpretation of for instance, the result of coef=1 (tt <- topTable(ebFit, number=10, coef=1, adjust = "fdr”) when no (ref= ) in the design.

Thanks a lot for your help and time.

Anita

limma reference group covariate • 5.4k views

ADD COMMENT • link 11.2 years ago alakatos ▴ 130

score 3 · Answer 1 · 2014-12-22

The model you have fit is 'cell means' model, and it is estimating the means for each treatment group. This is a perfectly acceptable parameterization, but you have an extra step of fitting contrasts.

contrasts <- makeContrasts(S_S-WT, S_Veh-WT, WT_S-WT, levels = design)

fit2 <- contrasts.fit(fit, contrasts)

ebFit <- eBayes(fit2)

And at this point

topTable(ebFit, coef=1)

will output the top 10 genes for the S_S vs WT contrast.

You could always do what I think you want by excluding the '0' from your call to model.matrix (see ?formula for more information). But do note that there is no argument 'ref' for model.matrix (or any underlying functions), so that will have no effect on what the reference level is.

To answer your other questions, yes your model is correct for testing treatment and controlling for sex. But do note that this might not be a biologically reasonable thing to do. By fitting this model you are in essence assuming that the differences between sexes is a simple shift in expression levels. In other words, you assume that for a given gene (for example) that if the S_S treatment in females has a higher expression than WT, that this same pattern holds for males, but with perhaps higher or lower overall expression levels. This will miss (or more correctly ignore) any situations where e.g., S_S treatment is up-regulated as compared to WT in females, but down-regulated in males (or any other dis-similar reaction to S_S treatment that is based on sex). To capture that sort of thing you need an interaction term.

So one could argue that you should at least do (here I just fake up the Treatment and Sex factors as an example):

> Treatment <- factor(rep(c("S_S","S_Veh","WT_S", "WT"), each = 4))
> Sex <- factor(rep(c("Male","Female"), each=2, times=4))
> Treatment <- relevel(Treatment, ref = "WT")
> design <- model.matrix(~Treatment*Sex)
> colnames(design)
[1] "(Intercept)"            "TreatmentS_S"           "TreatmentS_Veh"        
[4] "TreatmentWT_S"          "SexMale"                "TreatmentS_S:SexMale"  
[7] "TreatmentS_Veh:SexMale" "TreatmentWT_S:SexMale"

And then you could do

fit <- lmFit(ExpressionSet, design)
fit2 <- eBayes(fit)
topTable(fit2, 6:8)

Which will then tell you if you have any genes for which the interaction is significant for any of the treatments. If there are none, you are probably safe ignoring the interaction and proceeding with what you were doing initially.

score 1 · Answer 2 · 2014-12-22

You appear to have confused the functions model.matrix() and modelMatrix(). The latter has a 'ref' argument and is only for two-color data. The former does not have a 'ref' argument and is appropriate for single channel data, as you have here.

It is a good idea to get familiar with help system, which will tell you what the arguments are for each function. Try typing ?model.matrix at the R prompt.

score 0 · Answer 3 · 2014-12-22

0

Entering edit mode

alakatos ▴ 130

@alakatos-6983

Last seen 6.4 years ago

United States

Hi James,

Thank you very much for the detailed answer and clarification.

All the best,

Anita

ADD COMMENT • link 11.2 years ago alakatos ▴ 130

score 0 · Answer 4 · 2014-12-23

0

Entering edit mode

alakatos ▴ 130

@alakatos-6983

Last seen 6.4 years ago

United States

Ok. I will. Thank you.

ADD COMMENT • link 11.2 years ago alakatos ▴ 130