Question: limma, reference group, covariate
0
4.3 years ago by
alakatos90
United States
alakatos90 wrote:

Hello,

I would like to use “limma” for my Affy gene expression analysis.  My dataset is in “expressionset”  format.

I would like  to include two variables in my design matrix: 4-level treatment and 2-level sex. The goal is to test the treatment groups against “WT" while controlling for sex ; (treatment effect independent from sex).

Dataset:

> class(ExpressionSet)
[1] "ExpressionSet"
attr(,"package")
[1] “Biobase" 

Includes expression data, phenoData and annotation.

Execution:

pd <- pData(phenoData(ExpressionSet))
design <- model.matrix(~0+Treatment+Sex, pd, ref="WT")
colnames(design) <- gsub("Treatment|Sex", "", colnames(design))
fit <- lmFit(exprs(ExpressionSet), design)
ebFit <- eBayes(fit)

Design
S_S   S_Veh   WT_S            WT            M
1112F-1         0       1          0                   0               0
1112F-2          0       1         0                  0                1
1112F-3          0       1         0                  0                1
1112F-4          0       1         0                  0                0
1112F-5          0       0         1                  0                 0
Etc..

[1] 1 1 1 1 2
attr(,"contrasts")
attr(,"contrasts”)$Treatment [1] "contr.treatment" attr(,"contrasts")$Sex
[1] "contr.treatment"

It seems  “WT” is not recognized as reference. My understanding is that  I should not see “WT" in the design matrix if it is a reference.

2. I am also wondering if my setup  is correct for testing treatment and controlling for sex.

3. What do I test without  the reference group in the design above? What is the interpretation of for instance, the result of  coef=1  (tt <- topTable(ebFit, number=10, coef=1, adjust = "fdr”) when  no (ref= ) in the design.

Thanks a lot for your help and time.

Anita

modified 4.3 years ago • written 4.3 years ago by alakatos90
2
4.3 years ago by
United States
James W. MacDonald49k wrote:

The model you have fit is 'cell means' model, and it is estimating the means for each treatment group. This is a perfectly acceptable parameterization, but you have an extra step of fitting contrasts.

contrasts <- makeContrasts(S_S-WT, S_Veh-WT, WT_S-WT, levels = design)

fit2 <- contrasts.fit(fit, contrasts)

ebFit <- eBayes(fit2)

And at this point

topTable(ebFit, coef=1)

will output the top 10 genes for the S_S vs WT contrast.

You could always do what I think you want by excluding the '0' from your call to model.matrix (see ?formula for more information). But do note that there is no argument 'ref' for model.matrix (or any underlying functions), so that will have no effect on what the reference level is.

To answer your other questions, yes your model is correct for testing treatment and controlling for sex. But do note that this might not be a biologically reasonable thing to do. By fitting this model you are in essence assuming that the differences between sexes is a simple shift in expression levels. In other words, you assume that for a given gene (for example) that if the S_S treatment in females has a higher expression than WT, that this same pattern holds for males, but with perhaps higher or lower overall expression levels. This will miss (or more correctly ignore) any situations where e.g., S_S treatment is up-regulated as compared to WT in females, but down-regulated in males (or any other dis-similar reaction to S_S treatment that is based on sex). To capture that sort of thing you need an interaction term.

So one could argue that you should at least do (here I just fake up the Treatment and Sex factors as an example):

> Treatment <- factor(rep(c("S_S","S_Veh","WT_S", "WT"), each = 4))
> Sex <- factor(rep(c("Male","Female"), each=2, times=4))
> Treatment <- relevel(Treatment, ref = "WT")
> design <- model.matrix(~Treatment*Sex)
> colnames(design)
[1] "(Intercept)"            "TreatmentS_S"           "TreatmentS_Veh"
[4] "TreatmentWT_S"          "SexMale"                "TreatmentS_S:SexMale"
[7] "TreatmentS_Veh:SexMale" "TreatmentWT_S:SexMale"

And then you could do

fit <- lmFit(ExpressionSet, design)
fit2 <- eBayes(fit)
topTable(fit2, 6:8)

Which will then tell you if you have any genes for which the interaction is significant for any of the treatments. If there are none, you are probably safe ignoring the interaction and proceeding with what you were doing initially.

0
4.3 years ago by
alakatos90
United States
alakatos90 wrote:

Hi James,

Thank you very much for the detailed answer and clarification.

All the best,

Anita

0
4.3 years ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:

You appear to have confused the functions model.matrix() and modelMatrix(). The latter has a 'ref' argument and is only for two-color data. The former does not have a 'ref' argument and is appropriate for single channel data, as you have here.

It is a good idea to get familiar with help system, which will tell you what the arguments are for each function. Try typing ?model.matrix at the R prompt.

0
4.3 years ago by
alakatos90
United States
alakatos90 wrote:

Ok. I will.  Thank you.