limma, reference group, covariate
4
1
Entering edit mode
alakatos ▴ 130
@alakatos-6983
Last seen 5.2 years ago
United States

Hello, 

I would like to use “limma” for my Affy gene expression analysis.  My dataset is in “expressionset”  format. 

I would like  to include two variables in my design matrix: 4-level treatment and 2-level sex. The goal is to test the treatment groups against “WT" while controlling for sex ; (treatment effect independent from sex). 

Dataset:

> class(ExpressionSet)
[1] "ExpressionSet"
attr(,"package")
[1] “Biobase" 

Includes expression data, phenoData and annotation. 

Execution:

pd <- pData(phenoData(ExpressionSet)) 
design <- model.matrix(~0+Treatment+Sex, pd, ref="WT")
colnames(design) <- gsub("Treatment|Sex", "", colnames(design))
fit <- lmFit(exprs(ExpressionSet), design)
ebFit <- eBayes(fit)

Design 
                    S_S   S_Veh   WT_S            WT            M
1112F-1         0       1          0                   0               0
1112F-2          0       1         0                  0                1
1112F-3          0       1         0                  0                1
1112F-4          0       1         0                  0                0
1112F-5          0       0         1                  0                 0 
Etc..

[1] 1 1 1 1 2
attr(,"contrasts")
attr(,"contrasts”)$Treatment
[1] "contr.treatment"
attr(,"contrasts")$Sex
[1] "contr.treatment"

It seems  “WT” is not recognized as reference. My understanding is that  I should not see “WT" in the design matrix if it is a reference.  

1. Would you please advise how to fix the reference?

2. I am also wondering if my setup  is correct for testing treatment and controlling for sex.  

3. What do I test without  the reference group in the design above? What is the interpretation of for instance, the result of  coef=1  (tt <- topTable(ebFit, number=10, coef=1, adjust = "fdr”) when  no (ref= ) in the design. 

Thanks a lot for your help and time.

Anita

limma reference group covariate • 4.9k views
ADD COMMENT
3
Entering edit mode
@james-w-macdonald-5106
Last seen 15 minutes ago
United States

The model you have fit is 'cell means' model, and it is estimating the means for each treatment group. This is a perfectly acceptable parameterization, but you have an extra step of fitting contrasts.
 

contrasts <- makeContrasts(S_S-WT, S_Veh-WT, WT_S-WT, levels = design)

fit2 <- contrasts.fit(fit, contrasts)

ebFit <- eBayes(fit2)

And at this point

topTable(ebFit, coef=1)

will output the top 10 genes for the S_S vs WT contrast.

You could always do what I think you want by excluding the '0' from your call to model.matrix (see ?formula for more information). But do note that there is no argument 'ref' for model.matrix (or any underlying functions), so that will have no effect on what the reference level is.

To answer your other questions, yes your model is correct for testing treatment and controlling for sex. But do note that this might not be a biologically reasonable thing to do. By fitting this model you are in essence assuming that the differences between sexes is a simple shift in expression levels. In other words, you assume that for a given gene (for example) that if the S_S treatment in females has a higher expression than WT, that this same pattern holds for males, but with perhaps higher or lower overall expression levels. This will miss (or more correctly ignore) any situations where e.g., S_S treatment is up-regulated as compared to WT in females, but down-regulated in males (or any other dis-similar reaction to S_S treatment that is based on sex). To capture that sort of thing you need an interaction term.

So one could argue that you should at least do (here I just fake up the Treatment and Sex factors as an example):

> Treatment <- factor(rep(c("S_S","S_Veh","WT_S", "WT"), each = 4))
> Sex <- factor(rep(c("Male","Female"), each=2, times=4))
> Treatment <- relevel(Treatment, ref = "WT")
> design <- model.matrix(~Treatment*Sex)
> colnames(design)
[1] "(Intercept)"            "TreatmentS_S"           "TreatmentS_Veh"        
[4] "TreatmentWT_S"          "SexMale"                "TreatmentS_S:SexMale"  
[7] "TreatmentS_Veh:SexMale" "TreatmentWT_S:SexMale"

And then you could do

fit <- lmFit(ExpressionSet, design)
fit2 <- eBayes(fit)
topTable(fit2, 6:8)

Which will then tell you if you have any genes for which the interaction is significant for any of the treatments. If there are none, you are probably safe ignoring the interaction and proceeding with what you were doing initially.

ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 11 hours ago
WEHI, Melbourne, Australia

You appear to have confused the functions model.matrix() and modelMatrix(). The latter has a 'ref' argument and is only for two-color data. The former does not have a 'ref' argument and is appropriate for single channel data, as you have here.

It is a good idea to get familiar with help system, which will tell you what the arguments are for each function. Try typing ?model.matrix at the R prompt.

ADD COMMENT
0
Entering edit mode
alakatos ▴ 130
@alakatos-6983
Last seen 5.2 years ago
United States

Hi James,

Thank you very much for the detailed answer and clarification. 

All the best, 

Anita 

 

ADD COMMENT
0
Entering edit mode
alakatos ▴ 130
@alakatos-6983
Last seen 5.2 years ago
United States

Ok. I will.  Thank you.

 

 

ADD COMMENT

Login before adding your answer.

Traffic: 835 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6