Question

what conditions affect my gene (with limma?)

0

Entering edit mode

mamillerpa • 0

@mamillerpa-8668

Last seen 6.3 years ago

United States

I’m hoping to find a simple, cookbook method for determining what experimental conditions most affect the expression level of a single gene. I can do that with a generic anova, but I’d like to take advantage of the more sophisticated and task-appropriate models in limma.

I’m generally getting my data from GEO. Consider the expression of estrogen receptor beta (ESRRB/223858_at) in GEO dataset GDS4011, using the HG-U133_Plus_2 chip. (or http://www.ncbi.nlm.nih.gov/geoprofiles/93486818 )

I can get that data with

> geo.profile <- getURL("http://www.ncbi.nlm.nih.gov/geo/gds/getDatum.cgi?uid=93486818")

…do a little rearranging, and test with a vanilla anova:

> summary(aov(data = rearranged.geo.profile, X1556156_at~disease.state+gender+other))

Df Sum Sq Mean Sq F value  Pr(>F) 
disease.state  5  11.52  2.3043   4.218 0.00216 **
gender         1   0.33  0.3279   0.600 0.44121 
other          2   1.56  0.7797   1.427 0.24720

> sort(with(rearranged.geo.profile,tapply(X1556156_at, disease.state, mean)))
         subgroup: G4          subgroup: G3
             3.925902              4.062805
        subgroup: WNT         subgroup: N/A
             4.205237              4.219620
subgroup: SHH outlier         subgroup: SHH
             4.383280              5.117186

How can I get to a similar point with limma? When I’ve used limma up to this point, I’ve started with the GEO2R templates provided by NCBI, so can I find the relationship between disease state and ESRRB expression in my fit2 object?

> fit <- lmFit(gset, design)
> cont.matrix <- makeContrasts(G1-G0, levels=design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2, 0.01)

limma geo experimental design • 2.0k views

ADD COMMENT • link updated 8.6 years ago by Gordon Smyth 50k • written 8.7 years ago by mamillerpa • 0

0

Entering edit mode

Have you read through the limmaUser'sGuide?

ADD REPLY • link 8.7 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

I have started. Are there particular sections that you recommend for me?

ADD REPLY • link 8.7 years ago mamillerpa • 0

1

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 15 minutes ago

WEHI, Melbourne, Australia

Actually aov() doesn't in general do what you think it does. It outputs a sequential anova table. Unless the experimental design is perfectly balanced, you will find the SS for each factors in the anova table will change depending on the order that you put the factors into the aov model formula. So you can't interpret the p-values as the relative importance of each term.

ADD COMMENT • link 8.6 years ago Gordon Smyth 50k

1

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 15 minutes ago

WEHI, Melbourne, Australia

In general, it is better to judge the importance of each factor while adjusting for all other factors. The limma tests do this, but aov() doesn't, unless the design is balanced.

ADD COMMENT • link 8.6 years ago Gordon Smyth 50k

0

Entering edit mode

Great. I dind't know that per se, but I'm operating on the assumption that I am less likely to misuse limma than I am to misuse aov.

ADD REPLY • link 8.6 years ago mamillerpa • 0

score 3 · Accepted Answer · 2015-08-21

As Steve suggests, the limma user's guide is a good source of information. Check out section 9.3 for how to perform ANOVA-like contrasts involving multiple groups. For a more concrete example, if you want to check whether there is a difference in expression between any of your disease states, you can do something like this:

design <- model.matrix(~disease.state+gender+other)
fit <- lmFit(gset, design) 
fit <- eBayes(fit)
topTable(fit, coef=2:5)

The 2nd to 5th coefficients should represent log-fold changes of each disease state over the "reference" disease state (probably G3, if we go by alphabetical order in the disease.state vector). Dropping them all at once represents the null hypothesis that all groups have expression equal to that of G3 - and, thus, equal to each other. Genes that reject this null will be those that have some response to disease state. You can do the same for gender and other, probably by setting coef=6 and coef=7:8, respectively. This should give you the effect of each factor for each gene, à la aov.

Note that you'll need to account for multiple testing across many genes, so you should be looking at the BH-adjusted p-value rather than the raw p-value for each gene. Obviously, the raw p-values themselves will be different between aov and limma, as the latter performs moderation, i.e., sharing information between genes to stabilize the variance estimates and to increase power.