Question

Limma linear models

0

Entering edit mode

nonCodingGene ▴ 10

@noncodinggene-7018

Last seen 5.8 years ago

Hi

I'm using the limma package to obtain some linear models out of some arrays.

I have the transcription rate of the cell at 7 different growth rates (7 different arrays).

I did created the design matrix, and because I just want the linear model and not any contrast bewtween arrays I did it this way.

design <- matrix(ncol = 1, nrow = 7 )
rownames(design) <-colnames(set)[alk]
colnames(design) <- "GRI"
design[] <- alk_gri
design <- model.matrix(~  + design)
fit_alk <- lmFit(alk_set, design)
toptable(fit_alk)

But now I have several doubts as I'm not an expert in statistics and I have not used limma before.

The end result of all this should be an list order the slope of the linear models and do some gene ontology. The questions are, because I'm not doing contrasts between the models can I trust the p-value output of toptable or should I use adjusted p-value.

The other doubt is that I think I may want to filter the models by the correlation of predictor and response variable (which is the p-value I have talked before) and probably by the goodness of the linnear model (R square or ¿Residual error of each model?) and I'm not very sure how to do this last filtering.

Sorry for this messy post.

Thanks

limma goodness fit linear model • 2.1k views

ADD COMMENT • link updated 9.2 years ago by Gordon Smyth 50k • written 9.2 years ago by nonCodingGene ▴ 10

0

Entering edit mode

I'm wondering whether if a could use the sigma output of lmFit as a goodness of the linear model.

ADD REPLY • link 9.2 years ago nonCodingGene ▴ 10

Gordon Smyth · Answer 1 · 2015-01-25

1

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 41 minutes ago

WEHI, Melbourne, Australia

Your experiment has a problem in that you have 7 arrays and 7 different growth rates. In other words, you have no replication. However you may be able to take the approach explained in Section 9.6.2 of the User's Guide on Time Course Experiments with Many time points.

I don't know from what source you have got the idea that you should filter on correlation between predictor and response or on the sigma value. You certainly should not be filtering in either of these ways -- it would make your analysis invalid.

I suggest that you tell us more about your experimental design so that we can give you more specific advice. In particular, what data do you have about the "growth rate" associated with each array? Do you have a numerical growth rate value for each array?

ADD COMMENT • link 9.2 years ago Gordon Smyth 50k

0

Entering edit mode

Hi, thanks for your reply.

Each array has it's own numerical value. That means that I can plot on the X axis the growth rate and on the Y axis the transcription rate.

This is the design matrix for one of the sets of data:

  (Intercept)   design1
1           1 0.3260000
2           1 0.3262800
3           1 0.3036779
4           1 0.2271464
5           1 0.1723926

Where each line is an array, and under the column named as design1 it's placed the value for the growth rate each array, in this case 5 differnte arrays (not 7 like in the previous example).

ADD REPLY • link updated 9.2 years ago by Gordon Smyth 50k • written 9.2 years ago by nonCodingGene ▴ 10

score 0 · Answer 2 · 2015-01-25

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 41 minutes ago

WEHI, Melbourne, Australia

OK, I see. The analysis is easy, you just need to specify the coefficient you are testing for when calling topTable, and you need to use eBayes:

design <- model.matrix(~ alk_gri)
fit_alk <- lmFit(alk_set, design)
fit_alk <- eBayes(fit_alk)
topTable(fit_alk, coef=2)

This tests for linear changes in log-expression with growth rate. You could of course test whether a quadratic trend is needed:

design <- model.matrix(~ poly(alk_gri,2))
fit_alk <- lmFit(alk_set, design)
fit_alk <- eBayes(fit_alk)
topTable(fit_alk, coef=3)

If you want to continue on to do a gene ontology analysis, then it would help to tell us to know what species you are working with and what gene ID is included in your alk_set object.

ADD COMMENT • link 9.2 years ago Gordon Smyth 50k

0

Entering edit mode

The eBayes should be donde with the output of lmFit instead of with the alk_set. I guess that is a mistake.

I'm working with Saccharomyces cerevisiae and the genes are in systematic gene names (ex: YDL229W).

I still need to dig in how ebayes it's working in this case. I know how the package has been working so far but the statistics behind the eBayes are way beyond my actual knowledge.

ADD REPLY • link 9.2 years ago nonCodingGene ▴ 10

0

Entering edit mode

The GOstats and org.Sc.sgd.db packages may help you do a GO analysis. I find GOstats a bit hard to use, but it gives good results.

You don't need to have an expert understanding of the eBayes statistical method in order to use it.

ADD REPLY • link 9.2 years ago Gordon Smyth 50k

0

Entering edit mode

Thanks!