Question: Limma linear models
0
4.8 years ago by
nonCodingGene10 wrote:

Hi

I'm using the limma package to obtain some linear models out of some arrays.

I have the transcription rate of the cell at 7 different growth rates (7 different arrays).

I did created the design matrix, and because I just want the linear model and not any contrast bewtween arrays I did it this way.

design <- matrix(ncol = 1, nrow = 7 )
rownames(design) <-colnames(set)[alk]
colnames(design) <- "GRI"
design[] <- alk_gri
design <- model.matrix(~  + design)
fit_alk <- lmFit(alk_set, design)
toptable(fit_alk)

But now I have several doubts as I'm not an expert in statistics and I have not used limma before.

The end result of all this should be an list order the slope of the linear models and do some gene ontology. The questions are, because I'm not doing contrasts between the models can I trust the p-value output of toptable or should I use adjusted p-value.

The other doubt is that I think I may want to filter the models by the correlation of predictor and response variable (which is the p-value I have talked before) and probably by the goodness of the linnear model (R square or ¿Residual error of each model?) and I'm not very sure how to do this last filtering.

Sorry for this messy post.

Thanks

limma model goodness fit linear • 1.1k views
modified 4.8 years ago by Gordon Smyth39k • written 4.8 years ago by nonCodingGene10

I'm wondering whether if a could use the sigma output of lmFit as a goodness of the linear model.

1
4.8 years ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

Your experiment has a problem in that you have 7 arrays and 7 different growth rates. In other words, you have no replication. However you may be able to take the approach explained in Section 9.6.2 of the User's Guide on Time Course Experiments with Many time points.

I don't know from what source you have got the idea that you should filter on correlation between predictor and response or on the sigma value. You certainly should not be filtering in either of these ways -- it would make your analysis invalid.

I suggest that you tell us more about your experimental design so that we can give you more specific advice. In particular, what data do you have about the "growth rate" associated with each array? Do you have a numerical growth rate value for each array?

Each array has it's own numerical value. That means that I can plot on the X axis the growth rate and on the Y axis the transcription rate.

This is the design matrix for one of the sets of data:

  (Intercept)   design1
1           1 0.3260000
2           1 0.3262800
3           1 0.3036779
4           1 0.2271464
5           1 0.1723926

Where each line is an array, and under the column named as design1 it's placed the value for the growth rate each array, in this case 5 differnte arrays (not 7 like in the previous example).

0
4.8 years ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

OK, I see. The analysis is easy, you just need to specify the coefficient you are testing for when calling topTable, and you need to use eBayes:

design <- model.matrix(~ alk_gri)
fit_alk <- lmFit(alk_set, design)
fit_alk <- eBayes(fit_alk)
topTable(fit_alk, coef=2)

This tests for linear changes in log-expression with growth rate. You could of course test whether a quadratic trend is needed:

design <- model.matrix(~ poly(alk_gri,2))
fit_alk <- lmFit(alk_set, design)
fit_alk <- eBayes(fit_alk)
topTable(fit_alk, coef=3)

If you want to continue on to do a gene ontology analysis, then it would help to tell us to know what species you are working with and what gene ID is included in your alk_set object.

The eBayes should be donde with the output of lmFit instead of with the alk_set. I guess that is a mistake.

I'm working with Saccharomyces cerevisiae and the genes are in systematic gene names (ex: YDL229W).

I still need to dig in how ebayes it's working in this case. I know how the package has been working so far but the statistics behind the eBayes are way beyond my actual knowledge.

The GOstats and org.Sc.sgd.db packages may help you do a GO analysis. I find GOstats a bit hard to use, but it gives good results.

You don't need to have an expert understanding of the eBayes statistical method in order to use it.

Thanks!

Other question related to this.

I have filtered the output of toptable. In this case I filtered genes with p.value lower than 0,05 and genes with t value > than 0 and t value < 0, in order to obtain upregulating genes(t > 0) or downreagulatin genes(t < 0).

But when I plot some genes that I've filtered as  downregulating I find that the biggest the growth rate the biggest the transcription rate.

With the upregulating genes happens the same, I plot it and the bigger the growth rate is the less the transcription rate happens.

I'm not sure if this makes sense to you. I think I would be expecting right the other way.