design and contrast matrix for limma time series without replicates
0
0
Entering edit mode
@chunxuan-shao-5175
Last seen 10.3 years ago
Thanks very much for the thorough explanation, I will add tm:trt1 to my model to see expression changes in two condition. Best wishes, On Tue, Mar 20, 2012 at 7:51 PM, James W. MacDonald <jmacdon@uw.edu> wrote: > Hi Xuan, > > > On 3/20/2012 2:13 PM, shao chunxuan wrote: > >> Hi Jim, >> >> Thanks for the reply, but I still have few more questions on the >> regression model of limma. >> >> Time points are not evenly spreaded, the real experiments like this: >> >> xx.1 <- c(0.5,1,3,6,12,24,72) >> time <- rep(xx.1,2) >> >> The design matrix is >> trt <- factor(rep(0:1, each = 7)) >> design=model.matrix(~time+trt) >> colnames(design) <- c("Intercept","time","treat") >> >> > design >> Intercept time treat >> 1 1 0.5 0 >> 2 1 1.0 0 >> 3 1 3.0 0 >> 4 1 6.0 0 >> 5 1 12.0 0 >> 6 1 24.0 0 >> 7 1 72.0 0 >> 8 1 0.5 1 >> 9 1 1.0 1 >> 10 1 3.0 1 >> 11 1 6.0 1 >> 12 1 12.0 1 >> 13 1 24.0 1 >> 14 1 72.0 1 >> attr(,"assign") >> [1] 0 1 2 >> attr(,"contrasts") >> attr(,"contrasts")$trt >> [1] "contr.treatment" >> >> Here is my understanding of what happens in limma >> >> In this case, Iimma is supposed to fit: >> y=p0 + p1*time + p2*treat, and y=p0 + p1*time, >> while time is quantitative variable, treat is group variable. >> A F-statistic will be calculated and the corresponding pvalue will be >> used to determine significance. >> >> Is it right to understand in this way? >> >> The implementation is : >> fit <- lmFit(ts.data, design) ## ts.data is the time-course data. >> fit <- eBayes(fit) >> xx.1 <- topTable(fit, coef="treat", adjust="BH") >> xx.2 <- topTable(fit, coef="time", adjust="BH") >> >> So xx.1 means the differentially expressed genes between the two >> treatments, after adjusting for the time effect, right? what does xx.2 mean? >> > > xx.2 gives you those genes where the slope of the fitted line is > significantly different from zero. In other words, these are the genes that > appear to go up or down as time progresses. > > You should note two things here. First, since you don't have replicates, > we are forced to use the time points as continuous covariates rather than > factor levels. This means that you are looking for a linear response to > time (e.g., genes that go up or down more as time progresses). This is not > normally what one would want to do, as it is more likely that a gene will > go up for a bit and then go back down (or vice versa). To account for that > curvature in the expression, you might want to add a time^2 covariate as > well. > > Second, linear regression is in some ways more complicated than analysis > of variance, and if you were just fitting a single model you would be able > to assess how well your model was conforming to the assumptions of > regression, test to see if you need a quadratic time term, look for data > with high leverage that might be skewing your results, etc. For ANOVA, you > are in general just testing to see if the mean of the groups are different. > > When doing microarray analysis, you just fit what you hope is a generally > reasonable model and let it rip on thousands of genes at once. There is no > way to check all the little details of a linear regression on thousands of > models, so you may (will?) have numerous genes that are clearly > mis-specified by the model you fit, or have glaring inconsistencies that > cause them to appear significant when it is just that the model is really > wrong. > > So that is my long winded way of saying that you are not IMO analyzing > your data in an optimal way, but you are forced to do so by the lack of > replication. So for each 'significant' gene you find, you should look at > plots of expression vs time prior to attempting validation. > > > > I am more interested in finding genes changed in time course in >> treatment, after adjusting for the control. >> Is tm*trt in design2 what I need? >> > > Not really. The model you are fitting here (xx.2) gives the genes that > change linearly with time, after adjusting for control. The assumptions you > are making are that > > a) Genes go up or down in a linear fashion (e.g., you can plot expression > values on the vertical axis and time on the horizontal axis and then draw a > line over the data, and it looks 'reasonable'). > b) The only difference between control and treatment is a difference in > the intercept, so you assume the same slope for control and treated samples. > > adding a tm:trt1 coefficient allows the slopes to be different as well. > This coefficient tests for a significant difference between slopes (for > example, it might be that geneX goes up in treated, but down in control). > > Best, > > Jim > > > >> Best, >> >> >> >> On Tue, Mar 20, 2012 at 2:48 PM, James W. MacDonald <jmacdon@uw.edu<mailto:>> jmacdon@uw.edu>> wrote: >> >> Hi Xuan, >> >> >> On 3/20/2012 6:50 AM, Chunxuan Shao wrote: >> >> Hi everyone: >> >> I have one microarray data set considering differentiation in >> a cell line. About 7 time points for both control and >> treatment, no replicates >> I would like to use limma to find the differentially expressed >> genes between time points for control and treatment, and want >> to compare the gene expression between control and treatment >> for the same time point. But I don't know how to make the >> right design and contrast matrix. >> >> After searching mail archive, the closest related answer is >> "https://stat.ethz.ch/**pipermail/bioconductor/2010-** >> June/033849.html<https: stat.ethz.ch="" pipermail="" bioconductor="" 2010-j="" une="" 033849.html=""> >> ", >> which suggest: >> >> " >> time=1:10 >> design=model.matrix(~time) >> " >> >> In my case, is it correct to set this? >> >> time=rep(1:7,2) >> design=model.matrix(~time) >> >> >> Probably not. This implies that your time points are equally >> spaced, with one time period between them (e.g., you collected >> samples after 1,2,3,4,5,6,7 hours or days or some other period). >> If you didn't use equally spaced time points, you need to change >> to reflect that. >> >> In addition, you need to set the design matrix up to include the >> control/treatment comparison. If I assume you are in fact using >> seven equally spaced time intervals, then you would want: >> >> tm <- rep(1:7,2) >> trt <- factor(rep(0:1, each = 7)) >> >> design <- model.matrix(~tm+trt) >> >> And the trt1 coefficient estimates the difference in the >> intercepts, which is what you are looking for. In other words, >> this is fitting a model where you are allowing the two sample >> types to have different intercepts, but assuming the same slope. >> If the intercepts are different, then one sample type has overall >> higher expression than the other. >> >> You could also allow for different slopes by adding a >> time/treatment interaction term: >> >> design2 <- model.matrix(~tm*trt) >> >> Here the main coefficient of interest would be tm:trt1, which >> measures the difference in slope between the treatment and control. >> >> Best, >> >> Jim >> >> >> >> design >> >> (Intercept) time >> 1 1 1 >> 2 1 2 >> 3 1 3 >> 4 1 4 >> 5 1 5 >> 6 1 6 >> 7 1 7 >> 8 1 1 >> 9 1 2 >> 10 1 3 >> 11 1 4 >> 12 1 5 >> 13 1 6 >> 14 1 7 >> attr(,"assign") >> [1] 0 1 >> >> >> Then how to set the contrast matrix? >> >> >> Thanks! >> >> -- >> xuan >> >> >> >> [[alternative HTML version deleted]] >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org <mailto:bioconductor@r-**project.org<bioconductor@r-project.org> >> > >> >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: >> http://news.gmane.org/gmane.**science.biology.informatics.** >> conductor<http: news.gmane.org="" gmane.science.biology.informatics.c="" onductor=""> >> >> >> -- James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> >> >> >> -- >> xuan >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > -- Chunxuan Shao [[alternative HTML version deleted]]
Microarray GO Regression limma Microarray GO Regression limma • 1.6k views
ADD COMMENT

Login before adding your answer.

Traffic: 882 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6