Question

design and contrast matrix for limma time series without replicates

0

Entering edit mode

Chunxuan Shao ▴ 30

@chunxuan-shao-5175

Last seen 10.6 years ago

Thanks very much for the thorough explanation, I will add tm:trt1 to my model to see expression changes in two condition. Best wishes, On Tue, Mar 20, 2012 at 7:51 PM, James W. MacDonald <jmacdon@uw.edu> wrote: > Hi Xuan, > > > On 3/20/2012 2:13 PM, shao chunxuan wrote: > >> Hi Jim, >> >> Thanks for the reply, but I still have few more questions on the >> regression model of limma. >> >> Time points are not evenly spreaded, the real experiments like this: >> >> xx.1 <- c(0.5,1,3,6,12,24,72) >> time <- rep(xx.1,2) >> >> The design matrix is >> trt <- factor(rep(0:1, each = 7)) >> design=model.matrix(~time+trt) >> colnames(design) <- c("Intercept","time","treat") >> >> > design >> Intercept time treat >> 1 1 0.5 0 >> 2 1 1.0 0 >> 3 1 3.0 0 >> 4 1 6.0 0 >> 5 1 12.0 0 >> 6 1 24.0 0 >> 7 1 72.0 0 >> 8 1 0.5 1 >> 9 1 1.0 1 >> 10 1 3.0 1 >> 11 1 6.0 1 >> 12 1 12.0 1 >> 13 1 24.0 1 >> 14 1 72.0 1 >> attr(,"assign") >> [1] 0 1 2 >> attr(,"contrasts") >> attr(,"contrasts")$trt >> [1] "contr.treatment" >> >> Here is my understanding of what happens in limma >> >> In this case, Iimma is supposed to fit: >> y=p0 + p1*time + p2*treat, and y=p0 + p1*time, >> while time is quantitative variable, treat is group variable. >> A F-statistic will be calculated and the corresponding pvalue will be >> used to determine significance. >> >> Is it right to understand in this way? >> >> The implementation is : >> fit <- lmFit(ts.data, design) ## ts.data is the time-course data. >> fit <- eBayes(fit) >> xx.1 <- topTable(fit, coef="treat", adjust="BH") >> xx.2 <- topTable(fit, coef="time", adjust="BH") >> >> So xx.1 means the differentially expressed genes between the two >> treatments, after adjusting for the time effect, right? what does xx.2 mean? >> > > xx.2 gives you those genes where the slope of the fitted line is > significantly different from zero. In other words, these are the genes that > appear to go up or down as time progresses. > > You should note two things here. First, since you don't have replicates, > we are forced to use the time points as continuous covariates rather than > factor levels. This means that you are looking for a linear response to > time (e.g., genes that go up or down more as time progresses). This is not > normally what one would want to do, as it is more likely that a gene will > go up for a bit and then go back down (or vice versa). To account for that > curvature in the expression, you might want to add a time^2 covariate as > well. > > Second, linear regression is in some ways more complicated than analysis > of variance, and if you were just fitting a single model you would be able > to assess how well your model was conforming to the assumptions of > regression, test to see if you need a quadratic time term, look for data > with high leverage that might be skewing your results, etc. For ANOVA, you > are in general just testing to see if the mean of the groups are different. > > When doing microarray analysis, you just fit what you hope is a generally > reasonable model and let it rip on thousands of genes at once. There is no > way to check all the little details of a linear regression on thousands of > models, so you may (will?) have numerous genes that are clearly > mis-specified by the model you fit, or have glaring inconsistencies that > cause them to appear significant when it is just that the model is really > wrong. > > So that is my long winded way of saying that you are not IMO analyzing > your data in an optimal way, but you are forced to do so by the lack of > replication. So for each 'significant' gene you find, you should look at > plots of expression vs time prior to attempting validation. > > > > I am more interested in finding genes changed in time course in >> treatment, after adjusting for the control. >> Is tm*trt in design2 what I need? >> > > Not really. The model you are fitting here (xx.2) gives the genes that > change linearly with time, after adjusting for control. The assumptions you > are making are that > > a) Genes go up or down in a linear fashion (e.g., you can plot expression > values on the vertical axis and time on the horizontal axis and then draw a > line over the data, and it looks 'reasonable'). > b) The only difference between control and treatment is a difference in > the intercept, so you assume the same slope for control and treated samples. > > adding a tm:trt1 coefficient allows the slopes to be different as well. > This coefficient tests for a significant difference between slopes (for > example, it might be that geneX goes up in treated, but down in control). > > Best, > > Jim > > > >> Best, >> >> >> >> On Tue, Mar 20, 2012 at 2:48 PM, James W. MacDonald <jmacdon@uw.edu<mailto:>> jmacdon@uw.edu>> wrote: >> >> Hi Xuan, >> >> >> On 3/20/2012 6:50 AM, Chunxuan Shao wrote: >> >> Hi everyone: >> >> I have one microarray data set considering differentiation in >> a cell line. About 7 time points for both control and >> treatment, no replicates >> I would like to use limma to find the differentially expressed >> genes between time points for control and treatment, and want >> to compare the gene expression between control and treatment >> for the same time point. But I don't know how to make the >> right design and contrast matrix. >> >> After searching mail archive, the closest related answer is >> "https://stat.ethz.ch/**pipermail/bioconductor/2010-** >> June/033849.html<https: stat.ethz.ch="" pipermail="" bioconductor="" 2010-j="" une="" 033849.html=""> >> ", >> which suggest: >> >> " >> time=1:10 >> design=model.matrix(~time) >> " >> >> In my case, is it correct to set this? >> >> time=rep(1:7,2) >> design=model.matrix(~time) >> >> >> Probably not. This implies that your time points are equally >> spaced, with one time period between them (e.g., you collected >> samples after 1,2,3,4,5,6,7 hours or days or some other period). >> If you didn't use equally spaced time points, you need to change >> to reflect that. >> >> In addition, you need to set the design matrix up to include the >> control/treatment comparison. If I assume you are in fact using >> seven equally spaced time intervals, then you would want: >> >> tm <- rep(1:7,2) >> trt <- factor(rep(0:1, each = 7)) >> >> design <- model.matrix(~tm+trt) >> >> And the trt1 coefficient estimates the difference in the >> intercepts, which is what you are looking for. In other words, >> this is fitting a model where you are allowing the two sample >> types to have different intercepts, but assuming the same slope. >> If the intercepts are different, then one sample type has overall >> higher expression than the other. >> >> You could also allow for different slopes by adding a >> time/treatment interaction term: >> >> design2 <- model.matrix(~tm*trt) >> >> Here the main coefficient of interest would be tm:trt1, which >> measures the difference in slope between the treatment and control. >> >> Best, >> >> Jim >> >> >> >> design >> >> (Intercept) time >> 1 1 1 >> 2 1 2 >> 3 1 3 >> 4 1 4 >> 5 1 5 >> 6 1 6 >> 7 1 7 >> 8 1 1 >> 9 1 2 >> 10 1 3 >> 11 1 4 >> 12 1 5 >> 13 1 6 >> 14 1 7 >> attr(,"assign") >> [1] 0 1 >> >> >> Then how to set the contrast matrix? >> >> >> Thanks! >> >> -- >> xuan >> >> >> >> [[alternative HTML version deleted]] >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org <mailto:bioconductor@r-**project.org<bioconductor@r-project.org> >> > >> >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: >> http://news.gmane.org/gmane.**science.biology.informatics.** >> conductor<http: news.gmane.org="" gmane.science.biology.informatics.c="" onductor=""> >> >> >> -- James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> >> >> >> -- >> xuan >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > -- Chunxuan Shao [[alternative HTML version deleted]]

Microarray GO Regression limma Microarray GO Regression limma • 1.6k views

ADD COMMENT • link 12.9 years ago Chunxuan Shao ▴ 30