Entering edit mode
Chunxuan Shao
▴
30
@chunxuan-shao-5175
Last seen 10.2 years ago
Thanks very much for the thorough explanation, I will add tm:trt1 to
my
model to see expression changes in two condition.
Best wishes,
On Tue, Mar 20, 2012 at 7:51 PM, James W. MacDonald <jmacdon@uw.edu>
wrote:
> Hi Xuan,
>
>
> On 3/20/2012 2:13 PM, shao chunxuan wrote:
>
>> Hi Jim,
>>
>> Thanks for the reply, but I still have few more questions on the
>> regression model of limma.
>>
>> Time points are not evenly spreaded, the real experiments like
this:
>>
>> xx.1 <- c(0.5,1,3,6,12,24,72)
>> time <- rep(xx.1,2)
>>
>> The design matrix is
>> trt <- factor(rep(0:1, each = 7))
>> design=model.matrix(~time+trt)
>> colnames(design) <- c("Intercept","time","treat")
>>
>> > design
>> Intercept time treat
>> 1 1 0.5 0
>> 2 1 1.0 0
>> 3 1 3.0 0
>> 4 1 6.0 0
>> 5 1 12.0 0
>> 6 1 24.0 0
>> 7 1 72.0 0
>> 8 1 0.5 1
>> 9 1 1.0 1
>> 10 1 3.0 1
>> 11 1 6.0 1
>> 12 1 12.0 1
>> 13 1 24.0 1
>> 14 1 72.0 1
>> attr(,"assign")
>> [1] 0 1 2
>> attr(,"contrasts")
>> attr(,"contrasts")$trt
>> [1] "contr.treatment"
>>
>> Here is my understanding of what happens in limma
>>
>> In this case, Iimma is supposed to fit:
>> y=p0 + p1*time + p2*treat, and y=p0 + p1*time,
>> while time is quantitative variable, treat is group variable.
>> A F-statistic will be calculated and the corresponding pvalue will
be
>> used to determine significance.
>>
>> Is it right to understand in this way?
>>
>> The implementation is :
>> fit <- lmFit(ts.data, design) ## ts.data is the time-course data.
>> fit <- eBayes(fit)
>> xx.1 <- topTable(fit, coef="treat", adjust="BH")
>> xx.2 <- topTable(fit, coef="time", adjust="BH")
>>
>> So xx.1 means the differentially expressed genes between the two
>> treatments, after adjusting for the time effect, right? what does
xx.2 mean?
>>
>
> xx.2 gives you those genes where the slope of the fitted line is
> significantly different from zero. In other words, these are the
genes that
> appear to go up or down as time progresses.
>
> You should note two things here. First, since you don't have
replicates,
> we are forced to use the time points as continuous covariates rather
than
> factor levels. This means that you are looking for a linear response
to
> time (e.g., genes that go up or down more as time progresses). This
is not
> normally what one would want to do, as it is more likely that a gene
will
> go up for a bit and then go back down (or vice versa). To account
for that
> curvature in the expression, you might want to add a time^2
covariate as
> well.
>
> Second, linear regression is in some ways more complicated than
analysis
> of variance, and if you were just fitting a single model you would
be able
> to assess how well your model was conforming to the assumptions of
> regression, test to see if you need a quadratic time term, look for
data
> with high leverage that might be skewing your results, etc. For
ANOVA, you
> are in general just testing to see if the mean of the groups are
different.
>
> When doing microarray analysis, you just fit what you hope is a
generally
> reasonable model and let it rip on thousands of genes at once. There
is no
> way to check all the little details of a linear regression on
thousands of
> models, so you may (will?) have numerous genes that are clearly
> mis-specified by the model you fit, or have glaring inconsistencies
that
> cause them to appear significant when it is just that the model is
really
> wrong.
>
> So that is my long winded way of saying that you are not IMO
analyzing
> your data in an optimal way, but you are forced to do so by the lack
of
> replication. So for each 'significant' gene you find, you should
look at
> plots of expression vs time prior to attempting validation.
>
>
>
> I am more interested in finding genes changed in time course in
>> treatment, after adjusting for the control.
>> Is tm*trt in design2 what I need?
>>
>
> Not really. The model you are fitting here (xx.2) gives the genes
that
> change linearly with time, after adjusting for control. The
assumptions you
> are making are that
>
> a) Genes go up or down in a linear fashion (e.g., you can plot
expression
> values on the vertical axis and time on the horizontal axis and then
draw a
> line over the data, and it looks 'reasonable').
> b) The only difference between control and treatment is a difference
in
> the intercept, so you assume the same slope for control and treated
samples.
>
> adding a tm:trt1 coefficient allows the slopes to be different as
well.
> This coefficient tests for a significant difference between slopes
(for
> example, it might be that geneX goes up in treated, but down in
control).
>
> Best,
>
> Jim
>
>
>
>> Best,
>>
>>
>>
>> On Tue, Mar 20, 2012 at 2:48 PM, James W. MacDonald
<jmacdon@uw.edu<mailto:>> jmacdon@uw.edu>> wrote:
>>
>> Hi Xuan,
>>
>>
>> On 3/20/2012 6:50 AM, Chunxuan Shao wrote:
>>
>> Hi everyone:
>>
>> I have one microarray data set considering differentiation
in
>> a cell line. About 7 time points for both control and
>> treatment, no replicates
>> I would like to use limma to find the differentially
expressed
>> genes between time points for control and treatment, and
want
>> to compare the gene expression between control and treatment
>> for the same time point. But I don't know how to make the
>> right design and contrast matrix.
>>
>> After searching mail archive, the closest related answer is
>> "https://stat.ethz.ch/**pipermail/bioconductor/2010-**
>> June/033849.html<https: stat.ethz.ch="" pipermail="" bioconductor="" 2010-j="" une="" 033849.html="">
>> ",
>> which suggest:
>>
>> "
>> time=1:10
>> design=model.matrix(~time)
>> "
>>
>> In my case, is it correct to set this?
>>
>> time=rep(1:7,2)
>> design=model.matrix(~time)
>>
>>
>> Probably not. This implies that your time points are equally
>> spaced, with one time period between them (e.g., you collected
>> samples after 1,2,3,4,5,6,7 hours or days or some other period).
>> If you didn't use equally spaced time points, you need to change
>> to reflect that.
>>
>> In addition, you need to set the design matrix up to include the
>> control/treatment comparison. If I assume you are in fact using
>> seven equally spaced time intervals, then you would want:
>>
>> tm <- rep(1:7,2)
>> trt <- factor(rep(0:1, each = 7))
>>
>> design <- model.matrix(~tm+trt)
>>
>> And the trt1 coefficient estimates the difference in the
>> intercepts, which is what you are looking for. In other words,
>> this is fitting a model where you are allowing the two sample
>> types to have different intercepts, but assuming the same slope.
>> If the intercepts are different, then one sample type has
overall
>> higher expression than the other.
>>
>> You could also allow for different slopes by adding a
>> time/treatment interaction term:
>>
>> design2 <- model.matrix(~tm*trt)
>>
>> Here the main coefficient of interest would be tm:trt1, which
>> measures the difference in slope between the treatment and
control.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>> design
>>
>> (Intercept) time
>> 1 1 1
>> 2 1 2
>> 3 1 3
>> 4 1 4
>> 5 1 5
>> 6 1 6
>> 7 1 7
>> 8 1 1
>> 9 1 2
>> 10 1 3
>> 11 1 4
>> 12 1 5
>> 13 1 6
>> 14 1 7
>> attr(,"assign")
>> [1] 0 1
>>
>>
>> Then how to set the contrast matrix?
>>
>>
>> Thanks!
>>
>> --
>> xuan
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________**_________________
>> Bioconductor mailing list
>> Bioconductor@r-project.org
<mailto:bioconductor@r-**project.org<bioconductor@r-project.org>
>> >
>>
>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.ethz.ch="" mailman="" listinfo="" bioconductor="">
>> Search the archives:
>> http://news.gmane.org/gmane.**science.biology.informatics.**
>> conductor<http: news.gmane.org="" gmane.science.biology.informatics.c="" onductor="">
>>
>>
>> -- James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>>
>>
>>
>> --
>> xuan
>>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
--
Chunxuan Shao
[[alternative HTML version deleted]]