What is the best way in Bioconductor to find genes that have a
significant trend with a continuous variable, e.g. concentration or
time? This would be using microarray data and trying to find genes that
show a dose response or a time response. In the simplest case this
would be a linear regression. For example, I have an experiment with
time points 24, 48, 72 and 96, and I would like to find genes whose
expression increases with time, i.e. expression is greater at each
successive time point.

I have looked into doing this with limma, but the user manual only
seems to deal with time courses where each time is a factor rather
than a continuous variable.
--
**************************************************************
Daniel Brewer, Ph.D.
Institute of Cancer Research
Molecular Carcinogenesis
Email: daniel.brewer at icr.ac.uk
**************************************************************
The Institute of Cancer Research: Royal Cancer Hospital, a charitable
Company Limited by Guarantee, Registered in England under Company No.
534147 with its Registered Office at 123 Old Brompton Road, London SW7
3RP.
This e-mail message is confidential and for use by the a...
On Mon, May 19, 2008 at 6:39 AM, Daniel Brewer <daniel.brewer at icr.ac.uk> wrote:
> What is the best way in bioconductor to find genes that have a
> significant trend with a continuous variable e.g. concentration or time.
> This would be using microarray data and trying to find genes that show
> a dose response or a time response. In the simplest of cases this would
> be a linear regression. For example I have an experiment with time
> points 24,48,72,96 and I would like to find genes who have expression
> that increases with time i.e. expression is greater in each of the time
> points.
>
> I have looked into trying to do this with limma but the user manual only
> seems to deal with time courses with each time being a factor rather
> than a continuous variable.
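Limma will deal with continuous variables just fine. Just code the
variable as a number rather than as a factor, if you have continuous data.

library(limma)
# Toy data: 10 genes (rows) x 10 arrays (columns) of random values
genes <- matrix(rnorm(100), ncol=10)
# One continuous covariate value per array
var1 <- rnorm(10)
df <- data.frame(var1=var1)
# Design: intercept (column 1) and slope on var1 (column 2)
dm <- model.matrix(~ var1, data=df)
fit1 <- lmFit(genes, dm)
fit2 <- eBayes(fit1)
# coef=2 selects the var1 slope
topTable(fit2, coef=2)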
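Sean's toy example uses a random covariate. As a minimal sketch with the time points from the original question, assuming (hypothetically) three arrays per time point and simulated log-expression, the same `~ time` design can be fitted to every gene at once with base R's `lm()`, which accepts a matrix response. This gives ordinary rather than limma's moderated t-statistics; with limma you would pass `model.matrix(~ time)` to `lmFit()` and use `coef=2` as above.

```r
# Minimal sketch: per-gene linear trend on time with base R (no limma needed).
# Assumptions (hypothetical): 3 arrays at each of 24, 48, 72, 96; simulated data.
set.seed(1)
time <- rep(c(24, 48, 72, 96), each = 3)
n_genes <- 50
expr <- matrix(rnorm(n_genes * length(time), sd = 0.2), nrow = n_genes)  # genes x arrays
# Give genes 1-5 a true upward trend of 0.03 log-units per time unit.
expr[1:5, ] <- expr[1:5, ] + outer(rep(0.03, 5), time)

# lm() with a matrix response fits one regression per column, i.e. per gene.
fit <- lm(t(expr) ~ time)
slopes <- coef(fit)["time", ]
# Ordinary t-test p-value for the slope of each gene.
pvals <- sapply(summary(fit), function(s) coef(s)["time", "Pr(>|t|)"])
head(order(pvals), 5)  # indices of the genes with the strongest linear trend
```

Note that this only ranks genes by how well a straight line fits; it says nothing about whether a line is the right shape, which is the caveat Sean raises next.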
However, keep in mind the hypothesis you will be testing--that the
gene expression changes are linearly correlated with the variable.
While some genes may show this effect, there are probably plenty of
other important and interesting genes that will not fit this model.
The same reasoning holds for the dose-response relationship; if you
are lucky enough (or smart enough) to be on the linear portion of the
dose-response curve for one gene, you may be very far away from linear
for another gene.
So, to summarize, be sure that linearity is the appropriate model
before applying it; in biology, it might very well not be the correct
model for all genes.
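One way to check whether a straight line is adequate, when there are replicate arrays at each time point, is a classical lack-of-fit F-test: compare the linear model against a model with one mean per time point (time as a factor). The base-R sketch below uses simulated data in which genes 1-4 (hypothetically) jump between 24 and 48 and then plateau; it is an ordinary, not moderated, test.

```r
# Minimal sketch: per-gene lack-of-fit F-test of a straight line (base R).
# Assumptions (hypothetical): 3 arrays per time point; simulated data.
set.seed(4)
time <- rep(c(24, 48, 72, 96), each = 3)
n_genes <- 40
expr <- matrix(rnorm(n_genes * length(time), sd = 0.2), nrow = n_genes)
# Genes 1-4: rise between 24 and 48, then flat (clearly non-linear).
plateau <- c(0, 3, 3, 3)[match(time, c(24, 48, 72, 96))]
expr[1:4, ] <- expr[1:4, ] + outer(rep(1, 4), plateau)
Y <- t(expr)

rss_lin <- colSums(residuals(lm(Y ~ time))^2)          # straight line, 2 parameters
rss_fac <- colSums(residuals(lm(Y ~ factor(time)))^2)  # one mean per time point, 4 parameters
df_lof <- 4 - 2                # extra parameters in the factor model
df_pure <- length(time) - 4    # pure-error degrees of freedom
f_lof <- ((rss_lin - rss_fac) / df_lof) / (rss_fac / df_pure)
p_lof <- pf(f_lof, df_lof, df_pure, lower.tail = FALSE)
round(p_lof[1:4], 6)  # small values: the straight line does not fit these genes
```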
Sean
Thanks for that, that's exactly what I needed. I nearly got to the
same place by the time this email arrived, but had missed the coef=2
bit. So I assume you can fit any regression you like using this
approach, e.g. if you wanted to fit a quadratic:
dm <- model.matrix(~ poly(var1,2), data=df). The only problem I see
there is which coef you would look at in topTable. Any ideas?
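On which coef to use for the quadratic: `model.matrix(~ poly(var1, 2))` has an intercept plus two polynomial columns, so the trend is carried by columns 2 and 3 together. In limma, giving `topTable()` several coefficients (e.g. `coef=2:3`) reports a moderated F-statistic that tests them jointly. The base-R sketch below shows the same joint test as an ordinary per-gene F-test, on simulated data in which genes 1-4 (hypothetically) peak midway through the time course, and compares it with the slope-only test.

```r
# Minimal sketch: joint F-test of both quadratic-trend terms per gene (base R).
# Assumptions (hypothetical): 3 arrays per time point; simulated data.
set.seed(2)
time <- rep(c(24, 48, 72, 96), each = 3)
n_genes <- 40
expr <- matrix(rnorm(n_genes * length(time), sd = 0.2), nrow = n_genes)
# Genes 1-4: expression rises then falls, peaking midway between 24 and 96.
expr[1:4, ] <- expr[1:4, ] + outer(rep(1, 4), -2 * ((time - 60) / 36)^2)
Y <- t(expr)

rss_null <- colSums(residuals(lm(Y ~ 1))^2)              # intercept only
rss_quad <- colSums(residuals(lm(Y ~ poly(time, 2)))^2)  # intercept + 2 trend terms
df_res <- length(time) - 3
f_quad <- ((rss_null - rss_quad) / 2) / (rss_quad / df_res)
p_quad <- pf(f_quad, 2, df_res, lower.tail = FALSE)

# For comparison: the slope-only test on the same data.
p_lin <- sapply(summary(lm(Y ~ time)), function(s) coef(s)["time", "Pr(>|t|)"])
round(cbind(p_quad = p_quad[1:4], p_lin = p_lin[1:4]), 4)
```

For a symmetric peak like this the fitted slope is close to zero, so the linear test misses genes that the joint quadratic test picks up.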
So to summarise:
1) Use categorical definitions of time if you want to see whether there
is any change in expression with time.
2) Use regression if you want to determine whether genes have a
specific trend, e.g. linear, logarithmic, etc.
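The difference between these two options shows up directly in the design matrix. A short base-R illustration, using the question's time points and assuming (hypothetically) three arrays per time point: time as a factor gives one coefficient per later time point (3 df, a test of any change), while time as a number gives a single slope (1 df, a test of a linear trend).

```r
# Design matrices for the two options, with 3 hypothetical arrays per time point.
time <- rep(c(24, 48, 72, 96), each = 3)
# 1) Categorical: each later time point gets its own coefficient relative to 24.
d_factor <- model.matrix(~ factor(time))
# 2) Regression: a single slope on time.
d_linear <- model.matrix(~ time)
colnames(d_factor)  # "(Intercept)" "factor(time)48" "factor(time)72" "factor(time)96"
colnames(d_linear)  # "(Intercept)" "time"
```

With limma, fitting `d_factor` and calling `topTable(fit2, coef=2:4)` gives a moderated F-test for any change over time, while fitting `d_linear` and using `coef=2` tests the slope.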
Just one more question. If you had, say, tumour and control experiments,
is there a way to see whether the trends (say linear) are significantly
different? Or do contrasts not make much sense in this situation?
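One standard way to compare trends is an interaction model, in which the `group:time` coefficient is the tumour slope minus the control slope. A minimal base-R sketch on simulated data, assuming (hypothetically) the same time points in both groups with three arrays each; with limma you would pass the same `model.matrix(~ group * time)` design to `lmFit()` and give `topTable()` the interaction column, here column 4.

```r
# Minimal sketch: do tumour and control have different linear trends? (base R)
# Assumptions (hypothetical): same 4 time points in both groups, 3 arrays each; simulated data.
set.seed(3)
time <- rep(rep(c(24, 48, 72, 96), each = 3), times = 2)
group <- factor(rep(c("control", "tumour"), each = 12), levels = c("control", "tumour"))
n_genes <- 30
expr <- matrix(rnorm(n_genes * length(time), sd = 0.2), nrow = n_genes)
# Genes 1-3 rise with time in tumour only (slope 0.03); control stays flat.
tum <- group == "tumour"
expr[1:3, tum] <- expr[1:3, tum] + outer(rep(0.03, 3), time[tum])

design <- model.matrix(~ group * time)
colnames(design)  # "(Intercept)" "grouptumour" "time" "grouptumour:time"

fit <- lm(t(expr) ~ group * time)
# "grouptumour:time" (column 4) = tumour slope minus control slope.
p_int <- sapply(summary(fit), function(s) coef(s)["grouptumour:time", "Pr(>|t|)"])
head(order(p_int), 3)
```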
Thanks
--
**************************************************************
Daniel Brewer
Institute of Cancer Research
Molecular Carcinogenesis
MUCRC
15 Cotswold Road
Sutton, Surrey SM2 5NG
United Kingdom
Tel: +44 (0) 20 8722 4109
Fax: +44 (0) 20 8722 4141
Email: daniel.brewer at icr.ac.uk
**************************************************************
As Sean said, depending on how many time points you have, a linear
response may not be the best model. In fact, it is unclear that any
parametric model will work across all the genes in the data set.
Typically you are interested in genes that "increase over time" or
"decrease over time", without caring about the exact shape of the
function. Similar ideas apply to dose-response studies. You might want
to consider an approach using isotonic regression (which addresses
exactly this question), as described for microarrays in the paper:
Hu J, Kapoor M, Zhang W, Hamilton SR, Coombes KR. Analysis of
dose-response effects on gene expression data with comparison of two
microarray platforms. Bioinformatics. 2005 Sep 1;21(17):3524-9.
If you're interested, you can contact Jianhua Hu and ask her for the
code.
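A rough base-R sketch of the isotonic idea is below. It is not the method of Hu et al. (for that, Kevin suggests asking the authors); it fits a non-decreasing curve to each gene's time-point means with `stats::isoreg()`, scores the gene by the fraction of total variation that curve explains (an E-bar-squared style statistic), and gets a p-value by permuting the time labels. It assumes equal numbers of arrays per time point, because `isoreg()` takes no weights; the helper names and data are hypothetical.

```r
# Rough sketch of a permutation test for a monotone increasing trend (base R only).
# Assumes equal replicates per time point; mono_stat() and perm_pvalue() are hypothetical.
mono_stat <- function(y, time) {
  means <- as.numeric(tapply(y, time, mean))  # one mean per time point, increasing time order
  n_k <- as.numeric(table(time))              # arrays per time point (assumed equal)
  fit <- isoreg(seq_along(means), means)$yf   # best non-decreasing fit to the means
  grand <- mean(y)
  # Fraction of the total sum of squares explained by the monotone fit (0 to 1).
  sum(n_k * (fit - grand)^2) / sum((y - grand)^2)
}

perm_pvalue <- function(y, time, n_perm = 999) {
  obs <- mono_stat(y, time)
  null <- replicate(n_perm, mono_stat(y, sample(time)))  # shuffle time labels
  (1 + sum(null >= obs)) / (n_perm + 1)
}

set.seed(5)
time <- rep(c(24, 48, 72, 96), each = 3)
up <- rep(c(1, 2, 3, 4), each = 3) + rnorm(12, sd = 0.1)  # hypothetical rising gene
flat <- rnorm(12, sd = 0.1)                               # hypothetical null gene
c(up = perm_pvalue(up, time), flat = perm_pvalue(flat, time))
```

With 999 permutations the smallest attainable p-value is 0.001, so applying this gene by gene across a whole array would need more permutations (or a pooled null) before any multiple-testing correction.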
Best,
Kevin
The samr package will do this as well: choose
resp.type='Quantitative' when defining your sam object (see the
docs).
cheers
iain
_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
Dear Daniel
You could have a look at the maSigPro package. It treats time, or any
other continuous variable, as such and identifies genes with significant
changes in their expression trend, for one or more series.
Best regards
Ana Conesa
--
------------------------------------------
Ana Conesa, PhD
Bioinformatics and Genomics Department
Centro de Investigación Príncipe Felipe
Avda. Autopista Saler, 16
46012 Valencia Spain
http://bioinfo.cipf.es/aconesa