Finding genes with significant linear regression or trend
Daniel Brewer ★ 1.9k
@daniel-brewer-1791
Last seen 10.3 years ago
What is the best way in Bioconductor to find genes that show a significant trend with a continuous variable, e.g. concentration or time? This would be using microarray data, looking for genes that show a dose response or a time response. In the simplest case this would be a linear regression. For example, I have an experiment with time points 24, 48, 72 and 96, and I would like to find genes whose expression increases with time, i.e. expression is greater at each successive time point.

I have looked into doing this with limma, but the user manual only seems to deal with time courses where each time is a factor rather than a continuous variable.

Daniel Brewer, Ph.D.
Institute of Cancer Research, Molecular Carcinogenesis
Microarray Regression Cancer limma DOSE • 3.3k views
@sean-davis-490
Last seen 4 months ago
United States
Limma handles continuous variables just fine. If your covariate is continuous, store it as a number rather than a factor and put it in the design matrix:

    library(limma)
    # toy data: 10 genes (rows) x 10 arrays (columns)
    genes <- matrix(rnorm(100), ncol = 10)
    # continuous covariate, one value per array
    var1 <- rnorm(10)
    df <- data.frame(var1 = var1)
    dm <- model.matrix(~ var1, data = df)
    fit1 <- lmFit(genes, dm)
    fit2 <- eBayes(fit1)
    # coef = 2 is the slope for var1 (coef = 1 is the intercept)
    topTable(fit2, coef = 2)

However, keep in mind the hypothesis you will be testing: that gene expression changes linearly with the variable. While some genes may show this effect, there are probably plenty of other important and interesting genes that will not fit this model. The same reasoning holds for the dose-response relationship; if you are lucky enough (or smart enough) to be on the linear portion of the dose-response curve for one gene, you may be very far from linear for another gene.

So, to summarize, be sure that linearity is the appropriate model before applying it; in biology, it might very well not be the correct model for all genes.

Sean
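For the design described in the question (time points 24, 48, 72 and 96), the same approach might look like the sketch below. The three replicates per time point, the simulated matrix, the gene names and the 0.05 cutoff are all assumptions made for illustration; none of them come from the thread.

```r
library(limma)

set.seed(1)
# Assumed design: 3 replicate arrays at each time point
time <- rep(c(24, 48, 72, 96), each = 3)

# Simulated log-expression: 100 genes x 12 arrays of pure noise
expr <- matrix(rnorm(100 * length(time)), nrow = 100)
rownames(expr) <- paste0("gene", 1:100)
# Give the first gene a real upward trend (0.1 log units per hour)
expr[1, ] <- expr[1, ] + 0.1 * time

# time is numeric, so the design has two columns: (Intercept) and time
design <- model.matrix(~ time)
fit <- eBayes(lmFit(expr, design))

# logFC here is the fitted slope per hour for each gene
tab <- topTable(fit, coef = "time", number = Inf)

# "Increases with time" read as: significant slope with a positive sign
increasing <- tab[tab$adj.P.Val < 0.05 & tab$logFC > 0, ]
```

Note that a significant positive slope is weaker than "greater at each successive time point": a gene can have a positive linear trend without being strictly monotone, so the fitted profiles are worth inspecting.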
Thanks for that, that's exactly what I needed. I had nearly got to the same place by the time this email arrived, but had missed the coef=2 bit.

So I assume that you can fit any regression you like using this approach, e.g. for a quadratic, dm <- model.matrix(~ poly(var1,2), data=df). The only problem I see there is which coef to look at in topTable. Any ideas?

So to summarise:
1) Use categorical definitions of time if you want to see whether there is any change in expression with time.
2) Use regression if you want to determine whether genes follow a specific trend, e.g. linear, logarithmic etc.

Just one more question. If you had, say, tumour and control experiments, is there a way to see whether the trends (say linear) are significantly different? Or do contrasts not make much sense in this situation?

Thanks,
Daniel Brewer
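Neither of these two questions is answered in the thread, so the following is only one possible reading, sketched on simulated data. The group labels, replicate layout and variable names are assumptions for illustration. For the quadratic model, passing both polynomial columns to topTable gives an F-test for any linear-or-quadratic trend, while the last column alone tests the quadratic term. For tumour versus control, an interaction term between group and time tests whether the two slopes differ.

```r
library(limma)

set.seed(2)
# Assumed design: tumour and control, 2 arrays per group per time point
time  <- rep(c(24, 48, 72, 96), times = 4)
group <- factor(rep(c("control", "tumour"), each = 8))

# Simulated log-expression: 50 genes x 16 arrays
expr <- matrix(rnorm(50 * 16), nrow = 50)
rownames(expr) <- paste0("gene", 1:50)

# (a) Quadratic trend (group ignored here to keep the example small)
dq <- model.matrix(~ poly(time, 2))   # (Intercept), linear, quadratic
fq <- eBayes(lmFit(expr, dq))
any_trend <- topTable(fq, coef = 2:3, number = Inf)  # F-test on both terms
curvature <- topTable(fq, coef = 3, number = Inf)    # quadratic term only

# (b) Do tumour and control have different linear trends?
di <- model.matrix(~ group * time)
fi <- eBayes(lmFit(expr, di))
# The interaction coefficient is the tumour slope minus the control slope
slope_diff <- topTable(fi, coef = "grouptumour:time", number = Inf)
```

Because poly() produces orthogonal polynomials by default, the coefficient in column 3 measures curvature beyond the linear trend, so a gene can rank highly in any_trend while showing nothing in curvature.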
As Sean said, depending on how many time points you have, a linear response may not be the best model. In fact, it is unclear that any single parametric model will work across all the genes in the data set. Typically, you are interested in genes that "increase over time" or "decrease over time", without caring about the exact shape of the function. Similar ideas apply to dose-response studies.

You might want to consider an approach using isotonic regression (which addresses exactly this question), as described for microarrays in the paper:

Hu J, Kapoor M, Zhang W, Hamilton SR, Coombes KR. Analysis of dose-response effects on gene expression data with comparison of two microarray platforms. Bioinformatics. 2005 Sep 1;21(17):3524-9.

If you're interested, you can contact Jianhua Hu and ask her for the code.

Best,
Kevin
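The idea of testing for "increasing over time" without fixing a shape can be illustrated with isoreg() from base R. This is a rough sketch on one simulated gene, not the method of Hu et al.; the helper iso_r2, the replicate layout and the permutation scheme are assumptions chosen for illustration.

```r
set.seed(3)
# Assumed design: 3 replicates at each of the four time points
time <- rep(c(24, 48, 72, 96), each = 3)
# One simulated gene that rises and then plateaus (monotone, not linear)
y <- c(0, 1.5, 2, 2)[rep(1:4, each = 3)] + rnorm(12, sd = 0.3)

# Hypothetical helper: share of variance explained by the best
# non-decreasing fit to y as a function of x
iso_r2 <- function(x, y) {
  o <- order(x)
  fit <- isoreg(x[o], y[o])   # fit$yf lines up with y[o]
  1 - sum((y[o] - fit$yf)^2) / sum((y[o] - mean(y))^2)
}

obs <- iso_r2(time, y)

# Permutation null: shuffle the time labels and refit
perm <- replicate(999, iso_r2(sample(time), y))
p_value <- (1 + sum(perm >= obs)) / (1 + length(perm))
```

Replicates sharing a time point are simply taken in the order given, which a fuller treatment would handle by fitting weighted group means. To test for decreasing genes, the same helper can be applied to -y. Running this per gene across an array would also need a multiple-testing correction.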
The samr package will do this as well. Choose resp.type="Quantitative" when creating your SAM object (see the docs).

Cheers,
Iain
Ana Conesa ▴ 340
@ana-conesa-2156
Last seen 10.3 years ago
Dear Daniel,

You could have a look at the maSigPro package. It treats time, or any other continuous variable, as such and identifies genes with significant changes in their expression trend for one or more series.

Best regards,
Ana Conesa, PhD
Bioinformatics and Genomics Department
Centro de Investigación Príncipe Felipe, Valencia, Spain
http://bioinfo.cipf.es/aconesa