Cox Model
@eleni-christodoulou-2653
Singapore
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080213/16216650/attachment.pl
A Gusnanto
@a-gusnanto-2493
Hi Eleni,

The analysis you have in mind has been described in Pawitan et al., Statistics in Medicine 2004; 23:1767-1780 (DOI: 10.1002/sim.1769). Prof. Pawitan gave me the R code for the analysis a long time ago, but I can't find it on my computer at present. I will look for it in my old archives, and if I find it, I'll let you know.

If you are interested in identifying significant genes using this type of analysis, no gene will turn up significant (in the sense of having a gene-wise 95% CI that excludes zero). This is due to the limited number of samples you use to estimate 18,000 parameters (a sparse variance-covariance matrix is involved). Further details are described in the paper.

Although I have never tried this, my suggestion would be to perform a survival analysis on each of the genes to get gene-wise p-values, and to control for false discoveries using the FDR.

Regards,
Arief

--
Dr. Arief Gusnanto
Dept. of Statistics, University of Leeds, Leeds LS2 9JT, United Kingdom

On Wed, 2008-02-13 at 09:10 +0200, Eleni Christodoulou wrote:
> Hello BioC-community,
>
> It's been a week now that I have been struggling with the implementation of a Cox model in R. I have 80 cancer patients, so 80 time measurements and 80 relapse indicators (the censoring status: 1 if the patient relapsed during the examined period, 0 if not). My microarray data contain around 18000 genes, so I have the expression of 18000 genes in each of the 80 tumors (an 80 x 18000 matrix). I would like to build a Cox model in order to retrieve the most significant genes (according to the p-value). The commands that I am using are:
>
>     test1 <- list(time, relapse, genes)
>     coxph(Surv(time, relapse) ~ genes, test1)
>
> where time is a vector of length 80 containing the times, relapse is a vector of length 80 containing the relapse values, and genes is an 80 x 18000 matrix. When I run the coxph command I get an error saying that it cannot allocate a vector of size 2.7Mb (on Windows). I also tried Linux, and there I get an error that the maximum memory has been reached. I increased the memory by starting R with the command:
>
>     R --min-vsize=10M --max-vsize=250M --min-nsize=1M --max-nsize=200M
>
> I don't think it can get better than that, because if I try, for example, max-vsize=300, the memory limit is stored as NA.
>
> Does anyone have any idea why this happens and how I can overcome it?
>
> I would be really grateful if you could help! It has been bothering me a lot!
>
> Thank you all,
> Eleni
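Arief's suggestion (one Cox model per gene, then FDR control) can be sketched in R as below. This is a minimal sketch, assuming the survival package and simulated data: `genes`, `time` and `relapse` are hypothetical stand-ins playing the same roles as in Eleni's question, with only 200 genes so it runs quickly. Each one-gene `coxph()` fit contributes its Wald p-value, and the Benjamini-Hochberg correction is applied with `p.adjust()`.

```r
library(survival)

set.seed(1)
n <- 80   # patients, as in the question
p <- 200  # genes; kept small here (the real data have about 18000)

# Hypothetical simulated data standing in for the poster's objects
genes <- matrix(rnorm(n * p), nrow = n,
                dimnames = list(NULL, paste0("gene", seq_len(p))))
time <- rexp(n, rate = exp(0.8 * genes[, "gene1"]))  # gene1 affects the hazard
relapse <- rbinom(n, 1, 0.7)                         # 1 = relapsed, 0 = censored

# One univariate Cox model per gene; only the Wald p-value is kept,
# so the fitted models do not accumulate in memory.
pvals <- apply(genes, 2, function(g) {
  fit <- coxph(Surv(time, relapse) ~ g)
  summary(fit)$coefficients[1, "Pr(>|z|)"]
})

# Benjamini-Hochberg adjustment to control the false discovery rate
qvals <- p.adjust(pvals, method = "BH")
head(sort(qvals))
```

Each fit involves a single covariate, so memory use stays small regardless of how many genes are looped over; the cost grows only linearly with the number of genes.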
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080213/06ff0712/attachment.pl
Ramon Diaz
@ramon-diaz-159
Dear Eleni,

You are trying to fit a model with 18000 covariates but only 80 samples (of which, at most, 80 are uncensored). Fitting it the way you are trying to is unlikely to work or make much sense...

You might want to take a look at the work of Torsten Hothorn and colleagues on survival ensembles, with implementations in the R package mboost, and their work on random forests for survival data (see the R package party). Some of this functionality is also accessible through our web-based tool SignS (http://signs.bioinfo.cnio.es), which uses the above packages.

Depending on your exact question, you might also want to look at the approach of Jelle Goeman for testing whether sets of genes (e.g., your complete set of 18000 genes) are related to the outcome of interest (survival in your case). Goeman's approach is available in the globaltest package from BioC.

Hope this helps,

R.

--
Ramón Díaz-Uriarte
Statistical Computing Team
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
http://ligarto.org/rdiaz
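A back-of-the-envelope calculation helps explain both the memory error and Ramon's point. This sketch assumes that a joint `coxph()` fit with p covariates must hold at least one dense p x p matrix of doubles (the information/variance matrix); it does not reproduce coxph's internals. The second part uses hypothetical random data to show that 80 samples cannot span more than 80 dimensions, so 18000 coefficients cannot all be determined from the data without some restriction.

```r
# Memory for one dense p x p matrix of doubles (8 bytes per entry)
p <- 18000
gib <- p^2 * 8 / 1024^3
round(gib, 2)  # about 2.41 GiB, before any working copies

# With n = 80 samples the covariate matrix has rank at most 80, so many
# different coefficient vectors give the same linear predictor.
set.seed(2)
n <- 80
x <- matrix(rnorm(n * 500), nrow = n)  # 500 hypothetical genes
qr(x)$rank  # 80
```

Raising R's memory limits therefore cannot fix the underlying problem: even with enough memory, the joint model is not estimable as specified.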
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080213/5e768ff3/attachment.pl
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080213/b56b0b96/attachment.pl
Eleni,

Note that some of the genes declared significant in a univariate analysis could be highly correlated. Thus, some of the selected genes would not be informative in building the multivariate model.

You might want to consider reducing the dimensionality by first grouping the genes into clusters with similar patterns. There are many techniques, but the one I can recall now is one of the earliest, called gene shaving.

Or you can pre-select some genes based on variability measures, etc.

Regards, Adai

Eleni Christodoulou wrote:
> Hi,
>
> Thanks for the replies. I will probably try to perform a survival analysis on each of the genes to get gene-wise p-values, then select the most significant ones (those below a certain p-value) and proceed to a full Cox regression using the significant genes. Do you think that this makes sense?
>
> Thanks a lot,
> Eleni
>
> On Feb 13, 2008 2:11 PM, <phguardiol at aol.com> wrote:
>
>> Hi,
>> wouldn't it make sense to first reduce the dimensionality of the data before undertaking such a survival analysis? Surely some of your genes have similar expression profiles across samples...
>> Best,
>> Philippe Guardiola
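Adai's two suggestions, pre-selecting genes by variability and grouping correlated genes, can be sketched in base R as below. This is not gene shaving itself: it substitutes ordinary average-linkage hierarchical clustering on a 1 - correlation distance, and the choices of 100 retained genes and 10 clusters are arbitrary assumptions applied to simulated data. Each cluster is summarised by the mean of its standardised genes, giving a small set of "meta-genes" that could then enter a multivariable Cox model.

```r
set.seed(3)
n <- 80
genes <- matrix(rnorm(n * 300), nrow = n,
                dimnames = list(NULL, paste0("gene", 1:300)))  # hypothetical data

# 1. Pre-select the most variable genes (keep the top 100 by variance).
v <- apply(genes, 2, var)
top <- genes[, order(v, decreasing = TRUE)[1:100]]

# 2. Cluster the retained genes by correlation across patients.
d <- as.dist(1 - cor(top))
hc <- hclust(d, method = "average")
cl <- cutree(hc, k = 10)

# 3. Average the standardised genes within each cluster: one column
#    per cluster, one row per patient.
z <- scale(top)
meta <- sapply(split(seq_len(ncol(z)), cl),
               function(idx) rowMeans(z[, idx, drop = FALSE]))
dim(meta)  # 80 x 10
```

With 10 meta-genes and 80 patients, a joint `coxph()` fit is at least computationally feasible, although the selection steps should be repeated inside any validation loop to avoid optimistic results.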
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080214/3664f9cc/attachment.pl
That's a good idea, as you can address the sample selection bias. You might also be interested in reading the following papers if you haven't done so already (there are others on similar topics):

Michiels S, Koscielny S, Hill C. Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet. 2005 Feb 5-11;365(9458):488-92. PMID: 15705458

Ein-Dor L, Kela I, Getz G, Givol D, Domany E. Outcome signature genes in breast cancer: is there a unique set? Bioinformatics. 2005 Jan 15;21(2):171-8. Epub 2004 Aug 12. PMID: 15308542

Regards, Adai

Eleni Christodoulou wrote:
> I was actually thinking of creating bootstrap samples and applying univariate Cox models to each of them. Then I would select the significant genes in each bootstrap sample, and I would declare the genes common to all the bootstrap samples as actually significant...
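Eleni's bootstrap idea can be sketched as follows, assuming the survival package and simulated data. Rather than requiring a gene to be selected in every bootstrap sample, the sketch records how often each gene passes a threshold; the threshold (p < 0.01), the number of resamples (B = 20) and the problem size are illustrative choices, not recommendations. Only a logical matrix of selections is kept, never the fitted models, which keeps memory use flat as B grows.

```r
library(survival)

set.seed(4)
n <- 80; p <- 50; B <- 20
genes <- matrix(rnorm(n * p), nrow = n,
                dimnames = list(NULL, paste0("gene", 1:p)))  # hypothetical data
time <- rexp(n, rate = exp(0.8 * genes[, "gene1"]))
relapse <- rbinom(n, 1, 0.7)

# Wald p-value of a univariate Cox model for each column of x
unicox_p <- function(x, tm, st) {
  apply(x, 2, function(g)
    summary(coxph(Surv(tm, st) ~ g))$coefficients[1, "Pr(>|z|)"])
}

selected <- matrix(FALSE, nrow = B, ncol = p,
                   dimnames = list(NULL, colnames(genes)))
for (b in seq_len(B)) {
  idx <- sample(n, replace = TRUE)                  # resample patients
  pv <- unicox_p(genes[idx, ], time[idx], relapse[idx])
  selected[b, ] <- pv < 0.01                        # illustrative threshold
}

# Proportion of bootstrap samples in which each gene was selected
freq <- colMeans(selected)
head(sort(freq, decreasing = TRUE))
```

Requiring `freq == 1` reproduces the "common to all bootstrap samples" rule; a softer cut-off on `freq` is a common alternative.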
Dear Eleni,

If you are interested in predicting survival with (a subset of) your 18000 genes, you may want to have a look at the "penalized" package on CRAN (http://cran.us.r-project.org/src/contrib/Descriptions/penalized.html) or at other packages there that do penalized estimation.

Jelle
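Jelle's suggestion is to fit a penalized Cox model. The penalized package's own interface is not reproduced here; as a dependency-free stand-in, this sketch uses the `ridge()` penalty term from the survival package, which fits an L2 (ridge) penalized Cox model. Unlike an L1 (lasso) penalty, it shrinks coefficients but does not set any of them exactly to zero. The data are simulated with more genes (120) than patients (80), and the penalty `theta = 1` is an arbitrary assumption; in practice it would be tuned, for example by cross-validation.

```r
library(survival)

set.seed(5)
n <- 80; p <- 120  # more genes than patients (hypothetical data)
dat <- as.data.frame(matrix(rnorm(n * p), nrow = n,
                            dimnames = list(NULL, paste0("gene", 1:p))))
dat$time <- rexp(n, rate = exp(0.8 * dat$gene1))
dat$relapse <- rbinom(n, 1, 0.7)

# Build: Surv(time, relapse) ~ ridge(gene1, gene2, ..., gene120, theta = 1)
gene_names <- paste0("gene", 1:p)
f <- as.formula(paste("Surv(time, relapse) ~ ridge(",
                      paste(gene_names, collapse = ", "), ", theta = 1)"))

fit <- coxph(f, data = dat)
beta <- coef(fit)
head(sort(abs(beta), decreasing = TRUE))
```

The penalty keeps the problem well-posed even though p > n. For very large p the dense p x p matrix still applies, which is one reason dedicated packages for penalized estimation exist.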
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080215/b92f7cf1/attachment.pl
On Fri, Feb 15, 2008 at 10:14 AM, Eleni Christodoulou <elenichri at gmail.com> wrote:
> Hi again,
>
> I am trying to create bootstrap samples of my training set and conduct a univariate Cox analysis on each of them. I tried both on Linux and on Windows. After 17 samples on Linux and 8 on Windows, the process is killed. Does anyone know why this happens?

Hi, Eleni.

You will probably need to post the code that you are using as well as any error messages. However, I might venture a guess that this is a memory-related issue.

Sean
On Feb 13, 2008, at 2:23 AM, Ramon Diaz-Uriarte wrote:
> Depending on your exact question, you might also want to look at the approach of Jelle Goeman for testing whether sets of genes (e.g., your complete set of 18000 genes) are related to the outcome of interest (survival in your case). Goeman's approach is available in the globaltest package from BioC.

Actually, you should look at Jelle's penalized package, which fits an L1-regularized version of the Cox model (something completely different from the globaltest approach). Using regularization in some way is probably your only hope if you want to fit a joint model instead of 18000 marginal models. I know that Jelle has an example with thousands of genes from a microarray experiment; I don't know whether the code scales to 18000.

What you are trying to do is certainly pretty ambitious, and you should spend some time understanding the issues if you want to tackle your problem successfully. Or you could just do 18000 marginal regressions, which should be easy.

Kasper