Packages for predicting a continuous variable from multiple continuous predictor variables


shirley zhang ★ 1.0k

@shirley-zhang-2038


Dear List,

I am attempting to predict the values of a continuous variable from several continuous predictor variables (gene expression values from microarrays). Can anyone recommend some packages?

Thanks,
Shirley


ADD COMMENT • link updated 14.7 years ago by Steve Lianoglou ★ 13k • written 14.7 years ago by shirley zhang ★ 1.0k


Steve Lianoglou ★ 13k

@steve-lianoglou-2771


United States

Hi,

On Sep 9, 2009, at 9:32 AM, shirley zhang wrote:

> Can anyone recommend some packages?

Isn't this just a regression problem?

There's a plethora of packages to look at, depending on the approach you're interested in:

http://cran.r-project.org/web/views/MachineLearning.html

Take your pick :-)

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD COMMENT • link 14.7 years ago Steve Lianoglou ★ 13k


Thanks Steve.

Sorry that I did not make myself clear. I am trying to build a biomarker from gene expression microarray data. What I am doing is similar to the weighted-voting algorithm or SVM, except that the outcome is a continuous variable instead of a categorical variable. It is a regression problem, but I want to know which package is best for this purpose? How about CART?

Thanks again,
Shirley

ADD REPLY • link 14.7 years ago shirley zhang ★ 1.0k


Hi Shirley,

On Sep 9, 2009, at 10:10 AM, shirley zhang wrote:

> It is a regression problem, but I want to know which package is best for this purpose? How about CART?

I don't know if there's such a thing as "best"(?) What yardstick would you use to measure that?

For instance, you mention "it" is similar to an SVM (how?), but SVMs can also be used for regression, not just classification (doable from both e1071 and kernlab). How about going that route? As usual, interpretation of the model might be challenging, though (which might be why you're avoiding it for biomarker discovery?)

You also mention weighted-voting:

* How about boosted regression models?
  http://cran.r-project.org/web/packages/gbm/index.html

* Also related to boosting: bagging and random forests (both can be used for regression):
  http://cran.r-project.org/web/packages/randomForest/index.html
  http://cran.r-project.org/web/packages/ipred/index.html

I think boosting/bagging/random forests tend to lead to more interpretable models, so maybe that's better for you?

There are also several penalized regression packages (also good for interpretability); for instance, glmnet is great:
http://cran.r-project.org/web/packages/glmnet/index.html

Maybe you have some info about the grouping of your predictors? Try grouped lasso:
http://cran.r-project.org/web/packages/grplasso/index.html

-steve

ADD REPLY • link 14.7 years ago Steve Lianoglou ★ 13k
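One possible answer to the "yardstick" question above is cross-validated prediction error. The following is a minimal Python sketch of that idea; it is not code from the thread and is not tied to any of the R packages mentioned. The helper names (`k_fold_indices`, `fit_mean`, `fit_line`, `cv_rmse`) and the toy data are hypothetical, and the two "models" are deliberately trivial stand-ins for whatever regression methods are being compared.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training outcomes."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Least-squares straight line on a single predictor."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_rmse(fit, xs, ys, k=5, seed=0):
    """Root-mean-squared error of `fit`, estimated on held-out folds only."""
    squared_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_errors.extend((ys[i] - model(xs[i])) ** 2 for i in held_out)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy data (made up): one predictor per sample and a noisy linear outcome.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

rmse_mean = cv_rmse(fit_mean, xs, ys)
rmse_line = cv_rmse(fit_line, xs, ys)
print(f"baseline RMSE: {rmse_mean:.2f}, line RMSE: {rmse_line:.2f}")
```

Because both candidates are scored on the same held-out folds, the comparison does not reward a method simply for fitting the training samples more closely.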


Hi Steve,

Thanks for your explanation and suggestions. I didn't know SVM could also be used for regression, since I have only used it for classification. I will try those methods you suggested. Do you have any experience with CART?

Thanks again,
Shirley

ADD REPLY • link 14.7 years ago shirley zhang ★ 1.0k


Hi,

On Sep 9, 2009, at 10:38 AM, shirley zhang wrote:

> Thanks for your explanation and suggestions. I didn't know SVM could also be used for regression, since I have only used it for classification.

Yeah, no problem. It's pretty straightforward to wire up an SVM for regression -- you'll have to run it a few times with different values of "epsilon" (like you would for C (or nu) in SVM classification).

If you're interested in some details/theory, here's a "brief tutorial" on support vector regression by Alex Smola and Bernhard Schölkopf:
http://eprints.pascal-network.org/archive/00002057/01/SmoSch03b.pdf

Let us know if you need help (but maybe R-help might be more appropriate?).

> I will try those methods you suggested. Do you have any experience with CART?

Nope, I've never used CART before, sorry.

-steve

ADD REPLY • link 14.7 years ago Steve Lianoglou ★ 13k
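To illustrate what the "epsilon" parameter controls in support vector regression, here is a small Python sketch of the epsilon-insensitive loss that the method is built on. It is not code from the thread and it does not fit an SVM; the function names and the observed/predicted values are made up for illustration.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors smaller than epsilon cost nothing; larger ones cost the excess."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def mean_loss(ys_true, ys_pred, epsilon):
    """Average epsilon-insensitive loss over paired observations."""
    losses = [
        epsilon_insensitive_loss(t, p, epsilon)
        for t, p in zip(ys_true, ys_pred)
    ]
    return sum(losses) / len(losses)


# Made-up observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 2.5, 2.0, 4.0]

# A wider tube (larger epsilon) ignores more of the residuals.
for eps in (0.0, 0.25, 0.5, 1.0):
    print(eps, round(mean_loss(observed, predicted, eps), 4))
```

A larger epsilon tolerates bigger residuals before they are penalized, which is why the fit changes as epsilon is varied and why trying several values is suggested above.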


Thanks a lot.

Shirley

--
Xiaoling (Shirley) Zhang
Ph.D. Candidate in Bioinformatics
Boston University, Boston, MA
Tel: (857) 233-9862
Email: zhangxl at bu.edu

ADD REPLY • link 14.7 years ago shirley zhang ★ 1.0k


Dear Shirley,

Have you tried random forests for the described task? Prediction quality is similar to that of SVMs, while there is (almost) no need to adjust the method's parameters. The default values of these parameters can be used for a variety of different problems without adjustment, subsetting, etc. Unfortunately, with SVMs you need to optimize/train at least 2-3 parameters. This task is time-consuming, and it could lead to severe overfitting problems. You could also try backpropagation neural networks, but this problem is even more pronounced in that case.

Zeljko Debeljak, PhD
CROATIA

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 14.7 years ago Zeljko Debeljak ▴ 50
