KNN, SVM, and randomForest - How to predict testing set without known categories (affy data)
Liu, Xin ▴ 120
@liu-xin-811
Last seen 9.7 years ago
Thanks Tom, Sean, Xavier for the replies, and especially Adai!

However, I still have a problem. To put the microarray data into these supervised clustering methods, an exprSet needs to be built, and to build the exprSet you need to give the class of every sample. So when I want to predict samples with unknown classes, how do I put them into the exprSet? Thank you!

Xin

-----Original Message-----
From: Adaikalavan Ramasamy [mailto:ramasamy@cancer.org.uk]
Sent: 28 July 2004 13:00
To: Liu, Xin
Cc: Tom R. Fahland; BioConductor mailing list
Subject: Re: [BioC] KNN, SVM, and randomForest - How to predict test without known categories

If algorithm 1 predicts "Yes", "Yes", "No", "No" for 4 samples and algorithm 2 predicts "Yes", "No", "Yes", "No", how do you know which one is the better algorithm? You use a test set with known classes to decide this. You can do so by breaking your learning set (samples with known classes) into a training set and a test set. Look up "cross validation".

Some examples of built-in cross-validation:
* knn.cv() is a leave-one-out cross-validation of knn()
* svm() in library(e1071) has an argument named 'cross' for cross-validation

In practice, I prefer to write my own wrapper for cross-validation to ensure that the sampling method is the same across all algorithms.

Once you have determined the best algorithm and features, you then use predict() to predict samples with unknown classes.

Regards, Adai.

On Wed, 2004-07-28 at 09:18, Liu, Xin wrote:
> In R, before using KNN, SVM, and randomForest, an exprSet needs to be built, which requires the training set WITH known categories and the test set WITH known categories. However, by definition, in supervised learning you always train (with known categories), then predict the test set WITHOUT known categories. I wonder how to implement this. Thank you!
>
> Xin
>
> -----Original Message-----
> From: Tom R. Fahland [mailto:tfahland@genomatica.com]
> Sent: 27 July 2004 18:48
> To: Liu, Xin; bioconductor@stat.math.ethz.ch
> Subject: RE: [BioC] KNN, SVM, and randomForest - How to predict samples without category
>
> By definition, in supervised learning you always train (with known categories), then run your unbiased data through for prediction. Both CV and train/test partitions are good for choosing parameters and optimizing the algorithms. I have just completed a study predicting dose exposure with good results using different algorithms.
>
> Tom
>
> -----Original Message-----
> From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Liu, Xin
> Sent: Tuesday, July 27, 2004 07:39
> To: bioconductor@stat.math.ethz.ch
> Subject: [BioC] KNN, SVM, and randomForest - How to predict samples without category
>
> Dear all,
>
> Supervised clustering methods (KNN, SVM, and randomForest) use a test sample set and a training sample set to do prediction. To create the exprSet, the category is needed for each sample. However, sometimes we need to predict a sample without knowing its category. Does anybody have a clue how to do this? Thank you very much!
>
> Best regards,
> Xin LIU
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
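The cross-validation advice quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: the learning set is simulated rather than taken from real arrays, the helper name `cv_accuracy` and the fold layout are hypothetical, and only knn() and knn.cv() from the `class` package are used. The point it tries to show is that the fold assignment is drawn once and then reused for every classifier being compared.

```r
library(class)  # knn() and knn.cv(); a recommended package shipped with R

set.seed(1)

# Simulated learning set (assumption): 40 samples x 20 features, classes known
n <- 40
p <- 20
y <- factor(rep(c("No", "Yes"), each = n / 2))
x <- matrix(rnorm(n * p), nrow = n)
x[y == "Yes", 1:5] <- x[y == "Yes", 1:5] + 2  # shift 5 features in class "Yes"

# Built-in leave-one-out cross-validation for k-nearest neighbours
loo_pred <- knn.cv(train = x, cl = y, k = 3)
loo_acc <- mean(loo_pred == y)

# Hand-written K-fold wrapper (hypothetical helper): the fold assignment is
# drawn once, so every classifier is evaluated on exactly the same splits
folds <- sample(rep(1:5, length.out = n))

cv_accuracy <- function(fit_predict) {
  pred <- factor(rep(NA, n), levels = levels(y))
  for (f in sort(unique(folds))) {
    test <- folds == f
    pred[test] <- fit_predict(x[!test, , drop = FALSE], y[!test],
                              x[test, , drop = FALSE])
  }
  mean(pred == y)
}

# Compare two candidate settings on identical folds
acc_k1 <- cv_accuracy(function(xtr, ytr, xte) knn(xtr, xte, ytr, k = 1))
acc_k5 <- cv_accuracy(function(xtr, ytr, xte) knn(xtr, xte, ytr, k = 5))

print(c(loo = loo_acc, k1 = acc_k1, k5 = acc_k5))
```

Other classifiers could be plugged into the same wrapper by passing a different function that fits on the training rows and returns predicted classes for the test rows.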
Microarray Clustering Category • 1.6k views
@adaikalavan-ramasamy-675
Last seen 9.7 years ago
I do not know much about exprSet (please correct me if I am wrong), but I think of and treat an exprSet as a matrix. Indeed, in my previous message I was writing in the context of a matrix.

    library(affy)              # provides rma(); loads Biobase for exprs()
    data(affybatch.example)    # example AffyBatch object
    a <- rma(affybatch.example)  # normalised expression values
    m <- exprs(a)              # plain matrix: probesets in rows, samples in columns

Then I work with 'm', which may or may not be what you want. If you want to force a matrix into an exprSet, the examples in help("exprSet") might be helpful.

Regards, Adai.

On Wed, 2004-07-28 at 14:09, Liu, Xin wrote:
> Thanks Tom, Sean, Xavier for the replies, and especially Adai!
>
> However, I still have a problem. To put the microarray data into these supervised clustering methods, an exprSet needs to be built, and to build the exprSet you need to give the class of every sample. So when I want to predict samples with unknown classes, how do I put them into the exprSet? Thank you!
>
> Xin
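A minimal sketch of the matrix-based approach, under stated assumptions: the matrix `m` below is simulated as a stand-in for the result of exprs(), the sample names and class labels are made up, and knn() from the `class` package is used as the classifier. It is meant to show that class labels are needed only for the training samples, while the unlabelled samples are passed as the test matrix.

```r
library(class)  # knn(); a recommended package shipped with R

set.seed(2)

# Stand-in for m <- exprs(a) (simulated, assumption):
# genes in rows, samples in columns
n_genes <- 50
m <- matrix(rnorm(n_genes * 12), nrow = n_genes,
            dimnames = list(paste0("gene", 1:n_genes),
                            paste0("sample", 1:12)))
# Shift 10 genes in some samples so the two groups differ
m[1:10, c(5:8, 11:12)] <- m[1:10, c(5:8, 11:12)] + 3

# Classes are known only for the first 8 samples; samples 9 to 12 are unknown
known_class <- factor(c(rep("No", 4), rep("Yes", 4)))
labelled <- colnames(m)[1:8]
unknown <- colnames(m)[9:12]

# Classifiers expect samples in rows, so transpose the expression matrix
train_x <- t(m[, labelled])
new_x <- t(m[, unknown])

# Train on the labelled samples and predict the unlabelled ones
pred <- knn(train = train_x, test = new_x, cl = known_class, k = 3)
result <- data.frame(sample = unknown, predicted = pred)
print(result)
```

The same split should carry over to svm() in e1071 or randomForest(): fit the model on `train_x` and `known_class`, then call predict() on the fitted model with `new_x` as the new data.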