Re: KNN, SVM, and randomForest - How to predict testing set without known categories (affy data)

Kasper Daniel Hansen ▴ 630

@kasper-daniel-hansen-459

Last seen 9.6 years ago

Adaikalavan Ramasamy <ramasamy@cancer.org.uk> writes:

> I do not know much about exprSet (please correct me if I am wrong) but I
> think of and treat an exprSet as a matrix. Indeed, in my previous message
> I was writing in the context of a matrix.
>
> data(affybatch.example)
> a <- rma(affybatch.example)
> m <- exprs(a)
>
> Then I work with 'm', which may or may not be what you want.
>
> If you want to force a matrix to exprSet, the examples in
> help("exprSet") might be helpful.

An exprSet is a matrix of expression values coupled with a data frame of covariates. If you (original poster) look at the aforementioned article, you will see that they use the original exprSet (let's call it Edata) in the following way:

  Xdata <- t(exprs(Edata))
  Ydata <- pData(Edata)["y-values"]

So you do not really need the exprSet object, as it is only used to get the matrix of expression values and the data frame of classes.

Now, given that you have a fit (which you have constructed using a training data set with known classes), you predict the classes with something like

  predict(fit, newdata=Xdata.test)

where Xdata.test is the expression matrix of the samples with unknown classes, built the same way as Xdata. No class labels are needed for that step.

I suggest looking at the code and trying to separate the different components.

/Kasper

> Regards, Adai.
>
>
> On Wed, 2004-07-28 at 14:09, Liu, Xin wrote:
>> Thanks Tom, Sean, Xavier for the reply, and especially Adai!
>> However, I still have a problem. To put the microarray data into these
>> supervised clustering methods, the exprSet needs to be built. To build
>> an exprSet, you need to give the class of every sample. So when I
>> predict samples with unknown classes, how do I put them into the
>> exprSet? Thank you!
>>
>> Xin
>>
>> -----Original Message-----
>> From: Adaikalavan Ramasamy [mailto:ramasamy@cancer.org.uk]
>> Sent: 28 July 2004 13:00
>> To: Liu, Xin
>> Cc: Tom R. Fahland; BioConductor mailing list
>> Subject: Re: [BioC] KNN, SVM, and randomForest - How to predict
>> testwithout known categories
>>
>>
>> If algorithm 1 predicts "Yes", "Yes", "No", "No" for 4 samples and
>> algorithm 2 predicts "Yes", "No", "Yes", "No", how do you know which one
>> is the better algorithm? So you use a test set with known classes to do
>> this. You can do this by breaking your learning set (samples with known
>> classes) into a training and a test set. Look up "cross validation".
>>
>> Some examples of built-in cross-validation:
>> * knn.cv() is a leave-one-out cross-validation of knn()
>> * svm() in library(e1071) has an argument named 'cross' for
>>   cross-validation
>> In practice, I prefer to write my own wrapper for cross-validation to
>> ensure that the sampling method is the same across all algorithms.
>>
>> Once you have determined the best algorithm and features, you then use
>> predict() to predict samples with unknown classes.
>>
>> Regards, Adai.
>>
>>
>> On Wed, 2004-07-28 at 09:18, Liu, Xin wrote:
>> > In R, before using KNN, SVM, and randomForest, an exprSet needs to be
>> > built, which requires the training set WITH known categories and the
>> > test set WITH known categories. However, by definition, in supervised
>> > learning you always train (with known categories), then predict the
>> > test set WITHOUT known categories. I wonder how to implement this.
>> > Thank you!
>> >
>> > Xin
>> >
>> > -----Original Message-----
>> > From: Tom R. Fahland [mailto:tfahland@genomatica.com]
>> > Sent: 27 July 2004 18:48
>> > To: Liu, Xin; bioconductor@stat.math.ethz.ch
>> > Subject: RE: [BioC] KNN, SVM,and randomForest - How to predict samples
>> > without category
>> >
>> >
>> > By definition, in supervised learning you always train (with known
>> > categories), then run your unbiased data through for prediction. Both CV
>> > and train/test partitions are good for choosing parameters and
>> > optimizing the algorithms. I have just completed a study predicting dose
>> > exposure with good results using different algorithms.
>> > Tom
>> >
>> > -----Original Message-----
>> > From: bioconductor-bounces@stat.math.ethz.ch
>> > [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Liu, Xin
>> > Sent: Tuesday, July 27, 2004 07:39
>> > To: bioconductor@stat.math.ethz.ch
>> > Subject: [BioC] KNN, SVM,and randomForest - How to predict samples
>> > without category
>> >
>> >
>> > Dear all,
>> >
>> > Supervised clustering methods (KNN, SVM, and randomForest) use a test
>> > sample set and a training sample set to do prediction. To create the
>> > exprSet, the category is needed for each sample. However, sometimes we
>> > need to predict a sample without its category. Does anybody have a clue
>> > how to do this? Thank you very much!
>> >
>> > Best regards,
>> > Xin LIU
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor@stat.math.ethz.ch
>> > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor

--
Kasper Daniel Hansen, Research Assistant
Department of Biostatistics, University of Copenhagen
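
For readers who want to try the workflow described in this thread, here is a minimal sketch in R. It is only an illustration under stated assumptions: the data are simulated with rnorm() rather than taken from a real exprSet, the object names (Xdata.train, Ydata.train, Xdata.test) and the choice of k = 3 are made up for the example, and knn() from the 'class' package stands in for whichever classifier you actually use. The point it tries to show is that the samples to be predicted enter only as an expression matrix, with no class labels attached.

```r
library(class)  # provides knn() and knn.cv()

set.seed(1)

# Simulated stand-in for t(exprs(Edata)): rows are samples, columns are genes.
# (Assumption: with real data these matrices would come from exprs().)
n.train <- 20
n.test  <- 5
n.genes <- 50

# Known classes for the training samples (stand-in for a pData() column).
Ydata.train <- factor(rep(c("A", "B"), each = n.train / 2))
Xdata.train <- matrix(rnorm(n.train * n.genes), nrow = n.train)

# Shift the first 10 genes in class "B" so the two classes differ.
is.B <- Ydata.train == "B"
Xdata.train[is.B, 1:10] <- Xdata.train[is.B, 1:10] + 3

# Samples with unknown classes: only an expression matrix, no labels.
Xdata.test <- matrix(rnorm(n.test * n.genes), nrow = n.test)

# Leave-one-out cross-validation on the labelled samples, to get a rough
# idea of how well the classifier does before trusting its predictions.
cv.pred     <- knn.cv(Xdata.train, cl = Ydata.train, k = 3)
cv.accuracy <- mean(cv.pred == Ydata.train)

# Predict the classes of the unlabelled samples.
test.pred <- knn(train = Xdata.train, test = Xdata.test,
                 cl = Ydata.train, k = 3)
```

knn() takes the test matrix directly, so there is no separate fit object. With svm() from e1071 or randomForest(), the equivalent last step would be the predict(fit, newdata = Xdata.test) call mentioned above, again with a test matrix that carries no class information.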

Microarray Clustering Category • 1.2k views

19.8 years ago Kasper Daniel Hansen ▴ 630
