About label permutation test for binary classification
@james-anderson-1641
For a binary classification problem in microarray data, suppose you do random subsampling classification: each time, split the data into 80% training and 20% test with stratification (preserving the class ratio in each split), and repeat many times. Once you have results, one thing you would normally look at is how significantly your results differ from what you would get by chance; that is why people do a label permutation test. My question is: is the mean accuracy from the label permutation test equal to the proportion of the larger class? Say there are 80 normal vs. 20 disease samples: is the mean accuracy under label permutation equal to 80/(80+20), as long as you repeat enough times? And is this classifier-independent?

Thanks a lot!

James

(The original message was sent as an attachment scrubbed by the list archive: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070913/60759ef4/attachment.pl)
@joern-toedling-1244
Hello,

I am a bit puzzled about what you actually want to ask. James Anderson wrote:

> For a binary classification problem in microarray data, suppose you do random subsampling classification: each time, split the data into 80% training and 20% test with stratification (preserving the class ratio in each split), and repeat many times. Once you have results, one thing you would normally look at is how significantly your results differ from what you would get by chance; that is why people do a label permutation test. My question is: is the mean accuracy from the label permutation test equal to the proportion of the larger class? Say there are 80 normal vs. 20 disease samples: is the mean accuracy under label permutation equal to 80/(80+20), as long as you repeat enough times? And is this classifier-independent?

The mean accuracy of your classifier after label permutation, presumably in a cross-validation setting, depends very much on the classifier. What you should contrast it with is the accuracy of the naive classifier "assign every sample to the larger class", which is 80% in your case.

A good reason for label permutation in your case is to assess the classifier's generalizability: one can always construct a classifier that has 100% accuracy on the training data but performs badly on independent test data. That is one reason people combine label permutation with classification in a cross-validation setting, since the classifier's mean cross-validated accuracy gives a better estimate of its accuracy on test data. (You have to make sure that you do not use any aspect of the set-aside test data for training the classifier, though.)

An even better estimate of your classifier's performance, however, would be its accuracy on a completely independent test data set. Cross-validation on your training data could then be used to select the classifier's parameters, if needed.

Hope this helps.

Regards,
Joern

> Thanks a lot!
> James
> ---------------------------------
> Building a website is a piece of cake.

well, classification sometimes isn't.
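To illustrate the point above, here is a minimal simulation sketch (all data and split sizes are hypothetical, chosen to match the 80-normal/20-disease example). It uses a simple nearest-centroid classifier on random data, permutes the labels many times, and compares the mean permuted-label accuracy to the majority-class baseline; the null mean depends on the classifier and need not equal 0.8.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "microarray" data: 80 normal vs. 20 disease samples, 50 genes.
n_normal, n_disease, n_genes = 80, 20, 50
X = rng.normal(size=(n_normal + n_disease, n_genes))
y = np.array([0] * n_normal + [1] * n_disease)

def stratified_split(y, test_frac, rng):
    """80/20 train/test split preserving the class ratio, as in the question."""
    test = []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        test.extend(idx[: int(round(test_frac * len(idx)))])
    test = np.array(test)
    train = np.setdiff1d(np.arange(len(y)), test)
    return train, test

def nearest_centroid_accuracy(X, y, train_idx, test_idx):
    """Train a nearest-centroid classifier, return accuracy on the test split."""
    c0 = X[train_idx][y[train_idx] == 0].mean(axis=0)
    c1 = X[train_idx][y[train_idx] == 1].mean(axis=0)
    d0 = ((X[test_idx] - c0) ** 2).sum(axis=1)
    d1 = ((X[test_idx] - c1) ** 2).sum(axis=1)
    pred = (d1 < d0).astype(int)
    return float((pred == y[test_idx]).mean())

# Null distribution: permute the labels, then run the usual split/train/test.
null_acc = []
for _ in range(200):
    y_perm = rng.permutation(y)
    tr, te = stratified_split(y_perm, 0.2, rng)
    null_acc.append(nearest_centroid_accuracy(X, y_perm, tr, te))

majority_baseline = max(np.mean(y == 0), np.mean(y == 1))  # 0.8 here
print(f"mean permuted-label accuracy: {np.mean(null_acc):.3f}")
print(f"majority-class baseline:      {majority_baseline:.3f}")
```

In a real analysis, you would run the same resampling procedure on the *unpermuted* labels and locate that observed accuracy within this null distribution to obtain a permutation p-value, rather than comparing means alone.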
(A follow-up reply was sent as an attachment scrubbed by the list archive: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070913/a759ad1d/attachment.pl)