Question: Hierarchical clustering and shrinking centroids...
15.1 years ago, Tan, MinHan wrote:
Dear list members,

I have been unable to resolve this conceptual problem. I performed hierarchical clustering on a filtered sample (cv=0.04, at least 2 samples > level of log 9) of 80 tumor samples, and obtained several groups. Some of these clusters were definitely more stable than others. Subsequently, based on visual inspection and my knowledge of the case outcomes, I arbitrarily classified one large cluster as 'good prognosis' and the other clusters as 'bad prognosis'.

Using this classification, I did a supervised analysis with PAMR to obtain a gene list. However, the cross-validation misclassification rate for my good prognosis class is fairly low and stable (<0.05) throughout the shrinking gene list, while the rate for my poor prognosis class is relatively higher, and also fairly stable (approx. 0.2). I examined the classification of my cases, and some 'poor prognosis' cases seemed to be persistently recognized as 'good prognosis' cases. Evidently, there is some problem with the classification arising from the choice of algorithm. I have tried k-nearest neighbour, and the same problem occurs.

Looking at the HC tree again, some of these good/bad prognosis genes are clustered together, suggesting other genes... I wonder how I may explain this. I suppose the clustering of these cases is determined by genes other than those differentiating between these two major groups.

Naturally, validation by an independent set is ideal, but I guess my question is more about this problem of cross-validation. I would appreciate any advice, or pointers to any references for this!

Thanks.
Min-Han Tan
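The per-class misclassification rates described in the question (low for one class, around 0.2 for the other) can be tabulated directly from cross-validated predictions. The following is a minimal sketch in plain Python rather than R; it does not use PAMR, the function name `per_class_error` is hypothetical, and the labels are made-up toy data, not values from the thread.

```python
from collections import Counter


def per_class_error(true_labels, predicted_labels):
    """Return {class: fraction of that class's samples predicted as another class}."""
    totals = Counter(true_labels)
    wrong = Counter(t for t, p in zip(true_labels, predicted_labels) if t != p)
    # Counter returns 0 for classes with no errors
    return {cls: wrong[cls] / totals[cls] for cls in totals}


# Hypothetical toy data: 5 'good' and 5 'poor' cases,
# with one 'poor' case predicted as 'good'.
true_labels = ["good"] * 5 + ["poor"] * 5
predicted = ["good"] * 5 + ["poor", "poor", "poor", "poor", "good"]

rates = per_class_error(true_labels, predicted)
print(rates)
```

Reporting the error per class, rather than one overall rate, makes an asymmetry like the one described here visible at a glance.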
Modified 15.1 years ago by Stephen Henderson • written 15.1 years ago by Tan, MinHan
Answer: Hierarchical clustering and shrinking centroids...
15.1 years ago, Tom R. Fahland wrote:
Tan,

I have been doing a lot of classification using PAMR, as well as LDA and SVMs. The overused phrase "the data is what it is" is valid here. I look at highly correlated samples that misclassify, and they are usually the same with different classification algorithms. Sometimes I also don't get really good stability with different gene lists. HC clustering uses simple correlation metrics, so starting from this can be problematic. I know I really didn't answer anything, but thought sharing my experience might help.

Tom
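One way to check whether the same samples misclassify under different algorithms is to intersect the sets of misclassified samples across methods. This is a minimal sketch in plain Python, assuming cross-validated predictions from each method are already available; the sample IDs, method names, labels, and the function `consistently_misclassified` are all hypothetical.

```python
def consistently_misclassified(true_labels, predictions_by_method):
    """Return the samples that every method in predictions_by_method gets wrong.

    true_labels: dict mapping sample ID -> true label
    predictions_by_method: dict mapping method name -> {sample ID -> predicted label}
    """
    miss_sets = [
        {s for s, label in true_labels.items() if preds[s] != label}
        for preds in predictions_by_method.values()
    ]
    return set.intersection(*miss_sets) if miss_sets else set()


# Hypothetical toy data, not taken from the thread
true_labels = {"s1": "good", "s2": "good", "s3": "poor", "s4": "poor", "s5": "poor"}
predictions_by_method = {
    "shrunken_centroids": {"s1": "good", "s2": "good", "s3": "poor", "s4": "good", "s5": "poor"},
    "knn": {"s1": "good", "s2": "poor", "s3": "poor", "s4": "good", "s5": "poor"},
}

stubborn = consistently_misclassified(true_labels, predictions_by_method)
print(sorted(stubborn))
```

Samples that land in the intersection are the ones worth inspecting individually, since their misclassification does not depend on the choice of classifier.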
Answer: Hierarchical clustering and shrinking centroids...
15.1 years ago, Stephen Henderson wrote:
Yes, I'm not sure why you have started with the clustering either (though it suggests that you are on the right track). You should classify the samples based on their actual outcome, not on whether they are in the imperfect good or bad cluster, and then try PAMR. Forgive me if I've misunderstood you.

There is a useful guide to using classification on array data (using e1071 svm) under the short courses page on Bioconductor, the Heidelberg Course Sept 2002. I found this helpful for getting started in R. The guide to the ipred package is also excellent.

Stephen

PS: A 0.2 error is reasonable, I think, for a tumour prognosis. No?
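Before training on actual outcomes, it can help to see how far the cluster-derived labels and the recorded outcomes disagree. The sketch below, in plain Python with made-up data, cross-tabulates the two labelings and lists the discordant cases; the sample IDs and the helper names `cross_tab` and `discordant` are hypothetical and not part of any package mentioned in the thread.

```python
from collections import Counter


def cross_tab(cluster_labels, outcomes):
    """Count samples for each (cluster-derived label, actual outcome) pair."""
    return Counter(zip(cluster_labels, outcomes))


def discordant(sample_ids, cluster_labels, outcomes):
    """Return IDs of samples whose cluster-derived label differs from the outcome."""
    return [s for s, c, o in zip(sample_ids, cluster_labels, outcomes) if c != o]


# Hypothetical toy data, not taken from the thread
sample_ids = ["t1", "t2", "t3", "t4", "t5", "t6"]
cluster_labels = ["good", "good", "good", "poor", "poor", "poor"]
outcomes = ["good", "good", "poor", "poor", "poor", "good"]

table = cross_tab(cluster_labels, outcomes)
mismatches = discordant(sample_ids, cluster_labels, outcomes)
print(table)
print(mismatches)
```

If the discordant cases overlap with the persistently misclassified ones, that would point to the labels rather than the classifier as the source of the error.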