Question

Where did classification work better?


santana sarma ▴ 80

@santana-sarma-3163

Last seen 10.2 years ago

Hi All,

Having applied ONE clustering method separately to TWO (similar) types of datasets, I wonder how

a) I can determine where the method worked better (not merely based on visualizing the plots)!
b) I can retrieve the clusters along with their respective contents. For example: if two clusters A & B are found and the clusters contain different genes, how to access & save the genes of A and B.

    # Hierarchical clustering
    Genes <- read.csv(file = "xy.csv", header = TRUE)
    Genes_2 <- read.csv(file = "ab.csv", header = TRUE)
    Hierarchy1 <- clust.cor.patient <- hclust(as.dist(1 - cor(Genes)), method = "ward")
    Hierarchy2 <- clust.cor.genes <- hclust(as.dist(1 - cor(Genes_2)), method = "ward")

    ## I tried k-means too, using the following simple code. But again, I wonder
    ## if it is possible to know in which dataset the method worked better.
    kmeans.Genes.fit <- kmeans(Genes, 2)
    kmeans.Genes_2.fit <- kmeans(Genes_2, 2)
    # table(kmeans.Genes.fit$cluster); table(kmeans.Genes_2.fit$cluster)

Thanks a lot.

Cheers,
Santana

Clustering • 763 views

updated 15.2 years ago by Tarca, Adi ▴ 570 • written 15.2 years ago by santana sarma ▴ 80

score 0 · Answer 1 · 2009-09-20

Hi Santana,

>> Having applied ONE clustering method separately to TWO (similar) type of datasets, I wonder how
>> a) I can determine where the method worked better (not merely based on
>> visualizing the plots)!

You can compare the ratio (between-cluster variance) / (within-cluster variance). It should be higher for the dataset where the clustering worked best. This should work even if the number of clusters is not the same (but very similar) for both datasets. I am not sure this measure is provided by the two clustering methods you used, but you can compute it yourself once you have identified which genes belong to which cluster and where the center of each cluster is.

>> b) I can retrieve the clusters along with their respective contents.
>> For example: if two clusters A & B are found and the clusters contain different genes, how to access & save the genes of A and
>> B.

The object returned by the function kmeans contains a component called "cluster" that tells you which cluster each row of your data matrix belongs to, as well as a "centers" component (one row per cluster). See the example in the help page of the kmeans function.

Regards,
Adi Tarca
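A minimal sketch of both suggestions above, under stated assumptions: the two data sets are simulated stand-ins for xy.csv and ab.csv, rows are taken to be genes, `bw_ratio` is a hypothetical helper name, and sums of squares are used in place of variances (the two differ only by constant factors when the number of rows and clusters match). The hierarchical step uses `"ward.D"`, which is how current R releases name the older `"ward"` option.

```r
# Simulated stand-in data (an assumption): 100 genes in rows, 4 samples in columns.
set.seed(1)
make_data <- function(shift) {
  x <- rbind(matrix(rnorm(50 * 4), ncol = 4),
             matrix(rnorm(50 * 4, mean = shift), ncol = 4))
  rownames(x) <- paste0("gene", seq_len(nrow(x)))
  colnames(x) <- paste0("sample", seq_len(ncol(x)))
  x
}
Genes   <- make_data(shift = 4)  # two well-separated groups of genes
Genes_2 <- make_data(shift = 1)  # two overlapping groups of genes

# (a) Between / within sum-of-squares ratio for any labelling of the rows.
bw_ratio <- function(x, cluster) {
  x <- as.matrix(x)
  total <- sum(sweep(x, 2, colMeans(x))^2)          # total sum of squares
  within <- sum(sapply(split(seq_len(nrow(x)), cluster), function(idx) {
    xi <- x[idx, , drop = FALSE]
    sum(sweep(xi, 2, colMeans(xi))^2)               # spread around own center
  }))
  (total - within) / within
}

fit1 <- kmeans(Genes, centers = 2, nstart = 10)
fit2 <- kmeans(Genes_2, centers = 2, nstart = 10)
ratio1 <- bw_ratio(Genes, fit1$cluster)
ratio2 <- bw_ratio(Genes_2, fit2$cluster)
# A higher ratio suggests tighter, better-separated clusters.

# (b) Contents of each k-means cluster: a list with one vector of names per cluster.
members1 <- split(rownames(Genes), fit1$cluster)
# To save them, something like:
# write.csv(data.frame(gene = rownames(Genes), cluster = fit1$cluster),
#           "clusters_xy.csv", row.names = FALSE)

# For hclust, cut the tree into k groups first. As in the question,
# cor(Genes) compares columns, so the columns are what gets grouped here.
hc <- hclust(as.dist(1 - cor(Genes)), method = "ward.D")
hc_groups  <- cutree(hc, k = 2)
hc_members <- split(names(hc_groups), hc_groups)
```

Recent versions of R also return `betweenss` and `tot.withinss` from `kmeans`, so for k-means the same ratio can be read off as `fit1$betweenss / fit1$tot.withinss`; the helper is mainly useful for labellings from other methods, such as the output of `cutree`.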