Where classification worked better?
@santana-sarma-3163
Last seen 9.6 years ago
Hi All,

Having applied ONE clustering method separately to TWO (similar) types of datasets, I wonder how:

a) I can determine where the method worked better (not merely based on visualizing the plots)!
b) I can retrieve the clusters along with their respective contents. For example: if two clusters A & B are found and the clusters contain different genes, how to access & save the genes of A and B.

# Hierarchical clustering
Genes <- read.csv(file = "xy.csv", header = TRUE)
Genes_2 <- read.csv(file = "ab.csv", header = TRUE)
Hierarchy1 <- clust.cor.patient <- hclust(as.dist(1 - cor(Genes)), method = "ward")
Hierarchy2 <- clust.cor.genes <- hclust(as.dist(1 - cor(Genes_2)), method = "ward")

## I tried k-means too, using the following simple code. But again, I wonder if it is possible to know in which dataset the method worked better.
kmeans.Genes.fit <- kmeans(Genes, 2)
kmeans.Genes_2.fit <- kmeans(Genes_2, 2)
# table(kmeans.Genes.fit$cluster); table(kmeans.Genes_2.fit$cluster)

Thanks a lot.

Cheers,
Santana
Tarca, Adi ▴ 570
@tarca-adi-1500
Last seen 5 months ago
United States
Hi Santana,

>>Having applied ONE clustering method separately to TWO (similar) type of datasets, I wonder how
>>a) I can determine where the method worked better (not merely based on
>>visualizing the plots)!

You can compare the ratio = (between-cluster variance) / (within-cluster variance). This should be higher for the dataset where the clustering worked best. This should work even if the number of clusters is not the same (but very similar) for both datasets. I am not sure this measure is provided by the two clustering methods you used, but you can compute it yourself once you have identified which genes belong to which cluster and where the center of each cluster is.

>>b) I can retrieve the clusters along with their respective contents.
>>For example: if two clusters A & B are found and the clusters contain different genes, how to access & save the genes of A and
>>B.

The object returned by the function kmeans contains a component called "cluster" that tells you which cluster each row of your data matrix belongs to, as well as a "centers" component (one row per cluster). See the example in the kmeans function help.

Regards,
Adi Tarca
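The two suggestions above can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the data are simulated stand-ins for xy.csv and ab.csv, genes are assumed to be in rows, and the helper names make_data and ss_ratio are made up for this example. The kmeans object does carry betweenss and tot.withinss components, so the ratio can be read straight from the fit; for hclust, cutree() gives the group of each clustered item. Note that cor() correlates columns, so the sketch transposes the matrix to cluster genes; drop the t() if your genes are in columns. The ratio is only comparable across datasets that are on a similar scale.

```r
# Simulated stand-in data (hypothetical): rows are genes, columns are samples.
set.seed(1)
make_data <- function(shift) {
  m <- rbind(matrix(rnorm(50 * 6), ncol = 6),
             matrix(rnorm(50 * 6, mean = shift), ncol = 6))
  rownames(m) <- paste0("gene", seq_len(nrow(m)))
  colnames(m) <- paste0("sample", seq_len(ncol(m)))
  m
}
Genes   <- make_data(shift = 4)  # two well-separated groups of genes
Genes_2 <- make_data(shift = 1)  # two overlapping groups of genes

# a) Ratio of between-cluster to total within-cluster sum of squares.
ss_ratio <- function(fit) fit$betweenss / fit$tot.withinss

fit1 <- kmeans(Genes, centers = 2, nstart = 25)
fit2 <- kmeans(Genes_2, centers = 2, nstart = 25)
ratio1 <- ss_ratio(fit1)
ratio2 <- ss_ratio(fit2)
print(c(Genes = ratio1, Genes_2 = ratio2))  # higher = tighter, better-separated clusters

# b) Gene names per k-means cluster (kmeans clusters the rows of its input).
members1 <- split(rownames(Genes), fit1$cluster)

# b) Same idea for hclust: cut the tree into k groups with cutree().
hc <- hclust(as.dist(1 - cor(t(Genes))), method = "average")
hc_groups <- cutree(hc, k = 2)
hc_members <- split(names(hc_groups), hc_groups)

# Save each k-means cluster's genes to its own text file (temporary directory here).
out_files <- file.path(tempdir(), paste0("cluster_", names(members1), ".txt"))
for (i in seq_along(members1)) writeLines(members1[[i]], out_files[i])
```

With real data, replace the make_data() calls with the read.csv() lines from the question and point file.path() at a directory of your choice.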