I'm trying to get quality measures, such as the silhouette width, from HCPC clusters.
I can get avg.silwidth for the k-means algorithm using cluster.stats from the fpc library:
library(factoextra)
library(fpc)
km.res <- eclust(dat, "kmeans", k = 20, nstart = 25, graph = FALSE)
# Compute pairwise-distance matrices
dd <- dist(dat, method ="euclidean")
# Statistics for k-means clustering
km_stats <- cluster.stats(dd, km.res$cluster)
Now I'm trying to use cluster.stats with HCPC. To run cluster.stats I need a distance matrix and the cluster assignments. When I run HCPC with consol=TRUE, it does run a k-means algorithm, so I have been looking for the distance matrix and the clusters in the result, but I could not find them.
names(res.hcpc)
[1] "data.clust" "desc.var" "desc.axes" "call" "desc.ind"
In this assignment https://eric.univ-lyon2.fr/~jahpine/cours/l3_cestat-dm/tp5.pdf
it seems that they get the clusters from data.clust$clust:
clust.eval.hcpc = cluster.stats(cluster::daisy(sub.D.kmodes), as.numeric(res.hcpc.sub.acm$data.clust$clust), res.kmodes.sub.D.kmodes$cluster)
But when I run this:
library(FactoMineR)
res <- PCA(dat,quali.sup=1:3, ncp=8)
#Hierarchical ascendant clustering
res.hcpc <- HCPC(res, kk=Inf, min=3, max=10, consol=TRUE)
km_stats <- cluster.stats(dd, res.hcpc$data.clust$clust)
I get the error:
Error in Summary.factor(c(3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, :
‘max’ not meaningful for factors
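The error message suggests that cluster.stats is calling max() on the clustering vector, and data.clust$clust is a factor. The assignment quoted above wraps it in as.numeric(), which points to one possible fix. Below is a minimal sketch under some assumptions: iris stands in for dat (which is not shown here), nb.clust = -1 is used so the tree is cut without an interactive click, and the distances are computed on the PCA coordinates (res.pca$ind$coord) rather than on the raw data, since those are the coordinates HCPC clusters on. The names res.pca, coords, dd.pca, clust.int and hcpc_stats are made up for this example.

```r
library(FactoMineR)
library(fpc)

# Stand-in data (assumption): iris, with Species (column 5) as a
# supplementary qualitative variable.
res.pca <- PCA(iris, quali.sup = 5, ncp = 4, graph = FALSE)

# nb.clust = -1 cuts the tree automatically, so no interactive click is needed.
res.hcpc <- HCPC(res.pca, nb.clust = -1, min = 3, max = 10,
                 consol = TRUE, graph = FALSE)

# Distances between individuals in the PCA space used for the clustering.
coords <- res.pca$ind$coord
dd.pca <- dist(coords, method = "euclidean")

# Align the cluster labels with the coordinate rows by row name,
# then convert the factor to plain integers (1..k).
clust <- res.hcpc$data.clust[rownames(coords), "clust"]
stopifnot(!anyNA(clust))
clust.int <- as.integer(clust)

# Validation statistics for the HCPC partition.
hcpc_stats <- cluster.stats(dd.pca, clust.int)
hcpc_stats$avg.silwidth
```

Matching rows by name is a precaution in case data.clust is not in the same row order as the coordinates. If you prefer distances on the original variables, make sure the distance matrix covers only the active quantitative columns and the same individuals.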
Hi,
Did you ever find a solution for this? I am dealing with the same issue: I want to get some cluster validation statistics for my HCPC result, but I can't find a way to do it.
BR, Daniela