I was trying to replicate the BHC library example code (https://bioconductor.org/packages/release/bioc/html/BHC.html) with the Breast Cancer Wisconsin (Diagnostic) dataset (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic), with PCA applied), but I ran into problems.
I understood from the example code that, since my data is continuous, it should be discretized (as is done in the 3rd example), so I replicated that part of the example:
BiocManager::install("BHC")
library(BHC)
library(RCurl)
library(factoextra)

breastCancer <- getURL('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data')
names <- c('id_number', 'diagnosis', 'radius_mean', 'texture_mean',
           'perimeter_mean', 'area_mean', 'smoothness_mean',
           'compactness_mean', 'concavity_mean', 'concave_points_mean',
           'symmetry_mean', 'fractal_dimension_mean', 'radius_se',
           'texture_se', 'perimeter_se', 'area_se', 'smoothness_se',
           'compactness_se', 'concavity_se', 'concave_points_se',
           'symmetry_se', 'fractal_dimension_se', 'radius_worst',
           'texture_worst', 'perimeter_worst', 'area_worst',
           'smoothness_worst', 'compactness_worst', 'concavity_worst',
           'concave_points_worst', 'symmetry_worst',
           'fractal_dimension_worst')
breastCancer <- read.table(textConnection(breastCancer), sep = ',', col.names = names)

breastCancer.predictors <- breastCancer[3:32]
breastCancer.prcomp <- prcomp(breastCancer.predictors, scale = TRUE, center = TRUE)
breastCancer.PCA <- breastCancer.prcomp$x[, 1:7]

newData2 <- breastCancer.PCA
itemLabels2 <- breastCancer$diagnosis

percentiles <- FindOptimalBinning(newData2, itemLabels2, transposeData=TRUE, verbose=TRUE)
discreteData <- DiscretiseData(t(newData2), percentiles=percentiles)
discreteData <- t(discreteData)

hc3 <- bhc(discreteData, itemLabels2, verbose=TRUE)
plot(hc3, axes=FALSE)
WriteOutClusterLabels(hc3, verbose=TRUE)
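
For reference, my mental model of the discretisation step is roughly the base-R sketch below. It is only an illustration: the toy data and the 25th/75th percentile cut points are my own assumptions, and it is not meant to reproduce what `FindOptimalBinning` or `DiscretiseData` do internally.

```r
# Hypothetical illustration only: percentile-based binning in base R.
# This is NOT the BHC implementation; the data and cut points are assumptions.
set.seed(1)
x <- rnorm(100)  # one continuous feature (toy data)

# Cut points at the 25th and 75th percentiles split the feature into three bins.
breaks <- quantile(x, probs = c(0.25, 0.75))

# findInterval() counts how many cut points are <= each value,
# so every value is mapped to bin 0, 1 or 2.
binned <- findInterval(x, breaks)

# Number of observations per bin.
table(binned)
```

If this picture is right, each of my 7 principal components would be reduced to a small number of levels before `bhc` ever sees it.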
However, although I get two clusters, the first contains only one observation and the second contains all the rest, which is far from what I expected. Am I doing something wrong?
Thanks in advance.