Error on FindOptimalBinning function
@pavelgranalacant-23139

Hi,

I was trying to replicate the BHC library example code (https://bioconductor.org/packages/release/bioc/html/BHC.html) with the Breast Cancer Wisconsin (Diagnostic) dataset (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)), after applying PCA to it, but I ran into problems.

I understood from the code example that, since my data is continuous, it has to be discretised first (as is done in the 3rd example), so I replicated that part of the example:

BiocManager::install("BHC")
library(BHC)
library(RCurl)
library(factoextra)

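# download the raw comma-separated file as a single string
breastCancer <- getURL('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data')
# names for the 32 columns of wdbc.data: ID, diagnosis and 30 features
names <- c('id', 'diagnosis', 'radius_mean',
'texture_mean', 'perimeter_mean', 'area_mean',
'smoothness_mean', 'compactness_mean',
'concavity_mean','concave_points_mean',
'symmetry_mean', 'fractal_dimension_mean',
'radius_se', 'texture_se', 'perimeter_se',
'area_se', 'smoothness_se', 'compactness_se',
'concavity_se', 'concave_points_se',
'symmetry_se', 'fractal_dimension_se',
'radius_worst', 'texture_worst',
'perimeter_worst', 'area_worst',
'smoothness_worst', 'compactness_worst',
'concavity_worst', 'concave_points_worst',
'symmetry_worst', 'fractal_dimension_worst')
# parse the downloaded text into a data frame
breastCancer <- read.table(text = breastCancer,
sep = ',',
col.names = names)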

breastCancer.predictors <- breastCancer[3:32]
breastCancer.prcomp <- prcomp(breastCancer.predictors, scale = TRUE, center = TRUE)
breastCancer.PCA <- breastCancer.prcomp$x[, 1:7]
newData2 <- breastCancer.PCA
itemLabels2 <- breastCancer$diagnosis
percentiles  <- FindOptimalBinning(newData2, itemLabels2, transposeData=TRUE, verbose=TRUE)
discreteData <- DiscretiseData(t(newData2), percentiles=percentiles)
discreteData <- t(discreteData)
hc3          <- bhc(discreteData, itemLabels2, verbose=TRUE)
plot(hc3, axes=FALSE)
WriteOutClusterLabels(hc3, verbose=TRUE)


However, although I get two clusters, the first one contains only a single item and the second one contains all the rest, which is far from the result I expected. Am I doing something wrong?
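
To make clear what I mean by discretising, here is a minimal base-R sketch of percentile-based binning on synthetic data. It does not use BHC at all: the helper name `discretise_by_percentiles`, the 25%/75% cut points and the toy matrix are my own assumptions for illustration, not what FindOptimalBinning or DiscretiseData actually do internally.

```r
# Hypothetical helper (not part of BHC): bin each column of a numeric matrix
# into three levels (0, 1, 2) using per-column percentile cut points.
discretise_by_percentiles <- function(x, probs = c(0.25, 0.75)) {
  stopifnot(is.matrix(x), is.numeric(x))
  binned <- apply(x, 2, function(column) {
    cuts <- quantile(column, probs = probs, names = FALSE)
    # findInterval gives 0 below the first cut, 1 between the cuts, 2 above
    findInterval(column, cuts)
  })
  dimnames(binned) <- dimnames(x)
  binned
}

# Synthetic stand-in for the PCA scores (50 items, 4 components)
set.seed(1)
toy <- matrix(rnorm(200), nrow = 50, ncol = 4,
              dimnames = list(NULL, paste0("PC", 1:4)))
toyDiscrete <- discretise_by_percentiles(toy)

# Count how many items fall into each bin, per column
binCounts <- apply(toyDiscrete, 2,
                   function(column) table(factor(column, levels = 0:2)))
print(binCounts)
```

Tabulating the bin counts like this on my real discreteData would at least show whether almost every item ends up in the same bin, which might be related to the lopsided clustering.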
