Hi! I am Rhyeu.
I am interested in Bayesian clustering and have been testing some packages, including the "BHC" package.
I have a question.
What is the role of 'itemLabels' in the bhc function?
I thought it provided a sort of prior information for the clustering, but when I tested it on Fisher's iris data, it did not make any difference.
Am I using this function incorrectly, or have I misunderstood something?
I have already tried changing the 'randomised' and 'numReps' options. They did not affect the results in the way I expected.
I have also read some articles related to this package, and I found that my result is the same as the result in 'Lowing and Bomalaski, 2017' that was obtained with the 'Species' labels removed. I am not sure how to apply the 'Species' labels in this function.
I have attached the code that I tested below.
Thanks for reading this question.
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
LC_COLLATE=Korean_Korea.949  LC_CTYPE=Korean_Korea.949
LC_MONETARY=Korean_Korea.949 LC_NUMERIC=C
LC_TIME=Korean_Korea.949

attached base packages:
stats graphics grDevices utils datasets methods base

other attached packages:
dplyr_0.8.3 here_0.1 BHC_1.36.0
rstan_2.19.2 ggplot2_3.2.0 StanHeaders_2.18.1-10

loaded via a namespace (and not attached):
Rcpp_1.0.2 pillar_1.4.2 compiler_3.6.1
prettyunits_1.0.2 tools_3.6.1 pkgbuild_1.0.3
packrat_0.5.0 tibble_2.1.3 gtable_0.3.0
pkgconfig_2.0.2 rlang_0.4.0 cli_1.1.0
rstudioapi_0.10 parallel_3.6.1 xfun_0.8
loo_2.1.0 gridExtra_2.3 withr_2.1.2
knitr_1.23 rprojroot_1.3-2 stats4_3.6.1
grid_3.6.1 tidyselect_0.2.5 glue_1.3.1
inline_0.3.15 R6_2.4.0 processx_3.4.1
callr_3.3.1 purrr_0.3.2 magrittr_1.5
backports_1.1.4 scales_1.0.0 ps_1.3.0
matrixStats_0.54.0 assertthat_0.2.1 colorspace_1.4-1
lazyeval_0.2.2 munsell_0.5.0 crayon_1.3.4

library(BHC)
library(dplyr)

data(iris)

# Labels by species: setosa = 1, versicolor = 2, virginica = 3
# (or: itemLabels = iris$Species)
itemLabels = as.character(c(rep(1, 50), rep(2, 50), rep(3, 50)))
# Labels that are unique per item, for comparison
itemLabels2 = as.character(1:150)

# Find the binning for each label set
percentiles = FindOptimalBinning(t(iris[,1:4]), itemLabels, transposeData = TRUE, verbose = TRUE)
percentiles2 = FindOptimalBinning(t(iris[,1:4]), itemLabels2, transposeData = TRUE, verbose = TRUE)
percentiles
percentiles2

# Discretise the data with each binning, then transpose back to items x features
discreteData <- DiscretiseData(t(iris[,1:4]), percentiles=percentiles)
discreteData2 <- DiscretiseData(t(iris[,1:4]), percentiles=percentiles2)
discreteData <- t(discreteData)
discreteData2 <- t(discreteData2)
discreteData

# Run BHC once with each label set
hc3 <- bhc(discreteData, itemLabels, verbose=TRUE
           # randomised = TRUE,
           # numReps = 50
           )
hc3_2 <- bhc(discreteData2, itemLabels2, verbose=TRUE
             # randomised = TRUE,
             # numReps = 50
             )

# Plot the two dendrograms side by side
par(mfrow=c(1,2))
plot(hc3, main = "Label 1:3")
plot(hc3_2, main = "Label 1:150")
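To make "it did not make any difference" checkable rather than judged by eye from the plots, here is a minimal sketch of how two trees could be compared apart from their leaf labels. It assumes that bhc returns a standard dendrogram object from the stats package; strip_labels and same_tree are hypothetical helper names, not part of BHC. The demo uses hclust instead of bhc so that it runs with base R only.

```r
# Hypothetical helper: blank out every leaf label so that only the tree
# shape and the merge heights are left to compare.
strip_labels <- function(d) {
  dendrapply(d, function(node) {
    if (is.leaf(node)) attr(node, "label") <- ""
    node
  })
}

# Hypothetical helper: TRUE when two dendrograms match once labels are ignored.
same_tree <- function(a, b) {
  isTRUE(all.equal(strip_labels(a), strip_labels(b)))
}

# Stand-in demo with hclust (not BHC): same data, two different label sets.
x1 <- as.matrix(iris[, 1:4])
x2 <- x1
rownames(x1) <- rep(c("1", "2", "3"), each = 50)  # species-style labels
rownames(x2) <- as.character(1:150)               # one label per item

d1 <- as.dendrogram(hclust(dist(x1)))
d2 <- as.dendrogram(hclust(dist(x2)))

print(same_tree(d1, d2))
```

Under the assumption above, the same call on my results would be same_tree(hc3, hc3_2). A TRUE there would mean the two runs differ only in how the leaves are named.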