Hi! I am Rhyeu.
I have been interested in Bayesian clustering and have been testing some packages, including the "BHC" package.
I have a question about it.
What is the role of 'itemLabels' in the bhc function?
I thought it provided some kind of prior information for the clustering, but when I tested it on Fisher's iris data, it made no difference.
Am I using this function incorrectly, or have I misunderstood something?
I have already tried changing the 'randomised' and 'numReps' options. They did not affect the results in the way I expected.
I have also read some articles related to this package, and I found that my result is the same as the result in 'Lowing and Bomalaski, 2017' where the 'Species labels' were removed. I am not sure how to apply the 'Species label' in this function.
I have attached the code I tested below.
Thanks for reading this question.
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=Korean_Korea.949 LC_CTYPE=Korean_Korea.949
[3] LC_MONETARY=Korean_Korea.949 LC_NUMERIC=C
[5] LC_TIME=Korean_Korea.949
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.8.3 here_0.1 BHC_1.36.0
[4] rstan_2.19.2 ggplot2_3.2.0 StanHeaders_2.18.1-10
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 pillar_1.4.2 compiler_3.6.1
[4] prettyunits_1.0.2 tools_3.6.1 pkgbuild_1.0.3
[7] packrat_0.5.0 tibble_2.1.3 gtable_0.3.0
[10] pkgconfig_2.0.2 rlang_0.4.0 cli_1.1.0
[13] rstudioapi_0.10 parallel_3.6.1 xfun_0.8
[16] loo_2.1.0 gridExtra_2.3 withr_2.1.2
[19] knitr_1.23 rprojroot_1.3-2 stats4_3.6.1
[22] grid_3.6.1 tidyselect_0.2.5 glue_1.3.1
[25] inline_0.3.15 R6_2.4.0 processx_3.4.1
[28] callr_3.3.1 purrr_0.3.2 magrittr_1.5
[31] backports_1.1.4 scales_1.0.0 ps_1.3.0
[34] matrixStats_0.54.0 assertthat_0.2.1 colorspace_1.4-1
[37] lazyeval_0.2.2 munsell_0.5.0 crayon_1.3.4
library(BHC)
library(dplyr)
data(iris)
# Species labels: setosa = 1, versicolor = 2, virginica = 3 (or itemLabels = iris$Species)
itemLabels = as.character(c(rep(1, 50), rep(2, 50), rep(3, 50)))
# Unique label per item, for comparison
itemLabels2 = as.character(1:150)
percentiles = FindOptimalBinning(t(iris[,1:4]), itemLabels, transposeData = TRUE, verbose = TRUE)
percentiles2 = FindOptimalBinning(t(iris[,1:4]), itemLabels2, transposeData = TRUE, verbose = TRUE)
percentiles
percentiles2
discreteData <- DiscretiseData(t(iris[,1:4]), percentiles=percentiles)
discreteData2 <- DiscretiseData(t(iris[,1:4]), percentiles=percentiles2)
discreteData <- t(discreteData)
discreteData2 <- t(discreteData2)
discreteData
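# Clustering with the species labels (randomised / numReps were also tried)
hc3 <- bhc(discreteData,
           itemLabels,
           verbose = TRUE
           # randomised = TRUE,
           # numReps = 50
)
# Clustering with a unique label per item
hc3_2 <- bhc(discreteData2,
             itemLabels2,
             verbose = TRUE
             # randomised = TRUE,
             # numReps = 50
)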
par(mfrow=c(1,2))
plot(hc3, main = "Label 1:3")
plot(hc3_2, main = "Label 1:150")
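
Besides comparing the two plots by eye, one way to check whether two trees differ in anything other than their leaf labels is to compare the leaf order and the merge heights directly. The following is only a minimal sketch in base R. `node_heights` and `same_tree_shape` are hypothetical helper names, not part of BHC, and `hclust` is used purely as a stand-in so that the snippet runs without BHC installed. It assumes that the object returned by `bhc` is a standard `dendrogram` whose leaves store the original row indices, which I have not verified.

```r
# Collect the heights of all internal nodes of a dendrogram (pre-order).
node_heights <- function(d) {
  if (is.leaf(d)) return(numeric(0))
  c(attr(d, "height"),
    unlist(lapply(seq_along(d), function(i) node_heights(d[[i]]))))
}

# TRUE if two dendrograms have the same leaf order and the same merge
# heights, regardless of what the leaves are called.
same_tree_shape <- function(d1, d2) {
  identical(order.dendrogram(d1), order.dendrogram(d2)) &&
    isTRUE(all.equal(node_heights(d1), node_heights(d2)))
}

# Stand-in example: one hclust tree, relabelled in two different ways.
hc <- hclust(dist(iris[, 1:4]))
hc_a <- hc
hc_a$labels <- as.character(iris$Species)
hc_b <- hc
hc_b$labels <- as.character(1:150)
d_a <- as.dendrogram(hc_a)
d_b <- as.dendrogram(hc_b)

same_tree_shape(d_a, d_b)

# With BHC loaded, the same check could be tried on the objects above:
# same_tree_shape(hc3, hc3_2)
```

If the check returns TRUE for `hc3` and `hc3_2`, that would suggest the labels only name the leaves and do not influence the clustering itself.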