Question

What is the role of 'itemLabels' in the 'bhc' function?

sk.rhyeu • 0

Hi! I am Rhyeu.

I have been interested in Bayesian clustering and have been testing some packages, including the "BHC" package.

I have a question.

What is the role of 'itemLabels' in the bhc function?

I thought that it provides a kind of prior information for the clustering, but when I tested it on Fisher's iris data, it made no difference.

Am I using this function incorrectly, or have I misunderstood something?

I have already tried changing the 'randomised' and 'numReps' options. They did not affect the results as I expected.

I have also read some articles related to this package, and I found that my result is the same as the result in 'Lowing and Bomalaski, 2017' obtained with the 'Species labels' removed. I am not sure how to apply the 'Species label' in this function.

The code I tested is attached below.

Thanks for reading this question.

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=Korean_Korea.949  LC_CTYPE=Korean_Korea.949   
[3] LC_MONETARY=Korean_Korea.949 LC_NUMERIC=C                
[5] LC_TIME=Korean_Korea.949    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.8.3           here_0.1              BHC_1.36.0           
[4] rstan_2.19.2          ggplot2_3.2.0         StanHeaders_2.18.1-10

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2         pillar_1.4.2       compiler_3.6.1    
 [4] prettyunits_1.0.2  tools_3.6.1        pkgbuild_1.0.3    
 [7] packrat_0.5.0      tibble_2.1.3       gtable_0.3.0      
[10] pkgconfig_2.0.2    rlang_0.4.0        cli_1.1.0         
[13] rstudioapi_0.10    parallel_3.6.1     xfun_0.8          
[16] loo_2.1.0          gridExtra_2.3      withr_2.1.2       
[19] knitr_1.23         rprojroot_1.3-2    stats4_3.6.1      
[22] grid_3.6.1         tidyselect_0.2.5   glue_1.3.1        
[25] inline_0.3.15      R6_2.4.0           processx_3.4.1    
[28] callr_3.3.1        purrr_0.3.2        magrittr_1.5      
[31] backports_1.1.4    scales_1.0.0       ps_1.3.0          
[34] matrixStats_0.54.0 assertthat_0.2.1   colorspace_1.4-1  
[37] lazyeval_0.2.2     munsell_0.5.0      crayon_1.3.4      

library(BHC)
library(dplyr)
data(iris)

# setosa : 1, versicolor : 2, virginica : 3 (or itemLabels = iris$Species)
itemLabels = as.character(c(rep(1, 50), rep(2, 50), rep(3, 50)))
itemLabels2 = as.character(1:150)

percentiles = FindOptimalBinning(t(iris[,1:4]), itemLabels, transposeData = TRUE, verbose = TRUE)
percentiles2 = FindOptimalBinning(t(iris[,1:4]), itemLabels2, transposeData = TRUE, verbose = TRUE)

percentiles
percentiles2

discreteData <- DiscretiseData(t(iris[,1:4]), percentiles=percentiles)
discreteData2 <- DiscretiseData(t(iris[,1:4]), percentiles=percentiles2)

discreteData <- t(discreteData)
discreteData2 <- t(discreteData2)
discreteData
hc3 <- bhc(discreteData, 
           itemLabels, 
           verbose=TRUE 

           # randomised = T, 
           # numReps = 50
           )



hc3_2 <- bhc(discreteData2, 
             itemLabels2, 
             verbose=TRUE

             # randomised = T, 
             # numReps = 50
             )


par(mfrow=c(1,2))
plot(hc3, main = "Label 1:3")
plot(hc3_2, main = "Label 1:150")
