Question: Error in NbClust when passing multiple values to the "index" argument to find the optimal number of clusters of a dataset
2.3 years ago by
Greece/Athens/National Hellenic Research Foundation
svlachavas570 wrote:

Dear Bioconductor community,

I would like to address a specific problem I encountered while using the NbClust package to find the optimal number of clusters for a microarray dataset according to various measures. Although the problem may be a more general programming issue, I decided to post it here because it is directly connected with this package.

Specifically, following the documentation, I called NbClust as follows:

nc <- NbClust(exprs(eset.3),,, method="kmeans", index="all")

Here eset.3 is my expression set, and index="all" requests all of the different criteria used to evaluate the optimal number of clusters. This gives the following error:

Error in NbClust(exprs(eset.3), = 2, = 15, method = "kmeans") : 

The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.

Based on the literature and on feedback from the package authors, I worked out that this error is probably caused by my matrix having negative eigenvalues. A few specific criteria trigger the error above on my data and cannot be used, but the remaining 23 indices work with the function,

e.g. ["kl", "ch", "hartigan", "cindex", ...]
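To check whether the scatter matrix really has non-positive eigenvalues, here is a minimal base-R sketch. It assumes (this is not taken from NbClust's source) that the matrix named in the error is the total scatter matrix of the column-centred data; `tss_eigenvalues` is a hypothetical helper name. In exact arithmetic that matrix is positive semi-definite, so a zero (or tiny negative, from rounding) eigenvalue means the centred columns are linearly dependent, for example a constant column. Missing values have to be removed or imputed before the check can run at all.

```r
# Hypothetical helper (not part of NbClust): eigenvalues of the total
# scatter matrix T = t(Xc) %*% Xc, where Xc is x with column means removed.
tss_eigenvalues <- function(x) {
  x <- as.matrix(x)
  if (anyNA(x)) {
    stop("x contains missing values; remove or impute them first")
  }
  centred <- scale(x, center = TRUE, scale = FALSE)  # subtract column means
  tss <- crossprod(centred)                          # t(centred) %*% centred
  # symmetric solver; values are returned in decreasing order
  eigen(tss, symmetric = TRUE, only.values = TRUE)$values
}

# Simulated example: 50 rows and 4 columns, then a fifth column equal to
# the sum of the first two, which makes the scatter matrix singular.
set.seed(1)
x <- matrix(rnorm(200), nrow = 50, ncol = 4)
ev_ok <- tss_eigenvalues(x)
ev_singular <- tss_eigenvalues(cbind(x, x[, 1] + x[, 2]))
print(ev_ok)
print(ev_singular)  # smallest value is zero up to rounding error
```

On the data from the question the call would be `tss_eigenvalues(exprs(eset.3))`.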

The new (unsolved) problem appears when I try to pass several of these indices directly to the function:

 nc <- NbClust(df,,, method="kmeans", index=c("kl","ch","hartigan",  "cindex", "db"))

Error in NbClust(df, = 2, = 15, method = "kmeans", index = c("kl",  : 
  object 'DiffLev' not found
In addition: Warning messages:
1: In if stop("invalid clustering index") :
  the condition has length > 1 and only the first element will be used
2: In if (indice == -1) stop("ambiguous index") :
  the condition has length > 1 and only the first element will be used
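These warnings follow R's general rule that `if ()` expects a single TRUE or FALSE. The sketch below rests on an assumption inferred from the warning text, not read from NbClust's source: that the requested names are matched with `pmatch()`, so five names give five match positions and the `if ()` test receives a condition of length 5. R versions before 4.2.0 warn and use only the first element; from 4.2.0 on this is an error. The `valid` vector is a shortened, hypothetical stand-in for NbClust's internal list.

```r
# Illustration only: 'valid' is a shortened, hypothetical stand-in
# for NbClust's internal list of index names.
valid <- c("kl", "ch", "hartigan", "cindex", "db", "silhouette")
requested <- c("kl", "ch", "hartigan", "cindex", "db")

positions <- pmatch(requested, valid)  # one position per requested name
print(positions)                       # 1 2 3 4 5

# if() on a length-5 condition: a warning before R 4.2.0, an error after.
outcome <- tryCatch(
  if (is.na(positions)) "invalid" else "valid",
  warning = function(w) "warning",
  error = function(e) "error"
)
print(outcome)
```

Either way, passing one index per call avoids the problem.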

My assumption is that the index argument only accepts a single value: either one index at a time (e.g. "ch") or one of the options "all" and "alllong" (as shown in ?NbClust). So I tried to write a for loop that calls NbClust once for each of the pre-defined indices and stores each returned list in an outer list, one element per index. The reasoning is as follows:

NbClust returns a list with 4 components; the one I want, called $, gives the optimal number of clusters according to each criterion. In short, I would like to build an object that holds this value for each of the indices above.
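Collecting that value from a list of results can be done with `vapply()`. A minimal sketch, assuming the component is `Best.nc` (the name ?NbClust gives to the best number of clusters proposed by each index) and that its first entry is the number of clusters when a single index is requested; `best_k` is a hypothetical helper name. Elements that are `NULL`, or that hold an error caught by `try()`, become `NA`.

```r
# Hypothetical helper: one proposed number of clusters per NbClust result.
# Assumption: each successful result is a list whose Best.nc component
# starts with the number of clusters (as for a single-index call).
best_k <- function(results) {
  vapply(results, function(res) {
    # try() failures and empty slots carry no Best.nc; record NA instead
    if (inherits(res, "try-error") || is.null(res$Best.nc)) {
      return(NA_real_)
    }
    as.numeric(res$Best.nc[1])
  }, numeric(1))
}

# Usage with the loop from the question, after naming the list by index:
# names(results) <- selected
# best_k(results)
```

Naming the list by index first means the returned vector is labelled with the index names.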

So I naively tried:

selected <- c("kl", "ch", "hartigan", "cindex", "db", "silhouette", "duda", "pseudot2", "beale", "ratkowsky", "ball", "ptbiserial", "gap", "frey", "mcclain", "gamma", "gplus", "tau", "dunn", "hubert", "sdindex", "dindex", "sdbw") # the 23 indices I can use

results <- vector("list",23)

for (i in 1:length(selected)) {results[[i]] <- NbClust(exprs(eset.2),,, method="kmeans", index=selected[i]) }

But the loop never finishes and looks like it may be stuck in an infinite loop (I left it running for a couple of hours and nothing happened).
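One way to find out which index holds the loop up is to time each index on a small random subset of rows before the full run. This is a minimal sketch, not something NbClust provides: `time_indices` and its `n_rows` and `fun` arguments are hypothetical. `fun` defaults to `NbClust::NbClust` and exists only so the helper can be tried with a stand-in function; extra arguments such as `method = "kmeans"` pass through `...`. As far as I can tell from ?NbClust, the "all" option leaves out gap, gamma, gplus and tau; my unverified guess is that some of these slower indices are what keeps the loop running when thousands of genes are clustered.

```r
# Hypothetical helper: elapsed seconds for each index on a random row subset.
# fun defaults to NbClust::NbClust (evaluated lazily, only when first used);
# ... is passed on to fun, e.g. method = "kmeans".
time_indices <- function(data, indices, n_rows = 200,
                         fun = NbClust::NbClust, ...) {
  data <- as.matrix(data)
  rows <- sample(nrow(data), min(n_rows, nrow(data)))
  subset_rows <- data[rows, , drop = FALSE]
  vapply(indices, function(ix) {
    # try() keeps a failing index from stopping the timing run
    system.time(try(fun(subset_rows, ..., index = ix), silent = TRUE))[["elapsed"]]
  }, numeric(1))
}

# Usage with the objects from the question (slowest indices first):
# timings <- time_indices(exprs(eset.2), selected, method = "kmeans")
# sort(timings, decreasing = TRUE)
```

Indices that are already slow on a few hundred rows can then be dropped from `selected` or run separately.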

This may be due to my lack of programming experience. How could I deal with this problem?

Any help or suggestions would be greatly appreciated!

modified 9 months ago by james.tsakalos0 • written 2.3 years ago by svlachavas570
9 months ago by
james.tsakalos0 wrote:

I know this is an old post, but I just had the same issue.

My solution was to wrap try() around the NbClust() call inside the loop and then look at the results.


for (i in seq_along(selected)) {
  # try() stores the error for a failing index instead of stopping the loop
  results[[i]] <- try(NbClust(exprs(eset.2),,, method="kmeans", index=selected[i]))
}

This should work for you now.
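With try() in the loop, a failed call is kept as an object of class "try-error" instead of stopping the loop, so it is easy to list which indices failed and with what message. A minimal sketch; `summarise_failures` is a hypothetical helper name, and it assumes `results` is in the same order as the vector of index names.

```r
# Hypothetical helper: a table of the indices whose NbClust call failed.
# try() returns a "try-error" object whose "condition" attribute holds the error.
summarise_failures <- function(results, indices) {
  failed <- vapply(results, inherits, logical(1), what = "try-error")
  messages <- vapply(results[failed], function(err) {
    conditionMessage(attr(err, "condition"))
  }, character(1))
  data.frame(index = indices[failed], message = unname(messages),
             stringsAsFactors = FALSE)
}

# Usage with the objects from the loop above:
# summarise_failures(results, selected)
```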

written 9 months ago by james.tsakalos0
