Dear Bioconductor community,
I would like to address a specific problem I encountered after using the package NbClust to find the optimal number of clusters, according to various measures, for a microarray dataset. Although the problem may be a more general programming issue, I decided to post it here because it is directly connected with this package.
In detail, I used the function NbClust as described in the documentation:
nc <- NbClust(exprs(eset.3), min.nc=2, max.nc=15, method="kmeans", index="all") # eset.3 is my expression set; index="all" requests all of the criteria used to evaluate the optimal number of clusters
Error in NbClust(exprs(eset.3), min.nc = 2, max.nc = 15, method = "kmeans") :
The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.
I resolved this problem with help from the literature and feedback from the creators of the package: it is probably due to the fact that my matrix yields negative eigenvalues. A few specific criteria therefore cannot be used with my dataset and trigger the above error, but I can use the remaining 23 indices with the function,
e.g. "kl", "ch", "hartigan", "cindex", ...
The new (unsolved) problem appeared when I tried to pass only a few of these indices directly to the function:
nc <- NbClust(df, min.nc=2, max.nc=15, method="kmeans", index=c("kl","ch","hartigan", "cindex", "db"))
Error in NbClust(df, min.nc = 2, max.nc = 15, method = "kmeans", index = c("kl", :
object 'DiffLev' not found
In addition: Warning messages:
1: In if (is.na(indice)) stop("invalid clustering index") :
the condition has length > 1 and only the first element will be used
2: In if (indice == -1) stop("ambiguous index") :
the condition has length > 1 and only the first element will be used
So I assume that the index argument accepts only fixed values: either a single index at a time (e.g. "ch") or one of the options "all" and "alllong" (as shown in ?NbClust). I therefore tried to write a for loop that calls NbClust once for each of the predefined index values and collects the results in a list.
The function NbClust returns a list with 4 components. The one I want is $Best.nc, which holds the optimal number of clusters according to the criterion. In the end, I would like to create an object that holds this value for each of the indices above.
Thus, I naively tried:
selected <- c("kl", "ch", "hartigan", "cindex", "db", "silhouette", "duda", "pseudot2", "beale", "ratkowsky", "ball", "ptbiserial", "gap", "frey", "mcclain", "gamma", "gplus", "tau", "dunn", "hubert", "sdindex", "dindex", "sdbw") # the specific indices I can use (23 in total)
results <- vector("list", length(selected))
for (i in seq_along(selected)) { results[[i]] <- NbClust(exprs(eset.2), min.nc=2, max.nc=15, method="kmeans", index=selected[i]) }
However, the loop keeps running without terminating, as if it were an infinite loop (I left it running for a couple of hours but nothing happened).
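A variant of the loop that might make this easier to diagnose is sketched below. It is only a sketch under assumptions: the names collect_best_nc, fit_one, fit_nbclust and toy are hypothetical and not part of NbClust, and the toy matrix merely stands in for the real expression data. The idea is to report which index is currently running and how long it took, to keep only $Best.nc, and to let a failing index return NA instead of aborting the whole loop. If a single index turns out to be very slow on a large matrix, the progress messages should show which one (?NbClust lists "gap", "gamma", "gplus" and "tau" as the indices included only under "alllong").

```r
# Hypothetical helper (not part of NbClust): evaluate one index at a time,
# keep only the Best.nc component, and report progress and timing.
# `fit_one` is any function(data, index) that returns a list with a
# Best.nc element, e.g. a thin wrapper around NbClust::NbClust().
collect_best_nc <- function(data, indices, fit_one) {
  results <- vector("list", length(indices))
  names(results) <- indices
  for (idx in indices) {
    message("Running index: ", idx)
    t0 <- proc.time()[["elapsed"]]
    results[[idx]] <- tryCatch(
      fit_one(data, idx)$Best.nc,
      error = function(e) {
        message("  failed: ", conditionMessage(e))
        NA  # NA rather than NULL, so the list slot is not dropped
      }
    )
    message(sprintf("  finished in %.1f s", proc.time()[["elapsed"]] - t0))
  }
  results
}

# Usage on a small synthetic matrix (assumption: NbClust is installed).
if (requireNamespace("NbClust", quietly = TRUE)) {
  set.seed(1)
  toy <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
               matrix(rnorm(100, mean = 5), ncol = 2))
  fit_nbclust <- function(data, index) {
    NbClust::NbClust(data, min.nc = 2, max.nc = 6,
                     method = "kmeans", index = index)
  }
  best <- collect_best_nc(toy, c("kl", "ch", "silhouette"), fit_nbclust)
  print(best)
}
```

Trying this first on a small subset of rows, or with a short vector of indices, may help to separate a genuinely slow index from an actual programming error.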
Possibly this is due to my lack of programming experience, so how could I deal with this problem?
Any help or suggestions would be much appreciated!