
Dear Bioconductor community,

After preprocessing and statistical analysis of an Illumina dataset with limma, I obtained a DEG list that I would like to use to subset my dataset, then perform clustering analysis and subsequent functional enrichment to see whether any interesting pathways are perturbed in any cluster. Because I wanted to be able to retrieve the genes in each cluster, I first used the R package **mclust** to compute the optimal number of clusters for my selected DEGs:

**class(filtered.2)
[1] "EList"
attr(,"package")
[1] "limma"**

significant # the data.frame from topTable after extracting the DEGs (1272 DEGs, identified by probe ID)

**filtered.3 <- filtered.2[rownames(significant),]** # where the row names are the probe IDs

**jason.mclust=function(data,g1,g2){
d_clust <- Mclust(as.matrix(data), G=g1:g2)
m.best <- dim(d_clust$z)[2]
cat("model-based optimal number of clusters:", m.best, "\n")
return(m.best)
}**

It returned 12 as the optimal number of clusters.

Then I used the **kmeans** function (from the **stats** package), together with **clusplot** from the **cluster** package, in another function I implemented:

**get_clusters=function(data,nclusters){
fit <- kmeans(data, nclusters,iter.max=50)
aggregate(data, by=list(fit$cluster), FUN=mean)
clust.out <- fit$cluster
kclust <- as.matrix(clust.out)
clusplot(data, fit$cluster, shade=F,lines=0, color=T, lty=4, main='PC of K-means clusters')
return(kclust)
}**

When I first called it on the EList object:

**kclust=get_clusters(filtered.3,12)**

**Error in clusplot.default(data, fit$cluster, shade = F, lines = 0, color = T, :
x is not numeric**

On the other hand, when I passed the expression matrix instead: **kclust=get_clusters(filtered.3$E,12)**

**Error in clusplot.default(data, fit$cluster, shade = F, lines = 0, color = T, :
4 arguments passed to .Internal(nchar) which requires 3**

Apart from these errors, I would like to ask one more naive but important question:

If I use just the kmeans function, as in: fit <- kmeans(data, nclusters, iter.max=50),

should I use **data=filtered.3**, or the data in the form of a matrix, i.e. **data=filtered.3$E**?

I know that the results vary a bit from run to run, but which input is more appropriate? Or does it not matter, and the result is the same?
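
To make the bare kmeans call concrete, here is a minimal sketch on a plain numeric matrix. It uses simulated data rather than the real dataset: the matrix `expr`, its dimensions, the probe names, the two-cluster setup, the seeds, and the `nstart` value are all assumptions for illustration only. It shows the call on a numeric matrix (rows are clustered) and one common way to limit run-to-run variation.

```r
# Simulated stand-in for an expression matrix (rows = probes, columns = samples).
# All names and sizes here are hypothetical, not taken from the real dataset.
set.seed(1)
expr <- rbind(
  matrix(rnorm(50 * 6, mean = 0), ncol = 6),
  matrix(rnorm(50 * 6, mean = 5), ncol = 6)
)
rownames(expr) <- paste0("probe_", seq_len(nrow(expr)))

# kmeans() takes a numeric matrix and clusters its rows.
# nstart > 1 tries several random starts and keeps the best solution,
# which reduces the variation between runs.
set.seed(42)
fit1 <- kmeans(expr, centers = 2, iter.max = 50, nstart = 25)
set.seed(42)
fit2 <- kmeans(expr, centers = 2, iter.max = 50, nstart = 25)

# Same seed and same input: check that the assignments agree.
identical(fit1$cluster, fit2$cluster)

# Cluster membership per probe, as a one-column matrix named by probe ID.
kclust <- as.matrix(fit1$cluster)
head(kclust)
```

The kmeans documentation describes its first argument as a numeric matrix of data, or an object that can be coerced to such a matrix, so the sketch passes a matrix directly.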

Thank you in advance for your time!