Dear Bioconductor community,
after performing preprocessing and statistical analysis in an Illumina dataset with Limma, i have acquired a specific DEG list, which i would like to use it afterwards to subset my dataset, and then performed some clustering analysis and subsequent functional enrichment to see if any interesting pathways can found perturbed in any cluster. As i wanted to have an insight and able to get the genes to each cluster, firstly i used the R package mclust to compute the optimal number of clusters depending my selected deg genes:
class(filtered.2)
[1] "EList"
attr(,"package")
[1] "limma"
significant # the data.frame from topTable after extracting the DEG genes(1272 DEG genes-probeIDs)
filtered.3 <- filtered.2[rownames(significant),] # where rownames are the probeIDs
jason.mclust=function(data,g1,g2){
d_clust <- Mclust(as.matrix(data), G=g1:g2)
m.best <- dim(d_clust$z)[2]
cat("model-based optimal number of clusters:", m.best, "\n")
return(m.best)
}
It returned my 12 as the optimal number of clusters
then i used the Kmeans function from package cluster in one other function i implemented:
get_clusters=function(data,nclusters){
fit <- kmeans(data, nclusters,iter.max=50)
aggregate(data, by=list(fit$cluster), FUN=mean)
clust.out <- fit$cluster
kclust <- as.matrix(clust.out)
clusplot(data, fit$cluster, shade=F,lines=0, color=T, lty=4, main='PC of K-means clusters')
return(kclust)
}
Then when i firstly used :
kclust=get_clusters(filtered.3,12)
Error in clusplot.default(data, fit$cluster, shade = F, lines = 0, color = T, :
x is not numeric
On the other hand, when i used : kclust=get_clusters(filtered.3$E,12)
Error in clusplot.default(data, fit$cluster, shade = F, lines = 0, color = T, :
4 arguments passed to .Internal(nchar) which requires 3
Except these errors, i would like to ask one more naive(silly) but also important question
if i use just the function kmeans, as : fit <- kmeans(data, nclusters,iter.max=50),
should i use data=filtered.3 or data in a form of a matrix,i.e., data=filtered.3$E ?
i know that with the iterations the results change a bit but which is more appropriate ? Or it doesnt matter and it is the same ??
Thank you in advance for your time !!
The code runs fine on my machine. You'll need to chase up the package maintainers to get a better idea as to what's happening. As an additional note, make sure you put
set.seed(0)
or the like at the start of your code, to ensure that the random number generation is reproducible.Dear Aaron,
thank you for your verification. So there is something wrong happening with my machine.
Also heres my sessionInfo()
sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Greek_Greece.1253 LC_CTYPE=Greek_Greece.1253
[3] LC_MONETARY=Greek_Greece.1253 LC_NUMERIC=C
[5] LC_TIME=Greek_Greece.1253
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] cluster_2.0.3 mclust_5.0.2 illuminaHumanv4.db_1.26.0
[4] org.Hs.eg.db_3.1.2 RSQLite_1.0.0 DBI_0.3.1
[7] AnnotationDbi_1.30.1 GenomeInfoDb_1.4.1 IRanges_2.2.5
[10] S4Vectors_0.6.2 limma_3.24.14 BiocInstaller_1.18.4
[13] simpleaffy_2.44.0 gcrma_2.40.0 genefilter_1.50.0
[16] affy_1.46.0 Biobase_2.28.0 BiocGenerics_0.14.0
loaded via a namespace (and not attached):
[1] XVector_0.8.0 splines_3.2.0 zlibbioc_1.14.0
[4] affyPLM_1.44.0 xtable_1.7-4 tools_3.2.0
[7] survival_2.38-3 preprocessCore_1.30.0 affyio_1.36.0
[10] Biostrings_2.36.1 XML_3.98-1.3 annotate_1.46.1
I would get in touch with the maintainers to see what is happening with the specific function-on the other hand, i know that is not of your consern to reply or suggest, but as the other part of the code works fine-should i continue with my initial implementation ?
Best,
Efstathios