Question: Can we use Intraclass correlation (ICC) to optimize clustering parameters?
5.6 years ago, Guest User wrote:
Hi all,

I am trying to find co-expressed genes in my affy data. I use hierarchical clustering with dynamic tree cut. I want to choose optimal clustering/cut parameters, and I am new to cluster validation. I understand that there are many cluster indices that can be used for cluster validation.

Since I am interested in co-expression only, can I simply use intraclass correlation (ICC) as a metric to choose optimal parameters? That is, choose the clustering parameters that give the highest ICC in each cluster.

Is ICC commonly used for choosing clustering parameters? Is it OK? Or is there another, more commonly used metric?

Thanks a lot in advance.
Rafi

-- output of sessionInfo():

R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C  LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
 [1] GeneAnswers_2.4.0 RColorBrewer_1.0-5 Heatplus_2.8.0 MASS_7.3-29 XML_3.98-1.1 RCurl_1.95-4.1
 [7] bitops_1.0-6 igraph_0.6.6 plyr_1.8 KEGG.db_2.10.1 GSEABase_1.24.0 rat2302.db_2.10.1
[13] org.Rn.eg.db_2.10.1 annotate_1.40.0 GOstats_2.28.0 graph_1.40.1 Category_2.28.0 Matrix_1.1-1.1
[19] GO.db_2.10.1 RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0

loaded via a namespace (and not attached):
 [1] AnnotationForge_1.4.4 genefilter_1.44.0 grid_3.0.2 IRanges_1.20.6 lattice_0.20-24
 [6] RBGL_1.38.0 splines_3.0.2 stats4_3.0.2 survival_2.37-4 tools_3.0.2
[11] xtable_1.7-1

-- Sent via the guest posting facility at bioconductor.org.
go clustering rat2302 affy • 579 views
Answer: Can we use Intraclass correlation (ICC) to optimize clustering parameters?
5.6 years ago, Tim Triche (United States) wrote:
You might find the 'WGCNA' package a useful starting point. It is also extensively published and, IIRC, at least one of the authors is on this list.

Statistics is the grammar of science. Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
On Tue, Feb 11, 2014 at 2:08 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
> you might find the 'WGCNA' package to be useful for a starting point. it
> is also extensively published and IIRC, at least one of the authors is on
> this list

One of the WGCNA authors reporting for duty. Tim, thanks for the advertising! :)

>> Hi all,
>>
>> I am trying to find co-expressed genes in my affy data. I use hierarchical
>> clustering with dynamic tree cut. I want to choose optimal clustering/cut
>> parameters and I am new to cluster validation. I understand that there are
>> many cluster indices that can be used for cluster validation.

Validation usually means having an independent data set. If you do have an independent data set and want to know whether the clusters you found in your original ("reference") data set can also be found in your validation ("test") data set, you can use the WGCNA module preservation statistics (http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/ModulePreservation/).

If you want to know whether you chose "optimal" clusters using only the data set you derived the clusters from, there are many measures of cluster quality to choose from; I am not an expert in this area. Be aware that the Dynamic Tree Cut approach is heuristic and, depending on which measure of cluster quality you choose, may not lead to an optimal clustering (but it seems to reproduce clusters in simulated data quite well, and on real data it yields functionally coherent modules).

>> Since I am interested in co-expression only, can I simply use intraclass
>> correlation (ICC) as a metric to choose optimal parameters? ie, choose the
>> clustering parameters that gives the highest ICC in each cluster.
>>
>> Is ICC commonly used for choosing clustering parameters? Is it Ok? or Is
>> there any other more commonly used metric?

ICC (I assume you mean the average correlation among all profiles within a cluster) should be a viable measure, but it will inevitably get better as you increase the number of clusters, because smaller clusters are more homogeneous. Simply maximizing ICC will therefore not work: you will need to include some penalty for the number of clusters.

HTH,
Peter
Reply written 5.6 years ago by Peter Langfelder
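To illustrate the point about penalizing the number of clusters, here is a minimal sketch in plain Python. It is not taken from WGCNA or Dynamic Tree Cut. The toy profiles, the candidate partitions, the choice to skip single-gene clusters, and the linear penalty (`PENALTY` times the number of clusters) are all assumptions made for illustration; the thread does not specify what form the penalty should take. The score used is the mean pairwise Pearson correlation within each cluster, averaged over clusters, which follows Peter's reading of "ICC" rather than the formal intraclass correlation coefficient.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_within_cluster_correlation(profiles, labels):
    """Mean pairwise correlation per cluster, averaged over clusters.

    Assumption: clusters with a single gene have no pairs and are skipped.
    """
    clusters = {}
    for gene, label in labels.items():
        clusters.setdefault(label, []).append(gene)
    per_cluster = []
    for members in clusters.values():
        if len(members) < 2:
            continue
        cors = [pearson(profiles[a], profiles[b])
                for a, b in combinations(members, 2)]
        per_cluster.append(sum(cors) / len(cors))
    if not per_cluster:
        raise ValueError("every cluster is a singleton; score is undefined")
    return sum(per_cluster) / len(per_cluster)


# Hypothetical toy data: g1-g3 rise across samples, g4-g6 fall.
profiles = {
    "g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 2, 3, 5],
    "g4": [4, 3, 2, 1], "g5": [8, 6, 4, 2], "g6": [5, 3, 2, 1],
}

# Hypothetical partitions, as if produced by different cut parameters.
candidates = {
    1: dict(g1="A", g2="A", g3="A", g4="A", g5="A", g6="A"),
    2: dict(g1="A", g2="A", g3="A", g4="B", g5="B", g6="B"),
    3: dict(g1="A", g2="A", g3="C", g4="B", g5="B", g6="B"),
    4: dict(g1="A", g2="A", g3="C", g4="B", g5="B", g6="D"),
}

PENALTY = 0.05  # assumed cost per cluster; would need tuning on real data

raw = {k: mean_within_cluster_correlation(profiles, labels)
       for k, labels in candidates.items()}
penalized = {k: raw[k] - PENALTY * len(set(labels.values()))
             for k, labels in candidates.items()}

best_raw = max(raw, key=raw.get)
best_penalized = max(penalized, key=penalized.get)

for k in candidates:
    print(k, round(raw[k], 4), round(penalized[k], 4))
print("best by raw score:", best_raw)
print("best by penalized score:", best_penalized)
```

On this toy data the raw score keeps rising as clusters are split further, while the penalized score peaks at the two-cluster partition that matches how the profiles were constructed. The size of the penalty decides where that peak falls, so it is a modelling choice rather than something the data determine.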