Can we use Intraclass correlation (ICC) to optimize clustering parameters?


Guest User ★ 13k

@guest-user-4897

Last seen 11.4 years ago

Hi all,

I am trying to find co-expressed genes in my Affy data. I use hierarchical clustering with dynamic tree cut. I want to choose optimal clustering/cut parameters, and I am new to cluster validation. I understand that there are many cluster indices that can be used for cluster validation.

Since I am interested in co-expression only, can I simply use intraclass correlation (ICC) as a metric to choose optimal parameters, i.e., choose the clustering parameters that give the highest ICC in each cluster?

Is ICC commonly used for choosing clustering parameters? Is it OK, or is there a more commonly used metric?

Thanks a lot in advance.
Rafi

-- output of sessionInfo():

R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] GeneAnswers_2.4.0 RColorBrewer_1.0-5 Heatplus_2.8.0 MASS_7.3-29 XML_3.98-1.1 RCurl_1.95-4.1
[7] bitops_1.0-6 igraph_0.6.6 plyr_1.8 KEGG.db_2.10.1 GSEABase_1.24.0 rat2302.db_2.10.1
[13] org.Rn.eg.db_2.10.1 annotate_1.40.0 GOstats_2.28.0 graph_1.40.1 Category_2.28.0 Matrix_1.1-1.1
[19] GO.db_2.10.1 RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0

loaded via a namespace (and not attached):
[1] AnnotationForge_1.4.4 genefilter_1.44.0 grid_3.0.2 IRanges_1.20.6 lattice_0.20-24
[6] RBGL_1.38.0 splines_3.0.2 stats4_3.0.2 survival_2.37-4 tools_3.0.2
[11] xtable_1.7-1

--
Sent via the guest posting facility at bioconductor.org.

GO Clustering rat2302 affy • 1.5k views

ADD COMMENT • link updated 12.0 years ago by Tim Triche ★ 4.2k • written 12.0 years ago by Guest User ★ 13k


Tim Triche ★ 4.2k

@tim-triche-3561

Last seen 5.4 years ago

United States

You might find the 'WGCNA' package useful as a starting point. It is also extensively published and, IIRC, at least one of the authors is on this list.

ADD COMMENT • link 12.0 years ago Tim Triche ★ 4.2k


On Tue, Feb 11, 2014 at 2:08 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
> you might find the 'WGCNA' package to be useful for a starting point. it
> is also extensively published and IIRC, at least one of the authors is on
> this list

One of the WGCNA authors reporting for duty. Tim, thanks for the advertising! :)

>> Hi all,
>>
>> I am trying to find co-expressed genes in my affy data. I use hierarchical
>> clustering with dynamic tree cut. I want to choose optimal clustering/cut
>> parameters and I am new to cluster validation. I understand that there are
>> many cluster indices that can be used for cluster validation.

Validation usually means having an independent data set. If you do have an independent data set and want to know whether the clusters you found in your original ("reference") data set can be found in your validation ("test") data set, you can use the WGCNA module preservation statistics (http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/ModulePreservation/).

If you want to know whether you chose "optimal" clusters using only the data set you derived the clusters from, there are many measures of cluster quality that you can choose from; I am not an expert in this area. Be aware that the Dynamic Tree Cut approach is very heuristic and, depending on which measure of cluster quality you choose, may not lead to an optimal clustering (but it seems to reproduce clusters in simulated data quite well, and on real data it yields functionally coherent modules).

>> Since I am interested in co-expression only, can I simply use intraclass
>> correlation (ICC) as a metric to choose optimal parameters? ie, choose the
>> clustering parameters that gives the highest ICC in each cluster.
>>
>> Is ICC commonly used for choosing clustering parameters? Is it Ok? or Is
>> there any other more commonly used metric?

ICC (I assume you mean the average correlation among all profiles within a cluster) should be a viable measure, but it will inevitably get better as you increase the number of clusters, so simply maximizing ICC will not work; you will need to include some penalty for the number of clusters.

HTH,
Peter
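The point about penalizing the number of clusters can be illustrated with a small sketch. This is not WGCNA or Dynamic Tree Cut code, only a toy example in plain Python. The function names, the linear penalty `penalty * k`, the penalty value of 0.05, and the convention that a single-gene cluster scores 1.0 are all assumptions made for illustration; nothing in the thread prescribes a particular penalty.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_within_cluster_correlation(profiles, labels):
    """Average, over clusters, of the mean pairwise correlation inside each cluster."""
    clusters = {}
    for profile, label in zip(profiles, labels):
        clusters.setdefault(label, []).append(profile)
    per_cluster = []
    for members in clusters.values():
        if len(members) < 2:
            # Assumption: a singleton cluster counts as perfectly coherent.
            per_cluster.append(1.0)
        else:
            pairs = list(combinations(members, 2))
            per_cluster.append(sum(pearson(a, b) for a, b in pairs) / len(pairs))
    return sum(per_cluster) / len(per_cluster)


def penalized_score(profiles, labels, penalty=0.05):
    """Hypothetical criterion: within-cluster correlation minus a linear cost per cluster."""
    k = len(set(labels))
    return mean_within_cluster_correlation(profiles, labels) - penalty * k


# Toy data: three "increasing" and three "decreasing" expression profiles.
profiles = [
    [1, 2, 3, 4], [1, 2, 3, 5], [2, 3, 5, 6],
    [4, 3, 2, 1], [5, 3, 2, 1], [6, 5, 3, 2],
]

# Candidate labelings, as might come from cutting a tree with different parameters.
candidates = {
    1: [0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 1, 1, 1],
    6: [0, 1, 2, 3, 4, 5],
}

raw = {k: mean_within_cluster_correlation(profiles, lab) for k, lab in candidates.items()}
scored = {k: penalized_score(profiles, lab) for k, lab in candidates.items()}

best_raw = max(raw, key=raw.get)        # the unpenalized measure favors all singletons
best_scored = max(scored, key=scored.get)  # the penalized measure favors the two real groups

print("unpenalized:", raw, "-> best k =", best_raw)
print("penalized:  ", scored, "-> best k =", best_scored)
```

On this toy data the unpenalized measure is maximized by putting every gene in its own cluster, while the penalized score picks the two-cluster solution. On real data the result will depend strongly on how the penalty is chosen, so an established cluster quality index may be a safer choice than an ad hoc penalty like this one.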

ADD REPLY • link 12.0 years ago Peter Langfelder ★ 3.0k
