Question: cross validation / bootstrap after classification
12.1 years ago by
Heike Pospisil wrote:
Hello Bioconductors,

I used a t-test and/or SAM to find significant genes describing the differences between two phenotypic classes on hgu133plus2 chips. The resulting heatmaps show a promising clustering.

Now I would like to confirm these clusters and estimate the robustness of this clustering by cross-validation and/or bootstrapping (*). I have two questions:

1) Is there an appropriate package and/or source to perform cross-validation and/or bootstrapping?

2) What is the right measure to rate the goodness of such a clustering? So far, I have looked over the cluster plots (**) and decided whether each was a good or a bad clustering.

Thanks in advance for any suggestions.

Best wishes,
Heike

(*) with varying chip subsets
(**) heatmap(exprs(sub), Colv=as.dendrogram(hclust(dist(t(exprs(sub)), method="euclidean"), method="complete")))
12.1 years ago by
Sean Davis wrote:
On 11/4/05 7:40 AM, "Heike Pospisil" <pospisil at zbh.uni-hamburg.de> wrote:

> Hello Bioconductors,
>
> I used t-test and/or SAM to find significant genes describing the differences
> in hgu133plus2-chips of two different phenotypical classes. The resulting
> heatmaps show a promising clustering.
>
> Now, I would like to confirm these clusters and to estimate the robustness of
> this clustering by cross-validation and/or bootstrapping(*). For that, I have
> two questions:
>
> 1) Does there exist an appropriate package and/or source to perform
> cross-validation and/or bootstrapping?
>
> 2) Which is the right measure to rate the goodness of such a clustering? By
> now, I looked over the cluster plots(**) and decided if it was good or a bad
> clustering.

Heike,

If I understand what you did, I think there is a major problem with your logic. You are using the genes from a SUPERVISED analysis to do your clustering. There SHOULD be clustering, and the strength of that clustering is already measured by the number of significant genes from your SAM analysis.

In other words, you told SAM to find genes that divide your two groups and then asked hierarchical clustering for its best guess at the grouping given those genes. Of course you will get back a clustering very close to the groups that you gave SAM (if, indeed, there is any difference between the two groups). So there is no point in determining the significance of the heatmap clustering; it no longer represents an unsupervised analysis.

Hope that helps a bit.

Sean
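The circularity described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are pure random noise, the class labels are arbitrary, and the sizes (2000 genes, 20 chips, top 50 genes) and helper names `cluster_chips` and `agreement` are made up for illustration. It uses base R only, with the same distance and linkage settings as the heatmap call in the question.

```r
# Simulation sketch; all sizes and names are illustrative assumptions.
set.seed(1)
n_genes <- 2000
n_chips <- 20

# Pure noise: no gene truly differs between the two groups.
x <- matrix(rnorm(n_genes * n_chips), nrow = n_genes)
labels <- rep(c(1, 2), each = n_chips / 2)  # arbitrary class labels

# Supervised step: per-gene two-sample t-test, keep the 50 smallest p-values.
pvals <- apply(x, 1, function(g) {
  t.test(g[labels == 1], g[labels == 2])$p.value
})
top <- order(pvals)[1:50]

# Clustering step: cluster chips and cut the tree into two groups.
cluster_chips <- function(m) {
  tree <- hclust(dist(t(m), method = "euclidean"), method = "complete")
  cutree(tree, k = 2)
}

# Fraction of chips whose cluster matches their label, allowing for the
# arbitrary numbering of the two clusters.
agreement <- function(cl, labels) {
  a <- mean(cl == labels)
  max(a, 1 - a)
}

agree_selected <- agreement(cluster_chips(x[top, ]), labels)
agree_all <- agreement(cluster_chips(x), labels)
cat("selected genes:", agree_selected, " all genes:", agree_all, "\n")
```

Comparing the two agreement values shows how much of the apparent structure comes from the gene selection itself: on noise data like this, the genes picked by the t-test will typically reproduce the labels much better than the full gene set does, even though no real group difference exists.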
12.1 years ago by
Kevin Coombes wrote:
Hi,

I have a package (ClassDiscovery) at
http://bioinformatics.mdanderson.org/Software/OOMPA
that includes classes for PerturbationClusterTest and BootstrapClusterTest.

Richard Simon's book (Design and Analysis of DNA Microarray Experiments) includes a section on assessing the validity of clusters.

Of course, clusters arising from a supervised selection of genes aren't meaningful anyway....

-- Kevin
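The general idea behind a bootstrap cluster test can be sketched in base R. To be clear, this does not use ClassDiscovery or reproduce its API; `bootstrap_coclustering` is a hypothetical helper and the toy data are an assumption. The sketch resamples genes with replacement, reclusters the chips each time, and records how often each pair of chips ends up in the same cluster.

```r
# Hypothetical helper; not part of the ClassDiscovery package.
bootstrap_coclustering <- function(m, k = 2, n_boot = 100) {
  n_chips <- ncol(m)
  together <- matrix(0, n_chips, n_chips,
                     dimnames = list(colnames(m), colnames(m)))
  for (b in seq_len(n_boot)) {
    # Resample genes (rows) with replacement.
    rows <- sample(nrow(m), replace = TRUE)
    tree <- hclust(dist(t(m[rows, , drop = FALSE])), method = "complete")
    cl <- cutree(tree, k = k)
    # Add 1 for every pair of chips assigned to the same cluster.
    together <- together + outer(cl, cl, "==")
  }
  together / n_boot
}

set.seed(2)
# Toy data (an assumption): 200 genes, 12 chips,
# with the first 6 chips shifted upward in 40 genes.
m <- matrix(rnorm(200 * 12), nrow = 200,
            dimnames = list(NULL, paste0("chip", 1:12)))
m[1:40, 1:6] <- m[1:40, 1:6] + 2

co <- bootstrap_coclustering(m, k = 2, n_boot = 100)
print(round(co[1:3, c(1, 2, 7)], 2))
```

Entries of `co` close to 1 or 0 indicate pairs of chips that are consistently grouped together or apart; values in between point to unstable cluster membership. The same caveat applies here as elsewhere in this thread: if the rows of `m` were chosen by a supervised test, stable clusters are expected by construction.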
12.1 years ago by
Heike Pospisil wrote:
Hello,

> I have a package (ClassDiscovery) at
> http://bioinformatics.mdanderson.org/Software/OOMPA
> that includes classes for PerturbationClusterTest and BootstrapClusterTest.
>
> Richard Simon's book (Design and Analysis of DNA Microarray Experiments)
> includes a section on assessing the validity of clusters.

Thanks for these hints.

> Of course, clusters arising from a supervised selection of genes aren't
> meaningful anyway....

I see, my explanation was too imprecise. I use the t-test to get a subset of genes and cluster them. Now I would like to determine how robust this selection is with respect to a random selection of chips.

Sorry for the confusion and thanks for your help. I will try your package soon.

Best wishes,
Heike
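One way to look at the robustness of a t-test gene selection under random chip subsets is to repeat the selection on many subsets and count how often each gene is picked. The following is a minimal base R sketch of that idea, not a method proposed by anyone in this thread; `selection_frequency`, its parameters (`n_top`, `n_rep`, `frac`), and the toy data are all assumptions made for illustration.

```r
# Hypothetical helper: how often is each gene among the top n_top genes
# when the t-test is rerun on random chip subsets?
selection_frequency <- function(x, labels, n_top = 20, n_rep = 30, frac = 0.8) {
  labels <- factor(labels)
  stopifnot(nlevels(labels) == 2, length(labels) == ncol(x))
  groups <- split(seq_along(labels), labels)  # chip indices per class
  counts <- integer(nrow(x))
  for (r in seq_len(n_rep)) {
    # Stratified subset without replacement, so both classes stay represented.
    keep <- lapply(groups, function(idx) {
      idx[sample.int(length(idx), max(2, floor(frac * length(idx))))]
    })
    pvals <- apply(x, 1, function(g) {
      t.test(g[keep[[1]]], g[keep[[2]]])$p.value
    })
    top <- order(pvals)[seq_len(n_top)]
    counts[top] <- counts[top] + 1L
  }
  counts / n_rep
}

set.seed(3)
# Toy data (an assumption): 500 genes, 20 chips, two classes of 10 chips;
# the first 20 genes are shifted upward in class "B".
x <- matrix(rnorm(500 * 20), nrow = 500)
labels <- rep(c("A", "B"), each = 10)
x[1:20, labels == "B"] <- x[1:20, labels == "B"] + 3

freq <- selection_frequency(x, labels)
print(head(sort(freq, decreasing = TRUE)))
```

Genes with a selection frequency near 1 are chosen regardless of which chips are left out, while genes that appear only occasionally depend on a few particular chips. This measures the stability of the gene list only; it does not turn the subsequent clustering into an unsupervised result.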