perhaps because they don't add anything beyond the simple and broadly
To: Ramon Diaz-Uriarte
Cc: Prof Brian Ripley; James W. MacDonald;
Sent: 13/09/04 20:26
Subject: Re: [BioC] Re: [S] Error in clustering procedure
Another issue which I do not understand is: why does everyone
use the same hierarchical clustering method, and none of the
many newer clustering methods that exist?
To mention a few examples in each clustering category:
Partitioning methods: CLARA or CLARANS
Hierarchical methods: BIRCH or CURE
Density-based methods: DBSCAN, OPTICS or DENCLUE
Grid-based methods: STING, WaveCluster or CLIQUE
Model-based methods: COBWEB or CLASSIT
It would be great to be able to try these novel methods
and to know which method would be especially suitable
for which purpose.
Ramon Diaz-Uriarte wrote:
> On Monday 13 September 2004 10:36, michael watson (IAH-C) wrote:
>>I guess I'm coming to this late, but I'm pretty sure all biologists
>>cluster analysis for is for finding out which genes are behaving
>>to one another in a large data set. Then if, for example, all genes
> Oh, but that is one problem I was referring to: say you use UPGMA;
> will get a dendrogram; then, you can make up any story. That was one
> concerns. Clustering gives you clusters, but most papers I've seen
> clustering do not seem to be overly concerned about how meaningful
> repeatable those clusters are.
> Related to the above, and to clustering being over-sold, is the fact
> rarely does one find discussion in those papers about how the type
> clustering algorithm affects the results, and how different
> algorithms/different metrics, etc, can relate to the prior beliefs
> shape of clusters (or how different clustering algorithms are better
> detect different patterns).
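To make the point above about algorithm choice concrete, here is a minimal sketch in plain Python with no clustering library. The function name agglomerate and the six toy expression values are made up for illustration. It runs the same bottom-up merging twice, once with single linkage and once with complete linkage, and stops at two clusters.

```python
from itertools import combinations


def agglomerate(points, k, linkage):
    """Merge clusters bottom-up until k remain.

    linkage is min for single linkage or max for complete linkage,
    applied to all pairwise distances between two clusters.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(
                abs(a - b) for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        # combinations() yields i < j, so deleting index j is safe.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical one-dimensional "expression values" with slowly growing gaps.
values = [0, 10, 21, 33, 46, 60]

single = agglomerate(values, k=2, linkage=min)
complete = agglomerate(values, k=2, linkage=max)

print("single:  ", single)
print("complete:", complete)
```

On these toy values the two linkages cut the same data differently: single linkage chains five of the six points together and leaves one on its own, while complete linkage gives a four/two split. Neither is "the" answer without some prior belief about what shape a cluster should have.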
> And finally, it is not always clear that the difference between
> and confirmatory is being made. We can read sentences such as "the
> results show that there are two groups"... Well, in what sense and
> results from some agglomerative clustering algorithm show that there
> groups (and not twenty)?
> But, again, I do think clustering has a role for certain types of
> just think it is not the magic bullet to "let the data speak for
> and similar marketing hype.
>>certain pathway are showing a similar expression pattern, we have a
>>hypothesis which can be tested further.
>>If cluster analysis has indeed been "over-sold", please suggest a
>>algorithm for summarising groups of genes that are behaving
>>across a group of experiments or time-points :-)
>>From: Ramon Diaz-Uriarte [mailto:email@example.com]
>>Sent: 08 September 2004 09:33
>>Cc: Prof Brian Ripley; cstrato; James W. MacDonald
>>Subject: Re: [BioC] Re: [S] Error in clustering procedure
>>On Tuesday 07 September 2004 21:17, cstrato wrote:
>>>First of all, I want to apologize to Prof. Ripley, since I forgot
>>>ask him first for permission to publish his comment.
>>>Personally, I agree that this would be useless, as Prof. Ripley has
>>>already told me some years ago. However, almost everybody still
>>>to do it and publish the corresponding results. Companies such as
>>>Spotfire are proud that you can do hierarchical clustering with
>>>than 20,000 genes. I have never seen a publication where it was
>>Part of this could be the result of imitative behavior, beliefs that
>>"unless I put a neat heatmap I won't get it past reviewers", etc,
>>evidence that it is the best way to go. If several companies are
>>issue out of it in their brochures, maybe it is because customers
>>clustering. As for "publish the corresponding results" I am not
>>the "results" are, since after clustering 7000 genes you can almost
>>make up a story after the fact; but I would not call that a result.
>>I think clustering (and biclustering) do have a place, but I guess
>>should be used as a tool to answer some question (e.g., I think I
>>understand what question a t-test is helping to answer; I am not
>>most clustering procedures), or as a guidance for something, not as
>>kind of magic tool to "let the data speak for themselves" ( = a) get
>>microarray data; b) run a clustering procedure; c) come up with a
>>that your cluster "answered".)
>>>I think that the bioconductor list would be the best forum to
>>>this issue, and provide solutions (besides the obvious suggestion
>>>filter non-varying genes).
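For what it is worth, the filtering step mentioned above might look roughly like this. It is only a sketch: the gene names, the values, the function name filter_by_variance and the 0.5 cutoff are all invented for illustration, and a real analysis would have to justify the threshold.

```python
from statistics import variance


def filter_by_variance(expr, min_var):
    """Keep genes whose sample variance across arrays is at least min_var."""
    return {
        gene: vals for gene, vals in expr.items() if variance(vals) >= min_var
    }


# Hypothetical log-expression values for three genes on five arrays.
expr = {
    "geneA": [5.0, 5.1, 4.9, 5.0, 5.0],  # nearly flat
    "geneB": [2.0, 6.0, 3.0, 8.0, 1.0],  # clearly varying
    "geneC": [7.0, 7.0, 7.0, 7.0, 7.0],  # constant
}

kept = filter_by_variance(expr, min_var=0.5)
print(sorted(kept))
```

Only the varying gene survives the filter here, so any subsequent clustering would run on far fewer rows than the full matrix.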
>>>James W. MacDonald wrote:
>>>>>Sorry, but I cannot resist:
>>>>>Any comments of the microarray community on the usefulness of
>>>>>hierarchical clustering of 7000 genes?
>>>>I think this would be almost completely useless. First off,
>>>>clustering is not an inferential technique, so its use has been
>>>>completely oversold IMO to the biological community. Secondly,
>>>>clustering is usually done to produce a 'heat map' to put in a
>>>>or flash on the screen during a presentation. How on earth would
>>>>this be of any use? You couldn't even read any of the gene names!
>>>>Of course you could use the heatmap to impress friends and
>>>>colleagues with the fact that you rate a computer powerful enough
>>>>*do* a heatmap with a 7000 x 5 matrix ;-D
>>>Bioconductor mailing list