hclust and (Eisen+ de Hoon) cluster3 program
@benjamin-haibe-kains-955
Hi all,

I have a problem with the R function 'hclust'. I have noticed differences in clustering when I use the 'centroid' cluster method with 'hclust' and with the cluster3 program (see M. Eisen and M. de Hoon).

Have you noticed such differences too?

I use:

hclust from library 'stats' (Built: R 2.0.1; i386-pc-linux-gnu; 2004-11-15 15:56:06; unix)
cluster 3.0 using C Clustering Library version 1.25

Thanks a lot

--
Benjamin Haibe-Kains [http://www.ulb.ac.be/di/map/bhaibeka/]
PhD student in the Machine Learning Group (MLG) [http://www.ulb.ac.be/di/mlg/]
Universite Libre de Bruxelles (ULB)
E-mail: bhaibeka@ulb.ac.be
MicroArray Unity [http://www.bordet.be/servmed/array/index.htm]
Institut Jules Bordet (IJB)
E-mail: benjamin.haibekains@bordet.be
@michael-watson-iah-c-378
Benjamin

You will likely get different results from all clustering software, even when using the same parameters. This is because many arbitrary decisions have to be made during a hierarchical cluster analysis and different programmers will make those decisions in different ways.

Mick

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Benjamin Haibe-Kains
Sent: 06 December 2004 11:05
To: Bioconductor Mailing List
Subject: [BioC] hclust and (Eisen+ de Hoon) cluster3 program
Hi Michael,

I think the differences are too large to be explained by different implementation decisions. My actual problem is that, when I use the 'centroid' method of hclust (with cutree to get the two main groups), I end up with one group containing a single object and another group containing everything else. This is not the case with other software. It looks like a bug in the Fortran routine, but I cannot access its source.

Have you seen this "bug" reported before? Could I easily write my own 'centroid' method?

cheers,

benjamin
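On the question of writing a 'centroid' method by hand: a minimal sketch in Python, assuming squared Euclidean dissimilarities and the standard Lance-Williams update for centroid linkage. This is not the code of hclust or of Cluster 3.0; the function name `centroid_linkage` and the toy points are made up for illustration, and the tie-breaking rule is an arbitrary choice.

```python
def centroid_linkage(points):
    """Naive centroid agglomeration on squared Euclidean distances.

    Returns a list of (sorted member indices, merge height) tuples.
    """
    n = len(points)
    members = {i: [i] for i in range(n)}  # cluster id -> original indices
    d = {}  # (smaller id, larger id) -> squared Euclidean dissimilarity
    for i in range(n):
        for j in range(i + 1, n):
            d[(i, j)] = sum((p - q) ** 2 for p, q in zip(points[i], points[j]))

    merges = []
    next_id = n
    while len(members) > 1:
        active = sorted(members)
        # Closest pair; ties go to the smallest ids, which is an arbitrary choice.
        i, j = min(
            ((a, b) for a in active for b in active if a < b),
            key=lambda pair: (d[pair], pair),
        )
        ni, nj, dij = len(members[i]), len(members[j]), d[(i, j)]
        for k in active:
            if k in (i, j):
                continue
            dki = d[(min(k, i), max(k, i))]
            dkj = d[(min(k, j), max(k, j))]
            # Lance-Williams update for the centroid method.
            d[(k, next_id)] = (
                (ni * dki + nj * dkj) / (ni + nj) - ni * nj * dij / (ni + nj) ** 2
            )
        merged = sorted(members.pop(i) + members.pop(j))
        members[next_id] = merged
        merges.append((merged, dij))
        next_id += 1
    return merges


# Three points on a line: 0 and 2 merge first, their centroid sits at 1,
# and the squared distance from 1 to 10 is 81.
points = [(0.0,), (2.0,), (10.0,)]
merges = centroid_linkage(points)
print(merges)
```

The update formula only reproduces true centroid-to-centroid distances when the input is squared Euclidean. Fed a correlation-based dissimilarity, the same recurrence still runs but no longer corresponds to recomputing a centroid profile and correlating it, which is one possible source of disagreement between programs. Merge heights from centroid linkage are also not guaranteed to increase monotonically.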
On Mon, 6 Dec 2004, Benjamin Haibe-Kains wrote:

> It looks like a bug in the Fortran routine but I can not access to it.

Source code is available from CRAN. In

http://cran.r-project.org/src/base/R-2/R-2.0.1.tar.gz

the directory

R-2.0.1/src/library/stats/src

contains hclust.f and hclust-utils.c.

Happy debugging. ;-)

Charles C. Berry
(858) 534-2098
Dept of Family/Preventive Medicine
UC San Diego
La Jolla, San Diego 92093-0717
E mailto:cberry@tajo.ucsd.edu
http://hacuna.ucsd.edu/members/ccb.html
Thanks for the link to the source code; I will look into it in detail.

For the distance, I use the same one in hclust and in the Eisen software, i.e. the Pearson correlation (also called uncentered correlation). Moreover, I use the 'median center' normalization.

benjamin
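A note on terminology, with a small sketch: the Pearson correlation is usually the centered correlation (means subtracted first), while an uncentered correlation skips the mean subtraction, so the two are distinct measures and it is worth checking that both programs are set to the same one. The Python below is only an illustration with made-up vectors; the function names are hypothetical and are not taken from either program.

```python
import math


def centered_correlation(x, y):
    """Pearson correlation: subtract each vector's mean before correlating."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def uncentered_correlation(x, y):
    """Same formula with the means taken to be zero (cosine of the angle)."""
    num = sum(a * b for a, b in zip(x, y))
    return num / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


x = [1.0, 2.0, 3.0]
y = [11.0, 12.0, 13.0]  # x shifted by a constant offset

r_centered = centered_correlation(x, y)
r_uncentered = uncentered_correlation(x, y)
print(r_centered, r_uncentered)
```

The two profiles differ only by an offset, so the centered value is 1 while the uncentered value is lower. A mismatch between the two settings would change the dissimilarity matrix before any linkage method is applied.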
Dear all,

I saw a precision problem (though maybe in an older version of the Eisen software), and I sent him this remark (Apr 2002):

--- Old Message ---

I used simple data (see below) to understand the hierarchical clustering, and I did find the same results with Maple (not very convenient!) but with a very different precision.

Example (Distance: correlation centered, average link):

NODE1X GENE15X GENE10X 0.9996337890625
NODE2X GENE20X GENE16X 0.99957275390625
NODE3X GENE14X GENE11X 0.99835205078125

Maple:

Node1: .99959179339780276201
Node2: .99956936766825333998
Node3: .99833748845958267738

I thought that Cluster used double precision, but in that case it should give something like 15 correct digits. Fortunately, the data were very short and of the same order of magnitude, but a computer scientist told me that floating-point precision is far lower when the operands (in addition, subtraction, ...) differ greatly in magnitude.

--------------

Data:

UNIQID   NAME  GWEIGHT  GORDER  "V1"  "V2"  "V3"
EWEIGHT                         1     1     1
"A1"           1        1       2     16    18
"A2"           1        2       12    9     7
"A3"           1        3       9     10    4
"A4"           1        4       5     2     12
"A5"           1        5       12    14    7
"A6"           1        6       9     16    10
"A7"           1        7       8     10    10
"A8"           1        8       10    6     6
"A9"           1        9       14    1     28
"A10"          1        10      9     10    23
"A11"          1        11      9     16    27
"A12"          1        12      17    12    37
"A13"          1        13      15    5     23
"A14"          1        14      7     14    29
"A15"          1        15      11    8     29
"A16"          1        16      4     16    37
"A17"          1        17      32    25    34
"A18"          1        18      28    35    30
"A19"          1        19      30    28    23
"A20"          1        20      32    22    28
"A21"          1        21      25    22    26
"A22"          1        22      27    33    26
"A23"          1        23      28    33    31
"A24"          1        24      36    28    31

---

On Mon, 06 Dec 2004 13:09:46 +0100 Benjamin Haibe-Kains <bhaibeka@ulb.ac.be> wrote:

> I think that the differences are too important to be due to different
> implementation decisions.

--
Antoine Lucas
Centre de Génétique Moléculaire, CNRS
91198 Gif sur Yvette Cedex
Tel: (33)1 69 82 38 89
Fax: (33)1 69 82 38 77
@michael-watson-iah-c-378
If there is a problem with hclust() then it's much better dealt with on the R-help mailing list than here.

Mick
Tan, MinHan ▴ 180
@tan-minhan-431
Did you use a correlation distance metric for hclust? AFAIK, that's what the Eisen software uses. I think the default distance metric for hclust is Euclidean? Could be wrong, but it's worth a try.

Min-Han Tan
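To illustrate why the choice of metric matters here, a small Python sketch with made-up profiles compares Euclidean distance with a correlation-based distance, taken as 1 minus the Pearson correlation. The profile values and names are hypothetical. As far as I know, hclust itself takes a precomputed dissimilarity object, and it is R's dist() whose default method is Euclidean.

```python
import math


def pearson(x, y):
    """Centered (Pearson) correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def correlation_distance(x, y):
    return 1.0 - pearson(x, y)


x = (1.0, 2.0, 3.0)
same_shape = (10.0, 20.0, 30.0)  # same trend as x, much larger magnitude
opposite = (3.0, 2.0, 1.0)       # similar magnitude, reversed trend

euclid = {
    "same_shape": math.dist(x, same_shape),
    "opposite": math.dist(x, opposite),
}
corr = {
    "same_shape": correlation_distance(x, same_shape),
    "opposite": correlation_distance(x, opposite),
}

# Nearest neighbour of x under each metric.
nearest_euclid = min(euclid, key=euclid.get)
nearest_corr = min(corr, key=corr.get)
print(nearest_euclid, nearest_corr)
```

With Euclidean distance the reversed profile is the nearest neighbour of x, and with correlation distance the scaled profile is. The first merge of a hierarchical clustering can therefore differ even before the linkage method comes into play.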
Wittner, Ben ▴ 290
@wittner-ben-1031
USA/Boston/Mass General Hospital
In case this helps, I've not looked at cluster3, but I have looked at the source code of Eisen's Cluster (i.e., the one for Windows) and I noticed that Cluster's average agglomeration method seems to compute an average of each cluster (i.e., a point whose n-th coordinate is the average of the n-th coordinates of all the points in the cluster) and then use the distance between those averages whereas hclust() seems to use the average distance between the points of the two clusters. In other words, Cluster uses the distance between averages and hclust() uses the average of distances. -Ben
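The difference between the two rules can be seen on a toy example. This Python sketch uses Euclidean distance and made-up points purely for illustration; it does not reproduce the code of Cluster or of hclust().

```python
import math
from itertools import product

# Two hypothetical clusters of 2-D points.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (3.0, 2.0)]


def centroid(cluster):
    """Coordinate-wise average of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


# Rule 1: distance between the cluster averages.
distance_between_averages = math.dist(centroid(cluster_a), centroid(cluster_b))

# Rule 2: average of all pairwise distances between the two clusters.
pairs = list(product(cluster_a, cluster_b))
average_of_distances = sum(math.dist(p, q) for p, q in pairs) / len(pairs)

print(distance_between_averages, average_of_distances)
```

The first rule gives 3, the second gives (3 + sqrt(13)) / 2, roughly 3.30. Since the two rules assign different heights to the same pair of clusters, they can also rank candidate merges differently and so produce different trees from identical input.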