ctc package - cluster dendrogram
Donna Toleno ▴ 90
Hello list.

When I make an R cluster dendrogram, it looks very different from the clustering in the Newick file displayed in TreeView (Rod Page's program). I tried a simple example with 12 probes and 3 samples, and I computed the Euclidean distances both manually and with R.

> library(ctc)
> data
         V1       V2       V3
1  4.184499 4.142575 4.017366
2  3.459849 3.455023 3.732115
3  8.287278 4.887692 5.007794
4  4.137224 4.523774 4.191996
5  4.431768 4.356945 4.570331
6  3.867442 3.931225 3.967566
7  3.480681 3.609997 3.522618
8  3.460785 3.966638 3.708675
9  4.306729 4.480724 4.399165
10 4.290001 4.036634 4.078688
11 6.707544 7.179901 9.475103
12 6.837264 6.845438 7.364477
> hc <- hcluster(t(data), link = "ave")
> write(hc2Newick(hc), file = 'hclust_12_probes_newick')
> plot(hc)
> hc

Call:
hcluster(x = t(data), link = "ave")

Cluster method   : average
Distance         : euclidean
Number of objects: 3

The 'hclust_12_probes_newick' file contains:

(V1:0.752346233726435,(V2:1.21282408894056,V3:1.21282408894056):0.752346233726435);

I can see that the above Newick-formatted tree shows that sample 2 and sample 3 are the appropriate distance apart, about 2.4, but where does the 0.7523... come from? How do I interpret "Height" on the y-axis of this dendrogram? I would like a tree that represents the expression difference.

The Newick tree viewed in TreeView (Rod Page's TreeView) looks different from the dendrogram produced by hcluster, but the branch lengths still do not reflect the Euclidean distances. In my example, the Newick tree shows all three samples about equidistant from each other. Perhaps I should be using phylogenetic tree drawing to get the appropriate branch lengths from the Euclidean distances?

I also experimented with hclust2treeview, but this seems to refer to Michael Eisen's TreeView. I am not familiar with this program or the file formats it uses.

Thank you for reading. Any comments will be appreciated.

Euclidean distances manually calculated in Excel for all of the 12 probes:

            V2          V3
V1 3.508320996 4.352360295
V2             2.425648178

> distances.12.probes <- as.matrix(dist(t(data), method = "euclidean", diag = FALSE))
> distances.12.probes
         V1       V2       V3
V1 0.000000 3.508321 4.352360
V2 3.508321 0.000000 2.425648
V3 4.352360 2.425648 0.000000

Thank you again.

-Donna
Jarno Tuimala ▴ 140
Hi!

If you draw a dendrogram in R, the y-axis is the distance between objects. In your case, the tree looks roughly like this (1, 2 and 3 are the samples V1, V2 and V3):

4   +---+
|   |   |
3   1   |
|      +-+
2      | |
       2 3

The branch that connects V2 and V3 sits at approx. 2.4, which is the distance between these two samples. The same applies to V1 and the (V2, V3) cluster: they connect at approx. 3.9, which under average linkage is the mean of the V1-V2 and V1-V3 distances (3.5 and 4.4). You can plot the tree using

plot(hc, hang=0)

and this should become more evident.

This is in contrast to TreeView, which visualizes the distances as branch lengths. If you visualize the tree in TreeView (by Rod Page), the branch lengths are derived from the Euclidean distances between the samples, and the samples are not equidistant. For example, the distance between V2 and V3 is approx. 2.4. In the tree drawn by TreeView, the branch lengths are half of that, so each terminal branch leading to either V2 or V3 is about 1.2.

You also asked about the 0.752... distances in the tree:

> 'hclust_12_probes_newick' file contains:
> (V1:0.752346233726435,(V2:1.21282408894056,V3:1.21282408894056):0.752346233726435);

The first is the length of the branch leading to V1; the other is the length of the only internal branch of the tree. Both are computed from the pairwise distances between samples using the average linkage (UPGMA) algorithm.

- Jarno

On Mon, 15 Oct 2007, Donna Toleno wrote:
> Hello list.
> [...]

Jarno Tuimala, PhD, bioinformatics, CSC, P.O.Box 405, FI-02101 Espoo, Finland
e-mail: jarno.tuimala at csc.fi
CSC is the Finnish IT Center for Science, http://www.csc.fi/molbio
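The reading of the "Height" axis described above can be checked with a short sketch. It uses base R's `dist()` and `hclust()` rather than `hcluster()` from the thread, on the assumption that both give the same merge heights for Euclidean distance with average linkage; the variable names `d`, `hc` and `heights` are only illustrative.

```r
# Expression values from the question: 12 probes (rows) x 3 samples (columns).
data <- data.frame(
  V1 = c(4.184499, 3.459849, 8.287278, 4.137224, 4.431768, 3.867442,
         3.480681, 3.460785, 4.306729, 4.290001, 6.707544, 6.837264),
  V2 = c(4.142575, 3.455023, 4.887692, 4.523774, 4.356945, 3.931225,
         3.609997, 3.966638, 4.480724, 4.036634, 7.179901, 6.845438),
  V3 = c(4.017366, 3.732115, 5.007794, 4.191996, 4.570331, 3.967566,
         3.522618, 3.708675, 4.399165, 4.078688, 9.475103, 7.364477)
)

# Pairwise Euclidean distances between samples (hence the transpose).
d <- dist(t(data), method = "euclidean")

# Average linkage with base R's hclust(); assumed equivalent to
# hcluster(t(data), link = "ave") for this purpose.
hc <- hclust(d, method = "average")

# Merge heights: the y-axis values at which the dendrogram branches join.
heights <- hc$height
print(round(heights, 4))

# First merge: V2 and V3 join at their pairwise distance.
# Second merge: V1 joins at the mean of its distances to V2 and V3.
dm <- as.matrix(d)
print(dm["V2", "V3"])
print(mean(c(dm["V1", "V2"], dm["V1", "V3"])))

# Uncomment to draw the dendrogram with all leaves at height 0:
# plot(hc, hang = 0)
```

The two merge heights should match the two values printed from the distance matrix (about 2.4 and 3.9), which is what the y-axis of the dendrogram reports.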
Jarno Tuimala wrote:
> [...]
> The first is the length of the branch leading to V1; the other is the length of
> the only internal branch of the tree. Both are computed from the pairwise
> distances between samples using the average linkage (UPGMA) algorithm.

Thank you. I understand it completely now.

(4.352360 + 3.508321)/2 = 3.93 is the average of the distances from V1 to V2 and from V1 to V3. Then 3.93 - 2.43 = 1.50, and 1.50/2 = 0.75, which is the length of both the internal branch and the branch leading to V1.
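The arithmetic in this reply can be written out as a small R sketch that starts from the three pairwise distances reported in the thread. It only reproduces the numbers found in the Newick file; whether `hc2Newick()` computes them in exactly this way internally is an assumption, and the variable names are illustrative.

```r
# Pairwise Euclidean distances between samples, as reported in the thread.
d12 <- 3.508321  # V1 - V2
d13 <- 4.352360  # V1 - V3
d23 <- 2.425648  # V2 - V3

# Average linkage: V2 and V3 merge first, at their pairwise distance.
h_inner <- d23

# V1 then joins the (V2, V3) cluster at the mean of its two distances.
h_root <- (d12 + d13) / 2

# Terminal branches to V2 and V3: half of the V2-V3 distance.
tip_v2_v3 <- h_inner / 2

# Internal branch (and, in the Newick file from the thread, the V1 branch):
# half of the difference between the two merge heights.
internal <- (h_root - h_inner) / 2

print(c(h_inner = h_inner, h_root = h_root,
        tip_v2_v3 = tip_v2_v3, internal = internal))
```

The last two values correspond to the 1.2128... and 0.7523... branch lengths in the Newick string quoted in the question.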