Again on heatmap clusters - dChip style, Pearson's distance. Which solutions ?


Giulio Di Giovanni ▴ 540

@giulio-di-giovanni-950


Looking thru the mailing list and the web I found a question by Saurin Jani that's exactly my question. But this question was answered in two different ways by Michael Watson and Shi Tao.

I was trusting Michael, because Saurin answered that it was OK, but Shi Tao's argument also looks convincing...

Here are the three mails, starting from the bottom... Which one has the more appropriate solution?

Thanks a lot

Giulio

-----------------------------------

Shi, Tao writes:

Here is what dChip manual says:

"The default clustering algorithm of genes is as follows: the distance between two genes is defined as 1 - r where r is the Pearson correlation coefficient between the standardized expression values (make mean 0 and standard deviation 1) of the two genes across the samples used. Two genes with the closest distance are first merged into a super-gene and connected by branches with length representing their distance, and are then excluded for subsequent merging events. The expression values of the newly formed super-gene is the average of standardized expression values of the two genes (centroid-linkage) across samples. Then the next pair of genes (super-genes) with the smallest distance is chosen to merge and the process is repeated n - 1 times to merge all the n genes. A similar procedure is used to cluster samples....."

so, to follow that exactly, what you need to do is something like:

row.dist <- as.dist(1 - cor(scale(t(esetSub2X))))
col.dist <- as.dist(1 - cor(scale(esetSub2X)))
heatmap(esetSub2X, Colv=as.dendrogram(hclust(col.dist, method="centroid")), Rowv=as.dendrogram(hclust(row.dist, method="centroid")))

===========================================================================================

>Message: 20
>Date: Tue, 16 Nov 2004 09:05:30 -0000
>From: "michael watson (IAH-C)" <michael.watson at bbsrc.ac.uk>
>Subject: RE: [BioC] How can I get Heatmap using dChip clustering..which is nice& easy to see patterns
>To: <saurin_jani at yahoo.com>, "Bioconductor Bioconductor" <bioconductor at stat.math.ethz.ch>
>Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E89817 at iahce2knas1.iah.bbsrc.reserved>
>Content-Type: text/plain; charset="us-ascii"
>
>Hi Saurin
>
>I may be wrong, but it looks like your code calculates the euclidean distance between rows of 1-cor(), which is itself a distance matrix of sorts. Try:
>
>row.dist <- as.dist(1 - cor(t(esetSub2X)))
>col.dist <- as.dist(1 - cor(esetSub2X))
>heatmap(esetSub2X, Colv=as.dendrogram(hclust(col.dist, method="average")), Rowv=as.dendrogram(hclust(row.dist, method="average")))
>
>Mick
>
>-----Original Message-----
>From: Saurin Jani [mailto:saurin_jani at yahoo.com]
>Sent: 15 November 2004 23:28
>To: Bioconductor Bioconductor
>Subject: [BioC] How can I get Heatmap using dChip clustering..which is nice& easy to see patterns
>
>Hi ,
>
>How can I get dChip clustering on heatmap?..which is nice & easy to see patterns.
>
>I am using 1- cor(eset) but somehow its not working I am still getting diff. kind of clustering dendrogram.
>
>> d <- dist((1 - cor(esetSub2X)),method = "euclidean");
>> dCol <- dist(t((1- cor(esetSub2X))),method = "euclidean");
>> heatmap(esetSub2X,Colv= as.dendrogram(hclust(d,method = "complete")),Rowv = NA,col = rbg,cexRow = 1,cexCol = 1);
>
>Am I missing something?
>
>Any heatmap clustering is helpful.
>
>Thank you,
>Saurin
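To illustrate Mick's remark in the quoted mail, here is a minimal sketch of how dist(1 - cor(x)) differs from as.dist(1 - cor(x)). The matrix x is a simulated, hypothetical stand-in for esetSub2X (which is not available here), so only the qualitative behaviour carries over.

```r
set.seed(1)
# Hypothetical stand-in for esetSub2X: 50 genes (rows) x 6 samples (columns)
x <- matrix(rnorm(300), nrow = 50, ncol = 6,
            dimnames = list(paste0("g", 1:50), paste0("s", 1:6)))

# 1 - r used directly as the distance between samples
cor.dist <- as.dist(1 - cor(x))

# Euclidean distance between the rows of the 1 - r matrix,
# i.e. a distance computed on top of something that is already a distance
eucl.dist <- dist(1 - cor(x), method = "euclidean")

# Both objects cover the same pairs of samples, but hold different values
same.size   <- length(cor.dist) == length(eucl.dist)
same.values <- isTRUE(all.equal(as.vector(cor.dist), as.vector(eucl.dist)))
print(c(same.size = same.size, same.values = same.values))
```

Since hclust() only sees the distance object it is given, the two versions will in general produce different dendrograms even with the same linkage method.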




Adaikalavan Ramasamy ★ 1.8k

@adaikalavan-ramasamy-675


Clustering methods are exploratory in nature: they let you explore the data. There are usually no well-defined rules for exploratory methods. My experience is that investigators play around too much with such methods until they arrive at a visually interesting picture around which they can build a story. Clustering may be the first analysis one does, but it is by no means the final or definitive one, as many like to believe.

Now that I have expressed my distaste for the overuse of clustering methods in microarray analysis, let me answer your question.

The two answers differ in that a) Tao Shi uses scale() on the expression values before calculating the correlations and b) they use different linkage methods.

The use of scale() is redundant here, as cor() does it anyway.

m <- matrix( rnorm(1000), ncol=10 )
c1 <- cor( m )
c2 <- cor( scale(m) )
all.equal( c1, c2 )
[1] TRUE

I cannot advise which linkage is best. But if you want to reproduce dChip's clustering and if it uses complete linkage, then you should do the same. But I believe that the most "popular" linkage method used for microarray data is "average" linkage.

Regards, Adai

On Fri, 2005-11-25 at 12:07 +0000, Giulio Di Giovanni wrote:
> Looking thru the mailing list and the web I found a question by Saurin Jani
> that's exactly my question. But this question was answered in two different
> ways by Michael Watson and Shi Tao.
> I was trusting Michael, because Saurin answered that was Ok, but also Shi
> Tao argumentation look convincing...
> here the three mails, starting from the bottom...
> Which one has the more appropriate solution ?
>
> Thanks a lot
>
> Giulio
>
> [...]
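As a sketch of the point made in the answer above, the following checks on simulated data that scale() changes neither the gene nor the sample distances, so the two proposed solutions differ only through the linkage method. The matrix x is a hypothetical stand-in for esetSub2X, and the comparison of merge heights is just one simple way to see that "average" and "centroid" behave differently. It is an assumption, not something verified here, that hclust()'s "centroid" update on 1 - r distances matches dChip's description of averaging standardized profiles.

```r
set.seed(1)
# Hypothetical stand-in for esetSub2X: 50 genes (rows) x 6 samples (columns)
x <- matrix(rnorm(300), nrow = 50, ncol = 6)

# Gene (row) and sample (column) distances, with and without scale()
plain.row  <- as.dist(1 - cor(t(x)))
scaled.row <- as.dist(1 - cor(scale(t(x))))
plain.col  <- as.dist(1 - cor(x))
scaled.col <- as.dist(1 - cor(scale(x)))

# Compare the raw distance values (as.vector() drops the differing "call" attribute)
same.dist <- isTRUE(all.equal(as.vector(plain.row), as.vector(scaled.row))) &&
             isTRUE(all.equal(as.vector(plain.col), as.vector(scaled.col)))

# Same distances, two linkage methods: the merge heights no longer agree
hc.avg <- hclust(plain.row, method = "average")
hc.cen <- hclust(plain.row, method = "centroid")
same.heights <- isTRUE(all.equal(hc.avg$height, hc.cen$height))

print(c(same.dist = same.dist, same.heights = same.heights))
```

One thing worth checking in ?hclust before settling on "centroid": as far as I know, the documentation describes the centroid and median methods as intended for squared Euclidean distances and warns that their merge heights need not be monotone, which can make the dendrogram in a heatmap harder to read.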
