Question

dendrograms on heatmap.2 (gplots)

Gavin Koh ▴ 220

@gavin-koh-4582

Last seen 9.6 years ago

Dear all,

I have human Illumina gene expression data that I want to present in a heatmap. I have already got the perfect heatmap using heatmap.2 (gplots) and RColorBrewer (R 2.13.0). I just want to put a dendrogram on the columns, but the default behaviour of heatmap.2 is to reorder the columns, which I do not want.

Reading the heatmap.2 help file, it looks like I would need hclust() and as.dendrogram() to first generate the dendrogram (ideally unsupervised k-nearest neighbours), which I can do easily, but again: heatmap.2 will just reorder the columns according to the dendrogram.

Help, please? How do I get R to draw me a KNN dendrogram with two clusters and add it to my heatmap without reordering it?

Thanks in advance,
Gavin.

• 2.6k views

ADD COMMENT • link updated 12.9 years ago by Steve Lianoglou ★ 13k • written 12.9 years ago by Gavin Koh ▴ 220

score 0 · Answer 1 · 2011-05-28

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 14 months ago

United States

Hi,

On Sat, May 28, 2011 at 12:58 AM, Gavin Koh wrote:
> Help, please? How do I get R to draw me a KNN dendrogram with two
> clusters and add it to my heatmap without reordering it?

I could be mistaken, but I'm not sure that you can.

Let's think of the most pathological case: imagine that the first and last columns of your expression matrix are considered closest by their distance on the dendrogram. How would you draw the dendrogram "correctly" without first moving the first and last columns next to each other?

I guess your situation isn't as extreme as that, but I'm guessing you think your columns are already "grouped" together and just want to draw the dendrogram on top to better illustrate that point, but when you do, it ends up showing you something different from what you expected to see?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
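The pathological case described above can be reproduced with a few lines of base R. This is only a sketch on made-up data: the matrix `expr`, its dimensions, and the sample names `s1` to `s5` are hypothetical, and the last column is deliberately made almost identical to the first.

```r
# Toy data (made up for illustration): 6 genes x 5 samples.
set.seed(1)
expr <- matrix(rnorm(6 * 5), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("s", 1:5)))

# Force the first and last samples to be nearly identical.
expr[, "s5"] <- expr[, "s1"] + 0.01

# dist() works on rows, so transpose to cluster the samples (columns).
hc <- hclust(dist(t(expr)))

# hc$order gives the left-to-right leaf order used when the tree is drawn.
leaf_order <- hc$labels[hc$order]
print(leaf_order)

# s1 and s5 merge first, so they sit next to each other in the drawn tree
# even though they are columns 1 and 5 of the matrix.
pos <- match(c("s1", "s5"), leaf_order)
print(abs(diff(pos)) == 1)
```

Because the two samples are joined at the bottom of the tree, any drawing without crossing branches has to place them side by side, which is why the columns get moved.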

ADD COMMENT • link 12.9 years ago Steve Lianoglou ★ 13k

Dear Steve,

I have healthy controls and patients, so two groups. k-means misclassifies a few study subjects, but by and large, redrawing the dendrogram while preserving the ordering is not going to seriously mess things up.

Gavin.

On 28 May 2011 15:31, Steve Lianoglou wrote:
> I could be mistaken, but I'm not sure that you can.

--
Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law.
Douglas Hofstadter (in Gödel, Escher, Bach, 1979)

ADD REPLY • link 12.9 years ago Gavin Koh ▴ 220

Hi Gavin,

On Sat, May 28, 2011 at 11:06 AM, Gavin Koh wrote:
> Dear Steve, I have healthy controls and patients, so two groups.
> k-means misclassifies a few study subjects, but by and large,
> redrawing the dendrogram while preserving the ordering is not going to
> seriously mess things up.

Sorry if my post came across in the wrong way -- I'm not trying to imply that you are trying to show something that isn't true. I'm actually not sure how you interpreted my email, because I'm not sure what you're trying to say in your reply, so let me try another way :-)

I guess my point is that: yes, you have two groups when you condition group assignment on a state we call "healthy" and "affected" (or whatever you call them here).

If you are asking to group your patients in a different way -- this time using your gene expression profiles -- it's not totally unusual for things to change a bit.

So, again, I'm not trying to lecture here, but this is the way I understand it. If I'm wrong, feel free to correct me:

The distances we "walk along" the arms/branches of the dendrogram say something about the distance between the "things" they are connecting. If you didn't change any params in your heatmap call, the default distance measure between your vectors is Euclidean distance, and that just is what it is. The dendrogram is then drawn to respect those distances. If you move things around, then you are saying something different about those distances, right?

In this context, I'm confused about your point when you say "redrawing the dendrogram while preserving the ordering is not going to seriously mess things up" -- what ordering do you expect to be preserved? Is it the columns of the matrix that you passed in? If you don't want to move those columns around, then do you want the branches of the tree to criss-cross or something?

The way I see it, you are kind of stuck if you intend to draw a dendrogram at all.

So -- how can we move things around in a natural way?

Maybe you can choose a different distance measure?
Maybe you can normalize your data in a different way?
Maybe you can plot a subset of genes -- maybe those with the highest variance across all your data, which might result in new distances being calculated, and a different drawing of the branches on the tree.

You could always pass in your own dendrogram structure to the heatmap and "arbitrarily" calculate distances so that the tree draws as you want, but I don't think that's something you'd want to do anyway.

Another approach to show "likeness" between expression profiles is to not focus on the dendrogram lining up "just so", but rather to add a list of colors to the examples (columns) of your data by using the "ColSideColors" parameter. Say the first 10 columns of your matrix are from the 10 controls, and the last 10 are from the affected subjects. You can do:

R> heatmap.2(my.data, ..., ColSideColors=c(rep('blue', 10), rep('red', 10)))

If, as you say, the expression profiles are *mostly* similar, you'll see that, by and large, the blue experiments will be "chunked" with blue, and the red experiments will be chunked with red, which might show the same point you're trying to make with the dendrogram.

HTH,
-steve
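A fuller version of the ColSideColors suggestion might look like the sketch below. The data are simulated and the names `my.data`, `group` and `col.colours` are placeholders; `Colv = FALSE` and `dendrogram = "row"` are assumptions added here to keep the columns in their original order, which is what the question asked for. Dropping those two arguments lets heatmap.2 cluster and reorder the columns, so you can see whether the blue and red bars chunk together.

```r
# Hypothetical data: 50 genes x 20 samples,
# first 10 columns controls, last 10 columns patients.
set.seed(42)
my.data <- matrix(rnorm(50 * 20), nrow = 50,
                  dimnames = list(paste0("gene", 1:50),
                                  paste0("sample", 1:20)))
group <- rep(c("control", "patient"), each = 10)

# One colour per column, in the same order as the columns of my.data.
col.colours <- ifelse(group == "control", "blue", "red")

# Only plot if gplots is installed.
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(my.data,
                    Colv = FALSE,        # assumption: keep columns in matrix order
                    dendrogram = "row",  # draw the row dendrogram only
                    ColSideColors = col.colours,
                    trace = "none")
}
```

ColSideColors must have exactly one entry per column, so it is worth checking `length(col.colours) == ncol(my.data)` before plotting.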

ADD REPLY • link 12.9 years ago Steve Lianoglou ★ 13k

Dear Steve,

Yes, I expect that in preserving the order in which I currently have the samples, the branches will cross. You are right: it will be clearer to cluster by k-means and then use ColSideColors to colour the leaves than to try to draw a dendrogram with criss-crossing branches. Thanks for helping me think this through.

Gavin.

On 28 May 2011 18:20, Steve Lianoglou wrote:
> Another approach to show "likeness" between expression profiles is to
> not focus on the dendrogram lining up "just so", but to rather add a
> list of colors to the examples (columns) of your data by using the
> "ColSideColors" parameter.
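The k-means plus ColSideColors plan could be sketched as follows. Everything here is illustrative: the matrix `expr` is simulated, the shift added to the patient columns is arbitrary, and `known` stands in for the real control/patient labels.

```r
set.seed(7)
# Hypothetical data: 50 genes x 20 samples; patients are shifted upward
# in the first 15 genes so that there is some structure to find.
expr <- matrix(rnorm(50 * 20), nrow = 50)
colnames(expr) <- paste0("sample", 1:20)
known <- rep(c("control", "patient"), each = 10)
expr[1:15, known == "patient"] <- expr[1:15, known == "patient"] + 3

# kmeans() clusters rows, so transpose to cluster the samples.
km <- kmeans(t(expr), centers = 2, nstart = 25)

# Cluster numbers 1 and 2 are arbitrary labels; map them to colours.
cluster.colours <- c("blue", "red")[km$cluster]

# Cross-tabulate against the known groups to count the samples that
# k-means places differently.
print(table(known, cluster = km$cluster))
```

The vector `cluster.colours` is in column order, so it can be passed as ColSideColors to heatmap.2 with the columns left where they are.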

ADD REPLY • link 12.9 years ago Gavin Koh ▴ 220

Dear Steve,

Just for the record, I think I have found a function that allows drawing a dendrogram with the leaf order specified. It is draw.dendrogram {NeatMap}. I cannot see a way of directly drawing the reordered dendrogram in the heatmap, though, so I still think that your solution is better :-)

Gavin.

On 28 May 2011 19:53, Gavin Koh wrote:
> You are right: it will be clearer to cluster by k-means then use
> ColSideColors to colour the leaves than to try to draw a dendrogram
> with criss-crossing branches.
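A related option in base R, not mentioned in the thread, is `reorder()` on a dendrogram. It rotates branches at each node according to supplied weights, so the leaves can be brought as close to the original column order as the tree allows, without any branches crossing. It cannot produce an arbitrary leaf order. The sketch below uses simulated data, and the names `expr`, `dend` and `dend2` are placeholders.

```r
set.seed(3)
# Hypothetical data: 30 genes x 8 samples.
expr <- matrix(rnorm(30 * 8), nrow = 30,
               dimnames = list(NULL, paste0("s", 1:8)))

# Column dendrogram from hierarchical clustering of the samples.
dend <- as.dendrogram(hclust(dist(t(expr))))

# Weight each sample by its column position. reorder() then rotates the
# branches so that subtrees with a smaller mean position are drawn first.
dend2 <- reorder(dend, wts = seq_len(ncol(expr)), agglo.FUN = mean)

# Leaf order before and after rotation (indices into the columns of expr).
print(order.dendrogram(dend))
print(order.dendrogram(dend2))
```

As I read the heatmap.2 help page, a dendrogram passed as `Colv` is used as-is, so `dend2` could be supplied there. The columns would still be arranged to match its leaf order, so this reduces the reshuffling rather than removing it.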

ADD REPLY • link 12.9 years ago Gavin Koh ▴ 220