Question

Hier.Clustering: group size effect


Heike Pospisil ▴ 310

@heike-pospisil-1097

Last seen 11.4 years ago

Hello,

I have a question concerning hierarchical clustering and the effect of group sizes.

I would like to select genes that are differentially expressed between group A and group B. Afterwards, I wish to cluster the samples by these genes. In principle this works fine, but I have a problem if the group sizes are substantially unequal. For example:

group A: 53 samples
group B: 12 samples

The resulting clustering brings group B together, but it is not clearly separated from group A. However, if I randomly take 12 samples from group A (to get equal group sizes), the clustering is nearly perfect.

I use

    hclust(dist(t(exprs(sub)),method="euclidean"),method="complete")

(ncol(sub) = groupA+groupB and nrow(sub) = number of significant genes) and tried other distance measures, but without improvement.

Does anybody have a hint as to which clustering algorithm should be preferred for such unequal group sizes?

Thanks in advance and best wishes,
Heike

--
Dr. Heike Pospisil | pospisil at zbh.uni-hamburg.de
University of Hamburg | Center for Bioinformatics
Bundesstrasse 43 | 20146 Hamburg, Germany
phone:+49-40-42838-7303 | fax: +49-40-42838-7312
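For readers who want to experiment with the setup described above, here is a minimal sketch in base R. It is not the poster's actual data or gene-selection method: the expression matrix is simulated as a stand-in for exprs(sub), the per-gene t-test and the cutoff of 20 genes are assumptions made for illustration, and the balanced run reuses the genes selected on the full data (the post does not say whether genes were re-selected). Only the hclust/dist call is taken from the question.

```r
# Simulated stand-in for exprs(sub): rows = genes, columns = samples.
set.seed(1)
n_a <- 53          # group A size, as in the question
n_b <- 12          # group B size, as in the question
n_genes <- 200     # hypothetical number of genes
group <- factor(rep(c("A", "B"), times = c(n_a, n_b)))
x <- matrix(rnorm(n_genes * (n_a + n_b)), nrow = n_genes,
            dimnames = list(paste0("g", seq_len(n_genes)),
                            paste0(group, seq_along(group))))

# Shift the first 20 genes in group B to mimic differential expression.
x[1:20, group == "B"] <- x[1:20, group == "B"] + 2

# Assumed selection step: per-gene t-test, keep the 20 smallest p-values.
pvals <- apply(x, 1, function(g) {
  t.test(g[group == "A"], g[group == "B"])$p.value
})
sel <- x[order(pvals)[1:20], ]

# Cluster the samples with the same call as in the question.
hc <- hclust(dist(t(sel), method = "euclidean"), method = "complete")
clusters <- cutree(hc, k = 2)
print(table(cluster = clusters, group = group))

# Balanced comparison: 12 random group-A samples plus all of group B,
# reusing the genes selected above.
keep <- c(sample(which(group == "A"), n_b), which(group == "B"))
hc_bal <- hclust(dist(t(sel[, keep]), method = "euclidean"),
                 method = "complete")
clusters_bal <- cutree(hc_bal, k = 2)
print(table(cluster = clusters_bal, group = group[keep]))
```

Comparing the two tables shows how well a two-cluster cut recovers the known groups in the unbalanced and balanced cases; swapping the `method` arguments of `dist` and `hclust` lets you try other distance measures and linkages on the same data.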

Clustering • 1.2k views

updated 20.0 years ago by Kevin R. Coombes ▴ 140 • written 20.0 years ago by Heike Pospisil ▴ 310

score 0 · Answer 1 · 2006-02-03

If you already know the groups, then what's the point of doing clustering? More precisely, what biological question do you think you are answering with this method?

Kevin