Hier.Clustering: group size effect
@heike-pospisil-1097
Hello,

I have a question concerning hierarchical clustering and the effect of group sizes.

I would like to select genes that are differentially expressed between group A and group B and then cluster the samples by these genes. In principle this works fine, but I run into a problem when the group sizes are markedly unequal. For example:

group A: 53 samples
group B: 12 samples

The resulting clustering brings the group B samples together, but they are not clearly separated from group A. If I instead take 12 samples from group A at random (to get equal group sizes), the clustering is nearly perfect.

I use hclust(dist(t(exprs(sub)), method="euclidean"), method="complete"), where ncol(sub) = size of group A + size of group B and nrow(sub) = number of significant genes. I have tried other distance measures, but without improvement.

Does anybody have a hint as to which clustering algorithm should be preferred for such unequal group sizes?

Thanks in advance and best wishes,
Heike

--
Dr. Heike Pospisil | pospisil at zbh.uni-hamburg.de
University of Hamburg | Center for Bioinformatics
Bundesstrasse 43 | 20146 Hamburg, Germany
phone:+49-40-42838-7303 | fax: +49-40-42838-7312
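For anyone who wants to reproduce the comparison described above without the original data, here is a minimal sketch on simulated values. It is not the poster's actual analysis: the plain matrix x stands in for exprs(sub), the number of genes and the size of the shift in group B are invented, and cluster_vs_group is a hypothetical helper name. It runs the same hclust call on the full 53 vs. 12 data and on a balanced random subset, then cross-tabulates a two-cluster cut against the known groups.

```r
# Simulated stand-in for exprs(sub): rows = selected genes, columns = samples.
# All sizes and effect sizes below are made up for illustration only.
set.seed(1)
n_genes <- 50
group <- factor(c(rep("A", 53), rep("B", 12)))
x <- matrix(rnorm(n_genes * length(group)), nrow = n_genes,
            dimnames = list(paste0("g", seq_len(n_genes)),
                            paste0(group, seq_along(group))))
# Shift group B upward in every gene to mimic differential expression.
x[, group == "B"] <- x[, group == "B"] + 1.5

# Hypothetical helper: cluster the samples as in the post, cut the tree
# into two clusters, and cross-tabulate clusters against the known groups.
cluster_vs_group <- function(mat, grp) {
  hc <- hclust(dist(t(mat), method = "euclidean"), method = "complete")
  table(cluster = cutree(hc, k = 2), group = grp)
}

# Full, unbalanced data: 53 vs. 12 samples.
full_tab <- cluster_vs_group(x, group)

# Balanced subset: 12 randomly chosen A samples plus all 12 B samples.
keep <- c(sample(which(group == "A"), 12), which(group == "B"))
balanced_tab <- cluster_vs_group(x[, keep], droplevels(group[keep]))

print(full_tab)
print(balanced_tab)
```

Repeating the random draw of 12 group A samples several times, rather than once, would show whether the improvement in the balanced case is stable or depends on which samples happen to be picked.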
@kevin-r-coombes-1589
If you already know the groups, then what's the point of doing clustering? More precisely, what biological question do you think you are answering with this method?

Kevin
Hello Kevin,

Kevin R. Coombes wrote:
> If you already know the groups, then what's the point of doing clustering? More precisely, what biological question do you think you are answering with this method?

I would like to show how well the selected genes discriminate between the two groups.

Best wishes,
Heike