When is centering and scaling needed before doing hierarchical clustering?
hrishi27n ▴ 20
Last seen 3 months ago
United States

Hello All,

I am working on a clustering project where we have collected protein data from over 100 patient samples. The data are normalized and log-transformed to achieve a uniform distribution. The goal is to cluster samples based on their similarities. I am using hierarchical clustering and trying out combinations of distance metrics and clustering algorithms (we haven't yet decided on either). My question is about centering and scaling: is it absolutely necessary to both center and scale the data, even when all the data come from the same platform and use the same units of measurement?

Appreciate your input on this one.

clustering hierarchical clustering • 2.7k views
chris86 ▴ 390
Last seen 24 months ago
UCL, United Kingdom

Scaling is only necessary when you are combining features measured in different units or on different scales, such as height and weight.

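Centering is done in principal component analysis, for instance. It is not needed for clustering with a distance such as Euclidean, because subtracting each feature's mean shifts every sample by the same amount and so does not affect the distances between samples or the resulting clusters.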
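As a rough illustration of the two points above, here is a minimal sketch on made-up numbers (the `samples` table and the helper names are hypothetical, not taken from the question's data). It assumes Euclidean distance and compares the pairwise distances between samples before and after centering, and after centering plus scaling to unit standard deviation.

```python
import math
import statistics

# Hypothetical toy data: 4 samples (rows) x 2 features (columns).
# The second feature has a much larger spread than the first.
samples = [
    [1.0, 100.0],
    [1.2, 130.0],
    [5.0, 105.0],
    [5.2, 160.0],
]

def transform(rows, center=False, scale=False):
    """Optionally subtract each feature's mean and/or divide by its standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) if center else 0.0 for col in columns]
    sds = [statistics.stdev(col) if scale else 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def pairwise_distances(rows):
    """Euclidean distance between every pair of samples."""
    return [[math.dist(a, b) for b in rows] for a in rows]

def nearest_neighbour(dist, i):
    """Index of the sample closest to sample i (excluding itself)."""
    return min((j for j in range(len(dist)) if j != i), key=lambda j: dist[i][j])

raw = pairwise_distances(samples)
centered = pairwise_distances(transform(samples, center=True))
scaled = pairwise_distances(transform(samples, center=True, scale=True))

# Centering alone: every pairwise distance is unchanged (up to rounding).
same_after_centering = all(
    math.isclose(r, c, abs_tol=1e-9)
    for raw_row, cen_row in zip(raw, centered)
    for r, c in zip(raw_row, cen_row)
)
print("distances unchanged by centering:", same_after_centering)

# Scaling: the distances, and possibly the neighbour structure, do change.
print("nearest neighbour of sample 0 (raw):   ", nearest_neighbour(raw, 0))
print("nearest neighbour of sample 0 (scaled):", nearest_neighbour(scaled, 0))
```

In this toy case the nearest neighbour of the first sample changes once the features are scaled, because the high-variance feature no longer dominates the distance. Whether that is desirable depends on whether the differences in spread between features are meaningful for your data.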

You could also consider trying k-means with the silhouette method or the gap statistic to choose the number of clusters. Clustering techniques tend to work better if the clusters are roughly spherical in N-dimensional sample space, but you can also run them on uniformly distributed data. K-means divides two uniformly distributed clusters here very well, and I expect hclust would be fine too. I am talking about how samples are distributed in state space here, not how feature or sample 'signal' is distributed.
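To make the k-means plus silhouette suggestion concrete, here is a small self-contained sketch. Everything in it is an assumption for illustration: the two simulated uniformly distributed clusters, the hand-written `kmeans` (plain Lloyd's algorithm with random starting points) and `mean_silhouette` helpers, and the range of k tried. In practice you would more likely use an established implementation from a statistics library.

```python
import math
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def mean_silhouette(points, labels):
    """Average silhouette width: near 1 means compact, well-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)  # convention for singletons / a single cluster
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the closest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical data: two uniformly distributed clusters of 30 samples each.
rng = random.Random(1)
points = (
    [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(30)]
    + [[rng.uniform(3, 4), rng.uniform(0, 1)] for _ in range(30)]
)

# Try several values of k and keep the one with the highest mean silhouette.
scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: mean silhouette = {s:.3f}")
print("best k:", best_k)
```

The same `mean_silhouette` function can be applied to labels obtained by cutting a hierarchical clustering tree, so it gives one way to compare different distance and linkage choices on equal terms.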


Finally, it is important that your data is homoscedastic, meaning the variance of each feature does not depend on its mean. This isn't a problem with microarray data, but it is for RNA-seq, so there we have to transform the data appropriately.
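A quick way to see what a mean-variance dependence looks like, and how a transform can remove it, is the simulation below. It is only a sketch under stated assumptions: two hypothetical features with multiplicative (log-normal) noise and made-up means of 10 and 1000. It does not model RNA-seq counts, which usually call for dedicated transforms.

```python
import math
import random
import statistics

rng = random.Random(0)
SIGMA = 0.3  # same relative noise level for both simulated features

# Two hypothetical features whose noise is proportional to their mean.
low = [rng.lognormvariate(math.log(10), SIGMA) for _ in range(500)]
high = [rng.lognormvariate(math.log(1000), SIGMA) for _ in range(500)]

# On the raw scale the high-mean feature has a far larger variance.
raw_ratio = statistics.variance(high) / statistics.variance(low)

# After a log transform the two variances are roughly equal.
log_ratio = (
    statistics.variance([math.log2(x) for x in high])
    / statistics.variance([math.log2(x) for x in low])
)

print(f"variance ratio (raw):  {raw_ratio:.1f}")
print(f"variance ratio (log2): {log_ratio:.2f}")
```

On the raw scale the high-mean feature would dominate any Euclidean distance, while after the log transform both features contribute on a comparable footing.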

I hope that helps.




