Question: When is centering and scaling needed before doing hierarchical clustering?
hrishi27n wrote (3 months ago):

Hello All,

I am working on a clustering project for which we have collected protein data from over 100 patient samples. The data are normalized and log-transformed to achieve a uniform distribution. The goal is to cluster samples by similarity. I am using hierarchical clustering and trying out combinations of distance metrics and clustering algorithms (we have not yet settled on either). My question concerns centering and scaling: is it absolutely necessary to both center and scale the data, even when all the data come from the same platform and share the same units of measurement?

Appreciate your input on this one.

chris86 (UCL, United Kingdom) wrote (3 months ago):

Scaling is mainly necessary when you are combining features of different types or units, like height and weight for example. Without it, the feature with the largest numeric range dominates the distance.
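
To make the height and weight example concrete, here is a minimal sketch using only the Python standard library. The three samples and the helper names (`euclidean`, `zscore_columns`) are made up for illustration, and z-scoring is just one common way to scale.

```python
import math
import statistics

def euclidean(a, b):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def zscore_columns(rows):
    """Scale each column (feature) to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

# Hypothetical samples: (height in metres, weight in kilograms)
samples = [
    [1.50, 70.0],  # A
    [1.95, 71.0],  # B: very different height from A, similar weight
    [1.52, 80.0],  # C: similar height to A, different weight
]

# Unscaled: the weight column has the larger numeric range and dominates
raw_ab = euclidean(samples[0], samples[1])
raw_ac = euclidean(samples[0], samples[2])

# Scaled: both features contribute on a comparable footing
scaled = zscore_columns(samples)
scaled_ab = euclidean(scaled[0], scaled[1])
scaled_ac = euclidean(scaled[0], scaled[2])

print(f"raw:    d(A,B)={raw_ab:.2f}  d(A,C)={raw_ac:.2f}")
print(f"scaled: d(A,B)={scaled_ab:.2f}  d(A,C)={scaled_ac:.2f}")
```

On the raw values A looks far closer to B than to C purely because weight is recorded in larger numbers. After scaling, the two distances are of similar size. If all your features are on the same platform and in the same units, this distortion is much less of a concern.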

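Centering is done in principal component analysis, for instance. It is not needed for clustering with Euclidean-type distances, as it will not affect the results: subtracting the same feature means from every sample leaves the pairwise distances unchanged. This may not hold for cosine- or correlation-based distances.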
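
A quick way to check the centering point yourself is sketched below in plain Python. The data and the helper names (`center_columns`, `cosine_distance`) are invented for the example. It compares pairwise distances before and after subtracting each feature's mean.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """One minus the cosine similarity of two samples."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def center_columns(rows):
    """Subtract each column (feature) mean from every sample."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    return [[v - m for v, m in zip(row, means)] for row in rows]

# Hypothetical log-scale abundances: 3 samples x 3 features
samples = [
    [5.0, 9.0, 2.0],
    [6.0, 7.5, 2.5],
    [9.0, 4.0, 3.0],
]
centered = center_columns(samples)

# Euclidean distance is unchanged by centering ...
euclid_before = euclidean(samples[0], samples[2])
euclid_after = euclidean(centered[0], centered[2])

# ... but cosine distance is not
cosine_before = cosine_distance(samples[0], samples[2])
cosine_after = cosine_distance(centered[0], centered[2])

print(euclid_before, euclid_after)
print(cosine_before, cosine_after)
```

So whether centering matters depends on the distance metric you end up choosing, which is worth keeping in mind while you are still comparing metrics.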

You could also consider trying k-means with the silhouette method or the gap statistic. Clustering techniques tend to work better if the clusters are roughly spherical in N-dimensional sample space, but you can also run them on uniformly distributed data. In the post linked below, k-means separates two uniformly distributed clusters very well, and I expect hclust would be fine too. Note that I am talking about how the samples are distributed in state space, not how the feature or sample 'signal' is distributed.

https://www.r-bloggers.com/exploring-assumptions-of-k-means-clustering-using-r/

Finally, it is important that your data are homoscedastic, meaning the variance of each feature does not depend on its mean. This usually isn't a problem with microarray data, but it is for RNA-seq, where the data have to be transformed appropriately first.
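
As a toy illustration of the mean-variance point, the sketch below builds three hypothetical features whose noise is purely multiplicative (an assumption made for the example, not a model of real RNA-seq counts) and compares per-feature variance before and after a log2 transform.

```python
import math
import statistics

# Assumed multiplicative noise factors, shared by all features
noise = [0.5, 0.8, 1.0, 1.25, 2.0]

# Hypothetical features with very different average abundance
feature_means = [10.0, 100.0, 1000.0]
raw = {m: [m * f for f in noise] for m in feature_means}

# Raw scale: variance grows with the mean (heteroscedastic)
raw_var = {m: statistics.variance(v) for m, v in raw.items()}

# Log scale: variance no longer depends on the mean
log_var = {
    m: statistics.variance([math.log2(x) for x in v])
    for m, v in raw.items()
}

for m in feature_means:
    print(f"mean={m:7.1f}  raw var={raw_var[m]:12.2f}  log2 var={log_var[m]:.4f}")
```

Plotting per-feature variance against per-feature mean on your own data is a simple check of the same thing. A log transform is only one option, and real count data may need a more careful transformation.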

I hope that helps.