Hello All,
I am working on a clustering project for which we have collected protein data from over 100 patient samples. The data have been normalized and log-transformed to achieve a uniform distribution. The goal is to cluster the samples based on their similarities. I am using hierarchical clustering and trying out combinations of distance metrics and clustering algorithms (we haven't yet decided on a distance metric or a clustering algorithm). My question concerns centering and scaling: is it absolutely necessary to both center and scale the data, even when all of the data come from the same platform and share the same units of measurement?
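For concreteness, the two operations in question can be sketched as below. This is only an illustration: the matrix holds made-up toy values (not the actual patient data), rows are assumed to be samples and columns proteins, Euclidean distance is assumed purely as an example metric, and the helper names `center`, `center_and_scale`, and `pairwise_euclidean` are hypothetical.

```python
import math
import statistics

# Toy matrix: rows are samples, columns are proteins.
# Values are invented for illustration only.
data = [
    [10.0, 2.0, 5.0],
    [12.0, 2.5, 5.5],
    [11.0, 1.5, 9.0],
    [13.0, 2.0, 8.5],
]


def columns(matrix):
    """Return the columns (proteins) of a row-major matrix as lists."""
    return [list(col) for col in zip(*matrix)]


def center(matrix):
    """Subtract each protein's mean from that protein's values."""
    means = [statistics.mean(c) for c in columns(matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def center_and_scale(matrix):
    """Center each protein, then divide by its sample standard deviation."""
    cols = columns(matrix)
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in matrix]


def pairwise_euclidean(matrix):
    """Euclidean distance between every pair of samples (rows)."""
    n = len(matrix)
    return [[math.dist(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]


raw_d = pairwise_euclidean(data)
centered_d = pairwise_euclidean(center(data))
scaled_d = pairwise_euclidean(center_and_scale(data))
```

Under these assumptions, centering each protein shifts every sample by the same vector, so `centered_d` matches `raw_d`, whereas `scaled_d` differs because scaling changes how much each protein contributes to the distance. Other metrics (for example, correlation-based distances) may behave differently.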
Appreciate your input on this one.