11 months ago by
You can make an argument that correlation (or 1-cor) is a better distance measure, for a couple of reasons. First, the intensity and range of gene expression measures depend to a certain extent on either transcript length (for RNA-Seq) or GC content (for microarrays). If you use Euclidean distance, the result may be dominated by genes with large changes in measured expression, which may have more to do with technical aspects of the measurement than with changes in the underlying expression levels.
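To make the first point concrete, here is a minimal Python sketch, assuming NumPy is available. The expression values and function names are made up for illustration and don't come from any real dataset. It shows how a single gene with a large measured change can account for nearly all of the Euclidean distance between two samples, and that 1-cor ignores a constant offset between samples that Euclidean distance counts in full.

```python
import numpy as np


def euclidean_distance(x, y):
    """Euclidean distance between two expression profiles."""
    return float(np.sqrt(np.sum((x - y) ** 2)))


def correlation_distance(x, y):
    """1 - Pearson correlation between two expression profiles."""
    return float(1.0 - np.corrcoef(x, y)[0, 1])


def squared_distance_shares(x, y):
    """Fraction of the squared Euclidean distance contributed by each gene."""
    squared = (x - y) ** 2
    return squared / squared.sum()


# Hypothetical log2 expression values for six genes in two samples.
# The samples agree closely except for the last gene.
sample_a = np.array([5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
sample_b = np.array([5.2, 5.9, 7.1, 8.0, 9.1, 14.0])

shares = squared_distance_shares(sample_a, sample_b)
print("Euclidean distance:", euclidean_distance(sample_a, sample_b))
print("Share of squared distance from the last gene:", shares[-1])
print("Correlation distance:", correlation_distance(sample_a, sample_b))

# A sample with the same profile but a constant offset (for example, a
# purely technical shift) is far away in Euclidean terms, yet its
# correlation distance is essentially zero.
sample_shifted = sample_a + 3.0
print("Euclidean, shifted copy:", euclidean_distance(sample_a, sample_shifted))
print("Correlation, shifted copy:", correlation_distance(sample_a, sample_shifted))
```

Whether insensitivity to shifts is what you want depends on the data, so treat this as an illustration of the behaviour rather than a recommendation.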
Second, a distance of 1 can mean really different things, depending on the underlying values. A difference between expression values of 1 and 2 is far less meaningful than the same difference between, say, 8 and 9. Remember, your data should be on the log scale (log2 here), so 1 vs 2 is 2 vs 4 in linear terms, whereas 8 vs 9 is 256 vs 512 in linear terms, and as such is a more believable change. The correlation for low-expressing genes will probably be really poor, but it will get better (if there really is something consistent between samples) as the expression values get larger, so your correlation distance may be based on more believable differences between samples.
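The arithmetic behind that comparison can be checked with a few lines of Python. This is just a sketch assuming log2-transformed values, and `linear_values` is a hypothetical helper name used only here.

```python
def linear_values(log2_low, log2_high):
    """Convert a pair of log2 expression values back to the linear scale."""
    return 2.0 ** log2_low, 2.0 ** log2_high


# Both pairs differ by 1 on the log2 scale (a two-fold change),
# but the absolute difference on the linear scale is very different.
for log2_low, log2_high in [(1, 2), (8, 9)]:
    low, high = linear_values(log2_low, log2_high)
    print(f"log2 {log2_low} vs {log2_high}: linear {low:g} vs {high:g}, "
          f"absolute difference {high - low:g}")
```

In both cases the fold change is the same; what differs is how many linear-scale units back it up, which is why the 8 vs 9 difference is easier to believe.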
That said, clustering isn't an inferential method, and it's difficult to say if a given heatmap is better in some sense than another. Certainly one might look better, but I'm not sure that's a criterion you should really trust.