Correlation/clustering of numeric (continuous) and categorical values
Johannes Rainer ★ 2.0k
@johannes-rainer-6987
Italy

Dear all,

I have the following problem:

I have a data.frame in which some of the columns are numeric and some are categorical. For data exploration, I would like to do some kind of clustering, or compute a correlation matrix on the columns, to see which of them are actually correlated. But how can I correlate numeric (continuous) and categorical values?

Thanks in advance for any hints and suggestions on that!

cheers, jo

correlation • 1.4k views
@sean-davis-490
United States

Some thoughts:

  • Consider transforming the categorical variables to numeric if that makes sense.
  • Consider transforming the numeric variables to categorical and then using an appropriate distance metric (this may take some thought).
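As a rough illustration of both suggestions, here is a minimal sketch in plain Python (standard library only). The toy columns `expression` and `tissue` and the helper names `pearson`, `one_hot` and `equal_width_bins` are made up for this example and do not come from the thread. The first part turns a categorical column into 0/1 indicator columns and correlates each with a numeric column; the second part bins a numeric column into equal-width categories. Whether either transformation "makes sense" depends on the data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def one_hot(values):
    """Map a categorical column to one 0/1 indicator column per level."""
    levels = sorted(set(values))
    return {level: [1.0 if v == level else 0.0 for v in values]
            for level in levels}

def equal_width_bins(values, n_bins):
    """Assign each numeric value to one of n_bins equal-width bins (0-based)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes the column is not constant
    # The maximum is clamped into the last bin instead of opening a new one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical toy data: one numeric and one categorical column.
expression = [1.0, 1.2, 0.9, 3.1, 2.9, 3.3]
tissue = ["liver", "liver", "liver", "brain", "brain", "brain"]

# Categorical -> numeric: correlate each indicator column with the numeric one.
indicators = one_hot(tissue)
correlations = {level: pearson(expression, col)
                for level, col in indicators.items()}

# Numeric -> categorical: two equal-width bins ("low" = 0, "high" = 1).
bins = equal_width_bins(expression, 2)

print(correlations)
print(bins)
```

With only two levels the two indicator columns are complementary, so their correlations with the numeric column differ only in sign. With more levels, each indicator measures "this level versus all the others".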

A really interesting idea is to use a non-linear measure, such as random forest proximities, for clustering. Steve Horvath discusses the idea here:

http://www.slideshare.net/Pammy98/using-random-forest-proximity-for-unsupervised-learning-in-tissue


I didn't want to transform the data and was hoping there was a distance metric that works with both categorical and numerical data.

But I'll definitely look into the idea with the random forest, thanks Sean!


Just a secondary thought: Gower's distance wouldn't help here, right?
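For reference, Gower's distance is defined between observations (rows), not between variables, so it speaks to clustering the samples of a mixed data.frame rather than to correlating its columns. The sketch below is a simplified plain-Python version under stated assumptions: no missing values, equal weights for all variables, numeric differences scaled by the column range, and a 0/1 mismatch for categorical values. The function name `gower_distance` and the toy rows are hypothetical. In R, `daisy()` from the cluster package with `metric = "gower"` provides a full implementation.

```python
def gower_distance(rows, numeric_columns):
    """Pairwise Gower distances between rows of mixed numeric/categorical data.

    rows: list of equal-length tuples.
    numeric_columns: set of column indices to treat as numeric;
    all other columns are treated as categorical.
    """
    n_cols = len(rows[0])
    # Range of each numeric column, used to scale differences into [0, 1].
    ranges = {}
    for c in numeric_columns:
        col = [r[c] for r in rows]
        ranges[c] = max(col) - min(col)

    def pair(a, b):
        total = 0.0
        for c in range(n_cols):
            if c in ranges:
                # Numeric: range-scaled absolute difference (0 if constant).
                total += abs(a[c] - b[c]) / ranges[c] if ranges[c] > 0 else 0.0
            else:
                # Categorical: simple mismatch indicator.
                total += 0.0 if a[c] == b[c] else 1.0
        return total / n_cols  # equal weight for every variable

    return [[pair(a, b) for b in rows] for a in rows]

# Hypothetical toy data: (numeric value, categorical label) per observation.
rows = [(1.0, "liver"), (2.0, "liver"), (5.0, "brain")]
dist = gower_distance(rows, numeric_columns={0})

for line in dist:
    print(line)
```

The resulting matrix is symmetric with zeros on the diagonal and could be passed to any clustering method that accepts a precomputed distance matrix.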
