Comparing discrete variable to PCA
@kvittingseerup-7956

Is there a Bioconductor solution for comparing discrete variables to PCs to identify associations? I'm looking for the discrete version of PCAtools' eigencorplot.

Cheers Kristoffer

EDA pcaExplorer PCAtools PCA
Kevin Blighe ★ 3.9k
@kevin

Hey Kristoffer, I developed PCAtools, as you know - is eigencorplot not what you need? PCAtools has been in Bioconductor for > 1 year.

Note that these two give exactly the same value (for a two-level Y):

1. The squared Pearson correlation coefficient of continuous X versus categorical Y, with Y encoded numerically
2. The R-squared value from a linear regression of the form X ~ Y

As I show here:

continuous <- c(45, 67, 12, 65, 75, 3, 44, 90)
categorical <- factor(c(0,0,0,0,1,1,1,1))

cor(continuous, as.numeric(categorical)) ^ 2
[1] 0.01024737

summary(lm(continuous ~ categorical))$r.squared
[1] 0.01024737


Kevin
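
A small base-R check of the equivalence above, offered as a sketch. One detail worth noting: as.numeric() on a factor returns the internal level codes (1 and 2 here, not 0 and 1). For a two-level variable this makes no difference, because Pearson correlation is unchanged when either variable is shifted or rescaled. The variable names codes, zero_one and the r2_* objects are introduced here for illustration only.

```r
continuous  <- c(45, 67, 12, 65, 75, 3, 44, 90)
categorical <- factor(c(0, 0, 0, 0, 1, 1, 1, 1))

codes    <- as.numeric(categorical)  # internal level codes: 1 1 1 1 2 2 2 2
zero_one <- codes - 1                # the same variable coded as 0/1

# Squared Pearson correlation under both codings
r2_codes    <- cor(continuous, codes)^2
r2_zero_one <- cor(continuous, zero_one)^2

# R-squared from the linear model with the factor as predictor
r2_lm <- summary(lm(continuous ~ categorical))$r.squared

all.equal(r2_codes, r2_zero_one)  # coding does not matter for two levels
all.equal(r2_codes, r2_lm)        # correlation^2 matches the regression R-squared
```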


Hi Kevin

I did try eigencorplot() but got this error when using a categorical (text or factor) variable:

Error in cor(xvals, yvals, use = corUSE, method = corFUN) :
'y' must be numeric
In eigencorplot(myPca, metavars = c("barcode")) :
barcode is not numeric - please check the source data as everything will be converted to a matrix


Turns out it can be solved by converting them to numerical values before using pca().

Thanks for pointing out it could be done.

Cheers Kristoffer
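
The workaround described above might look roughly like the following sketch. It is not Kristoffer's actual code: the data are simulated, base R's prcomp() stands in for PCAtools' pca() so the example runs without Bioconductor, and everything except the column name barcode (taken from the error message) is hypothetical. The conversion step is the part that matters; with PCAtools it would be applied to the metadata before calling pca().

```r
set.seed(1)

# Simulated expression-like matrix: 50 features (rows) x 8 samples (columns)
mat <- matrix(rnorm(50 * 8), nrow = 50,
              dimnames = list(paste0("gene", 1:50), paste0("s", 1:8)))

# Hypothetical sample metadata with a two-level text column
metadata <- data.frame(barcode = rep(c("A", "B"), each = 4),
                       row.names = colnames(mat))

# Convert text/factor to numeric level codes (A -> 1, B -> 2)
metadata$barcode <- as.numeric(factor(metadata$barcode))

# prcomp() expects samples in rows, so transpose the matrix
pcs <- prcomp(t(mat))$x

# Pearson correlation of each of the first 3 PCs with the encoded variable
pc_cor <- cor(pcs[, 1:3], metadata$barcode)
pc_cor
```

With only two levels the choice of codes is harmless; with three or more levels the integer codes impose an ordering and spacing that the categories may not have.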


Oh, I thought that issue was addressed in the previous Bioc release, i.e., it should automatically convert factors to numeric and give a warning, like above. I have not yet come across the other error thrown by cor() internally.


Would that same approach hold true when we have multiple categories?

Let's say "group_1", "group_2" and "group_3". In this case, each will be assigned an integer (0, 1, 2); although they are ordered categories, they do not exactly match this transformation. Am I right?

If so, any suggestions for working with multi-category variables?
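
One possible way to handle more than two categories, sketched here as an assumption rather than as something PCAtools is confirmed to do: keep the variable as a factor and take the R-squared from a linear model of the PC on that factor, as in point 2 earlier in the thread. lm() dummy-codes the factor, so no ordering or spacing of the levels is imposed. The data below are simulated, and pc1 is a stand-in for a real PC score vector.

```r
set.seed(42)

group <- factor(rep(c("group_1", "group_2", "group_3"), each = 10))

# Simulated PC scores: only group_2 is shifted, so the groups differ,
# but not monotonically in the order of the levels
pc1 <- rnorm(30) + ifelse(group == "group_2", 3, 0)

# Integer coding (1, 2, 3) assumes equally spaced, ordered levels
r2_integer <- cor(pc1, as.numeric(group))^2

# Dummy coding via lm() makes no ordering assumption
fit <- lm(pc1 ~ group)
r2_factor <- summary(fit)$r.squared

# Overall F-test p-value for the association (one-way ANOVA)
p_value <- anova(fit)[["Pr(>F)"]][1]

c(integer_coding = r2_integer, factor_coding = r2_factor, p = p_value)
```

The factor-based R-squared can never be smaller than the integer-coded one, because the integer codes are a linear combination of the intercept and the dummy variables. When the group means are not monotone in the level order, as here, integer coding can miss an association that the factor model detects.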