Question

Biplot has a secondary x-axis and y-axis

yousavdj • 0

Hello all,

I'm pretty new to R and PCA, and my datasets are normally too small for any meaningful statistics, but I need to conduct a PCA in R for a paper I'm writing. I'm using the pcaMethods package (method = "ppca") because my dataset has a lot of non-random missing data and ChatGPT told me this was the best package to use for that. I'm having trouble understanding how to evaluate the effectiveness of my PCA, however. I've found documentation on how to evaluate the amount of variance explained by each component using a scree plot, but that doesn't seem to be compatible with pcaMethods. I would appreciate any advice on how to visualize the effectiveness of the PCA (again, I'm a novice at both R and stats).

Also, I have produced a biplot from my PCA and I'm confused about why there are secondary x- and y-axes that are on a different scale from my primary x- and y-axes. I've included a picture below to illustrate. I've looked at a paper published under my advisor several years ago, and the PCA plot there looks like a standard scatter plot with one x-axis and one y-axis. Any idea why mine is generating these secondary axes, and can I get rid of them?

Thanks!

library(pcaMethods)
pca_result <- pca(data_matrix, method = "ppca", nPcs = 3)
biplot(pca_result, choices = c(1, 2), main = "Biplot of PC1 and PC2")

[Image: biplot of PC1 and PC2 showing the secondary x- and y-axes]

pcaMethods pca • 586 views

ADD COMMENT • link updated 5 months ago by James W. MacDonald 66k • written 5 months ago by yousavdj • 0

score 0 · Answer 1 · 2024-02-12

James W. MacDonald 66k

That's just a biplot, which is described in ?biplot. Do note that you are plotting two things there - the points and the arrows - so it makes sense to have an axis for each.

ADD COMMENT • link 5 months ago James W. MacDonald 66k
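
To illustrate the point about the two axis pairs, here is a minimal base-R sketch. It uses prcomp on the built-in USArrests data rather than pcaMethods or the poster's data, so it is only an analogy for what the biplot in the question is doing. The variable names score_range and loading_range are made up for this example. With scale = 0 the biplot draws the raw scores and raw loadings, which makes it easy to see that the two sets live on different numeric ranges.

```r
# Base-R analogy only: prcomp on built-in data, not the poster's pcaMethods fit.
pc <- prcomp(USArrests, scale. = TRUE)

# Scores: one row per observation (these are the points/labels in a biplot).
score_range <- range(pc$x[, 1:2])

# Loadings: one row per variable (these are the arrows in a biplot).
loading_range <- range(pc$rotation[, 1:2])

print(score_range)    # spans several units
print(loading_range)  # always within [-1, 1], since each loading vector has unit length

# Both sets go in one panel: bottom/left axes belong to the observations,
# top/right axes belong to the variable arrows.
biplot(pc, choices = c(1, 2), scale = 0)
```

Because the two sets are on different scales, a single pair of axes would squash one of them, which is why the biplot draws a second pair.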

Thank you. So one axis is for the points, whose numbers I assume correspond to the rows, and the other corresponds to the variables. From what I gather, the values for the "loadings" are equivalent to the variable names on this graph. Is there a way to at least replace the numbers with points? I have been trying different options based on the information in ?biplot, but to no avail.

ADD REPLY • link 5 months ago yousavdj • 0

It's not clear what you are trying to do. What exactly does 'effectiveness' mean in the context of a PCA? I would imagine you actually want to gauge the effectiveness of the imputation. But there is a whole section on that in the vignette for pcaMethods that you could have read and attempted to emulate. But it appears you didn't do that, and instead used biplot to make a plot even though that function isn't even defined in the package you are attempting to use. Using functions from unrelated packages that just happen to work is probably not the path to success.

Or maybe you just want to do a PCA plot, in which case you would just do

plot(scores(pca_result))

Which I figured out by reading the help page for pca. If you are planning to use R (or any Open Source language) to any extent, you will need to get used to finding and interpreting information. One part of that is reading the vignettes. The other part is reading the help pages. You found the pca function, but maybe didn't read all the way to the bottom, where there are examples you can run. One of which generates a PCA plot (using ggplot2, which is sort of dumb IMO for such a simple plot, but kids these days).

ADD REPLY • link 5 months ago James W. MacDonald 66k
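
For readers who land here with the same two questions (variance explained, and a scatter plot with points instead of row numbers), the following is a rough sketch rather than a definitive recipe. It assumes pcaMethods is installed from Bioconductor, and it uses the iris measurements with some values knocked out at random as a hypothetical stand-in for data_matrix, which is not the poster's data and does not mimic non-random missingness. R2cum() and scores() are accessors from pcaMethods; everything else is base R.

```r
library(pcaMethods)  # Bioconductor package used in the thread

# Hypothetical stand-in for the poster's data_matrix: iris measurements
# with 30 entries set to NA at random (assumption, for illustration only).
set.seed(1)
data_matrix <- as.matrix(iris[, 1:4])
data_matrix[sample(length(data_matrix), 30)] <- NA

pca_result <- pca(data_matrix, method = "ppca", nPcs = 3)

# Cumulative fraction of variance explained by the first 1, 2, 3 components.
r2 <- R2cum(pca_result)
print(r2)
barplot(r2, names.arg = paste0("PC", seq_along(r2)),
        ylab = "Cumulative R2", main = "Variance explained")

# Plain scores plot: one solid point per row, single pair of axes.
sc <- scores(pca_result)
plot(sc[, 1], sc[, 2], pch = 19, xlab = "PC1", ylab = "PC2",
     main = "Scores, PC1 vs PC2")
```

summary(pca_result) prints similar variance information in text form, and the pcaMethods vignette covers how to assess the imputation itself.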