Question

PCA plot for TPM data

tanyabioinfo ▴ 20

@tanyabioinfo-14091

Last seen 5.2 years ago

Hi

I have a TPM matrix. The columns are the samples wt0hr, wt6hr, wt24hr, kd0hr, kd6hr and kd24, with four replicates of each.

Can someone point me to the right R package for plotting a PCA of the samples from TPM data?

Thanks

Tanya

PCA • 3.8k views

ADD COMMENT • link written 6.5 years ago by tanyabioinfo ▴ 20

score 0 · Answer 1 · 2017-10-24

Andy91 ▴ 60

@andy91-8905

Last seen 2.5 years ago

Netherlands

You can do it with base R. One way is to center each gene's TPM values across samples and transpose the matrix so that samples are rows (assuming you want to run PCA on the samples rather than the genes), then compute an SVD and plot the first and second columns of the u matrix (assuming you are interested in the first and second principal components). Alternatively, use the prcomp() function instead of SVD and plot the first and second columns of the x matrix. Both should yield the same pattern. The values will differ, because prcomp() returns each column of u scaled by its singular value, but the arrangement of the samples is the same up to axis scaling and sign.

Example:

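# Center each gene across samples, then transpose so samples are rows
tpm_centered <- t(tpm - rowMeans(tpm))

# SVD: plot the first two left singular vectors
tpm_svd <- svd(tpm_centered)
plot(tpm_svd$u[,1], tpm_svd$u[,2])

# prcomp: plot the first two principal component scores
tpm_prcomp <- prcomp(tpm_centered)
plot(tpm_prcomp$x[,1], tpm_prcomp$x[,2])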

For more information, you might want to check out this tutorial: http://genomicsclass.github.io/book/pages/pca_svd.html
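To make the "same pattern, different values" point concrete, here is a minimal sketch on simulated data. The toy matrix, its dimensions and the names scores_svd, max_diff and var_explained are illustrative assumptions, not part of the original data. It checks that the prcomp() scores match the columns of u multiplied by the singular values (up to sign), and computes the fraction of variance each component explains.

```r
# Hypothetical toy TPM matrix: 100 genes (rows) x 6 samples (columns)
set.seed(1)
tpm <- matrix(rexp(600, rate = 0.01), nrow = 100,
              dimnames = list(paste0("gene", 1:100), paste0("sample", 1:6)))

# Center each gene across samples, then transpose so samples are rows
tpm_centered <- t(tpm - rowMeans(tpm))

tpm_svd <- svd(tpm_centered)
tpm_prcomp <- prcomp(tpm_centered)

# prcomp scores are the columns of u scaled by the singular values;
# compare absolute values because the sign of a component is arbitrary
scores_svd <- tpm_svd$u %*% diag(tpm_svd$d)
max_diff <- max(abs(abs(scores_svd[, 1:2]) - abs(tpm_prcomp$x[, 1:2])))
print(max_diff)

# Fraction of total variance explained by each component
var_explained <- tpm_svd$d^2 / sum(tpm_svd$d^2)
print(round(var_explained, 3))
```

The variance fractions are handy for axis labels, for example to report how much of the variation PC1 and PC2 capture.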

ADD COMMENT • link 6.5 years ago Andy91 ▴ 60

Thanks Andy. Could you please show me how to add colors to this PCA plot based on sample names?

Thanks

Tanya

ADD REPLY • link 6.5 years ago tanyabioinfo ▴ 20

I would recommend you do some reading on plotting in R.

Example tutorials:

As for the question at hand, you could use the base R plot() function or ggplot2::ggplot(). I prefer the latter, as it makes a lot of nice plots once you get the hang of it.

# base R
# Convert sample names to integer colour indices (one colour per unique name)
cols <- as.numeric(as.factor(sample_names))
plot(tpm_svd$u[,1], tpm_svd$u[,2], col = cols)

# ggplot2
library(ggplot2)
plot_df <- data.frame(PC1 = tpm_svd$u[,1], PC2 = tpm_svd$u[,2], Samples = sample_names)
ggplot(plot_df, aes(x = PC1, y = PC2, col = Samples)) +
     geom_point()

Assuming you run this in R/RStudio, that should work. If not, you would need to set up a plotting device.
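The snippets above assume a sample_names vector already exists. Below is a rough sketch of one way to build it for the design described in the question, so that the four replicates of each condition share a colour. It assumes the columns are ordered as four consecutive replicates per condition; the toy matrix and the column naming are made up for illustration.

```r
# Assumption: columns are ordered as four consecutive replicates per condition,
# with condition names taken from the question as written
conditions <- c("wt0hr", "wt6hr", "wt24hr", "kd0hr", "kd6hr", "kd24")
sample_names <- factor(rep(conditions, each = 4), levels = conditions)

# Hypothetical toy TPM matrix standing in for the real data: 200 genes x 24 samples
set.seed(42)
tpm <- matrix(rexp(200 * 24, rate = 0.01), nrow = 200)
colnames(tpm) <- paste0(sample_names, "_rep", rep(1:4, times = 6))

# Center genes, transpose so samples are rows, then run the SVD
tpm_svd <- svd(t(tpm - rowMeans(tpm)))

# One colour index per condition, so replicates share a colour
cols <- as.numeric(sample_names)
plot(tpm_svd$u[, 1], tpm_svd$u[, 2], col = cols, pch = 16,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(sample_names),
       col = seq_along(levels(sample_names)), pch = 16)
```

If the real columns are in a different order, the rep() call would not line up with the data, and the labels should be derived from the actual column names instead.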

ADD REPLY • link 6.5 years ago Andy91 ▴ 60

Hi Andy, what if I want to colour the points by genotype or sex from my metadata? How would I do that?

ADD REPLY • link 11 months ago Yijing • 0
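One possible approach, sketched under assumptions: keep the metadata in a data frame with one row per sample, reorder it to match the columns of the TPM matrix, and map one variable to colour and another to point shape. The metadata table, its column names (sample, genotype, sex) and the toy matrix below are all hypothetical.

```r
# Hypothetical toy TPM matrix: 50 genes x 8 samples
set.seed(1)
tpm <- matrix(rexp(400, rate = 0.01), nrow = 50,
              dimnames = list(NULL, paste0("s", 1:8)))

# Hypothetical metadata table; the column names are assumptions
metadata <- data.frame(
  sample   = paste0("s", 8:1),  # deliberately in a different order
  genotype = factor(rep(c("wt", "kd"), each = 4)),
  sex      = factor(rep(c("F", "M"), times = 4)),
  stringsAsFactors = FALSE
)

# Reorder metadata rows to match the column order of the TPM matrix
metadata <- metadata[match(colnames(tpm), metadata$sample), ]
stopifnot(identical(metadata$sample, colnames(tpm)))

# PCA on samples: center genes, transpose so samples are rows
pca <- prcomp(t(tpm - rowMeans(tpm)))

# Colour by genotype, point shape by sex
plot(pca$x[, 1], pca$x[, 2],
     col = as.numeric(metadata$genotype),
     pch = c(16, 17)[as.numeric(metadata$sex)],
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(metadata$genotype),
       col = seq_along(levels(metadata$genotype)), pch = 16)
```

With ggplot2, the same idea would be to add the metadata columns to the plotting data frame and map them inside aes(), for example col = genotype and shape = sex.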