Question: PCA from TPM
gravatar for tanyabioinfo
29 days ago by
tanyabioinfo10 wrote:


I am trying to do PCA analysis of my samples. I generated the matrix using the tximport package. I have transcript ids as my rows and the sample names are the columns.

txi <- tximport(files, type="salmon", tx2gene=NULL, ignoreTxVersion=TRUE,dropInfReps=TRUE,txOut = TRUE)
tpm <- (txi$abundance[apply(txi$abundance, MARGIN = 1, FUN = function(x) sd(x) != 0),])

tpm = log2(tpm + 1)
tpm_centered <- t(tpm-rowMeans(tpm))

pca = prcomp(tpm_centered , scale=TRUE, center=TRUE)

cols <- as.factor(as.numeric(colnames(tpm_centered)))

plot(pca$x[,1],pca$x[,2], xlab = "PC1", ylab = "PC2",main ="PCA replicate1", col =cols)

text(pca$x[,1],pca$x[,2], row.names(pca$x), cex=0.5, pos=3)

I have couple of question.

1. Is generating PCA plot from txi$abundance a good idea to plot the PCA

2. I am unable to get the output colored based on samples.

Can someone please help me



ADD COMMENTlink modified 29 days ago • written 29 days ago by tanyabioinfo10

Hi Michael

I am now using the follwoing code:

txi <- tximport(files, type="salmon", tx2gene=tx2gene, ignoreTxVersion=TRUE,dropInfReps=TRUE)
sampleTable <- data.frame(condition =samples$condition,time=factor(samples$time))
rownames(sampleTable) <- colnames(txi$counts)
dds <- DESeqDataSetFromTximport(txi, sampleTable, ~condition+time)
dds <- dds[ rowSums(counts(dds)) > 1, ]
rld <- rlog(dds, blind = FALSE)
plotPCA(rld, intgroup = c("condition", "time"))

I have wild type and mutant as condition and the time point as 0hr 6hr and 24 hr. I have 4 replicate for each one of them. In the PCA plot the replicates are not grouping together. Do you think this is normal or I am making some mistake.





ADD REPLYlink written 29 days ago by tanyabioinfo10

Make sure that files and sample table are the same order. This is very important.

ADD REPLYlink written 28 days ago by Michael Love15k
gravatar for Michael Love
29 days ago by
Michael Love15k
United States
Michael Love15k wrote:

I would recommend generating PCA plot from the normalized transformed counts. There are statistical reasons to prefer variance stabilized measurements, and the normalization takes care of any biases that were corrected by the quantification software, as well as sequencing depth.

This would look like:

dds <- DESeqDataSetFromTximport(txi)
vsd <- vst(dds)
ADD COMMENTlink written 29 days ago by Michael Love15k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 284 users visited in the last hour