Question: PCA from TPM

tanyabioinfo wrote:

Hi

I am trying to do PCA analysis of my samples. I generated the matrix using the tximport package. I have transcript ids as my rows and the sample names are the columns.

txi <- tximport(files, type = "salmon", tx2gene = NULL, ignoreTxVersion = TRUE, dropInfReps = TRUE, txOut = TRUE)

tpm <- (txi$abundance[apply(txi$abundance, MARGIN = 1, FUN = function(x) sd(x) != 0), ])

tpm = log2(tpm + 1)

tpm_centered <- t(tpm - rowMeans(tpm))

pca = prcomp(tpm_centered, scale = TRUE, center = TRUE)

cols <- as.factor(as.numeric(colnames(tpm_centered)))

plot(pca$x[,1], pca$x[,2], xlab = "PC1", ylab = "PC2", main = "PCA replicate1", col = cols)

text(pca$x[,1], pca$x[,2], row.names(pca$x), cex = 0.5, pos = 3)

I have a couple of questions.

1. Is it a good idea to generate the PCA plot from txi$abundance?

2. I am unable to get the points colored by sample.

Can someone please help me?

Thanks

Tanya
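On the coloring question, one likely cause is that `tpm_centered` has been transposed, so `colnames(tpm_centered)` holds transcript IDs rather than sample names, and `as.numeric()` on non-numeric strings returns `NA`. A minimal sketch of one way to color the points, assuming a grouping vector with one entry per sample; the toy matrix, the `group` factor, and the `palette` colors below are made up for illustration and are not from the original post:

```r
set.seed(1)

# Toy log-expression matrix: 100 transcripts (rows) x 6 samples (columns).
expr <- matrix(rnorm(600), nrow = 100,
               dimnames = list(paste0("tx", 1:100), paste0("sample", 1:6)))

# Hypothetical grouping, one entry per sample, in the same order as the columns.
group <- factor(c("wt", "wt", "wt", "mut", "mut", "mut"))

# prcomp expects samples in rows; center = TRUE centers each transcript,
# so no manual subtraction of row means is needed.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Map each sample's group to a color by name, giving one color per plotted point.
palette <- c(mut = "steelblue", wt = "tomato")
cols <- palette[as.character(group)]

plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2", col = cols, pch = 19)
text(pca$x[, 1], pca$x[, 2], rownames(pca$x), cex = 0.5, pos = 3)
legend("topright", legend = names(palette), col = palette, pch = 19)
```

The key point is that the color vector has the same length and order as the rows of `pca$x`, which are the samples.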

Hi Michael

I am now using the following code:

txi <- tximport(files, type="salmon", tx2gene=tx2gene, ignoreTxVersion=TRUE,dropInfReps=TRUE)

sampleTable <- data.frame(condition =samples$condition,time=factor(samples$time))

rownames(sampleTable) <- colnames(txi$counts)

dds <- DESeqDataSetFromTximport(txi, sampleTable, ~condition+time)

dds <- dds[ rowSums(counts(dds)) > 1, ]

rld <- rlog(dds, blind = FALSE)

plotPCA(rld, intgroup = c("condition", "time"))

My conditions are wild type and mutant, and the time points are 0 hr, 6 hr, and 24 hr. I have 4 replicates for each of them. In the PCA plot the replicates are not grouping together. Do you think this is normal, or am I making a mistake?

Regards

Tanya

Make sure that the files and the sample table are in the same order. This is very important.
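A minimal sketch of such an order check, assuming the `files` vector is named with sample IDs (tximport uses those names as column names). The sample IDs, paths, and table contents below are hypothetical. Note that assigning `rownames(sampleTable) <- colnames(txi$counts)` only relabels the rows and does not reorder them, so the comparison has to be made against the original sample names:

```r
# Hypothetical sample IDs and quantification paths; replace with your own.
sample_ids <- c("wt_0h_1", "wt_0h_2", "mut_0h_1", "mut_0h_2")
files <- file.path("quants", sample_ids, "quant.sf")
names(files) <- sample_ids

# Sample table deliberately listed in a different order to show the problem.
sampleTable <- data.frame(
  condition = c("mut", "mut", "wt", "wt"),
  row.names = c("mut_0h_1", "mut_0h_2", "wt_0h_1", "wt_0h_2")
)

# Both objects must describe exactly the same set of samples.
stopifnot(setequal(names(files), rownames(sampleTable)))

# Reorder the table to follow the files, matching by name rather than position.
sampleTable <- sampleTable[names(files), , drop = FALSE]

# This should hold before the table is passed to DESeqDataSetFromTximport.
stopifnot(identical(rownames(sampleTable), names(files)))
```

Matching by name instead of relying on position means a mismatch fails loudly rather than silently assigning the wrong condition to a sample.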