I have recently been learning RNA-seq analysis, in particular PCA, to check whether the control and treatment samples separate by treatment condition. While looking online for how to prepare the data before running PCA, I learned that log-CPM (LCPM) values calculated from the raw count matrix can be used as PCA input, which is what I did in Method 1 below. The resulting PCA separates control from treatment (see the figure here: ). While working through the DESeq2 tutorial (http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html), I saw that PCA can also be performed on VST-transformed data, which I did in Method 2 below. However, with the VST-based method the PCA does not separate control from treatment (see the figure here: ), even though the underlying data are the same.
Which method (LCPM or VST) should I use in this case? Why does the PCA result differ depending on the data transformation? Or, based on the PCA methods I tried, does this indicate that my RNA-seq samples are hard to separate by treatment?
Any suggestions are appreciated!
The key code is listed below.
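```r
library(edgeR)      # readDGE(), cpm()
library(ggfortify)  # autoplot() method for prcomp objects
library(DESeq2)     # DESeqDataSetFromMatrix(), vst(), plotPCA()

# Method 1: PCA based on LCPM
y <- readDGE(files, columns = c(1, 3))  # files: paths to the per-sample count files
lcpm <- cpm(y, log = TRUE)              # genes x samples, log2-CPM
dat <- t(lcpm)                          # prcomp() expects samples as rows
pca_res <- prcomp(dat, scale. = TRUE)   # all genes, each scaled to unit variance
# dat_org (defined elsewhere): per-sample table with a 'group' column, same row order as dat
autoplot(pca_res, data = dat_org, colour = 'group')

# Method 2: VST
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldat, design = ~ condition)
vsd <- vst(dds, blind = FALSE)
plotPCA(vsd, intgroup = c("condition"))  # default: 500 most variable genes, centered but not scaled
```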
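Besides the transformation itself, the two methods also set up the PCA differently: Method 1 runs `prcomp()` on all genes with `scale. = TRUE`, while DESeq2's `plotPCA()` by default keeps only the 500 most variable genes (`ntop = 500`) and centers without scaling. The base-R sketch below illustrates that difference on hypothetical simulated counts (the gene count, the 4-fold change on 50 genes, and the dispersion are arbitrary assumptions, not my data). It uses a voom-style log2-CPM, `log2((count + 0.5) / (lib.size + 1) * 1e6)`, as a simplified stand-in; it is not identical to `edgeR::cpm(log = TRUE)` or to VST.

```r
# Hypothetical simulated data: 2000 genes x 6 samples (3 control, 3 treatment)
set.seed(1)
n_genes <- 2000
group <- factor(rep(c("control", "treatment"), each = 3))
mu <- matrix(rexp(n_genes, rate = 1 / 100), nrow = n_genes, ncol = 6)
# Assumption: 50 genes are 4-fold higher in the treatment samples
mu[1:50, group == "treatment"] <- mu[1:50, group == "treatment"] * 4
counts <- matrix(rnbinom(n_genes * 6, mu = mu, size = 10), nrow = n_genes)

# Simplified voom-style log2-CPM (genes x samples)
lib_size <- colSums(counts)
logcpm <- log2(t(t(counts + 0.5) / (lib_size + 1) * 1e6))

# Per-gene variance across samples
rv <- apply(logcpm, 1, var)

# (a) Method-1-like setup: all genes, each scaled to unit variance
keep <- rv > 0  # prcomp(scale. = TRUE) cannot rescale zero-variance genes
pca_all_scaled <- prcomp(t(logcpm[keep, ]), scale. = TRUE)

# (b) plotPCA-like setup: top 500 genes by variance, centered only
top <- order(rv, decreasing = TRUE)[seq_len(min(500, length(rv)))]
pca_top_unscaled <- prcomp(t(logcpm[top, ]))

# Percent of variance explained by PC1 and PC2 under each setup
pct <- function(p) round(100 * p$sdev^2 / sum(p$sdev^2), 1)[1:2]
print(pct(pca_all_scaled))
print(pct(pca_top_unscaled))
```

Replacing `logcpm` with the real `lcpm` matrix (or with `assay(vsd)`) would apply the same side-by-side comparison to the actual data, so that the effect of gene selection and scaling can be separated from the effect of the transformation.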