Hi, based on DESeq2 result, I ranked all genes based on sign(lfc) * log (p-value), and ran GSEA prerank based on the ranking. Some gene sets highlight the difference between my experiment groups, so I want to draw to PCA plot based on the genes within that gene set.
In practice, I could do this in two ways:
1). Get DESeq2 normalized counts, and do PCA based on that:
counts_all_deseq2 <-counts(dds, normalized=TRUE) counts_deseq2 <- counts_all_deseq2[rownames(counts_all_deseq2) != "",] mat_set <- counts_deseq2[geneset2,] #log transform of raw counts mat_set.pca <- prcomp(log2(t(mat_set)+1),center = TRUE,scale. = TRUE)
2).Get rld transformed counts, and do PCA based on that:
rld <- rlog(dds, blind=FALSE) rld_data <-assay(rld) mat_set_rld <- rld_data[geneset2,] mat_set_rld.pca <- prcomp(t(mat_set_rld),center = TRUE,scale. = TRUE)
It turns out that both figures look almost the same, and the PC values:
p1$data$PC1
[1] -8.9658245 -0.3404701 -3.8457890 -7.9102610 -8.3783767 -8.4218242 -5.2244597 -5.0840841 -4.6299148 -11.2362408
p3$data$PC1
[1] -9.0109458 -0.4131002 -3.8642725 -7.9585980 -8.4054954 -8.4550656 -5.2094461 -5.1213384 -4.5919582 -11.2688762
Any suggestions which method I should use?
Best regards,
Raymond
I also noticed that in DESeq2 plotPCA function:
So, there is no scale process in "plotPCA".
Should I add another 'scale=TRUE' when I use DESeq2 normalized counts or rld transformed counts?
Thanks& regards
Raymond
Here x=t(assay(object)), so rows are samples and columns are genes. Setting scale=TRUE would set all the genes to have unit variance. I have a post where I explain why this is a bad idea:
A: Biclustering Normalizing by Row in Heatmap of DESeq2
I'll reply to the original post soon.
Very impressive and clear demonstrations in that post! It answers a puzzle disturbs me for a long time. Thanks!