Hello,
I'm analysing an RNA-seq dataset using DESeq2, and would like to inspect the top three principal components on a 3D PCA plot. I know that the plot PCA function has been implemented in DESeq 2, but I've been unable to get the 3rd principal component to plot, print to screen or file. I've pasted the commands from my R session below:
#read in samples from file
samples = read.csv("samples.csv",header=TRUE)
#use edgeR DGE list function to make count matrix
library("edgeR")
counts = readDGE(samples$countf)$counts
colnames(counts) = samples$shortname
d = DGEList(counts=counts, group=colnames(counts))
#attach the column data to from the sample sheet to the variable coldata
coldata = with(samples,data.frame(shortname = I(shortname), condition = condition, donor = donor))
#start DESeq2
library("DESeq2")
#construct your DESeq2 data set, making sure to specify the design matrix here
dds <- DESeqDataSetFromMatrix(countData = countdata,
colData = coldata,
design = ~ donor + condition)
#run DESeq2 on the dataset
dds <- DESeq(dds)
#make 2D PCA plot
library("RColorBrewer")
library("gplots")
library("gglplot2")
library("rgl")
hmcol <- colorRampPalette(brewer.pal(9, "GnBu"))(100)
rld<-rlog(dds)
distsRL <- dist(t(assay(rld)))
mat <- as.matrix(distsRL)
rownames(mat) <- colnames(mat) <- with(colData(dds), paste(shortname, sep=" : "))
print(plotPCA(rld, intgroup=c("condition","donor")))
#make 3D PCA plot
print(plotPCA(rld, intgroup=c("condition","donor"), pcs = c(1,2,3),plot3d = TRUE, returnData=TRUE))
At this point, I get the error message, "Error in plotPCA(rld, intgroup = c("condition", "donor"), pcs = c(1, 2, :
unused arguments (pcs = c(1, 2, 3), plot3d = TRUE)"
Is it theoretically possible to make 3D PCA plots within DESeq2 using plotPCA? Was the plotPCA package fully implemented here? I realize that this may be a question for the developer of plotPCA, but thought I'd try here first to cover all of the bases.
Thanks!
> sessionInfo() R version 3.1.3 (2015-03-09) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 14.04.2 LTS locale: [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8 [5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8 [7] LC_PAPER=en_CA.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base other attached packages: [1] rgl_0.95.1201 ggplot2_1.0.1 [3] gplots_2.16.0 RColorBrewer_1.1-2 [5] DESeq2_1.6.3 RcppArmadillo_0.4.650.1.1 [7] Rcpp_0.11.5 GenomicRanges_1.18.4 [9] GenomeInfoDb_1.2.4 IRanges_2.0.1 [11] S4Vectors_0.4.0 BiocGenerics_0.12.1 [13] edgeR_3.8.6 limma_3.22.7 loaded via a namespace (and not attached): [1] acepack_1.3-3.3 annotate_1.44.0 AnnotationDbi_1.28.2 [4] base64enc_0.1-2 BatchJobs_1.6 BBmisc_1.9 [7] Biobase_2.26.0 BiocParallel_1.0.3 bitops_1.0-6 [10] brew_1.0-6 caTools_1.17.1 checkmate_1.5.2 [13] cluster_2.0.1 codetools_0.2-11 colorspace_1.2-6 [16] DBI_0.3.1 digest_0.6.8 fail_1.2 [19] foreach_1.4.2 foreign_0.8-63 Formula_1.2-0 [22] gdata_2.13.3 genefilter_1.48.1 geneplotter_1.44.0 [25] grid_3.1.3 gtable_0.1.2 gtools_3.4.1 [28] Hmisc_3.15-0 iterators_1.0.7 KernSmooth_2.23-14 [31] lattice_0.20-31 latticeExtra_0.6-26 locfit_1.5-9.1 [34] MASS_7.3-40 munsell_0.4.2 nnet_7.3-9 [37] plyr_1.8.1 proto_0.3-10 reshape2_1.4.1 [40] rpart_4.1-9 RSQLite_1.0.0 scales_0.2.4 [43] sendmailR_1.2-1 splines_3.1.3 stringr_0.6.2 [46] survival_2.38-1 tools_3.1.3 XML_3.98-1.1 [49] xtable_1.7-4 XVector_0.6.0
Thanks! I had confused two plotPCA functions that were part of different software packages. I've successfully used the code you provided with FactoMineR to generate the additional principal components I was looking for.
Erin
Hi Michael,
I think I have the same error as Erin does. I did try your suggested command line but I'm not getting any definitive result.
ntop<-500
rv <- rowVars(assay(rld))
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
mat <- t( assay(rld)[select, ] )
plot3d(mat)
I could not get the third PCA.
Please advice.
You have generated a matrix of transformed values for the top variance genes but you haven't performed PCA. You need to perform PCA before you plot.You can choose to use some other graphing or EDA libraries, but you will have to look up how to code those yourself.
A few lines to get you started doing PCA outside of DESeq2:
Now you have the rotated data in
pc$x
. The first two PCs arepc$x[,1:2]
.