3D PCA Plot of RNA-seq Data using plotPCA from within DESeq2
Entering edit mode
erin.gill81 ▴ 50
Last seen 3.1 years ago


I'm analysing an RNA-seq dataset using DESeq2, and would like to inspect the top three principal components on a 3D PCA plot. I know that the plot PCA function has been implemented in DESeq 2, but I've been unable to get the 3rd principal component to plot, print to screen or file. I've pasted the commands from my R session below:

#read in samples from file
samples = read.csv("samples.csv",header=TRUE)

#use edgeR DGE list function to make count matrix
counts = readDGE(samples$countf)$counts
colnames(counts) = samples$shortname
d = DGEList(counts=counts, group=colnames(counts))

#attach the column data to from the sample sheet to the variable coldata
coldata = with(samples,data.frame(shortname = I(shortname), condition = condition, donor = donor))

#start DESeq2

#construct your DESeq2 data set, making sure to specify the design matrix here
dds <- DESeqDataSetFromMatrix(countData = countdata,
colData = coldata,
design = ~ donor + condition)

#run DESeq2 on the dataset
dds <- DESeq(dds)

#make 2D PCA plot
hmcol <- colorRampPalette(brewer.pal(9, "GnBu"))(100)
distsRL <- dist(t(assay(rld)))
mat <- as.matrix(distsRL)
rownames(mat) <- colnames(mat) <- with(colData(dds), paste(shortname, sep=" : "))
print(plotPCA(rld, intgroup=c("condition","donor")))

#make 3D PCA plot
print(plotPCA(rld, intgroup=c("condition","donor"), pcs = c(1,2,3),plot3d = TRUE, returnData=TRUE))


At this point, I get the error message, "Error in plotPCA(rld, intgroup = c("condition", "donor"), pcs = c(1, 2,  : 
  unused arguments (pcs = c(1, 2, 3), plot3d = TRUE)"

Is it theoretically possible to make 3D PCA plots within DESeq2 using plotPCA? Was the plotPCA package fully implemented here? I realize that this may be a question for the developer of plotPCA, but thought I'd try here first to cover all of the bases.



> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] rgl_0.95.1201             ggplot2_1.0.1            
 [3] gplots_2.16.0             RColorBrewer_1.1-2       
 [5] DESeq2_1.6.3              RcppArmadillo_0.4.650.1.1
 [7] Rcpp_0.11.5               GenomicRanges_1.18.4     
 [9] GenomeInfoDb_1.2.4        IRanges_2.0.1            
[11] S4Vectors_0.4.0           BiocGenerics_0.12.1      
[13] edgeR_3.8.6               limma_3.22.7             

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.2
 [4] base64enc_0.1-2      BatchJobs_1.6        BBmisc_1.9          
 [7] Biobase_2.26.0       BiocParallel_1.0.3   bitops_1.0-6        
[10] brew_1.0-6           caTools_1.17.1       checkmate_1.5.2     
[13] cluster_2.0.1        codetools_0.2-11     colorspace_1.2-6    
[16] DBI_0.3.1            digest_0.6.8         fail_1.2            
[19] foreach_1.4.2        foreign_0.8-63       Formula_1.2-0       
[22] gdata_2.13.3         genefilter_1.48.1    geneplotter_1.44.0  
[25] grid_3.1.3           gtable_0.1.2         gtools_3.4.1        
[28] Hmisc_3.15-0         iterators_1.0.7      KernSmooth_2.23-14  
[31] lattice_0.20-31      latticeExtra_0.6-26  locfit_1.5-9.1      
[34] MASS_7.3-40          munsell_0.4.2        nnet_7.3-9          
[37] plyr_1.8.1           proto_0.3-10         reshape2_1.4.1      
[40] rpart_4.1-9          RSQLite_1.0.0        scales_0.2.4        
[43] sendmailR_1.2-1      splines_3.1.3        stringr_0.6.2       
[46] survival_2.38-1      tools_3.1.3          XML_3.98-1.1        
[49] xtable_1.7-4         XVector_0.6.0 


deseq2 rnaseq PCA • 14k views
Entering edit mode
Last seen 3 days ago
United States

hi Erin,

In the current release, if you type ?plotPCA, you will get the help for the DESeq2 version of plotPCA. It's a very simple function, and if you type plotPCA and hit enter you can see the source code. If you want to use another function, the important lines you might want to use are at the top. 

ntop <- 500
rv <- rowVars(assay(rld))
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
mat <- t( assay(rld)[select, ] )
pca <- yourFavoritePCA( mat )


The third line calculates the total variance for each row, the fourth line picks the top 500 genes by row variance, the fifth line subsets the data to those top 500 genes and transposes such that the samples are now rows instead of columns (outside of genomics, typically rows are samples and columns are "features"). In the sixth line you would substitute your preferred PCA function.

Because a lot of packages on Bioconductor had the same function name, in the next release (in a week or so) the plotPCA is a method which is defined for different classes, and you will have a set of options for which plotPCA you want help for if you type: ?plotPCA. This is so that the functions don't mask each other based on the order of package loading. We then created a simple class for VST or rlog objects in DESeq2 called DESeqTransform, and you can get the source code of plotPCA (with comments) in future releases with: DESeq2:::plotPCA.DESeqTransform (which will be mentioned in the help)


Entering edit mode

Thanks! I had confused two plotPCA functions that were part of different software packages. I've successfully used the code you provided with FactoMineR to generate the additional principal components I was looking for.


Entering edit mode

Hi Michael,

I think I have the same error as Erin does. I did try your suggested command line but I'm not getting any definitive result. 

rv <- rowVars(assay(rld))
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
mat <- t( assay(rld)[select, ] )

I could not get the third PCA. 

Please advice.


Entering edit mode

You have generated a matrix of transformed values for the top variance genes but you haven't performed PCA. You need to perform PCA before you plot.You can choose to use some other graphing or EDA libraries, but you will have to look up how to code those yourself.

A few lines to get you started doing PCA outside of DESeq2:

pc <- prcomp(mat)

Now you have the rotated data in pc$x. The first two PCs are pc$x[,1:2].


Login before adding your answer.

Traffic: 292 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6