Question: 3D PCA Plot of RNA-seq Data using plotPCA from within DESeq2
0
gravatar for erin.gill81
4.0 years ago by
erin.gill8130
Canada
erin.gill8130 wrote:

Hello, 

I'm analysing an RNA-seq dataset using DESeq2, and would like to inspect the top three principal components on a 3D PCA plot. I know that the plot PCA function has been implemented in DESeq 2, but I've been unable to get the 3rd principal component to plot, print to screen or file. I've pasted the commands from my R session below:

#read in samples from file
samples = read.csv("samples.csv",header=TRUE)

#use edgeR DGE list function to make count matrix
library("edgeR")
counts = readDGE(samples$countf)$counts
colnames(counts) = samples$shortname
d = DGEList(counts=counts, group=colnames(counts))

#attach the column data to from the sample sheet to the variable coldata
coldata = with(samples,data.frame(shortname = I(shortname), condition = condition, donor = donor))

#start DESeq2
library("DESeq2")

#construct your DESeq2 data set, making sure to specify the design matrix here
dds <- DESeqDataSetFromMatrix(countData = countdata,
colData = coldata,
design = ~ donor + condition)

#run DESeq2 on the dataset
dds <- DESeq(dds)

#make 2D PCA plot
library("RColorBrewer")
library("gplots")
library("gglplot2")
library("rgl")
hmcol <- colorRampPalette(brewer.pal(9, "GnBu"))(100)
rld<-rlog(dds)
distsRL <- dist(t(assay(rld)))
mat <- as.matrix(distsRL)
rownames(mat) <- colnames(mat) <- with(colData(dds), paste(shortname, sep=" : "))
print(plotPCA(rld, intgroup=c("condition","donor")))

#make 3D PCA plot
print(plotPCA(rld, intgroup=c("condition","donor"), pcs = c(1,2,3),plot3d = TRUE, returnData=TRUE))

 

At this point, I get the error message, "Error in plotPCA(rld, intgroup = c("condition", "donor"), pcs = c(1, 2,  : 
  unused arguments (pcs = c(1, 2, 3), plot3d = TRUE)"

Is it theoretically possible to make 3D PCA plots within DESeq2 using plotPCA? Was the plotPCA package fully implemented here? I realize that this may be a question for the developer of plotPCA, but thought I'd try here first to cover all of the bases.

 

Thanks!

> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] rgl_0.95.1201             ggplot2_1.0.1            
 [3] gplots_2.16.0             RColorBrewer_1.1-2       
 [5] DESeq2_1.6.3              RcppArmadillo_0.4.650.1.1
 [7] Rcpp_0.11.5               GenomicRanges_1.18.4     
 [9] GenomeInfoDb_1.2.4        IRanges_2.0.1            
[11] S4Vectors_0.4.0           BiocGenerics_0.12.1      
[13] edgeR_3.8.6               limma_3.22.7             

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.2
 [4] base64enc_0.1-2      BatchJobs_1.6        BBmisc_1.9          
 [7] Biobase_2.26.0       BiocParallel_1.0.3   bitops_1.0-6        
[10] brew_1.0-6           caTools_1.17.1       checkmate_1.5.2     
[13] cluster_2.0.1        codetools_0.2-11     colorspace_1.2-6    
[16] DBI_0.3.1            digest_0.6.8         fail_1.2            
[19] foreach_1.4.2        foreign_0.8-63       Formula_1.2-0       
[22] gdata_2.13.3         genefilter_1.48.1    geneplotter_1.44.0  
[25] grid_3.1.3           gtable_0.1.2         gtools_3.4.1        
[28] Hmisc_3.15-0         iterators_1.0.7      KernSmooth_2.23-14  
[31] lattice_0.20-31      latticeExtra_0.6-26  locfit_1.5-9.1      
[34] MASS_7.3-40          munsell_0.4.2        nnet_7.3-9          
[37] plyr_1.8.1           proto_0.3-10         reshape2_1.4.1      
[40] rpart_4.1-9          RSQLite_1.0.0        scales_0.2.4        
[43] sendmailR_1.2-1      splines_3.1.3        stringr_0.6.2       
[46] survival_2.38-1      tools_3.1.3          XML_3.98-1.1        
[49] xtable_1.7-4         XVector_0.6.0 

 

 
rnaseq deseq2 pca • 7.5k views
ADD COMMENTlink modified 4.0 years ago by Michael Love22k • written 4.0 years ago by erin.gill8130
Answer: 3D PCA Plot of RNA-seq Data using plotPCA from within DESeq2
0
gravatar for Michael Love
4.0 years ago by
Michael Love22k
United States
Michael Love22k wrote:

hi Erin,

In the current release, if you type ?plotPCA, you will get the help for the DESeq2 version of plotPCA. It's a very simple function, and if you type plotPCA and hit enter you can see the source code. If you want to use another function, the important lines you might want to use are at the top. 

library(genefilter)
ntop <- 500
rv <- rowVars(assay(rld))
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
mat <- t( assay(rld)[select, ] )
pca <- yourFavoritePCA( mat )

 

The third line calculates the total variance for each row, the fourth line picks the top 500 genes by row variance, the fifth line subsets the data to those top 500 genes and transposes such that the samples are now rows instead of columns (outside of genomics, typically rows are samples and columns are "features"). In the sixth line you would substitute your preferred PCA function.

Because a lot of packages on Bioconductor had the same function name, in the next release (in a week or so) the plotPCA is a method which is defined for different classes, and you will have a set of options for which plotPCA you want help for if you type: ?plotPCA. This is so that the functions don't mask each other based on the order of package loading. We then created a simple class for VST or rlog objects in DESeq2 called DESeqTransform, and you can get the source code of plotPCA (with comments) in future releases with: DESeq2:::plotPCA.DESeqTransform (which will be mentioned in the help)

 

ADD COMMENTlink modified 4.0 years ago • written 4.0 years ago by Michael Love22k

Thanks! I had confused two plotPCA functions that were part of different software packages. I've successfully used the code you provided with FactoMineR to generate the additional principal components I was looking for.

Erin

ADD REPLYlink written 4.0 years ago by erin.gill8130

Hi Michael,

I think I have the same error as Erin does. I did try your suggested command line but I'm not getting any definitive result. 

ntop<-500
rv <- rowVars(assay(rld))
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
mat <- t( assay(rld)[select, ] )
plot3d(mat)

I could not get the third PCA. 

Please advice.

 

ADD REPLYlink written 2.4 years ago by tarun20

You have generated a matrix of transformed values for the top variance genes but you haven't performed PCA. You need to perform PCA before you plot.You can choose to use some other graphing or EDA libraries, but you will have to look up how to code those yourself.

A few lines to get you started doing PCA outside of DESeq2:

pc <- prcomp(mat)

Now you have the rotated data in pc$x. The first two PCs are pc$x[,1:2].

ADD REPLYlink written 2.4 years ago by Michael Love22k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 138 users visited in the last hour