Question: How do I extract the principal components in DESeq2?
0
4.2 years ago by
tj.hu10
United States
tj.hu10 wrote:

Hi,

I am using DESeq2 and after the rld transformation I plotted the PCA and found an interesting result. However, I don't know how to extract the PC data. I want to know what the Principal Components actually are (I believe it's a weighted gene list so I want to know the weights?). I would also like to retrieve the data on PC3, PC4, etc. and understand what are in the those principal components but also how much variance are explained by those principal components.

Please let me know if this is possible within DESeq2 or if I should use another package.

Thanks!

deseq2 pca • 6.5k views
modified 4.2 years ago by Michael Love25k • written 4.2 years ago by tj.hu10
Answer: How do I extract the principal components in DESeq2?
0
4.2 years ago by
Michael Love25k
United States
Michael Love25k wrote:

in the help file ?plotPCA we have a note:

Note:

See the vignette for an example of variance stabilization and PCA
plots. Note that the source code of ‘plotPCA’ is very simple and
commented. It can be found with comments by typing
‘DESeq2:::plotPCA.DESeqTransform’. (The version at
Users should find it easy to customize this function.

If you follow those instructions, the third line of code gives you the PCA object.

1

Whether or not you see the comments is determined by options("keep.source.pkgs"), which if FALSE (the default) will strip off the comments when you install the package. So for example, in my install I get

> DESeq2:::plotPCA.DESeqTransform
function (object, intgroup = "condition", ntop = 500, returnData = FALSE)
{
rv <- rowVars(assay(object))
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop,
length(rv)))]
pca <- prcomp(t(assay(object)[select, ]))
percentVar <- pca$sdev^2/sum(pca$sdev^2)
if (!all(intgroup %in% names(colData(object)))) {
stop("the argument 'intgroup' should specify columns of colData(dds)")
}
intgroup.df <- as.data.frame(colData(object)[, intgroup,
drop = FALSE])
group <- if (length(intgroup) > 1) {
factor(apply(intgroup.df, 1, paste, collapse = " : "))
}
else {
colData(object)[[intgroup]]
}
d <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], group = group,
intgroup.df, name = colnames(object))
if (returnData) {
attr(d, "percentVar") <- percentVar[1:2]
return(d)
}
ggplot(data = d, aes_string(x = "PC1", y = "PC2", color = "group")) +
geom_point(size = 3) + xlab(paste0("PC1: ", round(percentVar[1] *
100), "% variance")) + ylab(paste0("PC2: ", round(percentVar[2] *
100), "% variance"))
}
<environment: namespace:DESeq2>

In which case the Bioconductor github mirror is nice. You can find the source file here, and scroll down to the function itself.

You're right, I need to fix that note. I think I tricked myself through using devtools into thinking that would work. I'll just say, go to source for comments.

Hi - I wonder if you could let me know how to modify this code and save it as a new function that calls the 2 PCs of interest, e.g. 1 and 2 (default) or 3 and 4 as variables, perhaps>?

I guess it's the line:

d <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], group = group,

where the [,1] and [,2] refer to 2 columns of the pca output.

Sorry for the basic nature of my question: I'm a beginner and not done any function editing yet..

Cheers,

Matt

The easiest thing to do is to make your own function. You can get the function body by going to the github link I posted above.

plotPCA.mystyle <-  function(object, intgroup="condition", ntop=500, returnData=FALSE, pcs = c(1,2))
{

stopifnot(length(pcs) == 2)    ### added this to check number of PCs ####
# calculate the variance for each gene
rv <- rowVars(assay(object))

# select the ntop genes by variance
select <- order(rv, decreasing=TRUE)[seq_len(min(ntop, length(rv)))]

# perform a PCA on the data in assay(x) for the selected genes
pca <- prcomp(t(assay(object)[select,]))

# the contribution to the total variance for each component
percentVar <- pca$sdev^2 / sum( pca$sdev^2 )

if (!all(intgroup %in% names(colData(object)))) {
stop("the argument 'intgroup' should specify columns of colData(dds)")
}

intgroup.df <- as.data.frame(colData(object)[, intgroup, drop=FALSE])

# add the intgroup factors together to create a new grouping factor
group <- if (length(intgroup) > 1) {
factor(apply( intgroup.df, 1, paste, collapse=" : "))
} else {
colData(object)[[intgroup]]
}

# assembly the data for the plot
########## Here we just use the pcs object passed by the end user ####
d <- data.frame(PC1=pca$x[,pcs[1]], PC2=pca$x[,pcs[2]], group=group, intgroup.df, name=colnames(object))

if (returnData) {
attr(d, "percentVar") <- percentVar[1:2]
return(d)
}