How do I extract the principal components in DESeq2?
1
0
Entering edit mode
tj.hu ▴ 10
@tjhu-8369
Last seen 6.0 years ago
United States

Hi,

I am using DESeq2 and after the rld transformation I plotted the PCA and found an interesting result. However, I don't know how to extract the PC data. I want to know what the Principal Components actually are (I believe it's a weighted gene list so I want to know the weights?). I would also like to retrieve the data on PC3, PC4, etc. and understand what are in the those principal components but also how much variance are explained by those principal components.

Please let me know if this is possible within DESeq2 or if I should use another package.

Thanks!

deseq2 pca • 8.7k views
0
Entering edit mode
@mikelove
Last seen 8 hours ago
United States

in the help file ?plotPCA we have a note:

Note:

See the vignette for an example of variance stabilization and PCA
plots. Note that the source code of ‘plotPCA’ is very simple and
commented. It can be found with comments by typing
‘DESeq2:::plotPCA.DESeqTransform’. (The version at
Users should find it easy to customize this function.

If you follow those instructions, the third line of code gives you the PCA object.

1
Entering edit mode

Whether or not you see the comments is determined by options("keep.source.pkgs"), which if FALSE (the default) will strip off the comments when you install the package. So for example, in my install I get

> DESeq2:::plotPCA.DESeqTransform
function (object, intgroup = "condition", ntop = 500, returnData = FALSE)
{
rv <- rowVars(assay(object))
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop,
length(rv)))]
pca <- prcomp(t(assay(object)[select, ]))
percentVar <- pca$sdev^2/sum(pca$sdev^2)
if (!all(intgroup %in% names(colData(object)))) {
stop("the argument 'intgroup' should specify columns of colData(dds)")
}
intgroup.df <- as.data.frame(colData(object)[, intgroup,
drop = FALSE])
group <- if (length(intgroup) > 1) {
factor(apply(intgroup.df, 1, paste, collapse = " : "))
}
else {
colData(object)[[intgroup]]
}
d <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], group = group,
intgroup.df, name = colnames(object))
if (returnData) {
attr(d, "percentVar") <- percentVar[1:2]
return(d)
}
ggplot(data = d, aes_string(x = "PC1", y = "PC2", color = "group")) +
geom_point(size = 3) + xlab(paste0("PC1: ", round(percentVar *
100), "% variance")) + ylab(paste0("PC2: ", round(percentVar *
100), "% variance"))
}
<environment: namespace:DESeq2>

In which case the Bioconductor github mirror is nice. You can find the source file here, and scroll down to the function itself.

0
Entering edit mode

You're right, I need to fix that note. I think I tricked myself through using devtools into thinking that would work. I'll just say, go to source for comments.

0
Entering edit mode

Hi - I wonder if you could let me know how to modify this code and save it as a new function that calls the 2 PCs of interest, e.g. 1 and 2 (default) or 3 and 4 as variables, perhaps>?

I guess it's the line:

d <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], group = group,

where the [,1] and [,2] refer to 2 columns of the pca output.

Sorry for the basic nature of my question: I'm a beginner and not done any function editing yet..

Cheers,

Matt

0
Entering edit mode

The easiest thing to do is to make your own function. You can get the function body by going to the github link I posted above.

plotPCA.mystyle <-  function(object, intgroup="condition", ntop=500, returnData=FALSE, pcs = c(1,2))
{

stopifnot(length(pcs) == 2)    ### added this to check number of PCs ####
# calculate the variance for each gene
rv <- rowVars(assay(object))

# select the ntop genes by variance
select <- order(rv, decreasing=TRUE)[seq_len(min(ntop, length(rv)))]

# perform a PCA on the data in assay(x) for the selected genes
pca <- prcomp(t(assay(object)[select,]))

# the contribution to the total variance for each component
percentVar <- pca$sdev^2 / sum( pca$sdev^2 )

if (!all(intgroup %in% names(colData(object)))) {
stop("the argument 'intgroup' should specify columns of colData(dds)")
}

intgroup.df <- as.data.frame(colData(object)[, intgroup, drop=FALSE])

# add the intgroup factors together to create a new grouping factor
group <- if (length(intgroup) > 1) {
factor(apply( intgroup.df, 1, paste, collapse=" : "))
} else {
colData(object)[[intgroup]]
}

# assembly the data for the plot
########## Here we just use the pcs object passed by the end user ####
d <- data.frame(PC1=pca$x[,pcs], PC2=pca$x[,pcs], group=group, intgroup.df, name=colnames(object))

if (returnData) {
attr(d, "percentVar") <- percentVar[1:2]
return(d)
}