How do I extract the principal components in DESeq2?
1
0
Entering edit mode
tj.hu ▴ 10
@tjhu-8369
Last seen 9.3 years ago
United States

Hi,

I am using DESeq2 and after the rld transformation I plotted the PCA and found an interesting result. However, I don't know how to extract the PC data. I want to know what the Principal Components actually are (I believe it's a weighted gene list so I want to know the weights?). I would also like to retrieve the data on PC3, PC4, etc. and understand what are in the those principal components but also how much variance are explained by those principal components.

Please let me know if this is possible within DESeq2 or if I should use another package.

Thanks!

deseq2 pca • 13k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 12 hours ago
United States

in the help file ?plotPCA we have a note:

Note:

     See the vignette for an example of variance stabilization and PCA
     plots. Note that the source code of ‘plotPCA’ is very simple and
     commented. It can be found with comments by typing
     ‘DESeq2:::plotPCA.DESeqTransform’. (The version at
     ‘getMethod("plotPCA","DESeqTransform")’ will not show comments.)
     Users should find it easy to customize this function.

 

If you follow those instructions, the third line of code gives you the PCA object.

 

ADD COMMENT
2
Entering edit mode

Whether or not you see the comments is determined by options("keep.source.pkgs"), which if FALSE (the default) will strip off the comments when you install the package. So for example, in my install I get

> DESeq2:::plotPCA.DESeqTransform
function (object, intgroup = "condition", ntop = 500, returnData = FALSE)
{
    rv <- rowVars(assay(object))
    select <- order(rv, decreasing = TRUE)[seq_len(min(ntop,
        length(rv)))]
    pca <- prcomp(t(assay(object)[select, ]))
    percentVar <- pca$sdev^2/sum(pca$sdev^2)
    if (!all(intgroup %in% names(colData(object)))) {
        stop("the argument 'intgroup' should specify columns of colData(dds)")
    }
    intgroup.df <- as.data.frame(colData(object)[, intgroup,
        drop = FALSE])
    group <- if (length(intgroup) > 1) {
        factor(apply(intgroup.df, 1, paste, collapse = " : "))
    }
    else {
        colData(object)[[intgroup]]
    }
    d <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], group = group,
        intgroup.df, name = colnames(object))
    if (returnData) {
        attr(d, "percentVar") <- percentVar[1:2]
        return(d)
    }
    ggplot(data = d, aes_string(x = "PC1", y = "PC2", color = "group")) +
        geom_point(size = 3) + xlab(paste0("PC1: ", round(percentVar[1] *
        100), "% variance")) + ylab(paste0("PC2: ", round(percentVar[2] *
        100), "% variance"))
}
<environment: namespace:DESeq2>

In which case the Bioconductor github mirror is nice. You can find the source file here, and scroll down to the function itself.

ADD REPLY
1
Entering edit mode

You're right, I need to fix that note. I think I tricked myself through using devtools into thinking that would work. I'll just say, go to source for comments.

ADD REPLY
0
Entering edit mode

Hi - I wonder if you could let me know how to modify this code and save it as a new function that calls the 2 PCs of interest, e.g. 1 and 2 (default) or 3 and 4 as variables, perhaps>?

I guess it's the line:

d <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], group = group,

 where the [,1] and [,2] refer to 2 columns of the pca output.

 

Sorry for the basic nature of my question: I'm a beginner and not done any function editing yet..

Cheers,

Matt

ADD REPLY
1
Entering edit mode

The easiest thing to do is to make your own function. You can get the function body by going to the github link I posted above.

plotPCA.mystyle <-  function(object, intgroup="condition", ntop=500, returnData=FALSE, pcs = c(1,2))
{

 stopifnot(length(pcs) == 2)    ### added this to check number of PCs ####
  # calculate the variance for each gene
  rv <- rowVars(assay(object))

  # select the ntop genes by variance
  select <- order(rv, decreasing=TRUE)[seq_len(min(ntop, length(rv)))]

  # perform a PCA on the data in assay(x) for the selected genes
  pca <- prcomp(t(assay(object)[select,]))

  # the contribution to the total variance for each component
  percentVar <- pca$sdev^2 / sum( pca$sdev^2 )

  if (!all(intgroup %in% names(colData(object)))) {
    stop("the argument 'intgroup' should specify columns of colData(dds)")
  }

  intgroup.df <- as.data.frame(colData(object)[, intgroup, drop=FALSE])
 
  # add the intgroup factors together to create a new grouping factor
  group <- if (length(intgroup) > 1) {
    factor(apply( intgroup.df, 1, paste, collapse=" : "))
  } else {
    colData(object)[[intgroup]]
  }

  # assembly the data for the plot
  ########## Here we just use the pcs object passed by the end user ####
  d <- data.frame(PC1=pca$x[,pcs[1]], PC2=pca$x[,pcs[2]], group=group, intgroup.df, name=colnames(object))

  if (returnData) {
    attr(d, "percentVar") <- percentVar[1:2]
    return(d)
  }
ADD REPLY

Login before adding your answer.

Traffic: 621 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6