Question: How do I extract the principal components in DESeq2?
0
gravatar for tj.hu
3.9 years ago by
tj.hu10
United States
tj.hu10 wrote:

Hi,

I am using DESeq2 and after the rld transformation I plotted the PCA and found an interesting result. However, I don't know how to extract the PC data. I want to know what the Principal Components actually are (I believe it's a weighted gene list so I want to know the weights?). I would also like to retrieve the data on PC3, PC4, etc. and understand what are in the those principal components but also how much variance are explained by those principal components.

Please let me know if this is possible within DESeq2 or if I should use another package.

Thanks!

deseq2 pca • 6.0k views
ADD COMMENTlink modified 3.9 years ago by Michael Love23k • written 3.9 years ago by tj.hu10
Answer: How do I extract the principal components in DESeq2?
0
gravatar for Michael Love
3.9 years ago by
Michael Love23k
United States
Michael Love23k wrote:

in the help file ?plotPCA we have a note:

Note:

     See the vignette for an example of variance stabilization and PCA
     plots. Note that the source code of ‘plotPCA’ is very simple and
     commented. It can be found with comments by typing
     ‘DESeq2:::plotPCA.DESeqTransform’. (The version at
     ‘getMethod("plotPCA","DESeqTransform")’ will not show comments.)
     Users should find it easy to customize this function.

 

If you follow those instructions, the third line of code gives you the PCA object.

 

ADD COMMENTlink written 3.9 years ago by Michael Love23k
1

Whether or not you see the comments is determined by options("keep.source.pkgs"), which if FALSE (the default) will strip off the comments when you install the package. So for example, in my install I get

> DESeq2:::plotPCA.DESeqTransform
function (object, intgroup = "condition", ntop = 500, returnData = FALSE)
{
    rv <- rowVars(assay(object))
    select <- order(rv, decreasing = TRUE)[seq_len(min(ntop,
        length(rv)))]
    pca <- prcomp(t(assay(object)[select, ]))
    percentVar <- pca$sdev^2/sum(pca$sdev^2)
    if (!all(intgroup %in% names(colData(object)))) {
        stop("the argument 'intgroup' should specify columns of colData(dds)")
    }
    intgroup.df <- as.data.frame(colData(object)[, intgroup,
        drop = FALSE])
    group <- if (length(intgroup) > 1) {
        factor(apply(intgroup.df, 1, paste, collapse = " : "))
    }
    else {
        colData(object)[[intgroup]]
    }
    d <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], group = group,
        intgroup.df, name = colnames(object))
    if (returnData) {
        attr(d, "percentVar") <- percentVar[1:2]
        return(d)
    }
    ggplot(data = d, aes_string(x = "PC1", y = "PC2", color = "group")) +
        geom_point(size = 3) + xlab(paste0("PC1: ", round(percentVar[1] *
        100), "% variance")) + ylab(paste0("PC2: ", round(percentVar[2] *
        100), "% variance"))
}
<environment: namespace:DESeq2>

In which case the Bioconductor github mirror is nice. You can find the source file here, and scroll down to the function itself.

ADD REPLYlink written 3.9 years ago by James W. MacDonald50k

You're right, I need to fix that note. I think I tricked myself through using devtools into thinking that would work. I'll just say, go to source for comments.

ADD REPLYlink written 3.9 years ago by Michael Love23k

Hi - I wonder if you could let me know how to modify this code and save it as a new function that calls the 2 PCs of interest, e.g. 1 and 2 (default) or 3 and 4 as variables, perhaps>?

I guess it's the line:

d <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], group = group,

 where the [,1] and [,2] refer to 2 columns of the pca output.

 

Sorry for the basic nature of my question: I'm a beginner and not done any function editing yet..

Cheers,

Matt

ADD REPLYlink written 3.8 years ago by matt.arno0

The easiest thing to do is to make your own function. You can get the function body by going to the github link I posted above.

plotPCA.mystyle <-  function(object, intgroup="condition", ntop=500, returnData=FALSE, pcs = c(1,2))
{

 stopifnot(length(pcs) == 2)    ### added this to check number of PCs ####
  # calculate the variance for each gene
  rv <- rowVars(assay(object))

  # select the ntop genes by variance
  select <- order(rv, decreasing=TRUE)[seq_len(min(ntop, length(rv)))]

  # perform a PCA on the data in assay(x) for the selected genes
  pca <- prcomp(t(assay(object)[select,]))

  # the contribution to the total variance for each component
  percentVar <- pca$sdev^2 / sum( pca$sdev^2 )

  if (!all(intgroup %in% names(colData(object)))) {
    stop("the argument 'intgroup' should specify columns of colData(dds)")
  }

  intgroup.df <- as.data.frame(colData(object)[, intgroup, drop=FALSE])
 
  # add the intgroup factors together to create a new grouping factor
  group <- if (length(intgroup) > 1) {
    factor(apply( intgroup.df, 1, paste, collapse=" : "))
  } else {
    colData(object)[[intgroup]]
  }

  # assembly the data for the plot
  ########## Here we just use the pcs object passed by the end user ####
  d <- data.frame(PC1=pca$x[,pcs[1]], PC2=pca$x[,pcs[2]], group=group, intgroup.df, name=colnames(object))

  if (returnData) {
    attr(d, "percentVar") <- percentVar[1:2]
    return(d)
  }
ADD REPLYlink written 3.8 years ago by James W. MacDonald50k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 314 users visited in the last hour