Question

Retrieve loadings/eigenvalues for PCAs using DiffBind's dba.plotPCA

0

Entering edit mode

Spendlove ▴ 10

@95c0b574

Last seen 3.5 years ago

United States

Hi, I am trying to help a labmate who is using DiffBind to do a PCA analysis on some ATAC-seq data that she generated. I have experience with PCA but not with this specific software. How can we retrieve the PCA loadings/eigenvalues for each individual for each PC? I am used to using other PCA tools that allow you to retried the spreadsheet of loadings/eigenvalues in addition to plotting PC's 1-10 against each other. As far as I can tell DiffBind allows you to plot PC1, PC2, and PC3 against each other but does not provide the actual numbers, nor does it provide access to any further PC's. Any help with this would be greatly appreciated.

dba.PCA documentation: https://www.rdocumentation.org/packages/DiffBind/versions/2.0.2/topics/dba.plotPCA

Thanks!

DiffBind dba.plotPCA • 2.6k views

ADD COMMENT • link updated 2.3 years ago by aaweiner • 0 • written 3.5 years ago by Spendlove ▴ 10

0

Entering edit mode

Wanted to do something similar and indeed was unable to find PCA loadings. However I did find that plotting dba.plotPCA(dba, DBA_ID) and dba.plotPCA(dba, DBA_TREATMENT) (so by sample ID and treatment group, respectively), I was able to see which sample corresponds to which in the treatment PCA plot. So not exact eigenvalues as you wanted, but you can at least get an idea of which samples cluster together/away from each other which is all I was trying to do at the end of the day.

ADD REPLY • link 2.3 years ago aaweiner • 0

0

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 4 weeks ago

Cambridge, UK

What version are you using? That information is somewhat dated.

Since version 2.14, the default logic used is as follows:

Normalized read counts for the samples being plotted are retrieved
Any peaks with no non-zero values for these samples are removed
The log2() values for the read counts are computed after setting values < 1 to 1
prcomp()(not princomp()) is used to compute a principal components analysis on the transpose of these values
The percentage variance for each component is computed by squaring the $sdev values, dividing by the total, and multiplying by 100.
The $x values are plotted. These are the loading values that have been centered and multiplied by the rotation matrix.

Here's some code to show the essence of what is happening:

data(tamoxifen_counts)
dba.plotPCA(tamoxifen, DBA_FACTOR)

peaks <- dba.peakset(tamoxifen,bRetrieve=TRUE)
vals <- as.matrix(mcols(peaks))
vals[vals<1] <- 1
vals <- log2(vals)
pc <- prcomp(t(vals))
percent <- pc$sdev[1:2]^2 / sum(pc$sdev^2) * 100
percent <- as.character(round(percent))

lattice::xyplot(
  PC2 ~ PC1,
  data = as.data.frame(pc$x[,1:2]),
  xlab = percent[1], ylab = percent[2],
  pch = 16, cex = 1.75, aspect = 1)

ADD COMMENT • link 2.6 years ago Rory Stark ★ 5.2k

score 2 · Accepted Answer · 2021-05-12

2

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 4 weeks ago

Cambridge, UK

DiffBind does allow any components to be plotted again any other components by specifying which components to plot using components= when calling dba.plotPCA() (the default is components=1:3 but this may be changed by the user).

There is no way to directly retrieve the computed PCA data. You can retrieve the underlying read count data (normalized by default) using dba.peakset() with bRetrieve=TRUE and run a PCA on that.

ADD COMMENT • link 3.5 years ago Rory Stark ★ 5.2k

1

Entering edit mode

Thanks Rory!! In future versions of the software it might be helpful to be able to retrieve the computed PCA data, but this will work. :) Thanks for your quick response!!

ADD REPLY • link 3.5 years ago Spendlove ▴ 10

0

Entering edit mode

Note that by default DiffBind does PCA using log2() values.

I should also mention that if you are looking to get the counts for differentially bound sites only, you can call dba.report() with bCounts=TRUE and form a matrix with the count metadata columns.

ADD REPLY • link 3.5 years ago Rory Stark ★ 5.2k

0

Entering edit mode

For my dataset, I have a hard time to reproduce the result from dba.plotPCA(dba), but the proportion of variance is way off ( 17% from dba.plotPCA vs 76% from the following code for PC1). I try different score but the % remain similar. Also the x and y-axis scale are different too (screenshot attached).

Also, following the instruction from an older post:

  dba <- dba.count(dba,peaks=NULL,score=DBA_SCORE_TMM_MINUS_FULL )
  counts <- dba.peakset(dba, bRetrieve=TRUE)
  counts <- mcols(counts) %>% data.matrix
  pca <- princomp(log2(counts+1), cor = FALSE)
  pcaData <- unclass(pca$loadings )
  percentVar <- round(pca$sdev^2/sum(pca$sdev^2) *100,2)

Diffbind princomp

ADD REPLY • link 2.7 years ago Pak • 0