Retrieve loadings/eigenvalues for PCAs using DiffBind's dba.plotPCA
2
0
Entering edit mode
Spendlove ▴ 10
@95c0b574
Last seen 16 months ago
United States

Hi, I am trying to help a labmate who is using DiffBind to do a PCA analysis on some ATAC-seq data that she generated. I have experience with PCA but not with this specific software. How can we retrieve the PCA loadings/eigenvalues for each individual for each PC? I am used to using other PCA tools that allow you to retried the spreadsheet of loadings/eigenvalues in addition to plotting PC's 1-10 against each other. As far as I can tell DiffBind allows you to plot PC1, PC2, and PC3 against each other but does not provide the actual numbers, nor does it provide access to any further PC's. Any help with this would be greatly appreciated.

dba.PCA documentation: https://www.rdocumentation.org/packages/DiffBind/versions/2.0.2/topics/dba.plotPCA

Thanks!

DiffBind dba.plotPCA • 897 views
ADD COMMENT
0
Entering edit mode

Wanted to do something similar and indeed was unable to find PCA loadings. However I did find that plotting dba.plotPCA(dba, DBA_ID) and dba.plotPCA(dba, DBA_TREATMENT) (so by sample ID and treatment group, respectively), I was able to see which sample corresponds to which in the treatment PCA plot. So not exact eigenvalues as you wanted, but you can at least get an idea of which samples cluster together/away from each other which is all I was trying to do at the end of the day.

ADD REPLY
2
Entering edit mode
Rory Stark ★ 4.4k
@rory-stark-5741
Last seen 3 days ago
CRUK, Cambridge, UK

DiffBind does allow any components to be plotted again any other components by specifying which components to plot using components= when calling dba.plotPCA() (the default is components=1:3 but this may be changed by the user).

There is no way to directly retrieve the computed PCA data. You can retrieve the underlying read count data (normalized by default) using dba.peakset() with bRetrieve=TRUE and run a PCA on that.

ADD COMMENT
1
Entering edit mode

Thanks Rory!! In future versions of the software it might be helpful to be able to retrieve the computed PCA data, but this will work. :) Thanks for your quick response!!

ADD REPLY
0
Entering edit mode

Note that by default DiffBind does PCA using log2() values.

I should also mention that if you are looking to get the counts for differentially bound sites only, you can call dba.report() with bCounts=TRUE and form a matrix with the count metadata columns.

ADD REPLY
0
Entering edit mode

For my dataset, I have a hard time to reproduce the result from dba.plotPCA(dba), but the proportion of variance is way off ( 17% from dba.plotPCA vs 76% from the following code for PC1). I try different score but the % remain similar. Also the x and y-axis scale are different too (screenshot attached).

Also, following the instruction from an older post:

  dba <- dba.count(dba,peaks=NULL,score=DBA_SCORE_TMM_MINUS_FULL )
  counts <- dba.peakset(dba, bRetrieve=TRUE)
  counts <- mcols(counts) %>% data.matrix
  pca <- princomp(log2(counts+1), cor = FALSE)
  pcaData <- unclass(pca$loadings )
  percentVar <- round(pca$sdev^2/sum(pca$sdev^2) *100,2)

Diffbind princomp

ADD REPLY
0
Entering edit mode
Rory Stark ★ 4.4k
@rory-stark-5741
Last seen 3 days ago
CRUK, Cambridge, UK

What version are you using? That information is somewhat dated.

Since version 2.14, the default logic used is as follows:

  • Normalized read counts for the samples being plotted are retrieved
  • Any peaks with no non-zero values for these samples are removed
  • The log2() values for the read counts are computed after setting values < 1 to 1
  • prcomp()(not princomp()) is used to compute a principal components analysis on the transpose of these values
  • The percentage variance for each component is computed by squaring the $sdev values, dividing by the total, and multiplying by 100.
  • The $x values are plotted. These are the loading values that have been centered and multiplied by the rotation matrix.

Here's some code to show the essence of what is happening:

data(tamoxifen_counts)
dba.plotPCA(tamoxifen, DBA_FACTOR)

peaks <- dba.peakset(tamoxifen,bRetrieve=TRUE)
vals <- as.matrix(mcols(peaks))
vals[vals<1] <- 1
vals <- log2(vals)
pc <- prcomp(t(vals))
percent <- pc$sdev[1:2]^2 / sum(pc$sdev^2) * 100
percent <- as.character(round(percent))

lattice::xyplot(
  PC2 ~ PC1,
  data = as.data.frame(pc$x[,1:2]),
  xlab = percent[1], ylab = percent[2],
  pch = 16, cex = 1.75, aspect = 1)
ADD COMMENT

Login before adding your answer.

Traffic: 184 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6