What does dba.PCA in DiffBind exactly do?
3
2
Entering edit mode
renanec ▴ 20
@renanec-9080
Last seen 7.3 years ago
United States

Hi there,

I’ve been using the dba.PCA function (Package DiffBind version 1.14.6) in the standard way described in the documentation. So far,  I have two very related questions:

2. I’m interested in understanding how dba.PCA works. To do so I’ve unsuccessfully tried to recreate the PCA myself, the results are closed but not identical. I’m guessing that there are other steps in between or perhaps a different implementation of PCA that you’re using . Could you please take a look at my code ? Ideally I would like to recreate outside of dba.PCA each of the internal steps in the function

samples =dba(sampleSheet = "sample_sheet_all.csv") samples = dba.count(samples,score = DBA_SCORE_READS)​

#Retrieve read count data

peaks.init.ranges<-dba.peakset(samples, bRetrieve=TRUE)

#Create a matrix with the read counts (there might be a more direct way of creating the inputPCA_matrix) inputPCA = mcols(peaks.init.ranges) inputPCA = as.data.frame(inputPCA) inputPCA_matrix = data.matrix(inputPCA)

#Transpose matrix so that each row is a sample and the columns are read counts inputPCA_matrix=t(inputPCA_matrix) #It uses the covariance matrix by default pr.res=prcomp(inputPCA_matrix) scores = as.data.frame(pr.res$x) biplot(pr.res) Many Thanks, Renan Escalante-Chong diffbind pca • 2.8k views ADD COMMENT 2 Entering edit mode Rory Stark ★ 4.6k @rory-stark-5741 Last seen 14 days ago CRUK, Cambridge, UK Hi Renan- You're on the right track, except for a few items: 1. By default when you call dba.plotPCA(), the parameter bLog=TRUE, which means the log2() values of the read counts are used. Setting bLog=FALSE will use the raw read counts. Logs are default because it is less likely to identify a single outlier, with all the other sample bunched together, than with unlogged data. 2. dba.plotPCA() uses princomp(), not prcomp(). This shouldn't make a noticeable difference however (indeed, I'm likely to change it to prcomp() in the next release). 3. dba.plotPCA() plots the raw loadings, equivalent to pr.res$rotation[,1:2] (actually pr.res$loadings[,1:2] as it comes back from princomp()). Hope this helps- Rory ADD COMMENT 0 Entering edit mode Hi Rory, Sorry to dig up an old post but I had a follow up question; if you transpose the matrix as Renan suggested, wouldn't pr.res$rotation[,1:2] give you the loadings of the peaks, not of the samples?

I've been trying to replicate Renan's code and I get the same PC contributions as in the diffbind PCA plot only when I use the non-transposed matrix.

- Brook

0
Entering edit mode

DiffBind doesn't transpose the matrix.

-R

0
Entering edit mode

Hi Rory,

Is there a way to change the pch values in dba.plotPCA() by some metadata column? It seems round points is hard-coded into the function and can't be changed. And if not, is there a way to store the coordinates of the points for use in some other plotting package e.g. ggplot2?

1
Entering edit mode

This is indeed hard-coded.

One option you can consider is that dba.plotPCA() silently returns a trellis object, which you can update. For example:

library(lattice)
data(tamoxifen_analysis)
pcaplot <- dba.plotPCA(tamoxifen)
update(pcaplot, pch=17, cex=.7)

0
Entering edit mode
renanec ▴ 20
@renanec-9080
Last seen 7.3 years ago
United States

Thanks! This clarifies how the PCA works in the package.

0
Entering edit mode
Rory Stark ★ 4.6k
@rory-stark-5741
Last seen 14 days ago
CRUK, Cambridge, UK

I have gone ahead and changed it to use pr.comp() in the development version.