How to reproduce the PCA plot from DiffBind
2
0
Entering edit mode
r.chereji • 0
@rchereji-8534
Last seen 6.7 years ago
United States

Hi,

I'm trying to understand what exactly is done by the dba.plotPCA function from DiffBind. I tried to reconstruct the plot generated by dba.plotPCA but for some reason I get a different plot. Could you please check what I tested and tell me what is done differently by the dba.plotPCA function?

Thanks!

Razvan

 

Here is what I've tried (I used DiffBind_1.14.6):

library(DiffBind)
data(tamoxifen_peaks)
dba.plotPCA(tamoxifen)

# Trying to reproduce the plot
counts = dba.peakset(tamoxifen, bRetrieve=TRUE)
counts = as.data.frame(mcols(counts))
counts = data.matrix(counts)

pca = princomp(counts, cor = FALSE)
loadings = unclass(pca$loadings)

dat = data.frame(Tissue = tamoxifen$samples$Tissue,
                  PC1 = loadings[,1],
                  PC2 = loadings[,2])
library(ggplot2)
p = ggplot(dat, aes(PC1,PC2))
p + geom_point(size=5,aes(colour = Tissue)) + theme_bw()

# The 2 plot are different...

diffbind • 2.7k views
ADD COMMENT
1
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 14 days ago
Cambridge, UK

Hi Razvan-

Now that I'm back at my desk, I see that you are plotting the peak scores, not read scores, which explains why there are -1 values. and why you are having problems reproducing the plot.

It turns out that the PCA plotting code assumes the data are based on read counts, so negative scores (which are possible when subtracting control reads) are set to a minimum value of 1. When plotting peak scores, which are between -1 and +1, this actually does the wrong thing, setting the minimal scores to be the maximum score. 

The dba.plotHeatmap() function already handles this by not allowing logs to be taken on peak scores. I am checking in a change to make dba.plotPCA() do the same thing for consistency.

In the meantime, I advise running dba.plotPCA() with bLog=FALSE.

Cheers-

Rory

 

ADD COMMENT
0
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 14 days ago
Cambridge, UK

Hi Razvan-

Someone asked a similar question recently, perhaps the answer is there:

What does dba.PCA in DiffBind exactly do?

It's probably that by default the read scores are log2()-ed.

-Rory

 

 

ADD COMMENT
0
Entering edit mode

Hi Rory,

 

Thanks for the link, but some of the counts given by "counts = dba.peakset(tamoxifen, bRetrieve=TRUE)" are "-1" so I can't take logarithm of that... Can you please check once more my example and tell me what is done differently by DiffBind?

Thanks,

Razvan

 

ADD REPLY

Login before adding your answer.

Traffic: 940 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6