Question

Diffbind: dba.plot retrieve variance explained

ZheFrench ▴ 40

@zhefrench-11689

France

Is there a way to retrieve the percentages of variance explained by PC1 and PC2 from dba.plotPCA in DiffBind or ChIPQC?

In my PCA plot, the PC1 and PC2 values seem to be swapped. I'd like a way to recompute them from scratch (as asked in "What does dba.PCA in DiffBind exactly do?") or to retrieve them, so I can check whether they are sorted incorrectly. The plots look odd for several datasets I have processed.

I have a ChIPQC experiment already saved that I can load to play with.

diffbind chipqc • 1.8k views

ADD COMMENT • link updated 6.6 years ago by Rory Stark ★ 5.2k • written 6.7 years ago by ZheFrench ▴ 40

Rory Stark ★ 5.2k

@rory-stark-5741

Cambridge, UK

There's no straightforward way to retrieve the raw PCA results from a DiffBind object. If you have a DiffBind analysis object that seems to be giving the wrong PC order, send me a copy of the object and I'll see if I can reproduce it and track it down.
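In the meantime, one way to sanity-check the ordering is to recompute a PCA yourself on a peaks-by-samples score matrix. The sketch below uses base R `prcomp` on simulated data; it is not necessarily the same computation that dba.plotPCA performs internally, so the percentages may differ from those on the DiffBind plot. The commented-out `dba.peakset(..., bRetrieve = TRUE)` lines and the object name `myDBA` are assumptions about how you might pull scores out of your own object.

```r
# Simulated stand-in for a peaks-by-samples score matrix (hypothetical data).
set.seed(1)
scores <- matrix(rexp(200 * 6, rate = 0.1), nrow = 200,
                 dimnames = list(NULL, paste0("sample", 1:6)))

# With a real DBA object, something like this may give a comparable matrix
# (assumption: the first three columns are chromosome, start, end):
# df <- dba.peakset(myDBA, bRetrieve = TRUE, DataType = DBA_DATA_FRAME)
# scores <- as.matrix(df[, -(1:3)])

# Log-transform, then run PCA with samples as rows.
log_scores <- log2(scores + 1)
pca <- prcomp(t(log_scores), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
pct <- round(100 * var_explained, 1)
names(pct) <- paste0("PC", seq_along(pct))
print(pct)

# TRUE if the components are sorted by decreasing variance.
ordered_ok <- !is.unsorted(rev(var_explained))
print(ordered_ok)
```

`prcomp` always returns components sorted by decreasing variance, so if PC1 here explains less than PC2 on your plot, the discrepancy is in the plotting or labelling rather than in the decomposition.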

There was a bug in an earlier version of DiffBind where the order of the first two principal components was reversed, but only in 3D plots. Are you using 3D plots? What version of DiffBind are you on?

-R

ADD COMMENT • link 6.7 years ago Rory Stark ★ 5.2k

DiffBind_2.2.12. It's not a 3D plot. I am sending you the object.

ADD REPLY • link 6.6 years ago ZheFrench ▴ 40

Can you email me the DBA (or ChIPQC) object and the code you are using to generate the PCA?

Often, the property that is expected to be the main source of variance (and hence be reflected on PC1) is superseded by some other source of variance, like a batch effect or a "latent" variable. Seeing what you expect to be your primary source of variance in the second component is an indication that this has occurred.
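As a rough illustration of this effect, here is a small simulation in base R (all data and names are made up) in which a batch effect is deliberately made larger than the condition effect. It then measures how strongly each of the first two components is associated with each factor, using the R-squared of a simple linear model as the measure.

```r
set.seed(42)
n_peaks <- 500

# Balanced design: 8 samples, 2 conditions crossed with 2 batches (hypothetical).
condition <- factor(rep(c("A", "B"), each = 4))
batch     <- factor(rep(c("b1", "b2"), times = 4))

# Per-peak effects: the batch effect is made larger than the condition effect.
cond_effect  <- rnorm(n_peaks, sd = 1)
batch_effect <- rnorm(n_peaks, sd = 3)

# Peaks-by-samples matrix: noise + condition effect + batch effect.
x <- matrix(rnorm(n_peaks * 8, sd = 0.5), nrow = n_peaks)
x <- x +
  outer(cond_effect,  as.numeric(condition == "B")) +
  outer(batch_effect, as.numeric(batch == "b2"))

pca <- prcomp(t(x))

# R-squared of each PC's sample coordinates against each factor.
r2 <- function(pc, f) summary(lm(pc ~ f))$r.squared
assoc <- sapply(1:2, function(i) {
  c(condition = r2(pca$x[, i], condition),
    batch     = r2(pca$x[, i], batch))
})
colnames(assoc) <- c("PC1", "PC2")
print(round(assoc, 2))
```

In this setup PC1 tracks the batch and PC2 tracks the condition, even though the components are correctly ordered by variance. Running the same kind of association check against your own sample metadata may help identify what PC1 is actually capturing.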

ADD REPLY • link 6.6 years ago Rory Stark ★ 5.2k

OK I see that you did send it, I'll reply below.

ADD REPLY • link 6.6 years ago Rory Stark ★ 5.2k

score 2 · Accepted Answer · 2017-09-12

Looking at your PCA, I see no evidence that the components are in the wrong order. The first component is capturing most of the biological variance you expect; however, there is one sample (red in the plot you sent me) where the two replicates have greater variance between them than the sample groups have between each other. The high variance between these two replicates indicates that something unexpected is going on in the experiment. This is likely something technical in the ChIP (or in peak calling, if this is a peak score plot) and is worth trying to get to the bottom of. You may want to look at these two ChIPs in a browser and compare the peaks called for each of them (e.g. using Venn diagrams). If this is a peak score plot, it would be interesting to compare it to a read score plot: if the replicates show less variance using read scores, that indicates the issue is in peak calling, which is less of a concern.
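One simple way to put a number on "replicates differ more than groups do" is to compare within-group and between-group sample correlations. The sketch below does this in base R on simulated scores in which one replicate of a hypothetical group `g3` is made deliberately noisy. The data, group names, and the flagging rule are illustrative assumptions, not something DiffBind provides.

```r
set.seed(7)
n_peaks <- 300
groups  <- rep(c("g1", "g2", "g3"), each = 2)

# Signal shared by all groups plus a smaller group-specific component.
shared <- rnorm(n_peaks, mean = 8, sd = 2)
base   <- sapply(unique(groups), function(g) shared + rnorm(n_peaks, sd = 0.5))

# Two replicates per group with a little replicate-level noise.
x <- base[, groups] + matrix(rnorm(n_peaks * 6, sd = 0.3), nrow = n_peaks)
colnames(x) <- paste0(groups, "_rep", rep(1:2, times = 3))

# Make one replicate of g3 deviate strongly (simulated technical problem).
x[, "g3_rep2"] <- x[, "g3_rep2"] + rnorm(n_peaks, sd = 4)

cors <- cor(x)

# Correlation between the two replicates of each group.
within <- sapply(unique(groups), function(g) {
  idx <- which(groups == g)
  cors[idx[1], idx[2]]
})

# Mean correlation between samples from different groups.
between <- mean(cors[outer(groups, groups, "!=")])

# Flag groups whose replicates agree less than unrelated samples do.
flagged <- names(within)[within < between]
print(round(within, 2))
print(round(between, 2))
print(flagged)
```

Running the same comparison once on peak-based scores and once on read-based scores would show whether the disagreement between the two replicates shrinks when read scores are used.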

-Rory