Question

DiffBind's MA plots for ChIP-seq - how to properly interpret normalization from them?

melnuesch ▴ 10

1) Doubt about DiffBind's MA plots in general. Hello, I am doing ChIP-seq analysis of H3K4me3 in human samples across different conditions. My pipeline uses DiffBind to make volcano plots, MA plots, PCA plots and heatmaps. The DiffBind manual says that MA plots are a good way to see the effect of normalization and gives the following example (please see the picture below). The problem is that, when I consult other people, they say I should expect the blue density plot to be symmetrical around the X axis when my data is properly normalized. The image in the manual does not look symmetrical to me. I was therefore wondering whether the DiffBind variant of an MA plot is made differently and requires a slightly different interpretation. What is your view on this?

2) This problem applied to my data. I expect that the blue density plot does not always have to be perfectly symmetrical around the X axis, but where should we "draw the line"? I am uploading one of my MA plots to illustrate this. Across my conditions (let's say incremental conditions A, B and C, each compared against condition H, the healthy control), the A vs H plot is symmetrical and B vs H is as well, but C vs H is less so (please see the second picture below, the one with many pink points). To my eyes, this is a nice result showing a transition in my biological process (because of the loss of H3K4me3 binding that you can see), but I am afraid that, because the blue density plot is shifted so far down, the result is not actually legitimate and instead reflects a normalization problem or something similar. I asked on Biostars and some people suggested doing the analysis in csaw and comparing MA plots. I am doing that, but I wanted to ask here about the inner workings of DiffBind anyway.

3) In the second panel with the three pictures, in A vs H you can see that the majority of the sites are symmetrical around the X axis except for a blob in the upper left quadrant. What is happening there?

Example MAplot from the DiffBind guide

diffbind R MAplot Chipseq • 2.5k views

Updated 4.9 years ago by Rory Stark • written 4.9 years ago by melnuesch

score 2 · Accepted Answer · 2019-05-30

I'm not sure who is saying that you "should expect that the blue density plot is symmetrical around the X axis (when my data is properly normalized)". While this tends to be true for RNA-seq, in ChIP-seq there can be radical shifts in binding levels between conditions (depending on your treatments). Indeed, it is a dangerous error in processing ChIP-seq data to force normalization to center the distribution around zero fold change. Consider the example in Guertin et al. 2018 (NAR), where they ChIP the estrogen receptor (ER). They use a treatment that degrades ER and eliminates most ER binding. When using normalization methods that assume the data should be distributed around zero, they show that most of this massive biological effect is eliminated, which leads to a clearly false conclusion.
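To make that point concrete, here is a minimal toy simulation. It is not DiffBind code and not the analysis from the paper: the site counts, effect size and noise level are invented for illustration, and "centering" here simply means subtracting the median log2 fold change, as a stand-in for any method that assumes most sites do not change.

```python
import random
import statistics

random.seed(0)

# Hypothetical toy data (all numbers are assumptions for illustration):
# 1000 sites, 800 of which truly lose binding about 4-fold (log2 FC near -2),
# while the remaining 200 are truly unchanged (log2 FC near 0).
true_m = [-2.0] * 800 + [0.0] * 200
observed_m = [m + random.gauss(0, 0.3) for m in true_m]

# A normalization that assumes "most sites do not change" effectively
# re-centers the fold changes on zero. Here: subtract the median.
median_m = statistics.median(observed_m)
centred_m = [m - median_m for m in observed_m]


def fraction_below(values, cutoff):
    """Fraction of sites with log2 fold change below the cutoff."""
    return sum(v < cutoff for v in values) / len(values)


def fraction_above(values, cutoff):
    """Fraction of sites with log2 fold change above the cutoff."""
    return sum(v > cutoff for v in values) / len(values)


lost_before = fraction_below(observed_m, -1.0)
lost_after = fraction_below(centred_m, -1.0)
gained_after = fraction_above(centred_m, 1.0)

print(f"sites with >2-fold loss, uncentered: {lost_before:.2f}")
print(f"sites with >2-fold loss, centered:   {lost_after:.2f}")
print(f"sites with >2-fold gain, centered:   {gained_after:.2f}")
```

In this toy setting the real, widespread loss almost disappears after centering, and the unchanged sites show up as an apparent gain instead. That is the kind of false conclusion described above.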

In your example, the CvsH contrast shows a general loss of binding, which is what you expect. Normalization would not have exaggerated this effect; if anything it would have damped it a bit. You can see the impact of normalization more clearly by running dba.plotMA() side-by-side with bNormalized=TRUE and bNormalized=FALSE. (You can also set th=1 to eliminate the red dots and see the blue ones more clearly.) You shouldn't see too much of a shift between the normalized and non-normalized plots; in the CvsH contrast, you should see that the raw read counts are more concentrated in the control (H) condition.

You can also compare normalization methods by running dba.analyze() with method=c(DBA_EDGER, DBA_DESEQ2), and running it again with bFullLibrarySize=FALSE as an alternative comparison. The biggest shift towards the zero line should occur with the DBA_EDGER/bFullLibrarySize=FALSE run, and the smallest adjustment should be in the (default) DBA_DESEQ2/bFullLibrarySize=TRUE analysis, which only normalizes using the sequencing depth of each library (the number of reads in the associated BAM file).
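As a rough illustration of why the choice of library size matters, here is a small arithmetic sketch. It uses plain proportional scaling, which is an assumption made for simplicity and not what edgeR or DESeq2 actually compute internally, and all read counts are made-up numbers.

```python
import math


def log2_fold_change(count_c, count_h, size_c, size_h):
    """log2(C/H) after dividing each count by its library size."""
    return math.log2((count_c / size_c) / (count_h / size_h))


# Hypothetical numbers: both samples are sequenced to the same depth,
# but condition C has lost most of its binding, so far fewer of its
# reads fall inside peaks.
total_reads = {"C": 10_000_000, "H": 10_000_000}
reads_in_peaks = {"C": 500_000, "H": 2_000_000}

# One site that follows the global trend: a 4-fold loss in C.
site_counts = {"C": 50, "H": 200}

# Scaling by full library size (sequencing depth only).
fc_full = log2_fold_change(
    site_counts["C"], site_counts["H"], total_reads["C"], total_reads["H"]
)

# Scaling by reads in peaks, which absorbs the global loss.
fc_peaks = log2_fold_change(
    site_counts["C"], site_counts["H"], reads_in_peaks["C"], reads_in_peaks["H"]
)

print(f"log2 FC, full library size: {fc_full:.2f}")
print(f"log2 FC, reads in peaks:    {fc_peaks:.2f}")
```

With depth-only scaling the site keeps its 4-fold loss (log2 fold change of -2), while scaling by reads in peaks pulls it to zero, because the global loss of binding has been folded into the scaling factor.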

Finally, to get more confidence in the results, you can look at the raw and normalized read counts returned by dba.report() by setting bCounts=TRUE and comparing bNormalized=TRUE and bNormalized=FALSE. You can also examine individual sites in a genome browser with all the replicate read tracks loaded.

Bottom line: your analysis looks correct!