Question

Visualizing normalization in DiffBind using MA plots: differences between DESeq2 / edgeR and full vs. effective library size

siklenkak • 0

Hi, I was reading the thread "DiffBind differential binding normalization with different levels of binding" and found that I have a similar problem to the OP, Igor. I'm also trying to understand the effects of normalization on my data. I suspect that we have a global change in treatment vs. control, so based on the logic described in the literature, bFullLibrarySize=TRUE seems appropriate; however, edgeR and DESeq2 give different results. I think this is due to differences between TMM and the DESeq2 normalization, but Rory's response to Igor made me want to visualize the differences using MA plots, as he suggested.

I see fairly large effects from the normalization method, and very large effects from the library size used, but I'm having a hard time choosing one or the other. Should I really expect such drastic differences between these normalization conditions?

Any interpretations would be really appreciated. Code and output below.

library(DiffBind)

# Effective library size (reads in peaks)
myDBA <- dba.analyze(myDBA, method=c(DBA_EDGER, DBA_DESEQ2), bReduceObjects=FALSE, bFullLibrarySize=FALSE)
# Full library size (all sequenced reads)
myDBA2 <- dba.analyze(myDBA, method=c(DBA_EDGER, DBA_DESEQ2), bReduceObjects=FALSE, bFullLibrarySize=TRUE)

# 2 x 4 grid: top row full library size, bottom row effective library size;
# within each row: edgeR raw, edgeR normalized, DESeq2 raw, DESeq2 normalized
par(mfrow=c(2,4))
# bFullLibrarySize=TRUE
dba.plotMA(myDBA2, contrast=2, method=DBA_EDGER, bNormalized=FALSE, yrange=c(-4,4))
dba.plotMA(myDBA2, contrast=2, method=DBA_EDGER, yrange=c(-4,4))
dba.plotMA(myDBA2, contrast=2, method=DBA_DESEQ2, bNormalized=FALSE, yrange=c(-4,4))
dba.plotMA(myDBA2, contrast=2, method=DBA_DESEQ2, yrange=c(-4,4))
# bFullLibrarySize=FALSE
dba.plotMA(myDBA, contrast=2, method=DBA_EDGER, bNormalized=FALSE, yrange=c(-4,4))
dba.plotMA(myDBA, contrast=2, method=DBA_EDGER, yrange=c(-4,4))
dba.plotMA(myDBA, contrast=2, method=DBA_DESEQ2, bNormalized=FALSE, yrange=c(-4,4))
dba.plotMA(myDBA, contrast=2, method=DBA_DESEQ2, yrange=c(-4,4))

diffbind edger deseq2 chip-seq • 1.6k views

Updated 7.2 years ago by Rory Stark • written 7.2 years ago by siklenkak

score 0 · Answer 1 · 2017-02-06

These plots are a bit unusual; I wonder if there is something else going on.

The non-normalized plots don't show a big shift all in one direction, which is what we'd expect if there were a big gain in one condition. There appears to be a bit of a trend towards higher binding affinity in the TG condition. Usually I expect to see the edgeR/TMM plots showing a bigger difference between the raw and normalized distributions, but here we see a big shift "down" in the DESeq2 FullLibrary normalization, which is unusual.
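For intuition on why the library size choice alone can move the whole cloud in an MA plot, here is a toy base-R sketch. It is not DiffBind's internal code, and it does not reproduce TMM or the DESeq2 size factors: all counts, the 1.5x global gain, and the equal sequencing depths are invented for illustration. It only shows that scaling by total sequenced reads keeps a global gain visible, while scaling by reads in peaks centers the log fold changes near zero.

```r
# Toy illustration with made-up numbers (not DiffBind internals):
# how the choice of library size shifts log2 fold changes.
set.seed(1)
n_sites <- 1000
base <- rgamma(n_sites, shape = 2, scale = 50)  # hypothetical per-site binding strength

# One control and one treated sample; treated has an assumed 1.5x global gain.
control <- rpois(n_sites, base)
treated <- rpois(n_sites, base * 1.5)

# Full library size: total sequenced reads (assumed equal depth here).
full_lib <- c(control = 2e7, treated = 2e7)
# Effective library size: reads that fall inside the consensus peaks.
eff_lib <- c(control = sum(control), treated = sum(treated))

# Library-size-scaled log2 fold change (treated vs control), with a small pseudocount.
log_fc <- function(lib) {
  log2((treated + 0.5) / lib[["treated"]]) - log2((control + 0.5) / lib[["control"]])
}

median_full <- median(log_fc(full_lib))  # stays near log2(1.5): global gain preserved
median_eff  <- median(log_fc(eff_lib))   # pulled toward 0: global gain normalized away
```

If the two groups were sequenced to different depths, the full-library version would shift by the log ratio of the depths as well, which is one way an unexpected "down" shift could arise.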

In this case, I wouldn't trust the FullLibrary DESeq2 results as the shift seems anomalous. Perhaps there is a bias in the depth of sequencing between the two sample groups? You can examine the full library sizes as follows:

> myDBA$class[8,]

Do the Control samples have systematically more reads than the TG samples?
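A quick way to answer that is to average the read counts per group. The sketch below uses invented read counts and sample names as a stand-in; in a real session the values would come from myDBA$class[8,] (converted with as.numeric, assuming the class matrix is stored as character) and the group labels from the sample sheet.

```r
# Hypothetical full library sizes (total reads) for three samples per group.
# Replace with as.numeric(myDBA$class[8, ]) and the real group labels.
reads <- c(C1 = 21.3e6, C2 = 19.8e6, C3 = 22.1e6,
           T1 = 14.2e6, T2 = 15.0e6, T3 = 13.7e6)
group <- factor(c("Control", "Control", "Control", "TG", "TG", "TG"))

# Mean sequencing depth per group and the Control/TG depth ratio.
group_means <- tapply(reads, group, mean)
depth_ratio <- group_means[["Control"]] / group_means[["TG"]]

# A ratio well above or below 1 suggests a systematic depth bias between groups.
print(round(group_means / 1e6, 2))
print(round(depth_ratio, 2))
```

A ratio far from 1 would mean that full-library normalization rescales one whole group relative to the other, which would be worth ruling out before trusting the shifted plot.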

I also notice that all of the analyses are very sensitive, in that they detect significant differential binding at very low fold changes. This is probably due to some combination of very low variance in the samples (the samples in each group look very similar) and/or having a lot of replicates (high power). How many replicates do you have? Are they true biological replicates or some sort of technical replicate?

The other difference I see is that the edgeR analyses are identifying more sites with relatively low binding affinity as being differentially bound (compounded by the high sensitivity). Unless you have prior reason to believe that small changes in low-affinity sites are of interest, I would lean towards the high-affinity sites identified in the straight (bFullLibrarySize=FALSE) DESeq2 analysis.

-R