Question

DiffBind: differentially bound sites are very different between edgeR and DESeq2

dewshrs

Hi, using DESeq2 and edgeR in DiffBind gives a huge difference in the total number of differentially bound sites: around 8000 with DESeq2 versus around 700 with edgeR. From reading the vignette, I expected the numbers to be fairly similar. Also, do I need to use DBA_SCORE_TMM_READS_FULL in dba.count if I am setting bFullLibrarySize=TRUE in dba.analyze? Currently I am using DBA_SCORE_READS in dba.count with bFullLibrarySize=TRUE in dba.analyze.

written 6.6 years ago by dewshrs · last updated 6.6 years ago by Rory Stark

score 2 · Accepted Answer · 2019-07-29

Rory Stark (Cambridge, UK)

When there is a big difference like that, it is usually down to the different normalization methods.

With bFullLibrarySize=TRUE, the DESeq2 analysis performs only a basic normalization to library size, while the edgeR analysis applies a more substantive adjustment (TMM normalization). If there is a large change in one direction, the simple normalization is usually better.
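As a rough illustration of why simple library-size normalization keeps a one-directional change intact, here is a toy sketch in plain Python (not DiffBind code). The ten peak counts and the equal full library sizes are invented: binding halves at 8 peaks and is unchanged at 2. Scaling by the full library size leaves the 8 decreases at log2FC = -1. Scaling by the total reads in peaks, the alternative that ?dba.analyze describes for bFullLibrarySize=FALSE, shrinks the decreases and makes the unchanged peaks look increased.

```python
import math

def log2_fold_changes(control, treated, size_control, size_treated):
    """Per-peak log2 of (treated / size_treated) / (control / size_control)."""
    return [math.log2((t / size_treated) / (c / size_control))
            for c, t in zip(control, treated)]

# Hypothetical reads-in-peaks counts: binding halves at 8 of 10 peaks, 2 unchanged.
control = [100] * 10
treated = [50] * 8 + [100] * 2

# Hypothetical full library sizes (total reads in each BAM), equal here.
full_lib_control = 1_000_000
full_lib_treated = 1_000_000

# Scale by full library size (the bFullLibrarySize=TRUE idea).
lfc_full = log2_fold_changes(control, treated, full_lib_control, full_lib_treated)

# Scale by total reads in peaks instead.
lfc_peaks = log2_fold_changes(control, treated, sum(control), sum(treated))

for i, (a, b) in enumerate(zip(lfc_full, lfc_peaks), start=1):
    print(f"peak {i:2d}: full library {a:+.2f}, reads in peaks {b:+.2f}")
```

Because the treated sample loses reads in peaks but not in the library as a whole, only the full-library scaling reflects the global drop.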

You can look at what is going on using dba.plotMA(). Compare the plot with bNormalized=FALSE to the one with bNormalized=TRUE, for both method=DBA_DESEQ2 and method=DBA_EDGER.

Regarding the score to use in dba.count(), it doesn't matter for the analysis. The score is only used for global plots; when you run dba.analyze(), it re-does whatever normalization you've specified in the dba.analyze() parameters.

answer posted 6.6 years ago by Rory Stark

Thank you for your reply. With edgeR, setting bFullLibrarySize = TRUE or FALSE doesn't make much difference and the MA plots are almost the same. With DESeq2, however, bFullLibrarySize = TRUE greatly increases the number of sites with a negative fold change, giving around 8000; if I change it to FALSE, the number drops drastically to around 200.

reply posted 6.6 years ago by dewshrs

This is consistent with an experiment that induces a large change in binding, all in one direction. The TMM normalization used by edgeR assumes a core of relatively unchanged binding, so when most sites change it will over-normalize and pull the real changes back toward zero. In this case, you should use DESeq2 with bFullLibrarySize = TRUE. This is actually the reason we made that combination the default.
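To see how a TMM-style adjustment over-normalizes in this situation, here is a deliberately simplified sketch in plain Python: an unweighted trimmed mean of per-peak log-ratios (M-values), trimming 30% from each end, mirroring the default log-ratio trim of edgeR's calcNormFactors. Real TMM also trims on average abundance and uses precision weights, so treat this only as an illustration of the "mostly unchanged core" assumption; the counts are invented. Because 8 of the 10 toy peaks drop, the trimmed mean lands on that drop and subtracts it, so the real decreases come out near 0 and the unchanged peaks appear to go up. The result is the same whether full-library or reads-in-peaks totals are used as library sizes, which may help explain why bFullLibrarySize made little difference for edgeR in the reply above.

```python
import math

def trimmed_mean(values, trim=0.3):
    """Mean of values after dropping the lowest and highest `trim` fraction."""
    v = sorted(values)
    k = int(len(v) * trim)
    kept = v[k:len(v) - k]
    return sum(kept) / len(kept)

def tmm_like_log2fc(control, treated, lib_control, lib_treated, trim=0.3):
    """Per-peak log2 fold changes after a simplified TMM-style adjustment.

    Simplification (assumption): unweighted, trimming on M-values only.
    """
    m = [math.log2((t / lib_treated) / (c / lib_control))
         for c, t in zip(control, treated)]
    # The trimmed mean is treated as the "no change" level and subtracted.
    offset = trimmed_mean(m, trim)
    return [x - offset for x in m]

# Hypothetical counts: binding halves at 8 of 10 peaks, 2 peaks unchanged.
control = [100] * 10
treated = [50] * 8 + [100] * 2

# Hypothetical equal full library sizes vs. reads-in-peaks totals.
lfc_full = tmm_like_log2fc(control, treated, 1_000_000, 1_000_000)
lfc_peaks = tmm_like_log2fc(control, treated, sum(control), sum(treated))

print([round(x, 2) for x in lfc_full])
print([round(x, 2) for x in lfc_peaks])
```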

In future I hope to make the normalization more transparent and better separated from the analysis method.

reply posted 6.6 years ago by Rory Stark

What I see in ?dba.analyze is:

bFullLibrarySize logical indicating if the full library size (total number of reads in BAM/SAM/BED file) for each sample is used for scaling normalization. If FALSE, the total number of reads present in the peaks for each sample is used (generally preferable if overall binding levels are expected to be similar between samples).

So does this mean that if it is TRUE, the read counts from the BAM files are used and these libSizes are passed to sizeFactors(), so DESeq2 does not call estimateSizeFactors()? And if it is FALSE, the read counts in the peak set are used, and DESeq2's estimateSizeFactors() estimates the size factors from these per-sample peak counts?

reply posted 5.8 years ago by Guandong Shang

Correct: the factors are only estimated by DESeq2 if bFullLibrarySize=FALSE; otherwise they are based directly on the libSizes.

reply posted 5.8 years ago by Rory Stark
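The two routes confirmed in the last reply can be compared in a small plain-Python sketch. The hypothetical median_of_ratios() follows the median-of-ratios idea behind DESeq2's estimateSizeFactors() (a geometric mean per peak, then each sample's median ratio to it), and the hypothetical library_size_factors() makes size factors proportional to the full library sizes. Scaling those to a geometric mean of 1 is an assumption, since only their ratio affects fold changes and the exact scaling DiffBind applies is not shown in this thread. The counts are invented: binding halves at 8 of 10 peaks. The estimated factors absorb that global decrease, while the library-size factors keep it.

```python
import math
from statistics import median

def median_of_ratios(samples):
    """DESeq-style size factors: per-sample median of count / per-peak geometric mean.

    Peaks with a zero count in any sample are skipped (their geometric mean is 0).
    """
    usable = [i for i in range(len(samples[0])) if all(s[i] > 0 for s in samples)]
    geo = {i: math.exp(sum(math.log(s[i]) for s in samples) / len(samples))
           for i in usable}
    return [median([s[i] / geo[i] for i in usable]) for s in samples]

def library_size_factors(lib_sizes):
    """Size factors proportional to library size, scaled to a geometric mean of 1."""
    g = math.exp(sum(math.log(x) for x in lib_sizes) / len(lib_sizes))
    return [x / g for x in lib_sizes]

# Hypothetical reads-in-peaks counts and full library sizes.
control = [100] * 10
treated = [50] * 8 + [100] * 2
full_lib_sizes = [1_000_000, 1_000_000]

sf_est = median_of_ratios([control, treated])
sf_lib = library_size_factors(full_lib_sizes)

for label, sf in [("estimated (FALSE)", sf_est), ("libSizes (TRUE)", sf_lib)]:
    lfc = [math.log2((t / sf[1]) / (c / sf[0])) for c, t in zip(control, treated)]
    print(f"{label:18s} size factors {sf[0]:.3f}, {sf[1]:.3f}; "
          f"peak 1 log2FC {lfc[0]:+.2f}, peak 10 log2FC {lfc[9]:+.2f}")
```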