From DiffBind's perspective, the main difference between edgeR and DESeq2 is the way they normalize the data. DiffBind uses edgeR's TMM normalization method in a way that assumes most of the sites are not differentially bound. If this assumption is violated (for example, in experiments where we knock down the factor we are ChIPing), results normalized using method=DBA_EDGER can be quite incorrect. For this reason, the default in the development version of DiffBind has been changed to method=DBA_DESEQ2, at least until we can add some more advanced normalization features.
In your case, if one of your conditions is hypo-methylated and the other hyper-methylated, the normalization may be doing the wrong thing. It is worth checking whether this is what is going on. One way to do this is to use dba.plotMA() to make three plots and compare them:
> par(mfrow=c(3,1))
> dba.plotMA(myDBA, method=DBA_EDGER, bNormalized=FALSE)
> dba.plotMA(myDBA, method=DBA_EDGER, bNormalized=TRUE)
> dba.plotMA(myDBA, method=DBA_DESEQ2, bNormalized=TRUE)
Comparing the non-normalized data to the versions normalized using edgeR and DESeq2 may shed light on what is going on.
Besides that, a ~15% difference is actually pretty decent agreement. After all, the significance threshold is somewhat arbitrary (the default FDR threshold is changing from 0.10 to 0.05 in future releases). It may be interesting to see how well the different methods agree on the specific sites they identify. You can get an idea of how consistent they are by plotting a Venn diagram:
> dbsites <- dba.plotVenn(myDBA,contrast=1,method=DBA_ALL_METHODS)
Note that the three peaksets in the Venn diagram (peaks identified only by edgeR, only by DESeq2, and by both methods) are returned in dbsites.
-Rory
Hi Rory,
Thanks for your suggestion about using GreyListChIP in handling the false positive signal. I'm now trying this approach.
I have some questions about applying the greylist.
If doing the filtering before peak calling:
1) How to do the filtering?
2) Is this step compatible with MACS2? Or is there no need to apply the greylist when using MACS2, since MACS2 considers the input signal during peak calling?
If doing the filtering after peak calling:
1) What is the procedure for preparing the files (.bed and .bam) for DiffBind? Should I just remove the peaks that overlap the greylist from the BED file, or do I also need to do something with the BAM files, since DiffBind mainly uses the BAM files for its calculations?
Thanks a lot!
Yours Sincerely,
Kylie
Hi,
I recommend filtering before peak calling. Peak callers such as MACS2 do use the input to filter out peaks that are also present in the input, but they do not cope well with regions where the input is particularly noisy, and they tend to call a lot of spurious peaks there. (That's sort of the whole reason for GreyListChIP.) A tool such as bedtools (https://github.com/arq5x/bedtools2) can be used to do the filtering.
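As a rough sketch of what filtering before peak calling might look like with bedtools: `bedtools intersect -v` keeps only the alignments in the BAM that do not overlap any interval in the greylist, so the greylisted regions never reach the peak caller or DiffBind's read counting. This assumes the greylist has been saved as a BED file and that the BAMs are coordinate-sorted; the function name filter_greylist, the file names, and the use of samtools for indexing are all hypothetical choices for illustration, not part of GreyListChIP or DiffBind.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop reads that fall in greylisted regions.
# Assumes bedtools and samtools are on the PATH, the input BAM is
# coordinate-sorted, and the greylist is a BED file.
filter_greylist() {
  if [ "$#" -ne 3 ]; then
    echo "usage: filter_greylist <in.bam> <greylist.bed> <out.bam>" >&2
    return 1
  fi
  local in_bam="$1" greylist_bed="$2" out_bam="$3"
  # -abam: input A is a BAM file; output is written as BAM by default.
  # -v: report only alignments in A with NO overlap in the greylist (-b).
  bedtools intersect -v -abam "$in_bam" -b "$greylist_bed" > "$out_bam" &&
    samtools index "$out_bam"   # filtering keeps sort order, so indexing works
}

# Example usage (hypothetical file names):
# filter_greylist chip_rep1.bam greylist.bed chip_rep1.grey.bam
# filter_greylist input.bam greylist.bed input.grey.bam
```

The filtered BAMs would then be used both for peak calling with MACS2 and as the bamReads/bamControl files in the DiffBind sample sheet, so that peaks and read counts come from the same filtered data.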