Question

Diffbind Consensus vs Differential peak sets from same count

0

Entering edit mode

rbronste ▴ 60

@rbronste-12189

Last seen 5.1 years ago

Trying to get consensus and differential set from the same dba.count and not sure the best way to parse the two. Here is my command string:

BN<- dba(sampleSheet = "Adult_AT_BN.csv", config=data.frame(AnalysisMethod="DBA_DESEQ2",th=.05),peakFormat="narrow")

BN_count<- dba.count(BN, score=DBA_SCORE_READS, fragmentSize=0)

masks<-BN_count$masks

M_BN_Veh<-masks$MALE & masks$VEHICLE

M_BN_vehEB_contrast<-dba.contrast(BN_count, group1 = M_BN_Veh, group2 = M_BN_EB)

M_BN_vehEB_analysis<-dba.analyze(M_BN_vehEB_contrast, method = DBA_DESEQ2)

M_BN_vehEB_report<-dba.report(M_BN_vehEB_analysis, method = DBA_DESEQ2, th=.01, fold=2)

So wondering how I can get differential peaks of .01 FDR larger than 5 and -5 fold change and then consensus peaks between 5 and -5 with the same FDR limits? Thanks.

diffbind chipseq • 935 views

ADD COMMENT • link updated 7.2 years ago by Rory Stark ★ 5.2k • written 7.2 years ago by rbronste ▴ 60

score 1 · Answer 1 · 2017-09-21

I'm not entirely sure what you are trying to do, but if you are just wanting to see the differentially bound peaks with FDR values lower than a threshold and with absolute fold change greater than a value, you last line should do it (in that case, peaks with FDR < 0.01 and absolute fold change > 4, as the value for the fold parameter is taken to be a log2, so fold=2 returns fold changes > 4).

What I'm not clear on is what you mean by the difference between "differential peaks" and "consensus peaks between 5 and -5 with the same FDR limits". All the peaks returned by dba.report() are consensus peaks -- the only thing that labels them as "differential" is setting some threshold on the FDR.

Two notes worth considering:

First, you can work directly on the full consensus peakset by setting th=1 in the call to dba.report(), then subsetting it. For example:

> M_BN_vehEB_report<-dba.report(M_BN_vehEB_analysis,th=1)
> M_BN_vehEB_report[M_BN_vehEB_report$FDR <.01 & abs(M_BN_vehEB_report$Fold) > 5,]

Second, it has been shown that if you threshold by fold change, the FDR values computed for the whole set are no longer correct (see McCarthy, D.J. and Smyth, G.K., 2009. Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics, 25(6)). We're hoping to add this correction to DiffBind soon, but it is worth being aware. The good news is that FDR values calculated this way will generally be lower, so using the existing ones is "conservative."

-Rory