Question

DiffBind: dba.report reports odd FDR values

1

Entering edit mode

biotom ▴ 10

@biotom-7371

Last seen 7.8 years ago

Germany

Dear all,

I am analysing ChIP-Seq data and used DiffBind to compare 3 wildtype vs 3 mutant samples. When calling dba.report(myDBAObject, th=1, bUsePval=TRUE) I receive odd FDR values, i.e. it seems they do not match the p-values since there are several peaks with quite low p-values having higher FDR-values than peaks with higher p-values.

When removing the bUsePval=TRUE option the FDR values seems to be OK again but the resulting list contains about four times less peaks than the initial one.

Thank you for your help.

Kind regards,

Tom

diffbind • 2.1k views

ADD COMMENT • link updated 7.8 years ago by Rory Stark ★ 5.2k • written 7.8 years ago by biotom ▴ 10

score 1 · Answer 1 · 2016-07-27

Hi Tom-

Two things are going on here.

The interesting one concerns how DESeq2 computes multiple-testing adjusted confidence scores (called FDR in DiffBind). An "independent filter" is applied (using the genefilter package) such that only a subset of intervals will be eligible for multiple testing correction. The remainder of intervals will not have an adjusted p-value calculated (reporting a value of NA). In DiffBind, the NA values are converted to FDR values of 1.0 for consistency with edgeR (which uses a more standard multiple testing correction on all the p-values). I refer you to the man page for the results() function in DESeq2, with the relevant passage as follows:

"By default, independent filtering is performed to select a set of genes for multiple test correction which maximizes the number of adjusted p-values less than a given critical value alpha (by default 0.1). See the reference in this man page for details on independent filtering. The filter used for maximizing the number of rejections is the mean of normalized counts for all samples in the dataset. Several arguments from the filtered_p function of the genefilter package (used within the results function) are provided here to control the independent filtering behavior. In DESeq2 version >= 1.10, the threshold that is chosen is the lowest quantile of the filter for which the number of rejections is close to the peak of a curve fit to the number of rejections over the filter quantiles. 'Close to' is defined as within 1 residual standard deviation. The adjusted p-values for the genes which do not pass the filter threshold are set to NA."

In practice, DESeq2 can exclude sites with low p-values from being assigned an adjusted p-value while reporting one for sites with higher p-values. If you don't want to use DESeq2's filtering, you can apply a standard multiple testing correction such as p.adjust() to the full set of p-values.

The second issue is an inconsistency with how thresholds are treated in DiffBind. I have checked in a fix (v2.0.3) to address this, so that sites will always be considered significantly differentially bound if their confidence score is less than or equal to the threshold (rather than only less than in certain cases). After this fix, you should get the same number of results (all the sites in the binding matrix) from dba.report() when th=1, regardless of how bUsePval is set.