Question

Getting different differentially accessible peaks between versions

0

Entering edit mode

jandrade • 0

@79c8f005

Last seen 27 days ago

United States

Hi I'm rerunning some analysis for ATAC since I've been having trouble validating some results. I have two files I created in March of 2022 using Diffbind I installed through conda, an analyzed DBA object and a report-based DBA object where I set fold=1 and th=0.01

When I load these into an R session with the most up-to-date versions of packages, I see this message:

Warning: Loading DBA object from a previous version -- updating...

and recreating the report using the same code:

> dba.report(data, contrast=c(2:16), bDB=TRUE, fold=1, th=0.01)

gives me a completely different result. For example a specific contrast I'm focused on (14) goes from having 6294 regions to just 254 which is concerning.

I do remember seeing a post that talked about how FDR values are adjusted when you set a minimum fold change, and if I run the report without the fold change and manually filter by minimum fold change after, I can retrieve the peaks from the old report.

#oldReport is my previously generated report-based DBA object and 14 is the contrast I'm interested in
> oldReport
15 Samples, 70861 sites in matrix:
          Contrast Direction DB Method Intervals
1  26hpf vs. 42hpf       All DB DESeq2     13787
2  26hpf vs. 72hpf       All DB DESeq2     34827
3   26hpf vs. 7dpf       All DB DESeq2     51549
4   26hpf vs. 1hpb       All DB DESeq2     50775
5   26hpf vs. 3dpb       All DB DESeq2     44556
6  42hpf vs. 72hpf       All DB DESeq2      9819
7   42hpf vs. 7dpf       All DB DESeq2     35128
8   42hpf vs. 1hpb       All DB DESeq2     33729
9   42hpf vs. 3dpb       All DB DESeq2     25904
10  72hpf vs. 7dpf       All DB DESeq2     13438
11  72hpf vs. 1hpb       All DB DESeq2     11422
12  72hpf vs. 3dpb       All DB DESeq2     12027
13   7dpf vs. 1hpb       All DB DESeq2       529
14   7dpf vs. 3dpb       All DB DESeq2      6294
15   3dpb vs. 1hpb       All DB DESeq2      6614

> newReport1 <- dba.report(data, contrast=c(2:16), bDB=TRUE, fold=1, th=0.01)
Generating report-based DBA object...
> newReport1
15 Samples, 40515 sites in matrix:
          Contrast Direction DB Method Intervals
1  26hpf vs. 42hpf       All DB DESeq2      4339
2  26hpf vs. 72hpf       All DB DESeq2     16567
3   26hpf vs. 7dpf       All DB DESeq2     30195
4   26hpf vs. 1hpb       All DB DESeq2     31028
5   26hpf vs. 3dpb       All DB DESeq2     21324
6  42hpf vs. 72hpf       All DB DESeq2      3161
7   42hpf vs. 7dpf       All DB DESeq2     14866
8   42hpf vs. 1hpb       All DB DESeq2     15544
9   42hpf vs. 3dpb       All DB DESeq2      9025
10  72hpf vs. 7dpf       All DB DESeq2      2486
11  72hpf vs. 1hpb       All DB DESeq2      2527
12  72hpf vs. 3dpb       All DB DESeq2      2301
13   7dpf vs. 1hpb       All DB DESeq2        15
14   7dpf vs. 3dpb       All DB DESeq2       254
15   3dpb vs. 1hpb       All DB DESeq2       269

#contrast 15 here would be 14 for the above reports
> newReport2 <- dba.report(data, contrast = 15, th=0.01)
> sum(abs(newReport2$Fold)>=1)
[1] 6294

I'm unsure why this is happening and which result are correct. I also don't know how to filter for fold-change in a report-based DBA object so I'm pretty sure I didn't originally use the same method as with newReport2. Any help would be appreciated!

dba.report DiffBind • 764 views

ADD COMMENT • link written 11 months ago by jandrade • 0

score 1 · Answer 1 · 2023-05-24

Looking back over the help pages for DiffBind, at some point the documentation for the fold argument changed from saying that the fold change was taken into account, to saying that the statistics were recomputed using either the treat function from edgeR, or whatever method DESeq2 uses to incorporate the fold change into the statistic.

It's not clear to me if that was always the case, and the documentation changed, or if there was an underlying change to the code as well. But anyway.

You should never use a post hoc fold change! In the case of FDR, it totally invalidates the meaning of FDR, which is an estimate of the maximum false positives in a set of significant results. If you change the number of significant results, then the FDR is meaningless, because it was based on the set you had prior to filtering, not the new set.

Post hoc filtering invalidates the unadjusted p-values as well. The p-value isn't actually based on the data you have in hand, but instead is based on the null distribution inferred from your data. You use the data in hand to estimate the null distribution, and then infer what proportion of the observations in the null distribution exceed the observed value. The sample values only enter that calculation when estimating the null distribution, and then when comparing the observed fold change to the values from the null.

But the null distribution in this case includes all likely fold changes, of which those near zero are the most likely. As I mentioned already, the p-value is the proportion of expected observations under the null, that exceed your observed fold change. But if you use a post hoc fold change, then the proportion of expected observations now no longer includes any portion of the null distribution between 0 and the absolute value of your post hoc fold change. The p-value was based on the entirety of the null distribution, but with the post hoc filter you have now said you don't want to include some huge proportion of the likely values, in which case the proportion of those values that your observed fold change exceeds has been truncated.

That was a long way of saying that your current results are correct, and what you had before was probably anti-conservative.

score 1 · Answer 2 · 2023-05-24

1

Entering edit mode

Rory Stark ★ 5.1k

@rory-stark-5741

Last seen 7 days ago

Cambridge, UK

Indeed, starting with version DiffBind_3.4.0, the FDR values are all recomputed if the fold is not zero (it is a log2 value). Prior to that, it was (incorrectly) applying a fold threshold without recomputing the FDR values. The " new" values should be more sound.

ADD COMMENT • link 11 months ago Rory Stark ★ 5.1k