Question

Does DiffBind correctly designate peaks with very small fold-change to be statistically significant?

0

Entering edit mode

Sam ▴ 10

@sam-21502

Last seen 1 day ago

Jerusalem

In the process of analyzing ChIP-Seq replicates with DiffBind (drosophila data, 3 replicates for each sample, choosing DESeq2 as the analysis method), peaks with very small Fold-change were found to be statistically significant.

I have plotted an MA plot; the red points are the significant ones, the black points are the non-significant ones. I have plotted the non-significant points after plotting the red points, so there was peak with a large fold that would be non-significant, it would be easily visible.

All Range

Zooming in, ylim = c(-2,2) Limiting ylim to (-2,2)

I wonder, does the statistical significace make sense? I am not used to such plots when using DESeq2 with RNA-Seq data. It is true that the number of elements to correct for multiple hypothesis is much smaller here (2000 peaks vs ~16K genes in DESeq2 usually). Nevertheless, I'd like to ask whether you agree that it's surprising/strange that lots of peaks with a very small change became significant. There are no non-significant peaks above a certain, small threshold.

<h2>Note: The values DiffBind returns, and are plotted, are shrunk fold changes.</h2>

More on the data:

The median absolute value of the fold change among the significant peaks is 1.55 and the mean is 1.93 (not in log scale). The Concentration is also not that high, the median among the significant values being 225 reads.

The peak which has the smallest fold change, and is still significant is (Fold and Conc are in log2 scale here):

seqnames     start        end         width     strand 
3L           11657351   11659235       1885      * 

Conc    Conc_Mut     Conc_WT       Fold        p.value        FDR     
8.48     8.63         8.3           0.23       0.00655      0.0376

Individual counts (not in log2 scale):

S7        S8        S9        S1         S2        S3 
404.42     391     396.29     305.76     317.14     321.53

P.S. I have posted this also on BioStars, but the answers posted there had to do with biological significance of the peaks with small fold changes. I am suspicious about the statistical significance of the peaks.

DESeq2 DiffBind • 1.2k views

ADD COMMENT • link updated 2.9 years ago by Rory Stark ★ 5.1k • written 3.3 years ago by Sam ▴ 10

score 0 · Answer 1 · 2021-01-12

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 18 hours ago

United States

In DESeq2, if you don't want small effect sizes to show up in the DE results, you can use lfcThreshold. See DESeq2 vignette (or paper).

ADD COMMENT • link 3.3 years ago Michael Love 41k

0

Entering edit mode

Looking at the individual counts, I noticed that S1-S3 are rather tightly spread around 315, and S7-9 around 395. These are indeed the different groups. So, I think that the answer is simple - the repeatability of the samples is rather high. Sorry for the hassle.

ADD REPLY • link 3.3 years ago Sam ▴ 10

0

Entering edit mode

From Version 3.3.2 onwards, DiffBind will recompute the statistics if you specify a fold threshold in dba.report(), dba.plotMA(), or dba.plotVolcano().

For DESeq2 analysis, it uses the lfcThreshold parameter in DESeq2::results() (re-shrinking fold changes), while for edgeR analysis, edgeR::glmTreat() (instead of edgeR::glmQLFTest()).

ADD REPLY • link 2.9 years ago Rory Stark ★ 5.1k