Question

Low FDR for significant peaks in DiffBind output

0

Entering edit mode

Silvia • 0

@silvia-7163

Last seen 2.8 years ago

United Kingdom

Hi,

I used DiffBind for a collaborator who noticed that, contrary to his expectations, a particular region (which overlaps highly significant (pvalue << 0.001) peaks from MACS2 in each replicate of condition Y) got a very low FDR when compared to condition X (in which there are no peaks from any of the 4 replicates). Could you please help me to figure out if there's anything wrong in my code? Unfortunately, I cannot share the real data (it's still unpublished), but here's the anonymised experimental setup and the commands I used to compare conditions Y vs X, as well as Z vs X:

> samples
    SampleID  Factor Condition Replicate bamReads bamControl Peaks       PeakCaller
1  condX_repA TF     condX     1         A-TF.bam A-I.bam    A-condX.bed bed
2  condX_repB TF     condX     2         B-TF.bam B-I.bam    B-condX.bed bed
3  condX_repC TF     condX     3         C-TF.bam C-I.bam    C-condX.bed bed
4  condX_repD TF     condX     4         D-TF.bam D-I.bam    D-condX.bed bed
5  condY_repE TF     condY     1         E-TF.bam E-I.bam    E-condY.bed bed
6  condY_repF TF     condY     2         F-TF.bam F-I.bam    F-condY.bed bed
7  condY_repG TF     condY     3         G-TF.bam G-I.bam    G-condY.bed bed
8  condY_repH TF     condY     4         H-TF.bam H-I.bam    H-condY.bed bed
9  condZ_repJ TF     condZ     1         J-TF.bam J-I.bam    J-condZ.bed bed
10 condZ_repK TF     condZ     2         K-TF.bam K-I.bam    J-condZ.bed bed
11 condZ_repL TF     condZ     3         L-TF.bam L-I.bam    L-condZ.bed bed
12 condZ_repM TF     condZ     4         M-TF.bam M-I.bam    M-condZ.bed bed

> db_analysis = dba(sampleSheet="sampleInfo.csv", minOverlap=4, bCorPlot=FALSE)
> db_analysis = dba.count(db_analysis, minOverlap=4, bCorPlot=FALSE)
> db_analysis = dba.contrast(db_analysis, minMembers=4, categories=DBA_CONDITION)
> db_analysis = dba.analyze(db_analysis, bCorPlot=FALSE)
> dba.report(db_analysis, contrast=1, th=1, file="sites_Y_vs_X", ext="txt")
> dba.report(db_analysis, contrast=2, th=1, file="sites_Z_vs_X", ext="txt")

Thanks a lot for your help!

Silvia

diffbind • 1.3k views

ADD COMMENT • link 7.0 years ago Silvia • 0

score 1 · Answer 1 · 2017-04-20

Hi Silvia-

Sorry for the delay in responding -- I'm away on holiday and don't have access to a computer.

In he meantime one thing you can try is to set bCounts=TRUE in the call to dba.report() and check the read counts in the regions of interest. You may want to compare normalized and raw counts using the bNormalized parameter. This way you can see if there really is a consistent change in the read counts.

Another possibility is that there is something going on with the control tracks for these samples. If there is a pileup in the control track at these regions, this will be subtracted from the ChIP reads and may wipe out the signal. You can check this using DBA_SCORE_READS in the call to dba.count(), and turn off the subtraction when calling dba.analyze().

Hope this helps-

Rory

score 0 · Answer 2 · 2017-04-25

Hi Rory,

Sorry for the delay of my reply, I've been unwell.

Thanks a lot for your suggestions! There was indeed something weird with the read counts, but after I updated both DiffBind (1.16.3 -> 2.2.12) and systemPipeR (1.4.8 -> 1.8.1) the counts finally matched what I saw from the bam files in IGV, and the FDR for that particular region changed dramatically.

There might have been some bug in one of the two packages, but I learnt the hard way I should always keep them up-to-date :)

All the best,

Silvia