Search
Question: Low FDR for significant peaks in DiffBind output
0
gravatar for Silvia
5 weeks ago by
Silvia0
Silvia0 wrote:

Hi,

I used DiffBind for a collaborator who noticed that, contrary to his expectations, a particular region (which overlaps highly significant (pvalue << 0.001) peaks from MACS2 in each replicate of condition Y) got a very low FDR when compared to condition X (in which there are no peaks from any of the 4 replicates). Could you please help me to figure out if there's anything wrong in my code? Unfortunately, I cannot share the real data (it's still unpublished), but here's the anonymised experimental setup and the commands I used to compare conditions Y vs X, as well as Z vs X:

> samples
    SampleID  Factor Condition Replicate bamReads bamControl Peaks       PeakCaller
1  condX_repA TF     condX     1         A-TF.bam A-I.bam    A-condX.bed bed
2  condX_repB TF     condX     2         B-TF.bam B-I.bam    B-condX.bed bed
3  condX_repC TF     condX     3         C-TF.bam C-I.bam    C-condX.bed bed
4  condX_repD TF     condX     4         D-TF.bam D-I.bam    D-condX.bed bed
5  condY_repE TF     condY     1         E-TF.bam E-I.bam    E-condY.bed bed
6  condY_repF TF     condY     2         F-TF.bam F-I.bam    F-condY.bed bed
7  condY_repG TF     condY     3         G-TF.bam G-I.bam    G-condY.bed bed
8  condY_repH TF     condY     4         H-TF.bam H-I.bam    H-condY.bed bed
9  condZ_repJ TF     condZ     1         J-TF.bam J-I.bam    J-condZ.bed bed
10 condZ_repK TF     condZ     2         K-TF.bam K-I.bam    J-condZ.bed bed
11 condZ_repL TF     condZ     3         L-TF.bam L-I.bam    L-condZ.bed bed
12 condZ_repM TF     condZ     4         M-TF.bam M-I.bam    M-condZ.bed bed

> db_analysis = dba(sampleSheet="sampleInfo.csv", minOverlap=4, bCorPlot=FALSE)
> db_analysis = dba.count(db_analysis, minOverlap=4, bCorPlot=FALSE)
> db_analysis = dba.contrast(db_analysis, minMembers=4, categories=DBA_CONDITION)
> db_analysis = dba.analyze(db_analysis, bCorPlot=FALSE)
> dba.report(db_analysis, contrast=1, th=1, file="sites_Y_vs_X", ext="txt")
> dba.report(db_analysis, contrast=2, th=1, file="sites_Z_vs_X", ext="txt")

Thanks a lot for your help!

Silvia

ADD COMMENTlink modified 4 weeks ago • written 5 weeks ago by Silvia0
1
gravatar for Rory Stark
5 weeks ago by
Rory Stark1.7k
CRUK, Cambridge, UK
Rory Stark1.7k wrote:

Hi Silvia-

Sorry for the delay in responding -- I'm away on holiday and don't have access to a computer. 

In he meantime one thing you can try is to set bCounts=TRUE in the call to dba.report() and check the read counts in the regions of interest. You may want to compare normalized and raw counts using the bNormalized  parameter. This way you can see if there really is a consistent change in the read counts.

Another possibility is that there is something going on with the control tracks for these samples. If there is a pileup in the control track at these regions, this will be subtracted from the ChIP reads and may wipe out the signal.  You can check this using DBA_SCORE_READS in the call to dba.count(), and turn off the subtraction when calling dba.analyze().

Hope this helps-

Rory

ADD COMMENTlink written 5 weeks ago by Rory Stark1.7k
0
gravatar for Silvia
4 weeks ago by
Silvia0
Silvia0 wrote:

Hi Rory,

Sorry for the delay of my reply, I've been unwell.

Thanks a lot for your suggestions! There was indeed something weird with the read counts, but after I updated both DiffBind (1.16.3 -> 2.2.12) and systemPipeR (1.4.8 -> 1.8.1) the counts finally matched what I saw from the bam files in IGV, and the FDR for that particular region changed dramatically.

There might have been some bug in one of the two packages, but I learnt the hard way I should always keep them up-to-date :)

All the best,

Silvia

ADD COMMENTlink written 4 weeks ago by Silvia0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 265 users visited in the last hour