Hello. The lab I'm working with is comparing H3K4me3 binding in two different conditions of yeast cells using ChIP-seq. I am attempting to use DiffBind with the narrowPeak files generated from MACS2, along with bam files generated from mapping, to generate a set of peaks where differential binding is occuring between the two conditions for downstream analysis. However, the set of peaks I am able to generate does not seem suitable for further analysis, as peaks are not showing up in locations where we expect to see differential binding, as evidenced by visual inspection of the original peak files, and other experimental data. These types of known differential-binding points are my primary benchmark for checking to see if my data makes sense.
My hope is that someone can help explain why I might be getting results like these, and how I might tweak DiffBind's parameters to get better results, or if I'm not understanding how DiffBind is intended to work in the first place. I am new to both bioinformatics and computational genomics with little formal guidance, so I must ask for your patience. I will briefly explain the steps taken, as well as the output.
Firstly, I have a sample sheet containing the information for the peaks files and bam files containing read counts. There are two conditions, with two replicates for each:
SampleID Tissue Factor Condition Treatment Replicate bamReads ControlID bamControl Peaks PeakCaller
wtK4t1 - K4 wt - 1 E:/data/mapped_bwa/wt_t1_K4.bam wtK4t1ct E:/data/mapped_bwa/wt_t1_input.bam E:/data/peaks/wt_t1_K4_tolarge_peaks.narrowPeak bed
wtK4t2 - K4 wt - 2 E:/data/mapped_bwa/wt_t2_K4.bam wtK4t2ct E:/data/mapped_bwa/wt_t2_input.bam E:/data/peaks/wt_t2_K4_tolarge_peaks.narrowPeak bed
2koK4t1 - K4 2ko - 1 E:/data/mapped_bwa/2ko_t1_K4.bam 2koK4t1ct E:/data/mapped_bwa/2ko_t1_input.bam E:/data/peaks/2ko_t1_K4_tolarge_peaks.narrowPeak bed
2koK4t2 - K4 2ko - 2 E:/data/mapped_bwa/2ko_t2_K4.bam 2koK4t2ct E:/data/mapped_bwa/2ko_t2_input.bam E:/data/peaks/2ko_t2_K4_tolarge_peaks.narrowPeak bed
Next, I have used the sample code shown in the DiffBind vignette to get a GRanges object containing the differential peaks:
samples<-read.csv("E:/parameters/h3k4samplesheet_K4.csv")
sk1 <- dba(sampleSheet=samples)
sk1 <- dba.count(sk1, minOverlap = 2)
sk1 <- dba.contrast(sk1, categories=DBA_CONDITION, minMembers=2)
sk1 <- dba.analyze(sk1)
sk1.DB <- dba.report(sk1)
I chose minOverlap of 2 for dba.count(), but I am unsure if this means the peak must show up in both replicates for a given condition, or two out of any of the peak files. I am also unclear on how DiffBind chooses a final consensus peak when two very similar peaks have slight differences.
Finally, I have a function that outputs the sk1.DB GRanges object as a bed file file viewing in IGV, along side the original peak files. At the bottom of this post a screenshot (with some non-relevant information removed) showing the peaks at a particular location which I will call gene A. We know from other experiments that H3K4me3 presence at gene A's location is elevated in condition 2, compared to condition one. The original peak files agree, with no apparent peak in either replicate in condition one, and nearly identical, strong peaks in both replicates of condition two. To my understanding, DiffBind's output (labeled K4Diff) should show a peak in this position representing a difference in binding, but as you can see in the image, this is not the case. The same sort of thing happens in other locations we expect to have differential binding.
Again, am I using DiffBind wrong, or am I not understanding what DiffBind is intended to output? I'm unsure of how to proceed from here, so any help is appreciated. I have viewed the DiffBind manual in addition to the vignette, but I can't seem to understand which functions/parameters might be useful to me. If any more information or clarification is needed, I will do my best to provide it.