Hi everyone, I wanted to run my open chromatin mapping workflow with DiffBind by you and see if there is room for improvement. I have 3-4 replicates per experiment, and the vast majority of the underlying peaks should be shared between groups. The sample sheet just lists the BAMs and the narrowPeak files from the macs2 output, and I'm using CONDITION as the contrast. The data come from PE 75 NextSeq high-output runs.
Running the following:
samples <- read.csv(file.path(system.file("extra", package = "DiffBind"), "RPMG_DNAse.csv"))
RPMG <- dba(minOverlap = 2, sampleSheet = "RPMG_DNAse.csv", peakCaller = "macs", peakFormat = "narrow", config = data.frame(AnalysisMethod = DBA_EDGER, fragmentSize = 151))
RPMG <- dba.count(RPMG, summits = 250)
RPMG <- dba.contrast(RPMG, categories = DBA_CONDITION)
RPMG <- dba.analyze(RPMG)
RPMG.DB <- dba.report(RPMG)
View(RPMG.DB)
I noticed that I get slightly more peaks using edgeR than DESeq2, but still very few differential peaks given a consensus peak set of over 90K. Wondering what everyone's thoughts are, thanks!
Hi,
You might try again without setting summits in dba.count. If the regions are relatively broad, setting summits=250 might only capture the middle of the region. Also, fragmentSize looks a little short... is that the actual mean fragment size? Other than that, the steps look right (though it's not useful to read the sample sheet into 'samples' then again in the call to dba).
Also, maybe update DiffBind... a bug-fix was just submitted on Friday (though it shouldn't affect your example here).
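As a rough sketch of what those suggestions could look like in code: the sample sheet path goes straight into dba(), summit re-centering is switched off, and the fragment size is supplied from a variable. The variable name mean_fragment_size and its value are placeholders only (the thread does not give the measured fragment length), and the note on the summits default is an assumption about the installed DiffBind version.

```r
library(DiffBind)

# Placeholder only: replace with the mean fragment length measured for your
# own libraries. This value is NOT taken from the thread.
mean_fragment_size <- 200

# Pass the sample sheet path directly to dba(); a separate read.csv() is not needed.
RPMG <- dba(sampleSheet = "RPMG_DNAse.csv",
            minOverlap = 2,
            peakCaller = "macs", peakFormat = "narrow",
            config = data.frame(AnalysisMethod = DBA_EDGER,
                                fragmentSize = mean_fragment_size))

# Count reads over the full consensus peaks rather than a window around the summit.
# Assumption: in DiffBind 3.0 and later, summits defaults to 200 and has to be
# disabled explicitly; in older releases, leave the argument out instead.
RPMG <- dba.count(RPMG, summits = FALSE)

RPMG <- dba.contrast(RPMG, categories = DBA_CONDITION)
RPMG <- dba.analyze(RPMG)
RPMG.DB <- dba.report(RPMG)
```

Counting over whole peaks trades some positional precision for sensitivity to changes spread across a broad region, which is the concern raised above for wider open chromatin sites.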
How many peaks are you expecting, and how many are you getting differentially bound? Can you post a screenshot showing a region that seems like it ought to be called as differentially-bound, but isn't?
- Gord
Hi Gord,
Thanks for the advice. I amended the fragment size, as you were right that it was a little low, and removed the summits option. I'm still getting a small number of differential peaks, but only for one specific comparison; it may represent actual biology, but I'm just trying to look at every angle. I would expect 100-200 differential peaks but am only getting on the order of 50 or so. I will work on finding that region.
-Rob
Is there anything in particular you would do throughout this analysis, keeping in mind that it's not ChIP-seq but DNase-seq?
Thanks.
I'm not specifically experienced with DNase-seq. The criteria that are likely to matter are those I've already mentioned... fragment length and peak width. Other than that, how much variability is there within your groups? If the within-group variability is high, then it's harder to identify differentially-bound sites. If you plot the principal components analysis via dba.plotPCA (after carrying out the differential binding analysis), is there clear separation between the groups? What about the unbiased PCA (i.e. before differential analysis)?
Where does your expectation of 100-200 differentially-bound peaks come from? It's hard to guess why you're getting fewer peaks, without any idea why that's the expected number. Can you provide more information on that?
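For reference, the two PCA views mentioned above could be drawn roughly as follows. This is a minimal sketch that uses the tamoxifen example analysis shipped with DiffBind so that it runs without your own files; with your data you would pass your own analyzed object (RPMG in the original post) instead, and contrast number 1 is an assumption about which comparison is of interest.

```r
library(DiffBind)

# Example analysis bundled with DiffBind; this loads an object named `tamoxifen`.
data(tamoxifen_analysis)

# Unbiased PCA: based on all consensus sites, colored by condition.
dba.plotPCA(tamoxifen, attributes = DBA_CONDITION, label = DBA_ID)

# PCA restricted to the sites called differentially bound in contrast 1.
dba.plotPCA(tamoxifen, contrast = 1, attributes = DBA_CONDITION, label = DBA_ID)
```

If replicates from the same group scatter widely in the first plot, within-group variability is probably limiting how many sites can be called; the second plot will nearly always show cleaner separation, since it only uses sites already selected for differing between groups.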
- Gord