Hi-
The minOverlap parameter in dba.read provides a simple way to define a
consensus peakset used for further analysis. In cases where you have
loaded a number of different peaksets, a single consensus peakset is
defined consisting of (merged) peaks that overlap with at least X of
the original peaksets. So if X=1, then all the (merged) peaks are
included in the consensus peakset. If X=2, only peaks that were
identified in at least two of the samples are included. In the
vignette example where there are 11 samples, setting X=11 would only
analyze peaks that were identified in all 11 samples.
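To make the definition concrete, here is a simplified sketch in Python. It is not DiffBind's actual implementation, which is in R and handles many more details. It assumes each peakset is a list of (chromosome, start, end) tuples. It merges peaks that overlap or touch across all peaksets and records which peaksets contributed to each merged peak. It then keeps only the merged peaks supported by at least min_overlap peaksets. The names merge_peaksets and consensus_peakset are hypothetical and chosen for illustration.

```python
def merge_peaksets(peaksets):
    """Merge overlapping/touching peaks from all peaksets (hypothetical helper).

    Returns (chrom, start, end, supporting_peakset_indices) tuples.
    """
    # Tag every peak with the index of the peakset it came from, then sort
    # by chromosome and position so overlapping peaks are adjacent.
    tagged = sorted(
        (chrom, start, end, idx)
        for idx, peaks in enumerate(peaksets)
        for chrom, start, end in peaks
    )
    merged = []
    for chrom, start, end, idx in tagged:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the current merged peak: extend it and
            # record this peakset as supporting it (a set counts it once).
            merged[-1][2] = max(merged[-1][2], end)
            merged[-1][3].add(idx)
        else:
            merged.append([chrom, start, end, {idx}])
    return [(c, s, e, frozenset(sup)) for c, s, e, sup in merged]


def consensus_peakset(peaksets, min_overlap=2):
    """Keep merged peaks found in at least min_overlap peaksets (hypothetical)."""
    return [
        (c, s, e)
        for c, s, e, support in merge_peaksets(peaksets)
        if len(support) >= min_overlap
    ]


# Three made-up samples.
samples = [
    [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)],
    [("chr1", 150, 250), ("chr2", 400, 450)],
    [("chr1", 180, 300), ("chr1", 520, 580)],
]
print(consensus_peakset(samples, min_overlap=2))
# → [('chr1', 100, 300), ('chr1', 500, 600)]
```

With min_overlap=1 all four merged peaks are kept. With min_overlap=3 only chr1:100-300, which was called in every sample, survives.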
You can see how many peaks would be in each possible consensus peakset
defined using minOverlap, and how this number diminishes as the
overlap criterion becomes more stringent (larger values of
minOverlap), by calling dba.overlap with mode=DBA_OLAP_RATE. This
returns a vector whose k-th element is the number of peaks that would be
in a consensus peakset if minOverlap were set to k (so the first
element of the vector is the total number of merged peaks, as if
minOverlap=1, the second element is the count for minOverlap=2, etc.).
In general, setting minOverlap to a higher number results in a set of
intervals that are more likely to represent genuine binding sites (as
they were identified in more samples), but may result in some truly
differentially bound sites being eliminated. A lower value of
minOverlap may include more spurious sites (noise), but the nature of
the differential analysis should prevent these from being identified
as differentially bound with low FDR (although to a certain extent an
FDR "penalty" is paid by including more sites in the multiple testing
correction). The default of minOverlap=2 was chosen because it eliminates
only sites that were identified in a single sample.
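To illustrate the FDR "penalty" mentioned above, the sketch below uses a Benjamini-Hochberg adjustment as a representative multiple-testing correction. It is not taken from DiffBind's code, and the p-values are invented. It shows that the same raw p-value receives a larger adjusted value when more (mostly noise) sites are tested alongside it. The function name bh_adjust is hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, taking p * m / rank and keeping
    # a running minimum so adjusted values stay monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Two "real" sites tested on their own...
few = bh_adjust([0.01, 0.04])
# ...versus the same two sites plus six noise sites with large p-values.
many = bh_adjust([0.01, 0.04, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95])
print(few[:2], many[:2])
```

With only two tests, the second site's adjusted value is 0.04, which passes a 0.05 cutoff. With the six extra noise sites it rises to 0.16, so that site no longer passes.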
In the vignette, minOverlap is set to 3 only because this reduced the
size of the resulting DBA data objects so that they could be included
with the package and remain under size limits imposed by Bioconductor.
Cheers-
Rory
________________________________
From: Theresa Stueve [theresas@usc.edu]
Sent: 15 February 2014 02:01
To: Rory Stark; Gordon Brown
Subject: minOverlap in DiffBind in R- what does it mean?
Greetings Drs. Stark and Brown,
My name is Theresa. I have just started using DiffBind and can't
thank you enough for such a powerful and easy-to-use package.
I have gone through your tamoxifen tutorial and have read through
forums and threads on DiffBind online, but I still can't ferret out
what "minOverlap=X" does. Everyone online seems to be quite
comfortable setting minOverlap to values other than the
default, so I apologize in advance if the answer is apparent and I
missed it.
I really appreciate your time and this wonderful program.
--
(Theresa) Ryan Stueve
T32 Postdoctoral Fellow in Environmental Genomics
Department of Preventive Medicine
Ite Laird-Offringa Lab
NTT 6420