Question

In DiffBind's dba.peakset, what is the meaning of minOverlap?

Theo ▴ 10

@theodoregeorgomanolis-7993

Last seen 1 day ago

Germany

What is the best way to use dba.peakset and minOverlap to extract three peaksets: the common peaks, the peaks enriched in the c-Jun ChIP, and the peaks enriched in the JunB ChIP? My dataset looks like this:

CHIP_seq_Goett_peaks_ALL
4 Samples, 33031 sites in matrix:
            ID      Tissue Factor   Condition Replicate Caller Intervals FRiP
1     Cell1px8  epithelial   junB  Cell1_junB         1 counts     33031 0.12
2    Cell1px13  epithelial   junB  Cell1_junB         2 counts     33031 0.13
3   Cellc1px13 mesenchymal   cJun Cellc1_cJun         1 counts     33031 0.04
4 Cellc1px8_13 mesenchymal   cJun Cellc1_cJun         2 counts     33031 0.08

1 Contrast:
  Group1 Members1 Group2 Members2 DB.edgeR DB.DESeq2
1   junB        2   cJun        2    13259     10837

I've seen people use values such as 0.7, but I do not understand the usage or why the default value is 2. Also, how can this be changed to suit my specific needs?

One other thing I noticed is that even if I run

ChIPpeakset <- dba.peakset(CHIP_seq_Goett_peaks_ALL, minOverlap = 0.7)

I get the following:

> ChIPpeakset$minOverlap
[1] 2

Thank you in advance.

diffbind chip-seq dba.peakset

score 1 · Answer 1 · 2019-10-07

I'm a little unclear on what exactly you are trying to do.

Given that you have the same number of peaks for each sample, it looks like you have already counted reads using dba.count(), is that correct? In that case you have already formed a consensus set (probably using the default minOverlap=2, meaning all peaks that overlap in at least two of the four sets). So by the "common" peaks, do you mean the 33,031 peaks in the consensus set? And by peaks "enriched" for a certain factor, do you mean to divide the 10,837 differentially bound peaks into the ones that have higher binding affinity in cJun and the ones with higher binding affinity in JunB?
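To make the two ways of specifying minOverlap concrete, here is a minimal base-R sketch that does not call DiffBind at all. The occupancy matrix and both helper functions are hypothetical, made up for illustration. It assumes that a value between 0 and 1 is treated as a proportion of the peaksets and rounded up to a whole number of samples, which is my understanding of the documented behaviour rather than a copy of the package's code.

```r
# Toy occupancy matrix: rows are merged candidate peaks, columns are samples.
# A 1 means the peak was called in that sample. All values are invented.
occupancy <- matrix(
  c(1, 1, 1, 1,
    1, 1, 0, 0,
    0, 0, 1, 1,
    1, 0, 0, 0,
    1, 1, 1, 0),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("peak", 1:5),
                  c("junB_1", "junB_2", "cJun_1", "cJun_2"))
)

# Hypothetical helper: turn minOverlap into a number of samples.
# Assumption: a fraction in (0, 1) is a proportion of samples, rounded up.
resolve_min_overlap <- function(min_overlap, n_samples) {
  if (min_overlap > 0 && min_overlap < 1) {
    return(ceiling(min_overlap * n_samples))
  }
  min_overlap
}

# Hypothetical helper: keep peaks called in at least that many samples.
consensus_peaks <- function(occupancy, min_overlap) {
  needed <- resolve_min_overlap(min_overlap, ncol(occupancy))
  rownames(occupancy)[rowSums(occupancy) >= needed]
}

consensus_peaks(occupancy, 2)    # peaks called in at least 2 of 4 samples
consensus_peaks(occupancy, 0.7)  # peaks called in at least 70% of samples
```

With four samples, 0.7 works out to three samples under this assumption, so it is stricter than the default of 2.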

If I understand correctly, there are a number of ways to extract the peaksets of interest. One way is to get everything in a GRanges object using dba.report(), then get the peak subsets you want. Here's an example using the sample data; in this case we are dividing the differentially bound peaks into the ones with higher binding affinity in the Responsive vs the Resistant conditions.

# Load DB object
data(tamoxifen_analysis)
tamoxifen

# Retrieve all peaks in a report
report <- dba.report(tamoxifen,th=1)

# Retrieve all common (consensus) peaks without report statistics
common_peaks <- report[,0]
length(common_peaks)

# Retrieve all differentially bound peaks
db_peaks <- report[report$FDR<0.05,]
length(db_peaks)

# Retrieve all differentially bound peaks with higher affinity in Resistant
db.resistant <- db_peaks[db_peaks$Fold>0,0]
length(db.resistant)

# Retrieve all differentially bound peaks with higher affinity in Responsive
db.responsive <- db_peaks[db_peaks$Fold<0,0]
length(db.responsive)

score 0 · Answer 2 · 2019-10-22

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 15 days ago

Cambridge, UK

Generally speaking, occupancy analyses are best performed on the peak data, before calling dba.count().
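As a rough sketch of what an occupancy analysis on the peak data can look like, the following uses the peak-only sample data that ships with DiffBind. It builds one consensus peakset per condition and then splits the peaks into shared and group-specific sets. The choice of minOverlap = 0.66 is only an example value; for data like yours, grouping by DBA_FACTOR instead of DBA_CONDITION would be the analogous call, but that is an assumption about your sample sheet.

```r
library(DiffBind)

# Peak-only sample data (no counting step yet); loads a DBA object
# named `tamoxifen`.
data(tamoxifen_peaks)

# Add one consensus peakset per condition. Here minOverlap is a fraction,
# so a peak must appear in at least that proportion of the samples
# within each condition (0.66 is an example value, not a recommendation).
tamoxifen <- dba.peakset(tamoxifen, consensus = DBA_CONDITION,
                         minOverlap = 0.66)

# Compare the two consensus peaksets with each other.
ol <- dba.overlap(tamoxifen, tamoxifen$masks$Consensus)

length(ol$inAll)  # peaks shared by both consensus peaksets
length(ol$onlyA)  # peaks unique to the first consensus peakset
length(ol$onlyB)  # peaks unique to the second consensus peakset
```

Which condition corresponds to A and which to B follows the order of the consensus peaksets in the DBA object, so it is worth printing the object to check before labelling the results.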

A log2 Fold value of 1 represents a 2-fold change in binding.
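The conversion is plain arithmetic and can be checked in base R. The fold values below are illustrative only, not taken from any report.

```r
# Illustrative log2 fold values (not from real data).
log2_fold <- c(-2, -1, 0, 1, 2)

# Convert back to a linear ratio between the two groups.
linear_ratio <- 2^log2_fold

data.frame(log2_fold, linear_ratio)
```

A value of -1 therefore corresponds to a ratio of 0.5, that is, a 2-fold change in the other direction.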