in diffbind dba.peakset what is the meaning of minOverlap?
Entering edit mode
Last seen 2.0 years ago

What will be the best use of the dba.peakset and minOverlap to extract 3 datasets that have the common peaks, the peaks enriched in c-Jun ChIP and the ones enriched in JunB ChIP. my dataset looks like:

4 Samples, 33031 sites in matrix:
           ID      Tissue Factor   Condition Replicate Caller Intervals FRiP
1   Cell1px8  epithelial   junB Cell1_junB         1 counts     33031 0.12
2  Cell1px13  epithelial   junB Cell1_junB         2 counts     33031 0.13
3   Cellc1px13 mesenchymal   cJun  Cellc1_cJun         1 counts     33031 0.04
4 Cellc1px8_13 mesenchymal   cJun  Cellc1_cJun         2 counts     33031 0.08

1 Contrast:
  Group1 Members1 Group2 Members2 DB.edgeR DB.DESeq2
1   junB        2   cJun        2    13259     10837

I've seen people Using values as 0.7, but I do not understand the usage and why the default value is 2. Also, how can this be changed to my specific needs?

One other thing I noticed that even I do

ChIPpeakset <- dba.peakset(CHIP_seq_Goett_peaks_ALL, minOverlap = 0.7)

I get the following:

> ChIPpeakset$minOverlap
[1] 2

Thank you in advance.

diffbind chip-seq dba.peakset • 350 views
Entering edit mode
Rory Stark ★ 4.4k
Last seen 3 days ago
CRUK, Cambridge, UK

I'm a little unclear of what exactly you are trying to do.

Given that you have the same number of peaks for each sample, it looks like you have already counted reads using dba.count(), is that correct? In that case you have already formed a consensus set (probably using the default minOverlap=2, meaning all peaks that overlap in at least two of the four sets). So by the "common" peaks, do you mean the 33,031 peaks in the consensus set? And by peaks "enriched" for a certain factor, do mean to divide the 10,837 differentially bound peaks into ones that have higher binding affinity in cJun and the ones with higher binding affinity in JunB?

If I understand correctly, there are a number of ways to extract the peaksets of interest. On way is to get everything in a GRanges object using, then get the peak subsets you want. Here's an example using the sample data; in this case we are dividing the differentially bound peaks into the ones with higher binding affinity in the Responsive vs the Resistant conditions.

# Load DB object

# Retrieve all peaks in a report
report <-,th=1)

# Retrieve all common (consensus) peaks without report statistics
common_peaks <- report[,0]

# Retrieve all differentially bound peaks
db_peaks <- report[report$FDR<0.05,]

# Retrieve all differentially bound peaks with higher affinity in Resistant
db.resistant <- db_peaks[db_peaks$Fold>0,0]

# Retrieve all differentially bound peaks with higher affinity in Responsive
db.responsive <- db_peaks[db_peaks$Fold<0,0]
Entering edit mode

Hi Rory, thank you getting back at me. this dataset has already been under count/contrast/analyze, so your assumption is correct. the scripts you provided worked.

this work perfectly.

I am probably confused or did not comprehend the following part from the tutorial

6.3 A complete occupancy analysis: identifying sites unique to a sample group

The specified part did not work with my list. Do I need to reprocess the initial list without the count/contrast/analyze part?

Thank you

Entering edit mode

And another question will be: if I would like to include to the resistant/ responsive only the ones that are foe example above fold change of 2 ( the Fold in your case is log2?) how could I accomplish that? Over all this is something that is been done in the Volcano plots.

Entering edit mode
Rory Stark ★ 4.4k
Last seen 3 days ago
CRUK, Cambridge, UK

Generally speaking, occupancy analyses are best performed on the peak data, before calling dba.count().

A log2 Fold value of 1 represent a 2-fold change in binding.


Login before adding your answer.

Traffic: 336 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6