in diffbind dba.peakset what is the meaning of minOverlap?
2
0
Entering edit mode
Theo ▴ 10
@theodoregeorgomanolis-7993
Last seen 4 months ago
Germany

What will be the best use of the dba.peakset and minOverlap to extract 3 datasets that have the common peaks, the peaks enriched in c-Jun ChIP and the ones enriched in JunB ChIP. my dataset looks like:

CHIP_seq_Goett_peaks_ALL
4 Samples, 33031 sites in matrix:
           ID      Tissue Factor   Condition Replicate Caller Intervals FRiP
1   Cell1px8  epithelial   junB Cell1_junB         1 counts     33031 0.12
2  Cell1px13  epithelial   junB Cell1_junB         2 counts     33031 0.13
3   Cellc1px13 mesenchymal   cJun  Cellc1_cJun         1 counts     33031 0.04
4 Cellc1px8_13 mesenchymal   cJun  Cellc1_cJun         2 counts     33031 0.08

1 Contrast:
  Group1 Members1 Group2 Members2 DB.edgeR DB.DESeq2
1   junB        2   cJun        2    13259     10837

I've seen people Using values as 0.7, but I do not understand the usage and why the default value is 2. Also, how can this be changed to my specific needs?

One other thing I noticed that even I do

ChIPpeakset <- dba.peakset(CHIP_seq_Goett_peaks_ALL, minOverlap = 0.7)

I get the following:

> ChIPpeakset$minOverlap
[1] 2

Thank you in advance.

diffbind chip-seq dba.peakset • 1.1k views
ADD COMMENT
1
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 10 days ago
Cambridge, UK

I'm a little unclear of what exactly you are trying to do.

Given that you have the same number of peaks for each sample, it looks like you have already counted reads using dba.count(), is that correct? In that case you have already formed a consensus set (probably using the default minOverlap=2, meaning all peaks that overlap in at least two of the four sets). So by the "common" peaks, do you mean the 33,031 peaks in the consensus set? And by peaks "enriched" for a certain factor, do mean to divide the 10,837 differentially bound peaks into ones that have higher binding affinity in cJun and the ones with higher binding affinity in JunB?

If I understand correctly, there are a number of ways to extract the peaksets of interest. On way is to get everything in a GRanges object using dba.report(), then get the peak subsets you want. Here's an example using the sample data; in this case we are dividing the differentially bound peaks into the ones with higher binding affinity in the Responsive vs the Resistant conditions.

# Load DB object
data(tamoxifen_analysis)
tamoxifen

# Retrieve all peaks in a report
report <- dba.report(tamoxifen,th=1)

# Retrieve all common (consensus) peaks without report statistics
common_peaks <- report[,0]
length(common_peaks)

# Retrieve all differentially bound peaks
db_peaks <- report[report$FDR<0.05,]
length(db_peaks)

# Retrieve all differentially bound peaks with higher affinity in Resistant
db.resistant <- db_peaks[db_peaks$Fold>0,0]
length(db.resistant)

# Retrieve all differentially bound peaks with higher affinity in Responsive
db.responsive <- db_peaks[db_peaks$Fold<0,0]
length(db.responsive)
ADD COMMENT
0
Entering edit mode

Hi Rory, thank you getting back at me. this dataset has already been under count/contrast/analyze, so your assumption is correct. the scripts you provided worked.

this work perfectly.

I am probably confused or did not comprehend the following part from the tutorial

6.3 A complete occupancy analysis: identifying sites unique to a sample group

The specified part did not work with my list. Do I need to reprocess the initial list without the count/contrast/analyze part?

Thank you

ADD REPLY
0
Entering edit mode

And another question will be: if I would like to include to the resistant/ responsive only the ones that are foe example above fold change of 2 ( the Fold in your case is log2?) how could I accomplish that? Over all this is something that is been done in the Volcano plots.

ADD REPLY
0
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 10 days ago
Cambridge, UK

Generally speaking, occupancy analyses are best performed on the peak data, before calling dba.count().

A log2 Fold value of 1 represent a 2-fold change in binding.

ADD COMMENT

Login before adding your answer.

Traffic: 937 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6