Question

How to get 3/4 peak overlap in DiffBind?

0

Entering edit mode

Krista • 0

@0b869643

Last seen 3.9 years ago

Finland

Hello,

I'm using DiffBind to extract peaks that appear in my replicates. I have four replicates, and I wish to fetch peaks that are in at least 3 replicates. It doesn't matter which 3/4 (or 4/4) replicates they appear in. How do I achieve this? I expect it has to do with minOverlap parameter, but I can't figure out the changes I require.

The code I currently have;


dbObj=dba(sampleSheet=samples)
dbObj2 <-  dba.peakset(dbObj, consensus = DBA_FACTOR)
ExptConsensus <-  dba(dbObj2, mask=dbObj2$masks$Consensus, minOverlap=1)
ConsensusPeaks <- dba.peakset(ExptConsensus, bRetrieve=TRUE)

DiffBind • 2.2k views

ADD COMMENT • link 3.9 years ago Krista • 0

score 2 · Accepted Answer · 2022-02-03

2

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 12 months ago

Cambridge, UK

It would help to see more of your samplesheet (e.g. output of dba.show(dbObj)), and to understand better what you are trying to do. Do you have multiple factors, each with replicated for different conditions? Or are you trying to compare two factors/marks?

An example that may be of help: in the sample dataset, if you wanted a consensus set that includes all the peaks that are called in at least 3/4 of the replicates in each of the two conditions (Resistant or Responsive), you could use:

data(tamoxifen_peaks)
consensus <- dba.peakset(tamoxifen, consensus=DBA_CONDITION, minOverlap=.75)
consensus <- dba(consensus, mask=consensus$masks$Consensus, minOverlap=1)
consensusPeaks <- dba.peakset(consensus, bRetrieve=TRUE)
tamoxifenCounts <- dba.count(tamoxifen, peaks=consensusPeaks)

If you know you have exactly four replicates of each condition, you could alternatively use minOverlap=3 instead of minOverlap=.75, as they are equivalent in that situation.

ADD COMMENT • link 3.9 years ago Rory Stark ★ 5.2k

0

Entering edit mode

Hi,

I'm simply trying to filter peaks based on their appearances across replicates. I have 4 replicates with same factors, same conditions, etc.

> dba.show(dbObj)
   ID Factor Replicate Caller Intervals
1 EB1  Suvar         1    bed      1496
2 EB2  Suvar         2    bed      1462
3 EB3  Suvar         3    bed      1494
4 EB4  Suvar         4    bed      1551

What confuses me that if I do minOverlap=.75, I get the same exact result as with minOverlap=1, which shouldn't be true.

> ExptConsensus1 <-  dba(dbObj2, mask=dbObj2$masks$Consensus, minOverlap=1)
> ExptConsensus75 <-  dba(dbObj2, mask=dbObj2$masks$Consensus, minOverlap=.75)
> ExptConsensus1
1 Samples, 1476 sites in matrix:
   ID Factor Replicate Caller Intervals
1 ALL  Suvar   1-2-3-4    bed      1476
> ExptConsensus75
1 Samples, 1476 sites in matrix:
   ID Factor Replicate Caller Intervals
1 ALL  Suvar   1-2-3-4    bed      1476

How should I proceed?

ADD REPLY • link 3.9 years ago Krista • 0

2

Entering edit mode

I see now.

This line in your original script creates a single consensus set:

ExptConsensus <-  dba(dbObj2, mask=dbObj2$masks$Consensus, minOverlap=1)

This consensus set includes all merged peaks that overlap any sample (because minOverlap=1). When you reference this consensus peakset using mask=dbObj2$masks$Consensus, the mask refers to only a single consensus peakset. So it doesn't matter what you specify for minOverlap= in subsequent operations, you can only ever get this consensus peakset again.

If you want a consensus peakset including merged peaks that overlap at least 3 of the 4 samples, you can get it using:

dbObj <- dba(sampleSheet=samples, minOverlap=3)
ConsensusPeaks <- dba.peakset(dbObj, bRetrieve=TRUE)

You could use dba.peakset() to add the consensus peakset you want using:

dbObj=dba(sampleSheet=samples)
dbObj2 <-  dba.peakset(dbObj, consensus = DBA_FACTOR)
ExptConsensus <-  dba(dbObj2, mask=dbObj2$masks$Consensus, minOverlap=3)
ConsensusPeaks <- dba.peakset(ExptConsensus, bRetrieve=TRUE)

But this more complicated way doesn't really gain you anything.

ADD REPLY • link 3.9 years ago Rory Stark ★ 5.2k

0

Entering edit mode

I see.

I tested with the two ways you suggest, but I get different amount of ranges (peaks) depending which method I use. Why is that?

> dbObj <- dba(sampleSheet=samples, minOverlap=3)
EB1  Suvar   1 bed
EB2  Suvar   2 bed
EB3  Suvar   3 bed
EB4  Suvar   4 bed
> ConsensusPeaks <- dba.peakset(dbObj, bRetrieve=TRUE)
> ConsensusPeaks
GRanges object with 1233 ranges and 4 metadata columns:
       seqnames            ranges strand |       EB1       EB2       EB3
          <Rle>         <IRanges>  <Rle> | <numeric> <numeric> <numeric>
     1       2L   5100265-5101513      * |  0.699708  0.000000  0.656992
     2       2L   5162266-5169535      * |  0.740525  0.678925  0.781003
     3       2L 10567725-10569426      * |  0.593294  0.000000  0.527704
     4       2L 22169545-22171532      * |  0.763848  0.646393  0.757256
     5       2L 22172230-22176371      * |  0.676385  0.592645  0.741425
   ...      ...               ...    ... .       ...       ...       ...
  1229        X 23183093-23186316      * |  0.814869  0.775106  0.820580
  1230        X 23354511-23355911      * |  0.858601  0.673267  0.831135
  1231        X 23374976-23375606      * |  0.959184  0.749646  0.928760
  1232        Y     955937-956727      * |  0.848397  0.741160  0.536939
  1233        Y   1420661-1422761      * |  0.568513  0.523338  0.000000
             EB4
       <numeric>
     1  0.647541
     2  0.690867
     3  0.620609
     4  0.788056
     5  0.642857
   ...       ...
  1229  0.857143
  1230  0.716628
  1231  0.877049
  1232  0.736534
  1233  0.620609
  -------
  seqinfo: 7 sequences from an unspecified genome; no seqlengths

Vs

> dbObj=dba(sampleSheet=samples)
EB1  Suvar   1 bed
EB2  Suvar   2 bed
EB3  Suvar   3 bed
EB4  Suvar   4 bed
> dbObj2 <-  dba.peakset(dbObj, consensus = DBA_FACTOR)
Add consensus: 
> ExptConsensus <-  dba(dbObj2, mask=dbObj2$masks$Consensus, minOverlap=3)
> ConsensusPeaks <- dba.peakset(ExptConsensus, bRetrieve=TRUE)
> ConsensusPeaks
GRanges object with 1476 ranges and 1 metadata column:
       seqnames            ranges strand |       ALL
          <Rle>         <IRanges>  <Rle> | <numeric>
     1       2L   5100265-5101513      * |  0.668081
     2       2L   5162266-5169535      * |  0.722830
     3       2L 10327977-10329497      * |  0.534847
     4       2L 10352714-10354470      * |  0.554477
     5       2L 10567725-10569426      * |  0.580536
   ...      ...               ...    ... .       ...
  1472        X 23183093-23186316      * |  0.816925
  1473        X 23354511-23355911      * |  0.769908
  1474        X 23374976-23375606      * |  0.878660
  1475        Y     955937-956727      * |  0.715757
  1476        Y   1420661-1422761      * |  0.570820
  -------
  seqinfo: 7 sequences from an unspecified genome; no seqlengths

ADD REPLY • link 3.9 years ago Krista • 0

1

Entering edit mode

Sorry, the second snippet should read:

dbObj=dba(sampleSheet=samples)
dbObj2 <-  dba.peakset(dbObj, consensus = DBA_FACTOR, minOverlap=3)
ExptConsensus <-  dba(dbObj2, mask=dbObj2$masks$Consensus)
ConsensusPeaks <- dba.peakset(ExptConsensus, bRetrieve=TRUE)

But there isn't really any point to doing it this way.

If you want to try different overlaps without re-reading the sample sheet, you can do:

dbObj <- dba(sampleSheet=samples)
ConsensusPeaks3 <- dba.peakset(dba(dbObj, minOverlap=3), bRetrieve=TRUE)
ConsensusPeaks.75 <- dba.peakset(dba(dbObj, minOverlap=.75), bRetrieve=TRUE)
ConsensusPeaksAll <- dba.peakset(dba(dbObj, minOverlap=4), bRetrieve=TRUE)

etc.

ADD REPLY • link 3.9 years ago Rory Stark ★ 5.2k

0

Entering edit mode

Ah, okay. I see how it works now. Thank you so much for the help!

ADD REPLY • link 3.9 years ago Krista • 0