Question

DiffBind and peak counts as input

1

Entering edit mode

Bogdan ▴ 670

@bogdan-2367

Last seen 6 months ago

Palo Alto, CA, USA

Dear all, and Rory,

I would very much appreciate if you could please advise how I shall start the DiffBind workflows from a file with peak counts (binding matrix) as an input into DiffBind.

The described workflow in the vignette uses the bam files as primary input. I do not have the bam files per sample, only the counts for a consensus peak set in ~ 20 samples.

The R code that is listed on how to construct a DBA object from count matrix produces an error on my machine :

 data(tamoxifen_counts)
 tamoxifen <- dba.count(tamoxifen, peaks=NULL, score=DBA_SCORE_READS)
 counts <- dba.peakset(tamoxifen, bRetrieve = TRUE)

 newDBA <- NULL
 countMatrix <- mcols(counts)
 for(sample in 1:ncol(countMatrix)) {
   newDBA <- dba.peakset(newDBA, peaks=counts, 
                         sampID=colnames(countMatrix)[sample],
                         counts=countMatrix[,sample])
 }  
Error in if (treads) { : argument is not interpretable as logical

The same error is produced shall I use the R code :

 data(tamoxifen_counts)
 tamoxifen <- dba.count(tamoxifen, peaks=NULL, score=DBA_SCORE_READS)
 counts <- dba.peakset(tamoxifen, bRetrieve = TRUE)

 newDBA <- NULL
 countMatrix <- mcols(counts)

 peaks <- dba.peakset(tamoxifen, bRetrieve = TRUE, DataType=DBA_DATA_FRAME)
 counts1 <- cbind(peaks[,1:3],countMatrix[,1])
 counts2 <- cbind(peaks[,1:3],countMatrix[,2])

 write.table(counts1, "samp1", row.names=FALSE, col.names=FALSE,sep="\t", 
             quote=FALSE)
 write.table(counts2, "samp2", row.names=FALSE, col.names=FALSE,sep="\t", 
             quote=FALSE)

 newDBA <- dba.peakset(NULL, peaks=peaks, sampID="samp1", 
                       condition="cond1", counts="samp1")
 newDBA <- dba.peakset(newDBA, peaks=peaks, sampID="samp2", 
                       condition="cond2", counts="samp2")
Error in if (treads) { : argument is not interpretable as logical

It looks that only the 1st sample is read shall I use dba.peakset() :

peaks <- dba.peakset(tamoxifen, bRetrieve = TRUE, DataType=DBA_DATA_FRAME)

head(peaks)
    CHR  START    END BT4741 BT4742 MCF71 MCF72 MCF73 T47D1 T47D2 MCF7r1 MCF7r2 ZR751 ZR752
1 chr18  90841  91241      2      1     0     0     0     0     1      0      0    22    35
2 chr18 111395 111795     21     19   127    63   112    14    13     67     52    60   124
3 chr18 150222 150622     15     13    22    11    27     0     6     17      9    12    35
4 chr18 150807 151207     11      8     4     6     8     1     5     25     18     2    17
5 chr18 189192 189592     11     22    20    12    38     3     6     82     54    16    42
6 chr18 199821 200221     12     16     1     4     6     5    20     12     14    29    55

dba.peakset(NULL, peaks=peaks)
1 Samples, 2845 sites in matrix:
  ID Intervals
1  1      2845

If I use the alternative method, that is to make a SampleSheet file with an additional column called "Counts", these are the error messages :

Displaying the content of the SampleSheet file :

read.delim("abc_manifest.csv", sep=",", header=T, stringsAsFactors=F)
  SampleID Tissue Factor  Condition  Treatment Replicate    Counts
1   BT4741  BT474     ER  Resistant Full-Media         1 abc1s.txt
2   BT4742  BT474     ER  Resistant Full-Media         2 abc2s.txt
3    MCF71   MCF7     ER Responsive Full-Media         1 abc3s.txt
4    MCF72   MCF7     ER Responsive Full-Media         2 abc4s.txt

When I create a dba object, the error is :

 abc = dba(sampleSheet="abc_manifest.csv")
BT4741 BT474 ER Resistant Full-Media 1 counts
BT4742 BT474 ER Resistant Full-Media 2 counts
Error in if (nrow(pv$merged) != nrow(pv$binding)) { : 
  argument is of length zero

The files with the counts have the format :

     head(read.delim("abc1s.txt", header=F, stringsAsFactors=F, sep="\t"))
         V1     V2     V3 V4
    1 chr18  90841  91241  1
    2 chr18 111395 111795 19
    3 chr18 150222 150622 13
    4 chr18 150807 151207  8
    5 chr18 189192 189592 22
    6 chr18 199821 200221 16


     head(read.delim("abc2s.txt", header=F, stringsAsFactors=F, sep="\t"))
         V1     V2     V3  V4
    1 chr18  90841  91241   0
    2 chr18 111395 111795 127
    3 chr18 150222 150622  22
    4 chr18 150807 151207   4
    5 chr18 189192 189592  20
    6 chr18 199821 200221   1

  head(read.delim("abc3s.txt", header=F, stringsAsFactors=F, sep="\t"))
         V1     V2     V3  V4
    1 chr18  90841  91241   0
    2 chr18 111395 111795 127
    3 chr18 150222 150622  22
    4 chr18 150807 151207   4
    5 chr18 189192 189592  20
    6 chr18 199821 200221   1

  head(read.delim("abc4s.txt", header=F, stringsAsFactors=F, sep="\t"))
         V1     V2     V3  V4
    1 chr18  90841  91241   0
    2 chr18 111395 111795 128
    3 chr18 150222 150622  22
    4 chr18 150807 151207   4
    5 chr18 189192 189592 201
    6 chr18 199821 200221   1

Thank you,

Bogdan

> R.Version()
$platform
[1] "x86_64-apple-darwin17.0"

$arch
[1] "x86_64"

$os
[1] "darwin17.0"

$system
[1] "x86_64, darwin17.0"

$status
[1] ""

$major
[1] "4"

$minor
[1] "2.1"

$year
[1] "2022"

$month
[1] "06"

$day
[1] "23"

$`svn rev`
[1] "82513"

$language
[1] "R"

$version.string
[1] "R version 4.2.1 (2022-06-23)"

$nickname
[1] "Funny-Looking Kid"

DESeq2 edgeR DiffBind • 1.0k views

ADD COMMENT • link updated 17 months ago by Rory Stark ★ 5.2k • written 18 months ago by Bogdan ▴ 670

score 1 · Answer 1 · 2022-10-24

1

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 14 days ago

Cambridge, UK

Thank you for a nice reproducible bug report!

Unfortunately a bug was introduced in the current release version (DiffBind_3.6). Due to where we are in the release schedule, this is now frozen and no bug fixes can be checked in for DiffBind_3.6. I am checking in a fix for the next release (DiffBind_3.8) which is due to be released in about a week (November 2nd) as part of Bioconductor_3.16.

ADD COMMENT • link 18 months ago Rory Stark ★ 5.2k

0

Entering edit mode

Thank you Rory.

Is there any differential binding algorithm that you would recommend when having ChIP-seq data sets that display major differences in the peak intensities ?

We do have a set of ChIP-seq data with the following replicates :

a) 2 replicates (H3K27) that show very high intensity peaks (control, DMSO)

b) 2 replicates where the histone mark (H3K27) is erased (treatment, + drug treatment)

I would like to call the differential binding between a) and b).

As DESeq2 and edgeR assume that most of the peak signals are not increased or decreased after drug treatment, I guess that the use of DEseq2 or edgeR might not be entirely appropriate ?

I look forward to hearing from you. Thanks a lot !

Bogdan

ADD REPLY • link 18 months ago Bogdan ▴ 670

0

Entering edit mode

Dear Rory,

I think I have a similar problem, but it did not go away with the new version.

 >>> DiffBind 3.8.1
 > dba_AB <- dba(sampleSheet="Stages_table_A_B.csv")
A1 A    1 counts
A2 A    2 counts
Error in if (nrow(pv$merged) != nrow(pv$binding)) { :
  argument is of length zero

The csv table has this format

> read.delim("Stages_table_A_B.csv", sep=",", header=T, stringsAsFactors=F)
  SampleID Tissue Replicate  Counts
1    A1   A         1  A1.bed
2    A2   A         2  A2.bed
3    B1   B         1  B1.bed
4    B2   B         2  B2.bed

and the bed files have the counts in the fourth column

B1_bed <- read.table("B1.bed", header = F)
head(B1_bed)

              V1     V2     V3 V4
1 NW_022144819.1  18530  19030 20
2 NW_022144897.1  30370  30980  3
3 NW_022144897.1  31810  32380  9
4 NW_022144897.1 102670 104370 42
5 NW_022144952.1  41370  41720  0
6 NW_022144974.1 114770 115310 10

Thanks,

Cesar

>  R.Version()
$platform
[1] "x86_64-pc-linux-gnu"

$arch
[1] "x86_64"

$os
[1] "linux-gnu"

$system
[1] "x86_64, linux-gnu"

$status
[1] ""

$major
[1] "4"

$minor
[1] "2.1"

$year
[1] "2022"

$month
[1] "06"

$day
[1] "23"

$`svn rev`
[1] "82513"

$language
[1] "R"

$version.string
[1] "R version 4.2.1 (2022-06-23)"

$nickname
[1] "Funny-Looking Kid"

ADD REPLY • link 17 months ago cesar.arenasmena ▴ 10

0

Entering edit mode

I couldn't reproduce this exactly, however I put in some fixes that should prevent it from happening. They'll be in the next build (DiffBind_3.8.2).

If you want to send me your sample sheet and the first three count files, I can confirm that the bug is fixed for your data.

ADD REPLY • link 17 months ago Rory Stark ★ 5.2k