DiffBind and peak counts as input
1
1
Entering edit mode
Bogdan ▴ 670
@bogdan-2367
Last seen 5 months ago
Palo Alto, CA, USA

Dear all, and Rory,

I would very much appreciate if you could please advise how I shall start the DiffBind workflows from a file with peak counts (binding matrix) as an input into DiffBind.

The described workflow in the vignette uses the bam files as primary input. I do not have the bam files per sample, only the counts for a consensus peak set in ~ 20 samples.

The R code that is listed on how to construct a DBA object from count matrix produces an error on my machine :

 data(tamoxifen_counts)
 tamoxifen <- dba.count(tamoxifen, peaks=NULL, score=DBA_SCORE_READS)
 counts <- dba.peakset(tamoxifen, bRetrieve = TRUE)

 newDBA <- NULL
 countMatrix <- mcols(counts)
 for(sample in 1:ncol(countMatrix)) {
   newDBA <- dba.peakset(newDBA, peaks=counts, 
                         sampID=colnames(countMatrix)[sample],
                         counts=countMatrix[,sample])
 }  
Error in if (treads) { : argument is not interpretable as logical

The same error is produced shall I use the R code :

 data(tamoxifen_counts)
 tamoxifen <- dba.count(tamoxifen, peaks=NULL, score=DBA_SCORE_READS)
 counts <- dba.peakset(tamoxifen, bRetrieve = TRUE)

 newDBA <- NULL
 countMatrix <- mcols(counts)

 peaks <- dba.peakset(tamoxifen, bRetrieve = TRUE, DataType=DBA_DATA_FRAME)
 counts1 <- cbind(peaks[,1:3],countMatrix[,1])
 counts2 <- cbind(peaks[,1:3],countMatrix[,2])

 write.table(counts1, "samp1", row.names=FALSE, col.names=FALSE,sep="\t", 
             quote=FALSE)
 write.table(counts2, "samp2", row.names=FALSE, col.names=FALSE,sep="\t", 
             quote=FALSE)

 newDBA <- dba.peakset(NULL, peaks=peaks, sampID="samp1", 
                       condition="cond1", counts="samp1")
 newDBA <- dba.peakset(newDBA, peaks=peaks, sampID="samp2", 
                       condition="cond2", counts="samp2")
Error in if (treads) { : argument is not interpretable as logical

It looks that only the 1st sample is read shall I use dba.peakset() :

peaks <- dba.peakset(tamoxifen, bRetrieve = TRUE, DataType=DBA_DATA_FRAME)

head(peaks)
    CHR  START    END BT4741 BT4742 MCF71 MCF72 MCF73 T47D1 T47D2 MCF7r1 MCF7r2 ZR751 ZR752
1 chr18  90841  91241      2      1     0     0     0     0     1      0      0    22    35
2 chr18 111395 111795     21     19   127    63   112    14    13     67     52    60   124
3 chr18 150222 150622     15     13    22    11    27     0     6     17      9    12    35
4 chr18 150807 151207     11      8     4     6     8     1     5     25     18     2    17
5 chr18 189192 189592     11     22    20    12    38     3     6     82     54    16    42
6 chr18 199821 200221     12     16     1     4     6     5    20     12     14    29    55

dba.peakset(NULL, peaks=peaks)
1 Samples, 2845 sites in matrix:
  ID Intervals
1  1      2845

If I use the alternative method, that is to make a SampleSheet file with an additional column called "Counts", these are the error messages :

Displaying the content of the SampleSheet file :

read.delim("abc_manifest.csv", sep=",", header=T, stringsAsFactors=F)
  SampleID Tissue Factor  Condition  Treatment Replicate    Counts
1   BT4741  BT474     ER  Resistant Full-Media         1 abc1s.txt
2   BT4742  BT474     ER  Resistant Full-Media         2 abc2s.txt
3    MCF71   MCF7     ER Responsive Full-Media         1 abc3s.txt
4    MCF72   MCF7     ER Responsive Full-Media         2 abc4s.txt

When I create a dba object, the error is :

 abc = dba(sampleSheet="abc_manifest.csv")
BT4741 BT474 ER Resistant Full-Media 1 counts
BT4742 BT474 ER Resistant Full-Media 2 counts
Error in if (nrow(pv$merged) != nrow(pv$binding)) { : 
  argument is of length zero

The files with the counts have the format :

     head(read.delim("abc1s.txt", header=F, stringsAsFactors=F, sep="\t"))
         V1     V2     V3 V4
    1 chr18  90841  91241  1
    2 chr18 111395 111795 19
    3 chr18 150222 150622 13
    4 chr18 150807 151207  8
    5 chr18 189192 189592 22
    6 chr18 199821 200221 16


     head(read.delim("abc2s.txt", header=F, stringsAsFactors=F, sep="\t"))
         V1     V2     V3  V4
    1 chr18  90841  91241   0
    2 chr18 111395 111795 127
    3 chr18 150222 150622  22
    4 chr18 150807 151207   4
    5 chr18 189192 189592  20
    6 chr18 199821 200221   1

  head(read.delim("abc3s.txt", header=F, stringsAsFactors=F, sep="\t"))
         V1     V2     V3  V4
    1 chr18  90841  91241   0
    2 chr18 111395 111795 127
    3 chr18 150222 150622  22
    4 chr18 150807 151207   4
    5 chr18 189192 189592  20
    6 chr18 199821 200221   1

  head(read.delim("abc4s.txt", header=F, stringsAsFactors=F, sep="\t"))
         V1     V2     V3  V4
    1 chr18  90841  91241   0
    2 chr18 111395 111795 128
    3 chr18 150222 150622  22
    4 chr18 150807 151207   4
    5 chr18 189192 189592 201
    6 chr18 199821 200221   1

Thank you,

Bogdan

> R.Version()
$platform
[1] "x86_64-apple-darwin17.0"

$arch
[1] "x86_64"

$os
[1] "darwin17.0"

$system
[1] "x86_64, darwin17.0"

$status
[1] ""

$major
[1] "4"

$minor
[1] "2.1"

$year
[1] "2022"

$month
[1] "06"

$day
[1] "23"

$`svn rev`
[1] "82513"

$language
[1] "R"

$version.string
[1] "R version 4.2.1 (2022-06-23)"

$nickname
[1] "Funny-Looking Kid"
DESeq2 edgeR DiffBind • 977 views
ADD COMMENT
1
Entering edit mode
Rory Stark ★ 5.1k
@rory-stark-5741
Last seen 7 weeks ago
Cambridge, UK

Thank you for a nice reproducible bug report!

Unfortunately a bug was introduced in the current release version (DiffBind_3.6). Due to where we are in the release schedule, this is now frozen and no bug fixes can be checked in for DiffBind_3.6. I am checking in a fix for the next release (DiffBind_3.8) which is due to be released in about a week (November 2nd) as part of Bioconductor_3.16.

ADD COMMENT
0
Entering edit mode

Thank you Rory.

Is there any differential binding algorithm that you would recommend when having ChIP-seq data sets that display major differences in the peak intensities ?

We do have a set of ChIP-seq data with the following replicates :

a) 2 replicates (H3K27) that show very high intensity peaks (control, DMSO)

b) 2 replicates where the histone mark (H3K27) is erased (treatment, + drug treatment)

I would like to call the differential binding between a) and b).

As DESeq2 and edgeR assume that most of the peak signals are not increased or decreased after drug treatment, I guess that the use of DEseq2 or edgeR might not be entirely appropriate ?

I look forward to hearing from you. Thanks a lot !

Bogdan

ADD REPLY
0
Entering edit mode

Dear Rory,

I think I have a similar problem, but it did not go away with the new version.

 >>> DiffBind 3.8.1
 > dba_AB <- dba(sampleSheet="Stages_table_A_B.csv")
A1 A    1 counts
A2 A    2 counts
Error in if (nrow(pv$merged) != nrow(pv$binding)) { :
  argument is of length zero  

The csv table has this format

> read.delim("Stages_table_A_B.csv", sep=",", header=T, stringsAsFactors=F)
  SampleID Tissue Replicate  Counts
1    A1   A         1  A1.bed
2    A2   A         2  A2.bed
3    B1   B         1  B1.bed
4    B2   B         2  B2.bed

and the bed files have the counts in the fourth column

B1_bed <- read.table("B1.bed", header = F)
head(B1_bed)

              V1     V2     V3 V4
1 NW_022144819.1  18530  19030 20
2 NW_022144897.1  30370  30980  3
3 NW_022144897.1  31810  32380  9
4 NW_022144897.1 102670 104370 42
5 NW_022144952.1  41370  41720  0
6 NW_022144974.1 114770 115310 10

Thanks,

Cesar

>  R.Version()
$platform
[1] "x86_64-pc-linux-gnu"

$arch
[1] "x86_64"

$os
[1] "linux-gnu"

$system
[1] "x86_64, linux-gnu"

$status
[1] ""

$major
[1] "4"

$minor
[1] "2.1"

$year
[1] "2022"

$month
[1] "06"

$day
[1] "23"

$`svn rev`
[1] "82513"

$language
[1] "R"

$version.string
[1] "R version 4.2.1 (2022-06-23)"

$nickname
[1] "Funny-Looking Kid"
ADD REPLY
0
Entering edit mode

I couldn't reproduce this exactly, however I put in some fixes that should prevent it from happening. They'll be in the next build (DiffBind_3.8.2).

If you want to send me your sample sheet and the first three count files, I can confirm that the bug is fixed for your data.

ADD REPLY

Login before adding your answer.

Traffic: 705 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6