Dear all, and Rory,
I would very much appreciate if you could please advise how I shall start the DiffBind workflows from a file with peak counts (binding matrix) as an input into DiffBind.
The described workflow in the vignette uses the bam files as primary input. I do not have the bam files per sample, only the counts for a consensus peak set in ~ 20 samples.
The R code that is listed on how to construct a DBA object from count matrix produces an error on my machine :
data(tamoxifen_counts)
tamoxifen <- dba.count(tamoxifen, peaks=NULL, score=DBA_SCORE_READS)
counts <- dba.peakset(tamoxifen, bRetrieve = TRUE)
newDBA <- NULL
countMatrix <- mcols(counts)
for(sample in 1:ncol(countMatrix)) {
newDBA <- dba.peakset(newDBA, peaks=counts,
sampID=colnames(countMatrix)[sample],
counts=countMatrix[,sample])
}
Error in if (treads) { : argument is not interpretable as logical
The same error is produced shall I use the R code :
data(tamoxifen_counts)
tamoxifen <- dba.count(tamoxifen, peaks=NULL, score=DBA_SCORE_READS)
counts <- dba.peakset(tamoxifen, bRetrieve = TRUE)
newDBA <- NULL
countMatrix <- mcols(counts)
peaks <- dba.peakset(tamoxifen, bRetrieve = TRUE, DataType=DBA_DATA_FRAME)
counts1 <- cbind(peaks[,1:3],countMatrix[,1])
counts2 <- cbind(peaks[,1:3],countMatrix[,2])
write.table(counts1, "samp1", row.names=FALSE, col.names=FALSE,sep="\t",
quote=FALSE)
write.table(counts2, "samp2", row.names=FALSE, col.names=FALSE,sep="\t",
quote=FALSE)
newDBA <- dba.peakset(NULL, peaks=peaks, sampID="samp1",
condition="cond1", counts="samp1")
newDBA <- dba.peakset(newDBA, peaks=peaks, sampID="samp2",
condition="cond2", counts="samp2")
Error in if (treads) { : argument is not interpretable as logical
It looks that only the 1st sample is read shall I use dba.peakset() :
peaks <- dba.peakset(tamoxifen, bRetrieve = TRUE, DataType=DBA_DATA_FRAME)
head(peaks)
CHR START END BT4741 BT4742 MCF71 MCF72 MCF73 T47D1 T47D2 MCF7r1 MCF7r2 ZR751 ZR752
1 chr18 90841 91241 2 1 0 0 0 0 1 0 0 22 35
2 chr18 111395 111795 21 19 127 63 112 14 13 67 52 60 124
3 chr18 150222 150622 15 13 22 11 27 0 6 17 9 12 35
4 chr18 150807 151207 11 8 4 6 8 1 5 25 18 2 17
5 chr18 189192 189592 11 22 20 12 38 3 6 82 54 16 42
6 chr18 199821 200221 12 16 1 4 6 5 20 12 14 29 55
dba.peakset(NULL, peaks=peaks)
1 Samples, 2845 sites in matrix:
ID Intervals
1 1 2845
If I use the alternative method, that is to make a SampleSheet file with an additional column called "Counts", these are the error messages :
Displaying the content of the SampleSheet file :
read.delim("abc_manifest.csv", sep=",", header=T, stringsAsFactors=F)
SampleID Tissue Factor Condition Treatment Replicate Counts
1 BT4741 BT474 ER Resistant Full-Media 1 abc1s.txt
2 BT4742 BT474 ER Resistant Full-Media 2 abc2s.txt
3 MCF71 MCF7 ER Responsive Full-Media 1 abc3s.txt
4 MCF72 MCF7 ER Responsive Full-Media 2 abc4s.txt
When I create a dba object, the error is :
abc = dba(sampleSheet="abc_manifest.csv")
BT4741 BT474 ER Resistant Full-Media 1 counts
BT4742 BT474 ER Resistant Full-Media 2 counts
Error in if (nrow(pv$merged) != nrow(pv$binding)) { :
argument is of length zero
The files with the counts have the format :
head(read.delim("abc1s.txt", header=F, stringsAsFactors=F, sep="\t"))
V1 V2 V3 V4
1 chr18 90841 91241 1
2 chr18 111395 111795 19
3 chr18 150222 150622 13
4 chr18 150807 151207 8
5 chr18 189192 189592 22
6 chr18 199821 200221 16
head(read.delim("abc2s.txt", header=F, stringsAsFactors=F, sep="\t"))
V1 V2 V3 V4
1 chr18 90841 91241 0
2 chr18 111395 111795 127
3 chr18 150222 150622 22
4 chr18 150807 151207 4
5 chr18 189192 189592 20
6 chr18 199821 200221 1
head(read.delim("abc3s.txt", header=F, stringsAsFactors=F, sep="\t"))
V1 V2 V3 V4
1 chr18 90841 91241 0
2 chr18 111395 111795 127
3 chr18 150222 150622 22
4 chr18 150807 151207 4
5 chr18 189192 189592 20
6 chr18 199821 200221 1
head(read.delim("abc4s.txt", header=F, stringsAsFactors=F, sep="\t"))
V1 V2 V3 V4
1 chr18 90841 91241 0
2 chr18 111395 111795 128
3 chr18 150222 150622 22
4 chr18 150807 151207 4
5 chr18 189192 189592 201
6 chr18 199821 200221 1
Thank you,
Bogdan
> R.Version()
$platform
[1] "x86_64-apple-darwin17.0"
$arch
[1] "x86_64"
$os
[1] "darwin17.0"
$system
[1] "x86_64, darwin17.0"
$status
[1] ""
$major
[1] "4"
$minor
[1] "2.1"
$year
[1] "2022"
$month
[1] "06"
$day
[1] "23"
$`svn rev`
[1] "82513"
$language
[1] "R"
$version.string
[1] "R version 4.2.1 (2022-06-23)"
$nickname
[1] "Funny-Looking Kid"
Thank you Rory.
Is there any differential binding algorithm that you would recommend when having ChIP-seq data sets that display major differences in the peak intensities ?
We do have a set of ChIP-seq data with the following replicates :
a) 2 replicates (H3K27) that show very high intensity peaks (control, DMSO)
b) 2 replicates where the histone mark (H3K27) is erased (treatment, + drug treatment)
I would like to call the differential binding between a) and b).
As DESeq2 and edgeR assume that most of the peak signals are not increased or decreased after drug treatment, I guess that the use of DEseq2 or edgeR might not be entirely appropriate ?
I look forward to hearing from you. Thanks a lot !
Bogdan
Dear Rory,
I think I have a similar problem, but it did not go away with the new version.
The csv table has this format
and the bed files have the counts in the fourth column
Thanks,
Cesar
I couldn't reproduce this exactly, however I put in some fixes that should prevent it from happening. They'll be in the next build (
DiffBind_3.8.2
).If you want to send me your sample sheet and the first three count files, I can confirm that the bug is fixed for your data.