Dear Rory,
Thanks for your effort in this excellent tool! Your tutorial and detailed answers to each post are extremely helpful. However, I feel confused to some results while running DiffBind. To better illustrate my problems, I would like to invite you to go through my case.
library("DiffBind")
samples=read.csv("res.info.csv")
tamoxifen <- dba(sampleSheet=samples)
tamoxifen_out <- dba.peakset(tamoxifen, minOverlap=1, bRetrieve=TRUE, DataType=DBA_DATA_FRAME)
write.table(tamoxifen_out, file="res.all_qvalue.txt", quote=F, sep="\t", row.names=F, col.names=T)
Problem: It should output all peaks found in any samples. However, only peaks found in more than two samples were output in file 'res.all_qvalue.txt' although I set 'minOverlap=1'
tamoxifen_consensus <- dba.peakset(tamoxifen, consensus=DBA_CONDITION, minOverlap=2)
tamoxifen <- dba.count(tamoxifen, peaks=consensus_peaks, bParallel=TRUE,summits=0)
tamoxifen_count1 <- dba.peakset(tamoxifen, bRetrieve=TRUE, DataType=DBA_DATA_FRAME)
write.table(tamoxifen_count1, file="res.consensus_count1.txt", quote=F, sep="\t", row.names=F, col.names=T)
Question: If I understand correctly, it should output the TMM normalized read number and using the full library sizes. Do you agree with me?
#Raw read count
tamoxifen_raw <- dba.count(tamoxifen, peaks=NULL, score=DBA_SCORE_READS)
tamoxifen_counts <- dba.peakset(tamoxifen_raw, minOverlap=1, bRetrieve=TRUE, DataType=DBA_DATA_FRAME)
write.table(tamoxifen_counts, file="res.consensus_raw_count.txt", quote=F, sep="\t", row.names=F, col.names=T)
#RPKM, producing from normalizing raw read count with sequencing depth and gene length
tamoxifen_rpkm <- dba.count(tamoxifen, peaks=NULL, score=DBA_SCORE_RPKM)
tamoxifen_rpkm_counts <- dba.peakset(tamoxifen_rpkm, minOverlap=1, bRetrieve=TRUE, DataType=DBA_DATA_FRAME)
write.table(tamoxifen_rpkm_counts, file="res.consensus_rpkm.txt", quote=F, sep="\t", row.names=F, col.names=T)
tamoxifen <- dba.normalize(tamoxifen)
Question: It is defining the normalization method used in the following differential analysis, default by using full library size on the raw read count. And it is not the same count as that in res.consensus_count1.txt, right? If so, I couldn't obtain the normalized count using for differential analysis. Do you agree with me?
tamoxifen <- dba.contrast(tamoxifen, design="~Condition + Factor",minMembers=2)
tamoxifen <- dba.contrast(tamoxifen,categories=c(DBA_CONDITION), block=DBA_FACTOR,minMembers=2)
Question: Two batch data were produced for some of the conditions and the batch information was stored in the 'Factor' column in csv file. I am wondering if I should specify the batch information in design or in block parameter.
Thanks for your time and effort. Looking forward to hearing from you at your earliest convenience.