Dear Dr Stark,
I have a perhaps naïve question about the dba.analyze function.
My experiment consists of three immunoprecipitated sample types:
- WT-stimulated
- WT-unstimulated (some leaky binding)
- KO-stimulated (negative control)
Each was performed in duplicate with its own input, on a small bacterial genome, yielding 3 to 13 million reads per library.
I performed IDR calculations as suggested by the authors: I called peaks for each ChIP.bam against the corresponding pooled-input.bam, and then ran all the other sub-pooling tests. I ended up with, for each condition, a single list of peaks obtained from the pooled-ChIP analysis, which I use with the individual ChIP.bam files and the same pooled-input.bam in the DiffBind sample sheet below:
SampleID,Tissue,Factor,Treatment,Condition,Replicate,Peaks,bamReads,bamControl
IP-6C,WT,TF1,Fullmedia,1,1,peaks/Pooled_6_optimalset.bed,../Bam/IP-6C.sort.bam,../Bam/Input-6-merged.bam
IP-6E,WT,TF1,Fullmedia,1,2,peaks/Pooled_6_optimalset.bed,../Bam/IP-6E.sort.bam,../Bam/Input-6-merged.bam
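
To make the workflow explicit, this is a minimal sketch of my loading step (samples.csv is just a placeholder name for the sheet above):

library(DiffBind)
## build the DBA object from the sample sheet shown above
dbObj <- dba(sampleSheet = "samples.csv")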
As a direct consequence, I usually have two to six times more reads in the input than in the ChIP. By running dba.count with bScaleControl=TRUE I can 'normalize' this disproportion: the control reads are scaled, then subtracted, and finally the ChIP counts are normalized. In dba.analyze, if I understand correctly, the calculations start again from the raw reads, but the bScaleControl parameter is not available there. So I suspect that I could introduce some bias just by subtracting the control reads, couldn't I? How would you suggest I proceed? Is it better to eliminate the subtraction step, or to extract the normalized counts and run DESeq2 outside the package?
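
For concreteness, here is roughly what I am comparing; this is only a sketch (dbObj comes from the loading step above, and the Condition coding and design formula in the DESeq2 part are illustrative assumptions about my own sheet, not anything from the package documentation):

## current workflow: scale the larger control libraries down before
## subtraction, then set up contrasts and test with DESeq2
dbObj <- dba.count(dbObj, bScaleControl = TRUE)
dbObj <- dba.contrast(dbObj, categories = DBA_CONDITION, minMembers = 2)  # duplicates only
dbObj <- dba.analyze(dbObj, method = DBA_DESEQ2)

## alternative 1: count with a score that does not subtract the input
dbObj.nosub <- dba.count(dbObj, score = DBA_SCORE_READS)

## alternative 2: export the count matrix and run DESeq2 by hand
library(DESeq2)
counts <- dba.peakset(dbObj, bRetrieve = TRUE, DataType = DBA_DATA_FRAME)
mat <- pmax(round(as.matrix(counts[, -(1:3)])), 0)  # drop CHR/START/END; force non-negative integers
meta <- data.frame(row.names = colnames(mat),
                   Condition = factor(rep(1:3, each = 2)))  # assumes six IP libraries in sheet order
dds <- DESeq(DESeqDataSetFromMatrix(countData = mat, colData = meta, design = ~ Condition))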
Finally, just a clarification: as I sometimes have up to 10 identical peak ranges with different summits and scores, at which step are they merged together? I deduce it happens from dba.count onwards, but I guess also in dba.peakset, right?
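
For context, this is how I counted the duplicates (a sketch; peaks = 1 simply retrieves the first peakset in the object):

## retrieve one peakset as GRanges and count exact duplicate ranges
pk <- dba.peakset(dbObj, peaks = 1, bRetrieve = TRUE)
sum(duplicated(pk))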
Thank you for your time,
Eva