4.4 years ago by
CRUK, Cambridge, UK
1. Input reads.
DiffBind handles control reads in a very limited and simple way. The bSubControl parameter controls whether the reads from the control (Input) track are subtracted from the ChIP reads before normalization. When there are many more control reads than ChIP reads, the control reads can be scaled using the bScaleControl parameter to dba.count(). If there are more control reads than ChIP reads for a sample in a given interval, the read count is set to 1. The idea is that this can dampen peaks that have a high number of corresponding control reads, but really the control reads are best utilized prior to running DiffBind, in two ways: 1) blacklists should be applied to the reads to mask out problematic areas of the genome, and optimally a custom blacklist should be generated for the experiment using the GreyListChIP package; and 2) whatever peak caller you use (e.g., MACS) should utilize the control reads to calculate enrichment statistics for the peaks.
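To make the subtraction logic concrete, here is a minimal base-R sketch of what happens to the counts for one sample. It is not DiffBind's actual code: the helper adjust_for_control() and its arguments are hypothetical names made up for illustration, and the scaling rule (multiply control reads by the ChIP/control library-size ratio, rounded up, only when the control library is the larger one) is an assumption based on how the bScaleControl option is described.

```r
# Hypothetical helper, NOT part of DiffBind: illustrates control subtraction
# for the per-interval read counts of a single sample.
adjust_for_control <- function(chip, control, chip_lib, control_lib,
                               scale_control = TRUE, min_count = 1) {
  # Assumed scaling rule: only scale when the control library is larger,
  # using the ratio of library sizes and rounding up to a whole read.
  if (scale_control && control_lib > chip_lib) {
    control <- ceiling(control * chip_lib / control_lib)
  }
  # Subtract control reads; intervals where control swamps ChIP are floored
  # at min_count (1) rather than going to zero or negative.
  pmax(chip - control, min_count)
}

chip     <- c(120, 40, 15)   # ChIP reads in three intervals
control  <- c(20, 60, 90)    # Input reads in the same intervals
adjusted <- adjust_for_control(chip, control,
                               chip_lib = 1e6, control_lib = 2e6)
print(adjusted)              # values: 110, 10, 1
```

The third interval shows the dampening effect: a region with heavy Input signal ends up with the minimum count of 1 rather than contributing to a peak.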
2. Peak scores. Peak scores can be exported, but are mostly used for plotting data that hasn't had a differential analysis run via dba.analyze(). For example, after calling dba.count(), the heatmaps and PCA plots will use the peak scores. You can see the peak scores using bRetrieve=TRUE. The documentation for dba.count() describes the different scoring methods available (currently 16). These range from raw read counts (with or without control reads subtracted), to variations of RPKM-normalized read counts and TMM-normalized counts (the normalization method used by edgeR), to some scores based on summits (we do recommend using the summits option in dba.count()). When you run an analysis using dba.analyze(), the normalization method used by the underlying differential expression package (DESeq2) will be used for plots and reports (i.e., when
bCounts=TRUE in a call to
Hope this helps!