Question

Statistics of DiffBind (How bamControl files and peak scores are considered)


yfyuzawa • 0

@yfyuzawa-8396

Last seen 8.7 years ago

United States

We are now studying chromatin modification signatures in normal vs. disease human samples and have found that DiffBind works really well for our purpose.

The manual says that DiffBind uses the same statistics as edgeR to find differentially bound sites, but I'm not sure how DiffBind treats the factors that are specific to ChIP-seq analysis.

1) Input reads
There seems to be a parameter such as bScaleControl to manage them (is that correct?), but what is the basis of the calculation in which the bamControl files are treated?

2) peak scores

My peak files are in BED format, with peak scores in the fifth column. How are they considered during the differential binding (DB) calculation?

Please excuse me if this is already described somewhere, but I'd appreciate any information you could give. Thank you so much in advance.

diffbind • 3.5k views

8.8 years ago • updated 8.7 years ago • yfyuzawa

score 1 · Answer 1 · 2015-07-15

Hello-

1. Input reads. DiffBind handles control reads in a very limited and simple way. In dba.analyze(), the bSubControl parameter controls whether the reads from the control (Input) track are subtracted from the ChIP reads before normalization. When there are many more control reads than ChIP reads, the control reads can be scaled using the bScaleControl parameter to dba.count(). If, for a given sample and interval, there are more control reads than ChIP reads, the read count is set to 1. The idea is that this can dampen peaks that have a high number of corresponding control reads, but really the control reads are best utilized prior to running DiffBind, in two ways: (a) blacklists should be applied to the reads to mask out problematic areas of the genome, and optimally a custom blacklist should be generated for the experiment using the GreyListChIP package; and (b) whatever peak caller you use (e.g. MACS) should utilize the control reads to calculate enrichment statistics for the peaks.
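To make the arithmetic concrete, here is a minimal Python sketch of the control handling described above. It is not DiffBind's actual code. The function name and arguments are hypothetical, and two details are assumptions rather than things stated in this answer: that the scaling factor is the ratio of ChIP to control library size, and that a tie (control equal to ChIP) is also floored at 1.

```python
def adjusted_count(chip_reads, control_reads, chip_lib_size, control_lib_size,
                   scale_control=True, sub_control=True):
    """Illustrative control subtraction for one sample and one interval.

    Hypothetical helper, not part of DiffBind.
    """
    if not sub_control:
        # No subtraction requested: use the ChIP reads as they are.
        return chip_reads

    scale = 1.0
    # Assumption: scale only when the control library is the larger one,
    # using the ratio of total ChIP reads to total control reads.
    if scale_control and control_lib_size > chip_lib_size:
        scale = chip_lib_size / control_lib_size

    scaled_control = control_reads * scale

    # If the control reads match or exceed the ChIP reads, the count is set to 1.
    # (Flooring the tie case at 1 is an assumption.)
    return max(1, round(chip_reads - scaled_control))


# Control library is twice as deep, so control reads are halved before subtraction.
print(adjusted_count(100, 40, 10_000_000, 20_000_000))  # 100 - 20 = 80
# More control reads than ChIP reads in this interval: count is set to 1.
print(adjusted_count(10, 50, 10_000_000, 10_000_000))   # 1
```

The floor at 1 is what dampens intervals dominated by control signal without producing zero or negative counts.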

2. Peak scores. Peak scores can be exported, but they are mostly used for plotting data that hasn't had a differential analysis run via dba.analyze(). For example, after calling dba.count(), the heatmaps and PCA plots will use the peak scores. You can see the peak scores using dba.peakset() with bRetrieve=TRUE. The documentation for dba.count() describes the different scoring methods available (currently 16). These range from raw read counts (with or without control reads subtracted), to variations of RPKM-normalized read counts and TMM-normalized counts (the normalization method used by edgeR), and some scores based on summits (we do recommend using the summits option in dba.count()). When you run an analysis using dba.analyze(), the normalization method used by the underlying differential expression package (edgeR or DESeq2) will be used for plots and reports (i.e., when bCounts=TRUE in a call to dba.report()).
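As an illustration of one family of scores mentioned above, the following Python sketch computes a generic RPKM value (reads per kilobase of interval per million mapped reads) for a single peak. This is the textbook RPKM formula, not DiffBind's implementation; the function name and arguments are hypothetical, and the exact variants DiffBind offers are described in the dba.count() documentation.

```python
def rpkm(reads, interval_width_bp, library_size):
    """Generic RPKM: reads per kilobase of interval per million mapped reads.

    Hypothetical helper for illustration; not part of DiffBind.
    """
    if interval_width_bp <= 0 or library_size <= 0:
        raise ValueError("interval width and library size must be positive")
    # reads / (width in kb) / (library size in millions)
    return reads * 1e9 / (interval_width_bp * library_size)


# 500 reads in a 1 kb peak from a library of 10 million reads.
print(rpkm(500, 1000, 10_000_000))  # 50.0
```

Because the score is divided by both interval width and library size, it is comparable across peaks of different widths and samples of different sequencing depth, which is why such scores suit heatmaps and PCA plots.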

Hope this helps!

-Rory

score 0 · Answer 2 · 2015-08-10


yfyuzawa • 0

@yfyuzawa-8396

Last seen 8.7 years ago

United States

Thanks Rory!

8.7 years ago • yfyuzawa