Question

Question about correlation plots generated by DiffBind based on peak occupancy


chintan_80 • 0


Dear Dr Stark and Dr Brown,

Thank you very much for providing a great computational tool to the scientific community for the differential binding analysis of data from ChIP-seq experiments. I have two questions:

1) Correlation plots generated by DiffBind from peak occupancy data (peak scores) in the peak caller output files: the MACS2 output file for a given sample contains q-values only for the peaks that were called in that sample. If -log10(q) values are provided to DiffBind as peak scores in the peak set input files, what score does DiffBind assign to a sample for a peak that was called in other samples but not in that sample?

2) Counts and control samples: are log-transformed read counts used for the differential binding analysis? If so, how does DiffBind normalize ChIP read counts with respect to control read counts: does it subtract log-transformed control read counts from log-transformed ChIP read counts, or does it first subtract the raw control read counts from the raw ChIP read counts and then log-transform the result? In other words, are the ChIP counts normalized to a fold value relative to the control read counts, or do the normalized counts represent the arithmetic difference between ChIP and control read counts?

Thanks

Chintan Parekh

diffbind • 1.3k views

updated 8.2 years ago by Udi Landau • written 8.2 years ago by chintan_80


You can get the peak scores from the MACS2 NAME_peaks.narrowPeak file.

(I posted this as an answer by mistake.)

Udi Landau • 8.2 years ago

score 1 · Answer 1 · 2016-02-11

Hello Chintan-

1. By default, peak scores are normalized to a 0..1 scale. If a peak is not identified for a sample, it gets a score of -1. So each sample has a same-sized vector of scores: -1 if the peak was not identified for that sample, or a normalized score between 0 and 1 if it was.

2. For the differential binding analysis, DiffBind relies on either edgeR or DESeq2. Both of these packages require actual read counts, so no log or fold change transformations are performed. The only changes are a) the (potentially scaled) number of control reads may be subtracted from the actual number of ChIP reads and b) the minimal count is set to 1. Both of these changes may violate the assumptions that underlie the differential analysis packages. Log and fold transformations are done only to assign a “score” to un-analyzed count data, mainly for purposes of plotting (heat maps and PCA plots). After a differential analysis has been carried out, the normalized read counts are used (if a contrast is specified to dba.plotHeatmap() or dba.plotPCA()).
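To make the two points above concrete, here is a minimal sketch in plain Python. It is not DiffBind's code. The function names, the `MISSING` constant, and the choice to rescale by dividing by the sample's maximum score are assumptions made for illustration; only the -1 placeholder, the 0..1 range, the control subtraction, and the floor of 1 come from the description above.

```python
from typing import Dict, List

MISSING = -1.0  # score given to a peak that was not called in a sample


def occupancy_scores(merged_peaks: List[str],
                     sample_scores: Dict[str, float]) -> List[float]:
    """Return one score per merged peak for a single sample.

    ASSUMPTION: called peaks are rescaled to 0..1 by dividing by the
    sample's maximum score. Peaks not called in this sample get -1.
    """
    top = max(sample_scores.values(), default=0.0)
    vector = []
    for peak in merged_peaks:
        if peak in sample_scores:
            vector.append(sample_scores[peak] / top if top > 0 else 0.0)
        else:
            vector.append(MISSING)
    return vector


def adjusted_count(chip: int, control: int, scale: float = 1.0) -> int:
    """Subtract the (optionally scaled) control reads, with a floor of 1.

    `scale` is a hypothetical scaling factor for the control library.
    """
    return max(1, chip - round(control * scale))


merged = ["peak_a", "peak_b", "peak_c"]      # union of peaks over all samples
sample = {"peak_a": 50.0, "peak_c": 200.0}   # e.g. -log10(q) from MACS2

print(occupancy_scores(merged, sample))      # [0.25, -1.0, 1.0]
print(adjusted_count(120, 30))               # 90
print(adjusted_count(10, 40, scale=0.5))     # 1 (floor applied)
```

Because every sample ends up with a vector of the same length, the vectors can be correlated directly. The counts stay on the raw scale: the control is removed by arithmetic difference, not as a fold value.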

Hope this helps-
Rory

score 0 · Answer 2 · 2016-02-20


Udi Landau ▴ 30


You can get the peak scores from the MACS2 NAME_peaks.narrowPeak file.
