question about correlation plots generated by Diffbind based on peak occupancy
chintan_80 • 0
@chintan_80-9692
Last seen 8.9 years ago

Dear Dr Stark and Dr Brown,

 

Thank you very much for providing a great computational tool to the scientific community for the differential binding analysis of data from ChIP-seq experiments. I have two questions:

1) Question regarding the correlation plots generated by DiffBind using peak occupancy data (peak scores) from the peak caller output files: The MACS2 output file for a given sample contains q-values only for the peaks that were called in that sample. If -log10(q) values are provided to DiffBind as peak scores in the peak set input files, what score does DiffBind assign to a sample for a peak that was called in other samples but not in that sample?

2) Question regarding counts and control samples: Are log-transformed read counts used for the differential binding analysis? If so, how does DiffBind normalize ChIP read counts with respect to control read counts: does it subtract log-transformed control read counts from log-transformed ChIP read counts, or does it first subtract the actual control read counts from the actual ChIP read counts and then log-transform the result? In other words, are the ChIP counts normalized to a fold value with respect to the control read counts, or do the normalized counts represent the arithmetic difference between ChIP and control read counts?

 

 

Thanks

Chintan Parekh

 

diffbind • 1.4k views

You can get the peak scores from the MACS2 NAME_peaks.narrowPeak file.

(I posted this as an answer by mistake.)

Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 8 weeks ago
Cambridge, UK

Hello Chintan-

1. By default, peak scores are normalized to a 0..1 scale. If a peak is not identified for a sample, it gets a score of -1. So each sample has a same-sized vector of scores: -1 where the peak was not identified for that sample, or a normalized score between 0 and 1 where it was.

2. For the differential binding analysis, DiffBind relies on either edgeR or DESeq2. Both of these packages require actual read counts, so no log or fold change transformations are performed. The only changes are a) the (potentially scaled) number of control reads may be subtracted from the actual number of ChIP reads and b) the minimum count is set to 1. Both of these changes may violate the assumptions that underlie the differential analysis packages. Log and fold transformations are only done for assigning a “score” to un-analyzed count data, mainly for purposes of plotting (heat maps and PCA plots). After a differential analysis has been carried out, the normalized read counts are used (if a contrast is specified to dba.plotHeatmap() or dba.plotPCA()).
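To make the two points above concrete, here is a minimal Python sketch of the arithmetic only. It is not DiffBind's code (DiffBind is an R package), and the function names `occupancy_scores` and `control_adjusted_count` are hypothetical. Two details are assumptions made for illustration: scores are scaled by dividing by the sample's top score (the description only says they end up on a 0..1 scale), and the scaled control count is rounded to keep the result an integer.

```python
def occupancy_scores(called, consensus_peaks):
    """Build one sample's score vector over the consensus peak set.

    called: dict mapping peak id -> raw score (e.g. -log10 q) for the peaks
            called in this sample only.
    Returns one value per consensus peak: a score scaled to 0..1 if the peak
    was called in this sample, or -1 if it was not.
    """
    # ASSUMPTION: scale by the sample's maximum raw score.
    top = max(called.values(), default=0.0)
    scores = []
    for peak in consensus_peaks:
        if peak not in called:
            scores.append(-1.0)          # peak not identified in this sample
        elif top > 0:
            scores.append(called[peak] / top)
        else:
            scores.append(0.0)           # degenerate case: all raw scores are 0
    return scores


def control_adjusted_count(chip_reads, control_reads, control_scale=1.0):
    """Subtract (optionally scaled) control reads from ChIP reads, floor at 1.

    No log or fold transformation is applied; the result stays a raw count.
    """
    # ASSUMPTION: round the scaled control count so the result is an integer.
    scaled_control = round(control_reads * control_scale)
    return max(chip_reads - scaled_control, 1)
```

For example, a sample with calls at two of three consensus peaks gets a three-element vector with -1 in the uncalled position, and a peak where the control has more reads than the ChIP sample ends up with a count of 1 rather than a negative number.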

Hope this helps-
Rory

Udi Landau ▴ 30
@udi-landau-9726
Last seen 4.9 years ago

You can get the peak scores from the MACS2 NAME_peaks.narrowPeak file.
