5.1 years ago by

CRUK, Cambridge, UK

Hello Mei San-

Here is the sequence that occurs when you read in peaksets and plot a correlation heatmap.

First, each peak set is read in separately. The scores are normalized to a 0-1 scale as follows:

> scores <- scores/max(scores)

(if `LowerBetter`

was specified, this is followed by `scores <- 1-scores.`

)

Second, all the peaksets are merged. Any overlapping peak regions are extended to encompass the contiguous peaks, so peaks are consider to overlap if at least one base position overlaps.

Third, each interval in the merged peakset is assigned a score for each sample. If a sample had one peak called that overlaps the merged interval, it receives the (normalized) score for that peak. If a sample has more than one peak called that overlaps a merged interval, it is assigned the maximum value of the overlapping peak scores for that sample. If a sample has no peaks that overlap a merged interval, it is assigned a score of -1.

At this point, there is a binding matrix with rows corresponding to the merged intervals, and columns corresponding to each sample. Each column then is a vector representing the peak scores for that sample (with -1 scores for all the regions that were not identified as peaks for that sample).

To make the correlation heatmap (by calling `dba.plotHeatmap() `

or just `plot()`

, which maps to the same thing), the `log2()`

values of the scores are obtained, then a correlation value is computed for each pair of score vectors. This is a pearson correlation by default, although it is changeable using the `distMethod`

parameter to `dba.plotHeatmap()`

.

The whole matrix of correlations is returned silently by `dba.plotHeatmap()`

, so if you assign this to a value, you can see the correlation values.

Hope this help!

Cheers-

Rory