Question

DiffBind affinity vs occupancy Heatmaps

shubhibartaria

Hi,

I wanted to know whether the first occupancy heatmap that is generated is based on the scores contained in the BED files, or whether the scores are calculated by DiffBind itself. What if the BED files do not have a score column (only chr, start, end and name are present)? How is the score calculated then?

Also, when I earlier did the affinity analysis (based on counts from BAM files) with only 3 files (MB1, MB2, MB3), the correlation values were different from the ones I got after adding another data set that combined all the other BAM files (JointMB). If this were a pairwise comparison, the values for MB1-MB2, MB1-MB3, etc. shouldn't have changed, right? Only new values relating to JointMB should have been added. So why do the values differ?

Regards

Shubhi

written by shubhibartaria • updated by Rory Stark

Answer · 2016-06-28

Hello Shubhi-

The scores are taken from the peak file. The column used depends on the specified format: the default, "raw", uses the fourth column, while "bed" uses the fifth column (see the help page for dba.peakset() for the supported formats and how to change the default score column). Once read in, the scores are normalized to a 0...1 scale. If there are no scores (the score column is missing), all scores are set to 1. When the binding matrix is constructed, any consensus peak that is not present in a given sample gets a score of -1 for that sample.
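
The behaviour described above can be sketched in base R. This is a minimal illustration, not DiffBind's code: rescale_scores() and binding_matrix() are hypothetical helpers, the 0...1 rescaling is assumed to be a division by the sample's maximum score (DiffBind's exact formula may differ), and peaks are matched by identical coordinates rather than by DiffBind's overlap-based consensus merging.

```r
# Hypothetical helper (not part of DiffBind): rescale one sample's peak scores.
# Assumption: dividing by the sample's maximum score gives the 0...1 scale.
rescale_scores <- function(peaks, score_col) {
  if (score_col > ncol(peaks)) {
    # No score column in the file: every peak gets a score of 1
    return(rep(1, nrow(peaks)))
  }
  scores <- peaks[[score_col]]
  scores / max(scores)
}

# Hypothetical helper: sites x samples matrix, -1 where a sample lacks the site.
# Assumption: sites are matched by identical "chr:start-end" IDs, not by overlap.
binding_matrix <- function(sample_scores) {
  sites <- sort(unique(unlist(lapply(sample_scores, names), use.names = FALSE)))
  m <- matrix(-1, nrow = length(sites), ncol = length(sample_scores),
              dimnames = list(sites, names(sample_scores)))
  for (s in names(sample_scores)) {
    m[names(sample_scores[[s]]), s] <- sample_scores[[s]]
  }
  m
}

site_id <- function(p) paste0(p$chr, ":", p$start, "-", p$end)

# Sample A: BED-style peaks with a score in column 5
a <- data.frame(chr = "chr1", start = c(100, 500), end = c(200, 600),
                name = c("a1", "a2"), score = c(20, 40))
# Sample B: only chr, start, end, name (no score column)
b <- data.frame(chr = "chr1", start = c(100, 900), end = c(200, 950),
                name = c("b1", "b2"))

scores <- list(
  A = setNames(rescale_scores(a, 5), site_id(a)),
  B = setNames(rescale_scores(b, 5), site_id(b))
)
m <- binding_matrix(scores)
print(m)
```

Sample A's scores become 0.5 and 1, sample B's scores are all 1 because it has no score column, and each sample gets -1 at the site it does not share.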

For an affinity analysis (count based), there are many possible scoring methods, generally corresponding to different types of normalization. The man page for dba.count() explains the scoring methods and how to change them. The default score uses the TMM normalization method from the edgeR package. This normalization takes global aspects of the data into account, so changing the set of samples present will change the scores, and hence the correlation, between any two samples.
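
Why adding a sample can move the MB1-MB2 value can be sketched in base R. Two caveats: the scaling here (each sample scaled to the mean library size of all samples present) is a simple stand-in, not edgeR's TMM; and a per-sample multiplicative factor on its own leaves a Pearson correlation unchanged, so the sketch assumes the correlation is computed on log2(normalized count + 1). Whether DiffBind applies exactly these steps is an assumption; the counts and the JointMB column are simulated, and normalized_cor() is a hypothetical helper.

```r
# Hypothetical helper (not DiffBind code): correlation of two samples after a
# normalization that depends on every sample present.
# Assumptions: scale each sample to the mean library size (a stand-in for TMM),
# then optionally take log2(x + 1) before computing the Pearson correlation.
normalized_cor <- function(counts, a, b, log_scale = TRUE) {
  lib_sizes <- colSums(counts)
  factors <- mean(lib_sizes) / lib_sizes  # changes whenever a sample is added
  norm <- sweep(counts, 2, factors, `*`)
  if (log_scale) norm <- log2(norm + 1)
  cor(norm[, a], norm[, b])
}

set.seed(1)
n_sites <- 500
signal <- rexp(n_sites, rate = 1 / 10)  # shared per-site binding strength
counts <- cbind(
  MB1 = rpois(n_sites, signal),
  MB2 = rpois(n_sites, 1.5 * signal),
  MB3 = rpois(n_sites, signal)
)
# Simulated merged sample with a much larger library
with_joint <- cbind(counts, JointMB = rpois(n_sites, 10 * signal))

r_before <- normalized_cor(counts, "MB1", "MB2")
r_after <- normalized_cor(with_joint, "MB1", "MB2")
lin_before <- normalized_cor(counts, "MB1", "MB2", log_scale = FALSE)
lin_after <- normalized_cor(with_joint, "MB1", "MB2", log_scale = FALSE)

cat("log scale:   ", r_before, "->", r_after, "\n")
cat("linear scale:", lin_before, "->", lin_after, "\n")
```

With log_scale = FALSE the two values agree, which is the linear-scaling caveat above; with the log step the MB1-MB2 correlation shifts slightly once JointMB is added, even though the MB1 and MB2 counts are identical in both runs.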

You can see the correlation values by assigning the result of plot() or dba.plotHeatmap() to a variable, as the matrix of correlations is returned invisibly:

> data(tamoxifen_counts)
> correlations <- plot(tamoxifen)
> correlations
       ZR751 ZR752 T47D2 T47D1 BT4742 BT4741 MCF7r2 MCF7r1 MCF72 MCF71 MCF73
ZR751   1.00 0.940  0.46  0.44   0.41   0.43   0.12  0.100  0.23  0.21  0.20
ZR752   0.94 1.000  0.47  0.43   0.42   0.43   0.12  0.096  0.23  0.19  0.21
T47D2   0.46 0.470  1.00  0.77   0.33   0.35   0.31  0.300  0.33  0.30  0.30
T47D1   0.44 0.430  0.77  1.00   0.34   0.39   0.33  0.310  0.36  0.35  0.33
BT4742  0.41 0.420  0.33  0.34   1.00   0.88   0.42  0.450  0.42  0.42  0.44
BT4741  0.43 0.430  0.35  0.39   0.88   1.00   0.45  0.470  0.46  0.46  0.47
MCF7r2  0.12 0.120  0.31  0.33   0.42   0.45   1.00  0.900  0.63  0.64  0.64
MCF7r1  0.10 0.096  0.30  0.31   0.45   0.47   0.90  1.000  0.65  0.66  0.66
MCF72   0.23 0.230  0.33  0.36   0.42   0.46   0.63  0.650  1.00  0.87  0.89
MCF71   0.21 0.190  0.30  0.35   0.42   0.46   0.64  0.660  0.87  1.00  0.89
MCF73   0.20 0.210  0.30  0.33   0.44   0.47   0.64  0.660  0.89  0.89  1.00

> tamoxifen <- dba.count(tamoxifen, peaks=NULL, score=DBA_SCORE_RPKM)
> correlations <- plot(tamoxifen)
> correlations
       ZR752 ZR751 T47D2 T47D1 BT4742 BT4741 MCF7r2 MCF7r1 MCF71 MCF72 MCF73
ZR752   1.00  0.94  0.48  0.43   0.45   0.46   0.14   0.13  0.20  0.23  0.23
ZR751   0.94  1.00  0.47  0.46   0.46   0.49   0.15   0.14  0.24  0.26  0.25
T47D2   0.48  0.47  1.00  0.78   0.40   0.42   0.36   0.35  0.35  0.38  0.36
T47D1   0.43  0.46  0.78  1.00   0.41   0.46   0.38   0.34  0.41  0.43  0.40
BT4742  0.45  0.46  0.40  0.41   1.00   0.90   0.47   0.50  0.48  0.48  0.51
BT4741  0.46  0.49  0.42  0.46   0.90   1.00   0.50   0.51  0.53  0.53  0.54
MCF7r2  0.14  0.15  0.36  0.38   0.47   0.50   1.00   0.89  0.67  0.69  0.70
MCF7r1  0.13  0.14  0.35  0.34   0.50   0.51   0.89   1.00  0.69  0.71  0.71
MCF71   0.20  0.24  0.35  0.41   0.48   0.53   0.67   0.69  1.00  0.89  0.90
MCF72   0.23  0.26  0.38  0.43   0.48   0.53   0.69   0.71  0.89  1.00  0.90
MCF73   0.23  0.25  0.36  0.40   0.51   0.54   0.70   0.71  0.90  0.90  1.00
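
To compare the shared pairs between two runs (for example MB1/MB2/MB3 before and after adding JointMB), align the returned matrices by sample name first, since the row order can change between runs (ZR751 and ZR752 swap places above). A base-R sketch using four samples' values copied from the two tables above; compare_correlations() is a hypothetical helper, not a DiffBind function.

```r
# Correlations from the default (TMM) run, as printed above
s_tmm <- c("ZR751", "ZR752", "T47D2", "T47D1")
cor_tmm <- matrix(c(1.00, 0.94, 0.46, 0.44,
                    0.94, 1.00, 0.47, 0.43,
                    0.46, 0.47, 1.00, 0.77,
                    0.44, 0.43, 0.77, 1.00),
                  nrow = 4, byrow = TRUE, dimnames = list(s_tmm, s_tmm))

# Correlations from the RPKM run, as printed above (note the different order)
s_rpkm <- c("ZR752", "ZR751", "T47D2", "T47D1")
cor_rpkm <- matrix(c(1.00, 0.94, 0.48, 0.43,
                     0.94, 1.00, 0.47, 0.46,
                     0.48, 0.47, 1.00, 0.78,
                     0.43, 0.46, 0.78, 1.00),
                   nrow = 4, byrow = TRUE, dimnames = list(s_rpkm, s_rpkm))

# Hypothetical helper: change in correlation for samples present in both runs,
# indexed by name so that row/column order does not matter
compare_correlations <- function(before, after) {
  common <- intersect(rownames(before), rownames(after))
  after[common, common] - before[common, common]
}

delta <- compare_correlations(cor_tmm, cor_rpkm)
print(round(delta, 2))
```

Here the RPKM run raises ZR751-T47D1 by 0.02 and leaves ZR751-ZR752 unchanged at 0.94.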

Cheers-

Rory