Hello Shubhi-
The scores are taken from the peak file. The column chosen depends on the specified format. The default, "raw", uses the fourth column, while "bed" uses the fifth column (see the help page for dba.peakset() for supported formats and how to change the default score column). Once read in, the scores are normalized to a 0...1 scale. If there are no scores (score column missing), all the scores are set to 1. When the binding matrix is constructed, any consensus peak not present for a given sample gets a score of -1.
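As a rough illustration of the peak-score handling described above, here is a minimal base-R sketch. It does not call DiffBind: the peak names and values are made up, and dividing by the maximum is only an assumption about how scores are brought onto the 0...1 scale (the package internals may differ).

```r
# Hypothetical raw peak scores for one sample (not from any real peak file)
raw_scores <- c(peakA = 12, peakB = 48, peakC = 30)

# Assumption: rescale by the maximum so that all scores fall in 0...1
scale01 <- function(x) {
  if (length(x) == 0 || all(is.na(x))) return(x)
  x / max(x, na.rm = TRUE)
}
scaled <- scale01(raw_scores)

# A sample whose peak file has no score column: every peak scores 1
no_score_sample <- setNames(rep(1, 3), c("peakA", "peakB", "peakC"))

# Hypothetical consensus peakset; peakD was not called in this sample
consensus <- c("peakA", "peakB", "peakC", "peakD")

# One column of the binding matrix: start at -1 (peak absent),
# then fill in the scaled scores for the peaks this sample does have
binding_col <- setNames(rep(-1, length(consensus)), consensus)
binding_col[names(scaled)] <- scaled

print(binding_col)
```

The -1 entries make it easy to tell "peak not called in this sample" apart from "peak called with a low score".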
For an affinity analysis (count based), there are many possible scoring methods, generally corresponding to different types of normalization. The man page for dba.count() explains the scoring methods and how to change them. The default score uses the TMM normalization method in the edgeR package. The normalization method takes global aspects of the data into account, so changing the samples present will change the scores, and hence the correlation, between any two samples.
You can see the correlation values by assigning the result of plot() or dba.plotHeatmap() to a variable, as the matrix of correlations is returned invisibly:
> data(tamoxifen_counts)
> correlations <- plot(tamoxifen)
> correlations
       ZR751 ZR752 T47D2 T47D1 BT4742 BT4741 MCF7r2 MCF7r1 MCF72 MCF71 MCF73
ZR751   1.00 0.940  0.46  0.44   0.41   0.43   0.12  0.100  0.23  0.21  0.20
ZR752   0.94 1.000  0.47  0.43   0.42   0.43   0.12  0.096  0.23  0.19  0.21
T47D2   0.46 0.470  1.00  0.77   0.33   0.35   0.31  0.300  0.33  0.30  0.30
T47D1   0.44 0.430  0.77  1.00   0.34   0.39   0.33  0.310  0.36  0.35  0.33
BT4742  0.41 0.420  0.33  0.34   1.00   0.88   0.42  0.450  0.42  0.42  0.44
BT4741  0.43 0.430  0.35  0.39   0.88   1.00   0.45  0.470  0.46  0.46  0.47
MCF7r2  0.12 0.120  0.31  0.33   0.42   0.45   1.00  0.900  0.63  0.64  0.64
MCF7r1  0.10 0.096  0.30  0.31   0.45   0.47   0.90  1.000  0.65  0.66  0.66
MCF72   0.23 0.230  0.33  0.36   0.42   0.46   0.63  0.650  1.00  0.87  0.89
MCF71   0.21 0.190  0.30  0.35   0.42   0.46   0.64  0.660  0.87  1.00  0.89
MCF73   0.20 0.210  0.30  0.33   0.44   0.47   0.64  0.660  0.89  0.89  1.00
> tamoxifen <- dba.count(tamoxifen, peaks=NULL, score=DBA_SCORE_RPKM)
> correlations <- plot(tamoxifen)
> correlations
       ZR752 ZR751 T47D2 T47D1 BT4742 BT4741 MCF7r2 MCF7r1 MCF71 MCF72 MCF73
ZR752   1.00  0.94  0.48  0.43   0.45   0.46   0.14   0.13  0.20  0.23  0.23
ZR751   0.94  1.00  0.47  0.46   0.46   0.49   0.15   0.14  0.24  0.26  0.25
T47D2   0.48  0.47  1.00  0.78   0.40   0.42   0.36   0.35  0.35  0.38  0.36
T47D1   0.43  0.46  0.78  1.00   0.41   0.46   0.38   0.34  0.41  0.43  0.40
BT4742  0.45  0.46  0.40  0.41   1.00   0.90   0.47   0.50  0.48  0.48  0.51
BT4741  0.46  0.49  0.42  0.46   0.90   1.00   0.50   0.51  0.53  0.53  0.54
MCF7r2  0.14  0.15  0.36  0.38   0.47   0.50   1.00   0.89  0.67  0.69  0.70
MCF7r1  0.13  0.14  0.35  0.34   0.50   0.51   0.89   1.00  0.69  0.71  0.71
MCF71   0.20  0.24  0.35  0.41   0.48   0.53   0.67   0.69  1.00  0.89  0.90
MCF72   0.23  0.26  0.38  0.43   0.48   0.53   0.69   0.71  0.89  1.00  0.90
MCF73   0.23  0.25  0.36  0.40   0.51   0.54   0.70   0.71  0.90  0.90  1.00
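Note that the rows and columns come back in clustering order, which differs between the two outputs above (for example ZR751/ZR752 and MCF71/MCF72 swap places), so it is safer to look up a pair of samples by name than by position. A minimal base-R sketch, using a three-sample excerpt of the values printed above in place of the full matrices (the variable names tmm, rpkm, and delta are just for illustration):

```r
# Excerpt of the default (TMM-based) correlations, in the order printed above
ids_tmm <- c("ZR751", "ZR752", "MCF7r1")
tmm <- matrix(c(1.00, 0.94,  0.10,
                0.94, 1.00,  0.096,
                0.10, 0.096, 1.00),
              nrow = 3, byrow = TRUE,
              dimnames = list(ids_tmm, ids_tmm))

# Excerpt of the RPKM-based correlations; note the different sample order
ids_rpkm <- c("ZR752", "ZR751", "MCF7r1")
rpkm <- matrix(c(1.00, 0.94, 0.13,
                 0.94, 1.00, 0.14,
                 0.13, 0.14, 1.00),
               nrow = 3, byrow = TRUE,
               dimnames = list(ids_rpkm, ids_rpkm))

# Look up one pair by sample name under each scoring method
tmm["ZR751", "MCF7r1"]
rpkm["ZR751", "MCF7r1"]

# Put the second matrix into the same order as the first before comparing
rpkm_aligned <- rpkm[rownames(tmm), colnames(tmm)]
delta <- rpkm_aligned - tmm
print(round(delta, 3))
```

With the real objects, the same indexing works directly on the matrices returned by plot() or dba.plotHeatmap().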
Cheers-
Rory