Dear all,

I would like to plot a sample distance matrix (based on a correlation matrix) computed from gene expression values in an RNA-seq experiment. The samples were analysed with DESeq2, and besides the differential expression analysis, I am plotting the relationships between samples, and between genes and samples.

However, I am confused about which distance is actually being computed.

First, I thought I should give the correlation matrix as input to heatmap.2():

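    library(gplots)  # provides heatmap.2()
    corr.dist <- function(x) as.dist(1 - cor(t(x), method = "spearman"))
    avg <- function(d) hclust(d, method = "average")  # assumed definition: average linkage
    distsRL  <- dist(t(assay(rld)))       # Euclidean distances between samples
    distsRL2 <- corr.dist(t(assay(rld)))  # 1 - Spearman correlation between samples
    matRLDcorr <- as.matrix(distsRL2)
    heatmap.2(matRLDcorr, hclustfun = avg, trace = "none", col = rev(hmcol))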

But in that case the distfun argument still defaults to Euclidean distance (dist()), and I wonder what this distance is applied to. The heatmap.2() help page says:

distfun: function used to compute the distance (dissimilarity) between both rows and columns.

It looks like it computes distances between samples, but I am already providing a distance matrix as input... So why is it necessary to specify a distance function?
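A minimal base-R sketch of what the distance function does when the input is already a distance matrix, using simulated data in place of assay(rld) (the names expr, d_samples, D, d_of_d and d_reused are hypothetical). As far as I can tell from the gplots source, heatmap.2() builds its dendrograms roughly as hclustfun(distfun(x)), so distfun is always applied to the rows (and columns) of whatever matrix is passed in, whether or not that matrix already holds distances:

```r
# Hypothetical stand-in for assay(rld): 200 genes x 6 samples of simulated values
set.seed(1)
expr <- matrix(rnorm(200 * 6), nrow = 200,
               dimnames = list(NULL, paste0("sample", 1:6)))

# Sample-to-sample Euclidean distances, as in dist(t(assay(rld)))
d_samples <- dist(t(expr))
D <- as.matrix(d_samples)  # 6 x 6 symmetric matrix with a zero diagonal

# Default distfun = dist: each row of D is treated as a new 6-dimensional
# observation, so this gives distances between distance profiles
d_of_d <- dist(D)

# An identity-like distfun: as.dist() just reuses the stored values
d_reused <- as.dist(D)

all.equal(as.vector(d_reused), as.vector(d_samples))  # TRUE
all(as.vector(d_of_d) > as.vector(d_samples))         # TRUE
```

For each pair of samples i, j, the recomputed distance includes the terms (D[i,j] - 0)^2 and (0 - D[j,i])^2, so it is at least sqrt(2) times the original distance, and the remaining columns can change the clustering as well.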

Then I tried the three other combinations: inputting the correlation-based matrix with distfun = corr.dist, and inputting the Euclidean distance matrix with either the default distfun or distfun = corr.dist:

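    matRLD     <- as.matrix(distsRL)
    matRLDcorr <- as.matrix(distsRL2)
    heatmap.2(matRLDcorr, distfun = corr.dist, hclustfun = avg)  # 1 - cor matrix, correlation distfun
    heatmap.2(matRLD, hclustfun = avg)                          # Euclidean matrix, default distfun
    heatmap.2(matRLD, distfun = corr.dist, hclustfun = avg)     # Euclidean matrix, correlation distfun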

The four combinations give four different heatmaps (different dendrograms and sample orderings). I suspect I am computing distances of distances...
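A sketch of two ways to make heatmap.2() cluster on the precomputed sample distances rather than on distances between rows of the distance matrix. It assumes gplots is installed and uses simulated data; expr, d_corr, m_corr, dend, hm1 and hm2 are hypothetical names, and avg is defined here as average-linkage hclust to match the helper used above:

```r
library(gplots)  # heatmap.2()

# Hypothetical stand-in for assay(rld): 200 genes x 6 samples of simulated values
set.seed(1)
expr <- matrix(rnorm(200 * 6), nrow = 200,
               dimnames = list(NULL, paste0("sample", 1:6)))

# Precomputed sample-to-sample distance: 1 - Spearman correlation
d_corr <- as.dist(1 - cor(expr, method = "spearman"))
m_corr <- as.matrix(d_corr)
avg <- function(d) hclust(d, method = "average")

# Option 1: distfun = as.dist, so heatmap.2 clusters on the stored values
# instead of computing distances between the rows of m_corr
hm1 <- heatmap.2(m_corr, distfun = as.dist, hclustfun = avg, trace = "none")

# Option 2: build the dendrogram from the precomputed distances yourself
dend <- as.dendrogram(avg(d_corr))
hm2 <- heatmap.2(m_corr, Rowv = dend, Colv = dend, trace = "none")
```

Both options use the same tree. As far as I can tell, option 1 still lets heatmap.2 reorder branches by row/column means (its default reorderfun), so the leaf order can differ slightly from option 2, which uses the dendrogram as given.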

To sum up, I am now confused about how to get a sample-to-sample distance matrix from a correlation matrix. Moreover, how can I get a color key ranging from -1 to 1?
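One possible approach, sketched under the assumption that gplots is installed: plot the correlation matrix itself, turn it into sample-to-sample distances inside distfun with as.dist(1 - cor), and fix the colour breaks at -1 to 1. The data are simulated and expr, corMat, corDist, brks, cols and hm are hypothetical names; distfun, hclustfun, breaks and col are heatmap.2() arguments:

```r
library(gplots)  # heatmap.2()

# Hypothetical stand-in for assay(rld): 200 genes x 6 samples of simulated values
set.seed(2)
expr <- matrix(rnorm(200 * 6), nrow = 200,
               dimnames = list(NULL, paste0("sample", 1:6)))

# Sample-by-sample Spearman correlations: these are the values to display
corMat <- cor(expr, method = "spearman")

# Correlation -> sample-to-sample distance, d = 1 - cor (range 0 to 2).
# corMat is symmetric, so rows and columns get the same distances.
corDist <- function(m) as.dist(1 - m)

# Fixed breaks so the colour key spans -1 to 1 (one more break than colours)
brks <- seq(-1, 1, length.out = 101)
cols <- colorRampPalette(c("blue", "white", "red"))(100)

hm <- heatmap.2(corMat,
                distfun = corDist,  # cluster on 1 - cor, not on dist(corMat)
                hclustfun = function(d) hclust(d, method = "average"),
                breaks = brks, col = cols, trace = "none")
```

heatmap.2() requires exactly one more break than colours; with the breaks fixed at -1 and 1, the key should show the full correlation range even if the observed correlations are all positive.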

Thank you in advance for your help.

Thank you for your answer, Michael.

I have seen pheatmap() and will probably try it, but I prefer the simpler heatmap.2() plot.

Nevertheless, my question remains:

Since the input is a matrix whose elements are already correlations (or Euclidean distances), what are the distance types specified in clustering_distance_rows and clustering_distance_cols used for? And could we give different distances to clustering_distance_rows and clustering_distance_cols?

I am probably missing something here... From my understanding, only the agglomeration steps should be performed on the matrix, using the chosen linkage method. For example, with average linkage, we should not have to compute distances/correlations anymore.
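A base-R sketch that checks this understanding on simulated data (expr, d, hc, D and grp are hypothetical names): once the distances exist as a dist object, hclust() with average linkage only merges clusters, and each merge height is a mean of the stored pairwise distances:

```r
# Hypothetical stand-in for assay(rld): 200 genes x 6 samples of simulated values
set.seed(3)
expr <- matrix(rnorm(200 * 6), nrow = 200,
               dimnames = list(NULL, paste0("sample", 1:6)))

# The distances are computed once, here, from the correlations
d <- as.dist(1 - cor(expr, method = "spearman"))

# Average linkage then only agglomerates: the distance between two clusters
# is the mean of the stored pairwise distances between their members, so
# nothing is recomputed from the expression values
hc <- hclust(d, method = "average")

all.equal(hc$height[1], min(d))  # TRUE: the first merge joins the closest pair

# The last merge height equals the mean distance between the two final groups
D <- as.matrix(d)
grp <- cutree(hc, k = 2)
c(mean(D[grp == 1, grp == 2]), tail(hc$height, 1))  # should agree up to rounding
```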

In my code, I am plotting the correlations, which range over [-1, 1], and the clustering distances are specified as dist(x, y) = 1 - cor(x, y).

From our workflow:

In order to plot the sample distance matrix with the rows/columns arranged by the distances in our distance matrix, we manually provide sampleDists to the clustering_distance argument of the pheatmap function. Otherwise the pheatmap function would assume that the matrix contains the data values themselves, and would calculate distances between the rows/columns of the distance matrix, which is not desired.

http://www.bioconductor.org/help/workflows/rnaseqGene/#sample-distances
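A sketch of the same idea applied to a correlation matrix, assuming pheatmap is installed and using simulated data (expr, corMat, corDists and res are hypothetical names). As in the workflow, a dist object is passed to clustering_distance_rows and clustering_distance_cols, so pheatmap should use it directly for hclust() while the cells show the correlations on a fixed -1 to 1 scale:

```r
library(pheatmap)

# Hypothetical stand-in for assay(rld): 200 genes x 6 samples of simulated values
set.seed(4)
expr <- matrix(rnorm(200 * 6), nrow = 200,
               dimnames = list(NULL, paste0("sample", 1:6)))

corMat   <- cor(expr, method = "spearman")  # values shown in the cells, in [-1, 1]
corDists <- as.dist(1 - corMat)             # distances used for the dendrograms

# A "dist" object in clustering_distance_rows/cols is used as-is, like
# sampleDists in the workflow. Rows and columns accept separate dist objects,
# but using the same one gives rows and columns the same order.
res <- pheatmap(corMat,
                clustering_distance_rows = corDists,
                clustering_distance_cols = corDists,
                clustering_method = "average",
                breaks = seq(-1, 1, length.out = 101),
                color = colorRampPalette(c("blue", "white", "red"))(100))
```

The breaks vector is one element longer than the colour vector, which is what pheatmap expects when both are given.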