Dear all,
I would like to plot a distance matrix (a correlation matrix) obtained from correlations between gene expression values from an RNA-Seq experiment. The samples were analysed using DESeq2 and, besides the differential expression analysis, I am plotting the relationships between samples, and between genes and samples.
However, I am confused about the distance.
First, I thought I should give the correlation matrix as input to heatmap.2():
corr.dist = function(x) as.dist(1-cor( t(x), method="spearman") )
distsRL = dist( t( assay(rld) ) )
distsRL2 = corr.dist( t( assay(rld) ) )
heatmap.2(matRLDcorr, hclustfun=avg, trace="none", col = rev(hmcol))
But doing so, the distance function falls back to its default, Euclidean distance, and I wonder what this distance is applied to. The documentation says:
distfun: function used to compute the distance (dissimilarity) between both rows and columns.
It looks like it computes the distances between samples, but we should provide a distance matrix as input... So why is it necessary to give a distance?
Then, I tried the 3 other possibilities: inputting the correlation matrix with dist=corr.dist, and inputting a "Euclidean distance" matrix with dist left as default or set to corr.dist:
matRLD = as.matrix( distsRL )
matRLDcorr = as.matrix( distsRL2 )
heatmap.2(matRLDcorr, dist=corr.dist, hclustfun=avg)
heatmap.2(matRLD, hclustfun=avg)
heatmap.2(matRLD, dist=corr.dist, hclustfun=avg)
The 4 possibilities give 4 distinct distance matrices. I guess I am computing distances of distances...
To sum up, now I am confused about how to get a sample-to-sample distance matrix from a correlation matrix. Moreover, how can I get a color key ranging from -1 to 1?
Thank you in advance for your help
Thank you for your answer, Michael.
I have seen pheatmap(), will probably try it, but I prefer the simpler heatmap.2() plot.
Nevertheless, my question remains:
Since the input is a matrix whose elements are already correlations (or Euclidean distances), what will the distance types specified in clustering_distance_rows and clustering_distance_cols be used for? And can we give different distances to clustering_distance_rows and clustering_distance_cols?
I am probably missing something here... From my understanding, only the agglomeration steps should be performed on the matrix, using a chosen method. For example, with average linkage, we don't have to compute distances/correlations anymore.
As you can see in my code below, I am plotting the correlation [-1,1] and the clustering distances are specified as dist(x,y) = 1-cor(x,y).
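The setup described here (correlations shown on a key from -1 to 1, clustering done on 1 - cor) could look roughly like the sketch below. It is an illustration under assumptions rather than the code referred to above: a random toy matrix stands in for assay(rld), the names expr, sample_cor, sample_dist, key_breaks and key_cols are made up, and the blue-white-red palette is an arbitrary choice. The plotting call only runs if gplots is installed.

```r
# Toy stand-in for assay(rld): 200 genes x 6 samples (hypothetical data).
set.seed(1)
expr <- matrix(rnorm(200 * 6), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("sample", 1:6)))

# Sample-to-sample Spearman correlation; cor() correlates the columns.
sample_cor <- cor(expr, method = "spearman")

# Turn correlations into a dissimilarity and cluster once, outside the plot.
sample_dist <- as.dist(1 - sample_cor)
hc <- hclust(sample_dist, method = "average")

# Fixed breaks so that the colour key spans -1 to 1 regardless of the data.
n_col <- 50
key_breaks <- seq(-1, 1, length.out = n_col + 1)
key_cols <- colorRampPalette(c("blue", "white", "red"))(n_col)

if (requireNamespace("gplots", quietly = TRUE)) {
  # The matrix being drawn is the correlation matrix; the ordering comes
  # from the precomputed dendrogram, so no distance is recomputed here.
  gplots::heatmap.2(sample_cor,
                    Rowv = as.dendrogram(hc), Colv = as.dendrogram(hc),
                    breaks = key_breaks, col = key_cols,
                    trace = "none")
}
```

Passing the dendrogram through Rowv and Colv keeps the values that are drawn (correlations) separate from the values that are clustered (1 - correlation), which is where the confusion tends to arise.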
From our workflow:
In order to plot the sample distance matrix with the rows/columns arranged by the distances in our distance matrix, we manually provide sampleDists to the clustering_distance argument of the pheatmap function. Otherwise the pheatmap function would assume that the matrix contains the data values themselves, and would calculate distances between the rows/columns of the distance matrix, which is not desired.
http://www.bioconductor.org/help/workflows/rnaseqGene/#sample-distances
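To make the point of that paragraph concrete, here is a small sketch, assuming a random toy matrix in place of assay(rld) (the names expr, dist_of_dist and reused are hypothetical; sampleDists follows the workflow). It compares what happens when the distance matrix is treated as data with what happens when it is reused as-is. The plotting calls are guarded and only run if pheatmap or gplots is installed.

```r
# Toy stand-in for assay(rld): 200 genes x 6 samples (hypothetical data).
set.seed(1)
expr <- matrix(rnorm(200 * 6), nrow = 200,
               dimnames = list(NULL, paste0("sample", 1:6)))

# Euclidean distances between samples, and the same values as a square matrix.
sampleDists <- dist(t(expr))
sampleDistMatrix <- as.matrix(sampleDists)

# Default behaviour of a heatmap function: the input is treated as data,
# so distances are computed between the rows of the distance matrix.
dist_of_dist <- dist(sampleDistMatrix)

# Desired behaviour: reuse the distances that were already computed.
reused <- as.dist(sampleDistMatrix)

# The two give different inputs to hclust(), hence possibly different trees.
hc_default <- hclust(dist_of_dist, method = "average")
hc_wanted  <- hclust(reused, method = "average")

if (requireNamespace("pheatmap", quietly = TRUE)) {
  # Rows and columns take separate arguments, so they could also differ.
  pheatmap::pheatmap(sampleDistMatrix,
                     clustering_distance_rows = sampleDists,
                     clustering_distance_cols = sampleDists,
                     clustering_method = "average")
}

if (requireNamespace("gplots", quietly = TRUE)) {
  # Equivalent idea for heatmap.2(): make distfun hand back the input as a
  # dist object instead of computing new distances.
  gplots::heatmap.2(sampleDistMatrix,
                    distfun = function(m) as.dist(m),
                    hclustfun = function(d) hclust(d, method = "average"),
                    trace = "none")
}
```

In both calls the clustering step then only performs the agglomeration on the supplied distances, which is the behaviour asked about above.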