Question

Sample-to-sample distance matrix using heatmap.2

Jane Merlevede

Dear all,

I would like to plot a sample-to-sample distance matrix (based on a correlation matrix) derived from correlations of gene expression in an RNA-seq experiment. The samples were analysed with DESeq2, and besides the differential expression analysis, I am plotting the relationships between samples, and between genes and samples.

However, I am confused about which distance is actually being used.

First, I thought I should give the correlation matrix as input to heatmap.2():

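library(gplots)  # provides heatmap.2()

# 1 - Spearman correlation between the rows of x
corr.dist <- function(x) as.dist(1 - cor(t(x), method = "spearman"))
# average-linkage clustering, used as hclustfun below
avg <- function(d) hclust(d, method = "average")

distsRL  <- dist(t(assay(rld)))       # Euclidean distances between samples
distsRL2 <- corr.dist(t(assay(rld)))  # 1 - correlation between samples
matRLDcorr <- as.matrix(distsRL2)

heatmap.2(matRLDcorr, hclustfun = avg, trace = "none", col = rev(hmcol))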

But in that case the distance function (distfun) defaults to the Euclidean distance, and I wonder what this distance is applied to. The heatmap.2 documentation says:

distfun: function used to compute the distance (dissimilarity) between both rows and columns.

It looks like it computes distances between samples, but I am already providing a distance matrix as input, so why is a distance function needed at all?

Then I tried the three other combinations: passing the correlation-based matrix with corr.dist as the distance function, and passing the Euclidean distance matrix with either the default distance function or corr.dist:

matRLD     <- as.matrix(distsRL)    # Euclidean distances between samples
matRLDcorr <- as.matrix(distsRL2)   # 1 - correlation between samples

heatmap.2(matRLDcorr, distfun = corr.dist, hclustfun = avg)
heatmap.2(matRLD, hclustfun = avg)
heatmap.2(matRLD, distfun = corr.dist, hclustfun = avg)

The four combinations give four different results. I suspect I am computing distances of distances.
In short, I am confused about how to get a sample-to-sample distance matrix from a correlation matrix. Also, how can I get a color key ranging from -1 to 1?

Thank you in advance for your help

Jane Merlevede, 8.5 years ago (updated 8.4 years ago)

Thank you for your answer, Michael.

I have seen pheatmap() and will probably try it, but I prefer the simpler heatmap.2() plot.

Nevertheless, my question remains:

Since the input matrix already contains correlations (or Euclidean distances), what are the distances specified in clustering_distance_rows and clustering_distance_cols used for? And could we give different distances to clustering_distance_rows and clustering_distance_cols?

I am probably missing something here. As I understand it, only the agglomeration steps should be performed on the matrix, using the chosen method. For example, with average linkage, we should not have to compute distances or correlations again.
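A minimal base-R sketch of the point raised here, using a small simulated matrix as a hypothetical stand-in for transformed counts: once a `dist` object exists, `hclust()` works from those dissimilarities alone, and the linkage method only controls how clusters are merged.

```r
# Hypothetical toy data: 200 genes (rows) x 6 samples (columns)
set.seed(1)
mat <- matrix(rnorm(200 * 6), nrow = 200,
              dimnames = list(NULL, paste0("s", 1:6)))

# Sample-to-sample dissimilarities: 1 - Spearman correlation between columns
d <- as.dist(1 - cor(mat, method = "spearman"))

# Average linkage merges clusters using only the values stored in d;
# no distances or correlations are recomputed from mat
hc <- hclust(d, method = "average")

print(hc$labels)  # sample names carried over from d
print(hc$order)   # leaf order a heatmap would use for rows and columns
```

In pheatmap, `clustering_distance_rows` and `clustering_distance_cols` accept such a `dist` object directly, and since they are separate arguments, rows and columns can be given different distances.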

Jane Merlevede, 8.4 years ago

As you can see in my code below, I am plotting the correlations, which range over [-1, 1], and the clustering distances are specified as dist(x, y) = 1 - cor(x, y).
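For reference, the clustering distance described here and the range it takes can be written out as follows (standard definitions, added for illustration):

```latex
d(x, y) = 1 - \rho(x, y), \qquad
-1 \le \rho(x, y) \le 1 \;\Longrightarrow\; 0 \le d(x, y) \le 2
```

With `method = "spearman"`, ρ is the Pearson correlation of the ranks of x and y. The heatmap colors show ρ itself on [-1, 1], while the dendrogram is built from d on [0, 2], so the color key and the clustering distance are on different scales.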

From our workflow:

In order to plot the sample distance matrix with the rows/columns arranged by the distances in our distance matrix, we manually provide sampleDists to the clustering_distance argument of the pheatmap function. Otherwise the pheatmap function would assume that the matrix contains the data values themselves, and would calculate distances between the rows/columns of the distance matrix, which is not desired.

http://www.bioconductor.org/help/workflows/rnaseqGene/#sample-distances
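The effect described in this quote can be reproduced in base R with a small simulated matrix (hypothetical data). heatmap.2 behaves the same way, since its default `distfun` is `dist` applied to the rows of whatever matrix it is given. Treating a distance matrix as data and calling `dist()` on it yields a different set of distances, which is the "distances of distances" issue from the question.

```r
# Hypothetical toy data: 100 genes x 5 samples
set.seed(2)
mat <- matrix(rnorm(100 * 5), nrow = 100,
              dimnames = list(NULL, paste0("s", 1:5)))

# Intended sample-to-sample distances (Euclidean, between columns of mat)
sampleDists <- dist(t(mat))

# Treating the distance matrix as data: each row of it becomes a
# 5-dimensional "profile", and dist() compares those profiles
distOfDists <- dist(as.matrix(sampleDists))

# Pair by pair, the second set differs from the first. Each new value is
# at least sqrt(2) times the original distance, because the pair's own
# distance appears twice in the difference between the two rows.
print(cbind(original = as.vector(sampleDists),
            dist_of_dist = as.vector(distOfDists)))
```

Passing `sampleDists` itself as the clustering distance, as the workflow does, avoids this second `dist()` step.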

Michael Love, 8.4 years ago

score 0 · Answer 1 · 2017-09-21

In the current DESeq2 vignette and workflow we use pheatmap, which is an easier interface in my opinion.

The following code works for the dds constructed in the DESeq2 vignette, and has a key mapping -1 to 1 from red to blue.

Note: this produces a plot that is entirely blue, because the correlation between normalized, transformed sample counts is typically going to be very high unless you compare across different cell types. For this reason, I prefer the Euclidean distances calculated on the normalized, variance stabilized data, as shown in the vignette and workflow.

Nevertheless, here is the code for making an entirely blue correlation plot:

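library("DESeq2")
library("pheatmap")
vsd <- vst(dds, blind=FALSE)                  # variance-stabilized counts
corrs <- cor(assay(vsd), method="spearman")   # sample-by-sample correlations
corr.dists <- as.dist(1 - corrs)              # distances used only for clustering
# 99 colors need 100 breaks; the key spans -1 (red) to 1 (blue)
colors <- colorRampPalette(c("red","white","blue"))(99)
pheatmap(corrs, breaks=seq(from=-1, to=1, length.out=100),
         clustering_distance_rows=corr.dists,
         clustering_distance_cols=corr.dists,
         color=colors)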
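Since the question asked about heatmap.2(), here is a rough gplots equivalent, sketched under assumptions: a simulated matrix replaces `assay(vsd)` so the block runs on its own, the helper names `distfun_corr` and `hclust_avg` are hypothetical, and the plotting call is skipped if gplots is not installed. Because heatmap.2 applies `distfun` to the matrix it receives, and `corrs` already holds similarities, the distance function simply converts them with `1 - m` instead of recomputing anything.

```r
# Hypothetical stand-in for assay(vsd): 500 genes x 6 samples
set.seed(4)
mat <- matrix(rnorm(500 * 6), nrow = 500,
              dimnames = list(NULL, paste0("s", 1:6)))

corrs <- cor(mat, method = "spearman")
corr.dists <- as.dist(1 - corrs)

# Hypothetical helpers: turn the correlation entries into distances
# directly, and cluster them with average linkage
distfun_corr <- function(m) as.dist(1 - m)
hclust_avg   <- function(d) hclust(d, method = "average")

# 100 breaks and 99 colors: the key runs from -1 (red) to 1 (blue)
breaks <- seq(-1, 1, length.out = 100)
colors <- colorRampPalette(c("red", "white", "blue"))(length(breaks) - 1)

if (requireNamespace("gplots", quietly = TRUE)) {
  # symm = TRUE reuses the row dendrogram for the columns
  gplots::heatmap.2(corrs, symm = TRUE,
                    distfun = distfun_corr, hclustfun = hclust_avg,
                    trace = "none", density.info = "none",
                    breaks = breaks, col = colors)
}
```

Applied to a symmetric correlation matrix, `distfun_corr(corrs)` gives the same values as `corr.dists`, so the rows and columns are ordered by 1 - correlation rather than by distances of distances.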

The following is a bit better for actually seeing the range of correlations, but as I said, I prefer the distances as in the vignette and workflow:

library(RColorBrewer)
# drop the self-correlations of 1 so the color scale follows the off-diagonal values
diag(corrs) <- NA
colors <- colorRampPalette(brewer.pal(9, "Blues"))(99)
pheatmap(corrs,
         clustering_distance_rows=corr.dists,
         clustering_distance_cols=corr.dists,
         color=colors)
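To illustrate why the first correlation plot comes out almost uniformly blue, here is a base-R sketch on simulated log-scale data (hypothetical, standing in for `assay(vsd)`; the group shift and noise level are arbitrary assumptions). When samples share the same gene-level means and differ only in a subset of genes plus modest noise, every pairwise correlation stays close to 1, while Euclidean distances on the same data still separate the two groups clearly.

```r
# Hypothetical simulated data: 1000 genes x 6 samples in two groups
set.seed(3)
n_genes <- 1000
gene_means <- rnorm(n_genes, mean = 8, sd = 2)   # shared gene-level means
shift <- c(rep(2, 100), rep(0, n_genes - 100))   # 100 genes higher in group B
group <- rep(c("A", "B"), each = 3)

mat <- sapply(group, function(g) {
  gene_means + (g == "B") * shift + rnorm(n_genes, sd = 0.3)
})
colnames(mat) <- paste0(group, 1:6)

ut <- function(m) m[upper.tri(m)]   # upper-triangle values of a square matrix

# Spearman correlations between samples: all close to 1, so on a
# -1..1 key they land at nearly the same color
corrs <- cor(mat, method = "spearman")
offdiag <- ut(corrs)
print(range(offdiag))

# Euclidean distances between samples, as in the vignette and workflow
sampleDists <- as.matrix(dist(t(mat)))
within  <- c(ut(sampleDists[1:3, 1:3]), ut(sampleDists[4:6, 4:6]))
between <- as.vector(sampleDists[1:3, 4:6])
print(range(within))
print(range(between))
```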