Question: DESeq2: how to interpret the heatmap clustering samples
lina.faller wrote:

Hi all,

I've been following the DESeq2 workflow for analyzing RNAseq expression data. After using the rlog function to transform my data, I plotted a heatmap as follows:

# load the packages used below
library(DESeq2)
library(pheatmap)
library(RColorBrewer)

# rlog transformation
rld <- rlog(dds, blind=FALSE)

# plot the rlog transformed samples
sampleDists <- dist( t( assay(rld) ) )   # Euclidean distances between samples
sampleDistMatrix <- as.matrix( sampleDists )
rownames(sampleDistMatrix) <- rld$reactor
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)

pheatmap(sampleDistMatrix,
         clustering_distance_rows=sampleDists,
         clustering_distance_cols=sampleDists,
         col=colors)

How do I best interpret the scale of the resulting heatmap? I noticed that there seems to be no specific upper bound. Does anyone have a suggestion for how to transform the data in the heatmap further so that the scale goes from 0 to 1?

Thanks!

Answer: DESeq2: how to interpret the heatmap clustering samples
James W. MacDonald wrote:

Why would scaling to 0-1 make it more interpretable? You would have the exact same colors, but the scale would just have different numbers.
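To make that point concrete, here is a minimal sketch in base R. It uses simulated data as a stand-in for assay(rld) (the matrix `mat` and its sample names are made up for illustration), and it assumes the rescaling in question is simply dividing every distance by the largest one. The distances then run from 0 to 1, but the clustering tree is unchanged.

```r
# Simulated stand-in for assay(rld): 100 genes x 6 samples (hypothetical data)
set.seed(1)
mat <- matrix(rnorm(600), nrow = 100,
              dimnames = list(NULL, paste0("sample", 1:6)))

# Euclidean distances between samples, as in the workflow
sampleDists <- dist(t(mat))

# Rescale so the largest distance becomes 1 (the diagonal stays at 0)
scaledDists <- sampleDists / max(sampleDists)
print(range(as.matrix(scaledDists)))

# Hierarchical clustering on raw and rescaled distances
hc_raw    <- hclust(sampleDists)
hc_scaled <- hclust(scaledDists)

# Same merge sequence and same leaf order: only the numbers on the scale differ
print(identical(hc_raw$merge, hc_scaled$merge))
print(identical(hc_raw$order, hc_scaled$order))
```

Because the rescaling is a division by a positive constant, the ranking of distances is preserved, so a heatmap drawn from `scaledDists` would show the same pattern with a relabeled legend.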

The idea for this part of the workflow is to say which samples are similar to each other, and you can see in your heatmap that there are three blocks of samples with darker blue squares (the first three samples, the next two, and the last four). The samples in each of those blocks are more similar to each other than they are to the samples in the other blocks. If you are expecting those sets of samples to be similar, then that's good. If not, then you need to figure out why not.
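If reading blocks off the heatmap by eye feels too subjective, the same grouping can be pulled out of the clustering tree. The sketch below uses simulated data with three made-up groups of 3, 2 and 4 samples (the group labels, offsets and gene counts are all assumptions, not values from the poster's experiment), and cuts the tree into three clusters with cutree.

```r
# Simulate 200 genes x 9 samples in three hypothetical groups (3, 2 and 4 samples)
set.seed(42)
groups  <- rep(c("A", "B", "C"), times = c(3, 2, 4))
offsets <- c(A = 0, B = 3, C = -3)

mat <- sapply(groups, function(g) {
  x <- rnorm(200, sd = 0.5)
  # The first 50 genes carry a group-specific shift
  x[1:50] <- x[1:50] + offsets[[g]]
  x
})
colnames(mat) <- paste0(groups, seq_along(groups))

# Cluster samples on Euclidean distance, then cut the tree into 3 clusters
hc       <- hclust(dist(t(mat)))
clusters <- cutree(hc, k = 3)

# Cross-tabulate the known groups against the recovered clusters
print(table(groups, clusters))
```

With real data you would replace `mat` with assay(rld) and compare the clusters against the design variable you expect to drive the grouping. The choice of k = 3 is itself a judgment call based on what the heatmap shows.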

The next part of the workflow is to plot PCA plots, which IMO are a better way to do this sort of exploratory data analysis. The PCA plot should show the same grouping structure, with three groups of samples.
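In the DESeq2 workflow this is done with plotPCA, for example plotPCA(rld, intgroup="reactor") for the object above. As a rough sketch of what that involves, the base R code below runs prcomp on simulated data (the groups, offsets and the cutoff of 100 genes are assumptions chosen for illustration), keeping only the most variable genes before computing the components, in the spirit of plotPCA's ntop argument.

```r
# Simulate 200 genes x 9 samples in three hypothetical groups (3, 2 and 4 samples)
set.seed(42)
groups  <- rep(c("A", "B", "C"), times = c(3, 2, 4))
offsets <- c(A = 0, B = 3, C = -3)

mat <- sapply(groups, function(g) {
  x <- rnorm(200, sd = 0.5)
  x[1:50] <- x[1:50] + offsets[[g]]   # group-specific shift in the first 50 genes
  x
})
colnames(mat) <- paste0(groups, seq_along(groups))

# Keep the 100 most variable genes (an arbitrary cutoff for this sketch)
rv  <- apply(mat, 1, var)
top <- order(rv, decreasing = TRUE)[seq_len(min(100, length(rv)))]

# PCA with samples as rows; prcomp centers the columns by default
pca        <- prcomp(t(mat[top, ]))
percentVar <- pca$sdev^2 / sum(pca$sdev^2)

# Plot the first two components, colored by group
plot(pca$x[, 1], pca$x[, 2],
     col = as.integer(factor(groups)), pch = 19,
     xlab = sprintf("PC1: %.0f%% variance", 100 * percentVar[1]),
     ylab = sprintf("PC2: %.0f%% variance", 100 * percentVar[2]))
legend("topright", legend = levels(factor(groups)), col = 1:3, pch = 19)
```

Samples from the same group should land close together on the plot, which is the same structure the distance heatmap shows as dark blocks.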

Thanks for your advice! I actually have a few different experiments and put together a plot like this for each experiment. My wetlab colleagues were asking about how to compare the different experiments using this plot and if I could calculate R-squared values or something similar.

Are the rlog-transformed values computed relative to the samples in the current experiment only, or can they be compared across experiments?