I would like your advice on how to present DE genes in a heatmap. I want to show around 40 genes of interest; their log2FC ranges from -1.3 to 5.4, but most fall between -1 and 1. A gene can be DE (FDR < 0.05) in one dataset but not in all of them, and I want to show this in the heatmap. I am not sure how to handle those cases: if I put NA, the heatmap will have empty cells. Is that appropriate?
My other question: if I convert my log2FC to FC [2^dataE$log2FC], I will get larger numbers and the heatmap colours will be more contrasting. But I am not sure whether the adjusted p-values will still support the FC values.
I would recommend using the shrunken log fold changes in the heatmap for all of your datasets (I'm assuming you are building a heatmap of LFCs, which is different from the sample counts heatmap in the vignette), as these are reliable regardless of DE status. You could indicate on the side which genes were DE in which dataset, perhaps using RowSideColors or the labels.
I wouldn't recommend using fold changes, because it is desirable to have the property of symmetry around 1: that 2 vs 1 has the same difference in color space as 1 vs 0.5. This is achieved on the log scale.
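A small numeric illustration of the symmetry argument, as a sketch in base R. The three fold-change values are made up for the example: a halving, no change, and a doubling.

```r
# Hypothetical fold changes: halving, no change, doubling.
fc <- c(half = 0.5, none = 1, double = 2)

# On the linear scale, distance from "no change" (FC = 1) is lopsided:
# halving sits 0.5 away, doubling sits 1 away.
dist_linear <- abs(fc - 1)

# On the log2 scale, "no change" is 0 and both effects sit 1 unit away,
# so a diverging colour scale centred on 0 treats them equally.
lfc <- log2(fc)
dist_log <- abs(lfc)

print(rbind(dist_linear, dist_log))
```

With a colour key centred on 0, up- and down-regulation of the same magnitude therefore get equally intense colours only on the log scale.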
Having just a few genes with extreme fold changes can make your heatmap fairly monochromatic. One strategy is to change the colour scale so that, instead of spanning e.g. -1.3 to 5.4, it saturates at < -1.5 and > 1.5 (or whatever limits you like). The upside is that you can then distinguish the subtle (but perhaps significant) differences for more of your genes; the obvious downside is that you lose the ability to distinguish differences among highly up-regulated genes.
Anyway, to do this you use the 'breaks' argument of heatmap.2(), along with key.xtickfun to adjust the key label. Something like
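A minimal sketch of this approach, assuming a toy matrix `lfc` of 40 hypothetical genes by 3 hypothetical datasets (simulated here, not data from this thread) and an arbitrary saturation limit `lim` of 1.5. The `key.xtickfun` body follows the pattern shown in the gplots documentation, where `scale01()` is a helper defined inside `heatmap.2()`. The plot is only drawn if gplots is installed.

```r
# Toy data: 40 hypothetical genes x 3 hypothetical datasets of log2 fold changes.
set.seed(42)
lfc <- matrix(rnorm(120, sd = 0.5), nrow = 40,
              dimnames = list(paste0("gene", 1:40), c("setA", "setB", "setC")))
lfc[1, 1] <- 5.4    # a few extreme values, matching the range described above
lfc[2, 2] <- -1.3

lim  <- 1.5                                 # saturation limit (arbitrary choice)
brks <- seq(-lim, lim, length.out = 101)    # 101 breaks give 100 colour bins
clipped <- pmax(pmin(lfc, lim), -lim)       # values beyond the limits take the end colours

if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(
    clipped, breaks = brks, col = gplots::bluered(100),
    trace = "none", density.info = "none",
    Colv = FALSE, dendrogram = "row",
    key.xtickfun = function() {
      # scale01() lives in heatmap.2()'s frame and maps values to key coordinates
      list(at = parent.frame()$scale01(c(-lim, 0, lim)),
           labels = c(paste("<", -lim), "0", paste(">", lim)))
    }
  )
}
```

Clipping the matrix explicitly before plotting keeps the saturation visible in the code rather than relying on how the plotting function treats out-of-range values.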
Hi Michael,
Thanks for your reply. What do you mean when you say they are reliable regardless of DE status?
I mean that in DESeq2, the shrunken LFCs (we also say "maximum a posteriori" or MAP) can be interpreted as meaningful on their own, i.e. a large LFC means an interesting gene, while unshrunken LFCs (we also say maximum likelihood estimate or MLE) could be large even if the difference is not significant, and so are best presented as a pair together with the adjusted p-value. Take a look at our paper, where we cover this point in detail: http://genomebiology.com/2014/15/12/550
Hi Michael, how do you define a large LFC? Some of my genes of interest have an LFC of 0.09. Would this be considered meaningful, in addition to having padj < 0.1, for a gene to be considered a good candidate?
Let's step back a bit. The point of making a heatmap is to show the size of the differences across groups. You have a set of vectors of log fold changes for multiple group comparisons, and you want to present these in a matrix, with color indicating up/down and the size of the effect. This is a nice visual summary of a set of comparisons. (Note: the rest will make more sense if you scan the few paragraphs in the paper I linked above where we talk about fold change shrinkage.) You asked if you should blank out the fold changes which were not significant according to adjusted p-value, by putting an NA at that position in the LFC matrix. With unshrunken LFC, this makes sense, because you can get a large LFC (say > 4 on the log2 scale) which is not statistically significant, for example for genes which have very low counts. The unshrunken LFC are not in themselves reliable, but if you filter on adjusted p-value, then you can trust the unshrunken LFC. However, in DESeq2, we apply a statistical technique which can be called "shrinkage", and the result is that the entire vector of shrunken LFC is reliable. So I am recommending you just plot these shrunken LFC and not worry about which are statistically significant within the matrix of colored boxes. If you want to additionally show which LFC were statistically significant, you could indicate this on the side of the plot, using colors or labels, or with a table, or with an additional heatmap with two colors.
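A minimal sketch of this layout in R, under stated assumptions: `lfc` and `padj` are hypothetical 40 x 3 matrices (genes by datasets), simulated here purely for illustration. In practice they would be assembled from the `log2FoldChange` and `padj` columns of each dataset's DESeq2 results table. The side colour marks genes that pass the 0.05 cutoff in at least one dataset; the plot is only drawn if gplots is installed.

```r
# Simulated stand-ins for the real matrices (hypothetical gene and dataset names).
set.seed(1)
genes    <- paste0("gene", 1:40)
datasets <- c("setA", "setB", "setC")
lfc  <- matrix(rnorm(120, sd = 0.6), nrow = 40, dimnames = list(genes, datasets))
padj <- matrix(runif(120),           nrow = 40, dimnames = list(genes, datasets))
padj[1, 1] <- NA    # adjusted p-values can be NA; treat those as not significant

# Logical matrix: TRUE where the gene is DE in that dataset.
sig <- !is.na(padj) & padj < 0.05

# One side colour per gene: black if DE in at least one dataset, grey otherwise.
side_cols <- ifelse(rowSums(sig) > 0, "black", "grey85")

# Symmetric breaks so that LFC = 0 sits at the middle of the colour scale.
lim  <- max(abs(lfc))
brks <- seq(-lim, lim, length.out = 101)

if (requireNamespace("gplots", quietly = TRUE)) {
  # Every cell shows its LFC; significance is annotated on the side only.
  gplots::heatmap.2(lfc, breaks = brks, col = gplots::bluered(100),
                    RowSideColors = side_cols,
                    trace = "none", density.info = "none",
                    Colv = FALSE, dendrogram = "row")
}
```

Because `RowSideColors` takes a single colour per row, it can only summarise significance across datasets. The logical matrix `sig` holds the per-dataset detail and could be shown as a table or as a separate two-colour heatmap.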
Thanks for your reply, Michael. I just went through the paper. I understand now that there are differences between unshrunken and shrunken LFCs in how they relate to padj. No need to transform my data for the heatmap.