I was trying to plot a sample heatmap, which we usually make from VST-transformed count data, but today I was playing with the TPM data and found that the heatmap generated from VST-transformed TPM data separates the untreated and treated samples much more cleanly.
I also made a PCA plot from the VST TPM data and again got very good grouping. When I plotted the per-gene standard deviation against the mean, the VST TPM data looked more homoscedastic than the VST count data.
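For anyone who wants to repeat the mean-vs-SD comparison outside R, here is a minimal Python sketch of the same kind of check, similar in spirit to vsn's meanSdPlot (rank genes by mean, look at how the SD trends). It assumes each transformed matrix has been exported as a genes x samples array; `mean_sd_trend` and `sd_trend_range` are hypothetical helpers, and the demo uses simulated Poisson counts with a plain log2(x + 1) standing in for a variance-stabilizing transform, not DESeq2's actual `vst`.

```python
import numpy as np

def mean_sd_trend(mat, n_bins=10):
    """Bin genes by the rank of their mean and return the median SD per bin.

    mat: genes x samples array of already-transformed values
    (e.g. a hypothetical export of a VST matrix).
    A roughly flat result suggests the data are close to homoscedastic.
    """
    mat = np.asarray(mat, dtype=float)
    means = mat.mean(axis=1)
    sds = mat.std(axis=1, ddof=1)          # per-gene sample SD
    order = np.argsort(means)              # genes sorted by mean
    bins = np.array_split(order, n_bins)   # equal-sized groups of genes
    return np.array([np.median(sds[b]) for b in bins])

def sd_trend_range(mat, n_bins=10):
    """Spread of the binned median SDs; smaller means flatter (more homoscedastic)."""
    med = mean_sd_trend(mat, n_bins)
    return float(med.max() - med.min())

# Demo on simulated data (not real counts): Poisson counts over a wide
# range of gene means, compared with a simple log2(x + 1) transform,
# used here only as a stand-in for a variance-stabilizing transform.
rng = np.random.default_rng(0)
mu = rng.lognormal(mean=3.0, sigma=2.0, size=2000)
counts = rng.poisson(mu[:, None], size=(2000, 6))

print(sd_trend_range(counts))               # expect a large spread: SD grows with the mean
print(sd_trend_range(np.log2(counts + 1)))  # expect a much smaller spread
```

A plain log transform still inflates the SD of low-count genes somewhat, which is one reason DESeq2 provides a dedicated variance-stabilizing transform instead.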
Is it okay to use VST TPM data for the heatmap and PCA, and what could explain the better grouping I get from the VST TPM data?
Note that since the DESeqDataSet object expects integer counts, I rounded the TPM values to the nearest integer before doing the transformation.
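Because the TPM values were rounded before the transformation, it may be worth checking how much low-expression signal that rounding removes. A minimal Python sketch, assuming the TPM matrix is available as a NumPy array; `rounding_loss` is a hypothetical helper, not part of DESeq2:

```python
import numpy as np

def rounding_loss(tpm):
    """Fraction of nonzero TPM values that become 0 after rounding to integers.

    np.rint rounds halves to the nearest even integer (e.g. 0.5 -> 0).
    """
    tpm = np.asarray(tpm, dtype=float)
    nonzero = tpm > 0
    zeroed = nonzero & (np.rint(tpm) == 0)
    return float(zeroed.sum() / max(nonzero.sum(), 1))

# Toy matrix (made-up values): 0.3 and 0.49 round to 0, so 2 of 5 nonzero values are lost.
toy = np.array([[0.0, 0.3, 0.6],
                [12.4, 0.49, 3.0]])
print(rounding_loss(toy))  # prints 0.4
```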