Question

Scaling for pheatmap


Bhavana • 0


I'm new to R and am making a heatmap for some RNA sequencing data using pheatmap. My input data is the log2CPM of genes across 5 samples (samples in columns, genes in rows). I want to understand, first, whether I should scale my data beforehand using the scale() function and, second, whether I should set scale="row" in the pheatmap function. Here is my code:

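# install.packages(c("pheatmap", "RColorBrewer"))  # run once if not yet installed
library(pheatmap)
library(RColorBrewer)  # provides brewer.pal()
# Read log2CPM values; the first column holds gene names and becomes the row names
heatmap_trial_2 <- read.csv("Final genes_log2CPM.csv")
heatmap_trial_2 <- data.frame(heatmap_trial_2[,-1], row.names=heatmap_trial_2[,1])
# Z-score each gene (row): scale() works on columns, hence the double transpose
sc_1 <- t(scale(t(heatmap_trial_2), center = TRUE, scale = TRUE))
pheatmap(sc_1, kmeans_k = NA, breaks = NA, scale = "none", cluster_rows = FALSE,
         cluster_cols = FALSE,
         show_rownames = TRUE, show_colnames = TRUE,
         color = colorRampPalette(brewer.pal(9,"BuPu"))(100))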

Here is the output I get when I run the above code: [Figure: Heatmap with scale=TRUE, scale="none"]

However, I noticed that if I set scale = "row" in the pheatmap call, the heatmap looks exactly the same regardless of whether I set scale = TRUE or scale = FALSE in the scale() function. Here is what the heatmap looks like in that case:

[Figure: Heatmap with scale=TRUE or FALSE, scale="row"]
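This behaviour can be reproduced without pheatmap. Below is a minimal sketch in base R, assuming that scale = "row" centers each row and divides it by its standard deviation, as the pheatmap documentation describes. The toy matrix x and the helper row_z are made up for illustration and stand in for the real log2CPM data.

```r
# Toy stand-in for a log2CPM matrix: 6 genes (rows) x 5 samples (columns).
set.seed(1)
x <- matrix(rnorm(30, mean = 5, sd = 2), nrow = 6,
            dimnames = list(paste0("gene", 1:6), paste0("sample", 1:5)))

# Hypothetical helper: row-wise z-score (subtract the row mean, divide by the row sd).
# scale() operates on columns, hence the double transpose.
row_z <- function(m) t(scale(t(m), center = TRUE, scale = TRUE))

# Three routes that all end with a row-wise z-score:
raw_then_row      <- row_z(x)                                             # no pre-scaling
centered_then_row <- row_z(t(scale(t(x), center = TRUE, scale = FALSE)))  # centered only
z_then_row        <- row_z(row_z(x))                                      # z-scored first

# Compare the values only; scale() attaches extra attributes that are ignored here.
all.equal(raw_then_row, centered_then_row, check.attributes = FALSE)
all.equal(raw_then_row, z_then_row, check.attributes = FALSE)
```

Row z-scoring removes any earlier row-wise shift or rescaling, which is why the pre-processing step makes no visible difference once scale = "row" is set.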

If I don't scale it at all (if I put scale=FALSE and scale="none"), this is what I get: [Figure: Heatmap with scale=FALSE, scale="none"]

After reading the R documentation for both functions and some other posts, I understand the purpose of scaling in general, so I know I should be scaling my data. I am just struggling to determine which of these is the correct way to do it for my data. At what step should I perform the scaling? Any help would be highly appreciated, thanks!

My session info:

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RColorBrewer_1.1-3 viridis_0.6.2      viridisLite_0.4.0  pheatmap_1.0.12   

loaded via a namespace (and not attached):
 [1] magrittr_2.0.3   tidyselect_1.1.2 munsell_0.5.0    colorspace_2.0-3 R6_2.5.1         rlang_1.0.2      fansi_1.0.3      dplyr_1.0.9     
 [9] tools_4.1.2      grid_4.1.2       gtable_0.3.0     utf8_1.2.2       DBI_1.1.3        cli_3.3.0        ellipsis_0.3.2   assertthat_0.2.1
[17] tibble_3.1.7     lifecycle_1.0.1  crayon_1.5.1     gridExtra_2.3    purrr_0.3.4      ggplot2_3.3.6    vctrs_0.4.1      glue_1.6.2      
[25] compiler_4.1.2   pillar_1.7.0     generics_0.1.2   scales_1.2.0     pkgconfig_2.0.3

heatmaps pheatmap scaling • 4.4k views


score 0 · Answer 1 · 2022-06-24

Dear Bhavana

Heatmaps are a data exploration technique, so whatever produces insights is fine. There is no right or wrong, although there may be better or worse practices.

In particular, if your data are already on a log2-fold-change scale, I would personally not center and scale, but rather use the natural, intrinsic scale of the data (the center is at 0, and one unit of change corresponds to a fold change of 2). However, you should then make sure the colour map is chosen accordingly: use a diverging colour scale (https://blog.datawrapper.de/diverging-vs-sequential-color-scales/) such that 0 maps to white. An example is here: https://www.huber.embl.de/msmb/Chap-Graphics.html#heatmaps (but do not do rowCenter if your data already have 0 as the natural midpoint).
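A minimal sketch of the diverging-colour idea, not taken from the linked chapter: the toy matrix lfc, the blue-white-red palette, and the choice of 101 colours are all illustrative assumptions. The breaks are made symmetric around 0 so that the middle colour (white) covers values near 0. The pheatmap call only runs if the package is installed.

```r
# Toy stand-in for a log2-fold-change matrix: 6 genes (rows) x 5 samples (columns).
set.seed(1)
lfc <- matrix(rnorm(30, mean = 0, sd = 1.5), nrow = 6,
              dimnames = list(paste0("gene", 1:6), paste0("sample", 1:5)))

# Symmetric limits so that 0 sits exactly in the middle of the colour range.
lim   <- max(abs(lfc))
n_col <- 101  # odd number of colours, so the middle one is pure white
pal   <- colorRampPalette(c("blue", "white", "red"))(n_col)

# pheatmap expects one more break than colours.
brk <- seq(-lim, lim, length.out = n_col + 1)

# No centering or scaling: the data are drawn on their own scale.
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(lfc, color = pal, breaks = brk, scale = "none",
                     cluster_rows = FALSE, cluster_cols = FALSE)
}
```

With these settings, a value of +1 and a value of -1 get colours of equal intensity on opposite sides of white, which keeps the "one unit = fold change of 2" reading intact.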

Hope this helps, Wolfgang

PS Update your R and Bioconductor to current versions :)