Scaling for pheatmap
Bhavana • 0
Australia

I'm new to R and am making a heatmap for some RNA sequencing data using pheatmap. My input data is the log2 CPM of genes across 5 samples (samples in columns, genes in rows). First, I want to understand whether I should scale my data beforehand using the scale() function. Second, should I set scale = "row" in the pheatmap function or not? Here is my code:

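# install.packages("pheatmap")  # run once if the package is not installed
library(pheatmap)
library(RColorBrewer)  # provides brewer.pal()
# Move the first column (gene names) into the row names
heatmap_trial_2 <- data.frame(heatmap_trial_2[,-1], row.names=heatmap_trial_2[,1])
# Row-wise z-score: scale() works on columns, so transpose, scale, transpose back
sc_1 <- t(scale(t(heatmap_trial_2), center = TRUE, scale = TRUE))
pheatmap(sc_1, kmeans_k = NA, breaks = NA, scale = "none", cluster_rows = FALSE,
cluster_cols = FALSE,
show_rownames = TRUE, show_colnames = TRUE,
color = colorRampPalette(brewer.pal(9,"BuPu"))(100))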


Here is the output I get when I run the code above:

However, I noticed that if I set scale = "row" in the pheatmap call, the heatmap looks exactly the same regardless of whether I set scale = TRUE or scale = FALSE in the scale() function. Here is what the heatmap looks like in that case:

If I don't scale at all (scale = FALSE in scale() and scale = "none" in pheatmap), this is what I get:

I understand the purpose of scaling in general after reading the R documentation for both functions and some other posts, so I know I should be scaling my data. I am just struggling to determine which of these is the correct way to do it for my data. At what step should I perform the scaling? Any help would be highly appreciated, thanks!
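The observation that scale = "row" gives the same picture either way can be checked without plotting. The sketch below assumes that scale = "row" in pheatmap amounts to a row-wise z-score (subtract each row's mean, divide by its standard deviation); `row_z` is a hypothetical helper written for illustration, not pheatmap's own code, and `x` is made-up data standing in for a log2 CPM matrix. Under that assumption, z-scoring rows that are already centred or already z-scored returns the same values as z-scoring the raw rows.

```r
# Hypothetical helper: row-wise z-score, assumed to mimic scale = "row"
row_z <- function(m) t(scale(t(m), center = TRUE, scale = TRUE))

set.seed(42)
# Made-up log2 CPM matrix: 6 genes (rows) x 5 samples (columns)
x <- matrix(rnorm(30, mean = 5, sd = 2), nrow = 6,
            dimnames = list(paste0("gene", 1:6), paste0("sample", 1:5)))

# The two pre-processing variants from the question
pre_scaled   <- t(scale(t(x), center = TRUE, scale = TRUE))   # scale = TRUE
pre_centered <- t(scale(t(x), center = TRUE, scale = FALSE))  # scale = FALSE

# Compare the row z-score of each variant with the row z-score of the raw data
same_a <- isTRUE(all.equal(row_z(pre_scaled),   row_z(x), check.attributes = FALSE))
same_b <- isTRUE(all.equal(row_z(pre_centered), row_z(x), check.attributes = FALSE))
print(c(scaled = same_a, centered = same_b))
```

If both comparisons hold, pre-processing with scale() is redundant whenever scale = "row" is also set, so the scaling only needs to happen in one of the two places.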

My session info:

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] RColorBrewer_1.1-3 viridis_0.6.2      viridisLite_0.4.0  pheatmap_1.0.12

loaded via a namespace (and not attached):
[1] magrittr_2.0.3   tidyselect_1.1.2 munsell_0.5.0    colorspace_2.0-3 R6_2.5.1         rlang_1.0.2      fansi_1.0.3      dplyr_1.0.9
[9] tools_4.1.2      grid_4.1.2       gtable_0.3.0     utf8_1.2.2       DBI_1.1.3        cli_3.3.0        ellipsis_0.3.2   assertthat_0.2.1
[17] tibble_3.1.7     lifecycle_1.0.1  crayon_1.5.1     gridExtra_2.3    purrr_0.3.4      ggplot2_3.3.6    vctrs_0.4.1      glue_1.6.2
[25] compiler_4.1.2   pillar_1.7.0     generics_0.1.2   scales_1.2.0     pkgconfig_2.0.3

@wolfgang-huber-3550
EMBL European Molecular Biology Laborat…

Dear Bhavana

Heatmaps are a data exploration technique, so whatever produces insights is fine. There is no right or wrong, although there may be better or worse practices.

In particular, if your data are already on a log2-fold-change scale, I would personally not center and scale, but rather use the natural, intrinsic scale of the data (the center is at 0, and one unit of change corresponds to a fold change of 2). However, you should then make sure the colour map is chosen accordingly: use a diverging colour scale (https://blog.datawrapper.de/diverging-vs-sequential-color-scales/) such that 0 maps to white. An example is here: https://www.huber.embl.de/msmb/Chap-Graphics.html#heatmaps (but do not use rowCenter if your data already have 0 as the natural midpoint).
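As a minimal sketch of the colour-map advice (not the code from the linked chapter), one way to make 0 map to white in pheatmap is to pair a diverging palette with breaks that are symmetric around zero. The matrix `lfc`, the blue-white-red palette, and the number of colours are assumptions chosen for illustration; the `color`, `breaks`, and `scale` arguments are pheatmap's own.

```r
# Made-up log2 fold change matrix: 10 genes (rows) x 5 samples (columns)
set.seed(1)
lfc <- matrix(rnorm(50, sd = 1.5), nrow = 10,
              dimnames = list(paste0("gene", 1:10), paste0("sample", 1:5)))

# Diverging palette with an odd number of colours, so the middle one is white
n_col <- 101
pal <- colorRampPalette(c("blue", "white", "red"))(n_col)

# Symmetric breaks: one more break than colours, centred on 0
lim <- max(abs(lfc))
brk <- seq(-lim, lim, length.out = n_col + 1)

# Plot on the data's own scale (no row scaling); skipped if pheatmap is missing
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(lfc, color = pal, breaks = brk, scale = "none",
                     cluster_rows = FALSE, cluster_cols = FALSE)
}
```

Because the breaks are symmetric, equal up- and down-regulation get equally intense colours, and values near 0 fall in the white middle bin.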

Hope this helps, Wolfgang

PS Update your R and Bioconductor to current versions :)


Okay, I understand...Thank you for the clarification! And thanks for pointing out that I need to update my versions :)