Question: stringasfactor = F changes the output of pheatmap, why?
9 weeks ago, bdy80 wrote:

Good Morning All

I have a quick question about something I cannot work out.

By default, I have my R set up with stringasfactor = T. With this setting I produce the following heat-map (second image).

But when I set stringasfactor = F, using the exact same input for the heat-map, I get a completely different visualisation (first image).

I cannot figure out why this is happening, so any and all ideas and suggestions would be fantastic. I have attached the code I am using to make the heat-map (it is the same for both heat-maps; only stringasfactor changes).

Ben

    matallint <- assay(vsdyear)[ immune_genes_paper$Gene_ID, c("10_salmon", "15_salmon", "16_salmon", "21_salmon", "22_salmon", "26_salmon", "27_salmon", "31_salmon", "32_salmon", "4_salmon", "5_salmon", "6_salmon", "33_salmon", "36_salmon", "38_salmon", "40_salmon", "42_salmon", "44_salmon", "46_salmon", "48_salmon", "50_salmon", "52_salmon", "54_salmon", "56_salmon", "58_salmon", "60_salmon", "62_salmon", "64_salmon", "66_salmon", "74_salmon", "76_salmon", "79_salmon", "80_salmon", "81_salmon", "85_salmon", "86_salmon", "89_salmon", "90_salmon", "1_salmon", "11_salmon", "12_salmon", "13_salmon", "14_salmon", "17_salmon", "18_salmon", "19_salmon", "2_salmon", "20_salmon", "23_salmon", "24_salmon", "25_salmon", "28_salmon", "29_salmon", "3_salmon", "30_salmon", "7_salmon", "8_salmon", "9_salmon", "34_salmon", "35_salmon", "37_salmon", "39_salmon", "41_salmon", "43_salmon", "45_salmon", "47_salmon", "49_salmon", "51_salmon", "53_salmon", "55_salmon", "57_salmon", "59_salmon", "61_salmon", "63_salmon", "65_salmon", "67_salmon", "82_salmon", "83_salmon", "84_salmon", "87_salmon", "88_salmon", "91_salmon", "92_salmon", "93_salmon")]
    topVarGenesall <- head(order(-rowVars(matallint)),2671)
    matallint <- matallint[topVarGenesall,]
    top25rownameintercep <- rownames(matallint)
    matallint <- matallint - rowMeans(matallint)
    dfallint <- as.data.frame(colData(vsdall)[,c("Treatment_basic", "Year")])
    colours <- list(Treatment_basic = c(baseline = "lightblue" ,exposed = "orange"), Year = c("2016" = "grey", "2017" = "black"))
    pheatmap(matallint,
             annotation_col=dfallint,
             annotation_colors = colours,
             cex = 2,
             color = bluered(90),
             show_rownames = T,
             show_colnames = F,
             fontsize = 1.5,
             cluster_rows = T,
             cluster_cols = F,
             scale = 'row')

Answer: stringasfactor = F changes the output of pheatmap, why?
9 weeks ago, James W. MacDonald (United States) wrote:

You would have to provide some self-contained code to show what you say is happening. To me it looks like in the first heatmap you used cluster_cols = TRUE and in the second heatmap you used cluster_cols = FALSE. Whether or not you use stringsAsFactors = TRUE or FALSE should be irrelevant, as the clustering is based on your data, which are numeric, so R won't by definition convert to factors anyway.

9 weeks ago, bdy80 replied:

Hi James

I am sorry if it was not clear, the code for the two heat-maps is exactly the same. As I shared above, the code for the heatmaps is again below, and it is ordered so it produces the first heatmap, then the second as in my original post. Note the only difference being the stringasfactors argument which is the first line. The only thing I changed was a prior stringasfactor = T or stringasfactor = F. For making both heatmaps cluster_cols = FALSE, I supply an ordering of my samples (in the columns) as I want it.

    options(stringasfactor = FALSE)
    matallint <- assay(vsdyear)[interestinggenes, c("sampleorder")]
    topVarGenesall <- head(order(-rowVars(matallint)),2671)
    matallint <- matallint[topVarGenesall,]
    top25rownameintercep <- rownames(matallint)
    matallint <- matallint - rowMeans(matallint)
    dfallint <- as.data.frame(colData(vsdall)[,c("Treatment_basic", "Year")])
    colours <- list(Treatment_basic = c(baseline = "lightblue" ,exposed = "orange"), Year = c("2016" = "grey", "2017" = "black"))
    pheatmap(matallint,
             annotation_col=dfallint,
             annotation_colors = colours,
             cex = 2,
             color = bluered(90),
             show_rownames = T,
             show_colnames = F,
             fontsize = 1.5,
             cluster_rows = T,
             cluster_cols = F,
             scale = 'row')

    options(stringasfactor=TRUE)
    matallint <- assay(vsdyear)[interestinggenes, c("sampleorder")]
    topVarGenesall <- head(order(-rowVars(matallint)),2671)
    matallint <- matallint[topVarGenesall,]
    top25rownameintercep <- rownames(matallint)
    matallint <- matallint - rowMeans(matallint)
    dfallint <- as.data.frame(colData(vsdall)[,c("Treatment_basic", "Year")])
    colours <- list(Treatment_basic = c(baseline = "lightblue" ,exposed = "orange"), Year = c("2016" = "grey", "2017" = "black"))
    pheatmap(matallint,
             annotation_col=dfallint,
             annotation_colors = colours,
             cex = 2,
             color = bluered(90),
             show_rownames = T,
             show_colnames = F,
             fontsize = 1.5,
             cluster_rows = T,
             cluster_cols = F,
             scale = 'row')

9 weeks ago, James W. MacDonald replied:

It's not whether or not the code is clear. That's not what I meant by self-contained code. By that I mean some code that anybody could run that shows the problems you are having. You are showing some code that you purport will cause changes in the behavior of pheatmap, but nobody else can run your code, so nobody can confirm that they see the same results. In addition, I am not familiar with an option 'stringasfactor'. There is a 'stringsAsFactors' option that controls how R codes strings when reading data in, but like I said already, that won't have an effect on what you are seeing. If I just do something like

    library(pheatmap)
    set.seed(0xabeef)
    mat <- matrix(rnorm(1000), 100)
    pheatmap(mat, cluster_cols = FALSE)
    options(stringasfactor = TRUE)
    pheatmap(mat, cluster_cols = FALSE)
    options(stringasfactor = FALSE)
    pheatmap(mat, cluster_cols = FALSE)

I get identical results each time. Perhaps you can generate some self-contained code that shows what you see?

9 weeks ago, bdy80 replied:

Hi James

First of all I apologise, I blanked on the self-contained code part and it now makes sense. I will try and make this work now (i.e. some reproducible code).

Ben

Answer: stringasfactor = F changes the output of pheatmap, why?
9 weeks ago, Wolfgang Huber (EMBL European Molecular Biology Laboratory) wrote:

It's impossible to say for sure without you giving reproducible code, but expressions like this are dangerous

    assay(vsdyear)[ immune_genes_paper$Gene_ID, ...]


if Gene_ID can be either a character vector or a factor. Namely, if Gene_ID is a factor, indexing uses its underlying integer codes (the positions of the values in the factor levels), so rows are picked by position rather than by name. A small example of what can go wrong is shown here:

    dat = c(b=2, a=1)
    x = factor(c("b","a"))
    dat[x]
    # a b
    # 1 2
    dat[as.character(x)]
    # b a
    # 2 1
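
The same effect can be sketched on a matrix, which is closer to the assay(...)[Gene_ID, ] call in the question. This is only a toy illustration under assumptions, not the poster's data: mat, ids_chr, ids_fac and the gene names are made up, and stringsAsFactors is passed explicitly to data.frame() to mimic the two settings rather than relying on a global option.

```r
# Toy expression matrix; row names are deliberately NOT in alphabetical order.
mat <- matrix(1:6, nrow = 3,
              dimnames = list(c("geneC", "geneA", "geneB"), c("s1", "s2")))

# The same (hypothetical) gene list, stored as character and as factor.
ids_chr <- data.frame(Gene_ID = c("geneC", "geneB"), stringsAsFactors = FALSE)
ids_fac <- data.frame(Gene_ID = c("geneC", "geneB"), stringsAsFactors = TRUE)

# Character index: rows are matched by name.
by_name <- mat[ids_chr$Gene_ID, ]

# Factor index: the integer codes are used as row positions instead.
# Levels are sorted ("geneB", "geneC"), so the codes are 2, 1.
by_code <- mat[ids_fac$Gene_ID, ]

# Defensive version: coerce to character before indexing.
fixed <- mat[as.character(ids_fac$Gene_ID), ]

rownames(by_name)  # "geneC" "geneB"
rownames(by_code)  # "geneA" "geneC"  (wrong genes, and no error or warning)
rownames(fixed)    # "geneC" "geneB"
```

Because the factor case silently returns a different set of rows, wrapping the index in as.character() is a cheap safeguard whenever the ID column might be a factor.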


Hi Wolfgang

Looking a little deeper into what you suggested, I think this is what is causing the problem. I am working on some reproducible code so that people can run it.

Thank you for the response.

Ben