Question

stringasfactor = F changes the output of pheatmap, why?

bdy8 ▴ 10

@bdy8-16982

Good morning all,

I have a quick question about something I cannot work out.

By default, I have my R set up with stringasfactor = T. With this setting I produce the following heatmap (second image).

But when I set stringasfactor = F, using exactly the same input for the heatmap, I get a completely different visualisation (first image).

I cannot figure out why this is happening, so any ideas and suggestions would be fantastic. I have attached the code I am using to make the heatmap (it is the same for both heatmaps; only stringasfactor changes).

Ben

    matallint <- assay(vsdyear)[ immune_genes_paper$Gene_ID, c("10_salmon", "15_salmon", "16_salmon", "21_salmon", "22_salmon", "26_salmon", "27_salmon", "31_salmon", "32_salmon", "4_salmon", "5_salmon", "6_salmon", "33_salmon", "36_salmon", "38_salmon", "40_salmon", "42_salmon", "44_salmon", "46_salmon", "48_salmon", "50_salmon", "52_salmon", "54_salmon", "56_salmon", "58_salmon", "60_salmon", "62_salmon", "64_salmon", "66_salmon", "74_salmon", "76_salmon", "79_salmon", "80_salmon", "81_salmon", "85_salmon", "86_salmon", "89_salmon", "90_salmon", "1_salmon", "11_salmon", "12_salmon", "13_salmon", "14_salmon", "17_salmon", "18_salmon", "19_salmon", "2_salmon", "20_salmon", "23_salmon", "24_salmon", "25_salmon", "28_salmon", "29_salmon", "3_salmon", "30_salmon", "7_salmon", "8_salmon", "9_salmon", "34_salmon", "35_salmon", "37_salmon", "39_salmon", "41_salmon", "43_salmon", "45_salmon", "47_salmon", "49_salmon", "51_salmon", "53_salmon", "55_salmon", "57_salmon", "59_salmon", "61_salmon", "63_salmon", "65_salmon", "67_salmon", "82_salmon", "83_salmon", "84_salmon", "87_salmon", "88_salmon", "91_salmon", "92_salmon", "93_salmon")]
topVarGenesall <- head(order(-rowVars(matallint)),2671)
matallint <- matallint[topVarGenesall,]
top25rownameintercep <- rownames(matallint)
matallint <- matallint - rowMeans(matallint)
dfallint <- as.data.frame(colData(vsdall)[,c("Treatment_basic", "Year")])
colours <- list(Treatment_basic = c(baseline = "lightblue" ,exposed = "orange"), Year = c("2016" = "grey", "2017" = "black"))
pheatmap(matallint, 
         annotation_col=dfallint,
         annotation_colors = colours,
         cex = 2,
         color = bluered(90),
         show_rownames = T,
         show_colnames = F,
         fontsize = 1.5,
         cluster_rows = T,
         cluster_cols = F,
         scale = 'row')

deseq2 pheatmap • 1.8k views

ADD COMMENT • link updated 6.0 years ago by Wolfgang Huber ★ 13k • written 6.0 years ago by bdy8 ▴ 10

James W. MacDonald · Answer 1 · 2019-02-12

James W. MacDonald 68k

@james-w-macdonald-5106

United States

You would have to provide some self-contained code to show what you say is happening. To me it looks like in the first heatmap you used cluster_cols = TRUE and in the second heatmap you used cluster_cols = FALSE. Whether you use stringsAsFactors = TRUE or FALSE should be irrelevant, as the clustering is based on your data, which are numeric, so R won't convert them to factors anyway.
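
A minimal sketch of why numeric data are unaffected, assuming a made-up 2 x 2 matrix (the names `mat`, `df_true`, and `df_false` are illustrative and not from the original post):

```r
# Hypothetical numeric matrix standing in for the expression values.
mat <- matrix(c(1.5, 2.5, 3.5, 4.5), nrow = 2,
              dimnames = list(c("g1", "g2"), c("s1", "s2")))

# stringsAsFactors only affects character columns, so both conversions
# of a purely numeric matrix give the same data frame.
df_true  <- as.data.frame(mat, stringsAsFactors = TRUE)
df_false <- as.data.frame(mat, stringsAsFactors = FALSE)

identical(df_true, df_false)             # TRUE
vapply(df_true, is.numeric, logical(1))  # TRUE for every column
```

This only shows that the numeric values themselves are untouched; it says nothing about character columns used elsewhere in a pipeline, such as gene identifiers.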

ADD COMMENT • link 6.0 years ago James W. MacDonald 68k

Hi James

I am sorry if it was not clear: the code for the two heatmaps is exactly the same. I have repeated it below, ordered so that it produces the first heatmap and then the second, as in my original post. The only difference is the stringasfactor setting on the first line of each block (stringasfactor = T or stringasfactor = F). Both heatmaps use cluster_cols = FALSE, because I supply my own ordering of the samples (the columns).

> options(stringasfactor = FALSE) 
> matallint <- assay(vsdyear)[interestinggenes, c("sampleorder")]
> topVarGenesall <- head(order(-rowVars(matallint)),2671) 
> matallint <- matallint[topVarGenesall,]
> top25rownameintercep <- rownames(matallint)
> matallint <- matallint - rowMeans(matallint) 
> dfallint <- as.data.frame(colData(vsdall)[,c("Treatment_basic", "Year")]) 
> colours <- list(Treatment_basic = c(baseline = "lightblue" ,exposed = "orange"), Year = c("2016" = "grey", "2017" = "black"))
> pheatmap(matallint, 
>          annotation_col=dfallint,
>          annotation_colors = colours,
>          cex = 2,
>          color = bluered(90),
>          show_rownames = T,
>          show_colnames = F,
>          fontsize = 1.5,
>          cluster_rows = T,
>          cluster_cols = F,
>          scale = 'row')
> 
> options(stringasfactor=TRUE) 
> matallint <- assay(vsdyear)[interestinggenes, c("sampleorder")]
> topVarGenesall <- head(order(-rowVars(matallint)),2671) 
> matallint <- matallint[topVarGenesall,]
> top25rownameintercep <- rownames(matallint)
> matallint <- matallint - rowMeans(matallint) 
> dfallint <- as.data.frame(colData(vsdall)[,c("Treatment_basic", "Year")]) 
> colours <- list(Treatment_basic = c(baseline = "lightblue" ,exposed = "orange"), Year = c("2016" = "grey", "2017" = "black"))
> pheatmap(matallint, 
>          annotation_col=dfallint,
>          annotation_colors = colours,
>          cex = 2,
>          color = bluered(90),
>          show_rownames = T,
>          show_colnames = F,
>          fontsize = 1.5,
>          cluster_rows = T,
>          cluster_cols = F,
>          scale = 'row')

ADD REPLY • link updated 6.0 years ago by James W. MacDonald 68k • written 6.0 years ago by bdy8 ▴ 10

It's not whether or not the code is clear. That's not what I meant by self-contained code. By that I mean some code that anybody could run that shows the problems you are having. You are showing some code that you purport will cause changes in the behavior of pheatmap, but nobody else can run your code, so nobody can confirm that they see the same results.

In addition, I am not familiar with an option 'stringasfactor'. There is a 'stringsAsFactors' option that controls how R codes strings when reading data in, but like I said already, that won't have an effect on what you are seeing. If I just do something like

library(pheatmap)
set.seed(0xabeef)
mat <- matrix(rnorm(1000), 100)
pheatmap(mat, cluster_cols = FALSE)
options(stringasfactor = TRUE)
pheatmap(mat, cluster_cols = FALSE)
options(stringasfactor = FALSE)
pheatmap(mat, cluster_cols = FALSE)

I get identical results each time. Perhaps you can generate some self-contained code that shows what you see?
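
To illustrate the naming point, here is a minimal sketch that assumes a made-up two-element identifier vector (`ids`, `df_chr`, and `df_fac` are illustrative names). It sets the misspelled option and then controls the conversion explicitly through the `stringsAsFactors` argument of `data.frame()`, which does not look up an option called `stringasfactor`:

```r
# Misspelled name: R stores it as an ordinary user option,
# but data.frame() never consults it.
options(stringasfactor = TRUE)

ids <- c("b", "a")  # hypothetical identifiers

# The correctly spelled argument decides how strings are coded.
df_chr <- data.frame(Gene_ID = ids, stringsAsFactors = FALSE)
df_fac <- data.frame(Gene_ID = ids, stringsAsFactors = TRUE)

is.factor(df_chr$Gene_ID)  # FALSE
is.factor(df_fac$Gene_ID)  # TRUE
```

Passing the argument explicitly also avoids depending on the session-wide default, which was TRUE before R 4.0.0 and is FALSE from 4.0.0 onwards.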

ADD REPLY • link 6.0 years ago James W. MacDonald 68k

Hi James

First of all, I apologise; I blanked on the self-contained code part, and it now makes sense.

I will try to put together some reproducible code now.

Ben

ADD REPLY • link 6.0 years ago bdy8 ▴ 10

Wolfgang Huber · Answer 2 · 2019-02-12

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

EMBL European Molecular Biology Laborat…

It's impossible to say for sure without reproducible code, but expressions like this are dangerous:

assay(vsdyear)[ immune_genes_paper$Gene_ID, ...]

if Gene_ID can be either a character vector or a factor. Namely, if Gene_ID is a factor, R indexes with its underlying integer codes (the positions of the values among the factor levels) rather than with the gene names. A small example of what can go wrong:

dat = c(`b`=2, `a`=1) 
x = factor(c("b","a")) 
dat[x] 
# a b 
# 1 2 
dat[as.character(x)] 
# b a
# 2 1
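
The same effect on matrix rows can be sketched as follows, assuming a made-up 3 x 2 matrix in place of `assay(vsdyear)` (all names here, such as `mat`, `ids_chr`, and `ids_fac`, are hypothetical):

```r
# Hypothetical expression matrix with gene names as row names.
mat <- matrix(1:6, nrow = 3,
              dimnames = list(c("geneC", "geneA", "geneB"), c("s1", "s2")))

ids_chr <- c("geneB", "geneA")  # identifiers as character strings
ids_fac <- factor(ids_chr)      # levels: geneA, geneB; integer codes: 2, 1

# Character index: rows are matched by name.
by_name <- mat[ids_chr, , drop = FALSE]
rownames(by_name)  # "geneB" "geneA"

# Factor index: rows are picked by integer code, i.e. rows 2 and 1.
by_code <- mat[ids_fac, , drop = FALSE]
rownames(by_code)  # "geneA" "geneC"

# Defensive form: convert to character before indexing.
safe <- mat[as.character(ids_fac), , drop = FALSE]
identical(safe, by_name)  # TRUE
```

Here the factor index silently returns `geneC`, which was never requested. Checking `is.factor(immune_genes_paper$Gene_ID)` and wrapping the index in `as.character()` would be one way to rule this out.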

ADD COMMENT • link 6.0 years ago Wolfgang Huber ★ 13k

Hi Wolfgang

Looking a little deeper into what you suggested, I think this is what is causing the problem. I am working on putting together some reproducible code so that people can run it.

Thank you for the response.

Ben

ADD REPLY • link 6.0 years ago bdy8 ▴ 10