Why does setting stringasfactor = F change the output of pheatmap?
bdy8 ▴ 10
@bdy8-16982

Good Morning All

I have a quick question about something I cannot work out.

By default, I have R set up with stringasfactor = T. With this setting I produce the following heatmap (second image).

But when I set stringasfactor = F, using the exact same input for the heatmap, I get a completely different visualisation (first image).

I cannot figure out why this is happening, so any ideas and suggestions would be fantastic. I have attached the code I am using to make the heatmap (it is the same for both heatmaps; only stringasfactor changes).

Ben

[Images: stringasfactor-F heatmap (first image), stringasfactor-T heatmap (second image)]

    matallint <- assay(vsdyear)[ immune_genes_paper$Gene_ID, c("10_salmon", "15_salmon", "16_salmon", "21_salmon", "22_salmon", "26_salmon", "27_salmon", "31_salmon", "32_salmon", "4_salmon", "5_salmon", "6_salmon", "33_salmon", "36_salmon", "38_salmon", "40_salmon", "42_salmon", "44_salmon", "46_salmon", "48_salmon", "50_salmon", "52_salmon", "54_salmon", "56_salmon", "58_salmon", "60_salmon", "62_salmon", "64_salmon", "66_salmon", "74_salmon", "76_salmon", "79_salmon", "80_salmon", "81_salmon", "85_salmon", "86_salmon", "89_salmon", "90_salmon", "1_salmon", "11_salmon", "12_salmon", "13_salmon", "14_salmon", "17_salmon", "18_salmon", "19_salmon", "2_salmon", "20_salmon", "23_salmon", "24_salmon", "25_salmon", "28_salmon", "29_salmon", "3_salmon", "30_salmon", "7_salmon", "8_salmon", "9_salmon", "34_salmon", "35_salmon", "37_salmon", "39_salmon", "41_salmon", "43_salmon", "45_salmon", "47_salmon", "49_salmon", "51_salmon", "53_salmon", "55_salmon", "57_salmon", "59_salmon", "61_salmon", "63_salmon", "65_salmon", "67_salmon", "82_salmon", "83_salmon", "84_salmon", "87_salmon", "88_salmon", "91_salmon", "92_salmon", "93_salmon")]
    topVarGenesall <- head(order(-rowVars(matallint)),2671)
    matallint <- matallint[topVarGenesall,]
    top25rownameintercep <- rownames(matallint)
    matallint <- matallint - rowMeans(matallint)
    dfallint <- as.data.frame(colData(vsdall)[,c("Treatment_basic", "Year")])
    colours <- list(Treatment_basic = c(baseline = "lightblue" ,exposed = "orange"), Year = c("2016" = "grey", "2017" = "black"))
    pheatmap(matallint,
             annotation_col=dfallint,
             annotation_colors = colours,
             cex = 2,
             color = bluered(90),
             show_rownames = T,
             show_colnames = F,
             fontsize = 1.5,
             cluster_rows = T,
             cluster_cols = F,
             scale = 'row')
deseq2 pheatmap • 1.7k views
@james-w-macdonald-5106
United States

You would have to provide some self-contained code to show what you say is happening. To me it looks like you used cluster_cols = TRUE in the first heatmap and cluster_cols = FALSE in the second. Whether you use stringsAsFactors = TRUE or FALSE should be irrelevant, as the clustering is based on your data, which are numeric, so R won't convert them to factors anyway.


Hi James

I am sorry if it was not clear: the code for the two heatmaps is exactly the same. The code is shown again below, ordered so that it produces the first heatmap and then the second, as in my original post. The only difference is the stringasfactor option on the first line, which I set to T or F beforehand. Both heatmaps use cluster_cols = FALSE, because I supply my own ordering of the samples (the columns).

> options(stringasfactor = FALSE) 
> matallint <- assay(vsdyear)[interestinggenes, c("sampleorder")]
> topVarGenesall <- head(order(-rowVars(matallint)),2671) 
> matallint <- matallint[topVarGenesall,]
> top25rownameintercep <- rownames(matallint)
> matallint <- matallint - rowMeans(matallint) 
> dfallint <- as.data.frame(colData(vsdall)[,c("Treatment_basic", "Year")]) 
> colours <- list(Treatment_basic = c(baseline = "lightblue" ,exposed = "orange"), Year = c("2016" = "grey", "2017" = "black"))
> pheatmap(matallint, 
>          annotation_col=dfallint,
>          annotation_colors = colours,
>          cex = 2,
>          color = bluered(90),
>          show_rownames = T,
>          show_colnames = F,
>          fontsize = 1.5,
>          cluster_rows = T,
>          cluster_cols = F,
>          scale = 'row')
> 
> options(stringasfactor=TRUE) 
> matallint <- assay(vsdyear)[interestinggenes, c("sampleorder")]
> topVarGenesall <- head(order(-rowVars(matallint)),2671) 
> matallint <- matallint[topVarGenesall,]
> top25rownameintercep <- rownames(matallint)
> matallint <- matallint - rowMeans(matallint) 
> dfallint <- as.data.frame(colData(vsdall)[,c("Treatment_basic", "Year")]) 
> colours <- list(Treatment_basic = c(baseline = "lightblue" ,exposed = "orange"), Year = c("2016" = "grey", "2017" = "black"))
> pheatmap(matallint, 
>          annotation_col=dfallint,
>          annotation_colors = colours,
>          cex = 2,
>          color = bluered(90),
>          show_rownames = T,
>          show_colnames = F,
>          fontsize = 1.5,
>          cluster_rows = T,
>          cluster_cols = F,
>          scale = 'row')


It's not whether or not the code is clear. That's not what I meant by self-contained code. By that I mean some code that anybody could run that shows the problems you are having. You are showing some code that you purport will cause changes in the behavior of pheatmap, but nobody else can run your code, so nobody can confirm that they see the same results.

In addition, I am not familiar with an option 'stringasfactor'. There is a 'stringsAsFactors' option that controls how R codes strings when reading data in, but like I said already, that won't have an effect on what you are seeing. If I just do something like

    library(pheatmap)
    set.seed(0xabeef)
    mat <- matrix(rnorm(1000), 100)
    pheatmap(mat, cluster_cols = FALSE)
    options(stringasfactor = TRUE)
    pheatmap(mat, cluster_cols = FALSE)
    options(stringasfactor = FALSE)
    pheatmap(mat, cluster_cols = FALSE)

I get identical results each time. Perhaps you can generate some self-contained code that shows what you see?
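
As a side note, R accepts a misspelled option name silently and simply stores it as a new, unrelated option, which is easy to check. The sketch below is illustrative only: the toy data frame and the variable names are made up, and it uses the explicit stringsAsFactors argument of data.frame() rather than the global option, on the assumption that the argument is the more portable way to show the effect across R versions.

```r
# Column type before touching any option (the default depends on the R version)
before <- class(data.frame(g = c("b", "a"))$g)

# Misspelled name: R stores it as an unrelated, user-defined option
options(stringasfactor = TRUE)
after <- class(data.frame(g = c("b", "a"))$g)

# TRUE: the misspelled option has no effect on data.frame()
unchanged <- identical(before, after)
print(unchanged)

# The explicit argument is what actually controls the conversion
as_fct <- class(data.frame(g = c("b", "a"), stringsAsFactors = TRUE)$g)
as_chr <- class(data.frame(g = c("b", "a"), stringsAsFactors = FALSE)$g)
print(as_fct)  # "factor"
print(as_chr)  # "character"
```

So if the two runs really do differ, the difference has to come from how some input object (for example a gene list read from a file) was created, not from the pheatmap call itself.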


Hi James

First of all, I apologise; I blanked on the self-contained code part, and it now makes sense.

I will try and make this work now (i.e. some reproducible code).

Ben

@wolfgang-huber-3550
EMBL European Molecular Biology Laborat…

It's impossible to say for sure without you giving reproducible code, but expressions like this are dangerous

    assay(vsdyear)[ immune_genes_paper$Gene_ID, ...]

if Gene_ID can be either a character vector or a factor. Namely, if Gene_ID is a factor, indexing uses its underlying integer codes (the positions of its values among the factor levels) rather than its labels, so rows are selected by position instead of by name. A small example of what can go wrong is shown here:

    dat = c(`b`=2, `a`=1)
    x = factor(c("b","a"))
    dat[x]
    # a b
    # 1 2
    dat[as.character(x)]
    # b a
    # 2 1
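
The same effect can be sketched on a matrix, which is closer to the assay(vsdyear)[...] case. The matrix, gene names, and variable names below are hypothetical stand-ins, not the poster's data; the point is only that a factor index can pull out different rows than the corresponding character index.

```r
# Hypothetical toy matrix standing in for assay(vsdyear)
mat <- matrix(1:6, nrow = 3,
              dimnames = list(c("geneC", "geneA", "geneB"), c("s1", "s2")))

ids_chr <- c("geneC", "geneB")   # IDs as a character vector
ids_fct <- factor(ids_chr)       # same IDs as a factor; levels are geneB, geneC

# Character index: rows are matched by name
by_name <- mat[ids_chr, , drop = FALSE]

# Factor index: rows are picked by the integer codes (2, 1), i.e. by position
by_code <- mat[ids_fct, , drop = FALSE]

print(rownames(by_name))  # "geneC" "geneB"
print(rownames(by_code))  # "geneA" "geneC"

# Defensive version: always convert to character before indexing
safe <- mat[as.character(ids_fct), , drop = FALSE]
print(identical(safe, by_name))  # TRUE
```

Wrapping the ID column in as.character() before indexing makes the result independent of whether the column was read in as a factor or as a character vector.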

Hi Wolfgang

Having looked a little deeper into what you suggested, I think this is what is causing the problem. I am working on some reproducible code so that people can run it.

Thank you for the response.

Ben
