Question

Adding gene names as row annotations to an RNAseq heatmap made with ComplexHeatmap

0

Entering edit mode

caroline.zanchi ▴ 10

@3f9f9566

Last seen 5 months ago

Germany

Hello ! I have a deseq object (called 'dds' here)

dds
class: DESeqDataSet 
dim: 11060 198 
metadata(1): version
assays(6): counts avgTxLength ... H cooks
rownames(11060): FBgn0000003 FBgn0000008 ... FBgn0288846 FBgn0288856
rowData names(66): baseMean baseVar ... deviance maxCooks
colnames(198): 1 2 ... 1291 1425
colData names(10): library sample ... condition

the 'rownames' being FlyBase accession numbers.

I am using ComplexHeatmap to make a huge heatmap of genes that are differentially expressed between 2 levels of my 'condition' factor :

resIHW5_tidy<- results(dds, contrast=c("condition","F6","F0"), filterFun = ihw,alpha=0.05, lfcThreshold=1, altHypothesis="greaterAbs", tidy=TRUE) 
df_IHW5<- as.data.frame(resIHW5_tidy)
res_log2FC = subset(df_IHW5, padj<=0.05)
rownames(res_log2FC)=IHW5_sign$row ## now the row is the FlyBase accession number

# I added an extra column in res_log2FC which consists in a concatenation of the accession number and gene name :

genes <- res_log2FC$row
gene_names<-res_log2FC$name
genes_names<-paste(genes,"-",gene_names)

### I transformed the deseq object : 
vsd<-vst(dds,blind=FALSE)

colData(vsd)$condition <- as.factor(colData(vsd)$condition)

# filtering out the samples that are in my 2 levels of interest (the ones compared in my contrast) from the transformed deseq object : 

vsd_interest<-vsd[ , vsd$condition %in% c("F0", "F6") ]

samples_interest<-colnames(vsd_interest) # subsetting the samples that are represented in my 2 levels of interest

# I thank again the nice people who helped me [here][1] https://www.biostars.org/p/9612041/

annot_col <- data.frame(condition = factor(colData(vsd_interest)$condition)) 

expr_matrix <- assay(vsd_interest)[rownames(assay(vsd_interest)) %in% genes, colnames(assay(vsd_interest)) %in% samples_interest]

ref_means <- rowMeans(expr_matrix[, annot_col$condition == "F0"])
cent_expr_matrix <- expr_matrix - ref_means
cent_expr_breaks <- seq(-3, +3, length = 101)

# Then the heatmap : 

ha = HeatmapAnnotation(df = annot_col, which="column",show_annotation_name = FALSE,show_legend=FALSE)

Heatmap(cent_expr_matrix, name = "Log2FC", km = 5, col = colorRamp2(c(-3, 0, +3), c("#00aba9", "white", "#ff0097")),
    top_annotation = ha, column_split=annot_col$condition, show_row_names = TRUE, show_column_names = TRUE, show_column_dend=FALSE,show_row_dend=FALSE, width = unit(0.8, "snpc"))

It gives a very long heatmap, you can see the top of it here :

enter image description here

As you can see (you have to trust me cause the numbers are really small), the gene accession numbers are on the right side. I would like to add gene names as a row annotation instead. I would appreciate any help !

DESeq2 ComplexHeatmap • 2.1k views

ADD COMMENT • link 7 months ago caroline.zanchi ▴ 10

score 2 · Accepted Answer · 2025-05-20

2

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 12 days ago

United States

library(org.Dm.eg.db)
rn <- select(org.Dm.eg.db, row.names(cent_expr_matrix), "SYMBOL","FLYBASE")
## ensure no NA values get propagated
if(any(is.na(rn[,2]))) 
   rn[is.na(rn[,2]),2] <- rn[is.na(rn[,2]),1]
row.names(cent_expr_matrix) <- rn[,2]

ADD COMMENT • link 7 months ago James W. MacDonald 68k

0

Entering edit mode

Thank you !

enter image description here

In the end I did

rn2 <- select(org.Dm.eg.db, row.names(cent_expr_matrix), "GENENAME","FLYBASE")
## ensure no NA values get propagated
if(any(is.na(rn2[,2]))) 
   rn2[is.na(rn2[,2]),2] <- rn2[is.na(rn2[,2]),1]
row.names(cent_expr_matrix) <- rn2[,2]

Would it make sense to add an annotation column of GO terms ? Or is it something that needs to be properly analysed ?

ADD REPLY • link 7 months ago caroline.zanchi ▴ 10

2

Entering edit mode

Probably not. Any given gene may be appended with many different GO terms, so you would end up with something that is not very useful.

> z <- mapIds(org.Dm.eg.db, head(keys(org.Dm.eg.db)), "GOALL", "ENTREZID", multiVals = "CharacterList")
'select()' returned 1:many mapping between keys and columns
> z
CharacterList of length 6
[["30970"]] GO:0000139 GO:0003674 GO:0003674 ... GO:1901576 GO:1901576
[["30971"]] GO:0000122 GO:0000785 GO:0000785 ... GO:2000026 GO:2001141
[["30972"]] <NA>
[["30973"]] GO:0001941 GO:0002118 GO:0002121 ... GO:1901576 GO:1901576
[["30975"]] GO:0000976 GO:0000977 GO:0000978 ... GO:2001141 GO:2001141
[["30976"]] GO:0000166 GO:0000166 GO:0003674 ... GO:1901363 GO:1901363
> lengths(z)
30970 30971 30972 30973 30975 30976 
   85   297     1   144   276    55

Even if we only use the directly appended GO terms, it's too many.

> z <- mapIds(org.Dm.eg.db, head(keys(org.Dm.eg.db)), "GO", "ENTREZID", multiVals = "CharacterList")
'select()' returned 1:many mapping between keys and columns
> lengths(z)
30970 30971 30972 30973 30975 30976 
    7    30     1    16    16     7

The conventional thing to do is to perform some sort of test (hypergeometric, gene set test, etc) and then you might generate a heatmap that contains the genes that gave rise to one or more significant GO or KEGG terms, with annotations to indicate the genes for each term.

ADD REPLY • link 7 months ago James W. MacDonald 68k

0

Entering edit mode

Yes indeed. I should probably better add an extra annotation column identifying pseudogenes. Maybe I should even filter pseudogenes out ? I will look up the test you mentioned, thank you !

ADD REPLY • link 7 months ago caroline.zanchi ▴ 10