Question

Adding gene names to rlog transformations for heatmaps?

alchemist4au • 0

@alchemist4au-7524

Last seen 9.1 years ago

United States

Hi,

I've been following the Beginner's guide to using the DESeq2 package to analyze some RNA-seq data and to make heatmaps. However, I can't figure out how to label the rows in the heatmap with gene names instead of Ensembl IDs. I tried to add gene names/HGNC symbols to the rlog-transformed values using biomaRt, but I was unsuccessful. Here is the code from the guide that I have been using to generate heatmaps.

library( "genefilter" )
library( "gplots" )        # provides heatmap.2()
library( "RColorBrewer" )  # provides brewer.pal()
topVarGenes <- head( order( rowVars( assay(rld) ), decreasing=TRUE ), 35 )

heatmap.2( assay(rld)[ topVarGenes, ], scale="row",
           trace="none", dendrogram="column",
           col = colorRampPalette( rev(brewer.pal(9, "RdBu")) )(255))

Is there an easy way to do this? I would greatly appreciate any advice on how to get the gene names onto the heatmaps. Thanks.

deseq2 gplot • 4.1k views

ADD COMMENT • link updated 9.1 years ago by James W. MacDonald 65k • written 9.1 years ago by alchemist4au • 0

score 2 · Accepted Answer · 2015-03-26

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 9 hours ago

United States

You should be able to use biomaRt. It would have been helpful if you had shown the code that was unsuccessful. Something like

library(biomaRt)
mat <- assay(rld)[ topVarGenes, ]
mart <- useMart("ensembl","hsapiens_gene_ensembl") ## assuming human, as you don't say
gns <- getBM(c("hgnc_symbol","ensembl_gene_id"), "ensembl_gene_id", row.names(mat), mart)
row.names(mat)[match(gns[,1], row.names(mat))] <- gns[,2]

Note that you might need to convert the data in 'gns' to character for match() to work correctly.
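For the character conversion mentioned above, here is a minimal sketch. It assumes 'gns' came back with factor columns, which is only a guess about what a given setup returns. The toy data.frame is a stand-in built from IDs shown later in this thread, not real getBM() output:

```r
# Toy stand-in for getBM() output; the factor columns are an assumption
gns <- data.frame(hgnc_symbol = c("USP2", "PTGER3"),
                  ensembl_gene_id = c("ENSG00000036672", "ENSG00000050628"),
                  stringsAsFactors = TRUE)

# Convert every factor column to character, leaving other columns alone
is_fac <- vapply(gns, is.factor, logical(1))
gns[is_fac] <- lapply(gns[is_fac], as.character)

# Inspect the column types after conversion
str(gns)
```

If str() already reports 'chr' columns before the conversion, this step is not needed.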

Alternatively you can just use an org package

library(org.Hs.eg.db)
gns <- select(org.Hs.eg.db, row.names(mat), "SYMBOL", "ENSEMBL")

And match as above.
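One thing to watch for with either approach: a lookup can return more than one row for an Ensembl ID, or no symbol at all. The sketch below shows one possible way to handle that. It uses a made-up table in place of real select() output; the extra "PTGER3-ALT" row, the ID "ENSG00000999999" and the vector 'ids' are invented for illustration. It keeps the first symbol per ID and falls back to the Ensembl ID when nothing maps:

```r
# Row names of the matrix; the third ID is hypothetical, chosen to have no mapping
ids <- c("ENSG00000036672", "ENSG00000050628", "ENSG00000999999")

# Made-up stand-in for select() output: key column first, then SYMBOL
gns <- data.frame(ENSEMBL = c("ENSG00000036672", "ENSG00000050628",
                              "ENSG00000050628", "ENSG00000999999"),
                  SYMBOL = c("USP2", "PTGER3", "PTGER3-ALT", NA),
                  stringsAsFactors = FALSE)

# Keep only the first row for each Ensembl ID
gns <- gns[!duplicated(gns$ENSEMBL), ]

# Look up a symbol for every ID, in the matrix's own row order
symbols <- gns$SYMBOL[match(ids, gns$ENSEMBL)]

# Fall back to the Ensembl ID where no symbol was found
labels <- ifelse(is.na(symbols) | symbols == "", ids, symbols)
print(labels)
```

Keeping the first match is an arbitrary choice; for a real analysis you may want to inspect the duplicated IDs by hand.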

ADD COMMENT • link 9.1 years ago James W. MacDonald 65k

Thank you. I really appreciate your help.

However, I'm getting the following error:

> row.names(mat)[match(gns[,1], row.names(mat))] <- gns[,2]
Error in row.names(mat)[match(gns[, 1], row.names(mat))] <- gns[, 2] :
NAs are not allowed in subscripted assignments

I'm not sure how to convert the data in gns to characters as you have mentioned? I'm fairly new to R.

ADD REPLY • link 9.1 years ago alchemist4au • 0

Let this be a lesson to you about taking advice from random strangers on the intertubes ;-D But this is a really good way to learn R, by decomposing the code and seeing where it went wrong. You will have to learn how to do this, if you use R much at all, because it is not possible to always write perfect code that works the first time.

The goal was to map the Ensembl IDs to HUGO symbols, and then replace the Ensembl IDs with those symbols (at least for the IDs for which we got an Ensembl -> HUGO mapping). The last line was intended to do the replacement:

row.names(mat)[match(gns[,1], row.names(mat))] <- gns[,2]

But note that the call to getBM() was

gns <- getBM(c("hgnc_symbol","ensembl_gene_id"), "ensembl_gene_id", row.names(mat), mart)

And if we look at that output, we get this:

> head(gns)
  hgnc_symbol ensembl_gene_id
1        USP2 ENSG00000036672
2      PTGER3 ENSG00000050628
3        BCL3 ENSG00000069399
4       NEDD4 ENSG00000069869
5      RNF126 ENSG00000070423
6        PAK3 ENSG00000077264

We ask getBM() to return the original Ensembl Gene ID as well as the HUGO symbols because the return object isn't necessarily in the same order as the data we sent to the Biomart server, and we want to ensure that we get the correct mapping between Ensembl Gene ID and symbol.
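To illustrate why both columns are requested, here is a small sketch. The IDs and symbols are taken from the head(gns) output above; the shuffled order and the vector 'ids' are assumptions for illustration. For each ID we asked about, match() gives the row of the returned table that holds it:

```r
# IDs in the order they appear as row names of the matrix (hypothetical order)
ids <- c("ENSG00000069399", "ENSG00000036672", "ENSG00000050628")

# Stand-in for a getBM() result that came back in a different order
gns <- data.frame(hgnc_symbol = c("USP2", "PTGER3", "BCL3"),
                  ensembl_gene_id = c("ENSG00000036672", "ENSG00000050628",
                                      "ENSG00000069399"),
                  stringsAsFactors = FALSE)

# For each element of ids, find its row in gns, then pull the symbol
idx <- match(ids, gns$ensembl_gene_id)
symbols <- gns$hgnc_symbol[idx]

# Show each ID next to the symbol it was paired with
print(data.frame(ids, symbols))
```

Without the ID column there would be nothing to match against, and the symbols could only be assigned in whatever order the server happened to return them.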

So let's deconstruct the code that didn't work. At a high level what we are doing is

row.names(mat) <- gns[,2]

where we add in this business

[match(gns[,1], row.names(mat))]

because we know that the gns object isn't in the same order as the original row.names of your matrix, so we use both columns of the gns object to do the Ensembl Gene ID -> HUGO mapping. However, I made a mistake; the second column of the gns data.frame contains the Ensembl IDs, and the first column contains the HUGO symbols. So when we try to match() gns[,1] with the row.names of the matrix, we get all NA values. We instead needed to match() gns[,2] with the row.names of the matrix:

row.names(mat)[match(gns[,2], row.names(mat))] <- gns[,1]

Does that make sense?
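As a possible variation, the row names of the matrix can be left as Ensembl IDs and the symbols passed to heatmap.2() through its labRow argument. The sketch below uses a random toy matrix and a hand-made 'gns' table, both invented for illustration; it pretends that no mapping came back for the third ID. The plot is only drawn if gplots is installed:

```r
# Toy matrix standing in for assay(rld)[ topVarGenes, ]; the values are random
set.seed(1)
ids <- c("ENSG00000036672", "ENSG00000050628", "ENSG00000069399")
mat <- matrix(rnorm(12), nrow = 3,
              dimnames = list(ids, paste0("sample", 1:4)))

# Stand-in for getBM() output, in a different order and missing the third ID
gns <- data.frame(hgnc_symbol = c("PTGER3", "USP2"),
                  ensembl_gene_id = c("ENSG00000050628", "ENSG00000036672"),
                  stringsAsFactors = FALSE)

# One label per matrix row, in the matrix's row order
labels <- gns$hgnc_symbol[match(row.names(mat), gns$ensembl_gene_id)]

# Keep the Ensembl ID wherever no symbol (or an empty one) was returned
missing <- is.na(labels) | labels == ""
labels[missing] <- row.names(mat)[missing]

# Draw the heatmap with the symbols as row labels, if gplots is available
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(mat, scale = "row", trace = "none",
                    dendrogram = "column", labRow = labels)
}
```

Leaving the row names untouched means the Ensembl IDs stay available for any later lookups, and unmapped or duplicated symbols never end up as row names.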

ADD REPLY • link 9.1 years ago James W. MacDonald 65k

Yup, I went through a bunch of shenanigans trying to convert the gns to characters and in the end saw that the columns had been switched...

Thanks again stranger :)

ADD REPLY • link 9.1 years ago alchemist4au • 0

By the way, the alternative org package worked. Thank you for your help!

ADD REPLY • link 9.1 years ago alchemist4au • 0