topGO gene names/ensembl IDs
krc3004 ▴ 10
Last seen 4.5 years ago

Hi All,

I have performed a differential expression analysis using DESeq2, and would now like to analyze enriched GO terms using the topGO package.  However, I am having difficulty formatting the gene names for input.  Here's what I have so far.

## create named vector of p values
GO_genes = setNames(res$padj, row.names(res))

## create a gene selection function to select significant genes
sig_genes <- function(pval) {return (pval < 10^-5)}

## create topGO object
topGO = new("topGOdata", description="diff expr GO test", ontology= "BP",  allGenes = GO_genes, geneSel = sig_genes, nodeSize = 10,, mapping="", ID = "GeneName")

However, I obtain the following error:

Building most specific GOs .....    ( 0 GO terms found. )

Build GO DAG topology ..........    ( 0 GO terms and 0 relations. )
Error in if (is.na(index) || index < 0 || index > length(nd)) stop("vertex is not in graph: ",  : 
  missing value where TRUE/FALSE needed

I noticed some other users had this same issue, and am guessing it has something to do with the fact that I'm passing gene names instead of ensembl IDs (although it looks like topGO supports this?).  So, I tried this:

## get ensembl IDs for mouse
mart = useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
results = getBM(attributes = c("ensembl_gene_id"), values = row.names(res), mart = mart)

This yields a vector of Ensembl IDs that is longer than my list of values (row.names(res)) and doesn't provide a mapping from my values to Ensembl IDs, so I'm not sure how to pass it to the topGO object, which expects a named vector of p-values.

I know a few other users have asked about this, but I haven't been able to come up with a solution. Any advice would be much appreciated.  Thanks!


topgo biomart deseq2 gene ontology • 3.4k views
Mike Smith ★ 5.8k
Last seen 24 minutes ago
EMBL Heidelberg / de.NBI

I'm afraid I can't help with the topGO part, but to retain the mapping between your query and the returned values with biomaRt, you can normally list the same variable as both an attribute and a filter.  

At the moment it looks like you aren't specifying what variable you want to filter on, so the values you provide are actually just ignored and it returns every ensembl ID in the dataset.  You can check the available filters using listFilters(mart).

Assuming you're using mgi_symbol as your filter, you should be able to do something like this to get both it and the Ensembl IDs returned:

results <- getBM(attributes = c("mgi_symbol", "ensembl_gene_id"), 
                 filters = "mgi_symbol",
                 values = c("Cntnap1", "Luzp1"), 
                 mart = mart)
> results
  mgi_symbol    ensembl_gene_id
1    Cntnap1 ENSMUSG00000017167
2      Luzp1 ENSMUSG00000001089
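
Once you have a two-column table like that, one way to get back to the named vector of p-values that topGO expects is to look each symbol up in the table and rename the vector. This is only a rough sketch in base R: the p-values and the "NotInMart" gene are made up for illustration, and padj stands in for res$padj named by row.names(res). Symbols that map to more than one Ensembl ID are dropped here, which is a simplification you may not want.

```r
# Toy stand-in for the DESeq2 results: adjusted p-values named by gene symbol.
# These values are invented purely for illustration.
padj <- c(Cntnap1 = 1e-8, Luzp1 = 0.2, NotInMart = 0.5)

# Mapping table in the same shape as the getBM() output above
results <- data.frame(
  mgi_symbol      = c("Cntnap1", "Luzp1"),
  ensembl_gene_id = c("ENSMUSG00000017167", "ENSMUSG00000001089"),
  stringsAsFactors = FALSE
)

# Drop symbols that map to more than one Ensembl ID (ambiguous mappings)
dup     <- results$mgi_symbol[duplicated(results$mgi_symbol)]
results <- results[!results$mgi_symbol %in% dup, ]

# Look up each symbol in the mapping; symbols without a match give NA
idx  <- match(names(padj), results$mgi_symbol)
keep <- !is.na(idx)

# Named vector of p-values keyed by Ensembl ID, unmapped genes removed
GO_genes <- setNames(padj[keep], results$ensembl_gene_id[idx[keep]])
```

The resulting GO_genes has the same structure as the vector in the question, just with Ensembl IDs as names.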
Last seen 22 days ago
European Union

topGO accepts the following names:

c("entrez", "genbank", "alias", "ensembl", "symbol", "genename", "unigene")

So you need to replace ID = "GeneName" with ID = "genename". The problem is not which kind of name topGO accepts but how you pass the argument.
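
For reference, here is a minimal sketch of what a complete call might look like. Several things in it are assumptions rather than anything confirmed in this thread: that the row names are official mouse gene symbols, that the org.Mm.eg.db annotation package is installed, and that annFUN.org is the annotation function you want. As far as I know, the ID values listed above are the ones annFUN.org understands, with "symbol" meaning official symbols such as Cntnap1 and "genename" meaning the full descriptive gene name, so "symbol" is used here. make_go_data is just a hypothetical helper name.

```r
# Hypothetical helper: build a topGOdata object from a named vector of
# adjusted p-values (names assumed to be mouse gene symbols).
make_go_data <- function(pvals, cutoff = 1e-5) {
  # topGO is attached here so the "topGOdata" class and annFUN.org are visible
  library(topGO)

  # DESeq2 can report NA for padj; drop those genes before testing
  pvals <- pvals[!is.na(pvals)]

  new("topGOdata",
      description = "diff expr GO test",
      ontology    = "BP",
      allGenes    = pvals,
      geneSel     = function(p) p < cutoff,  # same cutoff as in the question
      nodeSize    = 10,
      annot       = annFUN.org,              # assumption: org-package annotation
      mapping     = "org.Mm.eg.db",          # assumption: mouse annotation package
      ID          = "symbol")                # use "ensembl" for Ensembl gene IDs
}
```

With the vector from the question this would be called as make_go_data(GO_genes).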

