Question: topGO gene names/ensembl IDs
krc3004 wrote, 6 months ago:

Hi All,

I have performed a differential expression analysis using DESeq2, and would now like to analyze enriched GO terms using the topGO package.  However, I am having difficulty formatting the gene names for input.  Here's what I have so far.

## create named vector of p values
GO_genes = setNames(res$padj, row.names(res))

## create a gene selection function to select significant genes
sig_genes <- function(pval) {return (pval < 10^-5)}

## create topGO object
topGO = new("topGOdata", description="diff expr GO test", ontology= "BP",  allGenes = GO_genes, geneSel = sig_genes, nodeSize = 10,, mapping="", ID = "GeneName")

However, I obtain the following error:

Building most specific GOs .....    ( 0 GO terms found. )

Build GO DAG topology ..........    ( 0 GO terms and 0 relations. )
Error in if (is.na(index) || index < 0 || index > length(nd)) stop("vertex is not in graph: ",  : 
  missing value where TRUE/FALSE needed

I noticed some other users had this same issue, and am guessing it has something to do with the fact that I'm passing gene names instead of ensembl IDs (although it looks like topGO supports this?).  So, I tried this:

## get ensembl IDs for mouse
mart = useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
results = getBM(attributes = c("ensembl_gene_id"), values = row.names(res), mart = mart)

This yields a vector of ensembl IDs that is longer than my list of values (row.names(res)) and doesn't provide a mapping from my values to ensembl IDs, so I'm not sure how to pass it to the topGO object, which expects a named vector of p-values.

I know a few other users have asked about this, but I haven't been able to come up with a solution. Any advice would be much appreciated.  Thanks!


Mike Smith (EMBL Heidelberg / de.NBI) wrote, 6 months ago:

I'm afraid I can't help with the topGO part, but to retain the mapping between your query and the returned values with biomaRt, you can normally list the same variable as both an attribute and a filter.  

At the moment it looks like you aren't specifying what variable you want to filter on, so the values you provide are actually just ignored and it returns every ensembl ID in the dataset.  You can check the available filters using listFilters(mart).

Assuming you're using mgi_symbol as your filter, you should be able to do something like this to get both it and the ensembl IDs returned:

results <- getBM(attributes = c("mgi_symbol", "ensembl_gene_id"), 
                 filters = "mgi_symbol",
                 values = c("Cntnap1", "Luzp1"), 
                 mart = mart)
> results
  mgi_symbol    ensembl_gene_id
1    Cntnap1 ENSMUSG00000017167
2      Luzp1 ENSMUSG00000001089
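
Once you have a table like that, carrying the p-values over to the ensembl IDs is mostly a lookup. Here is a minimal sketch in base R. The p-values and the "NotAGene" entry are made-up stand-ins for your res$padj, and I'm assuming you are happy to keep only the first ensembl ID when a symbol maps to several and to drop symbols that have no match.

```r
# Hypothetical adjusted p-values named by gene symbol (stand-in for res$padj)
padj <- c(Cntnap1 = 1e-8, Luzp1 = 0.2, NotAGene = 0.01)

# Mapping table shaped like the getBM() output above
results <- data.frame(
  mgi_symbol      = c("Cntnap1", "Luzp1"),
  ensembl_gene_id = c("ENSMUSG00000017167", "ENSMUSG00000001089"),
  stringsAsFactors = FALSE
)

# Keep the first ensembl ID per symbol (an assumption, not the only sensible choice)
results <- results[!duplicated(results$mgi_symbol), ]

# Look up each symbol in the mapping table; unmatched symbols give NA
idx  <- match(names(padj), results$mgi_symbol)
keep <- !is.na(idx)

# Named vector of p-values, now keyed by ensembl ID
GO_genes_ensembl <- setNames(padj[keep], results$ensembl_gene_id[idx[keep]])
```

The resulting vector has the same shape as the original named p-value vector, just with ensembl IDs as names, so it could be passed as allGenes in its place.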

Lluís R (European Union) wrote, 6 months ago:

topGO accepts the following values for the ID argument:

c("entrez", "genbank", "alias", "ensembl", "symbol", "genename", "unigene")

So you need to replace ID = "GeneName" with ID = "genename". The problem is not which kind of name topGO accepts, but how you pass the argument.
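
For reference, here is a rough sketch of how the whole call might look. Several things here are my assumptions rather than anything from the original post: the annotation function annFUN.org, the mouse annotation package org.Mm.eg.db, and the wrapper names clean_pvalues and build_go_data, which are just hypothetical helpers. As far as I can tell, "genename" refers to the full descriptive gene name, so if your row names are symbols such as Cntnap1, ID = "symbol" is probably the better fit. I also drop NA adjusted p-values first, since DESeq2 can return NA in padj.

```r
# Drop genes whose adjusted p-value is NA
clean_pvalues <- function(pvals) pvals[!is.na(pvals)]

# Hypothetical wrapper around the topGOdata constructor.
# Assumes the topGO and org.Mm.eg.db packages are installed.
build_go_data <- function(pvals, cutoff = 1e-5,
                          org_db = "org.Mm.eg.db", id_type = "symbol") {
  # Attach topGO so its classes and GO environments are available
  library(topGO)
  new("topGOdata",
      description = "diff expr GO test",
      ontology    = "BP",
      allGenes    = clean_pvalues(pvals),
      geneSel     = function(p) p < cutoff,  # significant-gene selector
      nodeSize    = 10,
      annot       = annFUN.org,  # the ID argument is used by this annotation function
      mapping     = org_db,
      ID          = id_type)
}

# Usage, assuming `res` is the DESeq2 results object from the question:
# GOdata <- build_go_data(setNames(res$padj, row.names(res)))
```

If the build step still reports 0 GO terms, that usually suggests the names in the p-value vector do not match the identifier type given in ID.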
