topGO gene names/Ensembl IDs
krc3004

Hi All,

I have performed a differential expression analysis using DESeq2, and would now like to analyze enriched GO terms using the topGO package.  However, I am having difficulty formatting the gene names for input.  Here's what I have so far.

## create named vector of p values
GO_genes = setNames(res$padj, row.names(res))

## create a gene selection function to select significant genes
sig_genes <- function(pval) {return (pval < 10^-5)}

## create topGO object
topGO = new("topGOdata", description = "diff expr GO test", ontology = "BP",
            allGenes = GO_genes, geneSel = sig_genes, nodeSize = 10,
            annot = annFUN.org, mapping = "org.Mm.eg.db", ID = "GeneName")

However, I obtain the following error:

Building most specific GOs .....    ( 0 GO terms found. )

Build GO DAG topology ..........    ( 0 GO terms and 0 relations. )
Error in if (is.na(index) || index < 0 || index > length(nd)) stop("vertex is not in graph: ",  : 
  missing value where TRUE/FALSE needed

I noticed some other users had the same issue, and I am guessing it has something to do with the fact that I'm passing gene names instead of Ensembl IDs (although it looks like topGO supports gene names?). So, I tried this:

## get ensembl IDs for mouse
mart = useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
results = getBM(attributes = c("ensembl_gene_id"), values = row.names(res), mart = mart)

This yields a vector of Ensembl IDs that is longer than my list of values (row.names(res)) and doesn't provide a mapping from my values to Ensembl IDs, so I'm not sure how to pass it to the topGO object, since topGO expects a named vector of p-values.

I know a few other users have asked about this, but I haven't been able to come up with a solution. Any advice would be much appreciated. Thanks!
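A separate point worth checking in the code above: DESeq2 can set padj to NA (for example for genes removed by independent filtering), and pval < 10^-5 then returns NA rather than TRUE/FALSE for those genes. A minimal base-R sketch of a more defensive version of the first two steps, assuming res is a DESeq2 results table with gene identifiers as row names. A small data frame with hypothetical gene names and made-up values stands in for res here, and dropping NA genes from the gene universe is an illustrative choice, not something topGO requires:

```r
# Stand-in for a DESeq2 results table (hypothetical gene names, made-up values):
# row names are gene identifiers and padj can contain NA.
res <- data.frame(
  padj = c(1e-8, 0.2, NA, 3e-6),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Named vector of adjusted p-values, keeping only genes with a non-NA padj
GO_genes <- setNames(res$padj, rownames(res))
GO_genes <- GO_genes[!is.na(GO_genes)]

# Selection function that never returns NA: TRUE only for non-missing p < 1e-5
sig_genes <- function(pval) !is.na(pval) & pval < 1e-5

print(GO_genes)
print(sig_genes(GO_genes))
```

GO_genes and sig_genes would then be passed as allGenes and geneSel in the same new("topGOdata", ...) call; the NA check inside sig_genes is redundant once NAs are dropped, but keeps the function safe if it is reused on unfiltered input.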

 

Mike Smith
EMBL Heidelberg

I'm afraid I can't help with the topGO part, but to retain the mapping between your query and the returned values with biomaRt, you can normally list the same variable as both an attribute and a filter.  

At the moment it looks like you aren't specifying which variable you want to filter on, so the values you provide are simply ignored and every Ensembl gene ID in the dataset is returned. You can check the available filters using listFilters(mart).

Assuming you're using mgi_symbol as your filter, you should be able to do something like this to get both the symbols and the Ensembl IDs returned:

results <- getBM(attributes = c("mgi_symbol", "ensembl_gene_id"), 
                 filters = "mgi_symbol",
                 values = c("Cntnap1", "Luzp1"), 
                 mart = mart)
> results
  mgi_symbol    ensembl_gene_id
1    Cntnap1 ENSMUSG00000017167
2      Luzp1 ENSMUSG00000001089
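With a two-column table like the one above, the named p-value vector from the question can be re-keyed from symbols to Ensembl IDs in base R. A minimal sketch, assuming results has the mgi_symbol and ensembl_gene_id columns shown above; the p-values and the "Unmapped1" symbol are made up, and keeping only the first Ensembl ID for a symbol that maps to several is an illustrative choice. If several symbols map to the same Ensembl ID, the result would contain duplicate names, which this sketch does not resolve:

```r
# Mapping table with the same shape as the getBM() output above
results <- data.frame(
  mgi_symbol = c("Cntnap1", "Luzp1"),
  ensembl_gene_id = c("ENSMUSG00000017167", "ENSMUSG00000001089"),
  stringsAsFactors = FALSE
)

# Named vector of p-values keyed by symbol (illustrative values;
# "Unmapped1" stands for a symbol that biomaRt returned no match for)
GO_genes <- c(Cntnap1 = 1e-8, Luzp1 = 0.3, Unmapped1 = 0.01)

# Keep one Ensembl ID per symbol (the first row returned for that symbol)
results <- results[!duplicated(results$mgi_symbol), ]

# Look up each symbol's Ensembl ID; NA where there is no match
ens <- results$ensembl_gene_id[match(names(GO_genes), results$mgi_symbol)]

# Drop unmapped genes and rename the remaining p-values with Ensembl IDs
GO_genes_ens <- setNames(GO_genes[!is.na(ens)], ens[!is.na(ens)])
print(GO_genes_ens)
```

The re-keyed vector could then be used as allGenes, with ID = "ensembl" passed to annFUN.org.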

@lluis-revilla-sancho
European Union

topGO's annFUN.org function accepts the following values for ID:

c("entrez", "genbank", "alias", "ensembl", "symbol", "genename", "unigene")

So you need to replace ID = "GeneName" with ID = "genename"; the problem has nothing to do with topGO accepting one kind of name or the other, but with how you pass the argument.
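Note that the list above distinguishes "symbol" from "genename"; in the org.*.eg.db packages the GENENAME field holds the full descriptive gene name, so row names such as Cntnap1 would presumably correspond to "symbol". As a rough way to pick the option that matches your identifiers, here is a sketch of a hypothetical helper, guess_topgo_id (not part of topGO), that guesses from the format alone, assuming unversioned mouse Ensembl gene IDs (ENSMUSG plus 11 digits), purely numeric Entrez IDs, and falling back to "symbol" for anything else:

```r
# Hypothetical helper (not part of topGO): guess which annFUN.org ID option
# matches a set of mouse gene identifiers, based only on their format.
guess_topgo_id <- function(ids) {
  if (all(grepl("^ENSMUSG[0-9]{11}$", ids))) {
    return("ensembl")   # unversioned mouse Ensembl gene IDs
  }
  if (all(grepl("^[0-9]+$", ids))) {
    return("entrez")    # Entrez Gene IDs are plain integers
  }
  "symbol"              # fallback heuristic: assume gene symbols such as "Luzp1"
}

print(guess_topgo_id(c("Cntnap1", "Luzp1")))
print(guess_topgo_id(c("ENSMUSG00000017167", "ENSMUSG00000001089")))
```

The returned string could then be passed as ID = guess_topgo_id(names(GO_genes)) in the new("topGOdata", ...) call; since it only looks at the format, it is worth checking the result against what the row names actually are.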
