Question

Error in GO enrichment with topGO

June.

Hello, I am following the tutorial "https://www.bioconductor.org/packages/devel/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html#126multipletestingfdr,andcomparisonwithresultsfromtheoriginal_paper" for microarray analysis, but my data are from S. cerevisiae. I have a problem with the GO enrichment step and get an error that I don't understand.

table is made by topTable, where the top-ranked genes are extracted from the model. The vector back_genes holds the background gene PROBEIDs. DE_genes is a table with the differentially expressed genes.

gene_IDs <- rownames(table)
in_universe <- gene_IDs %in% c(DE_genes, back_genes)
in_selection <- gene_IDs %in% DE_genes

# 1 = differentially expressed, 0 = background; names are the probe IDs
all_genes <- factor(as.integer(in_selection[in_universe]))
names(all_genes) <- gene_IDs[in_universe]

library(topGO)
library(yeast2.db)
top_GO_data <- new("topGOdata", ontology = "BP", allGenes = all_genes,
                   annot = annFUN.db, affyLib = "yeast2.db")

Error:

Building most specific GOs .....
Error: cannot join using column gene_id - column not present in both tables

At every step the vectors contain data. gene_IDs is populated, and I can see it has more than 9000 elements like "1775344_at", "1769773_at", etc. I don't know whether I have made a mistake in defining my arguments or misunderstood a step. I tried goana too, but had trouble with the package org.Sc.eg.db, which couldn't be loaded.

Please help. Thanks in advance.

topGO microarray Gene ontology • 1.5k views

updated 5.0 years ago by James W. MacDonald • written 5.0 years ago by June.

score 1 · Answer 1 · 2020-07-20

To use topGO with annFUN.db here, you would have to come up with your own version of annFUN.db that generates the right SQL query for yeast2.db, which may be more than you really want to get involved with. Instead, you could build the probe-to-GO mapping yourself and pass it in through annFUN.gene2GO. I don't really get the idea of the 'background genes' in that workflow, so instead let's just pass the adjusted p-values for all genes and let geneSel pick out the ones that are considered significant.

library(topGO)
library(yeast2.db)

## Named vector of adjusted p-values, one entry per probe
all_genes <- table$adj.P.Val
names(all_genes) <- row.names(table)

## I just faked up some data as an example - you should have something that looks like this:
head(all_genes)
## 1769308_at 1769309_at 1769310_at 1769311_at 1769312_at 1769313_at
##  0.7045542  0.2027509  0.6004394  0.6677696  0.9122440  0.3313085

## Generate an ID2GO mapping (probe ID -> GO IDs), restricted to the BP ontology
tmp <- toTable(yeast2GO2ALLPROBES)
tmpbp <- tmp[tmp$Ontology == "BP",]
ID2GO <- split(tmpbp$go_id, tmpbp$probe_id)

## geneSel marks probes with adjusted p-value < 0.05 as significant
GOdata <- new("topGOdata", ontology = "BP", allGenes = all_genes,
              annot = annFUN.gene2GO, gene2GO = ID2GO,
              geneSel = function(x) x < 0.05)

Building most specific GOs .....
    ( 5056 GO terms found. )

Build GO DAG topology ..........
    ( 5056 GO terms and 11352 relations. )

Annotating nodes ...............
    ( 5687 genes annotated to the GO terms. )
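
In case the ID2GO step is unclear: the gene2GO argument is just a named list with one element per probe, each holding a character vector of GO IDs. Here is a toy illustration in base R. The rows below are made up purely to show the shape (the probe-to-GO pairings are not real annotations), and the toy_ names are hypothetical; the real table comes from toTable(yeast2GO2ALLPROBES).

```r
## Made-up stand-in for toTable(yeast2GO2ALLPROBES); pairings are NOT real annotations
toy <- data.frame(
  probe_id = c("1769308_at", "1769308_at", "1769309_at", "1769310_at"),
  go_id    = c("GO:0006412", "GO:0008150", "GO:0006412", "GO:0003674"),
  Ontology = c("BP", "BP", "BP", "MF"),
  stringsAsFactors = FALSE
)

## Keep only BP rows, then group the GO IDs by probe
toy_bp <- toy[toy$Ontology == "BP", ]
toy_ID2GO <- split(toy_bp$go_id, toy_bp$probe_id)

## Inspect the resulting named list
str(toy_ID2GO)
```

The probe that only has an MF row drops out of the BP mapping, which is why the number of annotated genes can be smaller than the number of probes you pass in.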

To do the MF and CC ontologies as well, you would subset the 'tmp' object on those values of Ontology and set the ontology argument to match. But this is the general idea.
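
A rough sketch of how that could look for all three ontologies, assuming topGO and yeast2.db are installed. The helper build_GOdata and the list GOdata_by_onto are names I made up for illustration, and the p-values are faked again as a placeholder, so swap in your own named vector from topTable.

```r
library(topGO)
library(yeast2.db)

## Placeholder input: fake adjusted p-values for every probe on the array.
## Replace this with your own named vector built from topTable().
set.seed(1)
probes <- keys(yeast2.db)
all_genes <- setNames(runif(length(probes)), probes)

## Probe-to-GO table covering all three ontologies
tmp <- toTable(yeast2GO2ALLPROBES)

## Hypothetical helper: build a topGOdata object for one ontology
build_GOdata <- function(onto) {
  tmp_onto <- tmp[tmp$Ontology == onto, ]
  ID2GO <- split(tmp_onto$go_id, tmp_onto$probe_id)
  new("topGOdata", ontology = onto, allGenes = all_genes,
      annot = annFUN.gene2GO, gene2GO = ID2GO,
      geneSel = function(x) x < 0.05)
}

## One topGOdata object per ontology, named BP, MF and CC
GOdata_by_onto <- lapply(c(BP = "BP", MF = "MF", CC = "CC"), build_GOdata)
```

Each element can then be used the same way as the single GOdata object above.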