Question: How are feasible genes determined in topGO
2.6 years ago by
michael.steffen20 wrote:

How are feasible genes selected in topGO? I can't seem to find any information on this. Many genes that are not considered feasible seem no different from any other genes in their GO annotation.

Thanks,

Mike

Answer: How are feasible genes determined in topGO
2.6 years ago by
United States
James W. MacDonald51k wrote:

From the vignette:

One important point to notice is that not all the genes that are provided by geneList, the initial gene universe, can be annotated to the GO. This can be seen by comparing the number of all available genes, the genes present in geneList, with the number of feasible genes. We are therefore forced at this point to restrict the gene universe to the set of feasible genes for the rest of the analysis.

But why can they not be annotated to the GO? These are real GOs. This is more of what I have been finding while trying to answer this question. I understand what it's doing, but not why or how it is doing it.

What do you mean by 'real GOs'? The quote above is in essence saying that not all Gene IDs have GO terms appended to them. If you have a gene that isn't annotated in GO, then you can't do anything with it. This has nothing to do with the GO terms themselves, but instead whether or not a gene has a GO term.

By real GOs, I mean these Gene IDs do have GO terms associated with them. It even shows them when running the code. The problem is, topGO is telling me these genes aren't usable, but I cannot figure out why.

Well, I don't know what to tell you. I say what it means, and you tell me I am wrong, without any evidence as to why I am wrong, and even after you say you can't figure it out yourself.

So I'll give you an example to show why I am right and you are wrong, and if you don't believe me, that's cool. But I won't be responding further if you insist on being misinformed.

Using the example from the vignette:

> library(topGO)
> library(ALL)
> data(ALL)
> data(geneList)
> library(hgu95av2.db)
> affyLib <- "hgu95av2.db"  # annotation package name passed to annFUN.db
> sampleGOdata <- new("topGOdata",
+                     description = "Simple session",
+                     ontology = "BP",
+                     allGenes = geneList,
+                     geneSel = topDiffGenes,
+                     nodeSize = 10,
+                     annot = annFUN.db,
+                     affyLib = affyLib)

> sampleGOdata

--------------- topGOdata object -------------------------

Description:
-  Simple session

Ontology:
-  BP

323 available genes (all genes from the array):
- symbol:  1095_s_at 1130_at 1196_at 1329_s_at 1340_s_at  ...
- score :  1 1 0.62238 0.541224 1  ...
- 50  significant genes.

310 feasible genes (genes that can be used in the analysis):
- symbol:  1095_s_at 1130_at 1196_at 1329_s_at 1340_s_at  ...
- score :  1 1 0.62238 0.541224 1  ...
- 46  significant genes.

GO graph (nodes with at least  10  genes):
- a graph with directed edges
- number of nodes = 1017
- number of edges = 2275

---------------------- topGOdata object -------------------------

So we have 323 available genes, and 310 feasible genes, right? And we used annFUN.db to map the hgu95av2 IDs to GO IDs. Let's look at that:

> z <- annFUN.db("BP", names(geneList), "hgu95av2.db")
> class(z)
[1] "list"
> z[1:3]
$`GO:0000022`
[1] "37171_at"

$`GO:0000055`
[1] "38708_at"

$`GO:0000056`
[1] "38708_at"

> sum(names(geneList) %in% unique(unlist(z)))
[1] 310

You can see here that annFUN.db is creating a list, where the names are the appended GO IDs, and the list items are the probeset IDs that are mapped to that GO term. Of the 323 probeset IDs I started with, there are only 310 of them that have a BP GO term!

I feel like your response was unnecessarily hostile. I am not sure how I told you that you were wrong. You asked me to clarify something, and I responded by trying to help you better understand what I meant.

But besides that, your last sentence did inadvertently lead me to the answer I was looking for. The reason many genes were not feasible despite having GO annotations was that they had no annotations in the ontology domain that was being tested. So for instance,

 $ Caust.v2_000002: chr [1:4] "GO:0000166" "GO:0002218" "GO:0003676" "GO:0032481"

would not be feasible under "CC" since none of its GO terms belong to CC. However, if I investigated "BP" or "MF", then this gene would be considered feasible. I guess I was surprised that so many genes lacked certain ontology domains, but really, it does make sense that they wouldn't have all of them.
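
For anyone who lands here with the same confusion, the point above can be sketched in a few lines of base R. This is only a toy illustration, not topGO's internal code: the gene-to-GO list mimics the named list shown in the post, `hypothetical_gene` is made up, and the `go_ontology` lookup table is hand-written for these five GO IDs (an assumption for the sketch; real analyses would take the ontology of each term from an annotation resource such as GO.db).

```r
# Toy gene-to-GO mapping: a named list, gene ID -> character vector of GO IDs.
# "Caust.v2_000002" and its GO IDs come from the post above;
# "hypothetical_gene" is invented for contrast.
gene2GO <- list(
  Caust.v2_000002   = c("GO:0000166", "GO:0002218", "GO:0003676", "GO:0032481"),
  hypothetical_gene = c("GO:0005634", "GO:0003676")
)

# Hand-written ontology lookup for the GO IDs used here (assumption for this
# sketch, not produced by topGO).
go_ontology <- c(
  "GO:0000166" = "MF",  # nucleotide binding
  "GO:0002218" = "BP",  # activation of innate immune response
  "GO:0003676" = "MF",  # nucleic acid binding
  "GO:0032481" = "BP",  # positive regulation of type I interferon production
  "GO:0005634" = "CC"   # nucleus
)

# Treat a gene as feasible for an ontology if at least one of its GO terms
# belongs to that ontology.
is_feasible <- function(go_ids, ontology) {
  any(go_ontology[go_ids] == ontology, na.rm = TRUE)
}

# Rows are genes, columns are ontologies.
feasible <- sapply(c("BP", "MF", "CC"), function(ont) {
  vapply(gene2GO, is_feasible, logical(1), ontology = ont)
})
print(feasible)
```

In this toy table, Caust.v2_000002 comes out feasible for "BP" and "MF" but not for "CC", which matches what topGO reported. Running the same kind of check for each ontology before building the topGOdata object is a quick way to see how many genes will drop out of the universe.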