Search
Question: GO-terms delivered by GOstats hyperGTest() do not exist in my universe
0
gravatar for martis.mihaela
12 days ago by
martis.mihaela0 wrote:

Hello,

I'm using GOstats (version 2.42.0) to perform GO enrichment analysis in a non-model organism.

The input data was obtained from the DE analysis of RNA-Seq data. For the analysis I used an own

gene universe, as well as a sub-list of this genes to run the hypergeometric test. The function

hyperGtest() returned 25 enriched GO-terms. Looking closer to the results I realised that several

GO-terms were not present in my gene universe. How can that happen? Does anyone have an

explanation for that?

Best regards,

Mihaela

ADD COMMENTlink written 12 days ago by martis.mihaela0

You will have to provide a self-contained example to show the problem for anybody to help.
 

ADD REPLYlink written 12 days ago by James W. MacDonald45k

Sorry for the late response. Here is an example for one of the enriched GO-terms which are not in my universe list:

The Go-term GO:0050136 has a count of 3 genes, namely 808221, 808224, and 808225 (entrezIds). In my universe the GO-terms corresponding to this genes are: GO:0016021, GO:0005747, GO:0008137 (for 808221); GO:0016021, GO:0005747, GO:0008137, GO:0042773 (for 808224); and GO:0016021, GO:0005747, GO:0008137, GO:0006120 (for 808225). The GO-term "GO:0050136" is not present at all in my entire universe.

Just to be sure that those enriched GO-terms which are present in my universe are also those matching the counted genes, I run a test for the GO-term GO:0008137 and everything was fine.

To run the GO-term I have used the following code:


library(AnnotationDbi)
library(biomaRt)
library(GOstats)
library(plyr)
library("GSEABase")
library(AnnotationHub)

sigDE_file     <- "list_sigDE_genes.csv"
sig.set        <- read.table(file = sigDE_file, header = TRUE,sep = "\t", stringsAsFactors = FALSE, as.is = TRUE, na.strings = c(NA, "NA", " NA"))
sig.set        <- sig.set[!is.na(sig.set$entrez),]
sig.set$entrez <- as.character(sig.set$entrez)
sig.ids        <- sig.set$entrez

univ_file      <- "list_allDE_genes.csv"
univ           <- read.table(file = univ_file, header = TRUE,quote = "", sep = "\t", stringsAsFactors = FALSE, as.is = TRUE, na.strings = c(NA, "NA"," NA"))
univ <- univ[complete.cases(univ$entrez), ]
univ <- univ[!duplicated(univ$entrez), ]

ah        <- AnnotationHub()
orgs      <- subset(ah,ah$rdataclass == "OrgDb")
query(orgs, "Oryctolagus.cuniculus")                              
rabbit    <- orgs[["AH55793"]]
go_ids    <- AnnotationDbi::select(rabbit, keys = as.character(univ$entrez), keytype = "ENTREZID", columns = c("GO","EVIDENCE","ENTREZID","SYMBOL"))

go_ids <- go_ids[,c(2,3,1,4)]
go_ids <- go_ids[complete.cases(go_ids$GO),]

universe <- unique(go_ids$ENTREZID)
my.frame   <- data.frame(cbind(GO=go_ids$GO,EVIDENCE=go_ids$EVIDENCE,ENTREZID=go_ids$ENTREZID))
goFrame    <- GOFrame(my.frame, organism = "Oryctolagus.cuniculus")
goAllFrame <- GOAllFrame(goFrame)
gsc <- GeneSetCollection(goAllFrame, setType = GOCollection())

params  <- GSEAGOHyperGParams(name="My Custom GSEA based annot Params", geneSetCollection = gsc, geneIds = sig.ids , universeGeneIds = universe, ontology = "MF", pvalueCutoff = 0.05, conditional = FALSE, testDirection = "over")
mf_over         <- hyperGTest(params)
mf_over_results <- summary(mf_over)

all <-geneIdUniverse(mf_over)[["GO:0050136"]]
"808221"    "808222"    "808223"    "808224"    "808225"    "808226"    "808230"    "100341941" "100344233""100353024" "100358684"
x <-geneIds(mf_over)[geneIds(mf_over) %in% all]
"808221" "808224" "808225"


I can provide the gene list and the universe list, too. I'm just not sure how to attach it.

ADD REPLYlink written 12 days ago by martis.mihaela0
0
gravatar for Jenny Drnevich
11 days ago by
Jenny Drnevich1.9k
United States
Jenny Drnevich1.9k wrote:

The reason you get GO terms in the output that did not seem to be in the input universe is because the input universe only contains specific annotated GO terms. The part of your code:

goAllFrame <- GOAllFrame(goFrame)

actually pulls in all parental terms of the specific annotated GO terms. Remember GO is hierarchical in nature and any gene that gets specifically annotated with one GO term will automatically also get all the parental terms. In your example, GO:0005747 is a child term of GO:0050136 . See http://amigo.geneontology.org/amigo/term/GO:0005747 and click on the "Inferred Tree View"

ADD COMMENTlink written 11 days ago by Jenny Drnevich1.9k
Thank you very much for the answer. That explains everything. Is there a way to pool only those annotated in my own universe? /Mihaela On 10 Nov 2017 6:16 pm, "Jenny Drnevich [bioc]" <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User Jenny Drnevich <https: support.bioconductor.org="" u="" 2812=""/> wrote Comment: > GO-terms delivered by GOstats hyperGTest() do not exist in my universe > <https: support.bioconductor.org="" p="" 102812="" #102858="">: > > The reason you get GO terms in the output that did not seem to be in the > input universe is because the input universe only contains specific > annotated GO terms. The part of your code: > > goAllFrame <- GOAllFrame(goFrame) > > actually pulls in all parental terms of the specific annotated GO terms. > Remember GO is hierarchical in nature and any gene that gets specifically > annotated with one GO term will automatically also get all the parental > terms. In your example, *GO:0005747 *is a child term of *GO:0050136* . > See http://amigo.geneontology.org/amigo/term/GO:0005747 and click on the > "Inferred Tree View" > ------------------------------ > > Post tags: GOstats > > You may reply via email or visit https://support.bioconductor. > org/p/102812/#102858 >
ADD REPLYlink written 11 days ago by martis.mihaela0

I wouldn't necessarily want to remove all the parental terms because they could find higher level terms that have several different child terms. However, to reduce the redundancy in your testing, do  GSEAGOHyperGParams(conditional = TRUE) . This will test GO:0005747 first, and if it is significant then it will remove those 3 genes from the counts for all it's parental terms.

ADD REPLYlink written 11 days ago by Jenny Drnevich1.9k
Thanks On 10 Nov 2017 6:28 pm, "Jenny Drnevich [bioc]" <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User Jenny Drnevich <https: support.bioconductor.org="" u="" 2812=""/> wrote Comment: > GO-terms delivered by GOstats hyperGTest() do not exist in my universe > <https: support.bioconductor.org="" p="" 102812="" #102860="">: > > I wouldn't necessarily want to remove all the parental terms because they > could find higher level terms that have several different child terms. > However, to reduce the redundancy in your testing, do GSEAGOHyperGParams(conditional > = TRUE) . This will test GO:0005747 first, and if it is significant then > it will remove those 3 genes from the counts for all it's parental terms. > ------------------------------ > > Post tags: GOstats > > You may reply via email or visit https://support.bioconductor. > org/p/102812/#102860 >
ADD REPLYlink written 11 days ago by martis.mihaela0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 351 users visited in the last hour