GOStats: summary hyperGTest, count and size
3
0
Entering edit mode
@christine-glaer-6291
Last seen 10.2 years ago
Dear all, I'm analyzing a set of genes for GO enrichment, given a customized annotation. In that annotation, a specific GO term occurs only once; in my set of genes, the gene attached to it is also represented. I would have expected to find the numbers -- genes having a specific GO term in my list being "1" and genes having the term in total, in the annotation, being "1" as well -- in "count" and "size", however, 13 and 62 genes are denoted. Please have a look at my script and the session info: SessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=de_DE.utf8 LC_NUMERIC=C [3] LC_TIME=de_DE.utf8 LC_COLLATE=de_DE.utf8 [5] LC_MONETARY=de_DE.utf8 LC_MESSAGES=de_DE.utf8 [7] LC_PAPER=de_DE.utf8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GO.db_2.9.0 GSEABase_1.22.0 annotate_1.38.0 [4] GOstats_2.26.0 RSQLite_0.11.4 DBI_0.2-7 [7] graph_1.38.3 Category_2.26.0 AnnotationDbi_1.22.6 [10] Biobase_2.20.1 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] AnnotationForge_1.2.2 genefilter_1.42.0 IRanges_1.18.4 [4] RBGL_1.36.2 splines_3.0.2 stats4_3.0.2 [7] survival_2.37-7 tcltk_3.0.2 tools_3.0.2 [10] XML_3.98-1.1 xtable_1.7-1 ## Script library(GOstats) library(GSEABase) library(GO.db) ANNO_FILE <- "genes_anno_forgostats.txt" UNIVERSE_FILE <-"universe.txt" ORG <- "MY_Organism" GENE_FILE<-"targets.txt" OUT_FILE<-"results.txt" goframeData <- read.table(ANNO_FILE,header=T) universe <- scan(file=UNIVERSE_FILE,what="character") goFrame = GOFrame(goframeData, organism = ORG) goAllFrame = GOAllFrame(goFrame) gsc <- GeneSetCollection(goAllFrame, setType = GOCollection()) genes <- scan(file=GENE_FILE,what="character") NAMESPACE="MF" x = length(NAMESPACE) params <- GSEAGOHyperGParams(name="custom",geneSetCollection = gsc, geneIds = genes, universeGeneIds = universe, ontology = NAMESPACE, pvalueCutoff = 0.05, conditional=TRUE, testDirection="over") OverMF <- hyperGTest(params) NAMESPACE="BP" x = length(NAMESPACE) params <- GSEAGOHyperGParams(name="custom",geneSetCollection = gsc, geneIds = genes, universeGeneIds = universe, ontology = NAMESPACE, pvalueCutoff = 0.05, conditional=TRUE, testDirection="over") OverBP <- hyperGTest(params) NAMESPACE="CC" x = length(NAMESPACE) params <- GSEAGOHyperGParams(name="custom",geneSetCollection = gsc, geneIds = genes, universeGeneIds = universe, ontology = NAMESPACE, pvalueCutoff = 0.05, conditional=TRUE, testDirection="over") OverCC <- hyperGTest(params) write.table(summary(OverMF),file=OUTFILE,quote=F,sep='\t', append=T) write.table(summary(OverBP),file=OUTFILE,quote=F,sep='\t', append=T) write.table(summary(OverCC),file=OUTFILE,quote=F,sep='\t', append=T) If I run GOStats unconditionally, other GO terms being parent terms to the one I'm looking at also pop up. The numbers (count, size) are the same for all terms; my assumption is that genes with child terms belonging to a specific parent term are summed up in count/size, but I'm absolutely not sure if this is the case... Could you please help me figuring out what is denoted by the numbers in "counts" and "size", if not the absolute numbers like I would have expected? Kind regards, and many thanks in advance, Christine Gläßer [[alternative HTML version deleted]]
Annotation GO Organism GOstats Annotation GO Organism GOstats • 2.8k views
ADD COMMENT
0
Entering edit mode
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 8.3 years ago
United States
Hi Christine, So from just looking at the code for Category, it looks like you might want to consult with the accessors for universeCounts(), expectedCounts() and geneCounts() etc. to get at some of your curiosity for these results. IOW, when you pass your hyperGResult object to them like (for example): universeCounts(OverCC) expectedCounts(OverCC) geneCounts(OverCC) They should each tell you something about your results. Anyhow, you can see the manual page for this whole family of accessors like this: ?universeCounts I hope this helps you get sorted out what you are curious about, Marc On 03/19/2014 05:07 AM, Christine Gläßer wrote: > Dear all, > > I'm analyzing a set of genes for GO enrichment, given a customized > annotation. In that annotation, a specific GO term occurs only once; in > my set of genes, the gene attached to it is also represented. > > I would have expected to find the numbers -- genes having a specific GO > term in my list being "1" and genes having the term in total, in the > annotation, being "1" as well -- in "count" and "size", however, 13 and > 62 genes are denoted. Please have a look at my script and the session info: > > > > SessionInfo() > > R version 3.0.2 (2013-09-25) > > Platform: x86_64-redhat-linux-gnu (64-bit) > > locale: > > [1] LC_CTYPE=de_DE.utf8 LC_NUMERIC=C > > [3] LC_TIME=de_DE.utf8 LC_COLLATE=de_DE.utf8 > > [5] LC_MONETARY=de_DE.utf8 LC_MESSAGES=de_DE.utf8 > > [7] LC_PAPER=de_DE.utf8 LC_NAME=C > > [9] LC_ADDRESS=C LC_TELEPHONE=C > > [11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C > > attached base packages: > > [1] parallel stats graphics grDevices utils datasets methods > > [8] base > > other attached packages: > > [1] GO.db_2.9.0 GSEABase_1.22.0 annotate_1.38.0 > > [4] GOstats_2.26.0 RSQLite_0.11.4 DBI_0.2-7 > > [7] graph_1.38.3 Category_2.26.0 AnnotationDbi_1.22.6 > > [10] Biobase_2.20.1 BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > > [1] AnnotationForge_1.2.2 genefilter_1.42.0 IRanges_1.18.4 > > [4] RBGL_1.36.2 splines_3.0.2 stats4_3.0.2 > > [7] survival_2.37-7 tcltk_3.0.2 tools_3.0.2 > > [10] XML_3.98-1.1 xtable_1.7-1 > > > ## Script > > library(GOstats) > > library(GSEABase) > > library(GO.db) > > > ANNO_FILE <- "genes_anno_forgostats.txt" > > UNIVERSE_FILE <-"universe.txt" > > ORG <- "MY_Organism" > > GENE_FILE<-"targets.txt" > > OUT_FILE<-"results.txt" > > > goframeData <- read.table(ANNO_FILE,header=T) > > universe <- scan(file=UNIVERSE_FILE,what="character") > > goFrame = GOFrame(goframeData, organism = ORG) > > goAllFrame = GOAllFrame(goFrame) > > gsc <- GeneSetCollection(goAllFrame, setType = GOCollection()) > > > genes <- scan(file=GENE_FILE,what="character") > > > NAMESPACE="MF" > > x = length(NAMESPACE) > > params <- GSEAGOHyperGParams(name="custom",geneSetCollection = gsc, > geneIds = genes, universeGeneIds = universe, ontology = NAMESPACE, > pvalueCutoff = 0.05, conditional=TRUE, testDirection="over") > > OverMF <- hyperGTest(params) > > > NAMESPACE="BP" > > x = length(NAMESPACE) > > params <- GSEAGOHyperGParams(name="custom",geneSetCollection = gsc, > geneIds = genes, universeGeneIds = universe, ontology = NAMESPACE, > pvalueCutoff = 0.05, conditional=TRUE, testDirection="over") > > OverBP <- hyperGTest(params) > > > NAMESPACE="CC" > > x = length(NAMESPACE) > > params <- GSEAGOHyperGParams(name="custom",geneSetCollection = gsc, > geneIds = genes, universeGeneIds = universe, ontology = NAMESPACE, > pvalueCutoff = 0.05, conditional=TRUE, testDirection="over") > > OverCC <- hyperGTest(params) > > > write.table(summary(OverMF),file=OUTFILE,quote=F,sep='\t', append=T) > > write.table(summary(OverBP),file=OUTFILE,quote=F,sep='\t', append=T) > > write.table(summary(OverCC),file=OUTFILE,quote=F,sep='\t', append=T) > > > > > If I run GOStats unconditionally, other GO terms being parent terms to > the one I'm looking at also pop up. The numbers (count, size) are the > same for all terms; my assumption is that genes with child terms > belonging to a specific parent term are summed up in count/size, but I'm > absolutely not sure if this is the case... Could you please help me > figuring out what is denoted by the numbers in "counts" and "size", if > not the absolute numbers like I would have expected? > > > Kind regards, and many thanks in advance, > > Christine Gläßer > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
@christine-glaer-6291
Last seen 10.2 years ago
Dear all, I'm analyzing a set of genes for GO enrichment, given a customized annotation. In that annotation, a specific GO term occurs only once; in my set of genes, the gene attached to it is also represented. I would have expected to find the numbers -- genes having a specific GO term in my list being "1" and genes having the term in total, in the annotation, being "1" as well -- in "count" and "size", however, 13 and 62 genes are denoted (please find my script below). My assumption, based on the results doing unconditional testing, is that genes with child terms belonging to a specific parent term are summed up in count/size, but I'm absolutely not sure if this is the case... Could you please help me figuring out what is denoted by the numbers in "counts" and "size", if not the absolute numbers like I would have expected? Kind regards, and many thanks in advance, Christine Gläßer SessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=de_DE.utf8 LC_NUMERIC=C [3] LC_TIME=de_DE.utf8 LC_COLLATE=de_DE.utf8 [5] LC_MONETARY=de_DE.utf8 LC_MESSAGES=de_DE.utf8 [7] LC_PAPER=de_DE.utf8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GO.db_2.9.0 GSEABase_1.22.0 annotate_1.38.0 [4] GOstats_2.26.0 RSQLite_0.11.4 DBI_0.2-7 [7] graph_1.38.3 Category_2.26.0 AnnotationDbi_1.22.6 [10] Biobase_2.20.1 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] AnnotationForge_1.2.2 genefilter_1.42.0 IRanges_1.18.4 [4] RBGL_1.36.2 splines_3.0.2 stats4_3.0.2 [7] survival_2.37-7 tcltk_3.0.2 tools_3.0.2 [10] XML_3.98-1.1 xtable_1.7-1 ## Script library(GOstats) library(GSEABase) library(GO.db) ANNO_FILE <- "genes_anno_forgostats.txt" UNIVERSE_FILE <-"universe.txt" ORG <- "MY_Organism" GENE_FILE<-"targets.txt" OUT_FILE<-"results.txt" goframeData <- read.table(ANNO_FILE,header=T) universe <- scan(file=UNIVERSE_FILE,what="character") goFrame = GOFrame(goframeData, organism = ORG) goAllFrame = GOAllFrame(goFrame) gsc <- GeneSetCollection(goAllFrame, setType = GOCollection()) genes <- scan(file=GENE_FILE,what="character") NAMESPACE="MF" x = length(NAMESPACE) params <- GSEAGOHyperGParams(name="custom",geneSetCollection = gsc, geneIds = genes, universeGeneIds = universe, ontology = NAMESPACE, pvalueCutoff = 0.05, conditional=TRUE, testDirection="over") OverMF <- hyperGTest(params) NAMESPACE="BP" x = length(NAMESPACE) params <- GSEAGOHyperGParams(name="custom",geneSetCollection = gsc, geneIds = genes, universeGeneIds = universe, ontology = NAMESPACE, pvalueCutoff = 0.05, conditional=TRUE, testDirection="over") OverBP <- hyperGTest(params) NAMESPACE="CC" x = length(NAMESPACE) params <- GSEAGOHyperGParams(name="custom",geneSetCollection = gsc, geneIds = genes, universeGeneIds = universe, ontology = NAMESPACE, pvalueCutoff = 0.05, conditional=TRUE, testDirection="over") OverCC <- hyperGTest(params) write.table(summary(OverMF),file=OUTFILE,quote=F,sep='\t', append=T) write.table(summary(OverBP),file=OUTFILE,quote=F,sep='\t', append=T) write.table(summary(OverCC),file=OUTFILE,quote=F,sep='\t', append=T) [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
@christine-glaer-6291
Last seen 10.2 years ago
Dear Marc, thank you very much for your kind and fast reply. I checked the universe and gene counts - same results (13, 62 genes instead of 1,1). However, the documentation in universeCounts clarified why I had 62 in "size" ("?Size? is the number that could have been found in your gene list if every instance had turned up.") instead of 1; the docu further says that geneCounts should return the numbers of genes annotated to a specific term, if I understand correctly - then I would expect only 1 gene to be enlisted. Is there a misunderstanding on my side, and geneCounts enlists all genes belonging to the instances mentioned in "size", which are of course more genes than the one which is annotated with the specific term? Kind regards and many thanks, Christine Gl??er
ADD COMMENT

Login before adding your answer.

Traffic: 972 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6