GOstat: listing genes from hyperGTest
2
0
Entering edit mode
Tim Smith ★ 1.1k
@tim-smith-1532
Last seen 9.6 years ago
Hi, I was performing a hyperGTest for genes in homo-sapiens. For a set of input genes, this function returns some 'significant' GO terms. What I wanted to now do was to co-relate each significant GO term (returned by this function) with genes (from my set of input genes) associated with that GO term. However, I think that I may be using the wrong package/function to get the releveant set of genes. Currently, what I'm doing is finding the significant GO terms by using the following code: ----------------------- ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs paramsGO <- new("GOHyperGParams", geneIds = genes1, universeGeneIds = allGenes, annotation = "org.Hs.eg.db", ontology = "BP", pvalueCutoff = 1, conditional = FALSE, testDirection = "over") GO <- hyperGTest(paramsGO) -------------------------- This gives me a set of significant GO terms. Now, I would like to find which subset of genes in 'genes1' is associated with each of the significant GO term. To do this I map all GO terms to their Entrez IDs using the 'org.Hs.eg.db' package using the following: xx <- as.list(org.Hs.egGO2EG) to get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't this number small?) that map to at least one Entrez ID. So, from here I look up which Entrez IDs are associated with my GO term of interest. My problem is that often, the GO term from hyperGTest is not associated with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described above ), i.e. the GO term/ID is not in the list obtained from 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by hyperGTest, but does not appear to be associated with any Entrez IDs in the org.Hs.eg.db package. Where could I be going wrong? I would give a set of genes so that the example is reproducible, but with hundreds of genes the email will get too long! Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here.... My sessionInfo() is: -------------------------------- R version 2.7.2 (2008-08-25) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] grid splines tools stats graphics grDevices utils datasets methods base other attached packages: [1] gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0 gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0 Category_2.6.0 [8] RBGL_1.16.0 annotate_1.18.0 xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0 GO.db_2.2.0 KEGG.db_2.2.0 [15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 affy_1.18.0 [22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0 loaded via a namespace (and not attached): [1] cluster_1.11.11 MASS_7.2-44 --------------------------------- [[alternative HTML version deleted]]
Annotation GO Annotation GO • 1.1k views
ADD COMMENT
0
Entering edit mode
Tim Smith ★ 1.1k
@tim-smith-1532
Last seen 9.6 years ago
Thanks James. If I can tweak that function, I'll get exactly what I want. I tried what you suggested and got the following error: --------------------------- ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs paramsGO <- new("GOHyperGParams", geneIds = genes1, universeGeneIds = allGenes, annotation = "org.Hs.eg.db", ontology = "BP", pvalueCutoff = 1, conditional = FALSE, testDirection = "over") GO <- hyperGTest(paramsGO) ps <- probeSetSummary(GO) Error in get(mapName, envir = pkgEnv, inherits = FALSE) : variable "org.Hs.egENTREZID" was not found -------------------------------- I guess the function would return the probe ids if I was using them, but I have Entrez IDs as input. Or am I doing something wrong? thanks! ----- Original Message ---- From: James W. MacDonald <jmacdon@med.umich.edu> Cc: bioc <bioconductor@stat.math.ethz.ch> Sent: Wednesday, October 22, 2008 9:10:39 AM Subject: Re: [BioC] GOstat: listing genes from hyperGTest Hi Tim, Does probeSetSummary() do what you want? Best, Jim Tim Smith wrote: > > Hi, > > I > was performing a hyperGTest for genes in homo-sapiens. For a set of > input genes, this function returns some 'significant' GO terms. What I > wanted to now do was to co-relate each significant GO term (returned by > this function) with genes (from my set of input genes) associated with > that GO term. However, I think that I may be using the wrong > package/function to get the releveant set of genes. > > Currently, what I'm doing is finding the significant GO terms by using the following code: > > ----------------------- > ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs > > paramsGO <- new("GOHyperGParams", geneIds = genes1, > universeGeneIds = allGenes, annotation = "org.Hs.eg.db", > ontology = "BP", pvalueCutoff = 1, conditional = FALSE, > testDirection = "over") > > GO <- hyperGTest(paramsGO) > -------------------------- > This > gives me a set of significant GO terms. Now, I would like to find which > subset of genes in 'genes1' is associated with each of the significant > GO term. To do this I map all GO terms to their Entrez IDs using the > 'org.Hs.eg.db' package using the following: > > xx <- as.list(org.Hs.egGO2EG) > > to > get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't > this number small?) that map to at least one Entrez ID. So, from here I > look up which Entrez IDs are associated with my GO term of interest. > > My > problem is that often, the GO term from hyperGTest is not associated > with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described > above ), i.e. the GO term/ID is not in the list obtained from > 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by > hyperGTest, but does not appear to be associated with any Entrez IDs in > the org.Hs.eg.db package. Where could I be going wrong? > [[elided Yahoo spam]] > > Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here.... > > My sessionInfo() is: > -------------------------------- > R version 2.7.2 (2008-08-25) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United > States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > attached base packages: > [1] grid splines tools stats graphics grDevices utils datasets methods base > > other attached packages: > [1] > gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0 > gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0 > Category_2.6.0 > [8] RBGL_1.16.0 annotate_1.18.0 > xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0 > GO.db_2.2.0 KEGG.db_2.2.0 > [15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 affy_1.18.0 > [22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0 > > loaded via a namespace (and not attached): > [1] cluster_1.11.11 MASS_7.2-44 > > > --------------------------------- > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-0646 734-936-8662 [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Hi Tim, Yeah, probeSetSummary() is probably not what you want, if you are not starting with an Affy chip. There are some gymnastics required to map things back to the original Affy chip that you won't need to do. In addition, if you are not using a conditional hypergeometric analysis, it should be pretty simple to get what you want without even needing to parse things out of the GOHyperGResult object. An example: ## fake up some data > geneIds <- Lkeys(org.Hs.egGO)[sample(1:5000, 500)] > univ <- Lkeys(org.Hs.egGO) > param <- new("GOHyperGParams", geneIds = geneIds, universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP") > hyp <- hyperGTest(param) > summary(hyp, categorySize=10) GOBPID Pvalue OddsRatio ExpCount Count Size Term 1 GO:0007338 0.002723500 29.25101 0.07808304 2 54 single fertilization 2 GO:0009566 0.002925855 28.16374 0.08097501 2 56 fertilization So we have two terms of interest. Getting the Entrez Gene IDs from the input set that map to these terms is easy: > geneIds[geneIds %in% get("GO:0007338", revmap(org.Hs.egGO))] [1] "100131137" "10007" Now you might also want to know which 54 Entrez Gene IDs map to that particular GO term. Since you are not conditioning, this includes that particular GO term and all its offspring. > offspring <- get("GO:0007338", GOBPOFFSPRING) > egids <- unique(unlist(mget(c("GO:0007338", offspring), revmap(org.Hs.egGO), ifnotfound=NA), use.names=FALSE)) > egids[!is.na(egids)] [1] "1047" "4179" "4240" "4486" "4809" "5016" [7] "6674" "7783" "7784" "7802" "7993" "8747" [13] "8748" "8852" "9082" "10007" "10361" "22917" [19] "26476" "53340" "57055" "57829" "64100" "93185" [25] "158062" "442868" "100131137" "49" "410" "2683" [31] "3010" "4184" "6677" "7142" "7455" "8857" [37] "11055" "124626" "2054" "2741" "10343" "10566" [43] "27297" "152015" "3074" "167" "928" "2515" [49] "5104" "23553" "284359" "164684" "7141" "79400" Best, Jim Tim Smith wrote: > Thanks James. If I can tweak that function, I'll get exactly what I want. > > I tried what you suggested and got the following error: > > --------------------------- > ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs > > paramsGO <- new("GOHyperGParams", geneIds = genes1, > universeGeneIds = allGenes, annotation = "org.Hs.eg.db", > ontology = "BP", pvalueCutoff = 1, conditional = FALSE, > testDirection = "over") > > GO <- hyperGTest(paramsGO) > ps <- probeSetSummary(GO) > > Error in get(mapName, envir = pkgEnv, inherits = FALSE) : > variable "org.Hs.egENTREZID" was not found > -------------------------------- > > I guess the function would return the probe ids if I was using them, but I have Entrez IDs as input. > > Or am I doing something wrong? > > thanks! > > > > > > ----- Original Message ---- > From: James W. MacDonald <jmacdon at="" med.umich.edu=""> > > Cc: bioc <bioconductor at="" stat.math.ethz.ch=""> > Sent: Wednesday, October 22, 2008 9:10:39 AM > Subject: Re: [BioC] GOstat: listing genes from hyperGTest > > Hi Tim, > > Does probeSetSummary() do what you want? > > Best, > > Jim > > > > Tim Smith wrote: >> >> Hi, >> >> I >> was performing a hyperGTest for genes in homo-sapiens. For a set of >> input genes, this function returns some 'significant' GO terms. What I >> wanted to now do was to co-relate each significant GO term (returned by >> this function) with genes (from my set of input genes) associated with >> that GO term. However, I think that I may be using the wrong >> package/function to get the releveant set of genes. >> >> Currently, what I'm doing is finding the significant GO terms by using the following code: >> >> ----------------------- >> ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs >> >> paramsGO <- new("GOHyperGParams", geneIds = genes1, >> universeGeneIds = allGenes, annotation = "org.Hs.eg.db", >> ontology = "BP", pvalueCutoff = 1, conditional = FALSE, >> testDirection = "over") >> >> GO <- hyperGTest(paramsGO) >> -------------------------- >> This >> gives me a set of significant GO terms. Now, I would like to find which >> subset of genes in 'genes1' is associated with each of the significant >> GO term. To do this I map all GO terms to their Entrez IDs using the >> 'org.Hs.eg.db' package using the following: >> >> xx <- as.list(org.Hs.egGO2EG) >> >> to >> get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't >> this number small?) that map to at least one Entrez ID. So, from here I >> look up which Entrez IDs are associated with my GO term of interest. >> >> My >> problem is that often, the GO term from hyperGTest is not associated >> with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described >> above ), i.e. the GO term/ID is not in the list obtained from >> 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by >> hyperGTest, but does not appear to be associated with any Entrez IDs in >> the org.Hs.eg.db package. Where could I be going wrong? >> > [[elided Yahoo spam]] >> Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here.... >> >> My sessionInfo() is: >> -------------------------------- >> R version 2.7.2 (2008-08-25) >> i386-pc-mingw32 >> >> locale: >> LC_COLLATE=English_United >> States.1252;LC_CTYPE=English_United >> States.1252;LC_MONETARY=English_United >> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 >> >> attached base packages: >> [1] grid splines tools stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] >> gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0 >> gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0 >> Category_2.6.0 >> [8] RBGL_1.16.0 annotate_1.18.0 >> xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0 >> GO.db_2.2.0 KEGG.db_2.2.0 >> [15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 affy_1.18.0 >> [22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0 >> >> loaded via a namespace (and not attached): >> [1] cluster_1.11.11 MASS_7.2-44 >> >> >> --------------------------------- >> >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-0646 734-936-8662
ADD REPLY
0
Entering edit mode
@james-w-macdonald-5106
Last seen 13 hours ago
United States
Hi Tim, Does probeSetSummary() do what you want? Best, Jim Tim Smith wrote: > > Hi, > > I > was performing a hyperGTest for genes in homo-sapiens. For a set of > input genes, this function returns some 'significant' GO terms. What I > wanted to now do was to co-relate each significant GO term (returned by > this function) with genes (from my set of input genes) associated with > that GO term. However, I think that I may be using the wrong > package/function to get the releveant set of genes. > > Currently, what I'm doing is finding the significant GO terms by using the following code: > > ----------------------- > ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs > > paramsGO <- new("GOHyperGParams", geneIds = genes1, > universeGeneIds = allGenes, annotation = "org.Hs.eg.db", > ontology = "BP", pvalueCutoff = 1, conditional = FALSE, > testDirection = "over") > > GO <- hyperGTest(paramsGO) > -------------------------- > This > gives me a set of significant GO terms. Now, I would like to find which > subset of genes in 'genes1' is associated with each of the significant > GO term. To do this I map all GO terms to their Entrez IDs using the > 'org.Hs.eg.db' package using the following: > > xx <- as.list(org.Hs.egGO2EG) > > to > get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't > this number small?) that map to at least one Entrez ID. So, from here I > look up which Entrez IDs are associated with my GO term of interest. > > My > problem is that often, the GO term from hyperGTest is not associated > with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described > above ), i.e. the GO term/ID is not in the list obtained from > 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by > hyperGTest, but does not appear to be associated with any Entrez IDs in > the org.Hs.eg.db package. Where could I be going wrong? > > I would give a set of genes so that the example is reproducible, but with hundreds of genes the email will get too long! > > Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here.... > > My sessionInfo() is: > -------------------------------- > R version 2.7.2 (2008-08-25) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United > States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > attached base packages: > [1] grid splines tools stats graphics grDevices utils datasets methods base > > other attached packages: > [1] > gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0 > gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0 > Category_2.6.0 > [8] RBGL_1.16.0 annotate_1.18.0 > xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0 > GO.db_2.2.0 KEGG.db_2.2.0 > [15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 affy_1.18.0 > [22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0 > > loaded via a namespace (and not attached): > [1] cluster_1.11.11 MASS_7.2-44 > > > --------------------------------- > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-0646 734-936-8662
ADD COMMENT

Login before adding your answer.

Traffic: 943 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6