Hypergeometric test with Disease Ontology
1
0
Entering edit mode
Ted Morrow ▴ 60
@ted-morrow-3132
Last seen 9.6 years ago
Dear all, I would like to conduct a hypergeometric test on a list of genes but using Disease Ontology instead of GO or KEGG terms. The package "DO.db" contains this information but I have been unable to find a way of using this database in conjunction with the "GOstats" package that I have been using. Has anyone attempted to run a hypergeometric test for Disease Ontology terms? Is there another package I could use? Or is there a way of modifying the argument (if that's the right word) GOHyperGParams in GOstats so that it can make use of the information in DO.db? Both the GO and KEGG analyses work fine: ### GO chunk params <- new("GOHyperGParams", geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, annotation = "hgu95av2.db", ontology="BP", pvalueCutoff=0.01, conditional=FALSE, testDirection="over") hgOver <- hyperGTest(params) hgOver Gene to GO BP test for over-representation 2136 GO BP ids tested (15 have p < 0.01) Selected gene set size: 112 Gene universe size: 951 Annotation package: hgu95av2 ### KEGG chunk paramsKEGG <- new("KEGGHyperGParams", geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, annotation = "hgu95av2.db", pvalueCutoff=0.01, testDirection="over") hgOverKEGG <- hyperGTest(paramsKEGG) hgOverKEGG summary(hgOverKEGG) My data looks like this: str(selectedEntrezIds) chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414" "7546" "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" "113235" "28962" "10269" "4069" "30835" "7018" ... > str(entrezUniverse) chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165" "6521" "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842" "5048" "4843" "6329" "5080" "6401" "6853" ... My naive attempts to use DO have included: paramsDO <- new("DOHyperGParams", geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, annotation = "DO.db", pvalueCutoff=0.01, testDirection="over") Which of course doesn't work and gives the following error: Error in getClass(Class, where = topenv(parent.frame())) : "DOHyperGParams" is not a defined class > traceback() 3: stop(gettextf("\"%s\" is not a defined class", Class), domain = NA) 2: getClass(Class, where = topenv(parent.frame())) 1: new("DOHyperGParams", geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over") Replacing "GOHyperGParams" with "DOHyperGParams" also gives the following error:.. hgOverDO <- hyperGTest(paramsDO) Error in match.arg(ontology, c("BP", "CC", "MF")) : 'arg' should be one of ?BP?, ?CC?, ?MF? traceback() 10: stop(gettextf("'arg' should be one of %s", paste(dQuote(choices), collapse = ", ")), domain = NA) 9: match.arg(ontology, c("BP", "CC", "MF")) 8: getUniverseViaGo(p) 7: universeBuilder(p) 6: universeBuilder(p) 5: .hyperGTestInternal(p) 4: is(object, Cl) 3: is(object, Cl) 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", "hyperGTest") 1: hyperGTest(paramsDO) Any help would be greatly appreciated. /Ted > sessionInfo() R version 2.12.1 (2010-12-16) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5 org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5 graph_1.28.0 Category_2.16.0 AnnotationDbi_1.12.0 [12] Biobase_2.10.0 loaded via a namespace (and not attached): [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0 splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 xtable_1.5-6 -- ______________________________________________________________________ ____________ Edward H. Morrow Department of Ecology and Evolution Animal Ecology (zooekologi) Evolutionary Biology Centre Uppsala University Norbyv?gen 18-D SE-752 36 Uppsala SWEDEN Email: ted.morrow at ebc.uu.se Tel: +46 18 471 2676 Fax +46 18 471 6484 Webpage: http://www.iee.uu.se/zooekol/default.php?type=personalpage&id =119&lang=en
GO hgu95av2 GO hgu95av2 • 2.2k views
ADD COMMENT
0
Entering edit mode
Gilbert Feng ▴ 300
@gilbert-feng-3778
Last seen 9.6 years ago
Hi, Ted DO.db is a standard sqlite file and you can use standard RSQLite procedure to retrieve the information. Actually we do have a lite version of Disease Ontology, DOLite that removes some redundant nodes, integrated in GeneAnswers package, which also uses hypergeometric test to run enrichment analysis as well as automatically generated interactive(cytoscape web support) html summary for one or more groups of genes. Best Gilbert On 1/27/11 5:00 AM, Ted Morrow wrote: > Dear all, > > I would like to conduct a hypergeometric test on a list of genes but > using Disease Ontology instead of GO or KEGG terms. The package "DO.db" > contains this information but I have been unable to find a way of using > this database in conjunction with the "GOstats" package that I have been > using. > > Has anyone attempted to run a hypergeometric test for Disease Ontology > terms? Is there another package I could use? Or is there a way of > modifying the argument (if that's the right word) GOHyperGParams in > GOstats so that it can make use of the information in DO.db? > > Both the GO and KEGG analyses work fine: > ### GO chunk > params <- new("GOHyperGParams", > geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, > annotation = "hgu95av2.db", ontology="BP", > pvalueCutoff=0.01, conditional=FALSE, > testDirection="over") > hgOver <- hyperGTest(params) > > hgOver > Gene to GO BP test for over-representation > 2136 GO BP ids tested (15 have p < 0.01) > Selected gene set size: 112 > Gene universe size: 951 > Annotation package: hgu95av2 > > ### KEGG chunk > paramsKEGG <- new("KEGGHyperGParams", > geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, > annotation = "hgu95av2.db", > pvalueCutoff=0.01, > testDirection="over") > > hgOverKEGG <- hyperGTest(paramsKEGG) > hgOverKEGG > summary(hgOverKEGG) > > My data looks like this: > str(selectedEntrezIds) > chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414" "7546" > "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" "113235" > "28962" "10269" "4069" "30835" "7018" ... > > > str(entrezUniverse) > chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165" "6521" > "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842" "5048" > "4843" "6329" "5080" "6401" "6853" ... > > My naive attempts to use DO have included: > paramsDO <- new("DOHyperGParams", > geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, > annotation = "DO.db", > pvalueCutoff=0.01, > testDirection="over") > > Which of course doesn't work and gives the following error: > Error in getClass(Class, where = topenv(parent.frame())) : > "DOHyperGParams" is not a defined class > > > traceback() > 3: stop(gettextf("\"%s\" is not a defined class", Class), domain = NA) > 2: getClass(Class, where = topenv(parent.frame())) > 1: new("DOHyperGParams", geneIds = selectedEntrezIds, universeGeneIds = > entrezUniverse, > annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over") > > > Replacing "GOHyperGParams" with "DOHyperGParams" also gives the > following error:.. > > hgOverDO <- hyperGTest(paramsDO) > Error in match.arg(ontology, c("BP", "CC", "MF")) : > 'arg' should be one of ?BP?, ?CC?, ?MF? > > traceback() > 10: stop(gettextf("'arg' should be one of %s", paste(dQuote(choices), > collapse = ", ")), domain = NA) > 9: match.arg(ontology, c("BP", "CC", "MF")) > 8: getUniverseViaGo(p) > 7: universeBuilder(p) > 6: universeBuilder(p) > 5: .hyperGTestInternal(p) > 4: is(object, Cl) > 3: is(object, Cl) > 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", > "hyperGTest") > 1: hyperGTest(paramsDO) > > > Any help would be greatly appreciated. > /Ted > > > sessionInfo() > R version 2.12.1 (2010-12-16) > Platform: i386-pc-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C > LC_TIME=English_United States.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5 > org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5 graph_1.28.0 > Category_2.16.0 AnnotationDbi_1.12.0 > [12] Biobase_2.10.0 > > loaded via a namespace (and not attached): > [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0 > splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 xtable_1.5-6 > >
ADD COMMENT
0
Entering edit mode
On 01/27/2011 04:17 PM, Gilbert Feng wrote: > Hi, Ted > > DO.db is a standard sqlite file and you can use standard RSQLite > procedure to retrieve the information. Actually we do have a lite > version of Disease Ontology, DOLite that removes some redundant nodes, > integrated in GeneAnswers package, which also uses hypergeometric test > to run enrichment analysis as well as automatically generated > interactive(cytoscape web support) html summary for one or more groups > of genes. > > Best > > Gilbert You might also want to take a look at this website: http://django.nubic.northwestern.edu/fundo/faq which implements enrichment analysis using genes annotated to DO terms. -peter > > On 1/27/11 5:00 AM, Ted Morrow wrote: >> Dear all, >> >> I would like to conduct a hypergeometric test on a list of genes but >> using Disease Ontology instead of GO or KEGG terms. The package "DO.db" >> contains this information but I have been unable to find a way of using >> this database in conjunction with the "GOstats" package that I have been >> using. >> >> Has anyone attempted to run a hypergeometric test for Disease Ontology >> terms? Is there another package I could use? Or is there a way of >> modifying the argument (if that's the right word) GOHyperGParams in >> GOstats so that it can make use of the information in DO.db? >> >> Both the GO and KEGG analyses work fine: >> ### GO chunk >> params<- new("GOHyperGParams", >> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >> annotation = "hgu95av2.db", ontology="BP", >> pvalueCutoff=0.01, conditional=FALSE, >> testDirection="over") >> hgOver<- hyperGTest(params) >> >> hgOver >> Gene to GO BP test for over-representation >> 2136 GO BP ids tested (15 have p< 0.01) >> Selected gene set size: 112 >> Gene universe size: 951 >> Annotation package: hgu95av2 >> >> ### KEGG chunk >> paramsKEGG<- new("KEGGHyperGParams", >> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >> annotation = "hgu95av2.db", >> pvalueCutoff=0.01, >> testDirection="over") >> >> hgOverKEGG<- hyperGTest(paramsKEGG) >> hgOverKEGG >> summary(hgOverKEGG) >> >> My data looks like this: >> str(selectedEntrezIds) >> chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414" "7546" >> "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" "113235" >> "28962" "10269" "4069" "30835" "7018" ... >> >> > str(entrezUniverse) >> chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165" "6521" >> "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842" "5048" >> "4843" "6329" "5080" "6401" "6853" ... >> >> My naive attempts to use DO have included: >> paramsDO<- new("DOHyperGParams", >> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >> annotation = "DO.db", >> pvalueCutoff=0.01, >> testDirection="over") >> >> Which of course doesn't work and gives the following error: >> Error in getClass(Class, where = topenv(parent.frame())) : >> "DOHyperGParams" is not a defined class >> >> > traceback() >> 3: stop(gettextf("\"%s\" is not a defined class", Class), domain = NA) >> 2: getClass(Class, where = topenv(parent.frame())) >> 1: new("DOHyperGParams", geneIds = selectedEntrezIds, universeGeneIds = >> entrezUniverse, >> annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over") >> >> >> Replacing "GOHyperGParams" with "DOHyperGParams" also gives the >> following error:.. >> >> hgOverDO<- hyperGTest(paramsDO) >> Error in match.arg(ontology, c("BP", "CC", "MF")) : >> 'arg' should be one of ?BP?, ?CC?, ?MF? >> >> traceback() >> 10: stop(gettextf("'arg' should be one of %s", paste(dQuote(choices), >> collapse = ", ")), domain = NA) >> 9: match.arg(ontology, c("BP", "CC", "MF")) >> 8: getUniverseViaGo(p) >> 7: universeBuilder(p) >> 6: universeBuilder(p) >> 5: .hyperGTestInternal(p) >> 4: is(object, Cl) >> 3: is(object, Cl) >> 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", >> "hyperGTest") >> 1: hyperGTest(paramsDO) >> >> >> Any help would be greatly appreciated. >> /Ted >> >> > sessionInfo() >> R version 2.12.1 (2010-12-16) >> Platform: i386-pc-mingw32/i386 (32-bit) >> >> locale: >> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C >> LC_TIME=English_United States.1252 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5 >> org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5 graph_1.28.0 >> Category_2.16.0 AnnotationDbi_1.12.0 >> [12] Biobase_2.10.0 >> >> loaded via a namespace (and not attached): >> [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0 >> splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 xtable_1.5-6 >> >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > . > -- PD Dr. med. Peter N. Robinson, MSc. Institut f?r Medizinische Genetik Charit? - Universit?tsmedizin Berlin Augustenburger Platz 1 13353 Berlin Germany voice: 49-30-450566042 fax: 49-30-450569915 email: peter.robinson at charite.de http://compbio.charite.de/ http://www.human-phenotype-ontology.org
ADD REPLY
0
Entering edit mode
Thanks, Peter. Yes, Fundo is also a good choice. It also uses hypergeometric test, but it can show the GeneRef evidence for the association between diseases and genes. Gilbert On 1/27/11 9:39 AM, Peter Robinson wrote: > On 01/27/2011 04:17 PM, Gilbert Feng wrote: >> Hi, Ted >> >> DO.db is a standard sqlite file and you can use standard RSQLite >> procedure to retrieve the information. Actually we do have a lite >> version of Disease Ontology, DOLite that removes some redundant nodes, >> integrated in GeneAnswers package, which also uses hypergeometric test >> to run enrichment analysis as well as automatically generated >> interactive(cytoscape web support) html summary for one or more groups >> of genes. >> >> Best >> >> Gilbert > > > > > You might also want to take a look at this website: > http://django.nubic.northwestern.edu/fundo/faq > which implements enrichment analysis using genes annotated to DO terms. > -peter > > > > > > > > > > > >> >> On 1/27/11 5:00 AM, Ted Morrow wrote: >>> Dear all, >>> >>> I would like to conduct a hypergeometric test on a list of genes but >>> using Disease Ontology instead of GO or KEGG terms. The package "DO.db" >>> contains this information but I have been unable to find a way of using >>> this database in conjunction with the "GOstats" package that I have been >>> using. >>> >>> Has anyone attempted to run a hypergeometric test for Disease Ontology >>> terms? Is there another package I could use? Or is there a way of >>> modifying the argument (if that's the right word) GOHyperGParams in >>> GOstats so that it can make use of the information in DO.db? >>> >>> Both the GO and KEGG analyses work fine: >>> ### GO chunk >>> params<- new("GOHyperGParams", >>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>> annotation = "hgu95av2.db", ontology="BP", >>> pvalueCutoff=0.01, conditional=FALSE, >>> testDirection="over") >>> hgOver<- hyperGTest(params) >>> >>> hgOver >>> Gene to GO BP test for over-representation >>> 2136 GO BP ids tested (15 have p< 0.01) >>> Selected gene set size: 112 >>> Gene universe size: 951 >>> Annotation package: hgu95av2 >>> >>> ### KEGG chunk >>> paramsKEGG<- new("KEGGHyperGParams", >>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>> annotation = "hgu95av2.db", >>> pvalueCutoff=0.01, >>> testDirection="over") >>> >>> hgOverKEGG<- hyperGTest(paramsKEGG) >>> hgOverKEGG >>> summary(hgOverKEGG) >>> >>> My data looks like this: >>> str(selectedEntrezIds) >>> chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414" "7546" >>> "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" "113235" >>> "28962" "10269" "4069" "30835" "7018" ... >>> >>> > str(entrezUniverse) >>> chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165" "6521" >>> "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842" "5048" >>> "4843" "6329" "5080" "6401" "6853" ... >>> >>> My naive attempts to use DO have included: >>> paramsDO<- new("DOHyperGParams", >>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>> annotation = "DO.db", >>> pvalueCutoff=0.01, >>> testDirection="over") >>> >>> Which of course doesn't work and gives the following error: >>> Error in getClass(Class, where = topenv(parent.frame())) : >>> "DOHyperGParams" is not a defined class >>> >>> > traceback() >>> 3: stop(gettextf("\"%s\" is not a defined class", Class), domain = NA) >>> 2: getClass(Class, where = topenv(parent.frame())) >>> 1: new("DOHyperGParams", geneIds = selectedEntrezIds, universeGeneIds = >>> entrezUniverse, >>> annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over") >>> >>> >>> Replacing "GOHyperGParams" with "DOHyperGParams" also gives the >>> following error:.. >>> >>> hgOverDO<- hyperGTest(paramsDO) >>> Error in match.arg(ontology, c("BP", "CC", "MF")) : >>> 'arg' should be one of ?BP?, ?CC?, ?MF? >>> >>> traceback() >>> 10: stop(gettextf("'arg' should be one of %s", paste(dQuote(choices), >>> collapse = ", ")), domain = NA) >>> 9: match.arg(ontology, c("BP", "CC", "MF")) >>> 8: getUniverseViaGo(p) >>> 7: universeBuilder(p) >>> 6: universeBuilder(p) >>> 5: .hyperGTestInternal(p) >>> 4: is(object, Cl) >>> 3: is(object, Cl) >>> 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", >>> "hyperGTest") >>> 1: hyperGTest(paramsDO) >>> >>> >>> Any help would be greatly appreciated. >>> /Ted >>> >>> > sessionInfo() >>> R version 2.12.1 (2010-12-16) >>> Platform: i386-pc-mingw32/i386 (32-bit) >>> >>> locale: >>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C >>> LC_TIME=English_United States.1252 >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5 >>> org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5 graph_1.28.0 >>> Category_2.16.0 AnnotationDbi_1.12.0 >>> [12] Biobase_2.10.0 >>> >>> loaded via a namespace (and not attached): >>> [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0 >>> splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 xtable_1.5-6 >>> >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> . >> > > -- ----------------------------------------------- Gang (Gilbert) Feng, PhD Biomedical Informatics Center Robert H. Lurie Comprehensive Cancer Center Northwestern University 750 N. Lake Shore Drive, 11th Floor(11-175e) Chicago, IL 60611 Phone:312-503-2358 Email g-feng (at) northwestern.edu
ADD REPLY
0
Entering edit mode
Thanks for your replies Gilbert and Peter.....I forgot to say in my original question that I did look into using both the GeneAnswers package and the FunDO web-based system but the issue I have not been able to solve with them is that I cannot specify "my gene Universe" within which my selected genes are tested for enrichment. I want to see which diseases are enriched out of the possible diseases in "my gene universe" which is a subset of all possible human genes. Is that possible in GeneAnswers (or even FunDO)? Cheers Ted On 1/27/2011 4:39 PM, Peter Robinson wrote: > On 01/27/2011 04:17 PM, Gilbert Feng wrote: >> Hi, Ted >> >> DO.db is a standard sqlite file and you can use standard RSQLite >> procedure to retrieve the information. Actually we do have a lite >> version of Disease Ontology, DOLite that removes some redundant nodes, >> integrated in GeneAnswers package, which also uses hypergeometric test >> to run enrichment analysis as well as automatically generated >> interactive(cytoscape web support) html summary for one or more groups >> of genes. >> >> Best >> >> Gilbert > > > > > You might also want to take a look at this website: > http://django.nubic.northwestern.edu/fundo/faq > which implements enrichment analysis using genes annotated to DO terms. > -peter > > > > > > > > > > > >> >> On 1/27/11 5:00 AM, Ted Morrow wrote: >>> Dear all, >>> >>> I would like to conduct a hypergeometric test on a list of genes but >>> using Disease Ontology instead of GO or KEGG terms. The package "DO.db" >>> contains this information but I have been unable to find a way of using >>> this database in conjunction with the "GOstats" package that I have >>> been >>> using. >>> >>> Has anyone attempted to run a hypergeometric test for Disease Ontology >>> terms? Is there another package I could use? Or is there a way of >>> modifying the argument (if that's the right word) GOHyperGParams in >>> GOstats so that it can make use of the information in DO.db? >>> >>> Both the GO and KEGG analyses work fine: >>> ### GO chunk >>> params<- new("GOHyperGParams", >>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>> annotation = "hgu95av2.db", ontology="BP", >>> pvalueCutoff=0.01, conditional=FALSE, >>> testDirection="over") >>> hgOver<- hyperGTest(params) >>> >>> hgOver >>> Gene to GO BP test for over-representation >>> 2136 GO BP ids tested (15 have p< 0.01) >>> Selected gene set size: 112 >>> Gene universe size: 951 >>> Annotation package: hgu95av2 >>> >>> ### KEGG chunk >>> paramsKEGG<- new("KEGGHyperGParams", >>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>> annotation = "hgu95av2.db", >>> pvalueCutoff=0.01, >>> testDirection="over") >>> >>> hgOverKEGG<- hyperGTest(paramsKEGG) >>> hgOverKEGG >>> summary(hgOverKEGG) >>> >>> My data looks like this: >>> str(selectedEntrezIds) >>> chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414" "7546" >>> "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" "113235" >>> "28962" "10269" "4069" "30835" "7018" ... >>> >>> > str(entrezUniverse) >>> chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165" "6521" >>> "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842" "5048" >>> "4843" "6329" "5080" "6401" "6853" ... >>> >>> My naive attempts to use DO have included: >>> paramsDO<- new("DOHyperGParams", >>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>> annotation = "DO.db", >>> pvalueCutoff=0.01, >>> testDirection="over") >>> >>> Which of course doesn't work and gives the following error: >>> Error in getClass(Class, where = topenv(parent.frame())) : >>> "DOHyperGParams" is not a defined class >>> >>> > traceback() >>> 3: stop(gettextf("\"%s\" is not a defined class", Class), domain = NA) >>> 2: getClass(Class, where = topenv(parent.frame())) >>> 1: new("DOHyperGParams", geneIds = selectedEntrezIds, universeGeneIds = >>> entrezUniverse, >>> annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over") >>> >>> >>> Replacing "GOHyperGParams" with "DOHyperGParams" also gives the >>> following error:.. >>> >>> hgOverDO<- hyperGTest(paramsDO) >>> Error in match.arg(ontology, c("BP", "CC", "MF")) : >>> 'arg' should be one of ?BP?, ?CC?, ?MF? >>> >>> traceback() >>> 10: stop(gettextf("'arg' should be one of %s", paste(dQuote(choices), >>> collapse = ", ")), domain = NA) >>> 9: match.arg(ontology, c("BP", "CC", "MF")) >>> 8: getUniverseViaGo(p) >>> 7: universeBuilder(p) >>> 6: universeBuilder(p) >>> 5: .hyperGTestInternal(p) >>> 4: is(object, Cl) >>> 3: is(object, Cl) >>> 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", >>> "hyperGTest") >>> 1: hyperGTest(paramsDO) >>> >>> >>> Any help would be greatly appreciated. >>> /Ted >>> >>> > sessionInfo() >>> R version 2.12.1 (2010-12-16) >>> Platform: i386-pc-mingw32/i386 (32-bit) >>> >>> locale: >>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C >>> LC_TIME=English_United States.1252 >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5 >>> org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5 graph_1.28.0 >>> Category_2.16.0 AnnotationDbi_1.12.0 >>> [12] Biobase_2.10.0 >>> >>> loaded via a namespace (and not attached): >>> [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0 >>> splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 xtable_1.5-6 >>> >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> . >> > > -- ______________________________________________________________________ ____________ Edward H. Morrow Department of Ecology and Evolution Animal Ecology (zooekologi) Evolutionary Biology Centre Uppsala University Norbyv?gen 18-D SE-752 36 Uppsala SWEDEN Email: ted.morrow at ebc.uu.se Tel: +46 18 471 2676 Fax +46 18 471 6484 Webpage: http://www.iee.uu.se/zooekol/default.php?type=personalpage&id =119&lang=en
ADD REPLY
0
Entering edit mode
Hi, Ted If I correctly understand your question, you have your own gene pool, don't you? In that case, GeneAnswers still can handle that. What you need is to build a customized DOLite list to run analysis. For example, if your genes are mygenes(an unique Entrez GENE ID vector) myDOLite <- lapply(DOLite, intersect, mygenes) names(myDOLite) <- DOLiteTerm[names(myDOLite)] #Please note that the parameter 'known' is TRUE in geneAnswersBuilder, which means the test will only use shared genes in mygenes and DOLite genes as the gene pool to run hypergeometric test. You can change the number of gene pool by set parameter 'totalGeneNumber' and set 'known' to FALSE at the same time . myGA <- geneAnswersBuilder(mygenes, myDOLite, testType='hyperG', ...) since DO is designed for human, you can use setAnnLib to set the species as 'org.Hs.eg.db' for further visualization or analysis. Let me know if you have any question. Gilbert BTW, the current one has a new function to automatically generate a report for a or several groups of genes. Please check "groupReport" page, but before you run it, make sure you have java support and flash installation. On 1/27/11 9:45 AM, Ted Morrow wrote: > Thanks for your replies Gilbert and Peter.....I forgot to say in my > original question that I did look into using both the GeneAnswers > package and the FunDO web-based system but the issue I have not been > able to solve with them is that I cannot specify "my gene Universe" > within which my selected genes are tested for enrichment. I want to see > which diseases are enriched out of the possible diseases in "my gene > universe" which is a subset of all possible human genes. > > Is that possible in GeneAnswers (or even FunDO)? > > Cheers > Ted > > On 1/27/2011 4:39 PM, Peter Robinson wrote: >> On 01/27/2011 04:17 PM, Gilbert Feng wrote: >>> Hi, Ted >>> >>> DO.db is a standard sqlite file and you can use standard RSQLite >>> procedure to retrieve the information. Actually we do have a lite >>> version of Disease Ontology, DOLite that removes some redundant nodes, >>> integrated in GeneAnswers package, which also uses hypergeometric test >>> to run enrichment analysis as well as automatically generated >>> interactive(cytoscape web support) html summary for one or more groups >>> of genes. >>> >>> Best >>> >>> Gilbert >> >> >> >> >> You might also want to take a look at this website: >> http://django.nubic.northwestern.edu/fundo/faq >> which implements enrichment analysis using genes annotated to DO terms. >> -peter >> >> >> >> >> >> >> >> >> >> >> >>> >>> On 1/27/11 5:00 AM, Ted Morrow wrote: >>>> Dear all, >>>> >>>> I would like to conduct a hypergeometric test on a list of genes but >>>> using Disease Ontology instead of GO or KEGG terms. The package "DO.db" >>>> contains this information but I have been unable to find a way of using >>>> this database in conjunction with the "GOstats" package that I have >>>> been >>>> using. >>>> >>>> Has anyone attempted to run a hypergeometric test for Disease Ontology >>>> terms? Is there another package I could use? Or is there a way of >>>> modifying the argument (if that's the right word) GOHyperGParams in >>>> GOstats so that it can make use of the information in DO.db? >>>> >>>> Both the GO and KEGG analyses work fine: >>>> ### GO chunk >>>> params<- new("GOHyperGParams", >>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>> annotation = "hgu95av2.db", ontology="BP", >>>> pvalueCutoff=0.01, conditional=FALSE, >>>> testDirection="over") >>>> hgOver<- hyperGTest(params) >>>> >>>> hgOver >>>> Gene to GO BP test for over-representation >>>> 2136 GO BP ids tested (15 have p< 0.01) >>>> Selected gene set size: 112 >>>> Gene universe size: 951 >>>> Annotation package: hgu95av2 >>>> >>>> ### KEGG chunk >>>> paramsKEGG<- new("KEGGHyperGParams", >>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>> annotation = "hgu95av2.db", >>>> pvalueCutoff=0.01, >>>> testDirection="over") >>>> >>>> hgOverKEGG<- hyperGTest(paramsKEGG) >>>> hgOverKEGG >>>> summary(hgOverKEGG) >>>> >>>> My data looks like this: >>>> str(selectedEntrezIds) >>>> chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414" "7546" >>>> "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" "113235" >>>> "28962" "10269" "4069" "30835" "7018" ... >>>> >>>> > str(entrezUniverse) >>>> chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165" "6521" >>>> "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842" "5048" >>>> "4843" "6329" "5080" "6401" "6853" ... >>>> >>>> My naive attempts to use DO have included: >>>> paramsDO<- new("DOHyperGParams", >>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>> annotation = "DO.db", >>>> pvalueCutoff=0.01, >>>> testDirection="over") >>>> >>>> Which of course doesn't work and gives the following error: >>>> Error in getClass(Class, where = topenv(parent.frame())) : >>>> "DOHyperGParams" is not a defined class >>>> >>>> > traceback() >>>> 3: stop(gettextf("\"%s\" is not a defined class", Class), domain = NA) >>>> 2: getClass(Class, where = topenv(parent.frame())) >>>> 1: new("DOHyperGParams", geneIds = selectedEntrezIds, universeGeneIds = >>>> entrezUniverse, >>>> annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over") >>>> >>>> >>>> Replacing "GOHyperGParams" with "DOHyperGParams" also gives the >>>> following error:.. >>>> >>>> hgOverDO<- hyperGTest(paramsDO) >>>> Error in match.arg(ontology, c("BP", "CC", "MF")) : >>>> 'arg' should be one of ?BP?, ?CC?, ?MF? >>>> >>>> traceback() >>>> 10: stop(gettextf("'arg' should be one of %s", paste(dQuote(choices), >>>> collapse = ", ")), domain = NA) >>>> 9: match.arg(ontology, c("BP", "CC", "MF")) >>>> 8: getUniverseViaGo(p) >>>> 7: universeBuilder(p) >>>> 6: universeBuilder(p) >>>> 5: .hyperGTestInternal(p) >>>> 4: is(object, Cl) >>>> 3: is(object, Cl) >>>> 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", >>>> "hyperGTest") >>>> 1: hyperGTest(paramsDO) >>>> >>>> >>>> Any help would be greatly appreciated. >>>> /Ted >>>> >>>> > sessionInfo() >>>> R version 2.12.1 (2010-12-16) >>>> Platform: i386-pc-mingw32/i386 (32-bit) >>>> >>>> locale: >>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C >>>> LC_TIME=English_United States.1252 >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5 >>>> org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5 graph_1.28.0 >>>> Category_2.16.0 AnnotationDbi_1.12.0 >>>> [12] Biobase_2.10.0 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0 >>>> splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 xtable_1.5-6 >>>> >>>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> . >>> >> >> > -- ----------------------------------------------- Gang (Gilbert) Feng, PhD Biomedical Informatics Center Robert H. Lurie Comprehensive Cancer Center Northwestern University 750 N. Lake Shore Drive, 11th Floor(11-175e) Chicago, IL 60611 Phone:312-503-2358 Email g-feng (at) northwestern.edu
ADD REPLY
0
Entering edit mode
Hi Gilbert, Thanks for the response....it seems like GeneAnswers may do the job but I am still having trouble getting it to work for me. I have two gene pools - one is a universe of genes (called say "mygenepool") which is only a subset on all human genes, the other is a subset of mygenepool (called say "mygenes"). I am interested in knowing which DO terms show over-representation in mygenes given the DO terms present in mygenepool. So, in your example I am a bit confused as to which pool "mygenes" refers to. Is it "mygenepool" or "mygenes"? So I have...... str(mygenepool) # I used as.matrix then as.vector to change the dataframe into a vector...is that OK? int [1:1310] 8813 3075 2729 8379 204 170302 10165 6521 799 3052 ... and str(mygenes) int [1:1202] 6833 10083 60528 8842 5048 4843 6329 5080 6401 6853 ... and mygenes is a subset of mygenepool: overlap <- intersect(mygenepool, mygenes) str(overlap) int [1:1202] 6833 10083 60528 8842 5048 4843 6329 5080 6401 6853 ... Running the first bit of your example code seems to work fine: myDOLite <- lapply(DOLite, intersect, mygenepool) names(myDOLite) <- DOLiteTerm[names(myDOLite)] and so does creating a GeneAnswers instance (myGA) whether I specify the totalGeneNumber or not: myGA <- geneAnswersBuilder(mygenes, myDOLite, categoryType='DOLITE', testType='hyperG', #known=FALSE, totalGeneNumber=1202, pvalueT=0.1, geneExpressionProfile=NULL, verbose=TRUE) But then I get the following error: myGAreadable <- geneAnswersReadable(myGA, verbose=TRUE) [1] "Mapping geneInput ..." Error in switch(sub("org.*[:.:]", "", libname), eg = "EG", tair = "TAIR", : EXPR must be a length 1 vector traceback() 2: getSymbols(as.vector(x at geneInput[, 1]), x at annLib, strict = strict, missing = missing) 1: geneAnswersReadable(myGA, verbose = TRUE) If I instead create the GeneAnswers instance with 'org.Hs.eg.db' and not myDOLite the step..... myGAreadable <- geneAnswersReadable(myGA, verbose=TRUE) .....works fine as do later steps (for example geneAnsersChartPlots and topDOLiteGenes) But then presumably I am not working with the restricted pool of genes that I want to (i.e. mygenepool) Any ideas? Cheers Ted On 1/27/2011 5:19 PM, Gilbert Feng wrote: > Hi, Ted > > If I correctly understand your question, you have your own gene pool, > don't you? In that case, GeneAnswers still can handle that. What you > need is to build a customized DOLite list to run analysis. For example, > > if your genes are mygenes(an unique Entrez GENE ID vector) > > myDOLite <- lapply(DOLite, intersect, mygenes) > names(myDOLite) <- DOLiteTerm[names(myDOLite)] > #Please note that the parameter 'known' is TRUE in geneAnswersBuilder, > which means the test will only use shared genes in mygenes and DOLite > genes as the gene pool to run hypergeometric test. You can change the > number of gene pool by set parameter 'totalGeneNumber' and set 'known' > to FALSE at the same time . > myGA <- geneAnswersBuilder(mygenes, myDOLite, testType='hyperG', ...) > > since DO is designed for human, you can use setAnnLib to set the > species as 'org.Hs.eg.db' for further visualization or analysis. > > Let me know if you have any question. > > Gilbert > > BTW, the current one has a new function to automatically generate a > report for a or several groups of genes. Please check "groupReport" > page, but before you run it, make sure you have java support and flash > installation. > > > On 1/27/11 9:45 AM, Ted Morrow wrote: >> Thanks for your replies Gilbert and Peter.....I forgot to say in my >> original question that I did look into using both the GeneAnswers >> package and the FunDO web-based system but the issue I have not been >> able to solve with them is that I cannot specify "my gene Universe" >> within which my selected genes are tested for enrichment. I want to see >> which diseases are enriched out of the possible diseases in "my gene >> universe" which is a subset of all possible human genes. >> >> Is that possible in GeneAnswers (or even FunDO)? >> >> Cheers >> Ted >> >> On 1/27/2011 4:39 PM, Peter Robinson wrote: >>> On 01/27/2011 04:17 PM, Gilbert Feng wrote: >>>> Hi, Ted >>>> >>>> DO.db is a standard sqlite file and you can use standard RSQLite >>>> procedure to retrieve the information. Actually we do have a lite >>>> version of Disease Ontology, DOLite that removes some redundant nodes, >>>> integrated in GeneAnswers package, which also uses hypergeometric test >>>> to run enrichment analysis as well as automatically generated >>>> interactive(cytoscape web support) html summary for one or more groups >>>> of genes. >>>> >>>> Best >>>> >>>> Gilbert >>> >>> >>> >>> >>> You might also want to take a look at this website: >>> http://django.nubic.northwestern.edu/fundo/faq >>> which implements enrichment analysis using genes annotated to DO terms. >>> -peter >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>>> >>>> On 1/27/11 5:00 AM, Ted Morrow wrote: >>>>> Dear all, >>>>> >>>>> I would like to conduct a hypergeometric test on a list of genes but >>>>> using Disease Ontology instead of GO or KEGG terms. The package >>>>> "DO.db" >>>>> contains this information but I have been unable to find a way of >>>>> using >>>>> this database in conjunction with the "GOstats" package that I have >>>>> been >>>>> using. >>>>> >>>>> Has anyone attempted to run a hypergeometric test for Disease >>>>> Ontology >>>>> terms? Is there another package I could use? Or is there a way of >>>>> modifying the argument (if that's the right word) GOHyperGParams in >>>>> GOstats so that it can make use of the information in DO.db? >>>>> >>>>> Both the GO and KEGG analyses work fine: >>>>> ### GO chunk >>>>> params<- new("GOHyperGParams", >>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>> annotation = "hgu95av2.db", ontology="BP", >>>>> pvalueCutoff=0.01, conditional=FALSE, >>>>> testDirection="over") >>>>> hgOver<- hyperGTest(params) >>>>> >>>>> hgOver >>>>> Gene to GO BP test for over-representation >>>>> 2136 GO BP ids tested (15 have p< 0.01) >>>>> Selected gene set size: 112 >>>>> Gene universe size: 951 >>>>> Annotation package: hgu95av2 >>>>> >>>>> ### KEGG chunk >>>>> paramsKEGG<- new("KEGGHyperGParams", >>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>> annotation = "hgu95av2.db", >>>>> pvalueCutoff=0.01, >>>>> testDirection="over") >>>>> >>>>> hgOverKEGG<- hyperGTest(paramsKEGG) >>>>> hgOverKEGG >>>>> summary(hgOverKEGG) >>>>> >>>>> My data looks like this: >>>>> str(selectedEntrezIds) >>>>> chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414" >>>>> "7546" >>>>> "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" "113235" >>>>> "28962" "10269" "4069" "30835" "7018" ... >>>>> >>>>> > str(entrezUniverse) >>>>> chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165" >>>>> "6521" >>>>> "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842" >>>>> "5048" >>>>> "4843" "6329" "5080" "6401" "6853" ... >>>>> >>>>> My naive attempts to use DO have included: >>>>> paramsDO<- new("DOHyperGParams", >>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>> annotation = "DO.db", >>>>> pvalueCutoff=0.01, >>>>> testDirection="over") >>>>> >>>>> Which of course doesn't work and gives the following error: >>>>> Error in getClass(Class, where = topenv(parent.frame())) : >>>>> "DOHyperGParams" is not a defined class >>>>> >>>>> > traceback() >>>>> 3: stop(gettextf("\"%s\" is not a defined class", Class), domain = >>>>> NA) >>>>> 2: getClass(Class, where = topenv(parent.frame())) >>>>> 1: new("DOHyperGParams", geneIds = selectedEntrezIds, >>>>> universeGeneIds = >>>>> entrezUniverse, >>>>> annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over") >>>>> >>>>> >>>>> Replacing "GOHyperGParams" with "DOHyperGParams" also gives the >>>>> following error:.. >>>>> >>>>> hgOverDO<- hyperGTest(paramsDO) >>>>> Error in match.arg(ontology, c("BP", "CC", "MF")) : >>>>> 'arg' should be one of ?BP?, ?CC?, ?MF? >>>>> >>>>> traceback() >>>>> 10: stop(gettextf("'arg' should be one of %s", paste(dQuote(choices), >>>>> collapse = ", ")), domain = NA) >>>>> 9: match.arg(ontology, c("BP", "CC", "MF")) >>>>> 8: getUniverseViaGo(p) >>>>> 7: universeBuilder(p) >>>>> 6: universeBuilder(p) >>>>> 5: .hyperGTestInternal(p) >>>>> 4: is(object, Cl) >>>>> 3: is(object, Cl) >>>>> 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", >>>>> "hyperGTest") >>>>> 1: hyperGTest(paramsDO) >>>>> >>>>> >>>>> Any help would be greatly appreciated. >>>>> /Ted >>>>> >>>>> > sessionInfo() >>>>> R version 2.12.1 (2010-12-16) >>>>> Platform: i386-pc-mingw32/i386 (32-bit) >>>>> >>>>> locale: >>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >>>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C >>>>> LC_TIME=English_United States.1252 >>>>> >>>>> attached base packages: >>>>> [1] stats graphics grDevices utils datasets methods base >>>>> >>>>> other attached packages: >>>>> [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5 >>>>> org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5 >>>>> graph_1.28.0 >>>>> Category_2.16.0 AnnotationDbi_1.12.0 >>>>> [12] Biobase_2.10.0 >>>>> >>>>> loaded via a namespace (and not attached): >>>>> [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0 >>>>> splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 xtable_1.5-6 >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> . >>>> >>> >>> >> > -- ______________________________________________________________________ ____________ Edward H. Morrow Department of Ecology and Evolution Animal Ecology (zooekologi) Evolutionary Biology Centre Uppsala University Norbyv?gen 18-D SE-752 36 Uppsala SWEDEN Email: ted.morrow at ebc.uu.se Tel: +46 18 471 2676 Fax +46 18 471 6484 Webpage: http://www.iee.uu.se/zooekol/default.php?type=personalpage&id =119&lang=en
ADD REPLY
0
Entering edit mode
Hi, Ted Yes, your codes about mygenepool and mygenes are correct. I didn't clearly state them in my previous email. Sorry about that. myDOLite should be built by DOLite and mygenepool, not mygenes. So currently, myDOLite is a customized annotation library and when you build a geneAnswers instance, leave categoryType to NULL(default value), and in your case, I think the total gene number should be 1310, not 1202, since the former is the gene number of mygenepool. after buiding myGA, run setAnnLib to set it to org.Hs.eg.db I guess the error is that you don't set it. For the further mapping function, like geneAnswerReadable or other visualization, always set catTerm=FALSE. Let me know if it doesn't work. Best Gilbert > I have two gene pools - one is a universe of genes (called say > "mygenepool") which is only a subset on all human genes, the other is a > subset of mygenepool (called say "mygenes"). I am interested in knowing > which DO terms show over-representation in mygenes given the DO terms > present in mygenepool. So, in your example I am a bit confused as to > which pool "mygenes" refers to. Is it "mygenepool" or "mygenes"? > > So I have...... > str(mygenepool) # I used as.matrix then as.vector to change the > dataframe into a vector...is that OK? > int [1:1310] 8813 3075 2729 8379 204 170302 10165 6521 799 3052 ... > > and > str(mygenes) > int [1:1202] 6833 10083 60528 8842 5048 4843 6329 5080 6401 6853 ... > > > and mygenes is a subset of mygenepool: > overlap <- intersect(mygenepool, mygenes) > str(overlap) > int [1:1202] 6833 10083 60528 8842 5048 4843 6329 5080 6401 6853 ... > > > Running the first bit of your example code seems to work fine: > myDOLite <- lapply(DOLite, intersect, mygenepool) > names(myDOLite) <- DOLiteTerm[names(myDOLite)] > > and so does creating a GeneAnswers instance (myGA) whether I specify the > totalGeneNumber or not: > myGA <- geneAnswersBuilder(mygenes, myDOLite, categoryType='DOLITE', > testType='hyperG', > #known=FALSE, totalGeneNumber=1202, > pvalueT=0.1, geneExpressionProfile=NULL, verbose=TRUE) > > But then I get the following error: > myGAreadable <- geneAnswersReadable(myGA, verbose=TRUE) > [1] "Mapping geneInput ..." > Error in switch(sub("org.*[:.:]", "", libname), eg = "EG", tair = "TAIR", : > EXPR must be a length 1 vector > > traceback() > 2: getSymbols(as.vector(x at geneInput[, 1]), x at annLib, strict = strict, > missing = missing) > 1: geneAnswersReadable(myGA, verbose = TRUE) > > If I instead create the GeneAnswers instance with 'org.Hs.eg.db' and not > myDOLite the step..... > myGAreadable <- geneAnswersReadable(myGA, verbose=TRUE) > > .....works fine as do later steps (for example geneAnsersChartPlots and > topDOLiteGenes) > > But then presumably I am not working with the restricted pool of genes > that I want to (i.e. mygenepool) > > Any ideas? > Cheers > Ted > > > > > > > On 1/27/2011 5:19 PM, Gilbert Feng wrote: >> Hi, Ted >> >> If I correctly understand your question, you have your own gene pool, >> don't you? In that case, GeneAnswers still can handle that. What you >> need is to build a customized DOLite list to run analysis. For example, >> >> if your genes are mygenes(an unique Entrez GENE ID vector) >> >> myDOLite <- lapply(DOLite, intersect, mygenes) >> names(myDOLite) <- DOLiteTerm[names(myDOLite)] >> #Please note that the parameter 'known' is TRUE in geneAnswersBuilder, >> which means the test will only use shared genes in mygenes and DOLite >> genes as the gene pool to run hypergeometric test. You can change the >> number of gene pool by set parameter 'totalGeneNumber' and set 'known' >> to FALSE at the same time . >> myGA <- geneAnswersBuilder(mygenes, myDOLite, testType='hyperG', ...) >> >> since DO is designed for human, you can use setAnnLib to set the >> species as 'org.Hs.eg.db' for further visualization or analysis. >> >> Let me know if you have any question. >> >> Gilbert >> >> BTW, the current one has a new function to automatically generate a >> report for a or several groups of genes. Please check "groupReport" >> page, but before you run it, make sure you have java support and flash >> installation. >> >> >> On 1/27/11 9:45 AM, Ted Morrow wrote: >>> Thanks for your replies Gilbert and Peter.....I forgot to say in my >>> original question that I did look into using both the GeneAnswers >>> package and the FunDO web-based system but the issue I have not been >>> able to solve with them is that I cannot specify "my gene Universe" >>> within which my selected genes are tested for enrichment. I want to see >>> which diseases are enriched out of the possible diseases in "my gene >>> universe" which is a subset of all possible human genes. >>> >>> Is that possible in GeneAnswers (or even FunDO)? >>> >>> Cheers >>> Ted >>> >>> On 1/27/2011 4:39 PM, Peter Robinson wrote: >>>> On 01/27/2011 04:17 PM, Gilbert Feng wrote: >>>>> Hi, Ted >>>>> >>>>> DO.db is a standard sqlite file and you can use standard RSQLite >>>>> procedure to retrieve the information. Actually we do have a lite >>>>> version of Disease Ontology, DOLite that removes some redundant nodes, >>>>> integrated in GeneAnswers package, which also uses hypergeometric test >>>>> to run enrichment analysis as well as automatically generated >>>>> interactive(cytoscape web support) html summary for one or more groups >>>>> of genes. >>>>> >>>>> Best >>>>> >>>>> Gilbert >>>> >>>> >>>> >>>> >>>> You might also want to take a look at this website: >>>> http://django.nubic.northwestern.edu/fundo/faq >>>> which implements enrichment analysis using genes annotated to DO terms. >>>> -peter >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>>> >>>>> On 1/27/11 5:00 AM, Ted Morrow wrote: >>>>>> Dear all, >>>>>> >>>>>> I would like to conduct a hypergeometric test on a list of genes but >>>>>> using Disease Ontology instead of GO or KEGG terms. The package >>>>>> "DO.db" >>>>>> contains this information but I have been unable to find a way of >>>>>> using >>>>>> this database in conjunction with the "GOstats" package that I have >>>>>> been >>>>>> using. >>>>>> >>>>>> Has anyone attempted to run a hypergeometric test for Disease >>>>>> Ontology >>>>>> terms? Is there another package I could use? Or is there a way of >>>>>> modifying the argument (if that's the right word) GOHyperGParams in >>>>>> GOstats so that it can make use of the information in DO.db? >>>>>> >>>>>> Both the GO and KEGG analyses work fine: >>>>>> ### GO chunk >>>>>> params<- new("GOHyperGParams", >>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>>> annotation = "hgu95av2.db", ontology="BP", >>>>>> pvalueCutoff=0.01, conditional=FALSE, >>>>>> testDirection="over") >>>>>> hgOver<- hyperGTest(params) >>>>>> >>>>>> hgOver >>>>>> Gene to GO BP test for over-representation >>>>>> 2136 GO BP ids tested (15 have p< 0.01) >>>>>> Selected gene set size: 112 >>>>>> Gene universe size: 951 >>>>>> Annotation package: hgu95av2 >>>>>> >>>>>> ### KEGG chunk >>>>>> paramsKEGG<- new("KEGGHyperGParams", >>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>>> annotation = "hgu95av2.db", >>>>>> pvalueCutoff=0.01, >>>>>> testDirection="over") >>>>>> >>>>>> hgOverKEGG<- hyperGTest(paramsKEGG) >>>>>> hgOverKEGG >>>>>> summary(hgOverKEGG) >>>>>> >>>>>> My data looks like this: >>>>>> str(selectedEntrezIds) >>>>>> chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414" >>>>>> "7546" >>>>>> "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" "113235" >>>>>> "28962" "10269" "4069" "30835" "7018" ... >>>>>> >>>>>> > str(entrezUniverse) >>>>>> chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165" >>>>>> "6521" >>>>>> "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842" >>>>>> "5048" >>>>>> "4843" "6329" "5080" "6401" "6853" ... >>>>>> >>>>>> My naive attempts to use DO have included: >>>>>> paramsDO<- new("DOHyperGParams", >>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>>> annotation = "DO.db", >>>>>> pvalueCutoff=0.01, >>>>>> testDirection="over") >>>>>> >>>>>> Which of course doesn't work and gives the following error: >>>>>> Error in getClass(Class, where = topenv(parent.frame())) : >>>>>> "DOHyperGParams" is not a defined class >>>>>> >>>>>> > traceback() >>>>>> 3: stop(gettextf("\"%s\" is not a defined class", Class), domain = >>>>>> NA) >>>>>> 2: getClass(Class, where = topenv(parent.frame())) >>>>>> 1: new("DOHyperGParams", geneIds = selectedEntrezIds, >>>>>> universeGeneIds = >>>>>> entrezUniverse, >>>>>> annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over") >>>>>> >>>>>> >>>>>> Replacing "GOHyperGParams" with "DOHyperGParams" also gives the >>>>>> following error:.. >>>>>> >>>>>> hgOverDO<- hyperGTest(paramsDO) >>>>>> Error in match.arg(ontology, c("BP", "CC", "MF")) : >>>>>> 'arg' should be one of ?BP?, ?CC?, ?MF? >>>>>> >>>>>> traceback() >>>>>> 10: stop(gettextf("'arg' should be one of %s", paste(dQuote(choices), >>>>>> collapse = ", ")), domain = NA) >>>>>> 9: match.arg(ontology, c("BP", "CC", "MF")) >>>>>> 8: getUniverseViaGo(p) >>>>>> 7: universeBuilder(p) >>>>>> 6: universeBuilder(p) >>>>>> 5: .hyperGTestInternal(p) >>>>>> 4: is(object, Cl) >>>>>> 3: is(object, Cl) >>>>>> 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", >>>>>> "hyperGTest") >>>>>> 1: hyperGTest(paramsDO) >>>>>> >>>>>> >>>>>> Any help would be greatly appreciated. >>>>>> /Ted >>>>>> >>>>>> > sessionInfo() >>>>>> R version 2.12.1 (2010-12-16) >>>>>> Platform: i386-pc-mingw32/i386 (32-bit) >>>>>> >>>>>> locale: >>>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >>>>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C >>>>>> LC_TIME=English_United States.1252 >>>>>> >>>>>> attached base packages: >>>>>> [1] stats graphics grDevices utils datasets methods base >>>>>> >>>>>> other attached packages: >>>>>> [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5 >>>>>> org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5 >>>>>> graph_1.28.0 >>>>>> Category_2.16.0 AnnotationDbi_1.12.0 >>>>>> [12] Biobase_2.10.0 >>>>>> >>>>>> loaded via a namespace (and not attached): >>>>>> [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0 >>>>>> splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 xtable_1.5-6 >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> . >>>>> >>>> >>>> >>> >> > -- ----------------------------------------------- Gang (Gilbert) Feng, PhD Biomedical Informatics Center Robert H. Lurie Comprehensive Cancer Center Northwestern University 750 N. Lake Shore Drive, 11th Floor(11-175e) Chicago, IL 60611 Phone:312-503-2358 Email g-feng (at) northwestern.edu
ADD REPLY
0
Entering edit mode
Hi Gilbert, Great thanks for the clarification....OK building myGA works fine and I have setAnnLib to org.Hs.eg.db with: myGA2 <- setAnnLib(myGA, 'org.Hs.eg.db') getAnnLib(myGA2) [1] "org.Hs.eg.db" .....but at the geneAnswerReadable step: myGAreadable <- geneAnswersReadable(myGA2, catTerm=FALSE) I get the following error: [1] "Mapping geneInput ..." [1] "Warning: some genes do not have valid symbols!" [1] "Mapping genesInCategory ..." traceback() 2: stop(paste("The input geneAnswers categoryType is not DOLite but ", x at categoryType, ". stop function!")) 1: topDOLiteGenes(myGA, orderby = "pvalue", topGenes = "ALL", genesOrderBy = "pValue", file = TRUE) Cheers Ted sessionInfo() R version 2.12.1 (2010-12-16) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] org.Hs.eg.db_2.4.6 GeneAnswers_1.6.0 RColorBrewer_1.0-2 Heatplus_1.20.0 rgl_0.92.798 MASS_7.3-9 RSQLite_0.9-4 DBI_0.2-5 XML_3.2-0.2 annotate_1.28.0 AnnotationDbi_1.12.0 [12] Biobase_2.10.0 RCurl_1.5-0.1 bitops_1.0-4.1 igraph_0.5.5-1 loaded via a namespace (and not attached): [1] graph_1.28.0 grid_2.12.1 RBGL_1.26.0 Rgraphviz_1.27.0 tools_2.12.1 xtable_1.5-6 > On 1/28/2011 3:40 PM, Gilbert Feng wrote: > Hi, Ted > > Yes, your codes about mygenepool and mygenes are correct. I didn't > clearly state them in my previous email. Sorry about that. myDOLite > should be built by DOLite and mygenepool, not mygenes. So currently, > myDOLite is a customized annotation library and when you build a > geneAnswers instance, leave categoryType to NULL(default value), and > in your case, I think the total gene number should be 1310, not 1202, > since the former is the gene number of mygenepool. > > after buiding myGA, run setAnnLib to set it to org.Hs.eg.db I guess > the error is that you don't set it. > > For the further mapping function, like geneAnswerReadable or other > visualization, always set catTerm=FALSE. > > Let me know if it doesn't work. > > Best > > Gilbert > > >> I have two gene pools - one is a universe of genes (called say >> "mygenepool") which is only a subset on all human genes, the other is a >> subset of mygenepool (called say "mygenes"). I am interested in knowing >> which DO terms show over-representation in mygenes given the DO terms >> present in mygenepool. So, in your example I am a bit confused as to >> which pool "mygenes" refers to. Is it "mygenepool" or "mygenes"? >> >> So I have...... >> str(mygenepool) # I used as.matrix then as.vector to change the >> dataframe into a vector...is that OK? >> int [1:1310] 8813 3075 2729 8379 204 170302 10165 6521 799 3052 ... >> >> and >> str(mygenes) >> int [1:1202] 6833 10083 60528 8842 5048 4843 6329 5080 6401 6853 ... >> >> >> and mygenes is a subset of mygenepool: >> overlap <- intersect(mygenepool, mygenes) >> str(overlap) >> int [1:1202] 6833 10083 60528 8842 5048 4843 6329 5080 6401 6853 ... >> >> >> Running the first bit of your example code seems to work fine: >> myDOLite <- lapply(DOLite, intersect, mygenepool) >> names(myDOLite) <- DOLiteTerm[names(myDOLite)] >> >> and so does creating a GeneAnswers instance (myGA) whether I specify the >> totalGeneNumber or not: >> myGA <- geneAnswersBuilder(mygenes, myDOLite, categoryType='DOLITE', >> testType='hyperG', >> #known=FALSE, totalGeneNumber=1202, >> pvalueT=0.1, geneExpressionProfile=NULL, verbose=TRUE) >> >> But then I get the following error: >> myGAreadable <- geneAnswersReadable(myGA, verbose=TRUE) >> [1] "Mapping geneInput ..." >> Error in switch(sub("org.*[:.:]", "", libname), eg = "EG", tair = >> "TAIR", : >> EXPR must be a length 1 vector >> >> traceback() >> 2: getSymbols(as.vector(x at geneInput[, 1]), x at annLib, strict = strict, >> missing = missing) >> 1: geneAnswersReadable(myGA, verbose = TRUE) >> >> If I instead create the GeneAnswers instance with 'org.Hs.eg.db' and not >> myDOLite the step..... >> myGAreadable <- geneAnswersReadable(myGA, verbose=TRUE) >> >> .....works fine as do later steps (for example geneAnsersChartPlots and >> topDOLiteGenes) >> >> But then presumably I am not working with the restricted pool of genes >> that I want to (i.e. mygenepool) >> >> Any ideas? >> Cheers >> Ted >> >> >> >> >> >> >> On 1/27/2011 5:19 PM, Gilbert Feng wrote: >>> Hi, Ted >>> >>> If I correctly understand your question, you have your own gene pool, >>> don't you? In that case, GeneAnswers still can handle that. What you >>> need is to build a customized DOLite list to run analysis. For example, >>> >>> if your genes are mygenes(an unique Entrez GENE ID vector) >>> >>> myDOLite <- lapply(DOLite, intersect, mygenes) >>> names(myDOLite) <- DOLiteTerm[names(myDOLite)] >>> #Please note that the parameter 'known' is TRUE in geneAnswersBuilder, >>> which means the test will only use shared genes in mygenes and DOLite >>> genes as the gene pool to run hypergeometric test. You can change the >>> number of gene pool by set parameter 'totalGeneNumber' and set 'known' >>> to FALSE at the same time . >>> myGA <- geneAnswersBuilder(mygenes, myDOLite, testType='hyperG', ...) >>> >>> since DO is designed for human, you can use setAnnLib to set the >>> species as 'org.Hs.eg.db' for further visualization or analysis. >>> >>> Let me know if you have any question. >>> >>> Gilbert >>> >>> BTW, the current one has a new function to automatically generate a >>> report for a or several groups of genes. Please check "groupReport" >>> page, but before you run it, make sure you have java support and flash >>> installation. >>> >>> >>> On 1/27/11 9:45 AM, Ted Morrow wrote: >>>> Thanks for your replies Gilbert and Peter.....I forgot to say in my >>>> original question that I did look into using both the GeneAnswers >>>> package and the FunDO web-based system but the issue I have not been >>>> able to solve with them is that I cannot specify "my gene Universe" >>>> within which my selected genes are tested for enrichment. I want to >>>> see >>>> which diseases are enriched out of the possible diseases in "my gene >>>> universe" which is a subset of all possible human genes. >>>> >>>> Is that possible in GeneAnswers (or even FunDO)? >>>> >>>> Cheers >>>> Ted >>>> >>>> On 1/27/2011 4:39 PM, Peter Robinson wrote: >>>>> On 01/27/2011 04:17 PM, Gilbert Feng wrote: >>>>>> Hi, Ted >>>>>> >>>>>> DO.db is a standard sqlite file and you can use standard RSQLite >>>>>> procedure to retrieve the information. Actually we do have a lite >>>>>> version of Disease Ontology, DOLite that removes some redundant >>>>>> nodes, >>>>>> integrated in GeneAnswers package, which also uses hypergeometric >>>>>> test >>>>>> to run enrichment analysis as well as automatically generated >>>>>> interactive(cytoscape web support) html summary for one or more >>>>>> groups >>>>>> of genes. >>>>>> >>>>>> Best >>>>>> >>>>>> Gilbert >>>>> >>>>> >>>>> >>>>> >>>>> You might also want to take a look at this website: >>>>> http://django.nubic.northwestern.edu/fundo/faq >>>>> which implements enrichment analysis using genes annotated to DO >>>>> terms. >>>>> -peter >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> >>>>>> On 1/27/11 5:00 AM, Ted Morrow wrote: >>>>>>> Dear all, >>>>>>> >>>>>>> I would like to conduct a hypergeometric test on a list of genes >>>>>>> but >>>>>>> using Disease Ontology instead of GO or KEGG terms. The package >>>>>>> "DO.db" >>>>>>> contains this information but I have been unable to find a way of >>>>>>> using >>>>>>> this database in conjunction with the "GOstats" package that I have >>>>>>> been >>>>>>> using. >>>>>>> >>>>>>> Has anyone attempted to run a hypergeometric test for Disease >>>>>>> Ontology >>>>>>> terms? Is there another package I could use? Or is there a way of >>>>>>> modifying the argument (if that's the right word) GOHyperGParams in >>>>>>> GOstats so that it can make use of the information in DO.db? >>>>>>> >>>>>>> Both the GO and KEGG analyses work fine: >>>>>>> ### GO chunk >>>>>>> params<- new("GOHyperGParams", >>>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>>>> annotation = "hgu95av2.db", ontology="BP", >>>>>>> pvalueCutoff=0.01, conditional=FALSE, >>>>>>> testDirection="over") >>>>>>> hgOver<- hyperGTest(params) >>>>>>> >>>>>>> hgOver >>>>>>> Gene to GO BP test for over-representation >>>>>>> 2136 GO BP ids tested (15 have p< 0.01) >>>>>>> Selected gene set size: 112 >>>>>>> Gene universe size: 951 >>>>>>> Annotation package: hgu95av2 >>>>>>> >>>>>>> ### KEGG chunk >>>>>>> paramsKEGG<- new("KEGGHyperGParams", >>>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>>>> annotation = "hgu95av2.db", >>>>>>> pvalueCutoff=0.01, >>>>>>> testDirection="over") >>>>>>> >>>>>>> hgOverKEGG<- hyperGTest(paramsKEGG) >>>>>>> hgOverKEGG >>>>>>> summary(hgOverKEGG) >>>>>>> >>>>>>> My data looks like this: >>>>>>> str(selectedEntrezIds) >>>>>>> chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414" >>>>>>> "7546" >>>>>>> "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" >>>>>>> "113235" >>>>>>> "28962" "10269" "4069" "30835" "7018" ... >>>>>>> >>>>>>> > str(entrezUniverse) >>>>>>> chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165" >>>>>>> "6521" >>>>>>> "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842" >>>>>>> "5048" >>>>>>> "4843" "6329" "5080" "6401" "6853" ... >>>>>>> >>>>>>> My naive attempts to use DO have included: >>>>>>> paramsDO<- new("DOHyperGParams", >>>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>>>> annotation = "DO.db", >>>>>>> pvalueCutoff=0.01, >>>>>>> testDirection="over") >>>>>>> >>>>>>> Which of course doesn't work and gives the following error: >>>>>>> Error in getClass(Class, where = topenv(parent.frame())) : >>>>>>> "DOHyperGParams" is not a defined class >>>>>>> >>>>>>> > traceback() >>>>>>> 3: stop(gettextf("\"%s\" is not a defined class", Class), domain = >>>>>>> NA) >>>>>>> 2: getClass(Class, where = topenv(parent.frame())) >>>>>>> 1: new("DOHyperGParams", geneIds = selectedEntrezIds, >>>>>>> universeGeneIds = >>>>>>> entrezUniverse, >>>>>>> annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over") >>>>>>> >>>>>>> >>>>>>> Replacing "GOHyperGParams" with "DOHyperGParams" also gives the >>>>>>> following error:.. >>>>>>> >>>>>>> hgOverDO<- hyperGTest(paramsDO) >>>>>>> Error in match.arg(ontology, c("BP", "CC", "MF")) : >>>>>>> 'arg' should be one of ?BP?, ?CC?, ?MF? >>>>>>> >>>>>>> traceback() >>>>>>> 10: stop(gettextf("'arg' should be one of %s", >>>>>>> paste(dQuote(choices), >>>>>>> collapse = ", ")), domain = NA) >>>>>>> 9: match.arg(ontology, c("BP", "CC", "MF")) >>>>>>> 8: getUniverseViaGo(p) >>>>>>> 7: universeBuilder(p) >>>>>>> 6: universeBuilder(p) >>>>>>> 5: .hyperGTestInternal(p) >>>>>>> 4: is(object, Cl) >>>>>>> 3: is(object, Cl) >>>>>>> 2: .valueClassTest(standardGeneric("hyperGTest"), >>>>>>> "HyperGResultBase", >>>>>>> "hyperGTest") >>>>>>> 1: hyperGTest(paramsDO) >>>>>>> >>>>>>> >>>>>>> Any help would be greatly appreciated. >>>>>>> /Ted >>>>>>> >>>>>>> > sessionInfo() >>>>>>> R version 2.12.1 (2010-12-16) >>>>>>> Platform: i386-pc-mingw32/i386 (32-bit) >>>>>>> >>>>>>> locale: >>>>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >>>>>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C >>>>>>> LC_TIME=English_United States.1252 >>>>>>> >>>>>>> attached base packages: >>>>>>> [1] stats graphics grDevices utils datasets methods base >>>>>>> >>>>>>> other attached packages: >>>>>>> [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5 >>>>>>> org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5 >>>>>>> graph_1.28.0 >>>>>>> Category_2.16.0 AnnotationDbi_1.12.0 >>>>>>> [12] Biobase_2.10.0 >>>>>>> >>>>>>> loaded via a namespace (and not attached): >>>>>>> [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0 >>>>>>> splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 >>>>>>> xtable_1.5-6 >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioconductor mailing list >>>>>> Bioconductor at r-project.org >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>> Search the archives: >>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>> . >>>>>> >>>>> >>>>> >>>> >>> >> > -- ______________________________________________________________________ ____________ Edward H. Morrow Department of Ecology and Evolution Animal Ecology (zooekologi) Evolutionary Biology Centre Uppsala University Norbyv?gen 18-D SE-752 36 Uppsala SWEDEN Email: ted.morrow at ebc.uu.se Tel: +46 18 471 2676 Fax +46 18 471 6484 Webpage: http://www.iee.uu.se/zooekol/default.php?type=personalpage&id =119&lang=en
ADD REPLY
0
Entering edit mode
This has been an interesting discussion. A marginally connected question: Are there tools to normalize found lists of symptoms that use informal terminology so that they can be matched to formally catalogued terms in DO? On Fri, Jan 28, 2011 at 9:57 AM, Ted Morrow <ted.morrow at="" ebc.uu.se=""> wrote: > Hi Gilbert, > > Great thanks for the clarification....OK building myGA works fine and I have > setAnnLib to org.Hs.eg.db with: > myGA2 <- setAnnLib(myGA, 'org.Hs.eg.db') > getAnnLib(myGA2) > [1] "org.Hs.eg.db" > > .....but at the geneAnswerReadable step: > myGAreadable <- geneAnswersReadable(myGA2, catTerm=FALSE) > > I get the following error: > [1] "Mapping geneInput ..." > [1] "Warning: some genes do not have valid symbols!" > [1] "Mapping genesInCategory ..." > > ?traceback() > 2: stop(paste("The input geneAnswers categoryType is not DOLite but ", > ? ? ? x at categoryType, ". stop function!")) > 1: topDOLiteGenes(myGA, orderby = "pvalue", topGenes = "ALL", genesOrderBy = > "pValue", > ? ? ? file = TRUE) > > Cheers > Ted > > sessionInfo() > R version 2.12.1 (2010-12-16) > Platform: i386-pc-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 ?LC_CTYPE=English_United > States.1252 ? ?LC_MONETARY=English_United States.1252 LC_NUMERIC=C > ? ? ? ? ? ? ? ? LC_TIME=English_United States.1252 > > attached base packages: > [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base > > other attached packages: > ?[1] org.Hs.eg.db_2.4.6 ? GeneAnswers_1.6.0 ? ?RColorBrewer_1.0-2 > Heatplus_1.20.0 ? ? ?rgl_0.92.798 ? ? ? ? MASS_7.3-9 ? ? ? ? ? RSQLite_0.9-4 > ? ? ? ?DBI_0.2-5 ? ? ? ? ? ?XML_3.2-0.2 ? ? ? ? ?annotate_1.28.0 > ?AnnotationDbi_1.12.0 > [12] Biobase_2.10.0 ? ? ? RCurl_1.5-0.1 ? ? ? ?bitops_1.0-4.1 > igraph_0.5.5-1 > > loaded via a namespace (and not attached): > [1] graph_1.28.0 ? ? grid_2.12.1 ? ? ?RBGL_1.26.0 ? ? ?Rgraphviz_1.27.0 > tools_2.12.1 ? ? xtable_1.5-6 >> > > > > On 1/28/2011 3:40 PM, Gilbert Feng wrote: >> >> Hi, Ted >> >> Yes, your codes about mygenepool and mygenes are correct. I didn't clearly >> state them in my previous email. Sorry about that. myDOLite should be built >> by DOLite and mygenepool, not mygenes. So currently, myDOLite is a >> customized annotation library and when you build a geneAnswers instance, >> leave categoryType to NULL(default value), and in your case, I think the >> total gene number should be 1310, not 1202, since the former is the gene >> number of mygenepool. >> >> after buiding myGA, run setAnnLib to set it to org.Hs.eg.db I guess the >> error is that you don't set it. >> >> For the further mapping function, like geneAnswerReadable or other >> visualization, always set catTerm=FALSE. >> >> Let me know if it doesn't work. >> >> Best >> >> Gilbert >> >> >>> I have two gene pools - one is a universe of genes (called say >>> "mygenepool") which is only a subset on all human genes, the other is a >>> subset of mygenepool (called say "mygenes"). I am interested in knowing >>> which DO terms show over-representation in mygenes given the DO terms >>> present in mygenepool. So, in your example I am a bit confused as to >>> which pool "mygenes" refers to. Is it "mygenepool" or "mygenes"? >>> >>> So I have...... >>> str(mygenepool) # I used as.matrix then as.vector to change the >>> dataframe into a vector...is that OK? >>> int [1:1310] 8813 3075 2729 8379 204 170302 10165 6521 799 3052 ... >>> >>> and >>> str(mygenes) >>> int [1:1202] 6833 10083 60528 8842 5048 4843 6329 5080 6401 6853 ... >>> >>> >>> and mygenes is a subset of mygenepool: >>> overlap <- intersect(mygenepool, mygenes) >>> str(overlap) >>> int [1:1202] 6833 10083 60528 8842 5048 4843 6329 5080 6401 6853 ... >>> >>> >>> Running the first bit of your example code seems to work fine: >>> myDOLite <- lapply(DOLite, intersect, mygenepool) >>> names(myDOLite) <- DOLiteTerm[names(myDOLite)] >>> >>> and so does creating a GeneAnswers instance (myGA) whether I specify the >>> totalGeneNumber or not: >>> myGA <- geneAnswersBuilder(mygenes, myDOLite, categoryType='DOLITE', >>> testType='hyperG', >>> #known=FALSE, totalGeneNumber=1202, >>> pvalueT=0.1, geneExpressionProfile=NULL, verbose=TRUE) >>> >>> But then I get the following error: >>> myGAreadable <- geneAnswersReadable(myGA, verbose=TRUE) >>> [1] "Mapping geneInput ..." >>> Error in switch(sub("org.*[:.:]", "", libname), eg = "EG", tair = "TAIR", >>> : >>> EXPR must be a length 1 vector >>> >>> traceback() >>> 2: getSymbols(as.vector(x at geneInput[, 1]), x at annLib, strict = strict, >>> missing = missing) >>> 1: geneAnswersReadable(myGA, verbose = TRUE) >>> >>> If I instead create the GeneAnswers instance with 'org.Hs.eg.db' and not >>> myDOLite the step..... >>> myGAreadable <- geneAnswersReadable(myGA, verbose=TRUE) >>> >>> .....works fine as do later steps (for example geneAnsersChartPlots and >>> topDOLiteGenes) >>> >>> But then presumably I am not working with the restricted pool of genes >>> that I want to (i.e. mygenepool) >>> >>> Any ideas? >>> Cheers >>> Ted >>> >>> >>> >>> >>> >>> >>> On 1/27/2011 5:19 PM, Gilbert Feng wrote: >>>> >>>> Hi, Ted >>>> >>>> If I correctly understand your question, you have your own gene pool, >>>> don't you? In that case, GeneAnswers still can handle that. What you >>>> need is to build a customized DOLite list to run analysis. For example, >>>> >>>> if your genes are mygenes(an unique Entrez GENE ID vector) >>>> >>>> myDOLite <- lapply(DOLite, intersect, mygenes) >>>> names(myDOLite) <- DOLiteTerm[names(myDOLite)] >>>> #Please note that the parameter 'known' is TRUE in geneAnswersBuilder, >>>> which means the test will only use shared genes in mygenes and DOLite >>>> genes as the gene pool to run hypergeometric test. You can change the >>>> number of gene pool by set parameter 'totalGeneNumber' and set 'known' >>>> to FALSE at the same time . >>>> myGA <- geneAnswersBuilder(mygenes, myDOLite, testType='hyperG', ...) >>>> >>>> since DO is designed for human, you can use setAnnLib to set the >>>> species as 'org.Hs.eg.db' for further visualization or analysis. >>>> >>>> Let me know if you have any question. >>>> >>>> Gilbert >>>> >>>> BTW, the current one has a new function to automatically generate a >>>> report for a or several groups of genes. Please check "groupReport" >>>> page, but before you run it, make sure you have java support and flash >>>> installation. >>>> >>>> >>>> On 1/27/11 9:45 AM, Ted Morrow wrote: >>>>> >>>>> Thanks for your replies Gilbert and Peter.....I forgot to say in my >>>>> original question that I did look into using both the GeneAnswers >>>>> package and the FunDO web-based system but the issue I have not been >>>>> able to solve with them is that I cannot specify "my gene Universe" >>>>> within which my selected genes are tested for enrichment. I want to see >>>>> which diseases are enriched out of the possible diseases in "my gene >>>>> universe" which is a subset of all possible human genes. >>>>> >>>>> Is that possible in GeneAnswers (or even FunDO)? >>>>> >>>>> Cheers >>>>> Ted >>>>> >>>>> On 1/27/2011 4:39 PM, Peter Robinson wrote: >>>>>> >>>>>> On 01/27/2011 04:17 PM, Gilbert Feng wrote: >>>>>>> >>>>>>> Hi, Ted >>>>>>> >>>>>>> DO.db is a standard sqlite file and you can use standard RSQLite >>>>>>> procedure to retrieve the information. Actually we do have a lite >>>>>>> version of Disease Ontology, DOLite that removes some redundant >>>>>>> nodes, >>>>>>> integrated in GeneAnswers package, which also uses hypergeometric >>>>>>> test >>>>>>> to run enrichment analysis as well as automatically generated >>>>>>> interactive(cytoscape web support) html summary for one or more >>>>>>> groups >>>>>>> of genes. >>>>>>> >>>>>>> Best >>>>>>> >>>>>>> Gilbert >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> You might also want to take a look at this website: >>>>>> http://django.nubic.northwestern.edu/fundo/faq >>>>>> which implements enrichment analysis using genes annotated to DO >>>>>> terms. >>>>>> -peter >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> On 1/27/11 5:00 AM, Ted Morrow wrote: >>>>>>>> >>>>>>>> Dear all, >>>>>>>> >>>>>>>> I would like to conduct a hypergeometric test on a list of genes but >>>>>>>> using Disease Ontology instead of GO or KEGG terms. The package >>>>>>>> "DO.db" >>>>>>>> contains this information but I have been unable to find a way of >>>>>>>> using >>>>>>>> this database in conjunction with the "GOstats" package that I have >>>>>>>> been >>>>>>>> using. >>>>>>>> >>>>>>>> Has anyone attempted to run a hypergeometric test for Disease >>>>>>>> Ontology >>>>>>>> terms? Is there another package I could use? Or is there a way of >>>>>>>> modifying the argument (if that's the right word) GOHyperGParams in >>>>>>>> GOstats so that it can make use of the information in DO.db? >>>>>>>> >>>>>>>> Both the GO and KEGG analyses work fine: >>>>>>>> ### GO chunk >>>>>>>> params<- new("GOHyperGParams", >>>>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>>>>> annotation = "hgu95av2.db", ontology="BP", >>>>>>>> pvalueCutoff=0.01, conditional=FALSE, >>>>>>>> testDirection="over") >>>>>>>> hgOver<- hyperGTest(params) >>>>>>>> >>>>>>>> hgOver >>>>>>>> Gene to GO BP test for over-representation >>>>>>>> 2136 GO BP ids tested (15 have p< 0.01) >>>>>>>> Selected gene set size: 112 >>>>>>>> Gene universe size: 951 >>>>>>>> Annotation package: hgu95av2 >>>>>>>> >>>>>>>> ### KEGG chunk >>>>>>>> paramsKEGG<- new("KEGGHyperGParams", >>>>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>>>>> annotation = "hgu95av2.db", >>>>>>>> pvalueCutoff=0.01, >>>>>>>> testDirection="over") >>>>>>>> >>>>>>>> hgOverKEGG<- hyperGTest(paramsKEGG) >>>>>>>> hgOverKEGG >>>>>>>> summary(hgOverKEGG) >>>>>>>> >>>>>>>> My data looks like this: >>>>>>>> str(selectedEntrezIds) >>>>>>>> chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414" >>>>>>>> "7546" >>>>>>>> "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" "113235" >>>>>>>> "28962" "10269" "4069" "30835" "7018" ... >>>>>>>> >>>>>>>> > str(entrezUniverse) >>>>>>>> chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165" >>>>>>>> "6521" >>>>>>>> "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842" >>>>>>>> "5048" >>>>>>>> "4843" "6329" "5080" "6401" "6853" ... >>>>>>>> >>>>>>>> My naive attempts to use DO have included: >>>>>>>> paramsDO<- new("DOHyperGParams", >>>>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>>>>> annotation = "DO.db", >>>>>>>> pvalueCutoff=0.01, >>>>>>>> testDirection="over") >>>>>>>> >>>>>>>> Which of course doesn't work and gives the following error: >>>>>>>> Error in getClass(Class, where = topenv(parent.frame())) : >>>>>>>> "DOHyperGParams" is not a defined class >>>>>>>> >>>>>>>> > traceback() >>>>>>>> 3: stop(gettextf("\"%s\" is not a defined class", Class), domain = >>>>>>>> NA) >>>>>>>> 2: getClass(Class, where = topenv(parent.frame())) >>>>>>>> 1: new("DOHyperGParams", geneIds = selectedEntrezIds, >>>>>>>> universeGeneIds = >>>>>>>> entrezUniverse, >>>>>>>> annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over") >>>>>>>> >>>>>>>> >>>>>>>> Replacing "GOHyperGParams" with "DOHyperGParams" also gives the >>>>>>>> following error:.. >>>>>>>> >>>>>>>> hgOverDO<- hyperGTest(paramsDO) >>>>>>>> Error in match.arg(ontology, c("BP", "CC", "MF")) : >>>>>>>> 'arg' should be one of ?BP?, ?CC?, ?MF? >>>>>>>> >>>>>>>> traceback() >>>>>>>> 10: stop(gettextf("'arg' should be one of %s", >>>>>>>> paste(dQuote(choices), >>>>>>>> collapse = ", ")), domain = NA) >>>>>>>> 9: match.arg(ontology, c("BP", "CC", "MF")) >>>>>>>> 8: getUniverseViaGo(p) >>>>>>>> 7: universeBuilder(p) >>>>>>>> 6: universeBuilder(p) >>>>>>>> 5: .hyperGTestInternal(p) >>>>>>>> 4: is(object, Cl) >>>>>>>> 3: is(object, Cl) >>>>>>>> 2: .valueClassTest(standardGeneric("hyperGTest"), >>>>>>>> "HyperGResultBase", >>>>>>>> "hyperGTest") >>>>>>>> 1: hyperGTest(paramsDO) >>>>>>>> >>>>>>>> >>>>>>>> Any help would be greatly appreciated. >>>>>>>> /Ted >>>>>>>> >>>>>>>> > sessionInfo() >>>>>>>> R version 2.12.1 (2010-12-16) >>>>>>>> Platform: i386-pc-mingw32/i386 (32-bit) >>>>>>>> >>>>>>>> locale: >>>>>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >>>>>>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C >>>>>>>> LC_TIME=English_United States.1252 >>>>>>>> >>>>>>>> attached base packages: >>>>>>>> [1] stats graphics grDevices utils datasets methods base >>>>>>>> >>>>>>>> other attached packages: >>>>>>>> [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5 >>>>>>>> org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5 >>>>>>>> graph_1.28.0 >>>>>>>> Category_2.16.0 AnnotationDbi_1.12.0 >>>>>>>> [12] Biobase_2.10.0 >>>>>>>> >>>>>>>> loaded via a namespace (and not attached): >>>>>>>> [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0 >>>>>>>> splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 xtable_1.5-6 >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioconductor mailing list >>>>>>> Bioconductor at r-project.org >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>> Search the archives: >>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>>> . >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> > > -- > ____________________________________________________________________ ______________ > > Edward H. Morrow > Department of Ecology and Evolution > Animal Ecology (zooekologi) > Evolutionary Biology Centre > Uppsala University > Norbyv?gen 18-D > SE-752 36 Uppsala > SWEDEN > > Email: ted.morrow at ebc.uu.se > Tel: +46 18 471 2676 > Fax +46 18 471 6484 > Webpage: > http://www.iee.uu.se/zooekol/default.php?type=personalpage&id=119&la ng=en > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD REPLY
0
Entering edit mode
Hi, Vincent That's a good question. Currently, DO.obo file contains some synonyms for each DO term/ID. Our collaborators are working on curation right now. I'll talk to them about this. So far, what I can think is to retrieve the synonyms from DO.obo and use some text mining tools to find the DO ids. More info can be found at http://do-wiki.nubic.northwestern.edu/index.php/Main_Page And there will be a DO workshop between 2/8 and 2/9 at Baltimore. Best Gilbert On 1/28/11 2:25 PM, Vincent Carey wrote: > This has been an interesting discussion. A marginally connected > question: Are there tools to normalize found lists of symptoms that > use informal terminology so that they can be matched to formally > catalogued terms in DO? > > On Fri, Jan 28, 2011 at 9:57 AM, Ted Morrow<ted.morrow at="" ebc.uu.se=""> wrote: >> Hi Gilbert, >> >> Great thanks for the clarification....OK building myGA works fine and I have >> setAnnLib to org.Hs.eg.db with: >> myGA2<- setAnnLib(myGA, 'org.Hs.eg.db') >> getAnnLib(myGA2) >> [1] "org.Hs.eg.db" >> >> .....but at the geneAnswerReadable step: >> myGAreadable<- geneAnswersReadable(myGA2, catTerm=FALSE) >> >> I get the following error: >> [1] "Mapping geneInput ..." >> [1] "Warning: some genes do not have valid symbols!" >> [1] "Mapping genesInCategory ..." >> >> traceback() >> 2: stop(paste("The input geneAnswers categoryType is not DOLite but ", >> x at categoryType, ". stop function!")) >> 1: topDOLiteGenes(myGA, orderby = "pvalue", topGenes = "ALL", genesOrderBy = >> "pValue", >> file = TRUE) >> >> Cheers >> Ted >> >> sessionInfo() >> R version 2.12.1 (2010-12-16) >> Platform: i386-pc-mingw32/i386 (32-bit) >> >> locale: >> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C >> LC_TIME=English_United States.1252 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] org.Hs.eg.db_2.4.6 GeneAnswers_1.6.0 RColorBrewer_1.0-2 >> Heatplus_1.20.0 rgl_0.92.798 MASS_7.3-9 RSQLite_0.9-4 >> DBI_0.2-5 XML_3.2-0.2 annotate_1.28.0 >> AnnotationDbi_1.12.0 >> [12] Biobase_2.10.0 RCurl_1.5-0.1 bitops_1.0-4.1 >> igraph_0.5.5-1 >> >> loaded via a namespace (and not attached): >> [1] graph_1.28.0 grid_2.12.1 RBGL_1.26.0 Rgraphviz_1.27.0 >> tools_2.12.1 xtable_1.5-6 >>> >> >> >> >> On 1/28/2011 3:40 PM, Gilbert Feng wrote: >>> >>> Hi, Ted >>> >>> Yes, your codes about mygenepool and mygenes are correct. I didn't clearly >>> state them in my previous email. Sorry about that. myDOLite should be built >>> by DOLite and mygenepool, not mygenes. So currently, myDOLite is a >>> customized annotation library and when you build a geneAnswers instance, >>> leave categoryType to NULL(default value), and in your case, I think the >>> total gene number should be 1310, not 1202, since the former is the gene >>> number of mygenepool. >>> >>> after buiding myGA, run setAnnLib to set it to org.Hs.eg.db I guess the >>> error is that you don't set it. >>> >>> For the further mapping function, like geneAnswerReadable or other >>> visualization, always set catTerm=FALSE. >>> >>> Let me know if it doesn't work. >>> >>> Best >>> >>> Gilbert >>> >>> >>>> I have two gene pools - one is a universe of genes (called say >>>> "mygenepool") which is only a subset on all human genes, the other is a >>>> subset of mygenepool (called say "mygenes"). I am interested in knowing >>>> which DO terms show over-representation in mygenes given the DO terms >>>> present in mygenepool. So, in your example I am a bit confused as to >>>> which pool "mygenes" refers to. Is it "mygenepool" or "mygenes"? >>>> >>>> So I have...... >>>> str(mygenepool) # I used as.matrix then as.vector to change the >>>> dataframe into a vector...is that OK? >>>> int [1:1310] 8813 3075 2729 8379 204 170302 10165 6521 799 3052 ... >>>> >>>> and >>>> str(mygenes) >>>> int [1:1202] 6833 10083 60528 8842 5048 4843 6329 5080 6401 6853 ... >>>> >>>> >>>> and mygenes is a subset of mygenepool: >>>> overlap<- intersect(mygenepool, mygenes) >>>> str(overlap) >>>> int [1:1202] 6833 10083 60528 8842 5048 4843 6329 5080 6401 6853 ... >>>> >>>> >>>> Running the first bit of your example code seems to work fine: >>>> myDOLite<- lapply(DOLite, intersect, mygenepool) >>>> names(myDOLite)<- DOLiteTerm[names(myDOLite)] >>>> >>>> and so does creating a GeneAnswers instance (myGA) whether I specify the >>>> totalGeneNumber or not: >>>> myGA<- geneAnswersBuilder(mygenes, myDOLite, categoryType='DOLITE', >>>> testType='hyperG', >>>> #known=FALSE, totalGeneNumber=1202, >>>> pvalueT=0.1, geneExpressionProfile=NULL, verbose=TRUE) >>>> >>>> But then I get the following error: >>>> myGAreadable<- geneAnswersReadable(myGA, verbose=TRUE) >>>> [1] "Mapping geneInput ..." >>>> Error in switch(sub("org.*[:.:]", "", libname), eg = "EG", tair = "TAIR", >>>> : >>>> EXPR must be a length 1 vector >>>> >>>> traceback() >>>> 2: getSymbols(as.vector(x at geneInput[, 1]), x at annLib, strict = strict, >>>> missing = missing) >>>> 1: geneAnswersReadable(myGA, verbose = TRUE) >>>> >>>> If I instead create the GeneAnswers instance with 'org.Hs.eg.db' and not >>>> myDOLite the step..... >>>> myGAreadable<- geneAnswersReadable(myGA, verbose=TRUE) >>>> >>>> .....works fine as do later steps (for example geneAnsersChartPlots and >>>> topDOLiteGenes) >>>> >>>> But then presumably I am not working with the restricted pool of genes >>>> that I want to (i.e. mygenepool) >>>> >>>> Any ideas? >>>> Cheers >>>> Ted >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 1/27/2011 5:19 PM, Gilbert Feng wrote: >>>>> >>>>> Hi, Ted >>>>> >>>>> If I correctly understand your question, you have your own gene pool, >>>>> don't you? In that case, GeneAnswers still can handle that. What you >>>>> need is to build a customized DOLite list to run analysis. For example, >>>>> >>>>> if your genes are mygenes(an unique Entrez GENE ID vector) >>>>> >>>>> myDOLite<- lapply(DOLite, intersect, mygenes) >>>>> names(myDOLite)<- DOLiteTerm[names(myDOLite)] >>>>> #Please note that the parameter 'known' is TRUE in geneAnswersBuilder, >>>>> which means the test will only use shared genes in mygenes and DOLite >>>>> genes as the gene pool to run hypergeometric test. You can change the >>>>> number of gene pool by set parameter 'totalGeneNumber' and set 'known' >>>>> to FALSE at the same time . >>>>> myGA<- geneAnswersBuilder(mygenes, myDOLite, testType='hyperG', ...) >>>>> >>>>> since DO is designed for human, you can use setAnnLib to set the >>>>> species as 'org.Hs.eg.db' for further visualization or analysis. >>>>> >>>>> Let me know if you have any question. >>>>> >>>>> Gilbert >>>>> >>>>> BTW, the current one has a new function to automatically generate a >>>>> report for a or several groups of genes. Please check "groupReport" >>>>> page, but before you run it, make sure you have java support and flash >>>>> installation. >>>>> >>>>> >>>>> On 1/27/11 9:45 AM, Ted Morrow wrote: >>>>>> >>>>>> Thanks for your replies Gilbert and Peter.....I forgot to say in my >>>>>> original question that I did look into using both the GeneAnswers >>>>>> package and the FunDO web-based system but the issue I have not been >>>>>> able to solve with them is that I cannot specify "my gene Universe" >>>>>> within which my selected genes are tested for enrichment. I want to see >>>>>> which diseases are enriched out of the possible diseases in "my gene >>>>>> universe" which is a subset of all possible human genes. >>>>>> >>>>>> Is that possible in GeneAnswers (or even FunDO)? >>>>>> >>>>>> Cheers >>>>>> Ted >>>>>> >>>>>> On 1/27/2011 4:39 PM, Peter Robinson wrote: >>>>>>> >>>>>>> On 01/27/2011 04:17 PM, Gilbert Feng wrote: >>>>>>>> >>>>>>>> Hi, Ted >>>>>>>> >>>>>>>> DO.db is a standard sqlite file and you can use standard RSQLite >>>>>>>> procedure to retrieve the information. Actually we do have a lite >>>>>>>> version of Disease Ontology, DOLite that removes some redundant >>>>>>>> nodes, >>>>>>>> integrated in GeneAnswers package, which also uses hypergeometric >>>>>>>> test >>>>>>>> to run enrichment analysis as well as automatically generated >>>>>>>> interactive(cytoscape web support) html summary for one or more >>>>>>>> groups >>>>>>>> of genes. >>>>>>>> >>>>>>>> Best >>>>>>>> >>>>>>>> Gilbert >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> You might also want to take a look at this website: >>>>>>> http://django.nubic.northwestern.edu/fundo/faq >>>>>>> which implements enrichment analysis using genes annotated to DO >>>>>>> terms. >>>>>>> -peter >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> On 1/27/11 5:00 AM, Ted Morrow wrote: >>>>>>>>> >>>>>>>>> Dear all, >>>>>>>>> >>>>>>>>> I would like to conduct a hypergeometric test on a list of genes but >>>>>>>>> using Disease Ontology instead of GO or KEGG terms. The package >>>>>>>>> "DO.db" >>>>>>>>> contains this information but I have been unable to find a way of >>>>>>>>> using >>>>>>>>> this database in conjunction with the "GOstats" package that I have >>>>>>>>> been >>>>>>>>> using. >>>>>>>>> >>>>>>>>> Has anyone attempted to run a hypergeometric test for Disease >>>>>>>>> Ontology >>>>>>>>> terms? Is there another package I could use? Or is there a way of >>>>>>>>> modifying the argument (if that's the right word) GOHyperGParams in >>>>>>>>> GOstats so that it can make use of the information in DO.db? >>>>>>>>> >>>>>>>>> Both the GO and KEGG analyses work fine: >>>>>>>>> ### GO chunk >>>>>>>>> params<- new("GOHyperGParams", >>>>>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>>>>>> annotation = "hgu95av2.db", ontology="BP", >>>>>>>>> pvalueCutoff=0.01, conditional=FALSE, >>>>>>>>> testDirection="over") >>>>>>>>> hgOver<- hyperGTest(params) >>>>>>>>> >>>>>>>>> hgOver >>>>>>>>> Gene to GO BP test for over-representation >>>>>>>>> 2136 GO BP ids tested (15 have p< 0.01) >>>>>>>>> Selected gene set size: 112 >>>>>>>>> Gene universe size: 951 >>>>>>>>> Annotation package: hgu95av2 >>>>>>>>> >>>>>>>>> ### KEGG chunk >>>>>>>>> paramsKEGG<- new("KEGGHyperGParams", >>>>>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>>>>>> annotation = "hgu95av2.db", >>>>>>>>> pvalueCutoff=0.01, >>>>>>>>> testDirection="over") >>>>>>>>> >>>>>>>>> hgOverKEGG<- hyperGTest(paramsKEGG) >>>>>>>>> hgOverKEGG >>>>>>>>> summary(hgOverKEGG) >>>>>>>>> >>>>>>>>> My data looks like this: >>>>>>>>> str(selectedEntrezIds) >>>>>>>>> chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414" >>>>>>>>> "7546" >>>>>>>>> "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" "113235" >>>>>>>>> "28962" "10269" "4069" "30835" "7018" ... >>>>>>>>> >>>>>>>>>> str(entrezUniverse) >>>>>>>>> chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165" >>>>>>>>> "6521" >>>>>>>>> "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842" >>>>>>>>> "5048" >>>>>>>>> "4843" "6329" "5080" "6401" "6853" ... >>>>>>>>> >>>>>>>>> My naive attempts to use DO have included: >>>>>>>>> paramsDO<- new("DOHyperGParams", >>>>>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>>>>>> annotation = "DO.db", >>>>>>>>> pvalueCutoff=0.01, >>>>>>>>> testDirection="over") >>>>>>>>> >>>>>>>>> Which of course doesn't work and gives the following error: >>>>>>>>> Error in getClass(Class, where = topenv(parent.frame())) : >>>>>>>>> "DOHyperGParams" is not a defined class >>>>>>>>> >>>>>>>>>> traceback() >>>>>>>>> 3: stop(gettextf("\"%s\" is not a defined class", Class), domain = >>>>>>>>> NA) >>>>>>>>> 2: getClass(Class, where = topenv(parent.frame())) >>>>>>>>> 1: new("DOHyperGParams", geneIds = selectedEntrezIds, >>>>>>>>> universeGeneIds = >>>>>>>>> entrezUniverse, >>>>>>>>> annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over") >>>>>>>>> >>>>>>>>> >>>>>>>>> Replacing "GOHyperGParams" with "DOHyperGParams" also gives the >>>>>>>>> following error:.. >>>>>>>>> >>>>>>>>> hgOverDO<- hyperGTest(paramsDO) >>>>>>>>> Error in match.arg(ontology, c("BP", "CC", "MF")) : >>>>>>>>> 'arg' should be one of ?BP?, ?CC?, ?MF? >>>>>>>>> >>>>>>>>> traceback() >>>>>>>>> 10: stop(gettextf("'arg' should be one of %s", >>>>>>>>> paste(dQuote(choices), >>>>>>>>> collapse = ", ")), domain = NA) >>>>>>>>> 9: match.arg(ontology, c("BP", "CC", "MF")) >>>>>>>>> 8: getUniverseViaGo(p) >>>>>>>>> 7: universeBuilder(p) >>>>>>>>> 6: universeBuilder(p) >>>>>>>>> 5: .hyperGTestInternal(p) >>>>>>>>> 4: is(object, Cl) >>>>>>>>> 3: is(object, Cl) >>>>>>>>> 2: .valueClassTest(standardGeneric("hyperGTest"), >>>>>>>>> "HyperGResultBase", >>>>>>>>> "hyperGTest") >>>>>>>>> 1: hyperGTest(paramsDO) >>>>>>>>> >>>>>>>>> >>>>>>>>> Any help would be greatly appreciated. >>>>>>>>> /Ted >>>>>>>>> >>>>>>>>>> sessionInfo() >>>>>>>>> R version 2.12.1 (2010-12-16) >>>>>>>>> Platform: i386-pc-mingw32/i386 (32-bit) >>>>>>>>> >>>>>>>>> locale: >>>>>>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >>>>>>>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C >>>>>>>>> LC_TIME=English_United States.1252 >>>>>>>>> >>>>>>>>> attached base packages: >>>>>>>>> [1] stats graphics grDevices utils datasets methods base >>>>>>>>> >>>>>>>>> other attached packages: >>>>>>>>> [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5 >>>>>>>>> org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5 >>>>>>>>> graph_1.28.0 >>>>>>>>> Category_2.16.0 AnnotationDbi_1.12.0 >>>>>>>>> [12] Biobase_2.10.0 >>>>>>>>> >>>>>>>>> loaded via a namespace (and not attached): >>>>>>>>> [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0 >>>>>>>>> splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 xtable_1.5-6 >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Bioconductor mailing list >>>>>>>> Bioconductor at r-project.org >>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>>> Search the archives: >>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>>>> . >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >> -- >> ___________________________________________________________________ _______________ >> >> Edward H. Morrow >> Department of Ecology and Evolution >> Animal Ecology (zooekologi) >> Evolutionary Biology Centre >> Uppsala University >> Norbyv?gen 18-D >> SE-752 36 Uppsala >> SWEDEN >> >> Email: ted.morrow at ebc.uu.se >> Tel: +46 18 471 2676 >> Fax +46 18 471 6484 >> Webpage: >> http://www.iee.uu.se/zooekol/default.php?type=personalpage&id=119&l ang=en >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> -- ----------------------------------------------- Gang (Gilbert) Feng, PhD Biomedical Informatics Center Robert H. Lurie Comprehensive Cancer Center Northwestern University 750 N. Lake Shore Drive, 11th Floor(11-175e) Chicago, IL 60611 Phone:312-503-2358 Email g-feng (at) northwestern.edu
ADD REPLY
0
Entering edit mode
I forgot to add my sessionInfo for the last post... sessionInfo() R version 2.12.1 (2010-12-16) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] org.Hs.eg.db_2.4.6 GeneAnswers_1.6.0 RColorBrewer_1.0-2 Heatplus_1.20.0 rgl_0.92.798 MASS_7.3-9 RSQLite_0.9-4 DBI_0.2-5 XML_3.2-0.2 annotate_1.28.0 AnnotationDbi_1.12.0 [12] Biobase_2.10.0 RCurl_1.5-0.1 bitops_1.0-4.1 igraph_0.5.5-1 loaded via a namespace (and not attached): [1] graph_1.28.0 grid_2.12.1 RBGL_1.26.0 Rgraphviz_1.27.0 tools_2.12.1 xtable_1.5-6 > On 1/28/2011 2:25 PM, Ted Morrow wrote: > Hi Gilbert, > > Thanks for the response....it seems like GeneAnswers may do the job > but I am still having trouble getting it to work for me. > > I have two gene pools - one is a universe of genes (called say > "mygenepool") which is only a subset on all human genes, the other is > a subset of mygenepool (called say "mygenes"). I am interested in > knowing which DO terms show over-representation in mygenes given the > DO terms present in mygenepool. So, in your example I am a bit > confused as to which pool "mygenes" refers to. Is it "mygenepool" or > "mygenes"? > > So I have...... > str(mygenepool) # I used as.matrix then as.vector to change the > dataframe into a vector...is that OK? > int [1:1310] 8813 3075 2729 8379 204 170302 10165 6521 799 3052 ... > > and > str(mygenes) > int [1:1202] 6833 10083 60528 8842 5048 4843 6329 5080 6401 6853 ... > > > and mygenes is a subset of mygenepool: > overlap <- intersect(mygenepool, mygenes) > str(overlap) > int [1:1202] 6833 10083 60528 8842 5048 4843 6329 5080 6401 6853 ... > > > Running the first bit of your example code seems to work fine: > myDOLite <- lapply(DOLite, intersect, mygenepool) > names(myDOLite) <- DOLiteTerm[names(myDOLite)] > > and so does creating a GeneAnswers instance (myGA) whether I specify > the totalGeneNumber or not: > myGA <- geneAnswersBuilder(mygenes, myDOLite, categoryType='DOLITE', > testType='hyperG', > #known=FALSE, totalGeneNumber=1202, > pvalueT=0.1, geneExpressionProfile=NULL, verbose=TRUE) > > But then I get the following error: > myGAreadable <- geneAnswersReadable(myGA, verbose=TRUE) > [1] "Mapping geneInput ..." > Error in switch(sub("org.*[:.:]", "", libname), eg = "EG", tair = > "TAIR", : > EXPR must be a length 1 vector > > traceback() > 2: getSymbols(as.vector(x at geneInput[, 1]), x at annLib, strict = strict, > missing = missing) > 1: geneAnswersReadable(myGA, verbose = TRUE) > > If I instead create the GeneAnswers instance with 'org.Hs.eg.db' and > not myDOLite the step..... > myGAreadable <- geneAnswersReadable(myGA, verbose=TRUE) > > .....works fine as do later steps (for example geneAnsersChartPlots > and topDOLiteGenes) > > But then presumably I am not working with the restricted pool of genes > that I want to (i.e. mygenepool) > > Any ideas? > Cheers > Ted > > > > > > > On 1/27/2011 5:19 PM, Gilbert Feng wrote: >> Hi, Ted >> >> If I correctly understand your question, you have your own gene pool, >> don't you? In that case, GeneAnswers still can handle that. What you >> need is to build a customized DOLite list to run analysis. For example, >> >> if your genes are mygenes(an unique Entrez GENE ID vector) >> >> myDOLite <- lapply(DOLite, intersect, mygenes) >> names(myDOLite) <- DOLiteTerm[names(myDOLite)] >> #Please note that the parameter 'known' is TRUE in >> geneAnswersBuilder, which means the test will only use shared genes >> in mygenes and DOLite genes as the gene pool to run hypergeometric >> test. You can change the number of gene pool by set parameter >> 'totalGeneNumber' and set 'known' to FALSE at the same time . >> myGA <- geneAnswersBuilder(mygenes, myDOLite, testType='hyperG', ...) >> >> since DO is designed for human, you can use setAnnLib to set the >> species as 'org.Hs.eg.db' for further visualization or analysis. >> >> Let me know if you have any question. >> >> Gilbert >> >> BTW, the current one has a new function to automatically generate a >> report for a or several groups of genes. Please check "groupReport" >> page, but before you run it, make sure you have java support and >> flash installation. >> >> >> On 1/27/11 9:45 AM, Ted Morrow wrote: >>> Thanks for your replies Gilbert and Peter.....I forgot to say in my >>> original question that I did look into using both the GeneAnswers >>> package and the FunDO web-based system but the issue I have not been >>> able to solve with them is that I cannot specify "my gene Universe" >>> within which my selected genes are tested for enrichment. I want to see >>> which diseases are enriched out of the possible diseases in "my gene >>> universe" which is a subset of all possible human genes. >>> >>> Is that possible in GeneAnswers (or even FunDO)? >>> >>> Cheers >>> Ted >>> >>> On 1/27/2011 4:39 PM, Peter Robinson wrote: >>>> On 01/27/2011 04:17 PM, Gilbert Feng wrote: >>>>> Hi, Ted >>>>> >>>>> DO.db is a standard sqlite file and you can use standard RSQLite >>>>> procedure to retrieve the information. Actually we do have a lite >>>>> version of Disease Ontology, DOLite that removes some redundant >>>>> nodes, >>>>> integrated in GeneAnswers package, which also uses hypergeometric >>>>> test >>>>> to run enrichment analysis as well as automatically generated >>>>> interactive(cytoscape web support) html summary for one or more >>>>> groups >>>>> of genes. >>>>> >>>>> Best >>>>> >>>>> Gilbert >>>> >>>> >>>> >>>> >>>> You might also want to take a look at this website: >>>> http://django.nubic.northwestern.edu/fundo/faq >>>> which implements enrichment analysis using genes annotated to DO >>>> terms. >>>> -peter >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>>> >>>>> On 1/27/11 5:00 AM, Ted Morrow wrote: >>>>>> Dear all, >>>>>> >>>>>> I would like to conduct a hypergeometric test on a list of genes but >>>>>> using Disease Ontology instead of GO or KEGG terms. The package >>>>>> "DO.db" >>>>>> contains this information but I have been unable to find a way of >>>>>> using >>>>>> this database in conjunction with the "GOstats" package that I have >>>>>> been >>>>>> using. >>>>>> >>>>>> Has anyone attempted to run a hypergeometric test for Disease >>>>>> Ontology >>>>>> terms? Is there another package I could use? Or is there a way of >>>>>> modifying the argument (if that's the right word) GOHyperGParams in >>>>>> GOstats so that it can make use of the information in DO.db? >>>>>> >>>>>> Both the GO and KEGG analyses work fine: >>>>>> ### GO chunk >>>>>> params<- new("GOHyperGParams", >>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>>> annotation = "hgu95av2.db", ontology="BP", >>>>>> pvalueCutoff=0.01, conditional=FALSE, >>>>>> testDirection="over") >>>>>> hgOver<- hyperGTest(params) >>>>>> >>>>>> hgOver >>>>>> Gene to GO BP test for over-representation >>>>>> 2136 GO BP ids tested (15 have p< 0.01) >>>>>> Selected gene set size: 112 >>>>>> Gene universe size: 951 >>>>>> Annotation package: hgu95av2 >>>>>> >>>>>> ### KEGG chunk >>>>>> paramsKEGG<- new("KEGGHyperGParams", >>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>>> annotation = "hgu95av2.db", >>>>>> pvalueCutoff=0.01, >>>>>> testDirection="over") >>>>>> >>>>>> hgOverKEGG<- hyperGTest(paramsKEGG) >>>>>> hgOverKEGG >>>>>> summary(hgOverKEGG) >>>>>> >>>>>> My data looks like this: >>>>>> str(selectedEntrezIds) >>>>>> chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414" >>>>>> "7546" >>>>>> "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" "113235" >>>>>> "28962" "10269" "4069" "30835" "7018" ... >>>>>> >>>>>> > str(entrezUniverse) >>>>>> chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165" >>>>>> "6521" >>>>>> "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842" >>>>>> "5048" >>>>>> "4843" "6329" "5080" "6401" "6853" ... >>>>>> >>>>>> My naive attempts to use DO have included: >>>>>> paramsDO<- new("DOHyperGParams", >>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>>> annotation = "DO.db", >>>>>> pvalueCutoff=0.01, >>>>>> testDirection="over") >>>>>> >>>>>> Which of course doesn't work and gives the following error: >>>>>> Error in getClass(Class, where = topenv(parent.frame())) : >>>>>> "DOHyperGParams" is not a defined class >>>>>> >>>>>> > traceback() >>>>>> 3: stop(gettextf("\"%s\" is not a defined class", Class), domain >>>>>> = NA) >>>>>> 2: getClass(Class, where = topenv(parent.frame())) >>>>>> 1: new("DOHyperGParams", geneIds = selectedEntrezIds, >>>>>> universeGeneIds = >>>>>> entrezUniverse, >>>>>> annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over") >>>>>> >>>>>> >>>>>> Replacing "GOHyperGParams" with "DOHyperGParams" also gives the >>>>>> following error:.. >>>>>> >>>>>> hgOverDO<- hyperGTest(paramsDO) >>>>>> Error in match.arg(ontology, c("BP", "CC", "MF")) : >>>>>> 'arg' should be one of ?BP?, ?CC?, ?MF? >>>>>> >>>>>> traceback() >>>>>> 10: stop(gettextf("'arg' should be one of %s", >>>>>> paste(dQuote(choices), >>>>>> collapse = ", ")), domain = NA) >>>>>> 9: match.arg(ontology, c("BP", "CC", "MF")) >>>>>> 8: getUniverseViaGo(p) >>>>>> 7: universeBuilder(p) >>>>>> 6: universeBuilder(p) >>>>>> 5: .hyperGTestInternal(p) >>>>>> 4: is(object, Cl) >>>>>> 3: is(object, Cl) >>>>>> 2: .valueClassTest(standardGeneric("hyperGTest"), >>>>>> "HyperGResultBase", >>>>>> "hyperGTest") >>>>>> 1: hyperGTest(paramsDO) >>>>>> >>>>>> >>>>>> Any help would be greatly appreciated. >>>>>> /Ted >>>>>> >>>>>> > sessionInfo() >>>>>> R version 2.12.1 (2010-12-16) >>>>>> Platform: i386-pc-mingw32/i386 (32-bit) >>>>>> >>>>>> locale: >>>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >>>>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C >>>>>> LC_TIME=English_United States.1252 >>>>>> >>>>>> attached base packages: >>>>>> [1] stats graphics grDevices utils datasets methods base >>>>>> >>>>>> other attached packages: >>>>>> [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5 >>>>>> org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5 >>>>>> graph_1.28.0 >>>>>> Category_2.16.0 AnnotationDbi_1.12.0 >>>>>> [12] Biobase_2.10.0 >>>>>> >>>>>> loaded via a namespace (and not attached): >>>>>> [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0 >>>>>> splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 xtable_1.5-6 >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> . >>>>> >>>> >>>> >>> >> > -- ______________________________________________________________________ ____________ Edward H. Morrow Department of Ecology and Evolution Animal Ecology (zooekologi) Evolutionary Biology Centre Uppsala University Norbyv?gen 18-D SE-752 36 Uppsala SWEDEN Email: ted.morrow at ebc.uu.se Tel: +46 18 471 2676 Fax +46 18 471 6484 Webpage: http://www.iee.uu.se/zooekol/default.php?type=personalpage&id =119&lang=en
ADD REPLY
0
Entering edit mode
On 01/27/2011 04:45 PM, Ted Morrow wrote: > Thanks for your replies Gilbert and Peter.....I forgot to say in my > original question that I did look into using both the GeneAnswers > package and the FunDO web-based system but the issue I have not been > able to solve with them is that I cannot specify "my gene Universe" > within which my selected genes are tested for enrichment. I want to see > which diseases are enriched out of the possible diseases in "my gene > universe" which is a subset of all possible human genes. > > Is that possible in GeneAnswers (or even FunDO)? > > Cheers > Ted Just as a word of caution with this analysis, I think you will get biased answers if you use all possible human genes as your universe/population. I would consider limiting the population set to all genes known to be associated with at least one disease (or something similar appropriate to your experiment). In GO analysis, if you use all genes as your population (rather than all genes annotated to at least one GO term), this tends to overestimate statistical significance. -peter > > On 1/27/2011 4:39 PM, Peter Robinson wrote: >> On 01/27/2011 04:17 PM, Gilbert Feng wrote: >>> Hi, Ted >>> >>> DO.db is a standard sqlite file and you can use standard RSQLite >>> procedure to retrieve the information. Actually we do have a lite >>> version of Disease Ontology, DOLite that removes some redundant nodes, >>> integrated in GeneAnswers package, which also uses hypergeometric test >>> to run enrichment analysis as well as automatically generated >>> interactive(cytoscape web support) html summary for one or more groups >>> of genes. >>> >>> Best >>> >>> Gilbert >> >> >> >> >> You might also want to take a look at this website: >> http://django.nubic.northwestern.edu/fundo/faq >> which implements enrichment analysis using genes annotated to DO terms. >> -peter >> >> >> >> >> >> >> >> >> >> >> >>> >>> On 1/27/11 5:00 AM, Ted Morrow wrote: >>>> Dear all, >>>> >>>> I would like to conduct a hypergeometric test on a list of genes but >>>> using Disease Ontology instead of GO or KEGG terms. The package "DO.db" >>>> contains this information but I have been unable to find a way of using >>>> this database in conjunction with the "GOstats" package that I have >>>> been >>>> using. >>>> >>>> Has anyone attempted to run a hypergeometric test for Disease Ontology >>>> terms? Is there another package I could use? Or is there a way of >>>> modifying the argument (if that's the right word) GOHyperGParams in >>>> GOstats so that it can make use of the information in DO.db? >>>> >>>> Both the GO and KEGG analyses work fine: >>>> ### GO chunk >>>> params<- new("GOHyperGParams", >>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>> annotation = "hgu95av2.db", ontology="BP", >>>> pvalueCutoff=0.01, conditional=FALSE, >>>> testDirection="over") >>>> hgOver<- hyperGTest(params) >>>> >>>> hgOver >>>> Gene to GO BP test for over-representation >>>> 2136 GO BP ids tested (15 have p< 0.01) >>>> Selected gene set size: 112 >>>> Gene universe size: 951 >>>> Annotation package: hgu95av2 >>>> >>>> ### KEGG chunk >>>> paramsKEGG<- new("KEGGHyperGParams", >>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>> annotation = "hgu95av2.db", >>>> pvalueCutoff=0.01, >>>> testDirection="over") >>>> >>>> hgOverKEGG<- hyperGTest(paramsKEGG) >>>> hgOverKEGG >>>> summary(hgOverKEGG) >>>> >>>> My data looks like this: >>>> str(selectedEntrezIds) >>>> chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414" "7546" >>>> "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" "113235" >>>> "28962" "10269" "4069" "30835" "7018" ... >>>> >>>>> str(entrezUniverse) >>>> chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165" "6521" >>>> "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842" "5048" >>>> "4843" "6329" "5080" "6401" "6853" ... >>>> >>>> My naive attempts to use DO have included: >>>> paramsDO<- new("DOHyperGParams", >>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>> annotation = "DO.db", >>>> pvalueCutoff=0.01, >>>> testDirection="over") >>>> >>>> Which of course doesn't work and gives the following error: >>>> Error in getClass(Class, where = topenv(parent.frame())) : >>>> "DOHyperGParams" is not a defined class >>>> >>>>> traceback() >>>> 3: stop(gettextf("\"%s\" is not a defined class", Class), domain = NA) >>>> 2: getClass(Class, where = topenv(parent.frame())) >>>> 1: new("DOHyperGParams", geneIds = selectedEntrezIds, universeGeneIds = >>>> entrezUniverse, >>>> annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over") >>>> >>>> >>>> Replacing "GOHyperGParams" with "DOHyperGParams" also gives the >>>> following error:.. >>>> >>>> hgOverDO<- hyperGTest(paramsDO) >>>> Error in match.arg(ontology, c("BP", "CC", "MF")) : >>>> 'arg' should be one of ?BP?, ?CC?, ?MF? >>>> >>>> traceback() >>>> 10: stop(gettextf("'arg' should be one of %s", paste(dQuote(choices), >>>> collapse = ", ")), domain = NA) >>>> 9: match.arg(ontology, c("BP", "CC", "MF")) >>>> 8: getUniverseViaGo(p) >>>> 7: universeBuilder(p) >>>> 6: universeBuilder(p) >>>> 5: .hyperGTestInternal(p) >>>> 4: is(object, Cl) >>>> 3: is(object, Cl) >>>> 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", >>>> "hyperGTest") >>>> 1: hyperGTest(paramsDO) >>>> >>>> >>>> Any help would be greatly appreciated. >>>> /Ted >>>> >>>>> sessionInfo() >>>> R version 2.12.1 (2010-12-16) >>>> Platform: i386-pc-mingw32/i386 (32-bit) >>>> >>>> locale: >>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C >>>> LC_TIME=English_United States.1252 >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5 >>>> org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5 graph_1.28.0 >>>> Category_2.16.0 AnnotationDbi_1.12.0 >>>> [12] Biobase_2.10.0 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0 >>>> splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 xtable_1.5-6 >>>> >>>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> . >>> >> >> > -- PD Dr. med. Peter N. Robinson, MSc. Institut f?r Medizinische Genetik Charit? - Universit?tsmedizin Berlin Augustenburger Platz 1 13353 Berlin Germany voice: 49-30-450566042 fax: 49-30-450569915 email: peter.robinson at charite.de http://compbio.charite.de/ http://www.human-phenotype-ontology.org
ADD REPLY
0
Entering edit mode
Yes, Peter, that's true. The 1st version GeneAnswers has such question, but the current one has solved it. In geneAnswerBuilder, there is new parameter, known(default is TRUE), which means only use annotated genes as the gene pool. There is another parameter,totalGeneNumber, for customized annotation library. Thanks Gilbert On 1/27/11 10:00 AM, Peter Robinson wrote: > On 01/27/2011 04:45 PM, Ted Morrow wrote: >> Thanks for your replies Gilbert and Peter.....I forgot to say in my >> original question that I did look into using both the GeneAnswers >> package and the FunDO web-based system but the issue I have not been >> able to solve with them is that I cannot specify "my gene Universe" >> within which my selected genes are tested for enrichment. I want to see >> which diseases are enriched out of the possible diseases in "my gene >> universe" which is a subset of all possible human genes. >> >> Is that possible in GeneAnswers (or even FunDO)? >> >> Cheers >> Ted > > > > Just as a word of caution with this analysis, I think you will get > biased answers if you use all possible human genes as your > universe/population. I would consider limiting the population set to all > genes known to be associated with at least one disease (or something > similar appropriate to your experiment). In GO analysis, if you use all > genes as your population (rather than all genes annotated to at least > one GO term), this tends to overestimate statistical significance. > -peter > > > > > > > >> >> On 1/27/2011 4:39 PM, Peter Robinson wrote: >>> On 01/27/2011 04:17 PM, Gilbert Feng wrote: >>>> Hi, Ted >>>> >>>> DO.db is a standard sqlite file and you can use standard RSQLite >>>> procedure to retrieve the information. Actually we do have a lite >>>> version of Disease Ontology, DOLite that removes some redundant nodes, >>>> integrated in GeneAnswers package, which also uses hypergeometric test >>>> to run enrichment analysis as well as automatically generated >>>> interactive(cytoscape web support) html summary for one or more groups >>>> of genes. >>>> >>>> Best >>>> >>>> Gilbert >>> >>> >>> >>> >>> You might also want to take a look at this website: >>> http://django.nubic.northwestern.edu/fundo/faq >>> which implements enrichment analysis using genes annotated to DO terms. >>> -peter >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>>> >>>> On 1/27/11 5:00 AM, Ted Morrow wrote: >>>>> Dear all, >>>>> >>>>> I would like to conduct a hypergeometric test on a list of genes but >>>>> using Disease Ontology instead of GO or KEGG terms. The package >>>>> "DO.db" >>>>> contains this information but I have been unable to find a way of >>>>> using >>>>> this database in conjunction with the "GOstats" package that I have >>>>> been >>>>> using. >>>>> >>>>> Has anyone attempted to run a hypergeometric test for Disease Ontology >>>>> terms? Is there another package I could use? Or is there a way of >>>>> modifying the argument (if that's the right word) GOHyperGParams in >>>>> GOstats so that it can make use of the information in DO.db? >>>>> >>>>> Both the GO and KEGG analyses work fine: >>>>> ### GO chunk >>>>> params<- new("GOHyperGParams", >>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>> annotation = "hgu95av2.db", ontology="BP", >>>>> pvalueCutoff=0.01, conditional=FALSE, >>>>> testDirection="over") >>>>> hgOver<- hyperGTest(params) >>>>> >>>>> hgOver >>>>> Gene to GO BP test for over-representation >>>>> 2136 GO BP ids tested (15 have p< 0.01) >>>>> Selected gene set size: 112 >>>>> Gene universe size: 951 >>>>> Annotation package: hgu95av2 >>>>> >>>>> ### KEGG chunk >>>>> paramsKEGG<- new("KEGGHyperGParams", >>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>> annotation = "hgu95av2.db", >>>>> pvalueCutoff=0.01, >>>>> testDirection="over") >>>>> >>>>> hgOverKEGG<- hyperGTest(paramsKEGG) >>>>> hgOverKEGG >>>>> summary(hgOverKEGG) >>>>> >>>>> My data looks like this: >>>>> str(selectedEntrezIds) >>>>> chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414" >>>>> "7546" >>>>> "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" "113235" >>>>> "28962" "10269" "4069" "30835" "7018" ... >>>>> >>>>>> str(entrezUniverse) >>>>> chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165" "6521" >>>>> "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842" "5048" >>>>> "4843" "6329" "5080" "6401" "6853" ... >>>>> >>>>> My naive attempts to use DO have included: >>>>> paramsDO<- new("DOHyperGParams", >>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse, >>>>> annotation = "DO.db", >>>>> pvalueCutoff=0.01, >>>>> testDirection="over") >>>>> >>>>> Which of course doesn't work and gives the following error: >>>>> Error in getClass(Class, where = topenv(parent.frame())) : >>>>> "DOHyperGParams" is not a defined class >>>>> >>>>>> traceback() >>>>> 3: stop(gettextf("\"%s\" is not a defined class", Class), domain = NA) >>>>> 2: getClass(Class, where = topenv(parent.frame())) >>>>> 1: new("DOHyperGParams", geneIds = selectedEntrezIds, >>>>> universeGeneIds = >>>>> entrezUniverse, >>>>> annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over") >>>>> >>>>> >>>>> Replacing "GOHyperGParams" with "DOHyperGParams" also gives the >>>>> following error:.. >>>>> >>>>> hgOverDO<- hyperGTest(paramsDO) >>>>> Error in match.arg(ontology, c("BP", "CC", "MF")) : >>>>> 'arg' should be one of ?BP?, ?CC?, ?MF? >>>>> >>>>> traceback() >>>>> 10: stop(gettextf("'arg' should be one of %s", paste(dQuote(choices), >>>>> collapse = ", ")), domain = NA) >>>>> 9: match.arg(ontology, c("BP", "CC", "MF")) >>>>> 8: getUniverseViaGo(p) >>>>> 7: universeBuilder(p) >>>>> 6: universeBuilder(p) >>>>> 5: .hyperGTestInternal(p) >>>>> 4: is(object, Cl) >>>>> 3: is(object, Cl) >>>>> 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", >>>>> "hyperGTest") >>>>> 1: hyperGTest(paramsDO) >>>>> >>>>> >>>>> Any help would be greatly appreciated. >>>>> /Ted >>>>> >>>>>> sessionInfo() >>>>> R version 2.12.1 (2010-12-16) >>>>> Platform: i386-pc-mingw32/i386 (32-bit) >>>>> >>>>> locale: >>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >>>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C >>>>> LC_TIME=English_United States.1252 >>>>> >>>>> attached base packages: >>>>> [1] stats graphics grDevices utils datasets methods base >>>>> >>>>> other attached packages: >>>>> [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5 >>>>> org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5 graph_1.28.0 >>>>> Category_2.16.0 AnnotationDbi_1.12.0 >>>>> [12] Biobase_2.10.0 >>>>> >>>>> loaded via a namespace (and not attached): >>>>> [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0 >>>>> splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 xtable_1.5-6 >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> . >>>> >>> >>> >> > > -- ----------------------------------------------- Gang (Gilbert) Feng, PhD Biomedical Informatics Center Robert H. Lurie Comprehensive Cancer Center Northwestern University 750 N. Lake Shore Drive, 11th Floor(11-175e) Chicago, IL 60611 Phone:312-503-2358 Email g-feng (at) northwestern.edu
ADD REPLY

Login before adding your answer.

Traffic: 831 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6