HGNC annotation for use in GOstats
Boel Brynedal ▴ 200
@boel-brynedal-2091
Last seen 9.6 years ago
Dear All,

I am attempting a fairly simple thing: performing a hypergeometric test for gene sets using GOstats. My gene set is in HGNC symbols, as is my 'gene universe' vector, but GOstats seems to require Entrez IDs. Could anyone point me to an HGNC annotation package that includes Entrez IDs? Or any other way to run GOstats using HGNC symbols?

Thank you,
Bo

> params <- new("GOHyperGParams", geneIds=geneset, universeGeneIds=universe, ontology="BP", pvalueCutoff=0.05, conditional=TRUE, testDirection="over")
> hgOver <- hyperGTest(params)
Error in eapply(ID2GO(datPkg), function(goids) { :
  error in evaluating the argument 'env' in selecting a method for function 'eapply': Error in function (classes, fdef, mtable) :
  unable to find an inherited method for function 'cols' for signature '"function"'
Annotation GOstats
@james-w-macdonald-5106
Last seen 18 hours ago
United States
Hi Boel,

On Tuesday, December 10, 2013 5:02:50 AM, Boel Brynedal wrote:
> Could anyone point me to an HGNC annotation package that includes Entrez IDs? Or any other way to run GOstats using HGNC symbols?

You can convert using the org.Hs.eg.db package.

genemap <- select(org.Hs.eg.db, geneset, "ENTREZID", "SYMBOL")
univmap <- select(org.Hs.eg.db, universe, "ENTREZID", "SYMBOL")

And you will probably get a warning like this:

Warning message:
In .generateExtraRows(tab, keys, jointype) :
  'select' resulted in 1:many mapping between keys and return rows

indicating that some of the HUGO symbols mapped to multiple Entrez Gene IDs, which you will then need to resolve in some fashion. Since this usually involves many genes, and I am A Hack (tm), I usually do something super naive like

geneset <- genemap[!duplicated(genemap[,1]), 2]
universe <- univmap[!duplicated(univmap[,1]), 2]

assuming (obviously) that the first instance of an HGNC -> EntrezID mapping is as good as any other. That also assumes that a given HGNC -> EntrezID mapping will be consistent between genemap and univmap, so you end up with the same Entrez ID for a given HUGO symbol in both. There are more sophisticated ways to do this, I am sure.

But note that although HGNC attempts to come up with unique gene symbols, there are lots of non-unique symbols in the wild, so there is always the possibility that you will get a symbol -> EntrezID mapping that is not only a multiple map, but that points to two (or more) completely different genes. As an example:

> select(org.Hs.eg.db, "HBD", c("ENTREZID","GENENAME"), "SYMBOL")
  SYMBOL  ENTREZID                      GENENAME
1    HBD      3045             hemoglobin, delta
2    HBD 100187828 hypophosphatemic bone disease

So you have the added wrinkle of not necessarily knowing which HBD you might be after.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
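The naive first-hit resolution described above can be tried out without any annotation package installed. The following is only a sketch in base R: the data frame is a toy stand-in for what select() returns, the two HBD rows reuse the Entrez IDs from the example above, and the GENE_A and GENE_B rows (including the NA for an unmapped symbol) are made up for illustration. It also lists the ambiguous symbols first, so they can be reviewed by hand before the first hit is kept.

```r
# Toy stand-in for select(org.Hs.eg.db, symbols, "ENTREZID", "SYMBOL").
# HBD rows come from the example above; GENE_A/GENE_B are hypothetical.
genemap <- data.frame(
  SYMBOL   = c("HBD", "HBD", "GENE_A", "GENE_B"),
  ENTREZID = c("3045", "100187828", "1001", NA),
  stringsAsFactors = FALSE
)

# Symbols that map to more than one Entrez Gene ID (worth a manual look).
id_counts <- table(genemap$SYMBOL[!is.na(genemap$ENTREZID)])
ambiguous <- names(id_counts)[id_counts > 1]

# Naive resolution: keep the first mapping per symbol,
# then drop symbols that did not map to any Entrez Gene ID.
first_hit <- genemap[!duplicated(genemap$SYMBOL), ]
geneset <- first_hit$ENTREZID[!is.na(first_hit$ENTREZID)]

print(ambiguous)
print(geneset)
```

Applying the same steps to the universe mapping keeps the two vectors consistent, since the first hit for a given symbol is chosen the same way in both, provided select() returns the rows for that symbol in the same order each time.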
Hi Jim,

Worked like a charm. Thanks!