KEGG overrepresentation loses genes
Anne Kupczok ▴ 10
@anne-kupczok-4022
Last seen 9.6 years ago
Hello,

I observed the following problem when using the KEGG annotation with hyperGTest: somehow hyperGTest does not consider all genes. In the example below, all three genes are in the category "05020" (this is what mget(genes,envir=org.Hs.egPATH) says). In the summary of hyperGTest, however, the category contains only two genes. Is there an explanation for this behavior?

Thanks in advance!
Anne

> library("Category")
Loading required package: AnnotationDbi
Loading required package: Biobase

Welcome to Bioconductor

  Vignettes contain introductory material. To view, type
  'openVignette()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation(pkgname)'.

> library("org.Hs.eg.db")
Loading required package: DBI
> genes=c("1958","3553","3303")
>
> GoHyp=new("KEGGHyperGParams",geneIds=genes,annotation="org.Hs.eg",pvalueCutoff=1,testDirection="over")
> htest=hyperGTest(GoHyp)
> s=summary(htest)
> s[1,]
  KEGGID       Pvalue OddsRatio    ExpCount Count Size           Term
1  05020 3.810228e-06       Inf 0.003960844     2   35 Prion diseases
>
> p=mget(genes,envir=org.Hs.egPATH,ifnotfound=NA)
> p
$`1958`
[1] "05020"

$`3553`
[1] "04010" "04060" "04210" "04620" "04640" "04940" "05010" "05020" "05332"

$`3303`
[1] "04010" "04144" "04612" "05020"

> geneIdsByCategory(htest,"05020")
$`05020`
[1] "1958" "3553"

> sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] KEGG.db_2.3.5       org.Hs.eg.db_2.3.6  RSQLite_0.7-3
[4] DBI_0.2-4           Category_2.12.0     AnnotationDbi_1.8.1
[7] Biobase_2.6.0

loaded via a namespace (and not attached):
 [1] annotate_1.24.0   genefilter_1.28.0 graph_1.24.1      GSEABase_1.8.0
 [5] RBGL_1.22.0       splines_2.10.0    survival_2.35-7   tools_2.10.0
 [9] XML_2.6-0         xtable_1.5-6
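The mismatch can be reproduced without loading Category at all. The following is a minimal base-R sketch that uses the mapping printed by mget() above as literal data and compares it with the genes returned by geneIdsByCategory(). The helper genes_in_category() is hypothetical and written only for this illustration; it is not a function from Category or any other Bioconductor package.

```r
# Gene-to-pathway mapping, copied from the mget() output in the question.
paths <- list(
  "1958" = "05020",
  "3553" = c("04010", "04060", "04210", "04620", "04640",
             "04940", "05010", "05020", "05332"),
  "3303" = c("04010", "04144", "04612", "05020")
)

# Genes that hyperGTest counted for the category (geneIdsByCategory output).
reported <- c("1958", "3553")

# Hypothetical helper: names of the genes whose annotation contains kegg_id.
# An NA entry (from ifnotfound=NA) simply never matches.
genes_in_category <- function(mapping, kegg_id) {
  hit <- vapply(mapping, function(ids) kegg_id %in% ids, logical(1))
  names(mapping)[hit]
}

expected <- genes_in_category(paths, "05020")
missing  <- setdiff(expected, reported)

print(expected)  # genes annotated to 05020 according to org.Hs.egPATH
print(missing)   # genes annotated to 05020 but not counted by the test
```

The same comparison could be run against a live mget() result instead of the literal list, which makes it a quick check for whether an installed version of Category still undercounts a category.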
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 7.7 years ago
United States
Hi Anne,

Unfortunately, this is a small bug you have uncovered that can affect people using Category to do KEGG analysis. I have just fixed it, and a patched version should be available within a day or so via biocLite(). Good job on finding that, and thanks for sharing.

Marc

On 04/14/2010 08:04 AM, Anne Kupczok wrote:
> Hello,
> I observed the following problem when using the KEGG annotation with
> hyperGTest: Somehow hyperGTest does not consider all genes. In the
> example below, all three genes are in the category "05020" (this is
> what mget(genes,envir=org.Hs.egPATH) says). In the summary of
> hyperGTest, however, the category contains only two genes.
> Is there an explanation of this behavior?
> Thanks in advance!
> Anne
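The dropped gene appears to affect the statistic itself, not only the list of category members. The ExpCount and Pvalue in the summary from the question are consistent with a hypergeometric test on two selected genes rather than three. Below is a rough check using base R's phyper(). The universe size of 17673 is an assumption: it is not stated anywhere in the thread and is inferred here from ExpCount = 2 * 35 / N.

```r
category_size <- 35      # Size column in the summary
universe_size <- 17673   # ASSUMPTION: inferred from ExpCount, not given in the thread
n_selected    <- 2       # genes actually counted, per geneIdsByCategory()

# Expected number of selected genes in the category under the null.
exp_count <- n_selected * category_size / universe_size

# Over-representation p-value: P(X >= n_selected), i.e. P(X > n_selected - 1),
# where X is the number of selected genes that fall in the category.
p_value <- phyper(n_selected - 1,
                  category_size,
                  universe_size - category_size,
                  n_selected,
                  lower.tail = FALSE)

print(exp_count)
print(p_value)
```

If both values agree with the ExpCount and Pvalue columns reported above, that suggests the third gene was dropped before the test was computed, so results produced with the unpatched version may be worth rerunning once the fix is installed.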