goseq KEGG testing?
1
0
Entering edit mode
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
Last seen 8 days ago
United States
Hi, I'm learning how to use goseq for RNA-Seq data. The vignette says that in addition to GO BP, MF and CC testing, there is native support for KEGG category testing. When I look at the man page for goseq, it says: "test.cats A vector specifying which categories to test for over representation amongst DE genes. See details for vaild options. ... Valid options for the test.cats arguement are any combination of "GO:CC", "GO:BP", "GO:MF" & "KEGG". The three GO terms refer to the Cellular Component, Biological Process and Molecular Function respectively. "KEGG" refers to KEGG pathways. " However, when I try to do just test.cats = "KEGG" or test.cats=c("GO:MF","KEGG"), I get an error. I'm using the demo data from the vignette, all codes and sessionInfo below. I'm trying to include this in a workshop I'm teaching on Thursday, and hope to find a quick answer! Thanks, Jenny > library(goseq) > > library(edgeR) > table.summary <- read.table(system.file("extdata", "Li_sum.txt", package = "goseq"), + sep = "\t", header = TRUE, stringsAsFactors = FALSE) > counts <- table.summary[, -1] > rownames(counts) <- table.summary[, 1] > grp <- factor(rep(c("Control", "Treated"), times = c(4, 3))) > summarized <- DGEList(counts, lib.size = colSums(counts), group = grp) > disp <- estimateCommonDisp(summarized) > tested <- exactTest(disp) Comparison of groups: Treated - Control > topTags(tested) Comparison of groups: Treated-Control logConc logFC PValue FDR ENSG00000127954 -31.02991 37.972297 6.800102e-79 3.366458e-74 ENSG00000151503 -12.96052 5.404687 9.143635e-66 2.263324e-61 ENSG00000096060 -11.77715 4.899691 4.866396e-60 8.030527e-56 ENSG00000091879 -15.35999 5.771018 5.469701e-55 6.769575e-51 ENSG00000132437 -14.14850 -5.896416 3.487845e-52 3.453385e-48 ENSG00000166451 -12.61713 4.567569 4.410990e-52 3.626363e-48 ENSG00000131016 -14.80016 5.274233 5.127568e-52 3.626363e-48 ENSG00000163492 -17.28041 7.296034 1.231576e-44 7.621299e-41 ENSG00000113594 -12.24687 4.053165 6.002790e-44 3.301935e-40 ENSG00000116285 -13.01524 4.112220 4.005898e-43 1.983160e-39 > genes.Seq <- as.integer(p.adjust(tested$table$p.value[tested$table$logFC != 0], + method = "BH") < 0.05) > names(genes.Seq) = row.names(tested$table[tested$table$logFC != 0, ]) > length(genes.Seq) [1] 22743 > #This is the number of genes in the universe > table(genes.Seq) genes.Seq 0 1 19344 3399 > #This shows how many have 1s, and are "selected" > > pwf.Seq <- nullp(genes.Seq, "hg18", "ensGene") Loading hg18 length data... > > GO.MF.Seq = goseq(pwf.Seq, "hg18", "ensGene", test.cats = c("GO:MF")) Fetching GO annotations... Calculating the p-values... > dim(GO.MF.Seq) [1] 2688 3 > names(GO.MF.Seq) [1] "category" "upval" "dpval" > GO.MF.Seq[1:10,] category upval dpval 1034 GO:0005515 2.881933e-50 1.0000000 18 GO:0000166 2.493016e-15 1.0000000 1042 GO:0005524 5.321059e-13 1.0000000 1628 GO:0016787 1.557386e-09 1.0000000 2328 GO:0046872 8.558029e-09 1.0000000 1611 GO:0016740 2.732780e-07 1.0000000 122 GO:0003700 3.179382e-07 1.0000000 147 GO:0003735 3.793941e-06 0.9999985 1033 GO:0005509 4.306285e-06 0.9999978 155 GO:0003779 5.667785e-06 0.9999976 > > ?goseq starting httpd help server ... done > KEGG.Seq = goseq(pwf.Seq, "hg18", "ensGene", test.cats = c("KEGG")) Fetching GO annotations... Error in getgo(names(DEgenes), genome, id, fetch.cats = test.cats) : object 'go' not found > > KEGG.Seq = goseq(pwf.Seq, "hg18", "ensGene", test.cats = c("GO:MF","KEGG")) Fetching GO annotations... Error in getgo(names(DEgenes), genome, id, fetch.cats = test.cats) : object 'go' not found > > sessionInfo() R version 2.12.2 (2011-02-25) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] org.Hs.eg.db_2.4.6 edgeR_2.0.4 goseq_1.2.0 geneLenDataBase_0.99.5 [5] BiasedUrn_1.03 GOstats_2.16.0 graph_1.28.0 Category_2.16.1 [9] drosgenome1.db_2.4.5 org.Dm.eg.db_2.4.6 biomaRt_2.6.0 limma_3.6.9 [13] affycoretools_1.22.0 KEGG.db_2.4.5 GO.db_2.4.5 RSQLite_0.9-4 [17] DBI_0.2-5 AnnotationDbi_1.12.0 affy_1.28.0 Biobase_2.10.0 loaded via a namespace (and not attached): [1] affyio_1.18.0 affyPLM_1.26.1 annaffy_1.22.0 annotate_1.28.1 [5] Biostrings_2.18.4 BSgenome_1.18.3 gcrma_2.22.0 genefilter_1.32.0 [9] GenomicFeatures_1.2.3 GenomicRanges_1.2.3 grid_2.12.2 GSEABase_1.12.2 [13] IRanges_1.8.9 lattice_0.19-17 Matrix_0.999375-46 mgcv_1.7-3 [17] nlme_3.1-98 preprocessCore_1.12.0 RBGL_1.26.0 RCurl_1.5-0.1 [21] rtracklayer_1.10.6 splines_2.12.2 survival_2.36-5 tools_2.12.2 [25] XML_3.2-0.2 xtable_1.5-6
Pathways GO PROcess Category goseq Pathways GO PROcess Category goseq • 2.9k views
ADD COMMENT
0
Entering edit mode
@matthew-young-4510
Last seen 10.3 years ago
Hi Jenny, It seems there's a bug in the current release version of goseq in tho getgo function, which is what is causing your error. It should only occur when fetching KEGG categories and using a gene ID other than Entrez Gene ID. I've fixed this in the latest devel version (1.3.2, which can be downloaded here http://bioconductor.org/help/bioc- views/devel/bioc/html/goseq.html) and will update the release version to correct the error. In the meantime I suggest you install the current devel version (which should become the release version in bioconductor 2.8 in a couple of weeks). Thank you for bringing this issue to my attention. Cheers, Matt On 23/03/2011, at 7:06 AM, Jenny Drnevich wrote: > Hi, > > I'm learning how to use goseq for RNA-Seq data. The vignette says > that in addition to GO BP, MF and CC testing, there is native > support for KEGG category testing. When I look at the man page for > goseq, it says: > > "test.cats A vector specifying which categories to test for over > representation amongst DE genes. See details for vaild options. > > ... > > Valid options for the test.cats arguement are any combination of > "GO:CC", "GO:BP", "GO:MF" & "KEGG". The three GO terms refer to the > Cellular Component, Biological Process and Molecular Function > respectively. "KEGG" refers to KEGG pathways. " > > However, when I try to do just test.cats = "KEGG" or > test.cats=c("GO:MF","KEGG"), I get an error. I'm using the demo data > from the vignette, all codes and sessionInfo below. I'm trying to > include this in a workshop I'm teaching on Thursday, and hope to > find a quick answer! > > Thanks, > Jenny > > > library(goseq) > > > > library(edgeR) > > table.summary <- read.table(system.file("extdata", "Li_sum.txt", > package = "goseq"), > + sep = "\t", header = TRUE, stringsAsFactors = > FALSE) > > counts <- table.summary[, -1] > > rownames(counts) <- table.summary[, 1] > > grp <- factor(rep(c("Control", "Treated"), times = c(4, 3))) > > summarized <- DGEList(counts, lib.size = colSums(counts), group = > grp) > > disp <- estimateCommonDisp(summarized) > > tested <- exactTest(disp) > Comparison of groups: Treated - Control > > topTags(tested) > Comparison of groups: Treated-Control > logConc logFC PValue FDR > ENSG00000127954 -31.02991 37.972297 6.800102e-79 3.366458e-74 > ENSG00000151503 -12.96052 5.404687 9.143635e-66 2.263324e-61 > ENSG00000096060 -11.77715 4.899691 4.866396e-60 8.030527e-56 > ENSG00000091879 -15.35999 5.771018 5.469701e-55 6.769575e-51 > ENSG00000132437 -14.14850 -5.896416 3.487845e-52 3.453385e-48 > ENSG00000166451 -12.61713 4.567569 4.410990e-52 3.626363e-48 > ENSG00000131016 -14.80016 5.274233 5.127568e-52 3.626363e-48 > ENSG00000163492 -17.28041 7.296034 1.231576e-44 7.621299e-41 > ENSG00000113594 -12.24687 4.053165 6.002790e-44 3.301935e-40 > ENSG00000116285 -13.01524 4.112220 4.005898e-43 1.983160e-39 > > genes.Seq <- as.integer(p.adjust(tested$table$p.value[tested$table > $logFC != 0], > + method = "BH") < 0.05) > > names(genes.Seq) = row.names(tested$table[tested$table$logFC != > 0, ]) > > length(genes.Seq) > [1] 22743 > > #This is the number of genes in the universe > > table(genes.Seq) > genes.Seq > 0 1 > 19344 3399 > > #This shows how many have 1s, and are "selected" > > > > pwf.Seq <- nullp(genes.Seq, "hg18", "ensGene") > Loading hg18 length data... > > > > GO.MF.Seq = goseq(pwf.Seq, "hg18", "ensGene", test.cats = > c("GO:MF")) > Fetching GO annotations... > Calculating the p-values... > > dim(GO.MF.Seq) > [1] 2688 3 > > names(GO.MF.Seq) > [1] "category" "upval" "dpval" > > GO.MF.Seq[1:10,] > category upval dpval > 1034 GO:0005515 2.881933e-50 1.0000000 > 18 GO:0000166 2.493016e-15 1.0000000 > 1042 GO:0005524 5.321059e-13 1.0000000 > 1628 GO:0016787 1.557386e-09 1.0000000 > 2328 GO:0046872 8.558029e-09 1.0000000 > 1611 GO:0016740 2.732780e-07 1.0000000 > 122 GO:0003700 3.179382e-07 1.0000000 > 147 GO:0003735 3.793941e-06 0.9999985 > 1033 GO:0005509 4.306285e-06 0.9999978 > 155 GO:0003779 5.667785e-06 0.9999976 > > > > ?goseq > starting httpd help server ... done > > KEGG.Seq = goseq(pwf.Seq, "hg18", "ensGene", test.cats = c("KEGG")) > Fetching GO annotations... > Error in getgo(names(DEgenes), genome, id, fetch.cats = test.cats) : > object 'go' not found > > > > KEGG.Seq = goseq(pwf.Seq, "hg18", "ensGene", test.cats = > c("GO:MF","KEGG")) > Fetching GO annotations... > Error in getgo(names(DEgenes), genome, id, fetch.cats = test.cats) : > object 'go' not found > > > > sessionInfo() > R version 2.12.2 (2011-02-25) > Platform: i386-pc-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 > [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] org.Hs.eg.db_2.4.6 edgeR_2.0.4 goseq_1.2.0 > geneLenDataBase_0.99.5 > [5] BiasedUrn_1.03 GOstats_2.16.0 graph_1.28.0 > Category_2.16.1 > [9] drosgenome1.db_2.4.5 org.Dm.eg.db_2.4.6 biomaRt_2.6.0 > limma_3.6.9 > [13] affycoretools_1.22.0 KEGG.db_2.4.5 GO.db_2.4.5 > RSQLite_0.9-4 > [17] DBI_0.2-5 AnnotationDbi_1.12.0 affy_1.28.0 > Biobase_2.10.0 > > loaded via a namespace (and not attached): > [1] affyio_1.18.0 affyPLM_1.26.1 annaffy_1.22.0 > annotate_1.28.1 > [5] Biostrings_2.18.4 BSgenome_1.18.3 gcrma_2.22.0 > genefilter_1.32.0 > [9] GenomicFeatures_1.2.3 GenomicRanges_1.2.3 > grid_2.12.2 GSEABase_1.12.2 > [13] IRanges_1.8.9 lattice_0.19-17 Matrix_0.999375-46 > mgcv_1.7-3 > [17] nlme_3.1-98 preprocessCore_1.12.0 > RBGL_1.26.0 RCurl_1.5-0.1 > [21] rtracklayer_1.10.6 splines_2.12.2 survival_2.36-5 > tools_2.12.2 > [25] XML_3.2-0.2 xtable_1.5-6 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
ADD COMMENT

Login before adding your answer.

Traffic: 373 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6