problem with GO terms
Ina Hoeschele ▴ 620
@ina-hoeschele-2992
Last seen 3.3 years ago
United States
Hi,

I have done a simple analysis associating GO terms with a gene list using GOstats. When I then try to retrieve all genes belonging to a significant GO category, I get zero genes! I use this code:

    library(biomaRt)
    mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
    temp <- getBM(attributes="entrezgene", filters="go", values=GOID[g], mart=mart)

length(temp$entrezgene) is zero! For g=1, GOID[g] = "GO:0050864", so as long as this is a valid GO ID (as returned from GOstats), length(temp$entrezgene) should not be zero.

This happens for several of my top 105 GO (BP, CC, MF) categories.

Thanks for any hint ...

Ina
GO GOstats Category • 1.9k views
@james-w-macdonald-5106
Last seen 1 hour ago
United States
Hi Ina,

On 11/22/2011 12:19 PM, Ina Hoeschele wrote:
> I use this code:
> library(biomaRt)
> mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
> temp <- getBM(attributes="entrezgene", filters="go", values=GOID[g], mart=mart)

You don't give sessionInfo(), so I have no idea why this is happening (remember to always supply this in the future!). However, you don't need to use biomaRt for this.

    > library(GO.db)
    Loading required package: AnnotationDbi
    Loading required package: Biobase

    Welcome to Bioconductor

      Vignettes contain introductory material. To view, type
      'browseVignettes()'. To cite Bioconductor, see
      'citation("Biobase")' and for packages 'citation("pkgname")'.

    Loading required package: DBI
    > library(org.Hs.eg.db)
    > get("GO:0050864", org.Hs.egGO2EG)
    Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
      value for "GO:0050864" not found

So for the current version of the GO.db, this GO term no longer exists, which is probably the problem you are having with biomaRt as well. However, if you got this GO term from GOstats, then it will exist in your version of these packages.

As an example of what you should expect:

    > get("GO:0007597", org.Hs.egGO2EG)
       TAS    IDA    TAS    TAS    TAS    TAS    TAS    TAS    TAS     IC    TAS
       "2"  "350"  "708"  "710" "2147" "2157" "2158" "2159" "2160" "2161" "2161"
       TAS    TAS    TAS    TAS    TAS    TAS    TAS    TAS
    "2811" "2812" "2814" "2815" "3818" "3827" "5547" "7450"

    > sessionInfo()
    R version 2.14.0 beta (2011-10-17 r57293)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] org.Hs.eg.db_2.6.4   GO.db_2.6.1          RSQLite_0.10.0
    [4] DBI_0.2-5            AnnotationDbi_1.16.4 Biobase_2.13.12

    loaded via a namespace (and not attached):
    [1] IRanges_1.11.32

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
sorry ...

    R version 2.14.0 (2011-10-31)
    Platform: i386-pc-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=English_United States.1252
    [2] LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
     [1] biomaRt_2.10.0            GOstats_2.20.0
     [3] graph_1.32.0              Category_2.20.0
     [5] PFAM.db_2.6.1             KEGG.db_2.6.1
     [7] GO.db_2.6.1               annotate_1.32.0
     [9] illuminaHumanv4.db_1.12.1 org.Hs.eg.db_2.6.4
    [11] RSQLite_0.10.0            DBI_0.2-5
    [13] AnnotationDbi_1.16.4      Biobase_2.14.0
    [15] BiocInstaller_1.2.1

    loaded via a namespace (and not attached):
    [1] genefilter_1.36.0 GSEABase_1.16.0 IRanges_1.12.2   RBGL_1.30.1
    [5] RCurl_1.7-0.1     splines_2.14.0  survival_2.36-10 tools_2.14.0
    [9] XML_3.4-2.2       xtable_1.6-0

----- Original Message -----
From: "James W. MacDonald" <jmacdon@med.umich.edu>
To: "Ina Hoeschele" <inah at vbi.vt.edu>
Cc: "Bioconductor mailing list" <bioconductor at r-project.org>
Sent: Tuesday, November 22, 2011 12:39:18 PM
Subject: Re: [BioC] problem with GO terms
thank you, Jim ...

I did what you show below and I get the same result:

    > get("GO:0050864", org.Hs.egGO2EG)
    Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
      value for "GO:0050864" not found

but why is GOstats giving me this GO term?

Thanks again, Ina

    > sessionInfo()
    R version 2.14.0 (2011-10-31)
    Platform: i386-pc-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=English_United States.1252
    [2] LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
     [1] biomaRt_2.10.0            GOstats_2.20.0
     [3] graph_1.32.0              Category_2.20.0
     [5] PFAM.db_2.6.1             KEGG.db_2.6.1
     [7] GO.db_2.6.1               annotate_1.32.0
     [9] illuminaHumanv4.db_1.12.1 org.Hs.eg.db_2.6.4
    [11] RSQLite_0.10.0            DBI_0.2-5
    [13] AnnotationDbi_1.16.4      Biobase_2.14.0
    [15] BiocInstaller_1.2.1

    loaded via a namespace (and not attached):
    [1] genefilter_1.36.0 GSEABase_1.16.0 IRanges_1.12.2   RBGL_1.30.1
    [5] RCurl_1.7-0.1     splines_2.14.0  survival_2.36-10 tools_2.14.0
    [9] XML_3.4-2.2       xtable_1.6-0
Hi Ina,

On 11/22/2011 1:03 PM, Ina Hoeschele wrote:
> thank you, Jim ...
> I did what you show below and I get the same result:
>
> > get("GO:0050864", org.Hs.egGO2EG)
> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
>   value for "GO:0050864" not found
>
> but why is GOstats giving me this GO term?

Did you use GOstats with this current version of BioC, or are you using data you processed sometime in the past?

As far as I can tell, it is impossible for you to be getting that GO term if you are using the current version of these packages. I am assuming that your data are from the Illumina Human V4 chip.

    > get("GO:0050864", illuminaHumanv4GO2PROBE)
    Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
      value for "GO:0050864" not found

This is with the version of the illuminaHumanv4.db package that you are using. Since this isn't even in that package, it is not possible for GOstats to be reporting it as being significant.

Best,
Jim
thanks again, Jim ... to the best of my knowledge I am not using anything from the past. Below is part of my code (omitting everything that should not be relevant). Thanks and sorry, there must be an obvious reason ...

    library(illuminaHumanv4.db)
    library(annotate)
    library("GO.db")
    library("KEGG.db")
    library("PFAM.db")
    library("GOstats")

    DATA <- read.csv(file=filename2,header=TRUE,sep=",")
    topProbeIDs <- DATA$PROBE_ID[1:top]
    topProbeIDs <- as.character(topProbeIDs)
    universeProbeIDs <- DATA$PROBE_ID
    universeProbeIDs <- as.character(universeProbeIDs)
    topPvals <- DATA$AGE_pval_expr[1:top]
    universePvals <- DATA$AGE_pval_expr

    entrezIDs <- getEG(topProbeIDs, "illuminaHumanv4")
    topProbeIDs1 <- topProbeIDs[is.na(entrezIDs)==FALSE]
    entrezIDs1 <- entrezIDs[is.na(entrezIDs)==FALSE]
    topPvals1 <- topPvals[is.na(entrezIDs)==FALSE]

    universeEntrezIDs <- getEG(universeProbeIDs, "illuminaHumanv4")
    universeProbeIDs1 <- universeProbeIDs[is.na(universeEntrezIDs)==FALSE]
    universeEntrezIDs1 <- universeEntrezIDs[is.na(universeEntrezIDs)==FALSE]
    universePvals1 <- universePvals[is.na(universeEntrezIDs)==FALSE]

    GOannot1 <- getGO(topProbeIDs1, "illuminaHumanv4")
    topProbeIDs2 <- topProbeIDs1[is.na(GOannot1)==FALSE]
    entrezIDs2 <- entrezIDs1[is.na(GOannot1)==FALSE]
    topPvals2 <- topPvals1[is.na(GOannot1)==FALSE]
    GOannot2 <- GOannot1[is.na(GOannot1)==FALSE]

    universeGOannot1 <- getGO(universeProbeIDs1, "illuminaHumanv4")
    universeProbeIDs2 <- universeProbeIDs1[is.na(universeGOannot1)==FALSE]
    universeEntrezIDs2 <- universeEntrezIDs1[is.na(universeGOannot1)==FALSE]
    universePvals2 <- universePvals1[is.na(universeGOannot1)==FALSE]
    universeGOannot2 <- universeGOannot1[is.na(universeGOannot1)==FALSE]
    ...
    params_BP_cond_over <- new("GOHyperGParams", geneIds=entrezIDs_final,
                               universeGeneIds=universeEntrezIDs_final,
                               annotation="illuminaHumanv4", ontology="BP",
                               pvalueCutoff=HGcutoffGO, conditional=TRUE,
                               testDirection="over")
    BP_cond_over <- hyperGTest(params_BP_cond_over)
    Pval_BP_cond_over <- summary(BP_cond_over)$Pvalue[summary(BP_cond_over)$Size > minCatSize]
    GOterm_BP_cond_over <- summary(BP_cond_over)$Term[summary(BP_cond_over)$Size > minCatSize]
    GOID_BP_cond_over <- summary(BP_cond_over)$GOBPID[summary(BP_cond_over)$Size > minCatSize]
    ...

----- Original Message -----
From: "James W. MacDonald" <jmacdon@med.umich.edu>
To: "Ina Hoeschele" <inah at vbi.vt.edu>
Cc: "Bioconductor mailing list" <bioconductor at r-project.org>
Sent: Tuesday, November 22, 2011 1:52:56 PM
Subject: Re: [BioC] problem with GO terms
Hi all,

I am sorry but I still have not been able to solve my problem. I did a GO analysis using GOstats on another dataset, this time a canine dataset. The top BP category that I get from GOstats again does not exist any more! Please see below. I reinstalled everything, including GOstats, and have the current versions. How is it possible for GOstats to give me these old categories ...

    > get("GO:0035637",canine2GO2PROBE)
    Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
      value for "GO:0035637" not found

    > sessionInfo()
    R version 2.14.0 (2011-10-31)
    Platform: i386-pc-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=English_United States.1252
    [2] LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] grid stats graphics grDevices utils datasets methods
    [8] base

    other attached packages:
     [1] org.Hs.eg.db_2.6.4   GOstats_2.20.0       graph_1.32.0
     [4] Category_2.20.0      KEGG.db_2.6.1        GO.db_2.6.1
     [7] biomaRt_2.10.0       canine2cdf_2.9.1     canine2.db_2.6.3
    [10] org.Cf.eg.db_2.6.4   RSQLite_0.10.0       DBI_0.2-5
    [13] annotate_1.32.0      AnnotationDbi_1.16.5 limma_3.10.0
    [16] made4_1.28.0         scatterplot3d_0.3-33 gplots_2.10.1
    [19] KernSmooth_2.23-7    caTools_1.12         bitops_1.0-4.1
    [22] gdata_2.8.2          gtools_2.6.2         RColorBrewer_1.0-5
    [25] ade4_1.4-17          affy_1.32.0          Biobase_2.14.0
    [28] BiocInstaller_1.2.1

    loaded via a namespace (and not attached):
     [1] affyio_1.22.0         genefilter_1.36.0 GSEABase_1.16.0
     [4] IRanges_1.12.3        preprocessCore_1.16.0 RBGL_1.30.1
     [7] RCurl_1.7-0.1         splines_2.14.0    survival_2.36-10
    [10] tools_2.14.0          XML_3.4-3         xtable_1.6-0
    [13] zlibbioc_1.0.0
Hi Ina,

It would be helpful if you would give us a _minimal_ and functional example of what you did. We will also need the set of Entrez IDs you used in order to see if we can duplicate the problem. You can output your Entrez IDs using the dump() function (e.g., dump("entrezIDs", "")).

Best,
Jim

On 11/28/2011 5:08 PM, Ina Hoeschele wrote:
> Hi all,
> I am sorry but I still have not been able to solve my problem. I did a GO analysis using GOstats on another dataset, this time a canine dataset. The top BP category that I get from GOstats again does not exist any more!
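A minimal sketch of what the dump() suggestion above might look like in practice. The three Entrez IDs are a hypothetical stand-in (the real list was never posted in this thread), and the round trip through a temporary file is only there to show that the dumped text can be sourced back unchanged.

```r
# Hypothetical stand-in for the real gene list (not from the original analysis)
entrezIDs <- c("2", "350", "708")

# file = "" writes the R code that recreates the object to the console,
# which can then be pasted into an email
dump("entrezIDs", file = "")

# Round trip: dump to a temporary file and source it into a fresh environment
tmp <- tempfile(fileext = ".R")
dump("entrezIDs", file = tmp)
restored <- new.env()
source(tmp, local = restored)

# TRUE if the recreated vector matches the original
identical(restored$entrezIDs, entrezIDs)
```

Because the dumped text is plain R code, whoever receives it can recreate the exact vector without needing the original CSV file.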
Hi Ina,

The trouble you are having is not that this GO term is old. Neither of the two GO terms you mentioned has been deprecated from Bioconductor as of the most recent release. How do I know this?

    library(GO.db)
    ## this just gets all the terms from the current ontology
    bar = keys(GOTERM)
    head(bar)
    ## the following shows that both terms are still in the ontology
    "GO:0035637" %in% bar
    "GO:0050864" %in% bar

What I think is actually causing you grief is that you are trying to look up the term using the wrong mappings. You have this term and you want to see what genes it maps to, so you are looking at (in the 1st example) the "org.Hs.egGO2EG" mapping, when you really want to be looking in the "org.Hs.egGO2ALLEGS" mapping.

So what is the difference? The 1st mapping is for direct GO to gene mappings. This is what we have direct evidence for in the database. So why wouldn't you want to use that in this instance? Because GO is a gene "ontology": certain terms can be inferred to have a relationship to genes based purely on the fact that their child terms have been directly linked. Such "indirect" parent terms will NOT show up in the GO2EG style mappings, but they will show up in the GO2ALLEGS style of mapping.

So this will NOT work:

    get("GO:0050864", org.Hs.egGO2EG)

But this will work:

    get("GO:0050864", org.Hs.egGO2ALLEGS)

I believe the same issue is happening with your canine example. So in that case you really want to be using this:

    get("GO:0035637", org.Cf.egGO2ALLEGS)

Hope this helps, please let us know if there are any other issues.

Marc

On 11/28/2011 02:32 PM, James W. MacDonald wrote:
> Hi Ina,
>
> It would be helpful if you would give us a _minimal_ and functional
> example of what you did.
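The direct-versus-inherited distinction described above can be sketched with a toy ontology in base R. Everything here is an assumption made for illustration: the term IDs ("T:parent", "T:child1", "T:child2"), the gene IDs, and the helper ancestors() are hypothetical and are not how the annotation packages are implemented. The list named direct plays the role of a GO2EG style map and all_genes the role of a GO2ALLEGS style map.

```r
# Toy ontology: each term lists its parent terms (hypothetical IDs, not real GO)
parents <- list(
  "T:child1" = "T:parent",
  "T:child2" = "T:parent",
  "T:parent" = character(0)
)

# Direct annotations only, analogous to a GO2EG style map.
# Note that "T:parent" has no entry of its own.
direct <- list(
  "T:child1" = c("101", "102"),
  "T:child2" = c("102", "103")
)

# Collect every ancestor of a term by walking up the parent edges
ancestors <- function(term) {
  out <- character(0)
  todo <- parents[[term]]
  while (length(todo) > 0) {
    out <- union(out, todo)
    todo <- setdiff(unlist(parents[todo], use.names = FALSE), out)
  }
  out
}

# Propagate each direct annotation to the term itself and all its ancestors,
# analogous to a GO2ALLEGS style map
all_genes <- list()
for (term in names(direct)) {
  for (t in c(term, ancestors(term))) {
    all_genes[[t]] <- union(all_genes[[t]], direct[[term]])
  }
}

is.null(direct[["T:parent"]])   # the parent is absent from the direct map
all_genes[["T:parent"]]         # but inherits the genes of both children
```

An enrichment test that counts inherited annotations can therefore report a parent term as significant even though a lookup in the direct map finds nothing for it, which matches the symptom in this thread.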
that worked - thousand thanks! ----- Original Message ----- From: "Marc Carlson" <mcarlson@fhcrc.org> To: bioconductor at r-project.org Sent: Tuesday, November 29, 2011 12:50:01 PM Subject: Re: [BioC] problem with GO terms Hi Ina, The trouble you are having is not that this GO term is old. Neither of the two GO terms you mentioned has been deprecated from bioconductor as of the most recent release. How do I know this? library(GO.db) ## this just gets all the terms from the current ontology bar = keys(GOTERM) head(bar) ## the following shows that both terms are still in the ontology "GO:0035637" %in% bar "GO:0050864" %in% bar What I think is actually causing you grief is that you are trying to look up the term using the wrong mappings. You have this term and you want to see what genes it maps to, so you are looking at the (in the 1st example) "org.Hs.egGO2EG" mapping and you really want to be looking in the "org.Hs.egGO2ALLEGS" mapping. So what is the difference? Well the 1st mapping is for direct GO to gene mappings. This is what we have direct evidence for in the database. So why wouldn't you want to use that in this instance? Because GO is a gene "ontology", therefore certain terms can be inferred to have a relationship to genes based purely on the fact that their child-terms have been directly linked. And such "indirect" parent terms will NOT show up in those GO2EG style mappings. But they will show up in the GO2ALLEGS style of mapping. So this will NOT work: get("GO:0050864", org.Hs.egGO2EG) But this will work: get("GO:0007597", org.Hs.egGO2ALLEGS) I believe the same issue is happening with your canine example. So in that case you really want to be using this: get("GO:0050864", org.Cf.egGO2ALLEGS) Hope this helps, please let us know if there are any other issues. Marc On 11/28/2011 02:32 PM, James W. MacDonald wrote: > Hi Ina, > > It would be helpful if you would give us a _minimal_ and functional > example of what you did. 
> We will also need the set of entrez IDs you
> used in order to see if we can duplicate. You can output your entrez
> IDs using the dump() function (e.g., dump("entrezIDs", "")).
>
> Best,
>
> Jim
>
>
>
> On 11/28/2011 5:08 PM, Ina Hoeschele wrote:
>> Hi all,
>> I am sorry but I still have not been able to solve my problem. I
>> did a GO analysis using GOstats on another dataset, this time a
>> canine dataset. The top BP category that I get from GOstats again
>> does not exist any more! Please see below. I reinstalled everything,
>> including GOstats, and have the current versions. How is it possible
>> for GOstats to give me these old categories ...
>>
>> > get("GO:0035637", canine2GO2PROBE)
>> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
>> value for "GO:0035637" not found
>>
>> > sessionInfo()
>> R version 2.14.0 (2011-10-31)
>> Platform: i386-pc-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] grid stats graphics grDevices utils datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] org.Hs.eg.db_2.6.4 GOstats_2.20.0 graph_1.32.0
>> [4] Category_2.20.0 KEGG.db_2.6.1 GO.db_2.6.1
>> [7] biomaRt_2.10.0 canine2cdf_2.9.1 canine2.db_2.6.3
>> [10] org.Cf.eg.db_2.6.4 RSQLite_0.10.0 DBI_0.2-5
>> [13] annotate_1.32.0 AnnotationDbi_1.16.5 limma_3.10.0
>> [16] made4_1.28.0 scatterplot3d_0.3-33 gplots_2.10.1
>> [19] KernSmooth_2.23-7 caTools_1.12 bitops_1.0-4.1
>> [22] gdata_2.8.2 gtools_2.6.2 RColorBrewer_1.0-5
>> [25] ade4_1.4-17 affy_1.32.0 Biobase_2.14.0
>> [28] BiocInstaller_1.2.1
>>
>> loaded via a namespace (and not attached):
>> [1] affyio_1.22.0 genefilter_1.36.0 GSEABase_1.16.0
>> [4] IRanges_1.12.3 preprocessCore_1.16.0 RBGL_1.30.1
>> [7] RCurl_1.7-0.1 splines_2.14.0 survival_2.36-10
>> [10] tools_2.14.0 XML_3.4-3 xtable_1.6-0
>> [13] zlibbioc_1.0.0
>>
>> ----- Original Message -----
>> From: "James W. MacDonald" <jmacdon at med.umich.edu>
>> To: "Ina Hoeschele" <inah at vbi.vt.edu>
>> Cc: "Bioconductor mailing list" <bioconductor at r-project.org>
>> Sent: Tuesday, November 22, 2011 1:52:56 PM
>> Subject: Re: [BioC] problem with GO terms
>>
>> Hi Ina,
>>
>>
>>
>> On 11/22/2011 1:03 PM, Ina Hoeschele wrote:
>>> thank you, Jim ...
>>> I did what you show below and I get the same result:
>>>
>>> > get("GO:0050864", org.Hs.egGO2EG)
>>> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
>>> value for "GO:0050864" not found
>>>
>>> but why is GOstats giving me this GO term?
>> Did you use GOstats with this current version of BioC, or are you using
>> data you processed sometime in the past?
>>
>> As far as I can tell, it is impossible for you to be getting that GO
>> term if you are using the current version of these packages. I am
>> assuming that your data are from the Illumina Human V4 chip.
>>
>> > get("GO:0050864", illuminaHumanv4GO2PROBE)
>> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
>> value for "GO:0050864" not found
>>
>> This is with the version of the illuminaHumanv4.db package that you are
>> using. Since this isn't even in that package, it is
>> not possible for
>> GOstats to be reporting it as being significant.
>>
>> Best,
>>
>> Jim
>>> Thanks again, Ina
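For anyone landing on this thread later, here is a minimal sketch of the direct-versus-inherited distinction Marc describes, assuming GO.db and org.Hs.eg.db are installed. It uses mget() with ifnotfound = NA so that a term with no direct annotations returns NA instead of raising the "value for ... not found" error. The helper n_genes() is a hypothetical name introduced only for this illustration, and the actual counts depend on the annotation version you have installed.

```r
library(AnnotationDbi)
library(GO.db)
library(org.Hs.eg.db)

go_id <- "GO:0050864"

# Is the term part of the installed ontology at all?
go_id %in% keys(GOTERM)

# Direct annotations only: a parent term may have no entry here.
direct <- mget(go_id, org.Hs.egGO2EG, ifnotfound = NA)[[1]]

# Direct annotations plus those inherited from child terms.
inherited <- mget(go_id, org.Hs.egGO2ALLEGS, ifnotfound = NA)[[1]]

# Hypothetical helper: number of distinct Entrez IDs, treating the
# single NA returned for a missing key as zero genes.
n_genes <- function(x) {
  if (length(x) == 1 && is.na(x)) 0L else length(unique(x))
}

n_genes(direct)
n_genes(inherited)
```

If the first count is zero and the second is not, the term is annotated only through its children, which is the situation described above. The same pattern should apply to the canine package by swapping in org.Cf.egGO2EG and org.Cf.egGO2ALLEGS.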