Gene Ontology relationships
Seth Falcon ★ 7.4k
@seth-falcon-992
Last seen 7.7 years ago
Hi Dave,

davidl at unr.nevada.edu writes:
> I was just wondering if there is an easy way to start with a list of Gene
> Ontology terms and to end with an interpretable map of their
> inter-relationships.

Have a look at the help pages for these functions in GOstats (you will need the latest release version compatible with R-2.5.0):

    termGraphs
    inducedTermGraph
    plotGOTermGraph
    GOGraph

I think termGraphs would be a good place to start, and then you can see about visualizing each connected component. One of the challenges is that graphs with many nodes are difficult to visualize, because the labels get too small rather quickly.

I'm curious to know if these help and if you have suggestions for how they could be more useful.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
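The suggestion above can be sketched roughly as follows. This is a minimal, hedged sketch and not Seth's own code: the wrapper name plot_term_components is made up for illustration, `r` is assumed to be the object returned by hyperGTest(), and the argument names (pvalue, use.terms, add.counts, max.nchar) are taken from the calls shown later in this thread and from the GOstats help pages as I recall them, so check ?termGraphs and ?plotGOTermGraph for your installed version.

```r
# Sketch only. `plot_term_components` is a hypothetical helper name;
# `r` is assumed to be the result of GOstats::hyperGTest().
plot_term_components <- function(r, p = 0.05, max.nchar = 20) {
  for (pkg in c("GOstats", "graph", "Rgraphviz")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("package '", pkg, "' is required")
    }
  }
  # One graph object per connected component of the significant GO terms
  tg <- GOstats::termGraphs(r, pvalue = p, use.terms = TRUE)
  # Put the largest components first
  tg <- tg[order(sapply(tg, graph::numNodes), decreasing = TRUE)]
  # Draw each component in turn; on a single device each plot replaces
  # the previous one, so open a pdf() device first to keep them all
  for (g in tg) {
    GOstats::plotGOTermGraph(g, r, add.counts = TRUE, max.nchar = max.nchar)
  }
  invisible(tg)
}
```

Plotting one component at a time is what keeps the labels legible; a single plot of all components runs into the small-label problem described above.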
Cancer GOstats • 790 views
Last seen 7.7 years ago
Hello Seth,

Thank you very much for the response. I read the help pages for those functions and they sound like they are exactly what I was looking for. I ran into a problem actually using termGraphs, however. This may be something simple and stupid, but I am having trouble identifying what the problem is. This is the relevant part of my workflow and the resulting error message:

    > paramsFvBup <- new("GOHyperGParams", geneIds=llsupFvB,
    +     universeGeneIds=llsUniversFvB, annotation="mouse4302", ontology="BP",
    +     pvalueCutoff=0.05, conditional=TRUE, testDirection="over")
    > FvBupOverBP <- hyperGTest(paramsFvBup)
    > htmlReport(FvBupOverBP, file="test.html")
    # That all worked fine and the table looks good
    > termGraphs(FvBupOverBP, pvalue=0.05, use.terms=TRUE)
    Error in "names<-"(*tmp*, value = c(NA_character_, NA_character_, NA_character_, :
      'value' argument must specify unique names

    R version 2.5.0 (2007-04-23)
    GOstats version 2.2.3

I tried using different combinations of all the possible arguments, but I kept getting the same error. I apologize if this is something simple that I should have caught.

Again, thank you very much for all your help,
Dave
Last seen 7.7 years ago
Hello again,

I just realized that termGraphs works fine when I pass a list of unrelated GO ids in the id= argument: it makes separate graphs, each with one node and no edges. As soon as I feed it related terms, such as GO:0043412 (biopolymer modification) and GO:0043283 (biopolymer metabolic process), it gives me this error message:

    Error in "names<-"(*tmp*, value = NA_character_) :
      'value' argument cannot contain NAs

I'm not sure if that clarifies the problem at all. Is the function supposed to connect the related terms with edges, or am I trying to use it in a way it was not intended?

Sorry about the double email,
Dave
Seth Falcon ★ 7.4k
@seth-falcon-992
Last seen 7.7 years ago
Hi Dave,

Sorry for the delay in response...

davidl at unr.nevada.edu writes:
>> termGraphs(FvBupOverBP,pvalue=0.05, use.terms=TRUE)
> Error in "names<-"(*tmp*, value = c(NA_character_, NA_character_,
> NA_character_, :
>   'value' argument must specify unique names

I'm able to reproduce this. Thanks for the report. I will look into it and reply to this thread when I have a fix. Sorry about that :-P

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
Seth Falcon ★ 7.4k
@seth-falcon-992
Last seen 7.7 years ago
Hi,

I believe that I've fixed the problem. GOstats_2.2.5 is available via biocLite (not yet for OS X). In the meantime, I noticed a labeling problem with plotGOTermGraph, and GOstats_2.2.6 will be available in the next couple of days.

Please let me know if you encounter further problems. These functions for extracting subgraphs of the results and plotting are quite new and somewhat experimental, so I'm open to suggestions.

Best,

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
Hello Seth,

That was really fast. I'm going to try out this new version as soon as I get back to the computer with the data and let you know how it goes. You are awesome.

Thanks sir,
Dave
Hello,

I tried out these new functions (termGraphs and plotGOTermGraph) and they seem to do what they were intended to do just fine (except I couldn't seem to get the labels to change from the GO numbers to the descriptive labels). The functions output each set of connected GO terms as an individual graph, however, which makes it difficult to get an overall idea of which over-represented GO terms are related within the total set of over-represented GO terms. I borrowed a good portion of the code and functions used in termGraphs and plotGOTermGraph and modified them a little so that I could output all the connected components in one graph:

    connectedTerms <- function(r, max.nchar=NULL, title=NULL, id=NULL) {
        library(GOstats)
        library(Rgraphviz)
        pvalue <- pvalueCutoff(r)
        # Use the significant categories unless explicit GO ids are supplied
        if (is.null(id)) {
            goids1 <- sigCategories(r, pvalue)
        } else {
            goids1 <- id
        }
        subG <- subGraph(goids1, goDag(r))
        cc <- connectedComp(subG)
        # Keep only the components that contain more than one term
        connGOs <- c()
        for (i in seq_along(cc)) {
            if (length(cc[[i]]) > 1) {
                connGOs <- c(connGOs, cc[[i]])
            }
        }
        finalG <- subGraph(connGOs, goDag(r))
        # Attach the descriptive term name to each node
        nodeDataDefaults(finalG) <- list(term = as.character(NA))
        nodeData(finalG, attr = "term") <-
            as.character(sapply(mget(nodes(finalG), GOTERM), Term))
        termLab <- unlist(nodeData(finalG, attr = "term"))
        n <- nodes(finalG)
        resultTerms <- names(pvalues(r))
        # Label each node with "genes in list / genes in universe"
        counts <- sapply(n, function(x) {
            if (x %in% resultTerms) {
                paste(geneCounts(r)[x], "/", universeCounts(r)[x], sep = "")
            } else {
                "0/??"
            }
        })
        if (!is.null(max.nchar)) {
            termLab <- sapply(termLab, substr, 1L, max.nchar, USE.NAMES = FALSE)
        }
        nlab <- paste(termLab, counts)
        nattr <- makeNodeAttrs(finalG, label = nlab, fixedsize = FALSE,
                               fontsize = "15000", shape = "rectangle")
        attr <- list(node = list(), edge = list(), graph = list(rankdir = "LR"))
        plot(finalG, nodeAttrs = nattr, attrs = attr)
    }

I added that for loop to combine all the connected GO terms and came up with a list of node attributes which almost always gave me a readable graph. I had to increase the font size to 15000 in order to accomplish this when I had more than 200 nodes or more than 3 hierarchical ranks, since I couldn't seem to get a lot of the other Graphviz parameters to work (like size/ratio/overlap/etc.). I'm sure there is a better way to do this (1. because I heard you were supposed to avoid for loops and 2. because 15,000 seems a bit excessive for font size), but I'm pretty new to any sort of command prompt computer activity.

After getting these graphs, however, I noticed another problem. If two GO terms were separated by an intervening non-significant GO term, they weren't connected by connectedComp(). This means you can't really use these functions to find whether two significant GO terms are in the same GO branch, which to me seems like the main point of this sort of function. If there was a way to trace each significant GO term up to its top parent term (i.e. biological process) through its more immediate parents and to color code the significant terms, that would be the ideal way to visualize how your over-represented GO terms are functionally related. As Seth stated before, though, this probably can't be done with Rgraphviz, since these graphs aren't scalable/zoomable and the node labels become prohibitively small with more than 150 or so nodes.

Does anyone have any ideas as to how to obtain such a graph (significant terms traced through their parents and color coded), either in R or in a program in which Bioconductor output could easily be used?

I apologize for the lengthy post and welcome any ideas,
David
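The connectivity problem described above, and the "trace each term up to the root" idea, can be illustrated on a toy ontology in base R. This is only a sketch under stated assumptions: the term names, the `parents` list and the helper functions induced_edges and with_ancestors are all made up for illustration and are not part of GOstats. It shows that the subgraph induced by the significant terms alone has no edge between two terms separated by a non-significant one, while adding every ancestor joins them into one graph that can then be color coded.

```r
# Toy ontology: each term maps to its direct parents (hypothetical names).
parents <- list(
  root                    = character(0),
  metabolism              = "root",
  biopolymer_metabolism   = "metabolism",
  biopolymer_modification = "biopolymer_metabolism",
  protein_modification    = "biopolymer_modification",
  signaling               = "root"
)

# Child -> parent edges whose endpoints are both in `keep`.
induced_edges <- function(keep, parents) {
  edges <- do.call(rbind, lapply(keep, function(term) {
    p <- intersect(parents[[term]], keep)
    if (length(p) == 0) NULL else cbind(child = term, parent = p)
  }))
  if (is.null(edges)) {
    matrix(character(0), ncol = 2, dimnames = list(NULL, c("child", "parent")))
  } else {
    edges
  }
}

# The given terms plus all of their ancestors, walking up to the root.
with_ancestors <- function(terms, parents) {
  seen <- character(0)
  todo <- unique(terms)
  while (length(todo) > 0) {
    seen <- union(seen, todo)
    todo <- setdiff(unique(unlist(parents[todo])), seen)
  }
  seen
}

# Two "significant" terms separated by a non-significant one.
sig <- c("biopolymer_metabolism", "protein_modification")

# Induced by the significant terms only: no edges, so two components.
sig_only_edges <- induced_edges(sig, parents)

# Induced by the significant terms and their ancestors: one connected graph.
full  <- with_ancestors(sig, parents)
edges <- induced_edges(full, parents)

# Color code: significant terms shaded, intervening terms left white.
node_colour <- setNames(ifelse(full %in% sig, "lightgray", "white"), full)
```

On real results, the GOGraph function that Seth listed earlier is, as far as I recall from its help page, meant to build this kind of graph from a set of GO ids up to the GO root; treat that as something to verify with ?GOGraph rather than as a confirmed recipe.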
Dick Beyer ★ 1.4k
@dick-beyer-26
Last seen 7.7 years ago
Hi David,

Have you had a look at the package topGO? From your description of what you want to do, that package may provide what you want.

Cheers,
Dick

*******************************************************************************
Richard P. Beyer, Ph.D.    University of Washington
Tel.: (206) 616 7378       Env. & Occ. Health Sci., Box 354695
Fax: (206) 685 4696        4225 Roosevelt Way NE, # 100
                           Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
*******************************************************************************

------------------------------
Message: 7
Date: Thu, 21 Jun 2007 14:02:27 -0700
From: davidl at unr.nevada.edu
Subject: Re: [BioC] Gene Ontology relationships
To: bioconductor at stat.math.ethz.ch
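As a rough idea of what the topGO route might look like, here is a hedged sketch. It is not from Dick's message: the wrapper name plot_sig_subgraph is hypothetical, `go_data` is assumed to be a topGOdata object and `result` a topGOresult from runTest(), and the showSigOfNodes arguments are as I recall them from the topGO documentation, so they may differ in the version current at the time of this thread. Check ?showSigOfNodes before use.

```r
# Sketch only. `plot_sig_subgraph` is a hypothetical helper name;
# `go_data` is assumed to be a topGOdata object and `result` a
# topGOresult returned by topGO::runTest().
plot_sig_subgraph <- function(go_data, result, n_sig = 5) {
  for (pkg in c("topGO", "Rgraphviz")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("package '", pkg, "' is required")
    }
  }
  # Draws the GO subgraph induced by the n_sig most significant terms,
  # including their ancestors, with the significant nodes highlighted
  topGO::showSigOfNodes(go_data, topGO::score(result),
                        firstSigNodes = n_sig, useInfo = "all")
}
```

Limiting the plot to a handful of top terms is one way to stay under the node count at which Rgraphviz labels become unreadable.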