Hi David,
Have you had a look at the package topGO? From your description of what
you want to do, that package may provide what you want.
Cheers,
Dick
************************************************************************
Richard P. Beyer, Ph.D.
University of Washington
Env. & Occ. Health Sci., Box 354695
4225 Roosevelt Way NE, # 100
Seattle, WA 98105-6099
Tel.: (206) 616 7378
Fax: (206) 685 4696
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
************************************************************************
------------------------------
Message: 7
Date: Thu, 21 Jun 2007 14:02:27 -0700
From: davidl@unr.nevada.edu
Subject: Re: [BioC] Gene Ontology relationships
To: bioconductor at stat.math.ethz.ch
Message-ID: <1182459747.467ae763f3aa0 at secure.unr.nevada.edu>
Content-Type: text/plain; charset=ISO-8859-1
Hello,
I tried out these new functions (termGraphs and plotGOTermGraph) and
they seem to do what they were intended to do just fine (except I
couldn't seem to get the labels to change from the GO numbers to the
descriptive labels). The functions output each set of connected GO terms
as an individual graph, however, which makes it difficult to get an
overall idea of which over-represented GO terms are related within the
total set of over-represented GO terms.
I borrowed a good portion of the code and functions used in termGraphs
and plotGOTermGraph and modified them a little so that I could output
all the connected components in one graph:
  connectedTerms <- function(r, max.nchar=NULL, title=NULL, id=NULL) {
    # r: result of hyperGTest(); id: optional GO IDs to plot instead of
    # the significant categories. 'title' is currently unused.
    library(Rgraphviz)
    pvalue <- pvalueCutoff(r)
    if (is.null(id)) {
      goids1 <- sigCategories(r, pvalue)
    } else {
      goids1 <- id
    }
    # Split the significant terms into connected components
    subG <- subGraph(goids1, goDag(r))
    cc <- connectedComp(subG)
    # Keep only components with more than one term
    connGOs <- c()
    for (i in seq_along(cc)) {
      if (length(cc[[i]]) > 1) {
        connGOs <- c(connGOs, cc[[i]])
      }
    }
    finalG <- subGraph(connGOs, goDag(r))
    # Attach the descriptive GO term to each node
    nodeDataDefaults(finalG) <- list(term = as.character(NA))
    nodeData(finalG, attr = "term") <-
      as.character(sapply(mget(nodes(finalG), GOTERM), Term))
    termLab <- unlist(nodeData(finalG, attr = "term"))
    n <- nodes(finalG)
    resultTerms <- names(pvalues(r))
    # Gene count / universe count for each node
    counts <- sapply(n, function(x) {
      if (x %in% resultTerms) {
        paste(geneCounts(r)[x], "/", universeCounts(r)[x], sep = "")
      } else {
        "0/??"
      }
    })
    if (!is.null(max.nchar)) {
      termLab <- sapply(termLab, substr, 1L, max.nchar, USE.NAMES = FALSE)
    }
    nlab <- paste(termLab, counts)
    nattr <- makeNodeAttrs(finalG, label = nlab, fixedsize = FALSE,
                           fontsize = "15000", shape = "rectangle")
    attr <- list(node = list(), edge = list(),
                 graph = list(rankdir = "LR"))
    plot(finalG, nodeAttrs = nattr, attrs = attr)
  }
I added that for loop to combine all the connected GO terms and came up
with a list of node attributes which almost always gave me a readable
graph. I had to increase the font size to 15000 in order to accomplish
this when I had more than 200 nodes or more than 3 hierarchical ranks,
since I couldn't seem to get a lot of the other Graphviz parameters to
work (like size/ratio/overlap/etc.). I'm sure there is a better way to
do this:
1. because I heard you were supposed to avoid for loops and
2. because 15,000 seems a bit excessive for a font size,
but I'm pretty new to any sort of command prompt computer activity.
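
On the first point, the step that collects the multi-member components
can probably be written without an explicit loop. A minimal sketch in
base R, assuming cc is a list of character vectors like the one
connectedComp() returns; the toy GO IDs below are made up for
illustration and are not real results:

```r
# Toy stand-in for the output of connectedComp(): a list of character
# vectors, one per connected component (these IDs are hypothetical).
cc <- list(c("GO:0000001", "GO:0000002"),
           "GO:0000003",
           c("GO:0000004", "GO:0000005", "GO:0000006"))

# Keep only components with more than one member, then flatten them
# into a single character vector of GO IDs.
keep <- lengths(cc) > 1
connGOs <- unlist(cc[keep], use.names = FALSE)

print(connGOs)
```

The singleton component is dropped and the remaining IDs keep their
original order, which is the same result the loop builds up one
component at a time.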
After getting these graphs, however, I noticed another problem. If the
GO terms were separated by an intervening non-significant GO term, they
weren't connected by connectedComp(). This means you can't really use
these functions to find whether two significant GO terms are in the same
GO branch, which to me seems like the main point of this sort of
function.
If there were a way to trace each significant GO term up to its top
parent term (i.e. biological process) through its more immediate parents
and to color code the significant terms, that would be the ideal way to
visualize how your over-represented GO terms are functionally related.
As Seth stated before, though, this probably can't be done with
Rgraphviz, since these graphs aren't scalable/zoomable and the node
labels become prohibitively small with more than 150 or so nodes.
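
The tracing idea itself can be sketched independently of the plotting
problem. The example below is only an illustration in base R on a
made-up ontology: the parents list, the term names and the sig vector
are all hypothetical. With real data the parent map would presumably
come from the GO annotation (for example the GOBPPARENTS map in GO.db),
which is an assumption and not something tested here.

```r
# Hypothetical toy ontology: each term maps to its direct parents.
# "root" stands in for a top-level term such as biological_process.
parents <- list(
  root = character(0),
  A    = "root",
  B    = "A",
  C    = "B",     # significant
  D    = "A",
  E    = "D",     # significant
  F    = "root"   # unrelated, non-significant branch
)

# Return a term together with all of its ancestors by walking the
# parent links until no unvisited terms remain.
ancestorsOf <- function(term, parents) {
  seen <- character(0)
  todo <- term
  while (length(todo) > 0) {
    cur  <- todo[1]
    todo <- todo[-1]
    if (!(cur %in% seen)) {
      seen <- c(seen, cur)
      todo <- c(todo, parents[[cur]])
    }
  }
  seen
}

sig <- c("C", "E")

# Nodes of the induced graph: every significant term plus its ancestors.
induced <- unique(unlist(lapply(sig, ancestorsOf, parents = parents)))

# Color code: significant terms in red, connecting terms in white.
nodeColour <- ifelse(induced %in% sig, "red", "white")
names(nodeColour) <- induced

# Ancestors shared by the two significant terms.
shared <- intersect(ancestorsOf("C", parents), ancestorsOf("E", parents))
```

Here C and E are separated by the non-significant terms B, A and D, so
they would fall into different components if only significant terms
were kept, yet the shared ancestors show that both sit under A. The
induced node set and the color vector are what one would then hand to a
graph layout tool.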
Does anyone have any ideas as to how to obtain such a graph (significant
terms traced through their parents and color coded) either in R or in a
program in which Bioconductor output could easily be used? I apologize
for the lengthy post and welcome any ideas,
David
Quoting Seth Falcon <sfalcon at fhcrc.org>:
> Hi,
>
> > davidl at unr.nevada.edu writes:
> >> Hello Seth,
> >>
> >> Thank you very much for the response. I read the help pages for those
> >> functions and they sound like they are exactly what I was looking for. I
> >> ran into a problem actually using termGraphs, however. This may be
> >> something simple and stupid but I am having trouble identifying what the
> >> problem is. This is the relevant part of my workflow and the resulting
> >> error message:
> >>
> >>> paramsFvBup<-new("GOHyperGParams", geneIds=llsupFvB,
> >> universeGeneIds=llsUniversFvB, annotation="mouse4302", ontology="BP",
> >> pvalueCutoff=0.05, conditional=TRUE, testDirection="over")
> >>> FvBupOverBP<-hyperGTest(paramsFvBup)
> >>> htmlReport(FvBupOverBP, file="test.html")
> >> #That all worked fine and the table looks good
> >>> termGraphs(FvBupOverBP,pvalue=0.05, use.terms=TRUE)
> >> Error in "names<-"(`*tmp*`, value = c(NA_character_, NA_character_,
> >> NA_character_, :
> >> 'value' argument must specify unique names
>
> I believe that I've fixed the problem. GOstats_2.2.5 is available via
> biocLite (not yet for OS X). In the meantime, I noticed a labeling
> problem with plotGOTermGraph and GOstats_2.2.6 will be available in
> the next couple of days.
>
> Please let me know if you encounter further problems --- these
> functions for extracting subgraphs of the results and plotting are
> quite new and somewhat experimental so I'm open to suggestions.
>
> Best,
>
> + seth
>
> --
> Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
> http://bioconductor.org
>