I'm seeking advice on the use of the "all" component in various
annotation data packages relative to GO.
Using R version 2.4.1 and (e.g.) hgu133plus2 version 1.14.0,
library(hgu133plus2) ## an Affy data package
x <- as.list( hgu133plus2GO2ALLPROBES ) ##probe sets for each GO term
xa <- unique( x[["all"]] ) ## holds probe sets associated to "all"
xbp <- unique( x[["GO:0008150"]] ) # biological process
xmf <- unique( x[["GO:0003674"]] ) # molecular function
xcc <- unique( x[["GO:0005575"]] ) # cellular component
## note that the following is true
all( xa == xbp )
But further checks show that the molecular function probe sets are not
a subset of "all".
I was under the impression that "all" is the union of MF, BP, and CC,
but in the few libraries I've checked, "all" equals BP. I haven't
found a discussion of the matter in the few vignettes that might be
relevant.
Is "all" really "BP", or is it supposed to be the union?
thanks,
-Michael N.
--
Michael Newton
http://www.stat.wisc.edu/~newton/
Hi Michael,
The difference between go_bp_all and go_bp is that "go_bp_all" ALSO
contains the GO terms that are the parent terms of the most specific
terms. It helps to know that GO ontologies are directed acyclic graphs,
and therefore anything beyond the most specific term is probably
redundant information that can be derived from the graph. So the
"go_bp_all" environment is really just included here for convenience.
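To make the relationship between the "most specific" map and the "all" map concrete, here is a minimal sketch in base R. It assumes a tiny invented DAG: the term names, probe IDs, and the helper `ancestors()` are hypothetical and are not part of any annotation package. It only illustrates the idea that each term in an "ALL"-style map collects its own probes plus those of every term beneath it.

```r
## Toy illustration only: the DAG, term names and probe IDs are invented.
## 'parents' maps each term to its direct parent terms.
parents <- list(
  root = character(0),
  A    = "root",
  B    = "root",
  A1   = "A",
  AB1  = c("A", "B")   # in a DAG a term may have more than one parent
)

## Direct ("most specific") annotations: term -> probes
direct <- list(
  A1  = c("p1", "p2"),
  AB1 = "p3",
  B   = "p4"
)

## All ancestors of a term, found by walking up the parent links
ancestors <- function(term) {
  found <- character(0)
  todo <- parents[[term]]
  while (length(todo) > 0) {
    cur <- todo[1]
    todo <- todo[-1]
    if (!(cur %in% found)) {
      found <- c(found, cur)
      todo <- c(todo, parents[[cur]])
    }
  }
  found
}

## "ALL"-style map: push each term's probes up to all of its ancestors
all_map <- setNames(vector("list", length(parents)), names(parents))
for (term in names(direct)) {
  for (target in c(term, ancestors(term))) {
    all_map[[target]] <- union(all_map[[target]], direct[[term]])
  }
}

all_map[["root"]]   # every probe that has at least one annotation
all_map[["A"]]      # probes annotated at A1 or AB1
```

In this toy setup the root term ends up with every annotated probe, which is the property one would expect of a single top-level entry sitting above all other terms.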
As for your code, I tried running it and noticed the following:
The expression:
xa <- unique( x[["all"]] )
This just assigns NULL to xa. I am pretty sure that this is not what
you had in mind. I assume this is a consequence of what James just
wrote about "all".
And so then when you say:
all( xa == xbp )
what happens here is that you get TRUE returned just because xa is
NULL: the comparison xa == xbp then has length zero, and all() of an
empty logical vector is TRUE. In other words, if you say all(xa) or
all(NULL) you also get TRUE returned.
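The behaviour is easy to reproduce without any annotation package. The following is a minimal base R sketch with made-up probe IDs; it also shows a few checks that do not silently succeed when one side is NULL. Note too that `==` compares element by element (recycling the shorter vector), so it is not a set comparison even when both sides are non-empty.

```r
xa  <- NULL                   # what x[["all"]] gives when the key is absent
xbp <- c("p1", "p2", "p3")    # made-up probe IDs

cmp <- xa == xbp              # comparing NULL with a vector gives logical(0)
length(cmp)                   # 0
all(cmp)                      # TRUE, but only vacuously

## Checks that do not pass by accident
is.null(xa)                   # TRUE: the entry is missing
setequal(xa, xbp)             # FALSE: the two sets differ
identical(xa, xbp)            # FALSE
```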
Hope this helps you,
Marc
Hi Michael,
Michael Newton wrote:
> I'm seeking advice on the use of the "all" component in various
> annotation data packages relative to GO.
>
> Using R version 2.4.1 and (e.g.) hgu133plus2 version 1.14.0,
Thanks for reporting this problem!
Any reason why you don't use the current Bioconductor release? You need
R-2.5.x for this. Our metadata packages are updated every 6 months with
each new release.
As Jim said, in more recent versions of hgu133plus2, the "all" entry has
been removed from the GO2ALLPROBES map.
One important property of hgu133plus2GO2ALLPROBES[["all"]] is that it
should _normally_ return all the probes that are mapped to at least 1 GO
term. This is because GO term "all" is the parent of all GO terms.
I've looked at hgu133plus2 1.14.0, and something is obviously wrong with
it:
> library(hgu133plus2)
> length(unique(hgu133plus2GO2ALLPROBES[["all"]]))
[1] 30223
> probe_is_unmapped <- eapply(hgu133plus2GO, function(x) isTRUE(is.na(x)))
> probes_hitting_GO <- names(probe_is_unmapped)[!unlist(probe_is_unmapped)]
> length(probes_hitting_GO)
[1] 35836
The 2 above results should match :-/
Also hgu133plus2GO2ALLPROBES[["all"]] should contain the same probes as
the union of
bp_probes <- hgu133plus2GO2ALLPROBES[["GO:0008150"]] # biological process
mf_probes <- hgu133plus2GO2ALLPROBES[["GO:0003674"]] # molecular function
cc_probes <- hgu133plus2GO2ALLPROBES[["GO:0005575"]] # cellular component
but it's apparently not the case:
> length(unique(c(bp_probes, mf_probes, cc_probes)))
[1] 35836
In fact this union contains the same probes as 'probes_hitting_GO' (which
is good):
> setequal(unique(c(bp_probes, mf_probes, cc_probes)), probes_hitting_GO)
[1] TRUE
> setequal(bp_probes, hgu133plus2GO2ALLPROBES[["all"]])
[1] TRUE
This only confirms what you reported: hgu133plus2GO2ALLPROBES[["all"]]
is incomplete (it only contains the "BP probes", i.e. the probes that are
mapped to at least 1 GO term under the biological process ontology).
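The same kind of consistency check can be tried on any GO-to-probes list. The sketch below uses a small hand-made list as a stand-in for as.list(hgu133plus2GO2ALLPROBES); the probe IDs are invented and the "all" entry is made incomplete on purpose to mimic what was seen in 1.14.0. It is an illustration of the check, not output from the real package.

```r
## Toy stand-in for as.list(hgu133plus2GO2ALLPROBES); probe IDs are invented.
go2allprobes <- list(
  "all"        = c("p1", "p2"),   # deliberately incomplete, mimicking 1.14.0
  "GO:0008150" = c("p1", "p2"),   # biological process
  "GO:0003674" = c("p2", "p3"),   # molecular function
  "GO:0005575" = c("p4")          # cellular component
)

roots <- c("GO:0008150", "GO:0003674", "GO:0005575")
root_union <- unique(unlist(go2allprobes[roots], use.names = FALSE))

## Is "all" the union of the three ontology roots?
setequal(go2allprobes[["all"]], root_union)

## Which probes are missing from "all"?
setdiff(root_union, go2allprobes[["all"]])

## Does "all" coincide with the biological process root alone?
setequal(go2allprobes[["all"]], go2allprobes[["GO:0008150"]])
```

Using setequal() and setdiff() instead of `==` avoids both the length-recycling and the empty-vector pitfalls, and setdiff() shows directly which probes are unaccounted for.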
Please consider using hgu133plus2 1.16.0 instead (included in our current
release). The "all" key has been removed from the hgu133plus2GO2ALLPROBES
map, so I won't say that the problem has been fixed, but at least it has
disappeared ;-).
We are currently in the process of reworking the way we produce our
metadata packages, with the ambitious goal of making them better. So any
breakage in the current packages that people report to us is of great
value and will help us achieve our goal.
Thanks again for the feedback!
Cheers,
H.