"all" category in annotation data
@michael-newton-456
I'm seeking advice on the use of the "all" component in various annotation data packages relative to GO.

Using R version 2.4.1 and (e.g.) hgu133plus2 version 1.14.0,

    library(hgu133plus2)                   ## an Affy data package
    x <- as.list( hgu133plus2GO2ALLPROBES )  ## probe sets for each GO term

    xa <- unique( x[["all"]] )             ## holds probe sets associated to "all"

    xbp <- unique( x[["GO:0008150"]] )     # biological process
    xmf <- unique( x[["GO:0003674"]] )     # molecular function
    xcc <- unique( x[["GO:0005575"]] )     # cellular component

    ## note that the following is true

    all( xa == xbp )

But further checks show that the molecular function probe sets are not a subset of "all".

I was under the impression that "all" is the union of MF, BP, and CC, but in the few libraries I've checked, "all" equals BP. I haven't found a discussion of the matter in the few vignettes that might be relevant.

Is "all" really "BP", or is it supposed to be the union?

thanks,

-Michael N.

--
Michael Newton
http://www.stat.wisc.edu/~newton/
Marc Carlson
@marc-carlson-2264
Michael Newton wrote:
> I'm seeking advice on the use of the "all" component in various
> annotation data packages relative to GO.
>
> Using R version 2.4.1 and (e.g.) hgu133plus2 version 1.14.0,
>
> library(hgu133plus2) ## an Affy data package
> x <- as.list( hgu133plus2GO2ALLPROBES ) ##probe sets for each GO term
>
> xa <- unique( x[["all"]] ) ## holds probe sets associated to "all"
>
> xbp <- unique( x[["GO:0008150"]] ) # biological process
> xmf <- unique( x[["GO:0003674"]] ) # molecular function
> xcc <- unique( x[["GO:0005575"]] ) # cellular component
>
> ## note that the following is true
>
> all( xa == xbp )
>
> But further checks show that the molecular function probe sets are not
> a subset of "all".
>
> I was under the impression that "all" is the union of MF, BP, and CC,
> but in the few libraries I've checked, "all" equals BP. I haven't
> found a discussion of the matter in the few vignettes that might be
> relevant.
>
> Is "all" really "BP", or is it supposed to be the union?
>
> thanks,
>
> -Michael N.

Hi Michael,

The difference between go_bp_all and go_bp is that "go_bp_all" ALSO contains the GO terms that are the parent terms of the most specific terms. It helps to know that GO ontologies are directed acyclic graphs, and therefore anything beyond the specific term is probably redundant information. The "go_bp_all" environment is really just included here for convenience.

As for your code, I tried running it and noticed the following. The expression

    xa <- unique( x[["all"]] )

just assigns NULL to xa. I am pretty sure that this is not what you had in mind. I assume this is a consequence of what James just wrote in about "all".

And so when you then say

    all( xa == xbp )

you get TRUE returned, but only because comparing against NULL yields an empty logical vector, and all() of an empty vector is TRUE. In other words, if you say all(xa) or all(NULL) you also get TRUE returned.

Hope this helps you,

Marc
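The pitfall Marc describes can be reproduced without any annotation package. The following is a minimal sketch in base R; the list `x` and the probe IDs `p1`, `p2`, `p3` are made-up stand-ins for the real map, not data from hgu133plus2.

```r
# Toy stand-in for as.list(hgu133plus2GO2ALLPROBES); probe IDs are hypothetical.
# Note that there is deliberately no "all" key in this list.
x <- list("GO:0008150" = c("p1", "p2", "p3"))

xa  <- unique(x[["all"]])          # missing key, so xa is NULL
xbp <- unique(x[["GO:0008150"]])

is.null(xa)                        # the lookup silently returned NULL
length(xa == xbp)                  # comparing against NULL gives a zero-length vector
vacuous <- all(xa == xbp)          # all() of an empty logical vector is TRUE

# Comparisons that do not pass silently when a key is missing:
same_set      <- setequal(xa, xbp)   # compares the two as sets
identical_vec <- identical(xa, xbp)  # strict object equality
```

Here `vacuous` is TRUE even though `xa` holds nothing, while `setequal()` and `identical()` both report a mismatch. Checking `is.null(xa)` (or `length(xa)`) before comparing is one way to catch a missing key early.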
@herve-pages-1542
Hi Michael,

Michael Newton wrote:
> I'm seeking advice on the use of the "all" component in various
> annotation data packages relative to GO.
>
> Using R version 2.4.1 and (e.g.) hgu133plus2 version 1.14.0,

Thanks for reporting this problem! Any reason why you don't use the current Bioconductor release? You need R-2.5.x for this. Our metadata packages are updated every 6 months with each new release.

As Jim said, in more recent versions of hgu133plus2, the "all" entry has been removed from the GO2ALLPROBES map.

One important property of hgu133plus2GO2ALLPROBES[["all"]] is that it should _normally_ return all the probes that are mapped to at least 1 GO term. This is because GO term "all" is the parent of all GO terms.

I've looked at hgu133plus2 1.14.0, and something is obviously wrong with it:

    > library(hgu133plus2)
    > length(unique(hgu133plus2GO2ALLPROBES[["all"]]))
    [1] 30223
    > probe_is_unmapped <- eapply(hgu133plus2GO, function(x) isTRUE(is.na(x)))
    > probes_hitting_GO <- names(probe_is_unmapped)[!unlist(probe_is_unmapped)]
    > length(probes_hitting_GO)
    [1] 35836

The 2 above results should match :-/

Also hgu133plus2GO2ALLPROBES[["all"]] should contain the same probes as the union of

    bp_probes <- hgu133plus2GO2ALLPROBES[["GO:0008150"]] # biological process
    mf_probes <- hgu133plus2GO2ALLPROBES[["GO:0003674"]] # molecular function
    cc_probes <- hgu133plus2GO2ALLPROBES[["GO:0005575"]] # cellular component

but it's apparently not the case:

    > length(unique(c(bp_probes, mf_probes, cc_probes)))
    [1] 35836

In fact this union contains the same probes as 'probes_hitting_GO' (which is good):

    > setequal(unique(c(bp_probes, mf_probes, cc_probes)), probes_hitting_GO)
    [1] TRUE
    > setequal(bp_probes, hgu133plus2GO2ALLPROBES[["all"]])
    [1] TRUE

This only confirms what you've reported below: that hgu133plus2GO2ALLPROBES[["all"]] is incomplete (it only contains the "BP probes", i.e. the probes that are mapped to at least 1 GO term under the biological process ontology).
Please consider using hgu133plus2 1.16.0 instead (included in our current release). The "all" key has been removed from the hgu133plus2GO2ALLPROBES map so I won't say that the problem has been fixed but at least it has disappeared ;-).

We are currently in the process of reworking the way we produce our metadata packages with the ambitious goal to make them better. So any breakage in the current packages that people report to us is of great value and will help us to achieve our goal.

Thanks again for the feedback!

Cheers,
H.

>
> library(hgu133plus2) ## an Affy data package
> x <- as.list( hgu133plus2GO2ALLPROBES ) ##probe sets for each GO term
>
> xa <- unique( x[["all"]] ) ## holds probe sets associated to "all"
>
> xbp <- unique( x[["GO:0008150"]] ) # biological process
> xmf <- unique( x[["GO:0003674"]] ) # molecular function
> xcc <- unique( x[["GO:0005575"]] ) # cellular component
>
> ## note that the following is true
>
> all( xa == xbp )
>
> But further checks show that the molecular function probe sets are not
> a subset of "all".
>
> I was under the impression that "all" is the union of MF, BP, and CC,
> but in the few libraries I've checked, "all" equals BP. I haven't
> found a discussion of the matter in the few vignettes that might be
> relevant.
>
> Is "all" really "BP", or is it supposed to be the union?
>
> thanks,
>
> -Michael N.
>
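The property discussed in this reply (the "all" entry of a GO2ALLPROBES-style map should equal the union of the three ontology roots, and should cover every probe with at least one GO annotation) can be illustrated on toy data. This is only a sketch under stated assumptions: the term names (`BP`, `MF`, `CC`, `BP:child`, `MF:child`), the probe IDs, and the helper `ancestors()` are all hypothetical, and the code is not how the Bioconductor metadata packages are actually built.

```r
# Hypothetical direct annotations: probe -> most specific GO terms.
direct <- list(
  p1 = "BP:child",
  p2 = "MF:child",
  p3 = "CC",
  p4 = character(0)          # probe with no GO annotation
)

# Hypothetical parent links of a tiny GO-like DAG rooted at "all".
parents <- list(
  "all" = character(0),
  "BP" = "all", "MF" = "all", "CC" = "all",
  "BP:child" = "BP", "MF:child" = "MF"
)

# Collect every ancestor of a term by walking the parent links upward.
ancestors <- function(term) {
  out  <- character(0)
  todo <- term
  while (length(todo) > 0) {
    ps   <- setdiff(unlist(parents[todo], use.names = FALSE), out)
    out  <- c(out, ps)
    todo <- ps
  }
  out
}

# Build term -> probes, propagating each probe to all ancestor terms.
go2allprobes <- list()
for (probe in names(direct)) {
  for (term in direct[[probe]]) {
    for (tm in c(term, ancestors(term))) {
      go2allprobes[[tm]] <- union(go2allprobes[[tm]], probe)
    }
  }
}

probes_hitting_go <- names(direct)[lengths(direct) > 0]
root_union <- unique(unlist(go2allprobes[c("BP", "MF", "CC")], use.names = FALSE))

all_is_union   <- setequal(go2allprobes[["all"]], root_union)
union_is_hits  <- setequal(root_union, probes_hitting_go)
all_is_only_bp <- setequal(go2allprobes[["all"]], go2allprobes[["BP"]])
```

In a correctly propagated map, `all_is_union` and `union_is_hits` are TRUE and `all_is_only_bp` is FALSE. The defect reported for hgu133plus2 1.14.0 corresponds to the opposite pattern, where "all" matched only the biological process root.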