Issues with BioC GO annotations and NetAffx
@bornman-daniel-m-1391
Dear BioC,

I am finding inconsistencies between my annotation results from Bioconductor and from NetAffx. This post is long, and I apologize in advance, but I did a lot of work and need to fully describe my process.

I am analyzing the Affymetrix mouse430_2 chip and have 246 probes called as differentially expressed. I have annotated this list with gene ontology data using the various Bioconductor packages (mouse4302, GO, annotate, GOstats, etc.). My first step was to generate all biological process GO ids that are mapped to these probes using mouse4302PROBE2GO{mouse4302} (the documentation states this is done using Entrez ids: probe id -> entrez id -> go id(s)). Next I generated a list of all the ancestors of the returned GO ids using GOBPANCESTOR{GO}. From these results I can build a list of all GO ids that are either directly mapped to my probes or are ancestors, in the gene ontology tree, of those directly mapped GO ids. In general, this is the basic scheme for building a list of nodes to test for significance associated with a probe list using the hypergeometric calculation, phyper{stats}.

In parallel, I also uploaded this list of 246 probe ids to the NetAffx web app to generate a listing of all biological process GO ids associated with it. The "all_values" list of GO ids from NetAffx should be the same as the list generated above: all directly mapped GO ids plus all ancestors. However, they are not the same. Comparing the two lists of biological process GO ids revealed a great deal of disparity.

To get to the bottom of this, I decided to build the list myself from two main reference files: 1) the Affymetrix library file for the mouse430_2 chip and 2) the latest gene ontology reference file ("gene_ontology_obo.txt") from http://www.geneontology.org/. The Affy library file is a csv table of probe ids with tons of annotation, including GO ids and terms. For many probe ids, several GO ids from each GO category are associated with a single probe id.
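For readers following along, the scheme described above (directly mapped GO ids, plus all of their ancestors, then a hypergeometric test per node) can be sketched in a few lines. This is only an illustration in Python rather than R. The toy `PARENTS` and `PROBE2GO` tables, the GO-like ids, and the function names are all made up for this sketch and do not come from the mouse4302 or GO packages. The tail probability is meant to correspond to `phyper(k - 1, K, N - K, n, lower.tail = FALSE)` in R.

```python
from math import comb

# Hypothetical toy ontology: child id -> set of direct parent ids.
PARENTS = {
    "GO:C": {"GO:B"},
    "GO:B": {"GO:A"},
    "GO:D": {"GO:A"},
    "GO:A": set(),
}

# Hypothetical probe id -> directly mapped GO ids.
PROBE2GO = {
    "p1": {"GO:C"},
    "p2": {"GO:D"},
    "p3": {"GO:B"},
    "p4": set(),
}


def ancestors(go_id, parents):
    """Return every ancestor of go_id, excluding go_id itself."""
    seen = set()
    stack = list(parents.get(go_id, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def nodes_for_probes(probes, probe2go, parents):
    """Directly mapped GO ids plus all of their ancestors."""
    direct = set()
    for probe in probes:
        direct |= probe2go.get(probe, set())
    nodes = set(direct)
    for go_id in direct:
        nodes |= ancestors(go_id, parents)
    return nodes


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k): n probes selected out of N on the chip, K of which
    are annotated (directly or via a descendant) to the GO id."""
    top = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)
```

With this toy data, `nodes_for_probes(["p1"], PROBE2GO, PARENTS)` collects `GO:C` together with its ancestors `GO:B` and `GO:A`. A real analysis would compute `K` for each node over the whole chip using the same ancestor closure, so that the selected list and the background are counted the same way.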
I parsed this file and restructured it so that, for each GO category, I created a separate tab-delimited file of GO ids matched to probe ids. Since a single GO id can have (and usually does have) many probe ids associated with it, each row in the parsed file contains a unique go-to-probe pairing. Next, I parsed the gene ontology master file of all known GO ids into a lookup table. The gene_ontology_obo file lists each GO id as a record and gives its "is_a:" or "part_of:" information. These relationships were used to build the lookup table.

Now, with a few simple steps, I can take my probe id list, find all the directly mapped GO ids, and then use these GO ids to find all their ancestor ids. This returns a list of GO ids that I can use in hypergeometric calculations to find significantly represented GO ids associated with my differentially expressed probe list.

But first, how does my list compare with the lists from the Bioc packages and NetAffx? My list closely matches the NetAffx list, although there are some small differences. My list returned 674 biological process GO ids; NetAffx returned 671. 665 GO ids were identical between the two, 9 were unique to my list, and 6 were unique to the NetAffx list. When I investigated why NetAffx found something I did not, I discovered that those 6 GO ids were incorrectly called. These six NetAffx-specific GO ids are either terminal nodes that do not map to any of my probes based on the Affymetrix library file, or ancestor nodes that do not have any offspring that map to my probes; one GO id is not recognized by the gene ontology website.

So, the NetAffx application gives some incorrect results, and I don't know what is going on with the Bioc method (very different results). If anyone has actually read my post and was able to follow along: do you think this could be done correctly within a Bioc package? The chip annotation package {mouse4302} and the {GO} package should be able to do this, but maybe they are outdated.
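The two home-grown steps above, building a parent lookup table from the OBO file and then comparing two GO id lists, might look roughly like the Python sketch below. It is not the code used for this post. The stanzas in `SAMPLE_OBO` are invented (the ids are placeholders, not real GO terms), the function names are hypothetical, and the parser assumes that "part_of" links appear as `relationship: part_of ...` lines, which may differ between versions of the file.

```python
# Invented OBO-style stanzas for illustration only.
SAMPLE_OBO = """\
format-version: 1.0

[Term]
id: GO:9000001
name: toy root term

[Term]
id: GO:9000002
name: toy middle term
is_a: GO:9000001 ! toy root term

[Term]
id: GO:9000003
name: toy leaf term
is_a: GO:9000002 ! toy middle term
relationship: part_of GO:9000001 ! toy root term

[Typedef]
id: part_of
name: part of
"""

PART_OF = "relationship: part_of"


def parse_obo_parents(text):
    """Build a lookup table: term id -> set of direct parent ids."""
    parents = {}
    current = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            current = None
        elif not in_term:
            continue
        elif line.startswith("id:"):
            current = line[len("id:"):].strip()
            parents.setdefault(current, set())
        elif current and line.startswith("is_a:"):
            # Drop the trailing "! term name" comment.
            parents[current].add(line[len("is_a:"):].split("!")[0].strip())
        elif current and line.startswith(PART_OF):
            parents[current].add(line[len(PART_OF):].split("!")[0].strip())
    return parents


def compare_id_lists(mine, theirs):
    """Split two GO id lists into shared and list-specific ids."""
    mine, theirs = set(mine), set(theirs)
    return {
        "shared": mine & theirs,
        "only_mine": mine - theirs,
        "only_theirs": theirs - mine,
    }
```

Applied to the counts reported above, a comparison like this would put 665 ids in `shared`, 9 in `only_mine`, and 6 in `only_theirs`. The list-specific ids are the ones worth checking by hand against the library file and the ontology.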
Could I use {AnnBuilder} to update my mouse4302 package? What about the GO package?

Thanks all,

Daniel Bornman
Research Scientist
Battelle Memorial Institute
Department of Statistical and Information Analysis
Columbus, OH 43201
Annotation GO probe affy PROcess GOstats Category • 1.1k views
Seth Falcon ★ 7.4k
@seth-falcon-992
Hi Daniel,

A quick response now; hopefully I or someone else will have time for more detail later.

The procedure you described sounds reasonable and familiar in terms of how the Bioc annotation packages are put together. The underlying source data (GO, Affy) change. The GO data in particular tends to receive many updates over short periods of time. The Bioc annotation packages are built once for each release (every 6 months) against the versions of the various source data files available at that time. The Bioc annotation packages are now ~5 months old, so I expect that the discrepancies you are seeing are a result of that.

You should be able to use AnnBuilder to build yourself an up-to-date GO package and then use AnnBuilder with your new GO package to build a fresh mouse430_2 package. If this package has significant discrepancies with what you are expecting, please let us know.

Best Wishes,

+ seth