I am interested in gene ontology enrichment and/or depletion analysis in D. melanogaster. I therefore implemented a package in R that does the job, partly because I really want to learn R. For the direct annotation of Entrez IDs to GO terms I used the org.Dm.eg.db package. By accident I discovered that genes with a NOT qualifier (roughly 3% of the fly genome) are also annotated to the GO category in question, which makes them indistinguishable from genes that are truly associated with that category. Did I overlook something, or is there something to it?
Isn't the NOT qualifier specific to the D. melanogaster GO annotation? As someone working with human or mouse data, I have never heard of that GO qualifier before... A little more explanation and some sample code/examples would help, especially for the BioC core members who generate this type of annotation package (and who are likely less expert in Dm than you are...).
Edit: this link may be useful; after a quick read it seems to me the NOT qualifier is specific for FlyBase. Since the GO mappings for Dm are based on info directly derived from the GO consortium (and not FlyBase), this may be the cause of this apparent discrepancy?
The NOT qualifier is relevant for many databases:
"GO uses three qualifiers, contributes_to, colocalizes_with and NOT, to further refine annotations (see the GO annotation conventions). The NOT qualifier, which indicates the lack of a property, is most vital in data interpretation. This is used judiciously, only when there is potential for confusion or contradiction. For example, a gene product might have sequence similarity to protein kinases, but the curator can apply the NOT qualifier to indicate that, contrary to expectation, the gene product does not exhibit kinase activity based on published results. Although the total number of NOT annotations is minor, several databases have hundreds of these annotations (TABLE 3)" (Rhee et al., 2008, doi:10.1038/nrg2363)
On the Gene Ontology website, go to download -> annotations and download the annotation text file for the desired species. Then, in R, run
read.delim(gzfile(path_to_annotation_textfile), na.strings = "", header = FALSE,
comment.char = "!", sep = "\t")
Column "V4" gives you the qualifier of each annotation. Check whether the annotations qualified with "NOT" are also present in the org.your_fav_species.eg.db package. In my opinion they should not be. In the case of org.Dm.eg.db it seems like they are.
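To make the check above concrete, here is a minimal sketch in R. It runs on a tiny made-up annotation file, not on real FlyBase data: the gene IDs, the helper make_row() and the table pkg_map are all hypothetical and only stand in for the downloaded file and for whatever gene-to-GO pairs the annotation package returns. The extra arguments quote = "" and colClasses = "character" are my own additions (they keep quote characters from swallowing fields and stop a column of "F" aspect codes from being read as logical FALSE).

```r
# Build one GAF-style line with 17 tab-separated fields. All values are made up.
make_row <- function(id, qualifier, go_id) {
  paste(c("FB", id, "sym", qualifier, go_id, "FB:ref", "IDA", "", "F",
          "", "", "protein", "taxon:7227", "20200101", "FlyBase", "", ""),
        collapse = "\t")
}

# Write a toy annotation file; the real one would be the downloaded .gz file
# (wrap the path in gzfile() in that case).
path <- tempfile(fileext = ".gaf")
writeLines(c("!gaf-version: 2.1",
             make_row("FBgn0000001", "", "GO:0004672"),
             make_row("FBgn0000002", "NOT", "GO:0004672"),
             make_row("FBgn0000003", "NOT|contributes_to", "GO:0003677"),
             make_row("FBgn0000004", "contributes_to", "GO:0003677")),
           path)

# Same call as in the post, plus quote = "" and colClasses = "character".
gaf <- read.delim(path, na.strings = "", header = FALSE, comment.char = "!",
                  sep = "\t", quote = "", colClasses = "character")

# Column V4 holds the qualifier. NOT can be combined with other qualifiers
# using "|", so match it as a whole token. Empty qualifiers are NA and
# count as "no match".
is_not <- grepl("(^|\\|)NOT(\\||$)", gaf$V4)

# Gene / GO term pairs that carry a NOT qualifier (V2 = gene ID, V5 = GO ID).
not_pairs <- unique(gaf[is_not, c("V2", "V5")])
names(not_pairs) <- c("gene_id", "go_id")

# Hypothetical stand-in for the gene-to-GO pairs taken from the annotation
# package. With Bioconductor installed, something like
#   AnnotationDbi::select(org.Dm.eg.db, keys = ..., keytype = "FLYBASE",
#                         columns = "GO")
# should give a comparable table (assumption: check keytypes(org.Dm.eg.db)).
pkg_map <- data.frame(
  gene_id = c("FBgn0000001", "FBgn0000002", "FBgn0000004"),
  go_id   = c("GO:0004672", "GO:0004672", "GO:0003677"),
  stringsAsFactors = FALSE
)

# NOT-qualified pairs that nevertheless show up in the package mapping.
leaked <- merge(not_pairs, pkg_map, by = c("gene_id", "go_id"))
print(leaked)
```

Any row in leaked is a gene that the package lists under a GO term although the annotation file says it does NOT have that property. Those are the pairs one would want to drop before an enrichment analysis.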
For a few human proteins the NOT qualifier is also used, but only for a small percentage of entries (1281 out of 409697, about 0.3%). Whether this is taken into account in the annotation packages (org.Xx.eg.db or GO.db), and if so how, is a good question! I don't know...
http://www.geneontology.org/page/download-annotations