Question: How does the org.Dm.eg.db package deal with NOT annotation qualifiers when annotating genes to GO?
0
18 months ago by
ivozeller0 wrote:

I am interested in gene ontology enrichment and/or depletion analysis in D. melanogaster. I therefore implemented a package in R that does the job, partly because I really want to learn R. For the direct annotation of Entrez IDs to GO terms I used the org.Dm.eg.db package. By accident I discovered that genes with a NOT qualifier (roughly 3% of the fly genome) are also annotated to the GO category in question, making them indistinguishable from genes that are truly associated with that category. Did I overlook something, or is there something to it?

modified 18 months ago by Valerie Obenchain ♦♦ 6.6k • written 18 months ago by ivozeller0

Is the NOT qualifier not specific to the D. melanogaster GO annotation? As someone working with human or mouse data, I have never heard of that GO qualifier before... A little more explanation and some sample code/examples would help, especially for the BioC core members who generate these types of annotation packages (and who are likely less expert on Dm than you are...).

Edit: this link may be useful; after a quick read it seems to me the NOT qualifier is specific for FlyBase. Since the GO mappings for Dm are based on info directly derived from the GO consortium (and not FlyBase), this may be the cause of this apparent discrepancy?

The NOT qualifier is relevant for many databases:

"GO uses three qualifiers, contributes_to, colocalizes_with and NOT, to further refine annotations (see the GO annotation conventions). The NOT qualifier, which indicates the lack of a property, is most vital in data interpretation. This is used judiciously, only when there is potential for confusion or contradiction. For example, a gene product might have sequence similarity to protein kinases, but the curator can apply the NOT qualifier to indicate that, contrary to expectation, the gene product does not exhibit kinase activity based on published results. Although the total number of NOT annotations is minor, several databases have hundreds of these annotations (TABLE 3)" (from Rhee et al., 2008, doi:10.1038/nrg2363)

If you read the annotation file into R with read.delim(..., header = FALSE, comment.char = "!", sep = "\t"), column "V4" gives you the qualifier of each annotation; you can then check whether the annotations qualified with "NOT" are also present in the org.your_fav_species.eg.db package. In my opinion they should not be. In the case of org.Dm.eg.db it seems like they are.

For a few human proteins the NOT qualifier is also used, but only in a small percentage of entries (1285 out of 409697, counting the combined NOT|colocalizes_with and NOT|contributes_to qualifiers). Whether this is taken into account in the annotation packages (org.Xx.eg.db or GO.db), and if so how, is a good question! I don't know...

> GAF <- read.delim(gzfile("goa_human.gaf.gz"), na.strings = "", header = FALSE, comment.char = "!", sep = "\t")
>
> table(GAF$V4)

    colocalizes_with       contributes_to                  NOT
                1159                 1144                 1269
NOT|colocalizes_with   NOT|contributes_to
                  14                    2
>
> length(GAF$V4)
[1] 409697
>
> sum(!is.na(GAF$V4))
[1] 3588
>
> sum(is.na(GAF$V4))
[1] 406109
>
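Since NOT can be combined with other qualifiers (NOT|colocalizes_with, NOT|contributes_to), an exact match on "NOT" misses some negated annotations. Below is a minimal sketch of counting every entry whose qualifier contains NOT. The `qualifier` vector is a made-up stand-in for GAF$V4, not real data.

```r
# Toy stand-in for GAF$V4 (hypothetical values, not real data).
# NA means the annotation has no qualifier. If the real column is a
# factor, convert it first with as.character().
qualifier <- c(NA, "NOT", "contributes_to", "NOT|colocalizes_with",
               NA, "colocalizes_with", "NOT|contributes_to", NA)

# Split each qualifier on "|" and test whether any component is exactly "NOT".
has_not <- vapply(strsplit(qualifier, "|", fixed = TRUE),
                  function(parts) "NOT" %in% parts,
                  logical(1))

n_exact <- sum(qualifier %in% "NOT")  # exact match only
n_any   <- sum(has_not)               # also counts combined qualifiers
```

Splitting on "|" rather than using a plain substring search avoids matching a qualifier that merely contains the letters "NOT" as part of a longer word.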


0
18 months ago by
ivozeller0 wrote:

I have now tested all NOT-qualified GO IDs in Dm (before, I had only checked individual cases by eye), and it seems to me that in the majority of cases the NOT qualifier is respected in the org.Dm.eg.db package.

One file contains Entrez IDs associated with GO IDs (created using org.Dm.eg.db).

The other file contains Entrez IDs associated with the GO IDs from which they are explicitly excluded by the "NOT" qualifier (these mappings should not appear in the org.Dm.eg.db package).

org.Dm.eg.db_3.4.0 was used and the latest annotation file from http://www.geneontology.org/page/download-annotations

1

I get the same:

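> library(org.Dm.eg.db)
> library(DBI)  # provides dbGetQuery()
> con <- org.Dm.eg_dbconn()

> # annot: the D. melanogaster annotation file read with read.delim(), as for the human file above;
> # column 2 = FlyBase ID, column 4 = qualifier, column 5 = GO ID
> tocheck <- annot[annot[,4] %in% "NOT", c(2,5)]

> # one query per NOT annotation: is the same (FlyBase ID, GO ID) pair present in the package?
> checked <- lapply(seq_len(nrow(tocheck)), function(x) dbGetQuery(con, paste0("select flybase_id, go_id, evidence from go_all inner join flybase using(_id) where flybase_id='", tocheck[x,1],"' and go_id='", tocheck[x,2],"';")))

> # number of matching database rows per NOT annotation
> table(sapply(checked, nrow))

  0   1   2   3
404  75  17   7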

So just under 20% of the NOT mappings (99 of 503) still exist in the database.
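The per-row SQL lookups above can also be expressed as one comparison of ID pairs. This is a minimal sketch with toy data frames: `not_pairs` and `db_pairs`, and every ID in them, are hypothetical stand-ins for the NOT rows of the annotation file and the (flybase_id, go_id) pairs pulled from the package.

```r
# Hypothetical NOT-qualified (FlyBase ID, GO ID) pairs from the annotation file.
not_pairs <- data.frame(
  flybase_id = c("FBgn0000001", "FBgn0000002", "FBgn0000003"),
  go_id      = c("GO:0000001", "GO:0000002", "GO:0000003"),
  stringsAsFactors = FALSE
)

# Hypothetical pairs as they might come out of the annotation package.
db_pairs <- data.frame(
  flybase_id = c("FBgn0000001", "FBgn0000001", "FBgn0000003", "FBgn0000004"),
  go_id      = c("GO:0000001", "GO:0000001", "GO:0000009", "GO:0000003"),
  evidence   = c("IDA", "IMP", "ISS", "IEA"),
  stringsAsFactors = FALSE
)

# Build one key per pair and count how often each NOT pair occurs in the package.
not_key <- paste(not_pairs$flybase_id, not_pairs$go_id)
db_key  <- paste(db_pairs$flybase_id, db_pairs$go_id)
n_hits  <- vapply(not_key, function(k) sum(db_key == k), integer(1),
                  USE.NAMES = FALSE)

table(n_hits)                         # distribution of hit counts
fraction_present <- mean(n_hits > 0)  # share of NOT pairs still in the package
```

A gene and a GO ID must match together for a hit, so a NOT pair is not counted just because the gene or the GO term appears somewhere in the package.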

0
18 months ago by
Valerie Obenchain ♦♦ 6.6k wrote:

Thanks for the feedback. We're rebuilding the annotations over the next few weeks for the April 25 release and we won't include the GO terms with the NOT annotation.

Valerie

2

Hi,

I've just finished building the new db0 packages (versions 3.4.2). It turns out that we do filter out genes with the NOT qualifier ... the catch is that some of these genes have a gene ID -> GO mapping that is recorded both without a qualifier (ok) and with NOT. The output below is from an intermediate database used to build the final packages. You can see that some PubMed IDs support the relationship and some don't. We filter out the ones that don't and are left with the ones that do.

> dbGetQuery(con, "select gene_id,evidence,go_qualifier,pubmed_id from gene2go where gene_id=31625")
   gene_id evidence go_qualifier                  pubmed_id
1    31625      IDA            -                   20826458
2    31625      IMP            - 16177138|16199763|21317294
3    31625      IMP          NOT                   15965240
4    31625      NAS            -                   10908587
5    31625      ISS            -                          -
6    31625      ISS            -                          -
7    31625      ISS            -                          -
8    31625      IDA            -                   20826458
9    31625      IMP          NOT                   16177138
10   31625      NAS            -                   10908587
11   31625      ISS            -                          -
12   31625      IMP          NOT                   16177138
13   31625      ISS            -                          -
14   31625      IDA            -                   20826458
15   31625      IDA            -                   20826458
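The effect described above can be reproduced on a toy table. This is a minimal sketch that assumes the filter simply drops rows whose qualifier is NOT; the actual build code is not shown in this thread. The `gene2go` data frame here is hand-made and its GO IDs are hypothetical.

```r
# Hand-made stand-in for the intermediate gene2go table (GO IDs are hypothetical).
gene2go <- data.frame(
  gene_id      = c(31625, 31625, 31625, 31625),
  go_id        = c("GO:0000001", "GO:0000001", "GO:0000002", "GO:0000003"),
  evidence     = c("IDA", "IMP", "IMP", "ISS"),
  go_qualifier = c("-", "NOT", "NOT", "-"),
  stringsAsFactors = FALSE
)

# Assumed filter: drop the NOT rows, keep everything else.
is_not <- gene2go$go_qualifier == "NOT"
kept   <- gene2go[!is_not, ]

# A gene/GO pair with both a NOT row and an unqualified row survives the filter.
pair <- function(d) paste(d$gene_id, d$go_id)
conflicting <- intersect(pair(gene2go[is_not, ]), pair(kept))
```

In this toy table, `conflicting` holds the pair that a comparison against the NOT rows of the annotation file would still find in the final package, even though every NOT row was removed.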

Thanks to Jim and Martin for getting to the bottom of this.

Valerie