topGO: GO terms in gene annotation do not match significant GO term output
1
0
Entering edit mode
jgg • 0
@jgg-18562
Last seen 6.0 years ago

Hi!

I'm having trouble establishing a comparison between GO terms in the original annotation of the gene subset chosen for enrichment and significant GO terms in the GenTable output of topGO

When I retrieve the gene IDs annotated with each GO term using genesInTerm(), their original annotation does not contain the specific GO term in the GenTable() output they are associated with.

For instance,

running genesInTerm(GOdata.2, "GO:0015175") shows me there are 28 genes with that GO term

However, in the original annotation file used for readMappings(), most of those 28 genes are nor annotated with "GO:0015175".

This is their MP annotation:

1 GO:0004190;GO:0015194
2 GO:0004190;GO:0015194
3 GO:0015194
4 GO:0015194
5 GO:0015194
6 GO:0015194
7 GO:0015297;GO:0015180;GO:0015171;GO:0015189;GO:0005313;GO:0015179;GO:0015181;GO:0015185
8 GO:0015180;GO:0015297;GO:0015171;GO:0015181;GO:0015185;GO:0005313;GO:0015189;GO:0015179
9 GO:0015189;GO:0005313;GO:0015179;GO:0015181;GO:0015185;GO:0015297;GO:0015180;GO:0015171
10  GO:0005313;GO:0015189;GO:0015179;GO:0015181;GO:0015185;GO:0015180;GO:0015297;GO:0015171
11  GO:0015180;GO:0015297;GO:0015171;GO:0005313;GO:0015189;GO:0015179;GO:0015185;GO:0015181
12  GO:0015185;GO:0015181;GO:0005313;GO:0015189;GO:0015179;GO:0015180;GO:0015297;GO:0015171
13  GO:0015185;GO:0015181;GO:0015179;GO:0015189;GO:0005313;GO:0015171;GO:0015297;GO:0015180
14 GO:0015185;GO:0015181;GO:0015179;GO:0005313;GO:0015189;GO:0015171;GO:0015180;GO:0015297
15 GO:0015185;GO:0015181;GO:0015179;GO:0015189;GO:0005313;GO:0015171;GO:0015297;GO:0015180
16 GO:0015179;GO:0005313;GO:0015189;GO:0015181;GO:0015185;GO:0015171;GO:0015180;GO:0015297
17 GO:0015181;GO:0015185;GO:0015179;GO:0015189;GO:0005313;GO:0015171;GO:0015297;GO:0015180
18 GO:0015185;GO:0015181;GO:0005313;GO:0015189;GO:0015179;GO:0015180;GO:0015297;GO:0015171
19 GO:0015293;GO:0015399;GO:0015175;GO:0015194;GO:0015172;GO:0015171;GO:0016756;GO:0015193;GO:0015186;GO:0005313;GO:0015180
20 GO:0015171;GO:0016756;GO:0015180;GO:0005313;GO:0015193;GO:0015186;GO:0015194;GO:0015172;GO:0015175;GO:0015293;GO:0015399
21 GO:0015171;GO:0015172;GO:0015175;GO:0015293;GO:0015399
22 GO:0015175;GO:0015293;GO:0015180;GO:0015174;GO:0015193;GO:0015186;GO:0005313;GO:0015171;GO:0015172;GO:0015194
23 GO:0015194;GO:0015172;GO:0015171;GO:0016756;GO:0015180;GO:0005313;GO:0015193;GO:0015186;GO:0015293;GO:0015399;GO:0015175
24 GO:0015175;GO:0015293;GO:0015171;GO:0015172
25 GO:0015293;GO:0015194;GO:0016756;GO:0015171;GO:0015180;GO:0015193;GO:0005313;GO:0015186
26 GO:0015293;GO:0015175;GO:0015172;GO:0015194;GO:0015171;GO:0015180;GO:0015174;GO:0015186;GO:0015193;GO:0005313
27 GO:0015172;GO:0015171;GO:0015174;GO:0015293;GO:0015399;GO:0015175
28 GO:0015171;GO:0016756;GO:0015180;GO:0015193;GO:0005313;GO:0015186;GO:0015194;GO:0015293

Why are genes mapped to a particular GO term they are not annotated with? Is this a result of the hierarchy of terms? 

I would appreciate any help in understanding what is happening.

 

Thanks a lot for your help,

Cheers!

Joana

 

 

 

 

rnaseq topgo go enrichment • 2.2k views
ADD COMMENT
0
Entering edit mode

Could it be that these genes are child terms of the GO term of your interest? I don't remember the  code nor the documentation...

ADD REPLY
0
Entering edit mode

I think that is probably the issue here, but I just wanted to make sure as I'm guessing other people might have had the same problem. Unfortunately, I cannot check that this is the case for all l GOterms and genes in my list. 

Thanks for your reply!

Joana

ADD REPLY
0
Entering edit mode
@james-w-macdonald-5106
Last seen 19 hours ago
United States

It's not clear what an MP mapping might be, nor how you got that set of GO IDs, so it's difficult to say if there is a problem here or not. You should follow the posting guide, which asks for, at minimum, a self-contained example that people can run.

I can make a wild guess however, that you are misunderstanding how the GO DAG works, and how topGO does the analysis. If you have a gene that has a GO term (here a fake one I just made up) of Map Kinase Phosphorylation, to which it is directly appended, by default that gene is also appended to all ancestor GO terms for that term as well. So continuing on with my example, say the parent term for Map Kinase Phosphorylation is Kinase Phosphorylation, and that term's parent is Protein Phosphorylation, and that term's parent is Phosphorylation.

The gene in this example is directly appended to Map Kinase Phosphorylation, but by inference, since all ancestor terms are supersets of the offspring terms, it is also appended to all ancestor terms as well. If you map a gene ID to its direct GO term using something like

select(OrgDb, ID, "GO", "ENTREZID")

you don't get all the terms that it is appended to, but instead just get the direct terms. You would need to do something like

select(OrgDb, ID, "GOALL", "ENTREZID")

in order to get all the direct GO terms as well as those that are inferred by being ancestors to those terms.

ADD COMMENT
0
Entering edit mode

Sorry about that!!

Here is the example code for the output I mentioned.

Does it make more sense now? Thanks a lot for your help, Joana

 

 

####################################################################

library(topGO)

# Input files and gene IDs and background annotation with Sma3s format ----

 

# Get gene IDs for the enrichment

genes.UP <- read.csv("sigUP.csv", header=F)$V1

 

# data frame with the annotations of all genes in "genes.UP"

UP.annotated <- read.csv("pathTo_UP_AnnotationFile.csv", header = T)

 

# annotation file for all genes in the transcriptome

# which will be subsetted to keep only the gene IDs and relevant GOterm annotations

# eg: ATR45|c0_g1_i1 GO:0016021, GO:0030176, GO:0031090, GO:0005789

annot.file <- read.csv("pathTo_ALLgenes_AnnotationFile.tsv", sep = "\t")

Go.F <- annot.file[,c(1,7)]

Go.F$GO.F.ID <- str_replace_all(Go.F$GO.F.ID, ";", ", ")

write.table(Go.F, file="GoF", row.names = F, col.names = F, quote = F, sep="\t")

 

# Go enrichment by MF (Molecular Function) ----

GOesByF <- readMappings(file = "GoF")

bg_genes <- names(GOesByF) 

# Compare genes vs bg_genes

compared_genes <- factor(as.integer(bg_genes %in% genes.UP))

bg_genes <- names(compared_genes)

# create topGO object ----

GOdata.2 <- new("topGOdata", ontology = "MF",

                allGenes = compared_genes,

                annot = annFUN.gene2GO, gene2GO = GOesByF)

 

# Run Fisher test ---

resultFisher <- runTest(GOdata.2, algorithm = "classic", statistic = "fisher")

# create  table with enrichment result ----

allRes.byF.UP <- GenTable(GOdata.2, classicFisher = resultFisher, orderBy = "classicFisher", topNodes = 50)

 

# get all genes with a specific GOterm

genes <- genesInTerm(GOdata.2, "GO:0015175") 

# subset those genes to get only those in my "UP" list

genes[[1]][genes[[1]] %in% genes.UP]  # this will output the 28 genes I mentioned in my first post, whose annotation in the mapping file "GoF" does not necessarily match "GO:0015175"

ADD REPLY
2
Entering edit mode

That's not helpful at all. You have a file (that nobody else has) and you show that you read it in and do some stuff with it, and you don't get the results you expect. Perhaps you missed the part where I said 'at a minimum you need to provide a self-contained example that people can run'? Unless people can run the example for themselves, there is no way for anybody to help.

But again, taking a flyer on what you might need, do note that you can get the upstream IDs from GO.db, using a SQL query

> con <- GO_dbconn()
> dbGetQuery(con, "select _id, go_id, term from go_term where go_id='GO:0015175';")
   _id      go_id                                                  term
1 9743 GO:0015175 neutral amino acid transmembrane transporter activity
> dbGetQuery(con, "select  go_id, term from go_term inner join go_mf_parents using(_id) where go_mf_parents._parent_id='9743';")
        go_id                                                                     term
1  GO:0005304                              L-valine transmembrane transporter activity
2  GO:0015188                          L-isoleucine transmembrane transporter activity
3  GO:0001761                          beta-alanine transmembrane transporter activity
4  GO:0005294 neutral L-amino acid secondary active transmembrane transporter activity
5  GO:0005295                             neutral amino acid:sodium symporter activity
6  GO:0015182                          L-asparagine transmembrane transporter activity
7  GO:0015186                           L-glutamine transmembrane transporter activity
8  GO:0015187                               glycine transmembrane transporter activity
9  GO:0015190                             L-leucine transmembrane transporter activity
10 GO:0015193                             L-proline transmembrane transporter activity
11 GO:0015195                           L-threonine transmembrane transporter activity
12 GO:0022858                               alanine transmembrane transporter activity
13 GO:0022889                                serine transmembrane transporter activity
14 GO:0033229                              cysteine transmembrane transporter activity
15 GO:0042970                            homoserine transmembrane transporter activity

And if ANY of those GO terms are appended to the gene you care about, then by definition 'neutral amino acid transmembrane transporter activity' is appended as well.

 

ADD REPLY
0
Entering edit mode

Great! That explains it, thanks.

ADD REPLY

Login before adding your answer.

Traffic: 931 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6