GO enrichment with non-model organism, supplying GO mappings
Lucía ▴ 30
@16997962
Last seen 2.3 years ago
Canada

Hi,

I am trying to do GO enrichment analysis. I work on Cannabis sativa, which is a non-model organism. There are no GO terms for Cannabis, so I cannot use the OrgDb or biomaRt functions in R. However, I have generated a file with GO terms mapped to protein IDs from the original FASTA. My file looks like this:

        !db     db_object_id    db_object_symbol        qualifier       term_accession  db_reference    evidence_code   with    aspect  db_object_name  db_object_synonym       db_>NCBI_cannabis   
XP_030477616    XP_030477616            GO:0004672      GOMAP:0000      IEA             F                       gene    taxon:3483      12042019        GOM>NCBI_cannabis   
XP_030477616    XP_030477616            GO:0005524      GOMAP:0000      IEA             F                       gene    taxon:3483      12042019        GOM>NCBI_cannabis

I am wondering how to use this file to do GO enrichment analysis. I have protein IDs mapped to GO terms, rather than gene IDs mapped to GO terms. Is this an issue? Any guidance is appreciated.
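As a rough illustration of how a table like this could be reshaped for enrichment tools, here is a minimal base R sketch. It assumes the file can be read as tab-separated text with `db_object_id` and `term_accession` columns. The toy text below keeps only those two columns and adds a duplicated row for illustration. The real file has more columns and a header line starting with `!`, so the reading step would need adapting. A two-column term/ID table is the shape `clusterProfiler::enricher()` accepts as `TERM2GENE`, and a named list of terms per ID is the shape topGO's `annFUN.gene2GO` works with.

```r
# Toy tab-separated text standing in for the GO annotation file.
# Only two columns are kept; the repeated third row is added to show
# that duplicates get dropped.
gaf_txt <- paste(
  "db_object_id\tterm_accession",
  "XP_030477616\tGO:0004672",
  "XP_030477616\tGO:0005524",
  "XP_030477616\tGO:0005524",
  sep = "\n"
)

# read.delim() defaults to a header row and tab separators.
ann <- read.delim(text = gaf_txt, stringsAsFactors = FALSE)

# Two-column table: term first, ID second, each pair kept once.
term2id <- unique(ann[, c("term_accession", "db_object_id")])
names(term2id) <- c("term", "id")
rownames(term2id) <- NULL
print(term2id)

# Alternative shape: a named list with one character vector of terms per ID.
id2go <- split(term2id$term, term2id$id)
print(id2go)
```

Whichever shape is used, the IDs in the mapping have to be the same kind of ID as the ones in the gene list being tested.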

clusterProfiler goseq topGO DESeq2 • 2.6k views
@james-w-macdonald-5106
Last seen 4 minutes ago
United States

If your analysis uses some form of gene IDs, then you need a gene ID -> GO mapping file. Otherwise you will get no significant results, as R won't be able to infer that you are supplying protein IDs and then do the mapping for you.


Thanks. How would I go about converting protein id to gene id?
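One possible route, sketched here in base R, is to join the protein-level GO table against a protein-to-gene lookup and then collapse the result so each gene/term pair appears once. The lookup table `prot2gene` and the gene IDs in it are hypothetical placeholders, and the second protein ID is invented to show two proteins from the same gene. Where a real lookup comes from is not covered in this thread. For RefSeq `XP_` accessions, NCBI's annotation files for the same assembly seem a likely source, but that is an assumption.

```r
# Protein-level GO table. The first two rows come from the excerpt in the
# post; the third protein ID is made up for illustration.
prot2go <- data.frame(
  protein = c("XP_030477616", "XP_030477616", "XP_000000002"),
  go      = c("GO:0004672", "GO:0005524", "GO:0004672"),
  stringsAsFactors = FALSE
)

# Hypothetical protein-to-gene lookup; the gene IDs are placeholders.
prot2gene <- data.frame(
  protein = c("XP_030477616", "XP_000000002"),
  gene    = c("geneA", "geneA"),
  stringsAsFactors = FALSE
)

# Attach a gene to every protein row. Proteins that are missing from the
# lookup are dropped by this inner join.
merged <- merge(prot2go, prot2gene, by = "protein")

# Collapse isoforms: keep each gene/term pair only once.
gene2go <- unique(merged[, c("go", "gene")])
gene2go <- gene2go[order(gene2go$gene, gene2go$go), ]
rownames(gene2go) <- NULL
print(gene2go)
```

It is worth checking how many proteins fail to match the lookup (for example because of accession version suffixes), since those annotations are silently lost in the join.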


So when you said 'there are no GO terms for Cannabis' I took you at your word. Turns out I shouldn't have!

> library(AnnotationHub)
> hub <- AnnotationHub()
> query(hub, c("cannabis","orgdb"))

AnnotationHub with 1 record
# snapshotDate(): 2022-04-21
# names(): AH101262
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Cannabis sativa
# $rdataclass: OrgDb
# $rdatadateadded: 2022-04-21
# $title: org.Cannabis_sativa.eg.sqlite
# $description: NCBI gene ID based annotations about Cannabis sativa
# $taxonomyid: 3483
# $genome: NCBI genomes
# $sourcetype: NCBI/UniProt
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.uniprot.org/p...
# $sourcesize: NA
# $tags: c("NCBI", "Gene", "Annotation") 
# retrieve record with 'object[["AH101262"]]' 
> pot <- hub[["AH101262"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

> potlst <- mapIds(pot, keys(pot), "GOALL", "ENTREZID", multiVals = "list")
'select()' returned 1:many mapping between keys and columns
> table(lengths(potlst))

    1    12    18    19    21    22    24    25    26    27    28    29    30 
31234     2     1     2     4     2     1     7    10     2     1     6     1 
   31    33    34    35    36    37    38    39    40    41    42    43    44 
    3     4     2     4     4     8     6    10     3     6    14     1     2 
   45    46    47    48    49    50    51    52    53    54    55    56    57 
    7    16     5     7     5     1     4    11     5     6     2     3     2 
   58    59    60    63    65    73    82    88    90   112   115   120 
    4     3     3     1     4     1     3     2     1     1     2     2 
> potlst[1:4]
$`23630667`
 [1] "GO:0009772" "GO:0018298" "GO:0009635" "GO:0006091" "GO:0008150"
 [6] "GO:0008152" "GO:0009767" "GO:0009987" "GO:0015979" "GO:0019684"
[11] "GO:0022900" "GO:0044237" "GO:0006464" "GO:0006807" "GO:0019538"
[16] "GO:0036211" "GO:0043170" "GO:0043412" "GO:0044238" "GO:0044260"
[21] "GO:0044267" "GO:0071704" "GO:1901564" "GO:0006950" "GO:0009636"
[26] "GO:0042221" "GO:0050896" "GO:0009507" "GO:0016021" "GO:0009523"
[31] "GO:0042651" "GO:0005575" "GO:0005622" "GO:0005737" "GO:0009536"
[36] "GO:0043226" "GO:0043227" "GO:0043229" "GO:0043231" "GO:0110165"
[41] "GO:0016020" "GO:0031224" "GO:0009521" "GO:0009579" "GO:0032991"
[46] "GO:0034357" "GO:0098796" "GO:0016168" "GO:0045156" "GO:0005506"
[51] "GO:0016682" "GO:0010242" "GO:0003674" "GO:0005488" "GO:0046906"
[56] "GO:0097159" "GO:1901363" "GO:0003824" "GO:0009055" "GO:0016491"
[61] "GO:0043167" "GO:0043169" "GO:0046872" "GO:0046914" "GO:0016679"

$`23630669`
 [1] "GO:0006397" "GO:0008380" "GO:0008033" "GO:0006139" "GO:0006396"
 [6] "GO:0006725" "GO:0006807" "GO:0008150" "GO:0008152" "GO:0009987"
[11] "GO:0010467" "GO:0016070" "GO:0016071" "GO:0034641" "GO:0043170"
[16] "GO:0044237" "GO:0044238" "GO:0046483" "GO:0071704" "GO:0090304"
[21] "GO:1901360" "GO:0006399" "GO:0034470" "GO:0034660" "GO:0009507"
[26] "GO:0005575" "GO:0005622" "GO:0005737" "GO:0009536" "GO:0043226"
[31] "GO:0043227" "GO:0043229" "GO:0043231" "GO:0110165" "GO:0003723"
[36] "GO:0003674" "GO:0003676" "GO:0005488" "GO:0097159" "GO:1901363"

$`23630670`
 [1] "GO:0006412" "GO:0006518" "GO:0006807" "GO:0008150" "GO:0008152"
 [6] "GO:0009058" "GO:0009059" "GO:0009987" "GO:0010467" "GO:0019538"
[11] "GO:0034641" "GO:0034645" "GO:0043043" "GO:0043170" "GO:0043603"
[16] "GO:0043604" "GO:0044237" "GO:0044238" "GO:0044249" "GO:0044260"
[21] "GO:0044267" "GO:0044271" "GO:0071704" "GO:1901564" "GO:1901566"
[26] "GO:1901576" "GO:0009507" "GO:0005840" "GO:0005575" "GO:0005622"
[31] "GO:0005737" "GO:0009536" "GO:0043226" "GO:0043227" "GO:0043229"
[36] "GO:0043231" "GO:0110165" "GO:0043228" "GO:0043232" "GO:0003735"
[41] "GO:0003674" "GO:0005198"

$`23630672`
 [1] "GO:0015979" "GO:0008150" "GO:0008152" "GO:0009987" "GO:0044237"
 [6] "GO:0009507" "GO:0016021" "GO:0009539" "GO:0042651" "GO:0005575"
[11] "GO:0005622" "GO:0005737" "GO:0009536" "GO:0043226" "GO:0043227"
[16] "GO:0043229" "GO:0043231" "GO:0110165" "GO:0016020" "GO:0031224"
[21] "GO:0009521" "GO:0009523" "GO:0009579" "GO:0032991" "GO:0034357"
[26] "GO:0098796"

Seems like you can just get what you need from the AnnotationHub.

Lucía ▴ 30
@16997962

Well, that's not quite the case. Sure, when I said there were no GO terms, that was a mild exaggeration: there are really only 171 genes with associated GO terms, for a genome of over 29,000 genes. So no, AnnotationHub does not provide what is needed for the analysis. Feel free to check out the thread "EnrichGO not working with db created from annotationhub" if you still don't believe me.
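The `table(lengths(potlst))` output earlier in the thread is consistent with sparse coverage: `mapIds()` fills keys that have no match with `NA`, so the 31,234 entries of length 1 are most likely `NA` placeholders rather than genes with a single term. A minimal sketch of counting annotated keys, using a toy list in place of the real `potlst` (the names and terms below are invented):

```r
# Toy stand-in for a mapIds(..., multiVals = "list") result.
toy_lst <- list(
  g1 = c("GO:0015979", "GO:0008150"),
  g2 = NA_character_,
  g3 = NA_character_,
  g4 = "GO:0006412"
)

# A key counts as annotated if at least one of its values is not NA.
has_go <- vapply(toy_lst, function(x) any(!is.na(x)), logical(1))

n_annotated <- sum(has_go)      # number of keys with at least one GO term
frac_annotated <- mean(has_go)  # fraction of all keys that are annotated
print(n_annotated)
print(frac_annotated)
```

Running the same `vapply()` call on the real list would show how many Entrez IDs actually carry GO terms before any enrichment is attempted.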


Oh, yep, you are correct.
