Question

enrichGO not working with db created from AnnotationHub

Lucía ▴ 30

@16997962

Last seen 3.6 years ago

Canada

Hi, I am running clusterProfiler on my differentially expressed genes from DESeq2. I am working with Cannabis sativa. I created the annotation database (OrgDb) using AnnotationHub, then mapped the gene symbols in my gene matrix to Entrez IDs, both for my genes of interest and for my gene universe. When I then run enrichGO, I get the error message below, and I have no idea why. Please help.

library(AnnotationHub)   # needed for AnnotationHub() and query()
library(AnnotationDbi)
library(clusterProfiler)
library(ReactomePA)

ah <- AnnotationHub()

AnnotationHub::query(ah, c("Cannabis", "Sativa"))
Csativa <- ah[["AH101262"]]
columns(Csativa)
keytypes(Csativa)
select(Csativa, head(keys(Csativa)), c("SYMBOL", "GENENAME", "GO")) 

annotated_significant_res_aga <- significant_res_aga

annotated_significant_res_aga$symbol <- mapIds(
  Csativa,
  keys = rownames(annotated_significant_res_aga),
  keytype = "ALIAS",
  column = "SYMBOL",
  multiVals = "first"
)

annotated_significant_res_aga$entrez <- mapIds(
  Csativa,
  keys = rownames(annotated_significant_res_aga),
  keytype = "ALIAS",
  column = "ENTREZID",
  multiVals = "first"
)

annotated_res_aga <- res_aga

annotated_res_aga$symbol <- mapIds(
  Csativa,
  keys = rownames(annotated_res_aga),
  keytype = "ALIAS",
  column = "SYMBOL",
  multiVals = "first"
)

annotated_res_aga$entrez <- mapIds(
  Csativa,
  keys = rownames(annotated_res_aga),
  keytype = "ALIAS",
  column = "ENTREZID",
  multiVals = "first"
)


ego_bp_aga <- enrichGO(gene          = as.character(unique(annotated_significant_res_aga$entrez)),
                       universe      = as.character(unique(annotated_res_aga$entrez)),
                       OrgDb         = Csativa,
                       keyType       = "ENTREZID",
                       ont           = "BP",
                       pAdjustMethod = "BH",
                       pvalueCutoff  = 0.01,
                       qvalueCutoff  = 0.05,
                       readable      = TRUE) 

--> No gene can be mapped....
--> Expected input gene ID: 23630686,27215454,27215500,24573811,27215495,27215452
--> return NULL...

sessionInfo( )

R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
 [1] GOstats_2.62.0 graph_1.74.0 Category_2.62.0 Matrix_1.4-1
 [5] GO.db_3.15.0 clusterProfiler_4.4.1 AnnotationDbi_1.58.0 RColorBrewer_1.1-3
 [9] pheatmap_1.0.12 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9
[13] purrr_0.3.4 readr_2.1.2 tidyr_1.2.0 tibble_3.1.7
[17] ggplot2_3.3.6 tidyverse_1.3.1 DESeq2_1.36.0 SummarizedExperiment_1.26.1
[21] Biobase_2.56.0 MatrixGenerics_1.8.0 matrixStats_0.62.0 GenomicRanges_1.48.0
[25] GenomeInfoDb_1.32.1 IRanges_2.30.0 S4Vectors_0.34.0 AnnotationHub_3.4.0
[29] BiocFileCache_2.4.0 dbplyr_2.1.1 BiocGenerics_0.42.0 BiocManager_1.30.17

loaded via a namespace (and not attached):
  [1] shadowtext_0.1.2 readxl_1.4.0 backports_1.4.1
  [4] fastmatch_1.1-3 plyr_1.8.7 igraph_1.3.1
  [7] lazyeval_0.2.2 GSEABase_1.58.0 splines_4.2.0
 [10] BiocParallel_1.30.0 digest_0.6.29 yulab.utils_0.0.4
 [13] htmltools_0.5.2 GOSemSim_2.22.0 viridis_0.6.2
 [16] fansi_1.0.3 magrittr_2.0.3 memoise_2.0.1
 [19] tzdb_0.3.0 Biostrings_2.64.0 annotate_1.74.0
 [22] graphlayouts_0.8.0 modelr_0.1.8 vroom_1.5.7
 [25] enrichplot_1.16.0 colorspace_2.0-3 blob_1.2.3
 [28] rvest_1.0.2 rappdirs_0.3.3 ggrepel_0.9.1
 [31] haven_2.5.0 crayon_1.5.1 RCurl_1.98-1.6
 [34] jsonlite_1.8.0 scatterpie_0.1.7 genefilter_1.78.0
 [37] ape_5.6-2 survival_3.3-1 glue_1.6.2
 [40] polyclip_1.10-0 gtable_0.3.0 zlibbioc_1.42.0
 [43] XVector_0.36.0 DelayedArray_0.22.0 Rgraphviz_2.40.0
 [46] scales_1.2.0 DOSE_3.22.0 DBI_1.1.2
 [49] Rcpp_1.0.8.3 viridisLite_0.4.0 xtable_1.8-4
 [52] tidytree_0.3.9 gridGraphics_0.5-1 bit_4.0.4
 [55] AnnotationForge_1.38.0 httr_1.4.3 fgsea_1.22.0
 [58] ellipsis_0.3.2 pkgconfig_2.0.3 XML_3.99-0.9
 [61] farver_2.1.0 locfit_1.5-9.5 utf8_1.2.2
 [64] ggplotify_0.1.0 tidyselect_1.1.2 labeling_0.4.2
 [67] rlang_1.0.2 reshape2_1.4.4 later_1.3.0
 [70] munsell_0.5.0 BiocVersion_3.15.2 cellranger_1.1.0
 [73] tools_4.2.0 cachem_1.0.6 downloader_0.4
 [76] cli_3.3.0 generics_0.1.2 RSQLite_2.2.14
 [79] broom_0.8.0 fastmap_1.1.0 yaml_2.3.5
 [82] ggtree_3.4.0 bit64_4.0.5 fs_1.5.2
 [85] tidygraph_1.2.1 KEGGREST_1.36.0 ggraph_2.0.5
 [88] RBGL_1.72.0 nlme_3.1-157 mime_0.12
 [91] aplot_0.1.4 DO.db_2.9 xml2_1.3.3
 [94] compiler_4.2.0 rstudioapi_0.13 filelock_1.0.2
 [97] curl_4.3.2 png_0.1-7 interactiveDisplayBase_1.34.0
[100] treeio_1.20.0 reprex_2.0.1 tweenr_1.0.2
[103] geneplotter_1.74.0 stringi_1.7.6 lattice_0.20-45
[106] vctrs_0.4.1 pillar_1.7.0 lifecycle_1.0.1
[109] data.table_1.14.2 bitops_1.0-7 patchwork_1.1.1
[112] httpuv_1.6.5 qvalue_2.28.0 R6_2.5.1
[115] promises_1.2.0.1 gridExtra_2.3 MASS_7.3-57
[118] assertthat_0.2.1 withr_2.5.0 GenomeInfoDbData_1.2.8
[121] parallel_4.2.0 hms_1.1.1 grid_4.2.0
[124] ggfun_0.0.6 ggforce_0.3.3 shiny_1.7.1
[127] lubridate_1.8.0

clusterProfiler AnnotationHubData GO • 5.2k views

ADD COMMENT • link updated 3.7 years ago by Guido Hooiveld ★ 4.1k • written 3.7 years ago by Lucía ▴ 30

score 0 · Answer 1 · 2022-05-16

Did you notice the first message that was returned after you ran enrichGO()? Thus: --> No gene can be mapped....

This basically means that, ehh, none of your input genes are valid entrez ids... so you should check whether this is the case. So, what is the output of head( as.character(unique(annotated_significant_res_aga$entrez)) ) and head( as.character(unique(annotated_res_aga$entrez)) )?
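
A minimal sketch of such a check, in base R only. The two vectors are made-up stand-ins (assumptions for illustration, not real results): `mapped_ids` plays the role of annotated_significant_res_aga$entrez and `valid_keys` plays the role of keys(Csativa). With the real objects, the same few lines should apply.

```r
# Toy stand-ins (hypothetical values):
# 'mapped_ids' ~ annotated_significant_res_aga$entrez
# 'valid_keys' ~ keys(Csativa)
mapped_ids <- c("115705401", NA, "115705987", "115705987", "not_an_id")
valid_keys <- c("23630667", "115705401", "115705987", "27215501")

# mapIds() returns NA for keys it cannot map, so count those first
n_unmapped <- sum(is.na(mapped_ids))

# unique, non-missing IDs that would actually be passed on to enrichGO()
input_ids <- unique(mapped_ids[!is.na(mapped_ids)])

# how many of them are known to the OrgDb at all?
n_valid <- sum(input_ids %in% valid_keys)

cat("unmapped:", n_unmapped, "\n")
cat("unique input IDs:", length(input_ids), "\n")
cat("recognised by the OrgDb:", n_valid, "\n")
```

If the last number is zero, the IDs themselves are the problem; if it is not, the cause has to be looked for elsewhere.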

Also, just to show that 'it' works:

> library(AnnotationHub)
> library(clusterProfiler)

> ah <- AnnotationHub()
snapshotDate(): 2022-04-21
> AnnotationHub::query(ah, c("Cannabis", "Sativa"))
AnnotationHub with 1 record
# snapshotDate(): 2022-04-21
# names(): AH101262
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Cannabis sativa
# $rdataclass: OrgDb
# $rdatadateadded: 2022-04-21
# $title: org.Cannabis_sativa.eg.sqlite
# $description: NCBI gene ID based annotations about Cannabis sativa
# $taxonomyid: 3483
# $genome: NCBI genomes
# $sourcetype: NCBI/UniProt
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.uniprot.or...
# $sourcesize: NA
# $tags: c("NCBI", "Gene", "Annotation") 
# retrieve record with 'object[["AH101262"]]' 
> Csativa <- ah[["AH101262"]]
>
> # as example, select the first 150 Csativa IDs as input
> # these correspond to your 'annotated_significant_res_aga$entrez'
> foreground.genes <- keys(Csativa) [1:150]
> head(foreground.genes)
[1] "23630667" "23630669" "23630670" "23630672" "23630673" "23630676"
> 
> # as background (universe), use all genes
> # these correspond to your 'annotated_res_aga$entrez'
> background.genes <- keys(Csativa)
> head(background.genes)
[1] "23630667" "23630669" "23630670" "23630672" "23630673" "23630676"
> 
> # run enrichGO; since arbitrary foreground genes are used,
> # do so without any significance cutoff!
> ego_bp_aga <- enrichGO(gene          = foreground.genes,
+                        universe      = background.genes,
+                        OrgDb         = Csativa,
+                        keyType       = "ENTREZID",
+                        ont           = "BP",
+                        pAdjustMethod = "BH",
+                        pvalueCutoff  = 1,
+                        qvalueCutoff  = 1,
+                        readable      = TRUE)
> 
> 
> ego_bp_aga
#
# over-representation test
#
#...@organism    Cannabis sativa 
#...@ontology    BP 
#...@keytype     ENTREZID 
#...@gene        chr [1:150] "23630667" "23630669" "23630670" "23630672" "23630673" ...
#...pvalues adjusted by 'BH' with cutoff <1 
#...59 enriched terms found
'data.frame':   59 obs. of  9 variables:
 $ ID         : chr  "GO:0008152" "GO:0044237" "GO:0015979" "GO:0009059" ...
 $ Description: chr  "metabolic process" "cellular metabolic process" "photosynthesis" "macromolecule biosynthetic process" ...
 $ GeneRatio  : chr  "121/127" "121/127" "50/127" "56/127" ...
 $ BgRatio    : chr  "156/171" "156/171" "58/171" "68/171" ...
 $ pvalue     : num  0.00345 0.00345 0.00734 0.03557 0.05349 ...
 $ p.adjust   : num  0.102 0.102 0.144 0.418 0.418 ...
 $ qvalue     : num  0.0999 0.0999 0.1417 0.41 0.41 ...
 $ geneID     : chr  "psbA/matK/rps16/psbK/psbI/atpI/rps2/rpoC2/rpoC1/rpoB/petN/psbM/psbD/psbC/lhbA/rps14/psaB/psaA/ycf3/rps4/rbcL/ac"| __truncated__ "psbA/matK/rps16/psbK/psbI/atpI/rps2/rpoC2/rpoC1/rpoB/petN/psbM/psbD/psbC/lhbA/rps14/psaB/psaA/ycf3/rps4/rbcL/ac"| __truncated__ "psbA/psbK/psbI/petN/psbM/psbD/psbC/lhbA/psaB/psaA/ycf3/rbcL/psaI/ycf4/petA/psbJ/psbL/psbF/psbE/petL/petG/psaJ/p"| __truncated__ "rps16/rps2/rpoC2/rpoC1/rpoB/rps14/rps4/rpl33/rps18/rpl20/rps12/rpoA/rps11/rpl36/rps8/rpl14/rpl16/rps3/rpl22/rps"| __truncated__ ...
 $ Count      : int  121 121 50 56 63 63 61 61 66 57 ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> 
> head( as.data.frame(ego_bp_aga) )
                   ID                            Description GeneRatio
GO:0008152 GO:0008152                      metabolic process   121/127
GO:0044237 GO:0044237             cellular metabolic process   121/127
GO:0015979 GO:0015979                         photosynthesis    50/127
GO:0009059 GO:0009059     macromolecule biosynthetic process    56/127
GO:0009058 GO:0009058                   biosynthetic process    63/127
GO:1901576 GO:1901576 organic substance biosynthetic process    63/127
           BgRatio      pvalue  p.adjust     qvalue
GO:0008152 156/171 0.003449869 0.1017711 0.09986464
GO:0044237 156/171 0.003449869 0.1017711 0.09986464
GO:0015979  58/171 0.007344356 0.1444390 0.14173319
GO:0009059  68/171 0.035570236 0.4177884 0.40996189
GO:0009058  78/171 0.053485337 0.4177884 0.40996189
GO:1901576  78/171 0.053485337 0.4177884 0.40996189
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       geneID
GO:0008152 psbA/matK/rps16/psbK/psbI/atpI/rps2/rpoC2/rpoC1/rpoB/petN/psbM/psbD/psbC/lhbA/rps14/psaB/psaA/ycf3/rps4/rbcL/accD/psaI/ycf4/petA/psbJ/psbL/psbF/psbE/petL/petG/psaJ/rpl33/rps18/rpl20/rps12/psbT/psbN/psbH/petB/petD/rpoA/rps11/rpl36/rps8/rpl14
GO:0044237 psbA/matK/rps16/psbK/psbI/atpI/rps2/rpoC2/rpoC1/rpoB/petN/psbM/psbD/psbC/lhbA/rps14/psaB/psaA/ycf3/rps4/rbcL/accD/psaI/ycf4/petA/psbJ/psbL/psbF/psbE/petL/petG/psaJ/rpl33/rps18/rpl20/rps12/psbT/psbN/psbH/petB/petD/rpoA/rps11/rpl36/rps8/rpl14
GO:0015979                                                                                                                                                                                                                                                                                                                                                                                                          psbA/psbK/psbI/petN/psbM/psbD/psbC/lhbA/psaB/psaA/ycf3/rbcL/psaI/ycf4/petA/psbJ/psbL/psbF/psbE/petL/petG/psaJ/psbT/psbN/psbH/petB/petD/psaC/ycf3/psaA/psbB/psbT/psaC/rbcL/psaI/petD/psbC/psbJ/petA/psaB/petL/petB/psbI/psbH/psbL/psbE/psbF/ycf4/psbN/petG
GO:0009059                                                                                                                                                                                                                                                                                                                                      rps16/rps2/rpoC2/rpoC1/rpoB/rps14/rps4/rpl33/rps18/rpl20/rps12/rpoA/rps11/rpl36/rps8/rpl14/rpl16/rps3/rpl22/rps19/rpl2/rpl23/rps7/rpl32/rps15/rps7/rpl23/rpl2/rps19/rps12/rps12/rpoC1/rpoB/rpl32/rps7/rps16/rpl23/rps19/rps3/rpoA/rps15/rps11/rps2/rpoC2/rpl22/rpl36/rpl14/rps19/rpl16/rps12/rpl20/rpl2/rpl2/rps18/rpl33/rps8
GO:0009058                                                                                                                                                                                                                                                                                                   rps16/atpI/rps2/rpoC2/rpoC1/rpoB/rps14/rps4/rbcL/accD/rpl33/rps18/rpl20/rps12/rpoA/rps11/rpl36/rps8/rpl14/rpl16/rps3/rpl22/rps19/rpl2/rpl23/rps7/rpl32/rps15/rps7/rpl23/rpl2/rps19/rps12/rps12/rpoC1/rpoB/rpl32/rps7/rps16/atpF/rpl23/rps19/rbcL/accD/rps3/rpoA/rps15/rps11
GO:1901576                                                                                                                                                                                                                                                                                                   rps16/atpI/rps2/rpoC2/rpoC1/rpoB/rps14/rps4/rbcL/accD/rpl33/rps18/rpl20/rps12/rpoA/rps11/rpl36/rps8/rpl14/rpl16/rps3/rpl22/rps19/rpl2/rpl23/rps7/rpl32/rps15/rps7/rpl23/rpl2/rps19/rps12/rps12/rpoC1/rpoB/rpl32/rps7/rps16/atpF/rpl23/rps19/rbcL/accD/rps3/rpoA/rps15/rps11
           Count
GO:0008152   121
GO:0044237   121
GO:0015979    50
GO:0009059    56
GO:0009058    63
GO:1901576    63
> 
> 

> sessionInfo()
R version 4.2.0 Patched (2022-05-12 r82348 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] AnnotationDbi_1.58.0  IRanges_2.30.0        S4Vectors_0.34.0     
[4] Biobase_2.56.0        clusterProfiler_4.4.1 AnnotationHub_3.4.0  
[7] BiocFileCache_2.4.0   dbplyr_2.1.1          BiocGenerics_0.42.0

score 0 · Answer 2 · 2022-05-17

I've gone through my code and I'm pretty sure the issue is not the code, but that some gene IDs (all the ones that start with LOC) are just not being recognized. Actually, all my DEGs start with LOC... Does this mean I can't do GO for them?

ADD COMMENT • link 3.7 years ago Lucía ▴ 30

I had a closer look at this, and the error you got is actually somewhat misleading; it is not that you are using the wrong ids (ids that are not recognized by the OrgDb from the AnnotationHub), but rather that only a very limited number of genes do have a GO annotation... only 35 (based on gene symbols) or 53 (based on entrez ids) genes of the 29,813 genes present in the GFF file...

BTW: if you would use all 31,443 genes present in the OrgDb, then (only) 171 genes do have a GO annotation... See column BgRatio in my first post above.

So when none of the input (foreground) genes has a GO annotation, this results in the error. That is the case for all LOCs, hence the error. At least 2 genes with a GO annotation are needed to prevent it (although one may wonder about the relevance of the results).
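
One way to catch this before calling enrichGO() is to count how many foreground genes carry any GO term at all. The sketch below is base R only and uses a small hand-made table in the layout that AnnotationDbi::select() returns (one row per gene/GO pair, NA where a gene has no GO term). All symbols, IDs and GO terms in it are placeholders, and the helper name `count_go_annotated` is hypothetical.

```r
# Toy table in the layout returned by AnnotationDbi::select(..., columns = "GO").
# All values are placeholders, not real annotations.
annot <- data.frame(
  SYMBOL   = c("LOC1", "LOC2", "geneA", "geneA", "geneB"),
  ENTREZID = c("1", "2", "3", "3", "4"),
  GO       = c(NA, NA, "TERM_1", "TERM_2", "TERM_2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct genes in 'genes'
# that have at least one non-missing GO term in 'annot'.
count_go_annotated <- function(annot, genes, id_col = "SYMBOL") {
  has_go <- annot[!is.na(annot$GO), id_col]
  sum(unique(genes) %in% has_go)
}

foreground <- c("LOC1", "LOC2")
count_go_annotated(annot, foreground)              # 0: nothing to test
count_go_annotated(annot, c(foreground, "geneA"))  # 1
```

A count of zero on the real data would reproduce the situation described above.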

Because of the lack of GO annotation, you cannot make use of the OrgDb and the function enrichGO(). However, if you have such GO annotation info yourself, or are able to retrieve it from a database other than NCBI, then you can use the generic function enricher() with the mapping arguments TERM2GENE and TERM2NAME to perform GO over-representation analysis. See this post for some more info on that: AnnotationForge::makeOrgPackage GO Ids mistake. That whole thread may be relevant to you as well!
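
As a rough illustration of that route, the sketch below builds the two mapping tables by hand and passes them to enricher(). The gene IDs, term IDs and term names are invented placeholders, not real Cannabis annotations; in practice they would come from whatever annotation source you end up using. enricher() expects TERM2GENE with the term in the first column and the gene in the second, and TERM2NAME with the term in the first column and its description in the second. The call itself only runs if clusterProfiler is installed.

```r
# Placeholder mappings (invented for illustration, NOT real annotations).
# In real use the 'term' column would hold GO IDs.
term2gene <- data.frame(
  term = c("TERM_1", "TERM_1", "TERM_1", "TERM_2", "TERM_2"),
  gene = c("LOC_A", "LOC_B", "LOC_C", "LOC_C", "LOC_D"),
  stringsAsFactors = FALSE
)
term2name <- data.frame(
  term = c("TERM_1", "TERM_2"),
  name = c("placeholder term 1", "placeholder term 2"),
  stringsAsFactors = FALSE
)

foreground <- c("LOC_A", "LOC_B")
universe   <- c("LOC_A", "LOC_B", "LOC_C", "LOC_D", "LOC_E", "LOC_F")

# sanity check: at least one foreground gene must occur in TERM2GENE
stopifnot(any(foreground %in% term2gene$gene))

if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  res <- clusterProfiler::enricher(
    gene         = foreground,
    universe     = universe,
    TERM2GENE    = term2gene,
    TERM2NAME    = term2name,
    pvalueCutoff = 1,
    qvalueCutoff = 1,
    minGSSize    = 1   # the default (10) would drop these tiny toy sets
  )
  if (!is.null(res)) print(as.data.frame(res))
}
```

The cutoffs are set to 1 here only so that the toy example returns something; for a real analysis you would use sensible thresholds.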

For completeness, below is the code that shows that the error is due to the lack of GO annotations for the foreground genes, and that the OrgDb contains only a very limited number of GO annotations:

Part 1.

> library("rtracklayer")
> library("GenomicFeatures")
>
> # downloaded GFF from NCBI through link above.
> cs10.gff <- import.gff3("GCF_900626175.2_cs10_genomic.gff")
> # extract all genes
> my_genes <- cs10.gff[cs10.gff$type == "gene"]
> 
> # extract GFF-based annotation info for all genes
> gene.info <- mcols(my_genes)[c("ID","Dbxref", "Name","gene", "gene_biotype")]
> 
> dim(gene.info)  #29813 genes are present in GFF
[1] 29813     5
>
> head(gene.info)
DataFrame with 6 rows and 5 columns
                 ID           Dbxref         Name         gene
        <character>  <CharacterList>  <character>  <character>
1 gene-LOC115705401 GeneID:115705401 LOC115705401 LOC115705401
2 gene-LOC115705987 GeneID:115705987 LOC115705987 LOC115705987
3 gene-LOC115706290 GeneID:115706290 LOC115706290 LOC115706290
4 gene-LOC115707251 GeneID:115707251 LOC115707251 LOC115707251
5 gene-LOC115705027 GeneID:115705027 LOC115705027 LOC115705027
6 gene-LOC115705026 GeneID:115705026 LOC115705026 LOC115705026
    gene_biotype
     <character>
1         lncRNA
2 protein_coding
3 protein_coding
4 protein_coding
5 protein_coding
6 protein_coding
>
> tail(gene.info)
DataFrame with 6 rows and 5 columns
                      ID          Dbxref        Name        gene
             <character> <CharacterList> <character> <character>
[29808,] gene-A5N79_gr03 GeneID:27215487        rrnL        rrnL
[29809,] gene-A5N79_gr02 GeneID:27215494        rrn5        rrn5
[29810,] gene-A5N79_gr01 GeneID:27215493        rrnS        rrnS
[29811,] gene-A5N79_gp02 GeneID:27215495        sdh3        sdh3
[29812,] gene-A5N79_gp28 GeneID:27215502        nad2        nad2
[29813,] gene-A5N79_gp28 GeneID:27215502        nad2        nad2
           gene_biotype
            <character>
[29808,]           rRNA
[29809,]           rRNA
[29810,]           rRNA
[29811,] protein_coding
[29812,] protein_coding
[29813,] protein_coding
>
> # entries in column name/gene are thus symbols; the same you should have
>
> # define some fore- and background genes (as symbols)
> all.genes.symbol <- gene.info$gene
> foreground.genes.symbol <- all.genes.symbol[1:150]
> 
> # run enrichGO using gene symbols, and use the first 150 genes as foreground.
> # note that these 150 genes are all LOCs....
> head(foreground.genes.symbol)
[1] "LOC115705401" "LOC115705987" "LOC115706290" "LOC115707251"
[5] "LOC115705027" "LOC115705026"
>
> tail(foreground.genes.symbol)
[1] "LOC115707561" "LOC115706584" "LOC115706588" "LOC115706580"
[5] "LOC115707605" "LOC115706586"
>
> ego_bp_aga <- enrichGO(gene           = foreground.genes.symbol,
+                         universe      = all.genes.symbol,
+                         OrgDb         = Csativa,
+                         keyType       = "SYMBOL",
+                         ont           = "BP",
+                         pAdjustMethod = "BH",
+                         pvalueCutoff  = 1,
+                         qvalueCutoff  = 1,
+                         readable      = TRUE)
--> No gene can be mapped....
--> Expected input gene ID: ndhB,atpF,rpoC1,petB,psbB,nad4L
--> return NULL...
>
> # !! got same error as reported !!
>
> # ... but do these 150 LOCs have a GO annotation at all?
> # answer = NO!

ADD REPLY • link 3.7 years ago Guido Hooiveld ★ 4.1k

Part 2; continuation of Part 1 above...

> # !! got same error as reported !!
>
> # ... but do these 150 LOCs have a GO annotation at all?
> # answer = NO!
> 
> # OrgDb 'Csativa' downloaded from AnnotationHub as per code in 1st post. 
> AnnotationDbi::select(
+   Csativa,
+   keys = foreground.genes.symbol,
+   keytype = "SYMBOL",
+   column = c("SYMBOL", "ENTREZID", "GO")
+ )
'select()' returned 1:1 mapping between keys and columns
          SYMBOL  ENTREZID   GO
1   LOC115705401 115705401 <NA>
2   LOC115705987 115705987 <NA>
3   LOC115706290 115706290 <NA>
4   LOC115707251 115707251 <NA>
5   LOC115705027 115705027 <NA>
<<snip>>
146 LOC115706584 115706584 <NA>
147 LOC115706588 115706588 <NA>
148 LOC115706580 115706580 <NA>
149 LOC115707605 115707605 <NA>
150 LOC115706586 115706586 <NA>
>
>
> # Mmm, but how many genes do then have a GO annotation?
> annotation.all.genes <- AnnotationDbi::select(
+   Csativa,
+   keys = all.genes.symbol,
+   keytype = "SYMBOL",
+   column = c("SYMBOL", "ENTREZID", "GO")
+ )
'select()' returned many:many mapping between keys and columns
>
> dim(annotation.all.genes) # 35796 rows now (a gene gets one row per GO annotation)
[1] 35796     3
> 
> # how many unique entrez genes do have a GO annotation? 53 only!!
> dim ( annotation.all.genes[!is.na(annotation.all.genes$GO) & !duplicated(annotation.all.genes$ENTREZID ), ] )
[1] 53  3
> annotation.all.genes[!is.na(annotation.all.genes$GO) & !duplicated(annotation.all.genes$ENTREZID ), ]
      SYMBOL ENTREZID         GO
29755   nad1 27215501 GO:0016021
29761  rpl10 27215482 GO:0005739
29763   nad9 27215469 GO:0005739
<<snip>>
> # when based on symbols it is even a smaller number..! 35 only
> dim ( annotation.all.genes[!is.na(annotation.all.genes$GO) & !duplicated(annotation.all.genes$SYMBOL ), ] )
[1] 35  3
> annotation.all.genes[!is.na(annotation.all.genes$GO) & !duplicated(annotation.all.genes$SYMBOL ), ]
      SYMBOL ENTREZID         GO
29755   nad1 27215501 GO:0016021
29761  rpl10 27215482 GO:0005739
29763   nad9 27215469 GO:0005739
29774   atp1 27215489 GO:0005739
<<snip>>
> 
> # proof-of-concept it works when using genes that have a GO annotation: success!!
> all.genes.symbol.2 <- annotation.all.genes[!is.na(annotation.all.genes$GO) & !duplicated(annotation.all.genes$SYMBOL ), ]$SYMBOL
> foreground.genes.symbol.2 <- all.genes.symbol.2[1:10]
> ego_bp_aga <- enrichGO(gene           = foreground.genes.symbol.2,
+                         universe      = all.genes.symbol.2, #can also be all.genes.symbol (=all genes)
+                         OrgDb         = Csativa,
+                         keyType       = "SYMBOL",
+                         ont           = "BP",
+                         pAdjustMethod = "BH",
+                         pvalueCutoff  = 1,
+                         qvalueCutoff  = 1,
+                         readable      = TRUE)
> # Success!
> ego_bp_aga
#
# over-representation test
#
#...@organism    Cannabis sativa 
#...@ontology    BP 
#...@keytype     SYMBOL 
#...@gene        chr [1:10] "nad1" "rpl10" "nad9" "atp1" "mttB" "ccmB" "nad4L" "atp4" ...
#...pvalues adjusted by 'BH' with cutoff <1 
#...15 enriched terms found
'data.frame':   15 obs. of  9 variables:
 $ ID         : chr  "GO:0046034" "GO:0009058" "GO:0044249" "GO:0044271" ...
 $ Description: chr  "ATP metabolic process" "biosynthetic process" "cellular biosynthetic process" "cellular nitrogen compound biosynthetic process" ...
 $ GeneRatio  : chr  "3/6" "2/6" "2/6" "2/6" ...
 $ BgRatio    : chr  "10/27" "12/27" "12/27" "12/27" ...
 $ pvalue     : num  0.387 0.861 0.861 0.861 0.861 ...
 $ p.adjust   : num  1 1 1 1 1 1 1 1 1 1 ...
 $ qvalue     : num  1 1 1 1 1 1 1 1 1 1 ...
 $ geneID     : chr  "nad4L/atp4/atp6" "atp4/atp6" "atp4/atp6" "atp4/atp6" ...
 $ Count      : int  3 2 2 2 2 2 2 2 2 2 ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> # 2nd proof-of-concept: if at least 2 genes with GO annotation are added to the LOCs, also success!
> foreground.genes.symbol.add <- c("nad1", "atp4", foreground.genes.symbol)
>  ego_bp_aga <- enrichGO(gene           = foreground.genes.symbol.add,
+                          universe      = all.genes.symbol,
+                          OrgDb         = Csativa,
+                          keyType       = "SYMBOL",
+                          ont           = "BP",
+                          pAdjustMethod = "BH",
+                          pvalueCutoff  = 1,
+                          qvalueCutoff  = 1,
+                          readable      = TRUE)
> # Success!
> ego_bp_aga
#
# over-representation test
#
#...@organism    Cannabis sativa 
#...@ontology    BP 
#...@keytype     SYMBOL 
#...@gene        chr [1:152] "nad1" "atp4" "LOC115705401" "LOC115705987" "LOC115706290" ...
#...pvalues adjusted by 'BH' with cutoff <1 
#...15 enriched terms found
'data.frame':   15 obs. of  9 variables:
 $ ID         : chr  "GO:0046034" "GO:0009058" "GO:0044249" "GO:0044271" ...
 $ Description: chr  "ATP metabolic process" "biosynthetic process" "cellular biosynthetic process" "cellular nitrogen compound biosynthetic process" ...
 $ GeneRatio  : chr  "1/1" "1/1" "1/1" "1/1" ...
 $ BgRatio    : chr  "10/27" "12/27" "12/27" "12/27" ...
 $ pvalue     : num  0.37 0.444 0.444 0.444 0.444 ...
 $ p.adjust   : num  0.722 0.722 0.722 0.722 0.722 ...
 $ qvalue     : num  0.722 0.722 0.722 0.722 0.722 ...
 $ geneID     : chr  "atp4" "atp4" "atp4" "atp4" ...
 $ Count      : int  1 1 1 1 1 1 1 1 1 1 ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

>

ADD REPLY • link 3.7 years ago Guido Hooiveld ★ 4.1k

I have this, can this be used for GO in R?

115709282       LOC115709282    phenylalanine ammonia-lyase
115699364       LOC115699364    4-coumarate--CoA ligase 1
115725705       LOC115725705    U4/U6.U5 small nuclear ribonucleoprotein 27 kDa protein
115725515       LOC115725515    uncharacterized LOC115725515
115725459       LOC115725459    histone H2A.6
115725416       LOC115725416    protein-S-isoprenylcysteine O-methyltransferase A
115725321       LOC115725321    ubiquitin carboxyl-terminal hydrolase 23
115725219       LOC115725219    tryptophan decarboxylase TDC2
115725203       LOC115725203    YTH domain-containing protein ECT4
115725202       LOC115725202    cyclic nucleotide-gated ion channel 4

ADD REPLY • link 3.7 years ago Lucía ▴ 30

score 0 · Answer 3 · 2022-05-17

Hi, thanks so much for all your replies and running the code and confirming.

I have not seen any GO term annotations for Cannabis so far. Do you happen to know a good program to get the GO category mappings in the first place? Any advice would be appreciated.

ADD COMMENT • link 3.7 years ago Lucía ▴ 30

Since I almost exclusively work with data from model organisms, I have never needed to do this, so I don't have any hands-on experience with it. I know from the literature that the tool Blast2GO could do this. However, Blast2GO is no longer free to use.

Recently I heard about the tool TOA (Taxonomy-oriented Annotation). See here for the paper, and here for the accompanying code. It might be useful. The same goes for the web application TRAPID 2.0.

Good luck!

ADD REPLY • link 3.7 years ago Guido Hooiveld ★ 4.1k