clusterProfiler: enrichResult table lists genes that are not annotated with the GO term
Reb • 0
@reb-21949

Hi all,

I ran a GO enrichment analysis with clusterProfiler's enrichGO. My input is the results table from DESeq2, together with the Homo sapiens OrgDb:

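library(dplyr)
library(clusterProfiler)
library(org.Hs.eg.db)

sigLevel <- 0.05

# Background: every gene in the DESeq2 results table
univ <- Results %>% pull(ENSEMBL)
# Foreground: significant genes with log2 fold change of at least 2
geneset <- Results %>%
    filter(padj <= sigLevel & log2FoldChange >= 2) %>%
    pull(ENSEMBL)

ggo <- enrichGO(gene = geneset,
                universe = univ,
                OrgDb = org.Hs.eg.db,
                keyType = "ENSEMBL",
                ont = "BP",
                pvalueCutoff = sigLevel)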

This is working nicely, but when I check the output table from the enrichResult instance, I find some inconsistencies. For example, one of the enriched GO terms is:

| ID | Description | GeneRatio | BgRatio | geneID | Count |
|---|---|---|---|---|---|
| GO:0090148 | membrane fission | 2/75 | 11/14925 | ENSG00000183486/ENSG00000157601 | 2 |

However, when I look up the two genes, either in the org.Hs.eg.db dataset or on the Ensembl site, neither of them is actually associated with this GO term. I see the same thing for several of the enriched terms, and I don't understand how this can happen or how to solve it. Can anyone help?

Thanks a ton!

My session:

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C                      
[5] LC_TIME=Dutch_Netherlands.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] clusterProfiler_3.14.3      org.Hs.eg.db_3.10.0         AnnotationDbi_1.48.0        RColorBrewer_1.1-2          pheatmap_1.0.12            
 [6] DESeq2_1.26.0               SummarizedExperiment_1.16.1 DelayedArray_0.12.2         BiocParallel_1.20.1         matrixStats_0.55.0         
[11] Biobase_2.46.0              GenomicRanges_1.38.0        GenomeInfoDb_1.22.0         IRanges_2.20.2              S4Vectors_0.24.3           
[16] BiocGenerics_0.32.0         forcats_0.4.0               stringr_1.4.0               dplyr_0.8.3                 purrr_0.3.3                
[21] readr_1.3.1                 tidyr_1.0.0                 tibble_2.1.3                ggplot2_3.2.1               tidyverse_1.3.0            

loaded via a namespace (and not attached):
  [1] readxl_1.3.1           backports_1.1.5        Hmisc_4.3-0            fastmatch_1.1-0        plyr_1.8.5             igraph_1.2.4.2        
  [7] lazyeval_0.2.2         splines_3.6.2          urltools_1.7.3         digest_0.6.23          htmltools_0.4.0        GOSemSim_2.12.0       
 [13] viridis_0.5.1          GO.db_3.7.0            fansi_0.4.1            magrittr_1.5           checkmate_1.9.4        memoise_1.1.0         
 [19] cluster_2.1.0          annotate_1.64.0        graphlayouts_0.5.0     modelr_0.1.5           enrichplot_1.6.1       prettyunits_1.1.1     
 [25] jpeg_0.1-8.1           colorspace_1.4-1       blob_1.2.1             rvest_0.3.5            ggrepel_0.8.1          haven_2.2.0           
 [31] xfun_0.12              crayon_1.3.4           RCurl_1.98-1.1         jsonlite_1.6           genefilter_1.68.0      zeallot_0.1.0         
 [37] survival_3.1-8         glue_1.3.1             polyclip_1.10-0        gtable_0.3.0           zlibbioc_1.32.0        XVector_0.26.0        
 [43] scales_1.1.0           DOSE_3.12.0            DBI_1.1.0              Rcpp_1.0.3             viridisLite_0.3.0      xtable_1.8-4          
 [49] progress_1.2.2         htmlTable_1.13.3       gridGraphics_0.4-1     foreign_0.8-75         bit_1.1-15.1           europepmc_0.3         
 [55] Formula_1.2-3          htmlwidgets_1.5.1      httr_1.4.1             fgsea_1.12.0           ellipsis_0.3.0         acepack_1.4.1         
 [61] pkgconfig_2.0.3        XML_3.99-0.3           farver_2.0.3           nnet_7.3-12            dbplyr_1.4.2           locfit_1.5-9.1        
 [67] labeling_0.3           ggplotify_0.0.4        tidyselect_0.2.5       rlang_0.4.2            reshape2_1.4.3         munsell_0.5.0         
 [73] cellranger_1.1.0       tools_3.6.2            cli_2.0.1              generics_0.0.2         RSQLite_2.2.0          ggridges_0.5.2        
 [79] broom_0.5.3            knitr_1.27             bit64_0.9-7            fs_1.3.1               tidygraph_1.1.2        ggraph_2.0.0          
 [85] nlme_3.1-143           DO.db_2.9              xml2_1.2.2             compiler_3.6.2         rstudioapi_0.10        png_0.1-7             
 [91] reprex_0.3.0           tweenr_1.0.1           geneplotter_1.64.0     stringi_1.4.4          lattice_0.20-38        Matrix_1.2-18         
 [97] vctrs_0.2.1            pillar_1.4.3           lifecycle_0.1.0        BiocManager_1.30.10    triebeard_0.3.0        data.table_1.12.8     
[103] cowplot_1.0.0          bitops_1.0-6           qvalue_2.18.0          R6_2.4.1               latticeExtra_0.6-29    gridExtra_2.3         
[109] MASS_7.3-51.5          assertthat_0.2.1       withr_2.1.2            GenomeInfoDbData_1.2.0 hms_0.5.3              grid_3.6.2            
[115] rpart_4.1-15           rvcheck_0.1.7          ggforce_0.3.1          lubridate_1.7.4        base64enc_0.1-3       

@james-w-macdonald-5106

The Gene Ontology is a directed acyclic graph, and a gene that is annotated to a child term is implicitly annotated to all of that term's ancestor terms as well. So a gene can be counted under a term such as 'membrane fission' without being directly annotated to it. First let's see if those two genes really are annotated to that term:

> library(org.Hs.eg.db)
## note here I use GOALL, which returns each gene's direct GO terms plus all ancestor terms of those
> z <- select(org.Hs.eg.db, c("ENSG00000183486","ENSG00000157601"), "GOALL", "ENSEMBL")
'select()' returned 1:many mapping between keys and columns
> z[z[,2] %in% "GO:0090148",]
            ENSEMBL      GOALL EVIDENCEALL ONTOLOGYALL
203 ENSG00000183486 GO:0090148         IBA          BP
424 ENSG00000157601 GO:0090148         IBA          BP

So both of those genes are annotated to membrane fission, just not directly. They are annotated to mitochondrial membrane fission, which is a direct GO term for both genes and a child term of membrane fission.
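If you want to see which direct annotation a gene "inherits" a term from, something along these lines should work. This is only a sketch, assuming GO.db is installed alongside org.Hs.eg.db; the variable names (genes, target, direct, anc, via) are just illustrative, and the exact rows returned will depend on your annotation package versions. It pulls the direct annotations (the GO column rather than GOALL) and keeps the ones that have the enriched term among their ancestors:

```r
library(org.Hs.eg.db)
library(GO.db)

genes  <- c("ENSG00000183486", "ENSG00000157601")
target <- "GO:0090148"  # membrane fission, the enriched term from the table

# Direct annotations only: the "GO" column, as opposed to "GOALL".
# AnnotationDbi::select is spelled out because dplyr masks select().
direct <- AnnotationDbi::select(org.Hs.eg.db, keys = genes,
                                columns = "GO", keytype = "ENSEMBL")
direct <- direct[direct$ONTOLOGY %in% "BP", ]

# For every directly annotated BP term, look up all of its ancestor terms
anc <- AnnotationDbi::mget(unique(direct$GO), GOBPANCESTOR, ifnotfound = NA)

# Keep the direct terms that have the target term among their ancestors
via <- names(anc)[vapply(anc, function(a) target %in% a, logical(1))]

# Rows showing which gene reaches the target through which direct term
direct[direct$GO %in% via, ]

# Human-readable names of those direct terms (skipped if none were found)
if (length(via) > 0) {
    AnnotationDbi::select(GO.db, keys = via, columns = "TERM", keytype = "GOID")
}
```

The fully qualified AnnotationDbi::select() call is deliberate: with dplyr or the tidyverse attached, as in the session above, a bare select() may dispatch to dplyr's version and fail.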


Dear James, thanks a lot! I was really confused, but that does make sense.
