enrichGO returning NA
scanchi • 0
@scanchi-14313
Last seen 6.2 years ago

When I run the enrichGO function on a set of DEGs, I get NA in the 'Description' column for a few GO term IDs. How can I fix this? Alternatively, is there a way to skip the NA terms when visualizing with dotplot?

head(condition.control_dge)
ENTREZID  SYMBOL
    6536  SLC6A9
    4601    MXI1
    3799   KIF5B
    4286    MITF
    9509 ADAMTS2
   22834  ZNF652

condition.control_ego <- enrichGO(condition.control_dge$ENTREZID, 'org.Hs.eg.db', ont = "BP",
                                  pvalueCutoff = 0.05, qvalueCutoff = 0.05, pAdjustMethod = "BH")

head(condition.control_ego)

ID          Description                              GeneRatio  BgRatio    pvalue    p.adjust  qvalue
GO:0051650  establishment of vesicle localization    153/6595   251/17381  9.23E-14  5.62E-10  4.67E-10
GO:0051648  vesicle localization                     159/6595   265/17381  2.05E-13  6.23E-10  5.17E-10
GO:0051656  establishment of organelle localization  243/6595   450/17381  1.90E-12  3.85E-09  3.20E-09
GO:0042063  gliogenesis                              150/6595   254/17381  5.60E-12  6.99E-09  5.81E-09
GO:0006914  autophagy                                250/6595   470/17381  7.57E-12  6.99E-09  5.81E-09
GO:0061919  NA                                       250/6595   470/17381  7.57E-12  6.99E-09  5.81E-09

The GO term shown with NA returns a non-empty result when I query GO.db directly:

       GOID                                   TERM
 GO:0061919   process utilizing autophagic mechanism
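
For readers who want to repeat this check, the lookup above can be sketched as follows. This is a minimal sketch, not the code the poster ran: `lookup_go_terms` is a hypothetical helper name, and it assumes the `GO.db` and `AnnotationDbi` packages are installed (it falls back to NA terms if they are not). The two IDs are the ones reported with NA descriptions in this thread.

```r
# Hypothetical helper: look up term names for GO IDs in the installed GO.db.
# Returns a data frame with columns GOID and TERM.
lookup_go_terms <- function(go_ids) {
  have_pkgs <- requireNamespace("GO.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)
  if (!have_pkgs) {
    # Packages unavailable: return NA for every term instead of failing.
    return(data.frame(GOID = go_ids, TERM = NA_character_,
                      stringsAsFactors = FALSE))
  }
  # IDs unknown to this GO.db release come back with TERM = NA,
  # as long as at least one of the queried IDs is valid.
  suppressMessages(
    AnnotationDbi::select(GO.db::GO.db, keys = go_ids,
                          columns = "TERM", keytype = "GOID")
  )
}

terms <- lookup_go_terms(c("GO:0061919", "GO:0110053"))
print(terms)
```

If a term comes back as NA here, the installed GO.db release probably does not contain it; if it comes back with a name, the missing description is arising somewhere else.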

 

sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)

Matrix products: default
BLAS: /projects/builder-group/jpg/R-bioconductor/lib64/R/lib/libRblas.so
LAPACK: /projects/builder-group/jpg/R-bioconductor/lib64/R/lib/libRlapack.so
locale:
[1] C
attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base
other attached packages:
 [1] GO.db_3.5.0           org.Hs.eg.db_3.5.0    BiocInstaller_1.28.0
 [4] AnnotationDbi_1.40.0  IRanges_2.12.0        S4Vectors_0.16.0
 [7] Biobase_2.38.0        BiocGenerics_0.24.0   clusterProfiler_3.6.0
[10] DOSE_3.4.0            edgeR_3.20.3          limma_3.34.5

 

clusterprofiler org.hs.eg.db • 1.8k views
Guangchuang Yu ★ 1.2k
@guangchuang-yu-5419
Last seen 6 days ago
China/Guangzhou/Southern Medical Univer…
r$> clusterProfiler::go2term('GO:0061919')
       go_id                                   Term
1 GO:0061919 process utilizing autophagic mechanism

r$> bitr('GO:0006914', fromType = "GO", toType='ENTREZID', OrgDb='org.Hs.eg.db') -> id
'select()' returned 1:many mapping between keys and columns

r$> x = enrichGO(id$ENTREZID, OrgDb = 'org.Hs.eg.db', ont='BP')

r$> head(x[, 1:2], 3)
                   ID                            Description
GO:0006914 GO:0006914                              autophagy
GO:0061919 GO:0061919 process utilizing autophagic mechanism
GO:0010506 GO:0010506                regulation of autophagy

I can't reproduce your issue. Can you follow the guide and provide a reproducible example?


Thank you for the quick response. Here is a minimal gene list and the code that reproduces the problem, albeit with a different GO term. I apologize for the long list.

dput(condition.control_dge_subset)
structure(list(ENTREZID = c("10051", "10052", "100526820", "100527963",
"10056", "10057", "10059", "1006", "100628315", "100652929",
"10066", "1007", "10072", "10073", "10076", "100775104", "10079",
"1008", "100846978", "10085", "10086", "100862662", "100862671",
"100874047", "100874058", "100874198", "100874251", "100885776",
"100885848", "100887750", "1009", "10093", "10094", "10095",
"10097", "10098", "100996251", "100996385", "100996571", "100996717",
"10102", "10106", "101060200", "101060684", "10109", "10116",
"1012", "10120", "10121", "10123", "10125", "10126", "10128",
"10129", "10131", "10140", "101410534", "10142", "10146", "10147",
"10149", "10152", "10154", "10156", "10159", "1016", "10163",
"10169", "1017", "10174", "10175", "10179", "10180", "10181",
"10188", "10189", "1019", "10190", "101926892", "101926964",
"101927111", "101927150", "101927151", "101927257", "101927314",
"101927356", "101927487", "101927532", "101927550", "101927580",
"101927599", "101927692", "101927745", "101927751", "101927768",
"101927832", "101927924", "101927973", "101928053", "101928069",
"101928079"), SYMBOL = c("SMC4", "GJC1", "CAHM", "PMF1-BGLAP",
"FARSB", "ABCC5", "DNM1L", "CDH8", "DNM3OS", "LINC02078", "SCAMP2",
"CDH9", "DPP3", "SNUPN", "PTPRU", "KLHL7-AS1", "ATP9A", "CDH10",
"LINC00506", "EDIL3", "HHLA1", "ALDH1L1-AS2", "TMEM265", "LINC00499",
"COX10-AS1", "SHANK2-AS1", "KIRREL3-AS2", "UGDH-AS1", "PTGES3L",
"MRPS31P5", "CDH11", "ARPC4", "ARPC3", "ARPC1B", "ACTR2", "TSPAN5",
"LOC100996251", "LOC100996385", "CYYR1-AS1", "LOC100996717",
"TSFM", "CTDSP2", "ZNF891", "NBPF26", "ARPC2", "FEM1B", "CDH13",
"ACTR1B", "ACTR1A", "ARL4C", "RASGRP1", "DNAL4", "LRPPRC", "FRY",
"TRAP1", "TOB1", "DLGAP1-AS4", "AKAP9", "G3BP1", "SUGP2", "ADGRG2",
"ABI2", "PLXNC1", "RASA4", "ATP6AP2", "CDH18", "WASF2", "SERF2",
"CDK2", "SORBS3", "CNIH1", "RBM7", "RBM6", "RBM5", "TNK2", "ALYREF",
"CDK4", "TXNDC9", "LOC101926892", "LOC101926964", "SUCLG2-AS1",
"L3MBTL4-AS1", "LOC101927151", "LOC101927257", "LOC101927314",
"LOC101927356", "LINC01609", "LINC01736", "LOC101927550", "LINC02141",
"ZNF529-AS1", "LOC101927692", "LOC101927745", "LOC101927751",
"HDAC2-AS2", "PAQR9-AS1", "LINC01856", "LINC01695", "LOC101928053",
"LOC101928069", "SLC44A3-AS1")), .Names = c("ENTREZID", "SYMBOL"
), row.names = c(NA, -101L), class = "data.frame")
##The code with the error.
condition.control_dge_subset_ego <- enrichGO(condition.control_dge_subset$ENTREZID,
                                             'org.Hs.eg.db', ont = "BP",
                                             pvalueCutoff = 0.05, pAdjustMethod = "BH")
head(condition.control_dge_subset_ego)

                 ID                                          Description
GO:0034314 GO:0034314             Arp2/3 complex-mediated actin nucleation
GO:0045010 GO:0045010                                     actin nucleation
GO:0030838 GO:0030838 positive regulation of actin filament polymerization
GO:0110053 GO:0110053                                                 <NA>
GO:0032970 GO:0032970           regulation of actin filament-based process
GO:0051258 GO:0051258                               protein polymerization
           GeneRatio   BgRatio       pvalue     p.adjust       qvalue
GO:0034314      6/57  36/17381 1.712962e-09 1.733517e-06 1.548878e-06
GO:0045010      6/57  48/17381 1.047071e-08 5.298179e-06 4.733863e-06
GO:0030838      6/57  91/17381 5.103713e-07 1.721653e-04 1.538277e-04
GO:0110053      8/57 234/17381 8.964054e-07 2.267906e-04 2.026348e-04
GO:0032970      9/57 338/17381 1.414596e-06 2.350575e-04 2.100212e-04
GO:0051258      8/57 249/17381 1.429376e-06 2.350575e-04 2.100212e-04
Guangchuang Yu ★ 1.2k
> ENTREZID = c("10051", "10052", "100526820", "100527963",
+ "10056", "10057", "10059", "1006", "100628315", "100652929",
+ "10066", "1007", "10072", "10073", "10076", "100775104", "10079",
+ "1008", "100846978", "10085", "10086", "100862662", "100862671",
+ "100874047", "100874058", "100874198", "100874251", "100885776",
+ "100885848", "100887750", "1009", "10093", "10094", "10095",
+ "10097", "10098", "100996251", "100996385", "100996571", "100996717",
+ "10102", "10106", "101060200", "101060684", "10109", "10116",
+ "1012", "10120", "10121", "10123", "10125", "10126", "10128",
+ "10129", "10131", "10140", "101410534", "10142", "10146", "10147",
+ "10149", "10152", "10154", "10156", "10159", "1016", "10163",
+ "10169", "1017", "10174", "10175", "10179", "10180", "10181",
+ "10188", "10189", "1019", "10190", "101926892", "101926964",
+ "101927111", "101927150", "101927151", "101927257", "101927314",
+ "101927356", "101927487", "101927532", "101927550", "101927580",
+ "101927599", "101927692", "101927745", "101927751", "101927768",
+ "101927832", "101927924", "101927973", "101928053", "101928069",
+ "101928079")
> x = enrichGO(ENTREZID, OrgDb = 'org.Hs.eg.db', ont='BP')
> x$Description %>% is.na %>% sum
[1] 0

I still can't reproduce your issue.


I am not sure what I could be doing wrong. I have not had any issues with this package before. The only difference from last time is that I updated the Bioconductor packages to the current version (3.6.0). Is there a way to remove the 'NA' terms when plotting with dotplot, or to remove the rows with NA from the 'enrichResult' object? Here is the full sessionInfo in case it helps.

sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)

Matrix products: default
BLAS: /projects/builder-group/jpg/R-bioconductor/lib64/R/lib/libRblas.so
LAPACK: /projects/builder-group/jpg/R-bioconductor/lib64/R/lib/libRlapack.so

locale:
[1] C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] edgeR_3.20.3          limma_3.34.5          clusterProfiler_3.6.0
 [4] DOSE_3.4.0            org.Hs.eg.db_3.5.0    AnnotationDbi_1.40.0
 [7] IRanges_2.12.0        S4Vectors_0.16.0      Biobase_2.38.0
[10] BiocGenerics_0.24.0

loaded via a namespace (and not attached):
 [1] bitops_1.0-6               matrixStats_0.52.2
 [3] bit64_0.9-7                RColorBrewer_1.1-2
 [5] httr_1.3.1                 GenomeInfoDb_1.14.0
 [7] tools_3.4.0                backports_1.1.2
 [9] R6_2.2.2                   rpart_4.1-11
[11] Hmisc_4.0-3                DBI_0.7
[13] lazyeval_0.2.1             colorspace_1.3-2
[15] nnet_7.3-12                graphite_1.24.1
[17] gridExtra_2.3              DESeq2_1.18.1
[19] compiler_3.4.0             bit_1.1-12
[21] graph_1.56.0               htmlTable_1.11.0
[23] DelayedArray_0.4.1         scales_0.5.0
[25] checkmate_1.8.5            genefilter_1.60.0
[27] rappdirs_0.3.1             stringr_1.2.0
[29] digest_0.6.13              foreign_0.8-69
[31] XVector_0.18.0             base64enc_0.1-3
[33] pkgconfig_2.0.1            htmltools_0.3.6
[35] htmlwidgets_0.9            rlang_0.1.6
[37] rstudioapi_0.7             RSQLite_2.0
[39] bindr_0.1                  BiocParallel_1.12.0
[41] acepack_1.4.1              GOSemSim_2.4.0
[43] dplyr_0.7.4                RCurl_1.95-4.9
[45] magrittr_1.5               GO.db_3.5.0
[47] GenomeInfoDbData_1.0.0     Formula_1.2-2
[49] Matrix_1.2-12              Rcpp_0.12.14
[51] munsell_0.4.3              stringi_1.1.6
[53] SummarizedExperiment_1.8.1 zlibbioc_1.24.0
[55] plyr_1.8.4                 qvalue_2.10.0
[57] grid_3.4.0                 blob_1.1.0
[59] DO.db_2.9                  lattice_0.20-35
[61] splines_3.4.0              annotate_1.56.1
[63] locfit_1.5-9.1             knitr_1.17
[65] pillar_1.0.1               fgsea_1.4.0
[67] igraph_1.1.2               GenomicRanges_1.30.1
[69] geneplotter_1.56.0         reshape2_1.4.3
[71] fastmatch_1.1-0            XML_3.98-1.9
[73] glue_1.2.0                 latticeExtra_0.6-28
[75] data.table_1.10.4-3        gtable_0.2.0
[77] purrr_0.2.4                tidyr_0.7.2
[79] assertthat_0.2.0           ggplot2_2.2.1
[81] ReactomePA_1.22.0          xtable_1.8-2
[83] reactome.db_1.62.0         survival_2.41-3
[85] tibble_1.4.1               rvcheck_0.0.9
[87] memoise_1.1.0              bindrcpp_0.2
[89] cluster_2.0.6
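
As a possible workaround for the question above, rows with a missing description can be filtered out before plotting. The sketch below is not taken from the thread's answers. It runs on a plain data frame that stands in for the result table, with values copied from the output shown earlier. The commented lines at the end assume the enrichment table is stored in the `result` slot of the `enrichResult` object; verify that against your installed version before relying on it.

```r
# Toy stand-in for the enrichment table; values copied from the output above.
res <- data.frame(
  ID = c("GO:0034314", "GO:0045010", "GO:0030838", "GO:0110053"),
  Description = c("Arp2/3 complex-mediated actin nucleation",
                  "actin nucleation",
                  "positive regulation of actin filament polymerization",
                  NA),
  p.adjust = c(1.733517e-06, 5.298179e-06, 1.721653e-04, 2.267906e-04),
  stringsAsFactors = FALSE
)

# Keep only the rows whose Description is present.
res_clean <- res[!is.na(res$Description), , drop = FALSE]
print(res_clean$ID)

# On a real enrichResult (assumption: the table lives in the S4 slot `result`),
# the same filter could be applied in place before plotting:
#   ego <- condition.control_dge_subset_ego
#   ego@result <- ego@result[!is.na(ego@result$Description), ]
#   dotplot(ego)
```

This only hides the affected terms from the plot. It does not explain why their descriptions are missing, so checking that the annotation packages come from the same Bioconductor release is still worthwhile.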

 
