Species supported by the clusterProfiler package

I am working with Nothobranchius furzeri transcriptome data and want to run a GO pathway enrichment analysis using the clusterProfiler R package. I started with the function 'search_kegg_organism'. The documentation (https://www.rdocumentation.org/packages/clusterProfiler/versions/3.0.4/topics/searchkeggorganism) says that this function searches directly in the KEGG catalogue (https://www.genome.jp/kegg/catalog/org_list.html), where Nothobranchius furzeri is listed with the code 'nfu'. However,

search_kegg_organism('nfu', by='kegg_code')

didn't work. The output was:

> search_kegg_organism('nfu', by='kegg_code')
[1] kegg_code       scientific_name common_name    
<0 rows> (or 0-length row.names)

I tried other species codes and found that the function finds some organisms (e.g. 'mmu', 'dre') but not others (e.g. 'malb', 'els').

What can I do about it? Does this mean that the package will not work with my species at all?

I would really appreciate your help.

As advised, I am attaching the sessionInfo() output:

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_Belgium.1252  LC_CTYPE=English_Belgium.1252    LC_MONETARY=English_Belgium.1252
[4] LC_NUMERIC=C                     LC_TIME=English_Belgium.1252

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] clusterProfiler_3.14.3 rtracklayer_1.46.0     GenomicRanges_1.38.0   GenomeInfoDb_1.22.0
 [5] IRanges_2.20.2         S4Vectors_0.24.3       BiocGenerics_0.32.0    goseq_1.38.0
 [9] geneLenDataBase_1.22.0 BiasedUrn_1.07

loaded via a namespace (and not attached):
  [1] nlme_3.1-142                matrixStats_0.55.0 is listed below in original order
  [1] nlme_3.1-142                bitops_1.0-6             matrixStats_0.55.0
  [4] enrichplot_1.6.1            bit64_0.9-7              RColorBrewer_1.1-2
  [7] progress_1.2.2              httr_1.4.1               tools_3.6.2
 [10] R6_2.4.1                    DBI_1.1.0                lazyeval_0.2.2
 [13] mgcv_1.8-31                 colorspace_1.4-1         tidyselect_1.0.0
 [16] gridExtra_2.3               prettyunits_1.1.1        bit_1.1-15.1
 [19] curl_4.3                    compiler_3.6.2           Biobase_2.46.0
 [22] xml2_1.2.2                  DelayedArray_0.12.2      triebeard_0.3.0
 [25] scales_1.1.0                ggridges_0.5.2           askpass_1.1
 [28] rappdirs_0.3.1              stringr_1.4.0            digest_0.6.23
 [31] Rsamtools_2.2.1             DOSE_3.12.0              XVector_0.26.0
 [34] pkgconfig_2.0.3             dbplyr_1.4.2             rlang_0.4.3
 [37] rstudioapi_0.10             RSQLite_2.2.0            gridGraphics_0.4-1
 [40] farver_2.0.3                jsonlite_1.6             BiocParallel_1.20.1
 [43] GOSemSim_2.12.0             dplyr_0.8.4              RCurl_1.98-1.1
 [46] magrittr_1.5                ggplotify_0.0.4          GO.db_3.10.0
 [49] GenomeInfoDbData_1.2.2      Matrix_1.2-18            Rcpp_1.0.3
 [52] munsell_0.5.0               viridis_0.5.1            lifecycle_0.1.0
 [55] stringi_1.4.5               ggraph_2.0.0             MASS_7.3-51.5
 [58] SummarizedExperiment_1.16.1 zlibbioc_1.32.0          plyr_1.8.5
 [61] qvalue_2.18.0               BiocFileCache_1.10.2     grid_3.6.2
 [64] blob_1.2.1                  ggrepel_0.8.1            DO.db_2.9
 [67] crayon_1.3.4                lattice_0.20-38          cowplot_1.0.0
 [70] graphlayouts_0.5.0          Biostrings_2.54.0        splines_3.6.2
 [73] GenomicFeatures_1.38.1      hms_0.5.3                pillar_1.4.3
 [76] fgsea_1.12.0                igraph_1.2.4.2           reshape2_1.4.3
 [79] biomaRt_2.42.0              fastmatch_1.1-0          XML_3.99-0.3
 [82] glue_1.3.1                  BiocManager_1.30.10      data.table_1.12.8
 [85] urltools_1.7.3              tweenr_1.0.1             vctrs_0.2.2
 [88] polyclip_1.10-0             gtable_0.3.0             openssl_1.4.1
 [91] purrr_0.3.3                 tidyr_1.0.2              assertthat_0.2.1
 [94] ggplot2_3.2.1               ggforce_0.3.1            europepmc_0.3
 [97] tidygraph_1.1.2             viridisLite_0.3.0        tibble_2.1.3
[100] rvcheck_0.1.7               GenomicAlignments_1.22.1 AnnotationDbi_1.48.0
[103] memoise_1.1.0


It's not clear what search_kegg_organism is for. It does list some of the available organisms, but it doesn't get those data from KEGG; it reads them from an environment that ships with the package. So unless that environment is updated regularly, the list is likely to go out of date. Anyway,

> zz <- download_KEGG("nfu")
> class(zz)
[1] "list"
> length(zz)
[1] 2
> sapply(zz, class)
    "data.frame"     "data.frame" 
> lapply(zz, head)
      from        to
1 nfu00010 107372331
2 nfu00010 107373383
3 nfu00010 107374907
4 nfu00010 107375337
5 nfu00010 107375611
6 nfu00010 107375692

      from                                       to
1 nfu00010             Glycolysis / Gluconeogenesis
2 nfu00020                Citrate cycle (TCA cycle)
3 nfu00030                Pentose phosphate pathway
4 nfu00040 Pentose and glucuronate interconversions
5 nfu00051          Fructose and mannose metabolism
6 nfu00052                     Galactose metabolism

> sapply(zz, nrow)
           15868              538 

Seems like you can get lots of data for nfu anyway.
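
To follow up on both points, here is a minimal sketch rather than a definitive recipe. The first helper checks an organism code against KEGG's live organism list (the REST endpoint https://rest.kegg.jp/list/organism, which returns tab-separated T number, code, name and lineage) instead of the copy bundled with the package. The second passes the two tables from download_KEGG("nfu") to clusterProfiler's enricher(). The function names parse_kegg_organisms, kegg_code_is_listed and nfu_kegg_enrichment are made up for this example, and I'm assuming your gene IDs are NCBI Gene IDs like the ones in the 'to' column above.

```r
# Parse the tab-separated text returned by KEGG's /list/organism endpoint.
# Assumed column order: T number, organism code, name, lineage.
parse_kegg_organisms <- function(lines) {
  read.delim(text = lines, header = FALSE, quote = "",
             stringsAsFactors = FALSE,
             col.names = c("t_number", "kegg_code", "name", "lineage"))
}

# Check a code against the live KEGG list (needs network access).
kegg_code_is_listed <- function(code) {
  lines <- readLines("https://rest.kegg.jp/list/organism")
  code %in% parse_kegg_organisms(lines)$kegg_code
}

# Over-representation test against the nfu pathway tables (needs network
# access and clusterProfiler). 'genes' is assumed to hold NCBI Gene IDs.
nfu_kegg_enrichment <- function(genes) {
  if (!requireNamespace("clusterProfiler", quietly = TRUE)) {
    stop("clusterProfiler is required for this sketch")
  }
  kegg <- clusterProfiler::download_KEGG("nfu")
  clusterProfiler::enricher(gene = as.character(genes),
                            TERM2GENE = kegg[[1]],  # pathway -> gene
                            TERM2NAME = kegg[[2]])  # pathway -> description
}

# Usage (not run here, because both calls go out to KEGG):
# kegg_code_is_listed("nfu")
# res <- nfu_kegg_enrichment(my_de_gene_ids)  # my_de_gene_ids: your own vector
# head(as.data.frame(res))
```

Calling enrichKEGG(gene, organism = "nfu") directly may be the shorter route, since as far as I can tell it downloads the same tables internally, but I have not verified that for this species.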

