Species supported by the clusterProfiler package
@poiskkirpitcha-22819

Hello!

I am working with Nothobranchius furzeri transcriptome data and want to run a GO/pathway enrichment analysis with the clusterProfiler R package. I started with the function search_kegg_organism. The documentation (https://www.rdocumentation.org/packages/clusterProfiler/versions/3.0.4/topics/searchkeggorganism) says that this function searches directly in the KEGG catalogue (https://www.genome.jp/kegg/catalog/org_list.html), where Nothobranchius furzeri is listed with the code 'nfu'. However,

search_kegg_organism('nfu', by='kegg_code')

didn't work. The output was:

> search_kegg_organism('nfu', by='kegg_code')
[1] kegg_code       scientific_name common_name    
<0 rows> (or 0-length row.names)

I tried other species and found that the function finds some organisms (e.g. 'mmu', 'dre') but not others (e.g. 'malb', 'els').
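
One way to check a code against KEGG itself, rather than against whatever table the installed package consults, is to read the organism list from the KEGG REST service. The sketch below is only illustrative: `parse_kegg_organisms` is a hypothetical helper, the column names are my own labels for the four tab-separated fields (T number, organism code, name, lineage), and the sample rows use placeholder T numbers so that the example runs without network access.

```r
# Hypothetical helper: parse lines in the tab-separated layout of the
# KEGG REST organism list (T number, organism code, name, lineage).
parse_kegg_organisms <- function(lines) {
  read.delim(text = lines, header = FALSE, quote = "",
             col.names = c("t_number", "kegg_code", "name", "lineage"),
             stringsAsFactors = FALSE)
}

# Toy rows in that layout. The T numbers are placeholders, not real KEGG entries.
toy <- c(
  "T00001\tmmu\tMus musculus (house mouse)\tEukaryotes;Animals;Vertebrates;Mammals",
  "T00002\tdre\tDanio rerio (zebrafish)\tEukaryotes;Animals;Vertebrates;Fishes",
  "T00003\tnfu\tNothobranchius furzeri (turquoise killifish)\tEukaryotes;Animals;Vertebrates;Fishes"
)
orgs <- parse_kegg_organisms(toy)

# Is the code present in the parsed list?
has_nfu <- "nfu" %in% orgs$kegg_code
print(has_nfu)

# Against the live service (needs network access), something like:
# orgs <- parse_kegg_organisms(readLines("https://rest.kegg.jp/list/organism"))
```

If a code shows up in the live list but not in the search_kegg_organism result, the gap is in the lookup table rather than in KEGG.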

What can I do about it? And does it mean that the package will not work with my species in general?

I would really appreciate it if you could help me.


As advised, I am attaching the sessionInfo() output:

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_Belgium.1252  LC_CTYPE=English_Belgium.1252    LC_MONETARY=English_Belgium.1252
[4] LC_NUMERIC=C                     LC_TIME=English_Belgium.1252

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] clusterProfiler_3.14.3 rtracklayer_1.46.0     GenomicRanges_1.38.0   GenomeInfoDb_1.22.0
 [5] IRanges_2.20.2         S4Vectors_0.24.3       BiocGenerics_0.32.0    goseq_1.38.0
 [9] geneLenDataBase_1.22.0 BiasedUrn_1.07

loaded via a namespace (and not attached):
  [1] nlme_3.1-142                bitops_1.0-6                matrixStats_0.55.0
  [4] enrichplot_1.6.1            bit64_0.9-7                 RColorBrewer_1.1-2
  [7] progress_1.2.2              httr_1.4.1                  tools_3.6.2
 [10] R6_2.4.1                    DBI_1.1.0                   lazyeval_0.2.2
 [13] mgcv_1.8-31                 colorspace_1.4-1            tidyselect_1.0.0
 [16] gridExtra_2.3               prettyunits_1.1.1           bit_1.1-15.1
 [19] curl_4.3                    compiler_3.6.2              Biobase_2.46.0
 [22] xml2_1.2.2                  DelayedArray_0.12.2         triebeard_0.3.0
 [25] scales_1.1.0                ggridges_0.5.2              askpass_1.1
 [28] rappdirs_0.3.1              stringr_1.4.0               digest_0.6.23
 [31] Rsamtools_2.2.1             DOSE_3.12.0                 XVector_0.26.0
 [34] pkgconfig_2.0.3             dbplyr_1.4.2                rlang_0.4.3
 [37] rstudioapi_0.10             RSQLite_2.2.0               gridGraphics_0.4-1
 [40] farver_2.0.3                jsonlite_1.6                BiocParallel_1.20.1
 [43] GOSemSim_2.12.0             dplyr_0.8.4                 RCurl_1.98-1.1
 [46] magrittr_1.5                ggplotify_0.0.4             GO.db_3.10.0
 [49] GenomeInfoDbData_1.2.2      Matrix_1.2-18               Rcpp_1.0.3
 [52] munsell_0.5.0               viridis_0.5.1               lifecycle_0.1.0
 [55] stringi_1.4.5               ggraph_2.0.0                MASS_7.3-51.5
 [58] SummarizedExperiment_1.16.1 zlibbioc_1.32.0             plyr_1.8.5
 [61] qvalue_2.18.0               BiocFileCache_1.10.2        grid_3.6.2
 [64] blob_1.2.1                  ggrepel_0.8.1               DO.db_2.9
 [67] crayon_1.3.4                lattice_0.20-38             cowplot_1.0.0
 [70] graphlayouts_0.5.0          Biostrings_2.54.0           splines_3.6.2
 [73] GenomicFeatures_1.38.1      hms_0.5.3                   pillar_1.4.3
 [76] fgsea_1.12.0                igraph_1.2.4.2              reshape2_1.4.3
 [79] biomaRt_2.42.0              fastmatch_1.1-0             XML_3.99-0.3
 [82] glue_1.3.1                  BiocManager_1.30.10         data.table_1.12.8
 [85] urltools_1.7.3              tweenr_1.0.1                vctrs_0.2.2
 [88] polyclip_1.10-0             gtable_0.3.0                openssl_1.4.1
 [91] purrr_0.3.3                 tidyr_1.0.2                 assertthat_0.2.1
 [94] ggplot2_3.2.1               ggforce_0.3.1               europepmc_0.3
 [97] tidygraph_1.1.2             viridisLite_0.3.0           tibble_2.1.3
[100] rvcheck_0.1.7               GenomicAlignments_1.22.1    AnnotationDbi_1.48.0
[103] memoise_1.1.0

@james-w-macdonald-5106

It's not clear to me what search_kegg_organism is for. It does list some of the available organisms, but it isn't getting those data from KEGG; it reads them from an environment that ships with the package. So unless that gets updated regularly, it is likely to become out of date. Anyway,

> zz <- download_KEGG("nfu")
> class(zz)
[1] "list"
> length(zz)
[1] 2
> sapply(zz, class)
KEGGPATHID2EXTID  KEGGPATHID2NAME 
    "data.frame"     "data.frame" 
> lapply(zz, head)
$KEGGPATHID2EXTID
      from        to
1 nfu00010 107372331
2 nfu00010 107373383
3 nfu00010 107374907
4 nfu00010 107375337
5 nfu00010 107375611
6 nfu00010 107375692

$KEGGPATHID2NAME
      from                                       to
1 nfu00010             Glycolysis / Gluconeogenesis
2 nfu00020                Citrate cycle (TCA cycle)
3 nfu00030                Pentose phosphate pathway
4 nfu00040 Pentose and glucuronate interconversions
5 nfu00051          Fructose and mannose metabolism
6 nfu00052                     Galactose metabolism

> sapply(zz, nrow)
KEGGPATHID2EXTID  KEGGPATHID2NAME 
           15868              538 

Seems like you can get lots of data for nfu anyway.
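
To illustrate what can be done with a pathway-to-gene table shaped like KEGGPATHID2EXTID above, here is a minimal over-representation sketch in base R using a one-sided hypergeometric test. It is not clusterProfiler's implementation: `pathway_ora` is a hypothetical helper, and the pathway and gene IDs are invented toy values so that the example runs offline. In practice one would more likely pass organism = "nfu" to the package's enrichKEGG function, which I would expect to go through the same download step, although this thread does not verify that.

```r
# Toy stand-in for zz$KEGGPATHID2EXTID (columns: from = pathway, to = gene).
# All IDs below are invented for illustration.
path2gene <- data.frame(
  from = c("p1", "p1", "p1", "p1", "p2", "p2", "p2", "p3", "p3"),
  to   = c("g1", "g2", "g3", "g4", "g4", "g5", "g6", "g7", "g8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: hypergeometric over-representation test per pathway,
# using all annotated genes as the background.
pathway_ora <- function(genes, path2gene) {
  universe <- unique(as.character(path2gene$to))
  genes <- intersect(unique(as.character(genes)), universe)
  sets <- split(as.character(path2gene$to), as.character(path2gene$from))
  res <- do.call(rbind, lapply(names(sets), function(id) {
    set <- unique(sets[[id]])
    k <- length(intersect(genes, set))
    # P(X >= k) when drawing length(genes) genes from the universe
    p <- phyper(k - 1, length(set), length(universe) - length(set),
                length(genes), lower.tail = FALSE)
    data.frame(pathway = id, overlap = k, set_size = length(set),
               p_value = p, stringsAsFactors = FALSE)
  }))
  res$p_adjust <- p.adjust(res$p_value, method = "BH")
  res[order(res$p_value), ]
}

# Genes of interest (toy IDs), e.g. a differentially expressed set
res <- pathway_ora(c("g1", "g2", "g3"), path2gene)
print(res)
```

With real data, `path2gene` would be replaced by zz$KEGGPATHID2EXTID, and the gene IDs supplied would need to be in the same identifier space as its `to` column.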
