Question

clusterProfiler: different runs return different results for gse functions

0

Entering edit mode

ssv • 0

@ssv-13861

Last seen 6.0 years ago

Hi everyone!

I've been playing around a bit with clusterProfiler to work with RNAseq data. When running my own data I noticed that when I ran the same function repeatedly, I would get different results. This happens for both gseKEGG and gseGO, even when I directly copy the commands I used a minute before, different gene sets are claimed to be enriched. I tried the same with the sample data set that they use in examples for this package and saw the same effect and also tried increasing the number of permutations, which didn't really seem to help. I think I'm missing something obvious - could someone help me out?

Below is the code I was using and the output, as well as the output of sessionInfo if that helps any:

data(geneList, package="DOSE")

gseGO(geneList = geneList, OrgDb = org.Hs.eg.db, ont = "CC", nPerm = 1000, minGSSize = 100, maxGSSize = 500, pvalueCutoff = 0.05, verbose = FALSE)

Output 1

# # Gene Set Enrichment Analysis # #...@organism Homo sapiens #...@setType CC #...@keytype ENTREZID #...@geneList Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ... - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ... #...nPerm 1000 #...pvalues adjusted by 'BH' with cutoff <0.05 #...34 enriched terms found 'data.frame': 34 obs. of 11 variables: $ ID : chr "GO:0031012" "GO:0005578" "GO:0042383" "GO:0000776" ... $ Description : chr "extracellular matrix" "proteinaceous extracellular matrix" "sarcolemma" "kinetochore" ... $ setSize : int 399 268 108 103 127 101 121 109 119 139 ... $ enrichmentScore: num -0.492 -0.557 -0.463 0.675 0.407 ... $ NES : num -2.15 -2.36 -1.76 2.73 1.73 ... $ pvalue : num 0.00128 0.00131 0.00146 0.00309 0.0031 ... $ p.adjust : num 0.0295 0.0295 0.0295 0.0295 0.0295 ... $ qvalues : num 0.0219 0.0219 0.0219 0.0219 0.0219 ... $ rank : num 1918 1918 2526 449 1905 ... $ leading_edge : chr "tags=35%, list=15%, signal=31%" "tags=43%, list=15%, signal=37%" "tags=34%, list=20%, signal=28%" "tags=26%, list=4%, signal=25%" ... $ core_enrichment: chr "4017/6649/1288/4811/57124/3910/51162/3371/1973/1291/1397/1301/22918/56547/7474/3490/79875/7450/80781/79625/1490"| __truncated__ "4017/1288/4811/57124/3910/3371/1291/1301/56547/7474/79875/7450/80781/79625/1490/1280/1306/4314/80070/8425/977/4"| __truncated__ "5532/3880/6443/5530/6326/10211/825/6548/1291/7402/7779/8082/2318/80024/844/8910/79026/6717/3752/481/1293/358/47"| __truncated__ "1062/10403/55355/220134/4751/79019/55839/54821/4085/81930/81620/332/7272/9212/891/11004/5347/701/11130/79682/57"| __truncated__ ...

Output2

# Gene Set Enrichment Analysis # #...@organism Homo sapiens #...@setType CC #...@keytype ENTREZID #...@geneList Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ... - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ... #...nPerm 1000 #...pvalues adjusted by 'BH' with cutoff <0.05 #...39 enriched terms found 'data.frame': 39 obs. of 11 variables: $ ID : chr "GO:0031012" "GO:0005578" "GO:0042383" "GO:0005788" ... $ Description : chr "extracellular matrix" "proteinaceous extracellular matrix" "sarcolemma" "endoplasmic reticulum lumen" ... $ setSize : int 399 268 108 166 103 101 109 115 115 119 ... $ enrichmentScore: num -0.492 -0.557 -0.463 -0.397 0.675 ... $ NES : num -2.15 -2.35 -1.76 -1.59 2.78 ... $ pvalue : num 0.00123 0.00132 0.00153 0.00285 0.00286 ... $ p.adjust : num 0.03 0.03 0.03 0.03 0.03 ... $ qvalues : num 0.0221 0.0221 0.0221 0.0221 0.0221 ... $ rank : num 1918 1918 2526 1890 449 ... $ leading_edge : chr "tags=35%, list=15%, signal=31%" "tags=43%, list=15%, signal=37%" "tags=34%, list=20%, signal=28%" "tags=28%, list=15%, signal=24%" ... $ core_enrichment: chr "4017/6649/1288/4811/57124/3910/51162/3371/1973/1291/1397/1301/22918/56547/7474/3490/79875/7450/80781/79625/1490"| __truncated__ "4017/1288/4811/57124/3910/3371/1291/1301/56547/7474/79875/7450/80781/79625/1490/1280/1306/4314/80070/8425/977/4"| __truncated__ "5532/3880/6443/5530/6326/10211/825/6548/1291/7402/7779/8082/2318/80024/844/8910/79026/6717/3752/481/1293/358/47"| __truncated__ "64374/845/23621/1291/1301/2098/7474/2322/5624/80781/79642/1280/1306/844/11001/5799/590/3270/1278/80267/414/1277"| __truncated__ ...

SessionInfo

R version 3.4.1 (2017-06-30) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 14393)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252

attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] clusterProfiler_3.4.4 DOSE_3.2.0 org.Hs.eg.db_3.4.1 [4] AnnotationDbi_1.38.2 IRanges_2.10.2 S4Vectors_0.14.3 [7] Biobase_2.36.2 BiocGenerics_0.22.0 BiocInstaller_1.26.0

loaded via a namespace (and not attached): [1] Rcpp_0.12.12 compiler_3.4.1 plyr_1.8.4 [4] tools_3.4.1 digest_0.6.12 bit_1.1-12 [7] RSQLite_2.0 memoise_1.1.0 tibble_1.3.4 [10] gtable_0.2.0 pkgconfig_2.0.1 rlang_0.1.2 [13] fastmatch_1.1-0 igraph_1.1.2 DBI_0.7 [16] rvcheck_0.0.9 fgsea_1.2.1 gridExtra_2.2.1 [19] stringr_1.2.0 bit64_0.9-7 grid_3.4.1 [22] glue_1.1.1 qvalue_2.8.0 data.table_1.10.4 [25] BiocParallel_1.10.1 GOSemSim_2.2.0 purrr_0.2.3 [28] tidyr_0.7.0 GO.db_3.4.1 DO.db_2.9 [31] ggplot2_2.2.1 reshape2_1.4.2 blob_1.1.0 [34] magrittr_1.5 splines_3.4.1 scales_0.5.0 [37] colorspace_1.3-2 stringi_1.1.5 lazyeval_0.2.0 [40] munsell_0.4.3

clusterprofiler • 4.9k views

ADD COMMENT • link updated 4.7 years ago by josep ▴ 10 • written 8.3 years ago by ssv • 0

score 0 · Answer 1 · 2017-08-29

The results are similar from the outputs pasted here. It would be better if you can calculate the overlap enriched term and compare the pvalue similar to what I did in http://guangchuangyu.github.io/2015/11/comparison-of-clusterprofiler-and-gsea-p/ . On Tue, Aug 29, 2017 at 10:23 PM, ssv [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User ssv <https: support.bioconductor.org="" u="" 13861=""/> wrote Question: > clusterProfiler: different runs return different results for gse functions > <https: support.bioconductor.org="" p="" 99810=""/>: > > Hi everyone! > > I've been playing around a bit with clusterProfiler to work with RNAseq > data. When running my own data I noticed that when I ran the same function > repeatedly, I would get different results. This happens for both gseKEGG > and gseGO, even when I directly copy the commands I used a minute before, > different gene sets are claimed to be enriched. I tried the same with the > sample data set that they use in examples for this package and saw the same > effect and also tried increasing the number of permutations, which didn't > really seem to help. I think I'm missing something obvious - could someone > help me out? > > Below is the code I was using and the output, as well as the output of > sessionInfo if that helps any: > > data(geneList, package="DOSE") > > gseGO(geneList = geneList, > OrgDb = org.Hs.eg.db, > ont = "CC", > nPerm = 1000, > minGSSize = 100, > maxGSSize = 500, > pvalueCutoff = 0.05, > verbose = FALSE) > > Output 1 > > # > # Gene Set Enrichment Analysis > # > #...@organism Homo sapiens > #...@setType CC > #...@keytype ENTREZID > #...@geneList Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ... > - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ... > #...nPerm 1000 > #...pvalues adjusted by 'BH' with cutoff <0.05 > #...34 enriched terms found > 'data.frame': 34 obs. of 11 variables: > $ ID : chr "GO:0031012" "GO:0005578" "GO:0042383" > "GO:0000776" ... > $ Description : chr "extracellular matrix" "proteinaceous > extracellular matrix" "sarcolemma" "kinetochore" ... > $ setSize : int 399 268 108 103 127 101 121 109 119 139 ... > $ enrichmentScore: num -0.492 -0.557 -0.463 0.675 0.407 ... > $ NES : num -2.15 -2.36 -1.76 2.73 1.73 ... > $ pvalue : num 0.00128 0.00131 0.00146 0.00309 0.0031 ... > $ p.adjust : num 0.0295 0.0295 0.0295 0.0295 0.0295 ... > $ qvalues : num 0.0219 0.0219 0.0219 0.0219 0.0219 ... > $ rank : num 1918 1918 2526 449 1905 ... > $ leading_edge : chr "tags=35%, list=15%, signal=31%" "tags=43%, > list=15%, signal=37%" "tags=34%, list=20%, signal=28%" "tags=26%, list=4%, > signal=25%" ... > $ core_enrichment: chr "4017/6649/1288/4811/57124/ > 3910/51162/3371/1973/1291/1397/1301/22918/56547/7474/ > 3490/79875/7450/80781/79625/1490"| __truncated__ > "4017/1288/4811/57124/3910/3371/1291/1301/56547/7474/ > 79875/7450/80781/79625/1490/1280/1306/4314/80070/8425/977/4"| > __truncated__ "5532/3880/6443/5530/6326/10211/825/6548/1291/7402/7779/ > 8082/2318/80024/844/8910/79026/6717/3752/481/1293/358/47"| __truncated__ > "1062/10403/55355/220134/4751/79019/55839/54821/4085/81930/ > 81620/332/7272/9212/891/11004/5347/701/11130/79682/57"| __truncated__ ... > > Output2 > > # Gene Set Enrichment Analysis > # > #...@organism Homo sapiens > #...@setType CC > #...@keytype ENTREZID > #...@geneList Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ... > - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ... > #...nPerm 1000 > #...pvalues adjusted by 'BH' with cutoff <0.05 > #...39 enriched terms found > 'data.frame': 39 obs. of 11 variables: > $ ID : chr "GO:0031012" "GO:0005578" "GO:0042383" > "GO:0005788" ... > $ Description : chr "extracellular matrix" "proteinaceous > extracellular matrix" "sarcolemma" "endoplasmic reticulum lumen" ... > $ setSize : int 399 268 108 166 103 101 109 115 115 119 ... > $ enrichmentScore: num -0.492 -0.557 -0.463 -0.397 0.675 ... > $ NES : num -2.15 -2.35 -1.76 -1.59 2.78 ... > $ pvalue : num 0.00123 0.00132 0.00153 0.00285 0.00286 ... > $ p.adjust : num 0.03 0.03 0.03 0.03 0.03 ... > $ qvalues : num 0.0221 0.0221 0.0221 0.0221 0.0221 ... > $ rank : num 1918 1918 2526 1890 449 ... > $ leading_edge : chr "tags=35%, list=15%, signal=31%" "tags=43%, > list=15%, signal=37%" "tags=34%, list=20%, signal=28%" "tags=28%, list=15%, > signal=24%" ... > $ core_enrichment: chr "4017/6649/1288/4811/57124/ > 3910/51162/3371/1973/1291/1397/1301/22918/56547/7474/ > 3490/79875/7450/80781/79625/1490"| __truncated__ > "4017/1288/4811/57124/3910/3371/1291/1301/56547/7474/ > 79875/7450/80781/79625/1490/1280/1306/4314/80070/8425/977/4"| > __truncated__ "5532/3880/6443/5530/6326/10211/825/6548/1291/7402/7779/ > 8082/2318/80024/844/8910/79026/6717/3752/481/1293/358/47"| __truncated__ > "64374/845/23621/1291/1301/2098/7474/2322/5624/80781/ > 79642/1280/1306/844/11001/5799/590/3270/1278/80267/414/1277"| > __truncated__ ... > > SessionInfo > > R version 3.4.1 (2017-06-30) > Platform: x86_64-w64-mingw32/x64 (64-bit) > Running under: Windows 10 x64 (build 14393) > > Matrix products: default > > locale: > [1] LC_COLLATE=English_United States.1252 > [2] LC_CTYPE=English_United States.1252 > [3] LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] parallel stats4 stats graphics grDevices utils datasets > [8] methods base > > other attached packages: > [1] clusterProfiler_3.4.4 DOSE_3.2.0 org.Hs.eg.db_3.4.1 > [4] AnnotationDbi_1.38.2 IRanges_2.10.2 S4Vectors_0.14.3 > [7] Biobase_2.36.2 BiocGenerics_0.22.0 BiocInstaller_1.26.0 > > loaded via a namespace (and not attached): > [1] Rcpp_0.12.12 compiler_3.4.1 plyr_1.8.4 > [4] tools_3.4.1 digest_0.6.12 bit_1.1-12 > [7] RSQLite_2.0 memoise_1.1.0 tibble_1.3.4 > [10] gtable_0.2.0 pkgconfig_2.0.1 rlang_0.1.2 > [13] fastmatch_1.1-0 igraph_1.1.2 DBI_0.7 > [16] rvcheck_0.0.9 fgsea_1.2.1 gridExtra_2.2.1 > [19] stringr_1.2.0 bit64_0.9-7 grid_3.4.1 > [22] glue_1.1.1 qvalue_2.8.0 data.table_1.10.4 > [25] BiocParallel_1.10.1 GOSemSim_2.2.0 purrr_0.2.3 > [28] tidyr_0.7.0 GO.db_3.4.1 DO.db_2.9 > [31] ggplot2_2.2.1 reshape2_1.4.2 blob_1.1.0 > [34] magrittr_1.5 splines_3.4.1 scales_0.5.0 > [37] colorspace_1.3-2 stringi_1.1.5 lazyeval_0.2.0 > [40] munsell_0.4.3 > > ------------------------------ > > Post tags: clusterprofiler > > You may reply via email or visit clusterProfiler: different runs return different results for gse functions > -- --~--~---------~--~----~------------~-------~--~----~ Guangchuang Yu, PhD Candidate State Key Laboratory of Emerging Infectious Diseases School of Public Health The University of Hong Kong Hong Kong SAR, China www: https://guangchuangyu.github.io -~----------~----~----~----~------~----~------~--~---

score 0 · Answer 2 · 2021-04-20

I have the same issue. I resolved by pre-defining set.seed to run gsea Without set.seed:

Output 1:
ego3 <- gseGO(geneList     = geneList,
              OrgDb        = org.Hs.eg.db,
              ont          = "BP",
              keyType = "SYMBOL",
              nPerm        = 1000,
              minGSSize    = 3,
              maxGSSize    = 800,
              pvalueCutoff = 0.05,
              verbose      = T)
dim(ego3): 495 , 11



Output 2:
ego3 <- gseGO(geneList     = geneList,
              OrgDb        = org.Hs.eg.db,
              ont          = "BP",
              keyType = "SYMBOL",
              nPerm        = 1000,
              minGSSize    = 3,
              maxGSSize    = 800,
              pvalueCutoff = 0.05,
              verbose      = T)
dim(ego3): 506 , 11

With set.seed(123) always get same result