clusterProfiler: different runs return different results for gse functions
2
0
Entering edit mode
ssv • 0
@ssv-13861
Last seen 5.0 years ago

Hi everyone!

I've been playing around a bit with clusterProfiler to work with RNAseq data. When running my own data I noticed that when I ran the same function repeatedly, I would get different results. This happens for both gseKEGG and gseGO, even when I directly copy the commands I used a minute before, different gene sets are claimed to be enriched. I tried the same with the sample data set that they use in examples for this package and saw the same effect and also tried increasing the number of permutations, which didn't really seem to help. I think I'm missing something obvious - could someone help me out?

Below is the code I was using and the output, as well as the output of sessionInfo if that helps any:

data(geneList, package="DOSE")

gseGO(geneList     = geneList,
              OrgDb        = org.Hs.eg.db,
              ont          = "CC",
              nPerm        = 1000,
              minGSSize    = 100,
              maxGSSize    = 500,
              pvalueCutoff = 0.05,
              verbose      = FALSE)

Output 1

#
# Gene Set Enrichment Analysis
#
#...@organism    Homo sapiens
#...@setType     CC
#...@keytype     ENTREZID
#...@geneList    Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
 - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
#...nPerm        1000
#...pvalues adjusted by 'BH' with cutoff <0.05
#...34 enriched terms found
'data.frame':   34 obs. of  11 variables:
 $ ID             : chr  "GO:0031012" "GO:0005578" "GO:0042383" "GO:0000776" ...
 $ Description    : chr  "extracellular matrix" "proteinaceous extracellular matrix" "sarcolemma" "kinetochore" ...
 $ setSize        : int  399 268 108 103 127 101 121 109 119 139 ...
 $ enrichmentScore: num  -0.492 -0.557 -0.463 0.675 0.407 ...
 $ NES            : num  -2.15 -2.36 -1.76 2.73 1.73 ...
 $ pvalue         : num  0.00128 0.00131 0.00146 0.00309 0.0031 ...
 $ p.adjust       : num  0.0295 0.0295 0.0295 0.0295 0.0295 ...
 $ qvalues        : num  0.0219 0.0219 0.0219 0.0219 0.0219 ...
 $ rank           : num  1918 1918 2526 449 1905 ...
 $ leading_edge   : chr  "tags=35%, list=15%, signal=31%" "tags=43%, list=15%, signal=37%" "tags=34%, list=20%, signal=28%" "tags=26%, list=4%, signal=25%" ...
 $ core_enrichment: chr  "4017/6649/1288/4811/57124/3910/51162/3371/1973/1291/1397/1301/22918/56547/7474/3490/79875/7450/80781/79625/1490"| __truncated__ "4017/1288/4811/57124/3910/3371/1291/1301/56547/7474/79875/7450/80781/79625/1490/1280/1306/4314/80070/8425/977/4"| __truncated__ "5532/3880/6443/5530/6326/10211/825/6548/1291/7402/7779/8082/2318/80024/844/8910/79026/6717/3752/481/1293/358/47"| __truncated__ "1062/10403/55355/220134/4751/79019/55839/54821/4085/81930/81620/332/7272/9212/891/11004/5347/701/11130/79682/57"| __truncated__ ...

Output2

# Gene Set Enrichment Analysis
#
#...@organism    Homo sapiens
#...@setType     CC
#...@keytype     ENTREZID
#...@geneList    Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
 - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
#...nPerm        1000
#...pvalues adjusted by 'BH' with cutoff <0.05
#...39 enriched terms found
'data.frame':   39 obs. of  11 variables:
 $ ID             : chr  "GO:0031012" "GO:0005578" "GO:0042383" "GO:0005788" ...
 $ Description    : chr  "extracellular matrix" "proteinaceous extracellular matrix" "sarcolemma" "endoplasmic reticulum lumen" ...
 $ setSize        : int  399 268 108 166 103 101 109 115 115 119 ...
 $ enrichmentScore: num  -0.492 -0.557 -0.463 -0.397 0.675 ...
 $ NES            : num  -2.15 -2.35 -1.76 -1.59 2.78 ...
 $ pvalue         : num  0.00123 0.00132 0.00153 0.00285 0.00286 ...
 $ p.adjust       : num  0.03 0.03 0.03 0.03 0.03 ...
 $ qvalues        : num  0.0221 0.0221 0.0221 0.0221 0.0221 ...
 $ rank           : num  1918 1918 2526 1890 449 ...
 $ leading_edge   : chr  "tags=35%, list=15%, signal=31%" "tags=43%, list=15%, signal=37%" "tags=34%, list=20%, signal=28%" "tags=28%, list=15%, signal=24%" ...
 $ core_enrichment: chr  "4017/6649/1288/4811/57124/3910/51162/3371/1973/1291/1397/1301/22918/56547/7474/3490/79875/7450/80781/79625/1490"| __truncated__ "4017/1288/4811/57124/3910/3371/1291/1301/56547/7474/79875/7450/80781/79625/1490/1280/1306/4314/80070/8425/977/4"| __truncated__ "5532/3880/6443/5530/6326/10211/825/6548/1291/7402/7779/8082/2318/80024/844/8910/79026/6717/3752/481/1293/358/47"| __truncated__ "64374/845/23621/1291/1301/2098/7474/2322/5624/80781/79642/1280/1306/844/11001/5799/590/3270/1278/80267/414/1277"| __truncated__ ...

SessionInfo

R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] clusterProfiler_3.4.4 DOSE_3.2.0            org.Hs.eg.db_3.4.1   
[4] AnnotationDbi_1.38.2  IRanges_2.10.2        S4Vectors_0.14.3     
[7] Biobase_2.36.2        BiocGenerics_0.22.0   BiocInstaller_1.26.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12        compiler_3.4.1      plyr_1.8.4         
 [4] tools_3.4.1         digest_0.6.12       bit_1.1-12         
 [7] RSQLite_2.0         memoise_1.1.0       tibble_1.3.4       
[10] gtable_0.2.0        pkgconfig_2.0.1     rlang_0.1.2        
[13] fastmatch_1.1-0     igraph_1.1.2        DBI_0.7            
[16] rvcheck_0.0.9       fgsea_1.2.1         gridExtra_2.2.1    
[19] stringr_1.2.0       bit64_0.9-7         grid_3.4.1         
[22] glue_1.1.1          qvalue_2.8.0        data.table_1.10.4  
[25] BiocParallel_1.10.1 GOSemSim_2.2.0      purrr_0.2.3        
[28] tidyr_0.7.0         GO.db_3.4.1         DO.db_2.9          
[31] ggplot2_2.2.1       reshape2_1.4.2      blob_1.1.0         
[34] magrittr_1.5        splines_3.4.1       scales_0.5.0       
[37] colorspace_1.3-2    stringi_1.1.5       lazyeval_0.2.0     
[40] munsell_0.4.3     

clusterprofiler • 4.1k views
ADD COMMENT
0
Entering edit mode
Guangchuang Yu ★ 1.2k
@guangchuang-yu-5419
Last seen 8 weeks ago
China/Guangzhou/Southern Medical Univer…
The results are similar from the outputs pasted here. It would be better if you can calculate the overlap enriched term and compare the pvalue similar to what I did in http://guangchuangyu.github.io/2015/11/comparison-of-clusterprofiler-and-gsea-p/ . ​ On Tue, Aug 29, 2017 at 10:23 PM, ssv [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User ssv <https: support.bioconductor.org="" u="" 13861=""/> wrote Question: > clusterProfiler: different runs return different results for gse functions > <https: support.bioconductor.org="" p="" 99810=""/>: > > Hi everyone! > > I've been playing around a bit with clusterProfiler to work with RNAseq > data. When running my own data I noticed that when I ran the same function > repeatedly, I would get different results. This happens for both gseKEGG > and gseGO, even when I directly copy the commands I used a minute before, > different gene sets are claimed to be enriched. I tried the same with the > sample data set that they use in examples for this package and saw the same > effect and also tried increasing the number of permutations, which didn't > really seem to help. I think I'm missing something obvious - could someone > help me out? > > Below is the code I was using and the output, as well as the output of > sessionInfo if that helps any: > > data(geneList, package="DOSE") > > gseGO(geneList = geneList, > OrgDb = org.Hs.eg.db, > ont = "CC", > nPerm = 1000, > minGSSize = 100, > maxGSSize = 500, > pvalueCutoff = 0.05, > verbose = FALSE) > > Output 1 > > # > # Gene Set Enrichment Analysis > # > #...@organism Homo sapiens > #...@setType CC > #...@keytype ENTREZID > #...@geneList Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ... > - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ... > #...nPerm 1000 > #...pvalues adjusted by 'BH' with cutoff <0.05 > #...34 enriched terms found > 'data.frame': 34 obs. of 11 variables: > $ ID : chr "GO:0031012" "GO:0005578" "GO:0042383" > "GO:0000776" ... > $ Description : chr "extracellular matrix" "proteinaceous > extracellular matrix" "sarcolemma" "kinetochore" ... > $ setSize : int 399 268 108 103 127 101 121 109 119 139 ... > $ enrichmentScore: num -0.492 -0.557 -0.463 0.675 0.407 ... > $ NES : num -2.15 -2.36 -1.76 2.73 1.73 ... > $ pvalue : num 0.00128 0.00131 0.00146 0.00309 0.0031 ... > $ p.adjust : num 0.0295 0.0295 0.0295 0.0295 0.0295 ... > $ qvalues : num 0.0219 0.0219 0.0219 0.0219 0.0219 ... > $ rank : num 1918 1918 2526 449 1905 ... > $ leading_edge : chr "tags=35%, list=15%, signal=31%" "tags=43%, > list=15%, signal=37%" "tags=34%, list=20%, signal=28%" "tags=26%, list=4%, > signal=25%" ... > $ core_enrichment: chr "4017/6649/1288/4811/57124/ > 3910/51162/3371/1973/1291/1397/1301/22918/56547/7474/ > 3490/79875/7450/80781/79625/1490"| __truncated__ > "4017/1288/4811/57124/3910/3371/1291/1301/56547/7474/ > 79875/7450/80781/79625/1490/1280/1306/4314/80070/8425/977/4"| > __truncated__ "5532/3880/6443/5530/6326/10211/825/6548/1291/7402/7779/ > 8082/2318/80024/844/8910/79026/6717/3752/481/1293/358/47"| __truncated__ > "1062/10403/55355/220134/4751/79019/55839/54821/4085/81930/ > 81620/332/7272/9212/891/11004/5347/701/11130/79682/57"| __truncated__ ... > > Output2 > > # Gene Set Enrichment Analysis > # > #...@organism Homo sapiens > #...@setType CC > #...@keytype ENTREZID > #...@geneList Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ... > - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ... > #...nPerm 1000 > #...pvalues adjusted by 'BH' with cutoff <0.05 > #...39 enriched terms found > 'data.frame': 39 obs. of 11 variables: > $ ID : chr "GO:0031012" "GO:0005578" "GO:0042383" > "GO:0005788" ... > $ Description : chr "extracellular matrix" "proteinaceous > extracellular matrix" "sarcolemma" "endoplasmic reticulum lumen" ... > $ setSize : int 399 268 108 166 103 101 109 115 115 119 ... > $ enrichmentScore: num -0.492 -0.557 -0.463 -0.397 0.675 ... > $ NES : num -2.15 -2.35 -1.76 -1.59 2.78 ... > $ pvalue : num 0.00123 0.00132 0.00153 0.00285 0.00286 ... > $ p.adjust : num 0.03 0.03 0.03 0.03 0.03 ... > $ qvalues : num 0.0221 0.0221 0.0221 0.0221 0.0221 ... > $ rank : num 1918 1918 2526 1890 449 ... > $ leading_edge : chr "tags=35%, list=15%, signal=31%" "tags=43%, > list=15%, signal=37%" "tags=34%, list=20%, signal=28%" "tags=28%, list=15%, > signal=24%" ... > $ core_enrichment: chr "4017/6649/1288/4811/57124/ > 3910/51162/3371/1973/1291/1397/1301/22918/56547/7474/ > 3490/79875/7450/80781/79625/1490"| __truncated__ > "4017/1288/4811/57124/3910/3371/1291/1301/56547/7474/ > 79875/7450/80781/79625/1490/1280/1306/4314/80070/8425/977/4"| > __truncated__ "5532/3880/6443/5530/6326/10211/825/6548/1291/7402/7779/ > 8082/2318/80024/844/8910/79026/6717/3752/481/1293/358/47"| __truncated__ > "64374/845/23621/1291/1301/2098/7474/2322/5624/80781/ > 79642/1280/1306/844/11001/5799/590/3270/1278/80267/414/1277"| > __truncated__ ... > > SessionInfo > > R version 3.4.1 (2017-06-30) > Platform: x86_64-w64-mingw32/x64 (64-bit) > Running under: Windows 10 x64 (build 14393) > > Matrix products: default > > locale: > [1] LC_COLLATE=English_United States.1252 > [2] LC_CTYPE=English_United States.1252 > [3] LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] parallel stats4 stats graphics grDevices utils datasets > [8] methods base > > other attached packages: > [1] clusterProfiler_3.4.4 DOSE_3.2.0 org.Hs.eg.db_3.4.1 > [4] AnnotationDbi_1.38.2 IRanges_2.10.2 S4Vectors_0.14.3 > [7] Biobase_2.36.2 BiocGenerics_0.22.0 BiocInstaller_1.26.0 > > loaded via a namespace (and not attached): > [1] Rcpp_0.12.12 compiler_3.4.1 plyr_1.8.4 > [4] tools_3.4.1 digest_0.6.12 bit_1.1-12 > [7] RSQLite_2.0 memoise_1.1.0 tibble_1.3.4 > [10] gtable_0.2.0 pkgconfig_2.0.1 rlang_0.1.2 > [13] fastmatch_1.1-0 igraph_1.1.2 DBI_0.7 > [16] rvcheck_0.0.9 fgsea_1.2.1 gridExtra_2.2.1 > [19] stringr_1.2.0 bit64_0.9-7 grid_3.4.1 > [22] glue_1.1.1 qvalue_2.8.0 data.table_1.10.4 > [25] BiocParallel_1.10.1 GOSemSim_2.2.0 purrr_0.2.3 > [28] tidyr_0.7.0 GO.db_3.4.1 DO.db_2.9 > [31] ggplot2_2.2.1 reshape2_1.4.2 blob_1.1.0 > [34] magrittr_1.5 splines_3.4.1 scales_0.5.0 > [37] colorspace_1.3-2 stringi_1.1.5 lazyeval_0.2.0 > [40] munsell_0.4.3 > > ------------------------------ > > Post tags: clusterprofiler > > You may reply via email or visit clusterProfiler: different runs return different results for gse functions > -- --~--~---------~--~----~------------~-------~--~----~ Guangchuang Yu, PhD Candidate State Key Laboratory of Emerging Infectious Diseases School of Public Health The University of Hong Kong Hong Kong SAR, China www: https://guangchuangyu.github.io -~----------~----~----~----~------~----~------~--~---
ADD COMMENT
0
Entering edit mode

Thanks for your comment.

However, it does not explain why two runs of the exact same calculation will return different results even if they are 'similar'. Could you give me some insight into how this is possible and how to circumvent this issue?

ADD REPLY
0
Entering edit mode

I am also having the same issue where I get different numbers of enriched terms when I re-run the exact same analysis. Were you able to figure out why it does this?

ADD REPLY
0
Entering edit mode

No, unfortunately not. I didn't end up using the package very much after all. I'm not sure actually where it comes from. Maybe there's a seed somewhere that you can set? I'm not very familiar with the details anymore, I'm afraid...

ADD REPLY
0
Entering edit mode

I have the same issue.

ADD REPLY
0
Entering edit mode
josep ▴ 10
@8ff2bb38
Last seen 3.7 years ago
Spain

I have the same issue. I resolved by pre-defining set.seed to run gsea Without set.seed:

Output 1:
ego3 <- gseGO(geneList     = geneList,
              OrgDb        = org.Hs.eg.db,
              ont          = "BP",
              keyType = "SYMBOL",
              nPerm        = 1000,
              minGSSize    = 3,
              maxGSSize    = 800,
              pvalueCutoff = 0.05,
              verbose      = T)
dim(ego3): 495 , 11



Output 2:
ego3 <- gseGO(geneList     = geneList,
              OrgDb        = org.Hs.eg.db,
              ont          = "BP",
              keyType = "SYMBOL",
              nPerm        = 1000,
              minGSSize    = 3,
              maxGSSize    = 800,
              pvalueCutoff = 0.05,
              verbose      = T)
dim(ego3): 506 , 11

With set.seed(123) always get same result

ADD COMMENT

Login before adding your answer.

Traffic: 370 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6