Search
Question: clusterProfiler: different runs return different results for gse functions
0
gravatar for ssv
12 weeks ago by
ssv0
ssv0 wrote:

Hi everyone!

I've been playing around a bit with clusterProfiler to work with RNAseq data. When running my own data I noticed that when I ran the same function repeatedly, I would get different results. This happens for both gseKEGG and gseGO, even when I directly copy the commands I used a minute before, different gene sets are claimed to be enriched. I tried the same with the sample data set that they use in examples for this package and saw the same effect and also tried increasing the number of permutations, which didn't really seem to help. I think I'm missing something obvious - could someone help me out?

Below is the code I was using and the output, as well as the output of sessionInfo if that helps any:

data(geneList, package="DOSE")

gseGO(geneList     = geneList,
              OrgDb        = org.Hs.eg.db,
              ont          = "CC",
              nPerm        = 1000,
              minGSSize    = 100,
              maxGSSize    = 500,
              pvalueCutoff = 0.05,
              verbose      = FALSE)

Output 1

#
# Gene Set Enrichment Analysis
#
#...@organism    Homo sapiens
#...@setType     CC
#...@keytype     ENTREZID
#...@geneList    Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
 - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
#...nPerm        1000
#...pvalues adjusted by 'BH' with cutoff <0.05
#...34 enriched terms found
'data.frame':   34 obs. of  11 variables:
 $ ID             : chr  "GO:0031012" "GO:0005578" "GO:0042383" "GO:0000776" ...
 $ Description    : chr  "extracellular matrix" "proteinaceous extracellular matrix" "sarcolemma" "kinetochore" ...
 $ setSize        : int  399 268 108 103 127 101 121 109 119 139 ...
 $ enrichmentScore: num  -0.492 -0.557 -0.463 0.675 0.407 ...
 $ NES            : num  -2.15 -2.36 -1.76 2.73 1.73 ...
 $ pvalue         : num  0.00128 0.00131 0.00146 0.00309 0.0031 ...
 $ p.adjust       : num  0.0295 0.0295 0.0295 0.0295 0.0295 ...
 $ qvalues        : num  0.0219 0.0219 0.0219 0.0219 0.0219 ...
 $ rank           : num  1918 1918 2526 449 1905 ...
 $ leading_edge   : chr  "tags=35%, list=15%, signal=31%" "tags=43%, list=15%, signal=37%" "tags=34%, list=20%, signal=28%" "tags=26%, list=4%, signal=25%" ...
 $ core_enrichment: chr  "4017/6649/1288/4811/57124/3910/51162/3371/1973/1291/1397/1301/22918/56547/7474/3490/79875/7450/80781/79625/1490"| __truncated__ "4017/1288/4811/57124/3910/3371/1291/1301/56547/7474/79875/7450/80781/79625/1490/1280/1306/4314/80070/8425/977/4"| __truncated__ "5532/3880/6443/5530/6326/10211/825/6548/1291/7402/7779/8082/2318/80024/844/8910/79026/6717/3752/481/1293/358/47"| __truncated__ "1062/10403/55355/220134/4751/79019/55839/54821/4085/81930/81620/332/7272/9212/891/11004/5347/701/11130/79682/57"| __truncated__ ...

Output2

# Gene Set Enrichment Analysis
#
#...@organism    Homo sapiens
#...@setType     CC
#...@keytype     ENTREZID
#...@geneList    Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
 - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
#...nPerm        1000
#...pvalues adjusted by 'BH' with cutoff <0.05
#...39 enriched terms found
'data.frame':   39 obs. of  11 variables:
 $ ID             : chr  "GO:0031012" "GO:0005578" "GO:0042383" "GO:0005788" ...
 $ Description    : chr  "extracellular matrix" "proteinaceous extracellular matrix" "sarcolemma" "endoplasmic reticulum lumen" ...
 $ setSize        : int  399 268 108 166 103 101 109 115 115 119 ...
 $ enrichmentScore: num  -0.492 -0.557 -0.463 -0.397 0.675 ...
 $ NES            : num  -2.15 -2.35 -1.76 -1.59 2.78 ...
 $ pvalue         : num  0.00123 0.00132 0.00153 0.00285 0.00286 ...
 $ p.adjust       : num  0.03 0.03 0.03 0.03 0.03 ...
 $ qvalues        : num  0.0221 0.0221 0.0221 0.0221 0.0221 ...
 $ rank           : num  1918 1918 2526 1890 449 ...
 $ leading_edge   : chr  "tags=35%, list=15%, signal=31%" "tags=43%, list=15%, signal=37%" "tags=34%, list=20%, signal=28%" "tags=28%, list=15%, signal=24%" ...
 $ core_enrichment: chr  "4017/6649/1288/4811/57124/3910/51162/3371/1973/1291/1397/1301/22918/56547/7474/3490/79875/7450/80781/79625/1490"| __truncated__ "4017/1288/4811/57124/3910/3371/1291/1301/56547/7474/79875/7450/80781/79625/1490/1280/1306/4314/80070/8425/977/4"| __truncated__ "5532/3880/6443/5530/6326/10211/825/6548/1291/7402/7779/8082/2318/80024/844/8910/79026/6717/3752/481/1293/358/47"| __truncated__ "64374/845/23621/1291/1301/2098/7474/2322/5624/80781/79642/1280/1306/844/11001/5799/590/3270/1278/80267/414/1277"| __truncated__ ...

SessionInfo

R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] clusterProfiler_3.4.4 DOSE_3.2.0            org.Hs.eg.db_3.4.1   
[4] AnnotationDbi_1.38.2  IRanges_2.10.2        S4Vectors_0.14.3     
[7] Biobase_2.36.2        BiocGenerics_0.22.0   BiocInstaller_1.26.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12        compiler_3.4.1      plyr_1.8.4         
 [4] tools_3.4.1         digest_0.6.12       bit_1.1-12         
 [7] RSQLite_2.0         memoise_1.1.0       tibble_1.3.4       
[10] gtable_0.2.0        pkgconfig_2.0.1     rlang_0.1.2        
[13] fastmatch_1.1-0     igraph_1.1.2        DBI_0.7            
[16] rvcheck_0.0.9       fgsea_1.2.1         gridExtra_2.2.1    
[19] stringr_1.2.0       bit64_0.9-7         grid_3.4.1         
[22] glue_1.1.1          qvalue_2.8.0        data.table_1.10.4  
[25] BiocParallel_1.10.1 GOSemSim_2.2.0      purrr_0.2.3        
[28] tidyr_0.7.0         GO.db_3.4.1         DO.db_2.9          
[31] ggplot2_2.2.1       reshape2_1.4.2      blob_1.1.0         
[34] magrittr_1.5        splines_3.4.1       scales_0.5.0       
[37] colorspace_1.3-2    stringi_1.1.5       lazyeval_0.2.0     
[40] munsell_0.4.3     

ADD COMMENTlink modified 12 weeks ago by Guangchuang Yu800 • written 12 weeks ago by ssv0
0
gravatar for Guangchuang Yu
12 weeks ago by
Hong Kong
Guangchuang Yu800 wrote:
The results are similar from the outputs pasted here. It would be better if you can calculate the overlap enriched term and compare the pvalue similar to what I did in http://guangchuangyu.github.io/2015/11/comparison-of-clusterprofiler-and-gsea-p/ . ​ On Tue, Aug 29, 2017 at 10:23 PM, ssv [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User ssv <https: support.bioconductor.org="" u="" 13861=""/> wrote Question: > clusterProfiler: different runs return different results for gse functions > <https: support.bioconductor.org="" p="" 99810=""/>: > > Hi everyone! > > I've been playing around a bit with clusterProfiler to work with RNAseq > data. When running my own data I noticed that when I ran the same function > repeatedly, I would get different results. This happens for both gseKEGG > and gseGO, even when I directly copy the commands I used a minute before, > different gene sets are claimed to be enriched. I tried the same with the > sample data set that they use in examples for this package and saw the same > effect and also tried increasing the number of permutations, which didn't > really seem to help. I think I'm missing something obvious - could someone > help me out? > > Below is the code I was using and the output, as well as the output of > sessionInfo if that helps any: > > data(geneList, package="DOSE") > > gseGO(geneList = geneList, > OrgDb = org.Hs.eg.db, > ont = "CC", > nPerm = 1000, > minGSSize = 100, > maxGSSize = 500, > pvalueCutoff = 0.05, > verbose = FALSE) > > Output 1 > > # > # Gene Set Enrichment Analysis > # > #...@organism Homo sapiens > #...@setType CC > #...@keytype ENTREZID > #...@geneList Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ... > - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ... > #...nPerm 1000 > #...pvalues adjusted by 'BH' with cutoff <0.05 > #...34 enriched terms found > 'data.frame': 34 obs. of 11 variables: > $ ID : chr "GO:0031012" "GO:0005578" "GO:0042383" > "GO:0000776" ... > $ Description : chr "extracellular matrix" "proteinaceous > extracellular matrix" "sarcolemma" "kinetochore" ... > $ setSize : int 399 268 108 103 127 101 121 109 119 139 ... > $ enrichmentScore: num -0.492 -0.557 -0.463 0.675 0.407 ... > $ NES : num -2.15 -2.36 -1.76 2.73 1.73 ... > $ pvalue : num 0.00128 0.00131 0.00146 0.00309 0.0031 ... > $ p.adjust : num 0.0295 0.0295 0.0295 0.0295 0.0295 ... > $ qvalues : num 0.0219 0.0219 0.0219 0.0219 0.0219 ... > $ rank : num 1918 1918 2526 449 1905 ... > $ leading_edge : chr "tags=35%, list=15%, signal=31%" "tags=43%, > list=15%, signal=37%" "tags=34%, list=20%, signal=28%" "tags=26%, list=4%, > signal=25%" ... > $ core_enrichment: chr "4017/6649/1288/4811/57124/ > 3910/51162/3371/1973/1291/1397/1301/22918/56547/7474/ > 3490/79875/7450/80781/79625/1490"| __truncated__ > "4017/1288/4811/57124/3910/3371/1291/1301/56547/7474/ > 79875/7450/80781/79625/1490/1280/1306/4314/80070/8425/977/4"| > __truncated__ "5532/3880/6443/5530/6326/10211/825/6548/1291/7402/7779/ > 8082/2318/80024/844/8910/79026/6717/3752/481/1293/358/47"| __truncated__ > "1062/10403/55355/220134/4751/79019/55839/54821/4085/81930/ > 81620/332/7272/9212/891/11004/5347/701/11130/79682/57"| __truncated__ ... > > Output2 > > # Gene Set Enrichment Analysis > # > #...@organism Homo sapiens > #...@setType CC > #...@keytype ENTREZID > #...@geneList Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ... > - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ... > #...nPerm 1000 > #...pvalues adjusted by 'BH' with cutoff <0.05 > #...39 enriched terms found > 'data.frame': 39 obs. of 11 variables: > $ ID : chr "GO:0031012" "GO:0005578" "GO:0042383" > "GO:0005788" ... > $ Description : chr "extracellular matrix" "proteinaceous > extracellular matrix" "sarcolemma" "endoplasmic reticulum lumen" ... > $ setSize : int 399 268 108 166 103 101 109 115 115 119 ... > $ enrichmentScore: num -0.492 -0.557 -0.463 -0.397 0.675 ... > $ NES : num -2.15 -2.35 -1.76 -1.59 2.78 ... > $ pvalue : num 0.00123 0.00132 0.00153 0.00285 0.00286 ... > $ p.adjust : num 0.03 0.03 0.03 0.03 0.03 ... > $ qvalues : num 0.0221 0.0221 0.0221 0.0221 0.0221 ... > $ rank : num 1918 1918 2526 1890 449 ... > $ leading_edge : chr "tags=35%, list=15%, signal=31%" "tags=43%, > list=15%, signal=37%" "tags=34%, list=20%, signal=28%" "tags=28%, list=15%, > signal=24%" ... > $ core_enrichment: chr "4017/6649/1288/4811/57124/ > 3910/51162/3371/1973/1291/1397/1301/22918/56547/7474/ > 3490/79875/7450/80781/79625/1490"| __truncated__ > "4017/1288/4811/57124/3910/3371/1291/1301/56547/7474/ > 79875/7450/80781/79625/1490/1280/1306/4314/80070/8425/977/4"| > __truncated__ "5532/3880/6443/5530/6326/10211/825/6548/1291/7402/7779/ > 8082/2318/80024/844/8910/79026/6717/3752/481/1293/358/47"| __truncated__ > "64374/845/23621/1291/1301/2098/7474/2322/5624/80781/ > 79642/1280/1306/844/11001/5799/590/3270/1278/80267/414/1277"| > __truncated__ ... > > SessionInfo > > R version 3.4.1 (2017-06-30) > Platform: x86_64-w64-mingw32/x64 (64-bit) > Running under: Windows 10 x64 (build 14393) > > Matrix products: default > > locale: > [1] LC_COLLATE=English_United States.1252 > [2] LC_CTYPE=English_United States.1252 > [3] LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] parallel stats4 stats graphics grDevices utils datasets > [8] methods base > > other attached packages: > [1] clusterProfiler_3.4.4 DOSE_3.2.0 org.Hs.eg.db_3.4.1 > [4] AnnotationDbi_1.38.2 IRanges_2.10.2 S4Vectors_0.14.3 > [7] Biobase_2.36.2 BiocGenerics_0.22.0 BiocInstaller_1.26.0 > > loaded via a namespace (and not attached): > [1] Rcpp_0.12.12 compiler_3.4.1 plyr_1.8.4 > [4] tools_3.4.1 digest_0.6.12 bit_1.1-12 > [7] RSQLite_2.0 memoise_1.1.0 tibble_1.3.4 > [10] gtable_0.2.0 pkgconfig_2.0.1 rlang_0.1.2 > [13] fastmatch_1.1-0 igraph_1.1.2 DBI_0.7 > [16] rvcheck_0.0.9 fgsea_1.2.1 gridExtra_2.2.1 > [19] stringr_1.2.0 bit64_0.9-7 grid_3.4.1 > [22] glue_1.1.1 qvalue_2.8.0 data.table_1.10.4 > [25] BiocParallel_1.10.1 GOSemSim_2.2.0 purrr_0.2.3 > [28] tidyr_0.7.0 GO.db_3.4.1 DO.db_2.9 > [31] ggplot2_2.2.1 reshape2_1.4.2 blob_1.1.0 > [34] magrittr_1.5 splines_3.4.1 scales_0.5.0 > [37] colorspace_1.3-2 stringi_1.1.5 lazyeval_0.2.0 > [40] munsell_0.4.3 > > ------------------------------ > > Post tags: clusterprofiler > > You may reply via email or visit clusterProfiler: different runs return different results for gse functions > -- --~--~---------~--~----~------------~-------~--~----~ Guangchuang Yu, PhD Candidate State Key Laboratory of Emerging Infectious Diseases School of Public Health The University of Hong Kong Hong Kong SAR, China www: https://guangchuangyu.github.io -~----------~----~----~----~------~----~------~--~---
ADD COMMENTlink written 12 weeks ago by Guangchuang Yu800

Thanks for your comment.

However, it does not explain why two runs of the exact same calculation will return different results even if they are 'similar'. Could you give me some insight into how this is possible and how to circumvent this issue?

ADD REPLYlink written 11 weeks ago by ssv0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 379 users visited in the last hour