sessioninfor from Intel desktop for result2

Question

ClusterProfiler enrichGO function leads to different enrichment results in different computers, while the code and gene list keep same

0

Entering edit mode

liuzhy ▴ 10

@cfaa13b4

Last seen 2.4 years ago

Hong Kong

I have a gene list containg 5 groups, and use the clusterProfiler compareCluster-enrichGO function to do the Biological Process funtional enrichment. However, different computers produce two different enrichment results with the same gene list and packages (same version).

Result 1: Tested in two Lenove laptops with AMD CPUs, one Apple Mac with A1 CPU enter image description here

Result 2: Tested in one HP desktop, one Lenove laptop and one Apple Mac with Intel CPUs enter image description here

I know little about the statistic behind, but both results seem reasonable for my study. Now I just want to know which result is the true one and how the difference happened. BTW, the difference exist while I tried other gene lists.

I attached the data and code here if you are interested to make a test enter link description here https://drive.google.com/drive/folders/1fhg9IGurlw5U9PUKfNNccNhj5M4CRzH4?usp=sharing (see if I make the link right==)

Thanks for comments!


# include your problematic code here with any corresponding output 
# please also include the results of running the following in an R session 

library(clusterProfiler)
library("org.Hs.eg.db")

load("enrich_gene_list_for_test.RData")
enrich_gene_list_for_test
test_enrich <- compareCluster(geneCluster = enrich_gene_list_for_test, 
                                  ont           = "BP",
                                  OrgDb = org.Hs.eg.db,
                                  pAdjustMethod = "BH",
                                  pvalueCutoff  = 0.01,
                                  qvalueCutoff  = 0.05,
                                  readable      = TRUE,
                                  fun =  enrichGO)

dotplot(test_enrich,label_format = 100,showCategory=40)

sessionInfo()



sessionInfo( )
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936    LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C                               LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] org.Hs.eg.db_3.14.0   AnnotationDbi_1.56.2  IRanges_2.28.0        S4Vectors_0.32.3      Biobase_2.54.0        BiocGenerics_0.40.0  
[7] clusterProfiler_4.2.0

clusterProfiler • 7.3k views

ADD COMMENT • link 2.4 years ago liuzhy ▴ 10

0

Entering edit mode

Are you really sure that the package versions are identical? Especially the library GO.db?

--> The differences between your Result1 and Result2 regarding the absence of some clusters (groups) and the numbers below each of the clusters, that represent the number of annotated genes in each of you input lists (=groups), strongly hints to a difference in the number of genes that were actually annotated to a GO category.

enrichGO performs a 'simple' one-sided version of Fisher’s exact test to find enriched GO categories. This type of test involves the analysis of contingency tables, which is nothing more (or less) than scoring the number of genes that are, and are not, in a specific GO category... In contrast, when you would have used gseGO (for GSEA analysis), then it involves permuting you data. Permutations are random, so in that case some (very) small differences between runs are expected. (See for more info on the differences between the types of test here: what the test method for enrichGO in clusterProfiler?).

The large differences you show can IMHO only be explained by differences in the number of annotated genes in a GO category...

I ran your example,and I got exactly the same results as in your Result1...! I am using a PC (HP, Windows10) with an Intel Core i7 processor, and used R version 4.1.1 with GO.db version 3.14.0.

Results test run

ADD REPLY • link 2.4 years ago Guido Hooiveld ★ 4.0k

0

Entering edit mode

Thanks for your testing here. I checked the GO.db package specifically and confirmed the same version GO.db_3.14.0 included.

The sessionInfor in the post is from my AMD laptop, get result1 (also GO.db_3.14.0)
The sessionInfor below is from my HP Intel desktop, get result2

As you can see, seems no difference in package version.

Actually, my colleagues help test in clusterProfiler_3.XXX version (and related packages), and get results very similar to result2. So I wonder if the package ver Test_result3 enter image description here

====

sessioninfor from Intel desktop for result2

R version 4.1.1 (2021-08-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale: 1 LC_COLLATE=Chinese (Simplified)_China.936 LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936
system code page: 1252

attached base packages: 1 stats4 stats graphics grDevices utils datasets methods base

other attached packages: 1 GO.db_3.14.0 clusterProfiler_4.2.0 org.Hs.eg.db_3.14.0 AnnotationDbi_1.56.2 IRanges_2.28.0
[6] S4Vectors_0.32.3 Biobase_2.54.0 BiocGenerics_0.40.0

ADD REPLY • link 2.4 years ago liuzhy ▴ 10

0

Entering edit mode

not full edited...

the difference between result1 and result2 is huge, and cannot see the difference in pakage version level.
different pakage version can lead to slightly different results

So I wonder, package version difference may not fully explain the huge difference, even if it really exists here. And maybe some more important factors are ignored.

ADD REPLY • link 2.4 years ago liuzhy ▴ 10

0

Entering edit mode

That's indeed very remarkable...!

To be honest, I don't know what may be causing the discrepancies you observed.

For maximal reproducibility I used the sample data set included with DOSE, and ran enrichGO according to your code on both my Win10 and Linux machine. I obtained identical results... Could you please try my code as well, and compare with my results? Again, I got identical results on both my machines.

> library(clusterProfiler)
> library(org.Hs.eg.db)
> 
> data(geneList, package = "DOSE")
> 
> de1 <- names(geneList)[1:1250]
> de2 <- names(geneList)[1000:2750]
> de3 <- names(geneList)[4000:8500]
> de4 <- names(geneList)[3500:6500]
> 
> 
> enrich_gene_list_for_test <- list("set1"=de1, "set2"=de2,
+                                   "set3"=de3, "set4"=de4)
> 
> test_enrich <- compareCluster(geneCluster = enrich_gene_list_for_test, 
+                                   ont           = "BP",
+                                   OrgDb = org.Hs.eg.db,
+                                   pAdjustMethod = "BH",
+                                   pvalueCutoff  = 0.01,
+                                   qvalueCutoff  = 0.05,
+                                   fun =  enrichGO)
> 
> test_enrich
#
# Result of Comparing 4 gene clusters 
#
#.. @fun         enrichGO 
#.. @geneClusters       List of 4
 $ set1: chr [1:1250] "4312" "8318" "10874" "55143" ...
 $ set2: chr [1:1751] "9631" "5479" "5424" "6783" ...
 $ set3: chr [1:4501] "5294" "9790" "26207" "27043" ...
 $ set4: chr [1:3001] "84928" "55326" "3839" "9702" ...
#...Result      'data.frame':   846 obs. of  10 variables:
 $ Cluster    : Factor w/ 4 levels "set1","set2",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ID         : chr  "GO:0140014" "GO:0000280" "GO:0048285" "GO:0000070" ...
 $ Description: chr  "mitotic nuclear division" "nuclear division" "organelle fission" "mitotic sister chromatid segregation" ...
 $ GeneRatio  : chr  "68/1186" "85/1186" "90/1186" "49/1186" ...
 $ BgRatio    : chr  "287/18723" "439/18723" "488/18723" "168/18723" ...
 $ pvalue     : num  7.78e-22 8.78e-21 1.86e-20 3.65e-20 4.92e-19 ...
 $ p.adjust   : num  4.17e-18 2.35e-17 3.32e-17 4.89e-17 5.27e-16 ...
 $ qvalue     : num  3.42e-18 1.93e-17 2.73e-17 4.01e-17 4.33e-16 ...
 $ geneID     : chr  "55143/991/9493/1062/4605/10403/23397/9787/11065/51203/22974/10460/4751/27338/4085/81930/81620/3832/7272/64151/9"| __truncated__ "55143/991/9493/1062/4605/9133/10403/7153/23397/259266/9787/11065/51203/22974/10460/4751/10635/27338/4085/81930/"| __truncated__ "55143/991/9493/1062/4605/9133/10403/7153/23397/259266/9787/11065/51203/22974/10460/4751/10635/27338/4085/81930/"| __truncated__ "55143/991/9493/1062/10403/23397/9787/11065/51203/10460/4751/4085/81930/81620/7272/64151/9212/9319/9055/3833/146"| __truncated__ ...
 $ Count      : int  68 85 90 49 45 59 51 67 57 20 ...
#.. number of enriched terms found for each gene cluster:
#..   set1: 395 
#..   set2: 274 
#..   set3: 161 
#..   set4: 16 
#
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, 
W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. 
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. 
The Innovation. 2021, 2(3):100141 

> dotplot(test_enrich,label_format = 100 ,showCategory=20, font.size=8)
>

Result example code

Also, to exclude it has something to do with the Gene Ontology annotation; could you also run it with enrichKEGG to see whether discrepancies occur, or not? The gene sets from KEGG don't have any hiearchy (like parents, ancestors, etc).

> test_kegg <- compareCluster(geneCluster = enrich_gene_list_for_test, 
+                                   organism = "hsa",
+                                   pvalueCutoff  = 0.01,
+                                   qvalueCutoff  = 0.05,
+                                   fun =  enrichKEGG)
> 
> test_kegg 
#
# Result of Comparing 4 gene clusters 
#
#.. @fun         enrichKEGG 
#.. @geneClusters       List of 4
 $ set1: chr [1:1250] "4312" "8318" "10874" "55143" ...
 $ set2: chr [1:1751] "9631" "5479" "5424" "6783" ...
 $ set3: chr [1:4501] "5294" "9790" "26207" "27043" ...
 $ set4: chr [1:3001] "84928" "55326" "3839" "9702" ...
#...Result      'data.frame':   50 obs. of  10 variables:
 $ Cluster    : Factor w/ 4 levels "set1","set2",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ID         : chr  "hsa04110" "hsa03030" "hsa04061" "hsa03050" ...
 $ Description: chr  "Cell cycle" "DNA replication" "Viral protein interaction with cytokine and cytokine receptor" "Proteasome" ...
 $ GeneRatio  : chr  "46/654" "17/654" "26/654" "16/654" ...
 $ BgRatio    : chr  "126/8113" "36/8113" "100/8113" "46/8113" ...
 $ pvalue     : num  1.43e-19 4.20e-10 5.16e-08 2.65e-07 2.40e-06 ...
 $ p.adjust   : num  4.57e-17 6.72e-08 5.51e-06 2.12e-05 1.53e-04 ...
 $ qvalue     : num  3.95e-17 5.81e-08 4.77e-06 1.84e-05 1.33e-04 ...
 $ geneID     : chr  "8318/991/9133/890/983/4085/7272/1111/891/4174/9232/4171/993/990/5347/701/9700/898/23594/4998/9134/4175/4173/109"| __truncated__ "4174/4171/4175/4173/2237/5984/5111/10535/1763/5427/23649/4176/5982/5557/5558/4172/5424" "3627/10563/6373/4283/6362/6355/2921/6364/3576/6352/3559/1230/6347/3561/6351/1237/1236/11009/6354/6361/1234/6367"| __truncated__ "5688/5709/5698/5693/3458/5713/11047/5721/5691/5685/5690/5684/5686/5695/10213/23198" ...
 $ Count      : int  46 17 26 16 39 36 21 18 12 32 ...
#.. number of enriched terms found for each gene cluster:
#..   set1: 16 
#..   set2: 7 
#..   set3: 16 
#..   set4: 11 
#
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, 
W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. 
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. 
The Innovation. 2021, 2(3):100141 

> 
> dotplot(test_kegg ,label_format = 100 ,showCategory=20, font.size=8)
>

KEGG_pathways

SessionInfo() Linux machine:

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora Linux 35 (Thirty Five)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libflexiblas.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] GO.db_3.14.0          org.Hs.eg.db_3.14.0   AnnotationDbi_1.56.2 
[4] IRanges_2.28.0        S4Vectors_0.32.3      Biobase_2.54.0       
[7] BiocGenerics_0.40.0   clusterProfiler_4.2.0

ADD REPLY • link 2.4 years ago Guido Hooiveld ★ 4.0k

0

Entering edit mode

I tested your code in both my AMD laptop (machine get result1 as yours) and Intel desktop (get different result2)

AMD laptop produced identicial results as yours for both GO and KEGG enrichment
Intel desktop got slightly different GO result and the same KEGG result as yours

I would like to put out all the information Intel desktop results for your reference:

go result

test_enrich #

Result of Comparing 4 gene clusters

#

.. @fun enrichGO

.. @geneClusters List of 4 $ set1: chr [1:1250] "4312" "8318" "10874" "55143" ... $ set2: chr [1:1751] "9631" "5479" "5424" "6783"

... $ set3: chr [1:4501] "5294" "9790" "26207" "27043" ... $ set4: chr [1:3001] "84928" "55326" "3839" "9702" ...

...Result 'data.frame': 1120 obs. of 10 variables: $ Cluster : Factor w/ 4 levels "set1","set2",..: 1 1 1 1 1 1 1 1 1 1 ... $ ID

: chr "GO:0140014" "GO:0048285" "GO:0000280" "GO:0006261" ... $ Description: chr "mitotic nuclear division" "organelle fission" "nuclear division" "DNA-dependent DNA replication" ... $ GeneRatio : chr "69/1189" "90/1189" "84/1189" "47/1189" ... $ BgRatio : chr "296/18862" "486/18862" "436/18862" "157/18862" ... $ pvalue : num 7.94e-22 1.02e-20 1.55e-20 5.38e-20 6.19e-20 ... $ p.adjust : num 4.24e-18 2.73e-17 2.76e-17 6.61e-17 6.61e-17 ... $ qvalue : num 3.44e-18 2.21e-17 2.24e-17 5.37e-17 5.37e-17 ... $ geneID : chr "55143/991/9493/1062/4605/9133/10403/23397/9787/11065/51203/22974/10460/4751/27338/983/4085/81930/81620/3832/727"| __truncated__ "55143/991/9493/1062/4605/9133/10403/7153/23397/9787/11065/51203/22974/10460/4751/10635/27338/983/4085/81930/816"| __truncated__ "55143/991/9493/1062/4605/9133/10403/7153/23397/9787/11065/51203/22974/10460/4751/10635/27338/983/4085/81930/816"| __truncated__ "8318/55388/79733/9837/81620/51659/4174/4171/990/3159/79075/5888/898/23594/4998/9134/4175/4173/2237/10926/54962/"| __truncated__ ... $ Count : int 69 90 84 47 48 29 61 50 27 66 ...

.. number of enriched terms found for each gene cluster:

.. set1: 449

.. set2: 368

.. set3: 239

.. set4: 64

#

...Citation Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He. clusterProfiler: an R package for comparing biological themes among

gene clusters. OMICS: A Journal of Integrative Biology 2012,
16(5):284-287 dotplot(test_enrich,label_format = 100 ,showCategory=20, font.size=8)

kegg result

test_kegg #

Result of Comparing 4 gene clusters

#

.. @fun enrichKEGG

.. @geneClusters List of 4 $ set1: chr [1:1250] "4312" "8318" "10874" "55143" ... $ set2: chr [1:1751] "9631" "5479" "5424" "6783"

... $ set3: chr [1:4501] "5294" "9790" "26207" "27043" ... $ set4: chr [1:3001] "84928" "55326" "3839" "9702" ...

...Result 'data.frame': 50 obs. of 10 variables: $ Cluster : Factor w/ 4 levels "set1","set2",..: 1 1 1 1 1 1 1 1 1 1 ... $ ID

: chr "hsa04110" "hsa03030" "hsa04061" "hsa03050" ... $ Description: chr "Cell cycle" "DNA replication" "Viral protein interaction with cytokine and cytokine receptor" "Proteasome" ... $ GeneRatio : chr "46/654" "17/654" "26/654" "16/654" ... $ BgRatio : chr "126/8113" "36/8113" "100/8113" "46/8113" ... $ pvalue : num 1.43e-19 4.20e-10 5.16e-08 2.65e-07 2.40e-06 ... $ p.adjust : num 4.57e-17 6.72e-08 5.51e-06 2.12e-05 1.53e-04 ... $ qvalue : num 3.95e-17 5.81e-08 4.77e-06 1.84e-05 1.33e-04 ... $ geneID : chr "8318/991/9133/890/983/4085/7272/1111/891/4174/9232/4171/993/990/5347/701/9700/898/23594/4998/9134/4175/4173/109"| __truncated__ "4174/4171/4175/4173/2237/5984/5111/10535/1763/5427/23649/4176/5982/5557/5558/4172/5424" "3627/10563/6373/4283/6362/6355/2921/6364/3576/6352/3559/1230/6347/3561/6351/1237/1236/11009/6354/6361/1234/6367"| __truncated__ "5688/5709/5698/5693/3458/5713/11047/5721/5691/5685/5690/5684/5686/5695/10213/23198" ... $ Count : int 46 17 26 16 39 36 21 18 12 32 ...

.. number of enriched terms found for each gene cluster:

.. set1: 16

.. set2: 7

.. set3: 16

.. set4: 11

#

...Citation Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He. clusterProfiler: an R package for comparing biological themes among

gene clusters. OMICS: A Journal of Integrative Biology 2012,
16(5):284-287

dotplot(test_kegg ,label_format = 100 ,showCategory=20, font.size=8)

enter image description here

sessioninfor from desktop

sessionInfo() R version 4.1.1 (2021-08-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale: 1 LC_COLLATE=Chinese (Simplified)_China.936 LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936
system code page: 1252

attached base packages: 1 stats4 stats graphics grDevices utils datasets methods base

other attached packages: 1 GO.db_3.14.0 clusterProfiler_4.2.0 org.Hs.eg.db_3.14.0 AnnotationDbi_1.56.2 IRanges_2.28.0
[6] S4Vectors_0.32.3 Biobase_2.54.0 BiocGenerics_0.40.0

ADD REPLY • link 2.4 years ago liuzhy ▴ 10

0

Entering edit mode

Here may also attached the information in AMD laptop

GO

test_enrich #

Result of Comparing 4 gene clusters

#

.. @fun enrichGO

.. @geneClusters List of 4

$ set1: chr [1:1250] "4312" "8318" "10874" "55143" ... $ set2: chr [1:1751] "9631" "5479" "5424" "6783" ... $ set3: chr [1:4501] "5294" "9790" "26207" "27043" ... $ set4: chr [1:3001] "84928" "55326" "3839" "9702" ...

...Result 'data.frame': 846 obs. of 10 variables:

$ Cluster : Factor w/ 4 levels "set1","set2",..: 1 1 1 1 1 1 1 1 1 1 ... $ ID : chr "GO:0140014" "GO:0000280" "GO:0048285" "GO:0000070" ... $ Description: chr "mitotic nuclear division" "nuclear division" "organelle fission" "mitotic sister chromatid segregation" ... $ GeneRatio : chr "68/1186" "85/1186" "90/1186" "49/1186" ... $ BgRatio : chr "287/18723" "439/18723" "488/18723" "168/18723" ... $ pvalue : num 7.78e-22 8.78e-21 1.86e-20 3.65e-20 4.92e-19 ... $ p.adjust : num 4.17e-18 2.35e-17 3.32e-17 4.89e-17 5.27e-16 ... $ qvalue : num 3.42e-18 1.93e-17 2.73e-17 4.01e-17 4.33e-16 ... $ geneID : chr "55143/991/9493/1062/4605/10403/23397/9787/11065/51203/22974/10460/4751/27338/4085/81930/81620/3832/7272/64151/9"| __truncated__ "55143/991/9493/1062/4605/9133/10403/7153/23397/259266/9787/11065/51203/22974/10460/4751/10635/27338/4085/81930/"| __truncated__ "55143/991/9493/1062/4605/9133/10403/7153/23397/259266/9787/11065/51203/22974/10460/4751/10635/27338/4085/81930/"| __truncated__ "55143/991/9493/1062/10403/23397/9787/11065/51203/10460/4751/4085/81930/81620/7272/64151/9212/9319/9055/3833/146"| __truncated__ ... $ Count : int 68 85 90 49 45 59 51 67 57 20 ...

.. number of enriched terms found for each gene cluster:

.. set1: 395

.. set2: 274

.. set3: 161

.. set4: 16

#

...Citation

Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology 2012, 16(5):284-287

dotplot(test_enrich,label_format = 100 ,showCategory=20, font.size=8)

enter image description here

KEGG

test_kegg #

Result of Comparing 4 gene clusters

#

.. @fun enrichKEGG

.. @geneClusters List of 4

$ set1: chr [1:1250] "4312" "8318" "10874" "55143" ... $ set2: chr [1:1751] "9631" "5479" "5424" "6783" ... $ set3: chr [1:4501] "5294" "9790" "26207" "27043" ... $ set4: chr [1:3001] "84928" "55326" "3839" "9702" ...

...Result 'data.frame': 50 obs. of 10 variables:

$ Cluster : Factor w/ 4 levels "set1","set2",..: 1 1 1 1 1 1 1 1 1 1 ... $ ID : chr "hsa04110" "hsa03030" "hsa04061" "hsa03050" ... $ Description: chr "Cell cycle" "DNA replication" "Viral protein interaction with cytokine and cytokine receptor" "Proteasome" ... $ GeneRatio : chr "46/654" "17/654" "26/654" "16/654" ... $ BgRatio : chr "126/8113" "36/8113" "100/8113" "46/8113" ... $ pvalue : num 1.43e-19 4.20e-10 5.16e-08 2.65e-07 2.40e-06 ... $ p.adjust : num 4.57e-17 6.72e-08 5.51e-06 2.12e-05 1.53e-04 ... $ qvalue : num 3.95e-17 5.81e-08 4.77e-06 1.84e-05 1.33e-04 ... $ geneID : chr "8318/991/9133/890/983/4085/7272/1111/891/4174/9232/4171/993/990/5347/701/9700/898/23594/4998/9134/4175/4173/109"| __truncated__ "4174/4171/4175/4173/2237/5984/5111/10535/1763/5427/23649/4176/5982/5557/5558/4172/5424" "3627/10563/6373/4283/6362/6355/2921/6364/3576/6352/3559/1230/6347/3561/6351/1237/1236/11009/6354/6361/1234/6367"| __truncated__ "5688/5709/5698/5693/3458/5713/11047/5721/5691/5685/5690/5684/5686/5695/10213/23198" ... $ Count : int 46 17 26 16 39 36 21 18 12 32 ...

.. number of enriched terms found for each gene cluster:

.. set1: 16

.. set2: 7

.. set3: 16

.. set4: 11

#

...Citation

Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology 2012, 16(5):284-287

dotplot(test_kegg ,label_format = 100 ,showCategory=20, font.size=8)
sessioninfor from AMD laptop

R version 4.1.2 (2021-11-01) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale: 1 LC_COLLATE=Chinese (Simplified)_China.936 LC_CTYPE=Chinese (Simplified)_China.936 LC_MONETARY=Chinese (Simplified)_China.936 [4] LC_NUMERIC=C LC_TIME=Chinese (Simplified)_China.936

attached base packages: 1 stats4 stats graphics grDevices utils datasets methods base

other attached packages: 1 GO.db_3.14.0 org.Hs.eg.db_3.14.0 AnnotationDbi_1.56.2 IRanges_2.28.0 S4Vectors_0.32.3 Biobase_2.54.0
[7] BiocGenerics_0.40.0 clusterProfiler_4.2.0

ADD REPLY • link 2.4 years ago liuzhy ▴ 10

0

Entering edit mode

sorry that I do not know how to make the copied information look better. Really thanks a lot for your effort here!

ADD REPLY • link 2.4 years ago liuzhy ▴ 10

1

Entering edit mode

Please note that you can use the button labeled CODE (5th from left) to format your text as code.

Let's compare the results ( I copied the most relevant output):

GO ORA (using enrichGO):

Confirm that the gene input is indeed the same for my and both your analysis:

#.. @geneClusters       List of 4
 $ set1: chr [1:1250] "4312" "8318" "10874" "55143" ...
 $ set2: chr [1:1751] "9631" "5479" "5424" "6783" ...
 $ set3: chr [1:4501] "5294" "9790" "26207" "27043" ...
 $ set4: chr [1:3001] "84928" "55326" "3839" "9702" ...

My result:

#.. number of enriched terms found for each gene cluster:
#..   set1: 395 
#..   set2: 274 
#..   set3: 161 
#..   set4: 16 
#

Your AMD result:

.. number of enriched terms found for each gene cluster:
.. set1: 395
.. set2: 274
.. set3: 161
.. set4: 16

--> Numbers are indeed the same as mine!

Your Intel result:

.. number of enriched terms found for each gene cluster:
.. set1: 449
.. set2: 368
.. set3: 239
.. set4: 64

--> These numbers indeed differ substantially....

Have a look at the details:

Stats results from my run:

#...Result      'data.frame':   846 obs. of  10 variables:
 $ Cluster    : Factor w/ 4 levels "set1","set2",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ID         : chr  "GO:0140014" "GO:0000280" "GO:0048285" "GO:0000070" ...
 $ Description: chr  "mitotic nuclear division" "nuclear division" "organelle fission" "mitotic sister chromatid segregation" ...
 $ GeneRatio  : chr  "68/1186" "85/1186" "90/1186" "49/1186" ...
 $ BgRatio    : chr  "287/18723" "439/18723" "488/18723" "168/18723" ...
 $ pvalue     : num  7.78e-22 8.78e-21 1.86e-20 3.65e-20 4.92e-19 ...
 $ p.adjust   : num  4.17e-18 2.35e-17 3.32e-17 4.89e-17 5.27e-16 ...

Stats results from your (identical) AMD run:

..Result 'data.frame': 846 obs. of 10 variables:
$ Cluster : Factor w/ 4 levels "set1","set2",..: 1 1 1 1 1 1 1 1 1 1 ... 
$ ID : chr "GO:0140014" "GO:0000280" "GO:0048285" "GO:0000070" ... 
$ Description: chr "mitotic nuclear division" "nuclear division" "organelle fission" "mitotic sister chromatid segregation" ... 
$ GeneRatio : chr "68/1186" "85/1186" "90/1186" "49/1186" ... 
$ BgRatio : chr "287/18723" "439/18723" "488/18723" "168/18723" ... 
$ pvalue : num 7.78e-22 8.78e-21 1.86e-20 3.65e-20 4.92e-19 ... 
$ p.adjust : num 4.17e-18 2.35e-17 3.32e-17 4.89e-17 5.27e-16 ...

Note that all numbers, including those for foreground and background genes, are the same.

For example, for first GO category (GO:0140014) these are 68/1186 and 287/18723. Therefore identical p- and FDR values are calculated.

Note that in total 18723 genes have been assigned a GO annotation.

Stats from your differing Intel run:

...Result 'data.frame': 1120 obs. of 10 variables:
$ Cluster : Factor w/ 4 levels "set1","set2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ ID: chr "GO:0140014" "GO:0048285" "GO:0000280" "GO:0006261" ... 
$ Description: chr "mitotic nuclear division" "organelle fission" "nuclear division" "DNA-dependent DNA replication" ... 
$ GeneRatio : chr "69/1189" "90/1189" "84/1189" "47/1189" ... 
$ BgRatio : chr "296/18862" "486/18862" "436/18862" "157/18862" ... 
$ pvalue : num 7.94e-22 1.02e-20 1.55e-20 5.38e-20 6.19e-20 ... 
$ p.adjust : num 4.24e-18 2.73e-17 2.76e-17 6.61e-17 6.61e-17 ...

Note that all numbers, including these of fore- and background genes, differ!

For GO category GO:0140014 the fore/background gene numbers are 69/1189 and 296/18862. Therefore different p/FDR values.

Also, in total 18862 genes have been assigned a GO annotation.

My conclusion: somehow the underlying GO annotations are different between your 2 systems.

I recommend to force a reinstall of the relevant annotation packages on both machines in a fresh R session:

BiocManager::install( c("GO.db", "org.Hs.eg.db"), force=TRUE )

KEGG ORA (using enrichKEGG):

If you compare the 3 enrichKEGG results, then it turns out identical results are returned for all fore- and background numbers (ratios), and thus statistical results.

This is because enrichKEGG downloads the KEGG pathway information on-the-fly from the KEGG website each time it is executed.

You can therefore exclude that the observed differences are due to platform-introduced (rounding) differences when doing the calculations.

ADD REPLY • link 2.4 years ago Guido Hooiveld ★ 4.0k

0

Entering edit mode

Great thanks to you Guido!

It really works by force reinstalling the packages. Although the sessioninfor remained unchanged, the Intel desktop now produce the identical result as from the AMD laptop. enter image description here

R version 4.1.1 (2021-08-10)

Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale: 1 LC_COLLATE=Chinese (Simplified)_China.936 LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936
system code page: 1252

attached base packages: 1 stats4 stats graphics grDevices utils datasets methods base

other attached packages: 1 GO.db_3.14.0 org.Hs.eg.db_3.14.0 AnnotationDbi_1.56.2 IRanges_2.28.0 S4Vectors_0.32.3
[6] Biobase_2.54.0 BiocGenerics_0.40.0 clusterProfiler_4.2.1

Now my posted issue turns out to be a stupid question... Your logical, convincing analysis and huge testing effort paied here really warm my heart. Greatly appreciate it!

Finally, I guess that I should select the updated result1 for my further analysis. And also send the information to my colleagues to update the pakcages. However, honestly, I now feel somehow more confused and lots of uncertainty. Seems that more comprehensive results are still in the future.

Thanks again Guido!

ADD REPLY • link 2.4 years ago liuzhy ▴ 10

0

Entering edit mode

Glad to hear it has been solved!

You will know that gene annotations may change over time. For each biannual Bioconductor release, the Bioconductor core team creates snapshots of many publicly available annotation databases, and also archives these. This allows for generating fully reproducible results. That's the reason why in the end the both of us obtained identical results (but this would not have been the case if we would have used different versions of R/Bioconductor).

Bottom line: if you have installed a version of Bioconductor together with the annotation packages of that release version, then results should always be identical.

This command may be useful to check: BiocManager::valid()

ADD REPLY • link 2.4 years ago Guido Hooiveld ★ 4.0k

1

Entering edit mode

Again, glad to get another useful command tip~

Not only for the specific posted issue here, but also for all the warm insights you shared, I want to say thanks more than once and twice. As an immature beginner for this area, I really feel luck here to get this learning experience from you. Thank you Guido for all your shared. Here my best wishes to you for the coming new year ~

ADD REPLY • link 2.4 years ago liuzhy ▴ 10