Hi! I am not sure whether I misunderstand how GSEA works or whether there is an error in my GSEA analysis. I am using clusterProfiler (the gseKEGG function). Input data:
- Output from DESeq2 for "control vs mutant" and "mutant vs control". The DESeq2 input was the same; I just set the contrast in both directions.
- Genes ranked by log2FoldChange. The sign of the log2FC is of course inverted for "control vs mutant" compared to "mutant vs control", but the absolute value is the same. So genes at the top of the ranked list in "control vs mutant" are at the bottom of the list in "mutant vs control": the ranking is in exactly the inverse order.
E.g. "control vs mutant":
- Gene A log2FC 3
- Gene B log2FC 0.5
- Gene C log2FC -4
E.g. "mutant vs control":
- Gene C log2FC 4
- Gene B log2FC -0.5
- Gene A log2FC -3
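The mirror relationship in this example can be checked directly. A minimal sketch, assuming every log2FC simply changes sign between the two contrasts; the gene names and values are the toy ones from the example, not real data:

```r
# Toy log2FC values for "control vs mutant" (illustrative only)
control_vs_mutant <- c(GeneA = 3, GeneB = 0.5, GeneC = -4)

# Assumption: the reverse contrast is the exact negation
mutant_vs_control <- -control_vs_mutant

# Rank both lists the same way as in the analysis (decreasing log2FC)
ranked_cm <- sort(control_vs_mutant, decreasing = TRUE)
ranked_mc <- sort(mutant_vs_control, decreasing = TRUE)

# If the rankings are mirror images, one name order is the reverse of the other
mirror_ok <- identical(names(ranked_cm), rev(names(ranked_mc)))
mirror_ok
```

Running the same comparison on the two real ranked vectors would show whether they are in fact exact mirrors before they go into gseKEGG.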
Why are the enriched gene sets in "control vs mutant" and "mutant vs control" not the same (with the sign of the enrichment score inverted, of course)? I get different significantly enriched pathways for the two contrasts.
Results: "control vs mutant" gives me 1 enriched pathway
"mutant vs control" gives me 5 enriched pathways
I would expect both to give, e.g., the same 5 enriched pathways, with only the sign of the enrichment score inverted. Why is this not the case?
Thanks in advance for your help!
library(clusterProfiler)
set.seed(1234)

# liste is the DESeq2 output (with a KEGG.ID column).
# The same code was used for "control vs mutant" and "mutant vs control".
liste2 <- na.omit(liste)  # removes genes for which I have no KEGG number annotated
original_gene_list <- liste2$log2FoldChange
names(original_gene_list) <- liste2$KEGG.ID
gene_list <- sort(original_gene_list, decreasing = TRUE)
# tried with and without removing duplicates
gene_list_d <- gene_list[!duplicated(names(gene_list))]
kk <- gseKEGG(geneList = gene_list_d,
              organism = "ko",
              keyType = "kegg",
              pvalueCutoff = 0.05,
              pAdjustMethod = "BH",
              seed = TRUE)
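One thing that may be worth checking in the code above, offered as a possible source of asymmetry rather than a diagnosis: when several genes share one KEGG ID, sorting in decreasing order and then applying `!duplicated()` keeps the largest value per ID in each direction, so the two de-duplicated lists need not be mirror images. A minimal sketch with made-up values; the KO IDs and the helper `rank_dedup` are hypothetical:

```r
# Hypothetical toy data: two genes map to the same KO ID (K00002)
lfc <- c(K00001 = 3, K00002 = 2, K00002 = -1, K00003 = -4)

# Same ranking and de-duplication steps as in the analysis code
rank_dedup <- function(x) {
  x <- sort(x, decreasing = TRUE)
  x[!duplicated(names(x))]
}

fwd <- rank_dedup(lfc)   # "control vs mutant": keeps K00002 = 2
bwd <- rank_dedup(-lfc)  # "mutant vs control": keeps K00002 = 1 (originally -1)

# Compare the two lists ID by ID; TRUE only if one is the exact negation of the other
ids <- sort(names(fwd))
mirror_after_dedup <- isTRUE(all.equal(unname(fwd[ids]), -unname(bwd[ids])))
mirror_after_dedup  # FALSE for this toy data
```

If the same check returns FALSE on the real `gene_list_d` vectors, the two gseKEGG runs are not receiving mirrored input, and differing results would not be surprising.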
sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] clusterProfiler_4.11.0.002 DESeq2_1.42.1 SummarizedExperiment_1.32.0 Biobase_2.62.0 MatrixGenerics_1.14.0 matrixStats_1.2.0
[7] GenomicRanges_1.54.1 GenomeInfoDb_1.38.8 IRanges_2.36.0 S4Vectors_0.40.2 BiocGenerics_0.48.1
loaded via a namespace (and not attached):
[1] DBI_1.2.2 bitops_1.0-7 gson_0.1.0 shadowtext_0.1.3 gridExtra_2.3 rlang_1.1.3 magrittr_2.0.3
[8] DOSE_3.28.2 compiler_4.3.2 RSQLite_2.3.6 png_0.1-8 vctrs_0.6.5 reshape2_1.4.4 stringr_1.5.1
[15] pkgconfig_2.0.3 crayon_1.5.2 fastmap_1.1.1 XVector_0.42.0 ggraph_2.2.1 utf8_1.2.4 HDO.db_0.99.1
[22] enrichplot_1.22.0 purrr_1.0.2 bit_4.0.5 zlibbioc_1.48.2 cachem_1.0.8 aplot_0.2.2 jsonlite_1.8.8
[29] blob_1.2.4 DelayedArray_0.28.0 BiocParallel_1.36.0 tweenr_2.0.3 parallel_4.3.2 R6_2.5.1 stringi_1.8.3
[36] RColorBrewer_1.1-3 GOSemSim_2.28.1 Rcpp_1.0.12 snow_0.4-4 Matrix_1.6-1.1 splines_4.3.2 igraph_2.0.3
[43] tidyselect_1.2.1 qvalue_2.34.0 rstudioapi_0.16.0 abind_1.4-5 viridis_0.6.5 codetools_0.2-19 lattice_0.21-9
[50] tibble_3.2.1 plyr_1.8.9 treeio_1.26.0 withr_3.0.0 KEGGREST_1.42.0 gridGraphics_0.5-1 scatterpie_0.2.2
[57] polyclip_1.10-6 Biostrings_2.70.3 ggtree_3.10.1 pillar_1.9.0 ggfun_0.1.4 generics_0.1.3 RCurl_1.98-1.14
[64] ggplot2_3.5.0 tidytree_0.4.6 munsell_0.5.1 scales_1.3.0 glue_1.7.0 lazyeval_0.2.2 tools_4.3.2
[71] data.table_1.15.4 fgsea_1.28.0 locfit_1.5-9.9 fs_1.6.3 graphlayouts_1.1.1 fastmatch_1.1-4 tidygraph_1.3.1
[78] cowplot_1.1.3 grid_4.3.2 ape_5.7-1 tidyr_1.3.1 AnnotationDbi_1.64.1 colorspace_2.1-0 nlme_3.1-163
[85] patchwork_1.2.0 GenomeInfoDbData_1.2.11 ggforce_0.4.2 cli_3.6.2 fansi_1.0.6 S4Arrays_1.2.1 viridisLite_0.4.2
[92] dplyr_1.1.4 gtable_0.3.4 yulab.utils_0.1.4 digest_0.6.35 ggplotify_0.1.2 SparseArray_1.2.4 ggrepel_0.9.5
[99] farver_2.1.1 memoise_2.0.1 lifecycle_1.0.4 httr_1.4.7 GO.db_3.18.0 bit64_4.0.5 MASS_7.3-60