Hi all,
I ran the following code:
library(tidyverse)
res <- read_csv("1_FGSEAFULL.csv")
res
library(org.Hs.eg.db)
ens2symbol <- AnnotationDbi::select(org.Hs.eg.db,
                                    keys = res$row,
                                    columns = "SYMBOL",
                                    keytype = "ENSEMBL")
ens2symbol <- as_tibble(ens2symbol)
ens2symbol
res <- inner_join(res, ens2symbol, by=c("row"="ENSEMBL"))
res
res2 <- res %>%
  dplyr::select(SYMBOL, stat) %>%
  na.omit() %>%
  distinct() %>%
  group_by(SYMBOL) %>%
  summarize(stat = mean(stat))
res2
library(fgsea)
ranks <- deframe(res2)
head(ranks, 50)
pathways.hallmark <- gmtPathways("human.gmt")
pathways.hallmark %>%
  head() %>%
  lapply(head)
fgseaRes <- fgsea(pathways=pathways.hallmark, stats=ranks, eps=0)
fgseaResTidy <- fgseaRes %>%
  as_tibble() %>%
  arrange(desc(NES))
fgseaResTidy %>%
  dplyr::select(-leadingEdge, -ES) %>%
  arrange(padj) %>%
  DT::datatable()
ggplot(fgseaResTidy, aes(reorder(pathway, NES), NES)) +
  geom_col(aes(fill = padj < 0.05)) +
  coord_flip() +
  labs(x = "Pathway", y = "Normalized Enrichment Score",
       title = "Hallmark pathways NES from GSEA") +
  theme_classic() +
  # size must be numeric, not the string "15"
  theme(axis.title.y = element_text(face = "bold", color = "black", size = 15),
        axis.text.y.left = element_text(face = "bold", color = "black"),
        axis.text.y.right = element_text(face = "bold"),
        title = element_text(face = "bold"))
devtools::install_github("ctlab/fgsea")
Initially I used nperm = 1000, which resulted in a warning about fgseaSimple being used. So instead I used eps = 0, which mitigated that issue. However, the following warning persists:
"Warning message: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (33.92% of the list). The order of those tied genes will be arbitrary, which may produce unexpected results."
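A quick way to see how much of the ranked list is tied is to count repeated values directly. The sketch below uses a toy named vector in place of `ranks`. It assumes the percentage in the fgsea warning is the number of repeated values (duplicates after their first occurrence) divided by the list length; it also reports the broader share of genes involved in any tie and the most common tied values.

```r
# Toy named vector standing in for `ranks` from the script above
ranks <- c(GENE1 = 2.5, GENE2 = 1.2, GENE3 = 1.2,
           GENE4 = -0.4, GENE5 = -0.4, GENE6 = -0.4)

# Repeated values divided by list length (assumed to match fgsea's percentage)
tie_fraction <- sum(duplicated(ranks)) / length(ranks)

# Share of genes involved in any tie, first occurrences included
in_tie <- duplicated(ranks) | duplicated(ranks, fromLast = TRUE)
tie_share_any <- mean(in_tie)

# Most heavily tied values, largest groups first
tie_table <- sort(table(ranks[in_tie]), decreasing = TRUE)

cat(sprintf("Repeated values: %.2f%% of the list\n", 100 * tie_fraction))
print(tie_table)
```

On the real data, a few values with very large counts (for example many genes sharing the same rounded stat) would point to the rounding in the provided file as the main source of ties.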
Does anyone know how to resolve this warning? The consistency of the plot is being questioned because of the variation in the results.
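Two separate things may be going on here. fgsea's multilevel algorithm uses random sampling, so calling `set.seed()` before `fgsea()` should generally make the p-values reproducible between runs. The tie warning itself is not affected by the seed. One option, sketched below as a hypothetical helper (`break_ties_by_name` is not part of fgsea), is to subtract a tiny, fixed offset from tied genes so their order is the same every time. This only makes the arbitrary order reproducible; it does not make it biologically meaningful, and requesting the unrounded statistics from the data provider would be the cleaner fix.

```r
# Hypothetical helper (not part of fgsea): give tied stats a fixed, reproducible order.
# Tied genes are ordered alphabetically by name, which is still an arbitrary
# choice, but the same choice on every run.
break_ties_by_name <- function(stats) {
  stopifnot(is.numeric(stats), !is.null(names(stats)))
  # Sort by value (descending), breaking ties by gene name
  sorted <- stats[order(-stats, names(stats))]
  # Group index for each distinct value (exact equality)
  distinct_vals <- unique(sorted)
  grp <- match(sorted, distinct_vals)
  # Keep every offset far below the smallest gap between distinct values,
  # so the order of genes with different stats is never changed
  min_gap <- if (length(distinct_vals) > 1) min(abs(diff(distinct_vals))) else 1
  step <- min_gap * 1e-3 / max(tabulate(grp))
  # Position of each gene within its tie group: 0, 1, 2, ...
  within <- ave(seq_along(sorted), grp, FUN = seq_along) - 1
  sorted - within * step
}

# Toy example standing in for `ranks`
ranks <- c(TP53 = 1.2, MYC = 1.2, EGFR = -0.4, AKT1 = -0.4, KRAS = 2.5)
ranks_untied <- break_ties_by_name(ranks)
print(ranks_untied)

# fgsea's multilevel sampling is random; fix the seed before calling it
set.seed(42)
# With fgsea loaded:
# fgseaRes <- fgsea(pathways = pathways.hallmark, stats = ranks_untied, eps = 0)
```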
I looked into my data (which was provided by another company); most of the values are duplicated and are only given to 4 significant figures. However, I am unable to recalculate the following columns myself: pvalue, padj, and lfcSE. Do you happen to know the Excel formulas?
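If the CSV is a DESeq2 results table (the `row` column matches what `results(dds, tidy = TRUE)` returns, so this seems likely but is an assumption), all three columns follow from `stat` under the default Wald test: `stat = log2FoldChange / lfcSE`, `pvalue` is the two-sided standard normal tail probability of `stat`, and `padj` is the Benjamini-Hochberg adjustment of `pvalue`. A minimal R sketch with toy values, with the Excel equivalents in comments:

```r
# Assumption: columns come from a DESeq2 Wald-test results table.
# Toy values with hypothetical IDs; replace with the real columns from the CSV.
res_toy <- data.frame(
  row            = c("ENSG_A", "ENSG_B", "ENSG_C"),
  log2FoldChange = c(1.5, -0.8, 0.3),
  stat           = c(3.0, -2.0, 0.5)
)

# Wald statistic = log2FoldChange / lfcSE, so lfcSE = log2FoldChange / stat
# (undefined when stat is 0; not valid if log2FoldChange was shrunk with lfcShrink)
# Excel equivalent: =log2FoldChange/stat
res_toy$lfcSE <- res_toy$log2FoldChange / res_toy$stat

# Two-sided p-value from the standard normal distribution
# Excel equivalent: =2*(1-NORM.S.DIST(ABS(stat),TRUE))
res_toy$pvalue <- 2 * pnorm(abs(res_toy$stat), lower.tail = FALSE)

# Benjamini-Hochberg adjustment across all genes in the table
res_toy$padj <- p.adjust(res_toy$pvalue, method = "BH")

print(res_toy)
```

padj has no single-cell Excel formula, because BH needs all p-values ranked together. Since `stat` is rounded to 4 significant figures, recomputed values will only approximately match the originals. DESeq2 also computes padj only over the genes that pass independent filtering (the rest get NA), so recomputed padj values can differ from the provided ones.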