**20**wrote:

Hello everybody,

I would like to calculate and plot the enrichment score of a selection from a ranked list, as many readers are familar with the gene set enrichment plots produced by the Broad Institute's GSEA implementation.

While the basic principle is clear to me, I failed to understand how the magnitude is determined, by which the running sum statistic is increased or decreased. The paper and the FAQ only say "*the magnitude of the increment depends on the correlation of the gene with the phenotype*" and a otherwise very helpful third-party site also doesn't explain how it is calculated.

Please find a working example code to illustrate my current approach below in case you would like to try it out for yourself. However to help me an explanation or formula to calculate the enrichment score would already suffice, since my current result doesn't look similar to the enrichment score of GSEA plots at all. Thanks a lot for your help.

Matthias

testvalues <- rweibull(1e4,shape=1,scale=2) + rnorm(1e4) testdata <- data.frame( rank = rank(testvalues,ties.method = "random"), value = testvalues, random_set = sample( c(TRUE,FALSE),size = 1e4,replace = TRUE,prob = c(0.02,0.98) ), enriched_set = ( # place random TRUES, but only in the upper 50% range rank(testvalues,ties.method = "random") > 5e3 & sample( c(TRUE,FALSE),size = 1e4,replace = TRUE,prob = c(0.05,0.95) ) ) ) # order dataset and reverse ranks (so biggest value is ranked first) testdata <- testdata[order(testdata[,"rank"]),] testdata[,"rank"] <- rev(testdata[,"rank"]) testdata <- rev(testdata) # fruitless attempt to calculate someting like an enrichment score testdata[,c("ES_random_set","ES_enriched_set")] <- apply(testdata[,c("random_set","enriched_set")],2,function(x) { enscore <- as.numeric(x) # normalize magnitutde for selection size enscore[enscore == 0] <- (-1 / (length(x) / sum(x) - 1)) return(round(cumsum(enscore),3)*-1) }) ### plot the wannabe enrichment score plot_es <- function(plotdata,set,es){ library("ggplot2") library("gridExtra") curve <- ggplot(plotdata,aes_string(x="rank", y="value")) + geom_line() + geom_point(data=plotdata[plotdata[,set],],color="red",size=5,shape=124) esplot <- ggplot(plotdata,aes_string(x="rank", y=es)) + geom_line() return(grid.arrange(curve,esplot,nrow=2,heights=c(3,1))) } plot_es(testdata,"random_set","ES_random_set") plot_es(testdata,"enriched_set","ES_enriched_set")

**32k**• written 20 months ago by Matthias Z. •

**20**