Entering edit mode
Hello
I am trying to run a GSSEA using the mouse hallmark pathways.
I keep getting an error with the following code line:
em2 <- GSEA(genes$d.symbol, TERM2GENE = m_t2g)
This returns the error:
Error in GSEA_internal(geneList = geneList, exponent = exponent, minGSSize = minGSSize, : geneList should be a decreasing sorted vector...
Any help in understanding how to 'rank' this in order allow me to run tthe GSEA. I have included my full code below
Thank you
```{r}
library(tidyverse)
library(tidyr)
library(readr)
library(cluster.datasets)
library(readxl)
Now we are going to arrange the file 'd' in descending log2foldchange. Then we filter for a log2fold change >1 and create this as a file called gene.
library(dplyr)
d <- res_7hour %>%
arrange(desc(log2FoldChange))
gene <- dplyr::filter(d, padj > 0.05)
head(gene)
library(AnnotationDbi)
library(org.Hs.eg.db)
library(ensembldb)
library(org.Mm.eg.db)
library(AnnotationDbi)
d$ENTREZ <- mapIds(EnsDb.Mmusculus.v79,
keys=d$symbol,
column="ENTREZID",
keytype="SYMBOL",
multiVals="first")
GSEA FOR PROCINE DATA STARTS HERE +++++
Then we load the msigbr package and use the homo sapiens geneome. Then we run the analysis and create m_t2g. The output file has the 'Hallmark' tags removed.
library(msigdbr)
msigdbr_show_species()
m_df <- msigdbr(species = "Mus musculus")
head(m_df, 2) %>% as.data.frame
m_t2g <- msigdbr(species = "mouse", category = "C2", subcategory = "CGP") %>%
dplyr::select(gs_name, gene_symbol)
m_t2g$gs_name <- gsub("_"," ", m_t2g$gs_name)
m_t2g$gs_name <- tolower(m_t2g$gs_name)
m_t2g$gs_name <- gsub("(^|[[:space:]])([[:alpha:]])", "\\1\\U\\2", m_t2g$gs_name, perl=TRUE)
m_t2g$gs_name <- gsub("Hallmark ","", m_t2g$gs_name)
Now we name genes as the log2foldchange, and names(genes) as the ENTREZ id. Them we run the enrichment and GSEA. Then we paste the results into a file that we can save as a CSV file
library(fgsea)
library(enrichR)
library(clusterProfiler)
library(dplyr)
genes <- data.frame(d$log2FoldChange, d$symbol)
names(genes) <- d$symbol
genes <- genes %>% dplyr::arrange(desc(d.log2FoldChange))
library(fgsea)
library(enrichR)
library(clusterProfiler)
em <- enricher(gene = gene$symbol, TERM2GENE=m_t2g)
em2 <- GSEA(genes$d.symbol, TERM2GENE = m_t2g)
head(em)
head(em2)
library(AnnotationDbi)
library(org.Hs.eg.db)
library(stats4)
library(BiocGenerics)
library(utils)
em2_D7 <- setReadable(em2, 'EnsDb.Mmusculus.v79', 'ENTREZID')
H_D7_moouse_gsea = paste("~/Documents/sequencing/Kallisto_input/Kallisto_injury/Day_7_Analysis","Hallmark_rabb_gsea","_res.csv", sep="")
write.table(em2_D7, file=H_D7_moouse_gsea, quote=FALSE, sep= ",", row.names= FALSE)
# include your problematic code here with any corresponding output
# please also include the results of running the following in an R session
sessionInfo( )
Two things that are going wrong in your code : you create a data.frame
genes <- data.frame(d$log2FoldChange, d$symbol)
but then you assignednames()
as if it was a vector whereas it is a data.frame. Finally, you did not create a geneList correctly formatted : a geneList correctly formatted is a vector with assigned names, and not a column from a data.frame. You'd better go withThanks for this. This works. but then the line em2....(GSEA(...) comes back with this error: preparing geneSet collections... --> Expected input gene ID: Ugt2b36,Lrp1,Dapk3,Tac1,Scrn1,Fgg Error in check_gene_id(geneList, geneSets) : --> No gene can be mapped....
Which makes me think the set up of my mouse database is incorrect? Or I am not pulling the gene names from my data correctly.
Yes probably it does not match. Could you show
names(genes)[1:10]
?This returns:
glimpse(d) (as d is what we are trying to draw from), shows:
Hope that helps?
Yes sorry it was an error from me, gene names corresponds to
symbol
column in your data :Thank. This works.
Should I change the line here to read symbol or gene also:
This returns:
glimpse(d) (as d is what we are trying to draw from), shows:
Hope that helps?