Mouse GSEA Analysis
0
0
Entering edit mode
Rob • 0
@9a3af295
Last seen 17 months ago
United Kingdom

Hello

I am trying to run a GSSEA using the mouse hallmark pathways.

I keep getting an error with the following code line:

em2 <- GSEA(genes$d.symbol, TERM2GENE = m_t2g)

This returns the error:

Error in GSEA_internal(geneList = geneList, exponent = exponent, minGSSize = minGSSize, : geneList should be a decreasing sorted vector...

Any help in understanding how to 'rank' this in order allow me to run tthe GSEA. I have included my full code below

Thank you


```{r}
library(tidyverse)
library(tidyr)
library(readr)
library(cluster.datasets)
library(readxl)

Now we are going to arrange the file 'd' in descending log2foldchange. Then we filter for a log2fold change >1 and create this as a file called gene.

library(dplyr)
d <- res_7hour %>% 
  arrange(desc(log2FoldChange))
gene <- dplyr::filter(d, padj > 0.05)
head(gene)
library(AnnotationDbi)
library(org.Hs.eg.db)
library(ensembldb)
library(org.Mm.eg.db)
library(AnnotationDbi)
d$ENTREZ <- mapIds(EnsDb.Mmusculus.v79,
                         keys=d$symbol,
                         column="ENTREZID",
                         keytype="SYMBOL",
                         multiVals="first")

GSEA FOR PROCINE DATA STARTS HERE +++++

Then we load the msigbr package and use the homo sapiens geneome. Then we run the analysis and create m_t2g. The output file has the 'Hallmark' tags removed.

library(msigdbr)
msigdbr_show_species()
m_df <- msigdbr(species = "Mus musculus")
head(m_df, 2) %>% as.data.frame

m_t2g <- msigdbr(species = "mouse", category = "C2", subcategory = "CGP") %>% 
  dplyr::select(gs_name, gene_symbol)

m_t2g$gs_name <- gsub("_"," ", m_t2g$gs_name)
m_t2g$gs_name <- tolower(m_t2g$gs_name)
m_t2g$gs_name <- gsub("(^|[[:space:]])([[:alpha:]])", "\\1\\U\\2", m_t2g$gs_name, perl=TRUE)
m_t2g$gs_name <- gsub("Hallmark ","", m_t2g$gs_name)

Now we name genes as the log2foldchange, and names(genes) as the ENTREZ id. Them we run the enrichment and GSEA. Then we paste the results into a file that we can save as a CSV file

library(fgsea)
library(enrichR)
library(clusterProfiler)
library(dplyr)

genes <- data.frame(d$log2FoldChange, d$symbol)
names(genes) <- d$symbol

genes <- genes %>% dplyr::arrange(desc(d.log2FoldChange))


library(fgsea)
library(enrichR)
library(clusterProfiler)
em <- enricher(gene = gene$symbol, TERM2GENE=m_t2g)
em2 <- GSEA(genes$d.symbol, TERM2GENE = m_t2g)
head(em)
head(em2)

library(AnnotationDbi)
library(org.Hs.eg.db)
library(stats4)
library(BiocGenerics)
library(utils)

em2_D7 <- setReadable(em2, 'EnsDb.Mmusculus.v79', 'ENTREZID')

H_D7_moouse_gsea = paste("~/Documents/sequencing/Kallisto_input/Kallisto_injury/Day_7_Analysis","Hallmark_rabb_gsea","_res.csv", sep="")

write.table(em2_D7, file=H_D7_moouse_gsea, quote=FALSE, sep= ",", row.names= FALSE)
# include your problematic code here with any corresponding output 
# please also include the results of running the following in an R session 

sessionInfo( )
clusterProfiler • 3.0k views
ADD COMMENT
0
Entering edit mode

Two things that are going wrong in your code : you create a data.frame genes <- data.frame(d$log2FoldChange, d$symbol) but then you assigned names() as if it was a vector whereas it is a data.frame. Finally, you did not create a geneList correctly formatted : a geneList correctly formatted is a vector with assigned names, and not a column from a data.frame. You'd better go with

genes <- d$logFC
names(genes) <- d$genes
genes=sort(genes, decreasing = TRUE)
em2 <- GSEA(genes, TERM2GENE = m_t2g)
ADD REPLY
0
Entering edit mode

Thanks for this. This works. but then the line em2....(GSEA(...) comes back with this error: preparing geneSet collections... --> Expected input gene ID: Ugt2b36,Lrp1,Dapk3,Tac1,Scrn1,Fgg Error in check_gene_id(geneList, geneSets) : --> No gene can be mapped....

Which makes me think the set up of my mouse database is incorrect? Or I am not pulling the gene names from my data correctly.

ADD REPLY
0
Entering edit mode

Yes probably it does not match. Could you show names(genes)[1:10] ?

ADD REPLY
0
Entering edit mode

This returns:

> names(genes)[1:10]
NULL

glimpse(d) (as d is what we are trying to draw from), shows:


> glimpse(d)
Rows: 15,823
Columns: 8
$ baseMean       <dbl> 35.414475, 25.978774, 13.187945, 63.999285, 10.484201, 8.426628, 8.023201, 7.108886, 6.5…
$ log2FoldChange <dbl> 8.625703, 8.178273, 7.200294, 7.100655, 6.869258, 6.554105, 6.482692, 6.308743, 6.202211…
$ lfcSE          <dbl> 1.2815145, 3.9084923, 1.4629674, 2.0204360, 3.9110520, 3.9120993, 2.7036496, 2.1255049, …
$ stat           <dbl> 6.730866, 2.092437, 4.921705, 3.514417, 1.756371, 1.675342, 2.397756, 2.968115, 2.230004…
$ pvalue         <dbl> 1.686560e-11, NA, 8.579360e-07, 4.407197e-04, 7.902510e-02, 9.386710e-02, 1.649585e-02, …
$ padj           <dbl> 8.856384e-09, NA, 7.162882e-05, 8.864029e-03, 2.763652e-01, 3.035723e-01, 1.077803e-01, …
$ symbol         <chr> "Mark4", "Sbno2", "Mical3", "Efcab6", "Gm7887", "Msantd4", "Pggt1b", "Ddx11", "Rarres2",…
$ ENTREZ         <int> 232944, 216161, 194401, 77627, NA, 78100, 225467, 320209, 71660, NA, 225997, 224079, 319…

Hope that helps?

ADD REPLY
0
Entering edit mode

Yes sorry it was an error from me, gene names corresponds to symbol column in your data :

genes <- d$logFC
names(genes) <- d$symbol
genes=sort(genes, decreasing = TRUE)
em2 <- GSEA(genes, TERM2GENE = m_t2g)
ADD REPLY
0
Entering edit mode

Thank. This works.

Should I change the line here to read symbol or gene also:

em2_D7 <- setReadable(em2, 'EnsDb.Mmusculus.v79', 'ENTREZID')

H_D7_moouse_gsea = paste("~/Documents/sequencing/Kallisto_input/Kallisto_injury/Day_7_Analysis","Hallmark_rabb_gsea","_res.csv", sep="")

write.table(em2_D7, file=H_D7_moouse_gsea, quote=FALSE, sep= ",", row.names= FALSE)
ADD REPLY
0
Entering edit mode

This returns:

> names(genes)[1:10]
NULL

glimpse(d) (as d is what we are trying to draw from), shows:


> glimpse(d)
Rows: 15,823
Columns: 8
$ baseMean       <dbl> 35.414475, 25.978774, 13.187945, 63.999285, 10.484201, 8.426628, 8.023201, 7.108886, 6.5…
$ log2FoldChange <dbl> 8.625703, 8.178273, 7.200294, 7.100655, 6.869258, 6.554105, 6.482692, 6.308743, 6.202211…
$ lfcSE          <dbl> 1.2815145, 3.9084923, 1.4629674, 2.0204360, 3.9110520, 3.9120993, 2.7036496, 2.1255049, …
$ stat           <dbl> 6.730866, 2.092437, 4.921705, 3.514417, 1.756371, 1.675342, 2.397756, 2.968115, 2.230004…
$ pvalue         <dbl> 1.686560e-11, NA, 8.579360e-07, 4.407197e-04, 7.902510e-02, 9.386710e-02, 1.649585e-02, …
$ padj           <dbl> 8.856384e-09, NA, 7.162882e-05, 8.864029e-03, 2.763652e-01, 3.035723e-01, 1.077803e-01, …
$ symbol         <chr> "Mark4", "Sbno2", "Mical3", "Efcab6", "Gm7887", "Msantd4", "Pggt1b", "Ddx11", "Rarres2",…
$ ENTREZ         <int> 232944, 216161, 194401, 77627, NA, 78100, 225467, 320209, 71660, NA, 225997, 224079, 319…

Hope that helps?

ADD REPLY

Login before adding your answer.

Traffic: 1053 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6