Question

Clusterprofiler - MSigDB gene set analysis - Updated

thomasjenner333 • 0

@thomasjenner333-15064

Hi,

I'm trying to use the 'enricher' and 'GSEA' functions from the clusterProfiler package to analyze gene sets from MSigDB.

The following is the code I'm using:

> gmtfile <- "/path/c5.all.v6.1.entrez.gmt"
> c5 <- read.gmt(gmtfile)

> head(df)
   ENTREZID log2FoldChange
1 100516980     0.11587633
2 100155074     0.11587633

> egmt <- enricher(as.character(df[,1]), TERM2GENE=c5)
--> No gene can be mapped....
--> Expected input gene ID: 27433,10846,23479,3669,65977,10808
--> return NULL...

> head(geneList)
100154447    396596 100516171 100155895    397132 100515447
 6.035077  4.837211  4.629196  4.524015  4.420449  4.401480

> egmt2 <- GSEA(geneList, TERM2GENE=c5, verbose=FALSE)
--> Expected input gene ID: 54932,23001,3329,2035,9837,22894
Error in check_gene_id(geneList, geneSets) :
  --> No gene can be mapped....

Update: I got a list of pathways by doing the following:

> eg = bitr(d$SYMBOL, fromType="SYMBOL", toType=c("PATH", "ENTREZID"), OrgDb="org.Ss.eg.db")
> head(eg)
    SYMBOL  PATH  ENTREZID
1    ACKR1 05144 100154447
2     FMO1 00982    397132

> tt <- eg[,c(2,3)]
> head(tt)
    PATH  ENTREZID
1  05144 100154447
2  00982    397132

> egmt <- enricher(as.vector(df[,1]), pvalueCutoff=1, qvalueCutoff=1, pAdjustMethod = "BH", TERM2GENE=tt)
> head(egmt)
         ID Description GeneRatio BgRatio pvalue p.adjust qvalue
00010 00010       00010   32/2868 32/2868      1        1      1
00020 00020       00020   20/2868 20/2868      1        1      1

I'm not sure why the pathway descriptions aren't displayed. Any suggestions? Thanks.
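A possible explanation, offered as a sketch rather than a confirmed diagnosis: `enricher` takes an optional `TERM2NAME` argument (a two-column table of term ID and term name), and when it is not supplied the Description column simply repeats the ID. The example below assumes a small hand-written name table; the gene IDs and the `term2name` object are illustrative and not taken from the post, and the `enricher` call only runs if clusterProfiler is installed.

```r
# Illustrative TERM2GENE table in the same shape as `tt` (PATH, ENTREZID).
# The gene IDs here are made up for the sketch.
tt <- data.frame(
  PATH     = c("00010", "00010", "00020", "00020"),
  ENTREZID = c("1", "2", "2", "3"),
  stringsAsFactors = FALSE
)

# Hand-written TERM2NAME table: first column is the term ID,
# second column is the readable pathway name.
term2name <- data.frame(
  PATH = c("00010", "00020"),
  NAME = c("Glycolysis / Gluconeogenesis", "Citrate cycle (TCA cycle)"),
  stringsAsFactors = FALSE
)

# Run the enrichment only when clusterProfiler is available.
if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  res <- clusterProfiler::enricher(
    gene         = c("1", "2"),
    TERM2GENE    = tt,
    TERM2NAME    = term2name,  # supplies the Description column
    pvalueCutoff = 1,
    qvalueCutoff = 1,
    minGSSize    = 1           # toy gene sets are tiny
  )
  print(head(res))
}
```

For real data, the name table would have to cover every pathway ID present in `tt`.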

ADD COMMENT • link updated 6.1 years ago by Guangchuang Yu ★ 1.2k • written 6.1 years ago by thomasjenner333 • 0

score 2 · Accepted Answer · 2018-03-18

Guangchuang Yu ★ 1.2k

@guangchuang-yu-5419

China/Guangzhou/Southern Medical Univer…

Which organism do you want to analyze? According to https://www.ncbi.nlm.nih.gov/gene/100516980, it is Sus scrofa.

I guess the file in gmtfile <- "/path/c5.all.v6.1.entrez.gmt" is an annotation for human.

This is why it throws the message:


--> No gene can be mapped....  
--> Expected input gene ID: 27433,10846,23479,3669,65977,10808
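A quick way to confirm this kind of mismatch before calling `enricher` or `GSEA` is to count how many input IDs appear in the gene-set table at all. The sketch below uses toy stand-ins built from the IDs quoted in the thread; `term2gene`, `genes`, and the term names are hypothetical, and it assumes the gene IDs sit in the second column, as they do in the table returned by `read.gmt`.

```r
# Toy stand-in for the table returned by read.gmt(): term in column 1,
# gene ID in column 2. Term names are invented for illustration.
term2gene <- data.frame(
  term = c("GO_A", "GO_A", "GO_B"),
  gene = c("27433", "10846", "23479"),
  stringsAsFactors = FALSE
)

# Toy stand-in for as.character(df[, 1]): the Sus scrofa Entrez IDs.
genes <- c("100516980", "100155074")

# IDs present in both the input and the gene sets.
shared <- intersect(genes, as.character(term2gene[[2]]))

# Zero shared IDs means the gene sets and the input use different
# identifiers (here, different organisms).
length(shared)
```

If the count is zero or very small, the gene sets and the input most likely come from different organisms or ID types.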

Hi Yu!

Yes, I had used the wrong file. I then got the gene sets for Sus scrofa and ran the MSigDB gene set analysis. Thanks for pointing out the mistake.

Is the approach in the answer I posted an acceptable way to get a list of pathways? Thanks for your help.

ADD REPLY • link 6.1 years ago thomasjenner333 • 0
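One thing worth checking in the posted `enricher` output: GeneRatio and BgRatio are identical (32/2868 for pathway 00010), which suggests the input genes and the background derived from `TERM2GENE` are the same set. Over-representation tests of this kind are typically hypergeometric, and in that situation the p-value is necessarily 1. The base-R sketch below reproduces the arithmetic with the numbers from the output; the second case (200 genes of interest, 10 in the pathway) is a hypothetical contrast, not data from the thread.

```r
# Numbers taken from the posted output for pathway 00010.
k <- 32    # input genes annotated to the pathway
M <- 32    # background genes annotated to the pathway
N <- 2868  # background size
n <- 2868  # input gene list size (same as the background here)

# Upper-tail hypergeometric probability P(X >= k).
p_same <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
p_same  # 1: drawing the whole background always yields all 32 pathway genes

# Hypothetical contrast: 200 genes of interest, 10 of them in the pathway.
p_subset <- phyper(10 - 1, M, N - M, 200, lower.tail = FALSE)
p_subset
```

Passing only the genes of interest (for example, the differentially expressed ones) as the input, while keeping all annotated genes as the background, is what gives the test something to detect.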