Question: Question on GeneSetCollection in GSEABase
2.2 years ago by
siajunren0 wrote:

Hi,

I have gone through the vignette and parts of the reference manual but I am still stuck.

I have a vector of gene symbols, call this vector “Vec”. These are all the genes whose expression levels I have measured with RNA-Seq. Next, I want to perform gene set enrichment analysis on the differentially expressed genes, using GO biological process terms and a custom script. To do so, I need a comprehensive list of the gene sets induced from Vec (i.e., a list of all the gene sets that are subsets of Vec, with each gene set classified by its GO biological process term).

To do so, I run the following line:

Gs=GeneSetCollection(Vec, idType = SymbolIdentifier('org.Mm.eg.db'), setType=GOCollection(ontology='BP'))

But I got the following error:

Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘GeneSetCollection’ for signature ‘"character", "SymbolIdentifier", "GOCollection"’

‘org.Mm.eg.db’ definitely contains mappings from gene symbols to Entrez Gene identifiers and vice versa.

Is my approach even somewhat correct?

Thanks,

Junren Sia, PhD

Research Fellow

Institute of Medical Biology

Answer: Question on GeneSetCollection in GSEABase
2.2 years ago by
Martin Morgan ♦♦ 23k wrote:

Automated gene set construction is only possible for the 'primary' identifiers (ENTREZ, for the org packages), but gene sets keyed by other identifiers are not too hard to construct 'by hand'. I'll use the following packages:

library(org.Hs.eg.db)
library(GSEABase)
library(magrittr)

Here's some sample data, seeded for reproducibility:

set.seed(123)
vec <- sample(keys(org.Hs.eg.db, "SYMBOL"), 1000)

I'll retrieve the GO identifiers associated with each gene symbol, subset to a single ontology and the columns that I'm interested in, and remove duplicate rows (which arise, e.g., from multiple evidence codes, which we are not concerned with):

ids <- select(org.Hs.eg.db, keys = vec, columns = "GO", keytype = "SYMBOL") %>%
    subset(ONTOLOGY == "BP", c("SYMBOL", "GO")) %>%
    unique()

I'll create the sets by splitting the SYMBOL identifiers based on their GO identifier

sets <- split(ids$SYMBOL, ids$GO)
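
To see what this split() step produces without loading any annotation package, here is a minimal base R sketch on a made-up table. The symbols and GO identifiers in `toy_ids` are invented placeholders, not real annotations, and `toy_sets` is just an illustrative name.

```r
# Toy stand-in for `ids`: one row per (SYMBOL, GO) pair.
# These pairings are invented for illustration only.
toy_ids <- data.frame(
    SYMBOL = c("GeneA", "GeneB", "GeneA", "GeneC", "GeneA"),
    GO     = c("GO:0000001", "GO:0000001", "GO:0000002", "GO:0000002", "GO:0000001"),
    stringsAsFactors = FALSE
)

# Drop the repeated (GeneA, GO:0000001) row
toy_ids <- unique(toy_ids)

# Group symbols by GO identifier: a named list with one character vector per GO term
toy_sets <- split(toy_ids$SYMBOL, toy_ids$GO)
```

Each list element is named by its GO identifier, which is why names(sets) can later serve as the set names.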

I'll then map each plain character vector to a GeneSet using Map(), and create a collection of gene sets

gsc <- GeneSetCollection(Map(
    GeneSet, sets, setName = names(sets),
    MoreArgs = list(
        geneIdType = SymbolIdentifier("org.Hs.eg.db"),
        collectionType = GOCollection(ontology = "BP"))
))
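
For readers unfamiliar with Map() and MoreArgs, the following base R sketch may help show how the arguments are paired. It uses a hypothetical `make_set()` function as a stand-in for GeneSet() and invented toy data, so it runs without GSEABase; it illustrates the calling pattern only, not the GeneSet class itself.

```r
# Hypothetical stand-in for GeneSet(): returns a plain list instead of an S4 object
make_set <- function(ids, setName, idType) {
    list(ids = ids, setName = setName, idType = idType)
}

# Invented toy sets, named by made-up GO identifiers
toy_sets <- list(
    "GO:0000001" = c("GeneA", "GeneB"),
    "GO:0000002" = c("GeneA", "GeneC")
)

# Map() walks `toy_sets` and `setName` in parallel, one call per element;
# everything in MoreArgs is passed unchanged to every call
toy_result <- Map(
    make_set, toy_sets, setName = names(toy_sets),
    MoreArgs = list(idType = "symbol")
)
```

The result is a list with one entry per set, named after the first vectorized argument, which is the kind of list that GeneSetCollection() is given in the answer's code.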

The result is a collection with 1133 gene sets containing a total of 287 genes.

> gsc
GeneSetCollection
  names: GO:0000082, GO:0000086, ..., GO:2001288 (1133 total)
  unique identifiers: RPA2, PPP6C, ..., GPHA2 (287 total)
  types in collection:
    geneIdType: SymbolIdentifier (1 total)
    collectionType: GOCollection (1 total)

Thank you for the demonstration of the "by hand" method. However, I would like to avoid that if possible.

Following your advice that automated gene set construction is only possible for 'primary' identifiers, I mapped my symbols to Entrez IDs and ran the same line, but I got the same error as before. Do you know what went wrong?