Question on GeneSetCollection in GSEABase

siajunren • 0

Hi,

I have gone through the vignette and parts of the reference manual but I am still stuck.

I have a vector of gene symbols, call this vector “Vec”. These are all the genes whose expression levels I have measured with RNA-Seq. I want to perform gene set enrichment analysis with GO biological process terms on the differentially expressed genes, using a custom script. To do so, I need a comprehensive list of the gene sets induced from Vec (i.e. a list of all the gene sets that are subsets of Vec, with each gene set classified according to a GO biological process term).

To do so, I run the following line:

Gs=GeneSetCollection(Vec, idType = SymbolIdentifier('org.Mm.eg.db'), setType=GOCollection(ontology='BP'))

But I got the following error:

Error in (function (classes, fdef, mtable)  :

  unable to find an inherited method for function ‘GeneSetCollection’ for signature ‘"character", "SymbolIdentifier", "GOCollection"’

‘org.Mm.eg.db’ definitely contains mappings from gene symbols to Entrez Gene identifiers and vice versa.

Is my approach even somewhat correct?

Thanks,

Junren Sia, PhD

Research Fellow

Institute of Medical Biology

gseabase geneset • 1.4k views

updated 7.3 years ago by Martin Morgan • written 7.3 years ago by siajunren

score 1 · Answer 1 · 2017-01-20

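Automated gene set construction is only possible for the 'primary' identifiers (ENTREZ, for the org packages), but gene sets based on other identifiers are not too hard to construct 'by hand'. I'll use the human annotation package for illustration (for mouse data, substitute org.Mm.eg.db) together with the following packages: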

library(org.Hs.eg.db)
library(GSEABase)
library(magrittr)

Here's some data, for reproducibility

set.seed(123)
vec <- sample(keys(org.Hs.eg.db, "SYMBOL"), 1000)

I'll retrieve the GO identifiers associated with each gene symbol, then subset to a single ontology and to the columns I'm interested in, and remove duplicate rows (these arise, for example, when the same symbol/GO pair is supported by multiple evidence codes, which we are not concerned with here):

ids <- select(org.Hs.eg.db, vec, "GO", "SYMBOL") %>%
    subset(ONTOLOGY=="BP", c("SYMBOL", "GO")) %>% unique
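
To see why the unique step matters, here is a minimal sketch in base R. The small table stands in for the output of select(); the symbols, GO identifiers and evidence codes are invented for illustration, and the names annot, bp and ids_toy are hypothetical.

```r
# Toy stand-in for the data frame returned by select(); all values are made up.
annot <- data.frame(
    SYMBOL   = c("GeneA", "GeneA", "GeneA", "GeneB", "GeneB"),
    GO       = c("GO:0000001", "GO:0000001", "GO:0000002",
                 "GO:0000001", "GO:0000003"),
    EVIDENCE = c("IDA", "IEA", "IDA", "TAS", "IDA"),
    ONTOLOGY = c("BP", "BP", "BP", "BP", "MF"),
    stringsAsFactors = FALSE
)

# Keep only BP rows and the two columns of interest.
bp <- subset(annot, ONTOLOGY == "BP", c("SYMBOL", "GO"))

# GeneA / GO:0000001 appears twice (two evidence codes); unique() collapses it.
ids_toy <- unique(bp)
print(ids_toy)
```

Without unique(), a gene would be listed more than once in the same set whenever several evidence codes support the same annotation.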

I'll create the sets by splitting the SYMBOL identifiers based on their GO identifier

sets <- split(ids$SYMBOL, ids$GO)
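
As a small illustration of what split() does here, the sketch below uses an invented table (ids_toy and sets_toy are hypothetical names, and the identifiers are made up). Each distinct GO identifier becomes one list element holding the symbols annotated to it.

```r
# Toy SYMBOL/GO pairs; values are invented for illustration.
ids_toy <- data.frame(
    SYMBOL = c("GeneA", "GeneA", "GeneB", "GeneC"),
    GO     = c("GO:0000001", "GO:0000002", "GO:0000001", "GO:0000002"),
    stringsAsFactors = FALSE
)

# Group the symbols by GO identifier: a named list of character vectors.
sets_toy <- split(ids_toy$SYMBOL, ids_toy$GO)
str(sets_toy)
```

Note that a gene annotated to several terms (GeneA here) appears in several sets, which is what one wants for enrichment analysis.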

I'll then map each plain character vector to a GeneSet using Map(), and create a collection of gene sets

gsc <- GeneSetCollection(Map(
    GeneSet, sets, setName=names(sets),
    MoreArgs=list(
        geneIdType=SymbolIdentifier("org.Hs.eg.db"),
        collectionType=GOCollection(ontology="BP"))
))
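
The Map() call above varies two arguments per set (the identifiers and the set name) and holds the rest fixed through MoreArgs. A minimal base R sketch of that pattern follows; make_set is a hypothetical stand-in for GeneSet(), and all values are invented.

```r
# Toy named list of sets; values are made up.
sets_toy <- list(
    "GO:0000001" = c("GeneA", "GeneB"),
    "GO:0000002" = c("GeneA", "GeneC")
)

# Hypothetical stand-in for GeneSet(): bundles ids, a name and shared settings.
make_set <- function(ids, setName, idType, ontology) {
    list(ids = ids, setName = setName, idType = idType, ontology = ontology)
}

# ids and setName are taken element by element; MoreArgs is passed
# unchanged to every call.
result <- Map(
    make_set, sets_toy, setName = names(sets_toy),
    MoreArgs = list(idType = "symbol", ontology = "BP")
)
str(result[["GO:0000002"]])
```

Because the first vectorised argument is a named list, the result keeps the GO identifiers as names.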

The result is a collection with 1133 gene sets containing a total of 287 genes.

> gsc
GeneSetCollection
  names: GO:0000082, GO:0000086, ..., GO:2001288 (1133 total)
  unique identifiers: RPA2, PPP6C, ..., GPHA2 (287 total)
  types in collection:
    geneIdType: SymbolIdentifier (1 total)
    collectionType: GOCollection (1 total)
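
For a custom enrichment script, the plain named list (sets above) may be all that is needed. If the script expects a minimum set size, a filter along these lines could be applied first. This is only a sketch on invented data: sets_toy, set_sizes and kept are hypothetical names, and the threshold of 2 is arbitrary. GSEABase's geneIds(gsc) should give a comparable named list from the collection, though that is not run here.

```r
# Toy named list of gene sets; values are made up.
sets_toy <- list(
    "GO:0000001" = c("GeneA", "GeneB", "GeneC"),
    "GO:0000002" = c("GeneA"),
    "GO:0000003" = c("GeneB", "GeneD")
)

# Number of genes in each set.
set_sizes <- lengths(sets_toy)

# Keep sets with at least 2 genes (arbitrary threshold for illustration).
kept <- sets_toy[set_sizes >= 2]
print(names(kept))
```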