GSEABase: how to extract shortDescription from GeneSetCollection object
Guido Hooiveld ★ 3.9k
@guido-hooiveld-2020
Wageningen University, Wageningen, the …

Hi, I am using the GSEABase package to import a GMT file (a file of gene sets). This works fine, but I don't understand how to extract the 'shortDescription' of the gene sets from the GeneSetCollection object. I assumed this could be done with a function called shortDescription, but apparently that is not the case. Extracting the [ugly] setName, geneIds, or setIdentifier works fine.

Could someone please point me in the right direction? Thanks, G

Note: I know the first gene set (mmu02020) happens to be 'empty'.

    > library(GSEABase)
    > GeneSets <- getGmt("my.genesets.gmt")
    >
    > class(GeneSets)
    [1] "GeneSetCollection"
    attr(,"package")
    [1] "GSEABase"
    >
    > str(GeneSets)
    Formal class 'GeneSetCollection' [package "GSEABase"] with 1 slot
      ..@ .Data:List of 47
      .. ..$ :Formal class 'GeneSet' [package "GSEABase"] with 13 slots
      .. .. .. ..@ geneIdType      :Formal class 'NullIdentifier' [package "GSEABase"] with 2 slots
      .. .. .. .. .. ..@ type      : chr "Null"
      .. .. .. .. .. ..@ annotation: chr ""
      .. .. .. ..@ geneIds         : chr(0) 
      .. .. .. ..@ setName         : chr "mmu02020.Two.component.system.KEGG"
      .. .. .. ..@ setIdentifier   : chr "D0147357:13308:Wed Apr 24 17:16:14 2019:2"
      .. .. .. ..@ shortDescription: chr "KEGG: Two-component system"
      .. .. .. ..@ longDescription : chr ""
      .. .. .. ..@ organism        : chr ""
      .. .. .. ..@ pubMedIds       : chr(0) 
      .. .. .. ..@ urls            : chr(0) 
      .. .. .. ..@ contributor     : chr(0) 
      .. .. .. ..@ version         :Formal class 'Versions' [package "Biobase"] with 1 slot
      .. .. .. .. .. ..@ .Data:List of 1
      .. .. .. .. .. .. ..$ : int [1:3] 0 0 1
      .. .. .. ..@ creationDate    : chr(0) 
      .. .. .. ..@ collectionType  :Formal class 'NullCollection' [package "GSEABase"] with 1 slot
      .. .. .. .. .. ..@ type: chr "Null"
      .. ..$ :Formal class 'GeneSet' [package "GSEABase"] with 13 slots
      .. .. .. ..@ geneIdType      :Formal class 'NullIdentifier' [package "GSEABase"] with 2 slots
      .. .. .. .. .. ..@ type      : chr "Null"
      .. .. .. .. .. ..@ annotation: chr ""
      .. .. .. ..@ geneIds         : chr [1:294] "Gm5741" "Mapkapk3" "Arrb1" "Braf" ...
      .. .. .. ..@ setName         : chr "mmu04010.MAPK.signaling.pathway.KEGG"
      .. .. .. ..@ setIdentifier   : chr "D0147357:13308:Wed Apr 24 17:16:14 2019:3"
      .. .. .. ..@ shortDescription: chr "KEGG: MAPK signaling pathway"
      .. .. .. ..@ longDescription : chr ""
    <<snip>>
    >
    >
    > setName(GeneSets[[2]])
    [1] "mmu04010.MAPK.signaling.pathway.KEGG"
    >
    > shortDescription(GeneSets[[2]])
    Error in shortDescription(GeneSets[[1]]) : 
      could not find function "shortDescription"
    >
    > setIdentifier(GeneSets[[2]])
    [1] "D0147357:13308:Wed Apr 24 17:16:14 2019:3"
    > head(geneIds(GeneSets[[2]]))
    [1] "Gm5741"   "Mapkapk3" "Arrb1"    "Braf"     "Rap1a"    "Raf1"    



    > sessionInfo()
    R version 3.5.1 Patched (2018-11-24 r75665)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

    Matrix products: default

    locale:
    [1] LC_COLLATE=English_United States.1252 
    [2] LC_CTYPE=English_United States.1252   
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C                          
    [5] LC_TIME=English_United States.1252    

    attached base packages:
    [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
    [8] methods   base     

    other attached packages:
    [1] GSEABase_1.44.0      graph_1.60.0         annotate_1.60.1     
    [4] XML_3.98-1.19        AnnotationDbi_1.44.0 IRanges_2.16.0      
    [7] S4Vectors_0.20.1     Biobase_2.42.0       BiocGenerics_0.28.0 

    loaded via a namespace (and not attached):
     [1] Rcpp_1.0.1      digest_0.6.18   bitops_1.0-6    xtable_1.8-4   
     [5] DBI_1.0.0       RSQLite_2.1.1   blob_1.1.1      bit64_0.9-7    
     [9] RCurl_1.95-4.12 bit_1.1-14      compiler_3.5.1  memoise_1.1.0  
    >
@martin-morgan-1513
United States

Instead of looking at the structure of the object, I looked at the methods available (not completely foolproof...):

    example(readGmt)
    methods(class = class(gss[[1]]))

and then found

    description(gss[[1]])

I was hoping that ?"description,GeneSet-method" would actually be helpful, but I guess the author of the package didn't do a good enough job on the documentation :(. The vignette led me to details(gss[[1]]), where I see some further hints:

    > details(gss[[1]])
    setName: chr5q23
    geneIds: ZNF474, CCDC100, ..., LOC728586 (total: 86)
    geneIdType: Symbol
    collectionType: Broad
      bcCategory: c1 (Positional)
      bcSubCategory: NA
    setIdentifier: c1:101
    description: Genes in cytogenetic band chr5q23
    organism: Human
    pubMedIds:
    urls: file://Users/ma38727/Library/R/3.6/Bioc/3.9/GSEABase/extdata/Broad.xml
          http://www.broad.mit.edu/gsea/msigdb/cards/chr5q23.xml
          http://genome.ucsc.edu/cgi-bin/hgTracks?position=5q23
    contributor: Broad Institute
    setVersion: 0.0.1
    creationDate:

where each of the keys corresponds to a function.
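
As a rough illustration of that key-to-function correspondence, here is a minimal sketch. It assumes the Broad.xml example file that ships with GSEABase (the one listed under urls in the details() output) and that it is read with getBroadSets; the variable names fl, gss and gs are just for illustration.

```r
library(GSEABase)

# Example gene sets shipped with GSEABase (assumption: this is the same
# Broad.xml file that appears in the urls field of the details() output).
fl  <- system.file("extdata", "Broad.xml", package = "GSEABase")
gss <- getBroadSets(fl)
gs  <- gss[[1]]

# Several keys printed by details() are also accessor functions.
setName(gs)
description(gs)     # returns the contents of the shortDescription slot
organism(gs)
contributor(gs)
setIdentifier(gs)
```

Note that the slot is called shortDescription while the accessor is description, which is why searching for a function named after the slot comes up empty.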


Thanks Martin for your hints. Using the function description indeed does the trick!

    > description(GeneSets[[2]])
    [1] "KEGG: MAPK signaling pathway"
    >
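
To get the descriptions for every gene set at once, the same accessor can be applied over the whole collection. The following is only a sketch: the toy GMT file, the set names setA/setB and their contents are made up for illustration, and sapply is just one way to loop over a GeneSetCollection.

```r
library(GSEABase)

# Write a tiny, hypothetical two-set GMT file:
# set name <tab> description <tab> gene identifiers...
gmt <- tempfile(fileext = ".gmt")
writeLines(c(
  "setA\tFirst toy set\tBraf\tRaf1\tRap1a",
  "setB\tSecond toy set\tArrb1\tMapkapk3"
), gmt)

gsc <- getGmt(gmt)

# Apply description() to each GeneSet and label the result by set name.
descs <- sapply(gsc, description)
names(descs) <- sapply(gsc, setName)
descs
```

With a real file, replacing gmt by the path to your own GMT file should give a named character vector that is easy to join onto enrichment results.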