Hypergeometric (gene set enrichment) test using the "Category" package
josmantorres ▴ 10

Dear Bioconductor community,

We have performed a differential gene expression analysis in an insect and identified some genes belonging to detoxification processes as differentially expressed. Now I am trying to perform a gene set enrichment analysis based on PFAM domains, as we want to see whether specific families related to detoxification (cytochromes, GST, etc.) are enriched in our dataset. We are using the "Category" package and its "hyperg" function to do it. Would you suggest any other type of analysis within "Category" for this objective?

I have some problems with the input files to perform a Hypergeometric (gene set enrichment) test. As far as I understand, I need three files:

  1. assayed - all gene IDs (first column) with the corresponding PFAM domain codes (second column, separated by ;)
  2. significant - IDs of differentially expressed genes
  3. universe - IDs of all genes

When I used the function:

result <- hyperg(assayed, sigsets, universe)

The following error appears:

Error in .local(assayed, significant, universe, representation, ...) :
  some 'assayed' genes not in 'universe'

Because the "assayed" and "universe" files were generated from the same file, I suspect the problem is that my "assayed" file has an incorrect format. What is the correct format for the "assayed" file? I have also tried separating the PFAM domains with tabs, and it gives the same error.

Thanks in advance for your time and help,

Best wishes,


R session info:

```
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Category_2.54.0      Matrix_1.3-3         AnnotationDbi_1.50.3 IRanges_2.24.1       S4Vectors_0.28.1     Biobase_2.50.0
[7] BiocGenerics_0.36.0  edgeR_3.30.3         limma_3.44.3

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        pillar_1.6.1      compiler_4.0.5    bitops_1.0-7      tools_4.0.5       bit_4.0.4         tibble_3.1.2
 [8] lifecycle_1.0.0   annotate_1.66.0   RSQLite_2.2.7     memoise_2.0.0     lattice_0.20-44   pkgconfig_2.0.3   rlang_0.4.11
[15] graph_1.66.0      DBI_1.1.1         fastmap_1.1.0     genefilter_1.70.0 hms_1.1.0         vctrs_0.3.8       locfit_1.5-9.4
[22] bit64_4.0.5       grid_4.0.5        GSEABase_1.50.1   R6_2.5.0          fansi_0.4.2       XML_3.99-0.6      RBGL_1.64.0
[29] survival_3.2-11   magrittr_2.0.1    readr_1.4.0       blob_1.2.1        ellipsis_0.3.2    splines_4.0.5     xtable_1.8-4
[36] utf8_1.2.1        RCurl_1.98-1.3    cachem_1.0.5      crayon_1.4.1
```


The hyperg function doesn't use 'files'; it uses R objects. What those objects should be is described in the help page (?hyperg).


 assayed: A vector of assayed genes (or other identifiers). 'assayed'
          may be a character vector (defining a single gene set) or
          list of character vectors (defining a collection of gene
          sets).

significant: A vector of assayed genes that were differentially
          expressed. If 'assayed' is a character vector, then
          'significant' must also be a character vector; likewise when
          'assayed' is a 'list'.

universe: A character vector defining the universe of genes.

So you can pass in two lists and a character vector, or three character vectors, depending on what you are doing. To test a single PFAM domain, you could have a vector of IDs that are in that domain, a vector of IDs from that same domain that are significant, and a vector of IDs that defines the entirety of the PFAM IDs that were tested.

If you want to test multiple PFAM domains, you do the same, only the 'assayed' object is a list with one vector of IDs for each PFAM domain you want to test, the 'significant' object is a list containing the IDs from the 'assayed' object that were significant, and the universe is still just all the PFAM IDs that were tested.
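As a rough sketch of the multi-domain case, assuming your annotation is a two-column table with the PFAM codes separated by ";" as in the question: the table `annot`, the vector `de_genes`, and all gene and domain IDs below are made up for illustration. The idea is to turn the table into a named list with one character vector of gene IDs per PFAM domain, and build 'significant' with the same shape.

```r
# Hypothetical annotation table: one row per gene, PFAM codes separated by ";"
annot <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5"),
  pfam = c("PF00067", "PF00067;PF02798", "PF02798", "PF00067", "PF00135"),
  stringsAsFactors = FALSE
)
# Hypothetical differentially expressed genes
de_genes <- c("g1", "g2", "g4")

# Universe: a plain character vector of every gene ID that was tested
universe <- unique(annot$gene)

# Split the ";"-separated codes and expand to one row per (gene, domain) pair
domains_per_gene <- strsplit(annot$pfam, ";", fixed = TRUE)
long <- data.frame(
  gene = rep(annot$gene, lengths(domains_per_gene)),
  pfam = unlist(domains_per_gene),
  stringsAsFactors = FALSE
)

# assayed: named list, one character vector of gene IDs per PFAM domain
assayed <- split(long$gene, long$pfam)

# significant: same list structure, keeping only the DE genes in each domain
significant <- lapply(assayed, intersect, de_genes)

# Every assayed ID must also be in the universe
stopifnot(all(unlist(assayed) %in% universe))

# Run the test only if the Category package is installed
if (requireNamespace("Category", quietly = TRUE)) {
  result <- Category::hyperg(assayed, significant, universe)
  print(result)
}
```

Note that the elements of 'assayed' hold gene IDs only; the PFAM codes appear as the names of the list, not inside the vectors.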

And if you get an error saying there are things in either the significant object or the assayed object that aren't in the universe, well, it's because there are things in one of those objects that aren't in the universe. R wouldn't lie to you about that, would it?
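To find out which IDs are the problem, you can compare the objects directly with setdiff(). This is only a sketch with made-up IDs; the trailing space and the case mismatch below are just two possible causes (another would be strings that still have the PFAM codes attached to the gene ID), not a diagnosis of your data.

```r
# Hypothetical objects with two deliberately "broken" IDs
universe <- c("g1", "g2", "g3")
assayed <- list(
  PF00067 = c("g1", "g2 "),   # "g2 " has a trailing space
  PF02798 = c("g3", "G4")     # "G4" is not in the universe at all
)

# IDs present in 'assayed' but absent from 'universe'
missing_ids <- setdiff(unlist(assayed, use.names = FALSE), universe)
print(missing_ids)

# Trimming whitespace fixes one of the two; the other is genuinely missing
cleaned <- lapply(assayed, trimws)
still_missing <- setdiff(unlist(cleaned, use.names = FALSE), universe)
print(still_missing)
```

If missing_ids is empty for both 'assayed' and 'significant', the check that raises this error should pass.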


Thanks a lot James for your help and time to answer my question. Best wishes, Jose

