Question

No gene set have size > 10 ...

1

Entering edit mode

anton.kratz ▴ 60

@antonkratz-8836

Last seen 27 days ago

Japan, Tokyo, The Systems Biology Insti…

I am using clusterProfiler to check whether a group of genes of interest is enriched in genes coming from a term from a custom ontology. I am following the outline provided by this article: "use clusterProfiler as an universal enrichment analysis tool", http://guangchuangyu.github.io/2015/05/use-clusterprofiler-as-an-universal-enrichment-analysis-tool/

The problem is that I run into an error:

No gene set have size > 10 ...
--> return NULL...

and I do not not understand this error. My googling and searching the code I assume that this is somehow related to DOSE and a variable "minGSSize: minimal size of genes annotated by Ontology term for testing", but I do not see what is wrong with my custom input ontology; all genes in the group to be tested are contained in the custom ontology, and the terms of the custom ontology which have these genes in them, all have more that 10 members respectively; and the genes_of_interest array also has more than 10 members.

My code is below. I cannot share the actual custom ontology. I tested with several hundred different groups of genes of interest against this custom ontology and had no problem. Only this particular group out of all groups tested triggers the error. However I will expand to several thousand groups in batch and must fix this error first. If I replace my custom ontology with another, custom-slimmed down version of Gene Ontology/Cellular Component, I have no problem as well. Example code (custom ontology cannot be shared):

library("clusterProfiler")

genes_of_interest <- c("MT-CO2","MT-CO3","COX5A","COX7C","COX5B","COX4I1","COX6A1","COX7A2","COX6B1","COX7B","COX8A","COX6C","MT-CO1")

# READ IN A GIVEN ONTOLOGY BUILD
# load the custom ontology
nestsystem2gene <- read.table("long_fuse81_nest_0.15.tsv", header=F)
# instead of the custom ontology I can also load a version of GO, Cellular Component
# nestsystem2gene <- read.table("collapsed_cc.symbol", header=F)

# convert everything to uppercase
df <- data.frame(lapply(nestsystem2gene, function(v) {
  if (is.character(v)) return(toupper(v))
  else return(v)
}))
# /READ IN A GIVEN ONTOLOGY BUILD

x = enricher(genes_of_interest, TERM2GENE=df)

And here is my sessionInfo:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8      
[4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8  
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C             
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DOSE_3.6.1            clusterProfiler_3.8.1 BiocInstaller_1.30.0 

loaded via a namespace (and not attached):
[1] ggrepel_0.8.0        Rcpp_0.12.18         lattice_0.20-35      tidyr_0.8.1        
[5] GO.db_3.6.0          assertthat_0.2.0     digest_0.6.15        ggforce_0.1.3      
[9] R6_2.3.0             plyr_1.8.4           ggridges_0.5.1       stats4_3.5.1       
[13] RSQLite_2.1.1        ggplot2_3.0.0        pillar_1.3.0         rlang_0.2.2        
[17] lazyeval_0.2.1       rstudioapi_0.8       data.table_1.11.8    blob_1.1.1         
[21] S4Vectors_0.18.3     Matrix_1.2-14        qvalue_2.12.0        splines_3.5.1      
[25] BiocParallel_1.14.2  stringr_1.3.1        igraph_1.2.2         bit_1.1-14         
[29] munsell_0.5.0        fgsea_1.6.0          compiler_3.5.1       pkgconfig_2.0.1    
[33] BiocGenerics_0.26.0  tidyselect_0.2.5     tibble_1.4.2         gridExtra_2.3      
[37] IRanges_2.14.10      enrichplot_1.0.2     viridisLite_0.3.0    crayon_1.3.4       
[41] dplyr_0.7.7          MASS_7.3-50          grid_3.5.1           gtable_0.2.0       
[45] DBI_1.0.0            magrittr_1.5         units_0.6-1          scales_1.0.0       
[49] stringi_1.2.4        GOSemSim_2.6.0       farver_1.0           reshape2_1.4.3     
[53] viridis_0.5.1        bindrcpp_0.2.2       DO.db_2.9            rvcheck_0.1.1      
[57] cowplot_0.9.3        fastmatch_1.1-0      tools_3.5.1          bit64_0.9-7        
[61] Biobase_2.40.0       glue_1.3.0           tweenr_1.0.0         purrr_0.2.5        
[65] yaml_2.2.0           ggraph_1.0.2         parallel_3.5.1       AnnotationDbi_1.42.1
[69] colorspace_1.3-2     UpSetR_1.3.3         memoise_1.1.0        bindr_0.1.1

clusterprofiler gene ontology • 4.5k views

ADD COMMENT • link 5.5 years ago anton.kratz ▴ 60

1

Entering edit mode

any reproducible example?

ADD REPLY • link 5.5 years ago Guangchuang Yu ★ 1.2k

0

Entering edit mode

Hi Guangchuang Yu, thank you for answering! I am sorry, I do not have a reproducible example other than the code I quoted above, because I cannot share the original custom ontology, because it is sensitive information. I do know how to make a shareable version of it. However, I would be very thankful if you could help me understand the meaning of "minGSSize: minimal size of genes annotated by Ontology term for testing". It is not clear to me which genes this refers to and what this sentence exactly means. As I wrote, the genes-of-interest list is clearly larger than 10. The custom ontology has many terms which contain more than 10 genes as members each and therefore the statement "No gene set have size > 10" can never be true. After reading especially [1] and also [2], I believe that what is meant is: minGSSize is the minimal number of genes which at least one term in the (custom) ontology needs to have, which are also in the list of genes-of-interest. I.e. there must be at least minGSSize genes which are in the list of genes tested (genes of interest), and these same genes must also be in at least one term in the ontology used for testing (a term which might contain many more other genes as well). Is this correct? However, if this is the correct interpretation, I am puzzled because there are definitely gene sets in my custom ontology which contain more than minGSSize of the genes of interest list which is tested, so I do not understand how "No gene set have size > 10" can be triggered. Which "gene set" is meant here, genes of interest, any possible gene set in the custom ontology or something else?!

[1] https://github.com/GuangchuangYu/clusterProfiler/issues/56#issuecomment-231636257
[2] Pathway analysis using ReactomePA::enrichPathway(): doubts about minGSSize and maxGSSize arguments

ADD REPLY • link 5.5 years ago anton.kratz ▴ 60

0

Entering edit mode

you are correct, as the source code indicated:

idx <- minGSSize <= geneSet_size & geneSet_size <= maxGSSize

ADD REPLY • link 5.5 years ago Guangchuang Yu ★ 1.2k