I downloaded goslim_generic.obo from geneontology.org and followed the post "GSEABase - View/extract GO terms mapped to each GOslim" to slim my GO BP terms. However, one GO term, GO:0002831, was not counted. What could be the problem, and do you have any recommendations on how to address it?
library(GSEABase)
library(GO.db)
ids <- "GO:0002831"
myCollection <- GOCollection(ids)
slim <- getOBOCollection("goslim_generic.obo")  # the file path must be a quoted string
slimdf <- goSlim(myCollection, slim, "BP", verbose = TRUE)
## Result: not counted in any generic terms
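One way to check why the term is not counted is to compare its ancestors in the installed GO.db with the IDs in the slim: `goSlim` can only count a term under a slim term that is the term itself or one of its ancestors. The sketch below assumes `goslim_generic.obo` is in the working directory and uses `GOBPANCESTOR` from GO.db and the `ids()` accessor from GSEABase. If GO.db (3.15.0 here) is older than the downloaded OBO file, the two GO releases may also disagree about which terms and relationships exist.

```r
library(GSEABase)
library(GO.db)

# Assumption: the slim file sits in the current working directory
slim <- getOBOCollection("goslim_generic.obo")
term <- "GO:0002831"

# All BP ancestors of the term in the installed GO.db snapshot
# ("all" is GO.db's placeholder for the root)
anc <- GOBPANCESTOR[[term]]

# Slim IDs at or above the term; if this is empty, goSlim has nowhere to count it
hits <- intersect(c(term, anc), ids(slim))
hits

# Names of the ancestors, to see which parent might be worth adding back
Term(anc[anc != "all"])
```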
sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] tools grid parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] viridis_0.6.2 viridisLite_0.4.0 clusterProfiler_4.4.4 topGO_2.48.0
[5] GO.db_3.15.0 GSEABase_1.58.0 annotate_1.74.0 biomaRt_2.52.0
[9] org.Hs.eg.db_3.15.0 AnnotationDbi_1.58.0 RSQLite_2.2.14 SparseM_1.81
[13] ggrepel_0.9.1 Rtsne_0.16 ggraph_2.0.5 ComplexHeatmap_2.12.0
[17] NMF_0.24.0 synchronicity_1.3.5 bigmemory_4.6.1 cluster_2.1.4
[21] rngtools_1.5.2 pkgmaker_0.32.2 registry_0.5-1 gplots_3.1.3
[25] statmod_1.5.0 edgeR_3.38.1 variancePartition_1.26.0 BiocParallel_1.30.4
[29] limma_3.52.4 BiocManager_1.30.22 RColorBrewer_1.1-3 mgcv_1.9-0
[33] nlme_3.1-157 car_3.1-0 carData_3.0-5 mlbench_2.1-3.1
[37] plotly_4.10.0 scatterplot3d_0.3-44 data.table_1.14.2 doParallel_1.0.17
[41] iterators_1.0.14 foreach_1.5.2 forcats_1.0.0 stringr_1.5.1
[45] dplyr_1.1.2 purrr_1.0.1 readr_2.1.2 tidyr_1.2.0
[49] tibble_3.2.1 ggplot2_3.4.3 tidyverse_1.3.1 XML_3.99-0.10
[53] graph_1.74.0 Matrix_1.6-1.1 IRanges_2.30.0 S4Vectors_0.34.0
[57] Biobase_2.56.0 BiocGenerics_0.42.0 rgl_0.109.2
Thank you for your effort! I downloaded the same generic OBO file you mentioned, and yes, these generic GO terms are not present in my GO slim. I work with human expression data, and I want to slim down my GO terms to summarize the results. The file I downloaded from the GO download page seems to be the most promising option so far. Could you recommend another generic OBO file that you think would be suitable? Alternatively, do you think it's feasible to simply add this parent term to my GO slim?
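If you do want to add a parent term back, a minimal sketch is to look up the term's direct parents with GO.db's `GOBPPARENTS`, append them to the slim IDs, and rerun `goSlim` on the extended collection. Adding every direct parent is a hypothetical choice made only for illustration; in practice you would pick the single parent that fits your summary. The file path is again assumed to be relative to the working directory.

```r
library(GSEABase)
library(GO.db)

slim <- getOBOCollection("goslim_generic.obo")  # assumed path
term <- "GO:0002831"

# Direct BP parents of the term; the vector names give the relation type
parents <- GOBPPARENTS[[term]]
parents

# Hypothetical choice: add all direct parents to the slim IDs
extended <- GOCollection(unique(c(ids(slim), unname(parents))))

# Rerun the slim mapping; rows with Count > 0 are where the term now lands
res <- goSlim(GOCollection(term), extended, "BP")
res[res$Count > 0, ]
```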
I have to confess that I am personally mystified by GO slims. I get that they are meant to give a high-level idea of which processes are important for a given organism, but, for example, what good is a GO slim if you are just going to stick some extra terms back in? I guess I just don't get the use case.
But maybe your use case doesn't require a GO slim. What exactly do you mean by 'slimming down to summarize the results'?
If you are just trying to summarize your univariate comparisons, I don't think you want a GO slim. Instead, you just want to do a conventional GO hypergeometric test, or maybe gene set testing, to identify processes or pathways that are being affected.
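For reference, the conventional GO hypergeometric test asks whether a term's genes are over-represented among your selected genes. With a universe of N genes, K of them annotated to the term, n selected genes, and k selected genes annotated to the term, the p-value is P(X ≥ k) for a hypergeometric X. The counts below are made up for illustration; in practice `limma::goana()` or `clusterProfiler::enrichGO()` run this test across all GO terms for you.

```r
# Hypothetical counts, not from the poster's data
N <- 20000  # genes in the universe (all tested genes)
K <- 200    # universe genes annotated to the GO term
n <- 500    # selected (significant) genes
k <- 20     # selected genes annotated to the GO term

# Expected overlap under no association: n * K / N
expected <- n * K / N

# Upper-tail hypergeometric p-value P(X >= k)
p_hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# The one-sided Fisher's exact test on the 2x2 table gives the same p-value
tab <- matrix(c(k, K - k, n - k, N - K - n + k), nrow = 2,
              dimnames = list(c("selected", "not selected"),
                              c("in term", "not in term")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

c(expected = expected, p_hyper = p_hyper, p_fisher = p_fisher)
```

Here the observed overlap (20) is four times the expected overlap (5), which is what drives the small p-value.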
I have multiple groups to compare, and each is enriched for hundreds of GO terms based on their correlations with clinical traits (ranked by top-1 p-value). Some GO terms are very deep (specific) in the hierarchy, while others are more generic. I also don't think it is very informative to look at and compare only the highest-ranking individual GO terms, so I think it would be useful to explore the generic terms within each group and then examine the most relevant individual GO terms that fall under each generic category. What do you think?
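A minimal sketch of the grouping described above, assuming you already have a vector of enriched BP term IDs for one group and have chosen some slim terms to group them by (both vectors below are hypothetical examples). It uses GO.db's `GOBPOFFSPRING`. clusterProfiler's `simplify()` is a different approach that collapses redundant enriched terms by semantic similarity rather than by slim category.

```r
library(GO.db)

# Hypothetical inputs: enriched BP terms for one group, and slim terms to group them by
enriched <- c("GO:0002831", "GO:0006955", "GO:0006412")
slim_ids <- c("GO:0002376", "GO:0065007")

# For each slim term, keep the enriched terms that are the slim term itself
# or one of its BP offspring in the installed GO.db
by_slim <- lapply(slim_ids, function(s) {
  intersect(enriched, c(s, GOBPOFFSPRING[[s]]))
})
names(by_slim) <- slim_ids
by_slim

# Human-readable names of the slim terms
Term(slim_ids)

# Enriched terms that fall under none of the chosen slim terms
setdiff(enriched, unlist(by_slim))
```

A term can appear under more than one slim term, since GO is a graph rather than a tree.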
What do you mean by 'enriched with ... GO terms from their correlations with clinical traits'? Are you doing a GO hypergeometric test or something else?
For each group, I generated multiple co-expression clusters, and for each cluster, the cluster's PC1 was correlated with the PC1 of multiple clinical traits. Then I ran a GO hypergeometric test to determine whether specific GO terms are over-represented among the genes in each significant cluster. This gave me multiple enriched GO terms for each group.
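To make the described procedure concrete, here is a sketch of the correlation step on simulated data: PC1 of one co-expression module's genes correlated with PC1 of several clinical traits. The module, the trait names, and the effect sizes are all hypothetical, and this only approximates a module eigengene (WGCNA's `moduleEigengenes()` is the real implementation).

```r
set.seed(1)
n_samples <- 40
latent <- rnorm(n_samples)  # hypothetical signal shared by the module and some traits

# Hypothetical module: 30 genes x 40 samples, all driven by `latent`
module_expr <- t(sapply(1:30, function(i) latent + rnorm(n_samples, sd = 0.5)))

# Hypothetical clinical traits (samples in rows); two share the latent signal
traits <- cbind(bmi = latent + rnorm(n_samples),
                crp = latent + rnorm(n_samples),
                age = rnorm(n_samples))

# PC1 of the scaled module genes and of the scaled traits
# (the sign of a principal component is arbitrary)
module_pc1 <- prcomp(t(module_expr), scale. = TRUE)$x[, 1]
trait_pc1  <- prcomp(traits, scale. = TRUE)$x[, 1]

# Correlation between the two PC1 scores across samples
res <- cor.test(module_pc1, trait_pc1)
res$estimate
res$p.value
```

Note that a significant correlation here says something about the module as a whole, not about which individual genes relate to which trait, which is part of why the next reply suggests a per-gene approach.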
Is this a WGCNA analysis (multiple co-expression clusters sounds like WGCNA)? If so, none of that is inferential or easily interpreted. It might be easier to use the canonical method: first regress each gene on your clinical traits to find genes that have a significant relationship with the trait, and then do a GO hypergeometric test on the set of significant genes.
That's a bit more interpretable IMO. You have a set of genes for which you have evidence that they are related to the clinical trait, and then you identify pathways/processes that are over-represented in that set of genes, and may be affected by the clinical trait.
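A minimal sketch of that canonical workflow on simulated data, with a plain per-gene `lm()` fit standing in for the moderated regression you would normally get from limma (`lmFit()` followed by `eBayes()`). The gene IDs, the trait, and the gene set are hypothetical; with real data you would pass the significant Entrez Gene IDs to `goana()` or `enrichGO()` rather than testing one hand-made set.

```r
set.seed(42)
n_genes <- 1000
n_samples <- 40
gene_ids <- sprintf("gene%04d", seq_len(n_genes))  # hypothetical gene IDs
trait <- rnorm(n_samples)                          # hypothetical clinical trait

# Simulated expression: the first 100 genes truly depend on the trait
expr <- matrix(rnorm(n_genes * n_samples), n_genes, n_samples,
               dimnames = list(gene_ids, NULL))
expr[1:100, ] <- expr[1:100, ] + outer(rep(0.8, 100), trait)

# Step 1: regress each gene on the trait and keep the slope p-value
pvals <- apply(expr, 1, function(y) {
  summary(lm(y ~ trait))$coefficients["trait", "Pr(>|t|)"]
})
padj <- p.adjust(pvals, method = "BH")
sig_genes <- gene_ids[padj < 0.05]

# Step 2: hypergeometric test for one hypothetical gene set
# (40 trait-related genes plus 60 unrelated ones)
gene_set <- gene_ids[c(1:40, 501:560)]
N <- n_genes
K <- length(gene_set)
n <- length(sig_genes)
k <- length(intersect(sig_genes, gene_set))
p_ora <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
c(n_significant = n, overlap = k, p = p_ora)
```

The universe (N) is every gene that was tested, which keeps the enrichment test consistent with the regression step.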
Thank you for the advice, many thanks! It is a WGCNA analysis. I will try your method today and see whether the results are consistent.