GOslim missing GO term
amanda • 0
@41b054d5
Last seen 9 days ago
United States

I downloaded goslim_generic.obo from geneontology.org and followed the post "GSEABase - View/extract GO terms mapped to each GOslim" to slim my GO BP terms. However, one GO term, GO:0002831, was not counted. What could be the problem, and do you have any recommendations for addressing it?

library(GSEABase)
ids <- "GO:0002831"
myCollection <- GOCollection(ids)
slim <- getOBOCollection("goslim_generic.obo")
slimdf <- goSlim(myCollection, slim, "BP", verbose = TRUE)
## Result: not counted in any generic terms

sessionInfo( )
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] tools     grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] viridis_0.6.2            viridisLite_0.4.0        clusterProfiler_4.4.4    topGO_2.48.0            
 [5] GO.db_3.15.0             GSEABase_1.58.0          annotate_1.74.0          biomaRt_2.52.0          
 [9] org.Hs.eg.db_3.15.0      AnnotationDbi_1.58.0     RSQLite_2.2.14           SparseM_1.81            
[13] ggrepel_0.9.1            Rtsne_0.16               ggraph_2.0.5             ComplexHeatmap_2.12.0   
[17] NMF_0.24.0               synchronicity_1.3.5      bigmemory_4.6.1          cluster_2.1.4           
[21] rngtools_1.5.2           pkgmaker_0.32.2          registry_0.5-1           gplots_3.1.3            
[25] statmod_1.5.0            edgeR_3.38.1             variancePartition_1.26.0 BiocParallel_1.30.4     
[29] limma_3.52.4             BiocManager_1.30.22      RColorBrewer_1.1-3       mgcv_1.9-0              
[33] nlme_3.1-157             car_3.1-0                carData_3.0-5            mlbench_2.1-3.1         
[37] plotly_4.10.0            scatterplot3d_0.3-44     data.table_1.14.2        doParallel_1.0.17       
[41] iterators_1.0.14         foreach_1.5.2            forcats_1.0.0            stringr_1.5.1           
[45] dplyr_1.1.2              purrr_1.0.1              readr_2.1.2              tidyr_1.2.0             
[49] tibble_3.2.1             ggplot2_3.4.3            tidyverse_1.3.1          XML_3.99-0.10           
[53] graph_1.74.0             Matrix_1.6-1.1           IRanges_2.30.0           S4Vectors_0.34.0        
[57] Biobase_2.56.0           BiocGenerics_0.42.0      rgl_0.109.2
GO.db GOslim • 520 views
@james-w-macdonald-5106
Last seen 10 hours ago
United States

I think it's because that GO term isn't an offspring of any term in the GO slim you are downloading.

> library(GSEABase)
> library(GO.db)
> ids <- "GO:0002831"
> slim <- getOBOCollection("https://current.geneontology.org/ontology/subsets/goslim_generic.obo")
> allgomap <- as.list(GOBPOFFSPRING)
> inthese <- names(allgomap[sapply(allgomap, function(x) ids %in% x)])
## these are the terms that GO:0002831 is an offspring of
> inthese
[1] "GO:0008150" "GO:0009607"
[3] "GO:0048583" "GO:0050789"
[5] "GO:0050896" "GO:0065007"
## are any of these GO terms in the GO slim we downloaded?
> any(inthese %in% ids(slim))
[1] FALSE
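
The same check can be phrased from the term's side, by looking up its ancestors and intersecting them with the slim. The following is only a minimal sketch: `slim_hits` is a hypothetical helper, the ancestor list is copied from the output above, and `toy_slim` is a made-up stand-in for `ids(slim)`, not the real content of the generic slim.

```r
# Return the slim terms that equal `term` or are one of its ancestors.
# `ancestor_map` is a named list: term ID -> character vector of ancestor IDs.
slim_hits <- function(term, ancestor_map, slim_ids) {
  intersect(c(term, ancestor_map[[term]]), slim_ids)
}

# Ancestors of GO:0002831, as printed in the session above.
ancestor_map <- list(
  "GO:0002831" = c("GO:0008150", "GO:0009607", "GO:0048583",
                   "GO:0050789", "GO:0050896", "GO:0065007")
)

# Hypothetical stand-in for ids(slim); these IDs are made up for illustration.
toy_slim <- c("GO:9999991", "GO:9999992")

# No ancestor is in the slim, so nothing is returned.
no_hit <- slim_hits("GO:0002831", ancestor_map, toy_slim)

# If one of the ancestors were part of the slim, the term would map to it.
one_hit <- slim_hits("GO:0002831", ancestor_map, c(toy_slim, "GO:0050896"))

print(no_hit)
print(one_hit)

# With GO.db and GSEABase attached, the equivalent call would be
# slim_hits("GO:0002831", as.list(GOBPANCESTOR), ids(slim))
```

An empty result here corresponds to the `FALSE` above: the term has no ancestor among the slim terms, so there is no slim category that could count it.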

Thank you for your effort! I downloaded the same generic OBO file you mentioned, and yes, these generic GO terms are not present in my GO slim. I work with human expression data, and I'm interested in slimming down to summarize the results. The one I downloaded from the download page seems to be the most promising option so far. Could you kindly recommend another generic OBO file that you think would be suitable? Alternatively, do you think it's feasible for me to simply add this parent ontology to my GO slim?


I have to confess that I am personally mystified by GO slims. I get that they are meant to give a high level idea of what processes are important for a given organism, but as an example, what good is a GO slim if you are just going to stick some extra terms back in? I guess I just don't get the use case.

But maybe your use case doesn't require a GO slim. What exactly do you mean by 'slimming down to summarize the results'?

If you are just trying to summarize your univariate comparisons, I don't think you want a GO slim. Instead, you probably just want a conventional GO hypergeometric test, or maybe gene set testing, to identify processes or pathways that are being affected.
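
For readers unfamiliar with the GO hypergeometric test, the arithmetic for a single GO term can be sketched in base R. All counts below are hypothetical and chosen only for illustration; dedicated packages handle the gene-to-term mapping and test every term at once.

```r
# Hypothetical counts, for illustration only.
universe <- 20000  # genes tested (the background)
in_term  <- 200    # background genes annotated to the GO term
selected <- 500    # significant genes
overlap  <- 15     # significant genes annotated to the GO term

# P(X >= overlap) when `selected` genes are drawn without replacement.
p_hyper <- phyper(overlap - 1, in_term, universe - in_term, selected,
                  lower.tail = FALSE)

# The same test as a one-sided Fisher's exact test on the 2x2 table.
tab <- matrix(c(overlap, selected - overlap,
                in_term - overlap,
                universe - in_term - (selected - overlap)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(hypergeometric = p_hyper, fisher = p_fisher))
```

With these counts about 5 annotated genes would be expected by chance (500 * 200 / 20000), so an overlap of 15 yields a small p-value.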


I have multiple groups to compare, and each is enriched for hundreds of GO terms derived from correlations with clinical traits (top1 p-value ranking). Some GO terms sit very deep in the hierarchy and some are more generic. I also don't think it is very informative to look at and compare only the highest-ranking GO terms individually. So I think it would be useful to explore the generic terms within each group, and then examine the most relevant individual GO terms that fall under each generic category. What do you think?
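
The grouping described here (enriched terms collected under each generic category) can be sketched as follows. This is a toy illustration, not what `goSlim` does internally: every ID is made up, and in a real analysis the ancestor map would presumably come from something like `as.list(GOBPANCESTOR)` and the categories from `ids(slim)`.

```r
# Hypothetical ancestor map (term -> ancestors); all IDs are made up.
ancestor_map <- list(
  "T:immune_a" = c("S:immune", "S:root"),
  "T:immune_b" = c("S:immune", "S:root"),
  "T:cycle_a"  = c("S:cell_cycle", "S:root"),
  "T:orphan"   = c("S:root")
)
slim_ids <- c("S:immune", "S:cell_cycle")  # hypothetical generic categories
enriched <- names(ancestor_map)            # hypothetical enriched terms

# For each generic category, collect the enriched terms that fall under it.
by_category <- lapply(setNames(slim_ids, slim_ids), function(s) {
  under <- vapply(enriched,
                  function(term) s %in% c(term, ancestor_map[[term]]),
                  logical(1))
  enriched[under]
})

# Enriched terms with no ancestor in the slim stay unassigned,
# which is the situation GO:0002831 is in.
unassigned <- setdiff(enriched, unlist(by_category))

print(by_category)
print(unassigned)
```

Listing the unassigned terms explicitly makes it visible when a term is dropped because the slim has no category above it.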


What do you mean by 'enriched with ... GO terms from their correlations with clinical traits'? Are you doing a GO hypergeometric or something else?


For each group, I generated multiple co-expression clusters, and within each cluster, the cluster PC1 was correlated with PC1 of multiple clinical traits. I then ran a GO hypergeometric test to determine whether specific GO terms are over-represented among the genes within each significant cluster. This gave me multiple enriched GO terms per group.


Is this a WGCNA analysis (multiple co-expression clusters sounds like WGCNA)? If so, none of that is inferential or easily interpreted. It might be easier to use the canonical method of first regressing each gene on your clinical traits to find genes that have a significant relationship with the trait, and then do a GO hypergeometric on the set of significant genes.

That's a bit more interpretable IMO. You have a set of genes for which you have evidence that they are related to the clinical trait, and then you identify pathways/processes that are over-represented in that set of genes, and may be affected by the clinical trait.
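
The first step of that approach, regressing each gene on a clinical trait and keeping the significant genes, might look like the base R sketch below. The data are simulated purely for illustration, and the plain `lm` fit is an assumption made to keep the example self-contained; for real expression data a dedicated package such as limma is the more usual choice.

```r
set.seed(1)

# Simulated data, for illustration only: 200 genes x 30 samples.
n_genes <- 200
n_samples <- 30
trait <- rnorm(n_samples)
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Plant a strong trait effect in the first 10 genes.
expr[1:10, ] <- expr[1:10, ] + outer(rep(3, 10), trait)

# Regress each gene on the trait and keep the p-value of the slope.
p_values <- apply(expr, 1, function(y) {
  summary(lm(y ~ trait))$coefficients[2, 4]
})

# Adjust for multiple testing (Benjamini-Hochberg) and select genes.
adj_p <- p.adjust(p_values, method = "BH")
sig_genes <- names(adj_p)[adj_p < 0.05]

print(sig_genes)
```

The resulting `sig_genes` vector is the gene set that would then go into the GO hypergeometric test, with all tested genes as the background.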


Thank you very much for the advice! It is a WGCNA analysis. I will try your method today and see whether the results are consistent.
