Hi everybody,
I am a bit puzzled by topGO results. The library seems very powerful, but I find the documentation rather sparse and cryptic. In particular, I would like to understand what the SigTerms / "non-trivial nodes" are. According to the library documentation and the post "What is Nontrivial node in topGO analysis?", they are the
"number of GO categories which have at least one significant gene annotated".
As I understand it, this should be independent of the statistical test used to decide which GO terms are significant, since it depends only on which genes are defined as significant in the input data by the selection function. Thus, if I run a Fisher test and a KS test on the same dataset, using the same threshold to define significant genes, I expect to obtain the same number of non-trivial nodes:
library(topGO)
#################################
# prepare the toy data
pvals <- c(0.372633450, 0.000195454, 0.699548147, 0.021062787, 0.732816144,
           0.805712054, 0.927868696, 0.737794221, 0.847279035, 0.662742785,
           0.204508888, 0.031615846, 0.543586800, 0.202557857, 0.410675473,
           0.394295637, 0.097123448, 0.882223568, 0.779278809, 0.926313327)
geneids <- c("ENSG00000148584", "ENSG00000175899", "ENSG00000094914", "ENSG00000114771", "ENSG00000103591",
             "ENSG00000087884", "ENSG00000127837", "ENSG00000131043", "ENSG00000149313", "ENSG00000008311",
             "ENSG00000183044", "ENSG00000165029", "ENSG00000085563", "ENSG00000005471", "ENSG00000115657",
             "ENSG00000131269", "ENSG00000023839", "ENSG00000108846", "ENSG00000117528", "ENSG00000164163")
smallset <- data.frame(GENEID = geneids, PADG = pvals)
ALPHA <- 0.01  # p-value threshold used to call a gene significant
#################################
# Run Fisher test
#
fisher_set <- as.integer(smallset[, "PADG"] <= ALPHA)
names(fisher_set) <- smallset[, "GENEID"]
fisher_data <- new("topGOdata", ontology = "BP", allGenes = fisher_set,
                   geneSel = function(x) (x == 1),
                   nodeSize = 10, annot = annFUN.org, mapping = "org.Hs.eg.db", ID = "ENSEMBL")
top_algo <- "weight01"
top_stat <- "fisher"
fisher_results <- runTest(fisher_data, algorithm = top_algo, statistic = top_stat)
geneData(fisher_results)
#################################
# Run K-S test
#
ks_set <- smallset[, "PADG"]
names(ks_set) <- smallset[, "GENEID"]
ks_data <- new("topGOdata", ontology = "BP", allGenes = ks_set,
               geneSel = function(x) (x <= ALPHA),
               nodeSize = 10, annot = annFUN.org, mapping = "org.Hs.eg.db", ID = "ENSEMBL")
top_algo <- "weight01"
top_stat <- "ks"
ks_results <- runTest(ks_data, algorithm = top_algo, statistic = top_stat,
                      scoreOrder = "increasing")
geneData(ks_results)
However, this is what I get:
geneData(fisher_results)
  Annotated Significant    NodeSize    SigTerms
         20           1          10          10
geneData(ks_results)
  Annotated Significant    NodeSize    SigTerms
         20           1          10          16
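To make my expectation concrete, here is a minimal sketch of the count I have in mind. The helper name count_nontrivial and the toy mapping are made up for illustration. The commented lines at the end assume that genesInTerm() and sigGenes() return, respectively, the term-to-gene list and the significant gene IDs of a topGOdata object, which is how I read the topGO manual; I have not checked that this is what geneData() reports internally.

```r
# Hypothetical helper: count GO terms that have at least one significant gene.
#   go2genes:  named list, GO term -> character vector of annotated gene IDs
#   sig_genes: character vector of significant gene IDs
count_nontrivial <- function(go2genes, sig_genes) {
  has_sig <- vapply(go2genes,
                    function(genes) any(genes %in% sig_genes),
                    logical(1))
  sum(has_sig)
}

# Toy mapping (made-up term and gene names, for illustration only)
toy_go2genes <- list(
  "GO:A" = c("g1", "g2", "g3"),
  "GO:B" = c("g2", "g4"),
  "GO:C" = c("g5")
)
toy_sig <- c("g2")

n_nontrivial <- count_nontrivial(toy_go2genes, toy_sig)
print(n_nontrivial)  # 2: GO:A and GO:B contain g2, GO:C does not

# On the real objects, I would expect the same kind of count from:
# count_nontrivial(genesInTerm(fisher_data), sigGenes(fisher_data))
# count_nontrivial(genesInTerm(ks_data), sigGenes(ks_data))
```

Since this count only depends on the annotation and on the set of significant genes, I would expect it to be identical for both topGOdata objects.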
Can anybody explain to me what is happening? Thanks!
sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C LC_TIME=French_France.1252
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] SparseM_1.81 org.Hs.eg.db_3.14.0 AnnotationDbi_1.56.2 IRanges_2.28.0
[5] S4Vectors_0.32.4 Biobase_2.54.0 topGO_2.41.0 graph_1.72.0
[9] BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] nlme_3.1-155 bitops_1.0-7 matrixStats_0.61.0
[4] bit64_4.0.5 doParallel_1.0.17 RColorBrewer_1.1-3
[7] httr_1.4.2 GenomeInfoDb_1.30.1 backports_1.4.1
[10] tools_4.1.3 utf8_1.2.2 R6_2.5.1
[13] lasso2_1.2-22 DBI_1.1.2 colorspace_2.0-3
[16] GetoptLong_1.0.5 mnormt_2.0.2 tidyselect_1.1.2
[19] DESeq2_1.34.0 bit_4.0.4 Nozzle.R1_1.1-1
[22] compiler_4.1.3 cli_3.2.0 logging_0.10-108
[25] ggdendro_0.1.23 DelayedArray_0.20.0 scales_1.1.1
[28] psych_2.2.3 genefilter_1.76.0 stringr_1.4.0
[31] digest_0.6.29 XVector_0.34.0 pkgconfig_2.0.3
[34] MatrixGenerics_1.6.0 fastmap_1.1.0 limma_3.50.1
[37] rlang_1.0.2 GlobalOptions_0.1.2 rstudioapi_0.13
[40] RSQLite_2.2.12 shape_1.4.6 generics_0.1.2
[43] BiocParallel_1.28.3 dplyr_1.0.8 RCurl_1.98-1.6
[46] magrittr_2.0.2 GO.db_3.14.0 GenomeInfoDbData_1.2.7
[49] Matrix_1.4-0 Rcpp_1.0.8.3 munsell_0.5.0
[52] fansi_1.0.3 lifecycle_1.0.1 stringi_1.7.6
[55] edgeR_3.36.0 MASS_7.3-55 SummarizedExperiment_1.24.0
[58] zlibbioc_1.40.0 plyr_1.8.6 DEGreport_1.30.3
[61] grid_4.1.3 blob_1.2.3 parallel_4.1.3
[64] ggrepel_0.9.1 crayon_1.5.1 lattice_0.20-45
[67] cowplot_1.1.1 Biostrings_2.62.0 splines_4.1.3
[70] annotate_1.72.0 circlize_0.4.14 KEGGREST_1.34.0
[73] tmvnsim_1.0-2 locfit_1.5-9.5 knitr_1.37
[76] ComplexHeatmap_2.10.0 pillar_1.7.0 GenomicRanges_1.46.1
[79] rjson_0.2.21 geneplotter_1.72.0 codetools_0.2-18
[82] XML_3.99-0.9 glue_1.6.2 png_0.1-7
[85] vctrs_0.4.0 foreach_1.5.2 tidyr_1.2.0
[88] gtable_0.3.0 purrr_0.3.4 reshape_0.8.8
[91] clue_0.3-60 assertthat_0.2.1 cachem_1.0.6
[94] ggplot2_3.3.5 xfun_0.30 xtable_1.8-4
[97] broom_0.7.12 ConsensusClusterPlus_1.58.0 survival_3.2-13
[100] tibble_3.1.6 iterators_1.0.14 memoise_2.0.1
[103] cluster_2.1.2 ellipsis_0.3.2