Dear colleagues,
I'm trying to understand the details of topGO's "weight01" method, and I'm puzzled. May I ask for some comments?
Inside the .sigGroups.weight01 function (topGOalgo.R), there is the following code block:
for(child in termChildren) {
  ...
  w[child] <- .sigRatio.01(a = childSig, b = termSig)
}
## if w[child] > 1 than that child is more significant
sig.termChildren <- names(w[w > 1])
## CASE 1: if we don't have significant children
....
## CASE 2: 'child' is more significant that 'u'
....
At the same time, function .sigRatio.01 (topGOfunctions.R) always returns values exceeding 1:
.sigRatio.01 <- function(a, b, tolerance = 1e-50) {
  ## if a and b are almost equal we return 2
  if(identical(all.equal(a, b, tolerance = tolerance), TRUE))
    return(2)
  if(a < b)
    return(1e50)
  return(2)
}
In my understanding, it has the following effect.
- All child nodes (terms) are always treated as more significant irrespective of actual p-values (and CASE 1 is impossible).
- Thus (CASE 2), the genes associated with any child node are always removed from the node (term) being analyzed. As a result, the gene propagation done under the true path rule is undone (and, in addition, genes annotated directly to a term are also removed if they happen to be associated with one of the term's (grand)children as well).
- Fisher's test (by default) is applied to such a "cleaned-up" set of genes.
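To make the first point concrete, here is a minimal sketch that calls .sigRatio.01 exactly as quoted above, with three made-up p-values for a child (smaller than, equal to, and larger than the parent's). The values of termSig and childSig are hypothetical and chosen only for illustration; I have not checked whether the quoted function matches every released topGO version.

```r
## Copy of .sigRatio.01 as quoted above (topGOfunctions.R)
.sigRatio.01 <- function(a, b, tolerance = 1e-50) {
  ## if a and b are almost equal we return 2
  if (identical(all.equal(a, b, tolerance = tolerance), TRUE))
    return(2)
  if (a < b)
    return(1e50)
  return(2)
}

## Hypothetical p-values, for illustration only
termSig  <- 0.01
childSig <- c(less = 1e-5, equal = 0.01, greater = 0.5)

## Same computation as in the loop of .sigGroups.weight01
w <- vapply(childSig,
            function(p) .sigRatio.01(a = p, b = termSig),
            numeric(1))
print(w)

## Same filter as in .sigGroups.weight01
sig.termChildren <- names(w[w > 1])
print(sig.termChildren)
```

With these inputs every element of w exceeds 1, so all three children end up in sig.termChildren, including the one whose p-value is larger than the parent's.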
In other words, for a "perfect-world" case in which no gene is annotated simultaneously to a GO term and to one of its (grand)parent terms, the analysis can be equivalently described as follows:
- gene propagation based on the true path rule is not performed;
- classical graph-independent enrichment analysis is applied.
May I ask:
- whether this interpretation is correct, or whether the code actually does something different;
- and, if it is correct, whether this is a bug or a feature?
Thank you very much!
Best regards,
Vladimir