topGO weight01 method: potential bug in the code

Dear colleagues,

I am trying to understand the details of the topGO "weight01" method, and I am puzzled. May I ask for some comments?

Inside the .sigGroups.weight01 function (topGOalgo.R), there is the following code block:

    for(child in termChildren) {
      ...
      w[child] <- .sigRatio.01(a = childSig, b = termSig)
    }

    ## if w[child] > 1 than that child is more significant
    sig.termChildren <- names(w[w > 1])

    ## CASE 1:  if we don't have significant children
    ....

    ## CASE 2:   'child' is more significant that 'u'
    ....


At the same time, the function .sigRatio.01 (topGOfunctions.R) always returns a value greater than 1 (either 2 or 1e50):

    .sigRatio.01 <- function(a, b, tolerance = 1e-50) {

      ## if a and b are almost equal we return 2
      if(identical(all.equal(a, b, tolerance = tolerance), TRUE))
        return(2)

      if(a < b)
        return(1e50)

      return(2)
    }
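
A quick way to check this claim is to evaluate the function, as quoted above, on a few p-value pairs. This is only a minimal sketch: the three (child, parent) p-value pairs are made-up values chosen to cover the three branches, and nothing here calls topGO itself.

```r
## .sigRatio.01 as quoted above (topGOfunctions.R)
.sigRatio.01 <- function(a, b, tolerance = 1e-50) {
  if (identical(all.equal(a, b, tolerance = tolerance), TRUE))
    return(2)
  if (a < b)
    return(1e50)
  return(2)
}

## Hypothetical p-values: a = child term, b = parent term
cases <- list(
  child_more_significant = c(a = 1e-6, b = 1e-2),
  child_less_significant = c(a = 1e-2, b = 1e-6),
  equal_p_values         = c(a = 0.05, b = 0.05)
)

## One weight per case, computed the same way as w[child] in the loop
w <- vapply(cases,
            function(p) .sigRatio.01(a = p[["a"]], b = p[["b"]]),
            numeric(1))
print(w)

## The filter used in .sigGroups.weight01: names(w[w > 1])
print(names(w[w > 1]))
```

If every case name is printed by the last line, then the `w > 1` filter keeps all children, whatever their p-values are.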


As I understand it, this has the following effects:

• All child nodes (terms) are always treated as more significant, irrespective of their actual p-values, so CASE 1 is impossible.
• Thus CASE 2 always applies: the genes associated with all child nodes are removed from the node (term) being analyzed. As a result, the gene propagation performed under the true path rule is reverted. In addition, genes directly associated with a term are also removed if they happen to be associated with one of the term's (grand)children as well.
• Fisher's exact test (the default) is then applied to this "cleaned-up" set of genes.

In other words, for a "perfect-world" case in which no gene is assigned simultaneously to a GO term and to one of its (grand)parent terms, the analysis can be described equivalently as follows:

• gene propagation based on the true path rule is not performed;
• the classical, graph-independent enrichment analysis is applied.
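
The equivalence claimed above can be illustrated on a toy example. This is a sketch of my reading of CASE 2, not topGO code: the ontology (one parent term P with two children C1 and C2), the gene names, and the helper `propagate` are all hypothetical.

```r
## Hypothetical toy ontology: parent term P with children C1 and C2
children <- list(P = c("C1", "C2"), C1 = character(0), C2 = character(0))

## Hypothetical direct annotations; "perfect-world" assumption:
## no gene is annotated to both a term and one of its ancestors
direct <- list(P = c("g1", "g2"), C1 = c("g3", "g4"), C2 = "g5")

## Propagation step: a term inherits the genes of all its descendants
propagate <- function(term) {
  kids <- children[[term]]
  unique(c(direct[[term]], unlist(lapply(kids, propagate))))
}
terms <- setNames(names(direct), names(direct))
propagated <- lapply(terms, propagate)

## Assumed CASE 2 behaviour: drop from each term every gene
## that belongs to any of its children
cleaned <- lapply(terms, function(term) {
  child_genes <- unlist(propagated[children[[term]]])
  setdiff(propagated[[term]], child_genes)
})

## Compare the "cleaned-up" gene sets with the direct annotations
print(propagated$P)
print(cleaned$P)
print(identical(cleaned, direct))
```

Under these assumptions the cleaned gene sets coincide with the direct annotations, which is why the procedure would reduce to a graph-independent test on unpropagated annotations.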

Could you please clarify:

• whether this interpretation is correct, or whether the code actually does something different;
• and, if the interpretation is correct, whether this is a bug or a feature?

Thank you very much!

Best regards,
