How to compute the "completeness" of a KEGG module using a set of KEGG orthologs? (Module Completion Ratio)
jol.espinoz ▴ 40
@jolespinoz-11290

This is a problem I have been trying to figure out for years, and I haven't been able to get any attention on it in any forum. In metagenomics it is extremely important to be able to assess how complete a particular metabolic module is with respect to a genome. KOFAMSCAN made it really easy to know which KEGG orthologs are associated with which ORF/gene, but working out how complete a module is with respect to a genome is really confusing, and there is no clear way to do it.

Some of these calculations are simple, but some are not very straightforward because of the "definition" nomenclature. For example, M00357 is fairly complex. The definition is: ((K00925 K00625),K01895) (K00193+K00197+K00194) (K00577+K00578+K00579+K00580+K00581-K00582-K00583+K00584) (K00399+K00401+K00402) (K22480+K22481+K22482,K03388+K03389+K03390,K08264+K08265,K03388+K03389+K03390+K14127+(K14126+K14128,K22516+K00125))

According to the "Help" for KEGG modules, the definition is described by the following:

The definition of the module as a list of K numbers for pathway/signature modules and RC numbers for reaction modules. Comma separated K numbers or RC numbers indicate alternatives. Plus signs are used to represent a complex or a combination and a minus sign denotes a non-essential component in the complex.

Main points from the description:

• Comma separated K numbers or RC numbers indicate alternatives.
• Plus signs are used to represent a complex or a combination
• A minus sign denotes a non-essential component in the complex.
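To make the minus-sign rule concrete, here is a minimal sketch that sorts the K numbers of a definition into essential and non-essential sets. The function name `split_essential_optional` is hypothetical, and the sketch assumes a minus sign always sits directly in front of a single K number; a minus in front of a parenthesised group would not be handled.

```python
import re

# A K number, optionally preceded by the minus sign that marks it non-essential.
KO_RE = re.compile(r"(-?)(K\d{5})")


def split_essential_optional(definition: str):
    """Return (essential, optional) sets of K numbers found in a definition.

    Assumption: "-" applies only to the single K number that follows it.
    """
    essential, optional = set(), set()
    for sign, ko in KO_RE.findall(definition):
        if sign == "-":
            optional.add(ko)
        else:
            essential.add(ko)
    return essential, optional


essential, optional = split_essential_optional(
    "(K00577+K00578+K00579+K00580+K00581-K00582-K00583+K00584)"
)
print(sorted(optional))
```

Under this reading, K00582 and K00583 end up in the optional set and the other six K numbers of that complex are essential.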

MAPLE was the only tool I know of that could do this, but the MAPLE service was discontinued at the end of February 2019.

I've asked a similar question in the past but got no responses whatsoever.

Other people have asked similar questions: https://www.biostars.org/p/210883/

I'm trying to do this in a high-throughput way, so I'm trying to write a function in Python:

def module_completion_ratio(definition:str, orthology_set:set):
    # So much empty...
    mcr = None
    return mcr


or R

module_completion_ratio = function(definition, orthology_set){
    mcr = NULL
    return(mcr)
}
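One possible way to fill in the Python stub is sketched below. It is not MAPLE's algorithm, only one reading of the KEGG help text, under these assumptions: a space means "both sides required" (consecutive steps), a comma means "either side", a plus means "both in the complex", a minus marks a component that is ignored, plus/minus bind tighter than space, and space binds tighter than comma. The ratio is defined here as satisfied top-level steps divided by total top-level steps. The helper names `tokenize`, `split_steps` and `_Evaluator` are hypothetical, and anything other than K numbers and these operators raises `ValueError`.

```python
import re

TOKEN_RE = re.compile(r"K\d{5}|[(),+-]|\s+")


def tokenize(definition):
    """Split a definition into K numbers and operators; whitespace becomes ' '."""
    text, pos, tokens = definition.strip(), 0, []
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"Unexpected character {text[pos]!r} at {pos}")
        tokens.append(" " if match.group().isspace() else match.group())
        pos = match.end()
    return tokens


def split_steps(tokens):
    """Split on spaces outside parentheses; a top-level comma means one step."""
    depth, steps, current = 0, [], []
    for tok in tokens:
        depth += (tok == "(") - (tok == ")")
        if depth == 0 and tok == ",":
            return [tokens]
        if depth == 0 and tok == " ":
            steps.append(current)
            current = []
        else:
            current.append(tok)
    steps.append(current)
    return steps


class _Evaluator:
    """Recursive-descent evaluation of one step against a set of K numbers."""

    def __init__(self, tokens, kos):
        self.tokens, self.kos, self.i = tokens, kos, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    def alternatives(self):  # comma: any branch is enough
        result = self.sequence()
        while self.peek() == ",":
            self.take()
            branch = self.sequence()
            result = result or branch
        return result

    def sequence(self):  # space: every part is required
        result = self.complex()
        while self.peek() == " ":
            self.take()
            part = self.complex()
            result = result and part
        return result

    def complex(self):  # plus: required subunit; minus: parsed but ignored
        result = self.unit()
        while self.peek() in ("+", "-"):
            op = self.take()
            present = self.unit()
            if op == "+":
                result = result and present
        return result

    def unit(self):
        tok = self.take()
        if tok == "(":
            result = self.alternatives()
            if self.take() != ")":
                raise ValueError("Missing closing parenthesis")
            return result
        if tok is not None and tok.startswith("K"):
            return tok in self.kos
        raise ValueError(f"Unexpected token {tok!r}")


def module_completion_ratio(definition: str, orthology_set: set) -> float:
    steps = split_steps(tokenize(definition))
    satisfied = 0
    for step in steps:
        evaluator = _Evaluator(step, orthology_set)
        ok = evaluator.alternatives()
        if evaluator.peek() is not None:
            raise ValueError(f"Unexpected token {evaluator.peek()!r}")
        satisfied += bool(ok)
    return satisfied / len(steps)
```

For example, calling `module_completion_ratio(definition, {"K01895", "K00193", "K00197", "K00194"})` with the M00357 definition would count the first two of the five top-level steps as satisfied under these assumptions. Whether a partially present complex should earn partial credit is a separate design choice that this sketch does not attempt.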


Going back to the complex example above for M00357. Let's break it up chunk by chunk:

• ((K00925 K00625),K01895)
    • Is this saying either (K00925 AND K00625) or just K01895 alone?

• (K00193+K00197+K00194)
    • This is straightforward: all of these are essential.

• (K00577+K00578+K00579+K00580+K00581-K00582-K00583+K00584)
    • Is this saying all are essential except K00582 and K00583?

• (K00399+K00401+K00402)
    • Straightforward again...

• (K22480+K22481+K22482,K03388+K03389+K03390,K08264+K08265,K03388+K03389+K03390+K14127+(K14126+K14128,K22516+K00125))
    • For this beast here, is it saying either (K22480+K22481+K22482) OR (K03388+K03389+K03390) OR (K08264+K08265) OR (K03388+K03389+K03390+K14127 AND EITHER (K14126+K14128) OR (K22516+K00125))?
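To compare readings like the ones above against the raw notation, a small string rewrite can spell the operators out. This is only a sketch: the function name `explain` and the words THEN / AND / OR / [optional ...] are my own labels, not KEGG terminology, and the output should be read with AND binding tighter than OR. It also assumes a minus sign directly precedes a single K number.

```python
import re


def explain(definition: str) -> str:
    """Rewrite a KEGG module definition with its operators spelled out."""
    # Spaces separate consecutive parts; do this first, before adding new spaces.
    text = re.sub(r"\s+", " THEN ", definition.strip())
    # "-Kxxxxx" marks a non-essential component of a complex.
    text = re.sub(r"-(K\d{5})", r" [optional \1]", text)
    # "+" joins members of a complex; "," separates alternatives.
    return text.replace("+", " AND ").replace(",", " OR ")


print(explain("((K00925 K00625),K01895)"))
print(explain("K03388+K03389+K03390+K14127+(K14126+K14128,K22516+K00125)"))
```

The parentheses of the original definition are kept as-is, so the grouping in the output is exactly the grouping KEGG wrote.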

Are there any scripts in R or Python that have already coded this up? The logic is quite confusing, and automating it is going to be quite frustrating. If you don't know of anything, can you help me understand whether my logic above makes sense and, if not, why?

KEGG KEGGREST keggorthology Metagenomics