Understanding and reconstructing metabolic pathways can help identify biomarkers and metabolic changes associated with systemic diseases such as cancer and cardiovascular disease. However, in most metabolomics datasets, more than half of the experimentally detected metabolites can be unannotated, so their involvement in specific pathways is largely unknown. This study proposes that a distance concept can help place an unannotated metabolite into a specific part of a metabolic network by estimating its “distance” to known metabolites. We introduce “enzymatic distance” to quantify the separation of any two metabolites in metabolism in terms of the number of reaction centers involved and the flow of mass between them. Enzymatic distance was scored by two metrics: the count of shared atom mappings and the count of reaction center atoms, each averaged over the reaction paths separating two compounds. Univariate and multivariate multilayer perceptron (MLP) regression models were trained and evaluated over 100 epochs and 30 cross-validation iterations to predict the enzymatic distance metrics between target metabolites from their chemical structural features alone. Atom mappings were robustly predicted by both univariate (R² = 0.9608 ± 0.0031 (sd)) and multivariate (R² = 0.9576 ± 0.0028 (sd)) regression, whereas reaction centers were predicted significantly less accurately by univariate (R² = 0.8761 ± 0.0065 (sd)) and multivariate (R² = 0.8593 ± 0.0071 (sd)) regression. Univariate regression predicted both metrics more accurately than multivariate regression (p < 0.001), indicating that predicting each metric separately yields higher accuracy than predicting both jointly. Within metabolite compounds, chemical substructures containing oxygen and nitrogen played a significant role in defining enzymatic distance in metabolism, and more complex features improved the accuracy of atom mapping predictions.
By quantifying and predicting relationships between known and unknown metabolites, unannotated metabolites in the human body can be better interpreted, aiding the detection and interpretation of metabolic diseases.
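The R² values above are reported as a mean and standard deviation over cross-validation iterations. A minimal sketch of that kind of summary is given below, using only the Python standard library. The function names, the toy data, and the choice of the sample (n - 1) standard deviation are assumptions made for illustration; the abstract does not state how the authors computed these values.

```python
from statistics import mean, stdev


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def summarize(scores):
    """Mean and sample standard deviation of per-iteration R2 scores.

    Assumption: sample (n - 1) standard deviation; the source does not
    say which estimator was used.
    """
    return mean(scores), stdev(scores)


# Hypothetical toy data: (true, predicted) distance metric values for two
# cross-validation iterations. These are not values from the study.
folds = [
    ([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]),
    ([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5]),
]

scores = [r_squared(y_true, y_pred) for y_true, y_pred in folds]
avg, sd = summarize(scores)
print(f"R2 = {avg:.4f} ± {sd:.4f} (sd)")
```

In a setting like the one described, `folds` would hold one (true, predicted) pair per cross-validation iteration for a single metric, and the summary would be computed separately for the univariate and multivariate models before comparing them.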
A New Enzymatic Distance Concept Enables Machine Learning Regression of Metabolite Chemical Representation Features to Distance in Metabolism
Published: 10 October 2025
by MDPI
in The 4th International Electronic Conference on Metabolomics
session Advanced Metabolomics and Data Analysis Approaches
Keywords: Enzymatic distance; pathway reconstruction; data deconvolution; unannotated metabolites; bioinformatics; KEGG; chemical substructures; machine learning
