EnzyRxn-Transformer: A generative platform for rational experiment design in biotransformation
1  Kansas State University
2  Department of Grain Science and Industry, Kansas State University, Manhattan, KS 66506, USA
Academic Editor: Moktar Hamdi

Abstract:

Enzymatic reactions play a pivotal role in the biotransformation of agricultural products, significantly influencing experimental design and final product outcomes. A robust platform for predicting these reactions would benefit food science and the broader research community. For instance, the "beany" flavor of vegetable proteins may deter consumers; such a platform would enable researchers to identify novel enzymes and their associated microorganisms to eliminate these off-flavors effectively.

However, the scarcity of training data on enzymatic reactions has hindered the development of prediction models. In addition, integrating enzymes (from biology) with small molecules (from chemistry) in a single model is challenging. Pretrained protein and chemistry large language models are trained on millions of protein sequences and molecules, respectively, so they hold extensive prior knowledge of their fields and can generate informative numerical representations of protein sequences and molecules. These language models encode general knowledge of biochemistry and can be used to improve model performance on smaller datasets.

This study aims to develop a generative model that serves as an expert assistant for predicting enzymatic reactions in bioconversion. We fused two cutting-edge pretrained language models, ESM and MolFormer, for protein and chemical substrate representation, respectively. The fused model predicted the small-molecule substrates of an enzyme with a state-of-the-art accuracy of 96.9%.
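The abstract does not state how the two representations are combined, so the following is only a minimal sketch of one common option, late fusion by concatenation followed by a logistic scoring layer. The function names (`fuse`, `pair_probability`), the tiny vectors, and the weights are all hypothetical placeholders; real ESM and MolFormer embeddings would be much longer vectors produced by the pretrained models, and the weights would be learned from labeled enzyme-substrate pairs.

```python
import math


def fuse(protein_emb, molecule_emb):
    """Late fusion (an assumption): concatenate the two embedding vectors."""
    return list(protein_emb) + list(molecule_emb)


def pair_probability(protein_emb, molecule_emb, weights, bias=0.0):
    """Score an enzyme-substrate pair with a logistic layer on the fused vector.

    `weights` and `bias` stand in for parameters that would be trained;
    here they are supplied by the caller purely for illustration.
    """
    fused = fuse(protein_emb, molecule_emb)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused embedding length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Placeholder vectors, not real model outputs.
enzyme_vec = [0.2, -0.1, 0.4]
substrate_vec = [0.3, 0.5]
toy_weights = [0.0, 0.0, 0.0, 0.0, 0.0]

probability = pair_probability(enzyme_vec, substrate_vec, toy_weights)
```

With all-zero weights the logit is zero, so the sketch returns a neutral probability; a trained layer would move this score toward 0 or 1 for unlikely or likely pairs.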

Furthermore, EnzyRxn-Transformer was developed for enzymatic reaction prediction by integrating the ESM model for enzyme embedding. Given an enzyme and its reactants as inputs, the model predicts the products with a top-1 accuracy of 39.58%. With enzyme-substrate prediction models serving as gatekeepers in practical applications, EnzyRxn-Transformer can be used for bioconversion planning (such as off-flavor elimination) with higher confidence, saving resources otherwise spent on trial-and-error experiments.
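For readers unfamiliar with the metric, top-1 accuracy is the fraction of test reactions for which the model's highest-ranked prediction matches the reference product. The sketch below computes the general top-k form. The function name `top_k_accuracy` and the example SMILES strings are illustrative assumptions, not data from this study, and the sketch assumes products are compared as already-canonicalized strings.

```python
def top_k_accuracy(ranked_predictions, references, k=1):
    """Fraction of cases whose reference appears in the first k predictions.

    ranked_predictions: one list per test case, best candidate first.
    references: the expected product for each test case.
    Assumes strings are already canonicalized so equality is meaningful.
    """
    if len(ranked_predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not references:
        raise ValueError("at least one test case is required")
    hits = sum(
        1
        for candidates, reference in zip(ranked_predictions, references)
        if reference in candidates[:k]
    )
    return hits / len(references)


# Illustrative toy data only (hypothetical predictions).
toy_predictions = [
    ["CCO", "CC=O"],      # best guess is correct
    ["CC(=O)O", "CCO"],   # correct answer ranked second
    ["C", "CC"],          # correct answer absent
]
toy_references = ["CCO", "CCO", "O"]

top1 = top_k_accuracy(toy_predictions, toy_references, k=1)
top2 = top_k_accuracy(toy_predictions, toy_references, k=2)
```

On this toy data one of three cases is correct at rank 1 and two of three within the top 2, which shows why top-1 figures are stricter than top-k figures for larger k.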

Keywords: deep learning; large language model; substrate-enzyme pair prediction; multimodality; bioconversion

 
 