Harnessing the Potential of Different Machine Learning Algorithms for Linking Structural and Functional Properties of Plant Proteins
* 1 , 2
1  Agricultural and Food Engineering Department, Indian Institute of Technology Kharagpur
2  Department of Food Science and Nutrition, University of Minnesota
Academic Editor: Yonghui Li

Abstract:

The food industry is seeing a surge in demand for specialized protein-based functional ingredients for gelling, thickening, and emulsifying applications. The relationship between a protein's structural and functional properties is complex: species, cultivar, and processing all affect a protein's structural characteristics, which in turn govern its functional properties. This complex behavior can be modeled using machine learning (ML) algorithms. In this study, different ML algorithms were used to predict the protein solubility, emulsifying activity index, emulsification capacity, and gel strength of plant proteins from soy (Glycine max), pea (Pisum sativum), chickpea (Cicer arietinum), rice (Oryza sativa), hemp (Cannabis sativa), camelina (Camelina sativa), and pennycress (Thlaspi arvense), using five structural predictors: surface hydrophobicity, zeta potential, undenatured protein content, soluble protein polymer content, and β-sheet content. The plant protein structure–function dataset comprised 150 data points, split 70:30 into training and testing sets. Model performance was assessed using R², mean absolute error (MAE), and root mean squared error (RMSE), as well as the non-violation of physical constraints. Data visualization and principal component analysis were also carried out to investigate the associations between the dependent and independent variables and to identify inherent patterns and linear or non-linear relationships among them. The Gaussian-based support vector regression model accurately predicted solubility (R² = 0.8906), emulsifying activity index (R² = 0.7383), emulsification capacity (R² = 0.7978), and gel strength (R² = 0.8822). These predictions demonstrate the potential value of ML algorithms for predicting plant protein functionality from macromolecular structural characteristics with high accuracy, without the need for wet experiments and excessive protein purification.
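
The evaluation protocol described in the abstract (a 70:30 split of 150 data points, scored with R², MAE, and RMSE, plus a physical-constraint check) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation. The function names, the random seed, the toy values, and the 0 to 100 bounds (chosen as a plausible constraint for a percentage such as solubility) are all assumptions; the abstract does not state which software or which constraints were used.

```python
import math
import random


def split_indices(n, test_fraction=0.3, seed=0):
    """Shuffle indices 0..n-1 and return (train, test) index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def out_of_bounds(y_pred, lower=0.0, upper=100.0):
    """Return predictions outside [lower, upper] (assumed physical range)."""
    return [p for p in y_pred if p < lower or p > upper]


# 70:30 split of a 150-point dataset, as stated in the abstract.
train_idx, test_idx = split_indices(150, test_fraction=0.3)
print(len(train_idx), len(test_idx))

# Toy measured vs. predicted values (illustrative only, not study data).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(r2_score(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred))
print(out_of_bounds([12.0, 101.3, -0.4]))
```

In practice the predictions would come from a fitted regressor, such as a support vector regression model with a Gaussian (RBF) kernel, evaluated on the held-out 30% of the data.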

Keywords: Plant proteins; machine learning; regression; functional properties
