Comparison of Machine Learning Models for Apple Purée Consistency Prediction
1  Department of Land, Environment, Agriculture and Forestry (TESAF), Università degli Studi di Padova, Viale dell'Università 16, Legnaro (PD), 35020 Italy
2  Department of Biosystems Engineering, Poznan University of Life Sciences, Wojska Polskiego 50, 60-627 Poznan, Poland
Academic Editor: Mohsen Gavahian

Abstract:

This study compares three machine learning (ML) models for predicting the consistency of apple purée, a key attribute in apple purée production: Deep Learning (DL), Distributed Random Forest (DRF), and Gradient Boosting Machines (GBMs). Consistency affects product quality and acceptability and can be used to regulate process settings in industrial production lines. The main objective was to model the Bostwick flow distance (cm/30 s), a practical measure of purée consistency, from a combination of inline process data and the physicochemical properties of the apples, and to compare the performance of the ML models. The data, collected from an industrial production line, included pressure drop, average flow velocity, inline temperature, °Brix, pH, and color parameters (L*, a*, b*, Chroma, and Hue). Preprocessing used the default settings of the H2O platform, which was chosen because it is fast and user-friendly. Models were trained on 75% of the dataset, with the remaining 25% used for validation. Model hyperparameters were also left at the platform's defaults, except for the number of trees, which was increased from 50 to 100 for both DRF and GBM. Model performance was evaluated using standard regression metrics (R², RMSE, and MAE).
GBM outperformed both DL and DRF in predictive accuracy and generalization, likely because it is less sensitive to multicollinearity and models non-linear interactions well. DRF gave acceptable results, but its performance was less stable, possibly because of its limitations with multicollinearity, which affected its validation and learning curves. DL captured complex patterns effectively but required greater computational resources. Variable importance analysis of the GBM model showed that pressure difference was the most influential feature, providing meaningful insight into consistency behavior. This study highlights the importance of combining rheological knowledge with data-driven models to enable objective and adaptive consistency monitoring in food production. It also demonstrates the potential of ML frameworks in industrial process environments.
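
For readers unfamiliar with the three evaluation metrics named above, the following is a minimal sketch of how R², RMSE, and MAE can be computed from observed and predicted Bostwick flow distances. It uses the common textbook definitions and is not taken from the study or from the H2O platform; the function name `regression_metrics` and the sample values are hypothetical and serve only as illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², RMSE, and MAE for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Residual and total sums of squares
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")

    return {
        "r2": 1.0 - ss_res / ss_tot,                # coefficient of determination
        "rmse": math.sqrt(ss_res / n),              # root mean squared error
        "mae": sum(abs(r) for r in residuals) / n,  # mean absolute error
    }


# Hypothetical flow distances in cm/30 s (illustrative values, not study data)
observed = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 5.0, 5.5, 7.0]
metrics = regression_metrics(observed, predicted)
```

RMSE and MAE are expressed in the same unit as the target (cm/30 s), while R² is dimensionless, so the three together describe both the absolute error and the share of variance explained.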

Keywords: Machine Learning, Deep Learning, Distributed Random Forest, Gradient Boosting Machines, consistency, rheology, apple purée