Background: Wheat is the most essential food grain produced in the US. The primary goals of US wheat breeding programs include developing cultivars with high end-use quality, such as milling and baking efficiency, along with desirable agronomic traits and resilience to environmental stresses. Cultivar selection is guided by recommended target values for several key parameters, including protein content, mixing properties, and baking performance. Bread volume is among the most important of these targets.
Objective: This study compares machine learning models (Random Forest, XGBoost, and Support Vector Machine) trained on combinations of the physicochemical (PC) properties and rheological parameters (Farinograph and Alveograph) of 359 Hard Red Winter (HRW) wheat cultivars. Our goal is to classify HRW wheat into three bread loaf volume categories: low (≤942 cubic centimetres), moderate (943-1080 cubic centimetres), and high (≥1081 cubic centimetres). Such a classifier would allow rapid assessment of the bread loaf volume of new HRW cultivars, potentially bypassing the need for baking tests and rheological evaluations and thereby conserving resources. We used HRW data from Wheat Quality Council (WQC) reports from 2010 to 2023.
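As an illustration of the three-class target described above, the following minimal sketch maps a measured loaf volume to its category. The function name `loaf_volume_class` is hypothetical, and the treatment of non-integer volumes between the stated cut-offs (for example 942.5) is an assumption, since the thresholds are given only as whole cubic centimetres.

```python
def loaf_volume_class(volume_cc: float) -> str:
    """Map a bread loaf volume (cubic centimetres) to low / moderate / high.

    Thresholds follow the stated categories: low (<=942), moderate (943-1080),
    high (>=1081). Assumption: fractional values above 942 and up to 1080 are
    treated as moderate, and anything above 1080 as high.
    """
    if volume_cc <= 942:
        return "low"
    if volume_cc <= 1080:
        return "moderate"
    return "high"


# Example volumes chosen at the category boundaries (illustrative values only).
examples = [900, 942, 943, 1080, 1081]
labels = [loaf_volume_class(v) for v in examples]
print(labels)  # ['low', 'low', 'moderate', 'moderate', 'high']
```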
Method: We optimized model prediction accuracy by employing a feature selection process and fine-tuning model hyperparameters.
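A workflow of this kind might be sketched with scikit-learn as shown below. This is not the study's actual pipeline: the synthetic data, the univariate `SelectKBest` selector, the parameter grid, the 80/20 split, and the 5-fold cross-validation are all assumptions chosen for illustration, and only the sample count (359) and feature count (23) echo figures from this abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for a 359-sample, 23-feature, 3-class data set.
# These are NOT the WQC measurements.
X, y = make_classification(
    n_samples=359, n_features=23, n_informative=8,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Feature selection and the classifier share one pipeline, so selection is
# refit inside every cross-validation fold and cannot see held-out data.
pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("model", RandomForestClassifier(random_state=0)),
])

# Hypothetical search space: number of retained features plus two
# Random Forest hyperparameters.
param_grid = {
    "select__k": [8, 14, 23],
    "model__n_estimators": [50, 100],
    "model__max_depth": [None, 5],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

cv_accuracy = search.best_score_               # mean CV accuracy of the best setting
test_accuracy = search.score(X_test, y_test)   # accuracy on the held-out test split
print(search.best_params_, round(cv_accuracy, 2), round(test_accuracy, 2))
```

Reporting both the cross-validated accuracy and the held-out test accuracy mirrors the two figures quoted for each model in the results.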
Result: Model performance varied with different combinations of PC and rheological features. With the optimal combination of PC, Farinograph, and Alveograph parameters (23 features), the Random Forest model performed best, achieving a test accuracy of 81% and a cross-validation (CV) accuracy of 76%. However, using only the 14 PC features, the Support Vector Machine model achieved a slightly higher test accuracy of 83%, with a CV accuracy of 74%. Across all feature combinations, sedimentation volume consistently ranked as the most important feature for high bread volumes. Other significant features included dough extensibility, elasticity index, swelling index, breakdown time, and wheat protein.