Data analysis of protein–flavor interactions using classification and deep-learning techniques
* 1, 2 , 3 , 4 , 3 , 5
1  Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Matemática. Buenos Aires, Argentina.
2  Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Química Inorgánica, Analítica y Química Física. Buenos Aires, Argentina.
3  CONICET - Universidad de Buenos Aires. Instituto de Tecnología de Alimentos y Procesos Químicos. Buenos Aires, Argentina.
4  CONICET - Universidad de Buenos Aires. Instituto de Cálculo. Buenos Aires, Argentina.
5  Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Química Orgánica. Buenos Aires, Argentina.
Academic Editor: Yonghui Li

Abstract:

Analyzing molecular interactions among food components is of key interest for developing novel food formulations and optimizing their shelf life, but the complexity of food matrices often makes such analysis difficult. In this work, we formulated model systems in which gelatin was placed in indirect contact with different flavors (citral, cinnamaldehyde, and vanillin) and stored at room temperature for 60, 120, and 150 days. We analyzed how protein characteristics changed over time using Fourier transform infrared spectroscopy via attenuated total reflectance (FTIR-ATR). The spectra included the amide I, II, and III regions and the region associated with glycation products (altogether around 1850–1100 cm⁻¹). The data (N = 37, 750 features) were then analyzed using Python. Principal component analysis (PCA) separated the samples into classes depending on flavor and storage time. Features associated with the first principal component were correlated with protein–flavor interactions (wavenumbers in the amide III, glycation product, and amide I regions), while those associated with the second were linked to changes undergone during storage (wavenumbers in the amide I region, including C=O stretching). Afterwards, the data were sorted into classes according to flavor or storage time using two models with a 70–30% train–test split: random forest classification (RFC, with leave-one-out cross-validation) and a neural network consisting of a multi-class perceptron (MCP, with 1024 entry nodes, 4 or 3 output nodes, and cross-entropy loss). Both models classified the data (accuracy for flavor classification: 81% for RFC and 83% for MCP; accuracy for storage time: 88% for RFC and 92% for MCP). The most relevant features selected by the RFC model corresponded to the key features previously obtained by PCA, while the MCP achieved higher classification accuracy.
This work showcases a novel application of data analysis techniques to reduce the complexity of protein–flavor interactions and analyze their key features, which could be useful for food formulation development.
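
The PCA and random forest steps described in the abstract could be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' actual code: the spectra are replaced by synthetic random data with the same shape (37 samples, 750 features), and the function names, the class-dependent band, the number of trees, and the random seeds are all hypothetical choices made for the example. The neural network step is not covered here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler


def make_synthetic_spectra(n_samples=37, n_features=750, n_classes=4, seed=0):
    """Return placeholder 'spectra' and class labels (NOT real FTIR data)."""
    rng = np.random.default_rng(seed)
    labels = np.arange(n_samples) % n_classes  # roughly balanced classes
    spectra = rng.normal(size=(n_samples, n_features))
    # Hypothetical band whose intensity depends on the class label.
    spectra[:, 100:150] += 2.0 * labels[:, None]
    return spectra, labels


def pca_scores(spectra, n_top=10):
    """Project standardized spectra onto two PCs; list top PC1 features."""
    scaled = StandardScaler().fit_transform(spectra)
    pca = PCA(n_components=2)
    scores = pca.fit_transform(scaled)
    # Features with the largest absolute loading on the first component.
    top_pc1 = np.argsort(np.abs(pca.components_[0]))[::-1][:n_top]
    return scores, top_pc1


def rfc_evaluate(spectra, labels, n_top=10, seed=0):
    """70-30 split, leave-one-out CV on the training part, then test score."""
    X_train, X_test, y_train, y_test = train_test_split(
        spectra, labels, test_size=0.3, stratify=labels, random_state=seed
    )
    rfc = RandomForestClassifier(n_estimators=100, random_state=seed)
    # Leave-one-out: each training sample is held out once.
    loo_accuracy = cross_val_score(rfc, X_train, y_train, cv=LeaveOneOut()).mean()
    rfc.fit(X_train, y_train)
    top_features = np.argsort(rfc.feature_importances_)[::-1][:n_top]
    return {
        "loo_accuracy": loo_accuracy,
        "test_accuracy": rfc.score(X_test, y_test),
        "top_features": top_features,
        "n_train": len(y_train),
        "n_test": len(y_test),
    }


if __name__ == "__main__":
    X, y = make_synthetic_spectra()
    scores, top_pc1 = pca_scores(X)
    result = rfc_evaluate(X, y)
    print("PC score matrix shape:", scores.shape)
    print("Top PC1 feature indices:", top_pc1)
    print("RFC leave-one-out accuracy:", result["loo_accuracy"])
    print("RFC test accuracy:", result["test_accuracy"])
    print("Top RFC feature indices:", result["top_features"])
```

Comparing the top PC1 feature indices with the top RFC feature indices mirrors the comparison reported above between the features highlighted by PCA and those selected by the RFC model; on real spectra the indices would map back to wavenumbers.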

Keywords: FTIR-ATR; principal component analysis; random forest classification; neural network; deep learning; protein–flavor interactions