Machine Learning Models Applied to Predictive Maintenance in Automotive Engine Components

Fault detection in automotive engine components is an important feature that motivates research in different engineering areas, due to the interest of automakers in its potential to increase safety, reliability, and lifespan and to reduce pollutant emissions, fuel consumption, and maintenance costs. Fault detection can be applied to several types of maintenance strategies, ranging from finding the faults that generated a component failure to finding them before the failure occurs. This work focuses on predictive maintenance, which aims to constantly monitor the target component to detect a fault at its onset, thus facilitating the prevention of component failures. It presents the results of different machine learning methods implemented as classification predictors for fault detection tasks, including Random Forest (RF), Support Vector Machines (SVM), Artificial Neural Network (ANN) variants, and Gaussian Processes (GP). The data used for training were generated by a simulation testbed for fault diagnosis in turbocharged petrol engine systems, whose operation was modeled using industry-standard driving cycles: the Worldwide Harmonized Light Vehicle Test Procedure (WLTP), the New European Driving Cycle (NEDC), the Extra-Urban Driving Cycle (EUDC), and the United States Environmental Protection Agency Federal Test Procedure (FTP-75).


Introduction
Due to production costs, it is not possible to have sensors installed in all engine components, which makes it difficult to apply predictive maintenance for all of them [1]. One way to work around that problem is to use predictors based on machine learning paradigms that can use signals from indirect sensors and predict a fault [2]. To accomplish that, it is necessary to acquire data that contain both normal and faulty behaviors from the target component to train a machine learning method to recognize the defective behavior before embedding it into the software of an engine electronic control unit [3].
Data acquisition can be an expensive process as well, as it may require several rounds of destructive testing for different driving cycles, which must be performed in real time on instrumented vehicles in a dynamometer [4]. Since machine learning methods are capable of handling a certain amount of noise, the data to train them can be generated by simulating the model of the respective engine in which the target component is installed. That process does not require real-time executions and vehicle instrumentation in a dynamometer lab, which decreases the cost of the data acquisition as a whole [5]. Simulated data-driven training of machine learning methods has been used in different applications [6][7][8][9], showing that it is a valid approach to problems where it is possible to entirely or partially model the system where it is going to be applied.
The purpose of this work is to show the feasibility of using different machine learning approaches as predictors of fault diagnosis, for predictive maintenance purposes, by using training and testing datasets of different standardized driving cycles, generated by a simulation testbed for fault diagnosis in turbocharged petrol engine systems.
The remainder of this article is organized as follows. The experiments are detailed in Section 2. Section 3 presents the fundamentals of the machine learning approaches. The results are analyzed in Section 4 and discussed in Section 5. Section 6 concludes the article.

Experiments
A platform for the evaluation of fault diagnosis algorithms and strategies [10], implemented in the Matlab/Simulink computational environment, was used to simulate and build all necessary data that were used to train the machine learning methods. That platform is a simulation testbed for fault diagnosis in turbocharged petrol engine systems, which allows for the selection of fault modes for different components and driving cycles. For this work, the fault mode applied was a leakage in the compressor system for all four driving cycles available, i.e., Worldwide Harmonized Light Vehicle Test Procedure (WLTP), New European Driving Cycle (NEDC), Extra-Urban Driving Cycle (EUDC), and the United States Environmental Protection Agency Federal Test Procedure (FTP-75), all with a sampling time equal to 35 milliseconds.

Dataset
The dataset was divided into two subsets: training and testing sets. The training set comprised three of the four driving cycles, i.e., NEDC, EUDC, and FTP-75, whereas the WLTP cycle composed the testing set. The target used by the machine learning methods was a binary error flag, obtained by normalizing the residual value provided by the simulation. Fourteen signals were available in the simulator as candidate inputs for the machine learning methods. Five of them were selected by a brute-force search, which compared the accuracy obtained with each combination of signals against the accuracy obtained when all of them were used. Inputs and targets were normalized between 0 and 1, and the training dataset was shuffled afterward. The testing dataset was kept unshuffled to preserve the time-series signal for plotting purposes.
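The normalization and shuffling steps above can be sketched as follows. This is a minimal NumPy sketch, not the paper's Matlab implementation; the function name and the use of training-set ranges for scaling are illustrative assumptions.

```python
import numpy as np

def prepare_datasets(X_train, y_train, X_test, y_test, seed=0):
    """Min-max normalize the inputs to [0, 1] using the training-set
    ranges, then shuffle only the training pairs; the test set keeps
    its time order so the time-series signal can be plotted."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)  # guard against constant signals
    X_train_n = (X_train - lo) / scale
    X_test_n = np.clip((X_test - lo) / scale, 0.0, 1.0)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X_train_n))
    return X_train_n[idx], y_train[idx], X_test_n, y_test
```

The binary error-flag targets are already in {0, 1}, so only the input signals need rescaling here.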

Machine Learning Methods
Five machine learning algorithms were chosen and implemented by using Matlab's built-in functions for this work.

Single-Layer Feed-Forward Neural Network
An artificial neural network is composed of the connection of two or more mathematical elements called artificial neurons. These neurons act as functions that receive multiple inputs and produce a single output. A weight is assigned for each input of the artificial neuron as well as a bias for each neuron itself. The weighted inputs and the bias are added together, resulting in a linear output that is fed into a non-linear function, i.e., activation function, which is common to all neurons within the same layer [11]. A single-layer feed-forward neural network (SLFN) is an artificial neural network with only one hidden layer, formed by parallel artificial neurons, which connect the input neurons to the output neurons [12]. The configuration adopted in this work used 100 neurons in the hidden layer with a hyperbolic tangent activation function, along with a linear neuron in the output.
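The computation described above, i.e., weighted inputs plus a bias fed through a tanh activation and then a linear output neuron, can be sketched as a single forward pass. This is an illustrative NumPy sketch under assumed weight shapes, not the Matlab built-in used in the paper.

```python
import numpy as np

def slfn_forward(X, W_h, b_h, w_o, b_o):
    """Forward pass of a single-hidden-layer feed-forward network:
    X    : (n_samples, n_inputs) input matrix
    W_h  : (n_inputs, n_hidden)  hidden-layer weights (e.g. 100 units)
    b_h  : (n_hidden,)           hidden-layer biases
    w_o  : (n_hidden,)           output weights of the linear neuron
    b_o  : scalar                output bias."""
    H = np.tanh(X @ W_h + b_h)   # weighted sum + bias, tanh activation
    return H @ w_o + b_o         # single linear output neuron
```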

Random Vector Functional Link Networks
A random vector functional link network (RVFL) is an SLFN in which the weights and biases of the hidden neurons are randomly generated within a suitable range and kept fixed, while the output weights are computed via a simple closed-form solution [13,14]. Randomization-based neural networks benefit from the presence of direct links from the input layer to the output layer, as in the RVFL [15]. The original features are reused, or propagated, to the output layer via these direct links, which act as a regularization for the randomization [16]. The direct links also help to keep the model complexity low, making the RVFL smaller and simpler than similar randomized neural networks and therefore attractive to use [17]. The setup adopted in this work had 95 neurons in the hidden layer with a hyperbolic tangent activation function, 5 enhancement neurons, and a linear neuron in the output. The total number of neurons was kept the same as in the SLFN structure (i.e., 100 neurons).
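The two defining ingredients, fixed random hidden weights and a closed-form least-squares solution for an output layer that also sees the raw inputs through direct links, can be sketched as below. This is an illustrative NumPy sketch, not the implementation used in the paper; the uniform weight range and the single hidden block are assumptions.

```python
import numpy as np

def rvfl_fit_predict(X_train, y_train, X_test, n_hidden=95, seed=0):
    """RVFL sketch: hidden weights are drawn once and kept fixed; the
    output layer concatenates the tanh hidden features, the raw inputs
    (direct links), and a bias column, and its weights come from a
    closed-form least-squares solution."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, (X_train.shape[1], n_hidden))
    b = rng.uniform(-1, 1, n_hidden)

    def features(X):
        return np.hstack([np.tanh(X @ W + b),        # random hidden layer
                          X,                          # direct links
                          np.ones((len(X), 1))])      # output bias
    beta, *_ = np.linalg.lstsq(features(X_train), y_train, rcond=None)
    return features(X_test) @ beta
```

Because the raw inputs reach the output layer directly, any linear relation in the data is representable exactly, which is one way the direct links keep the model simple.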

Support Vector Machines
Support vector machine (SVM) is a supervised learning algorithm that follows the principle of structural risk minimization, based on the Vapnik-Chervonenkis theory [18]. Its goal is to classify a given set of data points, which are mapped to a multidimensional feature space using a kernel function, by representing the decision boundary as a hyperplane in that higher-dimensional feature space [19]. One of the crucial ingredients of the SVM is the so-called kernel trick, which allows for the computation of scalar products in high-dimensional feature spaces using simple functions defined on pairs of input patterns. This trick allows for the formulation of non-linear variants of any algorithm that can be expressed in terms of scalar products, the most prominent of which is the SVM [20]. The kernel function used in this work was the Gaussian kernel function [21].
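A Gaussian-kernel SVM classifier equivalent to the one described can be set up in a few lines; the sketch below uses scikit-learn as a stand-in for the Matlab built-in used in the paper, and the helper name and default hyperparameters are assumptions.

```python
from sklearn.svm import SVC

def train_svm(X_train, y_train):
    """Fit an SVM with a Gaussian (RBF) kernel, the kernel function
    adopted in this work; other hyperparameters are library defaults."""
    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(X_train, y_train)
    return clf
```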

Random Forest
Random forest (RF) is an algorithm from the family of ensemble methods, which combine different models to obtain a single result. This feature makes these algorithms more robust and complex, leading to a higher computational cost that is usually accompanied by better results [22]. When a single model is created, different configurations of an algorithm can be tested, generating different candidate models, but only the best one is kept at the end of the learning process. In an ensemble method, different models are also created from an algorithm, but all of their results are used instead: a result is obtained from each model and combined into a single output; in classification problems, for instance, the most frequent result is the chosen one [23]. An RF is made up of ensembled decision trees, which establish the rules for decision making [24]. The algorithm creates a graph structure, similar to a flowchart, with nodes where a condition is verified. Depending on the decision condition attached to each node, the flow follows one branch or the other, always leading to the next node, until the tree ends. With the training data, the algorithm searches for the configuration and node connections that minimize the error [25]. The number of ensembled classification trees adopted in this work was 100.
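An RF with 100 ensembled classification trees, matching the count used in this work, can be sketched with scikit-learn as a stand-in for the Matlab built-in; the function name and the fixed random seed are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier

def train_rf(X_train, y_train, seed=0):
    """Fit a random forest of 100 classification trees; the predicted
    class is the majority vote over the ensembled trees."""
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    rf.fit(X_train, y_train)
    return rf
```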

Gaussian Processes
A Gaussian process is a collection of random variables, indexed by time or space, fully specified by its mean and covariance functions, such that every finite collection of those random variables has a multivariate normal distribution [26]. Gaussian processes use lazy learning and a measure of the similarity between points (i.e., the kernel function) to predict the value for an unseen point from training data. The prediction is not just an estimate for that point but also has uncertainty information. For multi-output predictions, multivariate Gaussian processes are used, for which the multivariate Gaussian distribution is the marginal distribution at each point [27]. The kernel (i.e., covariance) function adopted in this work was the exponential kernel function [28].
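A Gaussian-process classifier with an exponential covariance function can be sketched with scikit-learn as a stand-in for the Matlab built-in; there the exponential kernel is available as the Matérn kernel with nu=0.5, which is mathematically equivalent. The function name is an illustrative assumption.

```python
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import Matern

def train_gp(X_train, y_train):
    """Fit a Gaussian-process classifier; Matern(nu=0.5) is the
    exponential covariance function adopted in this work. Predictions
    carry uncertainty via predict_proba."""
    gp = GaussianProcessClassifier(kernel=Matern(nu=0.5))
    gp.fit(X_train, y_train)
    return gp
```

Beyond the class label, `gp.predict_proba` returns the class probabilities, reflecting the uncertainty information mentioned above.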

Metrics and Statistics
To compare the performance of the five selected machine learning methods, the same training and testing datasets were used to feed all of them. The outputs were classified as true positives (TPs), false positives (FPs), true negatives (TNs), and false negatives (FNs); thus, the metric used to compare the performance was the binary accuracy, Accuracy = (TP + TN) / (TP + TN + FP + FN), which represents the rate of success of each assessed predictor [29]. All selected machine learning methods were trained and validated 90 times to generate enough data for statistical comparison [30], which was accomplished by evaluating five statistical values: the minimum, median, mean, maximum, and standard deviation of the accuracy. A second test was the application of a low-pass filter, in the form of a moving-average filter [31], which ensured the maximum possible accuracy while allowing some time delay before a failure is indicated. The same metric and statistics from the first test (i.e., no filtering) were applied to this second part, along with the evaluation of the time delay associated with the failure detection.
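The accuracy metric and the per-run statistics can be sketched as follows; this is a minimal NumPy sketch with illustrative function names, not the paper's Matlab code.

```python
import numpy as np

def binary_accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return (tp + tn) / (tp + tn + fp + fn)

def run_statistics(accuracies):
    """Minimum, median, mean, maximum, and standard deviation of the
    accuracy over the 90 training/validation runs."""
    a = np.asarray(accuracies)
    return a.min(), np.median(a), a.mean(), a.max(), a.std()
```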

Results
After 90 runs, the statistics for the accuracy of all five machine learning methods were evaluated, as shown by the results in Table 1. The statistics showed that the best results were achieved by the random forest method, since its minimum accuracy, i.e., 0.88539, was greater than the maximum accuracy of the second-best method, i.e., 0.806120, achieved by the support vector machines. Nevertheless, it is possible to increase the accuracy of all methods by low-pass filtering the outputs. A brute-force algorithm was used to sweep different moving-average window sizes, starting with a size of one sample and incrementing it by unit steps until a maximum mean accuracy was reached. Each moving-average window size has an associated delay of one sampling time (i.e., 35 milliseconds) per window size unit. The results for each method, the moving-average window sizes, and their associated delays are presented in Table 2, together with the updated statistical values. The low-pass-filtered outputs differed across the machine learning methods. One important thing to notice is that all filtered methods reached the maximum accuracy during the test phase at least once, if the delay caused by the moving-average window is considered. The residual error is due to the samples within the interval between the applied failure and its recognition, which depends directly on the time delay. Figure 1 shows the filtered output of the best run (i.e., highest maximum accuracy, equal to 0.99262) for the Gaussian processes method; the waveform was the same for the outputs of all methods.
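The filtering and the brute-force window sweep described above can be sketched as follows; this is an illustrative NumPy sketch in which the causal zero-padding, the 0.5 re-binarization threshold, and the function names are assumptions not stated in the paper.

```python
import numpy as np

SAMPLING_TIME = 0.035  # 35 ms, as in the simulation

def moving_average_filter(y_raw, window):
    """Causal moving-average (low-pass) filter of a binary predictor
    output, thresholded back to a binary fault flag."""
    kernel = np.ones(window) / window
    padded = np.concatenate([np.zeros(window - 1), y_raw])
    smooth = np.convolve(padded, kernel, mode="valid")  # length preserved
    return (smooth >= 0.5).astype(int)

def sweep_window(y_true, y_raw, max_window=200):
    """Brute-force sweep of window sizes in unit steps, keeping the
    first size that reaches the best accuracy; each window unit adds
    one sampling time of detection delay."""
    best_w, best_acc = 1, 0.0
    for w in range(1, max_window + 1):
        acc = np.mean(moving_average_filter(y_raw, w) == y_true)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc, best_w * SAMPLING_TIME
```

On a noisy step-like fault flag, widening the window removes isolated misclassifications at the cost of delaying the detected fault edge, which is exactly the accuracy-versus-delay trade-off reported in Table 2.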

Discussion
The results suggest that machine learning methods, trained with simulation data, can be used in predictive maintenance to recognize failures in automotive engine components. To increase precision, low-pass filtering is necessary, leading to delays in fault detection which must be weighed against each component, application, and design requirement. The computational cost is also a limiting factor for real-life applications, which may make them unfeasible depending on the embedded technology used. In that sense, further onboard tests in real vehicles are necessary to validate the accuracy and computational feasibility of all the methods used in this work. Once validated, the methods can be applied to different components, not limited to engine components but extending to all vehicle components that can benefit from predictive maintenance in the form of fault diagnosis. Furthermore, the application of different approaches to fault recognition is considered for further research, such as multi-step forecasting based on mode decomposition [32][33][34], along with artificial wavelet neural networks based on swarm intelligence paradigms [35].

Conclusions
Machine learning methods, such as random forest, support vector machines, single-layer feed-forward neural networks, random vector functional link networks, and Gaussian processes, can be applied as fault predictors for predictive maintenance in automotive engine components by using data generated by simulation testbeds for fault diagnosis, in which fault behaviors can be simulated and compared across distinct driving cycles. Maximum accuracy is reachable when a moving-average (i.e., low-pass) filter is applied, but a response delay must be accepted before fault recognition.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: