An intelligent bearing fault diagnosis system: A review

Rolling element bearings (REBs) are well-known components that are used extensively in industry. They operate in extreme conditions (high temperature, dirty environments), which may lead to unexpected failure after a certain period of operation. Bearing faults cause severe equipment damage and financial loss, and can threaten people's lives. A proper REB fault diagnosis system can prevent unexpected failures and keep the machine working in a healthy state. Over the past few decades, machine learning has been introduced to provide consistent fault diagnosis results. Hence, this paper reviews the development of bearing diagnosis methods based on machine learning models.


Introduction
Machine learning is a sub-category of artificial intelligence. It has the ability to learn automatically from data and to generalise. The Google Trends data for machine learning models in science and technology applications are shown in Figure 1. Over the past 8 years, the artificial neural network (ANN) has been used quite extensively compared with other machine learning models. However, in 2016, interest in deep learning models started to increase drastically and surpassed the ANN model. This study therefore aims to summarise previous uses of machine learning models in rolling element bearing fault diagnosis.
Rolling element bearings (REBs) are broadly used in domestic and industrial applications. The main purpose of a bearing is to enable linear and rotational movement while reducing friction and handling load. The working condition of these appliances depends on the smooth and quiet running of the bearings. Moreover, bearings are extensively utilised in practical equipment such as induction machines [1], wind turbines [2], helicopters [3] and trains [4]. They are considered a critical component in the majority of machines, where they are continuously exposed to extreme conditions (dirty conditions, high temperature, overstress). A dirty environment contaminates the bearing's lubrication, which leads to the formation of defects. Lubrication plays an important role in ensuring that a bearing operates properly. In industry, lubrication problems contribute to 80 percent of bearing failures [5]. Further, bearing faults are a common cause of rotating machinery failure [6]; for example, bearing problems are the main contributor to wind turbine gearbox failure, as illustrated in Figure 2. Statistics show that 90 percent of rotating machinery failures are due to bearing faults and that 30 percent of bearing failures occur on the inner and outer races [7].
According to ISO 15243:2004(E), there are six common failure modes of rolling element bearings: fatigue, corrosion, wear, plastic deformation, fracture and electrical erosion. The failure of a bearing in industry may lead to long-term breakdown of the machine and high maintenance costs. Over the last two decades, machine learning has been developed quite extensively for machinery diagnosis. It has been introduced in this field to help humans reach robust and consistent diagnostic decisions.

Bearing fault diagnosis based on machine learning models
Over the last few decades, machine learning has been developed extensively for bearing fault diagnosis. Figure 3 shows the machine learning models employed in bearing fault diagnosis applications.

K-Nearest Neighbour
K-nearest neighbour (KNN) is applicable when nothing is known about the data distribution. The advantages of this method are that no assumptions are needed about the characteristics of the concepts, and that complex concepts can be learned by local approximation using simple procedures. The optimal value of K can be determined by resampling methods such as bootstrap or cross-validation. Another important factor that affects the accuracy of the algorithm is the distance metric, the choice of which depends on the data set. The most widely used distance metrics are Euclidean, Chebyshev, Manhattan, Minkowski, city block and Hamming [17], [18]. Of these, the Euclidean distance between a test sample and the training samples is the one most commonly used in KNN [19]. However, KNN is unsuitable for large datasets because it is computationally expensive and its performance depends on the number of dimensions. Yigit proposed using the Artificial Bee Colony algorithm to find the optimal parameters of KNN [20]. KNN has been utilised in bearing applications. For example, Pandya et al. diagnosed rolling element bearings from the Hilbert-Huang transform of acoustic emission data using APF-KNN [21]. Safizadeh and Latifi fused data from multiple sensors (an accelerometer and a load cell) for vibration-based fault diagnosis of bearings and classified the data using KNN [22]. Baraldi et al. utilised K-Nearest Neighbour for bearing diagnostics under varying conditions by integrating KNN with a binary differential evolution (BDE) algorithm [23]. The proposed scheme showed satisfactory performance in diagnosing bearings under variable conditions.
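
The voting procedure behind KNN, and the effect of swapping the distance metric, can be sketched in a few lines of Python. This is only a minimal illustration and not the implementation used in any of the cited studies: the function names, the two-dimensional feature vectors (imagined here as RMS and kurtosis values) and the class labels are all assumptions made up for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences (city block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Return the majority label among the k training samples nearest to query."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: (RMS, kurtosis) pairs with invented values.
train_X = [(0.10, 3.0), (0.12, 3.1), (0.11, 2.9),
           (0.45, 6.8), (0.50, 7.2), (0.48, 7.0)]
train_y = ["healthy"] * 3 + ["outer_race_fault"] * 3

print(knn_predict(train_X, train_y, (0.47, 6.9), k=3))
print(knn_predict(train_X, train_y, (0.10, 3.0), k=3, distance=manhattan))
```

Because every query is compared with every training sample, the cost of a prediction grows with both the number of samples and the number of feature dimensions, which is the limitation noted above for large datasets.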

Artificial Neural Network
Over the past few decades, ANN has shown its capability in classifying the health condition of rotating machinery. By mimicking the principle of the human brain, an ANN forecasts the target value from abundant data by learning the pattern of historical data. It comprises an input layer, a hidden layer and an output layer. Information moves through the layers from the input to the output, and the neurons of each layer are connected to the neurons of the next layer. When a particular input is fed into the network, a unique output is produced [24]. The estimated output is compared with the desired output using a loss function, and the network is tuned so that the estimated output matches the desired output. The network is trained with the back-propagation (BP) algorithm: the training data are presented to the network and its output is compared with the desired output until the mean squared error (MSE) reaches a minimum, which serves as the stopping criterion. Several training algorithms can be used with back-propagation during the ANN training phase, such as Levenberg-Marquardt, gradient descent, gradient descent with adaptive learning rate, scaled conjugate gradient and BFGS quasi-Newton. The training process contains two stages: feed-forward and back-propagation. In the feed-forward stage, the network weights are initially assigned randomly, which results in an error value [25]. This error value is then used to adjust the weights in the back-propagation stage. The process is repeated until the error is reduced.
Owing to the capability it has shown over the past few decades, ANN has been used in bearing fault diagnosis. For example, Yu et al. diagnosed bearing conditions (healthy, outer race fault and inner race fault) using energy features based on the EMD method as input to an ANN [26]. The results produced by the ANN were good enough to classify the type of bearing failure. Further, Ben Ali et al. utilised the ANN method to diagnose bearings using the energy entropy of the IMFs of the signal, for which two additional parameters were proposed [27]. Compared with previous research, the authors were able to examine bearing degradation without any human intervention. In the same year, the same authors proposed an automated, nearly online bearing fault diagnosis method that classifies the parameters of vibration signals using a combination of PNN and the Simplified Fuzzy Adaptive Resonance Theory Map (SFAM). The method was able to recognise different types of bearing defect [28]. Rao et al. utilised a four-layer feed-forward ANN to analyse bearing defect size, using load, revolutions per minute (RPM) and RMS velocity as inputs to the ANN [29]. The authors reported that the ANN gave good defect size prediction, with a mean percentage error of 3.75 percent. Some authors question the successful implementation of ANN, since the system depends heavily on proper selection of the network structure and needs a large amount of training data. In practice, abundant real-case data are not always available. ANN is therefore less suitable for real applications than for experimental studies.
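
The two training stages described above (feed-forward, then back-propagation of the error) can be sketched for a network with one hidden layer. This is a minimal illustration under stated assumptions rather than the setup of any cited study: it uses plain per-sample gradient descent on the squared error, sigmoid activations, and a tiny invented dataset of two normalised features; the function names and hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Feed-forward stage: input -> hidden layer -> single output neuron."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, output


def mse(X, y, params):
    """Mean squared error of the network over a dataset."""
    return sum((forward(x, *params)[1] - t) ** 2 for x, t in zip(X, y)) / len(X)


def train(X, y, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(X[0])
    # Weights start from random values, so the first pass yields an error.
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    initial_loss = mse(X, y, (W1, b1, W2, b2))
    for _ in range(epochs):
        for x, t in zip(X, y):
            hidden, out = forward(x, W1, b1, W2, b2)
            # Back-propagation stage: error signals at the output and hidden
            # layers (the constant factor 2 of the squared error is folded
            # into the learning rate).
            d_out = (out - t) * out * (1 - out)
            d_hid = [d_out * W2[j] * hidden[j] * (1 - hidden[j])
                     for j in range(n_hidden)]
            for j in range(n_hidden):
                W2[j] -= lr * d_out * hidden[j]
                b1[j] -= lr * d_hid[j]
                for i in range(n_in):
                    W1[j][i] -= lr * d_hid[j] * x[i]
            b2 -= lr * d_out
    return (W1, b1, W2, b2), initial_loss


# Invented, normalised feature pairs: 0 = healthy, 1 = faulty.
X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
     (0.80, 0.90), (0.90, 0.70), (0.75, 0.85)]
y = [0, 0, 0, 1, 1, 1]

params, initial_loss = train(X, y)
print("MSE before training:", initial_loss)
print("MSE after training: ", mse(X, y, params))
```

A fixed number of epochs is used here for simplicity; a stopping criterion based on the MSE, as described above, would replace the outer loop with a check on the current error.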

Support Vector Machine
In recent years, the application of ANN to bearing diagnosis has been found to be limited, and a classifier developed by Vapnik that mitigates the limitations of ANN, the support vector machine (SVM), has been adopted [30]. Its main purpose is to separate the data of the different possible classes by the widest possible gap by determining the separators (hyperplanes) in the search space [31]; the data are first mapped into a higher-dimensional feature space through a feature mapping function φ(x). SVM reduces the risk incurred during the training phase since it follows the Structural Risk Minimization (SRM) principle, which performs well when dealing with small data sets. The kernel functions used in SVM are mainly of four types: linear, polynomial, RBF and sigmoid. RBF is by far the most commonly used kernel for classification in SVM. Jiang et al. used a combination of time-domain features and SVM to classify features from multiple sensors, where the selected features were able to identify defects in gears, bearings and rotors [32]. SVM relies on the number of votes to assign data to each class. However, if classes receive an equal number of votes, SVM is unable to make a decision and simply chooses the class that comes first in the arrangement of the classes. Hui et al. proposed an automated fault diagnosis method for bearing components that combines SVM with Dempster-Shafer (DS) evidence theory to classify healthy and faulty data more accurately than conventional SVM [33]. The authors succeeded in increasing the accuracy from 76% to 94% and demonstrated the effectiveness of DS evidence theory in eliminating conflicting decisions. Zeng et al. utilised one-class classification based on SVM for bearing problems and compared it with one-class classification based on the convex hull [34].
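
The four kernel types named above can be written out in their commonly used textbook forms. The sketch below is illustrative only: the function names and the parameter names `gamma`, `coef0` and `degree` (along with their default values) are assumptions chosen for the example, and the cited studies may parameterise their kernels differently.

```python
import math


def dot(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: decays with squared Euclidean distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)


# Two made-up feature vectors.
u, v = (1.0, 2.0), (3.0, 4.0)
for kernel in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(u, v))
```

Each kernel returns the inner product of the two samples in the feature space induced by φ(x) without computing φ(x) explicitly, which is what allows the separating hyperplane to be found in a higher-dimensional space at modest cost.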

Relevance Vector Machine
The relevance vector machine (RVM) is a supervised learning method based on a sparse kernel model. It uses Bayesian inference for probabilistic classification and regression, with a prior that results in a sparse representation [31]. The RVM has the same functional form as the support vector machine, but it provides a probabilistic interpretation of its output [35]. RVM is based on a hierarchical prior: at the first level, an independent Gaussian prior is placed on the weight parameters, while at the second level an independent Gamma hyperprior is used for the variance parameters. The RVM does not suffer from the limitation of SVM that the kernel function must satisfy Mercer's conditions [36]. In addition, the RVM model provides a probabilistic output, uses fewer relevance vectors for a given dataset and does not require a regularisation parameter (C) to be defined [37]. Maio et al. predicted the residual life of bearing components using a combination of RVM and exponential regression; the proposed scheme outperformed other model-based methods [38]. In addition, Tran et al. utilised thermal imaging techniques to monitor the bearing condition using RVM and generalised discriminant analysis [39]. Fei forecasted the kurtosis of bearing vibration signals using a combination of RVM and the artificial bee colony (ABC) algorithm, where ABC was employed to select the kernel parameter of the RVM [40].

Extreme Learning Machine
The extreme learning machine (ELM) has also been introduced in rotating machinery diagnosis. ELM has been proposed as an improvement on the SVM method that increases its adaptivity and focuses on regression applications. The ELM algorithm offers ease of implementation, fast learning speed and less human intervention, since the user does not need to select a proper kernel function: the random-kernel ELM tends to provide the smallest training error together with the smallest norm of output weights [41]. In addition, it maximises the distance of the separating margins between two different classes in the ELM feature space by minimising the norm of the output weights, while the hidden neurons (which may use almost any nonlinear activation function) can be generated randomly and independently of the training samples. The hidden layer of ELM does not require a tuning step [42]. There are several improvements of ELM as illustrated in
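
The statement that ELM attains the smallest training error together with the smallest norm of output weights can be made concrete with the standard formulation of the basic ELM. The notation below is an assumption introduced for illustration and does not appear in the source: L hidden neurons with activation g, randomly generated input weights w_j and biases b_j, hidden-layer output matrix H (one row per training sample), target matrix T and output weights β.

```latex
\begin{align}
  h_j(\mathbf{x}) &= g(\mathbf{w}_j \cdot \mathbf{x} + b_j), \qquad j = 1, \dots, L \\
  \mathbf{H}\boldsymbol{\beta} &= \mathbf{T} \\
  \hat{\boldsymbol{\beta}} &= \mathbf{H}^{\dagger}\mathbf{T}
\end{align}
```

Here H† denotes the Moore-Penrose pseudoinverse of H. The resulting β̂ is the minimum-norm least-squares solution of Hβ = T, so only the output weights are computed from the data while the hidden layer is left untuned.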

Deep learning
Maintenance decision making requires a machine learning algorithm to learn the behaviour of the machine from the features extracted during data processing. ANN and SVM are broadly used in this area because of their performance. Recently, machine learning has moved towards deep learning. The difference between a machine learning model and a deep learning model is shown in Figure 4. Deep learning is capable of reducing three processes of traditional fault diagnosis. One of the promises of deep learning is its capability to replace classical feature engineering with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction, in which it adaptively learns useful parameters from the raw data through multiple nonlinear transformations [46]. Deep learning utilises functions of higher complexity in order to deal with more complex problems [47]. Yu and Deng described two additional key properties of deep learning [48]. Firstly, the model is generative in nature, which naturally requires an extra top layer to perform the discriminative task. Secondly, deep learning is useful for mining the structures and regularities in the input features of large amounts of unlabelled training data, since it provides an unsupervised pretraining step. In general, deep learning originated from neural networks and has recently been utilised in many applications, such as image recognition and speech recognition. Several deep learning models can be used in bearing fault diagnosis, such as the restricted Boltzmann machine (RBM) [50], the convolutional neural network (CNN) [51], [52], the deep autoencoder [53] and the deep belief network (DBN) [54].
A DNN is able to adaptively analyse raw data through multiple layers of nonlinear transformations and to approximate complex nonlinear functions with small error. Autoencoders are trained in two main steps. Firstly, each autoencoder layer is pre-trained with unsupervised techniques. Then, the autoencoder structure is fine-tuned with the back-propagation (BP) algorithm to achieve high prediction accuracy [55]. A trained autoencoder produces a representation of its input by reconstructing the original data. In contrast, ANN requires input features that are extracted and selected manually by the user from the signal acquired by a sensor, which depends largely on the user's expertise and prior knowledge of signal processing techniques. Jia et al. applied an autoencoder to bearing diagnosis to reveal the fault characteristics of bearings under variable operating conditions [55].
A DBN is a generative graphical model composed of several stacked RBMs. An RBM is a bipartite undirected graphical model comprising two layers, a visible-node layer and a hidden-node layer, each of which is denoted by a random binary vector. He et al. studied bearing fault diagnosis using RBM [50]. The authors compared a Gaussian RBM classifier with other classifiers, such as the extreme learning machine, the support vector machine and the deep belief network. The results indicate that the proposed scheme is capable of diagnosing bearing conditions accurately. Li et al. combined a deep Boltzmann machine (DBM) with random forest, fusing acoustic and vibration signals to diagnose gearbox condition [57].
In addition, CNN has received considerable attention and has proven successful in many types of domains. In detail, the CNN structure contains multiple layers, including convolutional layers, subsampling or pooling layers, a local response normalisation (LRN) layer, a fully-connected layer and the output layer [60]. Haedong et al. diagnosed rotating machinery faults based on orbit plots, using the orbit plot images as input to a CNN. The proposed method is able to classify the fault mode from the orbit plot [61]. Janssens et al. applied CNN to bearing fault diagnosis [62]. The results indicate that the proposed scheme clearly outperforms the handcrafted-feature approach, which requires manual selection of features. Recently, Guo et al. treated bearing problems using a deep convolutional neural network, where the proposed scheme outperformed traditional methods such as SVM [63].
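
The roles of the convolutional and pooling layers can be illustrated on a one-dimensional signal. The sketch below is a simplification under stated assumptions: it uses a single hand-picked kernel instead of learned filters, a ReLU activation, a made-up impulse-like signal standing in for a vibration record, and hypothetical function names. The cited studies work with trained multi-layer networks, some of them on two-dimensional images.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal and take a weighted sum at each step."""
    steps = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(steps)]


def relu(values):
    """Keep positive responses and set negative ones to zero."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Subsample by keeping the largest value in each non-overlapping window."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


# Invented signal with two impulses, and a kernel that responds to spikes.
signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
kernel = [-1.0, 2.0, -1.0]

feature_map = relu(conv1d(signal, kernel))
pooled = max_pool1d(feature_map, size=2)
print(feature_map)  # [0.0, 2.0, 0.0, 0.0, 0.0, 2.0]
print(pooled)       # [2.0, 0.0, 2.0]
```

In a trained CNN the kernel values are learned from data rather than chosen by hand, which is how the network can avoid the manual feature selection mentioned above.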

Conclusion
In this paper, the machine learning models used in bearing fault diagnosis have been reviewed. Among these models, SVM has been the most widely used in bearing fault diagnosis. However, the trend suggests that deep learning will receive a lot of attention from researchers in the future owing to its capability for automated feature extraction and feature selection.