Multi-source fault identification based on combined deep learning

This study establishes a multi-source fault identification method based on a combined deep learning strategy to identify multi-source faults effectively in the fault diagnosis of complex industrial systems. The framework is composed of feature extraction and classifier design. In the first stage, the signal is transformed into the time-frequency domain and the time-frequency features are learned using stacked denoising autoencoders. A learning method that consists of unsupervised pre-training and supervised fine-tuning is used to train this deep model. In the second stage, an ensemble of multiple support vector machine (SVM) classifiers is created to recognize fault information. Ten types of rolling bearing signals were adopted in a simulation experiment to validate the effectiveness of the proposed framework. The results demonstrate that the joint model achieves higher recognition accuracy.


Introduction
The components of modern complex industrial systems, and their interiors, generally involve many complicated and strongly coupled relationships, numerous uncertain factors, and a large amount of uncertain information, which frequently give rise to failures characterized by randomness, subsequence, concurrency, and transmissibility [1]. It is difficult to capture the coupled relationships between constituent units using the traditional fault diagnosis methods developed for a single piece of equipment, subsystem, or sub-unit, and the selection of an effective feature combination typically depends on expert experience, so identification accuracy is difficult to guarantee. It is therefore of great significance to develop a data-based autonomous feature learning method to improve fault identification performance. With the rapid development of artificial intelligence technology, deep learning has become a topic of common concern in the field of signal processing.

Feature extraction based on a stacked denoising autoencoder (SDAE)
Under normal conditions, there are few abnormal data in the actual operation of a complex industrial system, so it is difficult to obtain the full-mode samples demanded by a data-driven fault diagnosis model. The traditional solution is to depict all normal states accurately using a data model; a system state is then considered an anomaly if it differs from the normal state provided by the data model. This method requires accurate data modeling. When prior knowledge is missing or special conditions are insufficient, it often leads to state misjudgment and increases the uncertainty of the diagnosis decision. Therefore, building a universal model that adapts to the impact of exogenous variables on the system and captures the inherent characteristics of the system becomes the key to fault anomaly detection.
The basic unit of the stacked autoencoder (SAE) is the autoencoder (AE); the SAE is formed by stacking multiple AEs, cascading several AEs into a multilayer neural network. Its output can be considered the characteristic representation of the input data after multiple dimensionality reductions [8]. Each AE can be regarded as an artificial neural network with a single hidden layer, in which the output y reconstructs the input x as closely as possible by seeking the optimal parameters (W, b). The hidden-layer output h can then be regarded as the low-dimensional feature of the input x after dimension reduction (see Fig. 1). Generally, the loss function of the AE contains an input-output mean square error term, a weight attenuation (decay) term, and a sparsity term, which make the hidden-layer feature extraction sparse and robust. Its mathematical description is as follows:

L(W, b) = (1/m) Σ_{i=1}^{m} (1/2) ||y_i − x_i||² + (λ/2) Σ_{l=1}^{n−1} Σ_{i=1}^{S_l} Σ_{j=1}^{S_{l+1}} (W_{ji}^{(l)})² + β Σ_j KL(ρ ‖ ρ̂_j),   (1)

where m, n, and S_l represent the number of samples, the number of network layers, and the number of units in layer l, respectively. The AE uses a gradient descent algorithm to train the network parameters to minimize loss function (1). In the pre-training phase, each AE is trained independently from the lower layer upward, with the goal of minimizing the error between its output and input; this phase is unsupervised and uses no labels. After the lower-layer AE is trained, its hidden-layer output is used as the input of the next AE, which is trained in turn until all AEs are trained. In the global fine-tuning stage, the weights and offsets of the trained AEs are used as initial values, the data labels serve as the supervisory signal for computing the network error, the back-propagation algorithm distributes the error to each layer, and gradient descent adjusts the weights and offsets of each layer.
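As an illustration of the training procedure just described, the following is a minimal sketch of a single AE trained by gradient descent on a reconstruction loss with weight decay (the sparsity term is omitted for brevity; all data, sizes, and hyperparameters here are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))         # m = 200 toy samples, 16-dim inputs

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d, h = X.shape[1], 4                    # input and hidden-layer sizes
W1 = rng.normal(scale=0.1, size=(d, h)); b1 = np.zeros(h)
W2 = rng.normal(scale=0.1, size=(h, d)); b2 = np.zeros(d)
lr, lam = 0.5, 1e-4                     # step size, weight-decay lambda
losses = []

for _ in range(300):
    H = sigmoid(X @ W1 + b1)            # encoder: hidden-layer feature h
    Y = H @ W2 + b2                     # linear decoder: reconstruction y
    R = Y - X                           # reconstruction residual
    losses.append(0.5 * np.mean(np.sum(R ** 2, axis=1))
                  + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2)))
    dY = R / X.shape[0]                 # gradient of the MSE term
    dW2 = H.T @ dY + lam * W2; db2 = dY.sum(axis=0)
    dZ = (dY @ W2.T) * H * (1 - H)      # backprop through the sigmoid
    dW1 = X.T @ dZ + lam * W1; db1 = dZ.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2
```

Stacking then repeats this step, feeding `sigmoid(X @ W1 + b1)` of one trained AE as the input of the next.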
On the basis of the classic AE, some "enhanced" models have been proposed, such as the denoising autoencoder (DAE) [9] and the convolutional autoencoder (CAE) [10]. Among them, the DAE is the more generalized model because of its ability to learn from "broken" input data; these "broken" data are "cleaned," that is, randomly erased, before input. Therefore, in this paper, a stacked DAE (SDAE) model formed by "stacking" multiple DAEs is used for autonomous feature learning on the collected signals.
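A minimal sketch of the DAE corruption step described above (the erased fraction and the masking scheme are illustrative assumptions; the AE is then trained to reconstruct the clean input from the corrupted one):

```python
import numpy as np

def mask_corrupt(X, frac=0.3, rng=None):
    """DAE-style corruption: randomly "erase" (zero out) roughly a
    fraction `frac` of the input entries before feeding them to the
    encoder, while the reconstruction target remains the clean X."""
    rng = rng if rng is not None else np.random.default_rng(0)
    mask = rng.random(X.shape) >= frac    # keep ~ (1 - frac) of entries
    return X * mask
```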
The main functions of the SDAE model are noise reduction, filtering, and feature extraction. It differs from DBN- and CNN-based fault diagnosis in that both the encoder and decoder of the SDAE model can be used to integrate feature extraction algorithms with classification and recognition algorithms [11]. The SDAE model therefore adapts to the shortage of fault samples in the actual operation of complex industrial systems and needs only a small amount of sample data; that is, coupled with appropriate classification and identification techniques, the SDAE model can achieve higher fault diagnosis performance.

Fault recognition based on a Multi-SVM classifier
An important link in fault diagnosis is fault location, which distinguishes multi-source faults with strong correlation through classifiers to realize the diagnosis and identification of fault types and fault locations. The classification performance and identification ability of the classifiers directly determine the effect of fault diagnosis. The SVM is widely used in many pattern recognition and data mining tasks. An SVM classifier not only has strong generalization ability but can also map data to a high-dimensional space through a kernel function to classify linearly inseparable data. Common kernel functions include the linear kernel, Gaussian kernel, and polynomial kernel. Because SVM classifiers based on different kernel functions perform differently on the same training set, the selection of the SVM kernel function often becomes the core determinant of optimal classification results for a given task. However, in fault identification, system faults exhibit randomness, fuzziness, and information uncertainty, so an SVM classifier with a single kernel function clearly limits the accuracy of fault identification for complex industrial systems. To avoid the kernel-selection problem, an effective method is to combine multiple SVM classifiers. The fault identification framework based on a multi-classifier combination is shown in Fig. 2.

Multi-classifier combination algorithm
A multi-classifier combination algorithm is a model fusion method. Its basic idea is that different classifiers provide different side information about the result, and fusing this information yields a better decision. In [12], the authors proposed a linear integration of deep learning models, which trains a hidden Markov model (HMM) classifier by combining the outputs of different deep neural network models to obtain better results than any single deep neural network model. Inspired by this method, in this paper a multi-classifier combination model is constructed based on posterior probability, described as follows. Suppose z represents a classification task in which each sample can be assigned to one of C pattern categories {ω_1, ω_2, …, ω_C}. The true posterior probability of sample x_i for its actual category is one; for all other categories it is zero. Assuming M classifiers, the posterior probabilities of the k-th classifier for N samples can be collected in a matrix P_k ∈ R^{N×C}, whose (i, j)-th entry is the posterior probability assigned by classifier k to sample x_i for category ω_j. More generally, the decision value of a sample belonging to a pattern category is obtained by a linear combination of the output decision values of multiple classifiers, and the combination model determines the category of a sample from this decision value. The combination model of multiple classifiers can therefore be described as

P = Σ_{k=1}^{M} P_k W_k + 1_N b^T,   (7)

where W_k ∈ R^{C×C} represents the weight matrix of classifier k, b ∈ R^{C} represents the offset vector, and 1_N is an N-dimensional column vector of ones.
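A small sketch of this linear posterior combination, assuming each classifier's posteriors for N samples are stacked in an (N, C) matrix (the shapes and names are illustrative):

```python
import numpy as np

def combine(posteriors, Ws, b):
    """Decision values of the linear combination model:
    P = sum_k P_k @ W_k + 1_N b^T.
    posteriors: list of M arrays of shape (N, C); Ws: list of (C, C)
    weight matrices; b: offset vector of shape (C,)."""
    out = np.zeros_like(posteriors[0], dtype=float)
    for P_k, W_k in zip(posteriors, Ws):
        out += P_k @ W_k
    return out + b                 # b broadcasts over the N rows

# the predicted category is the one with the largest decision value:
# labels = combine(posteriors, Ws, b).argmax(axis=1)
```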
The parameter estimation for equation (7) minimizes the difference between the output of the combined model and the true values, which can be measured by the least square error:

J(W, b) = || Σ_{k=1}^{M} P_k W_k + 1_N b^T − Y ||_F²,   (8)

where Y ∈ R^{N×C} is the matrix of true posterior probabilities. To avoid overfitting, an L2 regularization term is added to the loss function, and equation (8) can be rewritten as

J(W, b) = || Σ_{k=1}^{M} P_k W_k + 1_N b^T − Y ||_F² + Σ_{k=1}^{M} λ_k ||W_k||_F²,   (9)

where λ_k is the regularization parameter, which can be selected by cross-validation. Equation (9) is a multivariate extreme-value problem that can be solved by the least square method: setting the partial derivatives of J with respect to W_k and b to zero yields a system of linear equations whose solution gives the optimal weight matrices and offset vector of the combined classifier adaptively. However, a standard SVM cannot provide posterior probability outputs for samples, so an SVM posterior probability output method must be determined before a multi-SVM classifier combination can be used.
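Assuming the regularized least-squares problem is solved in closed form by stacking the posterior matrices into one design matrix, a sketch might look like this (the stacking scheme and variable names are our assumptions, not the paper's notation):

```python
import numpy as np

def fit_combiner(posteriors, Y, lams):
    """Closed-form ridge-style fit of the combination weights.
    posteriors: list of M arrays of shape (N, C); Y: (N, C) matrix of
    true (one-hot) posteriors; lams: one regularization value per
    classifier. Stacks Z = [P_1, ..., P_M, 1] and solves the normal
    equations (Z'Z + Lambda) Theta = Z'Y for Theta = [W_1; ...; W_M; b']."""
    N, C = Y.shape
    Z = np.hstack(posteriors + [np.ones((N, 1))])
    diag = np.concatenate([np.full(C, l) for l in lams] + [np.zeros(1)])
    Theta = np.linalg.solve(Z.T @ Z + np.diag(diag), Z.T @ Y)
    Ws = [Theta[k * C:(k + 1) * C] for k in range(len(posteriors))]
    b = Theta[-1]
    return Ws, b
```

If one classifier's posteriors are already near-perfect, the fit should place most weight on it and the combined argmax should recover the true labels.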

Posterior probability output of SVM
In pattern classification decision-making, a standard SVM can obtain the decision value indicating that a sample belongs to a certain category, but cannot provide the corresponding posterior probability. However, there is a correlation between the decision value and the posterior probability: the larger the absolute value of a sample's decision value, the higher the reliability that it belongs to a certain category. Therefore, the posterior probability of an SVM can be obtained using the method proposed in the literature [13], which fits a sigmoid model on training data to map the decision value of the SVM to a posterior probability:

P(y = 1 | f) = 1 / (1 + exp(A f + B)),   (17)

where f is the decision value output by the SVM, and parameters A and B are obtained through training. It should be noted that this method applies only to two-class problems. A multi-class problem can be divided into several two-class problems, whose corresponding posterior probabilities are obtained and then normalized.
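A sketch of this sigmoid mapping and the normalization step for the multi-class case (A and B are assumed to have been fit on training data; the function names are illustrative):

```python
import numpy as np

def platt_posterior(f, A, B):
    """Sigmoid map from an SVM decision value f to a posterior:
    P(y = 1 | f) = 1 / (1 + exp(A*f + B)).
    A is typically negative, so a larger decision value yields a
    larger posterior probability."""
    return 1.0 / (1.0 + np.exp(A * f + B))

def normalize_ovr(probs):
    """For a multi-class task decomposed into two-class problems,
    normalize the per-class posteriors so they sum to one."""
    probs = np.asarray(probs, dtype=float)
    return probs / probs.sum(axis=-1, keepdims=True)
```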

Experiments
In this section, an ensemble of multi-SVM classifiers is used for pattern recognition experiments. It consists of a linear combination of the linear kernel-based SVM (labeled SVM_LK), the polynomial kernel-based SVM (labeled SVM_PK), and the Gaussian kernel-based SVM (labeled SVM_GK). The simulation test rig is shown in Fig. 3.
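A hedged sketch of such a three-kernel SVM ensemble using scikit-learn (the toy data and the equal combination weights are illustrative stand-ins; the paper instead fits the combination weights by the least-squares method described earlier, and `probability=True` enables scikit-learn's internal Platt-style posterior output):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# toy stand-in for the SDAE features of the 10-class bearing data
X, y = make_classification(n_samples=300, n_features=10, n_classes=3,
                           n_informative=6, random_state=0)

svms = {
    "SVM_LK": SVC(kernel="linear", probability=True, random_state=0),
    "SVM_PK": SVC(kernel="poly", probability=True, random_state=0),
    "SVM_GK": SVC(kernel="rbf", probability=True, random_state=0),
}
for clf in svms.values():
    clf.fit(X, y)

# simplest linear combination: equal weights over the three posteriors
P = sum(clf.predict_proba(X) for clf in svms.values()) / len(svms)
pred = P.argmax(axis=1)
```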

Fig. 3. Rolling bearing simulation table.
The experiments simulated four bearing operation states: normal (NOR), rolling element fault (Ball), inner ring fault (IR), and outer ring fault (OR). For each fault type, mild, moderate, and severe faults were simulated with fault diameters of 0.018 mm, 0.036 mm, and 0.053 mm, respectively. The 10 states of the test platform were classified and identified, and the experimental conditions are described in Table 1. The platform environment for the experiments was Matlab 2017a + LibSVM 3.12, and the SDAE model was built using the DeepLearnToolbox. The tests ran with no load, a rotating shaft speed of 1792 r/min, an acceleration-sensor sampling frequency of 48 kHz, and a sampling time of approximately 5 s.
Clearly, one revolution of the bearing corresponds to about 1,600 data points. During the test, eight groups (1600/200 = 8) of data were formed with different sample lengths, increasing in steps of 200 data points. The time-domain waveforms of the ten bearing states are shown in Fig. 4.

Fig. 4. Time-domain waveforms of the different bearing states.
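The segmentation described above might be sketched as follows (the exact windowing scheme is not fully specified in the text, so the fixed step of 200 points between segments is an assumption; sample lengths of 200, 400, …, 1,600 give the eight groups):

```python
import numpy as np

def segment(signal, length, step=200):
    """Split a 1-D vibration signal into overlapping fixed-length
    samples, advancing `step` points between consecutive samples."""
    n = (len(signal) - length) // step + 1
    return np.stack([signal[i * step : i * step + length] for i in range(n)])
```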
A training set containing 50 samples and a test set containing 20 samples were randomly selected from each group during the test. By selecting training and test samples from the 10 types of data in Table 1 for each group, a training set of 500 samples and a test set of 200 samples were finally obtained. After simple normalization, the data were input to the hidden layer of the fault identification model designed in this study.

Experiments on the SDAE model
In this section, to verify the effectiveness of SDAE feature learning, it was compared with the bi-dimensional empirical mode decomposition (BEMD) [14] and multi-scale permutation entropy (MSPE) [14] feature extraction methods, and Gaussian noise of -5 dB, -4 dB, -3 dB, and -2 dB was added to test the feature recognition capability of the different methods under different noise backgrounds. Six feature recognition frameworks were selected: BEMD+SVM_LK, BEMD+SVM_GK, MSPE+SVM_LK, MSPE+SVM_GK, SDAE+SVM_LK, and SDAE+SVM_GK. An SDAE model with three hidden layers was constructed, with parameters set to the default values of the DeepLearnToolbox. The average classification accuracy of each method is shown in Table 2. The results in Table 2 show that the SDAE model exhibits good feature learning ability under strong background noise, and that the linear kernel-based SVM classifier is more accurate than the Gaussian kernel-based SVM classifier.

Experiments on the Multi-SVM classifier
To verify the recognition performance of the multi-SVM classifier, the output of the last hidden layer of the SDAE model was used as the input feature of each comparison classifier (SVM_LK, SVM_PK, SVM_GK, and Soft-max), with parameters set to the default values of the LibSVM toolbox; the linear combination of the three SVMs was then compared with each single classifier. The experimental results are shown in Table 3. As can be seen from Tables 2 and 3, the linear kernel-based SVM improves classification accuracy to a certain extent over the traditional Soft-max classifier. SVM_PK, SVM_LK, and SVM_GK are very similar in classification performance, with SVM_LK performing best. Compared with the SDAE + Soft-max model, the combination model proposed in this paper improves classification accuracy by about 2 percentage points, and it also achieves better performance than any single SVM classifier.
Finally, to further compare the performance of each classification model, the classifiers based on the different kernel functions were tested with different test sample lengths under the same noise environment. For convenience, they are labeled LK, PK, and GK, and the method in this article is labeled MSVM. The sample length increased from 200 to 1,600 over the previously set eight groups of samples. The relationship between the recognition rate and sample length is shown in Fig. 5. As can be seen from Fig. 5, when the sample length was small, the classifier recognition rate was low. This may be because the original signal contained all the associated signals, whereas the effective fault feature signal occupied only a small portion, so valuable fault information was seriously disturbed or masked by other information after the time-frequency conversion, making correct classification difficult. However, once the sample length increased beyond a certain point, it had little further influence on correct identification.

Conclusion
In this paper, a multi-source fault identification method based on a hybrid deep learning model was proposed, which was applied to the classification and identification of bearing faults and verified by 10 common bearing state signals. The results demonstrate that, first, compared with the traditional vibration signal feature extraction method, the SDAE model-based feature extraction method obtains higher accuracy. Second, the multi-SVM combined classifier model can further improve signal recognition performance.
Additionally, deep learning can directly mine valuable information from existing data, but it is difficult to explain the physical meaning of the feature parameters extracted by a deep neural network model, and training the model requires a large amount of labeled data, so complete data collection is needed to support it. For the multi-sensor monitoring data of complex industrial systems, deeply fusing the statistical model with the fault mechanism model is a direction for further research.