Bearings Fault Diagnosis Based on Convolutional Neural Networks with 2-D Representation of Vibration Signals as Input

Periodic vibration signals captured by accelerometers carry rich information for bearing fault diagnosis. Existing methods mostly rely on hand-crafted, time-consuming preprocessing of the data to obtain suitable features. In this paper, we use an easy and effective method to transform the 1-D temporal vibration signal into a 2-D image. A convolutional neural network (CNN) is then trained on the signal images built from the raw vibration data. As a powerful feature extractor and classifier for image recognition, a CNN learns through training the features most suitable for the classification task. With the image format of the vibration signals, a neuron in the fully-connected layer of the CNN can see farther and capture the periodic features of the signals. According to the experimental results, when given enough training samples, the proposed method outperforms other common methods. The proposed method can also be applied to intelligent diagnosis problems of other machine systems.


Introduction
Rolling element bearings are core components of rotating mechanisms, and their health conditions, for example the fault diameters at different locations under different loads, can have a direct impact on the performance, stability and life span of the mechanism. To prevent possible damage, real-time monitoring of vibration is needed during the operation of a rotating mechanism. The condition signals of the rolling element bearings are collected by sensors, and intelligent fault diagnosis methods are then applied to recognize the fault types [1]-[3]. Most intelligent fault diagnosis methods can be divided into two steps, namely feature extraction and classification. Features of the vibration signals such as the mean, median, kurtosis, peak-to-peak value, minimum, maximum, standard deviation, absolute mean, skewness and root mean square are commonly used to describe the condition of bearings [4]. In recent years, machine learning methods such as the support vector machine (SVM) [5] and neural networks have become the dominant methods for predicting fault types from features extracted from the raw temporal vibration signals.
Over the last five years, with the fast development of deep learning, especially convolutional neural networks (CNNs) [6], image classification has achieved remarkable success. This success came from the efficient use of GPUs, ReLUs, dropout and data augmentation. CNNs are now the dominant approach for almost all recognition and detection tasks and are comparable to human beings on some tasks [7]. To apply a CNN to a 1-D signal, the common method is to treat the signal as an image with a height of 1, which is already used in speech recognition [8]. However, the vibration signal is different from the speech signal. The vibration signal is periodic, which means that the data at time t is correlated not only with the neighboring data but also with the data collected several cycles away. In order to capture the features of periodic signals, a 2-D representation of these signals is used in this paper. This representation was first used to analyze and compress power-quality event data [9]. The 2-D transformation method is quite easy and intuitive, as will be shown in the next section. With a signal image, a CNN can easily extract the periodic features with small convolution and pooling kernels. We show that a segment of a long periodic signal starting at an arbitrary point can be classified by CNNs using the transformed image as the input format.
The remainder of this paper is organized as follows. The intelligent diagnosis method with the new input format based on CNNs is introduced in Section 2. In Section 3, experiments are conducted to compare our method with some common methods, and the results are discussed. We draw the conclusions and summarize the future work in Section 4.

Bearing fault intelligent diagnosis using the proposed method

A brief introduction to CNNs
The architecture of CNNs is briefly introduced in this section; more details on CNNs can be found in [6].
The convolutional neural network is a multi-stage neural network composed of several filter stages and one classification stage. The filter stage, which contains the convolutional layer and the pooling layer, is designed to extract features from the inputs. The classification stage is a multi-layer perceptron composed of several fully-connected layers. The function of each type of layer is described as follows.

Convolutional layer
The convolutional layer convolves local regions of the input with filter kernels and then generates the output features through the activation unit. Each filter applies the same kernel to every local region of the input, which is known as weight sharing, to extract local features. One filter corresponds to one frame in the next layer, and the number of frames is called the depth of this layer. We use $K_i^l$ and $b_i^l$ to denote the weights and bias of the i-th filter kernel in layer l, respectively, and use $x^l(j)$ to denote the j-th local region in layer l. The convolutional process is described as follows:
$$y_i^{l+1}(j) = K_i^l * x^l(j) + b_i^l$$
where the notation $*$ computes the dot product of the kernel and the local region, and $y_i^{l+1}(j)$ denotes the input of the j-th neuron in frame i of layer $l + 1$.

Figure 1. Images of the vibration signals for different bearing faults
After the convolutional operation, the Rectified Linear Unit (ReLU), which computes the function $f(x) = \max\{0, x\}$, is used as the activation unit of our model to accelerate the convergence of the CNNs.
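As a minimal sketch of the convolution-plus-ReLU step described above, the following NumPy code slides one kernel over a single-channel 2-D input. The function name `conv2d_relu`, the stride of 1, the absence of padding and the toy input values are assumptions made for illustration; they are not taken from the paper's implementation.

```python
import numpy as np

def conv2d_relu(x, kernel, bias):
    """Slide one kernel over a 2-D input (stride 1, no padding), then apply ReLU."""
    kh, kw = kernel.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    y = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            region = x[r:r + kh, c:c + kw]            # one local region of the input
            y[r, c] = np.sum(kernel * region) + bias  # dot product with the shared kernel, plus bias
    return np.maximum(0.0, y)                         # ReLU: f(x) = max{0, x}

# Toy data (hypothetical): a 4 x 4 input and a 2 x 2 kernel.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[-1.0, 0.0], [0.0, 1.0]])
features = conv2d_relu(image, kernel, bias=0.5)
```

Because the same `kernel` is reused for every local region, the sketch also illustrates weight sharing: one kernel produces one output frame.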

Pooling layer
A pooling layer is usually placed after a convolutional layer in the CNN architecture. It functions as a downsampling operation that merges semantically similar features into one and reduces the parameters of the network. The most commonly used pooling layer is the max-pooling layer, which performs a local max operation over the input features. It can reduce the parameters and obtain location-invariant features at the same time. The max-pooling transformation is described as follows:
$$P_i^{l+1}(j) = \max_{t \in R_j} a_i^l(t)$$
where $a_i^l(t)$ denotes the value of the t-th neuron in the i-th frame of layer l, $R_j$ is the j-th pooling region, W and H are the width and height of the pooling region, respectively, and $P_i^{l+1}(j)$ denotes the corresponding value of the neuron in layer $l + 1$ produced by the pooling operation.
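The max-pooling operation can be sketched as follows. This is an illustrative example only: the function name `max_pool2d` is hypothetical, and non-overlapping pooling regions (stride equal to the region size) are an assumption, since the pooling stride is not stated here.

```python
import numpy as np

def max_pool2d(frame, pool_h, pool_w):
    """Non-overlapping max-pooling over pool_h x pool_w regions of one frame."""
    out_h = frame.shape[0] // pool_h
    out_w = frame.shape[1] // pool_w
    # Drop trailing rows/columns that do not fill a complete pooling region.
    trimmed = frame[:out_h * pool_h, :out_w * pool_w]
    # Axis 1 and axis 3 index the position inside each pooling region.
    blocks = trimmed.reshape(out_h, pool_h, out_w, pool_w)
    return blocks.max(axis=(1, 3))

# Toy frame (hypothetical values).
frame = np.array([[1, 3, 2, 0],
                  [4, 2, 1, 5],
                  [0, 1, 7, 2],
                  [3, 6, 2, 8]], dtype=float)
pooled = max_pool2d(frame, 2, 2)
```

Each output value keeps only the largest activation of its region, which is why small shifts of a feature inside a region leave the output unchanged.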

2-D representation of vibration signals
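It is easy to transform the 1-D temporal signal into a 2-D image that a CNN can recognize. First, the raw signal x is divided into n equal parts. Second, each part is placed, in sequence, as one row of the signal image. For a signal of length $n \cdot m$, this transformation can be described in mathematical form as follows:
$$\mathrm{Img}(i, j) = x\big((i - 1)\,m + j\big), \quad i = 1, \dots, n, \; j = 1, \dots, m$$
where m is the length of each part and $\mathrm{Img}(i, j)$ is the entry in row i and column j of the signal image.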
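The two steps above amount to a row-major reshape, which the following sketch illustrates with NumPy. The function name `signal_to_image` and the toy 12-point signal are assumptions introduced for this example.

```python
import numpy as np

def signal_to_image(signal, n_rows):
    """Split a 1-D signal into n_rows equal parts and stack them as image rows."""
    signal = np.asarray(signal)
    if signal.size % n_rows != 0:
        raise ValueError("signal length must be divisible by n_rows")
    # Row-major reshape: part k of the signal becomes row k of the image.
    return signal.reshape(n_rows, -1)

# Toy signal (hypothetical): 12 points arranged into 3 rows of 4 points.
x = np.arange(1, 13)
img = signal_to_image(x, n_rows=3)
```

In the resulting image, vertically adjacent pixels are separated by one row length in the original signal, so a small 2-D kernel covers samples that are far apart in time.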

CNN-based intelligent diagnosis method
The architecture of the proposed CNN model is shown in Fig. 2. It is composed of two filter stages and one classification stage. The input of the CNN is a signal image of a bearing fault vibration signal. The first convolutional layer extracts features from the input image directly, without any other transformation. The features of the images are extracted by two convolutional layers and two pooling layers. As shown in Fig. 2, as the number of layers increases, the depth of the layers becomes larger while the width of each frame becomes smaller. The classification stage is composed of two fully-connected layers that accomplish the classification process. In the output layer, the softmax function is used to turn the logits of the ten neurons into a probability distribution over the ten different bearing health conditions. The softmax function is:
$$q(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{10} e^{z_k}}$$
where $z_j$ denotes the logit of the j-th output neuron.
The loss function of our CNN model is the cross-entropy between the estimated softmax output probability distribution and the target class probability distribution. Let $p(x)$ denote the target distribution and $q(x)$ denote the estimated distribution; the cross-entropy between $p(x)$ and $q(x)$ is:
$$H(p, q) = -\sum_{x} p(x) \log q(x)$$
In order to minimize the loss function, the Adam stochastic optimization algorithm is applied to train our CNN models. Adam is straightforward to implement, computationally efficient and requires little memory, which makes it quite suitable for models with big data or many parameters. The details of this optimization algorithm can be found in [10]. In order to make the CNN more robust, Batch Normalization [11] is used to cope with the problem of bad initialization. As a common way to control overfitting, dropout is used in the training process.

Experiments and discussion
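A small numerical sketch of the softmax mapping for ten output neurons is given below. The logit values are made up for illustration, and subtracting the maximum logit before exponentiating is a standard numerical-stability step that is an addition here, not something described in the paper.

```python
import numpy as np

def softmax(logits):
    """Map a vector of logits to a probability distribution."""
    shifted = logits - np.max(logits)  # stability shift; does not change the result
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical logits for the ten bearing health conditions.
logits = np.array([2.0, 1.0, 0.1, 0.0, -1.0, 0.5, 0.3, -0.2, 1.5, 0.7])
probs = softmax(logits)
predicted = int(np.argmax(probs))  # index of the most probable condition
```

The outputs are positive and sum to one, so the largest logit always corresponds to the most probable class.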

Data description
In order to train the CNN model sufficiently, a large number of training samples are prepared. The original experimental data were obtained from the Case Western Reserve University Bearing Data Center and were recorded by accelerometers on a motor-driven mechanical system at a sampling frequency of 12 kHz. There are four types of bearing health condition: normal, ball fault, inner race fault and outer race fault. The fault diameters are 0.007 in, 0.014 in and 0.021 in, so there are ten conditions altogether. In this experiment, each sample contains 2400 data points. The 2400 × 1 1-D signal is transformed into an image of size 60 × 40.
Dataset A contains 30000 training samples and 7500 testing samples of the ten fault conditions under three loads. In order to evaluate the necessity of a large training set for the CNN, dataset B, which contains 1500 training samples, is also prepared. The details of all the datasets are described in Table 1.

Baseline system
We compare our method with a standard ANN system that uses frequency features. The ANN system has a single hidden layer. The input of the network is the 2400 normalized Fourier coefficients obtained from the raw temporal signals using the fast Fourier transform (FFT).
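One way the baseline's frequency features could be computed is sketched below. The paper does not specify the normalization, so taking the coefficient magnitudes and scaling them by their maximum is an assumption, as are the function name `fft_features` and the synthetic 100 Hz test signal.

```python
import numpy as np

def fft_features(segment):
    """Return the magnitudes of the FFT coefficients, scaled to the range [0, 1]."""
    coeffs = np.fft.fft(segment)   # one complex coefficient per input point
    magnitude = np.abs(coeffs)
    peak = magnitude.max()
    # Assumed normalization: divide by the largest magnitude.
    return magnitude / peak if peak > 0 else magnitude

# Synthetic segment (hypothetical): 2400 points of a 100 Hz sine sampled at 12 kHz.
fs = 12_000
t = np.arange(2400) / fs
segment = np.sin(2 * np.pi * 100 * t)
features = fft_features(segment)
```

A 2400-point segment yields 2400 coefficients, which matches the input size of the baseline network described above.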

Hyper-parameters of the proposed CNN
The architecture of the proposed CNN is composed of two convolutional layers, two pooling layers and one fully-connected hidden layer. The parameters of the convolutional and pooling layers are shown in Table 2.
The number of neurons in the fully-connected hidden layer is 500. The experiments were implemented using Google's TensorFlow toolbox.

Results
After taking into account the sample quantity of each dataset, 3000 samples and 100 samples are randomly selected as the validation sets of datasets A and B, respectively. Twenty trials were carried out for the diagnosis of each dataset. Fig. 3 and Table 3 show the fault recognition results on datasets A and B and compare them with those of a CNN applied directly to the raw 1-D signal and of the baseline system. As shown in Fig. 3(a), the diagnosis accuracies of all twenty trials are over 99.7%. In fact, except for the sixth trial, the diagnosis accuracies of the remaining trials on dataset A are higher than 99.9%. This means that feeding the CNN with vibration signal images yields very good results when diagnosing the fault conditions of rolling element bearings. The accuracy of the CNN declines when it is trained with dataset B, which indicates that more training data results in higher accuracy. The reason for this is discussed later. Our method achieves higher accuracy than the other methods. The standard deviations over the 20 trials of the proposed method on both the validation samples and the test samples of dataset A are the smallest among all three methods. According to this comparison, our method is the most accurate and stable one. The results on dataset B are quite different. The test accuracy of the proposed method declines by nearly 1.8%, while the standard deviation on the test samples increases sevenfold. However, the results of the baseline system remain almost unchanged in terms of test accuracy and standard deviation.

Discussion
Some interesting points drawn from the experiments above are as follows. The results show that the CNN performs excellently when dealing with the bearing vibration signal images. Although the performance of the FFT+ANN method is not bad, it can be quite tricky to choose between the FFT and the wavelet transform (WT) as the preprocessing method in order to reach a very high accuracy, because the most suitable frequency features may vary between datasets. The advantage of our method is that it can see much farther than the method that applies a CNN directly to the 1-D signals. We now explore how far a neuron in the first fully-connected layer can see under the two methods. In order to keep the number of parameters the same, for the CNN architecture applied to 1-D signals we reshape the 5 × 5 convolution kernel into a 25 × 1 kernel and the 2 × 2 pooling kernel into a 4 × 1 kernel.
The stride of the convolutional layers equals 1 for both methods. Assume the first signal point that the neuron in the fully-connected layer can see is $x(t)$. For the raw 1-D CNN, the farthest point it can see is $x(t + 136)$, while for the proposed method, the farthest point it can see is $x(t + 976)$. According to this analysis, the neurons in the fully-connected layer of the proposed method see much farther than those of the other method. With this property, the CNN captures the periodic features better than when it is applied directly to the 1-D signals.
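The reach of a neuron after the two filter stages can be checked with a short receptive-field calculation. This is a sketch under stated assumptions: the convolution stride is 1 as given in the text, while the pooling stride is assumed to equal the pooling size (non-overlapping pooling); the function name `receptive_field` is hypothetical.

```python
def receptive_field(layers):
    """Receptive-field size along one axis for a stack of (kernel, stride) layers."""
    size, jump = 1, 1
    for kernel, stride in layers:
        size += (kernel - 1) * jump  # each layer widens the field by (kernel - 1) steps
        jump *= stride               # spacing between neighbouring outputs, in input samples
    return size

# conv, pool, conv, pool; pooling stride is an assumption (equal to the pooling size).
cnn_1d = [(25, 1), (4, 4), (25, 1), (4, 4)]       # 25 x 1 conv and 4 x 1 pooling kernels
cnn_2d_axis = [(5, 1), (2, 2), (5, 1), (2, 2)]    # 5 x 5 conv and 2 x 2 pooling, per axis

span_1d = receptive_field(cnn_1d)        # samples covered along the 1-D signal
span_2d = receptive_field(cnn_2d_axis)   # pixels covered along each image axis
```

Under these assumptions the 1-D network covers 136 consecutive samples, while the 2-D network covers a 16 × 16 pixel window. Since consecutive image rows lie one row length apart in the original signal, 16 rows reach far beyond 136 samples; with a row length of 60 (an assumption about the image orientation), 16 × 60 + 16 = 976, which matches the figure quoted above.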
When the dataset is large enough to train the CNN sufficiently, the CNN has better and more stable performance than common methods. The major drawback of the CNN is its need for a huge amount of data, as can be seen from Table 3. A shortage of data may result in overfitting, and it can also make the method hard to apply in industries where labeled data is hard to acquire. However, labeled vibration data of rolling element bearings is easy to acquire, so data augmentation is unnecessary. For example, when the sampling frequency is 12 kHz, the sample length is 2400 and the stride is 1, it takes only 1 second to acquire 9601 different labeled samples. The CNN is therefore quite suitable for bearing fault diagnosis, since a huge amount of data is available.
Moreover, the experiments show that it is possible to recognize the bearing fault type with periodic vibration signal images as the input to the CNNs. We want to emphasize that the bearing vibration signal is just one representative of periodic signals, and this method may be good at recognizing many other kinds of periodic signals, which may provide a new approach to dealing with such signals.
In the 2-D transformation, $\mathrm{Img}$ denotes the signal image and $x(t)$ is the vibration data at time t. In order to show the image clearly, we transform a signal section of length 40000 into a 200 × 200 image. The images for different faults are shown in Fig. 1. As Fig. 1 shows, the signal images look just like texture images. People may have difficulty recognizing each fault type directly from the 1-D signal waveform. However, after this simple transformation, the signals can easily be distinguished from their images. This intuition suggests that a CNN may perform better on the signal image than on the raw 1-D signal.
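The sample count in the example above follows from a sliding-window calculation, sketched here. The function name `count_windows` is hypothetical.

```python
def count_windows(num_points, window, stride):
    """Number of full windows of the given length that fit into num_points points."""
    if num_points < window:
        return 0
    return (num_points - window) // stride + 1

# One second of data at 12 kHz, cut into 2400-point samples with a stride of 1.
samples_per_second = 12_000
n_samples = count_windows(samples_per_second, window=2400, stride=1)
```

With these values the count is 12000 - 2400 + 1 = 9601. Note that neighbouring windows obtained with a stride of 1 overlap almost entirely, so a larger stride gives fewer but less redundant samples.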

Figure 2. Architecture of the proposed CNN model

Figure 3. Fault recognition results of the proposed method and the baseline system in 20 trials. (a) and (d) show the validation and test accuracy of the proposed method on datasets A and B, respectively. (b) and (e) show the validation and test accuracy of the CNN applied directly to the 1-D signals, respectively. (c) and (f) show the validation and test accuracy of the baseline system, respectively.

Table 1. Description of rolling element bearing datasets