An overview of PST for vibration-based fault diagnostics in rotating machinery

In general, diagnostics can be defined as the procedure of mapping the information obtained in the measurement space to the presence and magnitude of faults in the fault space. These measurements, and especially their nonlinear features, can be exploited to detect changes in dynamics caused by faults. We have been developing fault diagnostic techniques that are fundamentally based on extracting appropriate features of the nonlinear dynamical behavior of dynamic systems. In particular, this paper provides an overview of a technique we have developed called Phase Space Topology (PST), which has so far proved effective in detecting faults in machinery. Applications to bearing, gear and crack diagnostics are briefly discussed.


Introduction
Fault diagnostics of practical systems is an important problem that must be solved robustly in order to achieve major gains in reliability and safety. Diagnostics is essentially an epistemological problem that requires us to make intelligent inferences based on data, derived from empirical observations or computer models, which are often incomplete, noisy and uncertain. Although there is a rich and varied literature, we feel that many of the diagnostic techniques in use are quite ad hoc and heuristic, resulting in a lack of general applicability.
This paper presents innovative and rigorous techniques involving the nonlinear characteristics in a computational intelligence setting to diagnose changes in complex systems. Our approach consists of developing diagnostic methods using a combination of nonlinear dynamic analysis and computational intelligence techniques. In this paper, several applications are chosen with sufficient generality to be able to apply to a host of disciplines.
The theoretical approaches are validated using data from fault simulators at Villanova University and Case Western University; we also validate our algorithms using experimental data from practical machinery provided by United Technologies Research Center (UTRC, USA) and Federal University of Uberlândia (Brazil) [4].
The rest of the paper is organized as follows. Section 2 describes a family of methods that were originated and derived by our team: Phase Space Topology (PST) and Extended Phase Space Topology (EPST). In Section 3, we present a recent investigation of EPST for bearing defect analysis. Section 4 summarizes some of the applications that were investigated in order to generalize the applicability of our developed methods. Finally, Section 5 concludes the paper.

Extended phase space topology method
We first developed the method of Phase Space Topology [1][2][3], which is based on transforming the phase space into a density space that is characterized by quantitative measures. It was shown that, depending on the geometry and shape of the phase space, the density diagram contains peaks of various heights and sharpness at multiple locations, an example of which is shown in Fig. 1. This stems from the fact that the dynamical system spends more time in specific regions of the phase space, causing higher densities in those regions. The properties of the peaks in the density diagrams, including their location, height and sharpness, were used as features in the initial approach. Despite the success of this approach, the need to search for the peaks in the density diagrams made it difficult or sometimes even impractical to implement, especially for systems with noisy or more complex phase space patterns. This led to EPST, which is a continuation of our development of the PST family of algorithms. EPST is based on approximating the density distribution with Legendre polynomials, the details of which are described below.

Kernel density estimation
Let $X = (x_1, x_2, \ldots, x_n)$ be an independent and identically distributed sample drawn from a distribution with an unknown density function $f$. The shape of this function can be estimated by its kernel density estimator

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) \qquad (1)$$

(the hat, $\hat{\ }$, indicates that it is an estimate, and the subscript indicates that its value can depend on $h$). Here, $h > 0$ is a smoothing parameter called the bandwidth, and $K(\cdot)$ is the kernel function, which satisfies the following requirements:

$$\int_{-\infty}^{\infty} K(u)\,du = 1, \qquad K(-u) = K(u) \ \text{for all } u. \qquad (2)$$

There is a range of kernel functions that can be used, including uniform, triangular, biweight, triweight, Epanechnikov and normal. Because it is conventional and has convenient mathematical properties, we use the normal density function in our approach, defined as:

$$K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^{2}/2}. \qquad (3)$$
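The estimator with a normal kernel can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in the study: the function name `gaussian_kde_estimate`, the bandwidth value and the noisy sine wave standing in for a measured vibration signal are all assumptions.

```python
import numpy as np

def gaussian_kde_estimate(samples, grid, bandwidth):
    """Evaluate the kernel density estimate on `grid` using a normal kernel."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # u[i, j] = (grid[i] - samples[j]) / h
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    # Average of the kernels centred on each sample, scaled by 1 / h
    return kernel.sum(axis=1) / (samples.size * bandwidth)

# Hypothetical stand-in for a measured vibration signal: a noisy sine wave.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 2000)
signal = np.sin(2.0 * np.pi * 25.0 * t) + 0.05 * rng.standard_normal(t.size)

grid = np.linspace(-2.0, 2.0, 401)
density = gaussian_kde_estimate(signal, grid, bandwidth=0.05)

# Riemann-sum check that the estimate integrates to roughly one.
area = float(np.sum(density) * (grid[1] - grid[0]))
```

For this synthetic signal the density is highest near the turning points of the oscillation, where the signal spends the most time, which is the kind of structure the density diagram is meant to capture.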

Density distribution approximation
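Let $x$ be a state of the system and $y_d = \hat{f}_h(x)$ its density computed using the kernel density estimator. $y_d$ is then approximated with Legendre orthogonal polynomials. Legendre polynomials can be obtained directly from Rodrigues' formula, which is given by:

$$P_n(x) = \frac{1}{2^{n}\, n!} \frac{d^{n}}{dx^{n}} \left(x^{2} - 1\right)^{n}. \qquad (4)$$

They can also be obtained using Bonnet's recursion formula:

$$(n + 1)\, P_{n+1}(x) = (2n + 1)\, x\, P_n(x) - n\, P_{n-1}(x), \qquad (5)$$

where the first two terms are given by:

$$P_0(x) = 1, \qquad P_1(x) = x. \qquad (6)$$

The coefficients of the Legendre polynomials are obtained by using the least squares method, assuming the following linear regression model of order $m$:

$$y_d = \beta_0 P_0(x) + \beta_1 P_1(x) + \cdots + \beta_m P_m(x) + \varepsilon. \qquad (7)$$

Letting $A$ be the $N \times (m + 1)$ matrix with entries

$$A_{ij} = P_j(x_i), \qquad i = 1, \ldots, N, \quad j = 0, \ldots, m, \qquad (8)$$

the estimated coefficients are given by:

$$\hat{\beta} = \left(A^{T} A\right)^{-1} A^{T} y_d. \qquad (9)$$

The coefficients $\hat{\beta}$ constitute the features in our approach that can be used in classification or regression problems. The approximated density using Legendre polynomials is then calculated using the following:

$$\tilde{f} = A \hat{\beta}. \qquad (10)$$

Root mean square error (RMSE) and Pearson's correlation coefficient (PCC) are calculated to assess the quality of the fit using the following equations:

$$\mathrm{RMSE} = \sqrt{\frac{Z^{T} Z}{N}}, \qquad (11)$$

$$\mathrm{PCC} = \frac{\operatorname{cov}(y_d, \tilde{f})}{\sigma_{y_d}\, \sigma_{\tilde{f}}}, \qquad (12)$$

where $Z = (y_d - \tilde{f})$ is the residual vector, $N$ is the number of points in the density function, and $\operatorname{cov}$ and $\sigma$ denote the covariance and the standard deviation, respectively.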
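The least squares fit and the two fit-quality measures can be illustrated with NumPy's Legendre utilities. This is a minimal sketch under stated assumptions: the function name `legendre_features` is hypothetical, the state is rescaled linearly to [-1, 1] (the interval on which Legendre polynomials are orthogonal), the synthetic two-peak density is made up for illustration, and `numpy.linalg.lstsq` is used in place of the explicit normal-equation solution because it is numerically more robust while solving the same least squares problem.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_features(x, y_d, order):
    """Least squares Legendre coefficients of the density y_d sampled at x."""
    x = np.asarray(x, dtype=float)
    y_d = np.asarray(y_d, dtype=float)
    # Assumption: map the state linearly onto [-1, 1].
    x_scaled = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    # Column j of A holds P_j evaluated at every scaled state.
    A = legendre.legvander(x_scaled, order)
    beta, *_ = np.linalg.lstsq(A, y_d, rcond=None)
    y_fit = A @ beta
    residual = y_d - y_fit
    rmse = float(np.sqrt(np.mean(residual**2)))
    pcc = float(np.corrcoef(y_d, y_fit)[0, 1])
    return beta, y_fit, rmse, pcc

# Hypothetical density with two peaks (not data from the study).
x = np.linspace(-2.0, 2.0, 401)
y_d = np.exp(-0.5 * ((x - 1.0) / 0.4) ** 2) + np.exp(-0.5 * ((x + 1.0) / 0.4) ** 2)

beta, y_fit, rmse, pcc = legendre_features(x, y_d, order=20)
```

An order-20 fit returns 21 coefficients, which would form the feature vector for one signal; the RMSE and PCC values can be compared across candidate orders when choosing the order.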

Artificial neural network
Artificial neural networks (ANNs) are a class of algorithms designed to recognize a relation or pattern between inputs and outputs. An ANN consists of an interconnected group of nodes called artificial neurons; each connection between nodes has a weight that is adjusted as learning proceeds. ANNs can be used to solve regression problems, where the output is continuous, or classification problems, where the output is discrete. One of the most widely used training algorithms is backpropagation, which optimizes the weights of the ANN so that it correctly maps inputs to outputs. It works by propagating an input forward through the network layers to the output layer, where the calculated output is compared with the desired output. The error between the calculated output and the desired output is computed and propagated backwards. These errors are traced back to each associated neuron in order to update the weights.
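The forward and backward passes described above can be sketched for a small classifier with one hidden layer. This is a generic illustration rather than the network used in the study: the tanh activation, softmax output, cross-entropy loss, learning rate, number of epochs and the two-cluster toy data are all assumptions, as are the function names `train_mlp` and `predict`.

```python
import numpy as np

def train_mlp(X, Y, hidden=10, lr=0.1, epochs=500, seed=0):
    """Train a one-hidden-layer classifier with backpropagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    losses = []
    for _ in range(epochs):
        # Forward pass: input -> hidden (tanh) -> output (softmax)
        H = np.tanh(X @ W1 + b1)
        logits = H @ W2 + b2
        logits = logits - logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P = P / P.sum(axis=1, keepdims=True)
        losses.append(float(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))))
        # Backward pass: propagate the output error back through the layers
        d_logits = (P - Y) / X.shape[0]
        dW2 = H.T @ d_logits
        db2 = d_logits.sum(axis=0)
        dH = d_logits @ W2.T
        dZ1 = dH * (1.0 - H**2)  # derivative of tanh
        dW1 = X.T @ dZ1
        db1 = dZ1.sum(axis=0)
        # Gradient descent update of all weights and biases
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

def predict(params, X):
    """Return the index of the most probable class for each row of X."""
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)

# Toy data (hypothetical): two well-separated clusters, one per class.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
labels = np.repeat([0, 1], 50)
Y = np.eye(2)[labels]  # one-hot targets

params, losses = train_mlp(X, Y)
accuracy = float(np.mean(predict(params, X) == labels))
```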

Example application: bearing diagnostics
The algorithm, a flowchart of which is illustrated later in Fig. 3, is best explained with an example application; in this case, we choose the bearing diagnostics problem. Many traditional bearing fault detection techniques involve pattern recognition, which is effective only at one operating speed and requires retraining the classifier each time the rotational shaft speed changes, because the dynamic response of the system depends on the rotational speed. This limitation motivates the need for a new method that is effective under variable operating speeds. The current study investigates different bearing configurations under two operating conditions: (A) constant operating speed and (B) variable operating speed.
In order to achieve this goal, the classification problem was first addressed by training and testing the classifier on the same set of speeds: the classifier was trained at 19 rotational speeds and then tested at the same rotational speeds. The second step involved generalizing this diagnostic approach to variable operating speeds. In this step, the classifier was trained on one set of speeds and then tested on a different set of speeds. Both procedures are described in detail below.

Case A: Constant operating speed
A rotating fault simulator machine, shown in Fig. 2, is employed to study a variety of different bearing defects under various rotational speeds (300-3000 rpm). Four bearing conditions were investigated: healthy bearings (H), bearings with inner race defects (IR), outer race defects (OR) and ball defects (B). Proximity probe sensors were used to measure the vibration signal of the shafts in two orthogonal directions.
The density function of the horizontal vibration signal for every speed and bearing condition was approximated using Legendre polynomials. The order of the polynomial was selected based on the best fit between the estimated density function and the approximated density function. Root mean square error and Pearson's correlation coefficient were calculated to assess the quality of the fit. Legendre polynomials of order 20 were used to approximate the estimated density functions. The coefficients of the Legendre polynomials were computed for each of the 760 sampled signals using the least squares method as shown in Eq. (9). The computed coefficients for each case were saved in a vector of 21 elements (using only the horizontal vibration signal), which was used as a feature input to train an ANN classifier. Since the rotational speed has a strong influence on the dynamic response, it was used as an additional feature, making the total number of features equal to 22.
The performance of the classification model is presented by means of confusion matrices. In general, in a confusion matrix, the predicted classes are compared with the actual classes. Each row of the matrix represents the predictions for the corresponding class, while each column represents the actual class. The elements on the main diagonal of the matrix represent the correctly classified predictions for each class. These elements are known as true positives. For a specific row, all elements excluding the element on the main diagonal are the misclassified predictions for the corresponding class, which are known as false positives. The number of false negatives for a specific class is defined as the sum of the elements in its corresponding column, excluding the element on the main diagonal.
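The row and column convention above (rows are predicted classes, columns are actual classes) can be made concrete with a short sketch. The label arrays below are made up for illustration and are not results from the study; the helper names `confusion_matrix` and `per_class_counts` are hypothetical. Precision and sensitivity follow their standard definitions, TP / (TP + FP) and TP / (TP + FN).

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes):
    """Confusion matrix with rows = predicted class, columns = actual class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[p, a] += 1
    return cm

def per_class_counts(cm):
    """True positives, false positives and false negatives for each class."""
    tp = np.diag(cm)
    fp = cm.sum(axis=1) - tp  # rest of each row
    fn = cm.sum(axis=0) - tp  # rest of each column
    return tp, fp, fn

# Made-up labels for the four bearing conditions: 0 = H, 1 = IR, 2 = OR, 3 = B.
actual = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])
predicted = np.array([0, 0, 0, 1, 1, 3, 2, 2, 2, 3, 3, 1])

cm = confusion_matrix(actual, predicted, n_classes=4)
tp, fp, fn = per_class_counts(cm)

precision = tp / (tp + fp)
sensitivity = tp / (tp + fn)
accuracy = tp.sum() / cm.sum()
```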
The classifier performance can be analyzed using evaluation metrics derived from the confusion matrix, such as accuracy, sensitivity and precision. Table 1 shows the predictions for training and test data using the neural network classifier. As can be seen, the classifier has been able to predict all defects with 100% accuracy, 100% precision and 100% sensitivity with no misclassification. This result is remarkable for several reasons. Firstly, it shows that combining the EPST method with the proximity sensor data can resolve the challenges in identifying faults at low rotational speeds (below 10 Hz). Secondly, no a priori knowledge of the system was included in the features. This suggests that the EPST approach can be conveniently applied to diverse dynamical systems in an automated process, with minimal need for adaptation and reliance on expert knowledge about the system. Conventional bearing analyses search for specific characteristics of the system such as ball pass frequencies, but this study did not require any additional analysis because the method functioned well without it. We note with caution that it may well be the case that other operating conditions require additional feature combinations. Finally, no feature ranking or feature selection algorithm [5] was employed to select the optimal feature set. Because the effect of the coefficients on the function approximation decreases as the order of the orthogonal functions increases, the calculated coefficients are naturally ranked in order of significance.

Case B: Variable operating speed
For this part of the study, horizontal and vertical vibration data were used for every speed and bearing condition to construct the density function. The estimated density functions were then approximated using Legendre polynomials of order 20. As in case A, the order of the approximated density function was selected based on root mean square error and Pearson's correlation coefficient. The first 15 Legendre polynomial coefficients in each direction were used as a feature set. The shaft rotational speed was added to the feature set to produce an input vector of 31 elements for each sample. The feature vector was used as an input to train the artificial neural network classifier. The neural network was built with 10 neurons and trained with the backpropagation algorithm. The classifier was trained using the features extracted from vibration data for different bearing conditions at four rotational speeds: the boundaries of the machine operating range (300 and 3000 rpm) and two intermediate speeds (1200 and 2400 rpm). The 160 samples available for the different bearing conditions at these speeds were used for training the artificial neural network. The remaining 600 samples obtained at the other speeds (e.g., 420, 600, ..., 2820 rpm) were used for testing the trained classifier. Figure 3 shows a flowchart for the algorithm of case B. The classification results for the test data for the four bearing conditions are shown as a confusion matrix in Table 2. As can be seen, the classifier has 96.7% overall accuracy, with 20 misclassifications out of 600 predictions. These results indicate a high prediction rate of the classifier for the four bearing conditions. Most of the misclassified predictions are for bearings with ball defects. For a better understanding of the classifier performance, sensitivity and precision were calculated for each bearing condition and are also shown in Table 2. These evaluation metrics provide a measure of the classification performance for each bearing condition.
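The speed-based partition of the data can be sketched as follows. The placeholder dataset is an assumption made only to reproduce the sample counts quoted above: random numbers stand in for the 31 features, the speed grid is hypothetical (19 evenly spaced speeds, which is not necessarily the grid used in the experiments), and 10 samples per speed and bearing condition is inferred from the reported totals. The function name `split_by_speed` is also hypothetical.

```python
import numpy as np

def split_by_speed(features, labels, speeds, train_speeds):
    """Split samples into train/test sets according to their shaft speed."""
    train_mask = np.isin(speeds, train_speeds)
    train = (features[train_mask], labels[train_mask])
    test = (features[~train_mask], labels[~train_mask])
    return train, test

# Placeholder dataset: 19 speeds x 4 conditions x 10 samples = 760 samples.
rng = np.random.default_rng(0)
speed_grid = np.arange(300, 3001, 150)   # hypothetical speed grid, in rpm
conditions = np.arange(4)                # 0 = H, 1 = IR, 2 = OR, 3 = B
speeds = np.repeat(speed_grid, conditions.size * 10)
labels = np.tile(np.repeat(conditions, 10), speed_grid.size)
features = rng.standard_normal((speeds.size, 31))  # stand-in for real features

# Train on the range boundaries and two intermediate speeds, test on the rest.
train, test = split_by_speed(features, labels, speeds, [300, 1200, 2400, 3000])
n_train, n_test = train[0].shape[0], test[0].shape[0]
```

Because no test speed appears in the training set, the test accuracy reflects how well the classifier generalizes to operating speeds it has not seen.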

Summary of applications
This section presents a summary of some of the applications that we investigated, including bearing fault diagnostics, gear fault diagnostics and shaft crack diagnostics. In the following subsections, a brief introduction to each system is given along with the main contributions.

Other studies in bearing diagnostics
Rotating machines are among the most important components in industry. They are composed of different sub-systems interacting with each other in a nonlinear fashion; changes in any of these components can significantly affect the overall performance. Rolling element bearing defects are one of the major sources of breakdown in rotating machinery. The rotating fault simulator machine shown in Fig. 2 and described in Section 3 is used to study a variety of machinery defects under various operating conditions such as rotational speed, load and unbalance. It consists of a motor-driven shaft mounted on two bearings. Shafts and bearings of different sizes and conditions can be used, together with various vibration sensors such as accelerometers and proximity probes.
We have performed various investigations on this setup in order to develop robust techniques to diagnose bearings. In [5][6][7][8][9], we applied conventional methods such as the fast Fourier transform, envelope spectrum and discrete wavelet transform over a span of rotational speeds, as well as nonlinear physics-based modeling [10][11][12][13]. Accelerometer data were used to extract features to diagnose bearings with inner race, outer race and ball defects. Mutual information was then used as a ranking technique, and the optimal feature subset corresponding to the highest classification accuracy was determined. An overall accuracy of 97.0% was achieved using this procedure.
In [14,15], we introduced the mapped density method in order to discriminate simultaneous bearing faults under various rotational speeds. In this work, we studied the use of the information provided by proximity probe sensors. The method was successful in discriminating faults for configurations with one and two bearing faults (accuracy of 97% for a single bearing fault and 92% for two bearing faults). Moreover, the results indicate that this method performs well in distinguishing between the signatures of different bearing conditions (accuracy of 88%).
In [4,16,17], the EPST method was introduced. In this work, we performed bearing diagnostics over different rotational speed ranges and obtained very good results (overall accuracy of 96.7%). We have also applied other nonlinear techniques such as recurrence plots [18], Gottwald and Melbourne's 0-1 test and the Higuchi fractal dimension [19].

Gear-train setup
Gear fault diagnostics is still a challenging task because of the highly nonlinear characteristics of the faults and their complex nonstationary dynamics. Our work investigated gear fault diagnostics using vibration data from a helicopter gearbox mock-up system provided by UTRC. The experimental setup is shown in Fig. 4. This work studied multiple test gears with different health conditions: healthy gears (H) and defective gears with a root crack on one tooth (SCD), multiple cracks on five teeth (MCD) and a missing tooth (MTD). The vibration signals were recorded using a triaxial accelerometer installed on the test gearbox.
In [20,21], we presented the application of recurrence plots (RPs) and recurrence quantification analysis (RQA) to the diagnostics of various faults in a gear-train system. We also applied mutual information to rank the extracted features in order to obtain an optimal feature set. Results indicate that RQA parameters provide valuable information for characterizing the dynamics of various gear faults and discriminating the healthy gear condition from defective conditions. Using RQA parameters, the various gear conditions were identified with 100% accuracy, 100% recall and 100% precision in detecting the multiple cracks and missing tooth conditions.
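For readers unfamiliar with recurrence plots, the basic construction can be sketched as follows. This is a generic textbook-style illustration, not the procedure of [20,21]: the time-delay embedding parameters, the threshold `eps`, the synthetic signals and the function names are all assumptions. Only the recurrence rate, one of the simplest RQA measures, is computed, and the main diagonal is included in the count.

```python
import numpy as np

def recurrence_matrix(x, dim, delay, eps):
    """Binary recurrence matrix of a time-delay embedded signal."""
    x = np.asarray(x, dtype=float)
    n = x.size - (dim - 1) * delay
    # Each row of `emb` is one point of the reconstructed phase space.
    emb = np.column_stack([x[i * delay : i * delay + n] for i in range(dim)])
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2)
    # Two states are recurrent when they lie within eps of each other.
    return (dist <= eps).astype(int)

def recurrence_rate(R):
    """Fraction of recurrent points in the recurrence matrix."""
    return float(R.mean())

# Synthetic signals (hypothetical): a periodic signal and white noise.
rng = np.random.default_rng(0)
t = np.arange(500)
sine = np.sin(2.0 * np.pi * t / 50.0)
noise = rng.standard_normal(500)

R_sine = recurrence_matrix(sine, dim=2, delay=12, eps=0.2)
R_noise = recurrence_matrix(noise, dim=2, delay=12, eps=0.2)
rr_sine = recurrence_rate(R_sine)
rr_noise = recurrence_rate(R_noise)
```

With these settings the periodic signal revisits the same regions of the reconstructed phase space far more often than the noise does, so its recurrence rate is higher.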
In [22], the EPST method was applied to detect anomalous behavior and to diagnose various gear defects. Results indicate 99% accuracy in classifying the different gear conditions.

Crack detection
Experiments were conducted on a Crack Propagation Simulator test rig with a flexible steel shaft mounted on two roller bearings. Two orthogonal proximity probes oriented in the horizontal and vertical directions were used to measure the vibration response of the shaft. The crack propagator was used over a period of 24 hours to produce a fatigue crack, which is the first damage condition. The second fatigue crack was then produced by using the crack propagator for another 24-hour period.
In [23], the EPST method integrated with mutual information was applied to detect cracks and to identify the level of degradation. Mutual information was used to select only the most relevant EPST-extracted features. Results show 100% performance using this algorithm; it is notable that only three features were necessary to detect cracks and identify the crack level.