Recognition of Aircraft Engine Sound Based on GMM-UBM Model

Abstract: The Gaussian mixture model-universal background model (GMM-UBM) is a widely used speaker recognition technique that performs well at identifying speakers from their voices. In this paper, we explore the GMM-UBM method for detecting abnormal aircraft engine sounds. We designed a GMM-UBM based aircraft engine sound recognition system that extracts MFCC feature parameters and trains the GMM-UBM models using the maximum a posteriori (MAP) adaptation algorithm. Experimental results show that the GMM-UBM based system achieves a high recognition rate on real-world aircraft engine sound tests.


Introduction
In daily life, we often need to recognize many kinds of environmental sound. Sound recognition technology has been used successfully in biomedicine, agriculture, public security, astronomy and many other fields. Sound recognition also plays an important role in aviation. The aircraft engine sound signal contains a great deal of important information, which makes engine sound recognition a valuable tool for diagnosing engine status [1][2].
Much research has been conducted in the sound recognition community. Harma proposed an audio monitoring system for office environments [3] that combines tone and spectral centroid features to detect events occurring in the office. In [4], an audio classification system for detecting crime in elevators is reported; it uses GMM models to classify and identify alarm sound events. In [5], Ntalampiras et al. use mel-frequency cepstral coefficients (MFCC), MPEG-7, and CB-TEO features to detect dangerous sound events in subway stations. In [6], Cheng Qingyun employs a GMM method to identify two kinds of abnormal sounds in an office environment. In [7], Zhou constructs a feature set that includes MFCC and log-domain subband energy, and uses the AdaBoost algorithm to select effective features for detecting sound events in a conference room. In [8,9], the researchers developed an audio monitoring system that uses MFCC and short-term energy as sound event features, with GMMs as recognition models, to detect abnormal sound events in the elevator environment. In principle, engine sound recognition can be treated in the same way as speech recognition, so speech recognition techniques can be explored for aircraft engine sound recognition. However, because aircraft engine sound is difficult to acquire, few studies on aircraft engine sound processing have been reported so far.
In this paper, a GMM-UBM based model is used to identify engine sounds. We use MFCC as the sound features, and the GMM-UBM models are trained with the MAP adaptation algorithm. The system can effectively recognize abnormal engine sounds while requiring only a small amount of data.

Universal Background Model
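GMM-UBM is a high-order GMM that reduces the effect of variation between speaker utterances [10]. A GMM is a linear weighted combination of a certain number of Gaussian probability density functions. The M-order Gaussian mixture model uses a linear combination of M single Gaussian distributions to describe the distribution of frame features in the feature space:

$$p(X \mid \lambda) = \sum_{i=1}^{M} w_i \, b_i(X), \qquad \sum_{i=1}^{M} w_i = 1$$

where $X$ is a D-dimensional feature vector, $w_i$ is the weight of the i-th component, and each component density is

$$b_i(X) = \frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}} \exp\left( -\frac{1}{2} (X - \mu_i)^{\top} \Sigma_i^{-1} (X - \mu_i) \right)$$

where $\mu_i$ is the mean vector and $\Sigma_i$ is the covariance matrix. Thus, a complete Gaussian mixture model can be expressed as

$$\lambda = \{ w_i, \mu_i, \Sigma_i \}, \quad i = 1, 2, \ldots, M$$

GMM learning generally uses the classic EM (expectation-maximization) algorithm. Under the maximum likelihood estimation criterion, the likelihood of the GMM for a sequence of frames $X = \{X_t\}$, $t = 1, 2, \ldots, T$, can be expressed as:

$$P(X \mid \lambda) = \prod_{t=1}^{T} p(X_t \mid \lambda)$$

According to the EM algorithm, the weight, mean and variance of the i-th mixture component can be calculated by

$$w_i = \frac{1}{T} \sum_{t=1}^{T} p(i \mid X_t, \lambda)$$

$$\mu_i = \frac{\sum_{t=1}^{T} p(i \mid X_t, \lambda) \, X_t}{\sum_{t=1}^{T} p(i \mid X_t, \lambda)}$$

$$\sigma_i^2 = \frac{\sum_{t=1}^{T} p(i \mid X_t, \lambda) \, (X_t - \mu_i)^2}{\sum_{t=1}^{T} p(i \mid X_t, \lambda)}$$

The posterior probability of the i-th mixture component is

$$p(i \mid X_t, \lambda) = \frac{w_i \, b_i(X_t)}{\sum_{k=1}^{M} w_k \, b_k(X_t)}$$

Speaker adaptation schemes should ideally be effective for small amounts of speaker-specific adaptation data and converge to a true speaker-dependent estimate when a large amount of data is available. Most GMM-based speaker recognition systems are trained using maximum a posteriori (MAP) estimation [11].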
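The EM procedure described above can be illustrated with a minimal pure-Python sketch. It assumes diagonal covariances (a common simplification that the paper does not state explicitly), and the function names, the variance floor `var_floor` and the toy data in the usage note are hypothetical choices for illustration, not the authors' implementation.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def component_logs(x, weights, means, variances):
    """log(w_i * b_i(x)) for every mixture component i."""
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]

def posteriors(x, weights, means, variances):
    """Posterior probability p(i | x, lambda) of every component i."""
    logs = component_logs(x, weights, means, variances)
    norm = logsumexp(logs)
    return [math.exp(l - norm) for l in logs]

def log_likelihood(frames, weights, means, variances):
    """Total log-likelihood: sum over frames of log p(x_t | lambda)."""
    return sum(logsumexp(component_logs(x, weights, means, variances))
               for x in frames)

def em_step(frames, weights, means, variances, var_floor=1e-3):
    """One EM iteration; returns updated (weights, means, variances)."""
    M, D, T = len(weights), len(frames[0]), len(frames)
    n = [0.0] * M                        # soft counts per component
    s1 = [[0.0] * D for _ in range(M)]   # posterior-weighted sums of x
    s2 = [[0.0] * D for _ in range(M)]   # posterior-weighted sums of x^2
    for x in frames:
        for i, p in enumerate(posteriors(x, weights, means, variances)):
            n[i] += p
            for d in range(D):
                s1[i][d] += p * x[d]
                s2[i][d] += p * x[d] * x[d]
    new_w, new_mu, new_var = [], [], []
    for i in range(M):
        count = max(n[i], 1e-10)         # guard against an empty component
        mu = [s1[i][d] / count for d in range(D)]
        # E[x^2] - mu^2 equals the weighted mean of (x - mu)^2; floored
        var = [max(s2[i][d] / count - mu[d] ** 2, var_floor)
               for d in range(D)]
        new_w.append(n[i] / T)
        new_mu.append(mu)
        new_var.append(var)
    return new_w, new_mu, new_var
```

As a usage sketch, calling `em_step` repeatedly on a list of frames (each frame a list of floats) starting from rough initial parameters should not decrease `log_likelihood`; in practice the loop would stop once the improvement falls below a chosen threshold.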
For a particular Gaussian mean, with prior mean $\mu_0$, the estimate is

$$\hat{\mu}_i = \frac{\tau \mu_0 + \sum_{t=1}^{T} p(i \mid X_t, \lambda) \, X_t}{\tau + \sum_{t=1}^{T} p(i \mid X_t, \lambda)}$$

where $\tau$ is a meta-parameter that sets the bias between the ML estimate of the mean from the data and the prior mean, and $p(i \mid X_t, \lambda)$ is the posterior probability of the i-th Gaussian for frame $X_t$.

Speaker recognition technology can be transferred to the sound recognition problem, so a sound recognition system can be built on GMMs [12][13][14][15][16][17][18][19]. Given a sequence of sound feature vectors $\{X_t\}$, $t = 1, 2, \ldots, T$, and K classes of aircraft engine sound that differ from each other, the purpose of sound recognition is to find the sound class k whose corresponding GMM $\lambda_k$ attains the largest posterior probability $P(\lambda_k \mid X)$.

Figure 1 shows the GMM-UBM based aircraft engine sound recognition system. In the training phase, after preprocessing the engine sound data, we extract the MFCC features of the engine sound. The UBM is then trained with the EM learning algorithm. Finally, each individual GMM is obtained by MAP adaptation of the UBM with the aircraft engine sound data. In the recognition phase, the match score is calculated as the difference between the output scores of the GMM and the UBM:

$$\Lambda(X) = \frac{1}{T} \sum_{t=1}^{T} \left[ \log p(X_t \mid \lambda_k) - \log p(X_t \mid \lambda_{\mathrm{UBM}}) \right]$$

where $X_t$ is the feature vector of one frame of the audio to be recognized, $\lambda_k$ is the adapted GMM of class k, and $\lambda_{\mathrm{UBM}}$ is the UBM.

Figure 1. Engine sound recognition system architecture
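The MAP adaptation and GMM-versus-UBM scoring described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it adapts only the means, uses diagonal covariances, represents a model as a hypothetical `(weights, means, variances)` tuple, averages the score over frames, and uses a default `tau=16.0`, a value often seen in the speaker recognition literature that the paper itself does not specify.

```python
import math

def component_logs(x, gmm):
    """log(w_i * b_i(x)) for each component of a diagonal-covariance GMM."""
    weights, means, variances = gmm
    logs = []
    for w, mean, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mean, var):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(ll)
    return logs

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def log_gmm(x, gmm):
    """log p(x | lambda) for one frame."""
    return logsumexp(component_logs(x, gmm))

def map_adapt_means(frames, ubm, tau=16.0):
    """Return a copy of the UBM whose means are MAP-adapted to `frames`.

    Each mean becomes (tau * prior_mean + sum_t p_t * x_t) / (tau + sum_t p_t),
    where p_t is the component posterior under the UBM.
    """
    weights, means, variances = ubm
    M, D = len(weights), len(frames[0])
    n = [0.0] * M                        # soft counts per component
    s1 = [[0.0] * D for _ in range(M)]   # posterior-weighted sums of x
    for x in frames:
        logs = component_logs(x, ubm)
        norm = logsumexp(logs)
        for i, l in enumerate(logs):
            p = math.exp(l - norm)
            n[i] += p
            for d in range(D):
                s1[i][d] += p * x[d]
    new_means = [
        [(tau * means[i][d] + s1[i][d]) / (tau + n[i]) for d in range(D)]
        for i in range(M)
    ]
    return (weights, new_means, variances)

def llr_score(frames, model, ubm):
    """Average per-frame log-likelihood ratio between a class GMM and the UBM."""
    return sum(log_gmm(x, model) - log_gmm(x, ubm) for x in frames) / len(frames)

def classify(frames, models, ubm):
    """Return the label of the class model with the highest score."""
    return max(models, key=lambda label: llr_score(frames, models[label], ubm))
```

With a small `tau` the adapted means follow the adaptation data closely, while a large `tau` keeps them near the UBM means, which is the behaviour one would want when only a few seconds of audio per class are available.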

Experimental results and analysis
EITCE 2017

The aircraft engine sound database used for the experiments contains aircraft engine sounds from the civil aviation Boeing 727 and the military fighter F-15. In the training phase, we selected 61 engine audio clips for UBM training, comprising 12 start-up, 8 stop, 30 normal operation and 11 crash clips, with a total length of 388 s. The test set consists of 6 start-up, 6 stop, 29 normal and 11 crash audio clips. MAP adaptation uses 332 s of audio. In the test phase, the test engine sounds include engine start-up, stop, running and crash sounds. The total length of the engine sound used in the experiment is 1474 s, consisting of 347 s of start-up sound, 205 s of stop sound, 629 s of normal operation sound and 293 s of crash sound. All sound data are sampled at 16 kHz and recorded in mono, 16-bit format. The sound features are MFCC coefficients. The recognition rate is used to evaluate the performance of the sound recognition system:

$$\text{recognition rate} = \frac{\text{number of correctly recognized samples}}{\text{total number of samples}} \times 100\%$$
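The evaluation metric can be computed as in the following short sketch, which also breaks the rate down per sound class. The function names and the example labels are hypothetical and serve only to illustrate the ratio of correctly recognized samples to the total number of samples.

```python
def recognition_rate(true_labels, predicted_labels):
    """Percentage of samples whose predicted label matches the true label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)

def per_class_rates(true_labels, predicted_labels):
    """Recognition rate (in percent) computed separately for each true class."""
    rates = {}
    for label in set(true_labels):
        pairs = [(t, p) for t, p in zip(true_labels, predicted_labels)
                 if t == label]
        rates[label] = recognition_rate([t for t, _ in pairs],
                                        [p for _, p in pairs])
    return rates
```

For example, with true labels `["start", "stop", "normal", "crash"]` and predictions `["start", "stop", "normal", "normal"]`, three of the four samples are correct, so the overall rate is 75%.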

The effect of mixture number on performance
First, we compare the effect of the number of mixture components on recognition performance, using 12-dimensional MFCC feature parameters. The experimental results show that the best average recognition rate is achieved when the mixture number is 256. As the number of mixture components increases, the recognition rate also increases, but once the mixture number reaches a certain value, the recognition rate no longer increases.

The effect of different MFCC coefficients on performance
Standard MFCC coefficients represent only static features. In this experiment, we evaluate the combination of standard and dynamic MFCC features. Let ΔMFCC be the first-order differential coefficients and ΔΔMFCC be the second-order differential coefficients; both reflect dynamic characteristics. Table 2 shows that ΔMFCC and ΔΔMFCC significantly improve recognition of the start-up sound, which is due to the dynamic characteristics of the start-up sound. The experiments show that HMFCC (the combination of static and dynamic coefficients) improves system performance. For the normal and crash sounds, the HMFCC feature achieves a 100% recognition rate.
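One common way to obtain first-order and second-order differential coefficients from static MFCC frames is the regression formula sketched below. The regression window of 2 frames, the edge handling (repeating the first and last frame) and the function names are assumptions made for illustration; the paper does not state how its dynamic coefficients were computed.

```python
def delta(features, window=2):
    """Regression-based differential coefficients.

    `features` is a list of frames, each a list of floats. For frame t:
    d_t = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2), n = 1..window,
    with indices clamped to the first and last frame at the edges.
    """
    T = len(features)
    D = len(features[0])
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(D):
            num = 0.0
            for n in range(1, window + 1):
                ahead = features[min(t + n, T - 1)][d]
                behind = features[max(t - n, 0)][d]
                num += n * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out

def stack_static_and_dynamic(features, window=2):
    """Concatenate static, first-order and second-order coefficients per frame."""
    d1 = delta(features, window)
    d2 = delta(d1, window)
    return [s + a + b for s, a, b in zip(features, d1, d2)]
```

Under these assumptions, stacking 12-dimensional static MFCC frames yields 36-dimensional frames, which is one plausible reading of the combined static and dynamic feature evaluated in this experiment.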
The above experiments show that the recognition rate improves as the number of mixture components increases, but once the number reaches a certain value, the recognition rate begins to decline. This is because the GMM is a generative model whose quality depends on the amount of training data. The higher the mixture order, the more complex the model. The amount of data in this experiment is relatively small, so if the mixture number increases too much, the model overfits and its performance deteriorates.

Conclusion
In this paper, a GMM-UBM based aircraft engine sound recognition system is presented. The system works well when only limited data are available. Experimental results show that the recognition system achieves a higher recognition rate with the combination of dynamic and static MFCC features. In future work, we will investigate incorporating more prior knowledge about aircraft engine sound into the framework, such as sound continuity and sound duration, to further improve the robustness and accuracy of the proposed system. We are also interested in applying deep neural networks (DNN) to aircraft engine sound recognition.