Based on Gaussian Mixture Model to Explore the General Characteristic and Make Recognition of XinTianYou

XinTianYou, a folk song style from Shaanxi province in China, is considered a precious part of traditional cultural heritage. Research on XinTianYou is important to Chinese folk music theory as a whole and is potentially useful for cultural preservation and applications. In this paper, we analyze the general characteristics of XinTianYou using pitch features, rhythm features, and the combination of the two. First, we use the Gaussian Mixture Model (GMM) to cluster XinTianYou audio by pitch and by rhythm separately, and analyze the general characteristics of XinTianYou based on the clustering results. Second, we propose an improved Features Relative Contribution Algorithm, the Complete Features Relative Contribution Algorithm (CFRCA), to compare the contributions of pitch and rhythm. Third, we estimate the probability of a song being XinTianYou based on the GMM and the cosine similarity distance. The experimental results show that XinTianYou has a large pitch span and a large proportion of high pitch values (about 22%). Regarding rhythm, we find that moderato dominates, while lento-moderato keeps a ratio similar to that of moderato-allegro. The similarity among the pitch features of all XinTianYou songs is more significant than that among the rhythm features. Additionally, the average accuracy of XinTianYou recognition reaches 92.4% with our method.


Introduction
XinTianYou is an improvisational folk song style that stems from Northern Shaanxi, China. It is a unique music form and a treasure of traditional Chinese culture. Exploring the general characteristics of XinTianYou can improve the understanding of the aesthetic characteristics and artistic style of this genre, and potentially benefit cultural preservation.
In this paper we aim to find and understand the general characteristics of XinTianYou. We use the Gaussian Mixture Model (GMM) to cluster XinTianYou audio features, including pitch and rhythm. We then analyze the general characteristics of XinTianYou based on the clustering results and music theory. A new method named Complete Features Relative Contribution Algorithm (CFRCA) is introduced to compare the contributions of pitch and rhythm. Lastly, we use GMM and the cosine similarity distance to recognize XinTianYou. Our method can potentially be applied to other types of folk songs as well, which may inform the study of the "general characteristics" of different kinds of folk songs and, further, of the general characteristics or gene of Chinese culture.
A variety of related work in the area of Chinese folk music has looked at the general characteristics of XinTianYou. In [1] the authors point out that the tunes of XinTianYou are generally divided into two types: one has a slow speed, a long-drawn-out melody and a broad range, while the other has a more rigorous structure and a graceful, smooth melody. Chen et al. [2] argue that XinTianYou has a free rhythm and unchained melody styles. Zhang et al. [3] point out that the basic structure of XinTianYou is stable and clear. The basic structure contains two consecutive fourth-interval melodies. XinTianYou also features sonorous drawn-out strains and a flexible tempo. Liu [4] states that XinTianYou has simple tunes but a rich mode scale, and that the "double 4th interval structure" melody is an important feature of XinTianYou. These previous studies focus on music theory. Lacking quantitative analysis and scientific experiments, they can neither support their conclusions with precise data nor recognize XinTianYou automatically.
In the area of music information retrieval, quantitative analysis and identification methods have been used. However, much of this research focuses on the classification of music genres [5][6][7], and is mainly concerned with classification accuracy rather than general characteristics.
In this paper, we apply mathematical methods and modern data analysis techniques to analyze the general characteristics of XinTianYou and to recognize it. We first apply GMM to the pitch and rhythm features of the music data, then explore the general characteristics, and finally use GMM combined with cosine similarity to identify XinTianYou. The experiments show that our results are consistent with music theory and that our method can identify XinTianYou effectively.
The paper is organized as follows: Section 2 describes the use of GMM for audio features; Section 3 analyzes the general characteristics of XinTianYou based on pitch and rhythm; Section 4 presents the experimental results; and Section 5 gives the conclusions and future work.

The processing of GMM based on features of audio data
The Gaussian Mixture Model (GMM) [8][9] has been widely used in the acoustic area. According to statistical theory, a linear combination of Gaussian probability density functions can approximate an arbitrary distribution, so in theory a GMM can describe the statistical distribution of music elements. We apply GMM to cluster the features of XinTianYou audio data. The clustering result represents the overall information of the audio data, and is useful for analyzing its general characteristics and, further, for the recognition of XinTianYou.

Audio data and framing
The database we use in this paper contains audio data of 109 XinTianYou songs. They are selected from Integration of Chinese folk songs [1], which is one of the most authoritative collections of Chinese folk songs. This collection contains the richest, most comprehensive and most authentic Chinese folk song music [10]. After obtaining the audio data for each song, we decompose it into short segments (frames) in order to get a locally stable signal. To make sure that adjacent segments transition smoothly, we allow overlap between them. Assume that a piece of audio data sampled at frequency $F$ yields $N_L$ samples. The number of frames $N_{frames}$ of this data can be computed as follows:

$$N_{frames} = \left\lfloor \frac{N_L - N_M}{N_f - N_M} \right\rfloor$$

where $N_f$ is the frame length (the number of samples in each frame) and $N_M$ is the overlap length (the number of samples in the overlapping region). In practice, $F$, $N_f$ and $N_M$ are set to 22050 Hz, 1024 and 896 respectively.
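The framing step described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `count_frames` and `split_into_frames` are hypothetical, and the frame count assumes the usual convention that only complete frames are kept.

```python
import numpy as np

def count_frames(n_samples, frame_len=1024, overlap=896):
    """Number of complete frames (assumed convention: partial frames are dropped)."""
    hop = frame_len - overlap  # samples between the starts of adjacent frames
    if n_samples < frame_len:
        return 0
    return (n_samples - frame_len) // hop + 1

def split_into_frames(signal, frame_len=1024, overlap=896):
    """Return an array of shape (n_frames, frame_len) of overlapping frames."""
    hop = frame_len - overlap
    n = count_frames(len(signal), frame_len, overlap)
    if n == 0:
        return np.empty((0, frame_len))
    # Row i holds samples [i * hop, i * hop + frame_len)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return signal[idx]

# One second of (silent) audio at the sampling rate used in the text
one_second = np.zeros(22050)
frames = split_into_frames(one_second)
print(frames.shape)  # (165, 1024)
```

With the stated settings, adjacent frames start 128 samples apart, so each frame shares 896 of its 1024 samples with its neighbour.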

The process of GMM
The Gaussian mixture model (GMM) uses a weighted linear combination of Gaussian probability density functions to represent the statistical distribution of data. Here we build a GMM for each of the features extracted from our audio data.
Each audio piece includes $N_{frames}$ frames, and for each frame we compute features. Each feature is N-dimensional (N may differ between features). The feature vector of the i-th frame can then be described as $x_i = (x_{i1}, x_{i2}, \ldots, x_{iN})$.

Dynamic K-means to initialize GMM
When building a GMM, we need to find a proper cluster number that balances accuracy and computational efficiency. Here we apply the dynamic K-Means algorithm to determine the cluster number automatically.
The idea of dynamic K-Means is to find a cluster number that minimizes the following error function:

$$E = \sum_{k=1}^{K} \sum_{x \in L(k)} \lVert x - \mu_k \rVert^2$$

where $K$ is the number of clusters, $L(k)$ is the k-th cluster, $x$ is a feature vector, and $\mu_k$ is the mean of cluster $L(k)$. Through the dynamic K-Means algorithm, we can determine the cluster number automatically and calculate the mean, variance and weight of each cluster. We then initialize the Gaussian mixture model with these parameters. In our experiment, K is 30 for the features LSP and PF, and 6 and 10 for the features Tempo and FP, respectively.
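As a rough illustration of this initialization, the sketch below runs plain K-Means for a few candidate values of K, evaluates the within-cluster squared error, and derives per-cluster means, variances and weights. It is only a sketch under assumptions: the text does not state the rule by which dynamic K-Means settles on K, so no such rule is implemented here, and the helper names (`kmeans`, `clustering_error`, `gmm_init_params`) and the toy data are hypothetical.

```python
import numpy as np

def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's K-Means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]  # copy of k data points
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dists.argmin(axis=1)

def clustering_error(x, centers, labels):
    """Sum of squared distances from each vector to its cluster mean."""
    return float(((x - centers[labels]) ** 2).sum())

def gmm_init_params(x, centers, labels):
    """Weights, means and per-dimension variances used to initialize a GMM."""
    k = len(centers)
    weights = np.array([(labels == j).mean() for j in range(k)])
    variances = np.ones((k, x.shape[1]))  # fallback value for empty clusters
    for j in range(k):
        if (labels == j).any():
            variances[j] = x[labels == j].var(axis=0)
    return weights, centers, variances

# Toy data: two well-separated groups of 2-D feature vectors
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])

errors = {}
for k in (1, 2, 3):
    centers, labels = kmeans(x, k)
    errors[k] = clustering_error(x, centers, labels)

weights, means, variances = gmm_init_params(x, *kmeans(x, 2))
```

Because the error can only shrink as K grows, any automatic choice of K needs an additional stopping criterion, for example a threshold on the improvement between successive values of K.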

Estimate parameters of GMM
Assume that each feature of each frame obeys the Gaussian mixture distribution:

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

We use the Expectation Maximization (EM) method to find the Maximum Likelihood Estimation (MLE) of the model parameters $\pi_k$, $\mu_k$ and $\Sigma_k$. More specifically, we find the parameters that maximize the expectation of the following logarithmic likelihood function:

$$\sum_{i} \log\Big\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \Big\}$$

The EM algorithm is an iterative algorithm that contains two main steps. First, in the E-step, it tries to "guess" the probability that sample $x_i$ is produced by each Gaussian component; second, in the M-step, it updates the parameters of the model based on these guesses.
For each feature we can compute a GMM using the above method. All the GMMs together form a database. The process of building such a database is illustrated in Fig. 1. In the following sections we show how to use this database for the analysis and recognition of XinTianYou.
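The E-step and M-step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes diagonal covariances for brevity (the text does not say which covariance structure was used), and the function names and the toy data are hypothetical.

```python
import numpy as np

def log_gaussian_diag(x, mean, var):
    """log N(x | mean, diag(var)) for every row of x."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var).sum(axis=1)

def em_gmm(x, weights, means, variances, n_iter=100, var_floor=1e-6):
    """EM for a diagonal-covariance GMM (assumed structure), n_iter >= 1."""
    weights, means, variances = weights.copy(), means.copy(), variances.copy()
    n, k = len(x), len(weights)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        log_r = np.stack(
            [np.log(weights[j]) + log_gaussian_diag(x, means[j], variances[j])
             for j in range(k)], axis=1)
        log_norm = np.logaddexp.reduce(log_r, axis=1, keepdims=True)
        resp = np.exp(log_r - log_norm)
        # M-step: re-estimate weights, means and variances
        nk = resp.sum(axis=0) + 1e-12  # small constant avoids division by zero
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        for j in range(k):
            diff = x - means[j]
            variances[j] = (resp[:, j:j + 1] * diff ** 2).sum(axis=0) / nk[j]
        variances = np.maximum(variances, var_floor)
    # Log-likelihood as evaluated in the final E-step
    return weights, means, variances, float(log_norm.sum())

# Toy 1-D data drawn from two Gaussians centred at 0 and 5
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.5, (200, 1)), rng.normal(5.0, 0.5, (200, 1))])
w, m, v, ll = em_gmm(x, np.array([0.5, 0.5]), np.array([[1.0], [4.0]]), np.ones((2, 1)))
print(np.sort(m[:, 0]))  # estimated means, close to 0 and 5
```

In practice the initial weights, means and variances would come from the K-Means initialization rather than from hand-picked values as in this toy example.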

General characteristics analysis based on GMM
The basic elements of music include pitch level, rhythm, strength and timbre. Pitch level and length are the dominant features of music. This is because the artistic effect of a piece of music does not change radically if the strength and timbre are changed, and we can still recognize the basic melody. But if the pitch level or length is changed, the music might become unrecognizable. Therefore, the most important basic elements of music are pitch level and length, and we choose pitch, rhythm, and their combination to explore the general characteristics of our data.

General characteristics on pitch
Pitch indicates the subjective feeling of a listener when hearing a sound. The value of pitch depends on the fundamental frequency of the sound: a higher fundamental frequency results in a higher pitch value. We use two typical features to represent pitch: Line Spectral Pairs (LSP) and Pitch Frequency (PF).
The parameters of LSP are recognition coefficients defined through A(z), where A(z) is the amplitude. As can be seen from (5), the lower-order coefficients describe the overall shape of the signal, and the higher-order coefficients carry more detailed information. A change in pitch value results in a corresponding change in A(z). This change is mainly reflected by the lower-order coefficients, which are located in the first few dimensions of LSP.

Figure 2. The distribution of music pitch
Fig. 2 shows the different musical labels that represent different pitches. The first few dimensions of LSP change significantly with pitch: the larger the pitch value, the higher the LSP values. In Table 1, we show some of the pitches in Fig. 2 and the first two dimensions of the corresponding LSP.
We use the GMM to cluster the values of the LSP parameter extracted from our data. Table 2 shows the GMM parameters, from which the centres of the different clusters obtained by GMM_LSP can be found. The mean of each cluster is mapped into a range of pitch values. For example, the mean of cluster 1, (0.0478, 0.0599), falls into the pitch range d1-e1, so the weight of this cluster is accumulated into the pitch range d1-e1. Summing all weights in the same pitch range gives the histogram in Fig. 3, which shows that the pitch of XinTianYou spans a large range and mainly gathers in the high tone area.
Here we analyze the histogram in detail. First, we can see that the overall pitch range is between d-e and c3-d3. Generally speaking, the high pitch range of a soprano is about c2-c3, and that of a tenor is c1-c2. The clusters falling into the pitch range d2-e2 account for nearly 22%, which shows that the pitch gathers in a higher value area. We can also see that the alto range makes up the majority of the histogram. This is because of the limitations of the human voice and the need for alto passages to prepare the climax. Moreover, the cluster centres range from the one-lined octave a-b to the three-lined octave c3-d3, which is across three octaves. This shows that the pitch spans a relatively large range. In addition, the histogram shows some very small values, such as those for the slots b2-c3, b-c1 and e2-f2. These values appear as 0 because of the scale of the histogram. The reason for these small values is that the Chinese five-tone modes do not include semitone relations; only the six- and seven-tone modes contain semitones, and these rarely occur in Chinese folk songs. Lastly, we can see that there is even a recognizable value in the pitch range c3-d3, which seems too high for a human to sing. In fact, such high notes rarely occur in XinTianYou. The reason is that LSP may confuse overtones with normal notes, which causes this unusually high pitch.
To analyze the pitch, we also compute the feature PF. The correspondence between PF and pitch is shown in Table 3.
It can be seen from Table 3 that there is a simple linear relationship between pitch and PF. A difference of 100 in the PF value corresponds to a semitone, such as e->f and b->c, and a difference of 200 corresponds to a whole tone, such as c->d, d->e and f->g. The GMM clustering results based on PF are shown in Table 4.
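The linear relationship stated above can be checked with a small worked example. The sketch assumes only what the text states, namely 100 PF units per semitone; the semitone offsets of the natural notes within an octave are standard music theory, and the function name `pf_difference` is hypothetical.

```python
# Semitone offsets of the natural notes within one octave (standard music theory)
NATURAL_STEPS = {"c": 0, "d": 2, "e": 4, "f": 5, "g": 7, "a": 9, "b": 11}

def pf_difference(note_from, note_to, units_per_semitone=100):
    """PF difference when moving up from note_from to the next note_to."""
    semitones = (NATURAL_STEPS[note_to] - NATURAL_STEPS[note_from]) % 12
    return semitones * units_per_semitone

for pair in [("e", "f"), ("b", "c"), ("c", "d"), ("d", "e"), ("f", "g")]:
    print(pair, pf_difference(*pair))
# ('e', 'f') 100
# ('b', 'c') 100
# ('c', 'd') 200
# ('d', 'e') 200
# ('f', 'g') 200
```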
We also compute a histogram for the PF clusters using the cluster means. The weights of the clusters are accumulated into 23 different pitch ranges. The result is shown in Fig. 4.
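The way cluster weights are accumulated into pitch ranges, for both LSP and PF, can be sketched as follows. This is an illustration under assumptions: each cluster is represented by a single scalar value, and the bin edges, range labels and cluster values are made up for the example (they are not the values of Tables 1 to 4).

```python
import numpy as np

def weight_histogram(cluster_values, cluster_weights, bin_edges, bin_labels):
    """Sum the weights of all clusters whose value falls into each range."""
    idx = np.digitize(cluster_values, bin_edges) - 1  # index of the range per cluster
    hist = dict.fromkeys(bin_labels, 0.0)
    for i, w in zip(idx, cluster_weights):
        if 0 <= i < len(bin_labels):  # ignore values outside all ranges
            hist[bin_labels[i]] += float(w)
    return hist

# Hypothetical ranges and clusters, for illustration only
edges = [0.030, 0.045, 0.060, 0.075]
labels = ["c1-d1", "d1-e1", "e1-f1"]
values = [0.0478, 0.0500, 0.0700, 0.0320]
weights = [0.4, 0.1, 0.3, 0.2]

hist = weight_histogram(values, weights, edges, labels)
print(hist)
```

Here the two clusters whose values fall between 0.045 and 0.060 both contribute their weights to the range labelled d1-e1, which mirrors the accumulation described in the text.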

General characteristics on rhythm
Rhythm denotes the organization of the length and strength of sound. It includes time-related elements such as tempo and beat. Here we use Tempo and FP to represent the rhythm information.
Tempo is computed by first detecting the peaks of the signal curve and then counting the number of beats per time unit. It reflects the rhythm information to some extent. Table 5 shows the tempo values and the corresponding speeds. We compute the GMM for the tempo feature. The parameters of the clusters are shown in Table 6. We build a histogram using the centroid of each cluster. The vertical axis of the histogram gives the proportion of clusters that fall into each tempo range. The histogram is shown in Fig. 5. The general characteristic of XinTianYou rhythm is that the distribution of tempo spans from Lento to Allegro. The central part of the whole range is dominant, while the two ends keep a similar ratio. The values from Mp to Mf dominate, which is consistent with the pace of walking. In addition, the figure shows that the value for Allegro is zero. This is because there are only a few Allegro instances in XinTianYou, so they are clustered into Allegretto.
Fluctuation Pattern (FP) is also used to analyze the distribution of speed. The FP values and their corresponding rhythm types are shown in Table 7. Again we compute the histogram, shown in Fig. 6, from which we can see a distribution similar to that of Tempo. Compared with Fig. 5, the Allegro to Molto mosso range has a larger value, which indicates that there is little Molto mosso in XinTianYou.

Analysis on the general characteristics of pitch and rhythm's combination
As introduced in Sections 3.1 and 3.2, the general characteristics of the pitch of XinTianYou are its overall high value and large span, and the general characteristics of its rhythm are that moderato is the dominant part and that lento-moderato has a ratio similar to that of moderato-allegro. In this section, we analyze the combination of these two factors. For this purpose we propose a method named CFRCA (Complete Features Relative Contribution Algorithm). This method is inspired by FRCA (Features Relative Contribution Algorithm) [17], which can be used to compute the relative contributions of all features. In the rest of this section we provide more details of both FRCA and CFRCA.
First of all, we normalize the extracted features to map the feature values to the same range. More specifically, we use the commonly used standard-deviation-based normalization method.
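A common standard-deviation-based normalization is the z-score, sketched below. Whether this is exactly the formula used in the paper is an assumption, since the original formula is not preserved in the text; the function name `standardize` and the toy values are hypothetical.

```python
import numpy as np

def standardize(features):
    """Z-score each column: subtract the mean and divide by the standard deviation."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant columns unscaled
    return (features - mean) / std

# Rows are songs, columns are features on very different scales (toy values)
raw = np.array([[0.05, 120.0],
                [0.07, 90.0],
                [0.06, 150.0]])
normalized = standardize(raw)
print(normalized.mean(axis=0), normalized.std(axis=0))
```

After this step every feature has zero mean and unit standard deviation, so similarities between features are not dominated by the feature with the largest numeric range.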

FRCA(Features Relative Contribution Algorithm)
For the feature sequence $(x_1, x_2, \ldots, x_N)$, the relative contribution $C(i)$ of feature $x_i$ is calculated from the pairwise similarities $J(i, j)$, where $J(i, j) = \cos(x_i, x_j)$ represents the cosine similarity between the i-th and j-th features, and M represents the number of songs.

CFRCA (Complete Features Relative Contribution Algorithm )
The result of FRCA, the relative contribution $C_1(i)$, only represents the relative importance of feature $x_i$ in one specific feature arrangement. The relative contribution $C_2(i)$ of feature $x_i$ in another sequence can be different from $C_1(i)$. To make the contribution measurement independent of the feature arrangement, we propose CFRCA, which considers all permutations of the N features:

$$C(i) = \frac{1}{N!} \sum_{k=1}^{N!} C_k(i)$$

where $C_k(i)$ represents the relative contribution of the i-th feature in the k-th permutation. In this way we obtain a contribution for each feature that fully considers all permutations of the features. Fig. 7 shows the pseudo code of CFRCA.
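The averaging over permutations that CFRCA performs can be sketched as follows. Only the averaging step is shown: the FRCA contribution itself is passed in as a function, and `toy_contribution` is a hypothetical, deliberately order-dependent stand-in, not the FRCA formula of [17].

```python
from itertools import permutations

import numpy as np

def cfrca(n_features, contribution_fn):
    """Average an order-dependent contribution over all feature permutations."""
    total = np.zeros(n_features)
    count = 0
    for perm in permutations(range(n_features)):
        total += contribution_fn(perm)  # contributions indexed by feature id
        count += 1
    return total / count

def toy_contribution(perm):
    """Placeholder (NOT FRCA): a feature's score is its position in the sequence."""
    scores = np.zeros(len(perm))
    for position, feature in enumerate(perm):
        scores[feature] = position
    return scores

# Four features, as in the paper (LSP, PF, Tempo, FP): 4! = 24 permutations
result = cfrca(4, toy_contribution)
print(result)  # [1.5 1.5 1.5 1.5]
```

The toy score depends entirely on the ordering, yet the averaged result is identical for all features, which illustrates how averaging over all permutations removes the dependence on the feature arrangement.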

ICCAE 2016
The experimental result of CFRCA is shown in Fig. 8. We can see that the contribution of pitch is relatively higher than that of rhythm. The contribution of LSP is the highest of all four features, which shows that LSP is the most representative feature of pitch. Although lower than that of LSP, the contribution of PF is still outstanding. The lowest contribution is from FP. The reason is that FP represents more rhythm information than Tempo, and the contribution of rhythm is low [18]. In conclusion, from this section we find that XinTianYou has the following general characteristics. First, the pitch spans a large range and mainly gathers in the high tone area. Second, the distribution of tempo spans from Lento to Allegro; the central part of the whole range is dominant, while the two ends keep a similar ratio. Third, the pitch feature is more representative than the rhythm feature.

Recognition of XinTianYou based on GMM
In this section we use the GMM computed in Section 3.1 and Section 3.2 to recognize XinTianYou.
We randomly select 80 of the 109 songs as the training set, and use the remaining 29 songs together with another 30 non-XinTianYou folk songs as the test set. To measure the similarity between the test data and the GMM models, we compute a likelihood as follows. Given a test song, we compare each frame of it with each GMM center and compute the cosine similarity. We then compute the mean of all these similarities and use it as the final measurement.
To express our results in the form of probabilities, we normalize the likelihood values of all 80 training songs. More specifically, each similarity value is divided by the maximum value. As a result, the probability interval of XinTianYou is [min(P(x))/max(P(y)), 1], x = 1, 2, 3, …, 80; y = 1, 2, 3, …, 80. The similarity distribution is obtained by sorting the P(n) values of the training set. This makes the P(n) value for the test data higher. We then compute P(n) for the test data, and consider a test song more likely to be XinTianYou if its P(n) is higher.
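The similarity measurement and the normalization by the training maximum can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up scores, and it assumes that no frame or cluster center is the zero vector.

```python
import numpy as np

def mean_cosine_similarity(frames, centers):
    """Mean cosine similarity between every frame and every GMM center."""
    f = frames / np.linalg.norm(frames, axis=1, keepdims=True)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return float((f @ c.T).mean())

def normalize_by_training_max(train_scores, test_scores):
    """Divide all scores by the largest training score."""
    max_train = max(train_scores)
    return ([s / max_train for s in train_scores],
            [s / max_train for s in test_scores])

# Toy check: one frame parallel to the center, one orthogonal to it
toy_frames = np.array([[1.0, 0.0], [0.0, 2.0]])
toy_centers = np.array([[1.0, 0.0]])
toy_score = mean_cosine_similarity(toy_frames, toy_centers)  # (1 + 0) / 2

# Made-up similarity scores for three training songs and two test songs
train_p, test_p = normalize_by_training_max([0.80, 0.90, 0.85], [0.95, 0.60])
interval = (min(train_p), 1.0)  # probability interval of the training set
print(interval, test_p)
```

Because the normalization uses the training maximum, a test song whose raw similarity exceeds every training song receives a value above 1, that is, above 100%.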
In the experiment, we use cross-validation with five randomly selected combinations of training set (80 pieces) and test set (29 pieces of XinTianYou and 30 pieces of non-XinTianYou). Table 9 shows the similarity range and probability interval of the training sets. Table 10 shows the true positives (the number of the 29 XinTianYou pieces correctly classified as XinTianYou), and Table 11 shows the false positives (the number of the 30 non-XinTianYou pieces incorrectly classified as XinTianYou). The probability interval is small here because general characteristics exist in the pitch and rhythm of XinTianYou, so that different folk songs in the test sets can also obtain a high probability when substituted into the model. We can also see that some of the testing interval boundaries in Table 10 are greater than 100%. This is because we use the training sets' probability intervals as references, and the resulting probability intervals of the test sets can fall outside this range.

Conclusions
In this paper we use pitch and rhythm features to analyze XinTianYou, a traditional Chinese folk song style. The Gaussian Mixture Model is used for clustering analysis and for finding the general characteristics of XinTianYou data. We analyze the results with consideration of music theory, and also propose a method to recognize XinTianYou.
The experimental results show that the pitch of XinTianYou spans a large range of over 3 octaves. High pitch values account for about 22%. The rhythm experiments show that moderato is the dominant part, while lento-moderato keeps a ratio similar to that of moderato-allegro. The pitch features of XinTianYou are more representative than the rhythm features. Moreover, with our method the average accuracy of XinTianYou recognition reaches 92.4%.
"General characteristics" provide a new way of looking into the commonality or gene of Chinese folk songs. The method we use in this paper can also be applied to styles of music other than XinTianYou. In the future we plan to explore other Chinese folk song styles and look for the general characteristics shared between them.

Figure 3. The histogram of pitch feature (axes: pitch; the normalized sum of cluster weights)

Figure 4. The histogram of pitch feature

Based on the distribution of PF, it can be further concluded that the pitch of XinTianYou spans a large range and mainly gathers in the high pitch area. The pitch range above d2-e2 accounts for nearly 23%. The alto range makes up the majority of the data. The pitch range is from a-b to c3-d3, which covers three octaves. This demonstrates again that XinTianYou spans a relatively large range. The proportion of the pitch above c3 is different from the result of LSP. The histogram value for pitch c3-d3 ...

Figure 5. The histogram of rhythm feature

Figure 6. The proportion of XinTianYou speed

In CFRCA, $p_i$ represents the i-th permutation of the features, and the average of the contribution values computed over all permutations is used as the final contribution.

Fig. 8. The contribution of each feature by using CFRCA

Table 1. Pitch and the corresponding LSP

Table 3. Pitch and the corresponding PF

Table 4. GMM parameters of PF

Table 5. Speed and the corresponding Tempo

Table 6. The parameters of GMM_Tempo clusters

Table 7. Speed and the corresponding FP

The GMM parameters of the FP clusters are shown in Table 8.

Table 8. The parameters of GMM_FP clusters

Table 9. The similarity range and probability interval of training sets

Table 10. The recognition of 29 pieces of XinTianYou in test sets

Table 11. The recognition of 30 pieces of non-XinTianYou in test sets

This again demonstrates that general characteristics exist in the pitch and rhythm of XinTianYou. From Table 10 we can see that there are 26.8 true positives on average and that the average accuracy is 92.4138%, and Table 11 shows that the false identification rate is only 7.33% on average. The high accuracy and the low false identification rate show that our method can recognize XinTianYou quite effectively.
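The reported accuracy follows directly from the average number of true positives, as the short check below shows. The average number of false positives is not stated in the text; the value printed here is only what the reported 7.33% rate implies for 30 non-XinTianYou pieces.

```python
# Values reported in the text
avg_true_positives = 26.8
n_xintianyou_test = 29
n_other_test = 30
false_identification_rate = 0.0733

accuracy = avg_true_positives / n_xintianyou_test
print(f"{accuracy:.4%}")  # 92.4138%

# Implied (not reported) average number of false positives per test set
implied_false_positives = false_identification_rate * n_other_test
print(round(implied_false_positives, 1))  # 2.2
```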