Hearing aid classification method based on improved AP algorithm

Evaluating hearing aid performance on large volumes of medical data is a promising approach, and classifying the hearing aids is its foundation. In this paper an improved semi-supervised AP clustering algorithm based on density paths is proposed. The PESQ score of each speech segment is taken as a substitute for subjective scoring, and also serves as the semi-supervised basis for improving classification accuracy. The Euclidean-distance similarity is improved with a density-path measure, making the algorithm suitable for data sets of complex shape. Experimental verification shows that, compared with the traditional AP algorithm, the improved algorithm has obvious advantages in hearing aid classification accuracy and recognition performance.


Introduction
Hearing aids worn by different patients differ in style, and their quality also varies greatly. The quality of a hearing aid is of great importance to a hearing-impaired patient. A large amount of data is generated during treatment. Effectively extracting valuable information from these data, distinguishing hearing aids of different styles, and improving the detection of hearing aid product performance therefore have important theoretical and practical significance.
Clustering is a widely used method in data mining. Among clustering methods, the AP (Affinity Propagation) algorithm is promising: it does not need an initial value to be selected, and it allows non-Euclidean data distributions as well as unconventional point-to-point similarity measures. It also has shortcomings: its complexity is high, which limits its use on large-scale data. Literature [1] proposed an incremental AP algorithm that improves effectiveness on large data; literature [2] proposed the M-AP clustering algorithm, adding a merge step that effectively handles data sets of complex shape; literature [3] summarized and compared several methods and found that the IGP indicator performs best at determining the optimal number of clusters. Facing the complexity of hearing aid medical data, however, none of the above AP-related algorithms obtains acceptable results. In this paper an improved AP algorithm is put forward and its effectiveness is evaluated.

Affinity Propagation Clustering Algorithm
AP clustering is a relatively new clustering algorithm. Frey et al. [4] first proposed the AP algorithm in Science in 2007 as an unsupervised clustering algorithm. It does not need the number of clusters or the cluster centers to be determined in advance; instead it treats all data points as potential cluster centers and clusters them according to the similarity between data points [5]. Because no initial cluster centers have to be selected, the problems caused by initialization are avoided and the reliability of the clustering result improves.
Clustering groups samples by their natural closeness. To make the classes reasonable, a similarity, that is, the degree of distance between samples, is defined [6]. The AP clustering algorithm uses the negative squared Euclidean distance as the similarity; that is, the similarity between any two points X_i and X_j is

s(i, j) = -||X_i - X_j||^2.

The AP algorithm passes two types of messages, the responsibility matrix r and the availability matrix a, and iteratively updates the two matrices until a stable set of high-quality cluster centers is produced. The relationship between responsibility and availability is shown in Fig. 1. r(i, k) is a message sent from point i to the candidate cluster center k, reflecting how well suited k is to serve as the cluster center of point i; a(i, k) is a message sent from the candidate cluster center k to point i, reflecting whether i should choose k as its cluster center. The stronger r(i, k) and a(i, k), the greater the probability that point k is a cluster center, and the more likely point i belongs to the cluster centered at point k.
The iterative update formulas are as follows:

r(i, k) = s(i, k) - max_{k' ≠ k} { a(i, k') + s(i, k') }

a(i, k) = min{ 0, r(k, k) + Σ_{i' ∉ {i, k}} max{0, r(i', k)} },  for i ≠ k

a(k, k) = Σ_{i' ≠ k} max{0, r(i', k)}

After each update, the representative sample point of the current sample i is the k that maximizes a(i, k) + r(i, k). If k = i, then sample i is the class representative point of its own cluster; otherwise, i belongs to the cluster represented by k.
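The update loop above can be sketched in a few lines of NumPy. This is a minimal illustration of the message-passing rules, not the paper's implementation; the damping factor and iteration count are assumed values chosen for the example.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, max_iter=200):
    """Minimal AP sketch: S is an (n, n) similarity matrix whose
    diagonal holds the preference values. Returns each point's
    exemplar index, i.e. argmax_k (a(i,k) + r(i,k))."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities  a(i, k)
    rows = np.arange(n)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        M = A + S
        best = np.argmax(M, axis=1)
        first = M[rows, best]
        M[rows, best] = -np.inf          # mask the max to find the runner-up
        second = M.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))  # keep r(k,k) un-thresholded
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag_A = np.diag(A_new).copy()    # a(k,k) = sum_{i'!=k} max(0, r(i',k))
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag_A)
        A = damping * A + (1 - damping) * A_new
    return np.argmax(A + R, axis=1)
```

Points sharing the same exemplar index form one cluster; setting the diagonal of S (the preference) to the median similarity is a common default.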

Sample constraint information
The main idea of semi-supervised clustering is to use samples of known class to form pairwise constraint information and adjust the similarity matrix of the algorithm accordingly. Constraint information is of two kinds. Must-link: two samples X_i and X_j must belong to the same class, so their similarity is raised to its maximum, s(i, j) = s(j, i) = 0. Cannot-link: X_i and X_j must belong to different classes, so their similarity is lowered to its minimum, s(i, j) = s(j, i) = -∞.
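Applying the constraints amounts to overwriting entries of the similarity matrix. A small sketch (the finite constant `low` stands in for -∞ so the matrix stays numerically usable; its value is an assumption):

```python
import numpy as np

def apply_constraints(S, must_link, cannot_link, low=-1e6):
    """Adjust a negative-distance similarity matrix with pairwise
    constraints: must-link pairs get the highest similarity (0),
    cannot-link pairs a very low one."""
    S = S.copy()
    for i, j in must_link:
        S[i, j] = S[j, i] = 0.0       # force i and j into one class
    for i, j in cannot_link:
        S[i, j] = S[j, i] = low       # force i and j apart
    return S
```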

Similarity measure based on density path
In general, the similarity of the affinity propagation algorithm uses the Euclidean distance. The Euclidean distance between any two points is defined as

d(i, j) = ||X_i - X_j||.

Clustering divides the data samples into several classes: samples of the same class are highly similar and their Euclidean distances are small, while samples of different classes are less similar and their Euclidean distances are relatively large. However, situations such as the one in Fig. 2 can arise: by the definition of clustering, sample 1 and sample 2 should be of the same type, yet in fact sample 1 and sample 3 are of the same type, a contradiction. The reason is that sample density is not considered during clustering. To solve this problem, literature [7] proposed the ε-nearest-neighbor distance, whose purpose is to amplify sample distances in low-density regions and reduce them in high-density regions. It is defined as follows: in the sample space, construct an undirected weighted graph G = (V, E), where V is the vertex set, each vertex corresponding to a sample, and E is the edge set, weighted by the distances between vertices; the ε-neighbor distance of any pair of samples is then computed on this graph.
In the formula, φ1 and φ2 are density adjustment factors: when the distance between two sample points is no larger than ε, the smaller factor φ1 is applied, and otherwise the larger factor φ2 is applied, with the constant 1 subtracted so that the distance of a point to itself remains zero. However, the ε-neighbor distance may stretch points within the same density region: if samples i, j, and k belong to the same class, the distance between sample i and sample k computed with formula (5) can be stretched so much that i and k are assigned to different classes. To solve this problem, literature [7] proposed a manifold similarity, but that formula is too complicated to implement and takes a long time on large data sets. To correctly reflect the similarity between sample points, this paper therefore uses a similarity measure based on density paths. As shown in Fig. 3, there are multiple paths from one vertex to another; we call the path with the smallest total weight the shortest path, and the length of the shortest path the shortest distance [8].
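The exact functional form of formula (5) is not fully recoverable here, so the following sketch uses an exponential rescaling with the "subtract 1" normalization as an assumed stand-in: edges inside the ε-neighborhood are rescaled with the smaller factor φ1, edges outside it with the larger factor φ2, compressing high-density regions relative to low-density ones.

```python
import numpy as np

def eps_neighbor_distance(D, eps, phi1=1.5, phi2=2.5):
    """Density-adjusted edge lengths (illustrative form, not the
    paper's exact formula): phi1 < phi2 are the density adjustment
    factors; subtracting 1 keeps the self-distance at zero."""
    Dw = np.where(D <= eps, phi1 ** D, phi2 ** D) - 1.0
    np.fill_diagonal(Dw, 0.0)
    return Dw
```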
The most common solution to the shortest-path problem is Dijkstra's algorithm [9], which finds shortest paths by single-step search. We use this algorithm to derive a path-based metric from the ε-nearest-neighbor distances: the distance D(i, j) between two samples is the length of the shortest path between them in the weighted graph. D is non-negative, with D(i, j) = 0 if and only if i = j; it is symmetric, D(i, j) = D(j, i); and it satisfies the triangle inequality, D(i, j) ≤ D(i, l) + D(l, j). The similarity matrix of the AP algorithm is then built from this metric function. Since it is obtained from shortest paths, samples in the same high-density area are connected through many short edges, while sample points of different densities are connected by longer edges passing through low-density areas. In this way, sample points in the same density area are classified together as far as possible, while sample points in different density areas are classified into different types, effectively solving the problems raised above.
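Computing all-pairs shortest paths over the density-adjusted graph can be done directly with SciPy's Dijkstra implementation; this sketch negates the result so it can be plugged into AP as a similarity matrix.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def density_path_similarity(Dw):
    """Replace pairwise distances by shortest-path distances over the
    graph whose positive edge weights are the density-adjusted
    distances Dw, then negate for use as an AP similarity matrix."""
    Dpath = shortest_path(Dw, method='D', directed=False)  # 'D' = Dijkstra
    return -Dpath
```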

Experiment and result analysis
The data set is divided into two groups: a training set and a test set. The classifier is first constructed on the training set, and its validity is then verified on the test set. Finally the traditional AP algorithm is compared with the improved algorithm.

Algorithm validity indicator
To effectively evaluate clustering performance, the experiments use the following three indicators. (1) Classification rate (CR). The classification rate represents the ratio of correctly classified samples to the total number of samples in the data set [10], defined as

CR = (1/n) Σ_{k=1}^{K} a_k,

where a_k is the number of correctly classified samples in cluster k and n is the total number of samples. The higher the CR, the more accurate the clustering [11].
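CR can be computed by counting, within each cluster, the majority true label (a minimal sketch of the definition above):

```python
import numpy as np

def classification_rate(pred, true):
    """CR = (1/n) * sum over clusters of the size of the majority
    true-label group in that cluster."""
    pred, true = np.asarray(pred), np.asarray(true)
    total = 0
    for c in np.unique(pred):
        members = true[pred == c]
        _, counts = np.unique(members, return_counts=True)
        total += counts.max()  # correctly classified samples in cluster c
    return total / len(true)
```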
(2) Sum of squared errors (SSE). The sum of squared errors is often used as the objective function when constructing the classifier and represents the classifier's distortion or cohesion [12]. It equals the sum of the squared distances from all samples to their class representative points:

E = Σ_{k=1}^{K} Σ_{X ∈ C_k} ||X - c_k||^2,

where c_k is the representative point of cluster C_k. (3) F-measure. The F-measure is a commonly used indicator for clustering evaluation; it combines the precision and recall of the algorithm to measure the effectiveness of clustering [13]. For an arbitrary cluster C_i and class L_j,

P(i, j) = |C_i ∩ L_j| / |C_i|,  R(i, j) = |C_i ∩ L_j| / |L_j|,  F(i, j) = 2 P R / (P + R).
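The two remaining indicators can be sketched as follows; note that this SSE uses the cluster mean as a stand-in for the representative point, and the overall F-measure takes, for each true class, the best F over clusters weighted by class size (common conventions, assumed here rather than stated in the paper).

```python
import numpy as np

def sse(X, labels):
    """Sum over clusters of squared distances to the cluster mean."""
    labels = np.asarray(labels)
    return sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in np.unique(labels))

def f_measure(pred, true):
    """Overall F-measure: per true class, the best F = 2PR/(P+R)
    over clusters, weighted by class size."""
    pred, true = np.asarray(pred), np.asarray(true)
    n = len(true)
    total = 0.0
    for t in np.unique(true):
        best = 0.0
        for c in np.unique(pred):
            inter = np.sum((pred == c) & (true == t))
            if inter == 0:
                continue
            p = inter / np.sum(pred == c)   # precision of cluster c w.r.t. class t
            r = inter / np.sum(true == t)   # recall of class t in cluster c
            best = max(best, 2 * p * r / (p + r))
        total += np.sum(true == t) / n * best
    return total
```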

Experiment results analysis
This article builds an experimental system in which an artificial ear wears three different styles of hearing aid, denoted A, B, and C, and software artificially adds eight different noises sampled at 16000 Hz: babble, f16, factory, leopard, m109, motorcycle, pink, and volvo from the standard noise library. The noise level is 30 dB. Each hearing aid is set to one of two modes, mode 1 or mode 4; the speech is speech 1 of the standard speech library, divided by software into 3-8 s segments with pauses removed, which improves the detection and clustering quality.

Analysis of results
The data in the experiment are at 30 dB. Since there are hearing aids of 3 different styles, the number of clusters is fixed at 3, and the test results are analyzed by group. The clustering results are mapped into two-dimensional space and displayed by cluster: the horizontal axis represents the sample index and the vertical axis the PESQ score of each sample. From Fig. 6(a) it can be seen that most samples in cluster 1 are hearing aids A and B, with PESQ scores above 3 and relatively more of hearing aid B. From (b), most points in cluster 2 are hearing aid C, with scores mostly below 2. From (c), most points in cluster 3 are hearing aids A and B, with scores mostly above 3. The style classification of the first and third clusters is therefore not very accurate.
The results of the improved AP algorithm are shown in Fig. 7. From Fig. 7(a) it can be seen that most points in cluster 1 are hearing aid A, with PESQ scores above 2.5; in (b), most points in cluster 2 are hearing aid C, with scores mostly below 2; in (c), the points in cluster 3 are all hearing aid B, with scores above 2.5. The three clusters thus represent the three different styles of hearing aid. The comparison shows that the improved AP algorithm accurately distinguishes hearing aids of different styles, and that the voice quality of hearing aids A and B is significantly better than that of C.
The three evaluation indicators of the clustering are shown in Table 1. According to Table 1, the classification rate of the improved AP algorithm is higher than that of the standard AP. The second indicator, the sum of squared errors, reflects the distortion or cohesion of the clusters: the smaller the value, the smaller the distortion, and the improved algorithm's value is smaller, showing less distortion and a better algorithm. The third indicator, the F-measure, is more accurate the larger it is, and the comparison again shows that the improved algorithm is more accurate.
In summary, at a noise level of 30 dB the improved AP algorithm is superior to the conventional AP algorithm. It can distinguish hearing aids of different styles and show which hearing aid has better speech quality, achieving good results in hearing aid classification.

Conclusion
Experiments show that clustering can be applied well to the speech-quality classification of hearing aids, and the improved algorithm proposed in this paper is clearly better than the traditional AP algorithm: its classification rate is high, its squared error is small, and its F-measure is large. The resulting clusters represent hearing aids of different styles, indirectly confirming that hearing aids A and B have better speech quality than C. The changes of the indicators across the tables also show that noise affects the hearing aid's voice quality and therefore has a certain impact on classification accuracy. The main future work is to improve clustering accuracy at higher noise levels.