Hemolysis detection based on an SVM-based AdaBoost classification algorithm

Because clinical hemolysis is difficult to observe and judge, an AdaBoost learning classification method based on SVM is proposed. The method first extracts basic features of the target area of the blood sample, namely the mean gray level, the standard deviation of the gray level and the frequency of particle appearance, as the input feature vector, and trains SVM weak learners. The AdaBoost algorithm then combines the set of weak learners by linear weighting to obtain a strong learner. Finally, in online testing, the degree of hemolysis of each test sample is calculated and the sample is classified. The SVM-based AdaBoost classification test is compared with naked-eye observation and red blood cell counting. The experimental results show that the learning-based classification method achieves higher detection accuracy, is free of subjective factors, and has the highest detection efficiency.


Introduction
Abnormal destruction of red blood cells can cause many diseases [1][2]. In clinical practice, hemolysis is mostly diagnosed by medical staff observing hemolysis and agglutination reactions [3][4]. Although this method is simple, the observation of small amounts of hemolysis is susceptible to subjective factors, so it is suitable only for samples with obvious hemolysis. Moreover, to make correct judgments accurately and quickly, medical staff need a long period of learning to accumulate experience, and work efficiency is relatively low.
Machine learning [5][6] draws on the understanding of human learning and thinking developed by cognitive science and physiology, and establishes cognitive or computational models of the cognition process based on human thought. Within machine learning, SVM, which is based on structural risk minimization, can be used for classification and regression analysis [7], so the SVM classification algorithm is used here to classify hemolytic samples. However, because the SVM algorithm has a smoothing effect on the data, it predicts samples with abrupt-change characteristics poorly. We therefore introduce the AdaBoost algorithm [8] to enhance the SVM classifier: the sample weights used in SVM learning are adjusted and training is repeated to obtain different SVM classifiers until the preset number of classifiers is reached; all SVM classifiers are then linearly weighted to obtain a strong classifier. Comparison with other methods shows that classifying blood samples with the SVM-based algorithm yields fast, effective and accurate hemolysis judgments.

SVM Machine Learning Principle
SVM is a supervised learning model in machine learning, typically used for regression analysis, classification and pattern recognition [9][10]. Using a nonlinear mapping, SVM maps samples that are linearly inseparable in a low-dimensional space into a high-dimensional feature space where they become linearly separable, so that a linear algorithm can be used to analyze the nonlinear characteristics of the samples. The SVM learning strategy is margin maximization, which can ultimately be reduced to solving a convex quadratic programming problem [11].
Consider first the linearly separable case. Assume two classes of data are represented by circles and crosses in a two-dimensional plane (as shown in Fig. 1). Because the data are linearly separable, the two classes can be separated by a straight line, which we regard as a hyperplane. Data points on one side of the hyperplane correspond to -1 and those on the other side correspond to 1, as shown in Fig. 1(b). The greater the "spacing" (margin) between the hyperplane and the data points, the higher the accuracy of the classification. To make the classification more accurate, one must select the hyperplane that maximizes this margin, as shown in Fig. 1(c).
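The choice between candidate hyperplanes can be illustrated with a minimal sketch. It assumes the usual definition of the geometric margin as the smallest label-signed distance from a sample to the line; the toy points, the two candidate lines and all function names are hypothetical and not taken from the experiments.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest label-signed distance y * (w.x + b) / ||w|| over all samples."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical toy data: two classes in the plane, labelled -1 and +1.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 3.0), (4.0, 3.0)]
labels = [-1, -1, 1, 1]

# Two candidate separating lines of the form w.x + b = 0.
candidates = {
    "x1 = 2": ((1.0, 0.0), -2.0),
    "x1 + x2 = 3.5": ((1.0, 1.0), -3.5),
}

margins = {
    name: geometric_margin(w, b, points, labels)
    for name, (w, b) in candidates.items()
}
# The preferred hyperplane is the one with the largest margin.
best = max(margins, key=margins.get)
print(best, margins[best])
```

Both lines separate the toy data, but the diagonal line leaves more room on each side, which is the criterion a maximum-margin classifier optimizes.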
The objective function of the maximum-margin classifier is defined as $\max \tilde{\gamma}$, subject to the constraints $y_i(\omega^T x_i + b) = \hat{\gamma}_i \geq \hat{\gamma}$, $i = 1, \dots, n$, where $y_i \in \{-1, 1\}$ is the class label of sample $x_i$ and $\hat{\gamma}$ is the functional margin. The geometric margin is defined as $\tilde{\gamma} = \hat{\gamma} / \|\omega\|$. Fixing $\hat{\gamma} = 1$, the above objective function is transformed into $\max 1/\|\omega\|$, subject to $y_i(\omega^T x_i + b) \geq 1$, $i = 1, \dots, n$, which is equivalent to $\min \frac{1}{2}\|\omega\|^2$ under the same constraints. Since this objective function is quadratic and the constraints are linear, the problem is a convex quadratic program and can be solved with a QP (Quadratic Programming) optimization package.
We now turn to the general, linearly inseparable case. In practice the data are usually not linearly separable, so no maximum-margin separating hyperplane can be found (Fig. 2(a)). In this case, SVM constructs a new discriminant function by selecting a kernel function $k(\cdot, \cdot)$, which maps the data into a high-dimensional space where they become linearly separable, as shown in Fig. 2. Commonly used kernel functions include homogeneous $d$-order polynomials, inhomogeneous $d$-order polynomials, radial basis functions and Gaussian radial basis functions. It is worth noting that the choice of kernel function during training is often not critical: classifiers built on several simple kernel functions usually perform comparably [12].
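To make the kernel families concrete, the following is a minimal sketch of a $d$-order polynomial kernel (with and without a constant term) and a Gaussian radial basis function kernel. The parameter names `d`, `c` and `gamma` and their default values are assumptions chosen for illustration; the source does not state which kernel or parameters were used in the experiments.

```python
import math

def poly_kernel(x, z, d=2, c=0.0):
    """d-order polynomial kernel (x.z + c)^d; c = 0 drops the constant term."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** d

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian radial basis function kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Two arbitrary example vectors.
x, z = (1.0, 2.0), (3.0, 0.5)

k_poly_plain = poly_kernel(x, z, d=2)          # (1*3 + 2*0.5)^2
k_poly_const = poly_kernel(x, z, d=2, c=1.0)   # (1*3 + 2*0.5 + 1)^2
k_rbf = rbf_kernel(x, z, gamma=0.5)            # exp(-0.5 * ((1-3)^2 + (2-0.5)^2))
print(k_poly_plain, k_poly_const, k_rbf)
```

Each function returns the inner product of the two samples in the implicit high-dimensional space without ever constructing that space, which is what lets a linear method handle data that are not linearly separable in the original coordinates.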

Feature extraction of blood samples
In classifying hemolysis images, three features of the target area are extracted: the mean gray level $\bar{g}$, the gray-level standard deviation $\sigma$, and the particle frequency $fren$. The feature vector formed from them is used as the input, and the evaluation score and classification $s_i$ as the output. Through training, the nonlinear function relating the evaluation score and classification $s_i$ to the feature vector is obtained; this is the training model. Once the learning model is available, online testing can be carried out: the image is acquired online, the target area is selected, the three features of the area are calculated and fed to the learning model, and the model outputs the evaluation score and classification for the target area.
Feature extraction for a hemolysis target-area image sample covers, in order, the mean gray-level feature, the gray-level standard deviation feature and the particle frequency feature.

Gray average feature
The mean gray value of the target-area image mainly reflects how far the blood has dissolved. When the blood is completely undissolved, the illumination passes completely through the blood test tube, and the mean gray value of the target-area image is very high. When part of the blood is dissolved, only part of the illumination passes through the test tube, and the mean gray value of the target-area image is relatively low because less light is transmitted. The mean gray value of the pixels in the target area is $\bar{g} = \frac{1}{n}\sum_{i=1}^{n} g_i$, where $g_i$ is the gray value of the $i$-th pixel in the target area and $n$ is the total number of pixels in the target area.

Gray standard deviation characteristics
The gray standard deviation of the target-area image mainly reflects the degree of blood-soluble compatibility, that is, how uniform the intensity of the target area is. When the blood is completely undissolved, the illumination passes completely through the blood test tube; the mean gray value of the target-area image is very large and the gray standard deviation is small, sometimes even 0. When the blood is completely dissolved, very little light passes through the test tube; the mean gray value is very small and the gray standard deviation is also relatively small. When the blood is partially dissolved, part of the illumination passes through the test tube and the gray standard deviation is large. When the mean of the target area is large, a larger gray standard deviation indicates better blood-matching consistency; conversely, when the mean of the target area is small, a larger gray standard deviation indicates worse blood-matching consistency. In other words, when the gray standard deviation of the target area is relatively small, it reinforces the influence of the gray mean on the degree of matching; when it is relatively large, it counteracts that influence. The gray standard deviation is $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(g_i - \bar{g})^2}$, where $g_i$ is the gray value of the $i$-th pixel, $\bar{g}$ is the mean gray value and $n$ is the total number of pixels in the target area.
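As a minimal sketch of the two gray-level features, the computation could look as follows. It assumes the target area is given as a flat list of pixel gray values and that the population form of the standard deviation (division by $n$) is intended; the function name and the toy pixel lists are illustrative only.

```python
import math

def gray_mean_std(pixels):
    """Return (mean, standard deviation) of the gray values in the target area."""
    n = len(pixels)
    mean = sum(pixels) / n
    # Population variance (divide by n); this choice is an assumption.
    variance = sum((g - mean) ** 2 for g in pixels) / n
    return mean, math.sqrt(variance)

# Hypothetical target areas with 16 pixels each.
uniform_bright = [200] * 16             # light passes everywhere
half_and_half = [40] * 8 + [200] * 8    # light passes through only part of the area

print(gray_mean_std(uniform_bright))    # high mean, zero deviation
print(gray_mean_std(half_and_half))     # lower mean, large deviation
```

The uniform area has a standard deviation of 0, while the mixed area has both a lower mean and a large deviation, matching the qualitative behaviour described for the fully transmitting and partially transmitting cases.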

Particle frequency characteristics
In the hemolysis process, the blood sometimes condenses into granules. These granules are extracted from the target area as particle features, the number of such particles is counted, and their frequency of occurrence is calculated (particle frequency = number of particles / number of pixels in the target-area image). When the blood is completely incompatible, there is little coagulation and the particle frequency in the target area is relatively small, but the gray value of the target area is very large. When the blood is completely matched, there is also little coagulation and the particle frequency is again relatively small, but the gray value of the target area is relatively small. When the blood is partially matched, it condenses into granules and the particle frequency in the target area is relatively large. When the gray value of the target area is relatively large, a higher particle frequency indicates a better degree of blood matching; conversely, when the mean gray value of the target area is relatively small, a higher particle frequency indicates a worse degree of blood matching. That is, when the particle frequency in the target area is relatively small, it reinforces the influence of the gray mean on the degree of blood matching; when the particle frequency is relatively large, it counteracts that influence.
The particle frequency of the target area is calculated as follows:
(a) Denoise the target-area image with a Gaussian template;
(b) Compute the Sobel gradient image of the denoised image;
(c) Threshold the Sobel gradient image to obtain a binary image;
(d) Count the connected domains of the binary image, which gives the number of particles $n_p$;
(e) Calculate the particle frequency $fren = n_p / n$, where $n_p$ is the number of particles and $n$ is the number of pixels in the target-area image.
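Steps (a) to (e) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the implementation used in the experiments: the 3x3 Gaussian template, edge replication at the image border, 8-connectivity, the threshold value of 100 and all function names are choices made here for the example.

```python
import math
from collections import deque

GAUSS = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]        # 3x3 Gaussian template, weights sum to 16
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def filter3(img, kernel):
    """Apply a 3x3 template to a 2-D list image, replicating edge pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    acc += kernel[dr + 1][dc + 1] * img[rr][cc]
            out[r][c] = acc
    return out

def count_components(binary):
    """Count 8-connected foreground regions with a breadth-first search."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if not binary[r][c] or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < h and 0 <= nc < w
                                and binary[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
    return count

def particle_frequency(img, threshold=100.0):
    """Return (n_p, fren) for a 2-D list of gray values."""
    smooth = [[v / 16.0 for v in row] for row in filter3(img, GAUSS)]   # (a)
    gx = filter3(smooth, SOBEL_X)                                       # (b)
    gy = filter3(smooth, SOBEL_Y)
    binary = [[math.hypot(a, b) > threshold for a, b in zip(row_x, row_y)]
              for row_x, row_y in zip(gx, gy)]                          # (c)
    n_p = count_components(binary)                                      # (d)
    n = len(img) * len(img[0])
    return n_p, n_p / n                                                 # (e)

# Hypothetical 20x20 target area: bright background with two dark 3x3 granules.
image = [[255] * 20 for _ in range(20)]
for r0, c0 in ((4, 4), (13, 13)):
    for r in range(r0, r0 + 3):
        for c in range(c0, c0 + 3):
            image[r][c] = 0

n_p, fren = particle_frequency(image)
print(n_p, fren)
```

In this sketch each granule is counted through the ring of strong gradient around it, so granules that touch or lie within a few pixels of each other would merge into one connected domain; the threshold would need tuning on real images.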

Learning training based on SVM algorithm
1) Training database collection: First, images are collected and the target region of the blood distribution is selected in each image; the feature vector of the target area is then extracted, and the hemolysis degree score and classification $s$ of the target area are assigned. These make up the image sample database and the data collection.
2) Support vector computation: Assume there are $N$ independent and identically distributed blood samples, where $x_i$ is an input feature vector of dimension 3 and $s_i$ is the corresponding hemolysis score and class label. The goal of SVM is to find, in the feature space, the linear discriminant that maximizes the class margin. With the kernel function written as $k(x_i, x_j) = \phi(x_i)^T \phi(x_j)$, where $\phi$ is the nonlinear mapping into the feature space, the discriminant function is $f(x) = \omega^T \phi(x) + b$, where $\omega$ is the weight coefficient and $b$ represents the offset of the separating hyperplane. Using Lagrangian duality, the discriminant function can be rewritten as $f(x) = \sum_{i=1}^{N} \alpha_i s_i k(x_i, x) + b$, where the $\alpha_i$ are the Lagrange multipliers. The objective function is $\max_{\alpha} \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j s_i s_j k(x_i, x_j)$, subject to $\alpha_i \geq 0$ and $\sum_{i=1}^{N} \alpha_i s_i = 0$, where $b$ again represents the offset of the separating hyperplane.
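Once training has produced the multipliers, classification reduces to evaluating the discriminant function on a new feature vector. The sketch below assumes the usual kernel expansion $f(x) = \sum_i \alpha_i s_i k(x_i, x) + b$ with class labels $s_i \in \{-1, 1\}$ and a Gaussian kernel. The support vectors, multipliers, offset and the scaling of the three features to roughly $[0, 1]$ are hypothetical values chosen for illustration, not results from the paper.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel; one possible choice of k (an assumption here)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def decision_function(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """Evaluate f(x) = sum_i alpha_i * s_i * k(x_i, x) + b."""
    return sum(
        a * s * kernel(sv, x)
        for a, s, sv in zip(alphas, labels, support_vectors)
    ) + b

def classify(x, support_vectors, labels, alphas, b):
    """Return the class label +1 or -1 from the sign of f(x)."""
    return 1 if decision_function(x, support_vectors, labels, alphas, b) >= 0 else -1

# Hypothetical trained model. Each vector is
# (mean gray level, gray standard deviation, particle frequency),
# with gray values assumed to be rescaled to [0, 1].
support_vectors = [(0.9, 0.05, 0.001), (0.2, 0.10, 0.004)]
labels = [1, -1]
alphas = [1.0, 1.0]     # satisfies sum_i alpha_i * s_i = 0
b = 0.0

query = (0.85, 0.06, 0.001)
label = classify(query, support_vectors, labels, alphas, b)
print(label)
```

Only samples with nonzero multipliers contribute to the sum, which is why the trained model needs to store the support vectors rather than the whole training set.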

Strong classifier learning based on adaboost algorithm
The AdaBoost algorithm can upgrade a weak learner to a strong learner [13][14]. Its basic idea is to learn a weak learner from the training samples, change the sample weights according to the learning result, learn another weak learner from the reweighted training samples, and repeat until the number of learners reaches the preset value $T$; the $T$ weak learners are then weighted to obtain a strong learner.
In this paper, we apply the standard SVM algorithm as the weak learner in the AdaBoost algorithm to improve prediction accuracy, as shown in Fig. 3.
(4) Update the training data set weights.
Step 3. Construct a linear combination of weak classifiers.
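Since only part of the step list survives in the text, the reweighting and linear-combination steps are illustrated here with a minimal sketch of standard discrete AdaBoost for labels in $\{-1, 1\}$. To keep the example self-contained, a simple threshold stump is swapped in for the SVM weak learner used in this paper; the function names and the one-dimensional toy data are likewise assumptions.

```python
import math

def train_stump(X, y, w):
    """Stand-in weak learner: the threshold rule with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                pred = [sign if x[j] >= thr else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    err, j, thr, sign = best
    return err, (lambda x, j=j, thr=thr, sign=sign: sign if x[j] >= thr else -sign)

def adaboost(X, y, T):
    """Train T weak learners; return a list of (alpha, learner) pairs."""
    n = len(X)
    w = [1.0 / n] * n                      # start from uniform sample weights
    ensemble = []
    for _ in range(T):
        err, h = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)        # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # weight of this learner
        ensemble.append((alpha, h))
        # Raise the weights of misclassified samples, lower the others.
        w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Strong classifier: sign of the weighted linear combination."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data that no single threshold rule can classify correctly.
X = [(float(i),) for i in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single = adaboost(X, y, T=1)
boosted = adaboost(X, y, T=3)
errors_single = sum(predict(single, x) != yi for x, yi in zip(X, y))
errors_boosted = sum(predict(boosted, x) != yi for x, yi in zip(X, y))
print(errors_single, errors_boosted)
```

On this toy set a single stump misclassifies three samples, while the weighted combination of three stumps classifies all ten correctly. Replacing `train_stump` with an SVM trained on the weighted samples would give the structure described in this section.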

Experiment Analysis
After the target-area sample library was established, 5 blood sample images were collected, the detection target area was selected in each, and the gray mean feature, gray standard deviation feature and particle frequency feature of the target area were calculated. The classification and score of each target region were then predicted with the learning model presented in this paper; the results are shown in Fig. 4 and Fig. 5. In the experiments, the sample library was divided into two parts: one part was used to train the SVM-based AdaBoost learning model, and the other served as a control, whose scores and classifications were predicted through feature extraction and the loaded learning model and then compared with the actual labeled scores and classifications. A total of 1600 samples were constructed from 11 groups of samples: 100 samples were taken from each group to form 1100 training samples for the SVM-based AdaBoost learning model, and the remaining 500 samples from the 11 groups were used for prediction. The experimental control and training sets are shown in Table 1. The SVM-based AdaBoost learning method is compared with naked-eye observation and RBC counting [15]. The measured data are shown in Table 1. The experimental results show that, for samples whose hemolysis can be observed by the naked eye, both this algorithm and the RBC counting method put the degree of hemolysis at 60 or above.
However, for samples lying between hemolyzed and non-hemolyzed, or otherwise uncertain samples, individuals judge differently, so naked-eye results are affected by subjective error. Although the red blood cell counting method is free of subjective factors, its procedure is laborious and its experimental period long, and erythrocytes are prone to rupture during the experiment, which introduces errors into the measured values; its practicality for hemolysis detection is therefore limited. The proposed SVM-based AdaBoost classification algorithm is not affected by subjective factors and provides, simply and effectively, hemolysis judgments similar to those of the erythrocyte counting method.

Summary
Clinical hemolysis detection is mostly performed by the naked eye, so it is affected by individual subjective factors and its efficiency is low; in particular, when the degree of hemolysis is not easy to observe, observation often fails. In this paper, a new AdaBoost classification method based on SVM is proposed. Through feature extraction and the hemolysis scoring and classification of samples, the method learns from a large number of samples and obtains a strong classification model suitable for hemolysis detection. Experimental verification shows that the detection algorithm fully reaches the accuracy of manual detection, with fast detection speed and high detection efficiency, which saves considerable manpower and significantly improves work efficiency and detection precision.