Classification of leukemia diseased cells

This paper studies and analyzes the characteristics of leukemia cells and the existing methods for classifying diseased cells. To address the problem that leukemia cell classification involves many categories and a high feature dimensionality, a support vector machine with kernel methods is applied, and experiments verify the advantages of this method.


Introduction
Leukemia is a malignant tumor of the hematopoietic system and is currently the most common malignancy threatening the lives and health of children and young adults in the country. Blood diseases occur frequently in modern society, and how to detect them correctly and propose effective, reliable treatment options based on the detection results has become a major issue in medicine today. Blood diseases arise when hematopoietic dysfunction changes the number, shape, proportion, and other properties of blood cells, so correct identification of blood cells is the prerequisite for correctly detecting blood disease. At present, there is little research on the analysis of microscopic images of leukemia, and we hope to design a system for identifying and classifying leukemia cells in microscopic blood images.

Blood diseases and typing
According to the current international classification, which combines cell morphology, tissue staining, flow cytometry, and immunohistochemistry, leukemia is divided into:
1. Acute leukemia: acute non-lymphocytic leukemia and acute lymphocytic leukemia.
2. Chronic leukemia:
1) chronic myelogenous leukemia (CML): myeloid hyperplasia in the bone marrow, with hyperplasia of intermediate-stage cells;
2) chronic lymphocytic leukemia (CLL): mature lymphocytes in the bone marrow at 40%.
The test samples were collected from blood films of chronic myelogenous leukemia (also known as chronic myeloid leukemia). Blood films of chronic myeloid leukemia show segmented neutrophils and rod-shaped (band) granulocytes, which are the cell categories studied in this test.

Existing classification methods
The process of grouping the objects to be recognized according to selected features and determining their categories is called classification. Based on an appropriate decision rule, this process partitions the feature space into regions corresponding to the different sample classes. In practice, the categories under consideration often have similar attributes, so classification errors are inevitable and classification can only be achieved with a certain error rate. Obviously, the smaller the classification error rate, the better. However, the error rate depends on many conditions: the classification method, the classifier design, the selection of samples, the extracted features, and other factors all affect the classification results. By methodology, classification methods can be divided into four categories: statistical decision methods, syntactic (structural) methods, fuzzy decision methods, and artificial intelligence methods. The first two are classic pattern recognition techniques, and introducing fuzzy mathematics into them has greatly improved classification results. Since their revival in the 1980s, artificial neural networks, which can exploit more global correlations among features, have achieved many successes in pattern recognition that were difficult to obtain with traditional methods. Table 3-1 lists methods commonly used in blood cell classification.
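To make the decision-rule view concrete, here is a minimal sketch of one statistical decision method: a hypothetical nearest-mean rule that partitions the feature space by assigning each sample to the class whose mean feature vector is closest, together with the error rate used to judge any classifier. The function names are illustrative assumptions, and the rule is not necessarily one of the methods listed in Table 3-1.

```python
def class_means(X, y):
    """Mean feature vector of each class (X: list of feature vectors, y: labels)."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        if yi not in sums:
            sums[yi] = [0.0] * len(xi)
            counts[yi] = 0
        sums[yi] = [s + v for s, v in zip(sums[yi], xi)]
        counts[yi] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def nearest_mean_predict(x, means):
    """Decision rule: pick the class whose mean is nearest in squared Euclidean distance."""
    return min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))


def error_rate(y_true, y_pred):
    """Fraction of samples whose predicted class differs from the true class."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)
```

Any decision rule can be plugged in place of `nearest_mean_predict`; the error rate is what the discussion above asks to be as small as possible.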
As the above discussion and previous work show, blood cell classification involves many categories and a high feature dimensionality, so classifier selection, structural design, classification speed, accuracy, and similar issues must all be considered in future research.

Blood cell classification with the support vector machine kernel method
To address the problems of many categories and high feature dimensionality in blood cell classification described above, we use support vector machine (SVM) classification. Support vector machines can handle classification with high-dimensional features.
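A standard SVM separates only two classes, so a multi-category problem has to be decomposed into binary ones. The paper does not state which decomposition it uses; the sketch below assumes the common one-vs-one scheme, in which k classes need k(k-1)/2 pairwise classifiers and the class collecting the most votes wins. `binary_decision` is a hypothetical callback standing in for a trained pairwise SVM.

```python
from collections import Counter
from itertools import combinations


def num_binary_classifiers(k):
    """Number of pairwise SVMs needed by one-vs-one for k classes."""
    return k * (k - 1) // 2


def one_vs_one_predict(x, classes, binary_decision):
    """Vote over all class pairs.

    binary_decision(i, j, x) is assumed to return the winner (i or j)
    of the pairwise classifier trained on classes i and j.
    """
    votes = Counter()
    for i, j in combinations(classes, 2):
        votes[binary_decision(i, j, x)] += 1
    # max() returns the first class with the highest vote count,
    # so ties are broken by the order of `classes`.
    return max(classes, key=lambda c: votes[c])
```

The voting scheme keeps each binary subproblem small, at the cost of a quadratic number of classifiers in k.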

Linearly separable support vector machine
Support vector machines can be divided into linearly separable support vector machines and kernel support vector machines. We first examine whether the classification problem is linearly separable, that is, whether it can be solved with a linear support vector machine.
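For reference, the standard textbook hard-margin formulation of a linear SVM is shown below (the paper does not give its own equations), for training pairs $(\mathbf{x}_i, y_i)$ with $y_i \in \{-1, +1\}$. If no $(\mathbf{w}, b)$ satisfies every constraint, the data are not linearly separable and slack variables weighted by a penalty factor $C$ must be introduced.

```latex
\min_{\mathbf{w},\,b}\ \frac{1}{2}\|\mathbf{w}\|^{2}
\quad \text{s.t.} \quad y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1,
\quad i = 1, \dots, n,
\qquad
f(\mathbf{x}) = \operatorname{sign}\left(\mathbf{w}^{\top}\mathbf{x} + b\right)
```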
We applied a linear support vector machine with soft-margin optimization, using 120 samples for training and 30 samples for testing; the results are shown in Table 4-1, which lists the soft-margin optimization results for different penalty factors. From the table, we find that the larger the penalty factor, the fewer support vectors are obtained, which suggests better generalization of the learning machine. Moreover, the larger the penalty factor, the higher the test accuracy. This indicates that the problem is approximately linearly separable, but a linear classifier cannot achieve the best results.
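The soft-margin optimization described above can be sketched as minimizing ½‖w‖² + C·Σᵢ max(0, 1 − yᵢ(w·xᵢ + b)), where C is the penalty factor. The code below is a minimal sketch assuming plain full-batch subgradient descent with a fixed step size; this solver, its function names, and the default `epochs` and `lr` values are illustrative choices, not the training procedure used in the paper (dedicated SVM solvers work on the dual problem instead).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_soft_margin_svm(X, y, C=1.0, epochs=500, lr=0.01):
    """Subgradient descent on 0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b)).

    X: list of feature vectors; y: labels in {-1, +1}.
    """
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        gw = list(w)  # gradient of the 0.5*||w||^2 regularizer
        gb = 0.0
        for xi, yi in zip(X, y):
            # Hinge term is active when the sample lies inside the margin
            # or on the wrong side of the hyperplane.
            if yi * (dot(w, xi) + b) < 1:
                for k in range(d):
                    gw[k] -= C * yi * xi[k]
                gb -= C * yi
        w = [wk - lr * gk for wk, gk in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    """Linear decision function sign(w.x + b), mapping 0 to +1."""
    return 1 if dot(w, x) + b >= 0 else -1
```

A larger C weights margin violations more heavily relative to the margin width, which is the trade-off varied in Table 4-1.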

Kernel Methods
If the kernel function method is used, which nonlinear classifier works best? Selecting a nonlinear classifier amounts to choosing the type of kernel function and its parameters. Research on kernels is not yet thorough: not all possible kernel functions and their advantages for classification have been studied, and so far only a few common kernels have been shown to perform well on classification problems. These kernel functions are: We tested classification with a polynomial kernel and a Gaussian kernel, choosing several parameter values for each kernel. Again, 120 training samples and 30 test samples were used. The results are shown in Table 4-2.
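The two kernels compared here can be written as small functions. The sketch below assumes the LIBSVM-style parameterization, polynomial K(x, z) = (γ·x·z + coef0)^degree and Gaussian (RBF) K(x, z) = exp(−γ‖x − z‖²); the paper does not give its formulas, so the parameter names `gamma`, `coef0`, and `degree` are assumptions. The kernel (Gram) matrix is what a kernel SVM uses in place of the inner products x·z of the linear case.

```python
import math


def poly_kernel(x, z, gamma=1.0, coef0=1.0, degree=3):
    """Polynomial kernel (gamma * <x, z> + coef0) ** degree."""
    return (gamma * sum(a * b for a, b in zip(x, z)) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def gram_matrix(X, kernel, **params):
    """Kernel matrix K[i][j] = kernel(X[i], X[j], **params)."""
    return [[kernel(xi, xj, **params) for xj in X] for xi in X]
```

The RBF kernel depends only on the distance between samples, so γ controls how quickly similarity decays with distance.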
Comparing Table 4-2 with Table 4-1 shows that the kernel function method achieves higher classification accuracy than the linear soft-margin method. Moreover, the radial basis function (RBF) kernel gives better results than the polynomial kernel. Having selected the RBF kernel, we tried several of its parameter settings, where Gamma is the RBF kernel parameter and C is the soft-margin penalty factor. After testing a number of parameter values, we found that the classification accuracy reaches its maximum at Gamma = 2, C = 2; increasing Gamma and C further does not raise the correct classification rate. The learning machine obtained with these parameters is therefore the most suitable white blood cell (WBC) classifier we obtained, with the best prediction accuracy of 93.496%. Its confusion matrix is shown in Table 4-3, where the entry in row i, column j is the probability that a cell of class i is classified as class j.
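The parameter search and the confusion matrix described above can be sketched as follows. `evaluate(gamma, C)` is a hypothetical callback assumed to train an RBF-kernel SVM and return its test accuracy; the grid search keeps the first (smallest) setting that reaches the best accuracy, matching the observation that larger Gamma and C brought no further gain. The confusion matrix is normalized per row, so entry (i, j) estimates the probability that a class-i cell is assigned to class j, and each row sums to 1.

```python
def grid_search(evaluate, gammas, cs):
    """Return (best_accuracy, gamma, C); ties keep the earliest setting."""
    best = None
    for g in gammas:
        for c in cs:
            acc = evaluate(g, c)
            if best is None or acc > best[0]:
                best = (acc, g, c)
    return best


def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, classes):
    """Row-normalized confusion matrix: entry [i][j] estimates P(predicted j | true i)."""
    index = {c: k for k, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([n / total if total else 0.0 for n in row])
    return probs
```

The diagonal of the normalized matrix gives the per-class correct classification rate, while off-diagonal entries show which cell types are confused with each other.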

MATEC Web of Conferences
The cell classification experiments with support vector machines and kernel methods demonstrate that the SVM kernel function method is applicable to blood cells, indicating that it has good prospects for pattern recognition in leukemia classification.