Classification of acute myeloid leukemia subtypes M1, M2 and M3 using active contour without edge segmentation and momentum backpropagation artificial neural network

Acute Myeloid Leukemia (AML) is a type of cancer that attacks white blood cells of the myeloid line. AML has eight subtypes: M0, M1, M2, M3, M4, M5, M6 and M7. Subtypes M1, M2 and M3 involve the same cell type, the myeloblast, so distinguishing them requires more detailed analysis. To overcome this obstacle, this research applies digital image processing with Active Contour Without Edge (ACWE) segmentation and a momentum backpropagation artificial neural network to classify AML subtypes M1, M2 and M3 based on cell type. Six features were obtained from every cell by feature extraction and used as training parameters: cell area, perimeter, circularity, nucleus ratio, mean and standard deviation. The results show that ACWE can segment white blood cells with a success rate of 83.789% over 876 total cell objects. Whole AML slides were then identified from the counts of cell types predicted by the network trained with momentum backpropagation. Five calibration tests with the best parameters yielded average values of 84.754% precision, 75.887% sensitivity, 95.090% specificity and 93.569% accuracy.

Leukemia can be detected by counting the total number of blood cells and comparing the numbers of white and red blood cells under a microscope, a morphological examination performed by haematologists. However, this microscopic counting procedure is quite time-consuming and tiring [9]. Microscopic inspection of bone marrow blood images also yields variable results and lacks clear standards, because the process relies heavily on the haematologist's expertise [20]. Other techniques used to obtain blood cell counts, such as immunophenotyping, flow cytometry and molecular probing, remain standard diagnostics for leukemia but are relatively costly [17][22]. To help overcome these problems, we propose a combination of digital image processing and an artificial neural network [18,23].
This research combined Active Contour Without Edge (ACWE) segmentation for WBC segmentation in blood microscopic images with an artificial neural network trained by the momentum backpropagation algorithm, using the morphology and colour features of the three cell types mentioned above to identify AML subtypes M1, M2 and M3.

Proposed Method
The research began with image acquisition of blood preparations. The image data were enhanced, segmented and then feature-extracted to obtain the feature values used as input data in the training and testing stages. The results of the cell type classification were used to identify which AML class was contained in each preparation. The research steps are shown in Figure 1.

Image Acquisition
AML M1, M2 and M3 preparations were obtained from Dr. Sardjito Hospital Yogyakarta through an ethical clearance procedure. Image data were taken using a 21-megapixel camera attached to the ocular lens of an Olympus microscope at 1000× magnification. Thirty-three shots were taken of each preparation at different slide positions.

Image Enhancement and Segmentation
Image enhancement for reducing image noise was conducted by applying a median filter to the original image [7,19]. Image segmentation consists of two important parts: WBC body segmentation and nucleus segmentation.
The concept of body segmentation is to use ACWE as the primary method for segmenting WBC candidates. ACWE minimizes the difference in energy value between background and foreground; its stopping term does not depend on the image gradient but on a particular segmentation of the image [1,5]. Having obtained intact white blood cell images, overlapping cells were separated using the watershed distance transform [19,24]. The result of the WBC body segmentation was a binary mask, which was then multiplied pixel-wise with the corresponding pixels of the original image to obtain the WBC object in the RGB channels. The segmentation flow chart of the WBC body is shown in Figure 2.
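The body segmentation steps above can be sketched with scikit-image, assuming its `morphological_chan_vese` implementation of ACWE and a distance-transform watershed; the iteration count and peak spacing below are illustrative values, not the authors' settings:

```python
# Sketch of the WBC body segmentation pipeline: ACWE (Chan-Vese) followed by
# a distance-transform watershed to split overlapping cells.
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import morphological_chan_vese, watershed
from skimage.feature import peak_local_max

def segment_wbc_bodies(gray, iterations=100):
    """Return a labelled mask of WBC candidates from a grayscale image."""
    # 1. ACWE evolves a contour by minimising the difference between mean
    #    foreground and background intensities -- no image gradient needed.
    mask = morphological_chan_vese(gray, iterations).astype(bool)
    # 2. Distance-transform watershed separates touching/overlapping cells:
    #    local maxima of the distance map seed one marker per cell.
    distance = ndi.distance_transform_edt(mask)
    peaks = peak_local_max(distance, min_distance=15, labels=mask)
    markers = np.zeros(distance.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    return watershed(-distance, markers, mask=mask)
```

Multiplying `labels == k` (broadcast over the three channels) with the original RGB image then yields each cell object in colour, as described above.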

Fig 2. WBC body segmentation flowchart
The nucleus needed to be segmented to calculate the nucleus ratio. The nucleus image was converted to the HSI colour space because the nucleus has high saturation, which makes thresholding on the saturation channel easier [2]. The segmentation flowchart of the nucleus is shown in Figure 3.
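As a sketch of this step, the HSI conversion can be approximated with scikit-image's HSV transform (an assumption, as `rgb2hsv` is the closest built-in), thresholding the saturation channel with Otsu's method in place of a hand-picked threshold:

```python
# Sketch of nucleus segmentation: the stained nucleus has high saturation,
# so a threshold on the saturation channel isolates it.
import numpy as np
from skimage.color import rgb2hsv
from skimage.filters import threshold_otsu

def segment_nucleus(rgb):
    """Binary nucleus mask from an RGB WBC image (floats in [0, 1])."""
    saturation = rgb2hsv(rgb)[..., 1]   # nucleus stain -> high saturation
    t = threshold_otsu(saturation)      # automatic threshold (assumption)
    return saturation > t
```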

Feature Extraction
The feature extraction produced six numerical values representing image characteristics: cell area, cell perimeter, cell circularity, nucleus ratio, mean and standard deviation. These six features were then used as the artificial neural network inputs.

Cell Area: the sum of white pixels in the binary segmented cell [18]:

A = Σ(x=1..n) Σ(y=1..m) i(x, y)    (1)

The variable i(x, y) is the pixel value in the image f(x, y). Variable A defines the area, n is the number of image rows and m is the number of image columns.

Cell Perimeter: the sum of white pixels removed when the binary segmented cell is eroded by a 1-pixel strel [18]:

p = A − Area(A ⊖ B)    (2)

Variable B is an image-morphology strel, A is the area of the object and p is the edge of the object. All variables are in units of pixels.

Circularity: the roundness of the cell, within the range 0–1 [18]:

C = 4πA / p²    (3)

A is the area and p is the perimeter of the area. The more rounded an object, the closer C is to 1.

Nucleus Ratio: the ratio of nucleus area to cytoplasm area, within the range 0–1 [18]:

R = A(nucleus) / A(cell)    (4)

Variable A(nucleus) defines the area of the nucleus object while A(cell) defines the area of the body of a WBC; R is the ratio of the nucleus area to the WBC.

Mean: the average pixel colour intensity of the red, green and blue channels [16]:

x̄ = (1 / nm) Σ(x=1..n) Σ(y=1..m) i(x, y)    (5)

The variable i(x, y) is the pixel value of the cell, n is the number of image rows, m is the number of image columns and x̄ is the average pixel intensity value of the cell.

Standard Deviation: the root of the mean squared difference between pixel colour intensity and the pixel mean [16]:

σ = √( (1 / nm) Σ(x=1..n) Σ(y=1..m) (i(x, y) − x̄)² )    (6)

The variable x̄ is the average pixel intensity value of the cell and σ is the standard deviation of the cell.
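A minimal sketch of the six features, computed from a binary cell mask, a nucleus mask and the RGB cell image. The 3×3 structuring element for the erosion-based perimeter is an assumption, and the mean and standard deviation are taken over all cell pixels together for brevity (per-channel values would follow the same pattern):

```python
# Sketch of the six-feature extraction from segmentation masks.
import numpy as np
from scipy import ndimage as ndi

def extract_features(cell_mask, nucleus_mask, rgb):
    area = int(cell_mask.sum())                      # cell area: white pixels
    eroded = ndi.binary_erosion(cell_mask, structure=np.ones((3, 3)))
    perimeter = area - int(eroded.sum())             # pixels removed by erosion
    circularity = 4 * np.pi * area / perimeter ** 2  # 1.0 for a perfect circle
    nucleus_ratio = nucleus_mask.sum() / area        # nucleus area / cell area
    pixels = rgb[cell_mask]                          # N x 3 cell pixels
    mean = pixels.mean()                             # colour intensity mean
    std = pixels.std()                               # colour intensity spread
    return area, perimeter, circularity, nucleus_ratio, mean, std
```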

Normalization
Because the feature values vary in range, normalization had to be done before the training process using the following formula:

x' = (x − min(data)) / (max(data) − min(data))

The value x is the value at index x, min(data) is the smallest value of the data set for each feature, and max(data) is the largest value of the data set for each feature.
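The formula above can be applied column-wise, with one column per feature:

```python
# Min-max normalisation: scale each feature column to [0, 1].
import numpy as np

def min_max_normalize(features):
    """features: (samples, features) array; returns the scaled copy."""
    lo = features.min(axis=0)   # per-feature minimum
    hi = features.max(axis=0)   # per-feature maximum
    return (features - lo) / (hi - lo)
```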

Training and Testing
The AML M1, M2 and M3 classification training stage used momentum backpropagation with one hidden layer as the learning algorithm. The architectural design of this training is shown in Figure 4.
Six normalized features were used as input neurons. The output layer consisted of two neurons called y1 and y2 in binary value. Cell type results were determined by adjusting the class to the output neuron pattern as shown in Table 1.
Learning rate () used were 0.1, 0.5, and 0.9. For the number of hidden layer (z) neurons used were 6, 8 and 10. Maximum epoch () were 1000, 5000, and 10000. Momentum () used were 0.1, 0.5, and 0.9. The tolerance limit for error () was 0.000001 thus the testing was done 81 times according to the combination of parameters.

Classification Result and Validation
Validation of the classification results used 3-fold cross-validation. Out of 734 total data, 250 were used as fold-1 test data, 248 as fold-2 test data and 236 as fold-3 test data. The results were then summed and inserted into a confusion matrix based on the prediction for each cell type. Those values were later used to calculate four predictive analysis values: precision, sensitivity, specificity and accuracy. The combination of parameters with the highest total predictive analysis was then recalibrated five times. The identification of the AML subtype in each preparation was done by accumulating the predicted counts of each cell type and grouping them by the preparation of origin. The AML subtype could then be determined from the percentage criteria for each cell type outlined in Table 2. A preparation not meeting the cell-count criteria would be classified as a "support" cell type.
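The four predictive analysis values can be computed from one-vs-rest confusion-matrix counts as follows:

```python
# Predictive analysis values from confusion-matrix counts:
# tp = true positives, fp = false positives, tn = true negatives, fn = false negatives.
def predictive_values(tp, fp, tn, fn):
    precision = tp / (tp + fp)                    # correct among predicted positive
    sensitivity = tp / (tp + fn)                  # recall / true positive rate
    specificity = tn / (tn + fp)                  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)    # overall correct fraction
    return precision, sensitivity, specificity, accuracy
```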

Segmentation Result
Segmentation was done on each slide preparation, each containing dozens of blood cells. The number of cells in each image of each preparation had to be counted to establish the quantity of each cell type per preparation. Samples of the segmentation results for the AML M1, M2 and M3 preparations are shown in Table 3. Some cells were segmented incorrectly; these were not included in the artificial neural network training data. Examples of incorrectly segmented cells and the reasons are shown in Table 4. From a total of 99 white blood cell images containing 876 white blood cell objects, 734 objects were segmented correctly and the remaining 142 incorrectly, giving a correct-segmentation percentage of 83.789%. A detailed comparison of the numbers of correctly and incorrectly segmented cell objects for each AML subtype is shown in Table 5.

Feature Extraction Result
The feature set of each cell was entered as artificial neural network input data. The average results of the WBC feature extraction can be seen in Table 6. The values obtained for each feature are in accordance with the characteristics of the real cell types; for example, myeloblast has a higher average circularity, while monoblast has a larger area than the other cell types.

Training and Testing Result
Based on the results of training and testing for each combination of momentum backpropagation parameters, the best parameters were α=0.1, z=8 and μ=0.1 at 10000 epochs. This configuration was then recalibrated five times; details of the five calibration test results are presented in Table 7. The identification result of the AML subtype in each preparation is shown in Table 8. Preparation 1 was identified as AML subtype M1 because the counted myeloblasts exceeded 89%. Preparation 2 was identified as AML subtype M2 with 72% myeloblast, 13.333% promyelocyte and 2.667% monoblast. Preparation 3 was identified as AML M3, with 8.070% myeloblast and 34.737% promyelocyte. All three inferences were correct for the type of preparation used, meaning the classification of AML subtypes M1, M2 and M3 with momentum backpropagation can be considered successful.

Conclusion
The ACWE algorithm could be used to segment WBC objects from the image background with a success rate of 83.789%. The percentages of correctly segmented object data versus all data from the AML M1, M2 and M3 preparations were 88.189%, 75.757% and 87.692%, respectively. Feature extraction was performed on the 734 correctly segmented cell objects out of the total of 876 segmented cell objects.
The best momentum backpropagation configuration for AML subtype classification was a 0.1 learning rate, eight hidden-layer neurons and 0.1 momentum. The predictive analysis values obtained from the five calibration tests were as follows: 84.754% average precision, 75.887% average sensitivity, 95.090% average specificity and 93.569% average accuracy.