Milk duct segmentation in microscopic HE images of breast cancer tissues

The aim of the paper is to recognize and extract milk ducts in haematoxylin and eosin (HE) stained breast cancer tissues. The paper presents a modified K-means approach to the segmentation of milk ducts in HE stained images. Instead of single pixels, we propose to cluster a defined region of pixels around each pixel. This modification yields a more accurate extraction of the milk ducts. To compare the results numerically, the ground truth (GT) images prepared by a medical expert were subtracted from the corresponding images produced by the segmentation methods. Numerical experiments performed on many preparations confirm the advantage of this approach: the proposed method significantly reduces the duct segmentation error in comparison to the classical K-means approaches. The results show that our method is superior to standard K-means and to K-means preceded by averaging or Gaussian filtering with different filter mask sizes.


Introduction: medical background
Ductal carcinoma in situ (DCiS) is one of the most frequently occurring types of non-invasive breast cancer [1,2]. It is "non-invasive" because it is limited to the milk duct, and it is regarded as not life-threatening. However, DCiS increases the risk of developing an invasive breast cancer later on (around one third of DCiS cases end in an invasive cancer). More than 90% of cases show no visible symptoms in everyday life.
The first signs of the illness can be recognized in the mammogram. Once they have been noted, a biopsy is usually performed; the pathologist analyzes the sample of breast tissue and reports on the type and grade of the DCiS, describing how abnormal the cells look compared with normal breast cells.
The image shown in Fig. 1 [4] presents the possible types of findings, starting from normal cells and ending with invasive ductal cancer. They include normal cells, ductal hyperplasia (too many cells present), atypical ductal hyperplasia (too many cells, starting to take on an abnormal appearance), ductal carcinoma in situ (too many cells, but still confined to the inside of the duct), DCiS with micro-invasion (a few of the cancer cells breaking through the wall of the duct) and finally invasive ductal cancer (many cancer cells broken out beyond the breast duct, turning DCiS into an invasive ductal carcinoma).
Automatic recognition of the milk ducts is an important task in a computerized approach to the problem. The aim is to separate the individual milk ducts present in the preparation. On the basis of many extracted ducts the medical expert can diagnose the particular case. Our task in this work is to develop a computerized system that is able to extract the individual ducts from the whole image with a precision comparable to that of a human expert.

Problem statement
The aim of the paper is to recognize and extract the milk ducts in haematoxylin and eosin (HE) stained breast cancer tissues. The ducts form the region of interest in our analysis. An example of an analyzed image is presented in Fig. 2. It depicts the cross-section of many milk ducts of different sizes and shapes. The arrows point to two chosen ducts. The group of ducts is located against the background of the tissue. The most important factor in medical decision making is the precise determination of the shape, size, degree of filling of the interior, width of the walls and channel patency. The decision about the advancement of the breast cancer is taken on the basis of the values of these parameters. Therefore, the extraction of all milk ducts should be done as precisely as possible.
A popular method for this task is the K-means algorithm [3,5,6]. However, its direct application to this problem is not effective, since it produces many artifacts in the background. The usual remedy is the application of morphological operations, which remove the residual noise from the resulting image. This process has some negative aspects: it changes the shape and size of the milk duct. Our work presents a modification of K-means that eliminates the need for additional image processing. Thanks to this we are able to achieve better accuracy in the milk duct reconstruction. This will be confirmed by comparing our results with the ground truth (GT) pattern marked manually by the medical expert for the same images.

Materials
The HE images of the breast cancer tissues that are the subject of the analysis were prepared in the Military Medical Institute in Warsaw. The biopsy material was acquired from patients suffering from breast cancer of different grades and then prepared using HE staining. The preparations were scanned using a high definition 3D Histech scanner and the obtained images were stored in a database. The size of each image was 55808×82688 pixels.

Methods
The aim of clusterization is to separate the analyzed image into clusters representing the milk ducts present in the image. Classical K-means operates on individual pixels in the RGB channels, grouping pixels of similar intensity [5,6]. In our proposal we extend the representation of an individual pixel to the region around it, comprising N pixels (the pixel itself and its neighbors). For an RGB image each pixel is represented by a vector of length 3N, x = [R_1, R_2, …, R_N, G_1, G_2, …, G_N, B_1, B_2, …, B_N], where the elements within the R, G and B families are arranged in non-decreasing order of pixel intensity.
An example of vector creation is shown below. Let us assume the RGB representation of the image in the following matrix form. We will create the vector representation for the marked RGB pixel with intensity levels equal to 197, 49 and 196, respectively. Assume N=25, which means that each pixel of the image has n=2 neighbors to the left and right as well as above and below, so that N=(2n+1)^2. The vector representation of the marked pixel's neighborhood region in the RGB channels, arranged in non-decreasing order, looks as follows. This vector represents the pixels of the image region shown in bold in the matrix form.
To prepare the vector representation for boundary pixels (for example position (1,1)) we extend the image by replicating the neighboring n columns and n rows in the vertical and horizontal directions. An example of such replication for the red channel is shown below. Now the element (1,1) of the previous red matrix, of intensity 48, has the required number of neighbors.
When searching for the nearest neighbors of a particular pixel we use the Euclidean distance between the vector representations. For example, the distance between the marked RGB pixel of intensity (197, 49, 196) and the pixel of intensity (60, 93, 69) in the first row of the matrix is equal to d = 101.5185.
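The construction of the region vectors described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the implementation used in the experiments: the function names `region_features` and `region_distance` are hypothetical, and the border is extended with `mode="symmetric"` (mirror copies of the nearest n rows and columns), which is one possible reading of the replication step.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def region_features(img, n):
    """Return an (H*W, 3*N) array of sorted region vectors, N = (2n+1)**2.

    img : (H, W, 3) RGB array
    n   : number of neighbors taken on each side of a pixel
    """
    k = 2 * n + 1
    # Extend the border by n rows/columns so boundary pixels have a full
    # neighborhood. The "symmetric" mode is an assumption, see the lead-in.
    padded = np.pad(img.astype(np.float64),
                    ((n, n), (n, n), (0, 0)), mode="symmetric")
    # windows[r, c, ch] is the k x k neighborhood of pixel (r, c) in channel ch.
    windows = sliding_window_view(padded, (k, k), axis=(0, 1))
    h, w = img.shape[:2]
    flat = windows.reshape(h, w, 3, k * k)
    # Arrange the N intensities of each channel in non-decreasing order.
    flat = np.sort(flat, axis=-1)
    # Concatenate the sorted R, G and B blocks into one vector per pixel.
    return flat.reshape(h * w, 3 * k * k)


def region_distance(features, i, j):
    """Euclidean distance between the region vectors of pixels i and j
    (row-major flat pixel indices)."""
    return float(np.linalg.norm(features[i] - features[j]))
```

With n=2 each pixel is described by a vector of 75 elements (N=25 per channel). The resulting feature matrix can be passed to any K-means routine in place of the raw RGB values. For images of the size used here the matrix would have to be built tile by tile, since it holds 3N values per pixel.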
The alternative solutions compared in this work are the standard K-means algorithm and the K-means algorithm preceded by lowpass filtering of the image. The filtering may use different masks, for example averaging or Gaussian filters.
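For reference, the filtered baseline can be sketched as an averaging filter followed by per-pixel K-means. This is a minimal NumPy illustration, not the code used in the experiments. The function names `mean_filter`, `kmeans` and `segment_baseline`, the edge-replicated border, the random initialization and the iteration limit are all assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def mean_filter(img, size=3):
    """Average each channel over a size x size window (size odd)."""
    pad = size // 2
    padded = np.pad(img.astype(np.float64),
                    ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    windows = sliding_window_view(padded, (size, size), axis=(0, 1))
    return windows.mean(axis=(-2, -1))


def _assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) per row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd iterations; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Initial centers: k randomly chosen data points (an assumption).
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(np.float64)
    for _ in range(n_iter):
        labels = _assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(X, centers), centers


def segment_baseline(img, k, size=3):
    """Lowpass-filter the image, then cluster the individual RGB pixels."""
    smoothed = mean_filter(img, size)
    labels, _ = kmeans(smoothed.reshape(-1, img.shape[2]), k)
    return labels.reshape(img.shape[:2])
```

The distance matrix in `_assign` grows with the number of pixels times k, so the sketch is suited to small tiles rather than whole preparations. Standard K-means without filtering corresponds to calling `kmeans` directly on the reshaped image.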

Results
The aim of the numerical experiments was to compare the results of our automatic extraction of milk ducts with those of the classical K-means methods, in both cases relative to the results of the medical expert. Fig. 4 presents three examples of original images containing milk ducts of different shapes and sizes.

Fig. 4. Examples of original HE images of milk ducts used in the numerical experiments
The results of our automatic system for these three images are presented in Fig. 5 (upper row). The boundaries of the extracted ducts are clean and the background is free of artifacts. The bottom row presents the corresponding results obtained with the standard K-means algorithm. The difference in extraction quality is evident. The results of our approach do not need any additional post-processing, while standard K-means produces many artifacts that have to be eliminated by morphological operations.
Fig. 6 presents a detailed graphical comparison of the extraction results for one chosen duct obtained with our algorithm and with the other existing approaches. Fig. 6a presents the original input image, Fig. 6b the milk duct extracted using our method, Fig. 6c the result of standard K-means, and Fig. 6d the result of standard K-means combined with an averaging filter with a 3×3 mask.
To compare the results numerically, the GT images prepared by the medical expert were subtracted from the corresponding images produced by the segmentation methods. The differential images were defined on the basis of the binary masks. The error was computed as the number of nonzero pixels in the differential image divided by the total sum of pixels in the GT image. The statistics were computed for a database of 10 preparations, from which more than 80 milk ducts were extracted. The results for different numbers of clusters used in the K-means algorithm and for the different variants of K-means (our approach, standard K-means and K-means with filtering) are given in Table 1. Values higher than 100% can occur when the extracted ducts are much larger than the GT result.
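The error measure described above can be written compactly for binary masks. The sketch below is an interpretation, not the code used in the experiments: the function name `segmentation_error` is hypothetical, and "the total sum of pixels in the GT image" is assumed to mean the number of foreground (duct) pixels in the GT mask, which is the reading under which values above 100% are possible.

```python
import numpy as np


def segmentation_error(seg_mask, gt_mask):
    """Relative segmentation error in percent.

    seg_mask, gt_mask : arrays of the same shape; nonzero marks duct pixels.
    """
    seg = np.asarray(seg_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    if seg.shape != gt.shape:
        raise ValueError("masks must have the same shape")
    gt_area = gt.sum()
    if gt_area == 0:
        raise ValueError("GT mask contains no duct pixels")
    # For binary masks the difference image is nonzero exactly where the
    # two masks disagree, which is the element-wise XOR.
    diff = seg ^ gt
    return 100.0 * diff.sum() / gt_area
```

For example, a GT duct of 4 pixels segmented as a 16-pixel region containing it gives 12 differing pixels and an error of 300%, which illustrates how over-segmentation pushes the measure above 100%.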
The results show that our method is superior to standard K-means and to K-means preceded by averaging or Gaussian filtering with different filter mask sizes. The advantage of the proposed method is most clearly seen in the best case, at n=10 (mask size N=21×21).

Conclusions
The paper has presented a modified K-means method for extracting milk ducts in HE images of breast cancer. The main idea of the method is to replace the single pixel in the K-means algorithm by the region around this pixel and to concatenate the R, G and B regions into a common vector.
The proposed method significantly reduces the error of duct segmentation in comparison to the classical K-means approaches. Future research will be directed at testing the method on a larger population of HE images and comparing the results with GT patterns prepared by different experts. At the same time, numerical measures characterizing the basic parameters of the extracted ducts will be proposed.