Segmentation of pulmonary nodules based on an improved fuzzy C-means clustering algorithm

According to reports, lung cancer is gradually becoming the leading cancer threatening human life. Early-stage lung cancer appears in the form of pulmonary nodules. The key issue in computer-aided diagnosis of lung tumors is the accurate and rapid segmentation of diseased tissue. This paper therefore proposes a robust fuzzy c-means clustering algorithm for pulmonary nodule segmentation that effectively improves the adaptivity to local neighborhood pixels. Since the information in the neighborhood pixels is not necessarily positively correlated with the central pixel, the mechanism for referencing the pixel information in the neighborhood window needs to be redefined. The robust fuzzy c-means clustering algorithm redefines the gray values of the pixels in the spatial neighborhood and selects different fuzzy factors according to the reference standard. On this basis, the weights of the different fuzzy factors are updated according to the characteristics of the pixels and the gray-level fluctuation in the pixel neighborhood. The experimental results show that this method is superior to other typical algorithms in the segmentation of pulmonary nodules.


Introduction
Lung cancer is one of the most malignant tumors, with the fastest-growing morbidity and mortality and the greatest threat to population health and life. Over the past 50 years, many countries have reported significant increases in the morbidity and mortality of lung cancer. Statistics released by the National Cancer Center in 2015 show that the incidence of lung cancer is close to 17.09%, with men accounting for 70.3% and women for 29.7%, and that the mortality rate is as high as 21.68%, ranking first among all tumors [1]. Lung cancer will become the biggest threat to human health. If patients with early lung cancer receive standardized surgical treatment, the 5-year survival rate is as high as 90%. When lung cancer is treated in stage I, the 5-year survival rate is 60%, but when it is treated in stages II-IV, the 5-year survival rate falls from 40% to 5% [2]. In order not to miss the best treatment period, lung cancer must be detected and treated as early as possible. Pulmonary nodules are the most common form of early lung cancer. To improve the true positive rate of lung nodule detection, CT images are used for the auxiliary diagnosis of pulmonary nodules. At present, the volume of CT image data is growing explosively, which inevitably increases the workload of doctors and leads to missed diagnoses and misdiagnoses. Studies have shown that accurate and effective lung CT image segmentation can reduce the amount of calculation, improve the efficiency of the entire diagnostic system, reduce missed diagnoses and misdiagnoses, and play an important role in the evaluation of lung disease and lung function [3].
In lung CT images, pulmonary nodules usually appear as tissue that is brighter than the surrounding lung parenchyma and that resembles a sphere or an ellipsoid. According to their shape and position in the lung parenchyma, lung nodules can be divided into the following categories, as shown in Figure 1: picture (4) is a Solitary Pulmonary Nodule (SPN), picture (2) is a Lung Wall Attachment Pulmonary Nodule (LWA), picture (1) is a Ground Glass Opacity (GGO), and picture (3) is a Juxtavascular Pulmonary Nodule (JV).
Figure 1. Examples of various types of pulmonary nodules
Because CT images contain considerable noise and have poor resolution, the segmentation accuracy of many traditional algorithms is relatively low and their mis-segmentation rate is relatively high. Medical image segmentation techniques have progressed from early manual segmentation to semi-automatic segmentation and then to automatic segmentation. During this period, scholars at home and abroad have proposed many methods for the segmentation of pulmonary nodules, such as the threshold method, region growing, fuzzy clustering and neural networks.
Fuzzy C-means Clustering (FCM) [4] is currently the most widely used image segmentation algorithm. Traditional fuzzy C-means clustering is very sensitive to noise, does not consider the spatial relationship between pixels, and easily converges to a local extremum. To solve these problems, many scholars at home and abroad have improved the traditional FCM algorithm by incorporating the degree of correlation between a pixel and its neighboring pixels. Ahmed et al. introduced a spatial constraint on pixel color features into the objective function of the FCM algorithm and proposed Fuzzy C-means Clustering with Spatial information (FCM-S) [5]. The algorithm has a certain degree of noise immunity, but each iteration must perform a series of calculations on the spatial information of the neighborhood. Chen and Zhang [6] applied mean filtering and median filtering to the image in advance rather than in each iteration, and proposed FCM-S1 and FCM-S2, which reduce the computational complexity of FCM-S. However, filtering smooths the details of the image while suppressing noise, which causes mis-segmentation in the subsequent FCM clustering. Szilagyi [7] proposed enhanced fuzzy C-means clustering (EnFCM) based on image gray levels. The EnFCM algorithm computes a linearly weighted sum image of the original image and its local neighborhood mean image: the image is first mean-filtered and then segmented by fuzzy c-means clustering on its gray-level histogram. Because the number of gray levels in an image is much smaller than the number of pixels, this algorithm effectively reduces the computational complexity and improves the segmentation speed.
However, the mean filtering process loses the original texture details, resulting in blurred image edges. Cai [8] proposed a fast generalized FCM clustering algorithm (FGFCM), which uses a local similarity measure that combines local spatial distance and gray-level difference. It reduces the blurring of image edges to a certain extent, but it introduces parameters that cannot be obtained automatically and must be set through experiments, and its segmentation precision is not ideal. Krinidis and Chatzis [9] proposed a local-information-based FCM algorithm (Fuzzy Local Information C-means, FLICM), which combines local spatial information and gray-level information to define a new fuzzy factor, a similarity measure that requires no parameters. The factor adapts better, and the original image is used in the iterative process, which avoids the loss of detail caused by preprocessing and improves the segmentation result. However, because of the limitations of the local spatial information it constructs, some details are still lost during image segmentation.
To further improve the accuracy and robustness of FCM-based image segmentation, this paper proposes a robust fuzzy c-means clustering algorithm that improves the adaptivity to local neighborhood pixels. The algorithm reassigns the gray values of the pixels in the spatial neighborhood and selects different fuzzy factors according to the reference standard. On this basis, the weights of the different fuzzy factors are adaptively updated according to the characteristics of the pixels and the gray-level fluctuations in the pixel neighborhood.

Traditional fuzzy c-means clustering algorithm
Fuzzy clustering refers to the process of dividing a data set into categories according to specified features without training samples. The fuzzy c-means clustering algorithm combines fuzzy theory with cluster analysis. The FCM algorithm repeatedly modifies the cluster centers and the membership matrix, so it is also called dynamic clustering. It is an unsupervised machine learning algorithm.
The main idea of FCM is to define an objective function and optimize it iteratively. The optimization yields the membership degree of each target point in each category, and the membership degrees are then used to classify the target samples. We use all the pixel gray values in the image as the target data and iteratively optimize the objective function to obtain the classification of the pixel gray values and thus the segmentation of the image. The FCM objective function is defined as

$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{c} u_{ij}^{m} \lVert x_i - v_j \rVert^2 \tag{1}$$

where $N$ is the number of pixels in the image; $c$ is the preset number of clusters; $x_i$ is the gray value of the $i$th pixel; $v_j$ is the center of the $j$th cluster; $m$ is the fuzzy weighting exponent; and $u_{ij}$ is the membership degree of pixel $x_i$ in the $j$th class, satisfying

$$\sum_{j=1}^{c} u_{ij} = 1, \qquad u_{ij} \in [0, 1] \tag{2}$$

Using the Lagrange multiplier method to minimize the objective function under this constraint gives

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \dfrac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{2/(m-1)}} \tag{3}$$

$$v_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} x_i}{\sum_{i=1}^{N} u_{ij}^{m}} \tag{4}$$

The specific steps of the FCM algorithm are as follows:
1. Input the image $X$ as the target data, set the values of $m$ and $c$, and set the iteration stop threshold $\varepsilon > 0$;
2. Randomly initialize the fuzzy membership matrix $U^{(0)}$;
3. Set the loop counter $t = 0$;
4. Calculate the fuzzy membership matrix and the cluster centers according to formulas (3) and (4);
5. If $\lVert U^{(t+1)} - U^{(t)} \rVert < \varepsilon$, the algorithm terminates; otherwise set $t = t + 1$ and repeat step 4.
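The iteration described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration rather than the implementation used in the experiments: the function name `fcm_gray`, the default values, the `max_iter` safeguard and the use of the largest membership change as the stopping test are all assumptions made for the sketch. It uses the standard FCM updates, in which each cluster center is the mean of the gray values weighted by the memberships raised to the power m, and each membership is inversely related to the distance from the pixel to the center.

```python
import random


def fcm_gray(pixels, c=2, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Cluster a flat list of gray values with plain FCM (illustrative sketch).

    Returns (memberships, centers); memberships[i][j] is the membership
    of pixel i in cluster j.
    """
    rng = random.Random(seed)
    n = len(pixels)
    # Random membership matrix, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([value / total for value in row])

    centers = [0.0] * c
    power = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Cluster centers: gray values weighted by u_ij ** m.
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            total_w = sum(weights)
            if total_w > 0.0:  # keep the old center if a cluster is empty
                centers[j] = sum(w * x for w, x in zip(weights, pixels)) / total_w

        # Memberships: inverse-distance rule, guarding pixels that sit on a center.
        new_u = []
        for x in pixels:
            dist = [abs(x - v) for v in centers]
            if min(dist) == 0.0:
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                new_u.append([h / sum(hits) for h in hits])
                continue
            inv = [d ** -power for d in dist]
            total = sum(inv)
            new_u.append([value / total for value in inv])

        # Assumed stopping test: largest change in any membership below eps.
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < eps:
            break
    return u, centers
```

Calling `fcm_gray([10, 12, 11, 200, 205, 198], c=2)` separates the dark and bright gray values into two clusters; a pixel is assigned to the class with the largest membership in its row.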

Adaptive and robust fuzzy c-means clustering algorithm
To improve the accuracy and robustness of the FCM clustering algorithm, the proposed algorithm introduces into the FCM objective function a weighting factor that automatically determines whether the central pixel is a noise point or an edge point, and uses it to select the applicable fuzzy factor. The weighting factor is defined according to the characteristics of the pixels and the gray-level fluctuations in the pixel neighborhood. In its definition, $x_k$ is a neighborhood pixel of the central pixel of the image; the algorithm processes the eight-neighborhood of each pixel, and all the neighborhood pixels constitute the neighborhood set; $x_i$ is the gray value of the central pixel; $u_{ij}$ is the membership degree of the central pixel $x_i$ in the $j$th class; $v_j$ is the cluster center of the $j$th class; and the distance between a pixel and a cluster center is the Euclidean distance. In the definition of the fuzzy factor $P_{ij}$, $b$ is the parameter that controls the strength of the influence of the neighborhood information; $N_R$ is the number of pixels in the neighborhood set; and $x_r$ is the gray value of a neighborhood pixel. For the fuzzy factor $Q_{ij}$, each pixel must obey the constraint in formula (2).
The constraint $Q_{ij}$ not only takes the neighboring pixels of the central pixel as reference information, but also limits the fuzzy membership value. A constrained pixel either completely belongs or completely does not belong to the $j$th class, which avoids ambiguity and makes the attribution of the central pixel clearer. The objective function also contains a fixed fuzzy factor $G_{ij}$. In its definition, $d_{ir}$ is the distance from the neighborhood pixel $x_r$ to the central pixel $x_i$. The fuzzy factor $G_{ij}$ includes both spatial information and gray-level information, and its influence on the objective function is controlled through the distance from the neighborhood pixel to the central pixel. This improves the robustness of the segmentation and reduces its sensitivity to noise, and no parameters need to be chosen. As in the traditional FCM algorithm, the Lagrange multiplier method can be used to obtain the membership matrix $U$ and the cluster centers.
The flow of the ARFCM algorithm:
(1) Input image data, determine the number of clusters $c$, set the iteration stop threshold $\varepsilon$, and let $t = 0$;
(2) Initialize the fuzzy membership matrix by a random method;
(3) Calculate the weighting factor from equation (5);
(4) Calculate the cluster center
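The fixed fuzzy factor described in this section matches the description of the parameter-free factor of FLICM [9], which weights each neighbor by its spatial distance to the central pixel. The sketch below assumes that form, namely a sum over the eight neighbors r of 1/(d_ir + 1) · (1 − u_rj)^m · (x_r − v_j)^2; whether the paper's own factor is identical is an assumption, and the function name `flicm_factor` and its argument layout are hypothetical.

```python
import math


def flicm_factor(image, memberships, center_j, j, row, col, m=2.0):
    """FLICM-style fuzzy factor for the pixel at (row, col) and class j.

    image[r][c] is a gray value and memberships[r][c][j] is the membership
    of pixel (r, c) in class j. Neighbors outside the image are skipped.
    The formula is the assumed FLICM form, not taken from the paper.
    """
    height, width = len(image), len(image[0])
    total = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # the central pixel itself is excluded
            r, c = row + dr, col + dc
            if not (0 <= r < height and 0 <= c < width):
                continue
            d_ir = math.hypot(dr, dc)  # spatial distance to the central pixel
            u_rj = memberships[r][c][j]
            gray_term = (image[r][c] - center_j) ** 2
            total += (1.0 / (d_ir + 1.0)) * (1.0 - u_rj) ** m * gray_term
    return total
```

Under this form, a neighbor that already belongs strongly to class j (membership close to 1) or whose gray value is close to the cluster center contributes little, while closer neighbors count more than diagonal ones, so no tuning parameter is needed.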

Experimental results and discussion
The experimental environment of this paper is as follows: the operating system is Windows 10; the processor is an Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz, 3.41 GHz; and the segmentation experiments are performed in MATLAB R2008a. In the experiments, the fuzzy weighting exponent is set to m=2, the segmentation window covers the eight-neighborhood of each pixel, and the parameter controlling the strength of the influence of the neighborhood information is set to b=0.7. The pulmonary nodule CT images come mainly from the database of the Lung Image Database Consortium (LIDC), from which we selected 40 low-dose lung CT images. To verify the validity and robustness of the proposed algorithm, a red curve is used to mark the pulmonary nodule boundary according to the annotation file provided by the database. We compared the segmentation results with those of the FCM, FCM-S, FLICM and rFCM algorithms. Figure 2 shows the segmentation of a Solitary Pulmonary Nodule. Figure 2-1 shows an original CT image of a pulmonary nodule. Figure 2-2 shows the pulmonary nodule boundary marked by the red curve based on the annotation file provided by the database. Figure 3-3 shows the extracted lung parenchyma. Comparing the traditional algorithms, we find that figure 3-5 (FCM algorithm) and figure 3-6 (FCM-S algorithm) do not effectively segment some small textures, and the segmentation result is poor. Figure 3-7 (FLICM algorithm) and figure 3-8 (rFCM algorithm) lose the edge information of the lung nodules and over-segment more severely, although the result of the rFCM algorithm is significantly better than those of the other traditional algorithms. Figure 3-9 (ARFCM algorithm) shows that the noise is effectively eliminated, the edge information of the pulmonary nodules is preserved, and the segmentation edge is smoother. The shape of the segmented pulmonary nodule is close to the expert's manual segmentation.
The comparison above between the proposed algorithm and the traditional algorithms is qualitative. The following is a quantitative analysis of the segmentation result of each algorithm. We compare the algorithms using pixel-based precision, recall and F1 values. The precision (P) and recall (R) are defined as follows:

$$P = \frac{|Area \cap Area_{real}|}{|Area|}, \qquad R = \frac{|Area \cap Area_{real}|}{|Area_{real}|} \tag{12}$$

In equation (12), $Area_{real}$ is the pulmonary nodule area enclosed by the red boundary curve drawn according to the annotation file provided by the database, which serves as the gold standard, and $Area$ is the pulmonary nodule area obtained by the segmentation algorithm. The F-measure combines precision and recall; when its weighting parameter equals 1, it is the most common evaluation index, the F1 value, $F_1 = 2PR/(P+R)$. A higher F1 value indicates a better segmentation result. Figure 4 compares the precision of each algorithm, Figure 5 compares the recall of each algorithm, and Figure 6 compares the F1 values of each algorithm. These figures show that the precision of the ARFCM algorithm is higher than that of the other algorithms and that its F1 value is higher than 85%, so the effectiveness of pulmonary nodule segmentation is improved.
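The pixel-based precision, recall and F1 evaluation can be sketched as follows. This is a minimal illustration that assumes the segmentation result and the gold standard are available as binary masks of equal size (nested lists of 0 and 1); the function name `segmentation_scores` and the convention of returning 0.0 for an empty denominator are assumptions of the sketch.

```python
def segmentation_scores(predicted, reference):
    """Pixel-based precision, recall and F1 for two equal-sized binary masks.

    predicted is the mask produced by the segmentation algorithm and
    reference is the gold-standard mask; both are nested lists of 0/1.
    """
    tp = fp = fn = 0
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            if p and r:
                tp += 1  # nodule pixel found by the algorithm
            elif p:
                fp += 1  # background pixel labeled as nodule
            elif r:
                fn += 1  # nodule pixel missed by the algorithm
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, with `predicted = [[1, 1, 0], [0, 1, 0]]` and `reference = [[1, 0, 0], [0, 1, 1]]`, two pixels overlap, one predicted pixel is spurious and one reference pixel is missed, so precision, recall and F1 are all 2/3.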

Conclusions
This paper proposes an adaptive robust fuzzy C-means clustering method for lung nodule segmentation, which introduces an adaptively updated weighting factor based on the characteristics of the pixels and the gray-level fluctuations in the pixel neighborhood. The fuzzy factor is selected according to the weight value, which reflects whether the neighborhood pixels are positively correlated with the central pixel and whether the central pixel is a noise pixel. The algorithm suppresses noise without being constrained by parameters. The experimental results show that the proposed algorithm overcomes the shortcomings of traditional fuzzy C-means clustering in image segmentation, improves the segmentation accuracy and enhances robustness.