Multi-scale foreground extraction on graph cut

To improve the performance of Grab Cut on real images, we propose an extension of Grab Cut in three respects: 1) a series of edge-preserving components is generated via the TV smoothing model; 2) the number of sub-regions is estimated by histogram shape analysis, which removes the negative effect of an unsuitable number of sub-regions; 3) a segmentation termination condition is constructed by integrating the multi-scale components. The experimental results indicate that this method performs well compared with other graph-cut-based methods and is insensitive to sub-regions.


Introduction
The foreground, which plays a key role in image analysis and comprehension, is a region with semantic meaning in an image. Existing approaches to foreground extraction are based on low-level or mid-level characteristics of the image, such as edge information [1], appearance [2] and texture features [3]. However, these methods still cannot achieve correct segmentation results because these characteristics are ambiguous.
Grab Cut [4] extracts the foreground well with minimal user interaction by combining edge and appearance models. The appearance model is described statistically using Gaussian Mixture Models (GMMs), which raises two problems: the number of Gaussians has a significant influence on the accuracy of the segmentation [5], and GMMs model inhomogeneous sub-regions poorly. As a result, performance is low for highly textured images.
To improve foreground extraction, we propose an extension of Grab Cut in three respects. First, a series of smoothed components is generated via the TV smoothing model. These components are edge-preserving and region-smoothed, properties that favor modeling the inhomogeneous sub-regions with GMMs. Second, the number of sub-regions is estimated by histogram shape analysis to remove the negative effect of an unsuitable number of sub-regions. Finally, integrating the multi-scale appearances of an image, a segmentation termination condition is constructed using the IOU of the foreground objects.
The next section reviews relevant segmentation algorithms derived from graph cut. The following section proposes the non-linear smoothing model to optimize the appearance for segmentation and analyzes the characteristics of the smoothed components. Next, the GMMs are optimized using histogram shape analysis. Finally, the experimental results and conclusion are given.

Previous Work
Users may be interested in different foreground regions of an image. To meet the user's demands, foreground extraction approaches usually exploit information from user interaction. However, the extraction quality of the former is unsatisfactory for images in which the distributions of foreground and background overlap considerably. The latter loses its effectiveness on highly textured images.
Graph-cut-based foreground extraction methods, which combine edge and appearance models, successfully extract the foreground with the min-cut algorithm [6]. The appearance models can be divided into given models and estimated models. The given models [7,8] are the intensity distributions of the foreground and background. However, such a model relies heavily on the foreground and background marked by the user and requires a large amount of user interaction. The estimated models [4,7,9] require the user to define an initial bounding box, which divides the image into background and foreground. Each region is statistically estimated via GMMs during the segmentation process.
A fixed number of Gaussians in each GMM has a negative effect on segmentation. To remove the effect of an unsuitable number of Gaussians, the GMMs are optimized in [7] by estimating the optimal number of Gaussians in each GMM via the CLUSTER algorithm [10]. Since this approach does not exploit the intensity distribution of the image, it is not effective for images with inhomogeneous sub-regions. To better model inhomogeneous sub-regions with GMMs, Supercut [9] applies a local similarity constraint to optimize the GMMs, which guarantees that homogeneous pixels are classified into the same sub-class.

Image Multi-scale Analysis
Because they exploit edge and appearance information at a single given scale, graph-cut-based foreground extraction methods segment real images unsatisfactorily. We use the TV smoothing operator [11] to generate a series of smoothed components $u_i$ of an image $u$. The component $u_k$ smears fine detail through the smoothing operator. As $k$ increases, the component becomes coarser and the intensity distribution of each region becomes more tightly clustered, which favors modeling the foreground and background with GMMs. In addition, involving $u_0$ in the iterations keeps equal attention on the raw image, which contains the intrinsic singular features, so that the edges are preserved.
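As an illustration of how such a series of components might be generated, the following is a minimal sketch, assuming an explicit gradient-descent discretization of a regularized TV energy with a fidelity term to $u_0$. The function names (`tv_step`, `tv_components`, `total_variation`) and the parameters `lam`, `tau` and `eps` are illustrative assumptions and are not taken from [11]; the image is assumed to be a grey-level array stored as nested lists with values in [0, 255].

```python
import math

def tv_step(u, u0, lam=0.1, tau=0.2, eps=1.0):
    """One explicit descent step on the assumed energy
    sum sqrt(|grad u|^2 + eps^2) + (lam / 2) * sum (u - u0)^2."""
    h, w = len(u), len(u[0])
    px = [[0.0] * w for _ in range(h)]
    py = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Forward differences; the gradient is zero across the border.
            gx = u[i][j + 1] - u[i][j] if j + 1 < w else 0.0
            gy = u[i + 1][j] - u[i][j] if i + 1 < h else 0.0
            norm = math.sqrt(gx * gx + gy * gy + eps * eps)
            px[i][j] = gx / norm
            py[i][j] = gy / norm
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Backward-difference divergence of the normalized gradient.
            div = (px[i][j] - (px[i][j - 1] if j > 0 else 0.0)
                   + py[i][j] - (py[i - 1][j] if i > 0 else 0.0))
            # The fidelity term pulls the result back toward the raw image u0.
            out[i][j] = u[i][j] + tau * (div - lam * (u[i][j] - u0[i][j]))
    return out

def tv_components(u0, scales=3, steps_per_scale=10, **params):
    """Return [u_0, u_1, ..., u_scales]; each u_k is coarser than u_{k-1}."""
    u = [[float(v) for v in row] for row in u0]
    components = [u]
    for _ in range(scales):
        for _ in range(steps_per_scale):
            u = tv_step(u, u0, **params)
        components.append(u)
    return components

def total_variation(u):
    """Anisotropic total variation, used here only to inspect the smoothing."""
    h, w = len(u), len(u[0])
    horiz = sum(abs(u[i][j + 1] - u[i][j]) for i in range(h) for j in range(w - 1))
    vert = sum(abs(u[i + 1][j] - u[i][j]) for i in range(h - 1) for j in range(w))
    return horiz + vert
```

With `eps = 1.0`, a step size `tau` below 0.25 keeps this explicit scheme stable. On a two-region image with fine texture, the texture is flattened from scale to scale while the jump between the regions stays large, which is the behavior the section relies on.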

Foreground Extraction Model
Given the bounding box around the desired object, an image $u$ with $N$ pixels is divided into two parts: the foreground region $T_F$ and the background $T_B$. Combining the edge and appearance models, an energy function is defined whose minimum corresponds to a good segmentation. Here, $\alpha$ is the segmentation label of each pixel, where 0 and 1 correspond to the mixture region and the background, respectively.
The term $V(\cdot)$ denotes the edge information. It is expressed as a penalty weight that is high where the gradient is low and low where the gradient is high, which encourages smoothness. The constant $\gamma$ relaxes the tendency to smoothness in regions of high contrast; in our experiments, $\gamma$ is fixed to 50. The neighborhood is the set of 8-neighboring pixels, and the factor $dis(\cdot)$ is the Euclidean distance between neighboring pixels. To ensure that the exponential term in (2) switches appropriately between high and low contrast, the constant $\beta$ is chosen, as in Grab Cut [4], to be $\beta = \left(2\langle \|u_p - u_q\|^2 \rangle\right)^{-1}$ for neighboring pixels $p$ and $q$. Here $\langle \cdot \rangle$ denotes the expectation over an image sample.
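The edge weights described above can be sketched as follows. This is a minimal illustration that assumes the Grab Cut form of the pairwise weight from [4], $\gamma \cdot dis^{-1} \cdot \exp(-\beta (u_p - u_q)^2)$, on a grey-level image stored as nested lists. The names `OFFSETS`, `neighbor_pairs` and `edge_weights` are hypothetical helpers introduced for this example.

```python
import math

# Half of the 8-neighborhood; the other half follows by symmetry.
OFFSETS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def neighbor_pairs(h, w):
    """Yield every unordered pair of 8-neighboring pixels exactly once."""
    for i in range(h):
        for j in range(w):
            for di, dj in OFFSETS:
                p, q = i + di, j + dj
                if 0 <= p < h and 0 <= q < w:
                    yield (i, j), (p, q)

def edge_weights(u, gamma=50.0):
    """Return (beta, weights), where weights maps a pixel pair to its penalty."""
    h, w = len(u), len(u[0])
    pairs = list(neighbor_pairs(h, w))
    sq_diffs = [(u[a][b] - u[c][d]) ** 2 for (a, b), (c, d) in pairs]
    # beta = 1 / (2 <(u_p - u_q)^2>), the expectation taken over all pairs.
    mean_sq = sum(sq_diffs) / len(sq_diffs)
    beta = 1.0 / (2.0 * mean_sq) if mean_sq > 0 else 0.0
    weights = {}
    for (p, q), sq in zip(pairs, sq_diffs):
        dist = math.hypot(p[0] - q[0], p[1] - q[1])  # 1 or sqrt(2)
        weights[(p, q)] = gamma / dist * math.exp(-beta * sq)
    return beta, weights
```

Pairs inside a flat region receive the full weight `gamma / dist`, so cutting there is expensive, while pairs that straddle a strong intensity step receive a much smaller weight, so the minimum cut prefers to pass along edges.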
The parameter $\omega$ represents the parameters of the appearance models. The term $R(\cdot)$ evaluates the fitness of the segmentation $\alpha$ to an image $u$, given the models $\omega$.
In the models, $\mu_m$ and $\Sigma_m$ are the mean vector and the covariance matrix of the $m$-th Gaussian component, and $K_\alpha$, $\alpha \in \{0, 1\}$, denotes the number of Gaussians in the foreground and background, respectively.
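To make the role of the region term concrete, the sketch below evaluates a negative log-likelihood under a GMM for a single grey level. It is an assumption about the form of the fitness measure, not the paper's exact definition: it uses scalar intensities with variances in place of mean vectors and covariance matrices, and it sums over all components, whereas Grab Cut [4] assigns each pixel to a single component. The function name `gmm_neg_log_likelihood` is hypothetical.

```python
import math

def gmm_neg_log_likelihood(x, weights, means, variances):
    """Return -log sum_m pi_m N(x; mu_m, sigma_m^2) for a scalar intensity x."""
    density = 0.0
    for pi_m, mu, var in zip(weights, means, variances):
        gauss = math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        density += pi_m * gauss
    # Guard against log(0) for intensities far from every component.
    return -math.log(max(density, 1e-300))
```

A pixel is cheap to label as foreground when its cost under the foreground GMM is lower than under the background GMM, and expensive otherwise.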
Combining the edge and region information of the component, the energy function of the segmentation is formulated as the sum of the region term $R(\cdot)$ and the edge term $V(\cdot)$. Minimization is done using a standard minimum cut [6].

Clustering via Histogram Shape Analysis
The histogram describes the grey-level distribution of an image and the differences among sub-regions. Each sub-region corresponds to a peak in the histogram, and the valley between two peaks probably results from border pixels between objects and background. Thus, the number of objects in an image can be estimated via histogram shape analysis.
If the histogram of an image is multi-modal, that is, the image is highly textured or complex, it typically has many local minima and maxima. To weaken the effect of these local extrema on the shape analysis, the histogram is pre-processed with a median filter, which preserves the valleys. Let $s$ denote the grey levels of an image. Finally, image pixels are adaptively clustered into several sub-regions according to their intensity distribution via histogram analysis. The median filter lacks an efficient decision rule for obtaining good results on different intervals of the histogram. As the sample size of the median filter increases, the number of valleys falls slightly, which influences the partition into sub-classes. In this paper, the sample size is set to 17 based on experiments.
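The histogram analysis above can be sketched as follows. This is a minimal illustration under stated assumptions: the histogram is a list of 256 bin counts, the window is truncated at the ends of the histogram, and the number of sub-regions is taken as the number of peaks that survive median filtering. The helper names `median_filter` and `count_peaks` are hypothetical; only the sample size of 17 comes from the text.

```python
from statistics import median

def median_filter(hist, size=17):
    """Sliding-window median of a 1-D histogram, truncated at both ends."""
    half = size // 2
    n = len(hist)
    return [median(hist[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]

def count_peaks(hist, size=17):
    """Estimate the number of sub-regions as the number of histogram peaks."""
    smooth = median_filter(hist, size)
    # Collapse plateaus so that a flat top counts as a single peak.
    seq = [smooth[0]]
    for value in smooth[1:]:
        if value != seq[-1]:
            seq.append(value)
    padded = [float("-inf")] + seq + [float("-inf")]
    return sum(1 for a, b, c in zip(padded, padded[1:], padded[2:]) if a < b > c)
```

For a histogram with two broad modes and an isolated spike in the gap between them, the unfiltered count (`size=1`) reports three peaks, while the filtered count reports two, because the median filter removes the spike and leaves the valley intact.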

Multi-scale Foreground Extraction
The foreground extraction task is to deduce the foreground object from the image $u$ using the appearance model and the edges. However, the appearance model and edges extracted from a single component cannot produce a completely correct segmentation for highly textured images. In the different smoothed components, the inhomogeneous sub-regions are smoothed and compactly distributed, so the number of Gaussians and the parameters of each GMM differ between components. This difference makes the segmentation results inconsistent across components. To evaluate segmentation performance, we use the IOU of the foreground objects to measure the significance level $Ls(k)$ of the segmentation of the component $u_k$. As the number of smoothing iterations increases, the significance level decreases monotonically. According to the change of the significance level with the smoothing number $k$, the segmentation termination condition is defined as follows:
$$\Delta_k Ls(k) = 0 \qquad (9)$$
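The termination rule can be sketched as below. The definition of the significance level is not fully recoverable here, so this sketch assumes that $Ls(k)$ is the IOU between the foreground extracted from $u_k$ and the foreground extracted from the raw image $u_0$, and it replaces the exact zero in (9) with a small tolerance `tol`. The names `iou`, `significance_levels` and `stop_scale`, and the representation of a foreground as a set of pixel coordinates, are all assumptions made for illustration.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two sets of foreground pixel coordinates."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 1.0

def significance_levels(masks):
    """Ls(k) for k = 1..K, assumed to be IoU(F(u_k), F(u_0))."""
    return [iou(mask, masks[0]) for mask in masks[1:]]

def stop_scale(ls, tol=1e-3):
    """Smallest scale k >= 2 with |Ls(k) - Ls(k-1)| <= tol, else the last scale.

    ls[0] holds Ls(1), so list index i corresponds to scale i + 1.
    """
    for i in range(1, len(ls)):
        if abs(ls[i] - ls[i - 1]) <= tol:
            return i + 1
    return len(ls)
```

With four foreground masks whose agreement with the first one is 0.9, 0.8 and 0.8, `stop_scale` returns 3, the first scale at which the significance level no longer changes.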

Experimental Results
The experiments are conducted using Visual Studio 2013 on a PC with an Intel Core i5 CPU @ 3.40 GHz and 4 GB of RAM, without any particular code optimization. To improve segmentation performance, non-linear smoothing is used to optimize the appearance of images, and histogram shape analysis is used to estimate the optimal number of Gaussians for GMM construction. An image with several sub-regions (see Fig. 1) is used as an example. These sub-regions are smoothed and a series of edge-preserving components is generated, which strengthens the separation of the foreground from the background. The histogram shape analysis also improves the estimate of the number of Gaussians. To further test the segmentation performance of this approach, it is compared with related graph-based segmentation methods, namely Grab Cut [4] and Grab Cut with GMM optimization [7]. The results of the different methods are shown in Fig. 2. For a simple scene image with strong color contrast between the foreground and the background (Fig. 2a, b), the three algorithms treat edge details almost identically. This method and Grab Cut with GMM optimization segment sub-regions more accurately than Grab Cut (shown by the red circle in Fig. 2a). The reason is that their clustering methods yield a more reasonable estimate of the color clusters for the GMM.
For a complex scene image with multiple sub-regions (Fig. 2c, d), this method handles edges better than the other two methods, as shown by the red circle in Fig. 2c. This is because smoothing generates a series of edge-preserving components, which avoids losing edge information while the sub-regions are smoothed. The segmentation is thus improved by enhancing the consistency of sub-regions while preserving edge information. By integrating the multi-scale appearances of an image and optimizing the GMMs for each region, this method improves segmentation performance for complex scene images.
The IOU metric [12] is adopted to evaluate the compared methods. It is computed as
$$IOU = \frac{|F(s) \cap F(g)|}{|F(s) \cup F(g)|}$$
where $F(s)$ and $F(g)$ denote the foreground object of the segmentation and of the ground truth, respectively. The F-measure is also used for evaluation; it is computed as
$$F\text{-measure} = \frac{2 \cdot precision \cdot recall}{precision + recall}$$
where $precision = |F(s) \cap F(g)| / |F(s)|$ and $recall = |F(s) \cap F(g)| / |F(g)|$.
The CPU time and segmentation evaluations are listed in Table 1 for the images in Fig. 2. The IOU and F-measure of this method are higher than those of the other methods, which shows its superiority. However, this method has a higher computational cost in order to achieve better segmentation. The extra CPU time is mainly spent on the iterative computation of the non-linear smoothing, and the number of iterations depends on the degree of inhomogeneity of the sub-regions.
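The evaluation metrics used in this section can be computed as in the following minimal sketch, which assumes that the segmented foreground and the ground-truth foreground are given as sets of pixel coordinates or indices. The function name `evaluate` and the returned dictionary keys are hypothetical.

```python
def evaluate(seg, gt):
    """Return IOU, precision, recall and F-measure for two foreground sets."""
    overlap = len(seg & gt)
    union = len(seg | gt)
    precision = overlap / len(seg) if seg else 0.0
    recall = overlap / len(gt) if gt else 0.0
    denom = precision + recall
    return {
        "iou": overlap / union if union else 1.0,
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / denom if denom else 0.0,
    }
```

For example, a segmentation of 8 pixels that shares 6 pixels with an 8-pixel ground truth gives a precision and recall of 0.75, an F-measure of 0.75, and an IOU of 0.6.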

Conclusion
This paper proposes two modifications to the Grab Cut object segmentation method that improve segmentation performance without increasing user interaction. For images with inhomogeneous sub-regions, the non-linear smoothing algorithm is shown to optimize the image appearance for segmentation. In addition, histogram shape analysis helps estimate a suitable number of Gaussians in each GMM so that the foreground and background are modeled well. Compared with the original Grab Cut and related improved methods, this method achieves superior segmentation. However, it incurs a high computational cost to achieve these results. To reduce the computational cost, we will design a method that adaptively labels an initial curve adjacent to the boundaries. On the other hand, the fixed sample size of the median filter might yield an unsuitable number of Gaussians for each GMM. We plan to construct an algorithm that adaptively determines the sample size of the median filter, which should give a more accurate number of clusters from the histogram shape.