An Efficient Image Co-segmentation Algorithm based on Active Contour and Image Saliency

Image co-segmentation is the problem of extracting common objects from multiple images, and it is a very challenging task. In this paper we address the co-segmentation problem by embedding image saliency into an active contour model. Active contour is a well-known and effective image segmentation method, but it performs poorly when applied directly to co-segmentation. We therefore introduce additional information to improve the segmentation results, namely saliency, which highlights the region of interest. To optimize the model, we propose an efficient level-set optimization method based on super-pixels, hierarchical computation and convergence judgment. We evaluated the proposed method on the iCoseg and MSRC datasets. Compared with other methods, ours yielded better results and demonstrated the significance of using image saliency in active contour.


Introduction
Co-segmentation is a relatively new and fast-emerging research field in computer vision. Different from traditional segmentation, which separates the foreground objects from the background of a single image, co-segmentation is defined as the simultaneous segmentation of similar regions from two (or more) images. The need for co-segmentation has grown in recent years, as many applications require desired objects to be carved out of large, loosely organized collections of images or videos. Co-segmentation is difficult due to similar backgrounds and variations in the targeted objects.
The existing co-segmentation methods mainly include Markov Random Field (MRF) based, active contour based, clustering based, user interaction based and graph based approaches. MRF based co-segmentation methods include Rother et al. [1], Mukherjee et al. [2], Hochbaum et al. [3] and Vicente et al. [4]. Rother et al. [1] is the first work on co-segmentation. They used the $\ell_1$-norm to measure foreground consistency and the Trust Region Graph Cut (TRGC) algorithm to optimize the co-segmentation model. After Rother et al. [1], several other co-segmentation methods were proposed that improved the model construction and optimization. Vicente et al. [4] summarized the previous works [1-3] and presented a new MRF model based on the Boykov-Jolly model, optimized with the Dual Decomposition (DD) method. Co-segmentation methods based on clustering mainly include Joulin et al. [5], who proposed a method based on discriminative clustering. In their paper, normalized cuts and kernel methods are used in a discriminative clustering framework: the goal is to assign foreground/background labels jointly to all images, so that a supervised classifier trained with these labels leads to maximal separation of the two classes [5]. User interaction based methods mainly include Batra et al. [6], who proposed a co-segmentation method called iCoseg built on a few user interactions. iCoseg can intelligently recommend regions to scribble on, and users following these recommendations can achieve good-quality cutouts with significantly less time and effort than exhaustively examining all cutouts [6]. Vicente et al. [7] introduced a graph based method. They presented the concept of object co-segmentation, which focuses on segmenting "objects" rather than "stuff". In their work, they first generated candidate object-like segmentations, then described the relationships between the regions with a fully connected graph, and finally used Loopy Belief Propagation (LBP) to obtain the results. In 2012, Meng et al. [8] proposed a co-segmentation model based on active contour, which they improved in 2013 [9]. The two closely related works share the same core idea: an energy function is established from the foreground similarity among the images and the background consistency within each image, and a level-set formulation with its optimization is then derived.
This paper presents a co-segmentation method, which we call ACIS, that introduces image saliency into the well-known active contour segmentation framework. Following the active contour scheme, we design an energy function in which image saliency provides additional information about the foreground. The energy function is minimized with a level-set strategy. To speed up the computation, we propose an efficient level-set optimization based on hierarchical segmentation, super-pixels and an improved termination criterion. We conducted experiments on the widely used iCoseg and MSRC image databases, and the results demonstrate the effectiveness of the proposed co-segmentation algorithm. The contributions of this paper are summarized as follows: -A new image co-segmentation model is constructed by introducing saliency information into the active contour framework. Saliency information brings useful foreground features that lead to better performance.
-An efficient algorithm is developed through combining super-pixels, hierarchical computation and convergence judgment for solving the co-segmentation problem based on the model above.
The rest of this paper is organized as follows. Section 2 introduces the proposed co-segmentation model based on active contour and image saliency. Section 3 describes our optimization algorithm. Experiments and results are discussed in Section 4, and we conclude the paper in Section 5.

The proposed co-segmentation model
As described in Section 1, the goal of co-segmentation is to extract common objects from a group of images. Active contour is a classical and effective method for single-image segmentation, so it is natural to apply it to the co-segmentation task. Our proposed co-segmentation model considers not only foreground consistency but also background consistency, and not only consistency within a single image but also consistency between images. In addition, we introduce saliency information into the active contour framework to achieve better performance. Following the active contour algorithm, each image is given an initial contour, and each contour evolves according to the energy function until the energy is minimized. In this paper the initial contour is a rectangle chosen without manual intervention, because we observed in the experiments that the initial contour has little effect on the final results; the size of the rectangle is determined by the parameters.
Let $I = \{I_1, I_2, \dots, I_n\}$ be a group of images, and let $C_k$ be the initial contour of $I_k$. Based on the initial contour set $C = \{C_1, C_2, \dots, C_n\}$, the contour curves are evolved to segment the common object. Let $\omega_{in}^k$ be the inner region of the k-th image curve and $\omega_{out}^k$ be the outer region. The goal of co-segmentation is then to find the curves that minimize the total energy:

$$C^* = \arg\min_{C} \sum_{k=1}^{n} E(C_k) \qquad (1)$$

For a given curve $C_k$, following the active contour Chan-Vese (C-V) model [10], our energy function $E(C_k)$ is expressed as:

$$E(C_k) = E_{reg}(C_k) + \lambda_{fs} E_{fs}^k + \lambda_{bs} E_{bs}^k + \lambda_{fm} E_{fm}^k + \lambda_{bm} E_{bm}^k + \lambda_{s} E_{s}^k \qquad (2)$$

where $\lambda_{fs}$, $\lambda_{bs}$, $\lambda_{fm}$, $\lambda_{bm}$ and $\lambda_{s}$ are the weights of the respective terms. The details of each term in Eq. 2 are described as follows.
The regularization term of the curve: The regularization term describes attributes of the curve itself, namely the inner area and the curve length:

$$E_{reg}(C_k) = \mu \cdot Area(\omega_{in}^k) + \nu \cdot Length(C_k) \qquad (3)$$

where $Area(\omega_{in}^k)$ is the area enclosed by the curve, $Length(C_k)$ is the curve length, and $\mu$ and $\nu$ are the weights of the two factors.
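As a concrete illustration, the regularization term of Eq. 3 can be evaluated on a binary mask of the inner region. The discrete length approximation below (counting axis-aligned sign changes between neighbouring pixels) is our own simplification, not the paper's exact discretization:

```python
import numpy as np

def regularization_term(mask, mu=0.01, nu=0.05):
    """E_reg = mu * Area(inner region) + nu * Length(curve).

    `mask` is a binary array: 1 inside the contour, 0 outside.
    The curve length is approximated by counting the horizontal
    and vertical transitions between inside and outside pixels.
    """
    area = mask.sum()
    m = mask.astype(int)
    # Each 0/1 transition along a row or column contributes one
    # unit of boundary length.
    length = np.abs(np.diff(m, axis=1)).sum() + np.abs(np.diff(m, axis=0)).sum()
    return mu * area + nu * length

# A 4x4 square inner region inside a 10x10 image:
mask = np.zeros((10, 10))
mask[3:7, 3:7] = 1
energy = regularization_term(mask)  # area = 16, boundary length = 16
```

With the paper's weights ($\mu = 0.01$, $\nu = 0.05$) this square contributes $0.01 \cdot 16 + 0.05 \cdot 16 = 0.96$ to the energy.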
The energy of foreground consistency of the k-th image: The foreground consistency of a single image guarantees the consistency of the foreground pixels and helps to separate foreground from background. When the pixels inside the curve are consistent with the foreground pixels, they are more likely to be foreground pixels. The energy is therefore designed as the sum of the consistency between the pixels inside the curve and the inner region:

$$E_{fs}^k = \iint_{\omega_{in}^k} f\big(I_k(x, y), g(\omega_{in}^k)\big)\, dx\, dy \qquad (4)$$

In Eq. 4, $g(\omega_{in}^k)$ is the region metric and $f\big(I_k(x,y), g(\omega_{in}^k)\big)$ measures the similarity between a pixel and a region [8]. The more similar the pixels are to the inner region, the larger $E_{fs}^k$ is, and vice versa.
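A minimal sketch of this term follows, under two assumptions (the exact forms of $f$ and $g$ follow Meng et al. [8] and are not given here): $g$ is taken as the mean colour of the inner region, and $f$ as a Gaussian similarity with the paper's bandwidth $\beta$. Swapping $g(\omega_{in}^k)$ for $g(\omega_{out}^k)$ gives the background term of Eq. 5 analogously:

```python
import numpy as np

def foreground_consistency(image, mask, beta=15.0):
    """Sketch of E_fs: similarity between the pixels inside the
    curve and the inner-region feature g(omega_in).

    Assumed stand-ins for the terms of Eq. 4:
      g(omega_in) -> mean colour of the inner region,
      f(p, g)     -> exp(-||p - g||^2 / beta^2),
    so E_fs grows when the inner pixels are mutually consistent.
    """
    inner = image[mask > 0]            # pixels inside the curve, shape (N, 3)
    g_in = inner.mean(axis=0)          # region feature g(omega_in)
    f = np.exp(-np.sum((inner - g_in) ** 2, axis=1) / beta ** 2)
    return float(f.sum())
```

For a perfectly uniform inner region every pixel has similarity 1, so the energy equals the number of inner pixels.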

The energy of background consistency of the k-th image:
The background consistency of a single image is analogous to the foreground consistency, except that it ensures consistency of the background pixels. Because the backgrounds of the images may be similar, we take the energy of background consistency into account; $\lambda_{bs}$ is set to zero when the backgrounds of the images are not similar. The energy of background consistency is designed as the consistency between the pixels inside the curve and the external region:

$$E_{bs}^k = \iint_{\omega_{in}^k} f\big(I_k(x, y), g(\omega_{out}^k)\big)\, dx\, dy \qquad (5)$$

The more similar the pixels are to the outer region, the larger $E_{bs}^k$ is, and vice versa.

The energy of foreground consistency between the k-th image and the other images: The foreground consistency across multiple images guarantees that the foreground region of an image is consistent with the foregrounds of the other images. It is expressed as the sum, over the remaining images, of the consistency between the curve of the k-th image and their inner regions:

$$E_{fm}^k = \sum_{j \ne k} \iint_{\omega_{in}^k} f\big(I_k(x, y), g(\omega_{in}^j)\big)\, dx\, dy \qquad (6)$$

If the curve of the image is consistent with most of the inner regions, $E_{fm}^k$ is larger, and vice versa.

The energy of background consistency between the k-th image and the other images: Similar to the multi-image foreground consistency, the multi-image background consistency is expressed as the sum, over the remaining images, of the consistency between the outer region of the k-th image and their outer regions:

$$E_{bm}^k = \sum_{j \ne k} \iint_{\omega_{out}^k} f\big(I_k(x, y), g(\omega_{out}^j)\big)\, dx\, dy \qquad (7)$$

The more consistent the external regions are with each other, the larger $E_{bm}^k$ is.

The saliency energy of the k-th image: Saliency is used to obtain additional foreground information. Pixels with larger saliency values are more likely to be foreground pixels, which makes the discrimination between foreground and background easier. The energy is therefore designed as the sum of the differences between the saliency value of each pixel and the saliency value of the region:

$$E_{s}^k = \iint_{\omega_{in}^k} h\big(J_k(x, y), l(\omega_k)\big)\, dx\, dy \qquad (8)$$

In Eq. 8, $J_k(x, y)$ is the saliency value at each point of the k-th image, $l(\omega_k)$ is the saliency value measuring the region saliency, and $h\big(J_k(x,y), l(\omega_k)\big)$ measures the pixel/region saliency difference. The larger the saliency value of a pixel, the larger $E_{s}^k$ is, and vice versa.
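The saliency term of Eq. 8 can be sketched as follows. The paper describes $h$ and $l$ only qualitatively, so both are assumptions here: we take $l(\omega_k)$ as the mean saliency of the image and $h(J, l) = J - l$, which matches the stated behaviour that more salient pixels inside the curve raise the energy:

```python
import numpy as np

def saliency_energy(sal_map, mask):
    """Sketch of E_s = sum over inner pixels of h(J(x,y), l(omega)).

    Assumed stand-ins (the exact h and l are not specified here):
      l(omega) -> mean saliency of the whole image,
      h(J, l)  -> J - l, the pixel/region saliency difference,
    so inner pixels more salient than average increase E_s.
    """
    l = sal_map.mean()                   # region saliency l(omega)
    return float((sal_map[mask > 0] - l).sum())
```

A salient blob aligned with the inner region thus yields a large positive energy, while a curve placed on non-salient background yields a negative one.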

Optimization method
This section addresses the optimization of the model and proposes an efficient level-set optimization method based on super-pixels, hierarchical segmentation and an improved termination criterion.

Basic level-set optimization
The level-set method of Osher and Sethian [11] can be used to minimize the energy function in Eq. 2. With the level-set function $\varphi_k(x, y, t)$, the curve $C_k$ is represented by $\delta(\varphi_k)$, the inner region by $H(\varphi_k)$ and the outer region by $1 - H(\varphi_k)$, where $\Omega_k$ is the domain of the k-th image, $\delta(x)$ is the Dirac function and $H(x)$ is the Heaviside function; Eq. 2 can then be rewritten as an integral over $\Omega_k$ (Eq. 9). Following [10], the regularized Dirac and Heaviside functions are defined as:

$$H_\varepsilon(x) = \frac{1}{2}\left(1 + \frac{2}{\pi}\arctan\frac{x}{\varepsilon}\right), \qquad \delta_\varepsilon(x) = \frac{1}{\pi}\,\frac{\varepsilon}{\varepsilon^2 + x^2} \qquad (10)$$

The energy in Eq. 9 is optimized with the Euler-Lagrange equation by treating $f\big(I_k(x,y), g(\omega_{out}^k)\big)$ and $\varphi_k$ as independent and keeping $g(\omega_{in}^k)$ and $g(\omega_{out}^k)$ constant. Setting $\Delta t = 1$, we obtain the iterative update of the level-set function:

$$\varphi_k^{t+1} = \varphi_k^{t} + \Delta t\,\frac{\partial \varphi_k}{\partial t} \qquad (11)$$

This yields the co-segmentation algorithm based on active contour and image saliency, whose iterative equation of the level-set function is Eq. 11.
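To make the evolution concrete, here is a minimal single-image sketch of one gradient-descent step in the spirit of Eq. 11, using the standard Chan-Vese data terms; the multi-image and saliency forces are omitted for brevity (they would enter the bracket the same way), and the parameter names are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

def level_set_step(phi, image, dt=1.0, mu=0.01, nu=0.05,
                   lam_in=1.0, lam_out=1.0, eps=1.0):
    """One Chan-Vese style gradient-descent step on phi.

    Gradient ascent/descent directions: pixels resembling the
    inner mean c_in push phi up (grow the foreground), pixels
    resembling the outer mean c_out push it down.
    """
    inside = phi > 0
    c_in = image[inside].mean() if inside.any() else 0.0
    c_out = image[~inside].mean() if (~inside).any() else 0.0

    # Regularised Dirac delta (Eq. 10): update acts near the zero set.
    delta = (eps / np.pi) / (eps ** 2 + phi ** 2)

    # Curvature div(grad(phi)/|grad(phi)|) driving the length term.
    gy, gx = np.gradient(phi)
    norm = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
    kappa = np.gradient(gx / norm, axis=1) + np.gradient(gy / norm, axis=0)

    force = (nu * kappa - mu
             - lam_in * (image - c_in) ** 2
             + lam_out * (image - c_out) ** 2)
    return phi + dt * delta * force
```

Iterating this step, with the region features re-estimated each time, drives the zero level set of `phi` toward the object boundary.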

After each iteration, the region features $g(\omega_{in}^k)$ and $g(\omega_{out}^k)$ are updated. If the convergence conditions are met, the algorithm terminates. When working with level sets and Dirac delta functions, a standard procedure is to reinitialize $\varphi$ to the signed distance function of its zero level curve. This prevents the level-set function from becoming too flat, and can also be seen as a rescaling and regularization [10].
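The reinitialization step can be sketched as below. This is a brute-force signed-distance computation suitable for the small grids of a demo, not for production-size images (a fast marching or sweeping method would be used there); the interface-detection rule is a simple assumption of ours:

```python
import numpy as np

def reinitialize(phi):
    """Reset phi to an approximate signed distance function of its
    zero level curve, keeping |grad(phi)| close to 1 so the
    level-set function does not flatten out [10].
    """
    h, w = phi.shape
    sign = np.sign(phi)
    # Interface cells: sign changes against the right/down neighbour.
    band = np.zeros((h, w), dtype=bool)
    band[:, :-1] |= sign[:, :-1] != sign[:, 1:]
    band[:-1, :] |= sign[:-1, :] != sign[1:, :]
    by, bx = np.nonzero(band)
    if by.size == 0:              # no zero crossing: leave phi alone
        return phi.copy()
    ys, xs = np.mgrid[0:h, 0:w]
    # Distance from every cell to its nearest interface cell.
    d = np.sqrt((ys[..., None] - by) ** 2 + (xs[..., None] - bx) ** 2)
    return sign * d.min(axis=-1)
```

After reinitialization the magnitude of `phi` again reflects the distance to the curve, while its sign (inside/outside) is preserved.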

An efficient level-set optimization
If the algorithm derived above is computed directly on pixels, the calculation is very expensive. To speed up the optimization, the following strategies are introduced: super-pixels, hierarchical computation and convergence judgment.

Super-pixels
To accelerate the computation, the algorithm operates on super-pixels, which reduces the computational burden drastically. Taking a 200×200 image as an example, computing directly on the original pixels processes all 40000 pixels, whereas with super-pixels the number drops to 1000, i.e. 1/40th of the original. In this paper we use the Simple Linear Iterative Clustering (SLIC) method [12] to generate super-pixels, since it is widely used and gives good results; the default number of super-pixels is 1000.
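Once a label map is available (from SLIC or any other over-segmentation), all energy terms can work on per-region features instead of per-pixel values. A minimal sketch of that aggregation, assuming the region feature is the mean colour:

```python
import numpy as np

def superpixel_features(image, labels):
    """Mean colour of every super-pixel via np.bincount, so the
    energy terms touch ~1000 region features instead of 40000 pixels.

    `labels` would come from SLIC (e.g. skimage.segmentation.slic);
    here it can be any integer label map with the image's height/width.
    """
    n = labels.max() + 1
    flat = labels.ravel()
    counts = np.bincount(flat, minlength=n).astype(float)
    # One weighted bincount per colour channel, stacked to (n, channels).
    means = np.stack(
        [np.bincount(flat, weights=image[..., c].ravel(), minlength=n)
         for c in range(image.shape[-1])], axis=1)
    return means / counts[:, None]
```

On the 200×200 example this turns a 40000-element data term into a 1000-element one, which is where the claimed 40x reduction comes from.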

Hierarchical Computation
From the viewpoint of down-sampling, hierarchical data processing can also speed up the computation. The hierarchical computation consists of several layers, processed one by one; each layer works on one scale of the original image and runs the co-segmentation algorithm until convergence. The hierarchy starts from a small image size, where the contour converges quickly to the object boundary, and then propagates the segmentation from the small images to the larger ones. Since the initial contour of the last layer is already close to the final result, convergence there is fast. We call this coarse-to-fine scheme cascade acceleration; it significantly increases the speed of the algorithm.
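The coarse-to-fine driver can be sketched as follows. `segment_fn` stands in for one full run of the level-set evolution described above, and the nearest-neighbour resampling keeps the sketch dependency-free; both are illustrative assumptions, as are the default scales and iteration counts (taken from the experimental settings):

```python
import numpy as np

def default_phi(shape, margin=0.2):
    """Rectangular initial contour: positive inside, negative outside."""
    phi = -np.ones(shape)
    my, mx = int(shape[0] * margin), int(shape[1] * margin)
    phi[my:shape[0] - my, mx:shape[1] - mx] = 1.0
    return phi

def hierarchical_cosegment(image, segment_fn, scales=(0.1, 0.5, 1.0),
                           iters=(100, 30, 30)):
    """Run the co-segmentation at each scale and pass the upsampled
    level-set function on as the next layer's initialisation."""
    def resize(a, shape):          # nearest-neighbour resampling
        ry = np.arange(shape[0]) * a.shape[0] // shape[0]
        rx = np.arange(shape[1]) * a.shape[1] // shape[1]
        return a[ry][:, rx]

    h, w = image.shape[:2]
    phi = None
    for s, n_iter in zip(scales, iters):
        hs, ws = max(1, int(h * s)), max(1, int(w * s))
        small = resize(image, (hs, ws))
        # First layer starts from the default rectangle contour;
        # later layers reuse the previous (upsampled) result.
        phi = (default_phi((hs, ws)) if phi is None
               else resize(phi, (hs, ws)))
        phi = segment_fn(small, phi, n_iter)
    return phi
```

Because each layer only refines an already-plausible contour, the expensive full-resolution layer needs far fewer iterations.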

Convergence Judgment
In general, an algorithm of this kind fixes an iteration number and runs until it is reached. A large iteration number guarantees convergence but at a huge computational cost, while a small iteration number avoids the cost but risks stopping before satisfactory convergence. In this paper, our algorithm judges during the iterative process whether it has converged and terminates in time. Specifically, the algorithm is considered converged when $\varphi_k^{t+1}$ remains stable, so we can set a relatively large iteration cap without worrying about the computational cost. In this way we accelerate the algorithm while ensuring good segmentation results.
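One reasonable reading of "$\varphi^{t+1}$ tends to remain stable" is that the mean absolute change of the level-set function drops below a tolerance; the exact rule and threshold below are our assumptions:

```python
import numpy as np

def has_converged(phi_new, phi_old, tol=1e-3):
    """Early-stopping test: declare convergence once phi stops moving,
    so a generous iteration cap costs nothing after the curve settles."""
    return np.mean(np.abs(phi_new - phi_old)) < tol
```

The evolution loop then simply breaks out as soon as `has_converged` returns true, instead of always running the full iteration budget.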
To sum up, our algorithm of co-segmentation is summarized in Algorithm 1.

Experiments
In order to evaluate the proposed co-segmentation algorithm, we conducted the experiments on commonly used datasets and compared our algorithm with other competitive methods.

Image dataset
The iCoseg image dataset [13] and the MSRC image dataset [14], both widely used in the co-segmentation field, are used in our experiments. The iCoseg dataset was established by Batra et al. [6] and contains 37 categories, each with several to dozens of images. The MSRC dataset was established by Criminisi et al. [14]; it includes 23 categories, each with dozens of images.
The results obtained by each method are evaluated against the ground truths using the error rate, defined as the number of misclassified pixels divided by the total number of pixels.
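The metric is straightforward to compute from two binary masks:

```python
import numpy as np

def error_rate(pred, gt):
    """Misclassified pixels divided by total pixels; `pred` and `gt`
    are binary foreground masks of the same shape."""
    return float(np.mean(pred.astype(bool) != gt.astype(bool)))
```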

Algorithm 1. Our co-segmentation algorithm based on active contour and image saliency (ACIS)
-For the initial contour, α = 0.2 · min{height, width}, where height and width are the height and width of an image.
-In Eq. 11, considering the relative magnitude of each term, we set μ = 0.01 and ν = 0.05.
-The parameter for measuring whether pixels are similar is β = 15 [8].
-In the experiments, the number of levels of the cascade acceleration is three, the expansion scales are {0.1, 0.5, 1} and the corresponding iteration numbers are {100, 30, 30}.

Compared Methods
GrabCut [15] is a well-known single-image segmentation method; we compare against it to show that ACIS performs better. We also compare our method with other co-segmentation methods, namely CoSB [16] and CoSand [17]: CoSB is based on active contour and CoSand on thermal diffusion. Default parameter values are taken from the original papers [15], [16], [17].

Evaluation of saliency methods
The saliency term in Eq. 11 greatly affects the segmentation results: as its weight increases, the segmentation moves closer to the saliency results. In this paper we consider four saliency methods, GS [18], MR [19], SF [20] and wCtr [21], and compare them through careful evaluation. The results are shown in Fig. 1: the first row shows the original images and the second row the ground truth, while the remaining rows show the saliency maps extracted by GS [18], MR [19], SF [20] and wCtr [21], respectively. From Fig. 1 we can see that GS extracts the foreground saliency well, but some unnecessary background regions also receive high saliency values. MR produces lower values and cannot accurately extract the foreground saliency. SF fails to accurately extract the foreground saliency in the bear class. The extraction results of wCtr, meanwhile, are more accurate and robust than those of the other methods. Table 1 compares the error rates of the saliency detection methods under different thresholds; for every threshold and every category, wCtr has the lowest error rate. We therefore chose wCtr for saliency extraction in this paper.

Results on iCoseg Image dataset
The first part is a subjective comparison of the co-segmentation results. Fig. 2 compares the different segmentation methods on the iCoseg image dataset. From left to right and top to bottom, the classes are Sox Players, Pyramids, Brighton and Helicopter. For each class, the first row contains the original images and the second row the ground truth. The third row shows the segmentations of GrabCut [15], the fourth and fifth rows those of CoSB and CoSand, respectively, and the sixth row the results of our method.
According to Fig. 2, for the Sox Players class, CoSand accurately separates the foreground from the grass but over-segments the foreground objects, losing information, while GrabCut and CoSB give poor results. Since the foregrounds and backgrounds of the Pyramids class are very similar, GrabCut and CoSB fail to distinguish the pyramids from the desert, whereas our method can separate them to some degree. For the Brighton class, CoSB produces good segmentations, but GrabCut performs badly and CoSand again over-segments. For the Helicopter class, GrabCut and CoSB fail on some images and CoSand also over-segments. Overall, our method, ACIS, is more effective and robust than the others. Table 2 compares the error rates of the different methods.
GrabCut is a single-image segmentation method. It therefore performs well on the skating and gymnastics classes, where the background is relatively simple, but its performance deteriorates on the other classes. CoSB performs well on the Brighton class, but because it only considers the consistency of foreground and background without adding further foreground information, it is difficult for it to define a suitable energy function in many co-segmentation cases, especially under complex backgrounds. CoSand, a thermal diffusion method, gives good results on the skating and hot balloons classes, but since it relies heavily on color, it fails easily when the foreground contains different colors. ACIS is based on active contour with saliency: the active contour keeps the curve close to the object edges and the saliency provides stronger robustness, so our method obtains good co-segmentation results on most classes.

Results on MSRC image dataset

Fig. 3 compares the different segmentation methods on the MSRC image dataset. For the cow class, GrabCut and ACIS perform well despite the varying foreground colors, while CoSB and CoSand perform badly for the same reason. For the face class, ACIS gives the best segmentations, while the other methods suffer from the many distractions and the complexity of the background. For the sign class, all methods segment relatively well, but in the second image CoSand takes the sign post as foreground although it actually belongs to the background. For the cat class, ACIS again achieves the best results. The error rates of the different segmentation methods are compared in Table 3.
As shown in Table 3, ACIS has the best average error rate. GrabCut has a relatively low error rate on most classes, while CoSB and CoSand do not perform well on this dataset: its foregrounds and backgrounds are relatively complex, and the varying foreground colors make segmentation harder. ACIS has the lowest error rate on the face and cat classes, and its average error rate is the lowest, which confirms the effectiveness of our method. The main difference between ACIS and CoSB is that CoSB uses active contour alone, whereas ACIS combines active contour with image saliency; the experiments above therefore show that introducing saliency information improves the co-segmentation results. In conclusion, compared with the other methods, ours yields better results and demonstrates the significance of using image saliency in active contour.

Efficiency
Besides comparing error rates, we also evaluate efficiency: ACIS is an efficient image co-segmentation algorithm thanks to the optimized level-set method we adopt. Table 4 shows the comparison of computation cost on the MSRC image dataset. The results show that our optimized level-set is much faster than the original level-set.

Conclusion
This paper has proposed an efficient co-segmentation algorithm based on active contour and image saliency. First, we constructed a new image co-segmentation model by introducing saliency information into the active contour framework. Second, we proposed an efficient optimization algorithm that combines the level-set method with super-pixels, hierarchical segmentation and an improved termination criterion. To make the algorithm more robust and effective, the choice of saliency method was examined through careful evaluation. Finally, we conducted a series of experiments on the iCoseg and MSRC datasets; compared with other methods, ours yields better co-segmentation results and is computationally inexpensive.
In future work, we will adopt more accurate foreground information, such as co-saliency, since saliency is computed for each image individually.

Figure 1 .
Figure 1. The first row shows the original images, the second row the ground truth, and the remaining rows the saliency extraction results of GS, MR, SF and wCtr, respectively, on the iCoseg image dataset.

Figure 2 .
Figure 2. Comparison of different segmentation methods on iCoseg image dataset.

Figure 3 .
Figure 3. Comparison of different segmentation methods on MSRC image dataset.

Table 1 .
Comparison of the error rates of different saliency detection methods.

Table 2 .
The comparison of error rates on iCoseg image dataset.

Table 3 .
The comparison of error rates on MSRC image dataset.

Table 4 .
Computation cost comparison of level-set and optimized level-set on MSRC image dataset.