Saliency Detection with Sparse Prototypes: An Approach Based on Multi-Dictionary Sparse Encoding

This paper proposes a bottom-up saliency detection algorithm based on multi-dictionary sparse recovery. Firstly, the SLIC algorithm is used to segment the image into superpixels at multiple levels, and atoms with a high background possibility are selected from the boundary superpixels to construct multiple dictionaries. Secondly, sparse recovery of the entire image is performed with each dictionary, and sub-saliency maps are derived from the sparse recovery errors. The final saliency map is generated in a weighted-fusion manner. Experimental results on three public datasets demonstrate the effectiveness of our model.


Introduction
Visual saliency detection aims to identify the most distinctive regions in an image that draw human attention. This topic has received increasing interest in recent years because of its wide range of applications, such as image segmentation, image retrieval, target recognition, image classification, and video compression. Although significant progress has been made, salient object detection remains a challenging problem.
Sparse representation has been widely used in computer vision tasks in recent years. Typical sparse-representation-based saliency detection models fall into two types. The first is the centre-surround model. Li et al. [1] and Han et al. [2] reconstruct a central image patch using its nearby local regions (i.e., using the nearby local regions as a dictionary to recover the centre patch) and then define its saliency degree by the sparse reconstruction error. However, a given patch is usually very similar to those nearby, so the internal region of a large object cannot be detected well. The second type is based on the hypothesis that image boundaries belong to the background in most cases. Li et al. [3] and Jia et al. [4] construct a background dictionary from the superpixels at the image boundary and perform sparse recovery to obtain the saliency map, as shown in Fig. 2(a); the recovery error determines the saliency degree. In general, their performance is better than that of the centre-surround model. However, they become imprecise when the object appears on the boundary, because the entire set of boundary superpixels is used as dictionary atoms.
Different from previous works, in this paper we propose a bottom-up saliency detection algorithm with sparse prototypes. First, the SLIC algorithm [5] is used to segment the image into superpixels at multiple levels, and reliable atoms are selected from the boundary superpixels to construct multiple dictionaries. Second, sparse encoding of the entire image is performed with these dictionaries, and initial sub-saliency maps are generated according to the sparse recovery errors. The final saliency map is obtained by fusing the sub-saliency maps. The overall flow is shown in Fig. 1; we name the algorithm 'MDSR' in the following.

Dictionary construction
To better capture intrinsic structural information and improve computational efficiency, an input image $I$ is segmented into $N$ superpixels $\{p_1, p_2, \ldots, p_N\}$; the entire image can therefore be represented by its superpixel feature matrix $X = [x_1, x_2, \ldots, x_N]$, where $x_i$ denotes the feature vector of superpixel $p_i$.

It has been shown that image boundaries are good visual cues for background models, which can be exploited for saliency detection [3,4]. On the other hand, salient objects are likely to appear at the centre of a scene. But these assumptions may not always hold. So instead of directly extracting all the image boundary superpixels to construct the dictionary, we first compute the background possibility of each boundary superpixel via its boundary connectivity $BC(p_i)$, which was introduced by Zhu et al. [6] to estimate the probability that an image patch belongs to the background:

$$BC(p_i) = \frac{L_{bnd}(p_i)}{\sqrt{Area(p_i)}}, \qquad L_{bnd}(p_i) = \sum_{j=1}^{N} S(p_i, p_j)\,\delta(p_j \in I_{bd}), \qquad Area(p_i) = \sum_{j=1}^{N} S(p_i, p_j),$$

where $S(p_i, p_j) = \exp\left(-d_{geo}^2(p_i, p_j)/2\sigma^2\right)$, $d_{geo}(p_i, p_j)$ is the geodesic distance between superpixels $p_i$ and $p_j$ in the CIE-Lab colour space, and $\delta(\cdot)$ is 1 for superpixels on the image boundary $I_{bd}$ and 0 otherwise. We then select as atoms the boundary superpixels whose background possibility is larger than a threshold $\tau$, defined as the mean over the boundary superpixels:

$$\tau = \frac{1}{K} \sum_{k=1}^{K} BC(p_k),$$

where $K$ denotes the number of all boundary superpixels. The dictionary $D$ is constructed from the selected atoms, as shown in Fig. 2(b).

Saliency computing
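The atom-selection step above can be sketched as follows. This is a minimal numpy sketch, not the paper's implementation: the precomputed similarity matrix `sim` (assumed Gaussian in the geodesic distance), the helper name `select_atoms`, and taking the threshold as the mean boundary connectivity over the boundary superpixels are our own assumptions.

```python
import numpy as np

def select_atoms(features, sim, on_boundary):
    """Select dictionary atoms from boundary superpixels via boundary
    connectivity.  `features` is (N, d) with one row per superpixel;
    `sim[i, j]` is an assumed precomputed similarity, e.g.
    exp(-d_geo(p_i, p_j)^2 / (2 * sigma^2)) with geodesic distance in
    CIE-Lab space; `on_boundary` is a boolean mask of boundary superpixels."""
    len_bnd = sim @ on_boundary.astype(float)   # soft length along the boundary
    area = sim.sum(axis=1)                      # soft area of each superpixel
    bc = len_bnd / np.sqrt(area)                # boundary connectivity BC(p_i)
    bnd = np.flatnonzero(on_boundary)
    tau = bc[bnd].mean()                        # mean over the K boundary superpixels (assumed)
    atoms = bnd[bc[bnd] > tau]                  # keep likely-background superpixels
    return features[atoms], atoms
```

A boundary superpixel covered by a foreground object is mostly similar to interior superpixels, so its boundary connectivity falls below the mean and it is excluded from the dictionary.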

Sparse encoding
We use the dictionary $D$ as the basis for sparse representation and encode each superpixel $p_i$ by

$$\alpha_i = \arg\min_{\alpha_i} \; \|x_i - D\alpha_i\|_2^2 + \lambda \|\alpha_i\|_1.$$

The sparse recovery error is

$$\varepsilon_i = \|x_i - D\alpha_i\|_2^2,$$

so the reconstruction error of the entire image based on dictionary $D$ is $E = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N]$. According to the theory of sparse representation, the features of superpixels belonging to the salient region (e.g., the person in Fig. 2) differ markedly from the atoms in the dictionary, so their sparse recovery errors are larger than those of superpixels belonging to the background (e.g., the lawn in Fig. 2). We can therefore calculate the saliency value of every superpixel from its sparse recovery error. To reduce the impact of noise, we optimize it with a sigmoid function:

$$s_i = \frac{1}{1 + \exp\left(-b\,(\varepsilon_i - \bar{\varepsilon})\right)},$$

where $s_i$ is the saliency value of superpixel $p_i$, $\bar{\varepsilon}$ is the mean of $E$, and $b$ controls the magnitude of the stretch; it is set to 10 empirically.
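The encoding-and-stretch pipeline can be sketched as below. The paper does not name a sparse solver, so ISTA is used here as one common choice; the function names and iteration count are illustrative assumptions, not the authors' code.

```python
import numpy as np

def sparse_encode(x, D, lam=0.01, n_iter=300):
    """Solve min_a ||x - D a||_2^2 + lam * ||a||_1 with ISTA
    (one common solver; the paper does not specify one).
    D holds the atoms as columns."""
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2 + 1e-12)  # 1 / Lipschitz const.
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ a - x)
        z = a - step * grad
        a = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return a

def saliency(X, D, lam=0.01, b=10.0):
    """Per-superpixel saliency from sparse recovery errors, stretched by
    the sigmoid around the mean error (b = 10 as in the paper).
    X is (N, d) with one feature row per superpixel."""
    errs = np.array([np.sum((x - D @ sparse_encode(x, D, lam)) ** 2) for x in X])
    return 1.0 / (1.0 + np.exp(-b * (errs - errs.mean())))
```

A superpixel well represented by the background atoms gets a small error and hence a saliency near 0, while one far from the dictionary's span gets a saliency near 1.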

Saliency fusion
The scale of the salient object varies from image to image, and so does the degree of its contact with the image boundary. To handle this, we segment the image at $M$ different scales. At every segmentation scale we construct a dictionary, as shown in Fig. 1, then perform sparse recovery and sigmoid optimization with each dictionary to obtain $M$ sub-saliency maps $S_1, \ldots, S_M$. The final saliency map is calculated in a weighted-fusion manner:

$$S = \sum_{m=1}^{M} w_m S_m.$$

Experiment result
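A minimal numpy sketch of the fusion step. The paper does not specify how the fusion weights are chosen, so uniform weights are assumed here, and the final min-max normalisation is our own addition for display and thresholding.

```python
import numpy as np

def fuse(sub_maps, weights=None):
    """Weighted fusion of the M sub-saliency maps.
    `sub_maps` is an (M, H, W) stack; uniform weights are
    assumed when none are given (the paper leaves them unspecified)."""
    sub_maps = np.asarray(sub_maps, dtype=float)
    if weights is None:
        weights = np.full(len(sub_maps), 1.0 / len(sub_maps))
    fused = np.tensordot(weights, sub_maps, axes=1)   # S = sum_m w_m * S_m
    lo, hi = fused.min(), fused.max()
    return (fused - lo) / (hi - lo + 1e-12)           # normalise to [0, 1]
```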

Experiment setting
We test each algorithm on three public datasets: ASD [7], MSRA5000 [8], and SED2 [9]. ASD consists of 1,000 images; it is simple and widely used in practice. MSRA5000 consists of 5,000 challenging images characterized by complex backgrounds and objects at the boundary. SED2 has 100 images, each containing two objects; most objects in SED2 are located at the boundary.
For each image, we obtain 10 saliency maps using the proposed model and 9 other state-of-the-art methods: GB [10], SR [11], PCA [12], DSR [3], LR [13], RPCA [14], BFS [15], RBD [6], and BSCA [16]. We use the AUC and MAE values to evaluate all algorithms. For each method, a binary map is obtained by segmenting each saliency map with a given threshold $T \in [0, 255]$ and compared with the ground-truth mask $G$ to compute the true positive and false positive rates; sweeping $T$ traces the ROC curve, and the area under it gives the AUC value. The MAE metric calculates the average absolute error, per pixel, between the saliency map $S$ and the ground-truth mask of a test image of $W \times H$ resolution:

$$MAE = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \left| S(x, y) - G(x, y) \right|.$$

In the experiments, we segment every test image at three scales (i.e., we set $M = 3$) to balance accuracy and efficiency. The segmentation parameters are [100 10], [200 20], and [300 30]. The parameter $\sigma$ in the boundary-connectivity similarity is set to 1, and the sparse reconstruction parameter $\lambda$ is set to 0.01.
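The two metrics can be computed as sketched below: a straightforward numpy implementation of the threshold sweep described above, with function names of our own choosing.

```python
import numpy as np

def mae(sal, gt):
    """Mean absolute error between a saliency map in [0, 1]
    and a binary ground-truth mask."""
    return np.abs(sal.astype(float) - gt.astype(float)).mean()

def auc(sal, gt):
    """Area under the ROC curve obtained by sweeping the
    threshold T over [0, 255] on the 8-bit saliency map."""
    sal8 = np.round(sal * 255.0)
    gt = gt.astype(bool)
    P = max(int(gt.sum()), 1)        # positives in the mask
    N = max(int((~gt).sum()), 1)     # negatives in the mask
    tpr, fpr = [], []
    for T in range(256):
        pred = sal8 >= T
        tpr.append((pred & gt).sum() / P)
        fpr.append((pred & ~gt).sum() / N)
    # trapezoidal integration; fpr decreases as T grows
    area = 0.0
    for i in range(255):
        area += (fpr[i] - fpr[i + 1]) * (tpr[i] + tpr[i + 1]) / 2.0
    return area
```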

Experiment results
As shown in Table 1, our proposed algorithm outperforms the other 9 methods with lower MAE and higher AUC scores, which indicates that the predicted saliency maps are more similar to the ground truth. Note that the advantage of our algorithm is most obvious on the SED2 dataset, demonstrating the effectiveness of MDSR in addressing the boundary-saliency problem. A visual comparison of several saliency maps is given in Fig. 3. In the first example, the image background is complicated and there is an interfering object at the boundary. In the other test images, the salient objects all contact the image boundaries. As can be seen clearly, compared with the saliency maps generated by the other algorithms, our results highlight the salient object more clearly and completely across different scenes, which demonstrates the robustness of our model.

Conclusion
In this paper, we propose a bottom-up saliency detection algorithm via multi-dictionary sparse encoding. To handle the misjudgment problem when salient objects appear at the image boundary, we segment the image at multiple levels. At each segmentation level, we compute the background possibility of the superpixels located at the image boundary and then select the reliable ones as dictionary atoms. Based on the multiple dictionaries, sparse encoding of the whole image is performed and the sub-saliency maps are generated according to the sparse recovery errors. Experimental results demonstrate the effectiveness of our algorithm.
It is worth noting that our method also scales well. If an application demands high accuracy, a larger M can be used; if speed is the priority, a smaller M suffices. In future research, we will focus on the trade-off between the speed and accuracy of our algorithm.

Fig. 1. The overall flow of our algorithm. $L_{bnd}(p_i)$ is the length of superpixel $p_i$ along the image boundary $I_{bd}$; $\delta(\cdot)$ is 1 for superpixels on the image boundary and 0 otherwise; $d_{geo}(p_i, p_j)$ is the geodesic distance between superpixels $p_i$ and $p_j$ in the CIE-Lab colour space.

Fig. 2. Comparison of dictionary construction: (a) traditional methods, which directly use all boundary superpixels as dictionary atoms; (b) our method, which selects only the boundary superpixels with a high background possibility as atoms.

Table 1: Comparison of AUC and MAE values on three public datasets. (The best performance is marked in red.)