Adaptive Multi-part Target Representation for Tracking

In this paper we propose and evaluate an effective approach based on multiple colour histograms. The target is adaptively divided into non-overlapping regions. The proposed partition does not weaken the robustness of the colour histogram representation, and it can be applied to any class of targets. Experimental results show that the proposed representation improves the tracking accuracy and decreases the number of iterations.


Introduction
Visual localization and tracking play a central role in many applications, such as intelligent video surveillance and smart transportation monitoring. Localization and tracking algorithms aim to find the region most similar to the target in an image sequence. One of the most crucial difficulties in robust tracking is the construction of representation models that can accommodate illumination variations, deformable appearance changes, partial occlusions, etc.
Colour histograms have been widely used to represent, analyse, and characterize images. They allow for significant data reduction and can be computed efficiently; moreover, colour histograms are robust to noise and local image transformations. In the target tracking domain, colour histograms are a popular form of target representation because of their independence from scaling and rotation and their robustness to partial occlusions [1]. Nevertheless, the robustness of such a model is weakened in challenging tasks due to the lack of spatial information. This problem can be limited by computing more than one histogram on different parts of the target [2], but there is no generally accepted solution for a generic division.
In this paper we propose and evaluate an effective approach based on a multi-part model. The target is divided into non-overlapping regions in order to increase the tracking sensitivity to rotations and anisotropic scale changes. The proposed partition does not weaken the robustness of the colour histogram representation, and it can be applied to any class of targets.

Mean shift tracking algorithm
The mean shift (MS) algorithm was first proposed by Fukunaga and Hostetler [3] in 1975. Comaniciu [4] employed the MS algorithm successfully in feature space analysis. The MS algorithm has also been used in image smoothing and image segmentation, with good performance.

Target representation
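In the object tracking domain, a target is usually defined as a rectangular or ellipsoidal region in the image. Currently, a widely used target representation is the colour histogram because of its independence from scaling and rotation and its robustness to partial occlusion. Denote by $\{x_i\}_{i=1,\ldots,n}$ the normalized pixels in the target region. The probability of the feature $u$ ($u = 1, \ldots, m$) in the target model is computed as [1]:

$$q = \{q_u\}_{u=1,\ldots,m}, \qquad q_u = C_h \sum_{i=1}^{n} k\left(\|x_i\|^2\right)\,\delta[b(x_i) - u] \tag{1}$$

where $q$ is the target model, $q_u$ is the probability of the $u$-th element of $q$, $\delta$ is the Kronecker delta function, $b(x_i)$ associates the pixel $x_i$ with its histogram bin, and $k(x)$ is an isotropic kernel profile. The constant $C_h$ is a normalization function defined by:

$$C_h = \frac{1}{\sum_{i=1}^{n} k\left(\|x_i\|^2\right)} \tag{2}$$

Similarly, the probability of the feature $u$ in the target candidate model from the candidate region centred at position $y$ is given by:

$$p(y) = \{p_u(y)\}_{u=1,\ldots,m}, \qquad p_u(y) = C_h \sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)\delta[b(x_i) - u] \tag{3}$$

$$C_h = \frac{1}{\sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)} \tag{4}$$

where $p(y)$ is the target candidate model, $\{x_i\}_{i=1,\ldots,n_h}$ are the pixels in the target candidate region centred at $y$, $h$ is the bandwidth, and $C_h$ is the normalization function, here taken over the candidate region.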
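The kernel-weighted histogram described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the input layout (a list of position and RGB pairs), the flattened bin index, and the use of the Epanechnikov profile up to a constant factor are all assumptions made for the example.

```python
def epanechnikov_profile(r2):
    """Kernel profile k(r2) for a squared normalized distance r2.

    The constant factor is dropped because the histogram is normalized anyway.
    """
    return 1.0 - r2 if r2 <= 1.0 else 0.0


def weighted_histogram(pixels, center, half_size, bins_per_channel=16):
    """Kernel-weighted RGB histogram of a region (hypothetical helper).

    pixels: iterable of ((x, y), (r, g, b)) with 8-bit channel values.
    center: (cx, cy) of the region.
    half_size: (hx, hy), used to normalize pixel coordinates.
    Returns a dict {bin_index: probability} whose values sum to 1.
    """
    cx, cy = center
    hx, hy = half_size
    step = 256 // bins_per_channel
    hist = {}
    total = 0.0
    for (x, y), (r, g, b) in pixels:
        # Squared distance of the normalized pixel from the region centre.
        r2 = ((x - cx) / hx) ** 2 + ((y - cy) / hy) ** 2
        k = epanechnikov_profile(r2)
        if k <= 0.0:
            continue
        # b(x_i): map the colour to a single flattened bin index.
        u = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[u] = hist.get(u, 0.0) + k
        total += k
    if total == 0.0:
        raise ValueError("no pixel falls inside the kernel support")
    # Dividing by the summed kernel weights plays the role of the normalization constant.
    return {u: v / total for u, v in hist.items()}
```

Pixels near the centre contribute more than pixels near the border, which is what makes the representation less sensitive to background clutter at the edges of the region.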
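In order to calculate the likelihood between the target model and the candidate model, a metric based on the Bhattacharyya coefficient [5] is defined using the two normalized histograms $q$ and $p(y)$ as follows:

$$d(y) = \sqrt{1 - \rho[p(y), q]} \tag{5}$$

where

$$\rho[p(y), q] = \sum_{u=1}^{m} \sqrt{p_u(y)\, q_u} \tag{6}$$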
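As a small illustration of this metric, the sketch below computes the Bhattacharyya coefficient and the derived distance for two normalized histograms. Storing histograms as dictionaries from bin index to probability, and the function names themselves, are assumptions of the example.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Sum over bins of sqrt(p_u * q_u); bins missing from either histogram contribute 0."""
    return sum(math.sqrt(p[u] * q[u]) for u in p.keys() & q.keys())


def bhattacharyya_distance(p, q):
    """Distance sqrt(1 - rho); the max() guards against tiny negative rounding errors."""
    rho = bhattacharyya_coefficient(p, q)
    return math.sqrt(max(0.0, 1.0 - rho))
```

Identical histograms give a coefficient of 1 and a distance of 0, while histograms with no bin in common give a coefficient of 0 and a distance of 1.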

Mean shift
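Minimizing the distance $d(y)$ in (5) is equivalent to maximizing the Bhattacharyya coefficient in (6). The optimization is iterative and is initialized with the target position in the previous frame, denoted by $y_0$. Using a Taylor expansion around $p_u(y_0)$, the linear approximation of the Bhattacharyya coefficient $\rho[p(y), q]$ is obtained as:

$$\rho[p(y), q] \approx \frac{1}{2} \sum_{u=1}^{m} \sqrt{p_u(y_0)\, q_u} + \frac{C_h}{2} \sum_{i=1}^{n_h} w_i\, k\left(\left\|\frac{y - x_i}{h}\right\|^2\right) \tag{7}$$

where

$$w_i = \sum_{u=1}^{m} \sqrt{\frac{q_u}{p_u(y_0)}}\;\delta[b(x_i) - u] \tag{8}$$

Since the first term in (7) is independent of $y$, minimizing the distance in (5) amounts to maximizing the second term in (7). In the mean shift iteration, the estimated target moves from $y$ to a new position $y_1$, defined as

$$y_1 = \frac{\sum_{i=1}^{n_h} x_i\, w_i\, g\left(\left\|\frac{y - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n_h} w_i\, g\left(\left\|\frac{y - x_i}{h}\right\|^2\right)} \tag{9}$$

where $g(x) = -k'(x)$.

Adaptive target representations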
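Before the representation is partitioned, it may help to see the mean shift location update from the previous section in code. The following is a minimal sketch, assuming the Epanechnikov profile, for which the derivative term is constant inside the kernel support, so the new position reduces to a weighted mean of the pixel positions. The function name and the argument layout are hypothetical.

```python
import math


def mean_shift_step(positions, bin_ids, q, p, y, h):
    """One mean shift location update (illustrative sketch).

    positions: list of (x, y) pixel coordinates in the search area.
    bin_ids: histogram bin of each pixel, in the same order as positions.
    q, p: dicts {bin: probability} for the target model and the candidate at y.
    y: current candidate centre (x, y); h: kernel bandwidth (radius).
    Returns the new centre, or y unchanged if no pixel carries weight.
    """
    num_x = num_y = den = 0.0
    for (px, py), u in zip(positions, bin_ids):
        r2 = ((px - y[0]) ** 2 + (py - y[1]) ** 2) / (h * h)
        if r2 > 1.0:
            continue  # outside the kernel support, where g is zero
        pu = p.get(u, 0.0)
        if pu <= 0.0:
            continue  # bin absent from the candidate histogram
        # Pixel weight: sqrt(q_u / p_u) for the bin that the pixel falls into.
        w = math.sqrt(q.get(u, 0.0) / pu)
        num_x += w * px
        num_y += w * py
        den += w
    if den == 0.0:
        return y
    return (num_x / den, num_y / den)
```

A tracker would call this function repeatedly, recomputing the candidate histogram at each new centre, until the displacement becomes small or a maximum number of iterations is reached.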

Adaptive partition
Since the colour histogram captures only the global statistics of the target, it lacks spatial information; and because it depends solely on the colour feature, it cannot perform well when an object and its background have similar colours.
For rectangle-based trackers, the adaptive partition is shown in Fig. 1.
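To make the idea of a size-driven partition concrete, the sketch below splits a rectangle into non-overlapping sub-rectangles according to its width and height. The specific rule used here (two halves along the longer side for elongated rectangles, four quadrants otherwise) and the `ratio` threshold are purely hypothetical choices for illustration; they are not the partition defined in Fig. 1.

```python
def partition_rectangle(x, y, w, h, ratio=1.5):
    """Split a rectangle into non-overlapping parts (hypothetical rule).

    (x, y) is the top-left corner, w and h the width and height in pixels.
    Returns a list of (x, y, w, h) sub-rectangles that tile the input exactly.
    """
    if w >= ratio * h:
        # Wide target: left and right halves.
        half = w // 2
        return [(x, y, half, h), (x + half, y, w - half, h)]
    if h >= ratio * w:
        # Tall target: top and bottom halves.
        half = h // 2
        return [(x, y, w, half), (x, y + half, w, h - half)]
    # Roughly square target: four quadrants.
    hw, hh = w // 2, h // 2
    return [
        (x, y, hw, hh),
        (x + hw, y, w - hw, hh),
        (x, y + hh, hw, h - hh),
        (x + hw, y + hh, w - hw, h - hh),
    ]
```

Whatever rule is used, one histogram is then computed per sub-rectangle, so that the set of histograms retains some of the spatial layout that a single histogram discards.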

The weighted Bhattacharyya coefficient
The Bhattacharyya coefficient is used to calculate the likelihood of the target model and the candidate model. Maggio and Cavallaro [6] use the average Bhattacharyya coefficient over the parts as the final coefficient. They do not consider that different parts have different impacts on the overall likelihood. In this paper, we adopt the principle that the higher the likelihood of a sub-part of the rectangle, the larger its weight. In the weighted combination, $\omega(i)$ denotes the weight of the $i$-th sub-part, $BC(p_i(y), q_i)$ is the Bhattacharyya coefficient between $p_i(y)$ and $q_i$, $C$ is the number of sub-parts, $q_i$ and $p_i(y)$ are the probability distributions of the target model and the candidate model for the $i$-th sub-part, respectively, and $\rho(y)$ is the final Bhattacharyya coefficient between the target model and the candidate model.
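One way to realize such a weighting is sketched below. It assumes, as an illustration only, that each weight is the sub-part's own Bhattacharyya coefficient normalized by the sum over all sub-parts, so that better-matching parts count more; the exact weight definition used in the paper may differ. Function names and the dictionary-based histograms are likewise assumptions.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Sum over bins of sqrt(p_u * q_u) for two normalized histograms."""
    return sum(math.sqrt(p[u] * q[u]) for u in p.keys() & q.keys())


def weighted_coefficient(part_candidates, part_models):
    """Combine per-part coefficients with similarity-proportional weights.

    part_candidates, part_models: lists of histograms (dicts), one per sub-part.
    Returns (rho, weights), where the weights sum to 1.
    Assumption: weight_i = BC_i / sum_j BC_j (not confirmed by the source).
    """
    if len(part_candidates) != len(part_models):
        raise ValueError("need one candidate histogram per model histogram")
    bcs = [bhattacharyya_coefficient(p, q)
           for p, q in zip(part_candidates, part_models)]
    total = sum(bcs)
    if total == 0.0:
        # No part matches at all: fall back to uniform weights.
        n = len(bcs)
        return 0.0, [1.0 / n] * n
    weights = [bc / total for bc in bcs]
    rho = sum(w * bc for w, bc in zip(weights, bcs))
    return rho, weights
```

With two parts whose individual coefficients are 1 and 0, for example because one half of the target is occluded, a plain average gives 0.5, whereas this weighting gives 1.0 because the unmatched part receives zero weight.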

Implementation of the algorithm
Input: the location $y_0$ of the target in the previous frame, and the width $w$ and height $h$ of the external rectangle.
Step 1. Using the width and height, decide the adaptive partition and calculate the sub-part target models.
Step 2. Initialize the location of the target in the current sub-part by (10) and (11).
Step 3. Derive the weights {...} for 1, 2, ...
Step 4. Find the next location of the target candidate according to (9).
Step 5. Compute {...} for 1, 2, ... [...]

In order to select the most appropriate colour feature, Emilio Maggio [6] described and evaluated eight representations. The eight representations are derived from seven colour spaces: RGB, rgb, rg, CIELab, XYZ, YCbCr, and HSV, the last of which contributes two representations (HSV-D, HSV-UC) [7]. After evaluating the eight colour representations, they conclude that the RGB-based representation outperforms the others. In this paper, the RGB feature space is therefore selected. The parameters are the same for all test sequences and are described in the following. The Epanechnikov profile is adopted in these experiments. To satisfy the low computational cost imposed by real-time processing, discrete densities with $16 \times 16 \times 16$-bin histograms are used in the RGB feature space. The maximum number of iterations is 10. The results presented in this section are based on a dataset of targets extracted from 6 different test sequences, including faces, vehicles and pedestrians. All of these test sequences were downloaded from [8] and [9]. Fig. 2 shows the target initialization.
The frames shown are the 24th, 39th and 51st, respectively. By analysing these results, it is possible to notice that the multi-part representation improves the performance on average. In particular, in the 39th frame the single-histogram tracking window does not contain the whole car, whereas the multi-part representation does. However, this algorithm still shares some problems with the single-histogram algorithm; notably, the scale remains the same throughout the tracking process.
Fig. 4 shows the number of iterations for the different multi-part representations on the faces sequence shown in the middle of the top row of Fig. 2.

Fig. 4. The number of iterations
The top-left panel is the single-histogram representation, the top-right panel is the two-part partition representation, the bottom-left panel is the seven-part partition representation, and the bottom-right panel is the two-part partition representation following series A, B and C shown in Fig. 1. The average numbers of iterations are 3.96, 3.15, 3.18 and 2.91, respectively; all of the multi-part representations need fewer iterations than the single histogram.

Conclusions
To summarize, we have proposed an adaptive multi-part target representation method for tracking. It adaptively decides how to divide the target according to the size of the external rectangle. The weighted Bhattacharyya coefficient is used to calculate the likelihood of the target model and the candidate model. Experimental results show that the proposed adaptive multi-part representation is more accurate than the single histogram and achieves better results in predicting the correct orientation. Future work includes investigating adaptive scale change.