Visual Target Tracking using Robust Information Interaction between Single Tracker and Online Model

In this paper, a novel tracking algorithm based on the cooperative operation of an online appearance model and typical tracking over contiguous frames is proposed. First, to achieve satisfactory performance in challenging scenes, we establish a robust discriminative tracking model with a linear Support Vector Machine (SVM) and use a particle filter for localization. To fit the particle filter, the outputs of the SVM classifier are mapped into probabilities with a sigmoid function so that the posterior of each candidate sample can be estimated. Then, the tracking loop starts with the median flow method, and the coordinated operation of the two trackers is mediated by the maximum a posteriori (MAP) estimate of the target probability of negative samples, which is defined during the sigmoid fit. Lastly, for model update, we summarize the optimal SVM with a prototype set of predefined budget, and the classifier is retrained on both the prototype set and new data from the tracking results every few frames. Comparative experiments are conducted on real video sequences, and both qualitative and quantitative evaluations demonstrate the robust and precise performance of our method.


Introduction
Visual object tracking aims to estimate the state of a moving target in an image sequence. It serves as an important tool in a variety of vision-based systems designed for video surveillance, autonomous vehicles, human-computer interaction, etc. Generally, an excellent tracking strategy should maintain stable tracking under complex conditions. For an unconstrained tracking task, where only the initial location of the object is given, the limited prior knowledge makes it challenging to overcome drastic appearance variations, e.g., illumination change, occlusion, abrupt object motion, and disturbance caused by background clutter.
In recent years, the tracking-by-detection framework [1][2][3] has become the mainstream scheme for visual object tracking, where the key is to find the candidate sample that best matches the online model. One issue with such a framework is the update rate [1]. On the one hand, a highly adaptive online model easily drifts in the case of noisy updates. On the other hand, a conservative update loses information from the recent contiguous frames, making it difficult to perform well.
To deal with these problems, we present an approach in which a temporary tracker based on the median flow algorithm [4] and the online appearance model are implemented independently and exchange information, so that more robust tracking performance can be obtained.
An online SVM classifier is built to constrain the temporary tracking by the median flow method, and in turn the tracker provides new samples for the model update. The proposed method combines the context model information with the contiguous appearance information, and it effectively alleviates the model update problem, which is closely related to drifting and to the model's adaptability to appearance change.

Online model embedded particle filter

Object appearance model using online linear SVM
To adapt to appearance variations, it is necessary to construct an online object model. In this paper, we use the linear SVM algorithm [5], whose decision function f(x) = w·x + b is obtained by minimizing the regularized hinge loss

min_{w,b} (1/2)||w||^2 + C Σ_i L(y_i, f(x_i)),  with L(y, f) = max(0, 1 − y f),   (2)

where L is the hinge loss function and C is a user-specified penalty factor that balances the model complexity against the loss on the dataset.
The feature vector of each instance is denoted as x = {p, q}. To obtain nonlinear decision boundaries with a linear SVM, the mapping method of [6] is applied to approximate the min-kernel SVM, and a single 1850-dimensional vector is computed for classifier training.
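To make the training step above concrete, the following is a minimal sketch of a linear SVM fitted by subgradient descent on the regularized hinge loss of Eq. (2). It is an illustrative stand-in, not the paper's online solver: the learning rate, epoch count, and update schedule are assumptions.

```python
import numpy as np

def train_linear_svm(X, y, C=100.0, epochs=50, lr=0.01):
    """Fit f(x) = w.x + b by subgradient descent on
    0.5*||w||^2 + C * sum_i max(0, 1 - y_i * f(x_i)).
    A minimal sketch; the paper uses an online SVM whose exact
    update rule is not reproduced here."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in np.random.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:
                # inside the margin: hinge subgradient plus regularizer
                w -= lr * (w - C * y[i] * X[i])
                b += lr * C * y[i]
            else:
                # correctly classified: only the regularizer shrinks w
                w -= lr * w
    return w, b
```

The raw decision value `X @ w + b` is what the sigmoid mapping of Section 2.2 later converts into a probability.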

State estimation by particle filter algorithm
The particle filter [7], also known as the condensation algorithm, has proved to be an effective framework for non-linear, non-Gaussian tracking problems. In this paper, the particle filter serves as the motion model to determine the target position.
The particle set is generated by sampling from the proposal distribution. In our case, a multivariate Gaussian distribution is employed to draw the particle set, and Brownian motion is used as the state dynamics model. In particular, the scale term is drawn from a uniform distribution following the power function with the L1 measurement, which reduces the algorithm complexity and the number of particles required.
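The propagation step described above can be sketched as follows. The state layout [x, y, s] and the specific noise parameters are assumptions for illustration; the paper's power-function scale model is replaced here by a plain uniform jitter.

```python
import numpy as np

def propagate_particles(particles, pos_sigma=8.0,
                        scale_low=0.95, scale_high=1.05, rng=None):
    """Brownian (random-walk) dynamics: Gaussian diffusion on the
    position components and, separately, a uniform multiplicative
    jitter on the scale term. All parameter values are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    out = particles.copy()
    out[:, 0:2] += rng.normal(0.0, pos_sigma, size=(len(particles), 2))
    out[:, 2] *= rng.uniform(scale_low, scale_high, size=len(particles))
    return out
```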
Moreover, with the distribution assumed above, the state of the object is finally estimated by the mean value

E[S] = Σ_n π^(n) s_t^(n).   (4)

Generally, the likelihood π^(n) = p(z_t | s_t^(n)) is measured by the Bhattacharyya distance between the target template and the particle patches. In our case, the decision value f(x) output by the classifier is associated with the weight π. Nevertheless, the output of the SVM is not an actual probability but the distance of the candidate patch to the separating hyperplane in the feature space. Platt [8] proposed scaling the output to the range [0, 1] with a sigmoid function, and a further revision was carried out in [9] to address flaws such as the "catastrophic collapse" problem. Overall, the posterior is modeled as

Pr(y = 1 | x) = 1 / (1 + exp(A f(x) + B)),

where f is evaluated by three-fold cross-validation of the SVM, and the optimal parameters A*, B* are obtained by solving the regularized maximum likelihood problem

(A*, B*) = argmin_{A,B} − Σ_i [ t_i log(p_i) + (1 − t_i) log(1 − p_i) ],  with p_i = 1 / (1 + exp(A f_i + B)).

Meanwhile, in consideration of the sparsity of the sigmoid fit, the MAP estimates of the target probability of positive and negative examples are defined as

t_+ = (N_+ + 1) / (N_+ + 2),   t_− = 1 / (N_− + 2),

where N_+ and N_− are the numbers of positive and negative training examples. According to Eq. (4), the state of the target can then be estimated.
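The weighting and estimation steps above can be sketched directly. The sigmoid parameters A and B below are placeholders, not values fitted by the cross-validated maximum-likelihood procedure the text describes.

```python
import numpy as np

def platt_posterior(scores, A, B):
    """Platt-style sigmoid mapping of raw SVM decision values to
    probabilities: Pr(y=1|x) = 1 / (1 + exp(A*f(x) + B))."""
    return 1.0 / (1.0 + np.exp(A * scores + B))

def estimate_state(particles, scores, A=-1.0, B=0.0):
    """Weight each particle by its posterior, normalize, and return
    the mean state E[S] = sum_n pi_n * s_n. A and B are placeholder
    values, not fitted parameters."""
    w = platt_posterior(scores, A, B)
    w = w / w.sum()
    return (particles * w[:, None]).sum(axis=0)
```

With A = −1 and B = 0 the mapping reduces to the standard logistic function, so particles with larger decision values dominate the mean.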

Cooperative operation of double trackers
As stated above, a robust algorithm integrating the particle filter and the online SVM classifier has been given, the key of which is an appearance model summarizing previous observations. The online update is an important part of such a model. To balance model drift against model adaptability, a new target localization and model update strategy is proposed.

Target localization
The proposed algorithm starts by training an initial classifier as introduced in Section 2.1. Then the tracking loop begins with the median flow method. The Median Flow tracker has proved stable for temporary tracking and is computationally fast. The quality of each point prediction is then estimated and every point is assigned an error value, e.g., the Forward-Backward (FB) error [4], the sum of squared differences (SSD), or the normalized cross-correlation (NCC). Only the half of the points with the lowest error values is used to estimate the whole bounding box. When the error of Median Flow is higher than a given threshold and the probabilistic output closely approaches that of the negative examples, the learning embedded particle filter is used instead for reliable tracking.
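The point-selection and box-update step described above can be sketched as follows. This is a simplified version of the Median Flow translation update (no scale change), with the error values assumed to be already computed, e.g., as FB errors.

```python
import numpy as np

def median_flow_box(points0, points1, fb_error, box):
    """Keep the 50% of tracked points with the lowest error, then
    shift the bounding box (x, y, w, h) by the median displacement
    of the kept points. Simplified: the scale update is omitted."""
    keep = fb_error <= np.median(fb_error)      # lower half of the errors
    disp = points1[keep] - points0[keep]
    dx, dy = np.median(disp[:, 0]), np.median(disp[:, 1])
    x, y, w, h = box
    return (x + dx, y + dy, w, h)
```

Taking the median over the surviving points is what makes the step robust to the occasional badly tracked point.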

Model update
The tracking results are added to a new dataset, and new members of the prototype set arise after updating, each initialized with counting number 1. Once the size of the prototype set exceeds the predefined budget B, the two components with the same label at minimal distance are merged by:

s ← (c_i s_i + c_j s_j) / (c_i + c_j),   c ← c_i + c_j,

where c_i and c_j are the counting numbers of the merged prototypes s_i and s_j.
In our experiments, we use C = 100 and B = 50, and the learning rate for positive samples is adapted according to the counting number of positive prototypes in the prototype set.
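The budgeted merge step can be sketched as below. The count-weighted average is one natural reading of the merge rule given the counting numbers described above; the paper's exact rule may differ.

```python
import numpy as np

def merge_closest(prototypes, counts, labels, budget):
    """While the prototype set exceeds the budget B, merge the two
    same-label prototypes at minimal Euclidean distance into their
    count-weighted average, summing their counting numbers.
    Assumption: weighted-average merging, lists as containers."""
    while len(prototypes) > budget:
        best = None
        for i in range(len(prototypes)):
            for j in range(i + 1, len(prototypes)):
                if labels[i] != labels[j]:
                    continue
                d = np.linalg.norm(prototypes[i] - prototypes[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best is None:          # no same-label pair left to merge
            break
        _, i, j = best
        total = counts[i] + counts[j]
        prototypes[i] = (counts[i] * prototypes[i]
                         + counts[j] * prototypes[j]) / total
        counts[i] = total
        del prototypes[j]; del counts[j]; del labels[j]
    return prototypes, counts, labels
```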

Experiments and results
To demonstrate the effectiveness of the proposed method, both qualitative and quantitative experiments were carried out on 8 publicly available challenging image sequences. These sequences contain complex scenes with challenging factors for visual tracking. For comparison, we run 7 state-of-the-art algorithms from the benchmark of [10].

Qualitative Comparison with other methods
In this section, we qualitatively compare our performance with the other 7 state-of-the-art trackers. The qualitative results on the test sequences are shown in Figure 1.

Quantitative Comparison with other methods
In this part, we adopt the centre location error to quantitatively evaluate the performance in each frame for all test sequences.
The overall performance of the different algorithms is shown in Figure 2 as graphs of the centre offset value across all frames of each image sequence. The centre location error is defined as the Euclidean distance between the centre of the tracking result and the ground truth for each frame. As can be seen from Figure 2, the proposed tracker performs remarkably well over almost all frames of every sequence. The TLD tracker also performs well on most videos, because it uses a detector composed of a cascade of three classifiers (i.e., patch variance, random ferns, and nearest-neighbour classifiers), so lost tracks can be restarted via redetection. The other trackers perform well on only part of the test sequences.
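The metric just defined is straightforward to compute; a minimal sketch:

```python
import numpy as np

def center_location_error(pred_centers, gt_centers):
    """Per-frame centre location error: the Euclidean distance between
    the tracked centre and the ground-truth centre, as plotted in
    Figure 2. Inputs are (N, 2) arrays of (x, y) centres."""
    diff = np.asarray(pred_centers, float) - np.asarray(gt_centers, float)
    return np.sqrt((diff ** 2).sum(axis=1))
```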

Conclusion
In this paper, a robust visual tracker based on an integrated framework is introduced. Our approach combines a typical tracker based on optical flow with an online-learning embedded particle filter tracker. The two trackers work in coordination and exchange information, balanced by the MAP estimate of the target probability of negative examples. Furthermore, the appearance model is updated every few frames from a prototype set and the new dataset: the prototype set summarizes the previous model, and the new dataset provides updated information. The combination of these ideas leads to a precise and flexible tracker that can quickly be applied to tracking arbitrary objects in unknown environments. Both qualitative and quantitative evaluations demonstrate that the proposed tracker can overcome various challenging interferences and achieve stable, robust performance in long-term tracking.
In future work, attention should be paid to building a more reliable appearance model, such as a combination of integral images and local histograms, or a more discriminative feature, so that the tracking accuracy can be further improved.