Scale Adaptive Kernel Correlation Filter Tracking Algorithm Combined with Learning Rate Adjustment

Traditional correlation filter tracking algorithms are prone to tracking failure when the target changes scale or is occluded. To address this problem, we propose a scale-adaptive Kernel Correlation Filter (KCF) target tracking algorithm combined with learning rate adjustment. First, we use the KCF to obtain the initial position of the target and then adopt a low-complexity scale estimation scheme to obtain the target's scale, which improves the algorithm's ability to adapt to scale changes while preserving tracking speed. Then, we use the average difference between two adjacent images to analyze how the image changes, and adjust the learning rate of the target model in segments according to this average difference, which addresses tracking failure when the target is severely occluded. Compared with five other classic target tracking algorithms, the experimental results show that the proposed algorithm adapts well to complex conditions such as target scale change, severe occlusion and background interference. At the same time, it reaches a real-time tracking speed of 231 frames/s.

Henriques et al. [8] propose the Circulant Structure with Kernels (CSK) correlation filter tracking algorithm. It densely samples the training samples through a cyclic shift strategy, which extends the sample set without affecting the running speed of the tracker, and achieves good results. Danelljan et al. [9] design a CSK-based tracking algorithm that incorporates color attribute features (Color Names, CN) and adaptively selects the more salient color attributes to adapt to scene changes, which improves tracking accuracy. Henriques et al. [10] design the Kernelized Correlation Filter (KCF) tracking algorithm, which builds on CSK and replaces the original grayscale features with Histogram of Oriented Gradient (HOG) features, extending the feature channels to improve tracking performance. However, the above algorithms [8][9][10] use fixed-scale training samples, so tracking drift easily occurs when the scale of the target changes. To solve the problems caused by target scale change, Danelljan et al. [11] propose the Discriminative Scale Space Tracker (DSST), and Li et al. [12] design the Scale-Adaptive and Multi Feature Integration Tracker (SAMF). Both, however, estimate the scale on every frame, which is time-consuming and limits the tracking speed. In addition, since most existing correlation filter tracking algorithms [8][9][10][11][12] adopt a fixed learning rate, errors accumulate after the target is occluded, which causes large tracking deviations or even tracking failure.
In view of the above analysis, to make the correlation filter tracking algorithm more robust to target scale variation and occlusion while preserving its tracking speed, we propose a scale-adaptive correlation filter tracking algorithm combined with learning rate adjustment, based on the KCF model. In the scale estimation process, the target scale changes little between two adjacent frames, so performing scale detection on every frame costs time for little benefit. Therefore, scale response detection is performed only once every M frames, which reduces the time consumed and improves the tracking speed. In addition, the average difference between two adjacent frames is used to analyze how the image changes, and the learning rate of the target apparent model is adjusted according to the size of this average difference to solve the tracking failure problem when the target is severely occluded.

Kernel correlation filtering target tracking algorithm
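Using the properties of circulant matrices and the Discrete Fourier Transform (DFT), the coefficient $\alpha$ of the classifier weight $\omega$ is obtained as [13]

$$\alpha = \mathcal{F}^{-1}\left( \frac{\mathcal{F}(y)}{\mathcal{F}(k^{xx}) + \lambda} \right)$$

where $\mathcal{F}$ denotes the DFT, $\mathcal{F}^{-1}$ its inverse, $y$ the regression target, $\lambda$ the regularization parameter, and $k^{xx} = \kappa(x, x)$ a kernel function. KCF uses the Gaussian kernel function for this calculation; for more details about the Gaussian kernel function, see [10]. In a new frame, the position of the target is detected by acquiring the candidate image block $z$, and the output response of the classifier is

$$\hat{y} = \mathcal{F}^{-1}\left( \mathcal{F}(k^{xz}) \odot \mathcal{F}(\alpha) \right)$$

where $k^{xz} = \kappa(x, z)$, $\odot$ denotes element-wise multiplication, and $x$ and $\alpha$ are the target apparent model and the classifier parameters obtained by learning. They are updated by

$$x_t = (1 - \eta)\, x_{t-1} + \eta\, x'_t, \qquad \alpha_t = (1 - \eta)\, \alpha_{t-1} + \eta\, \alpha'_t$$

where $\eta$ is the online learning rate and $x'_t$ and $\alpha'_t$ are the values estimated from the current frame $t$. The position where the output response $\hat{y}$ takes its maximum value is the position of the target in the new frame. The main workflow of KCF is shown in Figure 1: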
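The training, detection and update steps described above can be illustrated on a one-dimensional signal. The following is a minimal sketch rather than the implementation used in the experiments: it uses a naive DFT in pure Python, a single feature channel, and illustrative values for the kernel bandwidth `sigma`, the regularization `lam` and the test signal, all of which are assumptions made for this example.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT; adequate for a short illustrative 1-D signal."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def gaussian_correlation(x, z, sigma):
    """k^{xz}: Gaussian kernel between z and every cyclic shift of x."""
    n = len(x)
    fx, fz = dft(x), dft(z)
    # Inner products of z with all cyclic shifts of x, computed via the DFT.
    cross = dft([a.conjugate() * b for a, b in zip(fx, fz)], inverse=True)
    xx = sum(v * v for v in x)
    zz = sum(v * v for v in z)
    return [math.exp(-max(xx + zz - 2 * c.real, 0.0) / (sigma ** 2 * n)) for c in cross]


def train(x, y, sigma, lam):
    """alpha = F^-1( F(y) / (F(k^{xx}) + lambda) )."""
    fk = dft(gaussian_correlation(x, x, sigma))
    fy = dft(y)
    alpha = dft([b / (a + lam) for a, b in zip(fk, fy)], inverse=True)
    return [a.real for a in alpha]


def detect(model_x, alpha, z, sigma):
    """Return the cyclic shift at which the response y_hat is largest."""
    fk = dft(gaussian_correlation(model_x, z, sigma))
    fa = dft(alpha)
    response = [r.real for r in dft([a * b for a, b in zip(fk, fa)], inverse=True)]
    return max(range(len(response)), key=response.__getitem__)


def update(old, new, eta):
    """Linear interpolation applied to both the apparent model and alpha."""
    return [(1 - eta) * o + eta * v for o, v in zip(old, new)]


# Illustrative data: a template x, a Gaussian regression target y peaked at
# shift 0, and a candidate block z that is x cyclically shifted by 3 samples.
n = 16
x = [math.sin(0.7 * t) + 0.1 * t for t in range(n)]
y = [math.exp(-min(t, n - t) ** 2 / 2.0) for t in range(n)]
alpha = train(x, y, sigma=0.5, lam=1e-4)
z = [x[(t - 3) % n] for t in range(n)]
shift = detect(x, alpha, z, sigma=0.5)
```

In this sketch the index of the response maximum plays the role of the target displacement; a two-dimensional tracker would apply the same formulas with a 2-D FFT over the image patch.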

Figure 1. Correlation filter workflow
Scale adaptive kernel correlation filter tracking algorithm combined with learning rate adjustment

Low complexity scale estimation method
Once the target position at the maximum of the classifier output response has been obtained, detecting the target's scale can further reduce the influence of noise. However, scale detection is very time-consuming and reduces the tracking speed severalfold. We therefore propose a low-complexity scale estimation method to address the time cost of scale detection. The specific steps are as follows:
Step 1. Set a scale update interval T. Since the scale of the target changes very little between two adjacent frames, performing scale detection on every frame is mostly ineffective. In this paper, we compare the scale between two frames separated by the interval T instead of between two adjacent frames. The scale is updated only when there is a relatively obvious change in the target scale, which improves tracking accuracy, reduces the number of scale updates, and effectively limits the speed loss caused by scale updating.
Step 2. Use the average dichotomy to detect the scale with the largest response. Let the scale pool N contain 7 scales: too many scales slow tracking down, and too few reduce the accuracy. The elements of N are multiplied in turn by the width and height of the previous tracking box to obtain the new scale detection ranges. Most existing algorithms compute the classifier response for every new detection range and take the maximum over all responses as the new scale. This approach does not first determine whether the target has grown or shrunk and then update the scale in a targeted manner, which increases the amount of computation and reduces efficiency. In this paper, we use the average dichotomy to find the maximum response. First, compare downward: let $N(i)$ be the maximum response value of the search range corresponding to the $i$-th element of the scale pool, and first calculate the most intermediate
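The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the scale values in `SCALE_POOL`, the function names and the `response_at` callback are hypothetical, and the search shown is a standard bisection for the peak of a response that is assumed to rise and then fall across the scale pool. It stands in for the average dichotomy, whose exact comparison order is not reproduced here.

```python
# Hypothetical pool of 7 relative scales; the actual values are an assumption.
SCALE_POOL = [0.94, 0.96, 0.98, 1.00, 1.02, 1.04, 1.06]


def is_scale_update_frame(frame_index, interval):
    """Step 1: run scale detection only once every `interval` frames."""
    return frame_index % interval == 0


def find_peak_scale(scales, response_at):
    """Step 2: locate the scale with the largest response by bisection.

    `response_at(scale)` is assumed to return the maximum classifier response
    for that scale. Responses are cached so each scale is evaluated at most once.
    Returns the best scale and the number of scales actually evaluated.
    """
    cache = {}

    def response(i):
        if i not in cache:
            cache[i] = response_at(scales[i])
        return cache[i]

    lo, hi = 0, len(scales) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if response(mid) < response(mid + 1):
            lo = mid + 1  # response still rising: the peak lies to the right
        else:
            hi = mid      # response falling: the peak is at mid or to the left
    return scales[lo], len(cache)


# Toy response that peaks at scale 1.02 (illustrative only).
best_scale, evaluations = find_peak_scale(SCALE_POOL, lambda s: -abs(s - 1.02))
```

With 7 scales, this search evaluates at most 5 of them, whereas the exhaustive strategy evaluates all 7 on every scale update.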

Online learning rate adjustment algorithm
The classifier and the target apparent model of the correlation filter tracking algorithm are updated by equations (5) and (6), in which the online learning rate η usually takes a fixed empirical value and therefore cannot reflect changes of the scene in the video in real time.
The online learning rate η indicates how quickly the model learns changes in the target appearance. The smaller the value of η, the slower the model learns, which gives better tracking in scenes where the target apparent model changes little and the changes are caused by the surrounding environment (such as lighting changes, occlusion, or rapid background movement). The larger the value of η, the faster the model learns, which gives better tracking in scenes where the target apparent model changes substantially because of the target itself (such as non-rigid deformation, scale change, or rotation).
In this paper, we propose a method of segmentally adjusting the online learning rate η by using the average difference between two adjacent frames. The specific steps are as follows: Step1
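The idea of choosing η in segments from the average inter-frame difference can be sketched as below. The thresholds, the learning rates and the choice to freeze the model for very large differences are all hypothetical values chosen for illustration; they are assumptions of this sketch, not the settings of the proposed algorithm.

```python
def mean_abs_difference(prev_frame, curr_frame):
    """Average absolute gray-level difference between two equally sized frames.

    Frames are given as lists of rows of pixel intensities.
    """
    total = 0.0
    count = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            total += abs(p - c)
            count += 1
    return total / count


# Hypothetical segments: (upper bound on the average difference, learning rate).
SEGMENTS = [(5.0, 0.02), (15.0, 0.005)]
# Hypothetical rate for differences above every bound: stop updating the model.
FALLBACK_ETA = 0.0


def segmented_learning_rate(diff, segments=SEGMENTS, fallback=FALLBACK_ETA):
    """Pick the learning rate of the first segment whose bound exceeds `diff`."""
    for upper_bound, eta in segments:
        if diff < upper_bound:
            return eta
    return fallback


prev_frame = [[10, 10], [10, 10]]
curr_frame = [[10, 12], [14, 10]]
diff = mean_abs_difference(prev_frame, curr_frame)
eta = segmented_learning_rate(diff)
```

Lowering η when the frame changes abruptly follows the observation above that a small learning rate suits changes caused by the environment, such as occlusion; the direction and size of each segment would need to be tuned on real sequences.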

Experimental results and analysis
To verify the effectiveness of our algorithm, we test it on eight representative video sequences (Car4, Basketball, Singer1, Girl, Box, Jogging, Lemming, Subway). These videos cover interference factors such as scale change, occlusion, deformation, background interference, fast motion, illumination change, and rotation [14]. We also compare our algorithm with five other classical tracking algorithms: CSK [8], CN [9], KCF [10], DSST [11] and SAMF [12].

Experimental environment and evaluation indicators
The experiments in this paper are run in Matlab R2013b on a Windows 10 system with an Intel Core i7-4790 CPU at 4 GHz and 4 GB of memory. To facilitate quantitative analysis, three performance evaluation indicators are used: Center Location Error (CLE), Distance Precision (DP) and Overlap Precision (OP). CLE is the average Euclidean distance between the detected target center and the true target center [15]. DP is the ratio of the number of frames whose CLE is less than a given threshold (20 pixels in the experiments) to the total number of frames in the video. OP is the ratio of the number of frames in which the overlap of the tracking box with the ground truth exceeds a threshold (0.5 in the experiments) to the total number of frames in the video. The average CLE, DP, and OP of the six algorithms on the eight sets of test videos are shown in Table 1, Table 2, and Table 3. For each set of videos, the result of the best-performing algorithm is shown in bold.
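The three indicators can be computed as in the following sketch. It assumes boxes are given as `(x, y, width, height)` tuples and that the overlap is measured as intersection over union, which is a common convention but an assumption here; the function names are illustrative.

```python
import math


def center_error(pred, truth):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    px, py, pw, ph = pred
    tx, ty, tw, th = truth
    return math.hypot((px + pw / 2) - (tx + tw / 2), (py + ph / 2) - (ty + th / 2))


def overlap_ratio(pred, truth):
    """Intersection over union of two (x, y, w, h) boxes (assumed overlap measure)."""
    px, py, pw, ph = pred
    tx, ty, tw, th = truth
    inter_w = max(0.0, min(px + pw, tx + tw) - max(px, tx))
    inter_h = max(0.0, min(py + ph, ty + th) - max(py, ty))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    return inter / union if union > 0 else 0.0


def evaluate(preds, truths, dist_threshold=20.0, overlap_threshold=0.5):
    """Return (average CLE in pixels, DP, OP) over a sequence of frames."""
    errors = [center_error(p, t) for p, t in zip(preds, truths)]
    overlaps = [overlap_ratio(p, t) for p, t in zip(preds, truths)]
    n = len(errors)
    cle = sum(errors) / n
    dp = sum(e < dist_threshold for e in errors) / n
    op = sum(o > overlap_threshold for o in overlaps) / n
    return cle, dp, op


# Two-frame toy sequence: a perfect hit followed by a complete miss.
truths = [(0, 0, 10, 10), (0, 0, 10, 10)]
preds = [(0, 0, 10, 10), (30, 40, 10, 10)]
cle, dp, op = evaluate(preds, truths)
```

DP and OP are returned as fractions in [0, 1]; multiplying by 100 gives the percentages reported in the tables.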

Algorithm performance comparison experiment
As can be seen from Table 1, Table 2, and Table 3, our algorithm obtains an average CLE of 7.15 pixels, an average DP of 95.19%, and an average OP of 84.05% on the 8 sets of videos. Among the other five algorithms, SAMF performs best. Compared with SAMF, our algorithm reduces the average CLE by 19.85 pixels, increases the average DP by 11.49%, and increases the average OP by 14.62%. To facilitate visual comparison, we draw the DP curves shown in Figure 2. As can be seen from Table 4, our algorithm has a much higher tracking speed than the DSST and SAMF tracking algorithms, which use the scale pyramid.
In Figure 3(a), the Car4 video contains illumination changes, target scale changes and background interference. When the illumination and target scale change in the 238th frame, the other five algorithms all show some tracking deviation, and only our algorithm tracks the entire video sequence well. In Figure 3(b), the Basketball video contains interference factors such as target scale change, deformation and fast motion. When the target deforms in the 54th frame and moves rapidly in the 73rd frame, the other algorithms show large tracking deviations or even tracking failures, and only our algorithm accurately tracks the entire video sequence. In Figure 3(c), the Singer1 video has dramatic changes in target scale and illumination. Over the entire video sequence, only our algorithm and SAMF track well, and our algorithm's CLE, DP and OP indicators are better than those of SAMF. In Figure 3(d), the Girl video contains interference factors such as target scale change, rotation, pose change and partial occlusion. When the target is partially occluded in the 437th frame, the other algorithms fail to track it, and only our algorithm tracks the entire video sequence well. In Figure 3(e), the target in the Box video undergoes scale change, fast motion and rotation. Over the whole video sequence, only our algorithm and DSST track well, and the CLE, DP and OP indicators of our algorithm are better than those of DSST.

Tracking experiment when the target is severely occluded
To test the tracking performance of the six algorithms when the target is severely occluded, we select three sets of videos that contain target occlusion (Jogging, Lemming, and Subway). Figure 4 shows the DP curves for the three sets of videos. Compared with the other five algorithms, our algorithm obtains more accurate and stable tracking results when the target is severely occluded. Figure 5 shows the tracking results for the three sets of videos from the beginning of the occlusion to its end. When the target encounters severe occlusion in the Jogging and Lemming videos, only our algorithm still tracks the target effectively, and the other five algorithms all fail. In the Subway video, the target is severely occluded from the 41st frame. When the occlusion ends in the 51st frame, our algorithm, KCF and SAMF still track the target accurately, while the other three algorithms have lost it. The CLE, DP, and OP indicators of our algorithm are nevertheless superior to those of KCF and SAMF.
The above experimental results show that, compared with the other five tracking algorithms, our algorithm is more robust and can effectively track the target under severe occlusion.

Conclusion
Within the framework of kernel correlation filtering, we propose a scale-adaptive correlation filter target tracking algorithm combined with learning rate adjustment. Our algorithm estimates the scale of the target using a small number of scale samples, which not only improves its adaptability to target scale change but also improves the tracking speed compared with scale estimation strategies based on the scale pyramid. In addition, the strategy of adjusting the online learning rate in segments to update the apparent model of the target in real time addresses the problem of tracking failure when the target is severely occluded. Compared with five other classic target tracking algorithms, our algorithm is more robust in complex environments such as target scale variation, severe occlusion and background interference. At the same time, its average tracking speed of 231 frames/s meets real-time requirements.