A Moving Vehicle Extraction Method for Satellite Videos

With the continuous innovation of optical remote sensing technology and the increasing demand for spatial information, satellite videos, which provide higher spatial and temporal resolution, have attracted considerable attention, and moving vehicle extraction is one of the most important tasks in this field. By analyzing the shortcomings of current algorithms for extracting moving vehicles from satellite videos, and taking into account the characteristics of satellite videos and of moving vehicles, this paper proposes an extraction algorithm in which candidate vehicles are first separated from the background using image extreme points and mean differences, and the moving vehicles are then extracted by joint detection of inter-frame vehicle motion. Based on the extracted moving vehicles, we also propose a method that extracts road masks using only three frames. Finally, we validate the proposed methods on Jilin-1 satellite video data and compare them with two other algorithms. The results show that the proposed methods greatly improve the accuracy and quality of moving vehicle detection in satellite videos.


Introduction
Compared with traditional optical remote sensing images, satellite remote sensing videos have very high spatial and temporal resolution and provide users with sufficient dynamic information [1], which helps users analyze the motion and instantaneous characteristics of targets more accurately [2]. Satellite videos therefore not only play an important role in dynamic monitoring of hot spots [3], dynamic detection of specific targets [4], real-time feedback of traffic conditions [5], dynamic early warning of natural disasters, and super-resolution of remote sensing images in specific areas [6] [7], but also provide effective support for national defense and economic construction, which makes research on satellite videos increasingly important [8].
Compared with traditional surveillance or aerial videos, the main characteristics of satellite videos and of the moving vehicles in them are: 1) Satellite videos cover a relatively large area, but their resolution is lower [9]; 2) Vehicles occupy a small number of pixels [10]. In a video with a resolution of about 1 m, a vehicle generally lies within 6x6 pixels and its texture features are not obvious [11]; 3) The videos contain a large number of vehicles, and the color similarity between the vehicles and the background is high, so they are difficult to distinguish [12]. Therefore, how to extract moving vehicles from remote sensing videos has become a key research issue [13].
Traditional moving vehicle detection algorithms mainly include the inter-frame difference method [14], the background difference method [15] and the optical flow method [16]. Many researchers have tried to improve these algorithms and apply them to vehicle extraction in remote sensing videos. George et al. [17] used the average pixel intensity of 300 frames as the background template and extracted the moving vehicles by background difference and global threshold segmentation; Yang et al. [18] used the ViBe algorithm [19] [20] to detect moving targets in 300 frames, superimposed the motion tracks to form a road mask, and then extracted the moving vehicles through saliency features; Junpeng et al. [21] collected motion flow with the ViBe algorithm and increased the correct rate by cluster analysis of motion patterns; Yan et al. [22] combined the inter-frame difference with the background difference to suppress edge and noise interference, which improved the detection accuracy; Ahmadi et al. [23] used displacement and velocity as constraints to eliminate false alarm vehicles in background differences and achieved some improvement. This body of research shows that the background difference method performs better on remote sensing satellite videos because of its strong anti-interference ability [24]. The above algorithms solve the vehicle extraction problem in satellite videos to a certain extent, but still have some shortcomings. Firstly, the road mask has to be provided separately, usually by setting it manually or by analyzing all video frames. Secondly, the accuracy of moving vehicle detection is not high: the results contain a large number of missed detections and false alarm vehicles [25], which strongly affects subsequent analysis of the moving vehicles.
To solve the above problems, this paper proposes an inter-frame moving vehicle detection algorithm based on image extreme points and mean differences (EPAMD). The method extracts complete vehicle images using the extreme points and mean filtering of the images, recognizes the moving vehicles by inter-frame joint motion detection, and automatically establishes the mask of the moving vehicle area using only three frames. The proposed method improves the accuracy of moving vehicle extraction in satellite videos and reduces the false alarm rate and the missed detection rate.

Methods
The inter-frame moving vehicle extraction algorithm based on image extreme points and mean differences proposed in this paper mainly includes the following steps: 1) Preprocessing the satellite video frames; 2) Extracting complete vehicle images using extreme points and mean differences; 3) Eliminating false alarm vehicles using inter-frame joint moving vehicle detection; 4) Extracting the road mask from moving vehicle difference images. The detailed algorithm flow is shown in Figure 1.

Moving vehicles detection
Moving vehicle detection in optical remote sensing imagery is a new research field that emerged with satellite videos. Most algorithms use statistical principles to establish a background model and then use the model to separate foreground from background to extract moving vehicles [26], while other information about the vehicles is seldom used. This paper combines the characteristics of optical remote sensing videos and of the moving vehicles in the images to detect moving vehicles accurately through the comprehensive use of multidimensional information.

Obtaining the inner points of vehicles by using extreme points.
A single color frame in an optical remote sensing video is composed of three color channels (RGB), each of which corresponds to a two-dimensional matrix. Statistical analysis of the images shows that, although the distinction between moving vehicles and background in remote sensing images is weak, some differences remain. These differences appear on the image color surface as extreme points with obvious characteristics [27] [28]. In their mathematical representation, $I$ is the color value at the current coordinate, $x$ and $i$ are transverse coordinates of the image, and $y$ and $j$ are longitudinal coordinates of the image.
Using this feature of remote sensing images, we can calculate the extreme points in each image and thereby obtain the inner points of the vehicles (generally the centre points). From Figure 2 we can see that the color of the vehicles in the area differs from that of the surrounding roads. These differences appear as bumps or depressions in the surface maps, so the inner points of vehicles become extreme points of the images; (e), (f), and (g) are the results of extracting the extreme points of three regions, where the red points are the extracted extreme points. The comparison shows that the points in the blue circles, which mark the moving vehicles, are successfully found, and each vehicle contains at least one extreme point. The extracted extreme points also include many points outside the vehicles, so the next step is to determine which points lie inside vehicles.
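The extreme point search described above can be illustrated with a minimal sketch. It assumes that an extreme point is a strict local maximum or minimum within its 8-neighborhood on a single color channel; the neighborhood size, the handling of border pixels, and the name `find_extreme_points` are illustrative assumptions rather than details taken from the paper.

```python
def find_extreme_points(channel):
    """Return (row, col) positions that are strict local maxima or minima
    of a 2D list within their 8-neighborhood. Border pixels are skipped
    (an assumption made to keep the sketch short)."""
    rows, cols = len(channel), len(channel[0])
    points = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = channel[r][c]
            neighbors = [
                channel[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            ]
            if center > max(neighbors) or center < min(neighbors):
                points.append((r, c))
    return points


# Toy single-channel patch: one bright "vehicle" pixel on a darker road.
patch = [
    [50, 50, 50, 50, 50],
    [50, 52, 55, 52, 50],
    [50, 55, 90, 55, 50],
    [50, 52, 55, 52, 50],
    [50, 50, 50, 50, 50],
]
extreme_points = find_extreme_points(patch)
```

On this toy patch only the central bright pixel qualifies. On real frames many background pixels also qualify, which is why the later filtering steps are needed.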

Using region growing algorithm to eliminate false alarm vehicles.
Region growing is an effective method for image segmentation [29]. The basic principle is to start from a seed point, find the surrounding points that meet a specified criterion, define them as the next seed points, and iterate until no point around any seed point satisfies the criterion. The result is an area that satisfies the similarity criterion with respect to the initial point. The specific process is shown in Figure 3. By extracting the extreme points of the images, we have obtained some points inside the moving vehicles together with a large number of irrelevant points. Using these points as seed points for region growing, we can obtain complete images of moving vehicles and also eliminate some false alarm vehicles.
In remote sensing images, a vehicle can generally be regarded as a rigid body that does not undergo deformation. In addition, vehicles generally occupy less than 6×6 pixels, and their colors are relatively close. Therefore, we map the three values contained in the color images to a three-dimensional space, so the Euclidean distance [30] between two pixels is:

$$d\big((x_1, y_1), (x_2, y_2)\big) = \sqrt{\left(R_{(x_1,y_1)} - R_{(x_2,y_2)}\right)^2 + \left(G_{(x_1,y_1)} - G_{(x_2,y_2)}\right)^2 + \left(B_{(x_1,y_1)} - B_{(x_2,y_2)}\right)^2}$$

We use the Euclidean distance between pixels as the criterion for the region growing algorithm and decide whether to continue growing by comparing it with a threshold. The specific formula is as follows:

$$d\big((x_1, y_1), (x_2, y_2)\big) \le h \tag{5}$$

where $R_{(x_1,y_1)}$ is the red component value of point $(x_1, y_1)$, $R_{(x_2,y_2)}$ is the red component value of point $(x_2, y_2)$, $G_{(x_1,y_1)}$ is the green component value of point $(x_1, y_1)$, $G_{(x_2,y_2)}$ is the green component value of point $(x_2, y_2)$, $B_{(x_1,y_1)}$ is the blue component value of point $(x_1, y_1)$, $B_{(x_2,y_2)}$ is the blue component value of point $(x_2, y_2)$, and $h$ is the threshold of region growing.
Since a vehicle generally occupies no more than 36 pixels, if the region grown from a point is larger than 72 pixels, we can determine that the point is not an inner point of a vehicle. In this way, the extreme points that are not inside vehicles can be effectively excluded. Figure 4 compares the results before and after applying the algorithm. As can be seen from Figure 4, the region growing algorithm not only extracts the vehicles in which the extreme points are located, but also eliminates a large number of false alarm vehicles by limiting the number of pixels. However, because of the low resolution of the images, further processing is required in the next step.
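A minimal sketch of this seed filtering step follows. It assumes 4-connected growth and compares each candidate pixel with the neighboring pixel it is reached from; the paper does not state the connectivity or the reference pixel, so both are assumptions. The defaults `h=8` and `max_pixels=72` follow the values quoted in the text, and the function names are hypothetical.

```python
from collections import deque
import math


def color_distance(p, q):
    """Euclidean distance between two (R, G, B) tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def grow_region(image, seed, h=8, max_pixels=72):
    """Grow a 4-connected region from seed on a 2D list of RGB tuples.
    Returns the set of region pixels, or None when the region grows
    beyond max_pixels (the seed is then rejected as a non-vehicle point)."""
    rows, cols = len(image), len(image[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in region:
                continue
            if color_distance(image[r][c], image[nr][nc]) <= h:
                region.add((nr, nc))
                if len(region) > max_pixels:
                    return None
                queue.append((nr, nc))
    return region


# Toy frame: a 2x2 bright vehicle on a uniform 10x10 road.
road, car = (100, 100, 100), (200, 200, 200)
image = [[road] * 10 for _ in range(10)]
for r in (4, 5):
    for c in (4, 5):
        image[r][c] = car

vehicle_region = grow_region(image, (4, 4))  # seed inside the vehicle
road_region = grow_region(image, (0, 0))     # seed on the road
```

The vehicle seed yields a four-pixel region, while the road seed is rejected because its region exceeds the size limit.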

Separating foreground and background with mean differences.
In remote sensing images, the internal colors of large targets such as roads, large buildings, and vegetation are similar over a wide area, while small targets such as vehicles on the road are similar only over a small range. When we filter the images with an n*n template, the internal color of large targets usually changes little, while small targets inside large targets change greatly because they differ strongly from their surroundings [31]. Moreover, the degree of color change inside large targets does not change significantly as n increases, whereas the color change of small targets increases with n. Therefore, we first apply mean filtering [32] to a single frame with a 3*3 template and a 5*5 template, and then take the difference of the two filtered images to find the regions with the largest color change, which are the small targets we are looking for. The specific algorithm implementation is shown in Figure 5.
From Figure 5, we can see that the mean differences algorithm separates small targets from the surrounding environment very well. Therefore, by finding the areas in the difference image that contain the inner points obtained from the extreme points in Section 2.1.2, it is simple to obtain the complete images of the vehicles.
Figure 5. Mean differences effect diagram. Two regions are selected, and four frames are taken from each region. Each region is displayed as three rows: the first row shows the original images of the region in the video, the second row shows the results extracted using the mean differences algorithm, and the third row shows the extraction results (red) superimposed on the original images.
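The mean differences step can be sketched on a single channel as follows. This is only an illustration under stated assumptions: windows are clipped at the image border, the absolute difference is used, and the names `mean_filter` and `mean_difference` are hypothetical; the paper does not specify border handling or how the difference image is thresholded.

```python
def mean_filter(img, n):
    """n*n mean filter on a 2D list; windows are clipped at the borders."""
    rows, cols = len(img), len(img[0])
    half = n // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                img[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out[r][c] = sum(window) / len(window)
    return out


def mean_difference(img):
    """Absolute difference between the 3*3 and 5*5 mean-filtered images."""
    m3, m5 = mean_filter(img, 3), mean_filter(img, 5)
    return [
        [abs(a - b) for a, b in zip(row3, row5)]
        for row3, row5 in zip(m3, m5)
    ]


# Toy frame: uniform background with one bright small target in the middle.
frame = [[100] * 9 for _ in range(9)]
frame[4][4] = 200
response = mean_difference(frame)
```

In this toy frame the response is zero over the uniform background and largest around the small target, which matches the behavior the text describes.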

Vehicles motion detection.
Through the above steps, we have obtained complete vehicle images and eliminated some obviously wrong detections, but some wrong or non-moving vehicles remain. These can be excluded by checking their position changes across multiple frames. Figure 6 draws all vehicles extracted in three consecutive frames: the red centers and rectangles are the positions and shapes of the vehicles in the (k-1)th frame, the green centers and rectangles are the positions and shapes of the vehicles in the kth frame, and the blue centers and rectangles are the positions and shapes of the vehicles in the (k+1)th frame. In the motion judgment, $x_{k+1}$ is the transverse coordinate of the vehicle center in the (k+1)th frame, and $y_{k+1}$ is the vertical coordinate of the vehicle center in the (k+1)th frame.
Through the above judgment, we can precisely extract the moving vehicles in each frame. Figure 7 compares the extraction of moving vehicles with the algorithms in references [18] and [19].
Figure 7. Comparison of moving vehicle extraction: (a) The original image; (b) The moving vehicles detected by the algorithm in reference [19]; (c) The moving vehicles detected by the algorithm in reference [18]; (d) The moving vehicles detected by the proposed algorithm; (e) The local magnification of the original image; (f) The local magnification of (b); (g) The local magnification of (c); (h) The local magnification of (d).
As can be seen from Figure 7, compared with the algorithm in [18], the proposed algorithm misses some vehicles, but the extracted vehicles are more complete and accurate, and no two vehicles are identified as one. Moreover, the number of false alarm vehicles is greatly reduced. Compared with the algorithm in [19], the proposed algorithm not only greatly reduces the number of false alarm vehicles, but also extracts more valid vehicles.
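Because the motion criterion itself is not reproduced in the text, the following is only a hedged sketch of one plausible reading. It assumes the test uses the vehicle centers in frames k-1, k and k+1, requires both inter-frame displacements to reach a distance threshold, and requires the angle at the frame-k center between the previous and next centers to reach an angle threshold. The defaults (1 pixel, 120 degrees) are the values quoted in the Results section; the function name, the inequality directions and the assumption that the three centers have already been associated with one vehicle are all illustrative.

```python
import math


def is_moving(prev, curr, nxt, min_dist=1.0, min_angle_deg=120.0):
    """Check three consecutive vehicle centers (x, y) for consistent motion."""
    back = (prev[0] - curr[0], prev[1] - curr[1])
    fwd = (nxt[0] - curr[0], nxt[1] - curr[1])
    d_back, d_fwd = math.hypot(*back), math.hypot(*fwd)
    if d_back < min_dist or d_fwd < min_dist:
        return False  # displacement too small: treated as stationary
    cos_angle = (back[0] * fwd[0] + back[1] * fwd[1]) / (d_back * d_fwd)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle >= min_angle_deg


straight = is_moving((0, 0), (2, 0), (4, 0))    # straight-line motion
parked = is_moving((3, 3), (3, 3), (3, 3))      # no displacement
sharp_turn = is_moving((0, 0), (2, 0), (2, 2))  # 90 degree angle at frame k
```

Under this reading, straight-line motion gives an angle of 180 degrees and passes, while a stationary detection or an implausibly sharp turn is rejected.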

Method for extracting road mask by using moving vehicles
With the above method, we successfully extracted most of the moving vehicles, but eliminating the false alarm vehicles takes a lot of time. During the calculation, we found that most of the false alarm vehicles appear in areas outside the road. If the road mask can be extracted, the speed of the algorithm will be greatly improved. In previous research, road masks were mainly derived from GIS information [34] or from all video frames [18], which also requires additional human intervention. Therefore, this paper proposes a mask extraction method that needs no human intervention and can effectively exclude more false alarms.

Road mask extraction using moving vehicles.
In remote sensing images, the areas where vehicles drive, such as roads and plazas, share similar characteristics and can therefore be extracted by a region growing algorithm. The seed points for region growing can be obtained from the difference between the moving vehicles in two frames. The principle is as follows: if area A is the feasible domain of the road in the kth frame, m1 is the set of moving vehicles detected in the kth frame, and m2 is the set of moving vehicles detected in the (k+1)th frame, then the difference between m1 and m2 gives pixels that are in area A and not inside a vehicle. By using these pixels as seed points and selecting appropriate criteria for region growing, we can obtain a complete road mask. The results of the algorithm are shown in Figure 8.
Figure 8. Results of road mask extraction: (a) The extraction results of the algorithm in reference [18] (using 297 frames); (b) The extraction results of the algorithm in this paper (using 3 frames); (c) The results of (a) and (b) superimposed.
From Figure 8, we can see that the mask produced by the proposed algorithm covers the moving vehicle paths extracted by the algorithm in reference [18]. In addition, the parking lot and the roads where moving vehicles have not yet appeared are also extracted, which fundamentally reduces the cases in which moving vehicles cannot be detected because the mask is too small. The most time-consuming step in the proposed algorithm is eliminating invalid points with the region growing algorithm in 2.1.2, because region growing has to be run for each point. Therefore, the fewer invalid extreme points are extracted, the faster the EPAMD algorithm runs. Table 1 and Figure 9 show that using the road mask reduces the invalid extreme points by more than 60%, which greatly improves the speed of the algorithm. In addition, using the mask does not affect the extraction of the vehicles and reduces the false alarm rate by 47%. The method in reference [18] uses 297 frames to extract the road mask, while the proposed method uses only 3 frames, and the two extracted masks produce the same moving vehicle detection results.
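The seed selection and mask growth can be sketched as follows. This is an illustration under explicit assumptions: the seeds are taken as the pixels covered by a vehicle in frame k+1 but not in frame k (one possible reading of the frame difference), growth is 4-connected on a single gray channel of frame k, and the threshold `h=8` and the function names are hypothetical. The paper only says that "appropriate criteria" are selected for region growing.

```python
from collections import deque


def road_seeds(m1, m2):
    """Pixels occupied by a vehicle in frame k+1 (m2) but free in frame k (m1)."""
    return set(m2) - set(m1)


def grow_mask(gray, seeds, h=8):
    """4-connected region growing on a 2D list of gray values from many seeds."""
    rows, cols = len(gray), len(gray[0])
    mask = set(seeds)
    queue = deque(seeds)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in mask:
                if abs(gray[r][c] - gray[nr][nc]) <= h:
                    mask.add((nr, nc))
                    queue.append((nr, nc))
    return mask


# Toy frame k: buildings (30) above and below a three-row road (100),
# with one bright vehicle pixel (200) on the road.
frame_k = [[30] * 6] + [[100] * 6 for _ in range(3)] + [[30] * 6]
frame_k[2][1] = 200
m1 = {(2, 1)}  # vehicle pixels detected in frame k
m2 = {(2, 2)}  # vehicle pixels detected in frame k+1

seeds = road_seeds(m1, m2)
road_mask = grow_mask(frame_k, seeds)
```

In this toy frame the mask grows over the whole road strip and stops at the buildings. The pixel still occupied by the vehicle in frame k is left out, so a practical version would likely add the detected vehicle pixels to the mask afterwards.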

Results
We verify the feasibility and versatility of the algorithm through comparative experiments. The selected experimental data have a resolution of 1.1 meters and a frame rate of 25 frames per second [35], and the world's major countries limit the total length of road vehicles to a maximum of 26 meters and their width to no more than 2.6 meters [36]. Therefore, we set the vehicle distance threshold in formula (6) to 1 pixel, which means the vehicle speed is greater than 7.6 km/h, and the angle threshold to 120 degrees, which means the vehicle turning angle is not more than 60 degrees. According to the statistical data, the area threshold is set to 10 pixels, and the region growing threshold in formula (5) is set to 8, which means the Euclidean distance between pixel colors in the region does not exceed 8.

Data
The satellite video used in this article is the sample video data provided by Chang Guang Satellite, which was filmed on May 3, 2017 over Atlanta, USA. The video has 297 frames; each frame contains 12000*5000 pixels, and the corresponding ground area is 11km*4.6km. Because the full frames are too large for convenient statistical analysis, we selected a region of 843*843 pixels as the original data for comparison.

Evaluation criteria
In order to quantitatively compare the results, we selected three general evaluation criteria [37]: precision, recall, and a comprehensive evaluation index (F-score). They are calculated as follows:

$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{7}$$

$$\mathrm{recall} = \frac{TP}{TP + FN} \tag{8}$$

$$F\text{-}\mathrm{score} = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{9}$$

where TP is the number of correctly detected moving vehicles, FP is the number of false alarm vehicles, and FN is the number of undetected moving vehicles.
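As a small worked example, the three criteria can be computed from detection counts as sketched below. The function name `detection_scores` and the counts in the example are made up for illustration and are not results from the paper; returning 0.0 when a denominator is zero is also an assumption.

```python
def detection_scores(tp, fp, fn):
    """Return (precision, recall, F-score) from detection counts.
    A ratio with a zero denominator is reported as 0.0 (an assumption)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Hypothetical counts: 90 correct detections, 10 false alarms, 30 misses.
precision, recall, f_score = detection_scores(tp=90, fp=10, fn=30)
```

With these hypothetical counts, precision is 0.9, recall is 0.75, and the F-score is about 0.818.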

Comparison of results.
In order to make the experiment more complete, we selected four groups of five consecutive frames in the area for comparison. The comparison results are shown in Table 2. It can be seen from Table 2 that, compared with the algorithm in [16], the overall precision increases from 45% to 95%, and the comprehensive index increases from 75% to 94%. Compared with the algorithm in [18], the overall precision increases from about 53% to about 95%, the recall increases from about 75% to about 93%, and the comprehensive index increases from 63% to about 94%. On the whole, the algorithm greatly improves precision while keeping recall at a high level.

Conclusion
The EPAMD algorithm proposed in this paper separates and extracts vehicles using image extreme points and mean differences, and obtains the correct moving vehicles through inter-frame motion detection.
Then, based on the extracted vehicles, we proposed an algorithm that extracts road masks using only 3 frames. The experimental results show that the proposed method extracts moving vehicles with higher accuracy. Compared with methods that directly use image differences or background modeling, the proposed method overcomes shortcomings of those algorithms, such as "ghosts", and provides more accurate moving vehicle information and more complete vehicle images. This method also offers a new way to solve the problem of vehicle extraction in satellite videos. With the continuous development of remote sensing technology and the increasing demand for remote sensing information, video satellites, supported by moving vehicle extraction, will provide people with more abundant and effective information in the future.