Binocular Vision Three-Dimensional Imaging Technology Using Structured Light Projection

The SIFT matching algorithm is used to carry out binocular three-dimensional imaging. Active projection is introduced to solve the problems of low feature counts and poor matching results in the matching process. By projecting a random speckle pattern, the number of matching features is increased and the matching quality is greatly improved. A three-dimensional imaging experiment on the running part of a train achieved a good imaging result, and the method is compared with Fourier profilometry, an active three-dimensional imaging technology. The experimental results show that structured-light-projection binocular three-dimensional imaging has the better effect.


Introduction
Three-dimensional optical measurement is divided into passive and active three-dimensional measurement technologies. Passive measurement does not rely on structured light illumination: it restores the appearance of the object directly from two-dimensional digital images captured by one or more off-the-shelf video recording systems [1][2][3]. Active measurement requires structured light illumination: from digital images carrying the measured object's three-dimensional topographic information, the object's three-dimensional topography is obtained by a corresponding algorithm [4][5][6].
Binocular vision is a passive three-dimensional imaging technology based on human eye imaging theory. It uses the imaging device to obtain two images of the measuring object from different positions based on the parallax principle, and calculates three-dimensional information through the relationship between corresponding points of images [7]. It has the advantages of high efficiency, suitable precision, simple system structure and low cost, so it is very suitable for online and non-contact product inspection and quality control systems at the manufacturing site. Since image acquisition is done instantaneously, it is an effective fast three-dimensional measurement method [8][9].
Railway plays a crucial role in China's economic and social development. In order to keep the running part of a train in good condition, detailed inspection work must be conducted on the running part before the train travels, so that it meets safe driving conditions. Daily primary inspections are usually made visually or manually, which is time-consuming and labor-intensive. Besides, trains need to stay in high-speed train sections or local depots for a long time to complete inspections, and the numerous parts at the bottom and the dark environment pose threats to the safety of maintenance personnel. Therefore, in order to save inspection time and ensure the safety of inspection, an image inspection program should be designed so that the inspector can directly observe three-dimensional images of the bottom parts on the computer.
A complete binocular stereo vision system consists of six parts: (1) camera calibration; (2) image acquisition; (3) feature extraction; (4) stereo matching; (5) depth calculation; (6) interpolation and reconstruction. For these six steps, domestic and overseas scholars have conducted a lot of research to improve the binocular vision measurement effect.
Based on SIFT binocular matching, the binocular three-dimensional imaging technology is studied in this paper. To handle the difficult matching problem of binocular vision, this paper greatly improves the matching accuracy by introducing the structured light projection, and greatly enhances the three-dimensional imaging effect. The binocular three-dimensional imaging method is used for the inspection of bottom parts and three-dimensional imaging verification is performed.

Binocular vision model
It should be noted that in the actual setup the horizontal center distance between the two cameras should be written as T_x. Here a simplified derivation is given under the assumption that the X axis of the world coordinate system is taken along the baseline O_l O_r, so T_x = T. Combining camera distortion with the pinhole camera model, the relationship between a point in world coordinates and the corresponding pixel on the computer image is given by Equation (2):

s [u, v, 1]^T = K [R | T] [X_w, Y_w, Z_w, 1]^T    (2)

where K is the camera intrinsic (internal reference) matrix, R is the rotation matrix, and T is the translation vector of the camera. In Equation (2), the intrinsic matrix contains the camera's focal length, distortion and other information, while the binocular camera's rotation, translation and other information are contained in the extrinsic (external reference) matrix. By multiplying the intrinsic matrix with the extrinsic matrix, the author may yield Equation (3):

s [u, v, 1]^T = M [X_w, Y_w, Z_w, 1]^T,    M = K [R | T]    (3)

where M is the projection matrix (also called the homography matrix). After the equation is expanded, three linear equations are obtained. Once the scale factor s is eliminated, two relations are left with three unknowns, so at least two cameras are required to complete the depth measurement.
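The depth recovery described above can be sketched as a linear least-squares (DLT) triangulation: each camera contributes two equations after the scale factor is eliminated, and two views give four equations in the three unknowns. The intrinsic matrix and baseline below are illustrative toy values, not the paper's calibration results.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: recover a 3D point from its pixel
    coordinates x1, x2 in two views with 3x4 projection matrices P1, P2.
    Each pixel gives two independent linear equations after eliminating
    the scale factor s; two cameras yield four equations in the three
    unknowns, solved as a homogeneous least-squares problem via SVD."""
    u1, v1 = x1
    u2, v2 = x2
    A = np.array([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Toy setup: identical intrinsics K, second camera shifted along X (baseline)
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-100.0], [0.0], [0.0]])])

Xw = np.array([10.0, 20.0, 500.0, 1.0])   # ground-truth world point
x1 = P1 @ Xw; x1 = x1[:2] / x1[2]         # project into each view
x2 = P2 @ Xw; x2 = x2[:2] / x2[2]
p = triangulate(P1, P2, x1, x2)
```

With noise-free projections the recovered point matches the ground truth; with real matched features the same least-squares form absorbs small pixel errors.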

SIFT feature matching algorithm
Scale-Invariant Feature Transform (SIFT) was patented by the Canadian professor David G. Lowe. The SIFT feature remains invariant to rotation, scale and brightness changes, making it a very stable local feature. The algorithm first establishes a scale-space description of the image and identifies potential scale-invariant extreme points using difference-of-Gaussian functions. A feature description vector is then established from the local gradient directions, yielding a feature detection and description method with scaling, rotation and affine invariance.
After convolving the image I with two-dimensional Gaussian functions G of different kernel scales, the scale space L at different scales is obtained, and the difference-of-Gaussian image D(x, y, σ) is obtained by subtracting adjacent scales. Each pixel is compared with a total of 26 neighbours: 8 in its own layer plus 9 each in the layer above and the layer below, and a pixel that is the maximum or minimum among them is taken as a candidate feature point. To enhance matching stability, low-contrast extreme points and unstable edge response points are removed from the candidates, so that the extreme points are accurately located to obtain the local feature points.
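The scale-space construction and 26-neighbour comparison can be sketched as follows; the blur scales and contrast threshold are illustrative, and the edge-response rejection step is omitted for brevity.

```python
import numpy as np

def _gauss1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def _blur(img, sigma):
    """Separable 2-D Gaussian blur using 1-D convolutions."""
    k = _gauss1d(sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, tmp)

def dog_candidates(img, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.01):
    """Build a small Gaussian scale space, subtract adjacent levels to get
    difference-of-Gaussian (DoG) images, and keep pixels that are the
    maximum or minimum of their 26 neighbours (8 in the same DoG layer,
    9 in the layer above, 9 in the layer below) with adequate contrast."""
    L = np.stack([_blur(img.astype(float), s) for s in sigmas])
    D = L[1:] - L[:-1]                       # DoG stack, shape (3, H, W)
    pts = []
    for k in range(1, D.shape[0] - 1):       # middle layers only
        for i in range(1, D.shape[1] - 1):
            for j in range(1, D.shape[2] - 1):
                cube = D[k-1:k+2, i-1:i+2, j-1:j+2]
                v = D[k, i, j]
                if abs(v) > thresh and (v == cube.max() or v == cube.min()):
                    pts.append((i, j, k))
    return pts
```

A bright blob in an otherwise flat image produces a DoG extremum at its center, which is exactly the behaviour the detector exploits.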
Taking each local feature point as a center, a 16×16 window is taken in its neighborhood and divided into sixteen 4×4 pixel blocks. A gradient histogram with 8 direction intervals is computed over each block, the value of each interval being the Gaussian-weighted cumulative gradient magnitude. Each 4×4 pixel block is therefore represented by an 8-dimensional description vector, and in this way a 128-dimensional SIFT feature vector is generated for each local feature point. In addition, the feature vector is length-normalized to further remove the effects of illumination.
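A minimal sketch of this descriptor construction is given below; the Gaussian weighting width is an assumed value, and the orientation normalization relative to the keypoint's dominant direction is omitted for brevity.

```python
import numpy as np

def sift_descriptor(patch):
    """Sketch of SIFT descriptor construction on a 16x16 patch around a
    keypoint: split into sixteen 4x4 cells, accumulate a Gaussian-weighted
    8-bin orientation histogram per cell, concatenate into a 4*4*8 = 128
    dimensional vector, and length-normalize it to reduce illumination
    effects."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)       # angles in [0, 2*pi)
    yy, xx = np.mgrid[0:16, 0:16] - 7.5
    w = np.exp(-(xx**2 + yy**2) / (2 * 8.0**2))       # weight near center
    bins = (ang / (2 * np.pi) * 8).astype(int) % 8    # 8 direction intervals
    hist = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):
            hist[i // 4, j // 4, bins[i, j]] += mag[i, j] * w[i, j]
    v = hist.ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

The result is a unit-length 128-dimensional vector, matching the descriptor dimensionality described above.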
Feature vectors are matched by measuring the similarity of the SIFT feature vectors of local feature points in the two images with the Euclidean distance d = sqrt( Σ_{i=1}^{128} (a_i − b_i)² ), that is, by finding for each local feature point of the first image its nearest local feature point in the other image. A ratio threshold R is set with 0 < R < 1, and a match is accepted as correct when d_min < R · d_second, where d_min is the nearest-neighbour distance and d_second is the second-nearest-neighbour distance.
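This ratio test can be sketched in a few lines; the threshold value R = 0.8 is an illustrative choice, not the paper's setting.

```python
import numpy as np

def ratio_match(desc1, desc2, R=0.8):
    """Match two sets of 128-d SIFT descriptors by Euclidean distance.
    For each vector in desc1, find its nearest and second-nearest
    neighbours in desc2 and accept the match only when
    d_min < R * d_second (the ratio test, with 0 < R < 1)."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]       # nearest, second-nearest
        if dists[j1] < R * dists[j2]:
            matches.append((i, j1))
    return matches
```

With descriptors that have one clear counterpart each, every query passes the test; ambiguous descriptors whose two nearest neighbours are similar in distance are rejected, which is what suppresses false matches on repetitive texture.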

Experimental results and analysis

4.1 Binocular three-dimensional imaging experiment
Before performing three-dimensional imaging, the binocular camera needs to be calibrated, i.e., the homography (projection) matrix of each camera should be calculated. This paper adopts Zhang Zhengyou's camera calibration method [10]; the calibration checkerboard is shown in Figure 1, with (a) the left-camera shot and (b) the right-camera shot. According to the matching results, there are a large number of unmatched gaps in planar areas, failing to meet the standard of three-dimensional imaging. In order to enhance the matching effect and increase the number of feature points, structured light projection is introduced here. The projected image is a random speckle pattern, shown in Figure 4.
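A minimal sketch of generating such a random speckle projection pattern is shown below; real projector patterns typically use blobs of several pixels with controlled density, so the per-pixel Bernoulli pattern and the density value here are simplifying assumptions.

```python
import numpy as np

def speckle_pattern(h, w, density=0.15, seed=0):
    """Minimal random-speckle projection pattern: each pixel is lit
    independently with probability `density`, producing a dense,
    non-repeating texture. Projected onto featureless surfaces, it
    gives the SIFT matcher unique local structure everywhere."""
    rng = np.random.default_rng(seed)
    return (rng.random((h, w)) < density).astype(np.uint8) * 255

pattern = speckle_pattern(120, 160)
```

Because the pattern is random rather than periodic, every neighbourhood is distinctive, which is what lets the matching step succeed on otherwise smooth planar regions.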

Experimental Comparison with Fourier Profilometry
Fourier profilometry is an active three-dimensional measurement technology. The frequency spectrum of the grating fringe image is calculated using a fast Fourier transform; after filtering, the phase value of the measured object's surface is extracted from the spectrum, and the three-dimensional shape data of the object is then calculated. The result is shown in Figure 6. Through the above experiments, it is concluded that the key factor in binocular three-dimensional imaging lies in the accuracy and redundancy of the matching points. Where feature points are numerous, such as near edges, the restoration results are better; on surfaces with fewer feature points (continuous smooth surfaces, such as wheel treads), the restored three-dimensional images are sparse and contain some noise points, which are caused by inaccurate matching. As for Fourier profilometry, due to the periodic nature of the grating it is inaccurate in phase-cutoff regions (discontinuous portions of actual objects, such as cross-sections and fractures). That method restores continuous surfaces such as wheel treads well, but when the components in the area are complex it cannot produce accurate three-dimensional images. In general, binocular vision has superior imaging performance.
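The FFT-filter-phase pipeline of Fourier profilometry can be sketched in one dimension; the carrier frequency, fringe contrast and phase bump below are synthetic illustrative values, and the phase-unwrapping and phase-to-height conversion steps are omitted.

```python
import numpy as np

def fringe_phase(row, f0):
    """1-D sketch of Fourier transform profilometry: FFT a fringe line
    I(x) = a + b*cos(2*pi*f0*x + phi(x)), band-pass the fundamental
    (carrier) frequency, inverse-FFT, take the angle of the resulting
    analytic signal, and subtract the carrier to recover the wrapped
    surface phase phi(x)."""
    n = len(row)
    F = np.fft.fft(row)
    freqs = np.fft.fftfreq(n)
    # keep only a band around the positive carrier frequency f0
    mask = (freqs > f0 / 2) & (freqs < 3 * f0 / 2)
    analytic = np.fft.ifft(F * mask)
    total = np.angle(analytic)                        # carrier + phi, wrapped
    carrier = 2 * np.pi * f0 * np.arange(n)
    return np.angle(np.exp(1j * (total - carrier)))   # remove carrier, rewrap

# Synthetic fringe: a smooth phase bump stands in for surface height
n, f0 = 512, 32 / 512
x = np.arange(n)
phi = 0.8 * np.exp(-((x - 256) / 60.0) ** 2)
row = 1.0 + 0.5 * np.cos(2 * np.pi * f0 * x + phi)
est = fringe_phase(row, f0)
```

Because the phase is recovered modulo 2π, discontinuities larger than π (the phase-cutoff regions mentioned above, such as cross-sections and fractures) cannot be unwrapped reliably, which is exactly the failure mode observed in the comparison.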