A novel method for position perception of 3D moving objects

Abstract. With the development of industrial automation, location measurement of 3D objects is becoming more and more important, especially as it can provide the positional parameters that a manipulator needs to grasp an object accurately. In the approach in widespread use today, the image of a stationary object is captured to obtain positional parameters, which are then transmitted to manipulators in industry. This process is delayed, which reduces the work efficiency of the manipulator. A method for calculating the position information of a target object in motion is therefore proposed. The method uses monocular vision technology to track 3D moving objects, then uses a contour sorting method to extract the minimum constrained contour rectangle, and combines video correction technology to realise the tracking, thereby reducing the measurement error. The experimental results and analysis show that the adopted measurement method is effective.


Introduction
Automatic robotic systems have become increasingly desirable for many real-world applications such as automatic surveillance, vehicle control and industrial robots. Vision sensors provide a wide range of information for both stationary and moving objects, and they play a broad role in the automatic control of robotic systems.
Machine vision [1] is widely used in manipulator control, defect detection and so on, and a large number of relevant studies have emerged [2][3][4][5][6]. In the industrial world, there are many application scenarios in which objects are conveyed by automatic conveyor belts, and robots are often used for picking and handling to improve production efficiency. Since the target object must be perceived before the robot can grab it, machine vision provides a good technical means for manipulators. At present there are many studies on manipulators that use machine vision to acquire and analyse images and then control the trajectory of the robot with the obtained position parameters [3][4][5][6][7]. However, many robots currently sense the position information of the target object only in its stationary state [3]. Although this approach is simple, it harms the work efficiency of the robot and lags behind the operation of the manipulator. There are some studies on the acquisition, calculation and manipulator control of moving target objects [7], but they often target only objects of regular shape. In addition, with respect to camera calibration, [8] proposed a global calibration method that includes hand-eye pose estimation; [9] proposed a trial calibration technique based on a 2D picture test with a pose estimation algorithm to correct the initial difference in AHRS inertial reference frames and improve joint angle accuracy; [10] presented a method for detecting six-dimensional geometric error using a single CCD camera and a well-designed object; and [11] studied the angular localization of chessboard images. These works studied camera calibration, providing the conditions for correct measurement of position and pose. The camera calibration method based on the checkerboard is also used in this paper.
Regarding measurement methods, several approaches exist. In [12], the position of a space target such as a workpiece is measured using the intersection of a straight line and a plane. Peng Wang et al. [13] studied biologically inspired progressive enhancement target detection in heavily cluttered SAR images. [14] provided a novel measurement method based on the pinhole camera model with five reference points, where the coordinates of the reference points are estimated through the projection model with a least-squares algorithm. Xu, L.Y. et al. [15] studied a measurement model using the camera's extrinsic parameters, such as height and pitch angle, from which the height of an object can be calculated. Among these methods, [12] and [13] address only stationary target objects, not moving ones; [13] is limited to planar targets; and [14] described a method for calculating a 3D object's height and obtaining its position, but did not describe how to handle a moving object. On the system implementation side, [16] describes the development of a simple 3D machine vision measurement system. Drawing on this experience and aiming at the problems mentioned above, this paper proposes a more general measuring method for moving objects that can measure the position and direction of objects of any shape, and designs and implements the related functions based on OpenCV.
Aiming at the above problems, this study focuses on the position of a moving 3D target object, taking geometrically shaped target objects on a conveyor belt as the research scene.

Camera model in OpenCV calibration method
In camera calibration, the camera model must first be determined. The camera model in OpenCV is based on the ideal pinhole model, with radial distortion and tangential distortion introduced. Five coordinate systems are defined in the model. The transformation from the world coordinate system to the image pixel coordinate system is as follows:
(1) Transformation from the world coordinate system to the camera coordinate system (a rigid-body transformation in 3D space):
Pc = R·Pw + T,
where Pw = (Xw, Yw, Zw)^T is a point in the world coordinate system, R is the rotation matrix and T the translation vector.
(2) Transformation from the camera coordinate system to the ideal image coordinate system (projective transformation):
x = f·Xc/Zc,  y = f·Yc/Zc,
where f is the focal length.
(3) Transformation from the ideal image coordinate system to the actual image coordinate system (distortion considered). Lens distortion is mainly caused by radial distortion; the lens model with second-order radial distortion is
x' = x(1 + K1·r^2 + K2·r^4),  y' = y(1 + K1·r^2 + K2·r^4),  r^2 = x^2 + y^2,
where K1 and K2 indicate the radial distortion coefficients. In homogeneous coordinates this can be written as (x', y', 1)^T = A'·(x, y, 1)^T, where A' indicates the equivalent transformation matrix.
(4) Transformation from actual image coordinates to pixel image coordinates. In homogeneous coordinates,
(u, v, 1)^T = [Sx γ u0; 0 Sy v0; 0 0 1]·(x', y', 1)^T,
where Sx and Sy indicate the number of pixels per unit distance on the image plane in the direction of the X and Y axes respectively, γ is the tilting factor between the two coordinate axes, and (u0, v0) is the principal point. Combining the four steps gives the overall mapping
s·(u, v, 1)^T = M1·[R T]·(Xw, Yw, Zw, 1)^T,
where M1 is the intrinsic parameter matrix determined by f, Sx, Sy, γ, u0 and v0.
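The chain of transformations above can be sketched as a single C++ function. This is a minimal illustration of the model under the assumption of one focal length f and no tangential distortion, not the OpenCV implementation; all parameter values used with it are arbitrary examples.

```cpp
#include <cassert>
#include <cmath>

struct Pixel { double u, v; };

// World point -> camera frame (R, T) -> ideal image plane (focal length f)
// -> second-order radial distortion (K1, K2) -> pixel coordinates
// (Sx, Sy, skew gamma, principal point u0, v0).
Pixel projectPoint(const double R[3][3], const double T[3], double f,
                   double k1, double k2, double Sx, double Sy,
                   double gamma, double u0, double v0, const double Pw[3]) {
    // (1) Rigid-body transform: world -> camera coordinates.
    double Pc[3];
    for (int i = 0; i < 3; ++i)
        Pc[i] = R[i][0]*Pw[0] + R[i][1]*Pw[1] + R[i][2]*Pw[2] + T[i];

    // (2) Perspective projection onto the ideal image plane.
    double x = f * Pc[0] / Pc[2];
    double y = f * Pc[1] / Pc[2];

    // (3) Second-order radial distortion.
    double r2 = x*x + y*y;
    double dist = 1.0 + k1*r2 + k2*r2*r2;
    double xd = x * dist, yd = y * dist;

    // (4) Actual image coordinates -> pixel coordinates (with skew gamma).
    return { u0 + Sx*xd + gamma*yd, v0 + Sy*yd };
}
```

With an identity rotation, zero distortion and zero skew, the function reduces to the plain pinhole mapping, which makes it easy to sanity-check.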

Three-dimensional moving object position perception method
Target position and pose perception is composed of 4 parts: image acquisition and preprocessing, calibration, target tracking, and video correction. The process is as follows.
Step 1. Get the number of corner points and the length and height of the board.
Step 2. Load a calibration image.
Step 3. Find the chessboard corners and save the corner information to the matrix.
Step 4. Determine whether image loading has ended; if yes, go to the next step, otherwise go to Step 2.
Step 5. Calibrate the camera parameters and save them to the XML file.
Step 6. Get video from the camera.
Step 7. Capture a frame and determine whether the capture was successful; if yes, go to the next step, otherwise end.
Step 8. Correct the distorted frame using the saved calibration parameters.
Step 9. Track the target and calculate its position information.
Step 10. Display the original image and show the corrected image.

Calibration
There is a one-to-one correspondence between the coordinates of a point in space and the coordinates of its imaging point in a camera image. This geometric correspondence is determined by the imaging geometry of the camera, and the process of computing the camera's geometric parameters is called camera calibration. According to the calibration approach, camera calibration methods fall into three categories [17]: the traditional camera calibration method, the camera calibration method based on active vision, and the camera self-calibration method. The calibration procedure in this system is as follows:
1, Capture calibration images with the camera and preprocess them.
2, Make the calibration plate and measure the board data on the box.
3, Establish two structures, and input the corresponding corner pixel coordinates and three-dimensional coordinates respectively.
4, Load the preprocessed images and draw the corners.
5, Pass the structures as parameters into the function cvCalibrateCamera2(), together with the internal and external parameters.
6, Save the calculation results in the XML file.
As shown in Figure 1, this part runs as an intermediate process, and its implementation is fully automated.
After the system starts, the camera opens, qualified pictures are taken from the display window, and the preprocessing module hands them to the calibration function. The results are saved into the XML file; to correct images and obtain the three-dimensional information of the object, the parameters are extracted from the XML file for the calculation, and the resulting coordinates are stored in the XML file as well, as shown in Figure 2.
The pose table storage system is designed to store four three-dimensional coordinates and the rotation angle of the target object; the storage elements are pixel_dst1, pixel_dst2, pixel_dst3, pixel_dst4 and angle. As Table 1 shows, pixel_dst1, pixel_dst2, pixel_dst3 and pixel_dst4 are the coordinate values of the four corner points of the tracked object and are stored in matrix form in the XML file for the robot to use when picking up the object. Angle is the rotation angle of the object, which is the focus of the pose information: only by knowing the deflection angle can the robot grasp the object accurately. The focal-length terms f_x and f_y are calculated as a whole in camera calibration.
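The pose record described above can be sketched as a small C++ structure. The field names follow the text (pixel_dst1..pixel_dst4 and angle); the concrete layout and the angle helper below are our assumptions, not the paper's implementation.

```cpp
#include <cassert>
#include <cmath>

struct Point3 { double x, y, z; };

// One pose record: the four corner coordinates of the tracked object
// plus its rotation angle, as stored in the XML pose table.
struct PoseRecord {
    Point3 pixel_dst1, pixel_dst2, pixel_dst3, pixel_dst4; // rectangle corners
    double angle;                                          // rotation angle, degrees
};

// One plausible way to derive the stored angle: the inclination of the
// edge from the first corner to the second, measured in degrees.
double edgeAngleDeg(const Point3& a, const Point3& b) {
    const double PI = 3.14159265358979323846;
    return std::atan2(b.y - a.y, b.x - a.x) * 180.0 / PI;
}
```

A diagonal edge from (0, 0) to (1, 1), for instance, yields a 45-degree deflection angle.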

Target Tracking
This part mainly achieves two functions: target recognition and target positioning. For target recognition we use the background subtraction method [18]. When an abnormal object enters the monitored scene, there are obvious differences between frames. Subtracting two frames gives the absolute value of the brightness difference between the two images; judging whether it exceeds a threshold allows the motion characteristics of the video or image sequence to be analysed and determines whether any object is moving in the sequence. The inter-frame difference of an image sequence is equivalent to high-pass filtering the sequence in the time domain.
First, the contour of the target is detected so that its track in the field of view can be seen. The basic principle of the inter-frame difference method is to subtract the gray values of corresponding pixels in two consecutive frames. If the gray difference at a pixel is small, the scene there can be considered static; if the grayscale changes somewhere in the image, this can be attributed to moving objects. By marking these pixel areas, the position of the moving object in the image can be found [19].
The purpose of target positioning is to obtain the target's three-dimensional coordinates. First the pixel coordinates in the image are corrected, and then homography matrices are used to rotate and translate them to obtain three-dimensional coordinates.
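The inter-frame difference step can be sketched as follows: a pixel is marked as foreground (moving) when the absolute grayscale change between two frames exceeds a threshold. Real frames are 2D images; flat vectors stand in for them here for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Mark pixels whose grayscale change between two frames exceeds the
// threshold: 255 = moving (foreground), 0 = static (background).
std::vector<std::uint8_t> frameDifference(const std::vector<std::uint8_t>& prev,
                                          const std::vector<std::uint8_t>& curr,
                                          int threshold) {
    std::vector<std::uint8_t> mask(curr.size(), 0);
    for (std::size_t i = 0; i < curr.size(); ++i)
        mask[i] = (std::abs(int(curr[i]) - int(prev[i])) > threshold) ? 255 : 0;
    return mask;
}
```

Contour detection then runs on the binary mask to locate the moving object.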
The target tracking program flow chart is shown in Fig. 3. Fig. 4 shows the effect of target recognition and tracking: the system sets a threshold based on the known object size, filters out all contours smaller than this threshold, and draws the smallest enclosing rectangle of the largest contour. In this way the trajectory of the target is visible in the field of view, and the location information of the target can then be calculated.
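The positioning step maps a corrected pixel onto the conveyor plane with a 3x3 homography H. A minimal sketch of applying such a matrix is shown below; in the real system H comes from calibration, and the matrix values used in any example are purely illustrative.

```cpp
#include <cassert>
#include <cmath>

struct Planar { double x, y; };

// Apply a 3x3 homography to pixel coordinates (u, v): multiply in
// homogeneous coordinates, then divide by the homogeneous scale w.
Planar applyHomography(const double H[3][3], double u, double v) {
    double w = H[2][0]*u + H[2][1]*v + H[2][2];
    return { (H[0][0]*u + H[0][1]*v + H[0][2]) / w,
             (H[1][0]*u + H[1][1]*v + H[1][2]) / w };
}
```

For an affine H (bottom row 0 0 1) the division by w is a no-op; a full projective H also corrects perspective foreshortening.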

Fig.5 Pose information acquisition program flow chart
The contour and profile information from target tracking is added into a new structure, which the algorithm then sorts. The maximum is selected: the largest contour in the sorted result carries the angle information we want, and from it the centroid coordinates can be found. The results are saved to the XML file. The pose information acquisition program flow chart is shown in Fig. 5.
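The contour-sorting step can be sketched as follows: among all tracked contours, pick the one with the largest area (shoelace formula) and return the centroid of its vertices. Real code would use OpenCV contour types; plain vectors of points stand in for them here, and taking the mean of the vertices as the centroid is a simplification of a true area centroid.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Polygon area via the shoelace formula (vertices in order).
double contourArea(const std::vector<Pt>& c) {
    double a = 0.0;
    for (std::size_t i = 0; i < c.size(); ++i) {
        const Pt& p = c[i];
        const Pt& q = c[(i + 1) % c.size()];
        a += p.x * q.y - q.x * p.y;
    }
    return std::fabs(a) / 2.0;
}

// Select the largest contour and return the mean of its vertices.
Pt largestContourCentroid(const std::vector<std::vector<Pt>>& contours) {
    const std::vector<Pt>* best = &contours.front();
    for (const auto& c : contours)
        if (contourArea(c) > contourArea(*best)) best = &c;
    Pt centroid{0.0, 0.0};
    for (const Pt& p : *best) { centroid.x += p.x; centroid.y += p.y; }
    centroid.x /= best->size();
    centroid.y /= best->size();
    return centroid;
}
```

The centroid of the winning contour is what gets written to the XML pose table alongside the angle.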

Video correction
In this part we correct the camera image using the calibrated homography matrix. When shooting a target with a digital camera, a video camera or another digital device, it is often difficult to obtain images perpendicular to the ground (or the photographed plane) because of factors such as the shooting distance. The image is therefore distorted, its quality degraded, and subsequent image processing is affected.
(a) Before correction (b) After correction
Fig.6 Correction effect display
In this case, the distortion mainly includes two kinds: radial distortion and skew distortion. Radial distortion correction has been studied many times and applied well in practical work; representative methods include the two-step method, the use of control points, the calibration method based on the camera model, and the zoom address correction method [20]. The implementation process of this part is as follows, and its effect is shown in Figure 6 (owing to limited experimental conditions, using imperfectly flat ground as the moving target's station introduces some errors; the error analysis is given later).
1, Get the frame from the camera.
2, Correct the distorted image.
3, Output the corrected image in the window.
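The distortion-correction step can be sketched for a single normalized image point: the second-order radial model from the calibration section is inverted by fixed-point iteration. In the real system k1 and k2 would come from the saved XML calibration file; the values used in any example are assumptions.

```cpp
#include <cassert>
#include <cmath>

// Invert x_d = x (1 + k1 r^2 + k2 r^4) by fixed-point iteration, which
// converges quickly for mild distortion. (x, y) are the undistorted
// normalized coordinates recovered from the distorted (xd, yd).
void undistortPoint(double xd, double yd, double k1, double k2,
                    double& x, double& y) {
    x = xd; y = yd;                 // initial guess: the distorted point
    for (int i = 0; i < 10; ++i) {
        double r2 = x*x + y*y;
        double d  = 1.0 + k1*r2 + k2*r2*r2;
        x = xd / d;
        y = yd / d;
    }
}
```

Distorting a point with the forward model and then undistorting it should recover the original coordinates to high precision.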

Experiment and analysis
The experiments run under the Windows 7 operating system, using Visual Studio 2013 and OpenCV as development tools, with C++ as the programming language and a USB camera to obtain calibration images. Four measurements are taken to obtain the respective results (unit: cm). To verify the effectiveness of the measurement algorithm, the following comparison test compares the data measured by the algorithm with actual manual measurements. Table 2 shows the experimental results.
As the data in Table 2 show, there is a certain error in the system's detection of the corner points, so the detected corner values deviate somewhat from the actual values; from the calculation results, the error rate on the x coordinate is relatively large. The error analysis of the experiment is as follows:
1, Number of images: the calibration method requires each image to be shot from a different angle, that is, each image corresponds to one field of view and determines exactly one homography matrix H, so the number of images equals the number of homographies. In solving for the camera's internal and external parameters from the homography matrices, the number of homographies also affects the calibration accuracy. In the case of no distortion, the camera has 9 parameters, whereas one homography matrix determines only 8 degrees of freedom, so at least 2 images from different perspectives must be calibrated. To obtain a stable numerical result, the number of collected images should far exceed this minimum. If too few images are captured, the computation of the internal parameters clearly suffers; however, collecting too many images causes error accumulation and increases the error probability. An appropriate number of images must therefore be chosen; this experiment uses 15 calibration images.
2, Re-projection error and noise error: the re-projection error increases rapidly as the noise standard deviation grows, and the calibration results are very sensitive to the world coordinates. To further improve the calibration accuracy, the measurement accuracy of the planar target, or the printing accuracy, must be improved. Theoretically, when the error level is below 1.5% and the re-projection error is below one pixel, the error can be controlled within 3 pixels; when the error level exceeds 2%, the projection error increases rapidly.
3, Feature point extraction error: because of the feature point extraction error, at most 20 images are needed to keep the internal parameter calibration result reliably stable within 3 pixels.
4, World coordinate measurement error: when generating the calibration plate, the printed black-and-white checkerboard squares are usually assumed to be exactly equal in side length. However, because the accuracy of an ordinary printer is not high, the calibration board introduces measurement error. It is therefore necessary to study the influence of world-coordinate measurement accuracy on the calibration.
Based on the analysis of the above error causes, the factors we can control are the number of images, the measurement of world coordinates and a relatively good experimental environment. Camera calibration uses up to 15 images to achieve high-precision calibration; the calibration board is made reasonably, and the number of calibration images is chosen so that the camera can be calibrated quickly and accurately. These limitations of our experiments lead to larger errors, but they do not affect the verification of the overall process and its related algorithms, so the method proposed in this article is effective.

Conclusion
In view of the extensive application background of position perception for three-dimensional moving objects, a more general location perception method is proposed based on existing research. The contributions of this article are as follows: 1) A contour sorting method is used to detect the main contour of the three-dimensional moving object and then calculate the position coordinates of the target object; this overcomes the previous limitation that only objects of regular shape could be detected.
2) The experimental error is analysed in detail. To improve the accuracy of position sensing, calibration is very important, for it involves the selection of experimental conditions and the number of images. Although we have explored a more general method for measuring the position of moving objects of any shape, many details still need to be improved and studied further. In the next step, we will enhance our research to improve the accuracy and apply it in industry.