Acquisition and Registration of Human Point Cloud Based on Depth Camera

Registration of human point clouds is a key step in 3D human reconstruction. This paper therefore proposes acquiring the human point cloud with a depth camera, then solving the camera pose with feature matching and PnP (Perspective-n-Point), and finally optimizing the camera pose by least squares to implement point cloud registration. Experimental results show the effectiveness of this method.


INTRODUCTION
With the development of virtual reality and augmented reality, realistic 3D human body models are applied in games, movies and animation, and have great commercial value. Traditional 3D human model acquisition usually uses a large structured-light scanner or laser scanner to obtain multi-view data; then, by scanning marker points on the body, it registers the point cloud data and obtains a complete human point cloud. However, these 3D scanners are expensive and complex to operate, so acquiring human point clouds with traditional scanners suffers from high cost and difficult operation. With the development of hardware, low-cost depth cameras that capture both color and depth images have become available on the market; they are cheap and easy to operate. We therefore use Microsoft's Kinect depth camera as the scanning device to capture human point clouds. Because of the limits of the equipment, a complete human point cloud cannot be obtained in a single shot, so we must scan the body repeatedly from multiple angles and use a registration algorithm to transform all point clouds into the same coordinate system.

OVERVIEW
In many applications of machine vision, such as stereo matching, image registration and shape recognition, registration of point clouds is a critical step. Point cloud registration transforms point clouds from different camera coordinate systems into a common world coordinate system, stitches the views together, and yields a complete point cloud model of the object; the accuracy of the registration directly determines the completeness of the model. Commonly used registration methods include genetic algorithms, least-squares matching, the three-point-pair method and the ICP algorithm. ICP is the most common algorithm for point cloud registration, but it places high demands on the input: the point clouds to be registered must already share most of their points, otherwise registration fails. Therefore, this paper proposes using a depth camera to obtain pairs of human skeletal feature points and using the SIFT algorithm for image feature extraction and matching [1]. Finally, PnP (Perspective-n-Point) is used to solve the camera pose, which is then optimized by least squares to realize precise registration of the point clouds [2].

CLOUD ACQUISITION AND REGISTRATION
Based on the depth camera, the human point cloud registration method is divided into five steps:
(1) Use Kinect to capture the scene color image and depth image, the human color image and depth image, and the skeleton feature points, and filter these images to remove noise [3].
(2) Use the SIFT algorithm to extract and match feature points between two frames of color images, and filter out wrong matches.
(3) Add the extracted skeleton point pairs to the matched feature points and, using the depth information of the image, solve the 3D-to-2D camera pose with PnP (Perspective-n-Point).
(4) Construct the minimum reprojection error for the obtained pose and solve it.
(5) Transform the human images into human point clouds, and register the point clouds with the solved pose.

Image Capture
We use the Microsoft Kinect camera to capture scene color images and depth images. Since the camera's color and depth images are not in the same coordinate system, the color images must be aligned with the depth images; we use the Kinect SDK function MapDepthFrameToColorFrame to map the depth image into color image space. Each depth-image pixel stores the depth value in its high 13 bits and the user number in its low 3 bits, so the human body can be identified and its foreground extracted from the image, giving color and depth images of the human, while the Kinect skeleton information is used to trace the 20 skeleton joints of the body. Since the storage order of the skeleton points is consistent across frames, the skeleton points of each frame can be matched as feature points, increasing the number of feature points.
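The bit layout described above can be sketched in a few lines. This is an illustrative decoding, not the SDK's own API; the constant names are our own, and we assume the 16-bit pixel format stated in the text (high 13 bits depth, low 3 bits player number):

```python
# Illustrative decoding of a Kinect v1 depth-frame pixel: the high 13 bits
# hold the depth value and the low 3 bits hold the player (user) number.
DEPTH_SHIFT = 3          # number of low bits reserved for the player index
PLAYER_MASK = 0b111      # mask selecting the player-number bits

def split_depth_pixel(pixel: int) -> tuple:
    """Return (depth, player_index) for one raw 16-bit depth pixel."""
    depth = pixel >> DEPTH_SHIFT     # drop the player bits to recover depth
    player = pixel & PLAYER_MASK     # keep only the player bits
    return depth, player

# Example: a pixel encoding a depth of 1200 for player number 2.
raw = (1200 << DEPTH_SHIFT) | 2
depth, player = split_depth_pixel(raw)  # -> 1200, 2
```

Pixels whose player number is non-zero belong to a tracked user, which is how the human foreground is separated from the scene.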

Feature Point Extraction and Matching
In the field of computer vision there are many stable local image features, such as the well-known SIFT [1], SURF [4] and ORB [5]; they are repeatable, distinctive, efficient and local. In this paper we select an appropriate algorithm by comparing the number of features extracted, the running time and the matching quality of the different algorithms. The experimental results are shown in table 1.

Table 1. Comparison of feature extraction algorithms.

Through experimental comparison and analysis, considering registration quality and time, we select SIFT (Scale-Invariant Feature Transform) for feature extraction. The algorithm fully accounts for changes of illumination, scale and rotation during image transformation and is accurate, so we choose SIFT as the feature detector and descriptor extractor and extract feature points from the two frames captured above. SIFT extracts feature points x_m^t (m = 1, 2, ..., M) in image I_t and feature points x_n^{t+1} (n = 1, 2, ..., N) in image I_{t+1}. A brute-force matcher measures the descriptor distance between each feature point x_m^t and all x_n^{t+1}, sorts the distances, and selects the closest point as the match [6].
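As an illustration of the brute-force matching step, here is a minimal pure-NumPy sketch. In practice OpenCV's SIFT detector and `BFMatcher` would be used, and real SIFT descriptors are 128-dimensional rather than the toy 4-D rows below:

```python
import numpy as np

def brute_force_match(desc_t, desc_t1):
    """Match each descriptor in frame t to its nearest neighbour (smallest
    L2 distance) among the descriptors of frame t+1, as a brute-force
    matcher does. Returns (index_in_t, index_in_t1, distance) triples."""
    matches = []
    for i, d in enumerate(desc_t):
        dists = np.linalg.norm(desc_t1 - d, axis=1)  # distance to every candidate
        j = int(np.argmin(dists))                    # closest descriptor wins
        matches.append((i, j, float(dists[j])))
    return matches

# Toy 4-D "descriptors": row 0 of frame t resembles row 1 of frame t+1,
# and row 1 of frame t resembles row 0 of frame t+1.
desc_t = np.array([[0.0, 0, 0, 0], [10.0, 0, 0, 0]])
desc_t1 = np.array([[10.1, 0, 0, 0], [0.2, 0, 0, 0]])
matches = brute_force_match(desc_t, desc_t1)  # pairs (0 -> 1) and (1 -> 0)
```

Nearest-neighbour matching alone still produces many mismatches, which is why the screening step described next is needed.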
The matched feature points contain a large number of mismatches, so a filter is needed. The algorithm is as follows:
(1) Filter the matched points and delete the feature point pairs at distances less than K.
(2) Subtract the coordinates of the matching points in F_t and F_{t+1} to obtain the displacements p_i (i = 1, 2, 3, ...), and store the p_i in the array M.
(3) Because the camera usually moves horizontally, delete the pairs with |p_i| > 90 and those whose corresponding depth value is d = 0, and store the remaining p_i in the array M2.
(4) Count the points in M2 with p_i greater than zero, denote the count as a (the number of forward-moving points), and store the corresponding feature-point subscripts i in the array Ma. Denote the number of points with p_i less than zero as b (the number of negative-translation points), and store the corresponding subscripts in the array Mb.
(5) If a > b, the image translated forward and the negative translations are noise, so the feature points indexed by Ma are the correct matches; otherwise the pairs in Mb are taken.
The result of the filter is shown in figure 1. As can be seen in figure 1(a), there are incorrect matching point pairs; figure 1(b) shows the result after filtering, which clearly eliminates the wrong matches. Thus an accurate set of matched image feature pairs is obtained.
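The screening steps above can be sketched as follows. The 90-pixel threshold and the depth-validity test follow step (3), and the index lists `pos` and `neg` stand in for the arrays Ma and Mb:

```python
import numpy as np

def filter_matches(pts_t, pts_t1, depths, max_disp=90.0):
    """pts_t, pts_t1: (N, 2) matched pixel coordinates in frames t and t+1;
    depths: depth value at each matched point (0 = invalid).
    Returns the indices of the matches kept by the majority-direction rule."""
    pts_t = np.asarray(pts_t, float)
    pts_t1 = np.asarray(pts_t1, float)
    depths = np.asarray(depths, float)
    dx = pts_t1[:, 0] - pts_t[:, 0]                  # horizontal displacement p_i
    valid = (np.abs(dx) <= max_disp) & (depths > 0)  # step (3): drop |p_i| > 90, d = 0
    pos = [int(i) for i in np.nonzero(valid)[0] if dx[i] > 0]  # Ma: forward pairs
    neg = [int(i) for i in np.nonzero(valid)[0] if dx[i] < 0]  # Mb: backward pairs
    # Step (5): the majority direction is the true camera motion; the rest is noise.
    return pos if len(pos) >= len(neg) else neg

# Example: three plausible horizontal motions and one 200-pixel outlier.
kept = filter_matches([[0, 0]] * 4, [[5, 0], [7, 1], [-3, 2], [200, 0]],
                      [1, 1, 1, 1])
# kept == [0, 1]: the two forward-moving pairs survive.
```

This majority-vote rule relies on the assumption stated in step (3) that the camera motion between frames is mainly a horizontal translation.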

Using PnP to Solve Camera Pose
PnP (Perspective-n-Point) is a method for solving 3D-to-2D point-pair motion: it estimates the pose of a camera when n 3D space points and their projected positions are known. We therefore use the depth image to lift one point of each matched feature pair, together with the skeleton feature points, into 3D. The back-projection formula is

$$x = \frac{(i - c_x)\,z}{f_x}, \qquad y = \frac{(j - c_y)\,z}{f_y}, \tag{2}$$

where $(i, j)$ are the pixel coordinates of a feature point, $z$ is the corresponding depth, $f_x, f_y$ are the focal lengths and $c_x, c_y$ is the optical center. With the 3D coordinates and projections of the n feature points known, we solve the PnP problem by the direct linear transform. Let a space point have homogeneous coordinates $P = (X, Y, Z, 1)^T$ and projection $x_1 = (u_1, v_1, 1)^T$, and write the rows of the unknown $3 \times 4$ projection matrix as $t_1 = (t_1, t_2, t_3, t_4)^T$, $t_2 = (t_5, t_6, t_7, t_8)^T$, $t_3 = (t_9, t_{10}, t_{11}, t_{12})^T$; the projection is

$$s\,x_1 = \begin{pmatrix} t_1^T \\ t_2^T \\ t_3^T \end{pmatrix} P. \tag{3}$$

Eliminating the scale $s$ from formula (3) gives formula (4):

$$u_1 = \frac{t_1^T P}{t_3^T P}, \qquad v_1 = \frac{t_2^T P}{t_3^T P}, \tag{4}$$

that is, two linear constraints on $t$ from each point. With N feature points, stacking these constraints gives the homogeneous linear system (5):

$$\begin{pmatrix} P_1^T & 0 & -u_1 P_1^T \\ 0 & P_1^T & -v_1 P_1^T \\ \vdots & \vdots & \vdots \\ P_N^T & 0 & -u_N P_N^T \\ 0 & P_N^T & -v_N P_N^T \end{pmatrix} \begin{pmatrix} t_1 \\ t_2 \\ t_3 \end{pmatrix} = 0. \tag{5}$$

Since $t$ has 12 unknowns and each point contributes two constraints, the camera pose is obtained as soon as at least 6 feature point pairs are available [7].
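A minimal sketch of the direct-linear-transform solution of system (5), assuming the image points have already been normalized by the intrinsics (i.e. K = I) and solving the homogeneous system by SVD:

```python
import numpy as np

def dlt_pnp(points_3d, points_2d):
    """Direct linear transform for PnP: recover the 3x4 matrix whose rows
    are t1, t2, t3 (formula (5)) from n >= 6 correspondences between 3-D
    points and normalized image coordinates (u, v). Returned up to scale."""
    A = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        P = [X, Y, Z, 1.0]                              # homogeneous space point
        A.append(P + [0.0] * 4 + [-u * p for p in P])   # t1.P - u * t3.P = 0
        A.append([0.0] * 4 + P + [-v * p for p in P])   # t2.P - v * t3.P = 0
    # The stacked system A t = 0 is solved by taking the right singular
    # vector associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)
```

Because the solution is only determined up to scale, the returned matrix must be normalized (and its left block projected back onto a valid rotation) before use; library routines such as OpenCV's `solvePnP` handle these steps internally.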

Use Least Square to Optimize Camera Pose
Because of camera noise and other factors, this camera pose is only an initial value, so we need to optimize it. Given the space points $P_i = [X_i, Y_i, Z_i]^T$, their pixel projections $u_i = [u_i, v_i]^T$, and the camera pose $R, T$ (expressed as the Lie algebra $\xi$), the projection model is

$$s_i u_i = K \exp(\xi^{\wedge}) P_i. \tag{6}$$

Because the camera pose is unknown and the observed points are noisy, this equation has an error. We construct the least-squares problem (7) to minimize the sum of the reprojection errors and find the optimal camera pose:

$$\xi^* = \arg\min_{\xi} \frac{1}{2} \sum_{i=1}^{n} \left\| u_i - \frac{1}{s_i} K \exp(\xi^{\wedge}) P_i \right\|^2. \tag{7}$$

To solve this equation we need the derivative of each error term with respect to the optimization variables. Let $P' = [X', Y', Z']^T$ be the point $P_i$ transformed into the camera coordinate system; then according to formula (6),

$$u' = f_x \frac{X'}{Z'} + c_x, \qquad v' = f_y \frac{Y'}{Z'} + c_y. \tag{8}$$

The error $e$ is the difference between $(u', v')$ and the observation. Taking the derivative of $e$ with respect to the camera pose and the feature points gives formulas (9) and (10):

$$\frac{\partial e}{\partial \delta\xi} = -\begin{pmatrix} \frac{f_x}{Z'} & 0 & -\frac{f_x X'}{Z'^2} & -\frac{f_x X' Y'}{Z'^2} & f_x + \frac{f_x X'^2}{Z'^2} & -\frac{f_x Y'}{Z'} \\ 0 & \frac{f_y}{Z'} & -\frac{f_y Y'}{Z'^2} & -f_y - \frac{f_y Y'^2}{Z'^2} & \frac{f_y X' Y'}{Z'^2} & \frac{f_y X'}{Z'} \end{pmatrix}, \tag{9}$$

$$\frac{\partial e}{\partial P} = -\begin{pmatrix} \frac{f_x}{Z'} & 0 & -\frac{f_x X'}{Z'^2} \\ 0 & \frac{f_y}{Z'} & -\frac{f_y Y'}{Z'^2} \end{pmatrix} R. \tag{10}$$

Using formulas (9) and (10), we solve equation (7) with the Levenberg-Marquardt method to obtain the optimal camera pose. Finally, we transform the color image of the human body into a point cloud according to formula (2) and multiply the point cloud by the camera pose to realize the point cloud registration.
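To illustrate the optimization, here is a deliberately simplified Gauss-Newton sketch that refines only the camera translation with the rotation held fixed at identity. The paper's method optimizes the full pose $\xi$ on se(3) with Levenberg-Marquardt, so this is only a toy instance of formulas (7)-(8); the Jacobian used is the translation block of formula (9):

```python
import numpy as np

def refine_translation(points_3d, obs_uv, K, t0, iters=10):
    """Gauss-Newton sketch of reprojection-error minimization (formula (7)),
    simplified to refine only the camera translation t with the rotation
    fixed to identity. K = (fx, fy, cx, cy) are the camera intrinsics."""
    fx, fy, cx, cy = K
    t = np.asarray(t0, dtype=float).copy()
    for _ in range(iters):
        H = np.zeros((3, 3))
        b = np.zeros(3)
        for P, (u, v) in zip(points_3d, obs_uv):
            X, Y, Z = np.asarray(P, float) + t   # P' = P + t (camera frame)
            u_hat = fx * X / Z + cx              # projection, formula (8)
            v_hat = fy * Y / Z + cy
            e = np.array([u - u_hat, v - v_hat])  # reprojection error
            # Jacobian of e w.r.t. t: the translation block of formula (9).
            J = -np.array([[fx / Z, 0.0, -fx * X / Z**2],
                           [0.0, fy / Z, -fy * Y / Z**2]])
            H += J.T @ J                         # accumulate normal equations
            b += -J.T @ e
        t += np.linalg.solve(H, b)               # Gauss-Newton update step
    return t
```

Levenberg-Marquardt differs from this sketch only in damping the normal equations (solving $(H + \lambda I)\delta = b$), which makes the iteration robust to poor initial values.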

EXPERIMENTAL RESULTS
In this paper, a depth camera is used to acquire the human point cloud and implement point cloud registration. The experiments show that the point clouds can be registered well; the results are shown in figure 2, and the registration times of the different methods are compared in table 2.

Table 2. Registration time of different registration methods.

CONCLUSIONS
In this paper, the depth camera is used to separate the human body point cloud from the scene; feature matching and PnP (Perspective-n-Point) are then used to solve the camera pose; finally, least squares is used to optimize the camera pose and register the individual point clouds. A method to filter false feature matches is also proposed, which improves the accuracy of the computation and of the registration. Compared with traditional methods of human point cloud registration, this approach reduces hardware requirements and registration time while improving registration accuracy.

Figure 1. Feature point screening results.

Figure 2. The result of point cloud registration.

With this point cloud registration method, a low-cost depth camera can quickly acquire human point clouds for registration. Compared with traditional point cloud registration methods it is fast, has small error, registers accurately, and requires little overlap between the point clouds. Compared with the traditional ICP registration method, the registration speed is much higher and no GPU acceleration of the registration is needed, which reduces the demand on hardware.