A Survey of Learning Approaches and Application for 3D Vision

Three-dimensional (3D) vision, extracted from stereo images or reconstructed from two-dimensional (2D) images, is one of the most active topics in computer vision and video surveillance. In stereo vision, a 3D scene is reconstructed from two stereo images through their disparity map. Many stereo matching algorithms use techniques such as median filtering, mean-shift segmentation, guided filtering, and joint trilateral filtering [1] to construct a precise disparity map. These methods aim to determine the image synthesis range in different stereo matching settings, but none of them performs perfectly in every case. This paper focuses on 3D vision: it introduces the background and process of 3D vision, reviews several classical datasets in the field, and on that basis evaluates and analyzes the learning approaches and several types of applications of 3D vision.


Introduction
The real world is a three-dimensional space. In the last few decades, significant progress has been achieved in computer vision on 2D images. However, 2D images cannot provide accurate geometrical information. As a result, 3D vision emerged in the 1960s as a way to reconstruct 3D information from 2D images. Stereoscopic cameras have existed since the early 20th century. The idea of stereoscopic TV originated in the 1920s, and stereoscopic cinema made its colorful debut with the 1952 movie "Bwana Devil" [1]. The technology plays an increasingly important role in describing the world from a broader perspective.
Digital image processing includes: (1) computer graphics, (2) image processing, and (3) computer vision. Computer graphics creates images. Image processing enhances or manipulates existing images, while computer vision analyzes image content. The operations performed on images fall into four groups: (1) enhancement, (2) restoration, which is used to generate dense depth maps, (3) analysis, and (4) synthesis, an essential image processing task that produces images from a scene model [2].
3D vision has been drawing more and more attention in computer science as a result of the rapid development of 3D scanners and computing devices over the last few years [3]. Furthermore, the widespread availability and relatively low cost of these devices add to the boom in 3D vision research.
3D vision is a significant technology that uses 3D information with the aid of machine vision, and it has enabled applications that could not be realized with classical 2D technologies. Many 3D vision applications perform well in inspection, medicine, and other fields. It is also anticipated that 3D vision will play a key role in enhancing the visual perception of future robotic systems. Depth sensors are now opening a new era in robotic vision systems all over the globe, replacing conventional monovision cameras and other types of range finders such as laser, ultrasonic and radar sensors [4].
A 3D vision system is made up of two or more cameras placed horizontally so that they capture images of the target object simultaneously. By estimating the differences between two or more images taken from different horizontal viewpoints, the shift of corresponding pixels between these images can be obtained, which yields a disparity map from which the visual depth of the object is recovered.
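For a rectified, horizontally aligned camera pair, the standard pinhole relation Z = f * B / d converts a disparity d (in pixels) into a depth Z, where f is the focal length in pixels and B is the baseline between the cameras. The following is a minimal sketch of that relation only; the function name and the focal length and baseline values are assumptions chosen for illustration and do not describe any system reviewed in this survey.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Convert a disparity (pixels) to a depth (metres) for a rectified pair.

    Uses the pinhole relation Z = f * B / d.
    """
    if disparity_px < 0:
        raise ValueError("disparity must be non-negative")
    if disparity_px == 0:
        # Zero disparity corresponds to a point at infinity.
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Hypothetical rig (assumed values): 700 px focal length, 0.12 m baseline.
for d in (84.0, 42.0, 21.0):
    print(d, depth_from_disparity(d, focal_px=700.0, baseline_m=0.12))
```

Because depth is inversely proportional to disparity, halving the disparity doubles the estimated depth, which is why small disparity errors matter most for distant objects.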
In stereo reconstruction, generating dense depth maps from uncalibrated images is both the key step and the most challenging task of the whole 3D vision process. It is therefore important to develop robust stereo matching algorithms that find the corresponding points between two or more views. The stereo matching process can be divided into four steps: (1) matching cost calculation, (2) cost aggregation, (3) disparity computation, and (4) disparity refinement [2]. The algorithms applied to 3D vision are mainly classified into two categories: global algorithms and local algorithms. Local methods trade accuracy for speed, while global methods are accurate but time-consuming. With respect to runtime, the algorithms can further be grouped into (1) those with published results of real-time or near real-time performance on standard processors and (2) those that have not been shown to obtain near real-time performance [5].
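The four steps listed above can be sketched as a small local stereo matcher. This is an illustrative toy in pure Python, not an implementation from any of the cited works: the absolute-difference cost, the box-filter aggregation, the winner-take-all selection, the 3x3 median refinement, and all function names and parameter values are assumptions made for the example.

```python
def cost_volume(left, right, max_disp):
    """Step 1: absolute-difference cost between left[y][x] and right[y][x - d]."""
    h, w = len(left), len(left[0])
    big = 255.0  # cost assigned where x - d falls outside the image
    return [[[abs(left[y][x] - right[y][x - d]) if x - d >= 0 else big
              for x in range(w)] for y in range(h)]
            for d in range(max_disp + 1)]


def aggregate(cost_slice, radius):
    """Step 2: sum costs over a (2r+1) x (2r+1) window, clipped at the border."""
    h, w = len(cost_slice), len(cost_slice[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    out[y][x] += cost_slice[yy][xx]
    return out


def winner_take_all(volume):
    """Step 3: per pixel, choose the disparity with the lowest aggregated cost."""
    h, w = len(volume[0]), len(volume[0][0])
    return [[min(range(len(volume)), key=lambda d: volume[d][y][x])
             for x in range(w)] for y in range(h)]


def median_refine(disp):
    """Step 4: 3x3 median filter that removes isolated outliers."""
    h, w = len(disp), len(disp[0])
    out = [row[:] for row in disp]
    for y in range(h):
        for x in range(w):
            window = sorted(disp[yy][xx]
                            for yy in range(max(0, y - 1), min(h, y + 2))
                            for xx in range(max(0, x - 1), min(w, x + 2)))
            out[y][x] = window[len(window) // 2]
    return out


def local_stereo(left, right, max_disp=4, radius=1):
    volume = [aggregate(s, radius) for s in cost_volume(left, right, max_disp)]
    return median_refine(winner_take_all(volume))


# Synthetic check: the left image is the right image shifted by 2 pixels.
H, W, TRUE_DISP = 6, 20, 2
right = [[(7 * x * x + 13 * y) % 251 for x in range(W)] for y in range(H)]
left = [[right[y][x - TRUE_DISP] if x >= TRUE_DISP else 0 for x in range(W)]
        for y in range(H)]
disp = local_stereo(left, right)
print(disp[3])
```

On this synthetic pair the true disparity is recovered away from the left border, where the shifted image has no valid counterpart. The nested loops make the cost of aggregation explicit, which is the part that fast local methods optimize.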
As a primary branch of computer vision, 3D vision has attracted significant attention over the past few decades. A large body of research has been carried out and considerable progress has been made in this area. Over the last few years, many high-quality algorithms and methods have been proposed, either targeting different application fields or improving both the accuracy of the results and the real-time performance of the computation [5]. At the same time, existing algorithms have been refined to be more accurate and faster.
3D vision technology is now widely applied in the fields of robot vision, industrial design, and medical imaging. Apart from the increasing number of algorithms and methods proposed, a large body of datasets has been released to facilitate research in 3D computer vision. In this paper, 5 typical benchmark datasets were selected as examples and their characteristics analyzed. However, given the present state of 3D vision techniques, the techniques for building a robust 3D vision system and applying it in practice are still imperfect.
This paper introduces several learning approaches related to 3D vision, presents several datasets used in experiments, and reviews some typical applications of 3D vision technology. The rest of this paper is organized as follows. Section 2 briefly reviews the datasets employed in experiments. Section 3 reviews several learning approaches related to 3D computer vision in detail. Section 4 describes several types of applications of 3D computer vision. Finally, the conclusion and future work are presented in Section 5.

Dataset
With the intention of enabling a quantitative evaluation of 3D vision reconstruction algorithms, several calibrated image datasets were collected which provide ground truth 3D models.
This dataset comprises two objects: the first is a plaster reproduction of a temple and the second is a plaster dinosaur. Together they exhibit both sharp and smooth features, complex topologies, strong concavities, and both strongly and weakly textured surfaces.
The dataset consists of images captured from 68 viewpoints over the hemisphere with the Stanford spherical gantry, using a CCD camera with a resolution of 640 x 480 pixels attached to the tip of the robotic arm. Grid points were detected and used to estimate the camera intrinsics and extrinsics.
Another stereo collection consists of 21 datasets in total, each containing 7 views (view 0 to view 6) taken under 3 different illuminations with 3 exposures. Disparity maps are provided for view 1 and view 5. The images are rectified and radial distortion has been removed. The collection comes in three sizes: full size (width 1240 to 1396, height 1110), half size (width 620 to 698, height 555), and third size (width 413 to 465, height 370).

The learning model
To improve existing 3D vision techniques, a large number of learning models have been proposed recently. This section summarizes several methods and learning models in the field of 3D vision. In stereo matching, handling depth discontinuities is an important issue. Stefano et al. proposed a 3D matching method called Single Matching Phase [11]. Because it is based on the uniqueness constraint, the algorithm runs at high speed by rejecting earlier matches as soon as better candidates are detected, and it can produce reliable dense disparity maps in real time. Muhalmann et al. presented a method that uses the SAD correlation metric for color images together with a left-to-right reliability check to improve both speed and quality [12]. Binaghi et al. proposed an advanced method that combines the zero-mean normalized cross-correlation (ZNCC) metric with neural network techniques [13]. The neural network is applied to each support region, depending on the shape and size of each window. To overcome the sensitivity of some methods to depth discontinuities, several new methods have been presented. Tomasi and Birchfield proposed a method for detecting discontinuities in 3D matching: it matches individual pixels along corresponding scanline pairs while allowing occluded pixels to remain unmatched, and then propagates information between scanlines by means of a post-processor [14].
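The two correlation metrics named above, SAD and ZNCC, and a left-to-right reliability check can be sketched as follows. This is a generic illustration of the textbook definitions, not the implementation of [12] or [13]; the neural network component of [13] is omitted, and the function names, the flattened-patch representation, and the tolerance parameter are assumptions made for the example.

```python
import math


def sad(a, b):
    """Sum of absolute differences between two equal-length patches."""
    return sum(abs(x - y) for x, y in zip(a, b))


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length patches.

    Returns a value in [-1, 1], or 0.0 if either patch is constant.
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def left_right_check(disp_left, disp_right, tol=0):
    """Keep a left disparity only if the right disparity map agrees with it.

    Left pixel x with disparity d maps to right pixel x - d; the right map
    at that position should report (approximately) the same d.
    Rejected pixels are marked with None.
    """
    checked = []
    for x, d in enumerate(disp_left):
        xr = x - d
        ok = 0 <= xr < len(disp_right) and abs(disp_right[xr] - d) <= tol
        checked.append(d if ok else None)
    return checked


patch = [10, 20, 30, 40]
brighter = [2 * v + 5 for v in patch]   # same structure, different gain/offset
print(sad(patch, brighter), zncc(patch, brighter))
print(left_right_check([1, 1, 1, 3], [1, 1, 1, 1]))
```

The example patches differ only by gain and offset, so SAD reports a large difference while ZNCC reports near-perfect correlation; this invariance is the usual reason for preferring ZNCC under changing illumination, at a higher computational cost.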
Unlike methods based on local optimization, algorithms based on global optimization can solve difficult optimization problems and enforce smoothness assumptions accurately. Criminisi and Torra proposed a framework that integrates partial information about disparities, for example known surfaces inside the scene [15]. Within global optimization, many different strategies have been proposed to improve 3D vision technology. First, dynamic programming simplifies the computation of an optimization problem by decomposing it into sub-problems. Second, in the graph cuts strategy, Lim and Daolei employ graph cuts and provide a formulation of segment-based 3D matching for extracting the depth map [16]. In this way, every segment is labelled and a plane is fitted to match regions with regular disparities. Third, Sun and Zheng proposed an algorithm that applies belief propagation to the 3D matching problem [17]. It formulates 3D matching as a Markov network containing three coupled Markov random fields. With its help, the two-view stereo model can be extended to a multi-view stereo model.
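The dynamic programming strategy can be illustrated on a single scanline. The sketch below is a generic Viterbi-style minimization of a data cost plus a linear smoothness penalty; it is not the specific method of [15], [16] or [17], and the energy form, the function name, the smoothness weight, and the cost values are assumptions made for the example.

```python
def scanline_dp(cost, smooth_weight=1.0):
    """Minimise sum_x cost[x][d_x] + smooth_weight * |d_x - d_(x-1)|.

    cost[x][d] is the matching cost of giving pixel x the disparity d.
    Returns the optimal disparity for every pixel of the scanline.
    """
    n_disp = len(cost[0])
    best = list(cost[0])   # best[d]: lowest energy of a path ending in d
    back = []              # back[x - 1][d]: best predecessor of d at pixel x
    for x in range(1, len(cost)):
        new_best, pointers = [], []
        for d in range(n_disp):
            # Sub-problem: cheapest way to arrive at disparity d from pixel x-1.
            prev = min(range(n_disp),
                       key=lambda p: best[p] + smooth_weight * abs(d - p))
            new_best.append(best[prev] + smooth_weight * abs(d - prev)
                            + cost[x][d])
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Backtrack from the cheapest final state.
    d = min(range(n_disp), key=lambda p: best[p])
    path = [d]
    for pointers in reversed(back):
        d = pointers[d]
        path.append(d)
    path.reverse()
    return path


# Made-up costs: disparity 1 is correct everywhere, but noise at pixel 2
# makes disparity 2 marginally cheaper there.
costs = [
    [5, 0, 5],
    [5, 0, 5],
    [5, 1, 0],
    [5, 0, 5],
    [5, 0, 5],
]
print(scanline_dp(costs, smooth_weight=0.0))  # no smoothness: per-pixel minimum
print(scanline_dp(costs, smooth_weight=1.0))  # smoothness suppresses the outlier
```

With a zero smoothness weight the result equals the per-pixel minimum, which keeps the noisy jump; with a positive weight the two disparity changes cost more than the small data penalty, so the optimal path stays at disparity 1. Graph cuts and belief propagation optimize similar energies over the full 2D image rather than one scanline at a time.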
In the 3D matching process, one primary obstacle is occlusion. To address this problem, Luo et al. introduced the definition of disparity consistency, which provides the benefits of integrity along with illusion sensitivity [18]. Another major obstacle in the stereo correspondence problem of 3D vision is the absence of photo consistency between the two images.

Application
3D vision technology has been widely applied to real life in a number of fields. It is commonly used for tasks including human activity detection, 3D terrain mapping, navigation, obstacle detection and bio-inspired autonomous guidance [19]. Nishida and Kitamura proposed an application that can quickly detect daily human activity events in the real world [20]. In the field of navigation, Spampinato et al. produced an algorithm that can be widely applied to AGVs (Automated Guided Vehicles) with a high potential to dramatically reduce costs [21]. In their method, 3D vision is used for localization and map building in unknown environments. Cabani et al. proposed an application of 3D vision to obstacle detection, which is expected to improve driving safety, especially in extreme weather [22]. They used color-based edge segmentation and color matching to realize 3D vision.

Conclusion
Within computer stereo vision, three-dimensional (3D) vision is a computer science subject that combines multiple disciplines. 3D vision can be created by reconstruction from several two-dimensional (2D) images. As a promising field in computer vision, 3D vision has been drawing increasing attention over the past few decades thanks to the development of both software and hardware. To construct a 3D scene, the disparity map of the given 2D images must be extracted. In stereo matching, four typical processes, namely median filtering, mean-shift segmentation, guided filtering and joint trilateral filtering, are commonly used to construct a precise disparity map. With these processes, the image synthesis range in different stereo matching settings can be determined. Based on the datasets collected via various approaches, a large number of learning models have been produced to improve the quality of 3D vision at each stage. 3D vision has broad applications in combination with other disciplines such as electronic engineering and biomedicine.

Fig. 4. Examples of Tables.
(4) This dataset consists of 9 datasets, each containing 7 views (view 0 to view 6) taken under 3 different illuminations with 3 exposures. It offers high-accuracy stereo depth maps, and disparity maps are provided for view 1 and view 5. The images are rectified and radial distortion has been removed. The dataset comes in three sizes: full size (width 1330 to 1390, height 1110), half size (width 665 to 695, height 555), and third size (width 443 to 463, height 370).
(5) 2006 High-Accuracy Stereo Depth Maps Using Structured Light [9,10]