Planar object recognition based on local feature prediction agreements

Abstract. Object recognition has drawn great attention in industrial applications, especially automated feeding and assembly, because it can greatly improve line flexibility and reduce cost. In this paper, a simple but effective method for planar object recognition is presented. The method can deal with objects under complex conditions such as occlusion and clutter, and it generates object pose hypotheses from the prediction agreements of different local features of the object. The method contains two stages, an offline stage and an online stage. At the offline stage, representative parts of the object are chosen as its local features and the recognition template is made. At the online stage, matches of the different local features are found in the input image, and prediction agreements are then searched among them to generate the final object pose hypotheses. A thin-planar-object recognition experiment conducted under occluded conditions shows an improved result compared with the traditional overall matching method.


Introduction
Object recognition under complex conditions is a popular topic in machine vision because of its promising industrial application potential in automated feeding and assembly. Compared with traditional automation methods, a machine-vision-based one does not require additional equipment such as feeding lines, so the system can be more flexible and less costly. However, objects are usually placed randomly in a bin, so occlusion and clutter inevitably occur, which presents a big challenge to accurate recognition (Figure 1). The main problem is that the information derived from a 2D image of occluded objects is incomplete, and the extracted object edges may be missing where the objects overlap due to similar texture and color. Besides, a self-similar cluttered background can easily raise false positive recognitions.

Figure 1. Objects under occlusion and clutter conditions
In this paper, a simple but effective method for planar object recognition by local feature prediction agreements is presented, which gives more accurate predictions under occlusion and clutter conditions. The method contains two stages, offline and online. At the offline stage, an object is first split into several representative parts as its local features, and for each part-object pair the local-overall transformation relation is calculated. At the online stage, by finding matches of each local feature in the input image, the local feature matching instance sets are obtained. All feature instances are then transformed into object pose predictions via the transformation relations. Finally, in a Hough-like voting process, local feature prediction agreements of different orders are obtained. In general, the higher the order of an agreement, the more confident the prediction result.

Related Work
Recognition of planar objects in 2D images is a recurrent topic in machine vision and industrial applications. There are mainly two technical routes. One is based on a global description of the object. In [1], Steger presented a recognition method via overall contour matching which is robust against occlusion, clutter and nonlinear illumination changes; this method is widely used in industrial applications for its good performance. But under highly cluttered conditions it often fails and may give false positive results. The other route focuses on local descriptions of the object. In [2], Rothwell et al. used projective invariant features of the object shape for matching and hypothesis generation; the hypotheses are then merged and verified to obtain the final results. In [3], Huang et al. defined a chord angle descriptor based on the angles between two chords for each sample point on a contour. By calculating the chord angle descriptor on both the model contour and the input contour, the distance between the two descriptors is obtained and a similarity score is calculated. In [4], Tang et al. first used the line segment detector (LSD) to extract straight lines in the input image, then connected the line segments and performed region division. Finally, they completed the object recognition by comparing the shape feature vector to that of the template. In [5], Alberto et al. also used the LSD method to detect contour line segments first. They then used each line segment and its direction as a local descriptor, and found matches between the line segments and the standard CAD model to obtain object pose hypotheses. Finally, through a modified Hough voting scheme, they obtained the several best hypotheses as the final results.

Overview of the paper
Section 2 describes the recognition method based on local feature prediction agreements and its theoretical analysis. In Section 3, experimental results and the corresponding analysis are presented. In Section 4, the potential extension of this method to 3D object recognition via 2D images is discussed.
2 Recognition method based on local feature agreements

Definition and notations
An object or workpiece to be recognized in the following is denoted as O, and all the local features O contains are F = {f_1, f_2, …, f_n}, where n is the number of its local features. Local features are the representative parts of O. The given input 2D image is denoted as I, and the matched instance set of f_i in I is written as M(f_i, I). A group of instance predictions from k different local features that agree on the same pose forms a k-order agreement A_k (2 ≤ k ≤ n); agreements are searched from high orders down to lower ones. The hypotheses existing in a high-order agreement will not be recorded repeatedly in a lower one.
With all the agreements obtained, the complete recognition result of O in I is denoted as H(O, I) = {A_2 ∨ A_3 ∨ … ∨ A_n}.

The probabilistic analysis of recognition by prediction agreements
Under the same matching technique, the method based on local feature prediction agreements gains higher confidence in object recognition than the traditional overall matching method. Given an object O with n local features, assume that in I there is a non-occluded O existing with some pose p. Applying the overall matching method, suppose the probability of O existing with p is P(O, p) = s. With the same matching technique, there should be k local feature instances detected which belong to k local features respectively, and these instances form a group G = {g_1, g_2, …, g_k} in a k-order agreement that predicts O exists with p. Suppose the probability of each instance prediction is s_i; then, considering their agreements and treating the predictions as independent, the probability of O existing with p is

P_A(O, p) = 1 − ∏_{i=1}^{k} (1 − s_i)   (1)

Using the same matching technique, the matching score or probability of a local feature should be no less than that of the overall method, i.e., s_i ≥ s. Comparing the two methods,

P_A(O, p) = 1 − ∏_{i=1}^{k} (1 − s_i) ≥ 1 − (1 − s)^k ≥ s = P(O, p)   (2)

That is, applying our method, the confidence in object recognition can be enhanced.
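The confidence gain stated in (2) can be checked numerically. The sketch below assumes the predictions combine independently, as in (1); the score values are illustrative only:

```python
def agreement_confidence(scores):
    """Combine independent local-feature prediction scores.

    Each score s_i is the probability that one local feature's pose
    prediction is correct; the combined confidence is
    1 - prod(1 - s_i): the pose fails only if every prediction fails.
    """
    p_all_wrong = 1.0
    for s in scores:
        p_all_wrong *= 1.0 - s
    return 1.0 - p_all_wrong

# Overall matching yields a single score s = 0.6; with the same
# technique, three local features each score at least 0.6.
overall = 0.6
combined = agreement_confidence([0.6, 0.7, 0.65])
print(combined > overall)  # the 3-order agreement is more confident
```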
When is occluded in , the overall method may encounter a sharp confidence loss due to the missing of some local information. For our method, the prediction agreement may degenerate to lower order, but the nonoccluded local features can still contribute a high confidence to the results.
Next, we analyze the reliability of our method; that is, we check the probability that k (k ≤ n) predictions given by local features agree on O existing with a certain pose p while O actually does not exist there. This probability is

P(A_k ∧ ¬E(O, p)) = P(A_k) · P(¬E(O, p))

where P(A_k) is the probability that the local feature instance sets form a k-order agreement by chance. Suppose the pose space is discretized into N cells and a spurious prediction falls uniformly into one of them; then P(A_k) ≈ (1/N)^{k−1}. As for P(¬E(O, p)), it can be written as 1 − P(E(O, p)), where P(E(O, p)) is the probability that O exists with the certain pose p. Given an input image I, O exists with only a limited number of poses, so for a certain p the probability that O exists there is approximately 0, i.e., P(¬E(O, p)) ≈ 1. Finally, the probability

P(A_k ∧ ¬E(O, p)) ≈ 1 / N^{k−1}

When the cell number N is set relatively large, the denominator becomes quite large and this false-agreement probability is very close to 0; thus the reliability of our method can be guaranteed.
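The falling-off of the false-agreement probability with the order k can be illustrated numerically. A minimal sketch under the uniform-cell assumption above; the cell count N = 10000 is illustrative:

```python
def false_agreement_prob(n_cells, k):
    """Probability that k independent uniform pose predictions land
    in the same one of n_cells pose cells by chance: the first
    prediction picks a cell, each remaining one must hit it,
    giving (1 / n_cells) ** (k - 1)."""
    return (1.0 / n_cells) ** (k - 1)

for k in (2, 3, 4):
    print(k, false_agreement_prob(10_000, k))
# Higher-order agreements are rapidly less likely to arise by chance.
```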

Obtaining multi-order prediction agreement based on combination numbers
After obtaining the local feature prediction sets {P(f_1), P(f_2), …, P(f_n)}, we need to check the prediction agreements between every two of them and form the multi-order prediction agreements. Different from the traditional Hough-like voting schemes [6,7], where all individual hypotheses are put together for pose clustering or voting, our method votes, i.e., forms the multi-order prediction agreements, across the separate local feature prediction sets. In this case, no agreement contains two pose predictions that belong to the same local feature. Since some local features might be easy to detect, their instance predictions may form a self-cluster in some place, which could lead to false-positive recognition; with our scheme, such misrecognition is avoided.
First, we define the prediction agreement measurement. Suppose the pose predictions of two local features f_i and f_j are p_i = (x_i, y_i, θ_i) and p_j = (x_j, y_j, θ_j). Their agreement probability is measured in a Gaussian-like form,

P(p_i = p_j) = exp(−α · ((x_i − x_j)² + (y_i − y_j)²)/σ_d² − β · (θ_i − θ_j)²/σ_θ²)   (9)

where σ_d and σ_θ are used for normalization such that the position and angle components are of the same magnitude, and α and β are parameters used to adjust different matching error acceptances.
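A sketch of such a pairwise agreement measurement; the σ_d, σ_θ, α, β defaults are placeholders to be tuned to the desired matching-error acceptance:

```python
import math

def agreement_prob(p_i, p_j, sigma_d=10.0, sigma_t=2.0, alpha=1.0, beta=1.0):
    """Pairwise pose-agreement probability between two predictions.

    p_i, p_j are (x, y, theta) pose predictions. Position and angle
    differences are normalized by sigma_d (pixels) and sigma_t
    (degrees) so both components are of the same magnitude; alpha
    and beta adjust how much matching error is accepted.
    """
    dx, dy = p_i[0] - p_j[0], p_i[1] - p_j[1]
    dtheta = p_i[2] - p_j[2]
    d2 = (dx * dx + dy * dy) / sigma_d ** 2
    t2 = dtheta ** 2 / sigma_t ** 2
    return math.exp(-alpha * d2 - beta * t2)

close = agreement_prob((100, 50, 30), (102, 49, 30.5))
far = agreement_prob((100, 50, 30), (300, 250, 120))
print(close > 0.5 > far)
```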
Setting a probability threshold and using (9), we obtain the two-order agreements A_2. Next, we discuss how to obtain the k-order agreements A_k based on A_{k−1} and combination numbers. By this method, we avoid repeating agreement measurement calculations between instance predictions when finding agreements of different orders.
Code the local features of O with indices 1~n; then for the k-order agreement there are C(n, k) combination conditions. Denote all the combinations (in ascending order) of k indices out of {1, 2, …, n} by c_k^(j), j = 1, …, C(n, k). We can then infer that to obtain an A_k with feature combination c_k^(j) is to find the prediction groups that appear in all the (k−1)-order agreements whose combination conditions are contained in c_k^(j). Based on the analysis above, the procedure to obtain the multi-order prediction agreements is given:
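The procedure above can be sketched as follows. This is a simplified illustration, assuming a pairwise agreement test agree(p, q) (e.g. the measurement (9) above a probability threshold); a k-order agreement is kept only if every pair of predictions in it, drawn from k distinct local features, agrees. Unlike the full procedure, this sketch records every order rather than suppressing groups already covered by a higher order:

```python
from itertools import combinations, product

def build_agreements(prediction_sets, agree, max_order=None):
    """Form multi-order prediction agreements across local features.

    prediction_sets: entry i is the list of pose predictions produced
    by the matched instances of local feature i.
    agree(p, q): pairwise agreement test.
    Returns {k: [tuple of k mutually agreeing predictions, ...]}.
    Predictions of the same local feature are never combined, so a
    self-clustering feature cannot vote with itself.
    """
    n = len(prediction_sets)
    max_order = max_order or n
    pair_ok = set()  # identity pairs that passed the pairwise test
    agreements = {2: []}
    for i, j in combinations(range(n), 2):
        for p, q in product(prediction_sets[i], prediction_sets[j]):
            if agree(p, q):
                agreements[2].append((p, q))
                pair_ok.add(frozenset((id(p), id(q))))
    # Grow k-order agreements: every pair inside must already agree,
    # so the pairwise measurement is never recomputed.
    for k in range(3, max_order + 1):
        agreements[k] = []
        for combo in combinations(range(n), k):  # C(n, k) conditions
            for group in product(*(prediction_sets[i] for i in combo)):
                if all(frozenset((id(a), id(b))) in pair_ok
                       for a, b in combinations(group, 2)):
                    agreements[k].append(group)
    return agreements
```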

Object pose derivation from multi-order agreement
Assume there is a group G = {p_1, p_2, …, p_k} in a certain k-order prediction agreement, in which the probability of each pose prediction p_i is s_i. We discuss the derivation of the final object pose hypothesis from two different aspects.
From the aspect of prediction probability, the pose prediction with a higher probability should naturally gain a larger weight in the final pose synthesis. Therefore, the probability-based pose hypothesis is

p_s = Σ_{i=1}^{k} s_i · p_i / Σ_{i=1}^{k} s_i

Our method is built on the prediction agreements between different local features, so intuitively a prediction pose that agrees more with the others should gain a larger synthesis weight. "More agreed" can be described by spatial aggregation. Here, we define a position weight λ_i to describe this aggregation. First, the average pose of the group is

p̄ = (1/k) Σ_{i=1}^{k} p_i

where λ_i is the parameter that describes the spatial aggregation of p_i around p̄; according to (9), we can use λ_i = P(p_i = p̄). On the basis of the position weight, the pose hypothesis is

p_λ = Σ_{i=1}^{k} λ_i · p_i / Σ_{i=1}^{k} λ_i

The final hypothesis combines the two, p* = w_s · p_s + w_λ · p_λ; in general, w_s = w_λ = 0.5. To obtain the confidence of the final hypothesis via the analysis in Section 2.2, we have to transfer the confidence of each pose prediction p_i to that at p*. The probability transfer can be implemented via (9): define the probability transfer coefficient t_i = P(p_i = p*). Then, according to (1), the confidence of the final hypothesis for O is

P(O, p*) = 1 − ∏_{i=1}^{k} (1 − t_i · s_i)   (15)
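The two weightings and the final confidence can be sketched as follows. A Gaussian-like similarity stands in for measurement (9), the 0.5/0.5 mixing weights follow the text, and the componentwise angle averaging assumes no wrap-around, for simplicity:

```python
import math

def gauss_sim(p, q, sigma_d=10.0, sigma_t=2.0):
    """Stand-in for measurement (9): pose similarity in (0, 1]."""
    d2 = ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) / sigma_d ** 2
    t2 = (p[2] - q[2]) ** 2 / sigma_t ** 2
    return math.exp(-(d2 + t2))

def synthesize_pose(poses, probs):
    """Derive the final pose hypothesis p* and its confidence from a
    k-order agreement group (poses p_i with probabilities s_i)."""
    k = len(poses)
    # Probability-weighted pose p_s.
    w = sum(probs)
    p_s = tuple(sum(s * p[d] for p, s in zip(poses, probs)) / w
                for d in range(3))
    # Position-weighted pose p_lambda: weight = similarity to mean pose.
    mean = tuple(sum(p[d] for p in poses) / k for d in range(3))
    lams = [gauss_sim(p, mean) for p in poses]
    lw = sum(lams)
    p_lam = tuple(sum(l * p[d] for p, l in zip(poses, lams)) / lw
                  for d in range(3))
    # Final hypothesis: equal mix of the two syntheses.
    p_star = tuple(0.5 * a + 0.5 * b for a, b in zip(p_s, p_lam))
    # Confidence: transfer each s_i to p_star, combine as in (1)/(15).
    fail = 1.0
    for p, s in zip(poses, probs):
        fail *= 1.0 - gauss_sim(p, p_star) * s
    return p_star, 1.0 - fail

pose, conf = synthesize_pose([(100, 50, 30), (102, 49, 30.5), (99, 51, 29.8)],
                             [0.9, 0.8, 0.85])
```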

Experiment
To evaluate our method, we conduct an experiment on the Halcon 12.0 and Matlab R2016b platforms to identify the planar objects randomly piled up in Figure 1, and we compare the results of our method with those of the traditional overall contour matching. Scale changes are not considered in the experiment, because the planar object is very thin and the distance change from objects in different stack layers to the 2D camera can be neglected.

Offline Stage
At the offline stage, we make the object template for recognition (Figure 2). The object parts marked with rectangles are the chosen local features. The arrow in Figure 2 is the reference set to represent the whole object, and the relative position between each local feature and the reference arrow gives the local-overall transformations.

Online Stage
At the online stage, given an input image, an edge detection algorithm is first performed to extract object edges. However, due to the similar surface texture and color, the gradient where objects overlap has a very low magnitude, and standard edge detectors such as the Canny operator [9] tend to fail there, causing missing edges. To overcome this problem, we first apply the LSD method [10] to extract edge segments (Figure 3) and then superimpose them on the original image to enhance the object edges.

Figure 3. Edge segments extracted by LSD
We use the "Matching Assistant" in Halcon 12.0 to find local feature instances in the input image via its contour matching function. The matching results are given in Figure 4. As can be seen, many false positive instances are present, while several true feature instances are missing. The false positive instances can easily be rejected because no strong agreements are formed with them. As for the missing true instances, as long as not all local features of an object instance are inaccessible, we can still identify it through the predictions of its remaining local features via our agreement scheme. To visualize the prediction agreement scheme, we draw all object pose predictions as arrows in Figure 5, using different colors to distinguish predictions from different local features. An arrow's tail indicates the predicted position, its direction indicates the predicted angle, and its length represents the confidence with which the prediction holds. We can see that the arrows belonging to false positive instances appear randomly scattered, whereas those of true positive instances gather together with highly consistent directions. According to the algorithm in Section 2.3, we obtain eight prediction agreements, marked with black dashed circles in Figure 5. Following Section 2.4, eight object pose hypotheses are derived from these agreements; the results are shown in Figure 6(a). We also calculate the probabilities that these hypotheses hold. In the experiment, we set the similarity probability to 0.9 for two positions with a 10-pixel difference in both dimensions or two angles with a 2-degree difference; based on that, the parameters in (9) are determined. Then, according to (15), all the resulting probabilities are greater than 0.99.
We also use the "Matching Assistant" in Halcon 12.0 to implement the traditional overall matching experiment. The same edge-enhanced image is fed as its input and the results are shown in Figure 6(b). The matching parameters are set as recommended in Halcon and the lowest matching score is set to 0.5. Only three matches are found, with probabilities 0.53, 0.54 and 0.57 respectively, far lower than those of our method. Comparatively speaking, our method gives more matches with higher confidence under the same complex working conditions.

Conclusion and discussion
We have presented a simple and reliable method for planar object recognition. The method focuses on the local information of an object, so it can handle conditions with incomplete object information such as occlusion. The matching hypothesis is given through the consistent predictions of different local features of an object, so highly confident results are obtained even in cluttered situations. The idea of obtaining hypotheses through the prediction agreements of local features can also be generalized to 3D object recognition: choose different feature surfaces of an object, find their matches in 3D point data, and then, with our method, the 3D object pose hypothesis can likewise be obtained.