Orthogonal Discriminant Diversity and Similarity Preserving Projection for Face Recognition

Feature extraction is a crucial step in face recognition. In this paper, building on supervised local structure and diversity projection (SLSDP), a new feature extraction method called orthogonal discriminant diversity and similarity preserving projection (ODDSPP) is proposed for face recognition. ODDSPP defines two parameterless weighting matrices by taking into account the class label information and the local structure. Thus ODDSPP can exploit both the diversity and the similarity information of the data simultaneously for dimensionality reduction. Moreover, the proposed algorithm is able to extract orthogonal discriminant vectors in the feature space and does not suffer from the small sample size problem, which is desirable for many pattern analysis applications. Experimental results on the ORL and AR databases show the effectiveness of the proposed method.


Introduction
Over the past few decades, a large number of dimensionality reduction methods for face recognition have been proposed. The most well-known dimensionality reduction methods are principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2]. Recent studies have shown that high-dimensional data often reside on a nonlinear submanifold. However, both PCA and LDA fail to discover the underlying manifold structure, since they aim only to preserve the global structure of the samples. The most representative manifold learning methods include Laplacian Eigenmaps (LE) [3], Isomap [4], and locally linear embedding (LLE) [5]. These nonlinear methods yield impressive results on some benchmark artificial datasets. However, it is difficult for them to compute low-dimensional embeddings of new test samples. Locality preserving projection (LPP) [6] is the linear approximation of LE and can compute the low-dimensional embeddings of new test points. Motivated by LPP, many local discriminant approaches [7-11] have been developed for image classification, among which the most prevalent ones are marginal Fisher analysis (MFA) [7], locality preserving discriminant projection (LPDP) [8], local Fisher discriminant analysis (LFDA) [9], local discriminant embedding (LDE) [10], and locality sensitive discriminant analysis (LSDA) [11]. These methods are simple to compute and easy to apply. However, one common limitation of the above manifold-based learning algorithms is that they focus only on the local similarity information and neglect the diversity information of the samples. Discarding the diversity information may give rise to the overlearning problem. In order to overcome the overlearning problem, Gao et al. took the diversity information between points into account and proposed supervised local structure and diversity projection (SLSDP) [12]. SLSDP, which explicitly considers the variation of values among nearby data points, successfully overcomes the overlearning problem. However, SLSDP still has some limitations. Firstly, its diversity matrix is constructed without the guidance of the class label information, which is very important for recognition problems. Secondly, the neighborhood size affects the learning of the manifold and is usually set in advance to the same value for all samples. Finally, similar to LDA, SLSDP suffers from the small sample size (SSS) problem when dealing with high-dimensional recognition tasks such as face recognition.
In this paper, we present a new method called orthogonal discriminant diversity and similarity preserving projection (ODDSPP), which directly overcomes the SSS problem and at the same time derives all the orthogonal optimal discriminant vectors. The rest of this paper is organized as follows. Section 2 briefly reviews the related work on SLSDP. In Section 3, the ODDSPP dimensionality reduction method is presented. In Section 4, the performance of ODDSPP is experimentally evaluated on the ORL and AR databases. Finally, a brief conclusion is offered in Section 5.

Related work on SLSDP
The aim of SLSDP is to find an optimal projection that preserves the similarity and the diversity information at the same time. Let S denote the local weighted matrix whose elements characterize the similarity of two close data points with the same label, and let B denote the local weighted matrix whose elements characterize the diversity of two close data points.
The elements of the local similarity weighted matrix S are defined as follows:

S_ij = exp(-||x_i - x_j||^2 / t), if x_j is among the k nearest neighbors of x_i (or vice versa) and l_i = l_j; S_ij = 0 otherwise, (1)

where l_i and l_j denote the class labels of the data points x_i and x_j, respectively, and t ∈ (0, ∞) is a suitable constant. The elements of the diversity weighted matrix B are defined as follows:

B_ij = 1 - exp(-||x_i - x_j||^2 / t), if x_j is among the k nearest neighbors of x_i (or vice versa); B_ij = 0 otherwise, (2)

where B_ij measures the contribution of x_i relative to x_j to the diversity information.
Using the similarity matrix S, the local similarity scatter S_L can be expressed as:

S_L = (1/2) Σ_ij (y_i - y_j)^2 S_ij = w^T X (D - S) X^T w, (3)

where D is a diagonal matrix whose diagonal elements are the column sums of S, i.e. D_ii = Σ_j S_ij. Using the local diversity weighted matrix B, the local diversity scatter S_D can be expressed as:

S_D = (1/2) Σ_ij (y_i - y_j)^2 B_ij = w^T X (Q - B) X^T w, (4)

where Q is a diagonal matrix whose diagonal elements are the column sums of B, i.e. Q_ii = Σ_j B_ij. The objective function of SLSDP can then be expressed as follows:

max_w [w^T X (Q - B) X^T w] / [w^T X (D - S) X^T w]. (5)
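For concreteness, the constructions above can be sketched in NumPy. The k-nearest-neighbor graph and heat-kernel weights are the standard choices in this family of methods; the exact weight forms shown here are assumptions for illustration, not verbatim from the SLSDP paper.

```python
import numpy as np

def slsdp_scatters(X, labels, k=5, t=1.0):
    """Local similarity/diversity scatters in the style of SLSDP.

    X: (d, N) data matrix, one sample per column; labels: (N,) class labels.
    k and t (neighborhood size, kernel width) must be chosen in advance,
    which is exactly the limitation ODDSPP later removes.
    """
    _, N = X.shape
    sq = np.sum(X**2, axis=0)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X.T @ X, 0.0)
    np.fill_diagonal(D2, np.inf)                 # exclude self-neighbors
    idx = np.argsort(D2, axis=1)[:, :k]          # k nearest neighbors per point
    A = np.zeros((N, N), dtype=bool)
    A[np.repeat(np.arange(N), k), idx.ravel()] = True
    A |= A.T                                     # symmetrize the graph
    np.fill_diagonal(D2, 0.0)
    heat = np.exp(-D2 / t)
    same = labels[:, None] == labels[None, :]
    S = np.where(A & same, heat, 0.0)            # similarity: close + same label
    B = np.where(A, 1.0 - heat, 0.0)             # diversity among neighbors
    Dm = np.diag(S.sum(axis=1))                  # D_ii = sum_j S_ij
    Q = np.diag(B.sum(axis=1))                   # Q_ii = sum_j B_ij
    S_L = X @ (Dm - S) @ X.T                     # local similarity scatter
    S_D = X @ (Q - B) @ X.T                      # local diversity scatter
    return S_L, S_D
```

Both scatters come out symmetric positive semi-definite, since D - S and Q - B are graph Laplacians of non-negative weights.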

Orthogonal Discriminant Diversity and Similarity Preserving Projection
In this section, we introduce the ODDSPP method, which preserves both the similarity and the diversity information and at the same time derives all the optimal orthogonal discriminant vectors in the low-dimensional space. We first redefine the discriminating similarity matrix S and the discriminating diversity matrix B, respectively. The elements of the discriminating similarity matrix S are defined as follows:

S_ij = exp(-||x_i - x_j||^2 / (d_i d_j)), if l_i = l_j; S_ij = 0 otherwise, (6)

where d_i denotes the average distance between x_i and the samples with the same label, so that S_ij is nonzero only when x_i and x_j share the same label. The elements of the discriminating diversity matrix B are defined as follows:

B_ij = 1 - exp(-||x_i - x_j||^2 / (d_i d_j)), if l_i = l_j; B_ij = exp(-||x_i - x_j||^2 / d^2), if l_i ≠ l_j, (7)

where d_i denotes the average distance between samples of the same label and d denotes the average distance between all samples.
From Eq. 6 and Eq. 7, it is clear that the discriminating similarity matrix and the discriminating diversity matrix integrate both the local structure and the class label information. Moreover, the size of the local structure is determined by the data themselves, so no neighborhood parameter needs to be set, which is favorable for face recognition.
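Since d_i and d are computed from the data, no neighborhood size or kernel width has to be tuned. A minimal NumPy sketch of such parameter-free weights follows; the precise functional forms are an assumption based on the description above, not necessarily the paper's exact equations.

```python
import numpy as np

def discriminating_weights(X, labels):
    """Parameter-free similarity/diversity weights in the spirit of ODDSPP.

    The kernel widths are data-driven: d_i is the average distance from
    x_i to the samples sharing its label, and d is the average distance
    over all sample pairs, so no neighborhood size has to be chosen.
    (Sketch only; the paper's exact functional form may differ.)
    """
    N = X.shape[1]                                   # samples are columns
    diff = X[:, :, None] - X[:, None, :]
    dist = np.linalg.norm(diff, axis=0)              # (N, N) pairwise distances
    same = labels[:, None] == labels[None, :]
    off = ~np.eye(N, dtype=bool)
    d_i = np.array([dist[i, same[i] & off[i]].mean() for i in range(N)])
    d = dist[off].mean()
    within = np.exp(-dist**2 / (d_i[:, None] * d_i[None, :]))
    S = np.where(same & off, within, 0.0)            # similarity: same label only
    B = np.where(same & off, 1.0 - within, 0.0) \
        + np.where(~same, np.exp(-dist**2 / d**2), 0.0)  # diversity weights
    return S, B
```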
Then, using the discriminating similarity matrix S and the discriminating diversity matrix B, the local similarity scatter S_L and the local diversity scatter S_D can be expressed as:

S_L = X (D - S) X^T, where D_ii = Σ_j S_ij, (8)

S_D = X (Q - B) X^T, where Q_ii = Σ_j B_ij, (9)

and the objective function of ODDSPP can first be written as:

max_W tr(W^T S_D W) / tr(W^T S_L W). (10)

In order to avoid the SSS problem, the objective function of ODDSPP can be rewritten in difference form as:

min_W tr(W^T (S_L - S_D) W), s.t. W^T W = I. (11)

Since the dimensionality of the samples is very high, we first need to compute the eigenvectors V = [v_1, ..., v_r] of the total-scatter matrix S_t corresponding to its first r non-zero eigenvalues. The total-scatter matrix S_t is defined as:

S_t = X (I - M) X^T, (12)

where I is the N×N identity matrix and M is an N×N matrix with all entries equal to 1/N.

Denote W = VA, where A is to be determined. Then Eq. 11 can be rewritten as:

min_A tr(A^T V^T (S_L - S_D) V A), s.t. A^T A = I. (13)

Finally, we apply the method of Lagrange multipliers to Eq. 13 and set the derivative with respect to A to zero. The projection vectors a that minimize Eq. 13 are given by the minimum-eigenvalue solutions of the eigenvalue problem

V^T (S_L - S_D) V a = λ a. (14)

The procedure for computing ODDSPP can be summarized in the following steps:

Step 1: Construct the discriminating similarity matrix S and the discriminating diversity matrix B according to Eq. 6 and Eq. 7.
Step 2: Calculate the discriminating similarity scatter S_L and the discriminating diversity scatter S_D according to Eq. 8 and Eq. 9.
Step 3: Compute S_t according to Eq. 12 and perform its eigenvalue decomposition to obtain the basis vectors V = [v_1, ..., v_r] corresponding to the first r non-zero eigenvalues. Then solve Eq. 14 by eigenvalue decomposition and retain the eigenvectors A = [a_1, ..., a_k] corresponding to the k smallest eigenvalues. Finally, we obtain the optimal orthogonal transformation matrix W = VA.
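The three steps above can be sketched end-to-end in NumPy. The weight construction repeats the assumed parameter-free forms, and selecting the smallest eigenvalues implements the difference criterion; this is an illustrative sketch under those assumptions, not the authors' implementation.

```python
import numpy as np

def oddspp(X, labels, n_components):
    """Sketch of the ODDSPP procedure (Steps 1-3).

    X: (d, N) data matrix, one sample per column; returns W = V @ A,
    whose columns are orthonormal discriminant directions.
    """
    d, N = X.shape
    # --- Step 1: parameter-free discriminating weights (assumed forms) ---
    dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    same = labels[:, None] == labels[None, :]
    off = ~np.eye(N, dtype=bool)
    d_i = np.array([dist[i, same[i] & off[i]].mean() for i in range(N)])
    d_all = dist[off].mean()
    within = np.exp(-dist**2 / (d_i[:, None] * d_i[None, :]))
    S = np.where(same & off, within, 0.0)
    B = np.where(same & off, 1.0 - within, 0.0) \
        + np.where(~same, np.exp(-dist**2 / d_all**2), 0.0)
    # --- Step 2: similarity and diversity scatters ---
    S_L = X @ (np.diag(S.sum(1)) - S) @ X.T
    S_D = X @ (np.diag(B.sum(1)) - B) @ X.T
    # --- Step 3: restrict to the range of the total scatter S_t ---
    M = np.full((N, N), 1.0 / N)
    S_t = X @ (np.eye(N) - M) @ X.T
    evals, evecs = np.linalg.eigh(S_t)
    V = evecs[:, evals > 1e-10 * evals.max()]    # basis of range(S_t)
    # reduced difference criterion; eigenvectors for the smallest eigenvalues
    C = V.T @ (S_L - S_D) @ V
    _, A = np.linalg.eigh(C)                     # eigh returns ascending order
    A = A[:, :n_components]                      # keep the k smallest
    return V @ A                                 # orthogonal transformation W
```

Because V and A both have orthonormal columns, W^T W = A^T V^T V A = I, so the extracted discriminant vectors are orthogonal by construction.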

Experimental Results
In order to test the performance of the proposed method, the ORL and AR databases are used. The ORL face database is composed of 40 distinct subjects. Each subject has 10 images taken under different expressions and views. Fig. 1 shows the sample images of one person in the ORL database. The AR face database contains over 4000 color face images of 126 people, including frontal views with different facial expressions, lighting conditions, and occlusions. In the experiments, we use a subset of the AR database containing 100 individuals with 10 images per person. Fig. 2 shows the sample images of one person in the AR database. In the experiments, the nearest-distance classifier is adopted. All methods are compared on the same training and testing sets. In each round, k (3-7) images per subject are randomly selected for training and the remaining images are used for testing. For each k, 10 tests are performed and the results are averaged. The recognition rates of ODDSPP, MFA and SLSDP are listed in Tables 1 and 2.
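The evaluation protocol described above can be sketched as follows; reading "randomly selected" as selecting the training images per subject is an assumption, as is the use of Euclidean distance in the nearest-distance classifier.

```python
import numpy as np

def recognition_rate(features, labels, k_train, rng):
    """One round of the protocol: k images per subject for training,
    the rest for testing, nearest-distance (1-NN) classification.

    features: (N, dim) array of projected samples, one per row.
    """
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        train_idx.extend(idx[:k_train])          # k images per subject
        test_idx.extend(idx[k_train:])           # remainder for testing
    Xtr, ytr = features[train_idx], labels[train_idx]
    Xte, yte = features[test_idx], labels[test_idx]
    # nearest-distance classifier: assign the label of the closest training sample
    dists = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    pred = ytr[np.argmin(dists, axis=1)]
    return np.mean(pred == yte)
```

Averaging this over 10 random splits for each k reproduces the reported protocol.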

From Tables 1 and 2, we find that ODDSPP is the most effective dimensionality reduction method, performing considerably better than SLSDP and MFA. One reason may be that ODDSPP not only overcomes the small sample size problem but also extracts orthogonal features, which is well suited to face recognition. Another reason may be that our method handles the similarity and the diversity information simultaneously. Furthermore, its strong structure-preserving and discriminating ability makes the proposed method more suitable for recognition tasks.

Conclusions
In this paper, we presented a novel algorithm based on SLSDP, namely ODDSPP, for dimensionality reduction. Two contributions were made in this paper.
(1) By combining local structure information with class label information, ODDSPP redefines the weighting matrices without free parameters; it can thus preserve the main geometric structure of the data and has more discriminating power.
(2) Using a difference-based optimization objective function, the optimal transformation matrix can be computed by solving an eigen-equation. Thus ODDSPP does not suffer from the SSS problem. These merits make our method more robust and suitable for classification tasks. In future research, we will focus on incremental learning methods; an incrementally updated feature set could be exploited by manifold learning to improve recognition performance in real-time classification.


Table 1. The comparison of recognition rates (%)

Table 2. The comparison of recognition rates (%)