Preserving Global and Local Structures for Supervised Dimensionality Reduction

In this paper, we develop a new approach for dimensionality reduction of labeled data. This approach integrates both global and local structures of data into a new objective, and we show that the objective can be optimized by solving an eigenvalue problem. Testing results on benchmark data sets show that this new approach can effectively capture both the crucial global and local structures of data and thus leads to more accurate results for dimensionality reduction than existing approaches.


Introduction
Dimensionality reduction has been extensively used in areas including computer vision, data mining, machine learning, and pattern recognition [5,9]. In general, most dimensionality reduction techniques map the high-dimensional data points in a data set into a set of low-dimensional data points while preserving the features that are most important for recognition and classification.
So far, a large number of approaches have been developed for dimensionality reduction. For example, Principal Component Analysis (PCA) [8] is a traditional unsupervised dimensionality reduction technique. It reduces the dimensionality of data by seeking linear projections that maximize the global variance of the projected data points. PCA can thus preserve the maximum amount of the global information of a data set. However, PCA may not be ideal for classification and thus is seldom used on labeled data.
Linear Discriminant Analysis (LDA) is a classical supervised dimensionality reduction technique [7,8]. To maximize the extent to which data points in different classes are separated from one another in the space of reduced dimensionality, LDA computes the directions along which the ratio of the between-class distance to the within-class distance is maximized. LDA has been extensively used for dimensionality reduction in a variety of applications, such as microarray data analysis and face recognition [1,6]. However, LDA is only able to capture the global geometric structure of a data set; the local geometric structure might be lost after the dimensionality of the data is reduced [3].
Recently, research has shown that local geometric structure is an important feature of a data set and may affect the recognition accuracy of certain classifiers [2,3,4]. A variety of approaches have thus been developed to reduce the dimensionality of a data set while preserving its local geometric structure. For example, Laplacian Eigenmaps (LE) [2] and Locality Preserving Projection (LPP) [10] reduce the dimensionality by minimizing an objective defined in terms of the graph Laplacian matrix. Locally Linear Embedding (LLE) [11] models the local geometric structure with linear dependence, and data points are mapped into a low-dimensional manifold while preserving the dependence with minimum error. These dimensionality reduction techniques preserve the local geometric structure of a data set. However, important global features might be lost during the process of dimensionality reduction.
Since global and local structures are both important for recognition and classification, a method that can reduce the dimensionality of a data set while preserving both its global and local structures is highly desirable. In this paper, we develop a new approach that considers both of them while reducing the dimensionality of a data set. Our approach develops a new objective that includes both the global and local structure features of a data set. The optimal direction for projection is thus the optimal solution of the objective. We show that the objective can be optimized by solving an eigenvalue problem. We evaluate the effectiveness of this new approach with six benchmark data sets and compare it with both LDA and LLE. Our results show that this new approach outperforms both LDA and LLE on all benchmark data sets. Testing results also suggest that combining both global and local structures retains more important features in a data set and thus is promising for improving the effectiveness of dimensionality reduction.

Linear Discriminant Analysis
LDA is a supervised dimensionality reduction technique. It reduces the dimensionality of a data set by projecting its data points onto directions that maximize the between-class distance while minimizing the within-class distance. Consider an $l$-dimensional data set $D = \{d_1, \ldots, d_n\}$ whose data points are partitioned into $k$ classes $C_1, \ldots, C_k$, where class $C_i$ contains $n_i$ data points.
Two scatter matrices are defined as follows, where the second summation in equation (2) is over all data points in class $C_i$. The objective of LDA is to maximize the ratio of the between-class distance to the within-class distance after the transformation is applied to the data points in $D$, which amounts to solving the following optimization problem.
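For orientation, the textbook forms of the two scatter matrices and a commonly used trace-ratio objective can be written as below. The symbols $m_i$ (mean of class $C_i$), $m$ (mean of all data points), and the trace-ratio form itself are assumptions of this sketch; the normalization used in equations (1), (2), and (4) may differ.

```latex
S_b = \sum_{i=1}^{k} n_i \,(m_i - m)(m_i - m)^{T},
\qquad
S_w = \sum_{i=1}^{k} \sum_{d \in C_i} (d - m_i)(d - m_i)^{T},
\qquad
\max_{W} \; \frac{\operatorname{tr}\!\left(W S_b W^{T}\right)}{\operatorname{tr}\!\left(W S_w W^{T}\right)}
```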
It has been shown that, in cases where $S_w$ is nonsingular, the objective in equation (4) can be optimized by solving the following eigenvalue problem [8].
$W$ must satisfy the constraint described in equation (7).
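The LDA computation described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the experiments: the function name `lda_directions`, the row-per-point data layout, and the textbook definitions of the scatter matrices are all assumptions, and $S_w$ is assumed to be nonsingular.

```python
import numpy as np

def lda_directions(X, y, r):
    """Return an (r, l) matrix whose rows are LDA projection directions.

    X is an (n, l) array with one data point per row; y holds n class labels.
    Hypothetical helper: textbook scatter matrices, S_w assumed nonsingular.
    """
    l = X.shape[1]
    mean_all = X.mean(axis=0)
    S_b = np.zeros((l, l))  # between-class scatter
    S_w = np.zeros((l, l))  # within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
        centered = Xc - mean_c
        S_w += centered.T @ centered
    # Eigenvectors of S_w^{-1} S_b; the eigenvalues are real in exact
    # arithmetic, so any tiny imaginary parts are discarded.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1][:r]
    return eigvecs[:, order].real.T
```

A data set `X` would then be reduced with `X @ lda_directions(X, y, r).T`. At most $k - 1$ directions carry discriminative information, so `r` would normally not exceed that value.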

Locally Linear Embedding
The weights can be computed by minimizing the total error due to this approximation under the constraint described in equation (7); the total error is shown in equation (8).
The values of the weights can then be computed from a set of $k$ linear equations, as shown in equation (9), where $C_{ij}$ is the entry in the $i$-th row and $j$-th column of the covariance matrix $C$ of the data set. After solving the set of linear equations, the solutions can be normalized to satisfy the constraint described in equation (7).
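The weight computation can be sketched as follows, using the standard LLE formulation from [11] as an assumption about the form of equation (9): for each data point, a $k \times k$ system built from its $k$ nearest neighbors is solved and the solution is rescaled to sum to one. The function name `lle_weights` and the regularization term `reg` are hypothetical additions of this sketch; the regularization keeps the system solvable when $k$ exceeds the dimensionality.

```python
import numpy as np

def lle_weights(X, k, reg=1e-3):
    """Return an (n, n) matrix of reconstruction weights whose rows sum to one.

    X is an (n, l) array with one data point per row. Hypothetical helper
    following the standard LLE recipe; `reg` is an assumed regularizer.
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    # Pairwise squared distances; the diagonal is masked so that a point
    # is never selected as its own neighbor.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        Z = X[nbrs] - X[i]                      # neighbors centered on point i
        C = Z @ Z.T                             # local k x k matrix
        C += reg * np.trace(C) * np.eye(k)      # assumed regularization
        w = np.linalg.solve(C, np.ones(k))      # the k linear equations
        W[i, nbrs] = w / w.sum()                # normalize to sum to one
    return W
```

The sketch assumes that no data point coincides with all of its neighbors, since the local matrix would then be zero and the system singular.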

The right-hand side of equation (10) should be minimized subject to the constraint shown in equation (11),
where the matrix $M$ can be computed from the weights as follows.
The low-dimensional data points can then be determined from the eigenvectors that correspond to the $r$ lowest eigenvalues of $M$ [11].
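The embedding step can be sketched as follows. The form $M = (I - W)^T (I - W)$ is the one used in standard LLE [11] and is an assumption about the equation for $M$ here; the function name `lle_embedding` is hypothetical. Standard LLE also discards the eigenvector of the smallest eigenvalue, which is constant when every row of the weight matrix sums to one, and the sketch follows that convention.

```python
import numpy as np

def lle_embedding(W, r):
    """Return an (n, r) array of low-dimensional coordinates.

    W is an (n, n) weight matrix whose rows sum to one. Hypothetical helper
    that assumes M = (I - W)^T (I - W), as in standard LLE.
    """
    n = W.shape[0]
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(M)
    # Skip the first (constant) eigenvector and keep the next r.
    return eigvecs[:, 1:r + 1]
```

Each row of the returned array holds the coordinates of one data point in the reduced space.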

The New Objective
We assume that the mapped data points are generated by applying a linear transformation $T$ to the original data set $D$. It is not difficult to see that the right-hand side of equation (18) can be optimized by solving the following eigenvalue problem.
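One plausible reading of this eigenvalue problem is sketched below, strictly as an assumption. It supposes that the objective is a ratio whose numerator is the between-class scatter and whose denominator is a weighted sum of the within-class scatter $S_w$ and a symmetric matrix $Q$ that measures the local reconstruction error, with weights $\lambda_1$ and $\lambda_2$. The function name `combined_directions`, this exact form of the objective, and the way $Q$ enters it are not taken from the equations of the paper.

```python
import numpy as np

def combined_directions(S_b, S_w, Q, lam1, lam2, r):
    """Return an (r, l) matrix of projection directions.

    Assumed objective: maximize the between-class scatter relative to
    lam1 * S_w + lam2 * Q. All inputs are (l, l) symmetric matrices and
    the weighted sum is assumed to be nonsingular.
    """
    A = lam1 * S_w + lam2 * Q  # assumed combined denominator matrix
    # Eigenvectors of A^{-1} S_b, ordered by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(A, S_b))
    order = np.argsort(eigvals.real)[::-1][:r]
    return eigvecs[:, order].real.T
```

Under this reading, setting `lam2` to zero and `lam1` to one recovers the LDA eigenvalue problem, which is why the same solver structure applies.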

Testing Results
We have implemented this new dimensionality reduction approach and evaluated its performance with six benchmark data sets. The effectiveness of a dimensionality reduction result is evaluated based on the classification accuracy obtained on it. In our experiments, the constants $\lambda_1$ and $\lambda_2$ are computed as follows.
In addition, for each data point, its five nearest neighbors are used to compute the matrix $Q$.
Table 1 shows the information on the benchmark data sets we have used in our experiments. These data sets include image documents, such as USPS, and text documents, such as 20Newsgroups. The remaining data sets, namely Satimage, Waveform, Soybean, and Letter, are from the UCI Machine Learning Repository. We randomly partition each data set into a training set and a test set. The sizes of both the training and test sets are also listed in Table 1.
We then compare the effectiveness of our approach with that of LDA and LLE, two approaches that consider only global or only local structures in dimensionality reduction. The effectiveness of an approach is evaluated by the classification accuracy measured with the Nearest-Neighbor (NN) approach. To estimate the classification accuracy more reliably, we repeat the partition of each data set 50 times and compute the average accuracy and the standard deviation. Table 2 shows the average accuracy and the standard deviation obtained with our approach, LDA, and LLE, respectively, on all six benchmark data sets.
To evaluate the computational efficiency of our approach, we measure the computation time needed by our approach on data sets of different sizes. Specifically, we select a few candidate data sets of different sizes from 20Newsgroups, which has the largest dimensionality of all six benchmark data sets. We then measure the computation time needed by our approach on these candidate data sets. Table 3 shows the computation time (in seconds) needed by our approach on candidate data sets of different sizes. It is clear from the table that our approach is able to efficiently process high-dimensional data.
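The evaluation protocol described above (random partition, nearest-neighbor classification in the reduced space, repetition over 50 partitions) can be sketched as follows. The function names, the `reduce_fn` callback that returns a transformation matrix, and the choice to learn the transformation from the training set only are assumptions of this sketch.

```python
import numpy as np

def nn_accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test points whose nearest training point has the same label."""
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    pred = train_y[np.argmin(d2, axis=1)]
    return float(np.mean(pred == test_y))

def repeated_evaluation(X, y, reduce_fn, n_train, repeats=50, seed=0):
    """Return the mean and standard deviation of NN accuracy over random partitions.

    reduce_fn(train_X, train_y) is a hypothetical callback that returns an
    (r, l) transformation matrix learned from the training data only.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(repeats):
        perm = rng.permutation(len(X))
        tr, te = perm[:n_train], perm[n_train:]
        T = reduce_fn(X[tr], y[tr])
        scores.append(nn_accuracy(X[tr] @ T.T, y[tr], X[te] @ T.T, y[te]))
    return float(np.mean(scores)), float(np.std(scores))
```

Any of the dimensionality reduction methods compared in this section could be passed as `reduce_fn`, provided it is wrapped to return a transformation matrix with one projection direction per row.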

Conclusions
In this paper, we develop a new approach for supervised dimensionality reduction. Our approach considers both the global and local geometric structures of a data set and develops a new objective that includes contributions from both of them. We show that this new objective can be optimized by solving an eigenvalue problem. Our testing results show that our approach outperforms both LDA and LLE, which consider only global or only local structures of data, on six benchmark data sets. Currently, the parameters used in the objective are estimated from the traces of two matrices, which may not be optimal for the effectiveness of dimensionality reduction. Future work will focus on the development of approaches to determining the optimal parameters. In addition, this new approach can probably be combined with our previous work [12][13][14] to solve a few important bioinformatics problems.

The $r$ rows of $W$ are the eigenvectors of $S_w^{-1} S_b$ that correspond to its $r$ largest eigenvalues. Since $S_b$ has rank at most $k - 1$, the dimensionality of the data points generated by LDA is thus at most $k - 1$.

By applying a linear transformation $T$ to the original data set $D$, the relationship between $y_i$ and $d_i$ can be written as follows.
To guarantee that both the global and local structures are preserved to some extent, we propose to find a linear transformation $T$ that maximizes the between-class distance while minimizing both the within-class distance and the error that arises from the transformation. Here $\lambda_1$ and $\lambda_2$ are positive constants that determine the relative weights of the within-class distance and the error that arises from the transformation.
Similar to the approximation used in LDA, we can approximate ... It is not difficult to see that the denominator in equation (...) ... The computed relative weights ...

Table 1. Information on the benchmark data sets.

Table 2. The average classification accuracy and standard deviation (in percent) of our approach, LDA, and LLE.
It can be seen clearly from Table 2 that our approach outperforms both LDA and LLE on all six benchmark data sets, while LDA outperforms LLE on 20Newsgroups, Waveform, Soybean, and USPS, and LLE outperforms LDA on Satimage and Letter. The results suggest that for Satimage and Letter, local structures are more important than global ones, while global structures are more important than local ones for the remaining four data sets. Table 2 also suggests that preserving both global and local structures while reducing the dimensionality can effectively improve the classification accuracy.

Table 3. Computation time (in seconds) of our approach on candidate data sets of different sizes from 20Newsgroups.