Feature selection based on PCA and an optimized LMC

Abstract
In this article, we propose an optimization algorithm for the original Large Margin Classifier (LMC) [1]. We use Principal Component Analysis (PCA) [2] to reduce the dimensionality of the images, and then feed the reduced data into the optimized LMC for feature selection [3], which yields the few most discriminative features. We use these features to classify the images. Experiments show that, at the same dimensionality, the optimized LMC is more accurate than the original LMC, and in many cases its accuracy after taking 6 feature vectors already exceeds the best accuracy of the original LMC.


Introduction
With the improvement of computing power, machine learning has made great breakthroughs in recent years. Many classification models are built on it, such as Faster R-CNN, YOLO, SSD, and ResNet, and there are also many classification algorithms such as the Support Vector Machine (SVM) [4], the Decision Tree (DT) [5], and Logistic Regression (LR) [6]. In this paper, the algorithm we use is a simplified version of the SVM named LMC, and we optimize the original LMC based on PCA to perform feature selection. The original LMC simply iterates through the feature vectors in descending order of their energy after PCA dimensionality reduction. Although the first feature vector carries the most energy, its discriminability is not necessarily the best. Our optimization starts from this observation.

Principal component analysis
PCA is the most widely used data dimensionality reduction algorithm. Its main idea is to map $n$-dimensional features to $k$-dimensional features ($k < n$); the $k$ new features are mutually orthogonal and are called principal components. In practice, we compute the covariance matrix of the data matrix to obtain its eigenvectors, and then keep the matrix formed by the eigenvectors with the largest eigenvalues. Projecting the data onto this matrix achieves the dimensionality reduction.
The main advantage of PCA is that, for a given number of dimensions, it retains the original data information to the greatest extent.
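As a concrete sketch, the procedure above can be written in a few lines of numpy (an illustrative implementation, not tied to any particular library's PCA):

```python
import numpy as np

def pca(X, k):
    """Map n-dimensional rows of X onto the k leading principal components."""
    mu = X.mean(axis=0)
    Xc = X - mu                               # center the data
    C = np.cov(Xc, rowvar=False)              # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # largest eigenvalue (energy) first
    W = eigvecs[:, order[:k]]                 # n x k projection matrix
    return Xc @ W, W, mu

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                # 100 samples, 10 dimensions
Z, W, mu = pca(X, 3)                          # reduced to 3 dimensions
```

The columns of `W` are orthonormal, so the projection preserves as much variance as any $k$-dimensional linear map can.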

Large margin classifier
LMC uses the concept of the convex hull, but the convex-hull condition is too harsh, so in this paper we use the affine hull instead. The convex hull of the sample points $\{x_k\}$ can be written as
$$H_{\mathrm{conv}} = \Big\{ x = \sum_k \alpha_k x_k \;:\; \sum_k \alpha_k = 1,\; 0 \le \alpha_k \le 1 \Big\} \quad (1)$$
With the affine hull, the bound constraint on $\alpha_k$ is dropped:
$$H_{\mathrm{aff}} = \Big\{ x = \sum_k \alpha_k x_k \;:\; \sum_k \alpha_k = 1 \Big\} \quad (2)$$
This change relaxes the requirements on $\alpha_k$; the combination only needs to be affine. If the two hulls overlap, the classification loses its meaning, so we assume a positive class ($+$) and a negative class ($-$) under the premise of linear separability. Each hull can be parameterized separately as
$$H_{+} = \{ x = \mu_{+} + U_{+} v_{+} \}, \qquad H_{-} = \{ x = \mu_{-} + U_{-} v_{-} \} \quad (3),(4)$$
In these formulas, $U$ is the matrix of feature vectors (eigenvectors), $v$ is the vector of the corresponding projection coefficients, and $\mu$ is the class mean; the subscripts indicate whether they belong to the positive or the negative class.
The idea of LMC is to find the two closest points between the two hulls and take the vector joining them as the normal of the separating plane. The distance between the two hulls can be written as
$$d = \min_{v_{+},\, v_{-}} \big\| (\mu_{+} + U_{+} v_{+}) - (\mu_{-} + U_{-} v_{-}) \big\|^2 \quad (5)$$
Letting $U = [-U_{+}\; U_{-}]$ and $v = (v_{+}^{\mathsf T}, v_{-}^{\mathsf T})^{\mathsf T}$, formula (5) can be written as
$$d = \min_{v} \big\| (\mu_{+} - \mu_{-}) - U v \big\|^2 \quad (6)$$
Setting the derivative of formula (6) with respect to $v$ to zero, we get
$$v = (U^{\mathsf T} U)^{-1} U^{\mathsf T} (\mu_{+} - \mu_{-}) \quad (7)$$
Writing the resulting closest points as $x_{+} = \mu_{+} + U_{+} v_{+}$ and $x_{-} = \mu_{-} + U_{-} v_{-}$, the separating plane $w$ and the offset $b$ in the decision function $f(x) = \langle w, x \rangle + b$ can be derived by
$$w = x_{+} - x_{-}, \qquad b = \tfrac{1}{2}\big( \|x_{-}\|^2 - \|x_{+}\|^2 \big) \quad (8)$$
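Under this formulation, fitting the classifier reduces to one least-squares solve for the coefficients and a read-off of the two closest points. A minimal numpy sketch, assuming each class basis is taken from an SVD of the centered class data (an illustrative reconstruction, not the authors' code):

```python
import numpy as np

def lmc_fit(Xp, Xn, k):
    """Separating plane between the affine hulls of two classes,
    each hull written as mu + U v with U the top-k principal directions."""
    def hull(X):
        mu = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
        return mu, Vt[:k].T                      # d x k orthonormal basis
    mu_p, Up = hull(Xp)
    mu_n, Un = hull(Xn)
    U = np.hstack([-Up, Un])                     # combined basis of both hulls
    v = np.linalg.pinv(U) @ (mu_p - mu_n)        # least-squares coefficients
    xp = mu_p + Up @ v[:k]                       # closest point on positive hull
    xn = mu_n + Un @ v[k:]                       # closest point on negative hull
    w = xp - xn                                  # normal of the separating plane
    b = (xn @ xn - xp @ xp) / 2                  # offset: f vanishes midway
    return w, b

# Two parallel planes x1 = +5 and x1 = -5: the normal should be the x1 axis.
Xp = np.array([[5., 0, 0], [5., 1, 0], [5., 0, 1], [5., 1, 1]])
Xn = np.array([[-5., 0, 0], [-5., 1, 0], [-5., 0, 1], [-5., 1, 1]])
w, b = lmc_fit(Xp, Xn, k=2)
```

The pseudoinverse is used in place of $(U^{\mathsf T} U)^{-1} U^{\mathsf T}$ so the solve still works when the combined basis is rank-deficient, as in the toy example above.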

Optimized large margin classifier
The original LMC takes the feature vectors produced by PCA dimensionality reduction in order, from the first to the last. Although the first feature vector after PCA carries the most energy, the most energy does not mean the strongest discrimination. To find the feature vectors with the highest classification accuracy, the original LMC has to scan from the first feature vector to the last, which is very time-consuming, and the final accuracy is still not the highest. Our method is to take one eigenvector from the positive class and one from the negative class at a time, evaluate which pair gives the highest discrimination rate, and carry that pair into the next round, repeating the process.
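A minimal sketch of this greedy pair selection, assuming numpy and a generic `score(Bp, Bn)` evaluation helper (in practice this would be the discrimination rate of the classifier built from the selected bases; the helper name and the toy scoring function below are illustrative assumptions, not from the paper):

```python
import numpy as np

def greedy_select(Up, Un, score, n_pairs):
    """Greedy feature selection over eigenvector columns.
    Each round tries every remaining (positive-column, negative-column)
    pair, keeps the pair whose addition maximizes `score`, and carries
    the grown bases into the next round."""
    rem_p = list(range(Up.shape[1]))
    rem_n = list(range(Un.shape[1]))
    sel_p, sel_n = [], []
    for _ in range(n_pairs):
        best = None
        for i in rem_p:
            for j in rem_n:
                s = score(Up[:, sel_p + [i]], Un[:, sel_n + [j]])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        sel_p.append(i)
        sel_n.append(j)
        rem_p.remove(i)
        rem_n.remove(j)
    return sel_p, sel_n

# Toy stand-in for the discrimination score: the sum of the selected
# columns, so the columns with the largest sums are picked first.
Up = np.diag([1., 3., 2.])
Un = np.diag([2., 1., 3.])
sel_p, sel_n = greedy_select(Up, Un, lambda Bp, Bn: Bp.sum() + Bn.sum(), 2)
```

Unlike the original LMC's fixed energy ordering, the ordering here is driven entirely by the evaluation function, which is what allows a lower-energy but more discriminative eigenvector to be chosen early.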

Datasets introduction
In this experiment, we use three datasets, namely MNIST, Fashion-MNIST, and Med-MNIST. Among them, MNIST is the easiest to classify and has the highest accuracy, while Med-MNIST is the hardest to classify and has the lowest accuracy. For each dataset, we select the two classes that are the most difficult to distinguish for the experiment. The figure below shows the three datasets.

Visualization after PCA dimensionality reduction
We use PCA with feature normalization to reduce the dimensionality of the original images and visualize the resulting principal-component images, as shown in Figure 5.

Comparison of LMC and optimized LMC
We compare the original LMC and the optimized LMC at the same dimensionality. The histogram below compares their accuracy.

Conclusion
From Figure 6 we can see that, at the same dimensionality, the accuracy of the optimized LMC is higher than that of the original LMC. We also find that in most cases the accuracy of the optimized LMC grows only slowly after three columns of feature vectors have been taken for the positive and negative classes. In some cases, the optimized LMC with two eigenvectors is already more accurate than the original LMC with six. In general, our optimized LMC is more accurate than the original LMC algorithm, and the proposed optimization algorithm is feasible.