Low-Rank Optimization Dictionary Training for Image Classification

The bag-of-words (BOW) model is widely used for image categorization, and the way its dictionary is constructed strongly affects performance. This paper proposes a category-constrained low-rank optimization approach to dictionary training. Through low-rank optimization, the rank of the coefficient matrix formed by images of the same category is minimized, so that same-category images receive similar dictionary representations. Experiments on two standard image databases, Caltech-101 and Caltech-256, show that the proposed method performs better than the same pipelines without the category-constrained low-rank optimization.


INTRODUCTION
When there are not enough training images for each category, deep learning approaches are difficult to apply. Bag of Words (BOW) [1] is a statistical image description method: it describes an image by a histogram of the frequencies with which the feature words of a dictionary appear in the image. Many image feature description methods exist, such as those introduced in [2], which differ in their feature detection, feature description, and feature organization and representation methods. Because of its strong descriptive power, BOW is widely used in image retrieval [3][4], classification [5] and coding [6][7]. BOW is constructed and used differently in different applications. For image denoising and super-resolution reconstruction, an image is usually divided into small patches of the same size; in [8], features of these local patches are analyzed to form a feature BOW that describes the local patches. For image classification, features are analyzed over the entire image, normally through feature detection, feature description and feature clustering, and the cluster centers then form the feature BOW; see [9][10][11][12].
In BOW-based image classification, a key question is how to build a stable, highly discriminative feature dictionary with strong descriptive power. A good feature dictionary maximizes discriminability while keeping image reconstruction errors small. There are normally two ways to construct a BOW: build a single feature dictionary from the images of all categories, or build a separate feature dictionary for each category, which yields multiple dictionaries. A single dictionary does not make sufficient use of category information. Multiple class-specific dictionaries are expensive to compute, particularly when the number of categories is large, and they tend to lose the features that categories have in common, which are also useful for classification. To address these issues, this paper introduces a category-constrained low-rank optimization approach to feature dictionary training. Using the idea of low rank, the dictionary is trained to produce representation coefficients that are as similar as possible for images of the same category. This introduces class information into dictionary learning and increases the discriminability of the dictionary's descriptions of images.

RELATED WORKS
Commonly used dictionary methods for image classification include SPM (Spatial Pyramid Matching) [11], BoF (Bag-of-Features) [13] and a number of extensions of BoF, such as the local generative models in [14][15], the latent generative model in [16] and the discriminative dictionary learning in [17][18]. BOW is also widely used in object detection [19] and feature description [20]. In all of these methods, the feature dictionary plays a very important role. The "key feature dictionary" (bag of keypoints) was first introduced in [9], one of the first works to apply such a method to image classification. There, affine covariant regions and SIFT (Scale-Invariant Feature Transform) descriptors were used for feature description, K-means was used to obtain the feature dictionary, and a Bayes classifier and a support vector machine (SVM) were used for the categorization experiments. Later, [10] improved on this approach by using standardized local SIFT descriptions without affine transformation and by replacing the SVM with boosting under additional minor geometric consistency constraints. [12] studied feature BOW methods more systematically, with experiments on feature detection methods, feature descriptors such as SIFT and RIFT (Rotation Invariant Feature Transform), and SVM kernel functions. [21][22] applied sparse optimization to the BOW method; because it yields highly discriminative dictionaries, this approach has become one of the mainstream ways to construct a BOW. The procedure for describing images with a feature BOW is as follows: first, represent each feature vector detected in an image with the feature vectors (atoms) of the dictionary; then describe the image by the dictionary representation coefficients of all its features; finally, train a classifier on the resulting image description vectors.
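The three-step procedure above can be sketched in Python with NumPy. This is a minimal illustration that assumes the simplest choices: hard (nearest-neighbor) assignment for the representation step and a normalized word-count histogram as the image description. The function names are hypothetical, and the classifier step is left out.

```python
import numpy as np

def encode_hard(features, dictionary):
    """Represent each descriptor (row of `features`, N x D) by its nearest
    dictionary atom (column of `dictionary`, D x K) as a one-hot code (N x K)."""
    # Squared Euclidean distance between every descriptor and every atom: N x K.
    d2 = ((features[:, :, None] - dictionary[None, :, :]) ** 2).sum(axis=1)
    nearest = d2.argmin(axis=1)
    codes = np.zeros((features.shape[0], dictionary.shape[1]))
    codes[np.arange(features.shape[0]), nearest] = 1.0
    return codes

def bow_histogram(features, dictionary):
    """Image-level description: relative frequency of each dictionary word."""
    hist = encode_hard(features, dictionary).sum(axis=0)
    return hist / hist.sum()

# Two 2-D atoms, (0, 0) and (10, 10), stored as columns of V.
V = np.array([[0.0, 10.0], [0.0, 10.0]])
F = np.array([[1.0, 1.0], [9.0, 9.0], [0.0, 2.0]])
print(bow_histogram(F, V))  # two descriptors map to word 0, one to word 1
```

One such vector per image is what the final classifier is trained on.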
Key issues in a BOW that require consideration are:
(1) feature selection, including feature detection methods and feature description methods;
(2) how to construct the feature dictionary once features are extracted, for example by conventional clustering, sparse-constraint methods or K-SVD [23][24];
(3) how the feature dictionary should represent new features, that is, how to quantize vectors, for example by nearest neighbor coding, soft weighting or sparse coding;
(4) how to pool after vectors have been quantized, that is, how to construct the feature description of the entire image from the representation coefficients of all its feature points;
(5) how to build a classifier on the image feature vectors formed from the feature dictionary description.
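Issue (4), pooling, can be illustrated with the two most common operators. This is a sketch that assumes the codes of one image are stored as an N x K matrix (one row per feature point). Max pooling over absolute values is the variant used by sparse-coding pipelines such as ScSPM [21]; average pooling of one-hot codes reproduces the BOW histogram. The function name is hypothetical.

```python
import numpy as np

def pool(codes, method="max"):
    """Collapse an N x K code matrix (one row per feature point) into a
    single K-dimensional description of the whole image."""
    if method == "max":
        # Max pooling: strongest (absolute) response of each atom in the image.
        return np.abs(codes).max(axis=0)
    if method == "average":
        # Average pooling: mean response of each atom; for one-hot codes this
        # equals the normalized word histogram.
        return codes.mean(axis=0)
    raise ValueError(f"unknown pooling method: {method}")

C = np.array([[0.2, -0.9], [0.5, 0.1]])
print(pool(C, "max"), pool(C, "average"))
```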
In the conventional dictionary training approach [11], the feature dictionary is obtained by clustering the features detected in all images (for example with K-means), and the number of cluster centers is the number of words in the feature dictionary.
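A minimal sketch of K-means dictionary training (Lloyd's iterations) follows. It assumes descriptors are rows of an N x D array and returns a D x K dictionary with one cluster center per column, matching the D x K dictionary size used in the experiments; the function name, iteration count and random initialization are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def kmeans_dictionary(X, K, n_iter=50, seed=0):
    """Learn a D x K dictionary (one cluster center per column) from the
    descriptors in X (N x D) using Lloyd's K-means iterations."""
    rng = np.random.default_rng(seed)
    # Initialize the K centers with K distinct, randomly chosen descriptors.
    V = X[rng.choice(len(X), size=K, replace=False)].T.astype(float)
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every descriptor.
        d2 = ((X[:, :, None] - V[None, :, :]) ** 2).sum(axis=1)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its assigned descriptors
        # (an empty cluster keeps its previous center).
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:
                V[:, k] = members.mean(axis=0)
    return V

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
print(kmeans_dictionary(X, 2))
```

In practice a library implementation with smarter seeding (such as k-means++) is usually preferred; the loop above only shows the alternation between assignment and update.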
Let $X = [x_1, \dots, x_M] \in \mathbb{R}^{D \times M}$ be the $M$ feature points in a $D$-dimensional feature space, and let $V = [v_1, \dots, v_K] \in \mathbb{R}^{D \times K}$ be the feature dictionary of $K$ cluster centers, so that the dictionary size is $K$. Vector quantization then seeks the dictionary that solves

$$\min_{V} \sum_{m=1}^{M} \min_{k=1,\dots,K} \lVert x_m - v_k \rVert_2^2. \qquad (1)$$

In the conventional approach, each new feature is represented by its nearest-neighbor feature word, and the histogram of dictionary words is computed to represent the image. This reflects the distribution of features in an image to some extent, but it cannot describe new features accurately: using only the nearest word leaves a relatively large reconstruction error. The soft-weighting approach later improved on this by representing the input feature vector with its k nearest dictionary atoms, each weighted by its similarity to the feature, so that more similar atoms receive larger weights. Although this approach is simple, easy to implement and has a smaller reconstruction error than the K-means (nearest-neighbor) approach, the resulting descriptions are relatively poorly discriminative.
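The soft-weighting scheme described above can be sketched as follows. The text does not say which similarity measure sets the weights, so this sketch assumes a Gaussian kernel $\exp(-\beta d^2)$ on squared distance, with a hypothetical parameter `beta`, and normalizes the weights to sum to 1. With `k=1` it reduces to the nearest-neighbor (hard) assignment of formula (1).

```python
import numpy as np

def soft_assign(x, V, k=3, beta=1.0):
    """Soft-weighting code for one descriptor x (length D) against a
    dictionary V (D x K, atoms as columns). Only the k nearest atoms get
    nonzero weights; closer atoms get larger weights; weights sum to 1."""
    d2 = ((V - x[:, None]) ** 2).sum(axis=0)   # squared distance to every atom
    nearest = np.argsort(d2)[:k]                # indices of the k nearest atoms
    # Assumed similarity: exp(-beta * d^2). Subtracting the minimum distance
    # avoids underflow and does not change the normalized weights.
    w = np.exp(-beta * (d2[nearest] - d2[nearest].min()))
    code = np.zeros(V.shape[1])
    code[nearest] = w / w.sum()
    return code

V = np.array([[0.0, 1.0, 5.0], [0.0, 0.0, 0.0]])  # atoms (0,0), (1,0), (5,0)
print(soft_assign(np.array([0.0, 0.0]), V, k=2))
```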

Basic Idea
Feature dictionary training is crucial: a good feature dictionary keeps the reconstruction error of new features as small as possible while giving images of the same category more similar dictionary representations and images of different categories more distinct ones. Formula (1) does not use the category information of the images when the feature dictionary is trained, although this information would help train a more discriminative dictionary. A dictionary describes an image through its description coefficients, that is, the coefficients of the linear combination of dictionary atoms. Under the BOW description, images of the same category have similar coefficient matrices, while images of different categories have relatively distinct coefficient matrices.
The low-rank idea is widely used, for example for image correction [25], descriptor space structure [26], feature extraction [27], manifold optimization [28], matrix recovery [29] and subspace segmentation [30]. In image correction, applying the right transformation parameters lowers the rank of the image: the corrected image has more similar or identical data in each column (or row), so its matrix has lower rank. Solving for the correction can therefore be cast as the rank-minimization problem in formula (2). Because the rank term in formula (2) makes the problem hard to solve, it is normally replaced with the nuclear norm $\lVert A \rVert_* = \sum_i \sigma_i(A)$, the sum of the singular values, giving formula (3). Following the low-rank idea, this paper applies low-rank optimization to dictionary creation: category constraint information is introduced by minimizing the rank of the coefficient matrix of same-category images under the dictionary description, so that images of the same category have more similar dictionary representations. For this purpose, optimization problem (4), which includes a regularization term, is constructed to solve for the low-rank constrained feature dictionary D. There are C classes, and $Z_i$ is the coefficient matrix of the class-i images using D as the dictionary.
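The effect of the nuclear norm that replaces the rank can be seen on two small coefficient matrices whose columns all have the same length. The sketch below covers only the class-wise low-rank part of problem (4), $\sum_{i=1}^{C} \lVert Z_i \rVert_*$, assuming each $Z_i$ stores the coefficient vectors of class-i images as columns; it does not implement the other terms of (4), and the function names are hypothetical.

```python
import numpy as np

def nuclear_norm(Z):
    """Sum of singular values: the convex surrogate used in place of rank(Z)."""
    return np.linalg.svd(Z, compute_uv=False).sum()

def class_low_rank_term(Z_by_class):
    """Sum over classes i = 1..C of ||Z_i||_*, where Z_i holds the
    coefficient vectors (columns) of the class-i images."""
    return sum(nuclear_norm(Z_i) for Z_i in Z_by_class)

# Three images with identical coefficient vectors (rank 1) versus three
# images with mutually orthogonal coefficient vectors (rank 3); every
# column has Euclidean length 3 in both cases.
Z_same = np.outer([1.0, 2.0, 2.0], [1.0, 1.0, 1.0])
Z_diff = np.diag([3.0, 3.0, 3.0])
print(nuclear_norm(Z_same), nuclear_norm(Z_diff))  # about 5.196 vs 9.0
```

Because the column lengths are equal, the lower value for `Z_same` comes only from the similarity of its columns, which is the property the category constraint rewards.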

Solving the Low-Rank Optimization Problem
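The optimization problem in (4) is solved through a sequence of subproblems. A singular value thresholding (SVT) algorithm can be used to solve subproblem (7). For $x \in \mathbb{R}$ and $\tau > 0$, define the soft-thresholding (shrinkage) operator $S_\tau(x) = \operatorname{sign}(x)\max(|x| - \tau, 0)$; SVT solves a nuclear-norm problem such as (7) by applying $S_\tau$ to the singular values of the matrix being thresholded. The dictionary D is then updated using the derivative of formulation (17) with respect to D.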
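A sketch of the SVT step follows, assuming subproblem (7) has the standard form $\min_Z \tau \lVert Z \rVert_* + \tfrac{1}{2}\lVert Z - Y \rVert_F^2$, for which the minimizer is $U S_\tau(\Sigma) V^\top$ with $Y = U \Sigma V^\top$. The exact matrix $Y$ and threshold used in the paper are not given in this text, so both are inputs here.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise shrinkage S_tau(x) = sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svt(Y, tau):
    """Singular value thresholding: returns U S_tau(Sigma) V^T, the minimizer
    of tau * ||Z||_* + 0.5 * ||Z - Y||_F^2."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # Scale the columns of U by the shrunk singular values, then recombine.
    return (U * soft_threshold(s, tau)) @ Vt

Y = np.diag([5.0, 2.0, 0.5])
print(svt(Y, 1.0))  # singular values 5, 2, 0.5 shrink to 4, 1, 0
```

Singular values below the threshold are set to zero, so each SVT step also lowers the rank of the coefficient matrix, which is how the low-rank preference enters the iterations.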

EXPERIMENT AND ANALYSIS
To verify the effectiveness of the proposed image description approach, experiments are conducted on the Caltech-101 [32] and Caltech-256 [33] image databases. Images are first converted to grayscale; key feature points are then detected by DOF and described with SIFT feature descriptors computed over 4 × 4 seed points obtained by densely sampling local pixel blocks. Each key point yields a 128-dimensional descriptor, and the sampling step is 8 pixels. The dictionary is initialized with K-means, and the feature dictionary size is 128 × 1024. All classifiers in the experiments are LIBSVM support vector classifiers.
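The sampling arithmetic in this setup can be made concrete. The sketch assumes 16 x 16 pixel patches (the patch size is not stated in the text; only the 8-pixel step and the 4 x 4 seed points are) and shows where the 128 dimensions come from: a SIFT descriptor concatenates 8-bin orientation histograms over a 4 x 4 grid of cells. The function name is hypothetical.

```python
def dense_grid(height, width, patch=16, step=8):
    """Top-left corners (y, x) of patch x patch windows sampled every `step`
    pixels. patch=16 is an assumption; step=8 follows the text."""
    return [(y, x)
            for y in range(0, height - patch + 1, step)
            for x in range(0, width - patch + 1, step)]

# 4 x 4 seed points (cells), each with an 8-bin orientation histogram.
SIFT_DIM = 4 * 4 * 8

print(SIFT_DIM, len(dense_grid(32, 32)))  # 128-D descriptors, 3 x 3 = 9 patches
```

Each patch yields one 128-dimensional descriptor, which is why the dictionary has 128 rows and one column per word (1024 words here).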
The Caltech-101 image database contains 101 categories of images, such as animals, flowers and cars, with a total of 9144 images. Each category contains 31 to 800 images, and most images are of medium resolution. For the experiment, a number of images from each category are selected to form the training set, and the remaining images form the test set. Table 1 compares the results of this and other approaches on Caltech-101, where LRC-DT (Low-Rank Constrained Dictionary Training) is the approach proposed in this paper, with the feature dictionary obtained by low-rank constrained optimization. The results show that the proposed LRC-DT dictionary training approach effectively increases image classification accuracy. As the number of training samples per category grows, the low-rank constraint (LRC) adjusts the basis vectors of the feature dictionary better and yields better classification; with few training samples per category, however, the gain is smaller, and accuracy can even decrease slightly. To evaluate the effect of the low-rank optimized feature dictionary, the dictionaries in the SPM approach of Lazebnik et al. [11] and the ScSPM coding approach [21] are replaced with the LRC feature dictionary, and the classification results are compared. Figure 1 shows the results: with all other steps unchanged, replacing the feature dictionary training approach with the low-rank constrained approach improves classification accuracy to varying degrees. The Caltech-256 image database has 256 image categories with 30607 images. Compared with Caltech-101, it has more categories and greater image variability, and each category includes at least 80 images.
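The per-category split described above can be sketched as follows, where `n_train` is the number of training images drawn from each category (the quantity varied across the columns of Table 1). The random seed and function name are illustrative assumptions; the text does not describe how the images were drawn.

```python
import random
from collections import defaultdict

def split_per_category(labels, n_train, seed=0):
    """Randomly pick n_train images per category for training; all other
    images go to the test set. labels[i] is the category of image i.
    Returns (train_indices, test_indices), both sorted."""
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for c in sorted(by_class):
        idxs = by_class[c][:]
        rng.shuffle(idxs)
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return sorted(train), sorted(test)

labels = ["a"] * 5 + ["b"] * 4
print(split_per_category(labels, 3))
```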
Table 2 shows that with 45 and 60 training samples per category, that is, when each category has relatively many training samples, the proposed approach achieves the highest classification accuracy. Figure 2 compares the SPM approach and the locality-constrained linear coding (LLC) approach before and after their feature dictionaries are replaced with the low-rank constrained feature dictionary. With LRC, accuracy increases to a certain extent.
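SPM, ScSPM and LLC all pool descriptor codes over a spatial pyramid before classification, so swapping the dictionary changes only the codes fed into this step. A sketch of pyramid max pooling follows, assuming the common 1 x 1, 2 x 2 and 4 x 4 levels and max pooling of absolute code values as in ScSPM-style pipelines (the original SPM [11] pools histogram counts instead). The function name and argument layout are hypothetical.

```python
import numpy as np

def spatial_pyramid_max_pool(codes, xy, width, height, levels=(1, 2, 4)):
    """Concatenate max-pooled codes over a spatial pyramid.
    codes: N x K codes (one row per descriptor); xy: N x 2 array of (x, y)
    descriptor positions in pixels. Level l splits the image into l x l
    cells; an empty cell contributes a zero vector."""
    codes = np.abs(codes)
    K = codes.shape[1]
    pooled = []
    for l in levels:
        # Cell index of each descriptor along x and y at this level.
        cx = np.minimum((xy[:, 0] * l // width).astype(int), l - 1)
        cy = np.minimum((xy[:, 1] * l // height).astype(int), l - 1)
        for i in range(l):
            for j in range(l):
                mask = (cy == i) & (cx == j)
                pooled.append(codes[mask].max(axis=0) if mask.any() else np.zeros(K))
    return np.concatenate(pooled)

codes = np.array([[1.0, 0.0], [0.0, 2.0]])
xy = np.array([[1.0, 1.0], [9.0, 9.0]])
print(spatial_pyramid_max_pool(codes, xy, width=10, height=10).shape)  # 21 cells x K
```

With a 1024-word dictionary and these three levels, each image becomes a 21 x 1024 = 21504-dimensional vector for the SVM.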

CONCLUSION
Constructing a feature BOW with rich representational power is crucial. A good feature BOW should represent abundant image features, discriminate well between descriptions of images of different categories, and give similar descriptions to images of the same category. Applying a low-rank constraint to the coefficient matrix of same-category images integrates category information into the creation of the feature dictionary, which improves the dictionary and increases the accuracy of BOW-based image classification to a certain extent. The approach is suitable for situations where there are not enough training images for each category. It integrates the similarity of the feature descriptions of same-category images into dictionary construction; how to also exploit the relatively large differences between the feature descriptions of images of different categories will be studied in future work. In addition, because low-rank optimization is time-consuming, efficient computation of low-rank optimization is another direction for future study.