An Improved Extreme Learning Machine Based on Full Rank Cholesky Factorization

Abstract. Extreme learning machine (ELM) is a novel learning algorithm for generalized single-hidden-layer feedforward networks (SLFNs). Although it shows fast learning speed in many areas, there is still room for improvement in computational cost. To address this issue, this paper proposes an improved ELM (FRCF-ELM) that employs the full rank Cholesky factorization, instead of the traditional SVD, to compute the output weights. In addition, this paper proves in theory that the proposed FRCF-ELM has lower computational complexity. Experimental results on several benchmark applications indicate that the proposed FRCF-ELM learns faster than the original ELM algorithm while preserving good generalization performance.


Introduction
As universal approximators, single-hidden-layer feedforward networks (SLFNs) have been widely studied and applied in different fields over the past decades [1][2][3]. However, traditional learning algorithms are slow and cannot keep pace with the rapid development of SLFN applications. To address this issue, Huang et al. proposed a simple and efficient learning algorithm for SLFNs named the extreme learning machine (ELM) [4]. This technique randomly chooses the parameters (input weights and biases) of the hidden nodes and analytically determines the output weights by the least squares method [5]. Compared to traditional gradient-based learning algorithms, ELM not only provides better generalization performance with fast learning speed, but also avoids many issues faced by traditional algorithms, such as stopping criteria, learning rate, local minima, and overfitting. In view of these advantages, ELM has been widely studied and applied in various areas [6][7][8][9].
Although ELM shows fast learning speed in many areas, there is still room for improvement in computational complexity. By analyzing the theory of ELM, we find that the learning time of ELM is mainly consumed by computing the Moore-Penrose generalized inverse of the hidden layer output matrix (denoted by $H^\dagger$). Technically, the computation of $H^\dagger$ is generally carried out via the singular value decomposition (SVD), and the corresponding computational complexity is $O(4NL^2 + 8L^3)$ [10]. It can be seen that as the data size $N$ or the number of hidden neurons $L$ increases, the required computation grows rapidly. Therefore, the large cost of computing $H^\dagger$ is the key factor restricting the learning speed of ELM.
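To make this cost concrete, the following minimal NumPy sketch computes $H^\dagger$ the way the original ELM does, via SVD; the sizes, tolerance, and variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative sizes (assumptions): N samples, L hidden neurons, m outputs.
N, L, m = 5000, 200, 1
H = np.random.rand(N, L)   # stands in for the hidden layer output matrix
T = np.random.rand(N, m)   # stands in for the target matrix

# SVD route: H = U diag(s) V^T, so H_dagger = V diag(1/s) U^T,
# with (numerically) zero singular values left at zero.
U, s, Vt = np.linalg.svd(H, full_matrices=False)
tol = max(H.shape) * np.finfo(H.dtype).eps * s.max()
s_inv = np.where(s > tol, 1.0 / s, 0.0)
H_dagger = (Vt.T * s_inv) @ U.T
beta = H_dagger @ T        # minimum norm least squares solution of H beta = T
```

The SVD of the full $N \times L$ matrix is the dominant step here, which is exactly the bottleneck the paper targets.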
To address this problem, this paper proposes an improved ELM based on the full rank Cholesky factorization, named FRCF-ELM. Different from the original ELM, FRCF-ELM computes the output weights by using the simpler full rank Cholesky factorization instead of the traditional SVD. Furthermore, we systematically analyze the computational complexity of such FRCF-ELM in theory. It will be verified that the obtained FRCF-ELM has lower computational complexity. Experimental results over some benchmark applications also indicate that the proposed FRCF-ELM learns faster than the original ELM while preserving good generalization performance.

Review of ELM
For $N$ arbitrary distinct samples $(\mathbf{x}_j, \mathbf{t}_j)$, a standard SLFN with $L$ hidden nodes is modeled as

$$\sum_{i=1}^{L} \boldsymbol{\beta}_i G(\mathbf{a}_i, b_i, \mathbf{x}_j) = \mathbf{t}_j, \quad j = 1, \ldots, N, \qquad (1)$$

where $\boldsymbol{\beta}_i$ is the weight vector connecting the $i$th hidden node and the output nodes, $\mathbf{a}_i$ is the weight vector connecting the input layer to the $i$th hidden node, $b_i$ is the bias of the $i$th hidden node, and $G(\mathbf{a}_i, b_i, \mathbf{x})$ is the output of the $i$th hidden node with respect to the input $\mathbf{x}$.

The above $N$ equations can be written compactly as

$$H\boldsymbol{\beta} = T, \qquad (2)$$

where $H = [G(\mathbf{a}_i, b_i, \mathbf{x}_j)]_{N \times L}$ is the hidden layer output matrix, $\boldsymbol{\beta} = [\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_L]^T$, and $T = [\mathbf{t}_1, \ldots, \mathbf{t}_N]^T$. In ELM, the parameters (input weights $\mathbf{a}_i$ and biases $b_i$) of the output matrix $H$ are randomly generated. Thus, training the SLFN is simply equivalent to finding the solution of the linear system (2) with respect to the output weight $\boldsymbol{\beta}$. In fact, the output weight is determined as the minimum norm least squares solution of (2), that is,

$$\hat{\boldsymbol{\beta}} = H^\dagger T, \qquad (3)$$

where $H^\dagger$ is the Moore-Penrose generalized inverse of the output matrix $H$. It should be noted that ELM computes $H^\dagger$ based on the singular value decomposition (SVD) of $H$, that is,

$$H^\dagger = V \Sigma^\dagger U^T, \quad \text{where } H = U \Sigma V^T. \qquad (4)$$

Generally speaking, the ELM learning algorithm can be summarized as follows:

1. Randomly assign the hidden node parameters $(\mathbf{a}_i, b_i)$, $i = 1, \ldots, L$.
2. Calculate the hidden layer output matrix $H$.
3. Calculate the output weight $\boldsymbol{\beta} = H^\dagger T$.
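For readers who prefer code, here is a hedged sketch of the ELM training procedure just summarized; the sigmoid activation, the uniform $[-1, 1]$ initialization, and the function names are assumptions for illustration only.

```python
import numpy as np

def elm_train(X, T, L, rng=None):
    """Minimal sketch of the original ELM (sigmoid hidden nodes assumed)."""
    rng = np.random.default_rng(0) if rng is None else rng
    N, d = X.shape
    A = rng.uniform(-1.0, 1.0, size=(d, L))   # random input weights a_i
    b = rng.uniform(-1.0, 1.0, size=L)        # random biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))    # hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T              # Eq. (3): beta = H_dagger T (SVD-based)
    return A, b, beta

def elm_predict(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return H @ beta
```

Note that `np.linalg.pinv` computes the pseudoinverse through an SVD, which matches Eq. (4) and is the step FRCF-ELM replaces.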

Proposed FRCF-ELM
In this section, we propose an improved ELM based on the full rank Cholesky factorization, named FRCF-ELM, which aims to further accelerate the learning of ELM. Different from the original ELM, the core idea of the proposed FRCF-ELM is to compute the output weights using the simpler full rank Cholesky factorization instead of the traditional SVD.

Formulation of FRCF-ELM
Let $r$ denote the rank of the hidden layer output matrix $H$. Since $H^TH$ is symmetric positive semidefinite with rank $r$, it admits the full rank Cholesky factorization

$$H^TH = SS^T, \qquad (5)$$

where $S$ is an $L \times r$ matrix of full column rank. According to the property of the Moore-Penrose inverse [10,11], the Moore-Penrose inverse of a matrix product $AB$, where $A$ has full column rank and $B$ has full row rank, is as follows:

$$(AB)^\dagger = B^\dagger A^\dagger. \qquad (6)$$

If $A = S$, $B = S^T$, then one obtains from (6)

$$(SS^T)^\dagger = (S^T)^\dagger S^\dagger. \qquad (7)$$

Since $S$ has full column rank and $S^T$ has full row rank, their Moore-Penrose inverses reduce to

$$S^\dagger = (S^TS)^{-1}S^T, \qquad (8)$$

$$(S^T)^\dagger = S(S^TS)^{-1}. \qquad (9)$$

Using Eqs. (8) and (9), one obtains from (7)

$$(SS^T)^\dagger = S(S^TS)^{-1}(S^TS)^{-1}S^T. \qquad (10)$$

Finally, combining (10) with the identity $H^\dagger = (H^TH)^\dagger H^T$, the output weight of FRCF-ELM is computed as

$$\boldsymbol{\beta} = H^\dagger T = S(S^TS)^{-1}(S^TS)^{-1}S^TH^TT. \qquad (11)$$

In this way, the expensive SVD of the $N \times L$ matrix $H$ is replaced by one full rank Cholesky factorization of the $L \times L$ matrix $H^TH$ and the inversion of the small $r \times r$ matrix $S^TS$.
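A minimal sketch of this computation, in the style of Courrieu-type fast pseudoinverses, is given below; the fixed tolerance, function names, and the restriction to the $H^TH$ (i.e., $N \ge L$) case are our assumptions, not specifics from the paper.

```python
import numpy as np

def full_rank_cholesky(A, tol=1e-10):
    """Full rank Cholesky factorization of a symmetric PSD matrix A.

    Returns S with full column rank r = rank(A) such that A = S @ S.T.
    Columns whose pivot falls below tol (assumed threshold) are dropped.
    """
    n = A.shape[0]
    S = np.zeros((n, n))
    r = 0                                      # number of nonzero pivots kept
    for k in range(n):
        a = A[k:, k] - S[k:, :r] @ S[k, :r]    # eliminate previous columns
        if a[0] > tol:                         # nonzero pivot: keep column
            S[k:, r] = a / np.sqrt(a[0])
            r += 1
    return S[:, :r]

def frcf_pinv_times(H, T):
    """Compute beta = H_dagger @ T via Eqs. (5)-(11), without any SVD."""
    K = H.T @ H                                # Eq. (5): L x L Gram matrix
    S = full_rank_cholesky(K)                  # K = S S^T, S is L x r
    M = np.linalg.inv(S.T @ S)                 # small r x r inverse
    return S @ (M @ (M @ (S.T @ (H.T @ T))))  # Eq. (11)

# Sanity check against the SVD-based pseudoinverse:
H = np.random.rand(500, 40)
T = np.random.rand(500, 1)
assert np.allclose(frcf_pinv_times(H, T), np.linalg.pinv(H) @ T)
```

The design point is that only the $r \times r$ matrix $S^TS$ is ever inverted, which is where the savings over the SVD of the full $N \times L$ matrix come from.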

Analysis of computational complexity
In ELM, the output weight $\boldsymbol{\beta}$ is calculated based on the SVD of the output matrix $H$. Thus the computational complexity of ELM is approximately equal to that of the SVD. As described in [10], the computational complexity of applying SVD to $H$ is $O(4NL^2 + 8L^3)$; therefore, we can deem the computational complexity of ELM to be $O(4NL^2 + 8L^3)$. In FRCF-ELM, the cost is dominated by forming $H^TH$, performing the full rank Cholesky factorization, and evaluating the inversions and products in (11), so the total computational complexity of the proposed FRCF-ELM is $O(3L^3 + 2r^3 + 3L^2(N + r))$. In order to better illustrate the superiority of FRCF-ELM in learning speed, we compare the computational complexities of FRCF-ELM and the original ELM. Since $r \le L \le N$, we can further derive that

$$3L^3 + 2r^3 + 3L^2(N + r) \le 8L^3 + 3NL^2 < 4NL^2 + 8L^3.$$

Based on the above analysis, we conclude that the computational complexity of FRCF-ELM is smaller than that of ELM. In other words, FRCF-ELM learns faster than ELM.
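As a hedged numeric illustration of these operation counts (the sizes are arbitrary and not taken from the paper's experiments), consider $N = 10^4$ samples and $L = r = 500$ hidden nodes:

```latex
% Illustrative flop counts for N = 10^4, L = r = 500 (assumed sizes):
\begin{align*}
C_{\mathrm{ELM}}  &= 4NL^2 + 8L^3 = 10^{10} + 10^{9} = 1.1\times 10^{10},\\
C_{\mathrm{FRCF}} &= 3L^3 + 2r^3 + 3L^2(N+r)
                   \approx 6.3\times 10^{8} + 7.9\times 10^{9}
                   \approx 8.5\times 10^{9}.
\end{align*}
```

Even at full rank ($r = L$) the count drops by roughly a factor of 1.3, and the advantage widens further when $r < L$.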

Experimental verification
In this section, we carry out a series of experiments to test the proposed FRCF-ELM. In order to better verify the effectiveness of our method, we compare it with ELM and the popular support vector machine (SVM). Here, the SVM is implemented using the LIBSVM software package (version 3.23) [13]. In addition, we select epsilon-SVR for regression problems and nu-SVC for classification problems.
We compare the performance of these three learning algorithms on several benchmark problems from the UCI database [14], including six regression problems and three classification problems. The specification of these datasets is shown in Table 1. For each problem, the dataset is randomly divided into two subsets: a training set for model learning and a testing set for performance quantification. In order to make a fair comparison, 50 trials were conducted and the median results are reported. All simulations were run in the MATLAB 2011a environment on a PC with an Intel Core 3.40 GHz CPU and 4 GB RAM.

Tables 2 and 3 show the performance comparison of ELM, SVM, and the proposed FRCF-ELM on the regression problems and the classification problems, respectively. The apparently better results are emphasized in boldface. From Tables 2 and 3, it can be seen that the proposed FRCF-ELM always has a faster learning speed than the other two methods on all nine problems. For example, in the case of the Census (House8L) problem, FRCF-ELM learns up to 4 times faster than ELM and 12 times faster than SVM. In addition, it can be seen that the generalization performance of FRCF-ELM is comparable to that of ELM and SVM. These facts indicate that the proposed FRCF-ELM learns faster while preserving good generalization performance.
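The evaluation protocol described above can be sketched as follows; the 70/30 split ratio, function names, and RMSE metric are assumptions for illustration, since the paper specifies only the random split, the 50 trials, and the median statistic.

```python
import time
import numpy as np

def evaluate(X, y, train_fn, predict_fn, n_trials=50, train_frac=0.7, rng=None):
    """Protocol sketch: random split, repeated trials, median time and test RMSE.

    train_fn / predict_fn are placeholders for any of the compared learners
    (ELM, FRCF-ELM, SVM); the split ratio and metric are assumptions.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    times, errors = [], []
    n_train = int(train_frac * len(X))
    for _ in range(n_trials):
        idx = rng.permutation(len(X))
        tr, te = idx[:n_train], idx[n_train:]
        t0 = time.perf_counter()
        model = train_fn(X[tr], y[tr])
        times.append(time.perf_counter() - t0)   # training time for this trial
        pred = predict_fn(model, X[te])
        errors.append(np.sqrt(np.mean((pred - y[te]) ** 2)))  # test RMSE
    return np.median(times), np.median(errors)
```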

Conclusion
In this paper, we proposed an enhanced ELM, called FRCF-ELM, which greatly improves the learning speed of ELM. In FRCF-ELM, the output weights are computed using the simpler full rank Cholesky factorization instead of the traditional SVD. It has been proved in theory that FRCF-ELM has lower computational complexity. Experimental results on several benchmark applications also indicate that the proposed FRCF-ELM learns faster than the original ELM while preserving good generalization performance.