MODIFICATION OF HIDDEN LAYER WEIGHT IN EXTREME LEARNING MACHINE USING GAIN RATIO

Extreme Learning Machine (ELM) is a fast learning method for feed-forward neural networks that also achieves fairly good accuracy. The method applies to a feed-forward neural network with one hidden layer, whose parameters (i.e. weights and biases) are set once, randomly, at the beginning of the learning process. In the neural network, the input layer is connected to all characteristics/features, and the output layer is connected to all classes. This research used three datasets from the UCI database, namely Iris, Breast Wisconsin, and Dermatology, each having several features. Each feature plays a role in the classification process, ranging from highly influential to not influential at all. Gain ratio, a method originally used to rank features when building a decision tree, was used to extract the role of each feature in each dataset. In this study, the ELM structure was modified so that the random weights of the hidden layer are adjusted according to the role of each feature in determining the class, with the aim of improving training and testing accuracy. The proposed method achieved a higher classification accuracy than basic ELM on all three datasets: 99%, 96%, and 82%, respectively.


INTRODUCTION
Extreme Learning Machine (ELM) is a learning method for a single hidden layer feed-forward neural network (SLFN) that resolves concerns raised by the use of backpropagation methods. The learning stage in backpropagation takes much longer than in ELM, even with the same neural network configuration, i.e. one hidden layer. This is because a feed-forward neural network trained with backpropagation uses a gradient-based learning algorithm that works slowly, and all the adjustable parameters are repeatedly adjusted during learning until an iteration stopping criterion is reached (Huang et al., 2004). Meanwhile, the learning stage in ELM requires only one iteration, and the hidden layer weight parameters are set once, randomly (Huang et al., 2006). Although ELM learns quickly, it also has fairly good accuracy.
ELM has been gaining considerable attention from researchers since its introduction (Huang et al., 2015). It has been studied not only for classification problems, but also for regression and clustering problems. With its advantages, ELM is considered appropriate for various problems involving big data and real-time applications, such as in the medical field, image processing, computer vision, etc. Their paper shows that ELM and its variants are efficient, accurate, and easy to implement, including on hardware (e.g. robots).
Results reported by Huang et al. (2004) showed that ELM has higher accuracy than other methods, such as SVM, AdaBoost, C4.5, and RBF. The two datasets used were a real diabetes medical diagnosis dataset and a forest type dataset.
Data in classification problems consist of several features that represent an object of a specific class. Each feature has a role level (i.e. a feature weight), which can be categorized as high or low. Weighting is generally performed in the feature selection stage and aims to analyze the data and determine the role level of each feature in the classification process. There are two approaches to feature selection: the filter approach and the wrapper approach (Karegowda et al., 2010). The filter approach is carried out separately from the classification engine and is an important preprocessing stage. Since it is separated from the classification engine, the outputs of this feature selection approach can be used by different classification engines. The wrapper approach uses a classification engine to determine the role level of each feature in the classification process. Consequently, the filter approach is simpler and faster than the wrapper approach. Feature selection methods using the filter approach include gain ratio (Karegowda et al., 2010; Priyadarsini et al., 2011; Anggraeny et al., 2013), particle swarm optimization (PSO) (Yang et al., 2007; Huang et al., 2008), and differential evolution (Khushaba et al., 2011). Feature selection methods using the wrapper approach include ant colony optimization (ACO) (Kanan et al., 2008) and sequential floating forward selection (SFFS) (Liao et al., 2010).
Karegowda et al. (2010) used gain ratio (GR) as the feature selection technique, with Radial Basis Function Network (RBF) and Back Propagation Neural Network (BPN) as classifiers. Their results showed a classification accuracy of about 72.88% for BPN, 78.21% for GR-BPN, 81.20% for RBF, and 86.46% for GR-RBF. Anggraeny et al. (2013) used gain ratio (GR) as the feature selection technique and Voting of Neural Network Particle Swarm Optimization (VNNPSO) as the classifier. The results showed that GR-VNNPSO did not improve the accuracy rate on all datasets. The method improved classification accuracy on the dermatology dataset by about 13.28%, but reduced it by about 0.45% on the iris dataset and 3.49% on the breast cancer Wisconsin dataset. Although only one dataset showed better accuracy, the gain was much larger than the losses. Priyadarsini et al. (2011) used gain ratio for feature subset selection, combined with Naïve Bayes as the classifier and K-Means for clustering. A ranking method was used on the adult dataset to select a subset of 7 attributes from the original dataset of 10 attributes. For both classification and clustering, the utility of the dataset was unaffected by the attribute reduction.
In this research, we add a feature weight, computed using the gain ratio method, as a multiplier factor of the hidden layer random weights in Extreme Learning Machine, in the hope that this modification of ELM weighting will increase the accuracy on training and testing data.

METHODOLOGY

Extreme Learning Machine
Extreme Learning Machine (ELM) is a learning method for single hidden layer feed-forward neural networks which is faster and generally has higher accuracy than backpropagation (Huang et al., 2004). The neural network consists of d input nodes, matching the number of features, L hidden nodes, and m output nodes, matching the number of classes (Figure 1). Unlike backpropagation, ELM aims to achieve not only the minimum training error, but also the smallest norm of weights. The output function of ELM is (Huang et al., 2015):

$$f_L(x) = \sum_{i=1}^{L} \beta_i h_i(x) = h(x)\beta \qquad (1)$$

where $\beta = [\beta_1, \ldots, \beta_L]^T$ is the output weight between the hidden layer nodes and the output nodes, and $h(x) = [h_1(x), \ldots, h_L(x)]$ is the ELM nonlinear feature mapping (Figure 2), where $h_i(x)$ is the output of the i-th hidden node. The output function in hidden nodes may use a different activation function for each node. The output function in hidden nodes is notated as follows:

$$h_i(x) = G(a_i, b_i, x), \quad a_i \in \mathbb{R}^d, \; b_i \in \mathbb{R} \qquad (2)$$

where $G(a, b, x)$, with hidden node parameters $(a, b)$, is a non-linear activation function. For an additive hidden node with activation function $g$, it takes the form:

$$G(a_i, b_i, x) = g\Big(\sum_{j=1}^{d} a_{ij} x_j + b_i\Big) \qquad (3)$$

where $x_j$ is the j-th input value, $a_{ij}$ is the random weight connecting the j-th input to the i-th hidden node, and $b_i$ is the bias of the i-th hidden node.
ELM consists of two main stages: random feature mapping and linear parameter solving. In the first stage, ELM randomly initializes the weights of the hidden layer nodes to map the input data into the ELM feature space. The hidden node parameters (a, b) are randomly drawn from a probability distribution. In the second stage, the weights connecting the hidden layer and the output layer (β) are solved by minimizing the output error:

$$\min_{\beta \in \mathbb{R}^{L \times m}} \| H\beta - T \|^2 \qquad (4)$$

where H is the hidden layer output matrix, T is the training target matrix, and $\|\cdot\|$ denotes the Frobenius norm.
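The two stages above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in this research: the sigmoid activation, the uniform initialization in [-1, 1], the function names `elm_train` and `elm_predict`, and the toy XOR data are all assumptions made for the example. The Moore-Penrose pseudoinverse is used here as one way to obtain the minimum-norm least-squares solution of Equation (4).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, rng):
    """X: (N, d) inputs, T: (N, m) one-hot targets (names are hypothetical)."""
    d = X.shape[1]
    # Stage 1: random feature mapping. Weights and biases are drawn once
    # and never updated (uniform [-1, 1] is an assumption of this sketch).
    A = rng.uniform(-1.0, 1.0, size=(d, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ A + b)            # hidden layer output matrix H
    # Stage 2: solve min ||H beta - T|| with the pseudoinverse, which
    # returns the least-squares solution of smallest norm.
    beta = np.linalg.pinv(H) @ T
    return A, b, beta

def elm_predict(X, A, b, beta):
    return sigmoid(X @ A + b) @ beta

# Toy data (XOR), chosen only to exercise the code.
rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
labels = np.array([0, 1, 1, 0])
T = np.eye(2)[labels]                 # one-hot targets, m = 2 classes

A, b, beta = elm_train(X, T, n_hidden=10, rng=rng)
pred = elm_predict(X, A, b, beta).argmax(axis=1)
```

With more hidden nodes than training samples, as in this toy case, the least-squares problem can typically be fit exactly; on real datasets the number of hidden nodes is a parameter to be chosen.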

Gain Ratio
Gain ratio is an improvement of information gain. Information gain is used for decision tree induction in ID3, while gain ratio is used in the C4.5 algorithm, which is an improvement of ID3 (Asha et al., 2010). Information gain is biased: it prefers features with many distinct values over features with few distinct values, even when the latter are more informative. For example, consider a unique feature of student data, such as the student ID in the student table of a database. A split on student ID creates a large number of partitions, one per record, because each record has a unique ID value (Asha et al., 2010). Every partition is then pure, so the information gain is maximal even though the split is useless for classifying new records.
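To make this bias concrete, the following worked example applies the entropy and information gain definitions of Equations (5) to (7) to a small table. The table is hypothetical and invented for illustration only: 8 students, 4 of whom pass and 4 fail, with an attendance attribute that is "high" for 3 students (all pass) and "low" for 5 students (1 pass, 4 fail).

```latex
% Hypothetical data: 8 students, 4 pass / 4 fail.
% Attendance: 3 "high" (3 pass, 0 fail), 5 "low" (1 pass, 4 fail).
\begin{align*}
I(S) &= -\tfrac{4}{8}\log_2\tfrac{4}{8} - \tfrac{4}{8}\log_2\tfrac{4}{8} = 1 \\
E(\mathrm{ID}) &= \sum_{j=1}^{8} \tfrac{1}{8}\cdot 0 = 0,
  & Gain(\mathrm{ID}) &= 1 - 0 = 1 \\
E(\mathrm{Attendance}) &= \tfrac{3}{8}\cdot 0 + \tfrac{5}{8}\cdot 0.722 \approx 0.451,
  & Gain(\mathrm{Attendance}) &\approx 1 - 0.451 = 0.549
\end{align*}
```

The student ID reaches the largest possible gain (1 bit) although it says nothing about an unseen student, while the attendance attribute scores lower.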
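Let S be a set of data samples and m the number of classes. The entropy, or expected information needed to classify a sample, is:

$$I(S) = -\sum_{i=1}^{m} p_i \log_2 p_i \qquad (5)$$

where $p_i$ is the probability that a sample belongs to class $C_i$.
Let feature/attribute A have v distinct values. Let $s_{ij}$ be the number of samples of class $C_i$ in subset $S_j$, where $S_j$ consists of the samples in S that have value $a_j$ of A, so that $|S_j| = s_{1j} + \cdots + s_{mj}$. The entropy based on the partition into subsets by attribute A is:

$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{|S|} \, I(S_j) \qquad (6)$$

The information gain obtained by branching on attribute A is:

$$Gain(A) = I(S) - E(A) \qquad (7)$$

C4.5 uses the gain ratio, which normalizes the information gain by the split information:

$$SplitInfo_A(S) = -\sum_{j=1}^{v} \frac{|S_j|}{|S|} \log_2 \frac{|S_j|}{|S|} \qquad (8)$$

Gain ratio is then computed using the following formula:

$$GainRatio(A) = \frac{Gain(A)}{SplitInfo_A(S)}$$

The attribute with the highest gain ratio is selected as the splitting attribute.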
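The gain ratio computation can be sketched in plain Python as follows. This is a minimal sketch for categorical attributes, not the code used in this research; the function names and the eight-record student table are hypothetical and serve only to show how normalization by the split information penalizes an attribute with many distinct values.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy in bits of the empirical distribution of `items`."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())

def gain_and_ratio(values, labels):
    """Return (information gain, gain ratio) of one categorical attribute."""
    n = len(labels)
    subsets = {}                                   # attribute value -> class labels
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    expected = sum(len(s) / n * entropy(s) for s in subsets.values())  # Eq. (6)
    gain = entropy(labels) - expected              # Eq. (7)
    split_info = entropy(values)                   # Eq. (8): entropy of partition sizes
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio

# Hypothetical student table (illustration only).
passed     = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
student_id = [1, 2, 3, 4, 5, 6, 7, 8]              # unique per record
attendance = ["high", "high", "high", "low", "low", "low", "low", "low"]

gain_id, ratio_id = gain_and_ratio(student_id, passed)
gain_att, ratio_att = gain_and_ratio(attendance, passed)
```

On this table the student ID has the higher information gain (1.0 against roughly 0.55), but its split information is log2(8) = 3, so its gain ratio drops to 1/3, below the roughly 0.575 of the attendance attribute. Continuous features, such as those in Iris, would need to be discretized first; that step is outside this sketch.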

Feature Weight Adjusted on Extreme Learning Machine
The proposed approach is described in the FW-ELM algorithm in Figure 2. Basic ELM uses only random weights as parameters for the hidden layer nodes. In the proposed method, we first calculate the feature role using the gain ratio method for all features in the dataset. These feature roles are then used as multiplier factors, together with the ELM random weights, on the inputs of the neural network.
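This research added one weight, i.e. the feature role, which was obtained using the gain ratio method. With the additional feature weight, the formulation in the hidden layer becomes:

$$G(a_i, b_i, x) = g\Big(\sum_{j=1}^{d} w_j \, a_{ij} \, x_j + b_i\Big)$$

where $w_j$ is the feature weight of the j-th input (feature) obtained using the gain ratio method, $x_j$ is the j-th input value, $a_{ij}$ is the random weight connecting the j-th input to the i-th hidden node, $b_i$ is the bias of the i-th hidden node, and $g$ is the activation function.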
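The modification can be sketched as a one-line change to a basic ELM: each random input weight is multiplied by the feature weight of the input it is attached to. The sketch below is an illustration under stated assumptions, not the FW-ELM code of this research: the sigmoid activation, the uniform initialization, the pseudoinverse solution, the function names, the synthetic data, and the feature weight values are all hypothetical (the weights are placeholders, not gain ratio values from Table 2).

```python
import numpy as np

def _hidden(X, A_fw, b):
    # Sigmoid hidden layer output (activation choice is an assumption).
    return 1.0 / (1.0 + np.exp(-(X @ A_fw + b)))

def fw_elm_train(X, T, feature_weights, n_hidden, rng):
    """feature_weights: (d,) array, e.g. one gain ratio value per feature."""
    d = X.shape[1]
    A = rng.uniform(-1.0, 1.0, size=(d, n_hidden))   # random weights a_ij
    b = rng.uniform(-1.0, 1.0, size=n_hidden)        # random biases b_i
    # Feature weighting: row j of A (all weights leaving input j)
    # is scaled by the feature weight w_j.
    A_fw = A * feature_weights[:, None]
    H = _hidden(X, A_fw, b)
    beta = np.linalg.pinv(H) @ T                     # output weights
    return A_fw, b, beta

def fw_elm_predict(X, A_fw, b, beta):
    return _hidden(X, A_fw, b) @ beta

# Synthetic data: feature 0 determines the class, feature 1 is noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
labels = (X[:, 0] > 0).astype(int)
T = np.eye(2)[labels]

w = np.array([1.0, 0.0])      # placeholder weights; 0 switches a feature off
A_fw, b, beta = fw_elm_train(X, T, w, n_hidden=5, rng=rng)

# Perturbing the zero-weighted feature leaves the outputs unchanged.
X_shifted = X.copy()
X_shifted[:, 1] += 10.0
out_a = fw_elm_predict(X, A_fw, b, beta)
out_b = fw_elm_predict(X_shifted, A_fw, b, beta)
```

A feature with a weight near zero contributes almost nothing to the hidden layer, while a feature with a large weight dominates the random projection, which is the intended effect of scaling by the feature role.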

RESULTS AND DISCUSSION
Several datasets from the UCI database were used for testing, as listed in Table 1: Iris, Breast Wisconsin, and Dermatology. The feature weight computation phase using gain ratio produced a ranking of features from the most to the least important, as listed in Table 2, along with their respective gain ratio values.
Trials were performed 10 times for each dataset, and all data were used for both training and testing. The parameter assessed was accuracy. Table 3 shows a comparison of SLFN tests between Feature Weighted-ELM (FW-ELM), ELM, and GR-VNNPSO (Anggraeny et al., 2013), in terms of classification accuracy (%) and computing time (s). For the original ELM method by Huang et al. (2004), the datasets used in the published paper differ from ours. However, we were able to obtain the source code from Huang's website and ran it on our datasets. The comparison between ELM and FW-ELM aimed to investigate the effect of the added variable, i.e. the feature weight, in the ELM architecture. In the comparison between FW-ELM and GR-VNNPSO, the latter also applies gain ratio before the classification process, but uses a different classifier. All trials used all features of each dataset.
Based on the trial results shown in Table 3, FW-ELM gave better accuracy than ELM on all three datasets. This shows that adding feature weight parameters to the ELM configuration was able to improve accuracy. In terms of training time, FW-ELM was faster than ELM, provided that the feature weight computation is carried out outside the classification engine. Compared with GR-VNNPSO, FW-ELM had lower accuracy on all datasets but a shorter computing time.

Figure-1. ELM Architecture (Huang et al., 2015).

Among the features of a dataset, some contribute substantially to the classification process, while others weaken it. As a future improvement, this research can be extended by adding a feature selection method, so that only the features with major roles are used in classification.