Predicting China's Economic Running State Using Machine Learning

China's macroeconomic business indexes include the early warning index, coincidence index, leading index and lagging index, among which the early warning index reflects the economic running state. However, obtaining these indexes is a complex and daunting task. To simplify the task, this article explores how to use machine learning algorithms, including multiple linear regression (MLR), support vector machine regression (SVM), random forest (RF), artificial neural network (ANN) and extreme learning machine (ELM), to accurately predict the early warning index. We find that the early warning index can be well predicted by these algorithms with the coincidence index, leading index and lagging index as input variables. Furthermore, extreme learning machine and random forest are superior to the other methods.


Introduction
The macroeconomic business indexes, composed of the coincidence index, leading index, early warning index and lagging index, reflect the strength of national economic growth momentum. The coincidence index reflects the basic trend of the current economy and is composed of industrial production, employment, social demand (investment, consumption, foreign trade), social income (national taxation, corporate profits and household income), etc. The leading index is used to predict the future trend of the economy. The lagging index is mainly used to confirm the peaks and valleys of the economic cycle. The early warning index classifies the state of economic operation into five levels: "red light", "yellow light", "green light", "light blue light" and "blue light". "Red light" means an overheated economy, "yellow light" a partially heated economy, "green light" normal economic operation, "light blue light" a cold economy, and "blue light" an excessively cold economy.
The economic sentiment index is derived from the business climate survey, a statistical investigation system compiled from regular questionnaire surveys of entrepreneurs about their judgments and expectations of enterprise operation and macroeconomic conditions. It reflects the production and operation status of enterprises and the state of the economy, and is used to predict the future trend of economic development. The process of obtaining the indexes is complex and time-consuming [1][2][3], so it is of great significance to explore new ways to obtain them more easily.
Machine learning has been booming in recent years. A notable feature of machine learning is its strong predictive ability, and it has been widely used in various fields [5][6][7][8][9][10][11][12][13][14][15][16]. Support vector machine regression was originally proposed by Vapnik in 1995 and has been widely used in many areas. Researchers have used it to predict the pressure drop during evaporation of R407C [17], the toxicity of ionic liquids [18][19], partition coefficients [20], etc. Artificial neural networks have been a research hotspot in the field of artificial intelligence since the 1980s. Extreme learning machine was proposed by Huang et al. and applied to classification in 2014 [21]. ELM is a type of feedforward neural network, and it can achieve good generalization performance at extremely fast learning speed in different fields [22]. Simple linear regression, multiple linear regression, logistic regression, random forest, etc. are also widely used in various fields to make predictions.
In this paper, we explore the relationships among the early warning index, coincidence index, leading index and lagging index based on machine learning methods. We observe that the early warning index, which reflects the economic running state, can be well predicted with the coincidence index, leading index and lagging index as input variables. Moreover, extreme learning machine and the random forest approach are superior to the other machine learning methods. The main contribution of this study is to provide an easy way to obtain the early warning index.

Model and Data
To test the predictive capability of the different machine learning methods, with the coincidence index, lagging index and leading index as inputs and the early warning index as output, we rely on a data set from CEIC. The data set consists of the monthly macroeconomic business indexes from 1991 to 2017, a total of 324 observations. The data need to be preprocessed before prediction; they are normalized by equation (1):
$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} \tag{1}$$
where $X_{min}$ and $X_{max}$ are the minimum and maximum values of the corresponding index, $X$ is the actual value of the index, and $X_{norm}$ is the normalized value of the index. Furthermore, 75% of the data was used for training and 25% for testing.
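The preprocessing described above can be sketched as follows. This is only a minimal illustration: it assumes min-max scaling to the range [0, 1] and a chronological 75/25 split (the text does not say whether the split was chronological or random), and the function names are illustrative rather than taken from the study.

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in values]


def chronological_split(rows, train_fraction=0.75):
    """Split rows into a leading training part and a trailing test part.

    Assumption: the split is chronological; the paper only gives the ratio.
    """
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Toy usage with placeholder values, not the CEIC data.
scaled = min_max_normalize([2.0, 4.0, 6.0])
train_rows, test_rows = chronological_split(list(range(324)))
```

With 324 monthly observations, this split gives 243 training rows and 81 test rows.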
In this paper, five machine learning methods are explored to predict the economic running state, i.e., multiple linear regression, support vector machine regression, random forest, artificial neural network and extreme learning machine. Mean square error (MSE) and the squared correlation coefficient ($R^2$) are selected to evaluate model performance. MSE is the mean of the squared errors between the predicted and the actual values; the closer it is to 0, the more accurate the prediction. $R^2$ describes the goodness of fit of the model; the closer it is to 1, the better the fit. MSE and $R^2$ are defined as follows:
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_{exp,i} - y_{act,i}\right)^2$$
$$R^2 = \frac{\left(n\sum_{i=1}^{n} y_{exp,i}\,y_{act,i} - \sum_{i=1}^{n} y_{exp,i}\sum_{i=1}^{n} y_{act,i}\right)^2}{\left(n\sum_{i=1}^{n} y_{exp,i}^2 - \left(\sum_{i=1}^{n} y_{exp,i}\right)^2\right)\left(n\sum_{i=1}^{n} y_{act,i}^2 - \left(\sum_{i=1}^{n} y_{act,i}\right)^2\right)}$$
where $y_{exp}$ is the predicted value (output), $y_{act}$ is the actual value, and $n$ is the number of samples.
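A minimal sketch of the two evaluation metrics is given below. It assumes that the squared correlation coefficient is the squared Pearson correlation between predicted and actual values, as its name in the text suggests; the function names are illustrative.

```python
def mse(y_exp, y_act):
    """Mean of the squared differences between predicted and actual values."""
    n = len(y_act)
    return sum((p - a) ** 2 for p, a in zip(y_exp, y_act)) / n


def r_squared(y_exp, y_act):
    """Squared Pearson correlation between predicted and actual values.

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(y_act)
    s_p, s_a = sum(y_exp), sum(y_act)
    s_pa = sum(p * a for p, a in zip(y_exp, y_act))
    s_pp = sum(p * p for p in y_exp)
    s_aa = sum(a * a for a in y_act)
    numerator = (n * s_pa - s_p * s_a) ** 2
    denominator = (n * s_pp - s_p ** 2) * (n * s_aa - s_a ** 2)
    return numerator / denominator
```

Note that a squared correlation of 1 only indicates a perfect linear relation between predictions and actual values, so it is best read together with the MSE.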

Multiple linear regression (MLR)
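Multiple linear regression is a simple and commonly used modeling method. Its main goal is to find the linear function that best fits the given data. Linear regression is given by equation (2):
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon \tag{2}$$
where $x_1, \ldots, x_p$ are the input variables, $\beta_0, \ldots, \beta_p$ are the regression coefficients and $\varepsilon$ is the error term. When a data set $\{(x_{i1}, \ldots, x_{ip}, y_i)\}_{i=1}^{n}$ is given, the error (RSS) is given by equation (3):
$$RSS = \sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\right)^2 \tag{3}$$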
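As an illustration, the coefficients that minimize the residual sum of squares can be obtained with an ordinary least-squares solver. The sketch below uses NumPy and synthetic data; the helper names and the demo coefficients are assumptions for illustration and are not the values fitted in this study.

```python
import numpy as np


def fit_mlr(X, y):
    """Return [intercept, coef_1, ..., coef_p] minimizing the RSS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept is estimated as well.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict_mlr(beta, X):
    """Evaluate the fitted linear model on new inputs."""
    X = np.asarray(X, dtype=float)
    return beta[0] + X @ beta[1:]


# Synthetic, noise-free data with three inputs (placeholder values).
rng = np.random.default_rng(0)
X_demo = rng.random((50, 3))
y_demo = 0.2 + X_demo @ np.array([0.5, -0.3, 1.1])
beta_demo = fit_mlr(X_demo, y_demo)
```

Because the synthetic target is exactly linear, the solver recovers the generating coefficients up to floating-point error.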

Support vector machine regression (SVM)
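The main purpose of SVM is to obtain an optimal hyperplane. Support vector machine regression relaxes the constraint on the error and no longer considers residuals in the training data set that are smaller than a tolerance $\varepsilon$. The goal of SVM regression is therefore to find a function
$$f(x) = \sum_{m} w_m \phi_m(x) + b$$
where $\phi_m$ is the kernel-induced function that maps the data to a high-dimensional space, $w_m$ are the respective weights and $b$ is the bias.
SVM treats the regression problem as an optimization problem. In the standard $\varepsilon$-insensitive formulation, the objective function is given by
$$\min_{w,\,b,\,\xi,\,\xi^*}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)$$
subject to $y_i - f(x_i) \le \varepsilon + \xi_i$, $f(x_i) - y_i \le \varepsilon + \xi_i^*$ and $\xi_i, \xi_i^* \ge 0$, where $C$ is the penalty parameter and $\xi_i$, $\xi_i^*$ are slack variables.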
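SVM regression is commonly trained with an epsilon-insensitive loss, which ignores residuals smaller than a tolerance and penalizes larger ones linearly. The sketch below illustrates only this loss, under the assumption that the standard formulation applies here; the tolerance value is a placeholder, since the text does not report the one used.

```python
def epsilon_insensitive_loss(y_pred, y_true, epsilon=0.1):
    """Per-sample loss: zero inside the epsilon tube, linear outside it."""
    return [max(0.0, abs(p - t) - epsilon) for p, t in zip(y_pred, y_true)]


# Placeholder predictions against a constant target of 1.0.
losses = epsilon_insensitive_loss([1.0, 1.05, 2.0], [1.0, 1.0, 1.0], epsilon=0.1)
```

Only the third prediction lies outside the tube, so only it contributes to the loss.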

Random forest (RF)
Random forest is characterized by bagging and random feature selection. The RF approach selects a subset of features as split candidates at each node during tree formation, and each tree is built independently using a bootstrap sample of the training data. The general algorithm for RF is as follows:
1. The bagging idea is used to randomly generate n sample subsets from the training data set.
2. Each sample subset is used to build one tree, giving n trees; each tree grows freely without pruning, and together the trees form a forest.
3. The output for a new data set is predicted. The predicted values are the average of the predicted results of all decision trees.
The output of the RF prediction can be expressed as equation (5):
$$\hat{y} = \frac{1}{n}\sum_{j=1}^{n} f_j(x) \tag{5}$$
where $\hat{y}$ is the predicted value, and $f_j(x)$ is the individual prediction of tree $j$ for an input vector $x$.
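The three steps above can be sketched in a simplified form. As a loudly stated assumption, the example below uses depth-one regression trees (stumps) instead of fully grown trees so that it stays short; it still draws a bootstrap sample and a random feature subset for every tree and averages the individual tree predictions. All names, the synthetic data and the parameter values are illustrative.

```python
import numpy as np


def fit_stump(X, y, feature_ids):
    """Fit a depth-one regression tree using only the candidate features."""
    best = None
    for j in feature_ids:
        values = np.unique(X[:, j])
        # Candidate thresholds are midpoints between sorted unique values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= t
            l_mean, r_mean = y[left].mean(), y[~left].mean()
            sse = ((y[left] - l_mean) ** 2).sum() + ((y[~left] - r_mean) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, l_mean, r_mean)
    if best is None:  # every candidate feature is constant: predict the mean
        m = y.mean()
        return (0, np.inf, m, m)
    return best[1:]


def predict_stump(stump, X):
    j, t, l_mean, r_mean = stump
    return np.where(X[:, j] <= t, l_mean, r_mean)


def fit_forest(X, y, n_trees=20, n_sub_features=2, seed=0):
    """Bagging plus random feature selection over simplified trees."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)  # bootstrap sample, with replacement
        feats = rng.choice(p, size=min(n_sub_features, p), replace=False)
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest


def predict_forest(forest, X):
    """Average the individual tree predictions."""
    return np.mean([predict_stump(s, X) for s in forest], axis=0)


# Synthetic placeholder data with three inputs.
rng = np.random.default_rng(1)
X_demo = rng.random((80, 3))
y_demo = X_demo[:, 2] + 0.1 * X_demo[:, 0]
forest = fit_forest(X_demo, y_demo, n_trees=20)
pred = predict_forest(forest, X_demo)
```

A full random forest would grow each tree to many nodes and redraw the feature subset at every split; a library implementation is preferable for real use.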

Artificial neural network (ANN)
The artificial neural network algorithm is based on a mathematical model inspired by the behavior of biological neurons. It abstracts the neural network of the human brain from a certain point of view to establish a simple model, and forms different networks according to different connection modes. An artificial neural network consists of an input layer, hidden layers and an output layer, where each hidden layer contains a given number of neurons that take input from the input layer and connect their outputs to the output layer. If the artificial neural network has more than one hidden layer, the outermost hidden layer connects the innermost hidden layer to the output layer. Each line connecting two neurons is associated with a given weight. In the hidden layer, the output of a neuron can be obtained from equation (6):
$$y_j = f\left(\sum_{i} w_{ij} x_i + b_j\right) \tag{6}$$
where $x_i$ are the inputs of the neuron, $w_{ij}$ are the weights of the connections, $b_j$ is the bias of neuron $j$, and $f$ is the activation function.
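The output of a hidden neuron can be illustrated with a short sketch: a weighted sum of the inputs plus a bias, passed through an activation function. The sigmoid activation, the function names and the numeric values below are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def hidden_layer_output(inputs, weight_rows, biases):
    """Outputs of all neurons in one hidden layer (one weight row per neuron)."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Three normalized inputs feeding a hidden layer of five neurons
# (placeholder weights, not fitted values).
x_demo = [0.2, 0.5, 0.9]
weights_demo = [[0.1 * (k + 1), -0.2, 0.3] for k in range(5)]
hidden_demo = hidden_layer_output(x_demo, weights_demo, [0.0] * 5)
```

Training the network then amounts to adjusting these weights and biases, for example by backpropagation, which is outside the scope of this sketch.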

Extreme learning machine (ELM)
ELM is a new algorithm for single-hidden-layer feedforward neural networks. Traditional feedforward neural networks suffer from slow training, a high chance of falling into a local minimum, sensitivity to the choice of learning rate and other shortcomings. In contrast, the ELM method randomly generates the weights between the input layer and the hidden layer and the thresholds of the hidden neurons, leaves them unadjusted during training, and only needs the number of hidden neurons to be set to obtain the optimal solution. ELM is characterized by the use of a random, independent nonlinear feature transformation. It inherently has two important characteristics: interpolation capability and universal approximation capability. ELM is widely applied in various fields because of its extremely fast learning speed.
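A minimal sketch of this training scheme is shown below: the hidden weights and thresholds are drawn at random and kept fixed, and only the output weights are solved for, here by a least-squares fit through the Moore-Penrose pseudoinverse. The Gaussian radial basis activation exp(-z^2), the weight range, the function names and the synthetic data are all assumptions for illustration.

```python
import numpy as np


def elm_fit(X, y, n_hidden=10, seed=0):
    """Train an ELM: random fixed hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input-to-hidden weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden thresholds
    H = np.exp(-(X @ W + b) ** 2)                            # hidden-layer outputs
    beta = np.linalg.pinv(H) @ y                             # output weights
    return W, b, beta


def elm_predict(model, X):
    """Apply the fixed hidden layer and the fitted output weights."""
    W, b, beta = model
    return np.exp(-(X @ W + b) ** 2) @ beta


# Synthetic placeholder data with three inputs.
rng = np.random.default_rng(1)
X_demo = rng.random((60, 3))
y_demo = X_demo[:, 2] + 0.1 * X_demo[:, 0]
model = elm_fit(X_demo, y_demo, n_hidden=10)
train_mse = float(np.mean((elm_predict(model, X_demo) - y_demo) ** 2))
```

Since no iterative weight updates are needed, training reduces to one matrix pseudoinverse, which is the source of the fast learning speed; the number of hidden neurons remains the main setting to tune.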

Results and discussion
The outcomes of multiple linear regression, support vector machine regression, random forest, artificial neural network and extreme learning machine are presented in Figure 1, Figure 2, Figure 3, Figure 4 and Figure 5, respectively. In each figure, the predictions for the training set and the test set are shown separately and highlighted in blue, and the coefficient of determination $R^2$ (calculated on either the training or the test set) is indicated.
Figure 1 shows the predictions of multiple linear regression, which provides an acceptable model: the MSE and $R^2$ values are 0.0047 and 0.9014 for the training set and 0.0061 and 0.9037 for the test set. The regression coefficients are -0.01467, -0.06193 and 1.1157.
The SVM method can be used to train a model on both linear and non-linear data. Figure 2 compares the values predicted by the SVM method with the measured values. The SVM method also gives a good prediction: the MSE and $R^2$ values are 0.0035 and 0.9259 for the training set and 0.0057 and 0.9098 for the test set. The RBF function is selected as the kernel function.
In general, the RF approach can be optimized by increasing the number of trees, but this can ultimately lead to overfitting. To find the best number of trees, we gradually increase the number of trees from 5 to 100. The $R^2$ and MSE values of the test set obtained in this process are listed in Table 1. The number of trees has little effect on the prediction. MSE reaches its minimum and $R^2$ its maximum when the number of trees is 20, so the optimal number of trees is identified to be 20.
The predictive ability of the artificial neural network method can be improved by increasing the number of hidden layers and neurons, just as that of the random forest method can be improved by increasing the number of trees, and this can likewise lead to overfitting. We therefore increase the number of neurons from 1 to 20 with one hidden layer and observe the corresponding changes. We find that the optimal number of neurons is 5, and the sigmoid function is chosen as the activation function.
In the same way, the number of neurons in the ELM is varied to predict the early warning index. The optimal number of neurons is finally determined to be 10, and the radbas function is selected as the activation function. The comparison between the results predicted by the ELM method and the real values is shown in Figure 5.
The comparison of the outcomes of the different machine learning methods is shown in Table 2, which suggests that all of these methods can make a good prediction. RF and ELM both give excellent predictions of the early warning index, and RF is superior to the other machine learning methods.
In this study, to address the problem that obtaining the macroeconomic business indexes is a complex and daunting task, the indexes are explored with five machine learning methods. The early warning index can be well predicted by all five methods, and accurately predicted by the RF and ELM approaches, with the other indexes as inputs. Machine learning models therefore provide a more convenient way to obtain the early warning index.