Finding input-output dependencies of feed-forward neural networks

Abstract. In this paper we identify the input-output dependencies of a feed-forward neural network, which usually behaves as a black box. Finding or evaluating these dependencies is both important and difficult, especially for multi-input/output data approximation. We use a small neural network trained on given data in MathWorks MATLAB. The network is then simulated in a standalone .NET application.


Introduction
Artificial neural networks are powerful tools whose specific property is that they can adapt their behavior to the given data. On the other hand, artificial neural networks are also models which store their behavior in parameters such as biases and weights. These models can have many parameters, from tens to thousands. When a model has thousands of weight and bias parameters, it is practically impossible to determine which input neuron affects a certain output neuron the most. In this paper we work with simple discovery and plotting of the dependency between multiple input quantities and one output quantity. Delineating the full input-output dependency directly is impossible, because we would need to draw it in n-dimensional space, where n is the sum of the input and output neurons. In this paper we used simulation data which we generated manually; the data generation algorithm is not the subject of this research.
The computational mechanism of an artificial neural network is quite simple; what is more complicated is finding the optimized parameters of the network, i.e. training the model [1]. To better understand how the network calculates its output [1,2], we can define the weighted input in as the sum of the weights multiplied by the inputs to the artificial neuron (eq. 1):

in = w_1·x_1 + w_2·x_2 + … + w_n·x_n = Σ_{i=1}^{n} w_i·x_i   (1)

In this calculation of the weighted input in, we feed the artificial neuron model with n input values, which we can denote x_1, …, x_n. The significance and amount of impact of every input is expressed by its weight; every input x_i has its own weight parameter w_i [2]. Once the weighted input value is calculated, it is necessary to calculate the activation of the neuron. This operation is simple: the weighted input is passed through the activation function. There are various types of activation functions, such as the hyperbolic tangent, the logistic sigmoid, the linear activation function, etc. These functions can differ in every layer of the artificial neural network. When the backpropagation method is chosen for training, the activation functions must be differentiable. The activation function equation (eq. 2) is given below [3]:

a = f(in)   (2)
We decided to use the hyperbolic tangent activation function in the hidden layers because of its stronger gradient (in comparison with the logistic sigmoid) [3].
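The weighted-input and activation steps described above can be sketched as follows (a minimal illustration in Python; the paper's actual implementation uses MATLAB and C#, and the example values are arbitrary):

```python
import math

def neuron_output(inputs, weights, bias):
    """Compute the weighted input 'in' (eq. 1) and pass it
    through the hyperbolic-tangent activation (eq. 2)."""
    weighted_input = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(weighted_input)

# Example: one artificial neuron with three inputs
a = neuron_output([1.0, 0.5, -0.5], [0.2, -0.4, 0.1], 0.05)
print(a)
```

The same two-step calculation (weighted sum, then activation) is repeated for every neuron in every layer of the feed-forward network.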

Simulation model
For this paper we designed a simulation model which simulates a technological process. The behavior of this model is described by the generated data. Some noise was also added into the training data for experimental purposes. There are several inputs which affect the output quantity of the product. We design a simple feed-forward neural network which approximates the behavior function from the training data. This model is trained in MATLAB using the Neural Network Toolbox. After the training, the model is simulated in a standalone application written in the .NET Framework. You can see the simulation model and the physical quantity description in the figure below [4,5].

Creating the structure of the simulation model in MATLAB
In this case the artificial neural network performs a task called function approximation.
The structure of the model depends on the features of the function (number of inputs, outputs, etc.). The input layer obviously corresponds to the input parameters of the function, which in this case number 8. The output layer has 1 neuron, which represents and stores the values of the product quality [5,6]. The created structure of the neural network model is shown in Fig. 2, together with the activation functions used in every layer, the connections between layers, and the size of every layer [7]. For the hidden layers we performed several tests, which showed that we need to design 3 hidden layers: the minimum size of the first hidden layer is 150 neurons, the second hidden layer should contain 200 neurons, and the last hidden layer should contain 100 neurons. We had two options for achieving this. First, we could use the nntool toolbox to create this type of feed-forward neural network [7]. Alternatively, a MATLAB script can be written which creates the desired neural network with the defined structure. With this script we created a neural network with 51300 weight parameters and 451 bias parameters, which we need to optimize properly, i.e. train. We want to find the global optimum of the error function (given by the training dataset) in n-dimensional space (n = 51751) [8,9]. MATLAB also offers various optimization methods for training the neural network, for example Levenberg-Marquardt, Quasi-Newton, Scaled Conjugate Gradient, Conjugate Gradient with Powell/Beale Restarts, Resilient Backpropagation, One Step Secant, etc. [7,9].

Training the designed model in MATLAB
We decided to use the SCG (Scaled Conjugate Gradient) method for this paper. This optimization method can train any neural network whose activation functions are differentiable. The training ends when one of several preconditions [7] is met:
- the maximum number of epochs (iterations) was reached during the training,
- the maximum amount of training time has elapsed,
- the performance goal of the trained model was reached,
- the performance gradient falls below a defined minimal value,
- the maximum count of validation check failures was reached.
In comparison with the Levenberg-Marquardt optimization algorithm, SCG does not use a line search per learning iteration. We need to write all parameters of the network in an appropriate way, as a vector in real Euclidean space (eq. 3). The reason is the optimization performed by this algorithm [7]: the SCG algorithm performs matrix mathematical operations, and that is why we need the trained parameters in this vector format [9].
where w⃗ is the vector of the weight and bias parameters and w_ij^(l) is the weight parameter of node i in layer l coming from node j of the previous layer. Before the learning algorithm starts, it is necessary to set all weight and bias parameters to small random values near 0, which gives the initialized vector w⃗. After that, a specific optimization strategy is performed during the training, which is completely different from the Levenberg-Marquardt backpropagation algorithm. Another approach (the Levenberg-Marquardt or Gauss-Newton training methods) is to use gradient descent. In this case we have a defined error function E (also known as the cost function C). This function is often defined as the mean squared error MSE, the mean absolute error MAE, or the RMSE [10] (eq. 4):

E = MSE = (1/N) Σ_{i=1}^{N} (Y_i − Out_i)²   (4)
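Writing all weights and biases as one parameter vector (eq. 3) can be illustrated like this (a hedged Python sketch; the toy 2-3-1 network and its initial values are placeholders, not the paper's trained parameters):

```python
import random

def flatten_parameters(weight_matrices, bias_vectors):
    """Concatenate all weight matrices (lists of rows) and all bias
    vectors into one flat parameter vector for the optimizer."""
    flat = []
    for w in weight_matrices:
        for row in w:
            flat.extend(row)
    for b in bias_vectors:
        flat.extend(b)
    return flat

def init_small(rows, cols, scale=0.01):
    """Initialise a matrix with small random values near 0."""
    return [[random.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]

# Toy 2-3-1 network: 6 + 3 weights, 3 + 1 biases
W = [init_small(3, 2), init_small(1, 3)]
b = [[0.0] * 3, [0.0]]
theta = flatten_parameters(W, b)
print(len(theta))  # 13 parameters in one vector
```

SCG then treats this single vector as a point in Euclidean space and moves it along conjugate directions of the error surface.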
where N is the number of training samples, Y is the target output of the network, Out is the output from the network after training, and i is the index of the training sample. Out represents the complete forward calculation of the neural network. The Levenberg-Marquardt and Gauss-Newton approaches use this error function E for weight and bias adjustment. The generated data used in this paper were randomly divided into testing, training, and validation sets. In the figure below you can see the correlation between the training (horizontal axis) and real (vertical axis) output of the model; every circle on this plot represents one sample (model output and target). The easiest way to handle the noise is to use exponential smoothing. By adding this filter, you can eliminate the peaks in the signal and simplify the training process for the neural network. However, you can see in Fig. 4 that the neural network can approximate the function very well even with the noise present. After the training process we obtained a model which approximates the function y describing how the product quality depends on the physical quantities.
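The error function of eq. 4 and the exponential smoothing filter mentioned above can be sketched in Python (illustration only; the paper's computations were done in MATLAB, and the sample values below are arbitrary):

```python
def mean_squared_error(targets, outputs):
    """Error function E from eq. 4: mean of the squared differences
    between target values Y and network outputs Out."""
    n = len(targets)
    return sum((y - out) ** 2 for y, out in zip(targets, outputs)) / n

def exponential_smoothing(signal, alpha=0.3):
    """Simple exponential smoothing to suppress noise peaks in the
    training data (smaller alpha = stronger smoothing)."""
    smoothed = [signal[0]]
    for x in signal[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
print(exponential_smoothing([0.0, 1.0, 1.0], alpha=0.5))
```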

Evaluating dependencies
After the training we have a specific model, and we can simulate various situations and conditions. If we want to evaluate the dependency of one physical quantity on the output quantity of the neural network, we need to set the other inputs of the neural network to constant values. We can choose and test various statistical parameters, for example the mean, the median, the mode, etc. [10]. In this paper we chose the mean for the constant input values. When the inputs are set to constant values, we can simply increment the investigated input by a small value and calculate the output from the neural network. You can see the functionality of this process in the C# script below. In this case we want to investigate how the vibrations affect the dependency between the product quality and the speed quantity. First, we set the necessary inputs to constant values. After that we evaluate the product quality-speed curve while the vibrations quantity has a constant value, for example 550 m/s^2. When the curve is evaluated, we simply increment this value by 250 and repeat the process; the range of the vibrations is thus 550-2050 m/s^2. After the evaluation process we obtain a total of 7 quality-speed dependency curves. The dependency of those quantities is given in the figure below. In the next test we investigate how the vibrations affect the dependency between the product quality and the pressure. The range of the vibrations is the same as in the previous test. The evaluated dependencies are shown in Fig. 6 below. We can see in Fig. 6 that when the pressure is low (in the green range of 0 MPa - 80 MPa), the vibrations affect the product quality much more than in the higher pressure range (red range, 80 MPa - 160 MPa). Also, the product quality increases with increasing pressure.
In the green highlighted area you can see a state where the output from the neural network was almost the same for every value of the vibrations.
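The dependency-evaluation procedure described above can be sketched as follows (Python instead of the paper's C#; the input indices and the stand-in network function are hypothetical, not the trained model):

```python
def quality_curve(net, constants, speed_index, speeds):
    """Sweep the speed input while all other inputs stay at their
    constant values, returning the product-quality curve."""
    curve = []
    for s in speeds:
        inputs = list(constants)
        inputs[speed_index] = s
        curve.append((s, net(inputs)))
    return curve

# Stand-in for the trained network (NOT the paper's model):
def dummy_net(inputs):
    return sum(inputs) / len(inputs)

SPEED, VIBRATION = 0, 1                # hypothetical input indices
means = [10.0] * 8                     # remaining inputs fixed at their means
speeds = [float(s) for s in range(0, 101, 10)]

curves = []
for vib in range(550, 2051, 250):      # vibrations 550..2050 m/s^2, step 250
    constants = list(means)
    constants[VIBRATION] = float(vib)
    curves.append(quality_curve(dummy_net, constants, SPEED, speeds))

print(len(curves))  # 7 quality-speed curves, one per vibration value
```

Plotting each curve against the swept input reproduces dependency figures of the kind shown in Fig. 5 and Fig. 6.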

Conclusion
Neural networks can be used as function approximators. The quality of this approximation depends on the given training data. In this paper we created a simulation data set containing 8 inputs and 1 output. After successful training in MATLAB we have a model which stores its behaviour in the weight and bias parameters. Those parameters were exported from MATLAB after training. We then created a .NET application in which we could test and evaluate the required situations and conditions of the simulation model. Several dependency graphs were created in this application, two of which we analysed in this paper. We can easily evaluate dependencies of the model as we did in chapter 3. We chose the vibration variable to investigate how it affects the quality-speed and quality-pressure dependencies.
As we can see, these artificial intelligence algorithms are also very powerful for data analysis. If there is a large data set containing records of some technological process, you can train a feed-forward neural network on it and obtain a good approximate simulation model. The accuracy of this model is given by the regression plot and the correlation coefficient; this regression plot is generated automatically during the training process in MATLAB. A disadvantage of this method is that the training data must be relevant, meaning that the variables in the training dataset should be in a correlating relationship. Another disadvantage is that the training dataset should contain every possible situation and condition of the technological process, including error conditions where the output has bad, undesired results. This is probably the biggest disadvantage of this method, because collecting and storing a large amount of data can be difficult in terms of the time required.