On the modification of Differential Evolution strategy for the construction of Runge Kutta pairs

Among the most popular methods for the solution of the Initial Value Problem are the Runge–Kutta (RK) pairs. These methods can be derived by solving a system of nonlinear equations after admitting various simplifying assumptions. The more simplifying assumptions we consider the more we narrow the solution space. In [1] Tsitouras presented an algorithm for the construction of Runge–Kutta pairs of orders 5 and 4 satisfying only the so called "first column simplifying assumption". In [2] Famelis and Tsitouras have studied the ability of Differential Evolution techniques to find solutions satisfying all the order conditions needed for the derivation of orders 5 and 4 pairs. In this work we propose an modification on the Differential Evolution strategy for the same problem. The current study will be a guide to the construction of other classes of RK that have not been presented in the literature.


Introduction
We consider the numerical solution of the non-stiff initial value problem, y = f (x, y), y(x 0 ) = y 0 ∈ IR m , x ≥ x 0 (1) where the function f : IR × IR m → IR m is assumed to be as smooth as necessary. For the approximate solution of the problem ( 1), the general s−stage embedded explicit Runge-Kutta pair of orders p(p − 1) advances the approximation of the solution from x n to x n+1 = x n + h n , n = 0, 1, 2, . . . using the formulaê a i j f j ), i = 1, 2, · · · , s.
As in both formulae c i , a i j are common the coefficientŝ b j , b j define the (p−1)−th and p−th order approximations respectively. The coefficients of such a pair of methods can be presented in a matrix-array form using the following Butcher Tableau [3,4] c A b b a e-mail: ifamelis@teiath.gr b e-mail: tsitouras@mail.teiste.gr where A ∈ IR s×s is strictly lower triangular, the column vectors b T ,b T , ∈ IR s and the line vector c ∈ IR s satisfies c = Ae, e = [1, 1, · · · , 1] T ∈ IR s .
Throughout this paper, we assume that local extrapolation is applied e.g. the p−th order approximation advances the integration. For the step size selection mechanism the local error estimate E n = y n −ŷ n is used. Given a tolerance parameter T OL, when E n ≤ T OL the mechanism h n+1 = 0.9 · h n · ( T OL E n ) 1 p furnishes the next step length. In case of E n > T OL, the current step is rejected and a new approximation of y n+1 is computed using as step size h n the outcome of the above formula.
In case that c s = 1, a s, j = b j for j = 1, 2, · · · , s − 1 and b s = 0 b s the First Stage of each new step is the same As the Last one of the previous stage. This device called FSAL, possibly introduced in [5], effectively reduces the stages of the pair by one to s − 1.
The Local Truncation Error (LTE) e n+1 is the error of the integration when y n = y(x n ) e.g. the previous step value is exact. LTE associated with a p−th order RK method is The Journal's name with Q qi algebraic functions of A, b, c and ξ qi positive integers. P qi are differentials of f evaluated at (x n , y n ) and T qi = 0 for q = 1, 2, · · · , p and i = 1, 2, · · · , λ q . λ q is the number of elementary differentials for each order which coincides with the number of rooted trees of order q. It is known that The construction of an effectively 6−stages FSAL Runge-Kutta pair of orders 5(4) (RK5(4)) requires the solution of a nonlinear system of 25 equations e.g. λ 1 + · · · + λ 5 = 17 order conditions for the higher order formula and λ 1 + · · · + λ 4 = 8 order conditions for the lower order formula. In Table 1 we present the 25 order conditions of a RK5(4) pair in matrix operation form. In these equations, the operation "*" is to be understood as component-wise multiplication and the power of an vector as a componentwise power, e.g. c 2 = c * c. The rest multiplications and the matrix powers are the known from linear algebra vector-matrix operations. So, our problem has a total of 28 parameters (unknowns), namely c 2 , . . . , c 6 , b 1 , . . . , b 6 , b 1 , . . . ,b 7 , a 32 , a 42 , a 43 , a 52 , a 53 , a 54 and a 62 , . . . , a 65 .
The exact solution of the resulted nonlinear system is out of question. Only after considering simplifying assumptions, we can use nonlinear optimization techniques to get accurate enough solutions. In practice nonlinear optimization techniques based on some kind estimation of derivatives such as conjugate gradient, back-propagation or other Newton-type methods methods are not easily applicable due to the nature of the problem. Instead nonlinear optimizers based to stochastic direct search seem to work very efficiently.

Differential Evolution
Optimization methods can be divided in two large classes. The former is continuous optimization where the search area and solutions are presumed to be situated in a certain continuous space with its metrics. The later is combinatorial optimization where the search area is limited by a finite number of feasible solutions. Depending on the nature of the objective function and the constrains, the former class is subdivided into linear programming , quadratic programming and nonlinear programming methods. The last subclass consists of local search methods and global methods.
The global methods can be either classical methods, where the global search is successfully realized as a sequence of solution of local optimization problems. Alternatively, we have metaheuristic methods which can be either population-based as Differential Evolution method, Particle Swarm method and genetic algorithms or neighborhood-based methods.
Combinatorial methods can either be exact methods where the enumeration of all sets of solutions results the global optimum. Due to computational cost such methods are appropriate for small scale problems. Alternatively we can consider approximate methods where a partial enumeration leads to a near to optimum solution with a bias. Approximate methods can be either heuristic which are designed for a certain problem and cannot usually used for other problems or metaheuristic algorithms. Such procedures can be used when we have mixed (continuous and discrete) parameters.
Differential Evolution (DE) [7-9] is a population based metaheuristic method which has become more and more popular for problems that either classical continuous and combinatorial methods fail to solve. Its virtues are that a) they do not require special conditions for the properties of the objective functions and the constrains, b) they can be applied in both continuous and combinatorial problems and c) they are extensible on multimodal and multiobjective optimization. Sensitivity of the process to the control parameters and the possible high computational cost are its drawbacks.
In an optimization problem we have an objective function (usually called a fitness function) f : B ⊆ IR D → IR for which we want to find a minimum point X gmin ∈ B where f attains its global minimum subject to some inequality constrains. B is the set of feasible points that satisfy the constrains. Our aim is to satisfy an optimization criterion which consists of the fitness function and the constrains. Usually nonlinear problems have many local minimum so the approximate problem solution is to find a X appr ∈ B which satisfies the constrains and f (X appr ) has a desirable precision VT R (value-to-reach) e.g. | f (X appr )| ≤ VT R.
DE is a very powerful tool for global optimization which the applies a procedure of evolution of a population of individuals in a intelligent manner. DE disposes three control parameters the population size NP, the differentiation constant F and the crossover constant Cr. Even though DE is more robust regarding control parameters (compared to Particle Swarm optimization or evolutionary algorithms) the proper choice of the control parameter values improves the procedure's convergence.
DE is an iterative process where in each iteration, called generation g, we work with a "population" of individuals IP g = {X g cin ∈ B, cin = 1, 2, . . . , NP}. Every member of this population is a potentially optimal solution and it is a set of D gens e.g. X g cin = {x g cin, j j = 1, 2, . . . , D}. In the first step of the algorithm, the initialization, an initial population IP 0 is considered, usually randomly and the fitness function is evaluated on it. Then, in each iteration (generation) a reproduction scheme updates all the individuals of the IP g performing a sequential procedure with 1st MINI CONFERENCE ON EMERGING ENGINEERING APPLICATIONS three phases: the Differentiation, the Crossover and the Selection.
In Differentiation phase three (up to five ) individuals X g 1 , . . . , X g 5 are chosen from the population. The Differentiation strategy is defined by the way of choice which can be based on random, directed, local or hybrid criteria. For each individual X g cin ∈ IP g the result of the Differentiation is the trial individual: where F is the Differentiation constant, β is the base vector and δ is the difference vector. The base vector can be chosen either randomly, without taking any information about the values of the fitness function, or locally by choosing either X g cin or the best X g best individual of the generation. The difference vector can be formed either randomly or by using the values of the objective function to determine a direction which can be viewed as an imitation of the gradient function. Combination of these selections in the difference vector formation has been proposed too [9]. Thee most popular strategies are the following: Strategy 1 best/rand2 where β = X g best and δ = X g 1 − X g 2 . Strategy 2 rand1/rand2 where β = X g 1 and δ = X g 2 − X g 3 . Strategy 3 local/rand2,best1,dir1 where β = X g cin and δ = X g best − X g cin + X g 1 − X g 2 . Strategy 4 best/rand4 where β = X g best and δ = X g 1 − X g 2 + X g 3 − X g 4 . Strategy 5 rand1/rand4 where β = X g 5 and δ = X g 1 − X g 2 + X g 3 − X g 4 . Strategy 6 local/hybrid, linear crossover combination of local,dir1 and rand2 where β = X g cin and F · δ = F cr (X g best − X g cin ) + F(X g 1 − X g 2 ). In Crossover phase the trial individual ω g cin is recombined with X g cin and a new trial individual ω g ν cin is formed by inheriting its gens by using the following probabilistic rule: cin, j , otherwise where j = 1, 2, . . . , D, Cr ∈ [0, 1] the Crossover constant and rand j ∈ [0, 1) a random number.
In Selection phase the new trial individual replaces X g cin if it attains a smaller fitness value. The algorithm iteration terminates when a stopping criterion, such as a maximum number of generations is reached or VT R criterion is satisfied [9].
The numerical experiments presented in [2] revealed that the trial individual the ω g cin should be considered having a linear combination of both X g cin and X g best . The bigger the participation of the best individual is, the less generations we need to have an acceptable solution but we loose on the percentage of successes. Whereas, the Strategy 6, with a randomly in [0, 1] (as suggested in the literature) choice of the contribution of X g cin and X g best in the linear combination, proves to be very efficient.

Numerical Experiments and Conclusions
For the construction of the RK5(4) pair considering the Table 1 order conditions we have a fitness function subject to linear constrains which keep the coefficients of the method in appropriate limits. In order to construct effective methods of the desired order as VT R we consider computer arithmetic epsilon. We set an arbitrary value for b 7 and so the dimension of the real precision parameters D = 27. I We set the population size of the population NP = 270 and the maximum number of generations equal to 100000. We experiment with various modifications of the random choice in Strategy 6. We apply the DE procedure for 500 times and we record the average number of convergence of the procedure having reached the desired VT R accuracy and the average number of generations needed. Our numerical results (See Table 2) reveal that for both robustness and fast convergence we should modify Strategy 6 and choose the F cr randomly in [0.5, 1].