Analytical first derivatives of the direct Hermite-Simpson collocation formulation of optimal control problems

In order to assess the capabilities of South Africa as a launch site for commercial satellites, an optimal control solver was developed. The developed solver makes use of direct Hermite-Simpson collocation methods, and can be applied to a general optimal control problem. Analytical first derivative information was obtained for direct Hermite-Simpson collocation methods. Typically, a numerical estimate of the derivative information is used. This paper will present the solver algorithm, and the formulation and derivation of the analytical first derivative information for this approach. A linear and non-linear sample problem are provided as validation of the solver.


Introduction
An optimal control solver making use of the direct Hermite-Simpson collocation formulation of optimal control problems was developed. Formulae for determining the analytical first derivative information of the defect constraints of the direct Hermite-Simpson collocation formulation of optimal control problems have been formulated. The control variables were represented by linear mean splines. Irrespective of this, the first derivative formulae obtained can be extended to other representations of the control variables. The formulae obtained can also be extended to solving ordinary two-point boundary value problems with Hermite-Simpson collocation.
Typically, these first derivatives are estimated using sparse finite-differencing, alternatively they can be obtained with automatic differentiation or symbolic differentiation. The formula provided allows for an easy and compact representation and calculation of analytical first derivatives. Provision of analytical derivative information, versus estimating this information, typically increases the robustness and accuracy of solvers using Newton-iterative techniques [1]. Symbolic differentiation algorithms tend to not present derivatives in compact forms, which has the potential to significantly increase computation time. Forward automatic differentiation is easy to use, but inefficient for large problems, whereas reverse automatic differentiation requires significant effort to implement, but is efficient for large problems [2].
Optimal control is a field of study with various application in fields such as aerospace, robotics and chemical engineering. It is concerned with the determination of trajectories which optimise some cost function of a given dynamic system. The techniques of optimal control have been used, successfully, to optimise a host of dynamic system trajectories. These trajectories include: minimum fuel orbital insertion trajectories, re-entry trajectories and minimum energy robot trajectories. This paper first outlines the general transcription process of an optimal control problem to an Nonlinear Programming (NLP) problem using Hermite-Simpson collocation with linear mean control splines. The transcription process presented will assume the period of any phase of the state variables trajectory is a free variable. The derivation of the first derivative information of this formulation of the optimal control problem is then presented. Following which the first derivatives will be applied with example problems, using Matlab's Sequential Quadratic Programming (SQP) algorithm.

Transcription
Optimal control is the field of study concerned with determining the control history of a dynamic system which optimises a performance index or cost function. The cost can be a function of terminal values of the state variables or an integral function of the system, or both. The following summarises the optimal control problem, with terminal constraints to the state variables, and some of the state variables initial values defined. The systems analysed will be autonomous. It should be noted that other constraints such as path constraints can also be considered.
Where x f is the terminal state vector, t f is the unknown final time of the phase of the trajectory. x ∈ R m×1 and u ∈ R n×1 are the state and control vector respectively. The functions g : R (m+n)×1 → R m×1 , outputs the first time derivative of the state vector, φ : R (m+1)×1 → R m×1 , outputs the component of the cost function which is dependent on terminal values of the system, and the function L : R (m+n)×1 → R m×1 , outputs the integrand of the integral component of the cost function. ψ : R (m+n)×1 → R m×1 , outputs the component of terminal boundary constraints, which is dependent on terminal values of the system. In the event the final time is unknown, the following augmentation to the problem may be performed in order to allow for numerical solutions to be obtained.
Let t = t f τ where τ ∈ [0, 1] then dt dτ = t f . Let there be a function f : R (m+n+1)×1 → R m×1 , which outputs the first derivative of the state vector with respect to the new independent variable τ. That is Then by the chain rule f (x(τ), u(τ), t f ) = t f g(x(τ), u(τ)) The cost (or objective) function could, for example, be formulated in a manner so as to minimise the fuel usage of a particular rocket mission. The dynamic equations, along with any path constraints, essentially constrain the feasible solutions of the optimal objective function. In order to solve this problem using an NLP, the problem is converted, or transcribed, by Hermite-Simpson collocation into an NLP problem. This is achieved by discretising the independent variable of the system (typically time), in order to create a grid, or mesh. Each grid point of this grid represents discrete values of the dependent variables, in the case of optimal control these variables are a state and control vector. The differential equations which represent the system are replaced with defect constraints for each independent variable interval, which relate the dependent variables of adjacent grid points by some suitable quadrature. In the case of this paper, Hermite-Simpson is the quadrature used.
C. Rosmann et al. [3] presented time-optimal solutions using direct Hermite-Simpson collocation. The solutions which used control splines with a free midpoint led to oscillations in the time-optimal solutions. The solutions which used linear mean control splines had no such oscillations. Since such oscillations are undesirable in practical application, and the ultimate goal of this solver is to deal with phases with time as a free variable, the control variables were represented using linear mean control splines.
Let c k ∈ R m×1 represent the collocation constraint vector at a grid point k + 1. Let there be N grid points. Then at a particular grid-point k: The transcribed form of the optimal control problem, with terminal constraints to the state variables is now presented as follows: subject to Here,c k is referred to as a defect constraint. In the event that there are multiple trajectory phases, the following constraints must be imposed: where the superscript i indicates the phase, and x i 0 indicates the initial state vector of phase i. This new problem can be posed to an NLP solver in order for a solution to be obtained. It is often of importance to scale variables in order to improve robustness of an NLP solver [4]. In this work, state variables were scaled manually as the author saw fit.

Warm Starts
The developed optimal control solver will make use of warm starts. That is, obtaining the solution of simpler versions of the actual problem, in order to initialise the solver when attempting to obtain the solution of the actual problem. This provides solvers with good initial guesses. Provision of good initial guesses to Newton-iterative solvers increases speed and robustness of the solver, hence it is advantageous to make use of warm starts [5].
In this work, warm starts will be taken advantage of by obtaining a solution for a coarse grid. An estimate of this solution's discretisation error will be tested, if this estimate of the discretisation error is above a defined tolerance, the solver will make use of the coarse grids solution to determine an initial guess for a finer grid. The solver will then obtain a solution for this new grid, and the process will repeat.

Choice of Non-Linear Programming
The two popular NLP algorithms for solving direct optimal control problems are SQP and Interior point methods. SQP algorithms have the capacity to make use of warm starts. Thus the SQP algorithm can take advantage of a re-meshing strategy.
Interior point methods tend to increase the non-linearity of functions, which can present difficulties for already highly non-linear systems. Each iteration of the Interior point method must be strictly feasible, this prevents it from using warm starts. Hence interior point methods are unsuitable if a re-meshing strategy is to be used [6].
The class of problems the developed optimal control solver is intended to solve are highly non-linear. In addition to this, it is very desirable to take advantage of warm starts, hence a SQP algorithm was chosen.

First Derivative Information
The following section presents the analytical first derivative information of the defect constraints of the direct Hermite-Simpson collocation method, with linear mean control splines. In addition to this, first derivative information of the integral component of the cost function is provided.
For a particular phase of the trajectory: Letf Then the first derivative information of the defect constraint of a particular grid point is represented as follows: If the system is autonomous (not explicitly dependent on the independent variable): Let The transpose of the Jacobian of the nonlinear constraints is then The gradient of the integral component of the cost function for a particular phase is as follows: and let Now, we define Then: If the system is autonomous, then ∂C k ∂t f becomes: The gradient of C is thus In the event that multiple trajectory phases exist, the derivative information is as follows: If t i f represents the final time of phase i, y k represents a vector of all the unknown variables of phase k, and c i represents a vector of the collocation constraints of phase i, and P be the number of phases. Let and

Interpolation of Solution and Error Estimation
In order to make use of coarse grids when using direct collocation, interpolation of the solution is used to both represent the solution, and estimate discretisation error. The interpolation method used must be consistent with the quadrature method used, in this case Hermite-Simpson.
The interpolation function of the state and control variables for a particular grid interval is as follows: First, if p = t − t k , theñ and . The relative discretisation error for each mesh or grid interval is calculated as follows [6]: Where j indicates the jth state variable, k indicates the kth grid interval, and is the maximum relative local error. The integral which is equivalent to η k is solved with Romberg quadrature [6].

Linear case
The validation example presented is an exercise taken from page 60 of Lawden [7]. This problem was chosen as it has an analytical solution. The problem is as follows: The dynamic system subject to the following state equatioṅ Where x is the state variable, and u is the control variable. The system starts at a time of 0 and ends at a time of 1 second. The initial values for the state variable is 1. Then the optimization problem is set as follows: Optimise the cost function, C, Then the Hamiltonian of the problem is and thus Substituting the value for λ from the equation (21) leads to the following system of ordinary differential equationṡ Since the final value of x is unconstrained, the value of λ, at the final time, must be 0. Therefore the value of u must be zero at the final time. This system of differential equations (24) has been solved for using the Laplace transform, providing the following solution for the optimal control and cost The cost, C, and control are both consistent with the solution provided in [7].
The following matrices were defined in order to solve the problem using the developed solver.
J k = [−1, 1], and G k = 2x k 2 3 u k The problem was solved in 0.17 seconds with re-meshing, using the developed optimal control solver. Without the provision of analytical first derivative information, the solver required 3.41 seconds to obtain the same solution. This solver time is the sum of the time taken by the NLP to determine a solution for each mesh. The relative discretisation error of the presented solution is 5.2845 × 10 −10 . The maximum difference between the analytical and numerical solution of the control variable is 1.8201 × 10 −4 . The cost found by the solver is 0.3252424, implying a 0% objective function error. The variables were unscaled. Figure 1 below depicts the time history of the numerical solution of the state and control variables, the control error, and the analytical optimal control. The control error is the difference between the analytical and numerical solution of the control history, at discrete points in time. It should be noted that only first order conditions were used to determine the analytical optimal control. Figure  2 depicts the phase trajectory.

Nonlinear case
The validation example presented is a minimum time co-planar circular orbital transfer. The system dynamics was augmented from Moyer & Pinkham [8], such that the problem was autonomous. The specific version of this problem solved for in Moyer & Pinkham will be solved. This problem attempts to find the minimum time taken for a spacecraft to transfer from the average circular orbit of the Earth around the Sun, to the average circular orbit of Mars around the Sun. The system of units used in Moyer & Pinkham were normalised. The time unit is equivalent to 58.18 days. The distance unit used is AU (astronomical unit). The mass unit is equivalent to 46.58 slugs, or 14.594 kg.
The following assumptions were used: • Constant Thrust • Gravity and Thrust are the only Forces acting on the rocket • Earth and Mars have circular, co-planar orbits • Only the gravitational attraction of the Sun is considered.
The system dynamics are as follows: Where r is the radial distance of the spacecraft from the sun, θ is the angular position of the spacecraft, v r is the radial speed, v θ is the circumferential speed, ω is the angular speed, T is the thrust constant, m is the mass, µ is the standard gravitational parameter of the sun, and u 1 and u 2 are the components of the unit vector which represent the thrust direction. In this example, the aforementioned unit vector components are the control variables.
The normalised boundary constraints are: The following constants are used: The solution to this problem, found in [8] by using an indirect Newton-Raphson optimal control method, is a minimum time of 3.3207 time units or 193.2 days. This solution was found in 36 seconds. The following figure, obtained from [9], is a representation of the optimal trajectory, and the corresponding optimal control history. The arrows represent the thrust direction unit vectors. The matrix below was defined in order to solve the problem using the developed solver: The problem was solved in 1.34 seconds with re-meshing, using the developed optimal control solver. Without the provision of analytical first derivative information, the solver required 14.82 seconds to obtain the same solution. This solver time is the sum of the time taken by the NLP to determine a solution for each mesh. The relative discretisation error of the presented solution is 7.659 × 10 −7 . The cost found by the solver is 3.206 time units, implying a 0.00301 % objective function error. The variables were unscaled. The figure below is the optimal trajectory solution found by the optimal control solver. As stated in [9], the optimal radial thrust component is outward for roughly the first half of the trajectory, and inward for the rest. The results obtained are consistent with this observation. Figure 5 below compares the thrust direction history determined by Moyer & Pinkham [8], to the history determined by the developed optimal control solver. This figure shows that the optimal thrust direction history solved for is consistent with the optimal thrust direction history (represented by θ * ) found by Moyer & Pinkham. In this figure, the thrust direction is represented by the angle measured from the local horizontal to the thrust direction vector. The phase trajectory plots, are as follows:

Conclusion
Hermite-Simpson direct collocation methods have been effectively used to solve various optimal control problems. However typically the derivative information found using these methods have been estimated with finite difference methods. This paper presented a relatively simple formulae for determining first order analytical derivative information, for the Hermite-Simpson direct collocation method. This information allows for robust and faster solving of optimal control problems. The derivatives were tested in a developed solver, and have been proven to function in various validation examples, one of which was provided in this paper.
The first derivative information of the initial mesh was made to undergo a derivative information check by Matlab. Matlab tests the derivative information by comparing the analytically determine derivatives with numerical estimates (using finite differencing methods). The first derivative information passed Matlab's test, further validating the obtained results. The provision of first order analytical derivative information had successfully reduced the solve time of the solver by a significant margin.
This work, was constrained by Matlab's SQP algorithm, which estimates second derivative information using dense finite differencing methods. In order to improve the overall solver performance, and to compare it fairly to other optimal control solvers, the solver should be used in conjunction with an SQP algorithm capable of exploiting the sparsity of the second derivative information.