Representation of simulation errors in single step methods using state dependent noise

The local error of single step methods is modelled as a function of the state derivative multiplied by bias and zero-mean white noise terms. The deterministic Taylor series expansion of the local error depends on the state derivative meaning that the local error magnitude is zero in steady state and grows with the rate of change of the state vector. The stochastic model of the local error may include a constant, “catch-all” noise term. A continuous time extension of the local error model is developed and this allows the original continuous time state differential equation to be represented by a combination of the simulation method and a stochastic term. This continuous time stochastic differential equation model can be used to study the propagation of the simulation error in Monte Carlo experiments, for step size control, or for propagating the mean and variance. This simulation error model can be embedded into continuous-discrete state estimation algorithms. Two illustrative examples are included to highlight the application of the approach.


Introduction
In the solution of ordinary differential equations using numerical methods the problem of local and global errors arises because of finite order methods and finite step size. The problem has been considered exhaustively in the literature where, in simulation studies, the aim is usually to get a solution that has sufficiently small global error and over-estimating the error is wasteful but does no harm. In time-or resource-critical applications of numerical solution of ODEs, correct local error estimates are required and this motivates a more careful analysis of the local error. Of particular interest is the application of numerical simulation used to project state equations between measurement updates in continuous-discrete state estimation problems where a reliable estimate of the local error behaviour can be embedded into the state estimator to improve estimator optimality, for example, if the effect of the local error is overestimated, the estimator will de-emphasise the state estimates in favour of the measurements and this will result in sub-optimal performance. (Two conundrums remain: (i) If one can estimate the local error, there is no reason not to add this to the computed solution to get an improved solution but then no local error estimate is available. (ii) Local errors accumulate to the global error making it hard to quantify the latter even with smoothness and stability assumptions on the underlying system model and good local error estimates.) The development of the field of probabilistic solutions to differential equations is described in [1] and [2]. The ideas in [1] are developed in [3] to establish stronger convergence results for probabilistic integrators, and as their results apply to general additive noise models, they apply to the the state dependent noise with bias model used to model the local error in this paper. A general survey of probabilistic numerics is provided by [4].
In this paper, in Section 2 the local error for any single step method is modelled as a function of the state derivative multiplied by a bias term and a zero-mean white noise term, both appropriately scaled by the step size. Scaling the error model by the state derivative provides a richer model for the local error than used by [1], captures the fact that the local error depends on the rate of change of the state, and correctly gives zero local error effect in steady state. A stochastic difference equation is formed by combining the single step method with the local error model. To account for non-autonomous behaviour and as a "catch-all," an unscaled noise term may also be added to the model.
Following [1], in 3 a continuous time extension of the discrete time local error model is developed so that the original continuous time state differential equation can be represented by a combination of the simulation method and a stochastic term. This continuous time stochastic differential equation can be used to propagate the simulation error via Monte Carlo analysis or by statistical moments; can be used for step size control; and has specific application to incorporating local simulation errors in continuous-discrete state estimation.
Section 4 presents two examples to illustrate the utility of the approach presented in the paper.

Stochastic, state dependent local error model for single step methods
Consider the autonomous n th order dynamic system, Non-autonomous systems can be dealt with by appending an additional state variable, x n+1 (t) = t, with f n+1 = 1, which makes the extended system autonomous ( [5], Chapter 4). Apply any single-step method with p th order global error, with step size h i per time step, i, If (1) is Lipchitz along the state trajectory, to achieve a global error O(h p ), the local error must be O(h p+1 ). The local error (with index i to denote local error in making the step from where D (p) (x i ) f (x i ) represents the accumulation of the differences between h (p+1)th order elementary differentials ( [5], [6]) of the Taylor series expansions of the true solution and the method with a common term, f (x i ), factored out. Because there may be product terms, e.g. f a (x i ) f b (x i ), in the expansion, there may be ambiguity in elements of matrix D (p) (x i ) but not in the vector, D (p) (x i ) f (x i ). For a p th order method, terms in the local error up to and including h p cancel.
When applied to the linear time invariant (LTI) system,ẋ(t) = Ax(t), (3) gives the local error as e i = h p+1 (d p A p + O e (h)) Ax i , where d p is a coefficient. (d p = 1/(p + 1)! for explicit methods with p ≤ 4 so that the stage number equals the order, [6].) Although the local error is deterministic, based on the above observations and developing on previous work in this area ( [1], [2], [3]), we propose to model the local error for each row, r ∈ {1 . . . n}, of the state differential equation as a combination of a deterministic term and a stochastic term, where m r is a (generally) non-zero mean vector that captures the average or deterministic component of the local error and ξ r,i ∼ N(0, Q r ), a (n + 1) × 1, zero-mean white Gaussian random vector with constant covariance Q r to capture the residual effect of the local error. (Equation (4) can be represented using a single, equivalent term, ξ r,i ∼ N(m r , Q r ).) For LTI systems with p th order method and a fixed step size, the local error multiplier, m r (h) is constant (depending on h), as the local error is, . For general non-linear systems or varying step size, D (p) (x i ) + O(h) is not constant but may still have a non-zero mean over the operating region of the state space and range of step sizes. In (4), augmenting the state derivative vector with f n+1 = 1 takes care of time dependence for non-autonomous systems and provides a "catch-all" for errors that are not modelled via the state dependent terms. (This is the only term used in [1].) We do not recommend adding a constant offset to the deterministic part as this will change the steady state behaviour by adding bias to the equilibrium point.
The following approach is proposed for calibrating of the local error model.
1. Select a representative set of points over the typical operating state of the system, Notice that the sample points selected do not need to form a solution to the differential equation, they only need to cover the points to be traversed in the state space adequately.
2. At each point, apply the intended numerical method to make a single step, [k] . Also record f i, [k] = f x i, [k] . Calculating the local error over a range of plausible step sizes, or making use of an automatic step size control algorithm to capture the step sizes appropriate to various points in the state space ensures that the local errors are properly represented in the data.
3. Use a higher order method or sub-steps (possibly with Richardson extrapolation) to get an accurate estimate of the state at the next time, x i+1,[k] ≈ x [k] (t i + h), and hence estimate the local error, e i, . This step is only performed with high precision during calibration but typically there would be some form of (cheap) step size control to limit the worst-case local error during normal operation. 4. Use linear least squares (or any other method) to find elements of a fixed M (with rows, m T r ) to minimise the mean square residual error, γ r,i, over k. (M can be solved in a single least-squares step.) As M f (x i )h p+1 is scaled by h p+1 it does not affect the asymptotic error and elements of M may be forced to zero, if statistical analysis of variance supports the hypothesis that individual elements are zero within acceptable confidence levels, or if modelling of the problem indicates that this choice is appropriate. The justification for weighting the local error with 1/h p+1 is to follow the asymptotic error behaviour and will penalise large errors incurred with smaller step sizes. Different weighting could be used to emphasise areas of the state space that require tighter modelling of the error.

5.
Using the white noise model of the residual errors, ξ r,i ∼ N(0, Q r ) to give ϑ r,i = f T (x i ), 1 ξ r,i , find Q r to minimise γ 2 r,i − E{ϑ 2 r,i } s over k for some norm, s. This problem can be solved in two steps: is linear in unknown q j j , and can be found using a non-negative least squares solver (e.g. Matlab's lsqnonneg). (ii) If required, and having regard to the risk of over-fitting, find (e.g. upper) triangular Cholesky factor R, (R T R = Q r ) to solve the optimisation problem. The diagonal R = diag{ √ q j j } found above provide an appropriate starting value. The choice of metric, s (e.g. one-or two-norm) for the fit between γ 2 r,i and ϑ 2 r,i will affect the result of the optimisation.

Variations on the theme
Step 5 can be interpreted/implemented in other ways: 1. It may be preferred to assume diagonal, Q r , both for simplicity and to avoid the risk of over-fitting. This assumes that the residual error covariance depends only on f 2 a terms and not on product terms, f a f b , a b which may miss couplings that are physically relevant.
2. The residual error can be viewed/modelled per row as ϑ r,i = f T (x i ), 1 w r ξ r,i , with ξ r,i ∼ N(0, 1) a scalar white noise term with unit variance, and w r a weighting vector, found to minimise γ 2 r,i − E{ϑ 2 r,i } p . The non-negative least squares solution discussed above for diagonal Q r does not apply as E{ϑ 2 3. As noted in Section 3 below, if the residual error vector over the entire simulation is modelled as, ϑ i = W f (x i ) 1 ξ i with ξ i ∼ N(0, 1) a scalar stochastic element, the equivalent, continuous time stochastic differential equation can be simulated using the strong first order Milstein method for the special case of scalar noise. This requires finding the (n 2 + n) elements of the matrix W to minimise the matrix norm The row solutions in variation (2) above (corresponding to the case that the rows of γ i are uncorrelated) might be used as a starting value for numerical optimisation. 4. We have tried an alternative approach of constructing (term by term), a least-length pseudo noise term from the residual errors via, and then find- . We have found that this approach is optimistic: While (by construction) the pseudo-noise generates the residual error, because of the way it allocates the residual error to individual pseudo-noise terms the scheme typically gives too low a variance and since E ξ r,i, † , the resulting covariance cannot be expected to be constant. Finally, the dynamic range of f (x) depends on choice of units so weighting may be required.
The algorithm could be run on-line if computing resources allowed the local errors to be calculated accurately on the fly. Although this use-case seems to defeat the purpose of the exercise, the model parameters, m r , and Q r , could be adapted as new information becomes available while traversing of the operating region of the state space.

Continuous time extension of stochastic, local error model
There are applications where a continuous time equivalent of the local error model will be useful. These include continuous-discrete state estimation incorporating simulation error effects, cheap local and global error estimation, and step-size control. This section develops the continuous time extension of the results above by representing the continuous time system via a stochastic differential equation. [1] modelled the local error as a stochastic term added to the differential equation and calibrated this as a zero mean, iid Gaussian sequence, with a covariance scaled by h 2p+1 . In this section, this idea is developed and applied to the more complicated local error model in (4).
If χ(t) is a zero mean, white Gaussian noise with covariance, Q χ (t)δ(t), (δ(t) is the Dirac delta function) and Q χ (t) constant over the interval, t ∈ (0, τ]. Then, the Wiener process, ξ(τ) = τ 0 χ(λ)dλ, is zero mean with covariance, Q ξ = τQ χ . To develop a continuous time extension of the discrete time result in Section 2, the Taylor series expansions of the true solution and solution of the simulation method over the interval, t = t i + τ, τ ∈ (0, h], are written as follows. True solution: with all quantities evaluated at (x i , t i ), e.g.
Simulation method: The single step method, (2), is written out in the interval, giving a continuous time function, y(t), can be written as the Taylor series expansion of the time derivative of τψ(y i , τ) in the interval. If a deterministic correction is used in the simulation, the corresponding continuous time extension would be y( The continuous time extension of the local error is e(t) = x(t) − y(t), where i (s) is the time derivative of the local error's continuous time expansion,

Local error
The above developments mean that,ẋ(t) = f (t) = g i (t) + i (t) in the interval t = t i + τ, τ ∈ (0, h]. Both e(t i + τ) and (t i + τ) depend on a principal term, D (p) (p+1)! f (x i ) and τ p p! f (x i ) respectively, and on higher order terms (also scaled by f (x i ) but dependent on the step size). The higher order expansions in the local error, O e (τ), and local error derivative, O (τ), do not have the same coefficients, for example, an LTI system with an explicit method (truncated Taylor series) has, (t i + τ) = τ p p! A p + O (τ) Ax i . As examples of the above discussion, the Heun (modified Euler) and linearly implicit Taylor-Heun schemes applied to the scalar, autonomous system,ẋ = f (x), x(t 0 ) = x 0 , with correct x i = x(t i ); and to the linear time invariant (LTI) systemẋ = Ax are summarised in Tables 1 and 2 respectively. For the LTI examples, the next higher order term in the error expansion is shown to illustrate that O e (τ) and O (τ) terms are not the same.
The above calculations are possible because i (t) is deterministic. If however, (after removing the deterministic component), e r,i = e r (t i + h) is modelled via the stochastic term, h p+1 ϑ r,i = h p+1 f T i , 1 ξ r,i , the stochastic version of i ξ r,i as required. Although this arrangement ensures that the local error has the required covariance at the end of each step, it does not model the error correctly during the step because of the difference between deterministic and stochastic integrals. For the deterministic version, the principal component of the local error is constant during the evolution of the continuous time extension, while for the stochastic version, the principal error effect (standard deviation) grows with √ τ. As most studies will end up with numerical simulation of each step, the under-estimation of the local error covariance during the initial part of the continuous time extension will not matter.
Defining the stochastic increment per row as, dβ r = χ r dt, the simulation ofẋ r (t) = f r (x), can be approximated and studied via the stochastic differential equation, dx r (t) = g r,i (t)dt + f T (x i ), 1 dβ r (10) If (10) is simulated using the strong order 0.5 Euler-Maruyama approach ( [7]), the original, generally higher order, deterministic method selected for the problem is retained as (7) and the stochastic terms are propagated with low order. This scheme is justified by noting that the noise term χ r is pre-scaled by h p+0.5 i before integration. [3] emphasises the difference between solvers for stochastic differential equations and the present problem of probabilistic numerics where the variance of the local error model decays faster than h. No general claims about the resulting method order are made here and the cautions raised in [8] should be noted. The algorithm would be, 1. Draw independent random vectors, ∆β r,i from the distributions, ∆β r,i ∼ N(0, h 2(p+1) i Q r ).
2. Propagate the state vector via, If the local error is modelled with a scalar noise term that scales all rows of the SDE, dβ, the state can be propagated using a modification of the first order Milstein method ( [7]), giving, Notice that the Jacobian required in (12) for the stochastic component is already available for linearly implicit methods.

Application of results to numerical simulation
The results developed above may be applied to numerical simulation in several ways:

Simulation of an ensemble of solutions:
For each solution the underlying numerical method is modified and simulated via (11).

Simulation of mean and covariance of solutions:
The solution of the mean,x(t) = E {x(t)}, and covariance, P(t) = E (x(t) −x(t)) (x(t) −x(t)) T of the stochastic difference equation can be obtained using standard techniques of either linearizing the covariance equation, using unscented transforms or Monte Carlo particle approaches. The covariance solution provides an estimate of the uncertainty range of the global error. Ignoring high order terms the mean is, giving,x The covariance is, with L(x) = diag f T (x i ), 1 and Q = diag {Q r }. The approximations in (13) and (15) assume sufficiently smooth behaviour of f (x) around the mean value -see for instance [9], Section 4.9 for second order expansions of the mean and covariance, or [7]. To discretize (15) with a simple (recall that the noise covariance is already O(h 2p+1 )), A-stable method that preserves positive semi-definite behaviour of P, the following Padé type approximation is recommended ([10]), uses a mid-timestep calculation to achieve second order accuracy. Also see [11] for further discussion.)

Step size control based on stochastic model of local error:
If the local error, (4), is well-characterised by the noise term over the operating state of the system, this can be used for cheap local error estimation and control as f (x i ) is calculated at the start of every step. For any step, the stochastic model of the local error for row r will have standard deviation, σ r,i = h p+1 f T (x i ), 1 R T r 2 . If the error standard deviation is required to be less thanσ r = g r h, (the additional h to take care of the step size effect on the global error accumulation), the step size should be to ensure that the step size is small whenever there is a high rate of change in a direction that causes high local error sensitivity. This idea is illustrated in Example 2.

Examples
Both examples in this section use inappropriately large step size to give visual illustrations of the results in the paper. Example 1 uses the first order Euler method and Example 2 uses the second order Heun (modified Euler) method.

Example 1: Explicit Euler Method
The explicit Euler method, x i+1 = x i + h f (x i ) is applied to the FitzHugh-Nagumo oscillator used as an example in [1]:    only results for the disc-shaped region. To illustrate further, an ensemble of solutions to the SDE and the explicit Euler solution with one standard deviation, y i ± diag (P i ), are shown. Fig. 2 illustrates the development of the global error. Notice that in this simulation, after multiple cycles, the predicted global error is optimistic compared to the actual value.

Example 2: Heun Method
The Heun method, is applied to the classic two body problem discussed in [5] and elsewhere. The model for an orbit in the x-y plane with the origin as one of the foci, isẍ = −x/r 3 ,ÿ = −y/r 3 , with r 2 = x 2 + y 2 , (ẋ 0 ,ẏ 0 , x 0 , y 0 ) T = 0, 1−e 1+e , 0, 1 − e T and large eccentricity, e = 0.9. Heun is a second order explicit method and the purpose of the example is to illustrate the results presented in this paper rather than to advocate Heun as the best numerical method for this problem. (Higher order methods can exploit the smoothness, and implicit methods and step size control be used to deal with the stiffness.) The value of M obtained is quite consistent over a range of step sizes but it is averaging the local error effect over the whole of an orbit, which has significant velocity differences at different points due to the eccentricity. Even though the simulation method, is still globally second order this understanding suggests excluding the offset term (i.e. M = 0). As a result, the local error is modelled in its entirety via noise terms. The local error data is obtained by selecting points on the orbit found using Matlab's ODE23 with high tolerance.
To illustrate a simple step size control scheme, at each step, the noise covariance per row is calculated from (17)  , with p tol the allowed position tolerance per step and σ p and σ v the position and velocity standard deviation increments predicted using the noise model and (16). (Practically, the step size selection is also jacketed between minimum and maximum values but these are not active in the simulation results presented below.) Fig. 3 shows the position along a single orbit with p tol = 5 × 10 −4 used for step-size control. The accurate solution is shown along with Heun method and stochastic differential equation solutions are shown every 30 steps. The step size control algorithm varies the step size between 0.0010 and 0.0158 along the trajectory. Because of the way the step size is calculated this is less reactive than a regular step size control algorithm would be -for example, Matlab's ODE23 (albeit a third order method) used step sizes ranging from 0.0003 to 0.071 for the same problem with tighter tolerance. The displayed points are fairly uniformly distributed in space because larger step sizes are used when the body is far from the origin and moving slowly. The solution mostly remains within the predicted one standard deviation ellipses of the accurate solution but becomes hopelessly uncertain within one orbit at this resolution. The important point is that the error ellipses give insight into the loss of reliability of the solution.

Conclusions
This paper has introduced the idea of representing local errors in numerical simulation as stochastic variables formed by the product of the state derivative and white noise, with the possibility of including a constant stochastic term to take care of unmodelled effects and time dependence. This richer representation of local error allows a less conservative model of the local error that, for instance, correctly models zero local simulation error when the system is in steady state. A continuous time extension of the results allows numerical simulation to be modelled using the framework of stochastic differential equations with the time derivative of the single step method as the deterministic part and a state dependent noise to represent the