The role of optimal perturbations in improving the stability and forecasting capacity of adaptive filters for high-dimensional systems

This paper is devoted to optimal perturbations (OPs), which make it possible to quantify the capacity of a dynamical system to predict its future behaviour more or less correctly. Different types of OP are introduced: deterministic, stochastic, and OPs in an invariant subspace. Theoretical results on the optimality of the introduced OPs are presented. Simple and efficient numerical algorithms for computing the OPs are outlined; these constitute a basis for implementing a stable adaptive filter in a very high dimensional environment.


Introduction
The study of high dimensional systems (HdS) constitutes one of today's most important research subjects, thanks to the exponential increase in the power and speed of computers [1]. This increase is nevertheless still far from sufficient to meet the great demand for computational and memory resources involved in implementing optimal data assimilation algorithms (DA-Alg) for operational forecasting systems (OFS) in meteorology and oceanography.
The objective of this paper is to introduce different types of OPs (deterministic, stochastic, and OPs in an invariant subspace) and to present the main theoretical results on the optimality of the introduced OPs.
It is well known that OPs play an important role in the study of the predictability of dynamical systems [2], [3]. On the other hand, as will be seen in the sequel, the OPs are the key elements constituting a basis for a subspace of correction, which ensures the stability of the filter for data assimilation in HdS [4]. The predictability of the system depends on two sources: the first is the system response to a forcing, and the second is due to the initial state [5]. The OPs thus concern the predictability of the system due to the initial state and are therefore key to assessing the system's predictability [6].
For the ideas on the predictability of the atmosphere, see [7], [5]. As found there, the atmosphere is a chaotic system and the predictability limit of a numerical forecast is about two weeks. The fact that estimates of the current state are inaccurate (due to the many approximations involved in data assimilation algorithms) and that numerical models have inadequacies leads to a prediction error (PE) that grows with increasing forecast lead time. As a consequence, a single (control) forecast often departs rapidly from the real atmosphere. The idea of ensemble forecasting (EnF) is to add perturbations around the filtered (analysis) estimate to produce a collection of forecasts that better simulate the possible uncertainties in a numerical forecast. This yields an indication of the range of possible future states of the system, with the objective of obtaining a correctly predicted ensemble spread. Finding a few (optimal) perturbations that determine the ensemble spread as correctly as possible is important for extremely HdSs (of dimension 10^6 to 10^7).
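As a minimal sketch of this idea (a toy illustration, not an operational EnF scheme; the matrix Φ, the perturbation amplitude and all sizes below are invented for the example), an ensemble can be generated by perturbing the analysis state along the fastest-growing directions and integrating each member one step:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 20                         # state dimension, ensemble size
Phi = rng.standard_normal((n, n))    # toy linear model (one-step dynamics)
x_analysis = rng.standard_normal(n)  # filtered (analysis) estimate

# perturb the analysis along the two leading right singular vectors,
# i.e. the directions in which errors grow fastest under Phi
_, sigma, Vt = np.linalg.svd(Phi)
perturbations = 0.01 * rng.standard_normal((m, 2)) @ Vt[:2]

# integrate every member one step; the spread indicates forecast uncertainty
ensemble = (x_analysis + perturbations) @ Phi.T
spread = ensemble.std(axis=0)
```

Because the perturbations live in a two-dimensional subspace, a small ensemble already captures the dominant part of the error growth, which is the point of using OPs for very large systems.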
On the other hand, for HdSs the implementation of optimal algorithms like the Kalman filter (KF) [8] is today impossible. It is therefore inevitable to employ a reduced-order filter (ROF) to reduce the computational complexity. One of the most important requirements for the reduction technique used is that it must ensure the stability of the ROF. This requirement can be satisfied by choosing a subspace of correction which makes the filter transition matrix stable. As found in [4], [9], a stabilizing subspace can be constructed on the basis of the unstable (and neutral) eigenvectors (EVs), Schur vectors (SchVs) or singular vectors (SVs) of the system dynamics; these are in fact the OPs to be introduced in this paper.
The paper is organized as follows. Sect. 2 outlines the optimal perturbation (OP) theory and its important role in seeking the fastest-growing direction of the prediction error (PE). The definition of the optimal deterministic perturbation (ODP) and some theoretical results on the ODP are introduced. It is found that the ODP is associated with the right SVs of the system dynamics. In Sect. 3 two other classes of ODPs are presented: the leading eigenvector (EV) and the real Schur vector (SchV) of the system dynamics. In Sect. 4 we present another type of OP, called the optimal stochastic perturbation (OSP). One important class of perturbations (known as simultaneous stochastic perturbations, SSP) is presented in Sect. 5. This class of perturbations is efficient for computing the OPs in a high dimensional (Hd) setting by solving optimization problems, without the need for an adjoint code (AdC). A simple numerical example is presented in Sect. 6 to illustrate the different OPs. The experiment on data assimilation in the Hd ocean model MICOM by the filters constructed on the basis of the Schur OPs and SSPs is presented in Sect. 7. Concluding remarks are given in Sect. 8.

System predictability and stability of filter
The behavior of the atmosphere or ocean is highly sensitive to initial conditions: a small change in an initial condition can strongly alter the trajectory of the system. It is therefore important to know the directions of rapid growth of the system state. Finding the system states that develop into the rapidly growing directions of the system dynamics, and optimizing the predictability of the physical process under consideration, are important elements in the design of a high performance OFS.
Another role of the OP lies in overcoming the insurmountable difficulties related to the extremely high demand for computational and memory resources in DA-Algs. The reason is that for Hd systems, traditional optimal DA-Algs (like a KF) cannot be implemented. Knowledge of OPs makes it possible to reduce the dimension of the DA-Algs, to ensure the stability of a DA-Alg and to optimize its performance. The ideology of an adaptive filter (AF) [4] is to optimize the filter performance with respect to (wrt) the gain elements of a stable filter; hence the question of the sensitivity of the forecast to the initial condition is not of primary importance.

Stability of filter
Consider a standard linear filtering problem

x(k+1) = Φx(k) + w(k), z(k) = Hx(k) + v(k), (1)

where Φ ∈ R^{n×n} is the state transition matrix, H ∈ R^{p×n} is an observation matrix, and w_k, v_k are white noise sequences. The filtering problem is to estimate the system state x_k as precisely as possible on the basis of the set of observations z_l, l = 1, 2, ..., k.
It is well known that the minimum mean squared (MMS) estimate x̂(k) can be obtained by the KF

x̂(k/k) = x(k/k−1) + K[z(k) − Hx(k/k−1)], x(k+1/k) = Φx̂(k/k), (2)

where x̂(k/k) is the filtered (or analysis) estimate and x(k+1/k) is the one-step-ahead prediction of x(k+1). The gain K is given by K = MH^T[HMH^T + R]^{−1}, and the transition matrix for the filtered error (FE) is L = (I − KH)Φ. For an HdS, the gain K of the KF is impossible to compute. According to [4], the ROAF (reduced-order AF) has a gain of the structure K = P_r K_e, where P_r ∈ R^{n×n_e} is an operator projecting the reduced space R^{n_e} into the full system space R^n and K_e ∈ R^{n_e×p} is the gain of the reduced filter. It is found (see [4]) that detectability of the input-output system (1) is sufficient for the existence of a stabilizing gain K, with P_r consisting of all unstable EVs (or unstable SVs, SchVs).
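The reduced-gain structure K = P_r K_e can be sketched as follows (a toy illustration with invented matrices; the choice of the reduced covariance M_e = P_r^T M P_r is our own simplification, not a prescription from [4]):

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_e, p = 20, 4, 6                  # full state, reduced state, observations
Phi = rng.standard_normal((n, n)) / np.sqrt(n)
H = rng.standard_normal((p, n))
M = np.eye(n)                         # prediction error covariance (toy)
R = 0.1 * np.eye(p)                   # observation noise covariance

# Full Kalman gain K = M H^T [H M H^T + R]^(-1): infeasible for a true HdS
K_full = M @ H.T @ np.linalg.inv(H @ M @ H.T + R)

# Reduced-order gain K = P_r K_e: P_r maps R^{n_e} into R^n; here P_r is
# built from the n_e leading left singular vectors of Phi
P_r = np.linalg.svd(Phi)[0][:, :n_e]
H_e = H @ P_r                         # observation operator on reduced space
M_e = P_r.T @ M @ P_r                 # reduced covariance (simplification)
K_e = M_e @ H_e.T @ np.linalg.inv(H_e @ M_e @ H_e.T + R)
K_red = P_r @ K_e                     # n x p gain of the reduced-order filter
```

The correction K_red(z − Hx) is confined to the range of P_r, which is exactly how the choice of the projection subspace controls the stability of the filter.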

Unstable singular vectors
Introduce the SVD of the matrix Φ ∈ R^{n×n} [10]

Φ = UΣV^T = U_1Σ_1V_1^T + U_2Σ_2V_2^T, (3)

where Σ_1 contains the singular values σ_i ≥ 1, i.e. n_u := dim(Σ_1) is the number of all unstable and neutral SVs of Φ. In what follows, for simplicity and unless otherwise stated, the set of all unstable SVs is understood to include all unstable and neutral SVs. Suppose the system is s-detectable (detectability of all the columns of U_1 or V_1). From [9], there exists a stabilizing gain K_s (a sufficient but not necessary condition) with P_r = U_1 (see Sect. 2.3) such that the transition matrix L_f = (I − K_sH)Φ of the equation for the FE e_f(k) (or of the equation for the PE e_p(k)) is stable. This signifies that the filter is stable and the estimation error is bounded. The columns of U_1, i.e. the unstable left SVs of Φ, serve as a basis for seeking an appropriate correction in the filtering algorithm.
On the other hand, in practice, for an extremely HdS one cannot compute all elements of U_1 but only some subset U′_1 ⊂ U_1. Using U′_1 instead of U_1 cannot guarantee filter stability. Ensemble forecasting has been proposed as an approach to prevent a possibly large error in the forecast, and it requires knowledge of the rapidly growing directions of the PE. In this context the OPs appear to be important, since they make it possible to find the rapidly growing directions of the PE by model integration of the OPs. They are in fact the unstable right SVs (RSVs) in the SVD decomposition (3), as follows from the next sections.
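In a matrix setting, the selection of the unstable (and neutral) SVs can be sketched as follows (a toy matrix; the threshold σ_i ≥ 1 encodes "growth or neutrality"):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
Phi = 1.5 * rng.standard_normal((n, n)) / np.sqrt(n)  # toy dynamics

# SVD: Phi = U diag(sigma) V^T, singular values sorted in decreasing order
U, sigma, Vt = np.linalg.svd(Phi)

# unstable and neutral SVs: those whose singular value is >= 1
n_u = int(np.sum(sigma >= 1.0))
U1 = U[:, :n_u]          # unstable left SVs: basis of the correction subspace
V1 = Vt[:n_u, :].T       # unstable right SVs: the optimal perturbations

# each right SV is amplified by exactly its singular value under one step
growth = np.linalg.norm(Phi @ V1, axis=0)
```

U1 plays the role of P_r in the stabilizing gain, while integrating the columns of V1 through the model reveals the rapidly growing PE directions.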
Definition 2.1. The ODP δx_o is the solution of the extremal problem

max_{δx} ||Φδx||_2 (5)

under the constraint

||δx||_2 = 1. (4)

One can prove
Lemma 2.1. The OP in the sense (5), (4) is δx_o = v_1, where v_1 is the first right SV of Φ.

Subspaces of ODPs
Introduce Φ_1 := Φ − σ_1u_1v_1^T and consider the optimization problem

max_{δx} ||Φ_1δx||_2 (6)

under the constraint (4). Similarly to the proof of Lemma 2.1 one can prove

Lemma 2.2. The OP in the sense (6), (4) is δx_o = v_2, where v_2 is the second right SV of Φ.

By iteration, defining Φ_i := Φ_{i−1} − σ_iu_iv_i^T for i = 2, ..., n − 1 and repeating the proof of Lemma 2.2 with slight modifications, one finds the OPs for all Φ_i. The OP for Φ_{i−1} will be called the i-th OP for Φ (or the i-th SOP, singular OP).

Comment 2.2. The OPs presented above are optimal in the sense of the Euclidean norm ||.||_2. In practice there is a need to normalize the state vector (using the inverse of the covariance matrix M). The normalization is done by the change of variables δx′ = M^{−1/2}δx, and all the results presented above remain valid.
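A quick numerical check of this construction (toy matrix; generically distinct singular values assumed) verifies that the i-th SOP of Φ is the first right SV of the deflated matrix Φ_{i−1}:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
Phi = rng.standard_normal((n, n))
U, sigma, Vt = np.linalg.svd(Phi)

Phi_i = Phi.copy()
for i in range(3):
    # first right SV of the current deflated matrix Phi_i
    U_i, s_i, Vt_i = np.linalg.svd(Phi_i)
    v = Vt_i[0]
    # it coincides (up to sign) with the (i+1)-th right SV of Phi ...
    assert min(np.linalg.norm(v - Vt[i]), np.linalg.norm(v + Vt[i])) < 1e-8
    # ... and its amplification equals the corresponding singular value
    assert abs(np.linalg.norm(Phi_i @ v) - sigma[i]) < 1e-8
    # deflate: Phi_i <- Phi_i - sigma_i u_i v_i^T
    Phi_i = Phi_i - s_i[0] * np.outer(U_i[:, 0], Vt_i[0])
```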
As to the PE, a normalization is also applied in order to be able to compare different variables such as density, temperature and velocity. In this situation the norm for y = Φδx may be a semi-norm.

Adaptive filter (AF)
The idea underlying the AF is to construct a filter which uses feedback in the form of the PE signal (innovation) to adjust the free parameters in the gain so as to optimize the filter performance. While in the KF optimality is defined as a minimum mean squared error (MMS), in the AF optimality is understood in the sense of the MMS of the prediction output error (innovation). This definition defines the optimality of the filter in the realization space, not in the probability space as is done in the KF.
The optimal gain can thus be determined by solving the optimization problem of adjusting all elements of the filter gain (under a stationarity hypothesis). There are two major difficulties to overcome in this situation: (i) Instability: as the filter gain is estimated stochastically during the optimization process, the filter may become unstable due to the stochastic character of the filter gain.
(ii) Reduction of tuning parameters: for an extreme HdS, the number of elements in the filter gain is still very high, and a reduction of the number of tuned gain elements is necessary.

Leading EVs and SchVs as optimal perturbations
Interest in the stability of the AF arose soon after the AF was introduced. The study of filter stability shows that it is possible to ensure filter stability when the system is detectable [4]. For the different parameterized stabilizing gain structures based on a subspace of unstable and neutral EVs, see [4]. As the EVs may be complex and their computation is unstable [11], it is proved in [4] that one can also ensure the stability of the filter if the space of projection is constructed from a set of unstable and neutral SchVs of the system dynamics. The unstable and neutral real SchVs are the SchVs associated with the unstable and neutral eigenvalues of the system dynamics. The advantage of the real SchVs is that they are real and orthonormal and their computation is stable. Moreover, the algorithm for estimating the dominant SchVs is simple, being based on the power iteration procedure (Sampling Procedure, see [12]). As to the unstable SVs, although they are real and orthonormal, their computation requires an AdC for computing the product Φ^Tx. Construction of an AdC is a time consuming and tedious process. Approximating the leading SVs without an AdC can be done on the basis of Algorithms 3.1-3.2 in [13].

EVs as optimal perturbations in the invariant subspace
Let Φ be diagonalizable. The subspace of x ∈ C^n satisfying Φx = λx for some λ ∈ C is known as an invariant subspace of Φ: the matrix Φ acts on x to stretch it but conserves the direction of x. Consider the optimization problem of maximizing ||Φx||_2 over unit-norm x in this invariant set. It is seen that the optimal solution is the first EV x_ei(1) of Φ, with amplification equal to |λ_1|, the largest eigenvalue modulus. We will call x_ei(1) the first optimal EV perturbation (denoted EI-OP).
For a symmetric matrix, the EI-OP coincides with the SOP. The EI-OP is not unique. By solving the optimization problem (5) subject to an additional orthogonality constraint, one finds the second EI-OP x_ei(2). In a similar way, by defining the deflated problems for i = 1, 2, ..., n − 1, we obtain a sequence of EI-OPs x_ei(i), i = 1, 2, ..., n. The first n_e EI-OPs are the unstable EVs.
In general, in the defective case (Φ not diagonalizable), Φ does not have n linearly independent EVs, and the independent generalized EVs can serve as "optimal" perturbations to construct a subspace of projection in the AF.
To summarize, let the EV decomposition be Φ = X_ei J X_ei^{−1}, where J is a matrix in Jordan canonical form and X_ei^{−1} is the matrix inverse of X_ei (see [10]). The columns of X_ei are the EVs of Φ, and J is block diagonal with diagonal blocks of dimension 1 or 2. The rank-k decomposition is EV_k = X_ei,1 J_1 X̃_ei,1, where X_ei,1 consists of the first k columns of X_ei, J_1 is the corresponding leading block of J, and X̃_ei,1 consists of the first k rows of X_ei^{−1}. Multiplying EV_k on the right by X_ei,1 yields X_ei,1 J_1 (since X̃_ei,1 X_ei,1 = I_k), i.e. we obtain the k largest (in modulus) perturbations in the eigen (invariant) space of Φ. The perturbations given by the column vectors of X_ei,1 (i.e. the first k EVs of Φ) are the first k OPs of Φ in the eigen-invariant subspace (EI-InS).
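For the diagonalizable case this can be checked directly (toy matrix; here J_1 is simply diagonal, whereas in general it may contain 2×2 blocks for complex conjugate pairs):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
Phi = rng.standard_normal((n, n))

# eigen-decomposition Phi = X_ei J X_ei^{-1}, eigenvalues sorted by modulus
lam, X_ei = np.linalg.eig(Phi)
order = np.argsort(-np.abs(lam))
lam, X_ei = lam[order], X_ei[:, order]

# the k leading EVs span an invariant subspace: Phi X_1 = X_1 J_1
k = 2
X_1, J_1 = X_ei[:, :k], np.diag(lam[:k])
residual = np.linalg.norm(Phi @ X_1 - X_1 @ J_1)
```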

Dominant SchVs as OPs in the Schur invariant subspace
The study in [4] shows that the subspace of projection of the stable filter can be constructed on the basis of all unstable EVs or SchVs of Φ.
Compared to the EVs, the approach based on the real Schur decomposition is preferable in practice, since the SchVs are real and orthonormal. Moreover, there exists a simple power-iterative algorithm for approximating the set of leading real SchVs. According to Theorem 7.3 in [10], the subspace R[X_s,1] spanned by the n_u leading SchVs converges to the unique invariant subspace D_{n_u}(Φ) (called the dominant invariant subspace) associated with the eigenvalues λ_1, ..., λ_{n_u} if |λ_{n_u}| > |λ_{n_u+1}|. In this sense we consider the leading SchVs as OPs (denoted Sch-OPs) in the Schur invariant subspace (Sch-InS).
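The underlying subspace (block power) iteration can be sketched as follows. This is a minimal version on an invented matrix with a known real spectrum and a gap |λ_{n_u}| > |λ_{n_u+1}|; in a real OFS the products Φq are realized by model integrations, and the QR factorization plays the role of the Gram-Schmidt step:

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_u = 8, 3

# build a test matrix with known real eigenvalues and a spectral gap
T = np.triu(rng.standard_normal((n, n)))
np.fill_diagonal(T, [3.0, 2.5, 2.0, 0.5, 0.4, 0.3, 0.2, 0.1])
Q0 = np.linalg.qr(rng.standard_normal((n, n)))[0]
Phi = Q0 @ T @ Q0.T

# subspace iteration: propagate a block of vectors through the dynamics,
# then re-orthonormalize
Q = np.linalg.qr(rng.standard_normal((n, n_u)))[0]
for _ in range(300):
    Q = np.linalg.qr(Phi @ Q)[0]

# R[Q] converges to the dominant invariant subspace: Phi Q = Q (Q^T Phi Q)
residual = np.linalg.norm(Phi @ Q - Q @ (Q.T @ Phi @ Q))
```

Only matrix-vector (model) products and an orthogonalization are needed, which is why this route avoids the adjoint code entirely.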

Optimal stochastic perturbation (OSP)
In Section 2 the perturbation δx is deterministic (see Definition 2.1). In practice, δx may be of a stochastic nature. For example, the a priori information on the FE is a zero-mean random vector (RV) with error covariance matrix (ECM) P. The questions arising here are how to define the OP in such a situation and how to find it.
We will now consider δx as an element of the Hilbert space H of RVs. This space H is a complete normed linear vector space equipped with the inner product <x, y> = E(x^Ty), where E(.) denotes the mathematical expectation. The norm in H is defined as ||x|| = [E(x^Tx)]^{1/2}. All elements of H are of finite variance and, for simplicity, we assume they all have zero mean.
Introduce the set S_s(δx) of RVs satisfying the constraint

E||δx||^2 = 1, (16)

and consider the problem of maximizing

E||Φδx||^2 (17)

over S_s(δx). One can prove

Lemma 4.1. The optimal perturbation in the sense (17), (16) is δx_o = ψv_1, where ψ is an RV with zero mean and unit variance and v_1 is the first right SV of Φ.
Comment 4.1. Comparing the ODP with the OSP shows that while the ODP is the first right SV (defined up to sign), the OSP is an ensemble of vectors lying in the subspace of the first right SV, with lengths given by samples of an RV of zero mean and unit variance.
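A small Monte-Carlo sketch (toy matrix; a Gaussian ψ is just one admissible choice of a zero-mean unit-variance RV) illustrates Comment 4.1: every OSP sample lies along v_1, and the mean squared amplification equals σ_1²:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 6
Phi = rng.standard_normal((n, n))
U, sigma, Vt = np.linalg.svd(Phi)
v1 = Vt[0]

# OSP: delta_x = psi * v1, psi a zero-mean unit-variance scalar RV
psi = rng.standard_normal(100_000)
samples = psi[:, None] * v1[None, :]

# E ||Phi delta_x||^2 = sigma_1^2 * E psi^2 = sigma_1^2
mean_amp2 = np.mean(np.linalg.norm(samples @ Phi.T, axis=1) ** 2)
```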
Introduce Φ_1 := Φ − σ_1u_1v_1^T and consider the objective function E||Φ_1δx||^2 (18). One can prove

Lemma 4.3. The optimal perturbation in the sense (18), (16) is δx_o = ψv_2, where ψ is an RV with zero mean and unit variance and v_2 is the second right SV of Φ.

By iteration, applying Lemma 4.3 with slight modifications, one finds that the OSP for Φ_{k−1} is ψv_k, k = 1, 2, ..., n, where ψ is an RV with zero mean and unit variance and v_k is the k-th right SV of Φ.
Theorem 4.1. The optimal perturbation for the matrix Φ_{i−1}, i = 1, ..., n, is ψv_i, where ψ is an RV with zero mean and unit variance and v_i is the i-th right SV of Φ.

Simultaneous stochastic perturbations
In [14] Spall proposes the simultaneous perturbation stochastic approximation (SPSA) algorithm for finding optimal unknown parameters by minimizing some objective function. The main feature of the SPSA resides in the way the gradient vector is approximated (on average): a sample gradient vector is estimated by perturbing all components of the unknown vector simultaneously in a stochastic way. This method requires only two or three measurements of the objective function, regardless of the dimension of the vector of unknown parameters. In [15] the idea of the SPSA is described in detail, with a wide variety of applications in engineering domains. The application to the estimation of the ECM in the filtering problem is given in [16]. In [13] a simple algorithm for estimating the elements of an unknown matrix, as well as a way to decompose the estimated matrix into a product of two matrices under the condition that only the matrix-vector product is accessible, has been proposed.

Theoretical background of SPSA. Gradient approximation
Component-wise perturbation is a method for the numerical approximation of the gradient of the cost function with respect to the vector of unknown parameters. It is based on the idea of perturbing each component of the vector of parameters separately. For very HdSs this technique is impossible to implement. An alternative to the component-wise perturbation is the SSP approach.
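The SSP gradient estimate of [14] can be sketched on a toy objective (the quadratic J and all sizes below are invented for the example; δ has i.i.d. Bernoulli ±1 components):

```python
import numpy as np

rng = np.random.default_rng(7)

def J(theta):
    # toy quadratic objective; its exact gradient is 2 * theta
    return float(theta @ theta)

def spsa_gradient(theta, c=1e-3):
    # perturb ALL components simultaneously; only TWO evaluations of J
    # are needed, whatever the dimension of theta
    delta = rng.choice([-1.0, 1.0], size=theta.size)
    return (J(theta + c * delta) - J(theta - c * delta)) / (2 * c) / delta

theta = rng.standard_normal(50)
# a single sample is noisy, but averaging recovers the true gradient (in mean)
g_hat = np.mean([spsa_gradient(theta) for _ in range(5000)], axis=0)
```

In contrast, the component-wise scheme would need 2·dim(θ) evaluations of J per gradient, which is what rules it out for very high dimensional parameter vectors.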

Algorithm for estimation of an unknown matrix
In [13] different algorithms are introduced for the estimation of an Hd matrix (Algorithm 2.1) as well as for the decomposition of an Hd matrix into the product of two matrices (Algorithm 3.1). There is also an iterative matrix decomposition algorithm (Algorithm 3.2), which is important for computing the OP in an Hd environment. For more details on the optimal properties of the produced estimates and the convergence of the algorithms, see [13].
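The key observation behind such matrix estimation can be sketched as follows (our own minimal illustration of the idea, not a reproduction of Algorithm 2.1 of [13]): since E[δδ^T] = I for Bernoulli ±1 perturbations, averaging (Φδ)δ^T over samples recovers Φ using only matrix-vector products:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5
Phi_true = rng.standard_normal((n, n))

def matvec(x):
    # stand-in for the only accessible operation: one model integration
    return Phi_true @ x

# E[(Phi delta) delta^T] = Phi E[delta delta^T] = Phi for Bernoulli +/-1
N = 20_000
Phi_hat = np.zeros((n, n))
for _ in range(N):
    delta = rng.choice([-1.0, 1.0], size=n)
    Phi_hat += np.outer(matvec(delta), delta)
Phi_hat /= N
```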

Numerical example
Consider the matrix Φ whose elements are given in Table 2. First, we apply Algorithm 2.1 of [13] to estimate the matrix Φ. Figure 1 shows the estimates produced by Algorithm 2.1 [13]. It is seen that the estimates converge quickly to the true elements of Φ.
Next, Algorithm 3.2 (Iterative Decomposition Algorithm) of [13] has been applied to estimate the decomposition of the matrix Φ. After each i-th iteration the algorithm yields b̂(i), ĉ(i).
The different OPs are shown in Table 1, where x_rsv(i), x_ei(i) and x_sch(i) are the theoretical SV-OPs, EI-OPs and Sch-OPs. The vectors x̂_sch(i) are the components of X̂ computed by Algorithm 3.1 in [12] (Sampling Procedure). As to ĉ_n(i), these are the results of normalizing ĉ(i) (to unit Euclidean norm).
Looking at the first OPs, one sees that integration of x_rsv(1), x_sch(1), x̂_sch(1) and ĉ_n(1) produces vectors of almost the same amplification. The predictor from the first x_ei(1) has an amplification three times smaller than those of x_rsv(1), x_sch(1). The predictor from the second x_sch(2) is much less optimal than those of x_rsv(2) and ĉ_n(2). Comparing x_rsv(i) with ĉ_n(i) for i = 1, 2, one concludes that the obtained results confirm the correctness of Theorem 3.1 of [13]. Note that only x̂_sch(i) and ĉ_n(i) can be calculated for an HdS. In Table 2 we show the results obtained by Algorithm 3.2 [13] after two consecutive iterations (matrix estimation in the R^1 subspace). The elements of the true Φ = [φ_ij] are displayed in the second column and their estimates in the third column. The estimates resulting from the first iteration are the elements of Φ̂(1) (Table 2, column 4). After the first iteration, the residual matrix Φ(2) is formed and the optimization yields the estimates φ̂(2)_ij displayed in column 5.
From columns 4-5 one sees that the first iteration estimates well the two biggest elements Φ_11 = 5, Φ_12 = 7. In a similar way, the second iteration captures the two biggest elements of Φ(2).

Assimilation in the high dimensional ocean model MICOM

To see the impact of the optimal SchVs on the design of a filtering algorithm for an HdS, in this section we present the results of the experiment with the Hd ocean model MICOM (Miami Isopycnic Coordinate Ocean Model). This numerical experiment is identical to that described in [12]. The model configuration is a domain situated in the North Atlantic from 30°N to 60°N and 80°W to 44°W; for the exact model domain and some main features of the ocean current produced by the model, see [12]. The system state is x = (h, u, v), where h = h(i, j, k) is a layer thickness and u = u(i, j, k), v = v(i, j, k) are the two velocity components. Note that after discretization the dimension of the system state is n = 302400. The observations available at each assimilation instant are the sea surface height (SSH), of dimension p = 221.

Data matrix based on dominant Sch-OPs
The filter is a reduced-order filter (ROF) with the variable h as the reduced state; u, v are calculated on the basis of the geostrophy hypothesis. To obtain the gain of the ROF, Algorithm 3.1 of [12] was first implemented to generate an ensemble of dominant SchVs (72 SchVs in total, denoted En(SCH)). The sample ECM M_d(SCH) is computed on the basis of En(SCH). Due to rank deficiency, the sample M_d(SCH) is considered only as a data matrix. The optimization procedure is applied to minimize the distance between the data matrix M_d(SCH) and the structured parametrized ECM written in the form of the Schur product of two matrices, M(SCH) = M_v(θ) ⊗ M_h(θ). Here M_v is the vertical ECM, M_h is the horizontal ECM [16], and θ is a vector of unknown parameters. Note that the hypothesis of separability of the vertical and horizontal variables in the ECM is not new in meteorology [17]. The gain is computed according to [16] with R = αI, where α > 0 is a small positive value. This ROF is denoted PEF(SCH). The second data matrix M_d(SSP) is obtained by perturbing the system state according to the SSP method. The SSP samples are simulated in a way similar to that described above for generating En(SCH), with the difference that the perturbation components δh^(l)(i, j, k) are i.i.d. random Bernoulli variables taking the two values ±1 with equal probability 1/2. The same optimization procedure has been applied to estimate M(SSP). The obtained ROF is denoted PEF(SSP).
Figure 2 shows the evolution of the estimates of the gain coefficient k(1), computed from the estimated coefficients ĉ_kl of M_d(SCH) and M_d(SSP) on the basis of En(SCH) (curve "schur") and En(SSP) (curve "random"), during model integration. It is seen that the two coefficients evolve in nearly the same manner, with nearly the same magnitude as that of k(1) in the CHF (Cooper-Haines filter, [18]). Note that the CHF is a filter widely used in oceanic data assimilation; it projects the PE of the surface height data by lifting or lowering water columns.
The same picture is obtained for the estimates ĉ_k, k = 2, 3, 4. Note that in the CHF, c_2 = c_3 = 0.

Performance of different filters
In Table 3 the performances of the three filters are displayed. The errors are the spatially and temporally averaged root mean square (RMS) of the PE for the SSH and for the two velocity components u and v.
The results in Table 3 show that the two filters PEF(SCH) and PEF(SSP) have practically the same performance, and their estimates are much better than those of the CHF, with a slightly better performance for the PEF(SSP). We note that, as the PEF(SCH) is constructed on the basis of an ensemble of samples tending to the first dominant SchV, its performance should theoretically be better than that of the PEF(SSP). The slightly better performance of PEF(SSP) compared to PEF(SCH) may be explained by the fact that the best theoretical performance of PEF(SCH) can be obtained only if the model is linear and stationary and the number of PE samples in En(SCH) at each iteration is large enough. The ensemble size of En(SCH) in the present experiment is too small compared with the dimension of the MICOM model. It is therefore possible that a more efficient way to simulate the PE samples is to perturb all components of the system state randomly. A detailed study of this question is of undoubted interest and is left for future work.
To illustrate how adaptation can improve the performance of a nonadaptive filter, in figure 3 we show the cost functions (variances of the innovation) resulting from applying the three filters PEF(SCH), PEF(SSP) and the AF (i.e. the APEF based on PEF(SCH); the same performance is observed for the AF based on PEF(SSP)). Undoubtedly, the adaptation considerably improves the performance of the nonadaptive filters.

Conclusions
In this paper we introduced different types of OP (deterministic, stochastic, and OPs in the invariant subspaces of the system dynamics), which play an important role in the study of predictability, in generating an ensemble of (optimal) PE samples for EnF, as well as in the construction of a stabilizing subspace in a ROF. Theoretical results on the optimality of the OPs were presented. Computing OPs makes it possible to greatly reduce the size of an ensemble of forecasts needed to determine the ensemble spread of possible forecast values in an optimal way. The OP approach also serves as a tool for finding a basis of the subspace of correction, ensuring the stability of the ROF for Hd environmental geophysical systems.
As seen from the theoretical results, the direction of fastest error growth is obtained by integration of the first right SV. Computation of the SVs requires the AdC of the tangent linear model (or of the system transition matrix). It is well known that coding the adjoint, especially for sophisticated environmental models, is time consuming and error prone. One way to avoid the development of an AdC is to apply Theorem 3.1 in [13] to compute the leading SVs by solving the corresponding optimization problem (see Sect. 5.3). On the other hand, the Sch-OPs seem preferable for practical application, since they are orthogonal (as are the SVs), their computation is simple (requiring only the integration of the model and application of the Gram-Schmidt orthogonalization procedure), and their forecast amplitudes are almost the same as those of the SV-OPs. This advantage, together with the fact that SchV computation is numerically stable, makes the SchVs the most attractive OPs.
The numerical experiments presented in this paper illustrate an important role of the OPs in the study of Hd data assimilation systems.

Figure 1.Estimates of elements of the matrix Φ

Figure 2 .
Figure 2. Evolution of the estimates of the gain coefficient k(1) computed from ĉ_kl on the basis of En(SCH) (curve "schur") and En(SSP) (curve "random"), during model integration. The two coefficients evolve in nearly the same manner, with nearly the same magnitude as that of k(1) in the CHF. The same picture is obtained for the other c_k, k = 2, 3, 4.

Figure 3 .
Figure 3. Variance of the PE resulting from the three filters PEF(SCH), PEF(SSP) and AF. It is seen that the AF yields much better performance than the two nonadaptive filters PEF(SCH) and PEF(SSP).

Table 3 .
RMS of PE for ssh, and u, v velocity components