Data Assimilation Method, its Application and Stability

The authors data assimilation method, namely, generalized Kalman filter (GKF) method, its application and stability is considered. The problem of stability of a dynamic system with data assimilation formulated for a sequence of random variables forming a Markov chain is considered. The stability formulation for this problem is suggested as the problem of the convergence of the corresponding Markov chain when the number of its members goes to infinity. Necessary and sufficient conditions of this convergence are proved. A number of numerical experiments with the specific dynamic system, namely with the ocean model circulation HYCOM and the GKF method are conducted and discussed. The stability of the GKF method was proofed.


Introduction
The development and application of data assimilation (DA) methods in conjunction with a dynamic system (model) defined by a system of differential equations is one of the popular and relevant problems of modern oceanology, geophysics, meteorology and climatology that have a great practical importance. The goal of these methods is to correct the model solution with the combination of observational data and, improve on this basis both the structure of the model characteristics themselves and the prediction of the characteristics of this system for various space-time scales. At the same time, unlike the data interpolation, the DA methods and schemes involve the changes in the entire dynamic system for both the observable and non-observable variables in order to maintain a balance between model parameters, which usually consists in the mathematical formulation of conservation laws. In the process of data assimilation, it is necessary to preserve the balance conditions and, at the same time, bring the model solution closer to observations in a given metric.
DA schemes and methods in oceanography and meteorology have been quite well represented in the scientific literature since the 1960s. A good review of the methods for solving DA problems is presented in [1]. In the recent years, the main progress in this direction is related to the advance in observational data and computer facilities; the development of powerful computing methods such as parallel programming, and special research programs and experiments, for instance, projects HYCOM + NCODA (USA) [2], REMO (Brazil) [3], etc.
Nevertheless, some important theoretical and practical specific problems in the DA theory remain out of scope of research. There are practically no theoretical studies stability and reliability of numerical forecast based on the DA theory, except for their direct statistical comparison with observations. It is not clear how the system will behave with quite a long integration process, noticeably perturbed initial conditions and model parameters, as well as with a lack of knowledge (errors) on the observed values.
In geophysical and meteorological models, dynamic equations are very complex, so, their analytical solution cannot be obtained; various examples show that even in fairly simple non-linear models, there are attraction points (attractors) and under close initial conditions, the solutions can converge to different attractors.
When dealing with DA, the task becomes even more complicated. Since the dynamical system is "rebuilt" at the moment of assimilation and begins to integrate with the new initial conditions, it seems to receive a "jolt" and the smoothness of the trajectories in the analytical sense is violated. With sufficiently frequent and long-term assimilation, the system may become unstable simply due to a permanent influx of new observable information and its assimilation. The problem is all the more significant since, as a rule, observational data are irregular in time, inhomogeneous in space, have omissions and, on the contrary, duplicates. In addition, they are often noisy and require preliminary mathematical processing. Because of this, the problem of the stability of a dynamic system with DA is of great importance from the theoretical and practical points of view, especially for the analysis and prediction of real characteristics.
In this paper, we study the general dynamic system with a specific assimilation scheme proposed by the authors in a number of papers, for instance, in [4] as a version of the so-called "generalized Kalman filter (GKF)". In [5], the GKF method is also compared with the Ensemble Optimal Interpolation scheme (EnOI) as a counterpart of the standard Kalman filter method and it is shown that the proposed general method has several advantages.
In the current work, the dynamic system in conjunction with the DA scheme is considered as a Markov chain and the stability problem is reduced to the existence of a stationary distribution of this chain. The goal of this work is to find theoretically the necessary and sufficient conditions for the stability of a dynamical system of general form with DA, using the GKF method. Experiments are also proposed which would allow answering the question of whether the entire model (the dynamic system plus the GKF assimilation) is stable in the aforementioned sense.

Data assimilation method
The dynamic system defined by the equations Here, X denotes the vector state of the system. The right side of equation (1) is a vector function, which is, generally, non-linear and does not contain explicitly time derivatives. The solution of (1) can be represented as where 0 ( ) X t is a given value (initial condition). System (1) is considered on a probability space, so that the vectors ( ) X t and the initial condition 0 ( ) X t are assumed to be random and measurable with respect to a given probability measure P. In this case, the vector function Λ is not assumed to be random.
Along with the dynamic system (1), the adjoint dynamic system is considered, where Y is a random variable independent of X, measurable with respect to P, and considered on the same probability space; H is a linear non-random projection operator that transfers the random variable X to the space defined by the values of the quantity Y; and K is a given weight matrix, which, generally speaking, depends on X and Y, and, of course, on time. Its meaning is to correct the value of the solutions of the dynamic system (1) using the observed value. It should be noted that, using formula (2), we change the entire vector rather than only its observed part HX, that is, the matrix K converts the observed vector to the model space (of the dynamic system (1)). All model variables are considered on the Euclidian space with the dimension r. Let the dimension of observational vector Y be m. Remark. Since random functions are considered in (1) and (2), the corresponding time derivatives are determined in the sense of the mean square.
Further, we will consider the system (2) at discrete equidistant moments of time, n n t t t + = + ∆ are the so-called assimilation moments. The requirement of equidistance is insignificant and is introduced only to simplify calculations. So, let the dynamic system be given in the form 1 1 1 ( , ) ( ) n n n n n n n X X X t t K Y HX t and assimilation scheme GKF produces the following matrix [4].

Numerical experinents
As an example of applications of the DA scheme described above, the following experiments are considered. The ocean circulation model HYCOM (HYbrid Circulation Ocean Model, University of Maiami, USA) is used as a given dynamic system according to (1). This model produces all parameters of ocean state, such as temperature, salinity, sea level and others as well as their variability in time. The model domain is shown in Fig 1. Since this model is used only as an example, no detailed descriptions for the model equations, numerical configuration and boundary conditions are presented; full information can be found in [6].
For the assimilation according to equations (3),(4), the AVISO (Archiving, Validating and Interpolating Satellite Observations) database was used. This archive contains the sea level data for several years, we used the data for 10 years: from January, 2001 to December, 2010. The data assimilation has been performed daily for one year; the results of integration for one month, January, 2010, is discussed below. Before the DA, the model forced by the climatology data from the NCEP/NCAR archive atlas runs for 40 years [7].
Figures 2 present the results of calculation after the one-month integration for January, 2010. Figure 2a is the direct observations taken from arhive AVISO on this day; Fig. 2b is the control field that is the model calculation without any assimilation; and Fig. 2c presents the analysis. i.e., the model field with DA according to scheme (3), (4). It is clearly seen that the field in Fig. 2c realizes some intermediate position between those in Figs. 2a and 2b. Indeed, Fig. 2a has many synoptical eddies, dynamically active and chaotic structures. On the contrary, Fig. 2b contains much less synoptical eddies and it is less dynamical. Figure 2c captures the observed dynamic structures, however, it is substantially smoothed by the model itself.

Stability of the data assimilation method
The stability of the method will be proved on the theory of Markov chains. As it is seen from (3)  Definition. Let random variable Y be consistent with such as Y=HX with P -non zero probability.
Lemma. Let the following conditions be fulfilled: 1) the value From the Lemma conditions the matrix ( , where x* is some mean value, because of its continuity and limitation. Therefore the equation (5) has a solution with non -zero P-probability, which leads to the condition ( , ) 0 p x y > . This proves the lemma.

Corollary
The proof of the theorem follows from the known properties of Markov chains [8].
As a consequence of the presented result we can assert that if all conditions hold, the equations (3), (4) becomes stable if the number of time-steps unlimitedly increase.

Conclusions
This paper states that the presented earlier and studied in the current work DA method has several advantages. It is not only pactically feasible and numerical effective, which is shown before but it is also theoretically consistent. It is proved that under reasonable conditions this DA scheme is stable in sense of long-term integrations and it limit behavior is realistic and can be assessed numerically. In the further development it is possible to obtain the characteristics of the limit distribution and directly calculate their average value and variance. Physically it means that if the model scenario is realistic and is consistent with the observations the resulted model fields after assimilations have physical sense and can be analyzed.
Th i s r esear ch was su p p or t ed by Ru ssi an Sci en ce Fou n d at i on , p r oject n o. 19 -11-00076.