On the advances of automatic modal identification for SHM

Structural health monitoring of civil infrastructures has great practical importance for engineers, owners and stakeholders. Numerous researches have been carried out using long-term monitoring, for instance the RioNiterói Bridge in Brazil, the former Z24 Bridge in Switzerland, the Millau Bridge in France, among others. In fact, some structures are monitored 24/7 in order to supply dynamic measurements that can be used for the identification of structural problems such as the presence of cracks, excessive vibration, damage or even to perform a quite extensive structural evaluation concerning its reliability and life cycle. The outputs of such an analysis, commonly entitled modal identification, are the so-called modal parameters, i.e. natural frequencies, damping ratios and mode shapes. Therefore, the development and validation of tools for the automatic identification of modal parameters based on the structural responses during normal operation is fundamental, as the success of subsequent damage detection algorithms depends on the accuracy of the modal parameters estimates. The proposed methodology uses the data driven stochastic subspace identification method (SSI-DATA), which is then complemented by a novel procedure developed for the automatic analysis of the stabilization diagrams provided by the SSI-DATA method. The efficiency of the proposed approach is attested via experimental investigations on a simply supported beam tested in laboratory and on a motorway bridge.


Introduction and state-of-the-art
Firstly, for the sake of clarity, one should clearly distinguish modal parameter estimation (MPE) from modal tracking.The former consists in the estimation of modal parameters from a single record of measured data and the latter corresponds to tracking the evolution of the modal parameters of a structure through repeated MPE.This paper focuses on the MPE process, which means that structural responses are evaluated independently.Therefore, what would naturally be the second step, related to modal tracking, will not be considered here.
Thus, concerning the parametric system identification techniques, one can enumerate the following two steps for the MPE: 1.
Computation of the modes estimates for several orders of the parametric model, by fitting them to the response series; 2.
Interpretation of the modes estimates data by detecting spurious modes and selection of the physical ones.
In fact, without the automation of the MPE process, a lot of user interaction over a large amount of measured vibration data would be required, leading to an impossibility of practical applications such as the health monitoring of crucial infrastructures.Therefore, especially over the last decade, several methods aiming the reduction of the number of user-defined parameters are being developed [1][2][3][4], some of which reached the complete elimination of all manually specified parameters: these methods are usually labelled as fully automated.
For instance, in 2012, Reynders et al. [1] published a fully automated approach for the interpretation of a stabilization diagram.It was cleverly based on clustering techniques through three stages: a diagram pre-cleaning by means of a classification of all modes into two categories (possible physical or certainly spurious); a hierarchical clustering of the possible physical ones to group them together (automatic detection of vertical lines on diagram); and a final classification of the formed clusters into a physical or spurious condition.No userdefined parameter is needed whatsoever in the entire process.
On the other hand, the structural dynamic monitoring practice has been showing that, even the most wellcrafted automated process of MPE has to be, at a first use, manually checked or tuned to fit a possible specific behaviour of the analysed structure in order to obtain satisfactory results.As a consequence, the development focus of new methods does not need to be on the elimination of all the user-defined parameters (fully automation), but on the construction of a robust automated approach with some few easy configurable user settings, which are defined once before starting the repetitive automated process itself.Thereby, this paper presents an automated method for interpreting the stabilization diagram with just two easily user-defined parameters.In other words, the main objective here is all about the second step of the MPE previously stated.
Naturally, in order to obtain the data related to the stabilization diagram, one must first carry out the first step.This is done by the parametric system identification algorithm.Thus, in this paper, one utilizes the data-driven Stochastic Subspace Identification method (SSI-DATA), which is briefly discussed in Section 2.
A benchmark method for the automatic interpretation of the stabilization diagram according to Magalhaes [2] is treated in section 3.Moreover, this section also presents the novel approach developed in this paper.Section 4 shows the results and comparisons between the methods subjected to practical applications.Finally, in section 5, some conclusions about the results are drawn.

Data-driven stochastic subspace identification
Since the work published by Van Overschee and De Moor in 1996 [5], the application of data-driven SSI algorithms to determine the modal parameters of structures has been quite spread over the ambient vibration tests.They compute state space models from output data (structure response) and can have its problem stated as following.
Given s measurements of the output yk ℝ l generated by the unknown stochastic system of order n: where xk ℝ n is the state vector at a discrete time instant k, wk and vk ℝ n are zero mean, white noise vector sequences with covariance matrix (at instants p and q with δ meaning the Kronecker delta ):

Determine:
xThe order n of the unknown system; xThe system matrices A ℝ n x n and C ℝ l x n with the covariance matrices Q ℝ n x n , Sv ℝ n x l and R ℝ l x l .For this purpose, the SSI-DATA algorithm must consist of the following 8 steps: where † x denotes the Moore-Penrose pseudoinverse of the matrix x ; 3. Compute the Singular Value Decomposition (SVD) of the projection weighted by matrices W1 and W2: T W PW USV (5) 4. Determine the system order n by inspecting the rank of S (the number of non-zero singular values) and partition the SVD accordingly to obtain U1 and S1: 5. Compute the extended observability matrix Γ: 6. Determine, in a least square sense, the system (dynamic) matrix A from Γ as: where Γ denotes Γ without the last l (number of outputs) rows and Γ denotes Γ without the first l rows.7. Determine C as the first l rows of Γ. 8. Finally, knowing A and C, one can calculate the modal frequencies ωi, the modal damping ratios ξi and the corresponding modal shapes ϕi.This calculation is made through the eigenvalue decomposition of matrix A as the following statements show: where μi is the i th discrete-time eigenvalue of A and gi is its corresponding eigenvector.The eigenvalues of A are also called poles.They are complex numbers and appear mostly in conjugated pairs.Only those poles which have a conjugated pair and have positive imaginary component are taken into account for the equations (9) to (12).Therefore, up to n/2 modes are expected to be estimated.The Re( x ) is the symbol for the real part of x and λi denotes the continuous-time eigenvalue of A. These 8 steps summarize the SSI-DATA routine which is, essentially, a fit of a state space model to the temporal output data by means of the geometric projection of the row space of the future measurements on the past measurements.
There are several valid sets of weighting matrices W1 and W2 to be used in the equations ( 5) and (7).They determine the state space basis in which the model will be identified.Special choices of these matrices correspond to algorithms described in the literature: Principal Component (PC), Unweighted Principal Component (UPC) and Canonical Variate Analysis (CVA).The PC algorithm is the one used in the applications shown in section 4.
It is necessary to point out that when one works with real structure response data, it is very hard to determine the system order through inspection of singular values like stated in the fourth step.This happens because there are no zeros at all, but an asymptotic smooth variation towards zero (S is actually full rank).Thus, a threshold would be necessary to consider a singular value as being null.Therefore, instead of creating manners for defining this threshold, the system order n is readily given as an input to the algorithm, which solves the problem but creates the need for a further check whether this arbitrary order is able or not to accurately represent the actual data.One of the tools for this check is, exactly, the well-known stabilization diagram, where the modes estimates are plotted for various different system orders.

Automation methods for stabilization diagram interpretation
As previously mentioned, the mode estimates (usually presented in a stabilization diagram) are supposed to be analysed in order to get physical modes apart from the spurious (numerical) ones.The latter inevitably exist among the data, because they are a natural consequence of the parametric model attempt to better fit the response data.
Naturally, due to the large amount of collected data from online structural monitoring, a manual interpretation of modes estimates data is unfeasible.Thus, the obvious solution is the automation of such a process.For this purpose, in this paper, one proposes a novel approach, which was mainly inspired on the automation methodology proposed by Magalhães in his PhD thesis [3].His method is considered as a benchmark.
Both methods are suitable for the analysis of the outputs generated by any parametric identification technique that produces modes estimates (frequencies, damping ratios and modal shapes) for several orders.

Bechmark methodology
In his approach, no pre-filter is applied to the modes estimates data, which means that the stabilization diagram would be still full of certainly spurious modes, i.e. like those with extremely high or negative damping ratios.This method is based on a hierarchical clustering algorithm.
Firstly, the dissimilarity (distance) measure between all pairs of estimated modes is calculated.The adopted 'metric' for calculation of such a distance is: where fi and fj are, respectively, the natural frequencies of the modes estimates i and j.The mode shape similarity between these two modes estimates is taken into account by the well-known Modal Assurance Criterion (MAC).Hierarchical clustering algorithms differ in the way they compute the distance between two already formed clusters.In this methodology, the single-linkage is used, which means that the distance between two clusters will be considered as being equal to the smallest distance, also computed with equation (13), between objects inside these two clusters.This information allows constructing the hierarchical tree.
In the following step, one must define the tree's cutlevel.This is usually dependent on the expected number of clusters.However, this number is not trivial to predict due to the unknown quantity of clusters grouping spurious modes.Thus, the alternative strategy is the use of a distance limit (dlim).Such a threshold is the criterion to prone the branches of the hierarchical tree, generating clusters that are distant from each other more than that value.Hence, the lower is the limit, the higher is the number of resulting clusters.
At this point, one has each cluster representing a single mode.To decide whether the mode is spurious or not, the clusters are ranked according to their number of elements.Thus, the top nm clusters with more elements are selected.Once physical modes are very consistent for models with different orders, their clusters present a much higher number of elements than the clusters that contain numerical estimates (which present a higher scatter between models of different orders).Therefore, it is expected to find that the nm selected modes are indeed physical.In practice, it is common to realize that the number of physical modes nm can be guessed, for instance, by a simple preliminary frequency domain analysis.
Then, the damping ratios are taken into account by means of an outlier analysis within each selected cluster.This internal filtering eliminates the modes estimates with extreme damping ratio values, which are out of the range defined by ±2.698σ (standard deviation) that embraces 99.3% of the samples in a Gaussian distribution.
Finally, each selected cluster produces average values of the modal parameters (natural frequency, modal damping ratio and mode shape), which are the outputs of the methodology.

Proposed methodology
This method is also based on a hierarchical clustering algorithm.However, differently from the previous approach, one intends to avoid unnecessary computational efforts and aims to obtain better results from the clustering process.Thus, a pre-filter is applied on the modes estimates data, which removes all modes whose damping ratios are not between 0 and 15% (recommended for civil engineering structures).Besides that, the user optionally may have the elimination of all modes with Modal Phase Collinearity (MPC) indicator [1,6] lower than a specified mlim ranging from 0 to 1. Finally, the user also may have removed all modes estimates out of a range of frequencies specified (much like a "band-pass" filter on the modes estimates data).
After having the modes estimates data processed by the pre-filter, a plot of the stabilization diagram would reveal that a significant amount of modes was removed, yielding a clearer aspect.Depending on the situation, and this is not rare, the removed modes may even represent something around 50% of initial data.These removed modes estimates are considered as certainly spurious (or undesired) and will not be processed by the subsequent clustering algorithm.
For the distance calculation between the modes estimates i and j, the novel 'metric' is used: where c is a constant arbitrarily set to 5 Hz (fair weight value, found after several simulations by the authors).
One can note that, oddly, di,j ≠ dj,i is possible if the used metric is that one shown in equation ( 13), leading to an asymmetric dissimilarity matrix.On the other hand, this does not happen with the metric proposed in equation ( 14), which is perfectly symmetric.
Then, the hierarchical clustering is applied.However, differently from the benchmark methodology, the distance between two already formed clusters are measured according to a complete linkage, which means that such a distance is equivalent to the largest possible distance between two of their elements.With this information, one can construct the hierarchical tree that will be trimmed exactly like in the benchmark methodology: by a distance threshold dlim .
Contrary to what the previous methodology does, the internal filter is here applied within the clusters before they are ranked according to their number of elements.This is done to penalize the clusters with high scatter (likely to be representing a numerical mode) before they could be ranked among the top nm modes that will be selected as physical.Thereby, the internal filter removes outlier modes within the cluster considering, not only damping ratios, but also mode shapes.Mode estimates that have damping ratios out of the ±2.698σ range and/or have a lower than 0.8 MAC related to the cluster average mode shape are eliminated.
Finally, one can carry out the clustering procedure according to their number of elements.The top nm clusters (modes) with more elements are selected as being physical.Then, for the methodology output, each selected cluster has its average modal parameters (natural frequency, damping ratio and mode shape) calculated.

Laboratory experiment
This section presents the experimental tests conducted at COPPE/UFRJ laboratory on a simply supported steel beam depicted in Figure 1.This beam is 1.46 m long with rectangular cross-section (76.2 x 8.0 mm) and was instrumented with six piezoelectric accelerometers (PCB, 336C31).Data acquisition was carried out using Lynx ADS2002 equipment, which essentially is a conditioning/amplifier regulating system.This study considered random vibration tests (using a shaker).The random excitation was applied throughout the duration of the tests.The PC variant of SSI-DATA method was applied to the response series (six channels), leading to modes estimates of models with even order from 10 to 120.After a first quick manual judgment, the user-defined parameters were set up: dlim=0.01 and nm=8 (benchmark methodology); dlim=1Hz and nm=5 (proposed methodology).The mlim filter parameter was set to 0.9.Then, the different methodologies performed the automatic interpretation of such a data.Results of the processes, for both methods, can be checked in Figures 2  to 9.  It is possible to see the clearer aspect of the stabilization diagram produced by the proposed methodology.Besides, one can note that the columns of modes estimates considered physical, are much more vertical (less scattered) for the proposed approach.The number of elements grouped into each cluster can be checked by means of Figures 4 and 5.The nm selected modes are highlighted by the big circles on the top of the stem.One can note the high number of elements grouped, erroneously, into the cluster with frequency mean close to 200Hz in the case of the benchmark methodology (Figure 4).
Figures 6 and 7 show the modes estimates of the nm selected clusters, scattered in function of damping ratio and natural frequency.Concerning the benchmark results (Figure 6), special atention should be paid to the high dispersion of modes estimates with frequencies higher than 150 Hz.This phenomenon does not happen with the proposed methodology, as can be seen on Figure 7.

PI-57 Oise-Bridge
The PI-57 Bridge is a double-deck bridge located near the town of Senlis in France, crossing the Oise River and carrying the A1 motorway, which connects Paris to Lille (Figures 10).The bridge, a 116.50 m long, cast-in-place, post-tensioned segmental structure built in 1965, consists of three continuous spans of 18.00 m, 80.50 m, 18.00 m.The two lateral spans play the role of counterweights.Dynamic tests were performed under ambient excitation: the traffic was used as a source of excitation.Sixteen piezo-electric accelerometers (Bruel & Kjaer 4507B-005 with sensitivity 1 V/g, frequency range from 0.4 to 6000 Hz, maximum operational level 75g, temperature range from -54 to 100ºC) have been installed on the bridge deck (Lille/Paris -Figure 11).The data acquisition system was composed of two separate data acquisition systems.For the acceleration recording, a data programmable controller Gantner E-PAC DL was used and connected to an 8 GB USB flash drive.Data were transferred by a TCP/IP modem.Accelerations were filtered within the 0-30 Hz frequency range and sampling was set to 0.004 s (250 Hz) during 5 min.In order to make the data processing amenable, structural data were only recorded every 3 h over a 24-h time period and stored on a buffer hard disk.For the sake of simplicity, in this experiment, only the upper nine vertical accelerometers (in the plane view -Figure 11) had their signals processed.Thus, one can expect that the algorithms will identify torsional modes among the vertical bending modes.Exactly like in the previous experiment, the SSI-DATA (PC) was applied to estimate structural modes considering both the benchmark methodology and the proposed methodology.However, differently from the previous test, in this application the first method had dlim set to 0.1 and nm set to 6. On the other hand, the proposed algorithm had its two parameters set to the very same values of the first experiment (dlim=1Hz and nm=5) with one difference: here the proposed method also eliminated (through its pre-filter) all the modes estimates with frequencies out of the 0 to 50 Hz range.
The benchmark method detected, as requested, six modes, which can be seen in Figure 12 as the six vertical columns of solid circles.
Concerning the proposed method, the stabilization diagram, after filtering, can be analyzed in Figure 13.This diagram depicts, clearly, five vertical columns of physical modes estimates (solid circles).Because of the filters, much less computational effort was needed to achieve the results.One can note the clearer aspect of the diagram.The number of elements per cluster can be seen in Figures 14 and 15 for the benchmark and proposed methodologies, respectively.The big head stems means the cluster is selected as being physical.
A sccater plot (frequency versus damping ratio) of the selected modes estimates can be checked by means of Figure 16 and Figure 17 for both methodologies.

Final considerations and remarks
Considering the experimental applications presented in this paper, it is possible to note that, compared to the benchmark approach, the proposed methodology works with a much lower number of mode estimates due to the pre-filter application.As a result, a faster and more accurate clustering process is achieved.In the first test, for instance, 1015 (57.83%) of the 1755 initial modes estimates were automatically considered as certainly spurious and, therefore, were immediately removed.This fact can be noticed through the comparison between the full stabilization diagram in Figure 2 and the cleared one in Figure 3.In a similar way, concerning the second test, as can be seen in Figure 13, the pre-filter also deleted a large number of undesired noisy modes.More precisely, this quantity reached 81.39% (1347) of the initial 1655 modes estimates.These expressive numbers of removed modes are mainly due to the MPC limit, which was set to 0.9.Such a parameter must be used with caution [1].
The hierarchical clustering process over the modes estimates, or over the few remaining ones in case of the proposed method, deserves special attention.In the first experiment, by inspecting Figures 2 to 9, one may observe that the benchmark method was not able to identify the first vertical bending mode with frequency close to 9 Hz.However, in spite of the low energy excitation of this mode during the tests, the proposed method did find this mode.Several values for the distance threshold (dlim) were tried on the benchmark method in order to induce the algorithm to 'catch' the first mode.However, the results were conclusive: as one increases the value of dlim, before the low frequencies clusters could gain enough modes to be ranked among the top nm, the higher frequencies clusters start to gather, in a too permissive way, a lot of modes (resulting in a fast growth of their stems in Figure 4).Consequently, a high scatter between modes belonging to a high frequency cluster can be observed in Figures 2, 6 and 8. Evidently, this excessive scattering is undesirable.A possible way of forcing the benchmark algorithm to catch the first mode is by keeping the same dlim value and increasing the nm parameter in an extreme conservative way.However, one can say that this solution is not elegant, since a lot of spurious clusters (modes) would be selected along with the physical ones.
The high frequency scattering phenomenon happens for two reasons.Firstly, the distance 'metric', which is written in equation ( 13), presents a dimensionless ratio of frequencies and, therefore, does not measure a distance between low frequency modes with the same rigor as it measures a distance between high frequency modes.Secondly, the single linkage criterion for measuring the distance between two already formed clusters promotes a decreasing in the heights of the hierarchical U-shaped branches, i.e. it makes easier to 'gather' two clusters when compared to the complete linkage criterion.
On the other hand, the novel methodology proposed in this paper is immune to such a phenomenon.That is due to the fact that the distance 'metric', stated in equation ( 14), does not contain any frequency ratio and, therefore, ensures that the distances between modes (with EVACE '15 S 03001-p.7 MATEC Web of Conferences either high or low frequency) will be expressed by an absolute value in Hz.Another remarkable point is that the success of the algorithm proposed in this work is not as sensible to the threshold limit as the benchmark algorithm is.For instance, in both tests the user-defined parameters for the proposed algorithm did not need to be adjusted.
Both experiments showed that the peculiarities of the two examined automation methods produce significant differences at the final process output.However, at least for the studied cases, the proposed methodology, when compared to the benchmark, generates better results and, therefore, represents an advance towards an even more robust automatic modal identification method.
DOI: 10.1051/ C Owned by the authors, published by EDP Sciences, 2015

Figure 4 .
Figure 4. Number of elements per cluster.Benchmark method.

Figure 5 .
Figure 5. Number of elements per cluster.Proposed method.

Figure 8 .Figure 9 .
Figure 8. Mode shapes, natural frequencies and damping ratios for each selected cluster.Benchmark methodology.

Figure 11 .
Figure 11.Plane view of the bridge with monitoring system.

Figure 12 .
Figure 12.Stabilization diagram obtained by the benchmark methodology.

Figure 13 .
Figure 13.Stabilization diagram obtained by the proposed methodology.

Figure 14 .
Figure 14.Number of elements per cluster.Benchmark method.

Figure 15 .
Figure 15.Number of elements per cluster.Proposed method.

Figure 17 .
Figure 17.Frequency versus damping.Proposed method.A summary of the automated identification results can be checked by means of the Figures 18 e 19 for both methods.The small font texts inform about the frequency and damping ratio of each mode.'STD' means standard deviation of the respective quantity.

Figure 18 .
Figure 18.Mode shapes, natural frequencies and damping ratios for each selected cluster.Benchmark methodology.

Figure 19 .
Figure19.Mode shapes, natural frequencies and damping ratios for each selected cluster.Proposed methodology.