Assessment of the stability of morphological ECG features and their potential for person verification / identification

This study investigates the potential of a set of ECG morphological features for person verification/identification. The measurements are made over 145 pairs of ECG recordings from healthy subjects, acquired 5 years apart (T1, T2 = T1+5 years). Time, amplitude, area and slope descriptors of the QRS-T pattern are analysed in 4 ECG leads forming a quasi-orthogonal lead system (II&III, V1, V5). The correspondence between feature values in T1 and T2 is verified via: factor analysis by the principal components extraction method; correlation analysis applied over the measurements in T1 and T2; synthesis of regression equations for prediction of feature values in T2 based on T1 measurements; and cluster analysis for assessment of the correspondence between measured and predicted feature values. Thus, 11 amplitude descriptors of the QRS complex are highlighted as stable, i.e. keeping their strong correlation (≥0.7) within a certain factor and weak correlation (<0.3) with the features from the remaining factors, and presenting high correlation between the two measurement periods, which is a sign of their person verification/identification potential. The observed agreement between feature values measured in T2 and those predicted via the designed regression models (r=0.93) supports the feasibility of person identification via the proposed morphological features.


Introduction
Research on automatic person identification is a rapidly developing area, driven not only by the security requirements of financial transactions, access control and travelling, but also by remote health monitoring scenarios in clinical and emergency medicine, as well as by the organization and processing of hospital databases. The efforts of many researchers are focused on the application of internal physiological biometric characteristics, which are robust to hacker attacks and falsification and are available in most health monitoring scenarios. The analysis of the electrocardiogram (ECG) in this respect started about a decade ago, applying either methods that use measurements after detection of fiducial points or analysis of the ECG morphology.
The fiducial-based approaches are the ones primarily considered in the field, as they employ morphological features that are typically measured for diagnostic purposes by ECG devices. Thus, the crucial task of precisely localizing specific anchor points on the P-QRS-T segment can be managed by certified commercial ECG analysis modules with minimal intervention. The application of such a person identity approach can be easily extended to automated management of in-hospital databases. According to the data cited in the literature, the identification accuracy could reach values between 97% and 100%, based on the following analysis approaches: 12 uncorrelated diagnostic features of P-QRS-T amplitudes and durations, processed by Principal Component Analysis score plots applied over a database of 20 subjects [1]; 15 P-QRS-T temporal features, fed to discriminant functions over 29 subjects under various stress conditions [2]; fiducial-based temporal and amplitude measurements combined with features that capture the heartbeat patterns, tested over 31 healthy subjects: 18 with a single ECG record and 13 with more than one ECG record [3].
The fiducial-independent approaches for person identification are based on assessing the similarity of the P, QRS and T waveforms in the analysed ECG recordings by calculation of correlation coefficients, which are subsequently processed with different techniques. The achieved identification accuracy is: 100% via discrete cosine transform over a database of 14 subjects [4]; 96.2% provided by discriminant analysis over 61 subjects [5]; 91.4% [6] and 85.7% [7] via assessment of the maximal correlation coefficient of a single-lead ECG for 11 and 14 subjects, respectively, improved to 100% when the 12-lead ECG is analysed for the database of 11 subjects [6].
The majority of the cited methods are tested with small-sized ECG databases [1,2,4,7] or track intra-subject changes of ECG characteristics measured at closely spaced time intervals or even in the same session [2,3,5]. The reported high identification accuracy might therefore be biased with respect to the real-case scenario, where the influence of different factors, such as long-term ECG changes, electrode rearrangement, etc., should be considered. Our experience with large ECG databases containing pairs of standard 12-lead ECG recordings separated in time shows lower authentication accuracies, i.e. about 86% correct verifications for fiducial-based morphological measurements over 574 ECG pairs [8]; 92.5% correct verifications for non-fiducial QRS pattern matching over 316 ECG pairs [9]; 87%/78% correct verifications/identifications for non-fiducial correlation analysis of limb leads over 147 ECG pairs [10].
Recently, we evaluated the inter-subject variability of morphological ECG features over a long period (>5 years) in a population of subjects without cardiac disease [11]. The aim of this study is to investigate the intra-individual stability of the ECG morphology using the same feature set, in order to assess the potential for person identification via orthogonal ECG leads.

ECG database
The ECG signals used in this study are taken from the computerized ECG-ILSA database [12,13], collected for the Italian Longitudinal Study on Aging project [14]. The ECGs were collected from individuals aged from 65 to 84 years. Each recording has a duration of 10 s and includes the standard 12 leads, sampled at 500 Hz. The database contains 901 patients recorded both in the first phase (T1) and in the second phase (T2=T1+5 years). A set of 145 individuals without cardiac diseases has been selected for analysis in this study.

Method
Four ECG leads forming a quasi-orthogonal lead system (II&III, V1, V5) [15] are processed as follows:
1) Detection of QRS complexes in a combined ECG lead [16], heartbeat classification [17], and calculation of the mean RR interval between predominant beats;
2) Extraction (by best-fit correlation) of an averaged QRS-T pattern of the predominant beats for each of the 4 leads. An interval of 0.35*RR before and 0.55*RR after the detected fiducial points is considered;
3) QRS-T delineation in each of the four leads [18];
4) Calculation of 15 morphological pattern features per lead (see Figure 1), as follows:
- 12 features describing the QRS morphology that have previously been shown to provide adequate heartbeat comparison and classification [14]: QRS-width (Width); amplitudes (Ma, Mi) and offsets (Ima, Imi) of the maximal positive and negative peaks; slope from QRS-onset to the first peak (S1); slope from the first to the second peak (S2); QRS positive (ArP), negative (ArN) and total area (Ar=ArP+ArN); sum of the absolute values of the QRS velocities (Av); number of samples crossing 70% of the maximal peak amplitude (No);
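The windowing rule and a few of the QRS descriptors listed above can be illustrated with a minimal sketch. It is not the delineation or measurement procedure of [18]: the function names `beat_window` and `qrs_features`, the zero-baseline assumption, the treatment of ArN as an area magnitude, the sample-based peak offsets and the first-difference definition of Av are all illustrative assumptions.

```python
FS = 500  # sampling rate in Hz, as stated for the ECG-ILSA recordings


def beat_window(fiducial_index, mean_rr_samples):
    """Sample range around a fiducial point: 0.35*RR before, 0.55*RR after."""
    start = fiducial_index - int(round(0.35 * mean_rr_samples))
    stop = fiducial_index + int(round(0.55 * mean_rr_samples))
    return start, stop


def qrs_features(pattern, onset, offset, fs=FS):
    """Simplified subset of the QRS descriptors for one averaged pattern.

    Assumptions (not taken from the source): the baseline is at 0,
    `onset` and `offset` are sample indices of the delineated QRS,
    and ArN is reported as a positive magnitude.
    """
    qrs = pattern[onset:offset + 1]
    dt = 1.0 / fs
    ma, mi = max(qrs), min(qrs)
    arp = sum(v for v in qrs if v > 0) * dt    # positive area
    arn = sum(-v for v in qrs if v < 0) * dt   # negative area (magnitude)
    # Av: sum of absolute sample-to-sample differences within the QRS
    av = sum(abs(b - a) for a, b in zip(qrs, qrs[1:]))
    return {
        "Width": (offset - onset) * dt,  # seconds
        "Ma": ma,
        "Mi": mi,
        "Ima": qrs.index(ma),            # samples after QRS onset
        "Imi": qrs.index(mi),
        "ArP": arp,
        "ArN": arn,
        "Ar": arp + arn,
        "Av": av,
    }
```

In practice the function would be called once per lead on the averaged pattern, giving one feature vector per lead and per recording period.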

Results and discussion
The analysis of the designed feature space (15x4=60 features) via factor analysis highlights 11 features that are relevant to the aim of the study and for which the following observations are valid:
- High (≥0.7) and significant (p<0.05) factor loadings in one of the first 3 factors, combined with low and insignificant loadings in the remaining factors, when the factor analysis is performed over the joint data for 2x145 patients in T1+T2 records (see Table 1);
- The same feature distribution among the first 3 factors and comparable factor loadings, when the factor analysis is performed separately over the data collected during T1 (145 patients) and T2 (145 patients). The results are presented in Table 2 and Table 3.
The factor analysis results support the following conclusions:
1) They confirm our hypothesis that the ECG leads (II&III), V1 and V5 form a quasi-orthogonal lead system, considering that the features strongly correlated with a certain factor come from a single axis (lead), while the features measured in different axes (leads) are distributed among the 3 factors;
2) They confirm the stability of the formed feature subsets across the measurements in T1 and T2, i.e. their strong correlation within a certain factor and weak correlation with the features from the remaining factors.
The formed feature subsets are as follows:
- ArN_II; S2_III; Ar_III; ArN_III; Av_III in lead (II&III);
- Ar_V1; ArN_V1; Av_V1 in lead V1;
- Ar_V5; ArP_V5; Av_V5 in lead V5.
Based on the above findings, it is reasonable to consider that the best features within the formed subsets are representative for person identification.
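The selection rule used above (a loading of at least 0.7 on one factor, below 0.3 on the others) can be sketched with NumPy. This is an illustration under assumptions rather than a reproduction of the Statistica procedure: the loadings are taken from the eigendecomposition of the correlation matrix, the rotation is the commonly used SVD-based varimax iteration, and the helper names `pca_loadings`, `varimax` and `salient_features` are hypothetical.

```python
import numpy as np


def pca_loadings(X, n_factors=3):
    """Unrotated principal-component loadings from the correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_factors]
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def varimax(loadings, max_iter=500, tol=1e-10):
    """Orthogonal varimax rotation via the usual SVD-based iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_crit = s.sum()
        if crit > 0 and new_crit < crit * (1 + tol):
            break  # criterion no longer improving
        crit = new_crit
    return loadings @ rotation


def salient_features(rotated, high=0.7, low=0.3):
    """Indices of features loading >= high on one factor and < low elsewhere."""
    a = np.abs(rotated)
    keep = []
    for i, row in enumerate(a):
        top = row.argmax()
        others = np.delete(row, top)
        if row[top] >= high and np.all(others < low):
            keep.append(i)
    return keep
```

Running the same three functions on the T1 data, the T2 data and the joint T1+T2 data, and comparing which features are returned for each factor, mirrors the stability check described in this section.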
The results from the correlation analysis applied over the feature values measured in T1 and T2 (see Table 4) suggest that regression equations can be synthesized for predicting the value of a certain feature in T2 based on data measured in T1. The comparison between the predicted and the measured feature values could be considered an adequate preliminary indicator of the confidence level for person identification via morphological ECG features measured at different moments in time. This is demonstrated via one of the models designed by means of stepwise multiple regression analysis, i.e. prediction of the values of ArN_III(T2) based on the values measured for ArN_II(T1) and ArN_III(T1) (see Table 5). Even with this single model for calculation of ArN_III values in T2, applying K-means clustering over the measured and predicted values assigns 80.7% of the predicted values to the same cluster as the measured ones. Only 19.3% of the predicted values have been assigned to a different cluster.
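The prediction and clustering check described above can be sketched as follows. The sketch swaps in plain ordinary least squares for the stepwise regression of the study, uses a simple one-dimensional K-means with two clusters (the number of clusters is not stated in the text), and interprets cluster agreement as the fraction of subjects whose measured and predicted values fall in the same cluster. These choices, and the names `fit_linear`, `predict`, `kmeans_1d` and `same_cluster_rate`, are assumptions.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with intercept; returns [b0, b1, ..., bm]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Predicted T2 values from T1 predictors (one row per subject)."""
    return coef[0] + np.asarray(X) @ coef[1:]


def kmeans_1d(values, k=2, n_iter=100):
    """Lloyd's algorithm on scalar values with a deterministic quantile init."""
    values = np.asarray(values, dtype=float)
    centers = np.quantile(values, np.linspace(0, 1, k + 2)[1:-1])
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([
            values[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def same_cluster_rate(measured, predicted, k=2):
    """Fraction of subjects whose measured and predicted values share a cluster."""
    pooled = np.concatenate([measured, predicted])
    labels, _ = kmeans_1d(pooled, k)
    n = len(measured)
    return float(np.mean(labels[:n] == labels[n:]))
```

With `X` holding the ArN_II(T1) and ArN_III(T1) columns and `y` the measured ArN_III(T2) values, `same_cluster_rate(y, predict(fit_linear(X, y), X))` would give a quantity comparable in spirit to the agreement reported above.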

Conclusions
The results achieved in this preliminary study show that, in general, the morphological features measured from the electrocardiogram could be applied for identification of subjects in different periods of examination. This could be used in clinical and emergency medicine, as well as for the design and management of personalized ECG databases.

Fig. 1. Graphical representation of the morphological features calculated for the averaged delineated QRS-T pattern: a) slopes; b) amplitudes, areas; c) time intervals.

Aiming to select a feature constellation that is stable over time and thus appropriate for the identification task, we apply the following procedures included in the multivariate analyses of the software package Statistica 12.7 (Dell Inc.):
1) Factor analysis by the principal components extraction method with 'varimax' rotation for providing orthogonal factors. The idea is to highlight the outstanding correlated (dependent) and uncorrelated (independent) features separated in the first 3 factors. The stability of the factors is independently verified over the data collected in the periods T1, T2 and joint T1+T2. Factor stability across the 3 datasets would indicate the factors' potential to represent the subjects' correspondence;
2) Correlation analysis for assessment of the agreement between feature values in T1 and T2;
3) Multiple stepwise regression for synthesis of regression equations that could be applied for predicting the value of a certain feature in T2, based on data measured in T1;
4) Cluster analysis for assessment of the prediction model, i.e. the agreement between measured and predicted feature values.
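As a small illustration of the correlation analysis in step 2, the Pearson coefficient between the T1 and T2 values of one feature, together with its t statistic, can be computed as sketched below. The function names are hypothetical, and the significance remark relies on the standard result that, for 145 subjects (143 degrees of freedom), the two-sided 5% critical value of t is approximately 1.98.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    denom = 1.0 - r * r
    if denom <= 0.0:  # perfect correlation (up to rounding)
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / denom)
```

A feature whose |t| exceeds the critical value would be flagged as significantly correlated between T1 and T2 at p < 0.05.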

Table 1. Factor loadings in the first 3 factors obtained via analysis of the data for T1 and T2. Loadings with absolute values ≥0.7 are highlighted.

Table 2. Factor loadings in the first 3 factors obtained via analysis of the data for T1. Loadings with absolute values ≥0.7 are highlighted.

Table 3. Factor loadings in the first 3 factors obtained via analysis of the data for T2. Loadings with absolute values ≥0.7 are highlighted.

Table 4. Correlation coefficients between the features included in the stable groups for both measurement periods T1 and T2. Significant correlations are highlighted (p < 0.05).