Security Enhancement of Knowledge-based User Authentication through Keystroke Dynamics

Keystroke dynamics is a behavioural biometric characteristic that addresses the problems of user identification and verification. Knowledge-based user authentication relies on a PIN or password, which is unsafe under various types of attack. A strong password combines upper- and lower-case letters with digits and symbols, but such passwords are hard to remember, and users often confuse the passwords they keep for different access control systems. Our system takes into account not only the text a user enters but also the user's typing style. In our experiment we did not use hard, password-like texts; instead we chose commonly used words that users are habituated to and comfortable typing, and from these we obtained consistent typing patterns. We applied several distance-based and data mining algorithms to the collected typing patterns and obtained impressive results. According to our experiment, adding keystroke dynamics to an existing knowledge-based user authentication system with a minimum of five commonly used texts raises the security level to 97.6%, and to 98.2% if some irrelevant feature sets are removed.


Introduction
Keystroke dynamics is a behavioural biometric: a method of analysing the way a user types on a keyboard and classifying the user by his or her regular typing rhythm. It is the study of whether people can be recognised by their typing rhythms, much as handwriting is used to recognise the author of a written text. A user's typing pattern may be unique because of the same neuro-physiological factors that make written signatures unique. Here the user can choose any text from his or her own vocabulary as a password. The approach is simple and requires nothing extra to remember, yet it enhances the security level and can be used to identify an individual. Recognising typing style offers a biometric-like characteristic that may facilitate non-intrusive, cost-effective and continuous monitoring.
The system records the characters entered as well as the typing style with which each successive character is entered. The aim is that no one else can track the timing or press the characters of the password in the same rhythm. This protects the system against off-line guessing attacks and against tracking by unauthorised people. Our objective is to minimise the probability of off-line guessing attacks, hide the password from the public, and minimise hardware and software cost through faster pattern recognition.
In our experiment we considered five commonly used strings ("kolkata123", "facebook", "yahoo.com", "gmail.com" and "123456") as password strings, with a laptop keypad as the hardware configuration. We collected raw keystroke data samples from 12 users in four sessions, six times in each session, using Java Swing. We then extracted all the features (key duration, up-down key latency, down-up key latency, up-up key latency, down-down key latency, tri-graph timing, four-graph timing and total timing) and analysed the features and combinations of features using the R statistical tool and Microsoft Excel.
We tested our raw data using eight different score-based, distance-based and feature mining algorithms, and obtained an Equal Error Rate of 2.4% for the Z-score algorithm when all five fixed strings and all the features mentioned above are considered.

Basic Idea
Keystroke dynamics is a set of timing data or keystroke pressure data generated while typing on a keyboard; it is unique to the user and can be used to classify users.

Science and Features Selection
Placement of the fingers on the keyboard, hand weight, finger length and neuro-physiological factors make typing style unique, where some timing factors easily [...]

Down-Down Key Latency: $T_3 = P_{i+1} - P_i$ (3)
Up-Down Key Latency: $T_4 = P_{i+1} - R_i$ (4)
Down-Up Key Latency: $T_5 = R_{i+1} - P_i$ (5)
Total Time Key Latency: $T_6 = R_n - P_1$ (6)
Tri-graph Latency: $T_7 = R_{i+2} - P_i$ (7)
Four-graph Latency: $T_8 = R_{i+3} - P_i$ (8)

Here $P_i$ and $R_i$ are the press and release times of the $i$-th key, and $n$ is the number of keys in the string.

Some newer features are keystroke pressure, fingertip size, finger placement on the keyboard, keystroke sound, error-correction behaviour and the sequence of left and right control keys.

There are different ways in which a user can be authenticated. However, all of them fall into one of three classes: "something we know" (e.g. a password), "something we have" (e.g. a token) and "something we are" (e.g. a biometric property).
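As an illustration, the timing features can be computed from the press times P_i and release times R_i of a typed string as in the following minimal Python sketch. The function name `timing_features`, the dictionary keys and the millisecond timestamps are hypothetical. Key duration and up-up latency are named in the Introduction, but their formulas are not given in this section, so the common forms R_i - P_i and R_{i+1} - R_i are assumed; the remaining features follow definitions (3) to (8).

```python
from typing import Dict, List, Sequence


def timing_features(press: Sequence[float],
                    release: Sequence[float]) -> Dict[str, List[float]]:
    """Compute keystroke timing features from press (P) and release (R) times."""
    if len(press) != len(release) or not press:
        raise ValueError("press and release must be non-empty and equally long")
    n = len(press)
    return {
        # Assumed standard forms; not defined explicitly in this section.
        "key_duration": [release[i] - press[i] for i in range(n)],
        "up_up": [release[i + 1] - release[i] for i in range(n - 1)],
        # Definitions (3) to (8) from the text.
        "down_down": [press[i + 1] - press[i] for i in range(n - 1)],
        "up_down": [press[i + 1] - release[i] for i in range(n - 1)],
        "down_up": [release[i + 1] - press[i] for i in range(n - 1)],
        "total_time": [release[-1] - press[0]],
        "tri_graph": [release[i + 2] - press[i] for i in range(n - 2)],
        "four_graph": [release[i + 3] - press[i] for i in range(n - 3)],
    }


# Hypothetical timestamps (ms) for a four-key string.
features = timing_features([0, 150, 320, 500], [90, 230, 410, 610])
print(features["down_down"])  # [150, 170, 180]
print(features["tri_graph"])  # [410, 460]
```

Concatenating these lists in a fixed order gives one timing vector per typed sample, which is the form a distance-based detector would consume.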

Security Issues
Among the various user authentication techniques (knowledge-based, token-based and biometric-based), biometric authentication is the most popular because biometric characteristics are unique and cannot be stolen or lost. Keystroke dynamics is a behavioural characteristic that is unique and can be implemented effectively within an existing system with minimal alteration. It can be used to safeguard our passwords against different types of attack.

Factors Affecting Performance
Some of the factors that affect keystroke dynamics are as follows: text length, sequence of character types, word choice, number of training samples, the statistical method used to create the template, mental state of the user, tiredness or level of comfort, keyboard type, keyboard position and height, hand injury, weakness of the hand muscles, shoulder pain, education level, computer knowledge, and category of user.

Algorithms
Many classification methods have been applied in the study of keystroke dynamics over the last three decades. The following are the anomaly detection algorithms described in [8].
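The detector list from [8] is not reproduced here. As a hedged illustration, two detectors that are named later in this paper, Scaled Manhattan and Outlier Count (z-score), might be sketched as follows. The formulations are those commonly used in the keystroke dynamics literature and are assumed rather than taken from [8]; the function names, the 1.96 z-threshold and the toy timing vectors are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_template(samples: Sequence[Sequence[float]]
                 ) -> Tuple[List[float], List[float], List[float]]:
    """Per-feature mean, sample standard deviation and mean absolute deviation."""
    columns = list(zip(*samples))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]
    mads = [statistics.fmean([abs(v - m) for v in c])
            for c, m in zip(columns, means)]
    return means, stds, mads


def scaled_manhattan(x: Sequence[float], means: Sequence[float],
                     mads: Sequence[float], eps: float = 1e-9) -> float:
    """Sum of absolute deviations, each scaled by the feature's mean absolute deviation."""
    return sum(abs(v - m) / max(a, eps) for v, m, a in zip(x, means, mads))


def outlier_count(x: Sequence[float], means: Sequence[float],
                  stds: Sequence[float], z_threshold: float = 1.96,
                  eps: float = 1e-9) -> int:
    """Number of features whose absolute z-score exceeds the threshold (assumed 1.96)."""
    return sum(1 for v, m, s in zip(x, means, stds)
               if abs(v - m) / max(s, eps) > z_threshold)


# Hypothetical training vectors (ms) with two timing features each.
means, stds, mads = fit_template([[100, 200], [110, 220], [90, 180]])
print(outlier_count([105, 210], means, stds))  # 0: close to the template
print(outlier_count([150, 100], means, stds))  # 2: both features are outliers
```

In both detectors a higher score means the probe sample deviates more from the enrolled template, so a sample is accepted when its score falls below a chosen threshold.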

Background details
Over more than 30 years, many researchers have proposed algorithms that use various features and various lengths of pattern string.
Table 1. Background of keystroke dynamics

Experimental results
We implemented a program in Java for experimental purposes that captures all key press and release events; these are used to create the database of password samples and timing templates. We then calculated the average equal error rate for all eight algorithms, considering single features and combinations of features, for all five strings.
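The equal error rate is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how it could be estimated from detector scores is given below, assuming that higher scores mean a larger deviation from the template. The function name and the toy scores are hypothetical, and the exact procedure used in the experiment is not described in the text.

```python
from typing import Sequence


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Estimate the EER by sweeping the acceptance threshold over all observed scores.

    A sample is accepted when its score is <= threshold, so
    FRR = share of genuine scores above the threshold and
    FAR = share of impostor scores at or below it.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s > t for s in genuine) / len(genuine)
        far = sum(s <= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where FAR and FRR are closest to each other.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical anomaly scores for one user.
print(equal_error_rate([1, 2, 3, 4], [3.5, 5, 6, 7]))  # 0.25
```

Averaging this value over all users, for a given algorithm and feature set, would give an average equal error rate of the kind reported in the tables.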

Evaluation and Analysis
Here we see that no combination of features and algorithms gives an average equal error rate below 0.08 across all five types of fixed string.
We also tested these five strings in combination. In that case the minimum average equal error rate is 0.024, obtained when all five strings and all features are considered.
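The accuracy figures quoted in this paper appear to be the complement of the equal error rate. Under that reading (an assumption, since the conversion is not stated explicitly), the combined-string result corresponds to:

```latex
\[
\text{accuracy} = 1 - \text{EER} = 1 - 0.024 = 0.976 = 97.6\%
\]
```

By the same reading, the 98.2% figure would correspond to an equal error rate of 0.018.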

Discussion and Further Area of Research
The keyboard is an essential part of a computer, and it can be used to recognise our typing style, which according to our experiment is highly distinctive and cannot be copied or stolen. It can be used to safeguard our passwords in any access control system. This technique can be applied to online criminal investigation, back-door account identification, online typing examinations, emotion recognition, lab attendance systems and many more. Sometimes the scores of different algorithms vary because of the user's mental state or ageing. This can be solved by a template updating mechanism.

[...] strings database, and we obtained 97.6% accuracy. If we remove some of the feature sets, such as four-graph times for the string "kolkata123" (e.g. the four-graph times of "ata1", "ta12" and "a123", or the tri-graph times of "lka"), we can reach up to 98.2% accuracy with this technique. This establishes that the technique can be used to safeguard a password or PIN in knowledge-based user authentication. In practice, however, many factors may affect the process. Much more experimentation is needed, for example on features such as key pressure, finger placement on the keyboard and fingertip size.
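The updating mechanism is not specified further. One common option, shown here only as a hedged sketch, is a moving-window template that keeps the most recent accepted samples so that the reference rhythm follows gradual changes in typing. The class name `SlidingTemplate` and the window size are hypothetical.

```python
from collections import deque
from statistics import fmean
from typing import Deque, List, Sequence


class SlidingTemplate:
    """Keeps the most recent accepted timing vectors of one user."""

    def __init__(self, window: int = 24) -> None:
        # Window size is an arbitrary illustrative choice.
        self.samples: Deque[List[float]] = deque(maxlen=window)

    def update(self, sample: Sequence[float]) -> None:
        """Add a newly accepted sample; the oldest one drops out when the window is full."""
        self.samples.append(list(sample))

    def mean_vector(self) -> List[float]:
        """Per-feature mean over the samples currently in the window."""
        if not self.samples:
            raise ValueError("template is empty")
        return [fmean(col) for col in zip(*self.samples)]


# Hypothetical two-feature samples (ms); only the last two are kept.
template = SlidingTemplate(window=2)
for sample in ([100, 200], [110, 210], [130, 230]):
    template.update(sample)
print(template.mean_vector())  # [120.0, 220.0]
```

Updating only with samples that were accepted by the detector limits the risk of an impostor gradually shifting the template.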

Figure 1. Line chart of two different samples from one user

Figure 2. Line chart of all 8 classifiers for each dataset
In the above figure, we see that for all the strings Outlier Count (z-score) achieves the best result after Scaled Manhattan.

Figure 3.

Figure 4. ROC curve of the Outlier Count

Table 2. Average equal error rate for fixed-text "kolkata123"

Table 3. Average equal error rate for fixed-text "yahoo.com"

Table 4. Average equal error rate for fixed-text "gmail.com"

Table 5. Average equal error rate for fixed-text "facebook"

Table 6. Average equal error rate for fixed-text "123456"

Table 7. Average equal error rate for all combinations of fixed texts

Table 8. Comparison of average equal error rates of different distance-based algorithms
Table 8 shows that Outlier Count, Lorentzian, Canberra, Chebyshev and Scaled Manhattan are suitable models for identifying users through their typing pattern.
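For reference, the Lorentzian, Canberra and Chebyshev distances between a probe timing vector and a template mean vector can be sketched as below, using their standard textbook definitions. The function names and the toy vectors are hypothetical, and any scaling or preprocessing applied in the experiment is not known.

```python
import math
from typing import Sequence


def lorentzian(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of ln(1 + |x_i - y_i|); large single deviations are damped by the logarithm."""
    return sum(math.log(1 + abs(a - b)) for a, b in zip(x, y))


def canberra(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of |x_i - y_i| / (|x_i| + |y_i|); terms with a zero denominator count as 0."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def chebyshev(x: Sequence[float], y: Sequence[float]) -> float:
    """Largest absolute difference over all features."""
    return max(abs(a - b) for a, b in zip(x, y))


# Hypothetical probe and template vectors.
probe, template = [1, 2, 3], [1, 4, 0]
print(chebyshev(probe, template))  # 3
```

As with the other detectors, a smaller distance indicates a probe that is closer to the enrolled typing pattern.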