Finger vein recognition based on convolutional neural network

Biometric authentication technology is widely used in the information age. As one of the most important authentication technologies, finger vein recognition attracts attention because of its high security, reliable accuracy and excellent performance. However, current finger vein recognition systems are difficult to apply widely because of their complicated image pre-processing and unrepresentative feature vectors. To solve this problem, this paper proposes a finger vein recognition method based on the convolutional neural network (CNN). Image samples are input directly into the CNN model to extract their feature vectors, and authentication is performed by comparing the Euclidean distance between these vectors. Finally, the deep learning framework Caffe is adopted to verify the method. The results show considerable improvements in both speed and accuracy compared with previous research, and the model is robust to changes in illumination and rotation.


Introduction
Biometric recognition technology has a long history of development. At present there are many categories of biometric technology, based on fingerprint, face, iris, speech and other features. Although these biological features are unique, accessible and widespread, and can be regarded as markers of human identity, some challenges still exist [1]. In recent years, it has been shown that the equal error rate (EER) of finger vein recognition can reach 0.0009% under certain conditions [2]. Finger vein identification also has the following advantages:
(1) It is a non-contact biometric identification method that requires no finger touch, so it is easily accepted by users.
(2) Finger veins are an internal characteristic of the human body, so they cannot be copied or stolen.
(3) Medical science has shown that finger veins are unique.
(4) Each of us has ten fingers, so if one finger becomes unusable, another can be used instead.
In traditional finger vein recognition, the main steps are ROI extraction, filtering, noise reduction, image enhancement, feature extraction and distance matching. However, the pre-processing stage is computationally expensive, and different image sources call for different processing methods. This paper therefore proposes a finger vein recognition system based on the convolutional neural network (CNN) [3]. It copes with illumination change, scale transformation and image rotation, leading to excellent performance in finger vein recognition.

Related Works
Traditional methods of finger vein recognition focus mainly on ROI extraction and the representation of feature vectors. In 2010, Yang proposed an interphalangeal joint prior method to obtain the ROI segment and calculated the feature vector with a steerable filter. A nearest neighbour classifier was then used to identify individuals, reaching an accuracy of 98.7% [4]. Later, Guan made improvements: he added filters to remove noise and used the two-directional weighted (2D)2LDA method to represent the feature vector. However, the accuracy fell short of expectations at only 94.69% [5]. Building on this research, Yang proposed a more detailed method in which elimination, noise reduction, image enhancement, and size and brightness normalization are executed in the pre-processing stage. The accuracy rate reached 100% through template matching [6]. However, the method is far from perfect, because these complex processing steps take a long time. Four years later, Gupta adopted a new approach that obtains features by multi-scale matched filtering and line tracking, and reported an equal error rate (EER) of 4.47% [7]. Although conventional approaches have made some progress, they are not robust enough to noise and misalignment. A more suitable and efficient method is therefore needed.
In recent years, with the development of artificial intelligence, researchers have begun to apply machine learning to finger vein recognition. Wu and Liu ran an experiment based on Principal Component Analysis and an adaptive neuro-fuzzy inference system (ANFIS) and achieved a high accuracy of 99%, but the execution time was 45.0 s [8]. In addition, Kuan-Quan extracted finger vein vectors with a Gaussian filter and local binary patterns, and reached an accuracy rate of 98.75% with an improved SVM method [9]. However, this method lacks robustness because of its low accuracy on the middle and little fingers. In this paper, we propose a new approach that uses a CNN for finger vein biometric identification and achieves both high accuracy and high speed.

The proposed approach
The CNN is inspired by the visual nerve mechanism studied by the biologists Hubel and Wiesel in their early research, and it is an extension of the multilayer perceptron [10]. Its three characteristics, local receptive fields, weight sharing and pooling, give it the advantages of rotation invariance, position invariance and scale invariance.

Proposed finger vein recognition system
The finger vein recognition system proposed in this paper is shown in Figure 1. In the registration phase, we capture an image of the finger vein with the device, obtain the region of interest (ROI), and extract the feature vector with the CNN. In the authentication phase, we use the same method to obtain the image and its feature vector. We then calculate the Euclidean distance [11] between the two vectors. If this distance is less than the threshold, the two images are considered to come from the same person and authentication succeeds; otherwise it fails. The threshold is obtained from a large number of experiments in which we compare intra-class and inter-class Euclidean distances and choose a suitable value as the decision standard. Its value can be adjusted flexibly for a given application to make acceptance or rejection stricter.
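The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the system's actual code: the function names, the 4-dimensional toy vectors and the threshold value are all hypothetical, chosen only to show the comparison step.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def authenticate(enrolled, probe, threshold):
    """Accept the probe when its distance to the enrolled vector is below the threshold."""
    return euclidean_distance(enrolled, probe) < threshold

# Toy 4-dimensional vectors (hypothetical values; the proposed CNN outputs 64 dimensions).
enrolled = [0.10, 0.80, 0.30, 0.50]
same_finger = [0.12, 0.79, 0.33, 0.48]
other_finger = [0.90, 0.10, 0.70, 0.05]

ILLUSTRATIVE_THRESHOLD = 0.5  # assumption for this sketch, not a value from the paper

accepted_same = authenticate(enrolled, same_finger, ILLUSTRATIVE_THRESHOLD)
accepted_other = authenticate(enrolled, other_finger, ILLUSTRATIVE_THRESHOLD)
print(accepted_same, accepted_other)
```

Lowering the threshold makes acceptance stricter (fewer false acceptances, more false rejections), and raising it does the opposite.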

Proposed CNN Architecture
In the finger vein recognition system described above, one of the most critical parts is feature extraction with the CNN, and the focus of our research is how to extract the most representative and robust feature vector. Finger vein images exhibit location invariance and compositionality, and the distribution of vein lines is similar across different people. The CNN model can therefore be trained on a large number of images so that it learns to extract the most representative vector, and the trained model can then extract feature vectors from new images. Training the CNN to obtain the most suitable model for extracting finger vein feature vectors is thus extremely important.
The typical AlexNet structure contains eight layers with weights: the first five are convolutional and the remaining three are fully connected [12]. AlexNet is relatively complex, and the high dimensionality of its last three fully connected layers makes it unsuitable for training on small data sets, so it needs to be trimmed appropriately. At present there is no rule for setting parameters such as the number of layers, the convolution kernel size and the stride. We observed that the background of a finger vein picture is neither complicated nor colourful, and that the main differences between pictures lie in the distribution and brightness of the vein lines. First, to extract features as fast as possible, the dimensions of the three fully connected layers need to be reduced. Second, our purpose is to make the Euclidean distances as large as possible between different persons and as small as possible for the same person; for this we use a cost function called Softmax Loss [13]. To find suitable parameters for the fully connected layers, we ran many experiments and designed the CNN shown in Table 1. The input is a 256*256 pixel image and the output is a 64*1 feature vector.
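For readers unfamiliar with the Softmax Loss named above, the following sketch shows its standard definition: the class scores are turned into probabilities with the softmax function, and the loss is the negative log-probability of the true class. The three scores are hypothetical and stand in for the outputs of the network's final layer; the network in Table 1 is not reproduced here.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss(scores, label):
    """Negative log-probability that the softmax assigns to the true class."""
    return -math.log(softmax(scores)[label])

scores = [2.0, 0.5, -1.0]              # hypothetical scores for three subjects
loss_correct = softmax_loss(scores, 0)  # true class has the highest score
loss_wrong = softmax_loss(scores, 2)    # true class has the lowest score
print(loss_correct < loss_wrong)
```

The loss is small when the true class receives the highest score and grows as the network assigns it less probability.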

Experimental data
In this experiment, the finger vein image database was obtained from DataTang [14]. It consists of 64 subjects with 15 finger samples each, captured over three months at 5 images per month. The original size of the captured images is 376*328 pixels. To enrich the samples and make the network more robust to illumination and rotation, we applied random illumination changes and slight rotations to the 960 pictures, which brings them closer to real application scenarios. This gave 4800 images in total, or 75 per subject. From these, we randomly took 39 images per subject as training samples, 6 images as testing samples and the remaining 30 images as validation samples, and numbered each image.
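The data-set arithmetic and the per-subject split can be checked with a short sketch. The counts come from the text; the helper name `split_subject`, the fixed seed, and the choice to shuffle all three partitions together are assumptions made only for illustration.

```python
import random

SUBJECTS = 64
ORIGINAL_PER_SUBJECT = 15
TOTAL_AFTER_AUGMENTATION = 4800
TRAIN, TEST, VALIDATION = 39, 6, 30

original_total = SUBJECTS * ORIGINAL_PER_SUBJECT      # images before augmentation
per_subject = TOTAL_AFTER_AUGMENTATION // SUBJECTS    # images per subject afterwards

def split_subject(image_ids, rng):
    """Randomly split one subject's images into train, test and validation lists."""
    ids = list(image_ids)
    rng.shuffle(ids)
    return ids[:TRAIN], ids[TRAIN:TRAIN + TEST], ids[TRAIN + TEST:]

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
train, test, val = split_subject(range(per_subject), rng)
print(original_total, per_subject, len(train), len(test), len(val))
# prints: 960 75 39 6 30
```

The three partition sizes add up to the 75 images available per subject, so no image is used in more than one partition.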

Experimental process
The open source deep learning framework Caffe, which has a clear and efficient architecture, is adopted in the experiment [15]. The experiment was carried out in a CPU environment. First, all images in the training set and testing set are numbered separately and labelled according to their categories. The images are then converted from JPG to LMDB format with the Caffe tools, which reduces I/O overhead. In addition, to speed up the convergence of the Loss, we calculate the mean pixel value of the training images with the Caffe tools, and this mean is subtracted from each picture before it is fed into the network so that the input values are smaller. In this experiment, the amount of data is relatively small and the number of model layers is large. If the CNN is trained from random initial values alone, the Loss does not converge and the desired effect cannot be achieved. We therefore fine-tuned the CNN from a previously trained Caffe model, taking its layer parameters as the initial values of the network, after which the Loss converged.
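The mean-subtraction step can be illustrated with plain Python lists. This sketch shows only the arithmetic and is not Caffe's own tooling; it assumes a per-pixel mean over grayscale images, and the two 2x2 "images" are hypothetical.

```python
def mean_image(images):
    """Per-pixel mean over a list of equally sized grayscale images (lists of rows)."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(cols)]
            for r in range(rows)]

def subtract_mean(image, mean):
    """Center one image by subtracting the training-set mean pixel by pixel."""
    return [[p - m for p, m in zip(img_row, mean_row)]
            for img_row, mean_row in zip(image, mean)]

# Two hypothetical 2x2 training images with 8-bit intensities.
training = [[[100, 120], [140, 160]],
            [[110, 130], [150, 170]]]
mean = mean_image(training)
centered = subtract_mean(training[0], mean)
print(mean)
print(centered)
```

After centering, pixel values are distributed around zero rather than around the average brightness of the data set.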
To avoid overfitting, the weights are shrunk by a small factor called weight_decay at each iteration; that is, a penalty on the magnitude of the weights is added to keep them small. We set weight_decay to 0.01. In SGD optimization [16], the learning rate can be neither too large nor too small. If it is too large, the Loss does not decline as the number of iterations increases; if it is too small, each descent step changes the Loss only slightly and training is slow. In this paper, the learning rate was adjusted repeatedly through experiments and finally set to 0.001. To illustrate feature extraction in the CNN, we visualized the output of the fourth convolution layer in Figure 2. The extracted feature maps are smooth, with low correlation, irregular structure and little noise, which suggests that the CNN is suitable for extracting finger vein features.
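The effect of weight_decay and the learning rate on a single update can be sketched as follows. This assumes plain SGD with L2 weight decay added to the gradient and no momentum term (the text does not state whether momentum was used); the function name and toy weights are hypothetical.

```python
def sgd_step(weights, grads, learning_rate=0.001, weight_decay=0.01):
    """One plain SGD step with the L2 penalty term weight_decay * w added to the gradient."""
    return [w - learning_rate * (g + weight_decay * w)
            for w, g in zip(weights, grads)]

weights = [1.0, -2.0]

# With a zero data gradient, only the decay term acts and pulls weights toward zero.
shrunk = sgd_step(weights, [0.0, 0.0])

# With a non-zero gradient, the step size scales with the learning rate.
small_step = sgd_step(weights, [0.5, 0.5], learning_rate=0.001)
large_step = sgd_step(weights, [0.5, 0.5], learning_rate=0.1)
print(shrunk, small_step, large_step)
```

The same gradient moves the weights a hundred times further with a learning rate of 0.1 than with 0.001, which is why too large a rate can keep the Loss from settling.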

Evaluate the extracted features
In our earlier CNN training runs, the original images were equalized to increase the contrast between the background and the vein lines. After training, we set the threshold to 0.25 by observing the intra-class and inter-class Euclidean distances in the test set (Figure 3.1), and then applied this threshold to the validation set (Figure 3.2). Unfortunately, the performance was poor, which we attributed to serious overfitting to the training set.
After equalization was removed and the network retrained (Figures 5.1 and 5.2), it is easy to distinguish whether two feature vectors belong to the same person in both the test set and the validation set when the Euclidean distance threshold is 1.49. The ROC curve, drawn over a range of thresholds, is shown in Figure 6. When the threshold is set to 1.24, the false rejection rate (FRR) and false acceptance rate (FAR) both reach their best values, and the equal error rate is 0.21%. These results indicate that the network extracts finger vein features well and separates intra-class from inter-class distances effectively.
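The relationship between the threshold, FAR, FRR and the equal error rate can be made concrete with a small sketch. The distances below are hypothetical and are not the experimental data; the function names and the simple threshold scan are likewise assumptions for illustration.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) for a rule that accepts when distance < threshold."""
    far = sum(d < threshold for d in impostor) / len(impostor)
    frr = sum(d >= threshold for d in genuine) / len(genuine)
    return far, frr

def approximate_eer(genuine, impostor):
    """Scan the observed distances as thresholds and return (threshold, FAR, FRR)
    at the point where FAR and FRR are closest to each other."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        if best is None or abs(far - frr) < abs(best[1] - best[2]):
            best = (t, far, frr)
    return best

# Hypothetical intra-class (genuine) and inter-class (impostor) distances.
genuine = [0.4, 0.6, 0.9, 1.1, 1.3]
impostor = [1.2, 1.5, 1.8, 2.1, 2.4]

threshold, far, frr = approximate_eer(genuine, impostor)
print(threshold, far, frr)
```

Each threshold yields one (FAR, FRR) pair, so sweeping the threshold traces out the ROC curve, and the equal error rate is read off where the two rates meet.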

Comparison Between CNN and Other Methods
A new finger vein recognition system is proposed in this paper, in which the feature vectors of finger vein images are extracted by a CNN. The experiments show that the method performs well in both speed and accuracy. Table 2 compares the performance of the CNN with that of the traditional methods. Because the CNN requires no complex image pre-processing, it is not only faster at extracting features but also more accurate. Most importantly, it is more robust to rotation and illumination.

Conclusion
The finger vein, as a means of authentication, has wide application prospects. However, in most current finger vein authentication systems the images must be processed by various complex algorithms, and the speed and robustness of such systems need to be improved. In this paper, we proposed a new finger vein authentication system in which the ROIs of captured images are input directly to a CNN to extract feature vectors, and the success or failure of authentication is judged by the Euclidean distance between two vectors. Experiments and analysis show that the method performs well in practical application. In the future, we hope to train a better network on a larger data set to further improve finger vein recognition technology.

Figure 2. The extracted feature of the fourth layer

Figure 3. Distance distribution in the previous experiments

We then picked out the pictures involved in false rejections and false acceptances and found that most of them contain large white or black areas, as shown in Figure 4. We therefore concluded that equalization was the cause of these errors.

Figure 4. Error images in validation set

We therefore removed the equalization step, resized the images directly to 256*256 pixels and input them into the CNN for training. The resulting intra-class and inter-class Euclidean distance distributions in the test set and the validation set are shown in Figure 5.1 and Figure 5.2 respectively.

Table 1. Proposed CNN model structure.