FYO: A Novel Multimodal Vein Database With Palmar, Dorsal and Wrist Biometrics

Multimodal biometric systems are preferred as a defense over unimodal systems. This study introduces an open-access multimodal vein database named FYO, with each letter taken from an author's name. The database contains three biometric traits, palm vein, dorsal vein and wrist vein, collected from the same individuals, to support research on using these traits to build spoof-resistant multimodal authentication systems. The vein images of FYO were acquired using a medical vein finder in a controlled environment. Comparisons are performed against well-known existing databases and state-of-the-art recognition algorithms. Hand-crafted feature extractors, namely Binarized Statistical Image Features (BSIF), the Gabor filter and the Histogram of Oriented Gradients (HOG), are applied to demonstrate the viability of the vein datasets. Additionally, a deep learning based Convolutional Neural Network (CNN) architecture with two models is proposed, using decision-level fusion of the palmar, dorsal and wrist vein traits. Unimodal systems, multimodal systems and the proposed architecture are tested on several vein datasets containing palmar, dorsal and wrist vein images. Experimental results on accuracy and computation time show that the FYO datasets are competitive with other databases such as the Tongji Contactless Palm Vein, VERA, PUT, Badawi and Bosphorus hand vein databases. Moreover, the proposed CNN architecture applied to the three vein traits shows superior performance compared to the hand-crafted methods.


I. INTRODUCTION
Biometric systems are among the most popular technologies worldwide for person authentication. The development of new technologies capable of acquiring biometric traits, such as the face, iris, fingerprint, palmprint, soft biometrics, palm vein and finger vein, has helped to advance research in both unimodal and multimodal biometric authentication [1], [2]. Multimodal systems are used as a defense since they are more robust and more secure than unimodal systems [3]. Consequently, vein images captured from different regions of the hand, such as the palm, the back of the hand (dorsum) and the wrist, have received enormous attention as biometric traits in recent years: because the vascular vessels lie under the skin, they are extremely difficult to spoof [4]. Furthermore, the development of low-cost devices capable of capturing vein patterns from different regions of the hand has also made them popular for high-security authentication systems. The image capturing process is fast, contactless and user friendly, which makes users willing to use such devices.

(The associate editor coordinating the review of this manuscript and approving it for publication was Andrea F. F. Abate.)
In this paper, we introduce an open-access multimodal vein database named FYO, composed of palm vein, dorsal vein and wrist vein images, as shown in Fig. 1. The images were acquired using a low-cost medical vein finder device and were collected from volunteer students and staff of Eastern Mediterranean University. Unimodal and multimodal systems have been implemented on our vein images using hand-crafted and deep learning based feature extractors. Additionally, we used several publicly available vein databases containing one or two vein traits to present a detailed comparison of various vein databases under different methods; a comparison of these unimodal and multimodal vein biometric systems is also presented in this study. In addition to the new vein database with three biometric traits, we propose a deep learning based Convolutional Neural Network (CNN) architecture with two models using decision-level fusion of the palmar, dorsal and wrist vein traits. Unimodal systems, multimodal systems and the proposed architecture are tested on several vein datasets including palmar, dorsal and wrist vein images.
The rest of the paper is organized as follows. Section 2 gives an overview of related research on using hand veins as a biometric trait and on multimodal biometric systems. Section 3 describes our new multimodal vein database, FYO, while Section 4 compares FYO to other vein databases. Preliminary experiments with hand-crafted methods are presented in Section 5, followed by a detailed, step-by-step explanation of the proposed deep learning method in Section 6; the conclusion and future work are given in Section 7.

II. LITERATURE REVIEW
Most existing hand vein recognition studies in the literature use one or two types of vein images with palmar, dorsal or wrist biometrics, captured by infrared or near-infrared cameras. For instance, the use of vascular patterns inherent in palm veins was proposed in [5]. Images were captured using a near-infrared Charge-Coupled Device (CCD) camera and lighting system, then feature extraction was performed by binarizing and thinning to obtain connecting vein lines and minutiae. The authors reported an Equal Error Rate (EER) of 1.82%.
In 2011, the first publicly available palm vein and wrist vein database, called the PUT database, was introduced [6]. The authors presented experimental results as within-series and between-series comparisons for the collected images; the EER for the within-series comparison was reported as 1.1%, better than the 3.8% EER of the between-series comparison. A near-infrared camera-based device for capturing palm vein images was constructed in [7], with features extracted using a 2-D Gabor filter. An improvement was made by proposing an Adaptive Gabor filter, in which appropriate parameters are selected at different orientations and frequencies, instead of being fixed at initialization as in the original Gabor filter [8].
Palm vein recognition was studied against spoof attacks in the Print Attack category in [9], which also introduced the VERA palm vein database and reported results for two regions of interest, ROI-1 and ROI-2, with and without pre-processing. A directional filter bank has also been used to extract line-based features of the palm vein in a system designed to identify non-vein pixels [10]. The Maximal Principal Curvature (MPC) algorithm and the k-means method were used to extract features from palm vein images in [11]. A palm vein biometric authentication system for bank ATM transactions, using the palm vein trait and a unique identification number, has been proposed [12]. Palm vein authentication technology was introduced in [13], with examples of its application to financial solutions, alongside a discussion of PalmSecure, an authentication product developed by Fujitsu for the general market [14].
Dorsal vein biometrics has also gained increasing interest from the biometric community. Dorsal vein recognition using Hierarchically Structured Texture and Geometry Features has been proposed [15]. Log-Gabor features and a Sparse Representation Classifier (SRC) were used to evaluate a new dorsal vein sensor [16]. Morphological operations on local features for dorsal vein recognition have also been examined [17]–[19]. Integration of Cholesky decomposition with low-dimensional representation for dorsal vein features was performed in [20], while the use of Linear Discriminant Analysis (LDA) on dorsal vein images was proposed in [21].
Similarly, the wrist vein as a biometric trait has also received enormous attention in the biometric community, such as in [22], where the Dense Local Binary Pattern (D-LBP) was used to characterize wrist veins. Neural network based wrist vein identification using an ordinary mobile phone camera has also been proposed [23], as has wrist identification for forensic investigation [24].
Motivated by the fact that multi-biometrics improves the accuracy of biometric systems, several researchers have used multiple hand vein regions in their methodologies. Examples include combining vascular patterns of finger vein and palm vein images using a modified 2-D Gabor filter and a gradient-based technique [25]; applying Gabor filters to wrist and palm vein images after enhancement with CLAHE and a 2-D Gaussian high-pass filter [26]; and using both the Shearlet Transform and the Scale-Invariant Feature Transform to extract features from finger vein, palm vein and dorsal vein images [27]. Fusion of palm and dorsal vein features has also been proposed [29], [30], as has a multimodal framework for identification using left and right wrist vein patterns [28].
Moreover, there exist deep learning based studies on vein biometrics. A CNN based deep learning method for finger vein identification similar to [31] was used in [32]. CNN feature learning and a transfer approach for hand dorsal veins were explored in [33]. Palm vein authentication using a CNN with the PVSNet architecture was proposed to address the need for huge amounts of training data in deep learning [34]. Researchers have also combined hand vein traits into multimodal biometric systems, such as in [35], where a multimodal CNN based deep learning system using finger vein and finger shape as traits was proposed. Moreover, fusion of fingerprint and finger vein patterns, using a deep convolutional network for fingerprint features and SIFT for finger vein patterns, was proposed in [36].
In contrast, our aim is to construct a vein database that includes three types of biometrics and to propose a deep learning based CNN architecture combining the three, namely palmar vein, dorsal vein and wrist vein. Consequently, we captured palmar, dorsal and wrist vein images and established a new multimodal vein database intended to serve as a defense against several attacks on biometric systems, for the following reasons:
• Live Body Identification: Vein images can only be taken from a live body, so no identification or authentication can be made with a non-live hand. Falsification of the trait at the capturing stage is therefore impossible.
• Internal Features: A vein biometric authentication system extracts the vein pattern inside the hand or wrist rather than the external features of the hand or wrist, so identification and verification are not affected by wear and tear or by dry and wet hand surfaces.
• High Security: With the combination of these two properties, live body identification and internal features, vein biometrics cannot be falsified and no barrier to identification occurs; thus it offers a high security standard [4]. The details of the new database and the proposed architecture are given in the following sections.

III. NEW MULTIMODAL VEIN DATABASE
The device employed to capture images of the three traits, namely palmar vein, dorsal vein and wrist vein, was used with three different hand guides, as shown in Fig. 2. The device is equipped with a 1/3-inch infrared Complementary Metal-Oxide Semiconductor (CMOS) camera surrounded by 12 infrared LED light sources, and is designed to capture veins anywhere on the body regardless of skin color, age and weight. The capturing device is connected to a laptop via a USB cable; captured images are displayed on the screen and stored in PNG format at 800×600 pixels with 24-bit color depth.
The device holder is adjustable through almost 360 degrees; however, for consistency and to reduce rotation, the fixed position shown in Fig. 2 was maintained throughout the acquisition period of about 30 days. The sensor was kept at a height of 35 cm, and wooden hand guides 7 cm high were constructed to reduce rotation of the hand. Consequently, the target hand was about 26-28 cm from the camera, depending largely on the part of the hand being captured (the wrist sits slightly closer to the camera) and also on the size of the hand. The three traits were acquired sequentially, approximately 10 seconds apart, owing to the time needed to place each biometric trait on its hand guide.
A total of 160 volunteers' left- and right-hand vein images were taken from the palm, dorsum and wrist in two sessions 10 minutes apart. One image was taken per session for each hand, giving 4 images per trait and 12 images per person. Therefore, the database contains 640 palm vein images, 640 dorsal vein images and 640 wrist vein images, for a total of 1920 images. The volunteers are between 17 and 63 years old, as shown in the age distribution in Fig. 3, and the gender ratio is 69.38% to 30.62%, representing 111 males to 49 females. The volunteers are mainly from North Cyprus, Turkey, Nigeria, Iran and other parts of the Middle East and Africa.
The FYO database is open access to researchers upon request (https://fyo.emu.edu.tr/). It contains three folders: the original images (in PNG format), including background and unwanted parts, in the first folder; the region-of-interest images (in PNG format), with the background and unwanted parts cropped out, in the second; and the images generated with the Keras data generator (in JPG format) in the third. Table 1 compares FYO with the existing databases PUT [6], VERA [9], Tongji [37], Badawi [38] and Bosphorus [39] in terms of the vein traits they contain, together with the number of subjects and sessions, the number of samples per subject, and the male-to-female ratio of each dataset. The table shows that only the FYO database has all three traits (the palm vein dataset FYOPV, the dorsal vein dataset FYODV and the wrist vein dataset FYOWV), whereas the other databases each contain a single trait, except the PUT database, which has palm and wrist vein images. Although our database has the lowest number of samples per session, we prioritized having more subjects so as to account for variation between volunteers; with 160 subjects, FYO is the second largest of the databases in Table 1 in terms of total number of subjects.

V. EXPERIMENTAL ANALYSIS WITH HAND-CRAFTED METHODS
The preliminary experiments are conducted using hand-crafted feature extractors, namely BSIF [40], the Gabor filter [41] and HOG [42]. These methods are chosen to enable a proper comparison, since they are among the most commonly used hand-crafted feature extractors. They are applied to the datasets after pre-processing steps consisting of Region of Interest (ROI) cropping, to remove the background from the images, and image enhancement using histogram equalization; afterwards, a Nearest Neighbor classifier is applied, with the datasets divided into training and test sets. The results show that BSIF performs best on all databases and all traits except the Tongji palm vein dataset. In general, each method performs consistently, as shown in Table 2. The FYO database also performs comparably well despite having only one sample per session, achieving 95.94% accuracy for palmar, 95.31% for dorsal and 92.50% for wrist vein images using BSIF.
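The histogram-equalization enhancement step mentioned above can be sketched as follows; this is an illustrative implementation of the classic CDF-based method on an 8-bit grayscale image, not the authors' exact pre-processing code.

```python
import numpy as np

def histogram_equalize(img):
    """Spread the gray levels of a 2-D uint8 image so its cumulative
    histogram becomes approximately uniform, increasing vein contrast
    before feature extraction."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # first non-empty bin of the CDF
    # Classic normalisation of the CDF to the full 0-255 range.
    lut = np.round((cdf - cdf_min) / (img.size - cdf_min) * 255).astype(np.uint8)
    return lut[img]
```

After this step, BSIF, Gabor or HOG descriptors would be computed on the equalized ROI image.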
The unimodal biometric authentication setup is compared with the multimodal setup in Table 3 using the PUT database, which has palm vein and wrist vein datasets, and the FYO database, which has dorsal, palm and wrist vein datasets. As shown in Fig. 4, features are extracted separately from each image and then fused into one feature vector per individual by concatenation: palmar features are appended to the dorsal features, and wrist features are appended at the end (Feature-Level Fusion). Normalization is not required, since features from the different biometric traits fall in the same range for images of the same format and similar source; fusion is therefore faster and more efficient. A Nearest Neighbor classifier with the Manhattan distance is then applied, as in the unimodal setup.
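The feature-level fusion and 1-NN matching described above can be sketched as below. This is a minimal illustration with NumPy; the function names and the toy gallery are our own, not part of the published system.

```python
import numpy as np

def fuse_features(dorsal, palm, wrist):
    """Feature-level fusion: palmar features are appended to the dorsal
    features, then wrist features are appended at the end."""
    return np.concatenate([dorsal, palm, wrist])

def nn_classify(query, gallery, labels):
    """1-Nearest-Neighbor matching with the Manhattan (L1) distance
    over fused feature vectors."""
    dists = np.abs(gallery - query).sum(axis=1)
    return labels[int(np.argmin(dists))]
```

No per-trait normalization is applied before concatenation, mirroring the observation that features from same-format, same-source images already share a common range.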
The results in Table 3 show that better performance is obtained in the multimodal setups. Combining two biometric traits gives a significant improvement over the unimodal setup for each of the three feature extractors. The table also shows that combining three traits increases the performance of the system across the board, reaching 100 percent accuracy in the case of BSIF on the FYO dataset.

VI. PROPOSED DEEP LEARNING BASED CNN ARCHITECTURE AND RELATED EXPERIMENTS
We propose a deep learning based Convolutional Neural Network architecture with two models (Model1 and Model2) using decision-level fusion of the palmar, dorsal and wrist biometric traits. The proposed deep learning models were applied to the databases in Table 2 to compare their performance with the hand-crafted methods. CNNs have been widely shown to be highly efficient in image classification problems. In this paper, we modeled our system after the AlexNet model [43], [44], which has five layers; the first model, named Model1, has fewer filters than AlexNet, as shown in Fig. 5. The details of CNN Model1 are as follows:
• The first layer includes 32 filters of size 3 × 3; max pooling is applied with 2 × 2 filters.
• The second layer has 64 filters of size 3 × 3; max pooling is applied with 2 × 2 filters.
• The third and fourth layers each have 96 filters of size 3 × 3; max pooling is applied with 2 × 2 filters.
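The Model1 stack above can be traced numerically as below. This is a sketch under stated assumptions: we assume 'same'-style padding for the 3 × 3 convolutions and illustrate with a hypothetical 128 × 128 grayscale input, since the paper does not specify either.

```python
def conv2d_out(size, kernel=3, stride=1, pad=1):
    # pad=1 with a 3x3 kernel preserves the spatial size ('same' padding).
    return (size + 2 * pad - kernel) // stride + 1

def maxpool_out(size, kernel=2, stride=2):
    return (size - kernel) // stride + 1

def model1_shapes(input_size=128, in_channels=1):
    """Walk the Model1 stack (32, 64, 96, 96 filters of 3x3, each conv
    followed by 2x2 max pooling) and return the spatial size and the
    parameter count (weights + biases) after each layer."""
    layers = []
    size, channels = input_size, in_channels
    for filters in (32, 64, 96, 96):
        size = conv2d_out(size)
        params = 3 * 3 * channels * filters + filters
        size = maxpool_out(size)
        channels = filters
        layers.append((size, params))
    return layers
```

Each 2 × 2 pooling halves the spatial resolution, so a 128 × 128 input shrinks to 8 × 8 after the four blocks before reaching the dropout and fully-connected layers.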
A second model, named Model2, is similar to the first but with a different number of filters: as shown in Fig. 6, it has the same number of filters in each layer as the AlexNet model. Both Model1 and Model2 are completed with a Dropout layer, a Fully-connected plus Softmax layer, and a Classification layer.
Deep learning generally requires thousands of training samples to achieve a low training error. However, all but one (Tongji) of the databases used fall short of this number; we therefore generated additional images from the datasets using the Keras data generator, which alters images with slight random rotation, shear, zoom, width and height shifts, brightness changes, etc. This increased the training and test datasets as follows:
• Badawi: training set 5000 (50 samples per subject), test set 500 (5 samples per subject)
• Bosphorus: training set 5400 (54 samples per subject), test set 600 (6 samples per subject)
• PUT (palm/wrist): training set 5400 (54 samples per subject), test set 600 (6 samples per subject)
• Tongji: training set 5400 (18 samples per subject), test set 600 (2 samples per subject)
• VERA: training set 4950 (45 samples per subject), test set 550 (5 samples per subject)
• FYO: training set 5760 (18 samples per subject), test set 640 (2 samples per subject)
To avoid over-fitting, cross-validation was performed by swapping the test set with a section of the training set and repeating the experiment; this was done 10 times without repeating any test set. The experimental results in Table 4 show that the CNN outperformed the hand-crafted methods of Table 2.
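A minimal stand-in for the augmentation transforms described above is sketched below in plain NumPy; it applies only random width/height shifts and brightness scaling (rotation, shear and zoom are omitted for brevity), so it is an illustration of the idea rather than the actual Keras generator configuration used.

```python
import numpy as np

def augment(img, rng, max_shift=4, brightness=(0.8, 1.2)):
    """Produce one augmented copy of a 2-D float image in [0, 255]:
    a small random width/height shift followed by a random brightness
    scaling, clipped back to the valid range."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    scale = rng.uniform(*brightness)
    return np.clip(shifted * scale, 0.0, 255.0)
```

Calling such a function repeatedly per original image is how a handful of samples per subject can be expanded into the training-set sizes listed above.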
The proposed multimodal CNN architecture performs decision-level fusion over the three biometric traits: palmar, dorsal and wrist. The architecture is shown in Fig. 7, where Decision-Level Fusion combines the CNN decisions from each trait. The final decision is obtained by applying a Weighted OR Rule to the three decisions, namely Decision D, Decision P and Decision W. Each decision returns either a correct recognition (True) or an incorrect recognition (False); a True decision is given a weight of 0.5, while a False decision is assigned 0. The three weights are summed and compared against a threshold of 0.9 to obtain the final decision, so at least two of the three trait decisions must be correct. The proposed architecture is tested on the same datasets as in Table 3, and a comparison with the corresponding unimodal setups is given in Table 5.
In addition, the computation times for training and testing the unimodal systems, multimodal systems and the proposed architecture are measured and shown in Tables 6 and 7, respectively. The measurements cover the three hand-crafted feature extractor systems, BSIF, Gabor and HOG, and the two CNN based models, on unimodal and multimodal vein images from the PUT and FYO databases. For the PUT database, training and testing times are measured on unimodal palmar and wrist images and on their multimodal combination. Similar measurements are performed on the three unimodal biometric systems covering the dorsal, palmar and wrist traits. Additionally, the computation times of CNN-based and hand-crafted multimodal systems fusing all three biometric traits on the FYO database are measured.
According to the computation times in Tables 6 and 7, BSIF required the lowest training time, while the CNN generally required the highest, about a thousand times that of BSIF. However, CNN Model1 required the lowest testing time: an average of 6.68 seconds for 600 test samples in the unimodal system and 19.67 seconds for 600 test samples in the multimodal system. This makes it favorable for biometric authentication systems, where fast prediction on a new sample is the most desired characteristic apart from accuracy. Furthermore, combining the traits in a multimodal system only increases the prediction time to about 20 seconds for 600 test samples. Additionally, since the acquisition of each trait takes about 1-2 seconds and the system can acquire all three traits at once with three different cameras, acquisition time does not affect the overall multimodal system.
In general, we compared the performance of hand-crafted feature extractors with deep learning based CNN models in unimodal and multimodal systems, using accuracy and computation time. From the results, we can deduce that multimodal systems generally perform better than unimodal systems for both hand-crafted feature extractors and CNNs, although multimodal systems require more computation time to combine the features. The CNN models compared favorably against the hand-crafted feature extractors, with Model1 performing best among the unimodal systems. Although the training time for a CNN is higher than that of the hand-crafted feature extractors, its testing time is the smallest, making the CNN favorable for fast authentication since training is a one-time event.

VII. CONCLUSION
This paper introduces the first multimodal vein database, called FYO, that involves three biometric modalities, palmar vein, dorsal vein and wrist vein, intended to serve as a defense against several attacks on biometric systems. Having these three datasets of the same individuals in one database enhances and encourages research on multimodal vein biometric authentication, which is a spoof-resistant setup and also highly efficient compared to unimodal systems. Additionally, we propose a multimodal CNN architecture using decision-level fusion of the three biometric traits, palmar, dorsal and wrist. The experiments are performed on the FYO, PUT, VERA, Tongji, Badawi and Bosphorus datasets.
Preliminary experiments using hand-crafted feature extractors on the palmar, dorsal and wrist biometric traits show that BSIF is the most powerful hand-crafted method for almost all unimodal vein datasets. Furthermore, combining two or three biometric traits for vein recognition yields better performance than the unimodal results achieved with hand-crafted feature extractors. Additionally, the proposed deep learning based CNN architecture is tested with two models, and further experiments are conducted on all aforementioned datasets. The proposed CNN architecture with Model1 and Model2 outperformed the hand-crafted methods for vein recognition, with CNN Model2 achieving the best performance of all methods. Although the analysis of training and testing times shows that the deep learning based CNN models require more training time than the hand-crafted feature extractors, their testing times are low; in particular, CNN Model1 requires the lowest testing time, which makes it favorable for fast authentication.
Finally, the experimental results show that combining the palmar, dorsal and wrist biometric traits increases the performance of multimodal vein recognition systems, reaching 100 percent accuracy on the FYO database with both the hand-crafted BSIF method and the deep learning based CNN models. Therefore, our work shows that using three vein biometrics achieves superior performance and can serve as a defense against several attacks.
As future work, several spoofing attack types, such as presentation attacks, will be tested on our FYO database. Anti-spoofing methods will be developed based on the behavior of our proposed CNN-based architecture under attack, and different deep learning architectures, such as VGG-16 and DAG-CNN, will be implemented for multimodal vein biometrics.