Multi-Layer Convolutional Features Concatenation With Semantic Feature Selector for Vein Recognition

A DCNN pre-trained on a large-scale image database can be used as a universal feature representation for image classification, and has achieved significant progress in some image recognition tasks. Unlike in other image recognition tasks, however, directly using a single convolutional feature as the feature representation for vein recognition does not achieve impressive results because vein information is sparsely distributed. Therefore, to obtain a more representative and discriminative convolutional feature for vein recognition, a novel multi-layer convolutional feature concatenation with a semantic feature selector is proposed in this paper. In a pre-trained DCNN, different convolutional layers encode feature information at different levels. High-level convolutional features with vein information carry more semantic information, and low-level convolutional features with vein information carry more detail information. However, low-level convolutional features also contain some background information. Therefore, to remove the background information from the low-level convolutional features, a novel semantic feature selector is presented. First, the proposed local max-pooling of preserving spatial position information (LMP-SPS) is applied to the activation map obtained by adding up all feature maps of the high-level convolutional layer, generating the semantic weighting map, which reflects the key vein information of the high-level convolutional features. Then, the semantic weighting map is used as a feature selector to discard the background information of the low-level convolutional features while preserving their detail information. Finally, the low-level convolutional features with vein information are selectively linked to the high-level convolutional features with vein information based on the proposed semantic feature selector.
A series of rigorous experiments on two lab-made vein databases named CUMT-Hand-Dorsa Vein database and CUMT-Palm Vein database is conducted to verify the effectiveness and feasibility of the proposed model. Besides, additional experiments with PUT Palm Vein database and the subset of PolyU database illustrate its generalization ability and robustness.


I. INTRODUCTION
The associate editor coordinating the review of this manuscript and approving it for publication was Xiaoyu Zhang.
With the rapid development of society and technology, the protection of personal information security plays an increasingly important role in people's lives. Identity authentication, one of the most efficient measures for protecting personal information, has received considerable attention in recent years. Compared with traditional identification techniques such as smart cards, passwords and signatures, biometric recognition techniques such as face recognition [1], palmprint recognition [2], fingerprint recognition [3] and vein recognition [4] have developed rapidly in the last few years due to their specific properties. Vein recognition, one of the most popular biometric identification techniques, is playing an increasingly essential role in identity authentication systems. A vein recognition system usually consists of vein image preprocessing, feature extraction, and classifier design and matching. Currently, most studies in this field concentrate on generating more robust and discriminative feature representations, which is regarded as the most important and difficult part of a vein recognition system. In traditional vein recognition systems, hand-crafted feature representation algorithms are generally adopted to extract the feature information of vein images. However, it is difficult to construct a more discriminative and robust vein recognition system with hand-crafted feature extraction algorithms because of their limited descriptive ability. Recently, deep convolutional neural networks (DCNNs) have achieved excellent performance in some image recognition tasks due to their discriminative feature representation capacity.
However, because sufficiently large training databases such as ImageNet are lacking, it is very difficult to train a DCNN model for some image recognition tasks such as hand-dorsa vein recognition. To exploit the discriminative feature representation ability of DCNNs, Fang et al. [5] proposed a two-stream convolutional network learning model for finger vein recognition, which achieves impressive performance. Besides, Wang et al. [6] presented a task-specific transfer learning model to address the over-fitting that arises when a DCNN model is trained with insufficient vein images. Although they achieve excellent performance, these two DCNN-based vein recognition systems require a complex training process, which limits their practicability in real environments.
Activations of a DCNN trained on a large-scale database [7], [8] such as ImageNet can be used as a universal image descriptor, and applying this descriptor to various image classification tasks delivers impressive performance. Liu et al. [9] presented a cross-convolutional-layer pooling method based on the convolutional activations of a pre-trained DCNN to obtain a more discriminative deep representation for image recognition. Wei et al. [10] proposed the selective convolutional descriptor aggregation (SCDA) model, based on a global mean threshold, for fine-grained image retrieval. However, unlike in other image recognition tasks, directly using a single convolutional feature as the feature representation for vein recognition does not yield excellent results because of the sparse distribution of vein images. Therefore, a novel multi-layer convolutional feature concatenation with a semantic feature selector is proposed to obtain more discriminative and richer convolutional features for hand-dorsa vein recognition. In a pre-trained DCNN model, high-level convolutional features capture more semantic information about samples, and low-level convolutional features capture more detail information. If low-level and high-level convolutional features are connected together, a richer and more discriminative deep feature representation can be obtained for the vein recognition task. However, low-level convolutional features also contain more background information about samples. Therefore, a novel semantic feature selector is proposed to remove the background information of the low-level convolutional features. First, the weighting map based on semantic information, which represents the key vein information of the high-level convolutional features, is produced by applying the proposed LMP-SPS to the activation map generated by adding up all feature maps of the high-level convolutional layer.
Second, we adopt the weighting map based on semantic information as a feature selector to select the useful detail information in the low-level convolutional features. Finally, the low-level and high-level convolutional features are connected together as the final feature vector for vein recognition. Besides, PCA is introduced to reduce the dimension of the final feature vector and speed up the training of the SVM. The recognition results of a series of rigorous comparison experiments on four databases, including two lab-made databases and two public databases, demonstrate the effectiveness and robustness of the proposed multi-layer convolutional feature concatenation with semantic feature selector for vein recognition.
Overall, the main contributions of this paper are summarized as follows:
• We propose a novel vein recognition model based on multi-layer convolutional feature concatenation with a pre-trained DCNN. Compared with other DCNN-based models, the proposed model not only realizes a simple, efficient, yet discriminative vein recognition system but also has high generalization capacity and robustness.
• A novel semantic feature selector is proposed to connect low-level and high-level convolutional features in order to obtain more discriminative and richer deep convolutional features.
• The LMP-SPS is presented to localize key vein information, such as ending points and cross points, in the activation map generated by adding up all feature maps of the high-level convolutional layer.
Compared with our previously published work [53], the model proposed in this paper differs in four respects. First, it fully employs both the semantic information of high-level convolutional features and the detail information of low-level convolutional features, rather than the semantic information of a single convolutional feature. Second, we propose a novel local max-pooling of preserving spatial position information (LMP-SPS), instead of the global max-pooling of preserving spatial position information used in the prior work, to better localize the key vein information of convolutional features. Third, the proposed model obtains more discriminative and richer convolutional features for vein recognition than our previous model. Finally, the proposed model is evaluated on four databases, instead of the two databases used in the previous work, to verify its performance for vein recognition more thoroughly.
The remainder of this paper is organized as follows. Related vein recognition methods from recent years are introduced in Section II. In Section III, we propose a novel model for vein recognition based on multi-layer convolutional feature concatenation with a pre-trained DCNN and illustrate how the low-level and high-level convolutional features are connected. The experimental results and analyses are presented in Section IV. Finally, we summarize the paper and outline future work in Section V.
VOLUME 7, 2019

II. RELATED WORK
Vein recognition, one of the most popular biometric identification technologies, has received considerable attention due to its higher security. Currently, many studies in this field adopt several kinds of vein information, such as hand-dorsa vein [11]-[13], palm vein [14], [15] and finger vein [16]-[18], as the feature representation for identity verification. Feature extraction methods, whose design is regarded as the most essential and difficult part of a vein recognition system, can be summarized into two categories: handcrafted feature extraction methods and deep feature extraction methods.

A. HANDCRAFTED FEATURE EXTRACTION METHODS
Handcrafted feature extraction methods, which are divided into shape feature-based methods and texture feature-based methods, have been widely applied to vein recognition tasks and achieve excellent performance.
Shape feature-based methods: These approaches usually adopt the shape information of the vein image, such as the positions and angles of straight line vectors [19], endpoints and crossing points [20], vein knuckle shapes [21] and dominant points [22], as the feature representation for later classification. Apart from the above models, Yang et al. [23] adopted the tri-branch vein structure as shape information to enhance the recognition performance of template matching in finger vein recognition. Huang et al. [24] proposed a hand-dorsa vein recognition method that fuses texture information and shape clues generated by the proposed binary coding and graph matching. Yang et al. [25] proposed a feature representation method for finger vein recognition based on adaptive vector field estimation, which effectively improves finger vein matching accuracy. However, because of unreliable lighting conditions during vein acquisition, it is extremely difficult to obtain accurate vein segmentations for different images of the same sample, which leads to the insufficient generalization ability of these methods.
Texture feature-based methods: These methods concentrate mainly on the texture variations within the vein image and can be further divided into two categories: local gray distribution coding and local invariant features. The former, which covers local binary pattern (LBP) [26], local derivative pattern (LDP) [27], local ternary pattern (LTP) [28] and local line binary pattern (LLBP) [29], depicts the global gray distribution histogram on the basis of local coding results. Besides, to enhance the descriptive ability of local gray distribution coding, Wang and Wang [30] proposed the discriminative local binary pattern (DLBP) for vein recognition to decrease the influence of contrast information during feature coding. Xi et al. [31] presented the discriminative binary codes (DBC) learning method for finger vein recognition, which obtains a more discriminative feature representation and reduces time cost and storage requirements. Kang and Wu [32] proposed a palm vein recognition method with a mutual foreground-based local binary pattern, which achieves better matching accuracy. In short, although local gray distribution coding achieves high performance for vein recognition, its generalization capacity is insufficient because it is affected by the contrast of the vein image. The latter category, which contains SIFT [33], SURF [34], RootSIFT [35] and ASIFT [36], generally consists of three main steps: scale-space establishment, extrema detection and descriptor generation. Wang et al. [37] adopted SIFT to describe hand veins after simple preprocessing, namely denoising and contrast enhancement (CE). However, it is argued that this preprocessing leads to a loss of vein information and that the low-contrast distribution makes it very difficult to generate enough descriptive keypoints.
To remove the negative effect of CE, SIFT or SURF features [38], [39] are extracted directly from the input image without any preprocessing or binarization, resulting in a considerable performance improvement. However, mismatched pairs between unpaired keypoints usually raise the false acceptance rate (FAR) and equal error rate (EER), which is unacceptable for vein recognition. Inspired by the conclusions in [38]-[40], Kang et al. [41] tried to improve the performance of SIFT by means of complementary feature fusion and noisy keypoint removal, and they also introduced DoG-HE for better CE. In the mismatch removal stage, LBP is adopted to describe the region distribution of the matching keypoints, and the mismatched points are then removed according to the LBP difference. The disadvantage of the keypoint fusion method is that the detected non-vein keypoints, which describe the palmprint or other skin regions, have been verified to raise the FAR. To address this issue, Huang et al. [42] proposed a novel keypoint detection and matching method for hand-dorsa vein recognition, in which keypoint detection accounts for the physiological characteristics of the dorsal hand as well as the representativeness of the feature points. Thus, this model achieves impressive performance for dorsal hand vein recognition.

B. DEEP FEATURE EXTRACTION METHODS
DCNNs have been widely applied to many image recognition, image segmentation and image detection tasks due to their discriminative feature representation ability. Currently, some researchers have brought DCNNs into the vein recognition task. Wang et al. [6] proposed a novel coarse-to-fine transfer learning model to address the problem of insufficient training data in hand-dorsa vein recognition, which achieves impressive performance. Fang et al. [5] presented a lightweight deep-learning model to obtain a representative and discriminative feature descriptor for finger vein recognition. Qin and El-Yacoubi [43] proposed a DCNN model to extract and recover vein features based on limited a priori knowledge, which improves finger vein verification accuracy. Wang and Wang [44] designed a structure growing guided CNN model to learn more discriminative deep features for vein recognition, which also improves vein recognition performance.

III. METHODOLOGY
To obtain more discriminative and richer convolutional activations from a pre-trained DCNN model, we propose a multi-layer convolutional feature concatenation model with a semantic feature selector for hand-dorsa vein recognition, which is illustrated in Fig. 1. First, we apply a pre-trained DCNN model such as VGG-16 to extract the convolutional features of the input vein image. Next, the semantic feature selector is generated by applying the proposed LMP-SPS to the activation map obtained by adding up all feature maps of the high-level convolutional layer. Then, the semantic feature selector is used to remove the background information of the low-level convolutional features and acquire more discriminative low-level convolutional features. Finally, the low-level and high-level convolutional features are connected together as the final feature vector for vein recognition. Besides, PCA is introduced to reduce the dimension of the final feature vector and speed up the training of the SVM.

A. THE ANALYSIS OF CONVOLUTIONAL FEATURES WITH VEIN INFORMATION
Convolutional activations of a pre-trained DCNN model can be widely used as an image representation, and they achieve excellent performance in some image classification and retrieval tasks because they contain rich spatial and semantic information. However, compared with other images, vein images are sparsely distributed, so adopting a single convolutional feature as the image representation does not achieve acceptable results.
In a pre-trained DCNN model, high-level convolutional features generally carry richer semantic information about the input image, and low-level convolutional features generally carry richer detail information. By connecting the high-level and low-level convolutional features, we can obtain richer and more discriminative convolutional features for vein recognition. However, the study in [45] indicated that directly connecting the high-level and low-level convolutional features as the image representation does not give excellent results because the low-level convolutional features contain some background information.
To better analyze the specific properties of the convolutional activations of vein images, we employ the pre-trained DCNN model to extract the convolutional activations of four vein images randomly selected from the CUMT-Hand-Dorsa Vein database; the visualization results of some convolutional features are shown in Fig. 2. It can be observed from Fig. 2 that the responses of the feature maps in the high-level convolutional features mainly focus on several vein regions, including end points, cross points and local key vein information, whereas the responses of the feature maps in the low-level convolutional features mainly concentrate on regions including edges and background. This demonstrates that high-level convolutional features generally carry richer semantic information about vein images, while low-level convolutional features generally carry richer detail information but also cover some non-vein information.

B. SEMANTIC FEATURE SELECTOR
From the above analysis of the feature maps of the convolutional layers, it can be concluded that low-level convolutional features based on vein information not only contain more detail information but also cover some non-vein information. Therefore, to effectively connect low-level convolutional features with high-level convolutional features and obtain more discriminative and richer convolutional features, it is necessary to remove the non-vein information from the low-level convolutional features. However, it can be seen from Fig. 2 that the background information of the low-level convolutional features is scattered, which makes it difficult to remove non-vein information by operating directly on the low-level convolutional features. During the visualization of the feature maps, we find that the strongest parts of the strong responses in the feature maps of the high-level convolutional layer generally correspond to key vein information such as ending points or crossing points, and that the strongest parts of the weak responses generally also correspond to such key vein information. Inspired by this observation, a novel LMP-SPS is proposed to obtain the key vein information of the high-level convolutional features. We do not employ the semantic information of a single feature map of the high-level convolutional layer because it does not reflect the semantic information of the local vein regions. Instead, we add up all feature maps of the high-level convolutional layer to obtain the activation map $A$. Given the feature maps of the high-level convolutional layer $S \in \mathbb{R}^{H_1 \times W_1 \times C_1}$, the activation map is obtained as:
$$A = \sum_{n=1}^{C_1} S_n$$
where $A$ is the activation map and $S_n$ is the $n$-th feature map of the high-level convolutional layer.
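The channel-wise summation that produces the activation map can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `activation_map` is hypothetical, and random values stand in for real pool5 activations of VGG-16.

```python
import numpy as np

def activation_map(S: np.ndarray) -> np.ndarray:
    """Sum an (H1, W1, C1) stack of feature maps over the channel axis."""
    if S.ndim != 3:
        raise ValueError("expected an (H, W, C) array of feature maps")
    return S.sum(axis=2)

# Stand-in for high-level activations (7 x 7 x 512); values are random,
# not real network outputs.
rng = np.random.default_rng(0)
S = rng.random((7, 7, 512))
A = activation_map(S)
print(A.shape)  # (7, 7)
```

Each entry of `A` aggregates the responses of all channels at one spatial position, so positions that many feature maps respond to receive large values.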
Based on the above analysis, the semantic feature selector $W$ can be obtained by applying LMP-SPS to the activation map $A$. Within a 3 × 3 neighborhood, the proposed LMP-SPS can be represented as follows:
$$W_{3\times3}(i,j) = \begin{cases} 1, & A_{3\times3}(i,j) = T_{max} \\ 0, & \text{otherwise} \end{cases}$$
where $A_{3\times3}$ is a 3 × 3 neighborhood of the activation map, $W_{3\times3}$ is the partial semantic feature selector generated by applying the proposed method to that neighborhood, and $T_{max}$ is the maximum of the neighborhood.
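A possible reading of LMP-SPS is sketched below. Several details are assumptions rather than facts from the text: the selector is taken to be binary, the 3 × 3 neighborhood is taken to be a sliding window centered on each position, and borders are handled by padding with negative infinity. The function name `lmp_sps` is hypothetical.

```python
import numpy as np

def lmp_sps(A, k: int = 3) -> np.ndarray:
    """Binary selector: 1 where A equals the maximum of its k x k neighborhood.

    Assumptions (not stated in the text): sliding window, -inf border padding.
    """
    A = np.asarray(A, dtype=np.float64)
    pad = k // 2
    padded = np.pad(A, pad, mode="constant", constant_values=-np.inf)
    H, W = A.shape
    selector = np.zeros((H, W), dtype=np.float64)
    for i in range(H):
        for j in range(W):
            t_max = padded[i:i + k, j:j + k].max()  # T_max of this neighborhood
            if A[i, j] == t_max:
                selector[i, j] = 1.0  # keep the spatial position of the maximum
    return selector

A = np.array([[1, 2, 1],
              [2, 9, 2],
              [1, 2, 1]])
W_sel = lmp_sps(A)
print(W_sel)
```

In this toy map every neighborhood contains the central value 9, so only the center is selected. Unlike ordinary max-pooling, the output keeps the resolution of the input, which is what allows it to be used as a spatial mask.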
It should be noted that the feature maps of the pool5 layer in the VGG-16 model are adopted as the high-level convolutional features in our experiment. To evaluate the performance of the proposed LMP-SPS in obtaining the key vein information of the high-level convolutional features, we randomly select six vein images from the CUMT-Hand-Dorsa Vein database and visualize the resulting semantic feature selectors, which are shown in Fig. 3. It can be seen from Fig. 3 that applying LMP-SPS to the activation map accurately captures the key vein information of the high-level convolutional features, which supports the effectiveness of the proposed LMP-SPS.

C. MULTI-LAYER CONVOLUTIONAL FEATURES CONCATENATION
After the semantic feature selector is generated by applying LMP-SPS to the activation map, it is used to remove the background information of the low-level convolutional features. Given the original feature maps of a low-level convolutional layer $D \in \mathbb{R}^{H_2 \times W_2 \times C_2}$, the low-level convolutional features $D'$, which are identical in spatial size to the semantic feature selector $W$, are acquired by conducting a max-pooling operation on the original low-level convolutional features. The low-level convolutional features with non-vein information removed, $\tilde{D}$, can be obtained as:
$$\tilde{D} = D' \odot W$$
where $\odot$ denotes element-wise multiplication of $W$ with each feature map of $D'$. It should be noted that the Conv3_3, Conv4_2 and Conv5_2 layers of the VGG-16 model are adopted as low-level convolutional features. The concatenation of the high-level and low-level convolutional features can be achieved as:
$$f = [S, \tilde{D}_1, \tilde{D}_2, \tilde{D}_3]$$
where $f$ is the final multi-layer convolutional feature and $\tilde{D}_1$, $\tilde{D}_2$, $\tilde{D}_3$ are the selected features of the three low-level layers, concatenated with $S$ along the channel dimension. The detailed process of the selective convolutional descriptor model is illustrated in Fig. 1. The whole process of the proposed multi-layer convolutional feature concatenation with semantic feature selector for vein recognition is summarized in Algorithm 1.
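The selection and concatenation steps can be sketched as follows. This is an illustrative sketch under stated assumptions: the max-pooling is taken to be non-overlapping, the selector is applied by element-wise multiplication on every channel, and the helper names `max_pool_to` and `select_and_concat` are hypothetical. The array shapes follow the layer sizes reported in the experiment details; the values are random stand-ins.

```python
import numpy as np

def max_pool_to(D: np.ndarray, out_hw) -> np.ndarray:
    """Non-overlapping max-pooling of (H, W, C) maps down to (out_h, out_w, C)."""
    H, W, C = D.shape
    oh, ow = out_hw
    if H % oh or W % ow:
        raise ValueError("input size must be a multiple of the output size")
    fh, fw = H // oh, W // ow
    # Split each spatial axis into (blocks, block_size) and take the block maximum.
    return D.reshape(oh, fh, ow, fw, C).max(axis=(1, 3))

def select_and_concat(S, lows, W):
    """Mask pooled low-level maps with selector W, then stack them after S."""
    selected = [max_pool_to(D, W.shape) * W[:, :, None] for D in lows]
    return np.concatenate([S] + selected, axis=2)

rng = np.random.default_rng(0)
S = rng.random((7, 7, 512))                      # high-level features
lows = [rng.random((56, 56, 256)),               # three low-level layers
        rng.random((28, 28, 512)),
        rng.random((14, 14, 512))]
W = (rng.random((7, 7)) > 0.5).astype(np.float64)  # random stand-in selector
f = select_and_concat(S, lows, W)
print(f.shape)  # (7, 7, 1792)
print(f.size)   # 87808
```

Positions where the selector is zero contribute nothing from the low-level layers, while the high-level channels are kept unchanged at every position.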

A. DATABASE
In this section, to evaluate the effectiveness of the proposed multi-layer convolutional features with semantic feature selector for vein recognition, a series of experiments is conducted on our two lab-made vein databases, named the CUMT-Hand-Dorsa Vein database (CUMT-HDV) and the CUMT-Palm Vein database (CUMT-PV). Besides, additional experiments are performed on two public databases, the PUT Palm Vein database [46] and the PolyU Palmprint database [47], to verify the generalization ability of the proposed model; the state-of-the-art recognition results demonstrate its generalization capacity and robustness.
• The two lab-made vein databases, CUMT-HDV and CUMT-PV, were captured with a self-designed image capturing device, which is described in our previous work [48].
• The PUT Palm Vein database consists of 1200 palm vein images captured from 100 palms, with 12 images acquired for each palm over three sessions. In our experiment, the palm vein images of the first session are used to evaluate the performance of the proposed model.
• The PolyU database contains 6000 images acquired from 500 palms in two sessions, with 6 images captured per palm in each session. It should be noted that the near-infrared part of the PolyU Multispectral Palmprint database is used to verify the generalization ability of the proposed model.
Some samples from the four databases, including the two lab-made databases and the two public databases, are shown in Fig. 4.

B. EXPERIMENT DETAILS
In our experiments, we use the pre-trained VGG-16 model [50], which is trained on the ImageNet database, to extract the convolutional activations of the input vein images. The pool5 layer of the VGG-16 model is adopted as the high-level convolutional features, and the Conv3_2, Conv4_2 and Conv5_2 layers are regarded as the low-level convolutional features. Thus, S is a 7 × 7 × 512 convolutional feature, and D 1 , D 2 , D 3 are 56 × 56 × 256, 28 × 28 × 512 and 14 × 14 × 512 convolutional features, respectively. Because the low-level convolutional features are not identical in size to the high-level convolutional features, the corresponding max-pooling operation is conducted before the high-level and low-level convolutional features are connected. The size of the multi-layer convolutional features obtained by linking the high-level and low-level convolutional features is 7 × 7 × 1792. Therefore, the final feature representation formed by concatenating the multi-layer convolutional features is an 87808-dimensional feature vector. Because this feature representation is too large, PCA is used to reduce the dimension of the SVM input and speed up its training. The resulting vein feature representation, a 240-dimensional feature vector, is used as the input data to train the SVM. The training samples contain 200 × 5 images, and the test samples contain 200 × 5 images. The SVM uses the radial basis function as its kernel, and its penalty parameter and gamma are set to 128 and 0.0078, respectively.
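The PCA step can be sketched with a plain SVD, as below. This is an assumption-laden illustration: the text does not say which PCA implementation was used, the helper names `pca_fit` and `pca_transform` are hypothetical, and random 1024-dimensional vectors stand in for the 87808-dimensional features to keep the example light.

```python
import numpy as np

def pca_fit(X: np.ndarray, n_components: int):
    """Return the training mean and the top principal directions (as rows)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X: np.ndarray, mean: np.ndarray, components: np.ndarray):
    """Project centered samples onto the principal directions."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
X_train = rng.standard_normal((1000, 1024))  # 200 x 5 training images (stand-in)
X_test = rng.standard_normal((1000, 1024))   # 200 x 5 test images (stand-in)

mean, comps = pca_fit(X_train, 240)
Z_train = pca_transform(X_train, mean, comps)
Z_test = pca_transform(X_test, mean, comps)  # reuse the training mean and basis
print(Z_train.shape, Z_test.shape)  # (1000, 240) (1000, 240)
```

The projection is fitted on the training set only and then reused for the test set. With scikit-learn, the classifier configuration described above would correspond to `SVC(kernel="rbf", C=128, gamma=0.0078)` trained on `Z_train`.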

C. PERFORMANCE EVALUATION OF MULTI-LAYER CONVOLUTIONAL FEATURES CONCATENATION WITH SEMANTIC FEATURE SELECTOR
In this section, the effectiveness of the multi-layer convolutional feature concatenation with semantic feature selector for vein recognition is evaluated on our two lab-made vein databases, CUMT-HDV and CUMT-PV. To verify the advantage of the proposed method in obtaining more discriminative and richer deep convolutional features, several encoding and pooling methods, namely max-pooling, average-pooling, FV, VLAD, CL [9] and SCDA [10], are used for comparison. The experimental results are shown in Table 1. It should be noted that the feature maps of the pool5 layer are used as the convolutional features in the comparison experiments. Besides, to fully evaluate the proposed method, we connect the high-level convolutional features with different low-level convolutional features, with and without the semantic feature selector. The performance of the different parts of the proposed model is shown in Table 2.
Judging from the experimental results in Table 1, the proposed multi-layer convolutional feature concatenation with semantic feature selector achieves the best performance among the compared encoding and pooling methods. The state-of-the-art recognition rates of 97.66% and 96.82% demonstrate its effectiveness for vein recognition. It can be observed from Table 2 that the multi-layer convolutional features with semantic feature selector achieve the highest recognition rate. This shows that features from a single convolutional layer are not sufficient to obtain excellent performance, and it indicates that the proposed model can effectively remove the background information of the low-level convolutional features and obtain more discriminative and richer convolutional features for vein recognition.

D. COMPARISON WITH STATE-OF-THE-ART MODEL
In this part, several experiments are designed on our two lab-made vein databases, CUMT-HDV and CUMT-PV, to verify the advantage of the proposed multi-layer convolutional feature concatenation with semantic feature selector over other state-of-the-art feature extraction algorithms for vein recognition. Multi-modal experiments in the verification scenario are designed. In this scenario, the first image is regarded as the gallery, whereas the remaining images are used as probes. Therefore, the numbers of genuine and imposter scores are 10080 (224 × 10 × 9/2) and 2497600 (10 × 10 × 224 × 223/2), respectively.
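The two score counts follow from simple pair counting, which can be checked with the short sketch below. It assumes, as the formulas in parentheses imply, that every pair of images from the same subject gives one genuine score and every pair of images from two different subjects gives one imposter score; the variable names are illustrative.

```python
from math import comb

subjects = 224           # number of classes implied by the formulas above
images_per_subject = 10  # images per class

# Genuine: unordered pairs of images within each subject.
genuine = subjects * comb(images_per_subject, 2)

# Imposter: unordered pairs of subjects, times all cross-subject image pairs.
imposter = comb(subjects, 2) * images_per_subject ** 2

print(genuine)   # 10080
print(imposter)  # 2497600
```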
Three kinds of representative vein recognition algorithms, namely local invariant features, local gray distribution coding and deep feature extraction models, are used for comparison to fully validate the advantage of the proposed multi-layer convolutional feature concatenation with semantic feature selector for vein recognition. Local invariant feature models such as SIFT [33], SURF [34], RootSIFT [35] and ASIFT [36] are regarded as among the best handcrafted feature representation algorithms for vein recognition because of their invariance to rotation, translation and scale. Local gray distribution coding models mainly include LBP [26], LDP [27], LTP [28] and LLBP [29], which have been widely applied to vein recognition tasks in recent years. The deep feature extraction algorithms, including the transfer learning model [6], the structure growing guided CNN [44], CL [9] and SCDA [10], achieve outstanding recognition results due to their robust descriptive ability. The ROC curves and results of the three kinds of feature extraction models are shown in Fig. 5 and Table 3.
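For readers unfamiliar with the EER values compared in this section, the sketch below shows one common way to estimate the equal error rate from genuine and imposter scores. It is not the authors' evaluation code: it assumes similarity scores (higher means more similar), a simple threshold sweep, and a hypothetical function name `equal_error_rate`.

```python
import numpy as np

def equal_error_rate(genuine, imposter) -> float:
    """Approximate EER by sweeping thresholds over all observed scores."""
    genuine = np.asarray(genuine, dtype=np.float64)
    imposter = np.asarray(imposter, dtype=np.float64)
    thresholds = np.unique(np.concatenate([genuine, imposter]))
    # FAR: fraction of imposter scores accepted (score >= threshold).
    far = np.array([(imposter >= t).mean() for t in thresholds])
    # FRR: fraction of genuine scores rejected (score < threshold).
    frr = np.array([(genuine < t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(far - frr)))  # threshold where the two rates meet
    return float((far[i] + frr[i]) / 2)

# Perfectly separated scores give an EER of zero.
eer_separated = equal_error_rate([0.9, 0.8], [0.1, 0.2])
# Interleaved scores give a non-zero EER.
eer_overlap = equal_error_rate([0.6, 0.4], [0.5, 0.3])
print(eer_separated, eer_overlap)  # 0.0 0.5
```

A lower EER means that the false acceptance and false rejection rates balance at a smaller error, which is why it is used as the single summary figure in the comparisons.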
It can be concluded from Fig. 6 that the proposed multi-layer convolutional feature concatenation with semantic feature selector performs far better than the local invariant feature (LIF) models: its EERs are 1.28% and 1.74%, whereas the best LIF result is 2.25% and 2.63% with RootSIFT and the best LBP-family result is 1.96% and 2.38% with LDP. These state-of-the-art recognition results demonstrate the discriminative and representative feature representation ability of the proposed model for the vein recognition task. It can be observed from Table 3 that, compared with the coarse-to-fine transfer learning model [6] and the structure growing guided CNN [44], our proposed model is simpler. Besides, the EERs of 1.28% and 1.74% achieved by the proposed model are better than the EERs of 1.72% and 2.21% and of 2.41% and 3.15% obtained by the coarse-to-fine transfer learning model and the structure growing guided CNN, respectively, which also verifies the advantage of the proposed model. Compared with CL [9] and SCDA [10], our proposed method achieves higher performance for the hand-dorsa vein recognition task: its EERs of 1.28% and 1.74% are better than the EERs of 1.78% and 2.25% and of 2.13% and 2.76% obtained by CL and SCDA, respectively, which also illustrates the high performance of the proposed model for hand-dorsa vein recognition.
Identity recognition systems in real environments should not only achieve high recognition rates but also keep their time consumption as low as possible. Sometimes, the recognition rate of an identity recognition system may be appropriately decreased to optimize the time cost of the system.
The main time-consuming parts of our proposed multi-layer convolutional feature concatenation with semantic feature selector are the extraction of the multi-layer convolutional features, the concatenation of the multi-layer convolutional features with the selector, and classification. The time cost of our proposed model is evaluated with MATLAB 2017 on a PC with a 3.30 GHz CPU and 8.00 GB of memory. The average time consumption per image is shown in Table 4. It can be observed from Table 4 that the time cost of the proposed method is acceptable on the two lab-made vein databases, which shows that the proposed model can meet the time consumption requirements of identity recognition systems based on vein information.

F. GENERALIZATION EVALUATION OF THE PROPOSED MODEL
After the effectiveness of the proposed model is evaluated on our two lab-made vein databases, two experiments are designed on two public databases, the PUT Palm Vein database (single session) and the PolyU Multispectral Palmprint database (near-infrared part), to verify the generalization ability of the proposed model. In our experiments, the first image is regarded as the gallery, whereas the remaining images are used as probes. The experimental results of recently published methods on the two public databases are shown in Table 5 and Table 6. It should be noted that, because recently published methods on the PUT Palm Vein database are scarce, several DCNN-based methods for hand-dorsa vein recognition are used for comparison on the PUT Palm Vein database to better evaluate the generalization capacity of the proposed model. It can be observed from Table 5 and Table 6 that the proposed model achieves a state-of-the-art EER compared with the other methods, which verifies its generalization ability and robustness.

V. CONCLUSION
In this paper, a novel multi-layer convolutional feature concatenation with a semantic feature selector is proposed to increase the feature representation ability of a pre-trained DCNN for vein recognition. First, a pre-trained DCNN model is used to extract the multi-layer convolutional features of the input vein images. Second, the weighting map based on semantic information, which includes the key vein information of the high-level convolutional features, is produced by applying LMP-SPS to the activation map generated by adding up all feature maps of the high-level convolutional layer. Third, we adopt the weighting map based on semantic information as a feature selector to select the useful detail information in the low-level convolutional features. Finally, the low-level and high-level convolutional features are connected together as the final feature vector for vein recognition. Besides, PCA is introduced to reduce the dimension of the final feature vector and speed up the training of the SVM. The recognition results of a series of rigorous comparison experiments on four databases, including two lab-made databases and two public databases, demonstrate the effectiveness and robustness of the proposed model for vein recognition.
We also argue that the proposed multi-layer convolutional feature concatenation with semantic feature selector is applicable to other computer vision tasks.
In the future, we can try to design an end-to-end DCNN with a novel attention model for hand-dorsa vein recognition.