Multimodal Biometric Recognition Based on 3D Ultrasound Palmprint-Hand Geometry Fusion

In recent years, multimodal biometric systems have been increasingly employed in many application fields because of their advantages in terms of universality, recognition rate, and security. Among the various acquisition technologies, ultrasound offers important merits. It provides volumetric images of the human body, and hence a more accurate description of the characteristics, and it allows liveness to be verified. In this work, a multimodal ultrasound recognition system based on the fusion of 3D hand geometry and 3D palmprint features is proposed and experimentally evaluated. The system acquires a volumetric image of the whole hand, and for both characteristics several 2D images are extracted at different depth levels. From each image, 2D features are extracted and then combined to obtain a 3D template. Recognition performance is evaluated through verification and identification experiments on a homemade database. Experiments are carried out first for the two unimodal biometrics and then by fusing the two modalities at score level. Results show that fusion dramatically improves the recognition performance of the single biometrics, achieving an Equal Error Rate of 0.08% and an identification rate of 100%.


I. INTRODUCTION
The pressing need for security solutions and devices has favored the development of increasingly advanced recognition systems, including biometric systems. These are emerging in various applications where high personal security is required. Biometric recognition consists in identifying an individual based on physical and behavioral characteristics, and it is progressively replacing existing personal authentication methods based on PINs and passwords.
At present, biometric systems based on characteristics such as fingerprint, palmprint, face, and iris are employed in a wide range of civilian applications. Among these, hand-based biometrics are gaining popularity because they present supposedly unique and time-invariant anatomical structures that can be effectively exploited for recognition [1].
Hand geometry is a well-established biometric, as it has been employed in applications throughout the world for many decades [2]. Its main merits include universality, long-term invariance, collectability, and acceptability, while its distinctiveness is not among the best. For these reasons, hand geometry is used mainly in verification rather than in identification mode [3], [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Lefei Zhang.
Palmprint has merits similar to those of hand geometry, but it also exhibits a very rich texture, which can be appreciated in high-resolution images. This approach is exploited mainly in law-enforcement and forensic environments [5], [6]. Another approach, suited to access control applications, consists in extracting only the principal lines and main wrinkles; in this case, lower-resolution images can be collected [7], [8]. In recent years, extensive research has been devoted to 3D palmprint, which is able to overcome several limits of 2D palmprint [9]-[12].
In recent years, multimodal systems, which combine two or more biometric features, have become increasingly popular. They outperform unimodal ones in terms of recognition rate, universality, and security [13], [14]; in particular, they can authenticate users for whom one of the single biometrics cannot be measured.
Several hand-based multimodal systems based on the fusion of hand geometry and palmprint have been proposed in the literature [15]-[17]. They have the advantage of increasing the distinctiveness of hand geometry through palmprint traits. In general, it is desirable to extract both palmprint and hand geometry features from a single image, so that only one sensor is needed for image acquisition. This saves cost and improves user acceptability, since users are not forced to use different acquisition devices.
Different technologies have been tested for collecting images of the human hand, including optical, thermal, and ultrasonic ones. Optical systems are mostly based on CCD cameras that collect 2D or 3D images [18], [19], while thermal images are usually acquired using infrared radiation [20]. Contactless biometrics is currently widely investigated [21]-[23]. The main motivations are user acceptability, personal hygiene, and custom, because some people are reluctant to place their hands on a device touched by other individuals, as well as compliance with pandemic requirements. However, contactless systems do not yet seem as reliable as conventional methods.
Ultrasound is an imaging modality that has been widely used in many application fields, including medical diagnostics [24], non-destructive evaluation [25], and indoor localization [26]. Systems able to collect ultrasound images of regions of the human body cannot be contactless, yet they have several peculiarities that make them suitable for biometric authentication [27]. The most important is probably the capability to penetrate the human body in a non-invasive and safe way. This provides volumetric images of body regions without ionizing radiation such as X-rays, using only sound waves, which pose no risk to the individual's health. This capability has three consequences. First, it yields a more accurate (3D) description of the characteristic, which improves recognition accuracy. Second, different biometrics can be extracted from the same acquired volume to implement a multimodal system. Third, the liveness of the sample can be detected by checking vein pulsing; together with the intrinsic difficulty of faking under-skin features, this makes the system almost impossible to spoof. In addition, ultrasound images are not affected by environmental changes in light, temperature, or humidity, or by impurities on the hand such as grease or ink, and they may allow features to be extracted even in the presence of skin abrasion.
Various features have been extracted with ultrasound methods, the most popular being fingerprint [28], [29], for which micromachined sensors have recently been realized and integrated into portable devices [30]. Other biometrics include the inner geometry of the fingers or of the palm [31], [32], hand veins [33]-[36], and palmprint [37]-[39]. An ultrasound system able to acquire a volume of the whole hand has also been proposed [40]. More recently, a preliminary database was established with this system, which allowed the experimental evaluation of a recognition system based on 3D hand geometry features [41].
In the present work, a multimodal recognition system based on the fusion of 3D hand geometry and 3D palmprint features is proposed and experimentally tested. First, an improved and more accurate procedure than the one proposed in [41] is derived for extracting 3D hand geometry features. Then, by automatically selecting a square region of the palm, 3D palmprint features are extracted as well.
Verification and identification experiments, carried out on the same database used in [41], are finally performed for the two single biometrics and for the fusion between them, implemented through a weighted average of the single scores.
The rest of the paper is organized as follows: section II describes the acquisition modality of the 3D image, section III illustrates 2D and 3D features extraction for both hand geometry and palmprint, section IV reports verification and identification results for hand geometry, palmprint, and fusion, section V reports the conclusions.

II. 3D ULTRASOUND IMAGE
The setup for collecting 3D ultrasound images of the whole human hand has been described in detail in [41]. It is composed of an advanced ultrasound research scanner [42], which drives a commercial 192-element linear array (LA435, Esaote S.p.A., Genoa, Italy). The probe has a central frequency of 12 MHz, a pitch of 200 µm, and an elevation aperture of about 3.5 mm, and it is mounted on a numerical pantograph (Delta Macchine CNC, Vazia (RI), Italy). The volunteer's hand is immersed in a container filled with distilled water. A volumetric image is obtained by moving the probe, which is also immersed, along the elevation direction. While the probe moves over the hand, B-mode images are collected and saved. Several parallel scans are needed to cover the whole volume of 166 × 200 × 27 mm³. The collected data are realigned in a postprocessing phase so that the volume is represented as an 8-bit grayscale 3D matrix (416 × 500 × 68 voxels). The relatively low resolution, 0.4 mm in every direction, is due to the limited internal memory of the scanner (256 MB). An example of a 3D rendering of a human hand is shown in Figure 1; it is obtained by removing the dark voxels, which represent water, through a suitable thresholding operation. To extract features of both hand geometry and palmprint, 2D renderings are obtained at different depths from the 3D volume by projecting the external surface of the hand onto the xy plane. This surface can also be translated along z inside the skin and then projected again onto the xy plane, yielding 2D images of the hand at several under-skin depths. In a previous work [41], six such 2D images were extracted, with depths ranging from 100 µm to 600 µm in steps of 100 µm, to define 3D templates for hand geometry. In the present work, a suitable interpolation along the z-axis made it possible to extract a much larger number (14) of 2D hand images and thus to obtain richer 3D features.
VOLUME 10, 2022
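The surface-shifting step described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The volume is assumed to be indexed as volume[x, y, z], with z increasing from the probe into the hand. The skin surface is taken as the first voxel along z above a hypothetical intensity threshold. Linear interpolation along z is assumed to play the role of the "suitable interpolation" that allows 50 µm steps with 0.4 mm voxels.

```python
import numpy as np

VOXEL_UM = 400.0  # voxel size along z in micrometres (0.4 mm, as stated in the text)

def surface_index(volume, threshold):
    """Index of the first voxel along z brighter than threshold (assumed skin surface); -1 if none."""
    above = volume > threshold
    idx = np.argmax(above, axis=2)       # first True along z
    idx[~above.any(axis=2)] = -1         # columns containing only water
    return idx

def subsurface_image(volume, threshold, depth_um):
    """2D image sampled depth_um below the surface, linearly interpolated along z."""
    surf = surface_index(volume, threshold)
    nx, ny, nz = volume.shape
    z = surf + depth_um / VOXEL_UM       # fractional voxel position below the surface
    z0 = np.floor(z).astype(int)
    frac = z - z0
    z0c = np.clip(z0, 0, nz - 1)
    z1c = np.clip(z0 + 1, 0, nz - 1)
    xi, yi = np.indices((nx, ny))
    v0 = volume[xi, yi, z0c].astype(float)
    v1 = volume[xi, yi, z1c].astype(float)
    img = (1.0 - frac) * v0 + frac * v1
    img[surf < 0] = 0.0                  # no hand in this column: keep it dark
    return img
```

For example, `np.arange(100, 751, 50)` yields the 14 depths used in this work, and calling `subsurface_image` once per depth would return one 2D image per level.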
Also in this case, the shallowest image was extracted at 100 µm and the deepest at 750 µm, with a step of 50 µm. Figure 2 shows the 2D images extracted at the 14 depths with the procedure described above. As can be observed, the images become brighter overall as depth increases. This behavior is caused by the larger number of residual water pixels in the shallowest images. For this reason, the image at a depth of 50 µm was also extracted but was excluded from further processing because it is too dark.

III. FEATURE EXTRACTION
Features for both hand geometry and palmprint are extracted from the 14 images shown in Figure 2. For hand geometry, the features are a number of measurements taken from the hand, such as palm size and finger lengths and widths. For palmprint, they are the principal lines and main wrinkles.

A. HAND GEOMETRY
A procedure for extracting features from a 2D ultrasound image of the hand is first described. Then, three kinds of 3D templates are defined by suitably combining the 2D features.

1) 2D TEMPLATE
The procedure used for extracting a 2D template from each of the images in Figure 2 is very similar to that presented in [41] and [43]. First, noise and small details are removed from the 2D image through a median filter, which effectively suppresses isolated out-of-range values while preserving edges. The resulting image is binarized with a suitable threshold, and a reference point, p_r, is defined at the middle point of the wrist boundary. Then, the distance D_i between this reference point and each point of the hand contour is calculated with the Euclidean distance formula:

$$D_i = \sqrt{(x_{b_i} - x_r)^2 + (y_{b_i} - y_r)^2}, \qquad i = 1, \ldots, N \tag{1}$$

where (x_r, y_r) are the coordinates of p_r and (x_{b_i}, y_{b_i}) are the coordinates of the i-th boundary pixel, taken in the clockwise direction from the reference point p_r. Several feature points are then extracted; they are shown in Figure 3a with different colors:
• finger peaks, in red, calculated as local maxima of D_i;
• valleys between fingers, in yellow, calculated as local minima of D_i and named ''base points'';
• other finger base points, in green, defined for the thumb, index, and little fingers as the points at the same distance from the peak as the other base point of the same finger;
• middle points, in blue, defined as the arithmetic mean of the two base points of each finger;
• extra point, in pink, defined as the point on the right boundary of the hand at the same distance from p_r as the left base point of the thumb.
Starting from these points, 26 distances are calculated to define the 2D template. As can be seen from Figure 3a, they comprise the widths and lengths of the fingers (in red, 20 distances in total), the whole hand length (in yellow), the three distances among the thumb, index, and little fingers (in blue), and the palm width and length (in green).
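Eq. (1) and the peak/valley detection can be sketched as follows. The sketch assumes that the contour is already available as an ordered (N, 2) array of pixel coordinates starting from p_r. The window used to accept a local extremum (half_window) is a hypothetical parameter, because the text does not specify how spurious extrema are filtered.

```python
import numpy as np

def distance_profile(contour, ref):
    """D_i of eq. (1): Euclidean distance from the reference point to each contour pixel."""
    c = np.asarray(contour, dtype=float)
    return np.hypot(c[:, 0] - ref[0], c[:, 1] - ref[1])

def local_extrema(d, half_window):
    """Indices of strict local maxima (finger peaks) and minima (valleys) of d.

    A sample counts only if it is the unique max/min within +/- half_window samples.
    The profile is treated as non-circular, so samples near both ends are skipped.
    """
    d = np.asarray(d, dtype=float)
    peaks, valleys = [], []
    for i in range(half_window, len(d) - half_window):
        window = d[i - half_window:i + half_window + 1]
        unique = np.count_nonzero(window == d[i]) == 1
        if unique and d[i] == window.max():
            peaks.append(i)
        elif unique and d[i] == window.min():
            valleys.append(i)
    return peaks, valleys
```

The middle point of a finger would then simply be the mean of the coordinates of its two base points.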

2) 3D TEMPLATE
A 3D template is obtained by combining the 2D templates extracted at the 14 depth levels. Three kinds of combinations are evaluated:
• Mean Features (MF): the 3D template has the same number of lengths as the 2D template, and each length is computed as the mean of the lengths obtained at the various depths.
• Weighted Mean Features (WMF): similar to MF, but each length is a weighted mean of the lengths obtained at the various depths.
• Global Features (GF): the 3D template contains all the lengths computed at every depth.
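The three combinations can be sketched with NumPy, assuming that the per-depth 2D templates are stacked in an array of shape (n_depths, 26). The WMF weights are not given in this section, so they are an input here. The uniform default is a hypothetical placeholder under which WMF coincides with MF.

```python
import numpy as np

def combine_templates(templates_2d, weights=None):
    """Build MF, WMF and GF 3D templates from per-depth 2D templates.

    templates_2d: array of shape (n_depths, n_lengths), one row per depth.
    weights: per-depth weights for WMF (hypothetical; not specified in the text).
    """
    t = np.asarray(templates_2d, dtype=float)
    if weights is None:
        weights = np.ones(t.shape[0])              # placeholder: WMF reduces to MF
    mf = t.mean(axis=0)                            # Mean Features: one mean per length
    wmf = np.average(t, axis=0, weights=weights)   # Weighted Mean Features
    gf = t.ravel()                                 # Global Features: all lengths, all depths
    return mf, wmf, gf
```

With 14 depths and 26 lengths, MF and WMF keep 26 values, while GF concatenates 14 × 26 = 364 values.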

B. PALMPRINT
As for hand geometry, feature extraction first consists in obtaining 2D templates at each depth and then in combining them to generate a 3D template, which contains information on the depth of the principal lines.

1) 2D TEMPLATE
The procedure for extracting a Region of Interest (ROI) for the palmprint from the whole hand is shown in Figure 3b. This kind of approach is commonly used to guarantee repeatable ROI extraction in optical palmprint systems [9]. This operation is carried out for each of the images of Figure 2, and the corresponding 2D palmprints are shown in Figure 4. In previous works [37], [38], ultrasound palmprint images were collected with a single probe sweep. No reference point was therefore available to align the image, so additional processing was needed during matching to account for possible translations or rotations of the image at acquisition time. The 2D feature extraction procedure adopted here is based on a classic ''line-based'' approach [44]-[46]. The input image (Figure 5a) is first preprocessed by a bicubic resize from 133 × 133 pixels to 266 × 266 pixels and a contrast adjustment. The main algorithm consists of scanning the image along four directions (0°, 90°, 180°, 270°) and then summing the four binary images obtained. For each scanning direction, two main operations are performed. First, the first edge of each line is identified by means of derivatives. Second, isolated points and short lines are deleted, following the same approach as in [46]. Figure 5b shows the result of the first operation for the 0° direction. Figures 5c to 5f show the features extracted along the four directions, and Figure 5g shows their logical sum. Finally, some morphological operations [47] are performed to obtain the final 2D template (see Figure 5h).
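A simplified version of the four-direction scan can be sketched with NumPy and SciPy. This is not the exact algorithm of [44]-[46], and it rests on several assumptions:
• palm lines appear darker than the surrounding skin;
• a pixel is marked as a "first edge" where the intensity drops by more than a hypothetical threshold thr along the scanning direction;
• connected components smaller than a hypothetical min_size are removed;
• the four directions are combined with a logical OR.
The bicubic resize, contrast adjustment, and final morphological operations are omitted.

```python
import numpy as np
from scipy import ndimage

def first_edges(img, thr):
    """Mark pixels where intensity drops by more than thr when moving left to right."""
    d = np.diff(img.astype(float), axis=1)
    edges = np.zeros(img.shape, dtype=bool)
    edges[:, 1:] = d < -thr               # entering a (darker) line
    return edges

def remove_short(mask, min_size):
    """Delete isolated points and short segments (8-connected components < min_size)."""
    labels, _ = ndimage.label(mask, structure=np.ones((3, 3)))
    sizes = np.bincount(labels.ravel())
    keep = sizes >= min_size
    keep[0] = False                       # label 0 is the background
    return keep[labels]

def line_features(img, thr=20.0, min_size=5):
    """Scan along 0, 90, 180, 270 degrees by rotating the image, then OR the results."""
    result = np.zeros(img.shape, dtype=bool)
    for k in range(4):
        rotated = np.rot90(img, k)
        edges = remove_short(first_edges(rotated, thr), min_size)
        result |= np.rot90(edges, -k)     # rotate the mask back to the original frame
    return result
```

Rotating the image and always scanning left to right keeps a single edge detector for all four directions; the 180° pass finds the opposite side of each line.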

2) 3D TEMPLATE
A procedure for obtaining 3D templates that provide information on line depth from ultrasound palmprint images was proposed in [46], [48]. In these works, a 2D template is extracted at each depth. The shallowest one, which is assumed to be the richest in information, is dilated and compared through an AND operation with the immediately deeper one. The result is dilated, stored, and compared with the next deeper 2D template, and so on until the deepest one. In this way, the 3D template is represented by a matrix A(i, j, n), where n is the number of 2D palmprint images collected at the various depths. The dilation accounts for the fact that, as the under-skin depth increases, the principal lines may not be perpendicular to the xy plane. The AND operation acts as a filter for spurious traits that may be present in deeper images. With this procedure, the depth information of a line is given by the number of ''1'' values accumulated at each pixel (i, j) of A, and it is usually plotted as a color-scale 2D image [46], [48].
In the present work, a similar method is applied. However, this time the 3D template generation procedure does not start with the shallowest 2D template (Figure 6a). As can be seen in Figure 4, the shallowest images are quite dark and present some artifacts that progressively disappear as depth increases. The best results were obtained by starting the procedure with the template extracted at a depth of 350 µm (Figure 4f). The 3D template is then generated by performing dilation and AND comparison towards both the shallower and the deeper 2D templates.
The choice of the size β of the structuring element used for dilation is quite delicate. If it is too large, the 3D template may contain spurious traits, whereas if it is too small, some principal information may be lost. Figure 7 shows, in a color-scale representation, the 3D template obtained from the 14 2D templates of Figure 6 by setting β = 5. Every pixel can assume a value ranging from 0 to 13, which represents the number of 2D templates that contain a white pixel at that position, that is, the depth of a single trait. Therefore, a blue pixel (value 0) is obtained whenever no 2D template contains a white pixel at that position, while a dark red one (value 13) is obtained when all 2D templates exhibit a white pixel there. For comparison, the 2D palmprint at 350 µm depth (Figure 4f) is also reported.
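The bidirectional dilate-and-AND procedure can be sketched with `scipy.ndimage.binary_dilation` and a square β × β structuring element; the square shape is an assumption. The sketch follows one reading of the description: the AND result at each level is stored undilated, and only the propagated mask is dilated before the next comparison. The occurrence map O is the per-pixel count of 1s across the levels.

```python
import numpy as np
from scipy import ndimage

def build_3d_template(templates, start, beta=5):
    """Dilate-and-AND propagation from a starting level towards both ends.

    templates: binary 2D templates ordered from shallowest to deepest.
    start: index of the starting level (e.g. the 350 um template).
    Returns A with shape (H, W, n) and the occurrence map O = A.sum(axis=-1).
    """
    masks = [np.asarray(t, dtype=bool) for t in templates]
    n = len(masks)
    struct = np.ones((beta, beta), dtype=bool)    # assumed square structuring element
    levels = [None] * n
    levels[start] = masks[start]
    for step in (1, -1):                          # towards deeper, then shallower levels
        current = masks[start]
        k = start + step
        while 0 <= k < n:
            # Dilation tolerates small lateral shifts of a line between depths;
            # the AND keeps only traits confirmed by the neighbouring level.
            current = ndimage.binary_dilation(current, structure=struct) & masks[k]
            levels[k] = current
            k += step
    A = np.stack(levels, axis=-1)
    return A, A.sum(axis=-1)
```

A larger `beta` tolerates larger shifts but, as noted above, admits more spurious traits.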

IV. RECOGNITION RESULTS
Recognition performance is tested through verification and identification experiments on the same database used in [41]. In this work, however, a few samples that had been discarded in the previous work were recovered through dedicated image processing. This gives a total of 110 samples, collected from 50 users (36 males and 14 females) aged between 18 and 55. Most of them (47) were students aged between 18 and 29, while the remaining three were aged between 50 and 55.

A. VERIFICATION EXPERIMENTS
The verification modality authenticates a person by checking his or her claimed identity. It performs a one-to-one comparison between a query template, obtained from the sample provided by a user, and a reference template stored in the database. Verification experiments are performed by comparing each template with all the others in the database, for both hand geometry and palmprint. Finally, fusion between the two characteristics is implemented.

1) HAND GEOMETRY
Both 2D and 3D templates are compared using the absolute distance function, in accordance with [41], where this function was shown to provide the best results:

$$d(Q, R) = \sum_{i=1}^{M} |Q_i - R_i| \tag{2}$$

where Q_i and R_i are the i-th distances of the query and reference templates, respectively, and M is the number of distances in the template. A comparison between two templates belonging to the same user registers a genuine score; a comparison between templates of different users registers an impostor score. To decide whether a user is authenticated, a predefined threshold is set: if the matching score exceeds the threshold, the user is accepted; otherwise, the user is rejected. Two types of errors can be committed. A False Rejection Error occurs when a genuine score is lower than the threshold, and a False Acceptance Error occurs when an impostor score is higher than the threshold.
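Eq. (2) and the genuine/impostor bookkeeping can be sketched as below. Each unordered pair of templates is compared once, which is equivalent to comparing every template with all the others when the matching function is symmetric. Eq. (2) is a distance, so lower values mean more similar templates. This section does not specify how it is mapped to a score that is accepted when it exceeds the threshold, so a caller might, as an assumption, negate or normalize it.

```python
from itertools import combinations

import numpy as np

def absolute_distance(q, r):
    """Eq. (2): sum of absolute differences between corresponding template distances."""
    return float(np.sum(np.abs(np.asarray(q, dtype=float) - np.asarray(r, dtype=float))))

def genuine_impostor(templates, labels, match):
    """Compare every pair once and split the results by whether the users match."""
    genuine, impostor = [], []
    for a, b in combinations(range(len(templates)), 2):
        s = match(templates[a], templates[b])
        (genuine if labels[a] == labels[b] else impostor).append(s)
    return np.array(genuine), np.array(impostor)
```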
For each threshold value, the False Acceptance Rate (FAR) is computed as the number of false acceptances divided by the total number of impostor scores. The False Rejection Rate (FRR) is computed as the number of false rejections divided by the total number of genuine scores. The performance of biometric systems is often evaluated through the Equal Error Rate (EER), the error rate at the threshold value for which FAR = FRR. The recognition performance of different biometric systems is usually compared through Detection Error Tradeoff (DET) curves, which plot the FRR against the FAR. Figure 8 shows the DET curves obtained using 2D images at various depths. The closer a curve is to the Cartesian axes, the higher the recognition accuracy. As can be seen, the best results are obtained with the deepest images. In the figure, the EER is simply computed as the intersection of each curve with the first bisector (FAR = FRR). Table 1 summarizes the EER values for all the curves; the lowest one is achieved at a depth of 750 µm.
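The FAR, FRR, and EER computation can be sketched as follows, assuming that higher scores indicate a better match and that a score equal to the threshold is accepted. The sketch uses the observed scores as candidate thresholds and takes the point where |FAR - FRR| is smallest. This is a common discrete approximation of the FAR = FRR crossing, not necessarily the exact method used to produce Table 1.

```python
import numpy as np

def far_frr(genuine, impostor, thresholds):
    """FAR: share of impostor scores accepted; FRR: share of genuine scores rejected."""
    g = np.asarray(genuine, dtype=float)
    i = np.asarray(impostor, dtype=float)
    far = np.array([(i >= t).mean() for t in thresholds])
    frr = np.array([(g < t).mean() for t in thresholds])
    return far, frr

def equal_error_rate(genuine, impostor):
    """Return (EER, threshold) at the candidate threshold where FAR and FRR are closest."""
    thresholds = np.unique(np.concatenate([np.asarray(genuine, float),
                                           np.asarray(impostor, float)]))
    far, frr = far_frr(genuine, impostor, thresholds)
    k = int(np.argmin(np.abs(far - frr)))
    return (far[k] + frr[k]) / 2.0, thresholds[k]
```

Plotting `frr` against `far` over all thresholds gives the DET curve.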
Recognition performance was also evaluated using the 3D templates defined in the previous section: GF, MF, and WMF. Figure 9 shows the DET curves obtained; for comparison, the DET curve of the best 2D result, at a depth of 750 µm, is also plotted. As can be seen, the GF and MF 3D templates improve on the results achieved with 2D templates. In particular, GF exhibits the best accuracy, further improving on the results presented in [41]. The EER values obtained with 3D templates are reported in Table 1.

2) PALMPRINT
The matching score between two 2D palmprint templates is obtained with a classic pixel-to-area approach [9], [46]. This method mainly consists of a logical AND operation between corresponding pixels of the two binary images:

$$S = \frac{2\sum_{i=1}^{n}\sum_{j=1}^{n} T_R(i,j) \wedge T_Q(i,j)}{S_R + S_Q} \tag{3}$$

where T_R and T_Q are the reference and query templates, respectively, n × n is the size of the template, and S_R and S_Q are the numbers of pixels of value ''1'' in T_R and T_Q, respectively. Figure 10 reports the DET curves computed using 2D templates at the 14 depth levels. As can be seen, the lowest EER value is obtained at a depth of 350 µm.
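Eq. (3) reduces to a few NumPy operations on binary templates. The sketch below assumes that both templates are aligned n × n binary arrays. It returns 0 when both templates are empty, a convention chosen here and not stated in the text.

```python
import numpy as np

def pixel_to_area_score(t_r, t_q):
    """Eq. (3): twice the number of overlapping 1-pixels over the total number of 1-pixels."""
    t_r = np.asarray(t_r, dtype=bool)
    t_q = np.asarray(t_q, dtype=bool)
    s_r, s_q = t_r.sum(), t_q.sum()
    if s_r + s_q == 0:
        return 0.0                        # both templates empty: no evidence of a match
    return 2.0 * np.logical_and(t_r, t_q).sum() / (s_r + s_q)
```

Identical non-empty templates score 1, and templates with no common line pixels score 0.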
For 3D templates, eq. 3 is modified to account for the number of occurrences of each pixel:

$$S = \frac{2\sum_{i=1}^{n}\sum_{j=1}^{n} T_R(i,j,1) \wedge T_Q(i,j,1) \wedge \left(|O_R(i,j) - O_Q(i,j)| < \alpha\right)}{S_R + S_Q} \tag{4}$$

where α is a parameter ranging from 0 to the number of 3D template levels, T_R(i, j, 1) and T_Q(i, j, 1) are the 3D reference and query templates at level 1, respectively, and O_R(i, j) and O_Q(i, j) are the numbers of occurrences of value ''1'' at pixel (i, j). The additional term |O_R(i, j) − O_Q(i, j)| < α ensures that the comparison between the 3D templates produces a ''1'' only if the occurrences of corresponding pixels in the two templates differ by less than α. The value of α therefore sets the acceptable difference between such occurrences. Figure 11 reports the DET curves obtained with 3D templates for α ranging from 3 to 9 and β = 5, together with the curve of the best 2D case. The 3D method improves on the recognition performance of the 2D one for most values of α, and the best recognition capability is obtained for α = 6. All the EER values of the curves plotted in Figures 10 and 11 are reported in Table 2.
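Eq. (4) can be sketched in the same way. Here the level-1 binary maps and the occurrence maps O_R and O_Q are assumed to be given as inputs, for instance O from a dilate-and-AND construction. As an assumption, S_R and S_Q are taken as the number of 1s in the level-1 maps.

```python
import numpy as np

def score_3d(t_r1, t_q1, o_r, o_q, alpha):
    """Eq. (4): pixel-to-area score that also requires occurrence counts to differ by < alpha."""
    t_r1 = np.asarray(t_r1, dtype=bool)
    t_q1 = np.asarray(t_q1, dtype=bool)
    close = np.abs(np.asarray(o_r, dtype=int) - np.asarray(o_q, dtype=int)) < alpha
    s_r, s_q = t_r1.sum(), t_q1.sum()     # assumed: 1-pixels of the level-1 maps
    if s_r + s_q == 0:
        return 0.0
    return 2.0 * np.count_nonzero(t_r1 & t_q1 & close) / (s_r + s_q)
```

Raising `alpha` makes the depth constraint more permissive, and the score tends towards the 2D score of eq. (3) computed on the level-1 maps.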
It should be noted that these recognition results are worse than those achieved in previous works, by using both water [46] and gel [48] as a coupling medium. This behavior is to be attributed to the far lower resolution of images collected with the setup employed in this work, due to the much greater scanned volume with the same available memory.

3) FUSION
Fusion between different characteristics can be carried out at various levels, such as sensor level, feature level, score level, and decision level [23], [49], [50]. Among them, score-level fusion, which is performed after the matching operation, is the most popular because it is easy to implement while retaining adequate information content.
Score-level fusion combines the scores obtained for each of the biometrics involved [51], usually by assigning an appropriate weight to each score. As in previous works [9], [39], [52], the weight is chosen to be inversely proportional to the EER obtained for each biometric:

$$S_F = \sum_{i=1}^{n} w_i R_i, \qquad w_i = \frac{1/e_i}{\sum_{k=1}^{n} 1/e_k} \tag{5}$$

where w_i is the weight of the score R_i, e_i is the respective EER, and n is the number of characteristics. Figure 12 shows the DET curve obtained with the MW fused score. For comparison, it also shows the best DET curves previously obtained for 3D palmprint and 3D hand geometry. As can be seen, the fusion of the two features dramatically improves recognition performance with respect to the single modalities, achieving an EER of 0.08%.
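Eq. (5) can be sketched as an inverse-EER weighted mean. The EER values in the usage note below are hypothetical numbers chosen for illustration, not the values reported in Table 3. Scores are assumed to be already on a comparable scale.

```python
import numpy as np

def inverse_eer_weights(eers):
    """Weights proportional to 1/e_i, normalized to sum to 1 (eq. 5)."""
    inv = 1.0 / np.asarray(eers, dtype=float)
    return inv / inv.sum()

def fuse_scores(scores, eers):
    """Weighted mean of per-biometric scores.

    scores: shape (n_biometrics,) or (n_biometrics, n_comparisons).
    """
    w = inverse_eer_weights(eers)
    return np.dot(w, np.asarray(scores, dtype=float))
```

With hypothetical EERs of 1% and 3%, the weights are 0.75 and 0.25, so scores of 0.8 and 0.4 fuse to 0.7. The more accurate biometric dominates, but a very low score from one modality can still be compensated by the other.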
To better understand the reason for this improvement, the genuine and impostor distributions were analyzed for all three cases reported in Figure 12, together with the FAR and FRR curves as a function of the threshold. Figure 13 shows the results. As a consequence of the weighted mean described by eq. 5, the distributions of both genuine and impostor scores are shifted to an intermediate position between those of the two single biometrics. This reduces the overlap between the genuine and impostor distributions and hence the recognition errors. For example, the very low genuine score (about 0.4) in the palmprint distribution is completely compensated by the corresponding score in the hand geometry distribution.
The EER values obtained for each characteristic are summarized in Table 3.

B. IDENTIFICATION EXPERIMENTS
While verification authenticates a claimed identity, the identification modality aims to establish the identity of an unknown person. Identification experiments were carried out for 3D hand geometry (GF), 3D palmprint (α = 6), and their fusion, as in the previous section. In all cases, each template is compared with all the other templates in the database. The resulting matching scores are stored in 110 tables, one per acquisition. Each table contains the 109 scores resulting from the comparisons of that acquisition, sorted from highest to lowest. Then, the normalized difference between the lowest genuine score and the highest impostor score is computed for each table, and its distribution is plotted in Figure 14. As can be seen, the Normalized Score Difference is lower than zero for 8% of the tables for hand geometry (Figure 14a) and for 2% of the tables for palmprint (Figure 14b). These negative normalized differences correspond to an identification rate lower than 100%.
Negative values are obtained when the lowest genuine score is lower than the highest impostor score; in this case, the individual is not correctly identified. For the fusion (Figure 14c), instead, the Normalized Score Difference is always higher than 0, which ensures an identification rate of 100%.
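The per-table check can be sketched with NumPy. The sketch assumes a square matrix S of all-vs-all matching scores, where higher is better, and one user label per acquisition. It also assumes that every acquisition has at least one other sample of the same user. The normalization of the score difference is not specified in this section, so the sketch returns the raw difference. If the normalization is a positive scaling (an assumption), it does not change the sign, and the resulting identification rate is unaffected.

```python
import numpy as np

def score_differences(S, labels):
    """For each probe p: lowest genuine score minus highest impostor score."""
    S = np.asarray(S, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    idx = np.arange(n)
    diffs = np.empty(n)
    for p in range(n):
        genuine = (labels == labels[p]) & (idx != p)   # same user, excluding self-comparison
        impostor = labels != labels[p]
        diffs[p] = S[p, genuine].min() - S[p, impostor].max()
    return diffs

def identification_rate(S, labels):
    """Fraction of probes whose genuine scores all exceed every impostor score."""
    return float(np.mean(score_differences(S, labels) > 0))
```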

V. CONCLUSION
In this work, a multimodal recognition system based on the fusion of hand geometry and palmprint, obtained from ultrasound images, is proposed and tested. The system acquires a volumetric image of the whole hand, and for both characteristics several 2D images are extracted at different depth levels. From each image, 2D features are extracted and then combined to obtain a 3D template. Verification and identification experiments are carried out on a homemade database to test the performance of the system, first for hand geometry and palmprint separately and then by fusing the two modalities. These experiments show that fusion dramatically improves the recognition performance of the system with respect to the single biometrics, achieving an EER of 0.08% in verification mode and an identification rate of 100% in identification mode.
Multimodality could be further extended by extracting other biometrics from the same collected volume, such as internal hand geometry [32] and palm veins [33], [34], [36]. This would further improve the system's universality, recognition accuracy, and resistance to fraudulent attacks. Indeed, internal anatomical structures are nearly impossible to fake, and the proposed system can easily verify liveness by automatically checking vein pulsing. Due to these characteristics, the system seems especially suited to high-security applications.
Other merits of the proposed technique derive from the intrinsic properties of ultrasound, i.e., insensitivity to changes in environmental humidity, light, or temperature, as well as to several types of skin contamination, such as ink or grease stains.
Further improvements of the proposed system currently under way include upgrading the memory of the ultrasound scanner to allow the acquisition of higher-resolution images, and reducing the time needed to collect the volumetric image of the hand. The use of gel instead of water as the coupling medium between the probe and the hand is also being tested, as already done successfully for palmprint recognition [37], [48], [54]. This acquisition modality is expected to make the system more acceptable to users than the present water-based one.
A larger database is also planned, to verify the reliability of the recognition results achieved. Such a database would also be used to experiment with feature extraction, matching, and fusion methods, especially feature-level fusion [23] and machine learning techniques [55], which require additional samples for training.