
Unveiling the Power of Simplicity: Two Remarkably Effective Methods for Fingerprint Segmentation




Abstract:

Accurate fingerprint segmentation is crucial for reliable fingerprint recognition systems. This paper presents two novel segmentation methods, GMFS and SUFS, inspired by the KISS (Keep It Simple and Straightforward) principle. Both methods, evaluated on a public benchmark and compared to eighteen state-of-the-art approaches, excel in terms of accuracy, while maintaining simplicity and computational efficiency. GMFS utilizes a single handcrafted feature for straightforward yet effective fingerprint segmentation, achieving superior performance compared to previously reported traditional methods. SUFS employs a simplified U-net architecture for end-to-end segmentation, demonstrating remarkable performance: it achieves an average classification error rate of 1.51% across the entire benchmark, with an improvement of over 40% compared to the previously best-performing method. Furthermore, despite being trained on a relatively small dataset, it exhibits significant generalization capabilities, effectively segmenting fingerprints from very different acquisition technologies without requiring fine-tuning. An open-source Python implementation of both methods is available, fostering further research and development in the field of fingerprint recognition.


Published in: IEEE Access ( Volume: 11)

Page(s): 144530 - 144544

Date of Publication: 21 December 2023

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2023.3345644


License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). IEEE is not the copyright holder of this material.

SECTION I.

Introduction

Fingerprint recognition has emerged as a ubiquitous biometric technology, revolutionizing personal identification due to its reliability and accuracy [1], [2]. Its versatility has made it a cornerstone of various applications, including criminal investigations, border control systems, and mobile device authentication. A critical step in fingerprint recognition is segmentation, the process of isolating the fingerprint pattern from the background (Fig. 1). Segmentation is essential as it eliminates noise and irrelevant information, paving the way for accurate fingerprint recognition algorithms.

FIGURE 1.

An example of fingerprint segmentation. (a) a fingerprint, (b) segmentation mask, (c) segmentation mask contour overlaid on the fingerprint.


The field of fingerprint segmentation has witnessed a proliferation of techniques, transitioning from traditional methods based on handcrafted features in the spatial or frequency domain to more sophisticated approaches utilizing classifiers, genetic algorithms, clustering, and, more recently, deep learning. While these advancements have undoubtedly improved segmentation performance, they have also introduced a degree of complexity that can hinder implementation and comprehension of the underlying mechanisms driving performance gains. This study adopts the KISS (Keep It Simple and Straightforward) principle [3], [4], seeking to develop novel segmentation methods that achieve state-of-the-art performance while maintaining simplicity.

The primary contributions of this paper are as follows:

A novel segmentation method based on simple and efficient image processing steps that achieves state-of-the-art performance.
A novel segmentation method based on a simplified U-net architecture that surpasses all previous methods evaluated on a public benchmark and is able to deal with fingerprints acquired through very different technologies without requiring fine-tuning.
An open-source implementation of both methods to facilitate further research and development in this domain.

The rest of this paper is organized as follows. Section II reviews the main fingerprint segmentation methods proposed in the literature. Section III describes the two novel fingerprint segmentation methods. Section IV reports experiments aimed at evaluating the performance of the proposed methods and comparing them to the state-of-the-art on a public benchmark. Finally, Section V draws some concluding remarks.

SECTION II.

Related Works

During the last four decades, more than 100 methods for fingerprint segmentation have been published [1], [5]. The earliest methods typically split the image into blocks (e.g., $16\times 16$ pixels), extract some features from each block to classify it as foreground or background, and finally perform a few postprocessing operations. Mehtre et al. [6] look for the presence of peaks in histograms of local ridge orientations and, in [7], also consider the gray-level variance. Ratha et al. [8] use the gray-level variance as well, but compute it along the direction orthogonal to the ridge orientation. Bazen and Gerez [9] compute the local mean and variance of gray-level intensities and the coherence of local gradients; a simple perceptron is trained on these features as a foreground/background classifier. A similar technique is proposed in [10]. Shen et al. [11] and Alonso-Fernandez et al. [12] propose methods based on Gabor filter bank responses, while Wu et al. [13] choose the Harris corner response as the main feature and Gabor filter responses for postprocessing. Zhu et al. [14] start from a local orientation estimation as the main feature; a shallow neural network then detects wrongly estimated orientations and improves the segmentation. Wang et al. [15] propose a method based on Gaussian-Hermite moments. Other methods rely on fuzzy C-means clustering [16], [17], [18].
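To make the block-based scheme concrete, here is a minimal NumPy sketch of a variance-based block classifier in the spirit of the early methods above. It is not a reimplementation of any cited method: the function name `block_variance_mask`, the $16\times 16$ block size, and the variance threshold are illustrative assumptions, and no postprocessing is applied.

```python
import numpy as np


def block_variance_mask(image: np.ndarray, block: int = 16, threshold: float = 100.0) -> np.ndarray:
    """Hypothetical block-wise segmentation: a block is foreground if its
    gray-level variance exceeds `threshold` (an assumed, untuned value)."""
    img = image.astype(np.float64)
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = img[y:y + block, x:x + block]  # tiles at the right/bottom edges may be smaller
            # Ridge/valley alternation yields high variance; flat background yields low variance.
            mask[y:y + block, x:x + block] = tile.var() > threshold
    return mask


# Toy image: a high-contrast checkerboard (ridge-like) on the left, flat gray on the right.
yy, xx = np.mgrid[0:32, 0:64]
toy = np.where((yy + xx) % 2 == 0, 0, 255).astype(np.uint8)
toy[:, 32:] = 128
print(block_variance_mask(toy)[0, ::16])  # one value per block column
```

Real methods replace the single variance test with richer features and a trained classifier, but the split/classify/assign structure is the same.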

Given that fingerprints are characterized by an oriented pattern whose frequencies lie in a specific band of the Fourier spectrum, some authors propose segmentation methods that work in the frequency domain. In [19], Chikkerur et al. perform a local Fourier analysis with the goal of enhancing the fingerprint pattern, obtaining a fingerprint segmentation mask together with an estimate of local ridge orientation and frequency. Hu et al. [20] apply a Log-Gabor filter in the frequency domain and combine it with orientation reliability information. Marques and Thome [21] extract feature vectors based on the Fourier spectrum and the directional consistency of each $32\times 32$ block to train a shallow neural network. Thai et al. [5] design a segmentation approach based on the directional Hilbert transform of Butterworth bandpass filters. Thai and Gottschlich [22] propose a segmentation method based on global three-part decomposition, which decomposes a fingerprint image into cartoon, texture, and noise parts: the foreground mask is obtained from the non-zero coefficients in the texture image using morphological processing.
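The band-limited nature of ridge patterns can be illustrated with a simple frequency-domain feature: the fraction of a block's spectral energy that falls inside an assumed ridge-frequency band. This is a hypothetical sketch, not any of the cited methods; the band limits (ridge periods of roughly 5 to 15 pixels at 500 dpi) and the function name are assumptions.

```python
import numpy as np


def ridge_band_energy_ratio(block: np.ndarray, min_period: float = 5.0, max_period: float = 15.0) -> float:
    """Fraction of the (mean-removed) power spectrum of `block` whose radial
    frequency lies in [1/max_period, 1/min_period] cycles per pixel.
    The default periods are illustrative assumptions, not tuned values."""
    b = block.astype(np.float64)
    b = b - b.mean()  # remove the DC component so overall brightness does not dominate
    power = np.abs(np.fft.fft2(b)) ** 2
    fy = np.fft.fftfreq(b.shape[0])[:, None]  # cycles per pixel along y
    fx = np.fft.fftfreq(b.shape[1])[None, :]  # cycles per pixel along x
    radius = np.sqrt(fx ** 2 + fy ** 2)
    in_band = (radius >= 1.0 / max_period) & (radius <= 1.0 / min_period)
    total = power.sum()
    return float(power[in_band].sum() / total) if total > 0 else 0.0


x = np.arange(32)
ridges = np.tile(128 + 50 * np.cos(2 * np.pi * x / 8), (32, 1))  # period 8 px: inside the band
pixel_noise = np.tile(128 + 50 * np.cos(np.pi * x), (32, 1))     # period 2 px: outside the band
print(ridge_band_energy_ratio(ridges), ridge_band_energy_ratio(pixel_noise))
```

A block dominated by ridge-like structure scores close to one, while flat or pixel-level noise scores close to zero, which is the intuition behind frequency-domain segmentation.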

More recent methods are mostly based on deep learning. Dai et al. [23] first apply the total variation model to decompose the fingerprint image into cartoon and texture components; then the texture component is divided into overlapping blocks, each of which is classified as foreground or background by a convolutional neural network; the final segmentation mask is obtained after a morphology-based postprocessing. Another block-based deep learning method is described in [24], where Serafim et al. experiment with two well-known convolutional neural network architectures for block classification: LeNet [25] and AlexNet [26]; as in the previous method, some postprocessing steps produce the final segmentation mask. An end-to-end fingerprint segmentation method, trained on full-sized images, is described in [27], where Joshi et al. introduce a recurrent U-Net with dropout, called DRUnet, and compare it to four existing network architectures: a Conditional Generative Adversarial Network [28], U-Net [29], a convolutional neural network with a criss-cross attention module [30], and Recurrent U-Net [31].

This section has not considered latent fingerprint segmentation methods, since this paper focuses on plain fingerprints [1]. Segmentation of latent fingerprints requires specifically designed approaches that are outside the aims of this study: interested readers may refer to [32], [33], and [34] and the references therein.

SECTION III.

Proposed Methods

Given a grayscale image $\mathbf {F}$ containing a fingerprint, a segmentation method must classify each pixel in $\mathbf {F}$ as foreground or background: the result is a binary image $\mathbf {S}$ with the same size as $\mathbf {F}$ (Fig. 1).

The following sections describe the two fingerprint segmentation methods proposed in this paper. Both methods expect fingerprint images with a resolution of 500 dpi, which is the typical resolution of most fingerprint scanners [1].

A. GMFS

GMFS (Gradient-Magnitude Fingerprint Segmentation) is the result of a research effort aimed at designing a fingerprint segmentation method based on traditional image processing techniques and inspired by the KISS principle. In particular, during its development, the author’s goal was to obtain a method that:

uses a small number of simple features (ideally only one),
consists of a short sequence of well-known and efficient image processing steps, and
requires a small number of parameters to be configured.

Fig. 2 shows a functional schema of GMFS and Fig. 3 an example with all the intermediate processing steps. Feature extraction simply consists of gradient magnitude estimation for each pixel, and is performed as follows. Let $\frac{\partial \mathbf{F}}{\partial x}$ and $\frac{\partial \mathbf{F}}{\partial y}$ be the two images which, at each pixel, contain the horizontal and vertical derivative approximations of the input fingerprint $\mathbf{F}$. They can be computed by convolution with the Sobel filters $\mathbf{S}_{x}$ and $\mathbf{S}_{y}$ [35]:

$\frac{\partial \mathbf{F}}{\partial x}=\mathbf{F}\ast\mathbf{S}_{x}, \quad \frac{\partial \mathbf{F}}{\partial y}=\mathbf{F}\ast\mathbf{S}_{y}$

The magnitude of the gradient at each pixel is

$\mathbf{M}=\sqrt{\left(\frac{\partial \mathbf{F}}{\partial x}\right)^{2}+\left(\frac{\partial \mathbf{F}}{\partial y}\right)^{2}}$

The gradient magnitude is typically high at the transition points between ridges and valleys (see Fig. 3.b).
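The feature-extraction step can be sketched in NumPy as follows. This is a minimal illustration, not the published implementation (which uses OpenCV [48]): `sobel_gradient_magnitude` is a hypothetical name, the filters are applied as cross-correlation (which only flips the sign of the derivatives and leaves $\mathbf{M}$ unchanged), and borders are handled by reflection, an assumed choice.

```python
import numpy as np

# 3x3 Sobel kernels for the horizontal (x) and vertical (y) derivatives.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter3x3(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Cross-correlate `image` with a 3x3 kernel, reflecting at the borders."""
    img = image.astype(np.float64)
    padded = np.pad(img, 1, mode="reflect")
    h, w = img.shape
    out = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def sobel_gradient_magnitude(fingerprint: np.ndarray) -> np.ndarray:
    """Per-pixel gradient magnitude M = sqrt((dF/dx)^2 + (dF/dy)^2)."""
    gx = filter3x3(fingerprint, SOBEL_X)
    gy = filter3x3(fingerprint, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)


# A vertical step edge: the magnitude is non-zero only on the two columns around the edge.
step = np.zeros((5, 6))
step[:, 3:] = 1.0
print(sobel_gradient_magnitude(step))
```

On a fingerprint, every ridge/valley transition behaves like such a step, which is why $\mathbf{M}$ is high inside the fingerprint area and low on a uniform background.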

FIGURE 2.

A visual summary of the proposed GMFS method.



FIGURE 3.

An example of fingerprint segmentation using GMFS: (a) Fingerprint image $\mathbf {F}$ , (b) Gradient magnitude $\mathbf {M}$ , (c) Averaged gradient magnitude $\bar {\mathbf {M}}$ , (d) Initial segmentation mask $\mathbf {S}_{\mathbf {t}}$ , (e)-(h) Results of the first four postprocessing steps with the previous mask superimposed in semitransparency to highlight changes, (i) Final segmentation mask, and (j) Ground truth.


Starting from this feature, a threshold $t$ is computed as $t=percentile\left(\mathbf{M},95\right)\cdot\tau$, where $percentile\left(\mathbf{M},95\right)$ is the $95^{th}$ percentile of matrix $\mathbf{M}$ and $\tau$ is a parameter of the method (see Section IV-B). Using the $95^{th}$ percentile instead of the maximum value in $\mathbf{M}$ mitigates the impact of outliers, which are often present due to noise in the image.
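The effect of using the $95^{th}$ percentile rather than the maximum can be checked on a toy array containing a single outlier. In this sketch the function name and the value of `tau` are illustrative only, and NumPy's default (linear) percentile interpolation is assumed.

```python
import numpy as np


def gmfs_threshold(magnitude: np.ndarray, tau: float) -> float:
    """t = percentile(M, 95) * tau, as described in the text."""
    return float(np.percentile(magnitude, 95) * tau)


# 100 "typical" gradient magnitudes plus one noise outlier.
magnitude = np.append(np.arange(100, dtype=np.float64), 1000.0)
tau = 0.1  # illustrative value, not one of the tuned parameters
t_percentile = gmfs_threshold(magnitude, tau)
t_max = float(magnitude.max() * tau)  # what a max-based rule would give
print(t_percentile, t_max)
```

Here the outlier pushes a max-based threshold from 9.9 to 100, while the percentile-based one only moves from about 9.4 to 9.5.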

The gradient magnitude is then averaged by convolving $\mathbf{M}$ with a Gaussian filter $\mathbf{G}_{\sigma}$ of size $g_{s}\times g_{s}$: $\bar{\mathbf{M}}=\mathbf{M}\ast\mathbf{G}_{\sigma}$. Filter $\mathbf{G}_{\sigma}$ is obtained by discretizing the 2D Gaussian function $G_{2D}\left(x,y\right)=\frac{1}{2\pi\sigma^{2}}e^{-\frac{x^{2}+y^{2}}{2\sigma^{2}}}$ over a $g_{s}\times g_{s}$ grid and normalizing it by dividing each element by the sum of all the filter values. $\sigma$ is a parameter of the method (see Section IV-B), while the filter size is set to $g_{s}=\left\lceil 3\cdot\sigma\right\rceil\cdot 2+1$, so that the filter contains most of the Gaussian values according to the three-sigma rule. This smoothing step is very important: it reduces the effect of noise and "fills" most of the inner regions of ridges and valleys, where the gradient magnitude is low (see Fig. 3.c).
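A NumPy sketch of the smoothing step follows. `gaussian_kernel_2d` builds $\mathbf{G}_{\sigma}$ as described (size $\lceil 3\sigma\rceil\cdot 2+1$, normalized to unit sum), while `smooth` applies it as two 1D passes, which is mathematically equivalent because the normalized 2D Gaussian is the outer product of two normalized 1D Gaussians. Function names and the reflective border handling are assumptions for illustration.

```python
import math

import numpy as np


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalized 1D Gaussian sampled on g_s = ceil(3*sigma)*2 + 1 points."""
    radius = math.ceil(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    return g / g.sum()


def gaussian_kernel_2d(sigma: float) -> np.ndarray:
    """G_sigma as in the text: the 2D Gaussian sampled on a g_s x g_s grid and
    divided by the sum of its values (the 1/(2*pi*sigma^2) factor cancels)."""
    radius = math.ceil(3 * sigma)
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(np.float64)
    g = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return g / g.sum()


def smooth(magnitude: np.ndarray, sigma: float) -> np.ndarray:
    """M_bar = M * G_sigma, computed as a row pass followed by a column pass."""
    g = gaussian_kernel_1d(sigma)
    r = len(g) // 2
    padded = np.pad(magnitude.astype(np.float64), r, mode="reflect")
    # The kernel is symmetric, so 'valid' convolution equals correlation here.
    rows = np.apply_along_axis(lambda v: np.convolve(v, g, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, g, mode="valid"), 0, rows)


sigma = 2.0
print(gaussian_kernel_2d(sigma).shape)  # (13, 13), since ceil(6) * 2 + 1 = 13
```

The separable form costs $O(g_s)$ instead of $O(g_s^2)$ operations per pixel, which matters for the larger values of $\sigma$.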

The initial segmentation mask $\mathbf{S}_{t}$ is obtained by a simple thresholding operation on the averaged gradient magnitude:

$\begin{align*} \mathbf{S}_{t}=\left[S_{i,j}\right], \text{ with } S_{i,j}=\begin{cases} \text{foreground} & \text{if } \bar{M}_{i,j}>t\\ \text{background} & \text{otherwise}\end{cases}\end{align*}$

Note that threshold $t$ is computed from the non-averaged gradient magnitude $\mathbf{M}$, while thresholding is performed on the averaged gradient magnitude $\bar{\mathbf{M}}$.
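The key detail of this step, namely that $t$ comes from $\mathbf{M}$ while the comparison uses $\bar{\mathbf{M}}$, fits in a few lines. The sketch assumes both arrays have already been computed; the function name and the toy values are illustrative.

```python
import numpy as np


def initial_mask(magnitude: np.ndarray, smoothed: np.ndarray, tau: float) -> np.ndarray:
    """S_t: True (foreground) where the averaged magnitude exceeds t.
    Note that t is derived from the non-averaged magnitude M."""
    t = np.percentile(magnitude, 95) * tau
    return smoothed > t


M = np.arange(100, dtype=np.float64).reshape(10, 10)  # toy gradient magnitude
M_bar = np.full((10, 10), 5.0)                         # toy averaged magnitude
print(initial_mask(M, M_bar, 0.05).all(), initial_mask(M, M_bar, 0.06).any())
```

With these toy values $t$ is about 4.70 for $\tau=0.05$ and about 5.64 for $\tau=0.06$, so the constant $\bar{\mathbf{M}}=5$ is entirely foreground in the first case and entirely background in the second.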

The final segmentation mask $\mathbf {S}$ is obtained from $\mathbf {S}_{t}$ after the following postprocessing steps:

Morphological closing [35] (dilation followed by erosion) with a simple $3\times 3$ disc-shaped structuring element (each erosion and dilation step is repeated $n_{c}$ times, where $n_{c}$ is a parameter of the method).
If the foreground contains more than one connected component, only the largest one is considered; the connected component labeling algorithm [35] can be used to perform this step efficiently.
Filling any holes, provided they are not adjacent to an image border; this step can be efficiently carried out by applying the connected components labeling algorithm to the background.
Morphological opening [35] (erosion followed by dilation) with the same structuring element used at step 1 (each dilation and erosion step is repeated $n_{o}$ times, where $n_{o}$ is a parameter of the method).
If the previous step creates more than one connected component, step 2 is executed again.
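The five postprocessing steps can be sketched with plain NumPy and a breadth-first connected-component labeling. This is an illustrative re-implementation, not the authors' OpenCV-based code: the $3\times 3$ disc is assumed to be the $3\times 3$ cross, connectivity is assumed to be 4-neighbor, pixels outside the image are assumed never to shrink or grow the mask, and all function names are hypothetical. The pure-Python labeling is slow on full-size images and only shows the logic.

```python
from collections import deque

import numpy as np

# Assumed 3x3 "disc": the cross-shaped structuring element.
CROSS = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=bool)


def dilate(mask: np.ndarray, iterations: int) -> np.ndarray:
    h, w = mask.shape
    for _ in range(iterations):
        padded = np.pad(mask, 1, constant_values=False)  # outside pixels never add foreground
        out = np.zeros_like(mask)
        for dy, dx in zip(*np.nonzero(CROSS)):
            out |= padded[dy:dy + h, dx:dx + w]
        mask = out
    return mask


def erode(mask: np.ndarray, iterations: int) -> np.ndarray:
    h, w = mask.shape
    for _ in range(iterations):
        padded = np.pad(mask, 1, constant_values=True)  # outside pixels never remove foreground
        out = np.ones_like(mask)
        for dy, dx in zip(*np.nonzero(CROSS)):
            out &= padded[dy:dy + h, dx:dx + w]
        mask = out
    return mask


def label(mask: np.ndarray):
    """4-connected component labeling by breadth-first search."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int64)
    count = 0
    for i, j in zip(*np.nonzero(mask)):
        if labels[i, j]:
            continue
        count += 1
        labels[i, j] = count
        queue = deque([(i, j)])
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = count
                    queue.append((ny, nx))
    return labels, count


def keep_largest(mask: np.ndarray) -> np.ndarray:
    labels, count = label(mask)
    if count <= 1:
        return mask
    sizes = np.bincount(labels.ravel())[1:]  # skip label 0 (background)
    return labels == (np.argmax(sizes) + 1)


def fill_holes(mask: np.ndarray) -> np.ndarray:
    """Fill background components that do not touch the image border."""
    labels, _ = label(~mask)
    border = np.unique(np.concatenate([labels[0], labels[-1], labels[:, 0], labels[:, -1]]))
    holes = (labels > 0) & ~np.isin(labels, border)
    return mask | holes


def gmfs_postprocess(s_t: np.ndarray, n_c: int, n_o: int) -> np.ndarray:
    s = erode(dilate(s_t, n_c), n_c)  # 1. closing
    s = keep_largest(s)               # 2. largest connected component
    s = fill_holes(s)                 # 3. hole filling
    s = dilate(erode(s, n_o), n_o)    # 4. opening
    return keep_largest(s)            # 5. step 2 again (no-op if one component remains)
```

Step 5 is implemented by simply reapplying `keep_largest`, which leaves the mask unchanged when the opening has not split the foreground.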

Fig. 3.e-h illustrate the effects of the first four postprocessing steps: the first step partially fills the holes caused by noise (e), the second one removes the noise artifact at the bottom-left corner of the image (f), the third one fills the remaining holes (g), and the fourth postprocessing step removes the protrusion at the top-left of the fingerprint (h).

GMFS satisfies the goals stated at the beginning of this section:

it is based on a single feature (the gradient magnitude),
it consists of a sequence of well-known image processing operations that can be efficiently computed (convolution with Sobel and Gaussian filters, thresholding, morphological operations, and connected component labeling),
only four parameters regulate it: $\tau$, $\sigma$, $n_{c}$, and $n_{o}$ (see Section IV-B).

B. SUFS

SUFS (Simplified U-net Fingerprint Segmentation) is a fingerprint segmentation method based on deep learning, with the following general characteristics:

It uses a single network for the end-to-end segmentation task (from fingerprint $\mathbf {F}$ to segmentation mask $\mathbf {S}$ ), except for straightforward preprocessing and postprocessing operations.
The network architecture is inspired by U-Net [29], but with some modifications that make the network simpler, more symmetrical, and more suitable for the task of segmenting fingerprints (Table 1).
Thanks to the use of standard layers and a simple loss function, the network can be easily implemented in any deep learning framework.
Network training is carried out with a simple procedure using basic augmentation techniques on the learning data.

TABLE 1 Main Differences Between SUFS Network and U-Net [29]

The network, like the traditional U-Net, has an encoding path and a decoding path. Both paths consist of six levels: at each level there is an encoder (decoder) block with the same structure.

An encoder block (Fig. 4) takes $\frac {f}{2}$ feature maps of size $d\times d$ as input (except for the first encoder, which takes a single-channel image: the input fingerprint), then applies a $3\times 3$ convolution with padding and ReLU activation, batch normalization, and a $2\times 2$ max pooling operation for downsampling. An encoder block produces two types of outputs: $f$ downsampled feature maps of size $\frac {d}{2}\times \frac {d}{2}$ for the next level in the encoding path, and $f$ feature maps at the original $d\times d$ size to be provided, through a skip connection, to the decoder block operating at the same resolution.

FIGURE 4.

SUFS: the two building blocks of the network with their inputs, outputs, and intermediate layers.


A decoder block (Fig. 4) takes $4\cdot f$ feature maps of size $\frac {d}{2}\times \frac {d}{2}$ as input (except for the first decoder block, which takes $f$ feature maps from the last encoder block), then applies a $3\times 3$ convolution with padding and ReLU activation, batch normalization, and a $2\times 2$ upsampling layer to produce $f$ upsampled feature maps of size $d\times d$ , which are concatenated to the same number of feature maps from the input skip connection, resulting in a final output of $2\cdot f$ feature maps of size $d\times d$ .

Fig. 5 provides a visual summary of the whole SUFS method. The preprocessing step adjusts the image size by adding or cropping borders. It is important to emphasize that the image is not resized, as doing so would alter its resolution. The network expects a $512\times 512$ grayscale image as input: at the first level of the encoding path, 16 downsampled feature maps are extracted; each subsequent encoding level doubles the number of feature maps while halving their size. The output of the last encoding block consists of 512 feature maps of size $8\times 8$, which are provided as input to the first decoding block, together with 512 feature maps at the previous size through the skip connection. The first decoding level produces 1024 feature maps of size $16\times 16$, and each subsequent decoding level halves the number of feature maps while doubling their size. The last decoding level produces 32 feature maps at the original $512\times 512$ size, which are converted into a single feature map by a $3\times 3$ convolution with padding and sigmoid activation. A final postprocessing step converts the network output to a binary image using 0.5 as a threshold and adjusts the image size by adding or cropping borders to obtain a segmentation map $\mathbf{S}$ with the same size as the input fingerprint $\mathbf{F}$.
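The pre- and postprocessing around the network can be sketched as a center pad-or-crop to $512\times 512$ and its inverse. The text only states that borders are added or cropped; centering, the white (255) padding value, and the function names are assumptions made for this illustration. Because the same offsets are used in both directions, the inverse step restores the original geometry exactly.

```python
import numpy as np


def fit_to_size(image: np.ndarray, height: int, width: int, fill) -> np.ndarray:
    """Center-pad and/or center-crop `image` to (height, width) without resizing."""
    out = np.full((height, width), fill, dtype=image.dtype)

    def span(n: int, t: int):
        # (offset in output, offset in input, length to copy) along one axis.
        return ((t - n) // 2, 0, n) if n <= t else (0, (n - t) // 2, t)

    oy, iy, ly = span(image.shape[0], height)
    ox, ix, lx = span(image.shape[1], width)
    out[oy:oy + ly, ox:ox + lx] = image[iy:iy + ly, ix:ix + lx]
    return out


def preprocess(fingerprint: np.ndarray) -> np.ndarray:
    # Assumption: added borders are filled with white (255), i.e. background.
    return fit_to_size(fingerprint, 512, 512, 255)


def postprocess(probabilities: np.ndarray, original_shape) -> np.ndarray:
    mask = probabilities > 0.5  # binarize the sigmoid output with threshold 0.5
    return fit_to_size(mask, original_shape[0], original_shape[1], False)
```

For example, a hypothetical $300\times 400$ image is padded to $512\times 512$ before inference, and the network output is cropped back to $300\times 400$ with the same offsets.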

FIGURE 5.

A visual summary of SUFS, including preprocessing, network architecture, and postprocessing.


The network architecture exhibits some symmetries. For instance, at each level, both the encoder block and its corresponding decoder block share identical $f$ and $d$ parameters. Additionally, the feature maps passing through the skip connections span from $512\times 512\times 16$ to $16\times 16\times 512$ .
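The shape bookkeeping described above, including these symmetries, can be checked with a few lines of plain Python that trace tensor sizes through the six encoder and decoder levels. No deep learning framework is used and no layers are built; the function name is hypothetical, and the sketch only encodes the rules stated in the text (convolutions with padding keep the size, $2\times 2$ pooling halves it, $2\times 2$ upsampling doubles it).

```python
def trace_sufs_shapes(size: int = 512, levels: int = 6, base_filters: int = 16):
    """Return (encoder outputs, skip tensors, decoder outputs) as (h, w, channels)."""
    encoder, skips = [], []
    d, channels = size, 1  # the input is a single-channel fingerprint
    for level in range(levels):
        f = base_filters * 2 ** level
        # Encoder: f/2 maps in (1 for the first block); 3x3 conv + BN gives f maps at d x d ...
        assert channels == (1 if level == 0 else f // 2)
        skips.append((d, d, f))  # ... which also go to the decoder via the skip connection
        d //= 2                  # 2x2 max pooling
        channels = f
        encoder.append((d, d, f))

    decoder = []
    for level in reversed(range(levels)):
        f = base_filters * 2 ** level
        # Decoder: f maps in for the first block, 4f maps in for the others.
        assert channels == (f if level == levels - 1 else 4 * f)
        d *= 2  # 3x3 conv to f maps, then 2x2 upsampling
        assert skips[level] == (d, d, f)  # same f and d as the matching encoder block
        channels = 2 * f                  # concatenation with the f skip maps
        decoder.append((d, d, channels))
    return encoder, skips, decoder


enc, skip, dec = trace_sufs_shapes()
print(enc[-1], dec[0], dec[-1])  # a final 3x3 conv + sigmoid maps dec[-1] to one channel
```

The assertions inside the loops encode the stated input sizes of each block, so the trace fails loudly if any of the described channel counts were inconsistent.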

The above network architecture was selected as the most promising among some possible alternatives that were examined during a round of preliminary experiments on separate datasets (see section IV-C). In particular, the following options were considered:

the standard U-net architecture and some of its variants,
a greater or smaller number of feature maps $f$ in each level,
a greater or smaller number of levels,
transposed convolution instead of normal convolution for the decoder block,
transposed convolution with stride two in the decoder block, instead of the upsampling layer.

To train the network, a loss function based on the Tversky index [36] was chosen:

$\begin{align*} loss\left(\hat{y},y\right)=1-\frac{1+\hat{y}\cdot y}{1+\hat{y}\cdot y+\alpha\left(1-y\right)\hat{y}+\left(1-\alpha\right)y\left(1-\hat{y}\right)} \tag{1}\end{align*}$

where $\hat{y}$ is the true value, $y$ is the predicted outcome, $\alpha$ is a parameter that weights false negatives, and one is added to the numerator and the denominator so that $loss$ remains defined in edge cases (e.g., when neither the ground truth nor the prediction contains foreground pixels). During the preliminary experiments, other loss functions were considered [37], including the Binary Cross-Entropy, the Focal Loss, the Dice Loss, and the Focal Tversky Loss. The loss function (1) was chosen as the most promising according to the results obtained, with $\alpha=0.7$, the same value suggested in [38]. Network training is carried out with a batch size of 16, using the Adam optimizer [39] with a learning rate of $10^{-3}$, which is progressively reduced to $10^{-5}$. Additionally, simple data augmentation techniques are employed to artificially expand the training data: horizontal flips, small translations and rotations, and variations in contrast and scale.
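Equation (1) can be written in a few lines of NumPy. This sketch assumes that the products in (1) are summed over all pixels of the mask (the usual reading of the Tversky index) and follows the text in using $\hat{y}$ for the ground truth and $y$ for the prediction. It is framework-agnostic; a training implementation would express the same operations on the tensors of the chosen deep learning library.

```python
import numpy as np


def tversky_loss(y_true: np.ndarray, y_pred: np.ndarray, alpha: float = 0.7) -> float:
    """Loss of Eq. (1); y_true is the ground truth (y-hat), y_pred the prediction (y)."""
    t = y_true.astype(np.float64).ravel()
    p = y_pred.astype(np.float64).ravel()
    tp = np.sum(t * p)          # y-hat . y
    fn = np.sum((1.0 - p) * t)  # (1 - y) y-hat: foreground predicted as background
    fp = np.sum(p * (1.0 - t))  # y (1 - y-hat): background predicted as foreground
    return float(1.0 - (1.0 + tp) / (1.0 + tp + alpha * fn + (1.0 - alpha) * fp))


truth = np.zeros(20)
truth[:10] = 1.0
print(tversky_loss(truth, truth))         # perfect prediction
print(tversky_loss(truth, np.zeros(20)))  # 10 false negatives
print(tversky_loss(np.zeros(20), truth))  # 10 false positives
```

With $\alpha=0.7$, ten missed foreground pixels cost 0.875 while ten spurious ones cost 0.75, which is what weighting false negatives more heavily means in practice.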

Fig. 6 shows an example of fingerprint segmentation with SUFS, from the network input to the corresponding output. For each encoder and decoder block, one of the feature maps is shown. Note that the size of the feature maps varies from $512\times 512$ to $8\times 8$, although they are all resized to the same dimensions for visualization purposes. Along the encoding path, the feature maps evolve from capturing fine-grained details and local patterns to representing higher-level contextual information and global structures. This process allows the network to extract progressively more abstract and meaningful features from the input fingerprint. In contrast, the decoding path reverses this trend, transforming feature maps from abstract spatial representations back into precise pixel-level segmentation masks.

FIGURE 6.

An example of input, feature maps, and output of the SUFS network. One of the feature maps for each level is shown.


SECTION IV.

Experimental Results

A. Benchmark and Metrics

The publicly available fingerprint databases from the first three Fingerprint Verification Competitions (FVC2000 [40], FVC2002 [41], and FVC2004 [42]) are an established benchmark for fingerprint comparison algorithms [2], [43]. Thanks to the work of Thai, Huckemann, and Gottschlich, who made a manually marked segmentation ground truth for all those fingerprints publicly available [5], these databases are also a suitable benchmark for fingerprint segmentation and have already been adopted by recently published works in the field. Table 2 reports some general information on the benchmark databases; there are four databases for each competition: three are acquired from real fingers and the fourth one is generated using SFinGe [44], [45], a synthetic fingerprint generation method. Fig. 7 shows a sample fingerprint from each database, together with the corresponding segmentation ground truth. The benchmark covers a wide range of acquisition technologies and image sizes. As can be seen in Fig. 7, fingerprints from the various databases are highly heterogeneous with respect to their size, appearance, contrast, noise type, etc., which makes the benchmark quite challenging. The image resolution is 500 dpi for all databases except two: FVC2002 DB2 and FVC2004 DB3. Since GMFS and SUFS are designed to work with 500 dpi images, the fingerprints of these two databases are resized to bring their resolution to 500 dpi before being provided as input to the two proposed methods. The resulting segmentation masks are then resized back to the original resolution before being compared to the ground truth.

TABLE 2 The Twelve Databases That Comprise the Benchmark (Four for Each Competition)

FIGURE 7.

A fingerprint image for each database in FVC2000 [40] (a)-(d), FVC2002 [41] (e)-(h), and FVC2004 [42] (i)-(l): all images are at the same scale factor and with the contour of the corresponding segmentation mask provided by Thai, Huckemann, and Gottschlich [5].


Each database is divided into two sets: A and B. Each set A contains 800 fingerprints from 100 different fingers (eight impressions per finger), and each set B contains 80 fingerprints from 10 different fingers. Following the same approach as previous papers, sets A are reserved for testing, while sets B are used for parameter tuning, model training, and validation.

The accuracy of a segmentation mask $\mathbf {S}$ with respect to the corresponding ground truth $\mathbf {S}_{GT}$ is evaluated in terms of:

True positives (TP) – the number of foreground pixels in $\mathbf {S}$ that are foreground in $\mathbf {S}_{GT}$ .
True negatives (TN) – the number of background pixels in $\mathbf {S}$ that are background in $\mathbf {S}_{GT}$ .
False positives (FP) – the number of foreground pixels in $\mathbf {S}$ that are background in $\mathbf {S}_{GT}$ .
False negatives (FN) – the number of background pixels in $\mathbf {S}$ that are foreground in $\mathbf {S}_{GT}$ .

Three evaluation metrics are used in this paper: the classification error rate ER (2) is the percentage of incorrectly-classified pixels with respect to the total number of pixels in the image, the Dice coefficient DC (3) is a standard metric to quantify segmentation performance [46], and the Jaccard similarity coefficient JC (4) is a measure of similarity between finite sample sets [47], also referred to as intersection-over-union.

$\begin{align*} \mathrm {ER}&=\frac {\mathrm {FP+FN}}{\mathrm {TP+TN+FP+FN}} \tag{2}\\ \mathrm {DC}&=\frac {\mathrm {2\cdot TP}}{\mathrm {2\cdot TP+FP+FN}} \tag{3}\\ \mathrm {JC}&=\frac {\mathrm {TP}}{\mathrm {TP+FP+FN}} \tag{4}\end{align*}$
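The three metrics follow directly from the four pixel counts. The minimal NumPy sketch below (function name hypothetical) computes them for a single mask; per-database figures such as those in Tables 5 to 7 would then be averages of these values over the fingerprints of each set A.

```python
import numpy as np


def segmentation_metrics(S: np.ndarray, S_gt: np.ndarray) -> dict:
    """ER, DC and JC of Eqs. (2)-(4) for binary masks (True = foreground)."""
    S, S_gt = S.astype(bool), S_gt.astype(bool)
    tp = int(np.sum(S & S_gt))
    tn = int(np.sum(~S & ~S_gt))
    fp = int(np.sum(S & ~S_gt))
    fn = int(np.sum(~S & S_gt))
    return {
        "ER": (fp + fn) / (tp + tn + fp + fn),  # multiply by 100 for a percentage
        "DC": 2 * tp / (2 * tp + fp + fn),      # undefined if both masks are empty
        "JC": tp / (tp + fp + fn),              # intersection-over-union
    }


S = np.array([[1, 1], [0, 0]])
S_gt = np.array([[1, 0], [0, 0]])
print(segmentation_metrics(S, S_gt))
```

DC and JC carry the same information, since $\mathrm{DC}=2\cdot\mathrm{JC}/(1+\mathrm{JC})$; this also explains why DC is never lower than JC.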


B. GMFS Parameter Selection

GMFS is controlled by four parameters (see section III-A):

$\tau$ and $\sigma$ are related to the thresholding operation and should be tuned according to specific characteristics of the fingerprint acquisition sensor, such as background and image contrast.
$n_{c}$ and $n_{o}$ are related to the postprocessing steps and should be tuned according to the typical amount of noise present in the fingerprints.

In these experiments, the values of the above parameters are chosen separately for each database, using its set B. The rationale for per-database tuning is that the twelve databases have been acquired (generated) with twelve different sensors (generation settings) and their images have specific properties (see Fig. 7 and Table 2). Table 3 reports the parameter values used: for each database, the values that minimize the average $ER$ over the corresponding set B have been chosen among a set of reasonable combinations of values.
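The selection procedure amounts to an exhaustive search over a small grid. The sketch below is generic: `segment` stands for any segmentation function that takes the parameters as keyword arguments (for instance, a GMFS implementation), and the grid contents are placeholders, since the actual candidate values are not listed in the text.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

import numpy as np


def error_rate(S: np.ndarray, S_gt: np.ndarray) -> float:
    """ER of Eq. (2): fraction of misclassified pixels."""
    return float(np.mean(S.astype(bool) != S_gt.astype(bool)))


def select_parameters(
    images: Sequence[np.ndarray],
    ground_truths: Sequence[np.ndarray],
    segment: Callable[..., np.ndarray],
    grid: Dict[str, List],
) -> Tuple[float, Dict]:
    """Return (lowest average ER, parameters) over all combinations in `grid`."""
    best = None
    names = list(grid)
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        avg_er = float(np.mean([error_rate(segment(img, **params), gt)
                                for img, gt in zip(images, ground_truths)]))
        if best is None or avg_er < best[0]:
            best = (avg_er, params)
    return best


# Toy example: "segmentation" is a plain threshold with a single parameter.
img = np.arange(16).reshape(4, 4)
gt = img > 7
print(select_parameters([img], [gt], lambda im, tau: im > tau, {"tau": [3, 7, 11]}))
```

For GMFS the grid would contain the four keys $\tau$, $\sigma$, $n_c$, and $n_o$, and the search would be repeated once per database on its set B.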

TABLE 3 GMFS Parameters Chosen for Each Database

It is worth noting that the smallest values of parameter $\tau$ (0.02 and 0.03) are chosen for the databases where the background tends to be uniform and less noisy (FVC2004 DB1 and FVC2002 DB1, see Fig. 7.i and Fig. 7.e).

C. SUFS Model, Hyperparameters and Training

Unfortunately, the size of each set B (80 fingerprints from 10 fingers) is too small to train a deep neural network, so the only reasonable option is to train a single network using all the fingerprints from the B sets (960 in total). For SUFS preliminary experiments and training, these fingerprints are split into the following sets:

B1 – 96 fingerprints (the eight fingerprints of the finger with index 101 from all B sets),
B2 – 96 fingerprints (the eight fingerprints of the finger with index 102 from all B sets),
B3 – 768 fingerprints (the eight fingerprints of the fingers with indices 103–110 from all B sets).
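The finger-based split can be expressed as a simple rule on file names. This sketch assumes FVC-style names of the form `<finger>_<impression>.tif` (e.g., `101_1.tif`) stored in one folder per database; the folder layout and the function name are hypothetical.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List


def split_by_finger(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Assign B-set images to B1 (finger 101), B2 (finger 102) or B3 (fingers 103-110)."""
    splits = defaultdict(list)
    for path in paths:
        finger = int(os.path.basename(path).split("_")[0])
        if finger == 101:
            splits["B1"].append(path)
        elif finger == 102:
            splits["B2"].append(path)
        elif 103 <= finger <= 110:
            splits["B3"].append(path)
    return dict(splits)


# 12 databases x 10 fingers x 8 impressions = 960 images.
paths = [f"db{d}/{finger}_{imp}.tif" for d in range(12)
         for finger in range(101, 111) for imp in range(1, 9)]
print({name: len(files) for name, files in split_by_finger(paths).items()})
```

Splitting by finger rather than by image keeps all impressions of a finger in the same subset, so validation and test fingers are never seen during training.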

During the preliminary experiments to evaluate various options for the network architecture, and to choose the loss function and the hyperparameters, B3 is used as a training set, B1 as a validation set, and B2 as a test set to choose the most promising configuration.

The final training, with the network architecture described in section III-B, is carried out on B2 + B3, using B1 as a validation set.

D. Results

GMFS is implemented in Python using the OpenCV library [48]. The time required to segment a fingerprint depends on the image size: on a PC with an Intel® Xeon® Silver 4112 CPU at 2.60GHz, the average segmentation time ranges from 7ms for FVC2000 DB4 images (the smallest ones) to 22ms for FVC2004 DB1 images (the largest).

SUFS is implemented in Python with the Keras library [49]. On a PC with an NVIDIA GeForce RTX™3080 Ti GPU, training the network requires about 25 minutes, while the average segmentation time is about 4ms, using batches of 32 fingerprints.

Fig. 8 illustrates sample successful segmentation cases using GMFS and SUFS. Both methods exhibit robustness across the large variety of background and sensor noise that characterizes the benchmark (Fig. 7).

FIGURE 8.

Examples of successful fingerprint segmentations using GMFS (a, c) and SUFS (b, d). The green contour represents the ground truth mask, while the blue contour represents the proposed segmentation.


Fig. 9 shows two examples where GMFS produces less satisfactory results than SUFS: in Fig. 9.a, GMFS falsely classifies the top portion of the fingerprint as background due to its low contrast. In Fig. 9.c, GMFS misclassifies the bottom and right portions of the fingerprint due to their poor quality (the valleys are almost impossible to locate). In both cases, SUFS achieves significantly better results.

FIGURE 9.

Examples of cases where GMFS (a, c) is less precise than SUFS (b, d). The green contour represents the ground truth mask, while the blue contour represents the proposed segmentation. The green areas correspond to false negatives.


Fig. 10 shows three cases where both methods produce unsatisfactory results. In Fig. 10.a-b, low-contrast portions of the fingerprint are misclassified as background. In Fig. 10.c-d, fingerprint-like noise (likely a ghost-fingerprint left on the sensor by a previous acquisition) is misclassified as foreground. Similarly, in Fig. 10.f, a ghost-fingerprint portion at the top tricks SUFS into producing a false-positive region; this does not happen for GMFS (Fig. 10.e) because of postprocessing steps 2 and 5 (see section III-A). On the other hand, GMFS produces more false negatives on the same fingerprint.

FIGURE 10.

Examples of failure cases using GMFS (a, c, e) and SUFS (b, d, f). The green contour represents the ground truth mask, while the blue contour represents the proposed segmentation. The green areas correspond to false negatives, while the blue areas correspond to false positives.


E. Comparison With the State-of-the-Art

The two proposed segmentation methods are compared with 18 state-of-the-art approaches. These are all the relevant approaches for which results have been reported on the FVC benchmark. Table 4 lists all the twenty methods and specifies which evaluation metrics are available for each of them. Nine methods (including SUFS) are based on deep learning, while the other eleven (including GMFS) use traditional image processing operations and handcrafted features. Four methods come from software projects available in the public domain: Mind (from the MindTCT software developed by NIST [50]), NFIQ2 (from the NFIQ2 software [51]), SAFIS (from the open source fingerprint recognition software SourceAFIS [52]), and FJ (from the open source version of the FingerJetFX software [53]). The remaining methods are among the most often cited fingerprint segmentation methods in the scientific literature [5], [9], [11], [13], [19], [22], [23], [24], [27] or are based on well-known network architectures [25], [26], [28], [29], [30], [31].

TABLE 4 A Summary of the State-of-the-Art Methods Considered and the Two New Proposed Ones

Tables 5, 6, and 7 report all available results for the ER, DC, and JC metrics, respectively. Tables 8 and 9 report the average results over all databases for traditional and deep-learning-based methods, respectively.

TABLE 5 Average ER Computed on Set a of Each Database in the Benchmark*

TABLE 6 Average DC Computed on Set a of Each Database in the Benchmark*

TABLE 7 Average JC Computed on Set a of Each Database in the Benchmark*

TABLE 8 Average Results of Traditional Methods*

TABLE 9 Average Results of Deep-Learning-Based Methods*
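The three metrics are defined earlier in the paper (not included in this excerpt). The sketch below assumes the usual pixel-wise definitions, computed on a single image: ER is the fraction of misclassified pixels, DC the Dice coefficient and JC the Jaccard coefficient of the foreground masks; the tables presumably average these per-image values over set a of each database. The function name `segmentation_metrics` is hypothetical.

```python
import numpy as np


def segmentation_metrics(pred, gt):
    """Return (ER, DC, JC) for a predicted and a ground-truth mask (True = foreground)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("masks must have the same shape")
    tp = np.count_nonzero(pred & gt)    # foreground pixels correctly detected
    fp = np.count_nonzero(pred & ~gt)   # background pixels labelled as foreground
    fn = np.count_nonzero(~pred & gt)   # foreground pixels labelled as background
    er = (fp + fn) / pred.size          # classification error rate
    if tp + fp + fn == 0:               # both masks empty: treat as perfect overlap
        return er, 1.0, 1.0
    dc = 2 * tp / (2 * tp + fp + fn)    # Dice coefficient
    jc = tp / (tp + fp + fn)            # Jaccard coefficient (intersection over union)
    return er, dc, jc
```

For example, on a 10×10 image, two 4×5 foreground rectangles offset by one column give ER = 0.08, DC = 0.8 and JC ≈ 0.667. Note that ER also depends on the amount of background in the image, whereas DC and JC only consider the foreground.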

SUFS exhibits impressive performance, surpassing the other methods on all three metrics. Specifically, it outperforms the other methods on the ER metric in all databases, on the DC metric in ten out of twelve databases, and on the JC metric in nine out of twelve databases. Additionally, it outperforms the other methods on all metrics when considering the average over the twelve databases.

However, the excellent performance of SUFS should not overshadow the remarkable performance of GMFS, which achieves the second-best performance in two databases on the ER metric and in five databases on the DC metric. It also achieves the third-best average ER and the second-best average DC over all databases. Furthermore, if the comparison is limited to the other traditional (non-deep-learning-based) methods, GMFS obtains the best result on the ER metric in seven out of twelve databases, on the DC metric in all databases, and on the JC metric in nine out of twelve databases. Finally, when considering the average metrics over the twelve databases, it outperforms every other traditional method (Table 8). A further experiment was carried out to evaluate the impact of the GMFS post-processing steps: without them, its average ER increases from 2.98% to 4.01%, confirming the importance of post-processing in GMFS.
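The relative effect of the GMFS post-processing follows directly from the two average ER values reported above (simple arithmetic on the published figures, not an additional experiment):

```latex
\frac{ER_{\text{no post}} - ER_{\text{GMFS}}}{ER_{\text{GMFS}}}
  = \frac{4.01\% - 2.98\%}{2.98\%} = \frac{1.03}{2.98} \approx 0.346,
\qquad
\frac{ER_{\text{no post}} - ER_{\text{GMFS}}}{ER_{\text{no post}}}
  = \frac{1.03}{4.01} \approx 0.257
```

In other words, removing the post-processing raises the average ER by about 35% in relative terms; equivalently, the post-processing eliminates roughly a quarter of the errors made by the raw thresholded feature.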

Although comparing performance metrics is essential, the overall complexity of the methods must also be considered to assess their practical utility. The following paragraphs compare the complexity of the two best traditional methods and of the two best deep-learning-based methods, ranked by average ER.

Among traditional methods (Table 8), GMFS (average ER 2.98%) and G3PD (average ER 3.06%) have the lowest average ER. GMFS, described in detail in Section III-A, uses only simple convolutions (with Sobel and Gaussian filters), thresholding, and a few morphological operations: with an image processing library such as OpenCV [48], it can be implemented in about 30 lines of Python code. G3PD, in contrast, requires solving an optimization problem to decompose the fingerprint into cartoon, texture, and noise components, followed by morphological operations to extract the segmentation from the texture coefficients [22]. Although its source code is not publicly available (the authors provide only an obfuscated MATLAB implementation), the description in [22] makes it reasonable to infer that the G3PD implementation is significantly more complex than that of GMFS.
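To illustrate how few operations such a pipeline needs, the following is a minimal GMFS-style sketch, not the author's implementation (which is available in the pyfing repository). It assumes that the single handcrafted feature is a Gaussian-smoothed Sobel gradient magnitude, thresholds it at a fraction of its maximum, and applies one morphological opening and one closing; the other GMFS post-processing steps of Section III-A are omitted. The function name `gmfs_like_segmentation` and all parameter values (`sigma=13`, `threshold=0.2`, a 9×9 structuring element) are hypothetical. To keep the example dependency-free it uses plain NumPy instead of OpenCV; with OpenCV the same steps correspond to `cv2.Sobel`, `cv2.GaussianBlur`, `cv2.threshold` and `cv2.morphologyEx`.

```python
import numpy as np


def _correlate1d(img, kernel, axis):
    """Correlate a 2-D array with a 1-D kernel along `axis`, replicating border pixels."""
    r = len(kernel) // 2
    pad = [(0, 0), (0, 0)]
    pad[axis] = (r, r)
    padded = np.pad(img, pad, mode="edge")
    n = img.shape[axis]
    out = np.zeros(img.shape, dtype=np.float64)
    for i, k in enumerate(kernel):
        window = [slice(None), slice(None)]
        window[axis] = slice(i, i + n)
        out += k * padded[tuple(window)]
    return out


def _separable(img, kx, ky):
    """Apply kernel kx along x (columns), then kernel ky along y (rows)."""
    return _correlate1d(_correlate1d(img, kx, axis=1), ky, axis=0)


def _gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    r = int(3 * sigma + 0.5)
    x = np.arange(-r, r + 1, dtype=np.float64)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()


def _box_count(mask, size):
    """Number of foreground pixels in each size x size window (size should be odd)."""
    ones = np.ones(size)
    return _separable(mask.astype(np.float64), ones, ones)


def _erode(mask, size):
    return _box_count(mask, size) > size * size - 0.5  # every pixel in the window is set


def _dilate(mask, size):
    return _box_count(mask, size) > 0.5  # at least one pixel in the window is set


def gmfs_like_segmentation(img, sigma=13.0, threshold=0.2, morph_size=9):
    """Hypothetical GMFS-style segmentation; returns a boolean mask (True = foreground)."""
    f = np.asarray(img, dtype=np.float64)
    deriv, smooth = np.array([-1.0, 0.0, 1.0]), np.array([1.0, 2.0, 1.0])
    gx = _separable(f, deriv, smooth)  # Sobel: derivative along x, smoothing along y
    gy = _separable(f, smooth, deriv)  # Sobel: derivative along y, smoothing along x
    g = _gaussian_kernel(sigma)
    feature = _separable(np.hypot(gx, gy), g, g)  # locally averaged gradient magnitude
    mask = feature > threshold * feature.max()  # global threshold (assumed relative to max)
    mask = _dilate(_erode(mask, morph_size), morph_size)  # opening: drop small blobs
    mask = _erode(_dilate(mask, morph_size), morph_size)  # closing: fill small gaps
    return mask
```

Ridges produce strong local intensity variations while a uniform background does not, which is why a single averaged gradient feature followed by a threshold already separates the two regions well; the morphological steps only clean up the mask boundary.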

Among deep-learning-based methods (Table 9), SUFS (average ER 1.51%) and PCnet (average ER 2.62%) have the lowest average ER. SUFS, detailed in Section III-B, uses an end-to-end neural network with minimal pre- and post-processing. With Keras [49], the SUFS network architecture can be implemented in about 15 lines of Python code, and the complete SUFS segmentation method, including pre- and post-processing, in about 20 lines. PCnet, in contrast, involves a multi-step process [23]: 1) decomposing the fingerprint into texture and cartoon components using a method similar to G3PD, 2) dividing the texture component into overlapping patches, 3) classifying each patch as foreground or background with a specifically trained neural network, and 4) applying morphological operations to obtain the final segmentation. Its source code is not available, but the description in [23] indicates that the PCnet implementation is substantially more complex than that of SUFS.
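The following sketch shows what "minimal pre- and post-processing" around an end-to-end U-net-like network typically looks like; it is an illustration under stated assumptions, not the actual SUFS code. It assumes a network with five pooling levels (hypothetical `depth=5`), so that the input sides must be multiples of 2^5 = 32, intensity scaling to [0, 1], and a 0.5 threshold on the output probabilities. The function name `segment_with_unet` is hypothetical, and `predict` stands in for the trained network.

```python
import numpy as np


def segment_with_unet(img, predict, depth=5, threshold=0.5):
    """Hypothetical U-net-style segmentation wrapper: pre-processing, inference, post-processing.

    `predict` stands in for the trained network: any callable that maps a float32
    array whose sides are multiples of 2**depth to a same-shaped probability map.
    """
    multiple = 2 ** depth                          # each pooling level halves the resolution
    h, w = img.shape
    pad_h, pad_w = (-h) % multiple, (-w) % multiple
    x = np.asarray(img, dtype=np.float32) / 255.0  # scale 8-bit gray levels to [0, 1]
    x = np.pad(x, ((0, pad_h), (0, pad_w)), mode="edge")
    prob = np.asarray(predict(x))
    if prob.shape != x.shape:
        raise ValueError("predict must return a map with the same shape as its input")
    return prob[:h, :w] >= threshold               # crop to the original size and binarize
```

With a trained single-channel Keras model (here a hypothetical `model`), `predict` could be `lambda x: model.predict(x[None, ..., None], verbose=0)[0, ..., 0]`. Padding to a multiple of 2^depth keeps encoder and decoder feature maps aligned for the skip connections, and cropping restores the original image size.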

SECTION V.

Conclusion

This paper introduces two novel fingerprint segmentation methods, GMFS and SUFS, inspired by the KISS principle. Both methods are evaluated on a public benchmark, achieving state-of-the-art performance while maintaining simplicity and computational efficiency.

GMFS, a straightforward method based on a single handcrafted feature, outperforms all other traditional methods with results available on the benchmark, including some considerably more complex approaches. Notably, GMFS achieves performance comparable to that of several deep-learning-based methods. Given its minimal computational requirements, GMFS is particularly well suited to applications where computing power and memory are constrained.

SUFS leverages deep learning through a simplified U-net architecture for end-to-end fingerprint segmentation and performs impressively despite being trained on a relatively limited dataset. It can effectively segment fingerprints from databases acquired with a variety of technologies, without fine-tuning or parameter adjustments specific to each image type. SUFS surpasses all existing state-of-the-art methods, achieving an average ER of 1.51% across the entire benchmark: a relative reduction of over 40% compared with the previously best-performing method.
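The "over 40%" figure can be checked from the average ER values reported in Section IV-E, assuming that the previously best-performing method is PCnet, the runner-up in Table 9 with an average ER of 2.62%:

```latex
\frac{ER_{\text{PCnet}} - ER_{\text{SUFS}}}{ER_{\text{PCnet}}}
  = \frac{2.62\% - 1.51\%}{2.62\%} = \frac{1.11}{2.62} \approx 0.424
```

That is, SUFS misclassifies about 42% fewer pixels on average than the best previously published method.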

An open-source Python implementation of both methods is available at https://github.com/raffaele-cappelli/pyfing.

ACKNOWLEDGMENT

The author is grateful to Dr. Annalisa Franco for her insightful comments and to Sara Cappelli for proofreading the manuscript.

NOTE

Open Access provided by 'Alma Mater Studiorum - Università di Bologna' within the CRUI CARE Agreement
