Introduction
Fingerprint recognition has emerged as a ubiquitous biometric technology, revolutionizing personal identification due to its reliability and accuracy [1], [2]. Its versatility has made it a cornerstone of various applications, including criminal investigations, border control systems, and mobile device authentication. A critical step in fingerprint recognition is segmentation, the process of isolating the fingerprint pattern from the background (Fig. 1). Segmentation is essential as it eliminates noise and irrelevant information, paving the way for accurate fingerprint recognition algorithms.
An example of fingerprint segmentation. (a) a fingerprint, (b) segmentation mask, (c) segmentation mask contour overlaid on the fingerprint.
The field of fingerprint segmentation has witnessed a proliferation of techniques, transitioning from traditional methods based on handcrafted features in the spatial or frequency domain, to more sophisticated approaches utilizing classifiers, genetic algorithms, clustering, and, more recently, deep learning. While these advancements have undoubtedly improved segmentation performance, they have also introduced a degree of complexity that can hinder implementation and comprehension of the underlying mechanisms driving performance gains. This study adopts the KISS (Keep It Simple and Straightforward) principle [3], [4], seeking to develop novel segmentation methods that achieve state-of-the-art performance while maintaining simplicity.
The primary contributions of this paper are as follows:
A novel segmentation method based on simple and efficient image processing steps that achieves state-of-the-art performance.
A novel segmentation method based on a simplified U-net architecture that surpasses all previous methods evaluated on a public benchmark and is able to deal with fingerprints acquired through very different technologies without requiring fine-tuning.
An open-source implementation of both methods to facilitate further research and development in this domain.
The rest of this paper is organized as follows. Section II reviews the main fingerprint segmentation methods proposed in the literature. Section III describes the two novel fingerprint segmentation methods. Section IV reports experiments aimed at evaluating the performance of the proposed methods and comparing them to the state-of-the-art on a public benchmark. Finally, Section V draws some concluding remarks.
Related Works
During the last four decades, more than 100 methods for fingerprint segmentation have been published [1], [5]. The earliest methods typically split the image into blocks and classify each block as foreground or background using handcrafted features.
Given that fingerprints are characterized by an oriented pattern with frequencies only in a specific band of the Fourier spectrum, some authors propose segmentation methods that work in the frequency domain. In [19], Chikkerur et al. perform a local Fourier analysis with the goal of enhancing the fingerprint pattern, obtaining a fingerprint segmentation mask together with an estimation of local ridge orientation and frequency. Hu et al. [20] apply a Log-Gabor filter in the frequency domain and combine it with orientation reliability information. Marques and Thome [21] extract feature vectors based on the Fourier spectrum and the directional consistency of each block.
More recent methods are mostly based on deep learning. Dai et al. [23] first apply the total variation model to decompose the fingerprint image into cartoon and texture components; then the texture component is divided into overlapping blocks, each of which is classified as foreground or background by a convolutional neural network; the final segmentation mask is obtained after a morphology-based postprocessing. Another block-based deep learning method is described in [24], where Serafim et al. experiment with two well-known convolutional neural network architectures for block classification: LeNet [25] and AlexNet [26]; as in the previous method, some postprocessing steps produce the final segmentation mask. An end-to-end fingerprint segmentation method, trained on full-sized images, is described in [27], where Joshi et al. introduce a recurrent U-Net with dropout, called DRUnet, and compare it to four existing network architectures: Conditional Generative Adversarial Network [28], U-Net [29], Convolutional neural network with criss-cross attention module [30], and Recurrent U-Net [31].
This section has not considered latent fingerprint segmentation methods, since this paper focuses on plain fingerprints [1]. Segmentation of latent fingerprints requires specifically-designed approaches that are outside the aims of this study: interested readers may refer to [32], [33], and [34] and the references therein.
Proposed Methods
Given a grayscale fingerprint image \mathbf {F}, segmentation consists in computing a binary mask \mathbf {S} that assigns each pixel either to the foreground (the fingerprint pattern) or to the background.
The following sections describe the two fingerprint segmentation methods proposed in this paper. Both methods expect fingerprint images with a resolution of 500 dpi, which is the typical resolution of most fingerprint scanners [1].
A. GMFS
GMFS (Gradient-Magnitude Fingerprint Segmentation) is the result of a research effort aimed at designing a fingerprint segmentation method based on traditional image processing techniques and inspired by the KISS principle. In particular, during its development, the author’s goal was to obtain a method that:
uses a small number of simple features (ideally only one),
consists in a short sequence of well-known and efficient image processing steps, and
requires a small number of parameters to be configured.
An example of fingerprint segmentation using GMFS: (a) Fingerprint image
Starting from this feature, a threshold
The gradient magnitude is then averaged by convolving it with a Gaussian filter of standard deviation \sigma, obtaining \bar {\mathbf {M}}.
The initial segmentation mask \mathbf {S}_{t} is obtained by thresholding \bar {\mathbf {M}}: \begin{align*} \mathbf {S}_{t}=\left [{ S_{i,j} }\right], \text { with } S_{i,j}=\begin{cases} \displaystyle foreground & \text {if } \bar {M}_{i,j}>t\\ \displaystyle background & \text {otherwise}\end{cases}\end{align*}
Note that threshold
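As an illustration, the feature-extraction and thresholding stage described above can be sketched as follows. This is a minimal sketch using SciPy for brevity (the paper's implementation relies on OpenCV), and treating the threshold t and the Gaussian standard deviation as plain arguments is an assumption, since the exact derivation of t is not reproduced here:

```python
import numpy as np
from scipy import ndimage

def gmfs_initial_mask(image, t, sigma=3.0):
    # Sketch of the GMFS feature stage: gradient magnitude via Sobel
    # filters, averaged with a Gaussian filter, then thresholded.
    # Passing t and sigma directly is an assumption for illustration.
    img = np.asarray(image, dtype=np.float64)
    gx = ndimage.sobel(img, axis=1)            # horizontal gradient
    gy = ndimage.sobel(img, axis=0)            # vertical gradient
    m = np.hypot(gx, gy)                       # gradient magnitude M
    m_avg = ndimage.gaussian_filter(m, sigma)  # averaged magnitude
    return m_avg > t                           # initial mask S_t
```

On a synthetic image with a striped (ridge-like) region and a flat background, the mask is true only where the gradient magnitude is consistently high.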
The final segmentation mask \mathbf {S} is obtained from \mathbf {S}_{t} by applying the following postprocessing steps:
1. Morphological closing [35] (dilation followed by erosion) with a simple 3\times 3 disc-shaped structuring element (each dilation and erosion step is repeated n_{c} times, where n_{c} is a parameter of the method).
2. If the foreground contains more than one connected component, only the largest one is considered; the connected component labeling algorithm [35] can be used to perform this step efficiently.
3. Filling any holes, provided they are not adjacent to an image border; this step can be efficiently carried out by applying the connected component labeling algorithm to the background.
4. Morphological opening [35] (erosion followed by dilation) with the same structuring element used at step 1 (each erosion and dilation step is repeated n_{o} times, where n_{o} is a parameter of the method).
5. If the previous step creates more than one connected component, step 2 is executed again.
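The postprocessing pipeline can be sketched with standard SciPy morphology. This is a sketch under assumed parameter values (the paper's implementation uses OpenCV, and the structuring element here is the simple cross-shaped one, which may differ from the disc used in the paper):

```python
import numpy as np
from scipy import ndimage

def keep_largest_component(mask):
    # Step 2 (and 5): keep only the largest connected foreground component.
    labels, n = ndimage.label(mask)
    if n <= 1:
        return mask
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    return labels == (np.argmax(sizes) + 1)

def gmfs_postprocess(mask, n_c=1, n_o=1):
    # Hypothetical sketch of the five postprocessing steps; n_c and n_o
    # follow the paper's notation.
    se = ndimage.generate_binary_structure(2, 1)  # small 3x3 cross element
    # Step 1: morphological closing, repeated n_c times.
    mask = ndimage.binary_closing(mask, structure=se, iterations=n_c)
    # Step 2: keep the largest connected component.
    mask = keep_largest_component(mask)
    # Step 3: fill holes (binary_fill_holes leaves background regions
    # connected to the image border untouched).
    mask = ndimage.binary_fill_holes(mask)
    # Step 4: morphological opening, repeated n_o times.
    mask = ndimage.binary_opening(mask, structure=se, iterations=n_o)
    # Step 5: if opening split the foreground, repeat step 2.
    mask = keep_largest_component(mask)
    return mask
```

Applied to a noisy initial mask (a large blob with a hole plus a small spurious blob), the pipeline returns a single hole-free component.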
GMFS satisfies the goals stated at the beginning of this section:
it is based on a single feature (the gradient magnitude),
it consists of a sequence of well-known image processing operations that can be efficiently computed (convolution with Sobel filters and Gaussian filter, thresholding, morphology operations and connected component labelling algorithm),
only four parameters regulate it:
\tau, \sigma, n_{c}, and n_{o} (see section IV-B).
B. SUFS
SUFS (Simplified U-net Fingerprint Segmentation) is a fingerprint segmentation method based on deep learning, with the following general characteristics:
It uses a single network for the end-to-end segmentation task (from fingerprint \mathbf {F} to segmentation mask \mathbf {S}), except for straightforward preprocessing and postprocessing operations.
The network architecture is inspired by U-Net [29], but with some modifications that make the network simpler, more symmetrical, and more suitable for the task of segmenting fingerprints (Table 1).
Thanks to the use of standard layers and a simple loss function, the network can be easily implemented in any deep learning framework.
Network training is carried out with a simple procedure using basic augmentation techniques on the learning data.
The network, like the traditional U-Net, has an encoding path and a decoding path. Both paths consist of six levels: at each level there is an encoder (decoder) block with the same structure.
An encoder block (Fig. 4) takes
SUFS: the two building blocks of the network with their inputs, outputs, and intermediate layers.
A decoder block (Fig. 4) takes
Fig. 5 provides a visual summary of the whole SUFS method. The preprocessing step adjusts the image size by adding or cropping borders. It is important to emphasize that the image is not resized, as doing so would alter its resolution. The network expects a
A visual summary of SUFS, including preprocessing, network architecture, and postprocessing.
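The border-based size adjustment can be sketched as follows. This is a minimal sketch under assumed values: the 512\times 512 target size and the white fill value are hypothetical, not taken from the paper; only the principle (add or crop borders, never resample) reflects the text above.

```python
import numpy as np

def pad_or_crop(image, target=(512, 512), fill=255):
    # Adjust the image size by adding or cropping borders, without
    # resampling, so the 500 dpi resolution is preserved.
    # target size and fill value are assumed, not from the paper.
    h, w = image.shape
    th, tw = target
    out = np.full((th, tw), fill, dtype=image.dtype)
    ch, cw = min(h, th), min(w, tw)
    # center-crop the source and center-place it in the output
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    Y0, X0 = (th - ch) // 2, (tw - cw) // 2
    out[Y0:Y0 + ch, X0:X0 + cw] = image[y0:y0 + ch, x0:x0 + cw]
    return out
```

A smaller image is centered on a filled canvas; a larger one is center-cropped, so the output always has the target shape.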
The network architecture exhibits some symmetries. For instance, at each level, both the encoder block and its corresponding decoder block share identical
The above network architecture was selected as the most promising among some possible alternatives that were examined during a round of preliminary experiments on separate datasets (see section IV-C). In particular, the following options were considered:
the standard U-net architecture and some of its variants,
a greater or smaller number of feature maps f in each level,
a greater or smaller number of levels,
transposed convolution instead of normal convolution for the decoder block,
transposed convolution with stride two in the decoder block, instead of the upsampling layer.
To train the network, a loss function based on the Tversky index [36] was chosen:\begin{align*} loss\left ({\hat {y},y }\right)=1-\frac {1+\hat {y}\cdot y}{1+\hat {y}\cdot y+\alpha \left ({1-y }\right)\hat {y}+\left ({1-\alpha }\right)y\left ({1-\hat {y} }\right)} \tag{1}\end{align*}
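In a framework-agnostic form, Eq. (1) can be written as follows (NumPy here, whereas the paper uses Keras; the default value of \alpha is an assumption, not taken from the paper):

```python
import numpy as np

def tversky_loss(y_pred, y_true, alpha=0.5):
    # Smoothed Tversky loss as in Eq. (1): alpha weights the
    # false-positive term (1-y)*y_hat, (1-alpha) the false-negative
    # term y*(1-y_hat). The default alpha is an assumed value.
    y_pred = y_pred.ravel().astype(np.float64)
    y_true = y_true.ravel().astype(np.float64)
    inter = np.dot(y_pred, y_true)
    fp = np.dot(1.0 - y_true, y_pred)
    fn = np.dot(y_true, 1.0 - y_pred)
    return 1.0 - (1.0 + inter) / (1.0 + inter + alpha * fp + (1.0 - alpha) * fn)
```

A perfect prediction yields a loss of 0; with \alpha = 0.5 the loss reduces to one minus a smoothed Dice coefficient.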
Fig. 6 shows an example of fingerprint segmentation with SUFS, from the network input to the corresponding output. For each encoder and decoder block, one of the feature maps is shown. Note that the size of the feature maps varies from
An example of input, feature maps, and output of the SUFS network. One of the feature maps for each level is shown.
Experimental Results
A. Benchmark and Metrics
The publicly available fingerprint databases from the first three Fingerprint Verification Competitions (FVC2000 [40], FVC2002 [41], and FVC2004 [42]) are an established benchmark for fingerprint comparison algorithms [2], [43]. Thanks to the work of Thai, Huckemann, and Gottschlich, who made publicly available a manually marked segmentation ground truth for all those fingerprints [5], these databases are also a suitable benchmark for fingerprint segmentation, already adopted by recent published works in the field. Table 2 reports some general information on the benchmark databases; there are four databases for each competition: three are acquired from real fingers and the fourth one is generated using SFinGe [44], [45], a synthetic fingerprint generation method. Fig. 7 shows a sample fingerprint from each database, together with the corresponding segmentation ground truth. The benchmark covers a wide range of acquisition technologies and image sizes. As can be seen in Fig. 7, fingerprints from the various databases are highly heterogeneous with respect to their size, appearance, contrast, noise type, etc. This contributes to make the benchmark quite challenging. The image resolution is 500 dpi for all databases except two: FVC2002 DB2 and FVC2004 DB3. Since GMFS and SUFS are designed to work with 500 dpi images, the fingerprints of these two databases are resized to bring their resolution to 500 dpi before being provided as input to the two proposed methods. The resulting segmentation masks are then resized back to the original resolution before being compared to the ground truth.
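The resolution normalization described above amounts to rescaling by a factor of 500 divided by the native resolution, and rescaling the resulting mask by the inverse factor. A minimal sketch (the choice of bilinear interpolation for images and nearest-neighbor for masks is an assumption):

```python
import numpy as np
from scipy import ndimage

def to_500_dpi(image, native_dpi):
    # Rescale an image so that its resolution becomes 500 dpi.
    return ndimage.zoom(image, 500.0 / native_dpi, order=1)

def mask_to_native(mask, native_dpi):
    # Resize a 500 dpi segmentation mask back to the native resolution
    # (nearest-neighbor keeps the mask binary).
    return ndimage.zoom(mask.astype(np.uint8),
                        native_dpi / 500.0, order=0).astype(bool)
```

For example, a hypothetical 250 dpi image doubles in size, and its mask is halved back to the original dimensions.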
Each database is divided into two sets: A and B. Each set A contains 800 fingerprints from 100 different fingers (there are eight impressions for each finger), each set B contains 80 fingerprints from 10 different fingers. Following the same approach of previous papers, sets A are reserved for testing, while sets B are used for parameter tuning, model training and validation.
The accuracy of a segmentation mask \mathbf {S} is evaluated by comparing it to the corresponding ground truth mask \mathbf {S}_{GT}, counting:
True positives (TP) – the number of foreground pixels in \mathbf {S} that are foreground in \mathbf {S}_{GT}.
True negatives (TN) – the number of background pixels in \mathbf {S} that are background in \mathbf {S}_{GT}.
False positives (FP) – the number of foreground pixels in \mathbf {S} that are background in \mathbf {S}_{GT}.
False negatives (FN) – the number of background pixels in \mathbf {S} that are foreground in \mathbf {S}_{GT}.
From these values, three metrics are computed: the error rate ER = (FP + FN) / (TP + TN + FP + FN), the Dice coefficient DC = 2TP / (2TP + FP + FN), and the Jaccard coefficient JC = TP / (TP + FP + FN).
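The four pixel counts, together with the standard error rate, Dice, and Jaccard formulas, are straightforward to compute; a minimal NumPy sketch:

```python
import numpy as np

def segmentation_metrics(s, s_gt):
    # Pixel-level counts comparing a mask s against ground truth s_gt,
    # then the standard error rate (ER), Dice (DC), and Jaccard (JC).
    s, s_gt = np.asarray(s, bool), np.asarray(s_gt, bool)
    tp = np.sum(s & s_gt)       # foreground in both
    tn = np.sum(~s & ~s_gt)     # background in both
    fp = np.sum(s & ~s_gt)      # foreground only in s
    fn = np.sum(~s & s_gt)      # foreground only in s_gt
    er = (fp + fn) / s.size
    dc = 2 * tp / (2 * tp + fp + fn)
    jc = tp / (tp + fp + fn)
    return er, dc, jc
```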
B. GMFS Parameter Selection
GMFS is controlled by four parameters (see section III-A):
\tau and \sigma are related to the thresholding operation and should be tuned according to specific characteristics of the fingerprint acquisition sensor, such as background and image contrast; n_{c} and n_{o} are related to the postprocessing steps and should be tuned according to the typical amount of noise present in the fingerprints.
It is worth noting that the smallest values of parameter
C. SUFS Model, Hyperparameters and Training
Unfortunately, the size of each set B (80 fingerprints from 10 fingers) is too small to train a deep neural network, and the only reasonable option is to train a single network using all the fingerprints from the B sets (960 in total). For SUFS preliminary experiments and training, these fingerprints are split into the following sets:
B1 – 96 fingerprints (the eight fingerprints of the finger with index 101 from all B sets),
B2 – 96 fingerprints (the eight fingerprints of the finger with index 102 from all B sets),
B3 – 768 fingerprints (the eight fingerprints of the fingers with indices 103–110 from all B sets).
The final training, with the network architecture described in section III-B, is carried out on B2 + B3, using B1 as a validation set.
D. Results
GMFS is implemented in Python using the OpenCV library [48]. The time required to segment a fingerprint depends on the image size: on a PC with an Intel® Xeon® Silver 4112 CPU at 2.60GHz, the average segmentation time ranges from 7ms for FVC2000 DB4 images (the smallest ones) to 22ms for FVC2004 DB1 images (the largest).
SUFS is implemented in Python with the Keras library [49]. On a PC with an NVIDIA GeForce RTX™3080 Ti GPU, training the network requires about 25 minutes, while the average segmentation time is about 4ms, using batches of 32 fingerprints.
Fig. 8 illustrates sample successful segmentation cases using GMFS and SUFS. Both methods exhibit robustness across the large variety of background and sensor noise that characterizes the benchmark (Fig. 7).
Examples of successful fingerprint segmentations using GMFS (a, c) and SUFS (b, d). The green contour represents the ground truth mask, while the blue contour represents the proposed segmentation.
Fig. 9 shows two examples where GMFS produces less satisfactory results than SUFS: in Fig. 9.a, GMFS falsely classifies the top portion of the fingerprint as background due to its low contrast. In Fig. 9.c, GMFS misclassifies the bottom and right portions of the fingerprint due to their poor quality (the valleys are almost impossible to locate). In both cases, SUFS achieves significantly better results.
Examples of cases where GMFS (a, c) is less precise than SUFS (b, d). The green contour represents the ground truth mask, while the blue contour represents the proposed segmentation. The green areas correspond to false negatives.
Fig. 10 shows three cases where both methods produce unsatisfactory results. In Fig. 10.a-b, low-contrast portions of the fingerprint are misclassified as background. In Fig. 10.c-d, fingerprint-like noise (likely a ghost-fingerprint left on the sensor by a previous acquisition) is misclassified as foreground. Similarly, in Fig. 10.f, a ghost-fingerprint portion at the top tricks SUFS into producing a false-positive region; this does not happen for GMFS (Fig. 10.e) because of postprocessing steps 2 and 5 (see section III-A). On the other hand, GMFS produces more false negatives on the same fingerprint.
Examples of failure cases using GMFS (a, c, e) and SUFS (b, d, f). The green contour represents the ground truth mask, while the blue contour represents the proposed segmentation. The green areas correspond to false negatives, while the blue areas correspond to false positives.
E. Comparison With the State-of-the-Art
The two proposed segmentation methods are compared with 18 state-of-the-art approaches. These are all the relevant approaches for which results have been reported on the FVC benchmark. Table 4 lists all the twenty methods and specifies which evaluation metrics are available for each of them. Nine methods (including SUFS) are based on deep learning, while the other eleven (including GMFS) use traditional image processing operations and handcrafted features. Four methods come from software projects available in the public domain: Mind (from the MindTCT software developed by NIST [50]), NFIQ2 (from the NFIQ2 software [51]), SAFIS (from the open source fingerprint recognition software SourceAFIS [52]), and FJ (from the open source version of the FingerJetFX software [53]). The remaining methods are among the most often cited fingerprint segmentation methods in the scientific literature [5], [9], [11], [13], [19], [22], [23], [24], [27] or are based on well-known network architectures [25], [26], [28], [29], [30], [31].
Tables 5, 6, and 7 report all available results with the ER, DC, and JC metrics, respectively. Tables 8 and 9 report the average results over all databases for traditional and deep-learning-based methods, respectively.
SUFS exhibits impressive performance, surpassing the other methods on all three metrics. Specifically, it outperforms the other methods on the ER metric in all databases, on the DC metric in ten out of twelve databases, and on the JC metric in nine out of twelve databases. Additionally, it outperforms the other methods on all metrics when considering the average over the twelve databases.
However, the excellent performance of SUFS must not overshadow the remarkable performance of GMFS, which achieves the second-best performance in two databases on the ER metric and in five databases on the DC metric. It also achieves the third-best performance on the average ER over all databases and the second-best performance on the average DC over all databases. Furthermore, if the comparison is limited to the other traditional (non-deep-learning-based) methods, GMFS obtains the best result on the ER metric in seven out of twelve databases, on the DC metric in all databases, and on the JC metric in nine out of twelve databases. Finally, when considering the average metrics over the twelve databases, it outperforms any other traditional method (Table 8). A further experiment has been carried out to evaluate the impact of GMFS postprocessing steps: without them, its average ER grows from 2.98% to 4.01%, thus confirming the importance of GMFS postprocessing.
Although comparing performance metrics is essential, the overall complexity of the methods must also be considered to assess their practical utility. The following paragraphs examine the complexity of the two top-performing traditional methods and the two top-performing deep-learning-based approaches, according to the average ER.
Among traditional methods (Table 8), GMFS (average
Turning to deep-learning-based methods (Table 9), SUFS (average ER 1.51%) and PCnet (average
Conclusion
This paper introduces two novel fingerprint segmentation methods, GMFS and SUFS, inspired by the KISS principle. Both methods are evaluated on a public benchmark, achieving state-of-the-art performance while maintaining simplicity and computational efficiency.
GMFS, a straightforward method based on a single handcrafted feature, exhibits superior performance compared to all other traditional methods with available results on the benchmark, including some considerably more complex approaches. Notably, GMFS achieves comparable performance to a range of deep learning-based methods. Given its minimal computational resource requirements, GMFS is particularly well-suited for applications where computing power and memory are constrained.
SUFS leverages the power of deep learning using a simplified U-net architecture for the task of end-to-end fingerprint segmentation and exhibits impressive performance despite being trained on a relatively limited dataset. It can effectively segment fingerprints from databases acquired using a variety of technologies without requiring fine-tuning or specific parameter adjustments for each image type. SUFS surpasses all existing state-of-the-art methods, achieving an average ER of 1.51% across the entire benchmark. This represents a substantial improvement of over 40% compared to the previously best-performing method.
An open-source Python implementation of both methods is available at https://github.com/raffaele-cappelli/pyfing.
ACKNOWLEDGMENT
The author is grateful to Dr. Annalisa Franco for her insightful comments and to Sara Cappelli for proofreading the manuscript.
NOTE
Open Access provided by 'Alma Mater Studiorum - Università di Bologna' within the CRUI CARE Agreement