Measuring Linkability of Protected Biometric Templates Using Maximal Leakage

As the applications of biometric recognition systems are increasing rapidly, there is a growing need to secure the sensitive data used within these systems. Considering privacy challenges in such systems, different biometric template protection (BTP) schemes were proposed in the literature, and the ISO/IEC 24745 standard defined a number of requirements for protecting biometric templates. While there are several studies on evaluating different requirements of the ISO/IEC 24745 standard, there have been few studies on how to measure the linkability of biometric templates. In this paper, we propose a new method for measuring linkability of protected biometric templates. The proposed method is based on maximal leakage, which is a well-studied measure in information-theoretic literature. We show that the resulting linkability measure has a number of important theoretical properties and an operational interpretation in terms of statistical hypothesis testing. We compare the proposed measure to two other linkability measures: one previously introduced in the literature, and a similar measure based on differential privacy. In our experiments, we use the proposed measure to evaluate the linkability of biometric templates from different biometric characteristics (face, voice, and finger vein), which are protected with different BTP schemes. The source codes of our proposed measure and all experiments are publicly available.

I. INTRODUCTION
BIOMETRIC recognition systems generally establish the identity of users based on their physiological (e.g., face, finger vein, fingerprint, iris, etc.), behavioral (e.g., voice, gait, signature, etc.), or chemical (e.g., DNA, etc.) attributes, which are unique to individuals. As biometric authentication and identification systems offer great convenience for users and also provide fast and accurate recognition, applications of biometric recognition systems tend towards ubiquity, from personal (e.g., smartphone unlocking with face or fingerprint recognition) to large-scale applications (e.g., face, fingerprint, and iris recognition in national identity systems, or face recognition for passport control at borders and airports). In such systems, biometric templates (a.k.a. features) are often extracted from biometric data and are stored in the system's database during the enrolment stage. Later, during the recognition stage, a new biometric template is extracted and compared with the templates in the database. Since biometric templates convey important information about the user's identity, data protection regulations, such as the European Union General Data Protection Regulation (GDPR) [1], consider biometric templates as sensitive data and impose legal obligations to protect biometric templates.
To protect biometric templates and address privacy issues in biometric recognition systems, several schemes have been proposed in the literature [2], [3], [4]. For each biometric template protection (BTP) scheme, the ISO/IEC 24745 standard [5] defines four criteria. First, the protection scheme should not significantly degrade the accuracy of the biometric recognition system (i.e., performance preservation). Second, the protected template should be irreversible, meaning it should be computationally infeasible to reconstruct the original template from the protected template (i.e., irreversibility). Third, if a protected template is compromised, it should be possible to revoke that protected template and generate a new protected template (i.e., revocability/renewability). Fourth, if two or more protected templates are leaked, it should not be feasible to determine whether they are from the same subject or different subjects (i.e., unlinkability).
Notwithstanding standardized metrics to evaluate and report the recognition performance of biometric systems (e.g., ISO/IEC 19795-1 standard [6]), no standardized measure has been included in the ISO/IEC 30136 standard [7] for evaluating the irreversibility and unlinkability of protected templates. In addition, while a lot of research has been devoted to the irreversibility evaluation of protected templates [8], there have been few studies on how to measure the linkability of protected templates. According to the ISO/IEC 30136 standard [7], the more precise definition of unlinkability is: "unlinkability is the difficulty of distinguishing between Auxiliary Data (AD)s and/or Pseudonymous Identifiers (PIs) of two Renewable Biometric References (RBRs) generated from the same subject's characteristic and ADs and/or PIs of two RBRs generated from different subjects' characteristics" [emphasis added]. In the context of BTP, we can extend the definition of mated and non-mated pairs in the ISO/IEC 2382-37 standard [9] as:
• mated: two protected templates are mated if they correspond to the same subject (they can be either from the same sample or different samples) and are protected with different keys.
• non-mated: two protected templates are non-mated if they correspond to different subjects and are protected with different keys.
Therefore, to satisfy the unlinkability criterion, the protected templates should be such that an adversary is not able to distinguish mated and non-mated protected pairs. Table I summarizes the previous works in the literature which have used a generic method to evaluate the linkability of protected biometric templates. Buhan et al. [10] considered a biometric cryptosystem and compared the recognition accuracy of the system in terms of Equal Error Rates (EER) in two scenarios: i) templates protected with a single key (i.e., regular recognition accuracy analysis), ii) templates protected with different keys (i.e., unlinkability analysis). While an increase in EER implies some degree of unlinkability, the unlinkability is not quantified in their work. Kelkboom et al. [11] considered similar scenarios and compared the recognition performance of the system in terms of the Receiver Operating Characteristic (ROC). Then, if the recognition accuracy shown by the ROC curve decreases, the system is considered to be unlinkable. However, the unlinkability cannot be quantified in this approach either. Similarly, Nagar et al. [12] found the ROC curve of matching templates with different keys to evaluate the unlinkability of the system.
Piciucco et al. [13] used a similar approach to [10], [11], and [12], but combined the results of regular analysis and unlinkability analysis. They plotted the True Match Rate (TMR) in the unlinkability analysis (referred to as Renewable Template Matching Rate (RTMR) in their work) versus the system's False Non-Match Rate (FNMR) in the regular analysis. Their method does not evaluate the True Match Rate (TMR) in the unlinkability analysis, and the degree of general unlinkability is also not quantified in their method. Along the same lines, Rua et al. [14] found the probability that the adversary can determine the correct identity in a top-N list and plotted this probability similar to Cumulative Match Curves (CMC). Then, as an evaluation of the unlinkability of the system, they compared this plot with the curve corresponding to the probability of random guesses being correct (i.e., full unlinkability). However, their method does not provide a single number to quantify the general unlinkability of the system.
In contrast to [10], [11], [12], [13], and [14], which evaluated unlinkability based on accuracy metrics, [15], [16], [17] considered score distributions in their unlinkability evaluations. In [15], Ferrara et al. calculated three distributions of scores, including scores of templates with different keys from: 1) the same sample, 2) different samples of the same subject, and 3) samples of different subjects. Then, according to visual comparisons of these distributions, they evaluated the unlinkability of templates. Wang and Hu [16] used the latter two score distributions only and evaluated unlinkability by visual comparison of these distributions. Gomez-Barrero et al. [17] proposed two quantitative measures (local and global) based on score distributions. Similar to [16], they considered two distributions of scores for mated and non-mated pairs. Then, as their local measure for each score, they considered the difference in conditional probabilities of the hypothesis of being mated and the hypothesis of being non-mated. To calculate their local measure, they used the likelihood ratio of mated and non-mated hypotheses and the ratio of prior probabilities. For their global measure, they considered the conditional expectation of their local measure over score values. The global measure ($D^{sys}_{\leftrightarrow}$) proposed in [17] was the first quantitative evaluation that measures the degree of unlinkability of the biometric systems. It is also properly defined and bounded in the [0, 1] interval. However, it has several drawbacks that we discuss in Section III-B.
In addition to prior work on linkability, there is ample work on general privacy measures in the information theory and computer science communities [18], [19]. The most prominent notions of privacy are ϵ-differential privacy and (ϵ, δ)-differential privacy, which were developed for the database release problem [20], [21], [22]. The main idea behind this approach is to control the influence of a single database entry on the output of differentially private queries. BTP schemes have been studied from the differential privacy perspective in [23], where a differentially private distributed face-recognition system is proposed. A hypothesis testing perspective on differential privacy has been introduced in [24] and extended in [25]. In particular, [25] shows that (ϵ, δ)-differential privacy guarantees could be interpreted as bounds on the ROC curves of appropriately defined hypothesis tests.
Another recent measure of interest is maximal leakage, which seeks to control the adversary's ability to refine his or her estimate of any function of data [26], [27]. Maximal leakage has recently been discussed in the context of hypothesis testing: privacy-utility trade-offs using maximal leakage as a privacy metric and the type II (false alarm) error exponent as the utility metric have been studied in [28]; in [29] the so-called "noiseless privacy" is related to hypothesis testing and to maximal leakage; and maximal leakage is used to bound generalization errors of learning algorithms in [30].
In this paper, we propose a new measure for evaluating the linkability of protected biometric templates. Our proposed measure combines the work on maximal leakage from information-theoretic literature [26], [27] with the perspective on global linkability introduced in [17]. Since our proposed measure is based on a well-studied information measure, it inherits many of the theoretic properties of this measure. In addition, we show that the proposed linkability measure has an appealing operational interpretation in terms of hypothesis testing that the adversary could perform on a pair of protected templates. This hypothesis testing interpretation of our proposed measure makes it consistent with the definition of linkability in the ISO/IEC 30136 standard [7]. We further compare our proposed measure to a similar measure based on differential privacy [22] and show that the differential privacy-based measure is too strict for the linkability application. Finally, the experimental implementation of our proposed measure shows that it gives intuitively correct linkability scores across different BTP schemes, biometric characteristics, and scoring functions.
The remainder of the paper is organized as follows. In Section II we define our proposed measure, as well as discuss its operational interpretations and its properties. In Section III we compare the proposed measure to two other linkability measures: the global measure introduced in [17] and a similar measure based on differential privacy. In Section IV, we evaluate the unlinkability of different biometric recognition systems based on different biometric characteristics and protected with different BTP schemes. Finally, the paper is concluded in Section V.

II. PROPOSED MEASURE
In this section, we propose a new measure of linkability for biometric templates. In Section II-A, we introduce our notation and give an overview of the maximal leakage information measure. In Section II-B, we define our measure of linkability as the maximal leakage of information about the mated and non-mated hypotheses, and review its properties. We end by interpreting the new measure in terms of statistical hypothesis testing in Section II-C.

A. Paper Notation and Maximal Leakage
Throughout the paper, we use capital letters to denote random variables, calligraphic letters to denote support sets of these random variables (and sets in general), and lower case letters to denote realizations of these random variables. For example, $X$ is a random variable taking values on $\mathcal{X}$ while $x \in \mathcal{X}$ is a possible realization of this random variable. We use the notation $X \leftrightarrow Y \leftrightarrow Z$ to denote that $X$, $Y$, and $Z$ form a Markov chain. We use $p_X$ to denote the probability mass function (if $X$ is discrete) or the probability density function (if $X$ is continuous) of $X$. If the associated random variable is clear from context, we omit the subscript: for example, $p(y|s)$. We use sans-serif font to indicate functions, for example $\mathsf{f} : \mathcal{X} \to \mathcal{Y}$ denotes a function from $\mathcal{X}$ to $\mathcal{Y}$. Finally, all the logarithms in this paper will be assumed to have base two.
Maximal leakage is an information leakage measure introduced in [26] and [27]. Specifically, [26] defined this measure as follows. Let $X$ and $Y$ be two jointly-distributed random variables, where $X$ represents some secret information which may be of interest to an adversary, while $Y$ represents the actual observations of an adversary. The maximal leakage of information from $X$ to $Y$ is defined as

$$\mathcal{L}(X \to Y) = \sup_{U \leftrightarrow X \leftrightarrow Y \leftrightarrow \hat{U}} \log \frac{P(U = \hat{U})}{\max_{u \in \mathcal{U}} p_U(u)}, \tag{1}$$

where $U$, $\hat{U}$ are random variables over some common finite alphabet. The auxiliary random variable $U$ in Eq. 1 denotes some, possibly random, mapping of the secret information $X$, while $\hat{U}$ denotes the best guess an adversary could make about $U$. Thus, the ratio $P(U = \hat{U}) / \max_{u \in \mathcal{U}} p_U(u)$ captures how much an adversary's ability to guess any hidden mapping $U$ of the data improves by observing $Y$. The whole quantity in Eq. 1 measures the multiplicative improvement of the adversary's ability to guess any possible function of the secret $X$.
Maximal leakage was independently introduced in [27], where it was defined (Eq. 2) in terms of an estimate $\hat{X}$ satisfying $X \leftrightarrow Y \leftrightarrow \hat{X}$. When $X$ has full support, the definitions in Eq. 1 and Eq. 2 are equivalent [26].
Although it is not immediately clear that Eq. 1 and Eq. 2 are computable, it is shown in [26, Theorem 1] that, for discrete $(X, Y)$, maximal leakage can be evaluated via the following simple formula:

$$\mathcal{L}(X \to Y) = \log \sum_{y \in \mathcal{Y}} \max_{x \in \mathcal{X}:\, p_X(x) > 0} p_{Y|X}(y|x). \tag{3}$$

This result can be extended to more general settings [26, Theorem 7]. For example, a setting that will be of interest to us is when $Y$ is continuous, $X$ is discrete, and the probability density functions $p_{Y|X}(y|x)$ exist. In this case, the maximal leakage reduces to

$$\mathcal{L}(X \to Y) = \log \int_{\mathcal{Y}} \max_{x \in \mathcal{X}:\, p_X(x) > 0} p_{Y|X}(y|x)\, dy. \tag{4}$$

Finally, it is shown in [26] that

$$\mathcal{L}(X \to Y) = I_\infty(X; Y), \tag{5}$$

where $I_\infty(X; Y)$ denotes Sibson's mutual information of order infinity [31], [32]. In other words, $\mathcal{L}(X \to Y)$ could be viewed as a generalization of Shannon's mutual information in the same way that Rényi entropy is a generalization of Shannon's entropy [33]. Because maximal leakage is a well-defined information measure, it has a number of mathematical properties. We highlight some of the most important properties here:
• Firstly, it is non-negative:

$$\mathcal{L}(X \to Y) \ge 0. \tag{6}$$

It is zero if and only if $X$ and $Y$ are statistically independent.
• Secondly, it satisfies the data processing inequality, which states that

$$\mathcal{L}(X \to Z) \le \mathcal{L}(X \to Y), \tag{7}$$

where $X \leftrightarrow Y \leftrightarrow Z$ form a Markov chain.
• Finally, for a discrete random variable $X$,

$$\mathcal{L}(X \to Y) \le \log |\mathcal{X}|. \tag{8}$$

Proofs of these properties and additional properties of maximal leakage can be found in [26].
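The discrete closed form and the three properties listed above can be checked numerically. The following is a minimal sketch, assuming the channel is given as a row-stochastic matrix and that every value of X has positive probability; the function name `maximal_leakage` and the example matrices are hypothetical and only serve the illustration.

```python
import numpy as np


def maximal_leakage(cond):
    """Maximal leakage L(X -> Y) in bits for a discrete channel.

    cond[x, y] holds p(y | x). Every row must sum to 1, and every x is
    assumed to occur with positive probability (full support).
    """
    cond = np.asarray(cond, dtype=float)
    if cond.ndim != 2 or not np.allclose(cond.sum(axis=1), 1.0):
        raise ValueError("cond must be a 2-D array whose rows sum to 1")
    # Take the largest p(y | x) in each column y, sum over y, then log2.
    return float(np.log2(cond.max(axis=0).sum()))


# Y independent of X: both rows are identical, so nothing leaks.
independent = np.array([[0.3, 0.7], [0.3, 0.7]])
# Y reveals X exactly on a 4-letter alphabet: leakage is log2(4) bits.
noiseless = np.eye(4)
# A noisy binary channel X -> Y, followed by further processing Y -> Z.
noisy = np.array([[0.9, 0.1], [0.2, 0.8]])
post = np.array([[0.6, 0.4], [0.5, 0.5]])

leak_independent = maximal_leakage(independent)
leak_noiseless = maximal_leakage(noiseless)
leak_noisy = maximal_leakage(noisy)
leak_processed = maximal_leakage(noisy @ post)  # composed channel X -> Z
```

In this sketch the independent channel gives zero leakage, the noiseless channel attains the log-cardinality bound, and post-processing the noisy channel cannot increase the leakage, in line with the non-negativity, data processing, and cardinality properties.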

B. Maximal Linkability of Biometric Templates
The proposed linkability metric uses maximal leakage to measure the amount of information revealed by two templates about the two possible hypotheses: the templates are mated, and the templates are not mated. Specifically, given two biometric systems, let $\mathcal{T}_1$ be the space of all possible protected templates that could be produced by the first system and $\mathcal{T}_2$ be the space of all possible protected templates that could be produced by the second system. Given two templates $(t_1, t_2) \in \mathcal{T}_1 \times \mathcal{T}_2$ we can define the following hypotheses:
$h_m$ = {templates $t_1$ and $t_2$ belong to mated instances},
$h_{nm}$ = {templates $t_1$ and $t_2$ belong to non-mated instances}.
Moreover, let $(T_1, T_2)$ be a pair of random variables taking values on $\mathcal{T}_1 \times \mathcal{T}_2$ and let $H$ be a random variable taking values on $\mathcal{H} = \{h_m, h_{nm}\}$. In other words, $H$ denotes the true hypothesis about templates $T_1$ and $T_2$.
Definition 1 (Maximal Linkability): Maximal linkability of two systems producing templates $(T_1, T_2)$ is defined as

$$M^{sys}_{\leftrightarrow} = \mathcal{L}(H \to (T_1, T_2)) = \log \sum_{(t_1, t_2) \in \mathcal{T}_1 \times \mathcal{T}_2} \max_{h \in \mathcal{H}} p(t_1, t_2 | h). \tag{10}$$

We can make two observations about maximal linkability in light of Eq. 10. First, since maximal linkability depends only on the conditional distributions $p(t_1, t_2 | h_m)$ and $p(t_1, t_2 | h_{nm})$, it is independent of the distribution of the hypothesis $H$. This is a desirable property for a linkability measure since it means that $M^{sys}_{\leftrightarrow}$ depends on the BTP scheme itself, and not on any assumptions on the distributions of mated and non-mated pairs of templates.
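As a small worked illustration of Definition 1, suppose the template pair can take only three values and assume that the discrete closed form of maximal leakage from [26, Theorem 1] applies. The two conditional distributions below are hypothetical and chosen only for arithmetic convenience.

```latex
\begin{align*}
p(\cdot \mid h_m) &= (0.7,\ 0.2,\ 0.1), \qquad
p(\cdot \mid h_{nm}) = (0.1,\ 0.2,\ 0.7), \\
M^{sys}_{\leftrightarrow}
  &= \log_2 \sum_{(t_1, t_2)} \max_{h \in \{h_m, h_{nm}\}} p(t_1, t_2 \mid h)
   = \log_2 (0.7 + 0.2 + 0.7)
   = \log_2 1.6 \approx 0.678 .
\end{align*}
```

No prior on the hypothesis enters this computation: only the two conditional distributions are used.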
Secondly, from an information-theoretic perspective, it is important to define $M^{sys}_{\leftrightarrow}$ as we do in Definition 1. This measure is the 'true' linkability score of the system. That is, as we will see in Lemma 2, this score gives us the most general guarantees with the fewest assumptions on the behaviour of the adversary. However, to compute $M^{sys}_{\leftrightarrow}$, it is necessary to know $p(t_1, t_2 | h_m)$ and $p(t_1, t_2 | h_{nm})$ for all possible values of $(t_1, t_2) \in \mathcal{T}_1 \times \mathcal{T}_2$. This means that if $M^{sys}_{\leftrightarrow}$ is to be estimated from data, we need to generate a number of samples on the order of $|\mathcal{T}_1||\mathcal{T}_2|$, which is prohibitive in most practical settings. To circumvent this issue, we follow [17] and propose a linkability measure based on a similarity function. That is, we assume that there is a similarity function which captures the relevant information about the similarity of the two templates. This similarity function could then be used to approximate the linkability score proposed in Definition 1.
To this end, we define another linkability measure with respect to a fixed similarity function.
Definition 2 (Maximal s-Linkability): Let $S = \mathsf{s}(T_1, T_2)$ be the similarity score of templates $T_1$ and $T_2$ under a similarity function $\mathsf{s}$. Maximal s-linkability of two systems producing templates $(T_1, T_2)$ is defined as

$$M^{\mathsf{s}}_{\leftrightarrow} = \mathcal{L}(H \to S).$$

Then, for discrete $S$,

$$M^{\mathsf{s}}_{\leftrightarrow} = \log \sum_{s \in \mathcal{S}} \max_{h \in \mathcal{H}} p(s | h),$$

and for continuous $S$,

$$M^{\mathsf{s}}_{\leftrightarrow} = \log \int_{\mathcal{S}} \max_{h \in \mathcal{H}} p(s | h)\, ds. \tag{14}$$

Maximal s-linkability generalizes maximal linkability in the following sense. It measures the amount of information revealed by the similarity score $S$ about the two possible hypotheses: the templates are mated, and the templates are not mated. If $\mathsf{s}$ is taken to be the identity function, maximal s-linkability reduces to maximal linkability. Thus, just like in [17], the linkability of the system should be evaluated for several similarity functions and the worst-case score should be considered.
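One simple way to approximate maximal s-linkability from score samples is to replace the two score densities by histograms on a shared set of bins. The sketch below is only an illustration under that assumption: it is not the estimator used in the paper's released code, the function name `maximal_s_linkability` and the bin count are hypothetical, and the synthetic Gaussian scores are made up for the example.

```python
import numpy as np


def maximal_s_linkability(mated_scores, non_mated_scores, bins=100):
    """Histogram estimate (bits) of log2 of the integral of max{p(s|h_m), p(s|h_nm)}."""
    mated = np.asarray(mated_scores, dtype=float)
    non_mated = np.asarray(non_mated_scores, dtype=float)
    # Use one set of bin edges for both classes so the densities are comparable.
    edges = np.histogram_bin_edges(np.concatenate([mated, non_mated]), bins=bins)
    p_m, _ = np.histogram(mated, bins=edges, density=True)
    p_nm, _ = np.histogram(non_mated, bins=edges, density=True)
    # Riemann sum of the pointwise maximum of the two estimated densities.
    area = float(np.sum(np.maximum(p_m, p_nm) * np.diff(edges)))
    # Each histogram integrates to 1, so the exact area lies in [1, 2];
    # clipping only removes floating-point round-off.
    return float(np.log2(min(max(area, 1.0), 2.0)))


rng = np.random.default_rng(0)
# Hypothetical synthetic scores: heavily overlapping vs. well separated classes.
overlapping = maximal_s_linkability(rng.normal(1.1, 0.5, 5000), rng.normal(1.0, 0.5, 5000))
separated = maximal_s_linkability(rng.normal(5.0, 0.5, 5000), rng.normal(0.0, 0.5, 5000))
```

Because the maximum of two noisy density estimates tends to overshoot the maximum of the true densities, a histogram estimate of this kind is biased upward for small sample sizes or very fine bins, so the bin count should be chosen with the number of available scores in mind.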
Lemma 1: Let $\mathsf{s}$ be any similarity function on $\mathcal{T}_1 \times \mathcal{T}_2$. Then

$$0 \le M^{\mathsf{s}}_{\leftrightarrow} \le M^{sys}_{\leftrightarrow} \le 1. \tag{15}$$

Proof: Eq. 15 follows from Eq. 6, 7, and 8. Specifically, the first inequality follows from Definition 2 and from Eq. 6. In other words, since $M^{\mathsf{s}}_{\leftrightarrow}$ is an information measure, it cannot be negative. The second inequality follows from the data processing inequality (i.e., Eq. 7) since we have a Markov chain $H \leftrightarrow (T_1, T_2) \leftrightarrow S$. Finally, the last inequality follows from Definition 1 and Eq. 8 since $H$ is a binary-valued random variable and the logarithm has base two.
Just like the linkability measure proposed in [17], our measure is supported on [0, 1]. If $M^{sys}_{\leftrightarrow} = 0$ then the system is completely unlinkable. That is, templates $T_1$ and $T_2$ reveal nothing about the hypotheses $h_m$ and $h_{nm}$. On the other hand, $M^{\mathsf{s}}_{\leftrightarrow} = 1$ means that the system is completely linkable and the adversary could always determine the correct hypothesis after observing $T_1$ and $T_2$.
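The two endpoints of the [0, 1] range can also be read off from a short identity. Assuming the continuous-score form of maximal s-linkability (the base-two logarithm of the integral of the pointwise maximum of the two score densities), and writing TV for the total variation distance between $p(s \mid h_m)$ and $p(s \mid h_{nm})$, a symbol introduced here only for this remark:

```latex
\begin{align*}
\max\{a, b\} &= \tfrac{1}{2}(a + b) + \tfrac{1}{2}\,|a - b|, \\
\int \max\{p(s \mid h_m),\, p(s \mid h_{nm})\}\, ds
  &= 1 + \tfrac{1}{2} \int |p(s \mid h_m) - p(s \mid h_{nm})|\, ds
   = 1 + \mathrm{TV}, \\
M^{\mathsf{s}}_{\leftrightarrow} &= \log_2 (1 + \mathrm{TV}).
\end{align*}
```

Under this reading, the value 0 is attained exactly when the two score densities coincide (TV = 0), and the value 1 exactly when they have disjoint supports (TV = 1).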

C. Maximal Linkability and Hypothesis Testing
In this section, we interpret $M^{sys}_{\leftrightarrow}$ and $M^{\mathsf{s}}_{\leftrightarrow}$ in terms of Neyman-Pearson hypothesis testing. Recall that in this framework, the goal is to design a hypothesis test based on the available data while trading off two types of errors: false alarm error and missed detection error. In the present case, the adversary's goal is to distinguish between two hypotheses $\{h_m, h_{nm}\}$, while keeping the two errors small. In the biometrics literature, the false alarm error is also known as the false match rate (FMR), while the missed detection error is also known as the false non-match rate (FNMR). The maximal linkability metrics provide impossibility bounds on the adversary's ability to design well-performing hypothesis tests. If an adversary has access to the protected templates $(T_1, T_2)$, the relevant bound is derived in terms of $M^{sys}_{\leftrightarrow}$. On the other hand, if an adversary has access to the similarity score $S = \mathsf{s}(T_1, T_2)$ only, the relevant bound is derived in terms of $M^{\mathsf{s}}_{\leftrightarrow}$. These impossibility bounds are formalized in the following lemmas.
The proof of Lemma 2 is given in Appendix A. The proof of Lemma 3 is identical to the proof of Lemma 2, with the key difference being that the adversary's hypothesis testing is assumed to be done on the similarity score $S$ and not on the protected templates $(T_1, T_2)$.
We see from Lemma 2 that a low value of $M^{sys}_{\leftrightarrow}$ guarantees that an adversary cannot perform any meaningful hypothesis testing on observed templates $T_1$ and $T_2$ to decide if they are mated or non-mated. Likewise, we see from Lemma 3 that a low value of $M^{\mathsf{s}}_{\leftrightarrow}$ guarantees that an adversary cannot perform any meaningful hypothesis testing on an observed similarity score $S$ to decide if it comes from mated or non-mated templates. These results give an operational interpretation to $M^{sys}_{\leftrightarrow}$ and $M^{\mathsf{s}}_{\leftrightarrow}$ in addition to those already provided in [26]; see Figure 1.
For distributions with large overlap (e.g., Figure 2a), our measure returns a low value (i.e., near zero), while for distributions with less overlap (e.g., Figure 2d) our measure returns a higher value. In addition, we see in all four cases that our measure provides a good upper bound on the true ROC curve of an optimal hypothesis test performed by the adversary.
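The hypothesis-testing view can be illustrated numerically without restating Lemmas 2 and 3. The sketch below uses a standard fact rather than the paper's bound: for any test, the true match rate minus the false match rate cannot exceed the total variation distance between the two score densities, which for the continuous-score form of the measure equals two raised to the measure, minus one. The Gaussian parameters assume that N(1.1, 0.5) and N(1, 0.5) list the mean and the standard deviation, which the text does not state explicitly; all function names are hypothetical.

```python
import math
import numpy as np


def normal_pdf(s, mean, std):
    z = (s - mean) / std
    return np.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def gaussian_s_linkability(mean_m, mean_nm, std, grid_size=200001):
    """log2 of the integral of max{p(s|h_m), p(s|h_nm)}, via a Riemann sum."""
    lo = min(mean_m, mean_nm) - 10.0 * std
    hi = max(mean_m, mean_nm) + 10.0 * std
    s = np.linspace(lo, hi, grid_size)
    integrand = np.maximum(normal_pdf(s, mean_m, std), normal_pdf(s, mean_nm, std))
    return float(np.log2(np.sum(integrand) * (s[1] - s[0])))


def best_threshold_gap(mean_m, mean_nm, std, thresholds):
    """Largest TMR - FMR over tests declaring 'mated' when the score exceeds tau."""
    gaps = [
        (1.0 - normal_cdf(tau, mean_m, std)) - (1.0 - normal_cdf(tau, mean_nm, std))
        for tau in thresholds
    ]
    return max(gaps)


mean_m, mean_nm, std = 1.1, 1.0, 0.5  # assumed reading of N(1.1, 0.5) and N(1, 0.5)
m_s = gaussian_s_linkability(mean_m, mean_nm, std)
gap = best_threshold_gap(mean_m, mean_nm, std, np.linspace(-2.0, 4.0, 6001))
bound = 2.0 ** m_s - 1.0  # total variation distance between the two densities
```

For two Gaussians with a common standard deviation the threshold test at the midpoint of the means attains the bound, so `gap` and `bound` agree up to numerical error; a small measure therefore leaves the adversary's best test barely better than guessing.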

III. COMPARISON WITH OTHER MEASURES
In this section, we compare the proposed measure to other approaches to measuring linkability. In Section III-A, we discuss the implications of using differential privacy as an information measure in the definition of linkability. In Section III-B, we compare our proposed measure to the one from [17], as the most relevant linkability measure in the literature for protected biometric templates.

A. On Linkability via Differential Privacy
The main insight behind the proposed linkability measure is to measure the amount of information leaked by a pair of protected biometric templates about whether these templates are mated or not mated. Definitions 1 and 2 use maximal leakage as a measure of such information leakage. This raises the question of whether other measures of privacy loss could be used instead of maximal leakage. In this section, we consider the most prominent such measure: differential privacy [22].
We will show that for ϵ-differential privacy the resulting linkability measure does not differentiate between the four distinct examples in Figure 2. That is, it assigns the value of infinity to all four examples and classifies all four systems as completely linkable. Another possible approach is to apply a common relaxation of ϵ-differential privacy known as (ϵ, δ)differential privacy. We will show as well, from the example of Figure 2, that this approach does not provide us with a single linkability measurement. Instead, it provides us with a curve trading off between the ϵ and the δ privacy parameters.
1) ϵ-Differential Privacy: Differential privacy is the most prominent approach to privacy that was designed for a private data release problem [22]. In this discussion, we view ϵ-differential privacy as an information measure between our true hypothesis H and an observed template pair T 1 , T 2 , and apply it in the manner similar to Definition 1. In other words, we seek to measure how differentially private the mapping H to (T 1 , T 2 ) is. In this way, we can define a new measure of linkability: Likewise, for a given similarity function s with continuous scores S, we can define a measure of linkability: where f (s|h) denotes the probability density function of S given h ∈ {h m , h nm }.
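To see how a worst-case criterion of this kind behaves on score densities with unbounded support, the following sketch assumes the standard ϵ-differential-privacy condition, i.e. that the absolute log-ratio of the two conditional densities is bounded by ϵ at every score. It uses two Gaussians with a common standard deviation (parameters assumed for illustration); the function names are hypothetical.

```python
import numpy as np


def log_density_ratio(s, mean_m, mean_nm, std):
    """ln p(s|h_m) - ln p(s|h_nm) for two Gaussian densities sharing one std."""
    return ((s - mean_nm) ** 2 - (s - mean_m) ** 2) / (2.0 * std ** 2)


def epsilon_on_window(mean_m, mean_nm, std, half_width):
    """Largest |log-density ratio| over scores in [-half_width, half_width]."""
    s = np.linspace(-half_width, half_width, 10001)
    return float(np.max(np.abs(log_density_ratio(s, mean_m, mean_nm, std))))


# The smallest epsilon that works on a window keeps growing as the window
# widens, so no finite epsilon covers the whole real line.
eps_by_window = [epsilon_on_window(1.1, 1.0, 0.5, w) for w in (10.0, 100.0, 1000.0)]
```

For equal variances the log-ratio is linear in the score, so it is unbounded however close the two means are; this is why a worst-case log-ratio criterion cannot separate heavily overlapping Gaussian score distributions from well separated ones.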
As it turns out, these definitions do not distinguish between any of the cases in Table II and instead classify all of them as fully linkable. In other words, ϵ-differential privacy is too pessimistic for the linkability application. For example, let the score distributions of mated and non-mated templates be any of the four normally distributed pairs in Table II. Then the resulting measure is infinite. This is because the four synthetic distributions in Table II are all examples of a Gaussian additive mechanism applied to a database $\{h_m, h_{nm}\}$. These do not satisfy ϵ-DP according to [...].
2) (ϵ,δ)-Differential Privacy: (ϵ, δ)-Differential privacy is a well-studied relaxation of differential privacy which introduces a second parameter δ. We could also consider treating this as an information measure between our true hypothesis $H$ and an observed template pair $(T_1, T_2)$, and apply it in a manner similar to Definition 1. Or, we could consider treating this as an information measure between our true hypothesis $H$ and an observed similarity score $S$, and apply it in a manner similar to Definition 2. However, in both of these cases we would need to estimate two parameters: ϵ and δ. In general, a BTP scheme will not satisfy (ϵ, δ)-differential privacy for a single pair (ϵ, δ), but would instead satisfy it for an (ϵ, δ) curve.
As an example, let the score distributions of mated and non-mated templates be normally distributed as N(1.1, 0.5) and N(1, 0.5), as in Figure 2a. Let c ∈ [0, ∞] be any non-negative constant. Then, the mapping from $H$ to $S$ induced by the BTP scheme satisfies (ϵ, δ)-differential privacy with a pair (ϵ, δ) that depends on c.
As we see from the above discussion, differential privacy does not appear to be an appropriate information measure for the linkability application. In the case of ϵ-differential privacy, it does not differentiate between the simple synthetic examples in Table II and labels all of them completely linkable. On the other hand, in the case of (ϵ, δ)-differential privacy, it is not clear how to obtain a single linkability score.

B. Comparison With Gomez-Barrero et al. [17] Measure
Recall that the first quantitative measure of linkability was introduced in [17]. The main idea of [17] is to base the measure on the distributions of mated and non-mated hypotheses conditioned on a similarity score.
1) Overview of Gomez-Barrero et al. [17] Measure: As mentioned in Section I, Gomez-Barrero et al. [17] proposed two quantitative measures (local and global) based on score distributions. They considered a similarity function $\mathsf{s}$ to find the score $s = \mathsf{s}(t_1, t_2) \in \mathcal{S}$ between two templates $t_1$ and $t_2$, and found the score distributions of mated and non-mated pairs. Next, they defined their local measure for each score $s$ in [17, Eq. 4] as:

$$D_{\leftrightarrow}(s) = p(h_m | s) - p(h_{nm} | s).$$

With some assumptions and simplification, they define their local measure in [17, Eq. 14] as:

$$D_{\leftrightarrow}(s) = \begin{cases} 0, & LR(s)\,\omega \le 1, \\ 2\,\dfrac{LR(s)\,\omega}{1 + LR(s)\,\omega} - 1, & LR(s)\,\omega > 1, \end{cases} \tag{23}$$

where $LR(s) = p(s | h_m) / p(s | h_{nm})$ is the likelihood ratio and $\omega = p(h_m) / p(h_{nm})$ denotes the ratio between the prior probabilities of the mated and non-mated samples. The value $\omega = 1$, i.e., $p(h_m) = p(h_{nm})$, is proposed as the worst-case scenario. Finally, the global measure $D^{sys}_{\leftrightarrow}$ is found by calculating the conditional expectation of the local measure $D_{\leftrightarrow}(s)$ over all comparison scores in [17, Eq. 19] as:

$$D^{sys}_{\leftrightarrow} = \int p(s | h_m)\, D_{\leftrightarrow}(s)\, ds. \tag{24}$$

The global measure $D^{sys}_{\leftrightarrow}$ was the first quantitative evaluation that measures the degree of unlinkability of the biometric systems. In addition to the mathematical definition of $D^{sys}_{\leftrightarrow}$, [17, Section V] proposes a general protocol for evaluating linkability from data.
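The local and global measures described above can be sketched on a score grid as follows. This is an illustration based on our reading of the definitions (a clamped difference of posteriors, averaged under the mated score density), not the reference implementation of [17]; the function names, the grid, and the Gaussian score densities are assumptions made for the example.

```python
import numpy as np


def local_measure(p_m, p_nm, omega=1.0):
    """max{0, p(h_m|s) - p(h_nm|s)} on a grid, with omega = p(h_m) / p(h_nm)."""
    num = omega * np.asarray(p_m, dtype=float)
    den = num + np.asarray(p_nm, dtype=float)
    # LR*omega / (1 + LR*omega) rewritten as omega*p_m / (omega*p_m + p_nm),
    # which avoids dividing by p(s|h_nm); empty grid points get posterior 0.5.
    post_m = np.divide(num, den, out=np.full_like(den, 0.5), where=den > 0)
    return np.maximum(0.0, 2.0 * post_m - 1.0)


def global_measure(s, p_m, p_nm, omega=1.0):
    """Riemann-sum approximation of the integral of p(s|h_m) * D(s) over s."""
    ds = s[1] - s[0]
    return float(np.sum(np.asarray(p_m) * local_measure(p_m, p_nm, omega)) * ds)


def gaussian_pdf(s, mean, std):
    return np.exp(-0.5 * ((s - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


s = np.linspace(-4.0, 6.0, 100001)
p_m, p_nm = gaussian_pdf(s, 1.1, 0.5), gaussian_pdf(s, 1.0, 0.5)
# The global measure for three prior ratios, and the prior-free maximal s-linkability.
d_sys = {omega: global_measure(s, p_m, p_nm, omega) for omega in (0.1, 1.0, 10.0)}
m_s = float(np.log2(np.sum(np.maximum(p_m, p_nm)) * (s[1] - s[0])))
```

In this sketch the same pair of score densities yields different values of the global measure as the prior ratio changes, while the quantity `m_s` is computed from the two densities alone.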
2) Comparison With Maximal Linkability: Both $D^{sys}_{\leftrightarrow}$ (as in Eq. 24) and $M^{\mathsf{s}}_{\leftrightarrow}$ are based on the similarity score of biometric templates. As discussed in Section II-B, the true linkability of the system is given by $M^{sys}_{\leftrightarrow}$. However, as this is computationally infeasible in most real-world biometric systems, we follow [17] and focus on computing $M^{\mathsf{s}}_{\leftrightarrow}$ as a proxy for the true linkability. Just like in [17], it is thus important to compute $M^{\mathsf{s}}_{\leftrightarrow}$ for a number of different similarity scores.
The proof of $D^{sys}_{\leftrightarrow} \le M^{\mathsf{s}}_{\leftrightarrow}$ is given in Appendix A, while the other inequalities follow from Lemma 1 and [17]. We highlight that even though $M^{\mathsf{s}}_{\leftrightarrow}$ is never lower than $D^{sys}_{\leftrightarrow}$, it is possible for the two measures to give different rankings to two biometric systems. As an example, consider the distributions of scores for mated and non-mated pairs depicted in Figure 3. In this example, the linkability of mated and non-mated templates is 0.6177 by our measure (i.e., $M^{sys}_{\leftrightarrow}$ as in Eq. 10) and 0.3902 by the measure in [17] (i.e., $D^{sys}_{\leftrightarrow}$ as in Eq. 24) for system (a). For system (b), the linkability of mated and non-mated templates is 0.5988 by our measure and 0.4289 by the measure in [17].
Secondly, according to Lemmas 2 and 3, maximal linkability has a clear operational interpretation in terms of the hypothesis testing capabilities of an adversary. This makes it consistent with the definition of unlinkability in the ISO/IEC 30136 standard [7] presented in Section I. The measure $D^{sys}_{\leftrightarrow}$ does not appear to have such a hypothesis testing interpretation. Considering again the example in Figure 3, we see that from the hypothesis testing perspective of Lemmas 2 and 3 it is correct to label system (a) as more linkable than system (b). The rationale for labeling system (b) as more linkable than system (a) (as is done by $D^{sys}_{\leftrightarrow}$) is less apparent. In addition, unlike maximal linkability, $D^{sys}_{\leftrightarrow}$ has a built-in asymmetry where it prioritizes the linkability of mated templates in its definition. However, according to the definition of unlinkability in the ISO/IEC 30136 standard [7] given in Section I, a linkability measure should take into account the difficulty of arriving at both the mated and the non-mated hypotheses. From an information-theoretic perspective, understanding that two templates are non-mated could also leak information to the adversary and should not be overlooked by a linkability measure. A closely related issue is that, to prevent Eq. 23 from being negative, it is rounded up to zero in certain cases. This rounding again leads to a similar loss of information.
A third difference is that maximal linkability appears to be numerically more stable. For example, to estimate $M^{s}_{\leftrightarrow}$ we simply need to estimate the area under the curve of the maximum of the mated and non-mated probability density functions, as in Eq. 14. On the other hand, to calculate $D_{\leftrightarrow}(s)$ in Eq. 23, it is necessary to estimate the likelihood ratio $LR(s) = p(s|h_m)/p(s|h_{nm})$, which is numerically unstable for low values of $p(s|h_{nm})$. In addition, when estimating $LR(s)$ in practice for the case $p(s|h_{nm}) = 0$, the authors set $LR(s) = 1$ in their open-source implementation (footnote 8), which is theoretically incorrect.
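As a minimal sketch of the estimation step described above, the following Python code approximates the area under the maximum of the two densities with equal-width histograms and takes the base-two logarithm. The histogram estimator, the bin count, the function names, and the synthetic Gaussian scores are all assumptions made for illustration and are not taken from the authors' released code. Note that no division by $p(s|h_{nm})$ is needed, so empty non-mated bins cause no difficulty.

```python
import math
import random


def histogram_density(scores, lo, width, n_bins):
    """Piecewise-constant density estimate on n_bins equal-width bins."""
    counts = [0] * n_bins
    for s in scores:
        # Clamp the index so the largest score lands in the last bin.
        idx = min(max(int((s - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    return [c / (len(scores) * width) for c in counts]


def maximal_linkability(mated, non_mated, n_bins=50):
    """log2 of the area under max(p(s|h_m), p(s|h_nm)) on a shared grid."""
    lo = min(min(mated), min(non_mated))
    hi = max(max(mated), max(non_mated))
    if hi <= lo:
        raise ValueError("scores must span a non-empty range")
    width = (hi - lo) / n_bins
    p_m = histogram_density(mated, lo, width, n_bins)
    p_nm = histogram_density(non_mated, lo, width, n_bins)
    # Riemann sum of the pointwise maximum of the two density estimates.
    area = sum(max(a, b) for a, b in zip(p_m, p_nm)) * width
    return math.log2(area)


# Synthetic scores (illustrative only, not data from the experiments).
rng = random.Random(0)
mated_scores = [rng.gauss(0.7, 0.1) for _ in range(5000)]
non_mated_scores = [rng.gauss(0.3, 0.1) for _ in range(5000)]
estimate = maximal_linkability(mated_scores, non_mated_scores)
```

With this estimator, two identical score lists give a value of 0 and two lists with non-overlapping supports give a value of 1, which matches the two ends of the [0, 1] range of the measure.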
Finally, maximal linkability is independent of the prior probabilities of the mated and non-mated hypotheses. By contrast, $D^{sys}_{\leftrightarrow}$ requires the ratio of the prior probabilities of the mated and non-mated samples (ω). We further discuss the effect of this assumption in Section III-B.3.
3) Different Values of ω: As mentioned in Section III-B.1, the measure in [17] requires the ratio of the prior probabilities of the mated and non-mated samples (i.e., $\omega = p(H_m)/p(H_{nm})$). If we vary the value of ω in this measure, we get counterintuitive results. For small values of ω, clearly linkable systems are characterized as unlinkable. On the other hand, for large values of ω, clearly unlinkable systems are characterized as linkable. Table II reports the linkability of the synthesized distributions in Figure 2 using the measure in [17] with different values of ω, alongside our measure. As this table shows, while our linkability measure is independent of prior probabilities, the linkability measure $D^{sys}_{\leftrightarrow}$ is sensitive to the value of ω and thus depends on the prior distributions of mated and non-mated template pairs. This may be an issue for two reasons. First, estimating this prior probability could, in general, be hard. While the authors in [17] considered ω = 1 as the worst-case scenario, such an assumption is not necessarily realistic in many practical cases. In particular, the adversary might have some knowledge about the prior probabilities. For instance, in many practical cases, it is reasonable to assume that non-mated pairs have a higher probability than mated pairs. Secondly, a linkability measure should depend on the BTP scheme and not on the prior belief about the distribution of the hypothesis. Arguably, it makes sense to consider measures that do not depend on the prior probability of $H$.

8 Available at https://github.com/dasec/unlinkability-metric
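The sensitivity to ω can be reproduced on toy distributions. The sketch below assumes the commonly cited form of the measure in [17], namely a local term $2\omega LR(s)/(1+\omega LR(s)) - 1$ when $\omega LR(s) > 1$ and 0 otherwise, averaged under $p(s|h_m)$; this form is our reading of [17] and should be checked against Eq. 23 and Eq. 24. The two-bin distributions and the function name `d_sys` are hypothetical. For bins where the non-mated mass is zero, the sketch uses the limit value 1 rather than setting the likelihood ratio to 1.

```python
def d_sys(p_m, p_nm, omega):
    """Discrete sketch of the system-level measure of [17] (assumed form).

    p_m and p_nm are probability masses of mated and non-mated scores
    over the same bins; omega is the ratio of prior probabilities.
    """
    total = 0.0
    for m, nm in zip(p_m, p_nm):
        if m == 0.0:
            continue  # no mated mass in this bin, so it contributes nothing
        if nm == 0.0:
            local = 1.0  # limit of 2x/(1+x) - 1 as the likelihood ratio grows
        else:
            x = omega * m / nm
            # The local term is clipped to zero whenever omega * LR <= 1.
            local = 2.0 * x / (1.0 + x) - 1.0 if x > 1.0 else 0.0
        total += m * local
    return total


# Identical distributions: mated and non-mated scores are indistinguishable.
identical = ([0.5, 0.5], [0.5, 0.5])
# Well-separated distributions: scores are highly informative.
separated = ([0.05, 0.95], [0.95, 0.05])

identical_omega_1 = d_sys(*identical, omega=1.0)
identical_omega_100 = d_sys(*identical, omega=100.0)
separated_omega_1 = d_sys(*separated, omega=1.0)
separated_omega_small = d_sys(*separated, omega=0.01)
```

In this toy setting, the indistinguishable pair of distributions receives a value close to 1 for ω = 100, while the well-separated pair receives 0 for ω = 0.01, which mirrors the behaviour described above.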

IV. EXPERIMENTS
In this section, we describe the experimental results of evaluating the linkability of protected biometric templates using the proposed measure. First, we describe our experimental setup in Section IV-A. Next, we analyze the numerical results of linkability measurement for different BTP schemes, different scoring functions, different characteristics, different feature extractors, and also examples of linkable templates in Section IV-B. Finally, we discuss our experiments in Section IV-C.

A. Experimental Setup
In our experiments, we evaluate the linkability of different BTP schemes on different characteristics (face, voice, and finger vein). We also consider both DNN-based (face and voice) and hand-crafted (finger vein) feature extractors.
2) Biometric Characteristics: In our experiments, we use different biometric characteristics, including face, voice, and finger vein. We build different biometric recognition systems based on these characteristics as follows. Table IV summarises the biometric recognition systems used in our experiments.
a) Face recognition: For face recognition, we use the ArcFace-InsightFace [39], ElasticFace [40], and FaceNet [41] models as feature extractors and generate mated and non-mated templates from the MOBIO [42] dataset. The MOBIO dataset is a bimodal dataset including face and voice data captured with mobile and laptop devices from 150 individuals, in 12 sessions (6-11 samples in each session) for each subject. To generate mated scores, we consider all possible combinations of samples within each subject. For non-mated comparisons, we use the first 10 samples for each subject, and then we consider all possible pairs of samples from different subjects.
b) Voice (speaker) recognition: For voice (speaker) recognition, we use the ECAPA-TDNN model [43] as the feature extractor and the MOBIO [42] dataset to generate mated and non-mated templates. To generate mated and non-mated scores, we use the same protocol as for face recognition.
c) Finger vein recognition: For finger vein recognition, we use Wide Line Detector (WLD) [44] as the feature extractor on the UTFVP [45] finger vein dataset. The UTFVP dataset includes 1440 finger vein images from 60 individuals captured in two identical sessions. For each subject, the vascular patterns of the middle, index, and ring fingers of both hands were collected twice at each session. In our experiments, we consider different fingers for each user as a different data subject (i.e., 6 data subjects corresponding to each individual). For mated comparisons, we generate 10 different protected templates from each unprotected template using different keys. Then, we consider all possible combinations of protected templates for each subject. For non-mated comparisons, we consider all possible pairs of samples from different subjects.
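The pairing protocols described in items a) to c) can be sketched as follows. This is an illustrative outline only: the function names, the dictionary layout, and the `max_per_subject` parameter are assumptions and do not come from the released source code. Mated pairs are all combinations within a data subject, and non-mated pairs are all cross-subject pairs, optionally restricted to the first few samples of each subject (10 in the face protocol).

```python
from itertools import combinations


def mated_pairs(templates_by_subject):
    """All unordered pairs of templates that belong to the same subject."""
    pairs = []
    for templates in templates_by_subject.values():
        pairs.extend(combinations(templates, 2))
    return pairs


def non_mated_pairs(templates_by_subject, max_per_subject=None):
    """All pairs of templates that belong to two different subjects."""
    pairs = []
    for (_, a), (_, b) in combinations(templates_by_subject.items(), 2):
        # Slicing with None keeps every sample of the subject.
        a, b = a[:max_per_subject], b[:max_per_subject]
        pairs.extend((x, y) for x in a for y in b)
    return pairs


# Hypothetical toy database: three data subjects with two templates each.
toy = {"s1": ["t1a", "t1b"], "s2": ["t2a", "t2b"], "s3": ["t3a", "t3b"]}
n_mated = len(mated_pairs(toy))          # one pair per subject
n_non_mated = len(non_mated_pairs(toy))  # four pairs per pair of subjects

# UTFVP bookkeeping from the text: each finger is its own data subject.
utfvp_data_subjects = 60 * 6
utfvp_images_per_subject = 1440 // utfvp_data_subjects
```

For UTFVP, treating each of the six fingers of the 60 individuals as a separate data subject gives 360 data subjects with four images each, which is consistent with the 1440 images stated above.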
3) Implementation Details and Source-Code: In our experiments, we use the Bob 9 toolbox [46], [47] both to build the biometric recognition systems and to generate mated and non-mated pairs. In addition, we use the open-source implementations (in Bob) of the BioHashing, MLP-Hash, IoM-GRP, IoM-URP, Bloom Filters, and HE schemes [35], [48], [49], [50], [51], [52]. For HE, we use its implementation in Bob [52], which relies on the SEAL-Python 10 wrapper (on Python 3.8) for the C++ SEAL library [53]. The source code of all our experiments is publicly available to help researchers reproduce our results and to allow them to use our method to measure the linkability of their own protected templates. 11

B. Analysis
In this section, we describe our experiments on different biometric recognition systems. We evaluate the linkability of protected templates with different BTP schemes (in Section IV-B.1), based on different scoring functions (in Section IV-B.2), across different characteristics (in Section IV-B.3), and from different feature extractors (in Section IV-B.4). In each experiment, we try to keep all modules of the biometric system fixed and vary only one module. 12 We also evaluate the linkability of exemplary linkable templates in Section IV-B.5, including linkable protected templates (Section IV-B.5.a) and linkable unprotected templates (Section IV-B.5.b).

1) Linkability Measurement of Different BTP Schemes:
In this experiment, we consider the features extracted from face images using the ArcFace model, and apply different BTP schemes, including BioHashing, MLP-Hashing, Bloom Filters, IoM-GRP, IoM-URP, and HE. Table V reports the linkability of the protected templates using the measure in [17] and our proposed measure. As this table shows, templates protected by these BTP schemes are almost unlinkable. The table also gives the rank of each BTP scheme relative to the other schemes in terms of unlinkability under both measures (ranks are reported in parentheses). Both measures rank these schemes identically. However, the values of the measure in [17] have no operational interpretation, and it is not clear from those values how significant the difference in unlinkability between these BTP schemes is. In contrast, the values of our measure can be interpreted through Lemma 3, which turns a linkability value into an upper bound that no hypothesis test performed by the adversary can exceed. Therefore, each of these BTP schemes leads to a different upper bound on the accuracy of the adversary's hypothesis testing (similar to the upper bounds illustrated in the ROC plots of Figure 1 and Figure 2).
2) Linkability Measurement With Different Scoring Functions: Recall that our proposed measure and the one proposed in [17] are both based on the score distributions of mated and non-mated templates. Therefore, as also discussed in Section II, different scoring functions can provide different levels of linkability for protected templates. To evaluate the effect of the scoring function, in this experiment, we generate BioHash-protected templates from the features extracted by
3) Linkability Measurement Across Different Biometric Characteristics: To explore the application of our measure to different biometric characteristics, in this experiment, we evaluate the linkability of BioHash-protected templates across different biometric characteristics, including face (ArcFace), voice (ECAPA-TDNN), and finger vein (WLD). Table VII compares the linkability of BioHash-protected templates using the measure in [17] and our proposed measure across these characteristics. This experiment confirms that our measure can be applied to templates from different biometric characteristics, and Table VII shows that BioHash-protected templates are almost unlinkable across them.
4) Linkability Measurement for Different Feature Extractors: To evaluate the effect of the feature extractor, in this experiment, we evaluate the linkability of BioHash-protected templates of face data extracted using different feature extractors, including ArcFace, ElasticFace, and FaceNet. Table VIII compares the linkability measurement of BioHash-protected templates using the measure in [17] and our proposed measure for different feature extractors. As this table shows, BioHash-protected templates are almost unlinkable for these feature extractors.
13 Implementations of all these scoring functions are available in the SciPy package: https://scipy.org
Fig. 4. Histogram of mated and non-mated scores for linkable protected templates (FaceNet templates protected by the BioHashing scheme using user-specific keys). The linkability of the mated and non-mated scores in this example is 0.9765 by our measure and 0.9574 by the measure in [17].
5) Linkability Measurement of Linkable Templates: In our experiments in Sections IV-B.1-IV-B.4, we measured the linkability of protected biometric templates using different BTP schemes across different biometric recognition systems. Our experiments indicate that the templates protected with the aforementioned BTP schemes are almost fully unlinkable. In this section, we consider two examples of linkable templates, one protected and one unprotected:
a) Linkable protected templates: As an example of linkable protected templates, we consider FaceNet features protected by the BioHashing scheme using user-specific keys. Note that in our experiments in Sections IV-B.1-IV-B.4, we considered sample-specific keys for generating protected templates. While using user-specific keys in this experiment may be regarded as a hypothetical scenario, it can reflect the situation where templates with the same key 14 for each user are leaked. For instance, consider a biometric recognition system where multiple protected templates are stored for each user in the system's database (i.e., multiple reference templates). Suppose an adversary gains access to all (or a portion of) the templates stored in the system's database and aims to distinguish mated and non-mated pairs. In such a situation, since mated templates are generated using the same key corresponding to the user (i.e., a user-specific key), we expect the protected templates to be highly linkable. Figure 4 depicts the histogram of scores for mated and non-mated templates for FaceNet features protected by the BioHashing scheme. The linkability of mated and non-mated templates in this example is 0.9765 and 0.9574 by our proposed measure and the measure in [17], respectively. Therefore, as also expected from the histogram of scores, these templates are almost fully linkable.
b) Linkability of unprotected templates: In this experiment, we consider an unprotected system, and because no key is applied to generate templates in such systems, we expect to observe a high distinguishability between mated and non-mated templates (as expected from the normal operation of a biometric recognition system). As an example of such a case, we consider FaceNet features in this experiment. Figure 5 illustrates the histogram of scores for (unprotected) mated and non-mated templates. The linkability of templates for this case is 0.9912 by our proposed measure and 0.9669 by the one in [17]. Therefore, this experiment confirms that unprotected templates are almost fully linkable.
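The effect of key reuse discussed in item a) can be illustrated with a simplified BioHashing-style transform. This is a sketch under several assumptions: the projection directions are plain Gaussian vectors seeded by the key (BioHashing as described in the literature additionally orthonormalizes them), the feature vectors are synthetic, and the function names and key values are hypothetical. It is not the Bob implementation used in the experiments.

```python
import random


def biohash(features, key, n_bits=128):
    """Project features onto key-seeded random directions and keep the signs."""
    rng = random.Random(key)  # the key fully determines the projection
    bits = []
    for _ in range(n_bits):
        direction = [rng.gauss(0.0, 1.0) for _ in features]
        projection = sum(f * d for f, d in zip(features, direction))
        bits.append(1 if projection > 0.0 else 0)
    return bits


def hamming(a, b):
    """Normalized Hamming distance between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


data_rng = random.Random(0)
identity = [data_rng.gauss(0.0, 1.0) for _ in range(64)]
# Two noisy captures of the same (synthetic) data subject.
sample_1 = [v + data_rng.gauss(0.0, 0.1) for v in identity]
sample_2 = [v + data_rng.gauss(0.0, 0.1) for v in identity]

# User-specific key: both templates of the subject share one key.
same_key_distance = hamming(biohash(sample_1, key=7), biohash(sample_2, key=7))
# Sample-specific keys: every template is protected with its own key.
diff_key_distance = hamming(biohash(sample_1, key=7), biohash(sample_2, key=8))
```

With a shared key, the two mated templates differ in only a small fraction of bits, whereas with different keys their distance is close to that of two unrelated bit strings. This is the intuition behind the high linkability observed for user-specific keys.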

C. Discussions
In our experiments in Sections IV-B.1-IV-B.4, we evaluated the linkability of protected biometric templates. In Section IV-B.1, we observed that our proposed measure and the one proposed in [17] return low linkability values, and therefore the templates protected with different BTP schemes are almost unlinkable according to both measures. Comparing the values for different BTP schemes in Table V, both methods rank the evaluated BTP schemes similarly. While the values for different BTP schemes under each of these measures are close, there is no theoretical interpretation for the values of the measure in [17] or for the significance of the difference between two such values. In contrast, the values of our measure can be interpreted according to Lemma 3, which provides an upper bound for the adversary's hypothesis testing (similar to the upper bounds depicted in the ROC plots of Figure 1 and Figure 2). For example, to compare the linkability of BioHashing and Bloom Filters, we have different values for the linkability measurement of these schemes in Table V, and therefore different upper bounds according to Lemma 3. Comparing the corresponding bounds, we can say that if an adversary gains access to BioHash-protected templates instead of templates protected with Bloom Filters, then the adversary can achieve up to $2^{0.0162} - 2^{0.0007} = 0.0108$ (≈ 1.1%) more accuracy when performing the hypothesis test (i.e., up to 1.1% more accuracy in distinguishing mated and non-mated templates). However, such an exercise cannot be done with [17] because there is no practical interpretation for the linkability values in [17].
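The arithmetic behind this comparison can be checked directly. In the short sketch below, the two linkability values are read from the expression above (the assignment of 0.0162 to BioHashing and 0.0007 to Bloom Filters follows the order of the sentence), and the variable names are ours.

```python
# Linkability values as they appear in the comparison above.
m_biohashing = 0.0162
m_bloom_filters = 0.0007

# Difference between the two terms of the form 2**M used in the comparison.
gap = 2 ** m_biohashing - 2 ** m_bloom_filters
gap_rounded = round(gap, 4)             # 0.0108
gap_percent = round(100 * gap, 1)       # 1.1
```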
The experiment in Section IV-B.2 showed that different scoring functions can provide different levels of linkability for protected templates. This is reasonable since each scoring function compares two given templates differently, and thus provides different information from the similarity of the two templates. Hence, since our proposed measure and the one in [17] are based on score distributions of mated and non-mated templates, different scoring functions lead to different linkability values. Therefore, it is important to consider different scoring functions when evaluating the linkability of protected templates.
In our experiments in Section IV-B.3 and Section IV-B.4, we measured the linkability of BioHash-protected biometric templates across different biometric characteristics and for different feature extractors, respectively. These experiments show that BioHash-protected biometric templates from different biometric characteristics and from different feature extractors are almost fully unlinkable. They also confirm that our measure can be applied across different biometric characteristics and feature extractors.
In our experiments in Section IV-B.5, we measured the linkability of two systems that we expect to be linkable. In Section IV-B.5.a, we considered an example of linkable protected templates where we assumed that user-specific keys are used to generate protected templates. Since the keys used to generate protected templates for each user are the same in this scenario, we should have high linkability between templates, which is also confirmed by our results. As another example of linkable templates, we considered unprotected templates in Section IV-B.5.b. Similarly, in this case, we expect templates from the same user to be similar and to differ from templates of other users, which means a high level of linkability. The result of our linkability measurement also confirms that unprotected templates are almost fully linkable.
All in all, our experiments confirm that our proposed method can be deployed to measure the linkability of protected templates, and the results are intuitively correct. We evaluated the linkability of protected templates using our measure for different BTP schemes, scoring functions, biometric characteristics, and feature extractors. Furthermore, we evaluated two examples of linkable templates, where our measure also showed a high level of linkability. As discussed in Section II, our measure has a solid theoretical background, and its values have a practical interpretation according to Lemma 3: given the score distributions for mated and non-mated templates, our measure provides an upper bound on the accuracy of the adversary's hypothesis testing.

V. CONCLUSION
In this paper, we proposed a new method for measuring the linkability of protected biometric templates. We used maximal leakage, which is a well-studied measure in the information-theoretic literature. Our proposed measure is based on hypothesis testing using the distributions of similarity scores of mated and non-mated protected templates.
The proposed measure is consistent with the definition of linkability in the ISO/IEC 30136 standard and quantifies the linkability degree of protected templates. In particular, we showed that our measure can provide an upper bound on the accuracy of the adversary's hypothesis test given distributions of scores, and guarantees that an adversary cannot achieve better performance than the provided upper bound. The value of our measure is bounded in the [0, 1] interval, where a higher value indicates more linkability (i.e., 0 shows fully unlinkable and 1 shows fully linkable). The proposed method is also computationally stable and does not require any assumptions on prior probabilities of mated or non-mated hypotheses.
We also investigated the application of differential privacy to measure the linkability of protected biometric templates and showed that the differential privacy-based measure is too strict for the linkability application. Last but not least, in our experiments, we used the proposed measure to evaluate the linkability of biometric templates from different biometric characteristics (face, voice, and finger vein), different feature extractors, and protected with different BTP schemes. The experimental implementation of our proposed measure showed that it gives intuitively correct linkability scores across different BTP schemes, biometric characteristics, and scoring functions.
We conclude the discussion with some comments on an important question: how to estimate $M^{sys}_{\leftrightarrow}$, the true linkability of the system. In this paper, we adopted the approach of using $M^{s}_{\leftrightarrow}$ as a proxy for $M^{sys}_{\leftrightarrow}$. As we see in Lemma 1, the value of $M^{s}_{\leftrightarrow}$ does not exceed the value of $M^{sys}_{\leftrightarrow}$, and it is therefore important to take the highest available value of $M^{s}_{\leftrightarrow}$ across different similarity scores. Other approaches to this problem include a stronger theoretical analysis of Eq. 7, as well as a more extensive analysis of how well different similarity functions estimate $M^{sys}_{\leftrightarrow}$. Understanding how to better estimate the true linkability of a system is thus an important direction for future work.
where Eq. 28 is obtained by applying the law of total probability, and where we defined $D = \int_{F} p(s|h_m)\,ds + \int_{F^{c}} p(s|h_{nm})\,ds$. Note that $M^{s}_{\leftrightarrow} = \log(D)$ (40) and thus, where recall that the logarithm has base two.

Sébastien Marcel (Graduate Student Member, IEEE) received the Ph.D. degree in signal processing from CNET, Université de Rennes I, France, in 2000, and the Research Center of France Telecom (currently Orange Laboratories). He heads the Biometrics Security and Privacy Group, Idiap Research Institute, Switzerland, where he conducts research on face recognition, speaker recognition, vein recognition, attack detection (presentation attacks, morphing attacks, and deepfakes), and template protection. He is a Professor with the School of Criminal Justice, University of Lausanne, and a Lecturer with École Polytechnique Fédérale de Lausanne. He is also the Director of the Swiss Center for Biometrics Research and Testing, which conducts certifications of biometric products.