New Perspectives on Recognition Performance of Daugman’s IrisCode or “Everything is new – it is well forgotten old.”

Daugman’s design of IrisCode continues to fascinate researchers with its practicality, efficiency, and outstanding performance. The limits of Daugman’s recognition system, however, remain a topic of active discussion. Multiple approaches to scaling performance have been explored in the past. Despite these efforts, the problem of finding the maximal population of IrisCode remains open. We therefore appeal to Rate-Distortion theory (limits of error-correction codes) to establish bounds on the maximum possible population of iris classes that IrisCode can support under the constraint of a minimum Hamming Distance (HD) between any two codewords. This approach considers the distribution of iris data within and across iris classes and the quality of iris data. We first present the Hamming, Plotkin, and Elias-Bassalygo upper bounds and the Gilbert-Varshamov lower bound on the population of IrisCode. The bounds relate the number of iris classes that the IrisCode algorithm can sustain to the quality of iris data expressed in terms of HD. Then, we analyze our results and draw conclusions regarding the relationship between IrisCode population size and the level of quality that enrolled data must have to ensure a particular population coverage. By applying the theory presented here, researchers can better understand what maximum population is achievable based on the quality of their iris dataset.


I. INTRODUCTION
As with any practical data, iris datasets are not perfect. Even good-quality iris images experience some quality degradation due to occlusions, illumination conditions, camera noise, motion, and out-of-focus blur (see [1]-[3] and references therein), followed by additional degradation due to imperfect signal processing applied while iris images are transformed into IrisCode templates [4]. Together, these degradations make a genuine Hamming distance (HD) of exactly zero impossible.
When partitioning the interval of normalized HDs into genuine and imposter subintervals, a threshold in the range between 0.28 and 0.36 is recommended [5], [6]. Two IrisCode templates are considered samples from the same class (genuine) if the normalized HD between them is less than the selected threshold; otherwise, they are considered samples from two different classes (imposter). Plots of typical distributions of genuine and imposter HD values can be found in [4], [7] and several other publications analyzing the performance of IrisCode. Imposter HD values are fitted with a narrow Binomial probability mass function centered at 0.5 and characterized by 245 degrees of freedom. The distribution of genuine HD values is highly dependent on the quality of iris data and takes a variety of shapes (see, for example, [8]). Note that the spread of genuine HD values is much larger than the spread of imposter HD values, which explains the choice of the threshold value in the range (0.28, 0.36).
Performance of IrisCode is quantified using a number of metrics. False Match Rate (FMR) is a traditional measure of verification (authentication) performance, while Rank 1 identification, also known in Detection Theory as M-ary detection error [9], is a measure of recognition (or identification) performance. Error to Enroll is an error probability used to analyze the size of the iris population that can be enrolled with nearly zero probability of error or, using the terminology introduced in [7], [8], without collision. As demonstrated by Daugman [4], Error to Enroll, FMR, the decision threshold, and the maximal number of iris classes that IrisCode can sustain are all related by means of a single inequality, in which FMR for small values of the decision threshold is calculated by fitting a narrow Binomial distribution with 245 degrees of freedom to the plot of relative frequencies obtained from empirically found imposter HD values.
Let us take a closer look at the analysis of the maximal population that IrisCode can sustain for a particular value assigned to Error to Enroll. We begin with the analysis of FMR, interpreting it as the probability that a new class not present in the iris dataset is classified as one of the existing classes. Setting the number of degrees of freedom to 245 as suggested by Daugman [4], [6]-[8] and the threshold to 0.32, we conclude that the probability of this event is one in 137 million. If the threshold is set to 0.28, the probability decreases to one in 648 billion [5], [6].
We now turn to the analysis of the number of iris classes that IrisCode can sustain at a given value of Error to Enroll. After solving Daugman's bound (see its detailed explanation in [8]) for the value of maximal population, denoted here as $N$, the bound is given as

$$N \le \frac{\ln(1 - P_E)}{\ln\bigl(1 - \mathrm{FMR}(\tau)\bigr)}. \qquad (2)$$

Here $P_E$ is the value assigned to Error to Enroll and $\tau$ is the threshold between genuine and imposter HD values. The bound (2) yields population sizes of $9.50 \times 10^{7}$ and $4.49 \times 10^{11}$ when $P_E$ is set to 0.5 and the threshold value to 0.32 and 0.28, respectively. If the problem of finding the maximal iris population is formulated as a "birthday problem" (see [6], [7] for details), then the probability that one or more iris class pairs, among $N(N-1)/2$ possible pairings, are falsely matched is given as

$$1 - \bigl(1 - \mathrm{FMR}(\tau)\bigr)^{N(N-1)/2}. \qquad (3)$$

Bounding it by a value $P_C$ allows us to invert the inequality and solve for $N$:

$$N \le \frac{1}{2}\left(1 + \sqrt{1 + \frac{8\ln(1 - P_C)}{\ln\bigl(1 - \mathrm{FMR}(\tau)\bigr)}}\right). \qquad (4)$$

This bound yields maximal population values of 13,782 and 947,800 when $P_C = 0.5$ and the threshold is set to 0.32 and 0.28, respectively. The inequalities (1) through (4) establish a relationship between one of two error probabilities (Error to Enroll or the probability of one or more collisions among $N(N-1)/2$ distinct paired classes), the population size of IrisCode, and FMR parameterized by a threshold $\tau$. Although these equations allow us to estimate the maximal size of the IrisCode population given values of the other two components, they are based exclusively on the distribution of imposter HD values and do not take into account the varying quality of iris biometric data, which is inherently present in the distribution of genuine HDs.
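The two population estimates discussed above can be reproduced numerically. The following is a minimal sketch, assuming the Error to Enroll relation has the form 1 - (1 - FMR)^N and the birthday formulation has the form 1 - (1 - FMR)^(N(N-1)/2); the function names and the dictionary `FMR_BY_THRESHOLD` are illustrative, and the FMR values are simply the "one in 137 million" and "one in 648 billion" figures quoted in the text.

```python
import math

def max_population_error_to_enroll(fmr, p_enroll):
    """Largest N satisfying 1 - (1 - fmr)**N <= p_enroll."""
    # log1p(-x) computes ln(1 - x) accurately for very small x.
    return math.log1p(-p_enroll) / math.log1p(-fmr)

def max_population_birthday(fmr, p_collision):
    """Largest N satisfying 1 - (1 - fmr)**(N*(N-1)/2) <= p_collision."""
    max_pairs = math.log1p(-p_collision) / math.log1p(-fmr)
    # Solve N*(N-1)/2 <= max_pairs for N (positive root of the quadratic).
    return (1.0 + math.sqrt(1.0 + 8.0 * max_pairs)) / 2.0

# FMR values quoted in the text for thresholds 0.32 and 0.28.
FMR_BY_THRESHOLD = {0.32: 1 / 137e6, 0.28: 1 / 648e9}

if __name__ == "__main__":
    for tau, fmr in FMR_BY_THRESHOLD.items():
        n_enroll = max_population_error_to_enroll(fmr, 0.5)
        n_birthday = max_population_birthday(fmr, 0.5)
        print(f"tau={tau}: enroll bound {n_enroll:.3e}, birthday bound {n_birthday:,.0f}")
```

With both error probabilities set to 0.5, the printed values land close to the population sizes reported in the text; small differences are due to rounding of the quoted FMR figures.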
In an attempt to fill the gap in understanding the performance limits of Daugman's algorithm, we turn to an analysis of the relationship between the size of the population that the IrisCode can effectively cover and the iris sample quality. Given Daugman's IrisCode algorithm, the problem of finding its maximal population is cast as a basic Rate-Distortion problem. Upper bounds (Hamming, Plotkin, and Elias-Bassalygo) and a lower bound (Gilbert-Varshamov) are applied to the population of a binary code, with the constraint of quality of iris data as a minimum Hamming Distance (HD) between two codewords, to obtain the maximal/minimal number of iris classes that the IrisCode algorithm can sustain.
The rest of the paper is organized as follows: Section II presents four bounds on the population that IrisCode can sustain under the condition of a given quality of data, Section III provides comments on performance of the presented bounds, and Section IV presents a summary of the main points discussed in the paper.

II. BOUNDS ON POPULATION OF IRISCODE
In the rest of the paper, we assume that a one-to-one encoding technique is available to map Daugman's IrisCode templates (defined as binary templates of length 2048, each carrying phase information and representing an iris class) into a set of binary codewords, each of length $n = 245$ bits. Note that the length of each codeword is equal to the number of degrees of freedom supported by IrisCode (see [4], [8] for a detailed reasoning behind our assumption). Given this assumption, the problem of finding the maximal population covered by Daugman's algorithm is reduced to finding the number of binary codewords with a specified minimum HD between any two codewords as a constraint.
Consider binary codewords of length $n$ and denote the threshold that separates the interval of possible normalized HDs into genuine and imposter subintervals as $\tau$. Given the assumption above, ideal iris classes can be visualized as points positioned along the $n$-dimensional lattice in a code space of $2^n$ points. If a query iris codeword, submitted for recognition, belongs to a specific iris class, then the query codeword and the true codeword of the claimed class lie within a hyper-dimensional ball of radius $t$, where $t$ and $\tau$ are related as

$$t = \lfloor n \times \tau \rfloor.$$

For iris classes to be distinguishable during matching, the centers of the hyper-dimensional balls representing different iris classes have to be spaced at least $t \times 2 + 1 = \lfloor n \times \tau \rfloor \times 2 + 1 = d$ bits apart, where $d$ is the HD between the centers of two hyper-dimensional balls expressed in bits. Under this setup, the problem of finding the maximal population of the IrisCode is reduced to the sphere packing problem from Rate-Distortion theory [10]. To be more specific, we are looking at limits of error correction codes [11], [12]. This problem has been well analyzed, and the developed results can be directly applied to estimate the size of the IrisCode population. Most Rate-Distortion theory results are presented in the form of bounds. Below we provide a brief summary of three upper bounds and illustrate how to apply them to the IrisCode. We also provide a fourth bound, a lower bound, on the population size, since both upper and lower bounds are useful to analyze the performance of the IrisCode. This allows for the formation of a confidence band around the true but unknown maximal population size.
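The mapping from a normalized threshold to a ball radius and a minimum distance in bits can be sketched as follows. This is an illustrative helper, not part of the original analysis: the names `ball_radius` and `min_distance` are hypothetical, the codeword length is called `n`, the normalized threshold `tau`, and the radius is assumed to be rounded down to a whole number of bits (consistent with 245 × 0.32 = 78.40 being taken as 78 bits).

```python
import math
from fractions import Fraction

def ball_radius(n, tau):
    """Radius in bits: floor(n * tau).

    tau is converted through its decimal string so that values such as
    0.28 are treated exactly rather than as binary floating point.
    """
    return math.floor(n * Fraction(str(tau)))

def min_distance(n, tau):
    """Minimum HD in bits between class centers: 2 * radius + 1."""
    return 2 * ball_radius(n, tau) + 1

if __name__ == "__main__":
    for tau in (0.32, 0.28, 0.20, 0.12):
        print(tau, ball_radius(245, tau), min_distance(245, tau))
```

For a codeword length of 245 and a threshold of 0.32, this gives a radius of 78 bits and a minimum distance of 157 bits.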

A. HAMMING BOUND
Let $A(n, d)$ denote the maximum possible population of a binary iris gallery with each iris class represented by a codeword of length $n$, given that the minimum HD between iris classes equals $d = 2 \times t + 1 = \lfloor n \times \tau \rfloor \times 2 + 1$ bits. Then, the application of the Hamming bound [11]-[13] yields the following result:

$$A(n, d) \le \frac{2^{n}}{\sum_{k=0}^{t} \binom{n}{k}}.$$

For an illustration in application to IrisCode, we set $n$ to 245, and thus the number of codewords in the unconstrained code is equal to $2^{245}$. If the decision threshold is set to 0.32, then the maximum radius of a class's hyper-dimensional ball, $t$, is equal to $t = \lfloor 245 \times 0.32 \rfloor = \lfloor 78.40 \rfloor = 78$ bits, and the minimum HD between classes is $d = 2 \times 78 + 1 = 157$ bits. Fig. 1 and Table 2 present the results of the Hamming bound as a function of normalized HD. A discussion of the results of the Hamming bound and the bounds that follow is presented in later sections.
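A minimal sketch of how the Hamming (sphere-packing) bound can be evaluated with exact integer arithmetic is given below. It assumes the standard form of the bound, in which the total number of binary words of length `n` is divided by the number of words inside a ball of radius `t`; the function name `hamming_bound` is illustrative.

```python
from math import comb

def hamming_bound(n, t):
    """Sphere-packing upper bound on the number of codewords.

    n: codeword length in bits
    t: ball radius in bits (minimum distance is 2 * t + 1)
    """
    # Number of binary words within Hamming distance t of a given word.
    ball_volume = sum(comb(n, k) for k in range(t + 1))
    # The population is an integer, so the quotient is rounded down.
    return 2**n // ball_volume

if __name__ == "__main__":
    # Codeword length 245, threshold 0.32 -> radius 78 bits.
    print(hamming_bound(245, 78))
    # Sanity check on a classical case: length 7, radius 1.
    print(hamming_bound(7, 1))
```

The length 7, radius 1 case returns 16, the size of the classical (7,4) Hamming code, which meets this bound with equality.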

B. PLOTKIN BOUND
The Hamming bound is considered to be a loose upper bound on a constrained code population. Among other bounds, the Plotkin bound [14] and the Elias-Bassalygo bound are tighter alternatives, although each has its own limitations. The Plotkin bound takes several forms.
1) If $d$ is even and $2d > n$, then $A(n, d) \le 2 \lfloor d / (2d - n) \rfloor$.
2) If $d$ is odd and $2d + 1 > n$, then $A(n, d) \le 2 \lfloor (d + 1) / (2d + 1 - n) \rfloor$.
3) If $d$ is even and $2d = n$, then $A(n, d) \le 4d$.
4) If $d$ is odd and $2d + 1 = n$, then $A(n, d) \le 4d + 4$.
Since $d = 2t + 1$ is an odd number, under the condition $t > (n - 3)/4$ (if $n = 245$, then $t > (245 - 3)/4 = 60.5$) the Plotkin bound on the population covered by IrisCode is given as

$$A(n, 2t + 1) \le 2 \left\lfloor \frac{2t + 2}{4t + 3 - n} \right\rfloor.$$

If $4t + 3 = n$, then the Plotkin bound can be calculated as

$$A(n, 2t + 1) \le 8t + 8.$$

To illustrate the performance of the bound, we continue the example from the previous subsection. We set $n = 245$ and $\tau = 0.32$. Thus, the radius of the hyper-dimensional ball is $t = 78$ bits, and the bound evaluates to $2 \lfloor 158/70 \rfloor = 4$ classes.
To gain deeper insight into the dependence of the maximal population size on the length of codewords, we assume that the length of IrisCode codewords can be extended beyond 245. Table 1 displays the results of the Plotkin bound as a function of two parameters, codeword length $n$ and threshold $\tau$. As can be seen from the table, for a large value of normalized HD (the same as threshold $\tau$), the population coverage of the IrisCode is limited to only a few classes, and the number of classes does not change significantly as the length of codewords increases. For example, if we set the threshold to $\tau = 0.28$, the number of classes that can be enrolled without any error is 8, regardless of the length assigned to codewords (as seen from the table, we experimented with $n = 256$ and up to $n = 2048$).
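The odd-distance cases of the Plotkin bound can be evaluated with a few lines of integer arithmetic. The sketch below assumes the standard textbook forms of the bound for an odd minimum distance of 2t + 1, where `t` is the ball radius in bits and `n` is the codeword length; the function name `plotkin_bound` is illustrative, and `None` is returned where the bound does not apply.

```python
def plotkin_bound(n, t):
    """Plotkin upper bound for minimum distance d = 2 * t + 1 (odd d).

    Returns None when 2 * d + 1 < n, where the bound is not applicable.
    """
    d = 2 * t + 1
    gap = 2 * d + 1 - n  # equals 4 * t + 3 - n
    if gap > 0:
        return 2 * ((d + 1) // gap)
    if gap == 0:
        return 4 * d + 4  # equals 8 * t + 8
    return None

if __name__ == "__main__":
    # Radii follow from floor(n * threshold).
    print(plotkin_bound(245, 78))    # threshold 0.32, length 245
    print(plotkin_bound(245, 68))    # threshold 0.28, length 245
    print(plotkin_bound(256, 71))    # threshold 0.28, length 256
    print(plotkin_bound(2048, 573))  # threshold 0.28, length 2048
    print(plotkin_bound(245, 49))    # threshold 0.20: bound not applicable
```

At a threshold of 0.28 the result is 8 for lengths 245, 256, and 2048 alike, which matches the observation in the text that lengthening the codewords does not raise the bound at a fixed normalized threshold.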

C. ELIAS-BASSALYGO BOUND
It can be seen from Fig. 1 that the Hamming bound is loose when the threshold $\tau$ is set to a small value, and the Plotkin bound does not exist for small values of $\tau$. As an alternative solution, we use the Elias-Bassalygo bound [15]. This bound is known to be tight for small values of the threshold $\tau$ and large values of $n$. Under the condition $t \le (n - 2)/4$ (if $n = 245$, this inequality implies $t \le (245 - 2)/4 = 60.75$), the Elias-Bassalygo bound on the population covered by IrisCode is evaluated in the form given in [15]. A plot of the Elias-Bassalygo bound for code length $n = 245$ and varying values of the threshold $\tau$ is presented in Fig. 1 (see the blue line). A numerical comparison of the three upper bounds for a broad range of values of $\tau$ is provided in Table 2.

D. GILBERT-VARSHAMOV LOWER BOUND
Unlike the three bounds discussed above, the Gilbert-Varshamov bound [11], [16], [17] is a lower bound on the size of the code population. The bound was developed using a sphere covering technique that requires that the overall space of $2^n$ binary codewords can be covered with overlapping hyper-dimensional balls of radius close to the minimum allowed HD between codewords. The Gilbert-Varshamov bound on the maximum population of a binary code is described mathematically as

$$A(n, d) \ge \frac{2^{n}}{\sum_{k=0}^{d-1} \binom{n}{k}}.$$

The bound in application to the IrisCode, with the length of codewords $n = 245$ and the number of all possible binary codewords $2^{245}$, is shown in Fig. 1 (see the purple line). Table 2 compares the Gilbert-Varshamov bound with the previously described upper bounds at several selected values of the threshold $\tau$.
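The Gilbert-Varshamov lower bound can be evaluated in the same exact-integer style as the upper bounds. The sketch below assumes the standard form of the bound, in which the space of all binary words of length `n` is divided by the volume of a ball of radius `d - 1`, with `d` the minimum distance in bits; the function name is illustrative.

```python
from math import comb

def gilbert_varshamov_bound(n, d):
    """Lower bound on the maximum number of codewords.

    n: codeword length in bits
    d: minimum Hamming distance in bits between any two codewords
    """
    # Number of binary words within distance d - 1 of a given word.
    volume = sum(comb(n, k) for k in range(d))
    # The population is an integer, so the quotient is rounded up.
    return -(-(2**n) // volume)

if __name__ == "__main__":
    # Length 245; thresholds 0.20 and 0.12 give radii 49 and 29 bits,
    # hence minimum distances 99 and 59 bits.
    print(gilbert_varshamov_bound(245, 99))
    print(f"{gilbert_varshamov_bound(245, 59):.3e}")
```

For a codeword length of 245, minimum distances of 99 and 59 bits give values on the order of one thousand and a few times 10^16 classes, respectively.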

III. COMMENTS ON PERFORMANCE OF BOUNDS

A. ACTUAL POPULATION VERSUS BOUNDS
Although binary codewords of length $n = 245$ are viewed as short codewords in communications theory, numerical analysis of the maximal population, given a constraint on the minimal distance between any two codewords, is a computationally challenging problem. In fact, finding the maximal population empirically requires an exhaustive search. To limit the computational load, we consider a scaled version of the binary codewords of IrisCode. We chose $n = 16$ over $n = 245$. A numerical comparison between the four bounds and the empirically evaluated maximal population of a binary code with words of length $n = 16$ is provided in Table 3, where the value of the empirical maximal population is obtained through an exhaustive search. The search is implemented in several steps. It begins with generating all possible binary codewords of length $n = 16$. Then a single codeword is picked, and all its neighbor codewords with a normalized HD smaller than $\tau$ are eliminated. This step is followed by analyzing the remaining codewords and eliminating their neighbors that are located closer than the distance allowed by the normalized HD. This process continues until all pairs of codewords are spaced at least $\lfloor n \times \tau \rfloor \times 2 + 1$ bits apart.
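One way to read the elimination procedure described above is as a greedy pass over all codewords in increasing numerical order. The sketch below is an illustration under that assumption and is not the authors' implementation: the function name `greedy_code` is hypothetical, and a greedy pass always produces a valid code with the required spacing but is not guaranteed to reach the true maximum population.

```python
def hamming_distance(a, b):
    """Number of bit positions in which integers a and b differ."""
    return bin(a ^ b).count("1")

def greedy_code(n, d):
    """Greedily select codewords of length n with pairwise HD >= d.

    Codewords are visited in increasing numerical order; a word is kept
    only if it is at least d bits away from every word already kept,
    which is equivalent to eliminating the close neighbors of each
    selected codeword.
    """
    kept = []
    for word in range(2**n):
        if all(hamming_distance(word, c) >= d for c in kept):
            kept.append(word)
    return kept

if __name__ == "__main__":
    # Small illustrative case; larger n quickly becomes slow in pure Python.
    code = greedy_code(7, 3)
    print(len(code), [format(c, "07b") for c in code])
```

For length 7 and minimum distance 3 the pass selects 16 codewords, which coincides with the Hamming bound for that case; for other parameters the greedy result should be read as a lower estimate of the maximal population.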

B. MAXIMAL POPULATION VERSUS THRESHOLD
We now turn to the case of $n = 245$. As the Plotkin bound (see Table 2) clearly demonstrates, IrisCode is unable to enroll a large number of iris classes when the threshold $\tau$ is set at or above 0.28. Moving it to 0.26 increases the bound considerably.
Equipped with the intuition provided by the example in Sec. III-A, we anticipate that as $\tau$ decreases, the population size of IrisCode (case of $n = 245$) grows at a rate similar to the rate in the example. If the threshold is moved to 0.2 and even further to 0.12, according to the Elias-Bassalygo bound, we may be able to enroll without collision up to $6.98 \times 10^{14}$ and $5.30 \times 10^{34}$ iris classes, respectively. At the same time, the Gilbert-Varshamov bound ensures that the number of classes that can be successfully enrolled at $\tau = 0.2$ and $\tau = 0.12$ is above 952 and $3.84 \times 10^{16}$, respectively. A population of $3.84 \times 10^{16}$ is a large-scale population; however, enrollment of this size is possible only if we ensure that both enrolled and query data are of exceedingly high quality, that is, with the combined distortions in enrolled and query data less than 0.12.
This may sound like a stringent requirement; however, modern technology is able to support it. State-of-the-art image acquisition cameras (including those in our cell phones) take multi-view video sequences of an object and then interpolate them into a single-view capture of the highest possible quality. In addition, given a video of an iris, various signal processing and machine learning approaches can be applied to ensure high quality of IrisCode templates.

C. OTHER CODES
A brief performance analysis of Daugman's IrisCode leads us to the conclusion that the recognition performance of iris biometrics for a given encoding technique is determined by its degrees of freedom. For Daugman's algorithm, the number of degrees of freedom is 245. Other iris encoding approaches are likely to lead to a different number of degrees of freedom and thus to a different maximum population size that the codes can cover. A comprehensive list of methods that can be used as encoding techniques can be found in [2], [3], [18], [19]. Because the processing steps in these encoding techniques differ, their tolerance to noise differs as well. Despite these differences in encoding, the analysis of the maximal population covered by the algorithms can be reduced to the analysis presented in this paper.

IV. SUMMARY
From the examples and theory presented above, we conclude that the size of the maximal population of IrisCode can be analyzed by stating the problem as a basic Rate-Distortion / channel coding problem. Within this framework, the task is to find the maximal possible population covered by IrisCode under a constraint on the distance between codewords. Codewords should be separated by more than the minimum normalized HD, which is attributed to noise and distortions present in iris codewords.
The size of the enrolled IrisCode population can be increased by moving the decision threshold far to the left. Doing so requires IrisCode templates of high quality, which can be readily achieved by modern image acquisition and data processing technologies. In conclusion, with the application of the theory presented above, researchers can better understand what maximum population is achievable with zero error for their iris dataset based on its quality.