A New Helper Data Scheme for Soft-Decision Decoding of Binary Physical Unclonable Functions

Physical unclonable functions (PUFs) exploit randomness in the hardware for the derivation of cryptographic keys. In the literature, usually the readout is two-level quantized and hard-decision channel decoding is used to stabilize the extracted key. In this paper, we assess soft-decision decoding of binary PUFs. It is well known in the literature on channel coding that soft-decision decoding provides significant gains over hard-decision decoding since reliability information about the symbols is utilized. The PUF readout process is interpreted as digital transmission over a noisy channel, the respective capacity is calculated, and the optimum decoding metric is derived. In addition, we propose an augmented helper data scheme which is suited for soft-decision decoding. This scheme utilizes the fact that operations on the analog readout values are possible, opposed to operations on hard-decided binary symbols in classical PUFs. The security of the new scheme is proven and a possible realization is discussed. The performance is covered by numerical simulations and by applying the scheme to measurement data from FPGA implementations of ring oscillator PUFs.


I. INTRODUCTION
Physical unclonable functions (PUFs) are hardware primitives that can be used to securely generate and store cryptographic keys. Randomness that occurs from uncontrollable variations in manufacturing processes of physical objects is exploited to extract a response from the hardware, which classically is a binary sequence that is unique for each PUF. Based on the response, a key can be derived. Since the exploited randomness is static over the object's lifetime, a key can be reproduced at any time. Hence, an implicit key storage is implemented, thereby avoiding additional cost and chip area, and increasing the security compared to a protected non-volatile memory for key storage. Since reproducing the key might be erroneous due to environmental effects like changing temperature or supply voltage, channel coding has to be applied to guarantee stable keys.
Strategies for channel decoding differentiate between hard-decision decoding and soft-decision decoding. In hard-decision decoding, tentative decisions are produced by a threshold device (i.e., a quantization operation). Based on these "hard" symbols, the decoder aims to estimate the transmitted codeword. In contrast, in soft-decision decoding reliability information about the symbols is present or can be extracted; these "soft" values are utilized in the decoder. In practical schemes, the reliability information is often expressed as so-called log-likelihood ratios.
The associate editor coordinating the review of this manuscript and approving it for publication was Jin Sha.
Traditionally, PUFs employ hard-decision decoding; the PUF readout is quantized (threshold operation) and all further operations are based on these quantized symbols. To enable decoding, during initialization, helper data is generated. Via the helper data, the original PUF response is transferred to a codeword of the desired error-correcting code. During reproduction, the readout may differ from the original PUF response. However, applying the helper data, the response is transferred to the form of codeword plus superimposed error word. If the error word has a small enough Hamming weight, decoding will be possible, cf., e.g., [1]-[4]. Note that in the classical setting, responses, helper data, and codewords are all assumed to be binary.
The concept of soft-decision decoding has also been transferred to PUFs. Essentially two methods for gathering soft information from PUF measurements have gained interest in the literature. First, reliability information about the individual PUF cells can be obtained by repeatedly extracting the binary PUF response during initialization and evaluating the fraction of ones for each position. This reliability information can be used to calculate the decoding metric, e.g., [5]-[7], or to improve the channel by only using highly reliable response bits, e.g., [8], [9]. Second, instead of deriving a quantized binary response, depending on the PUF construction, real numbers can directly be extracted. For example, real-valued frequency differences of ring oscillators may be exploited, cf. [8].
In the present paper, we follow the second line of work and deal with soft-decision decoding of binary PUFs. To that end, we interpret the PUF readout process as digital transmission over a noisy channel and calculate the respective capacity. In addition, the optimum decoding metric is derived. Based on these information-theoretic considerations, the code rates can be chosen. Moreover, we propose an augmented helper data scheme which is suited for soft-decision decoding. This scheme utilizes the fact that operations on the analog (nonquantized) readout values are possible, opposed to operations on hard-decided binary symbols in classical PUFs. The security of the new scheme is proven and a possible realization is discussed. The performance is covered by numerical simulations and by applying the scheme to measurement data from FPGA implementations of ring oscillator PUFs.
PUFs are usually categorized into weak and strong PUFs. We primarily address weak PUFs (typically used for key generation, i.e., a unique fingerprint is delivered based on the properties of the hardware), although the discussed concepts can be translated to strong PUFs (for example used for authentication, i.e., the unique response is additionally dependent on a challenge). In addition, we consider the coding/decoding scheme only; we neither study attacks, as, e.g., done in [10], [11], nor address countermeasures as, e.g., done in configurable ROPUFs or transformer PUFs, cf. [12].
The paper is organized as follows: In Sec. II, PUFs are reviewed and the use of soft information is discussed. The capacity is calculated and the optimum soft-decision decoding metric is derived. A new helper data scheme which exploits the degrees of freedom additionally present when operating on the analog readout is presented in Sec. III. Its security is proven and its free parameters are optimized for achieving best performance. Numerical examples are given. The paper closes in Sec. IV with a brief summary.

II. PHYSICAL UNCLONABLE FUNCTIONS EMPLOYING SOFT-DECISION DECODING
In this section we review classical PUFs and study those which directly employ an analog quantity which is extracted from the hardware. Based on an information-theoretic analysis (in particular considering the capacity when interpreting the readout process of PUFs as digital transmission over a noisy channel), we compare the potential performance when using hard decisions and soft information, respectively.

A. RING OSCILLATOR PUFs
PUFs, introduced in [13], exploit intrinsic randomness that occurs due to variations in the manufacturing process of physical items. Since the extracted randomness is usually static over the lifetime of the PUF, keys can be regenerated when required by a cryptosystem, and hence, no non-volatile, protected memory is needed to implement a key storage. Thereby PUFs replace pseudo random number generators and non-volatile memories to provide secure key generation and storage, respectively.
Essentially, the randomness in PUFs is either extracted from delays in electronic components or from the behavior of memory cells. We focus on ring oscillator PUFs (ROPUFs), the most prominent member of the group of delay-based PUFs, cf. [14], [15]. A ring oscillator (RO) is a loop consisting of an odd number of inverters. If a signal propagates through the RO, it oscillates with a frequency whose actual value depends on the random delays in its inverters and wires. Fig. 1 visualizes the structure of a ROPUF. In a PUF implementation, pairs of ROs are selected by a multiplexer. The frequencies of the ROs are measured by counters and the frequency difference $r_{\mathrm{diff}}$ is calculated. In the classical description of ROPUFs, depending on the sign of $r_{\mathrm{diff}}$, either the binary symbol 0 or 1 is derived (footnote 1). The symbols of $n$ RO pairs are combined into a vector $\boldsymbol{\mathfrak{r}} = [\mathfrak{r}_1, \ldots, \mathfrak{r}_n]$ which establishes the extracted information.
When re-extracting this information in a reproduction phase, errors might occur due to variations in the environmental conditions (e.g., temperature or supply voltage), i.e., the word $\boldsymbol{\mathfrak{r}}$ will differ from the word $\boldsymbol{\mathfrak{r}}_{\mathrm{ref}}$ derived during initialization. In the literature, this behavior is traditionally modeled by $\boldsymbol{\mathfrak{r}}_{\mathrm{ref}}$, the nominal/reference readout, being transmitted over a binary symmetric channel (BSC); its bit error probability is often approximated by $p \approx 0.14$, e.g., [2], [16], [17].
Hence, in order to guarantee a stable result, channel coding has to be employed to correct the readout errors. However, the nominal readout $\boldsymbol{\mathfrak{r}}_{\mathrm{ref}}$ is in general not a valid codeword of a given binary channel code. This problem is solved by employing a so-called helper data scheme (HDS); the one most often used in practical applications is the code-offset algorithm according to [18]-[20].
There, in an initialization phase which is carried out in a secure environment, a $k$-bit message $\boldsymbol{\mathfrak{m}}$ is drawn uniformly at random and encoded (ENC) into a codeword $\boldsymbol{\mathfrak{c}} = \mathrm{ENC}(\boldsymbol{\mathfrak{m}})$ employing a given binary error-correcting code $\mathcal{C}$. The word $\boldsymbol{\mathfrak{h}} = \boldsymbol{\mathfrak{r}}_{\mathrm{ref}} \oplus \boldsymbol{\mathfrak{c}}$ establishes the helper data ($\oplus$: addition over the binary field $\mathbb{F}_2$); it may be stored publicly. In the reproduction phase, the (erroneous) word $\boldsymbol{\mathfrak{r}}$ is extracted from the hardware and $\boldsymbol{\mathfrak{y}} = \boldsymbol{\mathfrak{r}} \oplus \boldsymbol{\mathfrak{h}}$ is calculated. By construction, $\boldsymbol{\mathfrak{y}} = \boldsymbol{\mathfrak{c}} \oplus (\boldsymbol{\mathfrak{r}}_{\mathrm{ref}} \oplus \boldsymbol{\mathfrak{r}}) = \boldsymbol{\mathfrak{c}} \oplus \boldsymbol{\mathfrak{e}}$, i.e., the codeword plus an additive error given by the deviation between $\boldsymbol{\mathfrak{r}}_{\mathrm{ref}}$ and $\boldsymbol{\mathfrak{r}}$ is present. If the Hamming weight of $\boldsymbol{\mathfrak{e}}$ does not exceed the error-correcting capability of the code, a (hard-decision) channel decoder is able to recover the correct codeword $\boldsymbol{\mathfrak{c}}$ and, thus, the associated message $\boldsymbol{\mathfrak{m}}$.
Footnote 1, Notation: We distinguish between quantities from the set of real numbers $\mathbb{R}$ (conventional font) and variables over the binary field $\mathbb{F}_2$ (fraktur font, e.g., $\mathfrak{m}_i = 0$). In addition, we distinguish between scalars (normal font) and vectors (bold font). Real vectors, e.g., $\mathbf{r} = [r_1, \ldots, r_n]^{\mathsf{T}}$, are column vectors; however, as usual in channel coding, vectors over the finite field, e.g., $\boldsymbol{\mathfrak{c}} = [\mathfrak{c}_1, \ldots, \mathfrak{c}_n]$, are row vectors.
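The code-offset construction described above can be illustrated with a small sketch. It is not the code used in the paper: a toy repetition code with majority-vote decoding stands in for the generic encoder ENC, and all function names are hypothetical.

```python
import random


def encode_repetition(message, reps):
    """Repeat every message bit `reps` times (toy stand-in for ENC)."""
    return [bit for bit in message for _ in range(reps)]


def decode_repetition(word, reps):
    """Majority vote over each block of `reps` bits (hard-decision decoding)."""
    return [int(sum(word[i:i + reps]) > reps // 2)
            for i in range(0, len(word), reps)]


def xor(a, b):
    """Element-wise addition over the binary field F_2."""
    return [x ^ y for x, y in zip(a, b)]


def initialize(r_ref, k, reps, rng):
    """Draw a random k-bit message; return it with the helper data h = r_ref XOR c."""
    message = [rng.randrange(2) for _ in range(k)]
    codeword = encode_repetition(message, reps)
    return message, xor(r_ref, codeword)


def reproduce(readout, helper, reps):
    """Form y = r XOR h = c XOR e and decode it back to the message."""
    return decode_repetition(xor(readout, helper), reps)


if __name__ == "__main__":
    rng = random.Random(1)
    r_ref = [rng.randrange(2) for _ in range(20)]   # reference readout, n = 20
    message, helper = initialize(r_ref, k=4, reps=5, rng=rng)
    noisy = list(r_ref)
    noisy[3] ^= 1                                   # one readout error
    print(reproduce(noisy, helper, reps=5) == message)
```

With five repetitions per bit, up to two readout errors per block are corrected; a practical scheme would use a much stronger code.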

B. SOFT-DECISION DECODING
Alternatively, as done in [21], the real-valued frequency differences $r_{\mathrm{diff}}$ can be utilized directly in a soft-decision decoder. It is well known in the literature that soft-decision decoding provides significant gains over hard-decision decoding since reliability information about the symbols is utilized.
To that end, we suitably normalize the (analog) frequency difference $r_{\mathrm{diff}}$ by multiplying with a scaling factor $c$ (additionally a possible mean is removed). The normalized, real-valued readout symbols are denoted as $r$. As shown in [21] for ROPUFs, the readout vector which combines $n$ PUF cells is very well modeled by
$$ \mathbf{r} = \mathbf{r}_{\mathrm{ref}} + \mathbf{e}_{\mathrm{ref}}, \tag{1} $$
where both the reference/nominal readout $\mathbf{r}_{\mathrm{ref}}$ and the error word $\mathbf{e}_{\mathrm{ref}}$ are zero-mean Gaussian distributed. W.l.o.g. the normalization factor $c$ can be chosen such that $\mathbf{r}_{\mathrm{ref}}$ has unit variance (per element, i.e., $\sigma_x^2 = 1$). The error vector $\mathbf{e}_{\mathrm{ref}}$ has variance $\sigma_e^2$ per element. As shown in [21], over a wide range of temperatures the error variance is not larger than $\sigma_e^2 = 0.01$; the corresponding signal-to-noise ratio (SNR) is thus at least $10 \log_{10}(1/\sigma_e^2) = 20$ dB. We expect the scheme to produce reliable outputs for SNRs larger than this worst case.
The code-offset algorithm discussed above can easily be adapted to the case of soft readout [21]. To that end, we define a mapping of the binary (finite-field) symbols "0" and "1" to the real-valued elements of a binary phase-shift keying (BPSK) alphabet. The mapping is done according to
$$ \mathcal{M}(\mathfrak{c}) = (-1)^{\mathfrak{c}} = \begin{cases} +1, & \mathfrak{c} = 0 \\ -1, & \mathfrak{c} = 1. \end{cases} \tag{2} $$
Since $\mathcal{M}(\mathfrak{c}_1 \oplus \mathfrak{c}_2) = \mathcal{M}(\mathfrak{c}_1) \cdot \mathcal{M}(\mathfrak{c}_2)$, a homomorphism between addition over $\mathbb{F}_2$ and multiplication of BPSK symbols exists; the addition of "1" over $\mathbb{F}_2$ is equivalently done by a sign flip (multiplication with −1) over the real numbers.
As the sign of the readout represents the extracted information, the sign has to be adjusted such that it matches the sign of the desired codeword c which is mapped (element-wise) to a BPSK constellation, i.e., a = M(c). The sign flip for the entire word can be represented by a signed identity matrix S (±1 on the main diagonal; zero else). In the initialization phase, this matrix (which is equivalent to h) is calculated and stored publicly.
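In the reproduction phase,
$$ \mathbf{y} = \mathbf{S}\mathbf{r} = \mathbf{S}\mathbf{r}_{\mathrm{ref}} + \mathbf{S}\mathbf{e}_{\mathrm{ref}} = \mathbf{x} + \mathbf{e} \tag{3} $$
is calculated. As can be seen, the useful (error-free) signal $\mathbf{x} = \mathbf{S}\mathbf{r}_{\mathrm{ref}}$ is distorted by the error $\mathbf{e} = \mathbf{S}\mathbf{e}_{\mathrm{ref}}$. Both quantities are still zero-mean Gaussian distributed with variances $\sigma_x^2 = 1$ and $\sigma_e^2$, respectively.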
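The soft-readout variant of the code-offset algorithm can be sketched as follows. This is only an illustration under stated assumptions: the diagonal of the signed identity matrix S is stored as a list of ±1 values, the helper names are hypothetical, and an exactly zero readout value (a probability-zero event for Gaussian readouts) is counted as positive.

```python
import random


def bpsk(bit):
    """Map binary 0 to +1 and binary 1 to -1."""
    return 1 - 2 * bit


def sign(value):
    """Sign with zero counted as positive (region x >= 0 represents binary 0)."""
    return 1 if value >= 0 else -1


def sign_flip_helper(r_ref, codeword):
    """Diagonal of S: flip a cell whenever its sign disagrees with the mapped codeword."""
    return [sign(r) * bpsk(c) for r, c in zip(r_ref, codeword)]


def apply_helper(s_diag, readout):
    """Compute y = S r for a diagonal S given by its +/-1 entries."""
    return [s * r for s, r in zip(s_diag, readout)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, sigma_e = 8, 0.1                                   # sigma_e^2 = 0.01, i.e. 20 dB SNR
    r_ref = [rng.gauss(0.0, 1.0) for _ in range(n)]       # reference readout
    codeword = [rng.randrange(2) for _ in range(n)]       # stands in for ENC(m)
    s_diag = sign_flip_helper(r_ref, codeword)
    noisy = [r + rng.gauss(0.0, sigma_e) for r in r_ref]  # r = r_ref + e_ref
    y = apply_helper(s_diag, noisy)                       # y = x + e
    print([round(v, 2) for v in y])
```

The values in y keep their magnitudes as reliability information, which is exactly what a soft-decision decoder exploits.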

C. SIGNAL/CHANNEL MODEL AND CAPACITY
1) STATISTICS OF THE SIGNALS
We start with the model (3) of the processed PUF readout (after application of the helper data), and aim at deriving the optimum decoding metric and the capacity of the scheme.
To that end, in Fig. 2, the probability density function (pdf) of the useful (error-free) PUF readout $x$ is depicted; it is zero-mean, unit-variance Gaussian distributed. In BPSK, the binary 0 is represented by the real number (signal point) +1; the binary 1 by −1. Each binary symbol is thus represented by a unique number; the pdf of the BPSK data symbols is discrete (gray Diracs in Fig. 2). When considering soft-output PUFs, this unique representation is no longer present. Instead, any number from the region $\mathcal{R}_0 = \{x \mid x \ge 0\}$ represents a binary 0 and any number from the region $\mathcal{R}_1 = \{x \mid x < 0\}$ represents a binary 1, i.e., since both regions are used with probability 1/2, we have
$$ f_x(x \mid \mathfrak{c} = 0) = \begin{cases} \frac{2}{\sqrt{2\pi}}\, e^{-x^2/2}, & x \ge 0 \\ 0, & x < 0, \end{cases} \qquad f_x(x \mid \mathfrak{c} = 1) = f_x(-x \mid \mathfrak{c} = 0). $$
The binary information is thus represented by any number from a region. The number which is actually present in a PUF cell can be seen as randomly drawn from the respective region following the Gaussian distribution within that region. For performance evaluation, the point of view is thus reversed compared to the operations in a PUF. Instead of stating that the PUF readout is positive or negative and thus gives a 0 or 1, we pretend to communicate a 0 or a 1 and select the actual physical representation randomly from the given region and according to the given statistics. This can be seen as randomness being present at the transmitter, which is an important concept in physical-layer security [22].
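The selected symbol $x$ is transmitted over an AWGN channel, i.e., zero-mean Gaussian noise with variance $\sigma_e^2$, which is independent of $x$, is superimposed. After some manipulations, the conditional pdfs of the receive signal $y$ given the binary symbol to be communicated can be calculated to be
$$ f_y(y \mid \mathfrak{c} = 0) = f_x(y \mid \mathfrak{c} = 0) * f_e(y) = \frac{2}{\sqrt{2\pi}\,\sigma_y}\, e^{-y^2/(2\sigma_y^2)}\; Q\!\left(-\frac{y}{\sigma_e \sigma_y}\right), \tag{6} $$
$$ f_y(y \mid \mathfrak{c} = 1) = f_x(y \mid \mathfrak{c} = 1) * f_e(y) = \frac{2}{\sqrt{2\pi}\,\sigma_y}\, e^{-y^2/(2\sigma_y^2)}\; Q\!\left(+\frac{y}{\sigma_e \sigma_y}\right), \tag{7} $$
where "$*$" denotes convolution and $Q(x)$ is the complementary Gaussian integral function
$$ Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, \mathrm{d}t. $$
Here, we use the abbreviation $\sigma_y^2 = 1 + \sigma_e^2$.

2) CAPACITY
Knowing the pdfs of channel input and output, we are able to calculate the capacity (expressed in bit/PUF cell, or in short bit/cell) of this channel, i.e., the mutual information $I(\mathfrak{c}; y)$ between the imagined binary channel input $\mathfrak{c}$ and the real-valued channel output $y$. From basic information theory, e.g., [23], we have (integration is done over the entire real line)
$$ I(\mathfrak{c}; y) = h(y) - h(y \mid \mathfrak{c}) = -\int f_y(y) \log_2 f_y(y)\, \mathrm{d}y + \int f_y(y \mid \mathfrak{c} = 0) \log_2 f_y(y \mid \mathfrak{c} = 0)\, \mathrm{d}y; $$
here $h(\cdot)$ denotes differential entropy. Note that, due to symmetry, $h(y \mid \mathfrak{c} = 0) = h(y \mid \mathfrak{c} = 1)$. Considering that $e$ is Gaussian distributed and, thus, has differential entropy $h(e) = \frac{1}{2} \log_2(2\pi \mathrm{e}\, \sigma_e^2)$, and $y$ is Gaussian distributed with variance $\sigma_y^2 = 1 + \sigma_e^2$, we arrive at
$$ C = \big(h(y) - h(e)\big) - \big(h(y \mid \mathfrak{c} = 0) - h(e)\big) = C_{\mathrm{Gauss}} - C_{\mathrm{half\,Gauss}}, $$
where $C_{\mathrm{Gauss}} = \frac{1}{2}\log_2(1 + 1/\sigma_e^2)$ denotes the capacity of the AWGN channel with Gaussian input and $C_{\mathrm{half\,Gauss}}$ the respective capacity when the input is half-normal distributed. The capacity is thus given as the difference between the capacity when using the entire distribution and that of using only the positive (or negative) half.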
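The capacity described above can be evaluated numerically. The following sketch assumes the closed form of the conditional pdf that follows from convolving a half-normal pdf (unit-variance Gaussian restricted to x ≥ 0 and scaled by 2) with the Gaussian noise pdf; the function names, the integration range, and the step count are illustrative choices rather than values from the paper.

```python
import math


def q_func(x):
    """Complementary Gaussian integral Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pdf_y_given_zero(y, var_e):
    """Conditional pdf f_y(y | c = 0) for unit-variance readout and noise variance var_e."""
    sigma_y = math.sqrt(1.0 + var_e)
    gauss = math.exp(-y * y / (2.0 * sigma_y ** 2)) / (math.sqrt(2.0 * math.pi) * sigma_y)
    return 2.0 * gauss * q_func(-y / (math.sqrt(var_e) * sigma_y))


def capacity_soft(snr_db, steps=20000):
    """C = h(y) - h(y | c = 0) in bit/cell, via midpoint-rule integration."""
    var_e = 10.0 ** (-snr_db / 10.0)
    sigma_y = math.sqrt(1.0 + var_e)
    lo, hi = -10.0 * sigma_y, 10.0 * sigma_y
    dy = (hi - lo) / steps
    h_cond = 0.0
    for i in range(steps):
        f = pdf_y_given_zero(lo + (i + 0.5) * dy, var_e)
        if f > 0.0:                       # skip underflowed tail values
            h_cond -= f * math.log2(f) * dy
    h_y = 0.5 * math.log2(2.0 * math.pi * math.e * sigma_y ** 2)
    return h_y - h_cond


if __name__ == "__main__":
    for snr in (0.0, 10.0, 20.0, 30.0):
        print(snr, round(capacity_soft(snr), 3))
```

The result approaches 1 bit/cell for high SNR, because the differential entropy of a half-normal variable is exactly one bit lower than that of the full Gaussian.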

3) DECODING METRIC
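In soft-decision decoding the reliability is often expressed as log-likelihood ratio (LLR), which, for equiprobable binary symbols, is given by
$$ \mathrm{LLR}(y) = \ln \frac{f_y(y \mid \mathfrak{c} = 0)}{f_y(y \mid \mathfrak{c} = 1)}. $$
Using (6) and (7), and with $\sigma_y^2 = 1 + \sigma_e^2$, we have
$$ \mathrm{LLR}(y) = \ln \frac{Q\big(-y/(\sigma_e \sigma_y)\big)}{Q\big(+y/(\sigma_e \sigma_y)\big)}. \tag{13} $$
Please note that in case of BPSK over the AWGN channel the LLR would read
$$ \mathrm{LLR}_{\mathrm{BPSK}}(y) = \frac{2}{\sigma_e^2}\, y. \tag{14} $$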
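The two LLR calculations can be compared with a short sketch. It assumes that the conditional pdfs are proportional to Q(∓y/(σ_e σ_y)) with σ_y² = 1 + σ_e², so that the matched LLR is the logarithm of a ratio of two Q-function values; the clipping to a magnitude of 100 mirrors the value mentioned for the simulations later in the text, and the function names are hypothetical.

```python
import math


def q_func(x):
    """Complementary Gaussian integral Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def llr_gaussian_readout(y, var_e, clip=100.0):
    """LLR matched to the Gaussian readout, clipped to +/- clip."""
    arg = y / (math.sqrt(var_e) * math.sqrt(1.0 + var_e))
    num, den = q_func(-arg), q_func(arg)
    if den == 0.0:          # Q underflowed: y is far on the positive side
        return clip
    if num == 0.0:          # Q underflowed: y is far on the negative side
        return -clip
    return max(-clip, min(clip, math.log(num) - math.log(den)))


def llr_bpsk(y, var_e, clip=100.0):
    """LLR for BPSK over the AWGN channel, 2 y / sigma_e^2, clipped."""
    return max(-clip, min(clip, 2.0 * y / var_e))


if __name__ == "__main__":
    for y in (-0.3, -0.05, 0.0, 0.05, 0.3):
        print(y, round(llr_gaussian_readout(y, 0.01), 2), round(llr_bpsk(y, 0.01), 2))
```

Both metrics agree in sign, so they lead to the same hard decision; they differ only in how much reliability they assign to a given readout value.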

4) HARD-DECISION DECODING
For comparison, we also consider hard-decision decoding.
Here, the decoder is fed with the hard decisions and operates on the Hamming metric.
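The hard decisions are characterized by their bit error ratio (before decoding). Due to symmetry, the end-to-end model for the channel including the receiver-side quantization is given by a binary symmetric channel (BSC). Its bit error ratio is given by
$$ p = \Pr\{\mathrm{sign}(y) \ne \mathrm{sign}(x)\} = 2 \int_0^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, Q\!\left(\frac{x}{\sigma_e}\right) \mathrm{d}x = \frac{1}{\pi} \arctan(\sigma_e). $$
The capacity of this BSC is then
$$ C_{\mathrm{HD}} = 1 - H_2(p), $$
where the binary entropy function is defined as
$$ H_2(p) = -p \log_2 p - (1 - p) \log_2(1 - p). $$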
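As a worked example of the hard-decision case, the sketch below evaluates the bit error ratio and the BSC capacity for a given SNR. It relies on the standard identity that two zero-mean jointly Gaussian variables with correlation coefficient ρ differ in sign with probability arccos(ρ)/π, which for a unit-variance readout and noise variance σ_e² equals arctan(σ_e)/π; the function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Binary entropy function H_2(p) in bit."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bit_error_ratio(snr_db):
    """Probability that the noise flips the sign of a unit-variance Gaussian readout."""
    sigma_e = 10.0 ** (-snr_db / 20.0)
    return math.atan(sigma_e) / math.pi


def capacity_hard(snr_db):
    """Capacity of the resulting BSC in bit/cell."""
    return 1.0 - binary_entropy(bit_error_ratio(snr_db))


if __name__ == "__main__":
    for snr in (10.0, 15.0, 20.0):
        print(snr, round(bit_error_ratio(snr), 4), round(capacity_hard(snr), 3))
```

At the worst-case SNR of 20 dB quoted earlier, this gives a bit error ratio of roughly 3 % before decoding.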

D. NUMERICAL EXAMPLE
In Fig. 3, the capacities are plotted over the signal-to-noise ratio. Besides the capacity in case of Gaussian readout, that of BPSK is displayed. The solid lines are valid when utilizing the analog channel output; the dashed lines when only hard decisions are used. As can be seen, for a fixed SNR, the Gaussian readout provides a much lower capacity than BPSK signaling in conventional digital transmission. Moreover, for a Gaussian readout, hard decision causes a much more significant loss than in the case of BPSK. This means that PUFs utilizing soft-decision decoding enable a large gain over the conventional hard-decision decoding design. For example, for a desired capacity (rate of the code) of C = 0.7 bit/cell, the curves for Gaussian readout are spaced by approximately 5 dB; soft-decision decoding is possible at a 5 dB lower SNR than hard-decision decoding.

III. NEW HELPER DATA SCHEME
In this section, we present an augmented helper data scheme for soft-decision decoding. The general principle is explained, its security is analyzed, and a particular realization is proposed.

A. BASIC PRINCIPLE
In classical hard-decision binary PUFs the symbols "0" and "1" are flipped in such a way that the noise-free (reference) readout is a randomly chosen valid codeword. Considering the analog readout, as discussed above, a flipping of the sign of the real-valued symbols is the corresponding adequate strategy. However, operating on the analog readout, many more degrees of freedom than for hard-decision binary PUFs are available.
Consequently, in addition to the sign flipping via the signed identity matrix $\mathbf{S}$, we propose a further addition of a suitably chosen real-valued word $\mathbf{d} \in \mathbb{R}^n$; subsequently we call this word "dither". In the initialization phase both components are selected and establish the helper data $\mathcal{H} = \{\mathbf{S}, \mathbf{d}\}$. The situation of the reproduction phase and the interpretation as communication scheme are depicted in Fig. 4.
As in the binary hard-decision case, the component $\mathbf{S}$ of the helper data guarantees that $\mathbf{r}_{\mathrm{c}}$ is a valid (mapped) codeword. If the component $\mathbf{d}$ were not present (or $\mathbf{d} = \mathbf{0}$), the reproduction task would be to decode $\mathbf{r}_{\mathrm{c}}$ in additive Gaussian noise $\mathbf{e}$. Employing the real-valued word $\mathbf{d}$, the useful signal which has to be decoded is $\mathbf{x} = \mathbf{r}_{\mathrm{c}} + \mathbf{d}$; choosing $\mathbf{d}$ suitably, the pdf of $\mathbf{x}$ can be shaped and, thus, decoding may be done more reliably.
However, when choosing $\mathbf{d}$, two contradicting demands have to be taken into account. On the one hand, the performance of the decoder should be improved. Hence, $\mathbf{d}$ has to be dependent on $\mathbf{r}_{\mathrm{c}}$; their sum should be decodable more reliably than $\mathbf{r}_{\mathrm{c}}$ alone. On the other hand, as the helper data, and thus $\mathbf{d}$, is publicly available, it must not reveal any information about $\mathbf{r}_{\mathrm{c}}$.
We first analyze the security of this scheme and then present a possible choice of d which fulfills the contradicting demands.
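
B. SECURITY OF THE HELPER DATA SCHEME
As described above, the $k$-bit message $\boldsymbol{\mathfrak{m}}$ is chosen uniformly at random. Encoding and mapping to BPSK symbols is a one-to-one function and gives $\mathbf{a} = \mathcal{M}(\mathrm{ENC}(\boldsymbol{\mathfrak{m}}))$. The helper data $\mathbf{S}$ is chosen such that $\mathrm{sign}(\mathbf{S}\mathbf{r}_{\mathrm{ref}}) = \mathrm{sign}(\mathbf{r}_{\mathrm{c}}) = \mathbf{a}$ is the chosen valid codeword; the part $\mathbf{d}$ of the helper data is generated suitably.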
The last equation is valid since S is irrelevant when knowing a. Hence, we have the intuitive result that a possible leakage is the sum of the leakage via S and that via d.
Finally, the total leakage reduces to the leakage $L_d$ due to the word $\mathbf{d}$: when $L_d$ can be made zero, the entire leakage of the scheme will be zero.

C. SCHEME WITH DITHER
We now present a first approach for choosing the word d.
To that end, two extreme cases can be observed. First, if $\mathbf{d} = \mathbf{0}$, regardless of the other quantities, no modification of $\mathbf{r}_{\mathrm{c}}$ is done to obtain $\mathbf{x}$. Thus, $\mathbf{x}$ is Gaussian distributed and the initial situation is present. Clearly, no leakage is caused and $L_d = 0$.
Second, if $\mathbf{d} = \mathrm{sign}(\mathbf{r}_{\mathrm{c}}) - \mathbf{r}_{\mathrm{c}}$ is chosen, we have $\mathbf{x} = \mathbf{r}_{\mathrm{c}} + \mathbf{d} = \mathrm{sign}(\mathbf{r}_{\mathrm{c}}) \in \{\pm 1\}^n$; the situation of a BPSK transmission would be present. However, here a significant leakage is caused. In general, error-free decoding of the message $\boldsymbol{\mathfrak{m}}$ purely based on $\mathbf{d}$ would be possible.
Consequently, a suitable strategy has to lie between these extreme cases: $\mathbf{x}$ should be moved away from a Gaussian distribution to support decoding, but at the same time no (or only little) leakage should be caused. In other words, $\mathbf{d}$ should be selected based on $\mathbf{r}_{\mathrm{ref}}$, $\mathbf{S}$, and $\mathbf{a}$, such that the performance of the reconstruction is increased but the leakage is as small as possible. A possible procedure is to maximize the capacity of the AWGN channel with input $\mathbf{x}$ minus the leakage caused by the knowledge of $\mathbf{d}$.

1) DITHER AND PROBABILITY DENSITY FUNCTIONS
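Our proposal is as follows: As already done in [21], the sign of the readout $\mathbf{r}_{\mathrm{ref}}$ is flipped, i.e., $\mathbf{r}_{\mathrm{c}} = \mathbf{S}\mathbf{r}_{\mathrm{ref}}$ is generated ($\mathbf{S}$ is selected suitably), such that $\mathrm{sign}(\mathbf{r}_{\mathrm{c}})$ is a valid (mapped) codeword.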
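The new part $\mathbf{d}$ is generated as follows; all subsequent calculations are done individually per element of the word $\mathbf{r}_{\mathrm{c}}$. Let $u$ be a random variable, independent of all other variables, and uniformly distributed over the interval $[0, \mu]$, where $\mu$ is a free parameter to be optimized. Further, let $\nu$ be a given threshold which has to be optimized, too. Then, the considered element $d$ of the vector $\mathbf{d}$ is calculated as follows
$$ d = \begin{cases} +u \cdot \mathrm{sign}(r_{\mathrm{c}}), & |r_{\mathrm{c}}| < \nu \\ -u \cdot \mathrm{sign}(r_{\mathrm{c}}), & |r_{\mathrm{c}}| \ge \nu, \end{cases} $$
which means that
$$ x = r_{\mathrm{c}} + d = \begin{cases} \mathrm{sign}(r_{\mathrm{c}}) \cdot (|r_{\mathrm{c}}| + u), & |r_{\mathrm{c}}| < \nu \\ \mathrm{sign}(r_{\mathrm{c}}) \cdot (|r_{\mathrm{c}}| - u), & |r_{\mathrm{c}}| \ge \nu. \end{cases} $$
We call $\mathbf{d}$ "dither" as the word $\mathbf{r}_{\mathrm{c}}$ is dithered; the elements of $\mathbf{x}$ jitter around the original values of $\mathbf{r}_{\mathrm{c}}$. The use of a dither is a well-known concept in digital transmission and channel decoding, e.g., [24], [25]. The support of the joint pdf of $r_{\mathrm{c}}$ and $d$ is sketched in Fig. 5. This joint pdf has a Gaussian marginal pdf over $r_{\mathrm{c}}$ and a uniform marginal pdf (interval $[-\mu, \mu]$) over $d$.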
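A per-element dither generation consistent with the description above can be sketched as follows. The rule used here (push small magnitudes outward, pull large magnitudes inward, with the threshold ν separating the two cases) is an assumption reconstructed from the stated properties: a uniform marginal of d over [−µ, µ], and a sign of d that disagrees with the sign of the readout with probability 2Q(ν). The function names are hypothetical.

```python
import random


def sign(value):
    """Sign with zero counted as positive."""
    return 1 if value >= 0 else -1


def dither(r_c, mu, nu, rng):
    """Draw one dither element for the sign-corrected reference value r_c."""
    u = rng.uniform(0.0, mu)              # u is uniform on [0, mu]
    if abs(r_c) < nu:
        return sign(r_c) * u              # inner region: move away from zero
    return -sign(r_c) * u                 # outer region: move towards zero


if __name__ == "__main__":
    rng = random.Random(4)
    mu, nu = 0.6, 0.674                   # no-leakage threshold, uniform dither
    for _ in range(5):
        r_c = rng.gauss(0.0, 1.0)
        d = dither(r_c, mu, nu, rng)
        print(round(r_c, 3), round(d, 3), round(r_c + d, 3))
```

As long as µ < ν, the sum x = r_c + d keeps the sign of r_c, so the dither reshapes the pdf of the useful signal without changing the encoded bit.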

Please note that
$$ \Pr\{\mathrm{sign}(d) \ne \mathrm{sign}(r_{\mathrm{c}})\} = \Pr\{|r_{\mathrm{c}}| \ge \nu\} = 2\, Q(\nu). $$
When knowing the sign of $d$, the entropy of the sign of $r_{\mathrm{c}}$ is thus only $H_2(2\,Q(\nu))$. Hence, the leakage (per symbol) caused by knowing $d$ is
$$ L_d = 1 - H_2(2\, Q(\nu)). \tag{29} $$
If $\nu = 0.674$, we have $2\,Q(0.674) = 1/2$; here, knowing $d$ does not give any information about the sign of $r_{\mathrm{c}}$ which carries the message; no leakage is caused. The support of the joint pdf of $r_{\mathrm{c}}$ and $x = r_{\mathrm{c}} + d$ is sketched in Fig. 6. As above, this joint pdf has a Gaussian marginal pdf over $r_{\mathrm{c}}$ and (for fixed $r_{\mathrm{c}}$) is uniform in $x$ direction.
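The per-symbol leakage can be checked numerically for the two thresholds that appear in this paper. The sketch below simply evaluates 1 − H_2(2Q(ν)); the function names are hypothetical.

```python
import math


def q_func(x):
    """Complementary Gaussian integral Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def binary_entropy(p):
    """Binary entropy function H_2(p) in bit."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def leakage_per_symbol(nu):
    """Leakage about the sign of r_c caused by knowing the sign of d, in bit/symbol."""
    return 1.0 - binary_entropy(2.0 * q_func(nu))


if __name__ == "__main__":
    for nu in (0.674, 0.849):
        print(nu, round(leakage_per_symbol(nu), 4))
```

For ν = 0.674 the leakage is numerically zero, while ν = 0.849 gives about 0.03 bit/symbol, in line with the value quoted for the simulations.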
Integrating this joint pdf over $r_{\mathrm{c}}$ gives the pdf of $x$ (marginal pdf). Anticipating the subsequent results, we may restrict ourselves to the range $\nu/2 \le \mu \le \nu$. For this case, the marginal pdf $f_x(x)$ is given by (30). The influence of the dither on the pdf of $x$ is visualized in Fig. 7. Via the dither, the Gaussian pdf of the readout (magenta) is driven towards the discrete pdf of a BPSK transmit signal (blue). Since the pdf has much less mass around the threshold $x = 0$, better performance can be expected.

2) OPTIMIZATION OF THE PARAMETERS
For best performance, the free parameters µ (amplitude) and ν (threshold) have to be optimized. If zero leakage is requested, ν = 0.674 has to be selected but still µ has to be adjusted.
The optimization can be done as follows: Given the pair µ and ν, the pdf of the transmit signal x is calculated via (30).
Having f x (x), the capacity C SD can be calculated numerically via the procedure explained in Sec. II-C by replacing the Gaussian pdf by the given one. Finally, considering the leakage for the specific choice of ν, the useful capacity C = C SD −n L d is obtained. For each desired useful capacity (equal to the rate of the code), the optimum pair µ and ν (for which the required SNR is minimum) can be determined.
The results are depicted in Fig. 8. There, the capacity in case of Gaussian readout is compared with the case of employing a dither. It is visible that significant gains can be achieved via a dither. For example, a desired capacity of C = 0.7 bit/cell is already guaranteed at an approximately 4 dB lower SNR. If ν = 0.674 is fixed such that no leakage is present, some (small) loss compared to the case when a leakage is allowed (but the useful capacity is maximized) has to be accepted. However, even the case of no leakage shows very good performance. For reference, the capacity curve of BPSK over the AWGN channel is shown as well. Using a dither, approximately half the distance between Gaussian readout and BPSK can be bridged.

3) OPTIMIZATION OF THE PDF
Up to now, the elements of the dither d are uniformly distributed. Obviously, the pdf of the initial random variable u can be optimized. Please note that the above statements on the leakage are valid regardless of the pdf of u. Hence, as long as ν = 0.674, no leakage is caused. For optimizing the pdf, we restrict ourselves to this case.
We now allow the pdf of $u$ to follow any function over the interval $[0, \mu]$ given by a polynomial of degree $p_u$, i.e., $f_u(u) = \sum_{l=0}^{p_u} \zeta_l u^l$. Thereby, the coefficients $\zeta_l$ have to be normalized such that $\int_0^{\mu} f_u(u)\, \mathrm{d}u = 1$. Unfortunately, in this case no closed-form expression for $f_x(x)$ can be given. However, via numerical optimization the coefficients $\zeta_l$ have been optimized for polynomials up to degree $p_u = 4$ to maximize capacity. Over a wide range of SNRs, the optimum is very close to a triangular distribution $f_u(u) = \frac{2}{\mu^2}\, u$ for $u \in [0, \mu]$, and zero else.
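Samples from the triangular pdf can be drawn by inverse-transform sampling, as the following sketch shows. It assumes the normalized form f_u(u) = 2u/µ² on [0, µ], whose distribution function is u²/µ², so that u = µ·√U for a uniform U on [0, 1); the function name is hypothetical.

```python
import math
import random


def sample_triangular_dither(mu, rng):
    """Draw u with pdf 2 u / mu^2 on [0, mu] via the inverse distribution function."""
    return mu * math.sqrt(rng.random())


if __name__ == "__main__":
    rng = random.Random(9)
    mu = 0.53                                       # value reported for the triangular dither
    samples = [sample_triangular_dither(mu, rng) for _ in range(100000)]
    print(round(sum(samples) / len(samples), 3))    # close to the mean 2 * mu / 3
```

Python's standard library offers the same distribution through `random.triangular(0.0, mu, mu)`, i.e., with the mode placed at the upper end of the interval.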

4) NUMERICAL SIMULATIONS
Finally, we present results from numerical simulations. Polar codes [26] are used as channel coding schemes as a low-complexity soft-input decoding algorithm is available, which can be efficiently implemented in hardware [27]. A code with rate R = 0.7 and codelength n = 1024 is presumed; the code is designed based on the Bhattacharyya parameter [26], [28]; for the rate-0.7 code we use the design SNR 5.74 dB (3 dB above the capacity limit of BPSK over the AWGN channel; blue curve in Fig. 8). Successive cancellation decoding is employed. The LLRs are clipped to a maximum magnitude of 100.
We plot the word error ratio (WER), i.e., the probability that the reproduced message $\hat{\boldsymbol{\mathfrak{m}}}$ differs from the actual message $\boldsymbol{\mathfrak{m}}$. As common for FPGA PUFs, a WER below $10^{-6}$ is desired. Fig. 9 compares the different situations. First, Gaussian readout without additional dither is considered (magenta). The solid lines are valid for the (correct) LLR calculation according to (13) and the dashed lines are valid for the LLR calculation (14) which is optimum for BPSK. Clearly, the LLR calculation matched to the specific situation gives better results than that for BPSK. However, the loss due to the much simpler calculation is not too large. For comparison (black), the WER when using hard decisions is given. The significant gain due to soft decisions is clearly visible. Employing the (uniformly distributed) dither, much better performance can be achieved. The cyan lines are valid for the parameters ν = 0.674 and µ = 0.6; here no leakage is present. The red lines hold for ν = 0.849 and µ = 0.807 (opt); as here a leakage of $1 - H_2(2Q(0.849)) = 0.032$ bit/symbol is present, the code rate is increased to R = 0.732. Even with this larger code rate, a better performance is achieved. Both LLR calculations (which here are both approximations) give almost the same performance.
The green curve is valid for the case of a triangularly distributed dither (here only the LLR calculation according to (13) has been used). Here, ν = 0.674 (no leakage) is chosen and µ = 0.53 results from the optimization. Employing this optimized pdf of the dither, a gain of approximately 1 dB can be achieved over the situation when the dither is uniformly distributed.
Finally, the potential performance of BPSK over an AWGN channel is shown (blue). Please notice that the relations predicted by the capacity arguments (cf. Fig. 8) are reflected in the word error rate curves. Due to the finite (relatively short) codelength, the absolute gaps are larger than the differences in capacity.

D. EVALUATION WITH ROPUF MEASUREMENT DATA
At the Institute of Microelectronics at Ulm University, 22 instances of FPGA ROPUFs have been implemented, cf. [15]. Out of the available ROs, n = 1024 disjoint pairs have been selected randomly. Each pair has been measured at various temperatures; we use the measurements from −10 °C to 50 °C (in steps of 10 °C). Temperature variations, voltage variations, and aging are the most relevant causes of readout deviations/errors. However, in contrast to the environmental temperature, the supply voltage can (and will) be stabilized by voltage regulators.
The reference readout $\mathbf{r}_{\mathrm{ref}}$ of each PUF instance is obtained by averaging 10 readouts at a temperature of 20 °C. For each instance, the message $\boldsymbol{\mathfrak{m}}$ is randomly selected and the helper data (sign-flipping matrix $\mathbf{S}$ and uniform dither $\mathbf{d}$) is generated as detailed in the present paper. We restrict ourselves to the no-leakage case (ν = 0.674). Polar codes with codelength n = 1024 are employed.
For verification, 10,000 readouts per PUF instance and per temperature are used. The helper data is applied to the verification readout and decoding is performed.
In Tab. 1, the number of erroneous PUF instances among the 22 instances and the number of word errors per erroneous instance are tabulated. The rate is chosen as R = 0.7 (with the optimum choice µ = 0.60), R = 0.8 (with µ = 0.57), and R = 0.9 bit/cell (with µ = 0.55). Results for the scheme with and without dither are given. For example, for R = 0.8 and the scheme with dither, errors occurred in 3 out of the 22 instances; 19 instances were free of errors over the entire temperature range. The 3 instances with errors showed one or two errors over the entire temperature range (7 temperatures) and all 10,000 readouts per temperature.
The improvement by using the proposed dither is clearly visible. Without dither, in all cases errors occur. For R = 0.7, the scheme with dither is able to deliver all messages free of errors. For higher rates, the scheme without dither completely fails; in the scheme with dither only rare errors occur. In summary, the proposed scheme is able to operate reliably over a wide range of temperatures and with rates up to R = 0.7 bit/cell.

IV. CONCLUSION
In this paper, soft-decision decoding in binary PUFs has been addressed. By interpreting the PUF readout process as digital transmission over a noisy channel with a specific (uncommon) pdf of the useful signal, the respective capacity has been derived. Moreover, the optimum decoding metric (in form of LLRs) has been given. In addition, a helper data scheme suited for soft-decision decoding has been studied. In particular, an augmentation by an additive dither word has been proposed. The security of this new approach has been proven. The performance is covered by numerical simulations and by evaluating measurement data from FPGA implementations of ring oscillator PUFs. Employing the scheme with dither, rates up to 0.7 bit/cell can be extracted reliably, which is a tremendous gain over state-of-the-art binary PUFs utilizing hard-decision decoding.
Even though we started from a ROPUF, the discussed principles can be applied to any PUF architecture where the analog source of randomness is accessible. In many situations, Gaussian signal and error models are reasonable assumptions in view of the law of large numbers.