A Probabilistic Approach for Spatio-Temporal Phase Unwrapping in Multi-Frequency Phase-Shift Coding

Multi-frequency techniques with temporally encoded pattern sequences are used in phase-measuring methods of 3D optical metrology to suppress phase noise but lead to ambiguities that can only be resolved by phase unwrapping. However, classical phase unwrapping methods do not use all the information to unwrap all measurements simultaneously and do not consider the periodicity of the phase, which can lead to errors. We present an approach that optimally reconstructs the phase on a pixel-by-pixel basis using a probabilistic modeling approach. The individual phase measurements are modeled using circular probability densities. Maximizing the compound density of all measurements yields the optimal decoding. Since the entire information of all phase measurements is simultaneously used and the wrapping of the phases is implicitly compensated, the reliability can be greatly increased. In addition, a spatio-temporal phase unwrapping is introduced by a probabilistic modeling of the local pixel neighborhoods. This leads to even higher robustness against noise than the conventional methods and thus to better measurement results.


I. INTRODUCTION
Optical metrology systems are used for high-precision surface measurement and the dense 3D reconstruction of complexly formed objects. One or more cameras thereby often observe the measurement space. In this context, depending on the measuring principle, it is necessary to carry out an optical position encoding to be able to reconstruct the surface through triangulation. The most prominent examples of such systems are the so-called structured light systems, in which the surface under test is optically encoded by means of a structured illumination [1]. For example, profilometric measurement techniques use a projector as a reference system to project one or more known reference patterns onto the target object and triangulate the surface using the pre-calibrated measurement setup [2]- [5]. Related to profilometry is deflectometry, which is used to reconstruct specular surfaces [6]- [10]. Here, an optical encoding is applied not to the target surface itself, but to a reference structure outside the measurement space, e. g., an LCD monitor that is observed indirectly as a reflection in the specular surface. The objective of the structured illumination in these cases is to determine an imaging function that allows direct mapping of camera pixels to points in the pixel plane of the reference system. With the help of this pixelto-pixel registration, local defects in the surface under test can be detected or the surface can be reconstructed globally. Apart from optical metrology, optical encoding techniques are also used more and more in the field of camera calibration. Several works indicate that using reference features displayed by an active target drastically decrease the calibration error as compared to when standard checkerboard features are used [11]- [14]. To ensure the most precise measurement, the registration must therefore be as accurate as possible.
There are many possibilities for such an encoding. A local encoding of the reference pixels can be done, e. g., by displaying statistical patterns where each position within this pattern is identified by the local pixel neighborhood [1]. While this method allows a very fast measurement, since only one pattern has to be displayed, it is only of limited use for the measurement of more complex scenes. Since the surface distorts the reference pattern, the encoding of the local neighborhood can thus often no longer be recognized. For high-accuracy structured illumination techniques, a temporal encoding of each pixel is thus more suitable. Here, instead of a single pattern, a sequence of patterns is now displayed by the reference. The sequence of intensity values measured in the camera subsequently allows the decoding of the reference pixels and thus yields the determination of the imaging function. A popular temporal coding method is the coding of the reference pixels utilizing a gray code [15]. Here, a binary pattern sequence is displayed to uniquely encode the individual pixels. However, a major disadvantage of the graycode method is that it uses only binary intensity values. As a result, the displayed signal with its sharp edges has very high frequency components. Because most of the time the camera and the surface as well provide a blurred image of the reference pattern, these edges become blurred and the decoding becomes more difficult. Another disadvantage is that only discrete pixels can be encoded, and hence, no subpixel information can be extracted [15].
Because of these disadvantages, phase-shift coding methods have become widely accepted in structured illumination applications. Here, a sequence of sinusoidal signals is displayed by the reference, where the encoding of the pixel coordinate is contained in the phase of the sinusoidal signal. The great advantage of these methods is that they are robust to a variation in the ambient illumination, to noise, to a low-pass filtering due to a defocusing effect of the camera, and that they allow an estimation of the phase uncertainty [16]. At the same time, these methods allow a subpixel-accurate encoding if the reference pixels are slightly out of focus. In order to further increase the accuracy of the measurement, additionally multi-frequency methods are used, where sinusoidal pattern sequences with different frequencies are displayed. While this increases the accuracy of the measurement, the periodicity of the sinusoidal pattern sequence leads to an ambiguous position encoding in the entire measurement range with just a single phase measurement. The uniqueness range of the phase measurement initially extends only over one period of the underlying sinusoidal pattern. This leads to a modulo-2π phase wrapping, which can only be compensated using phase unwrapping methods.
Apart from phase-shift coding, these phase wrapping effects also appear in other fields of optical metrology, e. g., interferometry [17]- [19], SAR imaging [20], [21], or even time of flight imaging [22], [23]. Thus, the phase unwrapping problem influences many applications. For applications where several phase measurements can be performed, the so-called temporal multi-frequency phase unwrapping methods have proven to be the best choice, since they allow a pixelindividual unwrapping. These temporal methods are generally categorized into four groups: hierarchical methods [24]- [29], heterodyne methods [17]- [19], [30]- [37], number-theoretical methods [38]- [46], and optimization-based methods [47]- [53]. They differ in the way the unwrapping is performed, in which frequency configurations can be used, and in how large the resulting uniqueness range of the unwrapping is. However, a disadvantage of the classical methods is that generally not all phase measurements are unwrapped at the same time. Moreover, they often do not take into account the inherent periodic structure of the phase, which leads to erroneous results. In addition, and more importantly, the estimation of the phase uncertainty is completely neglected in the whole unwrapping procedure.
To overcome these deficiencies, we present in this paper a new probabilistic approach for phase unwrapping, which uses circular statistics to describe the multi-frequency phaseshift coding to optimally reconstruct the phase. With this we respect the periodicity of the phase, we implicitly unwrap all phase measurements simultaneously by finding the underlying optimal position encoding that caused the phase measurement, we allow for an easy frequency selection with a maximum uniqueness range of the unwrapping, and we additionally include the estimation of the phase uncertainty into the overall unwrapping process. Furthermore, we propose to not only perform a temporal unwrapping but to additionally incorporate the information of the local pixel neighborhood in the modeling and thus obtain a probabilistic approach for spatio-temporal phase unwrapping.
The structure of this paper is as follows. In section II we first discuss the general principle of phase-shift coding and show how the phase and also the phase uncertainty can be reconstructed from the sinusoidal pattern sequence. Further, section III describes the concept of phase unwrapping and introduces the state of the art in multi-frequency temporal phase unwrapping. In section IV we present a new method for temporal phase unwrapping that solves the problem by probabilistically modeling the phase measurement. For this, we find in section IV-A a proper mathematical formulation for the probability density function of the phase-shift coding. In section IV-B we derive an algorithm to unwrap the phase in terms of a maximum-likelihood estimation from the probability density of the multi-frequency phase measurement. And at last, in section IV-D we extend our probabilistic framework to include spatial information of the observed scene to further improve the phase unwrapping, thus providing an ideal phase reconstruction that serves as the basis of any structured illumination system. Eventually, in section V the presented methods are extensively analyzed and compared to the state of the art. And finally, a summary and conclusion are given in section VI.

II. PHASE-SHIFT CODING
In optical metrology applications such as deflectometry or profilometry, phase-shift coding is used to encode the pixels of a reference system using an image pattern series. The absolute coordinates of the reference pixels can then be further used for triangulation or defect detection [3], [9].
The basic principle of phase-shift coding is to assign an individual phase of sinusoidal signals to each reference pixel: where the coordinates of the pixels are interpreted as relative coordinates x, y ∈ [0, L) with L = 1 for the rest of this paper. Phase-shift coding must be performed independently in both the horizontal and vertical directions, which is why only the encoding in the x direction is considered in the following. The encoding in the y direction is done analogously. Further, the argument of the phase is also simplified by omitting the coordinate y, since the phase in x direction will take the same value for each y. In other words, in the following φ(x) := φ x (x, y) holds without loss of generality.
To encode a normalized monitor coordinate x ∈ [0, 1), a signal sequence of M sinusoidal patterns with frequency f and shifted by Ψ m is generated and displayed on a monitor screen, whereby the coordinate is contained in the phase φ(x) = 2πf x of the signal sequence Here I max represents the maximum displayable brightness value, and for the so-called symmetric M -step algorithms the phase offset has equidistant steps [54], [55]: The signal I m displayed on the reference illuminates the scene, which is to be examined and is then mapped onto the camera sensor. Thus, the camera records for every camera pixel u = (u, v) T a signal sequencẽ Here A(u) is a constant background illumination, B(u) is the modulation of the signal and φ(u) is the the phase that contains the information about the encoded screen pixels x(u).
Because each camera pixel u can be considered independently, the coordinates u are neglected in the following for clarity.
To determine the unknown quantities from the recorded signal sequence, at least M ≥ 3 phase shifts are needed. The solutions for the modulation B and phase φ are then given by where arctan2(a, b) ∈ [−π, π) is used, which correctly assigns the arguments of the arctangent to the four quadrants. For sake of simplicity, in the remainder of this paper, the domain of the phase is shifted to positive values

A. PHASE UNCERTAINTY
The accuracy of the phase measurement is influenced by external systematic influences of the entire measurement set-up as well as by stochastic errors. External systematic influences may change the brightness and contrast of the pattern sequence, which can lead to an increase in uncertainty. For example, the camera optics can image the sinusoidal patterns out of focus, which leads to a decrease in contrast.
Since most of the time, the surface is part of the structured illumination system, the shape, roughness, and color of the surface also influence the quality of the estimation. Due to these system-related influences, the uncertainty of the phase estimation can be different for each pixel. Furthermore, the phase measurement is influenced by stochastic errors. Every camera image is accompanied by image noise. Therefore, it is obvious that this noise also affects the phase estimation and influences the uncertainty of the measurement. In general, the sensor noise shows up as noise in the pixel values and can be regarded in a good approximation as normally distributed noise with variance σ 2 I and zero mean [56]. For symmetrical M -step methods, the phase noise has zero mean and the uncertainty (standard deviation) can be specified [5]: While M and the modulation B are directly defined by the phase-shift coding or can be estimated, the sensor noise is initially unknown. To be able to describe the phase noise absolutely, Fischer et al. [16] introduced a quantitative noise model, which combines the phase noise with the parameters of the EMVA 1288 standard for camera systems [56]. This makes it possible to predict the phase uncertainty very precisely by estimating only the modulation B from the pattern sequence.
To further reduce the uncertainty, it is beneficial to use sinusoidal pattern sequences with a frequency f > 1. This effectively reduces the phase noise induced by the camera sensor noise. As will be explained in more detail in section III, phase jumps occur in the reconstructed phase when the frequency of the sinusoidal pattern sequence is chosen to be f > 1. The phase would take values φ > 2π, but is only defined on the periodic interval [0, 2π). Thus, the real line R is wrapped to the smaller interval [0, 2π) , see Fig. 1. To unwrap the phase again, an integer multiple of 2π must be added at corresponding places. The unwrapped phase finally results in where φ ∈ [0, 2π) represents the wrapped phase, ε φ ∈ [0, 2π) represents the phase noise with uncertainty σ φ , and k ∈ Z is the so-called period-order number or unwrapping factor. VOLUME 4, 2016 Since the domain of the unwrapped phase has been increased to Φ f ∈ [0, 2πf ), it has to be scaled back to the original range. The final phase measurement therefore results in with Φ ∈ [0, 2π) . By increasing the frequency and then scaling back, the phase information is not changed, but the noise is reduced by the factor 1/f . The uncertainty of the unwrapped phase is then be given by In summary, with the phase-shift coding one obtains not only a pure position encoding, but additionally also the associated uncertainty, where the complete information is encoded in the phase φ and the phase uncertainty σ φ .

III. PRINCIPLES OF PHASE UNWRAPPING
If the frequency of the phase-shift pattern sequence is chosen to f > 1, jumps occur in the reconstructed phase. Resolving these jumps is the goal of phase unwrapping. Since the wrapping of the phase strongly depends on the chosen frequency, the optimal choice of the unwrapping factor k is also frequency-dependent. Thus, for a coordinate x and frequency f i , eq. (9) can be rewritten: The task of phase unwrapping is now to find the correct k i for each phase measurement. Since for each pixel an individual unwrapping factor exists, problem (12) is initially underdetermined. To get a solution anyway, additional information has to be used. In principle, there are two approaches to solve the problem: spatial and temporal phase unwrapping.
Spatial phase unwrapping algorithms are useful when it cannot be guaranteed that the phase remains constant over time or when repeated measurements would be too costly. With spatial algorithms, phase unwrapping is performed using only a single phase measurement. The information necessary for the unwrapping is then obtained from the 2D pixel neighborhood. For example, in region growingbased approaches, starting from an initial pixel, the phase is unwrapped aiming to achieve a continuous phase profile where neighboring pixels have a similar value [57]- [59]. However, spatial unwrapping is very susceptible to noise, and phase discontinuities can make the unwrapping difficult or cause errors. For example, a step in the phase cannot be reconstructed without ambiguity, since the algorithm is unable to determine the height of the step, which may have a multiple of 2π as an offset. The main disadvantage of spatial unwrapping methods, however, is that they can generally only obtain a relative phase instead of an absolute one, which is not useful for 3D reconstruction problems.
Hence, if the requirements for spatial phase unwrapping are not satisfied or an absolute phase estimate is needed, temporal phase unwrapping must be used.

A. TEMPORAL PHASE UNWRAPPING
Temporal phase unwrapping methods in general do not use the spatial information in the phase map. They can therefore handle each pixel individually, which means that discontinuities in the phase do not cause any problems. On the other hand, they rely on additional information obtained by additional measurements, e. g., additional patterns or geometric constraints. A rough overview of absolute phase unwrapping methods is provided by [60], [61].
In this work, however, we are interested in a specific class of unwrapping methods: Temporal multi-frequency phase unwrapping. These methods use multiple phase-shift pattern sequences with different frequencies f i to obtain for each measurement a phase φ i ∈ [0, 2π), all of which are based on the same coordinate encoding. Since we assume that the unwrapped phase does not change over time, the multiple phase measurements generate the equation system (12). Because all phase measurements are based on the same coordinate x, if certain requirements are met, a unique solution exists The unwrapping of the phase is then obtained by solving this equation system, for which various methods exist.

1) Hierarchical Unwrapping
The hierarchical methods are among the most intuitive approaches. They use a series of phase measurements, in which the frequency of the underlying sinusoidal signals is increased in each step. To obtain an unambiguous unwrapping of all phase measurements, the frequency of the first measurement is chosen in such a way that the measured phase is not subject to ambiguities. Thus, f 0 = 1 and Φ 0 = φ 0 . Each subsequent measurement is then unwrapped using the previous unwrapped phase associated with the lower frequency as a The unwrapping factor can hereby be determined using a simple rounding operation There are many variations of hierarchical unwrapping algorithms in the literature, which differ mainly in the choice of the frequency sequence, e. g., linearly increasing frequencies [24], [26], exponentially increasing frequencies [28], [29], reversed sequences [26], [27] or generalized approaches [25]. Usually, after unwrapping the individual phase maps, the phase corresponding to the highest frequency is used or all phase maps are averaged.

2) Heterodyne Unwrapping
The two-wavelength heterodyne methods were originally developed for interferometry [17]- [19], but are also applicable to phase-shifting 3D-measurement systems [35]- [37]. Unlike before, the heterodyne method can be used directly at high frequencies. The approach of heterodyne phase unwrapping consists in extending the unambiguous measurement range of a synthetic wavelength at the beat frequency of two comparatively closely neighboring wavelengths. Usually, only two wavelengths λ 1 and λ 2 are used. The phase measurements associated with the two wavelengths are subtracted, and the wavelength of the synthetic phase can be obtained where f 12 represents the beat frequency. If λ 1 and λ 2 are close enough and well-chosen, the uniqueness range of the phase unwrapping can be increased enough to resolve the ambiguity [33]. With the normalized reference size L = 1 we use in this paper, f 12 = |f 1 − f 2 | ≤ 1 must hold in order to allow an unambiguous phase reconstruction.
Since the phase noise of φ 1 and φ 2 is accumulated during the formation of the synthetic phase, the signal-to-noise ratio deteriorates. For this reason, the synthetic phase is generally used only to unwrap the underlying measurements φ 1 and φ 2 . The unwrapping factors k 1 and k 2 are hereby calculated using (14) with The extension to more than two wavelengths is described in [30], [31] and allows increasing the unambiguous measurement range even further. For this, several approaches exist that optimize the choice of the wavelengths to obtain a robust unwrapping result [32]- [34].

3) Number-Theoretical Unwrapping
The number-theoretical unwrapping methods are based on number theory, relative primes, and the divisibility properties of integers. They were originally proposed by [38]. They were then further improved to reduce the susceptibility to phase errors [2], [39]- [41]. In its basic form, the method uses the Chinese-remainder theorem to calculate a simultaneous solution to the unwrapping problem. Following the theorem, a system of simultaneous equations of congruence has a unique solution X ∈ Z , if b i ∈ Z and m i ∈ Z are known integers, and if the set of m i are pairwise coprime numbers, i. e., for their greatest common divisor applies gcd(m i , m j ) = 1 , ∀i, j [38]. The theorem can be applied to phase unwrapping, by comparing (13) to (17) and substituting The phase ambiguity can then be resolved, if the condition lcm(m 1 , m 2 , . . . ) ≥ L for the least common multiple is fulfilled [62]. Hereby, of course, an appropriate scaling factor L needs to be chosen to obtain meaningful integer values and co-prime m i . E. g., in the case of structured illumination applications, it can be set to the size of the reference pattern generator measured in pixels.
Further improvements to the algorithm can be achieved by precalculating a look-up table to speed up the computation time [42]- [46], by including geometric constraints in the unwrapping [63], or by removing the restriction of having co-prime wavelengths [64] .

4) Optimization-based Unwrapping
The previous methods have relatively high restrictions on the choice of frequencies. Thus, newer approaches try to circumvent these restrictions by posing the phase unwrapping as an optimization problem. Pribanić et al. [43] extend the two-wavelengths number-theoretical method by removing the restriction of having co-prime wavelengths. From the combination of all possible unwrapping factors, they search for the one that minimizes the distance between the two respective unwrapped phases.
The excess fraction methods can be regarded as a multiwavelength extension of the heterodyne methods [50]- [53]. They define an excess fraction as the difference between an ideal continuous unwrapping factor and its integer analogon. The unwrapping factors are then determined individually by minimizing the respective excess fraction, where each excess fraction is influenced by all phase measurements.
More recent approaches try to perform the unwrapping of all phase measurements simultaneously to find an ideal solution for all unwrapping factors at the same time. For this purpose, the vector of ideal unwrapping factors k = (k 1 , k 2 , . . . ) is sought, which minimizes the distance of the individual unwrapped phases to the mean value of all unwrapped phase measurements. Here, the distance measure can be defined by an orthogonal projection of the wrapped phases onto a subspace [47], or it can be written down directly as a sum of distances between the unwrapped phases to the averaged unwrapped phase [48], [49]. It is hence titled projection distance minimization (PDM). With Φ Mean = j fj Φj j f 2 j and by minimizing the projection distance the unwrapping factors, and thus, the simultaneous unwrapping of all phase measurements can be obtained. The optimal unwrapping factors are thereby found by an excessive trial and error of all possible combinations. To speed up the optimization, Petković et al. [47] suggest ignoring impossible combinations, and Zuo et al. [48] use the geometry of the measurement setup of a profilometry system to exclude unreasonable combinations.

5) Remarks
The classical phase unwrapping algorithms from the previous sections do not use all of the information to unwrap the phase measurements, and far more importantly, they generally do not take into account the inherently periodic structure of the phase, which can lead to incorrect unwrapping. For example, the simple hierarchical unwrapping method only uses the previous phase measurement with a lower frequency to unwrap the current phase. However, measurements with higher frequency could also contain information VOLUME 4, 2016 to unwrap the phase measurements with lower frequency. In addition, the periodic structure of the phase is not taken into account, so unwrapping errors often occur near the 2π-discontinuities. Also, to achieve good accuracy for 3D reconstruction, phase maps corresponding to high frequencies are needed. But then, the number of necessary measurements is high, because the sequence always starts at f = 1 .
The heterodyne method does not have to start at low frequencies, but can directly select high ones, allowing it to achieve an overall smaller mean uncertainty with the same number of measurements [34]. However, it is disadvantageous that the unambiguous measurement range of the unwrapping is determined by the beat frequency and thus, there are frequency configurations that do not yield an unambiguous solution, but which could be solved unambiguously with other methods [47]. Additionally, the method is not straightforwardly extendable to a multi-frequency approach.
The number-theoretical unwrapping methods perform a simultaneous unwrapping of all phase measurements and also consider the periodicity of the phase. Nevertheless, the restriction to pairwise co-prime wavelengths makes the selection more difficult, and due to the integer arithmetic and rounding operations, these methods are relatively susceptible to noise [39]. Even more, for the method to work, the frequencies must be chosen very precisely proportional to the integer co-prime wavelengths, which is especially problematic for applications where the wavelengths cannot be chosen freely, e. g., interferometry [65].
The optimization-based methods, on the other hand, perform a simultaneous unwrapping of all phase measurements without having to apply rounding operations. In their current form, however, they are still not perfect. They do not take into account the periodic structure of the phase so that unwrapping errors occur frequently near the boundaries of the coding interval. Also, it is a very expensive procedure due to testing all possible combinations of unwrapping factors k i .
Additionally, all methods have in common that the phase unwrapping does not consider the estimated phase uncertainty at all, although it could help to compensate for an unfavorable measurement. Therefore, in the following, we present an unwrapping method that respects the periodicity of the phase, which does not rely on integer arithmetic, which implicitly unwraps all phase measurements simultaneously by finding the underlying encoding that caused the phase measurement, and which additionally includes the estimation of the phase uncertainty into the process.

IV. PROBABILISTIC APPROACH FOR TEMPORAL PHASE UNWRAPPING
In the field of phase unwrapping, probabilistic approaches have already been used in the spatial domain. Carballo et al. [66] and Koetter et al. [67] use a probabilistic approach to model the probability of a phase discontinuity in interferometric synthetic aperture radar (InSAR) images, to use them as weight factors for a spatial phase unwrapping procedure. Droeschel et al. [22] uses a similar approach for time-of-flight imaging. Baselice et al. [68] use an Extended Kalman Filter that includes probabilistic data to perform phase unwrapping and phase noise reduction of InSAR data.
In contrast, we propose to use a probabilistic model for temporal phase unwrapping. To solve the phase unwrapping problem optimally, we attempt to find the coordinate that has the highest probability of having caused the corresponding phase measurements. To formulate the unwrapping as a probability problem, we model the phase measurement as an appropriate stochastic process. With this we determine the probability density of the encoded coordinate, find the optimal decoding by a maximum-likelihood approach, and thus implicitly compensate for the wrapping of all phase measurements.

A. PROBABILITY DENSITY FUNCTION OF PHASE-SHIFT CODING
As indicated in section II-A, the variance of the image noise can be propagated through the phase-shifting algorithm. Thus, every measurement provides not only an estimate of the phase φ but also the uncertainty σ φ of this estimation. The probability density function of the true phase is therefore centered around the respective measurement. The question now arises, however, which probability distribution the phase has. In principle, several distribution functions are possible. Since the image noise has a normal distribution, the first assumption is that the phase noise is also normally distributed. However, because the phase has a periodic structure and is only defined on the interval [0, 2π), the probability density must be searched in the field of circular statistics [69].

1) Wrapped Normal Distribution
Arguably the most intuitive approach to obtain a probability distribution of the phase is to assume it to be a normal distribution θ ∼ N µ, σ 2 and to allow its values to be spread on the entire set of real numbers θ ∈ R . By folding the density function around the unit circle the range of values is then forced to the interval [0, 2π) . The density function of the folded random variable is then the wrapped normal distribution [69] with the parameters µ ∈ [0, 2π) und σ 2 . The density function is symmetric and centered around the mean µ, whereas the width of the function is affected by the parameter σ. Since in practice the infinite sum must be terminated at some point, more efficient representations of the distribution exist where, depending on the choice of σ 2 , the sum can be aborted after only a few terms [70].

2) von Mises Distribution
A major disadvantage of the wrapped normal distribution is that it is quite intractable due to the infinite sum. Furthermore, it is not certain that a real phase measurement results from the folding of a linear normal distribution around the unit circle, and hence, one must not forcefully assume that (20) is the correct description of the phase-shift coding.
If we approach the problem with minimal knowledge, we can find an alternative probability density function for the phase. The knowledge we have is: the mean of the distribution corresponds to a phase measurement µ, we have a measure of the second central moment σ φ , and we know that the phase should be defined on the periodic interval [0, 2π). The circular probability density function which maximizes the entropy under the given conditions and thus represents the ideal choice under these circumstances is the von Mises distribution [69] where I 0 (κ) is the modified Bessel function of the first kind and order zero The parameter µ represents the mean value and κ corresponds to a concentration measure which is analogous to the inverse of the variance 1/σ 2 in the normal distribution. Because of its mathematical simplicity, the von Mises distribution is one of the most commonly used distributions in circular statistics. And due to its great importance, it is also often referred to as the circular normal distribution [69].

3) Phase Noise Model of Phase-Shift Coding
A more precise way to describe the probability density of the phase is to analyze the phase-shift coding directly. Rathjen [71] examines the random phase error arising from the normally distributed image noise of the sinusoidal pattern sequence. The two arguments of the arctan2 function from (6) are described using a bivariate normal distribution, where the parameters of the distribution are computed from the normal distribution of the image noise of the underlying pattern sequence. Finally, the distribution of the phase is calculated from this bivariate normal distribution, which applies to any phase-shift coding method.
Depending on the algorithm, different distributions are obtained, which do not necessarily have to be symmetrical and which may also depend on the absolute value of the phase. For the symmetric M -step methods used in this work, the arguments of the arctan2-function are uncorrelated and have the same variance, leading to a symmetric distribution function for the phase which is independent of the absolute phase value [71]. The probability distribution function of the phase φ for symmetric M -step algorithms is then given by where µ ∈ [0, 2π) represents the mean, the signal-to-noise ratio SNR = 1 2σ 2 φ determines the width of the distribution, x 0 e −t 2 dt is the Gaussian error function.

4) Comparison
To identify which of the presented distributions is best suited for the problem, a Monte Carlo simulation of the phase measurement is performed. For this, the coordinate x = 0 is encoded using phase-shift coding and then the phase φ is measured. The simulation is performed 10 7 times, each time adding image noise corresponding to a phase uncertainty of σ φ = π 2 . The probability density of the phase noise can then be approximated using histogram analysis. Fig. 2 shows the histogram of the measured phases and the different density functions whose parameters can be calculated from the phase-shift-coding. As expected, the phase noise can best be described by the noise model of Rathjen [71]. However, the von Mises distribution also shows a reasonably good fit to the histogram. Whereas the wrapped normal distribution is too low on the hills and too high in other areas of the histogram. Fig. 3 shows the Jensen-Shannon distance (JSD) [72] between the histogram and each of the distributions over different phase uncertainty values as a similarity measure, i. e., a small JSD value corresponds to a high similarity. It can be seen, that the model of Rathjen has a high similarity to the histogram for all uncertainty values. The von Mises distribution is also very close to the histogram and hence, represents the phase-shift coding sufficiently good, although the similarity is not constant for all noise values. Finally, compared to those two distributions, the wrapped normal distribution has a greater distance to the histogram. For small noise values, all distributions converge into one another [69], so that they are almost equivalent, and for very large noise values everything converges to the uniform distribution on the interval [0, 2π) .

B. COMPOUND PROBABILITY DENSITY FUNCTION
Because the individual phase measurement is affected by phase noise, the probability density of the true phase is hence centered around the respective measured value. To consider all phase measurements simultaneously in the unwrapping, depending on their respective uncertainty, we look for the phase that caused the individual measurements with maximum probability. However, since the phase has a periodic structure, it must be modeled using circular statistics.
Since the von Mises distribution is mathematically easy to handle, since it approximates the true distribution of the phase noise quite well, and since a maximum-likelihood estimation can be performed in a numerically stable way (see the next section IV-C), it will be used as the basis for modeling the phase measurement in the following without loss of generality. A modeling using the other densities would work analogously.
The density function of the true phase φ ∈ [0, 2π) as a function of the measurement is therefore given by Here φ i represents the measured phase and κ i = 1/σ 2 φi models the knowledge about the uncertainty of the phase measurement and thus describes the concentration of the distribution. Depending on the frequency of the pattern sequence, the distribution function of the encoded coordinate x can now be derived from this. With φ i (x) = 2πf i x and with the known frequency f i , the multi-modal von Mises distribution on the periodic interval x ∈ [0, 1) is obtained: where now, due to the multi-modal character of the distribution, the ambiguity of the phase measurement becomes illustratively visible in the density functions, see Fig. 4a.
Since the acquisition of the sinusoidal pattern sequence using phase-shift coding is performed independently for each image and identical acquisition conditions are assumed, each image has in principle the same standard deviation σ I of the image noise. Therefore, the strength of the phase noise σ φ remains the same in each measurement. Nevertheless, the variable substitution φ i (x) = 2πf i x reduces the width of the distribution locally by 1/f i . This leads to a reduction of the uncertainty which in turn is bought by an f i -fold ambiguity.
While the image noise generally remains the same for all images, the estimated phase uncertainty can vary significantly for different situations. For example, if impulse noise appears in images, it is detected by the phase-shift coding as a reduction in the modulation B, which leads to an increase in the estimated uncertainty σ φi for the respective pixels. On the other hand, if the sinusoidal pattern is blurred due to the imaging system, the local contrast of the pattern sequence decreases. Again, the modulation B is affected and the uncertainty increases for the whole phase measurement. Of course, this is strongly influenced by the used pattern frequencies.
The uncertainty estimate thus contains knowledge about the system and can therefore be efficiently integrated into the probabilistic modeling of the phase estimate.
Depending on the chosen frequency of the sinusoidal pattern sequence, each phase measurement φ i corresponds to an individual probability distribution p(x|φ i , κ i , f i ). Since each phase measurement φ i is measured independently and all have the same underlying coordinate, the compound density of x for given frequencies f = [f 1 , . . . , f N ], phase measurements φ = [φ 1 , . . . , φ N ] and estimated concentration parameters κ = [κ 1 , . . . , κ N ] can be directly expressed:

C. MAXIMUM-LIKELIHOOD PHASE UNWRAPPING
Having described the probability density function of the multifrequency phase-shift coding, this can now be used to find the most likely coordinate that caused the phase measurements. The optimal coordinate and thus the simultaneous unwrapping of all phase measurements can be found with a maximum likelihood estimator. As a result, maximizing the density function yields the sought coordinatê = arg max = arg max  where the monotonicity of the logarithm helps to simplify the equations and removes the potentially more numerically unstable exponential function.

1) Uniqueness
To be able to identify a unique maximum, constraints must be applied to the selected frequencies. As with other unwrapping methods from the literature, uniqueness can be achieved if the frequencies are relatively prime [47], [48]. However, while for classical number-theoretical approaches all the frequencies need to be pairwise co-prime integers with gcd (f i , f j ) = 1 , ∀i ̸ = j, we have a less restrictive condition. For our approach, uniqueness is obtained with gcd (f ) = gcd (f 1 , f 2 , . . . ) ≤ 1 , whereas the frequencies do not necessarily need to be integer valued. For frequencies f i ∈ Q, the extension of the gcd to rational numbers can be used to check uniqueness. For frequencies that are irrational numbers, the maximum of (32) is theoretically always unique if ∃f i ̸ = f j with i ̸ = j . Though, in this case, when the frequencies are poorly chosen the unwrapping might be more susceptible to noise. Fig. 4a and Fig. 4b demonstrate the uniqueness constraint illustratively. In Fig. 4a the frequencies are set to f = (2, 3, 6), thus gcd (f ) = 1 . Even though with gcd(2, 6) = 2 and gcd(3, 6) = 3, the frequencies are not pairwise co-prime, a unique maximum of the compound probability density can still be found. In Fig. 4b the frequencies are set to f = (2, 4, 6), thus gcd (f ) = 2 .
Here the maximum has a two-fold ambiguity. The compound density is only unique in the range x ∈ [0, 0.5) and repeats itself in x ∈ [0.5, 1) . Thus, in this case, the phase cannot be recovered uniquely.

2) Finding the Maximum
Although (32) appears simple, no analytical solution can be given for the global maximum because of the many local extrema. Therefore, the problem must be solved numerically. However, no global optimizer (e. g., simulated annealing, differential evolution) can be used because it could get stuck in a local maximum. To ensure that the maximum of the probability density is found every time and to avoid unwrapping errors, we solve the optimization problem on subintervals. To define the subintervals, (32) must be interpreted as a signal g(x) = i κ i cos (2πf i x − φ i ). Since it is a summation of sinusoidal signals, the maximum frequency of the signal g(x) is equivalent to the maximum used frequency f max = max(f i ) in the phase-shift coding. From sampling theory, it is known that a discrete signal can be reconstructed from its sampling points only if the signal does not change significantly between said points [73]. Consequently, the sampling frequency must be respected. Given the maximum frequency f max and using the sampling theorem, we therefore obtain a minimum required number of intervals I min = ⌈2f max ⌉, in which the global maximum must uniquely lie as a single extremum. A simple 1D line search procedure (see [74]) is now used to find the local maximum in each of those subintervals. A comparison of the intervals finally yields the global maximum.
Purely illustrative it would be sufficient to reduce the interval number to I min = ⌈f max ⌉, since only the local maxima are required and not the minima. Empirical investigations showed, however, that in rare cases nearly saddle point-like shapes appear in the signal. In these cases, two local maxima can lie very close to each other, and thus, with the reduced number of intervals, only one can be identified as a local maximum in the optimization. Nonetheless, the global maximum could always be found unambiguously in billions of simulations, since the signal changes very strongly in the vicinity of the global maximum, and thus, only a single solution exists in the interval under investigation.
As a remark, it remains to say that the presented maximum-VOLUME 4, 2016 likelihood optimization can in principle also be carried out with the other distributions from section IV-A. Though, since the log-likelihood function of the corresponding densities cannot be represented as a simple sum of cosine functions, the spectrum of these log-likelihood functions also has components at higher frequencies. Nevertheless, empirical investigations showed that higher frequencies are attenuated so strongly that the sampling theorem is almost fulfilled and hence a maximum could still be found every time. However, this could only be observed, when all I min = ⌈2f max ⌉ subintervals were searched. Thus, the other density functions need twice the computation time as compared to the von Mises distribution. In summary, with the presented method, the wrapping of all phases is compensated simultaneously and all measurements are fused to an optimal solution so that finally the most likely value of the coordinate x can be found.

D. SPATIO-TEMPORAL PHASE UNWRAPPING
Temporal phase unwrapping has the great advantage that each pixel can be individually unwrapped and an absolute phase is obtained. This is especially useful when only a little information about the surface to be examined is known and when 2D unwrapping methods would lead to erroneous results. In many tasks of optical metrology, where structured illumination is used, one often examines continuous surfaces. For example, in deflectometry, i. e., the inspection of specular objects, one often works with lacquered body parts from the automotive industry, with lenses or parabolic mirrors, which can be described for the most part by continuous surfaces with only a few regions deviating from this continuity due to sharp edges [8]. Also in time-of-flight imaging and many areas of profilometry, i. e., fringe projection, piecewise continuous objects are often inspected [4], [23]. This piecewise continuity has the consequence that neighboring camera pixels will observe similar phase values on the surface. It is therefore reasonable to use this additional information as help for the phase unwrapping to suppress phase errors.
Hence, we want to integrate the assumption of local continuity into the probabilistic framework from the previous section. This allows performing not only an unwrapping in the temporal dimension but a 3D phase unwrapping, while implicitly smoothing the probability density functions over the spatial dimensions. To do this, the probability density of each camera pixel is modeled as a superposition of the probability densities of the local neighborhood.
The probability density for each individual pixel u was already derived in the previous section and can be considered as a conditional density p(x(u)|φ(u), κ(u), f ) = i p(x(u)|φ i (u), κ i (u), f i ) . p(u|û)p(x(û)|φ(û), κ(û), f ) , (34) where U(u) represents a set of relevant neighborhood pixels. Since more distant pixels have less influence and we approach the modeling with minimal knowledge about the observed surface, we model the transition probabilities using a 2D normal distribution The compound density, consisting of spatial modeling employing normal distributions and temporal modeling using von Mises distributions, finally results in .
Although this probability density appears more complicated than (28) from the previous section, it can be maximized using the same methods for finding the optimal solution of the coordinate:x ML (u) = arg max p(x(u)).
However, it must be considered that this approach only leads to meaningful results if the local continuity assumption is not violated. To ensure that the given model is only applied in continuous areas, discontinuities have to be detected.

1) Detection of Discontinuities
Depending on the application, discontinuities in a surface transform themselves into discontinuities in the phase map. In the case of profilometry, a step in the surface results in a step in the phase map, whereas a step in the surface gradient does not necessarily destroy the continuity of the phase map. However, for example, in the case of a deflectometric measurement of specular surfaces, even a step in the surface gradient may result in a step in the phase map. Consequently, this means that we do not intend to detect edges on the surface, but only jumps in the phase.
For edge detection, a simple detector operating directly on the wrapped phase estimates is suitable for this purpose. But, since the 2π-discontinuities contained in the phase maps do not represent a property of the surface, they must not be falsely detected (see Fig. 12). Thus, we need a 2π-invariant detector. Typically, gradient-based operators are used to detect edges in images. For this, the Laplace operator ∆φ(u) = ∂φ(u) 2 ∂u 2 + ∂φ(u) 2 ∂v 2 is often used. However, for a 2π-phase jump in the wrapped phase, the operator will yield a multiple of 2π even when the correctly unwrapped phase would have only a small continuous change. To have this property ignored, we define a 2π-invariant Laplace operator ∆ 2π φ(u) := ∆φ(u) mod 2π (37) which is only sensitive to phase discontinuities in the unwrapped phase caused by the surface, whereas discontinuities that are caused by the ambiguity of a wrapped phase are ignored. To reduce the effect of noise in edge detection, a Laplacian of Gaussian may be used. Eq. (37) can take values within the periodic interval [0, 2π). However, since the strength of an edge is defined as the distance to 0, it is necessary to calculate the circular distance for an appropriate edge quality measure. Hence, for every phase measurement φ i (u) we can calculate an energy measure in which the maximum possible circular distance is equal to 1, which would correspond to a strong edge feature. Further, an appropriate averaging over all phase maps improves the edge estimate where the uncertainty of the phase estimate can be taken into account. Hence, the application of the modified Laplacian operator ultimately provides an energy measure for an edge, which is insensitive towards 2π-discontinuities. And finally, subsequent thresholding on this energy measure results in a feature map containing edge areas and non-edge areas, see Fig. 12. In places where an edge has been detected, the temporal modeling according to section IV-B must be used, whereas everywhere else the modeling according to section IV-D may be used to improve the phase unwrapping by utilizing the spatial neighborhood information.

V. EVALUATION
In this section, the methods presented in this paper are evaluated, analyzed, and compared with the state of the art. Sinusoidal pattern sequences with different frequencies are simulated and the respective phase is estimated using phaseshift coding, where the number of steps is chosen to be M = 8. The following methods are examined: the hierarchical unwrapping of Huntley et al. [24], the heterodyne unwrapping of Lai et al. [30], the number-theoretical unwrapping of Towers et al. [40], the optimization-based unwrapping of Zuo et al. [48], the proposed probabilistic temporal method, and the proposed probabilistic spatio-temporal method. For the proposed probabilistic methods, the von Mises probability density is used, unless specified otherwise. For the spatiotemporal method a spatial neighborhood U(u) of 3 × 3 pixels is used. To investigate the robustness of the presented phase unwrapping algorithms, the influence of Gaussian image noise and impulse noise is examined.

A. QUALITATIVE COMPARISON
The resolution of the reference pattern generator was chosen to be (2003,2003). For the first simulation three phase measurements with frequencies f ≈ (1, 3, 5) were generated. Because for the numer-theoretical method pairwise co-prime wavelengths must be used, we quantize the wavelength as λ = (2003, 668, 401). This corresponds to frequencies   (8), Gaussian noise with variance σ 2 I = σ 2 φ B 2 M 2 was added to the sinusoidal pattern sequence. It is important to note that the noise is not added to the wrapped phase measurements, as it is often done in the literature, but to the camera images I m , otherwise no realistic statements about phase-shift coding can be made. The heterodyne method calculates a phase difference to obtain a unique reference phase. Since φ 1 is already unique, it does not make sense to evaluate the heterodyne method for this frequency configuration. The coordinate x ∈ [0, 1) was sampled in 2003 steps and each value was simulated 2003 times. Fig. 5 shows the phase measurements and the coordinates estimated with the different unwrapping methods. Here, stronger colors represent a higher point density. The upper three plots show the noisy phase measurements φ i over the true coordinate x. The lower plots show the corresponding estimated coordinatesx.
The hierarchical unwrapping shows a strip of correctly unwrapped estimates in the middle section. At the boundaries of the coding interval, however, large errors appear because the periodicity of the phase is not implicitly modeled for this method. For these reasons, the effective coding interval is of-VOLUME 4, 2016 ten reduced in practical applications. This avoids unwrapping errors, but the effectively used frequency decreases, which increases the overall phase uncertainty. In addition to the boundary errors, the hierarchical method shows unwrapping errors that are represented by parallel lines to the middle line. In these cases, the phase was incorrectly unwrapped once or even twice. Since the hierarchical method always refers back to the unwrapped previous phase, unwrapping errors propagate from top to bottom and can no longer be compensated once they occurred. The number-theoretical unwrapping shows no errors at the boundary of the coding interval since the Chinese remainder theorem is based on a modulo arithmetic. Also, only a few unwrapping errors occur in the middle of the coding interval. Two lines of faulty estimations appear near the boundary, where noisy estimates x > 1 and x < 0 are folded back into the used interval [0, 1) . Overall, however, the method is more susceptible to noise, which results in a coordinate estimation with greater uncertainty. The optimization-based unwrapping is much better. Almost all pixels in the middle of the coding interval are correctly unwrapped and only at the boundaries do errors occur again, which are caused by the lack of modeling of the periodicity of the phase. The proposed probabilistic temporal method can compensate well for the boundary errors since the periodicity is well described using circular statistics and it performs well in other areas as well. Finally, the spatiotemporal method yields even better results. Here almost all values are correctly unwrapped and the uncertainty of the estimation is very small, as can be seen by the overall thinner line. In other words, it makes a lot of sense to include spatial information.
In a second simulation, the sinusoidal pattern sequence is now overlaid with impulse noise, where the probability of an impulse is set to p I = 0.075 . An impulse in the image appears as either a black pixel or a completely white pixel, i. e., it acts like salt and pepper noise. Again, of course, the noise must be added to the sinusoidal pattern sequence I m and not to the wrapped phase maps. Hence, although 7.5% of the pixels show an impulse, the same amount of phase estimates may not necessarily be affected. Fig. 6 shows the phase measurements and the coordinates estimated with the different unwrapping methods. An impulse in the pattern sequence causes the respective phase to be distorted to a greater or lesser extent, depending on how far the impulse is from the correct intensity value.
Again, the hierarchical method shows similar effects as before, which are, however, more prominent here. The numbertheoretical method can still provide a good unwrapping performance. However, due to the generally higher noise level, the accuracy decreases. For the optimization-based method, erroneous estimations are now also apparent, comparable to the hierarchical method. Opposed to this, the two probabilistic methods can still give very good results. This can be explained by considering that a phase measurement with an impulse in the pattern sequence has a smaller modulation B. This also increases the corresponding estimate of the phase uncertainty. This estimate can be used directly in the proposed methods to compensate for poor phase measurements. Thus, better phase measurements have more influence on the optimization. For the spatio-temporal method, this means that a distortion of the phase has an effect only if a large number of the pixels in the respective 3 × 3 × 3 cube are disturbed. Since the probability of this is quite low, the method yields almost no errors. In order to evaluate the heterodyne method, phase measurements with frequencies f ≈ (6,9,11) are generated. For the same reasons as before, the wavelengths were quantized as λ = (331, 223, 181). This corresponds to frequencies f ≈ (6.051, 8.982, 11.066). Image noise is superimposed on the sinusoidal pattern sequence, corresponding to a phase uncertainty of σ φ = 0.15 rad = 8.6 • . Since the hierarchical method can uniquely unwrap the phases only up to the first period, it is not considered in this comparison. Fig. 7 shows the phase measurements and the coordinates estimated with the different methods.
It can be seen that even with smaller noise than before, the heterodyne method delivers only mediocre results. A large part of the pixels is correctly unwrapped, but many lines of incorrect values appear parallel to the true line.This is because the phase noise is summed up when calculating the phase difference. To get a unique solution, first are calculated. A unique phase can then be calculated with φ 123 = φ 23 − φ 12 with f 123 = f 12 − f 23 = 0.85. This is then used to unwrap the individual phase measurements. However, since the noise is summed up in each step, the reference phase is of poor quality, resulting in a poor overall unwrapping result. Surprisingly, the number-theoretical method fails completely. The integer arithmetic of the method cannot work even at a very small noise level. The optimization-based method can unwrap the phase very well, as before. Errors are still found at the boundary and in parts in the center. As before, the presented probabilistic methods provide very good results, whereas the spatio-temporal approach yields almost only correct estimates. For a final evaluation, the sinusoidal pattern sequence was now again overlaid with impulse noise, with the probability of an impulse set to p I = 0.03. Fig. 8 shows the phase measurements and the coordinates estimated with the different methods. Although the noise is very small, the heterodyne method again shows many unwrapping errors. The numbertheoretical method delivers bad values too. Here we see that almost only pixels without distortion are unwrapped correctly, visible by the somewhat stronger line in the center. The optimization-based method gives very good results, with only a few boundary errors and two clusters of erroneous estimates. The presented probabilistic methods show almost perfect results, which is explainable by the incorporation of the estimated phase uncertainty in the unwrapping process.

B. ROBUSTNESS AGAINST NOISE
The methods presented are now to be evaluated quantitatively. For this purpose, the robustness of the methods against Gaussian noise and impulse noise will be investigated. To compare all methods, sinusoidal pattern sequences with M = 8 phase shifts were simulated. Subsequently, various noise factors were superimposed on the images, the phase was estimated using phase-shift coding, and finally, the phases were unwrapped using the presented methods.

1) Error Metrics
In order to make quantitative statements about the methods, suitable error metrics have to be defined beforehand. As a first error measure, the absolute distance of the estimated coordinate x to the true coordinate x true is required The second error metric evaluates the quality of phase unwrapping and describes the success rate, representing whether a pixel was correctly unwrapped: where C i indicates whether the phase measurement associated with the frequency f i has been correctly unwrapped Because our proposed methods do not directly unwrap the individual phase measurements but return a global solution, (43) is used with C i = C imax and i max = arg i max f i . Hence, any phase value that is farther away from the true solution than 1 2fmax is therefore classified as an unwrapping error.

2) Error Evaluation
For a first analysis, the frequencies of the sinusoidal pattern sequence were again chosen to be f ≈ (1, 2.999, 4.995) in order to create integer wavelengths λ = (2003, 668, 401) to ensure that the number-theoretical method can be used. The robustness towards Gaussian image noise was analyzed by increasing the phase uncertainty incrementally from σ φ = 0 to σ φ = 0.5 rad ≈ 28.6 • in 100 steps. For the analysis of robustness to impulse noise, the probability of an impulse was increased stepwise from p I = 0 to p I = 20 % in 100 steps. Fig. 9 shows the results of the analysis as a plot of the mean estimation error ϵ x and mean success rate s x . The evaluation of the unwrapping error metrics yields similar results as the evaluation of the quantitative results from the previous section, for both Gaussian noise and impulse noise. The number-theoretical method consistently yields the worst results with the largest estimation error. The success rate is also consistently the lowest, mainly caused by the erroneous unwrapping at the boundaries of the coding interval. Interestingly, the hierarchical method and the optimizationbased method show almost identical behavior up to about σ φ ≈ 0.2. Only for higher noise levels, the advantage of the optimization-based method becomes apparent. The proposed probabilistic methods provide the best results with the smallest estimation error and highest success rate, where even for very large noise levels the spatio-temporal method can still correctly unwrap more than 99.9 % of the pixels.
For the second analysis, the frequencies of the sinusoidal pattern sequence were again chosen to be f ≈ (6.051, 8.982, 11.066) to create integer wavelengths λ = (331, 223, 181) suitable for the number-theoretical method. The noise was parameterized in the same way as before. influence of Gaussian noise, it can be clearly seen that the number-theoretical method is extremely susceptible to noise. It can only deliver correct values for very small noise values. Starting from a noise of σ φ ≈ 0.02 it has already reached the maximum possible mean error. For small noise levels, the heterodyne method still shows very good results and can keep up with the other methods. Only for larger noise do significant deficiencies become apparent. For the investigated frequency configuration, the optimization-based method and the proposed probabilistic temporal unwrapping method show very similar behavior in the success rate. The proposed method is slightly better for low noise levels resulting in a smaller estimation error. For large noise levels, both yield almost the same result. The spatio-temporal method, on the other hand, can still give very good results for large noise levels even when a phase-shift configuration is used consisting of high frequencies, where in general the success rate is more susceptible to noise. The analysis of the impulse noise emphasizes again the advantages of the proposed methods. The heterodyne method and the number-theoretical method are very susceptible to impulse noise. Even small amounts of noise cause the success rate to drop steeply and the estimation error to rise significantly. The optimization-based method and the probabilistic temporal method show similar behavior, with the proposed temporal method being slightly better. Interestingly, the spatio-temporal method shows exceptionally good results here. Even with p I = 20 % impulse noise, the success rate is still greater than 99.99 %. This can be explained by the fact that a coordinate estimation is only disturbed when enough phase measurements are influenced by an impulse. Since the spatio-temporal method combines 27 probability densities for each coordinate estimate, the probability that a large part of these densities is disturbed is very small. To obtain a correct estimate, at least one pixel of the 3 × 3 spatial neighborhood must be correct for only two of the three phase measurements, since the corresponding frequencies are pairwise co-prime and effectively two phase measurements are sufficient to get a unique result. The probability of an unwrapping error at p I = 20 % impulse noise with a spatial neighborhood of S = 9 pixels, N = 3 frequencies, and pairwise co-prime frequencies is therefore approximately The probability may in fact be even lower since not every impulse necessarily causes an erroneous measurement. VOLUME 4, 2016 As final comparison, Table 1 shows the evaluation of the error metrics for all unwrapping methods and both frequency configurations for certain noise levels. Here again, it can be seen, that the proposed spatio-temporal phase unwrapping outperforms all other methods.

C. COMPARISON OF DIFFERENT PHASE NOISE MODELS
To confirm the choice of the von Mises distribution as a representative for the probability density of the phase, we will now also compare the different densities. For this, we examine the error metrics of the probabilistic temporal phase unwrapping. Table 2 shows the estimation error and the success rate for Gaussian noise with σ φ = 0.3 and impulse noise with p I = 0.03. For reference, the optimization-based method is also shown. As expected, the phase-shift model according to Rathjen [71] gives the best results for the Gaussian noise and the von Mises distribution is the second-best result. Compared to the optimization-based method, however, the probabilistic methods differ only minimally. Interestingly, for impulse noise, the wrapped normal distribution performs better than the model of Rathjen. Again, the von Mises distribution provides the second-best results, which is only insignificantly worse than the wrapped normal distribution. Apart from the other advantages of the von Mises distribution, it turns out to also be a good compromise to be robust against Gaussian and impulse noise.

D. PHASE MAP RECONSTRUCTION
In this section, we show how phase maps are reconstructed using the presented unwrapping methods. For this purpose, we generate two phase maps (512 × 512 pixels), see Fig. 11. Phase map 1 shows a continuous surface with hills and valleys whereas phase map 2 represents a discontinuous surface that has sharp edges. We generate the corresponding sinusoidal pattern sequence with wavelengths λ = (331, 223, 181), corresponding to frequencies f ≈ (6.051, 8.982, 11.066), and superimpose the pattern images with Gaussian noise corresponding to σ φ = 0.15. Fig. 11 shows the generated phase-shift image data and the wrapped phase maps that are calculated using phase-shift coding.

1) Edge Detection Example
Because our presented spatio-temporal method may not be used across discontinuities, edges in the phase map must be detected first. The application of edge-detection to the phase maps is shown in Fig. 12. Fig. 12a and Fig. 12c each show the application of the presented 2π-invariant edge detector to the wrapped phase maps. Fig. 12b and Fig. 12d show the output of a standard edge detector that uses a standard Laplacian and a standard absolute distance to obtain the edge. It can be clearly seen that the standard detector not only detects the edges in the phase map but also the phase jumps caused by the wrapping of the phase values. The presented detector, on the other hand, detects only the real edges in the phase map. Since phase map 1 is continuous, no edge is detected. Only a few individual pixels are detected as edges since the edge detector is of course also influenced by the noise. The spiral shape of phase map 2 can be detected very well and in addition, only a few individual pixels are incorrectly detected as an edge. An optimization of the thresholding parameter in the edge detection could resolve those wrongly detected pixels. However, even a wrongly detected edge may not cause a faulty phase unwrapping, since edge pixels are then "just" unwrapped using the probabilistic temporal method instead of the spatio-temporal method which still performs very well.

2) Phase Reconstruction
The results of the unwrapping of phase map 1 are shown in Fig. 13 for the heterodyne unwrapping, the optimizationbased unwrapping, the proposed temporal unwrapping, and the proposed spatio-temporal unwrapping, respectively. The top row shows the reconstruction as a 3D plot, with the linearly increasing phase ramp subtracted for better visibility. The middle row shows the reconstructed phase and the bottom row shows the respective error.
It can be clearly seen that the heterodyne method works only suboptimally. The total error is quite high and only 78.19 % of the pixels are correctly unwrapped. The reconstructed phase map looks very noisy. The optimization-based method, on the other hand, yields 99.89 % correct pixels and thus provides a far smoother phase reconstruction. Single unwrapping errors occur for phase values close to 0 and 2π, e. g., near the large hill and the deep valley. In addition, some unwrapping errors occur in the center of the phase map near lines where the   wrapped phases show 2π-discontinuities. We cannot directly explain these errors. However, as indicated by Petković et al. [47], their optimization-based method performs worse for non-integer frequencies, which therefore could be the cause. The proposed probabilistic temporal method can correctly unwrap 99.99 % of the pixels. Similar to the optimizationbased method, isolated errors occur for values near the boundaries of the coding interval. The errors along the 2πdiscontinuities of the wrapped phases do not occur here and show that the proposed method also works properly for rational frequencies. The proposed probabilistic spatiotemporal method can correctly unwrap all pixels. At the same time, the general accuracy is higher, as can be seen in the error map by the overall darker green color. Thus, the local information used in the maximum-likelihood estimation not only improves the success rate of the unwrapping but also acts as a denoising filter and therefore leads to lower uncertainty in the estimated coordinate. Fig. 14 shows the results of the unwrapping of phase map 2, again, for the heterodyne unwrapping, the optimization-based unwrapping, the presented temporal and spatio-temporal unwrapping, respectively. Here again, the heterodyne method performs significantly worse than the other methods. Only 78.35 % of the pixels can be correctly unwrapped and the estimation error is very high. As before, the optimizationbased method shows errors at the boundaries of the coding interval, that is, at the right edge of the spiral and the right side of the phase map. Also, in addition, unwrapping errors occur near the 2π-discontinuities of the wrapped phases, which could be caused by the non-integer frequencies.
The probabilistic temporal method again shows smaller errors at the boundaries of the coding interval and faulty lines as in the optimization-based method do not appear. The spatiotemporal method again shows an overall smaller error and can correctly unwrap almost all pixels. The error map shows that the pixels along the edge of the spiral have a larger error. This can be explained by the fact that for these pixels the continuity assumption of the surface is violated and these pixels were detected as an edge, see Fig. 12c. Wherever an edge is detected, the temporal method is used, everywhere else the spatio-temporal method helps to improve the estimation. Further, the spatio-temporal method is clearly more robust against unwrapping errors, which can also be seen in the error map at the right edge of the spiral. Here, only unwrapping errors occur exactly on the edge. The pixels away from the edge can be correctly unwrapped.

VI. CONCLUSION
In this work, we presented a new probabilistic approach for phase unwrapping which uses circular statistics to describe the phase-shift coding. The presented method unwraps all phase measurements simultaneously by finding the coordinate that had the maximum probability to cause the phase measurements. Using circular statistics, both the periodicity of the phase is taken into account and the estimation of the phase uncertainty can be included in the unwrapping process, thus automatically compensating for individual erroneous phase measurements. We achieve this by expressing the individual phase measurements as appropriate stochastic variables, and we investigated different distributions to describe them. We further determined the probability density of the encoded coordinate, found the optimal decoding by a maximum likelihood approach, and thus can implicitly and simultaneously compensate for the wrapping of all phase measurements. Furthermore, we can extend the presented probabilistic method to a spatiotemporal approach by integrating a local surface continuity assumption into the framework and modeling the local pixel neighborhood. This results in an implicit smoothing of the probability densities over the spatial dimensions. To ensure the assumptions are not violated, we detect discontinuities in the surface using a modified edge detector and exclude them from this spatial modeling.
In simulations, we compared our methods with state-of-theart temporal phase unwrapping algorithms and investigated the effect of different noise types. The results show that the proposed methods are noticeably more robust against noise. This provides the ability to increase the acquisition speed of a structured illumination system by using phase-shift coding with fewer shifts, where the noise level is generally higher. It was also shown that the proposed methods allow a relatively free range in the choice of frequencies of the sinusoidal pattern sequence so that even rational frequencies yield good results. At the same time, it is demonstrated that by modeling the periodicity using circular probability densities, the unwrapping errors at the boundary of the coding interval can be significantly reduced. In addition, the inclusion of the phase uncertainty allows to automatically compensate for too noisy phase measurements, making the presented methods very robust against impulse noise. Although the von Mises distribution does not ideally describe the phase noise, it is better able to handle impulse distortions and thus proves to be a suitable compromise to compensate well for both Gaussian noise and impulse noise at the same time. Finally, the extension of the temporal approach to a spatiotemporal approach can considerably increase the robustness of the method even further, eventually leading to improved performance of optical metrology applications.