Multiprojector Multicamera Structured Light Surface Scanner

Structured light surface scanners are usually limited to a single projector and cannot provide a full surface profile in a single scan. Multiple projectors must be employed to extend the imaging area over the whole surface of the imaged object. Using multiple projectors is a challenging problem due to inter-projector interference which makes pattern decoding hard. We describe a practical approach to multi-projector structured light surface scanning based on sinusoidal fringe patterns and temporal phase shifting which allows simultaneous projector employment. In our approach a different set of temporal phase shifts is assigned to each projector to enable an efficient and robust separation of observed combined patterns into the contributions of each projector. The proposed approach does not place any restrictions on projector placement nor on the upper limit of the number of projectors used. We demonstrate the applicability of the proposed approach on a structured light surface scanner comprised of three projectors and six cameras.


I. INTRODUCTION
Structured light (SL) surface scanning [1], [2] is an important topic in computer vision and industrial metrology with different applications such as 3D sensing, object recognition, industrial inspection, reverse engineering, clothing design, to name a few. Most often a digital projector and a digital camera are used; together they form a calibrated projector-camera pair which is called a SL surface scanner. In SL surface scanning a pattern is projected onto an object of interest, is then imaged, and the deformation of the pattern due to object shape is observed. In such a setup the projector is an active device which emits modulated light, a SL pattern, onto the scanned object. The camera is a passive device which observes the light reflected from the object's surface. The image acquired by the camera contains the deformed SL pattern which carries all the information required to reconstruct the surface profile. How that information is extracted from the acquired image(s) depends on the pattern type. For example, in fringe projection profilometry (FPP) [3] pattern's phase is The associate editor coordinating the review of this manuscript and approving it for publication was Olutayo O. Oyerinde . the information carrier, while for binary coded patterns [4] information is encoded using pattern intensity.
Many practical applications of SL profilometry require a full surface profile of an object [5]. Obtaining the whole surface profile in a single scan using a SL surface scanner comprised of a single projector-camera pair is impossible due to a limited field-of-view (FOV) and due to inevitable occlusions and shadows. Therefore, multiple scans must be performed to obtain many partial surface profiles which must then be stitched together to construct the full surface of an object. Such multiple partial scans of a single object may be achieved in two ways: (1) by moving either the object or the scanner, and (2) by adding more projectors and cameras to the scanner itself. In the first approach partial scans are always obtained sequentially one after another with movement performed in-between the scans to change the FOV. Movement may be done by hand [6], however most often a rotating table [5], [7], [8], [9], [10] is used to move the object. Sometimes a robotic arm [11], [12] is used to move the scanner. The use of rotating tables or robotic arms simplifies the stitching of partial scans by providing a accurate information about the relative movement. If no such information is available then scan stitching becomes a tedious task, e.g. when movement is performed by hand. In the second approach all projectors and cameras may be used simultaneously. If all projectors and cameras are calibrated to the same reference coordinate system then such simultaneous employment eliminates the stitching step: the full surface profile is obtained immediately. Also, in contrast to the first approach, no movement whatsoever is required which may be a significant advantage in some applications.
What are the problems when adding more cameras and projectors to a SL surface scanner? Adding more cameras to the scanner is trivial as they are passive devices. Adding projectors is much more difficult as they are active devices which mutually interfere. A careful design of a SL pattern is necessary to overcome the undesired effects of inter-projector interference. Unfortunately, almost all SL patterns currently described in the literature [13], [14], [15], [16] are designed for a single projector systems. Using such SL patterns limits multi-projector scanners to the simplest acquisition strategy where projectors are used in-turns, a so-called sequential projector employment [17], [18]. The sequential projector employment is not desirable as it unnecessary prolongs the scanning time and as it increases the amount of acquired images proportionally to the number of projectors used. A true multi-projector multi-camera scanner must employ projectors simultaneously, hence it requires a SL pattern which allows clear, efficient and robust separation of acquired images into the contributions of each projector.
Surface scanners using more than one projector or camera may be classified into three distinct groups: (a) singleprojector multi-camera scanners; (b) multi-projector singlecamera scanners; and (c) true multi-projector multi-camera scanners.
Single-projector multi-cameras scanners are frequently presented in the literature. The most common approach is active stereo [19], [20], [21], [22], [23] where a simple texture which improves the robustness of stereo-matching algorithms is projected. However, the projected texture is never decoded, hence all such methods are not true SL methods. Approaches that both project and decode a SL pattern are also plenty: a fringe SL pattern is used in [24], [25], [26], [27], and [28], Gray code in [29], and line patterns in [30] and [31]. More specifically: in [24], [25], and [26] a stereo-pair is used to directly unwrap a single wrapped phase; in [27], [28], and [31] a decoded SL pattern is used to drive the stereomatching algorithm; in [29] the projected Gray code is used to directly establish the correspondences between the two cameras of a stereo-pair; and in [30] a stereo-pair is used to determine line order and drive SL decoding. Interestingly, all aforementioned scanners use only two cameras which are spatially arranged to form a stereo-pair with a maximum possible FOV overlap between cameras, i.e. additional cameras are not used to increase the size of the visible surface.
There are several multi-projector single-camera scanners described in the literature. The most common approach is assigning different spatial orientations to SL pattern of each projector which enables pattern separation: Woolford and Burnett [32] describe a two-projectors one-camera system which uses one horizontal and one vertical sinusoidal fringe; Je et al. [33] describe a two-projectors two-cameras system which uses one horizontal and one vertical color stripe pattern; and Sagawa et al. [34] and Furukawa et al. [35], [36] describe a two-and three-projectors single-camera systems which use colored line pattern with different spatial line orientation for each projector. Another approach is by Servin et al. [37], [38] who describe a multi-projector single-camera system which relies on a very precise spatial placement of projectors, a so called co-phased profilometry, and therefore has limited general applicability. Similarly, Jia et al. [39] describe a four-projectors one-camera system specifically designed to scan indoor scenes; their design places the projectors in a circle facing outward so there exists no overlap between projectors' FOV.
Multi-projector multi-camera scanners are not often described in the literature. Two main reasons may be the difficulty of the multi-projector interference problem and the build complexity of such scanners. Yan et al. [40] provide a theoretical analysis of a multi-projector multi-camera system which combines De Bruijn color sequences and random dot patterns; unfortunately, their work provides simulated results only making practical applicability dubious. Garcia and Zakhor [17], [41] present a three-projector nine-camera system for human body scanning which uses hardware synchronization to implement sequential acquisition strategy which neatly sidesteps the projector interference problem at the cost of increased acquisition time. Petković et al. [42], [43] present a three-projector six-camera system where phase shifts are carefully assigned to each projector to enable efficient separation of projected patterns by using discrete Fourier transform.
Considering the state-of-the art a majority of multiprojector scanners described in the literature use some form of spatial pattern multiplexing where different spatial orientations are used to separate the patterns. However, enforcing different spatial orientations in a multi-projector systems places undesirable limits on projector placement. Furthermore, it is not easily extensible to an arbitrary number of projectors due to the fact that spatial orientation of a SL pattern changes from projection to recording depending on the object's surface normal and on the position of the camera. Using temporal pattern multiplexing [42] is a more attractive solution as it enables robust separation of projected patterns without placing constraints on projector placement.
In this work we describe a practical approach to multiprojector multi-camera SL surface scanners which is an extension of the method we have proposed in [42] and [43]. The proposed approach uses gray-level sinusoidal fringe patterns such that each projector has its own set of temporal phase shifts which together allow robust separation of projected patterns when projectors are employed simultaneously. The ability to employ projectors simultaneously is an advantage compared to the scanners which use sequential employment [17], [18], [41]. Also, the proposed approach does not place any restrictions on projector placement nor on the upper limit of the number of projectors used; an arbitrary spatial arrangement is allowed. That property is a clear advantage compared to the scanners which use spatial orientation to separate patterns [32], [33], [34], [35], [36]. Compared to our previous work [42], [43]: • We extend the method from fixed phase shifts mandated by the use of the discrete Fourier transform to arbitrary phase shifts.
• We give clear instructions and analytical expressions on how to select phase shifts to achieve an efficient computational separation.
• We discuss the conditions which phase shifts must satisfy for the separation to be robust to unwanted harmonic interference.
• We discuss in detail the photometric calibration of the scanning system. This article is organized as follows: in Section II we first give a short review of single projector FPP using temporal phase shifting which is followed by the description of the proposed method. Experimental results and discussion are presented in Section III. We conclude in Section IV.

II. MULTI-PROJECTOR MULTI-CAMERA SCANNER
Before discussing the proposed method we first review a classical temporal phase shifting strategy which uses one projector only [44], and then we then extend this strategy to an arbitrary number of projectors.

A. CLASSICAL TEMPORAL PHASE SHIFTING
In classical temporal phase shifting one projector is used. The projector projects a set of N sinusoidal fringe patterns all having the same spatial frequency ω and a different temporal phase shift ϕ[n]. Each of N projected fringe patterns is imaged by the camera yielding a set of N frames.
The intensity of the nth sinusoidal fringe in the projector's coordinate system (x PRJ , y PRJ ) is: where I 0 is projector's intensity and where n = 1, . . . , N is a frame number (time-step). Note that Eq. (1) states that the projector coordinate x PRJ is encoded in the phase using the spatial frequency ω. Also, instead of x PRJ we may encode y PRJ or a spatial coordinate along an arbitrary axis in the projector's image plane. The intensity of the nth projected fringe as observed by the camera and expressed in camera's coordinate system (x CAM , y CAM ) is: where I AMB is a constant ambient illumination and where h, 0 ≤ h < 1, models a pixel-dependent channel loss. Combining Eqs. (1) and (2) yields: where a = I AMB + 1 2 hI 0 (4) is a time-invariant component comprised of both ambient and averaged projector illumination and where is the amplitude of the observed fringe or contrast. Eq. (3) defines a spatio-temporal signal where ϕ[n] are temporal phase shifts and where ωx PRJ = is a spatial phase. Note that in Eq. (3) the projector coordinate x PRJ is encoded in the phase for each camera pixel (x CAM , y CAM ); recovering the signal phase enables the extraction of x PRJ and subsequent surface reconstruction via triangulation. Hence, all processing is temporal and is performed on a set of N frames for each camera pixel (x CAM , y CAM ) independently. From here on spatial camera coordinates are omitted for clarity.
Expanding Eq. (3) for the nth frame yields +b sin(ωx PRJ ) sin(ϕ[n]). (6) We must recover the projector coordinate x PRJ from the set of N temporal observations I CAM [n]. That is usually done using the least-squares approach as explained in [44], [45], and [46]. We say the signal I CAM [n] is decomposed into parameters according to Eq.
for n = 1, . . . , N . Then Eq. (6) may be interpreted to state we are observing a linear combination (a weighted sum) of three N -dimensional vectors in V with the weights which we have to recover. Let G be the 3×3 Grammian matrix of all dot-products in V , where the time-step n was omitted for clarity. If the vectors in V are linearly independent then dim span(V ) ≥ 3, the Grammian matrix G has an inverse, and signal decomposition VOLUME 10, 2022 of Eq. (6) may be solved for the unknown weight vector w using the orthogonality principle as follows: If all temporal phase shifts ϕ[n] are different then it is sufficient to have N ≥ 3 for the invertible matrix G. Also, due to Hilbert's projection theorem the weights given by Eq. (11) are optimal in the least-square sense and are equivalent to solutions presented in [44], [45], and [46]. Once the weights w are recovered using Eq. (11) a wrapped spatial phase φ ≡ (mod 2π) may be recovered using and the contrast b may be recovered using To reconstruct the surface profile first the contrast b is used to determine which areas are illuminated by the projector by comparing it to some threshold T . Pixels for which are considered illuminated. The threshold T is best set as a percentage of 1 2 I 0 , e.g. setting T to 10% of 1 2 I 0 requires that channel loss h of Eq. (5) is no lower than 0.1. Next, the wrapped spatial phase φ is unwrapped for the illuminated pixels only using one of the well known phase unwrapping algorithms [47], [48], [49] to obtain . The spatial projector coordinate x PRJ is then recovered using Finally, the surface profile is reconstructed via triangulation [50].
In practice the phase shifts ϕ[n] may be freely selected as long as the Grammian matrix is invertible. A selection of particular interest is one which diagonalizes the Grammian matrix G of Eq. (10) and which makes Eq. (11) trivial to solve. One possible choice which is the standard in FPP [47] is The additional advantage of phase shifts given by Eq. (16) is that they allow the user to design a customized phase extraction filter using the results of signal processing theory as was proposed by Servin et al. [51].

B. MULTI-PROJECTOR TEMPORAL PHASE SHIFTING
Let us now present the proposed solution to the problem of simultaneous employment of multiple projectors, which is an extension of our previous work [42].
Let P be the total number of projectors. When all projectors are employed simultaneously a camera observes an additive combination of all projected sinusoidal fringes for each of N recorded frames. Adding the contributions of additional P − 1 projectors to Eq. (3) yields where b p , ω p , and ϕ p [n] are contrast, spatial fringe frequency, and temporal phase shift of pth projector. The term a of Eq. (3) which contains the ambient illumination and the average intensity of each projector is similarly transformed to where I p is intensity of pth projector and where 0 ≤ h p < 1 models a pixel-dependent channel loss from pth projector to the camera. The contrast terms b p of Eq. (17) are now different for each projector, Similarly to the decomposition of Eq. (6) the observed intensity given by Eq. (17) may be decomposed into a linear combination of sine and cosine functions: Then Eq. (20) may be interpreted as a linear combination of 2P + 1 vectors with 2P + 1 weights The weights w must be recovered from the N -element observation vector I CAM [n]. Recovery may be performed in the same way as was done in Section II-A for the case of one projector. The Grammian matrix G is a 2P + 1 × 2P + 1 matrix: If all vectors in V are linearly independent then dim span(V ) ≥ 2P + 1 and the Grammian matrix G has an inverse. The least-squares estimate of weights is then The weights obtained by Eq. (24) are clearly separated by projector index p, therefore the signal separation between projectors when they are employed simultaneously is achieved. Due to clear signal separation the unknown projector's coordinate x PRJ,p of pth projector is easily recoverable using Eqs. (12) and (15), as explained in Section II-A.

C. SELECTING TEMPORAL PHASE SHIFTS
A key difference between one projector and multiple projectors is in the selection of temporal phase shifts ϕ[n]. For one projector the selection is a simple one as explained in Section II-A. For P projectors the phase shifts ϕ p [n] have to be selected to allow simple, efficient, and robust wrapped phase recovery.
A similar problem was studied in signal processing which gives one important result regarding wrapped phase selection [52]. Let the temporal phase shifts for the pth projector have the form where 0 < p and p 1 = p 2 for p 1 = p 2 , i.e. temporal phase shifts ϕ p [n] are selected from an increasing linear sequences each having a different temporal frequency p . For such choice of temporal phase shifts if N ≥ 2P + 1 then all 2P + 1 vectors in V are linearly independent [52]. Then the Grammian matrix G given by Eq. (23) is invertible, which is a sufficient condition to separate observed frames into the contributions of each projector. This result means one may randomly assign different temporal frequencies p to each of p projectors to realize a multi-projector surface scanner which allows sequential projector employment, or even more generally phase shifts ϕ p [n] may be randomly assigned. 1 However, a random choice of temporal frequencies and/or phase shifts will in general yield a Grammian matrix G which is not diagonal; having a diagonal Grammian matrix G significantly reduces computational complexity and makes separation efficient. Also, in addition to efficiency we may also require some resistance to signal interference to achieve robustness.
In the remainder of this section we first discuss conditions required to make separation efficient, and then we discuss conditions to make separation robust. 1 For a completely random assignment one must always numerically verify that the Grammian is invertible.

1) LINEAR PHASE SHIFTS FOR EFFICIENT SEPARATION
The Grammian matrix G given by Eq. (23) is comprised of several different dot products. Three of these must always evaluate to zero for the Grammian matrix to be diagonal. The dot products which must evaluate to zero are: Eq. (26) must hold for all p, p 1 , and p 2 , p 1 = p 2 . Combining Eq. (25), which guarantees the invertibility of the Grammian matrix if all p are different, with Eq. (26), which guarantees the Grammian matrix is diagonal, gives the conditions on the set of phase shifts required to achieve an efficient separation of projectors' contributions.
Evaluating all three sums of Eq. (26) as a finite sums of geometric series of complex exponentials yields the following equations: Solving Eq. (27) together with conditions of [52] then gives the sufficient conditions for invertible and diagonal Grammian matrix: To summarize, the sufficient conditions to obtain the invertible and diagonal Grammian matrix which is required for efficient separation of projectors are: (1) the number of projected fringes/recorded frames N must be at least twice the number of projectors plus one, (2) phase shifts of pth projector must form a discrete linear sequence whose temporal frequency p is a multiple of 2π/N , and (3) temporal frequencies of phase shift sequences must be different for each projector.
One possible choice which satisfies the conditions of Eq. (28) is selecting the temporal frequency of the phase shift sequence using the projector index p: Then phase shifts for the pth projector are and the two vectors in V corresponding to the pth projector are The choice of temporal phase shifts given by Eq. (29) significantly simplifies the phase recovery as Eqs. (12) and (13) (32) Furthermore, note that due to the diagonal G the right-hand side of Eq. (24) reduces to dot products of I CAM [n] and all vectors in V , i.e. we are representing I CAM [n] using vectors in V as a basis. The basis in V is comprised of one constant function which models the ambient illumination and of sine and cosine functions which model each projector. All elements of V may then be decomposed into sums of complex exponentials [53], where denotes a time-discrete complex exponential with the period B and the exponent A; j is an imaginary unit. The decomposition of vectors in V into terms of complex exponentials allows the use the fast Fourier transform (FFT) algorithm [54] for signal decomposition as 2P + 1 complex exponentials W ±(n−1)p 2P+1 , p = 0, . . . , P and n = 1, . . . , 2P + 1 are the basis of the discrete Fourier transform in 2P+1 points [53]. Hence, both wrapped phase φ p and contrast b k may be computed using FFT. Let (35) be the discrete Fourier transform of I CAM [n] in 2P + 1 points computed using the FFT algorithm. Then and Therefore, selecting the linear phase shifts using Eqs. (29) and (30) allows one to use the FFT algorithm to efficiently separate the contributions of P projectors from 2P + 1 observed frames.

2) TEMPORAL ALIASING AND ROBUSTNESS
The choice of linear phase shifts using Eqs. (29) and (30) may lead to temporal aliasing problems when projectors do not project a perfect sinusoid. This may happen due to many reasons; a common one in digital fringe projection profilometry is gamma transformation [55], [56], [57], [58]. Consider first a case of a single projector. Denote by g(·) the non-linear amplitude transfer function of the projector, i.e. for an input I the projector outputs g(I ). Eq. (2) then becomes The second term in Eq. (38) may be decomposed using the Fourier series expansion, (39) where c k are the coefficients of Fourier series expansion, i.e. c 0 is the amplitude of the fundamental harmonic of the projected fringe and c k , k > 0, are amplitudes of harmonic overtones. The value of coefficients c k depends on g(·).
In practice there usually exists some finite k 0 such that values of all c k for k > k 0 are almost zero. Hence, the harmonics overtones for all k > k 0 have a negligible contribution and may be discarded from the sum.
In Eq. (40) coefficients c 0,p correspond to contrasts b p of Eq. (20). Coefficients c k,p , 1 ≤ k ≤ k 0 , are unwanted harmonic overtones in the mixture which are normally called harmonic distortion. One additional issue of harmonic distortion in multi-projector systems which is not present in single projector systems is that harmonic distortion of a particular projector is not limited to itself, instead it affects all other projectors as well. For example, in two-projector system the choice of parameters given by Eq. (28) causes the first harmonic overtone of the first projector c 1,1 to affect the fundamental harmonic of the second projector c 0,2 due to 2 1 n = 2 2π 2P+1 n = 2 n. Even worse, for a finite number of N frames all harmonic overtones whose temporal frequency is larger than π will alias into the baseband making determination of affected fundamental harmonics a non-trivial task.
Consider a general case of an overlap between one fundamental harmonic and one harmonic overtone for N phase shifts (frames) and P projectors. Let p 1 and p 2 be temporal frequencies of phase shifts for two projectors p 1 and p 2 . 2 Then, the kth harmonic overtone of p 1 will affect the fundamental harmonic of p 2 if holds. Expanding this condition using Eq. (28) yields two Diophantine equations, If Eq. (42) has a solution (k, l) in whole numbers for 1 ≤ k ≤ k 0 then kth harmonic overtone of projector p 1 affects the fundamental harmonic of projector p 2 . By repeating Eq. (42) for all possible projector pairs we obtain a system of Diophantine equations [59] which, to achieve the insensitivity to the first k 0 harmonics, must have no solutions in whole numbers for 1 ≤ k ≤ k 0 . One way to achieve such insensitivity is to increase the number of projected fringes/recorded frames N from the minimal value of 2P + 1 and to neglect the temporal frequency assignment given by Eq. (29). Then the temporal frequencies p of phase shifts may be selected to shift the unwanted harmonic overtones into the unused 3 basis vectors of the discrete Fourier transform basis. Therefore, to achieve robustness to interference a separation between selected frequencies must be maintained.
Unfortunately, an analytical solution to Eq. (42) is not known to us (if it even exists), so we can only solve the frequency selection numerically. An example for three projectors, P = 3, and k 0 = 1: a particular N for which Eq. (42) has no solution for 1 < k ≤ k 0 in all projector-pairs is N = 12 using temporal frequencies 1 = 1 2π N , 2 = 3 2π N , and 3 = 5 2π N .

III. RESULTS AND DISCUSSION
To demonstrate the applicability the proposed approach we have constructed a prototype scanner comprised of three projectors denoted -and of six cameras denoted -. The prototype is shown in Fig. 1. The first projector is Canon LV-WX310ST while the remaining two, and , are Acer S1383WHne. All cameras are PointGrey's Grasshopper3 GS3-U3-23S6C-C; the first four cameras -are equipped with Fujinon HF12.5SA-1 lenses while the remaining two, and , are equipped with Kowa LM8JCM lenses. All projectors and cameras are operated at their native resolutions: 1280 × 800 for projectors and 1920 × 1200 for cameras.
The architecture of the assembled prototype using three vertical poles on wheeled carts provides flexibility when testing various spatial arrangements. We have used two spatial arrangements. In the first arrangement all three carts were placed directly in front of an object to maximize the overlap between all projectors (Fig. 1). In the second arrangement each cart was placed on a vertex of (an equilateral) triangle with the object in the triangle's center. The first arrangement which maximizes inter-projector interference is used to test the proposed separation approach. The second arrangement which maximizes surface coverage is used to showcase how the (almost) complete surface may be acquired in a single scan. The remainder of this section is organized as follows: We first briefly explain how the scanner was calibrated, starting with geometric calibration and continuing with photometric calibration. Then, we present the results of spectrum estimation for both photometrically calibrated and uncalibrated projectors to demonstrate that the photometric calibration is necessary and cannot be skipped. Next, we present experiments to demonstrate the effectiveness of the proposed signal separation scheme and experiments to showcase a single scan surface reconstruction. Finally, we conclude this Section with a discussion.

A. CALIBRATION
The 3D surface scanner must be calibrated prior to use. The calibration is usually separated into geometric and photometric calibrations which may be performed independently. The purpose of geometric calibration is to retrieve the intrinsic and extrinsic parameters of all projectors and cameras. The purpose of photometric calibration is to retrieve the intensity transfer functions for all projectors and cameras. For our prototype (Fig. 1) the photometric calibration is performed once while geometric calibration must performed each time three carts carrying projectors and cameras are rearranged in space.

1) GEOMETRIC CALIBRATION
We have performed geometric calibration exactly as explained in our previous work [60].   Briefly, projectors are modeled as inverse cameras and the approach of Zhang and Huang [61] is used to enable each projector to ''capture'' the images of a calibration board like a camera, thus reducing the calibration problem to the well known case of calibrating a multiple-camera system.
The calibration object is a double-sided calibration board which is shown in Fig. 2 and which was assembled by gluing two large vinyl stickers containing the calibration pattern onto both sides of a whiteboard mounted on wheels. A rigid transformation (rotation and translation) between the front and the back calibration pattern was then determined using MicroScribe G2LX digitizing arm. To calibrate the system in practice the double-sided calibration board is recorded in many positions. The coordinates of white circles are then automatically extracted the whole system is calibrated as a multi-camera system using bundle adjustment [50], [62]. The final result of geometric calibration are intrinsic and extrinsic parameters of each camera and projector, where extrinsic parameters refer to a common coordinate system. A calibration result for the second spatial arrangement where projectors and cameras surround an area of interest is shown in Fig. 3.

2) PHOTOMETRIC CALIBRATION
We have performed the photometric calibration of the system to correct the non-linear intensity transfer function of the projectors, specifically to compensate for projector's unknown gamma factors. For each projector we must measure its gamma factor γ . Once γ is known a pre-corrected image (43) is sent to the projector to compensate for projector's gamma transformation [55]. The factor γ of Eq. (43) may be determined from the intensity transfer function using the least-squares power-law fitting [63] to determine the parameters a and γ of the model I CAM = aI γ PRJ . The gray-level intensity transfer function of each projector was sampled in 255 points by recording a planar white board in a dark room. To this purpose we have used PointGrey's Dragonfly2 DR2-HICOL camera which was operated in 12 bit acquisition mode. This camera was intentionally used for the intensity calibration of the projectors as it has better intensity resolution than Grasshopper3 GS3-U3-23S6C-C. After the least-squares power-law fitting (see Fig. 4) the estimated values of γ factors for three used projectors are: γ 1 = 2.1725 for , γ 2 = 2.1849 for , and γ 3 = 2.1812 for .

B. PHOTOMETRIC CALIBRATION AND HARMONIC DISTORTION
To evaluate the effectiveness of the photometric calibration regarding harmonic distortion we have estimated temporal power spectral densities (PSD) using Bartlett's method [64]. Temporal PSD were estimated for each projector separately using a white board and PointGrey's Dragonfly2 DR2-HICOL camera which was operated in 12bit acquisition mode.
To estimate the temporal PSD of a sinusoidal fringe pattern comprised of N phase shifts using some temporal frequency we have to record a total of N · M , M N , frames using temporal phase shifts ϕ[n] = (n − 1), n = 1, . . . , N · M . This produces an ensemble of temporal signals, one temporal signal for each camera pixel, all having exactly N · M samples. Applying the Bartlett's method of periodogram averaging [64] to such ensemble produces the desired temporal PSD estimate. Note that the temporal PSD estimate obtained it this way is invariant with regard to the spatial fringe frequency ω.  We have performed two different experiments to evaluate the effect of harmonic distortion caused by projector's gamma: (1) estimation of temporal PSD for the case of N = 7 phase shifts using M = 90; and (2) estimation of temporal PSD for the case of N = 12 phase shifts using M = 70. In both experiments temporal PSDs were estimated twice, first without compensating for projector's gamma and then with compensation given by Eq. (43). Also note that values M = 90 and M = 70 are only used to obtain reliable PSD estimates; such a huge number of images in not required for normal 3D scanning.

1) PSD FOR N = 7 PHASE SHIFTS
Three temporal PSDs for N = 7 without compensating for projector's gamma are shown in Fig. 5; all are for using three available choices of temporal frequency . Note that 1st harmonic overtone is always attenuated about −11 dB compared to the fundamental harmonic. In terms of amplitudes used in Eqs. (39) and (40) such attenuation gives c 1 /c 0 ≈ 0.28, i.e. 1st harmonic overtone is significant as its amplitude is 28% of the amplitude of the fundamental harmonic. The 2nd harmonic overtone is about −37 dB attenuated (c 2 /c 0 ≈ 0.014) and may be neglected. A complete list of attenuations for 1st and 2nd harmonic overtones compared to the fundamental harmonic for all three projectors is listed in Table 1.
Three temporal PSDs for N = 7 after compensating for projector's gamma are shown in Fig. 6 for projector only while the list of attenuations for 1st and 2nd harmonic VOLUME 10, 2022  overtones compared to the fundamental harmonic for all three projectors is given in Table 2. Compensating for projector's gamma adds on average −30 dB of attenuation to the first harmonic overtone, i.e. after compensating for projector's gamma we have c 1 /c 0 < 9% which is a significant improvement. Note that this result is consistent across all three projectors.

2) PSD FOR N = 12 PHASE SHIFTS
Five temporal PSDs for N = 12 are shown in Fig. 7 without compensating for projector's gamma and in Fig. 8 with compensation for projector's gamma; all PSDs are for using five available choices of temporal frequency . Similarly to the first experiment note the significant reduction in the power of harmonic overtones after the compensation for projector's gamma given by Eq. (43) is applied.
Of particular interest is the topmost PSD in Fig. 7 which gives attenuations of for first four harmonic overtones: −11.16 dB, −41.01 dB, −44.98 dB, and −45.82 dB. The most significant is the first harmonic overtone while other three are quite similar. This means we may in practice choose k 0 = 1 to cut the Fourier series expansion of Eq. (40) (Section II-C2). Also note that the choice of temporal frequency = 4 2π 12 for N = 12 is particularly bad as then the first harmonic overtone overlaps its own fundamental harmonic due to aliasing.
The case of N = 12 phase shifts is interesting as observed PSDs confirm that for the usual gamma factor a reliable separation can be achieved by shifting the dominant first harmonic into unused basis vectors as explained in the last paragraph of Section II-C2, which achieves no overlap between all possible fundamental harmonic and 1st overtone pairs.

C. SEPARATION OF PROJECTED PATTERNS
To showcase and test the proposed separation of projected patterns we have performed several experiments using the first arrangement of projectors which maximizes the unwanted effects of inter-projector interference. The subjects which were scanned are: a human male, a calibration board, and a mannequin.
To perform these experiments we have to choose a spatial frequency and select temporal phase shifts for each projector. We have selected three spatial frequencies, ω A = 2π 21 W , ω B = 2π 31 W , and ω C = 2π 15 W , where W = 1280 px is projector width, with N = 7. To each of these three spatial frequencies we have assigned its own temporal phase shift sequence: ϕ A [n] = 2 2π 7 n, ϕ B [n] = 3 2π 7 n, and ϕ C [n] = 1 2π 7 n. These spatial frequencies and phase shifts are then assigned to projectors in a round-robin fashion: first projects A, projects B, and projects C; then projects C, projects A, and projects B; and finally projects B, projects C, and projects A. This means each camera acquires a set of 7 · 3 = 21 frames. Such choice allows us to use multiple-phase shifting (MPS) unwrapping strategy [48] to recover projector coordinate from three wrapped phases at different spatial frequencies.

1) SEPARATION EXAMPLE
A scan of a human male is used as an example of the separation process. Each of three projectors (P = 3) projects seven patterns (2P + 1 = 7) which are in turn recorded by six cameras, so each camera concurrently [65] acquires exactly seven images. Separation of recorded images is performed as explained in Section II-B for each camera separately, and the separation result for one camera is shown in Fig. 9. Note that the obtained decomposition yields seven images. The first one is a constant component which is the combined contribution of ambient illumination and of averaged illumination of each of three used projectors. The remaining six images contain amplitudes and wrapped phases for each of three projectors. Note that in Fig. 9 the intensity scaling for components a,  FIGURE 9. Separation of projector contributions for . Seven input frames are decomposed using the proposed method to obtain contributions of each individual projector. Note that φ 1 has spatial frequency ω A = 2π 21 W , φ 2 has ω B = 2π 31 W , and φ 3 has ω C = 2π 15 W .

FIGURE 10.
Recovered wrapped phases and projector coordinates for . Note different shadows in the projector coordinate x PRJ for three projectors. VOLUME 10, 2022 FIGURE 11. Separation of projector contributions for into amplitudes and wrapped phases with and without gamma pre-correction when using seven SL patterns. All amplitude images b 1,2,3 are linearly contrast-stretched to make the interference more noticeable. Three leftmost columns show separation without gamma pre-correction: amplitudes b 1,2,3 are all affected by other components due to interference and wrapped phases φ 1,2,3 exhibit unwanted artifacts. Three rightmost columns show separation with applied gamma pre-correction: amplitude components are free of interference and wrapped phases are correct. b 1 , b 2 and b 3 and the input image is the same, and due to this a appears much brighter than individual components b 1 , b 2 and b 3 (as it is their sum). Phases φ 1 , φ 2 and φ 3 are wrapped phases and have to be unwrapped to obtain projector coordinates, which is done by recording another set of images using different frequencies in a round robing fashion as explained earlier and as illustrated in Fig. 10. The phases are unwrapped using the method of [48].

2) REGULAR AND RANDOM PHASE SHIFTS
To showcase how separation performs for regular and for random phase shifts we use a mannequin which is stationary during multiple acquisitions thus facilitating an easier qualitative comparison.
The mannequin was recorded four times in the same position using the minimal number of SL patterns: (i) using regular phase shifts without gamma pre-correction; (ii) using regular phase shifts with gamma pre-correction; (iii) using random phase shifts without gamma pre-correction; and (iv) using random phase shifts with gamma pre-correction. Separation results are shown in Fig. (11). Note that the amplitude components b 1 , b 2 and b 3 should in the ideal case depend only on the output illumination of the projector and on the albedo of the object; they should not contain visible traces of the fringe patters projected by other projectors. However, if projector's non-linear amplitude transfer function is not accounted for using the gamma pre-correction then this causes interference which affect both amplitudes and wrapped phases. The unwanted interference in amplitude images manifests as a fringe pattern having a different spatial frequency, and in wrapped phase images as an unwanted ripples.
Also note that the robustness of separation scheme for random phase shifts may depend on the selected phase values, i.e. phase shift values which are not evenly spread over the whole 2π interval of wrapped phases may lead to numerical instabilities in Eq. (24), and which is a drawback of a random choice. However, one advantage of randomly selected phase shifts is that they may be independently selected while retaining a high probability that separation is possible, which may be of use when e.g. a projector is added to an existing closed system.
Considering results shown in Fig. (11) and taking into considerations the observed temporal PSDs to reduce the effects of unwanted harmonic distortion all projectors in a multi-projector SL scanner must be photometrically calibrated to project a clear sinusoid with harmonic overtones maximally suppressed. If such photometric calibration is for any reason not possible then surface scanning may be performed using N > 2P + 1 and k 0 ≥ 1, i.e. unwanted harmonic components may be shifted into spectral bands which are unused as explained in Section II-C2. An example is shown in Fig. 12 which demonstrates that unwanted artifacts can be significantly suppressed by frequency and phase shift selection which places dominant overtones into unused basis vectors. We also note that such an approach will not work with randomly selected phase shifts as the random selection causes spectral leakage, i.e. if random shifts or random frequencies are used then all projectors must be photometrically calibrated.

3) QUANTITATIVE RESULTS
A flat board (shown in Fig. 13) and a mannequin are used for quantitative comparison between sequential and simultaneous acquisitions due to the fact that they are stationary and are easily recorded multiple times, allowing for a straightforward comparison between different SL patterns. As the most important output of the proposed method are wrapped phases we compare them using the absolute circular distance d circular (·, ·) as the error metric:  29) and (30). Bottom row shows wrapped phases using proposed assignment of Section II-C2. Note the missing artifacts for the bottom row. where φ is the wrapped phase obtained using Eq. (36) and φ MPS is the wrapped phase obtained using a classical sequential approach which is taken to be the ground truth. The error metric was computed for the illuminated pixels only using the threshold T of 5%, see Eq. (14).
Average errors in the form of mean and deviation are listed in Table 3 for the calibration board and in Table 4 for the mannequin. Listed errors are averages over all cameras for all projectors; the results are identical for individual camera and projector pairs so we omit those. Sequences denoted with ''regular'' use phase shifts listed at the start of Section III-C, while ''random'' sequences use random phase shifts selected from the uniform distribution over a 2π interval. Note that errors in the wrapped phase are between 1 • and 2 • and more   importantly, when transformed to errors in decoded projector coordinate in pixels, all listed mean errors are in the subpixel range. Note that the errors for the mannequin have a somewhat lower spread than the errors for the calibration board, which is probably due to different albedo: the mannequin is uniformly white while the calibration board is not.
We have also performed the experiment with and without gamma pre-correction for which errors are listed in Table 5. Note that errors for the N = 12 using strategy of Section II-C2 are acceptable for ''regular'' phase shifts.
The last experiment is the least-squares fitting of a plane to the reconstructed point cloud of the calibration board. The median absolute distance of the reconstructed 3D points from the fitted plane is 0.97 mm, the average is 1.45 mm with the deviation 1.61 mm. Considering the scanner's working space which spans several meters the error in the millimeter range is more than acceptable.

D. 3D RECONSTRUCTION
Once projector coordinates are known for each camera a 3D point cloud is obtained via triangulation [50] and no additional registration [66] is required.
Two examples of obtained point clouds are shown in Fig. 14 where points corresponding to different cameras are rendered using different randomly assigned colors. The top  row in Fig. 14 shows reconstructed point clouds for the first spatial arrangement of wheeled carts where all projectors and cameras are facing in the same direction. The bottom row in Fig. 14 shows reconstructed point clouds for the second arrangement where wheeled carts surround the calibration board. Note that regardless of the spatial arrangement of wheeled carts contributions of each projector and camera pair are obtained in the same coordinate system which makes registration unnecessary. Also note two artifacts for the second spatial arrangements where two cameras and directly observe the lens of the opposing projector . A direct observation of projector's lens produces an oversaturated patch in the image. For such an oversaturated patch camera coordinates are well defined but decoded projector coordinates are essentially randomly distributed and cannot be filtered out via thresholding using Eq. (14). As a consequence reconstructed points corresponding to such patches are constrained to a narrow set of camera rays. To avoid such unwanted artifacts position and/or orientation of projectors should be adjusted so no camera directly observes the lens of an opposing projector, or such patches should be identified during calibration and masked out.
Once point clouds are obtained the unwanted parts of the scan may be cut out and a surface may be produced using a standard meshing software such as MeshLab [67], [68]. One reconstruction example for a mannequin is shown in Fig. 15.
Regarding the scanning time our prototype scanner operates at 20 FPS [43], [65]. Therefore, if using ray-plane triangulation then three frequencies with seven shifts per frequency are required giving a total of 21 image to be projected and acquired, so the acquisition time is 1.05 seconds. The processing time for pattern decoding and triangulation is about 1 second per camera-projector pair, so for six camera three projector system the total image processing time is a bit less than half a minute. Note that reported times are for C++ implementation of the image acquisition, while image processing and 3D reconstruction is implemented in Matlab.

E. DISCUSSION
The proposed method uses sinusoidal fringes with a relatively simple and straightforward assignment of spatial frequencies and temporal phase shifts which offers many advantages as discussed before. Additionally, we note that sinusoidal fringes are naturally robust to projector and camera blur as the phase is usually unaffected by blurring due to the spatial symmetry of the blurring optical transfer function [69]. This is a highly desirable property and is a clear advantage of the proposed multi-projector FPP approach over non-FPP multiprojector approaches such as [34], [40], [70].
A possible drawback of the proposed method is the limited dynamic range of cameras, especially if all projectors are positioned to have overlapping FOVs. Camera pixels which observe an overlap of projector's FOVs receive more light, and if there are many projectors then overlap may require a reduction in the light sensitivity of a camera. On the other hand, reducing the light sensitivity of a camera to accommodate for increased illumination affects the signal-to-noise (SNR) ratio for pixels which are illuminated by a single projector only, and for pixels which observe parts of the object which are dark and do not reflect light well. For the proposed approach camera's available dynamic range must be shared between all projector's which particular camera observes, and this in turn defines an upper limit to the number of projectors which may simultaneously illuminate any part of the scene. Note that in practice an overlap of more than tree projectors is not expected as we naturally want such a spatial arrangement which maximizes the coverage of the working space which is in turn achieved when FOV overlap is minimized. The available dynamic range can be estimated using estimated PSDs. Observe in Figs. 6, 5, 7 and 8 the existence of the noise floor which is at about −65 dB compared to the DC component. Also observe that after compensating projector's gamma almost all harmonic overtones are at about −42 dB compared to the DC component. Taking into consideration a standard formula for signal-to-noise ratio [71], SNR = 6.02B + 1.76 dB, where B is the number of bits, we obtain B = 10.5 bits for the camera and B = 6.68 for the projector, which is consisted with the equipment used as Dragonfly2 camera is declared to use 12 bits, and projectors to use 8 bits (note that we can never utilize the whole theoretical dynamic range). From our experience 4 to 5 bits of contrast are required per projector which makes a standard 8 bit camera capable of handling three projectors easily.
Recovered 3D surface points may be textured using the value of a as the gray-level intensity. Furthermore, if cameras are recording in color then the DC spectral coefficient a may be computed for each color channel separately yielding a RGB texture with the sinusoidal fringe removed. To get an even better rendering of the final mesh note that photometric stereo [72] may be used to recover surface normals. Consider three images showing the recovered amplitudes b k in Fig. 9 where it is clearly observable that the direction of incident illumination depends on the position of a particular projector. If a sufficient number of viewing directions are known then surface normals may be recovered from the coefficients b k using photometric stereo with an additional advantage that view directions of camera and of projector are known in advance due to geometric calibration. Therefore, by introducing additional data processing the proposed method should be able to acquire both texture and normal information for each reconstructed 3D point with no need to project additional frames such as e.g. an all-white pattern or to acquire images with projectors turned off, and this is a future work we will pursue

IV. CONCLUSION
We have proposed a method for efficient SL scanning which is applicable to multi-projector multi-camera SL scanners. The proposed method is based on a computational separation between projected sinusoidal fringe patterns which is enabled by specific choices of temporal phase shifts: a different set of temporal phase shifts is assigned to each projector in a way which enables efficient and/or robust separation of the observed combined patterns into the contributions of each projector.
We have shown that for a system having P projectors the minimal number of phase shifts required for separation is 2P + 1. Considering specific strategies to select the required phase shifts we have discussed two options, regular phase shifts and random phase shifts. The choice of regular phase shifts is such that phase shifts define basis vectors of the discrete Fourier transform in 2P + 1 (or more) points which in turn enables an efficient separation. The choice of random phase shifts is one where phase shifts are chosen from the uniform distribution over a 2π interval.
The proposed method enables construction of large FPP systems comprised of many projectors and cameras which can obtain the whole surface profile of an object in a single scan. TOMISLAV PRIBANIĆ (Senior Member, IEEE) is currently a Professor with the Faculty of Electrical Engineering and Computing, University of Zagreb. He teaches several undergraduate and graduate courses in the field of algorithms and data structures, image processing, sensors, and human motion analysis. His research interests include computer vision and biomedical signal measurement and analysis. He has led a number of domestic and international scientific projects, collaborating with researchers from within and outside the EU. He was a Visiting Researcher at INRIA Rhone-Alpes, Grenoble, France, Fraunhofer IGD, Darmstadt, Germany, and a Fulbright Visiting Scholar at the University of Wisconsin-Madison, Madison, USA. The results of his research have been implemented in technological projects and he has also received recognition for innovations. He is a Senior Member of IFMBE and a Collaborating Member of the Croatian Academy of Engineering. VOLUME 10, 2022