Optically Synchronized Phased Arrays in CMOS

Abstract: Optical synchronization of large-span arrays offers significant benefits over electrical methods in terms of the weight, cost, power dissipation, and complexity of the clock distribution network. This work presents the analysis and design of the first phased array transmitter synchronized using a fully monolithic CMOS optical receiver. We demonstrate a bulk CMOS, 8-element, 28-GHz phased array building block with an on-chip photodiode (PD) that receives and processes the optical clock and uses an integrated PLL to generate eight independent phase-programmable RF outputs. The system demonstrates beam steering, data transmission, and remote synchronization of array elements at 28 GHz with fiber lengths up to 25 m, in order to show the scaling benefits of our approach. The provision of small-footprint and cost-effective CMOS transceivers with integrated optoelectronic receivers enables exciting opportunities for low-cost and ultralight array systems.


I. INTRODUCTION
Low-cost, functionally complex silicon mm-wave and RF integrated circuits (RFICs) [1]-[5] have significantly changed the nature and potential applications of phased arrays. These RFICs enable very large span, low-cost, lightweight, and flexible phased arrays [6], which may be implemented in a dense or sparse fashion [7], and used in a range of emerging applications.
In a modern modular phased array, a common reference signal is distributed to sub-modules, which are usually fully functional phased arrays themselves. Each sub-module is governed by a single RFIC which drives several radiating elements and performs local beamforming tasks that are essential for the array's functionality.
As array spans become large, electrically synchronizing these sub-modules becomes more challenging. Methods for reference distribution relying on local synthesis were introduced [8]-[10], but are limited in performance. Clock distribution at RF, on the other hand, may become prohibitively expensive and power hungry. Neither of these options scales well with the size, span, or operation frequency of the array. Furthermore, as future arrays shift toward flexible [6], lightweight [5], and cost-effective [10] implementations, the support infrastructure (Fig. 1) for reference, signal, and power distribution becomes a dominant factor in the system's cost, power consumption, and mass.
Optical timing synchronization (OTS), where the timing information is modulated onto an optical carrier and distributed over a long distance, can overcome some of the challenges mentioned above in RF and digital systems. Optical timing distribution has been envisioned in the past [11] and has been realized in various applications today. Several radio astronomy telescopes use or have used optical references to synchronize remote antennas [12]-[15], and optical RF distribution plays a critical role in radio-over-fiber systems [16], [17]. Some work has been done on optical distribution of digital clocks using monolithic CMOS photodiodes (PDs) with varying degrees of integration and success [18]-[20], while other works implement data receiver building blocks using monolithic CMOS PDs [21]-[23].
Considering the advantages of OTS and related prior work, silicon RFICs using optical synchronization can benefit a variety of applications. Indoors, applications that require multiple modules distributed throughout a room can benefit from OTS; for example, mapping the interior of a house for robotic navigation or creating 3-D models [Fig. 1(a)]. In order to reach acceptable subcentimeter scanning resolutions, these systems have to either work in mm-wave frequency bands or use interferometry. If mm-wave bands are used, obstacles must be avoided to reduce losses, which requires multiple antenna modules spread across the room, while interferometry inherently requires multiple modules. Another example is wireless power transfer, for which the avoidance of obstacles may be desirable for efficiency and safety reasons, warranting multiple modules as well. In all such cases, OTS provides a low-cost and flexible alternative to electrical wiring.
Outdoor terrestrial and space applications can also take advantage of the physical flexibility of OTS to enable large arrays, possibly bendable and/or conformable, which may be used in communications, radar, imaging, radio astronomy, and so on. Moreover, constructing these systems out of a large number of sub-modules enables economies of scale and design flexibility, such as hybrid solutions using OTS with local electrical timing distribution, as illustrated in Fig. 1. These advantages are exemplified in space-to-earth solar power transfer [24] or space-based radio telescopes, which heavily rely on extremely low-mass, foldable, highly efficient, and modular phased-array sheets. For both examples, OTS through either fiber or free space provides a low-loss, low-mass solution to the problem of transmitting a reference over large distances. In terrestrial applications, multiaperture coherent radar can yield an SNR increase of up to $N^3$ through the synchronization of $N$ apertures [25]. Radio astronomers have used aperture synthesis for decades to synthesize extremely large radio telescope apertures through timing synchronization [26], and the same technique may be used for terrestrial imaging applications [27].
In Sections II-VII, we will present the design and demonstration of the first complete and functional optically synchronized phased array in CMOS, with an on-chip bulk CMOS PD. We start with an analysis of the benefits and potential use cases of optically synchronized arrays. Then, we will present our OTS system architecture and implementation. We will discuss the various system building blocks and their measurements and finish with measurements of our phased array transmitter, optically synchronized to a remote optical source up to 25 m away from it. Our OTS array demonstrates beam-steering, data transfer, and multimodule synchronization capabilities.

II. TIMING SYNCHRONIZATION SCHEMES
In this section, we will review and compare different methods to distribute a timing reference over large distances. We will compare the materials and performance of an optical reference scheme with common electrical design approaches and draw conclusions about its limitations and potential use cases.

A. Overview and Materials
Fig. 2 illustrates three design approaches to distribute a clock reference to a synchronous array. The first approach [Fig. 2(a)] treats the line as a network of lumped elements and assumes unmatched load terminations [6], [28], [29]. As the array size increases, the lines must be buffered in order to maintain the signal integrity and the validity of the model. In practice, the line capacitance limits the use of this design approach to sub-GHz frequencies. Therefore, it will be denoted as low-frequency timing synchronization (LFTS). This method is cost-effective, power-efficient, and widely used, but the addition of buffers adds jitter and PVT-dependent skew to the clock signal. These deteriorate the quality of transmitted data [30]-[32] and limit the focusing capability of the array [33]. For systems that operate at radio frequencies, the use of LFTS also requires high multiplication ratio frequency synthesizers on each sub-module, which further limits the overall system performance.
A second approach utilizes RF design techniques [Fig. 2(b)] to overcome the noise and scaling limitations of LFTS [34]-[36]. Here, the clock is distributed along matched transmission lines and buffering is not required. This enables the use of significantly higher clock frequencies, practically up to tens of GHz. This approach will be denoted radio-frequency timing synchronization (RFTS) even though it can also be implemented at lower frequencies (e.g., Ethernet, USB). The clock SNR deteriorates due to the attenuation of the lines and the signal splitters, but can always be improved by increasing the clock source power. This trade-off enables RFTS to achieve superior performance at the expense of increased power dissipation. At high clock frequencies, RFTS enables the use of low multiplication ratio synthesizers. Despite its advantages, RFTS at GHz clock rates is not often used in low-cost consumer applications, due to the cost of high-performance RF materials.
It is also possible to use an optical link to distribute timing information [Fig. 2(c)]. An OTS approach benefits from the low loss of the propagation media at the cost of additional optoelectronic hardware, usually with limited efficiency. This approach supports arbitrary clock frequencies [16], practically limited by the capabilities of the electronic front-end (FE), with a potential advantage in the cost, mass, and loss of the clock lines over electrical networks (Table I). Clearly, these benefits are substantial only if the optical packaging and interface are simple and robust enough so as not to become the main cost and area driver of the OTS system. For large systems, such as radio telescopes or space-to-earth power transfer, optical fiber is an inexpensive way to span large distances with low loss, little to no EMI, and low sensitivity to environmental changes [37]. For comparatively small systems that require large arrays, electro-optical printed circuit boards (EOCBs) with pick-and-place-compatible mirror elements can be mass-produced on standard or flexible substrates [38], [39]. This is less costly than aligning fibers to every RFIC, since individual fibers or fiber arrays can be passively aligned to the edge of the board using proven techniques [40]. For either large or small systems, the option of OTS through free space, when attenuation is low, eliminates the need for a physical connection. Optical fiber, free-space optics, and EOCBs may be used in various combinations, depending on the requirements of the system. Additional benefit may be gained through low-cost integration of the optical-to-electrical conversion, for example, monolithic PDs in CMOS.
Next, we will use a simplified timing distribution model to compare the performance of the various approaches and draw conclusions regarding their potential applications.

B. Performance Comparison
The power dissipation and noise added by distribution networks, along with their mass, cost, complexity, and functionality, determine the use cases for the different clocking schemes. For our analysis, we will assume a 1-D array model with radiators operating at a wavelength $\lambda$. This simplified arrangement enables us to draw conclusions that are applicable to more complex configurations (different fan-outs, 2-D, etc.). We consider a modular, scalable phased array. The clock signal is fed to each of the array's sub-modules, each mounted with a single RFIC that drives $n_e$ radiating elements. Fig. 3 illustrates the models for the different distribution methods. The sub-modules are spaced in such a way that the radiating elements form a 1-D array and are spaced $\lambda/2$ apart. Notably, LFTS is slightly different than the other two methods because it requires additional buffers along the clock line, with a separation that is generally a function of the clock frequency and not $\lambda$. We will denote the models in Fig. 3(a)-(c) as dense. The model in Fig. 3(d), where a single sub-module is driven by a long line, will be denoted sparse and will be used as a special case where multiple splitters and sub-module loads do not affect the system performance.
We start with the cases of RFTS and OTS [Fig. 3(a) and (b)], where the clock is sent over a matched network to the sub-modules' RFICs. The network power dissipation can be derived from the required SNR at the RFIC's FE amplifier output. Assuming that the amplifier is driven and adds white noise, with a noise factor $F$, at a temperature $T$, the spectral density of its normalized phase noise floor $S_\phi$ is related to the input power $P_{\mathrm{chip}}$ by [42]
$$S_\phi = \frac{FkT}{P_{\mathrm{chip}}} \tag{1}$$
where $k$ is Boltzmann's constant. The integrated phase error $\phi_{\mathrm{rms}}$ depends on the amplifier brick-wall bandwidth $B$. Thus, $\phi_{\mathrm{rms}}^2 \approx S_\phi \cdot B$, and it can be related to the amplifier timing error $t_{\mathrm{rms}}$ by $\phi_{\mathrm{rms}} = 2\pi f_{\mathrm{ck}} t_{\mathrm{rms}}$, where $f_{\mathrm{ck}}$ is the input clock frequency. The required input power to the RFIC's receiver can be found from the FE amplifier performance and the timing error that is dictated by the system, such that
$$P_{\mathrm{chip}} = \frac{FkTB}{(2\pi f_{\mathrm{ck}} t_{\mathrm{rms}})^2} \tag{2}$$
An RFTS scheme utilizes an electronic network of lines and splitters. The insertion loss $L_{\mathrm{rf}}$ of a matched transmission line with a length of $n_e \cdot \lambda/2$ and a loss of $\alpha$ dB/m, followed in the 1-D case by a single splitter with a loss of $\alpha_{\mathrm{spl}}$ in dB, can be expressed as
$$L_{\mathrm{rf}} = \alpha \, n_e \frac{\lambda}{2} + \alpha_{\mathrm{spl}} \quad [\mathrm{dB}] \tag{3}$$
A 1-D RFTS network that drives $M$ sub-modules will dissipate
$$P_{\mathrm{rfts}} = \frac{P_{\mathrm{chip}}}{\eta_{\mathrm{src}}} \sum_{m=1}^{M} 10^{m L_{\mathrm{rf}}/10} \tag{4}$$
to deliver $P_{\mathrm{chip}}$ to each sub-module from a centralized source with an efficiency $\eta_{\mathrm{src}}$. This is a geometric series with a sum of
$$P_{\mathrm{rfts}} = \frac{P_{\mathrm{chip}}}{\eta_{\mathrm{src}}} \cdot 10^{L_{\mathrm{rf}}/10} \, \frac{10^{M L_{\mathrm{rf}}/10} - 1}{10^{L_{\mathrm{rf}}/10} - 1} \tag{5}$$
Contrary to RFTS, an OTS scheme delivers $P_{\mathrm{chip}}$ to the RFIC's FE amplifier via a PD with a limited efficiency $\eta_{\mathrm{pd}}$. For typical distances, the line loss (fiber or free space, assuming that all the transmitted power is collected; see Table I) is negligible compared to the PD loss, and the power dissipation of an OTS network driving $M$ sub-modules is
$$P_{\mathrm{ots}} = \frac{M \, P_{\mathrm{chip}}}{\eta_{\mathrm{src}} \, \eta_{\mathrm{pd}}} \tag{6}$$
where $\eta_{\mathrm{src}}$ here denotes the efficiency of the optical source. We can now use (5) and (6) in order to find the length $l_{\mathrm{tot}}(M)$ beyond which the exponentially increasing loss associated with RFTS is larger than the linearly increasing loss of the OTS network. For arrays beyond that span, OTS is favorable from a power dissipation perspective. Fig. 4 illustrates this calculation for several common RF materials.
In Fig. 4(a), we first consider a sparse array, with a single long line driving a single sub-module at its end [no splitting, as in Fig. 3(d)]. Fig. 4(b) accounts for dense arrays with $\lambda/2$-spaced elements [as shown in Fig. 3(a)-(c)]. This is done assuming similar FE amplifiers for RFTS and OTS, a 20% efficient PD, and a realistic RF splitter loss of 0.3 dB over the optical splitter. The RF and optical source efficiencies are assumed to be 50% and 30%, respectively. No local frequency multiplication is assumed at the sub-module output. We see that at a reference frequency above 10 GHz, OTS is favorable for lines longer than about 1 m when compared to PCB-based RFTS, and 6 m when compared to coaxial lines. One exception is specialty RF lines with exceptionally low loss. Those, however, are about two orders of magnitude heavier and more expensive than optical fibers.
Fig. 4(c) compares a high-frequency OTS reference versus an RFTS signal at a sub-GHz frequency, as sometimes done using terminated twisted pairs. Note that the comparison is made assuming the necessary frequency multiplication for the electrical timing signal and no frequency multiplication for OTS. In this case, the electrical line loss is small, about 0.2 dB/m in the range between 50 and 100 MHz, and a realistic splitter loss is around 0.4 dB [45] over the optical splitter. One important observation from Fig. 4(b) and (c) is that for low-loss lines, the presence of power splitters dominates the loss of the electrical infrastructure.
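The RFTS-versus-OTS comparison above can be explored numerically. The following is a minimal sketch, not the authors' calculation: it assumes a white-noise-limited amplifier with $S_\phi = FkT/P_{\mathrm{chip}}$, an RFTS network in which sub-module $m$ sits behind $m$ identical line-plus-splitter hops, and an OTS network whose power grows linearly with the number of sub-modules. All function names and default values are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def chip_power(noise_factor, temp_k, bandwidth_hz, f_ck_hz, t_rms_s):
    """Input power (W) for a target timing error, assuming S_phi = F*k*T/P_chip."""
    phi_rms = 2 * math.pi * f_ck_hz * t_rms_s
    return noise_factor * K_B * temp_k * bandwidth_hz / phi_rms ** 2

def rfts_power(p_chip, m, loss_db_per_hop, eta_src):
    """Assumed RFTS model: geometric series over m hops of equal dB loss."""
    r = 10 ** (loss_db_per_hop / 10)
    series = m if r == 1.0 else r * (r ** m - 1) / (r - 1)
    return p_chip / eta_src * series

def ots_power(p_chip, m, eta_pd, eta_src):
    """Assumed OTS model: each sub-module costs P_chip / (eta_pd * eta_src)."""
    return m * p_chip / (eta_pd * eta_src)

def crossover_modules(loss_db_per_hop, eta_pd, eta_rf=0.5, eta_opt=0.3, m_max=10000):
    """Smallest M at which RFTS dissipates more than OTS, or None if not found."""
    for m in range(1, m_max + 1):
        if rfts_power(1.0, m, loss_db_per_hop, eta_rf) > ots_power(1.0, m, eta_pd, eta_opt):
            return m
    return None

if __name__ == "__main__":
    # Illustrative numbers only: 1 dB per hop, 20% efficient PD.
    print("crossover at M =", crossover_modules(1.0, 0.2))
```

Because $P_{\mathrm{chip}}$ is common to both models, it cancels in the crossover search; only the per-hop loss and the efficiencies set the break-even array size.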
The power dissipation of LFTS is usually calculated from the energy that can be stored in its lumped-element model equivalent. We assume that the line in Fig. 3(c) is segmented into parts that are significantly shorter electrically than the reference wavelength. Then, each segment is modeled as shown in Fig. 5. The buffer spacing $l_{\mathrm{seg}}$ is, in general, independent of the spacing of the sub-modules. The maximum LFTS clock frequency $f_{\mathrm{lfts}}$ is, by definition, limited to approximately the resonance frequency associated with $l_{\mathrm{seg}}$, and it can be expressed in terms of $C_P$ and $L_S$, the parallel capacitance and series inductance per unit length, as
$$f_{\mathrm{lfts}} \approx \frac{1}{2\pi \, l_{\mathrm{seg}} \sqrt{L_S C_P}} \tag{7}$$
The width of each segment is usually chosen so that its resistive and dielectric losses $R_S$ and $G_P$ are negligible, and its length is made short enough for it to be treated as a lumped capacitor. The power dissipation of a line of length $l_{\mathrm{tot}}$ is then
$$P_{\mathrm{lfts}} = C_P \, l_{\mathrm{tot}} \, V^2 f_{\mathrm{lfts}} \tag{8}$$
where $V$ is the buffer supply voltage. Fig. 6(a) compares the power dissipations of LFTS at 100 MHz and OTS as derived in (6), assuming a timing synchronization error of 100 fs and an amplifier bandwidth of 5% of the OTS clock frequency. For LFTS, we assume a capacitance per unit length of 100 pF/m, as approximately exhibited by most PCB and coaxial materials for a 50-Ω characteristic impedance. Interestingly, OTS is superior to LFTS for sparse arrays due to its length-independent power dissipation. However, OTS power dissipation scales with the number of PDs, so this advantage diminishes as the array becomes more dense.
For a noise performance comparison, we estimate $n_{\mathrm{buf}}$, the required number of LFTS line buffers, which are cascaded and add jitter to the system
$$n_{\mathrm{buf}} \approx \frac{l_{\mathrm{tot}}}{l_{\mathrm{seg}}} = 2\pi f_{\mathrm{lfts}} \, l_{\mathrm{tot}} \sqrt{L_S C_P} \tag{9}$$
It is noteworthy that we consider here segment lengths comparable to the clock electrical wavelength, even though they should actually be much shorter in order to be regarded as capacitive loads. This is done so as not to rule out resonant lumped-model designs, despite the fact that they are prone to phase drifts, do not generally scale well in size for high-Q line resonators, and are not commonly used. Consideration of such long segments results in an overly optimistic jitter prediction for LFTS. The cascaded noise power at the end of the distribution line is
$$N_{\mathrm{line}} = n_{\mathrm{buf}} \, N_{\mathrm{buf}} \tag{10}$$
assuming that each buffer has a noise of $N_{\mathrm{buf}}$ and that the buffer noise sources are independent of each other. Fig. 6(b) shows how the LFTS-only approach presents growing noise performance challenges if used to synchronize large and/or distant array domains.
Two other important noise considerations in designing distribution networks for large arrays are pickup and drift. Long conductors are prone to couple stray signals, especially if they are poorly terminated. Meter-long lines, for example, may pick up FM radio signals and corrupt low-frequency, low-noise references in these frequency ranges. OTS is inherently resilient to this phenomenon. In addition, both optical fibers and metal transmission lines are subject to thermal expansion and to drift in their dielectric and dimensional properties due to environmental conditions. Uncompensated optical fiber delay typically varies by 20-40 ps/km/°C [12], [46]. The electrical length variation of a coaxial cable is about 20 ppm/°C [47], which translates to about 83 ps/km/°C with the reported speed of propagation (0.8c).
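The LFTS estimates and the coaxial drift figure above lend themselves to a quick numerical check. The sketch below is illustrative only: it assumes the usual $CV^2f$ dynamic power of a lumped capacitive line, a buffer spacing set by the $LC$ resonance of one segment, and a 50-Ω line with 100 pF/m (hence 250 nH/m). The function names are hypothetical.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light, m/s

def lfts_power(c_per_m, l_tot_m, v_supply, f_clk_hz):
    """Dynamic power (W) of charging the line capacitance once per cycle."""
    return c_per_m * l_tot_m * v_supply ** 2 * f_clk_hz

def lfts_buffers(l_tot_m, f_clk_hz, l_per_m, c_per_m):
    """Buffer count if each segment is as long as the resonance limit allows."""
    l_seg = 1.0 / (2 * math.pi * f_clk_hz * math.sqrt(l_per_m * c_per_m))
    return math.ceil(l_tot_m / l_seg)

def delay_drift_ps_per_km_per_degc(ppm_per_degc, velocity_factor):
    """Delay drift of a line whose electrical length changes by ppm_per_degc."""
    delay_s_per_km = 1000.0 / (velocity_factor * C_LIGHT)
    return delay_s_per_km * ppm_per_degc * 1e-6 * 1e12

if __name__ == "__main__":
    print(lfts_power(100e-12, 1.0, 1.0, 100e6))        # W per meter at 1 V, 100 MHz
    print(lfts_buffers(10.0, 100e6, 250e-9, 100e-12))  # buffers for a 10-m line
    print(delay_drift_ps_per_km_per_degc(20.0, 0.8))   # about 83 ps/km/degC
```

The last call reproduces the coaxial drift quoted in the text from the 20 ppm/°C and 0.8c values.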

C. Analysis Summary
From a pure performance perspective, OTS is a good candidate as a clocking platform for large arrays, for example, as a complement to a local LFTS scheme. While electrical references are utilized in mid-size arrays [48], [49], larger implementations (>40 k elements), for example, [50], may benefit from an optical reference clock. The strongest competitor to OTS in sparse applications is a low-frequency reference that is distributed using RFTS methods. However, metal lines are still heavier than a bare fiber, and the low reference frequency may require the use of high multiplication ratio synthesizers. These limit the system performance and produce close-in spurs that are not easily filtered away from the output signal.
Additionally, OTS is less prone to EMI on the clock reference and demonstrates lower sensitivity to environmental conditions. While drift compensation of large networks is outside the scope of this work, the phase drift of fibers can be further reduced either by feedback [37], [51]-[53] or material/structure engineering [46], [54].
Contrary to electrical references, the large bandwidth of an optical fiber also supports clock and data distribution on a single line, without the limitations of conductor trace coupling and board area. The flexibility of fiber enables its use in lightweight, bendable array interconnects. Free-space optical synchronization can potentially enable efficient synchronization of physically disconnected modules, and the possibility of distributing extremely high reference frequencies may be an attractive option in THz systems. The advantages of OTS motivate its development as an attractive reference distribution method.

A. Overview
With the analysis conclusions in mind, we demonstrate an OTS system using a bulk CMOS RFIC phased array building block fabricated in a 65-nm process. It has eight RF outputs in the 28-GHz frequency range, synchronized to an optical reference at 7 GHz. The fiber interface is through a simple via hole in the chip carrier board and does not require additional mask processing. This paves the way for a low-cost packaging approach which justifies the clock infrastructure comparison in Table I. The RFIC in Fig. 7 has three main functional sections. The FE receiver contains a CMOS integrated PD which operates near the visible wavelength range and an injection-locked transimpedance amplifier (TIA) chain to amplify the optical signal to 1-V supply digital levels. The digital clock signal is fed into a low-noise, fully integrated synthesizer (PLL) with a low multiplication ratio to generate and distribute the desired output RF frequency. Lastly, the signal is buffered and distributed to drive eight TX channels with independently controlled phase and amplitude to demonstrate the beam-forming and data transfer capabilities of our approach.

B. A Modular Phased Array Building Block
The RFIC is flip-chip bonded [Fig. 8(a) and (b)] to a modular circuit board with eight 28-GHz transmit patch antennas, spaced $0.6\lambda$ apart. The module size is $1.2\lambda \times 2.4\lambda$ to allow uniform tiling of RFIC boards into a larger array. Fig. 8(c) and (d) illustrates the interface between the optical signal and the chip surface. A 125-μm-diameter optical fiber is inserted through a 140-μm via that is aligned with the 45-μm-wide on-chip PD. This gives sufficient tolerance to ensure reliable coupling to the PD. Optical coupling is further improved through the use of index-matching gel, and mechanical stability is ensured by epoxying the fiber to a rigid support, which is shown in Fig. 8(c). The headers and bottom board that hold the fiber support are also used to distribute power and communication signals to the array sub-module.

A. High-Speed Photodiode
Full integration of PDs in bulk CMOS provides a direct optical interface to a standard electronic chip with well-studied trade-offs. The integrated PD is a three-finger, n+/p-well diode with shallow-trench isolation (STI) guard rings and a deep n-well diffusion current block, as studied and described in [55]. The responsivity is shown in Fig. 9(a) and was measured at 780 nm, which is the optical wavelength that is intensity modulated by the reference. In the case of single-tone reference distribution, it is possible to resonate the PD capacitance with an inductor, which also provides dc biasing to the cathode (Fig. 10). The PD anode is connected to a reverse bias pin, $V_r$, which can be set independently to a desired reverse bias (as low as −9.5 V before breakdown occurs).
Considering just the PD and the resonant tank, the two main sources of noise are thermal noise and shot noise. The thermal noise will most likely be dominated by the integrated inductor, which typically has $Q \approx 10$. However, at bias voltages close to breakdown, the shot noise will be significantly greater than the thermal noise [Fig. 9(b)] [56].
To analyze the shot noise, we note that the early McIntyre model [57], [58] was shown to overestimate avalanche noise in a CMOS APD [55], [59]-[61] and was subsequently improved upon by later models [62], [63]. Nevertheless, we have adopted the early model as a simple way to estimate the upper bound of shot noise generated by our APD. Following [60], the data for electron and hole ionization rates are taken from [64], and $k_{\mathrm{eff}}$ is found numerically, as derived in [58]. The parameter $k_{\mathrm{eff}}$ is used together with the avalanche gain to determine the excess noise factor, given by [56]
$$F = k_{\mathrm{eff}} M + (1 - k_{\mathrm{eff}}) \left( 2 - \frac{1}{M} \right) \tag{11}$$
We note that the inductor parallel resistance is approximately $Q^2 \cdot r_s$, where $r_s$ is the series resistance and $Q$ is the quality factor of the inductor [65]. Then, neglecting the PD dark current, the shot noise spectral density is $2qM^2FR_0P_{\mathrm{in}}$, and the thermal noise spectral density is $4kT/(Q^2 r_s)$ [56]. By normalizing to the signal, $MR_0P_{\mathrm{in}}$, the PD phase noise spectral density can be expressed in dB as
$$S_{\phi,\mathrm{pd}} = 10 \log_{10} \left[ \frac{2qM^2FR_0P_{\mathrm{in}} + 4kT/(Q^2 r_s)}{(MR_0P_{\mathrm{in}})^2} \right] \tag{12}$$
where $M$ is the avalanche gain and $R_0$ is the zero-bias responsivity. Using the plot in Fig. 9(b), one may determine the reverse bias necessary to achieve a particular jitter specification at the input to the FE amplifier.
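The interplay between avalanche gain, excess noise, and the shot-to-thermal noise balance can be sketched in a few lines. This is an illustrative sketch under the early McIntyre model named in the text; the function names and the example operating point are hypothetical, not measured values from this work.

```python
Q_E = 1.602176634e-19  # electron charge, C
K_B = 1.380649e-23     # Boltzmann constant, J/K

def excess_noise_factor(gain, k_eff):
    """McIntyre excess noise factor for avalanche gain `gain` and ratio k_eff."""
    return k_eff * gain + (1.0 - k_eff) * (2.0 - 1.0 / gain)

def shot_noise_density(gain, k_eff, r0_a_per_w, p_in_w):
    """Shot noise current density (A^2/Hz), dark current neglected."""
    return 2 * Q_E * gain ** 2 * excess_noise_factor(gain, k_eff) * r0_a_per_w * p_in_w

def thermal_noise_density(q_ind, r_s_ohm, temp_k=300.0):
    """Thermal noise current density (A^2/Hz) of a tank with R_p ~ Q^2 * r_s."""
    return 4 * K_B * temp_k / (q_ind ** 2 * r_s_ohm)

if __name__ == "__main__":
    # Hypothetical operating point: gain 20, k_eff 0.3, 0.05 A/W, 1 mW, Q=10, r_s=5 ohm.
    shot = shot_noise_density(20.0, 0.3, 0.05, 1e-3)
    thermal = thermal_noise_density(10.0, 5.0)
    print("shot/thermal ratio:", shot / thermal)
```

At unity gain the excess noise factor reduces to 1, and for $k_{\mathrm{eff}} = 1$ it grows linearly with the gain, which is why shot noise takes over near breakdown.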

B. Front-End Amplifier
The receiver TIA can be configured either as a driven LNA as specified in (1)-(6), or as a tuned injection-locked [66] TIA (TIL-TIA), as illustrated in Fig. 10, to boost the FE sensitivity at the expense of a limited lock range. This has the added benefit of built-in test capability; by allowing the TIL-TIA to self-oscillate, we can characterize the complete PLL and output path without an input signal. The reference is further divided by 2 to 3.5 GHz, which eases on-chip signal distribution and reduces the coupling to the first tuned amplifier. The combined TIL-TIA and divider together nominally draw 9 mA from a 1-V supply. In order to make sure that our FE does not contribute excess noise to the overall system output when operated as an oscillator, we analyze and estimate its short- and long-term phase variations.

1) Injection-Locked Amplifier With a Subsequent PLL:
The phase noise of the TIL-TIA is the short-term random phase deviation of its output compared to the phase of the injected signal. It can be shown (Appendix A) that an injection-locked oscillator shapes the noise of the injected signal $\mathcal{L}_i(\omega)$ and the free-running oscillation $\mathcal{L}_f(\omega)$ similar to a first-order PLL. Our TIL-TIA is used to amplify a weak injection signal, so in that case its output phase noise is
$$\mathcal{L}_{\mathrm{out}}(\omega) = \frac{(\omega_L^2 - \Delta\omega_0^2) \, \mathcal{L}_i(\omega) + \omega^2 \mathcal{L}_f(\omega)}{\omega^2 + \omega_L^2 - \Delta\omega_0^2} \tag{13}$$
where $\omega_L$ is the TIL-TIA lock range, and $\Delta\omega_0$ is the difference between the injection frequency and the TIL-TIA free-running frequency. Examining (13), we notice the interchangeable effect that increasing the lock range and injecting a signal closer to the TIL-TIA center frequency have on the output noise. Fig. 11(a) shows how the TIL-TIA approximate transfer function shapes its output noise, while Fig. 11(b) illustrates how modifying the circuit's bandwidth affects the output noise. We assume here the phase noise curve of a typical CMOS LC oscillator, and a signal source with an SNR floor of −128 dBc/Hz at 7 GHz, corresponding to a jitter of 100 fs when integrated over a 50-MHz SSB bandwidth. Unsurprisingly, optimum noise performance is achieved for a bandwidth chosen roughly at the intersection of the reference and the free-running oscillator noise curves. This could be a challenge, as this bandwidth depends on the injection strength. The constraint is alleviated, however, by the subsequent on-chip synthesizer, which further limits the TIL-TIA noise bandwidth. Fig. 12(a) illustrates how an additional second-order low-pass filter, of the form given in (14), shapes the TIL-TIA output noise, and Fig. 12(b) shows how the limited subsequent PLL bandwidth significantly relaxes the requirement for accurate injection strength control; as long as the injection strength is above a certain threshold, it should not limit the overall system noise performance. The filtering effect of injection locking was measured with a standalone TIL-TIA, as illustrated in Fig. 13, with the first-order filtering effect clearly demonstrated. It is noteworthy that the phase noise shape of the measured free-running oscillator is somewhat different from the analytical derivation due to supply noise and the absence of an amplitude-limiting mechanism in the measured circuit.
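The first-order noise shaping described above can be illustrated with a short numerical sketch. It assumes the common textbook form in which the injected-signal noise is low-pass filtered and the free-running noise is high-pass filtered, with a corner set by $\sqrt{\omega_L^2 - \Delta\omega_0^2}$; this form and the function name are assumptions for illustration, not a transcription of the Appendix A derivation.

```python
def til_tia_output_noise(offset, lock_range, detuning, l_inj, l_free):
    """Output phase noise (linear scale, not dB) at one offset frequency.

    offset, lock_range, and detuning share the same frequency units;
    l_inj and l_free are the injected and free-running noise at that offset.
    """
    if abs(detuning) >= lock_range:
        raise ValueError("injection frequency lies outside the lock range")
    corner_sq = lock_range ** 2 - detuning ** 2
    return (corner_sq * l_inj + offset ** 2 * l_free) / (offset ** 2 + corner_sq)

if __name__ == "__main__":
    # Hypothetical numbers: 50-MHz lock range, 10-MHz detuning, flat injected floor.
    for f in (1e5, 1e6, 1e7, 1e8, 1e9):
        print(f, til_tia_output_noise(f, 50e6, 10e6, 1e-12, 1e-10))
```

Close to the carrier the output follows the injected signal, far from it the output follows the free-running oscillator, and detuning the injection shrinks the corner in the same way a smaller lock range would.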

2) Phase Drift of an Injection-Locked Amplifier:
Being a first-order feedback loop, the TIL-TIA tracks the input signal with a constant phase shift. Phase drift is a slow long-term variation of this phase shift due to environmental changes that affect the integrated circuit. As mentioned, utilizing a TIL-TIA implies a small injection. In this case, the constant phase difference $\theta_0$ of an LC injection-locked oscillator (ILO) can be expressed as a function of the free-running and injection frequencies $\omega_0$ and $\omega_{\mathrm{inj}}$, respectively; of the output and injection strengths $I_o$ and $I_{\mathrm{inj}}$, respectively; and of the tank quality factor $Q$ [67], [68], so that
$$\sin\theta_0 = \frac{2Q}{\epsilon} \cdot \frac{\omega_0 - \omega_{\mathrm{inj}}}{\omega_0} \tag{15}$$
where $\epsilon = I_{\mathrm{inj}}/I_o$ is the injection strength ratio. A similar effect exists when utilizing tuned amplifiers in RF chains (Appendix B), but for a TIL-TIA the situation is exacerbated by a factor of $1/\epsilon$. This sets a lower limit on the injection strength depending on the oscillator sensitivity and the permissible long-term output timing drift. If a TIL-TIA drives a frequency multiplier (a PLL usually), its required phase error $\theta_{\mathrm{tia,max}}$ can be expressed in terms of the allowable PLL output timing drift $t_{\mathrm{out,max}}$, where the PLL output frequency is $f_{\mathrm{out}}$, and its multiplication ratio is $N$, so
$$\theta_{\mathrm{tia,max}} = \frac{2\pi f_{\mathrm{out}} \, t_{\mathrm{out,max}}}{N} \tag{16}$$
Re-writing $\Delta\omega = (\omega_0 - \omega_{\mathrm{inj}})/\omega_0$ and substituting (16) into (15) gives
$$\epsilon > \frac{2Q \, \Delta\omega}{\theta_{\mathrm{tia,max}}} \tag{17}$$
where, for small angles, $\sin\theta_0 \approx \theta_0$. Conversely, $\Delta\omega$ can be defined in terms of $\Delta\omega_t$, the rate of frequency drift, and the allowed phase drift period $t_{\mathrm{cor}}$, as
$$\Delta\omega = \Delta\omega_t \cdot t_{\mathrm{cor}} \tag{18}$$
In that case, (17) can be re-arranged to set an upper limit on the interval after which phase correction algorithms must be applied to maintain acceptable long-term drift
$$t_{\mathrm{cor}} < \frac{\epsilon \, \theta_{\mathrm{tia,max}}}{2Q \, \Delta\omega_t} \tag{19}$$
Fig. 14(a) illustrates how (17) can be used to estimate the minimum injection ratio for a maximum output timing drift of 100 fs as a function of the normalized frequency drift for different division ratios, assuming a TIL-TIA with a nominal $Q = 10$. Notably, using a low-multiplier PLL subsequent to the TIL-TIA significantly eases the design requirements. Fig. 14(b) shows how (19) can be used to estimate the maximum allowed time interval before phase estimation algorithms (e.g., [10]) must be utilized to correct for TIL-TIA frequency drift. In this case, we assume that frequency drift results mainly from temperature changes and that the temperature fluctuates on the order of 0.1 °C/s. Temperature frequency drifts of uncompensated CMOS LC oscillators are on the order of 100 ppm/°C [69], [70], which translates to a frequency drift of 10 ppm/s. For our TIL-TIA, operated with normalized injection strengths of 0.05-0.1, this requires phase correction once every few seconds, an achievable task for modern integrated systems. Drift compensation techniques [69], [71], [72] can be used to further increase the phase correction intervals.
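The drift budget above can be checked with a few lines of arithmetic. The sketch below assumes the small-angle locking relation $\theta_0 \approx 2Q\Delta\omega/\epsilon$ and a phase budget at the TIL-TIA equal to the output phase budget divided by the multiplication ratio; the ratio of 4 (7-GHz injection to 28-GHz output) and the function names are assumptions made for illustration.

```python
import math

def max_tia_phase_error(f_out_hz, t_out_max_s, n_mult):
    """Allowed phase error (rad) at the TIL-TIA for a given output timing drift."""
    return 2 * math.pi * f_out_hz * t_out_max_s / n_mult

def min_injection_ratio(q_tank, norm_detuning, theta_max_rad):
    """Smallest injection ratio that keeps the locked phase within theta_max."""
    return 2 * q_tank * norm_detuning / theta_max_rad

def correction_interval(inj_ratio, theta_max_rad, q_tank, drift_per_s):
    """Longest time (s) between phase corrections for a normalized drift rate."""
    return inj_ratio * theta_max_rad / (2 * q_tank * drift_per_s)

if __name__ == "__main__":
    theta = max_tia_phase_error(28e9, 100e-15, 4)  # assumed N = 4
    drift = 100e-6 * 0.1                           # 100 ppm/degC times 0.1 degC/s
    for eps in (0.05, 0.1):
        print(eps, correction_interval(eps, theta, 10.0, drift), "s")
```

Under these assumptions the correction interval comes out between roughly 1 and 2 s for injection ratios of 0.05 to 0.1, in line with the "once every few seconds" estimate in the text.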

V. LOW-MULTIPLIER FREQUENCY SYNTHESIS
The frequency-multiplying PLL [73] (3.5 to 28 GHz) in Fig. 15 is co-designed with the receiver amplifier to limit the output phase bandwidth to 50 MHz and reduce the reference-related jitter, while keeping the VCO noise sufficiently low. Due to the high output frequency, the divider circuitry is composed of an injection-locked frequency divider [74] (ILFD; Fig. 16) with a frequency range that is matched to the VCO [75] control voltage dependence, followed by current mode logic (CML) and true-single-phase clock (TSPC) dividers (Fig. 17). Using a small multiplication ratio enables us to achieve low jitter performance of 147 fs, as shown in Fig. 18(a). The low clock multiplier reduces the risk of harmonic locking, and the large spacing of reference spurs from the carrier assists in suppressing them by the loop filter, the tuned RF path, and the antennas' bandwidths. As a result, our PLL demonstrates reference spurs at a power of 77 dB below the carrier, as illustrated in Fig. 18(b). The asymmetric spur measurement might be due to amplitude noise in the measurement and/or asymmetric antenna bandwidth, which is also included in the measurement. The improved spurious tone rejection eases the system compliance with spectral disturbance level requirements.

VI. TX CHANNELS
TX channels (Fig. 19) are intentionally designed to enable the demonstration of a broad range of applications. The PLL output is buffered by four independently programmable VGAs and routed to quadrants of pairs of TX channels. Each channel is composed of a first-order RC polyphase filter (PPF) [76], a vector modulator (VM) [77], and a power amplifier (PA), as shown in Figs. 20 and 21. The VGAs directly drive two unbuffered PPFs to reduce the power consumption. Those are followed by independently controlled VMs, which double as buffer stages to minimize the coupling between the channels.
The VMs are calibrated offline using a gradient descent algorithm to a 6-bit resolution with phase and amplitude errors of 2.3° and 0.9% rms (Fig. 22), respectively. The calibration is done with a network analyzer that measures the output phases of the TX channels and matches an I/Q setting to each desired phase. In our specific implementation, the VM state is programmed by a slow serial link, which limits the transmission speed to several Mbps. This is not a fundamental limitation [78], and the programming speed can be increased by using a parallel or an analog data interface, for example. Another interesting idea is to expand the optical clock approach and distribute phase-modulated data optically as well, similar to the electrical implementation in [79] and [80]. Each phase shifter drives a two-stage PA with an output −1-dB bandwidth of about 1 GHz, output power of more than +12 dBm at 28 GHz, and a drain efficiency of 23% (Fig. 23). The total output power from the chip is more than +21 dBm from a 1-V power supply. The output PA stage can work in a linear or switching mode [81] and has series inductors added between the driving transistors and the cascode in order to align the output voltage and current waveforms, which slightly increases the drain efficiency.
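As a quick consistency check of the per-channel and total output power figures, the sketch below sums eight equal-power channels in the linear domain. It assumes that all eight channels deliver the same power and that no combining loss is involved; both are simplifications made here for illustration.

```python
import math

def combined_power_dbm(per_channel_dbm, n_channels):
    """Total power of n equal-power channels, summed in the linear (mW) domain."""
    per_channel_mw = 10 ** (per_channel_dbm / 10)
    return 10 * math.log10(n_channels * per_channel_mw)

# +12 dBm per channel and 8 channels, as quoted in the text.
total_dbm = combined_power_dbm(12.0, 8)
print(f"total output power ~ {total_dbm:.2f} dBm")
```

Eight channels add 10·log10(8) ≈ 9 dB to the per-channel level, which agrees with the quoted total of more than +21 dBm.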
The contribution of the TX channel to the system's noise is negligible since it is driven by a large, almost rail-to-rail signal. The expected combined jitter added by the standalone TIL-TIA and PLL blocks is about 180 fs. Neglecting amplitude fluctuations and long-term drifts, the receiver-ignorant [32], [82]- [84] EVM can be calculated from jitter: EVM ≈

A. Experimental Setup
With the system architecture quantitatively evaluated and its subblocks characterized, we assembled a single transmitter sub-module in a small anechoic space. A 7-GHz signal modulates the laser source that distributes a clock reference to the phased-array transmitter over a 25-m fiber, as illustrated in Fig. 24. Out of a variety of methods for generating the RF reference at the PD, an externally modulated laser was chosen for overall simplicity. Other possibilities include using a directly modulated laser, the beat-tone output of two lasers, an optoelectronic oscillator, or an optical frequency comb, among others [16], [85]. These, however, imposed noise, cost, and/or availability challenges with currently available off-the-shelf hardware. The narrow depletion region dictates the use of a short wavelength, and 780 nm was the shortest wavelength for which available hardware met our output power requirements. A spectrum analyzer with vector signal analysis capability serves as a remote receiver about 40 cm away from the RFIC to record its far-field radiation pattern. Fig. 25 shows the RFIC output signal spectrum at 28 GHz and its measured phase noise. This measurement demonstrates significantly lower noise than that reported earlier in [86], due to optimized laser modulation and fiber-PD alignment. The measurement in Fig. 25 was done with an integrated CMOS PD. The measured system timing jitter is degraded (compared to Fig. 18) by the limited performance of the laser source and by the limited extinction ratio of the optical modulator. The measured 28-GHz RF signal also includes additional amplitude noise due to the absence of an amplitude limiting mechanism at the spectrum analyzer input. Fig. 26 shows the beam steering capabilities of the optically synchronized array with symmetrical radiation patterns from 0° to ±45°.
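The phase settings behind such a steering experiment can be sketched with the standard progressive-phase rule for a uniform linear array. The array geometry is an assumption here: the text does not state the element layout, so the half-wavelength spacing and the linear arrangement of eight elements are hypothetical, and the sign of the phase step depends on the chosen axis orientation.

```python
import cmath
import math

def element_phases_deg(n_elements, steer_deg, spacing_wavelengths=0.5):
    """Per-element phase settings (0..360 deg) that steer a uniform linear array."""
    step_deg = 360.0 * spacing_wavelengths * math.sin(math.radians(steer_deg))
    return [(-k * step_deg) % 360.0 for k in range(n_elements)]

def array_factor_mag(phases_deg, look_deg, spacing_wavelengths=0.5):
    """Magnitude of the array factor of isotropic elements in direction look_deg."""
    geom_step = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(look_deg))
    return abs(sum(cmath.exp(1j * (k * geom_step + math.radians(p)))
                   for k, p in enumerate(phases_deg)))

# Hypothetical 8-element, half-wavelength-spaced line array steered to 45 degrees.
phases = element_phases_deg(8, 45.0)
peak = array_factor_mag(phases, 45.0)       # all elements add in phase here
broadside = array_factor_mag(phases, 0.0)   # reduced response away from the beam
print([round(p, 1) for p in phases], round(peak, 3), round(broadside, 3))
```

At the steering angle the eight contributions add in phase, so the array factor reaches its maximum of 8; in a real system the calibrated VM states would replace the ideal phases computed here.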

C. Data Transmission
Transmission of 16-QAM and 32-QAM modulated data streams through the array is demonstrated by programming the IQ phase-shifter steering angles. This is done for a single output channel in order to minimize multipath reflections, using a pseudorandom custom logic state machine implemented on an FPGA. The modulation does not assume a specific communication protocol. Data rate is limited by the serial interface programming speed, but the output bandwidth [Fig. 23(a)] of the chip is greater than 1 GHz and can support much higher data rates. We transmitted strings of 4096 symbols with a total transmission time of 11.6 ms. As estimated in (19) and in Fig. 14, significant timing drift is expected to occur on the order of seconds and was not noticed in this measurement. Therefore, calibration for phase drift was not necessary here. Fig. 27 shows the raw measurement of the received signal, measured by a signal analyzer. These results demonstrate a significantly improved EVM and double the data rate compared to those reported earlier in [86], thanks to optimized programming procedures and receiver setup. The measured EVM is ∼5%, which is larger than expected due to the additional noise in the transmission (see Section VII-B). The EVM reported here includes the effect of a deterministically imperfect constellation generated using the VM and can be further improved in principle.
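Two of the numbers in this subsection can be cross-checked with a short calculation. The symbol and bit rates follow directly from the 4096 symbols and 11.6 ms quoted above. The jitter-limited EVM uses the common small-angle approximation EVM ≈ 2π·f_c·σ_t, which is an assumption made here and not necessarily the exact expression used by the authors; the 180-fs jitter is the combined TIL-TIA and PLL estimate from Section VI.

```python
import math

def symbol_rate_hz(n_symbols, duration_s):
    """Average symbol rate of a burst of n_symbols sent in duration_s."""
    return n_symbols / duration_s

def bit_rate_bps(sym_rate_hz, constellation_size):
    """Raw bit rate for an M-ary constellation (log2(M) bits per symbol)."""
    return sym_rate_hz * math.log2(constellation_size)

def jitter_limited_evm(carrier_hz, rms_jitter_s):
    """Small-angle estimate: EVM ~ rms phase error in radians (assumed relation)."""
    return 2 * math.pi * carrier_hz * rms_jitter_s

rate = symbol_rate_hz(4096, 11.6e-3)
rate_16qam = bit_rate_bps(rate, 16)
rate_32qam = bit_rate_bps(rate, 32)
evm = jitter_limited_evm(28e9, 180e-15)

print(f"symbol rate ~ {rate / 1e3:.0f} ksym/s")
print(f"16-QAM ~ {rate_16qam / 1e6:.2f} Mbps, 32-QAM ~ {rate_32qam / 1e6:.2f} Mbps")
print(f"jitter-limited EVM ~ {100 * evm:.1f}%")
```

Under this approximation the jitter alone accounts for roughly 3% EVM, which is below the measured ∼5% and therefore compatible with the statement that additional noise is present in the transmission.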

D. Synchronizing Two Remote Phased Arrays
Lastly, we use OTS to synchronize two electrically distant array elements. We drive two phased array modules through two 5-m optical fibers carrying the same optical signals, as shown in Fig. 28, leading to an electrical distance of more than 10 m. First, we verify that the modules are indeed synchronized. For this purpose, we activate a single element on each module. The power received from each module is different because the elements are not aligned identically with the receiver. We varied the phase setting of one module with respect to the other over 360°. Fig. 29 illustrates the verification that the modules are indeed synchronized, that their received signals can be coherently added or subtracted, and that the quality of the phase coherence depends on the injection strength. For a very narrow lock range at slow sample intervals [Fig. 29(a)], the coherent addition suffers from significant phase drift until eventually one of the sources loses lock. As the injection strength increases and measurement time shortens, the signal addition follows the expected sinusoidal pattern very closely [Fig. 29(d)].
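The expected sinusoidal pattern is the power of the sum of two phasors as their relative phase is swept. The sketch below illustrates it for unequal amplitudes, as in the measurement where the two elements are not received with the same power; the amplitude values 1.0 and 0.7 are hypothetical.

```python
import math

def combined_power(a1, a2, phase_deg):
    """Power of the sum of two phasors with amplitudes a1, a2 and a relative phase."""
    phi = math.radians(phase_deg)
    return a1 ** 2 + a2 ** 2 + 2 * a1 * a2 * math.cos(phi)

# Hypothetical received amplitudes; sweep the relative phase over a full turn.
sweep = [(phase, combined_power(1.0, 0.7, phase)) for phase in range(0, 361, 45)]
for phase, power in sweep:
    print(f"{phase:3d} deg -> relative power {power:.3f}")
```

The maximum (a1 + a2)² occurs when the modules are in phase and the minimum (a1 − a2)² when they are in antiphase, so unequal amplitudes limit the depth of the null.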
We can use the linear regions of the phase combination to roughly estimate statistical information about phase drift over time, as illustrated in Fig. 30(a)-(d). We use a least mean squares fit to find the phase variance of the measurements. Assuming that the noise sources are additive and that the phase samples are independent, the phase combination variance is $\theta_{\mathrm{err}}^2 = \theta_{\mathrm{inj}}^2 + \theta_{\mathrm{other}}^2$, where $\theta_{\mathrm{inj}}$ is the lock-range-dependent phase drift and $\theta_{\mathrm{other}}$ represents noise associated with other sources such as frequency multiplication and fiber phase drift. $\theta_{\mathrm{inj}}$ can be rewritten from (16) to (19) in terms of $t_{\mathrm{interval}}$, the time period between samples. The plot of $\theta_{\mathrm{err}}^2$ versus $[N \cdot (\omega_f/\omega_L) \cdot t_{\mathrm{interval}}]^2$ in Fig. 30(e) is used to predict the TIL-TIA integrated oscillator drift and the phase deviation added by other sources. (The averages and the error margins are calculated using the two values $y = (x \pm 3\sigma_x)^{1/2}$, for both the oscillator drift and the other phase errors.) It is noteworthy that the linear approximation of phase error is mainly determined by the last point in the dataset, where the product $(\omega_f/\omega_L) \cdot t_{\mathrm{interval}}$ is significantly larger than at the other points. In addition, the deviation of a single phased-array submodule is actually smaller by up to 30%, since we measure the sum of the phase deviations of two similar submodules with nonidentical amplitudes. Overall, the measured TIL-TIA drift is well within the range of ∼10 ppm/s estimated in Section IV-B2. The remaining phase error is very similar to the short-term phase noise measured in Fig. 27. Based on Section II-B, the expected drift for 5 m of uncompensated fiber is 100 fs/°C, which is probably not significant given the estimated temperature drift of 0.1 °C/s in the measurements that were carried out so far.
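The fitting step can be sketched as an ordinary least-squares line through the squared quantities. This sketch assumes that the squared phase error grows linearly with the squared abscissa of Fig. 30(e), so that the slope relates to the oscillator drift and the intercept to the contribution of the other sources. The data points below are synthetic and purely illustrative; they are not measurements from the paper.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic abscissa values standing in for N * (w_f / w_L) * t_interval.
# The last point is much larger than the others, as noted in the text.
terms = [1.0, 2.0, 4.0, 16.0]
true_slope, true_intercept = 0.25, 9.0   # hypothetical, arbitrary units
squared_terms = [t ** 2 for t in terms]
squared_errors = [true_slope * x + true_intercept for x in squared_terms]

slope, intercept = fit_line(squared_terms, squared_errors)
drift_factor = math.sqrt(slope)          # scales the drift-related term
other_rms = math.sqrt(intercept)         # rms phase error from other sources
print(f"drift factor = {drift_factor:.3f}, other sources = {other_rms:.3f}")
```

Because the largest abscissa dominates the sums in the fit, a single distant point largely sets the slope, which is the sensitivity the text points out.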
In a second measurement, half the array elements in each module are turned on and steered toward the receiver, so the received signals have similar magnitudes. Once both of the chips are operating concurrently, we measure a coherent increase of received power by ∼5 dB. This is comparable to the total power received by a single module with all its array elements steered broadside, as illustrated in Fig. 31. The small difference from a perfect coherent addition of 6 dB is probably due to a static phase offset between the two separate modules.
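To put the gap between the measured and the ideal coherent gain into perspective, the sketch below computes the gain of two equal-amplitude signals as a function of their static phase offset and inverts it. Equal amplitudes are an assumption; since the measured value is only quoted as ∼5 dB, the resulting offset is a rough indication and not a measured quantity.

```python
import math

def coherent_gain_db(offset_deg):
    """Power gain of two equal-amplitude signals with a phase offset, relative to one alone."""
    return 10 * math.log10(2 + 2 * math.cos(math.radians(offset_deg)))

def offset_for_gain_deg(gain_db):
    """Static phase offset that yields the given combining gain (equal amplitudes assumed)."""
    return math.degrees(math.acos(10 ** (gain_db / 10) / 2 - 1))

ideal_gain = coherent_gain_db(0.0)
implied_offset = offset_for_gain_deg(5.0)
print(f"ideal gain = {ideal_gain:.2f} dB, offset for 5 dB ~ {implied_offset:.0f} deg")
```

An amplitude imbalance between the two modules would also reduce the gain, so the offset computed this way is an upper bound under the stated assumption.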

VIII. CONCLUSION
This work presents the design and measurement of the first fully integrated OTS system in a bulk CMOS process. By quantitatively analyzing the benefits of an OTS system and estimating its performance, we are able to demonstrate an optically synchronized 28-GHz phased array transmitter with beam steering, remote module synchronization, and data transmission capabilities. The implementation of OTS in low-cost CMOS enables the scaling of arrays in high-volume, lightweight, low-cost, and large-span commercial applications.
By reducing the mass, cost, and loss of the synchronization infrastructure, we provide an attractive alternative to traditional high-frequency clocking schemes.

APPENDIX A INJECTION-LOCKED OSCILLATOR AS A FIRST-ORDER PLL
A simple model [87] of injection-locked LC oscillators is investigated further here. The discussion starts with the small-signal model of an ILO introduced in [88], where $\theta$ is the phase difference between the phase of the injected signal $\theta_i$ and the phase of the ILO output $\theta_{\mathrm{osc}}$, $\omega_i$ and $\omega_f$ are the injection and oscillator free-running frequencies, respectively, $Q$ is the tank quality factor, and $I_{\mathrm{inj}}/I_f$ is the normalized injection strength. Defining $f = d\theta/dt$, we take the first-order differential of $f$ for small phase and frequency deviations from the equilibrium point (bold letters are vectors) and evaluate its partial derivatives. We note that by definition $f_0 = 0$ at equilibrium, so the differential equation can be approximated and then transformed to the Laplace domain. Since $\theta = (\theta_i - \theta_{\mathrm{osc}})$, then $\Delta\theta = (\Delta\theta_i - \Delta\theta_{\mathrm{osc}})$, and the differential equation can be rewritten in terms of the input and output phase deviations. With the frequency-domain transfer function defined around the equilibrium point, it is now clear how the noise spectra of the input and free-running oscillator are shaped by the injection-lock dynamics. If the noise sources are independent, then the output noise spectrum (32) is mathematically identical to that of a first-order PLL. Utilizing an ILO as an amplifier implies $I_{\mathrm{inj}}/I_f \ll 1$, which simplifies the result. With a weak injection, $\sin\theta$ can also be approximated as in [67]. Substituting $\cos^2\theta = 1 - \sin^2\theta$ then gives a simpler expression in terms of $\omega_L$, the ILO lock range, and $\Delta\omega_0$, the difference between the injection frequency and the free-running frequency of the oscillator.
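The first-order-PLL behavior can be illustrated numerically. The sketch below is based on the textbook weak-injection (Adler) equation and is not a reproduction of the expressions in this appendix: linearizing dθ/dt = Δω_0 − ω_L·sin θ around the locked phase gives a single pole at ω_p = ω_L·cos θ = sqrt(ω_L² − Δω_0²), with a low-pass response to the injected-signal noise and a high-pass response to the free-running oscillator noise. Function names and numerical values are hypothetical.

```python
import math

def loop_corner(lock_range, detuning):
    """Corner frequency w_p = sqrt(w_L^2 - dw_0^2) from linearizing Adler's equation."""
    if abs(detuning) >= lock_range:
        raise ValueError("detuning is outside the lock range")
    return math.sqrt(lock_range ** 2 - detuning ** 2)

def injected_noise_gain(offset, corner):
    """Low-pass power gain applied to the injected-signal phase noise."""
    return corner ** 2 / (offset ** 2 + corner ** 2)

def oscillator_noise_gain(offset, corner):
    """High-pass power gain applied to the free-running oscillator phase noise."""
    return offset ** 2 / (offset ** 2 + corner ** 2)

def output_noise(offset, corner, s_injected, s_oscillator):
    """Output phase-noise PSD for independent sources (first-order PLL shaping)."""
    return (injected_noise_gain(offset, corner) * s_injected
            + oscillator_noise_gain(offset, corner) * s_oscillator)

# Hypothetical numbers in consistent frequency units (e.g., MHz).
corner = loop_corner(lock_range=5.0, detuning=3.0)
print(corner, output_noise(offset=corner, corner=corner, s_injected=1.0, s_oscillator=3.0))
```

The corner frequency shrinks as the detuning approaches the lock range, which reduces the band over which the ILO tracks the injected signal and suppresses its own noise.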

APPENDIX B PHASE SENSITIVITY OF A PARALLEL-LC TUNED AMPLIFIER
Assuming a harmonic excitation $I(j\omega)$, the output voltage can be written in phasor notation as shown in Fig. 32.
The output phase $\theta$ relative to the current is derived from this phasor expression and then approximated for a small deviation around resonance. The result is similar to the phase variation that was derived in (15) for a small injection, where $\sin\theta \approx \tan\theta \approx \theta$, but without the added sensitivity to the normalized injection strength.
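The small-deviation behavior can be checked numerically with the standard parallel-RLC impedance, whose phase is −arctan[Q(ω/ω_0 − ω_0/ω)] and reduces to about −2Q·Δω/ω_0 near resonance. This textbook form is an assumption about the expressions intended above, and the quality factor, center frequency, and detuning used below are hypothetical.

```python
import math

def tank_phase_rad(freq, f0, q):
    """Exact phase of a parallel RLC impedance driven by a current source."""
    return -math.atan(q * (freq / f0 - f0 / freq))

def tank_phase_small_dev_rad(freq, f0, q):
    """Linearized phase near resonance: -2 * Q * (f - f0) / f0."""
    return -2.0 * q * (freq - f0) / f0

# Hypothetical tank: Q = 10, 7-GHz center frequency, 100-ppm detuning.
f0, q = 7e9, 10.0
freq = f0 * (1 + 100e-6)
exact = tank_phase_rad(freq, f0, q)
approx = tank_phase_small_dev_rad(freq, f0, q)
print(f"exact = {exact:.6e} rad, linearized = {approx:.6e} rad")
```

For detunings of this size the linearized phase is practically indistinguishable from the exact one, and the sensitivity depends only on Q and the fractional detuning.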