Performance Analysis of Scalable Optical Circuit Switch Employing Fast-Tunable AMZI Filters for Coherent Detection

High-port-count optical switches are expected to resolve the envisaged bandwidth and power crunch in intra-data center networks stemming from the slowdown of Moore's Law. This paper briefly reviews electrical switching technologies in current intra-data centers and discusses requirements for optical circuit switches in future data centers. We highlight an optical switch architecture that combines the two independent dimensions of space and wavelength to realize large port counts and fast switching. Wavelength routing is implementable with receiver-side channel selection using wavelength tunable filters (TFs). Coherent detection necessitates a fast-tunable local oscillator (LO) at the receiver. It can be cost-effectively realized using an LO bank that consists of fixed-wavelength laser sources and Silicon-photonic asymmetric Mach-Zehnder interferometer (AMZI) filters. Colorless detection that removes the TF at the front of the receiver is possible when the number of wavelength-division multiplexing (WDM) channels is small; however, once the number exceeds a certain limit, the TF is needed to prevent power saturation of the receiver. We present an effective cooperative filtering scheme for transmission signals and LO channels sourced from an LO bank; it can detect C-band WDM coherent signals while easing the receiver's power limitation. The scheme is analyzed to clarify receiver performance and the switch port-count bounds under various operating conditions. We demonstrate that the cooperative filtering scheme extends the usable channel count from 60 (typical of coherent receivers) to >116 wavelength channels, which is verified by experiments for 1,856 × 1,856 optical switching with a short switching time of 3.2 μs.


I. INTRODUCTION
Data centers now comprise a major part of the crucial infrastructure for our personal and professional lives, as they store and process huge amounts of data. Datacenter-related traffic has been growing annually by 25% and exceeded 20 Zettabytes in 2021 [1]. Advances in cloud computing and big-data analysis, spurred by rapid progress in machine learning and artificial intelligence, have accelerated the growth, which will yield an annual global datacenter traffic of 350 Zettabytes by 2030, giving a compound annual growth rate (CAGR) of 82% [2]. The global energy consumption of data centers reached 200 TWh in 2020 [3], and is forecasted to account for ∼40% of global information and communication technology (ICT) electricity usage by 2030 [4]. Most of the traffic resides within data centers, i.e., about 75% of the total datacenter-related traffic [5]. Present datacenter networks rely on multistage electrical packet switches that interconnect several hundred thousand servers/storage systems [6], [7], [8]. The explosive traffic demands are outpacing the bandwidth growth rate of electrical switch application-specific integrated circuits (ASICs) and thus raising a problem with the bandwidth and power consumption of intra-data center networks. Indeed, many recent data centers were constructed in high latitudes to ease cooling loads [9].
Optical circuit switches can provide larger bandwidth and lower power consumption compared to electrical packet switches. An effective way of resolving the bandwidth bottleneck and power consumption barriers is the hybrid use of electrical-packet and optical-circuit switching [10], [11], [12], [13], [14], [15], [16], [17]. In intra-data center networks, most flows are small flows, while large flows dominate the traffic volume [15]. In an optical and electronic hybrid switching network, large and long-lived flows are conveyed by optical circuit switches while small or short-lived flows use electrical packet switches. With the hybrid network solution, bandwidth scalability is enhanced, and energy and cost effectiveness are attained by minimizing the use of power-consuming electrical switches and removing optical-electrical-optical (O/E/O) conversion at the switches. The benefit is clear for the Hadoop filesystem (e.g., typical size of data block is 64-128 Mbyte), which is commonly adopted for big-data analysis [18]. Two key challenges remain for hybrid switching networks: how to attain port-count scalability and how to minimize optical switching times. The requirements (e.g., lower cost/latency and shorter transmission reach) are different from those of present wide area networks, which use reconfigurable optical add-drop multiplexers (ROADMs).
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
An optical switching network employing coherent technologies provides a viable solution toward high-capacity and large-scale data center interconnection [19], [20]. Thanks to their enhanced transmission capacity and loss budget, coherent technologies are now widely adopted for core and metro networks including inter-data center networks. With the success of the 400ZR coherent interface, standardization of the next-generation 800G coherent transmission is proceeding at the Optical Internetworking Forum (OIF) [21] and IEEE 802.3df Task Force [22]. The specifications will include short-reach transmission of 2-10 km (e.g., intra-data center applications) in addition to core/metro applications. Optical switch design poses a complex problem that involves many interdependencies among switch port count, channel speed, available wavelength number, loss budget, and so on. Comprehensive analytical investigations are needed before coherent technologies can support cost-sensitive intra-data center networks.
This paper studies a port-count scalable and fast wavelength-routing optical switch based on coherent detection. Silicon-Photonic wavelength tunable filters (TFs) at the receiver attain fast optical switching by tuning local oscillator (LO) wavelengths for wavelength selection. This creates a colorless coherent receiver, where the target channel is selected from incident broadband wavelength-division multiplexing (WDM) signals. The TF design should be optimized according to the switch port count (i.e., available number of wavelengths) and channel speed so that high-port-count optical switches are created cost-effectively. Compared to silicon ring filters, cascaded asymmetric Mach-Zehnder interferometer (AMZI) filters offer fast wavelength tuning on the order of microseconds with a wide free spectral range (FSR) [23], [24]. They are highly integrable, as evidenced by 1024 AMZIs on a single switch chip [25]. On the other hand, silicon ring filter integration needs special thermal control [26], [27] and/or special waveguide layout design [28] for stable operation. In this paper, we present a cooperative filtering scheme for transmission signals and LO channels using Silicon-Photonic multistage AMZI filters. This scheme is superior to conventional multistage AMZI filters in terms of chip size and tuning speed, enabling 50% smaller chips (or more than double the integration density) and shorter response times than the conventional filter.
The remainder of this paper is organized as follows. Section II gives a brief overview of current intra-data center networks and explains the basics of optical and electrical hybrid switching networks. Section III describes the target optical switch based on the combined use of wavelength-routing and space switches, and shows how large-scale, fast optical switching is attained by tuning the LO wavelength for coherent detection. The cooperative filtering scheme is adopted as it ensures scalability for greater numbers of wavelengths (i.e., switch port count) and can overcome the receiver's power limitation. In Section IV, the design criteria of the scheme are derived for maximizing the switch port count, and its performance is analyzed via numerical simulations. Experimental verification is presented in Sections V and VI, including wavelength tuning characteristics of fabricated multistage AMZI filters. Switching times under 3.2 μs are achieved in a 1856 × 1856 optical switching experiment emulating 116-ch. × 128-Gb/s dual-polarization quadrature phase shift keying (DP-QPSK) signals. Section VII summarizes this paper with conclusions and discussions. This paper is an extension of an invited presentation at OFC2022 [29], as it adds detailed design methods and related analytical results of cooperative filtering with Silicon-Photonic AMZIs.

II. SWITCHING TECHNOLOGIES IN INTRA-DATA CENTERS
Fig. 1(a) shows a typical intra-data center network in the form of a multi-tier hierarchical (Fat-tree) electrical packet switching (EPS) network. Several dozen storage devices/servers are connected to each top-of-rack (ToR) switch via copper cables or optical fibers. Packet data is exchanged between racks using aggregation (Leaf) and core (Spine) switches above the ToR switches. The chief advantages of the multistage electrical packet switching network are high fault tolerance and network scalability.
For instance, Facebook's F4 Fabric is configured with 4608 ToR switches, 384 Leaf switches, and 192 Spine switches using 40 Gigabit Ethernet [6], [7]. Likewise, 2048 ToR switches, 512 Middle block switches, and 256 Spine block switches are used for Google's Jupiter [8]. The electrical switch chip bandwidth has been increasing at 40% per year and reached 51.2 Tbps in 2022 [30]. However, recent traffic demands outpace progress in complementary metal oxide semiconductor (CMOS) technology. The bandwidth crunch and power consumption limitation of electrical switches become a bottleneck for scaling out hierarchical EPS networks.
Optical switches are an attractive solution for future intra-data center networks, since they are agnostic (transparent) to the signal bit rates, and their available port counts and link speeds can surpass those of electrical switches. Note that optical switches cannot simply replace the electrical switches deployed in today's data centers due to the lack of effective optical memories. One viable solution is the introduction of optical circuit switches in combination with EPSs [10], [11], [12], [13], [14], [15], [16]. Fig. 1(b) shows an electrical and optical hybrid switching network in which multistage electrical switches handle a small portion (10-20%) of the bandwidth alongside large-scale optical switches. The hybrid switching network can reduce the needed O/E/O conversion by about 75% [15], where large flows (high-bandwidth traffic) are offloaded from electrical switches to optical switches. The key component is a high-port-count optical switch that minimizes the use of power-consuming electrical switches and provides single-hop connection between ToRs. Indeed, the flattened network reduces the number of optical transponders and interconnection links (about 75% reduction compared to current multi-tier EPS networks), which contributes to simplified network configuration and reduced switch power consumption [31]. Optical path setup/release control is done by simply referring to a connection status table on the ingress and egress ports, whereby simple and fast connection is achievable without any specific timing or traffic adjustment [31]. In addition, blocking probabilities can be reduced by means of optical switch parallelism, which matches the trend in radix and bandwidth increase for merchant silicon switch chips. Control network design issues, including control latency and overall blocking performance, were elaborated in our recent papers [9], [32].
The benefits of the hybrid network solution have been demonstrated in terms of power consumption [31], bandwidth scalability [9], [20], architectural simplification, and a substantial reduction (more than 70% [9]) in interconnection hardware (transceivers and fibers) due to switching tier reduction [15]. In addition, its performance benefits such as reduced job completion times [18], [33], [34] or increased throughput [35] have been clarified for machine learning applications that handle large data blocks, including those using the Hadoop filesystem (e.g., typical size of data block is 64-128 Mbyte), which is commonly adopted for deep learning models and big-data analysis.
Optical switch port counts of more than 1000 are considered necessary to gain the benefit from the single-tier network [15], [16], [17]. Such large-scale optical switching technology has already been adopted in core and metro networks outside datacenters, where it is referred to as reconfigurable optical add-drop multiplexers/optical cross-connects (ROADMs/OXCs). The basic configuration was first demonstrated in the early 1990s [36] with the creation of an OXC (the so-called route-&-combine configuration; wavelength selective switch (WSS) function at the incoming fiber side and optical couplers at the outgoing fiber side in the express switch part). This OXC has been widely deployed in commercial ROADM/OXC systems, most of which utilize the broadcast-&-select configuration (optical splitters and WSSs are placed in reverse order to route-&-combine devices). The wavelength-routing mechanism is based on TFs at add ports or tunable laser diodes (TLDs) at drop ports. As will be explained in Section III, we apply the combination of wavelength routing and space switching to create high-port-count optical switches. One of the major performance differences between ROADMs and data center optical switches is the switching latency requirement. Switching times on the order of seconds are acceptable for ROADMs, while our present target for data center applications is less than 10 μs. The WSS is a very sophisticated device, but it relies on three-dimensional microelectromechanical system (3D-MEMS) or liquid crystal on silicon (LCoS) technologies. Free-space switches offer scalability in terms of port count without significantly increasing insertion loss, although the available port count is limited to about 300-400 [37], [38]. Furthermore, their switching speed, which must include careful beam-steering control, is rather slow (25 milliseconds [37] and 250 milliseconds [38]), which prevents us from using them for intra-data center networks.
Cost efficiency and scalability are also important goals to be met in datacenters. Accordingly, we make the best of silicon photonics technologies from the viewpoints of high-density integration, low-power consumption, and good mass productivity.

III. OPTICAL SWITCH ARCHITECTURE BASED ON LOCAL OSCILLATOR (LO) BANK FOR COHERENT DETECTION
Extensive research has been done to attain the large-scale optical switches and/or fast optical switching speeds required for intra-data center applications [39], [40], [41], [42], [43], [44], [45], [46], [47], [48], [49]. In this paper, we focus on a scalable switch architecture that leverages routing in the two dimensions of space and wavelength [20], [46], [47], [48]. As the total port count is given by the product of the two sub-switch port counts, the architecture scales readily to MN × MN optical switches by combining N wavelengths with M × M space switches. Wavelength routing can be implemented using TLDs at transmitters, or TFs or tunable LOs at receivers. As TFs are simpler and more reliable devices than TLDs due to their passive nature, we focus on receiver-side wavelength selection (i.e., TF-based optical switch architecture) in this paper.
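As a quick illustration of the product rule above, the following minimal Python sketch computes the total port count from N wavelengths and M × M space switches. The function name `total_port_count` is hypothetical; the sample values (N = 116, M = 16 or 32) are those quoted elsewhere in this paper.

```python
def total_port_count(num_wavelengths: int, space_switch_ports: int) -> int:
    """Port count of an MN x MN switch built from N wavelengths and M x M space switches."""
    if num_wavelengths < 1 or space_switch_ports < 1:
        raise ValueError("N and M must be positive integers")
    return num_wavelengths * space_switch_ports


# Example values taken from the text: N = 116 wavelengths, M = 16 or 32.
for m in (16, 32):
    ports = total_port_count(116, m)
    print(f"N = 116, M = {m}: {ports} x {ports} switch")
```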

A. Colorless Coherent Detection
The substantial yearly reduction in cost, size, and power consumption of coherent transceivers portends their use in intra-data centers in the near future [50], [51]. The switch bandwidth and port count can be increased by using coherent optics, which offers high spectral efficiency and receiver sensitivity. Coherent detection, however, needs widely and fast tunable LOs at the receiver for wavelength selection. Cost-effective tunable LO lights are available without using TLDs; a wavelength bank can be combined with Silicon-photonic TFs (i.e., LO bank) [52]. This can eliminate TFs (and the resultant loss) in front of the receivers, and thus colorless coherent detection further enlarges the available switch port count due to the improved loss budget. However, the colorless receiver suffers from self-beat interference from out-of-band (OOB) channels, which degrades performance [53], [54], [55]. This originates from power imbalance and skew mismatch between the two inputs in each of the four balanced photodiodes (BPDs). To maximize the port count of the optical switch, we need to consider the interplay between the removed TF loss and the induced self-beat noise. Our design principle, presented in a previous paper [20], uses colorless coherent detection instead of conventional (filtered) detection; assuming a Silicon-photonic TF loss of 10 dB, the port count is quadrupled.

B. Receiver Power Limitation
According to the OIF agreement, the receiver's dynamic range is restricted by the photodiodes (PDs) inside the receiver. When we assume an 18-dB receiver dynamic range, specified by the OIF [56], the maximum number of wavelength signals input to a coherent receiver is limited to 60 (≈ $10^{18/10}$) for the 100-Gb/s DP-QPSK format. Tighter constraints on the receiver's dynamic range are imposed with high-order modulation formats such as dual-polarization eight quadrature amplitude modulation (DP-8QAM) and dual-polarization sixteen quadrature amplitude modulation (DP-16QAM). Accordingly, even a coherent system needs a TF to prevent receiver power saturation when the channel number exceeds a certain limit [57]. Table I summarizes design examples of optical switch throughputs and port counts for different modulation formats, wavelength numbers (N), and space switch port counts (M), where colorless and filtered coherent systems (receivers), marked red and blue, respectively, are assumed. The channel speed stands for the net rate excluding forward error correction (FEC) overhead. Multiplying the channel speed by MN yields the switch throughput. A tradeoff exists between the available wavelength number (N) and channel speed. Colorless coherent detection is possible when the channel speed is higher than 400 Gb/s, while TFs are imperative in scaling the port count over 1000 × 1000 (M = 16). In this paper, we focus on the cases where the number of wavelength channels is larger than the maximum permitted for a coherent receiver (i.e., N ≥ 60). Larger space switches (M) allow us to further expand the total port count, but what is currently available holds M to 32 as shown in [25]. Finally, the filtered signal is coherently detected by using an LO light sourced from an LO bank. By changing the state of the MCS and/or the passband of TF1, a signal is routed between arbitrary input and output ports without blocking.
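The two arithmetic rules in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all WDM channels are taken to have equal power, so the dynamic range in dB maps directly to a channel-count ratio, and the function names and the 100-Gb/s, N = 60, M = 16 example are hypothetical placeholders rather than entries from Table I.

```python
def max_channels(dynamic_range_db: float) -> float:
    """Linear power ratio for a dynamic range in dB (equal-power channels assumed)."""
    return 10 ** (dynamic_range_db / 10)


def switch_throughput_tbps(channel_gbps: float, n_wavelengths: int, m_ports: int) -> float:
    """Switch throughput = channel speed x M x N, converted from Gb/s to Tb/s."""
    return channel_gbps * n_wavelengths * m_ports / 1000


# An 18-dB dynamic range gives a ratio slightly above the 60 channels quoted in the text.
print(f"18 dB -> {max_channels(18):.1f} equal-power channels")

# Hypothetical design point, for illustration only.
print(f"throughput: {switch_throughput_tbps(100, 60, 16):.1f} Tb/s")
```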

C. Large-Scale and Fast Optical Switch Employing Colorless Coherent Detection
Wavelength selection is performed by tuning the LO wavelength from a shared LO bank. Multi-wavelength optical carriers are generated from a wavelength bank that is composed of an optical comb source or N fixed-wavelength LDs with an N × 1 WDM multiplexer. The wavelength bank output is distributed by a $1 \times (MN/S_L)$ splitter, and then the power is adjusted by an EDFA to an output power of $P_L$ dBm. The amplified signal is further distributed by a $1 \times S_L$ splitter preceding the optical filters. The target wavelength is extracted by TF2 and amplified by a compact and low-cost preamplifier ($P_A$) with uncooled-LD pumping. After preamplification, the selected center channel serves as the LO for coherent detection in place of the conventional LO laser. In this way, a high-port-count and fast optical switch can be cost-effectively built on coherent detection using LO wavelength tuning. Although an EDFA is a relatively expensive device, the per-port cost is reduced as it is shared by multiple output ports ($S$ or $S_L$).
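To give a feel for the power budget of the LO distribution path, the sketch below estimates the splitting loss of the two splitter stages. It assumes ideal power splitters, whose loss is $10\log_{10}$ of the split ratio, and ignores excess loss; the function name and the example values (M = 16, N = 116, $S_L$ = 4) are illustrative choices, not a statement of the authors' loss model.

```python
import math


def ideal_split_loss_db(split_ratio: float) -> float:
    """Splitting loss of an ideal 1 x K power splitter (excess loss ignored)."""
    if split_ratio < 1:
        raise ValueError("split ratio must be >= 1")
    return 10 * math.log10(split_ratio)


m_ports, n_wavelengths, s_l = 16, 116, 4  # illustrative values

first_stage = ideal_split_loss_db(m_ports * n_wavelengths / s_l)  # 1 x (MN/S_L) splitter
second_stage = ideal_split_loss_db(s_l)                           # 1 x S_L splitter

print(f"1 x (MN/S_L) splitter: {first_stage:.2f} dB")
print(f"1 x S_L splitter:      {second_stage:.2f} dB")
print(f"total split loss:      {first_stage + second_stage:.2f} dB")
```

Under these assumptions the total splitting loss depends only on MN, while $S_L$ determines how much of it falls after the shared EDFA.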
The optical filtering scheme is critically important when establishing wavelength routing on the optical switch. Fig. 2(b) illustrates a conventional configuration of the Silicon-Photonic TF based on multistage AMZIs. Each AMZI has phase shifters on the two arms to adjust the passband and FSR. The k-th AMZI has an FSR of $35/2^{k}$ nm to cover the C-band. The overall filter response is given by the product of all AMZI transmission responses; its 3-dB bandwidth is given by $4400/2^{k}$ GHz for k-stage cascaded AMZIs [24]. In the LO bank, TF2 performs wavelength selection, filtering a single LO carrier. For 32-Gbaud WDM signals with more than 60 channels (N > 60), we recently proposed a cooperative filtering scheme to handle transmission signals and LO channels [58]. This is illustrated schematically in Fig. 2(c). If the optical switch processes a 32-Gbaud WDM signal with more than 60 channels (N > 60), seven-stage (35 GHz) and eight-stage (17 GHz) cascaded AMZI filters are conventionally used for TF1 and TF2, respectively. In our configuration, the TF2 function is divided into two: the 8-stage AMZIs are split into k-stage and (8 − k)-stage cascaded AMZIs for TF1 and TF2, respectively. The k-stage filter (TF1) prevents signal-power saturation at the receiver, and coherent detection can be done with the cooperation of TF1 and TF2.
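The bandwidth rule quoted above is easy to tabulate. The following sketch evaluates $4400/2^{k}$ GHz for the stage counts discussed in this paper; the function name is hypothetical, and the formula is simply the one cited from [24], so the values are design estimates that can be set against the measured bandwidths reported in Section V.

```python
def cascade_bandwidth_ghz(stages: int) -> float:
    """3-dB bandwidth of a k-stage cascaded AMZI filter, using the 4400 / 2**k GHz rule."""
    if stages < 1:
        raise ValueError("at least one AMZI stage is required")
    return 4400 / 2 ** stages


for k in (4, 6, 7, 8):
    print(f"{k}-stage cascade: {cascade_bandwidth_ghz(k):.1f} GHz")
```

The seven- and eight-stage results correspond to the 35-GHz and 17-GHz filters mentioned in the text.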
The benefit of the cooperative filtering scheme is a reduction in Silicon-Photonic TF chip size (or an increase of the integration level) and in its response time. Fig. 4 compares the electrical wiring scale and chip length for different numbers of AMZI stages. For TF2, when reducing the stage number from 8 to 4 (k = 4), the chip length and number of package I/Os needed can be reduced by ∼50%, or the number of TF2s integrated on a single chip can be commensurately increased. Likewise, by reducing the number of AMZI stages, the tuning time of TF2 can be reduced, since the larger the number of cascaded AMZIs, the larger the possible error in optimizing each AMZI response. As reported in Section V, the tuning time of a 4-stage TF2 is less than 50% that of an 8-stage TF2 [58]. TF1 can be integrated with a 1 × M tree switch in the MCS to minimize optical switch loss with only a slight increase in chip area: an 8.6% increase for a 1 × 16 (M) switch and k = 4. As a consequence of the above discussion, this scheme realizes high integration levels of the silicon photonics components and reduces tuning time, which will contribute to its cost-effective implementation.

IV. SIMULATIONS
In this section, optical AMZI filters are analyzed in detail with the view of implementing cooperative filtering. Care is needed in designing the optical switch with regard to in-band signal interference originating from the crosstalk-induced LO and power saturation in the coherent receiver. Numerical assessments can clarify the optimum filter combinations for the proposed optical switch under realistic scenarios.

A. Theoretical Model of Optical AMZI Filters
We begin with a theoretical description of multistage AMZI filters [59] to support port-count analysis. AMZIs are widely used as filters, modulators, and switches in optical communication systems. Let us consider a 2 × 2 AMZI with different arm lengths $L_1$ and $L_2$. Incident light is split into two paths by a 3-dB directional coupler, passes through the different physical paths ($L_1$ or $L_2$) to acquire phase shifts, and is then recombined by a second 3-dB directional coupler at the output. When electric fields $E_{in1}$ and $E_{in2}$ are directed to the two inputs of the first 3-dB directional coupler, the output electric fields are written using the transfer matrix $M(\lambda)$ as in (1), where $j = \sqrt{-1}$ is the imaginary unit, $k_0$ is the free-space wave number, $\lambda$ is the wavelength of the light, $n$ is the refractive index, and $\Delta L = |L_1 - L_2|$ is the path difference between the two arms. The input and output fields through k-stage cascaded AMZI segments are derived by an expression similar to (1), given as (2), where the i-th transfer matrix $M_i(\lambda)$ is expressed in (3) using the arm conditions $L_1(i)$, $L_2(i)$, and $\Delta L(i)$ of the i-th 2 × 2 AMZI. By setting $E_{in1} = 1$ and $E_{in2} = 0$ in (2), we immediately obtain the analytic expression (4) for the transmission. In our cascaded AMZI filter, the path difference $\Delta L(k)$ of the k-th AMZI is designed to be twice that of the (k − 1)-th AMZI, being $\Delta L(k) = 2^{k-1}\Delta L(1)$. The calculated transmission spectra plotted in Fig. 3 are for the case of a 4-stage cascaded AMZI filter, assuming a path difference of $\Delta L(1)$ = 18.57 μm and a refractive index of n = 3.45. The frequency response of the k-th AMZI is calculated from the output optical power obtained with the k-th transfer matrix $M_k(\lambda)$ of (3). The cooperative filtering acts as an 8-stage cascaded AMZI filter in a coherent receiver, separating the eight AMZI segments into k-stage and (8 − k)-stage AMZIs, corresponding to TF1 and TF2 in an optical switch.
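The transfer-matrix description can be explored numerically with the short Python sketch below. It is a minimal illustration, not the authors' simulation code: it assumes the textbook form coupler × phase section × coupler for each AMZI, a dispersion-free index n = 3.45, phase shifters that center every stage on a hypothetical frequency of 193.4 THz, and stages chained through their cross ports with $E_{in1} = 1$ and $E_{in2} = 0$. The function names are invented for this example.

```python
import cmath
import math

C = 299_792_458.0    # speed of light in vacuum (m/s)
N_INDEX = 3.45       # refractive index quoted in the text (dispersion ignored)
DELTA_L1 = 18.57e-6  # first-stage path difference quoted in the text (m)


def matmul2(a, b):
    """Product of two 2 x 2 complex matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def amzi_matrix(freq_hz, delta_l, center_hz):
    """3-dB coupler * phase section * 3-dB coupler for one AMZI (assumed form).

    The phase shifter is assumed to cancel the arm phase at center_hz,
    so the cross port is fully transmitting at that frequency.
    """
    s = 1 / math.sqrt(2)
    coupler = [[s, -1j * s], [-1j * s, s]]
    dphi = 2 * math.pi * N_INDEX * delta_l * (freq_hz - center_hz) / C
    phase = [[cmath.exp(-1j * dphi), 0], [0, 1]]
    return matmul2(coupler, matmul2(phase, coupler))


def cascade_transmission(freq_hz, stages, center_hz=193.4e12):
    """Power transmission of `stages` AMZIs chained through their cross ports."""
    power = 1.0
    for i in range(1, stages + 1):
        # Path difference doubles at each stage: dL(i) = 2**(i - 1) * dL(1).
        m = amzi_matrix(freq_hz, DELTA_L1 * 2 ** (i - 1), center_hz)
        power *= abs(m[1][0]) ** 2  # cross-port field for E_in1 = 1, E_in2 = 0
    return power


for stages in (4, 8):
    t = cascade_transmission(193.4e12 + 50e9, stages)
    print(f"{stages}-stage cascade, 50 GHz off center: T = {t:.4f}")
```

With these assumptions each stage contributes a raised-cosine factor, so adding stages can only narrow the passband around the chosen center frequency.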

B. Crosstalk Analysis for Colorless Coherent Detection
Our recent work modeled a colorless coherent receiver for detecting a WDM signal using an ideal LO light (i.e., high carrier-to-noise ratio and no crosstalk) [20]. However, an actual LO bank suffers crosstalk from TFs, which can degrade signal quality such as signal-to-noise ratio (SNR), Q-factor, and bit error ratio (BER). Fig. 6 illustrates how signal quality is degraded by LO-crosstalk-induced noise in colorless coherent detection, where the OOB channels and LO crosstalk generate in-band crosstalk. Numerical analyses that extend the previous work [20] quantify the performance limit associated with LO crosstalk. Without loss of generality, we assume a conventional coherent receiver employing polarization beam splitters (PBSs), optical 90°hybrid mixers, and four pairs of BPDs.
For the receiver shown in Fig. 6, we can obtain the complex amplitudes of the two polarization components as $I_X(t) = I_{XI}(t) + jI_{XQ}(t)$ and $I_Y(t) = I_{YI}(t) + jI_{YQ}(t)$. The frequency offset between the reference light and the channel of interest is assumed to be much less than the baseband bandwidth for intradyne detection. As described in [20], the performance of an ideal colorless coherent receiver is dictated by amplified spontaneous emission (ASE) noise, thermal noise, shot noise, LO relative intensity noise (RIN), and self-beat interference from out-of-band (OOB) channels. The extra impairment considered here is the beat noise between the OOB channels and LO crosstalk. We simplify the receiver design procedure by treating the combined effect of all noise contributions as a single white Gaussian noise. The coherent receiver with an LO bank yields the SNR from the pair of BPDs given in (5), where $\sigma^2_{ASE}$, $\sigma^2_{th}$, and $\sigma^2_{sh}$ are the ASE, thermal, and shot noise variances, respectively, $\sigma^2_{RIN}$ is the beat noise variance between the LO light and RIN from an LO laser, $\sigma^2_{oob}$ is the self-beat noise variance from OOB channels, and $\sigma^2_{XT}$ is the beat noise variance between the OOB channels and undesired LO lights. Similar to [20], each noise term in (5) can be expressed using the signal and LO powers. An interesting term considered in this paper is the signal-crosstalk beat noise $\sigma^2_{XT}$. The purpose of this subsection is to investigate how the LO crosstalk impacts signal quality. We analyze the SNR penalty defined as the ratio of the SNRs obtained from ideal (i.e., noiseless LO) and actual (i.e., LO bank) coherent receivers. Throughout the paper, we test a 37.5-GHz-spaced 116-channel WDM signal modulated with 128-Gbps DP-QPSK. Each spectral component was filtered to have a Nyquist root-raised-cosine (RRC) shape with a roll-off factor of 0.1.
After adding white Gaussian noise to the signal to yield the received optical signal-to-noise ratio (OSNR) of 17 dB, the central channel (i.e., Ch. 58) was detected with the same coherent receiver as that reported in [52]. Fig. 7 shows the crosstalk dependency while varying the total LO crosstalk from −50 dB to 0 dB. The SNR penalty is lowered as the total in-band crosstalk is decreased, and the improvement saturates for crosstalk below −30 dB. As a result, the optical switch system design needs to properly specify TFs considering the tradeoffs between the switch port count and system cost (e.g., the number of optical amplifiers and the saturation power). For simplicity, our assessment of system performance shown in Fig. 7 assumed a simulated TF with ideal frequency response and high extinction ratio. This is based on the fact that the possible fabrication error from ideal MZI filters only marginally impacts system performance as shown in our previous work [24].
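The SNR-penalty definition used in this subsection can be sketched as follows. The code assumes that the detected signal term is identical for the ideal and actual receivers and that all noise sources are independent, so that their variances simply add, in line with the single white Gaussian noise simplification. The function name and the numerical variances are arbitrary placeholders, not values from the simulations.

```python
import math


def snr_penalty_db(ideal_noise_variances, crosstalk_beat_variance):
    """SNR_ideal / SNR_actual in dB, assuming the same signal term in both cases.

    ideal_noise_variances: variances present with an ideal LO
        (e.g., ASE, thermal, shot, RIN, OOB self-beat).
    crosstalk_beat_variance: extra OOB-to-LO-crosstalk beat variance from the LO bank.
    """
    ideal = sum(ideal_noise_variances)
    if ideal <= 0 or crosstalk_beat_variance < 0:
        raise ValueError("variances must be positive (crosstalk term may be zero)")
    return 10 * math.log10((ideal + crosstalk_beat_variance) / ideal)


baseline = [1.0, 2.0, 1.0]  # arbitrary units, placeholders only
for xt in (0.0, 1.0, 4.0):
    print(f"sigma_XT^2 = {xt}: penalty = {snr_penalty_db(baseline, xt):.2f} dB")
```

In this simplified picture the penalty reaches about 3 dB when the crosstalk beat variance equals the sum of all other noise variances.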

C. Optical Switch Design and Performance Evaluation
Various parameters need to be considered for the cooperative filtering scheme: the number of filter stages, available wavelength number, MCS port count, EDFA saturation powers, receiver's dynamic range, each device loss, and so on. We simulated optical switching transmission using 116-ch. × 128-Gbps DP-QPSK signals in the C-band. The WDM signal was processed in the optical switch, where the signal-side TF1 loss was calculated as $0.5\log_2(k + 2)$ dB, corresponding to k-stage cascaded AMZIs. The EDFA saturation power ($P_S$) of the optical switch was swept from 15 dBm to 27 dBm in steps of 1 dB. For the LO bank, we set the EDFA saturation and preamplifier powers at $P_L$ = 23 dBm and $P_A$ = 17 dBm, respectively. The EDFAs were shared by $S$ and $S_L$ ports [see Fig. 2(a)]. The LO-side TF2 loss was defined as $0.5\log_2[(8 - k) + 2] + 7$ dB, where the total number of signal- and LO-side AMZI stages was fixed at 8. The assumed loss of the signal-side TF1 is smaller than that of the LO-side TF2, as the 1 × M switch of the MCS and TF1 are integrated into a single chip. The other parameters were the same as those used in the previous works [20], [52]. The performance was evaluated in terms of the achievable port count needed to satisfy BER = $1 \times 10^{-3}$ on the central channel (Ch. 58).
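The two loss models stated above can be tabulated directly. The sketch below evaluates them for each split of the eight AMZI stages between TF1 and TF2; the function names are hypothetical, and the code reproduces only the quoted formulas, not the full link simulation.

```python
import math

TOTAL_STAGES = 8  # signal-side plus LO-side AMZI stages, fixed in the text


def tf1_loss_db(k: int) -> float:
    """Signal-side TF1 loss for k cascaded AMZI stages: 0.5 * log2(k + 2) dB."""
    return 0.5 * math.log2(k + 2)


def tf2_loss_db(k: int) -> float:
    """LO-side TF2 loss for the remaining (8 - k) stages: 0.5 * log2((8 - k) + 2) + 7 dB."""
    return 0.5 * math.log2((TOTAL_STAGES - k) + 2) + 7


print(" k  TF1 loss (dB)  TF2 loss (dB)")
for k in range(1, TOTAL_STAGES):
    print(f"{k:2d}  {tf1_loss_db(k):13.2f}  {tf2_loss_db(k):13.2f}")
```

Both terms vary by well under 1 dB across k, so under these formulas the choice of k mainly matters through filtering and receiver power rather than filter loss.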
Prior to AMZI filter optimization, we find the best balance between the EDFA saturation power ($P_S$) and the sharing numbers ($S$ and $S_L$). The available port count (MN) of optical switch systems subject to the signal- and LO-side 4-stage cascaded AMZI filters was evaluated. The results are shown in Fig. 8, where the EDFA saturation power ($P_S$) and its sharing number ($S$) were varied while holding $S_L$ at 4. At the expense of system cost, the unshared case (i.e., $S$ = 1) provides the maximum port count: 3712 ports (N = 116, M = 32) at a saturation power of 23 dBm. Due to the reduction in received optical power, the shared-amplification cases ($S$ = 2-8) reduce this to between 1856 and 928 at the same saturation power. Fig. 9 illustrates the dependency on the LO-side EDFA sharing number ($S_L$) for $S$ = 2. Similar trends were observed for the LO bank design, but the reduction with increasing $S_L$ was milder than that in Fig. 8. A port count of more than one thousand was achieved under the conditions of $P_S$ = 23 dBm, $S$ = 2, and $S_L$ = 4; thus these parameters were used in the subsequent simulations. Finally, we performed numerical simulations to analyze the interplay of the filtering effects at the transmitter and LO sides. Fig. 10 shows the relationship between the available port count (MN) and the number of signal-side AMZI stages (k). In the case of no signal-side TF1 (k = 0), the port count cannot reach 1000 ports because the number of wavelengths (N) is halved by the limit of the receiver's dynamic range. The available port count is also degraded as the number of signal-side AMZI stages reaches 5 or more (k ≥ 5). This is because the multiple wavelengths remaining after passing through the LO-side TF2 incur a power reduction per channel at the LO bank output. From these tradeoffs, we identified the best TF pairs as k = 1-4; they achieved the maximum port count of 1856.
The estimated port scale (i.e., 1856) is large enough to efficiently build hybrid switching networks, and validation of such a system is presented using the reasonable conditions applied in the numerical simulations. The SNR penalty is 3 dB in comparison to that of ideal coherent systems (receivers), as given by the simulations for a total in-band crosstalk of −15 dB. Additional stages on TF1 and/or TF2 can reduce the in-band crosstalk, which lessens the SNR penalty relative to ideal coherent systems (receivers).

V. FABRICATION OF SILICON-PHOTONIC AMZI FILTER
Silicon-Photonic AMZI filters are a key enabler in creating a scalable optical switch and a rapid-tuning LO bank for coherent detection. Using the same fabrication process as in previous works [24], [48], we developed a bandwidth-adaptable AMZI filter on a single silicon-photonic chip for this evaluation. The filter consists of eight AMZIs in conjunction with a 4 × 1 selector, as illustrated in Fig. 11(a). They are monolithically integrated on a silicon-on-insulator wafer (top: 220 nm, buried oxide: 3 μm) by lithography and reactive ion etching. Fig. 11(b) and (c) show photographs of the AMZI filter module and chip, respectively. Two identical optical filter modules were assembled in a polarization-diversity configuration using an external circulator and fiber-based PBSs. The measured fiber-to-fiber insertion loss was 8.9 dB, comprising 2.5-dB on-chip propagation loss, 3.4-dB polarization-diversity loss, and 1.5-dB chip-to-fiber coupling loss per facet. The transmission spectra were measured by propagating ASE light through the AMZI filter. As indicated in Fig. 11(d), the 3-dB bandwidths are 256 GHz, 66 GHz, 33 GHz, and 17 GHz after the 4th, 6th, 7th, and 8th AMZI, respectively. The crosstalk decreased with the number of stages, reaching −13 dB in the 8-stage case.
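The insertion-loss breakdown quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, assuming that the coupling loss is incurred at two facets (one input, one output) and that the contributions simply add in decibels; the constant and function names are illustrative and do not come from the original work.

```python
# Loss contributions quoted in the text, all in dB.
ON_CHIP_PROPAGATION_DB = 2.5
POLARIZATION_DIVERSITY_DB = 3.4
COUPLING_PER_FACET_DB = 1.5
NUM_FACETS = 2  # assumption: one input facet and one output facet


def fiber_to_fiber_loss_db() -> float:
    """Sum the individual contributions; losses expressed in dB add."""
    return (
        ON_CHIP_PROPAGATION_DB
        + POLARIZATION_DIVERSITY_DB
        + NUM_FACETS * COUPLING_PER_FACET_DB
    )


total_db = fiber_to_fiber_loss_db()
print(f"Estimated fiber-to-fiber insertion loss: {total_db:.1f} dB")
```

Under the two-facet assumption, the sum agrees with the measured 8.9 dB.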
The AMZIs have thermo-optic phase shifters on both arms to adjust the passband wavelength. By applying an electrical current to heaters placed on the phase shifters, the passband center wavelength is tunable from 1530 nm to 1565 nm in a fully continuous manner. The switching time was examined by launching two continuous wave (CW) lights into the AMZI filter and detecting the output with a photodetector. Fig. 12(a)-(d) show measured optical power transitions of the 4-, 6-, 7-, and 8-stage cascaded AMZI filters, respectively. The switching time was defined as the time taken to reach 90% optical power from the instant the trigger was initiated. Optical channels were switched within 38 μs, almost independent of the number of stages, but this is slower than our intra-data center requirement (<10 μs). Wavelength tuning speed is accelerated by introducing the Turbo-pulse heater control scheme [60]. The switching performances are plotted in Fig. 12(e)-(h) for the number of AMZI stages of 4,

VI. EXPERIMENTS
To validate the cooperative filtering scheme, we conducted a proof-of-concept demonstration using the fabricated AMZI filters. Fig. 13 shows the experimental setup used to emulate 1856 × 1856 (i.e., N = 116, M = 16) optical switching. At the transmitter, we generated 3-channel 32-Gbaud DP-QPSK signals using three LDs and a dual-polarization IQ modulator (DP-IQM) driven by an arbitrary waveform generator (AWG). The applied data patterns were pseudo-random bit sequences (PRBS) of length $2^{15}-1$ for the x- and y-polarizations, decorrelated between the two polarization components by half of the pattern length. The data was Nyquist-pulse-shaped by a root-raised-cosine (RRC) filter with a roll-off factor of 0.05. The signal wavelengths were set at 1530.236 nm (Ch.A), 1547.116 nm (Ch.B), and 1564.679 nm (Ch.C) to replicate the edge and center channels of the C-band.
The generated signals were coupled with spectrally-shaped ASE (SS-ASE) light by a WSS. The resulting signal emulated a 37.5-GHz-spaced 116-channel DP-QPSK WDM signal covering the wavelength range of 1530 nm to 1565 nm, as shown in Fig. 13(a). To match the simulation conditions, the optical power from the WDM transmitter was fixed at 15.6 dBm. The transmitted WDM signal was divided by a 1 × (116/S) splitter and amplified by an EDFA with a saturation power of P S. The amplified signal was further distributed by a 1 × S splitter and connected to a 16 × 16 MCS (M = 16). After the MCS, the signal extracted by a signal-side AMZI filter or WSS was passed to the coherent receiver. The WSS was used to emulate the transmission spectra of 1-, 2-, 3-, and 5-stage AMZI filters. Fig. 13(b)-(h) show the measured transmittance spectra of the 1-8 stage AMZI filters when extracting the WDM signal at 1547.116 nm (Ch.B).
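As a consistency check on the channel plan, the test wavelengths quoted for Ch.A, Ch.B, and Ch.C can be converted to optical frequencies and compared against the 37.5-GHz grid. The sketch below assumes vacuum wavelengths and the exact SI value of the speed of light; the function names are illustrative only.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum (exact SI value)
SPACING_GHZ = 37.5         # channel spacing stated in the text


def wavelength_nm_to_thz(wavelength_nm: float) -> float:
    """Convert a vacuum wavelength in nm to an optical frequency in THz."""
    return C_M_PER_S / (wavelength_nm * 1e-9) / 1e12


def grid_steps(wavelength_a_nm: float, wavelength_b_nm: float) -> float:
    """Frequency separation of two channels in units of the channel spacing."""
    delta_ghz = abs(
        wavelength_nm_to_thz(wavelength_a_nm) - wavelength_nm_to_thz(wavelength_b_nm)
    ) * 1e3
    return delta_ghz / SPACING_GHZ


# Test-channel wavelengths quoted in the experimental setup.
CH_A_NM, CH_B_NM, CH_C_NM = 1530.236, 1547.116, 1564.679

steps_a_to_c = grid_steps(CH_A_NM, CH_C_NM)
steps_b_to_c = grid_steps(CH_B_NM, CH_C_NM)
print(f"Ch.A to Ch.C: {steps_a_to_c:.2f} grid spacings")
print(f"Ch.B to Ch.C: {steps_b_to_c:.2f} grid spacings")
```

The Ch.A to Ch.C separation comes out very close to 115 spacings, which is what one would expect if these are the first and last channels of a 116-channel grid, and Ch.B sits about 58 spacings from Ch.C, i.e., near the center.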
The LO bank supplied 37.5-GHz-spaced 116-channel LO lights in the same way as at the transmitter [Fig. 13(i)]. The LO light was divided by a 1 × 512 splitter and further distributed by a 1 × 4 splitter. After the first splitter, the EDFA gain was adjusted to yield an output power of 23 dBm. The desired channel was extracted from the distributed LO light by an LO-side AMZI filter or WSS. The extracted spectra at 1547.116 nm (Ch.B) using the AMZI filters with 1-8 stages are shown in Fig. 13(j)-(p), respectively. A preamplifier boosted the extracted LO light to the saturation power of 17 dBm and sent it to the receiver. The received signal was detected by an optical front-end followed by a digital storage oscilloscope (DSO). Demodulation was carried out by offline digital signal processing (DSP) including polarization demultiplexing, carrier recovery, and adaptive equalization. Fig. 14 shows the measured BER versus EDFA saturation power (P S) when the number of signal-side (TF1) AMZI stages was changed from 1 to 5 and the number of LO-side (TF2) AMZI stages was accordingly changed from 7 to 3. The sharing number (S) was fixed at 2 to compare the simulation results with those of the experiments. We achieved BERs under $1 \times 10^{-3}$ at EDFA saturation powers (P S) of ∼20.5 and ∼23.0 dBm for 3 stages and 4 stages, respectively. A smaller saturation power suffices when TF1 has fewer than 3 stages. The obtained saturation power is consistent with the simulated value (P S = 23 dBm). The good agreement between the numerical simulations and the measured results confirms the validity of the cooperative filtering scheme. Our previous work showed that nonlinear effects can be ignored at launched optical powers of less than 15 dBm [61]. Indeed, no distinct signal distortion is observed in Fig. 13, where the TFs operate at input powers of less than 17 dBm. Tuning speed can be enhanced by reducing the number of filter stages, because a larger number of cascaded stages increases the possible error in optimizing each AMZI response.
According to the system requirements, we should determine the optimum filter combination by considering the tradeoff between system performance and switching speed.
We now focus on the system performance obtained with 4-stage AMZI filters on both the signal and LO sides. Fig. 15 plots the measured BER as a function of EDFA saturation power (P S) for different sharing numbers (S). The wavelength of the test channel was 1547.116 nm (Ch.B). The EDFA saturation powers (P S) required to attain a BER of $1 \times 10^{-3}$ were found to be 22.7, 22.9, and 25.0 dBm for S = 1, 2, and 4, respectively. These results show that port counts of more than 1000 are feasible with conventional EDFAs (P S ≤ 23 dBm). The number of EDFAs in the whole switching network is given by MN/S + MN/S L, so the per-port EDFA power consumption is the product of (1/S + 1/S L) and the power consumption of a single EDFA. Given an EDFA that offers a saturation power of 23 dBm at 3 W power consumption, we can expect a reasonable per-port EDFA power consumption of 2.25 W at S = 2 and S L = 4. In this case, the total number of EDFAs needed for an optical switch is 1392 (M = 16, N = 116, S = 2, and S L = 4), which results in a total EDFA power consumption of about 4 kW, much lower than the maximum power available to a conventional rack. The cooperative filtering scheme thus enables a cost-effective switch system that uses more than the 60 wavelengths otherwise permitted by the receiver's dynamic range. In this paper, all the component AMZIs in a filter were designed to operate in the transverse magnetic (TM) mode. By changing from TM-mode to transverse electric (TE) mode operation, we can reduce the fiber-to-fiber insertion loss by ∼3 dB [62]. This would allow larger EDFA sharing numbers (S and S L), which reduces the total number of EDFAs (MN/S + MN/S L).
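The EDFA count and power figures above follow directly from the expression MN/S + MN/S L. A minimal sketch of that arithmetic is given below; the function name and return layout are illustrative, and the 3-W per-EDFA consumption is the example value given in the text.

```python
def edfa_budget(m: int, n: int, s: int, s_l: int, edfa_power_w: float):
    """Return (port count, EDFA count, per-port EDFA power, total EDFA power).

    m, n  : space-switch size and number of wavelengths (port count = m * n)
    s     : signal-side EDFA sharing number
    s_l   : LO-side EDFA sharing number
    """
    ports = m * n
    num_edfas = ports / s + ports / s_l              # MN/S + MN/S_L
    per_port_w = (1 / s + 1 / s_l) * edfa_power_w    # (1/S + 1/S_L) x EDFA power
    total_w = num_edfas * edfa_power_w
    return ports, num_edfas, per_port_w, total_w


ports, num_edfas, per_port_w, total_w = edfa_budget(
    m=16, n=116, s=2, s_l=4, edfa_power_w=3.0
)
print(f"ports = {ports}, EDFAs = {num_edfas:.0f}")
print(f"per-port EDFA power = {per_port_w:.2f} W, total = {total_w / 1e3:.2f} kW")
```

For M = 16, N = 116, S = 2, and S L = 4 this reproduces the 1856 ports, 1392 EDFAs, and 2.25 W per port quoted above, with a total of roughly 4.2 kW.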
The switching performance is shown in Fig. 16, where Q-factor transitions were measured while changing the signal wavelength (i.e., the center wavelength of the signal-side 4-stage AMZI filter) from 1530.236 nm (Ch.A) to 1564.679 nm (Ch.C). The Q-factor threshold (9.8 dBQ) was calculated to match a baseline BER of $1 \times 10^{-3}$ as per Refs. [63], [64]. During the switching operation, the LO wavelength was fixed because the LO bank used a non-burst-mode preamplifier. A short switching time of 3.2 μs was recorded with Turbo-pulse heater control, in contrast to the 10.7 μs obtained without it. Here, we defined the switching time as the period from the start of control to the instant at which the Q-factor reached 90% of its final value. The switching times are shorter than those shown in Fig. 12 because of the different vertical scales: Fig. 12 plots optical intensity in watts, while Fig. 16 plots Q-factor in decibels. While the results obtained from thermo-optic AMZIs are satisfactory for our present system, further reduction of the switching time remains an important goal; we are working to attain nanosecond optical switching times by employing electro-optic AMZIs [65].
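The 9.8-dBQ threshold can be reproduced with the widely used Gaussian-noise relation BER = (1/2) erfc(Q/√2), expressed in decibels as 20 log10(Q). The sketch below assumes this relation; the cited references may define the conversion differently, so it should be read as a plausibility check rather than as the authors' exact procedure.

```python
import math
from statistics import NormalDist


def q_factor_db_from_ber(ber: float) -> float:
    """Q-factor in dB for a given BER, assuming BER = 0.5 * erfc(Q / sqrt(2)).

    0.5 * erfc(Q / sqrt(2)) is the upper tail of the standard normal
    distribution, so Q is the normal quantile at probability (1 - BER).
    """
    q_linear = NormalDist().inv_cdf(1.0 - ber)
    return 20.0 * math.log10(q_linear)


q_threshold_db = q_factor_db_from_ber(1e-3)
print(f"Q-factor threshold for BER = 1e-3: {q_threshold_db:.1f} dB")
```

Under this assumption, a BER of $1 \times 10^{-3}$ corresponds to approximately 9.8 dB, in line with the threshold used in Fig. 16.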

VII. CONCLUSION
In realizing scalable and fast optical switches for intra-data center networks, wavelength routing and coherent detection play a key role. In this scenario, a Silicon-Photonic TF is one of the critical devices. TF device selection and configuration should be optimized according to the required switch port count and channel speed in order to effectively create port counts of several thousand. In data center applications, cost is of critical importance, and the degree of Silicon-Photonic device integration should be maximized. The performance analysis and verification experiments presented herein confirm that our proposed cooperative filtering scheme successfully reduces the total number of AMZIs needed for filtering transmission signals and LO channels. The reduced number of AMZIs for TF1 (transmission signal filtering) determines the maximum degree of integration available with the preceding MZIs of the M × 1 switches, while the reduced number of AMZIs for TF2 (LO light filtering) enhances the integration degree of TF2, which lowers the cost of the LO bank. Our approach also attains a substantial reduction in response time: a switching time of 3.2 μs was obtained. The value of M is bounded in practice by fabrication concerns: M = 32 is currently available [25] and M = 64 is under development. For instance, a 1-Pbps-class optical switch (7488 × 7488 ports at 150/200 Gbps) is achievable by combining 117 wavelengths and 64 × 64 space switches. We believe the high degree of integration possible with Silicon-Photonic technologies should be maximally exploited to promote cost reduction of optical switches for intra-data center applications.
There remain two challenges before optical switches can be applied to data center networks. The first is higher-density integration and cost reduction of Silicon-photonic devices, which will be the key to creating scalable and cost-effective optical switches. The second relates to network control. Control network design issues, including control latency, were elaborated in our recent papers [31], [32], and implementation of a flow router mechanism in a ToR switch is needed. This paper focuses on optical switch technologies for hybrid switch networks, whose compelling advantages over present electrical-switch-based networks would be reinforced for large-bandwidth networks. However, identifying the crossing point requires clarification of device cost (the Silicon-photonic devices used are not yet commercially available), reliability, and performance (expected loss reduction, etc.).

ACKNOWLEDGMENT
Part of this paper is based on results obtained from project JPNP16007 commissioned by the New Energy and Industrial Technology Development Organization (NEDO).