A Universal Method for Constructing N-Port Reconfigurable Non-Blocking Optical Switches on a Silicon Chip

We propose a novel method for building <inline-formula> <tex-math notation="LaTeX">${N}$ </tex-math></inline-formula>-port non-blocking optical switches on a silicon chip. The method is based on a bidirectional merge-replace-mirror procedure and offers non-blocking interconnections among multiple inputs and outputs. Its key merits include a <inline-formula> <tex-math notation="LaTeX">$170~\mu \text{m}\,\,\times 300~\mu \text{m}$ </tex-math></inline-formula> footprint, nanosecond circuit switching, and <inline-formula> <tex-math notation="LaTeX">$85.1~\mu \text{W}$ </tex-math></inline-formula> power consumption per link. For the 32-port optical switch, the worst-case on-chip insertion loss is 18.02dB (all cross) and the worst-case crosstalk is −16.96dB (all bars).


I. INTRODUCTION
Recently, the trend in network-on-chip (NoC) design has shifted from increasing the processor clock rate to increasing the number of cores [1]. As the number of cores on a single chip grows, the chip size shrinks. This disproportion can lead to excessive power consumption in traditional metallic interconnects [2]. Optical interconnects thus promise a new paradigm: compared with their electrical counterparts, they offer a high data rate and low energy consumption [3].
A number of non-blocking topologies have been studied to bring optical switching technologies into the NoC environment, such as Crossbar [4], Clos [5], Benes [6], and Piloss [7]. Typically, they employ a microring resonator or a Mach-Zehnder interferometer as the key switching element (SE). Among these topologies, the Benes network requires the minimum number of SEs and uses only two-port switches [7]. Therefore, the Benes network may be favourable for reducing costs and losses compared with current switching technologies.
However, constructing a high-port-count Benes network in its original form is still a formidable challenge. The main limitation is the additional loss and crosstalk at the waveguide crossings: the larger the network, the more crossings accumulate [8]. Using a dual-ring assisted architecture [6], [9], Shanghai Jiaotong University optimized the loss to 18.5dB and the crosstalk to −15.1dB. Our previous work [10] used a bidirectional topology to avoid waveguide crossings, recording 12.01dB and −13.67dB, respectively. As IBM points out [11], the high first-order crosstalk noise is still the main limiting factor for Benes networks. For this reason, we extend our work to further reduce crossings and crosstalk.
The associate editor coordinating the review of this manuscript and approving it for publication was Laura Celentano.
The main contribution of this paper is a novel method for building reconfigurable non-blocking optical switches. Compared to our previous work [10], it has two distinguishing characteristics: 1) the use of coupled-ring-based four-port modules at some critical locations of the network and 2) the novel merge-replace-mirror method. Coupled-ring-based modules provide flatter passbands and sharper roll-off, at the cost of higher losses. The merge-replace-mirror method significantly reduces waveguide crossings. Although the loss of this work is similar to that of Benes, its crosstalk suppression capability is comparable to that of Piloss.
The following sections discuss the two-port SEs, the four-port modules, and the N-port network principle in turn. Low insertion loss and crosstalk are anticipated for these compact new designs.

II. BASIC BUILDING BLOCKS
Two components are presented before introducing the universal method: 2 × 2 optical switches and 4 × 4 building blocks.

A. TWO-PORT OPTICAL SWITCHES
The first building block is presented in Fig. 1. In all 2 × 2 optical switch schemes, the ''cross'' state has three consecutive stages. First, the optical signal propagates along the input waveguide. Second, when the ring resonator turns off, the optical signal that meets the mode coupling condition is coupled into the resonator. Lastly, the outgoing light propagates along the other waveguide in the opposite direction. The ''bar'' state, on the other hand, needs only two stages: the first and the last. When the resonator turns on, the travelling light keeps the same direction as the signal light.
These building blocks act as 2 × 2 cross-bar switches. The ''cross'' states are shown in (a), (c), and (e): input ports In 1 and In 2 are routed through the ring towards output ports Out 2 and Out 1, respectively. The ''bar'' states are shown in (b), (d), and (f): input ports In 1 and In 2 are routed directly to output ports Out 1 and Out 2. Two colours (blue and orange) indicate the paths from different input ports.
The most immediate consequence of utilizing three types of building blocks is the misalignment of the rings' resonant wavelengths. It leads to high power consumption and adds a relatively high optical power penalty to signals. Typically, we follow the perimeter choice presented in [10], [12] to determine the coupling mode. However, this perimeter choice is not optimal for all building blocks. For example, Fig. 1(a) and Fig. 1(c) look similar at first glance, but the ratio of the waveguide sections between the two coupling regions in the microscopic image is quite different: the former is 1:1, and the latter is 1:3. In Fig. 1(e), we add more rings to the design.
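The bar/cross routing described above can be summarised in a few lines. The following is a minimal sketch only; the function name `route_2x2` and the string labels are illustrative assumptions and do not appear in the original design.

```python
def route_2x2(state: str, in_port: int) -> int:
    """Return the output port (1 or 2) reached from input port 1 or 2.

    'bar'   : In 1 -> Out 1, In 2 -> Out 2 (light stays on its waveguide)
    'cross' : In 1 -> Out 2, In 2 -> Out 1 (light is routed through the ring)
    """
    if in_port not in (1, 2):
        raise ValueError("in_port must be 1 or 2")
    if state == "bar":
        return in_port
    if state == "cross":
        return 3 - in_port  # swaps 1 <-> 2
    raise ValueError("state must be 'bar' or 'cross'")


for state in ("bar", "cross"):
    for port in (1, 2):
        print(f"{state}: In {port} -> Out {route_2x2(state, port)}")
```

Because both inputs always map to distinct outputs, either state is a valid permutation of the two ports, which is the property the larger networks rely on.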
One possible solution to this misalignment is to vary the perimeter of the three proposed types of switches, each with a small change relative to the previous one, to centre their wavelengths properly. The perimeter variation was approximately 3.9819nm between consecutive rings in this context.
TABLE 1. The parameter settings are the same as in [10], [12], except that the coupling coefficients and the perimeters of the rings are slightly different.
The first experiment aimed to analyze the subsystem modules for high-data-rate signal transmission. A broadband spectrum generator (ONA) was chosen as input, ranging from 1.46 to 1.61µm. The periodicity of the microring resonances results in the same transmission spectrum shape at all operating wavelengths. Their wavelength alignment was checked using INTERCONNECT simulations, and the required parameters are listed in Table 1. Fig. 2 records the transmission spectra of all our building blocks as a function of wavelength. Twelve curves are shown, representing all drop-port T_ij configurations, where i denotes the input port and j the output port. The three switches are aligned at the band centre near a wavelength of 1574.41nm. The two single-ring switches have a common free spectral range (FSR) of approximately 19.05nm. Both single-ring schemes in Fig. 2(a) exhibit the same crosstalk level of −29.8dB over a 5nm spectral bandwidth. The one with a crossing experiences a slightly larger loss than the one without. The double-ring scheme behaves quite differently from the single-ring ones: a passband occurs in the middle, with two sub-bands on either side. The novel approach uses the sub-bands, rather than the passband, to align the spectra. As a result, a 7.16dB improvement in crosstalk can be expected over a 3nm spectral bandwidth. Overall, the crosstalk level is sufficient (better than −20dB) for multi-stage structures. This allows all building blocks to be discussed in a unified framework.
Admittedly, the dual-ring switch is more sensitive to temperature and aging than the single-ring switch. The reason for deploying the double-ring scheme over the single-ring schemes is twofold. First, the insertion loss decreases when a sub-band is used instead of the passband: the insertion loss penalty is reduced from 1dB to 0.21dB. Second, the dual-ring scheme further reduces the crosstalk. In the cross states, the dual rings' extinction ratio is about 7.16dB better than that of the single ring. In the bar states, the dual rings' crosstalk is 10.82dB better than that of the single ring.
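As a rough plausibility check on the quoted FSR, the standard first-order relation FSR ≈ λ²/(n_g·L) links the free spectral range to the product of group index n_g and ring perimeter L. The sketch below only inverts that relation for the values quoted above. The group index of 4.2 is an assumed, typical value for a silicon strip waveguide and is not taken from the paper, so the resulting perimeter is illustrative only.

```python
def optical_round_trip_um(center_nm: float, fsr_nm: float) -> float:
    """Return n_g * L in micrometres from FSR ~= lambda^2 / (n_g * L)."""
    if fsr_nm <= 0:
        raise ValueError("fsr_nm must be positive")
    return (center_nm ** 2 / fsr_nm) / 1000.0  # convert nm to um


def perimeter_um(center_nm: float, fsr_nm: float, group_index: float) -> float:
    """Ring perimeter L in micrometres for a given (assumed) group index."""
    if group_index <= 0:
        raise ValueError("group_index must be positive")
    return optical_round_trip_um(center_nm, fsr_nm) / group_index


CENTER_NM = 1574.41        # band centre quoted in the text
FSR_NM = 19.05             # FSR of the single-ring switches quoted in the text
ASSUMED_GROUP_INDEX = 4.2  # assumption, NOT a value from the paper

print(f"n_g * L ~ {optical_round_trip_um(CENTER_NM, FSR_NM):.1f} um")
print(f"L ~ {perimeter_um(CENTER_NM, FSR_NM, ASSUMED_GROUP_INDEX):.1f} um "
      f"for an assumed n_g of {ASSUMED_GROUP_INDEX}")
```

Only the product n_g·L follows from the quoted numbers; any perimeter derived from it depends entirely on the assumed group index.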

B. FOUR-PORT BUILDING BLOCKS
The second basic building block is the four-port optical module. Recall that the traditional Benes network [13]-[15] requires several two-port SEs from left to right. The basic idea of this paper is to substitute these unidirectional modules with bidirectional ones. The 4 × 4 module is expected to replace two identical 2 × 2 SEs in the future. Under these assumptions, the four-port optical module works in only four states. Below, we recommend two architecture options: add/drop modules and cross-connects.
VOLUME 10, 2022
The first four-port building block is a transparent add/drop module, as Fig. 3(a) illustrates. The transparent optical module is realized by satisfying the following constraints. 1) The light entering at any add port {3, 4} can be delivered to any traversing output port {1', 2'} without blocking. 2) The light entering at any traversing input port {1, 2} should terminate at any local drop port {3', 4'}. Accordingly, a four-port add/drop module is composed of four waveguides and four SEs.
The second module type is a contention-less cross-connect. Note that the contention-less optical module must meet the same constraints. In this article, we offer two architectures in four directions; see Fig. 3(b)-(c). The cross-connect modules perform the same routing functions, as illustrated in Fig. 3(d). The first module is composed of four waveguides and four SEs. The second module consists of four waveguides but only two SEs. All modules have the same number of crossings. Finally, the SE states and simple logic are illustrated in Fig. 3(e)-(f).
An electrical pad area in Fig. 4 is dedicated to each module, where S denotes signal, G denotes ground, and n and p denote the negative and positive ends. Table 2 compares our proposed switches with other existing 4-port switches in critical metrics such as insertion loss, number of SEs, and number of crossings. They include 4 × 4 hitless routers, Min's routers, and Jae's routers. Some of the values are taken from [16]. Our results are estimated from the average and maximum values in simulations. Under both the average and worst-case scenarios, the proposed switch has the fewest SEs and the fewest waveguide crossings. Therefore, our proposal has the lowest insertion loss among the alternatives. The module in Fig. 4(b) accounts for most of the 4 × 4 modules in this work. It is worth noting that the same resonator is shared in both directions. For example, waveguides 3-1' and 4-2' use the same SE 1. Although this is not visible in the topology, we have drawn the paths in different colours. The following section shows them with various labels to illustrate the direction of the ports.

III. N-PORT OPTICAL SWITCH
In this section, we propose a universal method for designing an N-port optical switch. It decreases the number of crossings and the crosstalk in the architecture. Four variants of the N × N optical switch are presented in Fig. 5. They show that our approach is, in nature, equivalent to the conventional Benes network. Reconfigurable non-blocking conditions and routing algorithms are discussed along with the implementation.
1) The input and output ports are combined into a single add/drop module, as discussed for Fig. 3(a). Therefore, the total number of add/drop modules is N/2. A set of basic functions connects ports within a single module. The lower half of the ports is used for add and drop purposes, and the upper half is used for cross-connects.
In the upper part of the network, the input and output modules are merged into add/drop modules. In the lower part of the network, the input module 2j-1 and the output [...] to the right to complete the intermediate replacement steps.
3) Finally, this kind of network provides bidirectional links in a mirror-symmetrical manner. Usually, we divide the network into two halves. In the upper half of the network, the add/drop module transmits the signal from the input port to the middle module. Then, it receives the signal from the central module through the output port. Note that the lower half of the network repeats the process but uses another variable j. In our design, the network features nested bidirectional rings. In this way, the structure can quickly change from unidirectional to bidirectional.

B. EXPANSION OF THE OPTICAL SWITCH
Understanding the merge-replace-mirror (MRM) method paves the way for the future expansion of the optical switch. We offer a solution to extend the unidirectional Benes network to a bidirectional one at a more general level. The current Benes network takes two intermediate steps towards this goal. In Fig. 6(a), one can observe that the Benes network theory is used twice. Then, Fig. 6(b) shows the merge-mirror process, which uses the MRM method twice. The key is to make the outermost layer evolve from unidirectional to bidirectional. Finally, Fig. 6(c) shows the detailed view of the merge-replace-mirror method. It replaces all modules with the 2 × 2 switches and 4 × 4 modules mentioned above.
Correspondingly, the module label undergoes three rounds of evolution. In Fig. 6(a), we define (i, j, k) tags based on the module address: the label i denotes the module, j the direction, and k the port. In Fig. 6(b), to route the lightpaths in the topology, we remove the input ports and consider only the output ports, where j = 1. In Fig. 6(c), we use simple logic similar to Fig. 3(f) to combine the three labels into one.
Fig. 6(c) shows the topological view of the 32-port switch. The footprint is 170µm wide and 300µm long in a double-layer arrangement. We place the 8 × 8 part of the switch on one layer and the rest on the other. A simple placement method is shown in Fig. 6(d). Note that the electrical air pads and grounds are used alternately. The footprint of the electrical air pads is not included in the area calculation.
Theoretically, N is unbounded because the structure always complies with the non-blocking requirement. In practice, the maximum N is limited by loss and crosstalk. In our previous work [10], we evaluated the limits of N in INTERCONNECT. When $2\log_2 N - 1 \le 45$, the cumulative loss does not exceed the receiver sensitivity of −27 dB. Therefore, N ranges over $(2, 2^{23})$. However, this work evaluates just 32 ports. The reason is that measuring the BER performance is still challenging: the time performance of each module is 0.3ns, while in our evaluation the computation time of each test exceeds 12 hours. As software versions are updated, we will continue to increase the network size to provide more convincing evidence for practical discussions.
As mentioned earlier, numerous non-blocking optical switches have been proposed for different applications. Two main types generally exist in the literature: i) Benes and ii) Piloss networks. Benes [9] and Piloss [17] were therefore chosen to ensure representative results. The Benes structure requires just $O(N\log N)$ switches but is associated with $O(N^2)$ waveguide crossings. The Piloss type, a strictly non-blocking network, is generally composed of $O(N^2)$ switches with $O(N^2)$ waveguide crossings. Although the DRAGON topology [18] can be arranged slightly differently, in essence it still conforms to this trend: both its number of switches and its number of waveguide crossings grow as $O(N^2)$. Therefore, we did not choose other structures for comparison. Table 3 compares this work with the Benes and Piloss switches in terms of the number of waveguides, bends, SEs used, and crossings passed. A strictly non-blocking network such as Piloss is generally composed of $O(N^2)$ switches, whereas the rearrangeable non-blocking networks, i.e., Benes and this work, require just $O(N\log N)$ switches. Our structure offers a third possibility, which simultaneously reduces both metrics to $O(N\log N)$.
Unlike traditional structures [9], this work concentrates the I/O ports in the centre of the switch architecture. Moreover, the I/O ports are not arranged in sequence. If not mitigated, the difficulty of attaching all I/O ports to the edges becomes significant. We have mitigated this issue using a simple and effective port converter. The penalty is a 4dB on-board loss; see Fig. 7. More details can be found in [10].
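The upper limit on N quoted above follows directly from the stage-count bound. The sketch below assumes the left-hand side of the bound is the number of stages of a classic Benes network, 2·log2(N) − 1; the function names are illustrative and not part of the paper.

```python
def benes_stages(n_ports: int) -> int:
    """Number of switching stages of an N-port Benes network: 2*log2(N) - 1."""
    k = n_ports.bit_length() - 1
    if n_ports < 2 or (1 << k) != n_ports:
        raise ValueError("n_ports must be a power of two >= 2")
    return 2 * k - 1


def max_ports(max_stages: int) -> int:
    """Largest power-of-two N satisfying 2*log2(N) - 1 <= max_stages."""
    if max_stages < 1:
        raise ValueError("max_stages must be at least 1")
    return 1 << ((max_stages + 1) // 2)


print(benes_stages(32))            # stages traversed in a 32-port network
print(max_ports(45) == 2 ** 23)    # the 45-stage bound gives N = 2^23
```

With a budget of 45 stages, the largest admissible exponent is 23, which matches the upper end of the range stated in the text.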
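To make the scaling argument concrete, the following sketch compares the switch count of a classic Benes network, (N/2)(2·log2(N) − 1), with the N² count of a square-law fabric such as a crossbar or Piloss. These are textbook formulas rather than the counts of the proposed design (which uses 224 SEs for 32 ports, as reported later in the paper), and the function names are illustrative.

```python
def log2_exact(n_ports: int) -> int:
    """Return log2(N) for a power-of-two N >= 2, else raise ValueError."""
    k = n_ports.bit_length() - 1
    if n_ports < 2 or (1 << k) != n_ports:
        raise ValueError("n_ports must be a power of two >= 2")
    return k


def benes_switch_count(n_ports: int) -> int:
    """2 x 2 switches in a classic Benes network: (N/2) * (2*log2(N) - 1)."""
    return (n_ports // 2) * (2 * log2_exact(n_ports) - 1)


def square_law_switch_count(n_ports: int) -> int:
    """Switches in an N^2 fabric (crossbar or Piloss style)."""
    return n_ports ** 2


for n in (8, 32, 128):
    print(n, benes_switch_count(n), square_law_switch_count(n))
```

For N = 32 the two formulas give 144 and 1024 switches, which illustrates why rearrangeable fabrics are attractive once crossings are under control.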

C. RECONFIGURABLE NON-BLOCKING CONDITIONS AND ROUTING ALGORITHM
Two issues are discussed herein: the non-blocking conditions and the routing algorithm. In essence, our structure has a one-to-one correspondence with Benes. Thus, our non-blocking condition is consistent with that of the Benes structure, which conforms to the Clos theorem. The theorem states that a network with n input ports per entrance module and m intermediate modules is strictly non-blocking when m ≥ 2n − 1, and reconfigurable (rearrangeable) non-blocking when m ≥ n. Each module has two inputs and two outputs for traffic add and drop at the entrance and exit nodes, which means n equals 2. In the intermediate nodes, the path selection always has two possible choices, so m equals 2. Therefore, with m = n, our design meets the reconfigurable non-blocking requirement.
The network has no wavelength contention when the routing algorithm is reconfigurable non-blocking. Usually, routing is formulated as an edge-colouring problem, for which a looping algorithm [19] can be used. This routing algorithm provides fast, non-blocking, and low-crosstalk connectivity through the switch fabric; see Table 4.
As an illustrative example, consider an eight-port switch and a specific set of connection requests between input and output pairs. The requests include (2,7), (3,6), and (4,5). All requests must be satisfied simultaneously, and the routing process is performed in four steps (see Fig. 8). First, reliable connectivity is found by the looping algorithm; the bipartite problem is solved recursively in this step. Second, many solutions may exist, and this redundancy must be resolved. A first-fit method is adopted to keep the process simple and scalable: here, we have a total of 128 solutions to choose from, and we select the first computed one.
Third, the switch states are computed and converted into the Bi-Benes form. For example, the initial states are bbcc\bcbc\cccc\bcbc\bbcc, where c denotes cross and b denotes bar. Certain rows or columns are doubled or flipped. The results then become ffffnnnn\fffnnn\ffff\fffnnn\nnnnffff, where n denotes the on state and f denotes the off state. Finally, the scheduler sends the states to the nodes, configures the switches, and provides N lightpaths simultaneously.
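The Clos conditions can be checked mechanically. The sketch below encodes the standard thresholds (m ≥ 2n − 1 for strictly non-blocking, m ≥ n for rearrangeable non-blocking); the function name and return strings are illustrative.

```python
def clos_class(n: int, m: int) -> str:
    """Classify a three-stage Clos network.

    n: input ports per entrance module
    m: number of intermediate (middle-stage) modules
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    if m >= 2 * n - 1:
        return "strictly non-blocking"
    if m >= n:
        return "rearrangeable non-blocking"
    return "blocking"


print(clos_class(2, 2))  # the case discussed in the text (n = 2, m = 2)
print(clos_class(2, 3))  # one extra middle module would make it strict
```

For n = m = 2 the strict threshold of 3 middle modules is not met, so the network falls in the rearrangeable class, in line with the argument above.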
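The first routing step can be illustrated with one level of the classic looping algorithm for a Benes network. This is a minimal sketch and not the authors' implementation. The full request set is not listed in the text, so a reversal permutation consistent with the three listed pairs is assumed, and ports are 0-indexed here. The function names `loop_assign` and `is_valid_split` are hypothetical.

```python
def loop_assign(perm):
    """Split a permutation between the upper (0) and lower (1) sub-network.

    perm[i] is the output requested by input i (0-indexed, even length).
    Inputs 2k and 2k+1 share an input switch; outputs 2k and 2k+1 share an
    output switch. Two signals sharing a switch must use different halves.
    """
    n = len(perm)
    if n % 2 or sorted(perm) != list(range(n)):
        raise ValueError("perm must be a permutation of even length")
    inv = [0] * n
    for i, o in enumerate(perm):
        inv[o] = i
    half = [None] * n
    for start in range(n):
        i = start
        while half[i] is None:      # walk one constraint loop until it closes
            half[i] = 0             # send input i through the upper half
            j = inv[perm[i] ^ 1]    # input feeding the neighbouring output
            half[j] = 1             # it must use the lower half
            i = j ^ 1               # j's input-switch partner goes upper next
    return half


def is_valid_split(perm, half):
    """Check that no input or output switch sends both signals the same way."""
    n = len(perm)
    inv = [0] * n
    for i, o in enumerate(perm):
        inv[o] = i
    return all(
        half[2 * k] != half[2 * k + 1]
        and half[inv[2 * k]] != half[inv[2 * k + 1]]
        for k in range(n // 2)
    )


# Assumed example: 1-indexed requests (1,8), (2,7), ..., (8,1), i.e. a reversal.
requests = list(reversed(range(8)))
split = loop_assign(requests)
print(split)                            # [0, 1, 0, 1, 0, 1, 0, 1]
print(is_valid_split(requests, split))  # True
```

Each half then receives an N/2-port sub-problem that would be solved the same way, which is the recursion mentioned in the text. The sketch always starts a loop in the upper half; choosing the other starting half yields alternative valid solutions, which is where the redundancy handled by the first-fit rule comes from.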

IV. SYSTEM PERFORMANCE
This article performs a network-level simulation covering insertion loss, crosstalk, power consumption, bit error rate, and switching time. The simulation is carried out through [...]
In the first study, we investigated the effects of loss and crosstalk, since high values may end up limiting scalability. Following the same method as in Fig. 2, we also studied the transmission performance of each path. In this work, the 32-port switch is composed of 224 SEs, of which 80 are dual-ring modules. This suggests that there are a total of $2^{144}$ possible permutations. Testing all possible options is exceptionally inefficient and time-consuming. We therefore examined the transmission spectra at a wavelength of 1594.44nm for two extremes, the all-bar and all-cross cases. Fig. 9 reports the power penalties, with the Y 1 and Y 2 axes representing the insertion loss (IL) and crosstalk level (XTL). The on-chip loss and the crosstalk are marked by black squares and blue diamonds, respectively. Take the all-bar case as an example: it achieves an average IL of −14.465dB and an average XTL of −35.3651dB. The IL penalty (black squares) ranges from −16.5dB to −12dB, whereas the worst-case crosstalk (blue diamonds) is −32dB.
Therefore, cases where the crosstalk is much higher than the signal do not exist. When the extinction ratio (ER) exceeds 20dB, no strategy is applied. The 'floor' behaviour when the ER is below 20dB is due to higher insertion loss; however, thanks to the linear polarizers of the employed system, even the case [...]
One can see from Fig. 9 that the insertion loss is almost the same as that of the traditional method [9], but the crosstalk level is significantly better, ranging from −29.39dB to −16.96dB. The essential criteria of NoC applications, namely low insertion loss, low crosstalk, and reconfigurable non-blocking operation [20], are met.
Now let us focus on evaluating the design's bit error rate (BER) and power consumption. In this experiment, a 1000µm Mach-Zehnder interferometer modulator is used. It modulates data at 1594.44nm at a rate of 25Gb/s. The pulse pattern generator outputs a $2^{11}-1$ pseudo-random bit sequence, from which the modulator input signal is obtained. The input power ranges from −5dBm to −30dBm. We first consider the back-to-back case. Then, we need to find the best- and worst-attenuated paths. The challenge is that the ''cross'' state is not always the worst case in all types of switches. In addition, the trade-off should account for the crossings and bends along the path. The worst-case path, marked in red, is from input 10 to output 32'. It connects all modules in the ''bar'' state, except for the middle and add/drop modules. In addition, it passes through four crossings and ten bends. The best-case path, shown in green, is from input 16 to output 14'.
In all cases, error-free 25Gb/s operation is observed in Fig. 10. The deteriorated BER requires almost the same received optical power at the insets. The turning point occurs where BER = $10^{-2}$; from there, the BER gradually decreases as the power increases. The worst-case path requires the most power. The optimum BER of $10^{-9}$ is measured. It corresponds to a low power penalty of −10.7dBm. The electro-optic switch circuit consumes as little as 0.0851mW per path, which indicates an overall power consumption of roughly 2.7232mW.
Finally, to complete the study, we investigated the switching rise and fall times. When the SE is switched on or off, the switching time is limited by the carrier transit time. The times to raise the transmission from 10% to 90% and to lower it from 90% to 10% are 10.62ps and 4.38ps, respectively.
Admittedly, in most cases thermo-optic tuning allows only microsecond circuit switching. Nanosecond circuit switching has been reported in [21]; it is challenging and still offers a limited performance of 23ns. Nanosecond-speed circuit switching is sufficient to meet the bandwidth and latency requirements. Table 5 compares the three considered options in terms of power consumption, reconfiguration time, and footprint. The values are taken from [9], [17]. Compared with Piloss and Benes, improvements in power and chip size are confirmed. This suggests that the MRM method enables us to build compact, energy-efficient switches. Moreover, the optical loss is almost the same, but the crosstalk is better: the worst-case crosstalk is −16.96dB, which is better than the traditional −15dB level.
Most of the fabrication steps can be achieved by standard optical lithography techniques. The switch is fabricated on a clean SOI chip with a 220nm thick Si layer and a 2µm thick buried oxide layer. This is a standard substrate specification used by silicon photonics foundries [22].
Lithography defines the ridge waveguides that form the ring resonators and the bus waveguides. Ridge waveguides with a cross-section of 500nm × 220nm and a 70nm thick slab may be employed to support only the fundamental quasi-TE mode.
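The overall power figure above is consistent with multiplying the per-path consumption by the number of ports. The short check below assumes that all 32 lightpaths are active at once; the constant and function names are illustrative.

```python
PER_PATH_MW = 0.0851  # per-path consumption quoted in the text
N_PORTS = 32          # port count of the evaluated switch


def total_power_mw(per_path_mw: float, n_paths: int) -> float:
    """Total switch power, assuming every path draws the same power."""
    if n_paths < 0:
        raise ValueError("n_paths must be non-negative")
    return per_path_mw * n_paths


print(f"{total_power_mw(PER_PATH_MW, N_PORTS):.4f} mW")  # 2.7232 mW
```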
A dedicated heating element thermally tunes each switch to allow access to the respective electrical pads. Electrodes are deposited around the channel waveguide and then transferred to the device layer by timed dry etching of the silicon.
Due to the symmetry, the whole chip can be divided into four equal parts. A double-layer crossing [23] is then required to transmit signals between the first and second layers. Finally, microwave coupling, fiber pigtailing, and mounting are completed.
In summary, this work explores an entirely new architecture that is consistent with Benes' theory. We highlight its design method, control process, recursive architecture, and numerical simulations. However, many constraints in practical applications still prevent the switches from being realized; this does not affect the completeness of this work. In the future, we will seek a more precise method for performance analysis to provide a more comprehensive treatment.

V. CONCLUSION
A novel reconfigurable non-blocking optical switch is constructed via the merge-replace-mirror method. The rationale is to use a bidirectional switching fabric instead of a unidirectional one. The new approach reduces both the number of waveguide crossings and the number of switching elements to the same order of $O(N\log_2 N)$. The average on-chip loss remains the same, but the range of the crosstalk changes. For a 32-port switch, the crosstalk ranges between −16.96dB and −29.39dB.