Energy Efficiency and Yield Optimization for Optical Interconnects via Transceiver Grouping

Optical interconnects enabled by silicon microring-based transceivers offer great potential for short-reach data communication in future high-performance computing systems. However, microring resonators are prone to process variations that harm both the energy efficiency and the yield of the fabricated transceivers. Especially in the application scenario where a batch of transceivers are fabricated for assembling multiple optical networks, how the transceivers are mixed and matched can directly impact the average energy efficiency and the yield of the networks assembled. In this study, we propose transceiver grouping for assembling communication networks from a pool of fabricated transceivers, aiming to optimize the network energy efficiency and the yield. We evaluated our grouping algorithms by wafer-scale measurement data of microring-based transceivers, as well as synthetic data generated based on an experimentally validated variation model. Our experimental results demonstrate that optimized grouping achieves significant improvement in the network energy efficiency and the yield across a wide range of network configurations, compared to a baseline strategy that randomly groups the transceivers.

Despite the great potential demonstrated, silicon microrings often suffer from significant process variations due to fabrication imperfections. As a result, the optical links and networks comprising these imperfect devices must be actively tuned to compensate for the process variations, for which the tuning power is nontrivial [8]. The variation issues become more prominent in the application scenario where a batch of transceivers are fabricated for assembling multiple optical networks. Specifically, some transceivers with straggling variation magnitudes may produce networks that either 1) demand excessive power for variation compensation or 2) fail to support a target data rate, thus worsening the average energy efficiency, the product uniformity, and the yield of the networks assembled. Nevertheless, network-level variation alleviation techniques that exploit wafer-scale fabrication of microring-based transceivers have been lacking. Techniques based on channel shuffling [9], [10] and sub-channel redundancies [11]-[13] were proposed to reduce the expected power for thermally tuning the resonance wavelengths of the microrings. A hybrid strategy employing both thermal and electrical tuning was proposed in [14]. However, these techniques are limited to the link level, rather than the network level, and only target a single pair of transmitter (Tx) and receiver (Rx). Considering wafer-scale fabrication, an optimal pairing scheme for a batch of fabricated transceivers could further reduce the average tuning power required for pairs formed from the batch [15]. Nevertheless, all of the above techniques are restricted to the mitigation of the wavelength tuning power, while the overall energy efficiency and the yield of the transceivers are also impacted by the variations of other parameters, such as the extinction ratios and the quality factors of the microrings.
Moreover, none have encompassed the application scenario where the fabricated transceivers are used for assembling communication networks of multiple nodes.
We observed from wafer-scale measurement data of microring-based transceivers that, due to the distinct variation profile of each transceiver, optical networks assembled from different transceivers will have different energy efficiency. Therefore, when a batch of fabricated transceivers are available for assembling several networks, there is an opportunity to group the transceivers in a way that the average energy efficiency of the networks assembled is optimized. Meanwhile, it is also desirable, from the perspective of quality control, that the energy efficiency of the networks assembled is uniform. Moreover, some networks assembled may fail to support a target data rate, thus lowering the yield. Therefore, the grouping of the transceivers should also be optimized for the objective of meeting the target data rate. In this study, we propose transceiver grouping, which mixes and matches a pool of fabricated transceivers to assemble networks of equal size, aiming to optimize the average energy efficiency, the uniformity, and the yield of the networks assembled.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
We designed two algorithms inspired by simulated annealing to address this multi-objective optimization problem. The proposed algorithms were evaluated by wafer-scale measurement data of microring-based transceivers, as well as synthetic data generated based on an experimentally validated variation model. Our experimental results demonstrate that the proposed grouping algorithms achieve significant improvement in all three objectives, namely the average energy efficiency, the uniformity, and the yield of the networks assembled, compared to a baseline strategy that randomly groups the transceivers.
The rest of the paper is organized as follows. In Section II, we review the background of this study and some related work. In Section III, we formulate transceiver grouping as an optimization problem and present our algorithms. In Section IV, we describe the measurement data and the synthetic data of microring-based transceivers used for evaluating our algorithms. We also introduce the power models of the optical devices used in our simulations. In Section V, we evaluate our grouping algorithms for a wide range of network configurations. Finally, in Section VI, we conclude this study.

A. Microring-Based Optical Interconnects
An optical network is a collection of optical links that provides data communication among processing nodes. Fig. 1 illustrates an exemplar architecture of an optical network with a generic ring topology [5], where silicon microring-based transceivers are utilized to send and receive optical signals at each node. A silicon microring resonator is a highly wavelength-selective device [16], whose transmission spectrum can be characterized by a Lorentzian function, Eq. (1), where $\lambda_r$, ER, and Q are the resonance wavelength, the extinction ratio, and the quality factor of the microring. A microring-based transceiver, as shown in Fig. 1, can thus be modeled as in Eq. (2), where m is the number of DWDM channels. The cascaded microrings of a Tx/Rx are usually designed with incremental radii to provide a set of evenly spaced resonance dips. However, as shown in Fig. 2, the fabricated transceivers often suffer from significant process variations that manifest themselves as the deviation of $\lambda_r$, ER, and Q from their design values.

B. Impact of Process Variations on Energy Efficiency
The energy efficiency of an optical network largely depends on the energy efficiency of the links it comprises, which, in turn, is impacted by the process variations of the microrings. First of all, the resonance wavelengths of the Tx/Rx must be tuned and aligned to a mutual set of carrier wavelengths. Besides, the variations of ER and Q affect the loss and the crosstalk noise within the optical channel, which must be compensated by an increased laser power to maintain a target data rate. As the variation magnitudes are different from device to device, optical networks comprising different transceivers will have different energy efficiency. Therefore, when a batch of fabricated transceivers are available for assembling such networks, how the transceivers are grouped can directly impact the energy efficiency of each network assembled.

C. Optimization Objectives for Transceiver Grouping
In this study, we focus on the application scenario where a pool of fabricated transceivers are grouped to assemble several optical networks, as shown in Fig. 3. We assume the networks to be assembled are of a multiple-writer-multiple-reader (MWMR) architecture. For networks of different architectures, our proposed approach would still apply except that some specifics need to be adjusted. As illustrated in Fig. 1, each network node in MWMR has both write and read access to the optical ring bus achieved by its Tx and Rx, respectively. With a proper arbitration scheme [17], any two nodes can establish point-to-point communication without the need for relay nodes. Based on this assumption, we propose the following optimization objectives for transceiver grouping.
1) Energy Efficiency: We propose to optimize the average energy efficiency of the networks assembled. We first quantify the energy efficiency of an optical link as its power consumption divided by its data rate. It is measured in pJ/b, so the smaller the value, the better the energy efficiency. Now, consider a total of N transceivers to be grouped into G networks, each with n nodes ($G = \lfloor N/n \rfloor$). The energy efficiency of a network is thus a weighted sum of the energy efficiency of all its links:
$$\epsilon = \sum_{i \neq j} p_{ij}\,\epsilon_{ij}, \qquad i, j \in \{1, 2, \ldots, n\}, \tag{3}$$
where $\epsilon_{ij}$ is the energy efficiency of the unidirectional link from Tx #i to Rx #j (hereafter link (i, j)), and $p_{ij}$ is the portion of the network traffic carried by this link. The average energy efficiency of all networks assembled is thus
$$E = \frac{1}{G} \sum_{g=1}^{G} \sum_{i \neq j} p_{ij}\,\epsilon_{g,ij}, \tag{4}$$
where $\epsilon_{g,ij}$ denotes the energy efficiency of link (i, j) in the gth network.
For a specific application, $p_{ij}$ can be recorded by executing the application within a network simulator [18], [19] and can be different for each link. However, in this study, we assume that the network traffic results from the execution of various applications and is uniformly distributed to each link. Therefore, $p_{ij}$ is considered identical for all links.
Note that the microring tuning schemes proposed in [9]-[14] are dedicated to improving $\epsilon_{ij}$ of a specific link, as shown in Fig. 3. However, regardless of which technique is adopted at the link level, we can always apply transceiver grouping to further optimize the average energy efficiency of the networks.
2) Product Uniformity: Product uniformity is another victim of the process variations, as the energy efficiency can be vastly different for each network assembled. The authors of [20] suggest binning, a widely adopted technique after the testing stage, to categorize the transceivers based on the variation magnitudes. However, different bins may end up having different performance specifications, such as the maximum data rate. In contrast, our transceiver grouping can improve the uniformity of the energy efficiency of the networks assembled without compromising the target data rate, thus delivering products with similar performance specifications. Specifically, we propose to reduce the standard deviation of the energy efficiency across the networks assembled:
$$\sigma = \sqrt{\frac{1}{G} \sum_{g=1}^{G} \left(\epsilon_g - E\right)^2}, \tag{5}$$
where $\epsilon_g$ is the energy efficiency of the gth network as defined by Eq. (3), and all networks still target the same data rate. The transceiver pairing technique proposed in [15] is a special case of our transceiver grouping with n = 2. However, our study accounts for the overall energy efficiency for communication, in contrast to [15], which only targets the microring tuning power. Moreover, we further introduce a third optimization objective for transceiver grouping, i.e., the network yield.
3) Network Yield: Apart from producing defective devices, the process variations could harm the yield in a way that some networks assembled cannot support a target data rate. Specifically, due to the optical nonlinearities of the silicon material, we assume a maximum optical power of 7 dBm per channel [21], which limits the highest data rate that an optical link can attain. We then propose to optimize
$$Y = \frac{G'}{G}, \tag{6}$$
where $G'$ is the number of networks determined capable of supporting the target data rate. Note that in contrast to E and σ, Y is expected to be maximized. As suggested by Eqs. (4) and (5), both E and σ can be computed from $\epsilon_{ij}$. Therefore, for N transceivers available for grouping, it is desirable to prepare a cost matrix $\mathbf{E} \in \mathbb{R}^{N \times N}$ so that every possible $\epsilon_{ij}$ is computed beforehand for fast look-up. It is also noteworthy that $\epsilon_{ij}$ is computed as the link power consumption divided by the target data rate. During the computation of $\epsilon_{ij}$, if the required optical power is found to exceed the maximum allowed value, the link and the network to which it belongs should be marked as not supporting the target data rate. The preparation of the cost matrix will be detailed in Section IV-C with a description of the device power models involved.
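The three objectives can be evaluated directly from a precomputed cost matrix. The following is a minimal Python sketch, not the authors' implementation. It assumes uniform traffic, 0-based transceiver indices, `math.inf` as the marker for a link that cannot support the target data rate, and that E and σ are averaged over the feasible networks only; the function name and these conventions are illustrative assumptions.

```python
import math

def evaluate_grouping(cost, s, n):
    """Return (E, sigma, Y) for grouping scheme s over the cost matrix.

    cost[i][j] is the energy efficiency (pJ/b) of link (i, j); math.inf
    marks a link that cannot support the target data rate (assumed encoding).
    """
    G = len(s) // n
    feasible = []
    for g in range(G):
        group = s[g * n:(g + 1) * n]
        links = [cost[i][j] for i in group for j in group if i != j]
        if all(math.isfinite(e) for e in links):
            # Uniform traffic: every link carries 1 / (n (n - 1)) of the load.
            feasible.append(sum(links) / len(links))
    Y = len(feasible) / G
    if not feasible:
        return math.inf, math.inf, 0.0
    # Assumption: E and sigma are taken over feasible networks only.
    E = sum(feasible) / len(feasible)
    sigma = math.sqrt(sum((e - E) ** 2 for e in feasible) / len(feasible))
    return E, sigma, Y

# Hypothetical 4-transceiver cost matrix, grouped into pairs (n = 2).
cost = [[0, 1, 3, math.inf],
        [1, 0, 2, 4],
        [3, 2, 0, 1],
        [math.inf, 4, 1, 0]]
print(evaluate_grouping(cost, [0, 1, 2, 3], 2))
print(evaluate_grouping(cost, [0, 3, 1, 2], 2))
```

In the second call, the pair (0, 3) contains an unsupported link, so only one of the two networks counts toward the yield.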

III. PROBLEM FORMULATION
Consider a complete directed graph with N vertices and N(N − 1) directed edges, as illustrated in Fig. 4. Suppose that each vertex denotes a transceiver, and each edge is weighted by $\epsilon_{ij}$, the energy efficiency of link (i, j). Then, the objective of minimizing E, as suggested by Eq. (4), can be converted to finding a partition of the graph into equally sized blocks such that the sum of in-block edge weights is minimized. It is further equivalent to finding a partition of the graph with the maximum cut weight [22] under a balance constraint on the sub-graphs [23]. The NP-completeness of the graph partitioning problem has been proven [24], and several heuristic methods have been proposed for balanced partitioning [25]-[27]. However, the balance constraint in these algorithms is often formulated as a penalty to the cost function and might not be strictly satisfied. Directly applying them to transceiver grouping can result in groups of different sizes. Moreover, to our knowledge, there exists no algorithm for balanced partitioning with multiple objectives. Therefore, we developed our customized heuristics for transceiver grouping.
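The equivalence between minimizing the in-block weight and maximizing the cut weight can be made explicit with a short derivation. It is a sketch under two assumptions: uniform traffic, and N divisible by n so that every vertex belongs to a block. Here $\epsilon_{ij}$ denotes the weight of edge (i, j), $B_1, \ldots, B_G$ are the blocks of the partition, and $W$, $W_{\text{in}}$, and $W_{\text{cut}}$ are names introduced only for this illustration.

```latex
\begin{align}
W &= \sum_{i \neq j} \epsilon_{ij}
   = \underbrace{\sum_{g=1}^{G} \sum_{\substack{i, j \in B_g \\ i \neq j}} \epsilon_{ij}}_{W_{\text{in}}}
   + \underbrace{\sum_{g \neq h} \sum_{i \in B_g} \sum_{j \in B_h} \epsilon_{ij}}_{W_{\text{cut}}}, \\
E &= \frac{W_{\text{in}}}{G\, n (n - 1)}
   \quad \text{for uniform traffic, } p_{ij} = \frac{1}{n(n-1)}.
\end{align}
```

Since $W$ is fixed by the cost matrix, minimizing $W_{\text{in}}$ (and hence $E$) is the same as maximizing $W_{\text{cut}}$ over partitions into blocks of size n.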

A. Grouping Scheme Representation
To strictly enforce groups of equal size, we encode a grouping scheme of N transceivers into a vector s, where s is a permutation of {1, 2, . . . , N}. Every n consecutive elements of s are automatically grouped. For example, a grouping scheme for 16 transceivers into four 4-node networks can be
s = [6 3 16 11 | 7 14 8 5 | 15 1 2 4 | 13 9 10 12].
It can be observed that any permutation of the elements within a group does not change the grouping scheme. However, such a representation allows us to easily generate new schemes by shuffling the current one, i.e., by swapping two elements u and v of s, where u and v are randomly chosen from two different groups.
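The permutation encoding and the shuffling move can be sketched in a few lines of Python. This is an illustration only: the helper names `groups_of` and `shuffle_once` are hypothetical, and the move is taken to be a swap of two entries drawn from two different groups, which keeps every group at size n.

```python
import random

def groups_of(s, n):
    """Split a permutation s into consecutive groups of n transceivers."""
    usable = len(s) - len(s) % n
    return [s[k:k + n] for k in range(0, usable, n)]

def shuffle_once(s, n, rng=random):
    """Return a new scheme with two entries from different groups swapped."""
    G = len(s) // n
    g1, g2 = rng.sample(range(G), 2)   # two distinct groups
    u = g1 * n + rng.randrange(n)      # position inside the first group
    v = g2 * n + rng.randrange(n)      # position inside the second group
    t = list(s)
    t[u], t[v] = t[v], t[u]
    return t

s = [6, 3, 16, 11, 7, 14, 8, 5, 15, 1, 2, 4, 13, 9, 10, 12]
print(groups_of(s, 4))
print(groups_of(shuffle_once(s, 4, random.Random(0)), 4))
```

Because the result is always a permutation of the same elements, the equal-size constraint holds by construction and never needs a penalty term.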
B. Proposed Algorithms
1) Simulated Annealing: Heuristics based on simulated annealing (SA) [28] can take advantage of the shuffling operation to explore various grouping schemes. We first present an SA-based algorithm (outlined in Algorithm 1) that aims to minimize a unified cost function:
$$Z = E + w_1 \sigma + w_2 (1 - Y). \tag{9}$$
Here, the objective E has a constant weight of 1, while the objectives σ and (1 − Y) are weighted by $w_1$ and $w_2$, respectively. In other words, the energy efficiency of the networks assembled is always an optimization target, while the significance of the uniformity and the yield, as a second and a third optimization target, can be adjusted by the values of $w_1$ and $w_2$. At each SA iteration, a new grouping scheme $s'$ is generated by shuffling the current s, and its corresponding cost $Z'$ is evaluated based on Eqs. (4)-(6) and Eq. (9). The algorithm decides whether to accept the new grouping scheme with a probability of $p = P(Z, Z', T)$:
$$P(Z, Z', T) = \frac{1}{1 + \exp\left((Z' - Z)/T\right)}, \tag{10}$$
where T is the current temperature. When $Z'$ is no better than Z, there is still a probability between 0 and 1/2 to accept the new grouping scheme in order to avoid local minima. The SA-based algorithm is seeded by an initial grouping scheme $s_0$, which is produced by a greedy algorithm (outlined in Algorithm 2). At each iteration, the greedy algorithm groups n transceivers for the best network energy efficiency determined by Eq. (3), until $\lfloor N/n \rfloor$ groups are formed.
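The annealing loop can be sketched as follows. This is a simplified illustration rather than Algorithm 1 itself. It assumes a logistic acceptance rule, which is one form consistent with the stated acceptance probability of at most 1/2 for a worse scheme, and a plain geometric cooling schedule applied at every iteration. The function names, the default parameters, and the absence of re-annealing are assumptions made for brevity.

```python
import math
import random

def accept_probability(z_old, z_new, temperature):
    """Logistic rule: 1/2 for equal costs, below 1/2 when z_new is worse."""
    x = (z_new - z_old) / temperature
    if x > 700:                        # guard against overflow in exp
        return 0.0
    return 1.0 / (1.0 + math.exp(x))

def anneal(s0, n, cost_fn, iters=2000, t0=100.0, cooling=0.95, seed=0):
    """Minimize cost_fn over grouping schemes reachable by cross-group swaps."""
    rng = random.Random(seed)
    s, z = list(s0), cost_fn(s0)
    best, z_best = list(s), z
    T = t0
    G = len(s) // n
    for _ in range(iters):
        g1, g2 = rng.sample(range(G), 2)
        u = g1 * n + rng.randrange(n)
        v = g2 * n + rng.randrange(n)
        t = list(s)
        t[u], t[v] = t[v], t[u]        # shuffle: swap across two groups
        z_new = cost_fn(t)
        if rng.random() < accept_probability(z, z_new, T):
            s, z = t, z_new
            if z < z_best:
                best, z_best = list(s), z
        T = max(T * cooling, 1e-9)     # assumed geometric cooling
    return best, z_best
```

Here `cost_fn` stands for the unified cost built from E, σ, and (1 − Y) with the chosen weights. Seeding `s0` with a greedy scheme, as the text describes, only changes the starting point of the loop.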
2) Pareto Simulated Annealing: The SA-based algorithm allows the user to prioritize the three minimization targets, namely E, σ, and (1 − Y), by specifying $w_1$ and $w_2$. However, determining the proper values for $w_1$ and $w_2$ presents another challenge. A straightforward approach is to sweep $w_1$ and $w_2$ within a given range. Alternatively, one may employ another optimization solver that takes $w_1$ and $w_2$ as input variables to explore their impact on the optimization results. Nevertheless, both methods involve an execution of the SA-based algorithm for each pair of $w_1$ and $w_2$ and thus can be time-consuming. To address this challenge, we further propose an algorithm based on Pareto simulated annealing (PSA) [29] to efficiently explore the trade-off between E, σ, and Y. Without the need to specify $w_1$ and $w_2$, the PSA-based algorithm directly aims to find a Pareto front of the three optimization objectives, where improving any objective requires sacrificing another. During the PSA iterations, a new solution ($Z'$) is said to dominate an old one (Z) if all three objectives of $Z'$ are improved compared to those of Z. Accordingly, the rule for deciding whether to accept a new grouping scheme is modified into a weighted form, where Γ is a vector of weights associated with each optimization objective and automatically updated during the optimization. The larger the weight of an objective, the lower the probability of accepting the new grouping scheme if it worsens the objective. At each PSA iteration, multiple new schemes can be generated and evaluated in parallel. Algorithm 3 outlines the main steps of our PSA-based algorithm.
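The dominance test and the bookkeeping of non-dominated schemes can be illustrated with a short Python sketch. It follows the definition stated above (a new solution dominates an old one only if all three objectives improve) and is not Algorithm 3 itself. The names `dominates` and `update_archive`, and the representation of a solution as an (E, σ, Y) triple, are assumptions for illustration.

```python
def dominates(a, b):
    """True if triple a = (E, sigma, Y) improves on b in all three objectives.

    E and sigma are minimized, while Y is maximized.
    """
    return a[0] < b[0] and a[1] < b[1] and a[2] > b[2]

def update_archive(archive, candidate):
    """Keep only mutually non-dominated (E, sigma, Y) triples."""
    if any(dominates(old, candidate) for old in archive):
        return archive                  # candidate is dominated: discard it
    kept = [old for old in archive if not dominates(candidate, old)]
    return kept + [candidate]

archive = []
for point in [(1.0, 0.2, 0.5), (0.9, 0.1, 0.75), (0.8, 0.3, 0.75), (1.0, 0.4, 0.5)]:
    archive = update_archive(archive, point)
print(archive)
```

Note that requiring strict improvement in all three objectives is more conservative than the common Pareto definition, which only requires no objective to be worse and at least one to be better; swapping in that variant changes only the body of `dominates`.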

IV. DATA PREPARATION
We now introduce the measurement data and the synthetic data of microring-based transceivers used for evaluating our algorithms. We also describe the computation of the cost matrix and the device power models involved.

A. Measurement Data
We measured the transmission spectra of 24-channel microring-based transceivers fabricated by STMicroelectronics on a 300 mm silicon-on-insulator (SOI) wafer. As illustrated in Fig. 5, the transceivers are organized into 66 dies, each die consisting of a transmitter and a receiver. The microrings in each Tx/Rx start with a 5 μm radius and ramp up to a 5.046 μm radius with a step size of 2 nm. The Rx spectra of two dies were not measured correctly, as indicated in Fig. 5(a). Thus, we have the measurement data of 64 fabricated transceivers for evaluating our grouping algorithms.
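As a quick consistency check on the numbers above, the radius ramp and the die count can be reproduced with a few lines of Python. Radii are expressed in nanometers to keep the arithmetic exact, and the variable names are illustrative only.

```python
# Radii from 5 um to 5.046 um in 2 nm steps, expressed in nm.
radii_nm = list(range(5000, 5046 + 1, 2))
channels = len(radii_nm)       # one microring per DWDM channel

dies_on_wafer = 66
dies_with_bad_rx = 2
usable_transceivers = dies_on_wafer - dies_with_bad_rx

print(channels, usable_transceivers)
```

The 23 radius steps yield 24 microrings, matching the 24-channel design, and discarding the two dies with incorrectly measured Rx spectra leaves 64 transceivers.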

B. Synthetic Data
To emulate situations where more transceivers are available for grouping, we generate synthetic data of transceivers to evaluate our grouping algorithms. We first extracted the resonance wavelength ($\lambda_r$), the extinction ratio (ER), and the quality factor (Q) of each fabricated microring by fitting Eq. (2) to the measured spectra (Fig. 2). Then, we characterized the spatial variations of $\lambda_r$, ER, and Q by applying our variation modeling method [30]. Specifically, we attribute the location dependency of the variation magnitude on a wafer to three systematic components, namely wafer-level, intra-die, and inter-die components. This hierarchical method, detailed in [30], involves the usage of 1) robust regression [31] to fit the measurement data with several wafer-level basis functions, followed by 2) a spatial-frequency-domain analysis to extract the intra-die variation patterns, and 3) low-rank tensor factorization [32] to extract the inter-die variation patterns. Finally, we fit the residuals from this hierarchical decomposition process with a normal distribution $\mathcal{N}(\mu, \sigma)$ that is assumed spatially stationary across the wafer. Fig. 6 visualizes the variation modeling process for $\lambda_r$ as an example. The variations of ER and Q were modeled in the same manner, and the results are summarized in Table I. We generate wafer-level data for $\lambda_r$, ER, and Q following the variation model and synthesize them into transceiver spectra based on Eq. (2). To validate that our synthetic transceivers closely resemble the fabricated ones in terms of power and energy estimation, we simulated the microring tuning power and the communication energy efficiency for the fabricated transceivers and ten wafers of synthetic transceivers. Fig. 7 plots the simulation results in ascending order for a data rate of 30 Gb/s per channel, showing a considerable resemblance of the synthetic transceivers to the fabricated ones.
The power models used in these simulations are the same as those used for the computation of $\epsilon_{ij}$ and will be detailed in Section IV-C.

C. Cost Matrix
For N transceivers available for grouping, a cost matrix $\mathbf{E} \in \mathbb{R}^{N \times N}$ is computed where the entry $\epsilon_{ij}$ is the energy efficiency of a unidirectional link from Tx #i to Rx #j at a given data rate, i, j ∈ {1, 2, . . . , N}. We compute $\epsilon_{ij}$ as the power consumption of the link divided by the aggregated data rate of all DWDM channels. The power consumption includes those of the laser, microring wavelength tuning, and Tx/Rx driver circuitry. Therefore, we have
$$\epsilon_{ij} = \frac{P_{\text{laser}} + P_{\text{tuning}} + P_{\text{driver}}}{m \cdot DR}, \tag{13}$$
where m is the number of DWDM channels, and DR is the target data rate per channel. The power models and assumptions are listed in Table II and explained as follows.
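As a worked example of this ratio, the sketch below computes the energy per bit from the three power terms. The power values are hypothetical and chosen only to make the arithmetic easy to follow; the function name is likewise an assumption. With power in mW and data rate in Gb/s, the quotient is directly in pJ/b.

```python
def link_energy_efficiency(p_laser_mw, p_tuning_mw, p_driver_mw, m, dr_gbps):
    """Energy per bit in pJ/b: total link power (mW) over aggregate rate (Gb/s)."""
    return (p_laser_mw + p_tuning_mw + p_driver_mw) / (m * dr_gbps)

# Hypothetical powers for a 24-channel link at 30 Gb/s per channel.
eps = link_energy_efficiency(300.0, 120.0, 300.0, m=24, dr_gbps=30.0)
print(eps)  # 720 mW over 720 Gb/s
```

The unit bookkeeping is 1 mW / (1 Gb/s) = 10^-3 J/s / (10^9 b/s) = 10^-12 J/b, so no extra scaling factor is needed.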

1) Laser Power: We assume a quantum dot comb laser [33] that can generate a group of evenly spaced comb lines to cover the free spectral range (FSR) of the microrings. We further assume a Gaussian-shaped comb spectrum, as illustrated in Fig. 8, with a spectrum efficiency $\eta = P_{\text{usable}} / P_{\text{total}} \approx -3.2$ dB [35]. The optical power provided at the laser output must be high enough so that the following power budget equation holds for any channel k ∈ {1, 2, . . . , m}:
$$P_{\text{comb},k} \cdot PL_k \geq P_{\text{sensitivity}}. \tag{14}$$
Here, $P_{\text{comb},k}$ is the optical power of the kth comb line; $PL_k \in (0, 1)$ is the overall power loss of the kth channel, which is the product of several losses (listed in Fig. 8) as the light travels; $P_{\text{sensitivity}}$ is the sensitivity requirement of the receiver and is modeled as a function of the data rate in [34]. The laser is characterized by the wall-plug efficiency (WPE) when converting the electrical power into the optical power:
$$\text{WPE} = \frac{P_{\text{total}}}{P_{\text{laser}}}. \tag{15}$$
Based on Eqs. (14) and (15), the laser power consumption can be computed for various data rates and is consistent with what is reported in [33]. Note that if the required optical power for Eq. (14) to hold exceeds the maximum power allowed (7 dBm as per [21]), the link is marked as not supporting the target data rate.
Fig. 8. Power losses in a microring-based optical link, plotted for five channels for illustration purposes, including (1) coupling loss and modulator passing loss; (2) modulator insertion loss; (3) coupling loss, propagation loss, and Rx drop-port loss; and (4) crosstalk noise.
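The power budget check can be sketched in the dB domain, where the product of a comb-line power and a loss factor becomes a sum. This is a simplified illustration and not the paper's laser model: it assumes each comb line can be set exactly to its required level (ignoring the Gaussian comb shape), treats the usable power as the sum of the per-line powers, and uses a hypothetical wall-plug efficiency. All function names and default values other than the 7 dBm limit and the −3.2 dB spectrum efficiency are assumptions.

```python
def dbm_to_mw(dbm):
    """Convert optical power from dBm to mW."""
    return 10 ** (dbm / 10)

def required_comb_power_dbm(sensitivity_dbm, loss_db):
    """Smallest comb-line power meeting the budget, in dBm.

    loss_db is the channel loss as a positive dB value, so the loss
    factor PL equals 10 ** (-loss_db / 10).
    """
    return sensitivity_dbm + loss_db

def link_supported(sensitivity_dbm, losses_db, max_dbm=7.0):
    """True if every channel fits under the per-channel power limit."""
    return all(required_comb_power_dbm(sensitivity_dbm, loss) <= max_dbm
               for loss in losses_db)

def laser_electrical_power_mw(sensitivity_dbm, losses_db, eta_db=-3.2, wpe=0.1):
    """Electrical laser power under the simplifying assumptions above."""
    usable_mw = sum(dbm_to_mw(required_comb_power_dbm(sensitivity_dbm, loss))
                    for loss in losses_db)
    total_optical_mw = usable_mw / 10 ** (eta_db / 10)
    return total_optical_mw / wpe      # wpe = 0.1 is a hypothetical value

print(link_supported(-10.0, [12.0, 15.0]))   # needs 2 and 5 dBm
print(link_supported(-10.0, [12.0, 18.0]))   # needs 8 dBm on one channel
```

A link that fails `link_supported` would be the one marked as not supporting the target data rate when the cost matrix is built.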
2) Microring Tuning: The $P_{\text{tuning}}$ term in Eq. (13) is the tuning power required to align the microring resonance wavelengths of Tx #i and Rx #j to a mutual set of laser comb lines. We assume that thermal tuning is adopted to red-shift the resonance wavelengths of the microrings with a tuning efficiency of 0.15 nm/mW [36]. If some resonance wavelengths fall out of the usable laser range, channel shuffling [9], [10] is applied to utilize a neighboring mode for alignment.
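For a single microring, the heater power follows from the red-shift distance and the tuning efficiency. The sketch below is an illustration under stated assumptions: the function name, the FSR value, and the example wavelengths are hypothetical, and a target on the blue side of the resonance is assumed to be reached by red-shifting the neighboring resonance mode, which is modeled with a modulo over the FSR.

```python
def tuning_power_mw(resonance_nm, target_nm, fsr_nm, efficiency_nm_per_mw=0.15):
    """Heater power needed to red-shift a resonance onto a target wavelength."""
    # Thermal tuning only shifts toward longer wavelengths, so a negative
    # offset wraps around by one free spectral range (assumed behavior).
    shift_nm = (target_nm - resonance_nm) % fsr_nm
    return shift_nm / efficiency_nm_per_mw

print(tuning_power_mw(1300.10, 1300.40, fsr_nm=9.0))  # about 2 mW for 0.3 nm
print(tuning_power_mw(1300.40, 1300.10, fsr_nm=9.0))  # about 58 mW for 8.7 nm
```

The second case shows why the alignment matters: a comb line just below a resonance is far more expensive to reach than one just above it.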
3) Driver Circuitry: We consider the modulator drivers, the receiver transimpedance amplifiers (TIA), and the serializer/deserializer (SerDes) circuitry as the main components of the driver circuitry of an optical link, thus:
$$P_{\text{driver}} = P_{\text{mod}} + P_{\text{TIA}} + P_{\text{SerDes}}. \tag{16}$$
A detailed analysis is provided in [8] that models the power of the driver circuitry as a function of the data rate. In this study, we made lookup tables for $P_{\text{driver}}$ at various data rates for the computation of $\epsilon_{ij}$. Note that for network topologies other than the generic ring bus described in Section II-C, one can adjust Eq. (13) accordingly for computing $\epsilon_{ij}$, which is the energy efficiency of a unidirectional link from Tx #i to Rx #j including relay nodes (if there are any), so that the transceiver grouping algorithms proposed in Section III-B can be directly applied without modification.

V. EVALUATION
We evaluated our SA- and PSA-based algorithms for transceiver grouping based on the data of 64 measured transceivers and up to 256 synthetic transceivers for a wide range of network configurations.
A. SA-Based Grouping Algorithm
1) Effectiveness: We first present a few case studies to demonstrate the effectiveness of our SA-based algorithm (Algorithm 1). Fig. 9 shows an example for N = 16 and n = 2 at a target data rate of 30 Gb/s per channel. Several grouping schemes are illustrated in the form of graphs, including random grouping, local grouping, greedy grouping, and three grouping schemes produced by the SA-based algorithm with different values of $w_1$ and $w_2$. The nodes in each graph represent the transceivers available for grouping (i.e., pairing when n = 2). The energy efficiency of each group (pair) is computed from the data of the first 16 measured transceivers. The thinner an edge, the better the energy efficiency. A dashed line, however, indicates that the link cannot support the target data rate.
We observed from Fig. 9 that, compared to a random grouping scheme, the local grouping scheme that groups neighboring transceivers on a wafer only achieves marginal improvement in E and σ. It might seem counterintuitive, as local grouping should mitigate the impact of wafer-level variations. However, Table I suggests that even neighboring transceivers still suffer from significant inter-die variations. The observation justifies the need for more sophisticated grouping algorithms. We further observed that
• the greedy algorithm achieves considerable improvement in E but not σ, as the transceivers that lead to better energy efficiency are greedily grouped at earlier steps, leaving the remaining ones grouped at later steps with significantly worse energy efficiency;
• the SA-based algorithm, which initiates the optimization by shuffling the greedy grouping scheme, can further improve E when $w_1 = w_2 = 0$, but may converge to a solution with a low yield;
• the SA-based algorithm can also improve σ and Y by increasing their corresponding weights, at the cost of less improvement in other objectives.
We then used the energy-yield curves to compare different grouping schemes for other network configurations. Fig. 10 provides two more cases for N = 32 and 64, n = 4, at a target data rate of 30 Gb/s per channel. Specifically, for each grouping scheme, we plotted the energy efficiency of all networks assembled in ascending order, so that the average energy efficiency and the uniformity of the networks assembled can be visualized by the position and the slope of a curve. On the other hand, the horizontal axis of the plot, i.e., the network index g ∈ {1, 2, . . . , G}, was normalized by $\lfloor N/n \rfloor$. Then, as defined by Eq. (6), the network yield of a grouping scheme can be visualized by the x-coordinate of the ending point of the corresponding curve, as indicated by the vertical dashed lines in Fig. 10.
The energy-yield curves again verified that our SA-based algorithm, with a proper assignment of $w_1$ and $w_2$, can achieve significant improvement in the average energy efficiency and the yield of the networks assembled, while drastically improving the uniformity compared to a random grouping scheme.
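The construction of an energy-yield curve can be sketched as follows. The function name and the use of `math.inf` to mark a network that cannot support the target data rate are assumptions for illustration, and the sample values are hypothetical.

```python
import math

def energy_yield_curve(network_eff, total_groups):
    """Return (x, y) points of an energy-yield curve.

    network_eff holds the energy efficiency (pJ/b) of each assembled
    network, with math.inf marking a network that fails the target rate.
    """
    ys = sorted(e for e in network_eff if math.isfinite(e))
    xs = [(k + 1) / total_groups for k in range(len(ys))]
    return xs, ys

xs, ys = energy_yield_curve([2.0, math.inf, 1.5, 1.8], total_groups=4)
print(xs, ys)
print("yield:", xs[-1] if xs else 0.0)
```

A flatter curve indicates more uniform networks, a lower curve indicates better average energy efficiency, and the x-coordinate of the last point is the yield.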
2) Scalability: We further evaluated our SA-based algorithm for a variety of network configurations that cover N ∈ {16, 32, 64, 128, 256}, n ∈ {2, 4, 8, 16}, and a target data rate ranging from 20 Gb/s to 30 Gb/s per channel. We computed the improvement in E, σ, and Y achieved by our SA-based algorithm over random grouping. Note that the improvement in E and σ is measured by the percentage of reduction compared to that of the random grouping scheme, while the improvement in Y is measured by the arithmetic difference of the yields (a.k.a. percentage points or p.p.) of the two grouping schemes. For example, improving the yield from 50% to 80% is considered an increase of 30 percentage points, rather than a 60% increase. Overall, our SA-based algorithm with $w_1 = 1$ and $w_2 = 2$ achieves up to 25% improvement in the average energy efficiency of the networks assembled, up to 94% reduction of the standard deviation of the energy efficiency, and up to 75 percentage points increase of the network yield, compared to a random grouping scheme for the network configurations evaluated. Furthermore, we observed several noteworthy trends from the evaluation results:
• As shown in Fig. 11(a), for a given network size (n) and a target data rate, the energy efficiency improvement achieved by our SA-based algorithm increases with N, i.e., the total number of transceivers. In other words, with more transceivers available for grouping, there is a greater opportunity to optimize the average energy efficiency of the networks assembled.
• As shown in Fig. 11(b), for a given number of transceivers available for grouping, the reduction of the standard deviation of the energy efficiency achieved by our SA-based algorithm is more significant for a larger n. In other words, when the networks to be assembled are of a larger size, there is a greater opportunity to group the transceivers in a way that the networks assembled have relatively similar energy efficiency.
• As shown in Fig. 11(c), for a given number of transceivers available for grouping, the yield improvement achieved by our SA-based algorithm is greater for a larger n and a higher data rate. It was observed that the network yield resulting from a random grouping scheme drastically decreases with the network size and the target data rate. Especially for n = 16, none of the randomly assembled networks could support a target data rate of 30 Gb/s. Nevertheless, our SA-based algorithm can maintain a reasonably high yield for all network configurations evaluated.
The execution time of our SA-based algorithm was recorded for an initial temperature of 100, a cooling rate of 0.95, a re-annealing interval of (10 × N) iterations, and 50 rounds of annealing. Thus, each optimized grouping scheme was produced from a total of (500 × N) annealing iterations. According to Fig. 12, this setting was empirically found adequate for Eq. (9) to converge to a steady value. As shown in Fig. 11(d), the execution time of our SA-based algorithm grows polynomially with the number of transceivers and is largely independent of other network parameters. Staying within 40 s for N = 256, the execution time of our SA-based algorithm is considered a small overhead to the test time of the fabricated transceivers.
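The two improvement metrics and the iteration budget used in this section amount to simple arithmetic, sketched below. The function names are hypothetical, and yields are expressed as fractions between 0 and 1.

```python
def percent_reduction(baseline, optimized):
    """Improvement in E or sigma, as a percentage of the baseline value."""
    return 100.0 * (baseline - optimized) / baseline

def percentage_point_gain(baseline_yield, optimized_yield):
    """Improvement in yield, as an arithmetic difference in percentage points."""
    return 100.0 * (optimized_yield - baseline_yield)

def total_annealing_iterations(N, rounds=50, interval_factor=10):
    """Rounds of annealing times the re-annealing interval of (10 x N)."""
    return rounds * interval_factor * N

print(percent_reduction(2.0, 1.5))             # 2.0 pJ/b down to 1.5 pJ/b
print(percentage_point_gain(0.5, 0.8))         # 50% yield up to 80%
print(total_annealing_iterations(256))         # 500 x N for N = 256
```

Reporting yield in percentage points avoids inflated figures when the baseline yield is small, and it stays well defined when the baseline is zero, as in the n = 16 case at 30 Gb/s.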

B. PSA-Based Grouping Algorithm
The SA-based algorithm requires a proper combination of $w_1$ and $w_2$ to be specified. To avoid excessive trials only to determine the values for $w_1$ and $w_2$, the SA-based algorithm is best suited for situations where 1) either the uniformity or the yield of the networks assembled has an overriding priority over the other, so that having $w_1$ or $w_2$ equal to zero generally works well; or 2) the proper values for $w_1$ and $w_2$ are already learned from past runs for the network configuration of interest. For situations where the proper values for $w_1$ and $w_2$ are unknown, our PSA-based algorithm (Algorithm 3) can effectively and efficiently explore the trade-off between the three optimization objectives, namely the energy efficiency, the uniformity, and the yield of the networks assembled. By giving a set of Pareto-optimal solutions in a single run, our PSA-based algorithm allows one to select a desired grouping scheme without the need to specify $w_1$ and $w_2$. We compared our PSA-based algorithm to two other methods that explore the same trade-off by varying the combination of $w_1$ and $w_2$:
1) To sweep $w_1$ and $w_2$ within a given range (hereafter the SWEEP method). For each combination of $w_1$ and $w_2$, the SA-based algorithm is called to optimize Eq. (9). The Pareto front of E, σ, and Y is derived after the sweeping by eliminating the dominated solutions.
2) To employ another optimization solver that takes $w_1$ and $w_2$ as input variables. In this study, we modified an existing implementation of Multi-Objective Particle Swarm Optimization [37] (hereafter the MOPSO method). In each generation, the MOPSO method generates multiple combinations of $w_1$ and $w_2$ and calls the SA-based algorithm to optimize Eq. (9) for each combination. The Pareto front of E, σ, and Y is updated at the end of each generation, and new combinations of $w_1$ and $w_2$ are generated for the next generation based on the current Pareto front.
1) Effectiveness: For each network configuration, i.e., given N, n, and a target data rate, a Pareto front of E, σ, and Y was explored by SWEEP, MOPSO, and our PSA-based algorithm with the following settings, respectively:
SWEEP: We swept both w1 and w2 from 0.2 to 2 with a step size of 0.2. Thus, a total of 100 different combinations of w1 and w2 were explored. For each combination of w1 and w2, a grouping scheme was optimized through (500 × N) SA iterations.
MOPSO: We specified a population size of 10 for the MOPSO method, i.e., ten combinations of w1 and w2 were generated and evaluated in each generation. Thus, a total of 100 combinations of w1 and w2 were explored in 10 generations, each producing a grouping scheme optimized through (500 × N) SA iterations.
PSA: We executed our PSA-based algorithm for (500 × N) iterations with a population size of 100, where each individual in the population is a candidate grouping scheme. In other words, 100 grouping schemes were simultaneously optimized through (500 × N) PSA iterations.
Fig. 13 shows the results for N = 32 and 64 with n = 4, and for N = 128 and 256 with n = 8, at a target data rate of 30 Gb/s per channel. Each plotted point corresponds to a grouping scheme, whose E and σ can be read from its x- and y-coordinates, respectively. The value of Y is color-coded from light yellow (lowest) to dark blue (highest). Therefore, a grouping scheme is considered better if it is closer to the bottom left corner and darker in color. The random, local, and greedy grouping schemes are also marked in each plot. We compared the Pareto-optimal grouping schemes produced by SWEEP, MOPSO, and our PSA-based algorithm and made the following observations:
• The yield of the networks assembled, as suggested by Eq. (6), can only take a few discrete values. Thus, the Pareto front of E, σ, and Y appears as multiple curves that correspond to different yield values. Taking Fig. 13(a) as an example, for a network configuration of interest, one may pick a grouping scheme from the Pareto front by first specifying an acceptable yield value, then selecting a grouping scheme on the corresponding curve that reflects the desired trade-off between E and σ.
• In all four plots of Fig. 13, most of the Pareto-optimal solutions given by SWEEP and MOPSO are overlaid by solutions given by our PSA-based algorithm. In other words, our PSA-based algorithm can produce Pareto-optimal grouping schemes as good as those identified by SWEEP and MOPSO.
• For N = 128 and 256, both SWEEP and MOPSO tend to produce grouping schemes with a low yield. Nevertheless, our PSA-based algorithm can still explore various grouping schemes with a reasonably high yield.
• Our PSA-based algorithm can always identify multiple grouping schemes that are significantly better than the random grouping scheme in all three optimization objectives, namely E, σ, and Y.
2) Efficiency: We defined the efficiency of SWEEP, MOPSO, and our PSA-based algorithm as the number of Pareto-optimal grouping schemes produced per unit time. Using the settings specified in Section V-B1, the number of candidate grouping schemes optimized by each method was 100, although some of the optimized grouping schemes did not end up on the Pareto front. Fig. 14 compares the efficiency of our PSA-based algorithm to that of SWEEP and MOPSO for various network configurations, and the following observations were made:
• The MOPSO method brought a minor increase in the number of Pareto-optimal grouping schemes compared to the SWEEP method, at the cost of a longer execution time for the same number of candidates evaluated. On average, the MOPSO method achieved only 0.97× the efficiency of the SWEEP method.
• Our PSA-based algorithm, compared to both SWEEP and MOPSO, can produce significantly more Pareto-optimal grouping schemes within a shorter execution time for all network configurations evaluated. Overall, our PSA-based algorithm achieved a 1.67× to 9.30× improvement in efficiency over the SWEEP method, with an average of 3.13×.
In summary, when a proper combination of w1 and w2 is unknown, our PSA-based algorithm can explore a larger solution space more efficiently than SWEEP and MOPSO, producing more Pareto-optimal grouping schemes for selection.
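As a small sketch of the bookkeeping behind this comparison, the SWEEP grid (both weights from 0.2 to 2 in steps of 0.2) and the efficiency metric (Pareto-optimal grouping schemes per unit time) can be written as below. The function names and the numbers in the usage lines are hypothetical and are not measurements from the paper.

```python
def sweep_grid(num_steps=10, step=0.2):
    """Enumerate the (w1, w2) pairs from `step` to `num_steps * step`."""
    weights = [round(k * step, 1) for k in range(1, num_steps + 1)]
    return [(w1, w2) for w1 in weights for w2 in weights]


def efficiency(num_pareto_optimal, execution_time_s):
    """Pareto-optimal grouping schemes produced per second."""
    return num_pareto_optimal / execution_time_s


grid = sweep_grid()  # 10 x 10 = 100 weight combinations

# Hypothetical counts and run times, used only to show the ratio being reported.
baseline = efficiency(20, 100.0)
candidate = efficiency(30, 50.0)
relative = candidate / baseline
```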

VI. CONCLUSION
In this study, we target the application scenario where fabricated microring-based transceivers are grouped for assembling optical networks of multiple nodes. We propose two algorithms to mix and match the fabricated transceivers so that the three optimization objectives, namely the average energy efficiency, the uniformity, and the yield of the networks assembled, are optimized. We evaluated our proposed algorithms using wafer-scale measurement data of microring-based transceivers, as well as synthetic data generated from an experimentally validated variation model. Our first algorithm, based on simulated annealing (SA), can achieve up to a 25% improvement in the average energy efficiency of the networks assembled, up to a 94% reduction in the standard deviation of the energy efficiency, and up to a 75-percentage-point increase in the network yield, compared to a baseline strategy that randomly groups the transceivers. Moreover, our second algorithm, based on Pareto simulated annealing (PSA), can efficiently produce multiple Pareto-optimal grouping schemes that significantly outperform the random grouping scheme in all three optimization objectives.