Guiding Principle for Minor-Embedding in Simulated-Annealing-Based Ising Machines

We propose a novel type of minor-embedding (ME) in simulated-annealing-based Ising machines. The Ising machines can solve combinatorial optimization problems. Many combinatorial optimization problems are mapped to find the ground (lowest-energy) state of the logical Ising model. When connectivity is restricted on Ising machines, ME is required for mapping from the logical Ising model to a physical Ising model, which corresponds to a specific Ising machine. Herein we discuss the guiding principle of ME design to achieve a high performance in Ising machines. We derive the proposed ME based on a theoretical argument of statistical mechanics. The performance of the proposed ME is compared with two existing types of MEs for different benchmarking problems. Simulated annealing shows that the proposed ME outperforms existing MEs for all benchmarking problems, especially when the distribution of the degree in a logical Ising model has a large standard deviation. This study validates the guiding principle of using statistical mechanics for ME to realize fast and high-precision solvers for combinatorial optimization problems.


A. MOTIVATION
Combinatorial optimization problems find the optimal combination of decision variables to minimize or maximize the objective function under given constraints. Solving a combinatorial optimization problem with a large number of decision variables is difficult because the number of solution candidates increases exponentially with the number of decision variables. Typical examples of combinatorial optimization problems found in textbooks include the satisfiability problem, the traveling salesman problem, and the knapsack problem. In our daily life, combinatorial optimization problems are ubiquitous. Common examples include the shiftplanning optimization, the logistics optimization, and the traffic route optimization. Consequently, the development of efficient solvers for combinatorial optimization problems has attracted attention both in academia and in industry.
Ising machines have been developed as fast and highprecision solvers for combinatorial optimization prob-lems [1]- [12]. They employ three phases to solve problems. In the first phase, a combinatorial optimization problem is mapped as an Ising problem. The Ising problem finds the ground (lowest-energy) state of the logical Ising model, which was originally introduced in statistical mechanics to describe the nature of phase transition materials [13], [14]. The Ising model consists of spins with values of +1 or −1. As described in Sec. III-A, the logical Ising model is defined on an undirected graph with unrestricted connectivity between vertices. The objective function and the constraints in a combinatorial optimization problem are encoded in the Ising model [15]- [17]. Different encoding methods have been proposed: machine learning [18], portfolio optimization [17], [19], traffic optimization [20], optimization in an integrated design circuit [21], [22], and material design [23]. In the second phase, the logical Ising model formulated in the first phase is mapped onto a physical Ising model. The model corresponds to the Ising machine considered. Here, the physical Ising model is defined on an undirected graph VOLUME 8, 2020 1 arXiv:2012.02372v1 [cond-mat.stat-mech] 4 Dec 2020 where the connectivity between vertices may be restricted. For Ising machines with restricted connectivity such as D-Wave [1], [2] and CMOS annealing machines [3], [8], the mapping called minor-embedding (ME) [24] is necessary. In ME, a single spin in the logical Ising model is expressed by several spins in the physical Ising model. The set of spins is called a chain since chains are often formed in an actual ME. In the third phase, the Ising machine searches for the lowestenergy state according to its operation principle. ME can be classified into two types according to the number of spins in each chain. In the first type, each chain has the same number of spins (i.e., a uniform chain length). This type of ME is often called clique ME or completegraph ME because the logical Ising model with all-to-all coupling can be embedded. The algorithms for this type of ME have been developed for D-Wave [25]- [27] and CMOS annealing machines [28]. In the second type, each chain has a different number of spins. The total number of spins in the physical Ising model is usually smaller in the second type if the logical Ising model is not fully connected. Thus, the second type can embed a larger number of logical spins. Heuristic algorithms for finding this type of ME have been developed [29]- [36]. In existing MEs, the spins in a chain interact with ferromagnetic coupling. In both types of MEs, the chains have the same coupling strength. Here, we call the two types of MEs "uniform-length and uniform-coupling ME (ME i)" and "nonuniform-length and uniform-coupling ME (ME ii)", respectively.

B. SUMMARY OF CONTRIBUTIONS
Herein, we discuss the guiding principle of ME design to achieve a higher performance in simulated-annealing (SA)based Ising machines. The main contributions are: • A novel type of ME is proposed where the lengths are nonuniform and the coupling strength of each chain depends on the chain length. The formula between the coupling strength and the chain length is derived from a viewpoint of statistical mechanics. The coupling strength increases with the chain length. This type of ME, which is herein called "nonuniform-length and nonuniform-coupling ME (ME iii)", has not been discussed in previous studies.
• The performance of our proposed ME is compared to two existing types of ME through SA. The results demonstrate that the proposed ME has the best performance for all the problems. In particular, it outperforms the others when the degree of the logical Ising model is widely distributed. The results are general and independent of the distribution of the coupling strengths and biases in logical Ising models.
The rest of the paper is organized as follows. Section II briefly introduces the SA and thermal equilibrium states. The idea of thermal equilibrium states is necessary to derive the initialize to a random initial state 3: for each temperature T do 4: for each Monte Carlo sweep at the temperature do 5: choose a candidate site 6: calculate the energy difference ∆E given by (1) 7: generate a random number r such that 0 ≤ r < 1 8: if r is less than the transition probability W (∆E, T ), update the state 9: end for 10: lower the temperature 11: end for 12: end for proposed ME (ME iii). Section III discusses ME to fix the notation. Then a physical Ising model is presented to tune the chain lengths and intra-chain-coupling strengths in ME. With this model, we show the new type of ME as well as the two existing types of ME. Section IV explains the experimental setup. Section V demonstrates the numerical results, and Sec. VI discusses our results. Section VII concludes with a summary of the results and future research directions. The Appendices give supplemental information for the derivation of the proposed ME (Appendix A) and the experimental results (Appendix B and Appendix C).

II. SIMULATED ANNEALING AND THE THERMAL EQUILIBRIUM STATE
SA is a heuristic algorithm. It is useful in a wide range of application [37]- [39]. It has been employed to find the optimal solution of an objective function in combinatorial optimization problems. To explain SA as an operation principle of Ising machines in the language of statistical mechanics, we consider the objective function as an energy function, which is referred to as the Hamiltonian of the Ising model. As explained in Sec. I, the Ising model consists of spins with values of +1 and −1. Let H({σ i }) be the Hamiltonian of the Ising model, where {σ i } is a combination of decision variables called spins. In this case, the ground (lowest-energy) state corresponds to the spin combination (spin configuration) Algorithm 1 shows the SA algorithm implemented by Markov Chain Monte Carlo (MCMC). The algorithm starts from a completely random initial state. That is, the spin configuration {σ i } is arbitrarily selected. Then the spin configuration is repeatedly updated. Let us consider a transition from the current state {σ i } to a candidate state {σ i }. The probability of making the transition is specified by a transition probability W (∆E, T ), which depends on temperature T and the energy difference between the two states defined by (1) 2 VOLUME 8, 2020 According to the principle of MCMC, the transition probability W (∆E, T ) must satisfy the balance condition, which is given as (2) where the summation means the summation of all the spin configurations. Two well-known choices of the transition probability satisfying the above equation are the heat-bath method and the Metropolis method. Here, P eq T ({σ i }) is the probability distribution of the thermal equilibrium state at temperature T and is given by Here, we set the Boltzmann constant, which is a physical constant, to unity. When the temperature T is fixed, Algorithm 1 is used to sample spin configurations in a thermal equilibrium state [16], [40]. The thermal equilibrium state at high temperature is a random state where the population is almost the same for all spin configurations. By contrast, the thermal equilibrium state at low temperature has a large population in the lower-energy states. In SA, by gradually lowering the temperature, the state should make transitions from a high-temperature state to a low-temperature state while annealing. After performing SA, a lower-energy state, ideally the ground state of H({σ i }), is identified. Herein the expectation value of a physical quantity in the thermal equilibrium state at temperature T is referred to as the thermal average and is denoted as where f ({σ i }) is an arbitrary function of the spin configuration.

III. MINOR-EMBEDDING
In this section, we describe our proposed ME. First, we introduce the concept of ME and a physical Ising model to systematically tune the chain length and the intra-chaincoupling strength.

A. BRIEF INTRODUCTION OF MINOR-EMBEDDING
ME is the mapping from a logical Ising model to a physical Ising model. The symbols L and P denote a logical Ising model and a physical Ising model, respectively. The logical Ising model is defined on an undirected graph G L = (V L , E L ), where V L and E L are sets of vertices and edges, respectively. Herein we refer to G L as a logical graph. The number of vertices is denoted by N L . As mentioned in Sec. II, the Hamiltonian of the Ising model is an objective function and is given by where σ i ∈ {−1, 1} is the logical spin, J ij is the interaction between spins σ i and σ j , and h i is the bias on the spin σ i . Both J ij and h i are real values. J ij > 0 indicates ferromagnetic coupling, whereas J ij < 0 denotes antiferromagnetic coupling. Many combinatorial optimization problems can be mapped as problems to find the ground state of H L ({σ i }).
The interaction strengths J ij and the biases h i are specified by the objective function and constraints of the combinatorial optimization problem.
In a similar manner, the physical Ising model is defined on an undirected graph G P = (V P , E P ), where a physical spin with a binary variable is put on each vertex. [For the specific form of the Hamiltonian in this study, see eq. (6).] Hereafter, G P is referred to as the physical graph and it corresponds to the graph determined by the Ising machine architecture. In general, the physical graph G P has a degree constraint where each vertex can have at most a constant degree. For example, the degree is 6 for the Chimera graph [2], 15 for the Pegasus graph in the D-Wave machines [41], and 5 (1st generation prototype [3]) and 8 (2nd generation prototype [8]) in the CMOS annealing machines. Due to the connectivity restriction among vertices, the logical graph G L is not typically a subgraph of G P . ME enables G L to be expressed in G P even when G L is not a subgraph of G P . Each vertex in the logical graph, i ∈ V L , is mapped to a set of several vertices in the physical graph, φ(i) ⊂ V P . ME is defined by mapping φ : V L → V P , which satisfies the following conditions [29]: 1) For each vertex i ∈ V L , the vertices in φ(i) ⊂ V P are connected and the connection is called chain; 2) For all i = j in V L , φ(i) and φ(j) are disjointed; 3) For each pair (i, j) ∈ E L , the corresponding pair exists in the physical graph (i.e., a pair of vertices, k ∈ φ(i) and ∈ φ(j), satisfying (k, ) ∈ E P ).
The physical spins in a chain interact with a ferromagnetic coupling. When the ferromagnetic coupling is sufficiently large, the ground state of the logical Ising model and that of the physical Ising model have a one-to-one correspondence [24]. This implies that the ground state of H L ({σ i }) is obtained by searching the ground state of the embedded physical Ising model.

B. PHYSICAL ISING MODEL TO TUNE CHAIN LENGTHS AND INTRA-CHAIN-COUPLING STRENGTHS IN MINOR-EMBEDDING
This subsection describes a physical Ising model to systematically tune the chain lengths and the intra-chain-coupling strengths. The upper panel of Fig. 1 represents the logical Ising model with N L = 5. The logical spins σ i and σ j are connected with coupling strength J ij when there is an edge between the corresponding vertices, and the bias with the strength h i is applied on each spin i. The lower panel shows the physical Ising model in which the logical Ising model is embedded. For simplicity, we assume that each chain is a ring of vertices in G P . Each vertex in the logical graph, i ∈ V L , is mapped to the ring with the length L(i), and the physical spins in the ring are connected through a ferromagnetic coupling with the strength J F (i). The Hamiltonian of the physical Ising model is explicitly given by where s i,j ∈ {−1, 1} is the j-th physical spin in the ring φ(i) and the periodic boundary condition is imposed (i.e., s i,L(i)+1 = s i,1 ). There is an interaction with the strength J ij between a physical spin in a ring φ(i) and a physical spin in a ring φ(j), and v i (j) denotes the physical spin in a ring φ(i). We assume that each physical spin interacts with one physical spin in other rings, at most.
The bias on each physical spin in a ring φ(i) is set as h i /L(i). In this way, the biases applied to the spins in a ring become uniform. Since intra-ring-couplings are ferromagnetic couplings, J F (i) > 0 for all i. When the strength of J F (i) is sufficiently large, the ground states of the logical Ising model and the physical Ising model have a one-to-one correspondence.

C. TYPES OF MINOR-EMBEDDING
We consider three types of ME: ME i, ME ii, and ME iii. These depend on the choice of the ring length L(i) and the intra-ring-coupling strength J F (i). ME i and ME ii have been studied previously [25]- [29], [31]- [36]. ME iii is a new type proposed in this study. Equation (6) can systematically express the three types of ME.
• ME i: uniform-length and uniform-coupling ME In the first type of ME, all rings have the same length and coupling strength. We set the number of spins in a ring as N L − 1, which is the maximum degree for each vertex. That is We In ME i, the logical Ising model with all-to-all coupling can be embedded. We introduce a hyperparameter J c for the intra-ring-coupling, which is expressed as The number of vertices in the physical graph (i.e., the number of spins in the physical Ising model) is provided as • ME ii: nonuniform-length and uniform-coupling ME In the second type of ME, the total number of physical spins is set as small as possible. For a given logical Ising model, it is sufficient to take the number of spins in a ring φ(i) as the degree of vertex denoted by k i . Namely, We where n i (j) ∈ N is an integer given for vertex i ∈ V L . The integer is incremented by one when there is an edge between the logical spins i and j. That is, (i, j) ∈ E L . For example, if a logical spin labeled by 2 interacts with spins labeled by 1, 5, and 6, then n 2 (1) = 1, n 2 (5) = 2, and n 2 (6) = 3. Similar to ME i, a uniform ferromagnetic coupling strength is assumed inside the ring and In this ME, every physical spin is connected to a spin in another ring. Hence, the number of vertices in the physical Ising model is given by • ME iii: nonuniform-length and nonuniform-coupling ME We propose a new type of ME where the intra-ringcoupling strength depends on the ring length. Similar to ME ii, the length of ring L(i) is equal to the degree of the vertices i ∈ V L , and v i (j) is set by eq. (12). The intra-ring-coupling strength J F (i) is given by Here, J F (i) is a monotonically increasing function of L(i), and it asymptotically behaves as Below, we derive the formula in eq. (16). First, consider the local Hamiltonian of the i-th ring, Here, the effect due to inter-ring couplings between rings {J ij } and the biases on spins {h i /L(i)} is neglected. The correlation length ξ i (T ) of this model at temperature T is given by [13] (see Appendix A for a detailed derivation) where ξ i (T ) is defined by Here, C i (j) is called the correlation function. It describes the thermal average of the products of spins s i,k and s i,j+k . The correlation function is independent of k due to the periodic boundary condition of the ring.
As the distance between the spins increases, the value of C i (j) decays exponentially. The correlation length ξ i (T ) determines the decay length scale. ξ i (T ) is a monotonically decreasing function of T . At sufficiently low temperatures, the correlation length is much larger than the ring length ξ i (T ) L(i). Hence, all the spins in the ring tend to have the same values. On the other hand, at sufficiently high temperatures, ξ i (T ) L(i). In this case, each spin in the ring randomly has values +1 or −1. The crossover occurs at T c (i), where Here, we assume that T c (i) of all the rings have the same value. That is, Substituting eqs. (19) and (22) into (21) gives (16). The guiding principle of ME design to achieve a high performance in Ising machines is that the intra-ring-coupling strength must be tuned according to eq. (16). In SA, the temperature T decreases from a high temperature to a low temperature. The physical spins in each ring randomly take values of +1 or −1 when T > J c . By contrast, they are aligned in the same direction when T < J c . The physical spins in each ring are aligned along the same direction simultaneously at T = J c .
When the lengths of rings are uniform and L(i) = N L − 1, ME iii is reduced to ME i. As such, the case with uniformlength and nonuniform-coupling ME is not considered in this study. Next, we compared the performance of the three MEs.

IV. EXPERIMENTAL SETUP A. BENCHMARKING PROBLEMS
We considered four types of benchmarking problems (i.e., logical Ising models). Each benchmarking problem has its own distribution of the degree in G L or of {J ij } and {h i }.
• Binomial-Bimodal problem The logical graph G L is created by connecting the vertices i and j by an edge with half probability. The degree distribution is given by the binomial distribution. The coupling strengths and the biases are chosen according to a bimodal distribution. That is, J ij and h i take values from {−1, +1} with equal probability. • Binomial-Gaussian problem The logical graph G L is created by connecting the vertices i and j by an edge with half probability. The coupling strengths and the biases are chosen according to a Gaussian distribution with a mean of zero and a standard deviation of unity. • Power-Bimodal problem The logical graph G L with a scale-free network is created by the algorithm of the Barabasi-Albert (BA) model [42]. The degree distribution is given by a powerlaw distribution. The coupling strengths and the biases are chosen according to a bimodal distribution. That is, J ij and h i take values from {−1, +1} with an equal probability.  Fig. (a), the line is drawn using a Gaussian distribution. In Fig. (b), the line is a guide to show the power-law scaling of n(k) ∝ k −3 . Error bars are the standard deviation of 10 realizations of the logical graphs for each benchmarking problem.
deviation of unity.
For each benchmarking problem, we prepared a hundred scenarios by creating ten connected logical graphs. For each connected graph, we generated ten sets of different coupling strengths {J ij } and biases {h i }.
The BA model was originally introduced to explain the mechanism responsible for the emergence of power-law degree distributions of networks in various fields. In the algorithm of the BA model, the graph begins from a fully connected graph with m 0 vertices. At every step, a new vertex with m edges is added to m different vertices already present in the graph with a certain probability. The probability of connecting a new vertex and vertex i depends on the degree k i , and is given as The algorithm ends when the number of vertices is N L . Since the number of edges increases by m in every step, the number of edges in a graph with N L vertices is approximately Numerical simulations and analytic results [42] have demonstrated that the graph evolves into a scale-free network. Namely, the histogram of the degree k denoted by n(k) follows a power-law scaling. In the BA model, the exponent is 3 and is independent of m 0 and m. Figure 2 shows histograms of the degree k in the Binomial-Bimodal problem and the Binomial-Gaussian problem [ Fig. 2  (a)] and in the Power-Bimodal problem and the Power-Gaussian problem [ Fig. 2 (b)] for the model with N L = 1000. The error bars denote the standard deviation of the 10 realizations of the logical graphs in each benchmarking problem. In the Binomial-Bimodal problem and the Binomial-Gaussian problem, there is a peak around k = N L /2 because the vertices are connected by an edge with half probability. The peak width is the order of N 1/2 L . The histogram is well described by a scaled Gaussian distribution with a mean of N L /2 and a standard deviation of √ N L /2. On the other hand, in the Power-Bimodal problem and the Power-Gaussian problem, the degree is more widely distributed, and the histogram follows a power law. The power-law scaling of n(k) ∝ k −3 is consistent with our data. Here, we set m 0 = 3 and m = 3.

B. SIMULATION DETAILS
We applied SA to the physical Ising models by adopting the single-spin flip Monte Carlo method. In each update of the spin configuration, the spin is randomly selected and the energy difference ∆E is calculated between the current state and the candidate state in which the chosen spin is flipped [see eq. (1)]. Here, the heat-bath transition probability at temperature T is used and is given as Equation (25) satisfies the balance condition [see eq. (2)], and each Monte Carlo step (MCS) repeats the updates |V P | times. The temperature is initially set to T ini = 10, which is larger than the typical energy scale, and decreases by 10 −4 in every MCS. The temperature at the end of the annealing is zero. Appendix B shows the result using a different type of annealing schedule. Regardless of the annealing schedule, the same results are qualitatively produced. After performing SA, the values of the logical spins {σ i } are determined from the spin configuration of the physical Ising model. If all the physical spins in the ring have the same value, the value is the same as that for the logical spin. If not, the value of the logical spin +1 or −1 is determined by the majority vote. Namely, when five physical spins take +1 and three physical spins take −1 in a ring φ(i), σ i is determined as +1. If (+1)-spins and (−1)-spins are the same, the value of the corresponding logical spin is set to +1. Energy density ε J c For each physical Ising model (i.e., the model mapped by an embedding), we performed SA one hundred times to estimate the average and standard deviation of the quantities described in Sec. V.

V. NUMERICAL RESULTS
We compared the performances of ME i, ME ii, and ME iii for each benchmarking problem. We measured two quantities. The first one is the step to solution (STS). The STS is the number of steps required for the algorithm to obtain the ground state at least once with a probability of 0.99, and it is defined by [9], [43] R (k) where P (k) s is the success probability and k is a label of the logical Ising models. Herein k is the run, which ranges from 1 to 100. A small R (k) 99 value indicates a good performance. Here, the success probability is estimated as P is the number of obtained ground states in one hundred simulations of SA. We measured STS for relatively small-size systems up to N L = 28 because it is difficult to obtain the ground state of H L ({σ i }) for a larger system size.
For the N L -dependence of the performance in a largersized system, we calculated the energy density (i.e., the value of H L ({σ i })/N L ), where σ i is determined by the majority vote after SA (see subsection IV-B). (k) denotes the average of the energy density for 100 simulations of SA, where k is the label of the logical Ising model. A smaller (k) indicates a better performance. Note that the energy density can be evaluated on the order of N 2 L steps. Thus, this quantity is useful to study larger-sized systems.
To investigate the performance of ME in the benchmarking problems, we used the median of the STSs and the median of energy densities. These are denoted as R 99 and , respectively. Figure 3 plots the J c -dependences of R 99 (a) and (b) in ME i for each benchmarking problem. Here, the optimal values of J c that minimizes R 99 or are found. We estimated the optimal values of J c in each ME for different sized systems, where 0.1 is used as the precision threshold of J c (see Appendix C for the N L -dependences). In the following, we show the data of R 99 , , and (k) at the optimal value of J c .

A. STEP TO SOLUTION (STS)
First, we used STS to compare the performance among ME i, ME ii, and ME iii. Figure 4 shows the N L -dependences of STS for each benchmarking problem. In all cases, STS increases exponentially with N L . For small system sizes, N L = 8 or 12, there is not a clear difference in performance. However, a clear difference appears as the number of logical spins increases. For all benchmarking problems, ME i shows the poorest performance. For the Binomial-Bimodal problem [ Fig. 4 (a)] and the Binomial-Gaussian problem [ Fig. 4 (b)], the performances of ME ii and ME iii are the same within the margin of error. On the other hand, for the Power-Bimodal problem [ Fig. 4 (c)] and the Power-Gaussian problem [ Fig. 4  (d)], ME iii outperforms ME ii.
We also considered the time to solution (TTS) [9], [43]- [45]. TTS is the total time required for the algorithm to obtain the ground state at least once with a probability of 0.99. TTS is estimated as where n MCS is the number of the MCSs in the SA and τ MCS is the time required to calculate one MCS. For all three MEs, n MCS is the same as STS and is given as In each MCS, we calculated the energy difference of a singlespin flip |V P | times. The time required for the calculation of VOLUME 8, 2020  Power-Gaussian problem. ME i, ME ii, and ME iii are denoted by red circles, green squares, and blue triangles, respectively. the energy difference is O(|V P | 0 ) since the connectivity of the physical Ising model is sparse. Thus In the Binomial-Bimodal problem and the Binomial-Gaussian problem, the number of physical spins is on the order of N 2 L for all three MEs. Thus, TTS qualitatively shows the same result as STS. In the Power-Bimodal problem and the Power-Gaussian problem, the number of physical spins is on the order of N 2 L in ME i, while it is on the order of N L in ME ii and ME iii. The scaling of O(N L ) is obtained from eqs. (14) and (24). TTS shows larger performance differences between ME i and the other two compared to the STS.

B. ENERGY DENSITY
Next, we compared the performance among ME i, ME ii, and ME iii in terms of the energy density. The results are qualitatively the same as those of STS. Figure 5 shows the N L -dependence of for each benchmarking problem. As N L increases, the difference in performance among the MEs becomes clear. ME i has the poorest performance. ME ii and ME iii have the same performance for the Binomial-Bimodal problem [ Fig. 5 (a)] and the Binomial-Gaussian problem [ Fig. 5 (b)]. On the other hand, ME iii outperforms ME ii for the Power-Bimodal problem [ Fig. 5 (c)] and the Power-Gaussian problem [ Fig. 5 (d)]. Figures 5 (c) and (d) show that the differences in energy densities are almost the same for N L ≥ 10 2 . This implies that ME iii will provide the best performance, even for larger-sized systems. Figure 6 shows the scatterplot to compare the energy densities for each benchmarking problem (i.e., { k } 100 k=1 , among the three MEs) using the data at N L = 100. The upper panel compares the energy densities between ME i and ME ii. All the points are plotted below the diagonal, indicating that ME ii outperforms ME i for all the logical Ising models in each benchmarking problem. The lower panel compares the energy densities between ME ii and ME iii. For the Binomial-Bimodal problem [ Fig. 6 (a2)] and the Binomial-Gaussian problem [ Fig. 6 (b2)], the points are plotted around the diagonal, implying that ME ii and ME iii have similar performance. On the other hand, for the Power-Bimodal problem [ Fig. 6 (c2)] and the Power-Gaussian problem [ Fig. 6 (d2)], all the points are plotted below the diagonal, indicating that ME iii is better suited for these benchmarking problems.

VI. DISCUSSION
The numerical studies demonstrate that ME i has the poorest performance. The poor performance of ME i is attributed to the large dimension of the solution space. In ME i, the logical Ising model with N L -spins is mapped to the physical Ising model with N L (N L − 1)-spins. On the other hand, in ME ii, the number of the physical spins is about N 2 L /2 for the Binomial-Bimodal problem and the Binomial-Gaussian problem, but is on the order of N L for the Power-Bimodal problem and the Power-Gaussian problem. The dimension of the solution space increases exponentially with respect to the number of physical spins. Thus, the dimension of the solution space in ME i rapidly increases with N L compared to those in ME ii and ME ii. ME ii and ME iii have the same performance for the Binomial-Bimodal problem and the Binomial-Gaussian problem. This can be understood as follows for large L(i). For the Binomial problem, the degree distribution shows a peak around k = N L /2 with the width on the order of N 1/2 L [see Fig. 2(a)]. In ME iii, the length of the ring is equal to the degree. Thus, L(i) is distributed with a mean of L mean = N L /2 and a standard deviation ∆L on the order of N  Binomial-Bimodal (a2) Binomial-Gaussian (b1) . Scatterplots of the energy densities { k } 100 k=1 at NL = 100 among ME i, ME ii, and ME iii for (a) the Binomial-Bimodal problem, (b) the Binomial-Gaussian problem, (c) the Power-Bimodal problem, and (d) the Power-Gaussian problem. Upper panel compares ME i and ME ii, and the lower panel compares ME ii and ME iii. deviation of J F (i) scales as The standard deviation ∆J F decays with N L , implying that ME iii approaches ME ii as the system size increases. VOLUME 8, 2020 On the other hand, ME iii outperforms ME ii for the Power-Bimodal problem and the Power-Gaussian problem for large N L . In these problems, the degree of the logical Ising model is widely distributed, reflecting the distribution of L(i) in the physical Ising model. Namely, the lengths of some rings are on the order of 1, while others are on the order of N L . In these problems, it is necessary to set the intraring-coupling strength according to eq. (16) to achieve a high performance in SA-based Ising machines.

VII. CONCLUSION AND OUTLOOK
Here, we discussed the guiding principle of ME design to achieve a high performance in SA-based Ising machines from a viewpoint of statistical mechanics. We proposed a new type of ME shown in eq. (16). In the proposed ME, the coupling strength inside a chain depends on the chain length. This is a unique approach that has not been discussed previously. We compared the performance of our proposed ME with the two existing MEs using four benchmarking problems. SA showed that the proposed ME has the best performance for all the benchmarking problems. In particular, it outperformed the others when the logical Ising model has a wide degree distribution. The results are independent of the distribution of coupling strengths and biases in the logical Ising model.
We demonstrated the importance of tuning the intra-chain coupling strengths in SA, which is regarded as an ideal Ising machine. In the future, we plan to apply eq. (16) to real Ising machines such as a CMOS annealing machine.
It is also important to compare our results with the case of quantum annealing (QA) [46], where the transverse-field strength Γ plays the role of the temperature T in SA. A recent paper [47] shows that ME ii outperforms ME i. This is consistent with the results in this study. Furthermore, our results imply that the performance of QA could be enhanced for some problems if the intra-chain coupling strength J F is tuned according to the chain length. The D-Wave's report [48] evaluated the chain-length dependence of the tunneling energy between the all-up-spin state and the all-downspin state of chains, and showed that Γ/J F should be larger for a longer chain to achieve a high performance of QA. In this study, Γ was tuned instead of J F . Interestingly, their result implied the opposite as ours using SA. This is a future problem to uncover the origin of the difference between QA and SA. .

APPENDIX A CORRELATION LENGTH IN A ONE-DIMENSIONAL ISING MODEL
The appendix provides a detailed derivation of the correlation length in a one-dimensional Ising model [see eq. (19) in the main text]. The correlation length is determined by the correlation function, which is given by We use the Hamiltonian shown in eq. (18) and set K(i) = J F (i)/T . Here, Z is called the partition function in statistical mechanics, which is given by The transfer matrix method is a powerful tool in statistical mechanics. We applied it to calculate the partition function Z and the numerator on the right-hand side of eq. (31). First, we introduce Then the partition function is given by · · · T (s i,L(i)−1 , s i,L(i) )T (s i,L(i) , s i,1 ). (34) Here, it is convenient to regard T (s i,j , s i,j+1 ) as a matrix element of T such as The matrix T is called the transfer matrix. Then the partition function is written as where λ ± are the two eigenvalues of T [i.e., λ + = 2 cosh K(i) and λ − = 2 sinh K(i)]. Similarly, the numerator on the right-hand side of eq. (31) is evaluated as × T (s i,1 , s i,2 )T (s i,2 , s i,3 ) · · · T (s i,L(i) , s i,1 ), =Tr σT j σT L(i)−j , where σ = 1 0 0 −1 .
10 VOLUME 8, 2020 The correlation function is then given by For large L(i), the parenthesis in eq. (39) can be approximated by one. Then C i (j) = tanh j K(i).
By combining this expression with we obtain eq. (19) in the main text.

APPENDIX B PERFORMANCE COMPARISON OF MINOR-EMBEDDING FOR A DIFFERENT ANNEALING SCHEDULE
This appendix validates the proposed ME (ME iii) when a different annealing schedule in SA is used. Here, the temperature in the SA algorithm [see Algorithm 1] is taken as where T k is the temperature at the k-th MCS. We set the total number of the MCSs as n MCS = 10 4 , the initial temperature T ini = 10, and the final temperature T fin = 0.01. The initial temperature and the final temperature are the default values in the CMOS annealing machine. The decay rate of the temperature is determined by T 1 = T ini and T nMCS = T fin as r 6.9×10 −4 . In this annealing schedule, the temperature is lowered as an exponential function of MCSs. This is different from the annealing schedule in the main text, where the temperature linearly decays to zero. We use the STS given by eq. (26) to compare the performances among ME i, ME ii, and ME iii. Figure 7 shows the N L -dependences of the STS for each benchmarking problem. Similar to the main text, ME iii outperforms ME i and ME ii for all the benchmarking problems, implying that the results are independent of the annealing schedules.

APPENDIX C NL-DEPENDENCES OF THE OPTIMAL VALUES OF JC
This appendix discusses the N L -dependences of the optimal value J c , which minimizes the energy density of the logical Ising model . Here, we denote the optimal value as J (opt) c . Figure 8 shows the N L -dependences of J gradually increases with N L , but the increase is slower than that of the ME i.