Fast-Converging Simulated Annealing for Ising Models Based on Integral Stochastic Computing

Probabilistic bits (p-bits) have recently been presented as a spin (basic computing element) for the simulated annealing (SA) of Ising models. In this brief, we introduce fast-converging SA based on p-bits designed using integral stochastic computing. The stochastic implementation approximates a p-bit function, which can search for a solution to a combinatorial optimization problem at lower energy than conventional p-bits. Searching around the global minimum energy can increase the probability of finding a solution. The proposed stochastic computing-based SA method is compared with conventional SA and quantum annealing (QA) with a D-Wave Two quantum annealer on the traveling salesman, maximum cut (MAX-CUT), and graph isomorphism (GI) problems. The proposed method achieves a convergence speed a few orders of magnitude faster while dealing with an order of magnitude larger number of spins than the other methods.


I. INTRODUCTION
Simulated annealing (SA) has been studied as a possible technique for solving NP-hard problems [1].In [2], SA was applied to solve the traveling salesman problem (TSP) by finding solutions at the global minimum energy of an Ising model.The solution was based on invertible logic and was implemented with probabilistic bits (p-bits) that can achieve probabilistic bidirectional computing between inputs and outputs [3].This p-bit-based model for invertible logic was based on Boltzmann machines [4] and ground-state spin logic [5]- [7].Invertible logic with p-bits has been applied to solve several critical problems, such as integer factorization and neural network training [8], [9].Probabilistic device models of p-bits have been given using a probabilistic magnetic tunnel junction [10] and standard CMOS digital circuits [11], [12].
We can also apply p-bits to solve combinatorial optimization problems with SA on Ising models [13]; however, the effectiveness of p-bit-based SA has not been reported, such as its convergence speed and scalability.In this brief, we introduce a fast-converging SA method based on p-bits (spins).Our p-bits are designed using stochastic computing, a probabilistic computing technique using random bit streams [14], [15].The proposed method approximates the p-bit function based on an extension of stochastic computing called integral stochastic computing [16].More specifically, the tanh function in the p-bit model is approximated using a saturated up-down counter that can induce relaxed transitions of a spin state for fast convergence to the global minimum energy.We evaluate SA based on the conventional p-bits and the proposed stochastic computing methods for three typical combinatorial optimization problems with Ising models: the TSP, maximum cut (MAX-CUT) problem, and graph isomorphism (GI) problem.The proposed method can search for solutions to the optimization problems at lower energy than the conventional method, and it achieves a convergence speed that is a few orders of magnitude faster.In addition, we compare the proposed SA method to quantum annealing (QA) using a 504-qubit D-Wave Two machine [17] where the qubit corresponds to the spin.For the same number of iterations, the proposed method can solve the considered combinatorial optimization problems with roughly 16 times more spins (qubits) than QA.
This brief is organized as follows.Section II reviews p-bits and their application to the SA of Ising models.Section III introduces SA based on integral stochastic computing.Section IV describes the simulation conditions of SA for combinatorial optimization problems.Section V compares the proposed method with conventional SA and QA and discusses some insights into the proposed SA method.Section VI concludes this brief.

II. P-BIT-BASED SA FOR ISING MODELS
The state of a p-bit is represented as follows: where σ i (t + 1) ∈ {−1, 1} is a binary output signal, I i (t + 1) is a real-valued input signal, and n rnd is the noise magnitude of a random signal, r i (t) ∈ {−1:1}.A p-bit is treated as a spin of an Ising model.Fig. 1 shows SA for Ising models based on p-bits.I i (t + 1) is calculated using a spin bias h and weights J between spins as follows: where I 0 is a pseudoinverse temperature used to control the SA.This computation can be performed using a resistor network [3] or digital circuits [11].
Based on (1) and ( 2), SA is carried out for an energy function (Hamiltonian), which is described as follows: where the energy function is represented by an Ising model.In SA, the solutions of combinatorial optimization problems are embedded in the global minimum energy (E min ) of an Ising model.Before the SA process, h and J are determined from combinatorial optimization problems [18], [19].During the SA process, the σ i values are changed using random signals, which can decrease the energy of the Ising model.Finally, σ i can be found as the solution to the optimization problem at the global minimum energy.

III. SA BASED ON INTEGRAL STOCHASTIC COMPUTING
In this brief, p-bit-based SA for Ising models is performed using integral stochastic computing.Stochastic computing uses values represented by the frequency of 1's in bit streams [14], [15], [20] and has been used for several applications, such as low-density paritycheck decoders, image processing, digital filters, and deep neural networks [16], [21]- [25].Integral stochastic computing extends stochastic computing by using streams of integers (representing the sum of binary bit streams) to represent data values with a larger range than that of stochastic computing [16].
Based on integral stochastic computing, ( 1) and ( 2) are redefined in the proposed method as follows: Fig. 2 shows a spin-gate circuit based on integral stochastic computing, which corresponds to (4).Let us briefly explain (4) with Fig.In addition, the multiplication and tanh functions shown in ( 4) are designed based on integral stochastic computing.Let us denote by denotes the expected value of the random variable x.The multiplication, such as σ j • J i j , is designed using a two-input multiplexor, and the addition is designed using a binary adder.The random signal r i (t) can be generated using pseudorandom generators.
Rather than calculating tanh in (1) as in the conventional method, the tanh function is approximated using a saturated up-down counter in the proposed method.The output I i (t + 1) of (4a) can be truncated with a range of 2 • I 0 states to obtain Itanh i (t + 1), which determines the spin state σ i (t +1) using the sign block.Concurrently, Itanh i (t +1) is stored in the saturated up-down counter and is used in the next cycle.

A. Hamiltonian Design
Fig. 3 summarizes the design flow used to implement SA from problem to solution.First, a combinatorial optimization problem is represented by quadratic unconstrained binary optimization (QUBO) with binary variables x i ∈ {0, 1} as follows: where Q i j is an upper triangular matrix.In this brief, three combinatorial optimization problems are solved using SA, as shown in Fig. 4. The TSP aims to find the shortest route that visits each city once and returns to the original city [26].A QUBO model for a TSP with n cities is defined as follows: where A is 4 and B is 1 based on a constraint of A ≥ n • max(W uv ) and E is the set of edges between cities. x u,i = 1 is that a customer visits city u as the ith city, and W uv is the penalty between cities u and v.The reader is directed to [27] for further details of this equation.An example of a four-city TSP is shown in Fig. 4(a) with the shortest route of a → b → c → d.
The second problem is a MAX-CUT program [28].It is defined on a graph G = (V, E), where V = {1, 2, . . ., n} and Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.E ⊂ {(i, j ) : 1 ≤ i < j ≤ n} are given.Let the edge weights w i j = w ji be given such that w i j = 0 for (i, j ) / ∈ E. The MAX-CUT problem aims to find a bipartition (V 1 , V 2 ) of V so that the sum of the edge weights between V 1 and V 2 is maximized.A QUBO model for MAX-CUT with n nodes is defined as follows: Fig. 4(b) shows an example of a five-node MAX-CUT problem with weights of −1 and +1.The graph is divided into Group A (nodes 1 and 5) and Group B (nodes 2-4), where the sum of the edge weights is 4.
The third problem solved in this brief is a GI problem, which determines whether two graphs are isomorphic.In the four-node GI problem shown in Fig. 4(c), Graphs 1 and 2 are isomorphic.The QUBO model is obtained based on a vertex mapping penalty (C 1 ) and an edge inconsistency penalty (C 2 ) [13].It is defined as follows: where x u,i ∈ {0, 1} for every possible mapping of a vertex u in Graph 2 to a vertex i in Graph 1.An example of a C 2 penalty for edge inconsistencies is that when vertexes u and v in Graph 2 are mapped to i and j in Graph 1, an edge exists in one graph and not the other (e.g., u, v ∈ E 2 , while i, j / ∈ E 1 ).Fig. 5 shows an example of a three-node GI problem and its QUBO model.As the number of nodes N = 3, the number of spins required for SA is N 2 = 9.In this example, these two graphs are not isomorphic.Q i j is a 9 × 9 matrix obtained based on (8).Once the QUBO model is determined, it is converted to a Hamiltonian of an Ising model with h and J described in (3) as follows: )

B. Simulation Condition
In the proposed SA process, ( 4) is calculated to obtain the global minimum energy of the Ising model.In contrast, in the conventional SA process, (1) and ( 2) are calculated.To reach the global minimum energy, a pseudoinverse temperature, I 0 , is controlled within the range I 0min to I 0max for each iteration shown in Fig. 6.In each iteration, I 0 is gradually increased according to I 0 (t + τ ) = (1/β) • I 0 (t).When I 0 is small, the spin state, σ i , is easily flipped between "−1" and "1."In contrast, when I 0 is large, σ i can be stabilized.The conventional and proposed SA methods are evaluated using MATLAB R2020b on an 8-core Intel Core i9 at 2.4 GHz and 64-GB memory.

A. Traveling Salesman Problem
To find the global minimum energy at a high probability, a parameter search is randomly performed for β, I 0min , and n rnd .Fig. 7 summarizes the probabilities of convergence to the global minimum energy (P conv ) versus β in a four-city TSP for 1000 cycles with τ = 10.As shown in Fig. 7, the proposed SA method based on integral stochastic computing results in high convergence probabilities for many parameter search results in comparison with those of the conventional SA method with the original p-bit.Based on the parameter search, the conventional and proposed (prop1) methods select the parameters summarized in Table I for performance evaluation.In addition, the prop2 value determined manually is used only with integer values of I 0 .A simple data representation would be helpful in designing the hardware of the proposed method, as a topic for future research.
Fig. 8 shows P conv versus the average cycles of four-city TSPs in the conventional and the proposed SA methods.One-hundred arbitrary TSPs are randomly generated to evaluate P conv .As the average number of cycles increases, P conv increases.The proposed SA methods (prop1 and prop2) achieve 90% of P conv , and they are 90 times and 25 times faster, respectively, than the conventional SA method.

B. MAX-CUT Problem
A MAX-CUT problem (G11) [29] is solved using an SA method without p-bits [30] and using the p-bit-based methods (conv and prop2).G11 contains 800 vertices and 1600 edges, where the edge weights are −1 or +1.Let us briefly explain the simulation conditions of SA.The temperature T is gradually decreased by IT as T ← 1/(1/T + I T ) at each cycle, where the initial temperature is 1 and the final temperature is 1/10 000.The number of cycles varies depending on IT .In each cycle, a vertex state is randomly flipped, and a new state is accepted if the new energy (E new ) is lower than the current energy (E cur ) or higher with a probability of exp(−(E new − E cur )/T ).
Fig. 9(a) shows the mean edge weights cut versus the number of cycles for 100 trials.The proposed method (prop2) rapidly obtains near-optimal solutions of the edge weights, where the best edge weight is 564, as reported in [30].In contrast, near-optimal solutions are not obtained by the conventional method (conv).SA gradually increases the mean edge weights with a larger number of cycles, while each cycle requires less computation than the p-bit-based methods.Hence, the mean edge weights versus simulation time are also evaluated in Fig. 9(b).The proposed method is 45 times faster than SA in obtaining near-optimal solutions.

C. GI Problem
Fig. 10(a) shows P conv versus the number of nodes in each graph in GI problems for 100 000 cycles.One-hundred arbitrary problems are randomly generated for each number of nodes in accordance with [17].In conventional SA, P conv drops at more than five nodes in each graph, where five nodes correspond to 5 2 = 25 spins.In contrast, in the proposed SA method, a high P conv is maintained until there are 20 nodes in each graph.Notably, prop2 achieves P conv of 70% at 20 nodes in each graph, where 20 nodes correspond to 400 spins.As a result, the proposed SA method is capable of solving problems with 16 times more spins than conventional SA.Fig. 10(b) compares the average number of cycles in reaching the global minimum energy.The average number of cycles of the conventional SA method rapidly increases at five nodes in each graph, while that of the proposed SA method remains small.The proposed annealing method (prop1) achieves an approximately 1000-fold smaller average number of cycles than conventional annealing at six nodes.
We next compare SA methods with QA [31].In QA, a 504-qubit D-Wave Two machine was used to solve GI problems for a number of annealing and readout cycles (iterations) of 40 000 [17].Note that a qubit in QA corresponds to a p-bit (spin) in SA.Fig. 11 summarizes P conv versus the number of nodes in each graph for 40 000 iterations.QA shows a slightly better P conv than conventional SA.In [17], a modified Hamiltonian with a small number of qubits was used to increase P conv instead of the complete Hamiltonian described in (8).In addition, a postprocessing technique was carried out after QA, such as majority voting.P conv with these techniques is referred to as "QA w/small H" and "QA w/small H and postprocessing," and both show better performance than QA.The proposed SA method can solve problems with 4 times and 16 times larger numbers of nodes and spins, respectively, than QA at P conv = 100%.Compared to QA with a small Hamiltonian, the proposed method with the complete Hamiltonian can handle two times more nodes.
It is expected that if the small Hamiltonian were used in the proposed SA method, the performance difference would be larger.As the number of nodes can be increased from 5 to 13 in QA [17], the proposed SA method would be able to deal with approximately 50 nodes using small Hamiltonians.Performance comparisons of GI problems with conventional simulated and QA methods [17] with 40 000 iterations.

D. Discussion
Let us discuss the effectiveness of the proposed SA method based on integral stochastic computing in comparison with the conventional SA method with the original p-bit.The conventional method calculates the tanh function of the p-bit exactly, while tanh is approximated using the saturated up-down counter in the proposed method.In general, stochastic-computing-based approximation causes more calculation errors than conventional computing (e.g., multiplication and addition) [14], [15], [20].However, the SA method could benefit from incurring calculation errors instead of using exact calculations without errors.Table II shows the simulation time versus cycles in the 20-node GI problem using the conventional (conv) and the proposed (prop2) methods.As stochastic computing approximates the tanh function of (1) using a saturated updated counter, the simulation time is approximately 30% shorter than that of the conventional method.
Fig. 12 shows an example of energy transitions in the four-node GI problems with τ = 20, where the energy is the Hamiltonian defined in (3).Table III summarizes the mean energies of SA in the GI problems with 4, 8, and 12 nodes.Based on the simulation results, the proposed SA method can search for solutions with lower energy than the conventional method.To discuss some insights on the proposed method, the mean energy transitions and spin-update probabilities are evaluated in the MAX-CUT (G11) and the 20-node GI problem, as summarized in Table IV.The mean energy transition is defined by averaging the energy differences between consecutive cycles, and the spin-update probability is defined by averaging the spin-state (σ i ) differences.The simulation results indicate that the proposed method can reduce unnecessary spin updates, leading to larger negative energy transitions than in the conventional method.It is anticipated that the stochastic approximation of the proposed method can induce relaxed transitions of spin states, which accelerates the speed of the energy drop to the global minimum energy.
Recently, a D-Wave machine with 5000 qubits has been reported to deal with larger Ising models [32].The number of connections to a qubit is increased to 15 from 6 in the 504-qubit D-Wave machine, but it is still limited to small numbers.In contrast, as the proposed method is designed using integral stochastic computing, it can be designed using CMOS digital circuits.This means that the number   of connections to a spin is not limited, unlike in QA.Hence, the proposed method could be more scalable than QA.Note that the MATLAB simulation used in this brief performs a bit-true operation that is equivalent to an operation on hardware.To verify the proposed SA method on actual hardware, a ten-node GI problem based on prop2 is implemented on a Digilent Genesys2 field-programmable gate array (FPGA) as a simple example.Regarding the hardware resources on the FPGA, the numbers of lookup tables (LUTs) and registers are 38 039 and 2891, respectively, using an 8-bit fixed-point data representation.The proposed method runs on the actual hardware at 50 MHz, and it can be approximately two orders of magnitude faster than the MATLAB simulation.A hardware architecture and large-scale implementation will be explored in future work.

VI. CONCLUSION
In this brief, we introduce fast-converging SA based on integral stochastic computing for Ising models.The proposed method approximates the p-bits (spins) using stochastic computing, which can search for solutions to combinatorial optimization problems around the global minimum energy.As a result, the proposed SA method achieves a convergence speed a few orders of magnitude higher than conventional SA with the original p-bit in three combinatorial optimization problems: TSP, MAX-CUT, and GI problems.Compared with the SA method without p-bits, the proposed method is 45 times faster in the MAX-CUT problem (G11).In addition, compared with QA using a 504-qubit D-Wave Two machine, 16 times more spins (qubits) in the GI problem can be solved by the proposed method in the same number of iterations.
Prospects for future research include simulating other combinatorial optimization problems using the proposed SA method to evaluate the effectiveness of the proposed method.A large-scale hardware implementation of the proposed annealing method would be interesting as a fast solver of real-world social issues represented as combinatorial optimization problems.

Fast-
Converging Simulated Annealing for Ising Models Based on Integral Stochastic Computing Naoya Onizawa , Member, IEEE, Kota Katsuki, Duckgyu Shin , Warren J. Gross , Senior Member, IEEE, and Takahiro Hanyu , Senior Member, IEEE Abstract-Probabilistic bits (p-bits) have recently been presented as a spin (basic computing element) for the simulated annealing (SA) of Ising models.In this brief, we introduce fast-converging SA based on p-bits designed using integral stochastic computing.The stochastic implementation approximates a p-bit function, which can search for a solution to a combinatorial optimization problem at lower energy than conventional p-bits.Searching around the global minimum energy can increase the probability of finding a solution.The proposed stochastic computing-based SA method is compared with conventional SA and quantum annealing (QA) with a D-Wave Two quantum annealer on the traveling salesman, maximum cut (MAX-CUT), and graph isomorphism (GI) problems.The proposed method achieves a convergence speed a few orders of magnitude faster while dealing with an order of magnitude larger number of spins than the other methods.Index Terms-Combinatorial optimization, graph isomorphism (GI) problem, Hamiltonian, Ising model, quantum annealing (QA), simulated annealing (SA), stochastic computing, traveling salesman problem (TSP).

Fig. 3 .
Fig. 3. Flow of SA for a combinatorial optimization problem.A QUBO model is formulated for the problem.

Fig. 4 .
Fig. 4. Problems for SA solved in this brief: examples of (a) TSP with four cities, (b) MAX-CUT program with five nodes, and (c) GI problem with four nodes.

Fig. 5 .
Fig. 5. Example of a three-node GI problem: (a) two graphs that are not isomorphic and (b) QUBO model of the problem based on (8).

Fig. 9 .
Fig. 9. Mean edge weights cut versus (a) number of cycles and (b) simulation time on the MAX-CUT problem G11.The performance is evaluated with 100 trials each for the SA method and the p-bit-based methods (conv and prop2).

Fig. 10 .
Fig. 10.Performance comparisons of GI problems for 100 000 cycles: (a) P conv versus number of nodes in each graph and (b) average cycles versus number of nodes in each graph.

Fig. 12 .
Fig. 12. Example of energy transitions in the four-node GI problem with τ = 20: (a) conventional and (b) proposed (prop2).The proposed method can search for a solution around the global minimum energy.

TABLE II SIMULATION
TIME (S) VERSUS CYCLES IN THE 20-NODE GI PROBLEM.THE PROPOSED METHOD PERFORMS APPROXIMATELY 30% FASTER THAN THE CONVENTIONAL METHOD

TABLE III MEAN
ENERGY OF SA IN THE GI PROBLEMS

TABLE IV MEAN
ENERGY TRANSITIONS AND SPIN-UPDATE PROBABILITIES IN THE CONVENTIONAL AND PROPOSED METHODS