Ising Machines’ Dynamics and Regularization for Near-Optimal MIMO Detection

Optimal MIMO detection is one of the most computationally challenging tasks in wireless systems. We show that new analog computing approaches, such as Coherent Ising Machines (CIMs), are promising candidates for performing near-optimal MIMO detection. We propose a novel regularized Ising formulation for MIMO detection that mitigates a common error floor issue in the naive approach and evolve it into a regularized, Ising-based tree search algorithm that achieves near-optimal performance. By means of numerical simulation using the Rayleigh fading channel model, we show that in principle, a MIMO detector based on a high-speed Ising machine (such as a CIM implementation optimized for latency) would allow a higher transmitter antennas (users)-to-receiver antennas ratio and thus increase the overall throughput of the cell by a factor of two or more for massive MIMO systems. Our methods create an opportunity to operate wireless systems using more aggressive modulation and coding schemes and hence achieve high spectral efficiency: for a <inline-formula> <tex-math notation="LaTeX">$16\times 16$ </tex-math></inline-formula> MIMO system, we estimate around <inline-formula> <tex-math notation="LaTeX">$2.5\times $ </tex-math></inline-formula> more throughput in the mid-SNR regime (≈12 dB) and <inline-formula> <tex-math notation="LaTeX">$2\times $ </tex-math></inline-formula> more throughput in the high-SNR regime (>20 dB) as compared to the industry standard, a Minimum-Mean Square Error (MMSE) linear decoder.


I. INTRODUCTION
Wireless technologies have recently undergone tremendous growth in terms of supporting more users and providing higher spectral efficiency, with the next generation of cellular networks planning to support massive machine-to-machine communication [1], large IoT networks [2], and unprecedented data rates [3].The number of mobile users and data usage is rapidly increasing [4], and while data traffic has been predominantly downlink, the volume of uplink traffic is becoming ever higher [5] due to the emergence of interactive services and applications.The problem of optimal and efficient wireless signal detection in a Multiple-Input, Multiple-Output (MIMO) system is central to this rapid growth and has been a key interest of network designers for several decades.While the optimal Maximum Likelihood (ML) detector is well known, it attempts to solve an NP-Hard problem [6] exactly, and so its implementation is usually impractical and infeasible for real-world systems.These computational challenges have prompted network designers to seek optimized implementations such as the Sphere Decoder [7], [8], or sub-optimal approximations with polynomial complexity.These methods include linear detectors (like the Minimum Mean Square Error decoder (MMSE) [9]) which perform channel inversion, successive interference cancellation (SIC) based techniques [10] that focus on decoding each user sequentially while canceling inter-user interference, approximate tree search algorithms like the Fixed Complexity Sphere Decoder (FSD) [11] and Latticereduction based algorithms which involves pre-processing the channel to produce a reduced lattice basis [12].However, even today, practical methods that achieve near-optimal performance for systems with large numbers of both users and antennas are lacking [13].
Instead, Massive MIMO systems [14], [15], where the number of base station antennas is much larger than the number of users, have emerged as the dominant solution to the poor bit error performance of practically-feasible MIMO detectors.Such systems have extremely well-conditioned channels, and hence even linear detectors like MMSE achieve near-optimal BER performance [16], [17].While this enables very low BER without any significant computational load, unfortunately, it comes at the cost of extreme under-utilization of available radio resources, as we can potentially support much larger spectral efficiency by concurrently serving more uplink users (up to the number of base station antennas).The existence of a near-optimal MIMO detector with practical complexity for large MIMO systems would enable the expansion of massive MIMO systems to serve more users by having a base station antenna-user ratio of less than two, improving the overall uplink throughput of the system several-fold.
In the Computer Architecture and Physics communities, the last decade has seen a rise of a novel class of analog computers that use the dynamics of a physical system to heuristically find solutions to optimization problems framed as instances of the Ising model, one of the most studied frameworks for magnetism in statistical mechanics.These methods include Quantum Annealing [18], [19], Coherent Ising Machines (optical or opto-electronic systems) [20], [21], [22], [23], [24], [25], and Oscillator-based Ising Machines [26].These already show promise as practical computational structures for addressing some NP-hard problems arising in practical applications.The application of Quantum Annealing (QA) and Ising machines to MIMO detection is starting to be investigated in the last couple of years [27], [28] and has shown promising results.The QuAMax MIMO detector [29] leverages quantum annealing for MIMO detection.A classicalquantum hybrid approach to QA-based ML-MIMO was proposed in [30].QA has also shown promising results for other tough computational problems in wireless systems like Vector Perturbation Precoding [31].In [32], authors explore the use of Parallel Tempering for Ising-based MIMO detection (ParaMax), improving the performance of QuAMax.However, these recent works [29], [32] that leverage a straightforward mapping of ML-MIMO decoding problem to the Ising model experiences an error floor in the bit error rate (BER) versus the signal-to-noise ratio (SNR) characteristics.
In this paper, we observe that this error floor is present in the regime of practical deployment for MIMO detection.More specifically, in the regime relevant for real systems (uncoded BER of 10 −3 − 10 −6 ), even if we dismiss the limitations of non-idealized physics-based Ising solvers, depending on the SNR, there are many interesting scenarios in which they would not serve as good MIMO detectors in practical systems if the known Ising formulation of the ML-MIMO detection problem is used.Hence we propose a novel regularized Ising formulation of the ML-MIMO problem, using a low-complexity approximation, that significantly improves the BER performance of the existing state-of-the-art (see Fig. 1 for an overview of our decoder architecture) and show that it mitigates the error-floor problem present in the existing state-of-the-art Ising machine-based MIMO detectors.
Our proposed methods improve upon the drawbacks of several popular MIMO detectors: the MMSE decoder has very low complexity but has extremely bad BER performance, and our proposed methods can vastly outperform MMSE by achieving near-optimal BER performance.While SICbased and FSD-based methods can provide better BER than MMSE, SIC requires a sequential implementation, and FSD has limited parallelism; hence, they cannot fully utilize the large pool of parallel computing resources available in the FPGA/ASIC/GPU-assisted BS implementations.In contrast, our proposed methods have a very high degree of parallelism and can efficiently utilize the large pool of parallel computing resources to meet the processing requirements of state-of-theart LTE systems.
It is important to note that while we discuss CIMs in this paper, there are a number of other promising Ising-machine technologies [33], including quantum annealing [18], [19], photonic Ising machines besides CIMs [34], [35], [36], [37], digital-circuit Ising solvers [38], [39], [40], [41], [42], bifurcation machines [43], oscillation based Ising machines [26], and memristor and spintronic Ising machines [44], [45], [46].Note that, another promising approach of using Quantum Approximate Optimization Algorithm (QAOA) for solving the maximum likelihood detection problem [47] has been proposed as well; however, current evaluation and bench-marking is limited to Single-Input-Single-Output (SISO) channels and very small MIMO scenarios.The main contributions of this paper are as follows: • We propose a novel regularised Ising (RI) formulation that enhances the probability of finding the ground state and provides significant gains in BER performance over the existing state-of-the-art.We show that it mitigates the error-floor problem present in the existing state-of-the-art Ising machine-based MIMO detectors.• We propose Regularised Ising MIMO (RI-MIMO), which is based on the proposed RI formulation, and show that it is asymptotically optimal and can achieve near-optimal performance, in the relevant BER regime (an uncoded BER of 10 −3 -10 −6 ), within practical complexities.• We further evolve RI-MIMO into Tree search with RI-MIMO (TRIM) that allows us to achieve better performance when the complexity of the underlying MIMO detection problem increases with higher-order modulations.• Finally, we estimate how a physical implementation of a CIM could employ these algorithms to bridge the gap between optimality and efficiency in MIMO detection and may realize a near-optimal MIMO detector, meeting the cellular-system-complexity and timing requirements of practical scenarios, years before quantum-annealing technology would reach the necessary maturity [48], [49].The rest of the paper is organized as follows.Section II describes the MIMO system model and the reduction of the ML-MIMO problem to an Ising optimization problem.Section III is a primer on CIMs.Section IV describes our novel Ising formulation and the proposed MIMO detection algorithms (RI-MIMO and TRIM).Section V contains an extensive evaluation of BER and spectral efficiency of our methods.We show that our techniques mitigate the error floor problem and can achieve the same BER as the Sphere Decoder for 16 × 16 MIMO with BPSK modulation.We study the impact of practical constraints, like finite precision, on the performance of CIM for MIMO detection.We perform extensive empirical experimentation for parameter tuning.We show that for higher-order modulations, our methods provide a 10-15 dB gain over the MMSE receiver.We evaluate the spectral efficiency of our methods with Adaptive Modulation and Coding (AMC) and show that our methods can achieve around 2.5× more throughput in the mid-SNR regime (≈12 dB) and 2× more throughput in high-SNR regime(>20 dB) than the industrially-viable MMSE decoding.We evaluate the BER and spectral efficiency of our methods for several massive MIMO systems (16 × 32, 32 × 64, 32 × 128, 64 × 128 and 64 × 256).We also evaluate our methods on the geometrybased QuaDRiGa [50] channel model.Finally, we conclude in Section VI.

II. SYSTEM MODEL
In this section, we will describe the MIMO system model, the MIMO Maximum Likelihood Detection (ML-MIMO) problem, and the transformation between the ML-MIMO problem and its equivalent Ising problem.
Consider the UL transmission in a MIMO system [51] with N r antennas at the base station (BS) and N t users, each with a single antenna.x o = {x 1 , x 2 , ...x Nt } T is the transmit vector where x i is the symbol transmitted by user i. y = {y 1 , y 2 , ...y Nr } T is the received vector where y j is the signal received by antenna j.Each x i is a complex number drawn from a fixed constellation Ω.The channel between user j and receive antenna i is expressed as a complex number h ij that represents the channel's attenuation and phase shift of the transmitted signal x j .Let H denote the complex-valued channel matrix, Fig. 1: Uplink Maximum Likelihood MIMO detection (ML-MIMO) using Ising Machines, illustrating the differences between the proposed regularised Ising approach and the direct application of the Ising formulation.
where n denotes Additive White Gaussian Noise (AWGN).
With AWGN, the optimal receiver is the Maximum Likelihood receiver [8] which is given by xML = arg min An Ising optimization problem [29] is quadratic unconstrained optimization problem over N spin variables, as follows: where each spin variable s i ∈ {−1, 1}, or in its vector form (RHS) s = {s 1 , s 2 , ...s N }, where all diagonal entries of the matrix J are zeros.
The minimization problem expressed in (2) can be equivalently converted into Ising form by expressing x using spin variables.The first step is derive a real valued equivalent of (1), which is obtained by the following transformation, The ML receiver described in (2) has the same expression under the transformation and the optimization variable x is real valued.Let us say x has N elements, which are drawn from a square M-QAM constellation, then each element of the optimization variable x takes integral values in the range The number of bits needed to express Ω r are given by r b = ⌈log 2 ( √ M )⌉.let s be an N * r b × 1 spin vector such than each element of s can take values {−1, 1}.Then, element j of x can be represented using r b spin variables {s j , s j+N ...s j+(r b −1)N }, Let us define the transform matrix T = 2 r b −1 I N 2 r b −2 I N ... I N , then x can be expressed as, For a rectangular QAM constellations and BPSK, the ℜ(x) and ℑ(x) in ( 4) have different range and ( 5) can be accordingly modified to construct the transform matrix T .We substitute (6) in the real valued ML problem to obtain the ising formulation for ML receiver.Let z = ỹ − HT 1N * r b + ( √ M − 1) H1 N , then the Ising problem for ML-MIMO is described by, h = 2 * z T HT, J = −zeroDiag(T T HT HT), (7) where zeroDiag(W) sets the diagonal elements of matrix W to zero.We further scale the problem such that all the coefficients lie in [−1, 1].The Ising solution can be converted to real valued ML solution using (6), which can be then converted to the complex valued solution for the original problem described in (2) by inverting the transform described by (4).

III. COHERENT ISING MACHINES (CIM)
In simple terms, an Ising Machine can be described as a module that takes an Ising Problem (Eq. 3) as input and outputs a candidate solution, according to an unknown probability distribution that depends on a few parameters.We refer to a single, independent run on the Ising Machine as an "anneal", borrowing nomenclature from the simulated/quantumannealing methods.After each run, the machine is reset.A common approach is to run several samples of a single Ising problem instance and then return the best-found solution (the "ground state") in the sample.
Coherent Ising Machines (CIMs), as originally conceived [21], [22], [23], implement the search for the ground state of an Ising problem by using an optical artificial spin network.Their baseline architecture encodes the Ising spins into a train of time-resolved, phase-coherent laser pulses traveling on an optical fiber loop, undergoing controlled interference between all pairs of wavepackets.The phase dynamics of the pulses is governed by the presence of a nonlinear element in the form of a degenerate-optical-parametricoscillator (DOPO).For the purpose of our work, this version of the CIM can be modeled using a system of stochastic differential equations [20].In this work, we will use a model of the CIM that describes its continuous-time limit evolution neglecting quantum effects, which has been proven to be fitting the experiments of multiple devices and represents a baseline setup for more sophisticated embodiments.Following Ref. [24], the in-phase (c i ) and quadrature (q i ) components of each signal-variable can be modeled using the following differential equations: where the normalized pump rate (p) are CIM parameters that relates to the laser used in the machine and can be tuned easily.
The constant C is typically fixed by design considerations (mostly by the power transmission coefficient and the laser saturation amplitude).J ij is the Ising coupling coefficient from the j th pulse to the i th pulse, which is programmable.The stochasticity is introduced through dW 1 and dW 2 , which are independent Gaussian-noise processes.The variable t is time (normalized with respect to the photon decay rate).
An Ising problem with the spin-spin-coupling matrix J is encoded in the CIM by setting the optical couplings ζij ∝ J ij .
The anneal consists of pumping energy into the system by gradually varying p. Heuristically this is implemented in a schedule at a speed that is some monotonic function of N .The solution to the Ising problem is read out at the end of the anneal by measuring the in-phase component of each DOPO c i , and interpreting the sign of each as a spin value s i , i.e. s i = sign(c i ).

A. CIM Simulator
To enable the study of how an ideal CIM would perform on solving Ising instances related to the application at hand (MIMO detection), we implemented a software simulator of a CIM that integrates the differential equations described in (9), using double precision numerics implemented in MATLAB.The annealing schedule consists in varying the pump parameter as p(t) = 2 * tanh( 2t N ).For the purpose of numerical integration, dt = 0.01 and the total anneal time is 128 * dt, which corresponds to 128 steps of numerical integration).In both numerics and experiments, we can also sample the state of the CIM at intermediate times (number of measurements N m ) to retrieve more candidate solutions for the programmed Ising problem and then select the best solution found; so each run actually evaluates N m = 128 candidate solutions.However, most of the time, the best solution corresponds to the one at the final measurement.Following the inspiration from design described in Ref. [24] we set C = √ 10.As the pump rate is gradually increased from 0 to 2, the amplitudes of DOPO pulses demonstrate a bifurcation and settle to either a positive (indicating spin +1) or a negative value (indicating spin -1).

IV. DESIGN
In this section, we propose the RI-MIMO detector, based on our novel regularised Ising formulation of maximumlikelihood MIMO receiver, which mitigates the error floor problem.We use a single auxiliary spin variable to transform the Ising problem into a form compatible with CIMs.We finally propose TRIM, a novel tree search algorithm based on RI-MIMO, which enhances the performance for higher-order modulations.

A. RI-MIMO: Regularized Ising-MIMO
As noted before, when we use Ising machines for MIMO detection, there is a finite probability of not returning the ground state of the problem, even at zero noise, and hence, we cannot decrease the BER beyond a certain limit (i.e., there is an "error floor").We can observe this error floor in existing works on ML-MIMO detection using a Quantum Annealer [29].In this section, we present Regularized Ising-MIMO (RI-MIMO) that mitigates the error floor and provides significant performance enhancements.
1) Regularization: The key idea is to add a regularisation term based on a low complexity estimate of the solution, which, as we will see later, will enhance the probability of finding the ground state of the Ising problem and hence improve the BER performance.The maximum likelihood MIMO receiver is given by ( 2).Let us say that we have a polynomial-time estimate (obtained by algorithms like MMSE or ZF) x P .Let s P be the spin vector corresponding to x P obtained from (6).We add to the Ising form a penalty term for deviations from the poly-time estimate, which would penalize non-optimal solutions in low noise scenarios, to obtain: (9) where r(ρ, M, N t ) is a regularization parameter dependent of the SNR, modulation and number of users.This style of regularisation falls in the class of generalized Tikhonov regularisation [52].We will look at the choice of r(ρ, M, N t ) in Section V-C.
The RI-MIMO-N a algorithm is then defined as follows: 1) Convert the ML-MIMO detection problem into the Ising form as described in Section II.2) Add the regularisation term as described by (9).
3) Perform N a anneals using an Ising machine.In contrast to RI-MIMO, a trivial way to lower the error floor is to increase the number of runs per instance of the problem.However, each additional anneal increases the overall computation time as well.Another simple approach is to use a direct combination of Ising machine-based ML-MIMO and MMSE, where we select the best solution among those not been fully edited and content may change prior to final publication.Citation information: DOI 10.1109/TWC.2022.3189604generated by the Ising machine and the MMSE solution.This removes the error floor but, at higher SNR, the reduction in BER is primarily due to the MMSE receiver and, as we will show later, our proposed techniques have much better BER performance.

B. Solution of an Ising problem having access only to programmable quadratic couplings
Not all CIMs are designed to solve Ising problems containing a bias term (h T s in (3)).In order to solve a general Ising problem using a CIM that does not natively support bias terms (although some, such as the implementation used in [22], do), we introduce an auxiliary spin variable s a and solve the following Ising problem: which contains no bias terms and can be solved using an Ising machine that doesn't support bias terms.(10) has two degenerate solutions: i=1 is the solution for the original Ising problem (3).Hence we can obtain the solution to the original Ising problem from the solutions of the auxiliary Ising problem.
Hence, J a (s, 1) = J(ŝ).Note that, the auxiliary Ising problem has two degenerate solutions (s, 1) and (−s, −1), where s is also an optimal solution for the original Ising problem.Therefore, we can obtain the solution to the original problem from the solutions of the auxiliary problem.

C. TRIM: Tree search with Regularized Ising-MIMO
A tree search algorithm for ML-MIMO receiver represents the process of finding the optimal solution by visiting various leaf nodes of a suitably constructed search tree.For an M-QAM constellation, there are M possible symbols for each user.We represent the M possibilities for user-1 by M nodes at depth 1.Each of these nodes has M children, which represent the M possibilities for user 2, and so on.Any path from the root to a leaf node will represent a candidate solution for the ML-MIMO problem.As noted before, there exist several techniques that involve traversing this search tree to obtain good quality solutions in polynomial complexity.
In this section, we build on the ideas of RI-MIMO, to further improve the performance of CIM based ML-MIMO receiver by proposing a hybrid tree search algorithm.The maximumlikelihood MIMO receiver described in (2) can be expressed using a QR decomposition, H = QR, xML = arg min where w = Q † y.If we fix the symbol corresponding to user , where R is an (N t − 1) × (N t − 1) upper triangular matrix, c is (N t − 1) × 1 and r is a scalar.Substituting this is (13) and simplifying, we obtain: This has the same form as (2) and can be solved using RI-MIMO techniques.The same procedure can be extended if we fix symbols corresponding to more than one user.We use this formulation for building a tree search algorithm, TRIM d -N a , augmented by RI-MIMO-N a .While solving the ML-MIMO problem, we consider all possibilities for d users, which is the same as considering all nodes in the search tree up to depth (d).For each branch of the search tree, once we fix the symbols corresponding to d users, we formulate the maximum likelihood problem for the remaining users as illustrated above, and solve it using RI-MIMO-N a , as illustrated in Section IV.In the end, we select the best solution out of all the candidate solutions obtained by RI-MIMO-N a sub-problems corresponding to various branches of the search tree.Note that for an M -QAM constellation, TRIM d -N a requires M d × N a anneals.

V. EVALUATION
In this section, we will perform an extensive evaluation of RI-MIMO and TRIM in various scenarios.We will study the performance of RI-MIMO considering some practical constraints of Ising Machines (IM), like the finite precision available to program the Ising coefficients.We will also evaluate RI-MIMO and TRIM for various modulation schemes, MIMO sizes, and under adaptive MCS (Modulation and Coding Scheme).
We will simulate (using MATLAB, see III-A) uplink wireless MIMO transmission between N t users with one transmit antenna each and a base station with N r receive antenna.We assume Rayleigh fading channel between them and additive white Gaussian noise (AWGN) at the receiver.We further assume, for simplicity, that the channel is known at the receiver and all users use the same modulation scheme.The BER of an N t × N r MIMO system is calculated as the mean BER of the N t independent data streams transmitted by N t users.

A. BER Performance
We start comparing the optimal decoder (the Sphere Decoder) and the linear MMSE decoder against RI-MIMO and the unregularized ML-MIMO using as a test case BPSK 16×16.This case will represent a baseline for our benchmarks and their sophistication.Note that a trivial way to remove the error floor is to take the better solution out of those generated by MMSE and CIM-ML-M a .We see from Fig 2 that RI-MIMO provides much better BER than CIM-ML, mitigating the error floor problem associated with it.We note that if we run concurrently MMSE and ML-MIMO for each instance and we select the best of both results (CIM-ML+MMSE), we are still less performant than RI-MIMO.TRIM performs similar to RI-MIMO when both algorithms execute the same number of total anneals.(TRIM 1 -32 vs RI-MIMO-64).However, as we will see later, this is not the case when higher modulations are used.The best performing algorithm is TRIM 1 -64, which achieves near-optimal performance for 16 × 16 and 20 × 20 MIMO with BPSK modulation.For 24×24 MIMO, we see that the performance gap between TRIM 1 -64 and Sphere Decoder increases, and a higher number of anneals are required to bridge the gap.Note that, for MIMO systems with a larger number of antennas, along with an increase in the number of anneals, increasing the depth of the tree search might be required to maintain the performance gains of TRIM.

B. Complexity of RI-MIMO: From Simulation to CIM Hardware Implementation
In this section, we explore the computational complexity of RI-MIMO and discuss the hardware considerations, for a real-world deployment of our algorithms, to satisfy the LTE requirements.While this study is performed with a simulator, our objective is to predict the performance and understand the trade-offs for designing a future hardware implementation of a CIM for ML-MIMO.To estimate the feasibility of MIMO decoding, we need to compare the computational requirements versus the perspective capabilities on a real-world perspective machine.
Infeasible scaling of FPGA/ASIC: Before we go forward and discuss the hardware requirements of a real-world CIM, let us look at why FPGA/ASIC-based solutions will not be enough to meet the requirements of future wireless systems.As stated before, the Sphere Decoder is known to be theoretically optimal, and there are several FPGA/ASIC implementations of the Sphere Decoder in the literature; however, it is known to have exponential average complexity, (unless we operate sufficiently under the capacity or have high SNR) [53], [54], and becomes impractical as the size of the MIMO system increases.The ETH SD [55] is known to be one of the most optimized implementations of the Sphere Decoder algorithm.Using the data in [55], we can approximately extrapolate the results of ETH SD by linearly extrapolating the logarithm of the average number of nodes visited (exponential scaling).As per the LTE specifications, a typical LTE deployment with 10 MHz bandwidth and 15KHz sub-carrier spacing produces 8400 MIMO detection problems (corresponding to 14 OFDM symbols per sub-frame and 600 sub-carriers) every sub-frame of 1-millisecond duration.Based on this, we compute the number of parallel ASIC/FPGA implementations of ETH SD to meet the LTE requirements (in Table I), illustrating that current silicon-based implementation can only support MIMO sizes like 8 × 8 with less than 25 parallel implementations, but larger systems like 64 × 64 will require a billion parallel instances of ETH SD, which is impractical.As a result, practical deployments of large MIMO systems end up using computationally efficient linear detectors like Zero Forcing or MMSE, but their BER performance is significantly worse.The inefficient scaling FPGA implementations of Sphere Decoder and unacceptable BER performance of linear detectors prompt us to explore novel hardware/algorithmic solutions and motivate us to study CIM-based MIMO detection.Returning to the discussion on hardware considerations for real-world deployment of our methods, it is important to note that the 8400 MIMO detection problems (generated every millisecond) are independent, so different instances can be solved in an embarrassingly parallel fashion if there are enough spin variables available in the machine.In Fig. 3(A   Ising instances need to be solved every 1 millisecond.Throughput Requirement gives the number of anneals (of instances each with N s spins) that the set of Ising machines needs to process in a millisecond, given by 8400 × N a /1 ms. and 3(B) , we look at the variation of BER with the Number of Anneals per instance (N a ) and try to characterize the value of N a required to provide near-optimal performance.We simulate the RI-MIMO-N a system, with BPSK modulation, for various values of N a .In Fig. 3(A) and 3(B), we illustrate the variation of RI-MIMO BER with N a for 16×16 and 20×20 MIMO with BPSK modulation at 5, 7.5, and 10 dB.We see that at first, the RI-MIMO-BER reduces with increasing N a and asymptotically approaches the optimal BER (Sphere Decoder).Based on the evaluation of the BER performance of our proposed algorithms, as discussed in the previous section, we see that we need to study the performance of our proposed methods using a number of anneals of the order of N a = 16 − 256 depending on the modulation and size of the MIMO problem.
To investigate the variation in the required number of anneals with an increase in the size of the MIMO system, we define the metric, Number-of-Anneals-to-BER (N a TB), which is equal to the number of anneals required to achieve a certain multiple of the optimal BER.In Fig. 3(C), we plot the variation in the number of anneals required to achieve 1.1×, 1.2×, and 1.5× the optimal BER as a function of the MIMO Size (N ).We see that, there is an exponential increase in N a TB with increasing MIMO size, which is consistent with the fact that the underlying decoding problem is NP-Hard.In Table II we list the regimes of MIMO operation that are compatible with plausibly near-term Ising machines.For each regime, the table gives a throughput requirement that a set of Ising machines would need to satisfy.The fundamental requirement we assume in all the regimes is that 8400 Ising instances must be solved within 1 millisecond.Given our findings from simulations of how many anneals (N a ) need to be performed to achieve suitable MIMO performance for each regime, we can characterize the requirement for the set of Ising machines in terms of how many anneals they must perform in total within 1 ms.have similar working mechanisms [33], including electronic oscillator-based Ising machines [26].However, we will now briefly give some interpretation of our results that is specific to CIMs, and in particular, time-multiplexed CIMs.In timemultiplexed CIMs such as those reported in Refs.[21], [22], [23], [56], the length of an anneal (128) corresponds to the number of roundtrips of pulses around an optical cavity.In Ref. [56], pulses representing spins are separated in time in a cavity by 200 picoseconds; in a CIM system with such a pulse spacing the time required for a single roundtrip is N s × 200 ps, and hence the time required for a single anneal is 128 × N s × 200 ps.If we consider the MIMO regimes for which Ising instances with N s = 64 spins need to be solved, a single anneal will take 1.638 µs.Therefore the number of 64spin anneals that a single time-multiplexed CIM with 200-ps pulse spacing can perform per millisecond is ≈ 610.Even if the roundtrip time was decreased by a factor of 20 (which is currently under experimental investigation by groups working on time-multiplexed CIMs), leading to a 20× increase in the speed of a single CIM and hence a 20× reduction in the number of CIMs needed to meet the throughput requirement, one would still need ≈ 176 copies of the CIM.While it is certainly conceivable that one could construct hundreds or thousands of copies of such a 64-spin time-multiplexed CIM, especially when considering on-chip implementations using integrated photonics [57], it is clear that meeting the throughput requirement is a large practical engineering challenge.
Returning to a discussion applicable to Ising machines generically, it seems likely that any Ising machine-be it a CIM or any other type-will struggle to meet the throughput requirement using just a single machine solving Ising instances serially because to solve 8400 instances in 1 ms serially, one needs to solve (up to the accuracy achieved using N a anneals in a CIM) each instance in 119 ns, and even for Ising instances with only 64 spins, this seems beyond any current machine or approach.Therefore it seems likely that multiple copies of an Ising machine running in parallel will be needed, and the question turns to how many parallel Ising machines are needed and how practical is operating such a set of machines at each LTE base station?This is a major open question and challenge for the Ising-machine community at large.

C. Optimal regularisation factor for RI-MIMO and Higher Order Modulations
In this section, we will discuss the tuning of the value for the regularisation prefactor for RI-MIMO in Eq. 9, r(ρ, M, N t ), relative to the magnitude of Ising coefficients of the original un-regularised problem.To maintain consistency of results, we normalize the Ising coefficients of the original problem to [−1, 1].Starting from the 16 × 16 BPSK baseline MIMO system, we compute performance for various values.
In order to determine the impact of modulation(M ), number of users (N t ) and SNR (ρ) on the optimal value of r(ρ, M, N t ), we look at BER vs regularisation factor for various MIMO sizes and modulations, while keeping SNR fixed at 10 dB and 15 dB in Fig. 4. We note that the BER reduces dramatically from r = 0 (unregularized) to around r = 0.1, beyond which the sensitivity of BER to choice of r is not much.We note that the optimal value is around 0.15, which acts as a threshold: for larger r the BER performance is only slightly affected.Based on these observations, for practicality, in our benchmarks, we will be using r(ρ, M, N t ) = 0.15, irrespective of SNR, modulation, and the number of users.In a practical system,  similar experiments can be used to construct a lookup table for the optimal value of r as a function of M, N t and ρ.
Using the prescriptions above, Fig. 5 provides the BER performance of RI-MIMO and TRIM for a 12×12 and 16×16 MIMO system with 4 QAM and 16 QAM modulations.We see that the error floor problem appears even for higher modulation, and RI-MIMO successfully mitigates it.Note that the difference between RI-MIMO and MMSE reduces as the modulation order increases.However, TRIM provides superior performance for the same number of total anneals and provides 10-15 dB gain over MMSE.Finally, we expand our evaluation to have even more antennas and users (32 × 32 and 64 × 64 MIMO) in Fig. 6 and see similar performance trends for TRIM, RI-MIMO and MMSE as observed for our baseline 16 × 16 MIMO.

D. Finite Precision of Ising Machines
Physical realizations of CIMs typically involve analog control of the couplings between spins.Consequently, the precision in specifying the coefficients of an Ising problem (J ij ) can be limited.Currently, published results of optical hardware implementations of CIMs have only been for problems where the Ising coefficients take values {−1, 0, 1}, and even digital optimized implementations on GPUs are limited to floats encoded with ≈ 5 bits.While the precision may be improved, any analog machine will have a rather strict practical limit on how much precision can be achieved.Current quantum annealers have a similar limit on specifying Ising coefficients accurately, which can lead to drastic deterioration of performance [58].In this subsection, we look at the performance of RI-MIMO under the constraint that Ising coefficients can be expressed using K bits only.All Ising coefficients are normalized to [−1, 1]; one bit is used to indicate the sign (positive/negative), and K-1 bits are used to express the decimal part of the Ising coefficient.We simulate RI-MIMO with limited precision CIM for the baseline 16×16 BPSK MIMO system in Fig. 7(left).We see that 5 bits of precision are enough for achieving performance similar to double precision for the given system.If we reduce precision below 5 bits, then the performance starts to deteriorate.It is interesting to note that, even with 2-bit precision, where the Ising coefficients can take values {−1, 0, 1} only, RI-MIMO performs much better than MMSE.Note that for a more complicated system with more users or higher modulations, the precision requirement might be higher -but we leave the analysis to a future study.Independently from the optimization solver, we could study the "Resilience" [59] of a problem as a metric to study the changes in ground state configuration due to noise in representing Ising coefficients.In Fig. 7(right), we look at the L 0 distance between the original ground state and the ground state of the Ising problem represented with finite precision.It is a natural metric for wireless applications; as for BPSK, it essentially represents the BER.In Fig. 7(right), we observe for an indicative 8 × 8 BPSK instance set that L 0 deviation is nearly zero for 5 bits of precision and that  the regularized Ising formulation is more resilient than the unregularized.

E. Massive MIMO
In this section, we will address the need and performance of RI-MIMO/TRIM for massive MIMO systems.Massive MIMO systems tend to have a much higher number of antennas at the receiver than at the transmitter.Due to this, the channel is extremely well-conditioned, and even linear detectors such as MMSE or iterative algorithms can perform extremely well [60].However, having a large number of antennas at the base station and yet supporting a small number of users reduces the overall cell throughput.Hence, the low BER of linear methods comes at the price of the total number of users served.We will see that for massive MIMO systems, where the linear methods are not optimal, RI-MIMO can be used to obtain optimal performance with very low complexity.In Massive MIMO systems, where the ratio between the number of receiver antennas and the number of transmitter antennas is three or more, MMSE can provide performance similar to the Sphere Decoder.However, the total cell throughput achieved is much lower than what may be possible, with near-optimal MIMO detection, in an equivalent large MIMO system (where the number of receiver antennas and transmitter antennas is the same).In Fig. 8 we look at massive MIMO systems with 16 antennas at the base station and BPSK modulation.We see that for 8 × 16 and 10 × 16 MIMO with BPSK, the MMSE decoder is sub-optimal, and RI-MIMO can provide near-optimal performance at very low complexity.In Fig. 9, we expand our evaluation to more realistic massive MIMO scenarios with 128 and 256 antennas at the base station using BPSK, 4-QAM, and 16-QAM modulations.Note that, for such systems, we are unable to compute the optimal BER due to extremely large complexity of the Sphere decoder.We note that, RI-MIMO performs much better than MMSE for such systems when BPSK or 4-QAM modulation is used.However, as noted before, the performance gains are much less for 16-QAM modulation.Improving the performance of our methods for 16-QAM and higher modulations is the key focus of our future work.
Before we go ahead and look at spectral efficiency, note that if we do not consider the MMSE solution as a candidate, then RI-MIMO will have an error floor as well.However, due to a much higher probability of success than the conventional approach, the error floor associated with RI-MIMO will be much lower and reduced much more quickly, to very low values, in comparison to the conventional approach.We see from Fig. 10 that it take just 4-6 anneals per instance (much less than the conventional approach) for a 16 × 16, 20 × 20, and 24 × 24 MIMO system with BPSK modulation, to reduce the error floor to 10 −5 .In contrast, the error floor associated  with CIM-ML reduces very slowly, and even with 64 anneals per instance, it remains around 10 −3 (as seen in Fig 2).If we consider the MMSE solution as a candidate, then there is no error floor, as MMSE will make sure that BER goes to zero as SNR goes to infinity.

F. Throughput with MCS adaptation
In this section, we will try to look at the spectral efficiency of RI-MIMO.In a practical system, the Modulation and Coding Scheme (MCS) associated with the data transmission is selected dynamically based on the SNR experienced by the user.Usually, the BS has a list fixed set of (modulation, coding rate) and selects the suitable tuple based on Channel State Information (CSI) measurements.To emulate such a system, we consider several MIMO systems that use feedforward convolutional coding with 1500-byte packets for data transmission.
Usually, there is an Adaptive Modulation and Coding (AMC) module that selects a suitable MCS based on the CSI measurements to maximize the throughput performance.For this section, we ignore the complexities of the AMC module and assume that there is an oracle AMC that always selects the best MCS such that the overall throughput is maximized at any given SNR.The AMC module can select among BPSK, 4-QAM, or 16-QAM modulation, and among convolutional coding rate.We see from Fig 11(left) that RI-MIMO and TRIM allows us to operate using aggressive modulations and coding schemes and hence achieve much better performance.In particular RI-MIMO achieves around 2.5× more throughput in low-SNR regime (≈ 7.5dB) and 2× more throughput in mid-SNR regime( > 15dB).In the high-SNR ( > 20dB) both MMSE and RI-MIMO-64 seem to provide similar throughput, because RI-MIMO-64 is not sufficient for 16 QAM modulation (as noted before).However, TRIM allows us to get good performance with 16 QAM and provides a 2× throughput gain in the high-SNR regime.With Increasing SNR, the channel capacity also increases; hence we would expect the AMC module to use more aggressive modulations in order to achieve the best possible capacity.Consequently, improving the performance of RI-MIMO/TRIM for 16 QAM and higher   to 32 × 32 MIMO and increase the overall cell throughput by twice.Note that, even though the spectral efficiency of MMSE and TRIM is similar in high-SNR conditions for 32 × 32 MIMO, it is not feasible to expand 16 × 32 MIMO to 32 × 32 MIMO with just MMSE because, unlike TRIM, the performance of MMSE is terrible in low SNR situations.In Fig. 12, we simulate massive MIMO scenarios which resemble real-world deployments with 64, 128, and 256 antennas at the BS.We see that, RI-MIMO can provide significant throughput gains over MMSE in the low-SNR and mid-SNR regimes.We also observe that throughput of RI-MIMO is similar to

G. Evaluation on QuaDRiGa Channel Model
In this section, we evaluate our methods on more realistic channels generated by the QuaDRiGa simulator [50].The network topology is as follows: a single base station with an "ula8" antenna array consisting of eight antennas at the center of a hexagonal cell situated at the height of 10m.The channel instances are calculated assuming the users are distributed randomly in a radius of 50m around the base station.Each user is equipped with a single "Omni" antenna at a height of 1.5m.The system operates at a frequency of 2.6GHz, and we assume flat fading channels.We observe in Fig. 13, that RI-MIMO/TRIM provides significant gains over MMSE over the QuaDRiGa channels as well.We further note that the performance gains of RI-MIMO are severely affected for 16-QAM, as noted previously as well, and improving the performance for 16-QAM and higher modulations remains the key focus of our future work.These observations are consistent with our previous experiments using Rayleigh fading channels.Although the exact BER characteristics might differ due to changes in the statistical properties of the channel, we do not observe any significant changes in the relative performance of different methods.

H. Statistical Comparison of Direct and Regularised Ising Formulations
In this section, we will empirically study the distribution of solutions returned by our method.In Fig. 14, we simulate 16 × 16 MIMO, BPSK modulation, 10 dB SNR, and N a = 1024.As all stochastic optimization algorithms, we are sensitive to the existence and number of locally optimal states in the cost function landscape.For a given MIMO instance, characterized by (H, y), we define M k = the number of distinct states returned by the CIM machines over k anneals.Note that the ground state may or may not be a part of these M k states.M k is a random variable over the MIMO instances, and we look at the distribution of M 1024 in Fig 14 (left); we see that our proposed RI formulation drastically reduces the number of locally stable states returned by the CIM solver.Intuitively this could be at the origin of the observed decrease in the probability of getting stuck in a local minimum.Next, we look at the "quality" of various local minima in terms of how many of its spins differ from the ground state, i.e., the L 0 Hamming distance from the ground state (L gnd 0 ).In Regularised Ising (RI) formulation, we penalize the deviations from the MMSE solution with the goal of increasing the energy of locally stable states that are "far away".This is indeed what we observe in Fig. 14(center); we see that locally stable states of the RI formulation are more likely to be closer to the ground state, and the probability of success (equal to the probability of finding a solution at zero distance) is much higher for the RI formulation.As a result, both the frequency of finding the ground state and the quality of locally stable states is higher for RI formulation, which leads to a better BER performance and a much lower BER floor (as seen in Fig. 10).In Fig 14(right), we look at the impact of tuning the regularization factor r(ρ, M, N t ).The regularisation term penalizes the deviations from the MMSE solution, and a high regularisation factor should limit the search around the MMSE solution only.We observe the same in Fig 14(right); as r increases, the solutions obtained by the CIM are closer to the MMSE solution.However, choosing r to be very high will limit the search to the MMSE solution, and CIM will miss the ground state more often; our empirically optimized choice of r = 0.15 strikes a balance and optimizes for the best BER.

VI. CONCLUSION
In this paper, we explore the application of Coherent Ising Machines (CIM) for maximum likelihood detection for MIMO detection.We see that previous approaches used by MIMO detectors based on the Ising model suffer from an error floor problem and, unless many repetitions are allowed, does not have a satisfactory Bit Error Rate (BER) performance in practice.We propose a novel Regularized Ising approach and show that it mitigates the error floor problem and is a viable method to be implemented on the hardware implementation of Coherent Ising Machines.We propose two MIMO detection algorithms based on regularised Ising approach (RI-MIMO and TRIM).We demonstrate, using a CIM simulator, that our algorithms can outperform the previous Ising approach and have the potential to achieve near-optimal performance for large MIMO systems.
While our experiments use a CIM simulator, we have identified the constraints that need to be satisfied by a physical implementation of the CIM to meet the processing requirements of an LTE system.As a next step, we plan to evaluate our methods on more recent extensions to the CIM (e.g., amplitude-heterogeneity correction [61]) and on an experimental CIM implementation [22].
In conclusion, our results are complementing the mounting evidence from generic spin-glass benchmarks [49], [62], [56], further indicating that Coherent Ising Machines (using the proposed Regularized Ising approach) are a promising candidate for providing a superior alternative to the existing MIMO detection approach and achieving near-optimal performance for practical systems with a large number of users and antennas, although performance and engineering aspects need to be substantially improved for practical deployment, especially for higher modulations.

)Fig. 3 :
Fig. 3: BER vs. N a for (A) 16 × 16, (B) 20 × 20 MIMO and BPSK modulation at 5, 7.5 and 10 dB SNR: Illustrating that the BER of RI-MIMO reduces rapidly with increase in number of anneals per instance and asymptotically approaches the optimal BER.(C) N a TB vs. MIMO Size (N ) for N × N large MIMO systems: Illustrating the exponential growth in the number of anneals required to achieve a BER which is 1.1×, 1.2× and 1.5× the optimal BER.

Fig. 4 :
Fig. 4: Bit Error Rate at 10 dB and 15 dB SNR , illustrating the performance of RI-MIMO on Coherent based Ising Machines (CIM) for various value of regularisation factor with different MIMO sizes and modulation (using 64 anneals per instance).

Fig. 5 :
Fig. 5: Bit Error Rate Curves higher order modulation schemes, illustrating the performance of RI-MIMO and TRIM on Coherent based Ising Machines (CIM).TRIM executes total 64 anneals for each instance (same as RI-MIMO-64).

Fig. 7 :
Fig. 7: (left) Bit Error Rate Curves for 16×16 MIMO and BPSK modulation, illustrating the performance of RI-MIMO on Coherent based Ising Machines (CIM) with varying precision.(right) L 0 deviation from the ground state for 8 × 8 MIMO and BPSK modulation.

Fig. 8 :
Fig. 8: Bit Error Rate Curves for various Massive MIMO system with BPSK modulation and 16 antennas at the base station, illustrating the performance of RI-MIMO.

Fig. 9 :
Fig. 9: BER comparison of MMSE and RI-MIMO-512 for massive MIMO systems with very large number of antennas at the base station.

Fig. 10 :
Fig. 10: Variation of Error Floor with the number of anneals if RI-MIMO is run without considering MMSE solution as a candidate for various MIMO systems, illustrating that it takes only a few anneals to reduce the resultant error floor to the very low values.In contrast, the error floor associated with CIM-ML reduces very slowly with an increasing number of anneals.

Fig. 11 :
Fig. 11: Spectral Efficiency: Comparison between throughput of MMSE, TRIM, RI-MIMO (using adaptive MCS for all) in large/massive MIMO scenarios: (left) 16 antennas at the base station and we use N a = 64, (right) 32 antennas at the base station and we use N a = 256.

Fig. 12 :
Fig. 12: Spectral Efficiency: Comparison between throughput of MMSE and RI-MIMO for massive MIMO scenarios with large number of antennas at the base station.

Fig. 13 :
Fig. 13: Bit Error Rate for 8 × 8 MIMO with QuaDRiGa channels, demonstrating that the performance trends for various tested algorithms are similar to the Rayleigh fading experiments.

Fig. 14 :
Fig. 14: Empirical distribution of the number of locally stable states and their L 0 distance from the ground state and the MMSE solution for 16 × 16 MIMO, BPSK, and 10 dB SNR.
4) Select the best solution from the candidate solutions (x 1 , x 2 ...x Na×Nm ) generated by the Ising machine (one from each measurement times the number of anneals N m × N a ) and the MMSE solution (x mmse ).

TABLE II
: Requirements for an Ising machine (or set of Ising machines) to meet the Ising mapping and LTE processing constraints for various MIMO systems.N a is the number of anneals required per Ising instance to obtain a satisfactorily accurate solution.N s is the number of spins in each Ising instance.In this table we assume that for all MIMO orders listed, 8400