Evaluation of Quantum Annealer Performance via the Massive MIMO Problem

Quantum annealing offers an appealing route to handle large-scale optimization problems. Existing Quantum Annealing processing units are readily available via cloud platform access for solving Quadratic Unconstrained Binary Optimization (QUBO) problems. In particular, the novel D-Wave Advantage device has been recently released. Its performance is expected to improve upon the previous state-of-the-art D-Wave 2000Q annealer, due to higher number of qubits and the Pegasus topology. Here, we present a comparative study via an ensemble of Maximum Likelihood (ML) Channel Decoder problems for MIMO scenarios in Centralized Radio Access Network (C-RAN) architectures. The main challenge for exact optimization of ML decoders with ever-increasing demand for higher data rates is the exponential increase of the solution space with problem sizes. Since current 5G solutions mainly use approximate methodologies, Kim et al. leveraged Quantum Annealing for large MIMO problems with Phase Shift Keying and Quadrature Amplitude Modulation scenarios. Here, we extend their work and analyze experiments for more complex modulations and larger MIMO antenna array sizes. By implementing the extended QUBO formulae on the novel annealer architecture, we uncover the limits of state-of-the-art quantum optimization for the massive MIMO ML decoder. We report on the improvements and discuss the uncovered limiting factors learned from the 64-QAM extension. We include the enhanced evaluation of raw annealer sampling via implementation of post-processing methods in the comparative analysis between D-Wave 2000Q and the D-Wave Advantage system.


I. INTRODUCTION
Quantum Computers can harness the processing capabilities of quantum mechanics to speed up calculations for complex mathematical problems [2]. Although we are yet to achieve universal large-scale quantum computation, today's Noisy Intermediate-Scale Quantum (NISQ) devices can already be used in medium-sized experimental setups. A Quantum Annealer (QA) [3]- [5], one of the promising heuristic devices in this NISQ era, is capable of solving complex opti-The associate editor coordinating the review of this manuscript and approving it for publication was Luyu Zhao . mization problems using thousands of noisy qubits. In this paper, we study the performance of state-of-the-art Quantum Annealers for the telecommunication problem of decoding wireless physical channel transmission using large and massive Multiple Input Multiple Output (MIMO) [6] antenna arrays (see illustration in Fig. 1).
To support high transmission rates, modern wireless access points use spatial multiplexing with multiple antennas to transmit more than one data stream at once. In 5G networks, the application of MIMO antenna arrays is indispensable, however, as we increase the number of antennas, we also need to increase the computational power to be able to decode transmissions at the receiver [7]. This is also due to the complex modulation techniques employed at the transmitter. Maximum Likelihood (ML) is the optimal decoding of received symbols in a MIMO channel, as it is capable of minimizing the probability of bit errors, but it is also known to be NP-hard [8], [9]. Today's commercial massive MIMO antenna systems already contain antenna arrays large enough to face complex decoding problem. E.g., Ericsson's currently available antenna systems have 128 antenna elements (64T64R) in a 2D layout, integrated with up to 64 radio chains and capable of 256-QAM modulation scheme [10]. Furthermore, the size of the future antenna systems is expected to increase, especially when extremely large aperture arrays or holographic massive MIMO will take place [11].
In order to enable practical applicability of ML decoding for large antenna arrays, Kim et al. [1] explored the possibility of placing a Quantum Annealer within the data center of a Centralized Radio Access Network (C-RAN) [12] to provide solution to the NP-hard problem while still maintaining high throughput of the ML decoder. This requires formulating the ML decoding problem of the received symbols as a Quadratic Unconstrained Binary Optimization (QUBO) problem [13], [14] making it suitable for a QA. The resulting solution to the optimization problem can simply be mapped back to a bit string according to the constellation diagram of the receiver.
Our first goal is to use an extended methodology derived from [1] and a set of advanced modulation schemes, relevant for high-performance telecommunication scenarios. Hence, the formulae to convert 64-QAM modulated symbols to the Ising spin glass form, presented in [15], is implemented as the highest complexity problem class. Next, a set of experiments with increasing complexity are presented, defined for a comparative analysis between the D-Wave 2000Q and the recently released D-Wave Advantage platform. For the Advantage Quantum Processing Unit (QPU) [16] the number of qubits has increased to 5000, from 2000 of the previous generation, allowing for larger problem mapping. Since the new D-Wave Advantage QPU has not only more qubits but also a topology of significantly higher connectivity, the expectation is that it offers higher quality solutions to more complex QUBO problems as well. Both the increase in problem size to much higher user counts and the progression in problem complexity to more advanced modulation schemes can be tested to reveal the capacity gain limits of the new QPU in complex scenarios.
The structure of this work is as follows. Sec. II presents the theoretical background of Quantum Annealing and the Maximum Likelihood decoding methods. In Sec. III, we present the QUBO formulation of the MIMO channel decoding and its extension to 64-QAM modulation from [15].
In Sec. IV, we investigate methods for embedding QUBO formulated MIMO decoding problems. Next, in Sec. V, we present our experimental results for solving these problems on both the D-Wave 2000Q and the Advantage system. We also compare these results to conventional MIMO decoding techniques. Finally, in Sec. VI, we summarize our results and provide an outlook for potential future work.

II. THEORETICAL BACKGROUND A. ISING AND QUBO MODELS
The D-Wave Quantum Annealer can solve problems formulated as a QUBO or an Ising spin model. The Ising model describes a physical system of N binary spin variables s i ∈ {−1, 1} over the following configuration space: . The Ising Hamiltonian defines the energy of the system in a given spin configuration s ∈ via a sum of interaction and local energy terms in the following form: where h i is the bias of the ith spin variable and J i,j is the coupling strength between variables s i and s j . A reformulation of the above energy minimum search is used in [1] with the following notation: where f i and g i,j are the Ising model parameters andŝ is the minimum energy configuration. We can also refer to optimizations on Quantum Annealers as QUBO problems, as they are trivially equivalent to the Ising model in (1): where the symmetric matrix Q holds the coefficients of the binary decision variables (q i ∈ {0, 1}), having the useful property: q 2 i = q i andq is the solution bit string. Replacing s i in (1) with (q i · 2) − 1, we arrive at the same problem description.

B. QUANTUM ANNEALING
Quantum Annealing algorithms are a set of heuristic methods for finding a global minimum of a given objective function, VOLUME 9, 2021 using quantum mechanical evolution. The objective function is usually given in the form of an Ising Hamiltonian that encodes a combinatorial optimization problem [13].
In the now standard QA devices, the solution is obtained by first initializing the system in a superposition of all possible computational basis states with equal amplitudes that is stabilized by a transverse field.
Then the system evolves according to the time-dependent Schrödinger equation while the amplitudes keep changing as the problem Ising Hamiltonian is slowly introduced and the transverse field is slowly turned down. The still remaining transverse field enables the system to tunnel through the energy barriers so that it can reach lower energy states. Indeed, according to the Adiabatic Theorem, if the change of the coupling strengths of the Ising Hamiltonian and the transverse field is slow enough the system remains in the ground state of the momentary Hamiltonian throughout the annealing. Thus, for such a perfect annealing the resulting configuration of the system (when the transverse field is set to zero) is a minimal energy state of the Ising Hamiltonian, in case of imperfect annealing process the system can also end up in an excited state.
The D-Wave QPUs [17] implement an imperfect version of this process using a physical lattice of qubits and couplers referred to as the Chimera and Pegasus architecture (described in the Sec. IV-A). These systems perform thousands of anneals for each problem in a quick manner, which means they often leave the ground state. However, the hope is, that some of the samples obtained this way will reflect the minimal energy configuration of the problem's Ising Hamiltonian. Recent results of algorithmic benchmarking of QA architectures have been reported in [18]- [22], highlighting diverse application areas.

C. MAXIMUM LIKELIHOOD OPTIMIZATION FOR MIMO CHANNEL DECODING
We refer to a setup of N t users with single-antenna transmitters and an Access Point (AP) capable of receiving N r transmission symbols simultaneously, as a MIMO scenario of size N t × N r . Each user can send multiple bits with only one symbol using digital modulation techniques, such as Phase Shift Keying or Quadrature Amplitude Modulation, hence the transmitted symbols can be represented by complex vectors [23]. The set of possible values of the transmitted symbols is called the constellation O. The size of such a constellation grows exponentially as we increase the complexity (the number of transmitted bits per symbol) of the corresponding modulation scheme. We denote the vector of transmitted symbols from the user antennas asv ∈ C N t , and the vector of received symbols at the AP as y ∈ C N r . The channel matrix is denoted as H ∈ C N r ×N t . We get the received symbols by letting the channel matrix effect the transmitted symbols and adding Gaussian white noise (GWN): n ∈ C N r (y = Hv + n). Introducing v, the variable for possible transmitted symbols, andv, the vector of decoded symbols, the ML decoding [24] at the AP is: which is a search in a space of |O| N t . To regain the original binary message, one can use the constellation to get the decoded bit vectorb. The parameters f , g in (2) will be derived from the channel matrix and the received symbol vector, as detailed in Appendix VI. For the rest of this discussion, we shall only consider scenarios of N t = N r , since the qubit requirement only depends on N t , see Sec. III-A for more details. However, we note that, in general, N t = N r might be the case, and the methodologies described here work well with such setups. Although ML decoding would provide optimal detection of the transmission, in practice -due to the computational complexity -only approximations (e.g., Zero Forcing [25]) and heuristic methods (e.g., Sphere Decoding [26]) are used. As ML decoding can maximize throughput, successfully applying Quantum Annealing to speed up computation would undoubtedly yield an important real-world application of quantum computing.

III. OVERVIEW OF QUBO FORMALISM FOR LARGE AND MASSIVE MIMO OPTIMIZATION A. QUBO FORMULATION OF THE MIMO ML DECODING
QUBO formulation of the MIMO ML decoding (QuAMax transform [1]) requires assigning QUBO variable(s) to each symbol of the transmitted symbol vector. First, using the variable-to-symbol transform function T, one has to map each value in the constellation to logical qubits of the QUBO equation. Next, by expanding (4) via substitution of symbol vectors with the derived qubit equation, one can write the final QUBO form of the ML optimization problem.
For linear T, this process will provide at most quadratic terms, however, in case of Gray-coded transmission, 1 nonlinear transformation is necessary. Here, the method introduced by Kim et al. [1] is based on the linear mapping for the Ising transformation, but with an additional post-translation of the decoded bits ofb to finally restore the original, Graycoded message.
In case of higher-order modulations, more qubits are needed to encode one symbol. In particular, an N -QAM modulation requires exactly log 2 N variables per symbol for the linear mapping T. The total qubit requirement of encoding a symmetric MIMO setup of N t ×N r is N t log 2 N , giving us an estimate on the problem sizes that can be tackled by current Quantum Annealing hardware.
In the following, we give a brief overview of the basic modulations already covered in [1]. In addition, we include the extension of the QuAMax transform to the 64-QAM modulated symbol vectors presented in [15].

B. BPSK, QPSK AND 16-QAM
For the modulation scheme of Binary Phase Shift Keying (BPSK), the mapping T is a trivial conversion, as one can simply map each possible symbol v i ∈ {−1, 1} to 2q i − 1.
Higher order modulations have complex numbers as symbols, with exponentially increasing constellation size: i.e., each dimension can encode one bit. Therefore, mapping T requires two qubit variables: A more complex encoding is introduced by Quadrature Amplitude Modulation (QAM).
can encode 2 bits, we require 2 qubits per dimension for the mapping T: From these equations, substituting back to (4) gives us the expanded formulae of encoding the ML problem in QUBO form. In [1], the expanded QUBO coefficients are written explicitly for BPSK, QPSK and 16-QAM modulations.

C. THE 64-QAM MODULATION
Although 256-QAM modulation is already available in commercial mobile systems targeting Gbit per second data rates, such a complex modulation scheme requires very good radio conditions, i.e., high signal-to-noise-ratio (SNR); furthermore, it is often limited to mobile devices served at the very center of the mobile cells. Hence, mobile devices still utilize lower-complexity modulation schemes, as well. Although implementing the highest commercially available scheme would be desirable, we have to align with the limitations of current QPU hardware architectures, therefore we only increase the complexity to the next level compared to Kim et al. [1] via 64-QAM. This allows for a fair comparison with the already investigated implementations of BPSK, QPSK and 16-QAM by [1].
Here we extend our study to include the corresponding QUBO formalism presented in [15]. The usual values of 64-QAM modulated symbols are , which requires constellation of size 64 (see Fig. 9 in Appendix VI). A straightforward (and linear) variable-tosymbol transform is thus: The linearity comes at the price of disparity between the Gray-code and QuAMax, requiring additional transformation steps. The details of how this post-translation technique works can be found in Sec. 3.2 of [1]. The full expansion of (4) with 64-QAM is given in Appendix VI.

IV. EMBEDDING ONTO D-Wave QPUs A. D-Wave ARCHITECTURES
There are two types of publicly available QA architectures in D-Wave's current portfolio. D-Wave 2000Q is a model with up to 2048 physical qubits, accessible in a Chimera topology. The novel D-Wave Advantage architecture presents up to 5640 physical qubits in a Pegasus topology. As an illustration, Fig. 2 depicts two embeddings of a complete graph into first, a Chimera C 16 subgraph in Fig. 2a of four unit cells, and second, a Pegasus P 16 unit cell in Fig. 2b. While Chimera C 16 topology is composed of K 4,4 graphs in a 16 × 16 lattice, the recently released Advantage QPU has a Pegasus P 16 topology, which is more connected with nodes of degree 15 via programmable qubit couplers as opposed to previously available degree of 6.

B. MINOR-EMBEDDING METHODS WITH EXTENDED HEURISTICS
QUBO models can have arbitrarily high connectivity, hence mapping them directly to a QPU hardware topology is rarely possible. Instead, one needs to find optimal chains of physical qubits created using large negative couplings to represent the logical qubits. The process of finding such a problem mapping is called minor-embedding [27]. Finding the graph minor is NP-hard and since one needs to find it on a classical machine as part of pre-processing QUBO problems, one must employ a heuristic algorithm. One such heuristic method is the MinorMiner (MM) algorithm [28] developed for finding arbitrary graph minors. It can be used for finding minors of C 16 and P 16 as implemented in D-Wave's open-source framework, the Ocean SDK [29]. An example of embedding the same graph into C 16 and P 16 is illustrated in Fig. 2.
The quality of embedding is measured in the length of resulting chains and the uniformity of the chain length distribution. As MM is a heuristic algorithm, it has guarantees for neither of these properties. To ensure near-uniform chain lengths, Native Clique Embedding (CLIQUE) [30] can be used. It is a specialized algorithm for quickly finding embeddings of cliques onto C 16 and P 16 , which one can use to embed an arbitrary graph since any graph can be mapped to a complete graph. However, CLIQUE has a fixed upper limit for the maximal clique sizes that it can handle, which is 64 and 180 for C 16 and P 16 , respectively [31].
We note that despite the high number of available qubits, many factors can lead to high ratio of inactive nodes after an embedding. First, the special QPU topologies are hard to be fully utilized due to their sparse connectivity. Furthermore, since the actual hardware graphs can have manufacturing imperfections some graphs that would be embeddable into perfect C 16 or P 16 will not be embeddable in practice [32].
In order to embed larger QUBO problems that supersede the limitations of CLIQUE, we use two heuristic approaches that aim to yield higher qubit utilization. Clique-Based MinorMiner (CLMM) and Spring-Based MinorMiner (SPMM) [31] work by finding initial chains of qubits which can be passed to MM as a parameter in hopes of finding better final embedding using these as starting points. CLMM finds initial chains using CLIQUE, which results in the near uniformity of chains. As MM is able to shorten these chains, CLMM is capable of finding embeddings to even larger problems, exceeding the capability of pure MM. SPMM lays out both the hardware topology graph and the problem graph on a [−1, 1] × [−1, 1] plane and matches each problem node to the nearest hardware node in Euclidean distance. Hence, the initial ''chains'' in case of SPMM all have length of one, therefore this method is more suitable for sparser problem graphs (Fig. 3). We present test results for both methods and use the found limiting cases of embeddable MIMO ML decoding scenarios of maximum problem size.

C. MINOR-EMBEDDING OF MIMO ML DECODING
In this section, we present the found hard limits of MIMO ML decoding as QUBO problem regarding size and modulation complexity embeddable into Pegasus P 16 and Chimera C 16 graphs. As any N -QAM N t × N r MIMO ML setup is (almost) equivalent to a K N t log 2 N complete graph [1], using CLIQUE, one can derive the largest embeddable problem sizes. For Chimera C 16 , it is known that the largest native clique is K 64 , whereas for Pegasus P 16 , it is K 180 . From these assumptions, one can trivially derive the limits of each problem complexity. However, for completeness, we have compiled the logical and physical qubit requirements in Table 1 for a set of large and massive MIMO scenarios that are relevant in practice. The green cells indicate feasible Native Clique Embedding on both QPU architectures, yellow cells are only embeddable to Pegasus P 16 topology and red cells indicate non-feasibility on both the 2000Q and the Advantage system.
Already by using Pegasus P 16 , the results surpassed previous limits published by [1], more than doubling the largest embeddable problem sizes. However, additional heuristics further improved these results. In addition, a comparison of physical qubit usage of MM, SPMM and CLMM is depicted in Fig. 4. Since the embedded problems are represented by near complete graphs, CLMM is the best performer, whereas SPMM has inferior performance -as expected based on [31]. The rate of growth is fast but sub-exponential for all three algorithms. Still, none of the embeddings could leverage all available qubits due to, at least in part, the connectivity limitations of the QPU topologies.
The upper limits for each modulation and for the two architectures are summarized in Table 2 comparing the Native Clique Embedding to best-performing CLMM. Note that CLMM produced near-uniform chain lengths, whereas MM and SPMM did not. This gives CLMM an advantage as it is  capable of producing both shorter and more uniform chains, the two main factors affecting the optimization quality of Quantum Annealing solutions.

V. OPTIMIZATION OF MIMO ML DECODING WITH QUANTUM ANNEALING DEVICES A. EXPERIMENTS FOR COMPARATIVE ANALYSIS
We tested each modulation with many different symmetric user setups (N t × N r , N t = N r = 2 p ) that are relevant for commercial telecommunication installations. For each specific problem, we generated 10 different noiseless random instances (channel matrices, transmitted bits), each of which was run 5 times.
Our main performance metric was the bit error rate (BER), which indicates the ratio of unsuccessfully decoded bits, i.e.: where n is the length of the sent bit string b. The natural expectation is that increasing problem complexity leads to increased BER. Moreover, the hope is that the more advanced QPU will not only handle higher problem sizes, but also improve upon performance for the smaller problems running on both architectures. Both of these expectations were proven by the experiments presented in the next subsection.
For most of the runs, we used Native Clique Embedding as it produced uniform chains, leading to more stable results as opposed to heuristic embedding methods. However, the largest instances on the Pegasus architecture were embedded using the base MinorMinor method, since CLIQUE would require more qubits than available in the current hardware topology. Fig. 5 shows that CLIQUE is the most stable one of the presented embedding methods when solving a 4 × 4 16-QAM problem multiple times on the D-Wave Advantage system.

B. EVALUATION OF EXPERIMENTS ON D-Wave QPUs
The experiments were run using the D-Wave DW_2000Q_6 and the Advantage_system_1.1 QPU solvers [33], [34]. The QPU solvers have several parameters available for controlling the annealing process. Annealing time (T a ) and the number of requested samples (N a ) are the most basic annealing process parameters, and also the most important ones for the quality of solution. T a sets the duration (in µsec) of each annealing cycle, and N a tells the QPU how many sampling cycles should be executed. As both of these parameters have high impact on the optimization outcome one often needs to empirically search the whole parameter space for each particular problem to find the most promising combination. We tested a minimal set of parameter pairs that proved to be well-performing during our earlier research [35]. We concluded that setting T a to 15 and N a to 500 is a good choice overall as it gives stable results across all problem complexities (BPSK, QPSK, 16-QAM, 64-QAM) and MIMO sizes (N t ∈ {2, 4, . . . , 128}). We should note however, that this setting does not necessarily meet the practical real-time computing requirements of the MIMO ML detection problems.
The other important parameter was the chain strength that governs the cohesion of physical qubit chains. Setting this parameter too high can lead to lost precision of the QUBO coefficients, while setting it too low can cause broken chains. We conducted the experiments with chain strengths of 0.3 and 0.8. From our experiments, the lower complexity problems require smaller chain strength and the QAM modulations benefit from stronger chains.
The results of the performance tests are shown in Fig. 6, where BER statistics for each problem's solution on both QPUs are depicted. We can see that increasing problem complexity indeed leads to increased BER. Besides the more advanced Advantage QPU being able to handle higher problem sizes, the results also show performance improvement for smaller problem solutions as compared to the 2000Q architecture solutions. Hence, our expectations were proven. Although the improvements per problem instance are not substantial, still, the problem size extension is significant.
For the dimension of complexity of modulation types, we can see a separation between Phase Shift Keying and Quadrature Amplitude Modulations. First, for the BPSK and QPSK modulations, even the larger instances could be solved via both QPUs with near zero median BER, furthermore, for most of these problems the QPUs could produce at least one proper solution. In Fig. 6, we can observe that the median BER increases significantly when we are reaching the limits of the largest investigated size of the problems. This is not unexpected, as larger cliques require embedding with longer chains leading to the aforementioned phenomenon of losing precision of QUBO coefficients. Second, for the 16-QAM and 64-QAM cases, the results are not satisfactory. For these complex modulation schemes, even small-sized problems could incorporate some bit errors and the largest problems rarely had proper solutions.
For the dimension of the transmitter numbers, there is a separation only in case of the more complex modulation schemes. First, for the 2 × 2 and 4 × 4 instances of the 16-QAM cases, there exist perfect solutions, but the high bit error rate dominates in the majority of the samples; for 64-QAM, we could only produce proper solutions for the 2 × 2 instance. Second, in case of the large instances of higher-order modulations, further parameter fine-tuning would be required to produce near-optimum solutions.

C. POST-PROCESSING OF QUANTUM ANNEALING SAMPLES
Single Qubit Correction (SQC) is a post-processing technique introduced by [36]. It aims to reduce energies of Quantum Annealing samples by iteratively changing each solution bit depending on its energy impact. It is a heuristic approach, i.e., there is no guarantee that the resulting samples will have lower energies than the original ones (but it is guaranteed that it does not increase the energies).
The method works by iteratively changing the sign of bits in every sample in order to reduce the sample's energy. Let s i denote the ith bit of a sample obtained via Quantum Annealing. Given the matrix of QUBO coefficients Q and the set of indices of non-zero quadratic biases C = {i, j| Q i,j = 0}, the influence of s i is: If I i and s i has the same sign, we can flip the sign of s i to reduce the energy of the complete sample by 2I i . Here, s i runs over the entire set of QUBO variables. After every bit has been checked and potentially flipped, the whole process can be repeated until no more change is possible.
With SQC, we were able to achieve an average BER reduction of 0.067, with a maximum reduction of 0.25. The method achieved the most significant improvements for 16-QAM and 64-QAM modulations with averages of 0.158 and 0.118, and with the maximums reaching 0.25 and 0.229 reduction in BER, respectively. Fig. 6 shows the results achieved via post-processing of P 16 QPU samples as compared to the raw samples from C 16 and P 16 QPUs for the same set of random instances.

D. COMPARISON TO CONVENTIONAL METHODS
We compared the performance of the QA-based ML decoding to conventional decoding schemes [37], [38]. As a baseline for linear decoding methods, we used Zero Forcing [25] and Minimum Mean Squared Error [39]. Furthermore, we also included the Sphere Decoding [40], [41] method with infinite lattice that implements Maximum Likelihood estimation, which is capable of finding the best possible solution to the decoding problem. We have compared these methods with the Quantum Annealing based technique, in particular the P 16 QPU + SQC method with CLMM embedding. We note that improvements to the mentioned basic classical techniques already exist [42]- [44], which can outperform these basic versions either in accuracy, or in complexity.
Without noise, the conventional methods can solve any decoding problem, as they can calculate the exact solution due to their mathematical framework. However, the same cannot be said about the QA-based decoder as it needs to work on a modified (embedded) version of the problem. Furthermore, an analog QPU device also suffers from environmental noise and low floating-point precision. To make the problem more realistic, we added noise to the received symbols, which was done by specifying the signal-to-noise ratio (SNR), measured in dB.
Each of the decoders solved 4 × 4 16-QAM problems with 7 different SNR values with 10 randomly generated instances per SNR value. SNR analysis of BPSK and QPSK comparing quantum and classical methods was already presented in Kim et al. [1] . We chose 16-QAM to extend upon this work toward practical use cases. Fig. 7 depicts the result of the comparison measured in BER averaged over the 10 random instances. From this, we can conclude that the QA-based method is capable of providing the same solution quality as the conventional decoding methods in noisy channels. In particular, it produced the same results as the Sphere Decoding ML method, which is the best possible outcome.
The simulations were done using a modified version of [45], a massive MIMO simulator with multiple available decoders.

E. TIME STATISTICS OF QPU SAMPLING
The process of QA sampling on D-Wave's QPUs can be deconstructed into a sum of processing time intervals, each measured in µs. On the high level, QPU Access Time can be split into QPU Programming Time, QPU Sampling Time, and QPU Access Overhead Time. During Programming, the biases and couplers are initialized according to the QUBO model, Sampling Time is used to collect N a samples, whereas Access Overhead accounts for post-processing of the batch of samples last to be collected (the other batches are postprocessed concurrently with the annealing run of the consecutive batch). For more details on D-Wave QPU timing, see [46]. QPU Sampling Time can be further decomposed into timing requirements of each individual sampling. Producing one sample includes QPU Annealing Time Per Sample (T a ), QPU Read Out Time Per Sample (R a ) and QPU Delay Time Per Sample. Although Sampling Time is usually referred to as N a · T a , we wanted to highlight that Read Out Times take up a large portion of Sampling Time, which must be considered for the real-time requirements of the MIMO ML decoding problem. Fig. 8 shows the breakdown of the total Sampling Time for a single QPU run grouped by modulation, QPU architecture VOLUME 9, 2021 A decomposition of the total QA sampling process is used with the following elements: Programming (green) initializes the QPUs, Annealing (orange) drives the system to optimal state, Readout (blue) measures the solution states, Delay (yellow) is used for thermalization of the QPU to its initial temperature, and Post-Processing (purple) transforms the last batch of sample measurements before returning the best optimum and additional information of the annealing run. and N t . The first step for a QPU run the qubits are initialized (Programming Time). This is a one-time only process, and it is nearly constant for all problem sizes, but it differs on the two QPUs significantly due to the size difference.
The actual annealing process takes N a · T a · R a total time, where N a and T a is determined by the API user, whereas R a is dependent on the hardware implementation. In case of our experiments, N a = 500 and T a is 15. One could improve upon these results by lowering T a (down to 1) or requesting fewer samples, however, the optimal values T a and N a are often problem dependent and finding them requires thorough search of the parameter space.
D-Wave's post-processing optimization of samples is applied batch-by-batch, in parallel with the annealing runs, and therefore only the last batch's post-processing adds extra time to the overall annealing run. In Fig. 8, we can observe that post-processing time is highly dependent on problem size and has negligible impact on the whole sampling time. Fig. 8 also tells us that the 2000Q QPU has constant time requirement regarding readouts, whereas the Advantage system demonstrates scaling with the problem size. Furthermore, the overall readout times improved also on the architecture and therefore it is not only capable of solving larger problems, but also achieves speedup over all experiment types and problem sizes.

VI. CONCLUSION AND OUTLOOK
In this paper, we presented the experimental evaluation of the MIMO ML decoding tests of multiple modulation schemes on currently available Quantum Annealers. For testing the newly available D-Wave Advantage system, we first reproduced the results of [1] for BPSK, QPSK, 16-QAM on the predecessor hardware D-Wave 2000Q and then extended the range of experiments to the 64-QAM modulation scheme using the novel state-of-the-art QPU.
As expected, using the Pegasus P 16 architecture of the D-Wave Advantage system, the embedding and solution qualities of the MIMO ML decoding problems scale better, for both the dimensions of modulation complexity and transmitter numbers, than in the previously available Chimera C 16 architecture. Whereas the embeddable problem sizes could be doubled on the new architecture, paving the way for massive MIMO applications, the improvements per problem instance were not substantial for smaller QUBO problems.
We believe that this is due to precision limitations of the analog QPU hardware. With the use of the SQC postprocessing technique, the quality of solutions could be further improved, however, only to a certain degree. We believe that additional heuristics and parameter tuning could be tested for further enhancement of the optimization results. Furthermore, fine-tuning for lower values for N a and T a could be an additional step towards requirements of latency sensitive optimization tasks. We leave the study of these possibilities to future work.
We showed that using SQC post-processing the current QPU is already capable of solving up to 16-QAM problems with the same quality as the classical ML decoders. This is an indication that future QPU architectures (with higher qubit connectivity) will be able to solve MIMO ML decoding problems of any complexity with at least the same quality as classical decoders. While NISQ quantum technologies present a remarkable progress each year, until their maturity there are hybrid and quantum-inspired classical solutions already commercially available -with impressive performance reports for specific optimization tasks. For large-scale problem sizes, D-Wave's new Leap Hybrid service [47], [48] is capable of processing up to 20,000 QUBO variables with fully connected graphs, using an approach of problem partitioning. Digital annealing technologies [49]- [52] also pave the way towards time sensitive optimization scenarios with high-parallelism and highperformance dedicated architectures.
The quality metrics of novel optimization systems would also have to be tested via algorithmic comparative experiments of specific problem types, and we expect to see more diverse set of industry-specific performance studies published in the near future.

APPENDIX A CONVERSION OF ENCODED BITS WITH 64-QAM
Here we refer to (7) in Sec.III-C, and present a more detailed constellation diagram for 64-QAM modulation scheme. Fig. 9 illustrates the process of converting the QuAMax encoded bits to the original, Gray-coded 64-QAM transmission as follows.
First, the sender transmits Gray-coded message using the constellation of Fig. 9c. At the receiver, the QuAMax constellation of Fig. 9a is used to decode the message, which uses non-Gray-coded bit strings in order to retain linear QUBO transformation. To restore the originally sent bit string, post processing of the minimization result is required using post-translation which flips the bits of even-numbered columns in the constellation resulting in Intermediate code constellation of Fig. 9b. Finally, from the Intermediate code constellation we convert back to the original, Gray-coded bit string via either simple mapping or by differential bit encoding.

APPENDIX C ISING TRANSFORMATION OF 64-QAM MODULATION
The equations for the Ising coefficients of the 64-QAM Modulation follow the same notation as Kim et al. [1] , where H (:,i) denotes the ith column of the channel matrix H and y is the vector of received symbols, 2 (10), as shown at the next page.  (14)), contrary to the misprinted formula in Appendix C of [1].    Research, Budapest. His current research interests include AI in network analytics, network management, and network automation. His research interests also include emerging technologies that can potentially be involved in various business areas of telecommunication, such as quantum AI, camerabased positional tracking, and AR/VR.