Multiblock ADMM Heuristics for Mixed-Binary Optimization on Classical and Quantum Computers

Solving combinatorial optimization problems on current noisy quantum devices is currently being advocated for (and restricted to) binary polynomial optimization with equality constraints via quantum heuristic approaches. This is achieved using, for example, the variational quantum eigensolver (VQE) and the quantum approximate optimization algorithm (QAOA). In this article, we present a decomposition-based approach to extend the applicability of current approaches to “quadratic plus convex” mixed binary optimization (MBO) problems, so as to solve a broad class of real-world optimization problems. In the MBO framework, we show that the alternating direction method of multipliers (ADMM) can split the MBO into a binary unconstrained problem (that can be solved with quantum algorithms), and continuous constrained convex subproblems (that can be solved cheaply with classical optimization solvers). The validity of the approach is then showcased by numerical results obtained on several optimization problems via simulations with VQE and QAOA on the quantum circuits implemented in Qiskit, an open-source quantum computing software development framework.


Introduction
Mixed-Binary Optimization (MBO) has been studied for decades in Mathematical Programming, because of the widespread range of applications in several domains [18,32,53], and the inherent difficulties posed by integer variables. The MBO class is very broad, and tailored exact or heuristic solution approaches have been devised in classical computation, depending on the nature and structure of the specific formulation [8,29]. Recently, the advances in universal quantum computing [24,37,45,47] fostered efforts to understand whether this alternative computing paradigm could offer advantages (e.g., faster exact algorithms, more reliable heuristics) to solving combinatorial optimization problems [68]. Research directed to apply the resulting algorithms to early generation of universal quantum computers (often referred to as Noisy Intermediate-Scale Quantum devices, NISQ) has mainly focused on hybrid quantum/classical variational approaches [45], which have been applied to chemistry [40,50], machine learning [26,55], mathematical optimization [25,30,47]. In broad terms, a variational approach works by choosing a parametrization of the space of quantum states that depends on a relatively small set of parameters, then using classical optimization routines to determine values of the parameters corresponding to a quantum state that maximizes or minimizes a given utility function. Typically, the utility function is given by a Hamiltonian encoding the total energy of the system, to be minimized. The variational theorem ensures that the expectation value of the Hamiltonian is greater than or equal to the minimum eigenvalue of the Hamilto-nian. Such variational approaches can be applied for solving combinatorial optimization problems, provided that we can construct a Hamiltonian encoding the objective function of the optimization problem, see [7,48]. In the mathematical optimization context, research has been directed mainly to quadratic unconstrained binary optimization problems (QUBO): minimize x c x + x Qx subject to: x ∈ {0, 1} n , with c ∈ R n , Q ∈ R n×n , which can be transformed into an Ising model with Hamiltonian constituted as a summation of weighted tensor products of Z Pauli operators. In case equality constraints Ax = b are required to be modeled, a QUBO can still be devised by adding a quadratic penalization α Ax − b 2 of the equality constraints to the objective function, as a soft-constraint in an Augmented Lagrangian fashion [27,48,64]. In the MBO formulations, continuous variables and inequality constraints are typically required to be modeled, and this general case is at early stages with variational approaches, e.g., with slack variable formulations [15].
A typical hybrid quantum/classical variational approach, such as VQE [50] would involve the following two key steps in solving a QUBO, given its Ising Hamiltonian H formulation. First, one would parametrize the quantum state via a small set of rotation parameters θ: the state can then be expressed as |ψ(θ) = U (θ)|0 , where U (θ) is the parametrized quantum circuit applied to the initial state |0 . The variational approach would then aim at solving min θ ψ(θ)|H|ψ(θ) . Such optimization can be performed in a hybrid setting that uses a classical computer running an iterative algorithm to select θ, and a quantum computer to compute information about ψ(θ)|H|ψ(θ) for given θ (e.g., its gradient).
In this paper, we aim at extending the hybrid quantum/classical methodologies to be able to cope with MBOs on NISQ devices. As a matter of fact, we start in a bit more general context (i.e., abstracting away from quantum computing per se) and we pose as an assumption that a QUBO solver oracle is available to solve QUBOs at optimality; then, we ask ourselves where we can go from there. In particular, we explore ways to (heuristically) solve certain classes of MBOs with the assumed QUBO solver. The aim of the paper is: • To extend the hybrid quantum/classical methodologies to be able to cope with MBOs; • To propose new heuristics to solve MBOs on nearterm quantum computers, having the potential to scale to larger sizes than heuristics on classical computers; • To offer a glimpse on current research in combinatorial optimization in quantum computing, along with assumptions, challenges, and open problems.
The proposed heuristics are based on the celebrated alternating direction method of multipliers (ADMM), see [10, 13, 19, 21, 23, 31, 33-35, 38, 49, 56]. Alternating Direction Method of Multipliers (ADMM) is an operator splitting algorithm that has a long history in convex optimization. ADMM is known to have residual, objective and dual variable convergence properties, provided that convexity assumptions are holding [13]. Recently, non-convex variants of ADMM have been developed (see [22, 39, 59-61, 65, 67] for nonconvex/combinatorial theoretical results and ADMMinspired heuristics). In the heuristic framework of [22] for problems with convex objective and decision variables from a non-convex set, ADMM relies on a (eventually approximate) projection on the non-convex feasibility set, and the ADMM iterates are improved via local search methods. Our method instead does not involve a projection step and makes use of the ADMM operator-splitting procedure to devise a decomposition for certain classes of MBOs into: • a QUBO subproblem to be solved by QUBO solver oracle (or, on near-term quantum devices via hybrid classical/quantum variational algorithms, such as VQE [50], QAOA [24]).
• a convex constrained subproblem, which can be efficiently solved with classical optimization solvers [14] .
Our method builds upon the recent results of [65] and [59], which propose global convergence guarantees for non-convex and non-smooth optimization problems. For MBO, the convergence results of [65] would not hold because of the requirement on the Lipschitz continuity of specific components of the objective function. However, [59] observed that a third block can be added to the two-block decomposition of ADMM in order to gain convergence properties to stationary points. Our method also leverages the recent results of [6] on related tame problems to ensure that convergence is attained to a unique stationary point, by taking advantage of the semi-algebraic structure of the problem.
The mathematical contribution of the paper is a multiblock ADMM heuristic (M-ADMM-H) algorithm for MBO, for which we present: • a decomposition approach suitable for hybrid computation on near-term quantum devices • conditions for convergence, feasibility and optimality.
• computational results for classical and hybrid quantum/classical implementations, including comments on the solution quality achieved.
Despite the quantum-oriented angle of the paper, our results applied also to classical computing, whenever a QUBO solver is available, or whenever the QUBO subproblems are easy or trivially solvable.
The paper is organized as follows. Section 2 reviews the ADMM proposed for convex optimization and introduces a two and three-block implementation of ADMM when binary variables are present. In Section 3, the two and three-block implementations are then specified for mixed-binary optimization problems and convergence properties are illustrated. To get a better picture of the proposed algorithm, Section 4 discusses small-sized examples. The two illustrative MBO formulations in Section 5 are solved with the proposed ADMM-based algorithms in Section 6, discussing the results obtained on simulated NISQ devices. Finally, conclusions are drawn in Section 7.
Notation. Notation is whenever possible standard and borrowed from convex analysis [14,54]. Vectors x ∈ R n , matrices A ∈ R n×m , and sets X ⊆ R n . For vectors and matrices (·) indicates the transpose operation. Functions f : R n → R ∪ {+∞}, whose codomain is the extended real line. A function is convex iff its epigraph is a convex set. A convex function is closed if its epigraph is closed, and it is proper if its effective domain is nonempty and it never attains −∞. A function is lower-semicontinuous if and only if all of its sublevel sets {x ∈ X : f (x) ≤ y} are closed. A proper convex function is closed iff it is lowersemicontinuous. The indicator function of the set X is a function ι X : R n → R ∪ {+∞}, for which ι X (x) = 0 if x ∈ X and +∞ otherwise. The indicator function of any closed set is lower-semicontinuous.
2 From the standard ADMM to a three-block structure

Convex ADMM
We start with some background on ADMM and the known results in the case of non-convex and combinatorial problems. Let f : R n → R ∪ {+∞} and h : R m → R ∪ {+∞} be closed convex proper functions, and let A ∈ R n×p , B ∈ R m×p be given matrices.
The prototypical problem we are interested in is of the form: subject to : Then the ADMM is the following algorithm: • Initialize the sequences (x k ) k∈N , (y k ) k∈N , (λ k ) k∈N as x 0 ∈ R n , y 0 ∈ R m , λ 0 ∈ R p . Choose a penalty parameter > 0; • For k = 1, 2, . . . do: -First block update: x k = arg min x∈R n f (x) + +λ k−1 (Ax + By k−1 ) + -Second block update: -Dual variable update: For the classical ADMM, we have various convergence and convergence rate results. For an ample classes of convex costs, ADMM converges for any , that is, starting from any x 0 , y 0 , λ 0 , it generates a sequence for which we have • Residual convergence: Ax k + By k → 0 as k → ∞, i.e., the iterates approach feasibility; • Objective convergence. f (x k ) + h(y k ) → p * as k → ∞, i.e., the objective function of the iterates approaches the optimal value; • Dual variable convergence. λ k → λ * as k → ∞, where λ * is a dual optimal point.
Non-convex results (when the cost functions are nonconvex) are less ubiquitous in the literature and, in general, more restrictive in terms of assumptions. However, ADMM still behaves quite favourably in nonconvex cases and attracts a considerable amount of attention from the research community.

Mixed-binary ADMM
In this paper, we start by modifying (1) by considering that x is now constrained to live in the non-convex set {0, 1} n , or equivalently that each of the component of the vector x, i.e., x (i) , i ∈ {1, . . . , n}, is constrained as We compactly write this as requiring x ∈ X , where X represents the said binary set.
Let now ι X : R n → R ∪ {+∞} be the indicator function of the set X , which is by construction closed and proper (but obviously non-convex), and consider the new function The function f NC (x) is non-convex by construction, yet one could still attempt at using the ADMM approach (2) with the new function f NC (x) in lieu of the "old" one f (x), with the goal of solving the MBO: subject to : This is in general a heuristic. However, under some more restricting conditions the sequence generated by ADMM converges also in this case as follows.
Theorem 1 (Convergence of mixed-binary ADMM [65]) Consider the following assumptions: is coercive over the set Ax + By = 0; A2) (Feasibility) Im(A) ⊆ Im(B), where Im(·) returns the image of a matrix; A3) (Lipschitz sub-minimization paths) There exists a positive constantM , such that for any iterate counters k 1 and k 2 , we have: Define the augmented Lagrangian, Then, Binary ADMM converges subsequently for any sufficiently large , that is, starting from any x 0 , y 0 , λ 0 , it generates a sequence that is bounded, has at least one limit point, and that each limit point (x * , y * , λ * ) is a stationary point of L , namely, 0 ∈ ∂L (x * , y * , λ * ).
Theorem 1 is a special case of the more general Theorem 1 of [65] adapted to our problem setting (and where we have chosen to use a stronger version of A3) for sake of clarity and ease of implementation). Functions satisfying the Kurdyka-Łojasiewicz (KŁ) property are for example semi-algebraic functions and locally strongly convex functions. We recall that a semi-algebraic function can be defined based on its graph as follows

Definition 1 ( [4])
A subset of R n is called semialgebraic if it can be written as a finite union of sets of the form where p i , q i are real polynomial functions.
A function f : R n → R ∪ {+∞} is semi-algebraic if its graph is a semi-algebraic subset of R n+1 .
From this discussion, ι X (x) (besides being lower semicontinuous) is semi-algebraic, since its the indicator functions of semi-algebraic sets Theorem 1 (or its broader version) is fairly tight, and counter-examples exists in which some of the assumptions are not verified and ADMM fails to converge.
To understand better the implications of 1, we consider a toy example, which verifies all the assumptions of the theorem.
Example 1 Consider the problem: The unique optimal solution is v * = w * = 1. If we apply ADMM to it, as for Theorem 1, we can obtain convergence for sufficiently large starting from any initial v 0 , w 0 , λ 0 . For example, we can start with v 0 = 1, w 0 = 1, λ 0 = 0 with = 100. Then we can see that the ADMM algorithm converges to the solution v = w = 0 = λ = 0, which is a stationary point of the augmented Lagrangian L (v, w, λ). If we start with a different starting point v 0 = 0, w 0 = 0.5, λ 0 = 0 with the same , then convergence is attained to the point v = w = 1, λ = 2, which is the optimal solution of the original problem, and another stationary point of the augmented Lagrangian.
From the above example, one can understand the implications of convergence of ADMM in the non-convex setting, where one may converge to a feasible point, but not necessarily optimal for the problem. This is in general not a very unsatisfactory behaviour, especially in non-convex setting, where one is often concerned about finding "good" feasible points.

Mixed-binary three-block ADMM
We move now to generalize the mixed-binary ADMM to three-block implementation. The reason behind the three blocks is that the assumptions in Theorem 1 are restrictive for MBO problems and they would not be satisfied in general (as we see later).
Consider the prototypical (mixed-binary) problem: subject to : where we have introduced the functions f 0 : R n → R, f 1 : R l → R, the matrices A 0 ∈ R n×p , A 1 ∈ R l×p and we have put ourselves already in the mixed-binary case.
(For completeness, recall the definition of function h : R m → R∪{+∞} as closed convex proper function, and matrix B ∈ R m×p .) Then the three-block ADMM is the following algorithm: • Initialize the sequences (x k ) k∈N , (x k ) k∈N , (y k ) k∈N , (λ k ) k∈N as x 0 ∈ R n ,x 0 ∈ R l y 0 ∈ R m , λ 0 ∈ R p . Choose a penalty parameter > 0; • For k = 1, 2, . . . do: -First block update: -Second block update: -Third block update: -Dual variable update: This is in general a heuristic. However, under some more restricting conditions the sequence generated by ADMM converges also in this case as follows.
Theorem 2 (Convergence of mixed-binary three-block ADMM [65]) Consider the following assumptions: A3) (Lipschitz sub-minimization paths) There exists a positive constantM , such that for any iterate counters k 1 and k 2 , we have: Define the augmented Lagrangian, Then, Mixed-binary three-block ADMM converges subsequently for any sufficiently large , that is, starting from any x 0 ,x 0 , y 0 , λ 0 , it generates a sequence that is bounded, has at least one limit point, and that each limit point (x * ,x * , y * , λ * ) is a stationary point of L , namely, 0 ∈ ∂L (x * ,x * , y * , λ * ).
Theorem 2 is a special case of the more general Theorem 1 of [65] adapted to our problem setting (and where we have chosen to use a stronger version of A3) for sake of clarity and ease of implementation). Functions satisfying the restricted prox-regularity assumptions are for example convex functions, including indicator functions of convex sets (which will be the ones that we will use in the sequel).
What is now fundamental in the three-block ADMM is that we can put separate constraints on x (that is the binary constraint) and onx (any restricted proxregular constraints, e.g., linear inequalities). This without affecting y, whose function h(y) needs to be smooth. This "trick" was first explored in [59] in the context of distributed computations and discussed in the following example.
Example 2 Consider the problem: This problem does not satisfy the assumptions of Theorem 1, since y is now constrained (although ADMM here is nonetheless converging in practice). But a possible way to overcome this (without adding constraints on the binary variable v), is to use the relaxed problem for a large β < .
In [59], a proper dualization of the constraint w = 0 is imposed, but the convergence of the then two-level approach has more restricting assumptions that the ones that we consider here.
3 Two and three-block ADMM algorithms for MBO

From MBOs to two-block ADMM
We are now ready to tackle MBOs. In this paper, we will consider the following reference problem (P ): with the corresponding functional assumptions.

Assumption 1 (Functional assumptions)
The following assumptions hold: • Function q : R n → R is quadratic, i.e., q(x) = x Qx + a x for a given symmetric squared matrix Q ∈ R n × R n , Q = Q , and vector a ∈ R n ; ∀i} enforces the binary constraints; • Matrix G ∈ R n ×R n , vector b ∈ R n , and function g : R n → R is convex; • Function ϕ : R l → R is convex and U is a convex set; • Function : R n ×R l → R is jointly convex in x, u.
Problem (P ) with the required functional assumptions can still capture many relevant problems in mathematical programming, such as vehicle routing [11,18,63] and facility location [16]. Formulations for bin packing and knapsack problems will be discussed in Section 5.
In order to put Problem (P ) in the ADMM standard form, we need to write Problem (P ) as problem (3). First, in this paper, following mainstream quantum practice (see [15,48]) and because we need to retrieve a QUBO, we soft-constrain the equality constraint (whenever present) as an augmented term in the cost function. Then, we introduce the new variable z ∈ R n and Problem (P ) can be written as the soft-constrained problem (P ) : for a large positive constant c > 0. Problem (P ) is a soft-constrained version of Problem (P ) (it would be equivalent if G = 0 and b = 0): it is however a convenient splitting of binary and continuous variables.
subject to : where A 0 = I n and A 1 = [−I n , 0 l×l ].
A first possible strategy to use ADMM on (P ) is summarized in Algorithm 1, in a two-block implementation (2-ADMM-H). As we discussed in Section 2 and Example 2, this strategy is in general a heuristic, since the variablex is constrained, however in some cases Algorithm 1 can deliver good solutions (as we will explore). In order to keep track of the solution quality during the iterations, we compute a merit value associated with each iterate x k . Let ζ k = max(g(x k ), 0) + max(l(x k ,x k ), 0) be the violation of the constraints on decision variable x in Problem (P ) at iteration k, and µ be a penalization for ζ k . Then, the merit value η k of x k is a linear combination q(x k ) + φ(x k ) + µζ k of the constraint violation and solution cost in problem (P ). Iterates with high merit value are both not likely to be of optimal value and close to feasibility for Problem (P ), hence the minimum merit value solution is returned by Algorithm 1.

4:
Dual variable update: Compute merit value: The strength of Algorithm 1 is that the original MBO is now split into a QUBO (that can be solved on the QUBO oracle, or on NISQ devices) and a convex problem, that can be solved with off-the-shelf solvers, such as SPDT3 [62] and MOSEK [3].

Remark 1
In [22], the authors explore a slightly different decomposition of the same MBO problem (12).
In particular, the authors let In this way, the QUBO problem (first block update) becomes a projection problem of dimension n onto the one dimensional constraint {0, 1}, which is easily solvable, while the convex problem (second block update) becomes the convex relaxation of the MBO problem (with an additional penalization term). This non-convex ADMM heuristic is effective in finding approximate solutions to a wide variety of problems in classical computation, depending on an appropriate setting of the initial parameters. However, it is not readily applicable on NISQ devices, as it does not involve QUBOs.

From two-block to three-block ADMM for MBOs
To overcome the limitation imposed by the convergence theorems (Theorem 1-2) on the smoothness of function f 1 (x), we use the same approach explored in Example 2, as well as in [59]. We exploit a three-block implementation of ADMM (3-ADMM-H) onto the softconstrained problem (P ) : subject to : where the only difference with (14) is the introduction of variable y, which penalizes constraint violations.
Algorithm 2 reports the 3-ADMM-H algorithm, along with stopping criteria and evaluation metrics. As we can see, once again, the problem (15) is split into a QUBO, that can be solved by a QUBO oracle, and convex optimization problems. We note that the twoblock implementation is a particular case of the threeblock algorithm, with y 0 = 0 ∈ R n , and skipping third block update (i.e., step 4 of 2).
We are now ready for the convergence results for Algorithm 2 (3-ADMM-H). First, we present the results when continuous variables are not present, and then extend it to continuous variables.

5:
Dual variable update: Compute merit value: Theorem 3 (Convergence of Algorithm 2) Consider Problem (12) with no continuous variable u and let Assumption 1 hold. Define the augmented Lagrangian, Then, Algorithm 2 converges subsequently for any sufficiently large > max{β, c}, that is, starting from any x 0 ,x 0 , y 0 , λ 0 , it generates a sequence that is bounded, has at least one limit point, and that each limit point In addition, if f 1 (x) is a Kurdyka-Łojasiewicz (KŁ) function [5,12,43], then (x k ,x k , y k , λ k ) converges globally to the unique limit point (x * ,x * , y * , λ * ).
Proof -We are going to leverage the results of Theorem 2 to prove Theorem 3. In particular, we are going to check that all the assumptions in Theorem 2 are satisfied and determine a necessary condition on how large must be for the algorithm to converge. A1) (Coercivity). Coercivity holds since x lies in a bounded set, h(y) = β 2 y 2 2 is quadratic, therefore coercive, and the same holds forx.
A4) (Objective f -regularity). f 0 +ι X (x) is lower semicontinuous, and f 1 is restricted prox-regular since the sum of a convex function and the indicator function of a convex set.
And to finish the proof: Theorem 3 describes a set of assumptions for which Algorithm 2 is proven to converge to a stationary point of the augmented Lagrangian L , which is a softconstrained version of the original MBO problem (12). We now expand on Theorem 3 by considering continuous variables.
Theorem 4 (Convergence of Algorithm 2 with continuous variables) The same results of Theorem 3 hold if: • The function ϕ(u) is strictly convex and the inequality constraint (z, u) ≤ 0 is never active, i.e., for each z k and u k generated by the algorithm we have (z k , u k ) < 0; • The inequality constraint (z, u) ≤ 0 is always active, i.e., for each z k and u k generated by the algorithm we have (z k , u k ) = 0, and for any fixed z, the inverse mapping u(z) = {u| (z, u) = 0} is unique and Lipschitz, i.e., u(z) − u(z ) ≤ C z −z , for C < ∞, and > max{C 2 β, c+C 2 }.
Proof -We have only to show that A3 holds in these cases. For the first case, the inequality constraint is redundant and u k is only determined from ϕ(u). Since ϕ(u) is strictly convex, u k is unique and the same for all k's, so u k1 − u k2 = 0 and A3 holds. This is the case, e.g., when inequality constraints are absent.
For the second case, since (z k , u k ) = 0 and u(z) − u(z ) ≤ C z − z , then A3 holds withM = C. And the conditions on are derived from [59,Lemma 9]. This is the case, e.g., when the inequalities are linear equality constraints as F z + Hu = g, and H is full rank.
The conditions of Theorems 3 and 4 are quite mild in many practical relevant MBO problems. In full generality however, Algorithm 2 is a heuristic algorithm, especially as we remark next.
• Equality constraints. When equality constraints are presents, they are softened with the augmented term c 2 Gx − b 2 2 in the cost function. This induces a trade-off: from the conditions in [59,Lemma 9], then at least > max{β, c}; however, to enforce the equality constraints, these have to be at least as important as the enforcing of zero residuals, i.e., c ≥ . This introduces the trade-off of either terminating with a solution with zero residuals (meaning the convergence has been reached, but equality constraints are not necessarily satisfied), or with equality constraints satisfied (without bounds on the magnitude of the residual).
Note that off-loading the equality constraints to variablex and imposing them exactly, only mildly solves the issues, since residual convergence would be achieved with y = 0 (in general) and therefore the equality constraints will not be satisfied exactly.
• Continuous variables. When continuous variables are present, which do not satisfy either of the conditions of Theorem 4, then assumption A3 is not satisfied, making Theorem 3 not hold in this situation and Algorithm 2 is still a heuristic for this case.

Simple examples
We discuss here some interesting examples to showcase the performance of 2-ADMM-H and 3-ADMM-H for MBOs problems in simple settings, and gain some insights on the solutions obtained.

Inequaltity constraints
Example 3 Consider the problem: subject to: 2v + w ≤ 2, where x = [v, w] . We consider two cases, Case 1: 1001 = > β = 1000 (verifying the necessary conditions for Algorithm 2 to converge, but Algorithm 1 is a heuristic), and Case 2: = β = 1000, for which both algorithms are heuristics. Figure 1 showcases convergence of the residual of both Algorithm 1 and Algorithm 2, where we defined the three-block residual as   Example 4 Consider the problem: subject to: 2v + 10w + t ≤ 3,   In Case 2, both algorithms converge and deliver the optimal solution.

Continuous variables
Example 6 Consider the problem: where x = [v, w, t] . We fix = 1001, β = 1000, and the penalization parameter for the equality constraint to be c = 900. The inequality constraint with the continuous variable is always active, so Algorithm 2 is supposed to converge (as for Theorem 4).
In Figure 4, we see how both algorithms converge, but only Algorithms 2 yield the optimal solution (incurring zero optimality gap). In particular, Algo-    In the best case, 3-ADMM-H is guaranteed to converge and it delivers an optimal solution for the original MBO. In the worst case, both algorithms fail to deliver feasible solutions. In the middle, 3-ADMM-H may converge, but the soft-constrained solution is not feasible with respect to the hard-constrained formulation, or both algorithms could converge to a feasible but not optimal solution. With this in mind, we are now ready to apply the algorithms to two well-known MBO problems: Bin Packing (BP) Problem and Mixed Integer Setup Knapsack (MISK) problem. The computational results will be discussed in Section 6, where we will show that despite the heuristic nature of M-ADMM-H, we can still obtain feasible solutions in many cases. This is not trivial in general for combinatorial optimization problems [9,28].
The BP is arguably one of the most studied combinatorial problems [20]. Being strongly NP-hard, it stimulated the study of heuristics, metaheuristics and worstcase approximation bounds. Given n items, each having an integer weight w j , j = 1, . . . , n, and m identical bins of integer capacity Q, the aim of BP is to pack all the items into the minimum number of bins so that the total weight packed in any bin does not exceed the capacity. Applications of BP in logistics and scheduling are numerous, and include cutting stock problems, containers loading, data storage, job scheduling and resource allocation. The MISK belongs to the class of Knapsack Problems [41,44]. The classical knapsack problem is that of deciding which items to pack in a capacitated knapsack, so as to maximize the profit of the items in the knapsack. In the setup knapsack problem (SKP), each item belongs to a family, and an item can be assigned to the knapsack only if a setup charge for the correspondent family is paid [42]. SKP can model capacitated scheduling problems. In the MISK, items can be fractionally assigned to the knapsack. MISK appears as a subproblem of the capacitated coordinated replenishment problem.

Binary Linear Programming Formulation for Bin Packing
Let ξ ij ∈ {0, 1} be the binary decision variable which, if 1, indicates that item j is assigned to bin i. Let χ i ∈ {0, 1} be the binary decision variable which, if 1, indicates that bin i is containing items. A natural mathematical formulation for Bin Packing (BP) problem is then given by the binary linear program: In particular: • The objective function (34a) is the number of bins in solution.
• Constraints (34b) enforce the assignment of each item into a bin.
• Constraints (34c) ensure the packed items do not exceed the bin capacity.
• Constraints (34d) and (34e) express the bounds on the decision variables.
The presence of inequalities to express the capacity constraints (34c) forbids the straightforward mapping to an Ising Hamiltonian model, and the direct application of hybrid quantum/classical optimization algorithms, such as VQE [45] and QAOA [24].

Mixed-Binary Formulation for the Mixed Integer Setup Knapsack (MISK)
The MISK has received limited attention in literature. The mixed-integer formulation proposed in [1] is presented in this section. The items belongs to K nonoverlapping families. Each family k has T items, and a setup cost S k ≥ 0, when included in the knapsack. Each item t of family k has a value C kt < 0, and a resource consumption D kt ≤ 0, if assigned to a knapsack with capacity P . The decision variables are the fraction ξ kt of item t that is included in the knapsack, and is the binary decision χ k to setup family k in the knapsack. MISK can then be formulated as: The aim is to minimize the setup costs and maximize the value of the assigned items via the objective function (35a). Constraints (35b) ensures that the capacity of the knapsack is not violated: this is a fixed charge capacity constraints, because setup capacity consumption is not considered. Constraints (35c) impose that if item t of family k is assigned to the knapsack, then the setup cost of family k is paid accordingly.

Computational Results
We discuss here the multi-block (M-ADMM-H) results on BP and MISK. The algorithm has been implemented in Python on a machine with 2.2 GHz, Intel Core i7 processor, and a RAM of 16 GB; the simulations on quantum devices to solve the QUBOs have been conducted by using the Qiskit framework [52] accessed in November 2019, while IBM ILOG CPLEX 12.8 has been chosen as classical optimization solver 1 .
In Figure 5, a summary of the proposed approach and implementation choices are presented. In particular, for all simulations reported in the following subsections: • The M-ADMM-H algorithm has been run with a time limit of 1 hour, a limit of 500 iterations, and with merit parameter µ = 1e + 3.
In addition, to avoid large penalization factors from the first iteration, we start with the value = 1e + 4, which is then increased by 10% at each iteration, until it exceeds the value of 1e + 7. In the 3-blocks implementation, the penalization c of equality constraints has been set to 1e + 5. The penalization β of residual y is initially set to 1e + 3 and then updated according to scheme described in [59], specifically β k+1 = γβ k , if x k ≤ ω x k−1 , with ω = 0.5 and γ = 2, so to foster exact penalization.
• The QUBO subproblems are solved either classically with CPLEX, or for the quantum simulations, we have used the variational solver VQE, invoked via Qiskit, with the RY variational form in a circuit of depth 5 and full entanglement, and the QASM simulator as Qiskit Aer backend (Figure 6 represents the circuit that was used in the case of three qubits and depth 4). A common random seed has been fixed for all simulations. No limitations on the VQE running time have been imposed.
• VQE is itself an iterative hybrid classical/quantum algorithm that involves defining a parametrized variational form and optimizing classically on the rotation parameter vector θ, while evaluating the variational form and its gradients on the quantum device. In our simulations, the classical solvers used by VQE are the modelbased local optimizers Simultaneous Perturbation Stochastic Approximation (SPSA) [58], and Constrained Optimization By Linear Approximation (COBYLA) [36]. For both solvers, the Qiskit implementation has been used.
The gap of the minimum-merit-value solution with value v with respect to known optimal value v * is computed as |v−v * | 1e−10+|v * | . In order to report the computational results, we have included: the number of qubits, the number of iterations (IT) of M-ADMM-H, the gap (Gap) to optimality, and percentage of M-ADMM-H solution that are feasible (Feas) or optimal (Opt) with respect to the constraints and objective of the original constrained problem.

Remark 2
We notice here that VQE does not solve (in general) a QUBO at optimality (and in this sense, it is not a perfect oracle), while CPLEX does (for the considered small instances). In addition, even in cases in which VQE solves the QUBO at optimality, the optimizer may be different from CPLEX, since multiple solutions could exist. In general, then the solution of VQE and CPLEX will be different when solving the same QUBO and the outer ADMM loop will be affected by it. In practice, using VQE could either worsen or boost convergence: since M-ADMM-H is in general a heuristic, small errors can be beneficial in some cases, while worsening performance in others.
We notice that the choice of VQE in this paper is due to the current technical status of quantum computing. In the future, better QUBO solvers may be available, e.g., based on (iterative) phase estimation, which might deliver optimal solutions at scale.

BP
We first discuss two implementation improvements to reduce computational complexity and foster convergence in the heuristic case for BP.
Removing unnecessary decision variables. Let l be a lower bound on the number of bins required to pack all items (for example the continuous relaxation bound i,j wij Q ). Then, it is possible to discard variables χ 1 , . . . , χ l from the mathematical formulation. In addition, it is not restrictive to assume ξ 1,1 = 1. With these observations, the number of decision variables required is (mn − n) + (m − l). Typically, n = m, hence this boils down to n 2 − l. The stronger the bound l is, the fewer binary variables are introduced. In the current implementation, the continuous relaxation bound has been adopted.

Local search operator (LS).
To improve the convergence of M-ADMM-H to solutions that are feasible for the equality constraints (34c), we have implemented a local search operator [2] to be applied to the solutions of the QUBO in the first block update of (2) and (1). This operator is based on the Karmarkar-Karp Differencing Method [46], and it shuffles the assignment of items to pairs of bins in such a way to minimize the difference of the weights of the bin.
Bin Packing has been tested on M-ADMM-H on two groups of instances: • Small-sized: n = 2, 3, 4. Weights w j have been randomly picked in [1, Q]. The QUBO has been solved via VQE and CPLEX. On the Scholl dataset instances, the QUBO subproblem has been solved via CPLEX only, to evaluate the quality of M-ADMM-H solutions. The simulations on quantum devices are not of practical implementation at the moment, since the number of qubits in QUBO are O(n 2 ) and would exceed the capabilities of current quantum technology.

Small-sized dataset
For the simulations on CPLEX, Table 1 r 3 k W t f l e v N K 7 z O I r k h J y S c + K R S 9 I g t 6 R J W o Q T R Z 7 J K 3 l z t P P i v D s f i 9 a C k 8 8 c k z 9 w P n 8 A t Z a P s A = = < / l a t e x i t > Figure 5: Illustration diagram of the hybrid classical/quantum approach M-ADMM-H. The optimization solvers adopted for the numerical results are specified (i.e., CPLEX, VQE, SPSA, and COBYLA). There are two nested loops for the selected implementation, specifically the outer ADMM loop, and the inner VQE loop. Figure 6: Prototype circuit used in the simulation results to evaluate |ψ(θ) = U (θ)|0 , here exemplified for three qubits (q = 3) and a depth d = 4, consisting of d + 1 layers. The first operations consists in single-qubit Y rotations, with one variational parameter θ j i per qubit to determine the rotation angle. Each additional layer after the first contains entangling gates, more specifically controlled-Z gates applied to all qubit pairs, followed by another set of single-qubit Y rotations with one variational parameter each to represent the angle. The variational form is then parametrized over q(d + 1) angles, arranged in a vector θ = [θ j i ] i=1,...,q;j=1,...,d+1 .
feasible or optimal solutions, grouped by the number of items of the instance. The 3-block 3-ADMM-H implementation is able to find feasible solutions for over 90% of the instances. The search for optimal solutions becomes more difficult as the number of items increases, and for only 5% of the 4-items instances optimal solutions are found, and the gap to optimality is close to 70% on the 4-items instances. For the two-block implementation 2-ADMM-H the increase of gap is less, however the search for feasible solutions is more difficult, as for 63.33% of the instances feasible solutions are found.
For the simulations in which QUBO is solved via VQE, the solvers Simultaneous Perturbation Stochastic Approximation (SPSA) [58] and Constrained Optimization by Linear Approximation (COBYLA) [51] have been used as classical optimizers with IT = 20, 50 on BP instances with N = 2, 3 and Q = 40. SPSA is known to be more computationally demanding than COBYLA, because it requires two function evaluations per iteration. COBYLA makes ADMM converge in 1 iteration to the optimal solution for instances with 2 items. For each combination of values of N, Q, and IT , 20 instances have been generated with weights in [1, Q], and average results for each group are reported in Table 2. Increasing the number of SPSA iterations is detrimental for the gap, feasibility and optimality of the instances: this is because SPSA runs for as many iterations as the limit set in Qiskit. Invoking VQE with IT = 50 in COBYLA, enables to increase by 40% the number of instances with feasible solutions with N = 3.
Overall, the choice of SPSA as classical solver for VQE with 20 iterations is the best one in terms of solutions quality for these instances with 2 and 3 items, and outperforms the results obtained with CPLEX displayed in Table 1. This can be explained by the percentage of QUBO suproblems solved to optimality by VQE (column QUBO): while COBYLA solves all QUBOs to optimality when N = 2, SPSA reports a non-optimal QUBO solution in a considerable percentage of the instances when N = 2. It seems therefore beneficial for ADMM to solve a part of the QUBO suproblems in an inexact fashion. For instances with N = 3, the number of qubits increases and VQE hardly ever solves the QUBOs to optimality (cf. Remark 2). The residuals are not guaranteed to decrease in each ADMM iterations, as reported by Figure 7 on instance N3C40I8. In this case, 3-ADMM-H explores solutions with 3 bins for about 70 iterations, and then converges to a nonoptimal solution of lower value, which makes the residual equal to 0.
On the same groups of BP instances, 2-ADMM-H has 3-ADMM-H    Finally, the VQE simulations where conducted on 3 BP instances with N = 4, Q = 4. In this case, M-ADMM-H cannot perform more than 2 iterations within the time limit of 1 hour. We have also observed that, due to the size of the search state, VQE is not always able to find a solution where the equality constraints (34b) are satisfied. The number of iterations of the classical solver invoked by VQE has to be set to a sufficiently large value that ensures to explore solutions without augmented Lagrangian penalty terms. As a representative example, Figure 8 displays the allocation of items to the bins on a BP instance (referred to as instance N4Q4) with weights [2, 3, 2, 2], obtained from QUBO at iteration 1 of 3-ADMM-H. Since one of the items with weight 2 is assigned twice in the solution obtained by SPSA with 50 iterations, it is necessary to increase the iterations to 100 to obtain a solution where all items are assigned to one bin. In this case, the solution is feasible and optimal. The time required to perform this 3-ADMM-H iteration goes from 2749.87s in the 50-iteration simulation case to 5380.40s in the 100-iterations case. This shows that the BP instances with 4 items are computationally very demanding for the 3-ADMM-H algorithm.   The LS is instead extremely beneficial to find feasible solutions in the 2-ADMM-H, and it enables to reach convergence within 21 iterations, on average. It is worthy to note that the 2-block implementation enables to find solution with average gaps to optimality less than 50% on those found by the 3-block implementation, even if the convergence is often not reached in 500 iterations. In the 2-block implementation, the gap is less dependent on the capacity of the bins.

MISK
Mixed-Integer Setup Knapsack problem has been tested on M-ADMM-H on 2 groups of instances. The first group of instances, Group 1, has been generated by following the guidelines of [1]. To generate challenging MISK instances, the capacity utilization K k=1 T t=1 D kt P is set to 2.5, data correlation is medium (i. e., D kt ∈ [1, 10], C kt ∈ −[D kt − 2, D kt + 2]), and setup costs S k are randomly sampled in [40,60]. A second group of instances, Group 2, has been generated with the aim to test M-ADMM-H in cases where the continuous decisions have an impact larger than the binary decisions on the solutions. To this end, the S k and values C kt have been lowered, specifically S k ∈ [0, 1], and C kt ∈ [−60, −40]. In both groups of instances, T has been set to 10, and the number of families K, corresponding to the number of qubits in the QUBO, ranges in the set {5, 8, 11, 14}.
Both groups have been initially tested on M-ADMM-H with QUBO solved via CPLEX on a classical device. In this case, the feasible solution in which any item is assigned to the knapsack is very often the only feasible solution found, which can be arbitrarily far from the optimal value. For the simulations with VQE, Table 5 reports the average results obtained on 3 instances for fixed K in Group 1, with 3-ADMM-H. While 3-ADMM-H with SPSA fails to converge within 1 hour for instances with K ≥ 8, it converges with COBYLA in a few iterations, and produces more feasible solutions. However, SPSA yields better results in terms of optimality gap, especially when the maximum number of iteration IT is set to 20. Feasible solutions are found for all instances with COBYLA with IT= 20.
The Group 2 instances are solved with average optimality gap of 21.03% with SPSA, as shown in Table 6, reporting a 75% decrease of this metric with re-    Figure 9 shows solution costs and value of the residuals reported in the 3-ADMM-H iterations on instance K11T10I1 with 11 qubits. The solution cost changes at each iteration in a non monotonic way, and 3-ADMM-H converges to a feasible solution in 10 iterations.
For the 2-ADMM-H implementation, the results are reported in Table 7 and Table 8

Conclusions
In this work, we have proposed an iterative heuristic method M-ADMM-H, based on Alternating Direction Method of Multipliers, to solve MBOs on nearterm quantum devices as well as on classical computers whenever a QUBO solver is available. The method relies on a decomposition of MBO into a QUBO subproblem, which can be tackled via hybrid classical/quantum optimization solvers such as VQE and QAOA, and convex subproblems. This enables to extend the range of mathematical optimization problems that can be solved on NISQ devices. The method has been tested via the Qiskit framework with VQE as quantum QUBO solver on two representative MBO problems, namely Bin Packing Problem, and Mixed-Integer Setup Knapsack Problem. The simulations indicated the effectiveness of M-ADMM-H in finding solutions feasible for the MBO formulations. In particular, for Bin Packing instances with 2 and 3 items, feasible solutions are found with an average optimality gap of at most 7.50%. In this case, setting SPSA in 2-ADMM-H as the VQE solver with 50 iterations delivers the best results. On MISK instances, VQE is beneficial to explore feasible solutions different to a trivial one found via the classical computation with CPLEX. It is important to observe that M-ADMM-H is a heuristic optimization algorithm for a class of MBO formulations, and it is not tailored to the two applications addressed in this paper, namely BP and MISK problems; therefore the results in terms of feasibility are not trivial on these combinatorial problems.
In theory, we have presented formal requirements under which 3-ADMM-H is guaranteed to converge to a stationary point of a pertinent augmented Lagrangian, which applies on quantum and classical computers alike. In practice, we have offer a glimpse on current re-search in combinatorial optimization in quantum computing, along with assumptions, challenges, and open problems.
Future works can include the investigation of other decomposition approaches to devise QUBO subproblems, and the integration of techniques to enforce the feasibility of equality constraints of MBO in the QUBO subproblems [66], as well as the combination of ADMM with slack variable approaches [15].