Variational Quantum Optimization of Nonlocality in Noisy Quantum Networks

The inherent noise and complexity of quantum communication networks leads to challenges in designing quantum network protocols using classical methods. To address this issue, we develop a variational quantum optimization framework that simulates quantum networks on quantum hardware and optimizes the network using differential programming techniques. We use our hybrid framework to optimize nonlocality in noisy quantum networks. On the noisy IBM quantum computers, we demonstrate our framework's ability to maximize quantum nonlocality. On a classical simulator with a static noise model, we investigate the noise robustness of quantum nonlocality with respect to unital and nonunital channels. In both cases, we find that our optimization methods can reproduce known results, while uncovering interesting phenomena. When unital noise is present we find numerical evidence suggesting that maximally entangled state preparations yield maximal nonlocality. When nonunital noise is present we find that nonmaximally entangled states can yield maximal nonlocality. Thus, we show that variational quantum optimization is a practical design tool for quantum networks in the near-term. In the long-term, our variational quantum optimization techniques show promise of scaling beyond classical approaches and can be deployed on quantum network hardware to optimize quantum communication protocols against their inherent noise.


I. INTRODUCTION
The world is progressing towards the quantum internet [1][2][3][4], a global network of quantum devices linked by quantum communication. The quantum internet will revolutionize science and technology by providing advantages in distributed sensing [5][6][7], communications [8][9][10], network security [11][12][13][14][15], and distributed information processing [16,17]. Unfortunately, these applications are not readily available due to, largely, the presence of noise in quantum hardware. Nevertheless, we are at the forefront of the quantum internet. Using existing technology, we can build and scale rudimentary quantum networks, and as new quantum technologies emerge, we can upgrade the functionality of these networks [3].
As quantum communication networks scale, two daunting design challenges emerge. First, network protocols must be designed around hardware noise, however, the difficulty of exactly characterizing quantum noise grows exponentially with the number of qubits [18][19][20]. Second, classical simulation and optimization methods fail to scale with the exponential growth of the Hilbert space dimension. While tensor networks [21] are a promising classical approach, their efficiency requires certain symmetries and entanglement structures to hold. In general, these assumptions might not hold because quantum repeaters [22,23] can perform entanglement swapping protocols [24][25][26] to create complex, multi-qubit entanglement across distant network nodes. Thus, how can we design protocols for noisy, complex quantum networks when classical approaches fail?
Quantum problems often have quantum solutions. Variational quantum optimization (VQO) [27][28][29] is a promising tool for quantum network design. This hybrid algorithm simulates a quantum system on a quantum computer and optimizes it using a classical computer. Hybrid, quantum-classical algorithms have demonstrated success across a wide range of simulation and optimization problems [29][30][31]. In the near-term, VQO techniques are predicted to show practical advantages on noisy intermediate-scale quantum (NISQ) devices [32]. Furthermore, VQO can be deployed on devices in quantum networks to optimize communication protocols against the noise inherent to the quantum network hardware.
To demonstrate that VQO can be a design tool for quantum networks, we use it to find the optimal state preparations and measurements for nonlocality in quantum networks. Nonlocality refers to a phenomenon where entanglement is used to create stronger-than-classical correlations between distant, noncommunicating systems [33][34][35][36][37][38]. These nonlocal correlations are not only a matter of theoretical interest, but can test quantum systems in a black-box manner [34,[39][40][41][42][43][44], and securely perform device-independent cryptography on untrusted devices [15,[45][46][47][48][49][50][51]. In recent years, machine learning has successfully solved complex problems that arise in the study of nonlocality [52][53][54]. Most notably, a hybrid, quantumclassical reinforcement learning algorithm was shown to find the optimal state preparations and measurements that maximize nonlocality [54]. Furthermore, hybrid optimization techniques have been used to maximize nonlocality in simple photonic systems [55,56]. Our VQO methods have many similarities with these previous approaches, thus, we expect VQO to show similar success.
Hence we show that VQO can provide practical value to the design and development of noisy quantum networks. In Section II, we give an overview of n-local networks. In Section III, we describe our VQO framework for noisy quantum networks. In Section IV, we discuss quantum non-n-locality and how it can be maximized using VQO. In Section V, we demonstrate VQO of nonn-locality on noisy quantum hardware. In Section VI, we apply VQO on a classical simulator to investigate the noise robustness of non-n-locality. In summary, we justify our VQO framework for quantum networks by reproducing known results, obtaining new results, and showing its promise of practical advantages when run on quantum hardware.
Our work is accessible, transparent, and reproducible on a laptop computer. Our VQO software for quantum networks is released as a Python package called qNetVO [73], which is built upon PennyLane [74]. All numerics and data are available on GitHub in a supplementary codebase [75].

II. N -LOCAL NETWORKS
In this section, we outline the theoretical model of nlocal quantum networks. We first introduce n-local networks in the classical setting and then, extend this model to the noisy quantum setting. The experienced reader may proceed to Section III where our variational quantum optimization framework is discussed in detail.

A. n-Local Classical Networks
In the classical setting, an n-local network consists of n sources Λ 1 , . . . , Λ n that distribute randomness to m nonsignaling devices or nodes A 1 , . . . , A m (see Fig. 1). The i th source outputs classical value λ i drawn randomly from the distribution Ω Λi with probability P (λ i ) and sends λ i to all linked devices. All sources in the network are assumed to be independent such that where λ = (λ i ) n i=1 contains the random values output from each independent source. The sources {Λ i } n i=1 each distribute their random value to the nodes {A j } m j=1 through a collection of links {L k } l k=1 . The node A j may receive multiple random values, thus, we denote with λ Aj ⊆ λ the set of random values received by node A j .
The j th node has classical input and output alphabets X j := {1, . . . , |X j |} and A j := {1, . . . , |A j |} respectively where the input and output alphabets for the entire network are denoted X := X 1 × · · · × X m and A := A 1 × · · · × A m respectively. Hence the network processes the classical input x ∈ X to produce the classical output a ∈ A where x = (x j ∈ X j ) m j=1 and a = (a j ∈ A j ) m j=1 . Since, the nodes are nonsignaling, their joint probability distribution must satisfy for all x ∈ X , a ∈ A, and λ ∈ {Ω Λi } n i=1 . We characterize n-local networks using only their input-output statistics. Hence we consider a scenario where many identical and independent experiments are performed. In each experiment, a classical input x ∈ X is drawn from a uniform random distribution. The network processes the input x to produce the output a ∈ A. After many repetitions, an approximate conditional probability distribution {P ( a| x)} a∈A, x∈X is constructed. These conditional probabilities fully characterize the network and are represented as a column stochastic matrix referred to as the network behavior where {| x } x∈X and {| a } a∈A form classical orthonormal bases over the input and output sets respectively. The transition probabilities P ( a| x) decompose as [61], Definition 1. L Net is the set of network behaviors whose probabilities P ( a| x) decompose as Eq. (4) for all inputs x ∈ X and outputs a ∈ A for a given n-local network.

B. n-Local Quantum Networks
An n-local network is extended to the quantum setting by replacing the shared randomness at each source with quantum entanglement. The entanglement is distributed amongst the nodes, which subsequently measure their local quantum state to produce a classical output.
This work models an n-local quantum network as an N -qubit system where each qubit is indexed by an inte- . Sources Λ i , nodes A j , and links L k are uniquely described by their local qubits, . Qubits cannot be shared between multiple sources, links, or nodes, hence, Λ i ∩ Λ i = ∅ ∀ i = i and similarly for A j and L k .
In the quantum setting, the n sources collectively prepare the state Prep as the joint Hilbert space where source independence is ensured by the separability across states. For mixed states, we denote the density operator as ρ Λi ∈ D(H Λi Prep ). Quantum channels model the link between source and measurement devices and noise in the network. A quantum channel is represented by the completely-positive trace-preserving (CPTP) map [76] Meas ), where the L k denotes the qubits on which the channel act. For convenience, the input and output Hilbert spaces have equal dimension. A quantum channel is expressed in either the operator-sum representation [76,77] where {K i } i are Kraus operators [77], or the the systemenvironment representation [76,78] where E ⊆ [N ] is an ancillary set of qubits that represent the environment, Tr E [·] denotes the partial trace over the environment, and U N is a unitary applied to both system and environment. Independent quantum channels combine to describe the network noise as N Net = l k=1 N L k where in the noiseless case, N Net (ρ Net ) = id(ρ Net ) = ρ Net . Note that noise can also be applied to the source or measurement devices.
The measurement at node A j is modeled using a projection-valued measure (PVM) {Π Aj aj |xj } aj ∈Aj that forms a set of orthogonal projectors satisfying aj ∈Aj Π Aj aj |xj = I Aj . The node measures its local qubits ρ Aj ∈ D(H Aj Meas ) that were received from its linked sources. In aggregate, the network applies the projector Π Net a| x = m j=1 Π Aj aj |xj , where the PVM applied at each node is conditioned upon the classical input x j ∈ X k . Upon measurement, the classical output a is obtained with probability, where the expectation of multiple observables, is understood as the expected parity of a.
Definition 2. Q Net is the collection of network behaviors P Net whose probabilities P ( a| x) decompose as Eq. (7) for some choice of network state preparation ρ Net , noise model N Net , and PVM measurements {{Π Net a| x } a∈A } x∈X .

III. VARIATIONAL QUANTUM OPTIMIZATION OF NOISY QUANTUM NETWORKS
In this section, we develop a variational quantum optimization (VQO) framework for maximizing non-nlocality in noisy quantum networks. At a high-level, our VQO framework simulates the quantum network as quantum circuit and optimizes the circuit parameters using differential programming and gradient descent. We implement our methods in a Python package called qNetVO: the Quantum Network Variational Optimizer [73]. The qNetVO software is built upon PennyLane, a free and open-source quantum differential programming framework [74]. PennyLane enables our VQO framework to be easily run on a wide range of quantum devices and classical simulators. The threelayer quantum circuit ansatz that simulates the network. In the source layer, the quantum state |ψ Λ i is prepared. In the noise layer, a static noise model N q k is applied. In the measurement layer, the projector Π A j z j |x j is a measurement in the computational basis. The ansatz circuit is combined into a single unitary U Net (Θ x ) and the outcome probabilities are calculated using Eq. (11). When a measurement includes multiple qubits, an XOR operation is used to map the raw output to a single bit, e.g., b = z2 ⊕ z3.

A. Simulating Noisy Quantum Networks
To simulate a noisy n-local quantum network on a quantum computer, the state preparations, measurements, and noise are encoded into the unitary evolution of a quantum circuit. We define the network-specific ansatz as a unitary operator U Net (Θ x ) that is parameterized by a collection of real values Θ x referred to as the circuit settings. The ansatz circuit parameters are organized as Θ parameterize the state preparations and measurements respectively. The parameterized unitary U Net (Θ x ) acts upon the Nqubit zero state |0 N and is measured in the computational basis {| z } z∈Z where Z := {0, 1} N is the set of all N -bit strings. When a noiseless quantum computer executes the ansatz circuit, the bitstring z is output with probability Using Eqs. (3) and (11), we express the parameterized quantum circuit behavior P QC (Θ) as the column stochastic matrix The network settings Θ contain the settings for all inputs x ∈ X and are organized as where the network settings Θ are distinct from the circuit settings Θ x because Θ x contains only the settings pertaining to input x ∈ X . If a quantum network's output does not conform to an N -bit string, then a classical postprocessing map L : Z → A is needed to obtain the simulated network behavior P from the quantum circuit behavior P QC . That is, where P QC (Θ) is defined in Eq. (12) and the postprocessing map is represented as a column stochastic matrix, On a quantum computer, the network behavior P(Θ) is obtained by repeatedly executing the ansatz circuit U Net (Θ x ) across all inputs x ∈ X to estimate the probabilities P ( z| x) for all inputs.

Simulating Noiseless n-Local Quantum Networks
A noiseless n-local quantum network ansatz decomposes into preparation and measurement layers, where each layer is modularized as The network state preparation and measurements are expressed as Combining Eqs. (19) and (20) with Eq. (11) yields,

Simulating Noisy n-Local Quantum Networks
On quantum hardware, noise is modeled using a unitary operator U N to implement the system-environment representation of a quantum channel expressed in Eq. (6). Ancilla qubits are required to implement nonunitary dynamics. In the worst case, the Hilbert space dimension of the ancilla space is d E = d 2 S where d S is the dimension of the system the quantum channel operates upon [78].
If the quantum circuit is run on a classical simulator such as the PennyLane default.mixed mixed state simulator [74], then ancillary qubits are not needed and the operator-sum representation expressed in Eq. (5) can be used instead. In practice, using a classical simulator to model noisy dynamics is a good idea because the inherent noise of the quantum hardware is not included. Furthermore, the Kraus operators give precise control over the action of quantum channel description that might not easily be implemented by the unitary gates available on a quantum computer.
Detector errors can be modeled using a classical postprocessing map E. That is, a noiseless network behavior P Net can have noise added as where E is a column stochastic matrix. This model of detector errors describes unflagged errors that occur without the experimenter's knowledge. Hence the errors cannot be corrected or removed by postselection.

B. Optimizing Quantum Networks
The goal of our VQO framework is to find the optimal settings Θ that yield a network behavior P Net (Θ ) that is optimal for a particular task, e.g., violating a Bell inequality. We define a problem-specific cost function Cost(P Net (Θ)) that quantifies the network's performance at this task. The optimization objective is then expressed as a minimization of the cost function, The cost function can quantify a wide range of network properties such as entropic quantities, the distance to a desired network behavior, or the winning probability of a multipartite game. The optimization problem in Eq. (23) is solved using gradient descent [79,80] where η ∈ R is the step size and ∇ Θ Cost(Θ) is the gradient of the cost function evaluated at Θ. This iterative optimization procedure incrementally traverses the path of steepest descent to find a local minimum of the cost function where in each step, the network settings are updated as Θ → Θ . The gradient ∇ Θ Cost(P Net (Θ)) is evaluated numerically using automatic differentation [81,82]. On a classical simulator, the gradient of a quantum circuit can be evaluated using a chain rule method known as back propagation [82]. On quantum hardware, the gradient of a quantum circuit is evaluated using the parametershift rule [83]. In practice, the parameter-shift rule first runs the network simulation over a collection of "shifted" settings {Θ g } g =Θ PS and then, the classical optimizer uses the resulting circuit behaviors {P QC (Θ g )} Θ g ∈ΘPS to construct the gradient.

C. Variational Quantum Optimization Algorithm
Having established the key components of our VQO framework, we now explicitly describe our hybrid optimization algorithm implemented by the qNetVO software [73]. The algorithm requires a parameterized network ansatz U Net (Θ x ) and cost function Cost(P Net (Θ)). As hyperparameters, the algorithm accepts the step size η and the number of gradient descent iterations num steps. The following steps depicted in Fig. 3. 1. The network settings Θ init are randomly initialized.
2. The hybrid VQO loop repeats num steps times. In each step the following actions are performed: (a) The classical optimizer constructs the collection of settings {Θ g } g =Θ PS needed for the parameter-shift rule.
(b) The quantum computer evaluates P QC (Θ g ) for all settings in Θ g ∈Θ PS .
(c) The classical optimizer evaluates the gradient ∇ Θ Cost(P Net (Θ)) using the parameter-shift rule and updates the network settings using Eq. (24) to perform gradient descent.
3. The optimization algorithm exits after performing num steps of gradient descent iterations. As output, the algorithm provides the collection of settings evaluated in each step, and the cost Cost(P Net (Θ)) achieved by those settings.
The VQO algorithm is not guaranteed to find the global optimum, however, the probability of finding the global optimum is increased by repeating the optimization with randomly initialized settings Θ init . Whether the global optimum can be found is highly dependent on the cost function and network ansatz. Furthermore, the algorithm provides a upper bound on the true minimum of the cost function. That is, it does not find the exact settings Θ that minimize the cost, but settings that are close to optimal, i.e., Θ = Θ ± for some small value . The precision of the optimization can be adjusted using the step size η. In practice, qNetVO [73] provides convenient descriptions of ansatz, network settings, and cost functions, while PennyLane [74] provides the machinery for automatic differentiation and the execution of circuits on classical and quantum hardware.

IV. QUANTUM NON-N -LOCALITY
In this section, we introduce the concept of quantum non-n-locality. We first discuss Bell inequalities and how they can be used to witness quantum non-n-locality. Next, we introduce the considered n-local networks, their Bell inequalities, and their maximal quantum violations in the noiseless case. Finally, we describe how our variational quantum optimization methods can be used to maximize quantum non-n-locality.

A. Witnessing Quantum Non-n-Locality
In the noiseless setting, quantum entanglement allows for stronger-than-classical correlations to form across distant network nodes. That is, the set of classical network behaviors L Net from Definition 1 and the set of quantum network behaviors Q Net from Definition 2 relate as L Net ⊆ Q Net when both network have the same topology [34,64]. Quantum non-n-locality is then defined as follows.
Definition 3. A quantum network behavior P ∈ Q Net is non-n-local if and only if P / ∈ L Net where both classical and quantum n-local networks have the same topology.
To decide whether a network behavior P Net is nonn-local, we must characterize the geometry of the set of classical network behaviors L Net . In general, L Net is closed, connected, and contains a finite set of vertices V corresponding to deterministic behaviors [64] (25) Deterministic network behaviors do not require shared randomness, hence, the set of vertices V is determined solely by the network nodes {A j } m j=1 . As a result, if we denote L Net as the local (n = 1) set having a single source and network nodes {A j } m j=1 , then L Net = Conv (L Net ) [34,64]. In general, L Net ⊆ L Net where equality only occurs in trivial cases, e.g., the network contains a single measurement device. It follows that L Net is nonconvex for all L Net ⊂ L Net [64].
The set of n-local network behaviors L Net is bounded by inequalities referred to as Bell inequalities [34,64] where S Bell (·) is a function referred to as the Bell score and β is the classical upper bound. All behaviors P ∈ L Net must satisfy the Bell inequalities that bound L Net . For the local (n = 1) case, the Bell inequalities bounding L Net are linear half-space inequalities [34]. In general, the Bell score S Bell (P) is a nonlinear function of the probabilities P ( a| x) [57-64, 84, 85]. The Bell inequalities bounding L Net can be used to witness quantum non-n-locality. That is, if S Bell (P) ≤ β, then P / ∈ L Net . In this case, the Bell inequality is violated and the behavior P is witnessed to be non-n-local. Therefore, the study of quantum non-n-locality reduces to finding n-local quantum behaviors P ∈ Q Net that violate a particular Bell inequality. Fortunately, many Bell inequalities have been derived that tightly bound important n-local networks including star [59], chain [60], and tree [61][62][63] topologies. Thus, non-n-locality can be studied broadly without deriving new n-local bounds.

B. Bell Inequalities for n-Local Networks
This work considers general n-local star [59] and chain [60] networks where each network node has binary inputs x j , y j ∈ {0, 1} and outputs a j , b j ∈ {±1} (see Fig. 4). Star networks are important because they can implement generalized entanglement swapping protocols [24][25][26]59] where the exterior nodes in the star can become arbitrarily entangled based upon the measurement applied in the central node. Similarly, chain networks are important because they model quantum repeater chains that enable long-distance quantum communication via relay nodes that perform entanglement swapping [22,23]. In both cases, non-n-locality is important for device-independent certification of the entanglement sources and quantum measurements required to implement entanglement swapping protocols and network security protocols [13,15,49,51].
For the n-local networks depicted in Fig. 4, we now introduce the Bell inequalities and their quantum violations. In all cases, the optimal noiseless strategies are achieved using maximally entangled state preparations ψ Λi = |Φ + = (|00 + |11 )/ √ 2 and linear combinations of the Pauli observables σ x and σ z .
The n-local star network depicted in Fig. 4.d consists of n, two-qubit sources {Λ i = (q i , q n+i )} n i=1 and m = n + 1 measurement nodes arranged in a star formation with n nodes {A j = (q j )} n j=1 serving as points of the star connected to a single central node B = (q n+j ) n j=1 . The n-local star network Bell inequality is defined as [59] S n-Star := |I n,0 | 1/n + |I n,1 | 1/n ≤ 1, where In Eq.  [59]. In the n = 1 case, the star network reduces to two measurement devices depicted in Fig. 4.a. This simple two-qubit network is the fundamental example of nonlocality [34] and is bounded by the CHSH inequality [86] Rearranging the CHSH inequality in Eq. (29) we find the n = 1 case for the star inequality in Eq. (27) The maximal violation of Eq. (29) is S CHSH = 2 √ 2 ≤ 2 and is achieved by the source preparation ψ Λ = |Φ + measured using the observables . In the n = 2 case, the star network reduces to the bilocal network depicted in Fig. 4.b. This network contains two sources Λ 1 = (q 1 , q 2 ) and Λ 2 = (q 3 , q 4 ). The two exterior nodes A 1 = (q 1 ) and A 2 = (q 4 ) hold the first and last qubit while the central holds the qubits B = (q 2 , q 3 ). The bilocal network is bounded by the Bell inequality [57,58] which is exactly the n = 2 case of n-local star Bell inequality in Eq. (27). The maximal violation is S Biloc = √ 2 ≤ 1 and is achieved using the observables O The n-local chain network is an important extension of the bilocal network. In this network, the n sources connect the m = n+1 nonsignaling measurement devices in a chain depicted in Fig. 4.c. The exterior nodes A 1 = (q 1 ) and A 2 = (q 2n ) hold the first and last qubits while the interior nodes each hold two qubits {B j = (q 2j−2 , q 2j−1 )} n j=2 . The set of n-local chain behaviors is bounded by the inequality [60], where S n-Chain is very similar to the bilocal inequality described by Eq. (31). The distinction arises in the central observable O B y , which for the n-local chain, decomposes as where O Bj yj =y is the two-qubit observable applied at each interior node in the chain. In Eq. (32) the collection of interior measurement devices {B j } n j=2 is treated as a single observable O B y conditioned on a single input y ∈ {0, 1}. The optimal quantum violation of Eq. (32) is S n-Chain = √ 2 ≤ 1 using Bell state preparations and the observables O

C. VQO of Quantum Non-n-Locality
To summarize, non-n-locality is witnessed by the violation of a Bell inequality S Bell (·) ≤ β that bounds the n-local set L Net . Thus, our objective is to find where S Bell is the maximal Bell score that can be achieved by n-local quantum network and the maximization is performed over all network behaviors in the n-local quantum set Q Net (see Definition 2). This maximization requires us to find the optimal state preparations and measurements for maximizing non-n-locality given a static network noise model N Net . The maximization of non-n-locality expressed in Eq. (34) is well suited for our VQO framework. The n-local network ansatz is constructed similarly to Fig. 2 where the state preparations and measurements are modeld by Eq. (17) and Eq. (18) respectively. Furthermore, the network noise N Net can be modeled as described in Section III A 2. The cost function is simply expressed as where the minus sign transforms the minimization of the cost into the maximization of the Bell score. It is then a simple matter of using software such as qNetVO [73] to perform the algorithm shown in Fig. 3.

V. VQO OF NON-N -LOCALITY ON QUANTUM HARDWARE
In this section, we show that our VQO framework can be run on quantum hardware and discuss how our optimization methods show promise in demonstrating practical advantages. We first demonstrate that our VQO techniques can successfully maximize non-n-locality on IBM quantum hardware. Then, we discuss how our methods can be scaled to demonstrate practical advantages on quantum computers. Finally, we discuss the advantages that might be gained by deploying our optimization methods directly on quantum network hardware.

A. VQO of Non-n-Locality on Noisy IBM Quantum Computers
Using IBM quantum computers, we use the variational quantum optimization algorithm depicted in Fig. 3 to maximize the non-n-locality in quantum networks using Eq. (34). We use VQO to maximize the CHSH score S CHSH , the bilocal score S Bilocal , the trilocal chain score S 3-Chain , and the trilocal star score S 3-Star . Each optimization uses the same preparation and measurement ansatzes depicted in Fig. 5 where the Bell state |Φ + = 1 √ 2 (|00 + |11 ) is prepared at each source and local qubit measurements are free to rotate about the yaxis. This ansatz reflects experimental setups of n-local networks [65][66][67][68][69] where the free parameters correspond a rotation applied to each measurement apparatus Our variational quantum optimization results are shown in Fig. 6. In all optimizations, each circuit eval- When applying VQO on hardware, we consider a simple ansatz. Each source (green) prepares the Bell state Φ + using a Hadamard and CNOT gate. Each nonsignaling device (blue) applies a local rotation about the y-axis to each qubit before measurement in the computational basis. When a measurement device contains more than one local qubit, the XOR is taken to convert the bit string into a binary output.
FIG. 6. VQO of Non-n-Locality on IBM Quantum Computers. Non-n-locality is optimized in four different n-local networks. The x-axis shows the step of the gradient descent optimization while the y-axis shows the Bell score. The noiseless quantum bound is shown by the solid blue line and the classical bound is shown by the dashed orange line. For each Bell inequality, we aggregate data across several different optimization runs. At the end of each optimization, the theoretically optimal score is evaluated on noisy hardware to serve as a baseline. The mean theoretical score is shown by the dash-dotted brown line and the max theoretical score is shown by the dash-dot-dotted pink line. In each step, the max score across all optimization is shown by the dotted red line with circle markers while the mean score across all optimizations is shown by the dotted purple line with error bars showing the standard error. The dotted green line with diamond markers shows the noiseless optimal score obtained by running the settings for the maximal score in each step on a noiseless classical simulator.
uation was run using 6000 shots and randomized initial settings. The CHSH case was optimized using the 5qubit ibmq belem device while the bilocal, trilocal chain, and trilocal star networks were optimized using the 7qubit ibmq casablanca and ibmq jakarta devices. The CHSH plot shown in the upper-left of Fig. 6 aggregates data from 11 separate optimizations using a step size of η = 0.12. The bilocal plot shown in the upper-right of Fig. 6 aggregates data from 5 optimizations and using a step size ranging from η = 1.4 to η = 1.5. The trilocal chain plot shown in the bottom-left of Fig. 6 aggregates data from 6 optimization using a step size ranging from η = 1.6 to η = 2. The trilocal star plot shown in the bottom-right of Fig. 6 aggregates data from 5 optimizations using a step sizes ranging from η = 1.6 to η = 1.8.
As a baseline measure of noise on the IBM quantum computers, we calculate the theoretical score using the optimal settings in the noiseless case described in Section IV B. The noise on quantum hardware deteriorates the non-n-local correlations causing a separation from the noiseless quantum bound. In particular, the bottomleft plot shows the theoretical quantum violation of the trilocal chain inequality to be close the classical bound, especially when compared with the trilocal star network (bottom-right). This difference is due to the fact that the S n-Star is more robust to noise than S n-Chain [59,60]. Finally, the noise on the quantum computers is not constant, it can fluctuate throughout the day [88], hence, the optimal settings in the noiseless case do not always produce the same value.
All plots in Fig. 6 show the mean optimized score exceeding the classical bound, thus, VQO is able to find non-n-local settings in all cases. In most cases, the error bars shrink as the optimization step increases indicating convergence to an optimum. In the bilocal network optimization, the significant error bars on the final step are likely a result of the step size being too large. In the trilocal chain and star optimizations, the mean optimization score converges to the mean theoretical score showing that the optimization consistently finds the theoretical maximum. In the bilocal and CHSH optimizations, the mean optimization score does not reach the mean theoretical score. This is the result of some optimizations finding the local optimum of the classical bound.
In all plots, the max optimized score converges to a value consistent with the max theoretical score. In all cases except for the CHSH optimization, an optimized score is found that exceed the max theoretical score. While there is a statistical chance that that the optimized score may be larger than the max theoretical score, it is also possible that our VQO framework may be finding optimal settings tailored for the quantum hardware. That is, there may be biases in the gate operations and measurement bases that VQO may be optimizing against. On the other hand, the theoretical settings are naïve to any such hardware biases.
The final feature shown in the data of Fig. 6 is that the setting optimized on the IBM hardware correspond to the optimal settings in the noiseless case. That is, when we take the settings for the maximal score in each optimization step and run them on a noiseless classical simulator, they maximally violate the Bell inequality in question. While this remarkable feature is interesting, its value is questionable. While we may be able to obtain the optimal settings on a large NISQ device, the quantum computer will still output a nonoptimal answer that cannot be checked on a classical computer due to computing constraints. However, there may exist some applications where the optimal settings provide value on their own. At any rate, the prevalence and usefulness of this feature should be a matter of further study.

B. Scaling VQO for Practical Advantages on Quantum Computers
It is challenging to scale our optimizations beyond what was considered in the previous section. As more circuit parameters are introduced, the number of circuit executions required by the parameter-shift rule grows polynomially [74,83]. Furthermore, some cost functions require additional correlators to be evaluated as the network scales. For example, the n-local star Bell inequality in Eq. (27) contains 2 n+1 correlator terms that each require a quantum circuit to be run or differentiated. Unfortunately, additional circuit executions are have a significant overhead due to the latency of remote execution, queue wait times, and serial circuit execution on a quantum computer. We mitigate this overhead by reserving quantum hardware and batching circuit executions, however, more can be done to scale the optimizations.
First, we need wide parallelization across quantum computers. In the worst case, the cost function will require |X | = m j=1 |X j | circuits to construct the network behavior P Net (Θ) for all network inputs. Furthermore, the parameter-shift rule scales the number of circuit executions polynomially. Fortunately, these circuits are independent and can be run in parallel. Thus, if we wish to scale VQO techniques, it will be exceedingly important parallelize circuit executions across many NISQ devices.
Second, to overcome the latency of remote execution, we need classical and quantum hardware to be run in close proximity. The quantum computing industry is taking steps in this direction, for example, the Qiskit Runtime environment.
Third, larger NISQ devices will allow larger networks to be simulated. In this regime, we may find practical simulation advantages. However, NISQ devices have yet to show a simulation advantage [89][90][91][92][93][94]. Nevertheless, NISQ devices are predicted to provide simulation advantages in the near-term [32,95] As an aside, we note that larger networks can be simulated using smaller quantum devices where an exponential increase in the number of circuit evaluations is accrued [96,97]. Such methods are only feasible if wide parallelization across NISQ devices can offset the exponential increase in circuit evaluations.

C. Adapting VQO to Quantum Network Hardware
In principle, the VQO framework depicted in Fig. 3 can be run on a quantum network rather than a quantum computer. The key requirement is that the network devices have free parameters that can be tuned continuously, and are therefore, differentiable. For example, rotating the polarization of a photon in an optics setup is similar to the ansatzes shown in Fig. 5 where qubits are rotated about the y-axis. In fact, hybrid optimization techniques on photonic systems have previously demonstrated the ability to maximize the violation of the CHSH inequality [55,56]. Hence, our VQO framework could be extended to similar photonic implementations of n-local networks [65][66][67][68][69].
A key advantage of extending our VQO framework to quantum network hardware is that network protocols can be optimized against the noise inherent to the quantum network hardware. That is, the noise and biases on quantum computers may not accurately reflect the noise and biases on quantum network hardware. Hence, communications protocols optimized on quantum network hardware will be more robust because they are tailored specially for the hardware they run on. Therefore, the cost of noise tomography [18][19][20] can be avoided altogether.
Deploying VQO on quantum networks may provide a means of automating device integration and maintenance in quantum networks. Using VQO, quantum network devices may be able to automatically align photon polarization bases, maintain the communication capacity of channels, or set up device-independent protocols. Such self-organization amongst network devices may significantly reduce the manual work and expertise needed to build, scale, and maintain quantum networks.

VI. USING VQO TO INVESTIGATE THE NOISE ROBUSTNESS OF QUANTUM NON-N -LOCALITY
In this section, we demonstrate that our VQO framework can provide practical value when run on a classical simulator. To do this, we use VQO to investigate the noise robustness of quantum non-n-locality. We verify that VQO can reproduce theoretical noise robustness results while also uncovering interesting new phenomena.
We begin by discussing the effect of noise on non-nlocal correlations. Then, we discuss the maximal violations for arbitrary mixed states. Next, we overview our use of VQO to investigate the noise robustness of quantum non-n-locality. Finally, we observe using VQO that maximally entangled state preparations are optimal for unital noise models, but not necessarily for nonunital noise models.

A. The Noise Robustness of Quantum
Non-n-Locality The noise robustness of quantum non-n-locality quantifies the amount of noise that an n-local quantum network can tolerate before its behaviors become n-local. Noise causes the deformation of Q Net , the set of quantum network behaviors. That is, if Q id Net denotes the set of noiseless quantum network behaviors, then the set of quantum behaviors for a noisy network Q Net satisfies Q Net ⊂ Q id Net . Furthermore, if a sufficient amount of noise is present, then Q Net ⊆ L Net and the non-n-locality of the network is broken. We extend the concept non-nlocality breaking from references [71,72], which introduce the general concept of nonlocality breaking noise.
To quantify noise, a quantum channel N γ or classical postprocessing map E γ is parameterized by γ ∈ [0, 1]. We let γ = 0 denote the noiseless case while all γ ∈ [0, 1] correspond to CPTP maps for quantum channels and column stochastic maps for classical postprocessing models. If the noise model is parameterized by multiple values, then we denote the parameters as γ = (γ k ) M k=1 where M specifies the number of parameters in the model.
The robustness of quantum non-n-locality can then be quantified by the critical noise parameter γ 0 at which non-n-locality is broken. In general, it is difficult to determine if Q Net ⊆ L Net because all Bell inequalities must be known. However, it is reasonable to determine the noise parameter γ 0 that breaks the non-n-locality with respect to a particular Bell inequality S Bell , which is the approach taken in references [71,72].

B. "Maximal" n-Local Violations Using Noisy Quantum States
In Section IV B, the optimal quantum strategies for non-n-locality are described in the noiseless case. In the presence of noise, these strategies are not guaranteed to be optimal. Fortunately, maximal violations have been derived for arbitrary mixed state preparations when separable qubit PVM measurements are applied at each measurement device [98][99][100][101].
These bounds for maximal violation are based upon the necessary and sufficient conditions for violation of the CHSH inequality [98]. That is, we first construct the real-valued 3 × 3 matrix T ρ Λ with elements where σ q1 i and σ q2 j are Pauli operators acting on qubits q 1 and q 2 respectively. We then construct the matrix and define its two largest eigenvalues as µ 1 (R ρ Λ ) and µ 2 (R ρ Λ ) respectively. If µ 1 (R ρ Λ ) + µ 2 (R ρ Λ ) > 1 then the CHSH inequality is maximally violated by the score The maximal n-local star score is [99][100][101] and the maximal n-local chain score is [101], The maximal violations for the star network relate to the maximal CHSH score S Λi CHSH for source Λ i as [99][100][101] S n-Star ≤ where equality is achieved when the sources are identical such that ρ Λi = ρ Λ i for all i, i ∈ [n] [99]. These equations do not account for entangled measurements or POVMs, which may improve the Bell score. However, in the bilocal case of Eq. (39) the correlations of Bell state measurements are proven to be strict subset of local PVM measurements [100]. Unfortunately, the "maximal" violations expressed in Eqs. (39) and (40) do not reflect the optimal strategy in some cases. Our variational quantum optimization results in Sections VI D and VI E uncover key edge cases in which Eqs. (39) and (40) do not hold. Namely, classical strategies applied on a subset of sources can sometimes achieve greater Bell scores.
First, consider the quantum n-local star network with one classical source ρ Λ1 = |00 00| and n − 1 quantum sources ρ Λi = |Φ + Φ + | for all i ∈ [2, n]. Since ρ Λ1 is a classical state, µ 1 (R ρ Λ 1 ) = 1 and µ 2 (R ρ Λ 1 ) = 0. Using Eq. (39), we find that S n-Star = 1 and is therefore nlocal. However, if the classical state ρ Λ1 = |00 00| is measured using the local observables O A1 x1 = (1 − x 1 )σ z + x 1 σ x and O B1 y = σ z , then the two-body correlator is . This strategy is equivalent to the case where the measurement on the central node B 1 outputs 1 with certainty while the measurement at node A 1 outputs 1 if x 1 = 0, and otherwise, outputs ±1 with equal probability. The remainder of the network nodes implement the measurements for the optimal quantum star network strategy. Then, using Eq. (28) and the fact that all correlator terms with x 1 = 1 vanish due to the random output from A 1 , we find I n,y = 1 2 Furthermore, we can extend this argument into a more general case where the star network has k classical sources that prepare the state |00 00|, then, the described strategy can achieve a Bell score of Thus, we have described a partially classical strategy that outperforms the "maximal" score predicted by Eq. (39). As a second example, we consider the n-local chain network with the interior devices preparing the classical state ρ Λi = |00 00| for all i ∈ [2, n − 1]. In this case, Eq. (40) predicts that S n-Chain = 1, however, we show that this is not the case. Let the observables on qubits q 3 , . . . , q n−2 be σ z for all y. Then O q3 y . . . O qn−2 y = 1 for all y ∈ {0, 1} because the sources prepare classical states on these qubits. However, if sources ρ Λ1 and ρ Λn are quantum, then they can achieve S Biloc using the n = 2 case using Eq. (39). Hence, the n-local chain inequality in Eq. (32) can be maximally violated even when all interior sources are classical.
Thus, we show that edge cases exist for which the maximal violations predicted by Eqs. (40) and (39) break down. Nevertheless, these equations still present a useful baseline to which we compare our VQO results. Furthermore, the partially classical strategies utilized in the edge cases will be useful in understanding the noise robustness results obtained from VQO.

C. VQO of the Noise Robustness of Non-n-Locality
Our objectives are to verify that our VQO framework can reproduce known noise robustness results and also, uncover interesting new results. In doing so, we demonstrate the practical value of our VQO techniques when applied on a classical simulator. Our investigative approach to noise robustness is distinct from previous works [71,72] that evaluate the precise noise parameters at which nonlocality is broken. Instead, we use VQO to find maximal violations of a Bell inequality given a static noise model N Net γ . By scanning through the noise parameters, we create a picture of how the non-n-locality deteriorates as the amount of noise increases. Hence we are able to easily compare the relative noise robustness across different n-local networks.
We consider noise applied during the preparation, communication, and measurement stages of an n-local network (see Fig. 7). Source noise occurs during the state preparation at each source and is modeled as N Net γ = n i=1 N Λi γi . Communication noise occurs during transmission of quantum states and is modeled as N Net γ = l k=1 N L k γ k . Detector noise occurs during measurement and is modeled as E Net γ = m j=1 E Aj γj . Alternatively, detector noise can be modeled as an adjoint channel applied to the measurement N Net † γ (Π Net a| x ). To simplify our investigation, we characterize the network noise using one noise parameter. First, we consider networks where noise is applied to only one source, link, or measurement device. This setting models a faulty component in an ideal network. Second, we consider the case where noise is applied uniformly to all sources, links, or measurement devices. This setting is more realistic and assumes that all devices are equally noisy. Our approach can easily be extended to nonuniform noise models as it introduces no additional computational overhead in comparison to the uniform noise model.
To evaluate the noise robustness we begin with static noise model N Net γ . To create a high-level overview of the noise robustness, we scan through the noise parameter γ ∈ [0, 1] using an interval of 0.05. For each γ, we use VQO to find the optimal state preparations and measurements that maximize non-n-locality with respect to the Bell inequality S Bell . We repeat this procedure for all considered n-local networks depicted in Fig. 4 and compare their relative noise robustness. Furthermore, we compare the optimized results with theoretical bounds on the max violation. For some channels, we derive max violation directly and show that our optimizations reproduce the expected results. In other cases, we use Eqs. (38), (39), and (40) to derive the maximal violation for a fixed state such as the Bell state We organize our investigation into two broad classes of noise, unital and nonunital. In each case, we consider noise applied to sources, qubit communication, and detectors. For each noise model we compare the results across the Bell inequalities expressed in Eqs. (30), (31), (27), and (32). We note that in the case of the CHSH inequality, we use the Bell score 1 2 S CHSH so that it can be compared directly with the Bell scores of the other n-local networks. Finally, to investigate how entangled state preparations or measurements relate to the noise robustness of non-n-locality, we explore a wide range of state preparation and measurement ansatzes. For details about the ansatzes, please refer to Appendix A

D. Unital Channels
A quantum channel U is unital if it satisfies U(I) = I. Unital noise cannot improve the purity of a state, hence, Tr U(ρ) 2 ≤ Tr ρ 2 . To motivate our investigation of unital noise, we refer to Theorem 1 proven in Appendix B. This theorem states that the noisy state U 1 ⊗U 2 (|Φ + Φ + |) maximally violates the CHSH inequality for any two unital qubit channels U 1 and U 2 . Hence, unital qubit noise does not affect the optimal state preparation for violating the CHSH inequality. Thus, we ask two questions. Does the optimality of maximally entangled states extend to the n-local networks with unital qubit noise models? Does the optimality of maximally entangled states extend to n-local networks with unital multi-qubit noise models?
To help answer these questions, we use our VQO framework to investigate the noise robustness of various unital channel including qubit depolarizing noise, source depolarizing noise, detector white noise, and qubit dephasing noise. In all cases, we find that maximally entangled states and local qubit measurements are sufficient to achieve maximal violations. That is, we find no example of arbitrary state preparations or measurements leading to a larger Bell score. While our results provide evidence that maximally entangled state preparation may be optimal in the case of general unital noise, we cannot make any significant claim outside of the considered networks and noise models.
Furthermore, we find that VQO is able to reproduce known results and find interesting new quantum strategies. In all cases, we show that our VQO results align closely with known or derived maximal violations. Furthermore, in the case of the dephasing channel, we find that the "maximal" n-local violations of Eqs. (39) and (40) do not hold because there exist partially classical strategies that are more robust to dephasing noise.

Qubit Depolarizing Noise Robustness
Qubit depolarizing noise mixes the maximally mixed state with the input qubit state as where v is a parameter commonly referred to as the visibility. The visibility relates to the noise parameter as The Kraus operators for the qubit depolarizing channel are expressed as Pauli operators Using these Kraus operators, we simulate qubit depolarizing noise on the PennyLane default.mixed classical simulator [74] (see Fig. 8). We find a close correspondence between the Bell scores maximized using VQO and the theoretical maximum derived using Eqs. (38), (39), and (40) and the Bell state preparation |Φ + at each source. Using our VQO framework, we find the maximal violation is achieved using the Bell state |Φ + . However, we also consider arbitrary preparation and measurement ansatzes (see Appendix A). In the CHSH and bilocal networks, we find that arbitrary state preparations and measurements do not yield a larger Bell score. For star and chain networks, we are unable to explore the arbitrary state preparation and measurement ansatzes because the PennyLane default.mixed simulator does not permit efficient optimization.

Source Depolarizing Noise Robustness
Source depolarizing noise mixes the two-qubit maximally mixed state with the input state, where the visibility v relates to the noise parameter as The Kraus operators for a two-qubit depolarizing channel are expressed as where all i and j are considered except i = j = 0. Maximal quantum violations for n-local networks with source depolarizing noise have been derived in terms of the visibility v. Namely, the max violation of the star net- [58,59] and the max violation of the chain network is S n-Chain = √ 2 n i=1 v i [60]. Using Eq. (49), we redefine these maximal violations in terms of the visibility to find where S Bilocal and S CHSH are treated as the respective n = 2 and n = 1 cases of S n-Star .
The results in Fig. 9 verify that our VQO framework can reproduce the theoretical results in Eqs. (52) and (53). To simulate these networks, we use the PennyLane default.mixed simulator [74] and the Kraus operators expressed in Eq. (51). We find that the state preparation |Φ + and local qubit measurements optimized over rotations about the y-axis are sufficient for maximal violations in all cases. Unfortunately, we are unable to efficiently scale our optimizations beyond the 3-Local chain and 3-Local star networks because there is a significant computational overhead in applying the two-qubit Kraus operators expressed in Eq. (51).
We provide further evidence supporting the optimality of maximally entangled state preparations and local qubit measurements through the consideration of general ansatzes. In both the CHSH and bilocal cases, we consider arbitrary state preparation and measurement ansatzes (see Appendix A). We find no example of states or measurements that exceed the theoretical bound in Eqs. (52) and (53).

White Noise Detector Errors
For a detector with binary outputs, we define a white noise error as the classical postprocessing map where with probability γ, the detector outputs a binary value drawn from a uniform random distribution. The white noise detector error is described by the M -qubit POVM with elements where Π ±|x are projectors onto even and odd parity subspaces. In Proposition 2 of Appendix C, we prove that the white noise detector error is unital and equivalent to the M -qubit depolarizing channel acting upon the detector's local state ρ Aj . Furthermore, in Proposition 3 of Appendix C, we prove that the maximal Bell score for detector errors on all measurement devices is In Fig. 10, we verify that our VQO framework can reproduce the theoretical violations of Proposition 3. To simulate the network we use the PennyLane default.qubit pure state simulator [74] and the postprocessing white noise models of Eq. (54). For all networks, we find that the state preparation |Φ + and local qubit measurements optimized over rotations about the y-axis are sufficient to achieve the theoretical maximum.
Since the noisy network simulation does not require ancillary qubits or mixed state simulation, it performs with greater computational efficiency. Therefore, we consider arbitrary state preparations in the CHSH, bilocal, 3-local chain, 4-local chain, and 3-local star. We find no improvement over the theoretical bound in Proposition 3 when more general simulation ansatzes are considered.

Qubit Dephasing Noise
Qubit dephasing is a unital channel describing the decoherence process as where the offdiagonal terms of ρ go to zero as the noise parameter γ increases. The Kraus operators for the dephasing channel are however, the circuit expressed in Fig. 12.a) leads to the most efficient computation. As proven in Proposition 1 of Appendix B, the maximal violation of the CHSH inequality given two-sided dephasing noise P γ1 ⊗ P γ2 is For uniform state preparations, the maximal violation for the n-local star inequality is [99] S n-Star = 1 2 n (Right) Dephasing noise is applied uniformly to all qubits. The markers show the maximal Bell score achieved using VQO. The dashed lines show the "maximal" score predicted by Eqs. (38), (39), and (40). The solid lines match the markers using the theoretical max scores described by Eqs. (62) and (63).
which, in the case of uniform dephasing noise, reduces to For the n-local chain, the partially classical strategy can implemented using the interior sources because classical states are not affected by dephasing noise. Thus, S n-Chain is simply the bilocal case, which is encompassed by Eq. (62). In the single-qubit dephasing case, the maximal star violation is then where the n-local chain simply corresponds to the bilocal case.
Using the dephasing circuit implementation in Fig. 12 [76]. a) The qubit dephasing channel Pγ is implemented using one ancillary qubit and a controlled rotation about the y-axis. b) The qubit amplitude damping channel Aγ is implemented using one anillary qubit and a controlled rotation about the y-axis followed by a CNOT gate. In both circuits, the rotation parameter θ relates to the noise parameter γ as θ = 2 sin −1 ( √ γ).
demonstrate that our VQO framework can find the optimal state preparations and measurements for non-nlocality. In all cases, the maximum is achieved by preparing the |Φ + state at each source and optimizing over local qubit measurements. Since each noisy qubit requires an ancilla, our simulations require twice the number of qubits needed to simulate the noiseless network. We are able to optimize over arbitrary state preparations and measurements in the CHSH, bilocal, 3-local chain, and 4-local chain. For the star networks, we optimize over arbitrary states preparations, but measurements are optimized over local qubit rotations due to inefficiency in computational performance. In general, we find that arbitrary state preparation and measurement ansatzes do not improve the Bell score beyond maximally entangled states and separable measurements.

E. Nonunital Noise
In this section, we investigate the noise robustness of quantum non-n-locality with respect to nonunital noise models. Nonunital noise describes quantum channels satisfying N (I) = I. For unital channels, we found evidence that the Bell state preparation |Φ + was optimal for achieving the maximal violations of the considered n-local Bell inequalities.
On the contrary, we find that maximally entangled state preparations are not optimal for nonunital noise models. Using our VQO framework, we find that nonmaximally entangled state preparations can achieve greater Bell scores than maximally entangled states when qubit amplitude damping noise and biased detector errors are present. Furthermore, in the case of colored source noise, we find that the there is a preference of which maximally entangled state is optimal. That is, the state |Ψ + = (|01 + |10 )/ √ 2 can achieve higher Bell scores than the state |Φ + = (|00 + |11 )/ √ 2.
In our investigation, we first verify that our VQO results agree with the maximal violations for maximally entangled states predicted by Eqs. (39) and (40), or with partially classical strategies that outperform these predictions. Then, we show that our VQO can find stronger violations when arbitrary state preparations are permitted. Hence we show that maximally entangled states are not always optimal in the presence of nonunital noise.

FIG. 13. Qubit amplitude damping noise robustness
Amplitude damping noise is applied to a single qubit (left column) and uniformly to all qubits (right column). In all plots, the markers show the maximal Bell score achieved using VQO (top row) over maximally entangled state preparations and local qubit measurements, (bottom row) over arbitrary state preparations and measurements. The dashed lines show the theoretically maximal score from Eqs. (38), (39), and (40) when the Bell state preparation is used. In the bottom row, the dashed line saturates at the bound for classical sources described by Eq. (44).

Qubit Amplitude Damping Noise
The qubit amplitude damping channel A γ describes the process of energy dissipation where a higher energy qubit state transitions into a lower energy state. The Kraus operators are defined as where the effect on a qubit density matrix is We find that it is most efficient to simulate the amplitude channel on the PennyLane default.qubit classical simulator [74] using an ancillary qubit for each channel as described in Fig. 12.b. Furthermore, the amplitude damping channel preserves the classical state A γ (|0 0|) = |0 0|, hence the optimal partially classical strategies described in Section VI B can be used with amplitude damping channel. As shown in the top row of Fig. 13, we perform our VQO over maximally entangled state preparation and local qubit measurement ansatzes. We find a close correspondence between the maximal violations predicted by Eqs. (38), (39), and (40) when Bell state preparations are used. As discussed in Proposition 4 in Appendix D, once the nonlocality of the CHSH inequality is broken, there exists maximally entangled states that achieve larger CHSH scores than the Bell state. This feature can be observed in the top-right plot of Fig. 13 where in the region γ ∈ [0.55, 0.75], our VQO results for the 3-local and 4-local chains score slightly higher than the prediction made using Bell state preparations and Eq. (40).
In the bottom row of Fig. 13, we perform VQO over arbitrary state preparation and measurement ansatzes. We compare our VQO results to the Bell score calculated by taking the larger of the predictions by Eq. (44) and (39) where Bell state preparations are used in each case. In the single-qubit case, our results closely match the theoretical prediction where the classical strategy becomes maximal at γ = 0.5, which corresponds to exactly the point where the qubit amplitude damping channel breaks the nonlocality of the CHSH inequality [71]. In the uniform case, there is also a close correspondence between theoretical and VQO data. However, we note that 3-local and 4-local chain scores drop off more quickly than the VQO data because the maximally entangled state cannot be used to implement the classical strategy on the interior chain network nodes.
In Appendix D, we prove for the CHSH inequality that nonmaximally entangled state preparations can perform better than maximally entangled states. If the state preparation and noise is constant across all sources [99], then the optimality of nonmaximally entangled state preparations can be extended to all considered n-local networks. However, as shown in Fig. 16 the region of improved Bell violation is small. Within in the 0.05 interval for γ in Fig. 13 the improvement of the Bell score is only present for γ = 0.3. As outlined in Appendix D the uniform qubit amplitude breaking noise breaks non-n-locality of maximally entangled states when γ 0 = (1 − 1/ √ 2) < 0.3. However, nonmaximally entangled states can be seen in Fig. 13 to be slightly above the classical bound at γ = 0.3, hence our VQO finds the slight improvement over maximally entangled states.

Source Colored Noise
Colored noise is important because it affects photonic entanglement sources [70]. Colored noise is a two-qubit noise model representing depolarization on a preferred axis of the state, where |Ψ ± = (|01 ± |10 )/ √ 2. The Kraus operators for colored noise are where |Φ ± and |Ψ ± constitute the Bell basis. When γ = 1, this nonunital channel outputs C γ=1 (ρ) = (|00 00| + |11 11|)/2, which is equivalent to classical shared randomness, hence the partially classical strategies described in Section VI B can be implemented. Using the PennyLane default.mixed simulator [74], we simulate colored noise using the Kraus operators described by Eq. (68). Our numerical results are shown in Fig. 14 where in the top row, we consider the |Φ + state preparation and in the bottom row, we consider the |Ψ + state preparation. When we consider the state preparation ρ Λi = |Φ + Φ + |, we find that in the single-source noise case, our VQO results match and for the chain network In the uniform noise case, the VQO results match Eqs. (39) and (40). When we consider the state preparation ρ Λi = |Ψ + Ψ + |, we find that in the single-source and uniform noise case, our VQO results match Eqs. (69) and (70). We also optimize over arbitrary state preparation and measurement ansatzes. We find no example state preparation that outperforms the |Ψ + state. Furthermore, local qubit measurements are sufficient to obtain the maximal Bell scores. Finally, we did not optimize the 4-local star network because it was to computationally intensive to run across all noise parameters on the mixed state simulator using the Kraus operators of Eq. (68).

Biased Detector Errors
We define a biased detector error as the classical postprocessing map, where the fixed classical value of +1 is output whenever the error occurs. The biased detector error can be described by the POVM with elements where Π +|x and Π −|x constitute an M -qubit PVM. In Proposition 5 of Appendix E, we prove that the biased detector error is nonunital because it is equivalent to the M -qubit partial replacer channel applied to the local qubits ρ Aj held by the detector.
Using the PennyLane default.qubit simulator [74] and the classical postprocessing map R γ we optimize the Bell score against biased detector noise. In Fig.  15, we plot with the dashed lines our VQO over maximally entangled state preparations and local measurements. These results are compared directly with the markers that show VQO over arbitrary state preparations and measurements. Hence nonmaximally entangled states can improve the Bell score when biased detector errors are present.

VII. DISCUSSION
This work introduces a hybrid, variational quantum optimization framework for noisy quantum communication networks. We implement our framework in the qNetVO software [73], which we use to optimize nonn-locality in noisy quantum networks. We show that our VQO framework can successfully maximize non-nlocality on noisy IBM quantum hardware. Furthermore, we demonstrate on a classical simulators that our VQO framework can both reproduce known results, and serve as a convenient investigative tool. Thus, VQO techniques provide practical value to current quantum information research while also showing promise of being a key tool for quantum network design and development.
When run on a classical simulator, our VQO framework can conveniently obtain valuable insights. We observe that in the presence of unital noise, maximally entangled state preparations yield maximally non-n-local correlations. In the presence of nonunital noise, we find that nonmaximally entangled states achieve larger violations than maximally entangled states. This separation is formally proven in the case of the amplitude damping channel. Our VQO techniques also find important edge cases where classical sources outperform the "maximal" violations derived by [99][100][101].
Our variational quantum optimization techniques show promise in being a key technology in the development of quantum networks. In principle, our methods can be deployed on quantum network hardware to optimize network protocols against the inherent hardware noise and potentially, automating the setup of protocols and maintenance of connections between quantum network nodes. Our VQO techniques also show promise of demonstrating a practical quantum computing advantage. In the longterm, fault-tolerant quantum computers will provide a clear simulation advantage of large, complex quantum networks [102,103]. In the near-term, NISQ devices are predicted to show advantage in the simulation and optimization of quantum systems [32,95].
In summary, our variational quantum optimization methods can conveniently be applied on classical hardware to solve meaningful problems while also showing promise of demonstrating practical advantages on quantum hardware. Hence, VQO is an important tool for designing and developing quantum networks.

A. Future Directions
To demonstrate practical advantages on NISQ devices, our methods should be broadly parallelized, run with close integration between quantum and classical hardware, and scaled to the largest available devices. Our framework is generic and interesting results may be obtained by applying VQO to new applications such as optimizing entropic quantities, multipartite games, or quantum protocols. Additionally, it will be interesting to extend our framework to local operations and classical communication protocols such as teleportation, superdense coding, and entanglement swapping. Finally, we found that settings optimized on the noisy IBM hardware were optimal in the noiseless case. Investigating the extent to which this feature holds will be important to the practical application of VQO.
An advantage of our VQO framework is that it can optimize quantum network protocols against the hard-ware noise without needing to explicitly know the noise model. Hence, it is important to demonstrate that the parameter-shift rule and gradient descent optimization can be extended to quantum network hardware.
Finally, we have conjectured that unital noise preserves the optimality of maximally entangled state preparations for non-n-locality, while nonunital noise does not. It will be important to determine if this is the case when alternative Bell inequalities are considered, especially ones that consider more inputs or outputs on devices and different network topologies.

Code and Data Availability
Our variational quantum optimization framework is released as a public python package called qNetVO [73]. All numerics and data are found in a supplementary codebase on GitHub [75]. Theorem 1. Consider the two-sided unital channel acting upon an arbitrary state preparation U 1 ⊗ U 2 (ρ AB ) where U 1 and U 2 are unital qubit channels. For any two unital channels U 1 , and U 2 , the maximal CHSH score S CHSH is achieved by the maximally entangled state ρ AB = |Φ + Φ + |.
Proof. The CHSH score of the noisy state U 1 ⊗ U 2 (ρ AB ) is defined as with the CHSH operator O AB CHSH is a two-body correlator where σ = (σ x , σ y , σ z ) and a, b ∈ R 3 satisfy | a|, | b| ≤ 1. The operator O AB CHSH admits the decomposition where t i,j = Tr σ A i ⊗ σ B j O AB CHSH and the values t i,j form the 3 × 3 correlation matrix T CHSH . In Proposition 1 of [72] it was proven that for any correlation matrix T , if Rank(T ) ≤ 2, then its corresponding two-body correlator Tr O AB ρ AB is maximized by a maximally entangled state. The authors then prove that the unital channel U 1 ⊗ I B preserves the optimality of the Bell state |Φ + Φ + |. We adapt their proof to the two-sided case to find that where the adjoint channel U † i is also unital. Noting that the CHSH operator can be diagonalized as where α x and α z are eigenvalues of O AB CHSH . It follows that, where σ i,j = U † j (σ i ) = s · σ for some s ∈ R 3 satisfying | s| ≤ 1 and O AB CHSH is a correlation operator with correlation matrix T CHSH satisfying Rank( T CHSH ) ≤ 2. By Proposition 1 of [72], the rank constraint proves that a maximally entangled state maximizes the CHSH inequality. Furthermore, the Bell state |Φ + Φ + | can always achieve the optimal CHSH score because the CHSH operator can always be transformed by local qubit unitaries O AB CHSH = U ⊗V O AB CHSH U † ⊗ V † such that the largest eigenvalue of O AB CHSH corresponds to the eigenvector |φ + .
Proposition 1. Given two-sided dephasing noise, P γ1 ⊗ P γ2 the maximal violation of the CHSH inequality is Proof. Since the phase damping channel is unital, Theorem 1 states that the maximal CHSH violation can be achieved using the Bell state preparation |Φ + = (|00 + |11 )/ √ 2. By direct application of the dephasing Kraus operators in Eq. (59) the channel's output is found to be ρ N = P γ1 ⊗ P γ2 ( Φ + Φ + ) (B11) To decide whether ρ N can violate the CHSH inequality we apply the necessary and sufficient conditions introduced by Horodecki et al. [98]. First, we construct the real-valued 3 × 3 correlation matrix T ρ N using Eq. (36) Next, we find Since γ 1 , γ 2 ∈ [0, 1], the two maximal eigenvalues are µ 1 (R ρN ) = 1 and µ 2 (R ρN ) = (1 − γ 1 )(1 − γ 2 ). It follows that the maximal CHSH violation is It follows that, where ρ Aj is the M -qubit state local to the detector. Thus, we conclude that detector white noise is equivalent to depolarizing noise applied to ρ Aj . Hence, detector white noise is unital.  Table I. In each case, arbitrary local qubit measurements are considered. The blue star denotes the value γ = (1 − 1/ √ 2) where nonlocality with respect to the CHSH inequality is broken for maximally entangled states.
Proof. We first derive the critical boundary at which the channel A γ1 ⊗ A γ2 breaks the nonlocality of the bell state |Φ + = 1 √ 2 (|00 + |11 ). By direct application of the amplitude damping Kraus operators in Eq. (64) the channel's output is found to be To decide whether ρ N can violate the CHSH inequality we apply the necessary and sufficient conditions introduced by Horodecki et al. [98]. First, we construct the real-valued 3 × 3 correlation matrix T ρ N using Eq. (36) Next, we find Finally, we calculate the quantity maximal CHSH violation as where µ 1 (R ρN ) and µ 1 (R ρN ) denote the two largest eigenvalues of R ρN . If the sum of eigenvalues satisfies µ 1 (R ρN ) + µ 2 (R ρN ) > 1, then the CHSH inequality is violated. For all γ 1 , γ 2 ∈ [0, 0.5] the two largest eigenvalues of R ρN are µ 1 (R ρN ) = µ 2 (R ρN ) = (1 − γ 1 )(1 − γ 2 . It follows that the CHSH inequality cannot be violated when Thus, we have proven the critical boundary at which the nonlocality of the Bell state |Φ + is broken by the two-sided amplitude damping channel. The question still remains as to whether there exists a maximally entangled state which performs better than the Bell state |Φ + . To this end we consider the family of maximally entangled states as |ψ = U ⊗ I |Φ + where the arbitrary unitary U is expressed as U = e −i(φ+ω)/2 cos(θ/2) −e i(φ−ω)/2 sin(θ/2) e −i(φ−ω)/2 sin(θ/2) e i(φ+ω)/2 cos(θ/2) . (D8) Using Mathematica, we calculate the eigenvalues of R ρN and find that they depend only on the parameters γ 1 , γ 2 , and θ. We use Mathematica to maximize these eigenvalues over the parameters where nonlocality is broken for the |Φ + state, i.e., θ ∈ [0, 2π], γ 1 , γ 2 ∈ [0, 1], and 1 2 ≥ (1 − γ 1 )(1 − γ 2 ). In this optimization we find no example violation of the CHSH inequality. Hence, the nonlocality of arbitrary maximally entangled states is also broken in the region specified by Eq. (D1).
Finally, within the region where the nonlocality of the Bell state is not broken, we maximize max γ 1 , γ 2 , θ2(1 − γ 1 )(1 − γ 2 ) − (µ 1 (R ρN ) + µ 2 (R ρN )) (D9) subject to the constraints γ 1 , γ 2 ∈ [0, 0.5], θ ∈ [0, 2π] and 1 2 < (1 − γ 1 )(1 − γ 2 ) where ρN is taken to be an arbitrary maximally entangled states. Using Mathematica to solve the maximization problem, we find no example where maximally entangled states achieves a larger violation than the Bell state. However, we do find examples where, once the nonlocality is broken, a larger Bell score can be obtained using a maximally entangled state that is not the Bell state. Hence we confirm that the nonlocality of all maximally entangled states is broken with respect to the CHSH inequality if Eq. (D1) is satisfied. The interested reader can find our Mathematica code in our supplementary codebase [75].