Improving Primal Decomposition for Multistage MPC Using an Extended Newton Method

Multistage model predictive control is a robust MPC formulation that accounts for parametric uncertainty by constructing a finite set of coupled scenarios. As the number of scenarios increases, so does the computational cost, and real-time implementation may become impossible. Scenario decomposition has been proposed to distribute the computations and make real-time implementation possible; however, the subproblems are typically coordinated using the steepest descent method, which has slow convergence properties. In this letter, a primal decomposition algorithm is improved by the use of a nonsmooth Newton method for continuous nonsmooth equations. The proposed algorithm is applied to a gas-lift optimization system and compared to the standard primal decomposition method using steepest descent.


I. INTRODUCTION
IN model predictive control (MPC), an optimal control problem is formulated and solved, minimizing some cost while obeying the model equations of the system. An optimal control trajectory is calculated and the current control action is applied to the system. Uncertainties in the model are handled through feedback, but if the uncertainties are too large, a robust MPC scheme is necessary. One such scheme, which accounts for uncertainties, is multistage MPC (msMPC), also called scenario MPC [14].
The drawback of msMPC is that the problem becomes very large as more uncertainty is handled, and the computational cost increases exponentially. Several methodologies have been proposed to address this challenge. In [22] and [19], parametric-sensitivity-based approaches were proposed, while in [6] and [8], parallelizable linear algebra approaches exploiting the structure of the problem were proposed.
Another approach, much studied in the literature, is to apply decomposition algorithms to the msMPC problem. In these methods the msMPC problem is decomposed into smaller subproblems, with a coordination algorithm on top ensuring that the subproblems converge to the optimal solution of the centralized problem. Each subproblem can be solved quickly but must be re-solved repeatedly during coordination. While these approaches do not necessarily reduce the overall computational effort to compute a control action, the computations can be performed in parallel so that the overall computational delay is reduced. The msMPC problem is usually decomposed per scenario. A dual decomposition approach was proposed in [15], where progressive hedging was used. This method was further improved in [16]. The drawback of dual decomposition approaches is that they do not provide a single feasible control action until the algorithm converges.
This was addressed in [10], where a primal decomposition scheme was used instead, which can supply a feasible but suboptimal control action even if terminated early. Since early termination yields a suboptimal action, speeding up the algorithm is desirable. Solving the distributed subproblems is the computational bottleneck of the algorithm. Two ways to speed up the algorithm are to reduce the computational time of solving the subproblems and to reduce the number of re-solves required by the coordination algorithm. In [12] a sensitivity-based path-following algorithm was employed to speed up the computation of the subproblems. In this letter, we propose to speed up the coordination step of the primal decomposition algorithm by employing a nonsmooth Extended Newton method.
Using a Newton method to speed up dual decomposition of a linear msMPC was done in [5], and for nonlinear msMPC using dual decomposition in [16]. A real-time iteration approach to msMPC combined with dual decomposition employed a nonsmooth Newton method in [13]. This letter differs by applying a nonsmooth Newton method to a primal decomposition algorithm and showing local quadratic convergence for the proposed algorithm.

II. PRELIMINARIES

A. Multistage MPC
Here we present the concept of and our notation for msMPC; see [14] for details. Consider a discretized nonlinear system $x_{k+1} = f(x_k, u_k, p_k)$, where $x_k \in \mathbb{R}^{n_x}$ are the states, $u_k \in \mathbb{R}^{n_u}$ the control inputs, and $p_k \in \mathbb{R}^{n_p}$ uncertain model parameters at time step $k$. The function $f : \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \times \mathbb{R}^{n_p} \to \mathbb{R}^{n_x}$ represents the nonlinear model of the system. A sequence of inputs $u_k$, $k = 0, \ldots, N-1$, that minimizes an expected cost over a time horizon of length $N$ is calculated. The parameters $p_k$ are sampled from a distribution, and a set of $M$ points from this distribution is selected to represent the possible realizations of the uncertain parameters. At each time step, each possible realization is considered, which causes the msMPC problem to branch into $M$ new branches. To reduce the size of the msMPC problem, a robust horizon $N_r$ is used, which is the number of steps over which branching is considered. This leads to a structure with $N_s = M^{N_r}$ scenarios.
The scenarios are tied together at certain points in their prediction horizons. The nodes where the scenarios are tied together can be thought of as knots, the tree having $N_{Kn} = \sum_{k=0}^{N_r-1} M^k$ knots. In the nodes where the scenarios are tied together they must take the same control action $u_k$; these are known as the nonanticipativity constraints.
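The tree bookkeeping above can be checked numerically; a minimal sketch (the function name is illustrative):

```python
# Scenario-tree bookkeeping for msMPC: with M realizations per branching
# step and a robust horizon of N_r steps, the tree has N_s = M**N_r
# scenarios and N_Kn = sum_{k=0}^{N_r-1} M**k knots.

def tree_sizes(M: int, N_r: int) -> tuple[int, int]:
    N_s = M ** N_r                              # number of scenarios
    N_Kn = sum(M ** k for k in range(N_r))      # number of knots
    return N_s, N_Kn

# e.g., the case study below uses M = 5, N_r = 1: 5 scenarios and 1 knot
```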
The msMPC problem is formulated as an optimization problem (2), where $j_k^s : \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \times \mathbb{R}^{n_p} \to \mathbb{R}$ are the stage costs, $w^s$ is the weight of scenario $s$, and $c : \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \to \mathbb{R}$ represents the inequality constraints. Constraint (2d) represents the nonanticipativity constraints, where $P_u \in \mathbb{R}^{N_r n_u \times N n_u}$ is such that $P_u u^s = [u_0^s, \ldots, u_{N_r}^s]^T$, the decision variables $l \in \mathbb{R}^{N_l}$, $N_l = N_{Kn} n_u$, are connecting variables representing the knots, and $P_l^s \in \mathbb{R}^{N_r n_u \times N_l}$ are selection matrices enforcing the tree structure. For the scenario tree shown in Fig. 1 we have
$$P_l^1 = P_l^2 = P_l^3 = \begin{bmatrix} I_{n_u} & 0 & 0 & 0 \\ 0 & I_{n_u} & 0 & 0 \end{bmatrix}, \qquad P_l^4 = P_l^5 = P_l^6 = \begin{bmatrix} I_{n_u} & 0 & 0 & 0 \\ 0 & 0 & I_{n_u} & 0 \end{bmatrix}.$$
This optimization problem can be reformulated in the more compact form (3), which will be used in the rest of this letter, where $\omega^s = [x^{sT}, u^{sT}]^T \in \mathbb{R}^{n_{\omega^s}}$, $J^s : \mathbb{R}^{n_{\omega^s}} \to \mathbb{R}$ represents the cost, $h^s : \mathbb{R}^{n_{\omega^s}} \to \mathbb{R}^{n_h}$ the equality constraints, $g^s : \mathbb{R}^{n_{\omega^s}} \to \mathbb{R}^{n_g}$ the inequality constraints, and (3d) represents the nonanticipativity constraints for scenario $s$, with $P_\omega \in \mathbb{R}^{N_r n_u \times n_{\omega^s}}$.
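Based on the notation above, the compact form (3) can be sketched as follows (a reconstruction consistent with the surrounding text, not the verbatim formulation):

```latex
\begin{aligned}
\min_{\omega^1,\ldots,\omega^{N_s},\, l} \quad
  & \sum_{s=1}^{N_s} w^s J^s(\omega^s) \\
\text{s.t.} \quad
  & h^s(\omega^s) = 0,               && s = 1,\ldots,N_s, \\
  & g^s(\omega^s) \le 0,             && s = 1,\ldots,N_s, \\
  & P_\omega\, \omega^s = P_l^s\, l, && s = 1,\ldots,N_s.
\end{aligned}
```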

B. Primal Decomposition of msMPC
In decomposition methods the large optimization problem is split into smaller subproblems dependent on some parameter variables, where a coordinator problem finds the optimal parameter variables to make the subproblems converge to a common solution. Many different decomposition methods have been proposed, and in this letter, we focus on primal decomposition as was used in [10]. Here these parameter variables are primal variables from the original problem [1].
Using primal decomposition, the large msMPC problem (3) is decomposed into smaller subproblems (4), where the decision variables $l$ of the centralized problem become parameters of the subproblems. The variables $\lambda^s$, $\mu^s$ and $\gamma^s$ denote the dual variables related to the equality constraints (4b), the inequality constraints (4c), and the nonanticipativity constraints (4d), respectively. The $N_s$ subproblems (4) can be solved independently of each other, in parallel, for a given value of $l$. The centralized problem (3) is recovered by the coordinator problem (5), where the coordinator variables from (4) become the decision variables.
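In this notation, the subproblems (4) and the coordinator problem (5) take roughly the following form (a sketch consistent with the text above; $\varphi^s$ is an assumed name for the subproblem optimal-value function):

```latex
\varphi^s(l) \;=\; \min_{\omega^s}\; w^s J^s(\omega^s)
\quad \text{s.t.} \quad
h^s(\omega^s) = 0, \;\;
g^s(\omega^s) \le 0, \;\;
P_\omega\, \omega^s = P_l^s\, l,
\qquad\qquad
\min_{l} \; \sum_{s=1}^{N_s} \varphi^s(l).
```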

C. Standard Approach (Steepest Descent)
From [1] we have that a subgradient of the optimal value of subproblem (4) with respect to the parameter $l$ can be obtained from the optimal dual variable $\gamma^{s*}$ related to (4d) for the given $l$, as in (6). In the standard approach to solving the coordinator problem (5), as outlined in [10], the subgradient of (5) is calculated as in (7), and a Steepest Descent (SD) method is applied to find the vector $l$ such that the subgradient $G(l)$ equals zero. If $l_k$ is the iterate at iteration $k$, the step to the next point is calculated as in (8),
where $\Delta l_k$ is the step in the variable $l$ and $\alpha > 0$ is the step length.
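The coordination loop can be illustrated on a toy problem (not the paper's gas-lift system): two scalar subproblems $\min_w \frac{1}{2}(w - a^s)^2$ s.t. $w = l$, whose optimal-value sensitivity with respect to $l$ is available in closed form. All names, the data, and the sign convention are illustrative assumptions.

```python
import numpy as np

# Steepest-descent coordination sketch: each subproblem's optimal value
# is phi_s(l) = 0.5*(l - a_s)^2, so its sensitivity w.r.t. l is
# dphi_s/dl = l - a_s (equal to the coupling multiplier up to sign).
# The coordinator drives G(l) = sum_s dphi_s/dl to zero.

a = np.array([1.0, 3.0])              # per-scenario data (assumed)

def subproblem_sensitivity(l, a_s):
    # closed-form sensitivity of the subproblem optimal value w.r.t. l
    return l - a_s

def G(l):
    return sum(subproblem_sensitivity(l, a_s) for a_s in a)

l, alpha = 0.0, 0.1                   # fixed step length, as in SD
for _ in range(200):
    step = -alpha * G(l)
    if abs(step) < 1e-8:
        break
    l += step
# converges (only linearly) to the consensus value l* = mean(a) = 2.0
```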
The Steepest Descent method only has linear convergence and is therefore slow. This leads to re-solving the subproblems (4) a large number of times, which is the computational bottleneck of the distributed msMPC algorithm. We propose to speed up the algorithm by using a nonsmooth Newton method with quadratic convergence.

D. Nonlinear Sensitivity Concepts
Before we proceed, we introduce some concepts that will be used later. Let $z^s = [\omega^{sT}, \lambda^{sT}, \mu^{sT}, \gamma^{sT}]^T$ be a primal-dual pair for subproblem (4) and let $\mathcal{L}^s$ denote the subproblem Lagrangian. The point $z^{s*}(l)$ is called a KKT point of (4) if it is a root of the system of nonsmooth equations (10), also known as the KKT conditions. The active constraints $A^s(l)$ can be further subdivided into two sets: the strongly active constraints $A_+^s(l) = \{i \in A^s(l) \mid \mu_i^{s*} > 0\}$ and the weakly active constraints $A_0^s(l) = \{i \in A^s(l) \mid \mu_i^{s*} = 0\}$. Let $|D|$ denote the cardinality of an integer set $D$.
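The nonsmooth KKT system (10) is commonly written with a complementarity function; the following is a sketch using the min-function (one common choice, not necessarily the paper's exact formulation):

```latex
F^s(z^s; l) \;=\;
\begin{bmatrix}
\nabla_{\omega^s} \mathcal{L}^s(z^s) \\
h^s(\omega^s) \\
\min\{\mu^s,\, -g^s(\omega^s)\} \\
P_\omega\, \omega^s - P_l^s\, l
\end{bmatrix}
\;=\; 0.
```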
A KKT point $z^{s*}(l)$ is an optimal solution of (4) if a constraint qualification holds as well as a second-order condition of optimality. This letter considers the Linear Independence Constraint Qualification (LICQ) and the Strong Second-Order Sufficient Condition (SSOSC). In addition, Strong Complementarity (SC) will be used below.
Definition: LICQ holds for a primal solution $\omega^{s*}(l)$ if the gradients of the equality constraints $\nabla_{\omega^s} h^s(\omega^{s*})$, of the active inequality constraints $\nabla_{\omega^s} g_i^s(\omega^{s*})$, $i \in A^s(l)$, and the rows of $P_\omega$ are linearly independent.
Definition: SSOSC holds for a primal-dual solution $z^{s*}(l)$ if $d^T \nabla^2_{\omega^s \omega^s} \mathcal{L}^s(z^{s*}) d > 0$ for all $d \neq 0$ satisfying $\nabla_{\omega^s} h^s(\omega^{s*})^T d = 0$, $\nabla_{\omega^s} g_i^s(\omega^{s*})^T d = 0$ for $i \in A_+^s(l)$, and $P_\omega d = 0$.
The theorems introduced and proved in [3] (Theorems 1 and 2) establish the sensitivity of a primal-dual solution $z^{s*}(l)$ of problem (4).

III. PRIMAL DECOMPOSITION OF MSMPC USING AN EXTENDED NEWTON METHOD

Let $l^*$ be an optimal solution of (5).
Assumption 1: The functions $J^s$, $h^s$ and $g^s$ are all twice continuously differentiable for all $N_s$ subproblems.
Assumption 2: LICQ and SSOSC hold for the primal-dual solutions $z^{s*}(l^*)$ of all $N_s$ subproblems (4).
From Theorems 1 and 2, if Assumptions 1 and 2 hold, there is a neighborhood $L \subset \mathbb{R}^{N_l}$ of $l^*$ where the optimal primal-dual solution $z^{s*}(l)$ is given by a PC$^1$ function for all subproblems $s \in \{1, \ldots, N_s\}$.
Note that even with smooth functions, the primal-dual solution of (4) can be nonsmooth because the active inequality constraints change with the parameter l.
Definition: A function $\sigma : V \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is PC$^1$ if it is continuous and if for all $y_0 \in V$ there is a neighborhood $W \subset V$ of $y_0$ and a finite number of $C^1$ functions $\sigma^1, \ldots, \sigma^N$ such that $\sigma(y) \in \{\sigma^1(y), \ldots, \sigma^N(y)\}$ for all $y \in W$.
From Theorem 2 we have that the gradient of the subproblems (4) is given by (6). The gradient $G(l)$ of the coordinator problem (5) is then given by (7), meaning it is a PC$^1$ function for $l \in L$. The optimal solution of (5) is the root of the gradient $G(l)$, so we propose to use the Extended Newton (EN) method from [7].

A. Algorithm
The main contribution of this letter is Algorithm 1. The algorithm uses the Extended Newton method from [7] to solve the coordinator problem (5). The Extended Newton method extends Newton's method to PC$^1$ functions and retains quadratic convergence for starting points near the solution.
$G(l)$ will be used as the PC$^1$ function to explain the Extended Newton method. Let $\tilde{L}^s \subseteq L$ be the set where SC holds for the optimal solution of subproblem (4). From Theorem 1 we have that $\gamma^{s*}$ is continuously differentiable for $l \in \tilde{L}^s$. Then $\tilde{L} = \bigcap_{s=1}^{N_s} \tilde{L}^s$ is the set where $G(l)$ is continuously differentiable.
Let $l_k \in L$ be the iterate at iteration $k$ of the Extended Newton method. In an ordinary Newton's method the step $\Delta l_k$ would be calculated by solving equation (12), where $\alpha \in (0, 1]$ is a step length and $H(l_k) = \nabla_l G(l_k)$ is the Jacobian of the gradient. Since this Jacobian only exists for $l_k \in \tilde{L}$, the Extended Newton method for PC$^1$ functions instead uses $H(l_k) \in \partial_B G(l_k)$, where $\partial_B G(l_k)$ is the B-subdifferential ("Bouligand") of the gradient $G(l_k)$ [20].
Note that the B-subdifferential is a singleton when the function is differentiable, i.e., $\partial_B G(l_k) = \{\nabla_l G(l_k)\}$ for $l_k \in \tilde{L}$.
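The Extended Newton update can be illustrated on a scalar PC$^1$ toy function (purely illustrative, unrelated to the gas-lift coordinator): $G(l) = \max(l, 2l) - 2$ has selection functions $l - 2$ and $2l - 2$ with a kink at $l = 0$, and the slope of any active selection function is a valid element of $\partial_B G(l)$.

```python
# Minimal Extended Newton sketch for a scalar PC1 function.
# G(l) = max(l, 2l) - 2 with selection functions l-2 and 2l-2.

def G(l):
    return max(l, 2.0 * l) - 2.0

def H(l):
    # an element of the B-subdifferential of G at l: the slope of an
    # active selection function (either choice is valid at the kink)
    return 2.0 if l >= 0 else 1.0

l = -5.0
for _ in range(50):
    step = -G(l) / H(l)       # full Newton step, alpha = 1
    l += step
    if abs(step) < 1e-10:
        break
# the root is l* = 1.0; convergence is quadratic near the solution
```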
The following theorem proved in [7] establishes quadratic convergence of the Extended Newton method.
Theorem 3 [7]: Let $l^*$ be a solution of the PC$^1$ system of equations $G(l) = 0$. Given that all elements of the B-subdifferential $\partial_B G(l)$ are nonsingular and Lipschitz continuous for $l \in L$, there exists a positive number $r$ such that if $\|l_0 - l^*\| \leq r$, the sequence of iterates generated by the Extended Newton method converges to $l^*$ quadratically, where $l_0$ is the starting point.
The main result of this letter is given by the following theorem.
Theorem 4 (Main Result): Let $l^*$ be the optimal solution of the coordinator problem (5). If Assumptions 1 and 2 hold and all members of the B-subdifferential $\partial_B G(l)$ are nonsingular and Lipschitz continuous, then there is a positive number $r$ such that if $\|l_0 - l^*\| \leq r$, Algorithm 1 converges quadratically to $l^*$.
Proof: By Theorems 1 and 2, $l^*$ is found by solving a PC$^1$ system of equations $G(l) = 0$. Algorithm 1 ensures that a member of the B-subdifferential $\partial_B G(l)$ is used in the Extended Newton method; the rest follows from Theorem 3.
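The overall shape of the coordination scheme can be sketched on a toy problem with closed-form scalar subproblems (an illustrative stand-in, not the paper's Algorithm 1): solve the subproblems in parallel for the current $l$, assemble $G(l)$ and an element $H(l)$ of its B-subdifferential, take a full Newton step, and repeat. All names and data are assumptions.

```python
import numpy as np

# Newton-coordinated primal decomposition sketch.  Subproblems are
# min_w 0.5*(w - a_s)^2 s.t. w = l, with closed-form optimal-value
# sensitivity dphi_s/dl = l - a_s and curvature 1.

a = np.array([1.0, 4.0, 7.0])        # per-scenario data (assumed)

def solve_subproblem(l, a_s):
    # returns (sensitivity, curvature) of the optimal value w.r.t. l
    return l - a_s, 1.0

l = 0.0
for _ in range(20):
    sens = [solve_subproblem(l, a_s) for a_s in a]   # parallelizable
    G = sum(g for g, _ in sens)                      # coordinator gradient
    H = sum(h for _, h in sens)                      # Jacobian element
    step = -G / H                                    # Newton step, alpha = 1
    l += step
    if abs(step) < 1e-10:
        break
# in this smooth quadratic case, one Newton step reaches l = mean(a) = 4.0
```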

B. Computing the Derivatives (Smooth Case)
For $l \in \tilde{L}$ the Jacobian can be calculated as in (13). When SC holds, the sensitivity $\nabla_l \gamma^{s*}(l)$ can be found by applying the implicit function theorem to the KKT conditions (10), which yields the linear system of equations (14), where $g_+$ and $\mu_+$ are the vectors of strongly active inequality constraints and their related dual variables.
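The linear system (14) obtained from the implicit function theorem has the standard KKT-sensitivity structure; a sketch of its likely form under the stated assumptions (a reconstruction, not the verbatim equation):

```latex
\begin{bmatrix}
\nabla^2_{\omega^s \omega^s} \mathcal{L}^s & \nabla_{\omega^s} h^s & \nabla_{\omega^s} g^s_+ & P_\omega^T \\
(\nabla_{\omega^s} h^s)^T & 0 & 0 & 0 \\
(\nabla_{\omega^s} g^s_+)^T & 0 & 0 & 0 \\
P_\omega & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix}
\nabla_l \omega^{s*} \\ \nabla_l \lambda^{s*} \\ \nabla_l \mu^{s*}_+ \\ \nabla_l \gamma^{s*}
\end{bmatrix}
=
\begin{bmatrix}
0 \\ 0 \\ 0 \\ P_l^s
\end{bmatrix}.
```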

C. Computing the Derivatives (Nonsmooth Case)
When $l_k \in L \setminus \tilde{L}$, a member of the B-subdifferential of $G(l_k)$ must be computed. This is not a trivial task, as B-subdifferentials do not obey strict calculus rules; e.g., if a member of the B-subdifferential $\nabla_l \gamma^{s*}(l_k) \in \partial_B \gamma^{s*}(l_k)$ were found for each scenario $s \in \{1, \ldots, N_s\}$, then $-\sum_{s=1}^{N_s} \nabla_l \gamma^{s*}(l_k) P_l^s$ would not necessarily be in $\partial_B G(l_k)$. To find a member of the B-subdifferential, lexicographic derivatives will be used. A lexicographic derivative is computed for some full-rank direction matrix $P \in \mathbb{R}^{N_l \times N_l}$, and for PC$^1$ functions it is an element of the B-subdifferential, $J_L G(l_k, P) \in \partial_B G(l_k)$ [18]. In addition, strict calculus rules hold for lexicographic derivatives, as expressed in (15). The lexicographic derivatives of the optimal dual variables $J_L \gamma^{s*}(l_k, P)$ can be found by [18, Algorithm 1]. This means that $H(l_k) = J_L G(l_k, P)$ from (15) is used in the Extended Newton method when $l_k \in L \setminus \tilde{L}$. Note that $J_L \gamma^{s*}(l_k, P) = \nabla_l \gamma^{s*}(l_k)$ for $l \in \tilde{L}^s$, $s \in \{1, \ldots, N_s\}$, meaning equation (14) can be used for subproblems with no weakly active inequality constraints.

IV. CASE STUDY
The proposed algorithm is demonstrated on an example and compared to the algorithm used in [10], which used the Steepest Descent method. The algorithm is applied to an uncertain gas-lift well network. The system consists of two wells, each with its own uncertain gas-oil ratio (GOR), connected to a common manifold. The manifold is connected to a riser. A schematic of the system is shown in Fig. 2. The goal is to produce as much oil $w_{po}$ as possible while keeping the gas production rate $w_{pg}$ below the maximum capacity of the topside facility. This is done by injecting gas, $w_{gl,1}$, $w_{gl,2}$, into the two wells. More details about the system, as well as its dynamic model, can be found in [9]. The uncertain parameters, the GORs of the two wells, are assumed to take values between the low and high bounds given in Table I, which also lists the nominal GOR values.

A. MPC Setup
In the model predictive controller, the cost function (16) is used per scenario, where $\$_o$ is the price of oil, $\gamma_{pg}$ and $\hat{w}_{pg,max}$ are parameters to track the maximum gas production rate to prevent it from exceeding the maximum capacity, and $\gamma_r$ is a parameter penalizing large changes in the inputs.
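From that description, the per-scenario cost (16) can be sketched as follows (a plausible reconstruction; the exact weighting and norms are assumptions):

```latex
J^s \;=\; \sum_{k=0}^{N-1} \Big( -\$_o\, w_{po,k}^s
\;+\; \gamma_{pg}\,\big(w_{pg,k}^s - \hat{w}_{pg,\max}\big)^2
\;+\; \gamma_r\,\|\Delta u_k^s\|^2 \Big).
```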
To design the scenarios for the msMPC we considered the combinations where the 2 parameters take their maximum and minimum values, as well as the nominal midpoint; this leads to $M = 5$ realizations. A robust horizon of $N_r = 1$ is used, leading to $N_s = M^{N_r} = 5$ scenarios in the msMPC.
A prediction horizon of $N = 24$ is used in the msMPC, with a sampling time of 300 seconds. The dynamic model is discretized using a 3rd-order Gauss-Radau collocation scheme. The nonlinear programs are formulated using JuMP [2] and solved using Ipopt [21].
In the primal decomposition algorithm, the coordinator variables $l$ are warm-started using the solution from the previous MPC iteration. For the Steepest Descent algorithm, $\alpha = 3 \times 10^{-5}$ was used as the step length; for the Extended Newton method, $\alpha = 1$. Both the Steepest Descent and Newton algorithms were terminated when the change in the coordinator variables satisfied $\|\Delta l\|_\infty < 10^{-4}$.

B. Simulation Setup
The gas-lift well network system was simulated for 5 hours. In the first 50 minutes, the GOR for both wells remains constant. In the next 200 minutes, the GOR for the two wells changes, while the GORs again remain constant during the last 50 minutes. Both decentralized schemes are compared to the centralized approach.

V. SIMULATION RESULTS AND DISCUSSION
The simulation results are shown in Fig. 3, where the total produced oil rate is shown at the top. The gas-lift injection rates for the two wells are plotted beneath. From the figure, the Steepest Descent and Extended Newton methods give almost identical results. A comparison with the centralized msMPC is plotted in Fig. 4. The absolute error between the centralized and decentralized schemes is shown for both the total produced oil rate and the gas-lift injection rates. As can be seen, both the Steepest Descent and Extended Newton schemes converge to the same solution as the centralized scheme, but the Steepest Descent method introduces a small error.
In Fig. 5, the number of iterations required to converge is plotted for both decentralized schemes. As can be seen, the proposed algorithm converges in far fewer iterations than the SD algorithm. This holds especially true in the transient region at the start, where the number of necessary iterations is reduced by approximately a factor of 10. We chose to present the computational performance in terms of iterations rather than time, because the actual running time depends on many factors other than the algorithm itself.
Both in terms of computational cost and accuracy, the Extended Newton method outperforms the Steepest Descent method, which is expected, as the Extended Newton method has local quadratic convergence while Steepest Descent only has linear convergence. Since the coordinator algorithm is warm-started with the solution from the previous iteration, it may already start in a neighborhood of the solution, leading to fast convergence. Another benefit of the Extended Newton method is that the step is scaled, making it easy to find a stable step size, $\alpha = 1$. This is not the case for the Steepest Descent method, where a step size that both makes fast progress and remains stable is difficult to find. In fact, a better step size could probably have been found for this simulation study.

VI. DISCUSSION AND CONCLUSION
The main computational load of the proposed algorithm is solving the distributed subproblems; however, the proposed algorithm introduces an extra computational load: computing the Jacobian matrix used in the Extended Newton step. For nondegenerate subproblems this involves solving equation (14), which can be a large linear system of equations and therefore expensive. If the subproblems are solved using Ipopt, the matrix factorization of the KKT matrix in equation (14) can be reused to compute the Jacobian quickly with sIpopt [17]. This extra computational cost should be negligible compared to solving the subproblems. For degenerate subproblems, on the other hand, a series of QPs has to be solved, as outlined in [18], which might be computationally expensive.
Note that for nonconvex problems the proposed decomposition algorithm can converge to any stationary point of the coordinator problem, not necessarily a minimum. This is a known property of primal decomposition. However, it will converge there faster (quadratically) than steepest descent (linearly), and in the simulations conducted in this letter it was found to converge to the same optimum as the centralized solution.
A related issue when applying primal decomposition to the msMPC problem is that the continuity properties of the subproblems only hold with respect to a local solution. The subproblems are nonlinear programs and thus likely nonconvex, meaning each subproblem might have several local solutions. To mitigate this, we suggest adding regularization to the cost function to make one local solution preferable to the others. We also suggest warm-starting each subproblem with the previous iteration's solution so that the subproblems start in a neighborhood of the same local solution.
Another potential challenge with primal decomposition is that the coordinator variables $l$ might render the subproblems (4) infeasible. This was addressed in [11], where it was proposed to initialize the coordinator variables $l$ with the worst-case predicted control action from the previous iteration and to use a feasibility-ensuring backtracking algorithm to make sure the updates remain feasible.