A Linearized Alternating Direction Method of Multipliers for a Special Three-Block Nonconvex Optimization Problem of Background/Foreground Extraction

In this paper, we focus on the three-block nonconvex optimization problem of background/foreground extraction from a blurred and noisy surveillance video. The coefficient matrices of the equality constraints are nonidentity matrices. Owing to the separable structure of the objective function and the linear constraints, a benchmark solver for the problem is the alternating direction method of multipliers (ADMM). The computational challenge is that the ADMM subproblems have no closed-form solutions, since the objective function is not differentiable and the coefficient matrices of the equality constraints are not identity matrices. In this paper, we propose a linearized ADMM by choosing appropriate proximal terms, and we introduce a dual step size to make the proposed algorithm more flexible. Under suitable assumptions, and provided the associated function satisfies the Kurdyka-Łojasiewicz property, we show that the proposed algorithm converges to a critical point of the given problem. We apply the proposed algorithm to background/foreground extraction, and the numerical results demonstrate its effectiveness.


I. INTRODUCTION
In this research, we study a special kind of three-block separable nonconvex optimization problem of background/foreground extraction [5], [7], [8], [14], [20], [44], [45], which is used to detect moving objects in blurred and noisy surveillance videos acquired from static cameras, and which also arises in other fields [1], [10], [49]. In general, the problem can be summarized as follows:

min_{x,y,z} f(x) + g(y) + h(z)  s.t.  Ax + By + z = b,   (1)
where f : R^m → R ∪ {+∞} is proper and lower semicontinuous, g : R^n → R ∪ {+∞} is convex and continuous, h : R^l → R is Lipschitz continuously differentiable with modulus L > 0, and A ∈ R^{l×m} and B ∈ R^{l×n} are matrices representing a regular blurring operator for the blurred data b ∈ R^l.
In recent years, a large variety of algorithms [4], [26], [32], [46], [47] for solving problem (1) have been studied. Owing to its separable structure and linear equality constraints, perhaps the first choice is the alternating direction method of multipliers (ADMM) proposed in [29], which has been well studied for two-block convex problems [16], [17], [19], [24], [25], [43], [54] and two-block nonconvex problems [31], [35], [53]. ADMM can be regarded as a splitting version of the classical augmented Lagrangian method in [26], [46], which transforms the original high-dimensional problem into low-dimensional subproblems. The augmented Lagrangian function L_β(·) of (1) is

L_β(x, y, z, λ) = f(x) + g(y) + h(z) − ⟨λ, Ax + By + z − b⟩ + (β/2)∥Ax + By + z − b∥²,   (2)

where β > 0 is the penalty parameter and λ ∈ R^l is the Lagrange multiplier. The scheme of the extended ADMM for the three-block nonconvex problem (1) is:

x^{k+1} ∈ arg min{L_β(x, y^k, z^k, λ^k) | x ∈ R^m},   (3a)
y^{k+1} ∈ arg min{L_β(x^{k+1}, y, z^k, λ^k) | y ∈ R^n},   (3b)
z^{k+1} ∈ arg min{L_β(x^{k+1}, y^{k+1}, z, λ^k) | z ∈ R^l},   (3c)
λ^{k+1} = λ^k − β(Ax^{k+1} + By^{k+1} + z^{k+1} − b).   (3d)

Note that without any assumptions it is difficult to analyze the convergence of ADMM for three-block convex or nonconvex optimization problems. In particular, Chen et al. [11] constructed a counterexample proving that the direct extension of ADMM to multi-block convex optimization does not necessarily converge, which has attracted considerable attention. Some scholars adopt strategies that correct the output of (3) to generate a new iterate, or change the iterative scheme, to guarantee convergence for three-block convex optimization problems [9], [12], [13], [15], [21], [22], [28], [36], [38], [39], [51], and others have extended these ideas to three-block nonconvex optimization problems [27], [30], [42], [50], [55]. Due to the special structure of problem (1), namely that f and g are not differentiable and the coefficient matrices of the equality constraints are not identity matrices, the subproblems (3a) and (3b) do not have closed-form solutions. To address the two-block convex problems of the so-called sparse group least absolute shrinkage and selection operator (SGLASSO) and the fused least absolute shrinkage and selection operator (FLASSO), Li et al. [34] embedded a linearization technique into the ADMM approach, exploiting the simplicity of the resolvent operator of the subproblem: the quadratic term of a subproblem without a closed-form solution is linearized so that the resulting subproblem has a closed-form solution. The numerical experiments in [34] show that the linearization technique is very simple and effective for such problems. Meanwhile, some scholars have performed similar work on linearizing the augmented term of the augmented Lagrangian function [25], [37], [52], [56] for convex problems. The linearization technique is significant because it eases the numerical implementation; therefore, it is popular in a wide range of applications, especially in ADMM for convex problems, such as linearizing the differentiable part of the objective function [34], [40], [41], [48], [52].
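As a concrete reading of (2), the following minimal Python sketch evaluates the augmented Lagrangian; the constraint Ax + By + z = b is our reconstruction of (1) from the surrounding definitions, and f, g, h are passed in as callables.

```python
import numpy as np

def aug_lagrangian(f, g, h, A, B, b, beta, x, y, z, lam):
    """Evaluate L_beta(x, y, z, lam) from (2), assuming the
    constraint of (1) reads Ax + By + z = b."""
    res = A @ x + B @ y + z - b          # equality-constraint residual
    return f(x) + g(y) + h(z) - lam @ res + 0.5 * beta * (res @ res)
```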
Based on the simplicity and efficiency of the linearization technique in numerical computation, we propose a new ADMM by introducing a linearization technique for the three-block nonconvex problem (1). First, the subproblem (3a) is specified as

x^{k+1} ∈ arg min{ f(x) + (β/2)∥Ax + By^k + z^k − b − λ^k/β∥² | x ∈ R^m }.   (5)

Linearizing the quadratic term (β/2)∥Ax + By^k + z^k − b − λ^k/β∥² at x^k gives

(β/2)∥Ax^k + By^k + z^k − b − λ^k/β∥² + β⟨A^T(Ax^k + By^k + z^k − b − λ^k/β), x − x^k⟩ + (r/2)∥x − x^k∥²,   (6)

where rI ⪰ βA^T A. Substituting (6) into (5) and discarding the constant terms, the x-subproblem becomes

x^{k+1} ∈ arg min{ f(x) + (r/2)∥x − x^k + (β/r)A^T(Ax^k + By^k + z^k − b − λ^k/β)∥² | x ∈ R^m }.

The linearization technique makes it easier to obtain the optimal solution of the subproblem, and it often yields a closed-form solution.
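For instance (a standard illustration, not taken from the paper), if f = µ∥·∥₁, the linearized x-subproblem admits the componentwise closed-form solution

\[
  x^{k+1} = \operatorname{sign}(v^k)\,\max\{|v^k| - \mu/r,\, 0\},
  \qquad
  v^k := x^k - \tfrac{\beta}{r} A^{\top}\bigl(Ax^k + By^k + z^k - b - \lambda^k/\beta\bigr),
\]

which is the proximal map of (µ/r)∥·∥₁ evaluated at v^k.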
In the same way, by linearizing the quadratic term of (3b), we propose a more general proximal linearized version of ADMM (LADMM), stated as Algorithm 1 below.

Algorithm 1 (LADMM): Choose the dual step size α ∈ (0, (1+√5)/2), the penalty parameter β > β̄ := (1/4 + 2c/α)L + (1/2)L with c := max{α³/(1 + α − α²), 1}, and proximal parameters r, s with rI ⪰ βA^T A and sI ⪰ βB^T B; set k = 0. While a termination criterion is not met: Step 1, update x^{k+1} and y^{k+1} by the linearized subproblems and z^{k+1} by minimizing L_β over z; Step 2, set λ^{k+1} = λ^k − αβ(Ax^{k+1} + By^{k+1} + z^{k+1} − b) and k ← k + 1.
By adding appropriate proximal terms, each subproblem of Algorithm 1 has a closed-form solution, which improves computational efficiency; moreover, the convergence condition is relaxed, i.e., the matrices A and B no longer need to have full column rank, as required in [30], [55].
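To make the scheme concrete, here is a minimal runnable sketch of Algorithm 1 under illustrative assumptions that are not from the paper: constraint Ax + By + z = b, f = µ₁∥·∥₁, g = µ₂∥·∥₁, and h(z) = ½∥z∥² (so the z-step is explicit).

```python
import numpy as np

def soft(v, t):
    """Proximal map of t*||.||_1, applied componentwise."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ladmm(A, B, b, beta, r, s, alpha, mu1, mu2, iters=500):
    """Sketch of Algorithm 1 (LADMM) under the illustrative choices
    f = mu1*||.||_1, g = mu2*||.||_1, h(z) = 0.5*||z||^2, and the
    constraint Ax + By + z = b; r, s should satisfy rI >= beta*A'A
    and sI >= beta*B'B."""
    x = np.zeros(A.shape[1])
    y = np.zeros(B.shape[1])
    z = np.zeros(b.shape[0])
    lam = np.zeros(b.shape[0])
    for _ in range(iters):
        # linearized x-step: prox of f at a gradient point of the quadratic
        res = A @ x + B @ y + z - b - lam / beta
        x = soft(x - (beta / r) * (A.T @ res), mu1 / r)
        # linearized y-step: same construction with the updated x
        res = A @ x + B @ y + z - b - lam / beta
        y = soft(y - (beta / s) * (B.T @ res), mu2 / s)
        # z-step: for h(z) = 0.5*||z||^2 the minimizer is explicit
        z = (lam + beta * (b - A @ x - B @ y)) / (1.0 + beta)
        # dual update with step size alpha
        lam = lam - alpha * beta * (A @ x + B @ y + z - b)
    return x, y, z, lam
```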
The rest of this paper is organized as follows. In Section II, we summarize some useful preliminary results. In Section III, we establish the global convergence and the convergence rate of the proposed algorithm. In Section IV, we report some numerical results for the proposed algorithm. Finally, we draw conclusions in Section V.

II. PRELIMINARIES
In this section, we summarize some preliminaries known in the literature that will be useful for the subsequent discussion. We use R^n and R_+ to denote the n-dimensional Euclidean space and the set of nonnegative real numbers, respectively.
We use ⟨·, ·⟩ to denote the inner product and ∥·∥ to denote the norm induced by the inner product. For an extended-real-valued function f, the domain of f is defined as dom f := {x ∈ R^n : f(x) < ∞}. We say that the function is proper if dom f ≠ ∅ and f is never −∞, and that it is closed if it is lower semicontinuous. We write X ≻ Y when X − Y is positive definite, for any matrices X, Y ∈ R^{n×n}. For any subset S ⊆ R^n and any point x ∈ R^n, the distance from x to S is defined by d(x, S) := inf{∥y − x∥ : y ∈ S}, and d(x, S) = ∞ for all x when S = ∅. For notational simplicity, we denote ω = (x^T, y^T, z^T, λ^T)^T and v = (x^T, y^T, z^T)^T.
Definition 1: We say that ω* = (x*^T, y*^T, z*^T, λ*^T)^T is a critical point of the augmented Lagrangian function L_β(·) defined in (2) if it satisfies

A^T λ* ∈ ∂f(x*),  B^T λ* ∈ ∂g(y*),  ∇h(z*) = λ*,  Ax* + By* + z* = b.   (8)

The set of critical points of L_β(·) is denoted by crit L_β.
Remark 2: We denote by Ω* the set of optimal solutions of the augmented Lagrangian function. Throughout the paper, we assume that Ω* is nonempty.
Definition 3: Let f : R^n → (−∞, +∞] be a proper and lower semicontinuous function. For given α, β ∈ R with α < β, the level set of f is denoted by [α < f < β] := {x ∈ R^n : α < f(x) < β}.
Definition 4 ([3], [6]): Let f : R^n → (−∞, +∞] be a proper and lower semicontinuous function.
(i) For a given x ∈ dom f, the Fréchet subdifferential of f at x, written ∂̂f(x), is the set of all vectors u ∈ R^n that satisfy

lim inf_{y→x, y≠x} [f(y) − f(x) − ⟨u, y − x⟩]/∥y − x∥ ≥ 0,

and we set ∂̂f(x) = ∅ when x ∉ dom f.
(ii) The limiting subdifferential, or simply the subdifferential, of f at x, written ∂f(x), is defined by

∂f(x) := {u ∈ R^n : ∃ x^k → x with f(x^k) → f(x), u^k ∈ ∂̂f(x^k), u^k → u}.   (9)

(iii) A point x* is called a (limiting-)critical point, or stationary point, of f if it satisfies 0 ∈ ∂f(x*), and the set of critical points of f is denoted by crit f.
Definition 4 immediately implies that ∂̂f(x) ⊆ ∂f(x), where the first set is closed and convex while the second is closed. We also use the notation dom(∂f) := {x ∈ R^n : ∂f(x) ≠ ∅}. For example, for f(x) = |x| on R, both subdifferentials coincide: ∂f(0) = ∂̂f(0) = [−1, 1], while ∂f(x) = {sign(x)} for x ≠ 0. Indeed, the subdifferential (9) reduces to the gradient of f, denoted ∇f, when f is continuously differentiable. Furthermore, if g is a continuously differentiable function, then ∂(f + g) = ∂f + ∇g.
We next state some properties concerning the convexity of functions.
Definition 5: Let C be a convex subset of R^n and let f : C → R be a function. Then f is convex over C if

f(τx + (1 − τ)z) ≤ τf(x) + (1 − τ)f(z) for all x, z ∈ C and all τ ∈ [0, 1].   (10)

Lemma 6: Let C be a convex subset of R^n and f : C → R.

(i) f is convex over C if and only if for each x ∈ C there exists a vector u ∈ R^n such that

f(z) ≥ f(x) + ⟨u, z − x⟩ for all z ∈ C.

Moreover, if f is differentiable over R^n, then u = ∇f(x). (ii) f is strictly convex over C if and only if the above inequality is strict whenever x ≠ z. We next give an important property of smooth functions and omit its proof (see, e.g., [6]).
Lemma 7 ([6]): Let f : R^n → R be a Lipschitz differentiable function with modulus L > 0. Then for any x, y ∈ R^n, we have

∥∇f(x) − ∇f(y)∥ ≤ L∥x − y∥

and

|f(y) − f(x) − ⟨∇f(x), y − x⟩| ≤ (L/2)∥y − x∥².

Next, we give the definition of the Kurdyka-Łojasiewicz (KL) property, which plays a central role in our convergence analysis.
Definition 8 ([2], [6]): We say that a proper function f : R^n → (−∞, +∞] has the Kurdyka-Łojasiewicz (KL) property at x* ∈ dom(∂f) if there exist η ∈ (0, +∞], a neighborhood U of x*, and a continuous and concave function ϕ : [0, η) → R_+ such that (i) ϕ(0) = 0; (ii) ϕ is continuously differentiable on (0, η) with ϕ' > 0; and (iii) for all x ∈ U ∩ [f(x*) < f < f(x*) + η], the KL inequality holds:

ϕ'(f(x) − f(x*)) d(0, ∂f(x)) ≥ 1.   (14)

A proper lower semicontinuous function f that satisfies the KL property at each point of dom(∂f) is called a KL function.
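As a sanity check (our illustration, not from the paper), f(x) = x² satisfies the KL inequality (14) at x* = 0 with the desingularizing function ϕ(t) = √t, since ϕ'(t) = 1/(2√t), f(x) − f(0) = x², and d(0, ∂f(x)) = |2x|:

\[
  \varphi'\bigl(f(x) - f(0)\bigr)\, d\bigl(0, \partial f(x)\bigr)
  = \frac{1}{2\sqrt{x^2}}\cdot |2x| = 1 \;\ge\; 1
  \qquad \text{for all } x \neq 0 .
\]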
We denote by Φ_η the set of functions ϕ that satisfy the requirements of Definition 8. We now give an important lemma on the KL property, established in [6], which will be useful in the subsequent convergence analysis.
Lemma 9 ([6]): Let Ω be a compact set and let f : R^n → (−∞, +∞] be a proper and lower semicontinuous function. Assume that f is constant on Ω and satisfies the KL property at each point of Ω. Then there exist ε > 0, η > 0, and ϕ ∈ Φ_η such that for all x̄ ∈ Ω and all x in the intersection

{x ∈ R^n : d(x, Ω) < ε} ∩ [f(x̄) < f < f(x̄) + η],

one has ϕ'(f(x) − f(x̄)) d(0, ∂f(x)) ≥ 1.

III. CONVERGENCE ANALYSIS
Before proceeding with the analysis, we first make the following assumptions.
Assumption 10: In Algorithm 1 (LADMM), the matrices A, B and the parameters α, β, r, and s satisfy the requirements stated in the algorithm, and (i) the optimal set Ω* of (1) is nonempty.
Thus, the sequence {ω^k} is bounded.
Proof: Before proving the boundedness of the sequence {ω^k}, we first derive an upper bound on ∥λ^k∥², which follows from (15c). We now prove the boundedness of {λ^k} in two cases. (i) For 0 < α ≤ 1, the inequality follows from the convexity of ∥·∥² (10) and the equality follows from (7g); from (30), we obtain the desired bound. (ii) Using the convexity of ∥·∥² (10), where the first equality follows from (7g), and combining (31) and (32), we conclude that the boundedness of {λ^k} can be deduced from the boundedness of {v^k}. From (26), where the inequality follows from (33) and the last equality follows from (27), and since α ∈ (0, (1+√5)/2) and β > β̄, the claim follows.
Lemma 18: Let {ω^k} be the sequence generated by Algorithm 1 (LADMM), which is assumed to be bounded, and denote its set of limit points by Ω₀. Then the following assertions hold: (i) Ω₀ is a nonempty compact set and lim_{k→∞} d(ω^k, Ω₀) = 0. (ii) From (41), together with the continuity of ∇h and the closedness of ∂f and ∂g, we have 0 ∈ ∂m(ω*), which implies that ω* satisfies (8) and is a critical point of m(·). (iii) From (45), m(·) is constant on Ω₀. The proof is completed.
Theorem 19: Let {ω^k} be the sequence generated by Algorithm 1 (LADMM), which is assumed to be bounded, and suppose that m(·) is a KL function. Then the sequence {ω^k} has finite length, that is,

∑_{k=0}^{∞} ∥ω^{k+1} − ω^k∥ < +∞,   (46)

and the sequence {ω^k} converges globally to a critical point of m(·).
Proof: We prove assertion (46) by distinguishing two cases. By the monotonicity of m(·), rearranging the terms of (26), for any k > k₁ we have
Since Ω₀ is a nonempty compact set and m(·) is constant on Ω₀, applying Lemma 9 with Ω := Ω₀ in (14), for k > k₄ we obtain the KL inequality. For convenience, we introduce some shorthand notation. By the concavity of ϕ, together with (41) and ϕ(m(ω^k) − m(ω*)) > 0, rearranging the terms of (47) and combining with (26), we get the key estimate, where the second inequality follows from 2√(ab) ≤ a + b for any a, b > 0. After some simple calculations, summing the resulting inequality from k = k₄ + 1 to k₅ yields a telescoping bound. From the KL property (14), we have ϕ(m(ω^{k₅}) − m(ω*)) > 0; letting k₅ → +∞ thus gives (46). Hence {ω^k} is a Cauchy sequence [6] and is convergent. The proof is completed.
We present the convergence rate of Algorithm 1 in the following theorem, which is similar to results in [6], [31], [53]; we omit its proof.

IV. NUMERICAL EXPERIMENTS
In this section, we apply LADMM (Algorithm 1) to the background/foreground extraction problem and compare it with an existing ADMM variant.
To better illustrate the effectiveness of LADMM (Algorithm 1), we compare it with the algorithm (ADMMy) proposed by Yang et al. [55]; its iterative scheme (49) for solving problem (48) uses a dual step size α ∈ (0, (1+√5)/2). We update β using the following heuristic: for the given β̄ in Assumption 10, initialize n_β = 0 and β = 0.6β̄; if qn_k > 0.99 pn_{k−1}, we increase n_β by 1; we then replace β by 1.1β whenever β ≤ 1.1β̄ and either n_β ≥ 0.3k or pn_k > 10^10. Since λ_max(A^T A) = 0.07960, we set r = 0.1; the same initialization is used for all runs. In our experiments, we choose three sparse regularizer functions f(x) as suggested in [55]. Our test problems are shown in TABLE 1, where we consider 3 sparse regularizers, 10 choices of µ, and 6 choices of p or ζ. Since the sensitivity to parameter values differs across problems, we choose the best parameter values for the numerical comparison. We terminate the algorithms by first checking whether the primary criterion holds and then further checking the secondary one, with tolerances ERR₁ = 5 × 10⁻⁴ and ERR₂ = 5 × 10⁻³ as in [55]. The reported results all satisfy the stopping criterion, that is, ∥ω^k − ω^{k−1}∥_F falls below the prescribed value, in accordance with the convergence guarantee of Theorem 19 for LADMM (Algorithm 1) and ADMMy (49).
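A minimal sketch of the successive-change stopping test, under our reading that ∥ω^k − ω^{k−1}∥_F is compared against the tolerances ERR₁ and ERR₂ (the exact two-stage check in our source is garbled):

```python
import numpy as np

ERR1, ERR2 = 5e-4, 5e-3  # tolerances from [55]

def successive_change(w_new, w_old):
    """Frobenius-norm change between consecutive iterates
    w = (x, y, z, lam), each component a NumPy array."""
    return np.sqrt(sum(np.linalg.norm(a - b) ** 2
                       for a, b in zip(w_new, w_old)))

def stopped(w_new, w_old, tol=ERR1):
    return successive_change(w_new, w_old) < tol
```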
We choose the 'Hall' video, which contains 200 frames of size 144 × 176 (from airport 2001 to airport 2200), following Li et al. [33]. We choose two frames (airport 60 and airport 180) of the video as the test images, shown in the first row of FIGURE 1. The images in the second row of FIGURE 1 are the blurred images used in the numerical tests.
We use 'Iter', 'Time', and 'F-measure' to denote the number of iterations, the computing time, and the accuracy measure of the separation results, as in [55], respectively. An F-measure close to its maximum value of 1 indicates that the foreground is completely recovered. We report the corresponding results in the tables and figures. To illustrate the influence of the dual step size α on the algorithm, we compute the function values and plot ERR₃ = |F_k − F_min|/F_min against the number of iterations for each algorithm with different values of α, where F_k denotes the k-th objective function value and F_min denotes the minimum objective function value attained for each parameter α. The results in FIGURE 5 show that for the fraction and logistic regularizer functions the performance is best when α = 0.8, while for the bridge regularizer function the performance is best when α = 1.6.
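For reference, a sketch of the standard F-measure on binary foreground masks (we assume the usual precision/recall form; the paper computes it as in [55]):

```python
import numpy as np

def f_measure(mask_est, mask_true):
    """F-measure = 2PR/(P+R) for boolean foreground masks."""
    tp = np.sum(mask_est & mask_true)      # true positives
    p = tp / max(np.sum(mask_est), 1)      # precision
    r = tp / max(np.sum(mask_true), 1)     # recall
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)
```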

V. CONCLUSION
This paper shows that background/foreground extraction from a blurred and noisy surveillance video can be efficiently solved by a linearized alternating direction method of multipliers. The introduction of the linearization technique makes the subproblems easy to solve. Under the Kurdyka-Łojasiewicz property of the associated function and suitable assumptions, we prove that the sequence generated by the proposed algorithm converges to a critical point of the augmented Lagrangian function. We also report preliminary numerical results that indicate the feasibility and effectiveness of the linearization strategy.