Video Background/Foreground Separation Model Based on Non-Convex Rank Approximation RPCA and Superpixel Motion Detection

Traditional robust principal component analysis (RPCA) is prone to leaving holes in the extracted foreground when separating the background and foreground of complex-scene videos, and it easily misjudges dynamic background as a moving target, which degrades the separation result. To address this problem, this article introduces super-pixel segmentation into the RPCA model. First, the linear spectral clustering (LSC) algorithm is used to perform super-pixel segmentation of the video sequence, yielding a super-pixel grouping matrix. Then a new video background/foreground separation model is proposed based on non-convex rank approximation RPCA and super-pixel motion detection (SPMD). The Otsu algorithm is used to obtain the motion mask matrix, and the augmented Lagrangian alternating direction method is used to solve the improved RPCA model. Numerical experiments show that the proposed method detects moving objects against dynamic backgrounds more accurately than the compared methods.


I. INTRODUCTION
Video background/foreground separation [1]-[3] is a key pre-processing step in surveillance systems. It has a wide range of applications in intelligent traffic management, intelligent video surveillance and sports behavior analysis. It is also one of the most active research topics in computer vision, image processing and pattern recognition.
At present, the robust principal component analysis (RPCA) model of Candès et al. [4], which is based on low-rank plus sparse matrix decomposition, is widely used in moving target detection [5]-[8]. Background sequences can be modeled by a low-rank subspace that changes gradually over time, while moving foreground objects constitute correlated sparse outliers [9]. The RPCA model detects moving targets by decomposing the video matrix into a low-rank background matrix and a sparse moving-foreground matrix. This kind of method estimates the background and separates the moving foreground at the same time, effectively overcomes false detections caused by periodic background changes, and is robust to noise and illumination changes. In particular, when the background in the video is relatively static, the RPCA model can usually achieve more accurate foreground object detection [10], [11].
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Wei.
However, in most cases a video sequence is captured against a complex background into which the foreground objects may blend [12], for example leaves blowing in the wind, waves, swaying vegetation, fountains, lighting changes, ripples on water, and flags flying in the wind. Because the background is not completely static (that is, it also contains dynamic components), the performance of foreground detection is affected by the dynamic pixels in the background. The dynamic background is easily misjudged as a moving foreground target, which leaves the detected foreground objects incomplete, with holes and broken edges.
To overcome these problems, scholars have introduced different regularization terms and spatio-temporal continuity constraints into the original RPCA model, and have used non-convex functions instead of the nuclear norm to approximate the rank function, which greatly improved foreground object detection [13]-[16]. In recent years, with the widespread application of super-pixel techniques in image processing, some scholars have proposed using super-pixel segmentation to group pixels that are adjacent in appearance, space or time, which better preserves the boundary features of the foreground and achieves better results [10], [17]-[21]. Considering the success of super-pixel segmentation in image processing and the advantages of RPCA models based on non-convex rank approximation in video foreground/background separation and moving target detection under dynamic backgrounds, this article proposes a new video foreground/background separation model (LSCNC-RPCA) based on non-convex rank approximation RPCA and super-pixel motion detection. The new model is solved by the augmented Lagrangian alternating direction method. Experimental results on eight dynamic videos in the CDnet2014 database show that the proposed model effectively improves the accuracy of moving target extraction and gives visually better foreground extraction than existing methods.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The rest of this article is organized as follows. Section II briefly introduces the background and related work. The proposed model is given in Section III. Section IV presents the experimental results, and Section V concludes the article.

II. BACKGROUND AND RELATED WORKS
In this section, the background and some related work are given.
The traditional RPCA can be described as the following optimization problem:

$$\min_{L,S}\ \operatorname{rank}(L) + \lambda \|S\|_0 \quad \text{s.t.}\quad D = L + S, \tag{1}$$

where $D$ is the observed video data matrix, $L$ is the low-rank background matrix, $S$ is the sparse foreground matrix, $\operatorname{rank}(L)$ is the rank of $L$, $\|\cdot\|_0$ is the $l_0$ norm of the matrix (that is, the number of non-zero elements in the matrix), and $\lambda$ is a regularization parameter. Because the rank function and the $l_0$ norm are non-convex and discrete, solving this model is NP-hard. Since the rank of a matrix equals the number of its non-zero singular values, the nuclear norm and the $l_1$ norm of the matrix can be used as convex approximations of the rank function and the $l_0$ norm, respectively. Hence, the RPCA model (1) can be approximated by the following convex optimization problem:

$$\min_{L,S}\ \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.}\quad D = L + S, \tag{2}$$

where $\|L\|_* = \sum_{i=1}^{r} \sigma_i$ is the nuclear norm of the matrix $L$, $\sigma_i$ are the non-zero singular values of $L$, and $\|\cdot\|_1$ is the $l_1$ norm of the matrix, defined as

$$\|S\|_1 = \sum_{i,j} |S_{ij}|.$$

Under the assumption that the background of the video sequence is static or quasi-static and the target in the video moves fast, the RPCA model (2) detects the foreground target well.
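As a small worked illustration of the quantities in models (1) and (2), consider a rank-one background plus a single outlier pixel. The matrices below are hypothetical and chosen only to keep the arithmetic easy.

```latex
L = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \quad
S = \begin{pmatrix} 0 & 0 \\ 0 & 5 \end{pmatrix}, \quad
D = L + S = \begin{pmatrix} 1 & 1 \\ 1 & 6 \end{pmatrix}

% The singular values of L are 2 and 0.
\operatorname{rank}(L) = 1, \quad \|L\|_* = 2 + 0 = 2, \quad
\|S\|_0 = 1, \quad \|S\|_1 = |5| = 5
```

The discrete objective counts one rank and one outlier regardless of magnitude, whereas the convex surrogate grows with the size of the singular values and of the outlier.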

2) SUPER-PIXEL SEGMENTATION
Super-pixel segmentation is an image segmentation technique proposed by Ren and Malik in 2003 [22]. A super-pixel is an image region with a certain visual significance, composed of adjacent pixels with similar physical characteristics such as texture, color and brightness [23]. Super-pixels capture the redundant information of the image and thereby greatly reduce the complexity of subsequent processing such as target recognition and image segmentation. They have therefore been widely used in computer vision applications such as image segmentation, target tracking and target recognition.
At present, the main super-pixel segmentation algorithms include linear spectral clustering (LSC) [24], normalized cuts [25], mean shift [26], TurboPixels [27] and SLIC [28]. Among them, LSC is a segmentation method based on normalized cuts. It exploits the mathematical equivalence between two seemingly different methods (normalized cuts and weighted K-means) to explicitly map the pixel data into a high-dimensional feature space. Then, through efficient local operations, the global image structure is preserved, which avoids the high complexity of global optimization. Compared with other existing super-pixel segmentation methods, LSC avoids both the decomposition of the affinity matrix and the construction of a large kernel matrix. It has linear computational complexity and high memory efficiency, and the resulting super-pixels adapt well to changes in the structure and texture of natural images. It also connects operations based on local features with a global optimization objective, and produces super-pixels with good shape compactness and high boundary adherence.

B. RELATED WORKS
For traditional RPCA models that use the nuclear norm to approximate the rank function, there are two main problems. First, the nuclear norm is defined as the sum of all singular values of the matrix. When one or several singular values are very large, the nuclear norm overestimates the rank, which degrades the recovery of the low-rank matrix. Second, solving nuclear-norm-based RPCA models requires a singular value decomposition (SVD) at each step, which is costly in time and computation, especially when the matrix is large.
To address these problems, scholars have proposed different improved models. For example, to overcome the rank overestimation of the nuclear norm, Gu et al. [29] and Hu et al. [30] replaced the standard nuclear norm with the weighted nuclear norm and the truncated nuclear norm, respectively, and improved the detection results to a certain extent. However, the amount of computation is still large. To reduce the computational cost of handling the nuclear norm, scholars proposed RPCA models based on low-rank matrix factorization [31], [32], which factorize the low-rank matrix into two or three smaller matrices, reducing both the size of the matrix that requires singular value decomposition and the amount of computation. In addition, to avoid singular value decomposition in the model solution and to overcome the overestimation problem caused by the nuclear norm, scholars proposed using different non-convex functions to approximate the rank function, such as the non-convex γ-norm [14], the non-convex Laplace norm [15] and the logarithmic determinant function [33], [34]. In particular, Yang and Zou [6] proposed a general optimization model based on non-convex rank approximation for foreground/background separation of surveillance video. A large number of experimental results show that RPCA models based on non-convex rank approximation restore a cleaner background in a shorter running time.
However, for video sequences with a complex dynamic background, such as waves, snowflakes or swinging leaves, the above RPCA models based on non-convex rank approximation may misjudge the dynamic background as a moving foreground target, leaving the detected foreground objects incomplete, with holes and broken edges. To overcome this problem, scholars have considered exploiting the key spatio-temporal information contained in videos and images, that is, extending the RPCA model with additional constraints. Along these lines, Ebadi and Izquierdo [17] introduced a new sparsity criterion and group-structured sparsity constraints on the foreground part, and proposed a sparse RPCA model with a dynamic tree structure. Javed et al. [18] proposed a matrix decomposition model that combines maximum-norm regularization with structured sparsity constraints. Besides, with the wide use of super-pixel segmentation, scholars began to introduce it into the RPCA model to build spatio-temporal continuity between pixels. For example, on the basis of local and global invariance assumptions, Javed et al. [10] proposed a super-pixel-based spatio-temporal manifold structured sparse RPCA model, which uses two different manifold regularizations to characterize the sparse part, and applied the model to moving target detection. Based on the characteristics of super-pixels, Silva et al. [35] proposed an online wagging one-class ensemble technique for feature selection in foreground/background separation. Chen et al. [21] proposed a background subtraction algorithm based on hierarchical super-pixel segmentation, spanning trees and optical flow, in which multiple Gaussian mixture models (GMMs) are used to generate super-pixel segmentation trees. Fang et al. [36] proposed a background subtraction method for video analysis based on multi-scale random super-pixels, in which custom segmentation regions are replaced by super-pixel regions with similar features. The experimental results show that super-pixel segmentation can effectively compensate for the lack of edge information when traditional models extract moving objects, and improves both foreground extraction and background restoration.
However, most existing video foreground/background separation models based on super-pixel segmentation use the SLIC algorithm. Because SLIC is based on local features, its relationship with the global image attributes is not very clear. Hence, it is hard to connect the local feature-based operation with a global optimization objective, which greatly affects boundary adherence and shape compactness. As a result, part of the dynamic background may be included in the foreground when dealing with videos with a complex dynamic background.
Compared with SLIC, the LSC algorithm not only produces super-pixels with high boundary adherence and good shape compactness by bridging local and global methods, but also captures global image attributes while avoiding the high complexity of global optimization [24]. Therefore, in this article we combine LSC super-pixel segmentation with the RPCA model based on non-convex rank approximation to establish an improved video background/foreground separation model. The non-convex rank approximation function used here is the non-convex γ-norm proposed in [14], which is defined as follows:

$$\|L\|_\gamma = \sum_{i} \frac{(1+\gamma)\,\sigma_i(L)}{\gamma + \sigma_i(L)}, \quad \gamma > 0, \tag{3}$$

where $\sigma_i(L)$ is the $i$-th singular value of $L$.
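To make the difference between the rank, the nuclear norm and the γ-norm of [14] concrete, the following minimal sketch evaluates all three on a hypothetical list of singular values. The function names and the example spectrum are illustrative assumptions, and the singular values are assumed to have been computed elsewhere.

```python
def gamma_norm(singular_values, gamma):
    """Non-convex gamma-norm: sum of (1 + gamma) * s / (gamma + s) over singular values s >= 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return sum((1.0 + gamma) * s / (gamma + s) for s in singular_values)


def nuclear_norm(singular_values):
    """Convex surrogate of the rank: plain sum of the singular values."""
    return sum(singular_values)


# Hypothetical spectrum of a rank-2 matrix with one dominant singular value.
sigma = [100.0, 1.0, 0.0]
rank = sum(1 for s in sigma if s > 0)
print("rank:", rank)
print("nuclear norm:", nuclear_norm(sigma))
print("gamma-norm (gamma = 0.01):", gamma_norm(sigma, 0.01))
```

For a small γ each non-zero singular value contributes close to 1, so the γ-norm stays near the rank even when one singular value is very large, while the nuclear norm is dominated by that value. For a very large γ the γ-norm approaches the nuclear norm.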

III. THE PROPOSED MODEL AND ALGORITHM
The video background/foreground separation model in this article is built in three steps. First, the LSC algorithm is used to perform super-pixel segmentation on the video sequence and to obtain the super-pixel grouping matrix, which contains the super-pixel grouping information of all pixels in each frame of the video sequence. Second, the improved RPCA model is used to perform a low-rank and sparse decomposition of the video to obtain a sparse matrix.
Third, the motion mask matrix is obtained from this sparse matrix by the Otsu method, and the motion mask and the super-pixel grouping matrix are then used together to locate the moving area more accurately and to separate complex dynamic backgrounds effectively. This method is referred to as LSCNC-RPCA. Figure 1 shows the flow chart of the method.

A. LSC SUPER PIXEL SEGMENTATION
In the data preprocessing stage, the LSC algorithm is used to segment each frame of the video into homogeneous regions and to record the corresponding grouping matrix. A homogeneous region is an irregular block of pixels with similar texture, color and brightness. It not only has locality and continuity, but also provides effective boundary information for subsequent processing. The LSC algorithm is based on the relationship between the objective function of normalized cuts and that of weighted K-means. It first represents each pixel by a 5-dimensional feature vector (l, a, b, x, y), where (l, a, b) is the color of the pixel in the CIELAB color space and (x, y) is the spatial coordinates of the pixel. However, the clustering is not carried out in this 5-dimensional space. Instead, LSC uses the mathematical equivalence between the two seemingly different methods to map the data points into a higher-dimensional feature space, which improves linear separability.
Theorem 1 [24]: If the following equations (4) and (5) are satisfied at the same time, then weighted K-means clustering and the minimization of the normalized cuts objective function are mathematically equivalent:

$$w(p)\varphi(p) \cdot w(q)\varphi(q) = W(p, q), \quad \forall p, q \in V, \tag{4}$$

$$w(p) = \sum_{q \in V} W(p, q), \quad \forall p \in V, \tag{5}$$

where $V$ is the set of data points, $w(\cdot)$ is the weight of a data point, $\varphi(\cdot)$ is the function that maps a data point to the high-dimensional feature space, and $W(p, q)$ is the similarity between two data points. In other words, if the similarity of two points in the input space equals the weighted inner product of the two corresponding vectors in the carefully constructed high-dimensional feature space, then the partition given by normalized cuts is the same as the optimal result of weighted K-means clustering.
Algorithm 1 gives the steps of the LSC super-pixel segmentation algorithm, where $V_x/V_y$ is approximately equal to the aspect ratio of the image, and t > 0.5 is a parameter that balances local compactness and global optimality. In the clustering and merging stage, small isolated super-pixels whose size is less than a quarter of the expected super-pixel size are merged, following an empirical rule, into adjacent large super-pixels.
After the super-pixel segmentation of each frame of the video sequence is completed by the LSC algorithm, the video sequence is labeled according to the segmentation result. The pixels belonging to the i-th super-pixel in each frame are marked as i (i = 1, ..., k), and the grouping information of each frame is arranged as a column vector to form the video grouping matrix $G \in \mathbb{R}^{m\times n}$ (where m is the number of pixels in each video frame and n is the number of video frames). The grouping matrix not only covers the entire video sequence, but also effectively locates the contour boundary of the moving target. Figure 2 shows the LSC super-pixel segmentation result for one frame of the WaterSurface sequence. It shows that the regions segmented by the LSC algorithm have good shape compactness and high boundary adherence. Moreover, pixels in a homogeneous region tend to have the same motion characteristics. Hence, the grouping matrix of the homogeneous regions can be used to extract the sparse foreground.
In order to better extract foreground objects in complex scenes with dynamic backgrounds, in this section we introduce noise analysis and decompose the video sequence D into three terms, i.e., D = L + S + G, where L is the low-rank static background, S is the sparse foreground, and G is the dynamic background (in the model of this subsection the symbol G denotes this dynamic background component, not the grouping matrix). At the same time, in order to separate the sparse foreground from the dynamic background more accurately and to prevent moving objects from appearing in both S and G, we introduce an incoherent term that constrains S and G so as to improve their separability.
The incoherent term is defined column by column in (6), where $S_r$ and $G_r$ denote the r-th columns of S and G, respectively.
In this article, we use the γ-norm to approximate the rank function of the matrix and the $l_{2,1}$ norm to describe the sparse foreground. In addition, since the dynamically changing background is usually unstructured and non-sparse, we use the Frobenius norm to describe the noise matrix. This gives the improved RPCA model (7), where $D, L, S, G \in \mathbb{R}^{m\times n}$ are the original video data matrix, the low-rank background matrix, the sparse foreground matrix and the noise matrix, respectively; $\|\cdot\|_\gamma$ is defined by (3) with $\gamma > 0$; $\|\cdot\|_{2,1}$ is the $l_{2,1}$ norm, defined as

$$\|S\|_{2,1} = \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{m} S_{i,j}^2};$$

the term coupling S and G is the incoherent term; $\|\cdot\|_F$ is the Frobenius norm of the matrix; and λ, α and β are penalty parameters.
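The $l_{2,1}$ norm sums the Euclidean norms of the columns, so it favors solutions in which whole columns vanish, whereas the Frobenius norm treats all entries alike. A minimal sketch of both norms follows, assuming the matrix is stored as a list of rows; the function names and example matrices are illustrative.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean norms of the columns of a matrix given as a list of rows."""
    if not matrix:
        return 0.0
    n_cols = len(matrix[0])
    return sum(
        math.sqrt(sum(row[j] ** 2 for row in matrix))
        for j in range(n_cols)
    )


def frobenius_norm(matrix):
    """Square root of the sum of all squared entries."""
    return math.sqrt(sum(v ** 2 for row in matrix for v in row))


# Hypothetical 2 x 2 examples: one active column versus two active columns.
one_column = [[3.0, 0.0],
              [4.0, 0.0]]
two_columns = [[3.0, 3.0],
               [4.0, 4.0]]
print(l21_norm(one_column), frobenius_norm(one_column))
print(l21_norm(two_columns), frobenius_norm(two_columns))
```

With a single active column the two norms coincide. Activating a second column doubles the $l_{2,1}$ norm, while the Frobenius norm grows only by a factor of $\sqrt{2}$.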

2) SOLVING THE MODEL
In this section, we use the alternating direction method of multipliers (ADMM) [37] to solve the model (7). The augmented Lagrangian function of (7) is given in (8), where $Y \in \mathbb{R}^{m\times n}$ is the Lagrange multiplier and µ is the penalty parameter. Assuming that the current iteration number is k, the variables and multipliers are updated as follows.
1) Fix S and G to update $L_{k+1}$ via sub-problem (9), which can be simplified to (10). To solve problem (10), we need the following Theorem 2.
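Theorem 2 [14]: Let $A = U\Sigma_A V^T$ be the singular value decomposition of $A \in \mathbb{R}^{m\times n}$ with $\Sigma_A = \operatorname{diag}(\sigma_A)$, let $F(Z) = f \circ \sigma(Z)$ be a unitarily invariant function, and let µ > 0. Then the optimal solution of the problem

$$\min_{Z}\ F(Z) + \frac{\mu}{2}\|Z - A\|_F^2 \tag{11}$$

is $Z^* = U \operatorname{diag}(\sigma^*) V^T$, where

$$\sigma^* = \arg\min_{\sigma \ge 0}\ f(\sigma) + \frac{\mu}{2}\|\sigma - \sigma_A\|_2^2. \tag{12}$$

In (10), let $F(L) = \|L\|_\gamma$ and $A = D - S_k - G_k$. Then, according to Theorem 2, $L_{k+1}$ is obtained from the singular value decomposition of A as in (13), where $\sigma^*$ is given by equation (12) and can be computed iteratively by (14) [14].
2) Fix L and G to update $S_{k+1}$ via sub-problem (15), which can be simplified to (16). To solve (16), we use the following Theorem 3 proposed in [38].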
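Theorem 3 [38]: For a given matrix $M \in \mathbb{R}^{m\times n}$ and parameter τ > 0, the optimization problem

$$\min_{S}\ \tau\|S\|_{2,1} + \frac{1}{2}\|S - M\|_F^2 \tag{17}$$

has the closed-form solution $S^* = (S_1^*, \cdots, S_n^*)$ with

$$S_j^* = \begin{cases} \dfrac{\|M_j\|_2 - \tau}{\|M_j\|_2}\, M_j, & \|M_j\|_2 > \tau, \\ 0, & \text{otherwise}, \end{cases} \tag{18}$$

where $M_j$ is the j-th column of the matrix M. With M and τ taken from the terms of sub-problem (16) (τ depends on $\mu_k$), the solution of sub-problem (16) is obtained by (18).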
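The two sub-problem solvers can be sketched as follows. The singular value update is a minimal sketch that assumes the minimizer in (12) is approximated by repeatedly linearizing the concave penalty $f(\sigma) = (1+\gamma)\sigma/(\gamma+\sigma)$ and projecting onto $\sigma \ge 0$; this is one common way to handle such penalties and may differ in detail from the iteration in [14]. The column shrinkage follows the closed form of Theorem 3. Function names, the iteration count and the list-of-rows layout are illustrative assumptions, and the singular values of A are assumed to be computed elsewhere.

```python
import math


def gamma_prox_singular_values(sigma_a, gamma, mu, n_iter=20):
    """Approximate argmin_{s >= 0} f(s) + mu/2 * (s - a)^2 for each singular value a,
    with f(s) = (1 + gamma) * s / (gamma + s), by repeated linearization of f."""
    sigma = list(sigma_a)  # start from the singular values of A
    for _ in range(n_iter):
        # Derivative of f at the current iterate: (1 + gamma) * gamma / (gamma + s)^2
        grad = [(1.0 + gamma) * gamma / (gamma + s) ** 2 for s in sigma]
        # Minimizer of the linearized problem, projected onto s >= 0
        sigma = [max(a - g / mu, 0.0) for a, g in zip(sigma_a, grad)]
    return sigma


def shrink_columns(matrix, tau):
    """Column-wise shrinkage for the l2,1 norm: scale each column by max(1 - tau/norm, 0)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        norm = math.sqrt(sum(matrix[i][j] ** 2 for i in range(n_rows)))
        if norm > tau:
            scale = (norm - tau) / norm
            for i in range(n_rows):
                out[i][j] = scale * matrix[i][j]
    return out


# Hypothetical inputs: one dominant and one tiny singular value,
# one strong and one weak column.
print(gamma_prox_singular_values([10.0, 0.05], gamma=0.01, mu=1.0))
print(shrink_columns([[3.0, 0.1], [4.0, 0.1]], tau=1.0))
```

In this sketch the large singular value is left almost unchanged while the tiny one is set to zero, which is the behavior that distinguishes a rank-like penalty from uniform soft thresholding. Likewise, the weak column is removed entirely and the strong column is only rescaled.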

C. SUPER-PIXEL MOTION DETECTION
In this section, the super-pixel motion detection is performed and the super-pixel segmentation results are combined with the sparse matrix generated by the RPCA model to analyze the correlation and obtain the motion mask of the image. Then the moving object is extracted from the relevant original video sequence.
Specifically, for the sparse foreground matrix $S \in \mathbb{R}^{m\times n}$ (m is the number of pixels per frame of the video and n is the number of video frames) obtained by Algorithm 2, we use the Otsu method [39] to automatically calculate the motion mask $M \in \mathbb{R}^{m\times n}$. The optimal decision threshold τ is calculated with the Otsu method by minimizing the intra-class variance, namely

$$\tau = \arg\min_{g} \left\{ \omega_0(g)\sigma_0^2(g) + \omega_1(g)\sigma_1^2(g) \right\}. \tag{22}$$

For each column $S_j$, j = 1, ..., n, of the sparse matrix S, $\omega_0(g)$ and $\omega_1(g)$ are the probabilities of the two classes of pixels separated by the candidate threshold g (values below g and values above g), with $\omega_0(g) + \omega_1(g) = 1$, and $\sigma_0^2(g)$ and $\sigma_1^2(g)$ are the corresponding intra-class variances. Thresholding S at τ gives the motion mask (23) based on the improved RPCA model, where $M_j(i, 1)$ and $S_j(i, 1)$, i = 1, ..., m, are the i-th entries of the j-th columns of M and S, respectively.
Next, the motion mask M and the super-pixel grouping matrix G are combined to obtain the final super-pixel motion detection mask N. For each homogeneous region in the same group, count the number Q of moving pixels. If $Q > \operatorname{ceil}\left(\frac{m}{\theta k}\right)$, the whole homogeneous region is classified as moving pixels, where ceil(·) is the rounding-up function, m is the number of pixels per frame, k is the number of super-pixels, and θ is the constraint parameter. The super-pixel motion detection mask N can then be expressed as follows:

$$\forall (x, y) \in SR:\quad N(x, y) = \begin{cases} 1, & Q > \operatorname{ceil}\left(\frac{m}{\theta k}\right), \\ 0, & \text{otherwise}, \end{cases} \tag{24}$$

where SR denotes a super-pixel region. Algorithm 3 presents the framework of super-pixel motion detection (SPMD).
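The threshold selection can be illustrated with a minimal, unoptimized sketch of the Otsu criterion on one column of the sparse matrix. It searches the midpoints between consecutive distinct values rather than a gray-level histogram, and the mask thresholds the magnitude of each entry; both choices, as well as the function names and the sample column, are assumptions made for illustration.

```python
def variance(values):
    """Population variance of a non-empty list."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def otsu_threshold(values):
    """Return the cut g that minimizes w0(g) * var0(g) + w1(g) * var1(g)."""
    data = sorted(values)
    n = len(data)
    best_g, best_score = data[0], float("inf")
    # Candidate thresholds: midpoints between consecutive distinct values
    for idx in range(1, n):
        if data[idx] == data[idx - 1]:
            continue
        g = 0.5 * (data[idx - 1] + data[idx])
        low, high = data[:idx], data[idx:]
        score = (len(low) / n) * variance(low) + (len(high) / n) * variance(high)
        if score < best_score:
            best_g, best_score = g, score
    return best_g


def motion_mask(column, tau):
    """Binary mask: 1 where the magnitude exceeds tau (using abs() is an assumption)."""
    return [1 if abs(v) > tau else 0 for v in column]


# Hypothetical column of the sparse matrix: three quiet pixels, three active ones.
column = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
tau = otsu_threshold(column)
print(tau, motion_mask(column, tau))
```

This direct search recomputes both variances for every candidate and is therefore quadratic in the number of pixels; a histogram-based implementation would be preferable for full frames.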

Algorithm 3 SPMD
Input: sparse foreground matrix S, super-pixel segmentation grouping matrix G, m, k, θ = 3;
Initialize: q = ceil(m/(θk));
1: Use formula (22) to calculate the optimal decision threshold τ;
2: Use equation (23) to calculate the sparse-matrix motion mask M;
3: for y = 1 : n
4:   for j = 1 : g(y)
5:     for x = 1 : m
6:       Count Q(j) over the pixels with (x, y) ∈ SR(j) and M(x, y) = 1;
7:       if Q(j) > q
8:         N(x, y) = 1;
9:       else
10:        N(x, y) = 0;
11:      end
12:    end
13:  end
14: end
Output: N.

Figure 3 shows an example of super-pixel motion detection. With the traditional RPCA model, uniform image texture or clustering errors leave many holes in the foreground. By grouping adjacent pixels in the same homogeneous region through the super-pixel grouping matrix, these holes can be correctly classified as moving regions, thereby reducing unexpected errors caused by static objects or object clustering.
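The region-level decision of Algorithm 3 can be sketched for a single frame as follows. The sketch assumes that one column of the pixel-level mask M and the matching column of the grouping matrix are available as flat lists, and it completes the count for every region before deciding; the function name and the toy data are illustrative.

```python
import math
from collections import Counter


def spmd_mask(mask_column, label_column, n_superpixels, theta=3):
    """Region-level vote for one frame.

    mask_column: 0/1 pixel-level motion mask of the frame (one column of M).
    label_column: super-pixel label of each pixel (one column of the grouping matrix).
    A region is marked as moving when it holds more than ceil(m / (theta * k)) moving pixels.
    """
    m = len(mask_column)
    q = math.ceil(m / (theta * n_superpixels))
    # Number of moving pixels per super-pixel label
    moving_count = Counter(
        label for label, flag in zip(label_column, mask_column) if flag == 1
    )
    return [1 if moving_count[label] > q else 0 for label in label_column]


# Hypothetical frame with 12 pixels and 2 super-pixels of 6 pixels each.
labels = [1] * 6 + [2] * 6
pixel_mask = [1, 1, 0, 1, 1, 1,   # region 1: mostly moving, with one hole
              0, 1, 0, 0, 0, 0]   # region 2: a single isolated detection
print(spmd_mask(pixel_mask, labels, n_superpixels=2))
```

In this toy frame the threshold is q = ceil(12 / 6) = 2, so the hole in region 1 is filled and the isolated detection in region 2 is discarded, which is the effect described for Figure 3.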

IV. EXPERIMENTAL RESULTS AND ANALYSIS
In this section, we verify the performance of the proposed model through experiments, using a combination of quantitative and qualitative methods. The proposed algorithm is compared with GoDec [13], NC-RPCA [14], AccAltProj [40], KNN [41], LRSD-TNNSR [42] and NCSC-RPCA [43]. We mainly selected eight video sequences from the I2R dataset and the CDnet2014 dataset for verification. All numerical experiments were run on a PC with an Intel Core i5-4210U 2.40 GHz CPU and 4 GB of RAM, and were programmed in MATLAB R2019b.

A. PARAMETER SELECTION
In the model proposed in this article, we mainly use six parameters: λ, α, β, $\mu_0$, ε and ρ. For the parameter λ, we adopt a value that adapts to each video sequence and set $\lambda = 5\times 10^{-3}/\sqrt{\min(\operatorname{size}(D))}$. For the penalty parameter $\mu_0$, we take $\mu_0 = 10^{-3}$. In order to better separate the dynamic background from the foreground, we take $\alpha = 5\times 10^{-2}$ and $\beta = 10^{-3}$ for the corresponding constraints. The parameter ρ controls the convergence rate; here we set ρ = 1.1.
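For reference, the parameter values above can be collected as in the following minimal sketch. The geometric growth of the penalty parameter, $\mu_{k+1} = \rho\mu_k$, is an assumption (a common choice in augmented Lagrangian methods) and is not stated explicitly in the text; the function names and the frame size in the example are hypothetical.

```python
import math


def default_parameters(n_pixels, n_frames):
    """Parameter values quoted in the text for a data matrix D of size n_pixels x n_frames."""
    return {
        "lambda": 5e-3 / math.sqrt(min(n_pixels, n_frames)),
        "alpha": 5e-2,
        "beta": 1e-3,
        "mu0": 1e-3,
        "rho": 1.1,
    }


def penalty_schedule(mu0, rho, n_steps):
    """Assumed geometric schedule mu_{k+1} = rho * mu_k (not stated in the text)."""
    return [mu0 * rho ** k for k in range(n_steps)]


# Hypothetical sequence: 160 x 120 frames (19200 pixels), 100 frames.
params = default_parameters(19200, 100)
print(params["lambda"])
print(penalty_schedule(params["mu0"], params["rho"], 3))
```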
In order to make different models comparable, all the methods use the same stopping criterion: the iteration stops when the number of iterations reaches $k_{\max} = 300$ or the relative error falls below $\varepsilon = 10^{-6}$.

B. NUMBER OF SUPERPIXELS
Fig. 4 shows the F-measure (FM) values obtained on the waving trees, fountain, switchlight and overpass video sequences for different numbers of super-pixels. Among them, the moving objects in the waving trees and overpass sequences are relatively large, while the moving objects in the fountain and switchlight sequences are relatively small. The figure shows that a larger number of super-pixel blocks is not always better, since the more blocks there are, the fewer pixels each block contains. On the other hand, the boundaries of the super-pixel blocks segmented by LSC cannot coincide completely with the boundary of the moving object. Hence, when some background texture features are similar to those of the moving object, they are grouped into the same block, and the moving object is regarded as background during extraction when the block contains too many background pixels. In addition, for video sequences with relatively small moving targets, the number of super-pixel blocks should be set larger, because the smaller the moving target is, the less distinct its texture will be.
Based on the above analysis, we divide the data sets used in this article into two categories according to the number of super-pixels. For the four sequences with relatively large moving objects, namely WaterSurface, Waving trees, Overpass and Curtain, the number of super-pixels is set to 500, while for Campus, Fountain, Boats and SwitchLight, whose moving objects are relatively small, it is set to 750.

C. QUANTITATIVE ANALYSIS INDICATORS
In order to evaluate the performance of each model more accurately and objectively in the quantitative comparison, we follow [44] and [45] and use four indicators. To simplify the notation, the ground truth and the estimated foreground are denoted by $B_{GT}$ and $B_E$, respectively. The evaluation indexes are as follows:
1. Average gray error (AGE): the average of the absolute errors between $B_{GT}$ and $B_E$, where $B_{GT}$ and $B_E$ have been converted to grayscale.
2. Percentage of error pixels (pEPs): an error pixel (EP) is a pixel at which the difference between $B_{GT}$ and $B_E$ is non-zero (in the model results, white represents the extracted target and black the background). pEPs is defined as the ratio of the number of EPs to the total number of pixels.
3. Peak signal-to-noise ratio (PSNR): PSNR is widely used to measure image quality. It is defined as

$$\mathrm{PSNR} = 10\log_{10}\frac{\mathrm{MAX}^2}{\mathrm{MSE}},$$

where MAX = 255 and MSE is the mean square error between $B_{GT}$ and $B_E$.
4. F-measure: a comprehensive index that combines Precision (the precision of the foreground segmentation) and Recall (the fraction of true foreground pixels that are detected), and is often used to evaluate detection in dynamic videos. It is defined as follows:

$$\text{F-measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN},$$

where TP is the number of foreground pixels correctly marked as foreground, FP is the number of background pixels wrongly marked as foreground, and FN is the number of foreground pixels wrongly marked as background.
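The four indicators can be computed as in the following minimal sketch, which assumes that the ground truth and the estimate are equally sized grayscale masks stored as flat lists, with 0 for background and non-zero values (255 here) for foreground. The function name and the toy masks are illustrative, and pEPs is returned as a fraction rather than a percentage.

```python
import math


def evaluate(gt, est, max_value=255):
    """Compare two equally sized grayscale masks given as flat lists (foreground = non-zero)."""
    n = len(gt)
    pairs = list(zip(gt, est))
    age = sum(abs(g - e) for g, e in pairs) / n       # average gray error
    peps = sum(1 for g, e in pairs if g != e) / n     # fraction of error pixels
    mse = sum((g - e) ** 2 for g, e in pairs) / n
    psnr = math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)
    tp = sum(1 for g, e in pairs if g > 0 and e > 0)
    fp = sum(1 for g, e in pairs if g == 0 and e > 0)
    fn = sum(1 for g, e in pairs if g > 0 and e == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"AGE": age, "pEPs": peps, "PSNR": psnr, "F-measure": f_measure}


# Hypothetical 4-pixel masks: one hit, one miss, one false alarm, one true background.
ground_truth = [255, 255, 0, 0]
estimate = [255, 0, 255, 0]
print(evaluate(ground_truth, estimate))
```

A perfect estimate gives an MSE of zero, so the sketch returns an infinite PSNR in that case instead of dividing by zero.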

D. EXPERIMENTAL RESULT AND ANALYSIS
In this article, we mainly select eight video sequences for verification, namely WaterSurface, Waving trees, Campus, Fountain, Overpass, Boats, Curtain and SwitchLight. Among them, the first six are outdoor sequences and the latter two are indoor sequences. All the sequences contain lighting changes, and the lighting changes in the SwitchLight sequence are more complicated than those in the Curtain sequence. Figure 5 shows the detection results of GoDec, NC-RPCA, AccAltProj, KNN, LRSD-TNNSR, NCSC-RPCA and our proposed algorithm on the eight selected video sequences, where (a) shows the original video frames selected from the eight dynamic-background videos, (b) shows the true foreground of the corresponding frames, and (c) to (i) show the foregrounds extracted by GoDec, NC-RPCA, AccAltProj, KNN, LRSD-TNNSR, NCSC-RPCA and our proposed algorithm, respectively.
The first row in Figure 5 is the WaterSurface dataset. In this dataset the foreground is relatively simple, but the ripples on the water surface appear as background motion during detection, which makes pedestrian detection more difficult. The figure shows that although LRSD-TNNSR and KNN can suppress part of the disturbed background, some water surface ripples are still detected as foreground, and the foreground area is not smooth enough. The calf of the pedestrian is missing from the NC-RPCA and NCSC-RPCA detection results. The proposed algorithm, however, not only extracts the moving target effectively but also removes the influence of the water surface ripples, and its result is clearly better than those of the other methods.
The Waving trees, Campus and Overpass datasets are all video sequences with a large number of swaying leaves. The background motion is relatively complex and can easily be extracted as foreground. The second, third and seventh rows show the detection results of the algorithms on these three sequences, respectively. The irregular motion of the shaking leaves in the background is the main obstacle to detecting moving objects. The figure shows that AccAltProj, KNN and LRSD-TNNSR cannot effectively eliminate the effects of the shaking leaves in the background. Although GoDec, NC-RPCA and NCSC-RPCA can suppress the effects of the shaking leaves, the detected area is very incomplete.
In contrast, our method detects more complete moving targets. The NCSC-RPCA algorithm extracts the target of the Overpass data set well, but the extracted target contains serious holes. For the algorithm in this article, since the target is close to the lens and the clothing texture is pronounced, the LSC algorithm divides the target into several small super-pixel blocks. When many pixels are wrongly assigned within such a small block, a hole appears in the extracted target. Even so, our method still detects a more complete moving target than the other methods.
The Fountain data set in the fourth row is a video sequence with a dynamic fountain in the background. Among the seven algorithms, LRSD-TNNSR and our algorithm extract the moving target better. Moreover, our algorithm extracts the most complete moving target area and is least affected by the irregular movement of the fountain.
The fifth and sixth rows are video sequences with illumination changes in the background, namely the Curtain and SwitchLight data sets. For the Curtain dataset, the detection results of all six other algorithms show obvious "phantom" artifacts, whereas the result of the algorithm in this article does not. Meanwhile, the contour of the moving foreground detected by our algorithm is more complete and the background is cleaner. For the SwitchLight dataset, our algorithm is hardly affected by illumination changes and effectively detects moving objects without extracting the human shadow on the ground. From the comparison of the above experimental results, we can see that our algorithm effectively eliminates the moving background and extracts more complete foreground targets for video sequences with complex moving backgrounds.
The last row is the Boats data set, in which water surface ripples occupy a large portion of the image and greatly affect foreground extraction. Although the proposed algorithm denoises better than the first five algorithms and its target extraction accuracy is similar to that of the NCSC-RPCA algorithm, the water surface ripples prevent the LSC algorithm from segmenting the boundary texture of the moving object accurately, so the extracted moving object is not smooth enough.
Table 1 shows the quantitative evaluation results of the proposed algorithm and the other six methods on the test sequences. In this article, we use F-measure, average gray error (AGE), peak signal-to-noise ratio (PSNR) and percentage of error pixels (pEPs) for comparative analysis. By definition, the larger the values of F-measure and PSNR, the better, and the smaller the values of AGE and pEPs, the better. The comparison of these evaluation indexes shows that, on the eight data sets with complex dynamic backgrounds, our algorithm improves considerably on the other six algorithms in terms of F-measure, AGE, PSNR and pEPs. In particular, the improvement is more significant for data sets with a relatively large moving target area. According to the F-measure values on the eight selected datasets with different degrees of dynamic background interference, our algorithm has obvious advantages in target extraction when the video background is strongly disturbed, which shows that the proposed model is robust.

V. CONCLUSION
In order to effectively extract moving targets under complex dynamic backgrounds, this article proposes a video background/foreground separation model based on an improved RPCA and super-pixel motion detection. The new model combines the improved RPCA model with the super-pixel grouping matrix to detect moving objects, and effectively alleviates the incomplete boundaries and holes that appear in traditional foreground extraction. Experimental results show that the advantage of the algorithm is most obvious when the moving target is large, while for relatively small moving targets the detection is slightly worse and prone to missed detections. This will be the focus of our future research.
Her research interests include nonconvex low-rank matrix decomposition, moving object detection, and image processing.