An Improved Matrix Factorization Based Active Contours Combining Edge Preservation for Image Segmentation

Image segmentation plays a crucial role in clinical diagnosis and therapy planning, yet it remains challenging due to abundant noise, blurry boundaries and heterogeneity. In this work, a novel matrix factorization based approach with the ability of edge preservation is presented. First, to obtain a more comprehensive feature description, we use local spectral histograms to describe the local structures formed by feature values. Second, the energy function is established via matrix factorization theory, which makes each pixel fall into the sub-region with the largest coverage area in its neighborhood. Then, edge preservation is used to obtain a smoother and more accurate object boundary. Finally, a number of synthetic and natural images are used for verification. Experiments demonstrate that our approach achieves satisfactory results and is more robust against complex backgrounds than other methods.


I. INTRODUCTION
Over the past decades, many image segmentation algorithms have been presented, including wavelet transformation [1], [2], graph cut [3], [4], edge detection [5], [6], level set [7], [8], and deep learning [9], [10]. Among them, active contour models (ACMs) based on level set theory have become a successful branch. According to the nature of their constraints, ACMs can be roughly categorized into two types: edge-based models [11]-[14] and region-based models [15]-[18]. Edge-based methods rely on local edge information to evolve contour curves towards the target boundaries. Because they use only local information, these models are quite sensitive to the initial contour. In contrast, region-based models use region statistical information to guide the motion of contours. Therefore, they do not depend on image gradients and can segment objects with weak edges. As one of the most representative region-based ACMs, the C-V model [19] utilizes statistical information to guide the contour. Nevertheless, it assumes that image intensities are statistically homogeneous in each region; this assumption is highly idealized, so the model cannot accurately segment images with intensity inhomogeneity.
The associate editor coordinating the review of this manuscript and approving it for publication was Mingbo Zhao.
To improve segmentation performance under intensity inhomogeneity, local region information of the image has been incorporated into traditional ACMs. In [20], Li et al. presented the region-scalable fitting (RSF) energy based on a kernel function, which has achieved promising results. However, the model is sensitive to its parameters and to the initial contour curve. Min et al. [21] proposed a multi-scale local region-based level set method using a local maximum description difference feature, which performs well in handling intensity inhomogeneities. Yu et al. [22] utilized a local patch similarity measure energy as a kernel function to guide the evolution of the curve, which can balance noise suppression. Peng et al. [23] introduced a local mean and variance energy with a kernel function, extended from [24]; it is much less sensitive to the initial contour and has attracted extensive attention. Li et al. [25] defined a framework that uses patch information to replace the Gaussian kernel function, which guarantees a certain degree of noise robustness. Ding et al. [26] presented an active contour model driven by local pre-fitting energy; their experiments showed that the local pre-fitting method reduces the computational cost.
In recent years, researchers have integrated texture information into active contour models. Min et al. [27] presented a color-texture segmentation method in which an intensity term and a texture term are incorporated into the energy function. A major advantage of this approach is that it is not limited to color. Liu et al. [28] proposed a segmentation method for texture images that uses local Gaussian distribution fitting and local self-similarity and has relatively low complexity. Subudhi et al. [29] proposed a hybrid structural energy based on co-occurrence features of the image to evolve contour curves toward the desired texture boundary. Gao et al. [30] presented a method based on a histogram-based local variation degree and a Gabor filter, in which more comprehensive features are obtained. Dong et al. [31] proposed a segmentation method based on the theory of matrix factorization. This method is more robust and can deal with indeterminacy.
In this work, we introduce a matrix factorization based approach that attempts to solve the problems mentioned above. Specifically, the main contributions of this paper can be summarized as follows:
1) We employ local spectral histograms to describe the local structures formed by feature values, so that a more comprehensive feature description can be obtained.
2) The energy function is built on matrix factorization theory, in which each pixel falls into the sub-region with the largest covered area in its neighborhood.
3) We incorporate edge preservation into the variational level set function to obtain a smoother and more accurate object boundary. Therefore, the proposed method is more robust against complex backgrounds, which leads to accurate segmentation.
The remainder of the paper is organized as follows: Section II briefly introduces the related work. Section III presents the proposed method. Experiments and results are provided in Section IV, and concluding remarks are given in Section V.

II. RELATED WORK
A. LOCAL SPECTRAL HISTOGRAM
Given a window $W$ in an input image and a bank of filters $\{F^{(\alpha)}, \alpha = 1, 2, \ldots, K\}$, we can compute a set of filter responses through convolution. With respect to the chosen filters, a bin of the spectral histogram is defined as [32]:
where $|W|$ denotes the size (number of pixels) of the window $W$. The spectral histogram characterizes local patterns through filtering and global appearance through a histogram. It has been proven that, with properly selected filters, the spectral histogram can uniquely represent the texture appearance. For each pixel location, the local spectral histogram is computed over a square window.
To obtain more appropriate features, we use 11 equal-width bins for each filter response in this paper.
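As a rough illustration of the construction described above, the following sketch computes a local spectral histogram for one pixel in plain Python. The function names, the two toy filters (raw intensity and a horizontal difference), the fixed value ranges, and the clamped border handling are assumptions made for this example; only the use of 11 equal-width bins per filter response comes from the text.

```python
def filter_responses(image):
    """Return two toy filter responses: raw intensity and horizontal difference."""
    rows, cols = len(image), len(image[0])
    intensity = [row[:] for row in image]
    # Horizontal forward difference, with the last column clamped to the border.
    diff = [[image[r][min(c + 1, cols - 1)] - image[r][c] for c in range(cols)]
            for r in range(rows)]
    return [intensity, diff]


def local_spectral_histogram(responses, row, col, half, ranges, bins=11):
    """Concatenate normalized histograms of each response over a square window."""
    rows, cols = len(responses[0]), len(responses[0][0])
    feature = []
    for resp, (lo, hi) in zip(responses, ranges):
        hist = [0.0] * bins
        count = 0
        for r in range(max(0, row - half), min(rows, row + half + 1)):
            for c in range(max(0, col - half), min(cols, col + half + 1)):
                # Map the response value to one of the equal-width bins.
                idx = int((resp[r][c] - lo) / (hi - lo) * bins)
                hist[min(max(idx, 0), bins - 1)] += 1.0
                count += 1
        # Divide by the number of pixels in the window so each histogram sums to one.
        feature.extend(h / count for h in hist)
    return feature


# Hypothetical 4x4 image with a vertical step edge.
image = [[0, 0, 255, 255] for _ in range(4)]
responses = filter_responses(image)
feature = local_spectral_histogram(responses, 1, 1, half=1,
                                   ranges=[(0, 256), (-256, 256)])
```

With K filters the feature vector has 11K entries; here K = 2, so `feature` has 22 entries, and each of the two histograms sums to one.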

B. FACTORIZATION BASED SEGMENTATION FRAMEWORK
Yuan et al. [33] proposed a robust texture image segmentation method that uses the feature structure of the local spectral histogram. The method is built on a factorization-based formulation, where the input image is assumed to have $N$ pixels and the number of features is $L$. The feature vectors of all local windows can therefore be arranged into an $M \times N$ matrix $Y$, which can be factored as:
$$Y = R\beta + \eta$$
where $R$ is an $M \times L$ matrix whose columns are the representative spectral histograms, $\beta$ is an $L \times N$ matrix whose columns contain the weight information, and $\eta$ is the additive noise.
To estimate the number of features $L$, the singular value decomposition (SVD) technique is used. Then, an approximate feature matrix, which approximates the true matrix $Y$, can be defined as:
where $U$ and $V$ are two matrices of size $M \times L$ and $L \times N$, formed from the first $L$ eigenvectors of $YY^{T}$ and $Y^{T}Y$, respectively. $\Sigma$ is an $L \times L$ matrix with the largest $L$ singular values on its diagonal and zeros elsewhere.
where $Q$ is an invertible matrix. The above formula shows that the representative features $R$ and the combination weight matrix $\beta$ are linear transformations of the corresponding factors of the approximate matrix. After obtaining $R$ as the representative features, Yuan et al. introduced a nonnegativity constraint to compute the combination weight matrix, which is described as follows:
where $\lambda_1$ and $\lambda_2$ are two parameters used to prevent $R$ and $\beta$ from growing too large.
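To make the factorization model concrete, the sketch below recovers the combination weights of a single feature vector in the two-region case ($L = 2$) by ordinary least squares, and assigns the pixel to the region with the larger weight. This is a simplified illustration, not the constrained estimator described above: the nonnegativity constraint and the $\lambda_1$, $\lambda_2$ regularization are omitted, and the function name and the toy histograms are hypothetical.

```python
def combination_weights(r1, r2, y):
    """Least-squares weights (b1, b2) minimizing ||y - b1*r1 - b2*r2||^2."""
    # Entries of the 2x2 normal-equation matrix R^T R and the vector R^T y.
    a11 = sum(a * a for a in r1)
    a12 = sum(a * b for a, b in zip(r1, r2))
    a22 = sum(b * b for b in r2)
    c1 = sum(a * v for a, v in zip(r1, y))
    c2 = sum(b * v for b, v in zip(r2, y))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("representative features are linearly dependent")
    # Solve the 2x2 system with Cramer's rule.
    b1 = (c1 * a22 - c2 * a12) / det
    b2 = (a11 * c2 - a12 * c1) / det
    return b1, b2


# Hypothetical representative histograms of two regions (two columns of R).
r_object = [0.7, 0.2, 0.1, 0.0]
r_background = [0.0, 0.1, 0.3, 0.6]
# Feature of a window that covers 75% object and 25% background, without noise.
y = [0.75 * a + 0.25 * b for a, b in zip(r_object, r_background)]
w_object, w_background = combination_weights(r_object, r_background, y)
label = "object" if w_object >= w_background else "background"
```

In this noise-free toy case the recovered weights equal the coverage fractions of the window, which is why the largest weight indicates the region that covers most of the pixel's neighborhood.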

III. PROPOSED METHOD
A. NON-LOCAL MEANS FILTERING
Unlike traditional denoising methods, non-local means filtering reduces noise effectively by computing a weighted average of pixels according to image similarity, which makes full use of spatial information from the entire image. Given an image $X_R = \{x_i\}_{i=1}^{N}$, $x_i$ represents the $i$-th pixel in the local window, and $N_i$ is the neighborhood region centered on the $i$-th pixel. Then, the output of non-local means filtering is computed by:
where $W_{i,j}$ denotes the similarity weight between the neighboring pixel and the central pixel. It is expressed as:
where $h$ is a smoothing parameter controlling the kernel width, $\|x_{N_i} - x_{N_j}\|_{2,\sigma}^{2}$ denotes the Gaussian-weighted Euclidean distance, and $Z(i)$ stands for the normalizing coefficient, expressed as:
The success of non-local means filtering lies in removing the redundant information of the image. We therefore use this algorithm to preprocess the images in order to reduce noise. In this paper, we set $h = 5$.

B. MATRIX FACTORIZATION BASED FITTING ENERGY COMBINING EDGE PRESERVATION
After the filtered image is computed, we present a modified energy function by employing an ACM for two-phase segmentation. Given an image $I$ with domain $\Omega$, the goal of segmentation is to partition the image domain into an object region $\Omega_o$ and a background region $\Omega_b$. For each pixel $x$, its corresponding histogram feature is computed as $H_\omega(x)$. Using the matrix factorization based approach, $H_\omega(x)$ can be computed as [30]:
where $\omega_o$ and $\omega_b$ denote the weights of the object region and the background region, respectively. Let $\phi(x)$ be a level set function; the evolving contour can then be represented by $C = \{x \mid \phi(x) = 0\}$. In [30], Yuan et al. computed the representation features in a subspace instead of the feature space to improve computational efficiency. Since the representation features describe the feature distribution of an entire region, they are taken as the average over all coordinates in the region:
where $H_\varepsilon(\phi)$ is the Heaviside function, and its derivative is the Dirac function $\delta_\varepsilon(\phi)$. The representation features $R$ are computed by:
where $R_1$ is an $M \times L$ matrix whose columns consist of the first two eigenvectors of the matrix $YY^{T}$. The representation features should lie in an $L$-dimensional subspace spanned by $R_1$.
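The role of the Heaviside function in separating the two regions can be illustrated as follows. The sketch assumes the arctangent-based smoothing that is common in region-based level set models, and the convention that $\phi > 0$ marks the object; neither choice is stated in the text. The function names and the toy data are hypothetical.

```python
import math


def heaviside(phi, eps=1.0):
    """Smoothed Heaviside function H_eps(phi), arctangent form (assumed)."""
    return 0.5 * (1.0 + (2.0 / math.pi) * math.atan(phi / eps))


def dirac(phi, eps=1.0):
    """Smoothed Dirac function, the derivative of heaviside with respect to phi."""
    return (eps / math.pi) / (eps * eps + phi * phi)


def region_mean_features(features, phi, eps=1.0):
    """Average feature vectors inside (H near 1) and outside (H near 0) the contour."""
    dim = len(features[0])
    inside, outside = [0.0] * dim, [0.0] * dim
    w_in = w_out = 0.0
    for f, p in zip(features, phi):
        h = heaviside(p, eps)
        w_in += h
        w_out += 1.0 - h
        for k in range(dim):
            inside[k] += h * f[k]
            outside[k] += (1.0 - h) * f[k]
    return [v / w_in for v in inside], [v / w_out for v in outside]


# Hypothetical data: three object pixels (phi > 0) and three background pixels.
features = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
phi = [50.0] * 3 + [-50.0] * 3
r_in, r_out = region_mean_features(features, phi)
```

Because the smoothed Heaviside function never reaches exactly 0 or 1, every pixel contributes slightly to both averages; this is what makes the region averages differentiable with respect to $\phi$.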
The combination weight is defined as:
According to such characteristics, the matrix factorization based energy term can be defined by:
To drive the zero level set toward the object boundaries, the length term and area term are given as [34]:
where $g_{new}$ is an edge indicator function, which is defined as:
Finally, the total energy functional is represented as follows:
where $\lambda$, $\beta$ and $\nu$ are three parameters. As shown in [34], the value of the parameter $\nu$ is constant, so the model cannot adapt the sign or the magnitude of the area term. To overcome this problem, we propose the adaptive weight coefficient:
where $\Delta$ and $\nabla$ are the Laplacian operator and the gradient operator, $\mathrm{sign}(\cdot)$ is the sign function, and $k$ is a control constant. Therefore, the modified energy functional is:
The magnitude of $\nu_{new}$ depends on the gradient and the second derivative of the image, so it adjusts adaptively to the image information. In smooth regions, the value of $\nu_{new}$ is small, and the level set evolves rapidly to avoid being trapped by false edges. When the active contour approaches the target boundary, $\nu_{new}$ is larger, and the contour evolves smoothly to avoid boundary leakage.
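For orientation, the following sketch computes the classical edge indicator function used in edge-based level set models, $g = 1/(1 + |\nabla I|^2)$. It is not the modified function $g_{new}$ of this paper; the function name, the central-difference gradient, and the omission of Gaussian pre-smoothing are assumptions made for illustration.

```python
def edge_indicator(image):
    """Classical edge-stopping function g = 1 / (1 + |grad I|^2) per pixel."""
    rows, cols = len(image), len(image[0])
    g = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Central differences with clamped borders.
            gx = (image[r][min(c + 1, cols - 1)] - image[r][max(c - 1, 0)]) / 2.0
            gy = (image[min(r + 1, rows - 1)][c] - image[max(r - 1, 0)][c]) / 2.0
            g[r][c] = 1.0 / (1.0 + gx * gx + gy * gy)
    return g


# Hypothetical image with a vertical step edge between columns 1 and 2.
step = [[0.0, 0.0, 10.0, 10.0] for _ in range(3)]
g = edge_indicator(step)
```

The indicator equals 1 in flat areas and drops toward 0 at strong edges, so terms weighted by it slow the contour down near object boundaries.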
Subsequently, keeping all other variables fixed and minimizing the total energy functional (21) with respect to $\phi$, the gradient descent evolution process can be defined as follows:

A. SYNTHETIC IMAGES SEGMENTATION
In this subsection, we test the performance of our method on two synthetic images with severe intensity inhomogeneity, as shown in Figs. 1-2. For these images, the gray values of the background vary gradually, and the red initial contours are plotted on the original images at different spatial locations and with different sizes. The corresponding segmentation results obtained by C-V, CVXB, FACM, FRAGL, RSF and the proposed model are given in the second to last columns. The results show that C-V, CVXB, and FRAGL integrate only the global region information and fail to handle objects with intensity inhomogeneity. FACM and RSF use local image information to fit the measured image, and thus they can extract the object boundaries in some cases. However, these two methods are prone to falling into local minima, resulting in incorrect segmentation. In contrast, our method obtains better segmentation results for all the initial contours.
In the next experiment, we apply the same six methods to synthetic images corrupted by severe Gaussian noise; the results are shown in Fig. 3. The two noisy images are obtained by adding Gaussian noise (zero mean, variance 0.1) to two clean images. It can be seen that, except for the C-V model, all methods accurately segment all the object boundaries. This shows that the proposed method adapts well to noise.
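The noise setting above can be reproduced along the following lines. This is a sketch under the assumption that image intensities are normalized to [0, 1], which is the usual convention when a variance of 0.1 is quoted; the function name, the fixed seed, and the clipping step are choices made for this example.

```python
import math
import random


def add_gaussian_noise(image, variance=0.1, seed=0):
    """Add zero-mean Gaussian noise to an image with intensities in [0, 1]."""
    rng = random.Random(seed)
    sigma = math.sqrt(variance)
    # Clip to [0, 1] so the result remains a valid intensity image.
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
            for row in image]


# Hypothetical clean image with a constant gray level.
clean = [[0.2] * 8 for _ in range(8)]
noisy = add_gaussian_noise(clean, variance=0.1)
```

Fixing the seed makes the corrupted image reproducible, so that all compared methods see exactly the same input.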
Next, we demonstrate the performance of the proposed model for texture segmentation. The synthetic images contain four kinds of texture features whose boundaries interact randomly. For each image, the initial rectangular contour is set manually by the user, and the corresponding segmentation results are shown in Fig. 4. The texture complexity of the first two images is low, so CVXB, FACM and our method can successfully segment the target. However, for the last two images, as the texture complexity increases, traditional active contour models (such as C-V, CVXB, FRAGL and RSF), which are based only on global or local image information, fail to extract objects with inhomogeneous textures.

223476 VOLUME 8, 2020
FIGURE 6. Comparison of our method with some classic algorithms using natural images. First column: original images with red initial contours. Second to last columns: C-V, CVXB, FACM, FRAGL, RSF and the proposed model, respectively.

TABLE 1. Comparison of iterations and computational time (s) for the images in Fig. 6, in the same order.
In contrast, owing to the use of the local spectral histogram and matrix factorization theory, FACM and the proposed model can eliminate the interference of textures, so they give better and more complete segmentation of the object of interest. For a quantitative comparison, the Jaccard similarity (JS) index [38], [39] is employed to evaluate the results in Fig. 4. If $S_1$ and $S_2$ denote the segmentation result obtained by a model and the ground truth, respectively, then the metric is defined as:
$$JS(S_1, S_2) = \frac{N(S_1 \cap S_2)}{N(S_1 \cup S_2)}$$
where $N(\cdot)$ is the number of pixels in the enclosed region. The value of JS ranges from 0 to 1, with a higher value representing a better segmentation result. Furthermore, bar plots of the segmentation accuracy of the six methods in terms of JS are shown in Fig. 5; they show that both FACM and the proposed approach achieve high accuracy, with our method slightly higher than FACM.
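The JS index is straightforward to compute from two binary masks. The sketch below is a minimal plain-Python version; the function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions of this example.

```python
def jaccard_similarity(segmented, ground_truth):
    """JS = N(S1 intersect S2) / N(S1 union S2) for two binary masks of equal size."""
    intersection = union = 0
    for row_s, row_g in zip(segmented, ground_truth):
        for s, g in zip(row_s, row_g):
            intersection += 1 if (s and g) else 0
            union += 1 if (s or g) else 0
    # Two empty masks are treated as a perfect match by convention here.
    return intersection / union if union else 1.0


# Hypothetical masks: 2 shared pixels out of 4 pixels covered by either mask.
s1 = [[1, 1, 0], [1, 0, 0]]
s2 = [[1, 0, 0], [1, 1, 0]]
js = jaccard_similarity(s1, s2)
```

For these two masks the index is 0.5, and comparing a mask with itself gives 1.0, the upper end of the range.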

B. NATURAL IMAGES SEGMENTATION
To demonstrate the effectiveness of our method for segmenting complicated natural images, a comparative experiment is conducted on five real images from the Berkeley database [40], and the segmentation results are shown in Fig. 6. The first image is a butterfly with a textured object and a complex background, the second is a parterre in the desert with a texture structure, the third is a flower with many petals, and the last two images are both of stones, with a single and complex background. For the butterfly image, CVXB and our method accurately segment the contour of the object, whereas C-V, FACM, FRAGL, and RSF produce incorrect segmentation results. For the parterre image, C-V, FRAGL and our method produce satisfactory segmentation results, while the other models fail to segment the object. For the other images, our method also achieves good segmentation results, while the other models suffer from considerable interference. Furthermore, we use the JS index to evaluate the performance of the algorithms, and the corresponding JS values are shown in Fig. 7. The JS values show that, except for the first image, the proposed model has the highest JS value among these methods. This analysis shows that our model has better segmentation performance than these traditional active contour models. In addition, the iterations and segmentation times are listed in Table 1. The proposed method converges in less time than the other methods in most instances, because the CVXB, FRAGL and RSF models need more iterations and therefore often take more time. Although the C-V and FACM models converge faster than the proposed method, they do not produce ideal segmentation results for these natural images. This experiment demonstrates that our method can obtain more accurate results in less time.
In the second experiment, we compare our results on three complex animal texture images. As shown in Fig. 8, the contrast between the animals of interest and the backgrounds is very low, and the edges of the objects are extremely indistinct. One can clearly see that C-V, FRAGL and RSF fail to recognize the object edges. FACM and FRAGL can sketch out the general outline of the targets, but some errors remain, with both false detections and missed detections. The segmentation results of our method, shown in the last column, are more satisfactory and clearly superior to those of the other models. This is because our method not only uses local image information, but also combines the advantages of non-local means filtering and the local spectral histogram.

C. WINDOW SIZE W
Through the above experiments, we find that the local spectral histogram has an important influence on the segmentation results. The local spectral histogram has several parameters, such as the filter scales and the window size. Among them, the window size $W$ is an important parameter, since it determines how much local information is used. For this reason, experimental results with different values of the window size are shown in Fig. 9. The window size should be neither too large nor too small; otherwise, the segmentation result is wrong. Fig. 10 illustrates the relationship between the window size $W$ and the JS values for a natural image. It can be seen that the most accurate segmentation result is obtained when $W = 14$.

V. CONCLUSION
In this study, we have presented a novel method based on matrix factorization with edge preservation for image segmentation. By introducing local spectral histograms to construct the region information, we obtain more accurate localization of region boundaries. In addition, we integrate edge preservation into the proposed energy functional, which makes the segmentation results robust to noise, intensity inhomogeneity and texture. Experimental results on synthetic and natural images demonstrate that our approach achieves satisfactory results and is more robust against complex backgrounds than five other representative methods.