Regression Based Clustering by Deep Adversarial Learning

Despite their success, existing regression-based clustering methods built on shallow models suffer from three weaknesses: (1) they often pay no attention to the interplay between representation learning and clustering, resulting in unsatisfactory clustering performance; (2) they ignore the relationship between the data distribution and the target distribution, which makes them sensitive to noise and illumination changes; (3) these nonlinear regression methods usually impose a hard constraint to minimize the mismatch between the discrete cluster assignment matrix and the latent representations, which leads to over-fitting. In this paper, we utilize deep adversarial regression to tackle these problems and formulate regression based clustering by deep adversarial learning (RCDA). By seamlessly combining with a stacked autoencoder, the proposed model integrates the learning of deep nonlinear latent representations and clustering into a unified framework. Specifically, RCDA uses a relaxed constraint between the latent representations and a continuous cluster assignment matrix to avoid over-fitting, and simultaneously employs a t-SNE-style kernel and adversarial learning to analyze the data distribution and the target distribution, thereby improving representation learning. Experimental results on public benchmark datasets demonstrate that the proposed architecture achieves better performance than state-of-the-art clustering models on image clustering tasks.


I. INTRODUCTION
Clustering, the exploratory analysis of data with little or no prior knowledge, is one of the most indispensable and fundamental research topics in artificial intelligence, with applications in many fields such as image retrieval, image annotation, document analysis and image segmentation. In the past few decades, many classic clustering algorithms have been proposed, including spectral clustering (SC) [1], [2], subspace clustering [3], [4], graph-based clustering [5] and so on. Despite extensive study, the performance of traditional clustering methods deteriorates on high-dimensional data due to unreliable similarity metrics, a phenomenon known as the curse of dimensionality, when working with large-scale real-world image datasets.
To deal with the curse of dimensionality, a common approach is to transform data from a high-dimensional space to a lower-dimensional feature space by applying hand-crafted feature extraction or dimension-reduction techniques such as principal component analysis (PCA), the scale-invariant feature transform (SIFT) and histograms of oriented gradients (HOG). Clustering can then be performed in the lower-dimensional feature space. However, these hand-crafted features ignore the interconnection between feature learning and clustering. To address this issue, Torre and Kanade [6] proposed a shallow model that performs clustering and feature learning simultaneously by integrating K-Means and linear discriminant analysis (LDA) into a joint framework. Nevertheless, the representation ability of features learned by such shallow models is limited.
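As a concrete illustration of the dimension-reduction step described above, a PCA-style projection can be sketched in a few lines of NumPy. This is a generic re-implementation for illustration only, not code from any method in this paper:

```python
import numpy as np

# Sketch of classic PCA dimension reduction: project the centered data
# onto the top-k right singular vectors of the data matrix.

def pca_reduce(X, k):
    Xc = X - X.mean(axis=0, keepdims=True)     # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                       # N x k low-dimensional features

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                  # 50 samples, 10 features
Z = pca_reduce(X, k=2)
print(Z.shape)                                 # (50, 2)
```

Clustering would then be run on `Z` rather than `X`, which is precisely the two-stage pipeline the paper argues against.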
To address the above challenges, deep clustering models have recently emerged, applying deep neural networks to clustering tasks. For instance, Tian et al. [7] utilize deep neural networks (DNNs) to transform the features in a first phase and then perform clustering. Xie et al. [8] propose deep embedded clustering (DEC), in which feature mapping and cluster assignments are jointly learned with a DNN. Guo et al. [9] present improved deep embedded clustering (IDEC), which builds on DEC while preserving local structure. Dizaji et al. [10] base their deep clustering model, termed DEPICT, on a multi-layer convolutional autoencoder, employing a regularized relative entropy loss function for clustering.
Traditional clustering methods operate in unsupervised settings. Regression, a classic machine learning technique, has been applied to many supervised learning tasks, e.g., object classification and face recognition [11]. When the label information of instances is unavailable, regression learning becomes unsupervised. However, few studies utilize the properties of regression to tackle clustering tasks. In practice, high-dimensional data often exhibit dense grouping in a low-dimensional subspace, and the true cluster indicator matrix can always be embedded into a low-dimensional mapping of the data [12]. Hence, regression can help guide the partitioning process by modeling the dissimilarity of each cluster in the low-dimensional subspace. To take advantage of this property, [13] developed a local and global discriminative framework for balanced clustering that minimizes a distribution entropy together with a least-squares regression between the cluster indicator matrix and the low-dimensional features. The cluster indicator matrix consists of discrete binary values, whereas the low-dimensional features are continuous; this hard constraint forces continuous values to approximate discrete ones, which may make the overall model hard to optimize. To embed the discriminative information of the cluster indicator matrix of spectral clustering and thereby boost clustering performance, [14] proposed to control the regression constraint between the cluster indicator matrix and the latent features of the data by relaxing the cluster indicator matrix while keeping its orthogonality intact. The same problem also exists in [15], in which a robust regression-based clustering method was presented for cancer genome data. However, a vital constraint is usually ignored by the above methods: all the elements of the cluster indicator matrix should be nonnegative by definition.
Although the above regression-based clustering methods provide impressive results, they still have several limitations: 1) overlooking the relationship between data distributions makes the models sensitive to noise and illumination changes; 2) being based on shallow, linear objective functions, they are unable to capture the non-linear structure of data; 3) their strict constraints lead to over-fitting; 4) they separate latent representation learning from clustering.
To handle the problems mentioned above, motivated by DEC [8], stacked autoencoders [16] and adversarial learning [17], we propose a novel deep adversarial regression clustering model (RCDA) that learns an effective parameterized non-linear mapping from the data space X to a lower-dimensional feature space F, combining the advantages of regression clustering methods, deep embedding models and adversarial learning. RCDA consists of two training procedures: pre-training of the autoencoder and training of the deep adversarial regression model. The pre-trained autoencoder ensures that the output of the encoder is reliable. RCDA simultaneously solves for cluster assignments and the underlying feature representation by iteratively refining clusters with a regression clustering loss and an auxiliary target distribution derived from the current data distribution. The adversarial learning between the target distribution and the data distribution significantly improves the quality of the two distributions, and thereby the quality of the feature representation. Moreover, experimental results show that RCDA achieves superior results compared to state-of-the-art algorithms on image benchmark datasets. The main contributions of this paper are summarized as follows:
• We propose a novel deep adversarial regression clustering architecture, RCDA, to simultaneously learn feature transformation and cluster assignment. To the best of our knowledge, this is the first work that uses the properties of deep learning to help regression-based clustering.
• We derive a loss function to guide agglomerative clustering and deep representation learning which makes optimization over the two tasks seamless.
• We propose a method to make the learned data distribution and target distribution more effective, thereby achieving superior clustering results on high-dimensional and large-scale datasets.

II. RELATED WORKS
A. REGRESSION CLUSTERING
Regression-based clustering [14] is one of the most representative clustering methods. Its objective is

min_{W,b} ||XW + 1b^T − L||_F^2 + ξ||W||_F^2,    (1)

where ξ is the penalty coefficient, W and b are the regression parameters, and X and L are the raw data matrix and the cluster indicator matrix, respectively. Problem (1) leverages a hard constraint to make the continuous low-dimensional features approximate the discrete cluster indicator matrix L. However, discrete zero and one entries are too idealized, leading to suboptimal solutions. Some methods relax the cluster indicator matrix while keeping its orthogonality intact; under this circumstance, the relaxed solution may severely deviate from the true solution and thus degrade clustering performance, because all the elements of the cluster indicator matrix should be nonnegative by definition. Moreover, these methods run K-Means on the indicator matrix to obtain the final clustering in a last step; this post-processing increases the instability of the results due to the randomness of K-Means. Finally, these methods directly utilize hand-crafted features and dimension-reduction techniques, which neglect the distribution of the input data, and such shallow models cannot capture the non-linear structure of image data, so the algorithms are not robust enough. RCDA takes full advantage of a stacked autoencoder to transform the data with a non-linear mapping and integrates clustering and representation learning in a unified framework, which consistently produces semantically meaningful and well-separated representations on real-world datasets.
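As a minimal sketch of this kind of shallow objective, the ridge-style problem of Eq. (1) has a closed-form solution in W when the indicator matrix L is held fixed. The bias b is omitted for brevity, the shapes follow the paper's notation, and the value of ξ is an illustrative assumption:

```python
import numpy as np

# Sketch: closed-form solution of min_W ||X W - L||_F^2 + xi ||W||_F^2
# for a *fixed* cluster indicator matrix L (bias omitted). This is a
# generic normal-equations solve, not the authors' code.

def fit_regression_clustering(X, L, xi=1e-2):
    """Solve (X^T X + xi I) W = X^T L for W."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + xi * np.eye(d), X.T @ L)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))            # 6 samples, 4 features
L = np.eye(3)[[0, 0, 1, 1, 2, 2]]      # hard one-hot cluster indicators
W = fit_regression_clustering(X, L)
print(W.shape)                         # (4, 3): maps features to clusters
```

In practice the indicator L is unknown, which is exactly why the hard discrete constraint discussed above becomes problematic.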

B. DEEP CLUSTERING
Deep clustering is a family of clustering methods that has arisen in recent years. Inspired by the similarity between the eigendecomposition in spectral methods and stacked autoencoders [16] in learning lower-dimensional representations, Tian et al. [7] were the first to introduce DNNs to clustering tasks, combining a non-linear embedding of the original graph with the K-Means algorithm in the embedded feature space. Law [18] proposed a deep supervised clustering metric learning method to learn data representations, given the ground-truth partition. These methods first learn representations in a low-dimensional feature space and then run a clustering algorithm on the embedding, i.e., a two-stage procedure; RCDA instead integrates unsupervised learning of deep representations and clustering into one framework. Yang et al. [19] proposed a recurrent framework for joint unsupervised learning of deep representations and image clusters. Unlike these models, which ignore the distributions of the input data and the targets, RCDA utilizes the Student's t-distribution as a kernel to measure the distribution of the input data. Xie et al. [8] and Guo et al. [9] minimize the KL divergence between a soft assignment and a target distribution to simultaneously learn feature representations and cluster assignments in a deep neural network. Although these two methods consider the data distribution and the target distribution, they ignore the noise between the distributions. Differently, RCDA employs adversarial learning between the data and target distributions to suppress this noise, thus improving clustering performance.

III. REGRESSION BASED CLUSTERING BY DEEP ADVERSARIAL LEARNING
In this section, we first elaborate the representation learning model and the clustering module of RCDA, and then introduce the implementation details. Our model is made up of three sub-networks: a stacked fully-connected autoencoder (encoder E_n, decoder D_e) that learns latent representations, a deep embedding clustering layer that clusters samples, and a discriminator D that supervises the clustering. Figure 1 shows the framework of our model on an example X; the details are given below.
Notations: For ease of explanation, suppose we aim to cluster N instances {x_i ∈ X}_{i=1}^N into K clusters according to their feature attributes, where the label information of each instance is unknown. Meanwhile, we use μ_j (j = 1, 2, ..., K) to denote the centroid of each cluster.

A. REPRESENTATION LEARNING MODEL
To learn the latent representations F ∈ R^{N×K}, we introduce the encoder E_n and decoder D_e: R^{N×d} → R^{N×K} → R^{N×d}. The autoencoder consists of four fully connected layers and aims to learn a latent feature of the original input data X. We choose an autoencoder because autoencoders consistently produce semantically meaningful and well-separated representations on real-world datasets. Specifically, the encoder transforms the raw input data to a low-dimensional representation F via a non-linear mapping

F = E_n(X; θ),    (2)

where E_n is the non-linear encoding function and θ = {θ^(l)} collects the learnable parameters of the l-th layer of E_n. A decoder is then exploited to reconstruct the input data X from the low-dimensional representation; the output of the decoder is the reconstructed data X̂ = D_e(E_n(X; θ); ω), where ω = {ω^(m)} collects the learnable parameters of the m-th layer of D_e. To ensure that the latent features obtained by the encoder are effective, the network minimizes the least mean square loss L_AE between X and X̂ to update the learnable parameters of E_n and D_e:

L_AE = ||X − X̂||_F^2.    (3)

This loss trains the encoder E_n and decoder D_e. It encourages the encoder to capture the essential structure of the input data in the latent representation, and the latent representation to recover the real data exactly.
We minimize the reconstruction error between the output of the decoder and the input of the encoder to optimize the encoder and decoder networks via Eq. (3). To improve clustering performance and ensure that a sound initial target distribution is available, we pre-train the encoder and decoder.
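The reconstruction objective can be sketched on a toy problem. A one-layer linear encoder/decoder pair trained by plain gradient descent stands in for the paper's four-layer stacked network; the layer sizes, learning rate and iteration count are illustrative assumptions:

```python
import numpy as np

# Minimal sketch of minimizing L_AE = ||X - X_hat||_F^2, with a linear
# encoder We (d -> K) and decoder Wd (K -> d) standing in for the deep
# stacked autoencoder.

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8))            # 32 samples, 8 features
We = 0.1 * rng.normal(size=(8, 3))      # encoder weights
Wd = 0.1 * rng.normal(size=(3, 8))      # decoder weights

def ae_loss(X, We, Wd):
    return np.sum((X - X @ We @ Wd) ** 2)

lr, losses = 5e-4, []
for _ in range(300):                    # gradient descent on L_AE
    R = X @ We @ Wd - X                 # residual X_hat - X
    gWd = 2 * (X @ We).T @ R            # dL/dWd
    gWe = 2 * X.T @ R @ Wd.T            # dL/dWe
    We -= lr * gWe
    Wd -= lr * gWd
    losses.append(ae_loss(X, We, Wd))
print(losses[0] > losses[-1])           # reconstruction error decreases
```

The real model replaces these linear maps with stacked non-linear layers and trains them with SGD, but the loss being driven down is the same.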

B. CLUSTERING MODELS
To perform clustering, we map the output of the encoder, i.e. F, to the corresponding clusters by a t-SNE-like algorithm. Specifically, given an initial estimate of the non-linear mapping, we obtain a latent representation f_i for each data point. Unlike t-SNE [20], we employ the Student's t-distribution as a kernel to measure the similarity between the representation f_i of data point x_i and the cluster centroid μ_j, instead of measuring the similarity between data points x_i and x_j. Hence, we calculate the soft cluster assignment by

q_ij = (1 + ||f_i − μ_j||^2 / α)^{−(α+1)/2} / Σ_{j'} (1 + ||f_i − μ_{j'}||^2 / α)^{−(α+1)/2},    (4)

where q_ij is the probability of assigning sample i to cluster j, and α is the degree of freedom of the Student's t-distribution.
The K centroids {μ_j}_{j=1}^K are defined as trainable parameters, and the initial values of μ are obtained by running K-Means on the latent representations F. We call the matrix Q the actual (data) distribution. To simultaneously relax the discrete values of L in Eq. (1) and supervise the quality of clustering, thereby improving clustering performance, we introduce a target distribution P as the cluster indicator matrix. The regression-based clustering objective is then defined by

L_C = λ_21 ||F − P||_F^2 + λ_22 ||P − Q||_F^2,    (5)

where λ_21 and λ_22 are two tradeoff parameters. The first term is the regression-based clustering objective, and the second term supervises the clustering. Our aim is to match the soft assignment Q to the target distribution P. In this way, we can sharpen the data distribution and concentrate samples of the same class, obtaining a more effective latent representation for the clustering task. We want the target distribution to have the following properties: 1) it further emphasizes nodes assigned with high confidence, 2) it strengthens predictions, and 3) it prevents large clusters from distorting the latent representations. Hence, we compute the target distribution p_ij by first squaring q_ij and then normalizing by the frequency per cluster:

p_ij = (q_ij^2 / t_j) / Σ_{j'} (q_ij'^2 / t_j'),    (6)

where t_j = Σ_i q_ij is the soft cluster frequency of cluster j, which normalizes the loss contribution so that larger clusters do not distort the hidden space. We raise q_ij to the second power (squared closeness) because it simultaneously suppresses the responses from dissimilar points and enhances the responses from similar points, making the result more robust and sparser, as shown in Fig. 2.
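The soft assignment and target distribution above are easy to sketch directly. The random F and μ below are stand-ins for the learned representations and centroids; α = 1 is the common DEC choice:

```python
import numpy as np

# Sketch of the Student's t soft assignment Q (Eq. (4)) and the sharpened
# target distribution P (Eq. (6)).

def soft_assignment(F, mu, alpha=1.0):
    d2 = ((F[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # ||f_i - mu_j||^2
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)                # rows sum to 1

def target_distribution(Q):
    w = Q ** 2 / Q.sum(axis=0, keepdims=True)              # q_ij^2 / t_j
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
F = rng.normal(size=(10, 3))      # 10 latent points
mu = rng.normal(size=(4, 3))      # 4 cluster centroids
Q = soft_assignment(F, mu)
P = target_distribution(Q)
print(np.allclose(Q.sum(1), 1), np.allclose(P.sum(1), 1))
```

Both Q and P are valid row-stochastic matrices, so they can be compared by the Frobenius term in Eq. (5) or fed to a discriminator.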

C. ADVERSARIAL MODELS
Although the error between the distributions P and Q can be measured by ||P − Q||_F^2 in Eq. (5), this does not ensure that the differences in salient features are small. Accordingly, we utilize adversarial learning to tackle this problem: we introduce adversarial learning between P and Q to further minimize their mismatch. In the adversarial learning phase, we take the autoencoder as a generator and combine it with a discriminator to form a GAN-like module in RCDA. The discriminator, a three-layer fully connected network, aims to distinguish the target distribution P from the actual distribution Q. We want D to recognize Q as the actual distribution of the input data points and P as the real target distribution. The autoencoder minimizes the likelihood that the data distribution Q is assigned to the fake source, while the discriminator maximizes this likelihood, so the objective of adversarial learning is

L_adv = min_{E_n} max_D λ_11 E_{p∼P}[log D(p)] + λ_12 E_{q∼Q}[log(1 − D(q))],    (7)

where λ_11 and λ_12 are two tradeoff parameters.
The encoder is trained to generate a data distribution Q similar to the target distribution P, while the discriminator is trained to distinguish Q from the real target distribution. They play a min-max game until convergence. The adversarial loss assists the encoder in mapping a given sample X to a desired output F. Thus, the combination of the adversarial loss and the clustering loss further ensures that the encoder maps the input data points X to desired latent representations, thereby boosting clustering performance.
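The min-max game can be sketched by writing out the two sides of the objective. A fixed logistic scorer stands in for the trained three-layer discriminator, and the weights, batch size and λ values are illustrative assumptions:

```python
import numpy as np

# Sketch of the adversarial objective: the discriminator maximizes
# lam11*log D(p) + lam12*log(1 - D(q)); the encoder (generator)
# minimizes lam12*log(1 - D(q)).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_loss(D_p, D_q, lam11=0.5, lam12=0.5):
    # maximize the likelihood -> minimize its negative
    return -(lam11 * np.log(D_p) + lam12 * np.log(1.0 - D_q)).mean()

def generator_loss(D_q, lam12=0.5):
    # encoder tries to make Q indistinguishable from the target P
    return (lam12 * np.log(1.0 - D_q)).mean()

rng = np.random.default_rng(0)
w = rng.normal(size=4)                         # stand-in discriminator weights
P_rows = rng.dirichlet(np.ones(4), size=8)     # "real" target-distribution rows
Q_rows = rng.dirichlet(np.ones(4), size=8)     # "fake" data-distribution rows
D_p, D_q = sigmoid(P_rows @ w), sigmoid(Q_rows @ w)
print(discriminator_loss(D_p, D_q) > 0)        # finite cross-entropy-style loss
```

In RCDA the rows of P and Q come from Eqs. (4) and (6), and the two losses are alternately back-propagated through the discriminator and the encoder.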

D. IMPLEMENTATION DETAILS
In this section, we present the detailed implementation of the unsupervised regression-based clustering model. The overall objective function of our model contains three terms covering the autoencoder loss, the adversarial loss, and the clustering loss, each linked to one sub-network of our model. The overall objective of RCDA is given by

L = L_AE + L_adv + L_C.    (8)

1) STEP 1: PRE-TRAINING ENCODER E n AND DECODER D e
We utilize the original data to train the stacked encoder and decoder, because the unsupervised representations learned by a stacked autoencoder naturally facilitate the learning of clustering representations with RCDA. Similar to Vincent et al. [16], we initialize the SAE network layer by layer, with each layer being a denoising autoencoder trained to reconstruct the previous layer's output after random corruption. After greedy layer-wise training, we concatenate all encoder layers followed by all decoder layers, in reverse layer-wise training order, to form a deep autoencoder, and then fine-tune it to minimize the reconstruction loss, updating the parameters θ and ω of E_n and D_e. The final result is a multi-layer deep autoencoder with a bottleneck coding layer in the middle.
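The "random corruption" in denoising pre-training can be sketched as masking noise: each layer is trained to reconstruct its clean input from a randomly zeroed copy. The 20% drop rate below is an illustrative assumption, not a value reported in the paper:

```python
import numpy as np

# Sketch of masking-noise corruption for greedy layer-wise denoising
# pre-training: the denoising autoencoder maps X_noisy back to the
# clean X.

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 8))             # clean input to the current layer
mask = rng.random(X.shape) > 0.2         # keep ~80% of the entries
X_noisy = X * mask                       # zero out the dropped entries
# training target: reconstruct X from X_noisy
print(X_noisy.shape)                     # (16, 8)
```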

2) STEP 2: TRAINING ENCODER E n , CLUSTERING LAYER AND DISCRIMINATOR D ON ALL DATA
We discard the decoder layers and use the encoder layers as our initial mapping between the data space and the feature space. We then pass the data through the initialized encoder E_n to get the latent representation points f_i, and perform standard K-Means clustering in the feature space F to obtain K initial centroids {μ_j}_{j=1}^K. According to the centroids μ_j and the representation points f_i, we next utilize Eq. (4) to calculate the data distribution Q, where the centroid of the j-th cluster is updated from Q, and then easily calculate the target distribution P via Eq. (6). Finally, we feed the distributions P and Q into the discriminator network for adversarial learning. We minimize the total loss function in Eq. (8) to alternately optimize the parameters of the discriminator network, the centroids μ_j, and the autoencoder network via back-propagation until the objective function converges. Algorithm 1 gives a brief description of the RCDA model.

Algorithm 1: Regression Based Clustering by Deep Adversarial Learning
Input: Data X ∈ R^{N×d}; number of clusters K; parameters λ_11, λ_12, λ_21, λ_22.
Output: Cluster label c_i of each x_i ∈ X.
1 Randomly initialize the parameters of E_n and D_e;
2 for not converged do
3   // Step 1: pre-train the autoencoder
4   Update E_n and D_e by Eq. (3);
5 end
6 Use the pre-trained E_n and D_e to project the raw samples and obtain F;
7 Run K-Means on the feature space F to obtain the initial cluster centroids {μ_j}_{j=1}^K;
8 Calculate the initial Q and P via Eqs. (4) and (6);
9 Feed Q and P to the discriminator network;
10 for not converged do
11  // Step 2: jointly train the overall network
12  Alternately update the autoencoder and the discriminator by Eq. (8);
13 end
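The K-Means initialization of the centroids (line 7 of the algorithm) can be sketched with a few Lloyd iterations on the latent features. This is a plain re-implementation for illustration, with random data standing in for F:

```python
import numpy as np

# Sketch of the K-Means step used to initialize the centroids mu on the
# latent features F (Lloyd's algorithm).

def kmeans_init(F, K, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    mu = F[rng.choice(len(F), size=K, replace=False)]        # random init
    for _ in range(iters):
        d2 = ((F[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)                           # assignment step
        for j in range(K):                                   # update step
            if np.any(labels == j):
                mu[j] = F[labels == j].mean(axis=0)
    return mu, labels

rng = np.random.default_rng(1)
# three well-separated 2-D blobs as a stand-in latent space
F = np.vstack([rng.normal(loc=c, size=(20, 2)) for c in (-4.0, 0.0, 4.0)])
mu, labels = kmeans_init(F, K=3)
print(mu.shape)   # (3, 2)
```

In RCDA these centroids then become trainable parameters refined jointly with the encoder, as described above.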

IV. EXPERIMENTS
In this section, we apply the proposed RCDA model to image clustering and evaluate the performance on four popular datasets (MNIST, CIFAR10, CIFAR100 and STL-10) with three frequently-used measures (Accuracy, Normalized Mutual Information and Adjusted Rand Index).

A. DATASETS AND EXPERIMENTAL SETTINGS
Four widely-used clustering benchmark datasets i.e. MNIST [21], CIFAR-10 [22], CIFAR-100 [22] and STL-10 [23] are used to verify the effectiveness of the proposed method. Statistics of four datasets are shown in Table 1.  • MNIST The MNIST dataset consists of 70000 handwritten digits of 28 × 28 pixel size. The digits are centered and size-normalized. We transform the image to a vector (dimension is 784 = 28 × 28) as input to all algorithms.
• CIFAR-10 The CIFAR-10 dataset consists of 60000 32× 32 colour images in 10 classes, with 6000 images per class. We transform the color image to a vector (dimension is 3072 = 32 × 32 × 3) as input to all algorithms.
• CIFAR-100 This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. 20 superclasses are considered in our experiments. We also transform the color image to a 3072-dimensional vector.
• STL-10 The dataset consists of 96 × 96 color images. There are 10 classes with 1300 examples each. It also contains 100000 unlabeled images of the same resolution. We used the unlabeled set when training our autoencoder.
For a fair comparison, the training and testing samples of each dataset are jointly utilized in our experiments for all algorithms, and we set the number of clusters to the number of ground-truth categories. Similar to DEC [8], on the STL-10 dataset we concatenate HOG features and an 8 × 8 color map as input to all algorithms; for the remaining datasets and methods, the pixel intensities serve as inputs.
All the hyper-parameters of our approach and their values are listed in Table 2. We use TensorFlow to implement our approach. Stochastic gradient descent (SGD) with momentum is adopted when minimizing the autoencoder loss in Eq. (3); Adam is adopted when optimizing the clustering loss in Eq. (5) and the adversarial loss in Eq. (7). The stacked autoencoder described in [8] is utilized in our model. For the discriminator network D, we use three fully-connected layers with dimensions K → 2000 → 2000 → 1, where the last layer has a single neuron that discriminates whether the input distribution is real or fake.

B. EVALUATION METRICS
In our experiments, we utilize three popular measures in the literature to evaluate the performance of clustering methods, accuracy (ACC), normalized mutual information (NMI) and adjusted rand index (ARI).
• ACC Accuracy measures the best mapping between cluster assignments and true labels, defined by ACC = max_m Σ_{i=1}^n σ(l_i, m(c_i)) / n, where l_i is the true label of sample i, c_i is the cluster assignment produced by the algorithm, m(·) ranges over all possible one-to-one mappings between clusters and labels, and n is the number of samples. σ(l_i, m(c_i)) = 1 when m(c_i) = l_i, and 0 otherwise.
• NMI Normalized mutual information is a normalized measure of the similarity between two labelings of the same data, defined by NMI(l, c) = I(l; c) / max{H(l), H(c)}, where I is the mutual information metric and H is the entropy.
• ARI The adjusted Rand index is defined by ARI = [Σ_{ij} C^2_{n_ij} − (Σ_i C^2_{a_i} Σ_j C^2_{b_j}) / C^2_n] / [(Σ_i C^2_{a_i} + Σ_j C^2_{b_j}) / 2 − (Σ_i C^2_{a_i} Σ_j C^2_{b_j}) / C^2_n], where n_ij is the number of samples in both true class i and cluster j, a_i and b_j are the row and column sums of the contingency table, and the combination operation C^m_n is defined as a selection of m items from a collection of n.
All above measures range in [0, 1], and higher scores imply better clustering performance.
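The three metrics can be sketched as plain re-implementations. The brute-force label matching in ACC is only viable for the small K used here (the Hungarian algorithm is the standard choice in practice), and the NMI normalization by the maximum entropy is one of several common conventions:

```python
import numpy as np
from itertools import permutations
from math import comb

def acc(l, c):
    """Clustering accuracy: best one-to-one mapping of cluster ids to labels."""
    ids = np.unique(c)
    return max(np.mean([dict(zip(ids, perm))[ci] == li
                        for ci, li in zip(c, l)])
               for perm in permutations(np.unique(l), len(ids)))

def nmi(l, c):
    """Mutual information normalized by the maximum entropy."""
    def H(x):
        p = np.bincount(x) / len(x)
        p = p[p > 0]
        return -(p * np.log(p)).sum()
    I = 0.0
    for a in np.unique(l):
        for b in np.unique(c):
            p_ab = np.mean((l == a) & (c == b))
            if p_ab > 0:
                I += p_ab * np.log(p_ab / (np.mean(l == a) * np.mean(c == b)))
    return I / max(H(l), H(c))

def ari(l, c):
    """Adjusted Rand index from the contingency table."""
    table = np.array([[np.sum((l == a) & (c == b)) for b in np.unique(c)]
                      for a in np.unique(l)])
    s_ij = sum(comb(int(x), 2) for x in table.ravel())
    s_a = sum(comb(int(x), 2) for x in table.sum(axis=1))
    s_b = sum(comb(int(x), 2) for x in table.sum(axis=0))
    exp = s_a * s_b / comb(len(l), 2)
    return (s_ij - exp) / (0.5 * (s_a + s_b) - exp)

l = np.array([0, 0, 1, 1, 2, 2])
c = np.array([1, 1, 0, 0, 2, 2])   # cluster ids permuted, clustering perfect
print(acc(l, c), round(nmi(l, c), 3), ari(l, c))   # 1.0 1.0 1.0
```

A permuted but otherwise perfect clustering scores 1.0 on all three measures, which is exactly the invariance to label naming that these metrics are designed to provide.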

D. EXPERIMENT RESULTS AND ANALYSIS
1) IMAGE CLUSTERING
(1) For the image clustering task, our model achieves the best results on all datasets except CIFAR-10. Specifically, the ACC on the MNIST dataset increases by 6.67% compared with the strongest competitor, DEC [8]. On the CIFAR-10 and CIFAR-100 datasets, the advantage in ARI is not very obvious; however, the proposed method improves much more over DEC, which demonstrates that it can effectively learn the latent representation hidden in visual features. Furthermore, deep representation approaches (such as DeCNN [30] and SWWAE [31]) dramatically outperform the traditional methods (such as K-Means [24] and AC [25]), from which we conclude that representation learning is significant for image clustering. Additionally, the proposed RCDA is better than some deep subspace clustering models, e.g., DPSC [33] and DSC [32], which demonstrates that RCDA learns a better latent space. Fig. 3 shows part of the clustering results on the MNIST dataset.

TABLE 3. The clustering results of various methods on four datasets. The best results are highlighted in bold. ⊗ means the results are unavailable from the corresponding paper or code. The data marked in the upper right corner are obtained by running the code provided by the authors.
(2) For large-scale image datasets such as CIFAR-100 and STL-10, the proposed method shows a more distinct superiority over the other methods. Hence, RCDA is able to handle complex and massive image clustering tasks.

2) IMPACT OF THE NUMBER OF CLUSTERS
As the number of clusters increases, the performance of each method generally degrades, because more uncertainty is introduced as the number of clusters changes. The results demonstrate that RCDA possesses adequate capability to handle various numbers of clusters.

3) SENSITIVITY ANALYSIS
We then test the sensitivity of our method w.r.t. the parameter λ_11 of the adversarial term and the parameter λ_21 of the regression term. We first analyze the sensitivity of λ_11 over the range [0, 1.0]. The ACC, NMI and ARI values on the MNIST dataset for different λ_11 ∈ [0, 1.0] are shown in Fig. 5(a), from which we observe that our method performs stably over a wide range of λ_11. In this experiment, the parameter λ_21 is held constant at 10^{-4}. Next, we test the sensitivity of λ_21, setting λ_11 = 0.5. Because the regression term ||F − P||_F^2 in Eq. (5) takes large values, we set λ_21 ∈ [10^{-11}, 10^{-1}] to keep the clustering loss in Eq. (5) balanced. As shown in Fig. 5(b), our method achieves stable performance over a wide range of λ_21. When λ_21 is larger than 10^{-3}, the clustering loss cannot maintain the balance between the two terms in Eq. (5), which leads to poor clustering results. We recommend setting the adversarial parameter λ_11 to 0.5 and the regression ratio λ_21 to 10^{-4} by default.

4) CONVERGENCE ANALYSIS
Fig. 6(a) shows the convergence curve of the objective value when pre-training the autoencoder, i.e., Eq. (3), on the MNIST dataset, and Fig. 6(b) shows the convergence curve of the total objective of the proposed RCDA, i.e., Eq. (8). As can be seen, the proposed method converges well in both the pre-training stage and the clustering stage. In the clustering stage in particular, it converges very quickly, which keeps the running time of RCDA low.

5) VISUALIZATION
For convenience, we randomly choose 5,000 samples from the MNIST dataset and provide a t-SNE visualization for our proposed RCDA. Fig. 7(a) applies t-SNE to the raw samples, and Fig. 7(b) applies t-SNE to the latent representation learned by RCDA, i.e., the representation F obtained via Eq. (2). As can be seen, our approach exhibits a clearer and more compact cluster structure than the raw samples. This nice cluster-structured property is attributed to the adversarial regression learning of our proposed RCDA.

E. DISCUSSION OF ADVERSARIAL REGRESSION
To verify that the adversarial learning between the data distribution Q and the target distribution P improves clustering by improving the latent representation, we discard the discriminator network D from the objective in Eq. (8), which then reduces to

L = L_AE + β_1 L_C,    (9)

where β_1 is the tradeoff parameter of the clustering loss. Under similar experimental settings on the four datasets, Table 4 shows the difference between the results with the discriminator network D and those without it.
In Table 4, we report the clustering results with and without the P-Q adversarial learning. The variant with P-Q adversarial learning outperforms the one without it on all three clustering quality measures. Hence, the P-Q adversarial learning in Eq. (8) is advantageous for the latent representation learning of our deep adversarial regression clustering model.

V. CONCLUSION
In this paper, we propose a novel clustering model called regression based clustering by deep adversarial learning (RCDA), which jointly learns a mapping from the data space to a lower-dimensional feature space and precisely predicts cluster assignments. In our method, we consider the relationship between the data distribution and the target distribution, and utilize adversarial learning to supervise clustering. To enhance the representation ability of the latent representations, we utilize a soft regression constraint as the clustering loss to update the learnable parameters of the autoencoder. Empirical results on four widely used datasets show that this new deep clustering model outperforms existing clustering methods.
FEI TANG received the master's degree in communication and information system from Shenzhen University, in 2007, and the Ph.D. degree from South China Agricultural University, in 2017. Since 2007, he has been working with the Shenzhen Institute of Information Technology. He presided over two projects of the Guangdong Natural Science Foundation and two projects of the Shenzhen Science and Technology Plan. He has published more than 20 academic articles. His research interests include intelligent computing, evolutionary algorithm, and image processing.