Adaptive Anchor-Based Partial Multiview Clustering

Partial multiview clustering, which aims to effectively merge multiple prespecified incomplete views to improve clustering performance, is both a research hotspot and a difficult problem in machine learning. Guo et al. proposed an anchor-graph-based partial multiview clustering method (APMC), which uses a Gaussian kernel function to compute the similarity matrix. The Gaussian kernel function is sensitive to the parameter $\sigma$, and in practical applications it is difficult to find the optimal value by stepwise adjustment alone, which limits the practicality of the APMC algorithm. To address this issue, this paper develops an adaptive anchor-graph-based partial multiview clustering method (AAPMC), which proposes an adaptive neighbor assignment strategy and uses it to improve the anchor-based similarity matrix computation of each view. Because the similarity matrix is computed adaptively, tedious parameter adjustment is avoided. In addition, the anchor-based, non-iterative solution has low time complexity and is suitable for large-scale datasets. In short, our method is simple, effective, and easy to implement in practice. Extensive experiments show that our model not only effectively solves the parameter setting problem, but also performs better than state-of-the-art partial multiview clustering methods.


I. INTRODUCTION
With the development of information technology, multiview data is becoming increasingly common [1], [2]. For example, a patient's physical condition can be monitored simultaneously by multiple sensors in a home care system, and an image can be represented by visual features or by text annotations. In general, different views provide complementary information about the data, which allows multiview learning to outperform single-view methods [3]. As one of the most representative multiview learning methods, multiview clustering obtains better clustering results by exploring the consistency and complementarity of different views, and has been widely applied in data analysis, image classification, significance detection and information retrieval [4]-[6]. Many advanced methods have been proposed for multiview clustering. Zhou et al. [7] propose a novel incremental multi-view spectral clustering method (IMSC). Instead of ensembling all views simultaneously, IMSC integrates them one by one in an incremental way, which makes it scalable and applicable to streaming views. Wang et al. [8] propose Multi-view Clustering via Late Fusion Alignment Maximization (MVC-LFA), which maximally aligns the consensus partition with the weighted base partitions. This criterion significantly reduces the computational complexity and simplifies the optimization procedure. However, due to noise, failure of data-collecting equipment and many other unforeseen factors, data can be lost randomly in a single view or in multiple views, so partial-view data is widespread [9]. Traditional multiview clustering methods cannot process such data directly because they assume that all views are complete. In recent years, partial multiview clustering has therefore attracted extensive attention.
The associate editor coordinating the review of this manuscript and approving it for publication was Qichun Zhang.
Researchers have proposed a variety of methods for partial multiview clustering [10]-[20], which can be generally divided into two categories: matrix factorization-based methods and graph-based methods. Matrix factorization techniques directly learn a low-dimensional consistent representation of all views. For example, PVC [10] uses nonnegative matrix factorization (NMF) and $L_1$ regularization to learn a common latent subspace of all views, in which instances of different views are forced to have the same representation. MIC [11] first fills the missing samples with the average of all instances in the corresponding view, and then extends MultiNMF [12] through $L_{2,1}$-regularized NMF to learn the consistent representation of all views. DAIMC [13] further extends MIC with semi-nonnegative matrix factorization and $L_{2,1}$-regularized regression. However, these methods focus only on learning a consistent representation and ignore the internal structure of the data, so they cannot guarantee the distinguishability and compactness of the representation.
Graph-based methods aim to learn low-dimensional representations from different graphs, which reveal the relationships among all samples. Compared with matrix factorization-based methods, they can make effective use of the geometric structure of the data. Trivedi et al. [14] proposed to complete the view that lacks instances by referring to the Laplacian matrix of the complete view, and then to learn the low-dimensional representation of the different views through kernel CCA. The biggest disadvantage of this method is that it requires at least one complete view. IMG [15] integrates latent subspace generation and compact global structure into a unified framework through a Laplacian graph on the complete data instances, but this integration brings more parameters. DCNMF [16] develops a dual constraint framework by combining cluster similarity and manifold-preserving constraints. Gao et al. [17] proposed to fill the missing views with the average of the instances in the corresponding views before constructing the graph and learning the subspace. However, this method does not work when the multiview data has a large number of missing instances in all views. Liu et al. [18] unify imputation and clustering of partial multiview data into a single optimization process and propose the MKKM-IK algorithm, which performs well but has high computational and storage complexity. On the basis of MKKM-IK, LF-IMVC [19] proposes a late fusion method that simultaneously clusters and imputes the incomplete base clustering matrices. Though both MKKM-IK and LF-IMVC unify imputation and clustering into a single optimization, they differ in the manner of imputation: the former is early fusion (or kernel-level imputation), while the latter is a kind of late fusion (or decision-level imputation).
Inspired by anchor-based strategies [21], [22], Guo and Ye [20] proposed an anchor-based partial multiview clustering (APMC) method. APMC uses anchors to reconstruct clustering relationships between instances, and integrates intra-view and inter-view similarities through the anchors. This anchor-based approach effectively improves computational efficiency and is easily extended to more than two partial views.
However, APMC still has some limitations. First, the intra-view similarity computation involves two parameters: $k$, which controls how many nearest anchors are selected as the basis for representing a sample, and $\sigma$, which adjusts the local influence range of the Gaussian kernel function. In particular, the Gaussian kernel function used to compute the similarity matrix is very sensitive to $\sigma$, but APMC gives no effective criterion or method for setting it. This parameter problem makes APMC hard to operate in practical applications [23], [24]. Second, APMC ignores the credibility of the views themselves. When synthesizing the inter-view similarity, APMC treats all views equally, whereas the credibility of different views should differ because of factors such as missing samples or noise.
In this paper, to address these limitations of APMC, we introduce an adaptive strategy for computing the similarity matrix, which is effective and easy to operate for partial multiview clustering. On this basis, we present an Adaptive Anchor-based Partial Multiview Clustering (AAPMC) method. The experimental results verify its superiority.
The rest of this paper is organized as follows. Section II briefly introduces work related to the proposed method. Section III introduces AAPMC in detail. Section IV presents the experimental results. Section V concludes the paper.

II. RELATED WORK
In this section, we review relevant research of partial multiview clustering and anchor-based similarity reconstruction as our foundations for the subsequent discussion.

A. NOTATIONS AND PROBLEM DEFINITION
We summarize the notations used in this paper in Table 1. To solve the clustering problem with missing examples, partial multiview clustering methods always assume that instances share a common feature space within each individual view and that the different views are bridged by the shared common examples. To facilitate the discussion and without loss of generality, we take two views for illustration. Suppose there are $n$ samples $\{x_1, x_2, \ldots, x_n\}$ in total. We separate the original data of the two partial views as $\{X^{(1,2)}, X^{(1)}, X^{(2)}\}$. $X^{(1,2)} \in \mathbb{R}^{n_c \times (d_1 + d_2)}$ represents the common samples present in both views, and $n_c$ is the number of common samples. $X^{(1)} \in \mathbb{R}^{n_1 \times d_1}$ represents the samples that appear only in view-1, and $n_1$ is their number. $X^{(2)} \in \mathbb{R}^{n_2 \times d_2}$ denotes the samples that appear only in view-2, and $n_2$ is their number. Here $d_1$ and $d_2$ denote the feature dimensions of view-1 and view-2 respectively. It follows that $n = n_c + n_1 + n_2$.
As shown in Fig 1, the paired common part of the two views is $X^{(1,2)} = \{x_3, x_4, x_5\}$. The samples that appear only in view-1 are $X^{(1)} = \{x_1, x_2\}$, and the samples that appear only in view-2 are $X^{(2)} = \{x_6, x_7\}$. The purpose of partial multiview clustering is to group all of the above samples into $c$ clusters, where $c$ is given in advance by the user.
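The partition described above can be made concrete with per-view availability masks. The following is a minimal sketch, not part of the original method description: the boolean arrays `in_view1` and `in_view2` and the helper `partition_indices` are hypothetical names, and the masks simply mirror the seven-sample example of Fig 1 (with zero-based indices).

```python
import numpy as np

# Hypothetical availability masks for the seven samples x_1..x_7 of Fig 1.
# True means the sample is observed in that view.
in_view1 = np.array([1, 1, 1, 1, 1, 0, 0], dtype=bool)
in_view2 = np.array([0, 0, 1, 1, 1, 1, 1], dtype=bool)


def partition_indices(in_view1, in_view2):
    """Split sample indices into common, view-1-only and view-2-only groups."""
    if not np.all(in_view1 | in_view2):
        raise ValueError("each sample must be observed in at least one view")
    common = np.flatnonzero(in_view1 & in_view2)   # rows of X^(1,2)
    only1 = np.flatnonzero(in_view1 & ~in_view2)   # rows of X^(1)
    only2 = np.flatnonzero(~in_view1 & in_view2)   # rows of X^(2)
    return common, only1, only2


common, only1, only2 = partition_indices(in_view1, in_view2)
n_c, n_1, n_2 = len(common), len(only1), len(only2)
```

With these masks the three groups have sizes 3, 2 and 2, and their sizes add up to the total number of samples, as required by $n = n_c + n_1 + n_2$.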

B. THE SIMILARITY MATRIX CONSTRUCTION IN APMC
The APMC algorithm uses spectral clustering to group the data in its final step. Because spectral clustering groups the data according to the similarity matrix of the input, the clustering results of APMC depend to a great extent on how the similarity is learned.
APMC first selects the $k$ nearest anchor points as a set of bases to represent each sample, and then uses a Gaussian kernel function to calculate the intra-view similarity. When computing the inter-view similarity, APMC combines the similarities between common instances and anchors with each of the two views contributing one half. The steps of anchor-based similarity construction in APMC are briefly described as follows.
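(1) Intra-view similarity construction. Suppose there are $l$ pairs of anchor points, obtained by selecting the common instances that appear in both views, i.e., $l = n_c$. Denote the set of all instances in the $v$-th view as $\{x_i^{(v)}\}_{i=1}^{n_c+n_v}$ and the anchors as $\{u_j^{(v)}\}_{j=1}^{l}$. The truncated similarity matrix $Z^{(v)} \in \mathbb{R}^{(n_c+n_v) \times l}$ between instances and anchors is defined as
$$Z_{ij}^{(v)} = \begin{cases} \dfrac{K_\sigma(x_i^{(v)}, u_j^{(v)})}{\sum_{j' \in \langle i \rangle} K_\sigma(x_i^{(v)}, u_{j'}^{(v)})}, & j \in \langle i \rangle \\ 0, & \text{otherwise} \end{cases}$$
where $K_\sigma(x, u) = \exp(-D^2(x, u) / (2\sigma^2))$ is the Gaussian kernel, $\langle i \rangle \subset \{1, \ldots, l\}$ is the index set of the $k$ nearest anchors of $x_i^{(v)}$ according to a distance function $D^2(x, u)$, and $\sigma$ is the parameter controlling the neighborhood width.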
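(2) Inter-view similarity solving. The final unified similarity matrix is composed of three parts, $Z = [\bar{Z}; \hat{Z}^{(1)}; \hat{Z}^{(2)}] \in \mathbb{R}^{n \times l}$. Here $\hat{Z}^{(v)} \in \mathbb{R}^{n_v \times l}$ indicates the similarity between the samples that appear only in the $v$-th view and the anchors, and consists of the last $n_v$ rows of $Z^{(v)} \in \mathbb{R}^{(n_c+n_v) \times l}$. $\bar{Z} \in \mathbb{R}^{n_c \times l}$ indicates the similarity between the common instances and the anchors. In APMC, to leverage the information of both views, the elements of $\bar{Z}$ are defined as $\bar{Z}_{ij} = \frac{1}{2}(Z_{ij}^{(1)} + Z_{ij}^{(2)})$. The consistent similarity matrix $S \in \mathbb{R}^{n \times n}$ among all instances is approximated in APMC as $S = Z \Lambda^{-1} Z^T$, where $\Lambda = \mathrm{diag}(Z^T \mathbf{1}) \in \mathbb{R}^{l \times l}$.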
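To illustrate why the kernel width matters, the intra-view step of APMC can be sketched as below. This is a minimal sketch under stated assumptions rather than the reference implementation of [20]: the function name `gaussian_anchor_similarity` is hypothetical, $D^2$ is taken to be the squared Euclidean distance, and the kernel is the standard Gaussian form $\exp(-D^2 / (2\sigma^2))$.

```python
import numpy as np


def gaussian_anchor_similarity(X, U, k, sigma):
    """Row-normalized Gaussian similarity of each row of X to its k nearest anchors in U.

    Assumes squared Euclidean distance; X is (n, d), U is (l, d), 1 <= k <= l.
    """
    d2 = ((X[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)  # squared distances, (n, l)
    n, l = d2.shape
    if not 1 <= k <= l:
        raise ValueError("k must satisfy 1 <= k <= number of anchors")
    nearest = np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest anchors
    rows = np.arange(n)[:, None]
    dk = d2[rows, nearest]
    # Shifting by the row minimum leaves the normalized weights unchanged
    # and keeps the exponentials away from underflowing to all zeros.
    kern = np.exp(-(dk - dk.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    Z = np.zeros((n, l))
    Z[rows, nearest] = kern / kern.sum(axis=1, keepdims=True)
    return Z
```

For a fixed $k$, shrinking `sigma` pushes almost all of the weight onto the single nearest anchor, while a large `sigma` flattens the weights towards $1/k$. This is the sensitivity that motivates replacing the kernel with an adaptive rule.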

C. MOTIVATIONS AND CONTRIBUTIONS
Clustering is a solution process from the local structure of the sample to the global structure of the sample set. The accurate expression of the local structure of the sample is the prerequisite to ensure the final clustering quality.
In partial multiview clustering, it is more challenging to accurately express and fuse the local structure of samples with incomplete information through samples with complete information. APMC provides a solution, but it involves multiple parameters and suffers from parameter sensitivity. For the complete single-view clustering problem, Nie et al. [25] proposed an adaptive neighbor assignment strategy that learns the data similarity matrix by assigning adaptive and optimal neighbors to each data point based on local connectivity. We improve this strategy and apply it to the calculation of sample similarity in partial multiview clustering.
To be specific, we first select the common parts of the views as anchors to build an anchor graph and use an adaptive neighbor assignment strategy to construct a similarity matrix between samples and anchors; we then integrate the intra-view and inter-view similarities. Finally, we perform spectral clustering on the consensus matrix to obtain the clustering results.
In general, our approach has the following advantages:
(1) Our method uses the parameter-free adaptive neighbor assignment strategy to construct the similarity matrix, which avoids tedious parameter tuning and is easier to use.
(2) Our method adopts a non-iterative approach with low time complexity. In addition, computing the similarity matrix involves only the basic operations of addition, subtraction, multiplication, and division. It does not require evaluating Gaussian kernel functions or other more expensive operations, so it is more efficient.
(3) This method has good generalization ability and can be extended to more than two views.

III. THE PROPOSED METHOD

A. METHOD FRAMEWORK
The partial multiview clustering method proposed in this paper can be divided into two stages, which are shown in Fig 2. The consensus similarity matrix is constructed in the first stage, and spectral clustering is performed in the second stage. The proposed method aims to learn a consensus representation $S$ from multiple views for clustering, and the process of computing the consensus similarity matrix is shown in Fig 3. We again use two views for illustration.

B. ANCHOR-BASED ADAPTIVE SIMILARITY CONSTRUCTION
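The similarity matrix is learned by assigning adaptive, optimal neighbors to each sample based on local connectivity. The similarity $z_i^{(1)} \in \mathbb{R}^{l}$ between the $i$-th sample of view-1 and the $l$ anchors is interpreted as the probability that each anchor is a neighbor of that sample. A closer distance corresponds to a greater probability and similarity. Therefore, the neighbor probabilities of the $i$-th sample in view-1 are obtained by solving the following problem:
$$\min_{z_i^{(1)T} \mathbf{1} = 1,\ 0 \le z_{ij} \le 1} \sum_{j=1}^{l} \left( \| x_i^{(1)} - u_j^{(1)} \|_2^2 \, z_{ij} + \gamma z_{ij}^2 \right) \quad (1)$$
where $z_{ij}$ is the $j$-th value of $z_i^{(1)T}$, $\mathbf{1}$ denotes a column vector with all elements equal to 1, $l$ is the number of anchors and $\gamma$ is the regularization parameter. Let $d_{ij} = \| x_i^{(1)} - u_j^{(1)} \|_2^2$ and let $d_i \in \mathbb{R}^{l}$ be the vector whose $j$-th element is $d_{ij}$. Writing $z_i$ for $z_i^{(1)}$, (1) can be rewritten in vector form as
$$\min_{z_i^T \mathbf{1} = 1,\ 0 \le z_{ij} \le 1} \frac{1}{2} \left\| z_i + \frac{d_i}{2\gamma} \right\|_2^2 \quad (2)$$
Given the equality and inequality constraints in (2), we use the Lagrangian function with the KKT conditions [26] to solve it. The Lagrangian function of (2) is
$$\mathcal{L}(z_i, \eta, \beta_i) = \frac{1}{2} \left\| z_i + \frac{d_i}{2\gamma} \right\|_2^2 - \eta (z_i^T \mathbf{1} - 1) - \beta_i^T z_i \quad (3)$$
where $\eta$ and $\beta_i$ are the Lagrangian multipliers: $\eta$ is the multiplier of the equality constraint, and $\beta_i \ge 0$ is the multiplier vector of the inequality constraints. According to the KKT conditions, the following conditions must be met for every $j$:
$$z_{ij}^{*} + \frac{d_{ij}}{2\gamma} - \eta - \beta_{ij} = 0, \quad z_{ij}^{*} \ge 0, \quad \beta_{ij} \ge 0, \quad \beta_{ij} z_{ij}^{*} = 0, \quad z_i^{*T} \mathbf{1} = 1 \quad (4)$$
where $z_i^{*}$ is the optimal solution, so that $z_{ij}^{*} = \left( -\frac{d_{ij}}{2\gamma} + \eta \right)_+$. With the distances sorted in ascending order, $d_{i1} \le d_{i2} \le \cdots \le d_{il}$, $\gamma$ can be set as $\gamma = \frac{k}{2} d_{i,k+1} - \frac{1}{2} \sum_{j=1}^{k} d_{ij}$ [25], such that the optimal solution to (3) is
$$z_{ij}^{*} = \begin{cases} \dfrac{d_{i,k+1} - d_{ij}}{k \, d_{i,k+1} - \sum_{h=1}^{k} d_{ih}}, & j \le k \\ 0, & j > k \end{cases} \quad (5)$$
The number of neighbors $k$ is much easier to adjust than the regularization parameter $\gamma$, because $k$ is an integer with an explicit meaning. For each sample, we can use (5) to assign its neighbors and obtain a sparse $z_i^{(1)}$ that has exactly $k$ nonzero values.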
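The closed-form neighbor assignment can be sketched in a few lines. The following is a minimal sketch assuming squared Euclidean distances between samples and anchors and the closed form of the adaptive neighbor strategy of [25], in which the $k$ nearest anchors receive weights proportional to $d_{i,k+1} - d_{ij}$; the function name `adaptive_anchor_similarity` and the uniform fallback for fully tied distances are illustrative choices that are not taken from the original text.

```python
import numpy as np


def adaptive_anchor_similarity(X, U, k):
    """Adaptive neighbor weights of each row of X over the anchors U.

    X is (n, d), U is (l, d); requires 1 <= k < l because the (k+1)-th
    nearest anchor sets the scale of the weights.
    """
    d = ((X[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)   # squared distances, (n, l)
    n, l = d.shape
    if not 1 <= k < l:
        raise ValueError("k must satisfy 1 <= k < number of anchors")
    order = np.argsort(d, axis=1)                     # anchors sorted by distance
    d_sorted = np.take_along_axis(d, order, axis=1)
    d_k1 = d_sorted[:, k:k + 1]                       # distance to the (k+1)-th nearest anchor
    top = d_sorted[:, :k]                             # distances to the k nearest anchors
    denom = k * d_k1 - top.sum(axis=1, keepdims=True)
    # If the k+1 nearest anchors are all equidistant the closed form is 0/0;
    # falling back to uniform weights is an assumption made for this sketch.
    safe = np.where(denom > 0, denom, 1.0)
    w = np.where(denom > 0, (d_k1 - top) / safe, 1.0 / k)
    Z = np.zeros((n, l))
    np.put_along_axis(Z, order[:, :k], w, axis=1)
    return Z
```

As a small worked case, a sample at 0 with anchors at 0, 1, 2 and 3 has squared distances 0, 1, 4 and 9. With $k = 2$ the denominator is $2 \cdot 4 - (0 + 1) = 7$, so the two nearest anchors receive weights $4/7$ and $3/7$ and the remaining anchors receive 0. Only additions, subtractions, multiplications and divisions are involved once the distances are available.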

C. INTEGRATING INTRA-VIEW AND INTER-VIEW SIMILARITIES
After obtaining the intra-view similarities of the two views, $Z^{(1)}$ and $Z^{(2)}$, we use $Z \in \mathbb{R}^{n \times l}$ with elements $Z_{ij} = W_i^{(1)} Z_{ij}^{(1)} + W_i^{(2)} Z_{ij}^{(2)}$ to represent the similarity between all samples and the anchors in the two views, in order to make full use of the information. In Fig 2, the matrices $W^{(1)}$ and $W^{(2)}$ realize this inter-view fusion: they hold the fusion weights of view-1 and view-2 respectively, and $W_i^{(v)}$ is the weight of the $i$-th sample in view $v$. The non-missing samples in a view represent the information that we can use. The more information a view provides, the greater its weight. Conversely, a view with more missing samples should be assigned a smaller weight, which helps to achieve a highly reliable and consistent representation and reduces the negative impact of incomplete views. If the sample missing rates of the views differ greatly and the weights are assigned equally, the views with too many missing samples may contribute too much inaccurate information and affect the final clustering results. Therefore, in our method, when the $i$-th sample appears in both views, $W_i^{(1)}$ is given by
$$W_i^{(1)} = \frac{n_c + n_1}{n_c + n_1 + n_c + n_2} = \frac{n_c + n_1}{n_c + n}$$
$W_i^{(2)}$ can be computed in a similar way. When the $i$-th sample appears only in view-1, $W_i^{(1)} = 1$. The fused similarity matrix $S \in \mathbb{R}^{n \times n}$ among all samples in the two views can then be obtained as $S = Z \Lambda^{-1} Z^T$, where $\Lambda = \mathrm{diag}(Z^T \mathbf{1}) \in \mathbb{R}^{l \times l}$ [27].
S combines intra-view and inter-view similarities.
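The weighted fusion can be sketched as follows. This is a minimal sketch under stated assumptions: the rows of each per-view matrix are ordered with the $n_c$ common samples first, the fused matrix stacks common, view-1-only and view-2-only rows, and the consensus matrix takes the anchor-graph form $S = Z \Lambda^{-1} Z^T$ with $\Lambda = \mathrm{diag}(Z^T \mathbf{1})$. The function names are hypothetical.

```python
import numpy as np


def view_weights(n_c, n_1, n_2):
    """Fusion weights of view-1 and view-2 for samples present in both views."""
    n = n_c + n_1 + n_2
    return (n_c + n_1) / (n_c + n), (n_c + n_2) / (n_c + n)


def fuse_two_views(Z1, Z2, n_c):
    """Stack the fused sample-to-anchor similarities as [common; view-1 only; view-2 only].

    Z1 is (n_c + n_1, l) and Z2 is (n_c + n_2, l), common samples first.
    """
    n_1 = Z1.shape[0] - n_c
    n_2 = Z2.shape[0] - n_c
    w1, w2 = view_weights(n_c, n_1, n_2)
    Z_common = w1 * Z1[:n_c] + w2 * Z2[:n_c]
    # Samples seen in a single view keep that view's similarities (weight 1).
    return np.vstack([Z_common, Z1[n_c:], Z2[n_c:]])


def consensus_similarity(Z):
    """S = Z diag(Z^T 1)^{-1} Z^T; anchors with zero total weight are ignored."""
    col = Z.sum(axis=0)
    inv = np.divide(1.0, col, out=np.zeros_like(col), where=col > 0)
    return (Z * inv) @ Z.T
```

For example, with $n_c = 3$, $n_1 = 2$ and $n_2 = 1$ the weights are $5/9$ and $4/9$, so the more complete view contributes more to the common rows. Because the two weights sum to one, rows of the fused $Z$ still sum to one whenever the rows of $Z^{(1)}$ and $Z^{(2)}$ do.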

D. SPECTRAL CLUSTERING
After obtaining the fused similarity matrix $S$, we perform spectral clustering to obtain the final clustering result. Spectral clustering learns a low-dimensional representation $F \in \mathbb{R}^{n \times c}$ for clustering from the consensus matrix $S$ by solving
$$\min_{F^T F = I} \mathrm{Tr}(F^T L F) \quad (7)$$
where $\mathrm{Tr}(\cdot)$ is the trace of a matrix, $c$ is the number of clusters, $L = D - S$ is the Laplacian matrix [28], $D \in \mathbb{R}^{n \times n}$ is a diagonal matrix with $D_{ii} = \sum_{j=1}^{n} S_{ij}$, and $I$ is the identity matrix. Problem (7) is solved by eigendecomposition of $L$: the eigenvectors corresponding to the $c$ smallest eigenvalues form $F$, and k-means clustering is then performed on the rows of $F$ to obtain the clustering result.
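The spectral clustering step can be sketched on a small dense matrix as follows. This is a minimal sketch, not the authors' implementation: it uses the unnormalized Laplacian $L = D - S$, a dense eigendecomposition, and a bare-bones k-means with farthest-point initialization, which is an assumption chosen here to keep the example deterministic. The function names are hypothetical.

```python
import numpy as np


def spectral_embedding(S, c):
    """Eigenvectors of L = D - S for the c smallest eigenvalues, as columns of F."""
    S = 0.5 * (S + S.T)                    # guard against round-off asymmetry
    L = np.diag(S.sum(axis=1)) - S
    _, eigvecs = np.linalg.eigh(L)         # eigenvalues come back in ascending order
    return eigvecs[:, :c]


def kmeans(F, c, n_iter=100):
    """Plain Lloyd iterations on the rows of F with farthest-point initialization."""
    centers = [F[0]]
    for _ in range(1, c):
        dist = np.min([((F - m) ** 2).sum(axis=1) for m in centers], axis=0)
        centers.append(F[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            F[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(c)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

On a similarity matrix made of two disconnected blocks, the two smallest eigenvalues of $L$ are zero and the rows of $F$ are constant within each block, so k-means separates the blocks exactly.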

E. EXTENSION FOR MULTIPLE VIEWS
The proposed AAPMC method can not only handle partial multiview clustering of two views, but can also be easily extended to more than two partial views. Taking three views as an example, the entire extension process is shown in the accompanying figure. We first divide the partial three-view case into three two-view subcases. To adjust each subcase so that the two-view anchor-based similarity construction can be applied directly, we rearrange the instances according to their types. After dividing the partial three-view case, we construct a similarity matrix for each subcase; the anchor-based similarity reconstruction can be applied to the subcases in parallel. For each partial two-view subcase, we select the common instances present in both views as anchors. Next, we compute a truncated similarity matrix and the corresponding similarity matrix for each subcase. To fuse the similarity matrices of the three partial two-view subcases, we rearrange them into aligned similarity matrices whose rows and columns follow the original order of the instances. Finally, we perform spectral clustering on the consensus matrix $S$ to obtain the clustering results.

F. COMPUTATIONAL COMPLEXITY ANALYSIS
In the first stage, similarity reconstruction, the time complexity of generating the similarity matrix $S$ is $O(nl \sum_v d_v)$, where $n$ is the total number of samples, $l$ is the total number of anchors and $d_v$ is the feature dimension of the $v$-th view.
In the second stage, spectral clustering on the fused similarity matrix $S$, the structure of $S$ allows the decomposition to be carried out by SVD with time complexity $O(\min\{nl^2, n^2 l\})$. If only the $c$ largest singular values are needed, the time complexity can be reduced to $O(nc^2)$ [29].
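One way to see where the SVD cost comes from is the factorization $S = B B^T$ with $B = Z \Lambda^{-1/2}$, assuming the anchor-graph form $S = Z \Lambda^{-1} Z^T$ with $\Lambda = \mathrm{diag}(Z^T \mathbf{1})$. The left singular vectors of the $n \times l$ matrix $B$ are then eigenvectors of $S$, so the $n \times n$ matrix never has to be decomposed. The sketch below illustrates this under those assumptions; the function name is hypothetical and a full thin SVD is used instead of a truncated solver.

```python
import numpy as np


def embedding_from_anchor_graph(Z, c):
    """Top-c eigenvectors and eigenvalues of S = Z diag(Z^T 1)^{-1} Z^T via a thin SVD."""
    col = Z.sum(axis=0)
    keep = col > 0                          # drop anchors that no sample uses
    B = Z[:, keep] / np.sqrt(col[keep])     # B = Z Lambda^{-1/2}, so S = B B^T
    U, s, _ = np.linalg.svd(B, full_matrices=False)   # singular values in descending order
    return U[:, :c], s[:c] ** 2
```

When every row of $Z$ sums to one and every anchor is used, $S \mathbf{1} = Z \Lambda^{-1} Z^T \mathbf{1} = Z \mathbf{1} = \mathbf{1}$, so $D = I$ and $L = I - S$. The eigenvectors of $L$ with the $c$ smallest eigenvalues are therefore the $c$ leading left singular vectors of $B$, which is why the largest singular values are the ones of interest.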

IV. EXPERIMENTS
To verify the effectiveness of the proposed AAPMC, we compare it with eight advanced methods on five datasets.
The flower image dataset [30] is made up of 17 flower classes, each with 80 images described by color, shape, and texture. Following [11], [20], we take the $\chi^2$ distance matrices of the color and shape features as the two views.
Multiple Features Handwritten Dataset (Digit) [31] has six feature sets of ten classes of digits, and each class holds 200 instances, for a total of 2,000 instances. Following [20], we set view-1 as the 76 Fourier coefficients of the character shapes, and view-2 as the 216 profile correlations.
USPS-MNIST Dataset merges two famous handwritten digit datasets: USPS [32] and MNIST [33]. USPS includes 9,298 digit images of size 16 × 16 in ten classes, while MNIST contains 70,000 digit images of size 28 × 28. The same digits in the two datasets can be regarded as being described in two different views. We follow [20] and randomly select 50 images per digit class from each dataset. Consequently, each view comprises 500 instances.
Synthetic Dataset [34] consists of two views. For each view, we randomly select 200 data points from a two-component Gaussian mixture model as instances. There are two clusters (i.e., cluster 1 and 2). Specifically, the cluster means are U

1) BSV
As a fundamental baseline, the best single view (BSV) method first fills in the missing values of each feature with the average value of that feature in each view, and then performs clustering on each view and reports the best result.
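The mean-filling preprocessing used by BSV and by the other imputation-based baselines can be sketched as follows. This is a minimal sketch that assumes missing entries are encoded as NaN, which is an encoding chosen for illustration and not stated in the text; the function name `mean_impute` is hypothetical, and the clustering step itself is not shown.

```python
import numpy as np


def mean_impute(X):
    """Replace NaN entries of each feature column with the mean of its observed entries."""
    X = np.array(X, dtype=float)
    observed = ~np.isnan(X)
    counts = observed.sum(axis=0)
    sums = np.where(observed, X, 0.0).sum(axis=0)
    # Columns with no observed value at all are filled with 0 (an arbitrary choice).
    col_mean = np.divide(sums, counts, out=np.zeros(X.shape[1]), where=counts > 0)
    return np.where(observed, X, col_mean)
```

Every imputed instance of a view is pulled towards the same mean vector, which is one way to understand why such baselines degrade as the proportion of missing samples grows.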

2) SC[C]
After preprocessing, we concatenate each instance's features from different views into a single feature vector. Then, we obtain an instance-to-instance similarity matrix and perform spectral clustering.

3) SC[A]
After preprocessing, we first compute an instance-to-instance similarity matrix for each view. Then, we fuse these similarity matrices by an equal-weighted average and perform spectral clustering.

4) MultiNMF [12]
In multiview NMF, a structure sparsity based unsupervised feature selection method is proposed to seek a common latent subspace for multiview clustering. It is designed for complete multiview learning, and it is the foundation of the partial multiview algorithms that follow. Therefore, we impute all the missing values as in BSV, and then perform multiview NMF.

5) PVC [10]
Partial multiview clustering (PVC) is the first work to deal with partial multi-modal data based on subspace mapping. It builds on NMF to establish a latent subspace in which instances that describe the same sample in different views are close to each other and similar instances in the same view are well grouped.

6) MIC [11]
MIC extends MultiNMF via weighted NMF with $L_{2,1}$ regularization.

7) IMG [15]
Incomplete Multi-Modality Grouping (IMG) integrates the global structure of the data into subspace learning.

8) APMC [20]
Anchor-based Partial Multiview Clustering (APMC) utilizes anchors to reconstruct instance-to-instance relationships for clustering.

C. EXPERIMENTAL SETTINGS
3Sources is a partial multiview dataset, while the other datasets are complete. We vary the Partial Data Ratio (PDR) from 10% to 90% in steps of 20%; a PDR of 0% means that all views are complete. The lost samples are evenly distributed across all views, and each sample is available in at least one view. As in APMC, each method was run 20 times and the average was taken to reduce the uncertainty caused by randomness.
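A possible way to generate such partial data from a complete dataset is sketched below. It assumes that PDR is the fraction of samples made partial and that each partial sample is kept in exactly one view, chosen in round-robin order so that the losses are spread evenly over the views. Both points are assumptions made for this sketch, since the text does not spell out the exact procedure, and the function name is hypothetical.

```python
import numpy as np


def make_partial_masks(n, pdr, n_views=2, seed=0):
    """Boolean (n, n_views) availability masks for a given partial data ratio."""
    rng = np.random.default_rng(seed)
    masks = np.ones((n, n_views), dtype=bool)
    n_partial = int(round(pdr * n))
    partial = rng.permutation(n)[:n_partial]     # samples that become partial
    keep_view = np.arange(n_partial) % n_views   # the single view each one keeps
    masks[partial] = False
    masks[partial, keep_view] = True
    return masks
```

With `n=100` and `pdr=0.3` in the two-view setting, 30 samples become partial, 15 of them are missing from each view, and every sample remains observed in at least one view.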
To evaluate the clustering results, two classic clustering evaluation indicators are adopted in this paper: clustering accuracy (ACC) and normalized mutual information (NMI). Both indicators measure the performance of a clustering algorithm by the similarity between the clustering results and the reference results. Their values lie between 0 and 1, and a larger value indicates better clustering performance.
On the two-view data sets, according to the trends of ACC and NMI with varying PDR, the compared algorithms can be roughly divided into three classes: BSV and SC[C]; SC[A], MultiNMF, and MIC; and PVC, IMG, and APMC. In the first class, SC[C] and BSV perform poorly in most cases, which demonstrates that concatenating all views into a long single view is not a good approach to multiview clustering tasks. This is mainly because the differences between views in feature scale and distribution are ignored and the complementary information between different views cannot be utilized. For the second class, as the PDR increases, the ACC and NMI of the SC[A], MIC, and MultiNMF methods decrease significantly, especially on the Digit dataset and the Synthetic dataset. This shows that filling the missing samples with the corresponding average samples is not a good way to solve the partial multiview clustering problem. As for the third class, the PVC, IMG, and APMC methods obtain relatively acceptable performance, which shows that using the complementary information of the views is an effective way to solve the partial multiview clustering problem. Compared with all the above methods, our AAPMC shows better performance in most cases. In the two-view case, our method fails to achieve the best performance only on the complete Synthetic dataset, and even there the gap between our result and the best result is very small.
On the 3Sources dataset, the proposed AAPMC method consistently outperforms the other competitors in terms of ACC in the three two-view cases and the one three-view case. In terms of NMI, AAPMC is better than all competitors in the three-view case, but on two of the two-view data sets the APMC algorithm has a better NMI value. When the number of views is increased from two to three, AAPMC produces better results, which indicates that it can be extended to more than two views.
Compared with earlier methods, APMC greatly reduces the running time. Our method nevertheless has the shortest running time on all data sets, a further reduction relative to APMC. Therefore, our method improves the clustering accuracy while further reducing the running time.
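For reference, the two indicators used in these comparisons can be sketched as follows. This is a minimal sketch under stated assumptions: ACC is computed by brute-force search over label permutations, which is only practical for a small number of clusters (an assignment solver such as the Hungarian algorithm is the usual choice in practice), and NMI is normalized by the geometric mean of the two entropies, which is one of several common conventions and may differ from the one used in the experiments. The function names are hypothetical.

```python
import itertools

import numpy as np


def clustering_accuracy(y_true, y_pred):
    """Best-match accuracy over all one-to-one mappings of predicted to true labels."""
    t = np.unique(np.asarray(y_true).ravel(), return_inverse=True)[1].ravel()
    p = np.unique(np.asarray(y_pred).ravel(), return_inverse=True)[1].ravel()
    m = int(max(t.max(), p.max())) + 1
    C = np.zeros((m, m), dtype=int)          # C[i, j]: predicted i, true j
    np.add.at(C, (p, t), 1)
    best = max(
        sum(C[i, perm[i]] for i in range(m))
        for perm in itertools.permutations(range(m))
    )
    return best / len(t)


def normalized_mutual_info(y_true, y_pred):
    """Mutual information divided by the geometric mean of the label entropies."""
    t = np.unique(np.asarray(y_true).ravel(), return_inverse=True)[1].ravel()
    p = np.unique(np.asarray(y_pred).ravel(), return_inverse=True)[1].ravel()
    P = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(P, (t, p), 1.0)
    P /= len(t)                              # joint distribution of the two labelings
    pt, pp = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = (P[nz] * np.log(P[nz] / np.outer(pt, pp)[nz])).sum()
    ht = -(pt[pt > 0] * np.log(pt[pt > 0])).sum()
    hp = -(pp[pp > 0] * np.log(pp[pp > 0])).sum()
    if ht == 0 or hp == 0:                   # at least one labeling has a single cluster
        return 1.0 if ht == hp else 0.0
    return mi / np.sqrt(ht * hp)
```

Both functions are invariant to how the cluster labels are named, so a prediction that matches the reference partition up to a relabeling scores 1 on both indicators.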

E. PARAMETER SENSITIVITY
To further explore the performance of our method, we also give a parameter study. AAPMC has only one parameter, $k$, to tune. We vary the PDR from 0% to 90% as described above, and explore the clustering performance of AAPMC with $k$ ranging over $\{4, 6, \ldots, 18\}$. As shown in Fig 8 and Fig 9, our method is insensitive to $k$ over a relatively wide range.

V. CONCLUSION
In recent years, the partial multiview clustering problem has been widely studied, and many clustering algorithms have been proposed. However, the existing methods still have shortcomings such as high time complexity, too many parameters, or an inability to be extended to more than two views.
In this paper, we propose AAPMC, a partial multiview clustering method based on an adaptive anchor strategy, which addresses these shortcomings of previous methods. The experimental results validate its superiority.
As for future work, it will be interesting to extend the adaptive anchor-based method to the partial multiview clustering problem in which no sample is observed in all of the views. In addition, most existing incomplete multiview clustering methods (including ours) require the number of clusters to be known in advance. We plan to refer to COMIC [36] and adjust our method so that clustering can be performed without prior knowledge of the number of clusters.