I. Introduction
Clustering analysis is an essential research problem in machine learning and pattern recognition. It aims to automatically group data points with similar intrinsic properties into the same cluster through unsupervised learning, so that the data are represented more concisely and downstream tasks are facilitated. In real-world applications, however, data are often collected from different sources or sensors. For instance, the same news story may be reported by multiple news outlets, and the same semantics may be expressed in different languages. Although each view is informative on its own and has individual properties, the views are complementary and often share the same underlying cluster structure. To better exploit the complementary information among multiple views, many multi-view clustering algorithms have been proposed in recent years. They can be roughly divided into five categories: multi-view k-means clustering [1], [2], [3], [4], multi-view spectral clustering [5], [6], [7], multi-view graph clustering [8], [9], [10], [11], [12], [13], [14], [15], multi-view subspace clustering [16], [17], [18], [19], [20], [21], [22], [23], [24], and deep multi-view clustering [25], [26], [27], [28], [29], [30]. Among them, multi-view subspace clustering (MvSC) has become prevalent and has attracted considerable attention owing to its excellent data representation capability. The hypothesis behind MvSC is that the consensus representation learned from different views lies in a union of multiple subspaces, each associated with a different cluster. More specifically, existing MvSC algorithms can be divided into three branches: self-representation learning, matrix factorization, and view-shared anchor learning.