Non-Aligned Multi-View Multi-Label Classification via Learning View-Specific Labels

In the multi-view multi-label (MVML) classification problem, multiple views are simultaneously associated with multiple semantic labels. MVML learning inevitably faces the problems of consistency, diversity, and non-alignment among views, as well as correlation among labels. Most existing multi-view multi-label methods for non-aligned views assume that each view has a common or shared label set; however, because a single view cannot contain the entire label information, they often learn suboptimal results. Motivated by this, this paper proposes a non-aligned multi-view multi-label classification method that learns view-specific labels (LVSL), aiming to explicitly mine view-specific labels and low-rank label structures in non-aligned views within a unified model framework. Furthermore, to alleviate insufficient available label information, we thoroughly explore the global and local structural information among labels. Specifically, we first assume that there is structural consistency between the view and the label space and construct the view-specific label model accordingly. Second, to enrich the original label space, we mine the consistent information of multiple views and the low-rank correlation information hidden among multiple labels. Finally, the contribution weight of each view is combined to learn the complementary information among views in the decision-making stage, and the model is extended to handle nonlinear data. Comparisons of the proposed method with existing state-of-the-art algorithms on several datasets validate its effectiveness.


I. INTRODUCTION
MVML is used to describe multi-semantic problems of multi-source heterogeneous data objects [1], [2], [3]. In Fig. 1, a given natural scene image can be represented by multiple view structures (LBP, HOG, HSV) with multiple labels (blue sky, white clouds, desert). Multi-view multi-label learning is a framework for handling high-dimensional, heterogeneous, multi-semantic data classification problems. Multi-view learning [4], [5], [6], [7] can describe data objects more comprehensively and accurately than single-view learning. For example, a video labeled "Sports," "National Basketball Association," and "Basketball Stars" is represented simultaneously by diverse data forms, such as text, image, and audio. In addition, there are learning paradigms with different perspectives under the same modality. For example, we can use various feature forms to describe image data (texture description, shape description, color, etc.). With the emergence of big data and the rapid development of data collection technology, people are bound to face data classification problems in ever more complex and changeable real-world scenarios. In the past few decades, multi-view and multi-label learning [8], [9], [10] have been extensively studied as two separate research fields. A fundamental assumption of conventional single-label learning is that the relationships among labels are mutually exclusive. In multi-label learning, the semantic information of the labels is rich and the labels are mutually dependent, which conflicts with the theory of single-label learning. To solve more complex data classification problems in real-world scenarios, the MVML framework has emerged. Existing methods have the following problems that urgently need to be solved:
1) The consistency and diversity of multi-source heterogeneous data, the two major principles of multi-view learning [11], [12]. The consistency principle asserts that multi-view learning should preserve the information that is consistent across views as much as possible. The diversity principle advocates that each view should learn complementary information from the other views while completing its own specific knowledge discovery task.
2) The label correlation learning problem [13], [14]. The correlation among labels is one of the critical factors for improving multi-label classification performance in multi-view learning.
3) The non-aligned multi-view learning problem [15]. Most multi-view learning methods explicitly or implicitly assume that the view samples are uniformly aligned, but in reality it is often difficult to obtain fully consistent multi-view information. For example, in video recommendation, label data are obtained from different video software, but due to user privacy protection, we cannot consistently match and align these data with the same user [16]. In face recognition, when face landmark detection fails, multi-view faces cannot be aligned, which harms facial expression recognition [17]. In general, there are many non-aligned multi-view data in the real world, and a single view cannot contain all the label information; otherwise, multi-view learning would lose its meaning. Therefore, we naturally face the following challenges: one is how to solve these three problems simultaneously, and the other is how to solve the linear inseparability of the given data. According to the different solutions, we divide the existing strategies into two types: feature fusion and classification fusion [18], [19]. The feature fusion strategy usually transforms the problem into multi-view shared-subspace information extraction and reduces the fused multi-view heterogeneous feature information to a multi-label learning problem [20], [21], [22], [23]. The matrix factorization method [24] is often used to obtain the shared subspace information of the multi-view data, after which the shared information among the views and the label information of the labeled samples are used to learn a discriminative predictor. The effectiveness of subspace learning relies on the accurate acquisition of consensus representations, but low-dimensional consensus representation learning becomes more difficult as the number of views increases.
The classification fusion strategy divides the problem into multiple multi-label learning problems and then predicts the label set of an unknown example by assigning a weight to each view classifier [18], [19], [25], [26]. Because a unified predictor needs to be learned for each view, the classification fusion strategy forces each view to fit the common sample label information, thereby capturing information that is consistent across multiple views and multiple labels, and assigns each view a weight to learn its complementary information. Such methods can effectively learn view diversity information, and the individual models can also improve the robustness of the predictor. Clearly, the combined model relies heavily on the performance of each individual classifier. Since it is impossible to label each view separately in reality, the label information learned by this type of method is often only the general label information.
Most of the existing methods focus on the first two challenges. For the third problem, the literature [15] gives a mitigation scheme: although the samples among views are not aligned, they can still be implicitly connected through common or shared labels so that complementary information can be learned. However, this strategy is suboptimal because it assumes that all views have a uniform label set. In practice, views are inconsistent with their corresponding labels [27]. The intuitive explanation is that each view only observes a part of the corresponding label information, so different views have specific label sets. For example, in Fig. 1, we observe that in subgraphs (a), (b), and (c), each of the three different views can obtain only a part of the complete label information. Although subspace learning can avoid the effect of labels that are inconsistent across views, it does not address the problem of non-aligned multi-view data.
To the best of our knowledge, existing methods cannot jointly learn view-specific labels and multi-label structures. Additionally, the data of each view have complex nonlinear structure, so linear models are no longer sufficient for current needs. This paper proposes an MVML method for jointly learning view-specific labels and multi-label structural information. Specifically, a view-specific label matrix is first learned based on the assumption of structural similarity between multi-view features and labels. Then, global label structure and local structure correlation are introduced to enrich the view-specific label information. Finally, the joint learning model is extended to a nonlinear model.
We design the model to establish a final optimization objective that studies the above problems jointly. Fig. 2 illustrates the framework of the proposed method. The most significant difference between our method and existing multi-view learning methods is that the latter ignore the misalignment between multi-source heterogeneous features and the label space. Our experiments prove that this view-specific label learning structure plays an indispensable role. Our main contributions in this paper are as follows: 1) We propose a novel MVML method that combines view-specific labels and label structure learning. 2) Our method mines view-specific label information for multi-view consistency and complementary information learning. 3) We extend the linear model to a nonlinear model to handle scenarios where the given data may not be linearly separable. The rest of this article is organized as follows. In Section II, we briefly summarize related work on multi-view multi-label learning. Section III proposes our method, and Section IV presents an effective alternating iterative optimization method to solve it. Extensive experimental results and analyses are reported in Section V. Section VI concludes the article and discusses future research directions.

II. RELATED WORK
The previous section divided existing approaches into two different strategies according to their solutions. In this section, we outline the latest research closely related to our approach, following the above taxonomy.

A. Multi-View Multi-Label Learning
Direct feature fusion is a method that concatenates the features of all views for classification. For example, RLM-MCML [26] merges multi-view features through a simple concatenation strategy; meanwhile, the structural relationship among labels is learned based on low-rank labels and sample local smoothness assumptions. This degenerate merging method ignores the unique physical meaning of each view. Simultaneously, the high-dimensional heterogeneous features obtained by the merging strategy may lead to the curse of dimensionality and overfitting. The subspace learning method, a feature fusion strategy, assumes that all views have a latent common representation on which to build a classification model. For example, in lrMMC [28], the first stage captures the low-dimensional common representation of all views, limits it to a low-rank matrix, and then assigns specific weights to each view to explore the complementarity between different views; in the second stage, the consensus matrix is embedded in matrix completion for classification. TMV-LE [22] differs from lrMMC in that tensor factorization is added to learn the high-order relationships between different views when mining common representations via subspace learning; in addition, a label enhancement method is used when performing multi-label classification. GLMVML [29] learns a consensus multi-view representation through matrix factorization and encodes complementary information from different views; it also learns global and local label structural information. iMvWL [20] attempts to capture a discriminative shared subspace from incomplete views through nonnegative matrix factorization and local label structure learning, thereby constructing a robust weak-label classifier. LSA-MML [23] uses subspace learning to align undiscovered latent patterns to obtain a common representation, revealing the latent semantic patterns in the data. ICM2L [21] utilizes nonnegative matrix factorization to learn the individual and common information of different views, thereby improving the recognition ability of the classifier on rare labels. MLMVL-MM [30] uses multi-label correlation information to merge multiple feature views and maximum-margin classification simultaneously. However, with the subspace method, as the number of views increases, it becomes more challenging to learn an effective latent low-dimensional consistent representation, which leads to a decrease in the performance of the algorithm.

Fig. 2. The framework of the proposed LVSL method. High-order label correlation information is used to augment and complete the shared label set. View inconsistency is guided by view-specific label learning, and label consistency is guided by view-label alignment learning. LVSL combines multi-view feature data with the consistent alignment of views and labels for non-aligned multi-view multi-label classification tasks.
Classification fusion: multiple views are fused to perform multi-label classification at the prediction stage. For example, VLSF [31] leverages pairwise label correlations and view contributions to learn view label-specific features in multi-view multi-label learning, addressing view consistency and complementarity. GRADIS [32] adopts a two-stage label disambiguation method to solve the multi-view partial multi-label problem: first, the candidate labels are disambiguated based on a fused similarity graph and the ground-truth labels of the training samples are estimated; then, disambiguation-guided clustering analysis is used to generate a prediction model for learning label-specific features. NAIM3L [15] uses a classification fusion strategy that models the global and local structures among labels as high-rank and low-rank, respectively, to alleviate the problem of insufficient available labels; it simultaneously solves the learning problems of missing labels, incomplete views, and non-aligned views. F2L21F [33] proposes a sparse framework for image classification. MLSO [3] builds an SVM classifier on each data view and jointly learns multi-source multi-label tasks under a unified optimization framework; multi-label classification results are obtained by a weighted combination of decisions from multiple sources. The classification fusion methods generally consider that although the various views are not explicitly aligned, they can still be implicitly connected through common or shared labels [15]. Nevertheless, intuitively, each view holds only a subset of the corresponding labels, meaning each view can capture only a subset of the common or shared label data. Therefore, there are obvious shortcomings in the premises of the above classification fusion methods.
In addition, although the existing multi-view multi-label learning methods have achieved certain results, most of them are based on linear models. When a given dataset is linearly inseparable, they may not achieve the expected classification effect. For this reason, scholars add nonlinear mappings to the model. For example, TM3L [18] is a two-step learning strategy: the first step learns a common representation of multiple views with complementarity and consistency through subspaces, and the second step combines label correlation to build a nonlinear multi-label classifier. MVLE [34] utilizes a low-dimensional latent semantic space to connect the labels and features of different views and further uses the Hilbert-Schmidt independence criterion (HSIC) [35] to mine the consistency information among different views. SIMM [36] proposes a neural network MVML method that uses shared subspace learning and view-specific information identification. On this basis, MML-DAN [37] adopts a self-attention mechanism to model the interaction information of label-specific views and explore consistent label correlations. CDMM [19] utilizes multiple multi-label models to jointly learn view consistency information and introduces HSIC theory to extract the differing information among views.

B. Label Correlation Learning
Unlike traditional single-label learning tasks, multi-label learning aims to assign multiple category labels to a sample and has gained increasing attention in different machine learning tasks. Intuitively, samples with similar labels are more likely to have strong correlations [38]. Therefore, existing multi-label methods are divided into three categories according to the label correlations used [9]. First-order strategies consider that there is no inherent correlation among labels and that labels are independent of each other [39], [40]. Second-order strategies consider that label correlation exists in pairs and use distance measurement methods to evaluate the correlation of label pairs [31], [41]. High-order strategies consider that label correlation in complex scenarios is multifaceted and semantically related [42], [43]. Theoretical research on label propagation dependencies shows that label correlations can reconstruct and enrich the original label information [44].
In addition, most previous label correlation studies considered the global structural information of labels, but more recent studies confirm that the correlation among labels may only be shared by a subset of samples [38]. Therefore, there is weak correlation or irrelevance among samples with different labels, reflecting the local structural relationships within multiple labels [45]. ML-LRC [46] uses a low-rank structure to capture the complex associations among labels and jointly learns label correlations and multi-label classifiers; GLOCAL [47] combines multiple label regularizers in a multi-label classifier to exploit both the global and local structural relationships among labels.
As mentioned above, most existing MVML methods assume that all views share one label set, but in practical applications, view-label information is inconsistent. Moreover, this problem, caused by non-aligned view learning, has not been directly investigated in previous studies. We propose an MVML method for learning view-specific labels based on the aforementioned issues. First, view-specific label learning addresses the view-label inconsistency of non-aligned views. Then, effective global and local structural regularizers for label correlations are introduced into view-specific label learning. Finally, the complementary information among views is learned through a weighted combination of the views, and the model is extended nonlinearly. The effectiveness of our method is verified on multiple benchmark multi-view multi-label datasets.

III. THE PROPOSED METHOD

A. Problem Settings
Let $\mathcal{X} = \{X^v\}_{v=1}^{m}$ denote a multi-view multi-label dataset with $m$ views, where $X^v \in \mathbb{R}^{N \times d_v}$ is the complete feature space of the $v$-th view, $d_v$ is its feature dimension, and $N$ represents the number of training samples. $Y = [y_1, y_2, \ldots, y_N]^{\top} \in \mathbb{R}^{N \times l}$ represents the label space corresponding to the feature set, where $y_i \in \{0,1\}^{l}$ is the label vector of $x_i$ and $l$ represents the number of labels.

B. Problem Formulation
In the initial prediction model of multi-view multi-label classification, label classification learning is a typical regression problem. The base model advocates that different views predict the same label result so as to use the information that is consistent across views. Furthermore, the different contribution weights of each view are considered in the base model to learn the complementary information among views. Denoting by $W^v$ the regression coefficients of the $v$-th view, the objective function can be formally defined as follows:

$$\min_{\{W^v\},\,\theta}\ \sum_{v=1}^{m}\theta_v\big\|X^vW^v-Y\big\|_F^2+\lambda_1\sum_{v=1}^{m}\big\|W^v\big\|_F^2+\lambda_2\|\theta\|_2^2,\quad \text{s.t.}\ \sum_{v=1}^{m}\theta_v=1,\ \theta_v\ge 0, \tag{1}$$

where the variable $\theta_v$ is used to measure the contribution of each view.
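As a concrete illustration, with the view weights $\theta$ held fixed, Eq. (1) decouples into independent ridge regressions, one per view. The following minimal numpy sketch shows this block update; the function and variable names are ours, not from the paper:

```python
import numpy as np

def base_model_step(Xs, Y, theta, lam1):
    """Block update for Eq. (1): with view weights theta fixed,
    each W^v solves an independent ridge-regression problem."""
    Ws = []
    for X, t in zip(Xs, theta):
        d = X.shape[1]
        # Setting the gradient of t*||X W - Y||_F^2 + lam1*||W||_F^2 to zero:
        # (t * X^T X + lam1 * I) W = t * X^T Y
        Ws.append(np.linalg.solve(t * X.T @ X + lam1 * np.eye(d), t * X.T @ Y))
    return Ws
```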
There are two main problems currently faced: 1) We need to learn non-aligned views in a common label space.
2) The introduction of multi-label structural learning helps to improve the classification performance of the algorithm. Therefore, how to combine these two attributes more effectively and make our model more discriminative is the main issue considered below.
Eq. (1) assumes that the samples among views share a common label set, which is an implicit way to enforce view alignment consistency. However, such explicitly or implicitly aligned view samples rarely exist among the large amounts of data in reality, because the labels that each real-world view can observe may cover only part of the complete information; it is therefore necessary to learn a non-aligned multi-view method that resolves the inconsistency of the observable information in each view. For the first question,
we propose an explicit view non-alignment method by introducing the concept of view-specific labels. Then, we have the following objective:

$$\min_{\{W^v\},\{P^v\},\,\theta}\ \sum_{v=1}^{m}\theta_v\big\|X^vW^v-P^v\big\|_F^2+\lambda_3\sum_{v=1}^{m}\operatorname{tr}\big((P^v)^{\top}L^vP^v\big)+\lambda_1\sum_{v=1}^{m}\big\|W^v\big\|_F^2+\lambda_2\|\theta\|_2^2, \tag{3}$$

where $P^v$ represents the view-specific label matrix. The second term of Eq. (3) introduces the topological structure of each view in the feature space, which ensures that the local geometric structure between the feature space and the semantic matrix of each view is consistent.
$L^v = D^v - S^v$ is the graph Laplacian matrix, where $D^v$ is the diagonal degree matrix with $D^v_{ii} = \sum_{j} S^v_{ij}$, and $S_{ij}$ measures the similarity between instances $x_i$ and $x_j$. In our work, the local geometric structure is constructed from the nearest-neighbor graph on the feature space $X^v$. The similarity between two instances of the $v$-th view is calculated as

$$S^v_{ij}=\begin{cases}\exp\!\big(-\|x^v_i-x^v_j\|_2^2/2\sigma^2\big), & x^v_i\in N_p(x^v_j)\ \text{or}\ x^v_j\in N_p(x^v_i),\\ 0, & \text{otherwise},\end{cases}$$

where $N_p(x)$ is the set of $p$ nearest neighbors of instance $x$.
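A small sketch of this graph construction, assuming the symmetric $p$-nearest-neighbor rule and Gaussian weighting described above (helper names and the `sigma` handling are ours):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def graph_laplacian(X, p=5, sigma=1.0):
    """Build the p-NN Gaussian similarity graph S and the Laplacian L = D - S."""
    n = X.shape[0]
    nn = NearestNeighbors(n_neighbors=p + 1).fit(X)  # +1: each point returns itself
    _, idx = nn.kneighbors(X)
    S = np.zeros((n, n))
    for i in range(n):
        for j in idx[i, 1:]:  # skip the self-neighbor
            w = np.exp(-np.linalg.norm(X[i] - X[j]) ** 2 / (2 * sigma ** 2))
            S[i, j] = S[j, i] = w  # symmetrize: neighbor in either direction
    L = np.diag(S.sum(axis=1)) - S
    return S, L
```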
For the second problem, we introduce a structural learning method for label correlation. Most existing multi-label correlation learning methods have two limitations: 1) label correlation is usually treated as prior knowledge, which cannot correctly describe the true dependency relationships among labels; and 2) the local structure of the label relationships in the label space is ignored. For the first limitation, we use the idea of label propagation to build a joint learning model of view-specific labels and label correlations. Specifically, we believe that in addition to keeping the structure consistent with the features of different views, the view-specific labels should also consider the impact of label correlation, which supplements the information of the original label space. Therefore, we introduce a label correlation matrix $C \in \mathbb{R}^{l \times l}$ to supplement the original label matrix:

$$\min_{\{P^v\},\,C}\ \lambda_4\sum_{v=1}^{m}\big\|P^v-YC\big\|_F^2. \tag{5}$$

Regarding the second limitation, we believe that in addition to focusing on the global features of multiple labels, we also need to capture some local structural information. For example, there is usually a group of labels such that the labels within the group are strongly correlated with each other and independent of the other labels. Therefore, we use the nuclear norm $\|\cdot\|_*$ to limit the label correlation matrix $C$ to a low-rank structure. Finally, we obtain the objective function as follows:

$$\min_{\{W^v\},\{P^v\},C,\,\theta}\ \sum_{v=1}^{m}\theta_v\big\|X^vW^v-P^v\big\|_F^2+\lambda_1\sum_{v=1}^{m}\big\|W^v\big\|_F^2+\lambda_2\|\theta\|_2^2+\lambda_3\sum_{v=1}^{m}\operatorname{tr}\big((P^v)^{\top}L^vP^v\big)+\lambda_4\sum_{v=1}^{m}\big\|P^v-YC\big\|_F^2+\lambda_5\|C\|_*,\ \text{s.t.}\ \sum_{v=1}^{m}\theta_v=1,\ \theta_v\ge 0. \tag{6}$$

Based on the above problems, we jointly learn non-aligned multi-view and multi-label semantic structures. Furthermore, because Eq. (6) is a linear model, it cannot handle linearly inseparable data. Some existing multi-label learning algorithms (such as [14], [34], and [48]) use nonlinear models to achieve good performance. We use a feature map $\phi(\cdot)$ to map the feature space $X$ to a higher-dimensional (possibly infinite-dimensional) Hilbert space. According to the representer theorem [40], we re-express the linear coefficients as $W^v = \phi(X^v)^{\top}A^v$. Let $K^v = \phi(X^v)\phi(X^v)^{\top}$ be the kernel matrix, where $\kappa(\cdot,\cdot)$ is the kernel function used (the Gaussian kernel in this paper). Then, Eq. (3) and Eq. (6) can be rewritten by replacing $X^vW^v$ with $K^vA^v$ and $\|W^v\|_F^2$ with $\operatorname{tr}\big((A^v)^{\top}K^vA^v\big)$; in particular, Eq. (6) becomes

$$\min_{\{A^v\},\{P^v\},C,\,\theta}\ \sum_{v=1}^{m}\theta_v\big\|K^vA^v-P^v\big\|_F^2+\lambda_1\sum_{v=1}^{m}\operatorname{tr}\big((A^v)^{\top}K^vA^v\big)+\lambda_2\|\theta\|_2^2+\lambda_3\sum_{v=1}^{m}\operatorname{tr}\big((P^v)^{\top}L^vP^v\big)+\lambda_4\sum_{v=1}^{m}\big\|P^v-YC\big\|_F^2+\lambda_5\|C\|_*. \tag{8}$$

In the next section, we solve problem (8) by alternating iterative optimization.
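The paper states that the Gaussian kernel is used; a minimal sketch of computing the training and test kernel matrices $K^v$ and $K_t^v$ (the bandwidth choice is an assumption on our part):

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """K[i, j] = exp(-||x1_i - x2_j||^2 / (2 * sigma^2))."""
    sq = (np.sum(X1 ** 2, axis=1)[:, None]
          + np.sum(X2 ** 2, axis=1)[None, :]
          - 2 * X1 @ X2.T)
    return np.exp(-np.maximum(sq, 0) / (2 * sigma ** 2))

# Training kernel for view v:  K_v  = gaussian_kernel(X_v, X_v)
# Test-vs-train kernel:        Kt_v = gaussian_kernel(Xt_v, X_v)
```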

IV. OPTIMIZATION

A. Model Optimization
The optimization problem in (8) is convex with respect to each block of variables when the others are fixed, and it can be solved by the following alternating optimization procedure.
Fix $P^v$, $C$, and $\theta$; optimize $A^v$. Taking the derivative of $\mathcal{L}(A^v)$ w.r.t. $A^v$ and setting it to zero yields the closed-form solution

$$A^v=\theta_v\big(\theta_vK^v+\lambda_1 I\big)^{-1}P^v.$$

Fix $A^v$, $C$, and $\theta$; optimize $P^v$. Taking the derivative of $\mathcal{L}(P^v)$ w.r.t. $P^v$ and setting it to zero yields the closed-form solution

$$P^v=\big((\theta_v+\lambda_4)I+\lambda_3L^v\big)^{-1}\big(\theta_vK^vA^v+\lambda_4YC\big).$$
Compared with the variables $A^v$ and $P^v$, for which closed-form solutions can be obtained directly, it is difficult to optimize $C$ directly because of the nonsmooth regularization term in (8). To make the objective function separable, we introduce an auxiliary variable $Z$ to replace $C$ in the nuclear norm; the equivalent objective function can be expressed as

$$\min_{C,Z}\ \lambda_4\sum_{v=1}^{m}\big\|P^v-YC\big\|_F^2+\lambda_5\|Z\|_*,\quad \text{s.t.}\ C=Z. \tag{13}$$

We use augmented Lagrange multipliers (ALM) to solve this problem and reformulate the objective function (13) as

$$\mathcal{L}(C,Z,\Lambda)=\lambda_4\sum_{v=1}^{m}\big\|P^v-YC\big\|_F^2+\lambda_5\|Z\|_*+\langle\Lambda,\,C-Z\rangle+\frac{\mu}{2}\big\|C-Z\big\|_F^2, \tag{14}$$

where $\mu$ and $\Lambda$ denote a nonnegative penalty factor and the Lagrange multiplier, respectively. Then, the inexact ALM (IALM) method is used to iteratively solve each variable in (14) by block coordinate descent. According to the optimization strategy of IALM [49], we divide (14) into the following subproblems: update $C$ with $Z$ and $\Lambda$ fixed (a quadratic problem with a closed-form solution); update $Z$ with $C$ and $\Lambda$ fixed (a nuclear-norm proximal step); and update the multiplier $\Lambda \leftarrow \Lambda+\mu(C-Z)$.
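The $Z$-subproblem is the standard nuclear-norm proximal step, solvable by singular value thresholding (SVT). A sketch under the IALM form given above (the helper name is ours):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the prox of tau * ||.||_* evaluated at M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

# Z-update of the IALM scheme:  Z = svt(C + Lam / mu, lam5 / mu)
```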
Fix $A^v$, $P^v$, and $C$; optimize $\theta$. With the other variables fixed, the subproblem over $\theta$ is a quadratic program over the simplex $\{\theta:\sum_{v}\theta_v=1,\ \theta_v\ge 0\}$, which can be solved in closed form via a Lagrange multiplier.
In summary, we introduce the kernel model to generate the predicted label vector $Y_t$ for the test data:

$$Y_t=\Big[\!\Big[\sum_{v=1}^{m}\theta_vK_t^vA^v>\eta\Big]\!\Big],$$

where $K_t^v$ denotes the kernel matrix between the test samples and the training samples of the $v$-th view, $[\![\cdot]\!]$ is the element-wise indicator function, and $\eta$ is a given threshold obtained by cross-validation.
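Putting the updates together, the following sketch outlines one possible implementation of the alternating scheme and the prediction rule. It relies on the reconstructed closed forms above and the `svt` and `gaussian_kernel` helpers sketched earlier; it is our illustrative reading, not the authors' released code (which is linked in Section V):

```python
import numpy as np

def lvsl_train(Ks, Ls, Y, lam, n_iter=10, mu=1.0, rho=1.1, mu_max=1e6):
    """Alternating optimization for the kernelized objective (Eq. (8)).
    Ks: list of m (N x N) kernel matrices; Ls: list of graph Laplacians;
    Y: (N x l) label matrix; lam: dict mapping 1..5 to lambda_1..lambda_5."""
    m, (N, l) = len(Ks), Y.shape
    theta = np.full(m, 1.0 / m)
    Ps = [Y.astype(float).copy() for _ in range(m)]     # view-specific labels
    C, Z, Lam = np.eye(l), np.eye(l), np.zeros((l, l))  # correlation, aux, multiplier
    I_N, I_l = np.eye(N), np.eye(l)
    for _ in range(n_iter):
        # A^v-step: (theta_v K^v + lam1 I) A^v = theta_v P^v
        As = [np.linalg.solve(t * K + lam[1] * I_N, t * P)
              for K, P, t in zip(Ks, Ps, theta)]
        # P^v-step: ((theta_v + lam4) I + lam3 L^v) P^v = theta_v K^v A^v + lam4 Y C
        Ps = [np.linalg.solve((t + lam[4]) * I_N + lam[3] * L,
                              t * K @ A + lam[4] * Y @ C)
              for K, L, A, t in zip(Ks, Ls, As, theta)]
        # C-step (IALM): (2 lam4 m Y^T Y + mu I) C = 2 lam4 Y^T sum_v P^v + mu Z - Lam
        rhs = 2 * lam[4] * Y.T @ sum(Ps) + mu * Z - Lam
        C = np.linalg.solve(2 * lam[4] * m * (Y.T @ Y) + mu * I_l, rhs)
        # Z-step by singular value thresholding, then multiplier/penalty updates
        Z = svt(C + Lam / mu, lam[5] / mu)
        Lam, mu = Lam + mu * (C - Z), min(rho * mu, mu_max)
        # theta-step: simplex-constrained quadratic program, solved in closed form
        e = np.array([np.linalg.norm(K @ A - P) ** 2
                      for K, A, P in zip(Ks, As, Ps)])
        theta = np.clip(1.0 / m + (e.mean() - e) / (2 * lam[2]), 0, None)
        theta /= theta.sum()
    return As, theta

def lvsl_predict(Kts, As, theta, eta=0.5):
    """Y_t = [[ sum_v theta_v Kt^v A^v > eta ]]; Kt^v is the test-vs-train kernel."""
    scores = sum(t * Kt @ A for Kt, A, t in zip(Kts, As, theta))
    return (scores > eta).astype(int), scores
```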

B. Complexity Analysis
In this section, we analyze the complexity of the optimization steps listed in Algorithm 1. The time complexity of LVSL is mainly controlled by step 4. The complexity of updating $A^v$ in each iteration is $O(N^3 + N^2l)$; updating $P^v$ likewise requires inverting an $N \times N$ matrix, and updating $C$ is dominated by the SVD of an $l \times l$ matrix. The overall complexity is therefore on the order of $O\big(tm(N^3 + N^2l)\big)$, where $t$ is the number of iterations and $m$ is the number of views. Typically, the model converges quickly, reaching its optimum within about ten iterations.

V. EXPERIMENTS

A. Experimental Settings
We performed experiments on 7 benchmark multi-view multi-label datasets, which can be downloaded from Mulan [51]. Pascal07, Corel5k, ESPgame, Iaprtc12, and Mirflickr are five widely used image datasets from [52], [53]. The details of the datasets are summarized in Table I.
To verify the effectiveness of the proposed method, we compare it with the following seven competing methods. Two of these methods use a concatenation strategy, which builds a multi-label learning model on each data view and combines the weighted outputs to make the final prediction. The other methods are multi-view multi-label learning methods.
• ICM2L [21]: An individual-view and commonality-view mining MVML classification method. Parameter configurations follow the suggestions given in the paper.
• iMvWL [20]: Incomplete multi-view weak-label learning. In our experiments, the complete view information is available. Parameter configurations follow the suggestions given in the paper.
Code: https://github.com/zhaodwahu/LVSL. For all the above methods, the parameters are tuned by grid search to achieve the best performance.

B. Evaluation Metrics
We use five evaluation metrics that are widely used in multi-label learning to measure the performance of each algorithm: average precision (AP), coverage (CV), Hamming loss (HL), one-error (OE), and ranking loss (RL). For AP, larger values are better; for the other metrics, smaller values are better. Detailed metric definitions can be found in [9], [10].
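For reference, four of the five metrics are available in scikit-learn, and one-error is easy to compute directly; a sketch of an evaluation helper (our code, not the paper's evaluation script):

```python
import numpy as np
from sklearn.metrics import (hamming_loss, coverage_error,
                             label_ranking_loss,
                             label_ranking_average_precision_score)

def evaluate(Y_true, scores, Y_pred):
    """Y_true: binary (n x l) array; scores: real-valued ranking scores;
    Y_pred: thresholded binary predictions."""
    # One-error: fraction of samples whose top-ranked label is not relevant.
    one_error = np.mean(Y_true[np.arange(len(scores)), scores.argmax(1)] == 0)
    return {
        "AP": label_ranking_average_precision_score(Y_true, scores),
        "CV": coverage_error(Y_true, scores) - 1,  # sklearn counts ranks from 1
        "HL": hamming_loss(Y_true, Y_pred),
        "OE": one_error,
        "RL": label_ranking_loss(Y_true, scores),
    }
```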

C. Experimental Results
We performed fivefold cross-validation on each dataset, and each algorithm repeated the experiment 5 times. The average and standard deviation of each metric on each dataset are reported in Tables II to VI. The best results are shown in red and the second-best in blue.
The Friedman test [55] is a common strategy for testing whether multiple algorithms have the same performance. Table VII summarizes the Friedman statistical $F_F$ value of each evaluation metric and the critical value at the 0.05 significance level. Observing Table VII, we see that the $F_F$ statistics of all metrics are greater than the critical value. Thus, all metrics reject the null hypothesis, and we use a post-hoc test to assess the significant differences among the approaches. In this article, we choose the Nemenyi test [39], [56], [57] as the post-hoc test. In Fig. 3, algorithm performance is sorted from left to right, with the best algorithm on the far right. Specifically, if the average ranking difference between algorithms is within one CD value, they are connected with a red solid line. From the results in Tables II to VI and Fig. 3(a) to (e), the following conclusions can be drawn:
• Among the 35 configurations (7 datasets and 5 evaluation metrics), ours ranked first and second in 71.4% and 14.3% of cases, respectively.
• Fig. 3 shows that LVSL is significantly better than the other methods in 40% of cases, followed by CDMM and SIMM at 20% of cases. Notably, our method is always better than CDMM.
• Encouragingly, Tables II to VI show that our method achieves better performance on all metrics of Emotions and Yeast. The overall CV performance of LVSL is not as good as that of SIMM, but it is not far from the better results.

Further analysis of the experimental results is as follows:
• Comparing LSML and MLkNN with the multi-view multi-label learning approaches, we see that the performance of traditional multi-label methods applied to concatenated views is inferior, mainly because they ignore the consistency and complementary information of multiple views and the physical interpretation of the characteristics of different views.
• The comparison among LVSL and iMvWL, ICM2L, and TM3L shows that our view-specific label learning method is effective for non-aligned views.
• LVSL, SIMM, TM3L, and CDMM all use nonlinear mappings to solve the linear inseparability problem. In general, LVSL is better than the other three methods; view-specific labels and multi-label structural learning can effectively improve classification performance. In addition, SIMM ignores the impact of label correlation, which leads to its poorer overall performance.
• LVSL performs worse than SIMM on the AP and CV metrics on the Pascal07 and Mirflickr datasets for two main reasons. (1) LVSL uses a single kernel function for the kernel mapping of all views, and the performance of a kernel method often depends on the choice of kernel function. Because the nonlinear relationships in the data of each view may differ, the optimal kernel function for one view may not suit another [58], which suggests a direction for our future research; SIMM does not need to consider this problem. (2) SIMM develops the shared subspace based on the information among views. In our work, considering the non-aligned view problem, the information among views cannot be communicated directly, which affects the performance of LVSL to a certain extent.

Additionally, there are two main reasons for the advantage of our method over deep learning methods:
• Current multi-view multi-label learning tasks cannot be trained end-to-end directly by deep learning and benefit from some traditional feature extraction techniques. Therefore, the feature representation capability of deep learning is limited in this task, and thanks to the powerful nonlinear data processing capability of kernel tricks, our method can also achieve this purpose [48].
• The training data in this paper are relatively limited, and deep learning may overfit the training data, resulting in insufficient model generalization ability. Traditional methods have good generalization ability, interpretability, transparency, and universality [59]. Therefore, to some extent, traditional methods are more suitable for the complex tasks studied in this paper.
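As a sanity check on the critical difference quoted in Fig. 3, the Nemenyi CD for $k = 8$ algorithms (LVSL plus seven competitors) over $N = 7$ datasets can be computed as follows; $q_{0.05}$ is the standard tabulated critical value:

```python
import math

k, N = 8, 7            # number of compared algorithms and datasets
q_005 = 3.031          # tabulated Nemenyi critical value for k = 8, alpha = 0.05
CD = q_005 * math.sqrt(k * (k + 1) / (6.0 * N))
print(round(CD, 4))    # 3.9685, matching the CD reported in Fig. 3
```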

D. Ablation Analysis
In this section, to further verify the effectiveness of each component of LVSL, we conduct additional ablation experiments and report the values of the five evaluation metrics in Table VIII. LVSL-I, LVSL-II, and LVSL-III are variants of LVSL that exclude view-specific labels, label correlations, and view contributions, respectively. Comparing the results of LVSL-I and LVSL in Table VIII, we find that the overall performance improves significantly after adding view-specific labels, which confirms our motivation to use view-specific label learning to solve the non-aligned view problem. Comparing LVSL-II and LVSL, we find that LVSL is better than LVSL-II in most cases, which proves the necessity of capturing label structure information and verifies the effectiveness of using the label correlation matrix $C$ to complement the original label matrix $Y$. In some cases, LVSL-III and LVSL have the same performance, showing that our contribution measurement method has room for further improvement.
E. Parameter Sensitivity Analysis

The hyperparameter $\lambda_1$ controls the complexity of the model coefficients and adjusts the balance between overfitting and underfitting: when $\lambda_1$ is too small, the model overfits, and when $\lambda_1$ is too large, it underfits. The hyperparameter $\lambda_2$ controls the contribution of the different views. The hyperparameter $\lambda_3$ controls the structural diversity among different views. The hyperparameter $\lambda_4$ controls the global consistency between the view-specific labels and the real labels. The hyperparameter $\lambda_5$ controls the effect of local label correlation.
Fig. 4 shows that the parameter $\lambda_1$ works best at intermediate values; intuitively, an intermediate value balances the model fit. The performance is better when $\lambda_2$ reaches $10^5$: a larger value ignores the influence of the contribution weight of each view, while a smaller value is too sensitive to the view contributions and ignores the complementary information between views. The parameters $\lambda_3$ and $\lambda_5$ tend toward smaller values, but values that are too small ignore the contribution of the corresponding regularization terms, so we generally choose intermediate values. The performance is better when $\lambda_4$ takes a larger value: a larger value fully learns the consistency information of multiple views, but an excessively large value leads to insufficient complementary learning of view-specific labels. The parameter sensitivity results on the other datasets are similar, and similar conclusions can be drawn.

F. Further Analysis
We report the efficiency analysis of LVSL in this section. Fig. 5 shows the iterative trend of our method on two datasets: the value of the objective function decreases significantly during the initial iterations and gradually converges as the optimization proceeds. LVSL tends to converge within 10 iterations on both datasets, showing that it converges quickly. The convergence results on the other datasets are similar.

VI. CONCLUSION
This paper proposes a novel multi-view multi-label classification method that jointly learns view-specific labels and label structures. LVSL differs from existing multi-view multi-label classification work, which implicitly connects views through common or shared labels, in that it assigns a specific label set to each view to solve the problem of inconsistent view labels in non-aligned views. When constructing the view-specific labels, the consistency and diversity information among views is learned, and the label correlation information of multi-label learning is also incorporated. Extensive experiments show that the proposed non-aligned view learning method based on view-specific labels is a promising solution for multi-view multi-label classification.
This method is of great significance for future research on multi-view multi-label classification of non-aligned views. Future work will be devoted to new methods for view-specific label learning, such as multiple kernel learning.

Manuscript received 17 August 2021; revised 23 March 2022 and 20 September 2022; accepted 29 October 2022. Date of publication 4 November 2022; date of current version 1 November 2023. This work was supported in part by the Nature Science Foundation of Anhui under Grants 2008085MF183 and 2008085MF192 and in part by the National Natural Science Foundation of China (NSFC) under Grants 62071001 and 61502003. The Associate Editor coordinating the review of this manuscript and approving it for publication was Prof. Ngai-Man Cheung. (Corresponding author: Qingwei Gao.)

Dawei Zhao is with the School of Electrical Engineering and Automation and the School of Computer and Technology, Anhui University, Hefei 230601, China (e-mail: zhaodwahu@163.com). Qingwei Gao, Yixiang Lu, and Dong Sun are with the Key Laboratory of Intelligent Computing and Signal Processing of the Ministry of Education, School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China (e-mail: qingweigao@ahu.edu.cn; lyxahu@ahu.edu.cn; sundong@ahu.edu.cn).

Digital Object Identifier 10.1109/TMM.2022.3219650

Fig. 3. Performance comparison of LVSL and the other methods using the Nemenyi test (CD = 3.9685 at the 0.05 significance level) under five evaluation metrics.

Fig. 4. Parameter sensitivity analysis of the LVSL algorithm on the Corel5k dataset. (a) Effect of $\lambda_1$ with the other parameters fixed. (b) Effect of $\lambda_2$ with the other parameters fixed. (c) Effect of $\lambda_3$ with the other parameters fixed. (d) Effect of $\lambda_4$ with the other parameters fixed. (e) Effect of $\lambda_5$ with the other parameters fixed.

TABLE V. EXPERIMENTAL RESULTS (MEAN ± STD) ON ONE-ERROR (↓)

TABLE VII. THE CORRESPONDING STATISTICAL $F_F$ VALUE OF EACH EVALUATION METRIC AND CRITICAL VALUE UNDER THE FRIEDMAN TEST

TABLE VIII. COMPARISON RESULTS OF LVSL-I, LVSL-II, LVSL-III, AND LVSL. LVSL-I IS WITHOUT THE VIEW-SPECIFIC LABEL STRUCTURE, LVSL-II IS WITHOUT LABEL CORRELATION, AND LVSL-III USES THE SAME CONTRIBUTION WEIGHT FOR ALL VIEWS