Multi-Label Feature Selection Through Dual Hesitant q-Rung Orthopair Fuzzy Dombi Aggregation Operators

In this article, the feature selection (FS) process is treated as a multi-criteria decision making (MCDM) problem. To account for the imprecision that arises in real-time data, the values of the decision matrix procured through ridge regression are fuzzified into a dual hesitant q-rung orthopair fuzzy set. For the information fusion process, we propose several aggregation operators: the dual hesitant q-rung orthopair fuzzy weighted Dombi arithmetic, weighted Dombi geometric, ordered weighted Dombi arithmetic, and ordered weighted Dombi geometric aggregation operators. A multi-label feature selection method is then built from the MCDM techniques formed by these aggregation operators. The algorithm first obtains the values of the decision matrix through ridge regression. The weight vector required for the MCDM process is calculated using entropy. The data are then fuzzified, and the MCDM process based on the aforementioned aggregation operators is applied. A rank vector is obtained from the score function to select the desired number of features. Note that the algorithm can be altered simply by changing the aggregation operator. An experimental evaluation comparing the proposed method with existing methods across several evaluation metrics demonstrates its effectiveness, and its statistical significance is also assessed.


I. INTRODUCTION
In machine learning, the concept of FS is mainly used as a pre-processing step. Such pre-processing approaches are extremely important when working with high-dimensional datasets. The major advantage of FS is that it can reduce the data dimensionality, thereby improving the speed of the algorithm, which in turn accelerates the performance of the learning algorithm. Single-label learning involves datasets with only one class. However, multi-label datasets have become widespread: for instance, one gene may be associated with multiple functions, several tags may be incorporated on one image, and several topics may be covered in a single document. Hence, there is a necessity for the development of multi-label FS algorithms [1], [2]. Multi-label learning primarily involves two challenges: (i) unlike conventional single-label learning, which contains classes that are mutually exclusive, multi-label learning's classes are often interdependent and associated, making it more difficult to anticipate all relevant labels for a particular instance; (ii) the data involved in multi-label learning are generally of higher dimension. High-dimensional data are prone to the dimensionality curse, which increases the computational cost and limits the generalization capacity of the classifier [3], [4]. FS, which seeks to identify a small subset of features that describes the dataset as well as, if not better than, the original set of features, is an efficient technique to lessen the dimensionality curse. The Binary Relevance approach is compatible with the usual multi-label FS approach, which converts multi-label datasets into single-label datasets before applying classic FS algorithms [5].

(The associate editor coordinating the review of this manuscript and approving it for publication was Tony Thomas. VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/)
The major issue with this technique is that the inter-dependency between the labels is frequently overlooked, which makes it difficult to exploit the label structure that could improve multi-label learning performance while reducing the dimensionality [6], [7]. The authors of [6] used fuzzy neighborhood rough sets to handle multi-label datasets. The authors of [7] carried out attribute reduction for multi-label learning algorithms using fuzzy rough sets. A hesitant fuzzy set based approach was utilized for an ensemble of FS algorithms in [8]. FS for heterogeneous data was carried out in [9] using fuzzy neighborhood multigranulation rough sets. The commonly used FS methods are filter, wrapper and embedded techniques, each with its unique advantages. Filter techniques [10] are independent of the learning algorithm and choose appropriate features based on the general properties of the training data. Such approaches rate features against a set of criteria and delete features with low scores. Their fundamental advantage is low computational complexity, which makes them suitable for use with high-dimensional data. The wrapper technique [11] utilizes a particular learning algorithm as a component of the feature selection process; its results are more effective, but its computational cost is high, so it cannot always be used. Finally, the embedded technique [12] combines the advantages of the two preceding techniques, as they complement each other. Beyond this taxonomy, FS techniques can be classified from the label perspective into three groups: supervised, unsupervised and semi-supervised. In supervised feature selection procedures [13], appropriately labeled training subsamples are provided, and feature relevance is established by analyzing the correlation of each feature with the class. Unsupervised algorithms [14], on the contrary, do not require any labeled training data.
Semi-supervised FS [15] strategies are appropriate when only a few labeled examples exist within the entire training data set. The authors of [10] proposed a filter-based multi-label FS technique, MFS-MCDM, using an MCDM method, the technique for order of preference by similarity to ideal solution (TOPSIS). The authors of [16] combined the advantages of wrapper- and filter-based techniques in a differential-evolution-based FS technique.
Unlike various meta-heuristic approaches such as the gravitational search algorithm, ant colony optimization, and particle swarm optimization, which involve complex optimization processes, MCDM techniques are efficient whenever we need to specify preferences and achieve desired results influenced by the opinions of various decision makers or criteria. It is also important to note that the datasets used for multi-label learning are real-time data involving vagueness and imprecision, which necessitates the use of fuzzy set theory. Fuzzy set theory, developed by Zadeh [17] and further extended to intuitionistic fuzzy sets [18], Pythagorean fuzzy sets [19], q-rung orthopair fuzzy sets [20], hesitant fuzzy sets [21], q-rung orthopair hesitant fuzzy sets [22], dual hesitant fuzzy sets [23], dual hesitant q-rung orthopair fuzzy sets (DHq-ROFSs) [24], etc., has found applications in diverse fields [25]-[27]. Among these, the concept of DHq-ROFSs has gained attention recently due to its advantage of accommodating a greater degree of vagueness [28]-[30]. The use of aggregation operators in different forms of fuzzy sets has made it possible to handle MCDM problems more efficiently. Among the various aggregation operators, the Dombi aggregation operator has the advantage of making the aggregation process simpler through the alteration of the Dombi parameter [30], [31]. By altering the parameter value in the Dombi aggregation operator, we alter its working behavior, changing the norm used for aggregation. The authors of [33] handled a decision-making problem in the q-rung orthopair fuzzy environment using Dombi operators. The authors of [30] implemented Dombi operators in the Bonferroni mean, used them in the DHq-ROFS environment, and applied them to an MCDM technique. In [31], Dombi operators were studied in the Pythagorean fuzzy environment and used in a multi-attribute decision-making problem.
Crop selection MCDM problem was handled in the bipolar neutrosophic fuzzy environment using Dombi operators by the authors of [32].

A. MOTIVATION
Based on the aforementioned works, in this article, we propose the Dombi aggregation operators in the environment of DHq-ROFSs to aggregate the decision matrix procured after calculating the correlation between the data and the labels. The main advantage of this technique is that the vagueness and imprecision occurring in the real time data are considered for the evaluation. Also, the problems involved in multi-label learning such as the correlation and high dimensionality of the dataset are handled efficiently. The aggregation of data is a simple yet effective process that can reduce the curse of dimensionality, hence, reducing the computational cost and time.
B. CONTRIBUTIONS
• Dombi aggregation operators are proposed in the environment of DHq-ROFSs.
• A few basic properties of these operators are discussed.
• A multi-label, filter-based feature selection algorithm is formulated using the proposed aggregation operators.
• The method is evaluated using multiple performance metrics, and a significance test is also carried out.

C. STRUCTURE
The basic structure of this paper is as follows: Section 2 deals with preliminaries of multi-label learning and fuzzy set theory. Section 3 elucidates the proposed operators and their properties. Section 4 elaborates the proposed methodology. Section 5 deals with the experimental results and their discussions. A proper conclusion for the article is given in Section 6.

II. PRELIMINARIES

A. MULTI-LABEL LEARNING
When each sample of the data is associated with multiple labels, it is termed multi-label data. In this type of data, for every feature vector $\hat{X}_a = (\hat{X}_{a1}, \hat{X}_{a2}, \cdots, \hat{X}_{aR})$ there is a corresponding binary label vector $\hat{W}_b = (\hat{W}_{b1}, \hat{W}_{b2}, \cdots, \hat{W}_{bS})$, where $R$ and $S$ represent the number of features and labels, respectively. Multi-label learning primarily considers the construction, from $E$ training samples, of a model that has the capacity to forecast the labels of new instances. The structure of a multi-label dataset is given in Table 1.

Definition 1 (Information Entropy [34]): The correlation between random variables is evaluated using information entropy, which captures the uncertainty degree of a random variable. For a random variable $\hat{C}$ with possible outcomes $c_1, c_2, \ldots, c_k$ occurring with probabilities $P(c_j)$, the entropy is
$$h(\hat{C}) = -\sum_{j=1}^{k} P(c_j)\,\log_2 P(c_j).$$
The normalized value of $h(\hat{C})$ lies in the interval $[0, 1]$; the value 1 depicts an equal distribution between the classes, and 0 depicts that all instances lie in a single class.

Definition 2 (Ridge Regression [35], [36]): Ridge regression is a widely used method for regularizing linear least-squares problems in order to minimize the impact of multicollinearity in linear regression. Consider a feature matrix $\hat{X} \in \mathbb{R}^{E \times R}$, a label matrix $\hat{W} \in \mathbb{R}^{E \times S}$, and a coefficient matrix $Q \in \mathbb{R}^{R \times S}$ that depicts the relationship between the $R$ features and $S$ labels. The ridge regression solution is
$$Q = \hat{X}^{T}\big(\hat{X}\hat{X}^{T} + \hat{\lambda} I\big)^{-1}\hat{W}.$$
Here, $I \in \mathbb{R}^{E \times E}$ is the identity matrix, and $\hat{\lambda} > 0$ is the parameter that regularizes the coefficients so that the optimization function is penalized when the coefficients take large values. It is essential to highlight that the coefficient matrix $Q$ produced by ridge regression on training data can reflect the relevance of features: the higher the significance of feature $a$ in predicting label $b$, the larger the value of $Q_{ab}$.
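As a concrete illustration, the decision matrix of Definition 2 could be computed as in the following sketch. The function name and the toy data are ours, not the paper's; the dual-form solve matches the $(E \times E)$ identity used above.

```python
import numpy as np

def ridge_decision_matrix(X, W, lam=10.0):
    """Closed-form ridge regression: Q[a, b] reflects the importance of
    feature a for predicting label b (X: (E, R), W: (E, S))."""
    E = X.shape[0]
    # Dual form Q = X^T (X X^T + lam*I)^{-1} W, with I of size E x E
    # as in Definition 2 (equal to the primal (R x R) form).
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(E), W)

# Tiny synthetic example (values are illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))                         # E=8 samples, R=4 features
W = rng.integers(0, 2, size=(8, 3)).astype(float)   # S=3 binary labels
Q = ridge_decision_matrix(X, W)
print(Q.shape)  # (4, 3): one relevance score per feature-label pair
```

The dual form shown is algebraically identical to the more common primal solution $(\hat{X}^T\hat{X} + \hat{\lambda}I_R)^{-1}\hat{X}^T\hat{W}$; either may be used depending on whether $E$ or $R$ is smaller.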

Definition 4 ([24]): A dual hesitant q-rung orthopair fuzzy set (DHq-ROFS) $\hat{K}$ on a universe of discourse $\hat{Z}$ is defined as
$$\hat{K} = \{\langle \hat{X}, \hat{G}_{\hat{K}}(\hat{X}), \hat{F}_{\hat{K}}(\hat{X}) \rangle \mid \hat{X} \in \hat{Z}\},$$
where the sets $\hat{G}_{\hat{K}}(\hat{X})$ and $\hat{F}_{\hat{K}}(\hat{X})$ collect the possible membership and non-membership degrees of the element $\hat{X} \in \hat{Z}$, with values in $[0, 1]$ satisfying $\mu^{q} + \nu^{q} \le 1$ for every $\mu \in \hat{G}_{\hat{K}}(\hat{X})$, $\nu \in \hat{F}_{\hat{K}}(\hat{X})$ and $q \ge 1$. The pair $\hat{K} = \langle \hat{G}_{\hat{K}}(\hat{X}), \hat{F}_{\hat{K}}(\hat{X}) \rangle$ is called a dual hesitant q-rung orthopair fuzzy number (DHq-ROFN).

Definition 5 ([24]): Let $\hat{K} = \langle \hat{G}_{\hat{K}}, \hat{F}_{\hat{K}} \rangle$ be a DHq-ROFN. The score function of this DHq-ROFN is given as
$$\hat{S}(\hat{K}) = \frac{1}{l_{\hat{G}}}\sum_{\mu \in \hat{G}_{\hat{K}}} \mu^{q} \;-\; \frac{1}{l_{\hat{F}}}\sum_{\nu \in \hat{F}_{\hat{K}}} \nu^{q},$$
and the accuracy function is given as
$$\hat{A}(\hat{K}) = \frac{1}{l_{\hat{G}}}\sum_{\mu \in \hat{G}_{\hat{K}}} \mu^{q} \;+\; \frac{1}{l_{\hat{F}}}\sum_{\nu \in \hat{F}_{\hat{K}}} \nu^{q},$$
where $l_{\hat{G}}$ and $l_{\hat{F}}$ denote the number of elements in $\hat{G}_{\hat{K}}$ and $\hat{F}_{\hat{K}}$, respectively.
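To make the score and accuracy functions concrete, the following short sketch (our own illustrative code, with $q = 3$) compares two DHq-ROFNs represented as plain Python lists of grades:

```python
def dhq_score(G, F, q=3):
    """Score of a DHq-ROFN (G, F): mean of mu^q over the membership
    grades minus mean of nu^q over the non-membership grades."""
    return sum(m ** q for m in G) / len(G) - sum(n ** q for n in F) / len(F)

def dhq_accuracy(G, F, q=3):
    """Accuracy: same two means, added instead of subtracted."""
    return sum(m ** q for m in G) / len(G) + sum(n ** q for n in F) / len(F)

K1 = ([0.6, 0.7], [0.3])        # higher memberships, low non-membership
K2 = ([0.5], [0.4, 0.5])
print(dhq_score(*K1) > dhq_score(*K2))  # True: K1 ranks above K2
```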

Definition 6 ([24]): The ordering of two DHq-ROFNs $\hat{K}_1$ and $\hat{K}_2$ can be carried out using the score function as follows: if $\hat{S}(\hat{K}_1) > \hat{S}(\hat{K}_2)$, then $\hat{K}_1 \succ \hat{K}_2$. If $\hat{S}(\hat{K}_1) = \hat{S}(\hat{K}_2)$, the accuracy function is used: $\hat{K}_1 \succ \hat{K}_2$ when $\hat{A}(\hat{K}_1) > \hat{A}(\hat{K}_2)$, and $\hat{K}_1 \sim \hat{K}_2$ when the accuracies are also equal.

III. DUAL HESITANT Q-RUNG ORTHOPAIR FUZZY DOMBI AGGREGATION OPERATORS
A. DOMBI OPERATORS

Definition 7: The Dombi operations between any two real numbers $\hat{\alpha}, \hat{\beta} \in (0, 1)$, namely the Dombi t-norm $\hat{T}_D$ and t-conorm $\hat{S}_D$, are defined as
$$\hat{T}_D(\hat{\alpha}, \hat{\beta}) = \frac{1}{1 + \left\{\left(\frac{1-\hat{\alpha}}{\hat{\alpha}}\right)^{\tau} + \left(\frac{1-\hat{\beta}}{\hat{\beta}}\right)^{\tau}\right\}^{1/\tau}},$$
$$\hat{S}_D(\hat{\alpha}, \hat{\beta}) = 1 - \frac{1}{1 + \left\{\left(\frac{\hat{\alpha}}{1-\hat{\alpha}}\right)^{\tau} + \left(\frac{\hat{\beta}}{1-\hat{\beta}}\right)^{\tau}\right\}^{1/\tau}},$$
where $\tau \ge 1$ is the Dombi parameter.

Definition 8: The basic Dombi operations on any two DHq-ROFNs $\hat{K}_1$ and $\hat{K}_2$ — the Dombi sum $\hat{K}_1 \oplus_D \hat{K}_2$, the Dombi product $\hat{K}_1 \otimes_D \hat{K}_2$, the scalar multiple $\hat{\theta}\hat{K}$, and the power $\hat{K}^{\hat{\theta}}$ — are obtained by applying $\hat{S}_D$ and $\hat{T}_D$ to the $q$-th powers of the membership and non-membership grades, taken element-wise over the hesitant sets.

Theorem 1: The basic operations using Dombi operators for two DHq-ROFNs $\hat{K}_1$ and $\hat{K}_2$ given in Definition 8 satisfy the closure property.

Proof:
1) To prove the closure property of $\hat{K}_1 \oplus_D \hat{K}_2$, it suffices to show that the resulting membership and non-membership values remain in $[0, 1]$ and satisfy the q-rung constraint. Considering the membership term first, the Dombi t-conorm of values in $[0, 1]$ again lies in $[0, 1]$, from which the required bound follows. The non-membership term is handled in a similar manner.
2) The closure property of $\hat{K}_1 \otimes_D \hat{K}_2$ can be proved similarly to that of $\hat{K}_1 \oplus_D \hat{K}_2$.
3) The closure property of $\hat{\theta}\hat{K}$ can be proved similarly to that of $\hat{K}_1 \oplus_D \hat{K}_2$.
4) The closure property of $\hat{K}^{\hat{\theta}}$ can be proved similarly to that of $\hat{K}_1 \oplus_D \hat{K}_2$.
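The Dombi t-norm and t-conorm of Definition 7 can be sketched directly (illustrative code, not from the paper); the final check verifies the duality $\hat{S}_D(\hat{\alpha}, \hat{\beta}) = 1 - \hat{T}_D(1-\hat{\alpha}, 1-\hat{\beta})$:

```python
def dombi_tnorm(a, b, tau=1.0):
    """Dombi t-norm for a, b in the open interval (0, 1)."""
    s = ((1 - a) / a) ** tau + ((1 - b) / b) ** tau
    return 1.0 / (1.0 + s ** (1.0 / tau))

def dombi_tconorm(a, b, tau=1.0):
    """Dombi t-conorm, the dual of the t-norm."""
    s = (a / (1 - a)) ** tau + (b / (1 - b)) ** tau
    return 1.0 - 1.0 / (1.0 + s ** (1.0 / tau))

# Duality check: S_D(a, b) == 1 - T_D(1 - a, 1 - b).
a, b = 0.6, 0.3
print(abs(dombi_tconorm(a, b) - (1 - dombi_tnorm(1 - a, 1 - b))) < 1e-12)  # True
```

At $\tau = 1$ the Dombi t-norm reduces to the Hamacher product $ab/(a + b - ab)$, which illustrates how changing the Dombi parameter changes the effective norm used for aggregation.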
Proof: Mathematical induction is used to prove this assertion. The result clearly holds for $m = 1$. Suppose the result holds for $m = v - 1$. To show that it holds for $m = v$, express the aggregation of $v$ DHq-ROFNs as the Dombi sum of the aggregate of the first $v - 1$ terms with the $v$-th term; applying the operations of Definition 8 to this sum yields the stated closed form for $m = v$. Hence proved.
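The closed form established by the induction above can be sketched for the special case where each hesitant set contains a single grade, following the pattern of published q-rung orthopair fuzzy Dombi weighted averaging operators (the function name, parameter defaults and toy values are our assumptions):

```python
def dombi_wa(mus, nus, w, q=3, tau=3.0):
    """Weighted Dombi arithmetic aggregation of single-grade q-rung
    orthopair values (mus[j], nus[j]) with weights w summing to 1."""
    # Membership: Dombi-weighted t-conorm acting on mu^q.
    s_mu = sum(wj * (m ** q / (1 - m ** q)) ** tau for wj, m in zip(w, mus))
    mu = (1 - 1 / (1 + s_mu ** (1 / tau))) ** (1 / q)
    # Non-membership: Dombi-weighted t-norm acting on nu^q.
    s_nu = sum(wj * ((1 - n ** q) / n ** q) ** tau for wj, n in zip(w, nus))
    nu = (1 / (1 + s_nu ** (1 / tau))) ** (1 / q)
    return mu, nu

# Idempotency: aggregating identical values returns them unchanged.
mu, nu = dombi_wa([0.6, 0.6], [0.5, 0.5], [0.3, 0.7])
print(round(mu, 6), round(nu, 6))  # 0.6 0.5
```

Idempotency and boundedness between the smallest and largest input grades are the two sanity properties one would expect of any weighted averaging operator of this family.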

IV. PROPOSED METHODOLOGY
The methodology of the proposed technique is elucidated in this section. The flow chart of the proposed algorithm is provided in figure 1. The proposed method is filter based: it converts the multi-label FS problem into an MCDM problem and uses dual hesitant q-rung orthopair fuzzy Dombi aggregation operators to solve it.

A. ALGORITHM
In multi-label data, there are feature and label matrices, structured as in Table 1.

Step 1: An empty vector is defined as the feature ranking vector so that features can be added to it.
Step 2: Ridge regression is used to determine the correlation between the features and the labels; the decision matrix is obtained as $\hat{D} = \big[\hat{Q}(\hat{X}_a, \hat{W}_b)\big]_{R \times S}$. The remaining steps of the algorithm listing are:
3: The values are fuzzified using equations (13) and (14).
4: The values are normalized.
5: The values are aggregated using equations (6), (8), (10) and (12).
6: The $\hat{Z}$ vector is obtained by applying the score function (3) to the aggregated values.
9: $\hat{F}$ = rank the vectors based on the score function in descending order.
Here, $\hat{Q}(\hat{X}_a, \hat{W}_b)$ denotes the importance of feature $a$ to label $b$, and $\hat{\lambda}$ denotes the regularization parameter, which is set to 10 after examining the data for multiple values.
Steps 3-7: The weight vector is calculated using the information entropy of each label; its structure is
$$\hat{O} = \big(H(\hat{W}(:, 1)),\, H(\hat{W}(:, 2)),\, \ldots,\, H(\hat{W}(:, S))\big),$$
and in step 7 it is normalized.

Step 8: Having obtained the decision matrix and the weight vector, we proceed with the MCDM methodology.
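The entropy-based weight vector of steps 3-7 could be computed as in the following sketch (the function names and the toy label matrix are ours):

```python
import numpy as np

def label_entropy(col):
    """Shannon entropy (base 2) of a binary label column."""
    p = np.mean(col)
    if p in (0.0, 1.0):
        return 0.0  # constant label carries no information
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def entropy_weights(W):
    """Weight vector O: one entropy per label column, normalized to sum 1."""
    H = np.array([label_entropy(W[:, b]) for b in range(W.shape[1])])
    return H / H.sum()

W = np.array([[1, 0, 1],
              [0, 0, 1],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
print(entropy_weights(W))  # third label is constant, so its weight is 0
```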

1) Fuzzification:
The obtained data are converted into a dual hesitant q-rung orthopair fuzzy set. The membership values are obtained using the sigmoidal function
$$\mu(\hat{b}) = \frac{1}{1 + e^{-\hat{t}(\hat{b} - \hat{v})}}, \quad \hat{b} \in \hat{A},$$
where $\hat{v}$ and $\hat{t}$ denote the distance of the inflection point from the origin and the steepness of the function, respectively. The values are then converted into triangular fuzzy numbers with equal intervals. The final form of the crisp element as a DHq-ROFN is $\hat{K} = \langle \{\mu_1, \mu_2, \mu_3\}, \{\nu_1, \nu_2, \nu_3\} \rangle_q$, which satisfies the condition $\max(\mu_{\hat{G}})^{q} + \max(\nu_{\hat{F}})^{q} \le 1$.

2) Normalization: As there are two types of criteria, namely cost $\hat{\delta}_c$ and benefit $\hat{\delta}_b$, it is essential to normalize them into a single criterion type; for cost criteria, the membership and non-membership sets of the DHq-ROFN are interchanged, while benefit criteria are left unchanged.

3) Aggregation: The values are aggregated using equations (6), (8), (10), (12), and are then ranked according to the values obtained from the score function (3).
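Since the triangular construction is only partly specified above, the following is one plausible reading (all parameter values are our assumptions): the sigmoid gives the central membership grade, three equally spaced grades form the hesitant set, and the largest non-membership grade is chosen so the q-rung constraint holds with equality.

```python
import math

def fuzzify(x, t=1.0, v=0.0, q=3, delta=0.05):
    """Convert a crisp ridge-regression score x into a DHq-ROFN (G, F)
    with three grades each. A sketch: t, v and delta are assumed."""
    mu = 1.0 / (1.0 + math.exp(-t * (x - v)))   # sigmoid membership
    G = sorted(max(mu - d, 0.0) for d in (2 * delta, delta, 0.0))
    # Largest admissible non-membership under the q-rung constraint.
    nu = (1.0 - max(G) ** q) ** (1.0 / q)
    F = sorted(max(nu - d, 0.0) for d in (2 * delta, delta, 0.0))
    return G, F

G, F = fuzzify(0.8)
print(max(G) ** 3 + max(F) ** 3 <= 1.0 + 1e-12)  # True: constraint holds
```

Any alternative spread (for instance perturbing the sigmoid input rather than its output) would work equally well, provided the constraint check above still passes.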

V. EXPERIMENTAL STUDIES
In this section, we consider three multi-label FS algorithms, MLACO [36], BMFS [37] and MFS-MCDM [7], and compare their performance with that of the proposed techniques. These use different approaches: ant colony optimization, a bipartite-matching-based strategy, and an MCDM approach, respectively.

VI. RESULTS AND DISCUSSIONS
The parameter values for the multi-label FS methods MLACO, BMFS and MFS-MCDM were set as suggested in the corresponding articles. The MLKNN [31] classifier was utilized for this process, with the number of neighbors set to 10 for every procedure. In each test, 60% of the samples were allotted for training and 40% for testing. The results reported are the average of 20 iterations of each method. The feature subset size was varied from 10 to 100 in steps of 10. It is essential to note that the proposed method can choose the number of features. Figures 2-6 show the results procured for the proposed methods and the comparison methods in terms of the metrics accuracy, average precision, coverage, Hamming loss, One-Error and Ranking loss. The x-axis in these graphs reflects the number of features extracted, while the y-axis represents the performance of the classifier.
In multi-label feature selection, features are evaluated based on labeled data. As a result, information from the labels should be merged to construct an effective method, and features should be assessed based on how they relate to all labels, not just one. MCDM is one of the most effective strategies for dealing with such issues. Also, as the datasets are real-time data, incorporating fuzzy set theory becomes essential. In this article, we have used the dual hesitant q-rung orthopair fuzzy set to accommodate a greater degree of vagueness in the datasets. Since multi-label feature selection requires aggregating the decision depending on the criteria, we propose an MCDM technique based on Dombi aggregation operators, which involve the Dombi parameter, to analyze the correlation between the data. The decision matrix is formed considering the labels as criteria. We employed a ridge regression approach, built on a subspace learning strategy, to represent the relevance of each feature. Indeed, with ridge regression we may acquire the gradient of a line relative to a feature, and if the resulting value is substantial, the variation along that feature is substantial, denoting a strong link between the feature and the label. For the proposed technique, the data acquired by ridge regression was used as the decision matrix, and the weights were obtained from the entropy of each label. From figures 2-6, we can infer that the proposed methods perform efficiently on most of the datasets; only on the medical dataset was their performance ranked second. Among the proposed techniques, MFS-ODG and MFS-ODA, which are based on the ordered operators DHq-ROFOWDG and DHq-ROFOWDA, performed most efficiently in the majority of cases.
Thus, the proposed techniques were able to consider the impreciseness and also perform efficiently.
Sensitivity analysis was carried out for the algorithm by altering the value of $\tau$ over 1, 2, 3, 10 and 100. From the results for accuracy, average precision, coverage, Hamming loss, One-Error and Ranking loss, we can infer that altering the parameter $\tau$ changed these results depending on the dataset. The results procured for $\tau = 3$ were efficient on average; hence, in figures 2-6, the values plotted for MFS-DA, MFS-ODA, MFS-DG and MFS-ODG are given for $\tau = 3$.
The significance of the proposed algorithm is evaluated using the Friedman test [42] and the post-hoc Conover test [43] for comparison with the existing methods. The significance level is set to 0.05. In the Friedman test, if the p-value is greater than this significance level, the proposed method is considered as significant as the existing methods. If the p-value of the Friedman test lies below the significance level, we proceed with the post-hoc Conover test. In the post-hoc Conover test, if the p-value lies above the significance level, we conclude that the proposed method is as significant as the existing techniques; otherwise, we conclude that there is no significance. We obtained significance for our proposed techniques after performing the test. The tables are omitted here.
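As an illustration of the Friedman step, the statistic can be computed by hand (our own toy scores; with $k = 3$ methods the chi-square has 2 degrees of freedom, for which the survival function is simply $e^{-x/2}$). The post-hoc Conover test is available in, e.g., the scikit-posthocs package and is not sketched here.

```python
import math

def friedman_test(scores):
    """Friedman test on a (datasets x methods) score table.

    Returns the chi-square statistic and (valid for k = 3 methods,
    i.e. 2 degrees of freedom) its asymptotic p-value exp(-x/2).
    Assumes no tied scores within a row."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j])  # rank 1 = worst
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    chi2 = 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) \
        - 3.0 * n * (k + 1)
    return chi2, math.exp(-chi2 / 2.0)

# Rows: datasets; columns: three competing methods (scores illustrative).
table = [
    [0.81, 0.78, 0.70],
    [0.77, 0.74, 0.69],
    [0.69, 0.66, 0.60],
    [0.83, 0.80, 0.72],
    [0.74, 0.71, 0.65],
]
chi2, p = friedman_test(table)
print(round(chi2, 2), p < 0.05)  # 10.0 True -> proceed to post-hoc test
```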

VII. CONCLUSION
In this article, filter-based FS algorithms are proposed. This method transfers the feature and label spaces to a multi-criteria decision-making problem and utilizes a subspace learning technique to determine the correlation between features and labels. After measuring this correlation, we supply it as the decision-making data to the MCDM technique using the proposed aggregation operators. To the best of our knowledge, this is the first time fuzzy aggregation operators are utilized in multi-label learning. To consider the impact of different labels, entropy was used as the weighting technique. Various evaluation metrics were used to demonstrate the efficiency of the proposed technique, and significance tests were also carried out. The limitation of this work is that it does not consider the interrelationship between the membership and non-membership terms.
Future work could concentrate on the development of the Frank t-norm and t-conorm for this fuzzy environment and their application to various feature selection problems. Inspired by the works of [44], one could also consider the interrelationship between the membership and non-membership values using the Archimedean t-norm and t-conorm and build interactive aggregation operators. Researchers can develop various FS methods, such as ensemble FS methods, using the proposed aggregation operators. Various distance measures, similarity measures and other aggregation operators can be used to build an MCDM methodology and applied to multi-label FS problems.