A Novel Framework of Two Successive Feature Selection Levels Using Weight-Based Procedure for Voice-Loss Detection in Parkinson’s Disease

Parkinson’s disease (PD) is one of the common neurodegenerative disorders, and speech/voice disorder is considered one of its early-stage symptoms. Acoustic and speech signal processing methods can potentially evaluate and measure PD-related vocal impairment. The present work proposes a novel feature selection framework that uses two levels of feature selection for voice-loss detection in PD patients. At the first level, the principal component analysis (PCA) and the eigenvector centrality feature selection (ECFS) methods are applied independently, and the features selected by each method form a separate sublist, namely the ECFS selected features sublist and the PCA selected features sublist. The first set, which is the first level selection set, is the union of these two sublists and contains the top-selected features from both methods. In the training phase, a second level selection, which forms the second set (a subset of the first set), is performed to calculate the proposed weight of each selection method. Since the ECFS provided superior performance to the PCA in the first level selection, the ECFS is applied to the first set to find weight values based on the contribution/impact of the top-selected PCA and ECFS features in the second level. This weight is determined as a proposed ratio, which is multiplied directly by the ECFS features selected in the first level. The weighted ECFS features are then combined with the unchanged PCA features to avoid ignoring any of the top-ranked features from the first level. This combination forms the final weighted-hybrid selected features, which are fed to a support vector machine (SVM) classifier for evaluation. In the test phase, the generated weight is used directly without any further need for the second level selection.
Several comparative studies were conducted to evaluate the proposed feature selection performance for PD voice-loss detection. The experimental results established the superiority of the proposed procedure using cubic kernel-SVM with 94% accuracy for voice-loss detection in PD, while, with the same classifier, 88% accuracy was achieved without using the proposed selection method.


I. INTRODUCTION
The associate editor coordinating the review of this manuscript and approving it for publication was Jenny Mahoney.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Parkinson's disease is a progressive, enduring neurological disease leading to deterioration or death of the brain cells. It has several symptoms, including memory problems and depression, as well as movement problems such as slowness, stiffness, and tremor. Also, walking/balance problems, such as freezing of the gait, are observed at the last stages of PD. Such indicators and their progression differ from patient to patient [1]. Generally, there are five main PD stages. In the earliest stage (stage 1), the PD patient suffers from mild symptoms on one side of the body, such as rigidity and tremor in one hand or leg [2]. In stage 2, symptoms appear on both sides of the body without balance impairment. In this stage, the PD patient suffers from speech abnormalities, loss of facial expression, stiffness of the trunk muscles, and stooped posture [3]. In the moderate stage (stage 3), the patient suffers from slowness of movement, loss of balance, and falls. In the last two severe stages (stages 4 and 5), patients are unable to live independently and need assistance in walking and standing. Generally, in stage 5, the PD patient may fall while standing and may suffer from freezing of gait while walking. Furthermore, the patient's physical and mental vitality decline [4]. Commonly, not all patients experience these five stages of PD progression in order. In addition, the five stages vary in severity and duration from patient to patient. Some PD patients pass through all five stages, whereas others skip from an early stage to an advanced one without experiencing the stages in between.
This leads to difficulty and complications in predicting PD progression, which has attracted researchers to this active research area to develop automated and accurate PD detection and prediction models [5]-[11]. For PD monitoring and telemedicine systems, detection and diagnosis at the early stages become essential. In these stages, speech abnormalities are observed, including monotone voice, soft voice, slurred speech, and fading of the voice's volume after starting to speak loudly [12], [13]. These changes in the voice characteristics of PD patients are considered a milestone for early detection of PD using feature extraction procedures and classification techniques. Consequently, recent studies have exploited the link between PD and speech loss and weakness, where disordered vowels show widespread changes in the vibrations, from approximately periodic to extremely aperiodic complex patterns. Such random properties in PD patients' sounds make it possible to extract an enormous number of dysphonia features, which leads to inconsistency in the feature space. Several diagnostic methods for Parkinson's disease were based on speech signals [14]-[23]. Moreover, different speech signal processing methods were designed for detecting PD cases from the variation in the speech signals and for grading PD severity. For example, Sakar et al. [24] applied machine learning-based classification to a large set of recorded voice samples of sentences, words, and sustained vowels from PD patients. The results showed that the sustained vowels carry more discriminating characteristics than the other types of samples.
Generally, the association between various speech measures and movement (non-speech) measurements in PD patients has a great impact on the detection of PD cases. This assertion was verified by Goberman [25] through the analysis of acoustic articulation, prosody, and phonation. The results showed that seven acoustic measures out of sixteen speech features were considerably associated with movement measures, such as facial expression, gait, postural stability, posture, rest tremor, and postural tremor. These results established the correlation between the speech measures and both axial and non-axial motor signs. Another study was conducted by Orozco-Arroyave et al. [26] on recorded speech in different languages, namely German, Czech, and Spanish. Initially, a segmentation process was applied to the speech signals of utterances to separate the unvoiced and voiced frames. Afterward, the energy of the unvoiced sounds was modeled using 25 bands on the Bark scale and 12 Mel-frequency cepstral coefficients. This approach classified the speech signals of PD patients and healthy control individuals with an accuracy ranging from 85% to 99%, depending on the spoken language.
To separate healthy individuals from early-stage PD patients, Rusz et al. [27] assessed vocal impairment to determine whether speech disorders are present in the early phases of PD. The results indicated that the main features for detecting PD cases are the variations in the fundamental frequency, where 78% of the early PD patients suffer from vocal impairment. Little et al. [28] extracted fractal scaling and recurrence features during speech analysis. These features were used to detect speech disorders as symptoms of PD. Afterward, a bootstrapped classifier was applied to discriminate disordered from normal voices.
In a healthy voice, the pattern of vocal fold vibration is approximately periodic. Thus, Tsanas et al. [29] extracted a large number of dysphonia features (132 dysphonia measures), including shimmer and jitter features as well as other features, from the speech signals of both healthy and PD individuals. To overcome the uneven feature space and the possibility of over-fitting, different feature selection (FS) methods were compared, such as mRMR (minimum redundancy maximum relevance), LLBFS (local learning-based feature selection), LASSO (least absolute shrinkage and selection operator), and Relief [30]. Finally, support vector machines (SVMs) and random forest classifiers were used to discriminate healthy individuals from PD cases. Recently, for PD patients' detection, Sakar et al. [16] extracted tunable-Q wavelet coefficients and Mel-frequency cepstral coefficients from the recorded voice signals of 252 individuals. Afterward, ensemble learning techniques were applied for classification after selecting the significant features based on their relevance using the mRMR FS scheme. The ensembles included the k-nearest neighbor, multilayer perceptron, random forest, SVM with linear/RBF kernels, logistic regression, and Naive Bayes classifiers. The highest achieved accuracy was 86.0%, with a 0.84 F1-score, obtained by feeding the top 50 selected features to an SVM classifier with an RBF kernel.
The preceding studies demonstrated the value of voice signal analysis for detecting PD cases, where several features can be extracted, such as glottis quotient, pitch-period entropy, F0-related measures, recurrence period density entropy, tunable-Q wavelet coefficients, Mel-frequency cepstral coefficients, and empirical mode decomposition excitation ratio [16], [31]. This enormous number of different extracted features and dysphonia measures leads to an inconsistent feature space with the possibility of over-fitting. This inspired the present work to propose a novel FS procedure that reduces the feature subset and facilitates the interpretation of the speech/voice signals, giving insight into the PD detection problem using the most significant features. Using a reduced number of features also supports the efficiency and power of the detection and classification model.
In the present work, the significant features among those extracted by Sakar et al. [16] are selected using the proposed novel feature selection framework for voice-loss detection in PD patients. A first-level selection is performed by selecting the significant features using the principal component analysis (PCA) and the eigenvector centrality feature selection (ECFS) method separately, and then merging them through the union of the two subsets containing the top-ranked features of each. Afterward, the second level selection is conducted by applying the ECFS again to the hybrid selected features to find a weight factor. Finally, the weighted-hybrid selected features are used as the feature vector space that is input to the cubic-SVM classifier for distinguishing the PD cases. Different comparative studies were also conducted to evaluate the performance of the proposed feature selection method for PD voice-loss detection.
The structure of the remaining sections is as follows. Section II includes the dataset description, background, and methodology of extracting features and applying feature selection methods by introducing the proposed procedure. In sections III and IV, the experimental results with comparative studies are reported and discussed, respectively. Finally, the proposed work is concluded in section V.

A. DATASET
The data set consists of voice signals recorded from 188 PD patients and 64 healthy persons as a control group. Each subject has 3 recordings (repetitions), which gives (188 + 64) × 3 = 756 samples in total. From each of these 756 instances, 753 features (attributes) were extracted [16]. Speech features are employed to assess PD patients. The most popular baseline speech features are the fundamental frequency parameters, harmonicity parameters, jitter, shimmer, wavelet transform (WT) coefficients, Mel-frequency cepstral coefficients (MFCCs), recurrence period density entropy (RPDE), pitch period entropy (PPE), and detrended fluctuation analysis (DFA) [16], [29]-[31]. The 753 features used in the present work were obtained with speech signal processing procedures [16] and include WT-based features, time-frequency features, tunable Q-factor wavelet transform (TQWT) features, MFCCs, and vocal fold features, all extracted from the recorded speech of the subjects. These features are related to the effects of PD on speech: the voice becomes softer, the speech might be incomprehensible, slurred, or rapid, and the tone of the voice might become monotone.

B. METHODOLOGY OF VOICE-LOSS DETECTION IN PARKINSON'S DISEASE

1) TRADITIONAL FEATURES EXTRACTION AND SELECTION
Several speech signal processing procedures were applied to the recorded speech of PD patients to extract clinically useful information for PD assessment. The resulting features in [16] are used in the present work to evaluate the proposed FS method. The large number of features (753) increases the dimensionality of the feature space and the likelihood that irrelevant features are present. Accordingly, FS is the main process for reducing the dimensionality of the feature space by selecting significant features. FS techniques can be categorized into filter methods, which use a proxy measure to score features, wrapper methods, and embedded methods [32]. Robust FS methods select significant features and discard irrelevant and redundant ones [33]. This improves the data quality and, accordingly, the performance of the classifier. In the present work, the PCA and ECFS methods are applied as a hybrid combination to gain the benefits of both.

a: PRINCIPAL COMPONENT ANALYSIS
The PCA is used to reduce the dimensionality of the feature space of the samples. This is achieved by transforming the data into a new set of variables using the eigenvectors and eigenvalues of the covariance matrix. In the PCA, the principal components are computed for an input matrix $X$ of size $m \times n$, containing $m$ samples of $n$ features, from the eigenvalues and eigenvectors of the covariance matrix of the mean-centered data $X - \eta_X$, where $\eta_X$ is the mean value of the features. The principal components matrix $R$ is then obtained from the singular value decomposition, where $S$ is a diagonal matrix of the singular values and $U$ is an $n \times n$ matrix. Each principal component $R_j$ is the corresponding left-singular vector scaled by the standard deviation of the data points in that direction, and the data variance in that direction is given by the eigenvalue $\lambda_j$. The output of the PCA is the matrix $R$, which contains the principal components $R_j$ of the input samples. Therefore, in the present work, the PCA is applied to the feature space, and only the principal components of the input samples that provide a good description of the data are considered. Furthermore, the ECFS method is applied in the present work.
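For reference, the PCA steps described above can be written in a standard textbook form. This is only a sketch under the usual thin-SVD convention; the normalization and the matrix dimensions follow that convention and may differ from the notation used in the source, and the symbols $X_c$, $V$, $s_j$, and $u_j$ are introduced here for illustration.

```latex
\begin{align}
  % Mean-centre the m-by-n data matrix X (eta_X is the vector of feature means)
  X_c &= X - \mathbf{1}\,\eta_X^{\top} \\
  % Sample covariance matrix of the n features
  C &= \frac{1}{m-1}\, X_c^{\top} X_c \\
  % Singular value decomposition of the centred data
  X_c &= U S V^{\top} \\
  % Principal components: left-singular vectors scaled by the singular values
  R &= U S = X_c V, \qquad R_j = s_j\, u_j \\
  % Variance along the j-th principal direction (j-th eigenvalue of C)
  \lambda_j &= \frac{s_j^{2}}{m-1}
\end{align}
```

In this form, keeping the first $r$ columns of $R$ corresponds to keeping the $r$ directions with the largest variances $\lambda_j$.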

b: EIGENVECTOR CENTRALITY FEATURE SELECTION
The ECFS method ranks the extracted features according to their relevance. It is a graph-based feature selection method that ranks features according to a graph centrality measure (eigenvector centrality). It maps the features to nodes in a graph and scores the edges connecting them, where the edges define the paths between features. For a set of features $F = \{f_1, \ldots, f_n\}$, an undirected graph is defined as [34]: $G = (V, E)$, where $V$ is the set of vertices corresponding to the features $f_i$, and $E$ represents the weighted edges between features. To define the nature of the weighted edges, an adjacency matrix $A$ associated with $G$ is used.
Each element $a_{ij}$ of $A$ represents a pairwise potential term and is given by: $a_{ij} = \phi(f_i, f_j)$, where the potential $\phi(f_i, f_j)$ is a binary function of the nodes. The adjacency matrix of the graph is expressed in [34] in terms of a loading coefficient $\alpha \in [0, 1]$ and a kernel $K$ obtained using the Fisher criterion. In general, the adjacency matrix of a directed graph is an $n \times n$ matrix of nonnegative integers, where $a_{ij}$ is the number of arrows from node $i$ to node $j$. Since the FS is used mainly to find discriminating features, only the features that can distinguish the classes are kept. Hence, mutual information is used to rank the features by assigning a high score to the features that best predict each class. In this scoring, $S_i$ is the scoring function used to select the features, $Z$ is the set of features $F = \{f_1, \ldots, f_n\}$ with $1 \le i \le n$ and $n = 753$, and $Y$ is the set of class labels $y$. Also, $p(\cdot, \cdot)$ is the joint probability distribution function, which measures the likelihood of the two events (a feature and a certain class) occurring together. Thus, the probability of a feature occurring together with a certain class is measured, and the score is computed to keep only the features that are related to, or lead to, these classes. In the ECFS method, the eigenvector of $A$ associated with the largest eigenvalue, denoted $v_0$, is calculated; it represents the strength of the connections between the nodes (i.e., features).
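The ranking step described above can be sketched with a plain power iteration that approximates the leading eigenvector of a nonnegative adjacency matrix. This is a minimal illustration, not the ECFS implementation of [34]: the construction of $A$ from the kernel $K$ and the coefficient $\alpha$ is not reproduced, and the function names and the 3-feature matrix below are hypothetical.

```python
def eigenvector_centrality(adjacency, iterations=1000, tol=1e-10):
    """Approximate the leading eigenvector of a nonnegative square matrix
    by power iteration; the result is normalised to sum to 1."""
    n = len(adjacency)
    v = [1.0 / n] * n  # start from a uniform score vector
    for _ in range(iterations):
        # One matrix-vector product: w = A v
        w = [sum(adjacency[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        if total == 0:
            raise ValueError("adjacency matrix needs at least one positive entry")
        w = [x / total for x in w]
        # Stop once the scores no longer change noticeably
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


def rank_features(adjacency):
    """Return feature indices sorted from most to least central."""
    scores = eigenvector_centrality(adjacency)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical symmetric adjacency matrix for three features
# (illustrative edge weights only, not taken from the source).
A = [
    [0.0, 0.9, 0.8],
    [0.9, 0.0, 0.1],
    [0.8, 0.1, 0.0],
]
scores = eigenvector_centrality(A)
ranking = rank_features(A)
print(ranking)
```

Selecting the top-ranked features then amounts to keeping the first entries of `ranking`.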

2) PROPOSED WEIGHTED-BASED TWO-LEVEL FEATURE SELECTION HYBRIDIZATION

a: FIRST FEATURE SELECTION LEVEL (SET 1)
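In the first feature selection level, the top-ranked features selected independently by the PCA, $PC_d$, and by the ECFS, $EV_b$, are combined through the union of the two sublists: $FV_1 = \{PC_d, EV_b\}$, where $FV_1$ is the set of the combined feature vector of level 1 selection, containing $h = d + b$ features.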

b: SECOND FEATURE SELECTION LEVEL (SET 2)
In the present work, the proposed second feature selection level is applied to $FV_1$ using the ECFS method again to determine the weights of the selected features. Consequently, the newly generated feature vector $FV_2$ is expressed as:
$$FV_2 = \{PC_k, EV_l\}$$
where $PC_k$ and $EV_l$ are the top-ranked features selected by the second level selection. Here, $k < d$ and $l \le b$, where $d$ and $k$ are the numbers of selected PC features in $FV_1$ and $FV_2$, respectively, and $b$ and $l$ are the numbers of selected EV features in $FV_1$ and $FV_2$, respectively. In addition, $g < h$, where $g$ is the total number of features selected by the second level selection, including $PC_k$ and $EV_l$. The set of features selected at the second level ($S_2$) is a subset of the set of top-selected features from the first level selection ($S_1$). Consequently, a newly proposed non-zero weight $w_{impact}$ is defined to indicate the ratio and impact of the selected PCA features or ECFS features that have the greatest contribution in $FV_2$, where the features from each method are labeled. This weight is proposed to increase the effect (impact) of these features on the overall selected feature set. The proposed weight factor is calculated using the following expression:
$$w_{impact} = \frac{k}{d} + \frac{l}{b} \qquad (12)$$
In the present work, according to the reported results, $l/b$ is greater than $k/d$; accordingly, $w_{impact}$ is multiplied by the features selected by the ECFS method in $FV_1$ using the following expression:
$$EV_{b-weighted} = w_{impact} \times EV_b$$
Hence, the final hybrid combination consists of $PC_d$ and $EV_{b-weighted}$, which can be given by:
$$FV_f = \{PC_d, EV_{b-weighted}\} \qquad (14)$$
where $FV_f$ includes $h$ features in each sample, namely the features of the first-level selection step modified using $w_{impact}$, which form the union of the selected features in the proposed method. Hence, the proposed method assigns feature weights that signify the prominence of the selected features, based on the highest achieved classification accuracy.
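The weight computation can be sketched as a count over the labeled features that survive the second level selection. The function name and the label strings are assumptions made for illustration; the counts in the usage lines are those reported in the Results section (50 of 100 PCA features and 300 of 300 ECFS features retained).

```python
def impact_weight(selected_labels, d, b):
    """Compute w_impact = k/d + l/b.

    selected_labels: one "PCA" or "ECFS" tag per feature kept in FV_2
    d: number of PCA features in FV_1
    b: number of ECFS features in FV_1
    """
    labels = list(selected_labels)
    k = labels.count("PCA")   # PCA features kept by the second level
    l = labels.count("ECFS")  # ECFS features kept by the second level
    if d <= 0 or b <= 0:
        raise ValueError("d and b must be positive")
    if k > d or l > b:
        raise ValueError("FV_2 cannot keep more features than FV_1 contains")
    return k / d + l / b


# Counts reported in the Results section of the source.
fv2_labels = ["PCA"] * 50 + ["ECFS"] * 300
w_impact = impact_weight(fv2_labels, d=100, b=300)
print(w_impact)
```

Because each ratio lies between 0 and 1, the weight lies between 0 and 2, and it exceeds 1 only when both methods contribute features to the second-level set.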

3) OVERALL PROPOSED VOICE-LOSS DETECTION IN PARKINSON'S DISEASE
After selecting the significant features and applying the weight, the final pool of features $FV_f$ is used in the classification-based PD detection. Different types of machine learning classifiers can be used to detect voice-loss in PD. Here, the weighted-hybrid features are fed to a cubic-SVM. In addition, a number of classifiers, including the artificial neural network (ANN) and SVMs with different kernels, were used for a comparative study. The proposed overall system of voice-loss detection is illustrated in Figure 1 and consists of two phases, namely train and test. In the proposed feature selection method, the significant features selected by both the PCA and the ECFS are weighted without ignoring any of them. In the second level selection, the ECFS is applied for a second time to $FV_1$ to determine the value of $w_{impact}$ based on the impact and the number of features in $PC_k$ and $EV_l$. Then, the calculated weight factor is multiplied by $EV_b$, since the number of ECFS selected features in $FV_2$ is greater than the number of selected PCA features, as shown in the results section, leading to the final selected feature subset $FV_f$. The final weighted-hybrid features are then used to detect the PD cases and to distinguish the healthy class from the PD patient class: $FV_f$ is input to the trained cubic-SVM classifier to obtain the final PD detection from the classification results. These steps are illustrated in Fig. 1(a) for the training phase.
In the training phase, the results of the second level selection showed that the ECFS selected features have a greater impact than the PC selected features. Consequently, after $w_{impact}$ is calculated (see the Results section), it is multiplied by the ECFS selected features directly in the test phase. In this test phase, the combination of features $FV_1$ is used with the weight factor obtained from the second level selection in the training phase to determine the final selected feature vector $FV_f$. The final design of the proposed system is shown for the test phase in Fig. 1(b). In Fig. 1, the final binary classifier using the selected features $FV_f$ is able to accurately discriminate the PD patient cases from the healthy control ones.
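The test-phase step can be sketched as follows: the weight learned once in the training phase is stored and reused to scale the ECFS block of each new sample, while the PCA block is left unchanged. The function name, the ordering of the two blocks inside the vector, and the toy sample values are assumptions for illustration only.

```python
def build_final_vector(pca_features, ecfs_features, w_impact):
    """Form FV_f = {PC_d, w_impact * EV_b} for one sample.

    PCA features are passed through unchanged; every ECFS feature is
    multiplied by the stored weight.
    """
    weighted_ecfs = [w_impact * value for value in ecfs_features]
    return list(pca_features) + weighted_ecfs


# Weight obtained once in the training phase (value reported in the source).
W_IMPACT = 1.5

# Hypothetical toy sample with d = 2 PCA features and b = 3 ECFS features.
sample_pca = [0.2, -1.0]
sample_ecfs = [2.0, 4.0, -6.0]

fv_f = build_final_vector(sample_pca, sample_ecfs, W_IMPACT)
print(fv_f)
```

No second-level selection is run at test time in this sketch; only the stored weight is needed. Since a polynomial-kernel SVM is not invariant to feature scaling, multiplying one block of features by a constant changes how strongly that block influences the kernel values.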

III. RESULTS AND DISCUSSION
The proposed system was designed to reduce the dimensionality of the features with accurate classification for voice-loss detection in PD cases.

A. PERFORMANCE EVALUATION COMPARATIVE STUDY FOR DIFFERENT CLASSIFIERS WITHOUT FEATURE SELECTION
To ensure the efficiency of the proposed FS framework, a comparative study was conducted between the cubic-SVM and another 16 broadly used machine-learning procedures. The classification performance of these classifiers was evaluated using all 753 features of each of the 756 samples, as demonstrated in Figure 2.
The performance of the cubic-SVM classifier when selecting the top-ranked principal components of the input samples, for different numbers of principal components r, is reported in Table 1 as an example; the other possibilities were also examined. Table 1 shows that the highest classification accuracy, 91.1%, is achieved by selecting the first 100 top-ranked principal components, which are used in the subsequent steps as the features selected by the PCA.

C. STEP 2: CUBIC-SVM BASED ECFS
In this section, all the 753 features extracted from the voice signal in [16] were used. During the classification process, 70% of the samples were used for training and 30% for testing. Then, feature ranking using the ECFS method was applied to select the best top-ranked features. Table 2 reports the accuracy of the cubic-SVM classifier after feature ranking using the first 753, 700, 600, 500, 300, 200, 100, and 50 ranked/selected features for all samples, as an example of the results obtained from a grid search for the best number of selected ECFS features. Table 2 shows that the best classification accuracy for voice-loss detection, 93.1%, is obtained by selecting the first 300 features using the ECFS method. Accordingly, these 300 ECFS top-ranked features are used in the subsequent steps.
This step in the results section applied the procedure mentioned in section 2.1 of the first level selection. The proposed study gained the benefits of combining the features selected by the two FS methods in steps 1 and 2 to obtain the hybrid model of selected features. In the first level selection, the 300 top-ranked features selected using the ECFS (step 2) and the 100 top-ranked principal components selected using the PCA (step 1) were integrated, giving a total of 400 features in a set called $S_1$, i.e., $N(FV_1) = 400$, which is used in the proposed hybrid system. These 400 features were applied to a cubic-SVM, where the input has a size of 756 × 400 and the target is a 756 × 1 matrix with '1' indicating a PD patient and '-1' indicating a healthy individual; the cubic-SVM achieved about 93.3% accuracy. This result indicates that the hybridization increases the classification performance compared to the preceding cases, namely using all the extracted features without selection, using the ECFS only, and using the PCA only.
This inspired the present work to study the impact of the selected ECFS and the selected PCA features in $FV_1$ using the proposed second level selection as follows, where the number of the selected features in the first level selection is found to be $N(FV_1) = 400$.

E. STEP 4: SECOND LEVEL SELECTION (FV 2 ) AND WEIGHT FACTOR IN THE TRAINING PHASE
This step in the results section applied the procedure mentioned in section 2.2 of the second level selection by calculating the weight factor in the training phase. To find the contribution/impact of the different FS methods in $FV_1$, a second level selection was applied to the hybrid selected features, with $N(FV_1) = 400$, using a second application of the ECFS. Thus, the 400 combined and labeled features from the first feature selection level are fed again to the ECFS to determine the top-ranked features, which are then used to determine the weight factor. Table 3 presents the accuracy of the cubic-SVM classifier after ranking using the first 200, 300, 350, and 370 ranked/selected features for all samples from the 400 selected features as an example, where the classification accuracies were calculated for all ranking possibilities. Thus, Table 3 includes the worst and best cases. Table 3 shows that the highest classification accuracy, 93.8%, is achieved by a new set called $S_2$, using the 350 top-ranked features from the hybrid selected features of ECFS and PCA. Since the selected features were labeled, it was found that all 300 ECFS features were included in this finally selected pool, while only 50 of the 100 PCA features were retained in this final selection stage. Accordingly, the retained percentage of PCA features is 50% (i.e., half of the PCA components from the first level selection were kept by the second level selection), while the retained percentage of ECFS features is 100% (i.e., all the ECFS features selected in the first level were kept by the second level selection). Accordingly, the weight factor was calculated by substituting the obtained values of $k$, $d$, $l$, and $b$ in Eq. (12) as follows:
$$w_{impact} = \frac{k}{d} + \frac{l}{b} = \frac{50}{100} + \frac{300}{300} = 1.5 \qquad (15)$$
Thus, in the present work, the weight factor is found to be $w_{impact} = 1.5$.
This step implements the test phase of the overall proposed framework shown in Fig. 1(b), using the weight value deduced in the previous step. The calculated weight value $w_{impact} = 1.5$ is multiplied by the selected ECFS features in $FV_1$, leading to $EV_{b-weighted}$ as given in Eq. (11), where the impact of the ECFS features is greater than the impact of the PCA features.
From the previous study, we found that the percentage of selected features from the ECFS is greater than that from the PCA in the hybrid method, and that the performance of the ECFS selected features alone is better than that of the PCA selected features alone. So, we multiplied the 300 features selected by the ECFS by the weight factor of 1.5 to increase the effect of the ECFS features on the overall hybrid selected feature set in $FV_1$ and to obtain the final weighted-hybrid selected feature set $FV_f$. To verify the weight factor calculated using (12), other weight values were also tested: each was multiplied by the ECFS features, combined with the PCA selected features using $FV_f$ in (14), and the classification accuracy was computed to find the relation between the weight and the accuracy, as illustrated in Fig. 3. Fig. 3 shows the performance of the cubic-SVM using the weighted hybrid features and describes the effect of changing the proposed weight value on the final classification accuracy. Accordingly, it is concluded that the proposed weight applied to the selected ECFS features improves the classification performance up to 94% with a weight of 1.5. This weight value equals the sum of the percentage of PCA features and the percentage of ECFS features retained in the hybrid selected feature set by the second level ECFS. So, the proposed weighted-hybrid selected features achieved the best accuracy in classifying and detecting voice-loss in PD patients. Moreover, other classification performance metrics of the proposed weighted-hybrid selected features were measured. These metrics include the sensitivity, specificity, precision, negative predictive value, miss rate (false negative rate), fall-out (false positive rate), false discovery rate, false omission rate, and accuracy, which have the values 84.4%, 97.3%, 91.5%, 94.8%, 15.6%, 2.7%, 8.5%, 5.2%, and 94%, respectively.
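The relations among the reported metrics can be made explicit with the standard confusion-matrix definitions. The sketch below is illustrative: the source does not report a confusion matrix, so the four counts are hypothetical, chosen only because they sum to 756 samples and reproduce the reported percentages after rounding to one decimal place.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return the standard confusion-matrix ratios as fractions in [0, 1]."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "miss_rate": fn / (tp + fn),            # 1 - sensitivity
        "fall_out": fp / (tn + fp),             # 1 - specificity
        "false_discovery_rate": fp / (tp + fp), # 1 - precision
        "false_omission_rate": fn / (tn + fn),  # 1 - negative predictive value
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# HYPOTHETICAL counts (not reported in the source); see the note above.
metrics = binary_metrics(tp=162, fn=30, fp=15, tn=549)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

The four error rates are the complements of sensitivity, specificity, precision, and negative predictive value, which is why the reported pairs each sum to 100%.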
Finally, the Receiver Operating Characteristic (ROC) Curve, which is a probability curve for comparing diagnostic tests, is plotted in Fig. 4. Also, the area under the curve (AUC) indicates that the classification performance is close to the perfect classifier.

IV. DISCUSSION AND COMPARISON WITH STATE-OF-THE-ART WORK
In the proposed method, the weighted hybridization based on the second level selection (from the training phase) was performed by merging the PCA-based selected features and the weighted ECFS-based selected features. Lastly, the weighted-hybrid selected features were input to the SVM classifier to detect the voice-loss signals identifying PD cases. The preliminary feature selection using the PCA and the ECFS methods, supported by the response of the cubic-SVM, indicated the strength of the features' association in the feature space. Ultimately, however, our purpose is to improve the association between the features selected using the PCA and the ECFS. This goal was achieved by multiplying the features selected using the ECFS by a weight factor $w_{impact}$, since the accuracy obtained using $EV_b$ separately is superior to that obtained using the PCA selected features $PC_d$. Consequently, the final binary classifier using the selected dysphonia measures $FV_f$ was able to accurately discriminate the PD patient cases from the healthy control ones. To ensure the efficiency of the proposed method, a comparative study was conducted between the cubic-SVM and another 16 broadly used machine-learning procedures. This comparative study, reported in Fig. 2, showed the superiority of the cubic-SVM. Furthermore, the proposed method improves the accuracy by 8 percentage points (94% versus 86%) over the results in [16], which used the same dataset and achieved a maximum accuracy of 86% with a 0.84 F1-score and 0.59 MCC by feeding the top 50 features selected by mRMR to an SVM-RBF classifier. Thus, the preceding results established the efficiency of the proposed selection method for classification-based PD voice-loss detection. Subsequently, it is recommended to generalize the proposed feature selection framework to automatic measurement and assessment methods for PD patients at the early stage, as well as to a range of other applied clinical purposes.
Moreover, the proposed FS method is compared with the hybrid feature selection method for text clustering by Bharti and Singh [35]. In [35], a modified union based on set intersection was designed to avoid ignoring any of the selected features in the text clustering problem. The authors selected all top-ranked features from two sublists as well as the common features using the intersection between the non-selected features. To compare this procedure with our proposed method, we represent the feature selection levels of the methodology section in terms of sets and sublists. Here the first set ($S_1$) consists of 400 features, namely the top-ranked features from the first-level selection (300 selected features of ECFS and 100 selected features of PCA). The second set ($S_2$), which includes the second-level top-ranked selected features, consists of 350 features (300 of ECFS and 50 of PCA). The relation between the two sets can be expressed as $S_2 \subset S_1$. In terms of the union and intersection operations, the relation between the two sets can be given, respectively, as $S_1 \cup S_2 = S_1$ and $S_1 \cap S_2 = S_2$. At the same time, the difference between the two proposed sets is given by $S_3 = S_1 \setminus S_2$, where $S_3$ consists of the features that are in $S_1$ but not in $S_2$, that is, the remaining 50 features. Since $S_2$ is a subset of $S_1$, there is no intersection between $S_2$ and the set of remaining features: $S_2 \cap S_3 = \emptyset$. Finally, the concept of the modified union in [35] was applied to formulate the final set in our proposed method. Accordingly, the description of the two levels in the present work, where $S_2$ is a subset of $S_1$, is illustrated in Figure 5. According to the obtained results, the 400 features from the first-level selection were used without ignoring any features and without any change, which achieved an accuracy of 93.3% in classification-based PD detection.
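The set relations between the two selection levels can be checked with a short sketch. Only the set sizes (300 ECFS and 100 PCA features at level one, 300 ECFS and 50 PCA features at level two) come from the text; the feature identifiers and the choice of which 50 PCA features survive are hypothetical placeholders.

```python
# Hypothetical feature identifiers; only the counts are taken from the text.
ecfs_level1 = {f"ecfs_{i}" for i in range(300)}
pca_level1 = {f"pca_{i}" for i in range(100)}
s1 = ecfs_level1 | pca_level1  # first-level set, 400 features

# Assumption: the 50 PCA features kept at level two are picked arbitrarily here.
pca_level2 = {f"pca_{i}" for i in range(50)}
s2 = ecfs_level1 | pca_level2  # second-level set, 350 features

s3 = s1 - s2  # features in S1 that are not in S2

print(len(s1), len(s2), len(s3))  # 400 350 50
print(s2 < s1)                    # True: S2 is a proper subset of S1
print((s1 | s2) == s1, (s1 & s2) == s2)  # True True
print(s2 & s3 == set())           # True: S2 and S3 are disjoint
```

Because the two sublists use disjoint identifiers in this sketch, their union has exactly 400 elements, which mirrors the empty intersection between the PCA and ECFS selections reported in the discussion.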
In addition, the probability of the existence of the PCA features in level two and the probability of the existence of the ECFS features in level two were also determined. Subsequently, the proposed weighted-hybridizing-based second-level selection (from the training phase) was performed by merging the PCA-based selected features with the weighted ECFS-based selected features without ignoring any features. The weighted-hybrid selected features were then input to the SVM classifier to detect the voice-loss signals identifying PD cases, which achieved an accuracy of 94%. Among the different feature selection methods that deal with voice signals, both PCA and ECFS were recommended in several studies, such as [36]-[38]. PCA is very common because it projects the data into a new space while reducing the dimensionality of the feature space. This guarantees that the features selected by the PCA are uncorrelated with those of any other feature selection method, since the selected PCA features lie in a space that differs from the space of the features selected by another method, such as the ECFS. Hence, in the present work, the features selected independently by the PCA and ECFS methods are uncorrelated, which improves the performance of the selection method. This independent and uncorrelated relation between the PCA and ECFS features is also confirmed in the present work, where the intersection between their features is $\emptyset$. This accelerates the PD detection-based classification by removing the correlated variables, which do not contribute to decision making.
Since the proposed method achieved significant results, it is also recommended to apply the proposed framework along with data discretization in data mining applications and to compare it with the work of Tsai and Chen [39]. In addition, the proposed method can be incorporated with the feature selection methods in [40]-[44], as well as with other FS methods such as random projection, independent component analysis (ICA), and non-negative factorization. However, because the number of ranked/selected features from each FS method in stages 1 and 2 is currently chosen by trial and error, it is recommended to automate this process with an optimization algorithm that uses the classification accuracy as the fitness function. It is also recommended to generalize our feature selection framework and test it on different problems and datasets, since it operates on the extracted features and their selection and does not depend directly on the nature of the signals and/or images (if used).
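As a rough illustration of the recommended automation, the sketch below replaces trial and error with an exhaustive search over candidate feature counts. It is a minimal stand-in for an optimization algorithm, not the authors' method: `search_feature_counts` and `toy_fitness` are hypothetical names, and the toy fitness is a placeholder that would be replaced by the cross-validated classification accuracy of the classifier trained on the selected features.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def search_feature_counts(
    ecfs_counts: Iterable[int],
    pca_counts: Iterable[int],
    fitness: Callable[[int, int], float],
) -> Tuple[Tuple[int, int], float]:
    """Return the (n_ecfs, n_pca) pair with the highest fitness and that fitness."""
    best_counts, best_score = None, float("-inf")
    for n_ecfs, n_pca in product(ecfs_counts, pca_counts):
        score = fitness(n_ecfs, n_pca)
        if score > best_score:
            best_counts, best_score = (n_ecfs, n_pca), score
    if best_counts is None:
        raise ValueError("no candidate counts supplied")
    return best_counts, best_score


def toy_fitness(n_ecfs: int, n_pca: int) -> float:
    # Placeholder only: NOT a real accuracy. It peaks at (300, 100) by
    # construction so that the search has a known answer in this example.
    return 1.0 - (abs(n_ecfs - 300) + abs(n_pca - 100)) / 1000.0


best_counts, best_score = search_feature_counts(
    ecfs_counts=range(100, 501, 100),
    pca_counts=range(50, 201, 50),
    fitness=toy_fitness,
)
print(best_counts, best_score)  # (300, 100) 1.0
```

An exhaustive grid is only practical for a small number of candidates; a metaheuristic optimizer could take the same fitness callable when the search space is larger.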

V. CONCLUSION
PD patients suffer from various symptoms in different body parts, including the tongue, which leads to voice loss. Recent research aims to find the association between speech impairment and PD for further use in the detection and prediction of PD cases. Dysphonia measures and other speech signal processing procedures are used to predict the severity of PD symptoms from voice signals. However, numerous features can be extracted at the speech signal processing stage, which leads to over-fitting and an increased chance of including irrelevant features, along with the problem of imbalance in the datasets.
In this work, a novel cubic-SVM-based weighted-hybrid two-level feature selection for voice-loss detection in PD was proposed. The first selection stage aims to select the significant features using both the ECFS and the PCA for further hybridization. However, the two FS methods do not achieve the same classification accuracy separately, which indicates that they do not have the same impact on the overall hybrid combination. Hence, the second-level selection was proposed, in the training phase only, to find the proposed weight factor, which is multiplied by the first-level selected features of the more effective FS method. In the present work, the results showed that the ECFS is superior to the PCA; accordingly, the final selected feature set included the same PCA selected features from the first selection stage, while the selected ECFS features were multiplied by 1.5 (the computed weight value). The proposed cubic-SVM-based two-level selection thus achieved 94% classification accuracy, which is superior to several well-known machine-learning classifiers. Overall, the proposed FS method established promising results in machine-learning-based voice-loss detection in PD patients.
-566-135-1441). The authors, therefore, gratefully acknowledge DSR technical and financial support.
VOLUME 8, 2020