Surviving Most Relevant Features on Transient Trajectories Data by Dyadic 24-Way Hybrid Feature Selection Algorithm for Transient Stability Prediction

Designing an effective feature selection scheme (FSS) is essential for balancing two conflicting yet correlated indices in transient stability assessment (TSA), namely transient processing time (TPT) and transient prediction accuracy (TPA). Achieving low TPT and high TPA depends closely on selecting the most relevant transient point features (MRTPFs) that survive a comprehensive FSS applied to <inline-formula> <tex-math notation="LaTeX">$m$ </tex-math></inline-formula>-variate transient trajectory features (<inline-formula> <tex-math notation="LaTeX">$m$ </tex-math></inline-formula>VTTFs). Hence, we introduce the dyadic 24-way hybrid FSS (D24WHFSS) to select MRTPFs from <inline-formula> <tex-math notation="LaTeX">$m$ </tex-math></inline-formula>VTTFs. The D24WHFSS comprises 24 permutations of a chained four-stage hybrid structure, called the 24-way hybrid FSS (24WHFSS). The 24WHFSS is built on a bi-incremental wrapper mechanism (bi-IWM) that contains incremental wrapper subset selection (IWSS) and IWSS with replacement (IWSSr). Each hybrid scenario is equipped with symmetric uncertainty (SU) in the filter phase and dual support vector-based classifiers (DSVCs) in the wrapper phase. The DSVCs embedded into IWSS/IWSSr are the kernel support vector machine (<inline-formula> <tex-math notation="LaTeX">$k$ </tex-math></inline-formula>SVM) and the <inline-formula> <tex-math notation="LaTeX">$k$ </tex-math></inline-formula>-twin SVM (<inline-formula> <tex-math notation="LaTeX">$k$ </tex-math></inline-formula>TWSVM). By plugging dual kernel function pairs (DKFPs) into the DSVCs, the 24-way <inline-formula> <tex-math notation="LaTeX">$^{\mathrm {SU}}$ </tex-math></inline-formula>bi-IWM<inline-formula> <tex-math notation="LaTeX">$^{\mathrm {DSVCs}}$ </tex-math></inline-formula> is executed in two varied repetitions (dyadic 24WHFSS). In the first KFP (KFP1), the radial basis function (RBF) is placed in both DSVCs of the bi-IWM. In KFP2, the dynamic time warping (DTW) and polynomial (Poly) kernels are used in the 24-way <inline-formula> <tex-math notation="LaTeX">$^{\mathrm {SU}}$ </tex-math></inline-formula>bi-IWM<inline-formula> <tex-math notation="LaTeX">$^{\mathrm {DSVCs}}$ </tex-math></inline-formula>: the DTW and Poly kernels are plugged into <inline-formula> <tex-math notation="LaTeX">$^{\mathrm {SU}}$ </tex-math></inline-formula>IWSS<inline-formula> <tex-math notation="LaTeX">$^{k\mathrm {SVM}}/^{\mathrm {SU}}$ </tex-math></inline-formula>IWSSr<inline-formula> <tex-math notation="LaTeX">$^{k\mathrm {SVM}}$ </tex-math></inline-formula> and <inline-formula> <tex-math notation="LaTeX">$^{\mathrm {SU}}$ </tex-math></inline-formula>IWSS<inline-formula> <tex-math notation="LaTeX">$^{k\mathrm {TWSVM}}/^{\mathrm {SU}}$ </tex-math></inline-formula>IWSSr<inline-formula> <tex-math notation="LaTeX">$^{k\mathrm {TWSVM}}$ </tex-math></inline-formula>, respectively. Finally, the efficacy of D24WHFSS-based MRTPFs in TSA is evaluated via cross-validation. The results show that D24WHFSS achieves a TPA of 99.25 % and a TPT of 102.607 milliseconds for TSA.



I. INTRODUCTION
Nowadays, applying data mining (DM) technologies [1], [2] to improve the prediction quality of fast-sudden phenomena (FSP) in core strategic industries (e.g., energy, health, and transportation) provides intelligent insights for system stakeholders [3], [4], [5]. FSP prediction quality is measured by two conflicting yet correlated metrics, namely the accuracy of system status prediction and the processing time for system status labeling. Achieving high prediction accuracy (HPA) and low processing time (LPT) simultaneously is the ultimate goal in supervision-required industries. Meeting this goal markedly reduces the system operator's difficulty in conducting timely and accurate intervention actions that return the system to a normal operating state. However, high dimensional space (HDS) degrades the ability of DM-based tools to reach HPA- and LPT-based effective decision-making on the system under study. The HDS arises from the massive number of variables observed by software- and hardware-oriented monitoring systems, and irrelevant and redundant features (IRFs) populate a considerable portion of it. The presence of IRFs in DM-based pattern extraction reduces prediction accuracy and increases the processing time for labeling unseen cases. Hence, the problem of dimensionality is a critical topic in pattern recognition [6], [7]. The solution that DM experts use to handle the HDS is the feature selection (FS) process [8], [9]. Applying FS-based techniques to the HDS discards IRFs and retains the most relevant features. The selected subset contains features that satisfy the minimum-redundancy and maximum-relevance criteria, which promises low processing time and high prediction accuracy for FSP occurring in high-risk industries.
The energy sector is a sensitive industry whose strategic product is electricity. Electric power plays an incomparable role in economic prosperity and in meeting human needs in modern society by guaranteeing the survival and continued growth of downstream and upstream industries. Hence, power system operational reliability assessment requires 24/7 supervisory support to ensure a stable power supply. One of the significant branches of dynamic stability assessment of the power grid is transient stability assessment (TSA) [10]. TSA aims to predict the transient stability status from a data analytics dashboard that couples DM technologies with transient data obtained by phasor measurement units (PMUs) [11], so that the system operator can trigger a prompt and correct reaction against an unstable state. However, the irrelevant-redundant transient features (IRTFs) in the high dimensional transient space (HDTS) are the main obstacle to simultaneously achieving low transient processing time (TPT) (low prediction time and a small observation window) and high transient prediction accuracy (TPA) in TSA. To address this concern, a feature selection scheme (FSS) should be applied to the HDTS to select the most relevant transient point features (MRTPFs). Eliminating IRTFs and increasing the information shared between MRTPFs and the target class facilitates well-conditioned training of machine learning classifiers (MLCs) for high TPA. In addition, the MRTPFs-based compact transient space (CTS) yields low TPT (low prediction time due to the faster CTS-based training procedure, and a small observation window selected from the MRTPFs subset), which helps meet the time constraint on the required corrective actions [12]. Considering the above points, designing an effective FSS for high-performance TSA has become a hot topic for DM specialists.

II. RELATED WORKS
A close look at FS-based TSA studies shows that the FS mechanisms applied to find optimal transient features are built on information theory principles (ITPs) (filter) and on ITPs-MLCs approaches (filter-wrapper). Among the filter-oriented works, [13] and [14] target the power- and angle-based HDTS with the mutual information (MI) metric to obtain the best trade-off between redundancy and relevance (minimum-redundancy and maximum-relevance (mRMR)). Another way of measuring the relevance of observed features, used for monitoring an induction motor, is the extended Relief called ReliefF, elaborated in [15]. In [16], the fast correlation-based filter (FCBF) is the primary stage of the transfer capability calculation (TTC) model; by selecting optimal features, it helped grid operators address three issues, namely static security, static voltage stability, and transient stability. Among the hybrid FS frameworks used in transient studies, [17] proposes a filter-wrapper combination based on Relief and the support vector machine (SVM) to extract the optimal features from a transient trajectories data set. In [18], a hybrid FSS integrates normalized mutual information (NMI) (filter phase) and binary particle swarm optimization (BPSO) (wrapper phase) for high-performance transient stability status prediction. To overcome the HDTS, [19] presents a point- and trajectory-feeding hybrid algorithm that couples the fuzzy imperialist competitive algorithm (FICA) with incremental wrapper subset selection (IWSS), called FICA-IWSS, which includes MI and conditional MI metrics in the filter phase and kernel SVM in the wrapper phase. In [20], a BinJava-based kernelized fuzzy rough sets (KFRS) approach is applied to the entire feature space to select optimal feature subsets. In [21], the KFRS is coupled with the memetic algorithm and applied to transient data to retain the optimal transient features for TSA of power systems. In [22], the cross-permutation-based quad-hybrid FSS (CPQHFSS) is proposed to select optimal features from TMEs. The CPQHFSS consists of four filter-wrapper blocks (FWBs) arranged as twin two-FWBs built on the two mechanisms of the incremental wrapper. Reference [23] presents the partial-injective trilateral hybrid scheme (PITHS), which is based on a horizontally integrated mode and applied to transient multivariate trajectory features (TMTFs); it consists of two nested trilateral phases, namely the nested trilateral filter phase (NTFP) and the nested trilateral wrapper phase (NTWP).
A review of past FS-based TSA studies (e.g., [13], [14], [15], [16], [17], [18], [19], [22], and [23]) reveals that the published strategies suffer from the mono-way filter-wrapper structure (MWFWS), which fails to explore MRTPFs precisely in the nonlinear HDTS. Overcoming the weak-learner MWFWS requires a well-structured FSS supported by a multi-level circular learning model (MLCLM). Performing MLCLM on foggy, non-separable transient data retrieves invisible MRTPFs (IMRTPFs). On the other hand, in some FS-based TSA studies such as [20] and [21], MLCLM is applied to the m-variate transient trajectory features (mVTTFs) set in single-window streaming data (SWSD) mode, which arises from sticking the transient univariates together. This leads to pruning features that are defeated by the selected features according to filter-wrapper metrics, even though the distinction between the discarded features and the optimal ones is slight. In this regard, adopting a sectoral-oriented view in place of the SWSD mode is compelling in the feature selection process. Generally, overcoming the mentioned obstacles has a direct impact on achieving timely and accurate transient stability prediction (TSP).
The three contributions of this paper to the FS-based TSP problems faced by transient analysts are as follows:
• A novel feature selection algorithm named the dyadic 24-way hybrid FSS (D24WHFSS) is proposed to extract optimal features for high-performance TSA. The scheme includes a linked four-stage hybrid model executed in a 24-way manner (called 24WHFSS) and built on the bi-incremental wrapper mechanism (bi-IWM). The bi-IWM is equipped with a symmetric uncertainty (SU)-based filter phase and hyperplane-based MLCs as the wrapper phase. To reach a CTS containing discriminative transient features, the HDTS is fed to the kernel-based 24WHFSS in two varied repetitions (dyadic 24WHFSS).
• With regard to the SWSD in FS, the univariates of mVTTFs are fed to the D24WHFSS separately. Besides extracting univariate-specific MRTPFs, this approach brings close to zero the risk of discarding optimal features that arises when the features of the transient univariates (TUs) (TU$_1$ to TU$_m$) are pasted together into one streamed feature set.
• The performance of D24WHFSS-based MRTPFs in TSP is compared, via cross-validation, with the MRTPFs retained by various FS algorithms.
The rest of the article is structured as follows. The proposed D24WHFSS is elaborated in Section III. The experimental results of applying D24WHFSS to mVTTFs, and of MRTPFs-based TSP, are presented in Section IV. The end of Section IV also compares the performance of the D24WHFSS-based MRTPFs with the optimal features retained by other FS techniques in TSA. Finally, the conclusion is given in Section V.

III. DYADIC 24-WAY HYBRID FEATURE SELECTION SCHEME (D24WHFSS)
The overall workflow for high-performance TSA centered on D24WHFSS is depicted in Fig. 1. As the preliminary step, the contingency simulation that constructs the transient data set is performed with three tools: the SIEMENS power system simulator for engineering (PSS/E) software, Python, and matrix laboratory (Matlab). Next, the dimensionality of the HDTS is reduced by D24WHFSS. The proposed hybrid FSS is driven by 24 permutations of the chained four-stage hybrid models built on the bi-IWM, called 24WHFSS. The bi-IWM encompasses incremental wrapper subset selection (IWSS) and IWSS with replacement (IWSSr), which are supported by SU and dual support vector-based classifiers (DSVCs) as the filter and wrapper phases of D24WHFSS, respectively ($^{\mathrm{filter}}$bi-IWMs$^{\mathrm{wrapper}}$, including $^{\mathrm{SU}}$IWSS$^{\mathrm{DSVCs}}$ and $^{\mathrm{SU}}$IWSSr$^{\mathrm{DSVCs}}$). The DSVCs contain the kernel support vector machine (kSVM) and the k-twin SVM (kTWSVM). By setting dual kernel function pairs (DKFPs) in the DSVCs, 24WHFSS is conducted in two varied repetitions (dyadic 24WHFSS). Each KFP contains two functions: the first is the bi-IWM$^{k\mathrm{SVM}}$-specific kernel and the second is the bi-IWM$^{k\mathrm{TWSVM}}$-specific kernel. In the first KFP (KFP$_1$), the radial basis function (RBF) kernel is plugged into both DSVCs of the bi-IWM, which gives four hybrid scenarios: $^{\mathrm{SU}}$IWSS$^{\mathrm{RBFSVM}}$, $^{\mathrm{SU}}$IWSS$^{\mathrm{RBFTWSVM}}$, $^{\mathrm{SU}}$IWSSr$^{\mathrm{RBFSVM}}$, and $^{\mathrm{SU}}$IWSSr$^{\mathrm{RBFTWSVM}}$. In KFP$_2$, the kernel specified for $^{\mathrm{SU}}$IWSS$^{k\mathrm{SVM}}$ and $^{\mathrm{SU}}$IWSSr$^{k\mathrm{SVM}}$ is dynamic time warping (DTW), and the kernel specified for $^{\mathrm{SU}}$IWSS$^{k\mathrm{TWSVM}}$ and $^{\mathrm{SU}}$IWSSr$^{k\mathrm{TWSVM}}$ is the polynomial (Poly) kernel. Finally, the efficacy of the MRTPFs retained by D24WHFSS in TSP is measured via cross-validation. As a continuation of this third step, we compare the performance of D24WHFSS-based MRTPFs with the discriminative features extracted by other feature selection algorithms.
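The count of 24 chains and the kernel assignment per KFP can be checked with a few lines of Python. This is only an illustrative sketch: the scenario labels, the `KFPS` table, and the helper names `enumerate_chains` and `kernel_for` are hypothetical and are not taken from the paper's pseudocode.

```python
from itertools import permutations

# Hypothetical labels for the four hybrid scenarios named in the text.
SCENARIOS = ("IWSS-kSVM", "IWSS-kTWSVM", "IWSSr-kSVM", "IWSSr-kTWSVM")

# Kernel function pairs: (kernel used by kSVM, kernel used by kTWSVM).
KFPS = {1: ("RBF", "RBF"), 2: ("DTW", "Poly")}


def enumerate_chains():
    """Return every ordering of the four scenarios (4! = 24 chains)."""
    return list(permutations(SCENARIOS))


def kernel_for(scenario, kfp_index):
    """Pick the kernel a scenario uses under the given KFP."""
    svm_kernel, twsvm_kernel = KFPS[kfp_index]
    return twsvm_kernel if scenario.endswith("kTWSVM") else svm_kernel


chains = enumerate_chains()
print(len(chains))  # number of chained four-stage orderings
for kfp_index in KFPS:
    print(kfp_index, [kernel_for(s, kfp_index) for s in SCENARIOS])
```

Running the 24 chains once per KFP gives the two repetitions that the text calls dyadic.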
In D24WHFSS, the bi-IWM, accompanied by the filter and wrapper methods, plays the pivotal role in extracting the MRTPFs of mVTTFs to achieve low TPT and high TPA. According to the overall summary of D24WHFSS depicted in Fig. 2, the filter phase is first conducted per TU$_x$ of mVTTFs to rank the transient point features (TPFs) of TU$_x$ (RankTPFs TU$_x$). Next, RankTPFs TU$_x$ is entered into 24WHFSS, which is repeated two times (1 D24WHFSS and 2 D24WHFSS) according to the two KFPs (See Fig. 2, sprinkler symbols), namely the (RBF, RBF) and (DTW, Poly) pairs. Each i D24WHFSS is divided into four learning zones. Each zone consists of a linked four-stage hybrid (L4SH) structure ($^{\mathrm{SU}}$IWSS$^{k\mathrm{SVM}}$, $^{\mathrm{SU}}$IWSS$^{k\mathrm{TWSVM}}$, $^{\mathrm{SU}}$IWSSr$^{k\mathrm{SVM}}$, and $^{\mathrm{SU}}$IWSSr$^{k\mathrm{TWSVM}}$) that is led by one of the four scenarios (Zone 1 is led by $^{\mathrm{SU}}$IWSS$^{k\mathrm{SVM}}$, Zone 2 by $^{\mathrm{SU}}$IWSS$^{k\mathrm{TWSVM}}$, Zone 3 by $^{\mathrm{SU}}$IWSSr$^{k\mathrm{SVM}}$, and Zone 4 by $^{\mathrm{SU}}$IWSSr$^{k\mathrm{TWSVM}}$). With the starter hybrid scenario of a zone fixed, permuting the remaining three hybrid scenarios gives six permutations, so there are six types of L4SH structure (6L4SH structure) per zone of i D24WHFSS. Because the L4SH is chained, the output of each hybrid scenario is the input of the next scenario. After RankTPFs TU$_x$ is entered into the 6L4SH of the zones (See Fig. 2, blue-face dotted line), the optimal features (OFs) obtained by L4SH$_1$ to L4SH$_6$ (x OFs 1 to x OFs 6) are recorded. Then, the intersection of the features selected by L4SH$_{1:6}$ of zone z is labeled as the i D24WHFSS-based x zone z relevant TPFs (i D24WHFSS-based x Z z RTPFs). Next, the union operator is applied to the i D24WHFSS-based x Z 1 RTPFs through the i D24WHFSS-based x Z 4 RTPFs to select the i D24WHFSS-based x Z 1:4 unified RTPFs (i D24WHFSS-based x Z 1:4 URTPFs) (See Fig. 2, red-face dotted line). With the DKFPs set on 24WHFSS, we obtain the 1 D24WHFSS-based x Z 1:4 URTPFs and 2 D24WHFSS-based x Z 1:4 URTPFs sets. To achieve [...]
Besides the visual summary of D24WHFSS (See Fig. 2), the pseudocode of D24WHFSS in Table 1 gives more details about the main functions of the proposed FSS. According to Table 1, 'SUCalc', which weights the TPFs per transient univariate (TU$_1$ to TU$_m$) based on SU, is placed in the main body of D24WHFSS as the primary filter-oriented function (See Table 1, Lines 1-5). After the weights of TPFs$_1$ to TPFs$_l$ of TPFs SU TU$_x$ are ranked in descending order (Line 7), each TU$_x$ is ready to enter 24WHFSS, which is executed in two varied repetitions based on the DKFPs (Line 9). RankTPFs SU TU$_x$ is fed to i D24WHFSS, which is designed as four zones that each contain the linked four-stage hybrid structure (Line 11, Zone4SH). Each zone of i D24WHFSS (Line 12) is led by the starter (or leader) hybrid scenario. By execution of the starter leader in each zone of i D24WHFSS (IWSS (RankTPFs TU$_x$, 1 [...]
Besides the explanation of D24WHFSS through the pseudocode of its main body and its various functions, we discuss the complexity of D24WHFSS in this section. The complexity of D24WHFSS is determined by the bi-IWM (IWSS and IWSSr) together with the DSVCs (kSVM and kTWSVM), so focusing on these elements lets us approximate it. In the worst case, the complexity of IWSS and IWSSr is $O(n)$ and $O(n^2)$, respectively [24]. Also, the complexity of SVM and TWSVM is $O(n^3)$ and $O(2\times(n/2)^3)$, respectively [25]. Hence, the complexity of IWSS$^{k\mathrm{SVM}/k\mathrm{TWSVM}}$ is $O(\max\{(n\times n^3), (n\times 2\times(n/2)^3)\})$, and IWSSr$^{k\mathrm{SVM}/k\mathrm{TWSVM}}$ has $O(\max\{(n^2\times n^3), (n^2\times 2\times(n/2)^3)\})$ complexity. Since $2\times(n/2)^3 = n^3/4$, the complexity of the SVM is 4 times larger than that of the TWSVM, so the complexity of $^{\mathrm{SU}}$IWSS$^{k\mathrm{SVM}/k\mathrm{TWSVM}}$ and $^{\mathrm{SU}}$IWSSr$^{k\mathrm{SVM}/k\mathrm{TWSVM}}$ is $O(n\times n^3)$ and $O(n^2\times n^3)$, respectively. On the other hand, the results in [24] show that the complexity of IWSSr is close to that of IWSS when the space of optimal features in the wrapper iterations is compact. Consequently, according to the D24WHFSS scheme, D24WHFSS has $O(c\times n^4)$ complexity.
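The zone logic described in this section (six chains per zone, intersection within a zone, union across the four zones) can be sketched as follows. This is an illustrative outline rather than the pseudocode of Table 1: `zone_chains`, `run_chain`, `unified_relevant_features`, and the stand-in `toy_select` are hypothetical names, and a real stage would train a kSVM or kTWSVM inside IWSS or IWSSr.

```python
from itertools import permutations

# Hypothetical labels for the four hybrid scenarios.
SCENARIOS = ("IWSS-kSVM", "IWSS-kTWSVM", "IWSSr-kSVM", "IWSSr-kTWSVM")


def zone_chains(leader):
    """Six chains of one zone: the leader first, then each ordering of the rest."""
    rest = [s for s in SCENARIOS if s != leader]
    return [(leader, *order) for order in permutations(rest)]


def run_chain(chain, ranked_features, select):
    """Chained stages: the output of each stage is the input of the next."""
    current = list(ranked_features)
    for stage in chain:
        current = select(stage, current)
    return current


def unified_relevant_features(ranked_features, select):
    """Intersect the six chain outputs per zone, then unite the four zones."""
    unified = set()
    for leader in SCENARIOS:
        outputs = [set(run_chain(c, ranked_features, select))
                   for c in zone_chains(leader)]
        unified |= set.intersection(*outputs)
    return unified


def toy_select(stage, features):
    # Stand-in for a trained wrapper stage (assumption for illustration only):
    # IWSSr stages drop the lowest-ranked feature while more than two remain.
    if stage.startswith("IWSSr") and len(features) > 2:
        return features[:-1]
    return features


result = unified_relevant_features(["f6", "f4", "f1", "f2"], toy_select)
print(sorted(result))
```

The same routine would be run once per KFP and per transient univariate; how the two resulting sets are combined is not shown here because the corresponding passage is incomplete in the text.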

A. BI-INCREMENTAL WRAPPER MECHANISM (BI-IWM)
1) IWSS
The IWSS [27] is one of the IWMs and is used in variant forms in the six types of chained four-stage hybrid scenarios embedded in the four zones of D24WHFSS. Two approaches, the ITPs (filter) and the MLCs (wrapper), directly drive the growth of the IWSS tree branches. First, the features are ranked in descending order based on information theory-based indices (ITIs) to determine the order in which they enter the IWSS tree. Then, the first branch of the IWSS tree grows by training the MLC on the feature with the highest ITI ($f_{h1}$), and the prediction accuracy in the presence of $f_{h1}$ (Acc($f_{h1}$)) is obtained. The next growth step adds the second-ranked feature ($f_{h2}$) to $f_{h1}$ to train the MLC, and the result is labeled Acc($f_{h1}$, $f_{h2}$). If comparing Acc($f_{h1}$, $f_{h2}$) with Acc($f_{h1}$) shows that coupling $f_{h1}$ and $f_{h2}$ is superior, $f_{h3}$ is added to ($f_{h1}$, $f_{h2}$) as the third element for MLC training. On the other hand, if Acc($f_{h1}$, $f_{h2}$) is lower than Acc($f_{h1}$), $f_{h2}$ is discarded from the candidate file, and the pair $f_{h1}$ and $f_{h3}$ is used for MLC training. An example of how the IWSS algorithm works is shown in Fig. 3.
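The growth rule above amounts to a greedy forward pass over the ranked features. The sketch below illustrates that rule under simplifying assumptions: `evaluate` is a hypothetical callback that returns the accuracy of an MLC trained on a feature subset, and a feature is kept only when it strictly improves the accuracy (published IWSS variants may use a statistical relevance test instead).

```python
def iwss(ranked_features, evaluate):
    """Greedy incremental wrapper pass.

    ranked_features: feature names sorted by filter score, best first.
    evaluate: callable(list of features) -> accuracy of the wrapped classifier.
    Returns the selected features and their accuracy.
    """
    selected = []
    best_acc = float("-inf")  # the top-ranked feature is always accepted
    for feature in ranked_features:
        candidate = selected + [feature]
        acc = evaluate(candidate)
        if acc > best_acc:
            # The new feature helps: keep it and raise the bar.
            selected, best_acc = candidate, acc
        # Otherwise the feature is discarded and the next one is tried.
    return selected, best_acc


# Toy accuracy model (assumption for illustration): each feature shifts
# the accuracy by a fixed amount.
toy_gain = {"f1": 0.10, "f2": -0.02, "f3": 0.05}


def toy_accuracy(subset):
    return 0.80 + sum(toy_gain[f] for f in subset)


print(iwss(["f1", "f2", "f3"], toy_accuracy)[0])
```

In the toy run, the second-ranked feature lowers the accuracy and is dropped, while the third is kept, which mirrors the discard case described for $f_{h2}$.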

2) IWSSr
Another IWM that accompanies IWSS for selecting optimal features in D24WHFSS is the IWSSr algorithm [24]. Like IWSS, the IWSSr tree relies on the two basic requirements (ITPs and MLCs), but its tree branches grow differently. After the features are sorted based on ITIs, $f_{h1}$ is placed in the candidate file as the first element and used for training the MLC. After Acc($f_{h1}$) is obtained, two branches develop from the $f_{h1}$-based node. First, $f_{h1}$ is replaced with $f_{h2}$ (See Fig. 4, Node 2), and the MLC is trained on $f_{h2}$. Second, the pair $f_{h1}$ and $f_{h2}$ (See Fig. 4, Node 3) participates in MLC training. According to Fig. 4, the results show that among Node 1 (Acc($f_{h1}$)), Node 2 (Acc($f_{h2}$)), and Node 3 (Acc($f_{h1}$, $f_{h2}$)), Node 3 is selected for the subsequent increment. Growing Node 3 with $f_{h3}$ creates Node 4 with the ($f_{h2}$, $f_{h3}$)-based MLC, Node 5 with the ($f_{h1}$, $f_{h3}$)-based MLC, and Node 6 with the ($f_{h1}$, $f_{h2}$, $f_{h3}$)-based MLC. This increment under Node 3 does not improve the prediction accuracy, and consequently, ($f_{h1}$, $f_{h2}$) of Node 3 is introduced as the optimal features.
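The replacement step can be sketched in the same style as a greedy loop. Again this is a simplified illustration, not the authors' implementation: `evaluate` is a hypothetical accuracy callback, and the toy accuracy table is invented to reproduce the Node 1 to Node 6 walk-through described for Fig. 4.

```python
def iwssr(ranked_features, evaluate):
    """Incremental wrapper pass with replacement.

    For each new feature, try swapping it for every selected feature and
    also appending it; adopt the best candidate only if it beats the
    current accuracy.
    """
    if not ranked_features:
        return [], None
    selected = [ranked_features[0]]
    best_acc = evaluate(selected)
    for feature in ranked_features[1:]:
        # Replacement candidates: the new feature takes the place of one old feature.
        candidates = [[feature if f == old else f for f in selected]
                      for old in selected]
        # Addition candidate: the new feature joins the current subset.
        candidates.append(selected + [feature])
        scored = [(evaluate(c), c) for c in candidates]
        top_acc, top_subset = max(scored, key=lambda pair: pair[0])
        if top_acc > best_acc:
            selected, best_acc = top_subset, top_acc
    return selected, best_acc


# Toy accuracies (assumption for illustration only).
toy_table = {
    frozenset(["f1"]): 0.90,
    frozenset(["f2"]): 0.88,
    frozenset(["f1", "f2"]): 0.95,
    frozenset(["f2", "f3"]): 0.93,
    frozenset(["f1", "f3"]): 0.94,
    frozenset(["f1", "f2", "f3"]): 0.95,
}

print(iwssr(["f1", "f2", "f3"], lambda subset: toy_table[frozenset(subset)]))
```

Because each step evaluates one candidate per already selected feature plus one addition, the number of classifier trainings grows quadratically in the worst case, which is consistent with the $O(n^2)$ figure quoted for IWSSr.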

B. FILTER AND WRAPPER METHODS PLUGGED IN BI-IWM OF D24WHFSS
1) SU-BASED FILTER IN BI-IWM
The SU index [28], a symmetrical measure, is the preliminary step of bi-IWM and specifies the degree of importance of the features. By interlacing basic ITIs, the SU measures how relevant a feature is to the target class. Based on three basic ITIs, namely entropy, conditional entropy, and mutual information (MI), the SU index is defined as:

$$\mathrm{SU}(\mathrm{TU}_x, C) = 2 \times \frac{\mathrm{MI}(\mathrm{TU}_x, C)}{H(\mathrm{TU}_x) + H(C)} \tag{1}$$

where $\mathrm{TU}_x$ stands for a TPF of the transient univariate and $C$ for the target class. According to (1), SU is supported by the entropy index, which is defined as:

$$H(K) = -\sum_{k} p(k) \log p(k) \tag{2}$$

In (2), the entropy of $K$ ($K$ represents a discrete random variable) is calculated based on the probability mass function $p(k) = \Pr\{K = k\}$. Another index in (1) is the mutual information (MI), which is given by:

$$\mathrm{MI}(\mathrm{TU}_x, C) = H(\mathrm{TU}_x) - H(\mathrm{TU}_x \mid C) \tag{3}$$

The SVM [29] is one of the most popular hyperplane-based classifiers; it focuses on finding the optimal separating hyperplane between binary or multi-label classes under three fundamental principles (3FPs), namely margin maximization, structural-risk minimization, and avoiding overfitting. Besides the soft- or hard-margin idea for linear classification, the kernel-based approach paves the way for finding the zero-error nonlinear decision boundary on the non-separable HDTS. Plugging a kernel into the SVM computations projects the data into a separable space. Hence, the SVM can be reformulated by exploiting the kernel trick, in which the inner product of two samples is replaced by a kernel function $K(x_i, x_j)$.
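As a minimal sketch of the SU index in (1) that drives the filter phase, the following code computes entropy, mutual information, and SU for discrete sequences and ranks features by SU in descending order. It assumes each TPF has already been discretized (for example by binning), which the text does not specify; all function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """MI(X;Y) = H(X) + H(Y) - H(X,Y), which equals H(X) - H(X|Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def symmetric_uncertainty(x, y):
    """SU(X,Y) = 2 * MI(X;Y) / (H(X) + H(Y)), defined as 0 for constants."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    return 2.0 * mutual_information(x, y) / (hx + hy)


def rank_by_su(features, labels):
    """Return feature names sorted by SU with the class labels, best first."""
    scores = {name: symmetric_uncertainty(col, labels)
              for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


labels = [0, 0, 1, 1]
features = {"tpf_a": [0, 1, 0, 1], "tpf_b": [0, 0, 1, 1]}
print(rank_by_su(features, labels))
```

In the toy data, `tpf_b` reproduces the labels exactly and therefore gets SU = 1, while `tpf_a` is independent of the labels and gets SU = 0.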
In this paper, two types of efficient kernels are used as substitutes for $K(x_i, x_j)$: 1) the radial basis function (RBF) [29] and 2) dynamic time warping in RBF (DTW RBF) [30]. Concise explanations of the RBF and DTW RBF kernels are as follows:
(1) RBF kernel: The RBF provides point-to-point matching for pattern discovery in feature space. The non-elastic RBF kernel is defined in (6), where $\|x_i - x_j\|^2$ represents the squared Euclidean distance between two data points.
(2) DTW RBF kernel: Replacing the distance function of the RBF kernel with DTW defines an elastic kernel, which brings nonlinear pattern matching to the feature space. The DTW distance is given by (7), and based on it the DTW RBF kernel is defined in (8). Finally, solving (9) yields the 3FPs-based separating hyperplane in the HDTS.
The TWSVM is rooted in a cross hyperplane-based classifier called the generalized proximal eigenvalue support vector machine (GEPSVM) [31]. GEPSVM seeks two hyperplanes, each of which is nearest to the samples of one class and farthest from the samples of the other. The high classification performance of GEPSVM motivated DM scholars to design a novel classifier based on the GEPSVM principle. TWSVM [32] is this GEPSVM-based classification model, which adopts a new formulation for the separating cross hyperplanes. The TWSVM-based optimization problems are given in (10) and (11), where $c_1, c_2 > 0$ are parameters and $e_1$ and $e_2$ are vectors of ones with proper dimensions. Applying the Lagrangian function to (10) and (11), obtaining the Karush-Kuhn-Tucker (KKT) conditions, and combining some relations gives the dual optimization problems (12) and (13). Solving (12) and (13) by quadratic programming yields $\alpha$ and $\psi$, which in turn provide the values of $[w^{(1)}, b^{(1)}]$ and $[w^{(2)}, b^{(2)}]$ that define the cross hyperplanes for a binary classification problem (14). Finally, the class label of an unseen case is predicted by (15). To achieve high-performance classification in the non-separable HDTS, embedding kernel functions into the TWSVM computations is the best solution [32], which gives (16)-(18). In (16), $C^T = [A\ B]^T$ and $K$ indicates the kernel function. Solving (17) and (18) yields the vectors $[u^{(1)}\ b^{(1)}]^T$ and $[u^{(2)}\ b^{(2)}]^T$.
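For orientation, the linear TWSVM is commonly written in the literature as below. This is a sketch of that standard form, on the assumption that (10)-(15) follow it; the symbols $H$, $G$, $\xi$, and $\eta$, and the convention that $A$ holds the samples of one class and $B$ those of the other, are notational assumptions that may differ from the paper.

```latex
% Primal problems (one hyperplane per class)
\min_{w^{(1)},\,b^{(1)},\,\xi}\ \tfrac{1}{2}\,\lVert A w^{(1)} + e_1 b^{(1)} \rVert^2 + c_1 e_2^{T}\xi
\quad \text{s.t.}\quad -(B w^{(1)} + e_2 b^{(1)}) + \xi \ge e_2,\ \ \xi \ge 0

\min_{w^{(2)},\,b^{(2)},\,\eta}\ \tfrac{1}{2}\,\lVert B w^{(2)} + e_2 b^{(2)} \rVert^2 + c_2 e_1^{T}\eta
\quad \text{s.t.}\quad (A w^{(2)} + e_1 b^{(2)}) + \eta \ge e_1,\ \ \eta \ge 0

% Dual problems, with H = [A \; e_1] and G = [B \; e_2]
\max_{\alpha}\ e_2^{T}\alpha - \tfrac{1}{2}\,\alpha^{T} G (H^{T}H)^{-1} G^{T} \alpha
\quad \text{s.t.}\quad 0 \le \alpha \le c_1

\max_{\psi}\ e_1^{T}\psi - \tfrac{1}{2}\,\psi^{T} H (G^{T}G)^{-1} H^{T} \psi
\quad \text{s.t.}\quad 0 \le \psi \le c_2

% Hyperplane parameters recovered from the dual solutions
\begin{bmatrix} w^{(1)} \\ b^{(1)} \end{bmatrix} = -(H^{T}H)^{-1} G^{T}\alpha,
\qquad
\begin{bmatrix} w^{(2)} \\ b^{(2)} \end{bmatrix} = (G^{T}G)^{-1} H^{T}\psi

% Label of an unseen sample x: the class whose hyperplane is nearest
\mathrm{class}(x) = \arg\min_{k \in \{1,2\}} \ \lvert x^{T} w^{(k)} + b^{(k)} \rvert
```

Each dual involves only the samples of one class in its constraints, which is the source of the roughly fourfold saving over a single SVM that the complexity discussion relies on.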
In (17) and (18), $K$ is replaced with the RBF kernel (discussed in section (a) of III. B. 2) and the polynomial (Poly) [33] kernel. Poly kernel: In the linear kernel relation [33], setting the degree ($d$) to a value greater than one yields the Poly kernel.
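The three kernels named in this section can be sketched in plain Python as follows. The forms used here are common textbook choices and are assumptions about (6)-(8) and the Poly definition: the $2\sigma^2$ scaling, the squared DTW distance inside the exponential, the absolute difference as local DTW cost, and the `coef0` offset are not confirmed by the text. Note also that a DTW-based exponential kernel is not guaranteed to be positive semidefinite.

```python
from math import exp, inf


def rbf_kernel(x, y, sigma=1.0):
    """Non-elastic RBF kernel on equal-length vectors (point-to-point)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-sq_dist / (2.0 * sigma ** 2))


def dtw_distance(x, y):
    """Classic dynamic-programming DTW with absolute difference as local cost."""
    n, m = len(x), len(y)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def dtw_rbf_kernel(x, y, sigma=1.0):
    """Elastic kernel: the Euclidean distance in the RBF is replaced by DTW."""
    return exp(-dtw_distance(x, y) ** 2 / (2.0 * sigma ** 2))


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree, with degree > 1."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(poly_kernel([1, 2], [3, 4]))
```

The DTW example shows the elastic behavior: the repeated sample in the second trajectory is absorbed by the warping path, so the two trajectories are at distance zero even though their lengths differ.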

IV. EXPERIMENTAL DESIGN
A. CREATING TRANSIENT DATASET
For FS-based TSA, creating the transient dataset is the preliminary task of the three-step framework proposed in this paper (See Fig. 1, Step 1). For this purpose, we design the two-step transient data creation mechanism (2STDCM) shown in Fig. 5. In 2STDCM, the output channel transient values per basic feature (OCTVs BF i) are recorded first. The BF i include the bus voltages (VOLT), voltage phase angle (VANGLE), machine active power (PELEC), machine reactive power (QELEC), and reactive power consumption (QLOAD). The OCTVs BF i are obtained via Python-based contingency simulation (PCS) supported by the application program interface (API) functions of the SIEMENS power system simulator for engineering (PSS/E) [34]. For more information about Python scripting for dynamic simulation based on the PSS/E API ('psspy' module), refer to Table 4. The contingency simulation is conducted on the New England test system-New York power system (NETS-NYPS) (See Fig. 6) [35]. The transient cases stem from substation outages, generator outages, and line outages with different disturbance parameter settings (fault duration time: 0.23 seconds with a 0.0167-second time step; the fault clearing time is set after the end of the fault duration time). To gather severe transient samples, an uncertainty factor is introduced through the convert load (CONL) API of PSS/E in PCS, which sets different load characteristics for converting the active and reactive power load (See Table 3; first row) [23], [34]. [...] Fig. 7. We executed [...] [37]. [...] The ranking (RankTPFs TU 1 to RankTPFs TU 28) based on SU is shown in Table 5.
For the wrapper-based predictive models (SVM and TWSVM) used in D24WHFSS, the following points are important. The accuracy (Acc) metric (21) measures the performance of the $^{\mathrm{SU}}$IWSS$^{k\mathrm{SVM}}$, $^{\mathrm{SU}}$IWSSr$^{k\mathrm{SVM}}$, $^{\mathrm{SU}}$IWSS$^{k\mathrm{TWSVM}}$, and $^{\mathrm{SU}}$IWSSr$^{k\mathrm{TWSVM}}$ learning models. In addition, the learning parameters (C in SVM and TWSVM, σ in RBF and DTW, and p in Poly) are fine-tuned in the train-test procedure of each increment of bi-IWM. The ranges of learning parameters (RoLPs) for SVM (SVM RoLPs) and TWSVM (TWSVM RoLPs) are defined in (22) and (23), respectively. In each iteration, the maximum Acc, obtained with the best values of the learning parameters, is recorded. For example, Fig. 9 shows the IWSS RBFSVM tree for the Z 1-specific first stage of L4SH 1:6 of 1 D24WHFSS applied to TU 8, where the Acc variations of the optimal node (node 3: green-face) are depicted in a 3-D plot.

FIGURE 9. Structure of IWSS RBF SVM tree related to Z 1 -specific first stage of L4SH 1:6 of 1 D24WHFSS in selecting OFs of TU 8 regarding Acc variations in optimal node (node 3).

[...] The UMRTPFs 1:28 set (See Table 8, last row) is used for TSP in this section. A 10-fold cross-validation scenario is used to measure the performance of the UMRTPFs 1:28 on TSP. The SVM RBF-based learning model is trained and tested per fold. Furthermore, the SVM RBF parameters C and σ are fine-tuned over {C = 2^i | i = 0, 1, . . . , 15} and {σ = 2^j | j = −5, −4, . . . , 15} to report the best values of the evaluation metrics (See Table 9) per fold. Under this experimental design, the performance of SVM RBF based on UMRTPFs 1:28 is shown in Table 10. Setting different values of the SVM RBF learning parameters yields various Acc values per fold. As can be seen in Table 10, the maximum Acc is taken as the result of the Acc-based performance evaluation per fold. For more clarity, the Acc variations in some folds (fold 4, fold 6, fold 8, and fold 10) are illustrated in Fig. 10. Also, the TPR and TNR corresponding to the maximum Acc per fold are listed in Table 10. Finally, the mean value of the results over all folds is calculated per metric (See Table 10, last row). The Acc of 99.25 %, TPR of 99 %, and TNR of 99.5 % indicate the high TPA of TSP via UMRTPFs 1:28. Another main factor in proving the efficiency of the UMRTPFs 1:28-oriented learning model is the TPT index (including the observation window time (OWT) and the prediction time). To calculate the TPT, we first focus on the TPFs of UMRTPFs 1:28 (See Table 8, last row) to specify the OWT. The longest observed cycle in UMRTPFs 1:28 is TPFs 6, which was picked as the optimal cycle of TU 1, TU 3, TU 5, TU 6, TU 8, TU 14, TU 18, TU 21, TU 23, TU 25, and TU 26. Thus, the OWT is six cycles (100.2 milliseconds (ms)). The prediction time based on UMRTPFs 1:28-SVM RBF is 2.407 ms. Consequently, the TPT is 102.607 ms (See Table 11), which reflects a TPT low enough to exert control actions.
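The parameter grid and the TPT arithmetic reported above can be reproduced with a short sketch. The function names and the `score` callback are hypothetical; the cycle length of 16.7 ms is inferred from the reported six-cycle OWT of 100.2 ms and the 0.0167-second time step.

```python
from itertools import product

CYCLE_MS = 16.7  # one cycle, inferred from 100.2 ms for six cycles


def svm_rbf_grid():
    """All (C, sigma) pairs with C = 2^i, i = 0..15 and sigma = 2^j, j = -5..15."""
    c_values = [2.0 ** i for i in range(0, 16)]
    sigma_values = [2.0 ** j for j in range(-5, 16)]
    return list(product(c_values, sigma_values))


def best_accuracy(grid, score):
    """Maximum accuracy over the grid; score(C, sigma) is a user-supplied callback."""
    return max(score(c, sigma) for c, sigma in grid)


def transient_processing_time_ms(window_cycles, prediction_ms):
    """TPT = observation window time + prediction time, in milliseconds."""
    return window_cycles * CYCLE_MS + prediction_ms


grid = svm_rbf_grid()
print(len(grid))  # number of (C, sigma) combinations tried per fold
print(round(transient_processing_time_ms(6, 2.407), 3))
```

With a six-cycle window and a 2.407 ms prediction time, the sum matches the reported TPT of 102.607 ms.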

D. COMPARISON OF EXPERIMENTAL METHODS: D24WHFSS VS. 3MWHFSSs AND 3MCWHFSSs
For a deeper assessment of the efficiency of the proposed FSS in selecting OFs, D24WHFSS is compared with the 3MWHFSSs, which include mRMR [13], ReliefF [15], and FCBF [16]. D24WHFSS is also compared with the 3MCWHFSSs, which include BMHFSS [19], CPQHFSS [22], and PITHS [23]. The 28VTTFs are fed to the 3MWHFSSs and 3MCWHFSSs, and the 3MWHFSSs-based OFs and 3MCWHFSSs-based OFs are selected. Then, these OFs are entered into the SVM RBF under the same train-test conditions defined for D24WHFSS (See Section IV. C).

V. CONCLUSION AND FUTURE WORK
Critical reflection on the low performance of the previously proposed mono-way hybrid FSSs in TSA studies motivated us to design a novel feature selection algorithm called the dyadic 24-way hybrid FSS (D24WHFSS) in this paper. The D24WHFSS is driven by linked four-level hybrid models (LFLMs), in which the different permutations of the levels cause the LFLMs to be executed in 24 ways (called 24WHFSS). The 24WHFSS is built on the bi-incremental wrapper mechanism (bi-IWM), namely IWSS and IWSSr. The filter and wrapper phases of bi-IWM are served by SU and DSVCs, respectively. kSVM and kTWSVM are the supervised machine learning algorithms that act as the DSVCs plugged into the bi-IWM. For precise mining of the nonlinear HDTS, DKFPs are placed in the DSVCs. Hence, the KFPs-based 24WHFSS is executed in two varied repetitions (dyadic 24WHFSS). After D24WHFSS is applied to mVTTFs, the retained MRTPFs are entered into the cross-validation procedure to measure their efficacy in achieving low TPT and high TPA. The results show that the MRTPFs perform well (Acc 99.25 %, TPR 99 %, TNR 99.5 %, and TPT of 102.607 ms) for TSP. To assess the effectiveness of D24WHFSS against other feature selection algorithms, its performance is compared with that of the 3MWHFSSs and 3MCWHFSSs. The results show that the MRTPFs selected by D24WHFSS perform better on TSP than the optimal features extracted by the 3MWHFSSs and 3MCWHFSSs.
In future work, we intend to introduce a novel feature selection-feature extraction algorithm that embeds a hybrid optimum-features selector layer in convolutional deep network-based feature extraction. Such a framework promises to pick up the most discriminative and relevant features on HDTS for high-performance TSA. Furthermore, the efficacy of the optimal transient features in achieving high-performance TSA under N-k contingency analysis, load-generation level variations, and contaminated transient responses (missing and noisy transient data) will be evaluated in future studies.