Comparative Study of Full and Reduced Feature Scenarios for Health Index Computation of Power Transformers

Power transformer health index (PTHI) computation is performed based on the results of different tests, such as dissolved gas analysis (DGA), oil quality (OQ) evaluation, and depolarization factor (DP) testing. In this study, PTHI computation is performed using 631 dataset samples from Malaysia and 730 samples from the Gulf Region. A new model is proposed to predict the PTHI state by adopting intelligent classification methods (e.g., decision tree, support vector machine, k-nearest neighbor, and ensemble methods). The model is built via two-stage data processing. The first stage separates the test results into three modules that represent DGA, OQ, and DP factor codes. In the second stage, the output of the three modules is processed to predict the PTHI state. The four classification methods are applied to the proposed model, and the prediction accuracy of the PTHI state is determined. Results indicate that the proposed model has superior classification accuracy for each AI method compared with recent work. Furthermore, feature reductions are applied to minimize the testing time, effort, and costs. The reduced-feature models reveal the effectiveness of the adopted feature reduction technique. A slight difference in accuracy is observed between the full- and reduced-feature scenarios. Thus, the reduced-feature scenario is considered to decrease the effort and time of the computation process and the experimental cost. The proposed model is validated against uncertain noise in features of up to ±20%.


I. INTRODUCTION
The transformer is the most expensive equipment in the electric power system. Monitoring the status of the insulation system in a power transformer is vital. Deterioration of the transformer insulating oil due to electrical and thermal stresses can lead to undesired transformer outage in electric power networks. Therefore, the continuous evaluation of the power transformer state has elicited much attention. The condition of a transformer can be evaluated using a health index [1]. Establishing the power transformer health index (PTHI) is a challenge. It involves the fusion of present data from several sensors and equally important historical data. PTHI is a practical method that uses transformer test results to indicate the state of a power transformer.
The associate editor coordinating the review of this manuscript and approving it for publication was Arpan Kumar Pradhan.
Artificial intelligence (AI) methods have been used to construct prediction models that identify PTHI directly from test results. A neuro-fuzzy (NF) network was constructed in [3], [4] to determine PTHI on the basis of the results of DGA, OQ, and furan analysis (FA). The results indicated that 62% of the 73 tested transformers had the same assessment results regardless of the type of data used. In [4], an artificial neural network (ANN) and an adaptive NF inference system were used to construct a model for identifying PTHI. Technical and economic parameters were utilized as inputs to the constructed model. The technical parameters were DGA, OQ, and FA. The economic parameters included transformer aging variables and cost functions. A fuzzy logic (FL) system was utilized in [2], [5]-[7]. In [2], the effect of interfacial tension and oil color was ignored, and the accuracy of the proposed FL model was not reported. In [5], an elaborate approach was used to construct separate FL systems for identifying individual transformer states. Then, a final FL system was built to compute PTHI on the basis of the results of the primary FL systems (thermal, electrical, mechanical, and dielectric integrity conditions). In [6], a Cauchy membership function for fuzzy grade division and a fuzzy evidence fusion method were used to develop a PTHI prediction model. The model considers the bushing state and the state of other accessories. In [7], PTHI was estimated using the FL approach for distribution-level transformers rated 69 kV or less. An analysis was conducted using the results of oil tests, such as water content, breakdown voltage, dissipation factor, furan, H2, CH4, C2H2, C2H6, and C2H4. The ANN approach was applied to predict the PTHI state in [8]-[10]. In [8], an ANN was proposed for PTHI prediction by using a feature-based exhaustive method to eliminate the least significant tests. The constructed model was based on DGA, OQ, and FA.
In [9], integrated transformer subsystems (insulation system, bushing, tap changer, core, and windings) were considered to evaluate PTHI. An intelligent multiple regression ANN model was constructed to build a quantitative PTHI. The model was applied on the basis of 345 transformer datasets with high-performance PTHI state prediction. In [10], an ANN model was built based on DGA, OQ, and FA factors to predict the PTHI state. The model exhibited an accuracy of 95% when it used the subset of input features, but its accuracy was 89% when it was tested with datasets of other utility networks.
A support vector machine (SVM) model was built in [11] to predict PTHI. The model was tested with 14 oil test results, which were used to build the model on the basis of 724 test samples with high detection accuracy. In [12], the level of furan content in transformer oil was determined through the measurement of oil test factors, such as breakdown voltage, water content, and dissolved gas. A prediction model was built using the k-nearest neighbor (KNN) method with 90% detection accuracy. A wavelet model was developed in [13] on the basis of 19 oil test results to predict the PTHI state. The main objective of this work was to present a new method of assessing transformer conditions by adding a PTHI table that improves conventional approaches. The model was used to predict the PTHI state in 345 transformers, and it exhibited good detection performance.
A Bayesian information fusion approach was proposed in [14] to predict the PTHI state on the basis of data collected from transformer measurements, maintenance, and failure statistics. In [15], a decision-making model was built in consideration of reliability and economy to identify the best maintenance strategy for oil-filled transformers as a basis of PTHI prediction. Particle swarm optimization was used to construct the decision-making model on the basis of DGA, oil test, electrical test, reliability, and economic operation. In [16], a Markov model (MM) was utilized to determine the PTHI of 373 transformers. Dissolved gas, OQ, and furan compounds were used as model inputs. The MM was built based on a probability decision process, and it was used for PTHI prediction. Meanwhile, a cluster-merging model was presented in [17] to predict the PTHI state. However, only DGA was considered in that work; all other tests for identifying the PTHI state were ignored. The method's accuracy was investigated by using one case only, which was not enough to verify the model's prediction capability. The model proposed in [18] was used to estimate the linear relation between PTHI and transformer tests. The researchers concluded that the DGA, FA, and breakdown voltage tests are sufficient and can provide an acceptable indication of PTHI compared with other techniques that involve many tests. However, the model was applied to 90 transformers only, which may not be enough to provide useful insights into the accuracy of the method. In [19], a feature reduction model was developed based on different reduced-feature approaches. The results showed that water content, breakdown voltage, furan, and acidity are the most important features in predicting the PTHI state.
The drawbacks of these previous studies are as follows. First, most of them predicted the PTHI state by using a one-stage process, which decreases the overall prediction accuracy. Second, they did not investigate the effect of uncertainty originating from various parameters, such as the sampling temperature, sampling position, loading history, and maintenance history of the tested transformer.
The current study presents a PTHI framework for transformers that is based on extensive research on ageing markers in transformers and how to classify them effectively for asset management. The framework is fed with training data and assesses classification accuracy by using a test dataset.
In addition, a new AI-based model for PTHI prediction was developed. The proposed model consists of two-stage data processing. The first stage uses three modules, one for each type of test result (DGA, OQ, and DP), and produces intermediate code factors. The codes are used as inputs to the second-stage module (module 4), which identifies the final PTHI state. All modules in the proposed model use AI classification methods, namely, decision tree (DT), SVM, KNN, and ensemble method (EN). Two datasets of 631 and 730 samples were collected from the test and maintenance laboratories of two electricity companies in two countries to evaluate the accuracy of the proposed models and alternative AI-optimized methods. The datasets were utilized for training and testing purposes. The results indicated that the proposed model exhibited high prediction accuracy for PTHI states with all four classification methods. The EN method demonstrated an advantage over the other AI methods. A feature reduction approach was used to reduce testing time, effort, and costs. The results of the feature reduction showed high PTHI detection performance. The uncertainty study revealed the robustness of the proposed method in predicting the PTHI state, and PTHI prediction accuracy exhibited only a slight change. The maximum change in PTHI accuracy was 2.8% and 5.3% for the KNN method at ±15% and ±20% uncertainty noise compared with that without uncertainty under full- and reduced-feature scenarios, respectively.

II. POWER TRANSFORMER HEALTH INDEX
The determination of PTHI was performed based on the analysis in [1], [20], [21]. PTHI can be estimated as follows.

A. DISSOLVED GAS ANALYSIS
The computation of the DGA factor (DGAF) was performed based on the following dissolved gases decomposed in transformer insulating oil: H2, CH4, C2H6, C2H4, C2H2, CO, and CO2. DGAF can be determined as

$$\mathrm{DGAF} = \frac{\sum_{i=1}^{7} S_i W_i}{\sum_{i=1}^{7} W_i} \qquad (1)$$

where $W_i$ and $S_i$ refer to the weight and score of test parameter i, respectively, as shown in Table 12 in the appendix [20]. The transformer condition based on DGAF is represented by a code from ''A'' to ''E''. For instance, when DGAF is less than 1.2, the transformer condition is good and referred to as code ''A''. The transformer condition is considered poor when DGAF is greater than 3.0, which corresponds to code ''E''. The codes from ''A'' to ''E'' are defined in Table 13 in the appendix [20].

B. OIL QUALITY
The second important factor that influences PTHI is the oil quality factor (OQF), which depends on the dielectric strength, humidity, color, interfacial tension, acid amount of the insulating oil, and insulation dissipation factor (DF). The condition of the transformer can be monitored and evaluated by measuring the insulation DF [20]. Insulation DF indicates the state of the insulation integrity of the winding and identifies the DF of the overall insulation, including the winding and bushing. The DF test is a routine test that measures the insulation capacitance and power factor of transformer insulation under 10 kV at 50 or 60 Hz [22]. Table 16 in the appendix (as in [20]) illustrates a ranking method of the transformer insulation DF. The scores ($S_i$) and weight factors ($W_i$) of test parameter i for different transformer operating voltage ratings are listed in detail in Table 14 in the appendix, as in [20]. OQF can be calculated as follows:

$$\mathrm{OQF} = \frac{\sum_{i=1}^{6} S_i W_i}{\sum_{i=1}^{6} W_i} \qquad (2)$$

The transformer condition is identified based on OQF by using a code from ''A'' to ''E'', similar to DGAF.
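The DGAF and OQF computations share the same weighted-score structure, which the following minimal Python sketch illustrates. It assumes the usual weighted-average form (sum of score times weight, divided by the sum of weights). The function name `weighted_factor` and the example scores and weights are hypothetical placeholders; they are not the values of Tables 12 and 14.

```python
from typing import Sequence


def weighted_factor(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of test scores: sum(S_i * W_i) / sum(W_i)."""
    if len(scores) != len(weights) or not scores:
        raise ValueError("scores and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


# Hypothetical scores and weights for seven DGA gases (NOT the Table 12 values).
example_scores = [1, 1, 2, 1, 3, 1, 1]
example_weights = [2, 3, 3, 3, 5, 1, 1]
example_dgaf = weighted_factor(example_scores, example_weights)
```

With real data, the scores would come from the ranking tables in the appendix, and the resulting factor would then be mapped to a code from ''A'' to ''E''.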

C. DEPOLARIZATION FACTOR
The degree of depolarization (DP) of the paper insulation can be measured via the furfural content in the insulating oil [23]. The furan test is recommended and should be performed periodically when the rate of change in CO and CO2 concentration increases or when the transformer age exceeds 25 years [22]. DP is an important factor in computing PTHI. However, when the oil has been replaced, the degradation of the cellulose paper cannot be identified based on the furan content. In this case, the age of the transformer can be used as a factor for health index computation [20], [23]. The range of furan content and the transformer age can be utilized to compute the DP factor (DPF) in accordance with Table 15 in the appendix, as in [20]. Table 1 shows the scoring system of the main representative factors for investigating the transformer health index factor (HIF). The HIF score varies from 4 to 0, which corresponds to transformer criteria codes ''A'' to ''E'', respectively. On the basis of the representative factors, the total PTHI can be computed as

$$\mathrm{PTHI} = \frac{\sum_{i=1}^{3} K_i \, \mathrm{HIF}_i}{\sum_{i=1}^{3} 4 K_i} \times 100\% \qquad (3)$$

where $K_i$ is the weight factor for each transformer condition criterion [1], [20]. $K_i$ is identified in accordance with the importance of the test (DGA, OQ, and DP) in evaluating the PTHI state. The weighting factor of DGAF is 10 because its effect on evaluating the PTHI state is higher than that of OQF (8) and DPF (6). DGAF is the main factor that indicates faulty (poor) and non-faulty (good and fair) states. Utility engineers use the values of DGA features as indicators for faulty and non-faulty states of power transformers. Table 2 defines three PTHI states and their corresponding threshold limits [24]. The ''good'' state represents the case where the PTHI value is greater than or equal to 85, and the ''fair'' state represents the case where PTHI is at least 50 but lower than 85. The ''poor'' state is assigned when PTHI is lower than 50. The required actions corresponding to each state are defined in Table 2 [24].
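As a rough illustration of the scoring described above, the sketch below maps the three factor codes to HIF scores (4 for ''A'' down to 0 for ''E''), applies the weights 10, 8, and 6, and assigns a state using the 85 and 50 thresholds. The normalization by four times the sum of the weights is an assumption made here so that the index spans 0 to 100; the function and dictionary names are illustrative only.

```python
# HIF score per criterion code, as described in the text (4 for "A" ... 0 for "E").
HIF_SCORE = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}

# Weight factors K_i stated in the text for DGAF, OQF, and DPF.
WEIGHTS = {"DGAF": 10, "OQF": 8, "DPF": 6}


def pthi(codes: dict) -> float:
    """Health index in percent from the three factor codes.

    Assumption: the weighted score is normalized by 4 * sum(K_i),
    the maximum attainable weighted score, so the result lies in 0..100.
    """
    numerator = sum(WEIGHTS[name] * HIF_SCORE[codes[name]] for name in WEIGHTS)
    denominator = 4 * sum(WEIGHTS.values())
    return 100.0 * numerator / denominator


def pthi_state(value: float) -> str:
    """Map a PTHI value to the three states using the 85 and 50 thresholds."""
    if value >= 85:
        return "good"
    if value >= 50:
        return "fair"
    return "poor"
```

For example, `pthi_state(pthi({"DGAF": "A", "OQF": "B", "DPF": "A"}))` evaluates a transformer whose oil quality is one grade below the best code.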

III. METHODOLOGY AND SCENARIOS
A. FULL-FEATURE SCENARIO
PTHI is computed using AI classification methods (DT, SVM, KNN, and EN). Figure 1 presents the procedure used to compute the PTHI state in the proposed model depending on the full features of the three test results (DGA, OQ, and DP tests). The dataset containing test results collected from a chemical company in Malaysia and an electric company in the Gulf Region is divided into two subsets. The first subset is used for training, and the second one is used for testing. To calculate the target output for each feature vector of the training and testing datasets, PTHI is computed using Eq. (3) and mapped to the corresponding transformer health state in accordance with Table 2. Then, the selected AI classification method is applied. Finally, the PTHI state is predicted and reported against the selected AI classification method. Figure 2 shows the procedure of the proposed model.
In the full-feature scenario, the PTHI state is predicted in two stages, as illustrated in Figure 2. In the first stage, the 14 input parameters are divided into three groups corresponding to the three factors: DGAF, OQF, and DPF. Each group and its corresponding factor are used for training an individual module. The first group consists of seven input parameters (H2, CH4, C2H6, C2H4, C2H2, CO, and CO2) that are fed into the first classification module with the corresponding calculated DGAF. The second group consists of six input parameters (moisture, breakdown voltage, color, acidity, interfacial tension (IF), and DF) that are supplied to the second classification module with the corresponding calculated OQF. The furan concentration is applied to the third classification module with the corresponding calculated DPF. The selected classification method is used to train the three classification modules individually. In the second stage, a fourth module is trained with a feature vector composed of DGAF, OQF, and DPF as the input and the PTHI state obtained from Table 2 as the output. After the training process is completed, the testing dataset is used to measure the accuracy of the developed classification methods.
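The two-stage structure can be sketched in a few lines of Python. This is only a structural illustration: a tiny 1-nearest-neighbor classifier stands in for the DT, SVM, KNN, and EN methods, the column grouping (7 gases, 6 oil quality tests, 1 furan value) follows the text, and all class and variable names are hypothetical.

```python
class OneNearestNeighbor:
    """Tiny stand-in classifier (1-NN with squared Euclidean distance)."""

    def fit(self, X, y):
        self._X = [list(row) for row in X]
        self._y = list(y)
        return self

    def predict(self, X):
        labels = []
        for row in X:
            dists = [sum((a - b) ** 2 for a, b in zip(row, ref)) for ref in self._X]
            labels.append(self._y[dists.index(min(dists))])
        return labels


# Column indices of the 14 inputs: modules 1 (DGA), 2 (OQ), and 3 (furan).
FEATURE_GROUPS = [list(range(0, 7)), list(range(7, 13)), [13]]
CODE_TO_NUM = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}


class TwoStageModel:
    def __init__(self, groups=FEATURE_GROUPS, make_classifier=OneNearestNeighbor):
        self.groups = groups
        self.stage1 = [make_classifier() for _ in groups]  # modules 1-3
        self.stage2 = make_classifier()                    # module 4

    @staticmethod
    def _columns(X, cols):
        return [[row[c] for c in cols] for row in X]

    def fit(self, X, factor_codes, states):
        """factor_codes[j][i] is the target code of module j for sample i."""
        for module, cols, codes in zip(self.stage1, self.groups, factor_codes):
            module.fit(self._columns(X, cols), codes)
        # Module 4 is trained on the calculated factor codes, not on predictions.
        stage2_inputs = [[CODE_TO_NUM[c] for c in row] for row in zip(*factor_codes)]
        self.stage2.fit(stage2_inputs, states)
        return self

    def predict(self, X):
        per_module = [m.predict(self._columns(X, cols))
                      for m, cols in zip(self.stage1, self.groups)]
        stage2_inputs = [[CODE_TO_NUM[c] for c in row] for row in zip(*per_module)]
        return self.stage2.predict(stage2_inputs)
```

Any classifier object exposing `fit` and `predict` could be passed as `make_classifier`, which is one way to swap in the four methods compared in this study.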

B. REDUCED-FEATURE SCENARIO
Feature selection (FS) is the process of retaining relevant features and discarding irrelevant features that do not influence the output of a problem, with minimum performance degradation. Because PTHI evaluation with all features requires considerable experimental time, intensive labor, and high costs, reducing the number of features is desirable. FS methods are divided into three categories, namely, filters, wrappers, and embedded methods. In this work, the minimum redundancy maximum relevance (MRMR) approach, which is categorized as a filter method, was used to determine the important features. The feature importance scores were estimated using MRMR, as performed in [25]-[27].
MRMR ranks the important features of the classification problem. The main objective of MRMR is to minimize the redundancy of a feature set and maximize the relevance of a feature set to classification output c by using mutual information I, as follows [18], [19]:

$$\max_{S} A_S, \qquad A_S = \frac{1}{|S|} \sum_{x \in S} I(x, c)$$

$$\min_{S} B_S, \qquad B_S = \frac{1}{|S|^2} \sum_{x, z \in S} I(x, z)$$

where S is the optimal set of features that maximizes relevance $A_S$ and minimizes redundancy $B_S$, |S| is the number of features in S, x and z are two features that belong to set S, c denotes the output classes, and I is the mutual information that can be expressed as follows [18], [20]:

$$I(A, B) = \sum_{a \in A} \sum_{b \in B} p(a, b) \log \frac{p(a, b)}{p(a)\, p(b)}$$

where A and B are two sets; a and b are features in subsets A and B, respectively; p(a, b) is the joint probability of a and b; and p(a) and p(b) are the probabilities of a and b, respectively. Calculating an optimal set S requires considering all $2^{|w|}$ combinations, where w is the entire feature set. The mutual information quotient (MIQ) is used to facilitate the estimation of MRMR, as follows:

$$\mathrm{MIQ}_x = \frac{A_x}{B_x}$$

where $A_x$ and $B_x$ are the relevance and redundancy of a feature, respectively, and they can be evaluated as follows:

$$A_x = I(x, c), \qquad B_x = \frac{1}{|S|} \sum_{z \in S} I(x, z)$$
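A greedy MRMR ranking based on the mutual information quotient can be sketched as follows. The sketch assumes discrete (already binned) feature values, estimates probabilities by counting, and adds a small constant to the redundancy term to avoid division by zero; these choices, and the names `mutual_information` and `mrmr_rank`, are illustrative assumptions rather than the implementation used in this study.

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Mutual information (in nats) of two equally long discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (n_ab / n) * log((n_ab / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), n_ab in count_ab.items()
    )


def mrmr_rank(features, target, eps=1e-12):
    """Rank feature names greedily by relevance / redundancy (MIQ).

    features: dict mapping a feature name to its list of discrete values.
    target:   list of class labels.
    """
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    remaining = sorted(features)  # sorted for deterministic tie-breaking
    first = max(remaining, key=lambda name: relevance[name])
    selected = [first]
    remaining.remove(first)
    while remaining:
        def miq(name):
            # Mean mutual information with the already selected features.
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] / (redundancy + eps)
        best = max(remaining, key=miq)
        selected.append(best)
        remaining.remove(best)
    return selected
```

A duplicated feature is pushed down the ranking because its redundancy with the already selected copy is high, whereas a feature that carries new information about the class is promoted.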

IV. RESULTS AND DISCUSSIONS
A. FULL-FEATURE RESULTS
Each of the four selected classification methods (DT, SVM, KNN, and EN) was implemented in MATLAB 2019b. The methods were optimized and implemented in accordance with the MATLAB classification learner toolbox. A brief description of the four intelligent classification methods is presented in the appendix.
Five-fold cross-validation was used to train the four AI-optimized classification methods. The training and testing processes were carried out on the 1,361 dataset samples, which were collected from two countries. The first set (631 samples) was collected from the transmission regions of Malaysia Electricity Company. This set includes samples of 220 kV power transformers at the transmission stage. The second set (730 samples) was collected from the Gulf Region's medium-voltage network (66 kV). Each set was divided randomly into training (65%) and testing (35%) samples. The training samples of the two sets were combined for training (885 samples), and the testing samples of the two sets were combined for testing (476 samples). Table 3 shows the distribution of the samples.
The accuracy of each AI classification method for the training, testing, and overall stages was calculated as

$$\mathrm{Accuracy} = \frac{N_c}{N_t} \times 100\%$$

where $N_c$ is the number of correctly predicted state samples from each AI-optimized classification method and $N_t$ is the total number of state samples. A correctly predicted state sample was identified by comparing the result of the AI classification method with the PTHI state obtained from Eq. (3) for each sample.
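The accuracy measure, the share of correctly predicted states among all samples, can be computed overall and per PTHI state as in the short sketch below. The function names are illustrative, and the labels in the usage note are made up for demonstration.

```python
def accuracy(predicted, actual):
    """Percentage of samples whose predicted state equals the reference state."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n_correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * n_correct / len(actual)


def per_state_accuracy(predicted, actual):
    """Accuracy in percent computed separately for each reference state."""
    result = {}
    for state in sorted(set(actual)):
        pairs = [(p, a) for p, a in zip(predicted, actual) if a == state]
        result[state] = 100.0 * sum(p == a for p, a in pairs) / len(pairs)
    return result
```

For instance, `accuracy(["good", "fair", "poor", "good"], ["good", "fair", "good", "good"])` counts three matches out of four samples. The per-state view matters here because the ''poor'' class is small, so a high overall accuracy can hide a much lower accuracy for that state.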
The classification learner toolbox application in MATLAB 2019b was used to build the four classification methods  (DT, SVM, KNN, and EN). For example, the DT method types (fine tree, medium tree, and coarse tree) were used, and the issue was determining which one develops the best predictor. Hence, the hyperparameter optimization in the classification learner toolbox was used to select the suitable classification method and the corresponding parameters of the two suggested scenarios.
The minimum estimated error was determined as follows: where Nci is the number of correctly predicted states of classification type i. AI classification algorithms (DT, SVM, KNN, and EN) require an optimization technique to determine the best optimization parameters for each AI classification method. The optimal parameters of DT methods are the maximum number of splits and split criterion, and that of SVM are multiclass method, box constraint level, kernel function used, and standardized data. Several optimization approaches, such as Bayesian optimization (BO), grid search, and random search, are frequently used with classification methods. The BO approach is effective for optimization problems and can be used for most machine learning techniques for hyperparameter optimization [28]. BO iteratively explores the hyperparameter space, where a probabilistic model of estimation is built based on prior calculation. Then, the probabilistic model is utilized to evaluate the optimal parameters by using the probability values of its position and selecting the parameters related to the highest probability [29]. The detailed model of BO used for estimating the optimal hyperparameters of machine learning techniques was introduced in [29], [30].
The training process was conducted through the following steps. First, the main optimization parameters were selected before the training process of each classification method (DT, SVM, KNN, and EN). Second, the optimization training process was carried out for each classification method. Third, an optimized model was generated for each classification method. Lastly, Steps 2 and 3 were repeated for each module of the proposed model shown in Figure 2. The main optimization parameters for training the DT, SVM, KNN, and EN methods are presented in Table 4. The DT, SVM, KNN, and EN methods were optimized for the four modules of the full-feature approach on the basis of the 885-sample training dataset. The optimal parameters of the four classification methods for the four modules in the full-feature scenario are given in Table 5. Figure 3 shows the minimum errors during the optimization process of the training stage for the DT, SVM, KNN, and EN methods corresponding to the four modules introduced in Figure 2 in the full-feature scenario. SVM had the minimum error for module 1, EN had the minimum error for module 2, and the four methods had an equal minimum error for modules 3 and 4.
Figure 5 illustrates the overall accuracy for the training and testing stages in the full-feature scenario. The overall accuracy of predicting the PTHI states for the DT, SVM, KNN, and EN methods was (95.3%, 96.2%, 95.4%, and 96%) for the training process and (93.9%, 95.2%, 95.2%, and 95.6%) for the testing process. These results indicate good prediction accuracy during the training and testing stages. Table 6 presents the number of total samples correctly predicted by each of the four methods against each PTHI state. The detection accuracy of SVM and EN was better than that of the other methods for the ''good'' state. KNN had the highest accuracy for the ''fair'' state, and the SVM method had the highest accuracy for the ''poor'' state.
The EN method achieved superior classification accuracy (95.9%) in the full-feature scenario.

B. REDUCED-FEATURE RESULTS
This section introduces the procedure for ranking the features of DGAF and OQF by importance using the MRMR approach presented in Section III (B). The number of sufficient features for each factor was determined using the principal component analysis (PCA) facility in the MATLAB classification learner toolbox after the training process. The sufficient features for each factor were those whose cumulative explained variance was greater than 95%. The training process was carried out with PCA for DGAF and OQF. The minimum number of features for DGAF was only one, with explained variance ratios in the order of 98.9%, 1%, and <0.1% for the other features. Meanwhile, the number of features required for OQF was at least three, with explained variance ratios in the order of 72.4%, 15.7%, 9.3%, 2.5%, 0.2%, and 0%. The important features for DGAF and OQF were selected based on the highest feature scores obtained from the MRMR method. The feature importance scores of DGAF and OQF are presented in Figures 6 and 7, respectively. The results showed that CO2, C2H2, C2H6, and C2H4 (Figure 6) were the most important features of DGAF, whereas color, BDV, IF, and moisture (Figure 7) were the most important features of OQF. The training process was repeated with the selected features for DGAF and OQF. Although PCA indicated that a single feature (CO2) was sufficient for DGAF, this feature alone was not enough to represent DGAF because CO2 and CO illustrate the insulation paper state, whereas the other features illustrate the faulty state of the transformer oil. Hence, one of them was selected together with CO2 as an input of module 1 to predict the DGAF state.
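The 95% criterion can be checked directly from the explained variance ratios reported above. The following sketch accumulates the ratios until the threshold is exceeded; the function name `components_needed` is an illustrative choice, and the DGAF list keeps only the two ratios that are stated explicitly.

```python
def components_needed(explained_variance_pct, threshold_pct=95.0):
    """Smallest number of leading components whose cumulative explained
    variance (in percent) exceeds the threshold."""
    cumulative = 0.0
    for count, ratio in enumerate(explained_variance_pct, start=1):
        cumulative += ratio
        if cumulative > threshold_pct:
            return count
    raise ValueError("threshold is never exceeded by the given ratios")


# Ratios reported in the text (percent).
dgaf_ratios = [98.9, 1.0]
oqf_ratios = [72.4, 15.7, 9.3, 2.5, 0.2, 0.0]

dgaf_count = components_needed(dgaf_ratios)  # one component suffices
oqf_count = components_needed(oqf_ratios)    # 72.4 + 15.7 + 9.3 = 97.4 > 95
```

For OQF, the first two components reach only 88.1%, which is why a third is required.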
After selecting the required features for each module (modules 1 and 2), the same procedure for the full-feature scenario was carried out for the reduced-feature scenario. Different features were selected for DGAF and OQF and applied to modules 1 and 2, respectively, whereas furan was inserted into module 3. The output codes of modules 1, 2, and 3 were applied to module 4 to obtain the final PTHI state. The obtained model for each case was used at the testing stage to predict the PTHI by using the testing data samples. Different numbers of features were considered for DGAF and OQF. The overall accuracy in each case is reported in Table 7. In Table 7, the numbers 7, 5, 3, and 4 denote CO2, C2H2, C2H6, and C2H4 in the DGAF column, respectively, and the numbers 12, 8, 9, and 11 denote color, BDV, IF, and moisture in the OQF column, respectively. The highlighted case has the best prediction accuracy of 93.5%, 94.3%, 94.9%, and 95.8% for the DT, SVM, KNN, and EN methods, respectively. The selected features for DGAF that satisfy the minimum requirements of PCA and achieve the highest prediction accuracy for the PTHI state were CO2, C2H2, C2H6, and C2H4; the corresponding features for OQF were color, BDV, IF, and moisture. Figure 8 shows the confusion matrix for the training process of the EN method in the reduced-feature scenario. The ''good'' state had the highest classification accuracy (655/662 = 98.9%), whereas the ''poor'' state had the lowest classification accuracy (10/19 = 52.6%). The overall accuracy of detecting the PTHI state was 846/885 = 95.6%. Figure 9 presents the overall accuracy for the training and testing stages in the reduced-feature scenario. The overall accuracy of predicting the PTHI states for the DT, SVM, KNN, and EN methods was (94.1%, 95%, 94.4%, and 95.6%) during the training process and (92.4%, 92.9%, 96%, and 96.2%) during the testing process.
The results reveal good prediction accuracy during the training and testing stages, especially with the EN method. Table 8 presents the number of samples correctly predicted by each classification method for each PTHI state. The results indicated that the detection accuracy of the EN method was better than that of the other methods for the ''good'' and ''fair'' states. KNN exhibited the highest accuracy for the ''poor'' state. The EN method achieved the highest classification accuracy (95.8%) in the reduced-feature scenario. Only a slight difference in accuracy was observed between the full- and reduced-feature scenarios. Thus, the reduced-feature scenario was considered to decrease the effort and time of the computation process and the experimental cost.

C. COMPARISON OF FULL AND REDUCED FEATURES
The full- and reduced-feature scenarios were compared to investigate the difference in their prediction accuracy. All features were used for training and testing in the full-feature scenario, but only nine features (CO2, C2H2, C2H6, C2H4, color, BDV, IF, moisture, and furan) were adopted for the reduced-feature scenario. Applying the reduced features resulted in a shorter training time (for training modules 1 and 2), less time and effort for measuring the features in the laboratory, and fewer oil samples required for measuring the features. Table 9 presents a detailed comparison of the full- and reduced-feature scenarios for the ''good,'' ''fair,'' ''poor,'' and overall PTHI states. The prediction accuracies of each method in the two scenarios showed good agreement. The prediction accuracy of the ''good,'' ''fair,'' ''poor,'' and overall PTHI states for the EN method was (98.6%, 91%, 61.8%, 95.9%) and (98.7%, 91%, 55.9%, 95.8%) for the full- and reduced-feature scenarios, respectively. The results show only a slight difference between the full- and reduced-feature scenarios, especially when the EN method was involved. Figure 10 presents a comparison boxplot of the classification methods (DT, SVM, KNN, and EN) for the full- and reduced-feature scenarios. It presents the minimum, maximum, and standard deviation of each method in the two scenarios. A slight difference in the maximum and mean values was observed between the two scenarios for the four classification methods, especially the EN method. The KNN and EN methods had small differences in their minimum values under the full- and reduced-feature scenarios, whereas DT and SVM showed a large difference. The EN method is thus the preferred method for the reduced-feature scenario; it requires only nine features, and its accuracy is close to that of the full-feature scenario.

D. PROPOSED MODEL VALIDATION
1) PROPOSED MODEL AGAINST THE RESULTS IN [24]
The effectiveness of the proposed model was checked by using two approaches. The first one involved comparisons with the results published in [24], which used the dataset samples of System 2 (Gulf Region). The second approach was an uncertainty check. Table 10 presents the comparison between the results obtained by the proposed methods and those presented in [24] for the full- and reduced-feature scenarios. The results in [24] are based on the Gulf Region dataset and on the methods NN, MLR, J48, and RF (the methods that exhibited high accuracy among the nine methods in [24]). The proposed methods demonstrated higher accuracy compared with the methods presented in [24] for the full-feature (*) scenario and acceptable accuracy for the reduced-feature (**) scenario. For the full-feature scenario, the highest accuracy achieved by the proposed model was 96.7% for the KNN classification method, whereas the highest accuracy achieved by the methods in [24] was 96.6% for RF classification. In the reduced-feature scenario, the results of the proposed methods were better than those in [24], except for the RF method, which achieved 96.6%. The best among the proposed methods (the EN method) achieved 95.6%.

2) PROPOSED MODEL AGAINST UNCERTAINTY
The process of preparing the datasets of the power transformers was carried out offline in three main steps. The first step was extracting oil samples from the power transformers. The second step was extracting the gases from the transformer oil, and the third step was predicting the PTHI state. The oil samples were extracted using special syringes. The extracted samples were stored and transferred to laboratories. Many factors, such as storage time and temperature, affect gas concentration. The extraction of gases can be conducted using several techniques. Air bubbles are the most critical factor that affects gas concentrations [31]. Air bubbles reduce the dissolved gas content because gases diffuse into the bubbles and thus leave the oil [32]. Hence, uncertainty during measurements affects PTHI state prediction and must be considered by the AI classification techniques used for this purpose. An uncertainty level of ±14% is caused by the sample's storage and temperature effects, and an uncertainty level of up to ±5% is caused by measurement accuracy [33]. In this work, an uncertainty level of up to ±20% was considered.
An uncertainty evaluation was carried out by applying percentage noise to the input data. Percentage noise can be calculated as follows [21]:

$$N_l = (1 - m) + 2m\,R_L$$

where N_l is the noise vector, l represents the number of applied samples in the uncertainty evaluation (476 testing samples), m is the required noise percentage level (5%, 10%, 15%, or 20%, expressed as a fraction), and R_L is a random vector of length 14 whose elements vary from 0 to 1. The elements of N_l therefore vary from 0.95 to 1.05, 0.9 to 1.1, 0.85 to 1.15, or 0.8 to 1.2 when m is 5%, 10%, 15%, or 20%, respectively. The original input feature vector and N_l were multiplied element by element to obtain new data (data with uncertain noise). These new data were fed to the proposed model using the different AI classification methods for the full- and reduced-feature scenarios. The test dataset was used to measure the performance of the four AI methods during the uncertainty evaluation. Table 11 presents the results of the full- and reduced-feature scenarios against uncertainty data (476 testing dataset samples) for the four AI classification methods. Tables 6 and 10 show the results of the four AI methods with the testing data for the full- and reduced-feature scenarios, respectively. The four AI methods exhibited limited performance degradation under uncertainty, and the EN method showed the best performance in the full-feature scenario at ±20% uncertainty noise. The performance of the four AI classification methods confirmed that the uncertainty noise effect was limited in the proposed scenarios. Figure 11 presents the change in the accuracy of the four AI classification methods against uncertainty noise of 0% to ±20% under the two scenarios. The degradation in the accuracy of the four methods was limited in the two scenarios for uncertainty levels of up to ±20%. EN had the best performance against uncertainty in the full- and reduced-feature scenarios.
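The noise-injection step can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: it assumes that each noise factor is a linear map of a uniform random number onto the interval [1 - m, 1 + m], which is consistent with the ranges quoted above. The function names `noise_vector` and `apply_noise` and the 14-feature sample are hypothetical.

```python
import random

def noise_vector(m, n_features=14, rng=random):
    """Return multiplicative noise factors in [1 - m, 1 + m].

    m is the noise level as a fraction (0.05 for 5%). The linear mapping
    of a uniform random number in [0, 1) is an assumption chosen to match
    the ranges quoted in the text.
    """
    return [(1.0 - m) + 2.0 * m * rng.random() for _ in range(n_features)]

def apply_noise(sample, m, rng=random):
    """Multiply a feature vector element by element with a fresh noise vector."""
    factors = noise_vector(m, len(sample), rng)
    return [x * f for x, f in zip(sample, factors)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = [float(i + 1) for i in range(14)]  # hypothetical 14-feature sample
noisy = apply_noise(sample, 0.20, rng)      # +/-20% uncertainty level
```

Each noisy feature stays within ±20% of its original value, so the classifier sees inputs perturbed in the same way as in the uncertainty evaluation.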

V. CONCLUSION
In this work, full- and reduced-feature scenarios were proposed to predict the PTHI state by using four intelligent classification methods. The full-feature scenario included four classification modules for predicting the PTHI state. The four proposed AI classification methods were DT, SVM, KNN, and EN. The overall prediction accuracy in the full-feature scenario was 94.8%, 95.3%, 95.8%, and 95.9% for DT, SVM, KNN, and EN, respectively. These results indicate that the recommended method for PTHI state prediction in the full-feature scenario is the EN method. Furthermore, the results of the proposed model are superior to those presented in [24]. The MRMR feature reduction method was also used to reduce the input features of the full-feature scenario from 7 to 4 for DGAF and from 6 to 4 for OQF. CO2, C2H2, C2H6, C2H4, color, BDV, IF, moisture, and furan were the final selected features for the reduced-feature scenario. Only a slight difference in accuracy was observed between the full- and reduced-feature scenarios. The highest overall accuracy in the full- and reduced-feature scenarios was 95.9% and 95.8%, respectively; these values were obtained with the EN method. Thus, using the reduced-feature scenario is more reasonable than using the full-feature scenario. An uncertainty approach was applied to investigate the robustness of the proposed model. The results indicated that the maximum errors of classification accuracy were 2.8% at ±15% uncertainty noise and 5.4% at ±20% uncertainty noise for the full- and reduced-feature scenarios, respectively; these values were obtained with the KNN classification method.

APPENDIX A DECISION TREE (DT)
The DT method is a classification approach. A decision tree starts at the root node [34], where the data are split into different branches to obtain the specified classes. Next, the class condition is identified within each branch, and the classification forms a sub-tree. The same sorting process is repeated until all data in a branch are of the same type. Figure 12 shows the construction of a decision tree. Two or more paths branch from the root node, each representing an outcome of the test on the training dataset. These paths end in an internal node, which denotes a test on an attribute. Finally, each leaf node holds a prediction in the form of a numeric class. The DT method has different types, such as fine, medium, and coarse.
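The recursive splitting described above can be illustrated with a small Python sketch. It is not the implementation used in this study: the Gini impurity criterion, the threshold-style tests, and the toy two-feature samples with labels "good", "fair", and "poor" are all assumptions made for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best

def build_tree(rows, labels):
    """Split recursively until a branch is pure or no split helps."""
    split = best_split(rows, labels)
    if split is None:  # leaf node: store the majority class
        return Counter(labels).most_common(1)[0][0]
    j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return (j, t,
            build_tree([r for r, _ in left], [y for _, y in left]),
            build_tree([r for r, _ in right], [y for _, y in right]))

def predict(tree, row):
    """Follow the tests from the root down to a leaf."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

# Hypothetical samples: two features per sample, three health states.
rows = [[10, 1], [12, 2], [30, 1], [35, 3], [80, 2], [90, 4]]
labels = ["good", "good", "fair", "fair", "poor", "poor"]
tree = build_tree(rows, labels)
```

On this toy data the root tests the first feature, and the recursion stops once every branch contains a single class, mirroring the stopping rule stated in the text.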

APPENDIX B ENSEMBLE METHOD (EN)
This classifier method is used to reduce variance in prediction models. The EN method has different types, such as bag, AdaBoost, RUSBoost, LogitBoost, and GentleBoost. The most important EN method is the bagged trees method. It generates new training datasets by sampling the original dataset with replacement [35], [36], fits a base classifier to each of these random subsets, and then aggregates the individual predictions to reach the final decision. This type of classification is based on a tree structure consisting of a master node called the "root" and a group of internal and final nodes called "terminals." The models generated by this classification are characterized by high accuracy and speed in model construction. The method can also be applied to multiclass data and can be interpreted and understood through decision tree analysis. The proposed model was validated against uncertainty noise of up to ±20%.
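The bagging procedure (bootstrap sampling followed by vote aggregation) can be sketched in Python as below. To keep the example short, a toy nearest-centroid classifier stands in for the decision trees used as base learners in bagged trees; this substitution, the function names, and the two-class toy data are assumptions for illustration only.

```python
import random
from collections import Counter

def fit_centroids(rows, labels):
    """Toy base learner: per-class mean vector (stand-in for a decision tree)."""
    sums, counts = {}, Counter(labels)
    for r, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(r))
        for i, v in enumerate(r):
            acc[i] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict_centroids(model, row):
    """Return the class whose centroid is closest to the row."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(model, key=lambda y: dist2(model[y]))

def bagging_fit(rows, labels, n_models=25, seed=0):
    """Fit one base learner per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_centroids([rows[i] for i in idx],
                                    [labels[i] for i in idx]))
    return models

def bagging_predict(models, row):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(predict_centroids(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical two-class data: two well-separated clusters.
rows = [(float(i), float(i + 1)) for i in range(5)] + \
       [(float(i + 20), float(i + 21)) for i in range(5)]
labels = ["low"] * 5 + ["high"] * 5
models = bagging_fit(rows, labels)
```

Because every base learner sees a different resampled dataset, their individual errors tend to differ, and the majority vote averages them out; this is the variance reduction mentioned above.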

APPENDIX C SUPPORT VECTOR MACHINE (SVM)
SVM is one of the most common machine learning methods, and it is used as a classifier for data analysis [37], [38]. SVM finds the optimal separating hyperplane to maximize the margin between data samples, as indicated in Figure 13. The filled circles denote the support vectors, and the unfilled circles refer to the training data.
SVM regression evaluates a function on the basis of input and output data, as follows [38]:

$$f(x) = w \cdot x + b$$

where w refers to the weight vector and b is the bias term. They are used to identify the location of the hyperplane that satisfies certain constraints, as follows [38]:

$$\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2} \quad \text{subject to} \quad y_k\,(w \cdot x_k + b) \ge 1, \quad k = 1, 2, \ldots, m$$

Therefore, SVM depends on the features of each collected sample. SVM constructs a hyperplane to segregate the samples of different classes. The hyperplane is built according to the training datasets, and it is used as a classifier for a new sample to obtain the actual class of each tested sample. A popular function used to identify the hyperplane is the kernel function. The kernel function has different types, such as Gaussian, linear, quadratic, and cubic [39].
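As a small numerical illustration of the margin constraint, the Python sketch below checks whether a candidate linear hyperplane separates a toy dataset with functional margin of at least 1 and reports the resulting geometric margin 2/||w||. It assumes the standard linear decision function f(x) = w·x + b; the function names and the four sample points are hypothetical, and no optimizer is included.

```python
import math

def decision(w, b, x):
    """Linear decision function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def satisfies_margin(w, b, xs, ys):
    """Check y_k * (w . x_k + b) >= 1 for every training sample."""
    return all(y * decision(w, b, x) >= 1.0 for x, y in zip(xs, ys))

def margin_width(w):
    """Distance between the two margin hyperplanes, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical 2-D samples with labels -1 and +1, separable by x1 = 1.
xs = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
ys = [-1, -1, 1, 1]

wide = ((1.0, 0.0), -1.0)    # feasible, margin width 2.0
narrow = ((2.0, 0.0), -2.0)  # feasible, but larger ||w|| and width 1.0
too_wide = ((0.5, 0.0), -0.5)  # violates the constraint (y * f = 0.5)
```

Both `wide` and `narrow` satisfy the constraints, but minimizing the norm of w selects `wide`, which is why the smallest feasible weight vector gives the maximum-margin hyperplane.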

APPENDIX D K-NEAREST NEIGHBOR (KNN) CLASSIFIER
KNN is a machine learning classification method. KNN stores all available samples with known states and classifies a new sample on the basis of a similarity measure computed with a distance function. In operation, KNN determines the distances between a query and all samples of the dataset. Then, it selects the specified number of samples (k) closest to the query and votes on the most frequent label or averages the labels. Figure 14 shows the operation of KNN.
Several classes can be represented by different shapes, such as squares and triangles. A test sample is drawn as a star, and the task is to identify the class to which it belongs (class 1 [square] or class 2 [triangle]). The value of k determines the class. For example, k = 3 yields class 1 because the three nearest neighbors comprise two squares and one triangle. By contrast, k = 5 yields class 2 because the five nearest neighbors comprise three triangles and two squares [40]. The distance metrics used for KNN are Euclidean, city block, Chebyshev, Minkowski (cubic), Mahalanobis, cosine, Spearman correlation, Hamming, and Jaccard.
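The dependence of the predicted class on k can be reproduced with a short Python sketch. The coordinates below are hypothetical and were chosen only so that the three nearest neighbors of the query are two squares and one triangle, while the five nearest are three triangles and two squares; the Euclidean distance is assumed.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training samples closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical training samples: (coordinates, class label).
train = [
    ((1.0, 0.0), "square"),    # distance 1.0 from the query
    ((0.0, 1.2), "square"),    # distance 1.2
    ((1.5, 0.0), "triangle"),  # distance 1.5
    ((2.0, 0.0), "triangle"),  # distance 2.0
    ((0.0, 2.2), "triangle"),  # distance 2.2
    ((3.0, 3.0), "square"),    # farthest sample
]
query = (0.0, 0.0)  # the "star" test sample

class_k3 = knn_predict(train, query, 3)  # "square": two squares, one triangle
class_k5 = knn_predict(train, query, 5)  # "triangle": three triangles, two squares
```

The same query is therefore assigned to different classes for k = 3 and k = 5, which is why k is usually tuned on validation data.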