Using GANCNN and ERNET for Detection of Non Technical Losses to Secure Smart Grids

In this paper, two solutions based on supervised learning models are proposed for Electricity Theft Detection (ETD). In the first solution, Adaptive Synthetic Edited Nearest Neighbor (ADASYNENN) is used to solve the class imbalance problem. For feature extraction, the Locally Linear Embedding (LLE) technique is utilized. Moreover, a Self-Attention Generative Adversarial Network (SAGAN) is used in combination with a Convolutional Neural Network (CNN) for the classification of electricity consumers. In the second solution, Synthetic Minority Oversampling Technique Edited Nearest Neighbor (SMOTEENN) is used for data sampling. Moreover, a novel classification model, named ERNET, which is based on EfficientNet, Residual Network (ResNet) and Gated Recurrent Unit (GRU), is used to detect Non-Technical Losses (NTLs). A Sparse Auto Encoder (SAE) is also used for feature extraction, which makes the classification more robust. Furthermore, the Root Mean Square Propagation (RMSProp) optimizer is used to adapt the learning rate of the model. To validate the proposed models, simulations are performed using different performance metrics, such as precision, recall, F1-score, Area Under the Curve (AUC), False Positive Rate (FPR) and Root Mean Square Error (RMSE). All simulations are performed using the State Grid Corporation of China (SGCC) dataset. The proposed models are compared with benchmark models, such as SAGAN, Wide and Deep Convolutional Neural Network (WDCNN), CNN and Long Short Term Memory (LSTM). The simulation results show that the proposed models outperform the existing models in terms of the aforementioned performance metrics.

x: Consumer's observation
x_i: Daily electricity consumption of a consumer
ŷ: Predicted output
y: Actual output
z: Input sample
Z: A randomly selected input from data
λ: Random number between 0 and 1
σ: Sigmoid function
η: Learning rate
ρ: Exponential average of last updated value
ε: To avoid ending with zero

I. INTRODUCTION
With the increase in the number of residential homes and industries, the demand for energy increases manifold. Therefore, power generation companies need to generate more electricity [1]. Moreover, there should be a balance between electricity generation and consumption to mitigate the issue of energy shortage [2]. Due to the latest advancements in Advanced Metering Infrastructure (AMI), traditional grids are being converted into smart grids where data is collected through smart meters. The balance between demand and supply is also established using bi-directional flow of energy and information [3]. In energy transmission systems, two types of losses occur, which are known as Technical Losses (TLs) and Non-Technical Losses (NTLs). The former occur due to poor infrastructure and energy dissipation. The latter are defined as the difference between the total electricity transmitted through distribution lines and the electricity consumed by the users. Due to the NTLs, power utilities face losses worth millions of dollars, which highly affect the country's economy [4]. The manual inspection of these losses is both time consuming and expensive [5], [6].
There are different reasons for the occurrence of NTLs, which are broadly categorized in two categories: human and non-human. The former includes tampering the meter readings, hooking with the main lines, etc. Whereas, the latter includes errors in smart meters, fluctuating energy flow, meter inaccuracies, etc., [6]. With the NTLs, other losses also occur, such as unbearable load on electrical systems, load shedding, economical loss, etc., [7]. With the use of smart meters, flow of both energy and information becomes automated. For the utility companies, the smart meters remotely provide data related to readings of electricity consumption on real time basis. Therefore, it becomes easy to steal the electricity by manipulating the electricity consumption data [8], [9].
To handle NTLs, several solutions have been proposed in the literature [10], [11]. These solutions are broadly categorized into three types: hardware based solutions, game theory based solutions and data-driven based solutions. Hardware based solutions focus on designing smart devices and sensors to detect electricity thieves. These solutions require hardware equipment and devices that are expensive and involve high maintenance cost. Moreover, these devices are less efficient and more time consuming [12]. On the other hand, in game theory based solutions, there exists a game between players, i.e., utility and electricity users [13]. Both entities try to maximize their benefits. However, these solutions are based on assumptions that can be inappropriate if wrong utility function is formulated. Moreover, these solutions have less accuracy and high False Positive Rate (FPR) [14]. In contrast, many data-driven based solutions are adopted in literature for NTL detection [15]. These solutions include techniques that are based on artificial intelligence and machine learning. These techniques perform pattern analysis on the electricity consumption data without requiring additional hardware cost and human resources. Furthermore, the data-driven based solutions are more robust, efficient and easy to understand as compared to hardware based and game theory based solutions [16]. However, in the data-driven based solutions, there are some problems, which are needed to be addressed. The problems are class imbalanced problem, low accuracy and high FPR. Therefore, this study is carried out keeping in view the data-driven solutions.
The remainder of this paper is organized as follows. Section II discusses the work done in the literature on the detection and prevention of NTLs and also highlights the limitations. Whereas, the proposed solutions are elaborated in Section III. Simulation results along with their discussion are presented in Section IV. Conclusion and future work are given in Section V.

II. STATE-OF-THE-ART METHODOLOGIES
In smart grids, anomaly is defined as the deviation from regular or normal electricity consumption patterns. It occurs due to many factors like arrival of more family members at home, occurrence of a special occasion, illegal use of electricity, etc. In anomaly detection, data-driven models are used that learn the normal patterns and detect the abnormal patterns to identify the electricity thieves.
Maamar et al. [17] have proposed a hybrid technique, which is based on k-means clustering and deep neural network for anomaly detection. However, users have to select the value of k (number of clusters) at the start, which is not suitable in a dynamic environment. Also, the class imbalanced problem is not resolved. Yip et al. [18] have proposed a novel technique, named as loss factor and error term, to detect anomalies in smart meter data. The loss factor is used to calculate NTLs; whereas, the error term is used to detect noise in transmission and distribution lines. However, the class imbalanced problem is not addressed. In a recent study [19], Cheng et al. have proposed an auto encoder technique for anomaly detection in electricity consumption data. The auto encoder is used to learn the patterns in an unsupervised manner while discarding the noise. However, the class imbalanced problem is not addressed and overfitting is not solved, especially when there is no sufficient level of diversity in the data. Giuseppe et al. have proposed a concept drift aware approach for the detection of anomalies in electricity consumption patterns. The authors have used Long Short Term Memory (LSTM) to capture the periodicity of normal consumers based on their previous consumption history [20]. However, the class imbalanced problem is not solved. The LSTM method may require high memory bandwidth to feed its computational units. Ding et al. [21] have proposed a hybrid model, which is based on Gaussian Mixture Model (GMM) and LSTM for real time anomaly detection. However, it is difficult to determine for certain the number of clusters to be created. Also, the class imbalanced problem is not tackled. Authors in [22] have proposed Jaya-LSTM for the forecasting of electricity load. All of the above mentioned methods perform well in terms of anomaly detection. However, the methods are not feasible enough to accurately detect electricity fraudsters.
Zheng et al. [23] have proposed Wide and Deep Convolutional Neural Network (WDCNN) for ETD. They have used the State Grid Corporation of China (SGCC) dataset, which consists of verified electricity thieves. However, the class imbalanced problem is not addressed.
Zheng et al. [24] have proposed a hybrid technique, which is a combination of maximum information coefficient and clustering technique, to find density peaks for the detection of electricity thieves. However, it is tedious to generate clusters from local densities of data points that are randomly distributed. Thus, it is difficult for the cluster heads to be selected. Moreover, the class imbalanced problem is not solved. Li et al. [25] have performed ETD for Internet of Things (IoT) enabled smart homes. The method is not suitable for handling future changes in electricity consumption data as it depends on the past data. Moreover, the class imbalanced problem is not addressed. Fleury et al. [26] have proposed a genetic programming algorithm for theft detection. Data is collected from more than 4000 consumers for experiments. The authors have focused on feature engineering rather than classification. However, it is difficult to tune the parameters of the genetic programming algorithm and also selecting a wrong number of clusters may affect the accuracy of the algorithm. Micheli et al. [27] have proposed a multiple linear regression model for the detection of NTL. However, the model is not efficient for real life scenarios as the relationship between covariates and response variables may not be linear. In another study, Coma-Puig et al. [28] have implemented and compared three machine learning techniques: eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM) and Category Boosting (CatBoost) for NTL detection. However, the techniques are not suitable for smaller datasets as the methods may create an overfitting problem. Viegas et al. [29] have proposed a fuzzy clustering model for the identification of electricity thieves. They have used the Gustafson-Kessel fuzzy algorithm and have attained 63% True Positive Rate (TPR). Buzau et al. [30] have applied several machine learning models, such as K Nearest Neighbor (KNN), Linear Regression (LR) and Support Vector Machine (SVM) for NTL detection. However, there is a problem of selecting k in KNN, and SVM is not suitable for large datasets.
Moreover, electricity theft is a crucial problem for utility companies, as they have to bear huge losses every year. Many data-driven based solutions are proposed in the literature for ETD. However, there exists some limitations in these solutions, which are needed to be addressed. Li et al. [44] have proposed a hybrid model, which consists of CNN and Random Forest (RF) for ETD. However, the computational complexity of RF is very high as it takes more time to construct decision trees. In addition, FPR is also not calculated. Hasan et al. [45] have proposed a hybrid technique by combining CNN and LSTM for ETD. The proposed technique efficiently performed in terms of accuracy. However, LSTM requires a lot of memory for storing long-term sequences. Moreover, LSTM is not hardware friendly because it needs more resources as compared to CNN and Gated Recurrent Unit (GRU). In [45], authors have used Synthetic Minority Oversampling Technique (SMOTE) to balance the data for training CNN and LSTM models to perform classification. However, SMOTE generates synthetic data samples, which cause overfitting problem. In [31], authors have proposed Random Undersampling (RUS) with adaptive boosting method to solve the problem of imbalanced data. They have proposed Maximal Overlap Discrete Wavelet Packet Transform (MODWPT) for classification. However, RUS leads to loss of relevant samples that are important for the training of machine learning model and prevent it from underfitting problem. Authors in [23] have proposed WDCNN model for the detection of electricity thieves. However, authors do not provide a mechanism to handle the class imbalanced problem, which leads to misclassification. Furthermore, the model has high FPR value, which leads to high inspection cost. Table 1 shows the analysis of some of the existing techniques and their performance measures while Table 2 presents the advantages and disadvantages of existing techniques.

III. PROPOSED MODELS
To overcome the issues identified from the literature, we propose two deep learning models in this work: GANCNN and ERNET. The former is the combination of Self-Attention Generative Adversarial Network (SAGAN) and CNN. Whereas, the latter is a hybrid of EfficientNet, Residual Network (ResNet) and GRU. In the GANCNN model, data sampling and feature extraction are done using Adaptive Synthetic Edited Nearest Neighbor (ADASYNENN) and Locally Linear Embedding (LLE), respectively. In the ERNET model, GRU is applied for the classification of honest and dishonest consumers. GRU does not require separate memory cell and excessive parameters to train the model.

A. PROPOSED GANCNN MODEL
The proposed GANCNN model comprises five steps: data collection, data pre-processing, data sampling, feature extraction and classification. The proposed model is illustrated in Figure 1 and its flowchart is presented in Figure 2. The steps and the flowchart are discussed in the following subsections.

1) DATA COLLECTION
The data used in this work is acquired from SGCC [46]. It is a labeled dataset with a known number of electricity thieves. It consists of the customer ID, the daily consumption and a flag (i.e., target attribute), which is either 0 or 1. Daily electricity consumption data from January 2014 to October 2016 is considered. The consumption records of 42,372 consumers are present in the dataset. Out of these, 3,615 are electricity thieves and the remaining 38,757 are normal consumers [23]. Table 3 shows the details of the SGCC dataset.

2) DATA PRE-PROCESSING
The dataset contains missing values due to faulty meters, unscheduled repair of electrical equipment, data tampering, etc. In this research, we use an imputation method for handling the missing values [47]. The working of the imputation method is shown in Figure 3. Initially, it checks the current value in the dataset. If the value is not null, it remains unchanged; otherwise, the algorithm checks the neighboring values to fill the missing value. If the neighboring values are null, then they are replaced with zeroes. If the previous and next values are not null, the mean of both values replaces the current null value. Note that even if the neighboring values are null and replaced with zeroes, the overall pre-processing method is not affected negatively, because the data distribution remains the same. To normalize the data, we use the Min-Max normalization method, as neural networks are sensitive to diverse data. The method is applied using Equation 1 [23].
$$ f(x_i) = \frac{x_i - \min(x)}{\max(x) - \min(x)} \tag{1} $$
Here, x_i denotes the daily electricity consumption of a consumer, and max(x) and min(x) are the maximum and minimum values of the consumer's consumption, respectively.
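The pre-processing described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names `impute_neighbors` and `min_max` are hypothetical, a missing reading with a missing or absent neighbor is set to zero, and a constant series is mapped to zeros to avoid division by zero.

```python
import numpy as np

def impute_neighbors(x):
    """Fill each NaN with the mean of its two neighbors, or 0 if a neighbor is missing."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for i in np.flatnonzero(np.isnan(x)):
        prev_ok = i > 0 and not np.isnan(x[i - 1])
        next_ok = i < len(x) - 1 and not np.isnan(x[i + 1])
        # Assumption: series edges and runs of NaNs fall back to zero.
        out[i] = (x[i - 1] + x[i + 1]) / 2 if prev_ok and next_ok else 0.0
    return out

def min_max(x):
    """Min-Max normalization of one consumer's series, as in Equation 1."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    # Assumption: a constant series is mapped to all zeros.
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

# Hypothetical daily readings of one consumer, with gaps.
readings = [2.0, np.nan, 4.0, np.nan, np.nan, 10.0]
filled = impute_neighbors(readings)   # [2, 3, 4, 0, 0, 10]
scaled = min_max(filled)              # values in [0, 1]
```

Normalizing each consumer's series separately follows the definition of max(x) and min(x) as per-consumer values.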

3) DATA SAMPLING
The SGCC dataset is imbalanced in nature, as shown in Figure 4a. If the data in the majority and minority classes is not balanced, the classifier becomes biased towards the majority class, which further leads to performance degradation. In our scenario, normal consumers (i.e., majority class) are more in number as compared to electricity thieves (i.e., minority class). Therefore, it is important to balance the data before passing it to the classification model. If the model is trained mostly on the normal consumers, it becomes ineffective and shows less accuracy in detecting the electricity thieves. In existing studies, many data sampling techniques have been used to handle imbalanced data. The techniques are broadly categorized into two types: oversampling and undersampling. The former is performed when the samples of one class are few in number and have to be increased. The latter is the opposite and involves removing samples of the class that is larger in number. Both oversampling and undersampling are performed to remove the class imbalanced issue and have their respective advantages and disadvantages. In this work, the hybrid technique, named as ADASYNENN, is used for data sampling. It is a combination of ADASYN (oversampling) and ENN (undersampling) techniques, as shown in Figure 4b. ADASYN is used to increase minority samples and make them equal to majority samples [48]. Whereas, ENN is used to reduce the majority samples and make them equal to the minority samples. The following steps are involved in ADASYN.
• Calculate the number of minority and majority class samples, represented as S_i and S_l, respectively.
• Measure the degree of imbalance d using Equation 2.
$$ d = \frac{S_i}{S_l} \tag{2} $$
• Equation 3 is used for calculating the number of samples required to be generated, G. The minority samples should be equal to the majority samples for handling the class imbalance issue.
$$ G = S_l - S_i \tag{3} $$
• Find KNNs of each minority class sample. After this step, each minority sample is associated with a specific group.
• Give more importance to those minority class samples, which have more majority class samples in their specific groups.
• If d < d_s, new minority class samples are generated (d_s is the preset threshold value, which decides the tolerable value of the class imbalanced ratio).
• A new sample is generated using Equation 4.
$$ S_n = x_i + (x_{zi} - x_i) \times \lambda \tag{4} $$
S_n is a newly generated sample. x_i and x_{zi} are randomly chosen from minority group i. λ is any random number between 0 and 1. Moving ahead, ENN is an undersampling technique used to remove majority class samples [49]. In this method, the majority class samples that are near the borderline of the minority class samples are removed. Figures 4a and 4b show the dataset before and after applying the ADASYNENN sampling technique, respectively.
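The ADASYN steps listed above can be illustrated with a compact NumPy sketch. It is a simplified approximation rather than the authors' code: the name `adasyn_sketch` and the defaults `k=3` and `d_s=1.0` are assumptions, full balancing (G = S_l - S_i) is assumed, and seed points are drawn with probability proportional to their share of majority neighbors instead of using rounded per-point counts.

```python
import numpy as np

def adasyn_sketch(X_min, X_maj, k=3, d_s=1.0, seed=0):
    """Generate synthetic minority samples by ADASYN-style interpolation."""
    rng = np.random.default_rng(seed)
    S_i, S_l = len(X_min), len(X_maj)
    d = S_i / S_l                                  # degree of imbalance
    if d >= d_s:                                   # balanced enough: nothing to generate
        return np.empty((0, X_min.shape[1]))
    G = S_l - S_i                                  # samples needed for full balance
    X_all = np.vstack([X_min, X_maj])
    is_maj = np.concatenate([np.zeros(S_i, bool), np.ones(S_l, bool)])
    # Share of majority points among the k nearest neighbors of each minority point.
    dist = np.linalg.norm(X_min[:, None, :] - X_all[None, :, :], axis=2)
    dist[np.arange(S_i), np.arange(S_i)] = np.inf  # ignore the point itself
    r = is_maj[np.argsort(dist, axis=1)[:, :k]].mean(axis=1)
    weights = r / r.sum() if r.sum() > 0 else np.full(S_i, 1.0 / S_i)
    src = rng.choice(S_i, size=G, p=weights)       # harder points are picked more often
    # Partner x_zi: a random minority neighbor of the chosen point x_i.
    dmin = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dmin, np.inf)
    nn_min = np.argsort(dmin, axis=1)[:, :min(k, S_i - 1)]
    partner = nn_min[src, rng.integers(nn_min.shape[1], size=G)]
    lam = rng.random((G, 1))                       # lambda in [0, 1)
    return X_min[src] + (X_min[partner] - X_min[src]) * lam

# Hypothetical two-dimensional toy data: 5 thieves, 20 normal consumers.
rng = np.random.default_rng(1)
X_min = rng.normal(0.0, 1.0, (5, 2))
X_maj = rng.normal(3.0, 1.0, (20, 2))
new_samples = adasyn_sketch(X_min, X_maj)          # 15 synthetic minority points
```

Every synthetic point lies on a segment between two existing minority points, so the method never extrapolates outside the minority region. The ENN cleaning step is not included in this sketch.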

4) FEATURE EXTRACTION
The efficiency of a classification model depends upon the successful execution of the feature extraction process in which the most significant features are extracted. In the literature, different methods have been developed for performing feature extraction. In the first proposed model of this work, LLE is used for extracting useful information from the dataset [50]. The required number of features are extracted using KNN and the covariance between features. The closely related features are extracted from the feature space, which are then used for the classification purpose. A detailed stepwise feature extraction process is shown in Figure 6. LLE is suited to non-linear data and is based on manifold learning, which extracts the important features from the feature space in an iterative manner.

5) CLASSIFICATION
For the classification of electricity thieves and normal consumers, a hybrid GANCNN model is proposed, which is a combination of SAGAN and WDCNN. SAGAN is a deep generative model with two modules: generator and discriminator [51]. The former creates synthetic data similar to the original data by selecting random input samples from the dataset. The latter discriminates between fake and original data [52]. During the GAN's training process, both generator and discriminator modules are trained until the discriminator fails half of the time to distinguish between fake and original samples, which means that the generator is successful in creating fake samples. The random input samples are selected on the basis of the inverse transform technique in which the Cumulative Distribution Function (CDF) is used. The CDF is given in Equation 5, where P is the probability, Z is a randomly selected input from the data and z is the input sample.
$$ F_Z(z) = P(Z \le z) \tag{5} $$
The architecture of SAGAN is shown in Figure 5.
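As an illustration of the inverse transform technique mentioned above, the following sketch draws samples from the empirical CDF of a data vector. The function name `inverse_transform_sample` and the use of the empirical CDF are assumptions made for the example; the paper does not state how the CDF is estimated.

```python
import numpy as np

def inverse_transform_sample(data, n, seed=0):
    """Draw n values from `data` by inverting its empirical CDF."""
    rng = np.random.default_rng(seed)
    z_sorted = np.sort(np.asarray(data, dtype=float))
    # Empirical CDF at each sorted value: F(z) = P(Z <= z).
    cdf = np.arange(1, len(z_sorted) + 1) / len(z_sorted)
    u = rng.random(n)                            # u ~ Uniform[0, 1)
    idx = np.searchsorted(cdf, u, side="left")   # smallest z with F(z) >= u
    return z_sorted[idx]

# Hypothetical consumption values used as the sampling pool.
pool = [3.0, 1.0, 2.0, 5.0]
samples = inverse_transform_sample(pool, n=100)
```

Because the uniform draw is strictly below 1 and the last CDF value equals 1, the lookup index always stays inside the array.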
The output of SAGAN is passed as an input to a deep CNN, which comprises several layers: convolutional layer, max pooling layer, dropout layer, flatten layer and fully connected layer [44]. The convolutional layer overcomes the limitations of traditional neural networks by connecting each neuron only to its neighboring neurons, known as the receptive field. The convolution operation is performed on the input samples and the resulting feature maps are sent to the max pooling layer, which keeps the most prominent features and thereby reduces the dimensionality. The dropout layer is used to prevent the model from overfitting. The flatten layer converts the pooled feature maps into a one-dimensional vector, which the fully connected layer maps to the output. The architecture of CNN is shown in Figure 7.
The network structure of the proposed GANCNN model is given in Table 4 while Table 5 provides the hyperparameter of the model. In this study, GAN is used as the front end to process the input data while CNN is used as back end to process the non-linear features of the abstracted data. The GANCNN model has 13 layers, which alternate between convolution and maxpooling layers.

6) ADAM OPTIMIZER
In the literature, many optimization techniques are presented for tuning hyperparameters of the deep learning models. In [53], authors have analyzed different algorithms used for hyperparameter tuning, which include genetic algorithm, cross validation, simulated annealing, etc. In this work, we use the Adam optimizer to tune the hyperparameters of CNN, because it is widely regarded as an effective optimizer. It requires less memory and less execution time, and is also efficient for large datasets. It combines the properties of both the Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSProp) optimizers [54], [55]. It also calculates an individual learning rate for each parameter. Moreover, it is easy to implement as compared to other optimization techniques.
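For readers unfamiliar with Adam, a single update step can be sketched as follows. The paper does not list the Adam equations, so this follows the commonly published form of the algorithm; the function name `adam_step` and the default values of `eta`, `beta1`, `beta2` and `eps` are assumptions, not settings taken from Table 5.

```python
import numpy as np

def adam_step(w, g, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters w with gradient g at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * g          # momentum-like average of gradients
    v = beta2 * v + (1 - beta2) * g ** 2     # RMSProp-like average of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)   # per-parameter step size
    return w, m, v

# Toy objective f(w) = sum(w^2), whose gradient is 2w.
w0 = np.array([1.0, -2.0])
w1, m1, v1 = adam_step(w0, 2 * w0, np.zeros(2), np.zeros(2), t=1)
```

On the first step the bias-corrected ratio equals the sign of the gradient, so each parameter moves by about `eta` regardless of the gradient's scale.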

B. PROPOSED ERNET MODEL
The second model proposed in this work for ETD is comprised of five stages, as shown in Figure 8. The stages are same as defined in the GANCNN model. The SGCC dataset is used for this model as well. For dimensionality reduction, Sparse Auto Encoder (SAE) technique is used as a feature extractor. For data sampling, SMOTEENN is proposed. A hybrid of EfficientNet, ResNet and GRU, named as ERNET, is proposed for classification of theft and normal consumers. A detailed flowchart of the ERNET model is shown in Figure 9.

1) DATA PRE-PROCESSING
SGCC data is collected through smart meters, which are installed across the country. It contains missing or erroneous values. Therefore, we have applied same pre-processing steps, which are performed in the first proposed model. Linear interpolation method [56] is used to fill the missing values, which improves the training and performance of the proposed model.
The robust scaler normalization method [57] is used for normalizing the data. It is similar to Min-Max normalization. The only difference is that it uses quartile ranges and is considered more robust for normalization. The formula of the robust scaler is given in Equation (6).
$$ \mathrm{RobustScaler}(x_i) = \frac{x_i - Q_2(x)}{Q_3(x) - Q_1(x)} \tag{6} $$
x is divided into three parts: Q_1, Q_2 and Q_3, which represent the first, second and third quartiles, respectively.
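A minimal sketch of robust scaling is given below, assuming the common median and interquartile range form in which the second quartile is subtracted and the result is divided by Q_3 - Q_1. The helper name `robust_scale` is hypothetical, and quartiles are computed with NumPy's default linear interpolation.

```python
import numpy as np

def robust_scale(x):
    """Center on the median (Q2) and scale by the interquartile range (Q3 - Q1)."""
    x = np.asarray(x, dtype=float)
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    # Assumption: a zero interquartile range leaves the data centered but unscaled.
    return (x - q2) / iqr if iqr > 0 else x - q2

# Hypothetical readings with one extreme value.
values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
scaled = robust_scale(values)   # the outlier does not compress the other values
```

Unlike Min-Max normalization, the extreme reading of 100 does not squeeze the remaining values towards zero, which is the robustness property referred to above.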

2) DATA SAMPLING
As discussed earlier, a hybrid of two sampling techniques, known as SMOTEENN, is used in this model for data sampling. Class imbalance leads to poor generalization and misclassification. To resolve the aforementioned issues, SMOTEENN is used in the proposed work. The working of SMOTEENN is presented in Algorithm 1. It can be seen from the algorithm that SMOTEENN comprises two sampling techniques: SMOTE (lines 1-9) and ENN (lines 10-14). The algorithm is presented to give readers a better understanding of how SMOTEENN works. The technique starts with the oversampling of the minority class using SMOTE, which is an enhanced version of Random Oversampling (ROS). In SMOTE, new samples are generated by calculating the KNNs. If the data is highly imbalanced, it generates new samples of the minority class equal in number to the majority class. The newly generated samples create overhead and cause an overfitting problem. Afterwards, the majority class samples are removed using ENN, which also solves the aforementioned problems. SMOTEENN is used to efficiently solve the class imbalanced issue, which is not efficiently solved by SMOTE and ENN individually. An abstract view of SMOTEENN is shown in Figure 10.

3) FEATURE EXTRACTION
SAE is used as the feature extractor to reduce the dimensionality of the data and reduce the risk of overfitting. The data dimensionality is reduced by creating new features from the existing dataset. The number of newly created features is less than the existing features. SAE has been applied in many research fields like audio processing, subtitle processing, human body image detection, natural language processing, etc., for extracting optimal features [58]. In this method, input features are encoded before passing them to the hidden layer. After processing in the hidden layer, the features are decoded at the output layer. As shown in Figure 11, features passed to the output layer are less in number as compared to those passed to the input layer. Encoding and decoding functions are defined using Equations (7a) and (7b), respectively [58].
where a denotes the input layer, h is the hidden layer, W contains the weights that connect two layers and b denotes the bias value. For non-linear mapping, we use the sigmoid function (σ), which is calculated using Equation (8).
$$ \sigma(x) = \frac{1}{1 + e^{-x}} \tag{8} $$
The sparsity penalty is an important step in SAE. It is calculated by adjusting the loss function, given in Equation (9).
y and ŷ represent the actual and predicted outputs, respectively. The value of the loss function is reduced by adjusting W, b and the number of hidden layers. Whereas, overfitting is avoided using regularization.
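Since Equations (7a), (7b) and (9) are not reproduced here, the following sketch shows one common way to write a sparse auto encoder forward pass and loss. Everything specific in it is an assumption: separate encoder and decoder weights (`W_enc`, `W_dec`), a mean squared reconstruction error, and a Kullback-Leibler sparsity penalty with the hypothetical parameters `sparsity_target` and `beta`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sae_forward(a, W_enc, b_enc, W_dec, b_dec):
    """Encode the input a into hidden features h, then decode h into y_hat."""
    h = sigmoid(a @ W_enc + b_enc)        # encoding step
    y_hat = sigmoid(h @ W_dec + b_dec)    # decoding step
    return h, y_hat

def sae_loss(y, y_hat, h, sparsity_target=0.05, beta=1.0):
    """Reconstruction error plus a KL penalty that keeps hidden units mostly inactive."""
    mse = np.mean((y - y_hat) ** 2)
    mean_act = np.clip(h.mean(axis=0), 1e-8, 1 - 1e-8)   # average activation per unit
    kl = np.sum(sparsity_target * np.log(sparsity_target / mean_act)
                + (1 - sparsity_target) * np.log((1 - sparsity_target) / (1 - mean_act)))
    return mse + beta * kl

# Hypothetical batch: 8 consumers, 10 normalized readings, 4 hidden features.
rng = np.random.default_rng(0)
a = rng.random((8, 10))
W_enc, b_enc = rng.normal(0, 0.1, (10, 4)), np.zeros(4)
W_dec, b_dec = rng.normal(0, 0.1, (4, 10)), np.zeros(10)
h, y_hat = sae_forward(a, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(a, y_hat, h)   # the target of an auto encoder is its own input
```

After training, the hidden activations `h` would serve as the reduced feature set passed to the classifier.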

4) ERNET BASED CLASSIFICATION MODEL
The proposed classification model consists of three sub-models: EfficientNet, ResNet and GRU, as shown in Figure 12. ResNet is an advanced variant of CNN that uses skip connections after each layer, as shown in Figure 13. Moreover, an additional weight matrix is used in the model to learn the skip connections. The purpose of skip connections is to prevent the vanishing gradient problem by reusing activations from the previous layer until the adjacent layer learns its weights. These skip connections help in simplifying the model as it uses only a few layers at the start of the training phase. The model restores the skipped layers as it learns the feature space. EfficientNet is combined with ResNet to improve the effectiveness of the entire ERNET model. To prevent the loss of information caused by the Rectified Linear Unit (ReLU) activation function, a linear activation function is used in the last layer of each block. The final classification is made by the GRU model, which is an advanced version of RNN, as shown in Figure 14. GRU mitigates the vanishing gradient problem, which allows it to learn long-term dependencies. It has two gates: update gate and reset gate. The update gate is used for long-term dependencies while the reset gate is used for short-term dependencies. GRU requires less data for high generalization, needs few tuning parameters and has a fast training process. Table 6 shows the hyperparameters of the proposed ERNET model.
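To make the role of the two GRU gates concrete, the sketch below implements one time step of a standard GRU cell in NumPy. It follows the usual textbook formulation and is not taken from the paper; the parameter names (`Wz`, `Uz`, `bz`, and so on), the hidden size and the input values are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, p):
    """One GRU time step: x is the current input, h the previous hidden state."""
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"] + p["bz"])   # update gate: how much to renew h
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"] + p["br"])   # reset gate: how much past to use
    h_tilde = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])  # candidate state
    return (1.0 - z) * h + z * h_tilde                 # blend old state and candidate

def init_params(n_in, n_hid, seed=0):
    """Small random weights for the three blocks (update, reset, candidate)."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in "zrh":
        p["W" + gate] = rng.normal(0, 0.1, (n_in, n_hid))
        p["U" + gate] = rng.normal(0, 0.1, (n_hid, n_hid))
        p["b" + gate] = np.zeros(n_hid)
    return p

params = init_params(n_in=1, n_hid=4)
h = np.zeros(4)
for value in [0.2, 0.3, 0.1, 0.0]:      # hypothetical normalized daily readings
    h = gru_step(np.array([value]), h, params)
```

When the update gate is close to zero the previous state is carried forward almost unchanged, which is how the cell retains long-term information without a separate memory cell.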

5) RMSprop OPTIMIZER
The basic idea of RMSprop is the same as that of gradient descent with momentum. The difference lies in how the step is scaled: RMSprop divides each gradient by a running average of its recent magnitudes, which damps oscillations in the vertical direction and lets the model move faster along the direction that leads to the optimum [59], [60]. Due to this, a larger learning rate can be used and convergence is faster. The main advantage of RMSprop is that it adapts a separate learning rate for each parameter. The scenario of local optima is illustrated in Figure 15. As shown in the figure, the model starts from point A and, after one iteration of gradient descent, reaches point B in the next square. In the next iteration, the model reaches point C, which is closer to the local optimum.
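The formulas of the RMSprop optimizer are given in Equations (10)-(12). For each parameter w:
$$ v_t = \rho v_{t-1} + (1 - \rho) g_t^2 \tag{10} $$
$$ \Delta w_t = -\frac{\eta}{\sqrt{v_t + \epsilon}} g_t \tag{11} $$
$$ w_{t+1} = w_t + \Delta w_t \tag{12} $$
• ρ: exponential average of last updated value,
• η: initial learning rate,
• g_t: gradient at time t,
• v_t: exponential average of square gradients,
• ε: to avoid ending with zero.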
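The RMSprop update can be sketched as a short NumPy routine. The function name `rmsprop_step`, the default values of `eta`, `rho` and `eps`, and the quadratic toy objective are assumptions chosen for illustration; they are not the settings used for ERNET.

```python
import numpy as np

def rmsprop_step(w, g, v, eta=0.01, rho=0.9, eps=1e-8):
    """One RMSprop update for parameters w with gradient g and running average v."""
    v = rho * v + (1.0 - rho) * g ** 2     # exponential average of squared gradients
    w = w - eta * g / np.sqrt(v + eps)     # step scaled separately for each parameter
    return w, v

# Toy objective f(w) = sum(w^2), whose gradient is 2w and whose minimum is at 0.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(1000):
    w, v = rmsprop_step(w, 2 * w, v)
```

Both parameters approach zero at a similar pace even though their initial gradients differ by a factor of two, which reflects the per-parameter scaling of the step.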

IV. SIMULATION RESULTS
In this section, simulations are performed to validate the performance of the proposed models. Google Colaboratory tool is used to perform simulations and it also provides free access to Graphical Processing Unit (GPU) for data processing and storage [61]. This online tool is very helpful for performing high computational tasks.

A. ACCURACY AND LOSS VALUES OF THE PROPOSED GANCNN AND ERNET MODELS
In this section, the accuracy and loss of both GANCNN and ERNET are calculated. Moreover, the models are compared with each other and with existing models. To calculate the accuracy and loss of the proposed GANCNN model, the dataset is divided into training and testing sets as 75% and 25%, respectively. The performance metrics and splitting criteria are set as in [62]. The early stopping method is applied to stop the training process when the performance of the model no longer improves. We measure the accuracy of the GANCNN and ERNET models to check the closeness of the predicted values to the known values. The accuracy of GANCNN is 0.95, as shown in Figure 16. Whereas, the accuracy of the ERNET model is 0.98, which means it is accurately trained on the given data. Accuracy and loss are inversely related to each other. The training and testing loss of the GANCNN and ERNET models are also calculated. In GANCNN, the training loss decreases from 0.48 to 0.15; whereas, the testing loss decreases from 0.37 to 0.16, as shown in Figure 16. The training and testing loss for the ERNET model is also shown in Figure 16. The training loss starts from approximately 0.5 and falls below 0.1. Whereas, the testing loss decreases from 0.3 to 0.1.

B. COMPARISON OF ADASYNENN WITH EXISTING SAMPLING TECHNIQUES
In this section, we compare the proposed ADASYNENN with the existing sampling techniques, such as SMOTE, ENN, ADASYN, ROS and RUS in terms of accuracy and loss. In Figure 17, accuracy of the ADASYNENN is compared

C. ACCURACY OF GANCNN WITH DIFFERENT SAMPLING TECHNIQUES
In this section, we measure the accuracy of the GANCNN model with different sampling techniques; the results are shown in Figure 19. Figure 19 is labeled with V.2 and V.6, which validate the solutions S.2 and S.6, respectively.

D. PERFORMANCE METRICS AND COMPARISON OF THE PROPOSED GANCNN MODEL WITH EXISTING TECHNIQUES
In this section, the proposed GANCNN model is validated in terms of precision, recall, F1-score, Area Under the Curve (AUC), FPR and RMSE. In the existing literature, authors use these metrics to validate the performance of their models. AUC is used in [24], [26], [29], [63], and FPR is measured in [26], [64]. Precision, recall and F1-score are used in [44]. Moreover, the GANCNN model is compared with SAGAN, WDCNN, CNN and CNN-RF models in terms of the aforementioned metrics. Precision is used as a performance metric in many classification problems. It is defined as the number of True Positive (TP) samples divided by the sum of the False Positive (FP) and TP samples, i.e., the ratio of correctly predicted positive observations to the total predicted positive observations. It is calculated using Equation (13) [63].
$$ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{13} $$
The precision values of GANCNN and existing models are shown in Figure 20, which is labeled as V.6. The results show that the proposed model has the highest precision value; whereas, CNN has the lowest precision value.
Recall is defined as the number of true positives divided by the sum of the true positives and false negatives. Figure 20 shows the recall values of the proposed GANCNN and the existing models. F1-score is the harmonic mean of precision and recall: twice the product of precision and recall divided by their sum. It is calculated using Equation (15). In Figure 21, GANCNN is compared with the existing models in terms of F1-score. It is clear from the figure that GANCNN achieves a better F1-score than the existing models. Figure 21 also reports the FPR and RMSE of GANCNN and the existing models. FPR is the fraction of honest consumers that the model misclassifies as electricity thieves. It is an important performance measure for classification because a high FPR lowers the classification accuracy and increases the on-site inspection cost. The formula of FPR is given in Equation (16). Figure 21 is labeled as V.8 and it validates the solution S.8.

E. PERFORMANCE METRICS AND COMPARISON OF THE PROPOSED ERNET MODEL WITH EXISTING TECHNIQUES
The ERNET model is compared with ResNet, EfficientNet, GRU and MLP-LSTM in terms of precision, recall, FPR and F1-score. Figure 24 shows the values of precision and recall for ERNET and the other models. As shown in the figure, the precision of ERNET is 0.96, the highest among all models; the other models have a precision below 0.8. Moreover, the recall score of the ERNET model is 0.94, which is the highest among all existing models, whereas the recall score of ResNet, EfficientNet, GRU and MLP-LSTM is 0.72, 0.76, 0.85 and 0.70, respectively. The figure is labeled as V.7 and it validates the solution S.7.
Furthermore, the comparison of ERNET with other models in terms of FPR and F1-score is shown in Figure 25. As illustrated in the figure, the FPR of ERNET is below 0.1, the lowest among all models, whereas the FPR of MLP-LSTM is 0.51, the highest. The FPR of EfficientNet, GRU and ResNet is 0.30, 0.25 and 0.45, respectively. Moreover, the comparison results of the models in terms of F1-score are also shown in Figure 25. The F1-score of ERNET is 0.93, the highest among all models, whereas the F1-score of the other models is less than 0.85. The figure is labeled as V.9 and it validates the solution S.9. Moreover, Figure 26 shows the comparison of ERNET with existing models in terms of accuracy, sensitivity, specificity and AUC. The results show that ERNET outperforms all of the existing models in terms of the aforementioned metrics. Figure 26 also shows that the value of the precision-recall curve of the ERNET model is 0.94.
Figure 27 shows the performance of the ERNET model with the RMSProp and Adam optimizers in terms of accuracy and loss. The results show that with the RMSProp optimizer, ERNET achieves 0.95 accuracy for both training and testing, whereas with the Adam optimizer, the accuracy of ERNET for both training and testing is 0.98. Furthermore, the training and testing loss of ERNET with both the RMSProp and Adam optimizers is less than 0.1. Figure 28 shows the comparison of GANCNN with existing models in terms of AUC. As shown in the figure, the AUC of GANCNN is 0.9, the highest among all models, whereas all of the other models have an AUC below 0.83. Figure 28 also shows the comparison of the Adam optimizer with other optimizers, which include AdaGrad, Stochastic Gradient Descent (SGD) and RMSProp. As shown in the figure, Adam has the highest accuracy and the lowest loss value as compared to the other techniques. This is because Adam uses momentum and a per-parameter adaptive learning rate, which lead to faster convergence.
The parameters of the Adam optimizer include η, beta1, beta2 and ε. In this scenario, the values of these parameters for GANCNN are set as: η = 0.001, beta1 = 0.9, beta2 = 0.999 and ε = 1e-08. In GANCNN, the Adam optimizer is applied for better performance of the CNN model. Adam is also more suitable for large, noisy or sparse datasets. In the ERNET model, the RMSProp optimizer is used in ResNet. The parameters of the RMSProp optimizer are η, ρ, momentum, ε and centered. We set their values as: η = 0.001, ρ = 0.9, momentum = 0.0, ε = 1e-07 and centered = False. The Adam and RMSProp optimizers differ as follows. RMSProp updates the parameters using momentum on the rescaled gradient, whereas Adam updates the parameters using the running averages of the first and second moments of the gradient. Moreover, RMSProp has no bias-correction term, whereas Adam includes one, which leads to fast convergence. The accuracy and loss comparison of the different optimizers is shown in Figure 28.
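The difference between the two update rules can be illustrated with a minimal scalar sketch. It uses the textbook forms of Adam and RMSProp with the hyperparameter values listed above (RMSProp with momentum = 0.0 and centered = False, so neither term appears); the function names, the state dictionaries and the toy objective f(x) = x^2 are assumptions made for illustration, and deep learning libraries may place ε slightly differently.

```python
import math


def adam_step(x, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-08):
    """One Adam update: running averages of the first and second moments,
    each divided by a bias-correction term."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])   # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])   # bias-corrected second moment
    return x - lr * m_hat / (math.sqrt(v_hat) + eps)


def rmsprop_step(x, grad, state, lr=0.001, rho=0.9, eps=1e-07):
    """One RMSProp update: gradient rescaled by a running average of its
    square, with no bias correction."""
    state["v"] = rho * state["v"] + (1 - rho) * grad ** 2
    return x - lr * grad / (math.sqrt(state["v"]) + eps)


# Toy objective f(x) = x^2 with gradient 2x, starting from x = 1.0.
x0 = 1.0
grad0 = 2 * x0
x_adam = adam_step(x0, grad0, {"t": 0, "m": 0.0, "v": 0.0})
x_rms = rmsprop_step(x0, grad0, {"v": 0.0})
print("first Adam step:   ", x0 - x_adam)
print("first RMSProp step:", x0 - x_rms)
```

With bias correction, the first Adam step has a size close to η. Without it, the first RMSProp step is inflated by roughly 1/sqrt(1 - ρ), since the running average of the squared gradient starts at zero.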

G. AUC OF THE PROPOSED MODELS
In this section, the AUC of both the GANCNN and ERNET models is calculated, as shown in Figure 29. AUC indicates how well a model separates electricity thieves from honest consumers. The range of AUC is between 0 and 1. An AUC close to 1 means that the model classifies well, whereas an AUC at or below 0.5 means that the model performs no better than random guessing. Accordingly, 0.5 is taken as the reference value against which the proposed models are examined: an AUC of 0.5 means that the ROC curve falls on the diagonal line and the model has no discriminatory ability, whereas a ROC curve above the diagonal line (AUC greater than 0.5) indicates that the model has discriminatory ability to classify electricity theft. As shown in the figure, the AUC of GANCNN is 0.985, which is close to 1 and indicates good classification. The ERNET model accurately separates both classes (majority and minority) and its AUC is 0.988. Both values lie well above 0.5, so both models discriminate well between the two classes, and ERNET slightly outperforms GANCNN in terms of AUC.
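To make the interpretation of AUC concrete, the following minimal sketch computes it as the probability that a randomly chosen thief receives a higher score than a randomly chosen honest consumer, which is equivalent to the area under the ROC curve. The function name and the example labels and scores are hypothetical and are not drawn from the SGCC dataset.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for electricity thieves, 0 for honest consumers.
    scores: model outputs, higher meaning more likely to be a thief.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
perfect = auc_score(labels, [0.1, 0.2, 0.8, 0.9])     # classes fully separated
partial = auc_score(labels, [0.1, 0.4, 0.35, 0.8])    # one pair ranked wrongly
chance = auc_score(labels, [0.5, 0.5, 0.5, 0.5])      # uninformative scores
print(perfect, partial, chance)
```

Uninformative scores give 0.5, the diagonal of the ROC plot, which is why values near 1 rather than values just above 0.5 indicate strong discrimination.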

H. COMPARISON OF PROPOSED AND BENCHMARK MODELS' EXECUTION TIME
In this section, the execution time of the GANCNN and ERNET models is discussed. We compare GANCNN with the CNN-RF, MLP-LSTM, SAGAN and WDCNN models. As shown in Figure 30, MLP-LSTM has the highest execution time because of the separate memory cell of LSTM, whereas the proposed GANCNN model has the lowest execution time because it requires no additional memory cell to store long-term information. Moreover, the ERNET model is also compared with the above-mentioned benchmark models. The figure shows that ERNET has a lower execution time than CNN-RF and MLP-LSTM. Figure 30 is labeled as V.3 and V.11, and it validates the solutions S.3 and S.11.
Table 7 gives the performance analysis of the proposed models in terms of AUC, precision, accuracy, recall, F1-score, FPR and execution time. According to the analysis results, the proposed ERNET model outperforms the GANCNN model in terms of accuracy, while the proposed GANCNN model performs better than ERNET in terms of precision, recall, F1-score, FPR and execution time. The reasons for the better performance of the GANCNN model over the ERNET model are as follows. The GANCNN model generates data that is not distinguishable from the actual data. Thus, realistic synthetic data can be generated in order to address the data imbalance problem. Also, the GANCNN model learns the internal representation of the data such that difficult patterns in the data are learned easily. Furthermore, after training, the discriminator of the GANCNN model classifies the data efficiently. On the other hand, the ERNET model requires more computational time and memory during training. However, the ERNET model achieves a better accuracy because it performs well in a larger network with high depth and width. Also, ERNET preserves computational power through proper scaling of the network's depth and width, thereby increasing its accuracy.
Table 8

I. DISCUSSION
To efficiently analyze consumers' energy consumption data using classical machine learning methods, the class imbalanced problem must be addressed. The problem occurs when the number of samples of one class (honest consumers' data) is much larger than that of the other class (fraudulent consumers' data). Therefore, this paper proposes two deep neural networks: GANCNN and ERNET. The former is used to generate synthetic data from the actual data in order to solve the class imbalanced problem, whereas the latter is used to increase the network's width and depth in order to achieve higher prediction accuracy on a large dataset.
The effectiveness of the proposed GANCNN and ERNET models is evaluated using the following performance metrics: precision, AUC, FPR, accuracy, F1-score and recall. In this paper, two case studies are considered for analyzing the proposed models. For the first case, the proposed ERNET model is compared with existing models, such as EfficientNet, MLP-LSTM and GRU. The simulation results show that the proposed ERNET model outperforms the existing models in terms of all of the performance metrics. Figure 17 shows that the training accuracy of the proposed ERNET model is higher than that of the other existing models. Figure 24, Figure 25 and Figure 26 show that the proposed ERNET model outperforms the other models in terms of precision, recall, FPR, F1-score, accuracy, sensitivity, specificity and AUC. The reason for the better performance of the proposed ERNET model is that it performs efficiently in a large network with high depth and width. Besides, it conserves computational power by scaling the network's width and depth.
For the second case, the proposed GANCNN model is compared with other models, such as CNN-RF, SAGAN, WDCNN and CNN. From the simulation results, it is observed that the proposed GANCNN model outperforms the other models in terms of all of the performance metrics. Figure 20 shows that the proposed GANCNN model is better than the other models in terms of precision and recall. Figure 28 shows that the proposed GANCNN outperforms the other models in terms of AUC and accuracy. Figure 30 shows the performance of the proposed GANCNN model in terms of execution time. The reason for the better performance of the proposed GANCNN model is that it can generate synthetic data that is similar to the actual data, which is extremely important for solving the class imbalanced problem.
Generally, the focus of the proposed work is to provide an anomaly detection mechanism that compares the electricity consumption behavior of different consumers based on the trend of the prediction error. If the current pattern of energy consumption of a consumer is different from the previous one, then an anomaly is detected for that consumer. The practical realization of the proposed work is constrained by the availability of consumers' electricity consumption data, the computational capacities of consumers and the lack of privacy of the consumers' data. We do not consider the privacy preservation of consumers in our proposed scenario; it will be considered in future work.

V. CONCLUSION
With the increase in electricity demand over the years, two types of losses are faced by the power utilities: TLs and NTLs. These losses lead to huge revenue loss, and their detection suffers from the class imbalanced problem, low accuracy and high FPR. To solve these problems, two models are proposed in this work. In the first model, ADASYNENN is used to solve the class imbalanced problem, whereas LLE is used for feature extraction. Moreover, a hybrid technique based on SAGAN and WDCNN, termed GANCNN, is introduced for ETD. The second proposed model consists of five stages. Firstly, interpolation is used to fill in the missing values. Secondly, the robust scaler method is used for data normalization. Thirdly, SAE is applied to reduce the data dimensionality. Fourthly, SMOTEENN is applied to solve the class imbalanced problem. Finally, for classifying honest and fraudulent consumers, a hybrid ERNET model is introduced, which is a combination of EfficientNet, ResNet and GRU. Additionally, the RMSProp optimizer is used to enhance the performance of the model. To validate the proposed models, extensive simulations are performed using the SGCC dataset. Different performance metrics, such as precision, recall, F1-score, FPR, accuracy and AUC, are used for evaluation. The results of GANCNN for precision, recall, F1-score, FPR, accuracy and AUC are 0.95, 0.99, 0.9, 0.05, 0.95 and 0.985, respectively. On the other hand, the results of ERNET for precision, recall, F1-score, FPR, accuracy and AUC are 0.94, 0.93, 0.89, 0.02, 0.98 and 0.988, respectively. Moreover, the proposed models are compared with the state-of-the-art models, which include SAGAN, WDCNN, CNN, CNN-RF, MLP-CNN, MLP-LSTM, LSTM and CNN-LSTM. The results show that the proposed models outperform the aforementioned benchmark models. In future, the proposed models will be used in other theft-related areas, such as banking.
Moreover, different datasets will be used to determine the effectiveness of the proposed models.
she was a Visiting Scholar with Stanford University. She is the author of more than 120 articles and the principal investigator of more than 50 projects. Her research interests include application of artificial intelligence technology in electrical engineering, power system security and control, and power system planning and reliability.
TANZEELA SULTANA received the bachelor's degree in computer science from The University of Azad Jammu and Kashmir (UAJK), Kotli campus, Kotli, Azad Kashmir, in 2016. She is currently pursuing the master's degree in computer science from COMSATS University Islamabad, Islamabad, Pakistan. She is currently working as a Research Associate with the Communications over Sensors (ComSens) Research Laboratory, Department of Computer Science, COMSATS University Islamabad. She has authored over three articles in technical journals and 11 proceedings in international conferences. Her research interests include data analytics in smart grids, cloud computing, and blockchain in IoT and vehicular networks.