Modeling, Simulation and Optimization of Power Plant Energy Sustainability for IoT Enabled Smart Cities Empowered With Deep Extreme Learning Machine

A smart city is a sustainable and efficient metropolitan hub that offers its residents a high quality of life through appropriate resource management. Energy management is among the most challenging problems in such metropolitan areas because of the complexity and key role of energy systems. To maximize the benefit from the available megawatt-hours, it is important to predict the maximum electrical power output of a baseload power plant. This paper explores a deep extreme learning machine (DELM) approach for building a predictive model of a combined cycle power plant's hourly full-load electrical output. An intelligent energy management solution can be achieved by properly monitoring and controlling these resources through the Internet of Things (IoT). Artificial intelligence has made many strides through deep learning algorithms, and these methods have been used for data analysis. Nonetheless, for further accuracy, the DELM is another candidate to be investigated for analysis of the data sequence. With the DELM approach, a high level of reliability with a minimum error rate is achieved. The approach shows better results than previous investigations, which were unable to predict power plant electrical energy output efficiently. The investigation shows that the proposed approach attains the highest accuracy rate of 98.6% with 70% of the data used for training (33488 samples) and 30% for testing and validation (14352 samples). Simulation results validate the prediction effectiveness of the proposed scheme.


R: Hidden layer output
S: Resultant matrix
γ: Make the matrix more generalized
A FE: Output of the hidden layer
hl: Hidden layer
R hl i,j: Weight and bias among the output and the hidden layer
tp j: Calculated output
S': Transpose of the resultant matrix
Rγ: Hidden layer output
A l: Weight matrix of the l th hidden layer
R: Bias of hidden layer neurons
B l: Estimated output of the first hidden layer
R l: Estimated output of the l th hidden layer
A FE R E: Net hidden layer value

I. INTRODUCTION
The smart city is a relatively new concept that has been defined and used by many researchers and institutions [1]. In simple terms, the smart city is intended to tackle or minimize problems created by rapid urbanization and population growth, such as energy consumption, waste, and mobility, through high efficiency and resource optimization. The literature offers many classifications of smart city intervention areas [2]. The main disadvantage of these classifications is that they categorize energy mainly by infrastructure, ignoring other relevant energy aspects such as the smart grid. Urban life already poses an important challenge in our daily lives [3]. The United Nations Population Fund forecasts that by 2030 around 60% of the world's population will live in urban areas and that 27 large cities with over 10 million people will exist [4]. Thus, urgent solutions for sustainable living and urban development are being sought. The energy demands of cities are complicated and extensive [5]. As a result, modern cities should enhance current technologies and implement new alternatives in a timely and effective manner, taking advantage of the interactions between all these energy approaches [6]. The intermittent availability of renewable resources, increased demand, and the need for energy-efficient systems represent important energy challenges that are, as usual, addressed more in general than separately.
Simulation models have been designed to help stakeholders understand city dynamics and assess the influence of alternative energy policies [7]. However, these attempts very often tackle the energy domains individually, so the whole picture is missing and inadequate alternatives are produced [8]. To satisfy the growing energy requirements of current and future cities effectively, an extensive smart city model that covers all energy operations while keeping the size and complexity of the model manageable is highly desirable [9]. Energy management plays an important part in identifying appropriate and efficient alternatives to reduce peak demand and achieve energy conservation [10]. The Internet of Things provides numerous advanced and ubiquitous applications for smart cities [11]. The energy needs of IoT applications are increasing as IoT devices grow in both number and requirements [12]. Therefore, smart city solutions must be capable of using energy efficiently and of addressing the associated challenges.
On the other hand, the development of large-scale information and communication technologies and the power of the internet let us see the light at the end of the tunnel [13]. Even now, one can hardly imagine life without the internet [14]. Yet despite the increasing accessibility of connected items, the internet, which today helps individuals discover data and supports our daily lives with creative services, has not yet been extended to create value from such enhanced accessibility [15]. This research is about helping people move from the Internet of People to the Internet of Things (IoT), in particular in future smart cities. According to the European Research Cluster for the Internet of Things, IoT is a dynamic, self-configuring, worldwide network infrastructure in which physical and virtual items can be identified and can communicate through standard and interoperable protocols. IoT is therefore expected to be the great hope for viable city life [16]. Moreover, to make IoT smarter, many analysis technologies have been introduced into IoT, among the most valuable of which are data mining, artificial intelligence, cloud computing, and neural networks [17].
In this research work, energy management is regarded as the main prototype for implementing the complicated energy systems of a smart city. The work provides a short summary of energy management and its difficulties in smart cities and then specifies a cohesive structure for energy sustainability in IoT-based smart cities. A deep extreme learning machine (DELM) approach is used to make smart cities energy efficient with IoT-enabled sensors and improved performance. Energy-efficient systems are entering future smart cities with superior equipment, control structures, and demand response policies [18].
The Internet of Things (IoT) carries the potential to transform communities all over the world into "smart cities" by creating a new style of urban living [19]. The major benefits include increased safety, better health, improved education and living environments, better energy consumption, more efficient management of climate and ecosystems, a green economy, and improved employment [3]. Although the basic idea of smart cities has existed for more than 10 years, the concept has come a long way since it was introduced and is now poised to radically alter city life thanks to the emergence of serious enablers such as the Internet of Things. Reference [20] stated that the vision of smart cities is a safe, secure, eco-friendly, and effective metropolitan center of the future, supported by all of its infrastructures, such as power, water, and transportation.
Reference [21] stated that a smart city is an innovation-intensive and improved city that links individuals, data, and urban features through innovative technologies to build a sustainable, safer city featuring competitive and innovative trade and an improved quality of living. Because citizens shape such a city through ongoing relationships, they are the main component of smart cities. That is why a smart population is acknowledged as the main driving force of smart cities; therefore, learning, education, and communication are significant strategies in smart cities. Reference [22] argued that a smart city is an intelligent community in which different components such as individuals, climate, mobility, democracy, and the economy are constructed within an intelligent structure.
Fog computing is an alternative to cloud computing that places certain transactions and resources at the edge of a network, rather than establishing channels for cloud storage and use [23]. Fog computing can decrease bandwidth requirements by not sending all data over cloud channels and instead aggregating it at certain access points, such as routers. This enables more strategic compilation of data that may not need cloud storage. Fog computing extends conventional cloud computing to the edge of the network and is thus also known as edge computing [24].
In this paper, a deep extreme learning machine for predicting power plant electrical energy output is investigated to achieve the highest accuracy. For training and testing the estimation of electrical energy output, a combined cycle power plant data set with 47840 data instances is used, in which each instance includes different and diverse characteristics. The results are then examined and compared with state-of-the-art techniques in the same field.
The remainder of this paper is organized as follows. Section II briefly describes the related work. Section III presents the method used to carry out a comprehensive evaluation for the prediction of power plant electrical energy output. Section IV discusses the simulation and results of the DELM approach. Section V presents the conclusions of the study.

II. LITERATURE REVIEW
The smart city relates to the convergence of big data generation, artificial intelligence systems, and the evolving IoT and smart city systems [25]. IBM uses the concept of cognitive computing to describe technologies that can learn from various information sets, provide explanations for interactions, and exchange knowledge with individuals in natural language. Google Now is likewise a service that makes suggestions to customers and provides them with the most helpful data at the right moment. This service learns from users' previous behaviors and inputs in Google accounts, including Calendar, Chrome, Gmail, Search, and YouTube [26]. With natural language understanding embedded in other facilities such as search engines, it comes closer to the cognitive age.
IoT is the latest communication paradigm, in which all items used in daily life will be equipped with microcontrollers, sensors, transceivers, and protocol stacks for electronic transmission and interaction. Such protocols prepare these items to interact with one another and with customers and to become an essential part of the Internet [27]. The smart city project is very popular nowadays. The latest estimates and research of the World Health Organization forecast that the world's population will increasingly live in urban regions. Reference [28] supposes that, by 2050, 70% of the population will live in cities. Smart initiatives aim to make life more comfortable. The smart city is a project that uses the latest technology to enhance the quality of urban life. Smart cities also improve environmental performance and provide people with superior facilities [29]. Communication and information technology is important for supporting the transformation of urban areas into smart cities [30].
Reference [31] explored new ways of tackling the problems of energy management. Among the approaches discussed, the author focused in particular on machine learning methods that use historical data to manage repeated in-patient real-time scenarios. This work came at a time when the revival of artificial neural networks was being pursued with great enthusiasm.
Because artificial neural networks are able to capture nonlinear relations, environmental conditions are taken as the model inputs and the energy generated as the model output. With this model, the plant's output power can be forecast from environmental conditions [32]. A genetic algorithm (GA) method for designing a multilayer perceptron (MLP) for combined cycle power plant (CCPP) power output estimation was proposed in [33]. The GA-based MLP model selection is performed using an individual mutation, three different crossover processes, and two distinct fitness functions for several hidden layers.
The Internet of Things (IoT) connects multiple physical gadgets, vehicles, mobile devices, and other objects that can interact and exchange data through hardware, software, sensors, actuators, and networks [34]. Since its inception, IoT has played a crucial part in everything from traditional equipment to common household objects, and in recent years it has attracted the attention of educational, academic, technological, and industrial scientists as well as the public. The main aim is to control, manage, and monitor everything readily from a central point, and to automatically identify other interconnected objects, which can even make choices on their own.
In [35], an ANN model is employed to predict the working parameters and efficiency of a gas turbine for different local ambient conditions. Intelligent systems are also used to model a stationary gas turbine. In [36], identification methods based on ANNs were established, and the findings showed that the ANN identification system is well suited for estimating gas turbine behavior from full speed to full load conditions at many different operating points.
In addition, research on electricity consumption with machine learning tools has been carried out several times [37], [38]. Some research, such as [39], is similar to this article in studying the total electric power produced by a cogeneration power station with three gas turbines, one steam turbine, and a district heating system.
Just as waste storage requires procedures and norms, procedures and mechanisms for recycling information are needed in metropolitan areas that generate hundreds or thousands of gigabytes of information per second. Data analytics methods and machine learning algorithms should be prepared to extract information and knowledge and so decrease digital waste [40]. Although computing and storage techniques have recently advanced, most data-analytic approaches rely on sampling techniques that are time-efficient but overlook a large amount of data that may contain significant patterns not present in the samples. Data sets with many constraints can be analyzed insightfully by means of deep neural networks (DNNs) [41]. Most importantly, designing and developing an IoT platform requires a middleware-level solution that enables seamless interoperability between machine-to-machine applications and existing internet-based services [42].
Several works in the research community suggest cognitive solutions that suit the requirements of IoT devices. Reference [43] suggested a cognitive management framework that allows smart objects to communicate and thus makes end-users more aware. The focus of this work was on the reuse of available object functionality and services across three levels: virtual objects (VOs), composite virtual objects (CVOs), and the service level. The service level derives from the features that a stakeholder or a specific application requires of the desired service. The CVOs are responsible for these functionalities. The authors demonstrated that their framework reduces the service delivery time, leading to reduced operating costs.
Another research project, carried out by Wu et al., developed a cognitive framework for IoT applications called Cognitive IoT (CIoT) [44]. The framework relates five behavioral functions: a perception-action cycle, massive data analytics, semantic derivation and knowledge discovery, smart decision making, and on-demand service provisioning. The authors identified two areas for the understanding and learning of objects in a cognitive setting. They derived semantics from the analyzed data and extracted useful patterns and rules as knowledge.
Reference [45] outlined a cognitive framework for smart homes based on cognitive dynamic systems and IoT, and used a Bayesian model, a Bayesian filter, and reinforcement learning (RL) at the core of its cognitive memory. The Bayesian model sits above the environmental control unit. The Bayesian filter estimates the state of the system, and RL provides the method for determining the best feasible actions based on the cumulative rewards.
Reference [46] opted to integrate artificial intelligence into fog computing to promote intelligent big data analysis, in contrast to centralized cloud intelligence and assessment. The authors implemented a hierarchical fog computing model for big data analysis in smart city applications. This model enhances overall efficiency by decreasing the communication bandwidth, because raw information is not transmitted to the cloud and because the fog is close to the information source for real-time analysis. In their model, a hidden Markov model (HMM) is used to assist big data analysis in an intelligent pipeline surveillance scheme.

A. SYSTEM MODEL
The smart city manages and controls resources through smart data systems. To ensure optimal supply and effective use of urban assets, it is essential to develop IoT-based technologies that tackle the issues these technologies generate. Furthermore, intelligent solutions for transport, healthcare, comfort, farming, and public services are the primary premises of a smart city. In this research work, an energy efficiency model is introduced for smart cities, which shows how deep extreme learning machine techniques are used to provide the inhabitants of the cities with superior facilities. The general intelligence context in smart cities is presented in this paper. This structure provides four levels of intelligence: smart city and IoT facilities, the deep extreme learning machine, fog computing, and cloud computing. FIGURE 1 shows the general placement of the deep extreme learning machine approach within the smart city hierarchy, where whether the intelligent software manager is implemented in the fog or in the cloud depends on the features of the analytics needed. The raw information can therefore be transmitted to the cloud or to the fog. The operational analytical agent, based on the deep extreme learning machine approach, then gives a proper response to infrastructure devices based on its predictions (e.g., adjusting energy consumption on the basis of the information). The rationale for this architecture is to deepen the abstraction and understanding of information as it passes through the smart city infrastructure. A city-wide abstraction is required at the highest levels for the long-term management of city resources and services. On the other hand, information generated by sensors or intelligent objects is used at the lowest level for short-term management of assets and facilities.
In addition, fog-driven analytics support local activities in predefined situations, while cloud-driven analytics can cover broader geographical areas with diverse situations. Fog computing is further divided into two sub-phases: a training phase and a validation phase. In the training phase, backpropagation is used to train on the data, and the trained model is then exported to the cloud. In the validation phase, the trained model is imported from the cloud and predicts the electricity production of a power plant from real data.
At the IoT infrastructure level, sensors and resource-constrained equipment are used to sense the surroundings. The limited resources of those devices impede the use of complex and extensive learning models. However, in order to bring analytics and intelligence nearer to the information source (for example, end-users and resource-constrained IoT equipment), contemporary and sophisticated learning designs such as deep extreme learning machines are required.
A new path of research is to overcome this resource limit so that deeper neural network models can be used. During the past few years, various methods have been suggested for compressing or pruning deep neural networks in order to load them onto resource-restricted IoT systems, wearable electronics, and smartphones [47]. At the fog computing stage, the raw information is collected and transferred to the cloud computing level. The DELM technique can be used at this stage because the resources here are less constrained than those of the IoT devices. Lightweight intelligence must also be provided at this stage to the IoT gateways and agents in order to efficiently integrate facilities in support of smart city applications [48]. At the cloud computing stage, mechanisms and methodologies can be combined with semantic learning and ontologies to obtain high-level concepts and trends from the gathered information. Deep learning models are highly appropriate at this stage since they can provide a more profound overview of the data.
The purpose is to make the infrastructure more intelligent so that resources are utilized efficiently. Cities need modern technology to address many pressing problems, including water waste, electricity consumption, traffic congestion, and so on. Building intelligent cities helps them address all of these issues, resulting in favorable economic outcomes. This will make the living environment more effective and sustainable.
The idea of smart cities appears interesting, but how viable is it? To be effective, a smart city must obtain information from every sector. It is hard to implement solutions at this level because every city is distinctive, and therefore a fresh set of issues arises each time. To construct a scalable approach, one needs to discover models that can be reused by many different cities to implement the intelligent city model. The focus is on intelligent data analysis and on building solid information collection capabilities, communication protocols, interoperability between computers, data storage systems, intelligence levels, etc.
In this research work, the deep extreme learning machine (DELM) technique is integrated to make smart cities energy efficient with IoT-enabled sensors. FIGURE 3 shows that different numbers of hidden layers, different numbers of hidden neurons, and several kinds of activation functions were used to attain the best DELM structure for energy efficiency. The proposed technique comprises three layers, namely the data acquisition, pre-processing, and application layers. The application layer has two sub-layers, namely the prediction layer and the performance evaluation layer. Real data are collected from sensors and actuators for experimental analysis. The collected data are provided to the acquisition layer as input. In the pre-processing layer, various data cleaning and preparation techniques are applied to remove anomalies from the data. In the application layer, the DELM is used for energy efficiency. The DELM takes the benefits of both extreme learning and deep learning techniques [49].
The complete system procedure is shown in FIGURE 2. The data acquisition layer contains the input parameters, which are passed to the neural network, where an algorithm has been trained to predict power plant electrical energy output. Nowadays, artificial neural networks are used in many sectors. An artificial neural network comprises a set of neurons characterized by a particular arrangement.
The main parts of an artificial neural network are neurons and the connections between them. A neuron is the fundamental information-processing unit that forms the foundation of an ANN's operation. Neurons are interconnected processing elements that work together to solve a problem.

B. DEEP EXTREME LEARNING MACHINE
The deep extreme learning machine (DELM) is a well-known method used in various areas, such as predicting health problems, energy consumption prediction, and transportation and traffic management [49]. Traditional ANN algorithms require more samples, learn slowly, and can overfit the learning model [50]. The idea of the ELM was first specified by [51]. The DELM is used widely for classification and regression because it learns fast and is computationally efficient. An extreme learning machine is a feedforward neural network, which means data flows only one way through the series of layers; in the proposed model, however, backpropagation is used during the training phase, in which information flows back through the network and the network adjusts the weights to achieve high accuracy with a minimum error rate. During the validation phase, the weights of the network are fixed: the trained model is imported and used to predict on the real data. The DELM model includes three kinds of layers: the input layer, multiple hidden layers, and an output layer. The structural model of a DELM is shown in FIGURE 3, where np represents input layer nodes, hl represents hidden layer nodes, and O p indicates output layer nodes.
To make cities energy sustainable, the prediction of electricity production in a power plant is a major real-life problem. For a power plant to be effective and cost-effective, the full-load energy production of a baseload power plant should be properly predicted. This helps to increase the revenue from the available megawatt-hours (MWh). Turbine reliability and durability rely heavily on the forecast of its energy generation, especially under constraints of high profitability and contractual liabilities. This research examines the deep extreme learning machine (DELM) method for developing a predictive model to estimate the hourly full-load energy production of a power plant.

C. SERVICE-ORIENTED ARCHITECTURE
The main advantage of smart grids lies in their capacity to incorporate energy sources into the network and to control energy consumption and generation, as shown in III-D.
Energy generation is the first step of the smart grid value chain; it involves power sources and relies on broad-based monitoring and control technology to communicate with the next step, power distribution. Power distribution is based on a proximity network that connects consumers to the grid and transmits data via advanced metering facilities. Power consumption is the final step in the smart grid value chain, involving both residential and industrial electricity users. Therefore, to optimize operation, it is very important to supervise both consumption and production.

D. DATASET
A power plant's baseload operation is affected by four primary parameters, which are used in the dataset as input variables: temperature (T), ambient pressure (AP), relative humidity (RH), and exhaust vacuum (V). These parameters influence the electrical power output (EP), which is the target variable. Temperature (T) is measured in the range from 1.81 °C to 37.11 °C. Ambient pressure (AP) is measured in the range from 992.89 to 1033.30 millibar. Relative humidity (RH) is measured as a percentage in the range from 25.56% to 100.16%. Exhaust vacuum (V) is measured in the range from 25.36 to 81.56 cm Hg. The target variable, electrical energy output (EP), is measured in megawatts in the range 420.26-495.76 MW.
In the data acquisition layer, the inputs T, AP, RH, and V are taken from the collected data. In the pre-processing layer, abnormalities in the data are cleaned and data reduction is applied to obtain quality data for machine learning. In the application layer, a deep extreme learning machine method is used for energy management. In the evaluation layer, four measures, mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and mean square error (MSE), are observed when calculating full-load electrical energy output to improve the efficiency of energy management for smart cities. In this article, DELM was used to train and fit 47840 sets of data. The data are randomly divided into 70% for training (33488 samples) and 30% for validation and testing (14352 samples).
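The split and the four evaluation measures described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `split_indices` and `regression_metrics`, the random seed, and the use of NumPy are assumptions made for illustration, and the metric formulas are the standard textbook definitions rather than the paper's own equations.

```python
import numpy as np

def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and held-out parts."""
    rng = np.random.default_rng(seed)  # seed is an illustrative assumption
    order = rng.permutation(n_samples)
    n_train = int(round(n_samples * train_fraction))
    return order[:n_train], order[n_train:]

def regression_metrics(actual, predicted):
    """Standard definitions of MAE, MSE, RMSE and MAPE (in percent)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = actual - predicted
    mse = float(np.mean(error ** 2))
    return {
        "MAE": float(np.mean(np.abs(error))),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAPE": float(np.mean(np.abs(error / actual)) * 100.0),
    }

# 47840 instances split 70/30, as stated in the text.
train_idx, test_idx = split_indices(47840)

# Two made-up EP values (MW) inside the reported 420.26-495.76 MW range.
metrics = regression_metrics([450.0, 460.0], [452.0, 457.0])
```

Rounding 70% of 47840 gives the 33488 training samples quoted in the text, which leaves 14352 samples for validation and testing.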
Table 1 gives the pseudocode of the proposed deep extreme learning machine for power plant electrical energy output.
Eq. (1) gives the mathematical representation of the moving average filter, where u( ) denotes the inputs, P[ ] denotes the output, and G represents the number of points in the moving average.
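As an illustration of such a filter, the following sketch computes a G-point moving average of an input sequence u. It assumes the common unweighted form in which each output value is the mean of G consecutive inputs; the exact indexing of Eq. (1) may differ, and the function name `moving_average` is hypothetical.

```python
import numpy as np

def moving_average(u, G):
    """Return the G-point moving average P of the input sequence u.

    Assumes an unweighted window: each output is the mean of G
    consecutive inputs, so the result has len(u) - G + 1 entries.
    """
    u = np.asarray(u, dtype=float)
    if G < 1 or G > u.size:
        raise ValueError("G must be between 1 and the number of inputs")
    kernel = np.ones(G) / G
    return np.convolve(u, kernel, mode="valid")

# Example with a made-up input sequence and a three-point window.
P = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], G=3)
```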
In the modeling of machine learning algorithms to increase predictability and to improve the training process, complete  3) and (4), where the dimensions α and β are the input matrix and output matrix. The ELM then assigns random weights between the input and the hidden layer, where k D1 represents the weight between the k th input node and the l th hidden layer node, as shown in Eq. (5).
Next, the ELM randomly selects the biases of the hidden layer nodes, as in Eq. (7). The extreme learning machine also selects a function f(u) as the network activation function. For the data acquisition layer in Fig. 1, the resulting matrix is shown in Eq. (8), and the column vector of the resulting matrix S is shown in Eq. (9) [49], indexed over (1, 2, 3, ..., n). Eq. (10) can then be obtained from Eqs. (8) and (9). The output of the hidden layer is denoted R and the transpose of S is denoted S', and the values of the weight matrix γ are calculated in Eq. (11) by the least squares method.
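The basic ELM steps just described (random input weights, random hidden biases, an activation function, and output weights obtained by least squares) can be sketched as follows. This is a generic single-hidden-layer ELM under stated assumptions, not the authors' code: the sigmoid activation, the uniform range [-1, 1], the Moore-Penrose pseudo-inverse as the least squares solver, and all names (`elm_fit`, `elm_predict`, `hidden_output`, `gamma`) are illustrative choices, and the data are synthetic.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elm_fit(X, y, n_hidden=10, seed=0):
    """Fit a single-hidden-layer ELM with a least squares output layer."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    hidden_output = sigmoid(X @ W + b)                       # hidden layer output
    gamma = np.linalg.pinv(hidden_output) @ y                # least squares output weights
    return W, b, gamma

def elm_predict(X, W, b, gamma):
    return sigmoid(X @ W + b) @ gamma

# Synthetic data with four inputs, standing in for (T, AP, RH, V).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.3])

W, b, gamma = elm_fit(X, y, n_hidden=10, seed=1)
pred = elm_predict(X, W, b, gamma)
```

Only the output weights are solved for; the input weights and biases stay at their random values, which is what makes this step fast compared with iterative training.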
The γ regularization term was used to increase the network's overall stability [52]. Deep learning has emerged as a very popular subject among researchers. A system with a minimum of four layers, including inputs and outputs, fulfills the requirements of a deep learning system. In a deep neural network, the neurons of every layer are trained on a distinct set of parameters using the output of the previous layer. This allows deep learning networks to process extensive datasets. Deep learning has captured numerous researchers' attention because it is very effective in solving real-world problems. DELM is used in our proposed work to combine the advantages of both ELM and deep learning. The DELM model comprises a single input layer with four neurons, six hidden layers with ten neurons each, and one output layer with one neuron, as shown in FIGURE 3. A trial-and-error method was used to select the number of hidden layer nodes because there is no specific mechanism for specifying the number of hidden layer neurons. The second hidden layer output can then be obtained, where γ + is the generalized inverse of the matrix γ. Thus, second hidden layer values can be easily attained through Eq. (11).
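The layer layout stated above (four inputs, six hidden layers of ten neurons, one output) can be made concrete with a short forward-pass sketch. It only illustrates the shapes involved: the random initialization, the sigmoid hidden activation, the linear output neuron, and the names `init_network` and `forward` are assumptions, and no DELM training is performed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_network(layer_sizes, seed=0):
    """Create one (weights, biases) pair per connection between layers."""
    rng = np.random.default_rng(seed)
    return [
        (rng.uniform(-1.0, 1.0, size=(n_in, n_out)),
         rng.uniform(-1.0, 1.0, size=n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def forward(X, params):
    """Propagate inputs through sigmoid hidden layers and a linear output."""
    activation = X
    for W, b in params[:-1]:
        activation = sigmoid(activation @ W + b)
    W_out, b_out = params[-1]
    return activation @ W_out + b_out  # linear output is an assumption

# 4 inputs, six hidden layers with ten neurons each, 1 output.
LAYER_SIZES = [4] + [10] * 6 + [1]
params = init_network(LAYER_SIZES)
n_parameters = sum(W.size + b.size for W, b in params)

sample_inputs = np.zeros((5, 4))
outputs = forward(sample_inputs, params)
```

With this layout the network has 611 trainable weights and biases in total.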
In Eq. (13), the parameters A l , R, B l , and R l represent the weight matrix of the first two hidden layers, the bias of the first hidden layer neurons, the estimated output of the first hidden layer, and the estimated output of the second hidden layer, respectively.
F + E is the inverse of F E, and the activation function f (x) is used to compute Eq. (5). Hence, with the appropriate activation function f (x), the required output of the second hidden layer is revised. The weight matrix γ between the second and third layers is updated as per Eq. (16), where R + l+1 is the inverse of R l+1 . The estimated results of the layer are thus given by Eq. (17).
$S\gamma_{new}^{+}$ is the inverse of the weight matrix $\mu_{l+1}$. The DELM then sets the matrix $F_{ME}^{l} = [B_{l+1}, A_{l+1}]$. Eqs. (10) and (11) then allow the output of each further layer to be obtained.
The back-propagation algorithm comprises weight initialization, feedforward propagation, backward error propagation, and weight and bias updates, each of which has seen distinct developments. Each neuron in the hidden layer has an activation function such as the sigmoid, g(x) = sigmoid(x). The sigmoid input function and the hidden layer of the DELM can be composed in this way, where $r_j$ is the desired output and $tp_j$ is the calculated output. Eq. (18) shows the back-propagated error, which is calculated as the sum of the squared differences between the desired output and the calculated output, divided by 2.
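As a small illustration of the two ingredients just named, the sketch below implements a sigmoid activation and the half sum-of-squares error. The function names and the sample values are hypothetical; only the definitions (sigmoid, squared error divided by 2) come from the text.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-x))

def half_sse(desired, calculated):
    """Half the sum of squared differences between desired and calculated outputs."""
    desired = np.asarray(desired, dtype=float)
    calculated = np.asarray(calculated, dtype=float)
    return 0.5 * np.sum((desired - calculated) ** 2)

# Two output values: differences of 0.2 and -0.1 give 0.5 * (0.04 + 0.01).
error = half_sse([1.0, 0.0], [0.8, 0.1])
mid_activation = sigmoid(0.0)
```

The factor of one half is a convenience: it cancels the 2 that appears when the squared term is differentiated during the weight update.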
To reduce the overall error, the weights must be adjusted. Eq. (19) shows the rate of change in weight for the output layer, and Eq. (20) is obtained from it by applying the chain rule. The weight change is then obtained by substituting these values into the resulting expression.
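For reference, the standard chain-rule expansion for a sigmoid output neuron under the half sum-of-squares error can be written as follows. This is a generic textbook form, not a reproduction of the paper's Eqs. (19) and (20); the learning rate $\epsilon$, the net input $net_j$, and the incoming activation $a_i$ are symbols assumed here for illustration.

$$
\begin{aligned}
E &= \tfrac{1}{2}\sum_j \left(r_j - tp_j\right)^2, \qquad tp_j = \frac{1}{1 + e^{-net_j}}, \qquad net_j = \sum_i w_{ij}\, a_i + b_j \\
\frac{\partial E}{\partial w_{ij}} &= \frac{\partial E}{\partial tp_j}\cdot\frac{\partial tp_j}{\partial net_j}\cdot\frac{\partial net_j}{\partial w_{ij}} = -\left(r_j - tp_j\right)\, tp_j\left(1 - tp_j\right)\, a_i \\
\Delta w_{ij} &= -\epsilon\,\frac{\partial E}{\partial w_{ij}} = \epsilon\,\left(r_j - tp_j\right)\, tp_j\left(1 - tp_j\right)\, a_i
\end{aligned}
$$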
The calculation that determines the appropriate change to the hidden-layer weights is shown in the following procedure. It is more complex because each weighted connection can contribute to the error at all nodes.
This applies from $R_6$ to $R_1$, that is, to $R_n$ where n = 5, 4, 3, 2, 1. The process of updating the weight and bias between the output and the hidden layer is shown in Eq. (23).
Eq. (24) shows how the weight and bias between the input and the hidden layer are updated.
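A weight and bias update of this kind can be sanity-checked numerically. The sketch below performs one gradient step for a single sigmoid output neuron and compares the analytic gradient against central finite differences. It is a generic gradient-descent illustration, not the paper's Eqs. (23) and (24); the learning rate `lr`, the vector `a` of hidden activations, and all numeric values are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(w, b, a, r):
    """Half squared error of one sigmoid output neuron."""
    tp = sigmoid(a @ w + b)
    return 0.5 * (r - tp) ** 2

def gradients(w, b, a, r):
    """Analytic gradients of the loss with respect to weights and bias."""
    tp = sigmoid(a @ w + b)
    delta = -(r - tp) * tp * (1.0 - tp)  # dE/dnet
    return delta * a, delta

a = np.linspace(0.1, 1.0, 10)  # activations of ten hidden neurons (made up)
w = np.full(10, 0.1)           # current output-layer weights
b = 0.0                        # current output-layer bias
r = 1.0                        # desired output
lr = 0.1                       # learning rate (assumed)

grad_w, grad_b = gradients(w, b, a, r)

# Central finite differences as an independent check of grad_w.
eps = 1e-6
num_grad = np.array([
    (loss(w + eps * e, b, a, r) - loss(w - eps * e, b, a, r)) / (2 * eps)
    for e in np.eye(10)
])

old_loss = loss(w, b, a, r)
w_new, b_new = w - lr * grad_w, b - lr * grad_b
new_loss = loss(w_new, b_new, a, r)
```

For a small enough step the loss after the update is lower than before, which is the behavior the update rule is meant to produce.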

IV. RESULTS AND DISCUSSION
In this article, the deep extreme learning machine algorithm was applied to the data [53], and Matlab was used for simulation. In Matlab, a Python script was implemented to train on the data [53]. DELM was used to train and fit 47840 data samples. The data were randomly divided into 70% for training (33488 samples) and 30% for validation and testing (14352 samples). The data were preprocessed beforehand to remove abnormalities and errors. We searched for the best DELM configuration for predicting power plant electrical energy output across different numbers of hidden layers, hidden neurons, and combinations of activation functions. To that end, we tried the same number of neurons with different types of activation functions in the hidden layers. In this work, we used the proposed DELM for prediction to properly test the effectiveness of this algorithm. To measure the performance of the DELM algorithm together with its counterpart algorithms, we used the statistical measures given in Eqs. (25) and (26).
An exhaustive search is applied in this study to the original dataset, which consists of four parameters as input variables and a target parameter as the response. The goal is to choose a minimal model with the best subset that predicts the response correctly [54]. To this end, after collecting preliminary statistical data, we applied an exhaustive search to the original dataset to find the best subset by comparing all the competing candidate subsets (2^4 − 1 = 15) in the experiments. In addition, we divided the experiments into four groups, applying the subsets with one, two, three, and four parameters to the regression methods. We determined the best subset in each experiment by analyzing and comparing the results of all regression methods for the candidate subsets, which are shown in TABLE 2.
In Eq. (25), O represents the predicted output in megawatt-hours and T represents the actual output. $O_0$ and $T_0$ indicate that there is no change in the predicted and the actual power plant electricity production output, respectively, from the previous cycle. For the data set [53], the DELM approach was used, and the results obtained can be seen in TABLE 3. Comparing the expected output with the result obtained by the proposed approach, Table 3 shows that the proposed approach gives 98.6% accuracy and a 1.6% miss rate during training.
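The count of 15 candidate subsets in the exhaustive search, and their division into four groups by size, can be reproduced with a short enumeration. The feature names `x1` to `x4` are placeholders, since the input parameters are not named in this section.

```python
from itertools import combinations

# Placeholder names for the four input parameters.
features = ["x1", "x2", "x3", "x4"]

# Group the non-empty subsets by how many parameters they contain.
subsets_by_size = {
    k: list(combinations(features, k)) for k in range(1, len(features) + 1)
}

group_sizes = [len(subsets_by_size[k]) for k in sorted(subsets_by_size)]
total_subsets = sum(group_sizes)
```

The four groups contain 4, 6, 4, and 1 subsets, which sum to 2^4 − 1 = 15. In an exhaustive search each of these subsets would be passed to every regression method and the resulting errors compared.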
We take 30% of the data (14352 samples) from the dataset [53] for testing and validation. The comparison between the expected output and the result obtained by the proposed approach is shown in Table 3. Table 3 also shows that the accuracy of the proposed approach during testing and validation is 93.9% and the miss rate is 6.1%.
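The sample counts quoted for the split follow directly from the dataset size. A minimal sketch of a random 70/30 index split is shown below; the seed and the use of a single random permutation are assumptions, since the exact splitting procedure is not described.

```python
import numpy as np

n_samples = 47840
n_train = n_samples * 70 // 100  # integer arithmetic avoids rounding issues
n_test = n_samples - n_train     # remaining 30% for testing and validation

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
perm = rng.permutation(n_samples)
train_idx, test_idx = perm[:n_train], perm[n_train:]
```

Splitting a single permutation guarantees that the two index sets are disjoint and together cover every sample exactly once.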
It is observed that the overall performance of the proposed method (DELM) during training was 98.6% accurate, as shown in TABLE 4, while the miss rate of training is 1.6%. In testing and validation, however, the overall performance of the proposed method (DELM) was 93.9% accurate, while the miss rate of testing and validation is 6.1%, as shown in Table 3. It is also observed from TABLE 3 that the training phase achieves higher accuracy and a lower miss rate than the testing and validation phase. According to Table 4, for the approach of [55], which utilized the combined cycle power plant data set [53] consisting of 47840 data samples, the mean square error in each round is increasing. In the approach of [55], there is only one hidden layer and an increasing number of neurons.
TABLE 5 shows that during training the mean square error of the methodology in [55] decreases, but the MSE of the proposed DELM approach with the same number of neurons is lower than that of [55]. Moreover, as the number of hidden layers increases, the error is further reduced. The number of neurons is fixed at 10 in all of our cases, whereas in [55] the number of neurons in the single layer is significantly higher. This suggests that using more layers with fewer neurons each is preferable to using more neurons in a single layer.
For subsequent tests, 5 × 2 cross-validation [56] was applied to compare the performance of different learning methods. In this scheme, the dataset is randomly shuffled 5 times, and each shuffle is used in a 2-fold CV. The resulting 10 validation-set performance values are used for statistical significance tests.
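The 5 × 2 scheme described above can be sketched as follows: five shuffles, each split into two halves that take turns as training and validation sets, giving ten scores. The helper names, the least-squares stand-in model, and the synthetic data are assumptions for illustration; any learner with fit and predict steps could be substituted.

```python
import numpy as np

def five_by_two_cv_splits(n_samples, seed=0):
    """Yield 10 (train_idx, val_idx) pairs: 5 shuffles x 2 folds."""
    rng = np.random.default_rng(seed)
    half = n_samples // 2
    for _ in range(5):
        perm = rng.permutation(n_samples)
        first, second = perm[:half], perm[half:]
        yield first, second  # train on the first half, validate on the second
        yield second, first  # then swap the roles

def evaluate(fit, predict, X, y, seed=0):
    """Return the validation MSE of each of the 10 splits."""
    scores = []
    for train_idx, val_idx in five_by_two_cv_splits(len(X), seed):
        model = fit(X[train_idx], y[train_idx])
        pred = predict(model, X[val_idx])
        scores.append(float(np.mean((pred - y[val_idx]) ** 2)))
    return scores

# Stand-in learner: ordinary least squares on synthetic, noise-free data.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, 2.0, 3.0, 4.0])

scores = evaluate(
    fit=lambda Xtr, ytr: np.linalg.lstsq(Xtr, ytr, rcond=None)[0],
    predict=lambda coef, Xva: Xva @ coef,
    X=X,
    y=y,
)
```

The ten scores produced for each learning method are the values that would then be fed into a statistical significance test.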
Further, it is observed that in [55], with a single hidden layer, the performance of the system increases as the neuron count increases, as shown in TABLE 5. In the proposed DELM solution, by contrast, the performance increases significantly as the number of hidden layers increases with the same number (10) of neurons. This means that increasing the number of neurons improves the performance of the system, but not as much as in the proposed DELM system. TABLE 5 lists the MSE performances of local models (LM). Here, LM1 is a k-NN + ANN model with k=100, LM2 is K-Means + ANN with K=20, LM3 is a K-Means + ANN ensemble of population 3 with K=20, LM4 is a K-Means + ANN ensemble of population 5 with K=20, and LM5 is a K-Means + ANN ensemble of population 3 with K=10. As can be seen, local models with clustering yield better results with an ensemble.
In TABLE 5, the prediction of power plant electrical energy output is carried out by all conventional approaches with data from the combined cycle power plant data set. The proposed DELM approach outperforms other models, such as backpropagation [55], in terms of accuracy. The overall result for [55] was 89.56%, whereas the performance of the proposed DELM system is 98.6%, higher than the previously proposed methods in terms of accuracy rate.
In Figure 5, the prediction of power plant electrical energy output is carried out by all conventional approaches with data from the combined cycle power plant data set [53], other than the Artificial Neural Network [31]. The proposed DELM approach outperforms the other models in terms of accuracy: Artificial Neural Network [31], GA-based Multilayer Perceptron [33], Regression ANN Model [32], and K-Means + ANN [55].
Among these approaches, the worst is the Artificial Neural Network [31], with an RMSE of 47. Moreover, K-Means + ANN [55] performs better during the training phase than the Regression ANN Model [32]. In addition, the accuracy of the GA-based Multilayer Perceptron [33] and the Regression ANN Model [32] is quite close. The RMSE of the proposed DELM system is 2.61, lower than that of the previously proposed methods. The values of the statistical measures suggest that DELM performs much better than the other approaches. The proposed DELM is therefore a strong choice for power plant electrical energy output prediction.
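Since the comparisons above are stated in terms of MSE and RMSE, a minimal sketch of both measures may help. These are the standard definitions and are not taken from Eqs. (25) and (26), which are not reproduced in this section; the sample values are made up.

```python
import numpy as np

def mse(actual, predicted):
    """Mean squared error between actual and predicted outputs."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the output."""
    return float(np.sqrt(mse(actual, predicted)))

# One of three predictions is off by 3, so MSE = 9 / 3 = 3.
mse_value = mse([1.0, 2.0, 3.0], [1.0, 2.0, 6.0])
rmse_value = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 6.0])
```

Because RMSE is expressed in the units of the predicted quantity, it is the easier of the two to compare against the magnitude of the plant output itself.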

V. CONCLUSION
Modeling, analysis, and prediction of power plant electrical energy output is a challenging task. In this research, a model for power plant electrical energy output prediction has been proposed to improve the prediction accuracy. The proposed model is an expert system based on an artificial neural network (ANN) with a deep extreme learning machine (DELM), which has high potential for predicting power plant electrical energy output. Various numbers of hidden layer neurons were defined, and diverse activation functions and features were used to arrange the DELM parameters and obtain an optimized structure.
Various statistical measures were used to measure the performance of the proposed approach. These measures show that the proposed DELM is considerably more accurate than the other algorithms. Compared to past approaches, the proposed DELM technique produces attractive results. The proposed technique exhibits 98.6% accuracy, which is much better than the existing techniques. Moreover, it is observed that the proposed approach exhibits an affordable computational complexity. DELM has been used in the proposed work to combine the benefits of ELM and deep learning. We are confident in the initial results and intend to expand this work in the future by investigating different datasets, learning machines, structures, and algorithms.

FUTURE WORK
In this segment, we explored how the deep extreme learning machine technique can be used to predict electric power plant energy output. Compared to the systematic experiments on simulated datasets, this problem is more difficult, in particular because the neural network is supplied with only partial information: the neural network is blind to certain topological changes, for example, changes in the presence or absence of power lines. The deep extreme learning machine performs better in these more complicated cases than the neural network baseline. This allows flows to be approximated without complete information about the topology of the power plant, unlike most models in use today. The model performs fairly well on data similar to those on which it has been trained. This is an important result: the deep extreme learning machine can be used for grid operations. Still, the model struggles with flows on completely unseen topologies. This relates more to an evaluation metric issue: transmission system operators do not really want to estimate flows; they want to know whether or not the power grid will be unsafe.
Further research could be done in this direction. The aim of future studies will be to identify and quantify this metric more accurately. The neural network could be retrained more often to improve performance on new topologies. An algorithm that can be trained within a few hours and that can be fine-tuned has been tested over a span of more than a month. Another way of improving the results would be to learn not only from snapshots but also from the power plant that operators are observing.
SAGHEER ABBAS received the M.Phil. and Ph.D. degrees in computer science from the School of Computer Science, NCBA&E, Lahore, Pakistan. He is currently working as an Assistant Professor with the School of Computer Science, NCBA&E. He has been teaching graduate and undergraduate students in computer science and engineering for the past eight years. He has published about 60 research articles in international journals as well as reputed international conferences. His research interests primarily include cloud computing, the IoT, intelligent agents, image processing, and cognitive machines, with various publications in international journals and conferences.
MUHAMMAD ADNAN KHAN received the Ph.D. degree from ISRA University, Pakistan. Prior to joining the NCBA&E, he worked in various academic and industrial roles in Pakistan. He is currently working as an Assistant Professor with the Department of Computer Science, Lahore Garrison University, Lahore, Pakistan. He has been teaching graduate and undergraduate students in computer science and engineering for the past 10 years. He is currently guiding four Ph.D. scholars and eight M.Phil. scholars. He has published about 120 research articles in international journals as well as in reputed international conferences. His research interests primarily include MUD, channel estimation in multicarrier communication systems, as well as image processing and medical diagnosis using soft computing, with various publications in journals and conferences of international repute.
LUIS EDUARDO FALCON-MORALES received the M.Sc. degree in mathematics from UNAM, in 1994, and the Ph.D. degree in computer vision from CINVESTAV, in 2007. During his Ph.D., he spent several months with GRASP Laboratory, University of Pennsylvania, and also with the Computer Vision Laboratory of Berkeley, University of California. His current research focuses on computer vision, image processing, as well as machine and deep learning. He has been a full-time Professor with the Department of Computer Science School, Tecnologico de Monterrey, since 1998. He has worked and published papers on several topics, such as medical image processing and machine learning for detection of diabetes based on retina images; digital analysis of blood patterns for the detection of several diseases; leading a project on computer vision for the assistance of visually impaired people; as well as sentiment analysis of a social networking platform.