A Secure Federated Deep Learning-Based Approach for Heating Load Demand Forecasting in Building Environment

Recently, with the establishment of new thermal regulation, the energy efficiency of buildings has increased significantly, and various deep learning-based methods have been presented to accurately forecast the heating load demand of buildings. However, all of these methods are executed on a dataset with specific distribution and do not have the property of global forecasting, and have no guarantee of data privacy against cyber-attacks. This paper presents a novel approach to heating load demand forecasting based on Cyber-Secure Federated Deep Learning (CSFDL). The suggested CSFDL provides a global super-model for forecasting heating load demand of different local clients without knowing their location and, most importantly, without revealing their privacy. In this study, a CSFDL global server is trained and tested considering the heating load demand of 10 different clients in their building environment. The presented results, including a comparative study, prove the viability and accuracy of the proposed procedure.

the world follows the concept of energy security [1], [2]. Most of the world's electricity is currently generated from fossil fuels, and this trend will lead to an increase in global warming. According to statistics provided by the International Energy Agency, worldwide electricity demand is expected to increase by more than 65% by the year 2035 [3]. Recent studies have shown that about 60% of the world's energy is consumed in residential and commercial buildings [4]. Accordingly, energy management in buildings can be one of the most important tools for saving energy and reducing CO2 emissions.
Optimizing and managing energy consumption in buildings requires full identification and knowledge of the energy sources and major end-uses of the building. Typically, energy sources in a building include electricity, heating supply, and natural gas. The related major end-uses include heating, ventilation and air conditioning (HVAC), elevators, domestic hot water, kitchen equipment, lighting, peripherals and home appliances [5]. Fig. 1 shows the adopted ISO Standard 12655:2013, which is related to the classification of building energy consumption [6]. It should be noted that the HVAC operation schedule and indoor/outdoor conditions are two important factors that, in addition to the above-mentioned energy sources and major end-uses, should be considered in the analysis of building performance.
Among these, the heating load forecasting in buildings plays a key role in energy management of buildings and in guiding the optimal control objective of on-demand heating. Heating load forecasting is a complex nonlinear optimization problem that has attracted the attention of many researchers in recent years [7]. Buildings' heating load forecasting should be done by considering various aspects such as forecasting objects, forecasting time horizons, and forecasting techniques [8].
Heating load forecasting objects are mainly divided into three categories: heat source, heat exchange station, and building. On the supply side, the heat source is the target of heating load forecasting, while buildings are the target on the demand side, and the heat exchange station is the bridge between the supply side and the demand side to achieve efficient energy transmission [7].
In [9], a mixed integer linear program (MILP) has been proposed to forecast the integrated model in large-scale buildings to evaluate the optimal performance of HVAC. A global forecasting system sflux model, which is a statistical-based procedure, has been suggested in [10] to estimate the heating load demand of buildings. The method proposed in this study, which is based on weather conditions, advances the forecasting process.
FIGURE 1. The usage of energy in buildings [6].
In [11], heating and cooling load forecasting in buildings has been done using ANNs and the extraction of a black box model, with meteorological data as input variables. In this study, cooling and heating loads are investigated for five office buildings. Eighteen different models, with different numbers of inputs and different parameters, are formulated for each of the five buildings, using recent weather history and load data from the past three hours. The heating load forecasting of a building has been performed in [12] using a backpropagation neural network called nonlinear autoregressive, with the meteorological parameters and the HVAC equipment parameters as input variables. This study forecasts the heating load for a commercial building based on various objectives such as achieving high thermal comfort and the possibility of resetting the air temperature setpoint without compromising the comfort level of the occupants. The proposed goal has been implemented by considering issues such as optimizing the parameters and size of the network and determining the appropriate number of training data.
Building heating load forecasting based on modeling the design and structural features of a building has been performed in [15] via the ELM technique and via its improved version called online sequential ELM. The main purpose of this study is to develop an ELM-based model that can extract the correlation between features, their mutual information and association strengths, and their relation with heating and cooling loads based on the structural characteristics of the building. The absence of climatic parameters related to the study area can be considered one of the most important disadvantages of this study. In [18], various machine learning applications such as support vector machine, random forest, Gradient Boosted Regression Trees, and XGBoost have been utilized to forecast the heating and cooling loads of a commercial building. In this study, predictions have been made by considering the available historical and meteorological data as input variables. The control of district heating systems in [19] has been done based on the heating load forecast of 10 residential buildings located in Rottne, Sweden by introducing a hybrid model of ANN and SVM techniques. In this study, the proposed methods are trained and tested based on a 27-month thermal load dataset obtained from 10 residential buildings along with outdoor temperature information received from a weather forecast service. Heating and cooling load forecasting based on the technical specifications of the building has been described in [8] using two techniques, multilayer perceptron and support vector regression. This study aims to manage and optimize the consumption of cooling and heating loads. However, the developed machine learning-based forecasting models are trained and tested based on building specification parameters and without being affected by climatic variables.
In [17], the forecasting of building heating load by considering meteorological data as input variables has been done by one of the machine learning applications called decision tree.
Forecasting the heating and cooling loads of a residential building for indoor energy management has been presented in [20] using a deep neural network (DNN). In this study, the technique used forecasts the heating and cooling loads based on the technical characteristics of the building. Another deep learning application called long short-term memory network (LSTM) has been used in [21] to forecast the lighting loads, heating loads, and miscellaneous electrical loads in two office buildings in the United States. In this study, the main focus is on introducing, preprocessing, and selecting the most appropriate features for training the forecasting models. The results demonstrate the effectiveness of the proposed approaches, indicating a reduction in internal heat gains forecasting errors from 12% to 8% in building A, and from 26% to 16% in building B. In another study in [22], deep learning applications called recurrent neural network, LSTM, and gated recurrent units have been selected to forecast indoor energy in residential and commercial buildings. In this study, deep learning-based forecasting models are used to extract and analyze 24-hour profiles related to building cooling loads. This process is performed by considering the time variables, outdoor variables that describe the conditions of the outdoor environment, operational variables that introduce the condition of chiller plants, and energy variables that include the power consumption of systems, as the input of each network. The short-term internal energy forecasting of a residential building has been performed in [23] by presenting a hybrid approach of the ELM technique and a deep learning stacked auto-encoder. Another application of deep learning called convolutional neural network has been suggested and used in [7] to forecast the heating load in residential buildings. This study is conducted for four heat exchange stations located in Anyang, China in the 2018 heating season.
Heat load forecasting is considered a complex nonlinear optimization problem to be solved with deep learning-based approaches. The effectiveness of the proposed solution is demonstrated through comparison with other advanced machine learning and deep learning algorithms. The hourly heating load forecasting in residential buildings, with historical data and climatic information as input variables, has been described in [24] using an LSTM technique. This study focuses on providing a robust forecasting model for the time-series behavior of the input data. The proposed model is implemented in an online system of a power plant in Shandong province, China, and is able to make a continuous forecast without human intervention for four months during the hot season of 2018.
A multilayer hybrid model based on the ARX and particle swarm optimization neural network has been proposed in [25] to forecast the heating load of a residential building. One of the most important objectives of this study is to set up and develop an energy performance management system as an effective method to deal with the high growth demand for electricity in China's urbanization process in order to prevent the expansion of existing fossil power plants. It should be noted, the technical specifications of the building and weather data are considered as the input variables for the developed forecasting model in this study. Regarding energy management and conservation, an enhanced integration model (stacking model) has been presented in [26] for forecasting the heating load of two educational buildings in Tianjin, China. A hybrid approach based on deep learning applications called hybrid spatial-temporal attention long short-term memory has been introduced in [27] to forecast the heating load for energy management in residential buildings.
From the literature review presented above, two points can be concluded: 1) all of the studies point out the importance of energy management and the need to forecast heating loads in a variety of buildings; 2) various methods have been proposed to address this problem; however, each method suffers from specific issues. In particular, the statistical and numerical methods involve complex numerical calculations and require strong processing systems and experienced people to solve the forecasting problem. In addition, the predictions made by these techniques are not very accurate. In machine learning and ANN-based methods, accurate model extraction for forecasting requires a large amount of effective data. However, these techniques cannot model the time-series behavior of input variables such as weather parameters, and their real-time prediction models are not very accurate. Deep learning-based techniques have been able to significantly mitigate the problems associated with previous methods. However, these techniques also suffer from limited generalizability to different buildings. A very important point to consider is the complete dependence of all conventional techniques on the training data of each region. Thus, if complete and comprehensive historical data are not available, it will be impossible to make any forecast, and this can be considered the main limitation of the techniques used. In addition, none of the studies have addressed the issue of data privacy and the performance of the proposed methods against cyber-attacks.
VOLUME 10, 2022
Therefore, addressing issues such as modeling the time-series behavior of input data, providing forecasting models that depend as little as possible on training data, developing generalizable forecasting models that can extract the relationship between input variables relevant to different regions, and ultimately protecting data privacy against cyber-attacks, which can be significant for industry or government owners, can significantly improve the forecasting process and would be welcomed by researchers and stakeholders active in this field.
In this paper, a novel hybrid technique called cyber-secure federated deep learning (CSFDL) is presented to improve the process of forecasting the building's heating load and to maintain the privacy of the data used. The proposed CSFDL technique can be viewed as a hybrid framework combining federated learning and CNN models. The federated network is responsible for maintaining data security against various types of cyber-attacks and forms a global super-model. The CNN technique is used to extract features from the input data and forecast the heating load based on the extracted features. In addition, the presentation of a global super-model that can be generalized to forecast the heating load of unknown and new buildings without the need for training data is one of the most important contributions of this paper. Also, in order to provide a comparative approach, other conventional learning-based techniques, namely Support Vector Regression (SVR), General Regression Neural Network (GRNN), LSTM, and Bidirectional LSTM (Bi-LSTM), are used to forecast the heating load studied in this paper. Evaluation of the results indicates the effectiveness of the proposed technique in comparison with other conventional techniques.
The remainder of this paper is organized as follows. Section II describes in detail the applied deep learning techniques including CSFDL and CNN. The case study of heating load demand in a building environment is described in Section III. The prediction results are presented in Section IV. Finally, Section V contains the conclusions of the paper.

II. FEDERATED DEEP LEARNING
Over the past decade, deep learning models have experienced significant growth and have been used in a variety of applications. With increasing data availability and improved computational power, more efficient deep learning algorithms have spawned a plethora of new applications, including smart energy systems, smart buildings, automatic driving, etc.
The accuracy of these algorithms usually depends on the availability of a large amount of data. Typically, this amount of data is stored on a central server, which may cause some problems for the entire network. First, given the increasing bandwidth availability, and the possibility of remote data storage and retrieval, it is easy to reach these servers and therefore they can easily be a target for an attacker. Second, collecting a large amount of data on a single server requires more computational processing. Last but not least, devices connected to servers might accumulate data of sensitive nature that is subject to secrecy rules and regulations. Therefore, it is important to overcome these problems by applying more powerful ways of decentralized data storage. The one used here is Federated Deep Learning (FDL), which was recently introduced by Google [28]-[30].
FDL provides a shared global model to connect deep learning models with local participants without knowing their location and, more notably, without revealing private data. This method aims to achieve a global model with a federation of multiple participating machines that preserve the security of their own data when clients are typically abundant and have Internet connections. Moreover, this method needs less computational processing and does not even need a main server to receive data from local parties. After each training iteration, a broadcasted model is downloaded from the main server in the cloud, and then the local parties train on their local data and send updated weights back to the server for the next iteration. On the server side, aggregation is performed to obtain a new global model.
Mathematically, considering K enabled clients and letting k denote the index of each client, the FDL algorithm is a distributed scheme that intends to minimize the following loss function:
$$\min_{\theta} L(\theta) = \sum_{k=1}^{K} \frac{n_k}{n} L_k(\theta)$$
where $n_k$ is the number of local samples of party k, $n = \sum_{k=1}^{K} n_k$ is the total number of samples, and $L_k(\theta)$ can be formulated as:
$$L_k(\theta) = \frac{1}{n_k} \sum_{i \in P_k} \ell_i(\theta)$$
where $P_k$ is the set of data indexes of client k, whose length is $n_k$, i.e., $n_k = |P_k|$, and $\ell_i(\theta)$ is the loss on sample i.
A typical FDL operation consists of one server and K clients, as depicted in Fig. 2. The process can be summarized in four main steps. First, in each local party, initial training is done after receiving the parameters $\theta_t$ from the central server. The second step is model aggregation, where the server receives the parameters $\theta_t^k$ from each client and provides a secure aggregation over the clients. The most commonly used method for aggregation is federated averaging (FedAvg) [30], which is based on averaging local stochastic gradient descent updates. The third step is parameter broadcasting from the server, when the new parameters $\theta_{t+1}$ are sent back to each local client to perform new training on its own dataset. Finally, all clients update their individual models with respect to the aggregated parameters and evaluate the effectiveness of the new models.
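As a minimal sketch of the local-training and FedAvg aggregation steps described above, the following pure-Python example runs a few communication rounds on a toy problem. The linear model, the toy data, the function names, and the hyperparameters are all assumptions made for illustration; the paper's clients train CNNs, and every client participates in each round here.

```python
def local_update(theta, samples, lr, epochs):
    """Client side: plain SGD on a toy linear model y ~ w*x + b (squared loss)."""
    w, b = theta
    for _ in range(epochs):
        for x, y in samples:  # mini-batch of size 1, for brevity
            err = (w * x + b) - y
            w -= lr * 2.0 * err * x
            b -= lr * 2.0 * err
    return [w, b]


def fed_avg(client_thetas, client_sizes):
    """Server side: weighted average, theta_{t+1} = sum_k (n_k / n) * theta^k."""
    n = sum(client_sizes)
    dim = len(client_thetas[0])
    return [
        sum((n_k / n) * theta_k[j] for theta_k, n_k in zip(client_thetas, client_sizes))
        for j in range(dim)
    ]


def run_round(theta, clients, lr=0.01, epochs=5):
    """One communication round: broadcast, local training, weighted aggregation."""
    updates = [local_update(list(theta), samples, lr, epochs) for samples in clients]
    sizes = [len(samples) for samples in clients]
    return fed_avg(updates, sizes)


def mse(theta, clients):
    """Mean squared error of the global model over all clients' samples."""
    points = [p for samples in clients for p in samples]
    return sum((theta[0] * x + theta[1] - y) ** 2 for x, y in points) / len(points)


# Hypothetical toy data: two clients observing y = 2x + 1 on different ranges.
clients = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]]
theta = [0.0, 0.0]
initial_error = mse(theta, clients)
for _ in range(50):
    theta = run_round(theta, clients)
final_error = mse(theta, clients)
```

Only the parameter lists cross the client boundary in `run_round`; the raw `(x, y)` samples stay inside `local_update`, which is the property the federated scheme relies on.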

A. CYBER-SECURE FEDERATED DEEP LEARNING (CSFDL)
The CSFDL-based heating load forecasting process is explained in Algorithm 1, and the general flowchart of the CSFDL implementation based on the proposed method is shown in Fig. 3. As shown in this flowchart, the CSFDL-based heating load forecasting is divided into four main steps. In the first step, local clients perform an initial training with their own local data and calculate the performance of this model. Then, the encrypted parameters of the performance results are sent to the server. In the second step, the server aggregates all the parameters of the clients without knowing the local information of the clients. In the third step, the server sends the aggregated parameters to N clients. In the fourth step, the local participants update their respective models with the decrypted gradients. Finally, a security check is performed on the updated models, which then leads to the next iteration. The reason for this security check and the main procedure are explained in the following.
Data protection and privacy preservation are among the most important features of the FDL. However, recent research [31], [32] has shown that this method is the new target of adversary threats. The principle of FDL is the communication of parameters between clients and the server. Therefore, leaking any information during this transmission could compromise the confidentiality of the whole model. Moreover, to build the final global model, several model updates are exchanged periodically to achieve the optimized model. These communications and model updates may result in unintended information leaks that could be exploited by an attacker.
One of the reported threats to FDL is the manipulation of gradient updates [32], which causes the injection of false information into the network. To avoid this threat and keep the model secure, we use two security strategies. First, we employ model averaging, which reduces the disclosure of gradient updates and available information to the adversary. Second, we use dropout [33], the most common regularization technique used to prevent overfitting in neural networks, which also has an interesting property that helps to keep the FDL secure. By its nature, dropout deactivates a random subset of neurons. These deactivations weaken the information leak, because the attacker observes fewer gradient updates than would be available if all neurons were active. Together, these two strategies yield the CSFDL.
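The dropout argument above can be illustrated with a small sketch. It shows only the general mechanism (gradients of dropped units are exactly zero, so a shared update carries fewer nonzero entries) and is not the authors' implementation; the function names, the drop probability, and the layer width are hypothetical.

```python
import random


def dropout_mask(n_units, drop_prob, rng):
    """Inverted-dropout mask: 0 for dropped units, 1/(1 - p) for kept units."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - drop_prob)
    return [0.0 if rng.random() < drop_prob else keep_scale for _ in range(n_units)]


def masked_gradient(upstream_grad, mask):
    """Gradient w.r.t. the pre-dropout activations: zero wherever a unit was dropped."""
    return [g * m for g, m in zip(upstream_grad, mask)]


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
mask = dropout_mask(8, 0.5, rng)
grad = masked_gradient([0.3] * 8, mask)
exposed = sum(1 for g in grad if g != 0.0)  # entries an observer would see as nonzero
```

How much protection this gives in practice depends on the attack model; the sketch only makes the "fewer observed gradient updates" claim concrete.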

B. CONVOLUTIONAL NEURAL NETWORK (CNN)
CNN has been used as a powerful tool for image and video processing, classification, forecasting, and feature extraction. This technique has been able to solve many problems related to machine learning methods, including feature extraction and pattern recognition, by providing a layer-by-layer structure [4], [34]. As shown in Fig. 4, convolution layers, pooling layers, fully connected layers, and a Softmax layer form the architecture of a CNN network. Each of these layers completes the function of this architecture by playing a unique role. Each convolution layer consists of kernels that act as filters on the data. The kernels come in a variety of sizes and are used to extract features from data [35].
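To make the roles of the convolution and pooling layers concrete, here is a minimal one-dimensional sketch. The 1-D setting, the ReLU activation, the kernel values, and the function names are assumptions chosen for illustration, and the sliding product is the cross-correlation form used by most deep learning frameworks.

```python
def conv1d(signal, kernel, bias=0.0):
    """Slide the kernel over the signal ('valid' mode) and add a bias."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Keep the maximum of each non-overlapping window of the given size."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


signal = [1, 2, 4, 7, 11, 16]          # hypothetical input sequence
features = relu(conv1d(signal, [-1, 1]))  # difference kernel: [1, 2, 3, 4, 5]
pooled = max_pool1d(features, 2)          # window maxima: [2, 4]
```

The kernel `[-1, 1]` acts as a filter that responds to increases in the input, and pooling keeps only the strongest response in each window, which shortens the feature map passed to the next layer.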
Algorithm 1. CSFDL-based heating load forecasting process.
Step 1: Initialize $\theta_t$ on the server;
Step 2: for each $t = 1, 2, \ldots$, where t is the communication round:
    Calculate $m = C \times K$, where $C \in (0, 1)$ and K is the number of clients;
    Download $\theta_t$ to all clients;
    for all clients $k \in m$ (kept for synchronization):
        $\theta_{t+1} = \sum_{k} \frac{n_k}{n} \theta^k$;
    End for.
End for.
Step 3: In client k, $\theta^k = \theta_t$;
Step 4: for each iteration from 1 to E (E is the number of training iterations):
    for each batch $b \in B$, where B is the size of each mini-batch:
        $\theta^k \leftarrow \theta^k - \eta \nabla L_k(\theta^k; b)$, where $\eta$ is the learning rate;
    End for.
End for.
Step 5: Return $\theta^k$ to the server.
In each convolution layer, various features are extracted based on the dimensions of the kernels. The pooling layer summarizes the prominent features to create a feature map and transfers it as input to the next convolution layer. Max pooling is a well-known type of pooling operation that forms the feature map of each layer by collecting the maximum extracted features. Mathematically, the CNN in layer l and filter i operates as follows [4], [36]:
where ω and b are weights and bias which are updated according to the following equations: where t, c, m, n, x and λ are the update step, the cost function, the momentum, the training examples, the learning rate, and the regularization parameter, respectively. The output of the last convolution layer is considered as the input of the fully-connected layer. The fully-connected layer is a kind of feed forward neural network that determines the weight and bias of the extracted features. The output of this layer is used as the input of the last CNN layer, the classification or estimation layer, to make the final CNN prediction in this layer. The CNN training process is based on minimizing the training error between actual and forecasted values by updating the network parameters [34], [35]. The training error is determined by the equation: where L is the loss function and (x i , y i ) are training samples. The loss function is given by where f represents the predicted value computed by the whole CNN network as follows:

III. CASE STUDY BUILDING ENVIRONMENT
The high variability of heating load type is a feature of thermal energy measurement. Consumption of heating load in various commercial and residential buildings depends on many factors such as technical specifications of the building and daily weather parameters, each of which is effective in forecasting the heating load. This paper uses data from a building in Tomsk, Russia, that are collected on a daily basis [37]. The collected dataset includes the mass of input and output waters (heat carrier) per day, the difference between the mass of input and output waters, the average temperature of the heating carrier at the input of the heating system, the average temperature of the heating carrier at the output, the temperature difference, the amount of consumed heating in Gcal, the load type (heating or heating plus hot water), the type of the heating system (opened or closed), a system-load code (4 digits: the first digit is 1 for an opened system and 2 for a closed system; the second digit is 0 for heating and 1 for heating and hot water supply; the third and fourth digits give the number of floors (01, 02, 03, ..., 17)), the area of the building served by the heating meter, the number of building floors, the wall material, the year of building construction, the total area of the building, and the outdoor temperature. In this paper, data related to 10 floors of this building were studied. In this building, the physical and structural characteristics of each floor are completely different, so that each floor can be considered an independent building unit. In this paper, according to the structure of the Federated Learning network, data related to 7 units were selected for training, one per client, to recognize the behavioral patterns of the heating load. After completing the training process and forming a global supermodel based on data related to 7 building units, data related to the other 3 units were utilized to test the supermodel and evaluate its performance in forecasting heating load.
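The 4-digit system-load code described above can be decoded with a few lines of Python. This is a minimal sketch based only on the digit meanings stated in the text; the function name and the returned field names are hypothetical and not part of the dataset.

```python
def decode_system_load(code):
    """Split a 4-digit system-load code into its three fields."""
    digits = str(code)
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError("expected a 4-digit code, got %r" % (code,))
    system_types = {"1": "opened", "2": "closed"}
    load_types = {"0": "heating", "1": "heating and hot water supply"}
    if digits[0] not in system_types or digits[1] not in load_types:
        raise ValueError("unknown system or load digit in %r" % (code,))
    return {
        "system": system_types[digits[0]],   # first digit
        "load": load_types[digits[1]],       # second digit
        "floors": int(digits[2:]),           # third and fourth digits
    }
```

For example, `decode_system_load("2117")` describes a closed system with heating and hot water supply in a 17-floor building.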
Table 1 shows the historical characteristics, the number of training, and the test data related to the floors of the building that was used to train and test the network.
The performance of Federated Deep Learning is perfect when the data has the property of Non-independent and identically distributed (Non-IID). In this work, the utilized data is related to a real dataset from a building in Tomsk, Russia, which is collected daily and satisfies the condition of Non-IID. Since the data was recorded from one building on 10 different floors, the dataset depends (Non-independent) on the following factors: walls materials, year of construction of the building and outdoor temperature. In addition, the data are not identically distributed because the data were recorded from 10 different customers with different characteristics in heating energy consumption. For further clarification, Fig. 5 shows the distribution of the data on main features of the dataset in the form of boxplots. This figure justifies the different distribution of the data on each client.

IV. HEATING LOAD DEMAND FORECASTING RESULT
In this paper, building's heating load forecasting is done with the aim of presenting a generalized global forecasting model and maintaining data privacy. This is based on the Federated Learning technique and using the CNN algorithm. The designed federated network consists of 7 clients. In each client, the training process for extracting features from the data related to each building is performed by the CNN technique. The behavioral patterns recognized in each client form the global forecasting supermodel that is then used to perform the forecasting process in other new and unknown buildings.
Network performance in each of the training and test stages was measured based on various performance evaluation metrics such as correlation coefficient ($R^2$), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The values for each of these metrics represent a specific definition of the performance of the method used. Thus, the highest value of $R^2$ indicates the high accuracy of the network, while the other indicators are related to the amount of forecasting error and in the best possible case should have the lowest value, close to zero. Each of the mentioned metrics is calculated based on the following equations: where $X_i$ and $Y_i$ represent the real heating load values and forecasted values, respectively. $\bar{X}$ and $\bar{Y}$ represent the mean of real heating load values and the mean of forecasted heating load values, respectively. By designing the CNN network structure in each client, the training process was performed based on the data specified in Table 1. Training time is evaluated as an important indicator in the use of learning-based techniques. Accordingly, in this paper, the processing and training time of each of the techniques used has been calculated and evaluated. The CPU runtime for the federated learning process to train and test the model of 7 clients and to fully test the performance of the server model on 3 clients is 16 minutes and 43 seconds. The training and test times of the SVR, GRNN, LSTM, and Bi-LSTM methods for the data related to each client (each floor) were 13 minutes and 08 seconds, 11 minutes and 37 seconds, 8 minutes and 19 seconds, and 6 minutes and 51 seconds, respectively. This computation time shows the superiority of the proposed Federated Learning procedure in terms of processing time as well as its efficiency and robustness.
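The four metrics named above can be computed as in the following sketch, which uses their standard textbook definitions. Treating $R^2$ as the squared Pearson correlation between real and forecasted values is an assumption, suggested by the text's use of both means $\bar{X}$ and $\bar{Y}$; the function names are hypothetical.

```python
import math


def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(x - y) for x, y in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (undefined if an actual value is 0)."""
    return 100.0 * sum(abs((x - y) / x) for x, y in zip(actual, forecast)) / len(actual)


def r_squared(actual, forecast):
    """Squared Pearson correlation between actual and forecasted values (assumed definition)."""
    mean_x = sum(actual) / len(actual)
    mean_y = sum(forecast) / len(forecast)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, forecast))
    var_x = sum((x - mean_x) ** 2 for x in actual)
    var_y = sum((y - mean_y) ** 2 for y in forecast)
    return cov * cov / (var_x * var_y)
```

Note that a correlation-based $R^2$ can equal 1 even when the forecasts are biased or scaled, which is why it is reported together with the error metrics.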
It should be noted that the federated learning algorithm converges after 20 interactions. Fig. 6 illustrates the convergence process of the evaluation indicators on the test dataset of a sample client.
The results of CNN feature extraction and performance on each client are presented in Table 2. Evaluation of the results presented in Table 2 shows that the CNN technique was able to learn the behavioral patterns of the heating load input variables well in each client during the training process. This process allows the trained network to extract the relationship between input variables to forecast the heating load and use this modeling to test the network and forecast new data. It can be seen that the trained networks in each client were tested with the highest accuracy and the lowest error values. After testing each of the trained networks and confirming the training and test process in each client, a global supermodel containing the features extracted in each client is formed. This supermodel is a kind of estimation toolbox based on the behavioral patterns of the data examined in each client. The aim is that this supermodel be able to forecast the heating load for buildings that have no training data, had no effect on the training process of any client, and are considered new and unknown buildings. To achieve this, and to evaluate the performance of this global supermodel, input data from three buildings (as stated in Table 1), which are new and unknown buildings, are now used as supermodel input. Fig. 7 compares the heating load predictions made for data on new and unknown buildings by the global supermodel with actual heating load values. In addition, this figure presents a comparative approach to the forecasted and actual values of the heating load for different seasons of the year. The results show that the global supermodel has been able to model the behavioral pattern of the heating load of new buildings with acceptable accuracy. Table 3 shows the results of the forecasts made for the new buildings by the supermodel with different evaluation metrics.
The results indicate that the designed supermodel was able to test and forecast the heating load for each new client who had no role in the training process and the formation of the supermodel, with acceptable accuracy coefficients and low forecasting errors. The performance evaluation of the global supermodel and its generalizability for various residential and commercial buildings in forecasting the heating load was performed with acceptable results. As stated in Section III, the heating load forecasting in this paper was based on various input variables, each of which is effective in determining the amount of heating load. Increasing the number of modeling input parameters and exploring the correlation between them makes it difficult to determine the output value. Thus, the network used should model the relationship between the input variables and their effect on the output value. In this paper, to emphasize the high ability of the proposed method, various types of parameters affecting the heating load values, such as technical specifications of the building and climatic characteristics were used. In order to show the effect of each of the considered parameters in determining the amount of heating load of the building, a sensitivity analysis and correlation analysis were performed between them. Fig. 8 shows the results of the sensitivity analysis and the correlation between the input variables in determining the heating load values. The results presented in Fig. 8 show that the input variables have different correlations with each other and some of them can increase the forecasting error in addition to increasing the forecast accuracy. Therefore, choosing a method that can model the correlation between input variables in a way that increases the accuracy of forecasting and determining the value of the output parameter, is one of the most important issues in choosing the ideal method for performing forecasting approaches.
In the last part of the paper, in order to present a comparative approach, four common techniques, namely SVR, GRNN, LSTM, and Bi-LSTM, are used to predict the heating load of the building. Each of these techniques has been used in a variety of applications related to power and energy systems and is briefly described below.
SVR is the regression version of the support vector machine and is mainly used for regression, forecasting, linear mapping, and function approximation, with only a few other applications [38]. The SVR has several versions, but the classical model (ε-SVR) is the one mainly considered in science and engineering and has been employed in this work as well. The structural model and mathematical formulation of this technique are presented in detail in [39].
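For orientation, the standard textbook primal problem of ε-SVR is reproduced below. The notation is an assumption of this sketch and may differ from that of [39]: $w$ and $b$ are the weight vector and bias, $\varphi$ is the feature map, $C > 0$ is the regularization constant, $\varepsilon$ is the width of the insensitive tube, and $\xi_i$, $\xi_i^{*}$ are slack variables for the $N$ training pairs $(x_i, y_i)$.

```latex
\begin{aligned}
\min_{w,\, b,\, \xi,\, \xi^{*}} \quad
  & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^{*} \right) \\
\text{s.t.} \quad
  & y_i - \langle w, \varphi(x_i) \rangle - b \le \varepsilon + \xi_i, \\
  & \langle w, \varphi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^{*}, \\
  & \xi_i \ge 0, \quad \xi_i^{*} \ge 0, \qquad i = 1, \dots, N .
\end{aligned}
```

Errors smaller than $\varepsilon$ incur no penalty, which is what distinguishes ε-SVR from ordinary least-squares regression.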
GRNN is a powerful machine-learning-based regression technique, mainly used for regression, linear and nonlinear modeling, and classification applications [40]. The GRNN is based on a completely parallel structure with a very high training speed and is usually defined as a type of radial-basis function network. Modeling of nonlinear relationships between input data, even for very small volumes of data, is one of the most important features of this technique. The structural architecture and mathematical modeling of GRNN are fully presented in [41].
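The radial-basis character of GRNN can be illustrated with a minimal sketch of its prediction rule, which is a Gaussian-kernel weighted average of the training targets. The function name `grnn_predict`, the smoothing parameter value, and the toy data are assumptions for illustration; the configuration used in the experiments is not given here.

```python
from math import exp
from typing import List, Sequence


def grnn_predict(x_train: Sequence[Sequence[float]],
                 y_train: Sequence[float],
                 x_query: Sequence[float],
                 sigma: float = 1.0) -> float:
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if len(x_train) != len(y_train) or not x_train:
        raise ValueError("need one target per training sample")
    weights: List[float] = []
    for x in x_train:
        # Squared Euclidean distance between the query and a stored pattern.
        d2 = sum((a - b) ** 2 for a, b in zip(x, x_query))
        weights.append(exp(-d2 / (2.0 * sigma ** 2)))
    denom = sum(weights)
    if denom == 0.0:
        raise ValueError("query is too far from all training points for this sigma")
    return sum(w * y for w, y in zip(weights, y_train)) / denom


# Toy example (assumed values): one input feature, three stored samples.
x_train = [[0.0], [1.0], [2.0]]
y_train = [10.0, 20.0, 30.0]
pred_mid = grnn_predict(x_train, y_train, [1.0], sigma=0.5)
pred_edge = grnn_predict(x_train, y_train, [0.0], sigma=0.1)
print(pred_mid, pred_edge)
```

There is no iterative weight update: "training" amounts to storing the samples, which is why GRNN trains quickly, and `sigma` is the only parameter to tune.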
LSTM is one of the most well-known deep learning techniques and was first introduced in 1997 to improve the performance of recurrent neural networks (RNNs) [38]. It mitigates the vanishing and exploding gradient problems of RNNs and has become a powerful tool for regression, prediction, linear and nonlinear modeling, and classification applications. High-dimensional data processing and high training speed are the main advantages of LSTM. Reference [42] introduces the complete structural schematic and mathematical formulation of LSTM.
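For orientation, the widely used form of the LSTM cell (with a forget gate) is summarized below. The notation is an assumption of this sketch and may differ from that of [42]: $x_t$ is the input, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, $\odot$ the element-wise product, and $W_{\ast}$, $U_{\ast}$, $b_{\ast}$ the learned parameters.

```latex
\begin{aligned}
f_t &= \sigma\left( W_f x_t + U_f h_{t-1} + b_f \right) \\
i_t &= \sigma\left( W_i x_t + U_i h_{t-1} + b_i \right) \\
o_t &= \sigma\left( W_o x_t + U_o h_{t-1} + b_o \right) \\
\tilde{c}_t &= \tanh\left( W_c x_t + U_c h_{t-1} + b_c \right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left( c_t \right)
\end{aligned}
```

The additive update of $c_t$ is what lets gradients propagate over long time spans without vanishing as quickly as in a plain RNN.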
Bi-LSTM is a well-known deep learning technique that was proposed to improve the performance of LSTM [43]. Information in this network is propagated in two directions, so that the forecast can draw on context from both directions of the input sequence. The Bi-LSTM has proved effective for time-series modeling problems and has become a powerful tool for them. The layer-by-layer structure and mathematical modeling of Bi-LSTM are presented in [44].
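The bidirectional processing can be written compactly as below. The notation is an assumption of this sketch and may differ from that of [44]: $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are the hidden states of two independent LSTM layers that read the input sequence $x_1, \dots, x_T$ forwards and backwards, and $[\,\cdot\,;\,\cdot\,]$ denotes concatenation.

```latex
\begin{aligned}
\overrightarrow{h}_t &= \mathrm{LSTM}_{\mathrm{fwd}}\left( x_t,\ \overrightarrow{h}_{t-1} \right), \\
\overleftarrow{h}_t  &= \mathrm{LSTM}_{\mathrm{bwd}}\left( x_t,\ \overleftarrow{h}_{t+1} \right), \\
h_t &= \left[\, \overrightarrow{h}_t \,;\, \overleftarrow{h}_t \,\right], \qquad t = 1, \dots, T .
\end{aligned}
```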
Each of the introduced techniques is applied to the data of each client of the federated network, and the results are compared with those of the proposed CSFDL technique. Table 4 compares the results of SVR, GRNN, LSTM, and Bi-LSTM in forecasting the heating load of buildings. These results show that each of the mentioned techniques is able, to the extent of its capabilities, to forecast the heating load of the buildings in clients 1 to 7. However, none of the SVR, GRNN, LSTM, and Bi-LSTM techniques performs well in forecasting the heating load associated with the data of clients 8, 9, and 10, which indicates the lack of generalizability of these techniques. In contrast, the results presented in Table 3 show that the proposed CSFDL approach, owing to its generalizability, is able to provide an acceptable forecast for buildings that do not have any training data. It should be noted that the proposed CSFDL procedure can be used as a powerful tool for forecasting tasks in power and energy systems where data privacy is also of great importance.
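A comparison of this kind rests on error and accuracy metrics computed between forecast and actual values. The following is a minimal sketch that assumes the root-mean-square error (RMSE), the mean absolute error (MAE), and the Pearson correlation coefficient (R); the exact metric set used in Tables 3 and 4 is not listed in this section, and the toy values are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence


def forecast_metrics(actual: Sequence[float],
                     forecast: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE and Pearson R between actual and forecast values."""
    n = len(actual)
    if n != len(forecast) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    errors = [f - a for a, f in zip(actual, forecast)]
    rmse = sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    ma, mf = sum(actual) / n, sum(forecast) / n
    cov = sum((a - ma) * (f - mf) for a, f in zip(actual, forecast))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_f = sum((f - mf) ** 2 for f in forecast)
    if var_a == 0 or var_f == 0:
        raise ValueError("correlation is undefined for a constant series")
    return {"rmse": rmse, "mae": mae, "r": cov / sqrt(var_a * var_f)}


# Toy example (assumed values), not results from the paper.
actual = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 39.0]
metrics = forecast_metrics(actual, forecast)
print({k: round(v, 4) for k, v in metrics.items()})
```

RMSE and MAE are expressed in the units of the heating load, whereas R is unitless, so R is the more convenient figure when comparing clients whose loads differ in scale.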

V. CONCLUSION
Forecasting of energy demand in buildings, especially heating load demand, which is the main part of energy consumption in buildings, is very important for improving the energy efficiency of buildings and for energy management tasks. Various forecasting approaches based on artificial intelligence methods have been presented in the literature to accurately predict the energy demand of buildings. In this paper, a novel forecasting model has been presented to cover critical issues that were not addressed in previously published works.
Many machine learning and deep learning techniques presented for energy demand forecasting rely on the assumption that training and testing data have the same distribution. Indeed, when a single data set is split into a training part and a testing part, this statistical consistency is satisfied. However, in real applications, new input data may have a different distribution than the data on which the model was trained. The presented CSFDL method addresses this situation: the global server, acting as a supermodel, can accurately forecast data sets with different distributions.
Maintaining the confidentiality of the collected data is another concern, since any breach allows an external attacker to modify the data, causing the energy supplier to operate on the basis of an incorrect demand forecast. The proposed method applies edge computing techniques to avoid transmitting data to the central server; instead, the data are analyzed locally by a federation of multiple participating machines, which preserves the security of the local parties.
The accuracy of the presented CSFDL method was verified by the following procedure. First, the heating load demand data collected from 7 clients were divided into training and testing parts with a ratio of 80-20. Each local client has its own CNN, which downloads the proposed model from the main server in the cloud. The primary evaluation of the server was performed on the 20% test portion of each client's data, where the server performed well for all clients, with low forecasting error and a correlation coefficient of 99.00%. Second, the performance of the generated supermodel was tested on 3 clients whose distribution was unfamiliar to the server and which were considered out-of-sample data. It was observed that the global supermodel was able to predict the heating load demand with correlation coefficients of 98.00%, 93.00% and 70.00%. Finally, four conventional techniques based on artificial intelligence, namely SVR, GRNN, LSTM, and Bi-LSTM, were applied to the data used in each client in order to forecast the heating load, and their results were compared with those of the proposed CSFDL method. It was observed that, because no training data were available in clients 8, 9, and 10, the SVR, GRNN, LSTM, and Bi-LSTM techniques were not able to make predictions for these buildings, whereas the proposed CSFDL technique was able to produce forecasts for these clients owing to its generalizability. Overall, the presented results show the high capability of the proposed CSFDL method to produce acceptable forecasts while preserving data privacy and reducing the dependence of the model on the training data.
HAMED MOAYYED received the M.Sc. degree in mathematics from the K. N. Toosi University of Technology, Tehran, Iran, and the Ph.D. degree in optoelectronics from the University of Porto, Portugal, in 2016. Currently, he is a Postdoctoral Researcher with the Faculty of Engineering (FEUP), University of Porto. His main research interests include smart energy systems, power systems, state estimation, and power system cyber security. He is experienced in mathematical modeling, computer simulations, optimization, numerical analysis, software development, and deep learning. He is currently doing research on the applications of deep neural networks in modern power systems.