Article

Federated Learning-Based Multi-Energy Load Forecasting Method Using CNN-Attention-LSTM Model

Ge Zhang, Songyang Zhu and Xiaoqing Bai
1 Key Laboratory of Power System Optimization and Energy Saving Technology, Guangxi University, Nanning 530004, China
2 School of Electrical Engineering, Guangxi University, Nanning 530004, China
* Author to whom correspondence should be addressed.
Sustainability 2022, 14(19), 12843; https://doi.org/10.3390/su141912843
Submission received: 13 August 2022 / Revised: 17 September 2022 / Accepted: 27 September 2022 / Published: 8 October 2022
(This article belongs to the Special Issue Sustainable Power Systems and Optimization)

Abstract
Integrated Energy Microgrid (IEM) has emerged as a critical energy utilization mechanism for alleviating environmental and economic pressures. As a part of demand-side energy prediction, multi-energy load forecasting is a vital precondition for the planning and operation scheduling of IEM. In order to increase data diversity and improve model generalization while protecting data privacy, this paper proposes a method that uses the CNN-Attention-LSTM model based on federated learning to forecast the multi-energy load of IEMs. CNN-Attention-LSTM is the global model for extracting features. Federated learning (FL) helps IEMs to train a forecasting model in a distributed manner without sharing local data. This paper examines the individual, central, and federated models with four federated learning strategies (FedAvg, FedAdagrad, FedYogi, and FedAdam). Moreover, considering that FL uses communication technology, the impact of false data injection attacks (FDIA) is also investigated. The results show that federated models can achieve an accuracy comparable to the central model while having a higher precision than individual models, and FedAdagrad has the best prediction performance. Furthermore, FedAdagrad can maintain stability when attacked by false data injection.

1. Introduction

The uneven distribution and depletion of non-renewable energy, together with severe environmental pollution, have challenged the energy industry [1]. The Integrated Energy System (IES) takes electric energy as its core and manages energy systems on a unified basis, considering diversified energy demand [2]. It is a development model for the joint dispatch of multiple energy types that significantly improves energy utilization. The small-scale IES, the Integrated Energy Microgrid (IEM), is easier to build and more common than the large-scale IES. Due to the complex energy coupling of the IEM, applying traditional mathematical methods to it is challenging.
Artificial intelligence (AI) is a common solution for complex problems and has been adopted in various fields, including medicine, image processing, and natural language processing. AI relies on large data volumes with enough variation to be trained adequately [3]. Given the short development time available for IEMs, a single IEM may face low data diversity and poor data quality. Hence, the central model [4] is widely adopted to address this limitation. Nonetheless, the central model requires huge storage capacity and suffers delays in data computation and communication [5]. Additionally, in the central model, each client needs to share its data with the server [6], which creates security risks in communication and storage. Thus, clients may be restricted by legal constraints, or unwilling to share data due to privacy and security concerns and self-interest [7]. In such cases, it is difficult for the server to collect enough data to train a model with high accuracy. Therefore, finding a method that guarantees forecasting accuracy while avoiding privacy leakage is vital. Federated learning (FL) was introduced by Google in 2017 [8]. It is a distributed machine learning mechanism that enables clients to cooperate in training a global model without sharing data. FL protects client data privacy and is more scalable than the central model [9]. It is a new method of privacy-preserving data sharing [10].
Although FL does not transmit raw data, it still needs to transmit model parameters. A false data injection attack (FDIA) is one in which the attacker tampers with the data in such a way that the errors cannot be detected during transmission [11]. During parameter transfer, FL is therefore at risk of encountering FDIA.
Consequently, this paper focuses on a method that enables IEMs to train a multi-energy load forecasting model in a distributed way, with no need to share data with the server.
The key contributions can be summarized below:
(1)
We establish the CNN-Attention-LSTM model based on FL to forecast multi-energy load.
(2)
We simulate and evaluate the prediction performance of individual, central, and federated models with four strategies. The prediction accuracy of federated models is comparable to or even better than that of the central model.
(3)
We simulate and compare the performance of four types of FL (FedAvg, FedAdagrad, FedYogi, and FedAdam) under FDIA. The experimental results indicate that FedAdagrad forecasts better than the others, both under regular operation and under FDIA.
The rest of the paper is organized as follows: Section 2 describes the related works. The methodology and framework are introduced in Section 3. Section 4 explains the experiments and corresponding results, while Section 5 concludes the paper.

2. Related Work

2.1. Multi-Energy Load Forecasting

Recently, various neural networks and their combined forms have been applied to multi-energy load prediction. Liu et al. [12] adopted a deep Long Short-Term Memory network (LSTM) to obtain accurate forecasts. Zhu et al. [13] added a Convolutional Neural Network (CNN) to an LSTM to forecast the multi-energy load of combined cooling, heating, and power (CCHP) systems. Instead of a CNN, Wang et al. [14] combined a gradient boosting decision tree with an LSTM based on an encoder-decoder model. Tan et al. [15] proposed a model combining multi-task learning and a least-squares support vector machine to forecast the load. On this basis, Yan et al. [16] optimized the parameter settings of the model with particle swarm optimization (PSO). Li et al. [17] and Wang et al. [18] predicted load using stacked auto-encoders (SAEs). In addition to network-specific studies, a dual-objective operation optimization model considering an integrated demand response (IDR) mechanism was exploited by Wang et al. [19]. Li et al. [20] proposed a multi-energy load forecasting method based on a neural network model and transfer learning. Guo et al. [21] used bi-directional LSTM multi-task learning to forecast load. To realize a multi-task architecture and perform joint prediction of the multi-energy load, Wang et al. [22] proposed a Multiple-Decoder Transformer model. Feng et al. [23] took weather and calendar effects into consideration to forecast electricity consumption. Hu et al. [24] adopted a temporal convolutional network (TCN) to handle the timing and non-timing characteristics of the data.
FL is introduced to further improve data diversity and model performance while ensuring that client privacy is not leaked, so that an IEM can plan better and cooperate with multiple IEMs.

2.2. Federated Learning in Energy Systems

In recent years, FL has been applied in several areas, such as banking [25], transportation [26,27], communications [4,8,28], and others [7,29,30,31]. In energy systems, the application of FL has been the subject of several investigations, and the existing research focuses on smart grids.
Taïk et al. [3] tested the performance of LSTM with Federated Averaging (FedAvg) on smart grid load forecasting. The results verified that the models could reduce data security risks and networking load while ensuring accuracy. Building on LSTM-FedAvg, Savi et al. [32] added K-means to the method. The results showed that the method had forecasting performance comparable to the state-of-the-art method and outperformed it in training time and privacy awareness. Fekri et al. [33] also built a model that combined LSTM with FL, and examined two federated learning strategies (FedSGD and FedAvg). The studies confirmed that FedAvg achieves better prediction performance than FedSGD and single centralized models.

3. Methodology

3.1. Federated Learning

As a machine learning framework, federated learning can effectively help multiple clients train a model while meeting requirements for client privacy protection, data security, and regulation. FL consists of two parts: a server and clients. The FL process usually consists of several rounds, and each round includes three steps. Firstly, clients compute and send encrypted gradients to the server. Next, the server aggregates the clients' parameters and sends new parameters back. Finally, clients update their models with the new parameters [34].
Figure 1 shows the architecture of the federated learning mechanism:
FedOpt in Figure 1 means an adaptive optimization method. Algorithm 1 provides pseudocode for the FedOpt framework [35].
Algorithm 1: FedOpt framework
1: Initialize the global model $x^0$
2: for each round $t = 1, 2, \dots$ do
3:   The server sends $x^t$ to all clients
4:   Each client performs $E$ epochs of training at local
5:   Each client obtains an individual model $x_k^t$ and sends $\Delta_k^t = x^t - x_k^t$ to the server
6:   The server computes a pseudo-gradient $\Delta^t = \sum_{k \in K} p_k \Delta_k^t \big/ \sum_{k \in K} p_k$ and updates its model via $x^{t+1} = \mathrm{SERVEROPT}(x^t, \eta_s, \Delta^t)$
where $t$ represents the $t$-th interaction; $x^t$ represents the model of the server after the $t$-th interaction; $E$ means the number of local training epochs; $K$ represents the set of users in the system, $k$ refers to the $k$-th client, and $p_k$ is the aggregation weight of client $k$; $x_k^t$ is the model after local training, and $\Delta_k^t$ is the difference between the global and local models; $\eta_s$ is the server learning rate; SERVEROPT() refers to the method used by the server to aggregate parameters. The global model is stored on the server, and the local model refers to the model formed by the client after local training. The server updates the global model after receiving the parameters of the local model.
Algorithm 1 generalizes several federated learning algorithms, including FedAvg, FedAdagrad, FedYogi, and FedAdam. These are the cases where SERVEROPT is SGD [8], Adagrad [36], Yogi [34], and Adam [37], respectively.
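To make the server-side update concrete, the following is a minimal NumPy sketch of one round of Algorithm 1 with weighted aggregation and an Adagrad server optimizer (the FedAdagrad case). It operates on flattened parameter vectors; the step size eta_s and the stability constant tau are illustrative assumptions, not the paper's exact values.

```python
import numpy as np

def server_round(x_t, client_deltas, client_weights, accumulator,
                 eta_s=0.1, tau=1e-3):
    """One FedOpt server round (line 6 of Algorithm 1).

    x_t            -- global model parameters after round t (flat array)
    client_deltas  -- list of Delta_k = x_t - x_k from the clients
    client_weights -- aggregation weights p_k (e.g., client sample counts)
    accumulator    -- running sum of squared pseudo-gradients (Adagrad state)
    """
    p = np.asarray(client_weights, dtype=float)
    # Pseudo-gradient: weighted average of the client deltas.
    delta_t = sum(pk * dk for pk, dk in zip(p, client_deltas)) / p.sum()

    # SERVEROPT = Adagrad: accumulate squared pseudo-gradients and take an
    # element-wise scaled step. Replacing these two lines with
    # x_t - eta_s * delta_t recovers FedAvg (SERVEROPT = SGD).
    accumulator += delta_t ** 2
    x_next = x_t - eta_s * delta_t / (np.sqrt(accumulator) + tau)
    return x_next, accumulator
```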
The FL mechanism was adopted to simultaneously improve model generalization and data diversity and protect privacy.

3.2. CNN-Attention-LSTM Model

In this paper, we build a CNN-Attention-LSTM as the global model of FL. The CNN extracts features and reduces the dimensionality of the data, the LSTM extracts temporal features, and the attention mechanism helps the network extract the most important information under limited resources, maximizing the effective information the model extracts from the data [38]. The combination of CNN, attention, and LSTM thus achieves dimensionality reduction, temporal feature extraction, and increased training efficiency.
The structure of the CNN-Attention-LSTM is presented in Figure 2:
LSTM1 and LSTM2 represent LSTM layers. Input represents historical load data and related factor data. Output1 and Output2 represent the multi-energy load. The data were converted into (step, features) form before input, and the step was set to 24.
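As an illustration of this architecture, the following is a minimal Keras sketch. The filter counts, kernel size, LSTM widths, and feature count are illustrative assumptions (the paper does not list them here); only the 24-step input window and the two output heads follow the text above.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

STEPS, FEATURES = 24, 8  # 24-step window per the text; feature count assumed

inputs = layers.Input(shape=(STEPS, FEATURES))

# CNN: extract local features and reduce data dimensionality.
x = layers.Conv1D(64, kernel_size=3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling1D(pool_size=2)(x)

# Attention: re-weight time steps so the most informative ones dominate.
x = layers.Attention()([x, x])

# Stacked LSTMs (LSTM1 and LSTM2 in Figure 2): capture temporal features.
x = layers.LSTM(64, return_sequences=True)(x)
x = layers.LSTM(32)(x)

# Two heads for the multi-energy load (Output1 and Output2).
cooling = layers.Dense(1, name="cooling")(x)
electricity = layers.Dense(1, name="electricity")(x)

model = Model(inputs, [cooling, electricity])
model.compile(optimizer="adam", loss="mse")
```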

4. Simulation

The simulation is based on the Downtown, Polytechnic, Tempe, and West campuses of Arizona State University [39]. The data set contains the cooling and electric load data from 1 January 2019 to 31 December 2019, at a resolution of one hour. The environmental data were taken from the weather station closest to each campus and downloaded from the National Centers for Environmental Information [40]. They include temperature, dew point, humidity, air pressure, and wind speed, and are used to study the correlation between environmental factors and the energy consumption behavior of users.

4.1. Federated Learning-Based Multi-Energy Load Forecasting Framework

The main steps of the framework are as follows.
Step 1: Preparations on the client and server sides.
Clients: Firstly, the missing and abnormal values in the campuses' data sets were completed or replaced. Secondly, the characteristics with high correlation were selected as input data. Thirdly, the input data were partitioned into three sets for training.
Server: Initialize the model weights.
Step 2: Campuses received the model parameters delivered by the server and trained the local model on their own data sets.
Step 3: Campuses transmitted the updated parameters back to the server.
Step 4: The trained models' parameters differed because the campuses' datasets differed. The server aggregated the collected parameters to obtain new parameters.
Step 5: The new parameters were sent back to the campuses.
Step 6: Campuses updated their local models with the new parameters.
The framework is shown in Figure 3.
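A schematic Python driver for Steps 1-6 might look as follows. It is a sketch under simplifying assumptions: a plain-SGD server step (the FedAvg case) stands in for the FedOpt variants sketched in Section 3, and the campus objects with x_train/y_train attributes are hypothetical data holders.

```python
def federated_training(build_model, campuses, rounds=20, local_epochs=5):
    global_model = build_model()                       # Step 1: server init
    for _ in range(rounds):
        g = global_model.get_weights()
        deltas, sizes = [], []
        for campus in campuses:
            local = build_model()
            local.set_weights(g)                       # Step 2: receive params
            local.fit(campus.x_train, campus.y_train,
                      epochs=local_epochs, verbose=0)  # Step 2: local training
            deltas.append([gw - lw for gw, lw in
                           zip(g, local.get_weights())])  # Step 3: send back
            sizes.append(len(campus.x_train))
        # Step 4: aggregate into a weighted pseudo-gradient.
        total = sum(sizes)
        agg = [sum(s * d[i] for s, d in zip(sizes, deltas)) / total
               for i in range(len(g))]
        # Steps 5-6: broadcast; with this plain step the result is FedAvg.
        global_model.set_weights([gw - a for gw, a in zip(g, agg)])
    return global_model
```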

4.2. Data Preprocessing

All four campuses require data preprocessing. Data preprocessing consists of data cleaning, standardization, and input data selection.

4.2.1. Data Cleaning

Because the load is determined by users' needs and is closely related to the economic operation of society, the electricity and cooling load data are erratic, fluctuating, and periodic under the comprehensive influence of many factors. Furthermore, there were instances where historical data were lost or preserved improperly. If the raw data were used directly, the training process would be affected by many interference factors, which would ultimately affect the effectiveness of the model. To ensure data quality, the missing and outlier values were replaced by the average of the previous and following values of the relevant points [38].

4.2.2. Data Normalization

Without normalization, the calculation focuses on features with larger numeric ranges, so the actual effect of each feature in the overall mapping would be distorted by the broad span of the data. Besides, data normalization accelerates the gradient descent of the parameters, which improves the convergence speed of the model.
Data normalization converts the original data to the interval [0, 1]. After normalization, variables can be analyzed and evaluated in parallel. During training, normalization also prevented gradient explosion and improved network training effectiveness.
The normalization formula is shown in Equation (1):
$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{1}$$
where $X$ is the original data, $X'$ is the normalized value, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum values, respectively.
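A minimal pandas sketch of this preprocessing, covering both the neighbour-average replacement of Section 4.2.1 and the min-max normalization of Equation (1), is given below; the 3-sigma outlier rule is an assumption, since the paper does not state its outlier criterion.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean missing/outlier points, then min-max normalize each column."""
    # Mark outliers as missing (3-sigma rule; an assumed criterion).
    z = (df - df.mean()) / df.std()
    df = df.mask(z.abs() > 3)

    # Replace each missing point with the average of the previous and
    # following valid values (Section 4.2.1).
    df = (df.ffill() + df.bfill()) / 2

    # Min-max normalization to [0, 1], Equation (1).
    return (df - df.min()) / (df.max() - df.min())
```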

4.2.3. Selection of Input Data

Taking Downtown as an example, the Spearman rank correlation coefficients between the loads and the environmental data were calculated, and the thermal diagram in Figure 4 was obtained. A higher correlation coefficient indicates a more significant impact, and the main factors are those whose correlation coefficient is higher than 0.6. Day type does not participate in the calculation of this coefficient; it only represents a category and has no numerical meaning. However, day type is still treated as a main factor, since historical experience has proven that it significantly influences load.
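A short pandas sketch of this selection step follows; the column names for the two loads and the day-type flag are hypothetical placeholders for the dataset's actual labels.

```python
import pandas as pd

# `data`: hourly loads plus candidate environmental factors for one campus.
corr = data.corr(method="spearman")

# Correlation of every candidate factor with the two load targets.
candidates = corr.drop(index=["cooling", "electricity"])
strong = candidates[["cooling", "electricity"]].abs() > 0.6

# Main factors: |rho| > 0.6 with either load, plus the day-type flag,
# which is kept on historical-experience grounds rather than by rho.
features = sorted(set(candidates.index[strong.any(axis=1)]) | {"day_type"})
```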

4.3. Evaluation Criteria

Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE) were utilized to evaluate the effectiveness of the forecasting model, as follows:
$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \tag{2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \tag{3}$$
where $N$ represents the number of samples, $y_i$ the actual load values, and $\hat{y}_i$ the prediction results. Both relate directly to accuracy: the smaller they are, the higher the accuracy.
Besides, Weighted Mean Accuracy (WMA) was adopted to comprehensively evaluate the network’s effect by weighting the forecasting precision based on the degree of importance of tasks. A higher WMA means a better overall forecast. The relevant equations are shown in (4) and (5):
$$\mathrm{MA} = 1 - \mathrm{MAPE} \tag{4}$$
$$\mathrm{WMA} = \alpha_1 \mathrm{MA}_1 + \alpha_2 \mathrm{MA}_2 \tag{5}$$
where $\mathrm{MA}$ means the average accuracy, $\alpha_1$ denotes the weight of the cooling load prediction task, and $\alpha_2$ represents the weight of the electric load prediction task.
Based on the load capacity ratio, Table 1 describes the settings of α 1 and α 2 of campuses.
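The three criteria are straightforward to compute; the sketch below restates Equations (2)-(5) in NumPy, and the example at the end reproduces the Downtown FedAdagrad result reported in Section 4.5 (WMA = 91.84%) from the Table 1 weights.

```python
import numpy as np

def mape(y, y_hat):
    """Equation (2), in percent."""
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def rmse(y, y_hat):
    """Equation (3)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def wma(mape_cooling, mape_electric, a1, a2):
    """Equations (4) and (5): weighted mean accuracy from task MAPEs (%)."""
    return a1 * (1 - mape_cooling / 100) + a2 * (1 - mape_electric / 100)

# Downtown weights (Table 1): a1 = 0.4, a2 = 0.6.
print(wma(11.74, 5.78, 0.4, 0.6))  # ~0.9184, i.e., WMA = 91.84%
```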

4.4. Hyperparameters

Appropriate hyperparameters are beneficial to the accuracy of forecasting. Hyperparameter optimization mainly consists of finding suitable hyperparameters through repeated training so as to achieve maximum performance in a reasonable time. The selection of network hyperparameters is the most time-consuming step in multi-energy load forecasting [41]. Bayesian optimization, an efficient parameter tuning method, has been used for hyperparameter tuning [42].
Bayesian optimization is characterized by using information from previously searched points to determine the next search point. After generating a candidate solution set, the next possible extreme point is determined from these points and added to the set. This step is repeated until the iterations end. Pseudocode for Bayesian optimization is shown in Algorithm 2 [41]:
Algorithm 2: Bayesian optimization
1: Input: function to be maximized $g$, parameter space $X$, initial observations $M$
2: for each round $n = 1, 2, \dots$ do
3:   Select $x_n \leftarrow \arg\max_{x \in X} \alpha(x \mid GP(M))$
4:   Evaluate $y_n = g(x_n)$
5:   Add $(x_n, y_n)$ to $M$
6:   Update the GP model
7: end for
8: Output: $x^* \leftarrow \arg\max_{x \in X} g(x)$
where GP represents the Gaussian process surrogate and $\alpha$ the acquisition function that scores candidate points. In the simulation, the function $g$ is the opposite of the prediction error, so maximizing it minimizes the error. Since the structural parameters of the global model must be consistent across clients, only the learning rate was optimized by Bayesian optimization; that is, the parameter space $X$ is the selection range of the learning rate, set between 0.0001 and 0.1. The learning rate of Downtown was set to 0.001, that of Polytechnic to 0.003, and those of Tempe and West to 0.005 and 0.009, respectively.
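One way to run such a search in practice is with scikit-optimize's gp_minimize, as in the sketch below; train_and_validate is a hypothetical helper that trains the local model with a given learning rate and returns its validation error, and the call count is an assumption.

```python
from skopt import gp_minimize
from skopt.space import Real

def objective(params):
    (lr,) = params
    # train_and_validate is a placeholder: train the CNN-Attention-LSTM
    # with learning rate `lr` and return the validation error. Minimizing
    # the error equals maximizing g, the opposite of the error.
    return train_and_validate(lr)

result = gp_minimize(
    objective,
    [Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate")],
    n_calls=25,       # number of search rounds (assumed)
    random_state=0,
)
best_lr = result.x[0]
```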

4.5. Comparison of Forecasting Results

Six forecasting models are simulated as follows:
  • Individual model: each campus uses individual data for model training separately
  • Central model: a server centralizes the data of the campuses for model training
  • FedAvg: server and campuses use FedAvg for model training
  • FedAdagrad: server and campuses use FedAdagrad for model training
  • FedYogi: server and campuses use FedYogi for model training
  • FedAdam: server and campuses use FedAdam for model training
In the result figures, Actual refers to the actual value of the load.
The structural parameters of the individual models and of the global model in the federated settings are identical. Due to the large amount of data to be processed in the central model, the number of neurons in each of its layers is doubled. The number of epochs is determined according to convergence, which the loss curve reflects. In federated learning, the number of server-client interactions (rounds) matters more than the number of epochs [33]. The federated models were set to perform five local training epochs per round to extract the data characteristics.
Figure 5 shows the Loss (Rounds) curves of each model. Based on them, the number of epochs for the individual and central models was set to 100, and the number of rounds for the federated models to 20.

4.5.1. Forecasting Accuracy under Regular Operation

To compare the performance of the models under regular operation, Table 2 lists the prediction errors on the test set.
From Table 2, it is clear that on every campus the central and federated models outperform the individual model, and that the central model and FedAdagrad have the best forecasting effects.
For the Downtown campus, the WMA of the central model was 89.94%, 1.21% higher than that of the individual model. The WMAs of the federated models were 1.92% to 3.11% higher than that of the individual model, all of them better than the central model. Among them, FedAdagrad had the best prediction effect, with a WMA of 91.84%, 3.11% higher than the individual model and 1.90% higher than the central model.
For the Polytechnic campus, the WMA of the central model was 85.04%, 4.02% higher than that of the individual model. The federated models also improved the prediction accuracy to different degrees compared with the individual model. Among them, the WMA of FedAdagrad came in second at 84.85%, 3.83% higher than the individual model's and only 0.19% lower than the central model's.
For the Tempe campus, the WMA of the central model was 94.36% and that of FedAdagrad was 93.37%, which were 3.49% and 2.50% higher than that of the individual model, respectively. The other federated models also improved the WMA to varying degrees.
For the West campus, FedAdagrad had the best prediction effect, with a WMA of 82.60%, which was 5.67% and 5.04% higher than those of the individual and central models, respectively. Compared with the individual model, the prediction accuracy of the other three federated models also improved significantly, by 4.04% to 5.02%, again better than the central model.
These data show that, under regular operation, the models based on FL could achieve accuracy similar to or even higher than the central model, with FedAdagrad producing the most accurate forecasts. Moreover, the improvement of the federated models was more pronounced in districts with poor data quality, such as the West campus.
Taking Downtown as an example, the FedAdagrad forecasting results and the actual values for 3 days ahead are shown in Figure 6.
In addition, the stability of the federated learning-based multi-energy load forecasting models was tested by simulating false data injection.

4.5.2. Forecasting Accuracy under FDIA

The experiment simulated an FDIA in which, during the sharing of parameters from the campuses to the server, noise ranging from 0.01% to 0.05% was added to the parameters [43].
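A minimal sketch of this attack on the uploaded parameters is given below; the uniform sign and magnitude distributions are assumptions, as the paper specifies only the 0.01%-0.05% range.

```python
import numpy as np

def inject_false_data(weights, rng, low=1e-4, high=5e-4):
    """Perturb uploaded parameters by 0.01%-0.05% relative noise (FDIA)."""
    return [w * (1 + rng.uniform(-1, 1, size=w.shape) * rng.uniform(low, high))
            for w in weights]

rng = np.random.default_rng(0)
# attacked_weights = inject_false_data(local_model.get_weights(), rng)
```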
In this scenario, the forecasting results of FedAvg, FedAdagrad, FedYogi, and FedAdam are listed in Table 3.
Comparing Table 2 and Table 3 shows that the prediction accuracy of the federated models decreases to different degrees compared to regular operation. Among them, FedAdagrad's WMA decreased by between 0.49% and 1.73%. The federated models still showed higher WMA than the individual model, except for FedYogi's 0.18% drop below it on the Polytechnic campus.
According to the simulation results, the federated models can still maintain high prediction accuracy under FDIA, and the prediction accuracy of FedAdagrad remains the highest among the federated models, still higher than that of the individual model.

4.5.3. Training Time Evaluation of Models

In addition to forecasting accuracy, the time required for training was also measured. To simulate more campuses participating in training, the data were expanded from the original four campuses to eight. The results are listed in Table 4. The time for the central model refers only to model training, excluding data transmission time. The time for FL includes data transmission, model training, and parameter aggregation at the server.
Due to the small amount of data and the absence of data transmission, the individual model required the shortest time. In comparison, the training time of the central model increased because of the large amount of data and the hardware limits of a single device. The federated models used distributed training, so the models could be trained on multiple devices with little impact on device hardware. Moreover, the server of the federated models selected only four clients for training in each round, which means the training data were only half of the central model's. Therefore, the federated models took less time than the central model.
The time required for FedAdagrad was 73.8 s longer than the individual models, but the accuracy improved by 2.50% to 5.67%, which brings significant economic value [44].

5. Conclusions

Traditional load forecasting usually trains the model through an individual or central model. Due to their short development time, IEMs may face poor data quality and low data diversity. The central model requires all clients to send their original data to the server, which carries privacy and security risks, puts pressure on the communication network, and demands substantial centralized computing resources. Therefore, this paper proposed an FL-based multi-energy load prediction method in which each IEM trains the forecasting model in a distributed manner without sharing local data.
In this method, the CNN-Attention-LSTM model was first established: the CNN extracted overall features, the attention mechanism assigned feature weights, and the LSTM captured the temporal features of the data. Under the FL mechanism, each IEM used local data to train the model and shared parameters, instead of local data, with the server. After receiving the parameters, the server aggregated them and generated new parameters, which were then retransmitted to the IEMs. This method preserved the model's ability to extract features while protecting clients' privacy. This paper examined the performance of individual, central, and federated models with four strategies (FedAvg, FedAdagrad, FedYogi, and FedAdam). The observations show that the prediction accuracy of FL is comparable to the central model while requiring less time. In addition, when facing FDIA, FedAdagrad could maintain a good forecasting effect and achieve the purpose of multi-energy load forecasting.
Our further work will consider more types of attacks and more FL strategies. Moreover, further improving the prediction accuracy is also part of future research.

Author Contributions

Conceptualization, X.B.; Data curation, G.Z.; Formal analysis, G.Z. and X.B.; Funding acquisition, X.B.; Investigation, G.Z.; Methodology, X.B.; Project administration, X.B.; Resources, X.B.; Software, G.Z.; Supervision, X.B.; Validation, G.Z.; Visualization, G.Z.; Writing–original draft, G.Z.; Writing–review & editing, S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China [grant number 51967001].

Data Availability Statement

The data used in the manuscript are downloaded from public access and are open source data. Load data downloaded from http://cm.asu.edu/ (accessed on 1 May 2021). Weather data downloaded from https://www.ncei.noaa.gov/ (accessed on 1 May 2021).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cheng, Y.; Zhang, N.; Lu, Z.; Kang, C. Planning multiple energy systems toward low-carbon society: A decentralized approach. IEEE Trans. Smart Grid 2018, 10, 4859–4869. [Google Scholar] [CrossRef]
  2. Quelhas, A.; Gil, E.; McCalley, J.D.; Ryan, S.M. A Multiperiod Generalized Network Flow Model of the U.S. Integrated Energy System: Part I—Model Description. IEEE Trans. Power Syst. 2007, 22, 829–836. [Google Scholar] [CrossRef]
  3. Taïk, A.; Cherkaoui, S. Electrical load forecasting using edge computing and federated learning. In Proceedings of the 2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–6. [Google Scholar]
  4. Subramanya, T.; Riggio, R. Centralized and Federated Learning for Predictive VNF Autoscaling in Multi-Domain 5G Networks and Beyond. IEEE Trans. Netw. Serv. Manag. 2021, 18, 63–78. [Google Scholar] [CrossRef]
  5. Li, J.; Ren, Y.; Fang, S.; Li, K.; Sun, M. Federated Learning-Based Ultra-Short term load forecasting in power Internet of things. In Proceedings of the IEEE International Conference on Energy Internet (ICEI), Sydney, NSW, Australia, 24–28 August 2020; pp. 63–68. [Google Scholar]
  6. Jiang, Y.; Ma, M.; Bennis, M.; Zheng, F.-C.; You, X. User Preference Learning-Based Edge Caching for Fog Radio Access Network. IEEE Trans. Commun. 2019, 67, 1268–1283. [Google Scholar] [CrossRef] [Green Version]
  7. Zhang, X.; Fang, F.; Wang, J. Probabilistic Solar Irradiation Forecasting Based on Variational Bayesian Inference With Secure Federated Learning. IEEE Trans. Ind. Inform. 2021, 17, 7849–7859. [Google Scholar] [CrossRef]
  8. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Agüera y Arcas, B. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Artificial Intelligence and Statistics; PMLR: Seattle, WA, USA, 2016. [Google Scholar]
  9. Messaoud, S.; Bradai, A.; Bukhari, S.H.R.; Quang, P.T.A.; Ben Ahmed, O.; Atri, M. A survey on machine learning in Internet of Things: Algorithms, strategies, and applications. Internet Things 2020, 12, 100314. [Google Scholar] [CrossRef]
  10. Li, H.; Ota, K.; Dong, M. Learning IoT in Edge: Deep Learning for the Internet of Things with Edge Computing. IEEE Netw. 2018, 32, 96–101. [Google Scholar] [CrossRef] [Green Version]
  11. Ahmed, M.; Pathan, A.S.K. False data injection attack (FDIA): An overview and new metrics for fair evaluation of its countermeasure. Complex Adapt. Syst. Model. 2020, 8, 4. [Google Scholar] [CrossRef] [Green Version]
  12. Liu, E.; Wang, Y.; Huang, Y. Short-term Forecast of Multi-load of Electrical Heating and Cooling in Regional Integrated Energy System Based on Deep LSTM RNN. In Proceedings of the IEEE Conference on Energy Internet and Energy System Integration (EI2), Wuhan, China, 15 February 2020; pp. 2994–2998. [Google Scholar]
  13. Zhu, R.; Guo, W.; Gong, X. Short-Term Load Forecasting for CCHP Systems Considering the Correlation between Heating, Gas and Electrical Loads Based on Deep Learning. Energies 2019, 12, 3308. [Google Scholar] [CrossRef] [Green Version]
  14. Wang, S.; Wang, S.; Chen, H.; Gu, Q. Multi-energy load forecasting for regional integrated energy systems considering temporal dynamic and coupling characteristics. Energy 2020, 195, 116964. [Google Scholar] [CrossRef]
  15. Tan, Z.; De, G.; Li, M.; Lin, H.; Yang, S.; Huang, L.; Tan, Q. Combined electricity-heat-cooling-gas load forecasting model for integrated energy system based on multi-task learning and least square support vector machine. J. Clean. Prod. 2020, 248, 119252. [Google Scholar] [CrossRef]
  16. Yan, Y.; Zhang, Z. Cooling, Heating and Electrical Load Forecasting Method for Integrated Energy System based on SVR Model. In Proceedings of the 2021 6th Asia Conference on Power and Electrical Engineering (ACPEE), Chongqing, China, 24 May 2021; pp. 1753–1758. [Google Scholar]
  17. Li, Q.; Tang, X.; Luo, Y.; Liu, W.; Chen, Q.; Hu, J. Integrated Energy System Load Forecast Based on Information Entropy and Stacked Auto-Encoders. In Proceedings of the 2021 IEEE IAS Industrial and Commercial Power System Asia, Chengdu, China, 18–21 July 2021; pp. 385–390. [Google Scholar]
  18. Wang, Y.; Ma, K.; Li, X.; Liang, Y.; Hu, Y.; Li, J.; Liu, H. Multi-type Load Forecasting of IES Based on Load Correlation and Stacked Auto-Encode Extreme Learning Machine. In Proceedings of the IEEE International Conference on Power and Energy Systems (ICPES), Chengdu, China, 25–27 December 2020; pp. 585–589. [Google Scholar]
  19. Wang, Y.; Ma, Y.; Song, F.; Ma, Y.; Qi, C.; Huang, F.; Xing, J.; Zhang, F. Economic and efficient multi-objective operation optimization of integrated energy system considering electro-thermal demand response. Energy 2020, 205, 118022. [Google Scholar] [CrossRef]
  20. Chuang, L.; Guojie, L.; Keyou, W.; Han, B. A multi-energy load forecasting method based on parallel architecture CNN-GRU and transfer learning for data deficient integrated energy systems. Energy 2022, 259, 124967. [Google Scholar]
  21. Guo, Y.; Li, Y.; Qiao, X.; Zhang, Z.; Zhou, W.; Mei, Y.; Lin, J.; Zhou, Y.; Nakanishi, Y. BiLSTM Multitask Learning-Based Combined Load Forecasting Considering the Loads Coupling Relationship for Multienergy System. IEEE Trans. Smart Grid 2022, 13, 3481–3492. [Google Scholar] [CrossRef]
  22. Wang, C.; Wang, Y.; Ding, Z.; Zheng, T.; Hu, J.; Zhang, K. A Transformer-Based Method of Multienergy Load Forecasting in Integrated Energy System. IEEE Trans. Smart Grid 2022, 13, 2703–2714. [Google Scholar] [CrossRef]
  23. Feng, Y.; Wang, Q. A New Calendar Effect and Weather Conditions based Day-ahead Load Forecasting Model. In Proceedings of the IEEE Power & Energy Society General Meeting (PESGM), Atlanta, GA, USA, 4–8 August 2019; pp. 1–5. [Google Scholar] [CrossRef]
  24. Hu, X.Y.; Li, B.J.; Shi, J.; Hua, L.; Guojing, L. A Novel Forecasting Method for Short-term Load based on TCN-GRU Model. In Proceedings of the 2021 IEEE International Conference on Energy Internet (ICEI), Southampton, UK, 27–29 September 2021; pp. 79–83. [Google Scholar] [CrossRef]
  25. Shingi, G. A federated learning based approach for loan defaults prediction. In Proceedings of the IEEE International Conference on Data Mining Workshops (ICDM Workshops), Sorrento, Italy, 17–20 November 2020; pp. 362–368. [Google Scholar]
  26. Zeng, T.; Guo, J.; Kim, K.J.; Parsons, K.; Orlik, P.; Di Cairano, S.; Saad, W. Multi-Task Federated Learning for Traffic Prediction and Its Application to Route Planning. In Proceedings of the IEEE Symposium on Intelligent Vehicle, Nagoya, Japan, 11–17 July 2021; pp. 451–457. [Google Scholar]
  27. Liu, Y.; James, J.Q.; Kang, J.; Niyato, D.; Zhang, S. Privacy-Preserving Traffic Flow Prediction: A Federated Learning Approach. IEEE Internet Things J. 2020, 7, 7751–7763. [Google Scholar] [CrossRef]
  28. Jiang, F.; Cheng, W.; Gao, Y.; Sun, C. Caching Strategy Based on Content Popularity Prediction Using Federated Learning for F-RAN. In Proceedings of the IEEE International Conference on Communications in China Workshops (ICCC), Xiamen, China, 28–30 July 2021; pp. 19–24. [Google Scholar]
  29. Thorgeirsson, A.T.; Scheubner, S.; Fünfgeld, S.; Gauterin, F. Probabilistic Prediction of Energy Demand and Driving Range for Electric Vehicles With Federated Learning. IEEE Open J. Veh. Technol. 2021, 2, 151–161. [Google Scholar] [CrossRef]
  30. Cheng, X.; Luo, Q.; Pan, Y.; Li, Z.; Zhang, J.; Chen, B. Predicting the APT for Cyber Situation Comprehension in 5G-Enabled IoT Scenarios Based on Differentially Private Federated Learning. Secur. Commun. Netw. 2021, 2021, 8814068. [Google Scholar] [CrossRef]
  31. Mitra, A.; Ngoko, Y.; Trystram, D. Impact of Federated Learning On Smart Buildings. In Proceedings of the 2021 International Conference on Artificial Intelligence and Smart Systems, Coimbatore, India, 25–27 March 2021; pp. 93–99. [Google Scholar]
  32. Savi, M.; Olivadese, F. Short-Term Energy Consumption Forecasting at the Edge: A Federated Learning Approach. IEEE Access 2021, 9, 95949–95969. [Google Scholar] [CrossRef]
  33. Fekri, M.N.; Grolinger, K.; Mir, S. Distributed load forecasting using smart meter data: Federated learning with Recurrent Neural Networks. Int. J. Electr. Power Energy Syst. 2022, 137, 107669. [Google Scholar] [CrossRef]
  34. Reddi, S.; Charles, Z.; Zaheer, M.; Garrett, Z.; Rush, K.; Konečný, J.; Kumar, S.; McMahan, H.B. Adaptive Federated Optimization. ICLR 2021. arXiv 2021, arXiv:2003.00295. [Google Scholar]
  35. Charles, Z.; Garrett, Z.; Huo, Z.; Shmulyian, S.; Smith, V. On Large-Cohort Training for Federated Learning. In Proceedings of the 35th Conference on Neural Information Processing Systems, Sydney, Australia, 6–14 December 2021. [Google Scholar]
  36. McMahan, H.B.; Streeter, M. Adaptive Bound Optimization for Online Convex Optimization. In Proceedings of the 23rd Annual Conference on Learning Theory (COLT), Haifa, Israel, 27–29 June 2010. [Google Scholar]
  37. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  38. Zhang, G.; Bai, X.; Wang, Y. Short-time multi-energy load forecasting method based on CNN-Seq2Seq model with attention mechanism. Mach. Learn. Appl. 2021, 5, 100064. [Google Scholar] [CrossRef]
  39. Campus Metabolism. 2021. Available online: http://cm.asu.edu/ (accessed on 1 May 2021).
  40. National Centers for Environmental Information (NCEI). 2021. Available online: https://www.ncei.noaa.gov/ (accessed on 1 May 2021).
  41. Cho, H.; Kim, Y.; Lee, E.; Choi, D.; Lee, Y.; Rhee, W. Basic Enhancement Strategies When Using Bayesian Optimization for Hyperparameter Tuning of Deep Neural Networks. IEEE Access 2020, 8, 52588–52608. [Google Scholar] [CrossRef]
  42. Shin, S.; Lee, Y.; Kim, M.; Park, J.; Lee, S.; Min, K. Deep neural network model with Bayesian hyperparameter optimization for prediction of NOx at transient conditions in a diesel engine. Eng. Appl. Artif. Intell. 2020, 94, 103761. [Google Scholar] [CrossRef]
  43. Mode, G.R.; Calyam, P.; Hoque, K.A. False Data Injection Attacks in Internet of Things and Deep Learning enabled Predictive Analytics. arXiv 2019, arXiv:10.48550. [Google Scholar]
  44. Fekri, M.N.; Patel, H.; Grolinger, K.; Sharma, V. Deep learning for load forecasting with smart meter data: Online Adaptive Recurrent Neural Network. Appl. Energy 2021, 282, 116177. [Google Scholar] [CrossRef]
Figure 1. Federated learning mechanism.
Figure 2. CNN-Attention-LSTM model.
Figure 3. Framework of FL-based multi-energy load forecasting model.
Figure 4. Spearman rank correlation coefficient of Downtown.
Figure 5. The Loss (Rounds) curves. (a) Individual models. (b) Central model. (c) Federated models.
Figure 6. The curve of load forecasting.
Table 1. Weight setting.
Weight | Downtown | Polytechnic | Tempe | West
α1     | 0.4      | 0.5         | 0.7   | 0.7
α2     | 0.6      | 0.5         | 0.3   | 0.3
Table 2. Errors comparison.
(a) Downtown
Type       | Cooling MAPE (%) | Cooling RMSE (Tons) | Electricity MAPE (%) | Electricity RMSE (kW) | WMA (%)
Individual | 15.13 | 13.79  | 8.69  | 103.65 | 88.73
Central    | 14.50 | 17.19  | 7.11  | 109.45 | 89.94
FedAvg     | 12.50 | 12.42  | 5.76  | 72.91  | 91.54
FedAdagrad | 11.74 | 12.22  | 5.78  | 72.12  | 91.84
FedYogi    | 13.65 | 14.29  | 5.93  | 76.57  | 90.98
FedAdam    | 15.42 | 15.54  | 5.31  | 72.58  | 90.65
(b) Polytechnic
Type       | Cooling MAPE (%) | Cooling RMSE (Tons) | Electricity MAPE (%) | Electricity RMSE (kW) | WMA (%)
Individual | 27.35 | 37.61  | 10.61 | 229.39 | 81.02
Central    | 23.66 | 27.65  | 6.26  | 162.85 | 85.04
FedAvg     | 26.78 | 30.24  | 7.58  | 161.52 | 82.82
FedAdagrad | 23.91 | 28.35  | 6.40  | 141.68 | 84.85
FedYogi    | 27.70 | 34.20  | 6.72  | 151.62 | 82.79
FedAdam    | 24.85 | 28.59  | 6.43  | 146.70 | 84.36
(c) Tempe
Type       | Cooling MAPE (%) | Cooling RMSE (Tons) | Electricity MAPE (%) | Electricity RMSE (kW) | WMA (%)
Individual | 11.31 | 682.65 | 4.03  | 917.23 | 90.87
Central    | 6.79  | 400.49 | 2.93  | 719.94 | 94.36
FedAvg     | 9.10  | 531.36 | 3.60  | 820.49 | 92.55
FedAdagrad | 7.87  | 425.15 | 3.74  | 821.08 | 93.37
FedYogi    | 9.08  | 468.76 | 3.36  | 775.60 | 92.63
FedAdam    | 8.95  | 468.22 | 3.60  | 819.17 | 92.65
(d) West
Type       | Cooling MAPE (%) | Cooling RMSE (Tons) | Electricity MAPE (%) | Electricity RMSE (kW) | WMA (%)
Individual | 25.35 | 189.27 | 17.76 | 293.80 | 76.93
Central    | 27.51 | 175.43 | 10.60 | 189.71 | 77.56
FedAvg     | 21.03 | 156.02 | 11.73 | 211.20 | 81.76
FedAdagrad | 19.51 | 123.20 | 12.46 | 205.27 | 82.60
FedYogi    | 19.87 | 122.33 | 13.81 | 234.64 | 81.95
FedAdam    | 22.02 | 159.05 | 12.04 | 217.89 | 80.97
Table 3. Errors comparison of different models under FDIA.
(a) Downtown
Type       | Cooling MAPE (%) | Cooling RMSE (Tons) | Electricity MAPE (%) | Electricity RMSE (kW) | WMA (%)
FedAvg     | 15.35 | 16.44  | 4.69  | 58.43  | 91.05
FedAdagrad | 13.94 | 15.13  | 5.13  | 66.58  | 91.35
FedYogi    | 15.08 | 15.65  | 6.25  | 83.62  | 90.22
FedAdam    | 15.29 | 14.76  | 6.48  | 98.88  | 89.99
(b) Polytechnic
Type       | Cooling MAPE (%) | Cooling RMSE (Tons) | Electricity MAPE (%) | Electricity RMSE (kW) | WMA (%)
FedAvg     | 27.62 | 31.29  | 8.08  | 173.69 | 82.15
FedAdagrad | 26.10 | 30.26  | 6.52  | 143.44 | 83.68
FedYogi    | 30.63 | 42.83  | 7.69  | 187.41 | 80.84
FedAdam    | 31.23 | 31.02  | 6.71  | 165.41 | 81.03
(c) Tempe
Type       | Cooling MAPE (%) | Cooling RMSE (Tons) | Electricity MAPE (%) | Electricity RMSE (kW) | WMA (%)
FedAvg     | 10.67 | 567.61 | 3.71  | 917.69 | 91.42
FedAdagrad | 10.26 | 527.35 | 3.92  | 859.08 | 91.64
FedYogi    | 11.41 | 571.13 | 3.31  | 810.36 | 91.02
FedAdam    | 11.27 | 617.29 | 3.69  | 870.25 | 91.01
(d) West
Type       | Cooling MAPE (%) | Cooling RMSE (Tons) | Electricity MAPE (%) | Electricity RMSE (kW) | WMA (%)
FedAvg     | 22.40 | 156.28 | 13.22 | 215.08 | 80.36
FedAdagrad | 20.91 | 147.21 | 11.35 | 197.21 | 81.96
FedYogi    | 23.39 | 173.17 | 9.87  | 176.56 | 80.66
FedAdam    | 22.16 | 136.31 | 12.57 | 217.80 | 80.72
Table 4. Computation comparison.
Type       | Rounds | Clients | Epochs | Time (s)
Individual | -      | -       | 100    | 140.21
Central    | -      | 8       | 100    | 350.42
FedAvg     | 20     | 8       | 5      | 229.04
FedAdagrad | 20     | 8       | 5      | 214.09
FedYogi    | 20     | 8       | 5      | 210.35
FedAdam    | 20     | 8       | 5      | 238.99
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

