
DelayNet: Enhancing Temporal Feature Extraction for Electronic Consumption Forecasting with Delayed Dilated Convolution

Department of ICT Convergence System Engineering, Chonnam National University, 77, Yongbong-ro, Buk-gu, Gwangju 61186, Republic of Korea
Department of Electronic Convergence Engineering, Kwangwoon University, 20 Gwangun-ro, Nowon-gu, Seoul 01897, Republic of Korea
Author to whom correspondence should be addressed.
Energies 2023, 16(22), 7662;
Submission received: 18 October 2023 / Revised: 13 November 2023 / Accepted: 17 November 2023 / Published: 20 November 2023


As irregular temperature patterns and climate shifts become more frequent, accurate power consumption prediction is increasingly important for ensuring a steady supply of electricity. Existing deep learning models have improved prediction accuracy, but usually at the cost of greater computational demands. In this research, we introduce DelayNet, a lightweight deep learning model that remains efficient while accommodating extended time sequences. DelayNet is designed around the observation that electronic series data exhibit recurring irregular patterns over time. Furthermore, we present two substantial datasets of electricity consumption records from South Korean buildings spanning nearly two years. Empirical findings demonstrate the model’s performance: it achieves 21.23%, 43.60%, 17.05% and 21.71% improvement compared to recurrent neural networks, gated recurrent units, temporal convolutional neural networks and ARIMA models, respectively, while greatly reducing model complexity and computational requirements. These findings indicate the potential for micro-level power consumption planning, as lightweight models can be implemented on edge devices.

1. Introduction

According to recent observations, extreme weather events are becoming more frequent and more severe worldwide [1,2]. Abnormal temperatures and climate change are increasing the volatility of power consumption [3]. Because electrical energy must be produced and consumed simultaneously due to its physical characteristics [4], accurate prediction of power demand is crucial when planning power generation at the national and power plant levels. For instance, failing to predict power demand accurately may result not only in budget overspending on excessive power generation facilities but also in wasted electrical energy. On the other hand, a shortage of electrical energy can cause power outages. In fact, South Korea experienced a power outage in September 2011 due to a shortage of electricity [5].
The most effective application of electricity demand forecasting pertains to predicting the power consumption of individual buildings. Energy consumption in buildings is anticipated to grow by over 40% in the coming two decades, accounting for 30–45% of total global energy consumption. Systems such as the Home Energy Management System (HEMS) and Building Energy Management Systems (BEMS) take diverse forms, depending on the type and size of the building. These systems are being researched not only for collecting and monitoring building-level electricity consumption data [6,7], but also for forecasting electricity consumption. Larger buildings have extensive cloud servers and server rooms, equipped with substantial computing power for BEMS. However, medium and smaller-scale buildings typically possess only small-scale server rooms or MCUs at the level of edge computing. Consequently, the availability of computing resources is a critical factor in electricity forecasting research.
Research on electricity demand forecasting has seen substantial progress, encompassing a variety of methodologies from machine learning to deep learning techniques. Earlier studies aimed to forecast household electricity consumption by categorizing patterns in electricity usage data accumulated through home systems [8]. Other investigations grouped and examined the characteristics of electricity utilization data and used uniquely normalized data to train machine learning models for forecasting [9]. Additionally, studies were conducted on systems capable of predicting long-term electricity demand, utilizing machine learning models trained on monthly electricity consumption data that had undergone time series analysis [10].
In more recent advancements, deep learning techniques have been integrated, utilizing multilayer perceptron (MLP), recurrent neural network (RNN), and long-short-term memory (LSTM) for training models on household electricity consumption data for predictive purposes [11]. There has been a noteworthy study that proposed a hybrid convolutional long-short-term-memory (CNN-LSTM) model specifically aiming to predict residential energy consumption [12]. Additionally, innovative approaches have been adopted where electricity consumption data was manipulated to conform with the characteristics inherent to sequential data, employing Transformer models for the purpose of electricity demand prediction [13]. This convergence of methodologies highlights the evolving landscape of machine learning research in addressing the challenges of electricity demand forecasting.
Typically, power consumption prediction models employ temporal convolutional neural networks (TCNs), effectively capturing the local features, a strength of CNN, but they fall short of adequately incorporating time-series characteristics. To counter this, advanced approaches have been developed, combining CNN and RNN and introducing models in the Transformer family that apply attention mechanisms. This evolution in methodologies has led to a notable improvement in the accuracy of electricity demand forecasting models, which are trained using electricity usage data. However, this advancement in accuracy has consequently demanded a parallel increase in both the model’s capacity and the computing power required.
In this paper, we propose DelayNet, a lightweight building power consumption prediction model for BEMS that can be applied to medium- and small-sized buildings. DelayNet reinterprets the Dilated Convolution of TCN [14] and applies Dilated Delay Convolution, thereby capturing both local features and time-series characteristics while using fewer parameters and fewer computational operations. The main contributions of this paper are as follows:
  • We publish our in-house power consumption dataset, a reliable dataset collected from residences over a period of two years.
  • We propose DelayNet, a stable yet lightweight deep learning model for predicting the power consumption of buildings. DelayNet retains the local features of TCN while reflecting long sequence characteristics. Compared to previous works, the Delay Block maintains the same number of filters while increasing the interval between sequences, significantly reducing the number of parameters and computational operations while exhibiting high performance.
The structure of the remainder of this paper is as follows: Section 2 will explore studies related to predicting power consumption. Section 3 will give a detailed explanation of the common deep learning models used for such predictions and discuss the background and approach of the proposed DelayNet. In Section 4, we will share the actual data on power consumption that we collected and the results of our experiments. Finally, Section 5 will conclude the paper.

2. Related Works

Research efforts have been dedicated to extracting features from energy consumption data and predicting power demand to ensure a stable power supply. These efforts can be divided into three main categories: statistical-based approaches, machine learning-based approaches, and deep learning-based approaches.
Statistical-based methods have been utilized for predicting power demand. These approaches involved conducting simple and multiple linear regression analyses as well as quadratic regression analyses using building data collected on an hourly and daily basis [15]. It was noted that as the time intervals increased, the quality of the models [16] improved. In the context of forecasting daily power consumption in commercial buildings, research was conducted on both multiple regression and genetic programming models. While the genetic programming model displayed a slight performance advantage over the multiple regression model, it did have the drawback of longer training times. Additionally, a study presented a method for predicting next-day electricity prices in the Spanish and Californian power markets based on the autoregressive integrated moving average (ARIMA) methodology [17]. Customized ARIMA models were developed for each dataset. Furthermore, research focused on building load forecasting using a hybrid model that combined clustering techniques with ARIMA models [18]. It was observed that the hybrid model yielded improved predictive performance compared to using a standalone ARIMA model. However, these traditional statistical models often encounter difficulties when handling complex datasets that are non-linear and noisy, leading to potential reductions in prediction accuracy.
In the context of machine-learning-based approaches, research endeavors have focused on the application of the Random Forest ensemble learning method to predict the maximum energy consumption [19]. Random Forest was chosen for its ability to optimize and mitigate overfitting, and attempts were made to improve performance by including not only basic current loads but also data from GSM network call records. Furthermore, a study investigated the utilization of the Support Vector Regression (SVR) model to forecast demand response criteria in office buildings [20]. Enhancements in performance were sought by introducing foundational electrical loads and ambient temperature into the model. It is important to note that these machine learning-based methods may face challenges as the volume of data increases.
In the domain of deep learning-based techniques, research has delved into the development of models for short-term load prediction. This includes a study that combines CNN (Convolutional Neural Network) with K-means clustering [21]. Additionally, there has been investigation into LSTM (Long-Short-Term-Memory) models for short-term load forecasting [22]. Building upon these findings, hybrid models integrating CNN and LSTM networks have been explored for short-term load prediction [23]. Furthermore, research has been conducted on S2S (Sequence to Sequence) models, such as LSTM-based S2S structures [24] and GRU-based S2S structures [25]. In these cases, crucial autocorrelation coefficients in time series data were used for data analysis. Moreover, research has examined the integration of Attention mechanisms into S2S structures [26]. For time series prediction, there has been exploration of TCN (Temporal Convolutional Network) models, which adapt CNN filters for time series data [27].
Inspired by the Transformer model’s exceptional performance in Natural Language Processing (NLP) [28], research has also extended to power demand prediction models based on the Transformer architecture [13]. The basic Transformer structure has been customized to optimize it for power demand prediction in this context. More recently, models like Informer have emerged, which are Transformer-based models fine-tuned specifically for time series data prediction [29]. Additionally, there are models like Autoformer, which employ autocorrelation coefficients for time series prediction within a Transformer framework [30], and Fedformer, which optimizes computation by using frequency domain operations in a Transformer architecture [31].
These deep learning-based methods have demonstrated impressive performance not only in short-term and long-term power demand forecasting but also in various other time series prediction tasks. However, it is essential to acknowledge that the pursuit of higher performance often results in significantly larger model sizes and computational requirements. For example, the computational complexity of Transformers is $O(L^2)$, meaning computational demands increase quadratically with the length $L$ of the input sequence.
As previously mentioned, research in power demand prediction has explored a variety of methods, including those based on statistics, machine learning, and deep learning. However, most of these studies aimed to improve the accuracy of power demand prediction models by incorporating additional variables and leveraging big data alongside the provided power data. Nevertheless, our research reveals that using highly complicated models does not always lead to better outcomes. When dealing with specific tasks and limited data, it is often better to create a custom model that suits the task well and learns effectively, instead of using complex models that need a lot of data to work correctly.

3. DelayNet

This section outlines our contribution, the Delay Dilated Convolutional Grouping Network (DelayNet), designed for time series forecasting. Our proposed architecture draws inspiration from TCN models and is closely related to the Stride-TCN design. To provide context, we briefly review developments in time series forecasting before introducing our model.

3.1. Preliminaries

Time series forecasting (TSF) finds numerous applications and can be categorized into two tasks: Single Step Prediction and Multi-Step Prediction. Concerning time series datasets, they can be categorized into Multivariable and Univariable Datasets. In this research, our primary focus is on the task of Multi-Step Prediction within the context of Univariable time series data.
The task of power demand forecasting shares commonalities with the broader field of time series prediction, which finds applications across diverse domains. Specifically, predicting power demand from power consumption data can be likened to the challenge of univariate time series forecasting [32].
$$\hat{y}_{t+1} = f_s\left(y_{t-k:t},\, x_{t-k:t}\right) \tag{1}$$
Equation (1) presents a mathematical model for time series prediction tasks. In this context, $\hat{y}$ represents the predicted value, while $x$ and $y$ denote input values. Time is denoted by $t$, $k$ is the window size, and $s$ represents static metadata values that remain constant over time. For univariate time series prediction, the forecast is obtained by applying a function $f$ to the input values from time step $t-k$ to $t$ together with the static metadata. This framework can be decomposed into two subcomponents, as in Equations (2) and (3).
$$z_t = g_{encoder}\left(x_{t-k:t}\right) \tag{2}$$
$$\hat{y}_{t+1} = g_{decoder}\left(z_t,\, y_{t-k:t}\right) \tag{3}$$
In this framework, $z_t$ represents a latent vector. The process involves transforming input values into latent values using an encoder, and these latent values are subsequently used to generate final predictions through a decoder. Within $g_{encoder}$, a variety of deep learning models can be applied, offering flexibility, while $g_{decoder}$ can encompass a range of models, spanning from basic linear functions to more complex ones like Sequence-to-Sequence (S2S) models. In the context of univariate time series prediction, non-linear functions are primarily employed within $g_{encoder}$. Next, we will introduce some common deep learning-based models for TSF.

Deep Learning Model Based TSF

Deep learning models for time series forecasting can be categorized into four primary types, each distinguished by how the encoder and decoder functions are constructed.
Firstly, the MLP-based TSF, depicted in Figure 1a, employs perceptrons as fundamental components (Equation (4)). These perceptrons take inputs $x_{t-k:t}$, apply weights $W$, add biases $b$, and utilize an activation function $f$ to compute predictions. While effective for simple and short-term data, MLP models struggle with large datasets due to their substantial computational requirements.
$$\hat{y}_{t+1} = g_{decoder} = f\left(W x_{t-k:t} + b\right) \tag{4}$$
On the other hand, RNN-based TSF, represented in Figure 1b, utilizes recurrent neural networks (RNNs) that incorporate information from previous hidden layers into current computations, Equations (5) and (6). This allows them to capture long-range dependencies in sequential data. However, RNNs have limitations, such as fixed window sizes and challenges related to gradient issues and long-distance prediction.
$$z_t = g_{encoder} = f\left(\left[U_e x_t,\, V_e z_{t-1}\right] + b_e\right) \tag{5}$$
$$\hat{y}_{t+1} = g_{decoder} = f\left(\left[U_d y_t,\, V_d z_t,\, W_d h_{t-1}\right] + b_d\right) \tag{6}$$
The third type, Transformer-based TSF (Figure 1c), leverages attention mechanisms to consider relationships between tokens in input sequences, Equation (7). Transformers are capable of incorporating important information from past time steps directly into predictions. However, they are characterized by complex models, a high computational cost of $O(n^2)$ for attention calculations, and suboptimal performance on smaller datasets.
$$z_t = g_{encoder} = f\left(\sum_{i=t-k}^{t-1} \alpha\left(K_t, Q_i\right) v_t\right) \tag{7}$$
$$\hat{y}_{t+1} = g_{decoder} = f\left(W\left[z_{t-k:t}\right] + b\right) \tag{8}$$
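As a rough illustration of the attention-weighted sum in the encoder, the sketch below computes a context vector for one query over a window of past steps. The weighting function is assumed here to be a softmax over scaled dot products, which is a common choice but is not specified in the text; the function name and array shapes are likewise illustrative.

```python
import numpy as np

def attention_context(query, keys, values):
    """Return (context, weights) for one query over k past steps.

    Assumption: weights are a softmax over scaled dot products between
    the query and each key. The paper only names a generic alpha(K, Q).
    """
    query = np.asarray(query, dtype=float)    # shape (d,)
    keys = np.asarray(keys, dtype=float)      # shape (k, d)
    values = np.asarray(values, dtype=float)  # shape (k, d_v)
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())   # subtract max for stability
    weights /= weights.sum()                  # weights sum to 1
    return weights @ values, weights
```

Each query needs one score per past step, so producing a context for every one of n positions costs on the order of n squared score evaluations, which is the quadratic cost noted above.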
Lastly, to adapt CNNs for time series, causal convolution is essential to ensure that the model only considers past data. In this context, the hidden states $z_t$ and filter weights $W_t$ play crucial roles. To increase the receptive field while minimizing computational demands, dilated convolution is employed, as shown in Figure 1d, allowing the model to capture long-term dependencies at dilation rate $d$, Equation (9). CNN-based time series forecasting excels in capturing local correlations and addressing long-range dependencies [33], making it a valuable tool for time series data analysis.
$$z_t = g_{encoder} = f\left(\sum_{i=0}^{k/d_l} W_t\, x_{t - d i}\right) \tag{9}$$
$$\hat{y}_{t+1} = g_{decoder} = f\left(W\left[z_{t-k:t}\right] + b\right) \tag{10}$$
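The causal dilated convolution described above can be sketched in a few lines of NumPy. This is a minimal single-channel version with zero padding on the left, written for illustration only; the function name and the padding choice are assumptions, not details taken from the paper.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation=1):
    """Compute y[t] = sum_i w[i] * x[t - dilation * i].

    Values before t = 0 are treated as zeros (left padding only),
    so y[t] never depends on x[t + 1] or later samples.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    pad = dilation * (len(w) - 1)
    xp = np.concatenate([np.zeros(pad), x])  # xp[pad + t] == x[t]
    y = np.zeros_like(x)
    for i, wi in enumerate(w):
        # Tap i looks dilation * i steps into the past.
        start = pad - dilation * i
        y += wi * xp[start:start + len(x)]
    return y
```

With `w = [1, 1]` and `dilation = 2`, each output is the current sample plus the sample two steps earlier, which shows how dilation widens the receptive field without adding weights.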

3.2. Rethinking TCN

Conventional TCNs [34] use 1D causal convolution together with dilated convolutions, as seen in Figure 2a, to enhance their capacity for information capture. However, to extend the receptive field for examining longer-term patterns in time series data, it is necessary to stack multiple dilated convolutions, which increases both model size and computational complexity. Nonetheless, TCNs offer an advantage by maintaining solid performance with a more compact model and lower computational demands when compared to other types of models for univariate time series prediction. To address this efficiency concern, Anh et al. introduced Stride-Dilated TCN (Stride-TCN), as depicted in Figure 2b [14]. Stride-TCN introduced a stride parameter into dilated convolutions, resulting in a significant enlargement of the receptive field using a single dilated convolution.
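To make the stacking argument concrete, the following sketch computes the receptive field of stacked causal convolutions and the number of layers needed to cover a given input window. It assumes one convolution per layer and dilations that double at each layer (1, 2, 4, ...), a common TCN convention that the text does not spell out; the function names are illustrative.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked causal convolutions.

    Assumption: one convolution per layer; each layer with dilation d
    extends the field by (kernel_size - 1) * d samples.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def layers_to_cover(window, kernel_size):
    """Smallest number of layers (dilations 1, 2, 4, ...) covering `window`."""
    dilations = []
    while receptive_field(kernel_size, dilations) < window:
        dilations.append(2 ** len(dilations))
    return len(dilations)
```

Under these assumptions, a kernel size of 3 needs seven stacked layers before the receptive field (255 samples) exceeds a 168-step window, since six layers reach only 127 samples.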
While TCN exhibits relatively dense connections for convolution calculations, resulting in an inflated model size that does not effectively harness sequence repetitiveness, Stride-TCN was introduced to mitigate this issue. However, Stride-TCN’s optimal performance requires an intricate and computationally expensive hyperparameter search using Bayesian Optimization [14], leading to prolonged training times. Moreover, it came to our attention that TCN and Stride-TCN inadequately capture sequence information, as they rely on a single connection between repeating intervals (dilated). This limitation is particularly significant when dealing with energy data, where individual values carry limited informational value.
Our model draws inspiration from the architectural foundations of TCN and Stride-TCN, underpinned by a crucial observation regarding the repetitive nature of data sequences, particularly in the context of electronic consumption data. Within DelayNet, we propose a fresh adaptation of the TCN model by incorporating Delay Dilated convolutions. This approach enables us to fully capture the nuances of time series data while keeping the model’s parameter and computational demands relatively low. The primary objective is to substantially increase the receptive field, addressing the constraints observed in previous TCN models.
DelayNet exhibits a structure akin to the original TCN and Stride-TCN, comprising two fundamental components: dilated and causal 1D convolutional layers. However, the novel aspect of Delay Dilated Convolution lies in its distinct approach. As depicted in Figure 3, each layer, denoted $l$, is computed directly from the preceding layer, $l-1$. The nodes within layer $l$, represented as $x_i^l$, are determined by a convolution of size $K$, which encompasses a delay gap extending upwards from layer $l-1$. Likewise, the calculation of node $x_{i+1}^l$ in layer $l$ follows the same methodology as $x_i^l$. This computation facilitates the aggregation of information from the delayed data, as previously mentioned, within the confines of a single causal 1D convolution filter size.
$$z_t = g_{encoder} = \mathrm{DelayBlock}\left(x_{t-k:t}\right) \tag{11}$$
$$\hat{y}_{t+1} = g_{decoder} = \mathrm{MLP}_W\left(z_{t-k:t}\right) \tag{12}$$
In essence, this signifies that delayed dilated convolution can place a more significant emphasis on capturing the seasonal and cyclical characteristics inherent in time series data, while maintaining an equal number of convolution operations. Especially when dealing with datasets employed for predicting power demand, as visualized in Figure 4, the seasonal and cyclical patterns influenced by climate conditions and building-specific attributes often outweigh the trend component. Hence, delayed dilated convolutional filters can yield substantial benefits in such scenarios. Furthermore, by employing fewer convolutional filters with a delayed gap, it becomes feasible to acquire a considerable receptive field while simultaneously reducing both parameter quantities and computational intricacies. Consequently, this approach allows for crafting models suitable for forecasting power demand in small to medium-sized building systems, making them adaptable to environments with constrained server resources or embedded systems, while also maintaining cost-effectiveness.

4. Experiments

In our experiment, we explore three scenarios to assess the performance and efficiency of DelayNet in comparison to other established models across various configurations.
Scenario 1 (Model Comparison): In the first scenario, we evaluate the performance of DelayNet with a small model size, emphasizing high accuracy. We compare it against a range of well-established models, including LSTM, MLP, GRU, TCN with two layers, ARIMA, and StrideTCN-2. This comparison aims to discern how DelayNet’s compact architecture fares in accuracy against more complex counterparts, shedding light on its suitability for resource-efficient forecasting.
Scenario 2 (Model Depth Analysis): In the second scenario, we delve into the impact of model depth on forecasting capabilities. We analyze Deeper-DelayNet with stacking five and seven layers. By exploring how these architectures handle increased depth, we aim to identify the optimal trade-off between model complexity and forecasting accuracy.
Scenario 3 (Lightweight Models Comparison): In the third scenario, we investigate Light-DelayNet versus Light-TCN, emphasizing efficiency and computational simplicity. This comparison seeks to elucidate how these lightweight variants of DelayNet and TCN perform.

4.1. Dataset

In our empirical studies, we conducted experiments using two publicly available datasets and two privately acquired datasets. All of these datasets are accessible online, and their sizes are compared in Figure 5. It is worth noting that our research specifically centers on univariate time-series forecasting, which means we concentrated on time series data with a single dimension for each of the mentioned datasets. The key statistics of the dataset corpus are summarized in Table 1.
  • Individual Household Electric Power Consumption (France): This dataset is accessible online [35] and offers minute-by-minute electric power consumption data from a single household in France. The data spans 47 months, covering the period from December 2006 to November 2010. It comprises multivariable sequences, including total active power consumption, total reactive power consumption, average current intensity, active energy for the kitchen, active energy for the laundry, and active energy for climate control systems [36]. The dataset contains a total of 2,075,259 multivariable sequences. For our research, we specifically utilized 34,589 univariable sequences, focusing on the hourly global active power data. Our analysis centers on the total electricity consumption of a specific ID: 20.
  • Energy Consumption from Spain (Spain): This dataset is available online [37] and provides hourly energy consumption data, outside temperatures for the region, and metadata for 499 customers in Spain. It covers approximately one year, from 1 January to 31 December 2019, totaling 8760 data points. Our primary interest in this research is the total electricity consumption of a particular ID.
  • CNU Energy Consumption (CNU): This dataset features a real-world collection of energy consumption data from 90 distinct locations within CNU (Chonnam National University), available online [38]. The data was meticulously gathered at an hourly frequency over a span of 1.3 years, starting from 1 January 2021 and completing on 14 January 2022. Each location contributes valuable insights through a total of 11,232 data points. A notable enhancement in this upgraded CNU energy consumption dataset is the inclusion of additional dependable multivariate data, such as temperature, humidity, wind speed, and more. These supplementary variables were sourced from records maintained by the South Korean Meteorological Administration during the same period. Our experiment focuses on the total electricity consumption of a particular location, namely, “Gongdae 7th Building, HV_02”.
  • Gyeonggi-do Building Power Usage (Gyeonggi): This is a real-world dataset collected directly from household electricity consumption, also accessible online [39]. It encompasses hourly records of building power consumption spanning approximately 1.9 years, ranging from 1 January 2021 to 14 January 2022. The dataset covers around 17,001 specific commercial buildings situated in Gyeonggi Province, South Korea. We have chosen to concentrate on two specific buildings within this dataset, namely, ID 9654 and 6499. These buildings were selected based on our data analysis, as they exhibit minimal missing data, making them ideal candidates for our experiments.
Before the training process, it is crucial to perform preprocessing steps. This is necessitated by the relatively high power values present in the datasets. For instance, in the case of the CNU dataset, the power values exhibit a mean of approximately 130.48 and a standard deviation of around 46.97. To circumvent potential challenges such as numerical errors and distortion of the dataset, we employ min-max normalization. This normalization method transforms the features to a standardized range of $[0, 1]$, ensuring uniformity and optimal training performance. The min-max normalization is calculated in Equation (13).
$$z_i = \frac{x_i - \min(x)}{\max(x) - \min(x)} \tag{13}$$
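A minimal NumPy sketch of this normalization is shown below. It assumes, as a common practice that the text does not confirm, that the minimum and maximum are computed on the training portion only and then reused for the other splits; the helper names are illustrative.

```python
import numpy as np

def min_max_fit(train):
    """Return (lo, hi) from the training data (assumed fitting convention)."""
    train = np.asarray(train, dtype=float)
    return float(train.min()), float(train.max())

def min_max_transform(x, lo, hi):
    """Map x to [0, 1] using z = (x - lo) / (hi - lo)."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def min_max_inverse(z, lo, hi):
    """Map normalized values back to the original power scale."""
    return np.asarray(z, dtype=float) * (hi - lo) + lo
```

Keeping the inverse transform at hand is useful when errors need to be reported in the original units rather than on the normalized scale.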
In this experiment, we conducted time-series forecasting tasks using each of the four datasets. To provide further details, most models employed an input window of 168 h and generated forecasts with output windows ranging from 1 to 144 h. For each dataset, we followed a chronological split, dividing it into three distinct subsets: 80% for the training set, 10% for the validation set, and 10% for the testing set.
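The split and windowing described above might be implemented along the following lines. This is a sketch under stated assumptions: windows slide by one step, and each sample pairs 168 past values with the next `horizon` values; the function names and the stride of one are not taken from the paper.

```python
import numpy as np

def chronological_split(series, train_frac=0.8, val_frac=0.1):
    """Split a series in time order into train, validation and test parts."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    i_train = int(n * train_frac)
    i_val = i_train + int(n * val_frac)
    return series[:i_train], series[i_train:i_val], series[i_val:]

def make_windows(series, input_len=168, horizon=24):
    """Build (X, Y) pairs: input_len past steps -> next `horizon` steps."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - input_len - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested windows")
    X = np.stack([series[i:i + input_len] for i in range(n_samples)])
    Y = np.stack([series[i + input_len:i + input_len + horizon]
                  for i in range(n_samples)])
    return X, Y
```

Splitting before windowing keeps every test target strictly later in time than any training sample, which is the point of a chronological split.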

4.2. Configurations

We utilized two evaluation metrics for univariate forecasting, namely the Mean Absolute Error (MAE, $k = 1$) and Mean Squared Error (MSE, $k = 2$), which are defined as follows:
$$\mathrm{Error} = \frac{1}{n} \sum_{i=1}^{n} \left\| Y_i - \hat{Y}_i \right\|_k^k \tag{14}$$
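For scalar targets, the two metrics reduce to the mean of the absolute error raised to the power k. A small NumPy sketch (the function name is illustrative) is:

```python
import numpy as np

def forecast_error(y_true, y_pred, k):
    """Mean of |y_true - y_pred| ** k: k=1 gives MAE, k=2 gives MSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred) ** k))
```

Because MSE squares each deviation, it penalizes occasional large misses more heavily than MAE, which is one reason to report both.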
In our study, we employ input sequences of a fixed length of 168 data points as the basis for forecasting. Our forecasting horizon extends from short-term predictions of 1 to 24 time steps ahead to long-term forecasting, on which we place particular focus, spanning 36, 48, 60, 72, 84, 96, 108, 120, 132 and 144 steps into the future. To ensure consistency and comparability, we normalize our input sequences. During training, we utilize two loss functions, Huber and Mean Squared Error (MSE), each serving specific aspects of our predictive goals. Additionally, we employ a ReduceLROnPlateau callback with the Adam optimizer to dynamically adjust the learning rate over ten epochs of training. This setup aims to explore the model’s capabilities across various forecasting horizons. All the models were trained and tested on a single Nvidia A100 40GB GPU. The source code is available at (accessed on 23 September 2023).
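For reference, the Huber loss mentioned above can be written as follows. The threshold `delta` is set to 1.0 here purely as an assumption, since the text does not state the value used; the function name is illustrative.

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones.

    Assumption: delta = 1.0 is a placeholder default, not the paper's value.
    """
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_err = np.abs(err)
    quadratic = 0.5 * err ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return float(np.mean(np.where(abs_err <= delta, quadratic, linear)))
```

Compared with MSE, the linear branch limits the influence of isolated consumption spikes on the gradient, while small errors are still treated quadratically.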

4.3. Model Comparison

Table 2, Table 3, Table 4 and Table 5 present a comprehensive analysis of DelayNet’s performance in comparison to other baseline models across the CNU, Gyeonggi, France, and Spain datasets, respectively. In these experiments, DelayNet is configured as follows: the kernel size is 3, the gap is 6, the delay factor is 7, the number of filters is 16, and the number of stacks is 3. The results reveal that DelayNet outperforms the other models, namely LSTM, MLP, GRU, 2layers-TCN, ARIMA, and 2layers-Stride-TCN, with average (MSE, MAE) decreases across horizons of (21.23%, 6.64%), (9.42%, 2.21%), (17.05%, 2.74%), (21.71%, 7.03%), (67.18%, 43.60%) and (38.4%, 19.82%), respectively, on the CNU dataset. Similarly, on the Gyeonggi dataset, DelayNet outperforms the other models by margins of (10.28%, 6.22%), (7.0%, 8.47%), (13.45%, 9.57%), (5.81%, 7.22%), (54.37%, 27.51%), and (23.22%, 17.23%). On the Spain dataset, DelayNet also outperforms the other models by margins of (9.46%, 6.85%), (4.19%, 2.48%), (3.89%, 2.54%), (11.78%, 6.32%), (65.67%, 45.65%) and (31.94%, 18.62%). This demonstrates the robust predictive capabilities of DelayNet, making it a compelling choice for time series forecasting tasks. We compare the models’ parameter counts in Figure 6.
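The text does not state exactly how the average decreases are computed. One plausible reading, shown here only as an assumption, is the relative error reduction at each forecasting horizon averaged over all horizons; the function name is illustrative.

```python
def mean_relative_decrease(baseline_errors, model_errors):
    """Average percentage decrease of model error relative to a baseline.

    Assumption: the decrease is computed per horizon as (b - m) / b and
    then averaged; the paper may aggregate differently.
    """
    pairs = list(zip(baseline_errors, model_errors))
    if not pairs:
        raise ValueError("at least one horizon is required")
    return 100.0 * sum((b - m) / b for b, m in pairs) / len(pairs)
```

Averaging the per-horizon ratios weights every horizon equally, whereas taking the ratio of averaged errors would let horizons with larger errors dominate, so the two conventions can give different percentages.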
The proposed model demonstrates better performance not only in short-term predictions but also in longer-term forecasting scenarios. For instance, on the CNU (Table 2) and Gyeonggi (Table 3) datasets, DelayNet surpasses all other methods in terms of MSE for prediction horizons ranging from 72 to 144 time steps. Moreover, even on larger datasets like France, DelayNet performs well, securing the second-best results with only a slight margin (0.07% in MSE and 1.08% in MAE) behind the leading GRU-based methods (Table 4). These outcomes underscore the model’s proficiency in capturing both short- and long-term patterns within time series data.
To further examine the robustness of our method, we measure the runtime of DelayNet and TCN when the forecasting range is 1 h (Figure 7). The results indicate that DelayNet performs slightly worse than TCN in this setting, with discernible gaps of approximately 1% and 1.5 s in the measured metrics, respectively.

4.4. Depth Analysis

Figure 8 compares DelayNet models of varying sizes against a seven-layer TCN model. The DelayNet models differ in their number of parameters (small: 62 K, medium: 148 K, large: 649 K), while the TCN model comprises 640 K parameters. The analysis reveals that DelayNet consistently delivers competitive forecasting accuracy across different sizes, despite variations in parameter count. Remarkably, the medium-sized DelayNet (DelayNet-m) and the TCN exhibit similar MAE values, underscoring their robust predictive capabilities. Intriguingly, the small-sized DelayNet (DelayNet-s), despite its reduced parameter count, also performs exceptionally well, achieving MAE values that are on par with its larger counterparts. This underscores DelayNet’s ability to balance model complexity and predictive accuracy, making it an appealing choice for a wide range of forecasting tasks. Nonetheless, as the model depth increases with DelayNet-l, we observe an escalation in errors compared to the other models. While this phenomenon may suggest overfitting, there are instances in which DelayNet-l outperforms TCN within certain timeframes. On average, DelayNet-s, DelayNet-m and DelayNet-l demonstrate approximately 13.44%, 12.24% and 12.99% improvements in forecasting accuracy over TCN-7layers across the 10 forecasting horizons.

4.5. Light-Weight Model Comparison

The baseline DelayNet architecture relies on a kernel size of 3 and a gap of 5, ensuring that the gap size exceeds the kernel size. It references data from the past, specifically two time steps back, within each kernel mask and employs 16 filters within each kernel mask. The model consists of a single block, which is the minimum required. In contrast, the Light-DelayNet architecture adopts a larger kernel size of 12 and a substantial gap of 24, which provides a more extensive distance between connections within a kernel mask and again exceeds the kernel size. Additionally, Light-DelayNet incorporates three connections from the past into each kernel mask and retains 16 filters within each kernel mask. To enhance model performance, Light-DelayNet increases depth by employing two blocks, offering greater potential for capturing complex temporal patterns and improving forecasting accuracy.
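The two configurations described above can be written down as a small Python sketch that also enforces the two stated constraints (the gap exceeds the kernel size, and there is at least one block). The class and field names are hypothetical and chosen for illustration; they are not the authors' API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayNetConfig:
    kernel_size: int
    gap: int       # distance between connections within a kernel mask
    delay: int     # number of past connections per kernel mask
    filters: int
    blocks: int

    def __post_init__(self):
        # Constraints stated in the text.
        if self.gap <= self.kernel_size:
            raise ValueError("gap must exceed kernel_size")
        if self.blocks < 1:
            raise ValueError("at least one block is required")


# Values taken from the description of the two architectures.
baseline = DelayNetConfig(kernel_size=3, gap=5, delay=2, filters=16, blocks=1)
light = DelayNetConfig(kernel_size=12, gap=24, delay=3, filters=16, blocks=2)
print(baseline)
print(light)
```

Validating at construction time means an invalid combination, such as a gap of 3 with a kernel size of 3, fails immediately instead of producing a silently mis-specified model.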
In the architectural comparison between Light-TCN and Light-DelayNet, both models employ a kernel size of 12 and 16 filters, and both process input sequences of width 168, indicating a foundational structural similarity. However, Light-TCN uses the dilation list [1, 2] to capture diverse temporal patterns, enabling the model to attend to different time scales within the data. In contrast, Light-DelayNet opts for a more straightforward design with a single stack, reflecting a streamlined approach. Both Light-TCN and Light-DelayNet possess the same number of parameters, totaling 9521. These differences highlight the trade-off between complexity and adaptability in time series forecasting models, offering researchers flexibility in choosing the most suitable model.
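The reported parameter count can be sanity-checked with a short calculation. The sketch below assumes a residual-block layout that is common in TCN implementations: two convolutions per dilation level, a 1x1 convolution on the skip path when channel counts differ, a univariate input, and a single-output dense head. This layout is an assumption and is not spelled out in this section, although it does reproduce the figure of 9521 for Light-TCN. The function names are hypothetical.

```python
def conv1d_params(kernel_size, in_channels, out_channels):
    """Weights plus biases of a 1D convolution layer."""
    return kernel_size * in_channels * out_channels + out_channels


def tcn_params(kernel_size, filters, dilations, in_channels=1, outputs=1):
    """Parameter count of a TCN under the assumed residual-block layout."""
    total = 0
    channels = in_channels
    for _ in dilations:  # dilation changes the receptive field, not the count
        total += conv1d_params(kernel_size, channels, filters)
        total += conv1d_params(kernel_size, filters, filters)
        if channels != filters:
            # 1x1 convolution to match channels on the skip connection
            total += conv1d_params(1, channels, filters)
        channels = filters
    total += filters * outputs + outputs  # dense output head
    return total


print(tcn_params(kernel_size=12, filters=16, dilations=[1, 2]))  # 9521
```

Under these assumptions the first block contributes 3328 parameters, the second 6176, and the output head 17.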
Figure 9 illustrates the MAE comparison between Light-DelayNet and Light-TCN across four datasets when predicting energy consumption over a future time horizon of 10 h. On all datasets, Light-DelayNet consistently outperforms Light-TCN in terms of forecasting error. Notably, both models exhibit an increase in prediction errors as the forecasting range extends further into the future, though the discrepancy between their performance diminishes as the number of time steps increases. Moreover, the performance gap between the two models becomes more noticeable on larger datasets, such as Gyeonggi and France. Specifically, the average discrepancy in MAE over the forecasting range is 6.26%, 2.13%, 10.80% and 0.9% for Gyeonggi, France, CNU, and Spain, respectively.
Through experiments, DelayNet exhibits remarkable adaptability in time series prediction, showcasing its prowess in forecasting over both short and extended timeframes. Drawing inspiration from TCN while surpassing its performance in simpler and more intricate cases, DelayNet is tailored for datasets of moderate size, achieving a harmonious balance between efficiency and efficacy. Furthermore, its scalability, achieved through the stacking of DelayBlocks, bolsters its suitability for addressing complex forecasting challenges. To sum up, DelayNet’s aptitude for capturing historical patterns, its flexibility across dataset sizes, and its potential for expansion make it a compelling choice for a diverse spectrum of time series forecasting applications.

5. Conclusions

While various deep learning methodologies have been applied to power consumption forecasting, they have increasingly grown in size and computational complexity. In practical applications, especially within small to medium-sized buildings, the resource constraints of computing power become a critical consideration.
To address these challenges, we have proposed a specialized deep learning model tailored for real-building power consumption prediction. Our DelayNet effectively enhances the receptive field while significantly reducing the model’s parameter count and computational complexity. This model outperforms traditional approaches, including RNN, GRU, TCN, and ARIMA, by 21.23%, 43.60%, 17.05% and 21.71% in terms of MSE error.
Furthermore, we have made significant contributions by providing access to valuable datasets, such as the CNU building power usage dataset and the Gyeonggi-do building power usage dataset, which offer rich resources for future power consumption prediction research.
Future research opportunities for enhancing DelayNet include integrating external variables like weather and holidays for improved real-world predictions, fine-tuning hyperparameters to tailor the model for specific contexts, and exploring transfer learning to leverage knowledge across different datasets or domains, ultimately aiming to boost its forecasting performance and practical applicability.

Author Contributions

Conceptualization, L.H.A. and J.-Y.K.; methodology, L.H.A., G.-H.Y. and D.T.V.; software, L.H.A.; validation, H.-G.K. and J.-Y.K.; formal analysis, D.T.V.; investigation, G.-H.Y. and D.T.V.; resources, H.-G.K.; data curation, L.H.A. and G.-H.Y.; writing—original draft preparation, L.H.A. and G.-H.Y.; writing—review and editing, D.T.V., H.-G.K. and J.-Y.K.; visualization, L.H.A.; supervision, J.-Y.K.; project administration, G.-H.Y., H.-G.K. and J.-Y.K.; funding acquisition, G.-H.Y. and J.-Y.K. All authors have read and agreed to the published version of the manuscript.


Funding

This work was partly supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-02068, Artificial Intelligence Innovation Hub) and the MSIT (Ministry of Science and ICT), Korea, under the Innovative Human Resource Development for Local Intellectualization support program (IITP-2023-RS-2022-00156287) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).

Data Availability Statement

All datasets in the study are available online. The CNU dataset can be downloaded from (accessed on 23 September 2023). The Gyeonggi dataset can be found (accessed on 23 September 2023). The two public datasets, Spain and France, can be found online.

Conflicts of Interest

The authors declare no conflict of interest.


  1. Min, S.-K.; Son, S.-W.; Seo, K.-H.; Kug, J.-S.; An, S.-I.; Choi, Y.-S.; Jeong, J.-H. Changes in weather and climate extremes over Korea and possible causes: A review. Asia-Pac. J. Atmos. Sci. 2015, 51, 103–121. [Google Scholar] [CrossRef]
  2. Kim, M.; Sung, K. Impact of abnormal climate events on the production of Italian ryegrass as a season in Korea. J. Anim. Sci. Technol. 2021, 63, 77. [Google Scholar] [CrossRef] [PubMed]
  3. Park, J.; Seo, B. Nonlinear impact of temperature change on electricity demand: Estimation and prediction using partial linear model. Korean J. Appl. Stat. 2019, 32, 703–720. [Google Scholar]
  4. Ibrahim, H.; Ilinca, A.; Perron, J. Energy storage systems—Characteristics and comparisons. Renew. Sustain. Energy Rev. 2008, 12, 1221–1250. [Google Scholar] [CrossRef]
  5. Hwang, J.; Suh, D.; Otto, M.-O. Forecasting electricity consumption in commercial buildings using a machine learning approach. Energies 2020, 13, 5885. [Google Scholar] [CrossRef]
  6. Mariano-Hernández, D.; Hernández-Callejo, L.; Zorita-Lamadrid, A.; Duque-Pérez, O.; García, F.S. A review of strategies for building energy management system: Model predictive control, demand side management, optimization, and fault detect & diagnosis. J. Build. Eng. 2021, 33, 101692. [Google Scholar]
  7. Khan, M.H.; Asar, A.U.; Ullah, N.; Albogamy, F.R.; Rafique, M.K. Modeling and optimization of smart building energy management system considering both electrical and thermal load. Energies 2022, 15, 574. [Google Scholar] [CrossRef]
  8. Yildiz, B.; Bilbao, J.I.; Dore, J.; Sproul, A. Household electricity load forecasting using historical smart meter data with clustering and classification techniques. In Proceedings of the 2018 IEEE Innovative Smart Grid Technologies—Asia (ISGT Asia), Singapore, 22–25 May 2018; pp. 873–879. [Google Scholar]
  9. Khorsheed, E. Long-term energy peak load forecasting models: A hybrid statistical approach. In Proceedings of the Advances in Science and Engineering Technology International Conferences (ASET), Dubai, Sharjah, Abu Dhabi, United Arab Emirates, 6 February–5 April 2018. [Google Scholar]
  10. Çamurdan, Z.; Ganiz, M.C. Machine learning based electricity demand forecasting. In Proceedings of the International Conference on Computer Science and Engineering (UBMK), Antalya, Turkey, 5–8 October 2017. [Google Scholar]
  11. Lee, D.-G.; Sun, Y.-G.; Kim, S.-H.; Sim, I.; Hwang, Y.-M.; Kim, J.-Y. Comparison of power consumption prediction scheme based on artificial intelligence. J. Inst. Internet Broadcast. Commun. 2019, 19, 161–167, 2019. [Google Scholar]
  12. Kim, T.-Y.; Cho, S.-B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81. [Google Scholar] [CrossRef]
  13. L’Heureux, A.; Grolinger, K.; Capretz, M.A. Transformer-based model for electrical load forecasting. Energies 2022, 15, 4993. [Google Scholar] [CrossRef]
  14. Anh, L.H.; Yu, G.H.; Vu, D.T.; Kim, J.S.; Lee, J.I.; Yoon, J.C.; Kim, J.Y. Stride-TCN for Energy Consumption Forecasting and Its Optimization. Appl. Sci. 2022, 12, 9422. [Google Scholar] [CrossRef]
  15. Fumo, N.; Rafe Biswas, M.A. Regression analysis for prediction of residential energy consumption. Renew. Sustain. Energy Rev. 2015, 47, 332–343. [Google Scholar] [CrossRef]
  16. Amber, K.; Aslam, M.; Hussain, S. Electricity consumption forecasting models for administration buildings of the UK higher education sector. Energy Build. 2015, 90, 127–136. [Google Scholar] [CrossRef]
  17. Contreras, J.; Espinola, R.; Nogales, F.; Conejo, A. ARIMA models to predict next-day electricity prices. IEEE Trans. Power Syst. 2003, 18, 1014–1020. [Google Scholar] [CrossRef]
  18. Nepal, B.; Yamaha, M.; Yokoe, A.; Yamaji, T. Electricity load forecasting using clustering and ARIMA model for energy management in buildings. Jpn. Archit. Rev. 2020, 3, 62–76. [Google Scholar] [CrossRef]
  19. Bogomolov, A.; Lepri, B.; Larcher, R.; Antonelli, F.; Pianesi, F.; Pentland, A. Energy consumption prediction using people dynamics derived from cellular network data. EPJ Data Sci. 2016, 5, 13. [Google Scholar] [CrossRef]
  20. Chen, Y.; Xu, P.; Chu, Y.; Li, W.; Wu, Y.; Ni, L.; Bao, Y.; Wang, K. Short-term electrical load forecasting using the Support Vector Regression (SVR) model to calculate the demand response baseline for office buildings. Appl. Energy 2017, 195, 659–670. [Google Scholar] [CrossRef]
  21. Dong, X.; Qian, L.; Huang, L. Short-term load forecasting in smart grid: A combined CNN and K-means clustering approach. In Proceedings of the IEEE International Conference on Big Data and Smart Computing, Jeju, Republic of Korea, 13–16 February 2017. [Google Scholar]
  22. Zheng, J.; Xu, C.; Zhang, Z.; Li, X. Electric load forecasting in smart grids using long-short-term-memory based recurrent neural network. In Proceedings of the 2017 51st Annual Conference on Information Sciences and Systems (CISS), Baltimore, MD, USA, 22–24 March 2017. [Google Scholar]
  23. Rafi, S.H.; Deeba, S.R.; Hossain, E. A short-term load forecasting method using integrated CNN and LSTM network. IEEE Access 2021, 9, 32436–32448. [Google Scholar] [CrossRef]
  24. Marino, D.L.; Amarasinghe, K.; Manic, M. Building energy load forecasting using deep neural networks. In Proceedings of the IECON 2016-42nd Annual Conference of the IEEE Industrial Electronics Societ, Florence, Italy, 23–26 October 2016. [Google Scholar]
  25. Li, D.; Sun, G.; Miao, S.; Gu, Y.; Zhang, Y.; He, S. A short-term electric load forecast method based on improved se-quence-to-sequence GRU with adaptive temporal dependence. Int. J. Electr. Power Energy Syst. 2022, 137, 107627. [Google Scholar] [CrossRef]
  26. Sehovac, L.; Grolinger, K. Deep learning for load forecasting: Sequence to sequence recurrent neural networks with attention. IEEE Access 2020, 8, 36411–36426. [Google Scholar] [CrossRef]
  27. He, Y.; Zhao, J. Temporal convolutional networks for anomaly detection in time series. J. Phys. Conf. Ser. 2019, 1213, 042050. [Google Scholar] [CrossRef]
  28. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All You Need. Advances in Neural Information Processing Systems 2017. Available online: (accessed on 23 September 2023).
  29. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115. [Google Scholar] [CrossRef]
  30. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430. [Google Scholar]
  31. Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. Fedformer: Frequency Enhanced Decomposed Transformer for Long-Term Series Forecasting. International Conference on Machine Learning. 2022, pp. 27268–27286. Available online: (accessed on 23 September 2023).
  32. Lim, B.; Zohren, S. Time-series forecasting with deep learning: A survey. Philos. Trans. R. Soc. A 2021, 379, 20200209. [Google Scholar] [CrossRef] [PubMed]
  33. Chen, Y.; Kang, Y.; Chen, Y.; Wang, Z. Probabilistic forecasting with temporal convolutional neural network. Neurocomputing 2020, 399, 491–501. [Google Scholar] [CrossRef]
  34. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar]
  35. Individual Household Electric Power Consumption. Available online: (accessed on 23 September 2023).
  36. Parate, A.; Bhoite, S. Individual Household Electric Power Consumption Forecasting Using Machine Learning Algorithms. Int. J. Comput. Appl. Technol. 2019, 8, 371–376. [Google Scholar] [CrossRef]
  37. Mey, O.; Schneider, A.; Enge-Rosenblatt, O. Prediction of energy consumption for variable customer portfolios including aleatoric uncertainty estimation. In Proceedings of the 10th International Conference on Power Science and Engineering (ICPSE), Istanbul, Turkey, 21–23 October 2021; pp. 61–71. [Google Scholar]
  38. Anh, L.H. Github. Available online: (accessed on 23 September 2023).
  39. Anh, L.H. Github. Available online: (accessed on 23 September 2023).
Figure 1. Deep learning models for univariate time series forecasting can be categorized into four distinct types.
Figure 2. Original TCN with dilated convolutions (left) and Stride-Dilated TCN (right).
Figure 3. Overall architecture: Delay Block (left), and Connection between two layers in DelayNet (right).
Figure 4. Characteristics of electricity consumption-time series data.
Figure 5. Comparison of Dataset Sizes—Public (Spain and France) vs. Private (CNU and Gyeonggi).
Figure 6. Number of parameters comparison.
Figure 7. MSE of our DelayNet compared with TCN.
Figure 8. DelayNet’s depth comparison, experiments on France dataset. Errors (left) of each model and its corresponding values are given in the table (right).
Figure 9. Light-DelayNet performance compared with Light-TCN on four datasets.
Table 1. Summary of Dataset.
Dataset | Length | No. Variables | Attributes
France | 34,589 | 7 | Global active power, Global reactive power, Voltage, Global intensity, Submetering 1, Submetering 2, Submetering 3
Spain | 8760 | 2 | Energy consumption, Outside temperature
CNU | 11,209 | 2 | Energy consumption, Outside temperature
Gyeonggi | 17,001 | 1 | Energy consumption
Table 2. Performance on CNU dataset.
Methods | DelayNet | LSTM | MLP | GRU | TCN-2 Layers | ARIMA | Stride-TCN 2 Layers
Better (on average), MSE / MAE (%) | - | 21.23 / 6.64 | 9.42 / 2.21 | 17.05 / 2.74 | 21.71 / 7.03 | 67.18 / 43.60 | 38.04 / 19.82
Table 3. Performance on Gyeonggi dataset.
Methods | DelayNet | LSTM | MLP | GRU | TCN-2 Layers | ARIMA | Stride-TCN 2 Layers
Better (on average)
Table 4. Performance on France dataset.
Methods | DelayNet | LSTM | MLP | GRU | TCN-2 Layers | ARIMA | Stride-TCN 2 Layers
Better (on average), MSE / MAE (%) | - | 2.78 / 0.46 | 1.46 / 0.67 | −0.07 / −1.08 | 3.99 / 2.42 | 55.26 / 29.58 | 18.49 / 15.01
Table 5. Performance on Spain dataset.
Methods | DelayNet | LSTM | MLP | GRU | TCN-2 Layers | ARIMA | Stride-TCN 2 Layers
Better (on average), MSE / MAE (%) | - | 9.64 / 6.85 | 4.19 / 2.48 | 3.89 / 2.54 | 11.78 / 6.32 | 65.67 / 45.65 | 31.94 / 18.62
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Anh, L.H.; Yu, G.-H.; Vu, D.T.; Kim, H.-G.; Kim, J.-Y. DelayNet: Enhancing Temporal Feature Extraction for Electronic Consumption Forecasting with Delayed Dilated Convolution. Energies 2023, 16, 7662.
