
Autonomous Energy Management by Applying Deep Q-Learning to Enhance Sustainability in Smart Tourism Cities

Pannee Suanpang
Pitchaya Jamjuntr
Kittisak Jermsittiparsert
Phuripoj Kaewyong
Faculty of Science and Technology, Suan Dusit University, Bangkok 10300, Thailand
Computer Engineering Department, King Mongkut’s University of Technology Thonburi, Bangkok 10140, Thailand
Faculty of Education, University of City Island, 9945 Famagusta, Cyprus
Faculty of Social and Political Sciences, Universitas Muhammadiyah Sinjai, Kabupaten Sinjai 92615, Sulawesi Selatan, Indonesia
Faculty of Social and Political Sciences, Universitas Muhammadiyah Makassar, Kota Makassar 90221, Sulawesi Selatan, Indonesia
Publication Research Institute and Community Service, Universitas Muhammadiyah Sidenreng Rappang, Sidenreng Rappang Regency 91651, South Sulawesi, Indonesia
Author to whom correspondence should be addressed.
Energies 2022, 15(5), 1906;
Submission received: 5 February 2022 / Revised: 28 February 2022 / Accepted: 1 March 2022 / Published: 4 March 2022
(This article belongs to the Special Issue Behavioral Models for Energy with Applications)


Autonomous energy management is becoming a significant mechanism for attaining sustainability in energy management. This research paper therefore aimed to apply deep reinforcement learning algorithms to an autonomous energy management system for a microgrid. The paper proposed a novel microgrid model that combined a household load, renewable energy, an energy storage system, and a generator, all connected to the main grid. The proposed autonomous energy management system was designed to coordinate the various flexible sources and loads by defining the priority of resources, loads, and electricity prices. The system was implemented with deep reinforcement learning algorithms that controlled the power storage, solar panels, generator, and main grid, and it achieved good performance with near-optimal policies. As a result, this method could save 13.19% in cost compared to manual control of energy management. The study focused on applying Q-learning to microgrids in the tourism industry in remote areas that can produce and store energy. In future work, the system could be improved by applying deep learning to energy price data to predict future energy prices, so that when the system produces more energy than is demanded, it could store the surplus and sell it at the most appropriate price; this would make the autonomous energy management system smarter and provide better benefits for the tourism industry. The proposed autonomous energy management could also be applied to other industries, for example businesses or factories that need effective energy management to maintain microgrid stability and save energy.

1. Introduction

With the advancement of information technology in the era of digital disruption, tourism businesses are transforming the way they operate by adopting new technology to support their operations and move toward sustainable development [1,2,3,4]. Sustainable Development Goal 7 (SDG7), as described by the United Nations Environment Programme (UNEP), aims to achieve “cheap, reliable, sustainable, and modern energy for all” by 2030. Its three main targets serve as the cornerstone for the researchers’ efforts: (1) ensure that everyone has access to energy services that are affordable, reliable, and contemporary; (2) significantly increase the share of renewable energy in the global energy mix; (3) double the global rate of energy efficiency improvement [5]. In this regard, advanced technology is emerging as a means of sustaining and managing energy in order to go green and preserve the environment, moving toward sustainability [6]. To reduce any potential mismatch between supply and demand, an autonomous energy management system would combine various energy sources, both renewable and non-renewable, with energy storage systems (ESS) to meet the demand of the loads. Such a system could be connected to the main grid at the point of common coupling (PCC) or operated off-grid, and the microgrid’s operation could support green energy. When a fault develops in the linked power system, autonomous energy management would switch to an isolated mode. Hence, microgrids provide a number of advantages, including reducing greenhouse gas emissions, supporting reactive power to raise the voltage profile, decentralizing the energy supply, and responding to the demand. By 2024, the global deployment of microgrids is estimated to reach 8.8 GW.
Moreover, microgrids have been installed in rural places, towns, and a variety of industries, including commercial, industrial, and military, based on their goals, load types, and geographical and climatic conditions [7].
Autonomous energy management has also been applied in the tourism industry, where the demand for energy is growing at an accelerating pace due to internationalization and the development of civilization [8]. Hotels and resorts are a very important accommodation service business in the Thai tourism industry, which generated an income of 3.076 trillion Thai Baht for the country in 2019 (reference). The amount of energy demand depends on many factors, such as the nature and style of the building, the usage by guests, the number of rooms, outdoor temperature maintenance, etc. Therefore, if a hotel or resort applies energy management to its business operations by using energy effectively in each section and reducing unnecessary energy use, it can save costs and electricity and reduce the waste of natural resources. Today, many hotels and resorts are able to generate electricity in several ways, including solar cells or gasoline generators, which adds complexity to the power system of the tourism business [9]. Therefore, smart energy management is a necessary system for the tourism business.
With the advent of technology in the digital disruption era, energy conservation systems have been widely implemented in the tourism industry and have become an important aspect of turning an attraction into a smart tourism city [6]. A smart city is a sustainable and efficient urban center that delivers a high quality of life for a large number of people, while also requiring effective resource management. Energy management is therefore one of the most essential concerns in such urban centers, where the energy networks are complex, and smart energy management is key to solving this problem. As a consequence, modeling and simulation are applied to find smart solutions, as well as to plan the most appropriate ways to change existing cities into smarter ones [10]. In general, the energy planning and operation models of a smart city cover generation, storage, infrastructure, facilities, and transport, which together form a complex power system that calls for an autonomous energy management system. Thus, deep Q-learning is a promising research topic in the energy field [11].
Many tourism locations are situated in remote areas that are quite far from the main power grids. Consequently, these grids are unable to support these tourism attractions effectively, with the result that many do not have sufficient energy to operate their business [12]. Each microgrid has a different objective and capacity and coordinates various power resources with a large number of loads. Microgrids can be described as clusters of loads, distributed generation units, and energy storage systems that are operated in coordination to reliably supply electricity and are connected to the host power system at the distribution level at a single point of connection, the PCC [13]. Microgrids can also be totally self-contained and independent of the grid (off-grid).
The purpose of this study was to investigate a deep Q-learning artificial intelligence (AI) model for automatically regulating an energy management system (EMS) that would preserve the energy reserve, maximize the overall system’s efficiency, and optimize the dispatch of local resources [2,3,4,5,6]. The EMS had significant hurdles as a result of the microgrid’s structure, including the small size, volatility, uncertainty, and intermittency of the distributed energy resources (DER), as well as demand unpredictability and dynamic power market prices. Further advancements in microgrid construction and control would also be necessary to overcome these obstacles. To mitigate the significant volatility of the DER, additional sources of flexibility would need to be used at the architectural level. In addition, to improve the energy dispatch and overcome the uncertainties of the microgrid’s components, new control mechanisms and intelligent control approaches would be required.

2. Literature Review

2.1. Autonomous Energy Management

The term “autonomous energy management” still has no specific definition. However, the literature review found that the high demand for energy production is the main problem behind this concept. In response, many scholars have studied the new paradigm of energy autonomy in several countries; in [13,14], the authors studied local energy organization management in the European Union and proposed autonomous regional energy organizations. Furthermore, that study proposed a solution for the preparation and implementation of grid services as a part of the local public autonomous energy system [15]. In [16,17], the authors proposed another idea in terms of technology development to support renewable energy sources. They proposed a multipurpose optimization method for sizing an autonomous energy system consisting of diesel, wind, battery storage, and photovoltaic systems, as well as a load switching system [17].
Currently, power systems are principally based on large-scale power plants like coal, hydro, natural gas, and nuclear. However, those forms of energy are based on non-renewable energy sources. In addition, every country around the world has been concerned about the environment and energy resources. As a result, renewable resources like solar and wind were investigated and integrated into the system [14,18]. At present, the power system is controlled centrally, and the power is synchronously generated in the power plants and flows from the central power station to the customers in a single direction [16]. With this centralized energy management, the customers would depend on the central power station as the only source. However, in the event that the power station experienced some problems, this would affect the customers.
With the worldwide growth in the digital era, the demand for energy is further increasing. At the same time, many new technologies are being integrated into future power systems. First of all, there are more kinds of distributed energy source technologies, such as solar, wind, and combined heat and power generators [18,19]. These technologies differ from the conventional models in that they typically have a power inverter interface to connect to the grid. There are also new distributed technologies, such as distributed storage, flexible loads, and electric vehicles (EVs). Together, these have led to a highly complex situation in the control and operation of the energy system.
Nowadays, microgrid technology has provided a solution for autonomous energy management from distributed energy sources [8]. A microgrid is a small low-voltage or medium-voltage system, which has integrated the power generator, electric load, information technology, and communication systems. The combined energy storage and automatic control system are able to work together as a single system. Typically, microgrid systems are connected to the main grid.
From the literature review, we can summarize the definition of autonomous energy management as the deconstruction of a centralized, large-scale power grid’s control into smaller grids, with decentralized and autonomous control of every load and energy resource, together with data communication, in order to contribute more savings and stability to the energy system.
The advantage of the microgrid system is the reliability of a self-sufficient energy system, which can detect problems in the main grid and switch to its own supply automatically. It can therefore keep facilities such as hospitals, university laboratories, hotels, factories, and electric vehicle charging stations operating continuously. Moreover, the existing literature showed that some energy management systems could decrease the energy costs for business owners. Kapiki [20] found that efficient energy management systems could save hotel owners up to 65% in energy costs. Furthermore, the smart grid and the latest technologies could provide a solid solution to control complex distributed energy systems, such as an autonomous energy management system for green buildings [8]. Additionally, Basit et al. [12] proposed an autonomous energy management system for smart houses, which reduced the cost at peak load times in the home environment. In the study by Raju et al. [21], a multi-agent system (MAS) was implemented for the autonomous energy management of a solar microgrid consisting of two solar photovoltaic (PV) systems. Each component of the microgrid acted as an agent, and together the agents worked toward optimal energy management [22].

2.2. Smart City

The term “smart tourism cities” is gaining renown [2], but there is still no specific definition that can particularly cover this concept. Chung et al. [23] stated “smart tourism cities are indistinct boundaries between tourists and residents in geospatial locations (e.g., urban or destination).” However, behind this concept, most researchers have referred to the terms “smart city” and “smart tourism”. Being “a smart city is using all available resources and technologies to grow to be integrated, habitable, and sustainable in an intelligent and corresponding manner” [24]. Harrison et al. [25] also defined that “a smart city means a city that connects with social substructure, physical substructure, business substructure and IT substructure to take advantage of the city’s collective intelligence.” For the concept of smart tourism, Li et al. [26] defined this as “it is a tourist information service that tourists receive throughout the travel process.” Gretzel et al. [27] indicated that “smart tourism is tourism maintained by a combined endeavor to collect data from the social connection, physical infrastructure, and government with the use of innovative technology to transform that data to on-site experiences and business value schemes with emphasis on efficiency, sustainability, and experience enhancement.” Moreover, Chung et al. [23] introduced the integration of a “smart city” and “smart tourism,” so “smart tourist city” was born. Furthermore, “smart tourism towns are sophisticated tourist destinations that provide sustainable growth that simplifies and increases visitor contact with the destination experience and, as a result, improves the quality of life for the locals” [28].

2.3. Deep Q-Learning

In reinforcement learning, Q-learning is a familiar algorithm that produces a Q-table that an agent can use to find the most appropriate action in each state [29]. In deep Q-learning, neural networks (NN) are used to approximate the Q-value function instead of a table. The state is given as the input, and the Q-values of all possible actions are generated as the output [30]. Additionally, the deep Q-learning algorithm has many benefits for control systems.
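As a minimal sketch of this input/output convention, the following NumPy snippet maps a state vector to one Q-value per action with a single hidden layer. The layer sizes, the random initialization, and the function names `init_q_network` and `q_values` are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def init_q_network(state_dim, n_actions, hidden=16, seed=0):
    """Create random weights for a tiny two-layer Q-network (illustrative sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(state_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, size=(hidden, n_actions)),
        "b2": np.zeros(n_actions),
    }

def q_values(params, state):
    """Forward pass: state vector in, one Q-value per action out."""
    hidden = np.maximum(0.0, state @ params["W1"] + params["b1"])  # ReLU layer
    return hidden @ params["W2"] + params["b2"]  # linear output, no activation

params = init_q_network(state_dim=4, n_actions=3)
state = np.array([0.5, 0.2, 0.0, 1.0])  # hypothetical state features
q = q_values(params, state)
greedy_action = int(np.argmax(q))  # index of the action with the largest Q-value
```

The network here is untrained, so the Q-values are arbitrary; the point is only that a single forward pass scores all actions at once.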
The following literature demonstrates the existing research. James and Johns [31] presented an approach that used deep Q-learning to train a seven-degree-of-freedom robotic arm in a control task without any prior knowledge. Rahman et al. [32] also applied the deep Q-network (DQN) to a self-balancing robot so that the robot model learned the best actions for staying balanced in an environment. Additionally, Qiao et al. [33] proposed handwritten digit recognition using an adaptive deep Q-learning strategy. Furthermore, Zhu et al. [34] studied a deep-Q-learning-based transmission scheduling mechanism for the cognitive Internet of Things (IoT). Moreover, Bui et al. [35] controlled a battery energy storage system by using a double deep-Q-learning-based approach.

3. Materials and Methods

In this paper, the researchers developed a prototype of smart microgrids for tourism cities by building a microgrid virtual environment with an open-source Python tool. Reinforcement learning (RL) allowed the machine to learn how to perform the actions: the machine took actions in the environment in order to optimize a reward signal. In the context of a microgrid, that reward signal could comprise the energy cost, peak load, or safety, depending on which behavior would need to be incentivized. A Markov decision process (MDP) was used to formalize how the agent should respond in an RL scenario. However, because the state space in modern power grids is so large, a conventional RL algorithm would be unable to solve it. Therefore, a deep NN could be used to model the desired policies and value functions, an approach called deep RL.
To apply solutions for sequential decision making based on deep RL, the optimal operation of a microgrid (MG) could be described as a partially observable MDP, in which the MG would be viewed as an agent interacting with its surroundings. In order to approach the Markov property, the state of the system $s_t \in S$ was made up of a history of features of observations $O_t^i$, $i \in \{1, \ldots, N_f\}$, where $N_f \in \mathbb{N}$ would be the total number of features. Each $O_t^i$ would be represented by a series of punctual observations over a predetermined history length $h^i$: $O_t^i = [o_{t-h^i+1}^i, \ldots, o_t^i]$ (the history length may depend on the feature). At each time step, the agent would observe a state variable $s_t$, take an action $a_t \in A$, and move into a state $s_{t+1} \sim P(\cdot \mid s_t, a_t)$. The transition $(s_t, a_t, s_{t+1})$ would be coupled with a reward signal $r_t = \rho(s_t, a_t, s_{t+1})$, where $\rho: S \times A \times S \to \mathbb{R}$ would be the reward function. Then, the $\gamma$-discounted optimal Q-value function would be defined as
$$Q^*(s, a) = \max_{\pi} \mathbb{E}\left[\sum_{k=t}^{\infty} \gamma^{k-t} r_k \,\middle|\, s_t = s,\ a_t = a,\ \pi\right]$$
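The quantity inside the expectation of the optimal Q-value function is a discounted sum of rewards. As a small illustration for a finite reward sequence, the sketch below accumulates that sum backwards; the function name `discounted_return` and the sample numbers are assumptions for illustration only.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k] for a finite reward sequence."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Three unit rewards with gamma = 0.5 give 1 + 0.5 + 0.25.
example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

The optimal Q-value is the largest expected value of this return over all policies, which is what the learning algorithm tries to approximate.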

Value-Based Deep Reinforcement Learning Methods

Using the notation of the MDP formulation, the Q-function would be represented by an NN approximator with parameters Θ. Deep Q-learning (DQN) is one of the most frequently used parameter-tuning techniques, and its goal is to approximate the optimal Q-function directly. In one-step DQN, the parameters are learned by iteratively minimizing a sequence of loss functions, each defined from the one-step return, so that the Q-function is moved toward its one-step target. The researchers also implemented an experience replay mechanism to make more efficient use of previously gained experience. In an experience replay, the learning phase is conceptually separated from the experience-gathering phase, and randomly sampled batches of transitions from an experience dataset are used for learning. Through this technique, the NN could overcome the limitations of non-stationary data distributions, resulting in improved algorithm convergence. It is also worth mentioning that the algorithm did not follow a purely greedy strategy during training, because the search space was also explored at random.
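The experience replay idea can be sketched as a fixed-size buffer from which random batches of transitions are drawn. The class name `ReplayBuffer`, the capacity, and the tuple layout below are illustrative assumptions rather than the implementation used in the study.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        # Store one transition (s, a, r, s', done) gathered by the agent.
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal correlation
        # between consecutive transitions.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)
```

In a training loop, the agent would call `add` after every environment step and, once the buffer holds enough transitions, call `sample` to obtain the batch used for one gradient update.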
The researchers used the storage operating state of the microgrid, $s_t^{MG} \in S^{MG}$, which defined the quantity of energy stored in the storage devices. The quantity of energy stored in the battery was denoted by $s_t^{B}$ [W] $\in S^{B}$ [W], and the energy density of the diesel generator was represented by $s_t^{DG}$ [W] $\in S^{DG}$ [W/kg]. Then, $x^{B}$ [W] (resp. $x^{H_2}$ [Wp] for the hydrogen storage) was introduced as the battery storage capacity, together with the generator output $x^{DG}$ [W]. The variables $\eta^{B}$ and $\zeta^{B}$ denoted the charge and discharge efficiencies of the battery, respectively. Likewise, the efficiencies of the electrolysis and the fuel cells were given by $\eta^{H_2}$ (when storing energy) and $\zeta^{H_2}$ (when delivering energy). The variable $\zeta^{DG}$ was the efficiency of the diesel generator. At each time step, an action $a_t = [a_t^{H_2}, a_t^{DG}, a_t^{B}] \in A_t$ was applied to the system, where $a_t^{H_2}$ was the amount of energy moved into (if positive) or out of (if negative) the hydrogen storage device; similarly, $a_t^{B}$ was the quantity of energy transferred into or out of the battery, and $a_t^{DG}$ was the quantity of energy emitted by the diesel generator (always negative). The dynamics of the battery were determined by $s_{t+1}^{B} = s_t^{B} + \eta^{B} a_t^{B}$ if $a_t^{B} \ge 0$ and $s_{t+1}^{B} = s_t^{B} + a_t^{B} / \zeta^{B}$ otherwise. Similarly, the dynamics of hydrogen were described by $s_{t+1}^{H_2} = s_t^{H_2} + \eta^{H_2} a_t^{H_2}$ if $a_t^{H_2} \ge 0$ and $s_{t+1}^{H_2} = s_t^{H_2} + a_t^{H_2} / \zeta^{H_2}$ otherwise; because the action is negative when energy is withdrawn, the stored quantity decreases in that case. Figure 1 shows the deep reinforcement learning design of the study.
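The storage dynamics can be illustrated with a short function. This sketch assumes the sign convention stated in the text (a positive action stores energy, a negative action withdraws it), so that withdrawing energy lowers the stored level by the delivered amount divided by the discharge efficiency. The function name and the numerical values are hypothetical, and capacity limits are deliberately left out.

```python
def next_storage_level(level, action, charge_eff, discharge_eff):
    """One-step storage update for a battery or hydrogen store.

    action >= 0: energy sent to the store, reduced by the charge efficiency.
    action <  0: energy delivered by the store; more than |action| is drained
                 because of the discharge efficiency.
    """
    if action >= 0:
        return level + charge_eff * action
    return level + action / discharge_eff

# Hypothetical numbers: 10 units stored, 90% efficiency in both directions.
after_charge = next_storage_level(10.0, 2.0, 0.9, 0.9)      # stores 1.8 of the 2.0 sent
after_discharge = next_storage_level(10.0, -1.8, 0.9, 0.9)  # drains 2.0 to deliver 1.8
```

A fuller model would also clip the result to the interval between zero and the storage capacity, which is omitted here to keep the sketch aligned with the two update rules.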
The instantaneous reward signal $r_t$ was calculated by adding the earnings from the generation of hydrogen, $r^{H_2}$, to the penalty $r^{-}$ due to the value of the lost load: $r_t = r(a_t, d_t) = r^{H_2}(a_t, d_t) + r^{-}(a_t, d_t)$. The penalty $r^{-}$ was proportional to the total quantity of energy not delivered to meet the demand: $r^{-}(a_t, d_t) = k\delta_t$ when $\delta_t < 0$ and zero otherwise ($k$ was the cost endured per Wh not supplied within the microgrid), while $r^{H_2}$ was given by $r^{H_2}(a_t, d_t) = k^{H_2} a_t^{H_2}$ ($k^{H_2}$ was the revenue/cost per Wh of hydrogen produced/used). According to the description of the problem, there was no means to supply energy from outside the system (i.e., from the public grid), and the system was not rewarded for it. The operational revenue for year $y$ was calculated from the series of rewards $r_t$ as $M_y = \sum_{t \in \tau_y} r_t$, where $\tau_y$ was the set of time steps belonging to year $y$. The optimal operation of the MG necessitated the development of a sequential decision-making method that led to the maximization of $M_y$ (Algorithm 1).
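The reward described above can be written out as a small function, under the reading that the penalty applies only when the net balance δ is negative. The function names, argument names, and sample prices are hypothetical and serve only to make the arithmetic concrete.

```python
def instantaneous_reward(a_h2, delta, k_h2, k_penalty):
    """Reward = hydrogen revenue/cost plus a penalty for unmet demand.

    a_h2      : energy moved into (+) or out of (-) hydrogen storage
    delta     : net energy balance; negative means demand was not met
    k_h2      : revenue/cost per unit of hydrogen energy produced/used
    k_penalty : cost per unit of energy not supplied
    """
    r_h2 = k_h2 * a_h2
    r_minus = k_penalty * delta if delta < 0 else 0.0  # zero when demand is met
    return r_h2 + r_minus

def yearly_revenue(rewards):
    """Operational revenue of one year: the sum of its per-step rewards."""
    return sum(rewards)

# Hypothetical step: 2 units stored as hydrogen while 1 unit of demand is unmet.
r_shortfall = instantaneous_reward(a_h2=2.0, delta=-1.0, k_h2=0.1, k_penalty=2.0)
r_balanced = instantaneous_reward(a_h2=2.0, delta=0.5, k_h2=0.1, k_penalty=2.0)
```

Because δ is negative in a shortfall, the penalty term lowers the reward, which pushes the learned policy toward keeping the load supplied.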
Algorithm 1
Initialize building parameters.
Initialize Q(s,a) arbitrarily.
Repeat (for each episode):
    Initialize s.
    Repeat (for each step of the episode):
        Choose a from s using the policy derived from Q (ϵ-greedy).
        Take action a.
        Update building states (s’).
        Calculate reward (r).
        Q(s,a) ← r + γ max_a’ Q(s’,a’)
        s ← s’
    until s is terminal.
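To make the loop of Algorithm 1 concrete, the following is a minimal tabular Q-learning sketch on a toy five-state chain, not on the microgrid model. It uses the standard update with a maximum over next actions and a learning rate α; both are assumptions beyond what Algorithm 1 states (setting α = 1 gives an update without the running average). The environment, the hyperparameters, and all names are hypothetical.

```python
import numpy as np

N_STATES, N_ACTIONS, GOAL = 5, 2, 4  # toy chain; actions: 0 = left, 1 = right

def step(state, action):
    """Move along the chain; reaching GOAL ends the episode with reward 1."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def epsilon_greedy(q_row, epsilon, rng):
    """Random action with probability epsilon, else a greedy one (ties at random)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    best = np.flatnonzero(q_row == q_row.max())
    return int(rng.choice(best))

def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=200, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))          # initialize Q(s, a)
    for _ in range(episodes):                     # repeat for each episode
        s = 0                                     # initialize s
        for _ in range(max_steps):
            a = epsilon_greedy(Q[s], epsilon, rng)
            s_next, r, done = step(s, a)          # take action, observe s' and r
            target = r if done else r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (target - Q[s, a]) # move Q(s, a) toward the target
            s = s_next
            if done:                              # until s is terminal
                break
    return Q

Q = train()
greedy_policy = Q[:GOAL].argmax(axis=1)  # learned action for each non-terminal state
```

On this toy problem the greedy policy should learn to move right in every state, since that is the only way to reach the rewarded terminal state.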
The researchers’ experiment replicated the operation of an actual microgrid with PV panels, batteries, and a generator that was not linked to the main utility grid (off-grid). The researchers developed a DQN architecture in which the state vector provided the inputs, and the Q-value of each discretized action was represented by a separate output. The time series inputs of the DQN were processed by a set of 16-filter convolutions with stride 1, followed by a 16-filter convolution with stride 2. The output of the convolutions, together with the other inputs, was fed into two fully connected layers of 50 and 20 neurons, respectively, followed by the output layer. The rectified linear unit (ReLU) was used as the activation function everywhere except in the output layer, where no activation function was employed. Starting from a randomly initialized DQN, the researchers updated the Q-network at each time step. Simultaneously, the agent filled a replay memory with all the observations, actions, and rewards. The agent followed an $\epsilon$-greedy policy, in which the greedy action $\pi(s) = \arg\max_{a \in A} Q(s, a; \Theta_k)$ was selected with probability $1 - \epsilon$, and a random action (drawn uniformly over the actions) was chosen with probability $\epsilon$. The researchers also employed a decreasing value of $\epsilon$ over time. During the validation and test phases, the policy $\pi(s) = \arg\max_{a \in A} Q(s, a; \Theta_k)$ was applied (with $\epsilon = 0$). Figure 2 shows the microgrid diagram of this study.
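A decreasing exploration rate can be sketched as follows. The text does not give the schedule that was used, so the linear decay, the start and end values, and the function names below are assumptions chosen only for illustration.

```python
import numpy as np

def epsilon_at(step, eps_start=1.0, eps_min=0.1, decay_steps=10_000):
    """Linearly decay epsilon from eps_start to eps_min over decay_steps steps."""
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_min - eps_start)

def select_action(q_values, epsilon, rng):
    """Epsilon-greedy choice: explore uniformly with probability epsilon."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
q_example = np.array([0.1, 0.9, 0.3])       # hypothetical Q-values for three actions
test_action = select_action(q_example, epsilon=0.0, rng=rng)  # epsilon = 0: purely greedy
```

With `epsilon=0.0`, as in the validation and test phases, the selection reduces to the arg max over the Q-values.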
The researchers assumed a household power consumer in a holiday village with an off-grid MG (average consumption of 48 kWh/day). Historical data on total solar radiation, measured at a meteorological station in this town, were used as a starting point. The electrical load was calculated using real-time data from typical days in each month. The battery had a capacity of xB = 384 kWh, the diesel generator had a power of xDG = 100 kW, the peak PV power generation was xPV = 75 kWp, and the price of energy consumed outside of the MG was fixed at 2.16 Thai Baht/kWh. The main goal was to minimize the electrical costs, and the reward function was created to maximize the economic profit from the activities. The reward was based on the gross margin from the operations, which was the money generated by selling electricity to the microgrid and to the external grid minus the costs of the power generation, purchases, and transmission from the external grid.

4. Results

The deep RL algorithms were operated in a simulated environment for 50 h, and both the training performances and the daily rewards were recorded. The results are shown in Figures 3–8 and Table 1, which depict the learning processes for each of the RL algorithms. In the simple one-step DQN, the learning curves showed a large amount of instability, while the remaining algorithms displayed a positive learning process that resulted in reasonable convergence.
Figure 3 shows the load (in kilowatts, y-axis) over 9000 h (x-axis).
Figure 4 shows the photovoltaic generation over 9000 h.
Figure 5 shows the load (orange) and the photovoltaic generation (blue) over 24 h.
Figure 6 shows the load (orange) and the photovoltaic generation (blue) over 168 h.
Figure 7 shows the load, photovoltaic generation, battery (charge and discharge), and grid (in and out) over 9000 h.
Figure 8 shows the relation between episodes and reward.

5. Discussion and Conclusions

In this study we applied Q-learning for autonomous energy management. After running the simulation, the results showed this proposed method could save 13.19% in the cost, compared to conducting manual control for energy management. The results showed the average cost of manual control was 1637.32 baht in 24 h; the average cost of control by applying Q-learning was 1431.36 baht in 24 h.
An autonomous energy management system for a residential microgrid for a hotel or resort with multiple sources of flexibility was investigated in this study. The suggested microgrid model took into account the demand flexibility that price-responsive loads could provide. To achieve effective management of the local resources, the suggested autonomous energy management system coordinated the ESS, the main grid, the loads, and the price-responsive loads. The high dimensionality of the variables in the microgrid components encouraged the use of intelligent learning-based methods in autonomous energy management systems, such as deep reinforcement learning (RL) algorithms. The numerical findings revealed that the deep RL methods attained varied levels of convergence. The findings were compared to a theoretical optimal controller with perfect knowledge of the system’s variables and dynamics for the entire day, as well as to an electricity retailer who purchased electricity on the day-ahead market and met the same demand without using a microgrid. The results suggested that the proposed microgrid paradigm had a substantial advantage in terms of financial performance and resilience in the face of adversity. Because of the high complexity and uncertainty of the microgrid components, designing and implementing an effective autonomous energy management system for future microgrids would be a difficult undertaking. Although deep RL approaches have been shown to be successful in simulations, they are far from ideal, and due to data inefficiency, instability, and slow convergence, they face implementation challenges in real-world energy management systems [36,37,38]. As a consequence, the researchers are now working on improving the performance of the deep RL algorithms and expanding their applicability to real-world energy management problems [15,38,39].
From the study, we found that the proposed system of applying Q-learning for autonomous energy management could reduce energy costs by 13.19%, and that applying reinforcement learning could reduce energy costs by 9.74%, compared to manual control.
In future work, autonomous energy management could be improved in several ways. Firstly, when the microgrid produces more energy than is demanded, the system could control the energy storage system to charge or discharge electricity to be sold to the main grid or neighboring microgrids. Secondly, deep RL could be applied to energy planning for selling or buying at the real-time price in energy markets. Finally, the performance of deep Q-learning could be studied further in order to transfer the knowledge gained in simulations to a real application for microgrids in the tourism industry.
Additionally, our proposed autonomous energy management could be applied to other industries, for example businesses or factories that need effective energy management. Although these industries can produce energy, they still need to connect to the main grid. Therefore, our proposed autonomous energy management can help to maintain microgrid stability and also save energy.

Author Contributions

The research conceptualization was by P.S. and P.J.; research methodology by P.S. and P.J.; software and system implementation by P.S. and P.J.; validation by P.J.; formal analysis by P.S. and P.J.; investigation, P.S. and P.J.; resources by P.S. and P.J.; data curation by P.J.; writing—original draft preparation by P.S., P.J. and K.J.; writing—review and editing, P.S., P.J., P.K. and K.J. All authors have read and agreed to the published version of the manuscript.


Funding

This research was funded by Suan Dusit University under the Ministry of Higher Education, Science, Research and Innovation, Thailand, under research project grant number 65-FF-003, Innovation of Smart Tourism to Promote Tourism in Suphan Buri Province.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.


  1. Jermsittiparsert, K.; Chankoson, T. Behavior of Tourism Industry under the Situation of Environmental Threats and Carbon Emission: Time Series Analysis from Thailand. Int. J. Energy Econ. Policy 2019, 6, 366–372. [Google Scholar] [CrossRef]
  2. Suanpang, P.; Sopha, C.; Jakjarus, C.; Leethong-in, P.; Tahanklae, P.; Panyavacharawongse, C.; Phopun, N.; Prasertsut, N. Innovation for Human Capital Development in the Tourism and Hospitality Industry (Frist S-Curve) on the Eastern Economic Corridor (EEE) (Chon Buri-Rayong-Chanthaburi-Trat) to Enrich International Standards and Prominence to High Value Services for Stimulate Thailand to Be Word Class Destination and Support New Normal Paradigm; Suan Dusit University: Bangkok, Thailand, 2021. [Google Scholar]
  3. Suanpang, P.; Jamjuntr, P. A Chatbot Prototype by Deep Learning Supporting Tourism. Psychol. Educ. 2021, 4, 1902–1911. [Google Scholar]
  4. Suanpang, P.; Jamjuntr, P. A comparative study of deep learning methods for time-Series forecasting tourism business recovery from the COVID 19 pandemic crisis. J. Manag. Inf. Decis. Sci. 2021, 24, 1–10. [Google Scholar]
  5. United Nation Environment Programme. SDG 7. Available online: (accessed on 10 January 2022).
  6. Suanpang, P.; Pothipassa, P.; Netwong, T.; Kaewyong, P.; Niamsorn, C.; Chunhaparagu, T.; Donggitt, J.; Webb, P.; Rotprasoet, P.; Songma, S.; et al. Innovation of Smart Tourism to Promote Tourism in Suphan Buri Province; Suan Dusit University: Bangkok, Thailand, 2022. [Google Scholar]
  7. Hirsch, A.; Parag, Y.; Guerrero, J. Microgrids: A review of technologies, key drivers, and outstanding issues. Renew. Sustain. Energy Rev. 2018, 9, 402–411. [Google Scholar] [CrossRef]
  8. Jonban, M.S. Autonomous energy management system with self-healing capabilities for green buildings (microgrids). J. Build. Eng. 2020, 34, 01604. [Google Scholar] [CrossRef]
  9. Parpairia, K. Sustainability and Energy Use in Small Scale Greek Hotels: Energy Saving Strategies and Environmental Policies. Procedia Environ. Sci. 2017, 38, 169–177. [Google Scholar] [CrossRef]
  10. Calvillo, C.F.; Sánchez-Miralles, A.; Villar, J. Energy management and planning in smart citie. Renew. Sustain. Energy Rev. 2016, 55, 273–287. [Google Scholar] [CrossRef] [Green Version]
  11. Arent, D.J.; Barrows, C.; Davis, S.; Grim, G.; Schaidle, J.; Kroposki, B.; Ruth, M.; Van Zandt, B. Integration of energy system. MRS Bulletin 2022, 46, 1–14. [Google Scholar] [CrossRef]
  12. Basit, A.; Sidhu, G.A.S.; Mahmood, A.; Gao, F. Efficient and Autonomous Energy Management Techniques for the Future Smart Homes. IEEE Trans. Smart Grid 2017, 2, 917–926. [Google Scholar] [CrossRef]
  13. Ramamoorty, M.; Venkata, S.N.L.L. Microgrid Protection Systems. In Micro-Grids-Applications, Solutions, Case Studies, and Demonstrations; IntechOpen: London, UK, 2019. [Google Scholar] [CrossRef] [Green Version]
  14. Kumar, M. Social, Economic, and Environmental Impacts of Renewable Energy Resources. In Wind Solar Hybrid Renewable Energy System; Qubeissi, M., El-kharouf, A., Soyhan, H., Eds.; IntechOpen: London, UK, 2020. [Google Scholar] [CrossRef] [Green Version]
  15. Lavrik, A.; Zhukovskiy, Y.; Tcvetkov, P. Optimizing the Size of Autonomous Hybrid Microgrids with Regard to Load Shifting. Energies 2021, 14, 5059. [Google Scholar] [CrossRef]
  16. Rakhshani, E.; Rouzbehi, K.; JSánchez, A.; Tobar, A.C.; Pouresmaeil, E. Integration of Large Scale PV-Based Generation into Power Systems: A Survey. Energies 2019, 8, 1425. [Google Scholar] [CrossRef] [Green Version]
  17. Maśloch, P.; Maśloch, G.; Kuźmiński, Ł.; Wojtaszek, H.; Miciuła, I. Autonomous Energy Regions as a Proposed Choice of Selecting Selected EU Regions—Aspects of Their Creation and Management. Energies 2020, 13, 6444. [Google Scholar] [CrossRef]
  18. Salvarli, M.; Salvarli, H. For Sustainable Development: Future Trends in Renewable Energy and Enabling Technologies. In Renewable Energy: Resources, Challenges and Applications; Okedu, K., Tahour, A., Aissaou, A., Eds.; IntechOpen: London, UK, 2020. [Google Scholar] [CrossRef]
  19. Siemens. Microgrid. Available online: (accessed on 29 January 2022).
  20. Kapiki, S. Energy Management in Hospitality: A Study of the Thessaloniki Hotels. Econ. Organ. Future Enterp. 2010, 1, 78–97. [Google Scholar]
  21. Raju, L.; Milton, R.S.; Morais, A.A. Autonomous Energy Management of a Micro-Grid using Multi Agent System. Indian J. Sci. Technol. 2016, 9, 1–6. [Google Scholar] [CrossRef]
  22. Boudoudouh, S.; Maâroufi, M. Multi agent system solution to microgrid implementation. Sustain. Cities Soc. 2018, 39, 252–261. [Google Scholar] [CrossRef]
  23. Chung, N.; Lee, H.; Ham, J.; Koo, C. Smart Tourism Cities’ Competitiveness Index: A Conceptual Model. In Information and Communication Technologies in Tourism 2021; Wörndl, W., Koo, C., Stienmetz, J.L., Eds.; Springer: Cham, Switzerland, 2021. [Google Scholar] [CrossRef]
  24. Barrionuevo, J.M.; Berrone, P.; Ricart, J.E. Smart Cities, Sustainable Progress. IESE Insight 2012, 14, 50–57. [Google Scholar] [CrossRef]
  25. Harrison, C.; Eckman, B.; Hamilton, R.; Hartswick, P.; Kalagnanam, J.; Paraszczak, J.; Williams, P. Foundations for Smarter Cities. IBM J. Res. Dev. 2010, 54, 1–16. [Google Scholar] [CrossRef]
  26. Li, Y.; Hu, C.; Huang, C.; Duan, L. The concept of smart tourism in the context of tourism information services. Tour. Manag. 2017, 58, 293–300. [Google Scholar] [CrossRef]
  27. Gretzel, U.; Sigala, M.; Xiang, Z.; Koo, C. Smart tourism: Foundations and developments. Electron. Mark. 2015, 25, 179–188. [Google Scholar] [CrossRef] [Green Version]
  28. Ma, H. The Construction Path and Mode of Public Tourism Information Service System Based on the Perspective of Smart City. Complexity 2020, 2020, 1–11. [Google Scholar] [CrossRef]
  29. Fan, J.; Wang, Z.; Xie, Y.; Yang, Z. A Theoretical Analysis of Deep Q-Learning. PMLR 2020, 120, 486–489. [Google Scholar]
  30. Ong, H.Y.; Chavez, K.; Hong, A. Distributed Deep Q-Learning. Available online: (accessed on 28 January 2022).
  31. James, S.; Johns, E. 3D Simulation for Robot Arm Control with Deep Q-Learning. Available online: (accessed on 25 January 2022).
  32. Rahman, M.; Rashid, S.M.H.; Hossain, M.M. Implementation of Q learning and deep Q network for controlling a self balancing robot model. Robot. Biomim. 2018, 5, 1–6. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  33. Qiao, J.; Wang, G.; Li, W.; Chen, M. An adaptive deep Q-learning strategy for handwritten digit recognition. Neural Netw. 2018, 107, 61–71. [Google Scholar] [CrossRef] [PubMed]
  34. Zhu, J.; Song, Y.; Jiang, D.; Song, H. A New Deep-Q-Learning-Based Transmission Scheduling Mechanism for the Cognitive Internet of Things. IEEE Internet Things J. 2017, 5, 2375–2385. [Google Scholar] [CrossRef]
  35. Bui, Y.-H.; Hussain, A.; Kim, H.-M. Double Deep $Q$ -Learning-Based Distributed Operation of Battery Energy Storage System Considering Uncertainties. IEEE Trans. Smart Grid 2019, 11, 457–469. [Google Scholar] [CrossRef]
  36. Bokolo, A.J. Smart City Data Architecture for Energy Prosumption in Municipalities: Concepts, Requirements, and Future Directions. Int. J. Green Energy 2020, 13, 827–845. [Google Scholar]
  37. Nakabi, T.A.; Toivanen, P. Deep reinforcement learning for energy management in a microgrid with flexible demand. Sustain. Energy Grids Netw. 2020, 25, 100413. [Google Scholar] [CrossRef]
  38. Perera, A.; Kamalaruban, P. Applications of reinforcement learning in energy systems. Renew. Sustain. Energy Rev. 2020, 137, 110618. [Google Scholar] [CrossRef]
  39. Malmedal, K.; Kroposki, B.; Sen, P.K. Distributed Energy Resources and Renewable Energy in Distribution Systems: Protection Considerations and Penetration Levels. In Proceedings of the 2008 IEEE Industry Applications Society Annual Meeting, Edmonton, AB, Canada, 5–9 October 2008; pp. 1–8. [Google Scholar] [CrossRef]
Figure 1. Deep reinforcement learning design.
Figure 2. Microgrid diagram.
Figure 3. Load.
Figure 4. PV.
Figure 5. PV and load for the next 24 h.
Figure 6. PV and load for 7 days.
Figure 7. Energy management system applied by deep reinforcement learning.
Figure 8. Reward and episodes.
Table 1. Time-State-Action-Cost.

| Time | State | Action | Cost |
|------|-------|--------|------|
| 0 | (304, 0.2) | discharge | 74.3 ฿ |
| 1 | (200, 0.2) | discharge | 123.7 ฿ |
| 2 | (200, 0.2) | discharge | 173.1 ฿ |
| 3 | (200, 0.2) | discharge | 222.4 ฿ |
| 4 | (202, 0.2) | discharge | 272.2 ฿ |
| 5 | (306, 0.2) | import | 347.4 ฿ |
| 6 | (524, 0.2) | discharge | 476.6 ฿ |
| 7 | (611, 0.2) | discharge | 627.9 ฿ |
| 8 | (568, 0.2) | discharge | 807.8 ฿ |
| 9 | (394, 0.2) | discharge | 932.4 ฿ |
| 10 | (450, 0.2) | discharge | 1074.9 ฿ |
| 11 | (483, 0.2) | import | 1228.0 ฿ |
| 12 | (470, 0.2) | discharge | 1518.1 ฿ |
| 13 | (389, 0.2) | import | 1758.3 ฿ |
| 14 | (365, 0.2) | discharge | 1983.5 ฿ |
| 15 | (409, 0.2) | import | 2235.8 ฿ |
| 16 | (593, 0.2) | import | 2599.5 ฿ |
| 17 | (625, 0.2) | discharge | 2979.6 ฿ |
| 18 | (625, 0.2) | discharge | 3170.7 ฿ |
| 19 | (525, 0.2) | import | 3330.7 ฿ |
| 20 | (525, 0.2) | import | 3490.6 ฿ |
| 21 | (524, 0.2) | discharge | 3613.8 ฿ |
| 22 | (522, 0.2) | import | 3736.8 ฿ |
| 23 | (533, 0.2) | discharge | 3864.0 ฿ |
| 24 | (305, 0.2) | discharge | 3938.7 ฿ |
| 25 | (200, 0.2) | discharge | 3988.4 ฿ |
| 26 | (200, 0.2) | discharge | 4038.2 ฿ |
| 27 | (200, 0.2) | discharge | 4088.0 ฿ |
| 28 | (202, 0.2) | discharge | 4138.4 ฿ |
| 29 | (306, 0.2) | import | 4214.8 ฿ |
| 30 | (524, 0.2) | discharge | 4345.5 ฿ |
| 31 | (611, 0.2) | discharge | 4498.7 ฿ |
| 32 | (568, 0.2) | discharge | 4680.7 ฿ |
| 33 | (296, 0.2) | import | 4775.4 ฿ |
| 34 | (334, 0.2) | discharge | 4882.1 ฿ |
| 35 | (393, 0.2) | import | 5007.6 ฿ |
| 36 | (303, 0.2) | discharge | 5195.1 ฿ |
| 37 | (390, 0.2) | import | 5436.2 ฿ |
| 38 | (351, 0.2) | discharge | 5653.3 ฿ |
| 39 | (432, 0.2) | import | 5919.9 ฿ |
| 40 | (590, 0.2) | discharge | 6282.6 ฿ |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
