Article

Software-Defined Heterogeneous Edge Computing Network Resource Scheduling Based on Reinforcement Learning

Yaofang Li and Bin Wu
1 School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China
2 School of Computer Science and Technology, Tianjin University, Tianjin 300072, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(1), 426; https://doi.org/10.3390/app13010426
Submission received: 28 November 2022 / Revised: 14 December 2022 / Accepted: 23 December 2022 / Published: 29 December 2022
(This article belongs to the Special Issue Edge Computing in 6G Networks)

Abstract: With the rapid development of wireless networks, wireless edge computing networks have attracted wide attention. The heterogeneous characteristics of 6G edge computing networks bring new challenges to network resource scheduling. In this work, we consider an edge computing network with heterogeneous edge computing nodes and heterogeneous task requirements. We design a software-defined heterogeneous edge computing network architecture that separates the control layer from the data layer. At the control layer, tasks are decomposed into multiple subtasks according to their requirements, and an alliance of edge computing nodes is established for each task to execute the decomposed subtasks. To jointly optimize network energy consumption and network load balancing, we model the resource scheduling problem as a Markov Decision Process (MDP) and design a Proximal Policy Optimization (PPO) resource scheduling algorithm based on deep reinforcement learning. Simulation results show that the proposed PPO resource scheduling algorithm achieves low energy consumption and good load balancing.

1. Introduction

With the rapid development of mobile communication, network demand, data volume, and computing demand are all growing rapidly. Traditional cloud computing struggles to meet the low-latency and low-energy requirements of various 6G edge services. Edge computing, unlike the centralized mode of cloud computing, sinks computing devices to the edge of the network [1,2]. These computing devices can be routers, gateways, computers, and other terminal devices. They can undertake the computing tasks of the edge network and effectively relieve the computing pressure on the cloud data center. In addition, because edge computing devices are more dispersed and closer to the users requesting data, low-latency requirements are easier to meet [3,4]. By using more edge devices to complete network computing tasks, edge computing effectively reduces network cost, shortens service processing delay, and changes the paradigm of traditional cloud computing [5]. As application scenarios evolve, more and more data types are sensed and transmitted in wireless networks, and edge computing networks are gradually becoming heterogeneous. In a heterogeneous edge computing network, both the edge computing devices and the task requirements differ across the network. Heterogeneous edge computing devices have different computing capabilities when processing different types of data, such as images or matrices. Due to these inherent heterogeneous characteristics, resource scheduling and network performance optimization are challenging.
The idea of a software-defined edge computing network is to introduce the software-defined network [6,7,8] into the wireless edge computing network. The edge computing network then adopts an architecture that decouples the data layer from the control layer, and the control logic of the nodes is transferred to an upper centralized controller to provide flexible network management and scheduling. Software-defined edge computing networks can effectively address the resource scheduling and heterogeneity problems of traditional heterogeneous edge computing networks. Through the centralized control of the control layer, the network can effectively respond to heterogeneous task requirements and match task types with edge computing devices. At the same time, the separation of the control layer and the data layer enables efficient reuse of the underlying physical network resources. In addition, a software-defined edge computing network allows edge computing devices with different parameters and functions to coexist, and thus adapts well to node heterogeneity.
The work in [9] focused on a two-tier edge computing system composed of a macro base station and multiple micro base stations and jointly optimized task scheduling and energy management decisions. In terms of software-defined networks, the work in [10] proposed a software-defined architecture for wireless networks and applied it to wireless edge computing networks, verifying the performance gains of software-defined wireless edge computing networks. Further, the work in [11] proposed separating the control layer and the data layer of wireless networks; the nodes only transmitted data without participating in data processing, which improved the efficiency and lifetime of the network. Considering edge computing, the works in [7,12,13] analyzed the advantages of edge computing and software-defined networks in vehicular networking, where the software-defined architecture greatly improves network flexibility and reliability. The work in [14] summarized the application of software-defined networks and edge computing in next-generation mobile communication, showing that an Internet of Things based on software-defined networks and edge computing can effectively shorten service response time and improve the quality of experience for users. It is worth noting that most of these works [10,11,15,16,17] did not take into account the heterogeneous characteristics of data and edge computing devices in 6G networks. The optimization and scheduling problems in heterogeneous edge computing networks remain to be further studied.
Driven by the latest advances in algorithms, computing power, and big data, artificial intelligence has made substantial breakthroughs in a wide range of fields [18]. Deep reinforcement learning enables agents to learn the best decisions from past experience without requiring massive labeled data; it trains on data obtained from interaction with the environment, which has led to its wide use in automatic control, resource management, network optimization, and other fields [19,20,21]. As one of the most popular deep reinforcement learning algorithms, Proximal Policy Optimization (PPO) offers good stability and has become a baseline algorithm of OpenAI [22,23]. Deep reinforcement learning algorithms such as PPO also have great research value in software-defined edge computing networks. By learning the distribution characteristics and task requirements of the network, they can effectively overcome the influence of dynamic environments and heterogeneous characteristics and adaptively schedule network resources.
Motivated by the above work, we consider an edge computing network with heterogeneous edge computing nodes and heterogeneous task requirements, and establish a software-defined heterogeneous edge computing network architecture. Furthermore, to jointly optimize network energy consumption and network load balancing, we formulate a multi-objective optimization problem and model it as a Markov Decision Process (MDP). A PPO resource scheduling algorithm based on deep reinforcement learning is designed to solve this combinatorial optimization problem. Specifically, our contributions are as follows.
  • We propose a heterogeneous edge computing network architecture based on software-defined networking. The architecture separates the control layer from the data layer and can effectively improve the efficiency of resource scheduling through a centralized control strategy.
  • We establish an optimization model for heterogeneous edge computing networks that considers both network energy consumption and network load balancing. We decompose the network tasks into subtasks according to their requirements, establish a task alliance, and formulate the task scheduling problem.
  • We model the heterogeneous edge computing network resource scheduling problem as an MDP and introduce the PPO algorithm to solve for the scheduling strategy. The proposed PPO resource scheduling algorithm performs well under the dual goals of network energy consumption and network load balancing.
The rest of this article is arranged as follows. In Section 2, we describe the heterogeneous edge computing network, establish the software-defined heterogeneous edge computing network architecture, and propose a multi-objective optimization model. In Section 3, we model the resource scheduling problem as an MDP and design a reinforcement learning algorithm based on PPO. The simulation analysis and conclusions are given in Section 4 and Section 5, respectively.

2. System Model and Problem Formulation

We consider a heterogeneous edge computing network as shown in Figure 1, where each edge network includes P edge computing nodes, an edge server node, and a remote data center. The edge computing nodes have heterogeneous computing and communication capabilities. The edge server acts as the main node, is mainly responsible for computing task scheduling, and has sufficient power. Tasks in the heterogeneous edge computing network have different data type computing requirements and arrive independently and randomly. Note that different data types correspond to different processing capabilities of the computing nodes. For example, devices equipped with graphics processing units (GPUs) usually process video data faster, while devices with multi-core CPUs often perform matrix operations more efficiently. A computing task is transmitted from the remote data center to the edge server; the edge server assigns it to edge computing nodes, which complete the computation, and the result is finally returned to the data center. According to the energy consumption model of the edge computing network, when edge computing node p performs computing task m, the required computing energy consumption can be expressed as
$$E_{p,m}^{comp} = e_{p,m}^{comp} u_{p,m}, \tag{1}$$
where $e_{p,m}^{comp}$ represents the unit computing energy consumption when edge computing node p performs task m, and $u_{p,m}$ represents the data size required by task m. Assuming that after node p completes task m the size of the collected packet is $L_{p,m}$, and considering a transmission distance d, the communication energy consumption required by node p to perform task m is
$$E_{p,m}^{comm} = \begin{cases} L_{p,m} E_{elec} + L_{p,m} \varepsilon_{fs} d^{2}, & d < d_0 \\ L_{p,m} E_{elec} + L_{p,m} \varepsilon_{mp} d^{4}, & d \geq d_0, \end{cases} \tag{2}$$
where $E_{elec}$ represents the energy consumed to send or receive a unit of data, and $\varepsilon_{fs}$ and $\varepsilon_{mp}$ represent the amplifier magnification of the free space model and of the multipath transmission model, respectively. The transmission distance determines the wireless channel model: within a certain distance, inter-device communication usually enjoys a good wireless channel, but as the distance increases, multipath effects become severe and the multipath transmission parameters should be used to describe the channel. The transmission distance threshold $d_0$ is given by
$$d_0 = \sqrt{\varepsilon_{fs} / \varepsilon_{mp}}. \tag{3}$$
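To make the energy model concrete, the following Python sketch evaluates (1)–(3) for a single node–task pair. It is a minimal illustration rather than the authors' simulation code: the magnification values follow Table 1 (read as pJ), while the value of E_ELEC and the function names are our own assumptions.

```python
import math

E_ELEC = 50e-9        # J/bit to send or receive one bit (assumed typical value)
EPS_FS = 10e-12       # free space magnification, from Table 1, read as pJ/(bit*m^2)
EPS_MP = 0.0013e-12   # multipath magnification, from Table 1, read as pJ/(bit*m^4)

D0 = math.sqrt(EPS_FS / EPS_MP)   # threshold distance of (3), about 87.7 m here

def comp_energy(e_unit, u):
    """Computing energy of (1): unit energy e_{p,m} times required data size u_{p,m}."""
    return e_unit * u

def comm_energy(l_bits, d):
    """Communication energy of (2): free-space model below d0, multipath at or above."""
    if d < D0:
        return l_bits * E_ELEC + l_bits * EPS_FS * d ** 2
    return l_bits * E_ELEC + l_bits * EPS_MP * d ** 4
```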
To tackle the resource scheduling problems caused by the heterogeneous characteristics of the edge computing network, we introduce the software-defined network to separate the control layer from the data layer. For the network we consider, the resource scheduling strategy corresponds to the control layer, while the processing and transmission of computing tasks correspond to the data layer. The control layer is independent of the data layer, giving the network better compatibility and scalability. The architecture also enables dynamic adjustment of the network, improves overall network efficiency, and makes the network more intelligent. Such a software-defined heterogeneous edge computing network architecture effectively increases network flexibility and enables more efficient resource management. Specifically, the architecture is shown in Figure 2. The edge server, which has sufficient power, acts as the control layer node; it can be deployed centrally and realizes the separation of the control layer and the data layer. The global information of the edge computing network is accessible to the control layer. The other heterogeneous edge computing nodes act as data layer nodes: they are only responsible for computing and transmitting data and do not participate in any decision-making.
When a task arrives, the control node divides it into k subtasks according to the data types contained in the task, where each subtask is completed independently by one edge computing node. To efficiently solve the resulting resource scheduling problem, we introduce a multi-task alliance. The edge server, acting as the control node, decomposes tasks and builds an alliance for each task. The alliance is generated by the edge server through centralized decision-making, avoiding a complex broadcast process. Each edge computing node in the alliance undertakes at least one subtask, and the nodes cooperate to complete the requirements of the task. After task decomposition and allocation are completed, each edge computing node performs the computing work of its subtask. Based on (1), the computing energy consumption required by task m in the software-defined edge computing network is
$$E_m^{comp} = \sum_{i=1}^{k} E_{p_i,i}^{comp} = \sum_{i=1}^{k} e_{p_i,i}^{comp} u_{p_i,i}, \tag{4}$$
where $p_i$ is the edge computing node assigned to subtask i. Similarly, the communication energy consumption required by task m is
$$E_m^{comm} = \sum_{i=1}^{k} E_{p_i,i}^{comm}. \tag{5}$$
Considering that the load balancing of the edge computing network has a great influence on network performance and network lifetime, we formulate the resource scheduling of the software-defined heterogeneous edge computing network as a multi-objective optimization problem:
$$\min \; E_{total} = \sum_{m=1}^{M} \left( E_m^{comp} + E_m^{comm} \right), \qquad \min \; E_{balance} = \frac{1}{E_{total}} \sum_{p=1}^{P} \left| E_p^{comp} + E_p^{comm} - \frac{E_{total}}{P} \right|, \tag{6}$$
where $E_p^{comp} + E_p^{comm}$ is the total energy consumed by edge computing node p over all of its assigned subtasks, and $E_{balance}$ represents the load balancing degree of the network, i.e., the deviation of the nodes from the average energy consumption; we want this deviation to be as small as possible. The total energy consumption accounts for the energy each computing node spends processing all subtasks plus the communication energy required to return the results after completion. The first constraint is the maximum load constraint of a single computing node, which reflects the node's actual maximum computing power. The second constraint is the subtask allocation constraint, which requires each subtask to be completed by exactly one computing node.
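As an illustration of how (4)–(6) fit together, the sketch below aggregates per-subtask energies into per-task energies and evaluates both objectives for a candidate assignment. The data structures (per-node energy tables, alliance lists) are hypothetical; only the formulas come from the text, with $E_{balance}$ computed as the per-node deviation from the average energy consumption.

```python
def task_energy(alliance, comp_e, comm_e):
    """E_m^comp + E_m^comm of (4)-(5): sum over the k subtasks of one task.
    alliance: list of (node, subtask) pairs assigned by the control layer."""
    return sum(comp_e[n][s] + comm_e[n][s] for n, s in alliance)

def objectives(alliances, comp_e, comm_e, num_nodes):
    """Evaluate both objectives of (6) for a full assignment of M tasks."""
    e_total = sum(task_energy(a, comp_e, comm_e) for a in alliances)
    node_e = [0.0] * num_nodes                 # energy consumed per node
    for a in alliances:
        for n, s in a:
            node_e[n] += comp_e[n][s] + comm_e[n][s]
    mean = e_total / num_nodes                 # average per-node energy
    e_balance = sum(abs(e - mean) for e in node_e) / e_total
    return e_total, e_balance
```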
The above edge computing resource scheduling problem is a complex combinatorial optimization problem that is usually difficult to solve with traditional optimization methods. In the next section, we model the problem as an MDP and study it with a deep reinforcement learning method.

3. PPO Resource Scheduling

Based on the above, the heterogeneous edge computing network resource scheduling problem has been established. In this section, it is cast as a deep reinforcement learning problem and solved by the PPO resource scheduling algorithm. First, the problem is modeled as an MDP, and the edge server (i.e., the control node) is regarded as an agent that learns the resource scheduling strategy. The state space, action space, state transition probability function, and reward function of the MDP are designed as follows.
  • State space $\mathcal{S}$. For the considered heterogeneous edge computing network, the edge computing tasks and the scheduling of edge computing node resources determine the energy consumption of the entire network. Therefore, we take the subtasks in the network and the edge computing node scheduling as the state s; the state space is then the set of all subtasks and node scheduling strategies.
  • Action space $\mathcal{A}$. Scheduling a subtask changes the above state. Therefore, we use the scheduling decision for the next subtask as the action a of the agent. This design is intuitive and convenient for the interaction between the agent and the environment. The action space is the set of all possible scheduling decisions for the next subtask.
  • State transition probability function $\mathcal{P}$. According to the design of the agent's state and action, the state transition probability function of the environment is given as
$$P(s' \mid s, a) = \begin{cases} 1, & s' = s_a \\ 0, & \text{otherwise}, \end{cases} \qquad s \in \mathcal{S},\ a \in \mathcal{A}, \tag{7}$$
    where $s_a$ denotes the state reached by applying action a in state s. The above expression indicates that the next state is uniquely determined by the current state and the chosen action.
  • Reward function $\mathcal{R}$. We transform the optimization objective (6) into a cumulative reward maximization problem by designing a reward function. Specifically, the reward of each step is defined as a weighted function of the computing and communication energy consumption and the load balancing degree brought by the subtask allocation, expressed as
$$r = -\lambda E_{total} - E_{balance}, \tag{8}$$
    where $\lambda$ is a positive weight that keeps the reward within a reasonable range. This reward design drives the agent toward low-energy, load-balanced states in order to maximize the cumulative reward; a minimal environment sketch combining these elements is given after this list.
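A compact way to see how the state, the deterministic transition (7), and the reward (8) interact is a Gym-style environment skeleton. The sketch below is our own illustration under the above definitions, with a toy stand-in for the energy terms; the class name, dimensions, and the Gymnasium API usage are assumptions, not the authors' code.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class SchedulingEnv(gym.Env):
    """One episode schedules all subtasks, one per step (deterministic transitions)."""

    def __init__(self, num_subtasks=200, num_nodes=100, lam=0.1):
        super().__init__()
        self.num_subtasks, self.num_nodes, self.lam = num_subtasks, num_nodes, lam
        # state s: node chosen for each subtask so far (-1 = not yet scheduled)
        self.observation_space = spaces.Box(-1.0, num_nodes - 1.0,
                                            shape=(num_subtasks,), dtype=np.float32)
        # action a: scheduling decision for the next subtask
        self.action_space = spaces.Box(0.0, 1.0, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.state = -np.ones(self.num_subtasks, dtype=np.float32)
        return self.state.copy(), {}

    def step(self, action):
        # deterministic transition of (7): the action fixes the next subtask's node
        self.state[self.t] = int(action[0] * (self.num_nodes - 1))
        self.t += 1
        done = self.t == self.num_subtasks
        e_total, e_balance = self._energies()
        reward = -self.lam * e_total - e_balance   # reward of (8)
        return self.state.copy(), float(reward), done, False, {}

    def _energies(self):
        # toy stand-in for (1)-(6): unit energy per scheduled subtask, per-node deviation
        scheduled = self.state[self.state >= 0].astype(int)
        node_e = np.bincount(scheduled, minlength=self.num_nodes).astype(float)
        e_total = node_e.sum() or 1.0
        e_balance = np.abs(node_e - e_total / self.num_nodes).sum() / e_total
        return e_total, e_balance
```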
The role of the Actor network is to fit the agent's policy $\pi_\theta(a \mid s)$, which is expressed as a Gaussian distribution fully described by its mean and standard deviation. When the agent needs to act, the Gaussian distribution is recovered first, and an action is then drawn by random sampling. PPO initializes a network with the same structure and parameters as the Actor network, named the old-Actor, to keep the old policy unchanged within each update round. The old-Actor network does not participate in training; it only copies the parameters of the Actor network before each update to hold the current round's old policy $\pi_{\theta_g}$. The parameters of the Actor network are updated by
$$\theta_{g+1} = \arg\max_{\theta} \frac{1}{T} \sum_{t=0}^{T} \min\!\left( \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_g}(a_t \mid s_t)} A_t^{\pi_{\theta_g}},\ \mathrm{clip}\!\left( \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_g}(a_t \mid s_t)}, 1-\varepsilon, 1+\varepsilon \right) A_t^{\pi_{\theta_g}} \right), \tag{9}$$
where g indicates the update round, $A_t^{\pi_{\theta_g}} = R_t + \gamma V(s_{t+1}) - V(s_t)$ is the advantage function, and $R_t$ and $V(s_t)$ are the reward at time step t and the value of state $s_t$, respectively.
The Critic network estimates the state value $V(s)$. Its training goal is to minimize a mean-square-error loss, and its parameters are updated by
$$\phi_{g+1} = \arg\min_{\phi} \frac{1}{T} \sum_{t=0}^{T} \left( V_\phi(s_t) - \hat{R}_t \right)^2, \tag{10}$$
where $\hat{R}_t$ is the accumulated reward computed from the state transition trajectory data.
The Actor network has an input layer, a hidden layer, and two output heads that map the state to the mean and standard deviation of the policy's Gaussian distribution. The hidden layer uses the Rectified Linear Unit (ReLU) activation function. The mean head first limits its output to $(-1, 1)$ through the Tanh activation function and then maps it to the entire action space; the standard deviation head limits its output to $(0, +\infty)$ through the Softplus activation function. The Critic network has an input layer, two hidden layers, and an output layer, and maps the state to the state value. Its hidden layers use the ReLU activation function, and its output layer outputs the state value directly without an activation function.
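The following PyTorch sketch mirrors the described architecture and update rules: an Actor with a ReLU hidden layer and two heads (a Tanh-bounded mean rescaled to the action range and a Softplus-bounded standard deviation), a Critic with two hidden layers, and loss terms corresponding to (9) and (10). Layer widths, the action bound, and the small variance floor are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """State -> Gaussian policy: mean via Tanh (rescaled), std via Softplus."""
    def __init__(self, state_dim, action_dim, hidden=64, action_scale=1.0):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.mean_head = nn.Linear(hidden, action_dim)
        self.std_head = nn.Linear(hidden, action_dim)
        self.action_scale = action_scale  # maps the (-1, 1) Tanh output to the action space

    def forward(self, s):
        h = self.body(s)
        mean = torch.tanh(self.mean_head(h)) * self.action_scale   # bounded mean
        std = nn.functional.softplus(self.std_head(h)) + 1e-5      # strictly positive std
        return torch.distributions.Normal(mean, std)

class Critic(nn.Module):
    """State -> state value; two ReLU hidden layers, linear output."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, s):
        return self.net(s).squeeze(-1)

def ppo_losses(actor, old_actor, critic, s, a, adv, ret, eps=0.2):
    """Clipped surrogate of (9) (negated for gradient descent) and MSE loss of (10)."""
    with torch.no_grad():
        old_log_prob = old_actor(s).log_prob(a).sum(-1)   # old policy stays fixed
    ratio = torch.exp(actor(s).log_prob(a).sum(-1) - old_log_prob)
    surrogate = torch.min(ratio * adv,
                          torch.clamp(ratio, 1 - eps, 1 + eps) * adv).mean()
    critic_loss = ((critic(s) - ret) ** 2).mean()
    return -surrogate, critic_loss
```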
The resource scheduling of the considered heterogeneous edge computing network has thus been modeled as an MDP, and the PPO network architecture has been built. Based on PPO, we complete the resource scheduling of the heterogeneous edge computing network with Algorithm 1, shown below.
Algorithm 1 PPO Resource Scheduling Algorithm.
Input: Edge computing network task parameters, edge computing node parameters, PPO network parameters, and the other parameters in Table 1.
Output: Subtask scheduling strategy set, network energy consumption, network load balancing.
1: Initialize the Actor and Critic network parameters θ and ϕ; initialize the old-Actor network.
2: Clear the experience buffer.
3: for each episode do
4:    The agent obtains the current network state s_t.
5:    The Actor takes s_t as input and outputs the policy Gaussian distribution; the agent recovers the policy and samples an action a_t.
6:    The agent performs action a_t, obtains the reward r_t, moves to the next state s_{t+1}, and stores the transition in the experience buffer.
7:    for t = 1 : T do
8:       Compute the state value of each time step from V(s_t) and the reward values stored in the buffer.
9:       Update the parameters of the Actor and Critic networks according to (9) and (10), respectively.
10:    end for
11: end for
12: return Subtask scheduling strategy set, network energy consumption, network load balancing.
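In practice, Algorithm 1 can be realized on top of an existing PPO implementation. Since Section 4 builds on the Stable-Baselines3 framework of [24], a training call with the hyperparameters of Table 1 might look like the sketch below, where SchedulingEnv is the hypothetical environment skeleton from the MDP discussion above.

```python
from stable_baselines3 import PPO

# hypothetical environment sketch from above: 20 tasks x 10 subtasks, 100 nodes
env = SchedulingEnv(num_subtasks=200, num_nodes=100)

model = PPO(
    "MlpPolicy", env,
    learning_rate=1e-5,   # Table 1
    batch_size=64,        # Table 1
    gamma=0.99,           # discount factor, Table 1
    gae_lambda=0.9,       # generalized advantage estimator factor, Table 1
    clip_range=0.2,       # clipping parameter, Table 1
)
model.learn(total_timesteps=1_000_000)   # training length used for Figure 3
```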

4. Simulation Results

We present the simulation results of heterogeneous edge computing network resource scheduling in this section. The simulation is based on PyTorch. Edge computing nodes are randomly distributed over a 1000 m × 1000 m square, with the edge server located at (500, 500), i.e., the center of the area. Multiple tasks arrive independently in the heterogeneous edge computing network, and according to its requirements each task is divided into 10 subtasks with different data volume requirements. For the reinforcement learning part, we implement the proposed PPO resource scheduling algorithm on the reinforcement learning framework proposed in [24]. The detailed parameters of the edge computing network and PPO are given in Table 1. To verify the effectiveness of the algorithm, we consider two baseline strategies, sketched after this list.
  • Random assignment scheme. All subtasks are randomly assigned to nodes that can perform them. This scheme has low complexity but does not optimize the network.
  • Energy greedy scheme. Each subtask is assigned to the edge computing node that completes it with the lowest energy consumption, as long as the node's maximum load is not exceeded. Once a node reaches its load limit, the subtask is assigned to the node with the next-lowest energy consumption.
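Both baselines admit short reference implementations; the sketch below reflects our reading of their descriptions, with the capability table, energy table, and load bookkeeping as assumed inputs.

```python
import random

def random_assignment(subtasks, capable_nodes):
    """Assign each subtask uniformly at random among the nodes able to execute it."""
    return {s: random.choice(capable_nodes[s]) for s in subtasks}

def energy_greedy(subtasks, capable_nodes, energy, load, max_load):
    """Assign each subtask to the feasible node with the lowest energy cost,
    moving to the next-cheapest node once a node reaches its maximum load."""
    plan = {}
    for s in subtasks:
        for n in sorted(capable_nodes[s], key=lambda n: energy[n][s]):
            if load[n] < max_load[n]:      # one load unit per subtask (assumed)
                plan[s], load[n] = n, load[n] + 1
                break
    return plan
```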
We first analyze the convergence of the proposed algorithm, as shown in Figure 3. Using the average cumulative reward over 200 steps in each epoch, we trained models with different task numbers for 1,000,000 steps. The convergence speed of PPO resource scheduling decreases as the task number increases, because a larger task number corresponds to a larger state space and a more complex network, so more training steps are required. Moreover, the cumulative reward declines as the task number increases. This is because the designed reward combines network energy consumption and network load balancing: as the number of tasks grows, the energy consumption of the network increases, which lowers the cumulative reward. We conclude that the proposed PPO resource scheduling has a stable training process and good convergence under different task numbers.
Figure 4 shows the total energy consumption of the network under different task numbers. As expected, the energy consumption always increases with the task number. The proposed PPO resource scheduling performs significantly better than the random assignment scheme in terms of energy consumption, but slightly worse than the energy greedy scheme. When the task number is 20, the network energy consumption of PPO resource scheduling is 12.3 J, which is 3.6 J higher than the greedy strategy and 10.4 J lower than the random assignment scheme. PPO resource scheduling consumes more energy than the energy greedy scheme because it considers both network energy consumption and network load balancing: to keep the overall network load balanced, the agent deploys subtasks on more nodes, sacrificing some energy performance. The energy greedy scheme always pursues the lowest-energy allocation without considering any other factors. The random assignment strategy consumes the most energy because it ignores the heterogeneity of the edge computing nodes' processing capabilities for different tasks and thus struggles to match tasks with suitable nodes.
Figure 5 shows the load balancing degree of the network under different task numbers. The load balancing degree does not change monotonically with the number of tasks but stays within a certain range, because the simulated task types and data volumes are random; random task generation causes jitter in the load balancing degree, and the different goals and strategies of the algorithms lead to different ranges. Compared with the energy greedy strategy and the random assignment strategy, the proposed PPO resource scheduling always achieves the best network load balancing. When the task number is 20, the load balancing degree of PPO resource scheduling is 0.31, which is 0.37 lower than the energy greedy strategy and 0.13 lower than the random assignment strategy. This is because our optimization takes energy consumption and load balancing as dual goals: to increase the cumulative reward, the agent must keep the load balancing degree low while pursuing low network energy consumption. The energy greedy strategy ignores network balance and always tends to allocate tasks to low-energy nodes, which leads to a severely unbalanced network load. The random assignment scheme approximates a uniform distribution of tasks when the number of tasks is large, but it does not account for the data sizes required by tasks or the differences in the nodes' processing capabilities. Therefore, the heterogeneity of the edge computing network and task requirements affects the load balancing of the network to a certain extent.

5. Conclusions

In this work, we considered a heterogeneous edge computing network with heterogeneous edge computing nodes and task requirements. According to its characteristics, we designed a software-defined heterogeneous edge computing network architecture that separates the control layer from the data layer. The control layer decomposes the tasks in the network into multiple subtasks according to the task requirements, and an edge computing node alliance is established for each task to perform the decomposed subtasks. For this resource scheduling problem, we formulated a combinatorial optimization problem to optimize network energy consumption and load balancing, and proposed the PPO resource scheduling algorithm to solve it. Simulation results showed that the proposed PPO resource scheduling maintains low energy consumption while ensuring network load balancing.

Author Contributions

Writing—original draft, Y.L.; Writing—review & editing, B.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Esposito, C.; Castiglione, A.; Pop, F.; Choo, K.K.R. Challenges of Connecting Edge and Cloud Computing: A Security and Forensic Perspective. IEEE Cloud Comput. 2017, 4, 13–17. [Google Scholar] [CrossRef]
  2. Zeng, J.; Sun, J.; Wu, B.; Su, X. Mobile edge communications, computing, and caching (MEC3) technology in the maritime communication network. China Commun. 2020, 17, 223–234. [Google Scholar] [CrossRef]
  3. Xia, J.; Wang, P.; Li, B.; Fei, Z. Intelligent task offloading and collaborative computation in multi-UAV-enabled mobile edge computing. China Commun. 2022, 19, 244–256. [Google Scholar] [CrossRef]
  4. Shi, W.; Dustdar, S. The Promise of Edge Computing. Computer 2016, 49, 78–81. [Google Scholar] [CrossRef]
  5. Corcoran, P.; Datta, S.K. Mobile-Edge Computing and the Internet of Things for Consumers: Extending cloud computing and services to the edge of the network. IEEE Consum. Electron. Mag. 2016, 5, 73–74. [Google Scholar] [CrossRef]
  6. Khorsandroo, S.; Tosun, A.S. An experimental investigation of SDN controller live migration in virtual data centers. In Proceedings of the 2017 IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN), Berlin, Germany, 6–8 November 2017; pp. 309–314. [Google Scholar] [CrossRef]
  7. Gaur, K.; Grover, J. Exploring VANET Using Edge Computing and SDN. In Proceedings of the 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP), Sikkim, India, 25–28 February 2019; pp. 1–4. [Google Scholar] [CrossRef]
  8. Subramanya, T.; Goratti, L.; Khan, S.N.; Kafetzakis, E.; Giannoulakis, I.; Riggio, R. SDEC: A platform for software defined mobile edge computing research and experimentation. In Proceedings of the 2017 IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN), Berlin, Germany, 6–8 November 2017; pp. 1–2. [Google Scholar] [CrossRef]
  9. Chen, Y.; Zhang, Y.; Wu, Y.; Qi, L.; Chen, X.; Shen, X. Joint Task Scheduling and Energy Management for Heterogeneous Mobile Edge Computing With Hybrid Energy Supply. IEEE Internet Things J. 2020, 7, 8419–8429. [Google Scholar] [CrossRef]
  10. Bernardos, C.J.; de la Oliva, A.; Serrano, P.; Banchs, A.; Contreras, L.M.; Jin, H.; Zúniga, J.C. An architecture for software defined wireless networking. IEEE Wirel. Commun. 2014, 21, 52–61. [Google Scholar] [CrossRef]
  11. Kobo, H.I.; Abu-Mahfouz, A.M.; Hancke, G.P. A Survey on Software-Defined Wireless Sensor Networks: Challenges and Design Requirements. IEEE Access 2017, 5, 1872–1899. [Google Scholar] [CrossRef]
  12. Xu, X.; Huang, Q.; Zhu, H.; Sharma, S.; Zhang, X.; Qi, L.; Bhuiyan, M.Z.A. Secure Service Offloading for Internet of Vehicles in SDN-Enabled Mobile Edge Computing. IEEE Trans. Intell. Transp. Syst. 2021, 22, 3720–3729. [Google Scholar] [CrossRef]
  13. Hou, X.; Ren, Z.; Wang, J.; Cheng, W.; Ren, Y.; Chen, K.C.; Zhang, H. Reliable Computation Offloading for Edge-Computing-Enabled Software-Defined IoV. IEEE Internet Things J. 2020, 7, 7097–7111. [Google Scholar] [CrossRef]
  14. Lv, Z.; Xiu, W. Interaction of edge-cloud computing based on SDN and NFV for next generation IoT. IEEE Internet Things J. 2020, 7, 5706–5712. [Google Scholar] [CrossRef]
  15. Donato, C.; Serrano, P.; de la Oliva, A.; Banchs, A.; Bernardos, C.J. An openflow architecture for energy-aware traffic engineering in mobile networks. IEEE Netw. 2015, 29, 54–60. [Google Scholar] [CrossRef]
  16. Zeng, D.; Li, P.; Guo, S.; Miyazaki, T. Minimum-energy reprogramming with guaranteed quality-of-sensing in software-defined sensor networks. In Proceedings of the 2014 IEEE International Conference on Communications (ICC), Sydney, Australia, 10–14 June 2014; pp. 288–293. [Google Scholar] [CrossRef]
  17. Xiang, W.; Wang, N.; Zhou, Y. An Energy-Efficient Routing Algorithm for Software-Defined Wireless Sensor Networks. IEEE Sens. J. 2016, 16, 7393–7400. [Google Scholar] [CrossRef]
  18. Harika, J.; Baleeshwar, P.; Navya, K.; Shanmugasundaram, H. A Review on Artificial Intelligence with Deep Human Reasoning. In Proceedings of the 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC), Salem, India, 9–11 May 2022; pp. 81–84. [Google Scholar] [CrossRef]
  19. Zhou, J.; Luo, J.; Wang, J.; Deng, L. Cache Pollution Prevention Mechanism Based on Deep Reinforcement Learning in NDN. J. Commun. Inf. Netw. 2021, 6, 91–100. [Google Scholar] [CrossRef]
  20. Kaloev, M.; Krastev, G. Experiments Focused on Exploration in Deep Reinforcement Learning. In Proceedings of the 2021 5th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Ankara, Turkey, 21–23 October 2021; pp. 351–355. [Google Scholar] [CrossRef]
  21. Debner, A. Scaling up Deep Reinforcement Learning for Intelligent Video Game Agents. In Proceedings of the 2022 IEEE International Conference on Smart Computing (SMARTCOMP), Helsinki, Finland, 20–24 June 2022; pp. 192–193. [Google Scholar] [CrossRef]
  22. Toan, N.D.; Woo, K.G. Mapless Navigation with Deep Reinforcement Learning based on The Convolutional Proximal Policy Optimization Network. In Proceedings of the 2021 IEEE International Conference on Big Data and Smart Computing (BigComp), Jeju Island, Republic of Korea, 17–20 January 2021; pp. 298–301. [Google Scholar] [CrossRef]
  23. Cheng, Y.; Huang, L.; Wang, X. Authentic Boundary Proximal Policy Optimization. IEEE Trans. Cybern. 2022, 52, 9428–9438. [Google Scholar] [CrossRef] [PubMed]
  24. Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-Baselines3: Reliable Reinforcement Learning Implementations. J. Mach. Learn. Res. 2021, 22, 1–8. [Google Scholar]
Figure 1. Heterogeneous edge computing networks.
Figure 2. Software-defined heterogeneous edge computing network.
Figure 3. Convergence of PPO resource scheduling.
Figure 4. Energy consumption with different number of tasks.
Figure 5. Load balancing with different number of tasks.
Table 1. Parameters of the edge computing networks and PPO.

Description                                                   Value
heterogeneous edge computing network range (m × m)            1000 × 1000
number of tasks                                               20, 30, 40, 50, 60
subtask data volume requirements (kbit)                       [3, 5]
number of edge computing nodes                                100
free space model magnification (pJ/(bit·m²))                  10
multipath transmission model magnification (pJ/(bit·m⁴))      0.0013
learning rate                                                 1 × 10⁻⁵
batch size                                                    64
discount factor                                               0.99
generalized advantage estimator factor                        0.9
clipping parameter                                            0.2
