
Edge Intelligence Empowered Dynamic Offloading and Resource Management of MEC for Smart City Internet of Things

1 China Mobile System Integration Co., Ltd., Xi'an 710077, China
2 College of Communications and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
* Author to whom correspondence should be addressed.
Electronics 2022, 11(6), 879; https://doi.org/10.3390/electronics11060879
Submission received: 29 January 2022 / Revised: 5 March 2022 / Accepted: 8 March 2022 / Published: 10 March 2022

Abstract:
The Internet of Things (IoT) has emerged as an enabling platform for smart cities. In this paper, the joint optimization of the IoT devices' offloading decisions, CPU frequencies, and transmit powers is investigated for a cellular network with multiple mobile edge computing (MEC) servers and multiple IoT devices. An optimization problem is formulated to minimize the weighted sum of the computing pressure on the primary MEC server (PMS), the total energy consumption of the network, and the task dropping cost. The formulated problem is a mixed integer nonlinear program (MINLP), which is difficult to solve since it contains strongly coupled constraints and discrete integer variables. Taking the dynamics of the environment into account, a deep reinforcement learning (DRL)-based optimization algorithm is developed to solve the nonconvex problem. The simulation results demonstrate the correctness and effectiveness of the proposed algorithm.

1. Introduction

The smart city is a promising urban paradigm that improves the quality of experience (QoE) of citizens through advanced information and communication technology (ICT) infrastructure and enormous numbers of Internet of Things (IoT) devices [1,2,3]. A practical problem is that IoT devices are usually low cost, with limited computing power and storage capacity. Therefore, it is hard for IoT devices to complete compute-intensive and latency-sensitive tasks independently. An intuitive way to alleviate this problem is to adopt cloud computing for remote task execution. However, most cloud computing servers are deployed far away from the IoT devices, so offloading the devices' tasks to them incurs severe transmission delay. Hence, it is difficult for traditional cloud computing to satisfy the latency requirements of smart city applications. To solve this issue, researchers have proposed the concept of mobile edge computing (MEC).
In MEC systems, MEC servers are deployed at the edge of the network to provide cloud-like computing services for IoT devices [4,5]. IoT devices offload their compute-intensive tasks to the MEC servers for execution. Since MEC servers are deployed close to the IoT devices, the task offloading latency is significantly reduced compared with cloud computing. Hence, MEC has been considered a promising solution for providing ultra-low-latency computation services for smart cities [1,2,3].
Many works have studied resource allocation and caching problems in MEC systems for IoT and related areas. A multi-user MEC network consisting of a MEC server and multiple wireless devices was considered in [6], where the problem of maximizing the weighted sum computation rate of all the wireless devices was studied. The computing mode and the system resource allocation were jointly optimized by an algorithm based on the alternating direction method of multipliers decomposition technique. In [7], a MEC system with a multi-antenna access point (AP) and K single-antenna users was studied. The beamforming vector of the AP, the CPU frequencies, the numbers of offloaded bits, and the time allocation of the users were jointly optimized to minimize the energy consumption of the AP. A device-to-device (D2D) MEC system including one MEC server and multiple user devices was considered in [8], where the goal was to maximize the number of devices serviced by the system under communication and computation resource constraints. Unlike cloud servers, a MEC server has limited computing power. Hence, in single base station (BS) or single MEC server scenarios, IoT devices may experience long service response times, or even service failure, during periods of peak demand. The authors in [9] studied a heterogeneous network consisting of a multi-antenna macro-cell BS and multiple small-cell BSs, where the offloading decisions and the offloading and computation resource allocation were optimized to minimize the total energy consumption of the devices within the coverage of the BSs. In [10], a dense small-cell network with multiple MEC servers was considered, and the spatial demand coupling, service heterogeneity, and decentralized coordination problems were solved by a proposed collaborative service placement algorithm.
In [11], the weighted sum of the differences between the observed delay and the corresponding delay requirement at each slice was minimized by optimizing the users' offloading decisions and the communication and computing resource allocation in a multi-cell MEC network.
The above works focused on MEC problems in static environments, which are a special case of dynamic environments. In a dynamic environment, the MEC system state changes randomly and unpredictably, which is closer to practical scenarios. In static environments, MEC systems are mainly concerned with short-term utility, while in dynamic environments, long-term utility is the concern. Edge intelligence empowered by artificial intelligence (AI) is a promising way to optimize system performance in the field of smart city IoT [12,13,14]. In [15], the joint power control and computing resource allocation problem in an Industrial Internet of Things MEC network was studied, and a deep reinforcement learning (DRL)-based dynamic resource management algorithm was proposed to minimize the long-term average delay of the tasks. In [16], a content caching problem was investigated, and an actor-critic DRL-based algorithm was designed to maximize the cache hit rate. In [17], the task migration problem was studied in a multi-MEC server and multi-user network, and a multi-agent DRL task migration algorithm was proposed to solve the formulated problem. In [18], a multi-user end-edge-cloud orchestrated network was proposed, and a DRL-based computation offloading and resource allocation strategy was designed to minimize the energy consumption of the system.
In practical scenarios, there are many metrics for measuring the performance of a MEC system; hence, the system requirements are usually multifaceted. The works mentioned above mainly considered single-objective optimization, which may not generalize to some practical MEC systems. Motivated by these facts, we propose a multi-MEC server and multi-IoT device cellular network structure and investigate the minimization of a weighted sum of multiple objectives in this paper. Weighted-sum multi-objective optimization in dynamic MEC systems was also studied in [19,20,21], but the optimization objectives and system models are different from ours. The key differences between the relevant works and our work are shown in Table 1. The main contributions of this paper are summarized as follows:
(1)
A multi-MEC server and multi-IoT device cellular network structure is proposed. A high-cost, high-performance primary MEC server (PMS) with relatively strong computing power is deployed at the BS, and multiple low-cost secondary MEC servers (SMSs) with relatively weak computing power are deployed within the coverage area of the BS.
(2)
An optimization problem is formulated, whose objective is the weighted sum of multiple terms: the computing pressure on the PMS, the total energy consumption of the network, and the task dropping cost. The formulated problem is a nonconvex mixed integer nonlinear program (MINLP), which is solved by our proposed DRL-based optimization algorithm.
(3)
Simulation results are presented to evaluate the performance of the proposed algorithm. The correctness and effectiveness of the proposed algorithm are demonstrated by the simulation results.
The remainder of this paper is organized as follows. Section 2 presents the system model and formulates the optimization problem. The proposed DRL-based optimization algorithm is described in Section 3. The complexity and convergence analysis is given in Section 4. Simulation results are provided in Section 5. Finally, Section 6 concludes this paper.

2. System Model

In this section, we first introduce the proposed multi-MEC server and multi-IoT device cellular network, the channel model, and the computation model. Then, based on these models, we formulate the optimization problem of this paper.

2.1. Network and Channel Model

As shown in Figure 1, a multi-MEC server and multi-IoT device cellular network is considered, which consists of a high-performance PMS, M SMSs with relatively weak computing power, and K IoT devices. The computing power of the PMS is much stronger than that of the SMSs. The PMS is deployed at the BS, and the SMSs are deployed in APs distributed at different locations within the coverage area of the BS. A specific instance of this network is a UAV-assisted MEC system, in which the PMS is at the BS and the SMSs are UAVs equipped with limited-computing-power MEC servers. To avoid repetition, the notations below do not distinguish between the PMS and the BS, or between an SMS and its AP. We assume each SMS is low cost and can easily be deployed or removed according to the requirements of the MEC system. Let 𝓜 = {0, 1, 2, …, M} denote the set of MEC servers, where index 0 denotes the PMS and the others denote the SMSs. Let 𝓚 = {1, 2, …, K} denote the set of IoT devices. We assume that every IoT device and MEC server is equipped with a single antenna; by adopting multi-antenna technologies, our work can be extended to multi-antenna scenarios [7,9,22,23].
It is assumed that the system operates in a time-slotted manner with time-slot length Δ. In this paper, we are concerned with the long-term return over T consecutive time-slots. The set of time-slots is denoted as 𝓣 = {1, 2, …, T}. Let h_{k,m,t} denote the channel power gain between IoT device k and MEC server m at time-slot t. Similar to [7,24], we assume that the wireless channels between the IoT devices and the MEC servers remain unchanged within each time-slot and vary across time-slots. Motivated by the works in [25,26], we adopt a Z_k-element channel power gain state set to capture the time-varying characteristics of h_{k,m,t}, denoted as 𝓗_{k,m} = {h^k_{1,m}, h^k_{2,m}, …, h^k_{Z_k,m}}, i.e., h_{k,m,t} ∈ 𝓗_{k,m}.
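As a concrete illustration of this finite-state block-fading model, the sketch below draws one gain per device-server pair per slot from its state set. This is an illustrative toy, not the paper's code; the function name and the particular state values are our assumptions.

```python
import random

def sample_channel_gains(H, K, M, T, seed=0):
    """Draw h[k][m][t] from the finite state set H[k][m] independently at
    each time-slot: gains stay constant within a slot and vary across
    slots, matching the block-fading assumption above."""
    rng = random.Random(seed)
    return [[[rng.choice(H[k][m]) for _ in range(T)]
             for m in range(M + 1)]  # servers 0..M, index 0 is the PMS
            for k in range(K)]

# Illustrative 4-state gain set shared by every (k, m) pair
H = [[[2e-6, 4e-6, 6e-6, 8e-6] for _ in range(3)] for _ in range(2)]
h = sample_channel_gains(H, K=2, M=2, T=5)
```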

2.2. Computation Task Model

At time-slot t, the computation task of IoT device k is denoted as β_{k,t}, defined by a tuple (l_{k,t}, c_{k,t}, τ_{k,t}), where l_{k,t} denotes the size (in bits) of task β_{k,t}, c_{k,t} denotes the number of CPU cycles required to compute one bit of task β_{k,t} (i.e., its computational complexity), and τ_{k,t} is the latency requirement of task β_{k,t}. Similar to the assumption in [27], we assume that at the beginning of each time-slot t, each IoT device k has a new task arrival: l_{k,t} is randomly drawn from the set 𝓛_k = {L^k_1, L^k_2, …, L^k_{N_k}}, and the corresponding computational complexity c_{k,t} belongs to the set 𝓒_k = {C^k_1, C^k_2, …, C^k_{N_k}}. For simplicity, the latency requirement of each task is set to τ_{k,t} = Δ, ∀k ∈ 𝓚, t ∈ 𝓣.
To facilitate resource management, a virtual system operator (VSO) is deployed at the BS, which is responsible for collecting the network information (e.g., the channel state information, the size of each IoT device's task, and each task's computational complexity) and allocating computation resources to the IoT devices. As the computing power of the IoT devices is weak, we assume that each IoT device's task must be entirely offloaded to some MEC server for computation through a wireless link. The task offloading decision variable of IoT device k is denoted as μ_{k,m,t} ∈ {0, 1}, where μ_{k,m,t} = 1 indicates that IoT device k's computation task is offloaded to MEC server m for execution at time-slot t.
Let τ_{k,m,t} denote the duration of task offloading from IoT device k to MEC server m at time-slot t. The offloaded task is then processed by MEC server m in the remaining time Δ − τ_{k,m,t}. The rate at which task data are offloaded by IoT device k to MEC server m at time-slot t can be expressed as
$$R_{k,m}(t) = B_{k,m} \log_2\!\left(1 + \frac{h_{k,m,t}\, p_{k,m,t}}{\sigma_m^2}\right), \quad k \in \mathcal{K},\ m \in \mathcal{M},\ t \in \mathcal{T},$$
where B_{k,m} denotes the available channel bandwidth between IoT device k and MEC server m; p_{k,m,t} denotes the transmit power of IoT device k when offloading to MEC server m at time-slot t; and σ_m^2 is the noise power at MEC server m. The energy consumption of IoT device k for task offloading at time-slot t is expressed as
$$E^{o}_{m,k}(t) = p_{k,m,t}\, \tau_{k,m,t}, \quad k \in \mathcal{K},\ m \in \mathcal{M},\ t \in \mathcal{T}.$$
The corresponding computation energy consumption of the MEC server m can be expressed as
$$E^{c}_{m,k}(t) = \rho_m\, f_{m,k,t}^{3}\left(\Delta - \tau_{k,m,t}\right), \quad k \in \mathcal{K},\ m \in \mathcal{M},\ t \in \mathcal{T},$$
where f_{m,k,t} denotes the CPU frequency that MEC server m allocates to IoT device k's task at time-slot t, and ρ_m is the effective capacitance coefficient of MEC server m, which is determined by the chip architecture [6,7].
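For reference, the rate and energy models above translate directly into code. The following is a minimal sketch with illustrative parameter values, not the authors' implementation:

```python
import math

def offload_rate(B, h, p, noise):
    """Achievable offloading rate R = B * log2(1 + h*p / sigma^2) in bits/s."""
    return B * math.log2(1.0 + h * p / noise)

def offload_energy(p, tau):
    """Transmit energy of the IoT device: E^o = p * tau (joules)."""
    return p * tau

def compute_energy(rho, f, delta, tau):
    """Computation energy of the MEC server: E^c = rho * f^3 * (delta - tau)."""
    return rho * f ** 3 * (delta - tau)

# Example with illustrative values: 200 kHz bandwidth, 100 ms slot
R = offload_rate(B=200e3, h=4e-6, p=0.1, noise=1e-13)
E_total = (offload_energy(p=0.1, tau=0.04)
           + compute_energy(rho=1e-27, f=1e9, delta=0.1, tau=0.04))
```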

2.3. Problem Formulation

If the computing pressure on the PMS is too high, i.e., if the VSO allocates too many tasks to the PMS, the PMS has a higher probability of crashing. As the PMS has much stronger computing power than the SMSs, a PMS crash has a serious impact on the reliability of the MEC system. Meanwhile, the energy consumption and the task completion rate are also very important to the MEC system. Therefore, in this paper, we aim to minimize the weighted sum of the computing pressure on the PMS, the total energy consumption of the MEC servers and the IoT devices, and the task dropping cost. The corresponding optimization problem is formulated as
$$\mathcal{P}_1:\ \min_{\{\mu_{k,m,t},\, \tau_{k,m,t},\, f_{m,k,t},\, p_{k,m,t}\}} \mathbb{E}\!\left[\sum_{t=1}^{T} \gamma^{t-1}\!\left(\omega_0 \psi_0 \sum_{k=1}^{K} \mu_{k,0,t} + \omega_1\!\left(\psi_1 \sum_{m=0}^{M}\sum_{k=1}^{K} \mu_{k,m,t}\left(E^{o}_{m,k}(t) + E^{c}_{m,k}(t)\right) + \psi_2 \sum_{m=0}^{M}\sum_{k=1}^{K} \mu_{k,m,t}\, \Gamma_{k,t}\, I_{k,t}(\beta_{k,t})\right)\right)\right] \tag{4a}$$
$$\text{s.t.}\quad \sum_{m=0}^{M} \mu_{k,m,t} = 1,\ \ \mu_{k,m,t} \in \{0,1\}, \quad \forall k \in \mathcal{K},\ t \in \mathcal{T}, \tag{4b}$$
$$\tau_{k,m,t}\, R_{k,m}(t) \ge \mu_{k,m,t}\, l_{k,t}, \quad \forall k \in \mathcal{K},\ m \in \mathcal{M},\ t \in \mathcal{T}, \tag{4c}$$
$$\mu_{k,m,t}\, l_{k,t}\, c_{k,t} \le f_{m,k,t}\left(\Delta - \tau_{k,m,t}\right), \quad \forall k \in \mathcal{K},\ m \in \mathcal{M},\ t \in \mathcal{T}, \tag{4d}$$
$$0 \le p_{k,m,t} \le p_{k,\max}, \quad \forall k \in \mathcal{K},\ m \in \mathcal{M},\ t \in \mathcal{T}, \tag{4e}$$
$$0 \le f_{m,k,t} \le \frac{f_{m,\max}}{\sum_{k'=1}^{K} \mu_{k',m,t}}, \quad \forall k \in \mathcal{K},\ m \in \mathcal{M},\ t \in \mathcal{T}, \tag{4f}$$
$$0 \le \tau_{k,m,t} \le \Delta, \quad \forall k \in \mathcal{K},\ m \in \mathcal{M},\ t \in \mathcal{T}, \tag{4g}$$
$$\omega_i \in \{0,1\},\ i = 0, 1, \tag{4h}$$
where the first term of the objective function represents the computing pressure on the PMS, and the second term represents the total energy consumption of the network plus the task dropping cost; ω_0 and ω_1 are the weights of these two terms, respectively, with ω_i = 0 (i = 0, 1) meaning the corresponding objective is not considered and ω_i = 1 meaning it is; ψ_i > 0 (i = 0, 1, 2) are normalization factors for the individual terms; γ ∈ (0, 1) is the discount factor, which expresses the relative importance of future rewards versus the present reward [28]; p_{k,max} and f_{m,max} denote the maximum transmit power of IoT device k and the maximum available CPU frequency of MEC server m, respectively; and Γ_{k,t} > 0 is the task dropping cost of IoT device k at time-slot t. I_{k,t} is the indicator function, given as
$$I_{k,t}(\beta_{k,t}) = \begin{cases} 0, & \beta_{k,t}\ \text{is completed}, \\ 1, & \beta_{k,t}\ \text{is dropped}. \end{cases}$$
Constraint (4b) is the offloading decision constraint, which guarantees that each IoT device's task is allocated to exactly one MEC server. Constraints (4c) and (4d) ensure that each IoT device's task can be offloaded and completed in time. Constraint (4e) bounds the transmit power of the IoT devices. Constraint (4f) is the CPU frequency constraint; we assume that the IoT devices assigned to MEC server m, m ∈ 𝓜, share its CPU equally. Constraints (4g) and (4h) bound the task offloading time and the weights, respectively.
Due to the binary variables μ_{k,m,t} and the highly coupled constraints, problem P1 is a nonconvex MINLP. Furthermore, the computation tasks of the IoT devices and the channel gains vary randomly over the T consecutive time-slots, so the problem cannot be solved in one shot at the beginning of the horizon. Thus, traditional optimization-based methods are not suitable for solving problem P1.

3. Proposed DRL-Based Optimization Algorithm

To address this issue, we propose a DRL-based optimization algorithm in this section. Specifically, we utilize the importance-sampling-based parameterized policy gradient approach (PPGA) DRL algorithm [28]. To apply the DRL-based algorithm, we first define the system state, action, reward, and policy of the MEC system as follows:
(1) System state S_t: The system state at time-slot t is characterized by the channel power gains, the sizes (in bits) of the computation tasks, and the corresponding task complexities, i.e., S_t = {h_{k,m,t}, l_{k,t}, c_{k,t}}, ∀k ∈ 𝓚, m ∈ 𝓜.
(2) Action A_t: A_t is the set of offloading decisions of the IoT devices, i.e., A_t = {μ_{k,m,t}}, ∀k ∈ 𝓚, m ∈ 𝓜.
(3) Reward R_t: After executing action A_t under system state S_t at time-slot t, the VSO receives a reward R_t. The reward of a DRL model is directly related to the optimization objective of the system. Therefore, the reward of our DRL model is determined by the value of the objective function of problem P1 at time-slot t. For given μ_{k,m,t}, ∀k ∈ 𝓚, m ∈ 𝓜 (i.e., for a given A_t), the optimization problem defining the reward R_t is given as
$$\mathcal{P}_2:\ R_t = \min_{\{\tau_{k,m,t},\, f_{m,k,t},\, p_{k,m,t}\}} \gamma^{t-1}\!\left(\omega_0 \psi_0 \sum_{k=1}^{K} \mu_{k,0,t} + \omega_1\!\left(\psi_1 \sum_{m=0}^{M}\sum_{k=1}^{K} \mu_{k,m,t}\left(E^{o}_{m,k}(t) + E^{c}_{m,k}(t)\right) + \psi_2 \sum_{m=0}^{M}\sum_{k=1}^{K} \mu_{k,m,t}\, \Gamma_{k,t}\, I_{k,t}(\beta_{k,t})\right)\right)$$
$$\text{s.t.}\ (4b)\text{–}(4h).$$
The standard form of a DRL problem is to maximize the accumulated reward; hence, we add a minus sign in front of R_t. Obviously, once μ_{k,m,t}, ∀k ∈ 𝓚, m ∈ 𝓜, are determined, the computing pressure on the PMS is determined as well. Therefore, solving problem P2 is equivalent to solving problem P3 below:
$$\mathcal{P}_3:\ \min_{\{\tau_{k,m,t},\, f_{m,k,t},\, p_{k,m,t}\}} \omega_1\!\left(\psi_1 \sum_{m=0}^{M}\sum_{k=1}^{K} \mu_{k,m,t}\left(E^{o}_{m,k}(t) + E^{c}_{m,k}(t)\right) + \psi_2 \sum_{m=0}^{M}\sum_{k=1}^{K} \mu_{k,m,t}\, \Gamma_{k,t}\, I_{k,t}(\beta_{k,t})\right)$$
$$\text{s.t.}\ (4c)\text{–}(4g).$$
Based on primal decomposition theory [29], problem P3 can be decomposed into K sub-problems. Specifically, for each pair of IoT device k and MEC server m with μ_{k,m,t} = 1, the corresponding sub-problem can be expressed as
$$\mathcal{P}_{3,k}^{m}:\ \min_{\tau_{k,m,t},\, f_{m,k,t},\, p_{k,m,t}} \psi_1\left(E^{o}_{m,k}(t) + E^{c}_{m,k}(t)\right) + \psi_2\, \Gamma_{k,t}\, I_{k,t}(\beta_{k,t})$$
$$\text{s.t.}\ (4c)\text{–}(4g).$$
If problem P3,k^m is solvable, i.e., if there exist feasible τ_{k,m,t}, f_{m,k,t}, and p_{k,m,t} satisfying constraints (4c)–(4g), then the optimal value of P3,k^m equals the optimal value of ψ_1(E^o_{m,k}(t) + E^c_{m,k}(t)). Otherwise, if no feasible solution exists, we set the objective value of P3,k^m to Γ_{k,t}. It is worth noting that if not all the IoT devices' tasks can be completed, our work cannot be applied to minimize the total energy consumption of the system. According to P3,k^m, if we set ψ_2 = 0, the optimal choice is μ_{k,m,t} = 0, ∀k ∈ 𝓚, m ∈ 𝓜, t ∈ 𝓣, which is a pointless solution. Hence, the conditions ψ_1 > 0 and ψ_2 > 0 must both hold when ω_1 = 1. If problem P3,k^m has feasible solutions, it is transformed into P4,k^m, given as
$$\mathcal{P}_{4,k}^{m}:\ \min_{\tau_{k,m,t},\, f_{m,k,t},\, p_{k,m,t}} E^{o}_{m,k}(t) + E^{c}_{m,k}(t)$$
$$\text{s.t.}\ (4c)\text{–}(4g).$$
Problem P4,k^m is still nonconvex and intractable due to the complex coupling among the variables τ_{k,m,t}, f_{m,k,t}, and p_{k,m,t}. To address this issue, we adopt the block coordinate descent (BCD) method to optimize τ_{k,m,t}, f_{m,k,t}, and p_{k,m,t} alternately. For any given feasible τ_{k,m,t}, problem P4,k^m is transformed into P5,k^m, given as
$$\mathcal{P}_{5,k}^{m}:\ \min_{f_{m,k,t},\, p_{k,m,t}} p_{k,m,t}\, \tau_{k,m,t} + \rho_m\, f_{m,k,t}^{3}\left(\Delta - \tau_{k,m,t}\right)$$
$$\text{s.t.}\ (4c)\text{–}(4f).$$
The above optimization problem can be further decomposed into the following two manageable sub-problems:
$$\mathcal{P}_{5,k}^{m,1}:\ \min_{p_{k,m,t}} p_{k,m,t}\, \tau_{k,m,t} \quad \text{s.t.}\ (4c),\ (4e);$$
$$\mathcal{P}_{5,k}^{m,2}:\ \min_{f_{m,k,t}} \rho_m\, f_{m,k,t}^{3}\left(\Delta - \tau_{k,m,t}\right) \quad \text{s.t.}\ (4d),\ (4f).$$
Theorem 1.
For a given τ_{k,m,t}, the optimal f_{m,k,t} and p_{k,m,t} are given as
$$f^{*}_{m,k,t} = \frac{\mu_{k,m,t}\, l_{k,t}\, c_{k,t}}{\Delta - \tau_{k,m,t}},$$
$$p^{*}_{k,m,t} = \frac{\sigma_m^2}{h_{k,m,t}}\left(2^{\frac{\mu_{k,m,t}\, l_{k,t}}{\tau_{k,m,t}\, B_{k,m}}} - 1\right),$$
respectively.
Proof. 
It is easy to show that problems P5,k^{m,1} and P5,k^{m,2} are both convex optimization problems and can be efficiently solved using the Karush-Kuhn-Tucker (KKT) conditions [30]. □
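The intuition behind Theorem 1 is that the energy terms are monotonically increasing in f and p, so the optimum makes constraints (4d) and (4c) tight: use the slowest CPU frequency that still finishes in Δ − τ, and the smallest power whose rate delivers the task's bits within τ. A hedged Python sketch of these closed forms (function names are ours):

```python
def optimal_cpu_frequency(mu, l, c, delta, tau):
    """f* = mu*l*c / (delta - tau): the slowest CPU frequency that still
    finishes the mu*l*c required cycles in the remaining time delta - tau."""
    return mu * l * c / (delta - tau)

def optimal_transmit_power(noise, h, mu, l, tau, B):
    """p* = (sigma^2/h) * (2**(mu*l/(tau*B)) - 1): the smallest power whose
    Shannon rate delivers the mu*l bits within the offloading time tau."""
    return (noise / h) * (2.0 ** (mu * l / (tau * B)) - 1.0)
```

Substituting p* back into the rate expression gives B log2(1 + h p*/σ²) τ = μl, i.e., the offloading constraint (4c) holds with equality, as expected.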
Substituting the above results into P5,k^m, we have
$$\mathcal{P}_{6,k}^{m}:\ \min_{\tau_{k,m,t}} \frac{\sigma_m^2\, \tau_{k,m,t}}{h_{k,m,t}}\left(2^{\frac{\mu_{k,m,t}\, l_{k,t}}{\tau_{k,m,t}\, B_{k,m}}} - 1\right) + \frac{\rho_m\left(\mu_{k,m,t}\, l_{k,t}\, c_{k,t}\right)^{3}}{\left(\Delta - \tau_{k,m,t}\right)^{2}}$$
$$\text{s.t.}\quad \tau^{\min}_{k,m,t} \le \tau_{k,m,t} \le \tau^{\max}_{k,m,t},$$
where
$$\tau^{\min}_{k,m,t} = \frac{\mu_{k,m,t}\, l_{k,t}}{B_{k,m} \log_2\left(1 + \frac{h_{k,m,t}\, p_{k,\max}}{\sigma_m^2}\right)}, \qquad \tau^{\max}_{k,m,t} = \Delta - \frac{\mu_{k,m,t}\, l_{k,t}\, c_{k,t} \sum_{k'=1}^{K} \mu_{k',m,t}}{f_{m,\max}}.$$
According to Theorem 1, the optimal solutions of problem P5,k^m have closed forms determined by the value of τ_{k,m,t}. Therefore, if τ^min_{k,m,t} ≤ τ^max_{k,m,t}, solving problem P4,k^m is equivalent to solving problem P6,k^m, which has the single optimization variable τ_{k,m,t}. If τ^min_{k,m,t} > τ^max_{k,m,t}, problem P6,k^m is infeasible.
Theorem 2.
The optimization problem P 6 , k m is convex.
Proof. 
See Appendix A. □
Based on the convexity established in Theorem 2, we can adopt the bisection method to solve problem P6,k^m. The bisection-based optimization algorithm for solving problem P6,k^m is summarized in Algorithm 1, where g(τ_{k,m,t}) denotes the objective function of P6,k^m.
(4) Policy π_θ(A_t|S_t): The policy π_θ(A_t|S_t) denotes the mapping from state S_t to action A_t of the MEC system, i.e., π_θ(A_t|S_t): S_t → A_t, where θ is the parameter vector of the policy.
The parameter of the policy π_θ(A_t|S_t) is obtained through a gradient-based method. The performance measure Y(θ) of the PPGA is defined as [28]
$$Y(\theta) = V_{\pi_\theta}(S_0),$$
where V_{π_θ}(S_0) is the value function of policy π_θ starting from the initial state S_0. An analytic expression for the gradient of Y(θ) is provided by the policy gradient theorem [28]:
$$\nabla Y(\theta) \propto \sum_{S_t} \mu(S_t) \sum_{A_t} q_{\pi_\theta}(S_t, A_t)\, \nabla \pi_\theta(A_t \mid S_t),$$
where μ(S_t) is the on-policy distribution over states and q_{π_θ}(S_t, A_t) is the value of taking action A_t in state S_t under policy π_θ(A_t|S_t).
Algorithm 1: A Bisection Algorithm for Solving P6,k^m
1: Initialization:
2:   The bisection algorithm iteration index i = 1, the maximum number of iterations I_max, τ^min_{k,m,t}, τ^max_{k,m,t}, ∀k ∈ 𝓚, m ∈ 𝓜, and the tolerance error ξ.
3: for i = 1 : I_max
4:   Update c = (τ^min_{k,m,t} + τ^max_{k,m,t}) / 2.
5:   if g(c) = 0 then
6:     The optimal value of τ_{k,m,t} is τ^opt_{k,m,t} = c;
7:     break;
8:   end if
9:   if g(τ^min_{k,m,t}) · g(c) < 0 then
10:    Update τ^max_{k,m,t} = c;
11:  end if
12:  if g(c) · g(τ^max_{k,m,t}) < 0 then
13:    Update τ^min_{k,m,t} = c;
14:  end if
15:  if τ^max_{k,m,t} − τ^min_{k,m,t} < ξ or i == I_max then
16:    The optimal value of τ_{k,m,t} is τ^opt_{k,m,t} = c;
17:  end if
18: end for
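Since g is convex on the feasible interval (Theorem 2), a sign-change bisection can be run on its derivative g′; the sketch below takes that reading of the g(·) tests in Algorithm 1 and is our illustration, not the paper's implementation:

```python
def bisect_tau(dg, tau_min, tau_max, tol=1e-9, max_iter=100):
    """Minimize a convex g over [tau_min, tau_max] by bisecting on the sign
    of its derivative dg.  If dg never changes sign on the interval, the
    minimum sits at one of the boundary points."""
    lo, hi = tau_min, tau_max
    if dg(lo) >= 0:          # g is non-decreasing over the whole interval
        return lo
    if dg(hi) <= 0:          # g is non-increasing over the whole interval
        return hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if dg(mid) < 0:
            lo = mid         # minimizer lies to the right of mid
        else:
            hi = mid         # minimizer lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Toy check: g(tau) = (tau - 2)^2 on [0, 5] has derivative 2*(tau - 2)
tau_opt = bisect_tau(lambda t: 2.0 * (t - 2.0), 0.0, 5.0)  # ~= 2.0
```

In practice dg would be the closed-form derivative g′(τ) derived in Appendix A, evaluated at the given channel and task parameters.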
An action-independent baseline b(S_t) is often introduced to reduce the variance during training. The policy gradient with a baseline is then expressed as
$$\nabla Y(\theta) \propto \sum_{S_t} \mu(S_t) \sum_{A_t} \left(q_{\pi_\theta}(S_t, A_t) - b(S_t)\right) \nabla \pi_\theta(A_t \mid S_t).$$
An off-policy method adopts an exploratory behavior policy ψ(A_t|S_t) to generate behavior, while the target policy π_θ(A_t|S_t) learns from that behavior and eventually becomes the optimal policy. The importance sampling technique is widely used by off-policy methods, which weights the returns by the importance-sampling ratio [28]. The parameter θ is updated as
$$\theta_{t+1} = \theta_t + \alpha\, \gamma^{t-1}\, \frac{\pi_{\theta_t}(A_t \mid S_t)}{\psi(A_t \mid S_t)}\left(G - b(S_t)\right) \frac{\nabla \pi_{\theta_t}(A_t \mid S_t)}{\pi_{\theta_t}(A_t \mid S_t)}, \tag{18}$$
where α is the learning rate, G is the return following time-slot t, and θ_t is the estimate of θ at time-slot t. We adopt the estimate of the state value, v(S_t; w), as the baseline, where w is the weight vector of the state value function. The overall DRL-based algorithm is summarized in Algorithm 2.
Algorithm 2: The proposed DRL-based algorithm.
1: Initialization:
2:   θ, w, target policy π_θ(A_t|S_t), behavior policy ψ(A_t|S_t), maximum number of iterations K_max, discount factor γ, policy learning rate α_p > 0, and baseline learning rate α_b > 0;
3: for k = 1 : K_max
4:   Use ψ(A_t|S_t) and Algorithm 1 to generate a trajectory S_0, A_0, R_1, S_1, …, S_{T−1}, A_{T−1}, R_T, S_T;
5:   for t = T − 1, T − 2, …, 0
6:     Update G: G ← γG + R_{t+1};
7:     Update w: w ← w + α_b (G − v(S_t; w)) ∇v(S_t; w);
8:     Update θ by (18) with α = α_p;
9:   end for
10: end for
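Lines 6-8 of Algorithm 2 can be sketched as a single per-step update. The following is a simplified rendering of the off-policy REINFORCE-with-baseline rule, under our own function signature and names, not the paper's exact implementation:

```python
import numpy as np

def off_policy_reinforce_step(theta, w, grad_log_pi, v_s, grad_v, G,
                              ratio, gamma_pow, alpha_p, alpha_b):
    """One inner-loop step of Algorithm 2:
       delta  = G - v(S_t; w)                      (baseline-corrected return)
       w     += alpha_b * delta * grad_v           (line 7, baseline update)
       theta += alpha_p * gamma_pow * ratio * delta * grad_log_pi  (line 8)
    where ratio = pi_theta(A_t|S_t) / psi(A_t|S_t) is the importance-sampling
    ratio and grad_log_pi = grad(pi) / pi is the score function."""
    delta = G - v_s
    w_new = w + alpha_b * delta * grad_v
    theta_new = theta + alpha_p * gamma_pow * ratio * delta * grad_log_pi
    return theta_new, w_new

theta, w = np.zeros(2), np.zeros(2)
theta, w = off_policy_reinforce_step(theta, w,
                                     grad_log_pi=np.array([1.0, 0.0]),
                                     v_s=0.0, grad_v=np.array([0.0, 1.0]),
                                     G=2.0, ratio=1.0, gamma_pow=1.0,
                                     alpha_p=0.5, alpha_b=0.5)
```

In the paper, grad_log_pi and grad_v would come from automatic differentiation of the policy and baseline networks (TensorFlow in the authors' simulations).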

4. Complexity and Convergence Analysis

According to [31], the computational complexity of one training step for a fully-connected deep neural network (DNN) is O(Σ_{j=1}^{J} N_r^{j−1} N_r^{j}), where J is the number of layers and N_r^{j} is the number of neurons in the j-th layer. Considering Algorithms 1 and 2, the total complexity of our proposed algorithm is O(2TU Σ_{j=1}^{J} N_r^{j−1} N_r^{j} I_max), where U is the total number of training episodes.
A convergence guarantee for DRL algorithms is still an open issue [27]; convergence is influenced by many factors, such as the settings of the hyperparameters and the initial values of the DNN parameters. The convergence performance of our proposed algorithm is shown in Section 5.

5. Simulation Results

In this section, simulation results are provided to evaluate the performance of the proposed DRL-based algorithm. We conduct the simulations with Python 3.8 and TensorFlow 2.5.0. A fully-connected hidden layer with 10 neurons is employed in both the baseline and policy networks. The learning rates α_b and α_p are set to 8 × 10^−3 and 2 × 10^−3, respectively. The channel bandwidth between each IoT device and each MEC server is 200 kHz. The maximum CPU frequencies of each SMS and the PMS are set to 1 GHz and 5 GHz, respectively. The time-slot length Δ is set to 100 ms. 𝓗_{k,m} is set to {2 × 10^−6, 4 × 10^−6, 6 × 10^−6, 8 × 10^−6}, ∀k ∈ 𝓚, m ∈ 𝓜. Without loss of generality, c_{k,t}, ∀k ∈ 𝓚, t ∈ 𝓣, are all set to 1000 cycles/bit. ρ_m is set to 1 × 10^−27. 𝓛_k is set to {1.5 × 10^4, 3 × 10^4, 4.5 × 10^4, 6 × 10^4} bits, ∀k ∈ 𝓚. T is set to 40.
Figure 2 shows the impact of the initial values of θ and w on the accumulated reward. Iterative algorithms are sensitive to the initial values of their variables. In our paper, the initial values of θ and w are chosen randomly, which is common practice in DRL algorithms. It can be seen from Figure 2 that different initial values of θ and w strongly influence the convergence performance of our DRL-based algorithm. To guarantee the performance of the algorithm, we run it multiple times and select the run with the best performance as the final output.
Figure 3 shows the impact of the weights on the normalized number of tasks sent to the PMS per episode. It can be seen from Figure 3 that the case (ω_0, ω_1) = (1, 0) has the smallest number of tasks sent to the PMS. This is because the VSO is only concerned with the computing pressure on the PMS when (ω_0, ω_1) = (1, 0). When (ω_0, ω_1) = (0, 1), the VSO mainly tries to find a policy that minimizes the number of dropped tasks, namely, one that keeps problem P3,k^m solvable. As the PMS has the strongest computing power, the VSO allocates many tasks to it, as confirmed by Figure 3. Finally, when (ω_0, ω_1) = (1, 1), the VSO must make a trade-off between the computing pressure on the PMS and the task dropping cost. Hence, when (ω_0, ω_1) = (1, 1), the number of tasks sent to the PMS per episode is higher than in the case (ω_0, ω_1) = (1, 0) but smaller than in the case (ω_0, ω_1) = (0, 1).
Figure 4 shows the impact of the weights on the normalized number of dropped tasks per episode. When (ω_0, ω_1) = (1, 0), most of the tasks are allocated to the SMSs to reduce the computing pressure on the PMS. Since the computing power of the SMSs is weak, many tasks may be dropped. The case (ω_0, ω_1) = (0, 1) has the smallest normalized number of dropped tasks per episode, as explained for Figure 3. Similarly, the case (ω_0, ω_1) = (1, 1) produces an intermediate number of dropped tasks per episode.
Figures 5 and 6 show performance comparisons between our proposed algorithm ((ω_0, ω_1) = (1, 1)) and two benchmark policies: the "only send to the PMS" policy and the random allocation policy. The "only send to the PMS" policy allocates all the IoT devices' tasks to the PMS, as in a typical single MEC server deployment. The random allocation policy allocates each IoT device's task uniformly at random among the M + 1 MEC servers. Since the two benchmark policies are both short-term optimization policies, we plot the mean values of the normalized numbers of tasks sent to the PMS and of dropped tasks per episode in Figures 5 and 6. As shown there, our proposed algorithm achieves better performance than the "only send to the PMS" policy in terms of both the computing pressure on the PMS and the task dropping cost. To obtain a lower task dropping cost, our algorithm puts more computing pressure on the PMS than the random allocation policy does; however, it achieves a significant performance gain in terms of task dropping cost over the random allocation policy. Hence, our proposed algorithm is more practical than the random allocation policy.

6. Conclusions

We studied the problem of making a trade-off among the computing pressure on the PMS, the total energy consumption of the IoT devices and all the MEC servers, and the task dropping cost. The formulated MINLP problem was solved by a proposed DRL-based optimization algorithm. The simulation results demonstrated the validity of the proposed algorithm.

Author Contributions

Conceptualization, K.T. and B.L.; methodology, software, validation, K.T., H.C., Y.L. and B.L.; writing—review and editing, K.T. and B.L.; supervision, B.L. All authors have read and agreed to the published version of the manuscript.

Funding

The research reported in this article was supported by the Research Program of China Mobile System Integration Co., Ltd. under Grant ZYJC-Shaanxi-202110-B-CB-001.

Acknowledgments

We thank Wei Zhang for his valuable comments and discussion.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Theorem 1.
We start the proof by deriving the first- and second-order derivatives of the objective function of $\mathcal{P}_{6,k}^{m}$ to clarify its convexity. Specifically, let $g(\tau_{k,m,t})$ denote the objective function of $\mathcal{P}_{6,k}^{m}$, namely,
$$g(\tau_{k,m,t}) = \frac{\sigma^2 \tau_{k,m,t}}{h_{k,m,t}} \left( 2^{\frac{\mu_{k,m,t} l_{k,t}}{\tau_{k,m,t} B_{k,m}}} - 1 \right) + \frac{\rho_m \left( \mu_{k,m,t} l_{k,t} c_{k,t} \right)^3}{\left( \Delta - \tau_{k,m,t} \right)^2}.$$
Thus, its first- and second-order derivatives can be respectively given as
$$g'(\tau_{k,m,t}) = \frac{\sigma^2}{h_{k,m,t}} 2^{\frac{\mu_{k,m,t} l_{k,t}}{\tau_{k,m,t} B_{k,m}}} \left( 1 - \frac{\mu_{k,m,t} l_{k,t} \ln 2}{\tau_{k,m,t} B_{k,m}} \right) - \frac{\sigma^2}{h_{k,m,t}} + \frac{2 \rho_m \left( \mu_{k,m,t} l_{k,t} c_{k,t} \right)^3}{\left( \Delta - \tau_{k,m,t} \right)^3},$$
$$g''(\tau_{k,m,t}) = \frac{\sigma^2 (\ln 2)^2 \mu_{k,m,t}^2 l_{k,t}^2}{h_{k,m,t} B_{k,m}^2 \tau_{k,m,t}^3} 2^{\frac{\mu_{k,m,t} l_{k,t}}{\tau_{k,m,t} B_{k,m}}} + \frac{6 \rho_m \left( \mu_{k,m,t} l_{k,t} c_{k,t} \right)^3}{\left( \Delta - \tau_{k,m,t} \right)^4}.$$
It is observed that every term of the second-order derivative is positive for any feasible $\tau_{k,m,t}$. Thus, the optimization problem $\mathcal{P}_{5,k}^{m}$ contains a convex objective function and a linear constraint. Therefore, the optimization problem $\mathcal{P}_{5,k}^{m}$ is a convex optimization problem. The proof is completed. □
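The convexity claim can be sanity-checked numerically: on the feasible interval $0 < \tau < \Delta$, a central finite difference of $g$ should agree with the closed-form second derivative and be positive. A minimal sketch with hypothetical parameter values (σ², h, B, the product μl, the lumped constant ρ_m(μlc)³, and Δ are all illustrative choices, not values from the paper):

```python
import math

# Hypothetical parameters: sigma2 = sigma^2, mu_l = mu*l (offloaded bits),
# rho_mlc3 = rho_m * (mu*l*c)^3, Delta = slot length so 0 < tau < Delta.
sigma2, h, B = 1e-3, 0.5, 1e6
mu_l = 1e4
rho_mlc3 = 1e-6
Delta = 0.1

def g(tau: float) -> float:
    """Objective: transmit energy plus edge computing energy."""
    tx = (sigma2 / h) * tau * (2.0 ** (mu_l / (tau * B)) - 1.0)
    comp = rho_mlc3 / (Delta - tau) ** 2
    return tx + comp

def g2(tau: float) -> float:
    """Closed-form second derivative from the appendix."""
    tx = (sigma2 / h) * 2.0 ** (mu_l / (tau * B)) \
         * (math.log(2) ** 2) * mu_l ** 2 / (B ** 2 * tau ** 3)
    comp = 6.0 * rho_mlc3 / (Delta - tau) ** 4
    return tx + comp

tau, eps = 0.05, 1e-4
# Central second-order finite difference of g.
fd = (g(tau + eps) - 2.0 * g(tau) + g(tau - eps)) / eps ** 2
assert g2(tau) > 0.0                          # convexity at this feasible point
assert abs(fd - g2(tau)) / g2(tau) < 1e-3     # matches the closed form
print(f"g''({tau}) = {g2(tau):.4f}, finite difference = {fd:.4f}")
```

Both terms of g'' are manifestly positive for 0 < τ < Δ, so the check is expected to pass at every feasible point, not just the sampled one.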

References

  1. Zhao, Y.; Xu, K.; Wang, H.; Li, B.; Qiao, M.; Shi, H. MEC-enabled hierarchical emotion recognition and perturbation-aware defense in smart cities. IEEE Internet Things J. 2021, 8, 16933–16945. [Google Scholar] [CrossRef]
  2. Khan, L.U.; Yaqoob, I.; Tran, N.H.; Kazmi, S.M.A.; Dang, T.N.; Hong, C.S. Edge-computing-enabled smart cities: A comprehensive survey. IEEE Internet Things J. 2020, 7, 10200–10232. [Google Scholar] [CrossRef] [Green Version]
  3. Wu, H.; Zhang, Z.; Guan, C.; Wolter, K.; Xu, M. Collaborate edge and cloud computing with distributed deep learning for smart city internet of things. IEEE Internet Things J. 2020, 7, 8099–8110. [Google Scholar] [CrossRef]
  4. Mao, Y.; You, C.; Zhang, J.; Huang, K.; Letaief, K.B. A survey on mobile edge computing: The communication perspective. IEEE Commun. Surv. Tutor. 2017, 19, 2322–2358. [Google Scholar] [CrossRef] [Green Version]
  5. Ryu, J.W.; Pham, Q.V.; Luan, H.N.T.; Hwang, W.J.; Kim, J.D.; Lee, J.T. Multi-access edge computing empowered heterogeneous networks: A novel architecture and potential works. Symmetry 2019, 11, 842. [Google Scholar] [CrossRef] [Green Version]
  6. Bi, S.; Zhang, Y.J. Computation rate maximization for wireless powered mobile-edge computing with binary computation offloading. IEEE Trans. Wirel. Commun. 2018, 17, 4177–4190. [Google Scholar] [CrossRef] [Green Version]
  7. Wang, F.; Xu, J.; Wang, X.; Cui, S. Joint offloading and computing optimization in wireless powered mobile-edge computing systems. IEEE Trans. Wirel. Commun. 2018, 17, 1784–1797. [Google Scholar] [CrossRef]
  8. He, Y.; Ren, J.; Yu, G.; Cai, Y. D2D communications meet mobile edge computing for enhanced computation capacity in cellular networks. IEEE Trans. Wirel. Commun. 2019, 18, 1750–1763. [Google Scholar] [CrossRef]
  9. El Haber, E.; Nguyen, T.M.; Assi, C.; Ajib, W. Macro-cell assisted task offloading in mec-based heterogeneous networks with wireless backhaul. IEEE Trans. Netw. Serv. Manag. 2019, 16, 1754–1767. [Google Scholar] [CrossRef]
  10. Chen, L.; Shen, C.; Zhou, P.; Xu, J. Collaborative service placement for edge computing in dense small cell networks. IEEE Trans. Mob. Comput. 2021, 20, 377–390. [Google Scholar] [CrossRef]
  11. Zarandi, S.; Tabassum, H. Delay minimization in sliced multi-cell mobile edge computing (MEC) systems. IEEE Commun. Lett. 2021, 25, 1964–1968. [Google Scholar] [CrossRef]
  12. Lim, W.Y.B.; Ng, J.S.; Xiong, Z.; Jin, J.; Zhang, Y.; Niyato, D.; Leung, C.S.; Miao, C. Decentralized Edge Intelligence: A Dynamic Resource Allocation Framework for Hierarchical Federated Learning. IEEE Trans. Parallel Distrib. Syst. 2022, 33, 536–550. [Google Scholar] [CrossRef]
  13. Lim, W.Y.B.; Ng, J.S.; Xiong, Z.; Niyato, D.; Miao, C.; Kim, D.I. Dynamic Edge Association and Resource Allocation in Self-Organizing Hierarchical Federated Learning Networks. IEEE J. Sel. Areas Commun. 2021, 39, 3640–3653. [Google Scholar] [CrossRef]
  14. Yang, H.; Zhao, J.; Xiong, Z.; Lam, K.-Y.; Sun, S.; Xiao, L. Privacy-Preserving Federated Learning for UAV-Enabled Networks: Learning-Based Joint Scheduling and Resource Management. IEEE J. Sel. Areas Commun. 2021, 39, 3144–3159. [Google Scholar] [CrossRef]
  15. Chen, Y.; Liu, Z.; Zhang, Y.; Wu, Y.; Chen, X.; Zhao, L. Deep reinforcement learning-based dynamic resource management for mobile edge computing in industrial internet of things. IEEE Trans. Ind. Informat. 2021, 17, 4925–4934. [Google Scholar] [CrossRef]
  16. Zhong, C.; Gursoy, M.C.; Velipasalar, S. Deep reinforcement learning-based edge caching in wireless networks. IEEE Trans. Cogn. Commun. Netw. 2020, 6, 48–61. [Google Scholar] [CrossRef]
  17. Liu, C.; Tang, F.; Hu, Y.; Li, K.; Tang, Z.; Li, K. Distributed task migration optimization in mec by extending multi-agent deep reinforcement learning approach. IEEE Trans. Parallel Distrib. Syst. 2021, 32, 1603–1614. [Google Scholar] [CrossRef]
  18. Dai, A.; Zhang, K.; Maharjan, S.; Zhang, Y. Edge intelligence for energy-efficient computation offloading and resource allocation in 5G beyond. IEEE Trans. Veh. Technol. 2020, 69, 12175–12186. [Google Scholar] [CrossRef]
  19. Hu, H.; Wang, Q.; Hu, R.Q.; Zhu, H. Mobility-Aware Offloading and Resource Allocation in a MEC-Enabled IoT Network With Energy Harvesting. IEEE Internet Things J. 2021, 8, 17541–17556. [Google Scholar] [CrossRef]
  20. Ale, L.; Zhang, N.; Fang, X.; Chen, X.; Wu, S.; Li, L. Delay-aware and energy-efficient computation offloading in mobile-edge computing using deep reinforcement learning. IEEE Trans. Cogn. Commun. Netw. 2021, 7, 881–892. [Google Scholar] [CrossRef]
  21. Temesgene, D.A.; Miozzo, M.; Gündüz, D.; Dini, P. Distributed deep reinforcement learning for functional split control in energy harvesting virtualized small cells. IEEE Trans. Sustain. Comput. 2021, 6, 626–640. [Google Scholar] [CrossRef]
  22. Han, H.; Fang, L.; Lu, W.; Zhai, W.; Li, Y.; Zhao, J. A GCICA Grant-Free Random Access Scheme for M2M Communications in Crowded Massive MIMO Systems. IEEE Internet Things J. 2021, early access. [Google Scholar] [CrossRef]
  23. Han, H.; Fang, L.; Lu, W.; Chi, K.; Zhai, W.; Zhao, J. A Novel Grant-Based Pilot Access Scheme for Crowded Massive MIMO Systems. IEEE Trans. Veh. Technol. 2021, 70, 11111–11115. [Google Scholar] [CrossRef]
  24. Mao, Y.; Zhang, J.; Letaief, K.B. Dynamic Computation offloading for mobile-edge computing with energy harvesting devices. IEEE J. Sel. Areas Commun. 2016, 34, 3590–3605. [Google Scholar] [CrossRef] [Green Version]
  25. Liu, Y.; Xie, S.; Zhang, Y. Cooperative offloading and resource management for uav-enabled mobile edge computing in power iot system. IEEE Trans. Veh. Technol. 2020, 69, 12229–12239. [Google Scholar] [CrossRef]
  26. Wang, H.; Yu, F.R.; Zhu, L.; Tang, T.; Ning, B. Finite-state markov modeling for wireless channels in tunnel communication-based train control systems. IEEE Trans. Intell. Transp. Syst. 2014, 15, 1083–1090. [Google Scholar] [CrossRef]
  27. Tang, M.; Wong, V.W.S. Deep reinforcement learning for task offloading in mobile edge computing systems. IEEE Trans. Mob. Comput. 2020, in press. [CrossRef]
  28. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; The MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  29. Palomar, D.P.; Chiang, M. A tutorial on decomposition methods for network utility maximization. IEEE J. Sel. Areas Commun. 2006, 24, 1439–1451. [Google Scholar] [CrossRef] [Green Version]
  30. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  31. Li, C.; Xia, J.; Liu, F.; Li, D.; Fan, L.; Karagiannidis, G.K.; Nallanathan, A. Dynamic Offloading for Multiuser Muti-CAP MEC Networks: A Deep Reinforcement Learning Approach. IEEE Trans. Veh. Technol. 2021, 70, 2922–2927. [Google Scholar] [CrossRef]
Figure 1. The illustration of the multi-MEC server cellular network.
Figure 2. The impacts of the initial value of θ and w on the accumulated reward.
Figure 3. The impact of the weights on the normalized number of tasks sent to the PMS per episode.
Figure 4. The impact of the weights on the normalized number of dropped tasks per episode.
Figure 5. Comparison with benchmark policies in terms of normalized number of tasks sent to the PMS per episode.
Figure 6. Comparison with benchmark policies in terms of normalized number of dropped tasks per episode.
Table 1. Comparison of relevant works.
| Work | Objective | Method | Environment |
|------|-----------|--------|-------------|
| Our work | Weighted sum of computing pressure on the PMS, energy consumption, and task dropping cost | DRL | Dynamic |
| [6] | Computation rate of the wireless devices | Convex optimization | Static |
| [7] | Energy consumption of the AP | Convex optimization | Static |
| [8] | Number of serviced devices | Convex optimization | Static |
| [9] | Total energy consumption of the devices | Convex optimization | Static |
| [10] | System utility | Game theory | Static |
| [11] | Latency | Convex optimization | Static |
| [15] | Long-term average delay of the tasks | DRL | Dynamic |
| [16] | Cache bit rate | DRL | Dynamic |
| [17] | Average completion time of tasks | DRL | Dynamic |
| [18] | Energy consumption of the system | DRL | Dynamic |

Share and Cite

Tian, Kang; Chai, Haojun; Liu, Yameng; Liu, Boyang. Edge Intelligence Empowered Dynamic Offloading and Resource Management of MEC for Smart City Internet of Things. Electronics 2022, 11, 879. https://doi.org/10.3390/electronics11060879

