# Priority-Aware Resource Management for Adaptive Service Function Chaining in Real-Time Intelligent IoT Services


## Abstract


## 1. Introduction

- Routing optimization, priority-aware traffic classification (e.g., application-based/flow-based approaches), quality-of-service (QoS) and quality-of-experience estimation, advanced security, and resource allocation in SDN systems;
- Increasing flexibility in terms of instantiation, modification, and placement of virtual network functions (VNFs) in the NFV layer with model scalability and transferability;
- Reactive/proactive caching placement, improved service slicing, virtualized resource mapping, and task offloading decisions in resource-constrained MEC.

## 2. Preliminary Studies on DRL-Based SFC Deployment

**State** consists of all possible ranging predictions, classified into three categories: underestimation (predicted value less than the actual value), equivalent (predicted value equal to the actual value), and overestimation (predicted value greater than the actual value). **Action** alters states by decrementing or incrementing the prediction by 0.01. **Reward** maps each state–action pair to an outcome between 0 and 1, scoring its deficiency or efficiency in improving the immediate and next state. An acceptable error margin is defined.
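The state, action, and reward design above can be sketched in a few lines. This is a hedged illustration, not the cited paper's implementation: the margin value and the reward's decay shape are our assumptions; only the three state labels, the ±0.01 step, and the [0, 1] reward range come from the text.

```python
# Hypothetical sketch of the ranging-prediction MDP described above:
# states classify a prediction as under-/over-estimation or equivalent,
# actions nudge the prediction by +/-0.01, and the reward lies in [0, 1].

STEP = 0.01          # per-action increment/decrement from the text
MARGIN = 0.005       # acceptable error margin (illustrative value, not from the paper)

def classify(predicted: float, actual: float) -> str:
    """Map a (predicted, actual) pair to one of the three state labels."""
    if abs(predicted - actual) <= MARGIN:
        return "equivalent"
    return "underestimation" if predicted < actual else "overestimation"

def step(predicted: float, actual: float, action: int):
    """Apply an action (+1 increment, -1 decrement); return new prediction and reward."""
    new_pred = predicted + action * STEP
    # Reward in [0, 1]: 1 when exactly on target, decaying with absolute
    # error (the decay shape is an assumption for illustration).
    reward = max(0.0, 1.0 - abs(new_pred - actual) / (10 * STEP))
    return new_pred, reward
```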

**State** defines the features of remaining resource capacities, output bandwidth, and characteristics of the processed VNFs. **Action** indicates the placement index for deploying a particular VNF on the selected server node; the action alters the placement specification, which yields different performances, whereas the null action indicates that the VNF remains undeployed. **Reward** formulates the optimization model as a weighted total of provider and client profits, obtained from the approved requests and the deployment expenditures.
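A minimal sketch of this placement formulation follows; the type names, capacity model, and default weight are ours, not the cited work's, and only the state features, the null action, and the weighted provider/client profit reward are taken from the description above.

```python
# Illustrative sketch of the VNF-placement MDP: state holds remaining
# capacities, bandwidth, and the VNF's demand; the action is a server
# index (None = null action, i.e., undeployed); the reward is a weighted
# total of provider and client profits.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PlacementState:
    remaining_cpu: List[float]   # remaining resource capacity per server node
    out_bandwidth: List[float]   # output bandwidth per server node
    vnf_demand: float            # resources required by the VNF being placed

def reward(provider_profit: float, client_profit: float,
           w_provider: float = 0.5) -> float:
    """Weighted total of provider and client profits (the 0.5 default is illustrative)."""
    return w_provider * provider_profit + (1 - w_provider) * client_profit

def apply_action(state: PlacementState, server: Optional[int]) -> bool:
    """Place the VNF on `server`; None models the null (undeployed) action."""
    if server is None:
        return False
    if state.remaining_cpu[server] >= state.vnf_demand:
        state.remaining_cpu[server] -= state.vnf_demand
        return True
    return False
```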

## 3. System Models

#### 3.1. Costs of Allocation and Virtual Link Usages

#### 3.2. Computation Model

#### 3.3. Execution Delays

## 4. Priority-Aware Agent for Resource Management in SFC

**State spaces** indicate the significant features of the SDN/NFV-assisted SFC environment, where the placement costs, bandwidth allocation, link costs, and required resources per VNF are observed for formulating the central policy in both SFC-OP and SFC-FRFP. The proposed multi-agent DRL adopts the Markov decision process framework and synchronously stores the sequential experiences of the IoT network. SFC-OP provides the placement costs ${a}_{j\left(i\right)}^{s}$ and link costs ${c}_{l\left(j\right)}^{s}$ to the agents, while SFC-FRFP responds with $b{w}_{l\left(j\right)}^{s\to {s}^{\prime}}$ and $r{r}_{j\left(i\right)}$. The reference points are assigned within the NFV management and orchestration entities. Representational state transfer is used mainly in the SDNC for interactions between OF-enabled networks (cluster heads), SFC-FRFP, and the central multiple agents. These interfaces enable communication within the central software mechanisms for high and reliable performance.

$${s}_{t}=\left\{{a}_{j\left(i\right)}^{s},\;b{w}_{l\left(j\right)}^{s\to {s}^{\prime}},\;{c}_{l\left(j\right)}^{s},\;r{r}_{j\left(i\right)}\right\}$$

**Action spaces** consist of the configuration parameters of the priority-aware SFC system that strongly affect its execution. The status of deploying a VNF on a particular path, $s{t}_{j\left(i\right)}^{s}$, is a placement decision variable that can be adjusted between 0 and 1. The proposed agent considers the service workload, application criticality, and congestion as the major factors. Similarly, $s{t}_{l\left(j\right)}^{s}$ is a decision variable for placing virtual links between VNFs, also ranging between 0 and 1. This configuration emphasizes the difference between connectivity points in non-real-time and real-time IoT services.
The service criticality is incorporated with the application plane in the SDN-based management and experience-driven entity to enhance priority awareness, integrating a weight ${\theta}_{i}$ for each service path $i$. The critical ratio is a float ranging from 0.0 to 1.0, where 1.0 allocates the placement entirely to critical real-time labels. The predetermined resource allocation $f\left({r}_{j\left(i\right)}^{s}{\theta}_{i}\right)$ is maximized or stabilized based on the service-critical weights. For example, a critical weight of 0.5 for service $i$ from agent 3 determines the final placement and resource-capacity values from agents 1 and 2, as described in Equations (1) and (2).

$${a}_{t}=\left\{s{t}_{j\left(i\right)}^{s},\;s{t}_{l\left(j\right)}^{s},\;{\theta}_{i},\;f\left({r}_{j\left(i\right)}^{s}{\theta}_{i}\right)\right\}$$

**Reward** evaluates the applied ${a}_{t}$, in terms of placement decision variables and weights, against the current costs, required computation, and available bandwidth resources in ${s}_{t}$. Based on the performance of the state–action pair $\left({s}_{t},{a}_{t}\right)$, the collective reward in (7) combines the rewards of agents 1, 2, and 3 expressed in (8), (9), and (10), respectively. The reward for agent 1, denoted ${r}_{1}$, captures resource efficiency by steering toward the optimal variables that minimize overloaded resource placement; the output decreases once it exceeds 1, and the optimum is 1, since the model achieves the best-matching orchestration when $r{r}_{j\left(i\right)}$ equals $f\left({r}_{j\left(i\right)}^{s}{\theta}_{i}\right)$. The reward for agent 2, denoted ${r}_{2}$, scores the containerized placement with the objective of minimizing the placement costs.
A tradeoff between high cost and low latency arises when ${\theta}_{i}$ equals 1 (a highly mission-critical weight). The latency of executing the service path is captured by the reward ${r}_{3}$ of agent 3: the orchestration, execution, and modification latencies of the running SFC system are compared against the lower-bound and upper-bound thresholds of each service QoS class identifier.

$$R={r}_{1}+{r}_{2}+{r}_{3}$$

$${r}_{1}=\mathrm{min}\left({\displaystyle \sum}_{i\in vFG}{\displaystyle \sum}_{j\in vNF}{\displaystyle \sum}_{s\in vS}s{t}_{j\left(i\right)}^{s}\frac{r{r}_{j\left(i\right)}}{f\left({r}_{j\left(i\right)}^{s}{\theta}_{i}\right)}\right)$$

$${r}_{2}=\mathrm{min}\left({\displaystyle \sum}_{i\in vFG}{\displaystyle \sum}_{j\in vNF}\left({\displaystyle \sum}_{s\in vS}s{t}_{j\left(i\right)}^{s}{a}_{j\left(i\right)}^{s}{\theta}_{i}+{\displaystyle \sum}_{l\in vL}{\displaystyle \sum}_{s\in vS}s{t}_{l\left(j\right)}^{s}b{w}_{l\left(j\right)}^{s\to {s}^{\prime}}{c}_{l\left(j\right)}^{s}{\theta}_{i}\right)\right)$$

$${r}_{3}=\mathrm{min}\left({T}_{total}^{i}\right)$$

**Q-value and loss optimization** targets the long-term objectives and parameters of the deep networks serving as function approximators. The target network is trained with parameters sampled from the replay batch. In every training phase, the loss appears either high or low relative to the satisfactory point; it follows the mean-square-error form, i.e., the squared difference between the target ${Q}^{*}\left(s,a\right)$ and the predicted q-value. With an optimal q-value approximation, the agent can take actions that maximize the performance of each real-time service with long-term efficiency. Equation (11) gives the primary target q-value, which combines the rewards in (8)–(10) with the discount factor $\gamma$ (uncertainty rate) applied to the expected maximum in the next state.
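A minimal numeric sketch of the reward terms in (8)–(10) is given below. Array shapes, signs, and the handling of ratios above 1 are our assumptions; the paper only states that ${r}_{1}$ peaks at 1 when the required resources match the predetermined allocation, and that ${r}_{2}$ and ${r}_{3}$ are minimization targets.

```python
import numpy as np

# Sketch of the three agents' reward terms (assumed forms, see lead-in).

def r1(st, rr, f_alloc):
    # Resource-efficiency reward: deployment-weighted ratio of required to
    # allocated resources; the score decreases once the ratio exceeds 1.
    ratio = float(np.sum(st * rr / f_alloc))
    return ratio if ratio <= 1.0 else 2.0 - ratio

def r2(st_vnf, a_cost, st_link, bw, c_link, theta):
    # Placement plus link cost, weighted by criticality theta and negated
    # so that lower cost yields a higher reward (assumed sign convention).
    cost = np.sum(st_vnf * a_cost * theta) + np.sum(st_link * bw * c_link * theta)
    return -float(cost)

def r3(total_delay, qos_upper_bound):
    # Latency reward against the QoS-class upper-bound threshold.
    return -total_delay / qos_upper_bound
```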
The online and target networks iteratively update their parameters and synchronously exchange weights. The policy iteration of the multi-agent DQNs forms a flow transition between the components of the Markov decision process framework, as depicted in Figure 4.

$${Q}^{*}\left(s,a\right)={\displaystyle \sum}_{k\in R}{r}_{k}+\gamma \underset{{a}^{\prime}}{\mathrm{max}}{Q}^{*}\left({s}^{\prime},{a}^{\prime}\right)$$
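The target q-value of Equation (11) and the mean-square-error loss described above can be sketched as follows; the q-value arrays are placeholders standing in for the outputs of the online and target networks, and the default discount factor is our illustrative choice.

```python
import numpy as np

# Sketch of Equation (11): Q*(s,a) = sum_k r_k + gamma * max_a' Q*(s',a'),
# with the loss as the squared difference between target and predicted q-values.

def td_target(rewards, next_q_values, gamma=0.99):
    # Collective reward (r1 + r2 + r3) plus the discounted maximum
    # next-state q-value taken from the target network.
    return sum(rewards) + gamma * float(np.max(next_q_values))

def mse_loss(target_q, predicted_q):
    # Squared difference between the target and predicted q-values.
    return (target_q - predicted_q) ** 2
```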

## 5. Performance Evaluation

#### 5.1. Proposed and Reference Schemes

**DQN-RM** denotes the single-agent mechanism that tackles resource management without considering the critical weights ${\theta}_{i}$ of service priority awareness. Varying congestion intervals lead DQN-RM to deficient action selection, particularly when parallel service requests occur. By tackling only the resource predetermination $f\left({r}_{j\left(i\right)}^{s}\right)$, it achieves high resource efficiency, yet it remains an optimizable mechanism for real-time intelligent IoT services [33,34,35]. This reference scheme observes the same states as the proposed priority-aware scheme. **Random-RM** is a conventional approach used in several complementary studies, where the orchestration policy allocates system resources at random [15,34]. Rewards are still formulated in this scheme by executing the service requests with randomly adjusted resources and service-critical weights. **MR-RM** follows a greedy method, managing resources by optimizing either cost or real-time latency. In our experiments, we prioritize real-time latency over cost, which leads to a maximal configuration of ${\theta}_{i}$.
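The two non-learning baselines can be sketched as simple policies; the function and field names are ours, and only the behaviors (uniform random allocation for Random-RM, latency-first maximal ${\theta}_{i}$ for MR-RM) come from the descriptions above.

```python
import random

# Hedged sketches of the non-learning reference policies: Random-RM draws
# resources and critical weights uniformly at random, while MR-RM greedily
# fixes theta at its maximal rate to prioritize real-time latency over cost.

def random_rm(num_services, max_resource=1.0, seed=None):
    rng = random.Random(seed)
    return [{"resource": rng.uniform(0.0, max_resource),
             "theta": rng.uniform(0.0, 1.0)} for _ in range(num_services)]

def mr_rm(num_services, max_resource=1.0):
    # Greedy latency-first configuration: every service gets theta = 1.0.
    return [{"resource": max_resource, "theta": 1.0} for _ in range(num_services)]
```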

#### 5.2. Performance Metrics

**The average immediate total reward per episode** evaluates the state–action execution $\left({s}_{t},{a}_{t}\right)$ as expressed in Equations (7)–(10). In the experiment, each agent operates for 300 episodes; an epsilon-greedy strategy balances exploration and exploitation. The immediate rewards largely stabilize after 150 episodes. The metric sums every immediate reward of an episode and divides by the total number of steps; the visualization averages over 50-episode windows to present the accumulative outputs. The proposed and reference schemes operate with the same hyperparameters. A negative reward (−10, −1) identifies deficient action selection; the maximum value is 0. **End-to-end execution delay of requested services** is measured in milliseconds (ms) over a 300 s simulation. This metric is the total time from IoT tenants to the end service, which differs between the proposed and reference schemes only by ${T}_{total}^{i}$; it accounts for propagation, queueing, transmission, and the proposed agent's orchestration delays. **Delivery ratio of requested service tasks** represents the efficiency of the decision-making agent while service requests are being operated, i.e., whether a reliable path exists. Instantiation, deletion, and modification of service requests are the key management operations in this environment and the variables ensuring the reliability of offloaded and placed service tasks on any particular chain. With low delivery ratios, the policy is unsuitable for real-time IoT scenarios. The priority-aware scheme treats the upper-bound tolerable delays more critically and assigns them a critical weight to reduce the drop probability. **Throughput** indicates the successfully delivered data over the allocated bandwidth and VM computation resources traversing each VNF in the SFC system, measured in megabits per second (Mbps).
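Two of the metrics above reduce to short computations; the function names are ours, and only the 50-episode windowing and the delivered/requested ratio come from the text.

```python
# Sketch of the per-50-episode reward averaging used for visualization,
# and the delivery ratio of requested service tasks.

def windowed_average(rewards, window=50):
    # Average the immediate rewards over consecutive fixed-size windows.
    return [sum(rewards[i:i + window]) / len(rewards[i:i + window])
            for i in range(0, len(rewards), window)]

def delivery_ratio(delivered_tasks, requested_tasks):
    # Fraction of requested service tasks that reach the end service.
    return delivered_tasks / requested_tasks if requested_tasks else 0.0
```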

#### 5.3. Result and Discussions

## 6. Conclusions and Future Work

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Mijumbi, R.; Serrat, J.; Gorricho, J.L.; Bouten, N.; De Turck, F.; Boutaba, R. Network function virtualization: State-of-the-art and research challenges. *IEEE Commun. Surv. Tutor.* **2016**, *18*, 236–262.
2. European Telecommunications Standards Institute (ETSI). Deployment of Mobile Edge Computing in an NFV Environment. *ETSI Group Rep. MEC* **2018**, *17*, V1.
3. Contreras, L.M.; Bernardos, C.J. Overview of Architectural Alternatives for the Integration of ETSI MEC Environments from Different Administrative Domains. *Electronics* **2020**, *9*, 1392.
4. Xie, J.; Yu, F.R.; Huang, T.; Xie, R.; Liu, J.; Wang, C.; Liu, Y. A survey of machine learning techniques applied to software defined networking (SDN): Research issues and challenges. *IEEE Commun. Surv. Tutor.* **2018**, *21*, 393–430.
5. Manias, D.M.; Shami, A. The Need for Advanced Intelligence in NFV Management and Orchestration. *IEEE Netw.* **2021**, *35*, 365–371.
6. McClellan, M.; Cervelló-Pastor, C.; Sallent, S. Deep Learning at the Mobile Edge: Opportunities for 5G Networks. *Appl. Sci.* **2020**, *10*, 4735.
7. Chen, W.; Qiu, X.; Cai, T.; Dai, H.N.; Zheng, Z.; Zhang, Y. Deep Reinforcement Learning for Internet of Things: A Comprehensive Survey. *IEEE Commun. Surv. Tutor.* **2021**, *23*, 1659–1692.
8. ETSI GS ZSM 009-2 V1.1.1; Zero-Touch Network and Service Management (ZSM); Cross-Domain E2E Service Lifecycle Management. European Telecommunications Standards Institute (ETSI): Sophia Antipolis, France, 2022.
9. Ning, Z.; Wang, N.; Tafazolli, R. Deep Reinforcement Learning for NFV-based Service Function Chaining in Multi-Service Networks. In Proceedings of the 2020 IEEE 21st International Conference on High Performance Switching and Routing (HPSR), Newark, NJ, USA, 11–14 May 2020; pp. 1–6.
10. Sutton, R.S.; Barto, A.G. Introduction to Reinforcement Learning; MIT Press: Cambridge, MA, USA, 1998; Volume 2.
11. Vithayathil Varghese, N.; Mahmoud, Q.H. A Survey of Multi-Task Deep Reinforcement Learning. *Electronics* **2020**, *9*, 1363.
12. Chae, J.; Kim, N. Multicast Tree Generation using Meta Reinforcement Learning in SDN-based Smart Network Platforms. *KSII Trans. Internet Inf. Syst.* **2021**, *15*, 3138–3150.
13. Adoga, H.U.; Pezaros, D.P. Network Function Virtualization and Service Function Chaining Frameworks: A Comprehensive Review of Requirements, Objectives, Implementations, and Open Research Challenges. *Future Internet* **2022**, *14*, 59.
14. Moonseong, K.; Woochan, L. Adaptive Success Rate-based Sensor Relocation for IoT Applications. *KSII Trans. Internet Inf. Syst.* **2021**, *15*, 3120–3137.
15. Kim, E.G.; Kim, S. An Efficient Software Defined Data Transmission Scheme based on Mobile Edge Computing for the Massive IoT Environment. *KSII Trans. Internet Inf. Syst.* **2018**, *12*, 974–987.
16. Guo, A.; Yuan, C. Network Intelligent Control and Traffic Optimization Based on SDN and Artificial Intelligence. *Electronics* **2021**, *10*, 700.
17. Pei, J.; Hong, P.; Pan, M.; Liu, J.; Zhou, J. Optimal VNF placement via deep reinforcement learning in SDN/NFV-enabled networks. *IEEE J. Sel. Areas Commun.* **2019**, *38*, 263–278.
18. Bunyakitanon, M.; Vasilakos, X.; Nejabati, R.; Simeonidou, D. End-to-end performance-based autonomous VNF placement with adopted reinforcement learning. *IEEE Trans. Cogn. Commun. Netw.* **2021**, *6*, 534–547.
19. Jang, I.; Choo, S.; Kim, M.; Pack, S.; Shin, M. Optimal Network Resource Utilization in Service Function Chaining. In Proceedings of the 2016 IEEE NetSoft Conference and Workshops, Seoul, Korea, 6–10 June 2016; pp. 11–14.
20. Wang, X.; Xu, B.; Jin, F. An Efficient Service Function Chains Orchestration Algorithm for Mobile Edge Computing. *KSII Trans. Internet Inf. Syst.* **2021**, *15*, 4364–4384.
21. Okafor, K.C.; Longe, O.M. Integrating Resilient Tier N+1 Networks with Distributed Non-Recursive Cloud Model for Cyber-Physical Applications. *KSII Trans. Internet Inf. Syst.* **2022**, *16*, 2257–2285.
22. Qiao, Q. Routing Optimization Algorithm for Logistics Virtual Monitoring Based on VNF Dynamic Deployment. *KSII Trans. Internet Inf. Syst.* **2022**, *16*, 1708–1734.
23. Alonso, R.S.; Sittón-Candanedo, I.; Casado-Vara, R.; Prieto, J.; Corchado, J.M. Deep Reinforcement Learning for the Management of Software-Defined Networks and Network Function Virtualization in an Edge-IoT Architecture. *Sustainability* **2020**, *12*, 5706.
24. Huang, Y.-X.; Chou, J. A Survey of NFV Network Acceleration from ETSI Perspective. *Electronics* **2022**, *11*, 1457.
25. Lantz, B.; O'Connor, B. A Mininet-based virtual testbed for distributed SDN development. *ACM SIGCOMM Comput. Commun. Rev.* **2015**, *45*, 365–366.
26. Oliveira, R.L.S.D.; Schweitzer, C.M.; Shinoda, A.A.; Prete, L.R. Using Mininet for emulation and prototyping software-defined networks. In Proceedings of the 2014 IEEE Colombian Conference on Communications and Computing (COLCOM), Bogota, Colombia, 4–6 June 2014; pp. 1–6.
27. Svorobej, S.; Takako Endo, P.; Bendechache, M.; Filelis-Papadopoulos, C.; Giannoutakis, K.M.; Gravvanis, G.A.; Tzovaras, D.; Byrne, J.; Lynn, T. Simulating Fog and Edge Computing Scenarios: An Overview and Research Challenges. *Future Internet* **2019**, *11*, 55.
28. Park, J.-H.; Kim, H.-S.; Kim, W.-T. DM-MQTT: An Efficient MQTT Based on SDN Multicast for Massive IoT Communications. *Sensors* **2018**, *18*, 3071.
29. Abadi, M. TensorFlow: Learning Functions at Scale. *ACM SIGPLAN Not.* **2016**, *51*, 1.
30. Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. OpenAI Gym. *arXiv* **2016**, arXiv:1606.01540.
31. Li, L.; Lv, Y.; Wang, F.Y. Traffic signal timing via deep reinforcement learning. *IEEE/CAA J. Autom. Sin.* **2016**, *3*, 247–254.
32. Wang, S.; Liu, H.; Gomes, P.H.; Krishnamachari, B. Deep reinforcement learning for dynamic multichannel access in wireless networks. *IEEE Trans. Cogn. Commun. Netw.* **2018**, *4*, 257–265.
33. Tam, P.; Math, S.; Nam, C.; Kim, S. Adaptive Resource Optimized Edge Federated Learning in Real-Time Image Sensing Classifications. *IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.* **2021**, *14*, 10929–10940.
34. Nam, C.; Math, S.; Tam, P.; Kim, S. Intelligent resource allocations for software-defined mission-critical IoT services. *Comput. Mater. Contin.* **2022**, *73*, 4087–4102.
35. Math, S.; Tam, P.; Kim, S. Intelligent Offloading Decision and Resource Allocations Schemes Based on RNN/DQN for Reliability Assurance in Software-Defined Massive Machine-Type Communications. *Secur. Commun. Netw.* **2022**, *2022*, 4289216.

**Figure 4.** State transition procedure between each agent, function approximators, SFC environment, and experience-driven entities for enhancing priority awareness.

**Figure 5.** Performance metrics for proposed and reference schemes, including (**a**) the average immediate total reward per 50 episodes, (**b**) end-to-end execution delay of requested services, (**c**) delivery ratio, and (**d**) throughput.

| Abbreviation | Description |
|---|---|
| IoT | Internet of Things |
| DQN | Deep Q-Network |
| DRL | Deep Reinforcement Learning |
| MEC | Multi-Access Edge Computing |
| NFV | Network Function Virtualization |
| NFVO | NFV Orchestrator |
| QoS | Quality-of-Service |
| SDN | Software-Defined Networking |
| SDNC | SDN Controller |
| SFC | Service Function Chaining |
| SFC-FRFP | SFC Flow Rule Forwarding Path |
| SFC-OP | SFC Orchestration Policy |
| TOSCA | Topology and Orchestration Specification for Cloud Applications |
| VIM | Virtual Infrastructure Manager |
| VM | Virtual Machine |
| VNFs | Virtual Network Functions |
| VNFFG | VNF Forwarding Graph |

| Notation | Description |
|---|---|
| $BS=\left\{1,2,\dots ,b\right\}$ | A set of small base stations |
| $vS=\left\{1,2,\dots ,s\right\}$ | A set of NFV-enabled nodes with allocated virtual resources |
| $vFG=\left\{1,2,\dots ,i\right\}$ | A set of VNFFGs |
| $vNF=\left\{1,2,\dots ,j\right\}$ | A set of VNFs |
| $vL=\left\{1,2,\dots ,l\right\}$ | A set of virtual links |
| $s{t}_{j\left(i\right)}^{s}$ | The status variable of deploying VNF $j$ in service path $i$ |
| $s{t}_{l\left(j\to {j}^{\prime}\right)}^{s}$ | The status variable of placing virtual link $l$ connecting between VNFs |
| ${a}_{j\left(i\right)}^{s}$ | The placement cost of VNF $j$ |
| ${\theta}_{i}$ | Critical weight of priority-aware service $i$ |
| $b{w}_{l\left(j\right)}^{s\to {s}^{\prime}}$ | The bandwidth allocation between NFV-enabled nodes |
| ${c}_{l\left(j\right)}^{s}$ | Link costs of VNF $j$ in the NFV-enabled node |
| $f\left({r}_{j\left(i\right)}^{s}{\theta}_{i}\right)$ | Function returning the predetermined resource allocation in node $s$ for executing VNF $j$ (primarily based on ${\theta}_{i}$) |
| $r{r}_{j\left(i\right)}$ | Required resources for computing VNF $j$ |
| ${C}_{alloc}$ | Allocation cost in VNFFG |
| ${C}_{util}$ | Cost of total utilization in VNFFG |
| ${C}_{comp}$ | Computational resource model in VNFFG |
| ${T}_{total}^{i}$ | The total delay in a particular chain $i$ |
| ${T}_{mod}^{i}$ | Modification delay in a particular chain $i$ |
| ${T}_{orc}^{j}$ | Orchestration delay of VNF $j$ |
| ${T}_{exe}^{j}$ | Execution delay of VNF $j$ |

| Purpose/Platform | Specifications |
|---|---|
| SDN Controller (c0, c1, c2, c3) | RYU 4.32 (controller and NFV-enabled nodes) |
| Mininet | 2.3.0d6 |
| Open vSwitch | 2.13.0 |
| Iperf3 (QoS metrics) | Fair-queuing-based socket-level packing, UDP |
| RESTful | 4.1.1 |
| Device Configuration | mininet.node.CPULimitedHost |
| Control and Interfaces | mininet.link.TCIntf and mininet.link.OVSIntf |
| N4, N9 | OpenFlow-enabled protocol |
| N9 | 1 Gb/s to 2 Gb/s |
| N6 | 9 Gb/s |
| Payload Sizes | 1024 bytes |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Tam, P.; Math, S.; Kim, S.
Priority-Aware Resource Management for Adaptive Service Function Chaining in Real-Time Intelligent IoT Services. *Electronics* **2022**, *11*, 2976.
https://doi.org/10.3390/electronics11192976
