# A Learning Game-Based Approach to Task-Dependent Edge Resource Allocation

## Abstract

## 1. Introduction

- We propose a two-stage resource allocation method for edge computing with dependent tasks.
- In the first stage, we model the problem of incentivizing users to request resources from edge servers as a multivariate Stackelberg game and analyze the uniqueness of the Stackelberg equilibrium (SE) when information is shared. We then study the incentive problem without information sharing and transform it into a multi-agent partially observable Markov decision process (POMDP). To find the SE in this setting, we design a reinforcement learning algorithm based on learning games.
- In the second stage, we design a greedy-based deep reinforcement learning (DRL) algorithm that allocates resources so as to minimize task execution time.
- Simulation results show that the proposed learning-game-based reinforcement learning algorithm reaches the SE without information disclosure and outperforms the conventional advantage actor-critic (A2C) algorithm, and that the greedy-based DRL algorithm significantly reduces task execution time.
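To make the second-stage objective (minimizing the execution time of dependent tasks) concrete, the sketch below shows a plain greedy list scheduler. It is not the paper's greedy DRL algorithm. It assumes the tasks form a directed acyclic graph with known CPU-cycle demands, that each edge server has a fixed speed, and that data-transfer delays between tasks can be ignored. Each task, taken in topological order, goes to the server on which it would finish earliest. The function name `greedy_schedule` and the example workload are hypothetical.

```python
from graphlib import TopologicalSorter


def greedy_schedule(work, deps, speeds):
    """Hypothetical greedy earliest-finish-time assignment of DAG tasks.

    work:   task -> CPU cycles required
    deps:   task -> set of predecessor tasks that must finish first
    speeds: server -> CPU cycles per second
    Transfer delays are ignored (assumption). Returns (assignment, finish, makespan).
    """
    graph = {t: set(deps.get(t, ())) for t in work}  # every task appears as a node
    server_free = {s: 0.0 for s in speeds}           # time each server becomes idle
    finish, assignment = {}, {}
    for task in TopologicalSorter(graph).static_order():
        # A task may start only after all of its predecessors have finished.
        ready = max((finish[p] for p in graph[task]), default=0.0)
        # Greedy choice: the server with the earliest finish time for this task.
        best = min(speeds, key=lambda s: max(ready, server_free[s]) + work[task] / speeds[s])
        start = max(ready, server_free[best])
        finish[task] = start + work[task] / speeds[best]
        server_free[best] = finish[task]
        assignment[task] = best
    return assignment, finish, max(finish.values())


# Hypothetical workload: a -> {b, c} -> d on two servers of different speed.
work = {"a": 2e9, "b": 1e9, "c": 1e9, "d": 2e9}
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
speeds = {"edge1": 2e9, "edge2": 1e9}
assignment, finish, makespan = greedy_schedule(work, deps, speeds)
print(assignment, makespan)  # makespan 3.0
```

A one-step greedy rule like this can be myopic on deeper task graphs, which is the kind of limitation a learned policy is meant to address.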

## 2. Related Work and Preliminary Technology

#### 2.1. Related Work

#### 2.2. Preliminary Technology

## 3. System Model

#### 3.1. Local Computation

#### 3.2. Edge Computation

## 4. Incentives under Information Sharing Conditions

#### 4.1. Participant Utility Functions

#### 4.2. Problem Formulation

#### 4.3. Stackelberg Equilibrium Analysis under Information Sharing Conditions

Algorithm 1. Coordinate Alternation Method

Input: initial values of $D, \alpha, \beta, \kappa, I, \epsilon, R, x_i$

Output: optimal strategies $R^{[k]}, x_i^{[k]}$

1: while $\|R^{[k]} - R^{[k-1]}\| > \epsilon$ do

2: while $\|x_i^{[k]} - x_i^{[k-1]}\| > \epsilon$ do

3: for $i = 1, 2, \dots, I$ do

4: compute the followers' utilities $u_{1,m}$ and $u_{2,m}$ by Equations (9) and (10)

5: save the strategy that maximizes $u_{1,m}$ and $u_{2,m}$ as $x_i^{[k]}$

6: end for

7: compute the leader's utility $U$ by Equation (12)

8: save the strategy that maximizes $U$ as $R^{[k]}$

9: end while

10: end while
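The alternation in Algorithm 1 can be sketched in Python as block-coordinate best-response iteration. Because Equations (9), (10), and (12) are not reproduced here, the sketch assumes hypothetical toy utilities whose maximizers have closed forms. Follower $i$ uses $u_i = x_i \ln(1+R) - c_i x_i^2 - \gamma x_i \sum_{j \ne i} x_j$, and the leader uses $U = \omega \ln(1+R) \sum_i x_i - \tfrac{\kappa_L}{2} R^2$. The constants `C`, `GAMMA`, `OMEGA`, `KAPPA_L` and the helper functions are illustrative only. The sketch also runs the leader update once the followers' inner loop has settled, which is one reading of the loop nesting in the listing.

```python
import math

# Hypothetical toy model (NOT Equations (9), (10), (12) of the paper).
C = [1.0, 1.0, 1.0]        # follower cost coefficients c_i
GAMMA = 0.2                # congestion coupling between followers
OMEGA, KAPPA_L = 1.0, 1.0  # leader's value weight and cost of offering R


def follower_best_response(i, x, R):
    """argmax_{x_i >= 0} x_i*ln(1+R) - c_i*x_i**2 - GAMMA*x_i*sum_{j != i} x_j."""
    others = sum(x) - x[i]
    return max(0.0, (math.log1p(R) - GAMMA * others) / (2 * C[i]))


def leader_best_response(x):
    """argmax_{R >= 0} OMEGA*sum(x)*ln(1+R) - KAPPA_L/2 * R**2 (root of the FOC)."""
    total = sum(x)
    return (-1 + math.sqrt(1 + 4 * OMEGA * total / KAPPA_L)) / 2


def coordinate_alternation(R0, eps=1e-10, max_iter=10_000):
    R, x = R0, [0.0] * len(C)
    for _ in range(max_iter):                 # outer loop: leader strategy R
        for _ in range(max_iter):             # inner loop: followers' strategies x
            x_prev = list(x)
            for i in range(len(x)):           # each follower best-responds in turn
                x[i] = follower_best_response(i, x, R)
            if max(abs(a - b) for a, b in zip(x, x_prev)) <= eps:
                break
        R_new = leader_best_response(x)       # leader re-optimizes given x
        if abs(R_new - R) <= eps:
            return R_new, x
        R = R_new
    raise RuntimeError("coordinate alternation did not converge")


R_star, x_star = coordinate_alternation(R0=1.0)
print(R_star, x_star)
```

With these toy utilities, $R = 0$ is also a (trivial) fixed point, so the starting value matters. A positive $R_0$ leads to the nontrivial equilibrium.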

## 5. Incentives under Non-Information-Sharing Conditions

#### 5.1. Overview

#### 5.2. Design Details

#### 5.3. Optimization of Learning Objectives and Strategies

## 6. Greedy-Based DRL for Task Allocation

#### 6.1. Overview

#### 6.2. Design Details

#### 6.3. Optimization of Learning Objectives and Strategies

## 7. Simulation Results

#### 7.1. Performance Analysis of the Incentive Mechanism Algorithm Based on Learning Games

#### 7.2. Analysis of the Effectiveness of the Greedy-Based DRL Algorithm

## 8. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

Parameter | Value |
---|---|
User device computing capability $f_l$ | 1 GHz |
Effective switching capacitance $\kappa_m$ | $10^{-27}$ |
Transmit/receive task size $d_i / r_i$ | 5–50 Kb |
CPU cycles per bit of processed data $\eta$ | 500–1500 cycles/bit |
Unit cost of computation energy consumption $\alpha$ | $10^{-6}$–$10^{-5}$ |
Incentive mechanism penalty factor $\mu$ | 20 |
Task assignment penalty factor $\varphi$ | 5 |
Unit cost of communication rate $\beta$ | $10^{-4}$–$10^{-3}$ |
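As a rough sanity check on the magnitudes in the parameter table, the sketch below evaluates local execution time and energy at the corners of the parameter ranges. It assumes the widely used local-computation model $T = \eta d / f_l$ and $E = \kappa f_l^2 \eta d$. It also assumes that $\kappa$ from the table applies to the user device and that 1 Kb means $10^3$ bits. The helper `local_cost` is hypothetical and not taken from Section 3.1.

```python
# Hypothetical helper; assumes the common local-execution model
#   T = eta * d / f_l   and   E = kappa * f_l**2 * eta * d.
def local_cost(d_bits: float, eta: float, f_hz: float, kappa: float) -> tuple[float, float]:
    cycles = eta * d_bits            # total CPU cycles for the task
    t = cycles / f_hz                # local execution time in seconds
    e = kappa * f_hz ** 2 * cycles   # dynamic CPU energy in joules
    return t, e


F_L = 1e9        # f_l = 1 GHz
KAPPA = 1e-27    # kappa from the table, applied to the user device (assumption)

# Corners of the table's ranges (1 Kb taken as 1e3 bits, an assumption).
t_min, e_min = local_cost(5e3, 500, F_L, KAPPA)     # 2.5e6 cycles
t_max, e_max = local_cost(50e3, 1500, F_L, KAPPA)   # 7.5e7 cycles
print(f"time: {t_min:.4f}-{t_max:.4f} s, energy: {e_min:.4f}-{e_max:.4f} J")
# time: 0.0025-0.0750 s, energy: 0.0025-0.0750 J
```

Because $\kappa f_l^2 = 10^{-9}$ J per cycle at 1 GHz, time in seconds and energy in joules coincide numerically for this particular setting.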

