This section first describes the structure of the STIN and the problems to be solved. Then the proposed solutions are described: the fuzzy evaluation metric (FEM) is proposed to pre-evaluate the impact of users on overloading. In the proposed FL-LB, the access algorithm runs in a centralized control cell and determines initial access in order to prevent overloading before it occurs. The offloading algorithm runs in each cell and determines how to offload users from overloaded cells. The proposed metric and algorithms are described in detail in the following subsections.
2.1. Network Structure
We consider an
area in the STIN. There are
N cells in total, including
terrestrial cells, and one satellite overhead. Terrestrial cells are modeled with reference to [
24] and sparsely distributed in a hexagonal cellular layout with the inter-site distance (ISD)
D. The satellite is modeled as a geostationary Earth orbit (GEO) satellite [
25]. The coverage diameter of a single beam of the satellite is typically 50 to 250 km, which is very large compared with TN coverage. Therefore, we assume that the central beam of the satellite covers the whole TN area. TN and NTN cells differ in bandwidth, defined as
and
.
M users are randomly distributed, requiring data transfer at a random rate with an average of
. Users move at a fixed speed
v in random directions.
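The mobility model above (fixed speed, random heading) can be sketched as follows; the function name, time step, and the clamped boundary handling are illustrative assumptions rather than details from the paper.

```python
import math
import random

def step_user(x, y, v, angle, dt=1.0, area=10_000.0):
    """Move a user at fixed speed v along a random heading for one time step.

    The clamped-boundary handling and the area size are illustrative assumptions.
    """
    x_new = x + v * math.cos(angle) * dt
    y_new = y + v * math.sin(angle) * dt
    # Keep the user inside the simulation area by clamping (assumption).
    x_new = min(max(x_new, 0.0), area)
    y_new = min(max(y_new, 0.0), area)
    return x_new, y_new

# A user picks a random heading and keeps a fixed speed v.
angle = random.uniform(0.0, 2.0 * math.pi)
x, y = step_user(5_000.0, 5_000.0, v=3.0, angle=angle)
```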
The TN channel is modeled as a downlink channel with additive white Gaussian noise and Rayleigh fading. The maximum achievable rate is expressed as
where
is the bit rate between user
m and cell
n,
is the number of assigned resource blocks (
),
B is the maximum usable bandwidth of the cell,
is the total number of
,
is the transmission power and
is the target block error rate (BLER).
represents the signal-to-noise ratio (SNR) margin required to meet the desired target BLER with the QAM constellation [
26].
is the channel fading coefficient where
means a Gaussian random number in Equation (
4).
is the distance between user
m and cell
n.
is the noise power.
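The achievable-rate computation described by the definitions above can be sketched in Shannon form as follows; the path-loss exponent, the parameter names, and the exact way the SNR margin enters are illustrative assumptions, not the paper's exact Equations (1)–(4).

```python
import math
import random

def tn_rate(n_rb, b, n_rb_total, p_tx, d, noise_power, snr_margin, alpha=3.5):
    """Shannon-style achievable rate over n_rb of n_rb_total resource blocks.

    Rayleigh fading is drawn as the magnitude of a complex Gaussian; the
    path-loss exponent alpha and the overall structure are assumptions.
    """
    # Rayleigh fading coefficient |h|: magnitude of a unit complex Gaussian.
    h = math.hypot(random.gauss(0, 1), random.gauss(0, 1)) / math.sqrt(2)
    # Received SNR: transmit power, path loss ~ d^-alpha, fading, noise.
    snr = p_tx * (h ** 2) * d ** (-alpha) / noise_power
    bw = n_rb / n_rb_total * b          # bandwidth of the assigned RBs
    # The SNR margin backs off the usable SNR to meet the target BLER.
    return bw * math.log2(1.0 + snr / snr_margin)

random.seed(0)
r = tn_rate(n_rb=10, b=20e6, n_rb_total=100, p_tx=1.0,
            d=500.0, noise_power=1e-12, snr_margin=2.0)
```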
The altitude of GEO is 35,786 km. It has a typical reflector antenna with a circular aperture [
25]. We consider the simulation area is in the coverage area of the central beam and that the elevation angle of the beam’s center is
. The NTN channel is modeled with reference to [
27], and the inter-beam interference is modeled with reference to [
28]; they are not described in detail here due to space limitations.
2.2. Problem Description
The process of load balancing in STIN with sparsely deployed TN cells is shown in
Figure 1. Some users are on the edge of the area, with poor signal reception and a need for more bandwidth. Some users cannot find a better cell to hand over to. These factors make TN cells likely to overload in this network. An example is shown in
Figure 1. The solid arrows point to cells that users are currently accessing. The dashed arrows point to cells that users previously accessed. User
a and user
b are users on the edge of the area who initially access the NTN cell. When the TN cell
A suffers overloading, user
c is selected to be offloaded to the NTN cell, and user
d is selected to be offloaded to the adjacent TN cell
B for relatively better QoS. Thus, the problem is split into two sub-problems: the access algorithm decides which cell users initially access, and the offloading algorithm decides which users are offloaded and how to offload them.
For the access algorithm, according to the link budget, in a scenario where cells are sparsely deployed, such as the rural scenario in 3GPP [
24], the average SNR is 9.21 dB. In the coverage range of the central beam of a GEO satellite, the average SNR is only 2.95 dB. If the received signal quality is used to determine initial access, only a few users will access the NTN cell. As a result, NTN resources are not effectively utilized and TN cells are more likely to overload. To limit NTN access to suitable users, we aim to consider not only the received signal but also the impact of users on cell overloading. Therefore, fuzzy logic is utilized to propose an overload evaluation metric. Then, to deal with dynamic changes in the environment and determine the appropriate number of users allowed to access the NTN, the FLRL-AC is proposed.
For the offloading algorithm, even if the initial access is optimized, overloading may still occur after a certain period of time due to users’ movements, especially when the user density in the environment is large. Additionally, if the FLRL-AC were utilized to make a global re-access decision at this time, the QoS of users in cells that are not overloaded would be affected. Therefore, the FL-OL is proposed for the overloaded cells. The FL-OL offloads the most suitable users to the most suitable cells by utilizing the FEM.
2.3. The Fuzzy Evaluation Metric
Existing load balancing methods are usually performed reactively after the overloading occurs. Active load balancing adapts the network in advance to prevent overloading and improve performance. Therefore, this paper firstly proposes a metric to pre-evaluate the impact of users on overloading, which helps the networks to make further decisions. Considering the differences in carrier frequency and bandwidth between TN and NTN, and the difficulty of explicitly evaluating overload tendency, fuzzy control is utilized [
29] for the metric. It provides a unified measurement to evaluate the impact of users on overloading in STIN. An adaptive neuro fuzzy network (ANFN) is utilized to build the fuzzy system. The training network is shown in
Figure 2. In the following, the structure of each layer of the network is described.
In the input layer, SNR, relative speed and data rate are selected as the inputs, motivated by the Shannon formula, which relates bandwidth, SNR and the target rate. A channel with low SNR needs more bandwidth to transmit data at the same target rate, making overload more likely. In addition, if a user is getting closer to the cell, the SNR between them will improve; otherwise it will worsen. At the same SNR, users with higher data rates require more bandwidth. SNR and data rate requirements are measured in the same way in both RATs. The movement of a user can be expressed by the relative speed between the user and the cell. Since the satellite is far from the Earth, the distance a user moves in a short period of time is negligible relative to the satellite altitude; thus, the speed relative to the satellite is taken to be 0 in this paper. It is difficult to directly judge whether SNR, relative speed and data rate are high or not. Therefore, we take them as the three inputs in the “input” layer.
The “inputmf” layer uses membership functions to convert input values into fuzzy values. Commonly utilized membership functions are triangular, trapezoidal, Gaussian and bell-shaped. Since the relationship between inputs and output is not linear, the Gaussian membership function was selected.
where
c determines the center position of the function.
determines the width of the function. Both
c and
are trained by the ANFN. The fuzzy system has three inputs, and there are
P,
Q and
R fuzzy concepts for the three inputs, respectively. The membership degrees of the inputs to the different fuzzy concepts can be calculated via the membership functions. In this paper,
P,
Q and
R are all set to 5, corresponding to five fuzzy concepts: very bad (VB), bad (B), medium (M), good (G) and very good (VG). For each input, there are five membership functions following the same distribution with different parameters.
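The Gaussian membership evaluation in the “inputmf” layer can be sketched as follows; the centers and widths below are placeholders for the ANFN-trained parameters, and the example SNR value is purely illustrative.

```python
import math

def gauss_mf(x, c, sigma):
    """Gaussian membership function: degree to which x belongs to a concept."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

# Five concepts per input (VB, B, M, G, VG); the centers and widths are
# placeholders for the parameters the ANFN would learn during training.
CONCEPTS = ["VB", "B", "M", "G", "VG"]
centers = [-10.0, -5.0, 0.0, 5.0, 10.0]   # e.g. for an SNR input in dB
sigmas = [3.0] * 5

snr_db = 4.0
degrees = {name: gauss_mf(snr_db, c, s)
           for name, c, s in zip(CONCEPTS, centers, sigmas)}
# The input belongs most strongly to the concept with the highest degree.
best = max(degrees, key=degrees.get)
```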
The “rule” layer combines the
P fuzzy concepts of the first input,
Q fuzzy concepts of the second input and
R fuzzy concepts of the third input to obtain
fuzzy rules. T-S fuzzy reasoning is utilized in the proposed fuzzy system. For the
lth rule, the first input
x is
, the second input
y is
and the third input
z is
. The mapping result
is calculated by
where
;
.
,
,
and
are parameters of the
lth rule trained by ANFN;
,
and
are the
ith,
jth and
kth fuzzy concepts of the three inputs;
,
and
are the membership degree values calculated by the corresponding membership functions.
The “outputmf” layer uses the weighted average method to comprehensively account for the influence of each fuzzy rule. The output fuzzy evaluation metric
f is calculated by
where
,
.
is the weight of the
lth rule calculated by the product method.
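Putting the “rule” and “outputmf” layers together, a minimal Takagi–Sugeno inference sketch looks as follows; the rule parameters and the precomputed membership degrees are placeholders for ANFN-trained values.

```python
def ts_output(rules, mu_x, mu_y, mu_z, x, y, z):
    """Takagi-Sugeno weighted-average output over the fuzzy rules.

    Each rule (i, j, k, p, q, r, s) pairs concept i of the first input,
    j of the second and k of the third; the consequent is the linear map
    p*x + q*y + r*z + s. Firing strength uses the product method.
    """
    num, den = 0.0, 0.0
    for i, j, k, p, q, r, s in rules:
        w = mu_x[i] * mu_y[j] * mu_z[k]   # product-method firing strength
        g = p * x + q * y + r * z + s     # rule consequent (trained parameters)
        num += w * g
        den += w
    return num / den                       # weighted average over all rules

# Toy example: two rules and placeholder membership degrees.
mu_x = [0.8, 0.2]; mu_y = [0.6, 0.4]; mu_z = [0.9, 0.1]
rules = [(0, 0, 0, 1.0, 0.0, 0.0, 0.0),   # consequent f = x
         (1, 1, 1, 0.0, 1.0, 0.0, 0.0)]   # consequent f = y
f = ts_output(rules, mu_x, mu_y, mu_z, x=2.0, y=10.0, z=0.0)
```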
In order to train the ANFN, multiple groups of user trajectories, the SNR, relative speed and data rate requirement at each time and location, and the average required bandwidth over a period of time were generated via simulation. The ANFN was trained with these simulation data. The smaller the metric value, the greater the user’s impact on overloading.
2.4. The Fuzzy-Logic- and Reinforcement-Learning-Based Access Algorithm
Traditionally, users access the cell with the best reference signal receiving power (RSRP) [
30]. According to the link budget, this may not work well in the STIN studied in this paper. In order to reduce the occurrence of overloading, reduce the frequency of calling the offloading algorithm, and ensure QoS, reinforcement learning is introduced to make access decisions. In the following, the proposed FLRL-AC is described.
Reinforcement learning is a common method for intelligent decision-making. It obtains learning information and updates model parameters by calculating the rewards for actions in the current state of the environment. Reinforcement learning is divided into two categories: one is value learning, which uses a neural network to approximate the optimal action value function, such as Q-learning or a deep Q-network (DQN); the other is policy learning, which uses a neural network to approximate the policy function, such as the actor–critic method. In this paper, reinforcement learning is used to select the initial access cell for each user, so the action dimensions are
. Due to exponential expansion and large dimensions of action space, a deep deterministic policy gradient (DDPG) [
31], which can handle high-dimensional states and actions, is utilized in the proposed algorithm. The FEM is incorporated into DDPG to further reduce the difficulty of training and to enable decisions that better shape the future state; the result is called the fuzzy deep deterministic policy gradient (FDDPG) in this paper. In the following, we describe the Markov decision process of the problem and the FDDPG training process.
(1) State space: The state space describes the environment. It reflects the relative positions, relative motions and channel states between cells and users. Thus, the state at time
t is defined as
where
means the FEM between user
m and cell
n at time
t.
(2) Action space: To make better access decisions, the index of the accessed cell is set as the action. In Equation (
9),
means the index of cell which user
m accesses at time
t.
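A minimal sketch of the state and action encodings described above, assuming a hypothetical helper `fem(m, n, t)` that returns the fuzzy evaluation metric between user m and cell n at time t:

```python
def build_state(fem, n_users, n_cells, t):
    """State: the M x N matrix of FEM values between every user and cell."""
    return [[fem(m, n, t) for n in range(n_cells)] for m in range(n_users)]

def build_action(access):
    """Action: for each user m, the index of the cell that user accesses."""
    return [access[m] for m in range(len(access))]

# Toy example with a constant, purely hypothetical FEM helper.
state = build_state(lambda m, n, t: 0.5, n_users=3, n_cells=2, t=0)
action = build_action([0, 1, 1])
```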
(3) Reward function: The proposed access decision method aims at maximizing the total reward of the access selections for all users. The discounted total reward is
where
is the reward discount and
is the per-time reward. In this paper, the per-time reward is composed of three parts,
,
and
. For the first part,
the design goal is to reduce the overloading tendency in the future. Thus,
equals the average value of the FEM over all users. Meanwhile, for the second part,
we hope to keep load balanced at the current time by minimizing the number of overloaded cells.
equals the overload penalty for all cells, where
indicates whether cell
n is overloaded at time
t.
is the radio resource utilization ratio (RRUR) of cell
n at time
t, where
is the occupied portion of the bandwidth of cell
n and
is the total bandwidth resources of the cell. For terrestrial cells,
is equal to
. Additionally, for the satellite,
is equal to
.
is the overload threshold. For TN cells, the threshold is
. Additionally, for the NTN cell, the threshold is
. In order to minimize the resource utilization ratio and balance the resource utilization, the third part of the reward is defined as
where the weighted value of the sum and the variance of RRUR is included in
. Due to its large bandwidth, the NTN has enough resources to serve many users. Without an additional limit on NTN access, the agent would tend to let as many users as possible access the NTN in order to avoid overloading TN cells. To prevent users’ transmission delay from being degraded, we use the additional weight
to increase the impact of resource utilization of the NTN cell.
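The three reward parts can be sketched together as follows; the weight values, the penalty form, and the helper inputs are illustrative assumptions, not the paper's exact Equations (11)–(14).

```python
def per_time_reward(fem_values, rrur, thresholds, ntn_index,
                    w_var=1.0, w_ntn=2.0):
    """Per-time reward composed of three parts (sketch).

    fem_values: FEM of every user (smaller = greater overload impact).
    rrur: radio resource utilization ratio (RRUR) of every cell.
    thresholds: overload threshold of every cell.
    The weights w_var and w_ntn are illustrative assumptions.
    """
    n = len(rrur)
    # Part 1: average FEM of all users (reduce future overload tendency).
    r1 = sum(fem_values) / len(fem_values)
    # Part 2: overload penalty, -1 for each currently overloaded cell.
    r2 = -sum(1 for u, th in zip(rrur, thresholds) if u > th)
    # Part 3: weighted sum and variance of RRUR, with extra NTN weight.
    weighted = [u * (w_ntn if i == ntn_index else 1.0)
                for i, u in enumerate(rrur)]
    mean = sum(weighted) / n
    var = sum((u - mean) ** 2 for u in weighted) / n
    r3 = -(sum(weighted) + w_var * var)
    return r1 + r2 + r3
```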
(4) The training process of the FDDPG algorithm: As in DDPG, FDDPG has two components, actor and critic. The actor network defined as
takes
as input and returns action
. The critic network defined as
returns long-term reward based on states and actions.
can be expressed as
according to the Bellman equation, where
means expectation and
means the reward discount.
DDPG combines actor–critic and DQN, so there are four networks in total. The actor network and the actor target network have the same structure but different parameters and different update frequencies. The critic network and the critic target network have the same structure but different parameters and different update frequencies. For the activation functions, the rectified linear unit (ReLU) is utilized in the hidden layers and the hyperbolic tangent function is utilized in the output layers.
Figure 3 shows the training structure of the proposed fuzzy reinforcement learning. In each episode during the training process, users’ positions and velocities are randomly reset to reset the environment. In each training step, the three fuzzy inputs are obtained from the environment and the state
is obtained based on Equation (
8). In order to speed up training,
is normalized to get
with the normalization coefficient
, where
.
To explore new states, random noise is added to the output of the actor network. The action
is obtained by
in which the noise is a Gaussian random number with mean value of 0 and variance of
. After performing
on the environment,
and the next state
can be obtained from the output of the environment. To break the correlation between samples,
is stored in a replay buffer.
,
,
and
are updated by sampling a mini-batch with size
K from the replay buffer. The loss function of the critic network is defined as
which is the temporal difference error between the outputs of
and
. Thus, the gradient of the critic network is calculated by
By applying the chain rule to the expected return from the start distribution
J with respect to the actor parameters [
31], the actor is updated by Equation (
20).
and
can be updated via gradient descent method.
The weights of target networks
and
are updated based on the weights of
and
, as in Equations (
21) and (
22). The detailed training process is shown in Algorithm 1.
Algorithm 1. Fuzzy reinforcement learning training algorithm.
Require: Training episodes, training steps per episode, learning rates of the actor and critic networks, initial exploration rate, exploration discount, minimum exploration rate, replay buffer size G, mini-batch size K, reward discount, update rate.
1: Randomly initialize the weights of the actor network and the critic network; initialize the actor target network with the same weights as the actor network, and the critic target network with the same weights as the critic network
2: Initialize the empty replay buffer and the exploration rate
3: for each episode do
4:  Randomly set users’ positions, speeds and data requirements to reset the environment
5:  for each step t do
6:   Get the fuzzy inputs from the environment; get the state from Equations (8) and (16)
7:   Get the action from Equation (17)
8:   Perform the action on the environment and get the reward from Equations (11)–(14)
9:   Get the next state from Equations (8) and (16)
10:   Store the transition in the replay buffer
11:   if the replay buffer is full then
12:    Replace a stored transition at random
13:    Update the exploration rate
14:    Sample a mini-batch of size K from the replay buffer
15:    Update the critic network by Equation (19)
16:    Update the actor network by Equation (20)
17:    Update the target networks by Equations (21) and (22)
18:   end if
19:  end for
20: end for
21: Training completed; save the actor network
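The target-network updates of Equations (21) and (22) are soft (Polyak) updates with the update rate τ; a minimal sketch over plain weight lists follows, with the function name and τ value as illustrative assumptions.

```python
def soft_update(target_weights, source_weights, tau=0.005):
    """Polyak update: target <- tau * source + (1 - tau) * target.

    Applied to both the actor and critic target networks; the tau value
    here is an illustrative assumption for the update rate in Algorithm 1.
    """
    return [tau * s + (1.0 - tau) * t
            for t, s in zip(target_weights, source_weights)]

# Each call nudges the target network slightly toward the online network,
# which stabilizes the temporal-difference targets during training.
target = soft_update([0.0, 0.0], [1.0, -1.0], tau=0.1)
```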
2.5. The Fuzzy-Logic-Based Offloading Algorithm
The algorithm above provides the access policy for all users to prevent cells from overloading. However, with the irregular movement of users, some cells may still overload after a long enough period of time. In that case, to ensure service quality both at the time of offloading and in the future, the FL-OL is proposed to select appropriate users to offload. In the following, the proposed FL-OL is described.
Consider a set of cells
. Whenever the resource utilization rate
of the TN cell
is higher than the threshold
, it is considered as an overloaded TN cell. Consider a set of users
which are served by cell
.
is the total number of users served by the cell
.
is the FEM between cell
and user
;
.
is the user with the minimum value of the evaluation metric
. Calculate the FEM
between cell
and user
. Due to the long distance between users and the GEO satellite, relative speed can be ignored for the NTN cell
. Thus,
is only influenced by the user’s position and the randomness of the channel. If there is any
higher than
, select the cell
with the maximum value of the evaluation metric. If
, check whether
is higher than
after offloading
to
. If not,
is selected as the target cell. Otherwise, it means that user
is at the outer edge of the area and there is no more suitable TN cell for this user. Thus,
is offloaded to the NTN cell on the premise that
is not higher than
. The algorithm will continue to find new
with new
and offload until the current cell
is no longer overloaded. The whole process is summarized by Algorithm 2 as follows.
Algorithm 2. The fuzzy logic offloading algorithm.
Require: Resource utilization rates of the cells
1: while the current TN cell is overloaded do
2:  Get the users served by the overloaded cell
3:  for each served user do
4:   Get the user’s FEM and save it in the set F
5:  end for
6:  Sort F in ascending order and get the first user (minimum FEM)
7:  for each adjacent TN cell do
8:   Calculate its resource utilization assuming the user is offloaded to it
9:   if it would not overload and its FEM for the user is above the threshold then
10:    Save the cell in a candidate set
11:   end if
12:  end for
13:  if the candidate set is not empty then
14:   Sort the candidate set in descending order of FEM and get the first cell
15:   Offload the user to that cell
16:  else if the NTN cell would not overload then
17:   Offload the user to the NTN cell
18:  else
19:   break
20:  end if
21:  Update the resource utilization rates
22: end while
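One iteration of Algorithm 2's selection logic can be sketched as follows; the cell records, the `fem` and `util_after` helpers, and the candidate filter are illustrative assumptions standing in for the paper's symbols.

```python
def offload_once(overloaded_cell, users, tn_cells, ntn_cell, fem, util_after):
    """One offloading decision from an overloaded TN cell (sketch).

    users: users served by the overloaded cell. fem(cell, user) is the fuzzy
    evaluation metric (smaller = greater overload impact). util_after(cell,
    user) is that cell's RRUR if the user were moved there. All helper
    names are illustrative assumptions.
    """
    # Pick the served user with the minimum FEM (greatest overload impact).
    user = min(users, key=lambda u: fem(overloaded_cell, u))
    # Candidate TN cells that would stay within their overload threshold.
    candidates = [c for c in tn_cells
                  if c is not overloaded_cell
                  and util_after(c, user) <= c["threshold"]]
    if candidates:
        # Offload to the candidate cell with the maximum FEM for this user.
        return user, max(candidates, key=lambda c: fem(c, user))
    if util_after(ntn_cell, user) <= ntn_cell["threshold"]:
        return user, ntn_cell            # fall back to the satellite cell
    return user, None                    # user is at the outer edge: stop

# Toy usage with two TN cells, one NTN cell and fixed helper values.
A = {"threshold": 0.8}; B = {"threshold": 0.8}; NTN = {"threshold": 0.9}
user, target = offload_once(
    A, ["u1", "u2"], [A, B], NTN,
    fem=lambda c, u: {"u1": 0.2, "u2": 0.7}[u],
    util_after=lambda c, u: 0.5)
```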