Bio-Inspired Fission–Fusion Control and Planning of Unmanned Aerial Vehicles Swarm Systems via Reinforcement Learning

Zhang, Xiaorong; Wang, Yufeng; Ding, Wenrui; Wang, Qing; Zhang, Zhilan; Jia, Jun

doi:10.3390/app14031192



by Xiaorong Zhang ^1,2, Yufeng Wang ^3,*, Wenrui Ding ^3, Qing Wang ^4, Zhilan Zhang ^1 and Jun Jia ^5

^1 School of Electronic Information Engineering, Beihang University, Beijing 100191, China

^2 School of Shen Yuan Honors College, Beihang University, Beijing 100191, China

^3 Institute of Unmanned System, Beihang University, Beijing 100191, China

^4 School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China

^5 Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China

^* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(3), 1192; https://doi.org/10.3390/app14031192

Submission received: 25 December 2023 / Revised: 24 January 2024 / Accepted: 29 January 2024 / Published: 31 January 2024

(This article belongs to the Special Issue Intelligent Control of Unmanned Aerial Vehicles)


Abstract

Swarm control of unmanned aerial vehicles (UAVs) has emerged as a challenging research area, primarily because of conflicting behaviors among individual UAVs and external disturbances to the movement of UAV swarms. However, limited attention has been paid to the fission–fusion motion of UAV swarms in the presence of unknown dynamic obstacles, as opposed to static ones. We present a Bio-inspired Fission–Fusion control and planning via Reinforcement Learning (BiFRL) algorithm for UAV swarm systems, which tackles fission–fusion behavior in the presence of dynamic obstacles with homing capabilities. First, we establish the kinematic models of the UAV and the swarm controller, and then we propose a probabilistic starling-inspired topological interaction that achieves reduced communication overhead and faster local convergence. Next, we develop a self-organized fission–fusion control framework and a fission decision algorithm. When dealing with various situations, the swarm can autonomously reconfigure itself by fissioning an optimal number of agents to fulfill the corresponding tasks. Finally, we design a sub-swarm confrontation algorithm for path planning optimized by reinforcement learning, in which the sub-swarm engages dynamic obstacles while minimizing energy expenditure. Simulation experiments demonstrate that the UAV swarm system accomplishes self-organized fission–fusion control and planning under different interference scenarios. Moreover, the proposed BiFRL algorithm successfully handles adversarial motion with dynamic obstacles and effectively safeguards the parent swarm.

Keywords:

UAV swarm; dynamic obstacles; fission–fusion control; starling-inspired topological interaction; reinforcement learning

1. Introduction

UAV swarm systems exhibit collective behavior arising from the interactions among drones and between the drones and their environment [1]. Nature frequently serves as a source of inspiration for these systems, drawing on phenomena such as insect movement [2], bird flocks [3], and fish schools [4,5]. The motion of such swarms is inherently dynamic: the group size and composition change frequently as individual members split off or merge, which is referred to as “fission–fusion” behavior [6,7,8]. Fission–fusion is important in the natural world for many animal species. For instance, tree-dwelling bats rely on shared survival space within the group [9], bird flocks enhance their chances of survival [10], and herds of bison and giraffes evade predators [11].

Recently, researchers from diverse fields have increasingly recognized the significance of fission–fusion behaviors. They have emulated these behaviors to achieve robot motion control [12,13,14], improve the efficiency of swarm resource search [15], and accomplish planning objectives such as task allocation and obstacle avoidance [1,16,17]. For example, Wang et al. [13] incorporate fission–fusion motion into the formation controller of an underwater vehicle, Nauta et al. [15] study population resource search using the fission–fusion concept, and Reséndiz-Benhumea et al. [16] integrate the fission–fusion movements observed in ant colonies into a task assignment algorithm for robot swarms. The investigation of fission–fusion behavior in swarm systems therefore holds substantial theoretical and practical value.

Swarm fission–fusion behavior involves the ability of individuals within a certain range to gather, through specific interactions, into a cohesive group with synchronized movement. Reynolds et al. [18] propose three fundamental behavioral rules to address this issue: collision avoidance, velocity consistency, and mutual aggregation. Subsequently, extensive research has built on these rules to implement swarm control [19,20,21] and to explore swarm fission–fusion dynamics [17,22,23,24]. Nevertheless, most of these studies focus on collision avoidance with static obstacles, either within a single swarm or during the fission–fusion process in a fully known environment. There has been limited research on swarm planning in the presence of unknown dynamic obstacles, particularly those with tracking capabilities. This is primarily because existing path planning strategies for static or dynamic obstacles often rely on global environment information [25,26,27,28]. For instance, Wang et al. [26] perform path planning with a sampling-based algorithm, while Garrett et al. [27] address the continuous-space subproblems and integrate the discrete and continuous aspects of the search process to achieve path planning objectives. In the real world, however, many dynamic obstacles are unknown or possess tracking capabilities, continually intruding upon and pursuing the targeted swarm to disrupt its progress. In such settings, previous swarm control and planning methods [22,23,24,25] may suffer from a limited success rate and poor resource utilization in reaching the intended destination.

Typical swarm controllers use fixed-distance neighbor interactions for fission–fusion motion [1,22,29,30,31]. However, this interaction scheme imposes a communication load that grows with the swarm size, making it infeasible for large-scale swarms [25,29,30,31]. In practice, large-scale swarms are usually formed through limited interactions among subsets of individuals [32,33,34]. Notably, studies have shown that each starling in a flock interacts with only six to seven of its closest neighbors, and this topological interaction structure allows flocks of thousands of individuals to form [32]. Consequently, numerous researchers have investigated the starling topology and its use in swarms [35,36,37]. However, in UAV swarms, relying on a seven-nearest-neighbor topological interaction model often leads to local convergence, as shown in Figure 1. Maintaining swarm integrity and avoiding local convergence while reducing the communication load among UAV swarm members therefore remains an open problem.

Based on the above discussion, this paper investigates bio-inspired fission–fusion control and planning of UAV swarm systems by enhancing the starling topology and incorporating a reinforcement learning algorithm. The proposed approach, referred to as Bio-inspired Fission–Fusion Control and Planning via Reinforcement Learning (BiFRL), makes several key contributions. First, we introduce a probabilistic starling-inspired topological interaction (PSTI) structure, which departs from the traditional fixed-range neighbor interactions in swarm motion and reduces both the communication load and the likelihood of local convergence. Next, we develop a self-organized fission–fusion control framework and a fission decision algorithm, which enable the swarm to autonomously divide into sub-swarms when encountering dynamic obstacles and allow precise control of the sub-swarm units. Then, we propose sub-swarm confrontation via reinforcement learning (SCRL), which uses reinforcement learning to plan confrontation movement with minimal energy loss. As a result, the parent swarm can pursue its original objectives unaffected by dynamic obstacles, while the sub-swarms seamlessly reintegrate into the parent swarm once the interference ceases. Finally, the effectiveness and robustness of the proposed BiFRL algorithm are validated through simulations in which dynamic obstacles approach the UAV swarm from different directions.

The present study provides the following contributions:

We introduce a probabilistic starling-inspired topological interaction structure, which effectively reduces the communication load and the likelihood of local convergence.
We present a self-organized fission–fusion control framework for UAV swarms that extends the existing social force model, together with a fission decision algorithm that controls the composition of the sub-swarms sent to target dynamic obstacles.
We develop a reinforcement learning sub-swarm confrontation algorithm that enables a self-organized sub-swarm to counter dynamic obstacles in unknown environments, which significantly improves the adaptability of UAV swarms.
The feasibility and validity of the proposed method are demonstrated through extensive numerical simulations, for which several numerical evaluation indicators are developed.

The remainder of this paper is structured as follows. Section 2 formulates the problem and reviews existing models of fission–fusion control and planning in UAV swarms. Section 3 presents the proposed BiFRL framework and its components in detail. Section 4 reports simulation experiments that analyze the proposed method. Finally, Section 5 presents the findings and conclusions of this study.

2. Problem Formulations

2.1. Unmanned Aerial Vehicles Kinematic Model

Assuming that the UAV is equipped with a three-loop autopilot for velocity, heading angle, and altitude control, its kinematic model can be simplified as follows [38]:

\begin{cases} \dot{x}_{i}^{ϖ} = v_{i}^{ϖ} \cos ψ_{i}^{ϖ} \\ \dot{y}_{i}^{ϖ} = v_{i}^{ϖ} \sin ψ_{i}^{ϖ} \\ \dot{h}_{i}^{ϖ} = λ_{i}^{ϖ} \\ \dot{v}_{i}^{ϖ} = \frac{1}{α_{v}} (v_{i}^{ϖ_{\mathrm{inp}}} - v_{i}^{ϖ}) \\ \dot{ψ}_{i}^{ϖ} = \frac{1}{α_{ψ}} (ψ_{i}^{ϖ_{\mathrm{inp}}} - ψ_{i}^{ϖ}) \\ \dot{λ}_{i}^{ϖ} = \frac{1}{α_{h}} (h_{i}^{ϖ_{\mathrm{inp}}} - h_{i}^{ϖ}) - \frac{1}{α_{λ}} λ_{i}^{ϖ} \end{cases}

(1)

where $(x_{i}^{ϖ}, y_{i}^{ϖ}, h_{i}^{ϖ})$ denotes the position of UAV $i$ in $ϖ \in \{\text{sub-swarm}, \text{parent-swarm}\}$; $v_{i}^{ϖ}$ and $v_{i}^{ϖ_{\mathrm{inp}}}$ are the horizontal speed and its control input command; $ψ_{i}^{ϖ}$ and $ψ_{i}^{ϖ_{\mathrm{inp}}}$ are the heading angle and its control input command; $λ_{i}^{ϖ}$ is the altitude change rate of agent $i$ in $ϖ$, and $h_{i}^{ϖ_{\mathrm{inp}}}$ is its altitude control input command; and $α_{ψ}$, $α_{v}$, $α_{h}$, and $α_{λ}$ are the autopilot control parameters. The following UAV flight constraints are taken into account:

\begin{cases} v_{\min} < v_{i}^{ϖ} < v_{\max} \\ |\dot{ψ}_{i}^{ϖ}| \leq φ_{\max} g / v_{i}^{ϖ} \\ λ_{\min} < λ_{i}^{ϖ} < λ_{\max} \end{cases}

(2)

where $v_{\min}$ and $v_{\max}$ are the minimum and maximum horizontal speeds, respectively; $φ_{\max}$ is the maximum lateral overload; $g$ is the gravitational acceleration; and $λ_{\min}$ and $λ_{\max}$ are the minimum and maximum altitude change rates, respectively, all of which are greater than zero.
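To make Equations (1) and (2) concrete, the following is a minimal forward-Euler sketch of one simulation step. It is an illustration under our own assumptions, not the authors' simulator: the names `UAVState`, `AutopilotParams`, and `autopilot_step` are hypothetical, the integration scheme is explicit Euler, and the constraints of Equation (2) are enforced by simple clipping (turn rate before integration, speed and altitude rate after).

```python
import math
from dataclasses import dataclass


@dataclass
class UAVState:
    x: float    # horizontal position x
    y: float    # horizontal position y
    h: float    # altitude
    v: float    # horizontal speed
    psi: float  # heading angle (rad)
    lam: float  # altitude change rate


@dataclass
class AutopilotParams:
    v_min: float
    v_max: float
    phi_max: float  # maximum lateral overload
    lam_min: float
    lam_max: float
    alpha_v: float = 1.0
    alpha_psi: float = 1.0
    alpha_h: float = 1.0
    alpha_lam: float = 1.0
    g: float = 9.81


def clip(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))


def autopilot_step(s: UAVState, v_cmd: float, psi_cmd: float, h_cmd: float,
                   p: AutopilotParams, dt: float) -> UAVState:
    """Advance Eq. (1) by one explicit-Euler step of length dt."""
    v_dot = (v_cmd - s.v) / p.alpha_v
    psi_dot = (psi_cmd - s.psi) / p.alpha_psi
    # Eq. (2): the turn rate is bounded by phi_max * g / v.
    turn_limit = p.phi_max * p.g / s.v
    psi_dot = clip(psi_dot, -turn_limit, turn_limit)
    lam_dot = (h_cmd - s.h) / p.alpha_h - s.lam / p.alpha_lam
    return UAVState(
        x=s.x + s.v * math.cos(s.psi) * dt,
        y=s.y + s.v * math.sin(s.psi) * dt,
        h=s.h + s.lam * dt,
        # Eq. (2): speed and altitude-rate bounds, enforced by clipping.
        v=clip(s.v + v_dot * dt, p.v_min, p.v_max),
        psi=s.psi + psi_dot * dt,
        lam=clip(s.lam + lam_dot * dt, p.lam_min, p.lam_max),
    )
```

With an unreachable heading command, the heading advances only at the capped rate $φ_{\max} g / v$ per unit time, which is how the second constraint of Equation (2) limits maneuvering at low speed.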

2.2. Dynamic Obstacle Movement Model

In this research, we model a dynamic obstacle with a wide sensing range that selects UAVs in its proximity as tracking targets. Its kinematic equations are formulated as follows:

\begin{cases} \dot{x}_{invader} = v_{invader} \\ \dot{v}_{invader} = u_{invader} - δ_{invader} ‖v_{invader}‖^{2} v_{invader} \end{cases}

(3)

u_{invader} = κ_{invader}^{aut} \frac{v_{invader}}{‖v_{invader}‖} + κ^{invader} \frac{x_{\min}^{son\_subgroup} - x_{invader}}{‖x_{\min}^{son\_subgroup} - x_{invader}‖} + δ_{invader} ‖v_{invader}‖^{2} v_{invader}

(4)

where $x_{invader} = (x_{invader}, y_{invader}, h_{invader}) \in \mathbb{R}^{3}$ represents the three-dimensional position of the dynamic obstacle; $x_{\min}^{son\_subgroup}$ is the position of the nearest sub-swarm UAV, i.e., the obstacle's current tracking target; $v_{invader} \in \mathbb{R}^{3}$ and $u_{invader} \in \mathbb{R}^{3}$ are the velocity vector and control input of the dynamic obstacle; $κ_{invader}^{aut}$ and $κ^{invader}$ are the inertia coefficient and tracking factor of the dynamic obstacle; and $-δ_{invader} ‖v_{invader}‖^{2} v_{invader}$ is the frictional force generated by the interaction between the dynamic obstacle and its surrounding environment.
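A minimal sketch of one explicit-Euler step of Equations (3) and (4) follows. Two points are our assumptions: the tracking target $x_{\min}^{son\_subgroup}$ is taken to be the sub-swarm UAV nearest to the obstacle, and the function and parameter names (`invader_step`, `kappa_aut`, `kappa_track`, `delta`) are hypothetical. Note that the last term of Equation (4) cancels the friction term of Equation (3), so the obstacle's acceleration reduces to the inertia and pursuit terms.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _norm(a: Vec3) -> float:
    return math.sqrt(sum(c * c for c in a))


def _unit(a: Vec3) -> Vec3:
    n = _norm(a)
    return tuple(c / n for c in a) if n > 0 else (0.0, 0.0, 0.0)


def invader_step(x: Vec3, v: Vec3, sub_swarm: Sequence[Vec3],
                 kappa_aut: float, kappa_track: float, delta: float,
                 dt: float) -> Tuple[Vec3, Vec3]:
    """One Euler step of the pursuing dynamic obstacle, Eqs. (3)-(4)."""
    # Assumed tracking target: the sub-swarm UAV closest to the obstacle.
    target = min(sub_swarm, key=lambda p: math.dist(p, x))
    heading = _unit(v)
    pursuit = _unit(tuple(t - c for t, c in zip(target, x)))
    drag = tuple(delta * _norm(v) ** 2 * c for c in v)
    # Eq. (4): inertia term + pursuit term + drag compensation.
    u = tuple(kappa_aut * hd + kappa_track * pu + dr
              for hd, pu, dr in zip(heading, pursuit, drag))
    # Eq. (3): acceleration = control input - friction.
    accel = tuple(ui - dr for ui, dr in zip(u, drag))
    x_next = tuple(c + vc * dt for c, vc in zip(x, v))
    v_next = tuple(vc + a * dt for vc, a in zip(v, accel))
    return x_next, v_next
```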

2.3. Traditional Unmanned Aerial Vehicles Swarm Dynamics Model

In the present investigation, we examine two swarm systems, $ϖ \in \{\text{sub-swarm}, \text{parent-swarm}\}$, comprising a total of $N$ agents. The swarms move through three-dimensional space without any imposed boundary constraints. The two swarms are equivalent and autonomous, each following its own set of swarm movement rules. The motion of each agent is governed by the following double integrator:

\begin{cases} \dot{χ}_{i}^{ϖ} = v_{i}^{ϖ} \\ m_{i}^{ϖ} \dot{v}_{i}^{ϖ} = u_{i}^{ϖ} - ξ ‖v_{i}^{ϖ}‖^{2} v_{i}^{ϖ} \end{cases}

(5)

where $χ_{i}^{ϖ} = (x_{i}^{ϖ}, y_{i}^{ϖ}, h_{i}^{ϖ}) \in \mathbb{R}^{3}$ is the position of UAV $i$ in $ϖ \in \{\text{sub-swarm}, \text{parent-swarm}\}$; $v_{i}^{ϖ} \in \mathbb{R}^{3}$ is its velocity vector; $u_{i}^{ϖ} \in \mathbb{R}^{3}$ is the control input acting on it; $m_{i}^{ϖ}$ is its mass; $-ξ ‖v_{i}^{ϖ}‖^{2} v_{i}^{ϖ}$ is the air drag; and $ξ$ is the air damping coefficient.

Most internal interactions between agents follow the cohesion–alignment [22] and separation [21] rules. A typical implementation is as follows:

u_{i}^{ϖ} = u_{i}^{ϖ_{v e l}} + u_{i}^{ϖ_{g u i}} + u_{i}^{ϖ_{p o s}} + ξ {‖ v_{i}^{ϖ} ‖}^{2} v_{i}^{ϖ} + γ^{i n e} v_{i}^{ϖ}

(6)

where $u_{i}^{ϖ_{vel}}$ is the velocity alignment term, $u_{i}^{ϖ_{gui}}$ is the navigation (guidance) force, $u_{i}^{ϖ_{pos}}$ is the position cooperation term, and $γ^{ine}$ is the inertia coefficient. The term $ξ ‖v_{i}^{ϖ}‖^{2} v_{i}^{ϖ}$ compensates for the air drag in Equation (5).

$u_{i}^{ϖ_{pos}}$ is defined as follows:

u_{i}^{ϖ_{p o s}} = \sum_{j \in N_{i}^{ϖ} (t)} Γ^{p o s} ‖ d_{i j}^{ϖ} ‖ (1 - ({(\frac{l_{a}^{ϖ} ‖ χ_{j}^{ϖ} - χ_{i}^{ϖ} ‖}{χ_{j}^{ϖ} - χ_{i}^{ϖ}})}^{2}) \exp (\frac{l_{c}^{ϖ}}{d_{i j}^{ϖ}}))

(7)

where $Γ^{pos}$ is the position cooperation coefficient; $l_{a}^{ϖ}$ and $l_{c}^{ϖ}$ are the desired inter-agent spacing and the motion attenuation factor; $d_{ij}^{ϖ}$ is the distance between agents $i$ and $j$; $N_{i}^{ϖ}$ is the number of agents interacting with agent $i$ in $ϖ$; and $N_{i}^{ϖ}(t)$ is the set of neighbors in $ϖ$ that interact with agent $i$ at time $t$.

A typical definition of $u_{i}^{ϖ_{vel}}$ is given as follows:

u_{i}^{ϖ_{vel}} = \frac{Γ^{vel}}{N_{i}^{ϖ}} \sum_{j \in N_{i}^{ϖ}(t)} (v_{j}^{ϖ} - v_{i}^{ϖ})

(8)

where $Γ^{vel}$ is the gain of $u_{i}^{ϖ_{vel}}$. Equation (8) is the widely recognized “velocity consensus” algorithm, which allows the swarm to quickly reach a common velocity and sustain a homogeneous state [18]. The ability of a swarm using Equation (8) to perform self-organized fission when only a few of its agents detect an obstacle has been investigated, and the swarm was found to be unable to execute self-organized fission in this scenario [34]. To address this limitation, Yang et al. [17] introduced an intermittent selective mechanism that allows swarms encountering static obstacles to perform fission–fusion motion. However, when confronted with dynamic obstacles that possess tracking capabilities or approach from uncertain interference directions, the UAV swarm often struggles to complete its flight with optimal resource utilization, and such situations may severely disrupt the normal movement of the swarm. The effects of dynamic obstacles on traditional swarm motion are illustrated in Figure 2.
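Equation (8) can be illustrated with a short sketch. The function `velocity_consensus` (a hypothetical name) evaluates the alignment term for one agent, and `simulate` integrates only this term with explicit Euler on an all-to-all neighbor graph. Under these simplifying assumptions (no drag, navigation, or position terms, and identical neighborhood sizes), all velocities converge to the initial mean velocity, i.e., the homogeneous state described above.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def velocity_consensus(i: int, velocities: Sequence[Vec3],
                       neighbours: Sequence[int], gamma_vel: float) -> Vec3:
    """Eq. (8): Gamma_vel / N_i * sum_j (v_j - v_i) over the neighbours of i."""
    if not neighbours:
        return (0.0, 0.0, 0.0)
    vi = velocities[i]
    total = [0.0, 0.0, 0.0]
    for j in neighbours:
        for k in range(3):
            total[k] += velocities[j][k] - vi[k]
    return tuple(gamma_vel * t / len(neighbours) for t in total)


def simulate(velocities: List[Vec3], gamma_vel: float, dt: float,
             steps: int) -> List[Vec3]:
    """Integrate only the alignment term on an all-to-all neighbour graph."""
    n = len(velocities)
    vel = list(velocities)
    for _ in range(steps):
        u = [velocity_consensus(i, vel, [j for j in range(n) if j != i], gamma_vel)
             for i in range(n)]
        # Synchronous update: every agent uses the velocities of the previous step.
        vel = [tuple(v + dt * a for v, a in zip(vi, ui)) for vi, ui in zip(vel, u)]
    return vel
```

Because every agent responds only to velocity differences, this term alone cannot make a subset of agents break away, which is consistent with the limitation of Equation (8) discussed above.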

Hence, it is of great interest to develop a new control algorithm that facilitates the self-organized fission–fusion of UAV swarms in three-dimensional space while maintaining a lightweight structure and low communication costs. In this study, we propose a probabilistic starling-inspired topological interaction approach, which enables the UAV swarm to execute fission–fusion motions while minimizing communication requirements and reducing local convergence among swarm members. Additionally, we propose a sub-swarm confrontation algorithm based on reinforcement learning that is designed specifically for unknown dynamic obstacles with tracking capabilities. This algorithm allows the sub-swarm to self-organize and effectively confront unknown dynamic obstacles with minimal energy loss while simultaneously safeguarding the parent swarm from the disruptive influence of these obstacles.

2.4. Conversion Relations between Swarm Controller and Kinematic Model

The swarm control input $u_{i}^{ϖ}$ is converted into the autopilot control inputs of each UAV as follows:

u_{i}^{ϖ} = \begin{pmatrix} u_{i}^{ϖ_{x}} \\ u_{i}^{ϖ_{y}} \\ u_{i}^{ϖ_{h}} \end{pmatrix}

(9)

\begin{cases} v_{i}^{ϖ_{\mathrm{inp}}} = α_{v} (u_{i}^{ϖ_{x}} \cos ψ_{i}^{ϖ} + u_{i}^{ϖ_{y}} \sin ψ_{i}^{ϖ}) + v_{i}^{ϖ} \\ ψ_{i}^{ϖ_{\mathrm{inp}}} = \frac{α_{ψ}}{v_{i}^{ϖ}} (u_{i}^{ϖ_{y}} \cos ψ_{i}^{ϖ} - u_{i}^{ϖ_{x}} \sin ψ_{i}^{ϖ}) + ψ_{i}^{ϖ} \\ h_{i}^{ϖ_{\mathrm{inp}}} = h_{i}^{ϖ} + \frac{α_{h}}{α_{λ}} λ_{i}^{ϖ} + α_{h} u_{i}^{ϖ_{h}} \end{cases}

(10)

where $u_{i}^{ϖ_{x}}$ and $u_{i}^{ϖ_{y}}$ are the horizontal components of the swarm control input and $u_{i}^{ϖ_{h}}$ is its component in the height direction.

The outputs of the UAV kinematic model are then converted into position and velocity vectors, which serve as the inputs of the UAV swarm controller:

\begin{cases} χ_{i}^{ϖ} = (x_{i}^{ϖ}, y_{i}^{ϖ}, h_{i}^{ϖ}) \\ v_{i}^{ϖ} = (v_{i}^{ϖ} \cos ψ_{i}^{ϖ}, v_{i}^{ϖ} \sin ψ_{i}^{ϖ}, λ_{i}^{ϖ}) \end{cases}

(11)
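The conversion in Equations (9)–(11) can be checked numerically. The sketch below is our own illustration with hypothetical function names: it maps a swarm acceleration $u_{i}^{ϖ}$ to autopilot commands, using for the heading the component of the acceleration normal to the velocity, $u_{i}^{ϖ_{y}} \cos ψ_{i}^{ϖ} - u_{i}^{ϖ_{x}} \sin ψ_{i}^{ϖ}$, and then evaluates the rates of Equation (1) to confirm that the resulting speed, turn-rate, and climb-rate responses reproduce the commanded acceleration.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def swarm_to_autopilot(u: Vec3, v: float, psi: float, h: float, lam: float,
                       alpha_v: float, alpha_psi: float, alpha_h: float,
                       alpha_lam: float) -> Vec3:
    """Map u = (u_x, u_y, u_h) to the commands (v_inp, psi_inp, h_inp)."""
    ux, uy, uh = u
    v_inp = alpha_v * (ux * math.cos(psi) + uy * math.sin(psi)) + v
    psi_inp = alpha_psi / v * (uy * math.cos(psi) - ux * math.sin(psi)) + psi
    h_inp = h + alpha_h / alpha_lam * lam + alpha_h * uh
    return v_inp, psi_inp, h_inp


def kinematic_to_swarm(v: float, psi: float, lam: float) -> Vec3:
    """Eq. (11): velocity vector passed back to the swarm controller."""
    return (v * math.cos(psi), v * math.sin(psi), lam)


def implied_acceleration(cmd: Vec3, v: float, psi: float, h: float, lam: float,
                         alpha_v: float, alpha_psi: float, alpha_h: float,
                         alpha_lam: float) -> Vec3:
    """Acceleration produced by Eq. (1) when the autopilot receives cmd."""
    v_inp, psi_inp, h_inp = cmd
    v_dot = (v_inp - v) / alpha_v
    psi_dot = (psi_inp - psi) / alpha_psi
    lam_dot = (h_inp - h) / alpha_h - lam / alpha_lam
    # Time derivative of (v cos psi, v sin psi, lam) from Eq. (11).
    ax = v_dot * math.cos(psi) - v * psi_dot * math.sin(psi)
    ay = v_dot * math.sin(psi) + v * psi_dot * math.cos(psi)
    return ax, ay, lam_dot
```

Because the round trip recovers $(u_{i}^{ϖ_{x}}, u_{i}^{ϖ_{y}}, u_{i}^{ϖ_{h}})$ for any heading, the swarm controller can be designed in Cartesian coordinates and handed to the autopilot unchanged, within the limits of Equation (2).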

3. Bio-Inspired Fission–Fusion Control and Planning via Reinforcement Learning Algorithm

The proposed BiFRL algorithm consists of four parts. First, we propose a probabilistic starling-inspired topological interaction that provides UAV swarms with a starling-like communication structure while avoiding local convergence. Second, we develop a self-organized fission–fusion control framework tailored to UAV swarms. Building on this framework, we then propose a fission decision algorithm and a sub-swarm confrontation algorithm via reinforcement learning, which together keep the number of agents in each sub-swarm controllable during fission–fusion motion and during confrontation with unknown dynamic obstacles.

3.1. Probabilistic Starling-Inspired Topological Interaction

In practical scenarios, interaction structures that rely on fixed-distance neighbors often incur large communication overheads, hindering large-scale UAV swarming in real-world applications. To address this issue, we present in Algorithm 1 a probabilistic starling-inspired topological interaction approach as an alternative to the existing seven-nearest-neighbor topological interaction structure. Inspired by the topological interaction observed in starling flocks, it mitigates the communication burden associated with fixed-distance neighbors while reducing the occurrence of local convergence. It introduces a probabilistic decision model that is invoked when the UAV swarm shows signs of local convergence: if two groups are identified as locally convergent, the UAVs in them adopt more distant UAVs as interaction partners in the new topology.

Algorithm 1: Probabilistic starling-inspired topological interaction algorithm
Input: $N_{i}^{ϖ}$, $N_{top}$, $d_{N}^{ϖ}$
Output: $N_{i}^{ϖ}(t)$
Function topological interaction
    if $N_{i}^{ϖ} \leq N_{top}$ then
        $n_{now} \leftarrow N_{i}^{ϖ}$
        for $o \leftarrow 1$ to $N - 1$ do
            find $\min(d_{ij}^{ϖ})$ in $d_{N}^{ϖ}$
            $n_{d_{ij}}^{ϖ} \leftarrow \min(d_{ij}^{ϖ})$
            if $d_{ij}^{ϖ} < R_{Radius}$ && $N_{i}^{ϖ} \leq N_{top} - 2$ then
                $N_{i}^{ϖ}(t) \leftarrow$ the agent $j$ from swarm{}
                $n_{now} \leftarrow n_{now} + 1$
                $N_{i}^{ϖ} \leftarrow n_{now}$
            end if
            if $N_{i}^{ϖ} = N_{top} - 1$ then
                $ϱ = e^{Υ_{rad}} (\max(d_{ij}^{ϖ}) - \min(d_{ij}^{ϖ}))$
                if $ϱ > ϱ_{rad}$ then
                    $p_{d_{ij}}^{ϖ} \leftarrow$ choose the $\min(d_{ij}^{ϖ})$ agent from swarm{}
                    $o_{d_{ij}}^{ϖ} \leftarrow$ choose the $\min(d_{ij}^{ϖ})$ agent from swarm{except($n_{d_{ij}}^{ϖ}$)}
                    if $|d_{ip}^{ϖ} - d_{io}^{ϖ}| \leq ϱ_{max}$ then
                        $N_{i}^{ϖ}(t) \leftarrow$ the agent $p$ from swarm{}
                        $n_{now} \leftarrow n_{now} + 1$
                        $N_{i}^{ϖ} \leftarrow N_{top}$
                    else
                        $N_{i}^{ϖ}(t) \leftarrow$ the agent $o$ from swarm{}
                    end if
                end if
                $o \leftarrow N - 1$ (terminate the loop)
            end if
        end for
    end if
    return $N_{i}^{ϖ}(t)$
end function
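A simplified, runnable reading of Algorithm 1 for a single agent is sketched below. It is our interpretation rather than the authors' code, and several details are assumptions: neighbors are ranked by Euclidean distance; the first $N_{top} - 1$ slots are filled with the nearest agents inside $R_{Radius}$; the spread $ϱ$ is computed over the distances to all other agents; and, when $ϱ > ϱ_{rad}$, the last slot goes to the nearest remaining agent $p$ if its distance is within $ϱ_{max}$ of the next remaining agent $o$, and to $o$ otherwise. The function name and all default values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def psti_neighbours(i: int, positions: Sequence[Vec3], n_top: int = 7,
                    radius: float = 10.0, upsilon_rad: float = 0.0,
                    rho_rad: float = 1.0, rho_max: float = 2.0) -> List[int]:
    """Hypothetical reading of Algorithm 1: interaction set of agent i."""
    # Distances to every other agent, nearest first.
    others = sorted((math.dist(positions[i], p), j)
                    for j, p in enumerate(positions) if j != i)
    neighbours: List[int] = []
    # Fill the first n_top - 1 slots with the nearest agents inside the radius.
    for d, j in others:
        if d < radius and len(neighbours) <= n_top - 2:
            neighbours.append(j)
    if len(neighbours) != n_top - 1:
        return neighbours
    # Spread of the distances: rho = exp(Upsilon_rad) * (max d - min d).
    dists = [d for d, _ in others]
    rho = math.exp(upsilon_rad) * (max(dists) - min(dists))
    remaining = [(d, j) for d, j in others if j not in neighbours]
    if rho > rho_rad and len(remaining) >= 2:
        (d_p, p), (d_o, o) = remaining[0], remaining[1]
        # Similar distances: keep the nearer agent p; otherwise link to o.
        neighbours.append(p if abs(d_p - d_o) <= rho_max else o)
    return neighbours
```

In this reading, an agent whose six nearest neighbors form a tight cluster can receive a seventh link to an agent outside $R_{Radius}$ when the distance gap exceeds `rho_max`, which is the kind of long-range link intended to counter local convergence.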

3.2. Self-Organized Fission–Fusion Control Framework

The fission–fusion dynamics of a swarm involve two contrasting and competing behaviors. Fusion requires all individuals to form a collectively coordinated ensemble, whereas fission requires the original order to be disrupted, giving rise to distinct smaller sub-swarms [5,19]. To account for the influence of dynamic obstacles, we integrate intrusion forces into the self-organized fission–fusion control algorithm as follows:

\begin{cases} \dot{χ}_{i}^{ϖ} = v_{i}^{ϖ} \\ m_{i}^{ϖ} \dot{v}_{i}^{ϖ} = γ^{ine} v_{i}^{ϖ} + \underbrace{℘_{i}^{ϖ} u_{i}^{ϖ_{pos}} + (1 - ℘_{i}^{ϖ}) u_{invaders}^{pos}}_{u_{i}^{ϖ_{pos}}} + \underbrace{℘_{i}^{ϖ} u_{i}^{ϖ_{vel}} + (1 - ℘_{i}^{ϖ}) u_{invaders}^{vel}}_{u_{i}^{ϖ_{vel}}} - ξ ‖v_{i}^{ϖ}‖^{2} v_{i}^{ϖ} + ζ ϑ_{i}^{ϖ} + ℘_{i}^{ϖ} u_{i}^{ϖ_{gui}} \end{cases}

(12)

where $γ^{ine} v_{i}^{ϖ}$ is the inertial term associated with the velocity of the UAV, and $ζ ϑ_{i}^{ϖ}$ is the stochastic disturbance term acting on the UAV. To address the interference posed by dynamic obstacles, we extend velocity coordination as follows:

u_{i}^{ϖ_{vel}} = \sum_{j \in N_{i}^{ϖ}(t)} Γ^{vel} (v_{j}^{ϖ} - v_{i}^{ϖ}) + (1 - ℘_{i}^{ϖ}) \underbrace{(u_{tra} + u_{lur})}_{u_{invaders}^{vel}}

(13)

where $u_{lur}$ is the attractive force generated when a sub-swarm identifies dynamic obstacles, and $u_{tra}$ is the capture force generated when a sub-swarm engages in capturing dynamic obstacles; the mechanism behind the capture force is analyzed in detail in Section 3.4. $℘_{i}^{ϖ}$ is the state of agent $i$. We also account for the impact of dynamic obstacles on the sub-swarm by extending the position cooperation term of Equation (7), as follows:

\begin{array}{l} u_{i}^{ϖ_{p o s}} = (1 - ℘_{i}^{ϖ}) ε_{i n v a d e r} \exp (- d_{i i n v a d e r}^{ϖ}) \\ + \sum_{j \in N_{i}^{ϖ} (t)} Γ^{p o s} ‖ d_{i j}^{ϖ} ‖ (1 - ({(\frac{l_{a}^{ϖ} ‖ x_{j}^{ϖ} - x_{i}^{ϖ} ‖}{x_{j}^{ϖ} - x_{i}^{ϖ}})}^{2}) \exp (\frac{l_{c}^{ϖ}}{d_{i j}^{ϖ}})) \end{array}

(14)

where $ε_{invader}$ is the position interference coefficient of the dynamic obstacle, and $d_{i\,invader}^{ϖ}$ is the distance between the dynamic obstacle and agent $i$. According to Equation (14), the closer the sub-swarm is to the dynamic obstacle, the farther apart its UAVs are pushed, which prevents the agents from clustering together and becoming collectively more vulnerable to attack.
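The state-gated terms of Equations (13) and (14) can be sketched as follows. We assume that the agent state $℘_{i}^{ϖ}$ takes the value 1 for agents following the parent swarm (obstacle terms switched off) and 0 for agents assigned to the sub-swarm (obstacle terms fully active); this binary reading, the function names, and the treatment of the first term of Equation (14) as a scalar magnitude are our assumptions. The pairwise cohesion term of Equation (14) is omitted.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def velocity_term(i: int, velocities: Sequence[Vec3], neighbours: Sequence[int],
                  gamma_vel: float, state: float, u_tra: Vec3, u_lur: Vec3) -> Vec3:
    """Eq. (13): velocity consensus plus the state-gated invader response."""
    vi = velocities[i]
    consensus = [0.0, 0.0, 0.0]
    for j in neighbours:
        for k in range(3):
            consensus[k] += gamma_vel * (velocities[j][k] - vi[k])
    gate = 1.0 - state  # 0 for parent-swarm agents, 1 for sub-swarm agents
    return tuple(c + gate * (t + l) for c, t, l in zip(consensus, u_tra, u_lur))


def invader_spacing_term(state: float, eps_invader: float, d_invader: float) -> float:
    """First term of Eq. (14): grows as the obstacle gets closer."""
    return (1.0 - state) * eps_invader * math.exp(-d_invader)
```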

3.3. Fission Decision Algorithm

Algorithm 1 establishes the interaction topology of the entire swarm through a limited number of interactions among agents. Algorithm 2 describes the sub-swarm selection mechanism the swarm uses when it encounters an obstacle. Under the topological interactions, the algorithm organizes sub-swarms according to the interference direction of the dynamic obstacle while keeping the number of agents in each sub-swarm controllable.

Algorithm 2: Fission decision algorithm
Input: $d_{N}^{ϖ}$, $N_{i}^{ϖ}(t)$
Output: sub-swarm
Function sub-swarm selection
    if $\min(d_{ij}^{ϖ}) < R_{Radius}$ then
        $n_{now} \leftarrow 0$
        for $i \leftarrow 1 : N$ do
            if $n_{now} \leq n_{ex\text{-}now}$ && $φ_{i}^{ϖ} \neq 0$ then
                {}_sub-swarm $\leftarrow$ choose the $\min(d_{ij}^{ϖ})$ agent from swarm{except($φ_{i}^{ϖ} = 0$)}
                $φ_{i}^{ϖ} = 0$
                $n_{now} \leftarrow n_{now} + 1$
                $N_{i}^{ϖ}(t) \leftarrow$ Algorithm 1
                if $φ_{N_{i}^{ϖ}(t)}^{ϖ} \neq 0$ then
                    {}_sub-swarm $\leftarrow$ choose the $N_{i}^{ϖ}(t)$ agents
                    {1, $n_{now}$}_sub-swarm $\leftarrow$ {}_sub-swarm
                    $n_{now} \leftarrow n_{now} + n_{sub\text{-}swarm}$
                    $φ^{ϖ}$ of the chosen $N_{i}^{ϖ}(t)$ agents $= 0$
                end if
            end if
            if $n_{now} == n_{ex\text{-}now}$ then
                break
            end if
        end for
        if $\min(d_{i\,invader}^{ϖ}) \in ʑ_{iva}$ then
            $u_{tra} = -v_{i\,invader}^{ϖ} / |v_{i\,invader}^{ϖ}| - ξ ‖v_{i}^{ϖ}‖^{2} v_{i}^{ϖ}$
        else
            $u_{lur} \leftarrow$ Algorithm 3
        end if
    end if
    return sub-swarm
end function
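The core selection loop of Algorithm 2 can be sketched as follows. This is a simplified interpretation with assumptions of our own: seed agents are ranked by their distance to the dynamic obstacle, each seed recruits its topological neighbors (supplied by a caller-provided function standing in for Algorithm 1), the sub-swarm size is capped at $n_{ex\text{-}now}$ (called `n_expected` here), and agents already assigned (state set to 0) are skipped. The names are hypothetical, and the force computation at the end of Algorithm 2 is omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def select_sub_swarm(positions: Sequence[Vec3], obstacle: Vec3, n_expected: int,
                     radius: float,
                     neighbours_of: Callable[[int], Sequence[int]]) -> List[int]:
    """Pick up to n_expected agents to fission off toward the obstacle."""
    dist = [math.dist(p, obstacle) for p in positions]
    if not dist or min(dist) >= radius:
        return []  # obstacle not detected: no fission
    state = [1] * len(positions)  # 1 = parent swarm, 0 = assigned to sub-swarm
    chosen: List[int] = []
    for seed in sorted(range(len(positions)), key=lambda k: dist[k]):
        if len(chosen) >= n_expected:
            break
        if state[seed] == 0:
            continue
        chosen.append(seed)
        state[seed] = 0
        # Recruit the seed's topological neighbours (stand-in for Algorithm 1).
        for j in neighbours_of(seed):
            if len(chosen) >= n_expected:
                break
            if state[j] != 0:
                chosen.append(j)
                state[j] = 0
    return chosen
```

Capping the size at `n_expected` is what keeps the number of fissioned agents controllable; the remaining agents keep state 1 and continue with the parent swarm.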

3.4. Sub-Swarm Confrontation via Reinforcement Learning

After the swarm splits into sub-swarms, we formulate the sub-swarm path planning problem as a Markov decision process (MDP). An MDP provides a mathematical framework for modeling sequential decision-making problems in which an agent must choose a sequence of actions to achieve a desired goal. The MDP can be denoted as a 4-tuple $\langle S, A, P, R \rangle$, where $S$ is the set of states $s_{t}$, $A$ is the set of available actions $a_{t}$ of the UAV, $P$ is the transition probability distribution, and $R: S \times A \to \mathbb{R}$ is the reward function.

In time slot $t$, the state of the environment is denoted as $S_{t} = \{c_{ps}, c_{ss}, c_{e}, c_{t}\}$, whose elements are the coordinates of the parent swarm, the sub-swarm, the dynamic obstacle, and the target, respectively. The action of a UAV is written as $A_{t} = \{v_{t}, θ_{t}\}$, where $v_{t}$ and $θ_{t}$ denote its flying speed and direction, respectively.

The reward function in time slot $t$ consists of three parts:

R_{t} = r_{s2e} + r_{s2t} + r_{e2p}

(15)

The first term, $r_{s2e}$, is a reward based on the distance between the sub-swarm and the dynamic obstacle:

r_{s2e} = 2 \times \left(e^{-\frac{(d_{s2e} - d_{s2e}^{mean})^{2}}{2σ^{2}}} - 0.5\right) + \begin{cases} α \log\left(\frac{d_{s2e}}{d_{s2e}^{safe}}\right), & d_{s2e} < d_{s2e}^{safe} \\ 0, & d_{s2e} \geq d_{s2e}^{safe} \end{cases}

(16)

where $d_{s2e}$ is the distance between the sub-swarm and the dynamic obstacle, and $d_{s2e}^{safe}$ and $d_{s2e}^{capture}$ are the safe distance and the capture distance between the sub-swarm and the dynamic obstacle, respectively. This term is parameterized by $σ$ and $α$.

The second term, $r_{s2t}$, is a reward based on the distance between the sub-swarm and the target:

r_{s2t} = -β d_{s2t} + c_{1}

(17)

where $d_{s2t}$ is the distance between the sub-swarm and the target, and $β$ and $c_{1}$ are tuning parameters.

The third term, $r_{e2p}$, is a reward based on the distance between the dynamic obstacle and the parent swarm:

r_{e2p} = \begin{cases} γ \operatorname{clip}\left(\log\left(\frac{d_{e2p}}{d_{e2p}^{min}} - 1\right), -1, 1\right), & d_{e2p} > d_{e2p}^{min} \\ c_{2}, & d_{e2p} \leq d_{e2p}^{min} \end{cases}

(18)

where $d_{e2p}$ is the distance between the dynamic obstacle and the parent swarm, $d_{e2p}^{min}$ is the minimum distance between the dynamic obstacle and the parent swarm, and $γ$ and $c_{2}$ are tuning parameters. The operation $\operatorname{clip}(\cdot, -1, 1)$ limits $\log(d_{e2p}/d_{e2p}^{min} - 1)$ to the interval $[-1, 1]$, which removes any incentive to push this quantity outside that interval. The parent swarm is considered captured by the dynamic obstacle if $d_{e2p} \leq d_{e2p}^{min}$.
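The reward in Equations (15)–(18) can be transcribed directly into code. The sketch below is a literal transcription under stated assumptions: all distances are positive, the function and parameter names are hypothetical, and `d_mean` stands for $d_{s2e}^{mean}$ in Equation (16).

```python
import math


def r_s2e(d: float, d_mean: float, d_safe: float, sigma: float, alpha: float) -> float:
    """Eq. (16): Gaussian band around d_mean plus a log penalty inside d_safe."""
    band = 2.0 * (math.exp(-((d - d_mean) ** 2) / (2.0 * sigma ** 2)) - 0.5)
    penalty = alpha * math.log(d / d_safe) if d < d_safe else 0.0  # requires d > 0
    return band + penalty


def r_s2t(d: float, beta: float, c1: float) -> float:
    """Eq. (17): linear reward for approaching the target."""
    return -beta * d + c1


def r_e2p(d: float, d_min: float, gamma: float, c2: float) -> float:
    """Eq. (18): clipped log reward for keeping the obstacle away from the parent swarm."""
    if d <= d_min:
        return c2
    ratio_term = d / d_min - 1.0  # strictly positive because d > d_min
    return gamma * max(-1.0, min(1.0, math.log(ratio_term)))


def total_reward(d_s2e: float, d_s2t: float, d_e2p: float, *, d_mean: float,
                 d_safe: float, sigma: float, alpha: float, beta: float, c1: float,
                 d_min: float, gamma: float, c2: float) -> float:
    """Eq. (15): R_t = r_s2e + r_s2t + r_e2p."""
    return (r_s2e(d_s2e, d_mean, d_safe, sigma, alpha)
            + r_s2t(d_s2t, beta, c1)
            + r_e2p(d_e2p, d_min, gamma, c2))
```

The Gaussian band in $r_{s2e}$ peaks at +1 when $d_{s2e} = d_{s2e}^{mean}$ and tends to -1 far from it, so the sub-swarm is encouraged to hold a standoff distance rather than simply approach or flee.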

To solve the path planning problem of the sub-swarm, we propose an RL-based algorithm. Reinforcement learning is an effective approach for sequential decision problems formulated as MDPs, aiming to maximize the cumulative reward within an episode. Among RL algorithms, proximal policy optimization (PPO) is one of the most efficient on-policy methods for policy optimization, whereas soft actor-critic (SAC) is an off-policy algorithm with high sample efficiency. Deep Q-learning (DQN) is also widely used, but it is limited to discrete action spaces, whereas the flying speed and direction in $A_{t}$ take continuous values. In this study, we adopt the SAC algorithm to realize the confrontation motion of the sub-swarm against dynamic obstacles.

Algorithm 3: Sub-swarm confrontation algorithm
Input: $c_{t}^{parent\text{-}swarm}$, $c_{t}^{sub\text{-}swarm}$, $c_{t}^{enemy}$, $c_{t}^{target}$
Output: $u_{lur}$
Function sub-swarm confrontation with dynamic obstacles
    Initialize parameter vectors $ψ$, $\bar{ψ}$, $θ$, $ϕ$
    Initialize replay buffer $D$
    for each iteration do
        for each time step do
            Sample $a_{t} \sim π_{ϕ}(a_{t} \mid s_{t})$ from the policy network
            Observe $s_{t+1} \sim p(s_{t+1} \mid s_{t}, a_{t})$ from the environment
            Obtain $r_{t} = r_{s2e} + r_{s2t} + r_{e2p}$ from the environment
            Store the transition $(s_{t}, a_{t}, r_{t}, s_{t+1})$ in replay buffer $D$
        end for
        for each gradient step do
            Update $ψ$, $θ$, $ϕ$, and $\bar{ψ}$ in turn
        end for
    end for
end function
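The per-sample targets behind the "update $ψ$, $θ$, $ϕ$, and $\bar{ψ}$" step of Algorithm 3 can be sketched in a few lines. This follows the original SAC formulation with a state-value network $V_{ψ}$, a target value network $V_{\bar{ψ}}$, twin soft Q-functions $Q_{θ}$, and a policy $π_{ϕ}$, which is how we read the parameter vectors in Algorithm 3. The temperature `alpha`, the discount factor, and the smoothing rate `tau` are illustrative hyperparameters, not values from this paper, and the network code and gradient computation are omitted.

```python
from typing import List, Sequence


def value_target(q1: float, q2: float, log_prob: float, alpha: float) -> float:
    """Target for V_psi(s): min of twin Q estimates minus alpha * log pi(a|s), a ~ pi_phi."""
    return min(q1, q2) - alpha * log_prob


def q_target(reward: float, next_value: float, done: bool, discount: float) -> float:
    """Target for Q_theta(s, a): r + discount * V_psi_bar(s') on non-terminal steps."""
    return reward + discount * (0.0 if done else 1.0) * next_value


def policy_loss(q1: float, q2: float, log_prob: float, alpha: float) -> float:
    """Per-sample loss minimised for pi_phi: alpha * log pi(a|s) - min Q(s, a)."""
    return alpha * log_prob - min(q1, q2)


def polyak_update(target: Sequence[float], online: Sequence[float],
                  tau: float) -> List[float]:
    """Soft target update psi_bar <- tau * psi + (1 - tau) * psi_bar."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target, online)]
```

The entropy term $-α \log π$ in the value target is what distinguishes SAC from a plain actor-critic: it rewards the sub-swarm policy for keeping its speed and direction choices stochastic while it is still exploring.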

4. Simulation Studies

4.1. Swarm Campaign Evaluation Metrics

4.1.1. Order Parameters

The concept of swarm motion order proposed by Vicsek [38] is typically assessed using an order parameter. In this section, we employ a set of quantitative metrics as order parameters to analyze the fission–fusion motion of the UAV swarm.

Polarization Index. $φ \in [0, 1]$ denotes the degree to which all drones move in the same direction at a given moment. Because stochastic disturbances are introduced, we set a threshold $φ_{flock}$ for the polarization index; a value above $φ_{flock}$ indicates that a stable swarm has formed.

φ = \frac{1}{N_{i}^{ϖ}} \left\| \sum_{i=1}^{N_{i}^{ϖ}} \frac{v_{i}^{ϖ}}{‖v_{i}^{ϖ}‖} \right\|

(19)

where $v_{i}^{ϖ} \in \mathbb{R}^{3}$ is the velocity vector of UAV $i$.
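Equation (19) is straightforward to evaluate. The sketch below (the function name is hypothetical) averages the unit velocity vectors of the agents being evaluated and returns the norm of that mean; comparing the result with a threshold such as $φ_{flock}$ is left to the caller.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def polarization(velocities: Sequence[Vec3]) -> float:
    """Eq. (19): norm of the mean unit heading vector; 1 means fully aligned."""
    total = [0.0, 0.0, 0.0]
    for v in velocities:
        speed = math.sqrt(sum(c * c for c in v))  # assumed non-zero (v > v_min)
        for k in range(3):
            total[k] += v[k] / speed
    return math.sqrt(sum(c * c for c in total)) / len(velocities)
```

Two agents flying in exactly opposite directions give $φ = 0$, and two agents flying at right angles give $φ = \sqrt{2}/2$, regardless of their speeds.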

Differentiation Index. The polarization index alone does not adequately capture the movement characteristics of the swarm during sub-swarm movements. To address this limitation, we incorporate the differentiation index [17] to assess the velocity variation among agents within the swarm, defined as follows:

λ = \frac{Γ^{2} + 1}{ℏ}

(20)

Γ = \frac{1}{N} \sum_{i = 1}^{N} {(v_{i} - \bar{v})}^{3} / {(\frac{1}{N} \sum_{i = 1}^{N} {(v_{i} - \bar{v})}^{3})}^{- \frac{3}{2}}

(21)

ℏ = \frac{1}{N} \sum_{i = 1}^{N} {(v_{i} - \bar{v})}^{4} / {(\frac{1}{N} \sum_{i = 1}^{N} {(v_{i} - \bar{v})}^{2})}^{- 2}

(22)

where

Γ

and

ℏ

indicate the skewness and kurtosis of the velocity distribution of the UAV swarm, respectively, for differentiation index

λ \in [0, 1

], when

λ = 1

denotes two independent Renoulli distributions signifies the complete fission of velocities; In this study, when

λ > 0.9

we determine that two independent swarms have been formed.
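Equations (20)–(22) use population moments of a scalar velocity quantity. The sketch below assumes each $v_i$ is a scalar (for example, a speed or a single velocity component), because the text does not state which scalar is used; the function name is hypothetical.

```python
from typing import Sequence


def differentiation_index(values: Sequence[float]) -> float:
    """lambda = (skewness^2 + 1) / kurtosis, Eqs. (20)-(22), with population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0.0:
        raise ValueError("all values identical; skewness and kurtosis are undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5  # Eq. (21)
    kurtosis = m4 / m2 ** 2    # Eq. (22)
    return (skewness ** 2 + 1.0) / kurtosis
```

A sample split evenly between two values, such as `[1, 1, 1, 3, 3, 3]`, gives λ = 1 (complete fission), whereas the evenly spread sample `[1, 2, 3, 4, 5]` gives λ = 1/1.7 ≈ 0.59, well below the 0.9 threshold.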

4.1.2. Performance Evaluation Index

To assess how well the swarm resists interference from dynamic obstacles, this section proposes the precision of stimuli as an indicator of the swarm’s resilience to interference. Additionally, we define the communication load to quantify the communication overhead.

Precision of Stimuli. The precision of stimuli $Λ$ measures how closely the swarm’s motion direction matches that of the dynamic obstacles. $Λ = 0$ signifies no interference, as the motion directions are dissimilar; conversely, $Λ = 1$ indicates severe interference, as the motion directions are entirely congruent. It is defined as follows:

Λ = \frac{A - A_{0}}{1 - A_{0}}, \quad A_{0} = \frac{1 + r_{o} \cdot r_{att}}{2}, \quad A = \frac{1}{N_{i}^{ϖ}} \sum_{i = 1}^{N_{i}^{ϖ}} \frac{1 + r_{i} \cdot r_{att}}{2}

(23)

where $r_{i}$, $r_{att}$, and $r_{o}$ denote the unit velocity directions of UAV $i$, the swarm, and the dynamic obstacles, respectively.
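The sketch below transcribes Eq. (23) directly, under the assumption that the product of two unit directions denotes their dot product, so each term $(1 + r \cdot r_{att})/2$ lies in [0, 1]. The function name is hypothetical, and it raises an error when $A_0 = 1$, where the normalization is undefined.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def precision_of_stimuli(r_uavs: Sequence[Vector], r_att: Vector, r_o: Vector) -> float:
    """Lambda = (A - A0) / (1 - A0), Eq. (23); all direction inputs are unit vectors."""
    a0 = (1.0 + dot(r_o, r_att)) / 2.0
    if a0 == 1.0:
        raise ValueError("A0 = 1: r_o coincides with r_att, so Lambda is undefined")
    # A averages the per-UAV alignment term over the N UAVs.
    a = sum((1.0 + dot(r_i, r_att)) / 2.0 for r_i in r_uavs) / len(r_uavs)
    return (a - a0) / (1.0 - a0)
```

With $r_{att} = (1, 0)$ and $r_o = (0, 1)$, $A_0 = 0.5$; UAV directions that all coincide with $r_{att}$ give Λ = 1, while UAV directions whose alignment with $r_{att}$ equals that of $r_o$ give Λ = 0.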

Communication Load. The communication load $N_{cl}$ is the average communication cost per UAV in the fleet; a lower value indicates less communication overhead. It is defined as follows:

N_{cl} = \frac{1}{N} \left\| \sum_{i = 1}^{N} \sum_{j = 1}^{N_{il}^{ϖ}} d_{ij}^{ϖ} \right\|

(24)

where $N_{il}^{ϖ}$ represents the number of UAVs interacting with UAV $i$.
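Eq. (24) relies on $d_{ij}^{ϖ}$ from earlier definitions. The sketch below assumes it is the Euclidean distance between interacting UAVs $i$ and $j$, and that the neighbor lists come from whichever interaction structure is being evaluated; the function name is hypothetical. Because all distances are non-negative, the norm in Eq. (24) reduces to the plain sum.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def communication_load(positions: Sequence[Point], neighbors: Dict[int, List[int]]) -> float:
    """N_cl = (1/N) * sum_i sum_{j in neighbors(i)} d_ij, Eq. (24).

    Assumption: d_ij is the Euclidean distance between UAV i and UAV j.
    """
    total = 0.0
    for i, interacting in neighbors.items():
        for j in interacting:
            total += math.dist(positions[i], positions[j])
    return total / len(positions)
```

Two UAVs 5 units apart that each list the other as a neighbor give $N_{cl} = (5 + 5)/2 = 5$; if only one of them keeps the link, the load halves to 2.5.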

4.2. Detailed Simulation Parameters

Table 1 and Table 2 list the detailed simulation parameters of the experiments and of the RL-based path planning of the sub-swarm, respectively. We provide these parameter lists so that other researchers can reproduce the methodology of this study more readily. The parameters can be adjusted for different research settings without affecting the validity of the proposed algorithm.

4.3. Simulation Results Analysis

In this section, we assess the ability of the UAV swarm to achieve fission–fusion and perform confrontation movements while ensuring that the movement of the parent swarm remains unaffected during confrontation.

4.3.1. Evaluation of the Probabilistic Starling-Inspired Topological Structure

To validate the PSTI structure, we conducted multiple tests with different birth ranges for the agents (ranging from 4 to 13). Each test consisted of 1000 trials, with 130 iterations per trial. Figure 3 shows the probability of generating a stable swarm with the conventional seven-nearest-neighbor structure and with the proposed PSTI structure after randomly generating 20 agents within different birth ranges.

Figure 3 shows that both methods achieve high swarming rates when the birth range is small. However, as the range increases, the seven-nearest-neighbor topological interaction model increasingly suffers from local convergence; at a range of 15, the probability of forming a stable swarm drops to 50 percent. In contrast, the proposed PSTI structure consistently produces better results across all tested ranges, which demonstrates that it improves the swarming efficiency of UAV swarms while reducing the occurrence of local convergence. The PSTI structure still exhibits occasional local convergence because of the randomness of its probability coefficients. However, by balancing these probability factors, our approach maintains a low communication load while achieving superior swarming performance.
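For reference, the seven-nearest-neighbor baseline ($N_{top} = 7$ in Table 1) links each agent to its k closest agents regardless of metric distance. The sketch below implements that baseline rule plus a connectivity check on the resulting interaction graph, which is one simple way to see how local convergence can arise: if the graph splits, the agents can settle into separate groups. The probabilistic PSTI rule of Section 3.1 is not reproduced here, and both function names are hypothetical.

```python
import math
from collections import deque
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def topological_neighbors(positions: Sequence[Point], i: int, k: int = 7) -> List[int]:
    """Indices of the k agents closest to agent i (ties keep index order)."""
    others = [j for j in range(len(positions)) if j != i]
    others.sort(key=lambda j: math.dist(positions[i], positions[j]))
    return others[:k]


def interaction_graph_connected(positions: Sequence[Point], k: int = 7) -> bool:
    """True if the undirected k-nearest-neighbor interaction graph links every agent."""
    n = len(positions)
    if n == 0:
        return True
    adjacency: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        for j in topological_neighbors(positions, i, k):
            adjacency[i].add(j)
            adjacency[j].add(i)  # treat each interaction as an undirected link
    seen = {0}
    queue = deque([0])
    while queue:  # breadth-first search from agent 0
        u = queue.popleft()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n
```

Two groups of eight agents placed far apart yield a disconnected graph with k = 7, because every agent's seven nearest neighbors lie in its own group; with k = 8 the groups become linked.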

4.3.2. Simulation of the Sub-Swarm Confrontation via Reinforcement Learning Algorithm

We build a simulation environment that supports abundant interactions between the swarm and the dynamic obstacle. The mission area is assumed to be $100 \times 100$. The coordinates of the parent swarm, sub-swarm, and dynamic obstacle are generated by the planning algorithms described above. We optimize the trajectory of the sub-swarm’s center point, and the sub-swarm flies at a constant altitude equal to its initial height. At the start of each episode, the coordinates of both swarms and the dynamic obstacle are generated. The episode ends when the parent swarm is captured by the dynamic obstacle, i.e., when $d_{e2p} \leq d_{e2p}^{min}$, or when the sub-swarm has run for more than $T = 28$ steps. The maximal speeds of the parent swarm, sub-swarm, and dynamic obstacle are $v_{p}^{max} = 1$, $v_{sub}^{max} = 2$, and $v_{e}^{max} = 1.5$, respectively.
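The episode logic described above can be sketched as follows, assuming the sub-swarm center moves in the horizontal plane with one unit of time per step and that the action is a (speed, heading) pair with the heading in radians. The constants come from Table 2; the function names and the speed-clipping rule are illustrative assumptions, and the obstacle's pursuit behavior and the reward terms are not modeled.

```python
import math
from typing import Tuple

Point2 = Tuple[float, float]

# Values taken from Table 2.
D_E2P_MIN = 5.0  # minimal distance between parent swarm and obstacle (capture)
T_MAX = 28       # maximal simulation steps
V_SUB_MAX = 2.0  # max speed of the sub-swarm


def clip_speed(speed: float, v_max: float = V_SUB_MAX) -> float:
    """Keep the commanded speed within [0, v_max] (the lower bound of 0 is assumed)."""
    return max(0.0, min(speed, v_max))


def move(center: Point2, speed: float, heading: float) -> Point2:
    """Advance the sub-swarm center by one unit time step at constant altitude."""
    v = clip_speed(speed)
    return (center[0] + v * math.cos(heading), center[1] + v * math.sin(heading))


def episode_done(parent: Point2, obstacle: Point2, step: int) -> bool:
    """End on capture (d_e2p <= d_e2p^min) or once more than T steps have run."""
    captured = math.dist(parent, obstacle) <= D_E2P_MIN
    return captured or step > T_MAX
```

Keeping capture and the step limit as separate conditions makes it easy to report them separately, since a capture is a genuine failure while reaching the step limit only truncates the episode.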

We then evaluate RL-based algorithms, namely PPO, SAC, and DQN, in this simulation environment. Because DQN cannot handle continuous action spaces, we discretize each action dimension of the sub-swarm (i.e., the flying speed $v_{t}$ and the direction $θ_{t}$) into 10 values. The three algorithms share the same hyperparameters: a discount factor $γ = 0.99$, a learning rate $l_{r} = 0.0003$, and a network architecture of $N = 128 \times 128 \times 128$. Figure 4 shows the accumulated reward of these RL algorithms versus training steps for the same network depth. The algorithms reach similar accumulated rewards, while the SAC-based agent converges faster and is as stable as the PPO-based agent.
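The DQN discretization can be sketched as a lookup table of joint actions. The text states that each action dimension is split into 10 values but not how the two dimensions are combined or where the levels sit, so the sketch assumes a Cartesian product of 10 speeds evenly spaced in $(0, v_{sub}^{max}]$ and 10 headings evenly spaced in $[0, 2π)$, giving 100 discrete actions; the function name is hypothetical.

```python
import math
from itertools import product
from typing import List, Tuple

N_LEVELS = 10  # each action dimension is discretized into 10 values


def build_action_table(v_max: float = 2.0, n: int = N_LEVELS) -> List[Tuple[float, float]]:
    """Joint (speed, heading) actions: n speeds in (0, v_max], n headings in [0, 2*pi)."""
    speeds = [v_max * (k + 1) / n for k in range(n)]
    headings = [2.0 * math.pi * k / n for k in range(n)]
    return list(product(speeds, headings))
```

Under this assumption the DQN output layer has one Q-value per table entry, and the greedy index is mapped back to a (speed, heading) command for the sub-swarm.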

Figure 5 illustrates the sub-swarm trajectories planned with the PPO algorithm for different initial obstacle coordinates. In each case, the sub-swarm responds efficiently to the dynamic obstacle and consistently approaches the target by the end of the episode.

4.3.3. Simulation of the Bio-Inspired Fission–Fusion Control and Planning via Reinforcement Learning Algorithm

Figure 6a–f depicts the simulated swarm motion. Figure 6a shows the complete fission–fusion process of the UAV swarm when it encounters dynamic obstacles, achieved with the proposed BiFRL algorithm. The sub-swarm executes the confrontation movement and rejoins the parent swarm upon completion. The dynamic obstacle starts at (40, 60, 30). When the swarm perceives the dynamic obstacle, it self-organizes and splits into two swarms: a parent swarm and a sub-swarm. The two swarms quickly synchronize their positions. During the confrontation, the sub-swarm counters the interference from the dynamic obstacle using the SFCRL algorithm. Finally, after its engagement ends, the sub-swarm autonomously converges back into the parent swarm, fusing into a stable and cohesive swarm.

Figure 6b shows the random initial positions of all agents. Figure 6c shows the agents rapidly forming a stable swarm from these random initial positions. In Figure 6d, the swarm perceives the obstacle, self-organizes, and splits into two stable swarms; the confrontation motion against the dynamic obstacle is then executed with the SFCRL algorithm. Figure 6e shows the sub-swarm completing its confrontation movement and starting its return to the parent swarm after the dynamic obstacle stops tracking. Finally, Figure 6f shows the sub-swarm self-organizing as it rejoins the parent swarm after the confrontation, culminating in fusion.

4.3.4. Evaluation of the Bio-Inspired Fission–Fusion Control and Planning via Reinforcement Learning Algorithm

Figure 7 depicts the temporal evolution of the polarization index of the UAV swarm under the reinforcement learning approach. The polarization index shows that the entire swarm completes swarming within 10 s. After fission, both swarms remain stable. When the sub-swarm returns to the parent swarm, the polarization index drops briefly and then stabilizes at around 1. Notably, the parent swarm is unaffected by the dynamic obstacle throughout the entire process.

Figure 8 illustrates the evolution of the differentiation index of the swarm under the reinforcement learning approach.

The differentiation index shows that the 20 randomly generated UAVs rapidly form a stable swarm within the first 3 s, and this state persists until approximately 24 s. At around 27 s, the swarm undergoes rapid fission into two stable swarms, and the differentiation index remains at approximately 1. Subsequently, at approximately 92 s, after completing the confrontation and returning to the parent swarm, the sub-swarm fuses back into a single stable swarm, which validates the effectiveness of the proposed algorithm.

Figure 9 illustrates the precision of stimuli of both swarms under dynamic obstacle interference. Both swarms exhibit favorable precision of stimuli throughout the fission–fusion process. The precision of stimuli of the sub-swarm, driven by the SFCRL algorithm, shows some variability but remains consistently high. In contrast, the parent swarm shows slight fluctuations when dynamic obstacles are detected. These fluctuations arise mainly because the parent swarm must stabilize into a new formation during the fission–fusion process, rather than directly from the dynamic obstacles. Furthermore, the obstacles do not significantly affect the parent swarm for the rest of the process.

Figure 10 presents the communication load of UAV swarms under different interaction structures. We compare the communication load of the PSTI structure with that of several fixed-distance communication structures. The simulation results show that, in the absence of fission–fusion, the probabilistic starling-inspired topological interaction reduces the communication load by 50% to 85% relative to the other structures, and it still shows an advantage during the fission–fusion process. Moreover, when a fixed-distance interaction structure uses a smaller interaction distance, the likelihood of local convergence during swarming increases.

5. Conclusions

In recent years, researchers have shown significant interest in UAV swarms, particularly in self-organized fission–fusion control methods. However, dealing with dynamic obstacles that have tracking capabilities remains a challenge in UAV swarm control. To tackle this issue, we present BiFRL, a bio-inspired fission–fusion control and planning algorithm for UAV swarm systems based on reinforcement learning. In contrast to existing methods, our approach addresses the interference posed by unknown dynamic obstacles while using fewer resources. The proposed self-organized fission–fusion framework enables autonomous grouping and separation in response to dynamic obstacles while controlling the size of the sub-swarm. Additionally, we introduce sub-swarm confrontation via reinforcement learning to select confrontation paths when the direction of disturbance is unknown. After completing the confrontation, the sub-swarm rejoins the parent swarm through self-organization. Furthermore, we propose a probabilistic starling-inspired topological interaction structure, which mitigates the local convergence problem of existing seven-nearest-neighbor algorithms. To validate our approach, we conduct extensive simulations with different initial swarm ranges and evaluate the communication load as a performance metric. The results demonstrate the effectiveness and feasibility of the proposed BiFRL algorithm, which combines reinforcement learning with UAV swarm fission–fusion control and planning to handle unknown dynamic obstacle disturbances. We believe that the proposed algorithm can improve the efficiency of swarm control in dynamic environments and the ability of swarms to counter dynamic disturbances, and that it will benefit research on multi-swarm systems.

However, the resource terms considered in this study are limited, and more should be taken into account, e.g., electromagnetic interference in the environment, the missed-detection rate when recognizing dynamic obstacles, and other factors. These factors make the dynamic environment more complex but also more realistic. We will introduce additional AI algorithms to address such complex dynamic problems, which should in turn improve the robustness and applicability of the algorithms. We will also perform in-depth sensitivity analyses in more complex environments to assess the robustness of the algorithms [39]. In future work, we will further investigate the interference of complex dynamic obstacles within the swarm fission–fusion approach and study the influence of specific parameters on the algorithm in greater depth.

Author Contributions

Conceptualization, X.Z. and W.D.; Methodology, Y.W. and W.D.; Software, X.Z.; Validation, X.Z., Y.W. and J.J.; Formal analysis, W.D.; Investigation, Q.W.; Resources, J.J.; Writing—original draft, Q.W. and Z.Z.; Writing—review & editing, Y.W.; Visualization, Z.Z.; Funding acquisition, W.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. U20B2042) and the Science and Technology Innovation 2030 Key Project of “New Generation Artificial Intelligence” (Grant No. 2020AAA01082010).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available because the laboratory requires specific reinforcement learning simulation data to be retained; researchers with similar requirements may contact the corresponding author.

Conflicts of Interest

Jun Jia was employed by the Shanghai Electro-Mechanical Engineering Institute, Shanghai. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Zelenka, J.; Kasanický, T.; Budinská, I.; Naďo, L.; Kaňuch, P. SkyBat: A Swarm Robotic Model Inspired by Fission-Fusion Behaviour of Bats. In Advances in Service and Industrial Robotics, Proceedings of the International Conference on Robotics in Alpe-Adria Danube Region (RAAD 2018), Poitiers, France, 6–8 June 2019; Springer: Cham, Switzerland, 2019; pp. 521–528.
2. Jeanson, R.; Kukuk, P.F.; Fewell, J.H. Emergence of division of labor in halictine bees: Contributions of social interactions and behavioural variance. Anim. Behav. 2005, 70, 1183–1193.
3. Hemelrijk, C.K.; Hildenbrandt, H. Schools of fish and flocks of birds: Their shape and internal structure by self-organization. Interface Focus 2012, 2, 726–737.
4. Ward, A.J.W.; Sumpter, D.J.T.; Couzin, I.D.; Hart, P.J.B.; Krause, J. Quorum decision-making facilitates information transfer in fish shoals. Proc. Natl. Acad. Sci. USA 2008, 105, 6948–6953.
5. Lecheval, V.; Jiang, L.; Tichit, P.; Sire, C.; Hemelrijk, C.K.; Theraulaz, G. Social conformity and propagation of information in collective U-turns of fish schools. Proc. R. Soc. B Biol. Sci. 2018, 285, 20180251.
6. Couzin, I.D. Behavioural ecology: Social organization in fission–fusion societies. Curr. Biol. 2006, 16, R169–R171.
7. Couzin, I.D.; Laidre, M.E. Fission–fusion populations. Curr. Biol. 2009, 19, R633–R635.
8. Loy, J. Social Behavior and Habitat: Primate Societies. Group Techniques of Ecological Adaptation. Hans Kummer. Aldine-Atherton, Chicago, 1971. 160 pp., illus. Cloth, 7.50; paper, 2.95. Worlds of Man series. Science 1971, 174, 49.
9. Zelenka, J.; Kasanický, T.; Budinská, I.; Kaňuch, P. An agent-based algorithm resembles behaviour of tree-dwelling bats under fission–fusion dynamics. Sci. Rep. 2020, 10, 16793.
10. Silk, M.J.; Croft, D.P.; Tregenza, T.; Bearhop, S. The importance of fission–fusion social group dynamics in birds. Ibis 2014, 156, 701–715.
11. Fortin, D.; Fortin, M.-E.; Beyer, H.L.; Duchesne, T.; Courant, S.; Dancose, K. Group-size-mediated habitat selection and group fusion–fission dynamics of bison under predation risk. Ecology 2009, 90, 2480–2490.
12. Bond, M.L.; Lee, D.E.; Ozgul, A.; König, B. Fission–fusion dynamics of a megaherbivore are driven by ecological, anthropogenic, temporal, and social factors. Oecologia 2019, 191, 335–347.
13. Wang, J.; Wang, C.; Wei, Y.; Zhang, C. Neuroadaptive sliding mode formation control of autonomous underwater vehicles with uncertain dynamics. IEEE Syst. J. 2019, 14, 3325–3333.
14. Liang, H.; Fu, Y.; Gao, J. Bio-inspired self-organized cooperative control consensus for crowded UUV swarm based on adaptive dynamic interaction topology. Appl. Intell. 2021, 51, 4664–4681.
15. Nauta, J.; Simoens, P.; Khaluf, Y. Group size and resource fractality drive multimodal search strategies: A quantitative analysis on group foraging. Phys. A Stat. Mech. Appl. 2022, 590, 126702.
16. Reséndiz-Benhumea, G.M.; Froese, T.; Ramos-Fernández, G.; Smith-Aguilar, S.E. Applying social network analysis to agent-based models: A case study of task allocation in swarm robotics inspired by ant foraging behaviour. In Proceedings of the ALIFE 2019: The 2019 Conference on Artificial Life, One Rogers Street, Cambridge, UK, 29 July–2 August 2019; pp. 616–623.
17. Yang, P.; Yan, M.; Song, J.; Tang, Y. Self-Organized fission-fusion Control Algorithm for Flocking Systems Based on Intermittent Selective Interaction. Complexity 2019, 2019, 2187812.
18. Reynolds, C.W. Flocks, Herds and Schools: A Distributed Behavioural Model. In Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, Anaheim, CA, USA, 27–31 July 1987; Association for Computing Machinery: New York, NY, USA, 1987; pp. 25–34.
19. Ming, R.; Jiang, R.; Luo, H.; Lai, T.; Guo, E.; Zhou, Z. Comparative Analysis of Different UAV Swarm Control Methods on Unmanned Farms. Agronomy 2023, 13, 2499.
20. Bajec, I.L.; Zimic, N.; Mraz, M. Simulating flocks on the wing: The fuzzy approach. J. Theor. Biol. 2005, 233, 199–220.
21. Ban, Z.; Hu, J.; Lennox, B.; Arvin, F. Self-organised collision-free flocking mechanism in heterogeneous robot swarms. Mob. Networks Appl. 2021, 26, 2461–2471.
22. Lei, X.K.; Liu, M.Y.; Yang, P.P. Fission control algorithm for swarm based on local following interaction. Control Decis. 2013, 28, 741–745.
23. He, Y. Mission-driven autonomous perception and fusion based on UAV swarm. Chin. J. Aeronaut. 2020, 33, 2831–2834.
24. Nandi, G.C.; Mitra, D. Development of a sensor integration strategy for robotic application based on geometric optimization. In Sensor Fusion: Architectures, Algorithms, and Applications; SPIE: St Bellingham, WA, USA, 2001; Volume 4385, pp. 282–291.
25. Wei, Y.; Blake, M.B.; Madey, G.R. An operation-time simulation framework for UAV swarm configuration and mission planning. Procedia Comput. Sci. 2013, 18, 1949–1958.
26. Karaman, S.; Frazzoli, E. Sampling-based algorithms for optimal motion planning. Int. J. Robot. Res. 2011, 30, 846–894.
27. Chitnis, R.; Holladay, R.; Kim, B.; Silver, T.; Kaelbling, L.P.; Lozano-Pérez, T. Integrated task and motion planning. Annu. Rev. Control Robot. Auton. Syst. 2021, 4, 265–293.
28. Zhang, X.; Ding, W.; Wang, Y.; Luo, Y.; Zhang, Z.; Xiao, J. Bio-Inspired Self-Organized Fission–Fusion Control Algorithm for UAV Swarm. Aerospace 2022, 9, 711.
29. Soysal, O.; Şahin, E. A macroscopic model for self-organized aggregation in swarm robotic systems. In Swarm Robotics; Springer: Berlin/Heidelberg, Germany, 2006; pp. 27–42.
30. Vengatesan, K.; Kumar, A.; Chavan, V.T.; Wani, S.M.; Singhal, A.; Sayyad, S. Simple Task Implementation of Swarm Robotics in Underwater. In Emerging Trends in Computing and Expert Technology; Springer: Cham, Switzerland, 2019; pp. 1138–1145.
31. Steinberg, M. Biologically-inspired approaches for self-organization, adaptation, and collaboration of heterogeneous autonomous systems. In Defense Transformation and Net-Centric Systems; SPIE: St Bellingham, WA, USA, 2011; pp. 127–139.
32. Ballerini, M.; Cabibbo, N.; Candelier, R.; Cavagna, A.; Cisbani, E.; Giardina, I.; Zdravkovic, V. Interaction ruling animal collective behaviour depends on topological rather than metric distance: Evidence from a field study. Proc. Natl. Acad. Sci. USA 2008, 105, 1232–1237.
33. Asensio, N.; Aureli, F.; Schaffner, C.; Korstjens, A. Intragroup aggression, fission–fusion dynamics and feeding competition in spider monkeys. Behaviour 2008, 145, 983–1001.
34. Ling, H.; McIvor, G.E.; van der Vaart, K.; Vaughan, R.T.; Thornton, A.; Ouellette, N.T. Local interactions and their group-level consequences in flocking jackdaws. Proc. R. Soc. B Biol. Sci. 2019, 286, 20190865.
35. Fulginei, F.R.; Salvini, A. The flock of starlings optimization: Influence of topological rules on the collective behaviour of swarm intelligence. In Computational Methods for the Innovative Design of Electrical Devices; Springer: Berlin/Heidelberg, Germany, 2010; pp. 129–145.
36. Martin, S. Multi-agent flocking under topological interactions. Syst. Control Lett. 2014, 69, 53–61.
37. Xie, R.; Gu, C.; Liu, L.; Chen, L.; Zhang, L. Large scale UAVs collaborative formation simulation based on starlings flight mechanism. In Proceedings of the Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data, Macau, China, 23–25 July 2018; Springer: Cham, Switzerland, 2018; pp. 65–78.
38. Vicsek, T.; Czirók, A.; Ben-Jacob, E.; Cohen, I.; Shochet, O. Novel type of phase transition in a system of self-driven particles. Phys. Rev. Lett. 1995, 75, 1226–1229.
39. Pianosi, F.; Beven, K.; Freer, J.; Hall, J.W.; Rougier, J.; Stephenson, D.B.; Wagener, T. Sensitivity analysis of environmental models: A systematic review with practical workflow. Environ. Model. Softw. 2016, 79, 214–232.

Figure 1. Illustration of the interaction manners. (a) Comparison of the fixed-distance (left) and topological (right) interaction structures; (b) local convergence issue of the topological interaction structure.

Figure 2. Effects of the dynamic obstacles on typical swarm movement. The swarm avoids dynamic obstacles with tracking functions (a) by only evasions or (b) in a simple fission–fusion operation, which either cannot reach or misses the target point; (c) by constant fission and meandering or (d) by continuous fission–fusion operations on a fixed path, which incur large resource costs.

Figure 3. Swarm effect under different topological interactions.

Figure 4. Accumulated reward versus training steps of different RL algorithms.

Figure 5. Planned path of sub-swarm in the situation of the different initial coordinates of the dynamic obstacle, including (a) (22.62, 21.39), (b) (23.31, 27.70), (c) (26.21, 26.35), and (d) (29.50, 21.71), respectively.

Figure 6. Self-organized fission–fusion process of UAV swarm. (a) shows the complete UAV swarm fission–fusion movement, while (b–f) denote different steps within the fission–fusion process.

Figure 7. Polarization index of the UAV swarm.

Figure 8. Differentiation index of the UAV swarm.

Figure 9. Precision of stimuli of the UAV swarm.

Figure 10. Communication load comparison results.

Table 1. Swarm detailed simulation parameters.

Parameters	Description	Numerical Value
$γ^{i n e}$	Inertia coefficient	0.75
$ξ$	Environmental damping factor	0.006
$ζ$	Random noise factor	0.003
$Γ^{p o s}$	Position cooperation factor	1
$Γ^{v e l s}$	Velocity alignment factor	1
$l_{c}^{ϖ}$	Decay coefficient	0.2
$l_{a}^{ϖ}$	Desired spacing	0.2
$α_{ψ}$	Autopilot control parameter	0.85
$α_{v}$	Autopilot control parameter	3.15
$α_{h}$	Autopilot control parameter	0.4
$α_{\dot{h}}$	Autopilot control parameter	1
$v_{\min}$	Min horizontal speed	0.02
$ℏ_{\max}$	Max height change rate	2.1
$φ_{\max}$	Max lateral overload	10
$v_{\max}$	Max horizontal speed	2
g	Gravitational acceleration	9.8
$ℏ_{\min}$	Minimum height change rate	−2.2
$R_{R a d i u s}$	Perception radius	70
$φ_{f l o c k}$	Polarization index threshold	0.85
$ʑ_{i v a}$	Lead-in distance	70–65
$ʑ_{i n a}$	Safe confrontation range	3–5
N_top	Maximum number of neighbors	7
$Υ_{r a d}$	Random number	0–1
$ϱ_{r a d}$	Probability thresholds	1.75

Table 2. Sub-swarm simulation parameter setting.

Parameters	Description	Numerical Value
W	Simulation region size	100
$T$	Maximal simulation steps	28
$d_{e 2 p}^{m i n}$	Minimal distance between parent swarm and obstacle	5
$d_{s 2 e}^{s a f e}$	Safe distance between the sub-swarm and obstacle	15
$d_{s 2 e}^{c a p t u r e}$	Capture distance between the sub-swarm and dynamic obstacle	10
$v_{s u b}^{m a x}$	Max speed of sub-swarm	2
$v_{p}^{m a x}$	Max speed of parent swarm	1
$v_{e}^{m a x}$	Max speed of dynamic obstacle	1.5
$D$		10
$σ$		2
$α$		5
$β$	Discrete actions	0.1
$γ$		5
$c_{1}$		5
$c_{2}$		−10

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
