Reinforcement Learning-Based Decentralized Safety Control for Constrained Interconnected Nonlinear Safety-Critical Systems

## Abstract

## 1. Introduction

- A reinforcement learning algorithm is used to solve the optimal decentralized safety control (DSC) problem for constrained interconnected nonlinear safety-critical systems with asymmetric input constraints. The control strategy is obtained by minimizing a performance function, which keeps the system state within the safe region while respecting the asymmetric input constraints.
- Nonlinear interconnected safety-critical systems with asymmetric input constraints and safety constraints are converted, using barrier functions, into equivalent systems that satisfy user-defined safety constraints. Unlike the nonlinear safety-critical systems studied in [3,9,10,13], this paper handles the safety constraint on the interconnection term through the barrier function, which ensures that the interconnected nonlinear safety-critical system satisfies the safety constraints.
- The asymmetric input constraints are handled by using a single critic neural network (CNN) architecture to approximate the performance function online. Theoretical analysis shows that, under the optimal DSC method, the system states and the neural network weight estimation errors are uniformly ultimately bounded (UUB). In addition, a simulation example verifies the feasibility and effectiveness of the developed DSC method.

## 2. Preliminaries

#### 2.1. Problem Descriptions

**Assumption 1.**

**Remark 1.**

**Assumption 2.**

#### 2.2. Safety Conversion Issues

**Remark 2.**

**Problem 1.**

**Definition 1.** The function $B(\cdot):\mathbb{R}\to \mathbb{R}$ defined on the interval $(a, A)$ is referred to as a barrier function if

**Assumption 3.**

**Lemma 1.** For all $({s}_{1},{s}_{2})\in {\mathbb{R}}^{2}$, the following condition holds:

**Remark 3.**

- 1.
- 2. When the system state approaches the boundary of the safe region, the barrier function diverges:
  $$\begin{aligned}\lim_{{z}_{i}\to {a}_{i}^{+}}B({z}_{i};{a}_{i},{A}_{i})&=-\infty ,\\ \lim_{{z}_{i}\to {A}_{i}^{-}}B({z}_{i};{a}_{i},{A}_{i})&=+\infty .\end{aligned}$$
- 3. The barrier function vanishes when the system state is at the equilibrium, i.e.,
  $$B(0;{a}_{i},{A}_{i})=0,\qquad \forall {a}_{i}<{A}_{i}.$$
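The boundary and equilibrium properties listed above can be checked numerically. The sketch below is illustrative only: it assumes a log-type barrier function $B(z;a,A)=\ln\left(\frac{A}{a}\,\frac{a-z}{A-z}\right)$ with $a<0<A$, a form common in the barrier-transformation literature. This formula is an assumption and is not taken from the text above. The names `barrier` and `barrier_inverse` are hypothetical.

```python
import math


def barrier(z: float, a: float, A: float) -> float:
    """Assumed log-type barrier B(z; a, A) on the open interval (a, A).

    Assumption (not from the source text): a < 0 < A, so the equilibrium
    z = 0 lies strictly inside the safe interval.
    """
    if not (a < 0.0 < A):
        raise ValueError("this sketch assumes a < 0 < A")
    if not (a < z < A):
        raise ValueError("z must lie strictly inside (a, A)")
    # The argument of the log is positive on (a, A): it tends to 0+ as
    # z -> a+ (B -> -inf) and to +inf as z -> A- (B -> +inf).
    return math.log((A / a) * (a - z) / (A - z))


def barrier_inverse(s: float, a: float, A: float) -> float:
    """Inverse map: recover z in (a, A) from the transformed state s."""
    if not (a < 0.0 < A):
        raise ValueError("this sketch assumes a < 0 < A")
    e = math.exp(s)
    # Solving s = B(z; a, A) for z gives z = a*A*(e^s - 1) / (a*e^s - A).
    # The denominator is strictly negative because a < 0 < A.
    return a * A * (e - 1.0) / (a * e - A)


if __name__ == "__main__":
    a, A = -1.0, 2.0                      # illustrative safety bounds
    print(barrier(0.0, a, A))             # equilibrium maps to zero
    print(barrier(a + 1e-9, a, A))        # large negative near lower bound
    print(barrier(A - 1e-9, a, A))        # large positive near upper bound
    print(barrier_inverse(barrier(0.7, a, A), a, A))  # round trip
```

Under this assumed form, any finite transformed state `s` maps back to a state strictly inside $(a, A)$, which is why keeping the transformed state bounded keeps the original state inside the safe region.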

## 3. Decentralized Optimal DSC Design

#### 3.1. Barrier Function Conversion

**Remark 4.**

**Problem 2.**

**Lemma 2.**

- 1.
- 2.

**Proof.**

#### 3.2. Designing the Optimal DSC Strategy by Solving n HJB Equations

**Theorem 1.**

**Proof.**

## 4. Critic Network for Approximation

**Remark 5.**

## 5. Stability Analysis

**Assumption 4.**

**Assumption 5.**

**Theorem 2.**

**Proof.**

**Remark 6.**

## 6. Simulation Example

## 7. Conclusions

| The $i$th Subsystem | Parameter | Meaning | Value |
|---|---|---|---|
| The first subsystem | ${m}_{1}$ | Mass of payload | 5 kg |
| The first subsystem | ${M}_{1}$ | Viscous friction | 2 N |
| The first subsystem | ${\tilde{l}}_{1}$ | Length of the arm | 0.5 m |
| The first subsystem | ${\tilde{G}}_{1}$ | Moment of inertia | 10 kg·m² |
| The first subsystem | ${\tilde{g}}_{1}$ | Acceleration of gravity | 9.81 m/s² |
| The second subsystem | ${m}_{2}$ | Mass of payload | 10 kg |
| The second subsystem | ${M}_{2}$ | Viscous friction | 2 N |
| The second subsystem | ${\tilde{l}}_{2}$ | Length of the arm | 1 m |
| The second subsystem | ${\tilde{G}}_{2}$ | Moment of inertia | 10 kg·m² |
| The second subsystem | ${\tilde{g}}_{2}$ | Acceleration of gravity | 9.81 m/s² |

