Article

Constrained Optimal Control for Nonlinear Multi-Input Safety-Critical Systems with Time-Varying Safety Constraints

School of Artificial Intelligence, Henan University, Zhengzhou 450000, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(15), 2744; https://doi.org/10.3390/math10152744
Submission received: 11 July 2022 / Revised: 30 July 2022 / Accepted: 1 August 2022 / Published: 3 August 2022
(This article belongs to the Topic Advances in Nonlinear Dynamics: Methods and Applications)

Abstract
In this paper, we investigate the constrained optimal control problem of nonlinear multi-input safety-critical systems with uncertain disturbances and time-varying safety constraints. By utilizing a barrier function transformation, together with a new disturbance-related term and a smooth safety boundary function, a nominal-system-dependent multi-input barrier transformation architecture is developed to deal with the time-varying safety constraints and uncertain disturbances. Based on the resulting transformed system, the coupled Hamilton–Jacobi–Bellman (HJB) equations are established to obtain the constrained Nash equilibrium solution. Since the HJB equations are difficult to solve directly, a single critic neural network (NN) is constructed for each control input to approximate its optimal performance index function. It is proved theoretically that, under the influence of uncertain disturbances and time-varying safety constraints, the system states and neural network parameters remain uniformly ultimately bounded (UUB) under the proposed neural network approximation method. Finally, the effectiveness of the proposed method is verified by two nonlinear simulation examples.

1. Introduction

For the optimal control of safety-critical systems (e.g., autonomous vehicles, intelligent robots, etc.), safety is the basic requirement. Failure to ensure the safety of such systems may result in serious consequences, such as casualties, environmental pollution, and equipment damage. Safety control design refers to a control strategy that satisfies the safety specifications stipulated by the physical or environmental constraints of the system. The barrier function (BF) method [1,2] has proved to be an effective way to enforce system safety constraints or state constraints, and has attracted wide attention in recent years. The optimal control problem in the modern control domain usually relies on solving the complex Hamilton–Jacobi–Bellman (HJB) equation [3,4,5]. However, there is no effective mathematical method to solve the HJB equation in general, owing to its nonlinear nature. When designing controllers that are both safe and optimal, the proper combination of the safety and performance goals is an issue worth studying.
The dynamic programming (DP) method has proved to be a feasible and effective way to solve the HJB equation and derive the optimal solution. However, as the dimension of the variables increases, dynamic programming suffers from the "curse of dimensionality". Adaptive dynamic programming (ADP) [6,7,8,9,10] uses function approximation, such as neural network (NN) methods, to approximate the cost function in the HJB equation, and has proved to be a valid way to alleviate the curse of dimensionality. It is an emerging method combining artificial intelligence with the control field, and has become a hotspot of optimization research in recent years [11,12,13,14,15]. In [11,12,13], the authors studied the optimal control problem with disturbances by using the reinforcement learning (RL) method. For stochastic differential equation systems with coexisting parametric uncertainties and severe nonlinearities, Zhang et al. [14] studied the problem of event-triggered adaptive tracking control. Vamvoudakis et al. [15] proposed an online continuous-time learning algorithm based on policy iteration to learn the optimal control solutions of known nonlinear systems. In [16,17,18], the robust control problem was transformed into the optimal control problem of the nominal system by selecting an appropriate utility function. On the other hand, game theory [19,20,21,22,23,24] has become a powerful tool to optimize the coordination and cooperation of multiple controllers, and has proved useful in many practical control problems. In fact, many systems in the real world embody the idea of the non-zero-sum (NZS) game, where each controller of the system tries to minimize its own cost function. Many researchers translate the non-zero-sum game problem [25,26] into the problem of solving the coupled HJB equations, but solving the coupled HJB equations remains a great difficulty [27,28,29].
The development of adaptive dynamic programming and game theory has prompted many scholars to conduct relevant research. For robust trajectory-tracking multi-input control of uncertain nonlinear systems, Qin et al. [28] proposed a new adaptive online learning method to learn the Nash equilibrium solution. Song et al. [29] developed an off-policy integral reinforcement learning (IRL) method to effectively solve the NZS game control problem with unknown system dynamics. Ming et al. [30] proposed a single-network adaptive control method to obtain the optimal solution of the NZS differential game for autonomous nonlinear systems. All of the above methods can effectively solve the NZS game optimal control problem. However, few studies have considered the NZS game with disturbances and time-varying safety constraints, which motivates the present study.
For safety constraints, methods based on the barrier function and adaptive dynamic programming have received much attention in recent years. Marvi et al. [31] proposed a barrier-certified method to learn the safe optimal controller and ensure that the safety-critical system operates within its safety zone while providing optimal performance. By introducing the barrier function into the utility function, Xu et al. [32] augmented the utility function with a penalty mechanism and solved the state-constraint problem that is difficult to handle with the traditional ADP method. Liu et al. [33] proposed an adaptive control method to obtain the safe solution of nonlinear stochastic systems. In addition, the barrier function transformation method has been shown to transform a safety-critical system with safety constraints into a general unconstrained system in different scenarios, such as the zero-sum game [34], non-zero-sum game [35], tracking control [36], and event-triggered control [37]. However, without exception, the above results rely on the implicit assumption that the safety constraints are constant. In fact, the constant constraint is only a special case of time-varying constraints. Time-varying constraints also have a wide range of application scenarios in practice, such as a UAV or manipulator working in more complex environments.
For the constrained optimal control problem with time-varying safety constraints and uncertain disturbances, the constrained Nash equilibrium solutions are obtained by introducing a novel barrier function transformation and constructing coupled HJB equations. The novelty of this paper is reflected in the following points:
(1). A novel barrier function transformation method is proposed by introducing a smooth safety boundary function and a barrier function with a single variable. Compared to previous works [34,35], the proposed method no longer strictly requires time-invariant safety constraints and can deal with both time-invariant and time-varying safety constraints.
(2). In order to obtain the constrained optimal Nash equilibrium solution of the multi-input barrier-transformed system with uncertain disturbances, reasonable performance index functions and coupled HJB equations are designed for the nominal system by introducing a disturbance-related term. It is proved that the obtained constrained Nash equilibrium solution makes the safety-critical system asymptotically stable under uncertain disturbances and time-varying safety constraints.
(3). A single critic neural network is used to approximate the performance index function online to obtain the constrained control input. It is proved theoretically that the proposed barrier function transformation and neural network approximation method keep the system state and NN parameters uniformly ultimately bounded (UUB) while satisfying the time-varying safety constraints. In addition, two simulation examples verify the feasibility and effectiveness of the proposed method.
The remainder of this article is organized as follows: Problem formulation and barrier transformation are given in Section 2. Section 3 employs the coupled Hamilton–Jacobi–Bellman equation to obtain the approximate optimal solution online. Section 4 shows the efficiency of the proposed method by giving two simulation examples. Finally, conclusions are given in Section 5.

2. Problem Formulation and Barrier Transformation

Consider the following nonlinear multi-input safety-critical system:
$\dot{x} = f(x(t)) + g_1(x(t))u_1(t) + g_2(x(t))u_2(t) + k(x(t))d(\varphi(x(t))),$ (1)
where $x \in \mathcal{C} \subseteq \mathbb{R}^n$ is the system state, $u_1 \in U_1 \subseteq \mathbb{R}^{m_1}$ and $u_2 \in U_2 \subseteq \mathbb{R}^{m_2}$ are the control inputs, $d(\varphi(x(t))) \in \mathbb{R}^m$ is the uncertain disturbance, and $f(x) \in \mathbb{R}^n$, $g_1(x) \in \mathbb{R}^{n \times m_1}$, $g_2(x) \in \mathbb{R}^{n \times m_2}$, $k(x) \in \mathbb{R}^{n \times m}$. $\mathcal{C}$ denotes the set of admissible system states, and $U_1$, $U_2$ denote the sets of admissible system inputs. It is supposed that $f(x)$, $g_1(x)$, $g_2(x)$ are Lipschitz continuous with $f(0) = 0$. It is also assumed that the system (1) is stabilizable. The uncertain disturbance term $d$ satisfies $d^{T} d < \delta^{T} \delta$, where $\delta$ is a given function with $\delta(0) = 0$, and $\varphi(\cdot)$, satisfying $\varphi(0) = 0$, is a fixed function denoting the uncertainty.
Given the initial system state $x_0$, the purpose of this article is to find constrained control inputs $u_1$, $u_2$ that make the system state $x$ converge to the ideal value under the impact of the uncertain disturbances and time-varying safety constraints.
Remark 1.
In some papers, for example [31,35], the system state is constrained by constants, that is, $x \in (\zeta_a, \zeta_A)$, where $\zeta_a$, $\zeta_A$ represent the lower and upper bounds of the system state. We consider a more complex and interesting case where the system safety constraints are time-varying and can be mathematically expressed as $x \in (\zeta_a(t), \zeta_A(t))$, where $\zeta_a(t)$, $\zeta_A(t)$ are bounded smooth time-varying functions.
In order to satisfy the time-varying safety constraints, we define the following barrier function with a single independent variable $\tau$:

$b(z(\tau); \xi_a(\tau), \xi_A(\tau)) = \log \dfrac{\xi_A(\tau)\,(\xi_a(\tau) - z(\tau))}{\xi_a(\tau)\,(\xi_A(\tau) - z(\tau))},$ (2)

$b^{-1}(y(\tau); \xi_a(\tau), \xi_A(\tau)) = \dfrac{\xi_a(\tau)\,\xi_A(\tau)\,\big(e^{y(\tau)/2} - e^{-y(\tau)/2}\big)}{\xi_a(\tau)\,e^{y(\tau)/2} - \xi_A(\tau)\,e^{-y(\tau)/2}},$ (3)
where $\xi_a(\cdot): \mathbb{R} \to \mathbb{R}$, $\xi_A(\cdot): \mathbb{R} \to \mathbb{R}$, $z(\cdot): \mathbb{R} \to \mathbb{R}$, and $y(\cdot): \mathbb{R} \to \mathbb{R}$. The defined barrier function should satisfy the following assumption.
Assumption 1.
The proposed barrier function b ( · ) has the following characteristics:
(1) $\xi_a(\tau)$, $\xi_A(\tau)$ are two smooth functions satisfying $\xi_a(\tau) < 0 < \xi_A(\tau)$ for any $\tau > 0$;
(2) For any $\tau > 0$, the barrier function takes a finite value when $z(\tau) \in (\xi_a(\tau), \xi_A(\tau))$;
(3) For any $\tau > 0$, as $z(\tau)$ tends to the boundary of the prescribed region $(\xi_a(\tau), \xi_A(\tau))$, $b(\cdot)$ approaches infinity, i.e., $\lim_{z(\tau) \to \xi_a(\tau)^{+}} b(z(\tau); \xi_a(\tau), \xi_A(\tau)) = -\infty$ and $\lim_{z(\tau) \to \xi_A(\tau)^{-}} b(z(\tau); \xi_a(\tau), \xi_A(\tau)) = +\infty$;
(4) For any $\tau > 0$, the barrier function $b(\cdot)$ converges when $z(\tau)$ converges.
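As a concrete check of these properties, the barrier function $b$ and its inverse $b^{-1}$ defined above can be evaluated numerically. The following is a minimal Python sketch (our own code, not from the paper; the function names are ours) showing that the two maps are mutual inverses on $(\xi_a, \xi_A)$ and that $b$ blows up at the boundary:

```python
import math

def barrier(z, xa, xA):
    """b(z; xa, xA) = log( xA*(xa - z) / (xa*(xA - z)) ), finite on (xa, xA), with xa < 0 < xA."""
    return math.log(xA * (xa - z) / (xa * (xA - z)))

def barrier_inv(y, xa, xA):
    """b^{-1}(y; xa, xA): recovers z from y via the half-exponential form."""
    ep, em = math.exp(y / 2.0), math.exp(-y / 2.0)
    return xa * xA * (ep - em) / (xa * ep - xA * em)
```

Note that $b(0) = \log 1 = 0$, so the transformation maps the origin of the constrained state space to the origin of the transformed space; this is what later makes $F(0) = 0$ possible in Theorem 1.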
It is worth noting that the constraints $(\zeta_a(t), \zeta_A(t))$ can take the form of many common trajectories, including sinusoidal waveforms, damped sinusoids, ramps, and so on. In our study, we discuss a particularly useful form: we design the constraints $(\zeta_a(t), \zeta_A(t))$ as the following smooth transition functions satisfying the conditions below:
$\zeta_a(t) = \begin{bmatrix} \zeta_{a1}(t) & \cdots & \zeta_{an}(t) \end{bmatrix}^{T}, \quad \zeta_{ai}(t) = \begin{cases} l_1, & t < t_1 \\ l_1 + \vartheta_1 + \vartheta_1 \cos\!\big(\pi \frac{t_2 - t}{t_2 - t_1}\big), & t_1 \le t \le t_2 \\ l_2, & t > t_2 \end{cases}$ (4)

$\zeta_A(t) = \begin{bmatrix} \zeta_{A1}(t) & \cdots & \zeta_{An}(t) \end{bmatrix}^{T}, \quad \zeta_{Ai}(t) = \begin{cases} l_3, & t < t_3 \\ l_3 - \vartheta_2 - \vartheta_2 \cos\!\big(\pi \frac{t_4 - t}{t_4 - t_3}\big), & t_3 \le t \le t_4 \\ l_4, & t > t_4 \end{cases}$ (5)

where $i = 1, \ldots, n$, $l_1 < 0$, $l_2 < 0$, $l_3 > 0$, $l_4 > 0$, $l_1 + 2\vartheta_1 = l_2$, and $l_3 - 2\vartheta_2 = l_4$. Many practical applications impose similar constraints (e.g., a vehicle entering a narrow road from a wide road, a drone entering a tunnel, a robotic arm working in a narrow space, etc.).
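For illustration, the piecewise cosine transition in Formulas (4) and (5) can be written as one generic function that moves smoothly from a starting level to an ending level over $[t_{\mathrm{start}}, t_{\mathrm{end}}]$. The sketch below is our own code (not from the paper):

```python
import math

def zeta(t, l_start, l_end, t_start, t_end):
    """Smooth cosine transition: l_start for t < t_start, l_end for t > t_end, and
    l_start + theta + theta*cos(pi*(t_end - t)/(t_end - t_start)) in between,
    with theta = (l_end - l_start)/2.  Covers both the rising lower bound (4)
    (theta = +vartheta_1) and the falling upper bound (5) (theta = -vartheta_2)."""
    theta = (l_end - l_start) / 2.0
    if t < t_start:
        return l_start
    if t > t_end:
        return l_end
    return l_start + theta + theta * math.cos(math.pi * (t_end - t) / (t_end - t_start))
```

The transition is continuous and has zero slope at both endpoints (the sine factor of the derivative vanishes at $t_{\mathrm{start}}$ and $t_{\mathrm{end}}$), so the resulting constraint trajectory is smooth.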
Remark 2.
A reasonable choice of parameters, such as $l_1 = l_2$ and $l_3 = l_4$, reduces the smooth transition functions to the constant case. In other words, the proposed method can also impose time-invariant safety constraints on the system state when the parameters are selected properly. In addition, the defined smooth transition function can be extended to scenarios with more complex safety requirements, such as more frequent changes of the constraints and different types of constraints.
Considering the system (1) with uncertain disturbances and time-varying safety constraints, we use the proposed barrier function and smooth transition function to convert the multi-input safety-critical system with uncertain disturbances and time-varying safety constraints into a transformed system with uncertain disturbances only. We define
$s_i = b(x_i(t); \zeta_{ai}(t), \zeta_{Ai}(t)),$ (6)

$x_i = b^{-1}(s_i(t); \zeta_{ai}(t), \zeta_{Ai}(t)).$ (7)
According to the chain rule and Equations (6) and (7), the transformed system dynamics $\dot{s}_i$ can be derived as

$\dot{s}_i = \dfrac{\dot{x}_i}{\,d b^{-1}(s_i(t); \zeta_{ai}(t), \zeta_{Ai}(t))/d s_i\,} = \big( f_i(x(t)) + g_{1i}(x(t))u_1(t) + g_{2i}(x(t))u_2(t) + k_i(x(t))d(\varphi(x(t))) \big)\, \dfrac{\zeta_{ai}^2(t)e^{s_i} - 2\zeta_{ai}(t)\zeta_{Ai}(t) + \zeta_{Ai}^2(t)e^{-s_i}}{\zeta_{Ai}(t)\zeta_{ai}^2(t) - \zeta_{ai}(t)\zeta_{Ai}^2(t)} = F_i(s(t)) + G_{1i}(s(t))u_1(t) + G_{2i}(s(t))u_2(t) + K_i(s(t))d(\varphi(b^{-1}(s(t)))),$ (8)
where

$F_i(s(t)) = \dfrac{\zeta_{ai}^2(t)e^{s_i} - 2\zeta_{ai}(t)\zeta_{Ai}(t) + \zeta_{Ai}^2(t)e^{-s_i}}{\zeta_{Ai}(t)\zeta_{ai}^2(t) - \zeta_{ai}(t)\zeta_{Ai}^2(t)}\, f_i([b^{-1}(s_1), \ldots, b^{-1}(s_n)]),$
$G_{1i}(s(t)) = \dfrac{\zeta_{ai}^2(t)e^{s_i} - 2\zeta_{ai}(t)\zeta_{Ai}(t) + \zeta_{Ai}^2(t)e^{-s_i}}{\zeta_{Ai}(t)\zeta_{ai}^2(t) - \zeta_{ai}(t)\zeta_{Ai}^2(t)}\, g_{1i}([b^{-1}(s_1), \ldots, b^{-1}(s_n)]),$
$G_{2i}(s(t)) = \dfrac{\zeta_{ai}^2(t)e^{s_i} - 2\zeta_{ai}(t)\zeta_{Ai}(t) + \zeta_{Ai}^2(t)e^{-s_i}}{\zeta_{Ai}(t)\zeta_{ai}^2(t) - \zeta_{ai}(t)\zeta_{Ai}^2(t)}\, g_{2i}([b^{-1}(s_1), \ldots, b^{-1}(s_n)]),$
$K_i(s(t)) = \dfrac{\zeta_{ai}^2(t)e^{s_i} - 2\zeta_{ai}(t)\zeta_{Ai}(t) + \zeta_{Ai}^2(t)e^{-s_i}}{\zeta_{Ai}(t)\zeta_{ai}^2(t) - \zeta_{ai}(t)\zeta_{Ai}^2(t)}\, k_i([b^{-1}(s_1), \ldots, b^{-1}(s_n)]).$
Based on Formula (8), the transformed system $s = [s_1; \cdots; s_n]$ can be written as

$\dot{s} = F(s(t)) + G_1(s(t))u_1(t) + G_2(s(t))u_2(t) + K(s(t))d(\varphi(b^{-1}(s(t)))),$ (9)

where $F(s) = [F_1(s); \cdots; F_n(s)]$, $G_1(s) = [G_{11}(s); \cdots; G_{1n}(s)]$, $G_2(s) = [G_{21}(s); \cdots; G_{2n}(s)]$, and $K(s) = [K_1(s); \cdots; K_n(s)]$. For convenience, we write $d$ for $d(\varphi(b^{-1}(s(t))))$ and $s$ for $s(t)$ in the following.
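The scale factor multiplying the original dynamics in (8) is exactly the reciprocal of $d b^{-1}/d s_i$, which is the chain-rule step used to pass from $\dot{x}_i$ to $\dot{s}_i$. As a sanity check (our own sketch, taking the bounds $\xi_a$, $\xi_A$ as constants at a frozen time instant for simplicity), the following code verifies this identity and compares the analytic derivative of $b^{-1}$ against a finite difference:

```python
import math

def dbinv(s, xa, xA):
    """Analytic derivative d b^{-1}/ds = xa*xA*(xa - xA) / (xa*e^{s/2} - xA*e^{-s/2})^2."""
    D = xa * math.exp(s / 2.0) - xA * math.exp(-s / 2.0)
    return xa * xA * (xa - xA) / D**2

def T_factor(s, xa, xA):
    """Scale factor from (8): (xa^2*e^s - 2*xa*xA + xA^2*e^{-s}) / (xA*xa^2 - xa*xA^2)."""
    num = xa**2 * math.exp(s) - 2.0 * xa * xA + xA**2 * math.exp(-s)
    return num / (xA * xa**2 - xa * xA**2)
```

Expanding the square in `dbinv` shows $T(s)\cdot d b^{-1}/ds = 1$ identically, so multiplying $\dot{x}_i$ by the factor in (8) is equivalent to dividing by $d b^{-1}/d s_i$.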
Through the proposed barrier transformation, the constrained optimal control problem for the safety-critical system (1) with uncertain disturbances and time-varying safety constraints has been converted into the constrained optimal control problem for the transformed system (9) with uncertain disturbances only. Before proceeding, we prove the following properties of the transformed system (9).
Theorem 1.
Based on the proposed barrier transformation (6) and (7), the transformed system (9) obtained from the system (1) satisfies the following properties:
(1) $F(s)$ is Lipschitz with $F(0) = 0$, and satisfies $\|F(s)\| \le \lambda_f \|s\|$, where $\lambda_f$ is a constant;
(2) $G_1(s)$, $G_2(s)$ are bounded: there exist constants $\lambda_{1g}$, $\lambda_{2g}$ such that $\|G_1(s)\| \le \lambda_{1g}$, $\|G_2(s)\| \le \lambda_{2g}$. Moreover, the transformed system (9) has zero-state observability.
Proof of Theorem 1. 
(1) Based on Equation (8), we can obtain

$F_i(s) = f_i(x) T_i(s),$ (10)

where $T_i(s) = \dfrac{\zeta_{ai}^2(t)e^{s_i} - 2\zeta_{ai}(t)\zeta_{Ai}(t) + \zeta_{Ai}^2(t)e^{-s_i}}{\zeta_{Ai}(t)\zeta_{ai}^2(t) - \zeta_{ai}(t)\zeta_{Ai}^2(t)}$ and $F_i(0) = f_i(0) = 0$. Based on Assumption 1, as long as $x \in \mathcal{C}$, the transformed system state $s$ is bounded, and hence $T_i(s)$ is bounded. We can derive

$\|F_i(s)\| = \|f_i(x) T_i(s)\| \le \|f_i(x)\| \lambda_\zeta,$ (11)

where $\lambda_\zeta$ represents the upper bound of $T_i(s)$. Based on the assumptions about the system (1), we can obtain

$\|F_i(s_1) - F_i(s_2)\| = \|(f_i(x_1) - f_i(x_2)) T_i(s)\| \le \|x_1 - x_2\| k_{L1} \lambda_\zeta,$ (12)

where $x_1, x_2 \in \mathcal{C}$ and $k_{L1}$ is the Lipschitz constant of $f_i(x)$. Based on the property of the barrier function, $s_1$ and $s_2$ are bounded as long as $x_1, x_2 \in \mathcal{C}$. Hence, for any $x_1, x_2 \in \mathcal{C}$, there is always a constant $k_{L2}$ such that $\|F_i(s_1) - F_i(s_2)\| \le \|s_1 - s_2\| k_{L2}$. Considering the fact that $F(s) = [F_1(s); \cdots; F_n(s)]$, we deduce that

$\|F(s_1) - F(s_2)\| \le \|s_1 - s_2\| k_{L3},$ (13)

where $k_{L3}$ is the Lipschitz constant of $F(s)$. Based on the Lipschitz condition [38], $F(s)$ is Lipschitz continuous. Based on the boundedness of $T_i(s)$ and the assumptions about the system (1), every term in $F_i(s)$ is bounded for $x \in \mathcal{C}$. Therefore, $F(s)$ is also bounded, and there is a constant $\lambda_f$ such that $\|F(s)\| \le \lambda_f \|s\|$.
(2) Based on the boundedness of $T_i(s)$ and Equation (8), $G_{1i}(s)$, $G_{2i}(s)$ are bounded for $x \in \mathcal{C}$. Considering the fact that $G_1(s) = [G_{11}(s); \cdots; G_{1n}(s)]$ and $G_2(s) = [G_{21}(s); \cdots; G_{2n}(s)]$, there are constants $\lambda_{1g}$ and $\lambda_{2g}$ such that $\|G_1(s)\| \le \lambda_{1g}$, $\|G_2(s)\| \le \lambda_{2g}$. Given the initial system state $x_0$, the initial state of the transformed system (9) can be obtained from Equation (6), which establishes the zero-state observability of the transformed system (9).
This completes the proof. □
Based on the transformed system, the nominal system of (9) can be defined as

$\dot{s} = F(s) + G_1(s)u_1 + G_2(s)u_2.$ (14)
The performance index function related to the design of $u_1$ can be defined as

$V_1(s, u_1, u_2) = \int_0^{\infty} \big( s^T Q_1 s + \Phi_1(u_1, \lambda_1) + \Phi_2(u_2, \lambda_2) + \Gamma_1(s, \nabla V_1) \big)\, dt,$ (15)

where $Q_1$, $R_{11}$, $R_{12}$ are positive definite matrices, $\bar{R}_{11} = [r_1, \ldots, r_{m_1}] \in \mathbb{R}^{1 \times m_1}$, $\bar{R}_{12} = [r_1, \ldots, r_{m_2}] \in \mathbb{R}^{1 \times m_2}$, $\nabla V_1$ represents the partial derivative of the performance index function $V_1$ with respect to $s$, $\Phi_1(u_1, \lambda_1) = 2\lambda_1 (\tanh^{-1}(u_1/\lambda_1))^T R_{11} u_1 + \lambda_1^2 \bar{R}_{11} \ln(1 - u_1^2/\lambda_1^2)$ is the nonquadratic penalty function of $u_1$, $\Phi_2(u_2, \lambda_2) = 2\lambda_2 (\tanh^{-1}(u_2/\lambda_2))^T R_{12} u_2 + \lambda_2^2 \bar{R}_{12} \ln(1 - u_2^2/\lambda_2^2)$ is the nonquadratic penalty function of $u_2$, and $\Gamma_1(s, \nabla V_1(s)) = \delta^T \delta + \frac{1}{4} \nabla V_1^T(s) K(s) K^T(s) \nabla V_1(s)$ represents the disturbance-related term.
The performance index function related to the design of $u_2$ is defined as

$V_2(s, u_1, u_2) = \int_0^{\infty} \big( s^T Q_2 s + \Phi_3(u_1, \lambda_1) + \Phi_4(u_2, \lambda_2) + \Gamma_2(s, \nabla V_2) \big)\, dt,$ (16)

where $Q_2$, $R_{21}$, $R_{22}$ are positive definite matrices, $\bar{R}_{21} = [r_1, \ldots, r_{m_1}] \in \mathbb{R}^{1 \times m_1}$, $\bar{R}_{22} = [r_1, \ldots, r_{m_2}] \in \mathbb{R}^{1 \times m_2}$, $\nabla V_2$ represents the partial derivative of the performance index function $V_2$, $\Phi_3(u_1, \lambda_1) = 2\lambda_1 (\tanh^{-1}(u_1/\lambda_1))^T R_{21} u_1 + \lambda_1^2 \bar{R}_{21} \ln(1 - u_1^2/\lambda_1^2)$ is the nonquadratic penalty function of $u_1$, $\Phi_4(u_2, \lambda_2) = 2\lambda_2 (\tanh^{-1}(u_2/\lambda_2))^T R_{22} u_2 + \lambda_2^2 \bar{R}_{22} \ln(1 - u_2^2/\lambda_2^2)$ is the nonquadratic penalty function of $u_2$, and $\Gamma_2(s, \nabla V_2(s)) = \delta^T \delta + \frac{1}{4} \nabla V_2^T(s) K(s) K^T(s) \nabla V_2(s)$ represents the disturbance-related term.
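The nonquadratic penalty $\Phi$ is what keeps the inputs bounded: it is zero at $u = 0$, positive elsewhere, and grows steeply as $|u| \to \lambda$. A scalar-input Python sketch (our own code, not from the paper) makes this concrete:

```python
import math

def phi_penalty(u, lam, r):
    """Scalar nonquadratic penalty from (15):
    Phi(u) = 2*lam*atanh(u/lam)*r*u + lam^2 * r * log(1 - u^2/lam^2), valid for |u| < lam."""
    return 2.0 * lam * math.atanh(u / lam) * r * u + lam**2 * r * math.log(1.0 - u**2 / lam**2)
```

Differentiating term by term, the derivative with respect to $u$ collapses to $2\lambda r \tanh^{-1}(u/\lambda)$; this is precisely how the stationarity condition later yields the $\tanh$-saturated control laws (22) and (23).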
Definition 1.
The control strategy set ( u 1 * , u 2 * ) is a Nash equilibrium control strategy set if
$V_1(u_1^*, u_2^*) \le V_1(u_1, u_2^*), \qquad V_2(u_1^*, u_2^*) \le V_2(u_1^*, u_2)$ (17)
hold for any admissible control policies u 1 and u 2 .
Based on the performance index functions (15) and (16), the Hamiltonian functions associated with the control inputs $u_1$ and $u_2$ are defined as
$H_1(s, u_1, u_2) = s^T Q_1 s + \Phi_1(u_1, \lambda_1) + \Phi_2(u_2, \lambda_2) + \Gamma_1(s, \nabla V_1) + \nabla V_1^T \big( F(s) + G_1(s)u_1 + G_2(s)u_2 \big),$ (18)

$H_2(s, u_1, u_2) = s^T Q_2 s + \Phi_3(u_1, \lambda_1) + \Phi_4(u_2, \lambda_2) + \Gamma_2(s, \nabla V_2) + \nabla V_2^T \big( F(s) + G_1(s)u_1 + G_2(s)u_2 \big).$ (19)
We define the optimal performance index functions of $u_1$, $u_2$ as

$V_1^*(s, u_1^*, u_2) = \min_{u_1 \in U_1} \int_0^{\infty} \big( s^T Q_1 s + \Phi_1(u_1, \lambda_1) + \Phi_2(u_2, \lambda_2) + \Gamma_1(s, \nabla V_1) \big)\, dt,$ (20)

$V_2^*(s, u_1, u_2^*) = \min_{u_2 \in U_2} \int_0^{\infty} \big( s^T Q_2 s + \Phi_3(u_1, \lambda_1) + \Phi_4(u_2, \lambda_2) + \Gamma_2(s, \nabla V_2) \big)\, dt.$ (21)
Considering the nominal system (14) and Formulas (15) and (16), the constrained optimal control strategies $u_1^*$ and $u_2^*$ can be obtained from the stationarity condition of optimization:
$u_1^* = -\lambda_1 \tanh\!\Big( \tfrac{1}{2\lambda_1} R_{11}^{-1} G_1^T(s) \nabla V_1^*(s) \Big),$ (22)

$u_2^* = -\lambda_2 \tanh\!\Big( \tfrac{1}{2\lambda_2} R_{22}^{-1} G_2^T(s) \nabla V_2^*(s) \Big),$ (23)

where $\nabla V_1^*(s)$ and $\nabla V_2^*(s)$ are obtained by solving the following coupled HJB equations:

$s^T Q_1 s + 2\lambda_1 (\tanh^{-1}(u_1^*/\lambda_1))^T R_{11} u_1^* + \lambda_1^2 \bar{R}_{11} \ln(1 - u_1^{*2}/\lambda_1^2) + 2\lambda_2 (\tanh^{-1}(u_2^*/\lambda_2))^T R_{12} u_2^* + \lambda_2^2 \bar{R}_{12} \ln(1 - u_2^{*2}/\lambda_2^2) + \Gamma_1(s, \nabla V_1^*) + \nabla V_1^{*T} \Big( F(s) - G_1(s)\lambda_1 \tanh\!\big( \tfrac{1}{2\lambda_1} R_{11}^{-1} G_1^T(s) \nabla V_1^*(s) \big) - G_2(s)\lambda_2 \tanh\!\big( \tfrac{1}{2\lambda_2} R_{22}^{-1} G_2^T(s) \nabla V_2^*(s) \big) \Big) = 0,$ (24)

$s^T Q_2 s + 2\lambda_1 (\tanh^{-1}(u_1^*/\lambda_1))^T R_{21} u_1^* + \lambda_1^2 \bar{R}_{21} \ln(1 - u_1^{*2}/\lambda_1^2) + 2\lambda_2 (\tanh^{-1}(u_2^*/\lambda_2))^T R_{22} u_2^* + \lambda_2^2 \bar{R}_{22} \ln(1 - u_2^{*2}/\lambda_2^2) + \Gamma_2(s, \nabla V_2^*) + \nabla V_2^{*T} \Big( F(s) - G_1(s)\lambda_1 \tanh\!\big( \tfrac{1}{2\lambda_1} R_{11}^{-1} G_1^T(s) \nabla V_1^*(s) \big) - G_2(s)\lambda_2 \tanh\!\big( \tfrac{1}{2\lambda_2} R_{22}^{-1} G_2^T(s) \nabla V_2^*(s) \big) \Big) = 0.$ (25)
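The $\tanh$ form of (22) and (23) guarantees constraint satisfaction by construction: whatever value the gradient term takes, the output stays strictly inside $(-\lambda, \lambda)$. A scalar-input sketch (our own code, not from the paper):

```python
import math

def u_constrained(grad_V, G, lam, R_inv):
    """Scalar-input case of (22): u = -lam * tanh( (1/(2*lam)) * R^{-1} * G^T * grad_V )."""
    return -lam * math.tanh(R_inv * G * grad_V / (2.0 * lam))
```

Because $|\tanh(\cdot)| < 1$, we get $|u| < \lambda$ for every finite gradient value, so no separate clipping or projection step is needed to enforce the input constraint.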
Lemma 1.
Assume that $V_1(s)$, $V_2(s)$ are continuously differentiable functions satisfying $V_1(s) > 0$, $V_2(s) > 0$ for all $s \ne 0$ and $V_1(0) = V_2(0) = 0$, and that there exist two bounded functions $\Gamma_1(s)$, $\Gamma_2(s)$ satisfying $\Gamma_1(s) \ge 0$, $\Gamma_2(s) \ge 0$, and two control laws $u_1$, $u_2$, such that

$(a)\ \nabla V_j^T \bar{T}(s, u_1, u_2, d) \le \nabla V_j^T T(s, u_1, u_2) + \Gamma_j(s), \qquad (b)\ \nabla V_j^T T(s, u_1, u_2) + \Gamma_j(s) < 0, \quad \forall s \ne 0,\ j = 1, 2,$ (26)

where $\bar{T}(s, u_1, u_2, d) = F(s) + G_1(s)u_1 + G_2(s)u_2 + K(s)d$ and $T(s, u_1, u_2) = F(s) + G_1(s)u_1 + G_2(s)u_2$. Then, the transformed system (9) achieves asymptotic stability under the control laws $u_1$ and $u_2$.
Proof of Lemma 1. 
We can use the chain rule to obtain
$\dot{V}_1(s(t)) = \dfrac{d V_1(s(t))}{dt} = \nabla V_1^T \bar{T}(s, u_1, u_2, d).$ (27)
According to Formula (26), we obtain $\dot{V}_1(s(t)) < 0$ for any $s \ne 0$. Hence, $V_1(\cdot)$ is a Lyapunov function for the transformed system (9), which proves that the transformed system is asymptotically stable. As long as $V_1(\cdot)$ satisfies the conditions of Formula (26), the control law $u_1$ realizes the asymptotic stability of the transformed system. Similarly, the control law $u_2$ realizes the asymptotic stability of the transformed system. □
Lemma 2.
Under Assumption 1, if the constrained optimal control problem of the transformed system (9) is solved by the constrained optimal control laws $u_1$, $u_2$, then the system (1) satisfies the time-varying safety constraints $(\zeta_a(t), \zeta_A(t))$, provided that the initial state $x_0$ of the system (1) satisfies the time-varying safety constraints.
Proof of Lemma 2. 
Based on Lemma 1, one can obtain $\dot{V}_1(s(t)) \le 0$ and $\dot{V}_2(s(t)) \le 0$, such that

$V_1(s(t)) \le V_1(s(0)), \qquad V_2(s(t)) \le V_2(s(0)), \quad \forall t \ge 0.$ (28)

According to the properties of the barrier function in Assumption 1, the performance index functions $V_1(s(0))$ and $V_2(s(0))$ are finite when the initial value $x_0$ of the safety-critical system (1) satisfies the time-varying safety constraints $(\zeta_a(t), \zeta_A(t))$ and $V_1(\cdot)$, $V_2(\cdot)$ satisfy the conditions of Formula (26). It follows that the performance index functions $V_1(s(t))$ and $V_2(s(t))$ are finite for all $t$. Therefore, based on Assumption 1, we obtain

$x(t) \in (\zeta_a(t), \zeta_A(t)), \quad \forall t > 0.$ (29)
This proof is completed. □
According to Lemmas 1 and 2, the constrained optimal control laws (22) and (23) can make the safety-critical system (1) with uncertain disturbances and time-varying safety constraints asymptotically stable, based on the proposed barrier transformation and disturbance-related term. Based on (22) and (23), we only need to solve the proposed coupled HJB Equations (24) and (25) for the optimal performance index functions in order to obtain the constrained optimal control solution. However, Equations (24) and (25) are often difficult or impossible to solve analytically due to their inherently nonlinear nature. In view of this problem, an approximate structure based on NNs is proposed to learn the solutions of the coupled HJB equations online.

3. Approximate Optimal Solution of Coupled Hamilton–Jacobi–Bellman Equations

In this section, an online approximation method is proposed by constructing a single critic network. Based on the universal approximation property of NN, the optimal performance index functions (20) and (21) and their partial derivatives can be approximated as follows:
$V_j^*(s) = W_j^{*T} \phi_j(s) + \varepsilon_j(s), \qquad \nabla V_j^*(s) = \nabla \phi_j^T(s) W_j^* + \nabla \varepsilon_j(s), \quad j = 1, 2,$ (30)

where $W_j^* = [\omega_{j1}, \omega_{j2}, \ldots, \omega_{jL}]^T \in \mathbb{R}^L$ represents the ideal weight vector, $\phi_j(s) = [\varphi_{j1}, \varphi_{j2}, \ldots, \varphi_{jL}]^T \in \mathbb{R}^L$ represents the neural network activation function vector, $\nabla \phi_j(s)$ represents the partial derivative of $\phi_j(s)$, $L$ represents the number of hidden-layer neurons, $\varepsilon_j(s)$ represents the NN approximation error, and $\nabla \varepsilon_j(s)$ represents the partial derivative of $\varepsilon_j(s)$.
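A common concrete choice (the paper leaves the basis unspecified) is a quadratic polynomial basis. The sketch below, our own code, shows the critic approximation $\hat{V}(s) = \hat{W}^T \phi(s)$ and its gradient $\nabla \phi^T(s)\hat{W}$ for a two-dimensional transformed state:

```python
import numpy as np

def phi(s):
    """Example basis phi(s) = [s1^2, s1*s2, s2^2] (an assumed choice, not from the paper)."""
    s1, s2 = s
    return np.array([s1**2, s1 * s2, s2**2])

def grad_phi(s):
    """Jacobian of phi with respect to s, shape (L, n) = (3, 2)."""
    s1, s2 = s
    return np.array([[2.0 * s1, 0.0],
                     [s2, s1],
                     [0.0, 2.0 * s2]])

def V_hat(s, W):
    """Critic value estimate W^T phi(s)."""
    return W @ phi(s)

def grad_V_hat(s, W):
    """Gradient estimate grad_phi(s)^T W, the quantity fed into the control laws."""
    return grad_phi(s).T @ W
```

With this structure, tuning the single weight vector $\hat{W}$ simultaneously adjusts both the value estimate and its gradient, which is what makes the single-critic architecture possible.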
Assumption 2.
It is assumed that the ideal weights $W_j^*$ are bounded by constants, i.e., $\|W_j^*\| \le \lambda_{Wj}$, the neural network approximation residuals satisfy $\|\varepsilon_j\| \le \lambda_{\varepsilon j}$, $\|\nabla \varepsilon_j\| \le \lambda_{d\varepsilon j}$, and the neural network activation functions satisfy $\|\phi_j\| \le \lambda_{\phi j}$, $\|\nabla \phi_j\| \le \lambda_{d\phi j}$.
Based on Formula (30), the Bellman approximation errors of the neural network approximation can be expressed as
$H_1(s, W_1^*, W_2^*) = \varepsilon_{B1}, \qquad H_2(s, W_1^*, W_2^*) = \varepsilon_{B2}.$ (31)
Remark 3.
The Bellman approximation errors $\varepsilon_{B1}$ and $\varepsilon_{B2}$ converge to 0 as the number of hidden neurons $L \to \infty$. When $L$ is a fixed constant, the Bellman approximation errors are bounded, i.e., $\|\varepsilon_{Bj}(s)\| < \varepsilon_{Bjh}$. In the later proof, we will consider the influence of the Bellman approximation errors $\varepsilon_{B1}$ and $\varepsilon_{B2}$.
Since the ideal weights $W_1^*$ and $W_2^*$ are unknown, we use their estimates to construct the critic neural network:
$\hat{V}_j(s) = \hat{W}_j^T \phi_j(s), \qquad \nabla \hat{V}_j(s) = \nabla \phi_j^T(s) \hat{W}_j.$ (32)
According to Formulas (22), (23), and (32), the approximate optimal control strategies are
$\hat{u}_1 = -\lambda_1 \tanh\!\Big( \tfrac{1}{2\lambda_1} R_{11}^{-1} G_1^T(s) \nabla \phi_1^T(s) \hat{W}_1 \Big),$ (33)

$\hat{u}_2 = -\lambda_2 \tanh\!\Big( \tfrac{1}{2\lambda_2} R_{22}^{-1} G_2^T(s) \nabla \phi_2^T(s) \hat{W}_2 \Big).$ (34)
Substituting (32)–(34) into (18) and (19), the approximate Hamiltonian functions can be obtained:
$H_1(s, \hat{W}_1, \hat{W}_2) = s^T Q_1 s + 2\lambda_1 (\tanh^{-1}(\hat{u}_1/\lambda_1))^T R_{11} \hat{u}_1 + \lambda_1^2 \bar{R}_{11} \ln(1 - \hat{u}_1^2/\lambda_1^2) + 2\lambda_2 (\tanh^{-1}(\hat{u}_2/\lambda_2))^T R_{12} \hat{u}_2 + \lambda_2^2 \bar{R}_{12} \ln(1 - \hat{u}_2^2/\lambda_2^2) + \Gamma_1(s, \nabla \hat{V}_1) + \nabla \hat{V}_1^T \Big( F(s) - G_1(s)\lambda_1 \tanh\!\big( \tfrac{1}{2\lambda_1} R_{11}^{-1} G_1^T(s) \nabla \hat{V}_1(s) \big) - G_2(s)\lambda_2 \tanh\!\big( \tfrac{1}{2\lambda_2} R_{22}^{-1} G_2^T(s) \nabla \hat{V}_2(s) \big) \Big) = e_1,$ (35)

$H_2(s, \hat{W}_1, \hat{W}_2) = s^T Q_2 s + 2\lambda_1 (\tanh^{-1}(\hat{u}_1/\lambda_1))^T R_{21} \hat{u}_1 + \lambda_1^2 \bar{R}_{21} \ln(1 - \hat{u}_1^2/\lambda_1^2) + 2\lambda_2 (\tanh^{-1}(\hat{u}_2/\lambda_2))^T R_{22} \hat{u}_2 + \lambda_2^2 \bar{R}_{22} \ln(1 - \hat{u}_2^2/\lambda_2^2) + \Gamma_2(s, \nabla \hat{V}_2) + \nabla \hat{V}_2^T \Big( F(s) - G_1(s)\lambda_1 \tanh\!\big( \tfrac{1}{2\lambda_1} R_{11}^{-1} G_1^T(s) \nabla \hat{V}_1(s) \big) - G_2(s)\lambda_2 \tanh\!\big( \tfrac{1}{2\lambda_2} R_{22}^{-1} G_2^T(s) \nabla \hat{V}_2(s) \big) \Big) = e_2.$ (36)
The estimates $\hat{W}_1$ and $\hat{W}_2$ need to be adjusted to minimize the squared residual error $E = e_1^T e_1/2 + e_2^T e_2/2$. In general, online adaptive learning algorithms require a persistence-of-excitation (PE) condition to achieve convergence. In order to satisfy this condition, we redefine the squared residual error as $E = \frac{1}{2}\big( e_1^T e_1 + \sum_{l=1}^{N} e_{1l}^T e_{1l} + e_2^T e_2 + \sum_{l=1}^{N} e_{2l}^T e_{2l} \big)$, where $e_{1l}$, $e_{2l}$ are evaluated on stored past data at times $t_l < t$. We choose the normalized gradient descent algorithm as the tuning law of the estimates to minimize the squared residual error:
$\dot{\hat{W}}_1 = -\alpha_1 \frac{\sigma_1(t)}{\bar{\sigma}_1(t)} \big[ \sigma_1^T(t) \hat{W}_1 + r_1(s, \hat{u}_1, \hat{u}_2, \Gamma_1) \big]^T - \alpha_1 \sum_{l=1}^{N} \frac{\sigma_1(t_l)}{\bar{\sigma}_1(t_l)} \big[ \sigma_1^T(t_l) \hat{W}_1 + r_1(s(t_l), \hat{u}_1(t_l), \hat{u}_2(t_l), \Gamma_1(t_l)) \big]^T,$ (37)

$\dot{\hat{W}}_2 = -\alpha_2 \frac{\sigma_2(t)}{\bar{\sigma}_2(t)} \big[ \sigma_2^T(t) \hat{W}_2 + r_2(s, \hat{u}_1, \hat{u}_2, \Gamma_2) \big]^T - \alpha_2 \sum_{l=1}^{N} \frac{\sigma_2(t_l)}{\bar{\sigma}_2(t_l)} \big[ \sigma_2^T(t_l) \hat{W}_2 + r_2(s(t_l), \hat{u}_1(t_l), \hat{u}_2(t_l), \Gamma_2(t_l)) \big]^T,$ (38)

where $\alpha_1 > 0$ and $\alpha_2 > 0$ are learning rates that determine the convergence speed of the estimates, $\sigma_1(t) = \nabla \phi_1(s) (F(s) + G_1(s)\hat{u}_1 + G_2(s)\hat{u}_2)$, $\bar{\sigma}_1(t) = (\sigma_1^T(t) \sigma_1(t) + 1)^2$, $\sigma_2(t) = \nabla \phi_2(s) (F(s) + G_1(s)\hat{u}_1 + G_2(s)\hat{u}_2)$, $\bar{\sigma}_2(t) = (\sigma_2^T(t) \sigma_2(t) + 1)^2$, $r_1(s, \hat{u}_1, \hat{u}_2, \Gamma_1) = s^T Q_1 s + \Phi_1(\hat{u}_1, \lambda_1) + \Phi_2(\hat{u}_2, \lambda_2) + \Gamma_1(s, \nabla \hat{V}_1)$, $r_2(s, \hat{u}_1, \hat{u}_2, \Gamma_2) = s^T Q_2 s + \Phi_3(\hat{u}_1, \lambda_1) + \Phi_4(\hat{u}_2, \lambda_2) + \Gamma_2(s, \nabla \hat{V}_2)$, and $s(t_l)$, $\hat{u}_1(t_l)$, $\hat{u}_2(t_l)$, $\sigma_1(t_l)$, $\bar{\sigma}_1(t_l)$, $\sigma_2(t_l)$, $\bar{\sigma}_2(t_l)$, $\Gamma_1(t_l)$, $\Gamma_2(t_l)$ are all obtained from the stored past data.
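One evaluation of the right-hand side of tuning law (37) can be sketched as follows (our own code; `replay` holds the stored $(\sigma_1(t_l), r_1(t_l))$ pairs used to relax the PE requirement):

```python
import numpy as np

def critic_weight_derivative(W_hat, sigma, r, alpha, replay):
    """Right-hand side of tuning law (37):
    dW/dt = -alpha * sigma/(sigma^T sigma + 1)^2 * (sigma^T W_hat + r)
            - alpha * (same expression summed over stored past samples)."""
    def term(sig, rr):
        # normalized gradient of half the squared residual e = sig^T W_hat + rr
        return sig / (sig @ sig + 1.0)**2 * (sig @ W_hat + rr)
    dW = -alpha * term(sigma, r)
    for sig_l, r_l in replay:
        dW = dW - alpha * term(sig_l, r_l)
    return dW
```

The residual $e = \sigma^T \hat{W} + r$ is driven toward zero: a small Euler step along `dW` reduces $e^2$ whenever $\sigma \ne 0$, and the replayed samples keep the update informative even when the current $\sigma(t)$ is momentarily unexciting.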
The weight estimation errors $\tilde{W}_1$ and $\tilde{W}_2$ can be defined as

$\tilde{W}_1 = W_1^* - \hat{W}_1, \qquad \tilde{W}_2 = W_2^* - \hat{W}_2.$ (39)
Based on (37)–(39), we have
$\dot{\tilde{W}}_1 = \alpha_1 \frac{\sigma_1(t)}{\bar{\sigma}_1(t)} \big[ \sigma_1^T(t) \hat{W}_1 + r_1(s, \hat{u}_1, \hat{u}_2, \Gamma_1) \big]^T + \alpha_1 \sum_{l=1}^{N} \frac{\sigma_1(t_l)}{\bar{\sigma}_1(t_l)} \big[ \sigma_1^T(t_l) \hat{W}_1 + r_1(s(t_l), \hat{u}_1(t_l), \hat{u}_2(t_l), \Gamma_1(t_l)) \big]^T,$ (40)

$\dot{\tilde{W}}_2 = \alpha_2 \frac{\sigma_2(t)}{\bar{\sigma}_2(t)} \big[ \sigma_2^T(t) \hat{W}_2 + r_2(s, \hat{u}_1, \hat{u}_2, \Gamma_2) \big]^T + \alpha_2 \sum_{l=1}^{N} \frac{\sigma_2(t_l)}{\bar{\sigma}_2(t_l)} \big[ \sigma_2^T(t_l) \hat{W}_2 + r_2(s(t_l), \hat{u}_1(t_l), \hat{u}_2(t_l), \Gamma_2(t_l)) \big]^T.$ (41)
Combining the above components, the structure diagram of the proposed multi-input safety-critical control scheme is shown in Figure 1.
Theorem 2.
Consider the system (9), the approximate optimal control strategies (33) and (34), and the weight tuning laws (37) and (38). Suppose that $\nabla \phi_1$, $\nabla \phi_2$, $\varepsilon_1$, $\nabla \varepsilon_1$, $\varepsilon_2$, $\nabla \varepsilon_2$, $\varepsilon_{B1}$, $\varepsilon_{B2}$ are all uniformly bounded, and that Assumptions 1 and 2 hold. Then, the system state $s$ and the neural network weight errors $\tilde{W}_1$, $\tilde{W}_2$ are guaranteed to be UUB under the time-varying safety constraints and uncertain disturbances.
Proof of Theorem 2. 
See the Appendix A. □
Remark 4.
According to the result of Theorem 2, the neural network weight errors are UUB. According to Formulas (33), (34), and (39), as $\hat{V}_1(s) \to V_1^*(s)$ and $\hat{V}_2(s) \to V_2^*(s)$, the control inputs satisfy $\hat{u}_1 \to u_1^*$ and $\hat{u}_2 \to u_2^*$. That is, the control strategy is approximately optimal.
Remark 5.
Compared with [35], this work considers a more complex and interesting constrained control problem, in which the safety constraints change with time. In addition, we establish the coupled HJB equations to obtain the constrained optimal solution, so that the system state converges while the time-varying constraints are satisfied.
Remark 6.
In [34,36], the safe optimal control problem with external disturbances is considered, and control schemes based on barrier transformation are designed. However, the external disturbances there are assumed to be known. In this work, the safe control problem with uncertain disturbances is studied further, and it is proved that the system state converges under the proposed control strategy.

4. Simulation

To demonstrate the effectiveness of the proposed method, we present two nonlinear examples with time-varying safety constraints. In both cases, the system satisfies the time-varying safety constraints.

4.1. Nonlinear System Example 1

Consider the affine nonlinear system as follows [30]:
x ˙ = [ x 2 − 2 x 1 ; − x 2 − 0.5 x 1 + 0.25 x 2 ( cos ( 2 x 1 ) + 2 ) 2 + 0.25 x 2 ( sin ( 4 x 1 2 ) + 2 ) 2 ] + [ 0 ; cos ( 2 x 1 ) + 2 ] u 1 + [ 0 ; sin ( 4 x 1 2 ) + 2 ] u 2 + [ 0 ; cos ( x 1 ) x 2 ] d .
In addition, x = [ x 1 , x 2 ] T is the system state. One selects α 1 = α 2 = 1 , R 11 = R 12 = 2 , R 21 = R 22 = 1 , Q 1 = Q 2 = [ 1 0 ; 0 1 ] . The initial system state is x 0 = [ 2 , 2 ] T . We choose φ ( x ) = x and d ( φ ( x ) ) = p x 1 sin x 2 with p ∈ [ − 1 , 1 ] . Similarly, we select δ ( x ) = x 1 sin x 2 . Based on Formulas (4) and (5), we define the time-varying parameters for x 1 as l 1 = − 1 , l 2 = − 0.6 , ϑ 1 = 0.2 , t 1 = 3 , t 2 = 4 , l 3 = 2.2 , l 4 = 1.8 , ϑ 2 = 0.2 , and the time-varying parameters for x 2 as l 1 = − 2.8 , l 2 = − 1.8 , ϑ 1 = 0.5 , t 3 = 3 , t 4 = 4 , l 3 = 3 , l 4 = 2 , ϑ 2 = 0.5 . Before 75 s, the persistent excitation condition is ensured by probing noise. Since the effectiveness of the barrier transformation has been demonstrated in many previous works, we no longer compare our method with the unconstrained case, but with the case of constant constraints.
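For illustration, a time-varying safety bound of this kind can be realized as a smooth blend between two constant levels. The function below is only a qualitative sketch using a cosine ramp; the paper's boundary function is defined by Formulas (4) and (5) with the parameters ϑ 1 , ϑ 2 , and its exact shape may differ.

```python
import numpy as np

def smooth_bound(t, l_start, l_end, t_on, t_off):
    """Illustrative smooth time-varying bound: holds l_start before t_on,
    blends to l_end over [t_on, t_off] with a C^1 cosine ramp, then holds l_end.
    """
    if t <= t_on:
        return l_start
    if t >= t_off:
        return l_end
    s = (t - t_on) / (t_off - t_on)   # normalized ramp position in [0, 1]
    return l_start + (l_end - l_start) * 0.5 * (1.0 - np.cos(np.pi * s))
```

For instance, `smooth_bound(t, 2.2, 1.8, 3, 4)` holds 2.2 before 3 s and settles smoothly at 1.8 after 4 s, mimicking an upper bound that tightens over a one-second window.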
We define the activation functions as
ϕ 1 ( s ) = ϕ 2 ( s ) = [ s 1 2 s 1 s 2 s 2 2 ] T .
Meanwhile, the critic weight parameters are denoted as
W ^ 1 = [ ω ^ 11 ω ^ 12 ω ^ 13 ] T , W ^ 2 = [ ω ^ 21 ω ^ 22 ω ^ 23 ] T .
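With this quadratic activation vector, the critic estimate V ^ i ( s ) = W ^ i T ϕ i ( s ) and the gradient of ϕ i ( s ) used by the control laws are straightforward to evaluate; a minimal sketch follows (array shapes are illustrative):

```python
import numpy as np

def phi(s):
    """Activation vector phi(s) = [s1^2, s1*s2, s2^2]^T shared by both critics."""
    s1, s2 = s
    return np.array([s1**2, s1 * s2, s2**2])

def grad_phi(s):
    """Jacobian d(phi)/d(s), shape (3, 2); rows match the entries of phi."""
    s1, s2 = s
    return np.array([[2 * s1, 0.0],
                     [s2, s1],
                     [0.0, 2 * s2]])

def V_hat(W_hat, s):
    """Critic value estimate V_hat(s) = W_hat^T phi(s)."""
    return W_hat @ phi(s)
```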
After 100 s, the critic parameters converge to W ^ 1 = [ 0.392 , 1.789 , 1.162 ] T and W ^ 2 = [ 1.849 , 2.590 , 0.142 ] T .
It can be observed from Figure 2 that the method using constant constraints satisfies the constant constraints ( − 1 , 2.2 ) , ( − 2.8 , 3 ) during the convergence of the system state, but cannot satisfy the time-varying constraints ( ζ a 1 , ζ A 1 ) , ( ζ a 2 , ζ A 2 ) . In contrast, the trajectory of the system state x obtained by the proposed method converges to zero while the time-varying safety constraints are satisfied. Figure 3 gives the evolution of the critic parameters for player 1, and Figure 4 shows that for player 2. It can be seen that, under the proposed tuning laws (37) and (38), the critic weight parameters converge to their ideal values. Figure 5 shows the state trajectories of the transformation system (9).

4.2. Nonlinear System Example 2

Consider the following nonlinear system of a single link robot arm:
x ˙ = [ x 2 ; − 2 x 1 − 5 sin ( x 1 ) − 0.2 x 2 ] + [ 0 ; 0.1 ] u 1 + [ 0 ; 0.1 ] u 2 + [ 0 ; 1 ] d .
In addition, x = [ x 1 , x 2 ] T is the system state. One selects α 1 = 5 , α 2 = 1 , R 11 = R 12 = 2 , R 21 = R 22 = 1 , Q 1 = Q 2 = [ 5 0 ; 0 5 ] . The initial system state is x 0 = [ 2 , 2 ] T . Similarly, we choose φ ( x ) = x , d ( φ ( x ) ) = p x 1 sin x 2 with p ∈ [ − 1 , 1 ] , and δ ( x ) = x 1 sin x 2 . In this example, we apply more complex time-varying safety constraints to the system state: the constraints on the upper bounds of x 1 and x 2 vary at 3 and 8 s, respectively, and the constraints on the lower bounds of x 1 and x 2 vary at 3 and 10 s, respectively. We define λ 1 = 3 and λ 2 = 18 as the bounds of the control inputs. Before 75 s, the persistent excitation condition is ensured by probing noise.
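The input bounds enter through the tanh-saturated control form that appears in the Appendix, u = − λ tanh ( D ) with D = ( 1 / 2 λ ) R − 1 G T ( s ) ϕ ( s ) T W ^ , which keeps every input component inside ( − λ , λ ) by construction. A hedged sketch of this computation (the shapes and numbers below are illustrative, not the simulation settings):

```python
import numpy as np

def bounded_control(W_hat, grad_phi, G, R_inv, lam):
    """Tanh-saturated control of the form used in the Appendix:
    D = (1 / (2*lam)) * R^{-1} G^T grad_phi^T W_hat,  u = -lam * tanh(D),
    so each input component stays strictly inside (-lam, lam).

    Illustrative shapes: grad_phi (n_phi, n_x), G (n_x, m), R_inv (m, m),
    W_hat (n_phi,); the returned u has shape (m,).
    """
    D = (1.0 / (2.0 * lam)) * R_inv @ G.T @ grad_phi.T @ W_hat
    return -lam * np.tanh(D)
```

Even for very large critic weights, the saturation guarantees the bound: with lam = 3, the output magnitude never reaches 3.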
We define the activation function as
ϕ 1 ( s ) = ϕ 2 ( s ) = [ s 1 2 s 1 s 2 s 2 2 ] T .
Meanwhile, we denote the critic weight parameters as
W ^ 1 = [ ω ^ 11 ω ^ 12 ω ^ 13 ] T , W ^ 2 = [ ω ^ 21 ω ^ 22 ω ^ 23 ] T .
After 100 s, the critic parameters converge to W ^ 1 = [ − 1.319 , 0.249 , − 0.023 ] T and W ^ 2 = [ 0.250 , − 1.113 , 0.658 ] T .
In Example 2, we further consider the case of input constraints. Figure 6 shows that the method using constant constraints cannot satisfy the time-varying safety constraints ( ζ a 1 , ζ A 1 ) , ( ζ a 2 , ζ A 2 ) in the process of system state convergence, while the proposed method can ensure that the system state x converges under the time-varying safety constraints. The constrained control inputs are shown in Figure 7. The evolution of the critic parameters is given in Figure 8 and Figure 9. The transformation system state trajectories are shown in Figure 10.

5. Conclusions

For affine nonlinear multi-input safety-critical systems with uncertain disturbances and time-varying safety constraints, a new adaptive learning algorithm based on the coupled HJB equations was proposed to solve the constrained optimal control problem. To satisfy the time-varying safety constraints, a novel barrier function and a smooth safety boundary function were used to transform the safety-critical system into a transformation system without time-varying safety constraints. The proposed barrier function solves the time-varying safety constraint problem, which cannot be handled by the traditional constant-constraint method. The influence of uncertain disturbances on the transformation system was addressed by establishing the nominal system and a disturbance-related term. In addition, two critic neural networks were used to learn the optimal solutions of the coupled HJB equations, and the effectiveness of this method was verified by theoretical proof. Finally, we tested both a numerical nonlinear example and the nonlinear system of a single-link robotic arm; the simulation results also verify the effectiveness of the proposed method.

Author Contributions

J.W. and C.Q.: Methodology, Validation, Conceptualization, and Writing—Original Draft; X.Q., D.Z. and Z.Z.: Formal analysis, Writing—Review and editing; Z.S. and H.Z.: Data curation; C.Q.: Funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. U1504615), the Science and Technology Research Project of Henan Province (222102240014), and the Youth Backbone Teachers in Colleges and Universities of Henan Province (2018GGJS017).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors can confirm that all relevant data are included in the article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Theorem 2. 
Consider the following Lyapunov function candidate
L ( s ) = V 1 ( s ) + V 2 ( s ) + 1 2 α 1 1 W ˜ 1 T W ˜ 1 + 1 2 α 2 1 W ˜ 2 T W ˜ 2 .
The time derivative on the trajectory of the transformation system is calculated as
L ˙ = V ˙ 1 + V ˙ 2 + α 1 1 W ˜ 1 T W ˜ ˙ 1 + α 2 1 W ˜ 2 T W ˜ ˙ 2 .
Considering (40), we derive that
α 1 1 W ˜ 1 T W ˜ ˙ 1 = α 1 1 W ˜ 1 T ( α 1 σ 1 ( t ) σ ¯ 1 ( t ) [ σ 1 ( t ) T W ^ 1 + r 1 ( s , u 1 , u 2 , Γ 1 ) ] T + α 1 l = 1 N σ 1 ( t l ) σ ¯ 1 ( t l ) [ σ 1 ( t l ) T W ^ 1 + r 1 ( s ( t l ) , u ^ 1 ( t l ) , u ^ 2 ( t l ) , Γ 1 ( t l ) ) ] T ) .
Define Π 1 = σ 1 ( t ) T W ^ 1 + r 1 ( s , u 1 , u 2 , Γ 1 ) . Based on Formula (31), one has
Π 1 = σ 1 ( t ) T W ^ 1 + s T Q 1 s + Φ 1 ( u ^ 1 , λ 1 ) + Φ 2 ( u ^ 2 , λ 2 ) + Γ 1 ( s , V ^ 1 ) − σ 1 * ( t ) T W 1 * − s T Q 1 s − Φ 1 ( u 1 * , λ 1 ) − Φ 2 ( u 2 * , λ 2 ) − Γ 1 ( s , V 1 * ) + ε B 1 = Φ 1 ( u ^ 1 , λ 1 ) + Φ 2 ( u ^ 2 , λ 2 ) − Φ 1 ( u 1 * , λ 1 ) − Φ 2 ( u 2 * , λ 2 ) + ε B 1 − W ˜ 1 T σ 1 ( t ) + W 1 * T ( σ 1 ( t ) − σ 1 * ( t ) ) + Γ 1 ( s , V ^ 1 ) − Γ 1 ( s , V 1 * ) ,
where σ 1 * ( t ) = ϕ 1 ( s ) ( F ( s ) + G 1 ( s ) u 1 * + G 2 ( s ) u 2 * ) .
Define Π 2 = Φ 1 ( u ^ 1 , λ 1 ) − Φ 1 ( u 1 * , λ 1 ) . Based on the results in [39,40], we can obtain
Π 2 = W ^ 1 T ϕ 1 ( s ) G 1 ( s ) λ 1 t a n h ( D ^ 1 ) + W ˜ 1 T ϕ 1 ( s ) G 1 ( s ) λ 1 t a n h ( σ m 1 D ^ 1 ) W 1 * T ϕ 1 ( s ) G 1 ( s ) λ 1 t a n h ( D 1 ) W 1 * T ϕ 1 ( s ) G 1 ( s ) λ 1 [ t a n h ( σ m 1 D ^ 1 ) t a n h ( σ m 1 D 1 ) ] + λ 1 2 R ¯ 11 ( ε D ^ 1 ε D 1 ) + ε σ 1 ,
where D ^ 1 = 1 2 λ 1 R 11 − 1 G 1 T ( s ) ϕ 1 ( s ) T W ^ 1 , D 1 = 1 2 λ 1 R 11 − 1 G 1 T ( s ) ϕ 1 ( s ) T W 1 * , ε D ^ 1 and ε D 1 are bounded approximation errors, σ m 1 is a large constant, and ε σ 1 is the approximation error between the tanh and sgn functions.
Define Π 3 = Φ 2 ( u ^ 2 , λ 2 ) − Φ 2 ( u 2 * , λ 2 ) . Similarly, we can obtain
Π 3 = W ^ 2 T ϕ 2 ( s ) G 2 ( s ) λ 2 t a n h ( D ^ 2 ) + W ˜ 2 T ϕ 2 ( s ) G 2 ( s ) λ 2 t a n h ( σ m 2 D ^ 2 ) W 2 * T ϕ 2 ( s ) G 2 ( s ) λ 2 t a n h ( D 2 ) W 2 * T ϕ 2 ( s ) G 2 ( s ) λ 2 [ t a n h ( σ m 2 D ^ 2 ) t a n h ( σ m 2 D 2 ) ] + λ 2 2 R ¯ 22 ( ε D ^ 2 ε D 2 ) + ε σ 2 ,
where D ^ 2 = 1 2 λ 2 R 22 − 1 G 2 T ( s ) ϕ 2 ( s ) T W ^ 2 , D 2 = 1 2 λ 2 R 22 − 1 G 2 T ( s ) ϕ 2 ( s ) T W 2 * , ε D ^ 2 and ε D 2 are bounded approximation errors, σ m 2 is a large constant, and ε σ 2 is the approximation error. Based on (A5) and (A6) and some manipulation, one has
Π 1 = W ^ 1 T ϕ 1 ( s ) G 1 ( s ) λ 1 t a n h ( D ^ 1 ) + W ˜ 1 T ϕ 1 ( s ) G 1 ( s ) λ 1 t a n h ( σ m 1 D ^ 1 ) W 1 * T ϕ 1 ( s ) G 1 ( s ) λ 1 t a n h ( D 1 ) W 1 * T ϕ 1 ( s ) G 1 ( s ) λ 1 [ t a n h ( σ m 1 D ^ 1 ) t a n h ( σ m 1 D 1 ) ] + W ^ 2 T ϕ 2 ( s ) G 2 ( s ) λ 2 t a n h ( D ^ 2 ) W 2 * T ϕ 2 ( s ) G 2 ( s ) λ 2 t a n h ( D 2 ) W 2 * T ϕ 2 ( s ) G 2 ( s ) λ 2 [ t a n h ( σ m 2 D ^ 2 ) t a n h ( σ m 2 D 2 ) ] W ˜ 1 T σ 1 ( t ) + W 1 * T ( σ 1 ( t ) σ 1 * ( t ) ) + ϵ 11 + ϵ 12 + W ˜ 2 T ϕ 2 ( s ) G 2 ( s ) λ 2 t a n h ( σ m 2 D ^ 2 ) , = W ˜ 1 T σ 1 ( t ) + W ˜ 1 T ϕ 1 ( s ) G 1 ( s ) λ 1 [ t a n h ( σ m 1 D ^ 1 ) t a n h ( D ^ 1 ) ] W 1 * T ϕ 1 ( s ) G 1 ( s ) λ 1 [ t a n h ( σ m 1 D ^ 1 ) t a n h ( σ m 1 D 1 ) ] + W ^ 2 T ϕ 2 ( s ) G 2 ( s ) λ 2 [ t a n h ( D ^ 2 ) t a n h ( σ m 2 D ^ 2 ) ] + W 2 * T ϕ 2 ( s ) G 2 ( s ) λ 2 [ t a n h ( σ m 2 D 2 ) t a n h ( D 2 ) ] + W 1 * T ϕ 1 ( s ) G 2 ( s ) λ 2 [ t a n h ( D 2 * ) t a n h ( D ^ 2 ) ] + ϵ 11 + ϵ 12 , = W ˜ 1 T σ 1 ( t ) + W ˜ 1 T ψ 1 + W 1 * T ( ψ 5 ψ 2 ) + W ^ 2 T ψ 3 + W 2 * T ψ 4 + ϵ 11 + ϵ 12 ,
where
ϵ 11 = Γ 1 ( s , V ^ 1 ) Γ 1 ( s , V 1 * ) + ε B 1 , ϵ 12 = λ 1 2 R ¯ 11 ( ε D ^ 1 ε D 1 ) + ε σ 1 + λ 2 2 R ¯ 22 ( ε D ^ 2 ε D 2 ) + ε σ 2 , ψ 1 = ϕ 1 ( s ) G 1 ( s ) λ 1 [ t a n h ( σ m 1 D ^ 1 ) t a n h ( D ^ 1 ) ] , ψ 2 = ϕ 1 ( s ) G 1 ( s ) λ 1 [ t a n h ( σ m 1 D ^ 1 ) t a n h ( σ m 1 D 1 ) ] , ψ 3 = ϕ 2 ( s ) G 2 ( s ) λ 2 [ t a n h ( D ^ 2 ) t a n h ( σ m 2 D ^ 2 ) ] , ψ 4 = ϕ 2 ( s ) G 2 ( s ) λ 2 [ t a n h ( σ m 2 D 2 ) t a n h ( D 2 ) ] , ψ 5 = ϕ 1 ( s ) G 2 ( s ) λ 2 [ t a n h ( D 2 * ) t a n h ( D ^ 2 ) ] .
Similarly,
σ 1 ( t i ) T W ^ 1 + r 1 ( s ( t i ) , u ^ 1 ( t i ) , u ^ 2 ( t i ) , Γ 1 ( t i ) ) = − W ˜ 1 T σ 1 ( t i ) + W ˜ 1 T ψ 1 + W 1 * T ( ψ 5 − ψ 2 ) + W ^ 2 T ψ 3 + W 2 * T ψ 4 + ϵ 11 + ϵ 12 .
Substituting Formulas (A7) and (A8) into Formula (A3) yields
α 1 1 W ˜ 1 T W ˜ ˙ 1 = W ˜ 1 T [ σ 1 ( t ) σ 1 T ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ 1 T ( t i ) σ ¯ 1 ( t i ) ] W ˜ 1 + W ˜ 1 T [ σ 1 ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ ¯ 1 ( t i ) ] ψ 1 T W ˜ 1 + W ˜ 1 T [ σ 1 ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ ¯ 1 ( t i ) ] ψ 3 T W ^ 2 + W ˜ 1 T [ σ 1 ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ ¯ 1 ( t i ) ] ( ψ 5 ψ 2 ) T W 1 + W ˜ 1 T [ σ 1 ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ ¯ 1 ( t i ) ] ψ 4 T W 2 * + W ˜ 1 T [ σ 1 ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ ¯ 1 ( t i ) ] ( ϵ 11 + ϵ 12 ) , = W ˜ 1 T ϖ 1 W ˜ 1 + W ˜ 1 T ϖ 2 ψ 1 T W ˜ 1 + W ˜ 1 T ϖ 2 ψ 3 T W ^ 2 + W ˜ 1 T ϖ 2 ψ 4 T W 2 * + W ˜ 1 T ϖ 2 ( ψ 5 T ψ 2 T ) W 1 + W ˜ 1 T ϖ 3 , W ˜ 1 T ϖ 1 W ˜ 1 + r c 2 W ˜ 1 T ϖ 2 ϖ 2 T W ˜ 1 + 1 2 r c W ˜ 1 T ψ 1 ψ 1 T W ˜ 1 + W ˜ 1 T ϖ 2 ψ 3 T W ^ 2 + W ˜ 1 T ϖ 2 ( ψ 5 T ψ 2 T ) W 1 + W ˜ 1 T ϖ 3 + W ˜ 1 T ϖ 2 ψ 4 T W 2 * ,
where r c is a positive constant to be determined,
ϖ 1 = [ σ 1 ( t ) σ 1 T ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ 1 T ( t i ) σ ¯ 1 ( t i ) ] , ϖ 2 = [ σ 1 ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ ¯ 1 ( t i ) ] , ϖ 3 = [ σ 1 ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ ¯ 1 ( t i ) ] ( ϵ 11 + ϵ 12 ) .
We can also obtain an upper bound on α 2 1 W ˜ 2 T W ˜ ˙ 2 using the similar method,
α 2 1 W ˜ 2 T W ˜ ˙ 2 = W ˜ 2 T ϖ 4 W ˜ 2 + W ˜ 2 T ϖ 5 ψ 6 T W ˜ 2 + W ˜ 2 T ϖ 5 ψ 8 T W ^ 1 + W ˜ 2 T ϖ 5 ψ 9 T W 1 * + W ˜ 2 T ϖ 5 ( ψ 10 T ψ 7 T ) W 2 + W ˜ 2 T ϖ 6 , W ˜ 2 T ϖ 4 W ˜ 2 + r c 2 W ˜ 2 T ϖ 5 ϖ 5 T W ˜ 2 + 1 2 r c W ˜ 2 T ψ 6 ψ 6 T W ˜ 2 + W ˜ 2 T ϖ 5 ψ 8 T W ^ 1 + W ˜ 2 T ϖ 5 ( ψ 10 T ψ 7 T ) W 2 + W ˜ 2 T ϖ 6 + W ˜ 2 T ϖ 5 ψ 9 T W 1 * ,
where ε D ^ 3 , ε D 3 , ε D ^ 4 , and ε D 4 are bounded approximation errors, σ m 3 , σ m 4 are two large constants, and ε σ 3 , ε σ 4 are approximation errors,
ψ 6 = ϕ 2 ( s ) G 2 ( s ) λ 2 [ t a n h ( σ m 3 D ^ 2 ) t a n h ( D ^ 2 ) ] , ψ 7 = ϕ 2 ( s ) G 2 ( s ) λ 2 [ t a n h ( σ m 3 D ^ 2 ) t a n h ( σ m 3 D 2 ) ] , ψ 8 = ϕ 1 ( s ) G 1 ( s ) λ 1 [ t a n h ( D ^ 1 ) t a n h ( σ m 4 D ^ 1 ) ] , ψ 9 = ϕ 1 ( s ) G 1 ( s ) λ 1 [ t a n h ( σ m 4 D 1 ) t a n h ( D 1 ) ] , ψ 10 = ϕ 2 ( s ) G 1 ( s ) λ 1 [ t a n h ( D 1 * ) t a n h ( D ^ 1 ) ] , ϖ 4 = [ σ 2 ( t ) σ 2 T ( t ) σ ¯ 2 ( t ) + l = 1 N σ 2 ( t i ) σ 2 T ( t i ) σ ¯ 2 ( t i ) ] , ϖ 5 = [ σ 2 ( t ) σ ¯ 2 ( t ) + l = 1 N σ 2 ( t i ) σ ¯ 2 ( t i ) ] , ϖ 6 = [ σ 2 ( t ) σ ¯ 2 ( t ) + l = 1 N σ 2 ( t i ) σ ¯ 2 ( t i ) ] ( ϵ 21 + ϵ 22 ) , ϵ 21 = Γ 2 ( s , V ^ 2 ) Γ 2 ( s , V 2 * ) + ε B 2 , ϵ 22 = λ 1 2 R ¯ 11 ( ε D ^ 3 ε D 3 ) + ε σ 3 + λ 2 2 R ¯ 22 ( ε D ^ 4 ε D 4 ) + ε σ 4 .
Considering (30), we derive that
V ˙ 1 = ( W 1 * T ϕ 1 ( s ) + ε 1 T ) ( F ( s ) + G 1 ( s ) u 1 + G 2 ( s ) u 2 ) = W 1 * T ϕ 1 ( s ) F ( s ) − W 1 * T ϕ 1 ( s ) G 1 ( s ) λ 1 tanh ( D ^ 1 ) − W 1 * T ϕ 1 ( s ) G 2 ( s ) λ 2 tanh ( D ^ 2 ) + ε 0 ,
where ε 0 = ε 1 T ( F ( s ) − G 1 ( s ) λ 1 tanh ( D ^ 1 ) − G 2 ( s ) λ 2 tanh ( D ^ 2 ) ) . Based on Assumptions 1 and 2, one has
ε 0 ≤ λ d ε 1 λ f ∥ s ∥ + λ d ε 1 λ 1 g λ 1 + λ d ε 1 λ 2 g λ 2 .
Based on (31), one has
W 1 * T ϕ 1 ( s ) F ( s ) = − s T Q 1 s − Φ 1 ( u 1 , λ 1 ) − Φ 2 ( u 2 , λ 2 ) − Γ 1 ( s , V 1 ) + ε B 1 + W 1 * T ϕ 1 ( s ) G 1 ( s ) λ 1 tanh ( D 1 ) + W 1 * T ϕ 1 ( s ) G 2 ( s ) λ 2 tanh ( D 2 ) .
Based on (A13) and the facts that W 1 * T ϕ 1 ( s ) G 1 ( s ) λ 1 [ tanh ( D 1 ) − tanh ( D ^ 1 ) ] ≤ 2 λ 1 λ g 1 λ d ϕ 1 ∥ W 1 * ∥ , W 1 * T ϕ 1 ( s ) G 2 ( s ) λ 2 [ tanh ( D 2 ) − tanh ( D ^ 2 ) ] ≤ 2 λ 2 λ g 2 λ d ϕ 1 ∥ W 2 * ∥ , ∥ ε B 1 ∥ ≤ ε B 1 h , and Φ 1 ( u 1 , λ 1 ) , Φ 2 ( u 2 , λ 2 ) , Γ 1 ( s , V 1 ) are positive definite, one has
V ˙ 1 ≤ − s T Q 1 s + ε B 1 h + λ d ε 1 λ f ∥ s ∥ + λ d ε 1 λ 1 g λ 1 + λ d ε 1 λ 2 g λ 2 + 2 λ 1 λ g 1 λ d ϕ 1 ∥ W 1 * ∥ + 2 λ 2 λ g 2 λ d ϕ 1 ∥ W 2 * ∥ .
Similarly, we can derive
V ˙ 2 ≤ − s T Q 2 s + ε B 2 h + λ d ε 2 λ f ∥ s ∥ + λ d ε 2 λ 1 g λ 1 + λ d ε 2 λ 2 g λ 2 + 2 λ 1 λ g 1 λ d ϕ 2 ∥ W 1 * ∥ + 2 λ 2 λ g 2 λ d ϕ 2 ∥ W 2 * ∥ .
Collecting the results in (A9), (A10), (A14) and (A15), one has
L ˙ s T Q 1 s s T Q 2 s W ˜ 1 T ϖ 1 W ˜ 1 + r c 2 W ˜ 1 T ϖ 2 ϖ 2 T W ˜ 1 + 1 2 r c W ˜ 1 T ψ 1 ψ 1 T W ˜ 1 + W ˜ 1 T ϖ 2 ψ 3 T W ^ 2 + W ˜ 1 T ϖ 2 ( ψ 5 T ψ 2 T ) W 1 * + W ˜ 1 T ϖ 3 + W ˜ 1 T ϖ 2 ψ 4 T W 2 * W ˜ 2 T ϖ 4 W ˜ 2 + r c 2 W ˜ 2 T ϖ 5 ϖ 5 T W ˜ 2 + 1 2 r c W ˜ 2 T ψ 6 ψ 6 T W ˜ 2 + W ˜ 2 T ϖ 5 ψ 8 T W ^ 1 + W ˜ 2 T ϖ 5 ( ψ 10 T ψ 7 T ) W 2 + W ˜ 2 T ϖ 6 + W ˜ 2 T ϖ 5 ψ 9 T W 1 * + h 1 + h 2 , = s T Q 1 s s T Q 2 s W ˜ 1 T h 3 W ˜ 1 + W ˜ 1 T h 4 W ˜ 2 T h 5 W ˜ 2 + W ˜ 2 T h 6 + h 1 + h 2 ,
where
h 1 = ε B 1 h + λ d ε 1 λ f s + λ d ε 1 λ 1 g λ 1 + λ d ε 1 λ 2 g λ 2 + 2 λ 1 λ g 1 λ d ϕ 1 W 1 * + 2 λ 2 λ g 2 λ d ϕ 1 W 2 * ,
h 2 = ε B 2 h + λ d ε 2 λ f s + λ d ε 2 λ 1 g λ 1 + λ d ε 2 λ 2 g λ 2 + 2 λ 1 λ g 1 λ d ϕ 2 W 1 * + 2 λ 2 λ g 2 λ d ϕ 2 W 2 * ,
h 3 = ϖ 1 + r c 2 ϖ 2 ϖ 2 T + 1 2 r c ψ 1 ψ 1 T ,
h 4 = ϖ 2 ψ 3 T W ^ 2 + ϖ 2 ( ψ 5 T ψ 2 T ) W 1 * + ϖ 3 + ϖ 2 ψ 4 T W 2 * ,
h 5 = ϖ 4 + r c 2 ϖ 5 ϖ 5 T + 1 2 r c ψ 6 ψ 6 T ,
h 6 = ϖ 5 ψ 8 T W ^ 1 + ϖ 5 ( ψ 10 T ψ 7 T ) + ϖ 6 + ϖ 5 ψ 9 T W 1 * .
Finally, collecting the results in (A9), (A10), (A14), (A15) and (A16), one has
L ˙ ≤ − s T Q 1 s − s T Q 2 s − W ˜ 1 T h 3 W ˜ 1 + W ˜ 1 T h 4 − W ˜ 2 T h 5 W ˜ 2 + W ˜ 2 T h 6 + h 1 + h 2 ≤ − λ min ( Q 1 ) ∥ s ∥ 2 − λ min ( Q 2 ) ∥ s ∥ 2 − λ min ( h 3 ) ∥ W ˜ 1 ∥ 2 + ∥ W ˜ 1 ∥ ∥ h 4 ∥ − λ min ( h 5 ) ∥ W ˜ 2 ∥ 2 + ∥ W ˜ 2 ∥ ∥ h 6 ∥ + h 1 + h 2 .
Reasonable selection of parameters makes h 3 > 0 , h 4 > 0 , h 5 > 0 , h 6 > 0 , and the Lyapunov derivative (A2) is negative if
∥ W ˜ 1 ∥ > ∥ h 4 ∥ / ( 2 λ min ( h 3 ) ) + √ ( ∥ h 4 ∥ 2 / ( 4 λ min 2 ( h 3 ) ) + ( ∥ W ˜ 2 ∥ ∥ h 6 ∥ + h 1 + h 2 ) / λ min ( h 3 ) ) ,
∥ W ˜ 2 ∥ > ∥ h 6 ∥ / ( 2 λ min ( h 5 ) ) + √ ( ∥ h 6 ∥ 2 / ( 4 λ min 2 ( h 5 ) ) + ( ∥ W ˜ 1 ∥ ∥ h 4 ∥ + h 1 + h 2 ) / λ min ( h 5 ) ) .
Based on the Lyapunov theorem and Formulas (A18) and (A19), we can select parameters appropriately to ensure that the system state s and critic neural network weight errors W ˜ 1 , W ˜ 2 are UUB.
This completes the proof. □

References

1. Tee, K.P.; Ge, S.S.; Tay, E.H. Barrier Lyapunov Functions for the control of output-constrained nonlinear systems. IFAC Proc. Vol. 2013, 46, 449–455.
2. Ames, A.D.; Coogan, S.; Egerstedt, M.; Notomista, G.; Sreenath, K.; Tabuada, P. Control barrier functions: Theory and applications. In Proceedings of the 18th European Control Conference (ECC), Saint Petersburg, Russia, 12 May 2020; pp. 3420–3431.
3. Wang, D.; He, H.; Liu, D. Adaptive Critic Nonlinear Robust Control: A Survey. IEEE Trans. Cybern. 2017, 47, 3429–3451.
4. Wang, D.; Liu, D. Learning and guaranteed cost control with event-based adaptive critic implementation. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 6004–6014.
5. Vamvoudakis, K.G.; Lewis, F.L. Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton–Jacobi equations. Automatica 2011, 47, 1556–1569.
6. Liu, D.; Xue, S.; Zhao, B.; Luo, B.; Wei, Q. Adaptive Dynamic Programming for Control: A Survey and Recent Advances. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 142–160.
7. El-Sousy, F.F.M.; Amin, M.M.; Al-Durra, A. Adaptive Optimal Tracking Control Via Actor-Critic-Identifier Based Adaptive Dynamic Programming for Permanent-Magnet Synchronous Motor Drive System. IEEE Trans. Ind. Appl. 2021, 57, 6577–6591.
8. Liu, D.; Li, H.; Wang, D. Online Synchronous Approximate Optimal Learning Algorithm for Multi-Player Non-Zero-Sum Games With Unknown Dynamics. IEEE Trans. Syst. Man Cybern. 2014, 44, 1015–1027.
9. Zhao, B.; Liu, D. Event-Triggered Decentralized Tracking Control of Modular Reconfigurable Robots Through Adaptive Dynamic Programming. IEEE Trans. Ind. Electron. 2020, 67, 3054–3064.
10. Zhao, B.; Wang, D.; Shi, G.; Liu, D.; Li, Y. Decentralized Control for Large-Scale Nonlinear Systems With Unknown Mismatched Interconnections via Policy Iteration. IEEE Trans. Syst. Man Cybern. 2018, 48, 1725–1735.
11. Wang, D.; Liu, D.; Li, H.; Ma, H. Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming. Inf. Sci. 2014, 282, 167–179.
12. Modares, H.; Lewis, F.L.; Jiang, Z.P. H∞ tracking control of completely unknown continuous-time systems via off-policy reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 2550–2562.
13. Wang, D.; He, H.; Liu, D. Improving the Critic Learning for Event-Based Nonlinear H∞ Control Design. IEEE Trans. Cybern. 2017, 47, 3417–3428.
14. Zhang, H.; Xi, R.; Wang, Y.; Sun, S.; Sun, J. Event-Triggered Adaptive Tracking Control for Random Systems With Coexisting Parametric Uncertainties and Severe Nonlinearities. IEEE Trans. Autom. Contr. 2022, 67, 2011–2018.
15. Vamvoudakis, K.G.; Lewis, F.L. Online actor critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 2010, 46, 878–888.
16. Li, J.; Ding, J.; Chai, T.; Lewis, F.L.; Jagannathan, S. Adaptive Interleaved Reinforcement Learning: Robust Stability of Affine Nonlinear Systems with Unknown Uncertainty. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 270–280.
17. Zhang, H.; Zhang, K.; Xiao, G.; Jiang, H. Robust Optimal Control Scheme for Unknown Constrained-Input Nonlinear Systems via a Plug-n-Play Event-Sampled Critic-Only Algorithm. IEEE Trans. Syst. Man Cybern. 2020, 50, 3169–3180.
18. Wang, D.; Mu, C.; He, H.; Liu, D. Event-Driven Adaptive Robust Control of Nonlinear Systems With Uncertainties Through NDP Strategy. IEEE Trans. Syst. Man Cybern. 2017, 47, 1358–1370.
19. Wei, Q.; Zhu, L.; Song, R.; Zhang, P.; Liu, D.; Xiao, J. Model-Free Adaptive Optimal Control for Unknown Nonlinear Multiplayer Nonzero-Sum Game. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 879–892.
20. Zhang, H.; Su, H.; Zhang, K.; Luo, Y. Event-Triggered Adaptive Dynamic Programming for Non-Zero-Sum Games of Unknown Nonlinear Systems via Generalized Fuzzy Hyperbolic Models. IEEE Trans. Fuzzy. Syst. 2019, 27, 2202–2214.
21. Vamvoudakis, K.G.; Modares, H.; Kiumarsi, B.; Lewis, F.L. Game Theory-Based Control System Algorithms with Real-Time Reinforcement Learning: How to Solve Multiplayer Games Online. IEEE Contr. Syst. Mag. 2017, 37, 33–52.
22. Li, J.; Xiao, Z.; Li, P. Discrete-time Multi-player Games Based on Off-Policy Q-Learning. IEEE Access 2019, 7, 134647–134659.
23. Su, H.; Zhang, H.; Jiang, H.; Wen, Y. Decentralized Event-Triggered Adaptive Control of Discrete-Time Nonzero-Sum Games Over Wireless Sensor-Actuator Networks With Input Constraints. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 4254–4266.
24. Song, R.; Wei, Q.; Zhang, H.; Lewis, F.L. Discrete-Time Non-Zero-Sum Games With Completely Unknown Dynamics. IEEE Trans. Cybern. 2021, 51, 2929–2943.
25. Xue, S.; Luo, B.; Liu, D. Event-Triggered Adaptive Dynamic Programming for Zero-Sum Game of Partially Unknown Continuous-Time Nonlinear Systems. IEEE Trans. Syst. Man Cybern. 2020, 50, 3189–3199.
26. Luo, B.; Yang, Y.; Liu, D. Policy Iteration Q-Learning for Data-Based Two-Player Zero-Sum Game of Linear Discrete-Time Systems. IEEE Trans. Cybern. 2021, 51, 3630–3640.
27. Wang, W.; Chen, X.; Fu, H.; Wu, M. Model-Free Distributed Consensus Control Based on Actor–Critic Framework for Discrete-Time Nonlinear Multiagent Systems. IEEE Trans. Syst. Man Cybern. 2020, 50, 4123–4134.
28. Qin, C.; Shang, Z.; Zhang, Z.; Zhang, D.; Zhang, J. Robust Tracking Control for Non-Zero-Sum Games of Continuous-Time Uncertain Nonlinear Systems. Mathematics 2022, 10, 1904.
29. Song, R.; Lewis, F.L.; Wei, Q. Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 704–713.
30. Ming, Z.; Zhang, H.; Liang, L.; Su, H. Nonzero-sum differential games of continuous-time nonlinear systems with uniformly ultimately ε-bounded by adaptive dynamic programming. Appl. Math. Comput. 2022, 430, 127248.
31. Marvi, Z.; Kiumarsi, B. Safe reinforcement learning: A control barrier function optimization approach. Int. J. Robust Nonlinear Control 2021, 31, 1923–1940.
32. Xu, J.; Wang, J.; Rao, J.; Zhong, Y.; Wang, H. Adaptive dynamic programming for optimal control of discrete-time nonlinear system with state constraints based on control barrier function. Int. J. Robust Nonlinear Control 2021, 32, 3408–3424.
33. Liu, Y.J.; Lu, S.; Tong, S.; Chen, X.; Chen, C.P.; Li, D.J. Adaptive control-based Barrier Lyapunov Functions for a class of stochastic nonlinear systems with full state constraints. Automatica 2018, 87, 83–93.
34. Yang, Y.; Ding, D.W.; Xiong, H.; Yin, Y.; Wunsch, D.C. Online barrier-actor-critic learning for H∞ control with full-state constraints and input saturation. J. Franklin Inst. 2020, 357, 3316–3344.
35. Yang, Y.; Vamvoudakis, K.G.; Modares, H. Safe reinforcement learning for dynamical games. Int. J. Robust Nonlinear Control 2020, 30, 3706–3726.
36. Qin, C.; Wang, J.; Qiao, X.; Zhu, H.; Zhang, D.; Yan, Y. Integral Reinforcement Learning for Tracking in a Class of Partially Unknown Linear Systems with Output Constraints and External Disturbances. IEEE Access 2022, 10, 55270–55278.
37. Qin, C.; Zhu, H.; Wang, J.; Xiao, Q.; Zhang, D. Event-Triggered Safe Control for the Zero-Sum Game of Nonlinear Safety-Critical Systems with Input Saturation. IEEE Access 2022, 10, 40324–40337.
38. Hu, G. Observers for one-sided Lipschitz nonlinear systems. IMA J. Math. Control Inf. 2006, 23, 395–401.
39. Modares, H.; Lewis, F.L.; Sistani, M. Online Solution of nonquadratic two-player zero-sum games arising in the H∞ control of constrained input systems. Int. J. Adapt. Control 2014, 28, 232–254.
40. Modares, H.; Lewis, F.L.; Naghibi-Sistani, M.B. Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 2014, 50, 193–202.
Figure 1. The structure diagram of the proposed multi-input safety-critical system.
Figure 2. Evolution of the state x ( t ) by using the presented method and the method in [35].
Figure 3. Evolution of the critic estimates for player 1.
Figure 4. Evolution of the critic estimates for player 2.
Figure 5. Transformed system states using the presented method.
Figure 6. Evolution of the state x ( t ) by using the presented method and the method in [35].
Figure 7. Constrained control inputs of player 1 and player 2.
Figure 8. Evolution of the critic estimates for player 1.
Figure 9. Evolution of the critic estimates for player 2.
Figure 10. Transformed system states using the presented method.

Citation: Wang, J.; Qin, C.; Qiao, X.; Zhang, D.; Zhang, Z.; Shang, Z.; Zhu, H. Constrained Optimal Control for Nonlinear Multi-Input Safety-Critical Systems with Time-Varying Safety Constraints. Mathematics 2022, 10, 2744. https://doi.org/10.3390/math10152744
