Critic Learning-Based Safe Optimal Control for Nonlinear Systems with Asymmetric Input Constraints and Unmatched Disturbances

Qin, Chunbin; Jiang, Kaijun; Zhang, Jishi; Zhu, Tianzeng

doi:10.3390/e25071101

¹ School of Artificial Intelligence, Henan University, Zhengzhou 450000, China

² School of Software, Henan University, Kaifeng 475000, China

* Author to whom correspondence should be addressed.

Entropy 2023, 25(7), 1101; https://doi.org/10.3390/e25071101

Submission received: 29 May 2023 / Revised: 1 July 2023 / Accepted: 7 July 2023 / Published: 24 July 2023

(This article belongs to the Section Complexity)

Abstract:

This paper investigates a safe optimal control method, based on adaptive dynamic programming (ADP), for continuous-time (CT) nonlinear safety-critical systems with asymmetric input constraints and unmatched disturbances. First, a new non-quadratic function is introduced to handle the asymmetric input constraints. Subsequently, the safe optimal control problem is transformed into a two-player zero-sum game (ZSG) problem to suppress the influence of unmatched disturbances, and a new Hamilton–Jacobi–Isaacs (HJI) equation is introduced by integrating the control barrier function (CBF) with the cost function to penalize unsafe behavior. Moreover, a damping factor is embedded in the CBF to balance safety and optimality. To obtain a safe optimal controller, only one critic neural network (CNN) is used to solve the complex HJI equation, which reduces the computational load compared with the conventional actor–critic network. Then, the system state and the parameters of the CNN are proven to be uniformly ultimately bounded (UUB) using the Lyapunov stability method. Lastly, two examples are presented to confirm the efficacy of the presented approach.

Keywords:

critic neural network; asymmetric input constraints; unmatched disturbances; safety; adaptive dynamic programming; nonlinear systems

1. Introduction

Safety-critical systems are those in which accidents or failures can have severe consequences, including but not limited to injuries, loss of life, environmental harm, or financial losses. The emergence of safety-critical systems such as unmanned aerial vehicles (UAVs) [1,2,3] and robots [4] has led to an increased focus on safety control design within the field of control systems [5,6]. Safety control designs entail control strategies that satisfy safety specifications imposed by environmental or physical limitations of the system. Neglecting safety poses substantial risks to both property and personal security. To address the challenges of safe controller design, researchers have proposed several effective approaches [7,8,9,10,11]. In ref. [9], safety in the presence of unmodeled dynamics or disturbances in drones was addressed by designing a robust controller based on a nonlinear estimator. In ref. [10], neural networks integrated with Lyapunov theory were given a preliminary treatment, with application to critical situations in the automotive sector, and the topic was subsequently developed further. In ref. [11], a quadratic programming-based method was applied to develop a safe controller. Although this method can guarantee safety locally at every time step, a step size that is too small leads to redundant computations, whereas a step size that is too large causes unsafe behavior, making it challenging to ensure the safety of the system. Hence, it is crucial to identify an appropriate control design method that can guarantee the safety of CT safety-critical systems.

Recently, the CBF technique has emerged as an effective approach for ensuring the security of safety-critical systems [12,13,14]. The underlying principle of the CBF is to ensure the forward invariance of the safe set. In ref. [15], a safe reinforcement learning approach was demonstrated, where the CBF was merged into the cost function to assure both the safety and optimality of the system. Typically, the CBF component is added to the original cost function to penalize behavior that violates safety constraints. Reference [16] incorporated damping factors into the CBF and intervened only in the event of safety constraint violations, aiming to reduce disruptions to the optimal controller. Reference [17] introduced the utilization of the CBF and summarized the verification approach for safety-critical control systems. Nevertheless, the methods mentioned above do not take into account the presence of external disturbances, which motivates the research conducted in this paper.

As disturbances are present in almost all industrial systems and degrade control performance, it is necessary to consider external disturbances in practical applications. Recently, several methods have been proposed to address disturbances [18,19,20,21,22,23,24]. For instance, references [18,19] used $H_{\infty}$ control to reduce external disturbances in nonlinear systems. Reference [23] combined ADP with sliding mode control for addressing optimal control problems of CT nonlinear systems considering uncertain disturbances. Reference [24] cast the problem with external disturbances as a ZSG problem, with the control strategy aiming at minimizing the cost function and the disturbance strategy striving towards maximizing it. It is well known that an analytical solution of the HJI equation of the ZSG is difficult to find. Fortunately, with the evolution of optimal control [25,26,27], the ADP approach [28] was employed to approximately solve the ZSG problem. For example, in reference [29], a new database-based adaptive critic algorithm was presented to study the infinite-scale robust control for nonlinear systems. However, the aforementioned methods fail to consider the capability limitations of the system due to asymmetric input constraints.

Although symmetric input constraints have been widely investigated in prior research and tackled with various techniques, such as the control problem that concerns the uncertain impulse system that has input constraints, which was handled in [30], and the utilization of integral reinforcement learning with the actor–critic network to address the tracking control problem under input constraints in [31,32], there has been relatively little research on the treatment of asymmetric input constraints that frequently occur in practical systems. Several optimal control methods exist for addressing CT systems with nonlinear dynamics and input constraints that are asymmetric. Among them is one that used the cost function with adjustable upper and lower limits of integration [33,34,35]. Another proposed the switching function [36] to tackle the problem, but it is only applied to linear systems. However, none of these results considered the incorporation of CBF into the CT nonlinear safety-critical systems to study the safe and optimal control problem under asymmetric input constraints with external disturbances.

This study explores the safe and optimal control issue for safety-critical nonlinear systems to reject unmatched external disturbances under the condition of asymmetric input constraints. Unlike other works, a new non-quadratic form function for handling asymmetric input constraints is proposed in this paper. To tackle the challenge posed by unmatched disturbances, a two-player ZSG is put forward to formulate the optimization problem. The ZSG is then addressed by finding the Nash equilibrium point, which is obtained by addressing the HJI equation. However, since solving the HJI equation is challenging, an ADP technique similar to that used in references [37,38,39] is exploited to estimate the solution of the HJI equation. In addition, one single CNN is used instead of a dual actor–critic neural network to diminish the computational complexity in approximating the control policy. Consequently, the optimal control policy is obtained by considering the worst disturbance.

The contributions are outlined mainly as follows:

Asymmetric input constraints are considered in the control problem of the CT nonlinear safety-critical systems. In addition, this paper proposes a new non-quadratic form function to address the issue of asymmetric input constraints. It is important to note that when applying this approach, the optimal control policy no longer remains at 0, even when the system state reaches the equilibrium point of $x = 0$ (see $u^{*} (x)$ in later Equation (15)).
This paper adopts the CBF to construct safety constraints and proposes designing a damping coefficient within the CBF to balance the safety and optimality of safety-critical systems based on varying safety requirements in different applications.
The safe optimal control problem is turned into the ZSG problem to address unmatched disturbances; then, the optimal control law is obtained by solving the HJI equation using one CNN. Moreover, the use of only one CNN to approximate the solution of the HJI equation is an effective way to reduce the computational burden compared to the actor–critic network, and the system state and CNN parameters are demonstrated to be UUB.

The following structure is adopted for this article. Section 2 provides the initial formulation of the problem. Section 3 presents a safe optimal control design for the two-player ZSG problem. Then, in Section 4, an adaptive CNN method for addressing the HJI equation using an online method is proposed, and its stability is verified. Section 5 introduces two examples to demonstrate that the presented approach is effective. Lastly, Section 6 gives conclusions.

2. Problem Statement

Consider the CT nonlinear safety-critical system as

\dot{x} = F (x) + G (x) u + P (x) v,

(1)

where $x = [x_{1}, x_{2}, \dots, x_{n}]^{T} \in C_{a} \subseteq R^{n}$ is the $n$-dimensional system state vector, $F(x) \in R^{n}$ represents the internal dynamics, and $G(x) \in R^{n \times m}$ and $P(x) \in R^{n \times q}$ are the control and disturbance coefficient matrices, respectively. Additionally, $u \in R^{m}$ denotes the $m$-dimensional control input, constrained to the set $∁_{u} = \{u \mid u_{min} \leq u \leq u_{max}\}$, where $u_{max}$ and $u_{min}$ stand for the upper and lower bounds, respectively, and $v \in R^{q}$ is the unmatched disturbance. The paper assumes that $F(\cdot)$, $G(\cdot)$, and $P(\cdot)$ are Lipschitz continuous with $F(0) = 0$, and that the safety-critical System (1) is stabilizable and controllable. Moreover, $G(x)$ and $P(x)$ are assumed to be bounded, i.e., there exist two constants $G_{M} > 0$ and $P_{M} > 0$ such that $∥G(x)∥ \leq G_{M}$ and $∥P(x)∥ \leq P_{M}$ for any $x \in R^{n}$.

In addition, it is essential to emphasize that $C_{a}$ represents a safe set for (1). $C_{a}$ is derived from operational restrictions, such as the allowable states of a robot arm, and is mathematically determined by

\begin{cases} C_{a} = \{x \in R^{n} \mid z(x) \geq 0\}, \\ int(C_{a}) = \{x \in R^{n} \mid z(x) > 0\}, \\ \partial C_{a} = \{x \in R^{n} \mid z(x) = 0\}, \end{cases}

(2)

where $z(x)$ is continuous with respect to $x$. The set $int(C_{a})$ denotes the interior of $C_{a}$, while $\partial C_{a}$ represents the boundary of $C_{a}$.

Subsequently, the infinite-horizon cost function from $t = 0$ for System (1) is given by

V(x) = \int_{0}^{\infty} \left( x(t)^{T} Q x(t) + U(u) - Υ^{2} ∥v∥^{2} \right) dt,

(3)

where $Q$ is a positive definite matrix, $∥v∥^{2} = v^{T} v$, $Υ > 0$ represents a constant weight coefficient, and $U(u)$ is a non-quadratic function employed for handling the asymmetric input constraints, determined by

\begin{aligned} U(u) &= 2 \int_{ℑ}^{u} Ψ \tanh^{-1}\left(\frac{t - ℑ}{Ψ}\right) dt \\ &= 2 Ψ (u - ℑ) \tanh^{-1}\left(\frac{u - ℑ}{Ψ}\right) + Ψ^{2} \ln\left(1 - \frac{(u - ℑ)^{2}}{Ψ^{2}}\right), \end{aligned}

(4)

with $Ψ$ and $ℑ$ defined as

Ψ = \frac{1}{2} (u_{max} - u_{min}), \quad ℑ = \frac{1}{2} (u_{max} + u_{min}),

(5)

where $|u_{max}| \neq |u_{min}|$ and $\tanh(z) = (e^{z} - e^{-z}) / (e^{z} + e^{-z})$ with $z \in R$.
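As a numerical sanity check on Equation (4), the following minimal Python sketch compares the closed-form expression of $U(u)$ with a midpoint-rule approximation of the defining integral. The function names, the quadrature scheme, and the bounds $u_{min} = -1$, $u_{max} = 2$ used below are illustrative assumptions rather than part of the original method.

```python
import math


def half_width_and_center(u_min, u_max):
    """Return (Psi, center) from Equation (5): half-width and midpoint of [u_min, u_max]."""
    return 0.5 * (u_max - u_min), 0.5 * (u_max + u_min)


def U_closed_form(u, u_min, u_max):
    """Closed-form value of U(u), valid for u strictly inside (u_min, u_max)."""
    psi, center = half_width_and_center(u_min, u_max)
    w = u - center
    return 2.0 * psi * w * math.atanh(w / psi) + psi**2 * math.log(1.0 - (w / psi) ** 2)


def U_quadrature(u, u_min, u_max, steps=20000):
    """Midpoint-rule approximation of 2 * int_center^u Psi * atanh((t - center) / Psi) dt."""
    psi, center = half_width_and_center(u_min, u_max)
    h = (u - center) / steps  # signed step, so u < center is handled as well
    total = 0.0
    for k in range(steps):
        t = center + (k + 0.5) * h
        total += psi * math.atanh((t - center) / psi)
    return 2.0 * total * h


if __name__ == "__main__":
    for u in (-0.5, 0.5, 1.5):
        print(u, U_closed_form(u, -1.0, 2.0), U_quadrature(u, -1.0, 2.0))
```

In this sketch the penalty vanishes at the midpoint $u = ℑ$ and is symmetric about it, which is consistent with the observation in Remark 1 that the asymmetry of the constraint enters only through the shift $ℑ$.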

Remark 1.

Even though $\tanh(z)$ is an odd function, and hence symmetric about the origin, $U(u)$ in (4) generates asymmetric constraints in the control signal $u^{*}(x)$ (see $u^{*}(x)$ in later (15)). This is due to the fact that $ℑ$ is not equal to 0 in (4). This feature distinguishes the present problem from the case of symmetric input constraints.

Additionally, the ultimate objective of this paper is to devise the safe and optimal control input policy for (1), which involves the utilization of the CBF concept. In the upcoming section, this paper presents the concept of the CBF and proposes an ADP-based approach to design the safe and optimal controller.

3. Safe Optimal Control Design

This section presents a detailed explanation of the concept of the CBF. Then, the safe and optimal control problem is converted to the two-player ZSG to overcome the unmatched disturbances, and the CBF is integrated with the cost function without an intermediary to punish unsafe behavior.

3.1. Control Barrier Function

The utilization of the CBF provides a solution to address the safety constraint problem in safety-critical systems. The CBF is a function that is non-negative within the set $C_{a}$ and diverges to infinity at the boundary of $C_{a}$. As the state $x$ is about to reach the boundary of $C_{a}$, the negative-derivative condition can bring the system state $x$ back within $C_{a}$, ensuring that the system state is always confined within $C_{a}$. To better illustrate the properties of the CBF, the following assumption is given.

Assumption 1.

The CBF candidate $B_{r}(x)$ has the following three characteristics [40,41]:

(1): $B_{r} (x) \geq 0, \forall x \in i n t (C_{a})$ ,
(2): $B_{r} (x) \to \infty, \forall x \in \partial C_{a}$ ,
(3): $B_{r} (x)$ is monotonically decreasing $\forall x \in C_{a}$ .

Moreover, for all $x \in C_{a}$, the CBF $B_{r}(x)$ has the following properties:

\frac{1}{γ_{1}(z(x))} \leq B_{r}(x) \leq \frac{1}{γ_{2}(z(x))}, \quad \dot{B}_{r}(x) \leq γ_{3}(z(x)),

(6)

where $γ_{1}(\cdot)$, $γ_{2}(\cdot)$, and $γ_{3}(\cdot)$ are class $K$ functions.

Under the premise that Assumption 1 and Equation (6) both hold, a suitable choice for $B_{r}(x)$ is $ρ y(x) / z(x)$, where $y(x)$ represents a special scheduling function determined by the user to allow for flexibility in selecting $B_{r}(x)$. Specifically, $y(x)$ ensures that the CBF operates only when the system is close to the unsafe set, and $ρ > 0$ is the damping factor used to balance safety and optimality.
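The role of the damping factor can be illustrated with a minimal Python sketch of the candidate $B_{r}(x) = ρ y(x) / z(x)$. The safe-set function `z_outside_ball` (safe region outside a ball), the scheduling function `scheduling` (an exponential decay in $z$), and all numerical values are hypothetical choices made only for illustration; the source leaves $y(x)$ to the user.

```python
import math


def z_outside_ball(x, center, radius):
    """Hypothetical z(x): positive outside a ball, zero on its surface."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, center)) - radius**2


def scheduling(z_value, width=1.0):
    """Hypothetical y(x): close to 1 near the boundary, decaying far from it."""
    return math.exp(-z_value / width)


def barrier(x, center, radius, rho):
    """Candidate B_r(x) = rho * y(x) / z(x), defined on the interior of the safe set."""
    z_value = z_outside_ball(x, center, radius)
    if z_value <= 0.0:
        raise ValueError("x must lie in the interior of the safe set (z(x) > 0)")
    return rho * scheduling(z_value) / z_value


if __name__ == "__main__":
    center, radius = [0.0, 0.0], 1.0
    for dist in (3.0, 1.5, 1.1, 1.01):
        print(dist, barrier([dist, 0.0], center, radius, rho=1.0))
```

Under these assumptions the barrier value is negligible far from the ball, grows without bound as the state approaches its surface, and scales linearly with $ρ$.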

Remark 2.

In contrast to the previous CBF [16], the $ρ$ chosen here is positively correlated with the value of $B_{r}(x)$. The larger the value of $ρ$, the faster the system state moves away from the unsafe set, and the smaller the value of $ρ$, the slower the state $x$ moves away from the unsafe set. A smaller value of $ρ$ emphasizes optimality and a larger value of $ρ$ enforces safety.

3.2. Safe and Optimal Control Approach

By augmenting the cost function (3) with the selected CBF $B_{r}(x)$, a new refined cost function is obtained, that is,

V(x) = \int_{0}^{\infty} \left( x(t)^{T} Q x(t) + U(u) - Υ^{2} ∥v∥^{2} + B_{r}(x) \right) dt.

(7)

Remark 3.

To ensure the safety of the system, the initial system state is assumed to lie within the set $C_{a}$. The reason is that $B_{r}(x)$ increases rapidly as the state $x$ nears the boundary of $C_{a}$, so a trajectory whose initial state is beyond $C_{a}$ is penalized as it approaches the boundary, which prevents the system state from converging.

The conventional control problems can be transformed into two-player ZSG problems. The Nash equilibrium point, i.e., the saddle point $(u^{*}, v^{*})$, can be obtained by solving the associated HJI equation. Then, the optimal cost function is defined by

V^{*}(x) = \min_{u} \max_{v} \int_{0}^{\infty} \left( x(t)^{T} Q x(t) + U(u) - Υ^{2} ∥v∥^{2} + B_{r}(x) \right) dt.

(8)

The purpose of the two-player ZSG problem is to identify a saddle point so that the following inequality holds:

V^{*}(x, u^{*}, v) \leq V^{*}(x, u^{*}, v^{*}) \leq V^{*}(x, u, v^{*}).

(9)

Therefore, for the two-player ZSG problem, $u^{*}$ is the optimal control input policy minimizing the cost function, and $v^{*}$ represents the worst disturbance input policy maximizing the cost function.

Definition 1.

An input policy $u$ is considered admissible with respect to (7) on $℧ \subseteq R^{n}$, denoted by $u \in ℵ(℧)$, if $u$ is continuous on $℧$, $u$ stabilizes (1) on $℧$, and (7) is finite for any $x \in ℧$.

For an admissible input policy $u \in ℵ(℧)$, if the cost function $V(x)$ in Equation (7) is continuously differentiable, differentiating both sides of Equation (7) with respect to $t$ yields the nonlinear Lyapunov equation

0 = \nabla V(x)^{T} (F(x) + G(x) u + P(x) v) + x(t)^{T} Q x(t) + U(u) - Υ^{2} ∥v∥^{2} + B_{r}(x),

(10)

where $\nabla V(x)$ is the gradient of $V(x)$ with respect to $x$ and $V(0) = 0$.

Based on optimal control theory, the HJI equation for the two-player ZSG problem possesses a unique solution if a saddle point exists, that is, if the following condition holds:

0 = \min_{u} \max_{v} H(x, u, v, \nabla V^{*}(x)) = \max_{v} \min_{u} H(x, u, v, \nabla V^{*}(x)),

(11)

where $H(x, u, v, \nabla V^{*}(x))$ refers to the Hamiltonian function of the safety-critical System (1), that is,

\begin{aligned} H(x, u, v, \nabla V^{*}(x)) = {} & \nabla V^{*}(x)^{T} (F(x) + G(x) u + P(x) v) \\ & + x(t)^{T} Q x(t) + U(u) - Υ^{2} ∥v∥^{2} + B_{r}(x). \end{aligned}

(12)

By using Equations (11) and (12), the saddle point can be found by solving the two equations

u^{*}(x) = \arg\min_{u} H(x, u, v, \nabla V^{*}(x)),

(13)

and

v^{*}(x) = \arg\max_{v} H(x, u, v, \nabla V^{*}(x)).

(14)

Thus, the saddle point $(u^{*}, v^{*})$ is obtained as

u^{*}(x) = -Ψ \tanh\left(\frac{1}{2 Ψ} G(x)^{T} \nabla V^{*}(x)\right) + ψ_{ℑ},

(15)

and

v^{*}(x) = \frac{1}{2 Υ^{2}} P(x)^{T} \nabla V^{*}(x),

(16)

where $ψ_{ℑ} = [ℑ, ℑ, \dots, ℑ]^{T} \in R^{m}$ with $ℑ$ given by Equation (5).
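The saturating structure of Equations (15) and (16) can be sketched in a few lines of Python. The function names and the convention of passing the already-computed vectors $G(x)^{T} \nabla V^{*}(x)$ and $P(x)^{T} \nabla V^{*}(x)$ as plain lists are assumptions made for this illustration; obtaining $\nabla V^{*}(x)$ itself is the subject of the next section.

```python
import math


def optimal_control(g_t_grad, u_min, u_max):
    """Equation (15), applied channel by channel.

    g_t_grad is the vector G(x)^T * grad V*(x), one entry per input channel.
    """
    psi = 0.5 * (u_max - u_min)     # half-width, Equation (5)
    center = 0.5 * (u_max + u_min)  # midpoint, Equation (5)
    return [-psi * math.tanh(g / (2.0 * psi)) + center for g in g_t_grad]


def worst_disturbance(p_t_grad, upsilon):
    """Equation (16): p_t_grad is the vector P(x)^T * grad V*(x)."""
    return [p / (2.0 * upsilon**2) for p in p_t_grad]


if __name__ == "__main__":
    for g in (-50.0, -1.0, 0.0, 1.0, 50.0):
        print(g, optimal_control([g], u_min=-1.0, u_max=2.0))
    print(worst_disturbance([4.0], upsilon=2.0))
```

Because $\tanh$ is bounded by 1 in magnitude, the control stays within $[u_{min}, u_{max}]$ for any gradient value, and it equals the midpoint $ℑ$ when $G(x)^{T} \nabla V^{*}(x) = 0$.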

Remark 4.

Given that $ℑ \neq 0$ from (5), it can be concluded that $u(0) = ℑ \neq 0$. Therefore, in order to establish the equilibrium point of (1) at $x = 0$, the assumption of $G(0) = 0$ is necessary.

Substituting Equations (15) and (16) into Equation (11), the HJI equation can be rewritten as

\begin{aligned} 0 = {} & \nabla V^{*}(x)^{T} (F(x) + P(x) v^{*}) + x(t)^{T} Q x(t) + U(-Ψ \tanh(T(x)) + ψ_{ℑ}) \\ & - Υ^{2} ∥v^{*}∥^{2} - (Ψ \nabla V^{*}(x))^{T} G(x) \tanh(T(x)) + \nabla V^{*}(x)^{T} G(x) ψ_{ℑ} + B_{r}(x), \end{aligned}

(17)

where $T(x) = \frac{1}{2 Ψ} G(x)^{T} \nabla V^{*}(x)$ and $V^{*}(0) = 0$.

For the optimal safe control problem of the ZSG with unmatched external disturbances and asymmetric input constraints, it is necessary to obtain the value corresponding to the optimal cost Function (8) for achieving the optimal control input Policy (15) and the worst disturbance input Policy (16). Therefore, the solution of Equation (17) needs to be obtained. Nevertheless, since Equation (17) represents a nonlinear partial differential equation, it is challenging to find its analytical solution using conventional mathematical approaches. Hence, the solution of this equation is estimated by using the CNN in the next section.

4. Adaptive CNN Design

4.1. Solving the HJI Equation via the CNN

This section designs a CNN to estimate cost function

V^{*} (x)

as

V^{*} (x) = W_{c}^{T} δ (x) + ξ (x),

(18)

where $ξ(x)$ represents the estimation error of the CNN with $ξ(0) = 0$, $W_{c} \in R^{r}$ represents the ideal weight vector of the CNN, $δ(x) = [δ_{1}(x); δ_{2}(x); \dots; δ_{r}(x)]$ represents the activation function with $δ_{j}(0) = 0$, $j = 1, 2, \dots, r$, and $r$ is the number of neurons in the CNN.

The gradient of the optimal cost function in Equation (18) is

\nabla V^{*}(x) = \nabla δ(x)^{T} W_{c} + \nabla ξ(x).

(19)

Substituting Equation (19) into Equation (15), $u^{*}(x)$ can be represented as

u^{*}(x) = -Ψ \tanh(\bar{A}(x)) + ξ_{u^{*}}(x) + ψ_{ℑ},

(20)

where

\bar{A}(x) = \frac{1}{2 Ψ} G(x)^{T} \nabla δ(x)^{T} W_{c},

(21)

and

ξ_{u^{*}}(x) = -\frac{1}{2} (I_{m} - Φ(A(x))) G(x)^{T} \nabla ξ(x),

(22)

with $Φ(A(x)) = diag\{\tanh^{2}(A_{l}(x))\}$ $(l = 1, 2, \dots, m)$, where $A(x) = [A_{1}(x); A_{2}(x); \dots; A_{m}(x)] \in R^{m}$ is selected between $\bar{A}(x)$ and $T(x)$. Then, considering Equation (19), $v^{*}(x)$ in Equation (16) can be rewritten as

v^{*}(x) = \frac{1}{2 Υ^{2}} P(x)^{T} \nabla δ(x)^{T} W_{c} + ξ_{v^{*}}(x),

(23)

where $ξ_{v^{*}}(x) = \frac{1}{2 Υ^{2}} P(x)^{T} \nabla ξ(x)$.

Similarly, substituting Equation (19) into Equation (17), the HJI equation can be rewritten as

\begin{aligned} 0 = {} & W_{c}^{T} \nabla δ(x) (F(x) + P(x) v^{*}) + x^{T} Q x + U(-Ψ \tanh(\bar{A}(x) + K(x)) + ψ_{ℑ}) + B_{r}(x) \\ & - Ψ W_{c}^{T} \nabla δ(x) G(x) \tanh(\bar{A}(x) + K(x)) - Ψ \nabla ξ(x)^{T} G(x) \tanh(\bar{A}(x) + K(x)) \\ & + \nabla ξ(x)^{T} (F(x) + P(x) v^{*}) - Υ^{2} ∥v^{*}∥^{2} + (W_{c}^{T} \nabla δ(x) + \nabla ξ(x)^{T}) G(x) ψ_{ℑ}, \end{aligned}

(24)

where $K(x) = \frac{1}{2 Ψ} G(x)^{T} \nabla ξ(x)$, so that $T(x) = \bar{A}(x) + K(x)$.

However, since the ideal CNN weight $W_{c}$ in Equation (18) is unknown, it cannot be used in the control procedure. Hence, the CNN is used to estimate the cost function and its gradient as

\hat{V}(x) = \hat{W}_{c}^{T} δ(x),

(25)

\nabla \hat{V}(x) = \nabla δ(x)^{T} \hat{W}_{c},

(26)

where $\hat{W}_{c}$ represents the estimate of $W_{c}$.

Therefore, the approximate optimal input and the approximate worst disturbance input become

\hat{u}^{*}(x) = -Ψ \tanh\left(\frac{1}{2 Ψ} G(x)^{T} \nabla δ(x)^{T} \hat{W}_{c}\right) + ψ_{ℑ},

(27)

and

\hat{v}^{*}(x) = \frac{1}{2 Υ^{2}} P(x)^{T} \nabla δ(x)^{T} \hat{W}_{c}.

(28)

Subsequently, the approximated Hamiltonian function can be formulated as

\begin{aligned} \hat{H}(x, \hat{W}_{c}, \hat{v}^{*}) = {} & \hat{W}_{c}^{T} ð + \hat{W}_{c}^{T} \nabla δ(x) G(x) ψ_{ℑ} + U(-Ψ \tanh(Γ(x)) + ψ_{ℑ}) \\ & + x^{T} Q x - Υ^{2} ∥\hat{v}^{*}∥^{2} - Ψ \hat{W}_{c}^{T} \nabla δ(x) G(x) \tanh(Γ(x)) + B_{r}(x), \end{aligned}

(29)

where

ð = \nabla δ(x) (F(x) + P(x) \hat{v}^{*})

(30)

and

Γ(x) = \frac{1}{2 Ψ} G(x)^{T} \nabla δ(x)^{T} \hat{W}_{c}.

(31)

The CNN weight estimation error is denoted by

\tilde{W}_{c} = W_{c} - \hat{W}_{c},

(32)

and the approximation error $ϱ_{c}$ of the Hamiltonian function is derived as

\begin{aligned} ϱ_{c} &= \hat{H}(x, \hat{W}_{c}, \hat{v}^{*}) - H(x, u^{*}, v^{*}, \nabla V^{*}(x)) \\ &= \hat{H}(x, \hat{W}_{c}, \hat{v}^{*}), \end{aligned}

(33)

where the second equality holds because $H(x, u^{*}, v^{*}, \nabla V^{*}(x)) = 0$ by Equation (11).

To achieve $\hat{W}_{c} \to W_{c}$, it is necessary to ensure that $ϱ_{c} \to 0$. Therefore, the chosen target function is $E = \frac{1}{2} ϱ_{c}^{T} ϱ_{c} / O^{2}$, where $O = 1 + ð^{T} ð$. Consequently, based on a normalized gradient descent algorithm, the update law of the weight vector $\hat{W}_{c}$ is defined by

\dot{\hat{W}}_{c} = -α \frac{\partial E}{\partial \hat{W}_{c}} = -\frac{α ð}{O^{2}} ϱ_{c},

(34)

with $α > 0$ being the adjustable learning parameter and $ϱ_{c}$ defined as in Equation (33).

Using Equations (32) and (34), the dynamics of the weight estimation error $\tilde{W}_{c}$ can be expressed as

\dot{\tilde{W}}_{c} = \frac{α ζ}{O} ξ_{c} - α ζ ζ^{T} \tilde{W}_{c},

(35)

where $ξ_{c} = -\nabla ξ(x)^{T} (F(x) + P(x) \hat{v}^{*})$ is the residual error and $ζ = ð / O$.
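A minimal Python sketch of one Euler step of a normalized gradient-descent critic update of the form $-α ð ϱ_{c} / (1 + ð^{T} ð)^{2}$, which is the form consistent with the error dynamics (35), is given below. The toy residual `toy_residual` (linear in the weights with a fixed regressor) is a hypothetical stand-in for the Hamiltonian residual (33); in the actual scheme both $ð$ and $ϱ_{c}$ depend on the state and on $\hat{W}_{c}$. The step size `dt` and all numbers are likewise assumptions.

```python
def toy_residual(w_hat, regressor, target):
    """Hypothetical stand-in for rho_c: w_hat^T regressor - target."""
    return sum(w * e for w, e in zip(w_hat, regressor)) - target


def critic_weight_step(w_hat, regressor, rho_c, alpha, dt):
    """One Euler step of W_hat_dot = -alpha * regressor / O^2 * rho_c."""
    normalizer = 1.0 + sum(e * e for e in regressor)  # O = 1 + regressor^T regressor
    return [w - dt * alpha * e / normalizer**2 * rho_c
            for w, e in zip(w_hat, regressor)]


def train(w_hat, regressor, target, alpha=1.0, dt=1.0, steps=100):
    """Repeat the update with a fixed regressor and return the final weights."""
    for _ in range(steps):
        rho_c = toy_residual(w_hat, regressor, target)
        w_hat = critic_weight_step(w_hat, regressor, rho_c, alpha, dt)
    return w_hat


if __name__ == "__main__":
    weights = train([0.0, 0.0], regressor=[1.0, 2.0], target=2.0)
    print(weights, toy_residual(weights, [1.0, 2.0], 2.0))
```

With a single fixed regressor the residual contracts geometrically, but the weights converge only along the regressor direction; this is why the stability condition involves the minimum eigenvalue term of $ζ ζ^{T}$, which requires the regressor to be sufficiently rich over time.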

4.2. Stability Analysis

In this subsection, the state $x$ and the CNN parameters of the closed-loop system are shown to be UUB using Lyapunov stability analysis. First, the following two assumptions, which were also used in [28,42], are required.

Assumption 2.

The ideal optimal CNN weight vector $W_{c}$ is upper bounded, i.e., $∥W_{c}∥ \leq b_{W_{c}}$, where $b_{W_{c}} > 0$ is a constant. Moreover, for any $x \in ℧$, this paper assumes that there are two known constants $b_{\nabla δ} > 0$ and $b_{δ} > 0$ so that $∥\nabla δ(x)∥ \leq b_{\nabla δ}$ and $∥δ(x)∥ \leq b_{δ}$. Meanwhile, there exist $b_{\nabla ξ} > 0$ and $b_{ξ} > 0$ so that $∥\nabla ξ(x)∥ \leq b_{\nabla ξ}$ and $∥ξ(x)∥ \leq b_{ξ}$ for any $x \in ℧$.

Assumption 3.

There exist positive constants $b_{ξ_{u^{*}}}$, $b_{ξ_{v^{*}}}$, and $b_{ξ_{c}}$ such that the following bounds hold:

(1): $b_{ξ_{u^{*}}} \geq ∥ξ_{u^{*}} (x)∥$ for any $x \in ℧$ .
(2): $b_{ξ_{v^{*}}} \geq ∥ξ_{v^{*}} (x)∥$ for any $x \in ℧$ .
(3): $b_{ξ_{c}} \geq ∥ξ_{c}∥$ for any $x \in ℧$ .

Theorem 1.

Suppose that Assumptions 1–3 hold. Consider System (1) with the control law (27) and the CNN update rule (34). Then, all signals in the closed-loop system are UUB if the following condition holds:

α k_{min}(ζ ζ^{T}) - (1 / Υ^{2}) b_{\nabla δ}^{2} P_{M}^{2} > 0.

(36)

Proof.

Choose the Lyapunov candidate function as follows (note: for convenience, $V^{*}(x)$ and $(1/2) \tilde{W}_{c}^{T} \tilde{W}_{c}$ are abbreviated as $L_{1}$ and $L_{2}$ below):

L(t) = \underbrace{V^{*}(x)}_{L_{1}} + \underbrace{(1/2) \tilde{W}_{c}^{T} \tilde{W}_{c}}_{L_{2}}.

(37)

Taking the time derivative of $L_{1}$ in Equation (37) along the trajectories of System (1) gives

\begin{aligned} \dot{L}_{1} &= \frac{d V^{*}(x)}{d t} \\ &= \nabla V^{*}(x)^{T} (F(x) + G(x) \hat{u}^{*} + P(x) \hat{v}^{*}) \\ &= \nabla V^{*}(x)^{T} (F(x) + G(x) u^{*} + P(x) v^{*}) \\ &\quad + \nabla V^{*}(x)^{T} P(x) (\hat{v}^{*} - v^{*}) + \nabla V^{*}(x)^{T} G(x) (\hat{u}^{*} - u^{*}). \end{aligned}

(38)

Then, using Equations (11) and (12), it follows that

\nabla V^{*}(x)^{T} (F(x) + G(x) u^{*} + P(x) v^{*}) = -x^{T} Q x - U(u^{*}) + Υ^{2} ∥v^{*}∥^{2} - B_{r}(x).

(39)

Similarly, taking into account Equations (15) and (16), the derived results are

\nabla V^{*}(x)^{T} G(x) = 2 Ψ \left(\tanh^{-1}((ψ_{ℑ} - u^{*}) / Ψ)\right)^{T},

(40)

and

\nabla V^{*}(x)^{T} P(x) = 2 Υ^{2} v^{* T}.

(41)

According to Equations (39)–(41), Equation (38) can be rewritten as follows (note: for convenience, $\bar{ω} - U(u^{*})$ and $2 Υ^{2} v^{* T} \hat{v}^{*} - Υ^{2} ∥v^{*}∥^{2} - B_{r}(x)$ are abbreviated as $Λ_{1}$ and $Λ_{2}$ below):

\dot{L}_{1} = -x^{T} Q x + \underbrace{\bar{ω} - U(u^{*})}_{Λ_{1}} + \underbrace{2 Υ^{2} v^{* T} \hat{v}^{*} - Υ^{2} ∥v^{*}∥^{2} - B_{r}(x)}_{Λ_{2}},

(42)

where

\bar{ω} = 2 Ψ \left(\tanh^{-1}((ψ_{ℑ} - u^{*}) / Ψ)\right)^{T} (\hat{u}^{*} - u^{*}).

(43)

We apply Young’s inequality to Equation (43). Additionally, considering Equations (19), (20), (27), (40) and (41), $\bar{ω}$ can be bounded as

\begin{aligned} \bar{ω} &\leq ∥Ψ \tanh^{-1}((ψ_{ℑ} - u^{*}) / Ψ)∥^{2} + ∥\hat{u}^{*} - u^{*}∥^{2} \\ &= \frac{1}{4} ∥G(x)^{T} \nabla V^{*}(x)∥^{2} + ∥\hat{u}^{*} - u^{*}∥^{2} \\ &= \frac{1}{4} ∥G(x)^{T} (\nabla δ(x)^{T} W_{c} + \nabla ξ(x))∥^{2} \\ &\quad + ∥-Ψ \tanh(Γ(x)) + Ψ \tanh(\bar{A}(x)) - ξ_{u^{*}}(x)∥^{2}. \end{aligned}

(44)

Furthermore, utilizing Young’s inequality again, $\bar{ω}$ in Equation (44) further satisfies

\begin{aligned} \bar{ω} \leq {} & 2 ∥-Ψ \tanh(Γ(x)) + Ψ \tanh(\bar{A}(x))∥^{2} + 2 ∥ξ_{u^{*}}(x)∥^{2} \\ & + \frac{1}{2} ∥G(x)^{T} \nabla δ(x)^{T} W_{c}∥^{2} + \frac{1}{2} ∥G(x)^{T} \nabla ξ(x)∥^{2} \\ \leq {} & 4 ∥Ψ \tanh(Γ(x))∥^{2} + 4 ∥Ψ \tanh(\bar{A}(x))∥^{2} + 2 ∥ξ_{u^{*}}(x)∥^{2} \\ & + \frac{1}{2} ∥G(x)^{T} \nabla δ(x)^{T} W_{c}∥^{2} + \frac{1}{2} ∥G(x)^{T} \nabla ξ(x)∥^{2}. \end{aligned}

(45)

According to Equations (21) and (31), and since each component of $\tanh(\cdot)$ is bounded by 1 in magnitude, the following inequalities hold:

∥\tanh(Γ(x))∥^{2} = \sum_{l=1}^{m} \tanh^{2}(Γ_{l}(x)) \leq m

(46)

and

∥\tanh(\bar{A}(x))∥^{2} = \sum_{l=1}^{m} \tanh^{2}(\bar{A}_{l}(x)) \leq m,

(47)

where $Γ_{l}(x)$ and $\bar{A}_{l}(x)$ denote the $l$-th components of $Γ(x)$ and $\bar{A}(x)$, respectively.

Based on Equations (46) and (47) and Assumptions 2 and 3, $\bar{ω}$ can be bounded as

\bar{ω} \leq 8 Ψ^{2} m + \frac{1}{2} G_{M}^{2} (b_{\nabla δ}^{2} b_{W_{c}}^{2} + b_{\nabla ξ}^{2}) + 2 b_{ξ_{u^{*}}}^{2}.

(48)

By observing Equations (4) and (5), it can be concluded that $U(u^{*}) \geq 0$. Using this fact together with Equation (48), $Λ_{1}$ in Equation (42) satisfies

Λ_{1} \leq 8 Ψ^{2} m + \frac{1}{2} G_{M}^{2} (b_{\nabla δ}^{2} b_{W_{c}}^{2} + b_{\nabla ξ}^{2}) + 2 b_{ξ_{u^{*}}}^{2}.

(49)

Similarly, $Λ_{2}$ in Equation (42) can be bounded as follows (note: from Assumption 1, $B_{r}(x) \geq 0$, and Young’s inequality gives $2 v^{* T} \hat{v}^{*} \leq ∥v^{*}∥^{2} + ∥\hat{v}^{*}∥^{2}$):

\begin{aligned} Λ_{2} &= 2 Υ^{2} v^{* T} \hat{v}^{*} - Υ^{2} ∥v^{*}∥^{2} - B_{r}(x) \\ &\leq 2 Υ^{2} v^{* T} \hat{v}^{*} - Υ^{2} ∥v^{*}∥^{2} \\ &\leq -Υ^{2} ∥v^{*}∥^{2} + Υ^{2} (∥v^{*}∥^{2} + ∥\hat{v}^{*}∥^{2}) \\ &= Υ^{2} ∥\hat{v}^{*}∥^{2} \\ &= (1 / (4 Υ^{2})) ∥P(x)^{T} \nabla δ(x)^{T} (W_{c} - \tilde{W}_{c})∥^{2}. \end{aligned}

(50)

Meanwhile, using Young’s inequality and Assumption 2, $Λ_{2}$ in Equation (50) further satisfies

\begin{aligned} Λ_{2} &\leq (1 / (4 Υ^{2})) P_{M}^{2} b_{\nabla δ}^{2} ∥W_{c} - \tilde{W}_{c}∥^{2} \\ &\leq (1 / (2 Υ^{2})) P_{M}^{2} b_{\nabla δ}^{2} (b_{W_{c}}^{2} + ∥\tilde{W}_{c}∥^{2}). \end{aligned}

(51)

Hence, by observing Equations (49) and (51), it can be inferred that $\dot{L}_{1}$ in Equation (42) satisfies

\begin{aligned} \dot{L}_{1} \leq {} & -k_{min}(Q) ∥x∥^{2} + (1 / (2 Υ^{2})) P_{M}^{2} b_{\nabla δ}^{2} b_{W_{c}}^{2} + 8 Ψ^{2} m + 2 b_{ξ_{u^{*}}}^{2} \\ & + (1 / 2) G_{M}^{2} (b_{\nabla δ}^{2} b_{W_{c}}^{2} + b_{\nabla ξ}^{2}) + (1 / (2 Υ^{2})) P_{M}^{2} b_{\nabla δ}^{2} ∥\tilde{W}_{c}∥^{2}. \end{aligned}

(52)

Then, the derivative of $L_{2}$ in Equation (37) along the solution of Equation (35) is as follows (note: $α \tilde{W}_{c}^{T} (ζ / O) ξ_{c}$ is abbreviated as $Λ_{3}$ below):

\dot{L}_{2} = \tilde{W}_{c}^{T} \dot{\tilde{W}}_{c} = \underbrace{α \tilde{W}_{c}^{T} (ζ / O) ξ_{c}}_{Λ_{3}} - α \tilde{W}_{c}^{T} ζ ζ^{T} \tilde{W}_{c}.

(53)

Next, using Young’s inequality and $O \geq 1$, $Λ_{3}$ can be bounded as

\begin{aligned} Λ_{3} &\leq \frac{α}{2 O} (∥ζ^{T} \tilde{W}_{c}∥^{2} + ∥ξ_{c}∥^{2}) \\ &\leq α \left(\frac{1}{2} \tilde{W}_{c}^{T} ζ ζ^{T} \tilde{W}_{c} + \frac{1}{2} ∥ξ_{c}∥^{2}\right). \end{aligned}

(54)

Additionally, with Assumption 3 holding, it can be deduced that $\dot{L}_{2}$ in Equation (53) satisfies

\begin{aligned} \dot{L}_{2} &\leq \frac{1}{2} (-α \tilde{W}_{c}^{T} ζ ζ^{T} \tilde{W}_{c} + α ∥ξ_{c}∥^{2}) \\ &\leq \frac{1}{2} (-α k_{min}(ζ ζ^{T}) ∥\tilde{W}_{c}∥^{2} + α b_{ξ_{c}}^{2}). \end{aligned}

(55)

Using Equations (37), (52) and (55),

\dot{L}

can be depicted as

\begin{matrix} \dot{L} \leq & - k_{m i n} (Q) {∥x∥}^{2} + (1 / (2 Y^{2}) P_{M}^{2} ℑ_{\nabla δ}^{2} ℑ_{W}^{2}) + (1 / 2) G_{M}^{2} (ℑ_{\nabla δ}^{2} ℑ_{W}^{2} + ℑ_{\nabla ξ}^{2}) \\ - (1 / 2) (α k_{m i n} ζ ζ^{T} - (1 / Y^{2}) P_{M}^{2} ℑ_{\nabla δ}^{2}) {∥{\tilde{W}}_{c}∥}^{2} + 8 Ψ^{2} m + 2 b_{ξ_{u^{*}}}^{2} + (α / 2) ℑ_{ξ_{c}}^{2} . \end{matrix}

(56)
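The step from Equations (52) and (55) to Equation (56) can be spelled out as a term-by-term sum. The sketch below assumes that Equation (37) defines L = L_{1} + L_{2}, which is what the separate derivatives above suggest; only the two terms in the weight error need regrouping.

```latex
\begin{aligned}
\dot{L} &= \dot{L}_{1} + \dot{L}_{2} \\
&\leq - k_{min}(Q) \|x\|^{2}
  + \frac{1}{2 Y^{2}} P_{M}^{2} ℑ_{\nabla δ}^{2} ℑ_{W_{c}}^{2}
  + \frac{1}{2} G_{M}^{2} \left( ℑ_{\nabla δ}^{2} ℑ_{W_{c}}^{2} + ℑ_{\nabla ξ}^{2} \right)
  + 8 Ψ^{2} m + 2 b_{ξ_{u^{*}}}^{2} + \frac{α}{2} ℑ_{ξ_{c}}^{2} \\
&\quad + \frac{1}{2 Y^{2}} P_{M}^{2} ℑ_{\nabla δ}^{2} \|\tilde{W}_{c}\|^{2}
  - \frac{α}{2} k_{min}(ζ ζ^{T}) \|\tilde{W}_{c}\|^{2} \\
&= \left[ \text{terms of the first line} \right]
  - \frac{1}{2} \left( α k_{min}(ζ ζ^{T}) - \frac{1}{Y^{2}} P_{M}^{2} ℑ_{\nabla δ}^{2} \right) \|\tilde{W}_{c}\|^{2} .
\end{aligned}
```

The coefficient of the weight-error term is negative only when α k_{min}(ζ ζ^{T}) exceeds (1 / Y^{2}) P_{M}^{2} ℑ_{\nabla δ}^{2}, which is the same condition that keeps the denominator in Equation (58) positive.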

Finally,

\dot{L} < 0

is true if

x \notin ℧ (x)

or

{\tilde{W}}_{c} \notin ℧ ({\tilde{W}}_{c})

, and based on Equation (36),

℧ (x)

and

℧ ({\tilde{W}}_{c})

can be respectively formulated as

\begin{matrix} ℧ (x) = \{∥x∥ \leq \sqrt{\frac{α ℑ_{ξ_{c}}^{2} + Ξ + A_{1} P_{M}^{2}}{2 k_{m i n} (Q)}}\}, \end{matrix}

(57)

and

\begin{matrix} ℧ ({\tilde{W}}_{c}) = \{∥{\tilde{W}}_{c}∥ \leq \sqrt{\frac{α ℑ_{ξ_{c}}^{2} + Ξ + A_{1} P_{M}^{2}}{α k_{m i n} (ζ ζ^{T}) - A_{2} P_{M}^{2}}}\}, \end{matrix}

(58)

where

Ξ = G_{M}^{2} (ℑ_{\nabla δ}^{2} ℑ_{W_{c}}^{2} + ℑ_{\nabla ξ}^{2}) + 16 Ψ^{2} m + 4 b_{ξ_{u^{*}}}^{2}

,

A_{1} = (1 / Y^{2}) ℑ_{\nabla δ}^{2} ℑ_{W_{c}}^{2}

and

A_{2} = (1 / Y^{2}) ℑ_{\nabla δ}^{2}

.

To summarize, the Lyapunov stability method shows that the state x of Equation (1) and

{\tilde{W}}_{c}

are UUB, with Equations (57) and (58) giving their respective bounds. The proof is complete. □
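To make the bounds in Equations (57) and (58) concrete, the following minimal sketch evaluates both radii for a given set of constants. The function name, argument names, and all numerical values are hypothetical placeholders chosen for illustration; the paper does not report these constants, and m is taken here to be the input dimension, which is an assumption.

```python
import math


def uub_bounds(alpha, k_min_Q, k_min_zeta, P_M, G_M, Y, Psi, m,
               b_xi_u, bnd_grad_delta, bnd_W, bnd_grad_xi, bnd_xi_c):
    """Return the radii of the residual sets in Equations (57) and (58).

    All arguments are the scalar bounds appearing in the proof; the names
    are illustrative and not taken from the paper.
    """
    # Xi, A_1 and A_2 as defined after Equation (58).
    Xi = (G_M ** 2 * (bnd_grad_delta ** 2 * bnd_W ** 2 + bnd_grad_xi ** 2)
          + 16.0 * Psi ** 2 * m + 4.0 * b_xi_u ** 2)
    A1 = bnd_grad_delta ** 2 * bnd_W ** 2 / Y ** 2
    A2 = bnd_grad_delta ** 2 / Y ** 2

    numerator = alpha * bnd_xi_c ** 2 + Xi + A1 * P_M ** 2
    denom_w = alpha * k_min_zeta - A2 * P_M ** 2
    if denom_w <= 0.0:
        # Equation (58) is only meaningful for a positive denominator.
        raise ValueError("need alpha * k_min(zeta zeta^T) > A2 * P_M^2")

    x_radius = math.sqrt(numerator / (2.0 * k_min_Q))
    w_radius = math.sqrt(numerator / denom_w)
    return x_radius, w_radius


# Hypothetical constants, for illustration only.
x_r, w_r = uub_bounds(alpha=10.0, k_min_Q=1.0, k_min_zeta=0.5, P_M=1.0,
                      G_M=1.0, Y=2.0, Psi=1.5, m=1, b_xi_u=0.1,
                      bnd_grad_delta=1.0, bnd_W=1.0, bnd_grad_xi=0.1,
                      bnd_xi_c=0.1)
print(x_r, w_r)
```

Under these placeholder values the check on the denominator passes; lowering alpha far enough makes it fail, which mirrors the requirement that the learning gain dominate the disturbance-related term.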

5. Simulation Study

In this section, two examples are used to validate the proposed approach.

5.1. Example 1

Consider the F16 aircraft plant used in [28] as

\dot{x} = F (x) + G (x) u + P (x) v,

(59)

where

x (t) = {[x_{1}, x_{2}, x_{3}]}^{T} \in R^{3}

with

x_{0} = {[1, - 1, 1]}^{T}

represents the system state vector, where

x_{1}, x_{2}

and

x_{3}

represent the attack angle, the pitch rate, and the elevator deflection angle, respectively.

u

is the control input, and

v

is the disturbance input. The internal dynamics, control, and disturbance coefficient matrices are expressed as

F (x) = [\begin{matrix} - 1.01887 x_{1} + 0.90506 x_{2} - 0.00215 x_{3} \\ 0.82225 x_{1} - 1.07741 x_{2} - 0.17555 x_{3} \\ - x_{3} \end{matrix}], G (x) = [\begin{matrix} 0 \\ 0 \\ 1 \end{matrix}], P (x) = [\begin{matrix} 0 \\ 0 \\ - 1 \end{matrix}] .
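As a quick way to experiment with this plant, the sketch below encodes F(x), G(x), and P(x) exactly as printed above and evaluates the right-hand side of Equation (59). The function name `f16_rhs` is an illustrative choice, not something defined in the paper.

```python
def f16_rhs(x, u, v):
    """Right-hand side of Equation (59) for the F16 example.

    x is a length-3 sequence (attack angle, pitch rate, elevator deflection),
    u is the scalar control input and v the scalar disturbance input.
    """
    x1, x2, x3 = x
    # Internal dynamics F(x), coefficients copied from the text.
    F = (-1.01887 * x1 + 0.90506 * x2 - 0.00215 * x3,
         0.82225 * x1 - 1.07741 * x2 - 0.17555 * x3,
         -x3)
    G = (0.0, 0.0, 1.0)    # control coefficient matrix
    P = (0.0, 0.0, -1.0)   # disturbance coefficient matrix
    return tuple(F[i] + G[i] * u + P[i] * v for i in range(3))


# Drift at the initial state x0 = [1, -1, 1] with no control or disturbance.
dx0 = f16_rhs((1.0, -1.0, 1.0), u=0.0, v=0.0)
print(dx0)
```

Plain tuples are used to keep the sketch dependency-free; an array library would be the natural choice for a full simulation.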

The control input

u

is constrained to be greater than −1 and less than 2. Hence,

Ψ = 1.5

and

ℑ = 0.5

. The danger region is described as a ball with a radius of

0.15

and a center at

{[0.3, 0.05, - 0.05]}^{T}

. The

y (x)

is chosen as

\frac{1.5 \sqrt{{(x_{1} - 0.3)}^{2} + 0.1 {(x_{2} - 0.05)}^{2} + 1.2 {(x_{3} + 0.05)}^{2}} - 0.15}{\sqrt{{(x_{1} - 0.3)}^{2} + {(x_{2} - 0.05)}^{2} + 25 {(x_{3} + 0.05)}^{2}} - 0.15} .

The

z (x)

is chosen as

\sqrt{{(x_{1} - 0.3)}^{2} + {(x_{2} - 0.05)}^{2} + {(x_{3} + 0.05)}^{2}} - 0.15 .

In addition, substituting

Ψ

and ℑ into Equation (4),

U (u)

can be expressed as

\begin{matrix} U (u) & = 2 Ψ (u - ℑ) \tanh^{- 1} (\frac{u - ℑ}{Ψ}) + Ψ^{2} \ln (1 - \frac{{(u - ℑ)}^{2}}{Ψ^{2}}) \\ & = 3 (u - 0.5) \tanh^{- 1} (\frac{u - 0.5}{1.5}) + 2.25 \ln (1 - \frac{{(u - 0.5)}^{2}}{2.25}) . \end{matrix}

(60)
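The closed form of the input penalty can be checked numerically. The sketch below assumes that Equation (4), which is not reproduced in this section, is the usual integral U(u) = 2Ψ ∫ tanh⁻¹((s − ℑ)/Ψ) ds taken from ℑ to u; under that assumption the logarithm's argument is 1 − (u − ℑ)²/Ψ², which stays positive for every u strictly between −1 and 2. The function names and the midpoint-rule integrator are illustrative choices.

```python
import math


def U_closed(u, psi=1.5, shift=0.5):
    """Closed-form penalty for bounds [shift - psi, shift + psi]."""
    a = (u - shift) / psi
    return 2.0 * psi * (u - shift) * math.atanh(a) + psi ** 2 * math.log(1.0 - a * a)


def U_numeric(u, psi=1.5, shift=0.5, n=20000):
    """Midpoint-rule value of 2*psi * integral of atanh((s - shift)/psi) ds."""
    h = (u - shift) / n
    total = 0.0
    for k in range(n):
        s = shift + (k + 0.5) * h
        total += math.atanh((s - shift) / psi)
    return 2.0 * psi * total * h


for u in (-0.9, 0.0, 0.5, 1.5, 1.9):
    print(u, U_closed(u), U_numeric(u))
```

The penalty is zero at the midpoint u = ℑ = 0.5 and grows as u approaches either bound, which is how the asymmetric constraint enters the cost.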

Letting

Q = I_{3}

and

Y = 2

, the cost function for Equation (59) is formulated as

V (x) = \int_{0}^{\infty} (x {(t)}^{T} Q x (t) + U (u) - 2^{2} {∥v∥}^{2} + B_{r} (x)) d t,

(61)

where

B_{r} (x) = ρ \frac{y (x)}{z (x)}

represents the CBF and

ρ = 2

.

The activation function is given as

δ (x) = {[x_{1}^{2}, x_{1} x_{2}, x_{1} x_{3}, x_{2}^{2}, x_{2} x_{3}, x_{3}^{2}]}^{T}

and the CNN weight vector is

{\hat{W}}_{c} = {[{\hat{W}}_{c 1}, {\hat{W}}_{c 2}, {\hat{W}}_{c 3}, {\hat{W}}_{c 4}, {\hat{W}}_{c 5}, {\hat{W}}_{c 6}]}^{T}

. In addition, the adjustable parameter

α

is 10, and the initial weights of the CNN are all set to 1. Finally, the probing noise

\exp (- 0.1 t) (0.001) (\sin^{2} (t) \cos (t) + \sin^{2} (2 t) \cos (0.1 t))

is added to the control input for the first 30 s to ensure persistence of excitation.
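The probing signal for Example 1 is simple enough to write down directly. The sketch below implements the expression above and switches it off after 30 s, as the text describes; the function name and the `duration` parameter are illustrative.

```python
import math


def probing_noise_ex1(t, duration=30.0):
    """Probing noise of Example 1, active only on 0 <= t <= duration."""
    if t < 0.0 or t > duration:
        return 0.0
    envelope = math.exp(-0.1 * t) * 0.001
    signal = (math.sin(t) ** 2 * math.cos(t)
              + math.sin(2.0 * t) ** 2 * math.cos(0.1 * t))
    return envelope * signal


# Sample the signal once per second over the excitation window.
samples = [probing_noise_ex1(float(k)) for k in range(0, 31)]
print(max(abs(s) for s in samples))
```

Because each of the two trigonometric products is bounded by 1 in magnitude, the noise never exceeds 0.002 in absolute value, which is small relative to the input range of −1 to 2.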

The simulation results are shown in Figures 1–7. Figure 1 shows that

{\hat{W}}_{c}

converges after the first 10 s, from which the ideal weight vector is obtained as

W_{c}^{*} = {[16.4603, - 6.5022, - 4.3910, 4.8851, 3.7081, 11.6158]}^{T}

. Figure 2 displays the convergence of the states

x_{1}

,

x_{2}

, and

x_{3}

. Figure 3 displays the danger region, represented by the ball; the original states enter this region, whereas the system states under the safe optimal controller bypass the ball. As the damping factor

ρ

increases, the distance between the system states and the danger region grows. Figure 3 also shows that as states

x_{1}

,

x_{2}

, and

x_{3}

gradually approach the danger zone, the convergence of

x_{3}

is accelerated due to the CBF and cost function. Figure 4 presents the control input

u

with asymmetric input constraints. The plot reveals that the value of

u

remains within the specified range, bounded by

u_{m a x} = 2

and

u_{m i n} = - 1

, providing evidence that the asymmetric input constraints are implemented successfully. Figure 5 presents the disturbance input

v

. Figure 6 presents the cost function of the system. When the system states approach the danger area, the cost function changes significantly, and it eventually converges to zero. Because the cost function imposes a higher penalty on control actions that violate the asymmetric input constraints or the safety constraints, its convergence to zero indicates that the system has found optimal control actions that satisfy all the constraints.
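The reported weights can be plugged back into the critic to see how the bounded input arises. The sketch below evaluates the value estimate as the inner product of the reported weight vector with δ(x), and then forms a control of the form u = −Ψ tanh((1/(2Ψ)) G(x)ᵀ ∇V̂(x)) + ℑ. That policy expression is an assumption here: it is the standard form for this kind of non-quadratic penalty, but the paper's own policy equation lies outside this section. All function names are illustrative.

```python
import math

# Converged critic weights reported for Example 1.
W_STAR = [16.4603, -6.5022, -4.3910, 4.8851, 3.7081, 11.6158]
PSI, SHIFT = 1.5, 0.5  # half-width and midpoint of the input range [-1, 2]


def delta(x):
    """Activation vector of Example 1."""
    x1, x2, x3 = x
    return [x1 * x1, x1 * x2, x1 * x3, x2 * x2, x2 * x3, x3 * x3]


def value_estimate(x, w=W_STAR):
    """Critic output: weighted sum of the activation functions."""
    return sum(wi * di for wi, di in zip(w, delta(x)))


def grad_value_x3(x, w=W_STAR):
    """Partial derivative of the critic output with respect to x3.

    Since G(x) = [0, 0, 1]^T, this equals G(x)^T times the gradient.
    """
    x1, x2, x3 = x
    return w[2] * x1 + w[4] * x2 + 2.0 * w[5] * x3


def control(x, w=W_STAR):
    """Assumed saturated policy; tanh keeps u inside [-1, 2]."""
    return -PSI * math.tanh(grad_value_x3(x, w) / (2.0 * PSI)) + SHIFT


x0 = (1.0, -1.0, 1.0)
print(value_estimate(x0), control(x0))
```

Whatever the state, the hyperbolic tangent limits the first term to magnitude Ψ = 1.5, so the output cannot leave the interval from −1 to 2; this is the mechanism behind the behavior reported for Figure 4.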

In order to further show the efficiency of the presented method, Equation (4) is redefined as

u^{T} R u

(where R =

I_{1}

), and the simulation results are illustrated in Figure 7. The control input in Figure 4 stays within the limits of −1 and 2, whereas the input in Figure 7 clearly leaves this range.

5.2. Example 2

We consider the nonlinear system as

\dot{x} = F (x) + G (x) u + P (x) v,

(62)

where

x (t) = {[x_{1}, x_{2}]}^{T} \in R^{2}

with

x_{0} = {[1, - 1]}^{T}

represents the system state vector; the internal dynamics, control, and disturbance coefficient matrices are expressed as

F (x) = [\begin{matrix} - \frac{1}{2} x_{1} + x_{2} \\ - 2 x_{2} c o s (2 x_{1}) \end{matrix}], G (x) = [\begin{matrix} 0 \\ - x_{1} \end{matrix}], P (x) = [\begin{matrix} 0 \\ x_{1} \end{matrix}] .

As in Example 1, the control input

u

is subject to asymmetric bounds, with a lower bound of −1 and an upper bound of 3. Hence,

Ψ = 2

and

ℑ = 1

. The danger region is described as a circle with radius

0.1

and center

{[0.19, - 0.12]}^{T}

. The

y (x)

is chosen as

\arctan (\frac{1}{\sqrt{{(x_{1} - 0.19)}^{2} + {(x_{2} + 0.12)}^{2}} - 0.1}) .

The

z (x)

is chosen as

\sqrt{{(x_{1} - 0.19)}^{2} + {(x_{2} + 0.12)}^{2}} - 0.1 .

In addition, substituting

Ψ

and ℑ into Equation (4),

U (u)

can be expressed as

\begin{matrix} U (u) & = 2 Ψ (u - ℑ) \tanh^{- 1} (\frac{u - ℑ}{Ψ}) + Ψ^{2} \ln (1 - \frac{{(u - ℑ)}^{2}}{Ψ^{2}}) \\ & = 4 (u - 1) \tanh^{- 1} (\frac{u - 1}{2}) + 4 \ln (1 - \frac{{(u - 1)}^{2}}{4}) . \end{matrix}

(63)

Letting

Q = I_{2}

and

Y = 1.35

, the cost function for Equation (62) is formulated as

V (x) = \int_{0}^{\infty} (x {(t)}^{T} Q x (t) + U (u) - {1.35}^{2} {∥v∥}^{2} + B_{r} (x)) d t,

(64)

where

B_{r} (x) = ρ \frac{y (x)}{z (x)}

represents the CBF and

ρ = 0.3

.
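The barrier term for Example 2 can be evaluated directly from the definitions of y(x) and z(x) above. The sketch below is a minimal illustration; the function names are hypothetical, and the guard against states inside the danger circle is an added safety check rather than something the paper specifies.

```python
import math

CENTER = (0.19, -0.12)  # center of the danger circle
RADIUS = 0.1            # radius of the danger circle
RHO = 0.3               # damping factor used in Example 2


def z(x):
    """Signed distance from x to the boundary of the danger circle."""
    return math.hypot(x[0] - CENTER[0], x[1] - CENTER[1]) - RADIUS


def y(x):
    """Numerator function y(x) of the barrier term."""
    return math.atan(1.0 / z(x))


def barrier(x, rho=RHO):
    """CBF term B_r(x) = rho * y(x) / z(x), defined outside the circle."""
    zx = z(x)
    if zx <= 0.0:
        raise ValueError("state is inside or on the danger region")
    return rho * math.atan(1.0 / zx) / zx


far = barrier((1.0, -1.0))     # initial state of Example 2
near = barrier((0.30, -0.12))  # 0.01 outside the circle boundary
print(far, near)
```

As the state approaches the boundary, z(x) tends to zero from above while y(x) tends to π/2, so the penalty grows without bound; scaling ρ scales the whole term, which is how the damping factor trades safety margin against optimality.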

Then, the CNN presented as Equation (18) is applied to address the HJI equation for Equation (62). The activation function is given as

δ (x) = {[x_{1}^{2}, x_{1} x_{2}, x_{2}^{2}, x_{1}^{4}, x_{1}^{3} x_{2}, x_{1}^{2} x_{2}^{2}, x_{1} x_{2}^{3}, x_{2}^{4}]}^{T}

and the CNN weight vector is

{\hat{W}}_{c} = {[{\hat{W}}_{c 1}, {\hat{W}}_{c 2}, {\hat{W}}_{c 3}, {\hat{W}}_{c 4}, {\hat{W}}_{c 5}, {\hat{W}}_{c 6}, {\hat{W}}_{c 7}, {\hat{W}}_{c 8}]}^{T}

. In addition, the adjustable parameter

α

is 20, and the initial weights of the CNN are all set to 1. Finally, the probing noise

\exp (- 0.001 t) (- 0.1) (\sin^{2} (t) \cos (t) + \sin^{5} (t) + \sin^{2} (2 t) \cos (0.1 t) + \sin^{2} (- 1.2 t) \cos (0.5 t))

is added to the control input for the first 30 s.

The simulation results are shown in Figures 8–14. Figure 8 shows that

{\hat{W}}_{c}

converges after the first 10 s, from which the ideal weight vector is obtained as

W_{c}^{*} = {[84.6487, - 12.2017, 9.5269, 11.7425, - 3.0924, 3.4273, - 0.5533, 2.0591]}^{T}

. Figure 9 displays the convergence of the states

x_{1}

and

x_{2}

. Figure 10 illustrates the relationship between the system states and the dangerous area, revealing that increasing the damping factor

ρ

leads to a greater distance between the system states and the dangerous zone. Evidently, system states

x_{1}

and

x_{2}

with a safe and optimal controller take an alternate route to avoid the dangerous region, while the conventional optimal controller cannot circumvent the dangerous region. As can be seen from Figure 10, when states

x_{1}

and

x_{2}

gradually approach the danger zone, the convergence speed of

x_{2}

is accelerated by the CBF and the cost function, and the system again obtains an optimal trajectory around the danger zone. Figure 11 shows the control input

u

with asymmetric input constraints. The plot reveals that the value of

u

remains within the specified range, bounded by

u_{m a x} = 3

and

u_{m i n} = - 1

, providing evidence that the asymmetric input constraints are implemented successfully. Figure 12 presents disturbance input

v

. Figure 13 presents the cost function of the system, which eventually converges to zero. As with the linear system in Example 1, this convergence indicates that the system has found the optimal control action satisfying the asymmetric input constraints and the safety constraints.

In this paper, asymmetric input constraints and unmatched disturbances are considered together for nonlinear safety-critical systems for the first time, and Equation (4) is used to handle the asymmetric input constraints. To further demonstrate the efficacy of the presented algorithm, Equation (4) is redefined, as in [14,16,28], as

u^{T} R u

(where R =

I_{1}

) and the simulation results are shown in Figure 14. The control input in Figure 11 stays within the limits of −1 and 3, whereas the input in Figure 14 is clearly outside this range.

6. Conclusions

The safe optimal control problem for nonlinear CT safety-critical systems with asymmetric input constraints and unmatched disturbances was addressed. First, a new non-quadratic function was introduced to handle the asymmetric input constraints. Then, the control design was transformed into a two-player ZSG problem to handle unmatched disturbances. To obtain a safe optimal controller, the CBF was combined directly with the cost function to penalize unsafe behavior. Moreover, a single CNN was used, which reduces the computational load relative to the conventional actor–critic network. The simulation results validated the effectiveness of the proposed method.

Author Contributions

C.Q. and K.J. provided methodology, validation, and writing—original draft preparation; T.Z. provided conceptualization, writing—review; J.Z. provided supervision; C.Q. provided funding support. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Science and Technology Research Project of Henan Province (222102240014).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The authors can confirm that all relevant data are included in the article.

Conflicts of Interest

The authors declare that they have no conflict of interest. All authors have approved the manuscript and agreed with submission to this journal.

References

1. Yi, X.; Luo, B.; Zhao, Y. Adaptive dynamic programming-based visual servoing control for quadrotor. Neurocomputing 2022, 504, 251–261.
2. Liscouët, J.; Pollet, F.; Jézégou, J.; Budinger, M.; Delbecq, S.; Moschetta, J. A methodology to integrate reliability into the conceptual design of safety-critical multirotor unmanned aerial vehicles. Aerosp. Sci. Technol. 2022, 127, 107681.
3. Dou, L.; Cai, S.; Zhang, X.; Su, X.; Zhang, R. Event-triggered-based adaptive dynamic programming for distributed formation control of multi-UAV. J. Frankl. Inst. 2022, 359, 3671–3691.
4. Molnar, T.; Cosner, R.; Singletary, A.; Ubellacker, W.; Ames, A. Model-free safety-critical control for robotic systems. IEEE Robot. Autom. Lett. 2021, 7, 944–951.
5. Nguyen, Q.; Sreenath, K. Robust safety-critical control for dynamic robotics. IEEE Trans. Autom. Control 2021, 67, 1073–1088.
6. Liu, S.; Liu, L.; Yu, Z. Safe reinforcement learning for affine nonlinear systems with state constraints and input saturation using control barrier functions. Neurocomputing 2023, 518, 562–576.
7. Han, J.; Liu, X.; Wei, X.; Sun, S. A dynamic proportional-integral observer-based nonlinear fault-tolerant controller design for nonlinear system with partially unknown dynamic. IEEE Trans. Syst. Man Cybern. Syst. 2021, 52, 5092–5104.
8. Ohnishi, M.; Wang, L.; Notomista, G.; Egerstedt, M. Barrier-certified adaptive reinforcement learning with applications to brushbot navigation. IEEE Trans. Robot. 2019, 35, 1186–1205.
9. Bianchi, D.; Di Gennaro, S.; Di Ferdinando, M.; Acosta Lùa, C. Robust Control of UAV with Disturbances and Uncertainty Estimation. Machines 2023, 11, 352.
10. Bianchi, D.; Borri, A.; Di Benedetto, M.; Di Gennaro, S. Active Attitude Control of Ground Vehicles with Partially Unknown Model. IFAC-PapersOnLine 2020, 53, 14420–14425.
11. Ames, A.; Xu, X.; Grizzle, J.; Tabuada, P. Control barrier function based quadratic programs for safety critical systems. IEEE Trans. Autom. Control 2016, 62, 3861–3876.
12. Wang, H.; Peng, J.; Zhang, F.; Zhang, H.; Wang, Y. High-order control barrier functions-based impedance control of a robotic manipulator with time-varying output constraints. ISA Trans. 2022, 129, 361–369.
13. Liu, S.; Liu, L.; Yu, Z. Safe reinforcement learning for discrete-time fully cooperative games with partial state and control constraints using control barrier functions. Neurocomputing 2023, 517, 118–132.
14. Qin, C.; Wang, J.; Zhu, H.; Zhang, J.; Hu, S.; Zhang, D. Neural network-based safe optimal robust control for affine nonlinear systems with unmatched disturbances. Neurocomputing 2022, 506, 228–239.
15. Xu, X.; Tabuada, P.; Grizzle, J.W.; Ames, A. Robustness of control barrier functions for safety critical control. IFAC-PapersOnLine 2015, 48, 54–61.
16. Marvi, Z.; Kiumarsi, B. Safe reinforcement learning: A control barrier function optimization approach. Int. J. Robust Nonlinear Control 2021, 31, 1923–1940.
17. Xiao, W.; Belta, C.; Cassandras, C. Adaptive control barrier functions. IEEE Trans. Autom. Control 2021, 67, 2267–2281.
18. Modares, H.; Lewis, F.; Sistani, M. Online solution of nonquadratic two-player zero-sum games arising in the H∞ control of constrained input systems. Int. J. Adapt. Control Signal Process. 2014, 28, 232–254.
19. Qin, C.; Zhu, H.; Wang, J.; Xiao, Q.; Zhang, D. Event-triggered safe control for the zero-sum game of nonlinear safety-critical systems with input saturation. IEEE Access 2022, 10, 40324–40337.
20. Song, R.; Zhu, L. Stable value iteration for two-player zero-sum game of discrete-time nonlinear systems based on adaptive dynamic programming. Neurocomputing 2019, 340, 180–195.
21. Lu, W.; Li, Q.; Lu, K.; Lu, Y.; Guo, L.; Yan, W.; Xu, F. Load adaptive PMSM drive system based on an improved ADRC for manipulator joint. IEEE Access 2021, 9, 33369–33384.
22. Qin, C.; Qiao, X.; Wang, J.; Zhang, D. Robust Trajectory Tracking Control for Continuous-Time Nonlinear Systems with State Constraints and Uncertain Disturbances. Entropy 2022, 24, 816.
23. Fan, Q.; Yang, G. Adaptive actor–critic design-based integral sliding-mode control for partially unknown nonlinear systems with input disturbances. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 165–177.
24. Yang, X.; He, H. Event-driven H∞-constrained control using adaptive critic learning. IEEE Trans. Cybern. 2020, 51, 4860–4872.
25. Lewis, F.; Vrabie, D.; Syrmos, V. Optimal Control; John Wiley & Sons: Hoboken, NJ, USA, 2012.
26. Kiumarsi, B.; Vamvoudakis, K.; Modares, H.; Lewis, F. Optimal and autonomous control using reinforcement learning: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 2042–2062.
27. Liu, D.; Xue, S.; Zhao, B.; Luo, B.; Wei, Q. Adaptive dynamic programming for control: A survey and recent advances. IEEE Trans. Syst. Man Cybern. Syst. 2020, 51, 142–160.
28. Vamvoudakis, K.; Lewis, F. Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 2010, 46, 878–888.
29. Han, H.; Zhang, J.; Yang, H.; Hou, Y.; Qiao, J. Data-driven robust optimal control for nonlinear system with uncertain disturbances. Inf. Sci. 2023, 621, 248–264.
30. Lou, X.; Zhang, X.; Ye, Q. Robust control for uncertain impulsive systems with input constraints and external disturbance. Int. J. Robust Nonlinear Control 2022, 32, 2330–2343.
31. Wang, N.; Gao, Y.; Yang, C.; Zhang, X. Reinforcement learning-based finite-time tracking control of an unknown unmanned surface vehicle with input constraints. Neurocomputing 2022, 484, 26–37.
32. Liu, C.; Zhang, H.; Xiao, G.; Sun, S. Integral reinforcement learning based decentralized optimal tracking control of unknown nonlinear large-scale interconnected systems with constrained-input. Neurocomputing 2019, 323, 1–11.
33. Yang, X.; Zhao, B. Optimal neuro-control strategy for nonlinear systems with asymmetric input constraints. IEEE/CAA J. Autom. Sin. 2020, 7, 575–583.
34. Tang, Y.; Yang, X. Robust tracking control with reinforcement learning for nonlinear-constrained systems. Int. J. Robust Nonlinear Control 2022, 32, 9902–9919.
35. Zhou, W.; Liu, H.; He, H.; Yi, J.; Li, T. Neuro-optimal tracking control for continuous stirred tank reactor with input constraints. IEEE Trans. Ind. Inform. 2018, 15, 4516–4524.
36. Kong, L.; He, W.; Dong, Y.; Cheng, L.; Yang, C.; Li, Z. Asymmetric bounded neural control for an uncertain robot by state feedback and output feedback. IEEE Trans. Syst. Man Cybern. Syst. 2019, 51, 1735–1746.
37. Zhang, Y.; Zhao, B.; Liu, D.; Zhang, S. Event-triggered control of discrete-time zero-sum games via deterministic policy gradient adaptive dynamic programming. IEEE Trans. Syst. Man Cybern. Syst. 2021, 52, 4823–4835.
38. Zhang, S.; Zhao, B.; Liu, D.; Zhang, Y. Observer-based event-triggered control for zero-sum games of input constrained multi-player nonlinear systems. Neural Netw. 2021, 144, 101–112.
39. Wei, Q.; Liu, D.; Lin, Q.; Song, R. Adaptive dynamic programming for discrete-time zero-sum games. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 957–969.
40. Perrusquía, A.; Yu, W. Continuous-time reinforcement learning for robust control under worst-case uncertainty. Int. J. Syst. Sci. 2021, 52, 770–784.
41. Yang, Y.; Vamvoudakis, K.; Modares, H.; Yin, Y.; Wunsch, D. Safe intermittent reinforcement learning with static and dynamic event generators. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 5441–5455.
42. Fu, Z.; Xie, W.; Rakheja, S.; Na, J. Observer-based adaptive optimal control for unknown singularly perturbed nonlinear systems with input constraints. IEEE/CAA J. Autom. Sin. 2017, 4, 48–57.

Figure 1. Convergence of the CNN weights.

Figure 2. Convergence of system states

x_{1}

,

x_{2}

, and

x_{3}

.

Figure 3. The comparison between the safe and unsafe states.

Figure 4. Control input in the system.

Figure 5. Disturbance input in the system.

Figure 6. The cost function of the system.

Figure 7. Control input without asymmetric input constraints.

Figure 8. Convergence of the CNN weights.

Figure 9. Convergence of system states

x_{1}

and

x_{2}

.

Figure 10. The comparison between the safe and unsafe states.

Figure 11. Control input in the system.

Figure 12. Disturbance input in the system.

Figure 13. The cost function of the system.

Figure 14. Control input without asymmetric input constraints.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
