Article

Constrained Optimal Control for Nonlinear Multi-Input Safety-Critical Systems with Time-Varying Safety Constraints

School of Artificial Intelligence, Henan University, Zhengzhou 450000, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(15), 2744; https://doi.org/10.3390/math10152744
Submission received: 11 July 2022 / Revised: 30 July 2022 / Accepted: 1 August 2022 / Published: 3 August 2022
(This article belongs to the Topic Advances in Nonlinear Dynamics: Methods and Applications)

Abstract
In this paper, we investigate the constrained optimal control problem of nonlinear multi-input safety-critical systems with uncertain disturbances and time-varying safety constraints. By utilizing a barrier function transformation, together with a new disturbance-related term and a smooth safety boundary function, a nominal-system-dependent multi-input barrier transformation architecture is developed to deal with the time-varying safety constraints and uncertain disturbances. Based on the resulting transformed system, the coupled Hamilton–Jacobi–Bellman (HJB) equations are established to obtain the constrained Nash equilibrium solution. Since the HJB equations are difficult to solve directly, a single critic neural network (NN) is constructed for each control input to approximate its optimal performance index function. It is proved theoretically that, under the influence of uncertain disturbances and time-varying safety constraints, the system states and neural network parameters remain uniformly ultimately bounded (UUB) under the proposed neural network approximation method. Finally, the effectiveness of the proposed method is verified by two nonlinear simulation examples.

1. Introduction

For the optimal control of safety-critical systems (e.g., autonomous vehicles, intelligent robots, etc.), safety is the basic requirement. Failure to ensure the safety of such systems may result in serious consequences, such as casualties, environmental pollution, and equipment damage. Safety control design refers to a control strategy that satisfies the safety specifications stipulated by the physical or environmental constraints of the system. The barrier function (BF) method [1,2] has proved to be an effective way to enforce system safety constraints or state constraints, and has attracted wide attention in recent years. The optimal control problem in the modern control domain usually relies on solving the complex Hamilton–Jacobi–Bellman (HJB) equation [3,4,5]. However, there is no effective mathematical method to solve the HJB equation in general, owing to its nonlinear nature. When designing controllers that are both safe and optimal, the proper combination of the safety and performance goals is an issue worth studying.
The dynamic programming (DP) method has proved to be a feasible and effective way to solve the HJB equation and derive the optimal solution. However, as the dimension of the variables increases, dynamic programming suffers from the "curse of dimensionality". Adaptive dynamic programming (ADP) [6,7,8,9,10] uses function approximation, such as neural network (NN) methods, to approximate the cost function in the HJB equation, and has proved to be a valid way to alleviate the curse of dimensionality. It is an emerging method combining artificial intelligence with the control field, and has become a hotspot of optimization research in recent years [11,12,13,14,15]. In [11,12,13], the authors studied the optimal control problem with disturbances by using the reinforcement learning (RL) method. For stochastic differential equation systems with coexisting parametric uncertainties and severe nonlinearities, Zhang et al. [14] studied the problem of event-triggered adaptive tracking control. Vamvoudakis et al. [15] proposed an online continuous-time learning algorithm based on policy iteration to learn the optimal control solutions of known nonlinear systems. In [16,17,18], the robust control problem was transformed into the optimal control problem of the nominal system by selecting an appropriate utility function. On the other hand, game theory [19,20,21,22,23,24] has become a powerful tool to optimize the coordination and cooperation of multiple controllers, and has proved useful in many practical control problems. In fact, many systems in the real world embody the idea of the non-zero-sum (NZS) game, where each controller of the system tries to minimize its own cost function. Many researchers translate the non-zero-sum game problem [25,26] into the problem of solving the coupled HJB equations, but solving the coupled HJB equations remains a great difficulty [27,28,29].
The development of adaptive dynamic programming and game theory has prompted many scholars to conduct relevant research. For robust trajectory-tracking multi-input control of uncertain nonlinear systems, Qin et al. [28] proposed a new adaptive online learning method to learn the Nash equilibrium solution. Song et al. [29] developed an off-policy integral reinforcement learning (IRL) method to effectively solve the NZS game control problem with unknown system dynamics. Ming et al. [30] proposed a single-network adaptive control method to obtain the optimal solution of the NZS differential game for autonomous nonlinear systems. All of the above methods can effectively solve the NZS game optimal control problem. However, few studies have considered the NZS game with disturbances and time-varying safety constraints, which motivates the present study.
For safety constraints, methods based on the barrier function and adaptive dynamic programming have received much attention in recent years. Marvi et al. [31] proposed a barrier-certified method to learn the safe optimal controller and ensure that the safety-critical system operates within its safety zone while providing optimal performance. By introducing the barrier function into the utility function, Xu et al. [32] augmented the utility function with a penalty mechanism and solved the state-constraint problem that is difficult to handle with the traditional ADP method. Liu et al. [33] proposed an adaptive control method to obtain the safe solution of nonlinear stochastic systems. In addition, the barrier function transformation method has been shown to transform a safety-critical system with safety constraints into a general unconstrained system in different scenarios, such as the zero-sum game [34], non-zero-sum game [35], tracking control [36], and event-triggered control [37]. However, without exception, the above results rely on the implicit assumption that the safety constraints are constant. In fact, the constant constraint is only a special case of time-varying constraints. Time-varying constraints also have a wide range of application scenarios in practice, such as a UAV or manipulator working in more complex environments.
For the constrained optimal control problem with time-varying safety constraints and uncertain disturbances, the constrained Nash equilibrium solutions are obtained by introducing a novel barrier function transformation and constructing coupled HJB equations. The novelty of this paper is reflected in the following points:
(1). A novel barrier function transformation method is proposed by introducing a smooth safety boundary function and a barrier function with a single variable. Compared to previous works [34,35], the proposed method no longer strictly requires time-invariant safety constraints and can deal with both time-invariant and time-varying safety constraints.
(2). In order to obtain the constrained optimal Nash equilibrium solution of the multi-input barrier-transformed system with uncertain disturbances, reasonable performance index functions and coupled HJB equations are designed for the nominal system by introducing a disturbance-related term. It is proved that the obtained constrained Nash equilibrium solution makes the safety-critical system asymptotically stable under uncertain disturbances and time-varying safety constraints.
(3). A single critic neural network is used to approximate the performance index function online to obtain the constrained control input. It is proved theoretically that the proposed barrier function transformation and neural network approximation method keep the system state and NN parameters uniformly ultimately bounded (UUB) while satisfying the time-varying safety constraints. In addition, two simulation examples verify the feasibility and effectiveness of the proposed method.
The remainder of this article is organized as follows: Problem formulation and barrier transformation are given in Section 2. Section 3 employs the coupled Hamilton–Jacobi–Bellman equation to obtain the approximate optimal solution online. Section 4 shows the efficiency of the proposed method by giving two simulation examples. Finally, conclusions are given in Section 5.

2. Problem Formulation and Barrier Transformation

Consider the following nonlinear multi-input safety-critical system:
$\dot{x} = f(x(t)) + g_1(x(t))u_1(t) + g_2(x(t))u_2(t) + k(x(t))d(\varphi(x(t))),$ (1)
where $x \in \mathcal{C} \subseteq \mathbb{R}^n$ is the system state, $u_1 \in U_1 \subseteq \mathbb{R}^{m_1}$ and $u_2 \in U_2 \subseteq \mathbb{R}^{m_2}$ are the control inputs, $d(\varphi(x(t))) \in \mathbb{R}^m$ is the uncertain disturbance, and $f(x) \in \mathbb{R}^n$, $g_1(x) \in \mathbb{R}^{n \times m_1}$, $g_2(x) \in \mathbb{R}^{n \times m_2}$, $k(x) \in \mathbb{R}^{n \times m}$. $\mathcal{C}$ denotes the set of admissible system states, and $U_1$, $U_2$ denote the sets of admissible system inputs. It is supposed that $f(x)$, $g_1(x)$, $g_2(x)$ are Lipschitz continuous with $f(0) = 0$. It is also assumed that the system (1) is stabilizable. The uncertain disturbance term $d$ satisfies $d^{T} d < \delta^{T} \delta$, where $\delta$ is a given function with $\delta(0) = 0$, and $\varphi(\cdot)$, satisfying $\varphi(0) = 0$, is a fixed function denoting the uncertainty.
Given the initial system state $x_0$, the purpose of this article is to find constrained control inputs $u_1$, $u_2$ that make the system state $x$ converge to the ideal value under the impact of the uncertain disturbances and time-varying safety constraints.
Remark 1.
In some papers, for example [31,35], the system state is constrained by constants, that is, $x \in (\zeta_a, \zeta_A)$, where $\zeta_a$, $\zeta_A$ represent the lower and upper bounds of the system state. We consider a more complex and interesting case where the system safety constraints are time-varying and can be mathematically expressed as $x \in (\zeta_a(t), \zeta_A(t))$, where $\zeta_a(t)$, $\zeta_A(t)$ are bounded smooth time-varying functions.
In order to satisfy the time-varying safety constraints, we define the following barrier function with a single independent variable $\tau$:

$b(z(\tau); \xi_a(\tau), \xi_A(\tau)) = \log \dfrac{\xi_A(\tau)\,(\xi_a(\tau) - z(\tau))}{\xi_a(\tau)\,(\xi_A(\tau) - z(\tau))},$ (2)

$b^{-1}(y(\tau); \xi_a(\tau), \xi_A(\tau)) = \dfrac{\xi_a(\tau)\,\xi_A(\tau)\,\big(e^{y(\tau)/2} - e^{-y(\tau)/2}\big)}{\xi_a(\tau)\,e^{y(\tau)/2} - \xi_A(\tau)\,e^{-y(\tau)/2}},$ (3)
where $\xi_a(\cdot): \mathbb{R} \to \mathbb{R}$, $\xi_A(\cdot): \mathbb{R} \to \mathbb{R}$, $z(\cdot): \mathbb{R} \to \mathbb{R}$, and $y(\cdot): \mathbb{R} \to \mathbb{R}$. The defined barrier function should satisfy the following assumption.
Assumption 1.
The proposed barrier function b ( · ) has the following characteristics:
(1) $\xi_a(\tau)$, $\xi_A(\tau)$ are two smooth functions satisfying $\xi_a(\tau) < 0 < \xi_A(\tau)$ for any $\tau > 0$;
(2) For any $\tau > 0$, the barrier function takes a finite value when $z(\tau) \in (\xi_a(\tau), \xi_A(\tau))$;
(3) For any $\tau > 0$, as $z(\tau)$ tends to the boundary of the prescribed region $(\xi_a(\tau), \xi_A(\tau))$, $b(\cdot)$ approaches infinity, i.e., $\lim_{z(\tau) \to \xi_a(\tau)^{+}} b(z(\tau); \xi_a(\tau), \xi_A(\tau)) = -\infty$ and $\lim_{z(\tau) \to \xi_A(\tau)^{-}} b(z(\tau); \xi_a(\tau), \xi_A(\tau)) = +\infty$;
(4) For any $\tau > 0$, the barrier function $b(\cdot)$ converges when $z(\tau)$ converges.
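As a concrete check of these properties, the barrier function $b$ and its inverse $b^{-1}$ defined above can be evaluated numerically. The following is a minimal Python sketch (our own code, not from the paper; the function names are ours) showing that the two maps are mutual inverses on $(\xi_a, \xi_A)$ and that $b$ blows up at the boundary:

```python
import math

def barrier(z, xa, xA):
    """b(z; xa, xA) = log( xA*(xa - z) / (xa*(xA - z)) ), finite on (xa, xA), with xa < 0 < xA."""
    return math.log(xA * (xa - z) / (xa * (xA - z)))

def barrier_inv(y, xa, xA):
    """b^{-1}(y; xa, xA): recovers z from y via the half-exponential form."""
    ep, em = math.exp(y / 2.0), math.exp(-y / 2.0)
    return xa * xA * (ep - em) / (xa * ep - xA * em)
```

Note that $b(0) = \log 1 = 0$, so the transformation maps the origin of the constrained state space to the origin of the transformed space; this is what later makes $F(0) = 0$ possible in Theorem 1.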
It is worth noting that the constraints $(\zeta_a(t), \zeta_A(t))$ can take the form of many common trajectories, including sinusoidal waveforms, damped sinusoids, ramps, and so on. In our study, we discuss a particularly useful form: we design the constraints $(\zeta_a(t), \zeta_A(t))$ as the following smooth transition functions satisfying the conditions below:
$\zeta_a(t) = \begin{bmatrix} \zeta_{a1}(t) & \cdots & \zeta_{an}(t) \end{bmatrix}^{T}, \quad \zeta_{ai}(t) = \begin{cases} l_1, & t < t_1 \\ l_1 + \vartheta_1 + \vartheta_1 \cos\!\big(\pi \frac{t_2 - t}{t_2 - t_1}\big), & t_1 \le t \le t_2 \\ l_2, & t > t_2 \end{cases}$ (4)

$\zeta_A(t) = \begin{bmatrix} \zeta_{A1}(t) & \cdots & \zeta_{An}(t) \end{bmatrix}^{T}, \quad \zeta_{Ai}(t) = \begin{cases} l_3, & t < t_3 \\ l_3 - \vartheta_2 - \vartheta_2 \cos\!\big(\pi \frac{t_4 - t}{t_4 - t_3}\big), & t_3 \le t \le t_4 \\ l_4, & t > t_4 \end{cases}$ (5)

where $i = 1, \ldots, n$, $l_1 < 0$, $l_2 < 0$, $l_3 > 0$, $l_4 > 0$, $l_1 + 2\vartheta_1 = l_2$, and $l_3 - 2\vartheta_2 = l_4$. Many practical applications impose similar constraints (e.g., a vehicle entering a narrow road from a wide road, a drone entering a tunnel, a robotic arm working in a narrow space, etc.).
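For illustration, the piecewise cosine transition in Formulas (4) and (5) can be written as one generic function that moves smoothly from a starting level to an ending level over $[t_{\mathrm{start}}, t_{\mathrm{end}}]$. The sketch below is our own code (not from the paper):

```python
import math

def zeta(t, l_start, l_end, t_start, t_end):
    """Smooth cosine transition: l_start for t < t_start, l_end for t > t_end, and
    l_start + theta + theta*cos(pi*(t_end - t)/(t_end - t_start)) in between,
    with theta = (l_end - l_start)/2.  Covers both the rising lower bound (4)
    (theta = +vartheta_1) and the falling upper bound (5) (theta = -vartheta_2)."""
    theta = (l_end - l_start) / 2.0
    if t < t_start:
        return l_start
    if t > t_end:
        return l_end
    return l_start + theta + theta * math.cos(math.pi * (t_end - t) / (t_end - t_start))
```

The transition is continuous and has zero slope at both endpoints (the sine factor of the derivative vanishes at $t_{\mathrm{start}}$ and $t_{\mathrm{end}}$), so the resulting constraint trajectory is smooth.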
Remark 2.
A reasonable choice of parameters, such as $l_1 = l_2$ and $l_3 = l_4$, reduces the smooth transition functions to the constant case. In other words, the proposed method can also impose time-invariant safety constraints on the system state when the parameters are selected properly. In addition, the defined smooth transition function can be extended to scenarios with more complex safety requirements, such as more frequent changes of the constraints and different types of constraints.
Considering the system (1) with uncertain disturbances and time-varying safety constraints, we use the proposed barrier function and smooth transition function to convert the multi-input safety-critical system with uncertain disturbances and time-varying safety constraints into a transformed system with uncertain disturbances only. We define
$s_i = b(x_i(t); \zeta_{ai}(t), \zeta_{Ai}(t)),$ (6)

$x_i = b^{-1}(s_i(t); \zeta_{ai}(t), \zeta_{Ai}(t)).$ (7)
According to the chain rule and Equations (6) and (7), the transformed system dynamics $\dot{s}_i$ can be derived as

$\dot{s}_i = \dfrac{\dot{x}_i}{\,d b^{-1}(s_i(t); \zeta_{ai}(t), \zeta_{Ai}(t))/d s_i\,} = \big( f_i(x(t)) + g_{1i}(x(t))u_1(t) + g_{2i}(x(t))u_2(t) + k_i(x(t))d(\varphi(x(t))) \big)\, \dfrac{\zeta_{ai}^2(t)e^{s_i} - 2\zeta_{ai}(t)\zeta_{Ai}(t) + \zeta_{Ai}^2(t)e^{-s_i}}{\zeta_{Ai}(t)\zeta_{ai}^2(t) - \zeta_{ai}(t)\zeta_{Ai}^2(t)} = F_i(s(t)) + G_{1i}(s(t))u_1(t) + G_{2i}(s(t))u_2(t) + K_i(s(t))d(\varphi(b^{-1}(s(t)))),$ (8)
where

$F_i(s(t)) = \dfrac{\zeta_{ai}^2(t)e^{s_i} - 2\zeta_{ai}(t)\zeta_{Ai}(t) + \zeta_{Ai}^2(t)e^{-s_i}}{\zeta_{Ai}(t)\zeta_{ai}^2(t) - \zeta_{ai}(t)\zeta_{Ai}^2(t)}\, f_i([b^{-1}(s_1), \ldots, b^{-1}(s_n)]),$
$G_{1i}(s(t)) = \dfrac{\zeta_{ai}^2(t)e^{s_i} - 2\zeta_{ai}(t)\zeta_{Ai}(t) + \zeta_{Ai}^2(t)e^{-s_i}}{\zeta_{Ai}(t)\zeta_{ai}^2(t) - \zeta_{ai}(t)\zeta_{Ai}^2(t)}\, g_{1i}([b^{-1}(s_1), \ldots, b^{-1}(s_n)]),$
$G_{2i}(s(t)) = \dfrac{\zeta_{ai}^2(t)e^{s_i} - 2\zeta_{ai}(t)\zeta_{Ai}(t) + \zeta_{Ai}^2(t)e^{-s_i}}{\zeta_{Ai}(t)\zeta_{ai}^2(t) - \zeta_{ai}(t)\zeta_{Ai}^2(t)}\, g_{2i}([b^{-1}(s_1), \ldots, b^{-1}(s_n)]),$
$K_i(s(t)) = \dfrac{\zeta_{ai}^2(t)e^{s_i} - 2\zeta_{ai}(t)\zeta_{Ai}(t) + \zeta_{Ai}^2(t)e^{-s_i}}{\zeta_{Ai}(t)\zeta_{ai}^2(t) - \zeta_{ai}(t)\zeta_{Ai}^2(t)}\, k_i([b^{-1}(s_1), \ldots, b^{-1}(s_n)]).$
Based on Formula (8), the transformed system $s = [s_1; \cdots; s_n]$ can be written as

$\dot{s} = F(s(t)) + G_1(s(t))u_1(t) + G_2(s(t))u_2(t) + K(s(t))d(\varphi(b^{-1}(s(t)))),$ (9)

where $F(s) = [F_1(s); \cdots; F_n(s)]$, $G_1(s) = [G_{11}(s); \cdots; G_{1n}(s)]$, $G_2(s) = [G_{21}(s); \cdots; G_{2n}(s)]$, and $K(s) = [K_1(s); \cdots; K_n(s)]$. For convenience, we write $d$ for $d(\varphi(b^{-1}(s(t))))$ and $s$ for $s(t)$ in the following.
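The scale factor multiplying the original dynamics in (8) is exactly the reciprocal of $d b^{-1}/d s_i$, which is the chain-rule step used to pass from $\dot{x}_i$ to $\dot{s}_i$. As a sanity check (our own sketch, taking the bounds $\xi_a$, $\xi_A$ as constants at a frozen time instant for simplicity), the following code verifies this identity and compares the analytic derivative of $b^{-1}$ against a finite difference:

```python
import math

def dbinv(s, xa, xA):
    """Analytic derivative d b^{-1}/ds = xa*xA*(xa - xA) / (xa*e^{s/2} - xA*e^{-s/2})^2."""
    D = xa * math.exp(s / 2.0) - xA * math.exp(-s / 2.0)
    return xa * xA * (xa - xA) / D**2

def T_factor(s, xa, xA):
    """Scale factor from (8): (xa^2*e^s - 2*xa*xA + xA^2*e^{-s}) / (xA*xa^2 - xa*xA^2)."""
    num = xa**2 * math.exp(s) - 2.0 * xa * xA + xA**2 * math.exp(-s)
    return num / (xA * xa**2 - xa * xA**2)
```

Expanding the square in `dbinv` shows $T(s)\cdot d b^{-1}/ds = 1$ identically, so multiplying $\dot{x}_i$ by the factor in (8) is equivalent to dividing by $d b^{-1}/d s_i$.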
Through the proposed barrier transformation, the constrained optimal control problem for the safety-critical system (1) with uncertain disturbances and time-varying safety constraints has been converted into the constrained optimal control problem for the transformed system (9) with uncertain disturbances only. Before proceeding, we prove the following properties of the transformed system (9).
Theorem 1.
Based on the proposed barrier transformation (6) and (7), the transformed system (9) obtained from the system (1) satisfies the following properties:
(1) $F(s)$ is Lipschitz with $F(0) = 0$, and satisfies $\|F(s)\| \le \lambda_f \|s\|$, where $\lambda_f$ is a constant;
(2) $G_1(s)$, $G_2(s)$ are bounded: there exist constants $\lambda_{1g}$, $\lambda_{2g}$ such that $\|G_1(s)\| \le \lambda_{1g}$, $\|G_2(s)\| \le \lambda_{2g}$. Moreover, the transformed system (9) has zero-state observability.
Proof of Theorem 1. 
(1) Based on Equation (8), we can obtain

$F_i(s) = f_i(x) T_i(s),$ (10)

where $T_i(s) = \dfrac{\zeta_{ai}^2(t)e^{s_i} - 2\zeta_{ai}(t)\zeta_{Ai}(t) + \zeta_{Ai}^2(t)e^{-s_i}}{\zeta_{Ai}(t)\zeta_{ai}^2(t) - \zeta_{ai}(t)\zeta_{Ai}^2(t)}$ and $F_i(0) = f_i(0) = 0$. Based on Assumption 1, as long as $x \in \mathcal{C}$, the transformed system state $s$ is bounded, and hence $T_i(s)$ is bounded. We can derive

$\|F_i(s)\| = \|f_i(x) T_i(s)\| \le \|f_i(x)\| \lambda_\zeta,$ (11)

where $\lambda_\zeta$ represents the upper bound of $T_i(s)$. Based on the assumptions about the system (1), we can obtain

$\|F_i(s_1) - F_i(s_2)\| = \|(f_i(x_1) - f_i(x_2)) T_i(s)\| \le \|x_1 - x_2\| k_{L1} \lambda_\zeta,$ (12)

where $x_1, x_2 \in \mathcal{C}$ and $k_{L1}$ is the Lipschitz constant of $f_i(x)$. Based on the property of the barrier function, $s_1$ and $s_2$ are bounded as long as $x_1, x_2 \in \mathcal{C}$. Hence, for any $x_1, x_2 \in \mathcal{C}$, there is always a constant $k_{L2}$ such that $\|F_i(s_1) - F_i(s_2)\| \le \|s_1 - s_2\| k_{L2}$. Considering the fact that $F(s) = [F_1(s); \cdots; F_n(s)]$, we deduce that

$\|F(s_1) - F(s_2)\| \le \|s_1 - s_2\| k_{L3},$ (13)

where $k_{L3}$ is the Lipschitz constant of $F(s)$. Based on the Lipschitz condition [38], $F(s)$ is Lipschitz continuous. Based on the boundedness of $T_i(s)$ and the assumptions about the system (1), every term in $F_i(s)$ is bounded for $x \in \mathcal{C}$. Therefore, $F(s)$ is also bounded, and there is a constant $\lambda_f$ such that $\|F(s)\| \le \lambda_f \|s\|$.
(2) Based on the boundedness of $T_i(s)$ and Equation (8), $G_{1i}(s)$, $G_{2i}(s)$ are bounded for $x \in \mathcal{C}$. Considering the fact that $G_1(s) = [G_{11}(s); \cdots; G_{1n}(s)]$ and $G_2(s) = [G_{21}(s); \cdots; G_{2n}(s)]$, there are constants $\lambda_{1g}$ and $\lambda_{2g}$ such that $\|G_1(s)\| \le \lambda_{1g}$, $\|G_2(s)\| \le \lambda_{2g}$. Given the initial system state $x_0$, the initial state of the transformed system (9) can be obtained from Equation (6), which establishes the zero-state observability of the transformed system (9).
This completes the proof. □
Based on the transformed system, the nominal system of (9) can be defined as

$\dot{s} = F(s) + G_1(s)u_1 + G_2(s)u_2.$ (14)
The performance index function related to the design of $u_1$ can be defined as

$V_1(s, u_1, u_2) = \int_0^{\infty} \big( s^T Q_1 s + \Phi_1(u_1, \lambda_1) + \Phi_2(u_2, \lambda_2) + \Gamma_1(s, \nabla V_1) \big)\, dt,$ (15)

where $Q_1$, $R_{11}$, $R_{12}$ are positive definite matrices, $\bar{R}_{11} = [r_1, \ldots, r_{m_1}] \in \mathbb{R}^{1 \times m_1}$, $\bar{R}_{12} = [r_1, \ldots, r_{m_2}] \in \mathbb{R}^{1 \times m_2}$, $\nabla V_1$ represents the partial derivative of the performance index function $V_1$ with respect to $s$, $\Phi_1(u_1, \lambda_1) = 2\lambda_1 (\tanh^{-1}(u_1/\lambda_1))^T R_{11} u_1 + \lambda_1^2 \bar{R}_{11} \ln(1 - u_1^2/\lambda_1^2)$ is the nonquadratic penalty function of $u_1$, $\Phi_2(u_2, \lambda_2) = 2\lambda_2 (\tanh^{-1}(u_2/\lambda_2))^T R_{12} u_2 + \lambda_2^2 \bar{R}_{12} \ln(1 - u_2^2/\lambda_2^2)$ is the nonquadratic penalty function of $u_2$, and $\Gamma_1(s, \nabla V_1(s)) = \delta^T \delta + \frac{1}{4} \nabla V_1^T(s) K(s) K^T(s) \nabla V_1(s)$ represents the disturbance-related term.
The performance index function related to the design of $u_2$ is defined as

$V_2(s, u_1, u_2) = \int_0^{\infty} \big( s^T Q_2 s + \Phi_3(u_1, \lambda_1) + \Phi_4(u_2, \lambda_2) + \Gamma_2(s, \nabla V_2) \big)\, dt,$ (16)

where $Q_2$, $R_{21}$, $R_{22}$ are positive definite matrices, $\bar{R}_{21} = [r_1, \ldots, r_{m_1}] \in \mathbb{R}^{1 \times m_1}$, $\bar{R}_{22} = [r_1, \ldots, r_{m_2}] \in \mathbb{R}^{1 \times m_2}$, $\nabla V_2$ represents the partial derivative of the performance index function $V_2$, $\Phi_3(u_1, \lambda_1) = 2\lambda_1 (\tanh^{-1}(u_1/\lambda_1))^T R_{21} u_1 + \lambda_1^2 \bar{R}_{21} \ln(1 - u_1^2/\lambda_1^2)$ is the nonquadratic penalty function of $u_1$, $\Phi_4(u_2, \lambda_2) = 2\lambda_2 (\tanh^{-1}(u_2/\lambda_2))^T R_{22} u_2 + \lambda_2^2 \bar{R}_{22} \ln(1 - u_2^2/\lambda_2^2)$ is the nonquadratic penalty function of $u_2$, and $\Gamma_2(s, \nabla V_2(s)) = \delta^T \delta + \frac{1}{4} \nabla V_2^T(s) K(s) K^T(s) \nabla V_2(s)$ represents the disturbance-related term.
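The nonquadratic penalty $\Phi$ is what keeps the inputs bounded: it is zero at $u = 0$, positive elsewhere, and grows steeply as $|u| \to \lambda$. A scalar-input Python sketch (our own code, not from the paper) makes this concrete:

```python
import math

def phi_penalty(u, lam, r):
    """Scalar nonquadratic penalty from (15):
    Phi(u) = 2*lam*atanh(u/lam)*r*u + lam^2 * r * log(1 - u^2/lam^2), valid for |u| < lam."""
    return 2.0 * lam * math.atanh(u / lam) * r * u + lam**2 * r * math.log(1.0 - u**2 / lam**2)
```

Differentiating term by term, the derivative with respect to $u$ collapses to $2\lambda r \tanh^{-1}(u/\lambda)$; this is precisely how the stationarity condition later yields the $\tanh$-saturated control laws (22) and (23).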
Definition 1.
The control strategy set ( u 1 * , u 2 * ) is a Nash equilibrium control strategy set if
$V_1(u_1^*, u_2^*) \le V_1(u_1, u_2^*), \qquad V_2(u_1^*, u_2^*) \le V_2(u_1^*, u_2)$ (17)
hold for any admissible control policies u 1 and u 2 .
Based on the performance index functions (15) and (16), the Hamiltonian functions associated with the control inputs $u_1$ and $u_2$ are defined as
$H_1(s, u_1, u_2) = s^T Q_1 s + \Phi_1(u_1, \lambda_1) + \Phi_2(u_2, \lambda_2) + \Gamma_1(s, \nabla V_1) + \nabla V_1^T \big( F(s) + G_1(s)u_1 + G_2(s)u_2 \big),$ (18)

$H_2(s, u_1, u_2) = s^T Q_2 s + \Phi_3(u_1, \lambda_1) + \Phi_4(u_2, \lambda_2) + \Gamma_2(s, \nabla V_2) + \nabla V_2^T \big( F(s) + G_1(s)u_1 + G_2(s)u_2 \big).$ (19)
We define the optimal performance index functions of $u_1$, $u_2$ as

$V_1^*(s, u_1^*, u_2) = \min_{u_1 \in U_1} \int_0^{\infty} \big( s^T Q_1 s + \Phi_1(u_1, \lambda_1) + \Phi_2(u_2, \lambda_2) + \Gamma_1(s, \nabla V_1) \big)\, dt,$ (20)

$V_2^*(s, u_1, u_2^*) = \min_{u_2 \in U_2} \int_0^{\infty} \big( s^T Q_2 s + \Phi_3(u_1, \lambda_1) + \Phi_4(u_2, \lambda_2) + \Gamma_2(s, \nabla V_2) \big)\, dt.$ (21)
Considering the nominal system (14) and Formulas (15) and (16), the constrained optimal control strategies $u_1^*$ and $u_2^*$ can be obtained from the stationarity condition of optimization:
$u_1^* = -\lambda_1 \tanh\!\Big( \tfrac{1}{2\lambda_1} R_{11}^{-1} G_1^T(s) \nabla V_1^*(s) \Big),$ (22)

$u_2^* = -\lambda_2 \tanh\!\Big( \tfrac{1}{2\lambda_2} R_{22}^{-1} G_2^T(s) \nabla V_2^*(s) \Big),$ (23)

where $\nabla V_1^*(s)$ and $\nabla V_2^*(s)$ are obtained by solving the following coupled HJB equations:

$s^T Q_1 s + 2\lambda_1 (\tanh^{-1}(u_1^*/\lambda_1))^T R_{11} u_1^* + \lambda_1^2 \bar{R}_{11} \ln(1 - u_1^{*2}/\lambda_1^2) + 2\lambda_2 (\tanh^{-1}(u_2^*/\lambda_2))^T R_{12} u_2^* + \lambda_2^2 \bar{R}_{12} \ln(1 - u_2^{*2}/\lambda_2^2) + \Gamma_1(s, \nabla V_1^*) + \nabla V_1^{*T} \Big( F(s) - G_1(s)\lambda_1 \tanh\!\big( \tfrac{1}{2\lambda_1} R_{11}^{-1} G_1^T(s) \nabla V_1^*(s) \big) - G_2(s)\lambda_2 \tanh\!\big( \tfrac{1}{2\lambda_2} R_{22}^{-1} G_2^T(s) \nabla V_2^*(s) \big) \Big) = 0,$ (24)

$s^T Q_2 s + 2\lambda_1 (\tanh^{-1}(u_1^*/\lambda_1))^T R_{21} u_1^* + \lambda_1^2 \bar{R}_{21} \ln(1 - u_1^{*2}/\lambda_1^2) + 2\lambda_2 (\tanh^{-1}(u_2^*/\lambda_2))^T R_{22} u_2^* + \lambda_2^2 \bar{R}_{22} \ln(1 - u_2^{*2}/\lambda_2^2) + \Gamma_2(s, \nabla V_2^*) + \nabla V_2^{*T} \Big( F(s) - G_1(s)\lambda_1 \tanh\!\big( \tfrac{1}{2\lambda_1} R_{11}^{-1} G_1^T(s) \nabla V_1^*(s) \big) - G_2(s)\lambda_2 \tanh\!\big( \tfrac{1}{2\lambda_2} R_{22}^{-1} G_2^T(s) \nabla V_2^*(s) \big) \Big) = 0.$ (25)
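The $\tanh$ form of (22) and (23) guarantees constraint satisfaction by construction: whatever value the gradient term takes, the output stays strictly inside $(-\lambda, \lambda)$. A scalar-input sketch (our own code, not from the paper):

```python
import math

def u_constrained(grad_V, G, lam, R_inv):
    """Scalar-input case of (22): u = -lam * tanh( (1/(2*lam)) * R^{-1} * G^T * grad_V )."""
    return -lam * math.tanh(R_inv * G * grad_V / (2.0 * lam))
```

Because $|\tanh(\cdot)| < 1$, we get $|u| < \lambda$ for every finite gradient value, so no separate clipping or projection step is needed to enforce the input constraint.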
Lemma 1.
Assume that $V_1(s)$, $V_2(s)$ are continuously differentiable functions satisfying $V_1(s) > 0$, $V_2(s) > 0$ for all $s \ne 0$ and $V_1(0) = V_2(0) = 0$, and that there exist two bounded functions $\Gamma_1(s)$, $\Gamma_2(s)$ satisfying $\Gamma_1(s) \ge 0$, $\Gamma_2(s) \ge 0$, and two control laws $u_1$, $u_2$, such that

$(a)\ \nabla V_j^T \bar{T}(s, u_1, u_2, d) \le \nabla V_j^T T(s, u_1, u_2) + \Gamma_j(s), \qquad (b)\ \nabla V_j^T T(s, u_1, u_2) + \Gamma_j(s) < 0, \quad \forall s \ne 0,\ j = 1, 2,$ (26)

where $\bar{T}(s, u_1, u_2, d) = F(s) + G_1(s)u_1 + G_2(s)u_2 + K(s)d$ and $T(s, u_1, u_2) = F(s) + G_1(s)u_1 + G_2(s)u_2$. Then, the transformed system (9) achieves asymptotic stability under the control laws $u_1$ and $u_2$.
Proof of Lemma 1. 
We can use the chain rule to obtain
$\dot{V}_1(s(t)) = \dfrac{d V_1(s(t))}{dt} = \nabla V_1^T \bar{T}(s, u_1, u_2, d).$ (27)
According to Formula (26), we obtain $\dot{V}_1(s(t)) < 0$ for any $s \ne 0$. Hence, $V_1(\cdot)$ is a Lyapunov function for the transformed system (9), which proves that the transformed system is asymptotically stable. As long as $V_1(\cdot)$ satisfies the conditions of Formula (26), the control law $u_1$ realizes the asymptotic stability of the transformed system. Similarly, the control law $u_2$ realizes the asymptotic stability of the transformed system. □
Lemma 2.
Under Assumption 1, if the constrained optimal control problem of the transformed system (9) is solved by the constrained optimal control laws $u_1$, $u_2$, then the system (1) satisfies the time-varying safety constraints $(\zeta_a(t), \zeta_A(t))$, provided that the initial state $x_0$ of the system (1) satisfies the time-varying safety constraints.
Proof of Lemma 2. 
Based on Lemma 1, one can obtain $\dot{V}_1(s(t)) \le 0$ and $\dot{V}_2(s(t)) \le 0$, such that

$V_1(s(t)) \le V_1(s(0)), \qquad V_2(s(t)) \le V_2(s(0)), \quad \forall t \ge 0.$ (28)

According to the properties of the barrier function in Assumption 1, the performance index functions $V_1(s(0))$ and $V_2(s(0))$ are finite when the initial value $x_0$ of the safety-critical system (1) satisfies the time-varying safety constraints $(\zeta_a(t), \zeta_A(t))$ and $V_1(\cdot)$, $V_2(\cdot)$ satisfy the conditions of Formula (26). It follows that the performance index functions $V_1(s(t))$ and $V_2(s(t))$ are finite for all $t$. Therefore, based on Assumption 1, we obtain

$x(t) \in (\zeta_a(t), \zeta_A(t)), \quad \forall t > 0.$ (29)
This proof is completed. □
According to Lemmas 1 and 2, the constrained optimal control laws (22) and (23) can make the safety-critical system (1) with uncertain disturbances and time-varying safety constraints asymptotically stable, based on the proposed barrier transformation and disturbance-related term. Based on (22) and (23), we only need to solve the proposed coupled HJB Equations (24) and (25) for the optimal performance index functions in order to obtain the constrained optimal control solution. However, Equations (24) and (25) are often difficult or impossible to solve analytically due to their inherently nonlinear nature. In view of this problem, an approximate structure based on NNs is proposed to learn the solutions of the coupled HJB equations online.

3. Approximate Optimal Solution of Coupled Hamilton–Jacobi–Bellman Equations

In this section, an online approximation method is proposed by constructing a single critic network. Based on the universal approximation property of NN, the optimal performance index functions (20) and (21) and their partial derivatives can be approximated as follows:
$V_j^*(s) = W_j^{*T} \phi_j(s) + \varepsilon_j(s), \qquad \nabla V_j^*(s) = \nabla \phi_j^T(s) W_j^* + \nabla \varepsilon_j(s), \quad j = 1, 2,$ (30)

where $W_j^* = [\omega_{j1}, \omega_{j2}, \ldots, \omega_{jL}]^T \in \mathbb{R}^L$ represents the ideal weight vector, $\phi_j(s) = [\varphi_{j1}, \varphi_{j2}, \ldots, \varphi_{jL}]^T \in \mathbb{R}^L$ represents the neural network activation function vector, $\nabla \phi_j(s)$ represents the partial derivative of $\phi_j(s)$, $L$ represents the number of hidden-layer neurons, $\varepsilon_j(s)$ represents the NN approximation error, and $\nabla \varepsilon_j(s)$ represents the partial derivative of $\varepsilon_j(s)$.
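A common concrete choice (the paper leaves the basis unspecified) is a quadratic polynomial basis. The sketch below, our own code, shows the critic approximation $\hat{V}(s) = \hat{W}^T \phi(s)$ and its gradient $\nabla \phi^T(s)\hat{W}$ for a two-dimensional transformed state:

```python
import numpy as np

def phi(s):
    """Example basis phi(s) = [s1^2, s1*s2, s2^2] (an assumed choice, not from the paper)."""
    s1, s2 = s
    return np.array([s1**2, s1 * s2, s2**2])

def grad_phi(s):
    """Jacobian of phi with respect to s, shape (L, n) = (3, 2)."""
    s1, s2 = s
    return np.array([[2.0 * s1, 0.0],
                     [s2, s1],
                     [0.0, 2.0 * s2]])

def V_hat(s, W):
    """Critic value estimate W^T phi(s)."""
    return W @ phi(s)

def grad_V_hat(s, W):
    """Gradient estimate grad_phi(s)^T W, the quantity fed into the control laws."""
    return grad_phi(s).T @ W
```

With this structure, tuning the single weight vector $\hat{W}$ simultaneously adjusts both the value estimate and its gradient, which is what makes the single-critic architecture possible.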
Assumption 2.
It is assumed that the ideal weights $W_j^*$ are bounded by constants, i.e., $\|W_j^*\| \le \lambda_{Wj}$, the neural network approximation residuals satisfy $\|\varepsilon_j\| \le \lambda_{\varepsilon j}$, $\|\nabla \varepsilon_j\| \le \lambda_{d\varepsilon j}$, and the neural network activation functions satisfy $\|\phi_j\| \le \lambda_{\phi j}$, $\|\nabla \phi_j\| \le \lambda_{d\phi j}$.
Based on Formula (30), the Bellman approximation errors of the neural network approximation can be expressed as
$H_1(s, W_1^*, W_2^*) = \varepsilon_{B1}, \qquad H_2(s, W_1^*, W_2^*) = \varepsilon_{B2}.$ (31)
Remark 3.
The Bellman approximation errors $\varepsilon_{B1}$ and $\varepsilon_{B2}$ converge to 0 as the number of hidden neurons $L \to \infty$. When $L$ is a fixed constant, the Bellman approximation errors are bounded, i.e., $\|\varepsilon_{Bj}(s)\| < \varepsilon_{Bjh}$. In the later proof, we will consider the influence of the Bellman approximation errors $\varepsilon_{B1}$ and $\varepsilon_{B2}$.
Since the ideal weights $W_1^*$ and $W_2^*$ are unknown, we use their estimates to construct the critic neural network:
$\hat{V}_j(s) = \hat{W}_j^T \phi_j(s), \qquad \nabla \hat{V}_j(s) = \nabla \phi_j^T(s) \hat{W}_j.$ (32)
According to Formulas (22), (23), and (32), the approximate optimal control strategies are
$\hat{u}_1 = -\lambda_1 \tanh\!\Big( \tfrac{1}{2\lambda_1} R_{11}^{-1} G_1^T(s) \nabla \phi_1^T(s) \hat{W}_1 \Big),$ (33)

$\hat{u}_2 = -\lambda_2 \tanh\!\Big( \tfrac{1}{2\lambda_2} R_{22}^{-1} G_2^T(s) \nabla \phi_2^T(s) \hat{W}_2 \Big).$ (34)
Substituting (32)–(34) into (18) and (19), the approximate Hamiltonian functions can be obtained:
$H_1(s, \hat{W}_1, \hat{W}_2) = s^T Q_1 s + 2\lambda_1 (\tanh^{-1}(\hat{u}_1/\lambda_1))^T R_{11} \hat{u}_1 + \lambda_1^2 \bar{R}_{11} \ln(1 - \hat{u}_1^2/\lambda_1^2) + 2\lambda_2 (\tanh^{-1}(\hat{u}_2/\lambda_2))^T R_{12} \hat{u}_2 + \lambda_2^2 \bar{R}_{12} \ln(1 - \hat{u}_2^2/\lambda_2^2) + \Gamma_1(s, \nabla \hat{V}_1) + \nabla \hat{V}_1^T \Big( F(s) - G_1(s)\lambda_1 \tanh\!\big( \tfrac{1}{2\lambda_1} R_{11}^{-1} G_1^T(s) \nabla \hat{V}_1(s) \big) - G_2(s)\lambda_2 \tanh\!\big( \tfrac{1}{2\lambda_2} R_{22}^{-1} G_2^T(s) \nabla \hat{V}_2(s) \big) \Big) = e_1,$ (35)

$H_2(s, \hat{W}_1, \hat{W}_2) = s^T Q_2 s + 2\lambda_1 (\tanh^{-1}(\hat{u}_1/\lambda_1))^T R_{21} \hat{u}_1 + \lambda_1^2 \bar{R}_{21} \ln(1 - \hat{u}_1^2/\lambda_1^2) + 2\lambda_2 (\tanh^{-1}(\hat{u}_2/\lambda_2))^T R_{22} \hat{u}_2 + \lambda_2^2 \bar{R}_{22} \ln(1 - \hat{u}_2^2/\lambda_2^2) + \Gamma_2(s, \nabla \hat{V}_2) + \nabla \hat{V}_2^T \Big( F(s) - G_1(s)\lambda_1 \tanh\!\big( \tfrac{1}{2\lambda_1} R_{11}^{-1} G_1^T(s) \nabla \hat{V}_1(s) \big) - G_2(s)\lambda_2 \tanh\!\big( \tfrac{1}{2\lambda_2} R_{22}^{-1} G_2^T(s) \nabla \hat{V}_2(s) \big) \Big) = e_2.$ (36)
The estimates $\hat{W}_1$ and $\hat{W}_2$ need to be adjusted to minimize the squared residual error $E = e_1^T e_1/2 + e_2^T e_2/2$. In general, online adaptive learning algorithms require a persistence-of-excitation (PE) condition to achieve convergence. In order to satisfy this condition, we redefine the squared residual error as $E = \frac{1}{2}\big( e_1^T e_1 + \sum_{l=1}^{N} e_{1l}^T e_{1l} + e_2^T e_2 + \sum_{l=1}^{N} e_{2l}^T e_{2l} \big)$, where $e_{1l}$, $e_{2l}$ are evaluated on stored past data at times $t_l < t$. We choose the normalized gradient descent algorithm as the tuning law of the estimates to minimize the squared residual error:
$\dot{\hat{W}}_1 = -\alpha_1 \frac{\sigma_1(t)}{\bar{\sigma}_1(t)} \big[ \sigma_1^T(t) \hat{W}_1 + r_1(s, \hat{u}_1, \hat{u}_2, \Gamma_1) \big]^T - \alpha_1 \sum_{l=1}^{N} \frac{\sigma_1(t_l)}{\bar{\sigma}_1(t_l)} \big[ \sigma_1^T(t_l) \hat{W}_1 + r_1(s(t_l), \hat{u}_1(t_l), \hat{u}_2(t_l), \Gamma_1(t_l)) \big]^T,$ (37)

$\dot{\hat{W}}_2 = -\alpha_2 \frac{\sigma_2(t)}{\bar{\sigma}_2(t)} \big[ \sigma_2^T(t) \hat{W}_2 + r_2(s, \hat{u}_1, \hat{u}_2, \Gamma_2) \big]^T - \alpha_2 \sum_{l=1}^{N} \frac{\sigma_2(t_l)}{\bar{\sigma}_2(t_l)} \big[ \sigma_2^T(t_l) \hat{W}_2 + r_2(s(t_l), \hat{u}_1(t_l), \hat{u}_2(t_l), \Gamma_2(t_l)) \big]^T,$ (38)

where $\alpha_1 > 0$ and $\alpha_2 > 0$ are learning rates that determine the convergence speed of the estimates, $\sigma_1(t) = \nabla \phi_1(s) (F(s) + G_1(s)\hat{u}_1 + G_2(s)\hat{u}_2)$, $\bar{\sigma}_1(t) = (\sigma_1^T(t) \sigma_1(t) + 1)^2$, $\sigma_2(t) = \nabla \phi_2(s) (F(s) + G_1(s)\hat{u}_1 + G_2(s)\hat{u}_2)$, $\bar{\sigma}_2(t) = (\sigma_2^T(t) \sigma_2(t) + 1)^2$, $r_1(s, \hat{u}_1, \hat{u}_2, \Gamma_1) = s^T Q_1 s + \Phi_1(\hat{u}_1, \lambda_1) + \Phi_2(\hat{u}_2, \lambda_2) + \Gamma_1(s, \nabla \hat{V}_1)$, $r_2(s, \hat{u}_1, \hat{u}_2, \Gamma_2) = s^T Q_2 s + \Phi_3(\hat{u}_1, \lambda_1) + \Phi_4(\hat{u}_2, \lambda_2) + \Gamma_2(s, \nabla \hat{V}_2)$, and $s(t_l)$, $\hat{u}_1(t_l)$, $\hat{u}_2(t_l)$, $\sigma_1(t_l)$, $\bar{\sigma}_1(t_l)$, $\sigma_2(t_l)$, $\bar{\sigma}_2(t_l)$, $\Gamma_1(t_l)$, $\Gamma_2(t_l)$ are all obtained from the stored past data.
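One evaluation of the right-hand side of tuning law (37) can be sketched as follows (our own code; `replay` holds the stored $(\sigma_1(t_l), r_1(t_l))$ pairs used to relax the PE requirement):

```python
import numpy as np

def critic_weight_derivative(W_hat, sigma, r, alpha, replay):
    """Right-hand side of tuning law (37):
    dW/dt = -alpha * sigma/(sigma^T sigma + 1)^2 * (sigma^T W_hat + r)
            - alpha * (same expression summed over stored past samples)."""
    def term(sig, rr):
        # normalized gradient of half the squared residual e = sig^T W_hat + rr
        return sig / (sig @ sig + 1.0)**2 * (sig @ W_hat + rr)
    dW = -alpha * term(sigma, r)
    for sig_l, r_l in replay:
        dW = dW - alpha * term(sig_l, r_l)
    return dW
```

The residual $e = \sigma^T \hat{W} + r$ is driven toward zero: a small Euler step along `dW` reduces $e^2$ whenever $\sigma \ne 0$, and the replayed samples keep the update informative even when the current $\sigma(t)$ is momentarily unexciting.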
The weight estimation errors $\tilde{W}_1$ and $\tilde{W}_2$ can be defined as

$\tilde{W}_1 = W_1^* - \hat{W}_1, \qquad \tilde{W}_2 = W_2^* - \hat{W}_2.$ (39)
Based on (37)–(39), we have
$\dot{\tilde{W}}_1 = \alpha_1 \frac{\sigma_1(t)}{\bar{\sigma}_1(t)} \big[ \sigma_1^T(t) \hat{W}_1 + r_1(s, \hat{u}_1, \hat{u}_2, \Gamma_1) \big]^T + \alpha_1 \sum_{l=1}^{N} \frac{\sigma_1(t_l)}{\bar{\sigma}_1(t_l)} \big[ \sigma_1^T(t_l) \hat{W}_1 + r_1(s(t_l), \hat{u}_1(t_l), \hat{u}_2(t_l), \Gamma_1(t_l)) \big]^T,$ (40)

$\dot{\tilde{W}}_2 = \alpha_2 \frac{\sigma_2(t)}{\bar{\sigma}_2(t)} \big[ \sigma_2^T(t) \hat{W}_2 + r_2(s, \hat{u}_1, \hat{u}_2, \Gamma_2) \big]^T + \alpha_2 \sum_{l=1}^{N} \frac{\sigma_2(t_l)}{\bar{\sigma}_2(t_l)} \big[ \sigma_2^T(t_l) \hat{W}_2 + r_2(s(t_l), \hat{u}_1(t_l), \hat{u}_2(t_l), \Gamma_2(t_l)) \big]^T.$ (41)
Combining the above components, the structure diagram of the proposed multi-input safety-critical control scheme is shown in Figure 1.
Theorem 2.
Consider the system (9), the approximate optimal control strategies (33) and (34), and the weight tuning laws (37) and (38). Suppose that $\nabla \phi_1$, $\nabla \phi_2$, $\varepsilon_1$, $\nabla \varepsilon_1$, $\varepsilon_2$, $\nabla \varepsilon_2$, $\varepsilon_{B1}$, $\varepsilon_{B2}$ are all uniformly bounded, and that Assumptions 1 and 2 hold. Then, the system state $s$ and the neural network weight errors $\tilde{W}_1$, $\tilde{W}_2$ are guaranteed to be UUB under the time-varying safety constraints and uncertain disturbances.
Proof of Theorem 2. 
See the Appendix A. □
Remark 4.
According to the result of Theorem 2, the neural network weight errors are UUB. According to Formulas (33), (34), and (39), as $\hat{V}_1(s) \to V_1^*(s)$ and $\hat{V}_2(s) \to V_2^*(s)$, the control inputs satisfy $\hat{u}_1 \to u_1^*$ and $\hat{u}_2 \to u_2^*$. That is, the control strategy is approximately optimal.
Remark 5.
Compared with [35], this work considers a more complex and interesting constrained control problem, in which the safety constraints change with time. In addition, we establish the coupled HJB equations to obtain the constrained optimal solution, so that the system state converges while the time-varying constraints are satisfied.
Remark 6.
In [34,36], the safe optimal control problem with external disturbances is considered, and control schemes based on barrier transformation are designed. However, the external disturbances there are assumed to be known. In this work, the safe control problem with uncertain disturbances is studied further, and it is proved that the system state converges under the proposed control strategy.

4. Simulation

To demonstrate the effectiveness of the proposed method, we present two nonlinear examples with time-varying safety constraints. In both cases, the system satisfies the time-varying safety constraints.

4.1. Nonlinear System Example 1

Consider the affine nonlinear system as follows [30]:
x ˙ = [ x 2 − 2 x 1 ; − x 2 − 0.5 x 1 + 0.25 x 2 ( cos ( 2 x 1 ) + 2 ) 2 + 0.25 x 2 ( sin ( 4 x 1 2 ) + 2 ) 2 ] + [ 0 ; cos ( 2 x 1 ) + 2 ] u 1 + [ 0 ; sin ( 4 x 1 2 ) + 2 ] u 2 + [ 0 ; cos ( x 1 ) x 2 ] d .
In addition, x = [ x 1 , x 2 ] T is the system state. One selects α 1 = α 2 = 1 , R 11 = R 12 = 2 , R 21 = R 22 = 1 , Q 1 = Q 2 = [ 1 0 ; 0 1 ] . The initial system state is x 0 = [ 2 , 2 ] T . We choose φ ( x ) = x and d ( φ ( x ) ) = p x 1 sin x 2 with p ∈ [ − 1 , 1 ] . Similarly, we select δ ( x ) = x 1 sin x 2 . Based on Formulas (4) and (5), we define the time-varying parameters for x 1 as l 1 = − 1 , l 2 = − 0.6 , ϑ 1 = 0.2 , t 1 = 3 , t 2 = 4 , l 3 = 2.2 , l 4 = 1.8 , ϑ 2 = 0.2 , and the time-varying parameters for x 2 as l 1 = − 2.8 , l 2 = − 1.8 , ϑ 1 = 0.5 , t 3 = 3 , t 4 = 4 , l 3 = 3 , l 4 = 2 , ϑ 2 = 0.5 . Before 75 s, the persistent excitation condition is ensured by probing noise. Since the effectiveness of the barrier transformation has been demonstrated in many previous works, we no longer compare our method with the unconstrained case, but with the case of constant constraints.
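For illustration, a time-varying safety bound of this kind can be realized as a smooth blend between two constant levels. The function below is only a qualitative sketch using a cosine ramp; the paper's boundary function is defined by Formulas (4) and (5) with the parameters ϑ 1 , ϑ 2 , and its exact shape may differ.

```python
import numpy as np

def smooth_bound(t, l_start, l_end, t_on, t_off):
    """Illustrative smooth time-varying bound: holds l_start before t_on,
    blends to l_end over [t_on, t_off] with a C^1 cosine ramp, then holds l_end.
    """
    if t <= t_on:
        return l_start
    if t >= t_off:
        return l_end
    s = (t - t_on) / (t_off - t_on)   # normalized ramp position in [0, 1]
    return l_start + (l_end - l_start) * 0.5 * (1.0 - np.cos(np.pi * s))
```

For instance, `smooth_bound(t, 2.2, 1.8, 3, 4)` holds 2.2 before 3 s and settles smoothly at 1.8 after 4 s, mimicking an upper bound that tightens over a one-second window.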
We define the activation functions as
ϕ 1 ( s ) = ϕ 2 ( s ) = [ s 1 2 s 1 s 2 s 2 2 ] T .
Meanwhile, the critic weight parameters are denoted as
W ^ 1 = [ ω ^ 11 ω ^ 12 ω ^ 13 ] T , W ^ 2 = [ ω ^ 21 ω ^ 22 ω ^ 23 ] T .
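With this quadratic activation vector, the critic estimate V ^ i ( s ) = W ^ i T ϕ i ( s ) and the gradient of ϕ i ( s ) used by the control laws are straightforward to evaluate; a minimal sketch follows (array shapes are illustrative):

```python
import numpy as np

def phi(s):
    """Activation vector phi(s) = [s1^2, s1*s2, s2^2]^T shared by both critics."""
    s1, s2 = s
    return np.array([s1**2, s1 * s2, s2**2])

def grad_phi(s):
    """Jacobian d(phi)/d(s), shape (3, 2); rows match the entries of phi."""
    s1, s2 = s
    return np.array([[2 * s1, 0.0],
                     [s2, s1],
                     [0.0, 2 * s2]])

def V_hat(W_hat, s):
    """Critic value estimate V_hat(s) = W_hat^T phi(s)."""
    return W_hat @ phi(s)
```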
After 100 s, the critic parameters converge to W ^ 1 = [ 0.392 , 1.789 , 1.162 ] T and W ^ 2 = [ 1.849 , 2.590 , 0.142 ] T .
It can be observed from Figure 2 that the method using constant constraints satisfies the constant constraints ( − 1 , 2.2 ) , ( − 2.8 , 3 ) during the convergence of the system state, but cannot satisfy the time-varying constraints ( ζ a 1 , ζ A 1 ) , ( ζ a 2 , ζ A 2 ) . In contrast, the trajectory of the system state x obtained by the proposed method converges to zero while the time-varying safety constraints are satisfied. Figure 3 gives the evolution of the critic parameters for player 1, and Figure 4 shows that for player 2. It can be seen that, under the proposed tuning laws (37) and (38), the critic weight parameters converge to their ideal values. Figure 5 shows the state trajectories of the transformation system (9).

4.2. Nonlinear System Example 2

Consider the following nonlinear system of a single link robot arm:
x ˙ = [ x 2 ; − 2 x 1 − 5 sin ( x 1 ) − 0.2 x 2 ] + [ 0 ; 0.1 ] u 1 + [ 0 ; 0.1 ] u 2 + [ 0 ; 1 ] d .
In addition, x = [ x 1 , x 2 ] T is the system state. One selects α 1 = 5 , α 2 = 1 , R 11 = R 12 = 2 , R 21 = R 22 = 1 , Q 1 = Q 2 = [ 5 0 ; 0 5 ] . The initial system state is x 0 = [ 2 , 2 ] T . Similarly, we choose φ ( x ) = x , d ( φ ( x ) ) = p x 1 sin x 2 with p ∈ [ − 1 , 1 ] , and δ ( x ) = x 1 sin x 2 . In this example, we apply more complex time-varying safety constraints to the system state: the constraints on the upper bounds of x 1 and x 2 vary at 3 and 8 s, respectively, and the constraints on the lower bounds of x 1 and x 2 vary at 3 and 10 s, respectively. We define λ 1 = 3 and λ 2 = 18 as the bounds of the control inputs. Before 75 s, the persistent excitation condition is ensured by probing noise.
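The input bounds enter through the tanh-saturated control form that appears in the Appendix, u = − λ tanh ( D ) with D = ( 1 / 2 λ ) R − 1 G T ( s ) ϕ ( s ) T W ^ , which keeps every input component inside ( − λ , λ ) by construction. A hedged sketch of this computation (the shapes and numbers below are illustrative, not the simulation settings):

```python
import numpy as np

def bounded_control(W_hat, grad_phi, G, R_inv, lam):
    """Tanh-saturated control of the form used in the Appendix:
    D = (1 / (2*lam)) * R^{-1} G^T grad_phi^T W_hat,  u = -lam * tanh(D),
    so each input component stays strictly inside (-lam, lam).

    Illustrative shapes: grad_phi (n_phi, n_x), G (n_x, m), R_inv (m, m),
    W_hat (n_phi,); the returned u has shape (m,).
    """
    D = (1.0 / (2.0 * lam)) * R_inv @ G.T @ grad_phi.T @ W_hat
    return -lam * np.tanh(D)
```

Even for very large critic weights, the saturation guarantees the bound: with lam = 3, the output magnitude never reaches 3.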
We define the activation function as
ϕ 1 ( s ) = ϕ 2 ( s ) = [ s 1 2 s 1 s 2 s 2 2 ] T .
Meanwhile, we denote the critic weight parameters as
W ^ 1 = [ ω ^ 11 ω ^ 12 ω ^ 13 ] T , W ^ 2 = [ ω ^ 21 ω ^ 22 ω ^ 23 ] T .
After 100 s, the critic parameters converge to W ^ 1 = [ − 1.319 , 0.249 , − 0.023 ] T and W ^ 2 = [ 0.250 , − 1.113 , 0.658 ] T .
In Example 2, we further consider the case of input constraints. Figure 6 shows that the method using constant constraints cannot satisfy the time-varying safety constraints ( ζ a 1 , ζ A 1 ) , ( ζ a 2 , ζ A 2 ) in the process of system state convergence, while the proposed method can ensure that the system state x converges under the time-varying safety constraints. The constrained control inputs are shown in Figure 7. The evolution of the critic parameters is given in Figure 8 and Figure 9. The transformation system state trajectories are shown in Figure 10.

5. Conclusions

For affine nonlinear multi-input safety-critical systems with uncertain disturbances and time-varying safety constraints, a new adaptive learning algorithm based on the coupled HJB equations was proposed to solve the constrained optimal control problem. To satisfy the time-varying safety constraints, a novel barrier function and a smooth safety boundary function were used to transform the safety-critical system into a transformation system without time-varying safety constraints. The proposed barrier function solves the time-varying safety constraint problem, which cannot be handled by the traditional constant-constraint method. The influence of uncertain disturbances on the transformation system was addressed by establishing the nominal system and a disturbance-related term. In addition, two critic neural networks were used to learn the optimal solutions of the coupled HJB equations, and the effectiveness of this method was verified by theoretical proof. Finally, we tested both a numerical nonlinear example and the nonlinear system of a single-link robotic arm; the simulation results also verify the effectiveness of the proposed method.

Author Contributions

J.W. and C.Q.: Methodology, Validation, Conceptualization, and Writing—Original Draft; X.Q., D.Z. and Z.Z.: Formal analysis, Writing—Review and editing; Z.S. and H.Z.: Data curation; C.Q.: Funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. U1504615), the Science and Technology Research Project of Henan Province (222102240014), and the Youth Backbone Teachers in Colleges and Universities of Henan Province (2018GGJS017).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors can confirm that all relevant data are included in the article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Theorem 2. 
Consider the following Lyapunov function candidate
L ( s ) = V 1 ( s ) + V 2 ( s ) + 1 2 α 1 1 W ˜ 1 T W ˜ 1 + 1 2 α 2 1 W ˜ 2 T W ˜ 2 .
The time derivative on the trajectory of the transformation system is calculated as
L ˙ = V ˙ 1 + V ˙ 2 + α 1 1 W ˜ 1 T W ˜ ˙ 1 + α 2 1 W ˜ 2 T W ˜ ˙ 2 .
Considering (40), we derive that
α 1 1 W ˜ 1 T W ˜ ˙ 1 = α 1 1 W ˜ 1 T ( α 1 σ 1 ( t ) σ ¯ 1 ( t ) [ σ 1 ( t ) T W ^ 1 + r 1 ( s , u 1 , u 2 , Γ 1 ) ] T + α 1 l = 1 N σ 1 ( t l ) σ ¯ 1 ( t l ) [ σ 1 ( t l ) T W ^ 1 + r 1 ( s ( t l ) , u ^ 1 ( t l ) , u ^ 2 ( t l ) , Γ 1 ( t l ) ) ] T ) .
Define Π 1 = σ 1 ( t ) T W ^ 1 + r 1 ( s , u 1 , u 2 , Γ 1 ) . Based on Formula (31), one has
Π 1 = σ 1 ( t ) T W ^ 1 + s T Q 1 s + Φ 1 ( u ^ 1 , λ 1 ) + Φ 2 ( u ^ 2 , λ 2 ) + Γ 1 ( s , V ^ 1 ) − σ 1 * ( t ) T W 1 * − s T Q 1 s − Φ 1 ( u 1 * , λ 1 ) − Φ 2 ( u 2 * , λ 2 ) − Γ 1 ( s , V 1 * ) + ε B 1 = Φ 1 ( u ^ 1 , λ 1 ) + Φ 2 ( u ^ 2 , λ 2 ) − Φ 1 ( u 1 * , λ 1 ) − Φ 2 ( u 2 * , λ 2 ) + ε B 1 − W ˜ 1 T σ 1 ( t ) + W 1 * T ( σ 1 ( t ) − σ 1 * ( t ) ) + Γ 1 ( s , V ^ 1 ) − Γ 1 ( s , V 1 * ) ,
where σ 1 * ( t ) = ϕ 1 ( s ) ( F ( s ) + G 1 ( s ) u 1 * + G 2 ( s ) u 2 * ) .
Define Π 2 = Φ 1 ( u ^ 1 , λ 1 ) − Φ 1 ( u 1 * , λ 1 ) . Based on the results in [39,40], we can obtain
Π 2 = W ^ 1 T ϕ 1 ( s ) G 1 ( s ) λ 1 t a n h ( D ^ 1 ) + W ˜ 1 T ϕ 1 ( s ) G 1 ( s ) λ 1 t a n h ( σ m 1 D ^ 1 ) W 1 * T ϕ 1 ( s ) G 1 ( s ) λ 1 t a n h ( D 1 ) W 1 * T ϕ 1 ( s ) G 1 ( s ) λ 1 [ t a n h ( σ m 1 D ^ 1 ) t a n h ( σ m 1 D 1 ) ] + λ 1 2 R ¯ 11 ( ε D ^ 1 ε D 1 ) + ε σ 1 ,
where D ^ 1 = 1 2 λ 1 R 11 − 1 G 1 T ( s ) ϕ 1 ( s ) T W ^ 1 , D 1 = 1 2 λ 1 R 11 − 1 G 1 T ( s ) ϕ 1 ( s ) T W 1 * , ε D ^ 1 and ε D 1 are bounded approximation errors, σ m 1 is a large constant, and ε σ 1 is the approximation error between the tanh and sgn functions.
Define Π 3 = Φ 2 ( u ^ 2 , λ 2 ) − Φ 2 ( u 2 * , λ 2 ) . Similarly, we can obtain
Π 3 = W ^ 2 T ϕ 2 ( s ) G 2 ( s ) λ 2 t a n h ( D ^ 2 ) + W ˜ 2 T ϕ 2 ( s ) G 2 ( s ) λ 2 t a n h ( σ m 2 D ^ 2 ) W 2 * T ϕ 2 ( s ) G 2 ( s ) λ 2 t a n h ( D 2 ) W 2 * T ϕ 2 ( s ) G 2 ( s ) λ 2 [ t a n h ( σ m 2 D ^ 2 ) t a n h ( σ m 2 D 2 ) ] + λ 2 2 R ¯ 22 ( ε D ^ 2 ε D 2 ) + ε σ 2 ,
where D ^ 2 = 1 2 λ 2 R 22 − 1 G 2 T ( s ) ϕ 2 ( s ) T W ^ 2 , D 2 = 1 2 λ 2 R 22 − 1 G 2 T ( s ) ϕ 2 ( s ) T W 2 * , ε D ^ 2 and ε D 2 are bounded approximation errors, σ m 2 is a large constant, and ε σ 2 is the approximation error. Based on (A5) and (A6) and some manipulation, one has
Π 1 = W ^ 1 T ϕ 1 ( s ) G 1 ( s ) λ 1 t a n h ( D ^ 1 ) + W ˜ 1 T ϕ 1 ( s ) G 1 ( s ) λ 1 t a n h ( σ m 1 D ^ 1 ) W 1 * T ϕ 1 ( s ) G 1 ( s ) λ 1 t a n h ( D 1 ) W 1 * T ϕ 1 ( s ) G 1 ( s ) λ 1 [ t a n h ( σ m 1 D ^ 1 ) t a n h ( σ m 1 D 1 ) ] + W ^ 2 T ϕ 2 ( s ) G 2 ( s ) λ 2 t a n h ( D ^ 2 ) W 2 * T ϕ 2 ( s ) G 2 ( s ) λ 2 t a n h ( D 2 ) W 2 * T ϕ 2 ( s ) G 2 ( s ) λ 2 [ t a n h ( σ m 2 D ^ 2 ) t a n h ( σ m 2 D 2 ) ] W ˜ 1 T σ 1 ( t ) + W 1 * T ( σ 1 ( t ) σ 1 * ( t ) ) + ϵ 11 + ϵ 12 + W ˜ 2 T ϕ 2 ( s ) G 2 ( s ) λ 2 t a n h ( σ m 2 D ^ 2 ) , = W ˜ 1 T σ 1 ( t ) + W ˜ 1 T ϕ 1 ( s ) G 1 ( s ) λ 1 [ t a n h ( σ m 1 D ^ 1 ) t a n h ( D ^ 1 ) ] W 1 * T ϕ 1 ( s ) G 1 ( s ) λ 1 [ t a n h ( σ m 1 D ^ 1 ) t a n h ( σ m 1 D 1 ) ] + W ^ 2 T ϕ 2 ( s ) G 2 ( s ) λ 2 [ t a n h ( D ^ 2 ) t a n h ( σ m 2 D ^ 2 ) ] + W 2 * T ϕ 2 ( s ) G 2 ( s ) λ 2 [ t a n h ( σ m 2 D 2 ) t a n h ( D 2 ) ] + W 1 * T ϕ 1 ( s ) G 2 ( s ) λ 2 [ t a n h ( D 2 * ) t a n h ( D ^ 2 ) ] + ϵ 11 + ϵ 12 , = W ˜ 1 T σ 1 ( t ) + W ˜ 1 T ψ 1 + W 1 * T ( ψ 5 ψ 2 ) + W ^ 2 T ψ 3 + W 2 * T ψ 4 + ϵ 11 + ϵ 12 ,
where
ϵ 11 = Γ 1 ( s , V ^ 1 ) Γ 1 ( s , V 1 * ) + ε B 1 , ϵ 12 = λ 1 2 R ¯ 11 ( ε D ^ 1 ε D 1 ) + ε σ 1 + λ 2 2 R ¯ 22 ( ε D ^ 2 ε D 2 ) + ε σ 2 , ψ 1 = ϕ 1 ( s ) G 1 ( s ) λ 1 [ t a n h ( σ m 1 D ^ 1 ) t a n h ( D ^ 1 ) ] , ψ 2 = ϕ 1 ( s ) G 1 ( s ) λ 1 [ t a n h ( σ m 1 D ^ 1 ) t a n h ( σ m 1 D 1 ) ] , ψ 3 = ϕ 2 ( s ) G 2 ( s ) λ 2 [ t a n h ( D ^ 2 ) t a n h ( σ m 2 D ^ 2 ) ] , ψ 4 = ϕ 2 ( s ) G 2 ( s ) λ 2 [ t a n h ( σ m 2 D 2 ) t a n h ( D 2 ) ] , ψ 5 = ϕ 1 ( s ) G 2 ( s ) λ 2 [ t a n h ( D 2 * ) t a n h ( D ^ 2 ) ] .
Similarly,
σ 1 ( t i ) T W ^ 1 + r 1 ( s ( t i ) , u ^ 1 ( t i ) , u ^ 2 ( t i ) , Γ 1 ( t i ) ) = − W ˜ 1 T σ 1 ( t i ) + W ˜ 1 T ψ 1 + W 1 * T ( ψ 5 − ψ 2 ) + W ^ 2 T ψ 3 + W 2 * T ψ 4 + ϵ 11 + ϵ 12 .
Substituting Formulas (A7) and (A8) into Formula (A3) yields
α 1 1 W ˜ 1 T W ˜ ˙ 1 = W ˜ 1 T [ σ 1 ( t ) σ 1 T ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ 1 T ( t i ) σ ¯ 1 ( t i ) ] W ˜ 1 + W ˜ 1 T [ σ 1 ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ ¯ 1 ( t i ) ] ψ 1 T W ˜ 1 + W ˜ 1 T [ σ 1 ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ ¯ 1 ( t i ) ] ψ 3 T W ^ 2 + W ˜ 1 T [ σ 1 ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ ¯ 1 ( t i ) ] ( ψ 5 ψ 2 ) T W 1 + W ˜ 1 T [ σ 1 ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ ¯ 1 ( t i ) ] ψ 4 T W 2 * + W ˜ 1 T [ σ 1 ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ ¯ 1 ( t i ) ] ( ϵ 11 + ϵ 12 ) , = W ˜ 1 T ϖ 1 W ˜ 1 + W ˜ 1 T ϖ 2 ψ 1 T W ˜ 1 + W ˜ 1 T ϖ 2 ψ 3 T W ^ 2 + W ˜ 1 T ϖ 2 ψ 4 T W 2 * + W ˜ 1 T ϖ 2 ( ψ 5 T ψ 2 T ) W 1 + W ˜ 1 T ϖ 3 , W ˜ 1 T ϖ 1 W ˜ 1 + r c 2 W ˜ 1 T ϖ 2 ϖ 2 T W ˜ 1 + 1 2 r c W ˜ 1 T ψ 1 ψ 1 T W ˜ 1 + W ˜ 1 T ϖ 2 ψ 3 T W ^ 2 + W ˜ 1 T ϖ 2 ( ψ 5 T ψ 2 T ) W 1 + W ˜ 1 T ϖ 3 + W ˜ 1 T ϖ 2 ψ 4 T W 2 * ,
where r c is a positive constant to be determined,
ϖ 1 = [ σ 1 ( t ) σ 1 T ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ 1 T ( t i ) σ ¯ 1 ( t i ) ] , ϖ 2 = [ σ 1 ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ ¯ 1 ( t i ) ] , ϖ 3 = [ σ 1 ( t ) σ ¯ 1 ( t ) + l = 1 N σ 1 ( t i ) σ ¯ 1 ( t i ) ] ( ϵ 11 + ϵ 12 ) .
We can also obtain an upper bound on α 2 1 W ˜ 2 T W ˜ ˙ 2 using the similar method,
α 2 1 W ˜ 2 T W ˜ ˙ 2 = W ˜ 2 T ϖ 4 W ˜ 2 + W ˜ 2 T ϖ 5 ψ 6 T W ˜ 2 + W ˜ 2 T ϖ 5 ψ 8 T W ^ 1 + W ˜ 2 T ϖ 5 ψ 9 T W 1 * + W ˜ 2 T ϖ 5 ( ψ 10 T ψ 7 T ) W 2 + W ˜ 2 T ϖ 6 , W ˜ 2 T ϖ 4 W ˜ 2 + r c 2 W ˜ 2 T ϖ 5 ϖ 5 T W ˜ 2 + 1 2 r c W ˜ 2 T ψ 6 ψ 6 T W ˜ 2 + W ˜ 2 T ϖ 5 ψ 8 T W ^ 1 + W ˜ 2 T ϖ 5 ( ψ 10 T ψ 7 T ) W 2 + W ˜ 2 T ϖ 6 + W ˜ 2 T ϖ 5 ψ 9 T W 1 * ,
where ε D ^ 3 , ε D 3 , ε D ^ 4 , and ε D 4 are bounded approximation errors, σ m 3 , σ m 4 are two large constants, and ε σ 3 , ε σ 4 are approximation errors,
ψ 6 = ϕ 2 ( s ) G 2 ( s ) λ 2 [ t a n h ( σ m 3 D ^ 2 ) t a n h ( D ^ 2 ) ] , ψ 7 = ϕ 2 ( s ) G 2 ( s ) λ 2 [ t a n h ( σ m 3 D ^ 2 ) t a n h ( σ m 3 D 2 ) ] , ψ 8 = ϕ 1 ( s ) G 1 ( s ) λ 1 [ t a n h ( D ^ 1 ) t a n h ( σ m 4 D ^ 1 ) ] , ψ 9 = ϕ 1 ( s ) G 1 ( s ) λ 1 [ t a n h ( σ m 4 D 1 ) t a n h ( D 1 ) ] , ψ 10 = ϕ 2 ( s ) G 1 ( s ) λ 1 [ t a n h ( D 1 * ) t a n h ( D ^ 1 ) ] , ϖ 4 = [ σ 2 ( t ) σ 2 T ( t ) σ ¯ 2 ( t ) + l = 1 N σ 2 ( t i ) σ 2 T ( t i ) σ ¯ 2 ( t i ) ] , ϖ 5 = [ σ 2 ( t ) σ ¯ 2 ( t ) + l = 1 N σ 2 ( t i ) σ ¯ 2 ( t i ) ] , ϖ 6 = [ σ 2 ( t ) σ ¯ 2 ( t ) + l = 1 N σ 2 ( t i ) σ ¯ 2 ( t i ) ] ( ϵ 21 + ϵ 22 ) , ϵ 21 = Γ 2 ( s , V ^ 2 ) Γ 2 ( s , V 2 * ) + ε B 2 , ϵ 22 = λ 1 2 R ¯ 11 ( ε D ^ 3 ε D 3 ) + ε σ 3 + λ 2 2 R ¯ 22 ( ε D ^ 4 ε D 4 ) + ε σ 4 .
Considering (30), we derive that
V ˙ 1 = ( W 1 * T ϕ 1 ( s ) + ε 1 T ) ( F ( s ) + G 1 ( s ) u 1 + G 2 ( s ) u 2 ) = W 1 * T ϕ 1 ( s ) F ( s ) − W 1 * T ϕ 1 ( s ) G 1 ( s ) λ 1 tanh ( D ^ 1 ) − W 1 * T ϕ 1 ( s ) G 2 ( s ) λ 2 tanh ( D ^ 2 ) + ε 0 ,
where ε 0 = ε 1 T ( F ( s ) − G 1 ( s ) λ 1 tanh ( D ^ 1 ) − G 2 ( s ) λ 2 tanh ( D ^ 2 ) ) . Based on Assumptions 1 and 2, one has
ε 0 ≤ λ d ε 1 λ f ∥ s ∥ + λ d ε 1 λ 1 g λ 1 + λ d ε 1 λ 2 g λ 2 .
Based on (31), one has
W 1 * T ϕ 1 ( s ) F ( s ) = − s T Q 1 s − Φ 1 ( u 1 , λ 1 ) − Φ 2 ( u 2 , λ 2 ) − Γ 1 ( s , V 1 ) + ε B 1 + W 1 * T ϕ 1 ( s ) G 1 ( s ) λ 1 tanh ( D 1 ) + W 1 * T ϕ 1 ( s ) G 2 ( s ) λ 2 tanh ( D 2 ) .
Based on (A13) and the facts that W 1 * T ϕ 1 ( s ) G 1 ( s ) λ 1 [ tanh ( D 1 ) − tanh ( D ^ 1 ) ] ≤ 2 λ 1 λ g 1 λ d ϕ 1 ∥ W 1 * ∥ , W 1 * T ϕ 1 ( s ) G 2 ( s ) λ 2 [ tanh ( D 2 ) − tanh ( D ^ 2 ) ] ≤ 2 λ 2 λ g 2 λ d ϕ 1 ∥ W 2 * ∥ , ∥ ε B 1 ∥ ≤ ε B 1 h , and Φ 1 ( u 1 , λ 1 ) , Φ 2 ( u 2 , λ 2 ) , Γ 1 ( s , V 1 ) are positive definite, one has
V ˙ 1 ≤ − s T Q 1 s + ε B 1 h + λ d ε 1 λ f ∥ s ∥ + λ d ε 1 λ 1 g λ 1 + λ d ε 1 λ 2 g λ 2 + 2 λ 1 λ g 1 λ d ϕ 1 ∥ W 1 * ∥ + 2 λ 2 λ g 2 λ d ϕ 1 ∥ W 2 * ∥ .
Similarly, we can derive
V ˙ 2 ≤ − s T Q 2 s + ε B 2 h + λ d ε 2 λ f ∥ s ∥ + λ d ε 2 λ 1 g λ 1 + λ d ε 2 λ 2 g λ 2 + 2 λ 1 λ g 1 λ d ϕ 2 ∥ W 1 * ∥ + 2 λ 2 λ g 2 λ d ϕ 2 ∥ W 2 * ∥ .
Collecting the results in (A9), (A10), (A14) and (A15), one has
L ˙ s T Q 1 s s T Q 2 s W ˜ 1 T ϖ 1 W ˜ 1 + r c 2 W ˜ 1 T ϖ 2 ϖ 2 T W ˜ 1 + 1 2 r c W ˜ 1 T ψ 1 ψ 1 T W ˜ 1 + W ˜ 1 T ϖ 2 ψ 3 T W ^ 2 + W ˜ 1 T ϖ 2 ( ψ 5 T ψ 2 T ) W 1 * + W ˜ 1 T ϖ 3 + W ˜ 1 T ϖ 2 ψ 4 T W 2 * W ˜ 2 T ϖ 4 W ˜ 2 + r c 2 W ˜ 2 T ϖ 5 ϖ 5 T W ˜ 2 + 1 2 r c W ˜ 2 T ψ 6 ψ 6 T W ˜ 2 + W ˜ 2 T ϖ 5 ψ 8 T W ^ 1 + W ˜ 2 T ϖ 5 ( ψ 10 T ψ 7 T ) W 2 + W ˜ 2 T ϖ 6 + W ˜ 2 T ϖ 5 ψ 9 T W 1 * + h 1 + h 2 , = s T Q 1 s s T Q 2 s W ˜ 1 T h 3 W ˜ 1 + W ˜ 1 T h 4 W ˜ 2 T h 5 W ˜ 2 + W ˜ 2 T h 6 + h 1 + h 2 ,
where
h 1 = ε B 1 h + λ d ε 1 λ f s + λ d ε 1 λ 1 g λ 1 + λ d ε 1 λ 2 g λ 2 + 2 λ 1 λ g 1 λ d ϕ 1 W 1 * + 2 λ 2 λ g 2 λ d ϕ 1 W 2 * ,
h 2 = ε B 2 h + λ d ε 2 λ f s + λ d ε 2 λ 1 g λ 1 + λ d ε 2 λ 2 g λ 2 + 2 λ 1 λ g 1 λ d ϕ 2 W 1 * + 2 λ 2 λ g 2 λ d ϕ 2 W 2 * ,
h 3 = ϖ 1 + r c 2 ϖ 2 ϖ 2 T + 1 2 r c ψ 1 ψ 1 T ,
h 4 = ϖ 2 ψ 3 T W ^ 2 + ϖ 2 ( ψ 5 T ψ 2 T ) W 1 * + ϖ 3 + ϖ 2 ψ 4 T W 2 * ,
h 5 = ϖ 4 + r c 2 ϖ 5 ϖ 5 T + 1 2 r c ψ 6 ψ 6 T ,
h 6 = ϖ 5 ψ 8 T W ^ 1 + ϖ 5 ( ψ 10 T ψ 7 T ) + ϖ 6 + ϖ 5 ψ 9 T W 1 * .
Finally, collecting the results in (A9), (A10), (A14), (A15) and (A16), one has
L ˙ ≤ − s T Q 1 s − s T Q 2 s − W ˜ 1 T h 3 W ˜ 1 + W ˜ 1 T h 4 − W ˜ 2 T h 5 W ˜ 2 + W ˜ 2 T h 6 + h 1 + h 2 ≤ − λ min ( Q 1 ) ∥ s ∥ 2 − λ min ( Q 2 ) ∥ s ∥ 2 − λ min ( h 3 ) ∥ W ˜ 1 ∥ 2 + ∥ W ˜ 1 ∥ ∥ h 4 ∥ − λ min ( h 5 ) ∥ W ˜ 2 ∥ 2 + ∥ W ˜ 2 ∥ ∥ h 6 ∥ + h 1 + h 2 .
Reasonable selection of parameters makes h 3 > 0 , h 4 > 0 , h 5 > 0 , h 6 > 0 , and the Lyapunov derivative (A2) is negative if
∥ W ˜ 1 ∥ > ∥ h 4 ∥ / ( 2 λ min ( h 3 ) ) + √ ( ∥ h 4 ∥ 2 / ( 4 λ min 2 ( h 3 ) ) + ( ∥ W ˜ 2 ∥ ∥ h 6 ∥ + h 1 + h 2 ) / λ min ( h 3 ) ) ,
∥ W ˜ 2 ∥ > ∥ h 6 ∥ / ( 2 λ min ( h 5 ) ) + √ ( ∥ h 6 ∥ 2 / ( 4 λ min 2 ( h 5 ) ) + ( ∥ W ˜ 1 ∥ ∥ h 4 ∥ + h 1 + h 2 ) / λ min ( h 5 ) ) .
Based on the Lyapunov theorem and Formulas (A18) and (A19), we can select parameters appropriately to ensure that the system state s and critic neural network weight errors W ˜ 1 , W ˜ 2 are UUB.
This completes the proof. □

References

1. Tee, K.P.; Ge, S.S.; Tay, E.H. Barrier Lyapunov Functions for the control of output-constrained nonlinear systems. IFAC Proc. Vol. 2013, 46, 449–455.
2. Ames, A.D.; Coogan, S.; Egerstedt, M.; Notomista, G.; Sreenath, K.; Tabuada, P. Control barrier functions: Theory and applications. In Proceedings of the 18th European Control Conference (ECC), Saint Petersburg, Russia, 12 May 2020; pp. 3420–3431.
3. Wang, D.; He, H.; Liu, D. Adaptive Critic Nonlinear Robust Control: A Survey. IEEE Trans. Cybern. 2017, 47, 3429–3451.
4. Wang, D.; Liu, D. Learning and guaranteed cost control with event-based adaptive critic implementation. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 6004–6014.
5. Vamvoudakis, K.G.; Lewis, F.L. Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton–Jacobi equations. Automatica 2011, 47, 1556–1569.
6. Liu, D.; Xue, S.; Zhao, B.; Luo, B.; Wei, Q. Adaptive Dynamic Programming for Control: A Survey and Recent Advances. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 142–160.
7. El-Sousy, F.F.M.; Amin, M.M.; Al-Durra, A. Adaptive Optimal Tracking Control Via Actor-Critic-Identifier Based Adaptive Dynamic Programming for Permanent-Magnet Synchronous Motor Drive System. IEEE Trans. Ind. Appl. 2021, 57, 6577–6591.
8. Liu, D.; Li, H.; Wang, D. Online Synchronous Approximate Optimal Learning Algorithm for Multi-Player Non-Zero-Sum Games With Unknown Dynamics. IEEE Trans. Syst. Man Cybern. 2014, 44, 1015–1027.
9. Zhao, B.; Liu, D. Event-Triggered Decentralized Tracking Control of Modular Reconfigurable Robots Through Adaptive Dynamic Programming. IEEE Trans. Ind. Electron. 2020, 67, 3054–3064.
10. Zhao, B.; Wang, D.; Shi, G.; Liu, D.; Li, Y. Decentralized Control for Large-Scale Nonlinear Systems With Unknown Mismatched Interconnections via Policy Iteration. IEEE Trans. Syst. Man Cybern. 2018, 48, 1725–1735.
11. Wang, D.; Liu, D.; Li, H.; Ma, H. Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming. Inf. Sci. 2014, 282, 167–179.
12. Modares, H.; Lewis, F.L.; Jiang, Z.P. H∞ tracking control of completely unknown continuous-time systems via off-policy reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 2550–2562.
13. Wang, D.; He, H.; Liu, D. Improving the Critic Learning for Event-Based Nonlinear H∞ Control Design. IEEE Trans. Cybern. 2017, 47, 3417–3428.
14. Zhang, H.; Xi, R.; Wang, Y.; Sun, S.; Sun, J. Event-Triggered Adaptive Tracking Control for Random Systems With Coexisting Parametric Uncertainties and Severe Nonlinearities. IEEE Trans. Autom. Contr. 2022, 67, 2011–2018.
15. Vamvoudakis, K.G.; Lewis, F.L. Online actor critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 2010, 46, 878–888.
16. Li, J.; Ding, J.; Chai, T.; Lewis, F.L.; Jagannathan, S. Adaptive Interleaved Reinforcement Learning: Robust Stability of Affine Nonlinear Systems with Unknown Uncertainty. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 270–280.
17. Zhang, H.; Zhang, K.; Xiao, G.; Jiang, H. Robust Optimal Control Scheme for Unknown Constrained-Input Nonlinear Systems via a Plug-n-Play Event-Sampled Critic-Only Algorithm. IEEE Trans. Syst. Man Cybern. 2020, 50, 3169–3180.
18. Wang, D.; Mu, C.; He, H.; Liu, D. Event-Driven Adaptive Robust Control of Nonlinear Systems With Uncertainties Through NDP Strategy. IEEE Trans. Syst. Man Cybern. 2017, 47, 1358–1370.
19. Wei, Q.; Zhu, L.; Song, R.; Zhang, P.; Liu, D.; Xiao, J. Model-Free Adaptive Optimal Control for Unknown Nonlinear Multiplayer Nonzero-Sum Game. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 879–892.
20. Zhang, H.; Su, H.; Zhang, K.; Luo, Y. Event-Triggered Adaptive Dynamic Programming for Non-Zero-Sum Games of Unknown Nonlinear Systems via Generalized Fuzzy Hyperbolic Models. IEEE Trans. Fuzzy. Syst. 2019, 27, 2202–2214.
21. Vamvoudakis, K.G.; Modares, H.; Kiumarsi, B.; Lewis, F.L. Game Theory-Based Control System Algorithms with Real-Time Reinforcement Learning: How to Solve Multiplayer Games Online. IEEE Contr. Syst. Mag. 2017, 37, 33–52.
22. Li, J.; Xiao, Z.; Li, P. Discrete-time Multi-player Games Based on Off-Policy Q-Learning. IEEE Access 2019, 7, 134647–134659.
23. Su, H.; Zhang, H.; Jiang, H.; Wen, Y. Decentralized Event-Triggered Adaptive Control of Discrete-Time Nonzero-Sum Games Over Wireless Sensor-Actuator Networks With Input Constraints. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 4254–4266.
24. Song, R.; Wei, Q.; Zhang, H.; Lewis, F.L. Discrete-Time Non-Zero-Sum Games With Completely Unknown Dynamics. IEEE Trans. Cybern. 2021, 51, 2929–2943.
25. Xue, S.; Luo, B.; Liu, D. Event-Triggered Adaptive Dynamic Programming for Zero-Sum Game of Partially Unknown Continuous-Time Nonlinear Systems. IEEE Trans. Syst. Man Cybern. 2020, 50, 3189–3199.
26. Luo, B.; Yang, Y.; Liu, D. Policy Iteration Q-Learning for Data-Based Two-Player Zero-Sum Game of Linear Discrete-Time Systems. IEEE Trans. Cybern. 2021, 51, 3630–3640.
27. Wang, W.; Chen, X.; Fu, H.; Wu, M. Model-Free Distributed Consensus Control Based on Actor–Critic Framework for Discrete-Time Nonlinear Multiagent Systems. IEEE Trans. Syst. Man Cybern. 2020, 50, 4123–4134.
28. Qin, C.; Shang, Z.; Zhang, Z.; Zhang, D.; Zhang, J. Robust Tracking Control for Non-Zero-Sum Games of Continuous-Time Uncertain Nonlinear Systems. Mathematics 2022, 10, 1904.
29. Song, R.; Lewis, F.L.; Wei, Q. Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 704–713.
30. Ming, Z.; Zhang, H.; Liang, L.; Su, H. Nonzero-sum differential games of continuous-time nonlinear systems with uniformly ultimately ε-bounded by adaptive dynamic programming. Appl. Math. Comput. 2022, 430, 127248.
31. Marvi, Z.; Kiumarsi, B. Safe reinforcement learning: A control barrier function optimization approach. Int. J. Robust Nonlinear Control 2021, 31, 1923–1940.
32. Xu, J.; Wang, J.; Rao, J.; Zhong, Y.; Wang, H. Adaptive dynamic programming for optimal control of discrete-time nonlinear system with state constraints based on control barrier function. Int. J. Robust Nonlinear Control 2021, 32, 3408–3424.
33. Liu, Y.J.; Lu, S.; Tong, S.; Chen, X.; Chen, C.P.; Li, D.J. Adaptive control-based Barrier Lyapunov Functions for a class of stochastic nonlinear systems with full state constraints. Automatica 2018, 87, 83–93.
34. Yang, Y.; Ding, D.W.; Xiong, H.; Yin, Y.; Wunsch, D.C. Online barrier-actor-critic learning for H∞ control with full-state constraints and input saturation. J. Franklin Inst. 2020, 357, 3316–3344.
35. Yang, Y.; Vamvoudakis, K.G.; Modares, H. Safe reinforcement learning for dynamical games. Int. J. Robust Nonlinear Control 2020, 30, 3706–3726.
36. Qin, C.; Wang, J.; Qiao, X.; Zhu, H.; Zhang, D.; Yan, Y. Integral Reinforcement Learning for Tracking in a Class of Partially Unknown Linear Systems with Output Constraints and External Disturbances. IEEE Access 2022, 10, 55270–55278.
37. Qin, C.; Zhu, H.; Wang, J.; Xiao, Q.; Zhang, D. Event-Triggered Safe Control for the Zero-Sum Game of Nonlinear Safety-Critical Systems with Input Saturation. IEEE Access 2022, 10, 40324–40337.
38. Hu, G. Observers for one-sided Lipschitz nonlinear systems. IMA J. Math. Control Inf. 2006, 23, 395–401.
39. Modares, H.; Lewis, F.L.; Sistani, M. Online Solution of nonquadratic two-player zero-sum games arising in the H∞ control of constrained input systems. Int. J. Adapt. Control 2014, 28, 232–254.
40. Modares, H.; Lewis, F.L.; Naghibi-Sistani, M.B. Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 2014, 50, 193–202.
Figure 1. The structure diagram of the proposed multi-input safety-critical system.
Figure 2. Evolution of the state x ( t ) by using the presented method and the method in [35].
Figure 3. Evolution of the critic estimates for player 1.
Figure 4. Evolution of the critic estimates for player 2.
Figure 5. Transformed system states using the presented method.
Figure 6. Evolution of the state x ( t ) by using the presented method and the method in [35].
Figure 7. Constrained control inputs of player 1 and player 2.
Figure 8. Evolution of the critic estimates for player 1.
Figure 9. Evolution of the critic estimates for player 2.
Figure 10. Transformed system states using the presented method.

Citation: Wang, J.; Qin, C.; Qiao, X.; Zhang, D.; Zhang, Z.; Shang, Z.; Zhu, H. Constrained Optimal Control for Nonlinear Multi-Input Safety-Critical Systems with Time-Varying Safety Constraints. Mathematics 2022, 10, 2744. https://doi.org/10.3390/math10152744
