Article

Event-Triggered Single-Network ADP for Zero-Sum Game of Unknown Nonlinear Systems with Constrained Input

Binbin Peng, Xiaohong Cui, Yang Cui and Wenjie Chen
1 College of Mechanical and Electrical Engineering, China Jiliang University, Hangzhou 310018, China
2 Key Laboratory of Intelligent Manufacturing Quality Big Data Tracing and Analysis of Zhejiang Province, China Jiliang University, Hangzhou 310018, China
3 School of Electronic and Information Engineering, University of Science and Technology Liaoning, Shenyang 114051, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(4), 2140; https://doi.org/10.3390/app13042140
Submission received: 17 December 2022 / Revised: 3 February 2023 / Accepted: 6 February 2023 / Published: 7 February 2023

Abstract

In this paper, an event-triggered adaptive dynamic programming (ADP) method is proposed to deal with the $H_\infty$ problem with unknown dynamics and constrained input. Firstly, the $H_\infty$-constrained problem is regarded as a two-player zero-sum game with a nonquadratic value function. Secondly, we develop the event-triggered Hamilton–Jacobi–Isaacs (HJI) equation, and an event-triggered ADP method is proposed to solve the HJI equation, which is equivalent to finding the Nash saddle point of the zero-sum game. An event-based single-critic neural network (NN) is applied to obtain the optimal value function, which reduces the communication resources and computational cost of algorithm implementation. For the event-triggered control, a triggering condition involving the level of disturbance attenuation is developed to limit the number of sampled states, and the condition avoids Zeno behavior by proving the existence of a minimum triggering interval between events. It is proved theoretically that the closed-loop system is asymptotically stable and that the critic NN weight error is uniformly ultimately bounded (UUB). The learning performance of the proposed algorithm is verified by two examples.

1. Introduction

In control systems, the main task of the controller is to obtain an admissible control law that satisfies given conditions according to the dynamic characteristics of the plant, and then to optimize a performance index (maximum or minimum) in cooperation with the plant, so as to solve the optimal control problem [1]. Solving the optimal control problem is equivalent to solving the Hamilton–Jacobi–Bellman (HJB) equation. Because the HJB equation is nonlinear and involves partial derivatives, it is difficult to obtain its analytical solution. In recent years, the adaptive dynamic programming (ADP) method has received widespread attention for obtaining approximate solutions of the HJB equation [2,3,4,5]. The ADP method combines the ideas of dynamic programming (DP) and reinforcement learning (RL). An RL-based agent interacts with the environment and takes actions to obtain cumulative rewards, which overcomes the curse of dimensionality arising in DP. The implementation of the ADP method approximates the HJB equation by exploiting the approximation property of neural networks, and obtains the optimal value function and optimal control law by RL. In [6], the structure of ADP was proposed for the first time. Subsequently, the actor–critic NN structure in [7] is used to deal with nonlinear systems, and it depends on full dynamic information. In [8,9,10], policy iteration (PI), consisting of iterative policy evaluation and policy improvement, is used for continuous-time and discrete-time dynamic systems to update the control policy online with state and input information. However, the PI learning algorithm relies on a precise dynamic model: both the drift dynamics and the input dynamics are needed in the iterative process. Therefore, several methods improve on the PI algorithm while retaining its iterative approximation of the optimal solution. An integral reinforcement learning (IRL) algorithm is proposed in [11] to remove the need for the drift dynamics by adding an integral operation. In [12], for a locally unknown continuous-time nonlinear system with actuator saturation, an IRL algorithm with an actor–critic network is proposed to solve the HJB equation online; the experience replay technique is then used to update the critic weights when solving the IRL-Bellman equation. The IRL method is applied to the unknown $H_\infty$ tracking control problem of a linear system with actuator saturation in [13]; the designed controller not only makes the system state trajectory converge but also makes the tracking error asymptotically stable.
The complexity and uncertainty of practical systems increase the difficulty of solving the HJB equation, especially for nonlinear systems, and $H_\infty$ control is an alternative approach to the robust optimal control problem. The purpose of the $H_\infty$ control problem is to design a controller that effectively suppresses the impact of external disturbances on system performance. Therefore, many studies transform the $H_\infty$ optimal control problem into a zero-sum game problem, which is essentially a max/min optimization problem [14,15]. Accordingly, finding the Nash saddle point of the two-player zero-sum game is equivalent to solving the Hamilton–Jacobi–Isaacs (HJI) equation. In [16,17], an online policy iteration (PI) algorithm is presented to solve the two-player zero-sum game, and an ADP-based critic–actor network is used to approximate the solution of the HJI equation. For systems containing external disturbances, a new ADP-based online IRL algorithm is proposed to approximate the HJI equation; both current and historical data are used to update the network weights, which improves data utilization efficiency when solving the HJI equation [18]. In [19], an $H_\infty$ tracking controller is designed to solve the zero-sum game with completely unknown dynamics and constrained input, and the tracking HJI equation is solved by off-policy reinforcement learning. In [20,21], adaptive critic designs (ACDs) are used to solve the zero-sum problem in which external disturbances affect system performance.
However, in engineering applications, controllers are subject to strict physical and safety requirements and are usually limited by actuator thresholds. In addition, higher demands are placed on the design and analysis of the control system: it is necessary not only to achieve the control design goal of ensuring the stability of the dynamic system but also to consider the control performance so as to save energy and reduce consumption. The iterative methods above mostly use a time-triggered mechanism with periodic sampling to solve the nonlinear zero-sum game problem with actuator saturation.
To reduce unnecessary data transmission and update frequency between components, the event-triggered mechanism was introduced into adaptive dynamic programming for the first time in [22]. A controller whose sampled states are limited by a triggering condition is designed, which not only ensures the stability and optimality of the system but also reduces controller updates. For systems with constrained input, [23,24] propose an approximately optimal control structure based on the event-triggered strategy, which makes the control laws update aperiodically to reduce computation and transmission costs and ensures the uniform ultimate boundedness of the event-triggered system. The designed triggering condition is the key to an event-based controller: it must have a non-negative triggering threshold and must avoid Zeno behavior. In [25], an event-driven controller is designed whose triggering condition both keeps a non-negative triggering threshold and excludes Zeno behavior. In [26], an IRL-based event-triggered algorithm is used for partially unknown nonlinear systems; the critic network updates periodically and the actor network updates aperiodically to approximately acquire the performance index function and the control law, the convergence of the NN weights is proved theoretically, and the Zeno phenomenon is effectively avoided. For systems with unknown disturbance, a robust controller is designed using the $H_\infty$ control method. In [15], an event-triggered $H_\infty$ controller is constructed which introduces the disturbance attenuation level into the triggering condition and ensures that the triggering threshold is non-negative by selecting appropriate parameters; the control law is updated under the event-triggered strategy, while the perturbation law is adjusted under the time-triggered mechanism.
At present, the event-triggered ADP algorithm has been applied to the optimal regulation problem [27,28], optimal tracking control [29], zero-sum games [30,31,32], non-zero-sum games [33,34], and robust control problems [35,36]. However, most studies rely on identifier–critic NNs or critic–actor–perturbation NN structures to approximately obtain the solution of the HJI equation for the zero-sum game problem, which often increases the communication load between components such as actuators and controllers and increases resource consumption and cost [37,38]. Therefore, an event-triggered ADP method is proposed to solve the $H_\infty$ optimal control problem for partially unknown continuous-time nonlinear systems with constrained input, and a single-critic network structure is constructed to approximately acquire the solution of the HJI equation. This paper aims to achieve the following:
  • Based on event-triggered control, a triggering condition involving the level of disturbance attenuation is developed to limit the sampled states of the system, and an appropriate level of disturbance attenuation is selected to ensure that the triggering threshold remains non-negative.
  • An event-triggered $H_\infty$ controller is designed for the input-constrained nonlinear system. The event-based control law and disturbance law are updated only at the triggering instants, which effectively reduces the computation in the control process.
  • For the zero-sum game problem, a single-critic network structure based on event-triggered ADP is proposed to approximate the solution of the HJI equation. This not only greatly reduces the update frequency of the controller and the computational cost but also relaxes the reliance on known dynamic information.
The rest of this article is organized as follows. Section 2 gives the description and transformation of the problem. Section 3 introduces the event-triggered HJI equation. Section 4 describes the implementation of the event-based ADP algorithm, and gives an analysis of system stability. Section 5 demonstrates the simulation of the continuous-time linear system and the continuous-time (CT) nonlinear system, and Section 6 presents the conclusion of this paper.
Notation. Some notation used in this article is defined as follows. $\mathbb{R}$, $\mathbb{R}^m$, and $\mathbb{R}^{m \times n}$ denote the set of real numbers, the set of real $m$-dimensional vectors, and the set of real $m \times n$ matrices, respectively. $\mathbb{N}^+$ is the set of positive integers. $\tanh^{-T}(\cdot)$ denotes the transpose of the inverse hyperbolic tangent, $\underline{\lambda}(\cdot)$ and $\bar{\lambda}(\cdot)$ are the minimum and maximum eigenvalues of a matrix, and $\|\cdot\|$ is the 2-norm. $\nabla V \triangleq \partial V / \partial x$ denotes the partial derivative of the function $V$ with respect to the variable $x$, and $\to$ denotes approach to a limit.

2. Problem Description

Consider the continuous-time nonlinear system with external disturbance as
$$\dot{x}(t) = f(x(t)) + g(x(t))u(t) + h(x(t))\upsilon(t)$$
where $x \in \mathbb{R}^n$ denotes the state vector of the system, and $u(t) \in \{u = (u_1, u_2, \ldots, u_m)^T \in \mathbb{R}^m : |u_i| \le u_M,\ i = 1, 2, \ldots, m\}$, where $u_M$ is a positive upper bound. $f(x) \in \mathbb{R}^n$ is the drift dynamics with $f(0) = 0$, $g(x) \in \mathbb{R}^{n \times m}$ is the input coupling dynamics, and $h(x) \in \mathbb{R}^{n \times q}$ and $\upsilon(t) \in \mathbb{R}^q$ are the disturbance dynamics and the bounded external disturbance, respectively. $\upsilon_M$ is a positive constant with $\|\upsilon(t)\| \le \upsilon_M$.
Since the system is affected by constrained input, M ( u ) is defined as [39]
$$M(u) = 2\int_0^{u} u_M \tanh^{-T}(\mu/u_M)\, R \, d\mu = 2\sum_{i=1}^{m}\int_0^{u_i} u_M \tanh^{-1}(\mu_i/u_M)\, R_i \, d\mu_i$$
where $M(u)$ is a nonquadratic function, $R$ is assumed to be a diagonal matrix with diagonal entries $R_i$, and $\tanh^{-1}(\cdot)$ is the inverse hyperbolic tangent, which is used to handle the constrained input.
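For intuition, the following sketch evaluates this penalty numerically for a vector input. It is a minimal illustration only: the function name, the clipping tolerance, and the trapezoidal quadrature are choices of this sketch, not part of the paper.

```python
import numpy as np

def nonquadratic_penalty(u, u_max=1.0, r=1.0, n_steps=200):
    """Numerically evaluate M(u) = 2 * sum_i int_0^{u_i} u_max * atanh(mu/u_max) * r_i dmu
    for a diagonal R with entries r_i (a scalar r is broadcast to all channels)."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    r = np.broadcast_to(np.asarray(r, dtype=float), u.shape)
    total = 0.0
    for ui, ri in zip(u, r):
        mu = np.linspace(0.0, ui, n_steps)
        integrand = u_max * np.arctanh(np.clip(mu / u_max, -0.999999, 0.999999)) * ri
        # manual trapezoidal rule over [0, u_i]
        total += 2.0 * np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(mu))
    return total

# M(0) = 0, and M grows steeply as |u_i| approaches u_max
print(nonquadratic_penalty([0.0]), nonquadratic_penalty([0.9]))
```

The steep growth near the saturation bound is what discourages the optimal policy from demanding inputs beyond $u_M$.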
For the $H_\infty$ control problem with constrained input, we need to attenuate the impact of external disturbances on system performance. For any $\upsilon(t) \in L_2[0, \infty)$, the closed-loop system (1) satisfies
$$\int_t^{\infty} \left( x^T Q x + M(u) \right) d\tau \le \gamma^2 \int_t^{\infty} \upsilon^T(\tau)\upsilon(\tau)\, d\tau$$
that is, the $L_2$-gain is not larger than $\gamma$, where $Q \in \mathbb{R}^{n \times n}$ is a symmetric positive definite matrix and $\gamma > 0$ denotes the level of disturbance attenuation.
The design intention of the $H_\infty$ optimal control problem is to find a control law that not only guarantees the asymptotic stability of system (1) but also makes the disturbance attenuation condition (3) hold. To this end, we define the following value function
$$J(x(t), u, \upsilon) = \int_t^{\infty} U\big(x(\tau), u(\tau), \upsilon(\tau)\big)\, d\tau$$
where $U(x, u, \upsilon) = x^T Q x + M(u) - \gamma^2 \upsilon^T \upsilon$.
To achieve the above goals, the $H_\infty$ control problem is treated as a zero-sum game problem. By the minimax optimization principle, the perturbation policy acts as one decision-maker and maximizes the value, while the control policy acts as the other decision-maker and minimizes it. The optimal value function is given by
$$V^*(x) = J(x(t), u^*, \upsilon^*) = \min_{u}\max_{\upsilon} \int_t^{\infty} U\big(x(\tau), u(\tau), \upsilon(\tau)\big)\, d\tau$$
Assume that $V(x) = J(x, u, \upsilon)$ is continuous and differentiable; then the Bellman equation can be written as
$$\nabla V^T (f + gu + h\upsilon) + x^T Q x + M(u) - \gamma^2 \upsilon^T \upsilon = 0$$
Then, the Hamiltonian function is defined as
$$H(x, \nabla V, u, \upsilon) = \nabla V^T (f + gu + h\upsilon) + x^T Q x + M(u) - \gamma^2 \upsilon^T \upsilon$$
Definition 1
([12]). A control law $u(x)$ is said to be admissible with respect to the value function (4) on a compact set $\Lambda$ if $u(0) = 0$ and $\upsilon(0) = 0$, $u(x)$ can ensure that system (1) is stable on $\Lambda$, and the value function $V(x_0)$ is finite for any $x_0 \in \Lambda$.
By the stationarity conditions $\partial H / \partial u = 0$ and $\partial H / \partial \upsilon = 0$, the optimal control law $u^*(x)$ and the optimal disturbance law $\upsilon^*(x)$ are expressed as
$$u^*(x) = -u_M \tanh\!\left(\frac{1}{2u_M} R^{-1} g^T(x) \nabla V^*(x)\right)$$
$$\upsilon^*(x) = \frac{1}{2\gamma^2} h^T(x) \nabla V^*(x)$$
Substituting (8) into (2), $M(u^*)$ can be obtained as
$$M(u^*) = u_M \nabla V^{*T} g \tanh(\Psi) + u_M^2 \bar{R}\, \ln\!\big(\underline{1} - \tanh^2(\Psi)\big)$$
where $\Psi = \frac{1}{2u_M} R^{-1} g^T \nabla V^*(x)$, $\underline{1}$ denotes a column vector whose elements are all 1, and $\bar{R}$ is a row vector composed of the diagonal elements of $R$.
Based on (7)–(9), the optimal time-triggered HJI equation becomes
$$\nabla V^{*T}(x) f(x) + x^T Q x + u_M^2 \bar{R}\,\ln\!\big(\underline{1} - \tanh^2(\Psi)\big) + \frac{1}{4\gamma^2} \nabla V^{*T}(x) h(x) h^T(x) \nabla V^*(x) = 0$$
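To make (8), (9), and (11) concrete, the short sketch below evaluates the saturated control, the worst-case disturbance, and the HJI residual for a user-supplied value gradient. The toy quadratic value guess and two-state dynamics in the usage lines are assumptions of this sketch, not the paper's example.

```python
import numpy as np

def optimal_policies(grad_V, g, h, R_inv, u_max, gamma):
    """Saturated optimal control (8) and worst-case disturbance (9)."""
    u_star = -u_max * np.tanh(R_inv @ g.T @ grad_V / (2.0 * u_max))
    v_star = h.T @ grad_V / (2.0 * gamma**2)
    return u_star, v_star

def hji_residual(x, grad_V, f_x, g, h, Q, R_inv, R_bar, u_max, gamma):
    """Left-hand side of the time-triggered HJI equation (11); zero at the optimum."""
    psi = R_inv @ g.T @ grad_V / (2.0 * u_max)
    log_term = u_max**2 * R_bar @ np.log(np.clip(1.0 - np.tanh(psi)**2, 1e-12, None))
    return (grad_V @ f_x + x @ Q @ x + log_term
            + grad_V @ h @ h.T @ grad_V / (4.0 * gamma**2))

# toy usage: quadratic value guess V(x) = x^T P x, so grad V = 2 P x (illustrative only)
x = np.array([0.5, -1.0]); P = np.eye(2); grad_V = 2.0 * P @ x
g = np.array([[0.0], [1.0]]); h = np.array([[0.0], [1.0]])
f_x = np.array([-x[0] + x[1], -x[1]])
u, v = optimal_policies(grad_V, g, h, np.eye(1), u_max=1.0, gamma=0.5)
print(u, v, hji_residual(x, grad_V, f_x, g, h, np.eye(2), np.eye(1), np.ones(1), 1.0, 0.5))
```

A nonzero residual simply indicates that the chosen value gradient is not the solution of (11), which is exactly the quantity the learning schemes below try to drive to zero.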
It is difficult to solve (11) analytically. In addition, since the drift dynamics $f(x)$ is unknown, solving the HJI equation becomes even more difficult. Thus, we prefer to introduce the IRL technique to obtain the solution of the HJI equation without requiring the system dynamics $f(x)$.
The learning process of IRL mainly includes the following steps in Algorithm 1.
Algorithm 1: IRL algorithm.
1: Choose the initial admissible control laws u 0 , υ 0 .
2: Policy Evaluation. Over the interval $[t-T, t]$, the value function $V^{[i]}(x)$ is obtained by solving the following Bellman equation
$$V^{[i]}\big(x(t-T)\big) = \int_{t-T}^{t} \left( x^T Q x + M(u^{[i]}) - \gamma^2 \upsilon^{[i]T} \upsilon^{[i]} \right) d\tau + V^{[i]}\big(x(t)\big)$$
3: Policy Improvement. Update the control law and the disturbance law by
$$u^{[i+1]}(x) = -u_M \tanh\!\left(\frac{1}{2u_M} R^{-1} g^T(x) \nabla V^{[i]}(x)\right)$$
$$\upsilon^{[i+1]}(x) = \frac{1}{2\gamma^2} h^T(x) \nabla V^{[i]}(x)$$
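The following is a minimal sketch of how Algorithm 1 can be realized in simulation, assuming a toy two-state system, quadratic critic features, Euler integration, and batch least squares for the policy-evaluation step. The dynamics, horizons, and initial states are illustrative assumptions, not the paper's benchmark.

```python
import numpy as np

# Minimal IRL policy-iteration sketch (Algorithm 1); all numbers are illustrative.
u_max, gamma, T, dt = 1.0, 0.8, 0.5, 0.01
Q, R_inv = np.eye(2), np.eye(1)

def f(x): return np.array([-x[0] + x[1], -0.5 * x[0] - 0.5 * x[1]])
def g(x): return np.array([[0.0], [1.0]])
def h(x): return np.array([[0.0], [1.0]])

phi  = lambda x: np.array([x[0]**2, x[0] * x[1], x[1]**2])                   # critic features
dphi = lambda x: np.array([[2 * x[0], 0.0], [x[1], x[0]], [0.0, 2 * x[1]]])  # Jacobian of phi

def policies(x, w):
    grad_V = dphi(x).T @ w                                            # approximate value gradient
    u = -u_max * np.tanh(R_inv @ g(x).T @ grad_V / (2 * u_max))       # policy improvement (13)
    v = h(x).T @ grad_V / (2 * gamma**2)                              # policy improvement (14)
    return u, v

def penalty(u):                                                       # closed form of M(u) in (2), R = I
    s = np.clip(u / u_max, -0.999999, 0.999999)
    return float(2 * u_max * np.sum(u * np.arctanh(s)) + u_max**2 * np.sum(np.log(1 - s**2)))

w = np.zeros(3)                                                       # zero initial policy is admissible here
x0_set = [np.array(p) for p in ([1.0, -1.0], [-0.5, 0.8], [0.3, 0.6], [-1.0, -0.4], [0.7, 0.2])]
for it in range(30):
    A_rows, b_rows = [], []
    for x in (x0.copy() for x0 in x0_set):                            # policy evaluation data, eq. (12)
        phi_start, reward = phi(x), 0.0
        for _ in range(int(T / dt)):                                  # integrate the utility over [t-T, t]
            u, v = policies(x, w)
            reward += (x @ Q @ x + penalty(u) - gamma**2 * float(v @ v)) * dt
            x = x + (f(x) + g(x) @ u + h(x) @ v) * dt                 # Euler step
        A_rows.append(phi_start - phi(x))
        b_rows.append(reward)
    w_new = np.linalg.lstsq(np.array(A_rows), np.array(b_rows), rcond=None)[0]
    if np.linalg.norm(w_new - w) < 1e-6:
        break
    w = w_new
print("converged critic weights:", np.round(w, 4))
```

Note that the least-squares step plays the role of policy evaluation (12) without using $f(x)$, while the policy-improvement step still needs $g(x)$ and $h(x)$, mirroring Remark 1.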
Remark 1.
The IRL algorithm is an improvement over the policy iteration (PI) technique. Compared with the standard Bellman Equation (6), the IRL-based Bellman Equation (12) does not involve the system dynamics $f(x)$ and $g(x)$.
However, in the IRL algorithm, the control policy is updated at time $t$ and applied over the next interval $[t, t+T]$. This is so-called periodic sampling, or time-triggered control (TTC). Under TTC, the control law and the disturbance law are updated periodically, which may cause considerable resource consumption and computational cost for networked systems with limited bandwidth. Therefore, an event-triggered ADP is applied to solve the HJI equation for the $H_\infty$ control problem.

3. Design of Event-Triggered $H_\infty$-Constrained Optimal Control

Event-triggered control is introduced into ADP to solve the zero-sum game problem, and an event-triggered HJI equation is then developed. A triggering condition is designed to limit the number of sampled states while avoiding Zeno behavior.
In the event-triggered control (ETC), the sequence of triggering instants is defined as $\{t_k\}_{k=0}^{\infty}$ with $t_k < t_{k+1}$ and $k \in \mathbb{N}^+$. A new sampled state is obtained at a triggering instant when the triggering condition is violated. The event error determines the appearance of the sampled state, and it can be expressed as
$$e_k(t) = \hat{x}_k - x(t), \quad t \in [t_k, t_{k+1})$$
where $\hat{x}_k = x(t_k)$ is the sampled state at the triggering instant $t_k$ and $x(t)$ is the current state.
Remark 2.
The triggering of an event depends on two basic quantities: the event-triggered error $e_k$ and the triggering threshold $e_T$. An event occurs and the controller updates when the error violates the triggering threshold; that is, the error is reset to $e_k = 0$ at $t = t_k$. Holding the sampled state over the non-triggered interval $t \in [t_k, t_{k+1})$ is realized by a zero-order hold (ZOH). The ZOH stores the control law of the previous triggering instant and converts the sampled signal into a continuous signal; the law is then kept until the next triggering instant.
It is assumed that there is no delay between the sensor and the controller. According to (15), the closed-loop system (1) is rewritten as
$$\dot{x} = f(x) + g(x)u(\hat{x}_k) + h(x)\upsilon(\hat{x}_k), \quad t \in [t_k, t_{k+1})$$
Then, the event-based value function is shown as
$$V\big(x(t-T)\big) = \int_{t-T}^{t} \left( x^T Q x + M\big(u(\hat{x}_k)\big) - \gamma^2 \upsilon^T(\hat{x}_k)\upsilon(\hat{x}_k) \right) d\tau + V\big(x(t)\big)$$
Therefore, the optimal event-triggered control law and the disturbance law in t [ t k , t k + 1 ) can be transformed into
$$u^*(\hat{x}_k) = -u_M \tanh\!\left(\frac{1}{2u_M} R^{-1} g^T(\hat{x}_k) \nabla V^*(\hat{x}_k)\right)$$
$$\upsilon^*(\hat{x}_k) = \frac{1}{2\gamma^2} h^T(\hat{x}_k) \nabla V^*(\hat{x}_k)$$
Similar to (11), the event-triggered HJI equation becomes
$$\nabla V^{*T}(x) f(x) + x^T Q x + M\big(u^*(\hat{x}_k)\big) - u_M \nabla V^{*T}(x) g(x) \tanh\!\left(\frac{1}{2u_M} R^{-1} g^T(\hat{x}_k) \nabla V^*(\hat{x}_k)\right) + \frac{1}{2\gamma^2} \nabla V^{*T}(x) h(x) h^T(\hat{x}_k) \nabla V^*(\hat{x}_k) - \frac{1}{4\gamma^2} \nabla V^{*T}(\hat{x}_k) h(\hat{x}_k) h^T(\hat{x}_k) \nabla V^*(\hat{x}_k) = 0$$
To sum up, in order to reduce communication load and computational cost, we introduce the event-triggered ADP in Algorithm 2 as follows
Algorithm 2 Event-triggered ADP algorithm.
1 : Choose the initial admissible control laws u 0 , υ 0 .
2: Policy Evaluation. During the interval time, the value function $V(x)$ can be obtained by solving the event-based Bellman equation (17).
3: Policy Improvement. Obtain the optimal control policy $u^*(\hat{x}_k)$ and the optimal disturbance policy $\upsilon^*(\hat{x}_k)$ from (18) and (19), respectively.
Assumption 1
([15]). The optimal value function $V^*(x)$ is continuously differentiable, and $V^*(x)$ and $\nabla V^*(x)$ are bounded on $\Lambda$; that is, $\max\{\|V^*(x)\|, \|\nabla V^*(x)\|\} \le \eta_0$ for a positive constant $\eta_0$.
Assumption 2
([26]). For $x \in \mathbb{R}^n$, there exist $g_M > 0$ and $h_M > 0$ such that $\|g(x)\| \le g_M$ and $\|h(x)\| \le h_M$.
Assumption 3
Assume that the optimal control law and the optimal disturbance law are Lipschitz continuous on a compact set $\Lambda \subset \mathbb{R}^n$; that is, there exist $L_u > 0$ and $L_\upsilon > 0$ satisfying
$$\|u^*(x) - u^*(\hat{x}_k)\| \le L_u \|e_k(t)\|$$
$$\|\upsilon^*(x) - \upsilon^*(\hat{x}_k)\| \le L_\upsilon \|e_k(t)\|$$
Theorem 1.
Let $V^*(x)$ be the optimal solution of (11). The optimal control laws $u^*(\hat{x}_k)$ in (18) and $\upsilon^*(\hat{x}_k)$ in (19) ensure that the dynamic system is uniformly ultimately bounded (UUB) only when the following triggering condition holds
$$\|e_k(t)\|^2 \le \frac{(1-\sigma^2)\,\underline{\lambda}(Q)}{L_u^2 + \gamma^2 L_\upsilon^2}\,\|x\|^2 \triangleq e_T$$
where $\gamma > 0$ is the level of disturbance attenuation from (3), $\sigma$ is a positive design parameter, and $\underline{\lambda}(Q)$ is the minimum eigenvalue of the matrix $Q$. The controller receives a new control law when the event error $e_k$ exceeds the triggering threshold $e_T$.
Proof. 
The optimal value function $V^*(x)$ is obtained from (11), and it is positive definite for any $x \ne 0$. Taking $V^*(x)$ as the Lyapunov function, its time derivative is
$$\dot{V}^*(x) = \nabla V^{*T}(x)\big(f + g u^*(\hat{x}_k) + h \upsilon^*(\hat{x}_k)\big) = \nabla V^{*T}(x) g \big(u^*(\hat{x}_k) - u^*(x)\big) + \nabla V^{*T}(x) h \big(\upsilon^*(\hat{x}_k) - \upsilon^*(x)\big) + \nabla V^{*T}(x)\big(f + g u^*(x) + h \upsilon^*(x)\big)$$
From (6), we can acquire
$$\nabla V^{*T}(x)\big(f + g u^*(x) + h \upsilon^*(x)\big) = -x^T Q x - M\big(u^*(x)\big) + \gamma^2 \upsilon^{*T} \upsilon^*$$
Then, (8) and (9) can be transformed as
$$\nabla V^{*T}(x) g = -2 u_M \tanh^{-T}\!\left(\frac{u^*(x)}{u_M}\right) R$$
$$\nabla V^{*T}(x) h = 2\gamma^2 \upsilon^{*T}(x)$$
According to (25)–(27), (24) can be rewritten as
$$\dot{V}^*(x) = \underbrace{2 u_M \tanh^{-T}\!\left(\frac{u^*(x)}{u_M}\right) R \big(u^*(x) - u^*(\hat{x}_k)\big)}_{s_1} - x^T Q x + \underbrace{2\gamma^2 \upsilon^{*T}(x)\big(\upsilon^*(\hat{x}_k) - \upsilon^*(x)\big)}_{s_2} - M\big(u^*(x)\big) + \gamma^2 \upsilon^{*T} \upsilon^*$$
By the weighted form of Young’s inequality, we have
$$s_1 \le \left\| u_M R \tanh^{-1}\!\left(\frac{u^*(x)}{u_M}\right)\right\|^2 + \big\|u^*(x) - u^*(\hat{x}_k)\big\|^2 \le \frac{1}{4}\big\|\nabla V^{*T}(x) g\big\|^2 + L_u^2 \|e_k(t)\|^2$$
Similarly, we can obtain
$$s_2 \le \gamma^2 \|\upsilon^*\|^2 + \gamma^2 \big\|\upsilon^*(\hat{x}_k) - \upsilon^*(x)\big\|^2 \le \gamma^2 \|\upsilon^*\|^2 + \gamma^2 L_\upsilon^2 \|e_k(t)\|^2$$
Let Assumptions 1–3 hold and note that $x^T Q x \ge \underline{\lambda}(Q)\|x\|^2$; then we have
$$\dot{V}^*(x) \le -x^T Q x - M\big(u^*(x)\big) + 2\gamma^2 \|\upsilon^*(x)\|^2 + \gamma^2 L_\upsilon^2 \|e_k(t)\|^2 + \frac{1}{4}\big\|\nabla V^{*T}(x) g\big\|^2 + L_u^2 \|e_k(t)\|^2 \le -(1-\sigma^2)\underline{\lambda}(Q)\|x\|^2 - \sigma^2\underline{\lambda}(Q)\|x\|^2 + (L_u^2 + \gamma^2 L_\upsilon^2)\|e_k\|^2 + \frac{1}{4}\big\|\nabla V^{*T}(x) g\big\|^2 + \frac{1}{2\gamma^2}\big\|\nabla V^{*T}(x) h\big\|^2 - M\big(u^*(x)\big) \le -(1-\sigma^2)\underline{\lambda}(Q)\|x\|^2 - M\big(u^*(x)\big) + (L_u^2 + \gamma^2 L_\upsilon^2)\|e_k\|^2 - \sigma^2\underline{\lambda}(Q)\|x\|^2 + \omega_0$$
where $\omega_0 = \frac{1}{4}\eta_0^2 g_M^2 + \frac{\eta_0^2 h_M^2}{2\gamma^2}$. From (10), we can obtain $M(u^*) \ge 0$. If the triggering condition is given by (23), we can obtain
$$\dot{V}^*(x) \le -\sigma^2 \underline{\lambda}(Q)\|x\|^2 + \omega_0$$
Thus, it can be concluded that $\dot{V}^*(x) < 0$ for $\|x\| > \sqrt{\omega_0 / (\sigma^2 \underline{\lambda}(Q))}$. Therefore, by the Lyapunov theory, the optimal laws $u^*(\hat{x}_k)$ and $\upsilon^*(\hat{x}_k)$ ensure that the state of the closed-loop system (1) is uniformly ultimately bounded. The proof is completed. □
From (23), the current triggering interval can be obtained whenever an event is triggered. The difference between two consecutive triggering instants is the triggering interval, expressed as $\tau_k = t_{k+1} - t_k$. Zeno behavior occurs if $(\tau_k)_{min} = 0$.
Remark 3.
Zeno behavior is a special dynamic behavior in hybrid systems in which an infinite number of discrete transitions occur in a finite amount of time [40]. If Zeno behavior exists, the system cannot be guaranteed to be asymptotically stable; therefore, Zeno behavior must be avoided.
Theorem 2.
Under the triggering condition (23) given in Theorem 1, the inter-execution time $\tau_k = t_{k+1} - t_k$ satisfies
$$\tau_k \ge \frac{\Gamma(t)}{L_x\big(1 + \Gamma(t)\big)}$$
where $\Gamma(t) = \|e_k(t)\| / \|x(t)\|$; hence, the minimum inter-execution time is bounded below by a positive constant.
Proof. 
Based on [41,42], for t ( t k , t k + 1 ) , we have
$$\dot{\Gamma}(t) = \frac{d}{dt}\frac{\|e_k\|}{\|x\|} = \frac{d}{dt}\frac{(e_k^T e_k)^{1/2}}{(x^T x)^{1/2}} = \frac{(e_k^T e_k)^{-1/2} e_k^T \dot{e}_k\,(x^T x)^{1/2} - (x^T x)^{-1/2} x^T \dot{x}\,(e_k^T e_k)^{1/2}}{\|x\|^2} = \frac{e_k^T \dot{e}_k}{\|e_k\|\|x\|} - \frac{x^T \dot{x}\,\|e_k\|}{\|x\|^3}$$
According to (15), the time derivative of the event error is $\dot{e}_k(t) = \dot{\hat{x}}_k - \dot{x}(t)$. Since $\hat{x}_k$ is constant for $t \in [t_k, t_{k+1})$, we obtain $\dot{e}_k(t) = -\dot{x}(t)$. Since the drift dynamics $f(x)$ is Lipschitz continuous, there exists $L_x > 0$ satisfying $\|f(x)\| \le L_x\|x\|$, such that the following inequality holds [36]
$$\|\dot{x}\| = \|\dot{e}_k\| = \big\|f(x) + g(x)u(\hat{x}_k) + h(x)\upsilon(\hat{x}_k)\big\| \le \|f(x)\| + \big\|g(x)u(\hat{x}_k) + h(x)\upsilon(\hat{x}_k)\big\| \le L_x\big(\|x\| + \|e_k\|\big)$$
Thus, we can obtain
$$\dot{\Gamma}(t) \le \frac{\|e_k\|\|\dot{x}\|}{\|e_k\|\|x\|} + \frac{\|x\|\|\dot{x}\|}{\|x\|\|x\|}\frac{\|e_k\|}{\|x\|} = \frac{\|\dot{x}\|}{\|x\|}\left(1 + \frac{\|e_k\|}{\|x\|}\right) \le \frac{L_x\|e_k\| + L_x\|x\|}{\|x\|}\left(1 + \frac{\|e_k\|}{\|x\|}\right) = L_x\left(1 + \frac{\|e_k\|}{\|x\|}\right)^2$$
From (36), we can write $\dot{\Gamma}(t) \le L_x(1 + \Gamma(t))^2$. Let $\varrho(t)$ be the solution of $\dot{\varrho}(t) = L_x(1 + \varrho(t))^2$ with $\varrho(0) = 0$; its solution can be expressed as $\varrho(t) = L_x(t - t_k) / \big(1 - L_x(t - t_k)\big)$ for $t \ge t_k$. Therefore, by the comparison principle, we obtain $\Gamma(t) \le \varrho(t)$. Assume that the instant $\xi \in \mathbb{R}^+$ satisfies $t_k \le \xi \le t_{k+1}$; then $\Gamma(\xi) \le \varrho(\xi)$, and we obtain $\xi \ge \eta + t_k$ with $\eta = \Gamma(t) / \big(L_x(1 + \Gamma(t))\big)$. Since $\eta + t_k \le \xi \le t_{k+1}$, we have $\tau_k = t_{k+1} - t_k \ge \eta$. For the asymptotically stable closed-loop system, $\|e_k(t)\| > 0$ for any $x(t) \ne 0$; hence $\Gamma(t) > 0$ holds, which guarantees a lower bound with $(\tau_k)_{min} \ge \eta > 0$. The proof is accomplished. □
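For illustration, the sketch below implements the threshold test in condition (23) and can be used to record the inter-event times that Theorem 2 bounds from below. The Lipschitz constants L_u and L_v and the design parameters are placeholders a user would supply; they are not values from the paper.

```python
import numpy as np

def triggering_threshold(x, Q, L_u, L_v, gamma, sigma):
    """Threshold e_T from condition (23)."""
    lam_min_Q = np.min(np.linalg.eigvalsh(Q))
    return (1.0 - sigma**2) * lam_min_Q * float(x @ x) / (L_u**2 + gamma**2 * L_v**2)

def check_event(x_hat, x, Q, L_u=1.0, L_v=1.0, gamma=0.5, sigma=0.5):
    """Return True when ||e_k||^2 reaches e_T, i.e., a new sample must be taken."""
    e = x_hat - x
    return float(e @ e) >= triggering_threshold(x, Q, L_u, L_v, gamma, sigma)

# the sampled state is refreshed only when check_event(...) returns True, and the
# elapsed time since the last refresh gives one inter-event interval tau_k
print(check_event(np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.eye(2)))
```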

4. Approximate Solution of Event-Triggered HJI Equation

The ADP-based event-triggered algorithm has been presented above. In this section, an event-based single neural network with function approximation, namely the critic NN, is constructed to approximately solve the HJI equation.
Based on the universal approximation property of neural networks, the value function $V(x)$ and its gradient $\nabla V(x)$ can be represented as
$$V(x) = W_c^T \phi(x) + \varepsilon_V(x)$$
$$\nabla V(x) = \nabla\phi(x)^T W_c + \nabla\varepsilon_V(x)$$
where $W_c \in \mathbb{R}^{L_1}$ is the ideal weight of the critic NN, $L_1$ is the number of neurons, $\phi(x) \in \mathbb{R}^{L_1}$ is a suitable activation function vector, and $\varepsilon_V(x)$ is the reconstruction error.
Using the approximate value function (37) and substituting it into (12), the following Bellman equation is obtained
$$\int_{t-T}^{t} \left( x^T Q x + M(u) - \gamma^2 \upsilon^T \upsilon \right) d\tau + W_c^T \Delta\phi(x) = e_B(t)$$
where $\Delta\phi(x) = \phi(x(t)) - \phi(x(t-T))$. Due to the existence of the NN approximation error, the Bellman equation residual error is
$$e_B(t) = -\big(\varepsilon_V(t) - \varepsilon_V(t-T)\big) = -\int_{t-T}^{t} \nabla\varepsilon_V^T\big(f(x) + g(x)u + h(x)\upsilon\big)\, d\tau$$
For (39), the optimal value is obtained only when $e_B$ tends to zero. However, the ideal weights $W_c$ are unknown, so they are estimated by the current weights $\hat{W}_c$. The actual value function of the approximate network can be expressed as
$$\hat{V}(x) = \hat{W}_c^T \phi(x)$$
$$\nabla\hat{V}(x) = \nabla\phi(x)^T \hat{W}_c$$
To solve the optimal control problem, the value function is obtained by solving the Bellman equation in the policy evaluation step, and the new control laws are then obtained from the present value function in the policy improvement step. Combining (18), (19), and (42), the event-triggered control law and disturbance law become
$$\hat{u}(\hat{x}_k) = -u_M \tanh\!\left(\frac{1}{2u_M} R^{-1} g^T(\hat{x}_k) \nabla\phi^T(\hat{x}_k) \hat{W}_c\right)$$
$$\hat{\upsilon}(\hat{x}_k) = \frac{1}{2\gamma^2} h^T(\hat{x}_k) \nabla\phi^T(\hat{x}_k) \hat{W}_c$$
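A small sketch of how (43) and (44) can be computed from the current critic weights is given below; g, h, and the activation Jacobian dphi are user-supplied callables, so the names, shapes, and the quadratic feature example are assumptions for illustration.

```python
import numpy as np

def event_control_laws(x_hat, w_hat, g, h, dphi, R_inv, u_max, gamma):
    """Event-based control law (43) and disturbance law (44) evaluated at the
    sampled state x_hat; the results are held by the ZOH until the next event."""
    grad_V_hat = dphi(x_hat).T @ w_hat          # gradient of the approximate value (42)
    u_hat = -u_max * np.tanh(R_inv @ g(x_hat).T @ grad_V_hat / (2.0 * u_max))
    v_hat = h(x_hat).T @ grad_V_hat / (2.0 * gamma**2)
    return u_hat, v_hat

# example with quadratic features phi(x) = [x1^2, x1*x2, x2^2]
dphi = lambda x: np.array([[2 * x[0], 0.0], [x[1], x[0]], [0.0, 2 * x[1]]])
g = lambda x: np.array([[0.0], [np.cos(2 * x[0]) + 2]])
h = lambda x: np.array([[0.0], [np.sin(4 * x[0]) + 2]])
u, v = event_control_laws(np.array([1.0, -1.0]), np.array([0.5, 0.9, 0.4]),
                          g, h, dphi, np.eye(1), u_max=1.0, gamma=0.5)
print(u, v)
```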
Combining (12) with (41), the approximate Bellman equation is described as
$$\hat{q}(t) + \hat{W}_c^T \Delta\phi(x) = e_c$$
where the reinforcement signal over the integration interval is defined as
$$\hat{q}(t) = \int_{t-T}^{t} \left( x^T Q x + M\big(\hat{u}(\hat{x}_k)\big) - \gamma^2 \hat{\upsilon}^T(\hat{x}_k)\hat{\upsilon}(\hat{x}_k) \right) d\tau$$
In the NN training process, the aim is to minimize the Bellman residual. The gradient-descent method is used in this paper, and the critic network weights are adjusted online by minimizing the objective function $E = \frac{1}{2} e_c^T e_c$. Thus, the tuning law of the critic NN weights can be expressed as
$$\dot{\hat{W}}_c = -\alpha_c \frac{\partial E}{\partial \hat{W}_c} = -\alpha_c \frac{\partial E}{\partial e_c}\frac{\partial e_c}{\partial \hat{W}_c} = -\alpha_c \frac{\Delta\phi}{(1 + \Delta\phi^T \Delta\phi)^2} \left[ \int_{t-T}^{t} \left( x^T Q x + M\big(\hat{u}(\hat{x}_k)\big) - \gamma^2 \hat{\upsilon}^T(\hat{x}_k)\hat{\upsilon}(\hat{x}_k) \right) d\tau + \Delta\phi^T(x)\hat{W}_c \right]$$
where $\alpha_c > 0$ is an adaptive learning rate and $(1 + \Delta\phi^T \Delta\phi)^2$ is used for normalization. Define the critic weight approximation error as $\tilde{W}_c = W_c - \hat{W}_c$; the square of the denominator in (47) is used to guarantee that $\tilde{W}_c$ is bounded. Then, the critic weight error dynamics is given by
$$\dot{\tilde{W}}_c = -\alpha_c \bar{\Delta}\phi(t) \bar{\Delta}\phi^T(t) \tilde{W}_c + \alpha_c \bar{\Delta}\phi(t)\, m\, e_B(t)$$
where $m = 1 / (1 + \Delta\phi^T \Delta\phi)$ and $\bar{\Delta}\phi = \Delta\phi / (1 + \Delta\phi^T \Delta\phi)$.
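The following sketch applies one Euler step of the normalized gradient-descent tuning law (47). The discretization step dt and the way the reinforcement signal q_hat is accumulated are implementation assumptions of this sketch.

```python
import numpy as np

def critic_update(w_hat, q_hat, delta_phi, alpha_c, dt):
    """One Euler step of the tuning law (47).
    q_hat:     integrated reinforcement signal (46) over [t-T, t]
    delta_phi: phi(x(t)) - phi(x(t-T))"""
    e_c = q_hat + delta_phi @ w_hat                 # approximate Bellman residual (45)
    norm = (1.0 + delta_phi @ delta_phi) ** 2       # normalization term
    return w_hat - alpha_c * dt * delta_phi * e_c / norm

# example step
w_next = critic_update(np.zeros(3), q_hat=0.8, delta_phi=np.array([0.2, -0.1, 0.05]),
                       alpha_c=0.35, dt=0.1)
print(w_next)
```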
Remark 4.
This paper designs an online event-triggered ADP learning algorithm; the activation function must be persistently excited to ensure that the NN weights converge, after which the value function and control laws are obtained. Therefore, a probing noise signal with different amplitudes and diverse frequencies is constructed as $d(t) = 0.5\big(\sin^2(0.08t)\cos(1.5t) + 0.3\sin^4(2.3t)\cos(7t)\big)$.
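A one-line helper for the probing signal in Remark 4 is sketched below, reading the exponents as a square and a fourth power and assuming the noise is simply added to the control input during the learning phase.

```python
import numpy as np

def probing_noise(t):
    """Probing signal d(t) from Remark 4, added to the control input while learning
    so that the regressor stays persistently excited."""
    return 0.5 * (np.sin(0.08 * t)**2 * np.cos(1.5 * t)
                  + 0.3 * np.sin(2.3 * t)**4 * np.cos(7.0 * t))
```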
Remark 5.
Under the triggering condition (23), the adaptive learning law of the critic network (47) updates the weights using the reinforcement signal from (46); then the control law (43) and the perturbation law (44) are updated at the triggering instants. To clarify the main idea of the algorithm intuitively, Figure 1 shows the block diagram of the event-triggered ADP $H_\infty$-constrained control.
Assumption 4
([26]).
  • $g(x)$ and $h(x)$ are Lipschitz continuous and satisfy $\|g(x) - g(\hat{x}_k)\| \le L_g \|e_k(t)\|$ and $\|h(x) - h(\hat{x}_k)\| \le L_h \|e_k(t)\|$.
  • $\phi(x)$ is Lipschitz continuous, that is, $\|\nabla\phi(x) - \nabla\phi(\hat{x}_k)\| \le L_\phi \|e_k(t)\|$, where $L_\phi$ is a positive constant.
Assumption 5
([34,35,36]).
  • The NN activation function and its gradient are bounded by positive constants, so that $\|\phi(x)\| \le \phi_m$ and $\|\nabla\phi(x)\| \le \phi_M$.
  • The critic NN approximation error and its gradient are bounded by positive constants, satisfying $\|\varepsilon_V\| \le \varepsilon_m$ and $\|\nabla\varepsilon_V\| \le \varepsilon_M$.
The control law $u(x)$ is an admissible control. The approximate weight $\hat{W}_c$ can be guaranteed to converge to the ideal weight $W_c$ under the persistent excitation (PE) condition. Assume that the activation function difference $\Delta\phi$ is persistently exciting over the interval $[t-T, t]$. Let the residual error due to the approximation error satisfy $\|e_B\| \le e_M$; this ensures that the critic network weight is UUB. The boundedness condition of the critic NN weight is expressed in (50).
Theorem 3.
Let Assumptions 1–5 be valid, and let the control laws and the tuning law of the critic NN be implemented by (43), (44), and (47). The closed-loop system (1) is asymptotically stable and the weight approximation error (48) is UUB only when the triggering condition (23) is used and the following inequality condition holds
$$\left(\alpha_c - \frac{1}{2}\right)\underline{\lambda}\big(\bar{\Delta}\phi(t)\bar{\Delta}\phi^T(t)\big) - \varpi_2 > 0$$
where $\varpi_2$ and $\varpi_4$ are constants defined after (60) and (63).
Proof. 
The initial control for the dynamical system (1) is admissible. For the value function (4) of the system, the Lyapunov function is constructed as
L ( t ) = L 1 + L 2 + L 3
where L 1 = V * ( x ) , L 2 = V * ( x ^ k ) , L 3 = ( 1 / 2 ) W ˜ c T W ˜ c .
The process of NN learning under event-triggered control is mainly divided into two cases: (1) during the flow dynamics on $t \in [t_k, t_{k+1})$; (2) at the triggering instant $t = t_k$.
Case 1. The triggering condition is not satisfied, and the derivative of $L_1$ is given as
$$\dot{L}_1 = \nabla V^{*T}(x)\big(f(x) + g(x)\hat{u}(\hat{x}_k) + h(x)\hat{\upsilon}(\hat{x}_k)\big) = \nabla V^{*T}(x) g(x)\big(\hat{u}(\hat{x}_k) - u^*(x)\big) + \nabla V^{*T}(x) h(x)\big(\hat{\upsilon}(\hat{x}_k) - \upsilon^*(x)\big) + \nabla V^{*T}(x)\big(f(x) + g(x)u^*(x) + h(x)\upsilon^*(x)\big)$$
According to (25)–(27), (51) can be transformed as
$$\dot{L}_1 = -2 u_M \tanh^{-T}\!\left(\frac{u^*(x)}{u_M}\right) R \big(\hat{u}(\hat{x}_k) - u^*(x)\big) + 2\gamma^2 \upsilon^{*T}(x)\big(\hat{\upsilon}(\hat{x}_k) - \upsilon^*(x)\big) - x^T Q x - M\big(u^*(x)\big) + \gamma^2 \upsilon^{*T}(x)\upsilon^*(x)$$
Similarly, using Young’s inequality, we have
$$\dot{L}_1 \le \left\| u_M R \tanh^{-1}\!\left(\frac{u^*(x)}{u_M}\right)\right\|^2 + \big\|\hat{u}(\hat{x}_k) - u^*(x)\big\|^2 + \gamma^2\|\upsilon^*(x)\|^2 + \gamma^2\big\|\hat{\upsilon}(\hat{x}_k) - \upsilon^*(x)\big\|^2 - x^T Q x - M\big(u^*(x)\big) + \gamma^2 \upsilon^{*T}(x)\upsilon^*(x) \le \frac{1}{4}\big\|\nabla V^{*T}(x) g\big\|^2 + 2\gamma^2 \|\upsilon^*(x)\|^2 - x^T Q x - M\big(u^*(x)\big) + \underbrace{\big\|\hat{u}(\hat{x}_k) - u^*(x)\big\|^2}_{o_1} + \gamma^2\underbrace{\big\|\hat{\upsilon}(\hat{x}_k) - \upsilon^*(x)\big\|^2}_{o_2}$$
Combining (8) with (38), we obtain
$$u^*(x) = -u_M \tanh\!\left(\frac{1}{2u_M} R^{-1} g^T(x) \nabla V^*(x)\right) = -u_M \tanh\!\left(\frac{1}{2u_M} R^{-1} g^T(x)\big(\nabla\phi^T(x) W_c + \nabla\varepsilon_V(x)\big)\right)$$
Noting that $|\tanh(\cdot)| \le 1$ and $\|x - y\| \le \|x\| + \|y\|$, we have
$$o_1 = \left\| -u_M \tanh\!\left(\frac{1}{2u_M} R^{-1} g^T(x)\big(\nabla\phi^T(x) W_c + \nabla\varepsilon_V(x)\big)\right) + u_M \tanh\!\left(\frac{1}{2u_M} R^{-1} g^T(\hat{x}_k)\nabla\phi^T(\hat{x}_k)(W_c - \tilde{W}_c)\right)\right\|^2 \le u_M^2 \left\| \frac{1}{2u_M} R^{-1} g^T(x)\big(\nabla\phi^T(x) W_c + \nabla\varepsilon_V(x)\big) + \frac{1}{2u_M} R^{-1} g^T(\hat{x}_k)\nabla\phi^T(\hat{x}_k)(\tilde{W}_c - W_c)\right\|^2 \le \frac{1}{4}\|R^{-1}\|^2 \left\| \big(\nabla\phi(x)g(x) - \nabla\phi(\hat{x}_k)g(\hat{x}_k)\big)^T W_c + g^T(x)\nabla\varepsilon_V(x) + \big(\nabla\phi(\hat{x}_k)g(\hat{x}_k)\big)^T \tilde{W}_c\right\|^2 \le \frac{1}{2}\|R^{-1}\|^2 \Big( \underbrace{\big\|\nabla\phi(x)g(x) - \nabla\phi(\hat{x}_k)g(\hat{x}_k)\big\|^2}_{F_1}\|W_c\|^2 + \underbrace{\big\|g^T(x)\nabla\varepsilon_V(x) + \big(\nabla\phi(\hat{x}_k)g(\hat{x}_k)\big)^T\tilde{W}_c\big\|^2}_{F_2}\Big)$$
According to Assumption 4 and Assumption 5, we obtain
$$F_1 = \big\|\nabla\phi(x)g(x) - \nabla\phi(\hat{x}_k)g(x) + \nabla\phi(\hat{x}_k)g(x) - \nabla\phi(\hat{x}_k)g(\hat{x}_k)\big\|^2 = \big\|\big(\nabla\phi(x) - \nabla\phi(\hat{x}_k)\big)g(x) + \nabla\phi(\hat{x}_k)\big(g(x) - g(\hat{x}_k)\big)\big\|^2 \le 2\big\|\big(\nabla\phi(x) - \nabla\phi(\hat{x}_k)\big)g(x)\big\|^2 + 2\big\|\nabla\phi(\hat{x}_k)\big(g(x) - g(\hat{x}_k)\big)\big\|^2 \le 2\big(L_\phi^2 g_M^2 + \phi_M^2 L_g^2\big)\|e_k\|^2$$
Similarly, it can be obtained
$$F_2 = \big\|g^T(x)\nabla\varepsilon_V(x) + \big(\nabla\phi(\hat{x}_k)g(\hat{x}_k)\big)^T\tilde{W}_c\big\|^2 \le 2\big\|g^T(x)\nabla\varepsilon_V(x)\big\|^2 + 2\big\|\big(\nabla\phi(\hat{x}_k)g(\hat{x}_k)\big)^T\tilde{W}_c\big\|^2 \le 2\big(g_M^2 \varepsilon_M^2 + g_M^2 \phi_M^2 \|\tilde{W}_c\|^2\big)$$
According to (55)–(57), $o_1$ can be rewritten as
$$o_1 \le \|R^{-1}\|^2\big(L_\phi^2 g_M^2 + \phi_M^2 L_g^2\big)\|e_k\|^2\|W_c\|^2 + \|R^{-1}\|^2\big(g_M^2 \varepsilon_M^2 + g_M^2 \phi_M^2 \|\tilde{W}_c\|^2\big)$$
Likewise, combining (9) with (38), $o_2$ can be given as
$$o_2 \le \left\|\frac{1}{2\gamma^2} h^T(x)\big(\nabla\phi^T(x) W_c + \nabla\varepsilon_V(x)\big) - \frac{1}{2\gamma^2} h^T(\hat{x}_k)\nabla\phi^T(\hat{x}_k)\hat{W}_c\right\|^2 \le \frac{1}{2}\Big(\big\|\nabla\phi(x)h(x) - \nabla\phi(\hat{x}_k)h(\hat{x}_k)\big\|^2\|W_c\|^2 + \big\|h^T(x)\nabla\varepsilon_V(x)\big\|^2 + \big\|\nabla\phi(\hat{x}_k)h(\hat{x}_k)\tilde{W}_c\big\|^2\Big) \le \big(L_\phi^2 h_M^2 + \phi_M^2 L_h^2\big)\|e_k\|^2\|W_c\|^2 + h_M^2 \varepsilon_M^2 + h_M^2 \phi_M^2 \|\tilde{W}_c\|^2$$
According to (53), (58) and (59), we have
$$\dot{L}_1 \le -x^T Q x - M\big(u^*(x)\big) + \varpi_1\|e_k\|^2\|W_c\|^2 + \varpi_2\|\tilde{W}_c\|^2 + \varpi_3 \le -(1-\sigma^2)\underline{\lambda}(Q)\|x\|^2 - \sigma^2\underline{\lambda}(Q)\|x\|^2 + \varpi_1\|e_k\|^2\|W_c\|^2 + \varpi_2\|\tilde{W}_c\|^2 + \varpi_3$$
where $\varpi_1 = \|R^{-1}\|^2\big(L_\phi^2 g_M^2 + \phi_M^2 L_g^2\big) + \gamma^2\big(L_\phi^2 h_M^2 + \phi_M^2 L_h^2\big)$, $\varpi_2 = \|R^{-1}\|^2 g_M^2 \phi_M^2 + \gamma^2 h_M^2 \phi_M^2$, and $\varpi_3 = \frac{1}{4} g_M^2 \eta_0^2 + 2\gamma^2 \upsilon_M^2 + \|R^{-1}\|^2 g_M^2 \varepsilon_M^2 + \gamma^2 h_M^2 \varepsilon_M^2$.
If the event is not triggered, we have $\dot{L}_2 = 0$. Note that $1/(1 + \Delta\phi^T\Delta\phi) \le 1$; thus, $m$ and $\bar{\Delta}\phi(t)$ in (48) are bounded. Then, according to $W_c = \hat{W}_c + \tilde{W}_c$, the derivative of $L_3$ can be given as
$$\dot{L}_3 = \tilde{W}_c^T\dot{\tilde{W}}_c = -\alpha_c \tilde{W}_c^T \bar{\Delta}\phi(t)\bar{\Delta}\phi^T(t)\tilde{W}_c + \alpha_c \tilde{W}_c^T \bar{\Delta}\phi(t)\, m\, e_B(t) \le -\alpha_c \underline{\lambda}\big(\bar{\Delta}\phi(t)\bar{\Delta}\phi^T(t)\big)\|\tilde{W}_c\|^2 + \frac{1}{2}\alpha_c^2 e_M^2 + \frac{1}{2}\underline{\lambda}\big(\bar{\Delta}\phi(t)\bar{\Delta}\phi^T(t)\big)\|\tilde{W}_c\|^2 \le -\left(\alpha_c - \frac{1}{2}\right)\underline{\lambda}\big(\bar{\Delta}\phi(t)\bar{\Delta}\phi^T(t)\big)\|\tilde{W}_c\|^2 + \frac{1}{2}\alpha_c^2 e_M^2$$
Combining (60) and (61), we have
$$\dot{L} \le -(1-\sigma^2)\underline{\lambda}(Q)\|x\|^2 - \sigma^2\underline{\lambda}(Q)\|x\|^2 + \varpi_1\|e_k\|^2\|W_c\|^2 + \varpi_2\|\tilde{W}_c\|^2 - \left(\alpha_c - \frac{1}{2}\right)\underline{\lambda}\big(\bar{\Delta}\phi(t)\bar{\Delta}\phi^T(t)\big)\|\tilde{W}_c\|^2 + \frac{1}{2}\alpha_c^2 e_M^2 + \varpi_3$$
When the triggering condition (23) holds, the derivative of $L$ becomes
$$\dot{L} \le -\sigma^2\underline{\lambda}(Q)\|x\|^2 - \left[\left(\alpha_c - \frac{1}{2}\right)\underline{\lambda}\big(\bar{\Delta}\phi(t)\bar{\Delta}\phi^T(t)\big) - \varpi_2\right]\|\tilde{W}_c\|^2 + \varpi_4$$
where $\varpi_4 = \frac{1}{2}\alpha_c^2 e_M^2 + \varpi_3$. Let the triggering condition (23) and the inequality condition (49) hold. We can conclude that $\dot{L} < 0$ if $x(t)$ and $\tilde{W}_c$ satisfy
$$\|x\| > \sqrt{\frac{\varpi_4}{\sigma^2\underline{\lambda}(Q)}}$$
$$\|\tilde{W}_c\| > \sqrt{\frac{2\varpi_4}{(2\alpha_c - 1)\underline{\lambda}\big(\bar{\Delta}\phi(t)\bar{\Delta}\phi^T(t)\big) - 2\varpi_2}}$$
Therefore, by the Lyapunov theory, the state x and the NN weight error W ˜ c are uniformly ultimately bounded.
Case 2. The event is triggered at $t = t_{k+1}$. A new Lyapunov function difference is constructed as
$$\Delta L(t) = \Delta L_1 + \Delta L_2 + \Delta L_3$$
where $\Delta L_1 = V^*(x^+) - V^*(\hat{x}_k)$, $\Delta L_2 = V^*(\hat{x}_{k+1}) - V^*(\hat{x}_k)$, and $\Delta L_3 = \frac{1}{2}\big(\tilde{W}_c^{+T}\tilde{W}_c^+ - \tilde{W}_c^T\tilde{W}_c\big)$. For $\xi \to 0^+$, we have $x^+ = x(t_k^+) \triangleq x(t_k + \xi)$ with $\xi \in (0, t_{k+1} - t_k)$. It can be concluded from Case 1 that $x$ and $\tilde{W}_c$ are UUB during the flow dynamics, so the closed-loop system (1) is asymptotically stable. Since $V^*(x)$ and $\tilde{W}_c$ are continuous, we obtain $\Delta L_1 = 0$ and $\Delta L_3 = 0$. At the triggering instant $t = t_{k+1}$, we have
$$\Delta L_2 = V^*(\hat{x}_{k+1}) - V^*(\hat{x}_k) \le -\omega\big(\|e_{k+1}(t_k)\|\big) < 0$$
where $e_{k+1}(t_k) = \hat{x}_{k+1} - \hat{x}_k$ and $\omega(\cdot)$ is a class-$\kappa$ function [43]. Therefore, we obtain $\Delta L < 0$. The above two cases ensure that the closed-loop system under the event-triggered strategy is asymptotically stable and the NN weight error is UUB if the triggering condition (23) and the inequality (49) hold. The proof is completed. □
Remark 6.
The theoretical analysis has been presented. Algorithm 2 can be implemented using a single critic NN, which updates the critic weights by the tuning law (47). Then, the new control laws $u(x)$ and $\upsilon(x)$ are obtained by (43) and (44) when an event occurs. Compared with [12,26,35,44], the proposed event-triggered ADP method solves the zero-sum problem with actuator saturation and partially unknown dynamics. Based on event-triggered control, the control laws of the two players are updated in a piecewise fashion according to the designed triggering condition (23). Thus, the event-triggered control avoids unnecessary data transmission and computational cost.

5. Simulation Results

In this section, two simulation examples of the linear system and the nonlinear system are provided to verify the effectiveness of the online event-triggered adaptive dynamic programming algorithm for the continuous-time system.

5.1. Linear System

Consider the F-16 aircraft continuous-time linear system with external perturbation as [12]
$$\dot{x} = A x(t) + B u(t) + H \upsilon(t)$$
where
$$A = \begin{bmatrix} -1.01887 & 0.90506 & -0.00215 \\ 0.82225 & -1.07741 & -0.17555 \\ 0 & 0 & -1 \end{bmatrix}, \quad B = [0,\ 0,\ 1]^T, \quad H = [1,\ 0,\ 1]^T$$
The three state variables are denoted as $x = [x_1, x_2, x_3]^T$, and the initial state is $x_0 = [1, 1, 2]^T$. Matrices of appropriate dimension are chosen as $Q = I_3$ and $R = I_1$. The constrained control satisfies $u \in \{u \in \mathbb{R} : |u| \le 1\}$. The activation function of the single critic NN is $\phi(x) = [x_1^2, x_2^2, x_3^2, x_1x_2, x_1x_3, x_2x_3]^T$. The approximate weight vector is denoted as $\hat{W}_c = [\hat{W}_{c1}, \hat{W}_{c2}, \hat{W}_{c3}, \hat{W}_{c4}, \hat{W}_{c5}, \hat{W}_{c6}]^T$, whose ideal value corresponds to the solution of the algebraic Riccati equation (ARE) $A^T P + P A - P B R^{-1} B^T P + Q = 0$. The NN learning and triggering parameters used in this paper are $\gamma = 0.5$, $L \in (0, 1)$, $T = 0.1$, and $\alpha_c = 0.35$. The external disturbance is selected as $\upsilon(t) = 5\cos(t)e^{-t}$.
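As a rough cross-check on the critic weights, the sketch below solves the stated ARE with SciPy and maps its solution P to the weight ordering of the chosen activation vector. The signs of the model matrices are reconstructed from the standard F-16 short-period model and should be treated as an assumption, and the ARE ignores the disturbance and saturation terms, so the resulting values are only indicative.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[-1.01887, 0.90506, -0.00215],
              [0.82225, -1.07741, -0.17555],
              [0.0, 0.0, -1.0]])
B = np.array([[0.0], [0.0], [1.0]])
Q, R = np.eye(3), np.eye(1)

# solves A^T P + P A - P B R^{-1} B^T P + Q = 0
P = solve_continuous_are(A, B, Q, R)

# V(x) = x^T P x rewritten in the feature ordering [x1^2, x2^2, x3^2, x1x2, x1x3, x2x3]
w_ideal = np.array([P[0, 0], P[1, 1], P[2, 2], 2 * P[0, 1], 2 * P[0, 2], 2 * P[1, 2]])
print(np.round(w_ideal, 4))
```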
Figure 2 shows the convergence of the three state variables; it is obvious that the states tend to be asymptotically stable. The iterative process of the critic NN is illustrated in Figure 3. The critic weight vector converges to $\hat{W}_c = [0.0349, -0.0128, -0.1575, 1.4738, 0.6577, 0.2075]^T$ under the PE condition after 60 s; the PE condition is effectively guaranteed by adding the probing noise. Under the event-triggered strategy, the control law and the perturbation law are updated aperiodically. The evolution of the ETC laws $u(\hat{x}_k)$ and $\upsilon(\hat{x}_k)$ is described in Figure 4. Under the constrained input condition, the control law remains no larger than 1, that is, $u_{max} < 1$. The trajectories of the event error (15) and the triggering threshold (23) are shown in Figure 5, which verifies the effect of the triggering condition. Figure 6 presents the sampling intervals of the two strategies. Compared with the 1000 samples taken under the time-triggered strategy, only 85 state samples are taken under the event-triggered strategy. From Table 1, the minimum interval time is $(\tau_k)_{min} = 0.1$, so the Zeno phenomenon is avoided. As a result, the update frequency and the computational load of the controller are greatly reduced.

5.2. Nonlinear System

Consider a continuous-time nonlinear system with the external disturbance given as
$$\dot{x} = f(x(t)) + g(x(t))u(t) + h(x(t))\upsilon(t)$$
where
$$f(x) = \begin{bmatrix} -x_1 + x_2 \\ -0.5x_1 - 0.5x_2\big(1 - (\cos(2x_1) + 2)^2\big) \end{bmatrix}$$
$$g(x) = \begin{bmatrix} 0 \\ \cos(2x_1) + 2 \end{bmatrix}, \quad h(x) = \begin{bmatrix} 0 \\ \sin(4x_1) + 2 \end{bmatrix}$$
The state vector is denoted as $x = [x_1, x_2]^T$, and the initial state is $x_0 = [1, 1]^T$. The constrained input satisfies $u \in \{u \in \mathbb{R} : |u| \le 1\}$. Choose $Q = I_2$ and $R = I_1$. The activation function is selected as $\phi(x) = [x_1^2, x_1x_2, x_2^2]^T$, and the approximate weight vector is denoted as $\hat{W}_c = [\hat{W}_{c1}, \hat{W}_{c2}, \hat{W}_{c3}]^T$. The NN learning and triggering parameters used in this paper are $\gamma = 0.4$, $L \in (0, 1)$, $T = 0.1$, and $\alpha_c = 0.25$. The external disturbance is again selected as $\upsilon(t) = 5\cos(t)e^{-t}$.
The two state trajectories of the closed-loop system are described in Figure 7; it is obvious that $x_1$ and $x_2$ gradually converge and the system becomes asymptotically stable. The ideal critic weight is $[W_{c1}, W_{c2}, W_{c3}]^T$, and the NN weights are ensured to converge under the PE condition. Figure 8 shows that the approximate critic weight vector converges to $[0.5468, 0.8876, 0.4125]^T$ after learning. From Figure 9, we can observe the evolution of the ETC policies $u(\hat{x}_k)$ and $\upsilon(\hat{x}_k)$. Under the constrained input condition, the control law remains no larger than 1, that is, $u_{max} < 1$. The trajectories of the event error and the triggering threshold are shown in Figure 10; it can be seen intuitively that the event-triggered errors always remain within the triggering thresholds. Figure 11 indicates the sampling intervals under the event-triggered mechanism. From Table 2, only 157 samples are taken under the event-triggered strategy, whereas the state is sampled at every time step under the time-triggered strategy.
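The sampling statistics reported in Tables 1 and 2 can be computed directly from the recorded triggering instants. A small helper for doing so is sketched below; the instant list in the usage line is made up for illustration and is not the paper's data.

```python
import numpy as np

def sampling_statistics(trigger_times, t_end, dt=0.1):
    """Summarize an event-triggered run: number of samples, minimal and average
    inter-event interval, and the sample count a time-triggered scheme would need."""
    t = np.asarray(trigger_times, dtype=float)
    intervals = np.diff(t)
    return {"samples": int(t.size),
            "minimal interval (s)": float(np.min(intervals)),
            "average interval (s)": float(np.mean(intervals)),
            "time-triggered samples": int(round(t_end / dt))}

# illustrative triggering instants only
print(sampling_statistics([0.0, 0.1, 0.4, 1.2, 2.5, 4.1, 7.0, 10.0], t_end=100.0))
```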

6. Conclusions

This paper has proposed an event-triggered ADP algorithm to solve the locally unknown zero-sum game problem with constrained input. The IRL technique is used to handle the unknown drift dynamics, and a single-critic NN is constructed to solve the HJI equation. The event-based $H_\infty$ controller is designed to ensure that the state of the system is sampled only at the triggering instants. Furthermore, the triggering condition guarantees that the dynamic system state and the critic weights are uniformly ultimately bounded. Two examples are used to verify the feasibility of the algorithm. However, as systems become more complex and nonlinear, this algorithm is not sufficient for completely model-free systems or systems with parameter uncertainties; how to solve these problems is a direction for future research.

Author Contributions

Writing—original draft, B.P.; Writing—review & editing, X.C., Y.C. and W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Zhejiang grant number LY22F030009, Zhejiang Province basic scientific research business fee funding project (2022YW20, 2022YW84), National Natural Science Foundation of China grant number 61903351, Liaoning Revitalization Talents Program grant number XLYC2007182, Education Department Project of Liaoning grant number LJKMZ20220655.

Data Availability Statement

No new data were created.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hu, Z.; Mu, X. Event-Triggered Impulsive Control for Nonlinear Stochastic Systems. IEEE Trans. Cybern. 2022, 52, 7805–7813.
  2. Jiang, H.; Zhang, H.; Luo, Y.; Han, J. Neural-Network-Based Robust Control Schemes for Nonlinear Multiplayer Systems with Uncertainties via Adaptive Dynamic Programming. IEEE Trans. Syst. Man Cybern. Syst. 2019, 49, 579–588.
  3. Jiang, H.; Zhang, H.; Luo, Y.; Cui, X. H∞ control with constrained input for completely unknown nonlinear systems using data-driven reinforcement learning method. Neurocomputing 2017, 237, 226–234.
  4. Cui, X.; Zhang, H.; Luo, Y.; Jiang, H. Adaptive dynamic programming for H∞ tracking design of uncertain nonlinear systems with disturbances and input constraints. Int. J. Adapt. Control Signal Process. 2017, 31, 1567–1583.
  5. Lewis, F.L.; Vrabie, D.L. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 2009, 9, 32–50.
  6. Werbos, P. Advanced forecasting methods for global crisis warning and models of intelligence. Gen. Syst. Yearb. 1977, 22, 25–38.
  7. Vamvoudakis, K.G.; Lewis, F.L. Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 2010, 46, 878–888.
  8. Mu, C.; Wang, K.; Sun, C. Policy-Iteration-Based Learning for Nonlinear Player Game Systems With Constrained Inputs. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 6488–6502.
  9. Modares, H.; Lewis, F.L.; Naghibi-Sistani, M.B. Adaptive Optimal Control of Unknown Constrained-Input Systems Using Policy Iteration and Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2013, 24, 1513–1525.
  10. Xu, D.; Wang, Q.; Li, Y. Adaptive Optimal Robust Control for Uncertain Nonlinear Systems Using Neural Network Approximation in Policy Iteration. Appl. Sci. 2021, 11, 2312.
  11. Vrabie, D.L.; Lewis, F.L. Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems. Neural Netw. 2009, 22, 237–246.
  12. Modares, H.; Lewis, F.L.; Naghibi-Sistani, M.B. Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 2014, 50, 193–202.
  13. Qin, C.; Wang, J.; Qiao, X.; Zhu, H.; Zhang, D.; Yan, Y. Integral Reinforcement Learning for Tracking in a Class of Partially Unknown Linear Systems With Output Constraints and External Disturbances. IEEE Access 2022, 10, 55270–55278.
  14. Xue, S.; Luo, B.; Liu, D. Event-Triggered Adaptive Dynamic Programming for Unmatched Uncertain Nonlinear Continuous-Time Systems. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 2939–2951.
  15. Yang, X.; He, H. Event-Driven H∞-Constrained Control Using Adaptive Critic Learning. IEEE Trans. Cybern. 2021, 51, 4860–4872.
  16. Modares, H.; Lewis, F.L.; Sistani, M.B. Online solution of nonquadratic two-player zero-sum games arising in the H∞ control of constrained input systems. Int. J. Adapt. Control Signal Process. 2014, 28, 232–254.
  17. Vamvoudakis, K.G.; Lewis, F.L. Online solution of nonlinear two-player zero-sum games using synchronous policy iteration. Int. J. Robust Nonlinear Control 2010, 22, 3040–3047.
  18. Zhao, J.; Gan, M.; Chen, J.; Hou, D.; Zhang, M.; Bai, Y. Adaptive optimal control for a class of uncertain systems with saturating actuators and external disturbance using integral reinforcement learning. In Proceedings of the 2017 11th Asian Control Conference (ASCC), Gold Coast, QLD, Australia, 17–20 December 2017; pp. 1146–1151.
  19. Cui, X.; Chen, J.; Wang, B.; Xu, S. Off-policy algorithm based Hierarchical optimal control for completely unknown dynamic systems. Neurocomputing 2022, 488, 669–680.
  20. Zhong, X.; He, H.; Wang, D.; Ni, Z. Model-Free Adaptive Control for Unknown Nonlinear Zero-Sum Differential Game. IEEE Trans. Cybern. 2018, 48, 1633–1646.
  21. Wei, Q.; Liu, D.; Lin, Q.; Song, R. Adaptive Dynamic Programming for Discrete-Time Zero-Sum Games. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 957–969.
  22. Vamvoudakis, K.G. An online actor/critic algorithm for event-triggered optimal control of continuous-time nonlinear systems. In Proceedings of the 2014 American Control Conference, Portland, OR, USA, 4–6 June 2014; pp. 1–6.
  23. Sahoo, A.; Xu, H.; Jagannathan, S. Approximate Optimal Control of Affine Nonlinear Continuous-Time Systems Using Event-Sampled Neurodynamic Programming. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 639–652.
  24. Sahoo, A.; Narayanan, V.; Jagannathan, S. Optimal event-triggered control of uncertain linear networked control systems: A co-design approach. In Proceedings of the 2017 IEEE Symposium Series on Computational Intelligence (SSCI), Honolulu, HI, USA, 27 November–1 December 2017; pp. 1–6.
  25. Heemels, W.; Johansson, K.H.; Tabuada, P. An introduction to event-triggered and self-triggered control. In Proceedings of the 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), Maui, HI, USA, 10–13 December 2012; pp. 3270–3285.
  26. Mu, C.; Wang, K.; Qiu, T. Dynamic Event-Triggering Neural Learning Control for Partially Unknown Nonlinear Systems. IEEE Trans. Cybern. 2022, 52, 2200–2213.
  27. Narayanan, V.; Sahoo, A.; Jagannathan, S. Optimal Event-triggered Control of Nonlinear Systems: A Min-max Approach. In Proceedings of the 2018 Annual American Control Conference (ACC), Milwaukee, WI, USA, 27–29 June 2018; pp. 3441–3446.
  28. Zhao, F.; Gao, W.; Jiang, Z.P.; Liu, T. Event-Triggered Adaptive Optimal Control With Output Feedback: An Adaptive Dynamic Programming Approach. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 5208–5221.
  29. Zhang, K.; Zhang, H.; Jiang, H.; Wang, Y. Near-optimal output tracking controller design for nonlinear systems using an event-driven ADP approach. Neurocomputing 2018, 309, 168–178.
  30. Hu, C.; Zou, Y.; Li, S. Observed-based event-triggered control for nonlinear systems with disturbances using adaptive dynamic programming. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; pp. 581–586.
  31. Shi, C.X.; Yang, G. Nash equilibrium computation in two-network zero-sum games: An incremental algorithm. Neurocomputing 2019, 359, 114–121.
  32. Su, H.; Zhang, H.; Liang, Y.; Mu, Y. Online event-triggered adaptive critic design for non-zero-sum games of partially unknown networked systems. Neurocomputing 2019, 368, 84–98.
  33. Zhang, Y.; Zhao, B.; Liu, D.; Zhang, S. Event-Triggered Control of Discrete-Time Zero-Sum Games via Deterministic Policy Gradient Adaptive Dynamic Programming. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 4823–4835.
  34. Mu, C.; Wang, K. Aperiodic adaptive control for neural-network-based nonzero-sum differential games: A novel event-triggering strategy. ISA Trans. 2019, 92, 1–13.
  35. Yang, X.; He, H. Event-Triggered Robust Stabilization of Nonlinear Input-Constrained Systems Using Single Network Adaptive Critic Designs. IEEE Trans. Syst. Man Cybern. Syst. 2020, 50, 3145–3157.
  36. Zhang, Q.; Zhao, D.; Wang, D. Event-Based Robust Control for Uncertain Nonlinear Systems Using Adaptive Dynamic Programming. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 37–50.
  37. Dong, L.; Zhong, X.; Sun, C.; He, H. Event-Triggered Adaptive Dynamic Programming for Continuous-Time Systems With Control Constraints. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 37–50.
  38. Wang, K.; Gu, Q.; Huang, B.; Wei, Q.; Zhou, T. Adaptive Event-Triggered Near-Optimal Tracking Control for Unknown Continuous-Time Nonlinear Systems. IEEE Access 2022, 10, 9506–9518.
  39. Abu-Khalaf, M.; Lewis, F.L. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 2005, 41, 779–791.
  40. Tabuada, P. Event-Triggered Real-Time Scheduling of Stabilizing Control Tasks. IEEE Trans. Autom. Control 2007, 52, 1680–1685.
  41. Xue, S.; Luo, B.; Liu, D.; Yang, Y. Constrained Event-Triggered H∞ Control Based on Adaptive Dynamic Programming With Concurrent Learning. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 357–369.
  42. Luo, B.; Yang, Y.; Liu, D.; Wu, H. Event-Triggered Optimal Control With Performance Guarantees Using Adaptive Dynamic Programming. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 76–88.
  43. Khalil, H.K.; Grizzle, J. Nonlinear Systems; Prentice Hall: Englewood Cliffs, NJ, USA, 1996.
  44. Xue, S.; Luo, B.; Liu, D. Event-Triggered Adaptive Dynamic Programming for Zero-Sum Game of Partially Unknown Continuous-Time Nonlinear Systems. IEEE Trans. Syst. Man Cybern. Syst. 2020, 50, 3189–3199.
Figure 1. The structure diagram of the event-triggered ADP algorithm.
Figure 2. Three state vectors of the linear system.
Figure 3. The weight trajectories of the single critic NN.
Figure 4. The control law $u(\hat{x}_k)$ and the perturbation law $\upsilon(\hat{x}_k)$.
Figure 5. The event error $\|e_k\|^2$ and the triggering threshold $e_T$.
Figure 6. Sample intervals of two strategies. (a) The time-triggered control. (b) The event-triggered control.
Figure 7. The evolution of $x_1(t)$ and $x_2(t)$.
Figure 8. Three weight vectors of the single-critic network.
Figure 9. The control law $u(\hat{x}_k)$ and the perturbation law $\upsilon(\hat{x}_k)$.
Figure 10. The event error $\|e_k\|^2$ and the triggering threshold $e_T$.
Figure 11. The event-triggered sampling intervals.
Table 1. State Sampling of Two Strategies.

Methods                 Event-Triggered    Time-Triggered
Samples                 85                 1000
Minimal interval (s)    0.1                0.1
Average interval (s)    1.0165             0.1
Table 2. Comparison of Nonlinear System Sampling.

Methods                 Event-Triggered    Time-Triggered
Samples                 157                1000
Minimal interval (s)    0.1                0.1
Average interval (s)    0.527              0.1