Article

Robust Tracking Control for Non-Zero-Sum Games of Continuous-Time Uncertain Nonlinear Systems

1 School of Artificial Intelligence, Henan University, Zhengzhou 450000, China
2 School of Software, Henan University, Kaifeng 475000, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(11), 1904; https://doi.org/10.3390/math10111904
Submission received: 28 April 2022 / Revised: 28 May 2022 / Accepted: 30 May 2022 / Published: 2 June 2022
(This article belongs to the Topic Advances in Nonlinear Dynamics: Methods and Applications)

Abstract
In this paper, a new adaptive critic design is proposed to approximate the online Nash equilibrium solution for the robust trajectory tracking control of non-zero-sum (NZS) games for continuous-time uncertain nonlinear systems. First, an augmented system was constructed by combining the tracking error and the reference trajectory. By modifying the cost function, the robust tracking control problem was transformed into an optimal tracking control problem. Based on adaptive dynamic programming (ADP), a single critic neural network (NN) was applied for each player to approximately solve the coupled Hamilton–Jacobi–Bellman (HJB) equations, and the obtained control laws were regarded as the feedback Nash equilibrium. Two additional terms were introduced in the weight update law of each critic NN, which strengthened the weight update process and eliminated the strict requirement for an initial stability control policy. More importantly, the stability of the closed-loop system was guaranteed through the Lyapunov theory, and the robust tracking performance was analyzed. Finally, the effectiveness of the proposed scheme was verified by two examples.

1. Introduction

Control theory has developed gradually to meet the needs of engineering. In practical engineering, environmental uncertainties, such as noise and temperature, greatly affect the stability of a system, so it is very important to find control methods that address this problem. In recent years, several methods for dealing with disturbances or uncertainties have been proposed, such as sliding mode control, type-2 fuzzy control [1,2], and internal model control. Robust control, in a broad sense, can also be applied to solve the control problems of uncertain dynamic systems [3,4]. Building on the development of adaptive dynamic programming (ADP) and related control algorithms, several approaches have proven effective for robust control problems, including guaranteed cost control [5], the system transformation method [6,7,8], and robust stabilization schemes based on integral reinforcement learning (IRL) [9,10]. These methods mainly embody the ideas of reinforcement learning and adaptive dynamic programming. When studying the optimal control problem using adaptive dynamic programming [11,12,13], the key is to solve the Hamilton–Jacobi–Bellman (HJB) equation; however, due to the curse of dimensionality, it is almost impossible to solve directly. Combining neural network (NN) approximation methods with ADP ideas, the adaptive critic design has been widely used in robust control [14,15]. With an adaptive critic design, an approximate solution of the HJB equation can be obtained to cope with robust control problems [5,9,15]. For a system with uncertainties, an upper bound function of the uncertainty is usually given, and the cost function is then modified so that the robust control problem can be transformed into the optimal control problem of a nominal system [15]. This inspired our treatment of the uncertain disturbance. From the above results, it can be seen that the basic regulation problem has been well addressed.
As the complexity of a system increases, a large class of systems often has multiple controllers, such as immune systems [16] and interconnected systems [17]. Game theory considers individuals' predictive and practical behavior in a game and studies optimization strategies, so multi-controller system issues can be well addressed by it [18]. As an important branch of game theory, non-zero-sum (NZS) game theory was first proposed in [19]; it aims to find a set of feedback control strategies that achieve the so-called Nash equilibrium while satisfying the defined performance indicators and guaranteeing the system's stability. In this process, the most important step is to solve the coupled HJB equations. Since the coupled HJB equations are difficult to solve directly, many advanced algorithms have been developed. In general, iteration-based algorithms can be used to approximate the solution of the HJB equations. A policy-iteration-based algorithm was used to solve NZS game problems in [20,21]. Considering that it is difficult to know the specific dynamics of complex systems, in [22], based on the iteration algorithm, the Nash equilibrium was obtained approximately by data-based IRL, which does not need known system dynamics. As policy iteration requires an initial stable control policy, an off-policy IRL method was given to solve the coupled HJB equations in [23]. Recently, the ADP method has become an effective tool for solving the coupled HJB equations. To solve the NZS game of unknown nonlinear systems, an approximately optimal control scheme based on the ADP method and a generalized fuzzy hyperbolic model was presented in [24]. Combining the ADP method with the NN structure, the adaptive critic design has also been applied to the NZS game. Based on the structure of an actor–critic NN, an adaptive algorithm was proposed for NZS games of nonlinear systems in [25]. In [26], using experience replay techniques and the framework of a single critic NN, the NZS game of unknown dynamical systems was studied. The methods proposed above can effectively solve the NZS game. However, there are few studies on NZS games with uncertain disturbances. Therefore, based on the adaptive critic design, the NZS game of nonlinear systems with uncertain perturbations was studied in this work.
Initially, research was limited to making the state of the system converge to the origin; however, many controller designs also require the controlled object to track a reference trajectory, especially in noisy and uncertain environments, which is a very common control problem. Trajectory tracking control problems have been solved by several algorithms in [27,28,29,30,31,32,33,34]. Iterative algorithms can still be effectively applied to trajectory tracking control. In [27], to overcome some shortcomings of traditional controllers, an adaptive iterative algorithm was proposed for the robot trajectory tracking problem. Considering disturbance, an iterative algorithm based on Q-learning was presented to solve the H∞ tracking problem of discrete-time systems in [28], which did not require system dynamics. In [29], the tracking problem was transformed into a tracking error regulation problem through system transformation, which was solved by an iterative ADP algorithm. Some non-iterative algorithms for tracking problems were then proposed in [30,31,32,33,34]. In [30], optimal tracking control was studied using online approximators, but this method required the control matrix to be invertible. To overcome this invertibility requirement, some new methods were proposed. In [31], based on system transformation, a self-learning optimal control method was used to solve the robust trajectory tracking design of uncertain nonlinear systems. Considering the need for multiple outputs in some systems, the robust tracking control of discrete-time systems with multiple inputs and multiple outputs was studied utilizing the adaptive critic design in [32]. By modifying the cost function and introducing a discount factor, the guaranteed cost tracking problem was transformed into an optimal tracking problem in [33], and by developing a new critic NN the optimal tracking control problem could be addressed without policy iteration. For systems with unmatched perturbation, an NN-based ADP algorithm was used to obtain the approximately optimal tracking control law of uncertain nonlinear systems with a predefined cost function in [34]. In this paper, based on the critic NN structure and the ADP method, an augmented system is used to solve the tracking control problem for NZS games with perturbation.
The main contributions of this paper are as follows:
(1)
An augmented system was constructed by combining the tracking error and the reference trajectory. The robust tracking control problem was transformed into an optimal tracking control problem for the nominal augmented system by modifying the cost functions. This method no longer strictly requires the control matrix to be invertible. Moreover, while robust tracking control is mostly applied to special systems, here we considered a general system similar to a spring-mass-damper system [31].
(2)
For the NZS game between two players with uncertainties, a newly improved adaptive critic design was proposed to solve the revised coupled HJB equations. Two additional terms were introduced in the critic NN weight design: one ensures that the system can always remain in a stable state without the need for an initial stability control policy, and the other is used to analyze the stability of the system.
(3)
Compared with the actor–critic NN, each player only used one critic NN to approximate their value function and control policy, which could greatly reduce the amount of calculation. By the Lyapunov theory, the stability of the closed-loop system was proved, and the trajectory tracking performance was analyzed. What is more, the adaptive critic design could be carried out online.
The rest of this paper is arranged as follows. In the second section, the description of the two-player NZS game with uncertain terms and the construction method of the augmented matrix structure are given. Then in the third section, a single critic NN structure is used to approximate the value function for each player, and the approximate feedback Nash equilibrium is then solved. Moreover, the system stability analysis and the tracking performance analysis are given. Finally, the effectiveness of the proposed scheme is verified by two examples.

2. Problem Statement

A class of continuous-time uncertain nonlinear dynamical systems for two-player NZS games is given by
$\dot{x}(t) = f(x(t)) + g(x(t))u(t) + k(x(t))v(t) + \Delta f(x(t)),$
where $x \in \mathbb{R}^n$ is the system state, $u \in \mathbb{R}^m$ is the first control input, and $v \in \mathbb{R}^q$ is the second control input. The known functions $f(\cdot)$, $g(\cdot)$ and $k(\cdot)$ are Lipschitz continuous on a compact set $\Omega \subset \mathbb{R}^n$ with $f(0) = 0$. $\Delta f(x(t)) = M(x)d(x)$ is the unknown perturbation satisfying $\Delta f(0) = 0$. Here, $M(\cdot) \in \mathbb{R}^{n\times r}$ is a known function, and $d(\cdot) \in \mathbb{R}^r$ is an uncertain function with $d(0) = 0$. One chooses the initial state as $x(0) = x_0$. Let the uncertain term $\Delta f(x(t))$ be bounded by a known function $\lambda_f(x)$, i.e., $\|\Delta f(x)\| \le \lambda_f(x)$ with $\lambda_f(0) = 0$.
Here, we introduce a system reference trajectory command generator to implement the trajectory tracking, that is
$\dot{s}(t) = \varphi(s(t)),$
where $s(t) \in \mathbb{R}^n$ denotes the bounded reference trajectory. Let the initial trajectory be $s(0) = s_0$, and let $\varphi(s(t))$ be a Lipschitz continuous function with $\varphi(0) = 0$. The tracking error is defined as
$e_r(t) = x(t) - s(t).$
Then, the initial error vector is $e_r(0) = e_{r0} = x_0 - s_0$. According to (1)-(3), the tracking error dynamics can be obtained as
$\dot{e}_r(t) = f(x(t)) - \varphi(s(t)) + g(x(t))u(t) + k(x(t))v(t) + \Delta f(x(t)).$
Since $x(t) = e_r(t) + s(t)$, system (4) is written as
$\dot{e}_r(t) = f(e_r(t) + s(t)) - \varphi(s(t)) + g(e_r(t) + s(t))u(t) + k(e_r(t) + s(t))v(t) + \Delta f(e_r(t) + s(t)).$
To introduce the augmented system, we define an augmented state vector $\zeta(t) = [e_r^T(t), s^T(t)]^T \in \mathbb{R}^{2n}$ and choose its initial condition as $\zeta(0) = \zeta_0 = [e_r^T(0), s^T(0)]^T \in \mathbb{R}^{2n}$. Combining (2) and (5), the augmented system dynamics simplify to
$\dot{\zeta}(t) = F(\zeta(t)) + G(\zeta(t))u(t) + K(\zeta(t))v(t) + \Delta F(\zeta(t)),$
where $F(\cdot)$, $G(\cdot)$ and $K(\cdot)$ are the new system matrices and $\Delta F(\zeta)$ represents the augmented system uncertainty, written in the following specific form:
$F(\zeta(t)) = \begin{bmatrix} f(e_r(t)+s(t)) - \varphi(s(t)) \\ \varphi(s(t)) \end{bmatrix},$
$G(\zeta(t)) = \begin{bmatrix} g(e_r(t)+s(t)) \\ 0_{n\times m} \end{bmatrix},$
$K(\zeta(t)) = \begin{bmatrix} k(e_r(t)+s(t)) \\ 0_{n\times q} \end{bmatrix},$
$\Delta F(\zeta(t)) = \begin{bmatrix} \Delta f(e_r(t)+s(t)) \\ 0_{n\times 1} \end{bmatrix}.$
It is easy to conclude that $\Delta F(\zeta)$ is upper bounded; the details are as follows:
$\|\Delta F(\zeta)\| = \|\Delta f(e_r+s)\| = \|\Delta f(x)\| \le \lambda_f(e_r+s) = \lambda_f(\zeta).$
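The construction of (7)-(10) is mechanical, so it can be captured in a few lines of code. The following sketch (our own illustration; the function names are not from the paper) assembles the augmented dynamics from user-supplied $f$, $g$, $k$ and $\varphi$:

```python
import numpy as np

def make_augmented_dynamics(f, g, k, phi, n):
    """Build F, G, K of the augmented system (6) from the plant (1)
    and the command generator (2); the augmented state is zeta = [e_r; s]."""
    def F(zeta):
        e_r, s = zeta[:n], zeta[n:]
        return np.concatenate([f(e_r + s) - phi(s), phi(s)])   # Equation (7)

    def G(zeta):
        e_r, s = zeta[:n], zeta[n:]
        gx = np.atleast_2d(g(e_r + s))                         # n x m block
        return np.vstack([gx, np.zeros_like(gx)])              # Equation (8)

    def K(zeta):
        e_r, s = zeta[:n], zeta[n:]
        kx = np.atleast_2d(k(e_r + s))                         # n x q block
        return np.vstack([kx, np.zeros_like(kx)])              # Equation (9)

    return F, G, K
```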
In order to better analyze the NZS game with the uncertain perturbation, we decompose the uncertain term $\Delta F(\zeta)$ into
$\Delta F(\zeta) = \Delta F_1(\zeta) + \Delta F_2(\zeta) = M_1(\zeta)d_1(\zeta) + M_2(\zeta)d_2(\zeta),$
where $M_1(\cdot) \in \mathbb{R}^{n\times r}$ and $M_2(\cdot) \in \mathbb{R}^{n\times r}$ are known functions in the uncertain term, and $d_1(\cdot) \in \mathbb{R}^r$ and $d_2(\cdot) \in \mathbb{R}^r$ are uncertain functions satisfying $d_1(0) = d_2(0) = 0$. Similarly, two known functions $\lambda_{f1}(\zeta)$ and $\lambda_{f2}(\zeta)$ are the upper bounds of $\|\Delta F_1(\zeta)\|$ and $\|\Delta F_2(\zeta)\|$, with $\lambda_{f1}(0) = \lambda_{f2}(0) = 0$.
Assumption 1.
The control function matrices $g(x)$ and $k(x)$ are bounded as $\|g(x)\| \le \lambda_g$ and $\|k(x)\| \le \lambda_k$ [31], where $\lambda_g$ and $\lambda_k$ are positive constants, and hence
$\|G(\zeta)\| = \|g(e_r+s)\| = \|g(x)\| \le \lambda_g,$
$\|K(\zeta)\| = \|k(e_r+s)\| = \|k(x)\| \le \lambda_k.$
By constructing the augmented dynamics (6), the feedback control laws $u(\zeta)$ and $v(\zeta)$ are sought to make the state of the system move along the reference trajectory. At the same time, the closed-loop system should be asymptotically stable under the influence of the uncertain term. Next, we give appropriate cost functions to transform the robust control problem into the optimal control problem for its nominal system.
For the augmented system (6), we focus on the nominal system part
$\dot{\zeta}(t) = F(\zeta(t)) + G(\zeta(t))u(t) + K(\zeta(t))v(t).$
The two-player cost functions are
$J_1(\zeta_0, u, v) = \int_0^\infty \{\Gamma_1(\zeta(t)) + U_1(\zeta(t), u(t), v(t))\}\,dt,$
$J_2(\zeta_0, u, v) = \int_0^\infty \{\Gamma_2(\zeta(t)) + U_2(\zeta(t), u(t), v(t))\}\,dt,$
where $U_1(\zeta,u,v)$ and $U_2(\zeta,u,v)$ are the basic parts of the utility functions, with $U_1(0,0,0) = U_2(0,0,0) = 0$, $U_1(\zeta,u,v) \ge 0$ and $U_2(\zeta,u,v) \ge 0$ for all $\zeta$, $u$ and $v$. The utility functions are chosen as $U_1(\zeta,u,v) = \zeta^T\bar{Q}_1\zeta + u^T R_{11}u + v^T R_{12}v$ and $U_2(\zeta,u,v) = \zeta^T\bar{Q}_2\zeta + u^T R_{21}u + v^T R_{22}v$, where $\bar{Q}_1 = \mathrm{diag}\{Q_1, 0_{n\times n}\}$, $\bar{Q}_2 = \mathrm{diag}\{Q_2, 0_{n\times n}\}$, and $Q_1$, $Q_2$, $R_{11}$, $R_{12}$, $R_{21}$ and $R_{22}$ are positive definite matrices. $\Gamma_1(\zeta)$ and $\Gamma_2(\zeta)$ are related to the dynamical uncertainty, with $\Gamma_1(\zeta) \ge 0$ and $\Gamma_2(\zeta) \ge 0$. What is more, the feedback controllers required to solve the optimal control problem must be admissible. The definition of admissible policies is given below.
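As a concrete reading of these choices, the sketch below (a hypothetical helper, not code from the paper) evaluates such a utility with $\bar{Q}_i = \mathrm{diag}\{Q_i, 0_{n\times n}\}$, so that only the tracking-error half of $\zeta = [e_r^T, s^T]^T$ is penalized:

```python
import numpy as np

def utility(zeta, u, v, Q, R_uu, R_uv, n):
    """U_i(zeta, u, v) = zeta^T Qbar_i zeta + u^T R_i1 u + v^T R_i2 v,
    with Qbar_i = diag{Q_i, 0_{n x n}}: the reference-trajectory half
    of the augmented state carries no cost."""
    Qbar = np.block([[Q, np.zeros((n, n))],
                     [np.zeros((n, n)), np.zeros((n, n))]])
    return zeta @ Qbar @ zeta + u @ R_uu @ u + v @ R_uv @ v
```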
Definition 1.
(Admissible policies). Control functions $u(\zeta)$ and $v(\zeta)$ are said to be admissible with respect to (16) and (17) on $\Omega \subset \mathbb{R}^n$ [26] if $u(\zeta)$ and $v(\zeta)$ are continuous on $\Omega$, $u(0) = v(0) = 0$, $u(\zeta)$ and $v(\zeta)$ stabilize system (15) on $\Omega$, and the cost functions (16) and (17) are finite for all $\zeta_0 \in \Omega$.
Given admissible feedback policies $u(\zeta) \in A(\Omega)$ and $v(\zeta) \in A(\Omega)$, one can define the value functions corresponding to the cost functions as
$V_1(\zeta(t)) = \int_t^\infty \{\Gamma_1(\zeta(\tau)) + U_1(\zeta(\tau), u(\tau), v(\tau))\}\,d\tau,$
$V_2(\zeta(t)) = \int_t^\infty \{\Gamma_2(\zeta(\tau)) + U_2(\zeta(\tau), u(\tau), v(\tau))\}\,d\tau,$
where $\Gamma_1(\zeta)$ and $\Gamma_2(\zeta)$ are defined as
$\Gamma_1(\zeta) = \lambda_{f1}^2(\zeta) + \frac{1}{4}(\nabla V_1(\zeta))^T M_1(\zeta)M_1^T(\zeta)\nabla V_1(\zeta),$
$\Gamma_2(\zeta) = \lambda_{f2}^2(\zeta) + \frac{1}{4}(\nabla V_2(\zeta))^T M_2(\zeta)M_2^T(\zeta)\nabla V_2(\zeta).$
In this paper, a 2-tuple of policies $\{u, v\}$ is sought to minimize (18) and (19); thus, the optimal value functions $V_1^*$ and $V_2^*$ are defined as
$V_1^*(\zeta(t)) = \min_{u \in A(\Omega)} \int_t^\infty \{\Gamma_1(\zeta(\tau)) + U_1(\zeta(\tau), u(\tau), v(\tau))\}\,d\tau,$
$V_2^*(\zeta(t)) = \min_{v \in A(\Omega)} \int_t^\infty \{\Gamma_2(\zeta(\tau)) + U_2(\zeta(\tau), u(\tau), v(\tau))\}\,d\tau.$
In addition, there exists a Nash equilibrium in the NZS game between two players. Next, we give the Nash equilibrium definition.
Definition 2.
(Nash equilibrium policies). A 2-tuple of policies $\{u^*, v^*\}$ with $u^*, v^* \in A(\Omega)$ is said to constitute a Nash equilibrium solution for the two-player game [35] if the following two inequalities are satisfied for all $u, v \in A(\Omega)$:
$J_1^*(u^*, v^*) \le J_1(u, v^*),$
$J_2^*(u^*, v^*) \le J_2(u^*, v).$
Under the admissible feedback policies, if the value functions (18) and (19) are continuously differentiable, their differential equivalents are given by
$0 = \Gamma_1(\zeta) + U_1(\zeta, u, v) + (\nabla V_1)^T[F(\zeta) + G(\zeta)u(\zeta) + K(\zeta)v(\zeta)],$
$0 = \Gamma_2(\zeta) + U_2(\zeta, u, v) + (\nabla V_2)^T[F(\zeta) + G(\zeta)u(\zeta) + K(\zeta)v(\zeta)],$
with $V_i(0) = 0$ and $\nabla V_i = \partial V_i/\partial\zeta$, $i = 1, 2$. Define the Hamiltonian functions
$H_1(\zeta, u(\zeta), v(\zeta), \nabla V_1) = \Gamma_1(\zeta) + U_1(\zeta, u(\zeta), v(\zeta)) + (\nabla V_1)^T[F(\zeta) + G(\zeta)u(\zeta) + K(\zeta)v(\zeta)],$
$H_2(\zeta, u(\zeta), v(\zeta), \nabla V_2) = \Gamma_2(\zeta) + U_2(\zeta, u(\zeta), v(\zeta)) + (\nabla V_2)^T[F(\zeta) + G(\zeta)u(\zeta) + K(\zeta)v(\zeta)].$
According to the stationarity conditions [36], two players’ optimal feedback control policies are given by
$\partial H_1/\partial u = 0 \;\Rightarrow\; u^* = -\frac{1}{2}R_{11}^{-1}G^T(\zeta)\nabla V_1^*,$
$\partial H_2/\partial v = 0 \;\Rightarrow\; v^* = -\frac{1}{2}R_{22}^{-1}K^T(\zeta)\nabla V_2^*.$
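For completeness, the stationarity step behind (30) can be spelled out (shown for player 1; player 2 is symmetric). Since $U_1$ is quadratic in $u$ and $u$ enters the dynamics only through $G(\zeta)u$,
$\frac{\partial H_1}{\partial u} = 2R_{11}u + G^T(\zeta)\nabla V_1 = 0 \;\Longrightarrow\; u^* = -\frac{1}{2}R_{11}^{-1}G^T(\zeta)\nabla V_1^*.$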
Combining (26), (27), (30) and (31), one obtains the coupled HJB equations
$0 = \Gamma_1(\zeta) + \zeta^T\bar{Q}_1\zeta + (\nabla V_1^*)^T F(\zeta) - \frac{1}{2}(\nabla V_1^*)^T G(\zeta)R_{11}^{-1}G^T(\zeta)\nabla V_1^* - \frac{1}{2}(\nabla V_1^*)^T K(\zeta)R_{22}^{-1}K^T(\zeta)\nabla V_2^* + \frac{1}{4}(\nabla V_1^*)^T G(\zeta)R_{11}^{-T}R_{11}R_{11}^{-1}G^T(\zeta)\nabla V_1^* + \frac{1}{4}(\nabla V_2^*)^T K(\zeta)R_{22}^{-T}R_{12}R_{22}^{-1}K^T(\zeta)\nabla V_2^*,$
$0 = \Gamma_2(\zeta) + \zeta^T\bar{Q}_2\zeta + (\nabla V_2^*)^T F(\zeta) - \frac{1}{2}(\nabla V_2^*)^T G(\zeta)R_{11}^{-1}G^T(\zeta)\nabla V_1^* - \frac{1}{2}(\nabla V_2^*)^T K(\zeta)R_{22}^{-1}K^T(\zeta)\nabla V_2^* + \frac{1}{4}(\nabla V_1^*)^T G(\zeta)R_{11}^{-T}R_{21}R_{11}^{-1}G^T(\zeta)\nabla V_1^* + \frac{1}{4}(\nabla V_2^*)^T K(\zeta)R_{22}^{-T}R_{22}R_{22}^{-1}K^T(\zeta)\nabla V_2^*,$
where $V_1^*(0) = 0$ and $V_2^*(0) = 0$. To simplify the notation, eight non-negative matrices $A_i(\zeta)$, $B_i(\zeta)$, $C_i(\zeta)$ and $D_i(\zeta)$, $i = 1, 2$, are defined by
$A_1(\zeta) = M_1(\zeta)M_1^T(\zeta),$
$A_2(\zeta) = M_2(\zeta)M_2^T(\zeta),$
$B_1(\zeta) = G(\zeta)R_{11}^{-1}G^T(\zeta),$
$B_2(\zeta) = K(\zeta)R_{22}^{-1}K^T(\zeta),$
$C_1(\zeta) = K(\zeta)R_{22}^{-T}R_{12}R_{22}^{-1}K^T(\zeta),$
$C_2(\zeta) = G(\zeta)R_{11}^{-T}R_{21}R_{11}^{-1}G^T(\zeta),$
$D_1(\zeta) = K(\zeta)R_{22}^{-1}K^T(\zeta),$
$D_2(\zeta) = G(\zeta)R_{11}^{-1}G^T(\zeta).$
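Since these eight matrices recur throughout the derivations and the simulations, a small pointwise helper (our own naming, assuming $G$, $K$, $M_1$, $M_2$ have already been evaluated at the current $\zeta$) may clarify the bookkeeping:

```python
import numpy as np

def game_matrices(Gz, Kz, M1, M2, R11, R12, R21, R22):
    """Evaluate A_i, B_i, C_i, D_i of (34)-(41) at one point zeta."""
    R11_inv, R22_inv = np.linalg.inv(R11), np.linalg.inv(R22)
    A1, A2 = M1 @ M1.T, M2 @ M2.T                 # uncertainty terms
    B1 = Gz @ R11_inv @ Gz.T                      # B_1 (equals D_2)
    B2 = Kz @ R22_inv @ Kz.T                      # B_2 (equals D_1)
    C1 = Kz @ R22_inv.T @ R12 @ R22_inv @ Kz.T
    C2 = Gz @ R11_inv.T @ R21 @ R11_inv @ Gz.T
    return A1, A2, B1, B2, C1, C2, B2, B1         # D_1 = B_2, D_2 = B_1
```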
It is well known that the coupled HJB equations are difficult to solve directly; therefore, we next approximate their solutions using the NN-based adaptive critic design.

3. Robust Trajectory Tracking Design for Non-Zero-Sum Games

This section mainly includes two parts. First, the solution of coupled HJB equations is approximated by the adaptive critic design based on a single NN structure, so that the so-called Nash equilibrium is found. Secondly, the stability of the system is proved and the tracking performance is analyzed via the Lyapunov theory.

3.1. Neural Network Implementation

In order to realize the neural network approximation, we first introduce the Weierstrass high-order approximation theorem [37,38].
Assumption 2.
The solutions to (26) and (27) are smooth.
According to Assumption 2, there exist complete independent basis sets $\{\varpi_i(\zeta)\}$ and $\{\mu_i(\zeta)\}$ such that the solutions to (26) and (27) and their gradients are uniformly approximated; that is, there exist coefficients $c_i$ and $z_i$ such that
$V_1(\zeta) = \sum_{i=1}^{\infty} c_i\varpi_i(\zeta) = \sum_{i=1}^{K} c_i\varpi_i(\zeta) + \sum_{i=K+1}^{\infty} c_i\varpi_i(\zeta),$
$V_2(\zeta) = \sum_{i=1}^{\infty} z_i\mu_i(\zeta) = \sum_{i=1}^{K} z_i\mu_i(\zeta) + \sum_{i=K+1}^{\infty} z_i\mu_i(\zeta).$
Then we have
$V_1(\zeta) = C_1^T\phi_1(\zeta) + \sum_{i=K+1}^{\infty} c_i\varpi_i(\zeta),$
$V_2(\zeta) = Z_1^T\phi_2(\zeta) + \sum_{i=K+1}^{\infty} z_i\mu_i(\zeta),$
where $C_1 = [c_1, c_2, \ldots, c_K]^T$, $Z_1 = [z_1, z_2, \ldots, z_K]^T$, $\phi_1(\zeta) = [\varpi_1(\zeta), \varpi_2(\zeta), \ldots, \varpi_K(\zeta)]^T$, $\phi_2(\zeta) = [\mu_1(\zeta), \mu_2(\zeta), \ldots, \mu_K(\zeta)]^T$, and the last terms in these equations converge uniformly to zero as $K \to \infty$. Next, we give the specific content of the value function approximation.
For the augmented dynamics (15), the value functions are re-expressed as
$V_1(\zeta) = W_1^T\phi_1(\zeta) + \varepsilon_1,$
$V_2(\zeta) = W_2^T\phi_2(\zeta) + \varepsilon_2,$
where $W_1, W_2 \in \mathbb{R}^K$ are the ideal weights, $\phi_1(\zeta), \phi_2(\zeta) \in \mathbb{R}^K$ are the activation function vectors, $K$ is the number of hidden neurons, and $\varepsilon_1$ and $\varepsilon_2$ are the critic NN approximation errors. As $K \to \infty$, $\varepsilon_1$ and $\varepsilon_2$ converge to zero; however, when $K$ is a fixed constant, they are bounded.
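In the simulations of Section 4, the activation vector is the full quadratic monomial basis of $\zeta \in \mathbb{R}^4$ (so $K = 10$). A sketch of that basis and its gradient, written out by hand for illustration:

```python
import numpy as np

def phi(zeta):
    """Quadratic basis of Section 4: all monomials zeta_i * zeta_j, K = 10."""
    z1, z2, z3, z4 = zeta
    return np.array([z1*z1, z1*z2, z1*z3, z1*z4, z2*z2,
                     z2*z3, z2*z4, z3*z3, z3*z4, z4*z4])

def grad_phi(zeta):
    """Jacobian d(phi)/d(zeta), shape (K, 4): row i is the gradient of phi_i."""
    z1, z2, z3, z4 = zeta
    return np.array([[2*z1, 0, 0, 0], [z2, z1, 0, 0], [z3, 0, z1, 0],
                     [z4, 0, 0, z1], [0, 2*z2, 0, 0], [0, z3, z2, 0],
                     [0, z4, 0, z2], [0, 0, 2*z3, 0], [0, 0, z4, z3],
                     [0, 0, 0, 2*z4]])
```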
Assumption 3.
In order to ensure the boundedness, we make the following assumptions, as in [26].
(1)
The critic NN activation functions and their gradients are bounded, i.e., $\|\phi_i\| \le \lambda_{\phi i}$ and $\|\nabla\phi_i\| \le \lambda_{d\phi i}$, $i = 1, 2$, where $\lambda_{\phi i}$ and $\lambda_{d\phi i}$ are positive constants.
(2)
The critic NN approximation errors and their gradients are bounded by positive constants, i.e., $\|\varepsilon_i\| \le \lambda_{\varepsilon i}$ and $\|\nabla\varepsilon_i\| \le \lambda_{d\varepsilon i}$, $i = 1, 2$, where $\lambda_{\varepsilon i}$ and $\lambda_{d\varepsilon i}$ are positive constants.
(3)
The critic NN weights are upper bounded such that $\|W_i\| \le \bar{W}_i$, $i = 1, 2$, where $\bar{W}_i$ are positive constants.
The gradients of (39) and (40) with respect to $\zeta$ are
$\nabla V_1(\zeta) = \nabla\phi_1^T(\zeta)W_1 + \nabla\varepsilon_1,$
$\nabla V_2(\zeta) = \nabla\phi_2^T(\zeta)W_2 + \nabla\varepsilon_2,$
where $\nabla\phi_i = \partial\phi_i/\partial\zeta$ and $\nabla\varepsilon_i = \partial\varepsilon_i/\partial\zeta$, $i = 1, 2$. Noticing (30), (31), (41) and (42), the optimal control laws are written as
$u^* = -\frac{1}{2}R_{11}^{-1}G^T(\zeta)[\nabla\phi_1^T(\zeta)W_1 + \nabla\varepsilon_1],$
$v^* = -\frac{1}{2}R_{22}^{-1}K^T(\zeta)[\nabla\phi_2^T(\zeta)W_2 + \nabla\varepsilon_2].$
Then the associated Bellman equations can be derived as
$\Gamma_1(\zeta) + U_1(\zeta, u, v) + W_1^T\nabla\phi_1(\zeta)[F(\zeta) + G(\zeta)u(\zeta) + K(\zeta)v(\zeta)] = \varepsilon_{b1},$
$\Gamma_2(\zeta) + U_2(\zeta, u, v) + W_2^T\nabla\phi_2(\zeta)[F(\zeta) + G(\zeta)u(\zeta) + K(\zeta)v(\zeta)] = \varepsilon_{b2},$
where $\varepsilon_{bi} = -(\nabla\varepsilon_i)^T(F + Gu + Kv)$, $i = 1, 2$, are the Bellman equation errors. When the number of critic NN hidden neurons $K \to \infty$, they converge to zero [36]. However, when $K$ is a fixed constant, they are bounded by constants such that $\|\varepsilon_{bi}\| \le \lambda_{\varepsilon bi}$, $i = 1, 2$.
Based on (32), (33), (43) and (44), one obtains
$H_1 = \zeta^T\bar{Q}_1\zeta + \lambda_{f1}^2(\zeta) + W_1^T\nabla\phi_1(\zeta)F(\zeta) + \frac{1}{4}W_1^T\nabla\phi_1(\zeta)A_1(\zeta)\nabla\phi_1^T(\zeta)W_1 - \frac{1}{4}W_1^T\nabla\phi_1(\zeta)B_1(\zeta)\nabla\phi_1^T(\zeta)W_1 + \frac{1}{4}W_2^T\nabla\phi_2(\zeta)C_1(\zeta)\nabla\phi_2^T(\zeta)W_2 - \frac{1}{2}W_1^T\nabla\phi_1(\zeta)D_1(\zeta)\nabla\phi_2^T(\zeta)W_2 = \varepsilon_{HJ1},$
$H_2 = \zeta^T\bar{Q}_2\zeta + \lambda_{f2}^2(\zeta) + W_2^T\nabla\phi_2(\zeta)F(\zeta) + \frac{1}{4}W_2^T\nabla\phi_2(\zeta)A_2(\zeta)\nabla\phi_2^T(\zeta)W_2 - \frac{1}{4}W_2^T\nabla\phi_2(\zeta)B_2(\zeta)\nabla\phi_2^T(\zeta)W_2 + \frac{1}{4}W_1^T\nabla\phi_1(\zeta)C_2(\zeta)\nabla\phi_1^T(\zeta)W_1 - \frac{1}{2}W_2^T\nabla\phi_2(\zeta)D_2(\zeta)\nabla\phi_1^T(\zeta)W_1 = \varepsilon_{HJ2}.$
Here, $\varepsilon_{HJ1}$ and $\varepsilon_{HJ2}$ are the coupled HJB equation approximation errors shown in [36]. Without loss of generality, as the number of critic NN hidden neurons $K \to \infty$, they converge to zero. However, when $K$ is a fixed constant, they are bounded by positive constants such that $\|\varepsilon_{HJi}\| \le \lambda_{\varepsilon HJi}$, $i = 1, 2$.
Since the ideal weights $W_1$ and $W_2$ are unknown, they are estimated as $\hat{W}_1$ and $\hat{W}_2$, and the weight estimation errors are defined as $\tilde{W}_i = W_i - \hat{W}_i$, $i = 1, 2$. The estimated value functions are given by
$\hat{V}_1(\zeta) = \hat{W}_1^T\phi_1(\zeta),$
$\hat{V}_2(\zeta) = \hat{W}_2^T\phi_2(\zeta).$
Meanwhile, the approximate optimal control policies are presented as
$\hat{u}^* = -\frac{1}{2}R_{11}^{-1}G^T(\zeta)\nabla\phi_1^T(\zeta)\hat{W}_1,$
$\hat{v}^* = -\frac{1}{2}R_{22}^{-1}K^T(\zeta)\nabla\phi_2^T(\zeta)\hat{W}_2.$
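Given a weight estimate and the basis gradient, (51) and (52) are one line each. A minimal sketch (the helper names are ours; `grad_phi` is the illustrative basis Jacobian above):

```python
import numpy as np

def policy_u(zeta, W1_hat, G, grad_phi, R11_inv):
    """u_hat = -1/2 R11^{-1} G^T(zeta) grad_phi^T(zeta) W1_hat, cf. (51)."""
    return -0.5 * R11_inv @ G(zeta).T @ grad_phi(zeta).T @ W1_hat

def policy_v(zeta, W2_hat, K, grad_phi, R22_inv):
    """v_hat = -1/2 R22^{-1} K^T(zeta) grad_phi^T(zeta) W2_hat, cf. (52)."""
    return -0.5 * R22_inv @ K(zeta).T @ grad_phi(zeta).T @ W2_hat
```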
Based on (32), (33), (51) and (52), the approximate Hamiltonian functions are
$\hat{H}_1 = \zeta^T\bar{Q}_1\zeta + \lambda_{f1}^2(\zeta) + \hat{W}_1^T\nabla\phi_1(\zeta)F(\zeta) + \frac{1}{4}\hat{W}_1^T\nabla\phi_1(\zeta)A_1(\zeta)\nabla\phi_1^T(\zeta)\hat{W}_1 - \frac{1}{4}\hat{W}_1^T\nabla\phi_1(\zeta)B_1(\zeta)\nabla\phi_1^T(\zeta)\hat{W}_1 + \frac{1}{4}\hat{W}_2^T\nabla\phi_2(\zeta)C_1(\zeta)\nabla\phi_2^T(\zeta)\hat{W}_2 - \frac{1}{2}\hat{W}_1^T\nabla\phi_1(\zeta)D_1(\zeta)\nabla\phi_2^T(\zeta)\hat{W}_2 = e_1,$
$\hat{H}_2 = \zeta^T\bar{Q}_2\zeta + \lambda_{f2}^2(\zeta) + \hat{W}_2^T\nabla\phi_2(\zeta)F(\zeta) + \frac{1}{4}\hat{W}_2^T\nabla\phi_2(\zeta)A_2(\zeta)\nabla\phi_2^T(\zeta)\hat{W}_2 - \frac{1}{4}\hat{W}_2^T\nabla\phi_2(\zeta)B_2(\zeta)\nabla\phi_2^T(\zeta)\hat{W}_2 + \frac{1}{4}\hat{W}_1^T\nabla\phi_1(\zeta)C_2(\zeta)\nabla\phi_1^T(\zeta)\hat{W}_1 - \frac{1}{2}\hat{W}_2^T\nabla\phi_2(\zeta)D_2(\zeta)\nabla\phi_1^T(\zeta)\hat{W}_1 = e_2,$
where $e_1$ and $e_2$ are the residual errors. The next task is to train the neural networks, i.e., to design $\hat{W}_1$ and $\hat{W}_2$ to minimize the target function $E = \frac{1}{2}e_1^T e_1 + \frac{1}{2}e_2^T e_2$, so that $\hat{W}_1$ and $\hat{W}_2$ converge to $W_1$ and $W_2$.
To overcome the difficulty of finding the initial admissible controllers, the following assumption is given. Furthermore, an additional term is developed to strengthen the learning process of the critic NN.
Assumption 4.
Given the cost functions (16) and (17), for the nominal augmented system (15) under the optimal control policies of the two players, we define a continuously differentiable Lyapunov function candidate $J_s(\zeta)$ satisfying
$\dot{J}_s(\zeta) = (\nabla J_s(\zeta))^T[F(\zeta) + G(\zeta)u^*(\zeta) + K(\zeta)v^*(\zeta)] < 0,$
where $\nabla J_s(\zeta) = \partial J_s(\zeta)/\partial\zeta$. Suppose there exists a positive definite matrix $\Xi(\zeta)$ such that
$(\nabla J_s(\zeta))^T[F(\zeta) + G(\zeta)u^*(\zeta) + K(\zeta)v^*(\zeta)] = -(\nabla J_s(\zeta))^T\Xi(\zeta)\nabla J_s(\zeta)$
holds [5].
Remark 1.
We assume that $\|F(\zeta) + G(\zeta)u^*(\zeta) + K(\zeta)v^*(\zeta)\| \le \theta\|\nabla J_s(\zeta)\|$, where $\theta$ is a positive constant [5]. Hence, we have $\|(\nabla J_s(\zeta))^T[F(\zeta) + G(\zeta)u^*(\zeta) + K(\zeta)v^*(\zeta)]\| \le \theta\|\nabla J_s(\zeta)\|^2$. Let the minimum and maximum eigenvalues of the matrix $\Xi(\zeta)$ be $\lambda_m$ and $\lambda_M$; then we obtain
$\lambda_m\|\nabla J_s(\zeta)\|^2 \le (\nabla J_s(\zeta))^T\Xi(\zeta)\nabla J_s(\zeta) \le \lambda_M\|\nabla J_s(\zeta)\|^2.$
Here, $J_s(\zeta)$ can be selected as $J_s(\zeta) = 0.5\zeta^T\zeta$.
Now, based on the normalized gradient descent algorithm, the weights of the critic NN for each player are tuned with two additional terms, that is
$\dot{\hat{W}}_1 = -\frac{a\sigma_{11}}{(1+\sigma_{11}^T\sigma_{11})^2}\Big[\sigma_{11}^T\hat{W}_1 + \lambda_{f1}^2(\zeta) + U_1(\zeta,\hat{u},\hat{v}) - \frac{1}{4}\hat{W}_1^T\nabla\phi_1(\zeta)A_1(\zeta)\nabla\phi_1^T(\zeta)\hat{W}_1\Big] + \frac{b}{2}\Pi(\zeta,\hat{u},\hat{v})\nabla\phi_1(\zeta)B_1(\zeta)\nabla J_s(\zeta) + \frac{a\sigma_{11}}{4(1+\sigma_{11}^T\sigma_{11})^2}\Big[\hat{W}_1^T\nabla\phi_1(\zeta)A_1(\zeta)\nabla\phi_1^T(\zeta)\hat{W}_1 - \hat{W}_1^T\nabla\phi_1(\zeta)B_1(\zeta)\nabla\phi_1^T(\zeta)\hat{W}_1 - \hat{W}_2^T\nabla\phi_2(\zeta)C_1(\zeta)\nabla\phi_2^T(\zeta)\hat{W}_2\Big],$
$\dot{\hat{W}}_2 = -\frac{a\sigma_{22}}{(1+\sigma_{22}^T\sigma_{22})^2}\Big[\sigma_{22}^T\hat{W}_2 + \lambda_{f2}^2(\zeta) + U_2(\zeta,\hat{u},\hat{v}) - \frac{1}{4}\hat{W}_2^T\nabla\phi_2(\zeta)A_2(\zeta)\nabla\phi_2^T(\zeta)\hat{W}_2\Big] + \frac{b}{2}\Pi(\zeta,\hat{u},\hat{v})\nabla\phi_2(\zeta)B_2(\zeta)\nabla J_s(\zeta) + \frac{a\sigma_{22}}{4(1+\sigma_{22}^T\sigma_{22})^2}\Big[\hat{W}_2^T\nabla\phi_2(\zeta)A_2(\zeta)\nabla\phi_2^T(\zeta)\hat{W}_2 - \hat{W}_2^T\nabla\phi_2(\zeta)B_2(\zeta)\nabla\phi_2^T(\zeta)\hat{W}_2 - \hat{W}_1^T\nabla\phi_1(\zeta)C_2(\zeta)\nabla\phi_1^T(\zeta)\hat{W}_1\Big],$
where $a > 0$ is the learning rate of the first and third terms of the critic NN update, $b > 0$ is the learning rate of the second term, and $\sigma_{ii} = \nabla\phi_i(\zeta)\big(F(\zeta) + G(\zeta)\hat{u} + K(\zeta)\hat{v}\big) + \frac{1}{2}\nabla\phi_i(\zeta)M_i(\zeta)M_i^T(\zeta)\nabla\phi_i^T(\zeta)\hat{W}_i$, $i = 1, 2$. In addition, $J_s(\zeta)$ is given in Assumption 4. In (58) and (59), $\Pi(\zeta,\hat{u},\hat{v})$ is the additional stabilizing term defined as
$\Pi(\zeta,\hat{u},\hat{v}) = \begin{cases} 0, & \text{if } \dot{J}_s(\zeta) = (\nabla J_s(\zeta))^T[F(\zeta) + G(\zeta)\hat{u}(\zeta) + K(\zeta)\hat{v}(\zeta)] < 0, \\ 1, & \text{otherwise.} \end{cases}$
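To make the role of $\Pi$ concrete, the following simplified sketch (ours, not the authors' code) shows the switching logic and the shape of one Euler step of a tuning law like (58); the full third term of (58) is omitted here for brevity:

```python
import numpy as np

def stabilizing_switch(zeta, u_hat, v_hat, F, G, K, grad_Js):
    """Pi(zeta, u_hat, v_hat) of (60): 0 while the Lyapunov derivative
    J_s_dot is negative, 1 otherwise (activating the stabilizing term)."""
    zeta_dot = F(zeta) + G(zeta) @ u_hat + K(zeta) @ v_hat
    return 0.0 if grad_Js(zeta) @ zeta_dot < 0.0 else 1.0

def critic_step(W_hat, sigma, e_hat, stab_term, a, b, Pi, dt):
    """One Euler step of a normalized-gradient critic update in the spirit
    of (58): the first term descends the Hamiltonian residual e_hat, the
    second adds the Pi-gated stabilizing correction."""
    ms = 1.0 + sigma @ sigma                 # normalization (1 + sigma^T sigma)
    dW = -a * sigma / ms**2 * e_hat + 0.5 * b * Pi * stab_term
    return W_hat + dt * dW
```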
Remark 2.
The introduced second term guarantees that the system remains stable during the weight update process. When the system is stable, this term is 0; when the system tends to be unstable, the term is activated to reinforce system stability by strengthening the training process. On account of
$\frac{\partial\dot{J}_s(\zeta)}{\partial\hat{W}_1} = \frac{\partial\hat{u}}{\partial\hat{W}_1}\frac{\partial\dot{J}_s(\zeta)}{\partial\hat{u}} = -\frac{1}{2}\nabla\phi_1(\zeta)B_1(\zeta)\nabla J_s(\zeta)$
and
$\frac{\partial\dot{J}_s(\zeta)}{\partial\hat{W}_2} = \frac{\partial\hat{v}}{\partial\hat{W}_2}\frac{\partial\dot{J}_s(\zeta)}{\partial\hat{v}} = -\frac{1}{2}\nabla\phi_2(\zeta)B_2(\zeta)\nabla J_s(\zeta),$
the additional stabilizing term makes the weights update in the direction opposite to the growth of $\dot{J}_s(\zeta)$. If $\dot{J}_s(\zeta) \ge 0$, the reinforced training process can reduce it to a negative value. On the other hand, when probing noise is needed to satisfy the persistent excitation (PE) condition, the additional stabilizing term keeps the system in a closed-loop stable state, so the system no longer needs an initial stability control policy. The third terms given in (58) and (59) are introduced for the subsequent stability analysis.

3.2. Stability Analysis

In this section, we give several theorems and then add some assumptions to prove the stability of the closed-loop nominal augmented system and analyze the tracking performance.
Assumption 5.
Assume that the matrices associated with each player's control input have upper bounds, i.e., $\|R_{11}\| \le R_{11M}$, $\|R_{12}\| \le R_{12M}$, $\|R_{21}\| \le R_{21M}$ and $\|R_{22}\| \le R_{22M}$. The eight non-negative matrices $A_i(\zeta)$, $B_i(\zeta)$, $C_i(\zeta)$ and $D_i(\zeta)$, $i = 1, 2$, are bounded, i.e., $\|A_i(\zeta)\| \le \lambda_{Ai}$, $\|B_i(\zeta)\| \le \lambda_{Bi}$, $\|C_i(\zeta)\| \le \lambda_{Ci}$ and $\|D_i(\zeta)\| \le \lambda_{Di}$, $i = 1, 2$, where $\lambda_{Ai}$, $\lambda_{Bi}$, $\lambda_{Ci}$ and $\lambda_{Di}$ are positive constants. Moreover, $\|B_1(\zeta)\nabla\varepsilon_1(\zeta)\| \le \lambda_2$ and $\|B_2(\zeta)\nabla\varepsilon_2(\zeta)\| \le \lambda_3$, where $\lambda_2$, $\lambda_3$, $R_{11M}$, $R_{12M}$, $R_{21M}$ and $R_{22M}$ are positive constants.
Theorem 1.
For the nominal augmented system (15), a pair of feedback control laws $\{\hat{u}^*, \hat{v}^*\}$ is derived by (51) and (52), and the weight vectors of the critic NNs are tuned by (58) and (59), respectively. Then, the closed-loop system state and the critic NN weight estimation errors are both uniformly ultimately bounded (UUB).
Proof. 
See the Appendix A. □
According to Theorem 1, it is easy to conclude that the feedback control laws converge.
Corollary 1.
The control policies converge to the approximate Nash equilibrium solution of the NZS game.
Proof of Corollary 1.
Based on (43), (44), (51) and (52), we have
$u^* - \hat{u}^* = -\frac{1}{2}R_{11}^{-1}G^T(\zeta)\nabla\phi_1^T(\zeta)\tilde{W}_1 - \frac{1}{2}R_{11}^{-1}G^T(\zeta)\nabla\varepsilon_1(\zeta),$
$v^* - \hat{v}^* = -\frac{1}{2}R_{22}^{-1}K^T(\zeta)\nabla\phi_2^T(\zeta)\tilde{W}_2 - \frac{1}{2}R_{22}^{-1}K^T(\zeta)\nabla\varepsilon_2(\zeta).$
According to Theorem 1, Assumption 1 and Assumption 3, we conclude that $\tilde{W}_i$, $i = 1, 2$, and hence the terms $R_{11}^{-1}G^T(\zeta)\nabla\phi_1^T(\zeta)\tilde{W}_1$, $R_{22}^{-1}K^T(\zeta)\nabla\phi_2^T(\zeta)\tilde{W}_2$, $R_{11}^{-1}G^T(\zeta)\nabla\varepsilon_1(\zeta)$ and $R_{22}^{-1}K^T(\zeta)\nabla\varepsilon_2(\zeta)$, are bounded. Furthermore, we have
$\|u^* - \hat{u}^*\| \le \frac{1}{2}R_{11M}^{-1}\lambda_g\lambda_{d\phi 1}M + \frac{1}{2}R_{11M}^{-1}\lambda_g\lambda_{d\varepsilon 1} \triangleq \lambda_u,$
$\|v^* - \hat{v}^*\| \le \frac{1}{2}R_{22M}^{-1}\lambda_k\lambda_{d\phi 2}M + \frac{1}{2}R_{22M}^{-1}\lambda_k\lambda_{d\varepsilon 2} \triangleq \lambda_v,$
where $M$ is the ultimate bound on $\|\tilde{W}_i\|$ from Theorem 1, and $\lambda_u$ and $\lambda_v$ are the resulting finite bounds. Therefore, $u^* - \hat{u}^*$ and $v^* - \hat{v}^*$ are UUB. This completes the proof. □
In addition to the convergence of the system states, the tracking performance of the system is also an important indicator. Therefore, we put forward Theorem 2 to show that system (1) can track the reference trajectory (2) well, and the proof is given.
Theorem 2.
Given the cost functions (16) and (17), for the nominal augmented system (15), the approximate optimal control laws obtained by (51) and (52) ensure that the tracking error dynamics are UUB.
Proof. 
See the Appendix A. □
Remark 3.
In this section, we give an optimal robust tracking control scheme for the NZS game, which can be extended to the N-player NZS game system in theory.

4. Simulation

4.1. Two-Player Linear Non-Zero-Sum Game

Consider a continuous-time uncertain linear system:
$\dot{x} = \begin{bmatrix} x_2 \\ -3x_1 - 0.5x_2 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix}u + \begin{bmatrix} 0 \\ 2 \end{bmatrix}v + \begin{bmatrix} \eta_1 x_2\cos x_1 \\ \eta_2 x_1\sin x_2 \end{bmatrix},$
where $x = [x_1, x_2]^T \in \mathbb{R}^2$ is the state variable, $u \in \mathbb{R}$ and $v \in \mathbb{R}$ are the control inputs, and the uncertain parameters satisfy $\eta_1, \eta_2 \in [-1, 1]$. The last term of system (67) is the uncertain term, which is bounded by the known function $\lambda_f(\zeta)$ with $\lambda_f^2(\zeta) = x_1^2 + x_2^2$; then we have $\lambda_{f1}^2(\zeta) = x_2^2$ and $\lambda_{f2}^2(\zeta) = x_1^2$. Let the initial system state vector be $x_0 = [1, 1]^T$.
Here, the reference trajectory s ( t ) is generated by the following system:
$\dot{s} = \begin{bmatrix} -0.5s_1 - s_2\cos(s_1) \\ 3\sin(s_1) - s_2 \end{bmatrix},$
where $s = [s_1, s_2]^T \in \mathbb{R}^2$ is the reference state. One lets the initial reference state vector be $s_0 = [-0.5, 0.5]^T$.
Defining the tracking error as $e_r = x - s$ so that $\dot{e}_r = \dot{x} - \dot{s}$, let the augmented state vector be $\zeta = [e_r^T, s^T]^T$. Then, we have the augmented system dynamics as follows:
$\dot{\zeta} = \begin{bmatrix} \zeta_2 + \zeta_4 + 0.5\zeta_3 + \zeta_4\cos(\zeta_3) \\ -3(\zeta_1+\zeta_3) - 0.5(\zeta_2+\zeta_4) - 3\sin(\zeta_3) + \zeta_4 \\ -0.5\zeta_3 - \zeta_4\cos(\zeta_3) \\ 3\sin(\zeta_3) - \zeta_4 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}u + \begin{bmatrix} 0 \\ 2 \\ 0 \\ 0 \end{bmatrix}v + \Delta F(\zeta),$
where $\zeta = [\zeta_1, \zeta_2, \zeta_3, \zeta_4]^T \in \mathbb{R}^4$ with $\zeta_1 = e_{r1}$, $\zeta_2 = e_{r2}$, $\zeta_3 = s_1$, $\zeta_4 = s_2$, and $\Delta F(\zeta)$ is the uncertain term of the augmented system. Here, we choose $M_1(\zeta) = [1, 0, 0, 0]^T$ and $M_2(\zeta) = [0, 1, 0, 0]^T$. Meanwhile, the bounds for the decomposed uncertain terms are $\lambda_{f1}^2(\zeta) = (\zeta_2+\zeta_4)^2$ and $\lambda_{f2}^2(\zeta) = (\zeta_1+\zeta_3)^2$, respectively. Therefore, the initial state of the augmented system is $\zeta_0 = [1.5, 0.5, -0.5, 0.5]^T$ with the initial tracking error vector $e_{r0} = x_0 - s_0 = [1.5, 0.5]^T$.
Select $\bar{Q}_1 = \mathrm{diag}\{2I_2, 0_{2\times 2}\}$, $\bar{Q}_2 = \mathrm{diag}\{I_2, 0_{2\times 2}\}$, $R_{11} = R_{21} = 1$, $R_{12} = R_{22} = 0.5$, $\eta_1 = 1$ and $\eta_2 = 1$. The critic NN activation functions are chosen as $\phi_1(\zeta) = \phi_2(\zeta) = [\zeta_1^2, \zeta_1\zeta_2, \zeta_1\zeta_3, \zeta_1\zeta_4, \zeta_2^2, \zeta_2\zeta_3, \zeta_2\zeta_4, \zeta_3^2, \zeta_3\zeta_4, \zeta_4^2]^T$. Let the learning rates be $a = 2$ and $b = 0.5$. Moreover, a probing noise is introduced to satisfy the persistent excitation (PE) condition. The state trajectories and reference trajectories are displayed in Figure 1 and Figure 2. After the learning process, Figure 3 and Figure 4 show that the weights of critic NN1 and NN2 converged to $[0.2521, 0.0627, 0.0501, 0.0213, 0.0487, 0.0373, 0.0134, 0.0188, 0.0171, 0.0273]^T$ and $[0.1934, 0.0558, 0.0248, 0.2574, 0.1487, 0.0026, 0.1406, 0.0039, 0.0134, 0.0928]^T$, respectively. Since the initial weights were all set to zero, we can conclude that the system did not require initial stable control policies. The control trajectories for each player are shown in Figure 5. Figure 6 demonstrates that the tracking errors converged to 0, which indicates that system (67) can track the reference trajectory (68) well. To verify the robustness of the method, one can choose $\eta_1 = 0.5$ and $\eta_2 = 0.5$ and repeat the simulation. The tracking error and control input are depicted in Figure 7 and Figure 8, which again demonstrate the desired trajectory tracking performance.
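For readers who want to reproduce the flavor of this experiment, the sketch below rolls out the uncertain plant (67) against the generator (68) under fixed, already-trained critic weights, using the illustrative `policy_u`/`policy_v` and `grad_phi` helpers from Section 3. It is our reconstruction, not the authors' code, and the dynamics follow the sign conventions recovered above:

```python
import numpy as np

def simulate(W1, W2, dt=0.01, T=20.0, eta1=1.0, eta2=1.0):
    """Euler rollout of plant (67) tracking generator (68) under the
    approximate policies (51)-(52); returns the tracking-error history."""
    x, s = np.array([1.0, 1.0]), np.array([-0.5, 0.5])        # x_0 and s_0
    R11_inv, R22_inv = np.array([[1.0]]), np.array([[2.0]])   # R11 = 1, R22 = 0.5
    G = lambda z: np.array([[0.0], [1.0], [0.0], [0.0]])      # augmented G
    K = lambda z: np.array([[0.0], [2.0], [0.0], [0.0]])      # augmented K
    errors = []
    for _ in range(int(T / dt)):
        zeta = np.concatenate([x - s, s])
        u = policy_u(zeta, W1, G, grad_phi, R11_inv)
        v = policy_v(zeta, W2, K, grad_phi, R22_inv)
        dx = (np.array([x[1], -3.0 * x[0] - 0.5 * x[1]])      # nominal f(x)
              + np.array([0.0, 1.0]) * u + np.array([0.0, 2.0]) * v
              + np.array([eta1 * x[1] * np.cos(x[0]),         # perturbation
                          eta2 * x[0] * np.sin(x[1])]))
        ds = np.array([-0.5 * s[0] - s[1] * np.cos(s[0]),     # generator (68)
                       3.0 * np.sin(s[0]) - s[1]])
        x, s = x + dt * dx, s + dt * ds
        errors.append(x - s)
    return np.array(errors)
```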

4.2. Two-Player Nonlinear Non-Zero-Sum Game

Consider a continuous-time uncertain nonlinear system:
$\dot{x} = \begin{bmatrix} x_2 \\ -x_2 - 0.5x_1 + 0.25x_2(\cos(2x_1)+2)^2 + 0.25x_2(\sin(4x_1^2)+2)^2 \end{bmatrix} + \begin{bmatrix} 0 \\ \cos(2x_1)+2 \end{bmatrix}u + \begin{bmatrix} 0 \\ \sin(4x_1^2)+2 \end{bmatrix}v + \begin{bmatrix} \eta_1 x_2\cos x_1\sin x_2 \\ \eta_2 x_1\sin x_2^2 \end{bmatrix}.$
In this example, the reference signal $s(t)$ is generated by
$\dot{s} = \begin{bmatrix} -s_1 + \sin(s_2) \\ -2\sin^3(s_1) - 0.5s_2 \end{bmatrix}.$
The critic NN activation functions and the learning rates $a$ and $b$ are the same as in the first example. Similarly, the augmented system dynamics are as follows:
$\dot{\zeta} = \begin{bmatrix} \zeta_2 + \zeta_4 + \zeta_3 - \sin(\zeta_4) \\ -(\zeta_2+\zeta_4) - 0.5(\zeta_1+\zeta_3) + 0.25(\zeta_2+\zeta_4)(\cos(2(\zeta_1+\zeta_3))+2)^2 + 0.25(\zeta_2+\zeta_4)(\sin(4(\zeta_1+\zeta_3)^2)+2)^2 + 2\sin^3(\zeta_3) + 0.5\zeta_4 \\ -\zeta_3 + \sin(\zeta_4) \\ -2\sin^3(\zeta_3) - 0.5\zeta_4 \end{bmatrix} + \begin{bmatrix} 0 \\ \cos(2(\zeta_1+\zeta_3))+2 \\ 0 \\ 0 \end{bmatrix}u + \begin{bmatrix} 0 \\ \sin(4(\zeta_1+\zeta_3)^2)+2 \\ 0 \\ 0 \end{bmatrix}v + \Delta F(\zeta).$
Here, we select $M_1(\zeta) = [1, 0, 0, 0]^T$, $M_2(\zeta) = [0, 1, 0, 0]^T$, $\lambda_{f1}^2(\zeta) = (\zeta_2+\zeta_4)^2$ and $\lambda_{f2}^2(\zeta) = (\zeta_1+\zeta_3)^2$. Let the initial system state vector be $x_0 = [0.5, 0.5]^T$ and the initial reference trajectory vector be $s_0 = [-0.5, -0.5]^T$; then the initial state of the augmented system is $\zeta_0 = [1, 1, -0.5, -0.5]^T$.
Select $\bar{Q}_1 = \mathrm{diag}\{5I_2, 0_{2\times 2}\}$, $\bar{Q}_2 = \mathrm{diag}\{2I_2, 0_{2\times 2}\}$, $R_{11} = R_{21} = 2$, $R_{12} = R_{22} = 1$, $\eta_1 = 0.2$ and $\eta_2 = 0.2$. The state trajectories and reference trajectories are displayed in Figure 9 and Figure 10. Figure 11 and Figure 12 show that the weights of critic NN1 and NN2 converged to $[0.4582, 0.2514, 0.2907, 0.2567, 0.1455, 0.1353, 0.1050, 0.1527, 0.1321, 0.1112]^T$ and $[0.2622, 0.0666, 0.0854, 0.0858, 0.0879, 0.0610, 0.0470, 0.0601, 0.0487, 0.0406]^T$, respectively. It can also be seen that initial stability control policies were not required. The control trajectories for each player are shown in Figure 13. The tracking errors are displayed in Figure 14, which indicates that system (70) can track the reference trajectory (71) well. These experimental results verify the effectiveness of the proposed method.

5. Conclusions

In this paper, an ADP-based robust tracking control design was proposed for the NZS game of nonlinear systems with dynamic uncertainties. Firstly, the tracking error and reference trajectory were used to construct the augmented system. The coupled HJB equations were modified by defining appropriate performance indicators. Then, a new adaptive critic design was proposed to solve the coupled HJB equations. A single-network structure was used to approximate the value function and control policy for each player. By a modified critic NN weights’ tuning law, the control policies of the two players converged to the Nash equilibrium of NZS games. What is more, the proof that the system state, tracking error and weight estimation error were UUB was given via the Lyapunov theory. Finally, two simulation results verified the effectiveness of the proposed scheme. We will consider the input constraints and state constraints for this problem in the future.

Author Contributions

C.Q. and Z.S. provided methodology, validation, and writing—original draft preparation; Z.Z. and D.Z. provided conceptualization and writing—review; J.Z. provided supervision; C.Q. provided funding support. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grants U1504615 and 61703141, the Youth Backbone Teachers in Colleges and Universities of Henan Province under Grant 2018GGJS017, and the Science and Technology Research Project of Henan Province under Grant 222102240014.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors can confirm that all relevant data are included in the article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Theorem 1.
We choose the following Lyapunov function candidate:
$L = \frac{1}{2a}\tilde{W}_1^T\tilde{W}_1 + \frac{1}{2a}\tilde{W}_2^T\tilde{W}_2 + \frac{b}{a}J_s(\zeta),$
where $J_s(\zeta)$ is presented in Assumption 4. Let $m_{sii} = 1 + \sigma_{ii}^T\sigma_{ii}$ and $\bar{\sigma}_{ii} = \sigma_{ii}/m_{sii}$, $i = 1, 2$. Combining (47), (48), (51), (52), (58) and (59), we obtain the weight estimation error dynamics as
$\dot{\tilde{W}}_1 = \frac{a\bar{\sigma}_{11}}{m_{s11}}\Big(-\sigma_{11}^T\tilde{W}_1 + \frac{1}{4}W_1^T\nabla\phi_1 A_1(\zeta)\nabla\phi_1^T W_1 - \frac{1}{2}\tilde{W}_1^T\nabla\phi_1 A_1(\zeta)\nabla\phi_1^T W_1 - \frac{1}{4}W_1^T\nabla\phi_1 B_1(\zeta)\nabla\phi_1^T W_1 + \frac{1}{2}\tilde{W}_1^T\nabla\phi_1 B_1(\zeta)\nabla\phi_1^T W_1 - \frac{1}{4}W_2^T\nabla\phi_2 C_1(\zeta)\nabla\phi_2^T W_2 + \frac{1}{2}\tilde{W}_2^T\nabla\phi_2 C_1(\zeta)\nabla\phi_2^T W_2 - \frac{1}{2}\tilde{W}_2^T\nabla\phi_2 C_1(\zeta)\nabla\phi_2^T\tilde{W}_2 + \frac{1}{2}W_1^T\nabla\phi_1 D_1(\zeta)\nabla\phi_2^T\tilde{W}_2 + \varepsilon_{HJ1}\Big) - \frac{b}{2}\Pi\nabla\phi_1 B_1(\zeta)\nabla J_s(\zeta),$
$\dot{\tilde{W}}_2 = \frac{a\bar{\sigma}_{22}}{m_{s22}}\Big(-\sigma_{22}^T\tilde{W}_2 + \frac{1}{4}W_2^T\nabla\phi_2 A_2(\zeta)\nabla\phi_2^T W_2 - \frac{1}{2}\tilde{W}_2^T\nabla\phi_2 A_2(\zeta)\nabla\phi_2^T W_2 - \frac{1}{4}W_2^T\nabla\phi_2 B_2(\zeta)\nabla\phi_2^T W_2 + \frac{1}{2}\tilde{W}_2^T\nabla\phi_2 B_2(\zeta)\nabla\phi_2^T W_2 - \frac{1}{4}W_1^T\nabla\phi_1 C_2(\zeta)\nabla\phi_1^T W_1 + \frac{1}{2}\tilde{W}_1^T\nabla\phi_1 C_2(\zeta)\nabla\phi_1^T W_1 - \frac{1}{2}\tilde{W}_1^T\nabla\phi_1 C_2(\zeta)\nabla\phi_1^T\tilde{W}_1 + \frac{1}{2}W_2^T\nabla\phi_2 D_2(\zeta)\nabla\phi_1^T\tilde{W}_1 + \varepsilon_{HJ2}\Big) - \frac{b}{2}\Pi\nabla\phi_2 B_2(\zeta)\nabla J_s(\zeta).$
Based on (A2) and (A3), the derivative of $L$ can be written as
$\dot{L} = \frac{1}{a}\tilde{W}_1^T\dot{\tilde{W}}_1 + \frac{1}{a}\tilde{W}_2^T\dot{\tilde{W}}_2 + \frac{b}{a}(\nabla J_s(\zeta))^T\dot{\zeta} = -\tilde{W}_1^T\bar{\sigma}_{11}\bar{\sigma}_{11}^T\tilde{W}_1 + \tilde{W}_1^T\bar{\sigma}_{11}\frac{1}{4m_{s11}}W_1^T\nabla\phi_1 A_1(\zeta)\nabla\phi_1^T W_1 - \tilde{W}_1^T\bar{\sigma}_{11}\frac{1}{2m_{s11}}\tilde{W}_1^T\nabla\phi_1 A_1(\zeta)\nabla\phi_1^T W_1 - \tilde{W}_1^T\bar{\sigma}_{11}\frac{1}{4m_{s11}}W_1^T\nabla\phi_1 B_1(\zeta)\nabla\phi_1^T W_1 + \tilde{W}_1^T\bar{\sigma}_{11}\frac{1}{2m_{s11}}\tilde{W}_1^T\nabla\phi_1 B_1(\zeta)\nabla\phi_1^T W_1 - \tilde{W}_1^T\bar{\sigma}_{11}\frac{1}{4m_{s11}}W_2^T\nabla\phi_2 C_1(\zeta)\nabla\phi_2^T W_2 + \tilde{W}_1^T\bar{\sigma}_{11}\frac{1}{2m_{s11}}\tilde{W}_2^T\nabla\phi_2 C_1(\zeta)\nabla\phi_2^T W_2 + \tilde{W}_1^T\bar{\sigma}_{11}\frac{1}{2m_{s11}}W_1^T\nabla\phi_1 D_1(\zeta)\nabla\phi_2^T\tilde{W}_2 - \tilde{W}_2^T\bar{\sigma}_{22}\bar{\sigma}_{22}^T\tilde{W}_2 + \tilde{W}_2^T\bar{\sigma}_{22}\frac{1}{4m_{s22}}W_2^T\nabla\phi_2 A_2(\zeta)\nabla\phi_2^T W_2 - \tilde{W}_2^T\bar{\sigma}_{22}\frac{1}{2m_{s22}}\tilde{W}_2^T\nabla\phi_2 A_2(\zeta)\nabla\phi_2^T W_2 - \tilde{W}_2^T\bar{\sigma}_{22}\frac{1}{4m_{s22}}W_2^T\nabla\phi_2 B_2(\zeta)\nabla\phi_2^T W_2 + \tilde{W}_2^T\bar{\sigma}_{22}\frac{1}{2m_{s22}}\tilde{W}_2^T\nabla\phi_2 B_2(\zeta)\nabla\phi_2^T W_2 - \tilde{W}_2^T\bar{\sigma}_{22}\frac{1}{4m_{s22}}W_1^T\nabla\phi_1 C_2(\zeta)\nabla\phi_1^T W_1 + \tilde{W}_2^T\bar{\sigma}_{22}\frac{1}{2m_{s22}}\tilde{W}_1^T\nabla\phi_1 C_2(\zeta)\nabla\phi_1^T W_1 + \tilde{W}_2^T\bar{\sigma}_{22}\frac{1}{2m_{s22}}W_2^T\nabla\phi_2 D_2(\zeta)\nabla\phi_1^T\tilde{W}_1 + \tilde{W}_1^T\bar{\sigma}_{11}\frac{1}{m_{s11}}\varepsilon_{HJ1} + \tilde{W}_2^T\bar{\sigma}_{22}\frac{1}{m_{s22}}\varepsilon_{HJ2} - \frac{b}{2a}\Pi\tilde{W}_1^T\nabla\phi_1 B_1(\zeta)\nabla J_s(\zeta) - \frac{b}{2a}\Pi\tilde{W}_2^T\nabla\phi_2 B_2(\zeta)\nabla J_s(\zeta) + \frac{b}{a}(\nabla J_s(\zeta))^T\dot{\zeta}.$
Defining $p = [\tilde{W}_1^T\bar{\sigma}_{11}, \tilde{W}_2^T\bar{\sigma}_{22}, \tilde{W}_1^T, \tilde{W}_2^T]^T$, the derivative of $L$ can be rewritten as
$\dot{L} = -p^T\begin{bmatrix} N_{11} & N_{12} & N_{13} & N_{14} \\ N_{21} & N_{22} & N_{23} & N_{24} \\ N_{31} & N_{32} & N_{33} & N_{34} \\ N_{41} & N_{42} & N_{43} & N_{44} \end{bmatrix}p + p^T\psi - \frac{b}{2a}\Pi\tilde{W}_1^T\nabla\phi_1 B_1(\zeta)\nabla J_s(\zeta) - \frac{b}{2a}\Pi\tilde{W}_2^T\nabla\phi_2 B_2(\zeta)\nabla J_s(\zeta) + \frac{b}{a}(\nabla J_s(\zeta))^T\dot{\zeta},$
where
$N_{11} = N_{22} = I,$
$N_{12} = N_{21} = N_{33} = N_{34} = N_{43} = N_{44} = 0,$
$N_{13} = N_{31}^T = \frac{W_1^T}{4m_{s11}}\big(\nabla\phi_1 B_1(\zeta)\nabla\phi_1^T - \nabla\phi_1 A_1(\zeta)\nabla\phi_1^T\big),$
$N_{14} = N_{41}^T = \frac{1}{4m_{s11}}W_2^T\nabla\phi_2 C_1(\zeta)\nabla\phi_2^T - \frac{1}{4m_{s11}}W_2^T\nabla\phi_1 C_1(\zeta)\nabla\phi_2^T + \frac{1}{4m_{s11}}W_1^T\nabla\phi_1 D_1(\zeta)\nabla\phi_2^T,$
$N_{23} = N_{32}^T = \frac{1}{4m_{s22}}W_1^T\nabla\phi_1 C_2(\zeta)\nabla\phi_1^T - \frac{1}{4m_{s22}}W_1^T\nabla\phi_2 C_2(\zeta)\nabla\phi_1^T + \frac{1}{4m_{s22}}W_2^T\nabla\phi_2 D_2(\zeta)\nabla\phi_1^T,$
$N_{24} = N_{42}^T = \frac{W_2^T}{4m_{s22}}\big(\nabla\phi_2 B_2(\zeta)\nabla\phi_2^T - \nabla\phi_2 A_2(\zeta)\nabla\phi_2^T\big),$
and the vector $\psi = [\psi_1, \psi_2, \psi_3, \psi_4]^T$ is given by
$\psi_1 = \frac{1}{4m_{s11}}\big(W_1^T\nabla\phi_1 A_1(\zeta)\nabla\phi_1^T W_1 + W_1^T\nabla\phi_1 B_1(\zeta)\nabla\phi_1^T W_1 + W_2^T\nabla\phi_2 C_1(\zeta)\nabla\phi_2^T W_2\big) + \frac{1}{m_{s11}}\varepsilon_{HJ1},$
$\psi_2 = \frac{1}{4m_{s22}}\big(W_2^T\nabla\phi_2 A_2(\zeta)\nabla\phi_2^T W_2 + W_2^T\nabla\phi_2 B_2(\zeta)\nabla\phi_2^T W_2 + W_1^T\nabla\phi_1 C_2(\zeta)\nabla\phi_1^T W_1\big) + \frac{1}{m_{s22}}\varepsilon_{HJ2},$
$\psi_3 = \psi_4 = 0.$
According to Assumption 3 and the fact that $\bar{\sigma}_{ii}$, $i = 1, 2$, are bounded, we derive that $\psi$ is bounded. Selecting appropriate parameters such that $N > 0$, letting $\lambda_{\min}(N)$ denote the minimum eigenvalue of $N$, and letting $\psi$ be bounded by $\psi_M$, we can conclude that
$\dot{L} \le -\lambda_{\min}(N)\|p\|^2 + \psi_M\|p\| - \frac{b}{2a}\Pi\tilde{W}_1^T\nabla\phi_1 B_1(\zeta)\nabla J_s(\zeta) - \frac{b}{2a}\Pi\tilde{W}_2^T\nabla\phi_2 B_2(\zeta)\nabla J_s(\zeta) + \frac{b}{a}(\nabla J_s(\zeta))^T\dot{\zeta}.$
In the following, the cases of Π = 0 and Π = 1 will be considered.
Case 1.
$\Pi = 0$. Since $\dot{J}_s(\zeta) = (\nabla J_s(\zeta))^T\dot{\zeta} < 0$, we have $-(\nabla J_s(\zeta))^T\dot{\zeta} > 0$. According to the density property of real numbers, there exists a positive constant $\lambda_1$ such that $0 < \lambda_1\|\nabla J_s(\zeta)\| \le -(\nabla J_s(\zeta))^T\dot{\zeta}$ holds for all $\zeta \in \Omega$, i.e., $(\nabla J_s(\zeta))^T\dot{\zeta} \le -\lambda_1\|\nabla J_s(\zeta)\|$. Hence, the inequality (A6) becomes
$\dot{L} \le -\lambda_{\min}(N)\|p\|^2 + \psi_M\|p\| - \frac{b}{a}\lambda_1\|\nabla J_s(\zeta)\|.$
Therefore, given that the following inequalities
$\|p\| \ge \frac{\psi_M}{\lambda_{\min}(N)} \triangleq M_1$
or
$\|\nabla J_s(\zeta)\| \ge \frac{a\psi_M^2}{4b\lambda_{\min}(N)\lambda_1} \triangleq N_1$
hold, we conclude that $\dot{L} < 0$.
Case 2.
$\Pi = 1$. Adding and subtracting $b(\nabla J_s(\zeta))^T B_1(\zeta)\nabla\varepsilon_1(\zeta)/(2a)$ and $b(\nabla J_s(\zeta))^T B_2(\zeta)\nabla\varepsilon_2(\zeta)/(2a)$ on the right-hand side of (A6), and taking Assumption 1 and Assumption 4 into consideration, we can conclude that
$\dot{L} \le -\lambda_{\min}(N)\|p\|^2 + \psi_M\|p\| - \frac{b}{2a}\Pi\tilde{W}_1^T\nabla\phi_1 B_1(\zeta)\nabla J_s(\zeta) - \frac{b}{2a}\Pi\tilde{W}_2^T\nabla\phi_2 B_2(\zeta)\nabla J_s(\zeta) + \frac{b}{a}(\nabla J_s(\zeta))^T\dot{\zeta} = -\lambda_{\min}(N)\|p\|^2 + \psi_M\|p\| + \frac{b}{a}(\nabla J_s(\zeta))^T\big(F(\zeta) + G(\zeta)u^* + K(\zeta)v^*\big) + \frac{b}{2a}(\nabla J_s(\zeta))^T B_1(\zeta)\nabla\varepsilon_1(\zeta) + \frac{b}{2a}(\nabla J_s(\zeta))^T B_2(\zeta)\nabla\varepsilon_2(\zeta) \le -\lambda_{\min}(N)\|p\|^2 + \psi_M\|p\| - \frac{b}{a}\lambda_m\|\nabla J_s(\zeta)\|^2 + \frac{b}{2a}(\lambda_2 + \lambda_3)\|\nabla J_s(\zeta)\|.$
Therefore, given that the following inequalities
$\|p\| \ge \sqrt{\frac{\psi_M^2}{4\lambda_{\min}^2(N)} + \frac{b(\lambda_2+\lambda_3)^2}{16a\lambda_{\min}(N)\lambda_m}} + \frac{\psi_M}{2\lambda_{\min}(N)} \triangleq M_2$
or
$\|\nabla J_s(\zeta)\| \ge \sqrt{\frac{a\psi_M^2}{4b\lambda_{\min}(N)\lambda_m} + \frac{(\lambda_2+\lambda_3)^2}{16\lambda_m^2}} + \frac{\lambda_2+\lambda_3}{4\lambda_m} \triangleq N_2$
hold, we conclude that $\dot{L} < 0$.
To summarize, if the inequality $\|p\| > \max(M_1, M_2) \triangleq M$ or $\|\nabla J_s(\zeta)\| > \max(N_1, N_2)$ holds, then $\dot{L} < 0$, and we have that the system state and the weight estimation errors are UUB. This completes the proof. □
Proof of Theorem 2.
We choose the following Lyapunov function candidate:
$L_1 = V_1 + V_2.$
Differentiating $L_1$ along the trajectories of $\zeta$, we have
$\dot{V}_1 = -\zeta^T\bar{Q}_1\zeta - \Big[\lambda_{f1}^2(\zeta) + \frac{1}{4}W_1^T\nabla\phi_1 A_1\nabla\phi_1^T W_1 - \Delta F(\zeta)^T\Delta F(\zeta)\Big] - \frac{1}{4}W_1^T\nabla\phi_1 B_1\nabla\phi_1^T W_1 + \frac{1}{2}W_1^T\nabla\phi_1 B_1\nabla\phi_1^T\tilde{W}_1 - \frac{1}{4}W_2^T\nabla\phi_2 C_1\nabla\phi_2^T W_2 + \frac{1}{2}W_1^T\nabla\phi_1 D_1\nabla\phi_2^T\tilde{W}_2 - \Big[\frac{1}{2}\nabla\phi_1^T W_1 - \Delta F(\zeta)\Big]^T\Big[\frac{1}{2}\nabla\phi_1^T W_1 - \Delta F(\zeta)\Big] + \varepsilon_{HJ1} + \varepsilon_{b1} + \varepsilon_{F1},$
where $\varepsilon_{F1} = \nabla\varepsilon_1^T\Delta F(\zeta)$; since $\nabla\varepsilon_1$ and $\Delta F(\zeta)$ are bounded, let $\|\varepsilon_{F1}\| \le \lambda_{\varepsilon F1}$. $\dot{V}_2$ is obtained similarly to $\dot{V}_1$, and it is not hard to see that
$\dot{L}_1 = \dot{V}_1 + \dot{V}_2 \le -\zeta^T(\bar{Q}_1 + \bar{Q}_2)\zeta - q^T Y q + \lambda_4 \le -\lambda_{\min}(\bar{Q}_1 + \bar{Q}_2)\|\zeta\|^2 - \lambda_{\min}(Y)\|q\|^2 + \lambda_4,$
where $\varepsilon_{HJ1} + \varepsilon_{b1} + \varepsilon_{HJ2} + \varepsilon_{b2} + \varepsilon_{F1} + \varepsilon_{F2} \le \lambda_{\varepsilon HJ1} + \lambda_{\varepsilon HJ2} + \lambda_{\varepsilon b1} + \lambda_{\varepsilon b2} + \lambda_{\varepsilon F1} + \lambda_{\varepsilon F2} = \lambda_4$, $q = [W_1^T, W_2^T, \tilde{W}_1^T, \tilde{W}_2^T]^T$, and $\lambda_{\min}(\bar{Q}_1 + \bar{Q}_2)$ and $\lambda_{\min}(Y)$ are the minimum eigenvalues of $\bar{Q}_1 + \bar{Q}_2$ and $Y$, respectively. In the above formula,
$Y = \begin{bmatrix} Y_{11} & Y_{12} & Y_{13} & Y_{14} \\ Y_{21} & Y_{22} & Y_{23} & Y_{24} \\ Y_{31} & Y_{32} & Y_{33} & Y_{34} \\ Y_{41} & Y_{42} & Y_{43} & Y_{44} \end{bmatrix}$
and
$Y_{11} = \frac{1}{4}\nabla\phi_1 B_1\nabla\phi_1^T + \frac{1}{4}\nabla\phi_1 C_2\nabla\phi_1^T,$
$Y_{22} = \frac{1}{4}\nabla\phi_2 B_2\nabla\phi_2^T + \frac{1}{4}\nabla\phi_2 C_1\nabla\phi_2^T,$
$Y_{13} = Y_{31}^T = -\frac{1}{4}\nabla\phi_1 B_1\nabla\phi_1^T,$
$Y_{14} = Y_{41}^T = -\frac{1}{4}\nabla\phi_1 D_1\nabla\phi_2^T,$
$Y_{23} = Y_{32}^T = -\frac{1}{4}\nabla\phi_2 D_2\nabla\phi_1^T,$
$Y_{24} = Y_{42}^T = -\frac{1}{4}\nabla\phi_2 B_2\nabla\phi_2^T.$
Therefore, if the following inequalities
$\|\zeta\| \ge \sqrt{\frac{\lambda_4}{\lambda_{\min}(\bar{Q}_1 + \bar{Q}_2)}} \triangleq C_1$
or
$\|q\| \ge \sqrt{\frac{\lambda_4}{\lambda_{\min}(Y)}} \triangleq C_2$
hold, we obtain $\dot{L}_1 < 0$.
To summarize, if the inequality $\|\zeta\| > C_1$ or $\|q\| > C_2$ holds, then $\dot{L}_1 < 0$, and we have that the tracking errors of the closed-loop uncertain augmented system are UUB. This completes the proof. □

References

  1. Namadchian, Z.; Zare, A. Stability analysis of dynamic nonlinear interval type-2 TSK fuzzy control systems based on describing function. Soft Comput. 2020, 24, 14623–14636. [Google Scholar] [CrossRef]
  2. Tavoosi, J.; Suratgar, A.A.; Menhaj, M.B.; Mosavi, A.; Mohammadzadeh, A.; Ranjbar, E. Modeling Renewable Energy Systems by a Self-Evolving Nonlinear Consequent Part Recurrent Type-2 Fuzzy System for Power Prediction. Sustainability 2021, 13, 3301. [Google Scholar] [CrossRef]
  3. Zhang, H.; Hong, Q.; Yan, H.; Yang, F.; Guo, G. Event-Based Distributed H∞ Filtering Networks of 2-DOF Quarter-Car Suspension Systems. IEEE Trans. Ind. Inform. 2017, 13, 312–321. [Google Scholar] [CrossRef]
  4. Li, L.; Xiao, J.; Zhao, Y.; Liu, K.; Peng, X.; Luan, H.; Li, K. Robust position anti-interference control for PMSM servo system with uncertain disturbance. CES Trans. Electr. Mach. Syst. 2020, 4, 151–160. [Google Scholar] [CrossRef]
  5. Liu, D.; Wang, D.; Wang, F.-Y.; Li, H.; Yang, X. Neural-Network-Based Online HJB Solution for Optimal Robust Guaranteed Cost Control of Continuous-Time Uncertain Nonlinear Systems. IEEE Trans. Cybern. 2014, 44, 2834–2847. [Google Scholar] [CrossRef]
  6. Zhong, X.; He, H.; Prokhorov, D.V. Robust controller design of continuous-time nonlinear system using neural network. In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, TX, USA, 4–9 August 2013; pp. 1–8. [Google Scholar]
  7. Sun, J.; Liu, C.; Ye, Q. Robust differential game guidance laws design for uncertain interceptor-target engagement via adaptive dynamic programming. Int. J. Control 2017, 90, 990–1004. [Google Scholar] [CrossRef]
  8. Yang, X.; Liu, D.; Luo, B.; Li, C. Data-based robust adaptive control for a class of unknown nonlinear constrained-input systems via integral reinforcement learning. Inf. Sci. 2016, 369, 731–747. [Google Scholar] [CrossRef]
  9. Yang, X.; He, H. Adaptive Critic Designs for Event-Triggered Robust Control of Nonlinear Systems With Unknown Dynamics. IEEE Trans. Cybern. 2019, 49, 2255–2267. [Google Scholar] [CrossRef]
  10. Wang, X.; Ye, X. Optimal Robust Control of Nonlinear Uncertain System via Off-Policy Integral Reinforcement Learning. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; pp. 1928–1933. [Google Scholar]
  11. Vamvoudakis, K.G.; Lewis, F.L. Online actor critic algorithm to solve the continuous-time infinite horizon optimal control problem. In Proceedings of the 2009 International Joint Conference on Neural Networks, Atlanta, GA, USA, 14–19 June 2009; pp. 3180–3187. [Google Scholar]
  12. Dierks, T.; Jagannathan, S. Optimal control of affine nonlinear continuous-time systems using an online Hamilton-Jacobi-Isaacs formulation. In Proceedings of the 49th IEEE Conference on Decision and Control (CDC), Atlanta, GA, USA, 15–17 December 2010; pp. 3048–3053. [Google Scholar]
  13. Lv, Y.; Na, J.; Yang, Q.; Wu, X.; Guo, Y. Online adaptive optimal control for continuous-time nonlinear systems with completely unknown dynamics. Int. J. Control 2016, 89, 99–112. [Google Scholar] [CrossRef] [Green Version]
  14. Wang, D.; He, H.; Liu, D. Adaptive Critic Nonlinear Robust Control: A Survey. IEEE Trans. Cybern. 2017, 47, 3429–3451. [Google Scholar] [CrossRef]
  15. Wang, D.; Liu, D.; Zhang, Q.; Zhao, D. Data-Based Adaptive Critic Designs for Nonlinear Robust Optimal Control With Uncertain Dynamics. IEEE Trans. Syst. Man Cybern. Syst. 2016, 46, 1544–1555. [Google Scholar] [CrossRef]
  16. Sun, J.; Zhang, H.; Yan, Y.; Xu, S.; Fan, X. Optimal Regulation Strategy for Nonzero-Sum Games of the Immune System Using Adaptive Dynamic Programming. IEEE Trans. Cybern. 2021, 47, 1–10. [Google Scholar] [CrossRef]
  17. Narayanan, V.; Sahoo, A.; Jagannathan, S.; George, K. Approximate Optimal Distributed Control of Nonlinear Interconnected Systems Using Event-Triggered Nonzero-Sum Games. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 1512–1522. [Google Scholar] [CrossRef]
  18. Morris, P. Introduction to Game Theory, 1st ed.; Springer: New York, NY, USA, 1994; pp. 115–147. [Google Scholar]
  19. Starr, A.W.; Ho, Y.C. Nonzero-sum differential games. J. Optim. Theory Appl. 1969, 3, 184–206. [Google Scholar] [CrossRef]
  20. Zhang, H.; Jiang, H.; Luo, C.; Xiao, G. Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms. IEEE Trans. Cybern. 2017, 47, 3331–3340. [Google Scholar] [CrossRef] [PubMed]
  21. Mu, C.; Wang, K.; Sun, C. Policy-Iteration-Based Learning for Nonlinear Player Game Systems with Constrained Inputs. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 6488–6502. [Google Scholar] [CrossRef]
  22. Zhang, Q.; Zhao, D. Data-Based Reinforcement Learning for Nonzero-Sum Games with Unknown Drift Dynamics. IEEE Trans. Cybern. 2019, 49, 2874–2885. [Google Scholar] [CrossRef] [PubMed]
  23. Song, R.; Lewis, F.L.; Wei, Q. Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 704–713. [Google Scholar] [CrossRef] [PubMed]
  24. Zhang, H.; Su, H.; Zhang, K.; Luo, Y. Event-Triggered Adaptive Dynamic Programming for Non-Zero-Sum Games of Unknown Nonlinear Systems via Generalized Fuzzy Hyperbolic Models. IEEE Trans. Fuzzy Syst. 2019, 27, 2202–2214. [Google Scholar] [CrossRef]
  25. Zhao, Q.; Sun, J.; Wang, G.; Chen, J. Event-Triggered ADP for Nonzero-Sum Games of Unknown Nonlinear Systems. IEEE Trans. Neural Netw. Learn. Syst. 2021, 27, 1–9. [Google Scholar] [CrossRef]
  26. Zhao, D.; Zhang, Q.; Wang, D.; Zhu, Y. Experience Replay for Optimal Control of Nonzero-Sum Game Systems with Unknown Dynamics. IEEE Trans. Cybern. 2016, 46, 854–865. [Google Scholar] [CrossRef] [PubMed]
  27. Zhang, C.; Zhang, Z. Adaptive Iterative Learning Trajectory Tracking Control of SCARA Robot. In Proceedings of the 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Chongqing, China, 18–20 June 2021; pp. 910–914. [Google Scholar]
  28. Yang, Y.; Wan, Y.; Zhu, J.; Lewis, F.L. H∞ Tracking Control for Linear Discrete-Time Systems: Model-Free Q-Learning Designs. IEEE Control Syst. Lett. 2021, 5, 175–180. [Google Scholar] [CrossRef]
  29. Huang, Y.; Liu, D. Neural-network-based optimal tracking control scheme for a class of unknown discrete-time nonlinear systems using iterative ADP algorithm. Neurocomputing 2014, 125, 46–56. [Google Scholar] [CrossRef]
  30. Dierks, T.; Jagannathan, S. Non-zero sum games: Optimal tracking control of affine nonlinear discrete-time systems with unknown internal dynamics. In Proceedings of the 48h IEEE Conference on Decision and Control (CDC) Held Jointly with 2009 28th Chinese Control Conference, Shanghai, China, 29 January 2010; pp. 6750–6755. [Google Scholar]
  31. Wang, D.; Mu, C. Adaptive-Critic-Based Robust Trajectory Tracking of Uncertain Dynamics and Its Application to a Spring–Mass–Damper System. IEEE Trans. Ind. Electron. 2018, 65, 654–663. [Google Scholar] [CrossRef]
  32. Liu, L.; Wang, Z.; Zhang, H. Neural-Network-Based Robust Optimal Tracking Control for MIMO Discrete-Time Systems with Unknown Uncertainty Using Adaptive Critic Design. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 1239–1251. [Google Scholar] [CrossRef]
  33. Yang, X.; Liu, D.; Wei, Q.; Wang, D. Guaranteed cost neural tracking control for a class of uncertain nonlinear systems using adaptive dynamic programming. Neurocomputing 2016, 198, 80–90. [Google Scholar] [CrossRef]
  34. Mu, C.; Zhang, Y.; Gao, Z.; Sun, C. ADP-Based Robust Tracking Control for a Class of Nonlinear Systems with Unmatched Uncertainties. IEEE Trans. Syst. Man Cybern. Syst. 2020, 50, 4056–4067. [Google Scholar] [CrossRef]
  35. Başar, T.; Olsder, G.J. Dynamic Noncooperative Game Theory, 2nd ed.; Academic Press: Cambridge, MA, USA, 1999. [Google Scholar]
  36. Vamvoudakis, K.G.; Lewis, F.L. Non-zero sum games: Online learning solution of coupled Hamilton-Jacobi and coupled Riccati equations. In Proceedings of the 2011 IEEE International Symposium on Intelligent Control, Denver, CO, USA, 28–30 September 2011; pp. 171–178. [Google Scholar]
  37. Finlayson, B.A. The Method of Weighted Residuals and Variational Principles. J. Fluid Mech. 1973, 57, 623. [Google Scholar]
  38. Vamvoudakis, K.G.; Lewis, F.L. Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 2010, 46, 878–888. [Google Scholar]
Figure 1. System state $x_1$ and its tracking trajectory when $\eta_1 = 1$ and $\eta_2 = 1$.
Figure 2. System state $x_2$ and its tracking trajectory when $\eta_1 = 1$ and $\eta_2 = 1$.
Figure 3. Convergence curves of the critic NN1 weights for player 1.
Figure 4. Convergence curves of the critic NN2 weights for player 2.
Figure 5. Control trajectories for two players when $\eta_1 = 1$ and $\eta_2 = 1$.
Figure 6. Tracking error trajectories when $\eta_1 = 1$ and $\eta_2 = 1$.
Figure 7. System state $x_1$ and its tracking trajectory when $\eta_1 = 0.5$ and $\eta_2 = 0.5$.
Figure 8. System state $x_2$ and its tracking trajectory when $\eta_1 = 0.5$ and $\eta_2 = 0.5$.
Figure 9. System state $x_1$ and its tracking trajectory when $\eta_1 = 0.2$ and $\eta_2 = 0.2$.
Figure 10. System state $x_2$ and its tracking trajectory when $\eta_1 = 0.2$ and $\eta_2 = 0.2$.
Figure 11. Convergence curves of critic NN1 weights for player 1.
Figure 12. Convergence curves of critic NN2 weights for player 2.
Figure 13. Control trajectories for two players when $\eta_1 = 0.2$ and $\eta_2 = 0.2$.
Figure 14. Tracking error trajectories when $\eta_1 = 0.2$ and $\eta_2 = 0.2$.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
