Article

Robust Trajectory Tracking Control for Continuous-Time Nonlinear Systems with State Constraints and Uncertain Disturbances

School of Artificial Intelligence, Henan University, Zhengzhou 450000, China
*
Author to whom correspondence should be addressed.
Entropy 2022, 24(6), 816; https://doi.org/10.3390/e24060816
Submission received: 20 April 2022 / Revised: 7 June 2022 / Accepted: 7 June 2022 / Published: 11 June 2022
(This article belongs to the Topic Advances in Nonlinear Dynamics: Methods and Applications)

Abstract

In this paper, a robust trajectory tracking control method based on adaptive dynamic programming (ADP) is proposed for nonlinear systems with state constraints and uncertain disturbances. Firstly, an augmented system is constructed from the tracking error and the reference trajectory, and the tracking control problem with uncertain disturbances is recast as a robust regulation problem. In addition, for the nominal part of the augmented system, the guaranteed cost tracking control problem is transformed into an optimal control problem by introducing a discount coefficient. A new safe Hamilton–Jacobi–Bellman (HJB) equation is obtained by combining the cost function with a control barrier function (CBF), so that system states that violate the safety specifications are penalized. Since the safe HJB equation is difficult to solve analytically, a critic neural network (NN) is used to approximate its solution. According to Lyapunov stability theory, the system states and the parameters of the critic neural network are guaranteed to be uniformly ultimately bounded (UUB) in the presence of state constraints and uncertain disturbances. Finally, the feasibility of the proposed method is verified by a simulation example.

1. Introduction

With the continuous application of automatic driving technology [1,2] and intelligent robot technology [3,4], safety-critical systems have attracted extensive attention. In controller design, safety is the primary consideration compared with other performance requirements. For control systems with strict safety requirements, the CBF has been applied as a tool for enforcing state constraints.
Reinforcement learning (RL) can be regarded as a combination of policy learning and evaluation learning. Although dynamic programming suffers from the curse of dimensionality in practical engineering applications, RL can handle this problem well; in this setting it is also called adaptive dynamic programming (ADP) [5,6,7]. ADP is an intelligent control method and an approximate tool for dealing with optimal control problems. However, the analytical solution of the Hamilton–Jacobi–Bellman (HJB) equation is generally difficult to obtain; therefore, ADP learns the solution of the HJB equation online through neural network (NN) approximation [8,9,10]. At present, a variety of ADP-based control methods have been proposed to deal with trajectory tracking and optimal control problems [11,12,13,14,15,16].
Adaptive dynamic programming enables complex nonlinear systems to achieve desired tracking control goals [17,18,19,20]. In reference [17], the tracking performance of continuous-time nonlinear systems was analyzed under input constraints. In practice, a series of uncertain disturbance factors must often be considered; robust optimal tracking control has therefore become a research hot spot. Solving tracking problems for complex nonlinear systems with uncertainty is more difficult; in reference [18], the adaptive critic technique was used to solve robust tracking control problems of nonlinear systems with random disturbances. For nonlinear systems with continuous-time matched uncertainty, an effective robust tracking control method was adopted, and a discount coefficient was selected for the nominal augmented error system in references [19,20]. For systems with disturbances, $H_\infty$ tracking control was employed [21]. To reduce design cost and resource waste and to adjust the accuracy of the control system, a tracking control method based on event triggering was proposed [22]. For the optimal regulation problem, a new non-quadratic discounted performance function was proposed in reference [23]. In reference [24], an improved adaptive robust tracking method was proposed for nonlinear systems with uncertainty and successfully applied to a mass-spring-damper system. The tracking control methods above establish the feasibility of the control strategy and enable the system to achieve the predetermined control target. However, none of them consider the state constraint problem.
In references [25,26,27,28], different ADP-based methods were proposed to solve various engineering problems. In some applications, the control system is required to be reliably safe. A safe system is designed to find a control strategy that conforms to the safety specifications imposed by the physical constraints of the system [29]. Using the CBF method to handle the safety constraints of systems with strict requirements has attracted extensive attention [30,31,32,33]. In reference [34], an approximate adjustment method was proposed for solving the safety boundary control optimization problem: the system states are driven to the desired equilibrium point, and the cost of violating the safety constraints is directly embedded into the value function. In reference [35], the application of the CBF was introduced, and verification methods and implementation characteristics of safety in safety-critical control systems were summarized. The discrete-time state constraint problem was described in reference [36], where the HJB equation with the CBF was solved by using the approximation properties of neural networks.
In this paper, a new guaranteed cost robust tracking method with state constraints and uncertain disturbances is proposed. This method can guarantee the convergence of the system error under conditions of uncertain disturbances and state constraints. The discounted coefficient is selected for the nominal augmented system with tracking errors. In addition, the CBF is added to the system to solve the constraint problem of system states. Finally, the approximation property of the critic NN is adopted to deal with the HJB equation. The contributions of this paper are described below:
  • For robust tracking control problems, the CBF is applied to the tracking control system with uncertain disturbances so that the system can still have good tracking performance in the case of state constraints;
  • Combining the traditional adaptive control method with the CBF, the CBF is directly extended to the original system, and the CBF is used as a penalty function to punish unsafe behavior;
  • A new guaranteed cost robust adaptive tracking method with state constraints and uncertain disturbances is proposed to solve the safety HJB equation through the critic NN learning framework, and the critic NN parameters are guaranteed to be uniformly ultimately bounded (UUB) under the influence of state constraints and uncertain disturbances.
The arrangement of other parts of this article is described below: Section 2 states the preliminary knowledge and introduces the relevant contents of the control barrier function. Section 3 describes the selection of discount value functions for the nominal augmented system and introduces the form of the new cost function after adding the barrier function. Section 4 introduces the learning method of a critic neural network with state constraints and uncertain disturbances. In Section 5, the effectiveness of the proposed method is verified by a simulation example. Finally, some conclusions are summarized in Section 6.

2. Preliminaries

2.1. Problem Statement

Consider the following uncertain nonlinear safety-critical system
\[ \dot{x}(t) = f(x(t)) + g(x(t))u(t) + \Delta f(x(t)), \tag{1} \]
where $x(t) \in \Phi \subset \mathbb{R}^n$ is the state vector, $u(t) \in U \subset \mathbb{R}^m$ is the control vector, $\Phi$ denotes the safe feasible state set, $U$ denotes the set of all admissible inputs, $f(x(t)) \in \mathbb{R}^n$ and $g(x(t)) \in \mathbb{R}^{n \times m}$ are known functions with $f(0) = 0$, and $\Delta f(x(t)) \in \mathbb{R}^n$ is the unknown perturbation term with $\Delta f(0) = 0$. Let the initial state be $x(0) = x_0$. We assume that there exists a constant $g_M$ such that $0 < \|g(x)\| \le g_M$ for all $x \in \mathbb{R}^n$, and that $\Delta f(x) = g(x)d(x)$, where $d(x) \in \mathbb{R}^m$ is an unknown perturbation bounded by a known function $d_M(x) > 0$, i.e., $\|d(x)\| \le d_M(x)$. In addition, $d(0) = 0$ and $d_M(0) = 0$.
Assumption 1.
Let the reference trajectory of system (1) be $x_d(t)$, a bounded function generated by the command generator $\dot{x}_d(t) = r(x_d(t))$. Moreover, the reference trajectory $x_d \in \mathbb{R}^n$ and the command function $r \in \mathbb{R}^n$ are Lipschitz continuous.

2.2. Control Barrier Function

The application of the CBF further addresses the constraint problem of the system [36]. Over a predefined safe set, the CBF candidate is positive in the interior and tends to infinity at the set boundary. Because the derivative of the CBF is bounded along system trajectories, the CBF never actually reaches infinity. If the state of the system approaches the safety boundary, the condition that this derivative is negative returns the state to the safe set, so that the states displayed by the system remain within the predetermined set. The safe feasible set $\Phi$ consists of operational constraints and safety specifications [34],
\[ \Phi = \{ x \in \mathbb{R}^n \mid h(x) \ge 0 \}, \tag{2} \]
\[ \partial\Phi = \{ x \in \mathbb{R}^n \mid h(x) = 0 \}, \tag{3} \]
\[ \mathrm{Int}\,\Phi = \{ x \in \mathbb{R}^n \mid h(x) > 0 \}, \tag{4} \]
where $\partial\Phi$ denotes the boundary of the safe feasible set $\Phi$, $\mathrm{Int}\,\Phi$ denotes the interior of the set $\Phi$, and $h$ is a continuously differentiable function of $x$ that encodes the one-dimensional constraint range of the system.
The CBF candidate $B(x)$ satisfies all of the following properties,
\[ \frac{1}{\alpha_1(h(x))} \le B(x) \le \frac{1}{\alpha_2(h(x))}, \quad x \in \mathrm{Int}\,\Phi, \tag{5} \]
\[ \dot{B}(x) \le \alpha_3(h(x)), \quad x \in \mathrm{Int}\,\Phi, \tag{6} \]
where $\alpha_1(\cdot)$, $\alpha_2(\cdot)$, and $\alpha_3(\cdot)$ are Lipschitz class-$\mathcal{K}$ functions, and $B(x)$ is a control barrier function.
Assumption 2.
To ensure that the system states remain constrained under uncertain disturbances, we use a logarithmic control barrier function $B_r(x)$, which satisfies the following properties,
\[ B_r(x) > 0, \ \forall x \in \mathrm{Int}\,\Phi; \qquad B_r(x) \to \infty, \ \text{as } x \to \partial\Phi. \tag{7} \]
Besides, $B_r(x)$ is monotonically decreasing in $h(x)$ for $\forall x \in \mathrm{Int}\,\Phi$.
Under Assumption 2, a specific logarithmic barrier function can be defined as
\[ B_r(x) = -\log\left( \frac{\gamma h(x)}{\gamma h(x) + 1} \right). \tag{8} \]
In (8), the parameter $\gamma$ is a positive constant that determines how quickly $B_r(x)$ grows as the state approaches the safety boundary.
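As a quick numerical check of the behavior described above, the sketch below evaluates the logarithmic barrier for shrinking constraint margins $h(x)$, under the sign convention $B_r(x) = -\log(\gamma h/(\gamma h + 1))$ that keeps $B_r$ positive inside $\Phi$ (the value of `gamma` here matches the one used later in the simulation section; everything else is illustrative).

```python
import math

def barrier(h, gamma=0.02):
    """Logarithmic CBF value for a constraint margin h(x) > 0 (interior of Phi)."""
    z = gamma * h / (gamma * h + 1.0)   # lies in (0, 1) for h > 0, -> 0 as h -> 0+
    return -math.log(z)                  # positive inside Phi, -> +inf at the boundary

# The barrier stays positive and grows monotonically as the state
# approaches the boundary h(x) = 0 of the safe set.
values = [barrier(h) for h in (1.0, 0.1, 0.01)]
```

The monotone growth toward the boundary is exactly what lets the value function penalize trajectories that drift toward $\partial\Phi$.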
Before describing the modified robust tracking method with constraints, we first make the following definitions and assumptions.
Definition 1.
The safety control input set of the nonlinear system (1) is given below
\[ U_c = \{ u \in \mathbb{R}^m \mid x_u \in \mathrm{Int}\,\Phi \}, \tag{9} \]
where $x_u$ is the system state associated with the control strategy $u$, and $\mathrm{Int}\,\Phi$ is the interior of the set defined in (4).
Assumption 3.
The initial condition of the nonlinear system (1) lies strictly within $\Phi$; in other words, $x_0 \in \mathrm{Int}\,\Phi$. Assume that the initial set of allowed inputs $U_a = U \cap U_c$ is not empty, and that a control strategy $u(x_0) \in U_a$ exists.

3. Guaranteed Cost Robust Tracking Design with State Constraints and Uncertain Disturbances

3.1. Modified Robust Adaptive Tracking Control

The augmented system is constructed by combining the tracking error and the reference trajectory. Before describing the modified robust adaptive tracking control, the tracking error is defined as $e_x(t) = x(t) - x_d(t)$. According to (1), the tracking error dynamics are derived as
\[ \dot{e}_x(t) = f\big(x_d(t) + e_x(t)\big) + g\big(x_d(t) + e_x(t)\big)u(t) + \Delta f\big(x_d(t) + e_x(t)\big) - r(x_d(t)), \tag{10} \]
where $r(x_d(t))$ is a Lipschitz continuous function.
By considering the tracking error dynamics (10), the infinite horizon cost function is given below [37]
\[ \bar{V}(e_x(t), u) = \int_t^{\infty} e^{-\alpha(\tau - t)} U(e_x(\tau), u(\tau)) \, d\tau, \tag{11} \]
where $\alpha > 0$ is a discount factor, $U(e_x, u) = e_x^T Q e_x + u^T R u$, and both $Q \in \mathbb{R}^{n \times n}$ and $R \in \mathbb{R}^{m \times m}$ are symmetric positive definite matrices.
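To see concretely what the discount factor buys, the sketch below numerically evaluates the discounted integral in (11) for a hypothetical constant utility signal; the discount makes the integral finite and, for a constant utility $U_0$, equal to $U_0/\alpha$ in closed form (the values of `alpha`, `U0`, and the truncation horizon are illustrative, not from the paper).

```python
import math

def discounted_cost(utility, alpha, horizon=200.0, dt=1e-3):
    """Left Riemann sum of the discounted integral in (11) from t = 0,
    truncated at a long horizon where the discount has decayed to ~0."""
    total, tau = 0.0, 0.0
    while tau < horizon:
        total += math.exp(-alpha * tau) * utility(tau) * dt
        tau += dt
    return total

# For a constant utility U0, the discounted integral equals U0 / alpha.
alpha, U0 = 0.15, 2.0
approx = discounted_cost(lambda tau: U0, alpha)
exact = U0 / alpha
```

Without the factor $e^{-\alpha(\tau-t)}$, the same integral would diverge whenever the utility does not decay to zero, which is the point made in Remark 1.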
Under state constraints and uncertain disturbances, the guaranteed cost tracking problem is to find a control input $u = u(e_x(t), x_d(t))$ and a positive real number $\bar{V}^*$ such that the tracking error $e_x(t)$ converges to zero and the cost function described in (11) satisfies $\bar{V} \le \bar{V}^*$. It should be pointed out that $\bar{V}^*$ is called a guaranteed cost, and the control $u$ is called a guaranteed cost control input.
Remark 1.
The discount term $e^{-\alpha(\tau - t)}$ in (11) is mainly used to keep the cost function finite, i.e., $\bar{V} < \infty$, since the control $u(e_x(t), x_d(t))$ contains a part depending on the reference trajectory $x_d(t)$. If the reference trajectory $x_d(t)$ does not converge to zero, the cost function (11) would be unbounded without the discount term.
Let $s(t) = [e_x^T(t), x_d^T(t)]^T \in \mathbb{R}^{2n}$; the augmented error dynamics can then be written as
\[ \dot{s}(t) = F(s(t)) + G(s(t))u(t) + \Delta F(s(t)), \tag{12} \]
where the specific forms of $F(s(t))$ and $G(s(t))$ are
\[ F(s(t)) = \begin{bmatrix} f(x_d(t) + e_x(t)) - r(x_d(t)) \\ r(x_d(t)) \end{bmatrix} \quad \text{and} \quad G(s(t)) = \begin{bmatrix} g(x_d(t) + e_x(t)) \\ 0 \end{bmatrix}, \]
and $\Delta F(s(t)) = G(s(t))d(s(t))$. Since $\|d(x)\| \le d_M(x)$, the uncertain disturbance term satisfies $\|d(s(t))\| \le d_M(s(t))$, where $d_M(s(t))$ is the bound of $d(s(t))$.
Remark 2.
The random disturbance term $d(s(t))$ in the augmented system (12) makes the controller design difficult. In the following, the control of the augmented error system (12) is reduced to the optimal control of its nominal system, and the tracking problem with the random disturbance is transformed into an optimal regulation problem with a discounted value function.
Setting the uncertain term $d(s(t))$ aside, the nominal system of (12) is
\[ \dot{s}(t) = F(s(t)) + G(s(t))u(t). \tag{13} \]
Inspired by references [29,36], B r ( x ) is combined with the nominal augmented system (13), and the modified value function is
\[ V(s(t)) = \int_t^{\infty} e^{-\alpha(\tau - t)} \Big[ \rho d_M^2(s(\tau)) + s^T(\tau) Q_T s(\tau) + u^T(\tau) R u(\tau) + B_r(s(\tau)) \Big] d\tau, \tag{14} \]
where $Q_T = \mathrm{diag}\{Q, 0_{n \times n}\}$, $\rho = \lambda_{\max}(R)$ with $\lambda_{\max}(R)$ the maximum eigenvalue of $R$, both $Q \in \mathbb{R}^{n \times n}$ and $R \in \mathbb{R}^{m \times m}$ are symmetric positive definite weight matrices of the augmented system, and $\alpha > 0$ is a discount coefficient.
According to Bellman’s principle of optimality [38], the minimized Hamiltonian of the modified value function (14) for the nominal system (13) is given by
\[ H_{\min}(s, u, V_s) = \rho d_M^2(s) + s^T Q_T s + u^T R u - \alpha V(s) + B_r(s) + V_s^T \big( F(s) + G(s)u \big), \tag{15} \]
where $V_s = \partial V / \partial s$, and the optimal cost function $V^*(s(t))$ is defined as
\[ V^*(s(t)) = \min_{u} V(s(t)). \tag{16} \]
For the system (13) with the control barrier value function (14), the stationarity condition $\partial H(s, u^*, V_s^*) / \partial u^* = 0$ yields the optimal control input from (15),
\[ u^* = -\frac{1}{2} R^{-1} G^T(s) V_s^*, \tag{17} \]
where $V_s^* = \partial V^*(s) / \partial s$ and $V^*(s)$ denotes the optimal value of $V(s)$.
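For a concrete sense of the optimal-control formula $u^* = -\frac{1}{2}R^{-1}G^T(s)V_s^*$, the snippet below evaluates it for small illustrative matrices; the values of `R`, `G`, and the gradient `Vs` are hypothetical placeholders, not taken from the paper's example.

```python
import numpy as np

R = np.array([[2.0]])            # input weight, symmetric positive definite
G = np.array([[0.0], [1.0]])     # input matrix G(s) of the augmented dynamics
Vs = np.array([1.2, -0.4])       # assumed value-function gradient dV*/ds

# u* = -(1/2) R^{-1} G(s)^T dV*/ds; solve() avoids forming the inverse explicitly.
u_star = -0.5 * np.linalg.solve(R, G.T @ Vs)
```

Note that only the components of $V_s^*$ aligned with the input channels of $G(s)$ influence the control, which is why $G$ has the zero block corresponding to the reference-trajectory part of the augmented state.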

3.2. State Constraints Analysis

In the design of a robust tracking controller, the CBF serves as a constraint tool that keeps the system states evolving within the specified constraints, so that the system maintains good performance inside the prescribed safe set. The CBF gives safety-critical systems a means to optimize other control objectives while making explicit that safety takes priority over other performance indexes. To further show that the CBF is bounded, its boundedness is demonstrated below by considering an ordered sequence of admissible controllers.
Lemma 1.
Consider an admissible feedback control strategy $u_1 \in U_a$ and a time-invariant positive definite function $Z$ that satisfies
\[ V_s^T\big(F(s) + G(s)u_1\big) + \rho d_M^2(s) + s^T Q_T s + u_1^T R u_1 - \alpha V(s) + B_r(s) = 0, \tag{18} \]
\[ V(s_0, u_1) = Z(s_0, u_1), \tag{19} \]
where $V$ is the value function of the system for all $t \in [0, \infty)$. Then the following holds
\[ V(s, u_1) = Z(s, u_1). \tag{20} \]
Proof. 
Assume $V(s, u_1) > 0$ exists and is continuously differentiable; then we have
\[ V(s(t), u_1) - V(s_0, u_1) = \int_0^t \dot{V}(s(\tau), u_1) \, d\tau = \int_0^t V_s^T (F + G u_1) \, d\tau. \tag{21} \]
Similarly, we also have
\[ Z(s(t), u_1) - Z(s_0, u_1) = -\int_0^t P(s(\tau), u_1) \, d\tau, \tag{22} \]
where $P(s, u) = \rho d_M^2(s) + s^T Q_T s + u^T R u - \alpha V(s) + B_r(s)$.
We can derive from (21) and (22)
\[ Z(s(t), u_1) - V(s(t), u_1) = -\int_0^t \big( V_s^T(F + G u_1) + P(s(\tau), u_1) \big) d\tau + Z(s_0, u_1) - V(s_0, u_1). \tag{23} \]
Combining (18), (19), and (23), we can obtain
\[ Z(s(t), u_1) - V(s(t), u_1) = \int_0^t \big( P(s(\tau), u_1) - P(s(\tau), u_1) \big) d\tau = 0. \tag{24} \]
Therefore, we can obtain
\[ Z(s(t), u_1) = V(s(t), u_1). \tag{25} \]
This completes the proof. □
Lemma 2.
Consider a sequence of positive definite value functions $V(s, t, u_1), V(s, t, u_2), \ldots, V(s, t, u_i)$, abbreviated as $V_1, V_2, \ldots, V_i$, associated with the admissible control inputs $u_1(s, t), u_2(s, t), \ldots, u_i(s, t) \in U_a$. Then the Hamiltonian values defined in (15) satisfy
\[ H_{\min}^1 \ge H_{\min}^2 \ge \cdots \ge H_{\min}^i, \tag{26} \]
and the CBF candidate $B_r^k$ is bounded in the range $1 < k < i$.
Proof. 
Assume that $0 \le k \le j \le i$ holds for any $j$ and $k$, so that the condition $H_{\min}^k \ge H_{\min}^j$ holds; therefore, one has
\[ V_j = V_k + V_o, \tag{27} \]
where $V_o = V_o(s(t), u_k)$. According to (17), $u^*$ can be rewritten as
\[ u^* = -\frac{1}{2} R^{-1} G^T \nabla V_j, \tag{28} \]
\[ H_{\min}^j = \nabla V_j^T \Big( F - \frac{1}{2} G R^{-1} G^T \nabla V_j \Big) + T(s) + \frac{1}{4} \nabla V_j^T G R^{-1} G^T \nabla V_j, \tag{29} \]
where $T(s) = \rho d_M^2(s) + s^T Q_T s + B_r(s) - \alpha V(s)$. According to (27), one may obtain
\[ H_{\min}^j = H_{\min}^k + \nabla V_o^T (F + G u_k^*) - u_o^{*T} R u_o^*. \]
Since $H_{\min}^j - H_{\min}^k + u_o^{*T} R u_o^* \le 0$, we can obtain
\[ \frac{\partial V_o(s, u_k)}{\partial t} \le 0. \]
Because $\lim_{t \to \infty} V_o(s(t)) = 0$, the following results are obtained
\[ V_o \ge 0, \]
\[ V(s, t, u_1) > V(s, t, u_2) > \cdots > V(s, t, u_i). \]
From Lemmas 1 and 2 above, we can obtain
\[ Z(s(t), u_k) < Z(s(t), u_1), \quad 1 < k < i. \]
In the above derivation, $Z(s(t), u_k)$ is bounded and $P(s, u)$ is positive definite, so $B_r^k$ is also bounded. In other words, under state constraints, the system states will not reach the safety boundary while tracking the reference trajectory. This proves that the CBF is bounded at each moment. □
Theorem 1.
For the performance optimization problem described in (16), let both Assumption 2 and Assumption 3 hold. With the modified control input (17), the safety of the tracking states is guaranteed within the prescribed range for all $t > 0$.
Proof. 
By Lemmas 1 and 2 above, the performance function $Z(s, u_k)$ and the candidate function $B_r^k$ are bounded at each moment after the control input (17) is applied. From Assumptions 1 and 2, the barrier function $B_r^k$ would reach infinity at the boundary of the constraint range; in other words, since the CBF remains bounded at every moment, the states of the system never reach the safety boundary. □
In the above introduction, the CBF is directly added to the cost function, which makes the states of the system constrained. This method is applicable to the guaranteed cost robust trajectory tracking control without initial admissible control. The traditional tracking controller usually needs the initial admissible control law. Although the appropriate initial admissible control law is found, the appropriate initial admissible control law may not satisfy the condition of state constraints.
Due to the discount term $e^{-\alpha(\tau - t)}$ in Equation (11), a guaranteed cost adaptive critic NN learning framework is designed to guarantee the stability of the closed-loop system while tracking the reference trajectory. Before proceeding, we make the following assumption.
Assumption 4.
Let $J_1(s)$ be a continuously differentiable Lyapunov function candidate satisfying $\dot{J}_1(s) = \nabla J_1^T(s)(F(s) + G(s)u^*) < 0$, where $\nabla J_1(s) = \partial J_1(s) / \partial s$. Assume there exists a symmetric positive definite matrix $\Lambda(s)$ such that $\nabla J_1^T(s)(F(s) + G(s)u^*) = -\nabla J_1^T(s) \Lambda(s) \nabla J_1(s)$ holds.

4. Design of Guaranteed Cost Adaptive Critic NN Learning Framework

In this section, the approximation property of the critic NN is used to approximate the solution of the safe HJB Equation (15). A guaranteed cost adaptive critic NN learning framework is proposed, the weights of the critic NN are updated through an online learning scheme, and the critic NN weights are finally guaranteed to be UUB. Considering the cost function described in (16), we design a critic NN to approximate the cost function $V^*(s(t))$ and its partial derivative,
\[ V^*(s) = W^T \phi(s) + \epsilon_v(s), \qquad \nabla V^*(s) = (\nabla\phi(s))^T W + \nabla\epsilon_v(s), \tag{30} \]
where $W \in \mathbb{R}^l$ is the ideal weight vector of the critic NN, $\phi(s) = [\varphi_1, \varphi_2, \varphi_3, \cdots, \varphi_l]^T \in \mathbb{R}^l$ is the activation function of the critic NN, $l$ is the number of hidden-layer neurons, $\nabla\phi(s)$ denotes the derivative of $\phi(s)$, $\epsilon_v(s)$ denotes the approximation error of the critic NN, and $\nabla\epsilon_v(s)$ is the derivative of $\epsilon_v(s)$.
Assumption 5.
The vector $W$ of the critic NN is bounded by a positive constant, i.e., $\|W\| \le W_M$; the activation function $\phi(s)$ and its derivative $\nabla\phi(s)$, as well as the critic NN error $\epsilon_v(s)$ and its derivative $\nabla\epsilon_v(s)$, are bounded and satisfy $\|\phi(s)\| \le \phi_\varepsilon$, $\|\nabla\phi(s)\| \le \phi_{d\varepsilon}$, $\|\epsilon_v(s)\| \le \epsilon_\sigma$, and $\|\nabla\epsilon_v(s)\| \le \epsilon_{d\sigma}$, where $\phi_\varepsilon$, $\phi_{d\varepsilon}$, $\epsilon_\sigma$, and $\epsilon_{d\sigma}$ are positive constants.
From Equations (15), (16), and (30), the approximation error form of the safe HJB equation is
\[ \rho d_M^2(s) + s^T Q_T s + u^{*T} R u^* - \alpha V^*(s) + B_r(s) + W^T \nabla\phi(s)\big(F(s) + G(s)u^*\big) + \epsilon_{v1}(s) = 0, \tag{31} \]
where $\epsilon_{v1}(s) = \nabla\epsilon_v^T(s)(F(s) + G(s)u^*)$.
Considering Equations (17) and (30), we can draw the following conclusion,
\[ u^* = -\frac{1}{2} R^{-1} G^T(s) (\nabla\phi(s))^T W + \epsilon_{v2}, \tag{32} \]
where $\epsilon_{v2} = -\frac{1}{2} R^{-1} G^T(s) \nabla\epsilon_v$. Substituting (32) into (31), we can obtain
\[ \rho d_M^2(s) + s^T Q_T s - \alpha W^T\phi + B_r(s) + W^T\nabla\phi F - \frac{1}{4} W^T\nabla\phi\varrho(\nabla\phi)^T W - \omega = 0, \tag{33} \]
where $\varrho = G(s)R^{-1}G^T(s)$, and $\omega = \alpha\epsilon_v - (\nabla\epsilon_v)^T F + \frac{1}{4}(\nabla\epsilon_v)^T\varrho\nabla\epsilon_v + \frac{1}{2} W^T\nabla\phi\varrho\nabla\epsilon_v$ is the approximation error.
Since the ideal weight $W$ is unknown, the critic NN approximates the cost function $V^*(s)$ as
\[ \hat{V}(s) = \hat{W}^T \phi(s), \tag{34} \]
where $\hat{W}$ denotes the estimate of the ideal vector $W$, and $\hat{V}$ is the estimate of the ideal cost function $V^*$. We can obtain the approximate HJB equation from Equations (15) and (34),
\[ \hat{H}(s, u, \hat{V}_s) = \rho d_M^2(s) + s^T Q_T s + u^T R u - \alpha \hat{V}(s) + B_r(s) + \hat{V}_s^T\big(F(s) + G(s)u\big). \tag{35} \]
Based on Equation (34), the control input $\hat{u}(s)$ can be approximated by
\[ \hat{u}(s) = -\frac{1}{2} R^{-1} G^T(s) (\nabla\phi(s))^T \hat{W}. \tag{36} \]
Through Equations (31) and (35), we define the HJB equation error caused by the critic NN in the approximation process as
\[ \varepsilon = \rho d_M^2(s) + s^T Q_T s - \alpha \hat{W}^T\phi + B_r(s) + \hat{W}^T\nabla\phi F - \frac{1}{4}\hat{W}^T\nabla\phi\varrho(\nabla\phi)^T\hat{W}. \tag{37} \]
The estimation error of the weights of the critic NN is defined as
\[ \tilde{W} = W - \hat{W}. \tag{38} \]
The HJB approximation error can then be written as
\[ \varepsilon = -\tilde{W}^T\xi + \frac{1}{4}\tilde{W}^T\nabla\phi\varrho(\nabla\phi)^T\tilde{W} + \omega, \tag{39} \]
where $\xi = \nabla\phi\big(F(s) + G(s)\hat{u}\big) - \alpha\phi(s)$. The Lyapunov function candidate $J_1(s)$ is given in Assumption 4, and the indicator function $\Pi(s, \hat{u})$ is defined as
\[ \Pi(s, \hat{u}) = \begin{cases} 0, & \text{if } \nabla J_1^T(s)\big(F(s) + G(s)\hat{u}\big) < 0, \\ 1, & \text{else}. \end{cases} \tag{40} \]
We choose $\hat{W}$ to minimize the squared residual $E = \frac{1}{2}\varepsilon^T\varepsilon$ and thereby drive the HJB approximation error $\varepsilon$ toward its minimum. The gradient descent method is used as the critic weight tuning law
\[ \dot{\hat{W}} = -\beta\bar{\xi}\big( L(s) + Y(s) + B_r(s) + \rho d_M^2(s) \big) + \frac{\beta}{2}\Pi(s, \hat{u})\nabla\phi\varrho\nabla J_1(s) + \beta\big( K_1\theta^T K_2 - A(s) \big)\hat{W}, \tag{41} \]
where $\bar{\xi} = \xi/(1 + \xi^T\xi)^2$, $\theta = \xi/(1 + \xi^T\xi)$, $L(s) = \hat{W}^T\nabla\phi F - \alpha\hat{W}^T\phi + s^T Q_T s$, $Y(s) = -\frac{1}{4}\hat{W}^T\nabla\phi\varrho(\nabla\phi)^T\hat{W}$, and $A(s) = \frac{1}{4}\nabla\phi\varrho(\nabla\phi)^T\hat{W}\big(\theta/(1 + \xi^T\xi)\big)^T$; $\beta > 0$ is a learning rate that determines the convergence speed of the critic NN, and $K_1$ and $K_2$ are two tuning parameters.
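The core of the tuning law above is a normalized gradient-descent step on the squared Bellman residual. The sketch below keeps only that residual-driven term, omitting the barrier, indicator, and $K_1$/$K_2$ stabilizing terms; the toy residual and all numeric values are illustrative assumptions, not the paper's design.

```python
import numpy as np

def critic_step(W_hat, xi, residual, beta=1.5):
    """One normalized gradient-descent step: W <- W - beta * xi/(1+xi'xi)^2 * residual.
    This is the residual-driven part of the critic tuning law only."""
    xi_bar = xi / (1.0 + xi @ xi) ** 2
    return W_hat - beta * xi_bar * residual

# Toy linear residual e = xi^T W_hat - target: repeated steps shrink |e|.
xi, target = np.array([1.0, 0.5]), 0.3
W = np.array([2.0, -1.0])
for _ in range(200):
    W = critic_step(W, xi, xi @ W - target)
residual = abs(xi @ W - target)
```

The normalization by $(1+\xi^T\xi)^2$ is what keeps the effective step size bounded regardless of how large the regressor $\xi$ becomes, which matters for the boundedness argument in the stability proof.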
From the above description, the weight estimation error dynamics are deduced as
\[ \dot{\tilde{W}} = \beta\bar{\xi}\big( -\tilde{W}^T\xi + \tilde{Y}(s) + \omega \big) - \frac{\beta}{2}\Pi(s, \hat{u})\nabla\phi\varrho\nabla J_1 - \beta\big( K_1\theta^T K_2 - \tilde{A}(s) \big)(W - \tilde{W}), \tag{42} \]
where $\tilde{Y}(s) = \frac{1}{4}\tilde{W}^T\nabla\phi\varrho(\nabla\phi)^T\tilde{W}$ and $\tilde{A}(s) = \frac{1}{4}\nabla\phi\varrho(\nabla\phi)^T(W - \tilde{W})\big(\theta/(1 + \xi^T\xi)\big)^T$.
Theorem 2.
Consider the nominal system (13), the modified value function (14), and the tuning law (41). Provided that Assumptions 1–5 hold, the critic NN weight error $\tilde{W}$, the system state $x$, and the control input $u^*$ are guaranteed to be UUB.
Proof. 
Analyze the Lyapunov function candidate described below
\[ L(t) = V(s(t)) + \frac{1}{2}\tilde{W}^T \beta^{-1} \tilde{W}. \tag{43} \]
Differentiating Equation (43) yields
\[ \dot{L}(t) = \dot{V}(s(t)) + \tilde{W}^T \beta^{-1} \dot{\tilde{W}} = \dot{L}_V + \dot{L}_W. \tag{44} \]
The first term $\dot{L}_V$ is
\[ \begin{aligned} \dot{L}_V &= W^T \nabla\phi(s)\big(F(s) + G(s)\hat{u}\big) + \nabla\epsilon_v^T(s)\big(F(s) + G(s)\hat{u}\big) \\ &= W^T\Big( \nabla\phi(s)F(s) - \frac{1}{2}D_1\hat{W} \Big) + \epsilon_{v1}(s) \\ &= W^T\nabla\phi(s)F(s) + \frac{1}{2}W^T D_1\tilde{W} - \frac{1}{2}W^T D_1 W + \epsilon_{v1}(s) \\ &= W^T\sigma + \frac{1}{2}W^T D_1 \tilde{W} + \epsilon_{v1}(s), \end{aligned} \tag{45} \]
where $\epsilon_{v1}(s) = \nabla\epsilon_v^T(s)\big( F(s) - \frac{1}{2}G(s)R^{-1}G^T(s)(\nabla\phi(s))^T\hat{W} \big)$, $\sigma = \nabla\phi(s)\big(F(s) + G(s)u^*\big)$, and $D_1 = \nabla\phi\varrho(\nabla\phi)^T$.
The second term $\dot{L}_W$ can be obtained from (42)
\[ \begin{aligned} \dot{L}_W &= \tilde{W}^T \beta^{-1} \dot{\tilde{W}} \\ &= \tilde{W}^T \Big[ \bar{\xi}\big( -\tilde{W}^T\xi + \omega + \frac{1}{4}\tilde{W}^T D_1 \tilde{W} \big) - \Big( K_1\theta^T K_2 - \frac{D_1}{4}(W - \tilde{W})\frac{\theta^T}{m} \Big)(W - \tilde{W}) \Big] - \frac{1}{2}\Pi(s, \hat{u})\tilde{W}^T\nabla\phi\varrho\nabla J_1 \\ &= \tilde{W}^T \Big[ -\theta\tilde{W}^T\theta + \frac{\theta\omega}{m} + \frac{\theta\,\tilde{W}^T D_1 \tilde{W}}{4m} \Big] - \tilde{W}^T \Big[ \big( K_1\theta^T K_2 \big)(W - \tilde{W}) - \frac{D_1}{4}(W - \tilde{W})\Big(\frac{\theta}{m}\Big)^T (W - \tilde{W}) \Big] - c, \end{aligned} \tag{46} \]
where $m = 1 + \xi^T\xi$ and $c = \frac{1}{2}\Pi(s, \hat{u})\tilde{W}^T\nabla\phi\varrho\nabla J_1$. Further, we can obtain
\[ \begin{aligned} \dot{L}_W = {} & -(\tilde{W}^T\theta)^2 + \frac{\tilde{W}^T\theta\,\omega}{m} + \frac{\tilde{W}^T\theta\,\tilde{W}^T D_1 \tilde{W}}{4m} - \tilde{W}^T\big(K_1\theta^T K_2\big)W + \tilde{W}^T\big(K_1\theta^T K_2\big)\tilde{W} \\ & + \frac{\tilde{W}^T D_1 W \theta^T W}{4m} - \frac{\tilde{W}^T D_1 \tilde{W} \theta^T W}{4m} - \frac{\tilde{W}^T D_1 W \theta^T \tilde{W}}{4m} + \frac{\tilde{W}^T D_1 \tilde{W} \theta^T \tilde{W}}{4m} - c. \end{aligned} \tag{47} \]
Taking the sum of the terms $\dot{L}_V$ and $\dot{L}_W$, we obtain
\[ \begin{aligned} \dot{L}(t) = {} & W^T\sigma + \frac{1}{2}W^T D_1 \tilde{W} - (\tilde{W}^T\theta)^2 + \frac{\tilde{W}^T\theta\,\omega}{m} + \frac{\tilde{W}^T\theta\,\tilde{W}^T D_1 \tilde{W}}{4m} - \tilde{W}^T\big(K_1\theta^T K_2\big)W + \tilde{W}^T\big(K_1\theta^T K_2\big)\tilde{W} \\ & + \frac{\tilde{W}^T D_1 W \theta^T W}{4m} - \frac{\tilde{W}^T D_1 \tilde{W} \theta^T W}{4m} - \frac{\tilde{W}^T D_1 W \theta^T \tilde{W}}{4m} + \frac{\tilde{W}^T D_1 \tilde{W} \theta^T \tilde{W}}{4m} - c + \epsilon_{v1}(s). \end{aligned} \tag{48} \]
Let $Z = [\tilde{W}^T\theta, \tilde{W}^T]^T$; then we can obtain
\[ \dot{L}(t) = -Z^T \begin{bmatrix} I & \dfrac{W^T D_1}{8m} - \dfrac{K_1^T}{2} \\[4pt] \dfrac{D_1 W}{8m} - \dfrac{K_1}{2} & K_2\theta^T W - \dfrac{D_1}{4m} \end{bmatrix} Z + Z^T \left( \frac{\omega}{m} - \frac{D_1 W \theta^T W}{4m} + K_2 W - K_1\theta^T W \right) + b + d, \tag{49} \]
where
\[ b = -c + W^T\sigma + \epsilon_{v1}(s), \tag{50} \]
\[ d = \frac{W^T D_1 \big( \tilde{W}\theta^T\tilde{W} - \tilde{W}\theta^T W - W\theta^T\tilde{W} \big)}{4m}. \tag{51} \]
Define
\[ M = \begin{bmatrix} I & \dfrac{W^T D_1}{8m} - \dfrac{K_1^T}{2} \\[4pt] \dfrac{D_1 W}{8m} - \dfrac{K_1}{2} & K_2\theta^T W - \dfrac{D_1}{4m} \end{bmatrix}, \tag{52} \]
\[ a = \frac{\omega}{m} - \frac{D_1 W \theta^T W}{4m} + K_2 W - K_1\theta^T W. \tag{53} \]
Let the tuning parameters $K_1$, $K_2$, and $\gamma$ be chosen so that $M > 0$; then we obtain
\[ \dot{L}(t) < -\|Z\|^2 \sigma_{\min}(M) + \|a\| \|Z\| + \mu, \tag{54} \]
where $\mu = b + d$. In summary, the Lyapunov derivative $\dot{L}(t)$ is negative if
\[ \|Z\| > \frac{\|a\|}{2\sigma_{\min}(M)} + \sqrt{\frac{\mu}{\sigma_{\min}(M)} + \frac{\|a\|^2}{4\sigma_{\min}^2(M)}}. \tag{55} \]
Based on the Lyapunov theorem [39], as long as the tuning parameters $K_1$, $K_2$, and $\gamma$ are selected appropriately so that condition (55) holds, the critic NN weight error $\tilde{W}$, the system state $x$, and the control input $u^*$ are guaranteed to be UUB in the presence of state constraints and uncertain disturbances, and the closed-loop nonlinear system (1) is guaranteed to be stable. The proof is completed. □

5. Simulation

We consider a spring-mass-damper system with nonlinear properties [22]; the system dynamics are as follows [24]
\[ \dot{x}_1 = x_2, \qquad \dot{x}_2 = -\frac{K(x_2)}{M} - \frac{C}{M} x_1 + \frac{1}{M} u + p x_1 \sin(x_2), \tag{56} \]
where $x = [x_1, x_2]^T \in \mathbb{R}^2$, the nonlinear term is $K(x) = x^3$, $x_1$ and $x_2$ are the position and velocity, respectively, and $u$ is the force applied to the object. $M$ is the mass of the object, $K(\cdot)$ is the nonlinear spring stiffness term, and $C$ is the damping coefficient. The system parameters are $M = 1$ kg and $C = 0.5$ N·s/m. A mismatched disturbance may lead to system instability; so that the system retains stable performance under disturbances, a matched uncertain disturbance is selected, namely $d(x) = p x_1 \sin(x_2)$, where we assume $p \in [-1, 1]$ and $d_M(x) = \|x\|$.
In the simulation, since no initial admissible control law is required and the tracking errors of the system should converge to zero, a reference trajectory that gradually tends to zero is selected. The reference trajectory $x_d(t)$ is generated by
\[ \dot{x}_d = \begin{bmatrix} -0.5 x_{d1} - x_{d2}\cos(x_{d1}) \\ \sin(x_{d1}) - x_{d2} \end{bmatrix}, \tag{57} \]
with the initial condition $x_d(0) = [-0.15, 0.25]^T$. Setting the augmented state vector $s = [e_x^T, x_d^T]^T$ and combining (56) with (57), the dynamics of the augmented system can be derived as
\[ \dot{s} = \begin{bmatrix} s_2 + s_4 + 0.5 s_3 + s_4\cos(s_3) \\ -(s_2 + s_4)^3 - 0.5(s_1 + s_3) - \sin(s_3) + s_4 \\ -0.5 s_3 - s_4\cos(s_3) \\ \sin(s_3) - s_4 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} \big( u + d(s) \big), \tag{58} \]
where $s = [s_1, s_2, s_3, s_4]^T = [e_{x1}, e_{x2}, x_{d1}, x_{d2}]^T$ with $e_{xi} = x_i - x_{di}$.
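The reference generator (57) drives $x_d$ to zero, which is what makes a zero-converging tracking error meaningful without an initial admissible control law. A forward-Euler sketch of that decay, assuming the sign conventions that make the reference decay (consistent with the stated design goal); step size and horizon are illustrative:

```python
import math

def ref_step(xd, dt=0.01):
    """One forward-Euler step of the command generator x_d' = r(x_d)."""
    xd1, xd2 = xd
    dxd1 = -0.5 * xd1 - xd2 * math.cos(xd1)
    dxd2 = math.sin(xd1) - xd2
    return (xd1 + dt * dxd1, xd2 + dt * dxd2)

xd = (-0.15, 0.25)          # initial reference state x_d(0)
for _ in range(3000):       # integrate for 30 s
    xd = ref_step(xd)
final_norm = math.hypot(*xd)  # the reference has decayed essentially to zero
```

Linearizing at the origin gives eigenvalues with real part $-0.75$, so the reference decays roughly like $e^{-0.75 t}$, consistent with what the integration shows.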
To constrain the states of the system in the augmented dynamics (58), the following control barrier functions are used
\[ B_{r1}(s_1 + s_3) = -\log\left( \frac{\gamma h(s_1 + s_3)}{\gamma h(s_1 + s_3) + 1} \right), \qquad B_{r2}(s_2 + s_4) = -\log\left( \frac{\gamma h(s_2 + s_4)}{\gamma h(s_2 + s_4) + 1} \right), \tag{59} \]
where $s_1 + s_3 = x_1$ and $s_2 + s_4 = x_2$. The state constraints of the system are given as $-0.2 \le x_1 \le 0.35$ and $-0.15 \le x_2 \le 0.4$, and the parameter is $\gamma = 0.02$.
To complete the design of the robust trajectory tracking control, the modified value function (14) is specified as
$$V(s(t)) = \int_t^{\infty} e^{-\alpha(\tau - t)} \left[\rho\, d_M^2(s(\tau)) + s^T(\tau) Q_T s(\tau) + u^T(\tau) R u(\tau) + B_{r1}(s(\tau)) + B_{r2}(s(\tau))\right] d\tau.$$
We select the learning rate $\beta = 1.5$ and the discount factor $\alpha = 0.15$. To handle the approximate optimal control for the nominal augmented part of (58), we choose $Q_T = \mathrm{diag}\{5 I_2, 0_{2\times 2}\}$ and $R = I$, where $I$ denotes an identity matrix of appropriate dimensions. In this example, the activation function of the critic NN is chosen as $\phi(s) = [s_1^2, s_1 s_2, s_1 s_3, s_1 s_4, s_2^2, s_2 s_3, s_2 s_4, s_3^2, s_3 s_4, s_4^2]^T$, and the weights of the critic NN are denoted as $\hat{W}_c = [W_{c1}, W_{c2}, \ldots, W_{c10}]^T$. The initial state is given as $x(0) = [-0.2, 0.4]^T$; the initial error vector follows directly from $e_x(0) = x(0) - x_d(0)$, so the initial state of the augmented system is $s(0) = [-0.35, 0.15, 0.15, 0.25]^T$. To satisfy the persistency of excitation condition, an exploration noise $e^{-0.25 t}\sin^2(t)\cos(t)$ is added during the training of the neural network.
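The quadratic activation vector and its analytic Jacobian can be written out directly. The control law below assumes the usual ADP form $\hat{u} = -\tfrac{1}{2}R^{-1} g^T (\nabla\phi)^T \hat{W}_c$ with $g = [0, 1, 0, 0]^T$ from (58); that form is not restated in this excerpt, so the sketch is an assumption, and the function names are ours.

```python
import numpy as np

def phi(s):
    """Critic activation vector phi(s) from the text (10 quadratic monomials)."""
    s1, s2, s3, s4 = s
    return np.array([s1*s1, s1*s2, s1*s3, s1*s4, s2*s2,
                     s2*s3, s2*s4, s3*s3, s3*s4, s4*s4])

def phi_jac(s):
    """Analytic Jacobian d(phi)/d(s), shape (10, 4)."""
    s1, s2, s3, s4 = s
    return np.array([[2*s1, 0,    0,    0   ],
                     [s2,   s1,   0,    0   ],
                     [s3,   0,    s1,   0   ],
                     [s4,   0,    0,    s1  ],
                     [0,    2*s2, 0,    0   ],
                     [0,    s3,   s2,   0   ],
                     [0,    s4,   0,    s2  ],
                     [0,    0,    2*s3, 0   ],
                     [0,    0,    s4,   s3  ],
                     [0,    0,    0,    2*s4]], dtype=float)

def approx_control(s, Wc, R_inv=1.0):
    """Assumed ADP control form: u = -0.5 * R^{-1} * g^T * (dphi/ds)^T * Wc."""
    g = np.array([0.0, 1.0, 0.0, 0.0])
    return -0.5 * R_inv * (g @ (phi_jac(s).T @ Wc))
```

Since $R = I$ is scalar here, `R_inv = 1.0` suffices; with a converged weight vector, `approx_control` gives the near-optimal tracking control applied in the figures.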
The convergence of the critic parameters is shown in Figure 1; after 30 s they converge to $\hat{W} = [3.3767, 0.9606, 0.8867, 0.7752, 1.9266, 1.0686, 1.105, 1.067, 1.0992, 1.0898]^T$. Figure 2 shows the control input of the system. Figure 3 shows the trajectories of the tracking errors $e_{x1}$ and $e_{x2}$ without state constraints, and Figure 4 shows the tracking errors under state constraints. Figures 5 and 6 show the system tracking the reference trajectory without state constraints, where the system states can be seen to violate the constraints. Figures 7 and 8 show the system tracking the desired trajectory with state constraints: under state constraints and uncertain disturbances, the system still maintains good performance, and the proposed method ensures the stability of the closed-loop system. In summary, the simulation results demonstrate the effectiveness of the proposed method.
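The critic weight update rule is defined earlier in the paper and not restated in this excerpt. As a hypothetical illustration of how such weights are driven to convergence, the following is a standard normalized gradient-descent step on the discounted Bellman (Hamiltonian) residual, using the stated $\beta = 1.5$ and $\alpha = 0.15$; the authors' exact update may differ.

```python
import numpy as np

def critic_step(Wc, phi_s, dphi_sdot, r, alpha=0.15, beta=1.5):
    """One normalized gradient step that shrinks the Bellman residual
    delta = Wc^T sigma + r, where sigma = (dphi/ds) @ s_dot - alpha * phi(s)
    and r is the running cost (including the barrier terms).
    This is a generic ADP critic update, not the authors' exact rule."""
    sigma = dphi_sdot - alpha * phi_s
    delta = Wc @ sigma + r                     # approximate Hamiltonian residual
    return Wc - beta * delta * sigma / (1.0 + sigma @ sigma) ** 2
```

Iterating this step along excited trajectories is what produces weight histories like the one in Figure 1.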

6. Conclusions

This paper presented a robust trajectory tracking method for nonlinear systems with state constraints and uncertain disturbances based on adaptive dynamic programming. Firstly, the tracking error was combined with the reference trajectory to construct the augmented system, and the nominal system of the augmented system was considered. To overcome the uncertain disturbances of the augmented system, a discount coefficient was introduced into the nominal system, and the CBF was added to the discounted nominal system to constrain the states. In addition, the cost function and control strategy were learned by designing a guaranteed cost adaptive critic NN learning framework. Finally, the simulation results demonstrated that the proposed method drives the tracking error to convergence while keeping the states within the constraints. In future work, we will try to extend the state-constraint method to discrete-time tracking control systems and multi-agent systems.

Author Contributions

C.Q. and X.Q. provided methodology, validation, and writing—original draft preparation; J.W. and D.Z. provided conceptualization, writing—review, and supervision; C.Q. provided funding support. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant (U1504615), the Youth Backbone Teachers in Colleges and Universities of Henan Province (2018GGJS017), and the Science and Technology Research Project of the Henan Province (222102240014).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors can confirm that all relevant data are included in the article.

Conflicts of Interest

The authors declare that they have no conflict of interest. All authors have approved the manuscript and agreed with submission to this journal.

References

  1. Xie, R.; Tian, X.; Zhang, Q.; Pan, X. Research on longitudinal control algorithm for intelligent automatic driving. In Proceedings of the 2021 IEEE 4th International Conference on Automation, Electronics and Electrical Engineering (AUTEEE), Shenyang, China, 19–21 November 2021; pp. 664–667.
  2. Chen, Y.; Peng, H.; Grizzle, J. Obstacle Avoidance for Low-Speed Autonomous Vehicles with Barrier Function. IEEE Trans. Control Syst. Technol. 2018, 26, 194–206.
  3. Molnar, T.G.; Cosner, R.K.; Singletary, A.W.; Ubellacker, W.; Ames, A.D. Model-Free Safety-Critical Control for Robotic Systems. IEEE Robot. Autom. Lett. 2022, 7, 944–951.
  4. Ferraguti, F.; Landi, C.T.; Costi, S.; Bonfe, M.; Fantuzzi, C. Safety barrier functions and multi-camera tracking for human–robot shared environment. Robot. Auton. Syst. 2020, 124, 103388.
  5. Liu, D.; Xue, S.; Zhao, B.; Luo, B.; Wei, Q. Adaptive Dynamic Programming for Control: A Survey and Recent Advances. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 142–160.
  6. Zhang, H.; Zhang, X.; Luo, Y. An overview of research on adaptive dynamic programming. Acta Autom. Sin. 2013, 39, 303–311.
  7. Wang, F.Y.; Zhang, H.; Liu, D. Adaptive dynamic programming: An introduction. IEEE Comput. Intell. Mag. 2009, 4, 39–47.
  8. Vamvoudakis, K.G.; Lewis, F.L. Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 2010, 46, 878–888.
  9. Mehraeen, S.; Dierks, T.; Jagannathan, S.; Crow, M.L. Zero-Sum Two-Player Game Theoretic Formulation of Affine Nonlinear Discrete-Time Systems Using Neural Networks. IEEE Trans. Cybern. 2013, 43, 1641–1655.
  10. Munos, R.; Baird, L.; Moore, A. Gradient descent approaches to neural-net-based solutions of the Hamilton–Jacobi–Bellman equation. In Proceedings of the International Joint Conference on Neural Networks, Washington, DC, USA, 10–16 July 1999; pp. 2152–2157.
  11. Wei, Q.; Liu, D.; Lin, H. Value iteration adaptive dynamic programming for optimal control of discrete-time nonlinear systems. IEEE Trans. Cybern. 2016, 46, 840–853.
  12. Wei, Q.; Liao, Z.; Yang, Z.; Li, B.; Liu, D. Continuous-time time-varying policy iteration. IEEE Trans. Cybern. 2020, 50, 4958–4971.
  13. Kiumarsi, B.; Lewis, F.L.; Modares, H.; Karimpour, A.; Naghibi, S. Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics. Automatica 2014, 50, 1167–1175.
  14. Wei, Q.; Song, R.; Yan, P. Data-Driven Zero-Sum Neuro-Optimal Control for a Class of Continuous-Time Unknown Nonlinear Systems With Disturbance Using ADP. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 444–458.
  15. Modares, H.; Lewis, F.L. Linear Quadratic Tracking Control of Partially-Unknown Continuous-Time Systems Using Reinforcement Learning. IEEE Trans. Autom. Control 2014, 59, 3051–3056.
  16. Qin, C.; Zhang, H.; Luo, Y. Online optimal tracking control of continuous-time linear systems with unknown dynamics by using adaptive dynamic programming. Int. J. Control 2014, 87, 1000–1009.
  17. Cui, L.; Xie, X.; Wang, X.; Luo, Y.; Liu, J. Event-triggered single-network ADP method for constrained optimal tracking control of continuous-time non-linear systems. Appl. Math. Comput. 2019, 352, 220–234.
  18. Wang, D.; Liu, D.; Zhang, Y.; Li, H. Neural network robust tracking control with adaptive critic framework for uncertain nonlinear systems. Neural Netw. 2018, 97, 11–18.
  19. Zhao, J.; Na, J.; Gao, G. Robust tracking control of uncertain nonlinear systems with adaptive dynamic programming. Neurocomputing 2022, 471, 21–30.
  20. Wang, D.; Liu, D.; Li, H.; Ma, H. Adaptive dynamic programming for infinite horizon optimal robust guaranteed cost control of a class of uncertain nonlinear systems. In Proceedings of the American Control Conference, Chicago, IL, USA, 1–3 July 2015; pp. 2900–2905.
  21. Modares, H.; Lewis, F.L.; Jiang, Z. H∞ Tracking Control of Completely Unknown Continuous-Time Systems via Off-Policy Reinforcement Learning. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 2550–2562.
  22. Wang, F.; Gu, Q.; Huang, B.; Wei, Q.; Zhou, T. Adaptive Event-Triggered Near-Optimal Tracking Control for Unknown Continuous-Time Nonlinear Systems. IEEE Access 2022, 10, 9506–9518.
  23. Modares, H.; Lewis, F.L. Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning. Automatica 2014, 50, 1780–1792.
  24. Wang, D.; Mu, C. Adaptive-Critic-Based Robust Trajectory Tracking of Uncertain Dynamics and Its Application to A Spring-Mass-Damper System. IEEE Trans. Ind. Electron. 2018, 65, 654–663.
  25. Lu, K.; Liu, C.; Sun, J.; Li, C.; Ma, C. Compensator-based approximate optimal control for affine nonlinear systems with input constraints and unmatched disturbances. Trans. Inst. Meas. Control 2020, 42, 3024–3034.
  26. Liu, D.; Wang, D.; Wang, F.; Li, H.; Yang, X. Neural-Network-Based Online HJB Solution for Optimal Robust Guaranteed Cost Control of Continuous-Time Uncertain Nonlinear Systems. IEEE Trans. Cybern. 2014, 44, 2834–2847.
  27. Yang, X.; Liu, D.; Wei, Q. Online approximate optimal control for affine nonlinear systems with unknown internal dynamics using adaptive dynamic programming. IET Control Theory Appl. 2014, 8, 1676–1688.
  28. Zhu, Y.; Zhao, D.; Li, X. Using reinforcement learning techniques to solve continuous-time nonlinear optimal tracking problem without system dynamics. IET Control Theory Appl. 2016, 10, 1339–1347.
  29. Marvi, Z.; Kiumarsi, B. Safe Off-policy Reinforcement Learning Using Barrier Functions. In Proceedings of the 2020 American Control Conference (ACC), Denver, CO, USA, 1–3 July 2020; pp. 2176–2181.
  30. Yang, Y.; Ding, D.; Xiong, H.; Yin, Y.; Wunsch, D. Online barrier-actor-critic learning for H∞ control with full-state constraints and input saturation. J. Frankl. Inst. 2020, 357, 3316–3344.
  31. Ames, A.D.; Grizzle, J.W.; Tabuada, P. Control barrier function based quadratic programs with application to adaptive cruise control. In Proceedings of the 53rd IEEE Conference on Decision and Control, Los Angeles, CA, USA, 15–17 December 2014; pp. 6271–6278.
  32. Yang, Y.; Vamvoudakis, K.; Modares, H. Safe reinforcement learning for dynamical games. Int. J. Robust Nonlinear Control 2020, 30, 3706–3726.
  33. Srinivasan, M.; Coogan, S. Control of Mobile Robots Using Barrier Functions Under Temporal Logic Specifications. IEEE Trans. Robot. 2021, 37, 363–374.
  34. Cohen, M.; Belta, C. Approximate Optimal Control for Safety-Critical Systems with Control Barrier Functions. In Proceedings of the 59th IEEE Conference on Decision and Control (CDC), Jeju, Korea, 14–18 December 2020; pp. 2062–2067.
  35. Ames, A.D.; Coogan, S.; Egerstedt, M.; Notomista, G.; Sreenath, K.; Tabuada, P. Control Barrier Functions: Theory and Applications. In Proceedings of the 18th European Control Conference (ECC), Naples, Italy, 25–28 June 2019; pp. 3420–3431.
  36. Xu, J.; Wang, J.; Rao, J.; Zhong, Y.; Wang, H. Adaptive dynamic programming for optimal control of discrete-time nonlinear system with state constraints based on control barrier function. Int. J. Robust Nonlinear Control 2022, 32, 3408–3424.
  37. Yang, X.; Liu, D.; Wei, Q.; Wang, D. Guaranteed cost neural tracking control for a class of uncertain nonlinear systems using adaptive dynamic programming. Neurocomputing 2016, 198, 80–90.
  38. Lewis, F.L.; Vrabie, D.L.; Syrmos, V.L. Optimal Control, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2012; pp. 277–279.
  39. Lewis, F.L.; Jagannathan, S.; Yesildirek, A. Neural Network Control of Robot Manipulators and Nonlinear Systems; Taylor & Francis: London, UK, 1999.
Figure 1. Convergence of parameters of the critic NN.
Figure 2. The control input of the system.
Figure 3. Tracking error of system without state constraints ($p = 0.8$).
Figure 4. Tracking error of system with state constraints ($p = 0.8$).
Figure 5. Trajectory of the system state $x_1$ without state constraints ($p = 0.8$).
Figure 6. Trajectory of the system state $x_2$ without state constraints ($p = 0.8$).
Figure 7. Trajectory of the system state $x_1$ with state constraints ($p = 0.8$).
Figure 8. Trajectory of the system state $x_2$ with state constraints ($p = 0.8$).