Constrained DNN-Based Robust Model Predictive Control Scheme with Adjustable Error Tube

Yang, Shizhong; Liu, Yanli; Cao, Huidong

doi:10.3390/sym15101845

by Shizhong Yang, Yanli Liu * and Huidong Cao

College of Information and Control Engineering, Qingdao University of Technology, Qingdao 266525, China

* Author to whom correspondence should be addressed.

Symmetry 2023, 15(10), 1845; https://doi.org/10.3390/sym15101845

Submission received: 17 August 2023 / Revised: 22 September 2023 / Accepted: 26 September 2023 / Published: 29 September 2023

Abstract:

This paper proposes a novel robust model predictive control (RMPC) scheme for constrained linear discrete-time systems with bounded disturbance. Firstly, the adjustable error tube set, which is affected by local error and error variety rate, is introduced to overcome uncertainties and disturbances. Secondly, the auxiliary control rate associated with the cost function is designed to minimize the discrepancy between the actual system and the nominal system. Finally, a constrained deep neural network (DNN) architecture with symmetry properties is developed to address the optimal control problem (OCP) within the constrained system while conducting a thorough convergence analysis. These innovations enable more flexible adjustments of state and control tube cross-sections and significantly improve optimization speed compared to the homothetic tube MPC. Moreover, the effectiveness and practicability of the proposed optimal control strategy are illustrated by two numerical simulations. In practical terms, for 2-D systems, this approach achieves a remarkable 726.23-fold improvement in optimization speed, and for 4-D problems, it demonstrates an even more impressive 7218.07-fold enhancement.

Keywords:

symmetry; constrained system; robust model predictive control; deep neural network; bounded disturbance

1. Introduction

Over the past few years, RMPC has gained wide acceptance in practical applications, including trajectory tracking, industrial process control, and energy systems [1,2,3]. Its successful implementation across these areas stems from its prominent advantages. In particular, RMPC provides an integrated solution for controlling systems with model uncertainty, additive disturbance, and constraints. This feature has attracted considerable theoretical attention to the analysis and synthesis of different forms of RMPC, and several RMPC algorithms have been investigated in the literature [4,5,6,7].

Recently, the application requirements for considering practical constraints and the realization environment prompted increasing attention of RMPC towards new orientations. For instance, the increasing demand for algorithms underscores the need to integrate optimization performance and control robustness, propelling the development of tube-based MPC (TMPC) [8,9,10]. The deployment of tubes draws forth a set of strictly set theoretic strategies for RMPC synthesis, which consider a computationally efficient treatment of uncertainties and their interaction with the system dynamics and constraints. In [11,12], a class of linear systems with bounded disturbance and convex constraint separated the nominal system from the actual system by adopting a separation control strategy. What is noteworthy is that the conservatism of the proposal employing this construction in [13,14] was caused by deploying the fixed tube cross-section shape sets. To mitigate this conservatism, the homothetic tube model predictive control (HTMPC) strategy proposed in [15,16] explored the impact of disturbances by constructing locally accurate reachable sets centered around nominal system trajectories. In light of these developments, the concept of HTMPC emerged as an enhanced and more adaptable framework for RMPC synthesis. Among the array of control schemes considered, HTMPC stands out as an improved and more versatile option. What sets it apart is its capacity to parameterize the cross-sections of the state tube and control tube in terms of associated centers and scaling sequences. 
This study investigates the concept further: the tube size controller and the auxiliary control law are designed by considering variations in the state error and changes in the value of the cost function during error adjustment. This distinguishes it from [15], which incorporates scaling vector optimization into the OCP, thereby increasing computational complexity, and aims to optimize the scaling vectors to a specific value. It is essential to note, however, that an inherent drawback of the HTMPC approach lies in its computational complexity, which grows significantly with an increasing number of constraints, as measured by the proliferation of polytopic regions.

Furthermore, the issue of computational complexity is generally associated with dynamic programming in the presence of constraints and uncertainties, which inspires the development of parameterized RMPC [17,18,19]. The parameterized optimization problem is commonly approximated using neural network (NN) or DNN to enhance computational efficiency [20,21]. Certain studies even turned to symmetric neural networks (SNNs) due to their unique properties [22]. SNNs, characterized by symmetric weight initialization and activation functions, demonstrated their ability to accelerate convergence and improve the robustness of neural network-based approaches [23]. Some studies adopted an offline approach to generate nominal systems [24,25]. While effective in reducing online computation time, this method leans toward a more conservative control strategy in highly uncertain scenarios, necessitating a trade-off with control performance. Additionally, other studies considered system uncertainty by establishing a linear variable parameter system model [26,27]. This approach facilitates adaptive learning to address system changes and uncertainties, making it better suited for handling variations in the variable parameter system. However, applying this technique in complex, large-scale systems demands substantial computational resources for the training and inference of DNN, potentially leading to real-time control delays. As the field of online learning technology continues to mature, its integration with RMPC holds promise for enhancing the real-time capabilities and scalability of the control scheme. Notably, previous studies employed reinforcement learning techniques to solve linear quadratic regulator and MPC problems, providing convergence proofs for associated issues [28,29]. 
Advanced deep reinforcement learning algorithms further demonstrated their potential within an RMPC framework, emphasizing the iterative interaction between optimal control actions and performance indices [30,31]. These instances underscore the capacity of online learning techniques to address quadratic programming problems. Therefore, the integration of online learning techniques, including deep neural networks (DNNs) with a symmetric architecture, holds immense potential in enhancing the real-time capabilities and scalability of robust model predictive control (RMPC). Our proposed approach, which leverages the computational power of GPUs for real-time acquisition of time-varying nominal system information, not only ensures real-time control performance, but also optimizes efficiency.

Building upon the above research, it is not difficult to find that a promising approach involves incorporating tubes with increased degrees of freedom into the optimization process while employing function approximation and online learning techniques within the framework of RMPC to enhance computational efficiency. The main contributions of the paper are three-fold:

A fuzzy-based tube size controller is investigated to adjust the local error tube-scaling vector. Specifically, the controller is designed by considering the state error between the nominal and the actual systems; the error and error variety rate bounds are then established, and the fuzzy IF-THEN rules are derived. The tightened sets on state error are developed to satisfy the system constraints in the case of external disturbances and model uncertainties.
An auxiliary control law pertaining to the scaling vector of the error tube is designed. The auxiliary control law effectively mitigates the impact of interference on the system by considering variations in the system’s cost function.
A theoretically rigorous and technically achievable framework for RMPC with online parameter estimation, based on a constrained DNN with symmetry properties to improve computing performance, is developed: the OCP is defined based on the parameters of online learning; the DNN structure is expanded using Dykstra’s projection algorithm to ensure the feasibility of the successor state and control input; a time-varying nominal system is generated on this basis to fulfill the requirements of system robustness.

The remainder of this paper is organized as follows: Preliminaries and the problem formulation are presented in Section 2. In Section 3, a novel RMPC scheme is developed based on the fuzzy-based tube size controller and the constrained DNN algorithm. Section 4 provides two numerical examples to illustrate the feasibility and effectiveness of the proposed control scheme. In Section 5, some conclusions are drawn.

2. Preliminaries and Problem Formulation

2.1. Nomenclatures

The set of non-negative reals is denoted by $ℝ$; $ℓ_{N}$ is a sequence of non-negative integers, $ℓ_{N} ≜ \{0, 1, 2, \dots, N\}$. For a set $A$ and a real matrix $M$ of compatible dimensions, the image of $A$ under $M$ is denoted by $M A = \{M a : a \in A\}$. Given two subsets $C$ and $B$ of $ℝ^{n}$ and $x \in ℝ^{n}$, the Minkowski set addition is defined by $C \oplus B ≜ \{c + b \mid c \in C, b \in B\}$ and the Minkowski set subtraction is defined by $C ⊖ B ≜ \{c \mid c \oplus B \subseteq C\}$. $x \oplus C$ is used as shorthand for $\{x\} \oplus C$. For $M > 0$ and $x \in ℝ^{n}$, define ${‖ x ‖}_{M}^{2} = x^{T} M x$. The distance of a point $x \in ℝ^{n}$ from a point $z \in ℝ^{n}$ is denoted by $d (x, z) = | x - z |$. $\mathrm{Conv} \{\cdot\}$ denotes the convex hull of the elements in $\{\cdot\}$. For an unknown vector $v$, the notation $v^{*}$ represents its optimal value.
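The two set operations above can be illustrated with a minimal sketch, assuming that every set is an axis-aligned box (one interval per coordinate). This restriction, the `Box` type, and the helper names `minkowski_sum` and `minkowski_diff` are illustrative assumptions and are not taken from the paper, which works with general polytopes.

```python
from typing import List, Tuple

# Assumption: sets are axis-aligned boxes, one (lower, upper) interval per coordinate.
Box = List[Tuple[float, float]]


def minkowski_sum(c: Box, b: Box) -> Box:
    """C (+) B = {c + b : c in C, b in B}, computed interval by interval."""
    return [(cl + bl, cu + bu) for (cl, cu), (bl, bu) in zip(c, b)]


def minkowski_diff(c: Box, b: Box) -> Box:
    """C (-) B = {x : x (+) B is a subset of C}, computed interval by interval."""
    out = [(cl - bl, cu - bu) for (cl, cu), (bl, bu) in zip(c, b)]
    if any(lo > hi for lo, hi in out):
        raise ValueError("the Minkowski difference is empty")
    return out


# Hypothetical state constraint X and error tube cross-section E.
X = [(-10.0, 10.0), (-5.0, 5.0)]
E = [(-1.0, 1.0), (-0.5, 0.5)]

tightened = minkowski_diff(X, E)        # tightened set X (-) E
restored = minkowski_sum(tightened, E)  # adding E back recovers X for boxes
```

For boxes, adding `E` back to the tightened set recovers `X` exactly; for general convex sets only the inclusion $(C ⊖ B) \oplus B \subseteq C$ holds.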

2.2. Problem Formulation

Consider a discrete-time linear system with bounded disturbance (the actual system) of the form

$$x_{k + 1} = A x_{k} + B u_{k} + w_{k}, \quad k \in ℓ_{N}, \tag{1}$$

where $N$ is the horizon length, and $x_{k} \in ℝ^{n}$ and $u_{k} \in ℝ^{m}$ are the state vector and the control input of the actual system subject to the bounded disturbance $w_{k}$. The disturbance $w_{k} \in ℝ^{n}$ takes values in the set $W \subseteq ℝ^{n}$, and $x_{k + 1}$ denotes the successor state of the actual system. The system variables are selected such that the following constraints are satisfied:

$$x_{k} \in X \subseteq ℝ^{n}, \quad u_{k} \in U \subseteq ℝ^{m}, \quad w_{k} \in W \subseteq ℝ^{n}, \quad k \in ℓ_{N}, \tag{2}$$

where $X$ and $U$ are compact and convex sets, each containing the origin as an interior point. The compact set $W$ contains the origin.

Let the nominal (reference) system without any disturbance corresponding to (1) be defined by

$$z_{k + 1} = A z_{k} + B v_{k}, \quad k \in ℓ_{N}, \tag{3}$$

where $z_{k} \in ℝ^{n}$ and $v_{k} \in ℝ^{m}$ are the state and control input of the nominal system without accounting for any uncertainty, respectively, and $z_{k + 1}$ denotes the desired value of the successor state in system (1).

The state error is represented as

$$e_{k} = x_{k} - z_{k}, \quad k \in ℓ_{N}. \tag{4}$$

Assumption 1.

The matrix pair $(A, B) \in ℝ^{n \times n} \times ℝ^{n \times m}$ is known and stabilizable;
The state $x_{k}$ can be measured at each sample time;
The current disturbance $w_{k} \in W$ and future disturbances $w_{k + i} \in W, i = 1, 2, \dots, N - 1$ are not known and can take arbitrary values.

In this paper, the fixed shape set of the error tube is denoted by $E$. For any non-empty set $E \subseteq ℝ^{n}$, the error tube is a sequence of sets $E_{N} = \{E_{k}\}$, where $E_{k}$ is given by

$$E_{k} = α_{k} E, \quad k \in ℓ_{N}, \quad \text{with } α_{k} \in ℝ, \tag{5}$$

where $α_{k}$ is the scaling vector. Meanwhile, for each relevant $k \in ℓ_{N}$, the state tube $X_{N}$ and control tube $U_{N - 1}$ corresponding to HTMPC [18] are indirectly determined by

$$X_{N} = \{z_{k} (e_{k - 1})\} \oplus E_{N}, \quad k \in ℓ_{N}, \tag{6}$$

$$U_{N - 1} = \{v_{k} (e_{k})\} \oplus K E_{N}, \quad k \in ℓ_{N}, \tag{7}$$

where $\{z_{k} (e_{k - 1})\}$ and $\{v_{k} (e_{k})\}$ are the sequences of state tube and control tube centers determined by the state error $e$, and $K \in ℝ^{m \times n}$ is the disturbance rejection gain [32]. The corresponding control policy is a sequence of control laws $Π_{N - 1} = \{π_{k} (e_{k}, E_{k}, U_{k})\}$ with

$$\forall e_{k} \in α_{k} E, \quad π_{k} (e_{k}, E_{k}, U_{k}) = v_{k} (e) + K e_{k}, \quad k \in ℓ_{N - 1}. \tag{8}$$

From Equations (5)–(8), it is clear that, given the set $E$, the error tube $E_{N}$, state tube $X_{N}$, control tube $U_{N - 1}$, and control policy $Π_{N - 1}$ are determined by the sequences $\{e_{k} \in ℝ^{n}\}$ and $\{v_{k} \in ℝ^{m}\}$. Consequently, a decision variable $φ_{N} = (e_{0}, \dots, e_{N}, v_{0}, \dots, v_{N - 1}) \in ℝ^{N (n + m + 1)}$ is introduced.

Subsequently, the OCP $ℙ_{N} (e)$ is defined by

$$V_{N}^{0} (e) = \inf_{φ_{N}} \{V_{N} (φ_{N}) : φ_{N} \in Φ_{N} (e)\}, \tag{9}$$

$$d_{N}^{0} (e) = \arg \inf_{φ_{N}} \{V_{N} (φ_{N}) : φ_{N} \in Φ_{N} (e)\}, \tag{10}$$

where the cost function $V_{N} (\cdot)$ is defined by

$$V_{N} (φ_{N}) = \min \sum_{k = 0}^{N - 1} ℓ (e_{k}, v_{k}) + V_{f} (e_{N}), \tag{11}$$

with

$$ℓ (e_{k}, v_{k}) = {‖ e_{k} ‖}_{Q_{e}}^{2} + {‖ v_{k} ‖}_{Q_{v}}^{2}, \quad k \in ℓ_{N}, \tag{12}$$

and

$$V_{f} (e_{N}) = {‖ e_{N} ‖}_{P}^{2}. \tag{13}$$

Here, $ℓ (e_{k}, v_{k})$ is the stage cost, which is employed to achieve the desired control performance. The terminal cost $V_{f} (e_{N})$ ensures stability and recursive feasibility. $Q_{e} \in ℝ^{n \times n}$, $Q_{v} \in ℝ^{m \times m}$, and $P \in ℝ^{n \times n}$ are known positive definite symmetric matrices. For any $x_{k} \in X \subseteq ℝ^{n}$, the set of permissible decision variables $φ_{N}$ corresponds to the value of the set-valued map $Φ_{N} (e)$ as $Φ_{N} (e) := \{φ_{N} : \text{(14)–(20) hold for all } k \in ℓ_{N - 1}\}$, where

$$e_{0} \in E, \tag{14}$$

$$(A + B K) e \oplus W \subseteq α_{k + 1} E, \tag{15}$$

$$\forall e_{k} \in α_{k} E, \tag{16}$$

$$\{υ (e_{k})\} \in U ⊖ α_{k} K E, \tag{17}$$

$$\{z (e_{k - 1})\} \in X ⊖ α_{k} E, \tag{18}$$

$$A x_{k} + B u_{k} \oplus W \subseteq z (e_{k}) \oplus α_{k + 1} E, \tag{19}$$

$$E_{N} \in E_{f}, \tag{20}$$

where $E_{f} \subseteq ℝ^{n + 1}$ is the terminal constraint set [33] for $ℙ_{N} (e)$.

Similar to the tube MPC principle [8], if $z_{k}$ satisfies $X ⊖ α_{k + 1} E$ and $v_{k}$ satisfies $U ⊖ K E$, then the constraints imposed on the actual system state $x \in X$ and control input $u \in U$ are also met. In this work, the determination of $z_{k}$ is related to $e_{k - 1}$, while the determination of $v_{k}$ is concerned with $e_{k}$; thus, it is imperative to satisfy both conditions $\{z (e_{k - 1})\} \in X ⊖ α_{k} E$ and $\{υ (e_{k})\} \in U ⊖ α_{k} K E$. Furthermore, at step $N$, if $E_{N}$ fulfills the terminal constraint $E_{N} \in E_{f}$ (Equation (21) provides the formulation and limitation of $E_{f}$), the system state is guaranteed to comply with the requirement $x_{N} \in X$.

Constraints (15) and (19) represent the set dynamics of the error tube and the homothetic state tube, respectively, which contribute to dynamic relaxation in [8]. In addition, the terminal constraint set $E_{f}$ satisfies the following constraint:

$$(A + B K) E_{f} \subset E_{f}. \tag{21}$$

The performance evaluation of the terminal control necessitates the definition of a 0-step homothetic tube controllability set $X_{0}$ [15], which must satisfy the following constraint:

$$X_{0} = \mathrm{Proj}_{ℝ^{n}} \{(x, z, α) : z \in X ⊖ α E \mid K z \in U ⊖ α E\}, \tag{22}$$

where $\mathrm{Proj}_{ℝ^{n}} (Z)$ denotes the projection of a set $Z \subset ℝ^{n + m}$ onto $ℝ^{n}$, that is, $\mathrm{Proj}_{ℝ^{n}} (Z) = \{x \in ℝ^{n} : \exists y \in ℝ^{m} \text{ such that } (x, y) \in Z\}$.

2.3. Controller Synthesis

The objective of this paper is to design an optimal control policy $u_{k}$ based on any given initial state error $e_{0}$, which not only renders the local state $x_{k}$ asymptotically tracking the reference state $z_{k}$, namely $e_{k}$ asymptotically approaching zero, but also minimizes the OCP. The problem of solving for the conventional control policy $u_{k}$ of (1) is converted into finding the nominal optimal control input $v_{k} (e)$ and designing an appropriate disturbance rejection gain $K$ while ensuring that the constraints related to $α_{k}$ are satisfied.

The controller synthesis for the proposed RMPC scheme is specified as

$$u_{k} = υ (e_{k}) + K e_{k}, \quad k \in ℓ_{N}, \tag{23}$$

where $u_{k}$ is the control action obtained from the presented method. The ancillary control law is denoted as $K e_{k}$, which keeps the local state $x_{k}$ within the error tube centered around the trajectory of $z_{k}$. $υ (e_{k})$ is the output obtained by online learning with state errors as input.

Consider the error system obtained by combining Equations (1), (3), and (4):

$$e_{k + 1} = A e_{k} + B (u_{k} - v_{k}) + w_{k}, \quad k \in ℓ_{N}, \tag{24}$$

where $e_{k + 1}$ is the successor state error. Using (23), system (24) is rewritten as

$$e_{k + 1} = (A + B K) e_{k} + w_{k}, \quad k \in ℓ_{N}. \tag{25}$$
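The step from (24) to (25) can be checked numerically with a short sketch. The matrices `A`, `B`, the gain `K`, the disturbance bound, and the zero nominal input below are hypothetical values chosen only for illustration; they are not the systems simulated in the paper.

```python
import random


def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def add(*vs):
    """Element-wise sum of equally sized vectors."""
    return [sum(c) for c in zip(*vs)]


# Hypothetical double-integrator data and stabilising gain (assumptions).
A = [[1.0, 1.0], [0.0, 1.0]]
B = [[0.5], [1.0]]
K = [[-0.4, -1.2]]

rng = random.Random(0)
x, z = [1.0, 0.0], [0.5, 0.0]  # actual and nominal initial states
max_gap = 0.0
for k in range(20):
    e = [xi - zi for xi, zi in zip(x, z)]            # state error (4)
    v = [0.0]                                        # placeholder nominal input
    u = add(v, matvec(K, e))                         # control law (23)
    w = [rng.uniform(-0.1, 0.1) for _ in range(2)]   # bounded disturbance
    x_next = add(matvec(A, x), matvec(B, u), w)      # actual system (1)
    z_next = add(matvec(A, z), matvec(B, v))         # nominal system (3)
    e_pred = add(matvec(A, e), matvec(B, matvec(K, e)), w)  # closed-loop error (25)
    e_next = [a - b for a, b in zip(x_next, z_next)]
    max_gap = max(max_gap, max(abs(a - b) for a, b in zip(e_next, e_pred)))
    x, z = x_next, z_next
```

Up to floating-point rounding, the error computed from the two simulated systems coincides with the prediction of (25), which is why `max_gap` stays at the level of machine precision.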

3. DNN-Based RMPC with a Fuzzy-Based Tube Size Controller

This section presents the design of the novel RMPC scheme, which incorporates updates to scaling and policy iteration for nominal control. The innovative RMPC framework consists of a fuzzy-based tube size controller and a constrained DNN-based nominal RMPC component. The former calculates the error tube-scaling vector by considering both state error and error variety rate, while the latter determines a sequence of constraints associated with scaling to ensure optimal control policy generation. Concurrently, the DNN-based nominal RMPC offers a time-varying nominal system that exhibits enhanced computational efficiency. Moreover, by incorporating variations in the cost function value into the auxiliary control law design, it effectively mitigates the adverse effects of interference on the system.

3.1. Error Tube and Constraint Satisfaction

In this work, fuzzy control is used to estimate (predict) the corresponding error tube-scaling vector $α_{k}$, which keeps the OCP $ℙ_{N} (e)$ computationally feasible. In addition, an auxiliary control law $K e_{k}$ pertaining to the scaling vector of the error tube is designed. The auxiliary control law effectively mitigates the impact of interference on the system by considering variations in the system’s cost function.

Assumption 2.

The error tube cross-section shape set $E \subset ℝ^{n}$ (i.e., an outer invariant approximation of the minimal robust positively invariant set [34]) is compact, convex, and contains the origin such that $\{(A + B K) e : e \in E\} \oplus W \subseteq α_{k + 1} E$, $k \in ℓ_{N - 1}$;
The state tube cross-section shape set $ℤ$ is given by $ℤ = \mathrm{Conv} \{ℤ (e) : e \in E\}$;
The control tube cross-section shape set $V$ is given by $V = \mathrm{Conv} \{v (e) : e \in E\}$.

If $E$ satisfies Assumption 2, then for any established $α_{k} \in ℝ$, it holds that $e_{k} \in α_{k} E$. Further, the nominal state and control input are restrained indirectly as $ℤ (e) \in ℤ$ and $v (e) \in V$. It is clear that if $v (e) \in V$ for all $e_{k} \in α_{k} E$, then satisfaction of the original constraint $u \in U$ for all $w \in W$ is guaranteed by using the control scheme $u_{k} (e) = υ (e_{k}) + K e_{k}$.

Next, the fuzzy-based tube size controller is employed to estimate the error tube scaling; it generates the scaling vector by considering the local error and the error variety rate. The components of the fuzzy controller [35] include a set of fuzzy IF-THEN rules and a fuzzy inference engine. The fuzzy inference engine uses the IF-THEN rules to map the input error $e \in ℝ^{n}$ and error variety rate $e_{c} \in ℝ^{n}$ to an output variable $α \in ℝ$. The lower and upper bounds of $e$ and $e_{c}$ are represented as $\pm e_{0}$ and $\pm e_{0}^{'}$, respectively. Furthermore, the two-dimensional graph spanned by $e$ and $e_{c}$ is divided into nine distinct regions, as depicted in Figure 1. The upper and lower limits of $e$ and $e_{c}$ define these regions. Each region, denoted as $G_{i}$, corresponds to a specific IF-THEN rule. Based on the provided input, the fuzzy controller determines the region of the graph in which a given pair of values of $e$ and $e_{c}$ is located. Subsequently, it employs the IF-THEN rules to calculate the appropriate scaling variables. Taking $G_{3 +}$ as an illustrative example, when $e \geq e_{0}$ and $e_{c} \geq e_{0}^{'}$, the system’s state error shows a relatively high positive deviation with a gradual increase. In such circumstances, the controller outputs a diminished value of $α$, ensuring that the system’s state error exhibits a tightening trend.

To be specific, fuzzy IF-THEN rules are written as

($G_{1}$). IF $e \leq - e_{0}$ and $e_{c} \geq e_{0}^{'}$ or $e \geq e_{0}$ and $e_{c} \leq - e_{0}^{'}$ THEN $α$ takes on a smaller value;
($G_{2}$). IF $| e | \leq e_{0}$ and $e_{c} \geq e_{0}^{'}$ or $| e | \leq e_{0}$ and $e_{c} \leq - e_{0}^{'}$ THEN $α$ takes on a slightly larger value;
($G_{3}$). IF $e \leq - e_{0}$ and $e_{c} \leq | e_{0}^{'} |$ or $e \geq e_{0}$ and $e_{c} \leq | e_{0}^{'} |$ THEN $α$ takes a value as small as possible;
($G_{4}$). IF $| e_{c} | \leq e_{0}^{'}$ and $e \geq e_{0}$ or $| e_{c} | \leq e_{0}^{'}$ and $e \leq - e_{0}$ THEN $α$ takes on a larger value;
($G_{5}$). IF $| e | < e_{0}$ and $| e_{c} | < e_{0}^{'}$ THEN $α$ takes a value as large as possible.

For convenience, let the universe of $e$ be $[a, b]$ and the universe of $e_{c}$ be $[c, d]$. The membership degree function is taken as the triangular function. Then, a singleton fuzzifier and an average center defuzzifier [36] are used to calculate the output $α$ based on the feedback values of $e$ and $e_{c}$ in the form of

$$α = \frac{\sum_{i = 1}^{5} η_{i} μ_{i} (e, e_{c})}{\sum_{i = 1}^{5} μ_{i} (e, e_{c})}, \tag{26}$$

where $μ_{i} (\cdot)$ is the membership degree of the $i$-th of the five cases mentioned above, and $η_{i}$ is an adjustable weight parameter of $α$ for the corresponding case. Afterward, the successor value of $α$ is determined by

$$α_{k + 1} = α_{k} + τ, \tag{27}$$

with

$$τ = \max_{λ} \{λ \mid W \subseteq λ E\}. \tag{28}$$
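The weighted-average defuzzification in (26) can be sketched as follows for a scalar error. Everything numerical here is an assumption made for illustration: the membership shapes (a triangular "small" set with saturated ramps for "negative" and "positive"), the product used to combine memberships, the reading of $G_{1}$ and $G_{3}$ as the opposite-sign and same-sign corner regions of Figure 1, and the weights `eta`. None of these values are given in the paper.

```python
def memberships(x: float, bound: float):
    """Return (negative, small, positive) membership degrees of x.

    `small` is triangular with peak at 0 and value 0.5 at |x| == bound;
    `negative` and `positive` are saturated ramps. The three degrees sum to 1.
    """
    s = max(-1.0, min(1.0, x / (2.0 * bound)))  # normalised, saturated input
    return max(0.0, -s), 1.0 - abs(s), max(0.0, s)


def fuzzy_alpha(e, e_c, e0=1.0, e0_rate=1.0, eta=(0.8, 1.2, 0.5, 1.5, 2.0)):
    """Scaling alpha as the weighted average (26) over the five rule groups."""
    e_neg, e_small, e_pos = memberships(e, e0)
    c_neg, c_small, c_pos = memberships(e_c, e0_rate)
    mu = (
        e_neg * c_pos + e_pos * c_neg,  # G1: error and rate of opposite sign
        e_small * (c_neg + c_pos),      # G2: small error, large rate
        e_neg * c_neg + e_pos * c_pos,  # G3: error and rate of the same sign
        (e_neg + e_pos) * c_small,      # G4: large error, small rate
        e_small * c_small,              # G5: small error, small rate
    )
    return sum(h * m for h, m in zip(eta, mu)) / sum(mu)


alpha_centre = fuzzy_alpha(0.0, 0.0)    # only G5 fires
alpha_growing = fuzzy_alpha(5.0, 5.0)   # only G3 fires
alpha_mixed = fuzzy_alpha(0.7, -0.3)    # several rules fire at once
```

With these hypothetical weights the output always lies between the smallest and the largest `eta` value, so the rule weights directly bound the range of the tube scaling.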

Theorem 1.

Given system (1) controlled with the control policy $u ≜ \{u_{0}, u_{1}, \dots, u_{N - 1}\}$, the state error $e_{k}$ is restricted to the error tube $α_{k} E$. Specifically, the design of the disturbance rejection gain ensures that $e_{k} \to 0$ as $k \to \infty$ for all $w \in W$.

Lemma 1 ([37]).

$$s_{1}^{T} F s_{2} + s_{2}^{T} F s_{1} \leq s_{1}^{T} F s_{1} + s_{2}^{T} F s_{2},$$

where $s_{1}, s_{2} \in ℝ^{n}$ are any vectors and $F \in ℝ^{n \times n}$ is a positive definite matrix.
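Although the lemma is quoted from [37], a one-line justification may help the reader; the following is a standard argument and is not reproduced from the paper. It only assumes that $F$ is positive (semi)definite and that $s_{1}$ and $s_{2}$ have the same dimension:

```latex
0 \le (s_{1} - s_{2})^{T} F (s_{1} - s_{2})
  = s_{1}^{T} F s_{1} - s_{1}^{T} F s_{2} - s_{2}^{T} F s_{1} + s_{2}^{T} F s_{2}
\;\Longrightarrow\;
s_{1}^{T} F s_{2} + s_{2}^{T} F s_{1} \le s_{1}^{T} F s_{1} + s_{2}^{T} F s_{2} .
```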

Proof of Theorem 1.

Consider the error system (25). The disturbance rejection gain $K$ guarantees that $e_{k}$ is constrained to lie inside the set $α_{k} E$, i.e., $x_{k} \in z_{k} + α_{k} E$. Since the nominal system (3) has robust stability, the nominal state $z_{k}$ should converge to the origin, $d (z_{k}, 0) \to 0$. Then, the state error $e_{k}$ must converge to the error tube $α_{k} E$ because $x_{k} \in z (e_{k - 1}) + α_{k} E$, namely $d (e_{k}, α_{k} E) \to 0$. Finally, the state error $e_{k}$ is restricted to a variable error tube $α_{k} E$ centered at the origin by implementing the ancillary control law $K e_{k}$.

Here, the disturbance rejection gain $K$ is obtained by solving the following equation:

$$K^{T} B^{T} H I_{n} + H I_{n} B K + 2 {‖ B K ‖}_{H I_{n}}^{2} + (3 τ^{2} + 1) \cdot H I_{n} = - Q, \tag{29}$$

where $H$ is determined by $H = {(V_{N} (d_{N}))}^{T} V_{N} (d_{N})$, $Q$ is a positive definite matrix, and $I_{n}$ denotes the identity matrix with the same dimension as the state vector $x_{k}$. For convenience, let $H I_{n} = P_{V}$.

Then, $P$ is the solution to the following Lyapunov equation:

$$ε^{2} {‖ A + B K ‖}_{P}^{2} - P = - Q_{e} - {‖ K ‖}_{Q_{v}}^{2}, \tag{30}$$

with

$$ε \in \left(1, \frac{1}{\mathrm{Eig}_{\max} (A + B K)}\right), \tag{31}$$

where $\mathrm{Eig}_{\max} (\cdot)$ is the maximum eigenvalue of the matrix.
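One possible way to compute $P$ from (30) numerically is a fixed-point iteration, sketched below. The sketch reads ${‖ A + B K ‖}_{P}^{2}$ as $(A + B K)^{T} P (A + B K)$ and ${‖ K ‖}_{Q_{v}}^{2}$ as $K^{T} Q_{v} K$, which is an interpretation of the notation rather than something stated in the paper. The matrices, the gain, and $ε = 1.2$ are hypothetical; for this gain the largest eigenvalue of $A + B K$ is 0.6, so $ε$ lies inside the interval (31).

```python
def matmul(X, Y):
    """Matrix product for nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def madd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def solve_scaled_lyapunov(A, B, K, Qe, Qv, eps, iters=500):
    """Iterate P <- M^T P M + Qbar with M = eps (A + B K), Qbar = Qe + K^T Qv K.

    The iteration converges when the spectral radius of M is below one.
    """
    M = [[eps * a for a in row] for row in madd(A, matmul(B, K))]
    Qbar = madd(Qe, matmul(transpose(K), matmul(Qv, K)))
    P = Qbar
    for _ in range(iters):
        P = madd(matmul(transpose(M), matmul(P, M)), Qbar)
    return P, M, Qbar


# Hypothetical data (assumptions, not the paper's examples).
A = [[1.0, 1.0], [0.0, 1.0]]
B = [[0.5], [1.0]]
K = [[-0.4, -1.2]]
Qe = [[1.0, 0.0], [0.0, 1.0]]
Qv = [[1.0]]

P, M, Qbar = solve_scaled_lyapunov(A, B, K, Qe, Qv, eps=1.2)

# Residual of (30): M^T P M - P + Qbar should vanish at the solution.
lhs = madd(matmul(transpose(M), matmul(P, M)), Qbar)
residual = [[l - p for l, p in zip(rl, rp)] for rl, rp in zip(lhs, P)]
```

A dedicated discrete Lyapunov solver would normally be preferred for larger systems; the plain iteration is used here only to keep the sketch dependency-free.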

The Lyapunov candidate function is represented as

$$V_{L} = e_{k}^{T} P_{V} e_{k}, \quad k \in ℓ_{N}. \tag{32}$$

Consider the first difference

$$Δ V_{L} = e_{k + 1}^{T} P_{V} e_{k + 1} - e_{k}^{T} P_{V} e_{k}. \tag{33}$$

By substituting Equation (25) into Equation (33), one obtains

$$\begin{aligned} Δ V_{L} = {} & {(B K e_{k})}^{T} P_{V} e_{k} + {(B K e_{k})}^{T} P_{V} B K e_{k} + {(B K e_{k})}^{T} P_{V} w_{k} + w_{k}^{T} P_{V} e_{k} + w_{k}^{T} P_{V} B K e_{k} \\ & + w_{k}^{T} P_{V} w_{k} + e_{k}^{T} P_{V} B K e_{k} + e_{k}^{T} P_{V} w_{k}. \end{aligned} \tag{34}$$

Applying Lemma 1 to the cross terms involving $w_{k}$, it follows that

$$Δ V_{L} \leq {(B K e_{k})}^{T} P_{V} e_{k} + e_{k}^{T} P_{V} e_{k} + e_{k}^{T} P_{V} B K e_{k} + 2 {(B K e_{k})}^{T} P_{V} B K e_{k} + 3 w_{k}^{T} P_{V} w_{k}, \tag{35}$$

or, equivalently,

$$Δ V_{L} \leq e_{k}^{T} (K^{T} B^{T} P_{V} + 2 {‖ B K ‖}_{P_{V}}^{2}) e_{k} + e_{k}^{T} P_{V} e_{k} + e_{k}^{T} P_{V} B K e_{k} + 3 w_{k}^{T} P_{V} w_{k}. \tag{36}$$

According to Equation (28), the disturbance is bounded by $τ$ as $w_{k} \leq τ e_{k}$. We have

$$Δ V_{L} \leq e_{k}^{T} (K^{T} B^{T} P_{V} + 2 {‖ B K ‖}_{P_{V}}^{2} + P_{V} B K + (3 τ^{2} + 1) \cdot P_{V}) e_{k}. \tag{37}$$

By substituting Equation (29) into Inequality (37), we further obtain

$$Δ V_{L} \leq e_{k}^{T} (- Q) e_{k}. \tag{38}$$

It is clear that $Δ V_{L} \leq 0$; thus, the function (32) is decreasing, and $e_{k} \to 0$ as $k \to \infty$. □

This section shows that optimal cross-sections of the error tube are calculated online by considering the adjustable tube-scaling parameters $α_{k}$, which are affected by a combination of the error and the error variety rate. Theorem 1 shows that the successor estimation of the actual system has a non-increasing estimation error at each time step. The design of the fuzzy-based tube size controller and the auxiliary control law considers both variations in state error and changes in the value of the cost function during error adjustment, unlike [15], which incorporates optimization of the scaling vector $α_{k}$ into the OCP, thereby increasing computational complexity, and aims to optimize the scaling vectors to a specific value. In addition, we find that an appropriate choice of how the nominal system is acquired can improve prediction accuracy. In [24,25], however, an invariable nominal system is considered during prediction. To improve the control performance, our main concern here is to define a parameter estimation scheme that generates a time-varying nominal system based on the DNN algorithm and still enables a computationally tractable RMPC algorithm, which is presented in the following.

3.2. Design of DNN-Based Nominal RMPC

This section focuses on designing the DNN-based nominal RMPC to construct a parameter estimation synthesis that provides a time-varying nominal system for the control scheme. The cost function for the constrained system proposed in conventional RMPC is reformulated as an online learning problem by introducing a series of reference control inputs $v_{k} = υ^{θ} (e_{k})$ parameterized by $θ$. The modified OCP $ℙ_{N}^{θ} (e)$, solved online, is defined by

$$V_{N}^{θ} (e) = \inf_{φ_{N}} \sum_{k = 0}^{N - 1} \left({‖ e_{k} ‖}_{Q_{e}}^{2} + {‖ υ^{θ} (e_{k}) ‖}_{Q_{v}}^{2}\right) + {‖ e_{N} ‖}_{P}^{2}, \tag{39}$$

$$φ_{N}^{θ} (e) = \arg \inf_{φ_{N}} \sum_{k = 0}^{N - 1} \left({‖ e_{k} ‖}_{Q_{e}}^{2} + {‖ υ^{θ} (e_{k}) ‖}_{Q_{v}}^{2}\right) + {‖ e_{N} ‖}_{P}^{2}, \tag{40}$$

$$\text{s.t.} \quad \forall e_{k} \in α_{k} E, \quad k \in ℓ_{N - 1}, \tag{41}$$

$$(A + B K) e_{k} \oplus W \subseteq α_{k + 1} E, \quad k \in ℓ_{N - 1}, \tag{42}$$

$$\{υ^{θ} (e_{k})\} \in U ⊖ α_{k} K E, \quad k \in ℓ_{N - 1}, \tag{43}$$

$$\{z (e_{k})\} \in X ⊖ α_{k + 1} E, \quad k \in ℓ_{N - 1}, \tag{44}$$

$$A x_{k} + B u_{k} \oplus W \subseteq z (e_{k}) \oplus α_{k + 1} E, \quad k \in ℓ_{N - 1}, \tag{45}$$

$$E_{N} \in E_{f}. \tag{46}$$

The parameters $θ$ are updated along the gradient $\nabla_{θ} V_{N}^{θ} (e)$ of the cost function by adopting the policy gradient method. In the architecture of the constrained DNN-based nominal RMPC, the state errors ($e_{k}, \dots, e_{N}$) are used as input, and the optimal control policy $υ^{θ} (e_{k})$ is produced as the output of the DNN.

This paper employs a DNN with inherent symmetry, featuring symmetric weights that facilitate efficient parameter sharing. Consequently, the network demands fewer computational resources than conventional network structures, which is advantageous in resource-constrained environments. Assuming the network has $L$ hidden layers, layers 1 and $L$ each consist of $i$ neurons. The architecture of the deep neural network is illustrated in Figure 2.

The superiority of the network architecture employed in this paper over a typical neural network structure is demonstrated in Table 1.

From Table 1, it can be observed that in symmetric neural networks, the number of weights to be calculated is reduced since each connection is computed only once and then shared. Notably, despite having fewer parameters, deep neural networks with symmetric structures achieve higher accuracy under the same computational resources. Regarding convergence, symmetric neural networks require 35.79% fewer iterations than conventional neural networks.

The output of the DNN-based nominal RMPC is formulated as

$$υ^{θ} (e_{k}) = δ \left(\sum_{i = 1}^{m} W^{L} a^{L - 1} + b^{L}\right), \tag{47}$$

where the linear relationship coefficient matrix and bias vector between the hidden layer and the output layer are denoted as $W \in ℝ^{n \times m}$ and $b \in ℝ^{n \times 1}$, respectively. The affine function parameters $θ = \{W_{1 : L}, b_{1 : L}\}$ are to be optimized, and $δ$ is a rectified linear unit function. The output value of the hidden layer is $a^{L} \in ℝ^{m \times 1}$, and the input $a^{1}$ is set to $e_{k}$.

Since the neural network may output a potentially infeasible $υ^{θ} (e_{k})$ for a given error $e_{k}$, Dykstra’s projection algorithm [38] is introduced to ensure that subsequent states and controls remain feasible. Its structure is shown in Figure 3.

Theorem 2.

By applying Dykstra’s projection algorithm, the optimal control input $υ^{θ} (e_{k})$ converges to the orthogonal projection of $υ^{θ} (e_{k})$ onto the polytope $U ⊖ α_{k} K E$ as $t \to \infty$.

Proof of Theorem 2.

First, define the orthogonal projection of $υ^{θ} (e_{k})$ onto the polytope $U ⊖ α_{k} K E$ as $P (υ^{θ} (e_{k}))$. A series of variables $v^{(k, t)}$ and $I^{(k, t)}$ is generated from the DNN structure, which is extended by Dykstra’s projection algorithm. The algorithm then iterates as

$$v^{(k, t)} = P (υ^{θ} (e_{k})) (v^{(k - 1, t)} - I^{(k, t - 1)}), \tag{48}$$

$$I^{(k, t)} = v^{(k, t)} - (v^{(k - 1, t)} - I^{(k, t - 1)}). \tag{49}$$

Assume that the starting condition of the algorithm is $v^{(0, 0)} = v (e_{0})$ and $I^{(0, 0)} = 0$. As $t \to \infty$, we have $I^{(k, t)} = I^{(k, t - 1)}$, and it is clear that $v^{(k, t)} = v^{(k + 1, t)}$ (i.e., the nominal control input $υ^{θ} (e_{k})$ converges to $P (υ^{θ} (e_{k}))$). □
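The iteration in (48) and (49) follows the general pattern of Dykstra’s algorithm: project, then store the increment that the projection removed and feed it back on the next pass. A minimal sketch is given below, assuming that the feasible set is the intersection of a box and a single half-space; these two sets, the helper names, and the test point are illustrative assumptions, whereas the paper projects onto the polytope $U ⊖ α_{k} K E$.

```python
def project_box(v, lo, hi):
    """Orthogonal projection onto the box {x : lo <= x <= hi}."""
    return [min(max(x, l), h) for x, l, h in zip(v, lo, hi)]


def project_halfspace(v, a, b):
    """Orthogonal projection onto the half-space {x : a . x <= b}."""
    violation = sum(ai * xi for ai, xi in zip(a, v)) - b
    if violation <= 0.0:
        return list(v)
    norm_sq = sum(ai * ai for ai in a)
    return [xi - violation * ai / norm_sq for xi, ai in zip(v, a)]


def dykstra(v0, projections, iters=200):
    """Dykstra's alternating projections onto an intersection of convex sets."""
    x = list(v0)
    # One correction (negative increment) vector per set, initialised to zero.
    corrections = [[0.0] * len(v0) for _ in projections]
    for _ in range(iters):
        for i, proj in enumerate(projections):
            y = [xi + ci for xi, ci in zip(x, corrections[i])]
            x = proj(y)
            corrections[i] = [yi - xi for yi, xi in zip(y, x)]
    return x


# Hypothetical feasible set: box [-1, 1]^2 intersected with x1 + x2 <= 1.
sets = [
    lambda v: project_box(v, [-1.0, -1.0], [1.0, 1.0]),
    lambda v: project_halfspace(v, [1.0, 1.0], 1.0),
]
v_raw = [2.0, 2.0]             # infeasible network output (assumed)
v_feasible = dykstra(v_raw, sets)
```

For this test point the result is the point $(0.5, 0.5)$, which is the closest feasible point to `v_raw`. Unlike plain alternating projections, the stored corrections are what make the limit the orthogonal projection onto the intersection in general.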

Thus, given a state error $e_{k}$, the control policy outputs $P (υ^{θ} (e_{k})) = f (e_{k}; θ)$.

According to the policy gradient theory presented in [39], the gradient of the value function

\nabla_{θ} V_{N}^{θ} (e)

with respect to the policy parameters θ is

\nabla_{θ} V_{N}^{θ} (e) = E_{v} [V_{N}^{θ} (e) \nabla_{θ} \log ϕ (v_{t}; f (e_{t}; θ_{t}), Σ)],

(50)

where

ϕ (v_{t}; f (e_{t}; θ_{t}), Σ)

is a multivariate Gaussian probability density function used to sample control inputs

υ^{θ} (e_{k})

, centered at the DNN output

f (e_{k}; θ)

with diagonal covariance Σ. The covariance Σ is annealed to 0 by the end of training, so that the sampled input reduces to the deterministic control policy.

The neural network parameters iterate by using stochastic gradient descent as

θ_{t + 1} = θ_{t} - γ_{t} V_{N}^{θ} (e) \nabla_{θ_{t}} \log ϕ (v_{t}; f (e_{t}; θ_{t}), Σ) .

(51)

The learning rate

γ_{t}

of the DNN is chosen to be a positive number.
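A single update of the form (51) can be sketched as follows. To keep the gradient explicit, the example replaces the DNN by a hypothetical linear policy f(e; θ) = θe, and the scalar `value` is a stand-in cost rather than the paper's V_N; both are assumptions made purely for illustration.

```python
import numpy as np

def gaussian_score(v, e, theta, sigma_diag):
    """Gradient of log N(v; theta @ e, diag(sigma_diag)) with respect to theta."""
    mean = theta @ e
    return np.outer((v - mean) / sigma_diag, e)

def policy_gradient_step(theta, e, v, value, sigma_diag, gamma):
    """One update theta <- theta - gamma * V * grad_theta log phi, as in (51)."""
    return theta - gamma * value * gaussian_score(v, e, theta, sigma_diag)

rng = np.random.default_rng(1)
theta = np.zeros((1, 2))                  # hypothetical linear policy parameters
e = np.array([1.0, 2.0])                  # state error
sigma_diag = np.array([0.25])             # diagonal of Sigma (variances)
v = theta @ e + np.sqrt(sigma_diag) * rng.standard_normal(1)  # sampled control
value = float(e @ e + 0.01 * (v @ v))     # stand-in cost, not the paper's V_N
theta_next = policy_gradient_step(theta, e, v, value, sigma_diag, gamma=0.01)
```

As Σ is annealed toward zero the sampled `v` concentrates on the policy output, which matches the annealing described above.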

The termination criterion for the iteration is defined as

| θ_{t + 1} - θ_{t} | \leq | V_{N}^{θ} (e) - V_{N - 1}^{θ} (e) | .

(52)

Rather than constructing a set of polytopic regions, the proposed approach uses function approximation and reinforcement learning to learn an approximately optimal control policy directly. Furthermore, because function approximation yields unbiased estimates of the gradient with respect to the parameter θ, the policy gradient method guarantees that the control action converges to a locally optimal solution. The proposed optimization method significantly enhances the computational performance of the system control while ensuring the feasibility of control inputs.

3.3. The Feedback Mechanism of the Control Synthesis

In this paper, the feedback loop encompasses the state error, the state error variety rate, and the cost function, as illustrated in Figure 4. Specifically, the state error and error variety rate are passed to the fuzzy controller, which yields the error scaling vector associated with the constraints at the next time step. Simultaneously, the state error enters the optimization of the cost function. The resulting cost function value is then fed back into the auxiliary control law, thereby determining the auxiliary control rate for the next sampling time.

The computational complexity of the proposed algorithm and HTMPC is compared in Table 2, where

q_{X}

,

q_{U}

,

q_{E}

, and

q_{E_{f}}

denote the numbers of affine inequalities of the irreducible representation of the sets

X

,

U

,

E

, and

E_{f}

employed in the proposed scheme;

q_{S}

and

q_{G_{f}}

are the numbers of affine inequalities of the irreducible representation of the state homothetic set and the terminal constraint set, respectively.

Table 2 shows that delegating the computation of the scaling vector to the fuzzy controller not only accounts more fully for the impact of the state error and error variety rate on error tube scaling, but also reduces the number of decision variables and inequality constraints in the optimization. Furthermore, the symmetric constrained DNN structure avoids the exponential growth in the number of polyhedral regions that must be constructed as the number of constraints increases. Consequently, the proposed algorithm substantially reduces computational complexity while enhancing the flexibility of system control.
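The counts in Table 2 can be evaluated directly for a given problem size. The sketch below simply encodes the table's formulas; the facet counts passed in the usage lines (the `q_*` arguments) are hypothetical values chosen for illustration and are not taken from the paper.

```python
def proposed_counts(N, n, m, q_X, q_U, q_E, q_Ef):
    """Decision variables and inequality constraints of the proposed approach (Table 2)."""
    variables = N * (m + n) + n
    constraints = N * (q_X + q_U + q_E) + q_Ef
    return variables, constraints

def htmpc_counts(N, n, m, q_X, q_U, q_S, q_Gf):
    """Same counts for HTMPC, plus the upper bound on the number of critical regions."""
    variables = N * (m + n + 1) + n + 1
    constraints = N * (q_X + q_U + q_S + 1) + q_S + q_Gf
    return variables, constraints, 2 ** constraints

# Example 1 dimensions (N = 12, n = 2, m = 1); the q_* facet counts are hypothetical
prop_vars, prop_cons = proposed_counts(12, 2, 1, q_X=4, q_U=2, q_E=6, q_Ef=6)
ht_vars, ht_cons, ht_region_bound = htmpc_counts(12, 2, 1, q_X=4, q_U=2, q_S=6, q_Gf=8)
```

Even for these small hypothetical numbers the bound on critical regions is astronomically large, which is the motivation for avoiding an explicit polyhedral partition.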

3.4. The DNN-Based RMPC with a Fuzzy-Based Tube Size Controller Structure

To recapitulate, the proposed RMPC scheme comprises a fuzzy-based tube size controller and a DNN-based nominal RMPC part. The fuzzy-based tube size controller is employed to adjust error tube scaling. Meanwhile, the tightened sets (i.e., the minimal disturbance invariant set with an adjustable parameter

α_{k}

) and disturbance rejection gain

K

are computed online to restrain state error. Then, the DNN nominal RMPC is used to generate the time-varying nominal system in the case that the constraints associated with

α

are satisfied. The scheme provides a theoretically rigorous and technically achievable framework for RMPC with online parameter estimation that improves computational performance.

In this paper, we obtain the error tube shape set E by computing the minimum robust positively invariant set using the method described in [34]. Moreover, we set the error bound

e_{0}

to 3.7 and bounded the rate of error variation

e_{0}^{'}

by 2.5. As parameter

α

typically ranges between 1 and 2, we design the fuzzy rule table as shown in Table 3. According to Table 3, a data pair

(e_{0}, e_{0}^{'})

determines a reasonable value for

α

, which subsequently dictates

α_{k + 1}

’s value according to Equation (27). The determination of

α_{k + 1}

in turn fixes the associated constraint sets

α_{k + 1} E

,

α_{k + 1} K E

,

X ⊖ α_{k + 1} E

, and

U ⊖ α_{k + 1} K E

. Additionally, the disturbance rejection gain and the terminal cost function can be determined from Equations (29) and (30). To meet the specified constraints, the constrained time-varying nominal trajectories are computed by a constrained DNN extended with Dykstra’s projection algorithm. The actual system then tracks the nominal trajectory while satisfying the relevant constraints. Section 4 discusses the DNN parameter settings, which depend on the dimensionality of the input and output variables. Algorithm 1 gives the main procedure of the proposed control scheme, and its overall structure is presented in Figure 5.
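The effect of the scaling parameter on the tightened sets is easiest to see when both sets are axis-aligned boxes, in which case the Pontryagin difference has a closed form. The paper works with general polytopes, so the box shapes, the radii, and the function name `tighten_box` below are simplifying assumptions for illustration only.

```python
import numpy as np

def tighten_box(lower, upper, radius, alpha):
    """Pontryagin difference [lower, upper] (-) alpha * [-radius, radius]
    for axis-aligned boxes: shrink each side by alpha * radius."""
    lower, upper, radius = (np.asarray(a, dtype=float) for a in (lower, upper, radius))
    lo = lower + alpha * radius
    hi = upper - alpha * radius
    if np.any(lo > hi):
        raise ValueError("tightened set is empty for this alpha")
    return lo, hi

# Hypothetical state box and error-tube radii; alpha plays the role of alpha_{k+1}
lo, hi = tighten_box([-10.0, -2.0], [10.0, 2.0], radius=[0.5, 0.2], alpha=1.5)
```

A larger scaling value enlarges the error tube and therefore leaves a smaller feasible region for the nominal trajectory, which is the trade-off the fuzzy controller adjusts online.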

Algorithm 1 DNN-based RMPC with a fuzzy-based tube size controller

Given initial conditions

e_{0} = 0

,

α_{0} = 1

and weighting matrices

Q_{e}

,

Q_{v}

, determine the set

E

.

Compute the terminal weight matrix P and disturbance rejection gain

K

by using (29) and (30).

1: Randomly initialize

θ

2: Set learning rate

γ

3: for each time instant k = 0,1,2,…,N do

4: Compute the polytopes

α_{k} E

,

α_{k} K E

,

X ⊖ α_{k} E

and

U ⊖ α_{k} K E

5: if constraints (41)–(46) are satisfied then

6: repeat: calculate

θ_{t + 1}

by using (51)

7: until convergence

8: else

9: let

e_{k + 1} = e_{k}

10: end if

11: Solve the optimization problem (39) and (40) based on

θ_{t + 1}

to obtain

v_{k}^{*} (e_{k})

,

12: Compute the error variety rate

e_{c}

and the corresponding scaling vector

α

, then obtain the successor scaling vector

α_{k + 1}

by using (27),

13: Calculate the control input as

u_{k} (e) = υ^{θ} (e_{k}) + K e_{k}

, and then apply

u_{k}

to the system.

14: end for
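The tracking structure behind step 13 (nominal input plus error feedback) can be sketched as a closed-loop simulation on the double integrator of Example 1. This is not the full Algorithm 1: the gain `K` is a hypothetical stabilizing gain rather than the one computed from (29), and `nominal_input` is a simple placeholder standing in for the projected DNN output.

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
K = np.array([[-0.66, -1.33]])   # hypothetical stabilizing gain, not from (29)

def nominal_input(z):
    """Placeholder for the DNN policy: a saturated linear law on the nominal state."""
    return np.clip(K @ z, -0.5, 0.5)

def simulate(x0, steps=30, w_max=0.1, seed=0):
    """Run the actual and nominal systems side by side and record the error."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)        # actual state
    z = np.array(x0, dtype=float)        # nominal state
    errors = []
    for _ in range(steps):
        e = x - z
        v = nominal_input(z)
        u = v + K @ e                    # nominal input plus error feedback
        w = rng.uniform(-w_max, w_max, size=2)
        x = A @ x + B @ u + w            # actual system with bounded disturbance
        z = A @ z + B @ v                # disturbance-free nominal system
        errors.append(x - z)
    return np.array(errors)

errors = simulate([-4.0, -2.0])
max_error = np.abs(errors).max()
```

Because the error obeys e⁺ = (A + BK)e + w regardless of the nominal input, `max_error` stays bounded for any stabilizing K, which is the property the error tube captures.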

4. Simulations and Comparison Study

In this section, the advantages of Algorithm 1 are illustrated by simulation examples of 2-D and 4-D systems. The simulations were conducted in Matlab, and the polyhedral constraint sets were constructed using Mosek and the MPT toolbox. Subsequently, the convex optimization problem of the actual system was solved. Deep learning toolboxes were employed to train the neural networks that determine the optimal control inputs of the nominal system.

Example 1.

Consider a 2-D double integrator discrete-time system in the form of (1) with

A = [\begin{matrix} 1 & 1 \\ 0 & 1 \end{matrix}], B = [\begin{matrix} 0.5 \\ 1 \end{matrix}] .

(53)

The state constraints are

x \in X ≜ {[10, - 2] \times [- 10, 2]}

, the disturbance is bounded as

w \in W ≜ {w | {‖ w ‖}_{\infty} \leq 0.1}

, and the control constraint is

u \in U ≜ {u | | u | \leq 1}

. The performance index function is defined in (39)–(46) with

Q_{e} = I_{2}

and

Q_{v} = 0.01

, the terminal cost

V_{f} (e)

is the value function

{‖ e_{N} ‖}_{P}^{2}

, where P is calculated from (30). Then, the disturbance rejection gain

K

is computed by using (29). The set

E

is computed as a polytope. The horizon length is selected as

N = 12

. The system is simulated using the initial condition

x_{0} = z_{0} = (- 4, - 2)

and

α_{0} = 1

; the value of

α_{k + 1}

is given by Equation (27). To select the neural network architecture, Figure 6 compares the nominal state trajectories obtained with different network structures and different numbers of layers.

Figure 6 shows that, with a symmetric DNN with six hidden layers, the nominal state trajectories reach the desired values more rapidly (i.e., they reach the origin by the 12th sampling time).

The state trajectories for the proposed RMPC scheme are indicated in Figure 7. The solid line represents the state trajectory of the nominal system (3), while the dash-dot line is the state trajectory of the actual system (1). The error tube

α_{k} E

is depicted by green polytopes, while the 0-step homothetic tube controllability set X₀ is represented by the dark gray area. Evidently, the local state at each instant is regulated within an error tube

α_{k} E

centered around the trajectory of the nominal state. As anticipated, the cross-section of the error tube diminishes as the nominal state converges towards the origin.

Then, to make the comparison between the control performance of Algorithm 1 and HTMPC more apparent, N is set to 25. Figure 8 shows the state curves for Algorithm 1 and the HTMPC strategy. The state constraint is shown in the gray region. Starting from an initial condition far from the desired equilibrium point, Algorithm 1 converges to the target state faster and keeps the state error within a narrower range of fluctuation, while satisfying the original constraints on the disturbance and state.

Figure 9 presents the control input curves generated by two optimization methods. The region shown in gray is

U

. Obviously, the control action of the actual system (1) consistently satisfies the control constraint. Meanwhile, Algorithm 1 accelerates the convergence of the control input toward the desired equilibrium point with reduced overshoot.

To validate the efficacy of Algorithm 1 in reducing optimization time, a statistical analysis of the optimization time was conducted. To investigate the trend of the optimization time, a larger horizon (N = 50) was selected for this experiment. As shown in Figure 10, Algorithm 1 is generally 2–3 orders of magnitude faster than HTMPC. In addition, as N increases, the calculation time of HTMPC exhibits an exponential growth trend. In contrast, the calculation time required by Algorithm 1 grows ever more slowly and eventually stabilizes within 0.16 ms. Specifically, Algorithm 1 is on average 339.54 times faster than HTMPC in optimization time, and when N = 50 it is 726.23 times faster.

Example 2.

To further validate the proposed approach, consider the system of the form (1) with four state dimensions and two control input dimensions as

A = [\begin{matrix} 1 & 1.5 & 0 & 0 \\ 0.5 & - 0.5 & 1 & 0 \\ 0 & 0.1 & 0.1 & 0 \\ 0.5 & 0 & 0.5 & 0.5 \end{matrix}], B = [\begin{matrix} 0 & 1 \\ 1 & 0.1 \\ 1 & 0 \\ 0 & 0 \end{matrix}] .

(54)

Constraints are given by the inequalities as

x \in X ≜ {x | | x | \leq [\begin{matrix} 5 \\ 5 \\ 2 \\ 2 \end{matrix}]}, u \in U ≜ {u | | u | \leq [\begin{matrix} 1 \\ 1 \end{matrix}]} .

(55)

The parameters are set to horizon N = 30, weighting matrices

Q_{e} = diag {10, 10, 1, 1}

and

Q_{v} = diag {0.01, 0.01}

. The system is simulated according to the provided initial condition

x_{0} = z_{0} = (- 3, - 4, - 1.5, 1)

. The other parameters of the system are under the same conditions as those in Example 1. Algorithm 1 will be implemented in this system to test its control performance for large-scale systems. Furthermore, the final DNN structure is determined by comparing the Euclidean norms of state errors generated when applying different deep neural network architectures, as illustrated in Figure 11. Specifically, the chosen DNN configuration comprises a symmetric deep neural network with eight hidden layers, each containing 14 neurons.

The Euclidean norm

{‖ e ‖}_{2} = \sqrt{\sum_{i = 1}^{4} {(e_{i})}^{2}}

is employed to depict the trend of the state error. As indicated in Figure 11, with a symmetric neural network with eight hidden layers, the system’s state error is generally smaller and converges to a neighborhood of zero more quickly.

Figure 12 depicts the state variable curves for each dimension. The figure demonstrates that the time-varying nominal system obtained by online learning yields a smaller error and a shorter adjustment time during the convergence of the nominal state. The translucent area in these figures represents the range of error fluctuations; evidently, Algorithm 1 generally yields a tighter bound on state errors than HTMPC, indicating greater flexibility in scaling the state tube. Furthermore, Table 4 compares the error-constraining capabilities of Algorithm 1 and HTMPC numerically, by evaluating how closely the actual system tracks the nominal system. In order to mitigate the extreme influence of outliers, we opted for the mean squared error (MSE), known for its numerical stability, as the metric for assessing the tracking performance.

As shown in Table 4, controlling the 4-D system with Algorithm 1 leads to a smaller MSE between the nominal and actual states in all four dimensions. The average MSE over the four dimensions is reduced by 67.86% when Algorithm 1 is employed, compared to its counterpart HTMPC. Consequently, Algorithm 1 keeps the actual state closer to the nominal state.
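The reported reduction can be reproduced from the entries of Table 4 by comparing the averages over the four state dimensions. The short check below uses only the tabulated values; the variable and function names are illustrative.

```python
# MSE values per state dimension, copied from Table 4
mse_alg1 = [0.257407572, 0.250064179, 0.081418276, 0.150326006]
mse_htmpc = [1.226454484, 0.606667032, 0.166021517, 0.300930419]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Relative reduction of the average MSE, in percent
reduction_pct = 100.0 * (1.0 - mean(mse_alg1) / mean(mse_htmpc))
print(f"{reduction_pct:.2f}%")  # 67.86%
```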

Moreover, the time-varying nominal system generated by Algorithm 1, as depicted in Figure 13, exhibits enhanced control input stabilization capabilities with a faster convergence rate and reduced overshoot.

From a computational perspective, the efficiency advantage of Algorithm 1 is even more pronounced for large-scale systems. As shown in Table 5, the proposed method reduces the computation time to less than 6 ms for the four-dimensional system, whereas HTMPC requires tens of seconds. On average, Algorithm 1 performs the optimization about 7218.07 times faster than HTMPC.
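The order of magnitude of this speed-up can be checked against Table 5. The paper does not state how the average was formed, so the ratio of total computation times used below is an assumption; it gives a factor of roughly 7.2 × 10³, in line with the reported figure.

```python
# Computation times in seconds, copied from Table 5
horizons = [10, 20, 30, 40, 50]
t_alg1 = [0.003938, 0.004592, 0.004823, 0.004967, 0.005094]
t_htmpc = [23.179, 27.674, 30.239, 38.098, 49.837]

# Speed-up at each horizon length
per_horizon = {N: h / a for N, a, h in zip(horizons, t_alg1, t_htmpc)}

# One plausible aggregate (an assumption): ratio of the summed times
overall = sum(t_htmpc) / sum(t_alg1)

for N, ratio in per_horizon.items():
    print(f"N = {N}: {ratio:.0f}x")
print(f"overall: {overall:.0f}x")
```

The per-horizon ratios grow with N, which reflects the faster growth of the HTMPC computation time.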

5. Conclusions

This paper presents a mathematically rigorous and computationally tractable RMPC scheme for constrained linear systems with bounded disturbance. Firstly, a more flexible approach is proposed to adjust the size of the tube cross-section by incorporating a fuzzy-based tube size controller, which is driven by both the error magnitude and the error variety rate. Subsequently, the OCP is reformulated as an online learning problem with iteratively updated parameters. A time-varying nominal system for the control scheme is generated by the DNN-based nominal RMPC. Additionally, Dykstra’s projection algorithm is incorporated into the DNN optimization process to ensure the feasibility of the successor state and control input. The proposed integrated control strategy significantly reduces the computational time while enhancing control effectiveness, thereby enabling its potential application in large-scale systems. Simulation results demonstrate the effectiveness of the proposed optimal control algorithm. The current study is limited by the lack of a measurable criterion for evaluating the suboptimality of the derived control law, which prevents us from quantifying how close it is to an optimal solution. Addressing this limitation will require metrics or algorithms capable of assessing the efficacy of the control.

Author Contributions

Conceptualization, S.Y. and Y.L.; methodology, S.Y.; software, Y.L.; validation, Y.L. and H.C.; formal analysis, S.Y.; investigation, Y.L.; resources, S.Y.; data curation, Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, Y.L. and H.C.; supervision, S.Y.; funding acquisition, S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 61703224 and 61640302.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare that there is no conflict of interest.

References

1. Yang, H.Y.; Yan, Z.P.; Zhang, W.; Gong, Q.S.; Zhang, Y.; Zhao, L.Y. Trajectory tracking with external disturbance of bionic underwater robot based on CPG and robust model predictive control. Ocean Eng. 2022, 263, 112215.
2. Shi, H.Y.; Li, P.; Su, C.L.; Wang, Y.; Yu, J.X.; Cao, J.T. Robust constrained model predictive fault-tolerant control for industrial processes with partial actuator failures and interval time-varying delays. J. Process Control 2019, 75, 187–203.
3. Xie, Y.Y.; Liu, L.; Wu, Q.W.; Qian, Z. Robust model predictive control based voltage regulation method for a distribution system with renewable energy sources and energy storage systems. Int. J. Electr. Power Energy Syst. 2020, 118, 105749.
4. Ojaghi, P.; Bigdeli, N.; Rahmani, M. An LMI approach to robust model predictive control of nonlinear systems with state-dependent uncertainties. J. Process Control 2016, 47, 1–10.
5. Zheng, Y.P.; Li, D.W.; Xi, Y.G.; Zhang, J. Improved model prediction and RMPC design for LPV systems with bounded parameter changes. Automatica 2013, 49, 3695–3699.
6. Fleming, J.; Kouvaritakis, B.; Cannon, M. Robust tube MPC for linear systems with multiplicative uncertainty. IEEE Trans. Autom. Control 2015, 60, 1087–1092.
7. Hertneck, M.; Köhler, J.; Trimpe, S.; Allgöwer, F. Learning an approximate model predictive controller with guarantees. IEEE Control Syst. Lett. 2018, 2, 543–548.
8. Langson, W.; Chryssochoos, I.; Raković, S.V.; Mayne, D.Q. Robust model predictive control using tubes. Automatica 2004, 40, 125–133.
9. Mayne, D.Q.; Raković, S.V.; Findeisen, R.; Allgöwer, F. Robust output feedback model predictive control of constrained linear systems: Time varying case. Automatica 2009, 45, 2082–2087.
10. Yu, S.Y.; Maier, C.; Chen, H.; Allgöwer, F. Tube MPC scheme based on robust control invariant set with application to Lipschitz nonlinear systems. Syst. Control Lett. 2013, 62, 194–200.
11. Limon, D.; Alvarado, I.; Alamo, T.; Camacho, E.F. Robust tube-based MPC for tracking of constrained linear systems with additive disturbances. J. Process Control 2010, 20, 248–260.
12. Cannon, M.; Kouvaritakis, B.; Raković, S.V.; Cheng, Q.F. Stochastic tubes in model predictive control with probabilistic constraints. IEEE Trans. Autom. Control 2011, 56, 194–200.
13. Mayne, D.Q.; Seron, M.M.; Raković, S.V. Robust model predictive control of constrained linear systems with bounded disturbances. Automatica 2005, 41, 219–224.
14. Mayne, D.Q.; Raković, S.V.; Findeisen, R.; Allgöwer, F. Robust output feedback model predictive control of constrained linear systems. Automatica 2006, 42, 1217–1222.
15. Raković, S.V.; Kouvaritakis, B.; Findeisen, R.; Cannon, M. Homothetic tube model predictive control. Automatica 2012, 48, 1631–1638.
16. Raković, S.V.; Cheng, Q.F. Homothetic tube MPC for constrained linear difference inclusions. In Proceedings of the Chinese Control and Decision Conference (CCDC), Guiyang, China, 25–27 May 2013; pp. 754–761.
17. Georgiou, A.; Tahir, F.; Jaimoukha, I.M.; Evangelou, S.A. Computationally Efficient Robust Model Predictive Control for Uncertain System Using Causal State-Feedback Parameterization. IEEE Trans. Autom. Control 2023, 68, 3822–3829.
18. Zhang, B.; Yan, S. Asynchronous constrained resilient robust model predictive control for markovian jump systems. IEEE Trans. Ind. Inform. 2020, 16, 7025–7034.
19. Olaru, S.; Dumur, D. A parameterized polyhedra approach for the explicit robust model predictive control. In Informatics in Control, Automation and Robotics II; Springer: Dordrecht, The Netherlands, 2007; pp. 217–226.
20. Lin, C.Y.; Yeh, H.Y. Repetitive model predictive control based on a recurrent neural network. In Proceedings of the 2012 International Symposium on Computer, Consumer and Control, Taichung, Taiwan, 4–6 June 2012; pp. 540–543.
21. Han, H.; Kim, H.; Kim, Y. An efficient hyperparameter control method for a network intrusion detection system based on proximal policy optimization. Symmetry 2022, 14, 161.
22. Xu, K.; Huang, D.Z.; Darve, E. Learning constitutive relations using symmetric positive definite neural networks. J. Comput. Phys. 2021, 428, 110072.
23. Di, M.M.; Forti, M.; Tesi, A. Existence and characterization of limit cycles in nearly symmetric neural networks. IEEE Trans. Circuits Syst. I: Fundam. Theory Appl. 2002, 49, 979–992.
24. Ma, C.Q.; Jiang, X.Y.; Li, P.; Liu, J. Offline computation of the explicit robust model predictive control law based on deep neural networks. Symmetry 2023, 15, 676.
25. Bumroongsri, P.; Kheawhom, S. An off-line robust MPC algorithm for uncertain polytopic discrete-time systems using polyhedral invariant sets. J. Process Control 2012, 22, 975–983.
26. Lorenzen, M.; Cannon, M.; Allgöwer, F. Robust MPC with recursive model update. Automatica 2019, 103, 461–471.
27. Moreno-Mora, F.; Beckenbach, L.; Streif, S. Performance bounds of adaptive MPC with bounded parameter uncertainties. Eur. J. Control 2022, 68, 100688.
28. Bradtke, S.J. Reinforcement learning applied to linear quadratic regulation. Adv. Neural Inf. Process. Syst. 1992, 5, 295–302.
29. Wang, H.R.; Zariphopoulou, T.; Zhou, X.Y. Reinforcement learning in continuous time and space: A stochastic control approach. J. Mach. Learn. Res. 2020, 21, 8145–8178.
30. Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; Moritz, P. Trust region policy optimization. Int. Conf. Mach. Learn. 2015, 37, 1889–1897.
31. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
32. Kolmanovsky, I.; Gilbert, E.C. Theory and computation of disturbance invariant sets for discrete-time linear systems. Math. Probl. Eng. 1998, 4, 317–367.
33. Wan, Z.; Pluymers, B.; Kothare, M.V.; De Moor, B. Efficient robust constrained model predictive control with a time varying terminal constraint set. Syst. Control Lett. 2006, 55, 618–621.
34. Raković, S.V.; Kerrigan, E.C.; Kouramas, K.I.; Mayne, D.Q. Invariant approximations of the minimal robust positively invariant set. IEEE Trans. Autom. Control 2005, 50, 406–410.
35. Nguyen, A.T.; Taniguchi, T.; Eciolaza, L.; Campos, V.; Palhares, R.; Sugeno, M. Fuzzy control systems: Past, present and future. IEEE Comput. Intell. Mag. 2019, 14, 56–58.
36. Zeng, X.J.; Madan, G.S. Approximation accuracy analysis of fuzzy systems as function approximators. IEEE Trans. Fuzzy Syst. 1996, 4, 44–63.
37. Bertsekas, D. Dynamic programming and optimal control: Volume I. Athena Sci. 2012, 4, 111–120.
38. Wu, L.G.; Su, X.J.; Shi, P. Dykstra’s Algorithm for a constrained least-squares matrix problem. Numer. Linear Algebra Appl. 1996, 3, 459–471.
39. Sutton, R.S.; Mcallester, D.; Singh, S.; Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. Adv. Neural Info. Proc. Syst. (NIPS) 2000, 12, 1057–1063.

Figure 1. A two-dimensional depiction of the control maneuver. Different regions in the graph represent the following:

G_{1 -}

corresponds to the region where

e \leq - e_{0}

and

e_{c} \geq e_{0}^{'}

;

G_{1 +}

depicts the area with

e \geq e_{0}

and

e_{c} \leq - e_{0}^{'}

;

G_{2 +}

characterizes the area where

| e | \leq e_{0}

and

e_{c} \geq e_{0}^{'}

;

G_{2 -}

highlights the territory where

| e | \leq e_{0}

and

e_{c} \leq - e_{0}^{'}

;

G_{3 +}

showcases the domain where

e \geq e_{0}

and

e_{c} \leq | e_{0}^{'} |

;

G_{3 -}

marks the domain where

e \leq - e_{0}

and

e_{c} \leq | e_{0}^{'} |

;

G_{4 +}

exemplifies the region where

| e_{c} | \leq e_{0}^{'}

and

e \geq e_{0}

;

G_{4 -}

describes the space where

| e_{c} | \leq e_{0}^{'}

and

e \leq - e_{0}

;

G_{5}

specifies the domain where

| e | < e_{0}

and

| e_{c} | < e_{0}^{'}

.


Figure 2. Diagram illustrating the architectural structure of the deep neural network.

Figure 3. Diagram of the constrained DNN structure expanded by Algorithm 1.

Figure 4. The flowchart depicting the feedback processing mechanism of the control synthesis.

Figure 5. The structure of the constrained DNN-based robust model predictive control scheme with an adjustable error tube.

Figure 6. Nominal state trajectories under various DNN architectures.

Figure 7. The state trajectories of the proposed algorithm (N = 12). Colors in the figure represent the following: the green polytopes depict the error tube at each sampling time; the dark gray area represents the 0-step homothetic tube controllability set X₀; the gray area marks the undesirable state area.

Figure 8. State convergence comparison between Algorithm 1 and the HTMPC (N = 25). (a) Curves of

x_{1}

obtained by implementing two control algorithms, respectively; (b) curves of

x_{2}

obtained by implementing two control algorithms, respectively.


Figure 9. Control input convergence comparison between Algorithm 1 and the HTMPC (N = 25).

Figure 10. Comparison of Algorithm 1 and the HTMPC for computational efficiency (N = 50). (a) Statistics of the computational time for Algorithm 1. (b) Statistics of the computational time for HTMPC.

Figure 11. Euclidean norm of state error under various DNN architectures.

Figure 12. State convergence comparison between Algorithm 1 and the HTMPC (N = 30). (a) Curves of state x_{1} obtained by implementing two distinct control algorithms, respectively. (b) Curves of state x_{2} obtained by implementing two distinct control algorithms, respectively. (c) Curves of state x_{3} obtained by implementing two distinct control algorithms, respectively. (d) Curves of state x_{4} obtained by implementing two distinct control algorithms, respectively.

Figure 13. Control input convergence comparison between Algorithm 1 and the HTMPC (N = 30). (a) Curves of control input

v_{1}

for the nominal system obtained by employing two distinct control algorithms, respectively. (b) Curves of control input

v_{2}

for the nominal system obtained by employing two distinct control algorithms, respectively.


Table 1. Performance comparison: symmetrical DNN vs. general DNN.

Network Structure	Number of Calculated Weights	Iterations	Precision
Symmetrical DNN	$2 i (L + 1)$	436	98.03%
Typical DNN	$[m + n + (i + 1) L - i + 1] i$	679	97.64%

Table 2. The comparison of computational complexity between the proposed approach and HTMPC.

Control Strategy	Number of Decision Variables	Number of Inequality Constraints	Upper Bound on the Number of Critical Regions
Proposed Approach	$N (m + n) + n$	$N (q_{X} + q_{U} + q_{E}) + q_{E_{f}}$	0
HTMPC	$N (m + n + 1) + n + 1$	$N (q_{X} + q_{U} + q_{S} + 1) + q_{S} + q_{G_{f}}$	$2^{N (q_{X} + q_{U} + q_{S} + 1) + q_{S} + q_{G_{f}}}$

Table 3. Fuzzy rule comparison table.

Fuzzy Rule Control Region	$G_{1}$	$G_{2}$	$G_{3}$	$G_{4}$	$G_{5}$
Scaling Vector α	0.5–0.8	0.9–1.2	0–0.4	1.3–1.6	1.7–2.0

Table 4. The MSE of state trajectories produced by two distinct control methodologies.

Control Strategy	MSE of X₁	MSE of X₂	MSE of X₃	MSE of X₄
Algorithm 1	0.257407572	0.250064179	0.081418276	0.150326006
HTMPC	1.226454484	0.606667032	0.166021517	0.300930419

Table 5. Comparison of Algorithm 1 and the HTMPC for computational efficiency.

Control Strategy	N = 10	N = 20	N = 30	N = 40	N = 50
Algorithm 1	0.003938 s	0.004592 s	0.004823 s	0.004967 s	0.005094 s
HTMPC	23.179 s	27.674 s	30.239 s	38.098 s	49.837 s

The table denotes the calculation time unit “second” as “s”.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

