Article

Long Short-Term Memory Neural Networks for Modeling Dynamical Processes and Predictive Control: A Hybrid Physics-Informed Approach

by
Krzysztof Zarzycki
* and
Maciej Ławryńczuk
Institute of Control and Computation Engineering, Faculty of Electronics and Information Technology, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland
*
Author to whom correspondence should be addressed.
Sensors 2023, 23(21), 8898; https://doi.org/10.3390/s23218898
Submission received: 4 October 2023 / Revised: 26 October 2023 / Accepted: 30 October 2023 / Published: 1 November 2023
(This article belongs to the Special Issue Fuzzy Systems and Neural Networks for Engineering Applications)

Abstract: This work has two objectives. Firstly, it describes a novel physics-informed hybrid neural network (PIHNN) model based on the long short-term memory (LSTM) neural network. The presented model structure combines the first-principle process description and data-driven neural sub-models using a specialized data fusion block that relies on fuzzy logic. The second objective of this work is to detail a computationally efficient model predictive control (MPC) algorithm that employs the PIHNN model. The validity of the presented modeling and MPC approaches is demonstrated for a simulated polymerization reactor. It is shown that the PIHNN structure gives very good modeling results, while the MPC controller results in excellent control quality.

1. Introduction

Model predictive control (MPC) algorithms, as highlighted in [1,2], find their primary applications in managing processes that classical control methods struggle to handle effectively. These processes often involve multiple-input, multiple-output (MIMO) systems or exhibit strong nonlinearity. MPC, renowned for its flexibility in accommodating various constraints, excels in ensuring high-quality control, even in the face of challenging processes. Real-world instances of successful MPC applications include the control of chemical reactors [3,4] and distillation towers [5], as well as the integration of MPC in embedded systems controlling heating, ventilation, and air conditioning (HVAC) systems [6], quadrotors [7], fuel cells [8], autonomous vehicles [9], and underwater vehicles [10].
As emphasized in [11,12,13], accurate sensor measurements of essential process variables play a critical role in MPC. It is widely acknowledged that the absence of these measurements inevitably leads to a significant loss of control performance. To address this challenge, when the necessary measurements are not readily available, engineers commonly employ online estimation techniques, such as Kalman or extended Kalman filters [14]. Furthermore, specialized methods and strategies have been developed to tackle this issue in specific applications. In the domain of vehicles, innovative solutions have emerged. The authors of [15] present a real-world example in which a vehicle employs an external camera to detect obstacles and lane positions on the road; additionally, it utilizes external rear-corner radars to identify objects approaching from the rear. An intriguing application of sensors is presented in [16], where an anemometer measures external factors such as wind force and direction. Beyond the automotive sector, in [10] a depth sensor is installed to precisely measure the depth of an underwater vehicle, with the heave speed derived from the depth sensor data. Finally, MPC is also used for fault-tolerant control, addressing issues such as stiction in control valves, as discussed in [17].
The cornerstone of any effective MPC algorithm is the precision of its process model. Broadly, two general model classes are usually considered: first-principle (FP) models, rooted in a fundamental understanding of the process, and black-box approximations. Both model classes have their distinct strengths and limitations:
  • FP models demand meticulous process descriptions and accurate parameter values but offer unparalleled modeling precision across a wide operating range, even in abnormal situations. In practice, however, the values of some model parameters may be imprecise or unknown.
  • In contrast, data-driven black-box models, including support vector machines (SVM) [18], multi-layer perceptron (MLP) neural networks [19,20], radial basis function (RBF) networks [21], and recurrent long short-term memory (LSTM) networks [22,23], require no prior domain expertise. Among these, gated recurrent unit (GRU) networks have gained traction in modeling dynamic systems [24,25] and in integration with MPC algorithms [23,26,27]. Neural network models have proven to be very useful, especially when dealing with complex dynamical processes, such as predator–prey systems [28,29,30]. However, black-box models may struggle when the available dataset lacks coverage of certain regions of process operation, particularly operating points visited infrequently.
Physics-informed neural network (PINN) models offer a compelling fusion of both modeling approaches. These models combine the foundational principles governing the process with the data-driven power of machine learning. The result is a versatile model that adheres to fundamental laws while approximating the behavior of real-world processes. The literature showcases PINN applications in scenarios where parameters of ordinary differential equation (ODE) models are either imprecisely known [31] or immeasurable [32]. Furthermore, PINNs can approximate parameters of partial differential equation (PDE) models [33]. PINN models also find utility in replacing numerical solvers for ODEs [34] and even serve as models within MPC frameworks [35]. Additionally, one can find several hybrid models aiming to combine a data-driven modeling approach with knowledge of physics. The hybrid physics-guided neural network [36] is a feed-forward neural network integrated with a first-principles model; the entire hybrid model is trained jointly, with a fusion output layer that utilizes a straightforward interpolation technique. Other examples include using deep neural networks in a physically guided modeling approach [37], in modeling lithium batteries [38], and in modeling traffic state [39]. One can also find examples of introducing physics directly into the forward pass of the neural network to model lake temperature [40].
This study addresses a common modeling challenge characterized by two specific limitations. Firstly, process-variable measurements are typically feasible but confined to a limited vicinity of certain operating points. Consequently, the resulting models exhibit localized validity, restricted to the regions where data have been collected for identification purposes. Secondly, although fundamentally sound, the existing first-principle models describing the process often lack precision due to imprecise parameters. In response to these limitations, this work introduces an innovative physics-informed hybrid neural network (PIHNN) model structure, leveraging LSTM neural networks. This approach combines elements from both first-principle and black-box data-driven methodologies, offering robust modeling capabilities in scenarios characterized by the aforementioned issues. Within this research, we delve into two data fusion techniques, drawing from the principles of the first-principle process description and the LSTM network, both employing a fuzzy-logic-based approach. The initial method employs a simplified data fusion block, while the subsequent method harnesses machine learning techniques to minimize overall model errors. To assess the effectiveness of the proposed model structure and data fusion techniques, we apply them to a benchmark polymerization reactor process.
Additionally, we integrate the developed PIHNN model into the MPC framework. Our analysis encompasses a straightforward MPC algorithm with nonlinear optimization (MPC-NO) and a more intricate linearization-based MPC scheme, named the MPC algorithm with nonlinear prediction and linearization around the predicted trajectory (MPC-NPLPT), which relies on computationally uncomplicated quadratic optimization tasks. Our findings demonstrate that the linearization-based MPC approach can yield commendable control performance while significantly reducing computational demands compared to its nonlinear counterpart. An initial iteration of the PIHNN model was introduced in conference proceedings [41], where a basic GRU neural network was employed. This current study represents a substantial expansion of previous research efforts. Here, we consider more general LSTM-based PIHNN models, comprehensively examine the model's structure, explore various potential variants, and present implementation details. Furthermore, we introduce an efficient model predictive control (MPC) algorithm for the PIHNN models considered in this study.
This work is organized as follows. Firstly, Section 2 presents the general structure and the details of the hybrid PIHNN model structure utilizing LSTM neural networks. The state–space modeling approach is employed. Secondly, Section 3 briefly describes the general MPC scheme with nonlinear optimization and presents the general formulation, the necessary implementation details, and the resulting quadratic optimization task of the linearization-based MPC method. Section 4 thoroughly studies the validity of as many as six PIHNN model variants applied to approximate the behavior of a chemical reactor benchmark. Furthermore, the control efficiency and computational speed of the recommended linearization-based MPC algorithm are shown. Finally, Section 5 concludes the article.

2. Hybrid Physics-Informed Models Using LSTM Neural Networks

We introduce an innovative PIHNN model that blends a data-driven approach with expert knowledge of the underlying physics of the process. To effectively apply the PIHNN model, the following conditions must be satisfied:
  • The process input and output variables, i.e., the manipulated and controlled variables, respectively, must be measurable. State variables may be measured or observed using a state estimator, e.g., an extended Kalman filter (EKF).
  • An FP model of the process should exist in the form of a set of differential equations and, when necessary, additional algebraic relations based on the fundamental laws of physics governing the process.
However, we assume that the measurements and the FP model may exhibit imperfections. Specifically, the measurements may originate from a limited range within the entire spectrum of process variable variability. Furthermore, the FP model may also contain inaccuracies and be susceptible to errors arising from factors such as incorrect estimation of specific process parameters or measurement inaccuracies.

2.1. Model Structure

This paper primarily focuses on single-input single-output (SISO) process modeling. The process input and output are denoted as $u$ and $y$, respectively. Additionally, the process has $n_x$ state variables, represented as $x = [x_1\ \ldots\ x_{n_x}]^{\mathrm{T}}$.
Figure 1 illustrates the model's overall structure. The PIHNN model is divided into three distinct components. The first component, highlighted in blue, is entirely data-driven. It comprises $n_{\mathrm{LSTM}}$ neural sub-models, each trained on available data. The number of data-driven sub-models corresponds to the number of distinct operational areas of the process from which measurement data can be collected. Each sub-model takes the vector $X_{\mathrm{LSTM}}^{i}$ as the input and generates the scalar $y_{\mathrm{LSTM}}^{i}$ as the output. LSTM networks are employed in this study, as earlier research has demonstrated their exceptional ability to model dynamical processes [23,27]. However, it is important to note that alternative data-driven models could also be applied in this context. The second component of the PIHNN structure, highlighted in green, is rooted in expert knowledge about the underlying physics of the process. It consists of an FP sub-model formulated using ordinary differential equations. The input to this sub-model is the vector $X_{\mathrm{FP}}$, while the output is denoted by the scalar $y^{\mathrm{FP}}$. The third component of the PIHNN structure, highlighted in orange, is the data fusion (DF) block. In general, many decision models can be used here, such as neural networks of various architectures. However, we recommend using a fuzzy data fusion block (Fuzzy DF) because it combines the sub-models directly. There is no need to train the Fuzzy DF on data, which is particularly useful when training data are lacking across specific ranges of process variable variability. By selecting the membership function shapes, one can determine in which operating regions, and to what extent, each sub-model contributes to the overall model output. The DF block takes the outputs calculated by all LSTM sub-models and the FP sub-model as inputs. Based on the current operating state of the process, represented by the vector $X_{\mathrm{DF}}$, it decides how the outputs of all sub-models are combined. The primary goal of this fusion process is to minimize the overall error of the entire PIHNN model.

2.2. First-Principle Sub-Model

Typically, the FP model utilizes fundamental physical laws formulated in the continuous-time domain, i.e., a set of differential equations must be considered. The state equations have the classical form
$$\dot{x}_1(t) = f_1(x_1(t), \ldots, x_{n_x}(t), u(t)) \tag{1}$$
$$\vdots$$
$$\dot{x}_{n_x}(t) = f_{n_x}(x_1(t), \ldots, x_{n_x}(t), u(t)) \tag{2}$$
while the output equation is
$$y(t) = g(x_1(t), \ldots, x_{n_x}(t)) \tag{3}$$
where $f_1, \ldots, f_{n_x}: \mathbb{R}^{n_x+1} \rightarrow \mathbb{R}$ and $g: \mathbb{R}^{n_x} \rightarrow \mathbb{R}$ are nonlinear functions. Since we will next use the PIHNN model relying on the FP model in the MPC algorithm with online linearization, we require the functions $f_1, \ldots, f_{n_x}, g$ to be differentiable. From Equations (1)–(3), we can find a corresponding discrete-time FP model
$$x_1(k) = f_1^{\mathrm{d}}(x_1(k-1), \ldots, x_{n_x}(k-1), u(k-1)) \tag{4}$$
$$\vdots$$
$$x_{n_x}(k) = f_{n_x}^{\mathrm{d}}(x_1(k-1), \ldots, x_{n_x}(k-1), u(k-1)) \tag{5}$$
$$y(k) = g^{\mathrm{d}}(x_1(k-1), \ldots, x_{n_x}(k-1)) \tag{6}$$
where $f_1^{\mathrm{d}}, \ldots, f_{n_x}^{\mathrm{d}}: \mathbb{R}^{n_x+1} \rightarrow \mathbb{R}$ and $g^{\mathrm{d}}: \mathbb{R}^{n_x} \rightarrow \mathbb{R}$ are nonlinear mapping functions. The input vector of the FP model can be expressed as
$$X_{\mathrm{FP}}(k) = \begin{bmatrix} x^{\mathrm{T}}(k-1) & u(k-1) \end{bmatrix}^{\mathrm{T}} \tag{7}$$

2.3. LSTM Sub-Model

LSTM networks were developed in response to the vanishing gradient problem that impacts traditional recurrent neural networks [42]. Each LSTM neuron is referred to as a “cell” (Figure 2) and encompasses gates responsible for governing the flow of information within the network. The LSTM cell comprises four distinct gates:
  • The forget gate f determines which values from the previous cell state should be retained and which should be discarded.
  • The input gate i selects values from both the previous hidden state and the current input for updating purposes.
  • The cell state candidate gate g initially regulates the flow of information within the network and subsequently computes the candidate value for the current cell state.
  • The output gate $o$ is responsible for calculating the new hidden state $h$.
Each cell in the network has its input vector expressed as
$$X_{\mathrm{LSTM}}(k) = [x_{\mathrm{LSTM}}^{1}, \ldots, x_{\mathrm{LSTM}}^{n_A+n_B}]^{\mathrm{T}} = [u(k-1), \ldots, u(k-n_B), y(k-1), \ldots, y(k-n_A)]^{\mathrm{T}} \tag{8}$$
where the parameters $n_A$ and $n_B$ define the order of dynamics of the model. The LSTM network has $n_N$ cells. The weights of the network can be written in matrix form
$$W = \begin{bmatrix} W_{\mathrm{i}} \\ W_{\mathrm{f}} \\ W_{\mathrm{g}} \\ W_{\mathrm{o}} \end{bmatrix}, \quad R = \begin{bmatrix} R_{\mathrm{i}} \\ R_{\mathrm{f}} \\ R_{\mathrm{g}} \\ R_{\mathrm{o}} \end{bmatrix}, \quad b = \begin{bmatrix} b_{\mathrm{i}} \\ b_{\mathrm{f}} \\ b_{\mathrm{g}} \\ b_{\mathrm{o}} \end{bmatrix} \tag{9}$$
The input weight matrices $W_{\mathrm{i}}$, $W_{\mathrm{f}}$, $W_{\mathrm{g}}$, and $W_{\mathrm{o}}$ have dimensionality $n_N \times (n_A + n_B)$; the recurrent weight matrices $R_{\mathrm{i}}$, $R_{\mathrm{f}}$, $R_{\mathrm{g}}$, and $R_{\mathrm{o}}$ have dimensionality $n_N \times n_N$; and the bias vectors $b_{\mathrm{i}}$, $b_{\mathrm{f}}$, $b_{\mathrm{g}}$, and $b_{\mathrm{o}}$ have dimensionality $n_N \times 1$, respectively. At time instant $k$, the LSTM model first calculates the output value of each gate
$$i(k) = \sigma\left(W_{\mathrm{i}} X_{\mathrm{LSTM}}(k) + R_{\mathrm{i}} h(k-1) + b_{\mathrm{i}}\right) \tag{10}$$
$$f(k) = \sigma\left(W_{\mathrm{f}} X_{\mathrm{LSTM}}(k) + R_{\mathrm{f}} h(k-1) + b_{\mathrm{f}}\right) \tag{11}$$
$$g(k) = \tanh\left(W_{\mathrm{g}} X_{\mathrm{LSTM}}(k) + R_{\mathrm{g}} h(k-1) + b_{\mathrm{g}}\right) \tag{12}$$
$$o(k) = \sigma\left(W_{\mathrm{o}} X_{\mathrm{LSTM}}(k) + R_{\mathrm{o}} h(k-1) + b_{\mathrm{o}}\right) \tag{13}$$
Subsequently, the cell state of the network can be computed
$$c(k) = f(k) \circ c(k-1) + i(k) \circ g(k) \tag{14}$$
where the symbol $\circ$ represents the Hadamard product of vectors. Finally, the hidden state can be calculated
$$h(k) = o(k) \circ \tanh(c(k)) \tag{15}$$
The LSTM layer of the network is typically followed by a fully connected layer (Figure 3), with a weight matrix $W_{\mathrm{y}}$ of dimensionality $1 \times n_N$ and a bias $b_{\mathrm{y}}$. Finally, the computation of the network's output at time instant $k$ can be expressed as
$$y^{\mathrm{LSTM}}_{(i)}(k) = W_{\mathrm{y}} h(k) + b_{\mathrm{y}} \tag{16}$$
One can represent Equations (10)–(16) in scalar form, which will prove useful for the derivation of the MPC algorithm considered in Section 3. The scalar-form expressions for the $n$-th elements of the gate and state vectors are
$$i_n(k) = \sigma\left(\sum_{m=1}^{n_A+n_B} w^{\mathrm{i}}_{n,m}\, x_{\mathrm{LSTM}}^{m}(k) + \sum_{m=1}^{n_N} r^{\mathrm{i}}_{n,m}\, h_m(k-1) + b^{\mathrm{i}}_n\right) \tag{17}$$
$$f_n(k) = \sigma\left(\sum_{m=1}^{n_A+n_B} w^{\mathrm{f}}_{n,m}\, x_{\mathrm{LSTM}}^{m}(k) + \sum_{m=1}^{n_N} r^{\mathrm{f}}_{n,m}\, h_m(k-1) + b^{\mathrm{f}}_n\right) \tag{18}$$
$$g_n(k) = \tanh\left(\sum_{m=1}^{n_A+n_B} w^{\mathrm{g}}_{n,m}\, x_{\mathrm{LSTM}}^{m}(k) + \sum_{m=1}^{n_N} r^{\mathrm{g}}_{n,m}\, h_m(k-1) + b^{\mathrm{g}}_n\right) \tag{19}$$
$$o_n(k) = \sigma\left(\sum_{m=1}^{n_A+n_B} w^{\mathrm{o}}_{n,m}\, x_{\mathrm{LSTM}}^{m}(k) + \sum_{m=1}^{n_N} r^{\mathrm{o}}_{n,m}\, h_m(k-1) + b^{\mathrm{o}}_n\right) \tag{20}$$
$$c_n(k) = f_n(k)\, c_n(k-1) + i_n(k)\, g_n(k) \tag{21}$$
$$h_n(k) = o_n(k) \tanh(c_n(k)) \tag{22}$$
$$y^{\mathrm{LSTM}}_{(i)}(k) = \sum_{n=1}^{n_N} w^{\mathrm{y}}_{n}\, h_n(k) + b^{\mathrm{y}} \tag{23}$$
Equations (22) and (23) can be combined to express the output of the network as a single equation
$$y^{\mathrm{LSTM}}_{(i)}(k) = \sum_{n=1}^{n_N} w^{\mathrm{y}}_{n}\, o_n(k) \tanh\left(f_n(k)\, c_n(k-1) + i_n(k)\, g_n(k)\right) + b^{\mathrm{y}} \tag{24}$$
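To make the cell computations of Equations (10)–(16) concrete, the following sketch implements a single forward step of an LSTM sub-model in NumPy. It is an illustration only: the function names, the `sigmoid` helper, and the dictionary-based weight layout are our assumptions, not the implementation used by the authors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, R, b, Wy, by):
    """One forward step of an LSTM sub-model, Equations (10)-(16).

    x       -- input vector X_LSTM(k), shape (n_A + n_B,)
    h_prev  -- hidden state h(k-1), shape (n_N,)
    c_prev  -- cell state c(k-1), shape (n_N,)
    W, R, b -- dicts keyed by 'i', 'f', 'g', 'o' with input weights
               (n_N x (n_A + n_B)), recurrent weights (n_N x n_N),
               and biases (n_N,)
    Wy, by  -- output-layer weights (n_N,) and bias (scalar)
    """
    i = sigmoid(W['i'] @ x + R['i'] @ h_prev + b['i'])  # input gate, Eq. (10)
    f = sigmoid(W['f'] @ x + R['f'] @ h_prev + b['f'])  # forget gate, Eq. (11)
    g = np.tanh(W['g'] @ x + R['g'] @ h_prev + b['g'])  # candidate, Eq. (12)
    o = sigmoid(W['o'] @ x + R['o'] @ h_prev + b['o'])  # output gate, Eq. (13)
    c = f * c_prev + i * g   # cell state, Eq. (14); * is the Hadamard product
    h = o * np.tanh(c)       # hidden state, Eq. (15)
    y = Wy @ h + by          # fully connected output layer, Eq. (16)
    return y, h, c
```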

2.4. Fuzzy Data Fusion Block

Considering Figure 1, the output of the whole PIHNN model is
$$y^{\mathrm{PIHNN}}(k) = \frac{\sum_{n=1}^{n_{\mathrm{LSTM}}} y_n^{\mathrm{LSTM}}(k)\, \mu_n^{\mathrm{LSTM}}(k) + y^{\mathrm{FP}}(k) \sum_{n=1}^{n_{\mathrm{FP}}} \mu_n^{\mathrm{FP}}(k)}{\sum_{n=1}^{n_{\mathrm{LSTM}}} \mu_n^{\mathrm{LSTM}}(k) + \sum_{n=1}^{n_{\mathrm{FP}}} \mu_n^{\mathrm{FP}}(k)} \tag{25}$$
In this study, we use trapezoidal, sigmoidal, and Gaussian membership functions. For trapezoidal functions, we have
$$\mu_n^{\mathrm{LSTM}}(k) = \mu^{\mathrm{LSTM}}(X_{\mathrm{DF}}(k)) = \max\left(\min\left(\frac{X_{\mathrm{DF}}(k) - a_n}{b_n - a_n},\ 1,\ \frac{d_n - X_{\mathrm{DF}}(k)}{d_n - c_n}\right),\ 0\right) \tag{26}$$
for sigmoidal ones, we write
$$\mu_n^{\mathrm{LSTM}}(k) = \mu^{\mathrm{LSTM}}(X_{\mathrm{DF}}(k)) = \frac{1}{1 + \mathrm{e}^{-a_n (X_{\mathrm{DF}}(k) - b_n)}} - \frac{1}{1 + \mathrm{e}^{-c_n (X_{\mathrm{DF}}(k) - d_n)}} \tag{27}$$
and for Gaussian ones, we define
$$\mu_n^{\mathrm{LSTM}}(k) = \mu^{\mathrm{LSTM}}(X_{\mathrm{DF}}(k)) = \exp\left(-\frac{(X_{\mathrm{DF}}(k) - a_n)^2}{b_n^2}\right) \tag{28}$$
The signal $X_{\mathrm{DF}}(k) = y(k)$ or $X_{\mathrm{DF}}(k) = u(k-1)$ defines the current operating point of the process. The parameters $a_n$, $b_n$, $c_n$, and $d_n$ define the shapes of the membership functions used.
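A minimal sketch of the fuzzy data fusion computation of Equations (25)–(28) is given below; the helper names and the array-based interface are our assumptions.

```python
import numpy as np

def trapezoid(x, a, b, c, d):
    # Trapezoidal membership function, Eq. (26)
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)

def sigmoidal(x, a, b, c, d):
    # Difference of two sigmoids, Eq. (27)
    return 1.0 / (1.0 + np.exp(-a * (x - b))) - 1.0 / (1.0 + np.exp(-c * (x - d)))

def gaussian(x, a, b):
    # Gaussian membership function, Eq. (28)
    return np.exp(-((x - a) ** 2) / b ** 2)

def pihnn_output(y_lstm, mu_lstm, y_fp, mu_fp):
    """Fuzzy fusion of the sub-model outputs, Eq. (25).

    y_lstm  -- outputs of the n_LSTM data-driven sub-models
    mu_lstm -- their membership values at the current operating point X_DF(k)
    y_fp    -- output of the first-principle sub-model
    mu_fp   -- membership values assigned to the FP sub-model
    """
    num = np.dot(y_lstm, mu_lstm) + y_fp * np.sum(mu_fp)
    den = np.sum(mu_lstm) + np.sum(mu_fp)
    return num / den
```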

2.5. Model Development Procedure

The process for establishing the PIHNN model unfolds as follows:
  • We determine the number of distinct training datasets that can be derived from the process measurements.
  • We conduct training of the LSTM network for each training dataset.
  • We implement a discrete FP model of the process.
  • We select the initial shape and range of the membership function within the DF block.
  • We deliver the outputs of the LSTM sub-models and the output of the FP model as inputs of the DF block, where their fusion is carried out based on the current operational state of the process. This fusion process determines the output of the PIHNN model.
  • We assess the quality of PIHNN modeling. If it proves unsatisfactory, then it becomes necessary to modify the shape of the membership function.
  • We adjust the membership function’s shape, which can be executed manually, drawing upon expert knowledge, or using an optimization procedure.
The flow chart of the model development procedure is also presented in Figure 4.

3. LSTM PIHNN Models in Predictive Control

3.1. Basic Predictive Control Problem Formulation

This work utilizes the general MPC formulation [1,2]. Namely, at each discrete-time sampling instant k, where k = 0 , 1 , 2 , , the MPC controller performs real-time calculations to determine the vector of decision variables. It is defined as the following current and future increments of the input variable
$$\triangle u(k) = \begin{bmatrix} \triangle u(k|k) & \triangle u(k+1|k) & \cdots & \triangle u(k+N_u-1|k) \end{bmatrix}^{\mathrm{T}} \tag{29}$$
The symbol $\triangle u(k|k)$ represents the increment of the manipulated variable at time instant $k$, computed at the same time instant $k$. Similarly, the symbol $\triangle u(k+1|k)$ corresponds to the increment of the manipulated variable at the future time instant $k+1$, computed at the current time instant $k$. This notation extends to subsequent time instants as well. $N_u$ represents the control horizon, which determines the length of the MPC decision variable vector. The fundamental MPC optimization problem aims to minimize the predicted control error, minimize excessive increments of the manipulated variable, and satisfy constraints. Let us denote the set-point of the controlled variable for the future sampling instant $k+p$ known at the current instant $k$ by $y^{\mathrm{sp}}(k+p|k)$ and the corresponding prediction determined from the process model by $\hat{y}(k+p|k)$. We consider the predictions and control errors over the prediction horizon $N$. The magnitude constraints on the manipulated variable and the predicted controlled variable are represented by $u^{\min}$, $u^{\max}$ and $y^{\min}$, $y^{\max}$, respectively. The fundamental MPC optimization task can be formulated as follows:
$$\begin{aligned} \min_{\triangle u(k)}\ J(k) = \ & \sum_{p=1}^{N} \left(y^{\mathrm{sp}}(k+p|k) - \hat{y}(k+p|k)\right)^2 + \lambda \sum_{p=0}^{N_u-1} \left(\triangle u(k+p|k)\right)^2 \\ \text{subject to}\ & u^{\min} \le u(k+p|k) \le u^{\max}, \quad p = 0, \ldots, N_u-1 \\ & \triangle u^{\min} \le \triangle u(k+p|k) \le \triangle u^{\max}, \quad p = 0, \ldots, N_u-1 \\ & y^{\min} \le \hat{y}(k+p|k) \le y^{\max}, \quad p = 1, \ldots, N \end{aligned} \tag{30}$$
In general, the predictions over the prediction horizon are obtained as
$$\hat{y}(k+p|k) = y(k+p|k) + d(k) \tag{31}$$
where the model output for the future discrete time $k+p$, determined at the current time $k$, is denoted as $y(k+p|k)$. The unmeasured disturbance, which covers the model error and the real disturbances acting on the controlled process, is computed as the difference between the measured value of the process controlled variable and its estimation obtained from the model. The MPC optimization problem (30) is solved online at each sampling instant, yielding the solution vector (29). According to the principle of repetitive control, the first element of the obtained solution vector is applied to the process, and the whole procedure is repeated at the subsequent sampling instants.
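The receding-horizon mechanism can be summarized by the following schematic sketch. Here, `plant_step`, `model_output`, and `solve_mpc_task` are hypothetical placeholders for the process interface, the model evaluation, and the optimization task (30); only the structure of the loop reflects the text.

```python
def mpc_loop(y_sp, u, n_steps):
    """Schematic receding-horizon MPC loop (Section 3.1)."""
    for k in range(n_steps):
        y = plant_step(u)                # measure the controlled variable
        d = y - model_output()           # unmeasured disturbance, Eq. (31)
        du = solve_mpc_task(y_sp, d, u)  # solve task (30) for the vector (29)
        u = u + du[0]                    # apply only the first increment
    return u
```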

3.2. Nonlinear MPC Optimization for PIHNN Models

Suppose a nonlinear model, e.g., an LSTM structure or the PIHNN model described in this work, is directly used to determine the predictions y ^ ( k + p | k ) . The general MPC optimization problem (30) becomes nonlinear in that case. We will refer to such a control method as MPC-NO.

3.3. Quadratic MPC Optimization for PIHNN Models

In order to derive a computationally attractive alternative to the MPC-NO method, we formulate an MPC scheme with successive linearization along the predicted trajectory. Such an approach makes it possible to obtain a quadratic optimization MPC task. We use the general approach to predicted trajectory linearization known as the MPC-NPLPT method, introduced in [19,20]. However, the application of the original PIHNN model structure requires careful derivation of the algorithm. Firstly, let us define the predicted trajectory of the controlled variable over the entire prediction horizon, i.e., the following vector:
$$\hat{y}(k) = \begin{bmatrix} \hat{y}(k+1|k) & \cdots & \hat{y}(k+N|k) \end{bmatrix}^{\mathrm{T}} \tag{32}$$
In the MPC-NPLPT approach, linearization is performed along a trajectory of the manipulated variable defined over the control horizon. It has the following form:
$$u^{\mathrm{traj}}(k) = \begin{bmatrix} u^{\mathrm{traj}}(k|k) & \cdots & u^{\mathrm{traj}}(k+N_u-1|k) \end{bmatrix}^{\mathrm{T}} \tag{33}$$
From the definition of the control horizon, it follows that $u^{\mathrm{traj}}(k+p|k) = u^{\mathrm{traj}}(k+N_u-1|k)$ for $p = N_u, \ldots, N$. The input trajectory (33) is utilized to determine the predicted trajectory of the controlled variable over the prediction horizon
$$\hat{y}^{\mathrm{traj}}(k) = \begin{bmatrix} \hat{y}^{\mathrm{traj}}(k+1|k) & \cdots & \hat{y}^{\mathrm{traj}}(k+N|k) \end{bmatrix}^{\mathrm{T}} \tag{34}$$
For linearization, we use a first-order Taylor series expansion. Let us define the vector comprising the current and future values of the manipulated variable that correspond to the MPC decision variable vector (29)
$$u(k) = \begin{bmatrix} u(k|k) & \cdots & u(k+N_u-1|k) \end{bmatrix}^{\mathrm{T}} \tag{35}$$
Taking advantage of the compact vector–matrix notation, the predicted trajectory, y ^ ( k ) , is expressed as the following linear function of the vector (35):
$$\hat{y}(k) = \hat{y}^{\mathrm{traj}}(k) + H(k)\left(u(k) - u^{\mathrm{traj}}(k)\right) \tag{36}$$
The $N \times N_u$ matrix
$$H(k) = \frac{\mathrm{d} \hat{y}^{\mathrm{traj}}(k)}{\mathrm{d} u^{\mathrm{traj}}(k)} \tag{37}$$
defines the partial derivatives of the predicted controlled variable's trajectory with respect to the future manipulated variable's trajectory; both trajectories take into account the linearization conditions, so we have to utilize the trajectories $\hat{y}^{\mathrm{traj}}(k)$ and $u^{\mathrm{traj}}(k)$, respectively. The entries of the matrix $H(k)$ are
$$H_{p,r+1}(k) = \frac{\partial \hat{y}^{\mathrm{traj}}(k+p|k)}{\partial u^{\mathrm{traj}}(k+r|k)} \tag{38}$$
for all predictions over the prediction horizon, i.e., $p = 1, \ldots, N$, and all computed values of the manipulated variable over the entire control horizon, i.e., $r = 0, \ldots, N_u-1$. The link between the vectors $u(k)$ and $\triangle u(k)$ is
$$u(k) = J \triangle u(k) + u(k-1) \tag{39}$$
where the entries of the $N_u \times N_u$ auxiliary matrix $J$ are defined as
$$J_{i,j} = \begin{cases} 0 & \text{if } i < j \\ 1 & \text{if } i \ge j \end{cases} \tag{40}$$
and the vector of length $N_u$ is
$$u(k-1) = \begin{bmatrix} u(k-1) & \cdots & u(k-1) \end{bmatrix}^{\mathrm{T}} \tag{41}$$
Using the linearized trajectory (36) and the rule (39), the general predictive control optimization task (30) is transformed into the following quadratic optimization problem:
$$\begin{aligned} \min_{\triangle u(k)}\ & \left\| y^{\mathrm{sp}}(k) - H(k) J \triangle u(k) - \hat{y}^{\mathrm{traj}}(k) - H(k)\left(u(k-1) - u^{\mathrm{traj}}(k)\right) \right\|^2 + \left\| \triangle u(k) \right\|^2_{\Lambda} \\ \text{subject to}\ & u^{\min} \le J \triangle u(k) + u(k-1) \le u^{\max} \\ & \triangle u^{\min} \le \triangle u(k) \le \triangle u^{\max} \\ & y^{\min} \le H(k) J \triangle u(k) + \hat{y}^{\mathrm{traj}}(k) + H(k)\left(u(k-1) - u^{\mathrm{traj}}(k)\right) \le y^{\max} \end{aligned} \tag{42}$$
The definitions for all necessary symbols used in the above problem are
  • $\Lambda$: a diagonal $N_u \times N_u$ matrix with diagonal entries equal to the weighting coefficient $\lambda$;
  • $u^{\min}$: a vector of length $N_u$, where all elements are equal to $u^{\min}$;
  • $u^{\max}$: a vector of length $N_u$, where all elements are equal to $u^{\max}$;
  • $\triangle u^{\min}$: a vector of length $N_u$, where all elements are equal to $\triangle u^{\min}$;
  • $\triangle u^{\max}$: a vector of length $N_u$, where all elements are equal to $\triangle u^{\max}$;
  • $y^{\min}$: a vector of length $N$, where all elements are equal to $y^{\min}$;
  • $y^{\max}$: a vector of length $N$, where all elements are equal to $y^{\max}$.
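To illustrate how the quadratic task (42) is handled numerically, the sketch below assembles the matrix $J$ and computes the unconstrained minimizer in NumPy. It assumes $H(k)$, $\hat{y}^{\mathrm{traj}}(k)$, and $u^{\mathrm{traj}}(k)$ have already been computed, and it omits the inequality constraints, which in practice are handled by a quadratic programming solver.

```python
import numpy as np

def nplpt_step(H, y_sp, y_traj, u_traj, u_prev, lam, Nu):
    """Unconstrained solution of the MPC-NPLPT quadratic task (42)."""
    J = np.tril(np.ones((Nu, Nu)))    # auxiliary matrix, Eq. (40)
    u_prev_vec = np.full(Nu, u_prev)  # vector u(k-1), Eq. (41)
    Lam = lam * np.eye(Nu)            # weighting matrix Lambda
    A = H @ J
    # residual of the linearized prediction, Eqs. (36) and (39)
    e = y_sp - y_traj - H @ (u_prev_vec - u_traj)
    # minimizer of ||e - A*du||^2 + du^T Lam du
    return np.linalg.solve(A.T @ A + Lam, A.T @ e)
```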

3.4. PIHNN Prediction

Let us now explain how the PIHNN model considered in this work is utilized for MPC prediction, i.e., to calculate the predicted trajectory of the controlled variable defined by Equation (34). We use Equation (25) for the future time instant $k+p$, which gives
$$y^{\mathrm{PIHNN}}(k+p|k) = \frac{\sum_{n=1}^{n_{\mathrm{LSTM}}} y_n^{\mathrm{LSTM}}(k+p|k)\, \mu_n^{\mathrm{LSTM}}(k) + y^{\mathrm{FP}}(k+p|k) \sum_{n=1}^{n_{\mathrm{FP}}} \mu_n^{\mathrm{FP}}(k)}{\sum_{n=1}^{n_{\mathrm{LSTM}}} \mu_n^{\mathrm{LSTM}}(k) + \sum_{n=1}^{n_{\mathrm{FP}}} \mu_n^{\mathrm{FP}}(k)} \tag{43}$$
Taking advantage of Equation (31), the predictions are, therefore, expressed as
$$\hat{y}^{\mathrm{PIHNN}}(k+p|k) = \frac{\sum_{n=1}^{n_{\mathrm{LSTM}}} y_n^{\mathrm{LSTM}}(k+p|k)\, \mu_n^{\mathrm{LSTM}}(k) + y^{\mathrm{FP}}(k+p|k) \sum_{n=1}^{n_{\mathrm{FP}}} \mu_n^{\mathrm{FP}}(k)}{\sum_{n=1}^{n_{\mathrm{LSTM}}} \mu_n^{\mathrm{LSTM}}(k) + \sum_{n=1}^{n_{\mathrm{FP}}} \mu_n^{\mathrm{FP}}(k)} + d(k) \tag{44}$$
where the membership functions are defined by Equations (26), (27) or (28). Let us note that the predicted trajectory from the PIHNN model depends on the trajectories generated by both LSTM and FP sub-models. The disturbance (the prediction error) is determined as the difference between the measured process output and its estimation obtained from the model
$$d(k) = y(k) - y^{\mathrm{PIHNN}}(k) \tag{45}$$
where the signal $y^{\mathrm{PIHNN}}(k)$ is found from Equation (25).

3.5. LSTM Model Prediction

For each LSTM sub-model, the calculations start with computing the predicted output of the gates. For this purpose, we use Equations (17)–(20) which yield the following
$$i_n(k+p|k) = \sigma\left(\sum_{m=1}^{n_A+n_B} w^{\mathrm{i}}_{n,m}\, x_{\mathrm{LSTM}}^{m}(k+p|k) + \sum_{m=1}^{n_N} r^{\mathrm{i}}_{n,m}\, h_m(k-1+p|k) + b^{\mathrm{i}}_n\right) \tag{46}$$
$$f_n(k+p|k) = \sigma\left(\sum_{m=1}^{n_A+n_B} w^{\mathrm{f}}_{n,m}\, x_{\mathrm{LSTM}}^{m}(k+p|k) + \sum_{m=1}^{n_N} r^{\mathrm{f}}_{n,m}\, h_m(k-1+p|k) + b^{\mathrm{f}}_n\right) \tag{47}$$
$$g_n(k+p|k) = \tanh\left(\sum_{m=1}^{n_A+n_B} w^{\mathrm{g}}_{n,m}\, x_{\mathrm{LSTM}}^{m}(k+p|k) + \sum_{m=1}^{n_N} r^{\mathrm{g}}_{n,m}\, h_m(k-1+p|k) + b^{\mathrm{g}}_n\right) \tag{48}$$
$$o_n(k+p|k) = \sigma\left(\sum_{m=1}^{n_A+n_B} w^{\mathrm{o}}_{n,m}\, x_{\mathrm{LSTM}}^{m}(k+p|k) + \sum_{m=1}^{n_N} r^{\mathrm{o}}_{n,m}\, h_m(k-1+p|k) + b^{\mathrm{o}}_n\right) \tag{49}$$
Let us introduce the auxiliary integer variables $I_{\mathrm{uf}}(p) = \max(\min(p, n_B), 0)$ and $I_{\mathrm{yf}}(p) = \min(p-1, n_A)$; they count how many of the input and output signals in the model input vector at the prediction instant $k+p$ are, respectively, future (computed) control values and predicted outputs, rather than known past measurements. We can then represent the gate predictions as
$$i_n(k+p|k) = \sigma\Bigg(\sum_{m=1}^{I_{\mathrm{uf}}(p)} w^{\mathrm{i}}_{n,m}\, u(k-m+p|k) + \sum_{m=I_{\mathrm{uf}}(p)+1}^{n_B} w^{\mathrm{i}}_{n,m}\, u(k-m+p) + \sum_{m=1}^{I_{\mathrm{yf}}(p)} w^{\mathrm{i}}_{n,n_B+m}\, \hat{y}^{\mathrm{LSTM}}(k-m+p|k) + \sum_{m=I_{\mathrm{yf}}(p)+1}^{n_A} w^{\mathrm{i}}_{n,n_B+m}\, y(k-m+p) + \sum_{m=1}^{n_N} r^{\mathrm{i}}_{n,m}\, h_m(k+p-1|k) + b^{\mathrm{i}}_n\Bigg) \tag{50}$$
$$f_n(k+p|k) = \sigma\Bigg(\sum_{m=1}^{I_{\mathrm{uf}}(p)} w^{\mathrm{f}}_{n,m}\, u(k-m+p|k) + \sum_{m=I_{\mathrm{uf}}(p)+1}^{n_B} w^{\mathrm{f}}_{n,m}\, u(k-m+p) + \sum_{m=1}^{I_{\mathrm{yf}}(p)} w^{\mathrm{f}}_{n,n_B+m}\, \hat{y}^{\mathrm{LSTM}}(k-m+p|k) + \sum_{m=I_{\mathrm{yf}}(p)+1}^{n_A} w^{\mathrm{f}}_{n,n_B+m}\, y(k-m+p) + \sum_{m=1}^{n_N} r^{\mathrm{f}}_{n,m}\, h_m(k+p-1|k) + b^{\mathrm{f}}_n\Bigg) \tag{51}$$
$$g_n(k+p|k) = \tanh\Bigg(\sum_{m=1}^{I_{\mathrm{uf}}(p)} w^{\mathrm{g}}_{n,m}\, u(k-m+p|k) + \sum_{m=I_{\mathrm{uf}}(p)+1}^{n_B} w^{\mathrm{g}}_{n,m}\, u(k-m+p) + \sum_{m=1}^{I_{\mathrm{yf}}(p)} w^{\mathrm{g}}_{n,n_B+m}\, \hat{y}^{\mathrm{LSTM}}(k-m+p|k) + \sum_{m=I_{\mathrm{yf}}(p)+1}^{n_A} w^{\mathrm{g}}_{n,n_B+m}\, y(k-m+p) + \sum_{m=1}^{n_N} r^{\mathrm{g}}_{n,m}\, h_m(k+p-1|k) + b^{\mathrm{g}}_n\Bigg) \tag{52}$$
and
$$o_n(k+p|k) = \sigma\Bigg(\sum_{m=1}^{I_{\mathrm{uf}}(p)} w^{\mathrm{o}}_{n,m}\, u(k-m+p|k) + \sum_{m=I_{\mathrm{uf}}(p)+1}^{n_B} w^{\mathrm{o}}_{n,m}\, u(k-m+p) + \sum_{m=1}^{I_{\mathrm{yf}}(p)} w^{\mathrm{o}}_{n,n_B+m}\, \hat{y}^{\mathrm{LSTM}}(k-m+p|k) + \sum_{m=I_{\mathrm{yf}}(p)+1}^{n_A} w^{\mathrm{o}}_{n,n_B+m}\, y(k-m+p) + \sum_{m=1}^{n_N} r^{\mathrm{o}}_{n,m}\, h_m(k+p-1|k) + b^{\mathrm{o}}_n\Bigg) \tag{53}$$
Then, the predicted cell and hidden states can be determined from Equations (21) and (22)
$$c_n(k+p|k) = f_n(k+p|k)\, c_n(k-1+p|k) + i_n(k+p|k)\, g_n(k+p|k) \tag{54}$$
$$h_n(k+p|k) = o_n(k+p|k) \tanh\left(c_n(k+p|k)\right) \tag{55}$$
Let us stress that the above equations have to be used recurrently for $p = 1, \ldots, N$. Finally, the predicted output of the $i$-th LSTM sub-model can be computed from Equation (23)
$$\hat{y}^{\mathrm{LSTM}}_{(i)}(k+p|k) = \sum_{n=1}^{n_N} w^{\mathrm{y}}_{n}\, h_n(k+p|k) + b^{\mathrm{y}} \tag{56}$$
which can also be expressed as
$$\hat{y}^{\mathrm{LSTM}}_{(i)}(k+p|k) = \sum_{n=1}^{n_N} w^{\mathrm{y}}_{n}\, o_n(k+p|k) \tanh\left(f_n(k+p|k)\, c_n(k+p-1|k) + i_n(k+p|k)\, g_n(k+p|k)\right) + b^{\mathrm{y}} + d(k) \tag{57}$$
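The recursion of Equations (46)–(57) can be sketched compactly as follows, reusing the `lstm_step` function from the sketch in Section 2.3. The signal bookkeeping (which inputs are future decision values and which are past measurements) mirrors the indices $I_{\mathrm{uf}}(p)$ and $I_{\mathrm{yf}}(p)$; the variable names are our assumptions.

```python
import numpy as np

def lstm_predict(u_future, u_past, y_meas, h, c, N, nA, nB, params):
    """Recursive LSTM predictions over the horizon, Equations (46)-(56).

    u_future -- u(k|k), ..., u(k+N-1|k): future input trajectory,
                already extended beyond the control horizon
    u_past   -- u(k-1), u(k-2), ...: measured past inputs
    y_meas   -- y(k), y(k-1), ...: current and past measured outputs
    h, c     -- LSTM hidden and cell states at the current instant
    params   -- dict with keys W, R, b, Wy, by (see lstm_step)
    """
    y_pred = []
    for p in range(1, N + 1):
        x = []
        for m in range(1, nB + 1):      # input signals u(k - m + p)
            j = p - m
            x.append(u_future[j] if j >= 0 else u_past[-j - 1])
        for m in range(1, nA + 1):      # output signals y(k - m + p)
            j = p - m
            x.append(y_pred[j - 1] if j >= 1 else y_meas[-j])
        y_p, h, c = lstm_step(np.array(x), h, c, **params)
        y_pred.append(y_p)
    return np.array(y_pred)
```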

3.6. FP Model Prediction

Using Equations (4)–(6), we find the model states and the output for the future time instant $k+p$
$$x_1(k+p|k) = f_1(x_1(k+p-1|k), \ldots, x_{n_x}(k+p-1|k), u(k+p-1|k)) \tag{58}$$
$$\vdots$$
$$x_{n_x}(k+p|k) = f_{n_x}(x_1(k+p-1|k), \ldots, x_{n_x}(k+p-1|k), u(k+p-1|k)) \tag{59}$$
$$y^{\mathrm{FP}}(k+p|k) = g(x_1(k+p-1|k), \ldots, x_{n_x}(k+p-1|k)) \tag{60}$$
To simplify the following calculations, let us start with computing the predictions of the states for the time instant $k+1$
$$\hat{x}_1(k+1|k) = f_1(k+1|k) = f_1(x_1(k), \ldots, x_{n_x}(k), u(k|k)) + \nu_1(k) \tag{61}$$
$$\vdots$$
$$\hat{x}_{n_x}(k+1|k) = f_{n_x}(k+1|k) = f_{n_x}(x_1(k), \ldots, x_{n_x}(k), u(k|k)) + \nu_{n_x}(k) \tag{62}$$
From Equation (6), we find the corresponding predicted controlled variable:
$$\hat{y}^{\mathrm{FP}}(k+1|k) = g(k+1|k) + d(k) = g(\hat{x}_1(k+1|k), \ldots, \hat{x}_{n_x}(k+1|k)) + d(k) \tag{63}$$
Next, we can determine the predictions for the subsequent sampling instants:
$$\hat{x}_1(k+p|k) = f_1(k+p|k) = f_1(\hat{x}_1(k+p-1|k), \ldots, \hat{x}_{n_x}(k+p-1|k), u(k+p-1|k)) + \nu_1(k) \tag{64}$$
$$\vdots$$
$$\hat{x}_{n_x}(k+p|k) = f_{n_x}(k+p|k) = f_{n_x}(\hat{x}_1(k+p-1|k), \ldots, \hat{x}_{n_x}(k+p-1|k), u(k+p-1|k)) + \nu_{n_x}(k) \tag{65}$$
$$\hat{y}^{\mathrm{FP}}(k+p|k) = g(k+p|k) + d(k) = g(\hat{x}_1(k+p|k), \ldots, \hat{x}_{n_x}(k+p|k)) + d(k) \tag{66}$$
where $p = 2, \ldots, N$. The state and output disturbances (prediction errors), respectively, are computed by comparing the current measurements with the outputs of the corresponding model equations
$$\nu_1(k) = x_1(k) - f_1(x_1(k-1), \ldots, x_{n_x}(k-1), u(k-1)) \tag{67}$$
$$\vdots$$
$$\nu_{n_x}(k) = x_{n_x}(k) - f_{n_x}(x_1(k-1), \ldots, x_{n_x}(k-1), u(k-1)) \tag{68}$$
$$d(k) = y(k) - g(x_1(k-1), \ldots, x_{n_x}(k-1)) \tag{69}$$
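A generic sketch of this recursive FP prediction, with the disturbance estimates held constant over the horizon as in Equations (61)–(66), could look as follows; the function interface is our assumption.

```python
import numpy as np

def fp_predict(f, g, x0, u_future, nu, d, N):
    """Recursive FP-model predictions, Equations (61)-(66).

    f        -- list of n_x state transition functions f_i(x, u)
    g        -- output function g(x)
    x0       -- measured (or estimated) state vector at instant k
    u_future -- u(k|k), ..., u(k+N-1|k)
    nu       -- state disturbance estimates nu_i(k), Eqs. (67)-(68)
    d        -- output disturbance d(k), Eq. (69)
    """
    x = np.asarray(x0, dtype=float)
    y_pred = []
    for p in range(1, N + 1):
        # one-step state prediction plus the constant state disturbances
        x = np.array([fi(x, u_future[p - 1]) for fi in f]) + nu
        y_pred.append(g(x) + d)  # predicted output with output disturbance
    return np.array(y_pred)
```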

3.7. PIHNN Model Derivatives

The entries of the matrix $H(k)$ (Equation (37)) are computed from Equation (38). Differentiation of Equation (44) yields
$$\frac{\partial \hat{y}^{\mathrm{PIHNN}}(k+p|k)}{\partial u(k+r|k)} = \frac{\sum_{n=1}^{n_{\mathrm{LSTM}}} \frac{\partial \hat{y}_n^{\mathrm{LSTM}}(k+p|k)}{\partial u(k+r|k)}\, \mu_n^{\mathrm{LSTM}}(k) + \frac{\partial \hat{y}^{\mathrm{FP}}(k+p|k)}{\partial u(k+r|k)} \sum_{n=1}^{n_{\mathrm{FP}}} \mu_n^{\mathrm{FP}}(k)}{\sum_{n=1}^{n_{\mathrm{LSTM}}} \mu_n^{\mathrm{LSTM}}(k) + \sum_{n=1}^{n_{\mathrm{FP}}} \mu_n^{\mathrm{FP}}(k)} \tag{70}$$
Let us note that the derivatives of the whole PIHNN model depend on the LSTM and FP sub-model derivatives.

3.8. LSTM Model Derivatives

Derivatives for LSTM sub-models are calculated by differentiating Equation (57)
$$\frac{\partial \hat{y}^{\mathrm{LSTM}}_{(i)}(k+p|k)}{\partial u(k+r|k)} = \sum_{n=1}^{n_N} w^{\mathrm{y}}_{n}\, \frac{\partial h_n(k+p|k)}{\partial u(k+r|k)} \tag{71}$$
For all $p = 1, \ldots, N$ and $r = 0, \ldots, N_u-1$, the subsequent step involves the application of the chain rule of differentiation. Initially, it is necessary to determine the derivatives of the gates $i$, $f$, $g$, and $o$. We proceed to differentiate Equation (50):
$$\frac{\partial i_n(k+p|k)}{\partial u(k+r|k)} = i_n(k+p|k)\left(1 - i_n(k+p|k)\right)\Bigg(\sum_{m=1}^{I_{\mathrm{uf}}(p)} w^{\mathrm{i}}_{n,m}\, \frac{\partial u(k-m+p|k)}{\partial u(k+r|k)} + \sum_{m=1}^{I_{\mathrm{yf}}(p)} w^{\mathrm{i}}_{n,n_B+m}\, \frac{\partial \hat{y}^{\mathrm{LSTM}}(k-m+p|k)}{\partial u(k+r|k)} + \sum_{m=1}^{n_N} r^{\mathrm{i}}_{n,m}\, \frac{\partial h_m(k+p-1|k)}{\partial u(k+r|k)}\Bigg) \tag{72}$$
Equation (51) gives
$$\frac{\partial f_n(k+p|k)}{\partial u(k+r|k)} = f_n(k+p|k)\left(1 - f_n(k+p|k)\right)\Bigg(\sum_{m=1}^{I_{\mathrm{uf}}(p)} w^{\mathrm{f}}_{n,m}\, \frac{\partial u(k-m+p|k)}{\partial u(k+r|k)} + \sum_{m=1}^{I_{\mathrm{yf}}(p)} w^{\mathrm{f}}_{n,n_B+m}\, \frac{\partial \hat{y}^{\mathrm{LSTM}}(k-m+p|k)}{\partial u(k+r|k)} + \sum_{m=1}^{n_N} r^{\mathrm{f}}_{n,m}\, \frac{\partial h_m(k+p-1|k)}{\partial u(k+r|k)}\Bigg) \tag{73}$$
From Equation (52), we obtain
$$\frac{\partial g_n(k+p|k)}{\partial u(k+r|k)} = \left(1 - g_n^2(k+p|k)\right)\Bigg(\sum_{m=1}^{I_{\mathrm{uf}}(p)} w^{\mathrm{g}}_{n,m}\, \frac{\partial u(k-m+p|k)}{\partial u(k+r|k)} + \sum_{m=1}^{I_{\mathrm{yf}}(p)} w^{\mathrm{g}}_{n,n_B+m}\, \frac{\partial \hat{y}^{\mathrm{LSTM}}(k-m+p|k)}{\partial u(k+r|k)} + \sum_{m=1}^{n_N} r^{\mathrm{g}}_{n,m}\, \frac{\partial h_m(k+p-1|k)}{\partial u(k+r|k)}\Bigg) \tag{74}$$
Finally, using Equation (53), we derive
$$\frac{\partial o_n(k+p|k)}{\partial u(k+r|k)} = o_n(k+p|k)\left(1 - o_n(k+p|k)\right)\Bigg(\sum_{m=1}^{I_{\mathrm{uf}}(p)} w^{\mathrm{o}}_{n,m}\, \frac{\partial u(k-m+p|k)}{\partial u(k+r|k)} + \sum_{m=1}^{I_{\mathrm{yf}}(p)} w^{\mathrm{o}}_{n,n_B+m}\, \frac{\partial \hat{y}^{\mathrm{LSTM}}(k-m+p|k)}{\partial u(k+r|k)} + \sum_{m=1}^{n_N} r^{\mathrm{o}}_{n,m}\, \frac{\partial h_m(k+p-1|k)}{\partial u(k+r|k)}\Bigg) \tag{75}$$
The following step involves computing the derivative of the cell state c using Equation (54)
$$\frac{\partial c_n(k+p|k)}{\partial u(k+r|k)} = \frac{\partial f_n(k+p|k)}{\partial u(k+r|k)}\, c_n(k+p-1|k) + f_n(k+p|k)\, \frac{\partial c_n(k+p-1|k)}{\partial u(k+r|k)} + \frac{\partial i_n(k+p|k)}{\partial u(k+r|k)}\, g_n(k+p|k) + i_n(k+p|k)\, \frac{\partial g_n(k+p|k)}{\partial u(k+r|k)} \tag{76}$$
and from Equation (55), we can derive the derivatives of the hidden state $h$
$$\frac{\partial h_n(k+p|k)}{\partial u(k+r|k)} = \frac{\partial o_n(k+p|k)}{\partial u(k+r|k)} \tanh\left(c_n(k+p|k)\right) + o_n(k+p|k)\left(1 - \tanh^2\left(c_n(k+p|k)\right)\right) \frac{\partial c_n(k+p|k)}{\partial u(k+r|k)} \tag{77}$$

3.9. FP Model Derivatives

We start by finding the derivatives of the predicted state variables for the sampling instant $k+1$. Differentiating Equations (61) and (62), we obtain
$$\frac{\partial \hat{x}_1(k+1|k)}{\partial u(k+r|k)} = \sum_{i=1}^{n_x} \frac{\partial f_1(x_1(k), \ldots, x_{n_x}(k), u(k|k))}{\partial x_i(k)}\, \frac{\partial x_i(k)}{\partial u(k+r|k)} + \frac{\partial f_1(x_1(k), \ldots, x_{n_x}(k), u(k|k))}{\partial u(k|k)}\, \frac{\partial u(k|k)}{\partial u(k+r|k)} \tag{78}$$
$$\vdots$$
$$\frac{\partial \hat{x}_{n_x}(k+1|k)}{\partial u(k+r|k)} = \sum_{i=1}^{n_x} \frac{\partial f_{n_x}(x_1(k), \ldots, x_{n_x}(k), u(k|k))}{\partial x_i(k)}\, \frac{\partial x_i(k)}{\partial u(k+r|k)} + \frac{\partial f_{n_x}(x_1(k), \ldots, x_{n_x}(k), u(k|k))}{\partial u(k|k)}\, \frac{\partial u(k|k)}{\partial u(k+r|k)} \tag{79}$$
Knowing that
$$\frac{\partial u^{\mathrm{traj}}(k+p|k)}{\partial u^{\mathrm{traj}}(k+r|k)} = \begin{cases} 1 & \text{if } p = r \text{ or } (p > r \text{ and } r = N_u - 1) \\ 0 & \text{otherwise} \end{cases} \tag{80}$$
and that the current states $x_i(k)$ are measured quantities that do not depend on the future control moves, we can simplify Equations (78) and (79) to
$$\frac{\partial \hat{x}_1(k+1|k)}{\partial u(k+r|k)} = \frac{\partial f_1(x_1(k), \ldots, x_{n_x}(k), u(k|k))}{\partial u(k|k)}\, \frac{\partial u(k|k)}{\partial u(k+r|k)} \tag{81}$$
$$\vdots$$
$$\frac{\partial \hat{x}_{n_x}(k+1|k)}{\partial u(k+r|k)} = \frac{\partial f_{n_x}(x_1(k), \ldots, x_{n_x}(k), u(k|k))}{\partial u(k|k)}\, \frac{\partial u(k|k)}{\partial u(k+r|k)} \tag{82}$$
Next, from Equation (63), the derivative of the predicted controlled variable for the sampling instant $k+1$ is
$$\frac{\partial \hat{y}^{\mathrm{FP}}(k+1|k)}{\partial u(k+r|k)} = \sum_{i=1}^{n_x} \frac{\partial g(\hat{x}_1(k+1|k), \ldots, \hat{x}_{n_x}(k+1|k))}{\partial \hat{x}_i(k+1|k)}\, \frac{\partial \hat{x}_i(k+1|k)}{\partial u(k+r|k)} \tag{83}$$
Next, we can determine the derivatives for $p = 2, \ldots, N$. We start with the state variables. From Equations (64) and (65), we obtain
$$\frac{\partial \hat{x}_1(k+p|k)}{\partial u(k+r|k)} = \sum_{i=1}^{n_x} \frac{\partial f_1(\hat{x}_1(k+p-1|k), \ldots, \hat{x}_{n_x}(k+p-1|k), u(k+p-1|k))}{\partial \hat{x}_i(k+p-1|k)}\, \frac{\partial \hat{x}_i(k+p-1|k)}{\partial u(k+r|k)} + \frac{\partial f_1(\hat{x}_1(k+p-1|k), \ldots, \hat{x}_{n_x}(k+p-1|k), u(k+p-1|k))}{\partial u(k+p-1|k)}\, \frac{\partial u(k+p-1|k)}{\partial u(k+r|k)} \tag{84}$$
$$\vdots$$
$$\frac{\partial \hat{x}_{n_x}(k+p|k)}{\partial u(k+r|k)} = \sum_{i=1}^{n_x} \frac{\partial f_{n_x}(\hat{x}_1(k+p-1|k), \ldots, \hat{x}_{n_x}(k+p-1|k), u(k+p-1|k))}{\partial \hat{x}_i(k+p-1|k)}\, \frac{\partial \hat{x}_i(k+p-1|k)}{\partial u(k+r|k)} + \frac{\partial f_{n_x}(\hat{x}_1(k+p-1|k), \ldots, \hat{x}_{n_x}(k+p-1|k), u(k+p-1|k))}{\partial u(k+p-1|k)}\, \frac{\partial u(k+p-1|k)}{\partial u(k+r|k)} \tag{85}$$
Finally, we can find the derivatives of the predicted FP sub-model output using Equation (66)
$$\frac{\partial \hat{y}^{\mathrm{FP}}(k+p|k)}{\partial u(k+r|k)} = \sum_{i=1}^{n_x} \frac{\partial g(\hat{x}_1(k+p|k), \ldots, \hat{x}_{n_x}(k+p|k))}{\partial \hat{x}_i(k+p|k)}\, \frac{\partial \hat{x}_i(k+p|k)}{\partial u(k+r|k)} \tag{86}$$
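Because the chain-rule expressions above are easy to get wrong in code, it is good practice (our suggestion, not part of the paper's method) to verify the analytic matrix $H(k)$ against a finite-difference approximation:

```python
import numpy as np

def fd_jacobian(predict, u_traj, eps=1e-6):
    """Finite-difference check of H(k) = d y_traj / d u_traj, Eq. (37).

    predict -- function mapping a future input trajectory (length N_u)
               to the predicted output trajectory (length N)
    """
    y0 = predict(u_traj)
    H = np.zeros((y0.size, u_traj.size))
    for r in range(u_traj.size):
        u_pert = u_traj.copy()
        u_pert[r] += eps            # perturb one future input value
        H[:, r] = (predict(u_pert) - y0) / eps
    return H
```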

4. Results

4.1. Polymerization Process Description

The process under study is a polymerization reactor [43] that is frequently used as a benchmark to assess the usefulness of models and control methods, e.g., [20,27]. The process has a single input, the initiator flow rate $F_{\mathrm{I}}$ (m³ h⁻¹), and a single output, the number average molecular weight ($\mathrm{NAMW}$) (kg kmol⁻¹). Both input and output signals have been appropriately normalized to facilitate the training of neural networks. The scaling is defined as follows: $u = 100(F_{\mathrm{I}} - \bar{F}_{\mathrm{I}})$ and $y = 10^{-4}(\mathrm{NAMW} - \overline{\mathrm{NAMW}})$. The values at the nominal operating point are $\bar{F}_{\mathrm{I}} = 0.016783$ and $\overline{\mathrm{NAMW}} = 20{,}000$. The polymerization process operates with a sampling time $T = 1.8$ seconds.
Let us note that the predictions and the derivative matrix entries determined from the LSTM sub-models are universal, as derived in Section 3.5 and Section 3.8, respectively. Hence, it only remains to derive the prediction equations and the corresponding derivative matrix entries for the specific first-principle model of the process.

4.2. First-Principle Model for Polymerization Process and Its Use in MPC

The continuous-time first-principle model of the polymerization process [43] is discretized using the Euler method. The discrete-time model has the following form:
$$x_1(k) = T\left(60 - 10\, x_1(k-1)\sqrt{x_2(k-1)}\right) + x_1(k-1) \tag{87}$$
$$x_2(k) = T\left(80\, u(k-1) - 10.1022\, x_2(k-1)\right) + x_2(k-1) \tag{88}$$
$$x_3(k) = T\left(0.0024121\, x_1(k-1)\sqrt{x_2(k-1)} + 0.112191\, x_2(k-1) - 10\, x_3(k-1)\right) + x_3(k-1) \tag{89}$$
$$x_4(k) = T\left(245.978\, x_1(k-1)\sqrt{x_2(k-1)} - 10\, x_4(k-1)\right) + x_4(k-1) \tag{90}$$
$$y^{\mathrm{FP}}(k) = \frac{x_4(k)}{x_3(k)} \tag{91}$$
where the model parameters are: $p_1 = 60T$, $p_2 = 10T$, $p_3 = 80T$, $p_4 = -10.1022T + 1$, $p_5 = 0.0024121T$, $p_6 = 0.112191T$, $p_7 = -10T + 1$, $p_8 = 245.978T$, $p_9 = -10T + 1$.
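One simulation step of the discrete-time model (87)–(91) can be sketched as follows, assuming the square-root nonlinearity and the coefficients as reconstructed above:

```python
import numpy as np

T = 1.8  # sampling time used in the text

def polymerization_step(x, u):
    """One Euler step of the discrete-time FP model, Eqs. (87)-(91).
    x = [x1, x2, x3, x4]; u is the scaled initiator flow rate."""
    x1, x2, x3, x4 = x
    s = np.sqrt(x2)
    x1n = T * (60.0 - 10.0 * x1 * s) + x1
    x2n = T * (80.0 * u - 10.1022 * x2) + x2
    x3n = T * (0.0024121 * x1 * s + 0.112191 * x2 - 10.0 * x3) + x3
    x4n = T * (245.978 * x1 * s - 10.0 * x4) + x4
    y = x4n / x3n  # NAMW output, Eq. (91)
    return np.array([x1n, x2n, x3n, x4n]), y
```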
It is important to note that, to emulate the imperfections and inaccuracies of the FP model, we introduced a 20 percent increase in the gain of the model during the simulation experiments, i.e.,
$$y^{\mathrm{FP}}_{\mathrm{disturbed}}(k) = 1.2\, y^{\mathrm{FP}}(k) = 1.2\, \frac{x_4(k)}{x_3(k)} \tag{92}$$
For the PIHNN model used in our MPC algorithm, we have to derive the prediction equations using the specific FP model of the considered benchmark system and the general rules formulated in Section 3.6. They allow us to calculate the predicted trajectory $\hat{y}^{\mathrm{traj}}(k)$, as defined by Equation (34). We start with determining the prediction equations for $p = 1$. From Equations (87)–(91), we obtain
$$\hat{x}_1(k+1|k) = p_1 - p_2\, x_1(k)\sqrt{x_2(k)} + x_1(k) + \nu_1(k) \tag{93}$$
$$\hat{x}_2(k+1|k) = p_3\, u(k|k) + p_4\, x_2(k) + \nu_2(k) \tag{94}$$
$$\hat{x}_3(k+1|k) = p_5\, x_1(k)\sqrt{x_2(k)} + p_6\, x_2(k) + p_7\, x_3(k) + \nu_3(k) \tag{95}$$
$$\hat{x}_4(k+1|k) = p_8\, x_1(k)\sqrt{x_2(k)} + p_9\, x_4(k) + \nu_4(k) \tag{96}$$
$$\hat{y}^{\mathrm{FP}}(k+1|k) = \frac{\hat{x}_4(k+1|k)}{\hat{x}_3(k+1|k)} + d(k) \tag{97}$$
The state and output disturbances are derived from the general Equations (67)–(69), respectively, which gives
$$\nu_1(k) = x_1(k) - \left(p_1 - p_2\, x_1(k-1)\sqrt{x_2(k-1)} + x_1(k-1)\right) \tag{98}$$
$$\nu_2(k) = x_2(k) - \left(p_3\, u(k-1) + p_4\, x_2(k-1)\right) \tag{99}$$
$$\nu_3(k) = x_3(k) - \left(p_5\, x_1(k-1)\sqrt{x_2(k-1)} + p_6\, x_2(k-1) + p_7\, x_3(k-1)\right) \tag{100}$$
$$\nu_4(k) = x_4(k) - \left(p_8\, x_1(k-1)\sqrt{x_2(k-1)} + p_9\, x_4(k-1)\right) \tag{101}$$
$$d(k) = y(k) - \frac{x_4(k)}{x_3(k)} \tag{102}$$
Next, we find the equations for the state and output predictions for $p = 2, \ldots, N$
$$\hat{x}_1(k+p|k) = p_1 - p_2\, \hat{x}_1(k+p-1|k)\sqrt{\hat{x}_2(k+p-1|k)} + \hat{x}_1(k+p-1|k) + \nu_1(k) \tag{103}$$
$$\hat{x}_2(k+p|k) = p_3\, u(k+p-1|k) + p_4\, \hat{x}_2(k+p-1|k) + \nu_2(k) \tag{104}$$
$$\hat{x}_3(k+p|k) = p_5\, \hat{x}_1(k+p-1|k)\sqrt{\hat{x}_2(k+p-1|k)} + p_6\, \hat{x}_2(k+p-1|k) + p_7\, \hat{x}_3(k+p-1|k) + \nu_3(k) \tag{105}$$
$$\hat{x}_4(k+p|k) = p_8\, \hat{x}_1(k+p-1|k)\sqrt{\hat{x}_2(k+p-1|k)} + p_9\, \hat{x}_4(k+p-1|k) + \nu_4(k) \tag{106}$$
$$\hat{y}^{\mathrm{FP}}(k+p|k) = \frac{\hat{x}_4(k+p|k)}{\hat{x}_3(k+p|k)} + d(k) \tag{107}$$
Using the above predictions generated by the FP model, we have to determine derivatives of the predicted trajectory of the controlled variable with respect to the trajectory of the manipulated variable, i.e., the derivative matrix H ( k ) , as defined by Equation (38). For this purpose, we use the general rules formulated in Section 3.9. We consider Equations (81) and (82) and we obtain
$$\frac{\partial \hat{x}_1(k+1|k)}{\partial u(k+r|k)} = 0 \tag{108}$$
$$\frac{\partial \hat{x}_2(k+1|k)}{\partial u(k+r|k)} = 80T\, \frac{\partial u(k|k)}{\partial u(k+r|k)} \tag{109}$$
$$\frac{\partial \hat{x}_3(k+1|k)}{\partial u(k+r|k)} = 0 \tag{110}$$
$$\frac{\partial \hat{x}_4(k+1|k)}{\partial u(k+r|k)} = 0 \tag{111}$$
Equation (83) allows us to express the output derivative for the sampling instant $k+1$ as
$$\frac{\partial \hat{y}^{\mathrm{FP}}(k+1|k)}{\partial u(k+r|k)} = 0 \tag{112}$$
Finally, we use Equations (84)–(86) to determine the state variable and output derivatives, respectively
$$\frac{\partial \hat{x}_1(k+p|k)}{\partial u(k+r|k)} = -p_2\, \frac{\partial \hat{x}_1(k+p-1|k)}{\partial u(k+r|k)}\, \sqrt{\hat{x}_2(k+p-1|k)} - 0.5\, p_2\, \hat{x}_1(k+p-1|k) \left(\hat{x}_2(k+p-1|k)\right)^{-\frac{1}{2}} \frac{\partial \hat{x}_2(k+p-1|k)}{\partial u(k+r|k)} + \frac{\partial \hat{x}_1(k+p-1|k)}{\partial u(k+r|k)} \tag{113}$$
$$\frac{\partial \hat{x}_2(k+p|k)}{\partial u(k+r|k)} = p_3\, \frac{\partial u(k+p-1|k)}{\partial u(k+r|k)} + p_4\, \frac{\partial \hat{x}_2(k+p-1|k)}{\partial u(k+r|k)} \tag{114}$$
$$\frac{\partial \hat{x}_3(k+p|k)}{\partial u(k+r|k)} = p_5\, \frac{\partial \hat{x}_1(k+p-1|k)}{\partial u(k+r|k)}\, \sqrt{\hat{x}_2(k+p-1|k)} + 0.5\, p_5\, \hat{x}_1(k+p-1|k) \left(\hat{x}_2(k+p-1|k)\right)^{-\frac{1}{2}} \frac{\partial \hat{x}_2(k+p-1|k)}{\partial u(k+r|k)} + p_6\, \frac{\partial \hat{x}_2(k+p-1|k)}{\partial u(k+r|k)} + p_7\, \frac{\partial \hat{x}_3(k+p-1|k)}{\partial u(k+r|k)} \tag{115}$$
$$\frac{\partial \hat{x}_4(k+p|k)}{\partial u(k+r|k)} = p_8\, \frac{\partial \hat{x}_1(k+p-1|k)}{\partial u(k+r|k)}\, \sqrt{\hat{x}_2(k+p-1|k)} + 0.5\, p_8\, \hat{x}_1(k+p-1|k) \left(\hat{x}_2(k+p-1|k)\right)^{-\frac{1}{2}} \frac{\partial \hat{x}_2(k+p-1|k)}{\partial u(k+r|k)} + p_9\, \frac{\partial \hat{x}_4(k+p-1|k)}{\partial u(k+r|k)} \tag{116}$$
$$\frac{\partial \hat{y}^{\mathrm{FP}}(k+p|k)}{\partial u(k+r|k)} = \frac{1}{\left(\hat{x}_3(k+p|k)\right)^2}\left(\frac{\partial \hat{x}_4(k+p|k)}{\partial u(k+r|k)}\, \hat{x}_3(k+p|k) - \hat{x}_4(k+p|k)\, \frac{\partial \hat{x}_3(k+p|k)}{\partial u(k+r|k)}\right) \tag{117}$$
for all $p = 2, \ldots, N$ and $r = 0, \ldots, N_u-1$.

4.3. LSTM Model for Polymerization Process

Two separate training datasets have been collected from the simulated process (i.e., from simulations of the continuous-time first-principle model) for different operating conditions, as follows:
  • dataset 1 has been collected for the range of the manipulated variable $0.003 < F_{\mathrm{I}} < 0.0129$, which results in the controlled variable $2.78 \times 10^4 < \mathrm{NAMW} < 4.55 \times 10^4$;
  • dataset 2 has been collected for $0.05 < F_{\mathrm{I}} < 0.06$, which results in $1.41 \times 10^4 < \mathrm{NAMW} < 1.54 \times 10^4$.
The datasets were then used to train two LSTM models, denoted thereafter as LSTM1 and LSTM2. Both models have been trained with the same parameters:
  • the number of neurons $n_N = 7$;
  • the order of dynamics $n_A = 0$, $n_B = 1$.
LSTM models have been trained in MATLAB on a PC equipped with an Nvidia GeForce 970 GTX GPU, an Intel i5-3450 CPU and 16 GB of RAM. We have employed the Adam optimization algorithm with a learning rate of 0.001 and a maximum of 1000 training epochs.
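For readers who prefer an open-source toolchain, an equivalent training setup can be sketched in PyTorch; the layer sizes and optimizer settings follow the text, while the class name and the dataset tensors `X_train` and `y_train` are our assumptions.

```python
import torch
import torch.nn as nn

class LSTMSubModel(nn.Module):
    """LSTM layer with n_N = 7 cells followed by a fully connected
    output layer, as in Section 2.3."""
    def __init__(self, n_inputs=1, n_N=7):
        super().__init__()
        self.lstm = nn.LSTM(n_inputs, n_N, batch_first=True)
        self.fc = nn.Linear(n_N, 1)

    def forward(self, x):
        out, _ = self.lstm(x)          # x: (batch, sequence, n_inputs)
        return self.fc(out[:, -1, :])  # output at the last time step

model = LSTMSubModel()
opt = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam, as in the text
loss_fn = nn.MSELoss()
for epoch in range(1000):              # up to 1000 epochs, as in the text
    opt.zero_grad()
    loss = loss_fn(model(X_train), y_train)  # X_train, y_train: assumed tensors
    loss.backward()
    opt.step()
```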

4.4. Modeling Quality of LSTM and FP Models

The modeling quality of all sub-models developed for the polymerization process can be compared in Figure 5. In this comparison, we can see the individual outputs of all sub-models when operating independently on the test dataset. LSTM1, trained predominantly with data featuring large NAMW values, unsurprisingly demonstrates exceptional performance when dealing with such high NAMW values. However, the model's capability to provide correct outputs diminishes when it encounters data not present in the training dataset. Conversely, LSTM2, trained with low NAMW values, excels when the NAMW values are indeed low. However, it exhibits subpar performance when attempting to model high NAMW values. Notably, the FP model with the increased gain performs poorly across the entire range of NAMW values.

4.5. Development of PIHNN Models

Once all the sub-models have been prepared, the next step in designing the PIHNN model is to develop the DF block. Various membership function shapes have been tested:
  • PIHNN model ver. 1—initial trapezoidal functions;
  • PIHNN model ver. 2—optimized trapezoidal functions;
  • PIHNN model ver. 3—initial sigmoidal functions;
  • PIHNN model ver. 4—optimized sigmoidal functions;
  • PIHNN model ver. 5—initial Gaussian functions;
  • PIHNN model ver. 6—optimized Gaussian functions.
The membership functions are depicted in Figure 6. Our understanding of the sub-models has guided the initial choices of these shapes. The plots display the fuzzified variable values along the horizontal axis, specifically representing the NAMW output of the polymerization reactor; the membership function values are given along the vertical axis. Each membership function corresponds to a particular model. LSTM1, which was trained on data with large NAMW values, is most effective when dealing with large NAMW values. The blue membership functions in the plot indicate the range of NAMW values for which prioritizing the use of the LSTM1 model is recommended. LSTM2, characterized by yellow membership functions, is best suited for NAMW values close to the data in its training set, which primarily includes small values of NAMW. In scenarios where NAMW values fall outside the data ranges of both training sets, the most reliable choice is to utilize the FP model, represented by orange membership functions. Once the initial shapes have been determined, the subsequent step involves an optimization procedure to fine-tune these shapes. The procedure starts with the initial membership function shapes and uses the Levenberg–Marquardt algorithm to minimize the overall error of the PIHNN model.
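The shape optimization can be realized, for instance, with SciPy's implementation of the Levenberg–Marquardt algorithm; the sketch below is our illustration under the assumption that `pihnn_output_fn` evaluates the PIHNN model for a given parameter vector.

```python
import numpy as np
from scipy.optimize import least_squares

def tune_membership(theta0, X_df, y_true, pihnn_output_fn):
    """Tune membership-function parameters (a_n, b_n, c_n, d_n)
    by minimizing the overall PIHNN model error."""
    def residuals(theta):
        return pihnn_output_fn(theta, X_df) - y_true
    # method='lm' selects the Levenberg-Marquardt algorithm
    sol = least_squares(residuals, theta0, method='lm')
    return sol.x
```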

4.6. PIHNN Modeling Quality

The results of the polymerization reactor modeling experiments are presented in Figure 7, Figure 8 and Figure 9. These figures illustrate the initial 1500 steps of the simulation. Each figure showcases the outputs of two PIHNN models: one with the initial membership function shapes (orange) and the other with optimized (yellow) membership function shapes. These results are compared to the data from the test set. Figure 7 presents the use of the most straightforward decision blocks with trapezoidal membership functions. Even this simplest approach enables the PIHNN model to outperform the individual sub-models. The initial shape of the membership functions allows the PIHNN structure to represent the data effectively for both small and large values of NAMW. In cases with intermediate values of NAMW, the PIHNN model averages the outputs of the sub-models; while the model output still exhibits some deviation from the test data, there is a clear improvement over the FP model. The model with the tuned shapes has a lower overall error; however, it tends to have poorer modeling quality for both large and small values of NAMW in comparison to the LSTM sub-models.
Figure 8 illustrates the utilization of sigmoidal membership functions in the DF block of the PIHNN model. Here, the sigmoidal shape allows the PIHNN to excel in modeling small, large, and intermediate NAMW values. Importantly, the tendency to average out intermediate NAMW values, as previously observed with the trapezoidal DF model, has been eliminated with the sigmoidal DF PIHNN model. Adopting sigmoidal functions has resulted in highly accurate modeling of medium NAMW values. When comparing the output signals of the models with the initial and tuned shapes of the membership functions, they exhibit minimal differences, with only slight variations noticeable for intermediate values of NAMW.
Finally, Figure 9 presents the utilization of Gaussian membership functions in the DF block of the PIHNN model. Here, one can observe that the Gaussian decision model tends to average the values of the three sub-models across the entire spectrum of NAMW variability. This effect is particularly evident in the model with the initial shape of the membership functions, where, for large values of NAMW, the model noticeably diverges from the data. As a result, for large NAMW values, the PIHNN gives worse results than the independent LSTM1 sub-model. Low and intermediate NAMW values are subject to much lower modeling errors. Although optimizing the shapes mitigated this averaging effect somewhat, the model's output still exhibits relatively large errors.

4.7. Validation of MPC Algorithms Using PIHNN Models

The PIHNN model, in six different versions, has been implemented in MPC algorithms. We compare the results obtained from two types of controllers: one with nonlinear optimization (MPC-NO) and the second one recommended in this work, involving linearization around the prediction trajectory (MPC-NPLPT). Table 1 compares the control errors determined for these controllers. First, it is worth noting that the best control quality is achieved for models utilizing DF with Gaussian function shapes. Models employing trapezoidal functions exhibited slightly higher errors, while the poorest performance was observed in models with sigmoidal-shaped functions. This observation may seem counter-intuitive, considering that models with sigmoidal membership functions have smaller modeling errors compared to models with Gaussian ones. It is important to stress that the shape of the closed-loop output trajectory with the MPC controller is affected not only by the quality of the model used but also by the feedback mechanism. Even though Gaussian models exhibit a higher error rate, their inherent averaging characteristic enhances the performance of the MPC controller when coupled with feedback.
Secondly, Table 1 demonstrates that the MPC-NPLPT controller generally yields slightly higher error values than the MPC-NO one when utilizing the same PIHNN model for prediction. This result is not surprising, as MPC-NPLPT employs a linearized model. During linearization, some of the information present in the nonlinear model is simplified or lost. The exception here is PIHNN model ver. 5, where the MPC-NPLPT algorithm provides better controller performance. This may be attributed to chance: the simplifications introduced by linearization happened to benefit the controller's performance in this specific case. However, it is worth noting that the error differences between the MPC-NO and MPC-NPLPT controllers are minimal for each type of PIHNN model, and both types of controllers work very well.
Table 2 compares the average time required by each MPC controller for control calculations. The computations have been conducted on a PC, and since it is not a real-time system, the results may vary on different PCs. Therefore, the results are presented as percentages. The longest time recorded, for MPC-NO with PIHNN model ver. 3, which amounted to 140 ms, is considered as 100%. The table reveals that the implementation of the online linearization-based MPC controller significantly reduced the required calculation time, resulting in a four- to five-fold decrease compared to the nonlinear controllers.
The results are also presented visually. In Figure 10, one can observe the performance of the MPC algorithm with a DF block employing trapezoidal functions. The output responses for the PIHNN model with the initial function shapes are fast, without overshoot, for both low and high values of NAMW. However, for intermediate NAMW values, there is a slightly larger overshoot, and the settling time is extended. The signals are quite similar in the case of the DF block with tuned function shapes, but there is a greater overshoot for intermediate NAMW values. Additionally, it is worth noting that the results obtained for the MPC-NO controller are practically indistinguishable from those for the MPC-NPLPT one.
Figure 11 illustrates the results for sigmoidal membership functions. Here, we can observe that the overshoot becomes more pronounced for intermediate NAMW values, especially when considering the set-point $\mathrm{NAMW}^{\mathrm{sp}} = 2.5 \times 10^4$.
The final Figure 12 displays the results of applying Gaussian membership functions. These results are characterized by the shortest settling time and the smallest overshoot. Notably, the controller exhibits excellent performance for intermediate values of NAMW. This observation leads to the conclusion that the averaging nature of Gaussian functions, as seen earlier in the modeling phase (Figure 9), positively impacts the controller's performance when the model is used in the MPC scheme. For NAMW values within the range of $2 \times 10^4$ to $3 \times 10^4$, the FP model significantly impacts the PIHNN performance. As mentioned, the FP model is imperfect, its gain being increased by 20%.

5. Conclusions

This work defines a new PIHNN model structure that combines the first-principle process description and data-driven neural sub-models using a specialized data fusion block that relies on fuzzy logic. We consider a very practical case in which the available first-principle model is imperfect and the data cannot be measured in the complete range of process operation. By combining an imperfect physical model with data obtained from an incomplete range of operation, we have developed a hybrid model that significantly improves performance across the entire range of signal variability. Secondly, this work develops a computationally efficient MPC controller for the PIHNN model. We show the efficacy of the PIHNN model and the resulting MPC controller for a simulated polymerization benchmark. We study the efficiency of different fuzzy data fusion blocks and their impact on model accuracy. We recommend tuning, i.e., optimizing, the fuzzy membership functions, as it greatly improves model accuracy. Finally, we show that the described MPC controller based on the PIHNN model gives excellent results. Namely, the obtained control quality is very similar to that possible in MPC relying on nonlinear optimization, while its calculation time is a few times shorter. In our future work, we plan to develop a methodology for designing PIHNN structures tailored to processes with multiple inputs and outputs. Additionally, it would be interesting to examine the impact of employing various decision model types within the data fusion block on PIHNN modeling quality.

Author Contributions

Conceptualization, K.Z. and M.Ł.; methodology, K.Z. and M.Ł.; software, K.Z. and M.Ł.; validation, K.Z. and M.Ł.; formal analysis, K.Z. and M.Ł.; investigation, K.Z.; writing—original draft preparation, K.Z. and M.Ł.; writing—review and editing, K.Z. and M.Ł.; visualization, K.Z.; supervision, M.Ł. All authors have read and agreed to the published version of the manuscript.

Funding

This research was financed by the Warsaw University of Technology in the framework of the project for the scientific discipline automatic control, electronics and electrical engineering.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

On request from the authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Camacho, E.F.; Bordons, C. Model Predictive Control; Springer: London, UK, 1999.
2. Tatjewski, P. Advanced Control of Industrial Processes, Structures and Algorithms; Springer: London, UK, 2007.
3. Hosen, M.A.; Hussain, M.A.; Mjalli, F.S. Control of polystyrene batch reactors using neural network based model predictive control (NNMPC): An experimental investigation. Control Eng. Pract. 2011, 19, 454–467.
4. Wang, B.; Shahzad, M.; Zhu, X.; Rehman, K.U.; Uddin, S. A Non-linear Model Predictive Control Based on Grey-Wolf Optimization Using Least-Square Support Vector Machine for Product Concentration Control in l-Lysine Fermentation. Sensors 2020, 20, 3335.
5. Assandri, A.D.; de Prada, C.; Rueda, A.; Martínez, J.S. Nonlinear parametric predictive temperature control of a distillation column. Control Eng. Pract. 2013, 21, 1795–1806.
6. Carli, R.; Cavone, G.; Ben Othman, S.; Dotoli, M. IoT Based Architecture for Model Predictive Control of HVAC Systems in Smart Buildings. Sensors 2020, 20, 781.
7. Alexis, K.; Nikolakopoulos, G.; Tzes, A. Switching model predictive attitude control for a quadrotor helicopter subject to atmospheric disturbances. Control Eng. Pract. 2011, 19, 1195–1207.
8. Gruber, J.K.; Doll, M.; Bordons, C. Design and experimental validation of a constrained MPC for the air feed of a fuel cell. Control Eng. Pract. 2009, 17, 874–885.
9. Lima, P.F.; Pereira, G.C.; Mårtensson, J.; Wahlberg, B. Experimental validation of model predictive control stability for autonomous driving. Control Eng. Pract. 2018, 81, 244–255.
10. Yao, F.; Yang, C.; Liu, X.; Zhang, M. Experimental Evaluation on Depth Control Using Improved Model Predictive Control for Autonomous Underwater Vehicle (AUVs). Sensors 2018, 18, 2321.
11. Ding, Z.; Sun, C.; Zhou, M.; Liu, Z.; Wu, C. Intersection Vehicle Turning Control for Fully Autonomous Driving Scenarios. Sensors 2021, 21, 3995.
12. Bassolillo, S.R.; D’Amato, E.; Notaro, I.; Blasi, L.; Mattei, M. Decentralized Mesh-Based Model Predictive Control for Swarms of UAVs. Sensors 2020, 20, 4324.
13. Xiong, L.; Fu, Z.; Zeng, D.; Leng, B. An Optimized Trajectory Planner and Motion Controller Framework for Autonomous Driving in Unstructured Environments. Sensors 2021, 21, 4409.
14. Simon, D. Optimal State Estimation: Kalman, H∞, and Nonlinear Approaches; John Wiley and Sons: Hoboken, NJ, USA, 2006.
15. Karimshoushtari, M.; Novara, C.; Tango, F. How Imitation Learning and Human Factors Can Be Combined in a Model Predictive Control Algorithm for Adaptive Motion Planning and Control. Sensors 2021, 21, 4012.
16. Miller, A.; Rybczak, M.; Rak, A. Towards the Autonomy: Control Systems for the Ship in Confined and Open Waters. Sensors 2021, 21, 2286.
17. Bacci di Capaci, R.; Vaccari, M.; Pannocchia, G. Model predictive control design for multivariable processes in the presence of valve stiction. J. Process Control 2018, 71, 25–34.
18. Ławryńczuk, M. Modelling and predictive control of a neutralisation reactor using sparse Support Vector Machine Wiener models. Neurocomputing 2016, 205, 311–328.
19. Ławryńczuk, M. Nonlinear Predictive Control Using Wiener Models: Computationally Efficient Approaches for Polynomial and Neural Structures; Studies in Systems, Decision and Control; Springer: Cham, Switzerland, 2022; Volume 389.
20. Ławryńczuk, M. Computationally Efficient Model Predictive Control Algorithms: A Neural Network Approach; Studies in Systems, Decision and Control; Springer: Cham, Switzerland, 2014; Volume 3.
21. Balla, K.M.; Nørgaard, J.T.; Bendtsen, J.D.; Kallesøe, C.S. Model Predictive Control using linearized Radial Basis Function Neural Models for Water Distribution Networks. In Proceedings of the 2019 IEEE Conference on Control Technology and Applications (CCTA), Hong Kong, China, 19–21 August 2019; pp. 368–373.
22. Schwedersky, B.B.; Flesch, R.C.C.; Dangui, H.A.S. Practical nonlinear model predictive control algorithm for Long Short-Term Memory networks. IFAC-PapersOnLine 2019, 52, 468–473.
23. Zarzycki, K.; Ławryńczuk, M. Advanced predictive control for GRU and LSTM networks. Inf. Sci. 2022, 616, 229–254.
24. Wang, Y. A new concept using LSTM Neural Networks for dynamic system identification. In Proceedings of the 2017 American Control Conference (ACC), Seattle, WA, USA, 24–26 May 2017; pp. 5324–5329.
25. Jordan, I.D.; Sokół, P.A.; Park, I.M. Gated Recurrent Units Viewed Through the Lens of Continuous Time Dynamical Systems. Front. Comput. Neurosci. 2021, 15, 678158.
26. Bonassi, F.; da Silva, C.F.O.; Scattolini, R. Nonlinear MPC for Offset-Free Tracking of systems learned by GRU Neural Networks. IFAC-PapersOnLine 2021, 54, 54–59.
27. Zarzycki, K.; Ławryńczuk, M. LSTM and GRU Neural Networks as Models of Dynamical Processes Used in Predictive Control: A Comparison for Two Chemical Reactors. Sensors 2021, 21, 5625.
28. Li Ping, Z.; Min, X.; Hui-Nan, W. Hybrid control of bifurcation in a predator-prey system with three delays. Acta Phys. Sin. 2011, 60, 010506.
29. Lu, L.; Huang, C.; Song, X. Bifurcation control of a fractional-order PD control strategy for a delayed fractional-order prey–predator system. Eur. Phys. J. Plus 2023, 138, 77.
30. Xu, C.; Cui, X.; Li, P.; Yan, J.; Yao, L. Exploration on dynamics in a discrete predator–prey competitive model involving feedback controls. J. Biol. Dyn. 2023, 17, 2220349.
31. Alhajeri, M.S.; Luo, J.; Wu, Z.; Albalawi, F.; Christofides, P.D. Process structure-based recurrent neural network modeling for predictive control: A comparative study. Chem. Eng. Res. Des. 2022, 179, 77–89.
32. Roehrl, M.A.; Runkler, T.A.; Brandtstetter, V.; Tokic, M.; Obermayer, S. Modeling System Dynamics with Physics-Informed Neural Networks Based on Lagrangian Mechanics. IFAC-PapersOnLine 2020, 53, 9195–9200.
33. Yang, L.; Meng, X.; Karniadakis, G.E. B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data. J. Comput. Phys. 2021, 425, 109913.
34. Nascimento, R.G.; Fricke, K.; Viana, F.A. A tutorial on solving ordinary differential equations using Python and hybrid physics-informed neural network. Eng. Appl. Artif. Intell. 2020, 96, 103996.
35. Antonelo, E.A.; Camponogara, E.; Seman, L.O.; de Souza, E.R.; Jordanou, J.P.; Hübner, J.F. Physics-Informed Neural Nets-based Control. arXiv 2021, arXiv:2104.02556.
36. Bolderman, M.; Lazar, M.; Butler, H. Physics-Guided Neural Networks for Inversion-based Feedforward Control applied to Linear Motors. In Proceedings of the 2021 IEEE Conference on Control Technology and Applications (CCTA), San Diego, CA, USA, 9–11 August 2021; pp. 1115–1120.
37. Wang, R.; Yu, R. Physics-Guided Deep Learning for Dynamical Systems: A Survey. arXiv 2021, arXiv:2107.01272.
38. Nascimento, R.G.; Corbetta, M.; Kulkarni, C.S.; Viana, F.A. Hybrid physics-informed neural networks for lithium-ion battery modeling and prognosis. J. Power Sources 2021, 513, 230526.
39. Shi, R.; Mo, Z.; Di, X. Physics-Informed Deep Learning for Traffic State Estimation: A Hybrid Paradigm Informed By Second-Order Traffic Models. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 540–547.
40. Daw, A.; Karpatne, A.; Watkins, W.; Read, J.; Kumar, V. Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling. arXiv 2017, arXiv:1710.11431.
41. Zarzycki, K.; Ławryńczuk, M. Physics-Informed Hybrid Neural Network Model for MPC: A Fuzzy Approach. In Lecture Notes in Networks and Systems; Pawełczyk, M., Bismor, D., Ogonowski, S., Kacprzyk, J., Eds.; Springer Nature Switzerland: Cham, Switzerland, 2023; Volume 708, pp. 183–192.
42. Hochreiter, S. Untersuchungen zu dynamischen neuronalen Netzen. Master’s Thesis, Technical University Munich, Munich, Germany, 1991.
43. Doyle, F.J.; Ogunnaike, B.A.; Pearson, R. Nonlinear model-based control using second-order Volterra models. Automatica 1995, 31, 697–714.
Figure 1. General structure of the PIHNN model.
Figure 2. Structure of the LSTM cell.
Figure 3. Structure of the whole LSTM network.
Figure 4. Flow chart for development of PIHNN model.
Figure 5. A total of 1000 samples of the validation dataset vs. outputs of two local LSTM sub-models and the FP model with an incorrect gain.
Figure 6. Membership functions for considered fuzzy PIHNN models: fuzzy set 1 (blue), fuzzy set 2 (orange), fuzzy set 3 (yellow). (a) Initial (left) and optimized (right) trapezoidal membership functions; (b) initial (left) and optimized (right) sigmoidal membership functions; (c) initial (left) and optimized (right) Gauss membership functions.
Figure 7. A total of 1000 samples of the validation dataset vs. the output of initial and optimized fuzzy PIHNN structures with trapezoidal MFs (PIHNN models ver. 1 and ver. 2).
Figure 8. A total of 1000 samples of the validation dataset vs. the output of initial and optimized fuzzy PIHNN structures with sigmoid MFs (PIHNN models ver. 3 and ver. 4).
Figure 9. A total of 1000 samples of the validation dataset vs. the output of initial and optimized fuzzy PIHNN structures with Gauss MFs (PIHNN models ver. 5 and ver. 6).
Figure 10. MPC with trapezoidal membership function shapes. (a) MPC-NO and MPC-NPLPT controllers using PIHNN model ver. 1; (b) MPC-NO and MPC-NPLPT controllers using PIHNN model ver. 2.
Figure 11. MPC controllers with sigmoidal membership function shapes. (a) MPC-NO and MPC-NPLPT controllers using PIHNN model ver. 3; (b) MPC-NO and MPC-NPLPT controllers using PIHNN model ver. 4.
Figure 12. MPC controllers with Gaussian membership function shapes. (a) MPC-NO and MPC-NPLPT controllers using PIHNN model ver. 5; (b) MPC-NO and MPC-NPLPT controllers using PIHNN model ver. 6.
Table 1. Control errors of MPC algorithms with different PIHNN models.

Model Type     MPC-NO   MPC-NPLPT
PIHNN ver. 1   3.051    3.116
PIHNN ver. 2   3.031    3.095
PIHNN ver. 3   3.205    3.263
PIHNN ver. 4   3.220    3.290
PIHNN ver. 5   2.935    2.741
PIHNN ver. 6   2.965    3.020
Table 2. Average execution time of MPC algorithms with different PIHNN models (expressed as a percentage of the longest observed time).

Model Type     MPC-NO   MPC-NPLPT
PIHNN ver. 1    94.4%    22.4%
PIHNN ver. 2    97.2%    23.1%
PIHNN ver. 3   100.0%    21.0%
PIHNN ver. 4    97.2%    23.1%
PIHNN ver. 5    89.3%    24.5%
PIHNN ver. 6    93.7%    23.1%