
Observer-Based State Estimation for Recurrent Neural Networks: An Output-Predicting and LPV-Based Approach

School of Science, Jimei University, Xiamen 361021, China
* Author to whom correspondence should be addressed.
Math. Comput. Appl. 2023, 28(6), 104; https://doi.org/10.3390/mca28060104
Submission received: 5 August 2023 / Revised: 12 October 2023 / Accepted: 13 October 2023 / Published: 25 October 2023

Abstract

An innovative cascade predictor is presented in this study to forecast the state of recurrent neural networks (RNNs) with delayed output. This cascade predictor is a chain-structured observer, as opposed to the conventional single observer, and is made up of several sub-observers that individually estimate the states of the neurons at different time instants. This new cascade predictor is more useful than the conventional single observer for predicting neural network states when the output delay is arbitrarily large but known. In contrast to analyses of the stability of error systems that employ only the Lyapunov–Krasovskii functional (LKF), several new global asymptotic stability criteria are obtained by combining the Linear Parameter Varying (LPV) approach, the LKF and the convex principle. Finally, a series of numerical simulations verify the efficacy of the obtained results.

1. Introduction

Over the past decades, delayed recurrent neural networks have been successfully applied in many fields, including pattern recognition, image processing, and combinatorial optimization [1,2,3,4,5], and the dynamic behaviors of RNNs have quickly become a research hotspot. At present, many stability results on the dynamic behavior of RNNs have been obtained [6,7,8,9,10,11,12]. Meanwhile, the state information of neurons is very important because it may enter the design of control laws, such as feedback control. Therefore, research on neural networks’ state estimation is of significant importance in practical applications.
The issue of state estimation for RNNs is currently of great interest to many scholars, and many significant results have been obtained [13,14,15,16,17,18,19,20]. In [13], the authors discussed the state estimation problem for delayed RNNs and obtained delay-independent results using the LMI technique. The state estimation problem for Markov jump RNNs with distributed delays was discussed in [14]; the authors proposed an effective LMI technique to solve the problem of estimating the neuron states. The state estimation problem of uncertain RNNs was addressed via a robust state estimator in [15], and it was shown that the suggested robust estimator is ensured by the feasibility of a set of LMIs. An interesting delay partition method was proposed in [17]; the authors used this method to investigate the state estimation problem for delayed static neural networks. In [20], the authors solved the state estimation problem of memristive neural networks (MNNs) by using a novel full-order state observer. It is well known that the effectiveness of the designed observer is usually related to the system parameters and the size of the various time delays. For example, in most of the works mentioned above, the output states have no time delay or the size of the delay is limited to a small range, and a full-order observer is designed based on the measured output. In [21], during the identification of RNN models, a subspace encoder is co-estimated to reconstruct the state of the model from past input and output data. However, such an explicit form of observer may run into difficulties if the state delay is not known, and it needs an excessively large number of past input–output samples.
For arbitrarily large but known output delays, it is still an open problem to construct an effective observer that predicts the current states of the neurons accurately. In fact, the proposal of the cascade predictor in the field of nonlinear systems has attracted much attention from researchers in recent years. In [22], the authors first proposed a cascade predictor for a class of triangular nonlinear systems with only an output delay; the cascade predictor is made up of a series of subsystems, each with a similar structure. However, due to the complexity of the structure of the cascade predictor in [22], the observation estimate is not easily implemented by computer simulation. Since then, the cascade predictor has been studied further, and a series of results have been achieved for such triangular nonlinear systems with only an output delay [23,24,25]. It can be observed that the cascade predictor has promising application value in the state estimation of delay systems. Incorporating cascade predictors into the state estimation of RNNs is a challenge from which new state estimation approaches may arise.
Inspired by the arguments mentioned above, we focus on the state estimation of delayed RNNs based on cascade predictors. The designed cascade predictor is composed of a finite number of subsystems; each subsystem estimates the neuron states at a different delay, and the last subsystem estimates the current actual states of the neurons. The following are the paper’s primary innovative ideas.
(1) This paper theoretically explains why a single observer cannot recover the neuron state information when the output delay is large enough. Then, inspired by [22,23], we design a new cascade predictor for estimating the state of RNNs with state and output delays. To the best of our knowledge, this is the first time that a cascade predictor has been applied to state estimation in neural networks.
(2) For the activation function, most papers use the traditional Lipschitz condition hypothesis; however, for activations with a large Lipschitz constant, this may indirectly introduce conservatism into the design process and the theoretical results. To overcome these difficulties, a new reformulated Lipschitz property of the activation function, obtained by applying the LPV approach to the Lipschitz condition, is provided. This property is motivated by [26,27] and can lessen conservatism in the observer design process.
(3) In contrast to [14,16,20], the case in which the output states have an arbitrarily large delay is explored, and the state prediction problem of delayed RNNs is solved based on the measured output. The observer gain may be calculated from a set of LMIs, and new sufficient conditions for the global asymptotic stability of each error system are obtained based on the LKF, the LMI technique, and the convex principle.
The structure of this paper is as follows. The RNNs model and its associated assumptions are introduced in Section 2 of this article. The main results of this paper are presented in Section 3. The efficiency of the results obtained is demonstrated in Section 4 by numerical simulations. Section 5 closes with a general conclusion.
Notations: $\mathbb{R}$ and $\mathbb{Z}^+$ denote, respectively, the set of real numbers and the set of positive integers. $\mathbb{R}^n$ represents the $n$-dimensional Euclidean space with the Euclidean norm $\|\cdot\|$. $\mathbb{R}^{n \times m}$ denotes the set of all $n \times m$ real matrices. The superscripts $T$ and $-1$ represent the transpose and inverse of a matrix. $X > Y$ ($X < Y$) means that $X - Y$ is a positive (negative) definite matrix. $\|A\|$ denotes the operator norm of a matrix $A = (a_{ij})$, i.e., $\|A\| = \sqrt{\lambda_{max}(A^T A)}$, where $\lambda_{max}(A)$ is the largest eigenvalue of $A$. $\mathrm{diag}\{\cdots\}$ represents a block diagonal matrix. The symbol $*$ denotes the symmetric term of a matrix. Let $\tau > 0$; $\mathcal{C}([-\tau, 0]; \mathbb{R}^n)$ denotes the family of continuous functions $\psi$ from $[-\tau, 0]$ to $\mathbb{R}^n$. $I$ stands for an identity matrix with the proper dimensions.
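As a quick numerical illustration of the operator-norm notation above (our sketch, not part of the paper; the sample matrix is arbitrary):

```python
import numpy as np

# Illustration of the notation ||A|| = sqrt(lambda_max(A^T A)) from the
# Notations paragraph; for any matrix this coincides with the largest
# singular value, i.e. NumPy's spectral norm.
def op_norm(A):
    return float(np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A))))

A = np.array([[3.0, 0.3], [0.3, 3.0]])  # arbitrary sample matrix
```

For this symmetric example the eigenvalues of $A^T A$ are $3.3^2$ and $2.7^2$, so `op_norm(A)` returns 3.3.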

2. Problem Formulation

Consider the following RNNs with delayed output [28,29]:
$$\dot{x}(t) = -A x(t) + W_0 f(x(t)) + W_1 f(x(t - h_x)), \quad y(t) = C x(t - h_y), \quad x(s) = \phi(s), \ s \in [-\tau, 0], \tag{1}$$
where $x(t) = [x_1(t), \dots, x_n(t)]^T \in \mathbb{R}^n$ denotes the state vector, $A = \mathrm{diag}\{a_1, \dots, a_n\}$ is a diagonal matrix with $a_i > 0$, $W_0$ and $W_1$ represent the connection weight matrices, $f(x(t)) = [f_1(x_1(t)), \dots, f_n(x_n(t))]^T \in \mathbb{R}^n$ denotes the activation function, $C$ is the output matrix, and $y(t)$ represents the measured output. $h_x$ and $h_y$ denote the known discrete delays, $\phi(s) \in \mathcal{C}([-\tau, 0]; \mathbb{R}^n)$ is an initial condition, and $\tau = \max\{h_x, h_y\}$.
The primary objective of this research is to construct an effective observer that can accurately predict the neuron states when the output delay $h_y$ is arbitrarily large yet known. For the subsequent analysis, the following assumption and lemmas are given.
Assumption 1. 
The activation function $f_i(\cdot)$ is bounded and satisfies
$$|f_i(u) - f_i(v)| \le l_i |u - v|, \quad \forall u, v \in \mathbb{R}, \tag{2}$$
where $f_i(0) = 0$ and $l_i > 0$ is a Lipschitz constant.
The Lipschitz condition (2) on the activation function may introduce some conservative conditions in the observer design. However, it is generally known that the LPV approach can reduce the conservatism of the Lipschitz condition, making it useful for designing observers for nonlinear systems with a large Lipschitz constant [26,27]. Here, we extend this method to RNNs (1) and derive the subsequent lemma.
Lemma 1. 
The activation function $f(\cdot)$ has the following two equivalent properties:
(1) 
Lipschitz property: $f_i(\cdot)$ is $l_i$-Lipschitz, i.e.,
$$|f_i(x_i) - f_i(y_i)| \le l_i |x_i - y_i|, \quad \forall x_i, y_i \in \mathbb{R}. \tag{3}$$
(2) 
Reformulated Lipschitz property: for all $i = 1, \dots, n$, there exist functions $\psi_{ii}(t): \mathbb{R} \to \mathbb{R}$ and constants $\overline{\gamma}_{ii}$, $\underline{\gamma}_{ii}$, such that
$$f(x) - f(y) = \sum_{i=1}^{n} \psi_{ii}(t) H_{ii} (x - y), \quad \forall x, y \in \mathbb{R}^n, \tag{4}$$
with $\underline{\gamma}_{ii} \le \psi_{ii}(t) \le \overline{\gamma}_{ii}$, where $H_{ii} = e_n(i) e_n^T(i)$ and $e_n(i) = [0, \dots, \underset{i\text{th}}{1}, \dots, 0]^T \in \mathbb{R}^n$.
The proof of Lemma 1 is similar to those of Lemmas 6 and 7 in [26] and is omitted here. Note that $\psi_{ii}(t)$ in (4) is expressed as
$$\psi_{ii}(t) = \begin{cases} \dfrac{f_i(x_i) - f_i(y_i)}{x_i - y_i}, & x_i \ne y_i, \\[1mm] 0, & x_i = y_i. \end{cases} \tag{5}$$
Remark 1. 
Compared with the traditional global Lipschitz condition hypothesis on activation functions in [13,14,15,16,30], the reformulation (4) in Lemma 1 offers a less conservative Lipschitz condition and characterizes the activation function $f(x)$ more accurately. For instance, for $f(x) = [\tanh(x_1) + \frac{1}{2}\sin(x_1), \ \frac{1}{2}\cos(x_2) + \frac{1}{2}(|x_2 + 1| - |x_2 - 1|)]^T$, we have $\underline{\gamma}_{11} = -\frac{1}{2}$, $\overline{\gamma}_{11} = \frac{3}{2}$, $\underline{\gamma}_{22} = -\frac{1}{2}$, $\overline{\gamma}_{22} = \frac{3}{2}$. For $g(x) = [\frac{3}{4}(|x_1 + 1| - |x_1 - 1|), \ \frac{3}{2}\tanh(x_2)]^T$, we have $\underline{\gamma}_{11} = 0$, $\overline{\gamma}_{11} = \frac{3}{2}$, $\underline{\gamma}_{22} = 0$, $\overline{\gamma}_{22} = \frac{3}{2}$. If we use the global Lipschitz condition in Assumption 1, we can only obtain $|f_i(x_i) - f_i(y_i)| \le \frac{3}{2}|x_i - y_i|$ and $|g_i(x_i) - g_i(y_i)| \le \frac{3}{2}|x_i - y_i|$, $i = 1, 2$; we then cannot accurately distinguish between $f(x)$ and $g(x)$, and the related properties of $f(x)$ and $g(x)$ cannot be effectively utilized.
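The bounds in Remark 1 can be checked numerically. The sketch below (our illustration, not the paper's code) samples difference quotients of the first component $f_1(x) = \tanh(x) + \frac{1}{2}\sin(x)$ and confirms they stay in $[-\frac{1}{2}, \frac{3}{2}]$, whereas the plain Lipschitz condition only gives the symmetric bound $\frac{3}{2}$:

```python
import math

# Difference quotients psi(u, v) = (f1(u) - f1(v)) / (u - v) of
# f1(x) = tanh(x) + 0.5*sin(x).  By the mean value theorem they lie in
# the range of f1'(x) = sech(x)^2 + 0.5*cos(x), i.e. in [-1/2, 3/2].
def f1(x):
    return math.tanh(x) + 0.5 * math.sin(x)

pts = [0.37 * i - 5.0 for i in range(28)]          # sample grid on [-5, 5]
quots = [(f1(u) - f1(v)) / (u - v) for u in pts for v in pts if u != v]
lo, hi = min(quots), max(quots)
```

The sampled quotients come close to both endpoints, so the asymmetric interval $[-\frac{1}{2}, \frac{3}{2}]$ is genuinely tighter than the symmetric Lipschitz bound $[-\frac{3}{2}, \frac{3}{2}]$.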
Lemma 2 
(Finsler’s Lemma [31]). Let $x \in \mathbb{R}^n$, let $M \in \mathbb{R}^{n \times n}$ be a symmetric matrix, and let $G \in \mathbb{R}^{m \times n}$ with $\mathrm{rank}(G) < n$. The following properties are equivalent:
$$(1)\ x^T M x < 0, \quad \forall x \in \mathbb{R}^n \setminus \{0\} \ \text{such that} \ G x = 0; \qquad (2)\ G^{\perp T} M G^{\perp} < 0,$$
where $G^{\perp}$ is a right orthogonal complement of $G$.
Lemma 3 
(Moon’s Inequality [32]). Assume that $x(s) \in \mathbb{R}^{n_a}$ and $y(s) \in \mathbb{R}^{n_b}$ are defined on the interval $\Omega$, and $Y \in \mathbb{R}^{n_a \times n_b}$. Then, for any matrices $D \in \mathbb{R}^{n_a \times n_a}$, $T \in \mathbb{R}^{n_a \times n_b}$ and $Z \in \mathbb{R}^{n_b \times n_b}$, the following holds:
$$-2 \int_{\Omega} x^T(s) Y y(s)\, ds \le \int_{\Omega} \begin{bmatrix} x(s) \\ y(s) \end{bmatrix}^T \begin{bmatrix} D & T - Y \\ * & Z \end{bmatrix} \begin{bmatrix} x(s) \\ y(s) \end{bmatrix} ds,$$
where
$$\begin{bmatrix} D & T \\ * & Z \end{bmatrix} \ge 0.$$

3. Results

3.1. Single Observer

In this section, we employ a full-order observer to address the state estimation of RNNs. First, in order to accurately estimate the state information of the RNNs, we design the full-order observer based on the measured delayed output as
$$\dot{\hat{x}}(t) = -A \hat{x}(t) + W_0 f(\hat{x}(t)) + W_1 f(\hat{x}(t - h_x)) + L \big[C \hat{x}(t - h_y) - y(t)\big], \tag{6}$$
where $\hat{x}(t)$ is an estimate of the state $x(t)$ of (1). Then, by defining the estimation error $e(t) = \hat{x}(t) - x(t)$, we obtain the error system
$$\dot{e}(t) = -A e(t) + W_0 \Delta f(t) + W_1 \Delta f(t - h_x) + L C e(t - h_y), \tag{7}$$
where $\Delta f(t) = f(\hat{x}(t)) - f(x(t))$. Due to Lemma 1, there are functions $\psi_{ii}(t)$ and $\psi_{ii}^{h_x}(t)$ such that
$$\Delta f(t) = \sum_{i=1}^{n} \psi_{ii}(t) H_{ii} e(t), \quad \Delta f(t - h_x) = \sum_{i=1}^{n} \psi_{ii}^{h_x}(t) H_{ii} e(t - h_x), \tag{8}$$
where $\psi_{ii}^{h_x}(t) = \psi_{ii}(t - h_x)$.
Define the time-varying matrices $\Psi(t) = \mathrm{diag}\{\psi_{11}(t), \dots, \psi_{nn}(t)\}$ and $\Psi_{h_x}(t) = \mathrm{diag}\{\psi_{11}^{h_x}(t), \dots, \psi_{nn}^{h_x}(t)\}$, and the bounded convex set $H^n$, where the vertex set of $H^n$ is defined as
$$V_{H^n} = \Big\{\phi = \mathrm{diag}[\phi_{11}, \phi_{22}, \dots, \phi_{nn}] \in \mathbb{R}^{n \times n} \,\Big|\, \phi_{ii} \in \{\underline{\gamma}_{ii}, \overline{\gamma}_{ii}\}\Big\}. \tag{9}$$
It is obvious that the time-varying matrix parameters $\Psi(t)$ and $\Psi_{h_x}(t)$ belong to the bounded convex set $H^n$. Now, we define the following matrices:
$$A(\Psi(t)) = -A + W_0 \sum_{i=1}^{n} \psi_{ii}(t) H_{ii} = -A + W_0 \Psi(t), \quad B(\Psi_{h_x}(t)) = W_1 \sum_{i=1}^{n} \psi_{ii}^{h_x}(t) H_{ii} = W_1 \Psi_{h_x}(t); \tag{10}$$
then, by using (10), the LPV error system (7) can be reconstructed as
$$\dot{e}(t) = A(\Psi(t)) e(t) + B(\Psi_{h_x}(t)) e(t - h_x) + L C e(t - h_y). \tag{11}$$
The sufficient condition for the global asymptotic stability of error system (11) is presented in the following theorem.
Theorem 1. 
The error system (11) is globally asymptotically stable for all $h_y \in [0, h]$ if there exist matrices $P > 0$, $Q > 0$, $M > 0$, $Z > 0$, $S > 0$, a matrix $R$, and positive scalars $\rho_i > 0$, $i = 1, 2$, such that, for all $\Psi, \Psi_{h_x} \in V_{H^n}$, the following LMIs hold, with observer gain $L = P^{-1} R$:
$$\begin{bmatrix} \Omega_1 & P B(\Psi_{h_x}) + \frac{Z}{h_x} & R C + \frac{S}{h} & h_x A^T(\Psi) P & h A^T(\Psi) P \\ * & \Omega_2 & 0 & h_x B^T(\Psi_{h_x}) P & h B^T(\Psi_{h_x}) P \\ * & * & \Omega_3 & h_x C^T R^T & h C^T R^T \\ * & * & * & \Omega_4 & 0 \\ * & * & * & * & \Omega_5 \end{bmatrix} < 0, \tag{12}$$
where $\Omega_1 = P A(\Psi) + A^T(\Psi) P + Q + M - \frac{Z}{h_x} - \frac{S}{h}$, $\Omega_2 = -Q - \frac{Z}{h_x}$, $\Omega_3 = -M - \frac{S}{h}$, $\Omega_4 = -h_x (2 \rho_1 P - \rho_1^2 Z)$ and $\Omega_5 = -h (2 \rho_2 P - \rho_2^2 S)$.
Proof. 
Consider the following Lyapunov–Krasovskii functional:
$$V(t) = e^T(t) P e(t) + \int_{t-h_x}^{t} e^T(\tau) Q e(\tau) d\tau + \int_{t-h_y}^{t} e^T(\tau) M e(\tau) d\tau + \int_{-h_x}^{0}\!\int_{t+\beta}^{t} \dot{e}^T(\tau) Z \dot{e}(\tau) d\tau\, d\beta + \int_{-h_y}^{0}\!\int_{t+\beta}^{t} \dot{e}^T(\tau) S \dot{e}(\tau) d\tau\, d\beta, \tag{13}$$
and the time derivative of $V(t)$ can be evaluated as
$$\dot{V}(t) = 2 e^T(t) P \dot{e}(t) + e^T(t) Q e(t) - e^T(t - h_x) Q e(t - h_x) + e^T(t) M e(t) - e^T(t - h_y) M e(t - h_y) + h_x \dot{e}^T(t) Z \dot{e}(t) - \int_{t-h_x}^{t} \dot{e}^T(\tau) Z \dot{e}(\tau) d\tau + h_y \dot{e}^T(t) S \dot{e}(t) - \int_{t-h_y}^{t} \dot{e}^T(\tau) S \dot{e}(\tau) d\tau. \tag{14}$$
Applying Jensen’s inequality [33], we obtain
$$-\int_{t-h_x}^{t} \dot{e}^T(\tau) Z \dot{e}(\tau) d\tau \le -\frac{1}{h_x} \Delta e_{h_x}^T(t) Z \Delta e_{h_x}(t), \tag{15}$$
$$-\int_{t-h_y}^{t} \dot{e}^T(\tau) S \dot{e}(\tau) d\tau \le -\frac{1}{h_y} \Delta e_{h_y}^T(t) S \Delta e_{h_y}(t), \tag{16}$$
where $\Delta e_{h_x}(t) = e(t) - e(t - h_x)$ and $\Delta e_{h_y}(t) = e(t) - e(t - h_y)$.
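As a quick discretized sanity check of the scalar case of (15)–(16) with $Z = 1$ (our illustration; the test signal $e(\tau) = \sin\tau$ and the horizon are arbitrary choices):

```python
import math

# Jensen's inequality, scalar case: the integral of edot(tau)^2 over
# [t - h, t] dominates (1/h) * (e(t) - e(t - h))^2.  Here e(tau) = sin(tau)
# on [0, h], so edot = cos and the increment is sin(h) - sin(0).
h = 1.3
N = 20000
dt = h / N
integral = sum(math.cos(i * dt) ** 2 * dt for i in range(N))   # Riemann sum
increment = (math.sin(h) - math.sin(0.0)) ** 2 / h
```

Analytically the integral equals $h/2 + \sin(2h)/4 \approx 0.7789$, comfortably above the right-hand side $\approx 0.7142$.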
By using (14)–(16), $\dot{V}(t)$ satisfies
$$\dot{V}(t) \le \chi^T(t) \Upsilon(h_x, h_y) \chi(t), \tag{17}$$
where
$$\chi(t) = \big[\dot{e}^T(t), e^T(t), e^T(t - h_x), e^T(t - h_y), \Delta e_{h_x}^T(t), \Delta e_{h_y}^T(t)\big]^T,$$
$$\Upsilon(h_x, h_y) = \begin{bmatrix} h_x Z + h_y S & P & 0 & 0 & 0 & 0 \\ * & Q + M & 0 & 0 & 0 & 0 \\ * & * & -Q & 0 & 0 & 0 \\ * & * & * & -M & 0 & 0 \\ * & * & * & * & -\frac{Z}{h_x} & 0 \\ * & * & * & * & * & -\frac{S}{h_y} \end{bmatrix}.$$
Moreover, it follows from the error system (11) and the definitions of $\Delta e_{h_x}(t)$ and $\Delta e_{h_y}(t)$ that $\Gamma(\Psi(t), \Psi_{h_x}(t)) \chi(t) = 0$ with
$$\Gamma(\Psi(t), \Psi_{h_x}(t)) = \begin{bmatrix} -I & A(\Psi(t)) & B(\Psi_{h_x}(t)) & P^{-1} R C & 0 & 0 \\ 0 & I & -I & 0 & -I & 0 \\ 0 & I & 0 & -I & 0 & -I \end{bmatrix}.$$
Therefore, the error system (11) is globally asymptotically stable if $\chi^T(t) \Upsilon(h_x, h_y) \chi(t) < 0$ holds for all $\chi(t) \ne 0$ with $\Gamma(\Psi(t), \Psi_{h_x}(t)) \chi(t) = 0$. Then, according to Lemma 2 and the convexity principle [34], $\chi^T(t) \Upsilon(h_x, h_y) \chi(t) < 0$ is equivalent to
$$\Gamma^{\perp T}(\Psi, \Psi_{h_x}) \Upsilon(h_x, h_y) \Gamma^{\perp}(\Psi, \Psi_{h_x}) < 0, \quad \forall \Psi, \Psi_{h_x} \in V_{H^n}, \tag{18}$$
where $\Gamma^{\perp}(\Psi, \Psi_{h_x})$ is a right orthogonal complement of $\Gamma(\Psi, \Psi_{h_x})$ and
$$\Gamma^{\perp}(\Psi, \Psi_{h_x}) = \begin{bmatrix} A(\Psi) & B(\Psi_{h_x}) & P^{-1} R C \\ I & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & I \\ I & -I & 0 \\ I & 0 & -I \end{bmatrix}. \tag{19}$$
Further, (18) can be rewritten as
$$\begin{bmatrix} P A(\Psi) + A^T(\Psi) P + Q + M & P B(\Psi_{h_x}) & R C \\ * & -Q & 0 \\ * & * & -M \end{bmatrix} + h_x \Pi_1 Z \Pi_1^T - \frac{1}{h_x} \Pi_2 Z \Pi_2^T + h_y \Pi_1 S \Pi_1^T - \frac{1}{h_y} \Pi_3 S \Pi_3^T < 0, \quad \forall \Psi, \Psi_{h_x} \in V_{H^n}, \tag{20}$$
where $\Pi_1 = [A(\Psi), B(\Psi_{h_x}), P^{-1} R C]^T$, $\Pi_2 = [I, -I, 0]^T$ and $\Pi_3 = [I, 0, -I]^T$. Since $Z > 0$ and $S > 0$, (20) cannot hold when $h_y$ is large enough. Therefore, there exists an upper bound $h$ for $h_y$ such that (20) holds when $h_y \in [0, h]$, and $\Gamma^{\perp T}(\Psi, \Psi_{h_x}) \Upsilon(h_x, h) \Gamma^{\perp}(\Psi, \Psi_{h_x}) < 0$, $\forall \Psi, \Psi_{h_x} \in V_{H^n}$, is a sufficient condition for (18).
By employing a Schur complement [34], $\Gamma^{\perp T}(\Psi, \Psi_{h_x}) \Upsilon(h_x, h) \Gamma^{\perp}(\Psi, \Psi_{h_x}) < 0$ with $\Psi, \Psi_{h_x} \in V_{H^n}$ is equivalent to
$$\begin{bmatrix} \Omega_1 & P B(\Psi_{h_x}) + \frac{Z}{h_x} & R C + \frac{S}{h} & h_x A^T(\Psi) Z & h A^T(\Psi) S \\ * & \Omega_2 & 0 & h_x B^T(\Psi_{h_x}) Z & h B^T(\Psi_{h_x}) S \\ * & * & \Omega_3 & h_x C^T R^T P^{-1} Z & h C^T R^T P^{-1} S \\ * & * & * & -h_x Z & 0 \\ * & * & * & * & -h S \end{bmatrix} < 0 \tag{21}$$
with $\Psi, \Psi_{h_x} \in V_{H^n}$. Then, multiplying both sides of (21) on the left and on the right by $\mathrm{diag}\{I, I, I, P Z^{-1}, P S^{-1}\}$ and its transpose, respectively, and using the inequalities $-P Z^{-1} P \le -2 \rho_1 P + \rho_1^2 Z$ and $-P S^{-1} P \le -2 \rho_2 P + \rho_2^2 S$, we can deduce that (12) is a sufficient condition for (21). We finish the proof. □
Remark 2. 
Different from the stability analysis of the nonlinear error system in [12,13,15,16,20], we provide an LPV formulation of the error system for RNNs with Lipschitz activation functions, which leads us to study the stability of the linear error system (11) by using the convexity principle. Obviously, our LPV-based approach is a useful tool for the state estimation of RNNs.
Remark 3. 
It follows from Theorem 1, which depends on the output delay, that the designed full-order observer (6) cannot predict the current state of RNNs (1) if $h_y \ge h$. This is due to our inability to choose an appropriate observer gain L to stabilize the error system (11). From the later numerical simulations, it is clear that this is a drawback of the full-order observer. In addition, the conditions in (12) can be checked for a set of fixed values by standard LMI routines, and an estimate of h is obtained. The algorithm for finding a feasible solution to (12) is summarized as follows:
  • Step 1: Fix the value of h to a constant $\bar{h}$ and make an initial guess for $\bar{h}$.
  • Step 2: Fix the values of $\rho_1, \rho_2$ to constants $\bar{\rho}_1, \bar{\rho}_2$ and make an initial guess for $\bar{\rho}_1, \bar{\rho}_2$.
  • Step 3: Solve the LMI (12) for L with the fixed values $\bar{\rho}_1, \bar{\rho}_2$ and $\bar{h}$; if a feasible L cannot be computed, return to Step 2 to reset the initial values of $\bar{\rho}_1$ and $\bar{\rho}_2$; if a feasible L can be computed, return to Step 1 and increase $\bar{h}$ until L can no longer be solved for.
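The three-step search above can be sketched as a simple loop. The helper `lmi_feasible` is a hypothetical placeholder: in a real implementation it would hand LMI (12), with $\bar{\rho}_1, \bar{\rho}_2, \bar{h}$ fixed, to an SDP solver and return the gain $L = P^{-1}R$ (or `None` if infeasible); here it is stubbed with the delay bound $h \approx 0.24$ reported in Section 4 so that the search logic itself runs:

```python
# Stand-in for the LMI (12) feasibility test; a real version would call an
# SDP solver.  The threshold 0.245 mimics the bound h ~ 0.24 of Section 4.
def lmi_feasible(h_bar, rho1, rho2):
    return "L(feasible)" if h_bar < 0.245 else None

def estimate_h(step=0.01, rho_grid=((1.0, 1.0), (0.5, 0.5))):
    k, best_h, best_L = 1, 0.0, None
    while True:
        h_bar = k * step                     # Step 1: current guess for h
        L = None
        for r1, r2 in rho_grid:              # Step 2: retry over rho values
            L = lmi_feasible(h_bar, r1, r2)
            if L is not None:
                break
        if L is None:                        # Step 3: infeasible -> stop
            return best_h, best_L
        best_h, best_L = h_bar, L            # feasible: increase h_bar
        k += 1

h_max, gain = estimate_h()
```

With the stub above, the loop returns `h_max` close to 0.24, the last delay for which a gain was found.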

3.2. Cascade Predictor

If $h_y \ge h$, the full-order observer (6) will fail. In this case, a cascade predictor can be used to solve the problem. Let $h = \frac{h_y}{m}$, $m \in \mathbb{Z}^+$, and define
$$x_i(t): \quad \dot{x}_i(t) = -A x_i(t) + W_0 f(x_i(t)) + W_1 f(x_i(t - h_x)), \ t \in [(m - i) h, +\infty), \qquad x_i(s) = \phi(s - (m - i) h), \ s \in [-\tau + (m - i) h, (m - i) h], \tag{22}$$
where $i = 1, 2, \dots, m$. A direct calculation shows that $x_i(t) = x(t - h_y + i h) = x_{i+1}(t - h)$, $i = 1, 2, \dots, m - 1$, and $x_m(t) = x(t)$. Then, the cascade predictor can be constructed as
$$\begin{aligned} \dot{\hat{x}}_1(t) &= -A \hat{x}_1(t) + W_0 f(\hat{x}_1(t)) + W_1 f(\hat{x}_1(t - h_x)) + L_1 \big[C \hat{x}_1(t - h) - y(t)\big], \\ \dot{\hat{x}}_2(t) &= -A \hat{x}_2(t) + W_0 f(\hat{x}_2(t)) + W_1 f(\hat{x}_2(t - h_x)) + L_2 \big[C \hat{x}_2(t - h) - \hat{y}_1(t)\big], \\ &\ \ \vdots \\ \dot{\hat{x}}_m(t) &= -A \hat{x}_m(t) + W_0 f(\hat{x}_m(t)) + W_1 f(\hat{x}_m(t - h_x)) + L_m \big[C \hat{x}_m(t - h) - \hat{y}_{m-1}(t)\big], \end{aligned} \tag{23}$$
where $\phi_i(s) \in \mathcal{C}([-\tau_1, 0]; \mathbb{R}^n)$ is an initial condition for the subsystem $\hat{x}_i$, $i = 1, 2, \dots, m$, $\tau_1 = \max\{h_x, h\}$, and $\hat{y}_i(t) = C \hat{x}_i(t)$, $i = 1, 2, \dots, m - 1$. In the cascade predictor (23), the subsystem $\hat{x}_i(t)$ estimates the state $x_i(t)$, $i = 1, 2, \dots, m - 1$, and the subsystem $\hat{x}_m(t)$ estimates the state $x(t)$.
Remark 4. 
The idea behind the cascade predictor (23) is that, regardless of how long the output delay $h_y$ is, we can split it into m small time periods. Then, each sub-observer $\hat{x}_i(t)$ in the cascade predictor estimates the delayed state $x(t - h_y + i \frac{h_y}{m})$, and the last sub-observer $\hat{x}_m(t)$ estimates the current state $x(t)$. Compared with [14,15,16,17,18,19,20], which discuss the state estimation of neural networks with only a small output delay, or even no output delay, using a full-order observer, the output delay $h_y$ in this paper is arbitrarily large yet known; in this sense, this is an advancement in the study of neural networks’ state estimation.
Remark 5. 
Moreover, the idea of this novel predictor was first proposed in [22], and the authors used this predictor with a chain structure for a class of triangular nonlinear systems with only the output delay. In this paper, we will discuss the state estimation problem of RNNs with both the state delay and output delay using this novel cascade predictor.
Next, define the estimation errors $e_i(t) = \hat{x}_i(t) - x_i(t)$, $i = 1, 2, \dots, m$; then, similarly to (10)–(11), using the LPV approach we obtain the following error systems:
$$\begin{aligned} \dot{e}_1(t) &= A(\Psi(t)) e_1(t) + B(\Psi_{h_x}(t)) e_1(t - h_x) + L_1 C e_1(t - h), \\ \dot{e}_j(t) &= A(\Psi(t)) e_j(t) + B(\Psi_{h_x}(t)) e_j(t - h_x) + L_j C e_j(t - h) - L_j C e_{j-1}(t), \quad j = 2, 3, \dots, m. \end{aligned} \tag{24}$$
Theorem 2. 
For a given output delay $h_y$ and scalar $m \in \mathbb{Z}^+$, the error systems (24) are globally asymptotically stable if there exist matrices $P_i > 0$, $Q_i > 0$, $M_i > 0$, $Z_i > 0$, $S_i > 0$, $X_i > 0$, $D_i > 0$, matrices $R_i$, $Y_i$, $T_i$, $i = 1, 2, \dots, m$, and a positive scalar $\gamma > 0$, such that $L_i = P_i^{-1} R_i$, $i = 1, 2, \dots, m$, and the following LMIs are satisfied:
$$\begin{bmatrix} \Xi_i & -T_i + P_i B(\Psi_{h_x}) & -Y_i + R_i C & \gamma A^T(\Psi) P_i \\ * & -Q_i & 0 & \gamma B^T(\Psi_{h_x}) P_i \\ * & * & -M_i & \gamma C^T R_i^T \\ * & * & * & -2 \gamma P_i + h_x Z_i + h S_i \end{bmatrix} < 0, \tag{25}$$
with $\Psi \in V_{H^n}$, $\Psi_{h_x} \in V_{H^n}$, and
$$\begin{bmatrix} X_i & Y_i \\ * & S_i \end{bmatrix} > 0, \tag{26}$$
$$\begin{bmatrix} D_i & T_i \\ * & Z_i \end{bmatrix} > 0, \tag{27}$$
where $\Xi_i = P_i A(\Psi) + A^T(\Psi) P_i + T_i + T_i^T + Y_i + Y_i^T + h_x D_i + h X_i + Q_i + M_i$.
Proof. 
The stability of the error systems (24) will be proved step by step.
Step 1: We consider the first error system $e_1(t)$ in (24):
$$\dot{e}_1(t) = A(\Psi(t)) e_1(t) + B(\Psi_{h_x}(t)) e_1(t - h_x) + L_1 C e_1(t - h). \tag{28}$$
Using the Newton–Leibniz formula, we have
$$e_1(t - h_x) = e_1(t) - \int_{t-h_x}^{t} \dot{e}_1(\tau) d\tau, \quad e_1(t - h) = e_1(t) - \int_{t-h}^{t} \dot{e}_1(\tau) d\tau; \tag{29}$$
then, $\dot{e}_1(t)$ in (28) can be reconstructed as
$$\dot{e}_1(t) = \big[A(\Psi(t)) + B(\Psi_{h_x}(t)) + L_1 C\big] e_1(t) - B(\Psi_{h_x}(t)) \int_{t-h_x}^{t} \dot{e}_1(\tau) d\tau - L_1 C \int_{t-h}^{t} \dot{e}_1(\tau) d\tau. \tag{30}$$
Construct the Lyapunov–Krasovskii functional
$$V_1(t) = \underbrace{e_1^T(t) P_1 e_1(t)}_{V_{11}(t)} + \underbrace{\int_{t-h_x}^{t} e_1^T(\tau) Q_1 e_1(\tau) d\tau}_{V_{12}(t)} + \underbrace{\int_{t-h}^{t} e_1^T(\tau) M_1 e_1(\tau) d\tau}_{V_{13}(t)} + \underbrace{\int_{-h_x}^{0}\!\int_{t+\eta}^{t} \dot{e}_1^T(\tau) Z_1 \dot{e}_1(\tau) d\tau\, d\eta}_{V_{14}(t)} + \underbrace{\int_{-h}^{0}\!\int_{t+\eta}^{t} \dot{e}_1^T(\tau) S_1 \dot{e}_1(\tau) d\tau\, d\eta}_{V_{15}(t)}; \tag{31}$$
then, the derivative of $V_{11}(t)$ along (30) is
$$\dot{V}_{11}(t) = e_1^T(t) \big[P_1 A(\Psi(t)) + A^T(\Psi(t)) P_1 + P_1 B(\Psi_{h_x}(t)) + B^T(\Psi_{h_x}(t)) P_1 + R_1 C + C^T R_1^T\big] e_1(t) - 2 e_1^T(t) P_1 B(\Psi_{h_x}(t)) \int_{t-h_x}^{t} \dot{e}_1(\tau) d\tau - 2 e_1^T(t) R_1 C \int_{t-h}^{t} \dot{e}_1(\tau) d\tau. \tag{32}$$
According to Lemma 3 and (26)–(27), we have
$$-2 e_1^T(t) P_1 B(\Psi_{h_x}(t)) \int_{t-h_x}^{t} \dot{e}_1(\tau) d\tau \le h_x e_1^T(t) D_1 e_1(t) + 2 e_1^T(t) \big[T_1 - P_1 B(\Psi_{h_x}(t))\big] \big[e_1(t) - e_1(t - h_x)\big] + \int_{t-h_x}^{t} \dot{e}_1^T(\tau) Z_1 \dot{e}_1(\tau) d\tau, \tag{33}$$
$$-2 e_1^T(t) R_1 C \int_{t-h}^{t} \dot{e}_1(\tau) d\tau \le h e_1^T(t) X_1 e_1(t) + 2 e_1^T(t) \big[Y_1 - R_1 C\big] \big[e_1(t) - e_1(t - h)\big] + \int_{t-h}^{t} \dot{e}_1^T(\tau) S_1 \dot{e}_1(\tau) d\tau. \tag{34}$$
By (32)–(34), $\dot{V}_{11}(t)$ satisfies
$$\dot{V}_{11} \le e_1^T(t) \big[P_1 A(\Psi(t)) + A^T(\Psi(t)) P_1 + T_1^T + T_1 + Y_1^T + Y_1 + h_x D_1 + h X_1\big] e_1(t) - 2 e_1^T(t) \big[T_1 - P_1 B(\Psi_{h_x}(t))\big] e_1(t - h_x) - 2 e_1^T(t) \big[Y_1 - R_1 C\big] e_1(t - h) + \int_{t-h_x}^{t} \dot{e}_1^T(\tau) Z_1 \dot{e}_1(\tau) d\tau + \int_{t-h}^{t} \dot{e}_1^T(\tau) S_1 \dot{e}_1(\tau) d\tau. \tag{35}$$
The derivatives of $V_{12}(t)$, $V_{13}(t)$, $V_{14}(t)$ and $V_{15}(t)$ are
$$\dot{V}_{12}(t) = e_1^T(t) Q_1 e_1(t) - e_1^T(t - h_x) Q_1 e_1(t - h_x), \tag{36}$$
$$\dot{V}_{13}(t) = e_1^T(t) M_1 e_1(t) - e_1^T(t - h) M_1 e_1(t - h), \tag{37}$$
$$\dot{V}_{14}(t) = h_x \dot{e}_1^T(t) Z_1 \dot{e}_1(t) - \int_{t-h_x}^{t} \dot{e}_1^T(\tau) Z_1 \dot{e}_1(\tau) d\tau, \tag{38}$$
$$\dot{V}_{15}(t) = h \dot{e}_1^T(t) S_1 \dot{e}_1(t) - \int_{t-h}^{t} \dot{e}_1^T(\tau) S_1 \dot{e}_1(\tau) d\tau, \tag{39}$$
and for an arbitrary constant $\gamma > 0$, we have
$$-2 \gamma \dot{e}_1^T(t) P_1 \big[\dot{e}_1(t) - A(\Psi(t)) e_1(t) - B(\Psi_{h_x}(t)) e_1(t - h_x) - P_1^{-1} R_1 C e_1(t - h)\big] = 0. \tag{40}$$
Combining (35)–(40), we obtain
$$\dot{V}_1(t) \le \xi_1^T(t) \Omega_1 \xi_1(t), \tag{41}$$
where $\xi_1(t) = [e_1^T(t), e_1^T(t - h_x), e_1^T(t - h), \dot{e}_1^T(t)]^T$ and
$$\Omega_1 = \begin{bmatrix} \Xi_1 & -T_1 + P_1 B(\Psi_{h_x}(t)) & -Y_1 + R_1 C & \gamma A^T(\Psi(t)) P_1 \\ * & -Q_1 & 0 & \gamma B^T(\Psi_{h_x}(t)) P_1 \\ * & * & -M_1 & \gamma C^T R_1^T \\ * & * & * & -2 \gamma P_1 + h_x Z_1 + h S_1 \end{bmatrix}. \tag{42}$$
Due to $\Psi(t), \Psi_{h_x}(t) \in H^n$ and the convex principle, if
$$\Omega_1 < 0, \quad \forall \Psi, \Psi_{h_x} \in V_{H^n}, \tag{43}$$
then $\dot{V}_1(t) < 0$. It then follows from (41) that
$$\dot{V}_1(t) \le -\lambda_{min}(-\Omega_1) \xi_1^T(t) \xi_1(t) \le -\lambda_{min}(-\Omega_1) e_1^T(t) e_1(t),$$
which indicates that the error system $e_1(t)$ is globally asymptotically stable. Obviously, conditions (25) ensure that $\Omega_1 < 0$ holds.
Step j: To recursively prove the stability of the error system $e_j(t)$, we assume that $e_{j-1}(t)$ is globally asymptotically stable. Similarly to (28), using the Newton–Leibniz formula, $\dot{e}_j(t)$ ($j = 2, 3, \dots, m$) can be rewritten as
$$\dot{e}_j(t) = \big[A(\Psi(t)) + B(\Psi_{h_x}(t)) + L_j C\big] e_j(t) - B(\Psi_{h_x}(t)) \int_{t-h_x}^{t} \dot{e}_j(\tau) d\tau - L_j C \int_{t-h}^{t} \dot{e}_j(\tau) d\tau - L_j C e_{j-1}(t). \tag{44}$$
Then, we construct the following Lyapunov–Krasovskii functional:
$$V_j(t) = \underbrace{e_j^T(t) P_j e_j(t)}_{V_{j1}(t)} + \underbrace{\int_{t-h_x}^{t} e_j^T(\tau) Q_j e_j(\tau) d\tau}_{V_{j2}(t)} + \underbrace{\int_{t-h}^{t} e_j^T(\tau) M_j e_j(\tau) d\tau}_{V_{j3}(t)} + \underbrace{\int_{-h_x}^{0}\!\int_{t+\eta}^{t} \dot{e}_j^T(\tau) Z_j \dot{e}_j(\tau) d\tau\, d\eta}_{V_{j4}(t)} + \underbrace{\int_{-h}^{0}\!\int_{t+\eta}^{t} \dot{e}_j^T(\tau) S_j \dot{e}_j(\tau) d\tau\, d\eta}_{V_{j5}(t)}. \tag{45}$$
Taking the derivative of $V_{j1}(t)$ along (44), we have
$$\dot{V}_{j1}(t) = e_j^T(t) \big[P_j A(\Psi(t)) + A^T(\Psi(t)) P_j + P_j B(\Psi_{h_x}(t)) + B^T(\Psi_{h_x}(t)) P_j + R_j C + C^T R_j^T\big] e_j(t) - 2 e_j^T(t) P_j B(\Psi_{h_x}(t)) \int_{t-h_x}^{t} \dot{e}_j(\tau) d\tau - 2 e_j^T(t) R_j C \int_{t-h}^{t} \dot{e}_j(\tau) d\tau - 2 e_j^T(t) R_j C e_{j-1}(t). \tag{46}$$
Now, by using Young’s inequality [27], we obtain
$$-2 e_j^T(t) R_j C e_{j-1}(t) \le \varepsilon_1 e_j^T(t) e_j(t) + \frac{1}{\varepsilon_1} \|R_j C\|^2 \|e_{j-1}\|^2, \tag{47}$$
with $\varepsilon_1 > 0$. Then, similarly to (33)–(34) in Step 1 and using (46)–(47), we have
$$\dot{V}_{j1} \le e_j^T(t) \big[P_j A(\Psi(t)) + A^T(\Psi(t)) P_j + T_j^T + T_j + Y_j^T + Y_j + h_x D_j + h X_j + \varepsilon_1 I\big] e_j(t) - 2 e_j^T(t) \big[T_j - P_j B(\Psi_{h_x}(t))\big] e_j(t - h_x) - 2 e_j^T(t) \big[Y_j - R_j C\big] e_j(t - h) + \int_{t-h_x}^{t} \dot{e}_j^T(\tau) Z_j \dot{e}_j(\tau) d\tau + \int_{t-h}^{t} \dot{e}_j^T(\tau) S_j \dot{e}_j(\tau) d\tau + \frac{1}{\varepsilon_1} \|R_j C\|^2 \|e_{j-1}\|^2. \tag{48}$$
For arbitrary constants $\gamma > 0$ and $\varepsilon_2 > 0$, we obtain
$$-2 \gamma \dot{e}_j^T(t) P_j \big[\dot{e}_j(t) - A(\Psi(t)) e_j(t) - B(\Psi_{h_x}(t)) e_j(t - h_x) - P_j^{-1} R_j C e_j(t - h) + P_j^{-1} R_j C e_{j-1}(t)\big] = 0, \quad -2 \gamma \dot{e}_j^T(t) R_j C e_{j-1}(t) \le \varepsilon_2 \dot{e}_j^T(t) \dot{e}_j(t) + \frac{\gamma^2}{\varepsilon_2} \|R_j C\|^2 \|e_{j-1}\|^2. \tag{49}$$
Then, combining the derivatives of $\sum_{i=2}^{5} V_{ji}(t)$ and using (48)–(49), we have
$$\dot{V}_j(t) \le \xi_j^T(t) \hat{\Omega}_j \xi_j(t) + \Big(\frac{1}{\varepsilon_1} + \frac{\gamma^2}{\varepsilon_2}\Big) \|R_j C\|^2 \|e_{j-1}\|^2, \tag{50}$$
where $\xi_j(t) = [e_j^T(t), e_j^T(t - h_x), e_j^T(t - h), \dot{e}_j^T(t)]^T$,
$$\hat{\Omega}_j = \begin{bmatrix} \hat{\Xi}_j & -T_j + P_j B(\Psi_{h_x}(t)) & -Y_j + R_j C & \gamma A^T(\Psi(t)) P_j \\ * & -Q_j & 0 & \gamma B^T(\Psi_{h_x}(t)) P_j \\ * & * & -M_j & \gamma C^T R_j^T \\ * & * & * & -2 \gamma P_j + h_x Z_j + h S_j + \varepsilon_2 I \end{bmatrix} \tag{51}$$
and $\hat{\Xi}_j = P_j A(\Psi(t)) + A^T(\Psi(t)) P_j + T_j + T_j^T + Y_j + Y_j^T + h_x D_j + h X_j + Q_j + M_j + \varepsilon_1 I$. Similar to $\Omega_1$ in Step 1, if
$$\hat{\Omega}_j < 0, \quad \forall \Psi, \Psi_{h_x} \in V_{H^n} \tag{52}$$
is true, we have
$$\dot{V}_j(t) \le -\lambda_{min}(-\hat{\Omega}_j) e_j^T(t) e_j(t) + \Big(\frac{1}{\varepsilon_1} + \frac{\gamma^2}{\varepsilon_2}\Big) \|R_j C\|^2 \|e_{j-1}\|^2. \tag{53}$$
Then, employing the comparison lemma [35], we can conclude that if $e_{j-1}(t)$ is globally asymptotically stable, then $e_j(t)$ is also globally asymptotically stable.
Furthermore, it is not difficult to observe that $\hat{\Omega}_j = \Omega_j + \varepsilon_1 \Pi_1 \Pi_1^T + \varepsilon_2 \Pi_2 \Pi_2^T$, where $\Pi_1 = [I, 0, 0, 0]^T$, $\Pi_2 = [0, 0, 0, I]^T$, and
$$\Omega_j = \begin{bmatrix} \Xi_j & -T_j + P_j B(\Psi_{h_x}) & -Y_j + R_j C & \gamma A^T(\Psi) P_j \\ * & -Q_j & 0 & \gamma B^T(\Psi_{h_x}) P_j \\ * & * & -M_j & \gamma C^T R_j^T \\ * & * & * & -2 \gamma P_j + h_x Z_j + h S_j \end{bmatrix} < 0.$$
Since $\varepsilon_1$ and $\varepsilon_2$ can be selected arbitrarily, $\Omega_j < 0$ ensures that $\hat{\Omega}_j < 0$ holds when $\varepsilon_1$ and $\varepsilon_2$ are sufficiently small. Finally, observe that conditions (25) ensure that $\Omega_j < 0$ holds. We finish the proof. □
Remark 6. 
It follows from Theorem 2 that, for a given output delay $h_y$, whether the LMI set (25)–(27) has feasible solutions depends on the size of the parameter $h = \frac{h_y}{m}$. Obviously, for a large output delay $h_y$, a sufficiently large m can ensure that the LMI set (25)–(27) has feasible solutions. However, the complexity of the cascade predictor is proportional to m; in other words, a larger m will reduce the observation performance of the cascade predictor. Therefore, we should choose a suitable value of m to balance the stability requirement and the complexity limitation of the cascade predictor.
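A natural heuristic for the trade-off in Remark 6 (our illustration, not a rule from the paper) is to take the smallest chain length for which the sub-delay drops below the single-observer bound; Theorem 2 must still be checked for the resulting h:

```python
import math

# Given the single-observer delay bound h_bound (0.24 in Section 4) and an
# output delay h_y, the smallest chain length with h = h_y / m <= h_bound
# is m = ceil(h_y / h_bound); any larger m also works but adds sub-observers.
def min_chain_length(h_y, h_bound):
    return math.ceil(h_y / h_bound)

m_a = min_chain_length(0.5, 0.24)   # Example 2(i) conservatively uses m = 5
m_b = min_chain_length(1.0, 0.24)   # Example 2(ii) conservatively uses m = 10
```

This gives minimal chain lengths of 3 and 5 for the two delays of Example 2; the paper's choices m = 5 and m = 10 (both with h = 0.1) are more conservative, trading extra sub-observers for a smaller per-link delay.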

4. Numerical Simulation

This section provides a series of numerical simulations to demonstrate the efficacy of our results.
Let x ( t ) = [ x 1 ( t ) , x 2 ( t ) ] T . Consider the RNNs (1) with the following parameters:
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad W_0 = \begin{bmatrix} 2.0 & -0.1 \\ -5.0 & 3.0 \end{bmatrix}, \quad W_1 = \begin{bmatrix} -1.5 & -0.1 \\ -0.2 & -2.5 \end{bmatrix}, \quad C = \begin{bmatrix} 3 & 0.3 \\ 0.3 & 3 \end{bmatrix},$$
$h_x = 1$, $\phi(s) = [2, 2]^T$, $f(x(t)) = [\tanh(x_1(t)), \tanh(x_2(t))]^T$,
where the value of $h_y$ will be specified later. It can be observed from Figure 1 that these RNNs exhibit complex chaotic behavior. Then, from Theorem 1, it follows that the LMI set (12) has no feasible solution when $h > 0.24$. Therefore, when $h_y \le 0.24$, we can use the full-order observer (6); when $h_y > 0.24$, we use the cascade predictor (23).
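For reference, system (1) with these parameters can be reproduced with a simple delayed Euler scheme (our sketch; the sign pattern of $W_0$ and $W_1$ follows the well-known chaotic delayed neural network example, since the printed matrices omit signs, and the initial state, step size and horizon are our choices):

```python
import math
from collections import deque

# Euler simulation of x_dot = -A x + W0 f(x) + W1 f(x(t - h_x)), f = tanh,
# A = I, with a queue holding the delayed states for h_x = 1.  Weight signs
# follow the classic chaotic delayed neural network example (an assumption).
W0 = [[2.0, -0.1], [-5.0, 3.0]]
W1 = [[-1.5, -0.1], [-0.2, -2.5]]

def f(x):
    return [math.tanh(v) for v in x]

dt, h_x = 0.005, 1.0
nd = int(h_x / dt)                        # delay buffer length
x = [0.4, 0.6]                            # arbitrary initial state
buf = deque([x[:] for _ in range(nd)])    # constant initial history
traj = []
for _ in range(int(60.0 / dt)):
    xd = buf.popleft()                    # x(t - h_x)
    buf.append(x[:])                      # current state enters the history
    fx, fxd = f(x), f(xd)
    x = [x[i] + dt * (-x[i]
         + sum(W0[i][j] * fx[j] for j in range(2))
         + sum(W1[i][j] * fxd[j] for j in range(2))) for i in range(2)]
    traj.append(x[:])
```

Since $\tanh$ is bounded, the trajectory stays bounded but keeps oscillating on the chaotic attractor, matching the behavior described for Figure 1.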
Example 1. 
For $h_y = 0.23$, assume that the initial condition of the full-order observer $\hat{x}$ is $\hat{x}(s) = [2, 2]^T$, $s \in [-1, 0)$. Then, from Theorem 1, we can obtain the feasible solutions:
$$P = \begin{bmatrix} 15.9540 & 0.3889 \\ 0.3889 & 2.3080 \end{bmatrix}, \quad L = \begin{bmatrix} 1.4728 & 0.1894 \\ 1.1641 & 2.3357 \end{bmatrix}.$$
The simulation results are shown in Figure 2 and Figure 3. Figure 2 shows the trajectories of $\hat{x}(t)$ and $x(t)$, which confirms the validity of the designed full-order observer (6). Figure 3 shows the converging trajectory of $e(t) = \hat{x}(t) - x(t)$, which illustrates the convergence performance of the full-order observer (6). However, when $h_y = 0.25$, it can be observed from Figure 4 and Figure 5 that the full-order observer cannot accurately estimate the state of the original system and the estimation error $e(t)$ becomes larger, since $h_y = 0.25$ is greater than the bound $h = 0.24$.
Example 2. 
This example considers the case h y > 0.24 ; thus, we only use the cascade predictor (23).
(i) For $h_y = 0.5$, we select $m = 5$ and $h = 0.1$, and assume that the initial conditions of $\hat{x}_i$, $i = 1, \dots, 5$, are $\hat{x}_i(s) = [2, 2]^T$, $s \in [-1, 0)$. Then, from Theorem 2, we obtain the feasible solutions:
$$P_i = \begin{bmatrix} 142.1455 & 3.8554 \\ 3.8554 & 16.5776 \end{bmatrix}, \quad L_i = \begin{bmatrix} 1.5129 & 0.1483 \\ 1.5729 & 2.8071 \end{bmatrix} \quad (i = 1, \dots, 5).$$
As illustrated in Figure 6 and Figure 7, the cascade predictor is valid and the estimation error e 5 ( t ) = x ^ 5 ( t ) x ( t ) finally converges to 0.
(ii) For $h_y = 1$, we select $m = 10$ and $h = 0.1$, and assume that the initial conditions of $\hat{x}_i$, $i = 1, \dots, 10$, are $\hat{x}_i(s) = [1, 1]^T$, $s \in [-1, 0)$. Since the value of h is equal to that in (i), the observer gains $L_i$ ($i = 1, \dots, 10$) can be taken equal to those in (i). Then, from Figure 8 and Figure 9, it is clear that the cascade predictor is valid and the observer error $e_{10}(t) = \hat{x}_{10}(t) - x(t)$ finally converges to 0.
Example 3. 
This example further discusses the effect of the size of the output delay on the convergence of the two predictors. The simulation results are shown in Figure 10, Figure 11 and Figure 12, and the influence of m and h on the convergence time is given in Table 1, where “*” denotes that the single observer is not valid ($h_y > h$).
From Table 1, we can clearly observe that, for both predictors, the convergence time grows with the output delay: the larger the output delay, the longer the convergence time. In addition, the experimental results show that, although the cascade predictor can handle an arbitrarily large output delay, as h_y increases we must choose more subsystems to transmit the state information, which leads to the accumulation of error information and increases the cost of observation.
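The entries of Table 1 can be read off a simulated error trajectory as the first instant after which the error norm stays below the threshold e = 0.1. A minimal helper for this is sketched below; the trajectory used here is a synthetic exponential decay, purely for illustration.

```python
import numpy as np

def convergence_time(t, err, threshold=0.1):
    """First time after which |err| stays below the threshold for good."""
    below = np.abs(err) < threshold
    if below.all():
        return t[0]
    last_above = np.nonzero(~below)[0].max()   # last sample at/over threshold
    return t[min(last_above + 1, len(t) - 1)]

t = np.linspace(0.0, 10.0, 1001)
err = 2.0 * np.exp(-t)                 # synthetic error with |e(0)| = 2
tc = convergence_time(t, err)
print(tc)                              # about ln(2/0.1) ≈ 3.0
```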

5. Conclusions

In this research, we investigated state estimation for RNNs by proposing an output-predicting and LPV-based approach. By combining the LPV approach, the LKF and the convex principle, several new conditions for the global asymptotic stability of the error system were established. Compared with the traditional observers in [14,15,16,17,18,19,20], the chain-structured cascade predictor is more useful for the state estimation of neural networks. Unlike [12,13,15,16,20], we used the LPV approach to convert the nonlinear error dynamics into a linear parameter-varying error system, which greatly reduces the difficulty of the stability analysis. Finally, a series of numerical simulations demonstrated the effectiveness of the cascade predictor.

Author Contributions

Conceptualization, W.W. and Z.H.; methodology, W.W.; software, W.W. and J.C.; formal analysis, W.W.; writing—original draft preparation, W.W.; writing—review and editing, W.W. and Z.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 61573005, and in part by the Natural Science Foundation of Fujian Province under Grant 2018J01417 and Grant 2019J01330.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chua, L.; Yang, L. Cellular neural networks: Applications. IEEE Trans. Circuits Syst. 1988, 35, 1273–1290. [Google Scholar] [CrossRef]
  2. Cichocki, A.; Unbehauen, R. Neural Networks for Optimization and Signal Processing; Wiley: Chichester, UK, 1993. [Google Scholar]
  3. Joya, G.; Atencia, M.; Sandoval, F. Hopfield neural networks for optimization: Study of the different dynamics. Neurocomputing 2002, 43, 219–237. [Google Scholar] [CrossRef]
  4. Li, W.; Lee, T. Hopfield neural networks for affine invariant matching. IEEE Trans. Neural Netw. 2001, 12, 1400–1410. [Google Scholar] [CrossRef]
  5. Yong, S.; Scott, P.; Nasrabadi, N. Object recognition using multilayer Hopfield neural network. IEEE Trans. Image Process. 1997, 6, 357–372. [Google Scholar] [CrossRef] [PubMed]
  6. Wang, Z.; Liu, L.; Shan, Q.; Zhang, H. Stability criteria for recurrent neural networks with time-varying delay based on secondary delay partitioning method. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 2589–2595. [Google Scholar] [CrossRef] [PubMed]
  7. Zhang, C.; He, Y.; Jiang, L.; Wu, M. Delay-dependent stability criteria for generalized neural networks with two delay components. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 1263–1276. [Google Scholar] [CrossRef]
  8. Zhang, X.; Han, Q. Global asymptotic stability analysis for delayed neural networks using a matrix-based quadratic convex approach. Neural Netw. 2014, 54, 57–69. [Google Scholar] [CrossRef] [PubMed]
  9. Wang, Z.; Zhang, H.; Jiang, B. LMI-based approach for global asymptotic stability analysis of recurrent neural networks with various delays and structures. IEEE Trans. Neural Netw. 2011, 22, 1032–1045. [Google Scholar] [CrossRef] [PubMed]
  10. Liu, Y.J.; Lee, S.M.; Kwon, O.M.; Park, J.H. New approach to stability criteria for generalized neural networks with interval time-varying delays. Neurocomputing 2015, 149, 1544–1551. [Google Scholar] [CrossRef]
  11. Wu, Z.; Shi, P.; Su, H.; Chu, J. Delay-dependent stability analysis for switched neural networks with time-varying delay. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2011, 41, 1522–1530. [Google Scholar] [CrossRef]
  12. Zhu, Q.; Cao, J. Stability of Markovian jump neural networks with impulse control and time varying delays. Nonlinear Anal. Real World Appl. 2012, 13, 2259–2270. [Google Scholar] [CrossRef]
  13. Wang, Z.; Ho, D.W.C.; Liu, X. State estimation for delayed neural networks. IEEE Trans. Neural Netw. 2005, 16, 279–284. [Google Scholar] [CrossRef] [PubMed]
  14. Wang, Z.; Liu, R.; Liu, Y. State estimation for jumping recurrent neural networks with discrete and distributed delays. Neural Netw. 2009, 22, 41–48. [Google Scholar] [CrossRef] [PubMed]
  15. Wang, Z.; Wang, J.; Wu, Y. State estimation for recurrent neural networks with unknown delays: A robust analysis approach. Neurocomputing 2017, 227, 29–36. [Google Scholar] [CrossRef]
  16. Huang, H.; Feng, G.; Cao, J. Robust state estimation for uncertain neural networks with time-varying delay. IEEE Trans. Neural Netw. 2008, 19, 1329–1339. [Google Scholar] [CrossRef]
  17. Huang, H.; Feng, G.; Cao, J. State estimation for static neural networks with time-varying delay. Neural Netw. 2010, 23, 1202–1207. [Google Scholar] [CrossRef]
  18. Ren, J.; Zhu, H.; Zhong, S.; Ding, Y.; Shi, K. State estimation for neural networks with multiple time delays. Neurocomputing 2015, 151, 501–510. [Google Scholar] [CrossRef]
  19. Liu, M.; Chen, H. H∞ state estimation for discrete-time delayed systems of the neural network type with multiple missing measurements. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 2987–2998. [Google Scholar] [CrossRef]
  20. Guo, M.; Zhu, S.; Liu, X. Observer-based state estimation for memristive neural networks with time-varying delay. Knowl.-Based Syst. 2022, 246, 108707. [Google Scholar] [CrossRef]
  21. Beintema, G.I.; Schoukens, M.; Toth, R. Deep subspace encoders for nonlinear system identification. Automatica 2023, 156, 111210. [Google Scholar] [CrossRef]
  22. Germani, A.; Manes, C.; Pepe, P. A new approach to state observation of nonlinear systems with delayed output. IEEE Trans. Autom. Control 2002, 47, 96–101. [Google Scholar] [CrossRef]
  23. Ahmed-Ali, T.; Cherrier, E.; Lamnabhi-Lagarrigue, F. Cascade high gain predictors for a class of nonlinear systems. IEEE Trans. Autom. Control 2012, 57, 224–229. [Google Scholar] [CrossRef]
  24. Farza, M.; M’Saad, M.; Menard, T.; Fall, M.L.; Gehan, O.; Pigeon, E. Simple cascade observer for a class of nonlinear systems with long output delays. IEEE Trans. Autom. Control 2015, 60, 3338–3343. [Google Scholar] [CrossRef]
  25. Farza, M.; Hernandez-Gonzalez, O.; Menard, T.; Targui, B.; M’Saad, M.; Astorga-Zaragoza, C.M. Cascade observer design for a class of uncertain nonlinear systems with delayed outputs. Automatica 2018, 89, 125–134. [Google Scholar] [CrossRef]
  26. Zemouche, A.; Boutayeb, M. On LMI conditions to design observers for Lipschitz nonlinear systems. Automatica 2013, 49, 585–591. [Google Scholar] [CrossRef]
  27. Adil, A.; Hamaz, A.; N’Doye, I.; Zemouche, A.; Laleg-Kirati, T.M.; Bedouhene, F. On high-gain observer design for nonlinear systems with delayed output measurements. Automatica 2022, 141, 110281. [Google Scholar] [CrossRef]
  28. Huang, H.; Huang, T.; Chen, X.; Qian, C. Exponential stabilization of delayed recurrent neural networks: A state estimation based approach. Neural Netw. 2013, 48, 153–157. [Google Scholar] [CrossRef] [PubMed]
  29. Zhang, Z.; He, Y.; Zhang, C.; Wu, M. Exponential stabilization of neural networks with time-varying delay by periodically intermittent control. Neurocomputing 2016, 207, 469–475. [Google Scholar] [CrossRef]
  30. Huang, H.; Feng, G. Delay-dependent H∞ and generalized H2 filtering for delayed neural networks. IEEE Trans. Circuits Syst. I Regul. Pap. 2009, 56, 846–857. [Google Scholar] [CrossRef]
  31. Gonzalez, A. Improved results on stability analysis of time-varying delay systems via delay partitioning method and Finsler’s lemma. J. Frankl. Inst. 2022, 359, 7632–7649. [Google Scholar] [CrossRef]
  32. Moon, Y.S.; Park, P.; Kwon, W.H.; Lee, Y.S. Delay-dependent robust stabilization of uncertain state-delayed systems. Int. J. Control 2001, 74, 1447–1455. [Google Scholar] [CrossRef]
  33. Gu, K.; Kharitonov, V.; Chen, J. Stability of Time-Delay Systems; Birkhäuser: Boston, MA, USA, 2003. [Google Scholar]
  34. Boyd, S.; El Ghaoui, L.; Feron, E.; Balakrishnan, V. Linear Matrix Inequalities in System and Control Theory; SIAM: Philadelphia, PA, USA, 1994. [Google Scholar]
  35. Khalil, H.K. Nonlinear Systems; Prentice-Hall: Englewood Cliffs, NJ, USA, 2002. [Google Scholar]
Figure 1. The phase space trajectory of RNNs (1).
Figure 2. Trajectories of x(t) and x̂(t).
Figure 3. The estimation error e(t).
Figure 4. The states x_i(t) and x̂_i(t), i = 1, 2.
Figure 5. The estimation error e(t).
Figure 6. Trajectories of x(t) and x̂_5(t).
Figure 7. The estimation error e_5(t).
Figure 8. Trajectories of x(t) and x̂_10(t).
Figure 9. The estimation error e_10(t).
Figure 10. Convergence of the single observer at different delays.
Figure 11. Convergence of the cascade predictor at different delays.
Figure 12. Convergence of the cascade predictor at different delays.
Table 1. The convergence time for the two types of predictors, with the observer error threshold e = 0.1 ("*": the single observer is not valid, h_y > h).

Predictor \ h_y   | 0.1         | 0.2         | 0.5         | 1             | 1.5           | 2
Single observer   | 1.6         | 8.5         | *           | *             | *             | *
Cascade predictor | 1.8 (m = 1) | 3.6 (m = 2) | 9.1 (m = 5) | 16.3 (m = 10) | 27.4 (m = 15) | 31.5 (m = 20)
Share and Cite

Wang, W.; Chen, J.; Huang, Z. Observer-Based State Estimation for Recurrent Neural Networks: An Output-Predicting and LPV-Based Approach. Math. Comput. Appl. 2023, 28, 104. https://doi.org/10.3390/mca28060104
