Article

Learning Interactions in Reaction Diffusion Equations by Neural Networks

1 Department of Statistics, University of California, Riverside, CA 92521, USA
2 ENSIIE & Laboratoire de Mathématiques et Modélisation d’Evry, Université Paris Saclay, 91025 Evry, France
3 Quantmetry, 75008 Paris, France
4 Department of Mathematics, University of California, Riverside, CA 92521, USA
* Author to whom correspondence should be addressed.
Entropy 2023, 25(3), 489; https://doi.org/10.3390/e25030489
Submission received: 6 February 2023 / Revised: 7 March 2023 / Accepted: 8 March 2023 / Published: 11 March 2023
(This article belongs to the Special Issue Physics-Based Machine and Deep Learning for PDE Models)

Abstract

Partial differential equations are common models in biology for predicting and explaining complex behaviors. Nevertheless, deriving the equations and estimating the corresponding parameters from data remains challenging. In particular, a fine description of the interactions between species requires care in taking into account various regimes, such as saturation effects. We apply a method based on neural networks to discover, from observed data, the underlying PDE systems, which may involve fractional terms and may also contain integration terms. Our proposed framework, called Frac-PDE-Net, adapts PDE-Net 2.0 by adding layers that are designed to learn fractional and integration terms. The key technical challenge of this task is the identifiability issue. More precisely, one needs to identify the main terms and combine similar terms among a huge number of candidates in fractional form generated by the neural network scheme due to the division operation. In order to overcome this barrier, we set up certain assumptions according to realistic biological behavior. Additionally, we use an L²-norm-based term selection criterion and sparse regression to obtain a parsimonious model. It turns out that the Frac-PDE-Net method is capable of recovering the main terms with accurate coefficients, allowing for effective long-term prediction. We demonstrate the interest of the method on a biological PDE model proposed to study the pollen tube growth problem.

1. Introduction

Two-component reaction–diffusion systems often model the interaction of two chemicals, leading to the formation of non-uniform spatial patterns of chemical concentration or morphogenesis under certain conditions due to chemical reactions and spreading. Since Turing’s groundbreaking work [1], reaction–diffusion systems have been extensively used in developmental biology modeling. For example, let u = u ( x , y , t ) and v = v ( x , y , t ) represent the concentration of two chemical species, which may either enhance or suppress each other depending on the context. The system of u and v can be modeled as follows:
\partial_t u = d_0 \Delta u + N_1(u, v), \qquad \partial_t v = d_1 \Delta v + N_2(u, v),
where Δ = ∂²_x + ∂²_y denotes the Laplacian operator, and N_1 and N_2 are the interactions between u and v. The functions N_1 and N_2 are sums of various reaction terms that can be derived from physical or chemical principles, such as mass-action laws, Michaelis–Menten kinetics, or products that represent competition or cooperation effects. We refer the readers to ([2], Section 2.2) for more discussion. Hence, N_1 and N_2 are sums of meaningful functions that represent specific mechanisms: if we are able to identify these terms and discover the explicit formulas for N_1 and N_2, then we can learn more about the nature of the interactions and predict future behaviors well. This situation arises commonly in biological applications such as chemotaxis, pattern formation in developmental biology, and the cell polarity phenomenon [3,4].
Cell polarity plays a vital role in cell growth and function for many cell types, affecting cell migration, proliferation, and differentiation. A classic example of polar growth is pollen tube growth, which is controlled by the Rho GTPase (ROP1) molecular switch. Recent studies have revealed that the localization of active ROP1 is regulated by both positive and negative feedback loops, and that calcium ions play a role in ROP1’s negative feedback mechanism. Initially, ROP1 is inside the membrane. During positive feedback (rate k_{pf}), some of the ROP1 enters the membrane. At the same time, negative feedback (rate k_{nf}) causes some of it to return inside the membrane, while the rest diffuses on the membrane (rate D_r). Calcium ions follow a similar process with positive rate k_{ac}, negative rate k_{dc}, and diffusion rate D_c. In [5,6], the following two-component reaction–diffusion system (2) is introduced:
\begin{aligned}
R_t &= k_{pf}\, R^{\alpha}\Big(R_{tot} - \int_{-L}^{L} R(x,t)\,dx\Big) - k_{nf}\, g(C)\, R + D_r R_{xx},\\
C_t &= k_{ac} R - k_{dc} C + D_c C_{xx},\\
R_x(-L,t) &= R_x(L,t) = 0, \qquad C_x(-L,t) = C_x(L,t) = 0,\\
R(x,0) &= R_0(x), \qquad C(x,0) = C_0(x),
\end{aligned}
with suitable initial and boundary conditions, proposed to quantitatively describe the spatial and temporal coupling between ROP1 and calcium ions, which leads to rapid oscillations in their distributions on the cell membrane. Here, R = R(x,t), C = C(x,t), and R_t, C_t, R_x, R_{xx}, C_x and C_{xx} are abbreviated notations for partial derivatives with respect to the time t or to the spatial variable x. Moreover, the nonlinear function g(C) characterizes how calcium ions play a role in ROP1’s negative feedback loop. Specifically, active ROP1 causes an increase in Ca²⁺ levels, leading to a reduction in ROP1 activity and a decrease in its levels; meanwhile, the influx of Ca²⁺ slows down as ROP1 drops. Ref. [6] proposed the form g(C) = C²/(C² + k_c²) to describe such spatial–temporal patterns of calcium, where k_c is a positive constant. Based on this model, Ref. [6] developed a modified gradient matching procedure for parameter estimation, including k_{nf} and k_c. However, it requires that g(C) in (2) be a known function. In this work, we propose to apply neural network methods to uncover the function g(C) or, more broadly, to learn the interaction terms N_1 and N_2 in general reaction–diffusion PDEs (1), which may contain fractional expressions (Figure 1).
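To make the forward model concrete, the sketch below integrates a system of the form (2) with an explicit Euler step in time and central differences in space. All parameter values, the grid, and the initial profiles are hypothetical placeholders chosen only for illustration; they are not the values used in the experiments later in this paper.

import numpy as np

# Hypothetical parameters, for illustration only.
L, N, dt, steps = 2.5 * np.pi, 200, 1e-3, 2000
k_pf, k_nf, k_ac, k_dc = 1.0, 1.0, 1.0, 1.0
D_r, D_c, alpha, R_tot, k_c = 0.1, 1.0, 1.5, 10.0, 0.15

x = np.linspace(-L, L, N)
dx = x[1] - x[0]
R = 0.03 + 0.01 * np.sin(3 * x)   # arbitrary smooth, positive initial data
C = 0.06 + 0.01 * np.sin(3 * x)

def laplacian_neumann(u, dx):
    # second-order central differences with zero-flux (Neumann) boundaries
    lap = np.empty_like(u)
    lap[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
    lap[0] = 2 * (u[1] - u[0]) / dx**2       # ghost point u[-1] = u[1]
    lap[-1] = 2 * (u[-2] - u[-1]) / dx**2
    return lap

def g(C):
    return C**2 / (C**2 + k_c**2)            # calcium-mediated negative feedback

for _ in range(steps):
    mass = R.sum() * dx                       # Riemann-sum approximation of the integral of R
    R_new = R + dt * (k_pf * R**alpha * (R_tot - mass)
                      - k_nf * g(C) * R + D_r * laplacian_neumann(R, dx))
    C_new = C + dt * (k_ac * R - k_dc * C + D_c * laplacian_neumann(C, dx))
    R, C = R_new, C_new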
In the past decade, the artificial intelligence community has focused increasingly on neural networks, which have become crucial in many applications, especially for PDEs. Deep learning-based approaches to PDEs have made substantial progress and are well studied, both for forward and inverse problems. For forward problems with appropriate initial and boundary conditions in various domains, several methods have been developed to accurately predict dynamics (e.g., [7,8,9,10,11,12,13,14,15,16,17]). For inverse problems, there are two classes of approaches. The first class focuses on inferring coefficients from known data (e.g., [7,10,12,15,18,19]). An example of this is the widely known PINN (Physics-Informed Neural Networks) method [10], which uses PDEs in the loss function of neural networks to incorporate scientific knowledge. Ref. [7] improved the efficiency of PINNs with the residual-based adaptive refinement (RAR) method and created a library of open-source codes for solving various PDEs, including those with complex geometry. However, this method is only capable of estimating coefficients for fixed known terms in PDEs, and may not work well for discovering hidden PDE models. Although [9] extended the PINN method to find unknown dynamic systems, the nonlinear learner function remains a black box and no explicit expressions of the discovered terms in the predicted PDE are available, making it difficult to interpret their physical meaning. The second class of approaches not only estimates coefficients, but also discovers hidden terms (e.g., [16,17,20,21,22,23,24,25,26]). An example is the PDE-Net method [16], which combines numerical approximations of convolutional differential operators with symbolic neural networks. PDE-Net can learn differential operators through convolution kernels, a natural method for solving PDEs that has been well studied in [27]. This approach is capable of recovering terms in PDE models with explicit expressions and relatively accurate coefficients, but often produces many noisy terms that lack interpretation. In order to produce parsimonious models, Refs. [25,26] proposed to create a regression model with the response variable ∂_t u and a matrix Θ containing a collection of spatial and polynomial derivative functions (e.g., u, u_x, u u_x): ∂_t u = Θ ξ. The estimation of differential equations by modeling the time variations of the solution is known to produce consistent estimates [28]. In addition, ridge regression with hard thresholding can be used to approximate the coefficient vector ξ. This sparse regression-based method generally results in a PDE model with accurately predicted terms and high-accuracy coefficients. However, few existing studies have focused on effectively recovering interaction terms in fractional form (i.e., one polynomial divided by another) in hidden partial differential equations, which is the focus of this paper.
Previous methods for identifying the hidden terms in reaction–diffusion partial differential equation models have mostly focused on polynomial forms. However, as indicated in Equation (2), the model for ROP1 and calcium ion distribution also involves fractional and integral forms, which can pose identifiability issues when combined with polynomial forms. Furthermore, we want to attain a parsimonious model, as the interpretability of the PDE model is important for biologists to comprehend biological behavior and phenomena revealed by the model.
In this paper, we utilize a combination of a modified PDE-Net method (which adds fractional and integration terms to the original PDE-Net approach), an L²-norm-based term selection criterion, and an appropriate sparse regression. This combination proves to produce meaningful and stable terms with accurate estimation of coefficients. For ease of reference, we call this combination Frac-PDE-Net.
The paper is organized as follows. In Section 2, we explain the main idea and the framework of our proposed method Frac-PDE-Net. In Section 3, we apply Frac-PDE-Net to discover some biological PDE models based on simulation data. Then, in Section 4, we make some predictions to test the effectiveness of the models learned in Section 3. Finally, we summarize our findings and present some possible future works in Section 5.

2. Methodology

The main idea of the PDE-Net method, as described in [16], is to use a deep convolutional neural network (CNN) to study generic nonlinear evolution partial differential equations (PDEs) as shown below:
\partial_t u = F(z, u, \nabla u, \nabla^2 u, \ldots), \qquad z \in \Omega,\ t \in [0, T],
where u = u(z, t) is a function (scalar valued or vector valued) of the space variable z and the temporal variable t. Its architecture is a feed-forward network that combines the forward Euler method in time with the second-order finite difference method in space through the implementation of special filters in the CNN that imitate differential operators. The network is trained to approximate the solution to the above PDEs and is then used to make predictions for the subsequent time steps. The authors of [16] show that this approach is effective for solving a range of PDEs and can achieve satisfactory accuracy and computational efficiency compared to traditional numerical methods. In this paper, we follow a framework similar to PDE-Net, but with modifications of the symbolic network (SymNet^k_m) to better align with biological models.

2.1. PDE-Net Review

The feed-forward network consists of several Δ t -blocks, all of which use the same parameters optimized through minimizing a loss function. For simplicity, we will only show one Δ t -block for two-dimensional PDEs, as repeating it generates multiple Δ t -blocks, and the concept can easily be extended to higher-dimensional PDEs.
Denote the space variable z in (3) by z = (x, y), since we are dealing with the two-dimensional case. Let t_0 = 0 and let ũ(·, t_0) be the given initial data. For i ≥ 0, ũ(·, t_{i+1}) denotes the predicted value of u at time t_{i+1}, calculated from the predicted (or true) value of ũ at time t_i using the following procedure:
\tilde{u}(\cdot, t_{i+1}) = \tilde{u}(\cdot, t_i) + (\Delta t)\,\mathrm{SymNet}\big(x, y, D_{00}u, D_{10}u, D_{01}u, D_{20}u, \ldots\big),
where SymNet is an approximation operator of F. Here, the operators D_{ij} are convolution operators with underlying filters q_{ij}, i.e., D_{ij} u := \frac{1}{(\Delta x)^i (\Delta y)^j}\, q_{ij} \circledast u. These operators approximate differential operators:
D_{ij} u \approx \frac{\partial^{i+j} u}{\partial x^i\, \partial y^j}.
For a general N \times N filter q = q[k_1, k_2], where -\frac{N-1}{2} \le k_1, k_2 \le \frac{N-1}{2},
q \circledast u(x, y) := \sum_{k_1, k_2} q[k_1, k_2]\, u(x + k_1 \Delta x,\, y + k_2 \Delta y).
By Taylor expansion,
q \circledast u(x, y) = \sum_{i,j=0}^{N-1} m_{ij}\, (\Delta x)^i (\Delta y)^j \left.\frac{\partial^{i+j} u}{\partial x^i\, \partial y^j}\right|_{(x,y)} + O\big(|\Delta x|^N\big) + O\big(|\Delta y|^N\big),
where
m_{ij} := \frac{1}{i!\, j!} \sum_{k_1, k_2} k_1^i\, k_2^j\, q[k_1, k_2], \qquad 0 \le i, j \le N-1.
In particular, if we choose \Delta x = \Delta y = \delta, then
q \circledast u(x, y) = \sum_{i,j=0}^{N-1} m_{ij}\, \delta^{i+j} \left.\frac{\partial^{i+j} u}{\partial x^i\, \partial y^j}\right|_{(x,y)} + O\big(\delta^N\big).
As a result, the training of q can be performed through the training of M := (m_{ij}), since the moment matrix M = M(q). It is important to note that the trainable filters M (or q) must be carefully constrained to match differential operators.
For example, to approximate u_x by D_{10}u, or equivalently by \frac{1}{\Delta x}\, q_{10} \circledast u for a 3 \times 3 filter q_{10}, we may choose
M_1(q_{10}) = \begin{pmatrix} 0 & 0 & * \\ 1 & * & * \\ * & * & * \end{pmatrix}
\quad \text{or} \quad
M_2(q_{10}) = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & * \\ 0 & * & * \end{pmatrix},
where ∗ means no constraint on the corresponding entry. Generally, the fewer instances of ∗ present, the more restrictions are imposed, leading to increased accuracy. In this example of (6), the choice of M 1 ensures the 1st order accuracy and the choice of M 2 guarantees the 2nd order accuracy. More precisely, if we plug M 1 into (5) with Δ x = Δ y = δ , then
q_{10} \circledast u(x, y) = \delta\, u_x + O\big(\delta^2\big),
which implies \frac{1}{\Delta x}\, q_{10} \circledast u(x, y) = u_x + O(\Delta x). Similarly, if we plug M_2 into (5), then \frac{1}{\Delta x}\, q_{10} \circledast u(x, y) = u_x + O\big((\Delta x)^2\big). In PDE-Net 2.0, all moment matrices are trained subject to partial constraints so that the accuracy is at least of second order.
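To illustrate how a filter is tied to the derivative it approximates, the following NumPy sketch computes the moment matrix M(q), with entries m_{ij} as defined above, for the standard central-difference stencil of u_x. The indexing convention (array index 0 corresponding to offset −1) is our own assumption for the illustration.

import numpy as np
from math import factorial

def moment_matrix(q):
    # M(q) with entries m_ij = (1/(i! j!)) * sum_{k1,k2} k1^i k2^j q[k1, k2]
    N = q.shape[0]
    r = (N - 1) // 2
    k = np.arange(-r, r + 1)
    M = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            M[i, j] = (k[:, None] ** i * k[None, :] ** j * q).sum() / (factorial(i) * factorial(j))
    return M

# Central-difference filter for u_x, applied as (1/dx) * (q10 convolved with u).
q10 = np.zeros((3, 3))
q10[2, 1], q10[0, 1] = 0.5, -0.5   # array row 0 corresponds to offset k1 = -1, row 2 to k1 = +1
print(moment_matrix(q10))          # m_00 = m_01 = 0, m_10 = 1, and all second-order moments vanish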
The SymNet^k_m network, modeled after CNNs, is employed to approximate the multivariate nonlinear response function F. It takes an m-dimensional vector as input and consists of k layers. As depicted in Figure 2, the SymNet^2_m network has two hidden layers, where each f_i unit performs a dyadic multiplication and the output is added to the (i+1)th hidden layer.
The loss function for this method has three components and is defined as follows:
L = L_{data} + \lambda_M L_{moment} + \lambda_S L_{SymNet}.
Here, L_{data} measures the difference between the true data and the prediction. Consider the data set \{u_j(\cdot, t_i) \in \mathbb{R}^{N_s \times N_s} : 1 \le i \le n,\ 1 \le j \le N\}, where n is the number of \Delta t-blocks, N is the total number of samples, and N_s is the number of space grids. The index j indicates the jth solution path with a certain initial condition of the unknown dynamics, and the index i represents the solution at time t_i. Then, we define
L_{data} = \frac{1}{nN(\Delta t)^2} \sum_{i=1}^{n} \sum_{j=1}^{N} \ell_{ij}.
Here, \ell_{ij} := \|u_j(t_i, \cdot) - \tilde{u}_j(t_i, \cdot)\|_2^2, where u_j represents the real data and \tilde{u}_j denotes the predicted data. For a given threshold s, recall the Huber loss function \ell_1^{(s)} defined as
\ell_1^{(s)}(x) = \begin{cases} |x| - \dfrac{s}{2}, & \text{if } |x| > s, \\[4pt] \dfrac{x^2}{2s}, & \text{if } |x| \le s. \end{cases}
We then define the following:
L_{moment} = \sum_{i,j} \sum_{i_1, j_1} \ell_1^{(s)}\big(M(q_{ij})[i_1, j_1]\big),
where the q_{ij} are filters and M(q_{ij}) is the moment matrix of q_{ij}. Using the same Huber loss function as in (8), we define
L_{SymNet} = \sum_{i,j} \ell_1^{(s)}(w_{ij}),
where the w_{ij} are the parameters in SymNet. The coefficients λ_M and λ_S in Equation (7) serve as regularization weights that help control the magnitude of the parameters, preventing them from becoming too large and overfitting the training data.
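As a minimal illustration of how the three components of (7) can be assembled, the PyTorch sketch below implements the Huber-type penalty ℓ₁^(s) of (8) and a simplified composite loss. The mean-squared data term is a simplified stand-in for L_data, and all argument names are our own.

import torch

def huber_l1(x, s):
    # l1^(s) of (8): |x| - s/2 when |x| > s, x^2 / (2s) otherwise
    a = x.abs()
    return torch.where(a > s, a - s / 2, x ** 2 / (2 * s))

def total_loss(pred, target, moment_entries, symnet_weights, dt, lam_M, lam_S, s=0.01):
    # L = L_data + lam_M * L_moment + lam_S * L_SymNet, cf. (7);
    # L_data is approximated here by a mean-squared error scaled by 1/dt^2
    l_data = ((pred - target) ** 2).mean() / dt ** 2
    l_moment = sum(huber_l1(m, s).sum() for m in moment_entries)
    l_symnet = sum(huber_l1(w, s).sum() for w in symnet_weights)
    return l_data + lam_M * l_moment + lam_S * l_symnet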

2.2. mPDE-Net (Modified PDE-Net)

In mPDE-Net, we do not include multiplications between derivatives of u and v, as such interactions are not commonly present in biological phenomena. Additionally, to handle interactions in fractional or integral forms, such as those in Equation (2), mPDE-Net incorporates integral terms and division operations into SymNet^k_m. However, this raises an identifiability challenge. For instance, consider a two-component input (u, v). mPDE-Net may produce results such as u²/(u + ε) or uv/(v + ε), where ε is a small number due to noise. Although both of these terms essentially represent the same term u, mPDE-Net is unable to identify them as such automatically. Keeping all similar terms, such as u²/(u + ε), uv/(v + ε) and u, at the same time would result in a complex model, and the real fractional term would not be effectively trained.
To address the identifiability issue, restrictions were imposed on the nonlinear interaction term N(u, v) by assuming that N(u, v) = g(u)h(v), where either g or h is linear and the other one may contain a fractional term with the order of the denominator larger than that of the numerator. For instance, the terms u²/(u + ε) and uv/(v + ε) are further decomposed as follows:
\frac{u^2}{u + \epsilon} = u - \epsilon + \frac{\epsilon^2}{u + \epsilon}, \qquad \frac{uv}{v + \epsilon} = u - \frac{u\epsilon}{v + \epsilon}.
As seen, the main part of both terms is u, while the remainders, namely -\epsilon, \epsilon^2/(u + \epsilon) and -u\epsilon/(v + \epsilon), are considered perturbations since ε is very small. This allows mPDE-Net to identify and combine the main parts of terms, resulting in a compact model.
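One way to mechanize the decomposition in (9) is a partial-fraction expansion; the SymPy sketch below reproduces the two identities above. This is a post-processing illustration on our part, not a component of the trained network.

import sympy as sp

u, v, eps = sp.symbols('u v epsilon', positive=True)

# Partial-fraction decomposition exposes the dominant polynomial part of each candidate term.
print(sp.apart(u**2 / (u + eps), u))    # -> u - epsilon + epsilon**2/(u + epsilon)
print(sp.apart(u * v / (v + eps), v))   # -> u - epsilon*u/(v + epsilon)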
Figure 3 presents an example of a system involving the derivatives of u and v up to the second order. The symbolic neural network in this example has five hidden layers and is referred to as SymNet^5_{10}. The operators f_i are multiplication units, i.e., f_i(η_i, ξ_i) = η_i ξ_i for i = 1, 4, 5, and the f_j are division units, i.e., f_j(η_j, ξ_j) = η_j/ξ_j for j = 2, 3. Additionally, a term u^α is included to incorporate fractional powers, such as the term R^α in (2). The algorithm corresponding to this example is outlined in Algorithm 1, where L_1 = (u, u_x, u_{xx}, v, v_x, v_{xx}, u², v², u^α, I)^T, L_2 = (L_1^T, f_1)^T, L_3 = (L_2^T, f_2)^T, L_4 = (L_3^T, f_3)^T, L_5 = (L_4^T, f_4)^T, and L_6 = (L_5^T, f_5)^T.
Algorithm 1 Scheme of mPDE-Net.
Input: u, u_x, u_{xx}, v, v_x, v_{xx}, u², v², u^α, I, where I represents ∫ u(x, t) dx,
(η_1, ξ_1)^T = W^{(1)} L_1 + b^{(1)},     W^{(1)} ∈ ℝ^{2×10}, L_1 ∈ ℝ^{10}, b^{(1)} ∈ ℝ²,
(η_2, ξ_2)^T = W^{(2)} L_2 + b^{(2)},     W^{(2)} ∈ ℝ^{2×11}, L_2 ∈ ℝ^{11}, b^{(2)} ∈ ℝ²,
(η_3, ξ_3)^T = W^{(3)} L_3 + b^{(3)},     W^{(3)} ∈ ℝ^{2×12}, L_3 ∈ ℝ^{12}, b^{(3)} ∈ ℝ²,
(η_4, ξ_4)^T = W^{(4)} L_4 + b^{(4)},     W^{(4)} ∈ ℝ^{2×13}, L_4 ∈ ℝ^{13}, b^{(4)} ∈ ℝ²,
(η_5, ξ_5)^T = W^{(5)} L_5 + b^{(5)},     W^{(5)} ∈ ℝ^{2×14}, L_5 ∈ ℝ^{14}, b^{(5)} ∈ ℝ²,
Output: F = W^{(6)} L_6 + b^{(6)},     W^{(6)} ∈ ℝ^{1×15}, L_6 ∈ ℝ^{15}, b^{(6)} ∈ ℝ.
To further demonstrate the mPDE-Net approach, we present a concrete example. To simplify the notation, we introduce the row vector e i with a 1 in the ith component and 0 in all other components, i.e.,
e_i = (0, 0, \ldots, 0, 1, 0, \ldots, 0),
where the number “1” is on the ith position. Then, we set
W^{(1)} = \begin{pmatrix} e_1 + e_4 \\ e_1 + e_4 \end{pmatrix}, \quad
W^{(2)} = \begin{pmatrix} e_4 \\ 4 e_4 + e_8 \end{pmatrix}, \quad
W^{(3)} = \begin{pmatrix} 0.5\, e_1 \\ 0.2\, e_1 + e_7 \end{pmatrix},
W^{(4)} = \begin{pmatrix} 0.2\, e_1 \\ e_{12} \end{pmatrix}, \quad
W^{(5)} = \begin{pmatrix} 0.2\, e_4 \\ e_{13} \end{pmatrix}, \quad
W^{(6)} = 0.1\, e_1 + 0.3\, e_3 + 6\, e_4 + e_{11} + 2\, e_{14} + 3\, e_{15},
b^{(1)} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \;
b^{(2)} = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}, \;
b^{(3)} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \;
b^{(4)} = b^{(5)} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \;
b^{(6)} = 0.
According to Algorithm 1, for 1 \le i \le 5,
W^{(1)} L_1 + b^{(1)} = \begin{pmatrix} u + v + 1 \\ u + v \end{pmatrix}, \qquad f_1 = f_1(\eta_1, \xi_1) = \eta_1 \xi_1 = (u + v + 1)(u + v),
W^{(2)} L_2 + b^{(2)} = \begin{pmatrix} v + 0.5 \\ 4v + v^2 + 0.5 \end{pmatrix}, \qquad f_2 = f_2(\eta_2, \xi_2) = \frac{\eta_2}{\xi_2} = \frac{v + 0.5}{v^2 + 4v + 0.5},
W^{(3)} L_3 + b^{(3)} = \begin{pmatrix} 0.5u + 1 \\ 0.2u + u^2 \end{pmatrix}, \qquad f_3 = f_3(\eta_3, \xi_3) = \frac{\eta_3}{\xi_3} = \frac{0.5u + 1}{u^2 + 0.2u},
W^{(4)} L_4 + b^{(4)} = \begin{pmatrix} 0.2u \\ f_2 \end{pmatrix}, \qquad f_4 = f_4(\eta_4, \xi_4) = \eta_4 \xi_4 = 0.2u\, f_2 = \frac{0.2u\,(v + 0.5)}{v^2 + 4v + 0.5},
W^{(5)} L_5 + b^{(5)} = \begin{pmatrix} 0.2v \\ f_3 \end{pmatrix}, \qquad f_5 = f_5(\eta_5, \xi_5) = \eta_5 \xi_5 = 0.2v\, f_3 = \frac{0.2v\,(0.5u + 1)}{u^2 + 0.2u}.
Therefore,
\mathrm{SymNet}^5_{10}(u, v) = W^{(6)} L_6 + b^{(6)} = 0.1u + 0.3u_{xx} + 6v + f_1 + 2 f_4 + 3 f_5 = 0.3u_{xx} + u^2 + 2uv + v^2 + 1.1u + 7v + \frac{0.4u\,(v + 0.5)}{v^2 + 4v + 0.5} + \frac{0.6v\,(0.5u + 1)}{u^2 + 0.2u}.
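The short SymPy check below re-computes the hidden-unit outputs of this worked example and expands the final expression; it only verifies the arithmetic above and is not an implementation of SymNet itself.

import sympy as sp

u, v, uxx = sp.symbols('u v u_xx')
f1 = (u + v + 1) * (u + v)                  # multiplication unit
f2 = (v + 0.5) / (v**2 + 4*v + 0.5)         # division unit
f3 = (0.5*u + 1) / (u**2 + 0.2*u)           # division unit
f4 = 0.2*u * f2                             # multiplication unit
f5 = 0.2*v * f3                             # multiplication unit
out = 0.1*u + 0.3*uxx + 6*v + f1 + 2*f4 + 3*f5   # output layer W^(6) L_6 + b^(6)
print(sp.expand(out))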
Let \mathcal{L} denote the library for PDE-Net 2.0 and \mathcal{L}_f denote the library for mPDE-Net. It is clear that \mathcal{L} and \mathcal{L}_f are distinct. Typically, \mathcal{L} only seeks to identify multiplication terms and has the form
\mathcal{L} = \big\{ \lambda (U_{xx} + U_{yy}) + f_1(U) : \lambda \in \mathbb{R},\ U = (u, v),\ f_1 \in \mathcal{P} \big\},
where
\mathcal{P} := \{ \text{polynomials of } U \text{ up to a certain degree} \}.
Conversely, \mathcal{L}_f is engineered to learn both multiplication terms and fractional terms, subject to certain constraints. In our paper, we make the choice
\mathcal{L}_f = \Big\{ \lambda (U_{xx} + U_{yy}) + f_1(U) + \frac{f_2(u)}{f_3(u)}\, f_4(v) + f_5(u)\, \frac{f_6(v)}{f_7(v)} : \lambda \in \mathbb{R},\ U = (u, v),\ \{f_i\}_{i=1}^{7} \subset \mathcal{P},\ \deg f_2 < \deg f_3,\ \deg f_6 < \deg f_7 \Big\},
which is much larger than \mathcal{L}. Therefore, our framework of neural networks, built upon \mathcal{L}_f, is more challenging to implement than the original framework, which is based on \mathcal{L}.

2.3. Optimizing Hyperparameters

In this section, we explain the process of tuning the hyperparameters λ_M and λ_S in the loss function (7). Firstly, the ranges of the spatial and temporal variables in the training set are defined as [−L, L] and [0, T], respectively. Then, using the finite difference method, we generate a dataset that acts as the “true data”. Additionally, we consider M initial conditions. The observational time grid is determined by the ratio dt/d̃t, where d̃t is the time step size for computing the “true data” and dt represents the time step size for selecting the “observational data”. Typically, d̃t is chosen to be much smaller than dt. The solution corresponding to the mth initial condition is denoted u_m(·, ·), where the first “·” refers to the spatial variable and the second “·” to the temporal variable. If the solution is evaluated at the kth time step, it is written as u_m(·, t_k), with “·” representing the spatial variable.
The M initial values from M initial conditions are divided into three separate groups, resulting in M = M 1 + M 2 + M 3 , where M 1 , M 2 , and  M 3 represent the sizes of the training set, validation set, and test set, respectively. The solutions produced by these initial values are designated as follows:
Training set: u_1(·, ·), …, u_{M_1}(·, ·);
Validation set: u_{M_1+1}(·, ·), …, u_{M_1+M_2}(·, ·);
Testing set: u_{M_1+M_2+1}(·, ·), …, u_{M_1+M_2+M_3}(·, ·).
We use the training set to train our models, the validation set to find the best parameters, and the testing set to evaluate the performance of the trained models.
Assume we divide the time range [0, T] into K blocks, with cutting points denoted t_k for 1 ≤ k ≤ K. Then, for any 1 ≤ m ≤ M and any 1 ≤ k ≤ K, we define
\ell_k^m = \| u_m(\cdot, t_k) - \tilde{u}_m(\cdot, t_k) \|_2^2,
where \|\cdot\|_2 denotes the L² norm with respect to the space variable on [−L, L], u_m is the “true solution”, and ũ_m is the “predicted solution” produced by a neural network. Based on this, the training loss, validation loss and testing loss are defined as follows:
  • Training loss:
    L_{train} := \frac{1}{M_1 K (dt)^2} \sum_{k=1}^{K} \sum_{m=1}^{M_1} \ell_k^m.
  • Validation loss:
    L_{valid} := \frac{1}{M_2 K (dt)^2} \sum_{k=1}^{K} \sum_{m=M_1+1}^{M_1+M_2} \ell_k^m.
  • Testing loss:
    L_{test} := \frac{1}{M_3 K (dt)^2} \sum_{k=1}^{K} \sum_{m=M_1+M_2+1}^{M} \ell_k^m.
We choose the hyperparameters λ_M and λ_S in the loss function (7) using the validation sets. Let B_m^k = u_m(·, t_k) and B_j^k = u_j(·, t_k), where 1 ≤ m ≤ M_1, M_1 + 1 ≤ j ≤ M_1 + M_2 and 1 ≤ k ≤ K. We denote the number of incremental training stages by N_t. We then gradually increase the number of time points in the training and validation sets. For instance, if K = 15 and N_t = 5, the training and validation sets can be selected as follows. The performance metric is the same as the validation loss in (10).
Training                          Validation                        Validation Loss
B_m^1, …, B_m^3                   B_j^1, …, B_j^3                   L_valid^(1)
B_m^1, …, B_m^6                   B_j^1, …, B_j^6                   L_valid^(2)
B_m^1, …, B_m^9                   B_j^1, …, B_j^9                   L_valid^(3)
B_m^1, …, B_m^12                  B_j^1, …, B_j^12                  L_valid^(4)
B_m^1, …, B_m^15                  B_j^1, …, B_j^15                  L_valid^(5)
Furthermore, we tune the hyperparameters using Hyperopt [29], which uses Bayesian optimization to explore the hyperparameter space more efficiently than a brute-force grid search. Specifically, the mPDE-Net is nested in the objective function of Hyperopt, which optimizes the average validation loss L_{avl} of the models,
L_{avl} = \frac{1}{5} \sum_{i=1}^{5} L_{valid}^{(i)}.
The selection procedure is described in Algorithm 2.
Algorithm 2 Optimizing Hyperparameters using Hyperopt
1: Initialize the search spaces for λ_M and λ_S;
2: Define the objective function (to be optimized) as the average validation loss obtained from mPDE-Net, implemented using PyTorch;
3: Set the optimization algorithm, specify the number of trials, and initialize the results list;
4: for i = 1 to number of trials do
5:    Sample a set of hyperparameters from the search spaces, evaluate the objective function with the sampled hyperparameters, and initialize a list of validation losses;
6:    for r = 1 to N_t do
7:        Train the mPDE-Net model on B_m^1, …, B_m^{rK/N_t}, validate it on B_j^1, …, B_j^{rK/N_t} to obtain a validation loss, and append it to the list of validation losses;
8:    end for
9:    Compute the average validation loss from the list, append the hyperparameters and the average validation loss to the results list, and then update the search space based on the results so far;
10: end for
11: return the hyperparameters with the minimum objective function value.
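A compact way to realize this procedure with Hyperopt is sketched below. Here train_and_validate is a hypothetical stand-in for training mPDE-Net on the first blocks and returning its validation loss, and the search-space bounds are illustrative choices, not the ones used in the paper.

import numpy as np
from hyperopt import fmin, tpe, hp, Trials

N_t = 5  # number of incremental training stages, as in the example above

def train_and_validate(lam_M, lam_S, r):
    # Hypothetical stand-in: train mPDE-Net with (lam_M, lam_S) on the first r*K/N_t
    # time blocks and return its validation loss; replace with the real training routine.
    return (np.log10(lam_M) + 6) ** 2 + (np.log10(lam_S) + 4) ** 2 + 0.1 * r

def objective(params):
    losses = [train_and_validate(params['lam_M'], params['lam_S'], r)
              for r in range(1, N_t + 1)]
    return float(np.mean(losses))            # average validation loss L_avl

space = {'lam_M': hp.loguniform('lam_M', np.log(1e-8), np.log(1e-1)),
         'lam_S': hp.loguniform('lam_S', np.log(1e-8), np.log(1e-1))}
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50, trials=Trials())
print(best)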

2.4. Frac-PDE-Net

We have noted that mPDE-Net fits the data and recovers terms accurately, but it may not always yield a simple learned PDE, making it challenging to interpret. To address this, we first implemented sparsity-encouraging methods such as the Lasso. However, even with the Lasso and hyperparameters chosen from the validation sets, the predicted equation still contained redundant terms. This is likely due to correlations and linear dependencies in the data, which prevent the Lasso from fully shrinking the extra coefficients to zero. To overcome this, we employ two approaches. The first, called the L² norm-based term selection criterion, weakens or eliminates linear dependencies in the data. The second, called sequential threshold ridge regression (STRidge), produces concise models through strong thresholding. We discuss these approaches in more detail below.
  • L² norm-based term selection criterion. Consider the underlying PDE in the form
    \partial_t u = \Theta(u)\, \xi,
    where
    \Theta(u) = \big(\Theta_1(u), \Theta_2(u), \ldots, \Theta_p(u)\big), \qquad \xi = (\xi_1, \xi_2, \ldots, \xi_p)^T.
    To address the issue of excessive terms in the learned PDE, we apply the L² norm-based term selection criterion. This involves normalizing the columns of \Theta(u) to obtain \Phi_k(u),
    \Theta(u)\, \xi = \sum_{k=1}^{p} \Theta_k(u)\, \xi_k = \sum_{k=1}^{p} \Phi_k(u)\, \eta_k,
    where
    \Phi_k(u) = \frac{\Theta_k(u)}{\|\Theta_k(u)\|_2}, \qquad \eta_k = \xi_k \|\Theta_k(u)\|_2, \qquad 1 \le k \le p,
    and adjusting the coefficients \xi to \tilde{\xi},
    \tilde{\xi}_k = \begin{cases} 0, & \text{if } |\eta_k| < \delta \max_j |\eta_j|, \\ \xi_k, & \text{otherwise}, \end{cases} \qquad 1 \le k \le p.
    By removing the terms in \Theta(u) whose adjusted coefficients \eta_k are significantly smaller than the largest one, we shorten the vector \tilde{\xi} to \xi^{(s)}. The corresponding columns in \Theta(u) form a new matrix \Theta^{(s)}(u) with reduced linear dependency between its columns. This results in a simplified approximation of the PDE:
    \partial_t u \approx \Theta^{(s)}(u)\, \xi^{(s)}.
  • Sparse regression: STRidge. After using the L² norm-based term selection criterion to select terms, as discussed previously, we apply sparse regression to further improve the compactness of the representation of the hidden PDE model (13). Here, a tolerance threshold “tol” is introduced to select coefficients for sparse results. Coefficients smaller than “tol” are discarded, and the remaining ones are kept and refitted until the number of terms stabilizes. The sparse regression process is outlined in Algorithm 3, and a NumPy sketch of both steps is given after Algorithm 4 below. For further information, see [25].
To summarize, the mPDE-Net approach allows us to achieve relatively accurate predictions for the function and its derivatives. We then employ an L 2 norm-based term selection criterion and sparse regression to obtain a concise model, which we refer to as Frac-PDE-Net. Algorithm 4 summarizes this procedure.
Algorithm 3: STRidge(Θ^(s), U_t, λ, tol, iters)
1: ξ̂^(s) = argmin_{ξ^(s)} ||Θ^(s) ξ^(s) − U_t||²₂ + λ||ξ^(s)||²₂                ▹ ridge regression
2: bigcoeffs = {j : |ξ̂_j^(s)| ≥ tol}                                            ▹ select large coefficients
3: ξ̂^(s)[∼bigcoeffs] = 0                                                        ▹ apply hard threshold to the remaining coefficients
4: ξ̂^(s)[bigcoeffs] = STRidge(Θ^(s)[:, bigcoeffs], U_t, λ, tol, iters − 1)      ▹ recursive call with fewer coefficients
5: return ξ̂^(s)
Algorithm 4 L² norm selection criterion + STRidge(Θ̂, û_t, λ, tol, iters)
1: Θ̂ ξ̂ = Σ_{k=1}^p Θ̂_k ξ̂_k = Σ_{k=1}^p (Θ_k(û)/||Θ_k(û)||₂)(ξ̂_k ||Θ_k(û)||₂) = Σ_{k=1}^p Φ_k(û) η_k      ▹ adjusted coefficients
2: bigcoeffs = {k : |η_k| ≥ δ max_j |η_j|}                                       ▹ select large coefficients
3: ξ̂[∼bigcoeffs] = 0
4: Θ^(s) = Θ̂[:, bigcoeffs] and ξ^(s) = ξ̂[bigcoeffs]
5: ξ̂^(s) = argmin_{ξ^(s)} ||Θ^(s) ξ^(s) − û_t||²₂ + λ||ξ^(s)||²₂                 ▹ ridge regression
6: bigcoeffs = {j : |ξ̂_j^(s)| ≥ tol}                                             ▹ select large coefficients
7: ξ̂^(s)[∼bigcoeffs] = 0                                                         ▹ apply hard threshold
8: ξ̂^(s)[bigcoeffs] = STRidge(Θ^(s)[:, bigcoeffs], û_t, λ, tol, iters − 1)       ▹ recursive call with fewer non-zero coefficients
9: return ξ̂^(s)
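The NumPy sketch below is one way to realize Algorithms 3 and 4, under the convention that the columns of Θ contain the candidate terms evaluated on the data and U_t holds the observed time derivative; the function names and the boolean-mask bookkeeping are our own.

import numpy as np

def stridge(Theta, Ut, lam, tol, iters):
    # Algorithm 3: ridge regression followed by recursive hard thresholding
    n_terms = Theta.shape[1]
    xi = np.linalg.solve(Theta.T @ Theta + lam * np.eye(n_terms), Theta.T @ Ut)
    if iters <= 0:
        return xi
    big = np.abs(xi) >= tol                 # indices of large coefficients
    xi[~big] = 0.0                          # hard threshold on the small ones
    if big.any():
        xi[big] = stridge(Theta[:, big], Ut, lam, tol, iters - 1)
    return xi

def l2_select_then_stridge(Theta, Ut, xi_hat, delta, lam, tol, iters):
    # Algorithm 4: drop terms whose adjusted coefficient eta_k = xi_k * ||Theta_k||_2
    # is far below the largest one, then run STRidge on the surviving columns.
    eta = xi_hat * np.linalg.norm(Theta, axis=0)
    keep = np.abs(eta) >= delta * np.abs(eta).max()
    xi_s = np.zeros_like(xi_hat)
    xi_s[keep] = stridge(Theta[:, keep], Ut, lam, tol, iters)
    return xi_s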

2.5. Kolmogorov-Smirnov Test

After applying the Frac-PDE-Net procedure, a simplified, interpretable model has been created. Our next goal is to determine whether this model can be compressed further. We designate Model 1 as the system learned by Frac-PDE-Net, and Model 2 as the system obtained by removing from Model 1 the interaction term with the smallest L² norm. To determine whether Model 1 and Model 2 produce outputs from the same distribution, we use the Kolmogorov–Smirnov test (K-S test).
Since our examples involve systems of two PDEs, a two-dimensional K-S test is appropriate. The time range is [0, T] with time step size dt, giving n := T/dt time grids denoted {t_i}_{i=1}^n, where t_i = i(dt) and 1 ≤ i ≤ n. At a fixed time t_i, we aim to test the proximity of two samples Y_{t_i} and Ỹ_{t_i}, which are associated with Model 1 and Model 2, respectively, at time t_i. For each t_i, we specify:
Hypothesis 1 (Null). The two sets {Y_{t_i}}_{i=1}^n and {Ỹ_{t_i}}_{i=1}^n come from a common distribution.
Hypothesis 2 (Alternative). The two sets {Y_{t_i}}_{i=1}^n and {Ỹ_{t_i}}_{i=1}^n do not come from a common distribution.
Let H_{t_i,0} and p̂_{t_i} denote the null hypotheses and the corresponding p-values, respectively, for 1 ≤ i ≤ n. In this paper, we employed the Bonferroni [30], Holm [31] and Benjamini–Hochberg (B-H) [32] methods for multiple testing adjustment. Note that the Bonferroni method is the most conservative of the three. Under the complete null hypothesis of a common distribution across all time points, no more than 5% of the total time points are expected to be rejected.
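A minimal sketch of this testing step using SciPy and statsmodels is given below. SciPy's two-sample K-S test is one-dimensional, so the 2-D residual fields are flattened here as a simple stand-in for a genuine two-dimensional K-S test; the function and argument names are our own.

import numpy as np
from scipy.stats import ks_2samp
from statsmodels.stats.multitest import multipletests

def fraction_rejected(residuals_1, residuals_2, alpha=0.05, method='fdr_bh'):
    # residuals_1[i], residuals_2[i]: residual fields of Model 1 / Model 2 at time t_i.
    # One two-sample K-S test per time point, then a multiple-testing adjustment
    # (method can be 'bonferroni', 'holm', or 'fdr_bh' for Benjamini-Hochberg).
    pvals = np.array([ks_2samp(r1.ravel(), r2.ravel()).pvalue
                      for r1, r2 in zip(residuals_1, residuals_2)])
    reject, _, _, _ = multipletests(pvals, alpha=alpha, method=method)
    return reject.mean()   # fraction of time points at which the null is rejected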

3. Numerical Studies: Convection-Diffusion Equations with the Neumann Boundary Condition

In this section, we showcase numerical examples to demonstrate the efficacy of Frac-PDE-Net, our proposed method. The training, validation, and testing data are generated based on the underlying governing equation. Our aim is to use Frac-PDE-Net on these data to obtain a concise and interpretable model for the PDE. The governing PDEs under consideration in this paper are of the following form:
\partial_t u = F_1(u, v), \qquad \partial_t v = F_2(u, v),
where
F_1(u, v) = d_1 \Delta u + P_1(u, v) + R_1(u, v), \qquad F_2(u, v) = d_2 \Delta v + P_2(u, v) + R_2(u, v).
Here, d_1 and d_2 are positive diffusion coefficients, R_1 and R_2 represent fractional functions of (u, v), and P_1 and P_2 denote combinations of power functions and integration operators of (u, v) through addition and multiplication. For example, R_1(u, v) can be \frac{u^2 v}{2v + 3}, and P_1(u, v) can be 1 + u^{1.5} v^2 + u^{1.5} \int u \, dx.

3.1. Example 1: A 2-Dimensional Model

Our first example is taken from (Equation (2.8) in Section 2.2 in [2]). In this example, we consider (14) under the Neumann boundary condition on the two-dimensional domain D_1 := [−5, 5] × [−5, 5] with d_1 = 0.3, d_2 = 0.4, P_1(u, v) = 1 − u, P_2(u, v) = 0.4 − 0.2v, and
R_1(u, v) = R_2(u, v) = -\frac{2uv}{1 + u + 4u^2} = -\frac{1}{2}\,\frac{uv}{u^2 + 0.25u + 0.25}.
Thus, Equation (14) is reduced to
\partial_t u = F_1(u, v), \qquad \partial_t v = F_2(u, v),
\partial_x u(-5, y, t) = \partial_x u(5, y, t) = \partial_y u(x, -5, t) = \partial_y u(x, 5, t) = 0,
with (x, y, t) \in [-5, 5] \times [-5, 5] \times [0, 0.15] and
F_1(u, v) = 0.3\,(\partial_x^2 u + \partial_y^2 u) + 1 - u - \frac{1}{2}\,\frac{uv}{u^2 + 0.25u + 0.25}, \qquad
F_2(u, v) = 0.4\,(\partial_x^2 v + \partial_y^2 v) + 0.4 - 0.2v - \frac{1}{2}\,\frac{uv}{u^2 + 0.25u + 0.25}.
The observations are generated with Equations (16) and (17), and then split into training data, validation data and testing data. The PDE is solved by applying a finite difference scheme on a 64 × 64 spatial mesh grid, with the central difference scheme for Δ := ∂_x² + ∂_y² and a second-order Runge–Kutta temporal discretization (see [16]), using a time step size of 1/1600.
In addition, the observations are obtained from various initial values: this introduces extra variability into the datasets, which is necessary if we want to generalize well to arbitrary initial conditions. We assume that we have N_Init = 12 different solutions, coming from different initial values w_0. These functions are random, defined through random parameters a_{i,j}, b_{i,j}, c_{i,j}, d_{i,j}, a_{k,l}, b_{k,l}, c_{k,l} and d_{k,l}, which follow the standard normal distribution N(0, 1), and c_1 and c_2, which follow uniform distributions: c_1 ∼ U(−0.5, 0.5) and c_2 ∼ U(0.5, 1.5). Then, we generate the 12 initial values (u_0, v_0) by setting
u_0(x, y) = \frac{w_0(x, y)}{\max |w_0|} + c_1, \qquad v_0(x, y) = \frac{\tilde{w}_0(x, y)}{\max |\tilde{w}_0|} + c_2,
where
w_0(x, y) = \sum_{|i|, |j| \le 13} \big\{ a_{i,j} \cos(2ix)\cos(2jy) + b_{i,j} \sin[(2i+1)x]\sin[(2j+1)y] + c_{i,j} \sin[(2i+1)x]\cos(2jy) + d_{i,j} \cos(2ix)\sin[(2j+1)y] \big\},
\tilde{w}_0(x, y) = \sum_{|k|, |l| \le 13} \big\{ a_{k,l} \cos(2kx)\cos(2ly) + b_{k,l} \sin[(2k+1)x]\sin[(2l+1)y] + c_{k,l} \sin[(2k+1)x]\cos(2ly) + d_{k,l} \cos(2kx)\sin[(2l+1)y] \big\}.
For any given initial data (u_0, v_0), we denote the corresponding solution by (u^*, v^*). When noise is allowed, we take the perturbed data to be
u(x, y, t) = u^*(x, y, t) + nl \cdot Q_1, \qquad v(x, y, t) = v^*(x, y, t) + nl \cdot Q_2,
where nl is the level of Gaussian noise added, and Q_1 and Q_2 are random variables following the normal distributions Q_i ∼ N(0, σ_i²) for i = 1, 2, where σ_1 (resp. σ_2) is the standard deviation of the true data u^* (resp. v^*).
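For concreteness, the NumPy sketch below generates one random initial pair (u_0, v_0) of the form (18) and adds noise in the spirit of (19). Applying the noise to the initial fields (rather than to the full solution) and fixing the random seed are simplifications on our part.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 64)
X, Y = np.meshgrid(x, x, indexing='ij')

def random_field():
    # Random trigonometric series of the type used for w_0 in (18), truncated at |i|, |j| <= 13
    w = np.zeros_like(X)
    for i in range(-13, 14):
        for j in range(-13, 14):
            a, b, c, d = rng.standard_normal(4)
            w += (a * np.cos(2*i*X) * np.cos(2*j*Y)
                  + b * np.sin((2*i+1)*X) * np.sin((2*j+1)*Y)
                  + c * np.sin((2*i+1)*X) * np.cos(2*j*Y)
                  + d * np.cos(2*i*X) * np.sin((2*j+1)*Y))
    return w

w0, w0t = random_field(), random_field()
u0 = w0 / np.abs(w0).max() + rng.uniform(-0.5, 0.5)
v0 = w0t / np.abs(w0t).max() + rng.uniform(0.5, 1.5)

def add_noise(field, nl, rng):
    # Gaussian perturbation scaled by the field's own standard deviation, cf. (19)
    return field + nl * rng.normal(0.0, field.std(), size=field.shape)

u_obs = add_noise(u0, 0.05, rng)   # 5% noise level, as used in Example 1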
Since the time runs from 0 to 0.15, there are 15 time blocks, and we denote N_Time = 15. For the spatial variables, we have N_Space = 64, where N_Space represents the number of space grids. Therefore, the dataset is
\{(u_{t,k}, v_{t,k}) : 1 \le t \le N_{Time},\ 1 \le k \le N_{Init}\},
where both u_{t,k} and v_{t,k} are matrices in \mathbb{R}^{N_{Space} \times N_{Space}}. Table 1 and Table 2 show a summary of the parameters for Frac-PDE-Net.
Our goal is to discover the terms F_1(u, v) and F_2(u, v) on the right-hand side of (16), whose true expressions are given by (17). For convenience of notation, we denote by F̂_1 and F̂_2 our predicted operators for F_1 and F_2. Based on some existing models (see, e.g., Section 2.2 in [2]), we adopt some assumptions before discovering F̂_1 and F̂_2. More precisely, we assume that
\hat{F}_1(u, v) = \hat{d}_1 \Delta u + \hat{P}_1(u, v) + \hat{R}_1(u, v), \qquad \hat{F}_2(u, v) = \hat{d}_2 \Delta v + \hat{P}_2(u, v) + \hat{R}_2(u, v),
where d̂_1 and d̂_2 are positive constants, P̂_1 and P̂_2 are polynomials of (u, v) up to order 2, and both fractional terms R̂_1 and R̂_2 are of the form l(u)r(v) or r(u)l(v), where l denotes a linear function and r denotes a fractional function whose numerator is linear and whose denominator is quadratic.
Based on these assumptions, we consider the library {u, u_{xx}, u_{yy}, v, v_{xx}, v_{yy}} for training our model.
The filters q (as defined in (4)) are selected to be of size 5 × 5. The total number of parameters in the W^{(i)} (as defined in Algorithm 1) for approximating F_1 and F_2 is 56, and the number of trainable parameters in the moment matrices M (as defined in (6)) is 52. To optimize the parameters, we use the BFGS algorithm instead of the Adam or SGD optimizers, since BFGS is faster and more stable in this setting.
In the following, we outline the notation used and summarize the key steps of our framework.
1. F̂_i^{mPDE-Net} denotes the result of applying the modified PDE-Net to our model.
2. Next, we utilize the L² norm-based selection criterion and sparse regression on F̂_i^{mPDE-Net} to obtain a more concise and interpretable model, referred to as F̂_i^{s-mPDE-Net}. The “s” in F̂_i^{s-mPDE-Net} represents the application of sparse regression.
3. Subsequently, we fix the terms in F̂_i^{s-mPDE-Net} and retrain its coefficients to produce a final model named F̂_i^{rs-mPDE-Net}. This is the end result of our Frac-PDE-Net scheme. The “r” in F̂_i^{rs-mPDE-Net} signifies the retraining of the coefficients.
4. Finally, to verify that no further terms can be eliminated after Frac-PDE-Net, we compare two models: Model 1, generated by Frac-PDE-Net, and Model 2, which is identical to Model 1 except that the term with the smallest L² norm is removed from F̂_1 and F̂_2. The coefficients in Model 2 are retrained, and the resulting model is referred to as F̂_i^{PH-rs-mPDE-Net}. “PH” in F̂_i^{PH-rs-mPDE-Net} represents the post-hoc selection in Model 2. The comparison between Model 1 and Model 2 is conducted using the Kolmogorov–Smirnov test outlined in Section 2.5.
For this case, we added 5% noise to the generated data to form the observational data. The results are displayed in Table 3. Table 3 shows that F̂_i^{mPDE-Net} (the modified PDE-Net framework) accurately identifies the terms in Example 1 and estimates their corresponding coefficients. However, it also produces unnecessary terms with low weights after training. By applying the L² norm-based selection and sparse regression (L² + SP), we successfully remove these extra terms in F̂_i^{rs-mPDE-Net}. After the terms in F̂_1 and F̂_2 are identified, we retrain the model with these fixed terms to obtain the final coefficients in F̂_i^{rs-mPDE-Net}.
To test whether Model 1 (F̂_i^{rs-mPDE-Net}) and Model 2 (F̂_i^{PH-rs-mPDE-Net}) are similar, we compare their predictions obtained with the finite difference scheme. Consider the time range [0, 0.5] with time step size dt = 0.01. Hence, there are 50 time grids, denoted {t_i}_{i=1}^{50}, where t_i = 0.01i, 1 ≤ i ≤ 50. Fixing a time t_i, we introduce the residuals E_{t_i} := Y_{t_i} − Y*_{t_i} and Ẽ_{t_i} := Ỹ_{t_i} − Y*_{t_i}, where Y*_{t_i} represents the true solution, and Y_{t_i} and Ỹ_{t_i} denote the predicted solutions based on Model 1 and Model 2, respectively, at time t_i. We test whether the residuals {E_{t_i}}_{i=1}^{50} and {Ẽ_{t_i}}_{i=1}^{50} have similar distributions. The null hypothesis is H_0^{(i)}: E_{t_i} ∼ Ẽ_{t_i} (equal distributions) and the alternative hypothesis is H_A^{(i)}: E_{t_i} ≁ Ẽ_{t_i}. Applying the Bonferroni method, the Holm method and the B-H procedure for multiple testing adjustment, discussed in Section 2.5, we obtain the test results presented in Table 4.
The results in Table 4 show that Model 1 (Frac-PDE-Net) is significantly different from Model 2, meaning all terms in Model 1 should be kept. Hence, the final discovered terms for F ^ 1 and F ^ 2 are represented by Model 1 (Frac-PDE-Net) in Table 4.
To assess the stability of the results shown above, we repeated the experiments 100 times; the results are presented in Figure 4 and Figure 5. The process of merging similar terms is outlined in Appendix A.1. The plots show that there are some instances where the three methods fail to eliminate certain redundant terms. However, these instances are rare, as the median of these terms is 0, indicating that they appear infrequently.

3.2. Example 2: A 1-Dimensional Model

Our second example is taken from [6]. In this example, we consider (14) under the Neumann boundary condition on the one-dimensional domain D_1 := [-\tfrac{5\pi}{2}, \tfrac{5\pi}{2}] with d_1 = 0.1, d_2 = 10,
P_1(u, v) = 3.6\, u^{1.5} - 3.6\, u - 0.229\, u^{1.5} \int_{-2.5\pi}^{2.5\pi} u \, dx, \qquad P_2(u, v) = u - 0.4\, v,
R_1(u, v) = \frac{0.081\, u}{v^2 + 0.0215}, \qquad R_2(u, v) = 0.
Thus, Equation (14) is reduced to
\partial_t u = F_1(u, v), \qquad \partial_t v = F_2(u, v),
\partial_x u(-2.5\pi, t) = \partial_x u(2.5\pi, t) = \partial_x v(-2.5\pi, t) = \partial_x v(2.5\pi, t) = 0,
with (x, t) \in [-\tfrac{5\pi}{2}, \tfrac{5\pi}{2}] \times [0, 0.75] and
F_1(u, v) = 0.1\, \partial_x^2 u + 3.6\, u^{1.5} - 3.6\, u - 0.229\, u^{1.5} \int_{-2.5\pi}^{2.5\pi} u \, dx + \frac{0.081\, u}{v^2 + 0.0215}, \qquad
F_2(u, v) = 10\, \partial_x^2 v + u - 0.4\, v.
The training data, validation data and testing data are generated, based on (20), by applying a finite difference scheme on a 600-point spatial mesh and then restricting the solution to a 200-point spatial mesh, with the central difference scheme for Δ := ∂_x² and an implicit Euler temporal discretization, using a time step size of 0.01. Furthermore, we consider 14 different initial values, 10 of which were selected from a set of solutions with periodic patterns. The remaining initial values were generated by combining elementary functions. The reason for producing initial values in different ways is to test whether the method still works for periodic solutions.
We also add noise to the generated data in the following form:
u(x, t) = |u^*(x, t) + nl \cdot Q_1|, \qquad v(x, t) = |v^*(x, t) + nl \cdot Q_2|,
where nl is the level of Gaussian noise added and Q_1 and Q_2 are random variables following the normal distributions Q_i ∼ N(0, σ_i²) for i = 1, 2, where σ_1 (resp. σ_2) is the standard deviation of u^* (resp. v^*). The absolute value is imposed to avoid negative values, which would cause trouble when evaluating power functions with non-integer exponents, such as u^{1.5}.
We choose 15 time blocks on the interval [0, 0.75] and denote N_Time = 15. For the spatial variable, we set N_Space = 200, where N_Space represents the number of space grids. Therefore, the dataset is
\{(u_{t,k}, v_{t,k}) : 1 \le t \le N_{Time},\ 1 \le k \le N_{Init}\},
where u_{t,k} and v_{t,k} are vectors in \mathbb{R}^{N_{Space}}. Table 5 and Table 6 show a summary of the parameters for Frac-PDE-Net.
In [6], some assumptions are made on the model based on existing experimental knowledge of the biological behavior. For example, it is assumed that the operator F_2(u, v) is linear in both u and v, while F_1(u, v) is nonlinear in both u and v. As in (15),
F_1(u, v) = d_1 \Delta u + P_1(u, v) + R_1(u, v).
In [6], the nonlinear dependence of P_1(u, v) on u is through the combination of the power function u^α and the integration operator ∫ u dx, where α is further restricted to the range [1, 2]. On the other hand, R_1(u, v) is assumed to be linear in u but nonlinear in v, and the nonlinear dependence on v is through a fractional function whose denominator is a quadratic polynomial. Thanks to these a priori constraints, we consider the library {u, u_x, u_{xx}, v, v_x, v_{xx}, I, u^α} for F̂_1(u, v) and the library {u, u_x, u_{xx}, v, v_x, v_{xx}} for F̂_2(u, v), where α takes the form α = 1.5 + 0.5 sin(η) for η ∈ ℝ to ensure that α ∈ [1, 2].
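For instance, in a PyTorch implementation this constraint on α can be enforced by training an unconstrained parameter η and mapping it through the sine transform above; the variable names in the minimal sketch below are our own.

import torch

eta = torch.nn.Parameter(torch.zeros(()))   # unconstrained trainable parameter
alpha = 1.5 + 0.5 * torch.sin(eta)          # always lies in [1, 2]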
The filters q are of size 1 × 19. The total number of parameters for approximating F_1 and F_2 is 29, and the number of trainable parameters in the moment matrices M is 32. To optimize the parameters, we again use the BFGS algorithm.
For this case, we added 1% noise to the generated data to form the observational data. The results are displayed in Table 7, in which the notations are consistent with those in Table 3.
Similar to the post-hoc selection procedure performed in Example 1, we also need to compare Model 1 (F̂_1^{rs-mPDE-Net}) and Model 2 (F̂_1^{PH-rs-mPDE-Net}) and determine whether they differ significantly. Consider the time range [0, 10] with time step size dt = 0.05. Hence, there are 200 time grids, denoted {t_i}_{i=1}^{200}, where t_i = 0.05i, 1 ≤ i ≤ 200. At each time t_i, we introduce the residuals E_{t_i} := Y_{t_i} − Y*_{t_i} and Ẽ_{t_i} := Ỹ_{t_i} − Y*_{t_i}, where Y_{t_i} and Ỹ_{t_i} are associated with Model 1 and Model 2, respectively. We test whether the residuals {E_{t_i}}_{i=1}^{200} and {Ẽ_{t_i}}_{i=1}^{200} have similar distributions. Analogous to the previous case, we see from Table 7 that the coefficient in front of the term ∂_x²u in Model 2 (F̂_1^{PH-rs-mPDE-Net}) is a negative number, −0.026, which leads to rapid concentration rather than a diffusion effect. That said, Model 2 is essentially different from Model 1, and the distributions of {E_{t_i}}_{i=1}^{200} and {Ẽ_{t_i}}_{i=1}^{200} are totally different.
To assess the stability of the results shown above, we repeated the experiments 100 times and the results are presented in Figure 6 and Figure 7. The plots show that there are some instances where the three methods fail to eliminate certain redundant terms. However, these instances are rare, as the median of these terms is 0, indicating that they appear infrequently.

4. Prediction

4.1. Example 1: The 2-Dimensional Model

In this section, we validate the robustness of the model discovered by Frac-PDE-Net in Example 1 by performing predictions with the non-typical initial values u_0 and v_0,
u_0 = \frac{50y^2 - y^4 + 4}{800\,\big[\,1.2 - \cos\big(\tfrac{\pi}{5} y\big)\big]} + 4, \qquad
v_0 = \Big[\tfrac{1}{800}\,\big(50y^2 - y^4 + 4\big)\Big]^2 + \cos\Big(\tfrac{\pi}{5} x\Big) + 1.
We use the finite difference method to generate the “true data” in the forward direction using the known coefficients and terms in (16) and (17). The spatial step sizes (dx and dy) are set to 10/64 and the time step size (dt) is 1/1600. We then simulate the data using the trained model from Table 3 up to t = 0.5.
In Figure 8, both the true solution (u, v) and the predicted solution (ũ, ṽ) of the model trained by Frac-PDE-Net are plotted at different time instances: t ∈ {0.4, 0.6, 0.8, 1}. One can see from Figure 8 that the predicted solution is very close to the true one.
The results of the comparison between Frac-PDE-Net and PDE-Net 2.0 are presented in both graphical and quantitative form. The model discovered by PDE-Net 2.0 is shown in Table 8, while the predicted solutions are displayed in Figure 9. Although PDE-Net 2.0 only utilizes polynomials, the predicted images still have a similar shape to the true ones. To further evaluate the performance, the prediction errors are analyzed quantitatively using the L^∞ norm and the L² norm on the space domain [−5, 5] × [−5, 5], as seen in Table 9. The results show that Frac-PDE-Net has smaller errors compared to PDE-Net 2.0, highlighting its advantage.

4.2. Example 2: The One-Dimensional Model

In this section, we validate the robustness of the model discovered by Frac-PDE-Net in Example 2 in Section 3.2 by performing predictions with the following periodic initial values u_0 and v_0,
u_0(x) = 0.0259 + 0.01 \sin(3x), \qquad v_0(x) = 0.06475 + 0.01 \sin(3x).
We use the finite difference method to generate the “true” data in the forward direction using the known coefficients and terms in (20) and (21). The spatial step size (dx) is set to 5π/200 and the time step size (dt) is 0.05. The time interval considered is t ∈ [0, 10]. We then simulate the data using the trained model from Table 7 over the time period [0, 10]. In Figure 10, both the true solution and the predicted solution of the model trained by Frac-PDE-Net are plotted for t ∈ [0, 10]. One can see from Figure 10 that the predicted solution is very close to the true one.
The results of the comparison between Frac-PDE-Net and PDE-Net 2.0 are presented in both graphical and quantitative form. The model discovered by PDE-Net 2.0 is shown in Table 10, while the predicted solutions are displayed in Figure 11. We can clearly see that the predicted images by PDE-Net 2.0 are far from satisfactory compared to the true ones in Figure 10. To further evaluate the performance, the prediction errors are analyzed quantitatively using the L^∞ norm and the L² norm on the space-time region [−5π/2, 5π/2] × [0, 10] in Table 11. The results show that Frac-PDE-Net has much smaller errors compared to PDE-Net 2.0, highlighting its advantage.

5. Conclusions

Our approach, Frac-PDE-Net, builds on the symbolic approach developed in PDE-Net for discovering realistic and interpretable PDEs from data. While the neural network remains very efficient for generating and learning dictionaries of functions, typically polynomials, we have shown that if we enrich the dictionaries with large (typically uncountable) families of functions, extra care is needed for selecting the important terms, by penalization and by evaluating and testing the impact of a reaction term on the predicted solution. Quite remarkably, we can extract a sparse equation with readable terms and with good estimates of the associated parameters.
The introduction of rich families of functions, such as fractions (rational functions), is often necessary because they are widely used by modelers and because they can avoid the limitations of the approximation capacity of polynomials. Indeed, a polynomial expansion might require numerous terms in order to approximate the unknown reaction terms correctly. As a matter of fact, we have introduced a very flexible family of fractions that avoids truncation based on powers u^p, v^q, p, q ∈ ℕ. We then learn the numerator and denominator coefficients in ℝ, and our approach is incorporated seamlessly into the symbolic differentiable neural network framework of PDE-Net through the introduction of extra layers.
Our work is originally motivated by the discovery and estimation of reaction–diffusion PDEs with possibly complex terms, such as fractions, non-integer powers, or non-local terms (such as an integral), as introduced for the pollen tube growth problem [6]. Nevertheless, our selection approach could be used to handle other dictionaries, or in the presence of advection terms, as our methodology exploits the reaction–diffusion structure only for imposing some constraints on the dictionaries of interest and for the interpretability of each term in that case. As next steps, the Frac-PDE-Net methodology can be improved by considering more advanced numerical schemes in the time discretization, say implicit Euler or second-order Runge–Kutta. In that case, we expect better accuracy and stability for model recovery and prediction. Another possible improvement would be to enrich the dictionaries of fractional terms by replacing the current form N(u, v) = g(u)h(v) with more general rational functions whose denominators depend on both u and v, say N(u, v) = uv/(u²v² + 1). Finally, we emphasize that Frac-PDE-Net reaches a trade-off by discovering the main terms of the PDE and accurately estimating each coefficient in order to gain interpretability, while it also allows effective long-term prediction, even for unseen initial conditions.

Author Contributions

Conceptualization, N.J-B.B., X.C.; methodology, S.C., X.Y., N.J-B.B., X.C.; software, S.C., X.Y.; validation, S.C., X.Y., N.J-B.B., X.C.; formal analysis, S.C., X.Y., N.J-B.B., X.C.; writing—original draft preparation, S.C., X.Y., N.J-B.B., X.C.; writing—review and editing, S.C., X.Y., N.J-B.B., X.C.; supervision, N.J-B.B., X.C.; funding acquisition, X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by United States Department of Agriculture (USDA) National Institute of Food and Agriculture (NIFA) Hatch Project AES-CE award (CA-R-STA-7132-H) and NSF DMS 1853698.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank all anonymous reviewers for their constructive comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Term Combination after Simulation

During the process of simulation, if only the addition and the multiplication operators are involved, then it is not an issue to combine terms as the program can easily identify same terms and then add their coefficients together. However, combining similar terms can be difficult when fractional terms are present. To address this issue, we classify the simulation results into various groups before combining them.
As an example, we consider the scenario where the nonlinear term takes the form g ( u ) h ( v ) , and one of the following two structures is assumed.
(i) g is linear and h is a fractional function whose denominator is a second-order polynomial:
(u + c_1)\, \frac{\alpha_1 v + \alpha_2}{v^2 + \beta_1 v + \beta_2}.
(ii) h is linear and g is a fractional function whose denominator is a second-order polynomial:
\frac{\alpha_3 u + \alpha_4}{u^2 + \beta_3 u + \beta_4}\, (v + c_2).
Therefore, the outcomes have 32 possibilities if we only classify terms and signs:
  • Numerator (4 possibilities): u, v, uv, 1.
  • Denominator (2 possibilities): quadratic function in u or in v.
  • Signs (4 possibilities): the sign of β_1 or β_2 can be either positive or negative.
There are now 32 groups. In each of them, all members share the same main terms and same signs in the denominator while the coefficients are allowed to be different. For example, in the group with the form
\frac{\alpha_1 u v}{v^2 + \beta_1 v + \beta_2},
all members share the same term uv in the numerator, the same terms v² and v in the denominator, and the same signs of β_1 and β_2, while the specific values of α_1, β_1 and β_2 may vary.
Based on the above groups, we adopt the following general principle. If two terms lie in distinct groups, then they are considered different and are not combined. If two terms lie in the same group, then we further quantify how close their denominator coefficients (say β_1 and β_2) are. If these coefficients are close enough, then we regard them as the “same” term and combine them by adding their numerator coefficients (say α_1) together. The next question is therefore how to quantify the distance between two members of the same group with possibly different coefficients (say β_1 and β_2).
We will illustrate the criterion in the following by studying a specific form u v v 2 + β 1 v + β 2 . More precisely, suppose there are two terms T 1 and T 2 as below,
T 1 = α 1 ( 1 ) u v v 2 + β 1 ( 1 ) v + β 2 ( 1 ) , T 2 = α 1 ( 2 ) u v v 2 + β 1 ( 2 ) v + β 2 ( 2 ) ,
then we define their distance to be
D T 1 , T 2 = max i = 1 , 2 | β i ( 2 ) β i ( 1 ) | max { | β i ( 2 ) | , | β i ( 1 ) | } .
According to this concept, we combine T 1 and T 2 together if and only if D [ T 1 , T 2 ] < 0.2 , that is when the relative difference between the coefficients is less than 0.2 . In such a case, we add the coefficients α 1 ( 1 ) and α 1 ( 2 ) to obtain
$T_1 + T_2 \;\longrightarrow\; T^* := \dfrac{\alpha^* u v}{v^2 + \beta_1^{(*)} v + \beta_2^{(*)}},$
where
$\alpha^* = \alpha_1^{(1)} + \alpha_1^{(2)}, \qquad \beta_1^{(*)} = \tfrac{1}{2}\big(\beta_1^{(1)} + \beta_1^{(2)}\big), \qquad \beta_2^{(*)} = \tfrac{1}{2}\big(\beta_2^{(1)} + \beta_2^{(2)}\big).$
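Continuing the hypothetical FracTerm/group_key sketch above, the distance D and the merging rule can be written in a few lines of Python; the functions distance and combine and the 0.2 tolerance argument are again illustrative only, not the actual implementation.
```python
from typing import Optional

def distance(t1, t2):
    """D[T1, T2]: largest relative difference between corresponding denominator
    coefficients of two terms taken from the same group."""
    rel = lambda a, b: abs(a - b) / max(abs(a), abs(b), 1e-12)  # guard against 0/0
    return max(rel(t1.beta1, t2.beta1), rel(t1.beta2, t2.beta2))

def combine(t1, t2, tol=0.2) -> Optional["FracTerm"]:
    """Merge two terms when they lie in the same group and D[T1, T2] < tol;
    otherwise return None and keep them as separate terms."""
    if group_key(t1) != group_key(t2) or distance(t1, t2) >= tol:
        return None
    return FracTerm(
        alpha=t1.alpha + t2.alpha,              # add the numerator coefficients
        numerator=t1.numerator,
        denom_var=t1.denom_var,
        beta1=0.5 * (t1.beta1 + t2.beta1),      # average the denominator coefficients
        beta2=0.5 * (t1.beta2 + t2.beta2),
    )

# Example: two nearby terms of the form alpha * u*v / (v**2 + beta1*v + beta2)
T1 = FracTerm(alpha=0.51, numerator="uv", denom_var="v", beta1=0.259, beta2=0.265)
T2 = FracTerm(alpha=0.48, numerator="uv", denom_var="v", beta1=0.243, beta2=0.256)
print(distance(T1, T2))   # about 0.06, below the 0.2 threshold
print(combine(T1, T2))    # merged term with alpha = 0.99
```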

References

  1. Turing, A.M. The Chemical Basis of Morphogenesis. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 1952, 237, 37–72.
  2. Murray, J.D. Mathematical Biology, II, 3rd ed.; Interdisciplinary Applied Mathematics; Springer: New York, NY, USA, 2003; Volume 18, p. xxvi+811.
  3. Mori, Y.; Jilkine, A.; Edelstein-Keshet, L. Wave-pinning and cell polarity from a bistable reaction-diffusion system. Biophys. J. 2008, 94, 3684–3697.
  4. Mogilner, A.; Allard, J.; Wollman, R. Cell polarity: Quantitative modeling as a tool in cell biology. Science 2012, 336, 175–179.
  5. Tian, C. Parameter Estimation Procedure of Reaction Diffusion Equation with Application on Cell Polarity Growth. Ph.D. Thesis, UC Riverside, Riverside, CA, USA, 2018.
  6. Tian, C.; Shi, Q.; Cui, X.; Guo, J.; Yang, Z.; Shi, J. Spatiotemporal dynamics of a reaction-diffusion model of pollen tube tip growth. J. Math. Biol. 2019, 79, 1319–1355.
  7. Lu, L.; Meng, X.; Mao, Z.; Karniadakis, G.E. DeepXDE: A deep learning library for solving differential equations. arXiv 2019, arXiv:1907.04502.
  8. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations. arXiv 2017, arXiv:1711.10561.
  9. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Deep hidden physics models: Deep learning of nonlinear partial differential equations. J. Mach. Learn. Res. 2018, 19, 932–955.
  10. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707.
  11. Meng, X.; Li, Z.; Zhang, D.; Karniadakis, G.E. PPINN: Parareal physics-informed neural network for time-dependent PDEs. Comput. Methods Appl. Mech. Eng. 2020, 370, 113250.
  12. Pang, G.; Lu, L.; Karniadakis, G.E. fPINNs: Fractional physics-informed neural networks. SIAM J. Sci. Comput. 2019, 41, A2603–A2626.
  13. Chen, Z.; Xiu, D. On generalized residual network for deep learning of unknown dynamical systems. J. Comput. Phys. 2021, 438, 110362.
  14. Wu, K.; Xiu, D. Data-driven deep learning of partial differential equations in modal space. J. Comput. Phys. 2020, 408, 109307.
  15. Zhou, Z.; Wang, L.; Yan, Z. Deep neural networks for solving forward and inverse problems of (2 + 1)-dimensional nonlinear wave equations with rational solitons. arXiv 2021, arXiv:2112.14040.
  16. Long, Z.; Lu, Y.; Dong, B. PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network. J. Comput. Phys. 2019, 399, 108925.
  17. Long, Z.; Lu, Y.; Ma, X.; Dong, B. PDE-Net: Learning PDEs from data. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 3208–3216.
  18. Pakravan, S.; Mistani, P.; Aragon-Calvo, M.; Gibou, F. Solving inverse-PDE problems with physics-aware neural networks. J. Comput. Phys. 2021, 440, 110414.
  19. Daneker, M.; Zhang, Z.; Karniadakis, G.; Lu, L. Systems Biology: Identifiability analysis and parameter identification via systems-biology informed neural networks. arXiv 2022, arXiv:2202.01723.
  20. Both, G.; Choudhury, S.; Sens, P.; Kusters, R. DeepMoD: Deep learning for model discovery in noisy data. J. Comput. Phys. 2021, 428, 109985.
  21. Xu, H.; Chang, H.; Zhang, D. DL-PDE: Deep-learning based data-driven discovery of partial differential equations from discrete and noisy data. arXiv 2019, arXiv:1908.04463.
  22. Chen, Y.; Luo, Y.; Liu, Q.; Xu, H.; Zhang, D. Symbolic genetic algorithm for discovering open-form partial differential equations (SGA-PDE). Phys. Rev. Res. 2022, 4, 023174.
  23. Zhang, Z.; Liu, Y. Robust data-driven discovery of partial differential equations under uncertainties. arXiv 2021, arXiv:2102.06504.
  24. Bhowmick, S.; Nagarajaiah, S. Data-driven theory-guided learning of partial differential equations using simultaneous basis function approximation and parameter estimation (SNAPE). arXiv 2021, arXiv:2109.07471.
  25. Rudy, S.H.; Brunton, S.L.; Kutz, J.N. Data-driven discovery of partial differential equations. Sci. Adv. 2017, 3, e1602614.
  26. Rudy, S.; Alla, A.; Brunton, S.L.; Kutz, J.N. Data-driven identification of parametric partial differential equations. SIAM J. Appl. Dyn. Syst. 2019, 18, 643–660.
  27. Cai, J.; Dong, B.; Osher, S.; Shen, Z. Image restoration: Total variation, wavelet frames, and beyond. J. Amer. Math. Soc. 2012, 25, 1033–1089.
  28. Brunel, N.J-B. Parameter estimation of ODE's via nonparametric estimators. Electron. J. Statist. 2008, 2, 1242–1267.
  29. Bergstra, J.; Yamins, D.; Cox, D. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 115–123.
  30. Dunn, O. Multiple comparisons among means. J. Am. Stat. Assoc. 1961, 56, 52–64.
  31. Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 1979, 6, 65–70.
  32. Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 1995, 57, 289–300.
Figure 1. ROP1 and Ca²⁺ polarization dynamics. Left: ROP1 dynamics; Right: Ca²⁺ dynamics.
Figure 2. The scheme of one Δt.
Figure 3. The scheme of mPDE-Net.
Figure 4. Simulation results for true positive discovering with 5% noise. (a) F̂1. (b) F̂2.
Figure 5. Simulation results for false positive discovering with 5% noise. (a) F̂1. (b) F̂1. (c) F̂2. (d) F̂2.
Figure 6. Simulation results for F̂1(u, v) with 1% noise. (a) True positive discovering. (b) True positive discovering. (c) False positive discovering.
Figure 7. Simulation results for F̂2(u, v) with 1% noise. True positive discovering.
Figure 8. The first (second resp.) row shows the true dynamics of u (v resp.) at times t = 0.4, 0.6, 0.8, and 1.0. The third (fourth resp.) row shows the predicted dynamics of u (v resp.) with 5% noise level using Frac-PDE-Net.
Figure 9. Images of the predicted dynamics using PDE-Net 2.0 with 5% noise level.
Figure 10. The first row shows the true dynamics of (u, v) for (x, t) ∈ [−5π/2, 5π/2] × [0, 10]. The second row presents the predicted dynamics of (u, v) with 1% noise level by Frac-PDE-Net.
Figure 11. Images of the predicted dynamics of (u, v) for (x, t) ∈ [−5π/2, 5π/2] × [0, 10] using PDE-Net 2.0 with 1% noise level.
Table 1. Fixed parameters for Frac-PDE-Net.
Parameter    Value
t            [0, 0.15]
dt           0.01
x & y        [−5, 5]
dx & dy      10/64
N_Init       12
N_Time       15
N_Space      64
Table 2. Hyper-parameters selected for Frac-PDE-Net by the validation procedure in Section 2.3.
Parameter               Value
λ_M (5% noise level)    3.28 × 10⁻⁵
λ_S (5% noise level)    4.93 × 10⁻⁵
Table 3. PDE model discovery with 5% noise.
True F1*:  $0.3\Delta u + 1 - u - 0.5uv/(u^2 + 0.25u + 0.25)$
F̂1, mPDE-Net:  $0.305\Delta u + 0.992 - 0.988u - 0.510uv/(u^2 + 0.259u + 0.265) - 0.003v/(u^2 + 0.259u + 0.265) + 0.003v$
F̂1, s-mPDE-Net:  $0.305\Delta u + 1.00 - 0.981u - 0.532uv/(u^2 + 0.259u + 0.265)$
F̂1, rs-mPDE-Net (Frac-PDE-Net):  $0.304\Delta u + 0.975 - 0.982u - 0.501uv/(u^2 + 0.256u + 0.260)$
F̂1, PH-rs-mPDE-Net:  $0.278\Delta u + 0.969 - 0.993u - 0.514uv/(u^2 + 0.301u + 0.271)$
True F2*:  $0.4\Delta v + 0.4 - 0.2v - 0.5uv/(u^2 + 0.25u + 0.25)$
F̂2, mPDE-Net:  $0.398\Delta v + 0.412 - 0.195v - 0.510uv/(u^2 + 0.254u + 0.263) - 0.005v/(u^2 + 0.254u + 0.263) - 0.010u$
F̂2, s-mPDE-Net:  $0.398\Delta v + 0.424 - 0.199v - 0.542uv/(u^2 + 0.254u + 0.263)$
F̂2, rs-mPDE-Net (Frac-PDE-Net):  $0.400\Delta v + 0.385 - 0.202v - 0.490uv/(u^2 + 0.243u + 0.256)$
F̂2, PH-rs-mPDE-Net:  $0.344\Delta v + 2.116 - 0.815v$
Table 4. Hypothesis tests with 5% observation noise.
$H_0^{(i)}$ vs. $H_A^{(i)}$, $1 \le i \le 50$    Number of Rejections
Bonferroni                                       49
Holm                                             49
B-H                                              49
Table 5. Fixed parameters for Frac-PDE-Net.
Parameter    Value
t            [0, 0.75]
dt           0.05
x            [−2.5π, 2.5π]
dx           5π/200
N_Init       14
N_Time       15
N_Space      200
Table 6. Hyper-parameters selected for Frac-PDE-Net by the validation procedure as in Section 2.3.
Parameter               Value
λ_M (1% noise level)    1.88 × 10⁻⁷
λ_S (1% noise level)    1.62 × 10⁻⁶
Table 7. PDE model discovery with 1% noise level.
True F1*:  $0.1\,\partial_x^2 u + 3.6u^{1.5} - 3.6u - 0.229\,u^{1.5}\int_{-2.5\pi}^{2.5\pi} u\,dx + 0.081u/(v^2 + 0.0215)$
F̂1, mPDE-Net:  $0.118\,\partial_x^2 u + 3.959u^{1.361} - 3.871u - 0.223\,u^{1.361}\int_{-2.5\pi}^{2.5\pi} u\,dx + 0.0749u/((v + 0.005)^2 + 0.0211) - 0.0002uv/((v + 0.005)^2 + 0.0211) - 0.0029v$
F̂1, s-mPDE-Net:  $0.117\,\partial_x^2 u + 3.893u^{1.361} - 3.976u - 0.223\,u^{1.361}\int_{-2.5\pi}^{2.5\pi} u\,dx + 0.0750u/((v + 0.005)^2 + 0.0211)$
F̂1, rs-mPDE-Net (Frac-PDE-Net):  $0.0899\,\partial_x^2 u + 3.441u^{1.508} - 3.363u - 0.244\,u^{1.508}\int_{-2.5\pi}^{2.5\pi} u\,dx + 0.0714u/((v + 0.0002)^2 + 0.0209)$
F̂1, PH-rs-mPDE-Net:  $0.026\,\partial_x^2 u + 0.628u^{1.500} - 2.333u + 0.0393u/((v - 0.0479)^2 + 0.0154)$
True F2*:  $10.0\,\partial_x^2 v + u - 0.4v$
F̂2, mPDE-Net:  $9.388\,\partial_x^2 v + 0.963u - 0.400v$
F̂2, s-mPDE-Net:  $9.388\,\partial_x^2 v + 0.963u - 0.400v$
F̂2, rs-mPDE-Net (Frac-PDE-Net):  $9.588\,\partial_x^2 v + 0.969u - 0.403v$
F̂2, PH-rs-mPDE-Net:  $8.145\,\partial_x^2 v + 0.937u - 0.387v$
Table 8. PDE model discovered by PDE-Net 2.0.
Predicted Terms by PDE-Net 2.0 with 5% Noise
F̂1(u, v):  $0.0457\Delta u - 1.765u + 0.0938v + 0.0008$
F̂2(u, v):  $0.243\Delta v - 0.604u - 0.277v + 7 \times 10^{-5}$
Table 9. Errors of predicted solutions for u and v by Frac-PDE-Net and PDE-Net 2.0.
|ũ − u| (5% noise)       t = 0.4     t = 0.6     t = 0.8     t = 1
L∞, Frac-PDE-Net         0.007254    0.010806    0.014310    0.017765
L∞, PDE-Net 2.0          0.227365    0.331438    0.429602    0.522157
L2, Frac-PDE-Net         0.000106    0.000158    0.000209    0.000260
L2, PDE-Net 2.0          0.002720    0.003986    0.005192    0.006341
|ṽ − v| (5% noise)       t = 0.4     t = 0.6     t = 0.8     t = 1
L∞, Frac-PDE-Net         0.001503    0.002247    0.002988    0.003725
L∞, PDE-Net 2.0          0.200241    0.293939    0.383577    0.469314
L2, Frac-PDE-Net         0.000022    0.000033    0.000044    0.000054
L2, PDE-Net 2.0          0.001989    0.002930    0.003836    0.004708
Table 10. PDE model discovered by PDE-Net 2.0.
Predicted Terms by PDE-Net 2.0 with 1% Noise
F̂1(u, v):  $0.0001\,\partial_x^2 u - 3.95 \times 10^{-5}\,u - 6.05 \times 10^{-5}\,v - 0.0002$
F̂2(u, v):  $5.22 \times 10^{-5}\,\partial_x^2 v + 1.70 \times 10^{-5}\,u + 8.19 \times 10^{-6}\,v + 4.59 \times 10^{-5}$
Table 11. Errors of predicted solutions for u and v by Frac-PDE-Net and PDE-Net 2.0.
                              Frac-PDE-Net    PDE-Net 2.0
|ũ − u|, L∞ (1% noise)        0.062771        0.117773
|ũ − u|, L2 (1% noise)        0.000029        0.000060
|ṽ − v|, L∞ (1% noise)        0.009434        0.039400
|ṽ − v|, L2 (1% noise)        0.000010        0.000056
(Errors are computed over t ∈ [0, 10].)
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
