Article

Learning Interactions in Reaction Diffusion Equations by Neural Networks

1 Department of Statistics, University of California, Riverside, CA 92521, USA
2 ENSIIE & Laboratoire de Mathématiques et Modélisation d’Evry, Université Paris Saclay, 91025 Evry, France
3 Quantmetry, 75008 Paris, France
4 Department of Mathematics, University of California, Riverside, CA 92521, USA
* Author to whom correspondence should be addressed.
Entropy 2023, 25(3), 489; https://doi.org/10.3390/e25030489
Submission received: 6 February 2023 / Revised: 7 March 2023 / Accepted: 8 March 2023 / Published: 11 March 2023
(This article belongs to the Special Issue Physics-Based Machine and Deep Learning for PDE Models)

Abstract

Partial differential equations are common models in biology for predicting and explaining complex behaviors. Nevertheless, deriving the equations and estimating the corresponding parameters from data remains challenging. In particular, a fine description of the interactions between species requires care in taking into account various regimes, such as saturation effects. We apply a method based on neural networks to discover, from observed data, the underlying PDE systems, which may involve fractional terms and may also contain integration terms. Our proposed framework, called Frac-PDE-Net, adapts PDE-Net 2.0 by adding layers that are designed to learn fractional and integration terms. The key technical challenge of this task is the identifiability issue. More precisely, one needs to identify the main terms and combine similar terms among a huge number of candidates in fractional form generated by the neural network scheme due to the division operation. In order to overcome this barrier, we set up certain assumptions according to realistic biological behavior. Additionally, we use an L²-norm-based term selection criterion and sparse regression to obtain a parsimonious model. It turns out that the Frac-PDE-Net method is capable of recovering the main terms with accurate coefficients, allowing for effective long-term prediction. We demonstrate the interest of the method on a biological PDE model proposed to study the pollen tube growth problem.

1. Introduction

Two-component reaction–diffusion systems often model the interaction of two chemicals, leading to the formation of non-uniform spatial patterns of chemical concentration or morphogenesis under certain conditions due to chemical reactions and spreading. Since Turing’s groundbreaking work [1], reaction–diffusion systems have been extensively used in developmental biology modeling. For example, let u = u ( x , y , t ) and v = v ( x , y , t ) represent the concentration of two chemical species, which may either enhance or suppress each other depending on the context. The system of u and v can be modeled as follows:
\partial_t u = d_0 \Delta u + N_1(u, v), \qquad \partial_t v = d_1 \Delta v + N_2(u, v),
where Δ = ∂²_x + ∂²_y denotes the Laplacian operator, and N_1 and N_2 are the interactions between u and v. The functions N_1 and N_2 are sums of various reaction terms that can be derived from physical or chemical principles, such as mass-action laws, Michaelis–Menten kinetics, or products that represent competition or cooperation effects. We refer the readers to ([2], Section 2.2) for more discussion. Hence, N_1 and N_2 are sums of meaningful functions that represent specific mechanisms: if we are able to identify these terms and discover the explicit formulas for N_1 and N_2, then we can learn more about the nature of the interactions and predict future behaviors well. This situation arises commonly in biological applications such as chemotaxis, pattern formation in developmental biology, and the cell polarity phenomenon [3,4].
Cell polarity plays a vital role in cell growth and function for many cell types, affecting cell migration, proliferation, and differentiation. A classic example of polar growth is pollen tube growth, which is controlled by the Rho GTPase (ROP1) molecular switch. Recent studies have revealed that the localization of active ROP1 is regulated by both positive and negative feedback loops, and that calcium ions play a role in ROP1’s negative feedback mechanism. Initially, ROP1 is inside the membrane. During positive feedback (rate k_{pf}), some of the ROP1 enters the membrane. At the same time, negative feedback (rate k_{nf}) causes some of it to return inside the membrane, while the rest diffuses on the membrane (rate D_r). Calcium ions follow a similar process with positive rate k_{ac}, negative rate k_{dc}, and diffusion rate D_c. In [5,6], the following two-component reaction–diffusion system (2) is introduced:
\begin{aligned}
R_t &= k_{pf}\, R^{\alpha}\Big(R_{tot} - \int_{-L}^{L} R(x,t)\,dx\Big) - k_{nf}\, g(C)\, R + D_r R_{xx},\\
C_t &= k_{ac} R - k_{dc} C + D_c C_{xx},\\
R_x(-L,t) &= R_x(L,t) = 0, \qquad C_x(-L,t) = C_x(L,t) = 0,\\
R(x,0) &= R_0(x), \qquad C(x,0) = C_0(x),
\end{aligned}
with suitable initial and boundary conditions, proposed to quantitatively describe the spatial and temporal coupling between ROP1 and calcium ions, which leads to rapid oscillations in their distributions on the cell membrane. Here, R = R(x,t), C = C(x,t), and R_t, C_t, R_x, R_{xx}, C_x and C_{xx} are abbreviated notations for partial derivatives with respect to the time t or to the spatial variable x. Moreover, the nonlinear function g(C) characterizes how calcium ions play a role in ROP1’s negative feedback loop. Specifically, active ROP1 causes an increase in Ca²⁺ levels, leading to a reduction in ROP1 activity and a decrease in its levels; meanwhile, the influx of Ca²⁺ slows down as ROP1 drops. Ref. [6] proposed the form g(C) = C²/(C² + k_c²) to describe such spatial–temporal patterns of calcium, where k_c is a positive constant. Based on this model, Ref. [6] developed a modified gradient matching procedure for parameter estimation, including k_{nf} and k_c. However, it requires that g(C) in (2) be a known function. In this work, we propose to apply neural network methods to uncover the function g(C) or, more broadly, to learn the interaction terms N_1 and N_2 in general reaction–diffusion PDEs (1), which may contain fractional expressions (Figure 1).
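To make the forward model concrete, the sketch below integrates a system of the form (2) with an explicit Euler step in time and central differences in space. All parameter values, the grid, and the initial profiles are hypothetical placeholders chosen only for illustration; they are not the values used in the experiments later in this paper.

import numpy as np

# Hypothetical parameters, for illustration only.
L, N, dt, steps = 2.5 * np.pi, 200, 1e-3, 2000
k_pf, k_nf, k_ac, k_dc = 1.0, 1.0, 1.0, 1.0
D_r, D_c, alpha, R_tot, k_c = 0.1, 1.0, 1.5, 10.0, 0.15

x = np.linspace(-L, L, N)
dx = x[1] - x[0]
R = 0.03 + 0.01 * np.sin(3 * x)   # arbitrary smooth, positive initial data
C = 0.06 + 0.01 * np.sin(3 * x)

def laplacian_neumann(u, dx):
    # second-order central differences with zero-flux (Neumann) boundaries
    lap = np.empty_like(u)
    lap[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
    lap[0] = 2 * (u[1] - u[0]) / dx**2       # ghost point u[-1] = u[1]
    lap[-1] = 2 * (u[-2] - u[-1]) / dx**2
    return lap

def g(C):
    return C**2 / (C**2 + k_c**2)            # calcium-mediated negative feedback

for _ in range(steps):
    mass = R.sum() * dx                       # Riemann-sum approximation of the integral of R
    R_new = R + dt * (k_pf * R**alpha * (R_tot - mass)
                      - k_nf * g(C) * R + D_r * laplacian_neumann(R, dx))
    C_new = C + dt * (k_ac * R - k_dc * C + D_c * laplacian_neumann(C, dx))
    R, C = R_new, C_new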
In the past decade, the artificial intelligence community has focused increasingly on neural networks, which have become crucial in many applications, especially for PDEs. Deep learning-based approaches to PDEs have made substantial progress and are well studied, both for forward and inverse problems. For forward problems with appropriate initial and boundary conditions in various domains, several methods have been developed to accurately predict dynamics (e.g., [7,8,9,10,11,12,13,14,15,16,17]). For inverse problems, there are two classes of approaches. The first class focuses on inferring coefficients from known data (e.g., [7,10,12,15,18,19]). An example of this is the widely known PINN (Physics-Informed Neural Networks) method [10], which uses PDEs in the loss function of neural networks to incorporate scientific knowledge. Ref. [7] improved the efficiency of PINNs with the residual-based adaptive refinement (RAR) method and created a library of open-source codes for solving various PDEs, including those with complex geometry. However, this method is only capable of estimating coefficients for fixed known terms in PDEs, and may not work well for discovering hidden PDE models. Although [9] extended the PINN method to find unknown dynamic systems, the nonlinear learner function remains a black box and no explicit expressions of the discovered terms in the predicted PDE are available, making it difficult to interpret their physical meaning. The second class of approaches not only estimates coefficients, but also discovers hidden terms (e.g., [16,17,20,21,22,23,24,25,26]). An example is the PDE-Net method [16], which combines numerical approximations of convolutional differential operators with symbolic neural networks. PDE-Net can learn differential operators through convolution kernels, a natural method for solving PDEs that has been well studied in [27]. This approach is capable of recovering terms in PDE models with explicit expressions and relatively accurate coefficients, but often produces many noisy terms that lack interpretation. In order to produce parsimonious models, Refs. [25,26] proposed to create a regression model with the response variable ∂_t u and a matrix Θ containing a collection of spatial and polynomial derivative functions (e.g., u, u_x, u u_x): ∂_t u = Θ ξ. The estimation of differential equations by modeling the time variations of the solution is known to produce consistent estimates [28]. In addition, ridge regression with hard thresholding can be used to approximate the coefficient vector ξ. This sparse regression-based method generally results in a PDE model with accurately predicted terms and high-accuracy coefficients. However, few existing studies have focused on effectively recovering interaction terms in fractional form (i.e., one polynomial divided by another) in hidden partial differential equations, which is the focus of this paper.
Previous methods for identifying the hidden terms in reaction–diffusion partial differential equation models have mostly focused on polynomial forms. However, as indicated in Equation (2), the model for ROP1 and calcium ion distribution also involves fractional and integral forms, which can pose identifiability issues when combined with polynomial forms. Furthermore, we want to attain a parsimonious model, as the interpretability of the PDE model is important for biologists to comprehend biological behavior and phenomena revealed by the model.
In this paper, we utilize a combination of a modified PDE-Net method (which adds fractional and integration terms to the original PDE-Net approach), an L²-norm-based term selection criterion, and an appropriate sparse regression. This combination proves to produce meaningful and stable terms with accurate estimation of coefficients. For ease of reference, we call this combination Frac-PDE-Net.
The paper is organized as follows. In Section 2, we explain the main idea and the framework of our proposed method Frac-PDE-Net. In Section 3, we apply Frac-PDE-Net to discover some biological PDE models based on simulation data. Then, in Section 4, we make some predictions to test the effectiveness of the models learned in Section 3. Finally, we summarize our findings and present some possible future works in Section 5.

2. Methodology

The main idea of the PDE-Net method, as described in [16], is to use a deep convolutional neural network (CNN) to study generic nonlinear evolution partial differential equations (PDEs) as shown below:
\partial_t u = F(z, u, \nabla u, \nabla^2 u, \ldots), \qquad z \in \Omega,\ t \in [0, T],
where u = u(z, t) is a function (scalar valued or vector valued) of the space variable z and the temporal variable t. Its architecture is a feed-forward network that combines the forward Euler method in time with the second-order finite difference method in space through the implementation of special filters in the CNN that imitate differential operators. The network is trained to approximate the solution to the above PDEs and is then used to make predictions for the subsequent time steps. The authors of [16] show that this approach is effective for solving a range of PDEs and can achieve satisfactory accuracy and computational efficiency compared to traditional numerical methods. In this paper, we follow a framework similar to PDE-Net, but with modifications of the symbolic network (SymNet^k_m) to better align with biological models.

2.1. PDE-Net Review

The feed-forward network consists of several Δ t -blocks, all of which use the same parameters optimized through minimizing a loss function. For simplicity, we will only show one Δ t -block for two-dimensional PDEs, as repeating it generates multiple Δ t -blocks, and the concept can easily be extended to higher-dimensional PDEs.
Denote the space variable z in (3) by z = (x, y), since we are dealing with the two-dimensional case. Let t_0 = 0 and let ũ(·, t_0) be the given initial data. For i ≥ 0, ũ(·, t_{i+1}) denotes the predicted value of u at time t_{i+1}, calculated from the predicted (or true) value of ũ at time t_i using the following procedure:
\tilde{u}(\cdot, t_{i+1}) = \tilde{u}(\cdot, t_i) + (\Delta t)\,\mathrm{SymNet}\big(x, y, D_{00}u, D_{10}u, D_{01}u, D_{20}u, \ldots\big),
where SymNet is an approximation operator of F. Here, the operators D_{ij} are convolution operators with underlying filters q_{ij}, i.e., D_{ij} u := \frac{1}{(\Delta x)^i (\Delta y)^j}\, q_{ij} \circledast u. These operators approximate differential operators:
D_{ij} u \approx \frac{\partial^{i+j} u}{\partial x^i\, \partial y^j}.
For a general N \times N filter q = q[k_1, k_2], where -\frac{N-1}{2} \le k_1, k_2 \le \frac{N-1}{2},
q \circledast u(x, y) := \sum_{k_1, k_2} q[k_1, k_2]\, u(x + k_1 \Delta x,\, y + k_2 \Delta y).
By Taylor expansion,
q \circledast u(x, y) = \sum_{i,j=0}^{N-1} m_{ij}\, (\Delta x)^i (\Delta y)^j \left.\frac{\partial^{i+j} u}{\partial x^i\, \partial y^j}\right|_{(x,y)} + O\big(|\Delta x|^N\big) + O\big(|\Delta y|^N\big),
where
m_{ij} := \frac{1}{i!\, j!} \sum_{k_1, k_2} k_1^i\, k_2^j\, q[k_1, k_2], \qquad 0 \le i, j \le N-1.
In particular, if we choose \Delta x = \Delta y = \delta, then
q \circledast u(x, y) = \sum_{i,j=0}^{N-1} m_{ij}\, \delta^{i+j} \left.\frac{\partial^{i+j} u}{\partial x^i\, \partial y^j}\right|_{(x,y)} + O\big(\delta^N\big).
As a result, the training of q can be performed through the training of M := (m_{ij}), since the moment matrix M = M(q). It is important to note that the trainable filters M (or q) must be carefully constrained to match differential operators.
For example, to approximate u_x by D_{10}u, or equivalently by \frac{1}{\Delta x}\, q_{10} \circledast u for a 3 \times 3 filter q_{10}, we may choose
M_1(q_{10}) = \begin{pmatrix} 0 & 0 & * \\ 1 & * & * \\ * & * & * \end{pmatrix}
\quad \text{or} \quad
M_2(q_{10}) = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & * \\ 0 & * & * \end{pmatrix},
where ∗ means no constraint on the corresponding entry. Generally, the fewer instances of ∗ present, the more restrictions are imposed, leading to increased accuracy. In this example of (6), the choice of M 1 ensures the 1st order accuracy and the choice of M 2 guarantees the 2nd order accuracy. More precisely, if we plug M 1 into (5) with Δ x = Δ y = δ , then
q_{10} \circledast u(x, y) = \delta\, u_x + O\big(\delta^2\big),
which implies \frac{1}{\Delta x}\, q_{10} \circledast u(x, y) = u_x + O(\Delta x). Similarly, if we plug M_2 into (5), then \frac{1}{\Delta x}\, q_{10} \circledast u(x, y) = u_x + O\big((\Delta x)^2\big). In PDE-Net 2.0, all moment matrices are trained subject to partial constraints so that the accuracy is at least of second order.
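To illustrate how a filter is tied to the derivative it approximates, the following NumPy sketch computes the moment matrix M(q), with entries m_{ij} as defined above, for the standard central-difference stencil of u_x. The indexing convention (array index 0 corresponding to offset −1) is our own assumption for the illustration.

import numpy as np
from math import factorial

def moment_matrix(q):
    # M(q) with entries m_ij = (1/(i! j!)) * sum_{k1,k2} k1^i k2^j q[k1, k2]
    N = q.shape[0]
    r = (N - 1) // 2
    k = np.arange(-r, r + 1)
    M = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            M[i, j] = (k[:, None] ** i * k[None, :] ** j * q).sum() / (factorial(i) * factorial(j))
    return M

# Central-difference filter for u_x, applied as (1/dx) * (q10 convolved with u).
q10 = np.zeros((3, 3))
q10[2, 1], q10[0, 1] = 0.5, -0.5   # array row 0 corresponds to offset k1 = -1, row 2 to k1 = +1
print(moment_matrix(q10))          # m_00 = m_01 = 0, m_10 = 1, and all second-order moments vanish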
The SymNet^k_m network, modeled after CNNs, is employed to approximate the multivariate nonlinear response function F. It takes an m-dimensional vector as input and consists of k layers. As depicted in Figure 2, the SymNet^2_m network has two hidden layers, where each f_i unit performs a dyadic multiplication and the output is added to the (i+1)th hidden layer.
The loss function for this method has three components and is defined as follows:
L = L_{data} + \lambda_M L_{moment} + \lambda_S L_{SymNet}.
Here, L_{data} measures the difference between the true data and the prediction. Consider the data set \{u_j(\cdot, t_i) \in \mathbb{R}^{N_s \times N_s} : 1 \le i \le n,\ 1 \le j \le N\}, where n is the number of \Delta t-blocks, N is the total number of samples, and N_s is the number of space grids. The index j indicates the jth solution path with a certain initial condition of the unknown dynamics, and the index i represents the solution at time t_i. Then, we define
L_{data} = \frac{1}{nN(\Delta t)^2} \sum_{i=1}^{n} \sum_{j=1}^{N} \ell_{ij}.
Here, \ell_{ij} := \|u_j(t_i, \cdot) - \tilde{u}_j(t_i, \cdot)\|_2^2, where u_j represents the real data and \tilde{u}_j denotes the predicted data. For a given threshold s, recall the Huber loss function \ell_1^{(s)} defined as
\ell_1^{(s)}(x) = \begin{cases} |x| - \dfrac{s}{2}, & \text{if } |x| > s, \\[4pt] \dfrac{x^2}{2s}, & \text{if } |x| \le s. \end{cases}
We then define the following:
L_{moment} = \sum_{i,j} \sum_{i_1, j_1} \ell_1^{(s)}\big(M(q_{ij})[i_1, j_1]\big),
where the q_{ij} are filters and M(q_{ij}) is the moment matrix of q_{ij}. Using the same Huber loss function as in (8), we define
L_{SymNet} = \sum_{i,j} \ell_1^{(s)}(w_{ij}),
where the w_{ij} are the parameters in SymNet. The coefficients λ_M and λ_S in Equation (7) serve as regularization weights that help control the magnitude of the parameters, preventing them from becoming too large and overfitting the training data.
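As a minimal illustration of how the three components of (7) can be assembled, the PyTorch sketch below implements the Huber-type penalty ℓ₁^(s) of (8) and a simplified composite loss. The mean-squared data term is a simplified stand-in for L_data, and all argument names are our own.

import torch

def huber_l1(x, s):
    # l1^(s) of (8): |x| - s/2 when |x| > s, x^2 / (2s) otherwise
    a = x.abs()
    return torch.where(a > s, a - s / 2, x ** 2 / (2 * s))

def total_loss(pred, target, moment_entries, symnet_weights, dt, lam_M, lam_S, s=0.01):
    # L = L_data + lam_M * L_moment + lam_S * L_SymNet, cf. (7);
    # L_data is approximated here by a mean-squared error scaled by 1/dt^2
    l_data = ((pred - target) ** 2).mean() / dt ** 2
    l_moment = sum(huber_l1(m, s).sum() for m in moment_entries)
    l_symnet = sum(huber_l1(w, s).sum() for w in symnet_weights)
    return l_data + lam_M * l_moment + lam_S * l_symnet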

2.2. mPDE-Net (Modified PDE-Net)

In mPDE-Net, we do not include multiplications between derivatives of u and v, as such interactions are not commonly present in biological phenomena. Additionally, to handle interactions in fractional or integral forms, such as those in Equation (2), mPDE-Net incorporates integral terms and division operations into SymNet^k_m. However, this raises an identifiability challenge. For instance, consider a two-component input (u, v). mPDE-Net may produce results such as u²/(u + ε) or uv/(v + ε), where ε is a small number due to noise. Although both of these terms essentially represent the same term u, mPDE-Net is unable to identify them as such automatically. Keeping all similar terms, such as u²/(u + ε), uv/(v + ε) and u, at the same time would result in a complex model, and the real fractional term would not be effectively trained.
To address the identifiability issue, restrictions were imposed on the nonlinear interaction term N(u, v) by assuming that N(u, v) = g(u)h(v), where either g or h is linear and the other one may contain a fractional term with the order of the denominator larger than that of the numerator. For instance, the terms u²/(u + ε) and uv/(v + ε) are further decomposed as follows:
\frac{u^2}{u + \epsilon} = u - \epsilon + \frac{\epsilon^2}{u + \epsilon}, \qquad \frac{uv}{v + \epsilon} = u - \frac{u\epsilon}{v + \epsilon}.
As seen, the main part of both terms is u, while the remainders, namely -\epsilon, \epsilon^2/(u + \epsilon) and -u\epsilon/(v + \epsilon), are considered perturbations since ε is very small. This allows mPDE-Net to identify and combine the main parts of terms, resulting in a compact model.
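One way to mechanize the decomposition in (9) is a partial-fraction expansion; the SymPy sketch below reproduces the two identities above. This is a post-processing illustration on our part, not a component of the trained network.

import sympy as sp

u, v, eps = sp.symbols('u v epsilon', positive=True)

# Partial-fraction decomposition exposes the dominant polynomial part of each candidate term.
print(sp.apart(u**2 / (u + eps), u))    # -> u - epsilon + epsilon**2/(u + epsilon)
print(sp.apart(u * v / (v + eps), v))   # -> u - epsilon*u/(v + epsilon)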
Figure 3 presents an example of a system involving the derivatives of u and v up to the second order. The symbolic neural network in this example has five hidden layers and is referred to as SymNet^5_{10}. The operators f_i are multiplication units, i.e., f_i(η_i, ξ_i) = η_i ξ_i for i = 1, 4, 5, and the f_j are division units, i.e., f_j(η_j, ξ_j) = η_j/ξ_j for j = 2, 3. Additionally, a term u^α is included to incorporate fractional powers, such as the term R^α in (2). The algorithm corresponding to this example is outlined in Algorithm 1, where L_1 = (u, u_x, u_{xx}, v, v_x, v_{xx}, u², v², u^α, I)^T, L_2 = (L_1^T, f_1)^T, L_3 = (L_2^T, f_2)^T, L_4 = (L_3^T, f_3)^T, L_5 = (L_4^T, f_4)^T, and L_6 = (L_5^T, f_5)^T.
Algorithm 1 Scheme of mPDE-Net.
Input: u, u_x, u_{xx}, v, v_x, v_{xx}, u², v², u^α, I, where I represents ∫ u(x, t) dx,
(η_1, ξ_1)^T = W^{(1)} L_1 + b^{(1)},     W^{(1)} ∈ ℝ^{2×10}, L_1 ∈ ℝ^{10}, b^{(1)} ∈ ℝ²,
(η_2, ξ_2)^T = W^{(2)} L_2 + b^{(2)},     W^{(2)} ∈ ℝ^{2×11}, L_2 ∈ ℝ^{11}, b^{(2)} ∈ ℝ²,
(η_3, ξ_3)^T = W^{(3)} L_3 + b^{(3)},     W^{(3)} ∈ ℝ^{2×12}, L_3 ∈ ℝ^{12}, b^{(3)} ∈ ℝ²,
(η_4, ξ_4)^T = W^{(4)} L_4 + b^{(4)},     W^{(4)} ∈ ℝ^{2×13}, L_4 ∈ ℝ^{13}, b^{(4)} ∈ ℝ²,
(η_5, ξ_5)^T = W^{(5)} L_5 + b^{(5)},     W^{(5)} ∈ ℝ^{2×14}, L_5 ∈ ℝ^{14}, b^{(5)} ∈ ℝ²,
Output: F = W^{(6)} L_6 + b^{(6)},     W^{(6)} ∈ ℝ^{1×15}, L_6 ∈ ℝ^{15}, b^{(6)} ∈ ℝ.
To further demonstrate the mPDE-Net approach, we present a concrete example. To simplify the notation, we introduce the row vector e i with a 1 in the ith component and 0 in all other components, i.e.,
e_i = (0, 0, \ldots, 0, 1, 0, \ldots, 0),
where the number “1” is on the ith position. Then, we set
W^{(1)} = \begin{pmatrix} e_1 + e_4 \\ e_1 + e_4 \end{pmatrix}, \quad
W^{(2)} = \begin{pmatrix} e_4 \\ 4 e_4 + e_8 \end{pmatrix}, \quad
W^{(3)} = \begin{pmatrix} 0.5\, e_1 \\ 0.2\, e_1 + e_7 \end{pmatrix},
W^{(4)} = \begin{pmatrix} 0.2\, e_1 \\ e_{12} \end{pmatrix}, \quad
W^{(5)} = \begin{pmatrix} 0.2\, e_4 \\ e_{13} \end{pmatrix}, \quad
W^{(6)} = 0.1\, e_1 + 0.3\, e_3 + 6\, e_4 + e_{11} + 2\, e_{14} + 3\, e_{15},
b^{(1)} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \;
b^{(2)} = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}, \;
b^{(3)} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \;
b^{(4)} = b^{(5)} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \;
b^{(6)} = 0.
According to Algorithm 1, for 1 \le i \le 5,
W^{(1)} L_1 + b^{(1)} = \begin{pmatrix} u + v + 1 \\ u + v \end{pmatrix}, \qquad f_1 = f_1(\eta_1, \xi_1) = \eta_1 \xi_1 = (u + v + 1)(u + v),
W^{(2)} L_2 + b^{(2)} = \begin{pmatrix} v + 0.5 \\ 4v + v^2 + 0.5 \end{pmatrix}, \qquad f_2 = f_2(\eta_2, \xi_2) = \frac{\eta_2}{\xi_2} = \frac{v + 0.5}{v^2 + 4v + 0.5},
W^{(3)} L_3 + b^{(3)} = \begin{pmatrix} 0.5u + 1 \\ 0.2u + u^2 \end{pmatrix}, \qquad f_3 = f_3(\eta_3, \xi_3) = \frac{\eta_3}{\xi_3} = \frac{0.5u + 1}{u^2 + 0.2u},
W^{(4)} L_4 + b^{(4)} = \begin{pmatrix} 0.2u \\ f_2 \end{pmatrix}, \qquad f_4 = f_4(\eta_4, \xi_4) = \eta_4 \xi_4 = 0.2u\, f_2 = \frac{0.2u\,(v + 0.5)}{v^2 + 4v + 0.5},
W^{(5)} L_5 + b^{(5)} = \begin{pmatrix} 0.2v \\ f_3 \end{pmatrix}, \qquad f_5 = f_5(\eta_5, \xi_5) = \eta_5 \xi_5 = 0.2v\, f_3 = \frac{0.2v\,(0.5u + 1)}{u^2 + 0.2u}.
Therefore,
\mathrm{SymNet}^5_{10}(u, v) = W^{(6)} L_6 + b^{(6)} = 0.1u + 0.3u_{xx} + 6v + f_1 + 2 f_4 + 3 f_5 = 0.3u_{xx} + u^2 + 2uv + v^2 + 1.1u + 7v + \frac{0.4u\,(v + 0.5)}{v^2 + 4v + 0.5} + \frac{0.6v\,(0.5u + 1)}{u^2 + 0.2u}.
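The short SymPy check below re-computes the hidden-unit outputs of this worked example and expands the final expression; it only verifies the arithmetic above and is not an implementation of SymNet itself.

import sympy as sp

u, v, uxx = sp.symbols('u v u_xx')
f1 = (u + v + 1) * (u + v)                  # multiplication unit
f2 = (v + 0.5) / (v**2 + 4*v + 0.5)         # division unit
f3 = (0.5*u + 1) / (u**2 + 0.2*u)           # division unit
f4 = 0.2*u * f2                             # multiplication unit
f5 = 0.2*v * f3                             # multiplication unit
out = 0.1*u + 0.3*uxx + 6*v + f1 + 2*f4 + 3*f5   # output layer W^(6) L_6 + b^(6)
print(sp.expand(out))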
Let \mathcal{L} denote the library for PDE-Net 2.0 and \mathcal{L}_f denote the library for mPDE-Net. It is clear that \mathcal{L} and \mathcal{L}_f are distinct. Typically, \mathcal{L} only seeks to identify multiplication terms and has the form
\mathcal{L} = \big\{ \lambda (U_{xx} + U_{yy}) + f_1(U) : \lambda \in \mathbb{R},\ U = (u, v),\ f_1 \in \mathcal{P} \big\},
where
\mathcal{P} := \{ \text{polynomials of } U \text{ up to a certain degree} \}.
Conversely, \mathcal{L}_f is engineered to learn both multiplication terms and fractional terms, subject to certain constraints. In our paper, we make the choice
\mathcal{L}_f = \Big\{ \lambda (U_{xx} + U_{yy}) + f_1(U) + \frac{f_2(u)}{f_3(u)}\, f_4(v) + f_5(u)\, \frac{f_6(v)}{f_7(v)} : \lambda \in \mathbb{R},\ U = (u, v),\ \{f_i\}_{i=1}^{7} \subset \mathcal{P},\ \deg f_2 < \deg f_3,\ \deg f_6 < \deg f_7 \Big\},
which is much larger than \mathcal{L}. Therefore, our framework of neural networks, built upon \mathcal{L}_f, is more challenging to implement than the original framework, which is based on \mathcal{L}.

2.3. Optimizing Hyperparameters

In this section, we explain the process of tuning the hyperparameters λ_M and λ_S in the loss function (7). Firstly, the ranges of the spatial and temporal variables in the training set are defined as [−L, L] and [0, T], respectively. Then, using the finite difference method, we generate a dataset that acts as the “true data”. Additionally, we consider M initial conditions. The observational time grid is determined by the ratio dt/d̃t, where d̃t is the time step size for computing the “true data” and dt represents the time step size for selecting the “observational data”. Typically, d̃t is chosen to be much smaller than dt. The solution corresponding to the mth initial condition is denoted u_m(·, ·), where the first “·” refers to the spatial variable and the second “·” to the temporal variable. If the solution is evaluated at the kth time step, it is written as u_m(·, t_k), with “·” representing the spatial variable.
The M initial values from M initial conditions are divided into three separate groups, resulting in M = M 1 + M 2 + M 3 , where M 1 , M 2 , and  M 3 represent the sizes of the training set, validation set, and test set, respectively. The solutions produced by these initial values are designated as follows:
Training set: u_1(·, ·), …, u_{M_1}(·, ·);
Validation set: u_{M_1+1}(·, ·), …, u_{M_1+M_2}(·, ·);
Testing set: u_{M_1+M_2+1}(·, ·), …, u_{M_1+M_2+M_3}(·, ·).
We use the training set to train our models, the validation set to find the best parameters, and the testing set to evaluate the performance of the trained models.
Assume we divide the time range [0, T] into K blocks, with cutting points denoted t_k for 1 ≤ k ≤ K. Then, for any 1 ≤ m ≤ M and any 1 ≤ k ≤ K, we define
\ell_k^m = \| u_m(\cdot, t_k) - \tilde{u}_m(\cdot, t_k) \|_2^2,
where \|\cdot\|_2 denotes the L² norm with respect to the space variable on [−L, L], u_m is the “true solution”, and ũ_m is the “predicted solution” produced by a neural network. Based on this, the training loss, validation loss and testing loss are defined as follows:
  • Training loss:
    L_{train} := \frac{1}{M_1 K (dt)^2} \sum_{k=1}^{K} \sum_{m=1}^{M_1} \ell_k^m.
  • Validation loss:
    L_{valid} := \frac{1}{M_2 K (dt)^2} \sum_{k=1}^{K} \sum_{m=M_1+1}^{M_1+M_2} \ell_k^m.
  • Testing loss:
    L_{test} := \frac{1}{M_3 K (dt)^2} \sum_{k=1}^{K} \sum_{m=M_1+M_2+1}^{M} \ell_k^m.
We choose the hyperparameters λ_M and λ_S in the loss function (7) using the validation sets. Let B_m^k = u_m(·, t_k) and B_j^k = u_j(·, t_k), where 1 ≤ m ≤ M_1, M_1 + 1 ≤ j ≤ M_1 + M_2 and 1 ≤ k ≤ K. We denote the number of incremental training stages by N_t. We then gradually increase the number of time points in the training and validation sets. For instance, if K = 15 and N_t = 5, the training and validation sets can be selected as follows. The performance metric is the same as the validation loss in (10).
Training                          Validation                        Validation Loss
B_m^1, …, B_m^3                   B_j^1, …, B_j^3                   L_valid^(1)
B_m^1, …, B_m^6                   B_j^1, …, B_j^6                   L_valid^(2)
B_m^1, …, B_m^9                   B_j^1, …, B_j^9                   L_valid^(3)
B_m^1, …, B_m^12                  B_j^1, …, B_j^12                  L_valid^(4)
B_m^1, …, B_m^15                  B_j^1, …, B_j^15                  L_valid^(5)
Furthermore, we tune the hyperparameters using Hyperopt [29], which uses Bayesian optimization to explore the hyperparameter space more efficiently than a brute-force grid search. Specifically, the mPDE-Net is nested in the objective function of Hyperopt, which optimizes the average validation loss L_{avl} of the models,
L_{avl} = \frac{1}{5} \sum_{i=1}^{5} L_{valid}^{(i)}.
The selection procedure is described in Algorithm 2.
Algorithm 2 Optimizing Hyperparameters using Hyperopt
1: Initialize the search spaces for λ_M and λ_S;
2: Define the objective function (to be optimized) as the average validation loss obtained from mPDE-Net, implemented using PyTorch;
3: Set the optimization algorithm, specify the number of trials, and initialize the results list;
4: for i = 1 to number of trials do
5:    Sample a set of hyperparameters from the search spaces, evaluate the objective function with the sampled hyperparameters, and initialize a list of validation losses;
6:    for r = 1 to N_t do
7:        Train the mPDE-Net model on B_m^1, …, B_m^{rK/N_t}, validate it on B_j^1, …, B_j^{rK/N_t} to obtain a validation loss, and append it to the list of validation losses;
8:    end for
9:    Compute the average validation loss from the list, append the hyperparameters and the average validation loss to the results list, and then update the search space based on the results so far;
10: end for
11: return the hyperparameters with the minimum objective function value.
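A compact way to realize this procedure with Hyperopt is sketched below. Here train_and_validate is a hypothetical stand-in for training mPDE-Net on the first blocks and returning its validation loss, and the search-space bounds are illustrative choices, not the ones used in the paper.

import numpy as np
from hyperopt import fmin, tpe, hp, Trials

N_t = 5  # number of incremental training stages, as in the example above

def train_and_validate(lam_M, lam_S, r):
    # Hypothetical stand-in: train mPDE-Net with (lam_M, lam_S) on the first r*K/N_t
    # time blocks and return its validation loss; replace with the real training routine.
    return (np.log10(lam_M) + 6) ** 2 + (np.log10(lam_S) + 4) ** 2 + 0.1 * r

def objective(params):
    losses = [train_and_validate(params['lam_M'], params['lam_S'], r)
              for r in range(1, N_t + 1)]
    return float(np.mean(losses))            # average validation loss L_avl

space = {'lam_M': hp.loguniform('lam_M', np.log(1e-8), np.log(1e-1)),
         'lam_S': hp.loguniform('lam_S', np.log(1e-8), np.log(1e-1))}
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50, trials=Trials())
print(best)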

2.4. Frac-PDE-Net

We have noted that mPDE-Net fits the data and recovers terms accurately, but it may not always yield a simple learned PDE, making it challenging to interpret. To address this, we first implemented sparsity-encouraging methods such as the Lasso. However, even with the Lasso and hyperparameters chosen from the validation sets, the predicted equation still contained redundant terms. This is likely due to correlations and linear dependencies in the data, which prevent the Lasso from fully shrinking the extra coefficients to zero. To overcome this, we employ two approaches. The first, called the L² norm-based term selection criterion, weakens or eliminates linear dependencies in the data. The second, called sequential threshold ridge regression (STRidge), produces concise models through strong thresholding. We discuss these approaches in more detail below.
  • L² norm-based term selection criterion. Consider the underlying PDE in the form
    \partial_t u = \Theta(u)\, \xi,
    where
    \Theta(u) = \big(\Theta_1(u), \Theta_2(u), \ldots, \Theta_p(u)\big), \qquad \xi = (\xi_1, \xi_2, \ldots, \xi_p)^T.
    To address the issue of excessive terms in the learned PDE, we apply the L² norm-based term selection criterion. This involves normalizing the columns of \Theta(u) to obtain \Phi_k(u),
    \Theta(u)\, \xi = \sum_{k=1}^{p} \Theta_k(u)\, \xi_k = \sum_{k=1}^{p} \Phi_k(u)\, \eta_k,
    where
    \Phi_k(u) = \frac{\Theta_k(u)}{\|\Theta_k(u)\|_2}, \qquad \eta_k = \xi_k \|\Theta_k(u)\|_2, \qquad 1 \le k \le p,
    and adjusting the coefficients \xi to \tilde{\xi},
    \tilde{\xi}_k = \begin{cases} 0, & \text{if } |\eta_k| < \delta \max_j |\eta_j|, \\ \xi_k, & \text{otherwise}, \end{cases} \qquad 1 \le k \le p.
    By removing the terms in \Theta(u) whose adjusted coefficients \eta_k are significantly smaller than the largest one, we shorten the vector \tilde{\xi} to \xi^{(s)}. The corresponding columns in \Theta(u) form a new matrix \Theta^{(s)}(u) with reduced linear dependency between its columns. This results in a simplified approximation of the PDE:
    \partial_t u \approx \Theta^{(s)}(u)\, \xi^{(s)}.
  • Sparse regression: STRidge. After using the L² norm-based term selection criterion to select terms, as discussed previously, we apply sparse regression to further improve the compactness of the representation of the hidden PDE model (13). Here, a tolerance threshold “tol” is introduced to select coefficients for sparse results. Coefficients smaller than “tol” are discarded, and the remaining ones are kept and refitted until the number of terms stabilizes. The sparse regression process is outlined in Algorithm 3, and a NumPy sketch of both steps is given after Algorithm 4 below. For further information, see [25].
To summarize, the mPDE-Net approach allows us to achieve relatively accurate predictions for the function and its derivatives. We then employ an L 2 norm-based term selection criterion and sparse regression to obtain a concise model, which we refer to as Frac-PDE-Net. Algorithm 4 summarizes this procedure.
Algorithm 3: STRidge(Θ^(s), U_t, λ, tol, iters)
1: ξ̂^(s) = argmin_{ξ^(s)} ||Θ^(s) ξ^(s) − U_t||²₂ + λ||ξ^(s)||²₂                ▹ ridge regression
2: bigcoeffs = {j : |ξ̂_j^(s)| ≥ tol}                                            ▹ select large coefficients
3: ξ̂^(s)[∼bigcoeffs] = 0                                                        ▹ apply hard threshold to the remaining coefficients
4: ξ̂^(s)[bigcoeffs] = STRidge(Θ^(s)[:, bigcoeffs], U_t, λ, tol, iters − 1)      ▹ recursive call with fewer coefficients
5: return ξ̂^(s)
Algorithm 4 L² norm selection criterion + STRidge(Θ̂, û_t, λ, tol, iters)
1: Θ̂ ξ̂ = Σ_{k=1}^p Θ̂_k ξ̂_k = Σ_{k=1}^p (Θ_k(û)/||Θ_k(û)||₂)(ξ̂_k ||Θ_k(û)||₂) = Σ_{k=1}^p Φ_k(û) η_k      ▹ adjusted coefficients
2: bigcoeffs = {k : |η_k| ≥ δ max_j |η_j|}                                       ▹ select large coefficients
3: ξ̂[∼bigcoeffs] = 0
4: Θ^(s) = Θ̂[:, bigcoeffs] and ξ^(s) = ξ̂[bigcoeffs]
5: ξ̂^(s) = argmin_{ξ^(s)} ||Θ^(s) ξ^(s) − û_t||²₂ + λ||ξ^(s)||²₂                 ▹ ridge regression
6: bigcoeffs = {j : |ξ̂_j^(s)| ≥ tol}                                             ▹ select large coefficients
7: ξ̂^(s)[∼bigcoeffs] = 0                                                         ▹ apply hard threshold
8: ξ̂^(s)[bigcoeffs] = STRidge(Θ^(s)[:, bigcoeffs], û_t, λ, tol, iters − 1)       ▹ recursive call with fewer non-zero coefficients
9: return ξ̂^(s)
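The NumPy sketch below is one way to realize Algorithms 3 and 4, under the convention that the columns of Θ contain the candidate terms evaluated on the data and U_t holds the observed time derivative; the function names and the boolean-mask bookkeeping are our own.

import numpy as np

def stridge(Theta, Ut, lam, tol, iters):
    # Algorithm 3: ridge regression followed by recursive hard thresholding
    n_terms = Theta.shape[1]
    xi = np.linalg.solve(Theta.T @ Theta + lam * np.eye(n_terms), Theta.T @ Ut)
    if iters <= 0:
        return xi
    big = np.abs(xi) >= tol                 # indices of large coefficients
    xi[~big] = 0.0                          # hard threshold on the small ones
    if big.any():
        xi[big] = stridge(Theta[:, big], Ut, lam, tol, iters - 1)
    return xi

def l2_select_then_stridge(Theta, Ut, xi_hat, delta, lam, tol, iters):
    # Algorithm 4: drop terms whose adjusted coefficient eta_k = xi_k * ||Theta_k||_2
    # is far below the largest one, then run STRidge on the surviving columns.
    eta = xi_hat * np.linalg.norm(Theta, axis=0)
    keep = np.abs(eta) >= delta * np.abs(eta).max()
    xi_s = np.zeros_like(xi_hat)
    xi_s[keep] = stridge(Theta[:, keep], Ut, lam, tol, iters)
    return xi_s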

2.5. Kolmogorov-Smirnov Test

After applying the Frac-PDE-Net procedure, a simplified, interpretable model has been created. Our next goal is to determine whether this model can be compressed further. We designate Model 1 as the system learned by Frac-PDE-Net, and Model 2 as the system obtained by removing from Model 1 the interaction term with the smallest L² norm. To determine whether Model 1 and Model 2 produce outputs from the same distribution, we use the Kolmogorov–Smirnov test (K-S test).
Since our examples involve systems of two PDEs, a two-dimensional K-S test is appropriate. The time range is [0, T] with time step size dt, giving n := T/dt time grids denoted {t_i}_{i=1}^n, where t_i = i(dt) and 1 ≤ i ≤ n. At a fixed time t_i, we aim to test the proximity of two samples Y_{t_i} and Ỹ_{t_i}, which are associated with Model 1 and Model 2, respectively, at time t_i. For each t_i, we specify:
Hypothesis 1 (Null). The two sets {Y_{t_i}}_{i=1}^n and {Ỹ_{t_i}}_{i=1}^n come from a common distribution.
Hypothesis 2 (Alternative). The two sets {Y_{t_i}}_{i=1}^n and {Ỹ_{t_i}}_{i=1}^n do not come from a common distribution.
Let H_{t_i,0} and p̂_{t_i} denote the null hypotheses and the corresponding p-values, respectively, for 1 ≤ i ≤ n. In this paper, we employed the Bonferroni [30], Holm [31] and Benjamini–Hochberg (B-H) [32] methods for multiple testing adjustment. Note that the Bonferroni method is the most conservative of the three. Under the complete null hypothesis of a common distribution across all time points, no more than 5% of the total time points are expected to be rejected.
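A minimal sketch of this testing step using SciPy and statsmodels is given below. SciPy's two-sample K-S test is one-dimensional, so the 2-D residual fields are flattened here as a simple stand-in for a genuine two-dimensional K-S test; the function and argument names are our own.

import numpy as np
from scipy.stats import ks_2samp
from statsmodels.stats.multitest import multipletests

def fraction_rejected(residuals_1, residuals_2, alpha=0.05, method='fdr_bh'):
    # residuals_1[i], residuals_2[i]: residual fields of Model 1 / Model 2 at time t_i.
    # One two-sample K-S test per time point, then a multiple-testing adjustment
    # (method can be 'bonferroni', 'holm', or 'fdr_bh' for Benjamini-Hochberg).
    pvals = np.array([ks_2samp(r1.ravel(), r2.ravel()).pvalue
                      for r1, r2 in zip(residuals_1, residuals_2)])
    reject, _, _, _ = multipletests(pvals, alpha=alpha, method=method)
    return reject.mean()   # fraction of time points at which the null is rejected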

3. Numerical Studies: Convection-Diffusion Equations with the Neumann Boundary Condition

In this section, we showcase numerical examples to demonstrate the efficacy of Frac-PDE-Net, our proposed method. The training, validation, and testing data are generated based on the underlying governing equation. Our aim is to use Frac-PDE-Net on these data to obtain a concise and interpretable model for the PDE. The governing PDEs under consideration in this paper are of the following form:
\partial_t u = F_1(u, v), \qquad \partial_t v = F_2(u, v),
where
F_1(u, v) = d_1 \Delta u + P_1(u, v) + R_1(u, v), \qquad F_2(u, v) = d_2 \Delta v + P_2(u, v) + R_2(u, v).
Here, d_1 and d_2 are positive diffusion coefficients, R_1 and R_2 represent fractional functions of (u, v), and P_1 and P_2 denote combinations of power functions and integration operators of (u, v) through addition and multiplication. For example, R_1(u, v) can be \frac{u^2 v}{2v + 3}, and P_1(u, v) can be 1 + u^{1.5} v^2 + u^{1.5} \int u \, dx.

3.1. Example 1: A 2-Dimensional Model

Our first example is taken from (Equation (2.8) in Section 2.2 in [2]). In this example, we consider (14) under the Neumann boundary condition on the two-dimensional domain D_1 := [−5, 5] × [−5, 5] with d_1 = 0.3, d_2 = 0.4, P_1(u, v) = 1 − u, P_2(u, v) = 0.4 − 0.2v, and
R_1(u, v) = R_2(u, v) = -\frac{2uv}{1 + u + 4u^2} = -\frac{1}{2}\,\frac{uv}{u^2 + 0.25u + 0.25}.
Thus, Equation (14) is reduced to
\partial_t u = F_1(u, v), \qquad \partial_t v = F_2(u, v),
\partial_x u(-5, y, t) = \partial_x u(5, y, t) = \partial_y u(x, -5, t) = \partial_y u(x, 5, t) = 0,
with (x, y, t) \in [-5, 5] \times [-5, 5] \times [0, 0.15] and
F_1(u, v) = 0.3\,(\partial_x^2 u + \partial_y^2 u) + 1 - u - \frac{1}{2}\,\frac{uv}{u^2 + 0.25u + 0.25}, \qquad
F_2(u, v) = 0.4\,(\partial_x^2 v + \partial_y^2 v) + 0.4 - 0.2v - \frac{1}{2}\,\frac{uv}{u^2 + 0.25u + 0.25}.
The observations are generated with Equations (16) and (17), and then split into training data, validation data and testing data. The PDE is solved by applying a finite difference scheme on a 64 × 64 spatial mesh grid, with the central difference scheme for Δ := ∂_x² + ∂_y² and a second-order Runge–Kutta temporal discretization (see [16]), using a time step size of 1/1600.
In addition, the observations are obtained from various initial values: this introduces extra variability into the datasets, which is necessary if we want to generalize well to arbitrary initial conditions. We assume that we have N_Init = 12 different solutions, coming from different initial values w_0. These functions are random, defined through random parameters a_{i,j}, b_{i,j}, c_{i,j}, d_{i,j}, a_{k,l}, b_{k,l}, c_{k,l} and d_{k,l}, which follow the standard normal distribution N(0, 1), and c_1 and c_2, which follow uniform distributions: c_1 ∼ U(−0.5, 0.5) and c_2 ∼ U(0.5, 1.5). Then, we generate the 12 initial values (u_0, v_0) by setting
u_0(x, y) = \frac{w_0(x, y)}{\max |w_0|} + c_1, \qquad v_0(x, y) = \frac{\tilde{w}_0(x, y)}{\max |\tilde{w}_0|} + c_2,
where
w_0(x, y) = \sum_{|i|, |j| \le 13} \big\{ a_{i,j} \cos(2ix)\cos(2jy) + b_{i,j} \sin[(2i+1)x]\sin[(2j+1)y] + c_{i,j} \sin[(2i+1)x]\cos(2jy) + d_{i,j} \cos(2ix)\sin[(2j+1)y] \big\},
\tilde{w}_0(x, y) = \sum_{|k|, |l| \le 13} \big\{ a_{k,l} \cos(2kx)\cos(2ly) + b_{k,l} \sin[(2k+1)x]\sin[(2l+1)y] + c_{k,l} \sin[(2k+1)x]\cos(2ly) + d_{k,l} \cos(2kx)\sin[(2l+1)y] \big\}.
For any given initial data (u_0, v_0), we denote the corresponding solution by (u^*, v^*). When noise is allowed, we take the perturbed data to be
u(x, y, t) = u^*(x, y, t) + nl \cdot Q_1, \qquad v(x, y, t) = v^*(x, y, t) + nl \cdot Q_2,
where nl is the level of Gaussian noise added, and Q_1 and Q_2 are random variables following the normal distributions Q_i ∼ N(0, σ_i²) for i = 1, 2, where σ_1 (resp. σ_2) is the standard deviation of the true data u^* (resp. v^*).
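For concreteness, the NumPy sketch below generates one random initial pair (u_0, v_0) of the form (18) and adds noise in the spirit of (19). Applying the noise to the initial fields (rather than to the full solution) and fixing the random seed are simplifications on our part.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 64)
X, Y = np.meshgrid(x, x, indexing='ij')

def random_field():
    # Random trigonometric series of the type used for w_0 in (18), truncated at |i|, |j| <= 13
    w = np.zeros_like(X)
    for i in range(-13, 14):
        for j in range(-13, 14):
            a, b, c, d = rng.standard_normal(4)
            w += (a * np.cos(2*i*X) * np.cos(2*j*Y)
                  + b * np.sin((2*i+1)*X) * np.sin((2*j+1)*Y)
                  + c * np.sin((2*i+1)*X) * np.cos(2*j*Y)
                  + d * np.cos(2*i*X) * np.sin((2*j+1)*Y))
    return w

w0, w0t = random_field(), random_field()
u0 = w0 / np.abs(w0).max() + rng.uniform(-0.5, 0.5)
v0 = w0t / np.abs(w0t).max() + rng.uniform(0.5, 1.5)

def add_noise(field, nl, rng):
    # Gaussian perturbation scaled by the field's own standard deviation, cf. (19)
    return field + nl * rng.normal(0.0, field.std(), size=field.shape)

u_obs = add_noise(u0, 0.05, rng)   # 5% noise level, as used in Example 1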
Since the time runs from 0 to 0.15, there are 15 time blocks, and we denote N_Time = 15. For the spatial variables, we have N_Space = 64, where N_Space represents the number of space grids. Therefore, the dataset is
\{(u_{t,k}, v_{t,k}) : 1 \le t \le N_{Time},\ 1 \le k \le N_{Init}\},
where both u_{t,k} and v_{t,k} are matrices in \mathbb{R}^{N_{Space} \times N_{Space}}. Table 1 and Table 2 show a summary of the parameters for Frac-PDE-Net.
Our goal is to discover the terms F_1(u, v) and F_2(u, v) on the right-hand side of (16), whose true expressions are given by (17). For convenience of notation, we denote by F̂_1 and F̂_2 our predicted operators for F_1 and F_2. Based on some existing models (see, e.g., Section 2.2 in [2]), we adopt some assumptions before discovering F̂_1 and F̂_2. More precisely, we assume that
\hat{F}_1(u, v) = \hat{d}_1 \Delta u + \hat{P}_1(u, v) + \hat{R}_1(u, v), \qquad \hat{F}_2(u, v) = \hat{d}_2 \Delta v + \hat{P}_2(u, v) + \hat{R}_2(u, v),
where d̂_1 and d̂_2 are positive constants, P̂_1 and P̂_2 are polynomials of (u, v) up to order 2, and both fractional terms R̂_1 and R̂_2 are of the form l(u)r(v) or r(u)l(v), where l denotes a linear function and r denotes a fractional function whose numerator is linear and whose denominator is quadratic.
Based on these assumptions, we consider the library {u, u_{xx}, u_{yy}, v, v_{xx}, v_{yy}} for training our model.
The filters q (as defined in (4)) are selected to be of size 5 × 5. The total number of parameters in the W^{(i)} (as defined in Algorithm 1) for approximating F_1 and F_2 is 56, and the number of trainable parameters in the moment matrices M (as defined in (6)) is 52. To optimize the parameters, we use the BFGS algorithm instead of the Adam or SGD optimizers, since BFGS is faster and more stable in this setting.
In the following, we outline the notation used and summarize the key steps of our framework.
1. F̂_i^{mPDE-Net} denotes the result of applying the modified PDE-Net to our model.
2. Next, we utilize the L² norm-based selection criterion and sparse regression on F̂_i^{mPDE-Net} to obtain a more concise and interpretable model, referred to as F̂_i^{s-mPDE-Net}. The “s” in F̂_i^{s-mPDE-Net} represents the application of sparse regression.
3. Subsequently, we fix the terms in F̂_i^{s-mPDE-Net} and retrain its coefficients to produce a final model named F̂_i^{rs-mPDE-Net}. This is the end result of our Frac-PDE-Net scheme. The “r” in F̂_i^{rs-mPDE-Net} signifies the retraining of the coefficients.
4. Finally, to verify that no further terms can be eliminated after Frac-PDE-Net, we compare two models: Model 1, generated by Frac-PDE-Net, and Model 2, which is identical to Model 1 except that the term with the smallest L² norm is removed from F̂_1 and F̂_2. The coefficients in Model 2 are retrained, and the resulting model is referred to as F̂_i^{PH-rs-mPDE-Net}. “PH” in F̂_i^{PH-rs-mPDE-Net} represents the post-hoc selection in Model 2. The comparison between Model 1 and Model 2 is conducted using the Kolmogorov–Smirnov test outlined in Section 2.5.
For this case, we added 5% noise to the generated data to form the observational data. The results are displayed in Table 3. Table 3 shows that F̂_i^{mPDE-Net} (the modified PDE-Net framework) accurately identifies the terms in Example 1 and estimates their corresponding coefficients. However, it also produces unnecessary terms with low weights after training. By applying the L² norm-based selection and sparse regression (L² + SP), we successfully remove these extra terms in F̂_i^{rs-mPDE-Net}. After the terms in F̂_1 and F̂_2 are identified, we retrain the model with these fixed terms to obtain the final coefficients in F̂_i^{rs-mPDE-Net}.
To test whether Model 1 (F̂_i^{rs-mPDE-Net}) and Model 2 (F̂_i^{PH-rs-mPDE-Net}) are similar, we compare their predictions obtained with the finite difference scheme. Consider the time range [0, 0.5] with time step size dt = 0.01. Hence, there are 50 time grids, denoted {t_i}_{i=1}^{50}, where t_i = 0.01i, 1 ≤ i ≤ 50. Fixing a time t_i, we introduce the residuals E_{t_i} := Y_{t_i} − Y*_{t_i} and Ẽ_{t_i} := Ỹ_{t_i} − Y*_{t_i}, where Y*_{t_i} represents the true solution, and Y_{t_i} and Ỹ_{t_i} denote the predicted solutions based on Model 1 and Model 2, respectively, at time t_i. We test whether the residuals {E_{t_i}}_{i=1}^{50} and {Ẽ_{t_i}}_{i=1}^{50} have similar distributions. The null hypothesis is H_0^{(i)}: E_{t_i} ∼ Ẽ_{t_i} (equal distributions) and the alternative hypothesis is H_A^{(i)}: E_{t_i} ≁ Ẽ_{t_i}. Applying the Bonferroni method, the Holm method and the B-H procedure for multiple testing adjustment, discussed in Section 2.5, we obtain the test results presented in Table 4.
The results in Table 4 show that Model 1 (Frac-PDE-Net) is significantly different from Model 2, meaning all terms in Model 1 should be kept. Hence, the final discovered terms for F ^ 1 and F ^ 2 are represented by Model 1 (Frac-PDE-Net) in Table 4.
To assess the stability of the results shown above, we repeated the experiments 100 times; the results are presented in Figure 4 and Figure 5. The process of merging similar terms is outlined in Appendix A.1. The plots show that there are some instances where the three methods fail to eliminate certain redundant terms. However, these instances are rare, as the median of these terms is 0, indicating that they appear infrequently.

3.2. Example 2: A 1-Dimensional Model

Our second example is taken from [6]. In this example, we consider (14) under the Neumann boundary condition on the one-dimensional domain D_1 := [-\tfrac{5\pi}{2}, \tfrac{5\pi}{2}] with d_1 = 0.1, d_2 = 10,
P_1(u, v) = 3.6\, u^{1.5} - 3.6\, u - 0.229\, u^{1.5} \int_{-2.5\pi}^{2.5\pi} u \, dx, \qquad P_2(u, v) = u - 0.4\, v,
R_1(u, v) = \frac{0.081\, u}{v^2 + 0.0215}, \qquad R_2(u, v) = 0.
Thus, Equation (14) is reduced to
\partial_t u = F_1(u, v), \qquad \partial_t v = F_2(u, v),
\partial_x u(-2.5\pi, t) = \partial_x u(2.5\pi, t) = \partial_x v(-2.5\pi, t) = \partial_x v(2.5\pi, t) = 0,
with (x, t) \in [-\tfrac{5\pi}{2}, \tfrac{5\pi}{2}] \times [0, 0.75] and
F_1(u, v) = 0.1\, \partial_x^2 u + 3.6\, u^{1.5} - 3.6\, u - 0.229\, u^{1.5} \int_{-2.5\pi}^{2.5\pi} u \, dx + \frac{0.081\, u}{v^2 + 0.0215}, \qquad
F_2(u, v) = 10\, \partial_x^2 v + u - 0.4\, v.
The training data, validation data and testing data are generated, based on (20), by applying a finite difference scheme on a 600-point spatial mesh and then restricting the solution to a 200-point spatial mesh, with the central difference scheme for Δ := ∂_x² and an implicit Euler temporal discretization, using a time step size of 0.01. Furthermore, we consider 14 different initial values, 10 of which were selected from a set of solutions with periodic patterns. The remaining initial values were generated by combining elementary functions. The reason for producing initial values in different ways is to test whether the method still works for periodic solutions.
We also add noise to the generated data in the following form:
u(x, t) = |u^*(x, t) + nl \cdot Q_1|, \qquad v(x, t) = |v^*(x, t) + nl \cdot Q_2|,
where nl is the level of Gaussian noise added and Q_1 and Q_2 are random variables following the normal distributions Q_i ∼ N(0, σ_i²) for i = 1, 2, where σ_1 (resp. σ_2) is the standard deviation of u^* (resp. v^*). The absolute value is imposed to avoid negative values, which would cause trouble when evaluating power functions with non-integer exponents, such as u^{1.5}.
We choose 15 time blocks on the interval [0, 0.75] and denote N_Time = 15. For the spatial variable, we set N_Space = 200, where N_Space represents the number of space grids. Therefore, the dataset is
\{(u_{t,k}, v_{t,k}) : 1 \le t \le N_{Time},\ 1 \le k \le N_{Init}\},
where u_{t,k} and v_{t,k} are vectors in \mathbb{R}^{N_{Space}}. Table 5 and Table 6 show a summary of the parameters for Frac-PDE-Net.
In [6], some assumptions are made on the model based on existing experimental knowledge of the biological behavior. For example, it is assumed that the operator F_2(u, v) is linear in both u and v, while F_1(u, v) is nonlinear in both u and v. As in (15),
F_1(u, v) = d_1 \Delta u + P_1(u, v) + R_1(u, v).
In [6], the nonlinear dependence of P_1(u, v) on u is through the combination of the power function u^α and the integration operator ∫ u dx, where α is further restricted to the range [1, 2]. On the other hand, R_1(u, v) is assumed to be linear in u but nonlinear in v, and the nonlinear dependence on v is through a fractional function whose denominator is a quadratic polynomial. Thanks to these a priori constraints, we consider the library {u, u_x, u_{xx}, v, v_x, v_{xx}, I, u^α} for F̂_1(u, v) and the library {u, u_x, u_{xx}, v, v_x, v_{xx}} for F̂_2(u, v), where α takes the form α = 1.5 + 0.5 sin(η) for η ∈ ℝ to ensure that α ∈ [1, 2].
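For instance, in a PyTorch implementation this constraint on α can be enforced by training an unconstrained parameter η and mapping it through the sine transform above; the variable names in the minimal sketch below are our own.

import torch

eta = torch.nn.Parameter(torch.zeros(()))   # unconstrained trainable parameter
alpha = 1.5 + 0.5 * torch.sin(eta)          # always lies in [1, 2]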
The filters q are of size 1 × 19. The total number of parameters for approximating F_1 and F_2 is 29, and the number of trainable parameters in the moment matrices M is 32. To optimize the parameters, we again use the BFGS algorithm.
For this case, we added 1% noise to the generated data to form the observational data. The results are displayed in Table 7, in which the notations are consistent with those in Table 3.
Similar to the post-hoc selection procedure performed in Example 1, we also need to compare Model 1 (F̂_1^{rs-mPDE-Net}) and Model 2 (F̂_1^{PH-rs-mPDE-Net}) and determine whether they differ significantly. Consider the time range [0, 10] with time step size dt = 0.05. Hence, there are 200 time grids, denoted {t_i}_{i=1}^{200}, where t_i = 0.05i, 1 ≤ i ≤ 200. At each time t_i, we introduce the residuals E_{t_i} := Y_{t_i} − Y*_{t_i} and Ẽ_{t_i} := Ỹ_{t_i} − Y*_{t_i}, where Y_{t_i} and Ỹ_{t_i} are associated with Model 1 and Model 2, respectively. We test whether the residuals {E_{t_i}}_{i=1}^{200} and {Ẽ_{t_i}}_{i=1}^{200} have similar distributions. Analogous to the previous case, we see from Table 7 that the coefficient in front of the term ∂_x²u in Model 2 (F̂_1^{PH-rs-mPDE-Net}) is a negative number, −0.026, which leads to rapid concentration rather than a diffusion effect. That said, Model 2 is essentially different from Model 1, and the distributions of {E_{t_i}}_{i=1}^{200} and {Ẽ_{t_i}}_{i=1}^{200} are totally different.
To assess the stability of the results shown above, we repeated the experiments 100 times and the results are presented in Figure 6 and Figure 7. The plots show that there are some instances where the three methods fail to eliminate certain redundant terms. However, these instances are rare, as the median of these terms is 0, indicating that they appear infrequently.

4. Prediction

4.1. Example 1: The 2-Dimensional Model

In this section, we validate the robustness of the model discovered by Frac-PDE-Net in Example 1 by performing predictions with the non-typical initial values u_0 and v_0,
u_0 = \frac{50y^2 - y^4 + 4}{800\,\big[\,1.2 - \cos\big(\tfrac{\pi}{5} y\big)\big]} + 4, \qquad
v_0 = \Big[\tfrac{1}{800}\,\big(50y^2 - y^4 + 4\big)\Big]^2 + \cos\Big(\tfrac{\pi}{5} x\Big) + 1.
We use the finite difference method to generate the “true data” in the forward direction using the known coefficients and terms in (16) and (17). The spatial step sizes (dx and dy) are set to 10/64 and the time step size (dt) is 1/1600. We then simulate the data using the trained model from Table 3 up to t = 0.5.
In Figure 8, both the true solution (u, v) and the predicted solution (ũ, ṽ) of the model trained by Frac-PDE-Net are plotted at different time instances: t ∈ {0.4, 0.6, 0.8, 1}. One can see from Figure 8 that the predicted solution is very close to the true one.
The results of the comparison between Frac-PDE-Net and PDE-Net 2.0 are presented in both graphical and quantitative form. The model discovered by PDE-Net 2.0 is shown in Table 8, while the predicted solutions are displayed in Figure 9. Although PDE-Net 2.0 only utilizes polynomials, the predicted images still have a similar shape to the true ones. To further evaluate the performance, the prediction errors are analyzed quantitatively using the L^∞ norm and the L² norm on the space domain [−5, 5] × [−5, 5], as seen in Table 9. The results show that Frac-PDE-Net has smaller errors compared to PDE-Net 2.0, highlighting its advantage.

4.2. Example 2: The One-Dimensional Model

In this section, we validate the robustness of the model discovered by Frac-PDE-Net in Example 2 in Section 3.2 by performing predictions with the following periodic initial values u_0 and v_0,
u_0(x) = 0.0259 + 0.01 \sin(3x), \qquad v_0(x) = 0.06475 + 0.01 \sin(3x).
We use the finite difference method to generate the “true” data in the forward direction using the known coefficients and terms in (20) and (21). The spatial step size (dx) is set to 5π/200 and the time step size (dt) is 0.05. The time interval considered is t ∈ [0, 10]. We then simulate the data using the trained model from Table 7 over the time period [0, 10]. In Figure 10, both the true solution and the predicted solution of the model trained by Frac-PDE-Net are plotted for t ∈ [0, 10]. One can see from Figure 10 that the predicted solution is very close to the true one.
The results of the comparison between Frac-PDE-Net and PDE-Net 2.0 are presented in both graphical and quantitative form. The model discovered by PDE-Net 2.0 is shown in Table 10, while the predicted solutions are displayed in Figure 11. We can clearly see that the predicted images by PDE-Net 2.0 are far from satisfactory compared to the true ones in Figure 10. To further evaluate the performance, the prediction errors are analyzed quantitatively using the L^∞ norm and the L² norm on the space-time region [−5π/2, 5π/2] × [0, 10] in Table 11. The results show that Frac-PDE-Net has much smaller errors compared to PDE-Net 2.0, highlighting its advantage.

5. Conclusions

Our approach, Frac-PDE-Net, builds on the symbolic approach developed in PDE-Net for discovering realistic and interpretable PDEs from data. While the neural network remains very efficient for generating and learning dictionaries of functions, typically polynomials, we have shown that if we enrich the dictionaries with large (typically uncountable) families of functions, extra care is needed for selecting the important terms, by penalization and by evaluating and testing the impact of a reaction term on the predicted solution. Quite remarkably, we can extract a sparse equation with readable terms and with good estimates of the associated parameters.
The introduction of rich families of functions, such as fractions (rational functions), is often necessary because they are widely used by modelers and because they can avoid the limitations of the approximation capacity of polynomials. Indeed, a polynomial expansion might require numerous terms in order to approximate the unknown reaction terms correctly. As a matter of fact, we have introduced a very flexible family of fractions that avoids truncation based on powers u^p, v^q, p, q ∈ ℕ. We then learn the numerator and denominator coefficients in ℝ, and our approach is incorporated seamlessly into the symbolic differentiable neural network framework of PDE-Net through the introduction of extra layers.
Our work is originally motivated by the discovery and estimation of reaction–diffusion PDEs with possibly complex terms, such as fractions, non-integer powers, or non-local terms (such as an integral), as introduced for the pollen tube growth problem [6]. Nevertheless, our selection approach could be used to handle other dictionaries, or in the presence of advection terms, as our methodology exploits the reaction–diffusion structure only for imposing some constraints on the dictionaries of interest and for the interpretability of each term in that case. As next steps, the Frac-PDE-Net methodology can be improved by considering more advanced numerical schemes in the time discretization, say implicit Euler or second-order Runge–Kutta. In that case, we expect better accuracy and stability for model recovery and prediction. Another possible improvement would be to enrich the dictionaries of fractional terms by replacing the current form N(u, v) = g(u)h(v) with more general rational functions whose denominators depend on both u and v, say N(u, v) = uv/(u²v² + 1). Finally, we emphasize that Frac-PDE-Net reaches a trade-off by discovering the main terms of the PDE and accurately estimating each coefficient in order to gain interpretability, while it also allows effective long-term prediction, even for unseen initial conditions.

Author Contributions

Conceptualization, N.J-B.B., X.C.; methodology, S.C., X.Y., N.J-B.B., X.C.; software, S.C., X.Y.; validation, S.C., X.Y., N.J-B.B., X.C.; formal analysis, S.C., X.Y., N.J-B.B., X.C.; writing—original draft preparation, S.C., X.Y., N.J-B.B., X.C.; writing—review and editing, S.C., X.Y., N.J-B.B., X.C.; supervision, N.J-B.B., X.C.; funding acquisition, X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by United States Department of Agriculture (USDA) National Institute of Food and Agriculture (NIFA) Hatch Project AES-CE award (CA-R-STA-7132-H) and NSF DMS 1853698.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank all anonymous reviewers for their constructive comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Term Combination after Simulation

During the process of simulation, if only the addition and the multiplication operators are involved, then it is not an issue to combine terms as the program can easily identify same terms and then add their coefficients together. However, combining similar terms can be difficult when fractional terms are present. To address this issue, we classify the simulation results into various groups before combining them.
As an example, we consider the scenario where the nonlinear term takes the form g ( u ) h ( v ) , and one of the following two structures is assumed.
(i) g is linear and h is a fractional function whose denominator is a second-order polynomial:
(u + c_1)\, \frac{\alpha_1 v + \alpha_2}{v^2 + \beta_1 v + \beta_2}.
(ii) h is linear and g is a fractional function whose denominator is a second-order polynomial:
\frac{\alpha_3 u + \alpha_4}{u^2 + \beta_3 u + \beta_4}\, (v + c_2).
Therefore, the outcomes have 32 possibilities if we only classify terms and signs:
  • Numerator (4 possibilities): u, v, uv, 1.
  • Denominator (2 possibilities): quadratic function in u or in v.
  • Signs (4 possibilities): the sign of β_1 or β_2 can be either positive or negative.
There are now 32 groups. In each of them, all members share the same main terms and same signs in the denominator while the coefficients are allowed to be different. For example, in the group with the form
\frac{\alpha_1 u v}{v^2 + \beta_1 v + \beta_2},
all members share the same term uv in the numerator, the same terms v² and v in the denominator, and the same signs of β_1 and β_2, while the specific values of α_1, β_1 and β_2 may vary.
Based on the above groups, we adopt the following general principle. If two terms lie in distinct groups, then they are considered different and are not combined. If two terms lie in the same group, then we further quantify how close their denominator coefficients (say β_1 and β_2) are. If these coefficients are close enough, then we regard them as the “same” term and combine them by adding their numerator coefficients (say α_1) together. The next question is therefore how to quantify the distance between two members of the same group with possibly different coefficients (say β_1 and β_2).
We will illustrate the criterion in the following by studying a specific form u v v 2 + β 1 v + β 2 . More precisely, suppose there are two terms T 1 and T 2 as below,
T 1 = α 1 ( 1 ) u v v 2 + β 1 ( 1 ) v + β 2 ( 1 ) , T 2 = α 1 ( 2 ) u v v 2 + β 1 ( 2 ) v + β 2 ( 2 ) ,
then we define their distance to be
D T 1 , T 2 = max i = 1 , 2 | β i ( 2 ) β i ( 1 ) | max { | β i ( 2 ) | , | β i ( 1 ) | } .
According to this concept, we combine T 1 and T 2 together if and only if D [ T 1 , T 2 ] < 0.2 , that is when the relative difference between the coefficients is less than 0.2 . In such a case, we add the coefficients α 1 ( 1 ) and α 1 ( 2 ) to obtain
$T_1 + T_2 \;\longrightarrow\; T^* := \dfrac{\alpha^* u v}{v^2 + \beta_1^{(*)} v + \beta_2^{(*)}},$
where
$\alpha^* = \alpha_1^{(1)} + \alpha_1^{(2)}, \qquad \beta_1^{(*)} = \tfrac{1}{2}\big(\beta_1^{(1)} + \beta_1^{(2)}\big), \qquad \beta_2^{(*)} = \tfrac{1}{2}\big(\beta_2^{(1)} + \beta_2^{(2)}\big).$
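Continuing the hypothetical FracTerm/group_key sketch above, the distance D and the merging rule can be written in a few lines of Python; the functions distance and combine and the 0.2 tolerance argument are again illustrative only, not the actual implementation.
```python
from typing import Optional

def distance(t1, t2):
    """D[T1, T2]: largest relative difference between corresponding denominator
    coefficients of two terms taken from the same group."""
    rel = lambda a, b: abs(a - b) / max(abs(a), abs(b), 1e-12)  # guard against 0/0
    return max(rel(t1.beta1, t2.beta1), rel(t1.beta2, t2.beta2))

def combine(t1, t2, tol=0.2) -> Optional["FracTerm"]:
    """Merge two terms when they lie in the same group and D[T1, T2] < tol;
    otherwise return None and keep them as separate terms."""
    if group_key(t1) != group_key(t2) or distance(t1, t2) >= tol:
        return None
    return FracTerm(
        alpha=t1.alpha + t2.alpha,              # add the numerator coefficients
        numerator=t1.numerator,
        denom_var=t1.denom_var,
        beta1=0.5 * (t1.beta1 + t2.beta1),      # average the denominator coefficients
        beta2=0.5 * (t1.beta2 + t2.beta2),
    )

# Example: two nearby terms of the form alpha * u*v / (v**2 + beta1*v + beta2)
T1 = FracTerm(alpha=0.51, numerator="uv", denom_var="v", beta1=0.259, beta2=0.265)
T2 = FracTerm(alpha=0.48, numerator="uv", denom_var="v", beta1=0.243, beta2=0.256)
print(distance(T1, T2))   # about 0.06, below the 0.2 threshold
print(combine(T1, T2))    # merged term with alpha = 0.99
```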

References

  1. Turing, A.M. The Chemical Basis of Morphogenesis. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 1952, 237, 37–72.
  2. Murray, J.D. Mathematical Biology, II, 3rd ed.; Interdisciplinary Applied Mathematics; Springer: New York, NY, USA, 2003; Volume 18, p. xxvi+811.
  3. Mori, Y.; Jilkine, A.; Edelstein-Keshet, L. Wave-pinning and cell polarity from a bistable reaction-diffusion system. Biophys. J. 2008, 94, 3684–3697.
  4. Mogilner, A.; Allard, J.; Wollman, R. Cell polarity: Quantitative modeling as a tool in cell biology. Science 2012, 336, 175–179.
  5. Tian, C. Parameter Estimation Procedure of Reaction Diffusion Equation with Application on Cell Polarity Growth. Ph.D. Thesis, UC Riverside, Riverside, CA, USA, 2018.
  6. Tian, C.; Shi, Q.; Cui, X.; Guo, J.; Yang, Z.; Shi, J. Spatiotemporal dynamics of a reaction-diffusion model of pollen tube tip growth. J. Math. Biol. 2019, 79, 1319–1355.
  7. Lu, L.; Meng, X.; Mao, Z.; Karniadakis, G.E. DeepXDE: A deep learning library for solving differential equations. arXiv 2019, arXiv:1907.04502.
  8. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations. arXiv 2017, arXiv:1711.10561.
  9. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Deep hidden physics models: Deep learning of nonlinear partial differential equations. J. Mach. Learn. Res. 2018, 19, 932–955.
  10. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707.
  11. Meng, X.; Li, Z.; Zhang, D.; Karniadakis, G.E. PPINN: Parareal physics-informed neural network for time-dependent PDEs. Comput. Methods Appl. Mech. Eng. 2020, 370, 113250.
  12. Pang, G.; Lu, L.; Karniadakis, G.E. fPINNs: Fractional physics-informed neural networks. SIAM J. Sci. Comput. 2019, 41, A2603–A2626.
  13. Chen, Z.; Xiu, D. On generalized residual network for deep learning of unknown dynamical systems. J. Comput. Phys. 2021, 438, 110362.
  14. Wu, K.; Xiu, D. Data-driven deep learning of partial differential equations in modal space. J. Comput. Phys. 2020, 408, 109307.
  15. Zhou, Z.; Wang, L.; Yan, Z. Deep neural networks for solving forward and inverse problems of (2 + 1)-dimensional nonlinear wave equations with rational solitons. arXiv 2021, arXiv:2112.14040.
  16. Long, Z.; Lu, Y.; Dong, B. PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network. J. Comput. Phys. 2019, 399, 108925.
  17. Long, Z.; Lu, Y.; Ma, X.; Dong, B. PDE-Net: Learning PDEs from data. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 3208–3216.
  18. Pakravan, S.; Mistani, P.; Aragon-Calvo, M.; Gibou, F. Solving inverse-PDE problems with physics-aware neural networks. J. Comput. Phys. 2021, 440, 110414.
  19. Daneker, M.; Zhang, Z.; Karniadakis, G.; Lu, L. Systems Biology: Identifiability analysis and parameter identification via systems-biology informed neural networks. arXiv 2022, arXiv:2202.01723.
  20. Both, G.; Choudhury, S.; Sens, P.; Kusters, R. DeepMoD: Deep learning for model discovery in noisy data. J. Comput. Phys. 2021, 428, 109985.
  21. Xu, H.; Chang, H.; Zhang, D. DL-PDE: Deep-learning based data-driven discovery of partial differential equations from discrete and noisy data. arXiv 2019, arXiv:1908.04463.
  22. Chen, Y.; Luo, Y.; Liu, Q.; Xu, H.; Zhang, D. Symbolic genetic algorithm for discovering open-form partial differential equations (SGA-PDE). Phys. Rev. Res. 2022, 4, 023174.
  23. Zhang, Z.; Liu, Y. Robust data-driven discovery of partial differential equations under uncertainties. arXiv 2021, arXiv:2102.06504.
  24. Bhowmick, S.; Nagarajaiah, S. Data-driven theory-guided learning of partial differential equations using simultaneous basis function approximation and parameter estimation (SNAPE). arXiv 2021, arXiv:2109.07471.
  25. Rudy, S.H.; Brunton, S.L.; Kutz, J.N. Data-driven discovery of partial differential equations. Sci. Adv. 2017, 3, e1602614.
  26. Rudy, S.; Alla, A.; Brunton, S.L.; Kutz, J.N. Data-driven identification of parametric partial differential equations. SIAM J. Appl. Dyn. Syst. 2019, 18, 643–660.
  27. Cai, J.; Dong, B.; Osher, S.; Shen, Z. Image restoration: Total variation, wavelet frames, and beyond. J. Amer. Math. Soc. 2012, 25, 1033–1089.
  28. Brunel, N.J-B. Parameter estimation of ODE's via nonparametric estimators. Electron. J. Statist. 2008, 2, 1242–1267.
  29. Bergstra, J.; Yamins, D.; Cox, D. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 115–123.
  30. Dunn, O. Multiple comparisons among means. J. Am. Stat. Assoc. 1961, 56, 52–64.
  31. Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 1979, 6, 65–70.
  32. Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 1995, 57, 289–300.
Figure 1. ROP1 and Ca²⁺ polarization dynamics. Left: ROP1 dynamics; Right: Ca²⁺ dynamics.
Figure 2. The scheme of one Δt.
Figure 3. The scheme of mPDE-Net.
Figure 4. Simulation results for true positive discovering with 5% noise. (a) F̂1. (b) F̂2.
Figure 5. Simulation results for false positive discovering with 5% noise. (a) F̂1. (b) F̂1. (c) F̂2. (d) F̂2.
Figure 6. Simulation results for F̂1(u, v) with 1% noise. (a) True positive discovering. (b) True positive discovering. (c) False positive discovering.
Figure 7. Simulation results for F̂2(u, v) with 1% noise. True positive discovering.
Figure 8. The first (second resp.) row shows the true dynamics of u (v resp.) at times t = 0.4, 0.6, 0.8, and 1.0. The third (fourth resp.) row shows the predicted dynamics of u (v resp.) with 5% noise level using Frac-PDE-Net.
Figure 9. Images of the predicted dynamics using PDE-Net 2.0 with 5% noise level.
Figure 10. The first row shows the true dynamics of (u, v) for (x, t) ∈ [−5π/2, 5π/2] × [0, 10]. The second row presents the predicted dynamics of (u, v) with 1% noise level by Frac-PDE-Net.
Figure 11. Images of the predicted dynamics of (u, v) for (x, t) ∈ [−5π/2, 5π/2] × [0, 10] using PDE-Net 2.0 with 1% noise level.
Table 1. Fixed parameters for Frac-PDE-Net.
Parameter    Value
t            [0, 0.15]
dt           0.01
x & y        [−5, 5]
dx & dy      10/64
N_Init       12
N_Time       15
N_Space      64
Table 2. Hyper-parameters selected for Frac-PDE-Net by the validation procedure in Section 2.3.
Parameter               Value
λ_M (5% noise level)    3.28 × 10⁻⁵
λ_S (5% noise level)    4.93 × 10⁻⁵
Table 3. PDE model discovery with 5% noise.
True F1*:  $0.3\Delta u + 1 - u - 0.5uv/(u^2 + 0.25u + 0.25)$
F̂1, mPDE-Net:  $0.305\Delta u + 0.992 - 0.988u - 0.510uv/(u^2 + 0.259u + 0.265) - 0.003v/(u^2 + 0.259u + 0.265) + 0.003v$
F̂1, s-mPDE-Net:  $0.305\Delta u + 1.00 - 0.981u - 0.532uv/(u^2 + 0.259u + 0.265)$
F̂1, rs-mPDE-Net (Frac-PDE-Net):  $0.304\Delta u + 0.975 - 0.982u - 0.501uv/(u^2 + 0.256u + 0.260)$
F̂1, PH-rs-mPDE-Net:  $0.278\Delta u + 0.969 - 0.993u - 0.514uv/(u^2 + 0.301u + 0.271)$
True F2*:  $0.4\Delta v + 0.4 - 0.2v - 0.5uv/(u^2 + 0.25u + 0.25)$
F̂2, mPDE-Net:  $0.398\Delta v + 0.412 - 0.195v - 0.510uv/(u^2 + 0.254u + 0.263) - 0.005v/(u^2 + 0.254u + 0.263) - 0.010u$
F̂2, s-mPDE-Net:  $0.398\Delta v + 0.424 - 0.199v - 0.542uv/(u^2 + 0.254u + 0.263)$
F̂2, rs-mPDE-Net (Frac-PDE-Net):  $0.400\Delta v + 0.385 - 0.202v - 0.490uv/(u^2 + 0.243u + 0.256)$
F̂2, PH-rs-mPDE-Net:  $0.344\Delta v + 2.116 - 0.815v$
Table 4. Hypothesis tests with 5% observation noise.
$H_0^{(i)}$ vs. $H_A^{(i)}$, $1 \le i \le 50$    Number of Rejections
Bonferroni                                       49
Holm                                             49
B-H                                              49
Table 5. Fixed parameters for Frac-PDE-Net.
Parameter    Value
t            [0, 0.75]
dt           0.05
x            [−2.5π, 2.5π]
dx           5π/200
N_Init       14
N_Time       15
N_Space      200
Table 6. Hyper-parameters selected for Frac-PDE-Net by the validation procedure as in Section 2.3.
Parameter               Value
λ_M (1% noise level)    1.88 × 10⁻⁷
λ_S (1% noise level)    1.62 × 10⁻⁶
Table 7. PDE model discovery with 1% noise level.
True F1*:  $0.1\,\partial_x^2 u + 3.6u^{1.5} - 3.6u - 0.229\,u^{1.5}\int_{-2.5\pi}^{2.5\pi} u\,dx + 0.081u/(v^2 + 0.0215)$
F̂1, mPDE-Net:  $0.118\,\partial_x^2 u + 3.959u^{1.361} - 3.871u - 0.223\,u^{1.361}\int_{-2.5\pi}^{2.5\pi} u\,dx + 0.0749u/((v + 0.005)^2 + 0.0211) - 0.0002uv/((v + 0.005)^2 + 0.0211) - 0.0029v$
F̂1, s-mPDE-Net:  $0.117\,\partial_x^2 u + 3.893u^{1.361} - 3.976u - 0.223\,u^{1.361}\int_{-2.5\pi}^{2.5\pi} u\,dx + 0.0750u/((v + 0.005)^2 + 0.0211)$
F̂1, rs-mPDE-Net (Frac-PDE-Net):  $0.0899\,\partial_x^2 u + 3.441u^{1.508} - 3.363u - 0.244\,u^{1.508}\int_{-2.5\pi}^{2.5\pi} u\,dx + 0.0714u/((v + 0.0002)^2 + 0.0209)$
F̂1, PH-rs-mPDE-Net:  $0.026\,\partial_x^2 u + 0.628u^{1.500} - 2.333u + 0.0393u/((v - 0.0479)^2 + 0.0154)$
True F2*:  $10.0\,\partial_x^2 v + u - 0.4v$
F̂2, mPDE-Net:  $9.388\,\partial_x^2 v + 0.963u - 0.400v$
F̂2, s-mPDE-Net:  $9.388\,\partial_x^2 v + 0.963u - 0.400v$
F̂2, rs-mPDE-Net (Frac-PDE-Net):  $9.588\,\partial_x^2 v + 0.969u - 0.403v$
F̂2, PH-rs-mPDE-Net:  $8.145\,\partial_x^2 v + 0.937u - 0.387v$
Table 8. PDE model discovered by PDE-Net 2.0.
Predicted Terms by PDE-Net 2.0 with 5% Noise
F̂1(u, v):  $0.0457\Delta u - 1.765u + 0.0938v + 0.0008$
F̂2(u, v):  $0.243\Delta v - 0.604u - 0.277v + 7 \times 10^{-5}$
Table 9. Errors of predicted solutions for u and v by Frac-PDE-Net and PDE-Net 2.0.
|ũ − u| (5% noise)       t = 0.4     t = 0.6     t = 0.8     t = 1
L∞, Frac-PDE-Net         0.007254    0.010806    0.014310    0.017765
L∞, PDE-Net 2.0          0.227365    0.331438    0.429602    0.522157
L2, Frac-PDE-Net         0.000106    0.000158    0.000209    0.000260
L2, PDE-Net 2.0          0.002720    0.003986    0.005192    0.006341
|ṽ − v| (5% noise)       t = 0.4     t = 0.6     t = 0.8     t = 1
L∞, Frac-PDE-Net         0.001503    0.002247    0.002988    0.003725
L∞, PDE-Net 2.0          0.200241    0.293939    0.383577    0.469314
L2, Frac-PDE-Net         0.000022    0.000033    0.000044    0.000054
L2, PDE-Net 2.0          0.001989    0.002930    0.003836    0.004708
Table 10. PDE model discovered by PDE-Net 2.0.
Predicted Terms by PDE-Net 2.0 with 1% Noise
F̂1(u, v):  $0.0001\,\partial_x^2 u - 3.95 \times 10^{-5}\,u - 6.05 \times 10^{-5}\,v - 0.0002$
F̂2(u, v):  $5.22 \times 10^{-5}\,\partial_x^2 v + 1.70 \times 10^{-5}\,u + 8.19 \times 10^{-6}\,v + 4.59 \times 10^{-5}$
Table 11. Errors of predicted solutions for u and v by Frac-PDE-Net and PDE-Net 2.0.
                              Frac-PDE-Net    PDE-Net 2.0
|ũ − u|, L∞ (1% noise)        0.062771        0.117773
|ũ − u|, L2 (1% noise)        0.000029        0.000060
|ṽ − v|, L∞ (1% noise)        0.009434        0.039400
|ṽ − v|, L2 (1% noise)        0.000010        0.000056
(Errors are computed over t ∈ [0, 10].)
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
