An Efficient Parallel Implementation of the Runge–Kutta Discontinuous Galerkin Method with Weighted Essentially Non-Oscillatory Limiters on Three-Dimensional Unstructured Meshes

Pei, Weicheng; Jiang, Yuyan; Li, Shu

doi:10.3390/app12094228


by Weicheng Pei, Yuyan Jiang and Shu Li *

School of Aeronautics Science and Engineering, Beihang University, 37 Xueyuan Road, Haidian District, Beijing 100191, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(9), 4228; https://doi.org/10.3390/app12094228

Submission received: 16 March 2022 / Revised: 18 April 2022 / Accepted: 20 April 2022 / Published: 22 April 2022

(This article belongs to the Topic Engineering Mathematics)


Abstract:

In computational fluid dynamics, high-order solvers suitable for three-dimensional unstructured meshes are attractive but are less developed than other methods. In this article, we provide the formulation and a parallel implementation of the Runge–Kutta discontinuous Galerkin finite element method with weighted essentially non-oscillatory limiters, which are compact and effective for suppressing numerical oscillations near discontinuities. In our experiments, high-order solvers outperform their low-order counterparts in accuracy, and the efficient parallel implementation makes the time cost affordable for large problems. Such high-order parallel solvers are efficient tools for solving conservation laws, including the Euler system that models inviscid compressible flows.

Keywords:

high-order CFD solvers; discontinuous Galerkin methods; WENO limiters; three-dimensional unstructured meshes; distributed memory parallelization

1. Introduction

Most of the computational fluid dynamics (CFD) solvers currently used in aerospace engineering are based on schemes using finite volume (FV) methods, which are more suitable than schemes using finite difference (FD) methods for unstructured meshes. However, FV schemes usually have only second-order spatial accuracy, due to the difficulty of handling irregular local stencils, whose sizes grow rapidly as the order of accuracy increases. In high-order FV schemes, the local stencil of a cell is made up of the cell, the cell’s neighbors, the neighbors’ neighbors, and so on. To ease the development of high-order schemes for unstructured meshes, it is better to use finite element (FE) methods, whose local stencils are much more compact than those of FV schemes. One family of such compact high-order schemes suitable for unstructured meshes is the discontinuous Galerkin (DG) methods, which assume a continuous approximation in each cell (like classic FE methods) but allow discontinuities to exist on cell boundaries (like classic FV methods). Such an assumption gives scheme designers more freedom in choosing the basis for each cell, such as orthogonal functions, which lead to modal DG methods, and Lagrange polynomials, which lead to nodal DG methods [1]. The original DG method was introduced by Reed and Hill [2] for solving the neutron transport equation, which is a linear problem containing only first-order spatial derivatives. For nonlinear conservation laws, Chavent and Salzano [3] made the first attempt at using a DG method to discretize in space and the explicit Euler method to discretize in time. This scheme is second-order accurate in space but only first-order accurate in time. To improve the accuracy of time discretization, Cockburn and Shu [4] replaced the first-order Euler method with a special second-order Runge–Kutta (RK) method [5,6], which is explicit and total variation diminishing (TVD). This was the first successful Runge–Kutta discontinuous Galerkin (RKDG) method, which is high-order accurate both in space and time, for scalar conservation laws. This RKDG method was soon extended to the one-dimensional system case [7], the multi-dimensional scalar case [8] and the multi-dimensional system case [9], which includes the Euler system that models inviscid compressible flows. At nearly the same time (late 1990s), the DG methods were also extended to the Navier–Stokes system that governs viscous flows. Bassi and Rebay [10] made the first attempt to apply the DG idea to both the unknowns and their gradients for solving the Navier–Stokes equations. This method was then generalized by Cockburn and Shu [11] to the family of local discontinuous Galerkin (LDG) methods, which extend the RKDG methods from conservation laws to convection-dominated problems. One may refer to [12], which is a comprehensive review article on this topic. In the rest of this article, we will only study the RKDG methods for conservation laws that are purely hyperbolic. In Section 2.1 and Section 2.3, we will give a matrix-form formulation of the RKDG method for solving three-dimensional conservation law systems.

When solving hyperbolic problems, limiting procedures, or limiters for short, are necessary for suppressing numerical oscillations that might occur near discontinuities (known as the Gibbs phenomenon [13]). For FD and FV schemes, there is the monotonic upstream-centered scheme for conservation laws (MUSCL) [14,15,16,17], which can achieve third-order accuracy when used with caution. However, it is not directly applicable to FE schemes. Besides the RKDG formulations, there is another important contribution made by [8,9], namely the generalized slope limiters for multi-dimensional problems. These minmod-type limiters, together with TVD RK methods, make the high-order solutions free from non-physical oscillations, but tend to reduce the order of accuracy in smooth regions. To overcome this drawback, the essentially non-oscillatory (ENO) [18] and weighted ENO (WENO) [19] limiting procedures were introduced. Both of them can maintain high-order accuracy in smooth regions while essentially suppressing spurious oscillations near discontinuities. The first attempt at making such limiters suitable for unstructured meshes was to incorporate a WENO reconstruction procedure into high-order FV schemes [20]. Similar ideas were later adopted for DG methods [21,22] at the cost of sacrificing the compactness of DG methods. Compact versions of WENO limiters suitable for DG methods have only appeared in the last ten years [23,24,25], and most of them are only formulated for two-dimensional meshes. In Section 2.2, we will give a unified formulation of some compact WENO limiters for both two- and three-dimensional RKDG methods on unstructured meshes.

The two-dimensional version of the algorithms described in Section 2.1, Section 2.2 and Section 2.3 was proposed nearly ten years ago, but few three-dimensional or real engineering applications have been reported so far. One reason is that the amount of computational resources consumed by each cell grows rapidly as the order of accuracy or the dimension of space increases. Fortunately, both RKDG methods and WENO limiters use highly compact stencils and there is no global algebraic equation to be solved (as in FD and FV schemes). Based on these facts, parallel computing using domain decomposition can be applied for accelerating computation in a very natural way. Currently, we have not seen any parallel implementation of RKDG methods with WENO limiters for three-dimensional unstructured meshes. However, there are some frequently referenced works in the literature that measured the parallel efficiencies of their DG methods. Biswas, Devine and Flaherty [26] performed some tests of their one- and two-dimensional DG solvers for linear scalar problems on uniform structured meshes. They achieved excellent parallel efficiencies, which were over 99% for pure solution time (without I/O) and at least 89% for total running time. Bey, Patra and Oden [27] tested their DG solvers for linear conservation laws on both structured (uniform) and unstructured ($hp$-adaptive) meshes. They obtained nearly optimal speedups when the number of interior elements is sufficiently larger than that of subdomain boundary elements. Recently, Chalmers et al. [28] implemented their DG solver for two-dimensional Navier–Stokes equations using MPI/OpenMP hybrid parallelism and achieved good scalability on a uniform mesh with only quadrilateral elements. None of them shows the parallel performance of DG solvers on three-dimensional unstructured meshes. In this article, we incorporate unstructured mesh partitioning and message passing into the algorithms and implement them on top of publicly available libraries to support parallel execution. To partition a three-dimensional unstructured mesh, we use the application programming interface (API) provided by the Metis library [29]. To send and receive messages, we use the message passing interface (MPI) [30], which is the de facto industry standard of distributed memory parallelization. We will give the details of these parallel programming techniques in Section 2.4.

2. Methods

2.1. Spatial Discretization

The differential form of a conservation law system in a three-dimensional space could be written as

\partial_{t} \underset{̲}{U} + \partial_{x} \underset{̲}{F^{x}} + \partial_{y} \underset{̲}{F^{y}} + \partial_{z} \underset{̲}{F^{z}} = \underset{̲}{O}

(1)

in which

$\partial_{t}, \partial_{x}, \partial_{y}, \partial_{z}$ represent $\frac{\partial}{\partial t}, \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}$ , respectively;
$\underset{̲}{U}$ is a $K \times 1$ matrix of unknowns, each row of which is a scalar function depending on position $\vec{x}$ and time t;
$\underset{̲}{F^{μ}}$ (where $μ = x, y, z$ ) is also a $K \times 1$ matrix, which is the dot-product of the flux $\underset{̲}{\vec{F}}$ (whose value depends on $\underset{̲}{U}$ ) and ${\vec{e}}_{μ}$ (which is a unit vector along the positive direction of the $μ$ -axis);
$\underset{̲}{O}$ is a $K \times 1$ matrix of 0’s.

This differential equation could be turned into an integral equation by multiplying both sides by an arbitrary function $V (\vec{x})$ and integrating the product over an arbitrary control volume $Ω$:

\int_{Ω} (\partial_{t} \underset{̲}{U} + \nabla \cdot \underset{̲}{\vec{F}}) V = \underset{̲}{O} .

To weaken the smoothness requirements on $\underset{̲}{\vec{F}}$, we apply integration by parts and Gauss’s divergence theorem to it, which leads to the weak form of Equation (1):

\int_{Ω} (V \partial_{t} \underset{̲}{U} - \underset{̲}{\vec{F}} \cdot \nabla V) + \oint_{\partial Ω} (\vec{ν} \cdot \underset{̲}{\vec{F}}) V = \underset{̲}{O},

(2)

where $\vec{ν}$ is the outer normal unit vector of the control surface $\partial Ω$ (which is the boundary of $Ω$).
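The step from the integral equation to the weak form can be spelled out as follows. This is the standard product-rule identity followed by the divergence theorem, written here with `\underline` marking a $K \times 1$ matrix; it is a worked restatement rather than additional material from the article.

```latex
\nabla \cdot (V \underline{\vec{F}})
  = V \, \nabla \cdot \underline{\vec{F}} + \underline{\vec{F}} \cdot \nabla V
\quad \Longrightarrow \quad
\int_{\Omega} V \, \nabla \cdot \underline{\vec{F}}
  = \oint_{\partial \Omega} (\vec{\nu} \cdot \underline{\vec{F}}) \, V
  - \int_{\Omega} \underline{\vec{F}} \cdot \nabla V
```

Adding the term $\int_{Ω} V \partial_{t} \underset{̲}{U}$ to both sides recovers the volume and surface integrals of the weak form, with the volume flux term carrying a minus sign.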

To introduce spatial discretization for Equation (2), we choose the linear space spanned by polynomials up to the $p$-th degree over $Ω$, denoted as $V^{p} (Ω)$, as the approximation space. Let $\underset{̲}{ϕ} (\vec{x}) = [\begin{matrix} ϕ_{1} (\vec{x}) & \dots & ϕ_{L} (\vec{x}) \end{matrix}]$ be a basis of $V^{p} (Ω)$. In this article, we choose

ϕ_{1} (\vec{x}) = 1, [\begin{matrix} ϕ_{2} (\vec{x}) \\ ϕ_{3} (\vec{x}) \\ ϕ_{4} (\vec{x}) \end{matrix}] = [\begin{matrix} x - x_{0} \\ y - y_{0} \\ z - z_{0} \end{matrix}], [\begin{matrix} ϕ_{5} (\vec{x}) \\ ϕ_{6} (\vec{x}) \\ ϕ_{7} (\vec{x}) \\ ϕ_{8} (\vec{x}) \\ ϕ_{9} (\vec{x}) \\ ϕ_{10} (\vec{x}) \end{matrix}] = [\begin{matrix} (x - x_{0}) (x - x_{0}) \\ (x - x_{0}) (y - y_{0}) \\ (x - x_{0}) (z - z_{0}) \\ (y - y_{0}) (y - y_{0}) \\ (y - y_{0}) (z - z_{0}) \\ (z - z_{0}) (z - z_{0}) \end{matrix}], \dots,

in which $(x_{0}, y_{0}, z_{0})$ is the geometric center of $Ω$. Then $\underset{̲}{U}$ and $V$ could be approximated as

\underset{̲}{U} (\vec{x}, t) \approx {\underset{̲}{U}}^{h} (\vec{x}, t) = \sum_{l = 1}^{L} {\hat{\underset{̲}{U}}}_{l} (t) ϕ_{l} (\vec{x}), V (\vec{x}) \approx V^{h} (\vec{x}) = \sum_{l = 1}^{L} {\hat{V}}_{l} ϕ_{l} (\vec{x}),

where each ${\hat{\underset{̲}{U}}}_{l} (t)$ is a $K \times 1$ matrix of temporal functions, and each ${\hat{V}}_{l}$ is a constant number (which is arbitrary since $V (\vec{x})$ is arbitrary). Substituting them into the weak form (Equation (2)), we obtain

\sum_{l} {\hat{V}}_{l} [\sum_{k} (\int_{Ω} ϕ_{l} ϕ_{k}) \frac{d {\underset{̲}{\hat{U}}}_{k}}{d t} - \int_{Ω} (\nabla ϕ_{l}) \cdot \underset{̲}{\vec{F}} ({\underset{̲}{U}}^{h}) + \oint_{\partial Ω} ϕ_{l} \underset{̲}{F^{ν}} ({\underset{̲}{U}}_{I}^{h}, {\underset{̲}{U}}_{O}^{h})] = \underset{̲}{O}

(3)

where $\vec{ν} \cdot \underset{̲}{\vec{F}} = : \underset{̲}{F^{ν}}$ is the normal flux on $\partial Ω$, whose value is determined from ${\underset{̲}{U}}_{I}^{h}$ (the approximated inner-side state) and ${\underset{̲}{U}}_{O}^{h}$ (the approximated outer-side state). We implement this procedure as an independent module called the Riemann solver of the conservation law (Equation (1)); see [31] for details.

Recalling the arbitrariness of ${\{{\hat{V}}_{l}\}}_{l = 1}^{L}$ and adopting the inner-product notation

〈 f | g 〉 : = \int_{Ω} f (\vec{x}) g (\vec{x}),

we could turn Equation (3) into a system of ordinary differential equations:

\frac{d {\underset{̲}{\hat{U}}}_{K \times L}}{d t} = {\underset{̲}{B}}_{K \times L} {\underset{̲}{A}}_{L \times L}^{- 1} = : {\underset{̲}{R}}_{K \times L}

(4)

in which

\underset{̲}{\hat{U}} (t) = {[\begin{matrix} 〈 U_{1} | ϕ_{1} 〉 & \dots & 〈 U_{1} | ϕ_{L} 〉 \\ ⋮ & ⋱ & ⋮ \\ 〈 U_{K} | ϕ_{1} 〉 & \dots & 〈 U_{K} | ϕ_{L} 〉 \end{matrix}]}_{K \times L}

is the matrix of temporal functions (which will be solved in Section 2.3), and

\underset{̲}{A} = {[\begin{matrix} 〈 ϕ_{1} | ϕ_{1} 〉 & \dots & 〈 ϕ_{1} | ϕ_{L} 〉 \\ ⋮ & ⋱ & ⋮ \\ 〈 ϕ_{L} | ϕ_{1} 〉 & \dots & 〈 ϕ_{L} | ϕ_{L} 〉 \end{matrix}]}_{L \times L}

is a constant matrix for a given $Ω$, and

\underset{̲}{B} = \int_{Ω} {[\begin{matrix} \underset{̲}{F^{x}} & \underset{̲}{F^{y}} & \underset{̲}{F^{z}} \end{matrix}]}_{K \times 3} {[\begin{matrix} \partial_{x} \underset{̲}{ϕ} \\ \partial_{y} \underset{̲}{ϕ} \\ \partial_{z} \underset{̲}{ϕ} \end{matrix}]}_{3 \times L} - \oint_{\partial Ω} {\underset{̲}{F^{ν}}}_{K \times 1} {\underset{̲}{ϕ}}_{1 \times L}

(5)

is a variable matrix depending on $\underset{̲}{\hat{U}}$, and so is the residual matrix $\underset{̲}{R} = \underset{̲}{B} {\underset{̲}{A}}^{- 1}$. By applying the Gram–Schmidt orthonormalization to $\underset{̲}{ϕ}$, the constant matrix $\underset{̲}{A}$ becomes an identity matrix, which leads to $\underset{̲}{R} = \underset{̲}{B}$. The integrals in $\underset{̲}{B}$ are evaluated by Gaussian quadrature rules. For triangular and tetrahedral cells, we use the quadrature rules given in [32].
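The effect of the Gram–Schmidt step on the matrix of inner products can be checked in a one-dimensional analogue. The sketch below is an illustration under stated assumptions, not the article's implementation: the cell is the interval $[-1, 1]$, the raw basis is $1, x, x^{2}$ centered at $x_{0} = 0$, the inner product is evaluated with the three-point Gauss–Legendre rule, and the names `inner` and `gram_schmidt` are hypothetical.

```python
import math

# Three-point Gauss-Legendre rule on [-1, 1]; exact for polynomials up to degree 5,
# which covers every product of two basis functions of degree <= 2.
NODES = (-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0))
WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)


def inner(f, g):
    """Quadrature approximation of <f|g>, the integral of f*g over [-1, 1]."""
    return sum(w * f(x) * g(x) for x, w in zip(NODES, WEIGHTS))


def gram_schmidt(raw_basis):
    """Orthonormalize a list of functions with respect to inner()."""
    ortho = []
    for f in raw_basis:
        # Components of f along the functions that are already orthonormal.
        coeffs = [inner(f, q) for q in ortho]
        prev = list(ortho)

        def residual(x, f=f, coeffs=coeffs, prev=prev):
            return f(x) - sum(c * q(x) for c, q in zip(coeffs, prev))

        norm = math.sqrt(inner(residual, residual))
        ortho.append(lambda x, r=residual, n=norm: r(x) / n)
    return ortho


# Raw monomial basis (1D analogue of the basis chosen in the text, hypothetical cell).
raw = [lambda x: 1.0, lambda x: x, lambda x: x * x]
phi = gram_schmidt(raw)

raw_mass = [[inner(p, q) for q in raw] for p in raw]   # not diagonal
mass = [[inner(p, q) for q in phi] for p in phi]       # identity up to round-off
```

With the raw monomials, the entry coupling $1$ and $x^{2}$ is nonzero, so the matrix would have to be inverted; after orthonormalization the matrix reduces to the identity and the inversion disappears.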

2.2. Limiting Procedures

The limiters used in this article were originally designed in [23,24]. Zhong [23] gives the formulation for two-dimensional structured meshes, and Zhu [24] extends it to two-dimensional unstructured meshes. Here we present a unified formulation for both two- and three-dimensional unstructured meshes. To simplify subscripts, we denote ${ψ |}_{E_{i}}$ (the restriction of function $ψ$ to element $E_{i}$) as $ψ_{i}$, and ${〈 ψ 〉}_{E_{i}}$ (the average value of $ψ$ on $E_{i}$) as ${〈 ψ 〉}_{i}$. Let $K_{i}$ be the index set of $E_{i}$’s neighbors (those elements adjacent to $E_{i}$), and $K_{i}^{+} : = K_{i} \cup \{i\}$.

2.2.1. The `ScalarWeno` Limiter

To reconstruct a scalar-valued function $ψ$ on $E_{i}$, we first borrow the expression of $ψ$ from $E_{k}$ to $E_{i}$:

ψ_{k \to i} (\vec{x}) : = ψ_{k} (\vec{x}) - {〈 ψ_{k} 〉}_{i} + {〈 ψ_{i} 〉}_{i}

(6)

for each $k \in K_{i}^{+}$. The key idea of WENO limiters is to build a convex combination of these borrowed functions:

ψ_{i}^{new} (\vec{x}) : = \sum_{k \in K_{i}^{+}} w_{k \to i} ψ_{k \to i} (\vec{x}) .

(7)

The non-negative weight $w_{k \to i}$ should be determined from the smoothness of $ψ_{k \to i}$, so we then calculate the smoothness indicator of $ψ_{k \to i}$ for each $k \in K_{i}^{+}$:

β_{k \to i} = \sum_{| α | = 1}^{p} \frac{l_{i}^{2 | α |}}{| E_{i} |} \int_{E_{i}} {(\frac{\partial^{| α |} ψ_{k \to i}}{\partial x_{1}^{α_{1}} \dots \partial x_{d}^{α_{d}}})}^{2}, | α | : = α_{1} + \dots + α_{d}

(8)

in which $| E_{i} | : = \int_{E_{i}} 1$ is the measure (i.e., “area” for $d = 2$, “volume” for $d = 3$) of $E_{i}$, and $l_{i} : = \sqrt[d]{| E_{i} |}$ is the approximate length of $E_{i}$’s edges. Once we have these $β$’s, the weight for each $ψ_{k \to i}$ can be constructed as

w_{k \to i} = \frac{w_{k \to i}^{β}}{\sum_{j \in K_{i}^{+}} w_{j \to i}^{β}}, w_{k \to i}^{β} : = \frac{w_{k \to i}^{✶}}{{(ε_{0} + β_{k \to i})}^{2}},

(9)

in which

w_{k \to i}^{✶} = \{\begin{matrix} ε_{1} & k \neq i \\ 1 - \sum_{j \in K_{i}} ε_{1} & k = i \end{matrix}

(10)

are called the ideal weights. The $ε$’s in Equations (9) and (10) are artificial parameters, and we use $ε_{0} = 10^{- 6}$ and $ε_{1} = 10^{- 3}$ as suggested by [23,24]. This limiting procedure for scalar-valued functions is independent of the conservation laws to be solved, so it can be programmed as an independent module, which we name the ScalarWeno limiter.
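The weight formulas in Equations (9) and (10) can be sketched in a few lines once the smoothness indicators are known. The function below is a minimal illustration, not the article's code: the name `weno_weights`, the dictionary layout and the sample indicator values are all hypothetical, and computing the $β$’s themselves (Equation (8)) is left out.

```python
def weno_weights(betas, self_key, eps0=1e-6, eps1=1e-3):
    """Nonlinear weights w_{k->i} from smoothness indicators (Eqs. (9)-(10)).

    betas    : dict mapping each k in K_i^+ to beta_{k->i}
    self_key : the key that stands for the cell E_i itself
    """
    n_neighbors = len(betas) - 1
    unnormalized = {}
    for k, beta in betas.items():
        # Ideal weights, Eq. (10): eps1 for each neighbor, the remainder for E_i.
        ideal = 1.0 - n_neighbors * eps1 if k == self_key else eps1
        # Eq. (9): penalize candidates with large smoothness indicators.
        unnormalized[k] = ideal / (eps0 + beta) ** 2
    total = sum(unnormalized.values())
    return {k: w / total for k, w in unnormalized.items()}


# Hypothetical tetrahedron (key 0) with four neighbors (keys 1..4).
smooth = weno_weights({0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5, 4: 0.5}, self_key=0)
rough = weno_weights({0: 100.0, 1: 0.01, 2: 0.01, 3: 0.01, 4: 0.01}, self_key=0)
```

When all indicators are equal, the weights fall back to the ideal ones, so the cell essentially keeps its own polynomial; when the cell's own polynomial is much rougher than the borrowed ones, almost all of the weight moves to the neighbors.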

2.2.2. The `EigenWeno` Limiter

For a conservation law system (Equation (1)), the value of the conservative variable $\underset{̲}{U}$ is a column matrix, for which the following limiter is recommended. The first step is to obtain the $ν$-split form of Equation (1) on the interface shared by $E_{i}$ and its neighbor $E_{k}$ (for each $k \in K_{i}$):

\partial_{t} \underset{̲}{U} + \partial_{ν} \underset{̲}{F^{ν}} = \underset{̲}{O},

(11)

where $\partial_{ν} : = \vec{ν} \cdot \nabla$ is the directional derivative operator and $\underset{̲}{F^{ν}} : = \vec{ν} \cdot \underset{̲}{\vec{F}}$ is the normal flux (as in Equation (3)). It can be treated as a one-dimensional conservation law system whose flux Jacobian can be approximated by evaluating it at the average value of $\underset{̲}{U}$ on $E_{i}$:

\underset{̲}{A^{ν}} = {\frac{\partial \underset{̲}{F^{ν}}}{\partial \underset{̲}{U}}|}_{{〈 \underset{̲}{U} 〉}_{i}} .

(12)

For a hyperbolic system, it is guaranteed that the $K \times K$ matrix $\underset{̲}{A^{ν}}$ has $K$ real eigenvalues and has the eigenvalue decomposition

\underset{̲}{A^{ν}} = \underset{̲}{R} [\begin{matrix} λ_{1} & & \\ & ⋱ & \\ & & λ_{K} \end{matrix}] {\underset{̲}{R}}^{- 1}, \underset{̲}{R} : = [\begin{matrix} \underset{̲}{r_{1}} & \dots & \underset{̲}{r_{K}} \end{matrix}],

(13)

where $\underset{̲}{r_{k}}$ (the $k$-th column of $\underset{̲}{R}$) is an eigenvector corresponding to the $k$-th eigenvalue $λ_{k}$ (for $k = 1, \dots, K$). Once $\underset{̲}{R}$ is obtained, the original conservative variable $\underset{̲}{U}$ can be projected into the space spanned by the $\underset{̲}{r}$’s, which gives the characteristic variable

\underset{̲}{V} : = {\underset{̲}{R}}^{- 1} \underset{̲}{U} .

(14)

The next step is then to apply the ScalarWeno limiter (Equations (6)–(10)) to each scalar component of $\underset{̲}{V}$, which gives ${\underset{̲}{V}}^{new}$. After obtaining the reconstructed characteristic variable ${\underset{̲}{V}}^{new}$, it can be turned back into the original conservative variable

{\underset{̲}{U}}_{k \to i}^{new} : = \underset{̲}{R} {\underset{̲}{V}}^{new},

(15)

in which the subscript $k \to i$ means that it is a function defined on $E_{i}$, which is constructed with the help of $E_{k}$. The final step is to weight these reconstructed conservative variables by the measure of the corresponding adjacent element:

{\underset{̲}{U}}_{i}^{new} : = \frac{\sum_{k \in K_{i}} {\underset{̲}{U}}_{k \to i}^{new} | E_{k} |}{\sum_{k \in K_{i}} | E_{k} |} .

(16)

Since the eigenvalue decomposition (Equation (13)) plays a central role in this limiting procedure, we would like to name it as the EigenWeno limiter.
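The data flow of Equations (14)–(16) can be sketched on plain coefficient vectors. This is a simplified illustration under several assumptions: each "function" is reduced to a single vector of $K$ numbers rather than a polynomial, the eigenvector matrices are the $2 \times 2$ pair that appears in Problem 2 of this article instead of being computed per interface, and `clip` is a purely hypothetical placeholder for the ScalarWeno limiter, not a WENO reconstruction. All function names are hypothetical.

```python
def matvec(matrix, vector):
    """Product of a small dense matrix (list of rows) and a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def limit_in_characteristic_space(u, r, r_inv, scalar_limiter):
    """Project, limit each characteristic component, and project back."""
    v = matvec(r_inv, u)                        # Eq. (14): V = R^{-1} U
    v_new = [scalar_limiter(c) for c in v]      # scalar limiter, component by component
    return matvec(r, v_new)                     # Eq. (15): U_{k->i}^{new} = R V^{new}


def measure_weighted_mean(candidates, measures):
    """Eq. (16): average the U_{k->i}^{new} weighted by the measures |E_k|."""
    total = sum(measures)
    size = len(candidates[0])
    return [sum(c[j] * m for c, m in zip(candidates, measures)) / total
            for j in range(size)]


R = [[1.0, 1.0], [-1.0, 1.0]]
R_INV = [[0.5, -0.5], [0.5, 0.5]]
u = [12.0, -4.0]

unchanged = limit_in_characteristic_space(u, R, R_INV, lambda c: c)
clip = lambda c: max(-5.0, min(5.0, c))         # placeholder limiter (assumption)
clipped = limit_in_characteristic_space(u, R, R_INV, clip)
averaged = measure_weighted_mean([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
```

An identity limiter returns the input unchanged, which is a quick consistency check on the pair $\underset{̲}{R}$, ${\underset{̲}{R}}^{- 1}$ before a real limiter is plugged in.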

2.2.3. The `LazyWeno` Limiter

The EigenWeno limiter (Equations (11)–(16)) works well on two-dimensional meshes in [23,24] and on three-dimensional meshes in this article. However, it depends on the conservation law system to be solved and thus is not applicable if the eigenvalue decomposition (Equation (13)) is not easily computable, or if the task is to design a limiter for general matrix-valued functions (not necessarily the conservative variable of a conservation law system). In either case, one could simply apply the ScalarWeno limiter (Equations (6)–(10)) to each scalar component of $\underset{̲}{U}$, which is a matrix-valued function. Since this limiting procedure requires less derivation and fewer computational resources, we name it the LazyWeno limiter.

2.3. Temporal Discretization

Equation (4) is a typical nonlinear ordinary differential equation system, which can be solved by various numerical methods, such as the Runge–Kutta methods (see [33]). However, to preserve the total variation diminishing (TVD) property of the solution, the method itself should be TVD [34] and some kind of limiter (already discussed in Section 2.2) should be carefully incorporated into it. In this article, we follow the practice of [23,24], which use the explicit third-order TVD Runge–Kutta method:

\begin{matrix} {\underset{̲}{\hat{U}}}^{n + 1 / 3} & = {\underset{̲}{\hat{U}}}^{n} + {\underset{̲}{R}}^{n} Δ t \\ {\underset{̲}{\hat{U}}}^{n + 2 / 3} & = \frac{3}{4} {\underset{̲}{\hat{U}}}^{n} + \frac{1}{4} ({\underset{̲}{\hat{U}}}^{n + 1 / 3} + {\underset{̲}{R}}^{n + 1 / 3} Δ t) \\ {\underset{̲}{\hat{U}}}^{n + 1} \equiv {\underset{̲}{\hat{U}}}^{n + 3 / 3} & = \frac{1}{3} {\underset{̲}{\hat{U}}}^{n} + \frac{2}{3} ({\underset{̲}{\hat{U}}}^{n + 2 / 3} + {\underset{̲}{R}}^{n + 2 / 3} Δ t) \end{matrix}

(17)

in which integers in superscripts mark time steps, and fractions in superscripts represent intermediate stages. The values of the right-hand side (RHS) expressions are not guaranteed to be TVD, so limiting procedures must be applied before assigning them to the left-hand side (LHS). To make this clearer, we introduce a nonlinear operator $L$ (which stands for limiter) into the RHS of Equation (17), which gives

\begin{matrix} {\underset{̲}{\hat{U}}}^{n + 1 / 3} & = L [{\underset{̲}{\hat{U}}}^{n} + {\underset{̲}{R}}^{n} Δ t] \\ {\underset{̲}{\hat{U}}}^{n + 2 / 3} & = L [\frac{3}{4} {\underset{̲}{\hat{U}}}^{n} + \frac{1}{4} ({\underset{̲}{\hat{U}}}^{n + 1 / 3} + {\underset{̲}{R}}^{n + 1 / 3} Δ t)] \\ {\underset{̲}{\hat{U}}}^{n + 1} \equiv {\underset{̲}{\hat{U}}}^{n + 3 / 3} & = L [\frac{1}{3} {\underset{̲}{\hat{U}}}^{n} + \frac{2}{3} ({\underset{̲}{\hat{U}}}^{n + 2 / 3} + {\underset{̲}{R}}^{n + 2 / 3} Δ t)] \end{matrix}

(18)

This notation clearly emphasizes the application of limiters.
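The three stages of Equation (18) translate almost literally into code. The sketch below is an illustration, not the article's implementation: the state is a flat list of numbers instead of a $K \times L$ matrix per cell, `residual` stands for the map from the coefficients to $\underset{̲}{R}$, and the default `limiter` is the identity, which reduces the scheme to Equation (17). The function name and the test problem $dU/dt = -U$ are assumptions made for the example.

```python
def rk3_tvd_step(u, residual, dt, limiter=lambda w: w):
    """Advance the state u by one time step of the third-order TVD RK method.

    u        : list of floats (flattened coefficients)
    residual : callable mapping a state to dU/dt, as a list of the same length
    limiter  : callable applied to every stage result (the operator L in Eq. (18))
    """
    # Stage 1: forward Euler step, then limit.
    r0 = residual(u)
    u1 = limiter([a + dt * r for a, r in zip(u, r0)])
    # Stage 2: convex combination of u and an Euler step from u1, then limit.
    r1 = residual(u1)
    u2 = limiter([0.75 * a + 0.25 * (b + dt * r) for a, b, r in zip(u, u1, r1)])
    # Stage 3: convex combination of u and an Euler step from u2, then limit.
    r2 = residual(u2)
    return limiter([a / 3.0 + 2.0 / 3.0 * (b + dt * r) for a, b, r in zip(u, u2, r2)])


# Linear test problem dU/dt = -U (hypothetical), one step of size 0.1.
dt = 0.1
u_next = rk3_tvd_step([1.0], lambda w: [-x for x in w], dt)
taylor = 1.0 - dt + dt ** 2 / 2.0 - dt ** 3 / 6.0
```

For a linear residual, one step reproduces the Taylor expansion of the exact solution through the cubic term, which is a convenient check that the stage coefficients were typed correctly.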

2.4. Parallel Programming

Both the flux integrals (Equation (5)) in the DG method and the function borrowing (Equation (6)) in WENO limiters require each cell to access its neighbors in $\mathcal{O}(1)$ time, which is not supported by commonly used mesh formats. For this reason, we do not partition the input mesh directly using the METIS_PartMeshDual(...) function as in traditional finite element methods, but convert the mesh to its dual graph with the METIS_MeshToDual(...) function, and then partition the graph using the METIS_PartGraphKway(...) function. The METIS_MeshToDual(...) function stores the cell adjacency information in dynamically allocated arrays pointed to by raw pointers, which should then be released exactly once by some caller in the call stack. To avoid memory bugs, we suggest wrapping such raw pointers in smart pointers, such as those provided by the standard library of modern C++ [35].
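To make the notion of a dual graph concrete, the sketch below builds the cell adjacency of a small tetrahedral mesh in pure Python. It does not call METIS and is not the article's code; it only illustrates the kind of information that the mesh-to-dual conversion provides, under the assumption that two tetrahedra are adjacent exactly when they share a triangular face (three common nodes). The function name and the toy mesh are hypothetical.

```python
from itertools import combinations


def tetra_dual_graph(cells):
    """Return adjacency lists of the dual graph of a tetrahedral mesh.

    cells : list of 4-tuples of node indices, one tuple per tetrahedron
    """
    # Map every triangular face (as a sorted node triple) to the cells that own it.
    face_to_cells = {}
    for cell_id, nodes in enumerate(cells):
        for face in combinations(sorted(nodes), 3):
            face_to_cells.setdefault(face, []).append(cell_id)
    # An interior face is owned by exactly two cells, which are then neighbors.
    adjacency = [[] for _ in cells]
    for owners in face_to_cells.values():
        if len(owners) == 2:
            a, b = owners
            adjacency[a].append(b)
            adjacency[b].append(a)
    return adjacency


# Three tetrahedra in a row: 0 and 1 share face (1, 2, 3), 1 and 2 share face (2, 3, 4).
toy_cells = [(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5)]
neighbors = tetra_dual_graph(toy_cells)
```

Once such adjacency lists are stored per cell, looking up a neighbor is a constant-time index operation, which is the property that the flux integrals and the function borrowing rely on.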

Once we obtain the partitioning, each part of the mesh should then be loaded by a process, which holds and updates local data sequentially and shares data on inter-part boundaries with neighbors when necessary. This is a typical scenario of distributed memory parallelization, which achieves acceleration by solving roughly equal-sized subproblems simultaneously on multiple cores. In comparison, shared memory parallelization, which provides a global memory address space shared by multiple threads, is generally easier to program but less scalable. The price we pay for scalability is the explicit management of message passing for sharing data between processes. Thanks to the publicly available implementations of the MPI standard, such as MPICH (https://www.mpich.org) and Open-MPI (https://www.open-mpi.org), the code for doing this is much simpler than it used to be. To improve the readability and maintainability of our code, we wrap these communication operations in functions named ShareSomething(...), which share the same code structure:

For each destination, put the data to be sent into a sending buffer and register a request of sending by calling the MPI_Isend(...) function.
For each source, allocate a receiving buffer for the data to be received and register a request of receiving by calling the MPI_Irecv(...) function.
Perform other computations that can be conducted without communications.
Block the process until all its requests complete by calling the MPI_Waitall(...) function.

The third step is optional but may help to improve parallel efficiency, since it allows computations to overlap with communications.
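The four-step structure can be mimicked without an MPI installation by letting threads play the role of processes and unbounded queues play the role of message buffers. The sketch below is only a thread-based analogue written for illustration: it does not use MPI, the names `share_values` and `run_ring` are hypothetical, and in a real solver the commented steps would be calls to MPI_Isend(...), MPI_Irecv(...) and MPI_Waitall(...).

```python
import queue
import threading


def share_values(rank, neighbors, value, mailboxes, local_work):
    """Exchange `value` with all neighbors while overlapping local work."""
    # Step 1 (stands in for MPI_Isend): post one send per destination.
    # Putting into an unbounded queue returns immediately.
    for dst in neighbors:
        mailboxes[(rank, dst)].put(value)
    # Step 2 (stands in for MPI_Irecv): register one pending receive per source.
    pending = {src: mailboxes[(src, rank)] for src in neighbors}
    # Step 3: do work that needs no remote data while messages are in flight.
    interior = local_work(value)
    # Step 4 (stands in for MPI_Waitall): block until every receive has completed.
    ghosts = {src: box.get(timeout=10.0) for src, box in pending.items()}
    return interior, ghosts


def run_ring(n_ranks):
    """Run n_ranks threads; each one exchanges 10 * rank with its ring neighbors."""
    mailboxes = {(s, d): queue.Queue()
                 for s in range(n_ranks) for d in range(n_ranks) if s != d}
    results = [None] * n_ranks

    def worker(rank):
        neighbors = sorted({(rank - 1) % n_ranks, (rank + 1) % n_ranks})
        results[rank] = share_values(rank, neighbors, 10 * rank,
                                     mailboxes, lambda v: v + 1)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


ring = run_ring(4)
```

Because every participant posts all of its sends before waiting on any receive, no ordering of the threads can lead to a deadlock, which is the same reason the non-blocking MPI pattern is preferred over paired blocking calls.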

3. Results

In this section, we give the results of various numerical experiments to show the accuracy and performance of the methods described in Section 2. Even though all these experiments can be carried out on one- or two-dimensional structured meshes, we intentionally solve them on three-dimensional unstructured meshes. In this way, the applicability of our solvers for real engineering problems could be demonstrated.

3.1. Linear Conservation Laws

The first group of problems to be solved is the linear version of Equation (1):

\partial_{t} \underset{̲}{U} + \underset{̲}{A^{x}} \partial_{x} \underset{̲}{U} + \underset{̲}{A^{y}} \partial_{y} \underset{̲}{U} + \underset{̲}{A^{z}} \partial_{z} \underset{̲}{U} = \underset{̲}{O}

(19)

with certain boundary and initial conditions. These problems are mathematically simple in the sense that they can be solved analytically. The existence of analytic solutions gives us a good way to measure the accuracy of our numerical solvers. In this subsection, we use tetrahedral meshes generated in a $[0, 4] \times [0, 1] \times [0, 0.5]$ box.

3.1.1. Scalar Case

This is the simplest case of Equation (19):

Problem 1.

In Equation (19), let $\underset{̲}{U}$ consist of only one component and each $\underset{̲}{A}$ consist of a single number:

\underset{̲}{U} (\vec{x}, t) = [\begin{matrix} U (x, y, z, t) \end{matrix}], \underset{̲}{A^{x}} = [\begin{matrix} - 10 \end{matrix}], \underset{̲}{A^{y}} = \underset{̲}{A^{z}} = [\begin{matrix} 0 \end{matrix}] .

The following boundary conditions

\underset{̲}{U} (x = 0, y, z, t) = [\begin{matrix} - 10 \end{matrix}] = : {\underset{̲}{U}}_{L}, \underset{̲}{U} (x = 4, y, z, t) = [\begin{matrix} + 10 \end{matrix}] = : {\underset{̲}{U}}_{R},

and the initial condition

\underset{̲}{U} (x, y, z, t = 0) = {\underset{̲}{U}}_{L}

are applied.

The analytic solution of Problem 1 is:

\underset{̲}{U} (x, y, z, t) = \{\begin{matrix} {\underset{̲}{U}}_{L} & x - 4 < - 10 t \\ {\underset{̲}{U}}_{R} & x - 4 > - 10 t \end{matrix}

which can be interpreted as a left-running plane wave, whose profile is a jump initially positioned at the right end ($x = 4$). To compare the accuracy of various schemes, we plot the meshes and the numerical solutions at the same moment ($t = 0.2$) in Figure 1, Figure 2, Figure 3 and Figure 4. The white-colored regions in these figures are continuous transition layers, which are inevitable due to the dissipation of numerical schemes. The thickness of such a transition layer, however, can be used as an indicator of the scheme’s accuracy. Ideally, the thickness should be infinitesimal, as in the analytic solution. The differences between these results are more obvious in Figure 5 and Figure 6, in which we plot the values at 1001 uniformly distributed sample points along the longitudinal axis (on which $y = 0.5$ and $z = 0.25$) for each solver.

In Table 1, we show the measured error and time cost of each solver/mesh pair. The seconds consumed by first-order solutions are somewhat exaggerated, since we use the same high-order quadrature rules for both first- and third-order solvers, which is necessary to integrate non-constant errors. If we did not have to measure the errors, then low-order numerical integrators could be used, which might save some time.

The following conclusions can be drawn from Figure 1, Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6 together with Table 1:

Both mesh refinement (decreasing h) and order increment (increasing p) can help to improve accuracy.
The solver of the highest order ( $p = 3$ ) on the coarsest ( $h \approx 2^{- 2}$ ) mesh beats the solver of the lowest order ( $p = 1$ ) on the finest ( $h \approx 2^{- 5}$ ) mesh in accuracy and also saves quite a lot of time.
High-order schemes are better than low-order ones in the sense of getting the same level of accuracy with less time cost.

3.1.2. System Case

Problem 2.

In Equation (19), let $\underset{̲}{U}$ consist of two components and each $\underset{̲}{A}$ be a $2 \times 2$ matrix:

\underset{̲}{U} (\vec{x}, t) = [\begin{matrix} U_{1} (x, y, z, t) \\ U_{2} (x, y, z, t) \end{matrix}], \underset{̲}{A^{x}} = [\begin{matrix} 6 & - 2 \\ - 2 & 6 \end{matrix}], \underset{̲}{A^{y}} = \underset{̲}{A^{z}} = [\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix}],

The following boundary conditions

\underset{̲}{U} (x = 0, y, z, t) = [\begin{matrix} 0 \\ 0 \end{matrix}] = : {\underset{̲}{U}}_{L}, \underset{̲}{U} (x = 4, y, z, t) = [\begin{matrix} 12 \\ - 4 \end{matrix}] = : {\underset{̲}{U}}_{R},

and the initial condition

\underset{̲}{U} (x, y, z, t = 0) = {\underset{̲}{U}}_{R}

are applied.

To solve this problem analytically, we first obtain the eigenvalue decomposition of $\underset{̲}{A^{x}}$, which is

\underset{\underset{̲}{A^{x}}}{\underset{︸}{[\begin{matrix} 6 & - 2 \\ - 2 & 6 \end{matrix}]}} = \underset{\underset{̲}{R^{x}}}{\underset{︸}{[\begin{matrix} 1 & 1 \\ - 1 & 1 \end{matrix}]}} \underset{\underset{̲}{Λ^{x}}}{\underset{︸}{[\begin{matrix} 8 & \\ & 4 \end{matrix}]}} \underset{{(\underset{̲}{R^{x}})}^{- 1}}{\underset{︸}{[\begin{matrix} 1 / 2 & - 1 / 2 \\ 1 / 2 & 1 / 2 \end{matrix}]}}

By introducing the characteristic variable $\underset{̲}{V} : = {(\underset{̲}{R^{x}})}^{- 1} \underset{̲}{U}$, which means

[\begin{matrix} V_{1} \\ V_{2} \end{matrix}] : = [\begin{matrix} 1 / 2 & - 1 / 2 \\ 1 / 2 & 1 / 2 \end{matrix}] [\begin{matrix} U_{1} \\ U_{2} \end{matrix}] = \frac{1}{2} [\begin{matrix} U_{1} - U_{2} \\ U_{1} + U_{2} \end{matrix}],

the boundary conditions become

{\underset{̲}{V}}_{L} = {(\underset{̲}{R^{x}})}^{- 1} {\underset{̲}{U}}_{L} = [\begin{matrix} 0 \\ 0 \end{matrix}], {\underset{̲}{V}}_{R} = {(\underset{̲}{R^{x}})}^{- 1} {\underset{̲}{U}}_{R} = [\begin{matrix} 8 \\ 4 \end{matrix}],

and the system can be decoupled:

[\begin{matrix} \partial_{t} + 8 \partial_{x} \\ \partial_{t} + 4 \partial_{x} \end{matrix}] [\begin{matrix} V_{1} \\ V_{2} \end{matrix}] = [\begin{matrix} 0 \\ 0 \end{matrix}] .

We can then solve these two scalar problems independently, which gives

V_{1} (x, y, z, t) = \{\begin{matrix} 0 & x < 8 t \\ 8 & x > 8 t \end{matrix}, V_{2} (x, y, z, t) = \{\begin{matrix} 0 & x < 4 t \\ 4 & x > 4 t \end{matrix} .

The solution of Problem 2 can be obtained by $\underset{̲}{U} = \underset{̲}{R^{x}} \underset{̲}{V}$, which gives

\underset{̲}{U} (\vec{x}, t) = [\begin{matrix} V_{2} + V_{1} \\ V_{2} - V_{1} \end{matrix}] = \{\begin{matrix} {\underset{̲}{U}}_{L} & x / t < 4 \\ {\underset{̲}{U}}_{M} & x / t \in (4, 8) \\ {\underset{̲}{U}}_{R} & x / t > 8 \end{matrix},

(20)

where

{\underset{̲}{U}}_{L} = [\begin{matrix} 0 \\ 0 \end{matrix}], {\underset{̲}{U}}_{M} = [\begin{matrix} 4 \\ 4 \end{matrix}], {\underset{̲}{U}}_{R} = [\begin{matrix} 12 \\ - 4 \end{matrix}] .
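The derivation above can be checked numerically. The sketch below multiplies the three factors of the eigenvalue decomposition back together and evaluates the piecewise-constant solution of Equation (20) at a few points; the helper names `matmul` and `exact_solution` and the sample points are hypothetical choices made for this illustration.

```python
def matmul(a, b):
    """Product of two small dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


A_X = [[6.0, -2.0], [-2.0, 6.0]]
R_X = [[1.0, 1.0], [-1.0, 1.0]]
LAMBDA_X = [[8.0, 0.0], [0.0, 4.0]]
R_X_INV = [[0.5, -0.5], [0.5, 0.5]]

# R * Lambda * R^{-1} should reproduce A^x.
rebuilt = matmul(matmul(R_X, LAMBDA_X), R_X_INV)


def exact_solution(x, t):
    """Analytic solution of Problem 2 (Equation (20)) at position x and time t > 0."""
    v1 = 0.0 if x < 8.0 * t else 8.0   # characteristic variable moving at speed 8
    v2 = 0.0 if x < 4.0 * t else 4.0   # characteristic variable moving at speed 4
    return [v2 + v1, v2 - v1]          # U = R^x V


left = exact_solution(0.5, 0.3)     # x / t < 4
middle = exact_solution(2.0, 0.3)   # 4 < x / t < 8
right = exact_solution(3.0, 0.3)    # x / t > 8
```

At $t = 0.3$ the two jumps sit at $x = 1.2$ and $x = 2.4$, so the three sample points fall in the left, middle and right states, respectively.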

With this analytic solution, we can evaluate the accuracy of our numerical solvers. Four solver–limiter pairs are tested on the same mesh (

h \approx 2^{- 3}

) used in Figure 1.

We plot the contour of

\underset{̲}{U} (x, y, z, t = 0.3)

with the underlying mesh in Figure 7 and Figure 8 and compare the results along the longitudinal axis at

t = 0.3

with the analytic solution in Figure 9 and Figure 10. It is clear that both LazyWeno and EigenWeno (see Section 2.2) can essentially suppress non-physical oscillations in each component. Figure 11 shows that higher-order (

p = 3

) solvers still outperform lower-order (

p = 2

) solvers in accuracy and the EigenWeno limiter generally works better than its LazyWeno counterpart. For this reason, we will use EigenWeno limiters exclusively in the rest of this section.

3.2. Inviscid Compressible Flows

The second group of problems to be solved is the three-dimensional Euler system:

\partial_{t} [\begin{matrix} ρ \\ ρ u_{x} \\ ρ u_{y} \\ ρ u_{z} \\ ρ e_{0} \end{matrix}] + \partial_{x} [\begin{matrix} ρ u_{x} \\ ρ u_{x} u_{x} + p \\ ρ u_{y} u_{x} \\ ρ u_{z} u_{x} \\ ρ h_{0} u_{x} \end{matrix}] + \partial_{y} [\begin{matrix} ρ u_{y} \\ ρ u_{x} u_{y} \\ ρ u_{y} u_{y} + p \\ ρ u_{z} u_{y} \\ ρ h_{0} u_{y} \end{matrix}] + \partial_{z} [\begin{matrix} ρ u_{z} \\ ρ u_{x} u_{z} \\ ρ u_{y} u_{z} \\ ρ u_{z} u_{z} + p \\ ρ h_{0} u_{z} \end{matrix}] = [\begin{matrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{matrix}]

(21)

with certain boundary and initial conditions. These problems are genuinely nonlinear and cannot be solved analytically in general. However, exact or high-order reference solutions of their lower-dimensional counterparts are well known in CFD studies and can still be used to test our three-dimensional solvers.

For this system, the

\underset{̲}{A^{ν}}

defined in Equation (12) depends on

\underset{̲}{U}

, and so do the

{\underset{̲}{R}}^{- 1}

in Equation (14) and the

\underset{̲}{R}

in Equation (15). Fortunately, these matrices can be explicitly formulated:

\underset{̲}{R} (\underset{̲}{U}) = [\begin{matrix} 1 & 1 & 0 & 0 & 1 \\ u_{x} - a ν_{x} & u_{x} & σ_{x} & π_{x} & u_{x} + a ν_{x} \\ u_{y} - a ν_{y} & u_{y} & σ_{y} & π_{y} & u_{y} + a ν_{y} \\ u_{z} - a ν_{z} & u_{z} & σ_{z} & π_{z} & u_{z} + a ν_{z} \\ h_{0} - u_{ν} a & \frac{u_{x}^{2} + u_{y}^{2} + u_{z}^{2}}{2} & u_{σ} & u_{π} & h_{0} + u_{ν} a \end{matrix}], [\begin{matrix} u_{ν} \\ u_{σ} \\ u_{π} \end{matrix}] = {[\begin{matrix} ν_{x} & σ_{x} & π_{x} \\ ν_{y} & σ_{y} & π_{y} \\ ν_{z} & σ_{z} & π_{z} \end{matrix}]}^{- 1} [\begin{matrix} u_{x} \\ u_{y} \\ u_{z} \end{matrix}],

\underset{̲}{L} (\underset{̲}{U}) = [\begin{matrix} \frac{1}{2} (B_{2} + \frac{u_{ν}}{a}) & \frac{- 1}{2} (B_{1} u_{x} + \frac{ν_{x}}{a}) & \frac{- 1}{2} (B_{1} u_{y} + \frac{ν_{y}}{a}) & \frac{- 1}{2} (B_{1} u_{z} + \frac{ν_{z}}{a}) & \frac{1}{2} B_{1} \\ 1 - B_{2} & B_{1} u_{x} & B_{1} u_{y} & B_{1} u_{z} & - B_{1} \\ - u_{σ} & σ_{x} & σ_{y} & σ_{z} & 0 \\ - u_{π} & π_{x} & π_{y} & π_{z} & 0 \\ \frac{1}{2} (B_{2} - \frac{u_{ν}}{a}) & \frac{- 1}{2} (B_{1} u_{x} - \frac{ν_{x}}{a}) & \frac{- 1}{2} (B_{1} u_{y} - \frac{ν_{y}}{a}) & \frac{- 1}{2} (B_{1} u_{z} - \frac{ν_{z}}{a}) & \frac{1}{2} B_{1} \end{matrix}],

in which

B_{1} : = (γ - 1) / a^{2}

and

B_{2} : = B_{1} (u_{ν}^{2} + u_{σ}^{2} + u_{π}^{2}) / 2

.
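Whether the two matrices above are mutual inverses can be checked numerically. The sketch below is a hedged illustration, not code from the solver: it assumes an ideal gas, so that h_0 = a^2 / (γ - 1) + |u|^2 / 2, an orthonormal frame (ν, σ, π), and B_2 = B_1 |u|^2 / 2, which is the value for which the product of L and R is exactly the identity. The function name `euler_eigenmatrices` and the sample state are hypothetical.

```python
import numpy as np


def euler_eigenmatrices(u, a, nu, sigma, pi, gamma=1.4):
    """Return (R, L) for velocity u, sound speed a and an orthonormal frame."""
    u, nu, sigma, pi = (np.asarray(v, dtype=float) for v in (u, nu, sigma, pi))
    # Velocity components in the (nu, sigma, pi) frame.
    u_nu, u_sigma, u_pi = np.linalg.solve(np.column_stack([nu, sigma, pi]), u)
    ke = 0.5 * (u @ u)               # kinetic energy per unit mass
    h0 = a * a / (gamma - 1.0) + ke  # total enthalpy of an ideal gas (assumed)
    b1 = (gamma - 1.0) / (a * a)
    b2 = b1 * ke                     # assumption: B_2 includes the factor 1/2

    # Each inner list is one column of R.
    R = np.column_stack([
        [1.0, *(u - a * nu), h0 - u_nu * a],
        [1.0, *u, ke],
        [0.0, *sigma, u_sigma],
        [0.0, *pi, u_pi],
        [1.0, *(u + a * nu), h0 + u_nu * a],
    ])
    # Each inner list is one row of L.
    L = np.array([
        [0.5 * (b2 + u_nu / a), *(-0.5 * (b1 * u + nu / a)), 0.5 * b1],
        [1.0 - b2, *(b1 * u), -b1],
        [-u_sigma, *sigma, 0.0],
        [-u_pi, *pi, 0.0],
        [0.5 * (b2 - u_nu / a), *(-0.5 * (b1 * u - nu / a)), 0.5 * b1],
    ])
    return R, L


# A hypothetical state and an arbitrary orthonormal frame.
nu = np.array([1.0, 2.0, 2.0]) / 3.0
sigma = np.array([2.0, 1.0, -2.0]) / 3.0
pi = np.cross(nu, sigma)
R, L = euler_eigenmatrices([0.3, -0.2, 0.5], 1.1, nu, sigma, pi)
assert np.allclose(L @ R, np.eye(5))
```

A check of this kind is cheap enough to run as a unit test for every face normal direction used by a characteristic-wise limiter.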

3.2.1. Shock Tube Problems

These problems are usually defined as one-dimensional problems, but we treat them as three-dimensional ones. All these problems are considered in a

[0.0, 5.0] \times [0.0, 1.0] \times [0.0, 0.5]

box with all boundaries closed except the left and right ends, which are open. Although no closed-form solutions exist, we can still use the method described in [31], which solves nonlinear algebraic equations numerically, to obtain their exact solutions. To test the numerical methods described in Section 2, we use the unstructured hexahedral mesh in Figure 12, in which

h \approx 1 / 10

.

Problem 3

(Sod). Solve the Euler system (Equation (21)) for

t \in [0.0, 1.0]

with the initial condition

{[\begin{matrix} ρ & u & v & w & p \end{matrix}]}_{t = 0} = \{\begin{matrix} [\begin{matrix} 1.000 & 0.000 & 0.000 & 0.000 & 1.000 \end{matrix}] & x < 2 \\ [\begin{matrix} 0.125 & 0.000 & 0.000 & 0.000 & 0.100 \end{matrix}] & x > 2 \end{matrix}

Problem 4

(Lax). Solve the Euler system (Equation (21)) for

t \in [0.0, 0.6]

with the initial condition

{[\begin{matrix} ρ & u & v & w & p \end{matrix}]}_{t = 0} = \{\begin{matrix} [\begin{matrix} 0.445 & 0.698 & 0.000 & 0.000 & 3.528 \end{matrix}] & x < 2 \\ [\begin{matrix} 0.500 & 0.000 & 0.000 & 0.000 & 0.571 \end{matrix}] & x > 2 \end{matrix}

Problem 5

(Vacuum). Solve the Euler system (Equation (21)) for

t \in [0.0, 0.3]

with the initial condition

{[\begin{matrix} ρ & u & v & w & p \end{matrix}]}_{t = 0} = \{\begin{matrix} [\begin{matrix} 1.0 & - 4.0 & 0.0 & 0.0 & 0.4 \end{matrix}] & x < 2 \\ [\begin{matrix} 1.0 & + 4.0 & 0.0 & 0.0 & 0.4 \end{matrix}] & x > 2 \end{matrix}
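The initial conditions of Problems 3 to 5 are given in primitive variables, whereas Equation (21) evolves conservative ones. The following Python sketch shows one way to perform the conversion. It is an illustration rather than the solver's own code: it assumes an ideal gas with γ = 1.4 (a value not stated in this section), and the names `SHOCK_TUBES`, `primitive_to_conservative` and `initial_state` are hypothetical.

```python
import numpy as np

GAMMA = 1.4  # assumed ratio of specific heats (ideal gas)

# (left, right) primitive states (rho, u, v, w, p) of Problems 3-5.
SHOCK_TUBES = {
    "Sod": ((1.000, 0.000, 0.0, 0.0, 1.000), (0.125, 0.000, 0.0, 0.0, 0.100)),
    "Lax": ((0.445, 0.698, 0.0, 0.0, 3.528), (0.500, 0.000, 0.0, 0.0, 0.571)),
    "Vacuum": ((1.0, -4.0, 0.0, 0.0, 0.4), (1.0, 4.0, 0.0, 0.0, 0.4)),
}


def primitive_to_conservative(rho, u, v, w, p, gamma=GAMMA):
    """Map (rho, u, v, w, p) to (rho, rho*u, rho*v, rho*w, rho*e0)."""
    total_energy = p / (gamma - 1.0) + 0.5 * rho * (u * u + v * v + w * w)
    return np.array([rho, rho * u, rho * v, rho * w, total_energy])


def initial_state(name, x, x_jump=2.0):
    """Conservative state at position x for the named shock tube at t = 0."""
    left, right = SHOCK_TUBES[name]
    return primitive_to_conservative(*(left if x < x_jump else right))


for name in SHOCK_TUBES:
    print(name, initial_state(name, 1.0), initial_state(name, 3.0))
```

Because the states do not depend on y or z, the same function can initialize every quadrature point of the three-dimensional mesh from its x coordinate alone.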

In Figure 13, Figure 14 and Figure 15, we plot the density contours given by the same third-order solver with an EigenWeno limiter. In Figure 16, Figure 17 and Figure 18, we plot the density distributions along the longitudinal axis (on which

y = 0.5

and

z = 0.25

) of the box. All these results show that higher-order (

p = 3

) solvers with EigenWeno limiters are better than lower-order (

p = 1

) solvers at capturing the wave structures (shocks, contact discontinuities, and expansion waves) that occur frequently in compressible flows.

3.2.2. Double Mach Reflection Problem

This is a classical two-dimensional problem originally proposed in [36], which we redefine here as a three-dimensional one:

Problem 6.

Solve the Euler system (Equation (21)) in the region defined in Figure 19, in which

x_{0} = 1 / 6

. The initial condition is given as a moving shock wave:

{[\begin{matrix} ρ & u & v & w & p \end{matrix}]}_{t = 0} = \{\begin{matrix} [\begin{matrix} 1.4 & 0.0 & 0.0 & 0.0 & 1.0 \end{matrix}] & y < \sqrt{3} (x - x_{0}) \\ [\begin{matrix} 8.0 & u_{A} & v_{A} & 0.0 & 116.5 \end{matrix}] & y > \sqrt{3} (x - x_{0}) \end{matrix},

(22)

in which

u_{A} = 4.125 \sqrt{3}

and

v_{A} = - 4.125

are the velocity components behind the shock wave. The boundary conditions are given as follows:

The $x = 0$ surface is open as an inlet;
The $x = 4$ surface and the $x < x_{0}$ part of the $y = 0$ surface are open as outlets;
The $x > x_{0}$ part of the $y = 0$ surface is closed as a solid wall;
The $y = 1$ surface has the following prescribed state:

[\begin{matrix} ρ & u & v & w & p \end{matrix}] = \{\begin{matrix} [\begin{matrix} 1.4 & 0.0 & 0.0 & 0.0 & 1.0 \end{matrix}] & 1 < \sqrt{3} (x - (x_{0} + u_{A} t)) \\ [\begin{matrix} 8.0 & u_{A} & v_{A} & 0.0 & 116.5 \end{matrix}] & 1 > \sqrt{3} (x - (x_{0} + u_{A} t)) \end{matrix},

which is consistent with the initial condition (Equation (22)).
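The post-shock state in Equation (22) can be cross-checked against the normal-shock (Rankine–Hugoniot) relations. The sketch below is an illustration under stated assumptions: an ideal gas with γ = 1.4 and a Mach 10 shock moving into gas at rest, as in the classical setup of [36]; neither value is stated explicitly in this section. The function name `post_shock_state` is hypothetical.

```python
import math


def post_shock_state(rho1, p1, mach, gamma=1.4):
    """Normal-shock relations for a shock moving into a gas at rest."""
    a1 = math.sqrt(gamma * p1 / rho1)  # sound speed ahead of the shock
    m2 = mach * mach
    rho2 = rho1 * (gamma + 1.0) * m2 / ((gamma - 1.0) * m2 + 2.0)
    p2 = p1 * (1.0 + 2.0 * gamma / (gamma + 1.0) * (m2 - 1.0))
    speed2 = mach * a1 * (1.0 - rho1 / rho2)  # gas speed behind the shock
    return rho2, p2, speed2


rho_a, p_a, speed_a = post_shock_state(rho1=1.4, p1=1.0, mach=10.0)

# The shock front makes an angle of pi / 3 with the x-axis, so its normal
# pointing into the gas at rest is (sin(pi / 3), -cos(pi / 3)).
angle = math.pi / 3
u_a = speed_a * math.sin(angle)
v_a = -speed_a * math.cos(angle)

assert math.isclose(rho_a, 8.0)
assert math.isclose(p_a, 116.5)
assert math.isclose(u_a, 4.125 * math.sqrt(3.0))
assert math.isclose(v_a, -4.125)
```

Under these assumptions, the density, pressure and velocity components behind the shock agree with the values prescribed in Equation (22).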

As a common practice, we plot the density contour at

t = 0.2

in a

[0, 3] \times [0, 1]

rectangle (on the

z = 0

surface) for each solver in Figure 20, Figure 21 and Figure 22. It is clear that as the order of accuracy increases, each discontinuity becomes thinner and the rolled-up vortex structure becomes clearer.

Before concluding this section, we provide the measured performance of our third-order solver that produces Figure 22 in Table 2, in which

P means the number of processes (one process per core).
$T_{n}$ means the wall clock time to finish the first n steps.
$P \frac{T_{n + m} - T_{n}}{m}$ is the core time per step, averaged over the m steps that follow step n. The total core time of all steps is often used as an index for charging by high performance computing centers.

Since parallel I/O operations require many collective communications, we write one frame every 100 steps. Steps 101 to 199 therefore involve no output, whereas step 200 does, so the difference between the values in the last two columns is the amortized core time of writing per step. This cost grows with the number of cores (from about 0.16 on 1 core to about 8.5 on 100 cores in Table 2), so it may become a bottleneck for scalability if the number of cores keeps increasing.
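The last two columns of Table 2 can be reproduced from the first three with a few lines of Python. This is a bookkeeping sketch only; the data are copied from Table 2, and the names `TABLE_2` and `core_time_per_step` are introduced here for illustration.

```python
# Rows of Table 2: (P, T_100, T_199, T_200), in the time units of the table.
TABLE_2 = [
    (1, 17652.2, 35324.5, 35519.3),
    (20, 960.443, 1912.365, 1926.584),
    (40, 491.070, 971.710, 983.933),
    (60, 335.651, 666.548, 676.930),
    (80, 251.548, 494.541, 504.951),
    (100, 202.789, 397.641, 408.126),
]


def core_time_per_step(procs, t_start, t_end, steps):
    """Core time per step: P * (T_end - T_start) / steps."""
    return procs * (t_end - t_start) / steps


for procs, t100, t199, t200 in TABLE_2:
    without_write = core_time_per_step(procs, t100, t199, 99)
    with_write = core_time_per_step(procs, t100, t200, 100)
    # The gap between the two values is the amortized cost of writing a frame.
    print(f"P={procs:3d}  no write: {without_write:8.3f}  "
          f"with write: {with_write:8.3f}  "
          f"I/O: {with_write - without_write:6.3f}")
```

The printed I/O column makes the growth of the writing cost with the number of processes directly visible.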

The parallel computing community usually uses the speedup (S) and efficiency (E) defined as

S = \frac{T_{serial}}{T_{parallel}}, E = \frac{S}{P} \times 100 %,

to assess the performance of a parallel program. We follow this practice, calculate these values from the measured data given in Table 2, and plot them in Figure 23 and Figure 24. These figures show again that the I/O operations have adverse effects on the parallel performance.
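As a worked example of these definitions, the sketch below computes S and E from the elapsed time of steps 101 to 199 in Table 2, an interval in which no frame is written. Choosing this interval is an assumption made for illustration; it is not necessarily the one used to produce Figure 23 and Figure 24. The names `ELAPSED` and `speedup_and_efficiency` are hypothetical.

```python
# T_199 - T_100 from Table 2, keyed by the number of processes P.
ELAPSED = {
    1: 35324.5 - 17652.2,
    20: 1912.365 - 960.443,
    40: 971.710 - 491.070,
    60: 666.548 - 335.651,
    80: 494.541 - 251.548,
    100: 397.641 - 202.789,
}


def speedup_and_efficiency(t_serial, t_parallel, procs):
    """Return (S, E), with E expressed as a percentage."""
    speedup = t_serial / t_parallel
    return speedup, 100.0 * speedup / procs


for procs, elapsed in ELAPSED.items():
    s, e = speedup_and_efficiency(ELAPSED[1], elapsed, procs)
    print(f"P={procs:3d}  S={s:7.3f}  E={e:6.2f}%")
```

Repeating the calculation with T_200 in place of T_199 includes one write operation and therefore gives lower efficiencies.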

The efficiency values given in Figure 24 fluctuate around

90 %

, which is not as good as the values above

99 %

in [26]. One source of this gap is the imperfect load balancing in our tests. Since we are using a three-dimensional unstructured mesh, it can hardly be partitioned uniformly: optimal graph partitioning is an NP-hard problem. By contrast, their meshes are one- and two-dimensional structured ones, on which uniform partitioning can be trivially achieved. Figure 25 gives the distribution of cells of the 100-part mesh partitioning (Figure 26) used in this section, which shows a

3 %

fluctuation.
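To see how such a fluctuation limits the efficiency, consider a simplified model in which the cost of a step is proportional to the number of cells owned by a process and every step ends with a synchronization. The slowest process then sets the pace, and the efficiency cannot exceed the ratio of the mean to the maximum cell count. The sketch below illustrates this bound; the model ignores communication, the cell counts are hypothetical, and the function name `balance_efficiency_bound` is introduced here.

```python
def balance_efficiency_bound(cells_per_part):
    """Upper bound on parallel efficiency: mean load divided by maximum load."""
    mean_load = sum(cells_per_part) / len(cells_per_part)
    return mean_load / max(cells_per_part)


# Hypothetical partitioning: 99 parts with 1000 cells, one part 3% heavier.
counts = [1000] * 99 + [1030]
print(f"efficiency bound: {100.0 * balance_efficiency_bound(counts):.1f}%")
```

In this model, a single part that is 3% heavier than the others already caps the efficiency at about 97%, before any communication cost is counted.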

4. Discussion

In this article, we have formulated the RKDG methods and the WENO limiters for three-dimensional unstructured meshes. The algorithms have been implemented on top of the MPI standard, which supports distributed memory parallelization. The numerical experiments have shown that increasing a solver’s order of accuracy improves the results more than merely refining the mesh it uses. The efficient parallel implementation has made the time cost affordable for large problems, as long as the solvers can be executed on a sufficiently large number of cores. Extending the methods to the Navier–Stokes equations and applying these high-order parallel solvers to real engineering problems are ongoing work. Further optimization of the parallel I/O module may be conducted to achieve better parallel performance.

Author Contributions

Conceptualization, W.P. and S.L.; methodology, W.P.; software, W.P. and Y.J.; validation, W.P. and Y.J.; formal analysis, W.P. and Y.J.; investigation, W.P.; resources, W.P.; data curation, W.P.; writing—original draft preparation, W.P.; writing—review and editing, W.P., Y.J. and S.L.; visualization, W.P.; supervision, S.L.; project administration, S.L.; funding acquisition, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National High-tech R&D Program of China (863 Program) grant number 2012AA112201.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this article are all generated from the source code publicly available in our Git repository, https://github.com/pvc1989/miniCFD (accessed on 15 March 2022), or on the mirror site, https://gitee.com/pvc1989/miniCFD (accessed on 15 March 2022).

Acknowledgments

The authors would like to express the deepest appreciation to Zhou Yukai for his professional assistance with typesetting and graphing.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Hesthaven, J.S.; Warburton, T. Nodal Discontinuous Galerkin Methods; Springer: New York, NY, USA, 2008.
2. Reed, W.H.; Hill, T.R. Triangular Mesh Methods for the Neutron Transport Equation; Technical Report LA-UR-73-479; Los Alamos Scientific Lab.: Los Alamos, NM, USA, 1973.
3. Chavent, G.; Salzano, G. A finite-element method for the 1-D water flooding problem with gravity. J. Comput. Phys. 1982, 45, 307–344.
4. Cockburn, B.; Shu, C.W. The Runge-Kutta local projection P1-discontinuous-Galerkin finite element method for scalar conservation laws. In Proceedings of the 1st National Fluid Dynamics Conference, Cincinnati, OH, USA, 25–28 July 1988; American Institute of Aeronautics and Astronautics: Reston, VA, USA, 1988.
5. Shu, C.W.; Osher, S. Efficient implementation of essentially non-oscillatory shock-capturing schemes. J. Comput. Phys. 1988, 77, 439–471.
6. Shu, C.W.; Osher, S. Efficient implementation of essentially non-oscillatory shock-capturing schemes, II. J. Comput. Phys. 1989, 83, 32–78.
7. Cockburn, B.; Lin, S.Y.; Shu, C.W. TVB Runge-Kutta local projection discontinuous Galerkin finite element method for conservation laws III: One-dimensional systems. J. Comput. Phys. 1989, 84, 90–113.
8. Cockburn, B.; Hou, S.; Shu, C.W. The Runge-Kutta local projection discontinuous Galerkin finite element method for conservation laws. IV. The multidimensional case. Math. Comput. 1990, 54, 545–581.
9. Cockburn, B. An introduction to the Discontinuous Galerkin method for convection-dominated problems. In Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1998; pp. 150–268.
10. Bassi, F.; Rebay, S. A High-Order Accurate Discontinuous Finite Element Method for the Numerical Solution of the Compressible Navier–Stokes Equations. J. Comput. Phys. 1997, 131, 267–279.
11. Cockburn, B.; Shu, C.W. The Local Discontinuous Galerkin Method for Time-Dependent Convection-Diffusion Systems. SIAM J. Numer. Anal. 1998, 35, 2440–2463.
12. Cockburn, B.; Shu, C.W. Runge–Kutta Discontinuous Galerkin Methods for Convection-Dominated Problems. J. Sci. Comput. 2001, 16, 173–261.
13. Gottlieb, D.; Shu, C.W. On the Gibbs Phenomenon and Its Resolution. SIAM Rev. 1997, 39, 644–668.
14. van Leer, B. Towards the ultimate conservative difference scheme III. Upstream-centered finite-difference schemes for ideal compressible flow. J. Comput. Phys. 1977, 23, 263–275.
15. van Leer, B. Towards the ultimate conservative difference scheme. IV. A new approach to numerical convection. J. Comput. Phys. 1977, 23, 276–299.
16. van Leer, B. Towards the ultimate conservative difference scheme. V. A second-order sequel to Godunov’s method. J. Comput. Phys. 1979, 32, 101–136.
17. van Leer, B.; Nishikawa, H. Towards the ultimate understanding of MUSCL: Pitfalls in achieving third-order accuracy. J. Comput. Phys. 2021, 446, 110640.
18. Harten, A.; Engquist, B.; Osher, S.; Chakravarthy, S.R. Uniformly High Order Accurate Essentially Non-oscillatory Schemes, III. J. Comput. Phys. 1997, 131, 3–47.
19. Jiang, G.S.; Shu, C.W. Efficient Implementation of Weighted ENO Schemes. J. Comput. Phys. 1996, 126, 202–228.
20. Hu, C.; Shu, C.W. Weighted Essentially Non-oscillatory Schemes on Triangular Meshes. J. Comput. Phys. 1999, 150, 97–127.
21. Qiu, J.; Shu, C.W. Runge–Kutta Discontinuous Galerkin Method Using WENO Limiters. SIAM J. Sci. Comput. 2005, 26, 907–929.
22. Zhu, J.; Qiu, J.; Shu, C.W.; Dumbser, M. Runge–Kutta discontinuous Galerkin method using WENO limiters II: Unstructured meshes. J. Comput. Phys. 2008, 227, 4330–4353.
23. Zhong, X.; Shu, C.W. A simple weighted essentially nonoscillatory limiter for Runge–Kutta discontinuous Galerkin methods. J. Comput. Phys. 2013, 232, 397–415.
24. Zhu, J.; Zhong, X.; Shu, C.W.; Qiu, J. Runge–Kutta discontinuous Galerkin method using a new type of WENO limiters on unstructured meshes. J. Comput. Phys. 2013, 248, 200–220.
25. Mazaheri, A.; Shu, C.W.; Perrier, V. Bounded and compact weighted essentially nonoscillatory limiters for discontinuous Galerkin schemes: Triangular elements. J. Comput. Phys. 2019, 395, 461–488.
26. Biswas, R.; Devine, K.D.; Flaherty, J.E. Parallel, adaptive finite element methods for conservation laws. Appl. Numer. Math. 1994, 14, 255–283.
27. Bey, K.S.; Patra, A.; Oden, J.T. hp-version discontinuous Galerkin methods for hyperbolic conservation laws: A parallel adaptive strategy. Int. J. Numer. Methods Eng. 1995, 38, 3889–3908.
28. Chalmers, N.; Agbaglah, G.; Chrust, M.; Mavriplis, C. A parallel hp-adaptive high order discontinuous Galerkin method for the incompressible Navier-Stokes equations. J. Comput. Phys. X 2019, 2, 100023.
29. Karypis, G.; Kumar, V. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM J. Sci. Comput. 1998, 20, 359–392.
30. Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, 4th ed.; University of Tennessee: Knoxville, TN, USA, 2021.
31. Toro, E.F. Riemann Solvers and Numerical Methods for Fluid Dynamics; Springer: Berlin/Heidelberg, Germany, 2009.
32. Zhang, L.; Cui, T.; Liu, H. A Set of Symmetric Quadrature Rules on Triangles and Tetrahedra. J. Comput. Math. 2009, 27, 89–96.
33. Quarteroni, A.; Sacco, R.; Saleri, F. Numerical Mathematics; Springer: New York, NY, USA, 2007.
34. Gottlieb, S.; Shu, C.W. Total variation diminishing Runge-Kutta schemes. Math. Comput. 1998, 67, 73–85.
35. Meyers, S. Effective Modern C++; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2015.
36. Woodward, P.; Colella, P. The numerical simulation of two-dimensional fluid flow with strong shocks. J. Comput. Phys. 1984, 54, 115–173.

Figure 1. Third-order solution of Problem 1 on medium (

h \approx 2^{- 3}

) cells.

Figure 2. First-order solution of Problem 1 on small (

h \approx 2^{- 4}

) cells.

Figure 3. First-order solution of Problem 1 on tiny (

h \approx 2^{- 5}

) cells.

Figure 4. Third-order solution of Problem 1 on big (

h \approx 2^{- 2}

) cells.

Figure 5. Comparison between solutions of Problem 1 given by running the same solver on different meshes.

Figure 6. Comparison between solutions of Problem 1 given by running different solvers on the same mesh.

Figure 7. Third-order solution of

U_{1} (t = 0.3)

in Problem 2.

Figure 8. Third-order solution of

U_{2} (t = 0.3)

in Problem 2.

Figure 9. Comparison between solutions of

U_{1} (t = 0.3)

in Problem 2.

Figure 10. Comparison between solutions of

U_{2} (t = 0.3)

in Problem 2.

Figure 11. Comparison between absolute errors of numerical solutions in Figure 9 and Figure 10.

Figure 12. Mesh for Problems 3–5.

Figure 13. Third-order solution of

ρ (t = 1.0)

in Problem 3.

Figure 14. Third-order solution of

ρ (t = 0.6)

in Problem 4.

Figure 15. Third-order solution of

ρ (t = 0.3)

in Problem 5.

Figure 16. Comparison between solutions of

ρ (t = 1.0)

in Problem 3.

Figure 17. Comparison between solutions of

ρ (t = 0.5)

in Problem 4.

Figure 18. Comparison between solutions of

ρ (t = 0.3)

in Problem 5.

Figure 19. A schematic diagram of Problem 6. The rectangle bounded by four dashed lines and a solid line is the computational domain. The thick red line represents the initial shock wave, which is at an angle of

π / 3

relative to the x-axis.

Figure 20. First-order solution of

ρ (x, y, z = 0, t = 0.2)

in Problem 6 (

h \approx 1 / 200

).

Figure 21. Second-order solution of

ρ (x, y, z = 0, t = 0.2)

in Problem 6 (

h \approx 1 / 200

).

Figure 22. Third-order solution of

ρ (x, y, z = 0, t = 0.2)

in Problem 6 (

h \approx 1 / 200

).

Figure 23. Speedup of the third-order solver for generating Figure 22.

Figure 24. Parallel efficiency of the third-order solver for generating Figure 22.

Figure 25. Distribution of cells in the mesh partitioning given in Figure 26.

Figure 26. A 100-part mesh partitioning of the unstructured mesh used for generating Figure 20, Figure 21 and Figure 22.

Table 1. Accuracy and time cost of each solver–mesh (p–h) pair.

$L_{1}$-Error with Respect to the Analytic Solution
$p$	$h = 2^{- 2}$	$h = 2^{- 3}$	$h = 2^{- 4}$	$h = 2^{- 5}$
1	2.858	2.095	1.524	1.108
2	1.258	0.771	0.463	0.275
3	1.021	0.590	0.341	
Time Cost (in Seconds) Measured on a Single Core Whose Main Frequency Is 2.7 GHz
$p$	$h = 2^{- 2}$	$h = 2^{- 3}$	$h = 2^{- 4}$	$h = 2^{- 5}$
1	0.373	1.129	16.533	306.293
2	1.580	14.894	253.821	4986.391
3	4.147	61.425	906.914	

Table 2. Performance of the same solver running on different numbers of cores.

P	$T_{100}$	$T_{199}$	$T_{200}$	$P \frac{T_{199} - T_{100}}{99}$	$P \frac{T_{200} - T_{100}}{100}$
1	17,652.2	35,324.5	35,519.3	178.508	178.671
20	960.443	1912.365	1926.584	192.307	193.228
40	491.070	971.710	983.933	194.198	197.145
60	335.651	666.548	676.930	200.544	204.767
80	251.548	494.541	504.951	196.358	202.722
100	202.789	397.641	408.126	196.820	205.337

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

