Article

A Trust Region Reduced Basis Pascoletti-Serafini Algorithm for Multi-Objective PDE-Constrained Parameter Optimization

Department of Mathematics and Statistics, University of Konstanz, Universitätsstraße 10, 78464 Konstanz, Germany
*
Author to whom correspondence should be addressed.
Math. Comput. Appl. 2022, 27(3), 39; https://doi.org/10.3390/mca27030039
Submission received: 18 January 2022 / Revised: 28 April 2022 / Accepted: 29 April 2022 / Published: 3 May 2022
(This article belongs to the Special Issue Computational Methods for Coupled Problems in Science and Engineering)

Abstract:
In the present paper, non-convex multi-objective parameter optimization problems are considered that are governed by elliptic parametrized partial differential equations (PDEs). To solve these problems numerically, the Pascoletti-Serafini scalarization is applied and the obtained scalar optimization problems are solved by an augmented Lagrangian method. However, due to the PDE constraints, the numerical solution is very expensive, so that a model reduction is employed using the reduced basis (RB) method. The quality of the RB approximation is ensured by a trust-region strategy, which does not require any offline procedure in which the RB functions are computed by a greedy algorithm. Moreover, convergence of the proposed method is guaranteed, and different techniques to prevent the excessive growth of the number of basis functions are explored. Numerical examples illustrate the efficiency of the proposed solution technique.

1. Introduction

Multi-objective optimization plays an important role in many applications, e.g., in industry, medicine or engineering. Typical examples are the minimization of costs with simultaneous quality optimization in production, or the minimization of CO₂ emissions in energy generation with simultaneous cost minimization. These problems lead to multi-objective optimization problems (MOPs), in which we want to achieve an optimal compromise with respect to all given objectives at the same time. Usually, the different objectives are contradictory, so that there exist infinitely many optimal compromises. The set of these compromises is called the Pareto set. The goal is to approximate the Pareto set in an efficient way, which turns out to be more expensive than solving a single-objective optimization problem.
Since MOPs are of great importance, there exist several algorithms to solve them. Among the most popular methods are scalarization methods, which transform MOPs into scalar problems. For example, in the weighted sum method [1,2,3], convex combinations of the original objectives are optimized. However, in our case the multi-objective optimization problem
\[
\min \hat{J}(u) = \big(\hat{J}_1(u), \ldots, \hat{J}_k(u)\big)^T \quad \text{subject to (s.t.)} \quad u \in U_{\mathrm{ad}} \tag{MOP}
\]
is non-convex with a bounded, non-empty, convex and closed set U_ad. In that case, a suitable scalarization method for solving (MOP) is the Pascoletti-Serafini (PS) scalarization [4,5]: For a chosen reference point z ∈ ℝ^k and a given target direction r ∈ ℝ^k with r_i > 0 for all i ∈ {1, …, k}, the Pascoletti-Serafini problem is given by
\[
\min t \quad \text{s.t.} \quad (t,u) \in \mathbb{R} \times U_{\mathrm{ad}} \ \text{ and } \ \hat{J}(u) - z \le t\,r. \tag{$P^{PS}_{z,r}$}
\]
In the present paper ( P z , r PS ) is solved by an augmented Lagrangian approach. However, in our case the evaluation of the objective J ^ requires the solution of an elliptic partial differential equation (PDE) for the given parameter u. This implies further that for the computation of the gradients J ^ i , i = 1 , , k , adjoint PDEs have to be solved; cf. [6]. Here, surrogate models offer a promising tool to reduce the computational effort significantly [7]. Examples are dimensional reduction techniques such as the Reduced Basis (RB) method [8,9]. In an offline phase, a low-dimensional surrogate model of the PDE is constructed by using, e.g., the greedy algorithm, cf. [8,10,11]. In the online phase, only the RB model is used to solve the PDE, which saves a lot of computing time.
Since the early 2000s, the combination of model order reduction with trust-region algorithms in the setting of PDE-constrained optimization has been present in the literature; cf. [12,13]. The idea in these methods is to replace the usual quadratic model function in each trust-region step by a reduced-order approximation of the cost function. More recent publications followed and enhanced this approach by using a-posteriori error estimates for the cost function and its gradient; cf. [14,15]. These works were the starting point for the trust-region reduced basis methods developed in [16,17,18]. Let us mention that [19,20] have proposed similar methods for the combination of reduced-order and trust-region methods based on previous work on trust-region algorithms for PDE-constrained optimization under uncertainty; cf. [21,22]. In contrast to the approach followed by [14,15,16,17,18], these methods do not use rigorous a-posteriori error estimates but rather asymptotic error indicators, which still allow for a global convergence result. Here we propose an extension of the method in [16] for solving multi-objective PDE-constrained parameter optimization problems, which is based on a combination of the trust-region reduced basis method presented in [17,18] and the PS method. In particular, we discuss different strategies to handle the increasing number of reduced basis functions, which is crucial in order to guarantee good performance of the algorithm. Notice that our approach is designed for applications in which the multi-objective PDE-constrained parameter optimization problem has to be solved only once. For that reason, our trust-region reduced basis method does not rely on any offline computations.
These proposed strategies are not only interesting in the field of multi-objective optimization by the PS method, but can also be used in other applications where many PDE-constrained optimization problems must be solved and it is hence crucial to keep the number of reduced basis functions small enough, as, e.g., in model predictive control; cf. [23].
The paper is organized as follows: In Section 2 we introduce a general MOP and explain the PS method, in particular, a hierarchical version of the PS algorithm which turns out to be very efficient in the numerical realization. The concrete PDE-constrained MOP is investigated in Section 3. The trust-region RB method and its combination with the PS method is described in Section 4. Convergence is ensured and the algorithmic realization of the approach is explained. Numerical examples are discussed in detail in Section 5. Finally, we draw some conclusions.

2. Multi-Objective Optimization

Let (U, ⟨·,·⟩_U) be a real Hilbert space, U_ad ⊆ U non-empty, convex and closed, k ≥ 2 arbitrary and Ĵ_1, …, Ĵ_k : U_ad ⊆ U → ℝ be given real-valued functions. In this manuscript, we also assume that U_ad is bounded; this assumption will be required later for the convergence of our method. Note that one can derive similar results in this section if U_ad is unbounded by introducing additional assumptions; cf. [16]. To shorten the notation, we write Ĵ := (Ĵ_1, …, Ĵ_k)^T : U_ad → ℝ^k. In the following, we deal with the multi-objective optimization problem
\[
\min \hat{J}(u) \quad \text{s.t.} \quad u \in U_{\mathrm{ad}}. \tag{MOP}
\]
Definition 1.
(a)
The functions J ^ 1 , , J ^ k are called cost or objective functions. Analogously, the vector-valued function J ^ : U ad R k is named the (multi-objective) cost or (multi-objective) objective function.
(b)
The Hilbert space U is named the admissible space, the set U ad is called the admissible set and a vector u U ad is called admissible.
(c)
The space R k is named the objective space and the image set J ^ ( U ad ) is called the objective set. A vector y = J ^ ( u ) J ^ ( U ad ) is called objective point.
Definition 2
(Partial ordering on ℝ^k). On ℝ^k we define the partial ordering ≤ as
\[
x \le y \ :\Longleftrightarrow\ \forall\, i \in \{1,\ldots,k\}:\ x_i \le y_i
\]
for all x, y ∈ ℝ^k. Moreover, we define
\[
x < y \ :\Longleftrightarrow\ \forall\, i \in \{1,\ldots,k\}:\ x_i < y_i.
\]
For convenience, we write
\[
x \lneq y \ :\Longleftrightarrow\ x \le y \ \&\ x \ne y
\]
for all x, y ∈ ℝ^k and define the two sets \(\mathbb{R}^k_{\le} := \{y \in \mathbb{R}^k \mid y \le 0\}\) and \(\mathbb{R}^k_{\lneq} := \{y \in \mathbb{R}^k \mid y \lneq 0\}\). Analogously, the relations ≥, > and ≩ as well as the sets \(\mathbb{R}^k_{\ge}\) and \(\mathbb{R}^k_{\gneq}\) are defined.
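For illustration, the ordering relations of Definition 2 can be written down directly; a minimal sketch in Python, with NumPy arrays standing in for vectors in ℝ^k (function names are our own, not from the paper):

```python
import numpy as np

def leq(x, y):
    """x <= y: componentwise less-or-equal (partial ordering of Definition 2)."""
    return bool(np.all(np.asarray(x) <= np.asarray(y)))

def lt(x, y):
    """x < y: strictly less in every component."""
    return bool(np.all(np.asarray(x) < np.asarray(y)))

def lneq(x, y):
    """x <= y and x != y: x is less-or-equal and strictly better somewhere."""
    x, y = np.asarray(x), np.asarray(y)
    return bool(np.all(x <= y) and np.any(x < y))
```

Note that ≤ is only a partial ordering: for x = (1, 2) and y = (2, 1), neither leq(x, y) nor leq(y, x) holds.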
Definition 3
(Pareto optimality).
(a)
An admissible vector u ¯ U ad and its corresponding objective point y ¯ : = J ^ ( u ¯ ) J ^ ( U ad ) are called (locally) weakly Pareto optimal if there is no u ˜ U ad (in a neighborhood of u ¯ ) with J ^ ( u ˜ ) < J ^ ( u ¯ ) . The sets
\[
U_{\mathrm{opt,w}} := \{u \in U_{\mathrm{ad}} \mid u \text{ is weakly Pareto optimal}\} \subseteq U_{\mathrm{ad}}, \qquad U_{\mathrm{opt,w,loc}} := \{u \in U_{\mathrm{ad}} \mid u \text{ is locally weakly Pareto optimal}\} \subseteq U_{\mathrm{ad}}
\]
are said to be the weak Pareto set and the locally weak Pareto set, respectively. The sets
\[
J_{\mathrm{opt,w}} := \hat{J}(U_{\mathrm{opt,w}}) \subseteq \mathbb{R}^k, \qquad J_{\mathrm{opt,w,loc}} := \hat{J}(U_{\mathrm{opt,w,loc}}) \subseteq \mathbb{R}^k,
\]
are the weak Pareto front and the locally weak Pareto front, respectively.
(b)
An admissible vector u ¯ U ad and its corresponding objective point y ¯ : = J ^ ( u ¯ ) J ^ ( U ad ) are called (locally) Pareto optimal if there is no u ˜ U ad (in a neighborhood of u ¯ ) with J ^ ( u ˜ ) J ^ ( u ¯ ) . The sets
\[
U_{\mathrm{opt}} := \{u \in U_{\mathrm{ad}} \mid u \text{ is Pareto optimal}\} \subseteq U_{\mathrm{ad}}, \qquad U_{\mathrm{opt,loc}} := \{u \in U_{\mathrm{ad}} \mid u \text{ is locally Pareto optimal}\} \subseteq U_{\mathrm{ad}}
\]
are called the Pareto set and the local Pareto set, respectively. The sets
\[
J_{\mathrm{opt}} := \hat{J}(U_{\mathrm{opt}}) \subseteq \mathbb{R}^k, \qquad J_{\mathrm{opt,loc}} := \hat{J}(U_{\mathrm{opt,loc}}) \subseteq \mathbb{R}^k
\]
are called the Pareto front and the local Pareto front, respectively.
If we talk about the different notions of (local) (weak) Pareto optimality in one sentence, we use the notation U opt , ( w ) , ( loc ) to keep the sentence compact. Analogously, U opt , ( w ) , loc , U opt , ( loc ) , J opt , ( w ) , ( loc ) etc. are to be understood. An example with the different concepts of Pareto optimality can be found in [16] (Example 1.2.6).
The next theorem about a sufficient condition for the existence of Pareto optimal points goes back to [24]. It also appears in a similar form in [25,26].
Theorem 1.
Suppose that there is y ∈ Ĵ(U_ad) + ℝ^k_≥ such that the set (y − ℝ^k_≥) ∩ (Ĵ(U_ad) + ℝ^k_≥) is compact. Then it holds J_opt ≠ ∅.
Proof. 
This is a slight generalization of [1] (Theorem 2.10) using the argument that adding R k to the set J ^ ( U ad ) does not change the Pareto front J opt .      □
Given any y = Ĵ(u) ∈ Ĵ(U_ad) with y ∉ J_opt, it follows directly from the definition of Pareto optimality that there is ȳ = Ĵ(ū) ∈ Ĵ(U_ad) with ȳ ≨ y. However, even if the Pareto front J_opt is not empty (e.g., since the assumptions of Theorem 1 are satisfied), it is not clear that there is ȳ ∈ J_opt with ȳ ≤ y. If this property holds for all y ∈ Ĵ(U_ad) ∖ J_opt, the set J_opt is said to be externally stable; cf. [1,26].
Definition 4.
The set J_opt is said to be externally stable if for every y ∈ Ĵ(U_ad) there is ȳ ∈ J_opt with ȳ ≤ y. This is equivalent to Ĵ(U_ad) ⊆ J_opt + ℝ^k_≥.
Especially for the investigation of suitable solution methods for solving (MOP), we are interested in guaranteeing that the Pareto front is externally stable. The next result provides a sufficient condition for this property.
Theorem 2.
If for every y ∈ Ĵ(U_ad) + ℝ^k_≥ the set (y − ℝ^k_≥) ∩ (Ĵ(U_ad) + ℝ^k_≥) is compact, then J_opt is externally stable.
Proof. 
For a proof of a similar version of this theorem, we refer to [1] (Theorem 2.21). □
Among the methods to solve multi-objective optimization problems, the ones based on scalarization techniques appear frequently in the literature. Let us mention here the weighted-sum method [1,3], the Euclidean reference point method [27] and the PS method [4,5]. Since in our case the set Ĵ(U_ad) + ℝ^k_≥ is non-convex, we apply the PS method, which is proven to be able to solve a non-convex (MOP).

2.1. The PS Method

For a chosen reference point z ∈ ℝ^k and a given target direction r ∈ ℝ^k_>, the PS problem is given by
\[
\min t \quad \text{s.t.} \quad (t,u) \in \mathbb{R} \times U_{\mathrm{ad}} \ \text{ and } \ \hat{J}(u) - z \le t\,r. \tag{$P^{PS}_{z,r}$}
\]
Equivalently, we can state the PS problem as a scalarized problem. For z ∈ ℝ^k and r ∈ ℝ^k_> we define the scalarization function
\[
g_{z,r} : \mathbb{R}^k \to \mathbb{R}, \qquad x \mapsto g_{z,r}(x) := \max_{1 \le i \le k} \frac{1}{r_i}\,(x_i - z_i),
\]
and the PS scalarized function
\[
\hat{J}_{g_{z,r}}(u) := g_{z,r}(\hat{J}(u)) = \max_{1 \le i \le k} \frac{1}{r_i}\big(\hat{J}_i(u) - z_i\big) \quad \text{for } u \in U_{\mathrm{ad}}.
\]
Then the reformulated PS problem is given by
\[
\min \hat{J}_{g_{z,r}}(u) \quad \text{s.t.} \quad u \in U_{\mathrm{ad}}. \tag{$RP^{PS}_{z,r}$}
\]
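To make the scalarization concrete, the following sketch evaluates the scalarized function on a toy bi-objective problem Ĵ(u) = (u², (u − 1)²) on U_ad = [0, 1] (a hypothetical example of our own, not from the paper) and minimizes it over a parameter grid:

```python
import numpy as np

def g_zr(x, z, r):
    """Pascoletti-Serafini scalarization g_{z,r}(x) = max_i (x_i - z_i) / r_i."""
    x, z, r = (np.asarray(v, float) for v in (x, z, r))
    return float(np.max((x - z) / r))

# Toy bi-objective problem: J(u) = (u^2, (u - 1)^2) on U_ad = [0, 1].
# Every u in [0, 1] is Pareto optimal; minimizing the scalarized function
# over a grid returns the Pareto point "targeted" by the pair (z, r).
us = np.linspace(0.0, 1.0, 1001)
J = np.column_stack([us**2, (us - 1.0) ** 2])
z = np.array([-0.1, -0.1])            # reference point below the front
r = np.array([1.0, 1.0])              # target direction
vals = np.max((J - z) / r, axis=1)    # scalarized values on the grid
u_bar = us[np.argmin(vals)]           # symmetric data, so u_bar is near 0.5
```

Changing z moves the targeted point along the Pareto front, which is exactly the mechanism the PS method exploits.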
The following theorem proved in [16] (Theorem 1.7.3) ensures the equivalence between ( P z , r PS ) and ( RP z , r PS ).
Theorem 3.
Let z R k and r R > k be arbitrary. On the one hand, if  ( u ¯ , t ¯ ) is a global (local) solution of ( P z , r PS ), then u ¯ is a global (local) solution of ( RP z , r PS ) with minimal function value t ¯ . On the other hand, if  u ¯ is a global (local) solution of ( RP z , r PS ), then ( u ¯ , t ¯ ) with t ¯ : = max 1 i k ( J ^ i ( u ¯ ) z i ) / r i is a global (local) solution of ( P z , r PS ).
Assumption 1.
The cost functions J ^ 1 , , J ^ k are weakly lower semi-continuous and bounded from below.
Theorem 4.
Let Assumption 1 be satisfied and z R k as well as r R > k be arbitrary. Then ( RP z , r PS ) has a global solution u ¯ U opt .
Proof. 
A proof of this statement can be found in [16] (Corollary 1.7.12). □
The previous result also shows that the existing global solution of ( RP z , r PS ) belongs to the Pareto set. To guarantee a good reconstruction of the Pareto set by the PS method, one needs that, given a (weakly) Pareto optimal point, it is possible to choose the parameters z and r such that this point solves ( RP z , r PS ). This is stated in [16] (Theorem 1.7.13), which we report here for clarity.
Theorem 5.
Let ū ∈ U_opt,w be arbitrary. Then for every r ∈ ℝ^k_> and every t̄ ∈ ℝ we have that ū is a global solution of ( RP z , r PS ) for the reference point z := Ĵ(ū) − t̄ r. If even ū ∈ U_opt, any other global solution ũ of ( RP z , r PS ) satisfies Ĵ(ũ) = Ĵ(ū).
Remark 1.
We refer the reader to [16] (Lemma 1.7.15) for the derivation of first-order necessary optimality condition for a global solution of ( P z , r PS ).
Thus, the PS method can in principle compute every (locally) (weakly) Pareto optimal point, so that many algorithms based on the PS method have been proposed. Here we only mention the ones which are related to (but differ from) our proposed technique. Our main idea is to keep the parameter r fixed while varying the reference point z. This was also proposed in [4], but the method turns out to be, on the one hand, not numerically efficient for k > 2 and, on the other hand, not numerically applicable in some cases for k > 2. In [28], the authors provide assumptions on the Pareto front to ensure that the so-called trade-off limits (i.e., points on the Pareto front which cannot be improved in at least one component) are given by the solutions to subproblems. Their idea was then to find these trade-off points first and then compute the rest of the Pareto front. A similar idea, but with the use of Centroidal Voronoi Tessellations, was presented by [29]. Finally, [30] shows and fixes some problematic behavior associated with the algorithm in [28]. We follow the idea of the mentioned contributions of hierarchically solving subproblems of (MOP), but with the focus of finding a set of reference points, by looking at subproblems, for which we can obtain Pareto optimal points. We are then not interested in finding ‘boundary’ points (i.e., the trade-off limits) of the Pareto front and then filling its ‘interior’ as in [28,29,30], but rather aim to partly generalize this approach. In what follows, we characterize which reference points are necessary and/or sufficient for computing the entire (local) (weak) Pareto front. First, we recall the following well-defined solution mappings of ( RP z , r PS ); cf. [16] (Definition 1.7.16).
Definition 5.
We define the set-valued mappings
\[
\begin{aligned}
Q_{\mathrm{opt,w}} &: \mathbb{R}^k \rightrightarrows U_{\mathrm{opt,w}}, & z &\mapsto \{u \in U_{\mathrm{ad}} \mid u \text{ is a global solution of } (RP^{PS}_{z,r})\},\\
Q_{\mathrm{opt,w,loc}} &: \mathbb{R}^k \rightrightarrows U_{\mathrm{opt,w,loc}}, & z &\mapsto \{u \in U_{\mathrm{ad}} \mid u \text{ is a local solution of } (RP^{PS}_{z,r})\},\\
Q_{\mathrm{opt,(loc)}} &: \mathbb{R}^k \rightrightarrows U_{\mathrm{opt,(loc)}}, & z &\mapsto Q_{\mathrm{opt,w,(loc)}}(z) \cap U_{\mathrm{opt,(loc)}}.
\end{aligned}
\]
From Theorem 3, it follows that Q_opt,(w),(loc)(ℝ^k) = U_opt,(w),(loc), i.e., by solving ( RP z , r PS ) for all z ∈ ℝ^k, we obtain all (locally) (weakly) Pareto optimal points. Furthermore, if Assumption 1 is satisfied, we infer from Theorem 4 that Q_opt,(w),(loc)(z) ≠ ∅ for all z ∈ ℝ^k. We also introduce the notion of a (locally) (weakly) Pareto sufficient set for the PS method.
Definition 6.
A set Z R k is called (locally) (weakly) Pareto sufficient if we have Q opt , ( w ) , ( loc ) ( Z ) = U opt , ( w ) , ( loc ) .
Hence, a (locally) (weakly) Pareto sufficient set contains the reference points which allow us to compute the entire (local) (weak) Pareto front. Clearly, the set R k is (locally) (weakly) Pareto sufficient, but this fact is not computationally useful. The next lemma gives a first condition towards this computational efficiency.
Lemma 1.
Let Z ⊆ ℝ^k be arbitrary. Z is (locally) (weakly) Pareto sufficient if
\[
\forall\, \bar{u} \in U_{\mathrm{opt,(w),(loc)}}\ \exists\, t \in \mathbb{R}:\ \hat{J}(\bar{u}) - t\,r \in Z. \tag{1}
\]
Proof. 
Let Z ⊆ ℝ^k be such that (1) holds. Let ū ∈ U_opt,(w),(loc) be arbitrary. We need to show that there is a z ∈ Z with ū ∈ Q_opt,(w),(loc)(z). Indeed, by (1) there is t ∈ ℝ with z := Ĵ(ū) − t r ∈ Z, and by Theorem 5 we already have ū ∈ Q_opt,(w),(loc)(z). □
To proceed we introduce the concepts of ideal point and shifted ideal point, which will first be used to define a set of shifted coordinate planes D. On this set we can then define a set of reference points Z opt , ( w ) , ( loc ) D which turns out to be an optimal Pareto sufficient set (The word ‘optimal’ here means that removing any point from the set will cause the loss of the Pareto sufficient property).
Definition 7.
(a)
We define the ideal objective point y^id ∈ (ℝ ∪ {−∞})^k by y_i^id := inf_{u ∈ U_ad} Ĵ_i(u) for all i ∈ {1, …, k}.
(b)
For an arbitrary vector d̃ ∈ ℝ^k_> define the shifted ideal point ỹ^id := y^id − d̃. Let D_i ⊆ ℝ^k be given by D_i := {y ∈ ℝ^k | y ≥ ỹ^id, y_i = ỹ_i^id} for all i ∈ {1, …, k}. Then the set D ⊆ ℝ^k is defined by D := ∪_{i=1}^k D_i.
(c)
We define Z_opt,(w),(loc)^D := {z ∈ D | ∃ ū ∈ U_opt,(w),(loc) ∃ t ∈ ℝ : z = Ĵ(ū) − t r}.
(d)
For any y ∈ ℝ^k we set t_D(y) := min_{i ∈ {1,…,k}} (y_i − ỹ_i^id)/r_i ∈ ℝ.
Remark 2.
It is proved in [16] (Lemma 1.7.24) that
\[
Z^{D}_{\mathrm{opt,(w),(loc)}} = \big\{\, \hat{J}(\bar{u}) - t_D(\hat{J}(\bar{u}))\, r \ \big|\ \bar{u} \in U_{\mathrm{opt,(w),(loc)}} \,\big\}.
\]
Furthermore, the set Z_opt,(w),(loc)^D is (locally) (weakly) Pareto sufficient, and there is a Lipschitz continuous bijection between Z_opt^D and the Pareto front J_opt. Unfortunately, there is no bijection between Z_opt,(w),(loc)^D and J_opt,(w),(loc) in general, but the set Z_opt,(w),(loc)^D is still (locally) (weakly) Pareto sufficient. Therefore, it can nevertheless be used for the computation of the Pareto front.
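Remark 2 suggests a simple way to generate reference points from (approximate) Pareto optimal objective values: project each point onto the shifted coordinate planes D along the direction r. A minimal numerical sketch of this projection (function name is our own):

```python
import numpy as np

def reference_point(y, y_tilde_id, r):
    """z = y - t_D(y) * r with t_D(y) = min_i (y_i - y_tilde_id_i) / r_i
    (Definition 7 (d) and Remark 2); z lies on one of the planes D_i,
    since the minimizing coordinate of z equals the shifted ideal value."""
    y, yt, r = (np.asarray(v, float) for v in (y, y_tilde_id, r))
    t = float(np.min((y - yt) / r))
    return y - t * r

# Example: for y = (3, 5), shifted ideal point (0, 1) and r = (1, 2),
# t_D(y) = min(3/1, 4/2) = 2, hence z = (1, 1), and z_2 = 1 lies on D_2.
z = reference_point([3.0, 5.0], [0.0, 1.0], [1.0, 2.0])
```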

2.2. Hierarchical PS Method

Due to Definition 7 and Remark 2, the set Z_opt,(w),(loc)^D can only be computed once the set U_opt,(w),(loc) is available. Clearly, this characterization of Z_opt,(w),(loc)^D is not useful for a numerical algorithm, since the availability of U_opt,(w),(loc) means that we have already solved (MOP). Fortunately, in [16,31] it is shown that the Pareto set has a hierarchical structure. This means that the (weak) Pareto front and the (weak) Pareto set of (MOP) are contained in the union of the (weak) Pareto fronts and (weak) Pareto sets of all of its subproblems. This particular structure of the Pareto set can be exploited to set up a hierarchical algorithm for obtaining a superset of Z_opt,(w),(loc)^D without having to compute the entire (local) (weak) Pareto set U_opt,(w),(loc) first. We start the explanation of the hierarchical algorithm by introducing the notion of a subproblem and related notations.
Definition 8.
For an index set I ⊆ {1, …, k} we denote by Ĵ_I the multi-objective cost function (Ĵ_i)_{i ∈ I} : U_ad → ℝ^I, and call the problem
\[
\min \hat{J}_I(u) \quad \text{s.t.} \quad u \in U_{\mathrm{ad}} \tag{MOP$_I$}
\]
a subproblem of (MOP). For I, K ⊆ {1, …, k} with K ⊆ I,
(a)
and for every y ∈ ℝ^I we denote by y^K := (y_i)_{i ∈ K} ∈ ℝ^K the canonical projection onto ℝ^K.
(b)
the set U opt , ( w ) , ( loc ) I : = { u U ad u is ( loc . ) ( weak . ) Pareto optimal for ( MOP I ) } denotes the (local) (weak) Pareto set and the set J opt , ( w ) , ( loc ) I : = J ^ I ( U opt , ( w ) , ( loc ) I ) R I denotes the (local) (weak) Pareto front of the subproblem (MOPI).
(c)
the (local) (weak) nadir objective point for the subproblem (MOPI) is defined by
\[
y_i^{\mathrm{nad},I,(w),(loc)} := \sup \big\{\, y_i \ \big|\ y \in J^{I}_{\mathrm{opt,(w),(loc)}} \,\big\} \quad \text{for all } i \in I.
\]
Given a subproblem (MOPI), it is straightforward to define the PS problem for this setting.
Definition 9.
Let I ⊆ {1, …, k} be arbitrary. For a given reference point z ∈ ℝ^I and the target direction r ∈ ℝ^k_>, we define the PS problem for (MOPI) by
\[
\min t \quad \text{s.t.} \quad (t,u) \in \mathbb{R} \times U_{\mathrm{ad}} \ \text{ and } \ \hat{J}_I(u) - z \le t\,r^I. \tag{$P^{PS}_{I,z,r}$}
\]
Again, it is possible to show that ( P I , z , r PS ) is equivalent (in the sense of Theorem 3) to the problem
\[
\min \ \max_{i \in I} \frac{1}{r_i}\big(\hat{J}_i(u) - z_i\big) \quad \text{s.t.} \quad u \in U_{\mathrm{ad}}. \tag{$RP^{PS}_{I,z,r}$}
\]
Let us mention that the statements proved in Section 2.1 can be adapted for the PS method for the subproblems. Similarly, we can also generalize the definition of the shifted coordinate plane D and the (locally) (weakly) Pareto sufficient set of reference points Z opt , ( w ) , ( loc ) D to this setting.
Definition 10.
Let I ⊆ {1, …, k} be arbitrary. Given the vector d̃ ∈ ℝ^k_> and the shifted ideal point ỹ^id ∈ ℝ^k, which were both introduced in Definition 7, let D_i^I ⊆ ℝ^I be given by
\[
D_i^I := \big\{\, y \in \mathbb{R}^I \ \big|\ y \ge (\tilde{y}^{\mathrm{id}})^I,\ y_i = \tilde{y}_i^{\mathrm{id}} \,\big\} \quad \text{for } i \in I.
\]
Then the set D^I ⊆ ℝ^I is defined by D^I := ∪_{i ∈ I} D_i^I. Moreover, for all K ⊆ {1, …, k} we define the sets
\[
Z^{D^I\!,\,K}_{\mathrm{opt,(w),(loc)}} := \big\{\, z \in D^I \ \big|\ \exists\, \bar{u} \in U^{K}_{\mathrm{opt,(w),(loc)}}\ \exists\, t \in \mathbb{R}:\ z = \hat{J}_I(\bar{u}) - t\,r^I \,\big\}.
\]
To ease the notation, we write Z_opt,(w),(loc)^{D^I} := Z_opt,(w),(loc)^{D^I, I}. If I = {1, …, k}, we set Z_opt,(w),(loc)^{D, K} := Z_opt,(w),(loc)^{D^I, K} and Z_opt,(w),(loc)^D := Z_opt,(w),(loc)^{D^I, I}. Finally, for any y ∈ ℝ^I we set t_{D^I}(y) := min_{i ∈ I} (y_i − ỹ_i^id)/r_i ∈ ℝ.
Note that also Remark 2 can be rewritten for the subproblems.
The main ingredient of the hierarchical PS method is the result that a superset of Z_opt,(w),(loc)^{D^I} can be computed by using the sets U_opt,(w),(loc)^K for all K ⊊ I. In other words, in contrast to Definition 10, only the Pareto optimal solutions to all subproblems, but not those of the problem itself, are needed to compute the (locally) (weakly) Pareto sufficient set of reference points Z_opt,(w),(loc)^{D^I} for the subproblem (MOPI). The very technical details of the analytical derivation and verification of this result are omitted here to ease and shorten the presentation; a reader interested in the details is referred to [16] (Sections 1.7.4.2-1.7.4.4). Building on this result, the idea of the hierarchical PS method is to iteratively solve subproblems with an increasing number of cost functions. During this procedure, the required reference points for the current subproblem can be computed by using the Pareto optimal solutions of all of its subproblems as described above.
Before we formulate the hierarchical algorithm, we give the necessary numerical condition in order to compute a numerical approximation of the set Z opt , ( w ) , ( loc ) D I by using the numerical solution to all subproblems.
To do so, we introduce a grid on D I as follows.
Definition 11.
Let I ⊆ {1, …, k} be arbitrary. For a given grid size h > 0 and any i ∈ I, we define
\[
Z_i^{h,I} := \Big\{\, z \in D_i^I \ \Big|\ \forall\, j \in I \setminus \{i\}\ \exists\, k \in \mathbb{N}_0:\ z_j = \tilde{y}_j^{\mathrm{id}} + \tfrac{h}{2} + k\,h \ \ \&\ \ z_j \le y_j^{\mathrm{nad},I,w} - \bar{t}_i\, r_j \,\Big\}.
\]
Furthermore, we set Z^{h,I} := ∪_{i ∈ I} Z_i^{h,I}. If I = {1, …, k}, we write Z^h := Z^{h,I}.
The idea is to choose only reference points that lie on the grid Z^{h,I} and do not satisfy the condition
\[
\exists\, K \subsetneq I\ \exists\, (\bar{u}, \bar{t}, \bar{z}) \in UTZ^{\mathrm{num}}(K):\ z^K = \bar{z}^K \ \ \&\ \ z^{I \setminus K} \le \hat{J}_{I \setminus K}(\bar{u}) - \bar{t}\, r^{I \setminus K}, \tag{2}
\]
where UTZ^num(K) is a numerical approximation of UTZ(K) = {(u, d̃_j, ỹ_j^id) | u ∈ Ũ_opt,w(I)}. An explanation for excluding points based on (2) can be found in [16] (Section 1.7.4.5). Finally, we describe the proposed numerical hierarchical PS method in Algorithm 1.
Remark 3.
In [32], the author introduces three different quality criteria for the numerical implementation of a scalarization method, which we discuss here for the presented hierarchical PS method.
(a)
Coverage: Every part of the Pareto set and front has to be represented in the sets U opt , w num and J opt , w num , respectively. This can be measured by
\[
\mathrm{cov}(J_{\mathrm{opt,(w),(loc)}}) := \max_{\bar{y} \in J_{\mathrm{opt,(w),(loc)}}} \ \min_{y \in J^{\mathrm{num}}_{\mathrm{opt,(w),(loc)}}} \ \|\bar{y} - y\|.
\]
In the case of Algorithm 1, we have that cov ( J opt , ( w ) , ( loc ) ) = O ( h ) (cf. [16] (Remark 1.7.69-(a))).
(b)
Uniformity: The points on the Pareto set and front should be distributed (almost) equidistantly; cf. [16] (Remark 1.7.69-(b)).
(c)
Cardinality: The number of points contained in the numerical approximation should be reasonable. In the case of Algorithm 1, it is not possible to estimate a priori the number of elements computed by the method. It is, however, possible to show a bound which can be computed when the nadir objective point y^{nad,(w)} is known (cf. [16], Remark 1.7.69-(c)).
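The coverage criterion of Remark 3 (a) is straightforward to evaluate for finite point sets; a sketch with the (discretized) true front and its approximation given as rows of NumPy arrays (function name is our own):

```python
import numpy as np

def coverage(front, approx):
    """cov = max over points of the (discretized) Pareto front of the
    Euclidean distance to the nearest point of the approximation."""
    F, A = np.asarray(front, float), np.asarray(approx, float)
    # pairwise distances between all front points and all approximation points
    dists = np.linalg.norm(F[:, None, :] - A[None, :, :], axis=2)
    return float(dists.min(axis=1).max())
```

A small coverage value means every part of the front is represented; for Algorithm 1 one expects cov = O(h) as stated above.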
Algorithm 1: Solving (MOP) numerically by the hierarchical PS method
1:  for j = 1 : k do
2:      Set I := {j};
3:      Compute U_opt,w^num(I) = {u | u minimizes Ĵ_j};
4:      Choose d̃_j, compute y_j^id and set ỹ_j^id = y_j^id − d̃_j;
5:      Set UTZ^num(I) = {(u, d̃_j, ỹ_j^id) | u ∈ U_opt,w^num(I)};
6:  end for
7:  for i = 2 : k do
8:      for all I ⊆ {1, …, k} with |I| = i do
9:          Initialize U_opt,w^num(I) = ∪_{K ⊊ I} U_opt,w^num(K) and UTZ^num(I) = ∅;
10:         Compute the reference points Z^num(I) = {z ∈ Z^{h,I} | ¬(2)};
11:         while Z^num(I) ≠ ∅ do
12:             Choose z ∈ Z^num(I) and remove z from Z^num(I);
13:             Solve (P^PS_{I,z,r}) / (RP^PS_{I,z,r});
14:             Set U_opt,w^num(I) ← U_opt,w^num(I) ∪ Q_opt,w^I(z);
15:             Set UTZ^num(I) ← UTZ^num(I) ∪ {(ū, t̄, z) | (ū, t̄) gl. sol. of (P^PS_{I,z,r})};
16:             Add solutions of PSPs with respect to redundant reference points: Set
                UTZ^num(I) ← UTZ^num(I) ∪ {(ū, t̄, z̃) | (ū, t̄) gl. sol. of (P^PS_{I,z,r}), z̃ ∈ Z^num(I) ∩ [z − (t̄ r^I − (Ĵ_I(ū) − z)), z]};
17:             Remove redundant reference points: Set
                Z^num(I) ← Z^num(I) ∖ [z − (t̄ r^I − (Ĵ_I(ū) − z)), z] for all ū ∈ Q_opt,(w)^I(z);
18:         end while
19:     end for
20: end for
21: if computeParetoFront == true then
22:     Remove all u ∈ U_opt,w^num({1, …, k}) with u ∉ U_opt by a non-dominance test;
23: end if
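The non-dominance test used in the last step of Algorithm 1 can be sketched as follows for a finite candidate set, with objective values stored as rows of an array (a minimal illustration of our own, not the implementation used in the paper):

```python
import numpy as np

def nondominated_mask(Y):
    """Boolean mask of the rows of Y that are non-dominated within Y:
    a row y is dominated if another row y' satisfies y' <= y and y' != y."""
    Y = np.asarray(Y, float)
    n = Y.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        if not mask[i]:
            # anything row i dominates was already removed by i's dominator
            continue
        # rows weakly below row i in all components and strictly below in one
        dominated = np.all(Y[i] <= Y, axis=1) & np.any(Y[i] < Y, axis=1)
        mask[dominated] = False
    return mask
```

Keeping only the rows flagged True yields the numerical Pareto front approximation.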

3. The Non-Convex Parametric PDE-Constrained MOP

Before defining our exemplary MOP, we introduce the PDE model which will later serve as an equality constraint. Let Ω R d , d { 2 , 3 } , be a bounded domain with Lipschitz-continuous boundary Γ = Ω . Furthermore, let Ω 1 , , Ω m be a pairwise disjoint decomposition of the domain Ω and set Γ i : = Ω i Ω for all i = 1 , , m . Then we are interested in the following elliptic diffusion-reaction equation with Robin boundary condition:
\[
-\nabla \cdot \Big( \sum_{i=1}^m u_i^\kappa\, \chi_{\Omega_i}(x)\, \nabla y(x) \Big) + u^r\, r(x)\, y(x) = f(x) \quad \text{a.e. in } \Omega, \tag{3a}
\]
\[
u_i^\kappa\, \frac{\partial y}{\partial n}(s) + \alpha\, y(s) = \alpha\, y_a(s) \quad \text{a.e. on } \Gamma_i. \tag{3b}
\]
For every i ∈ {1, …, m}, the parameter u_i^κ > 0 represents the diffusion coefficient on the subdomain Ω_i. By r ∈ L^∞(Ω) we denote a reaction function, which is supposed to satisfy r > 0 a.e. in Ω and is controlled by the scalar parameter u^r > 0. On the right-hand side of (3a), we have the source term f ∈ L²(Ω). The constant α > 0 in (3b) models the heat exchange with the outside of the domain Ω, where a temperature of y_a ∈ L²(Γ) is assumed. In total, the parameter space is given by U = ℝ^m × ℝ, and any parameter u ∈ U can be written as the vector u = (u^κ, u^r)^T with u^κ = (u_1^κ, …, u_m^κ)^T ∈ ℝ^m. Setting H = L²(Ω) and V = H¹(Ω), the weak formulation of (3) is
\[
a(u; y, \varphi) = F(\varphi) \quad \text{for all } \varphi \in V \tag{4}
\]
for any u U . In (4) the parameter-dependent symmetric bilinear form a ( u ; · , · ) : V × V R is given by
\[
a(u; \varphi, \psi) := \sum_{i=1}^m u_i^\kappa \int_{\Omega_i} \nabla\varphi(x) \cdot \nabla\psi(x) \,\mathrm{d}x + u^r \int_{\Omega} r(x)\, \varphi(x)\, \psi(x) \,\mathrm{d}x + \alpha \int_{\Gamma} \varphi(s)\, \psi(s) \,\mathrm{d}s
\]
for all φ, ψ ∈ V and u ∈ U. The linear functional F ∈ V′ is defined by
\[
F(\varphi) := \int_{\Omega} f(x)\, \varphi(x) \,\mathrm{d}x + \alpha \int_{\Gamma} y_a(s)\, \varphi(s) \,\mathrm{d}s \quad \text{for all } \varphi \in V.
\]
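To illustrate the structure of (3) and its parameter dependence, the following sketch assembles and solves a one-dimensional analog (single subdomain, m = 1) with finite differences and Robin conditions at both ends. All names and the discretization are illustrative assumptions of ours, not the discretization used in the paper (which works in 2D/3D):

```python
import numpy as np

def solve_1d(u_kappa, u_r, alpha, r, f, y_a, N=200, L=1.0):
    """Finite-difference sketch of a 1D analog of (3):
       -(u_kappa * y')' + u_r * r(x) * y = f(x)    on (0, L),
       u_kappa * dy/dn + alpha * y = alpha * y_a   at x = 0 and x = L,
    where dy/dn is the outward normal derivative."""
    h = L / N
    x = np.linspace(0.0, L, N + 1)
    A = np.zeros((N + 1, N + 1))
    b = np.zeros(N + 1)
    for i in range(1, N):  # interior second-order stencil
        A[i, i - 1] = A[i, i + 1] = -u_kappa / h**2
        A[i, i] = 2.0 * u_kappa / h**2 + u_r * r(x[i])
        b[i] = f(x[i])
    # Robin boundary rows (first-order one-sided differences)
    A[0, 0], A[0, 1] = u_kappa / h + alpha, -u_kappa / h
    A[N, N], A[N, N - 1] = u_kappa / h + alpha, -u_kappa / h
    b[0] = b[N] = alpha * y_a
    return x, np.linalg.solve(A, b)

# Consistency check: for f = u_r * r * c and y_a = c the solution is y = c.
c, u_r = 3.0, 0.5
x, y = solve_1d(u_kappa=2.0, u_r=u_r, alpha=1.0,
                r=lambda s: 1.0 + s, f=lambda s: u_r * (1.0 + s) * c, y_a=c)
```

Note how the matrix depends affinely on (u_kappa, u_r); this linear parameter dependence is exactly what the gradient formulas below and the RB method exploit.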
Lemma 2.
(a)
For all u U it holds
\[
\|a(u;\cdot,\cdot)\|_{L(V,V)} \le C\,\|u\|_U
\]
with a constant C > 0 , which does not depend on u.
(b)
For all u ∈ U with u^κ > 0 in ℝ^m and u^r > 0, it holds
\[
a(u;\varphi,\varphi) \ge \min\{u_1^\kappa, \ldots, u_m^\kappa,\, u^r\}\, \|\varphi\|_V^2 \quad \text{for all } \varphi \in V.
\]
(c)
The mapping F ∈ V′ is well-defined.
Proof. 
All statements follow from similar arguments of [33] (Lemma 1.4), where related operators were considered in the parabolic case.      □
Theorem 6.
Let u ∈ U with u > 0 be arbitrary. Then there is a unique solution y = y(u) ∈ V of (3). Moreover, the estimate
\[
\|y\|_V \le C \big( \|f\|_{L^2(\Omega)} + \|y_a\|_{L^2(\Gamma)} \big) \tag{5}
\]
holds with a constant C > 0 , which depends continuously on u, but is independent of f and y a .
Proof. 
The claims follow from the Lax-Milgram theorem (cf. [34]) and Lemma 2. □
Definition 12.
Let u_min^κ ∈ (0, ∞)^m and u_min^r > 0 be arbitrary. Then we define the closed set
\[
U_{\mathrm{eq}} := \{\, u \in U \mid u^\kappa \ge u_{\mathrm{min}}^\kappa,\ u^r \ge u_{\mathrm{min}}^r \,\}.
\]
In view of Theorem 6, it is possible to define the solution operator S : U eq V , which maps any parameter u U eq to the unique solution y = S ( u ) V of (4).
Remark 4.
Due to Lemma 2, we can conclude that a(u; φ, φ) ≥ α_min ‖φ‖_V² for all φ ∈ V and u ∈ U_eq, where α_min := min{(u_min^κ)_1, …, (u_min^κ)_m, u_min^r} > 0. In particular, the constant C in (5) can be chosen independently of u if we restrict ourselves to parameters u ∈ U_eq.
Theorem 7.
The solution operator S : U_eq → V is twice continuously Fréchet differentiable. For the first derivative S′ : U_eq → L(U, V), we have that for any u ∈ U_eq and h ∈ U the function y_h := S′(u)h ∈ V solves the equation
\[
a(u; y_h, \varphi) = -\nabla_u a(u; S(u), \varphi)\, h \quad \text{for all } \varphi \in V.
\]
The second derivative S″ : U_eq → L(U, L(U, V)) is given as follows: For any u ∈ U_eq and h_1, h_2 ∈ U, the function y_{h_1,h_2} := S″(u)(h_1, h_2) solves the equation
\[
a(u; y_{h_1,h_2}, \varphi) = -\nabla_u a(u; S'(u)h_1, \varphi)\, h_2 - \nabla_u a(u; S'(u)h_2, \varphi)\, h_1 \quad \text{for all } \varphi \in V.
\]
Remark 5.
By ∇_u a we denote the partial derivative of the mapping a w.r.t. the parameter u. Since a is linear in u, it holds
\[
\nabla_u a(u; \varphi, \psi)\, h = a(h; \varphi, \psi), \qquad \nabla_u^2 a(u; \varphi, \psi) = 0 \in L(U,U)
\]
for all u , h U and all φ , ψ V . In particular, we can identify u a ( u ; φ , ψ ) U by
\[
\nabla_u a(u; \varphi, \psi) = \begin{pmatrix} \int_{\Omega_1} \nabla\varphi(x) \cdot \nabla\psi(x) \,\mathrm{d}x \\ \vdots \\ \int_{\Omega_m} \nabla\varphi(x) \cdot \nabla\psi(x) \,\mathrm{d}x \\ \int_{\Omega} r(x)\, \varphi(x)\, \psi(x) \,\mathrm{d}x \end{pmatrix} \in U
\]
by using the Riesz representation theorem.
We are now ready to state the multiobjective parametric PDE-constrained optimization problem (MPPOP). Let k N be fixed and
\[
\sigma_\Omega^{(1)}, \ldots, \sigma_\Omega^{(k)} \ge 0 \quad \text{as well as} \quad \sigma_U^{(1)}, \ldots, \sigma_U^{(k)} \ge 0
\]
be non-negative weights. Furthermore, denote by y Ω ( 1 ) , , y Ω ( k ) H the desired states and by u d ( 1 ) , , u d ( k ) U the desired parameters. Then we define the multiobjective essential cost functions J ^ 1 , , J ^ k : U eq R by
J ^ i ( u ) : = σ Ω ( i ) 2 S ( u ) y Ω ( i ) H 2 + σ U ( i ) 2 u u d ( i ) U 2 for all u U eq and i { 1 , , k } .
Moreover, u a , u b with u a u b are lower and upper bounds on the parameter u which we assume to be finite. We define U ad : = { u U u a u u b } and we assume that U ad U eq holds. Note that U ad is a closed, convex and bounded set because of the finiteness assumption on u a and u b . We are interested in solving
min u U ad J ^ ( u ) = min u U ad J ^ 1 ( u ) , , J ^ k ( u ) T .
Note that, thanks to the assumptions on $U_{\mathrm{ad}}$ and $\sigma_U^{(i)}$, the costs $\hat J_1, \dots, \hat J_k$ satisfy Assumption 1. This problem therefore fits into the framework of non-convex multiobjective optimization, and Algorithm 1 can be applied. The non-convexity stems from the way the bilinear form depends on the parameter $u$: it makes the solution mapping non-linear and thus the MPPOP non-convex. To close this section, we derive expressions for the gradient and Hessian of the cost functionals $\hat J_1, \dots, \hat J_k$. We define the $i$-th adjoint equation and its solution operator as follows.
Definition 13.
For $i = 1, \dots, k$, the solution operator of the $i$-th adjoint equation is $\mathcal{A}_i : U_{\mathrm{eq}} \to V$, where for any given $u \in U_{\mathrm{eq}}$ the adjoint state $p^{(i)} := \mathcal{A}_i(u)$ solves the equation
$$a(u; \varphi, p^{(i)}) = -\sigma_\Omega^{(i)} \big\langle S(u) - y_\Omega^{(i)}, \varphi \big\rangle_H \quad \text{for all } \varphi \in V.$$
As shown in [16], these operators satisfy the following two results:
Lemma 3.
The solution operator $\mathcal{A}_i : U_{\mathrm{eq}} \to V$ is continuously Fréchet differentiable for every $i = 1, \dots, k$. For the first derivative $\mathcal{A}_i' : U_{\mathrm{eq}} \to \mathcal{L}(U, V)$, we have that for any $u \in U_{\mathrm{eq}}$ and $h \in U$ the function $p^{(i),h} := \mathcal{A}_i'(u)h \in V$ solves the equation
$$a(u; \varphi, p^{(i),h}) = -\nabla_u a(u; \varphi, \mathcal{A}_i(u))\,h - \sigma_\Omega^{(i)} \big\langle S'(u)h, \varphi \big\rangle_H \quad \text{for all } \varphi \in V.$$
Corollary 1.
Let $U_{\mathrm{ad}} \subseteq U_{\mathrm{eq}}$, $u \in U_{\mathrm{ad}}$ and $h \in U$ be arbitrary. Then for $i = 1, \dots, k$ the cost functions $\hat J_i$ are twice continuously Fréchet differentiable and it holds
$$\nabla \hat J_i(u) = \nabla_u a(u; S(u), \mathcal{A}_i(u)) + \sigma_U^{(i)} \big( u - u_d^{(i)} \big),$$
$$\nabla^2 \hat J_i(u)\,h = \nabla_u a(u; S'(u)h, \mathcal{A}_i(u)) + \nabla_u a(u; S(u), \mathcal{A}_i'(u)h) + \sigma_U^{(i)} h,$$
where we use the representation of $\nabla_u a(u; S(u), \mathcal{A}_i(u)) \in U'$ in $U$, cf. Remark 5.

The RB Method for MPPOP

One of the limitations of solving the MPPOP directly with the PS method is the high computational cost. Algorithm 1, in fact, requires solving the state and adjoint equations a large number of times in order to approximate the Pareto set efficiently. Unfortunately, the numerical evaluation of the state and adjoint solution operators is costly due to the high number of degrees of freedom required to apply, for example, the FE method. For this reason, we use the RB method. In the following we explain how the RB method can be applied to our model. From Theorem 6, we know that the weak form of the state equation admits a unique solution for any parameter $u \in U_{\mathrm{eq}}$. This allows us to define the solution operator $S : U_{\mathrm{eq}} \to V$. Now, let us consider the so-called solution manifold $\mathcal{M} := \{ S(u) \mid u \in U_{\mathrm{eq}} \} \subset V$. The goal of the RB method is to provide a low-dimensional subspace $V^\ell \subset V$ which is a good approximation of $\mathcal{M}$. The subspace $V^\ell$ is defined as the span of linearly independent snapshots $S(u_1), \dots, S(u_\ell)$ for selected parameters $u_1, \dots, u_\ell \in U_{\mathrm{eq}}$. Clearly, $V^\ell$ has dimension $\ell$ and the snapshots constitute its basis. Let us postpone the discussion on how to select good parameters for generating $V^\ell$. Given an RB space $V^\ell$, we obtain the reduced-order state equation by a Galerkin projection:
$$a(u; y^\ell, \psi) = F(\psi) \quad \text{for all } \psi \in V^\ell. \tag{8}$$
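Since the bilinear form is parameter-separable (cf. Remark 5), the Galerkin projection can be prepared once and then assembled cheaply for every new parameter. The following NumPy sketch illustrates this on a generic toy full-order system; the matrices, sizes and names are illustrative assumptions, not the paper's discretization:

```python
import numpy as np

def reduced_operators(Aq_list, f, V):
    """Precompute the Galerkin-projected quantities V^T A_q V and V^T f."""
    return [V.T @ Aq @ V for Aq in Aq_list], V.T @ f

def reduced_state_solve(u, Aq_red, f_red):
    """Assemble A(u) = sum_q u_q (V^T A_q V), affine in u, and solve the
    small reduced-order system for the RB coefficients of y^l."""
    A = sum(uq * Aq for uq, Aq in zip(u, Aq_red))
    return np.linalg.solve(A, f_red)

# toy full-order model: three SPD affine blocks (purely illustrative)
rng = np.random.default_rng(0)
n = 60
Aq = [B @ B.T + n * np.eye(n)
      for B in (rng.standard_normal((n, n)) for _ in range(3))]
f = rng.standard_normal(n)
u = np.array([1.0, 2.0, 0.5])

y_full = np.linalg.solve(sum(uq * A_ for uq, A_ in zip(u, Aq)), f)
# RB basis spanned by one snapshot plus a filler direction
V, _ = np.linalg.qr(np.column_stack([y_full, rng.standard_normal(n)]))
Aq_red, f_red = reduced_operators(Aq, f, V)
y_red = reduced_state_solve(u, Aq_red, f_red)
err = np.linalg.norm(V @ y_red - y_full)  # Galerkin is exact on span(V) here
```

Because the snapshot $S(u)$ lies in the span of the basis, the Galerkin projection reproduces it exactly; for other parameters the reduced solve costs only $O(\ell^2)$ per affine term plus one small dense solve.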
Also for the reduced-order equation, we have unique solvability for all parameters $u \in U_{\mathrm{eq}}$. The solution map $S^\ell : U_{\mathrm{eq}} \to V^\ell$, which maps any parameter $u \in U_{\mathrm{eq}}$ to the unique solution $y^\ell = S^\ell(u) \in V^\ell$ of (8), is then well-defined. We can similarly define a reduced-order adjoint equation and essential cost functional. For $i = 1, \dots, k$, we define the essential reduced-order cost functions $\hat J_i^\ell : U_{\mathrm{eq}} \to \mathbb{R}$ by
$$\hat J_i^\ell(u) := \frac{\sigma_\Omega^{(i)}}{2}\, \big\| S^\ell(u) - y_\Omega^{(i)} \big\|_H^2 + \frac{\sigma_U^{(i)}}{2}\, \big\| u - u_d^{(i)} \big\|_U^2,$$
the reduced-order adjoint equation by
$$a(u; \psi, p^{(i),\ell}) = -\sigma_\Omega^{(i)} \big\langle S^\ell(u) - y_\Omega^{(i)}, \psi \big\rangle_H \quad \text{for all } \psi \in V^\ell$$
and the reduced-order adjoint solution operator $\mathcal{A}_i^\ell : U_{\mathrm{eq}} \to V^\ell$. Following Corollary 1, it is possible to represent the gradient and the Hessian of the essential reduced-order cost functions $\hat J_i^\ell$ for $i = 1, \dots, k$ by simply replacing the operators $S$ and $\mathcal{A}_i$ by their respective reduced-order versions $S^\ell$ and $\mathcal{A}_i^\ell$. There are still two aspects which remain to be clarified: first, how to generate an RB space which guarantees a good approximation of the state and adjoint solution manifolds and, second, how to estimate a-posteriori (i.e., without explicitly evaluating the full-order solution operators $S$ and $\mathcal{A}_i$) the error of such an approximation.
For the first aspect, one can think of building an RB space either prior to solving the reduced-order optimization problem or while solving it. The first approach is the so-called offline/online decomposition; cf. [35]. This technique exploits a greedy algorithm in the offline phase, which iteratively searches for the parameter at which the approximation error between the full- and reduced-order state and adjoint variables is the largest. The RB space is then enriched (by solving the full-order state and adjoint equations at the respective parameter and orthonormalizing the newly computed snapshots with respect to the current RB basis) until a pre-defined tolerance for the approximation error is reached. Once the RB space is computed, the online phase can start: the optimization problem is solved fast on the reduced-order level. Although this technique is still widely used in the literature, it shows many disadvantages in the context of optimization. First, it suffers from the curse of dimensionality: for a high-dimensional parameter space it is too costly to explore the entire parameter space with a greedy procedure. Second, it is counter-intuitive to prepare an RB space which is accurate for any parameter when the optimization method usually follows a (short) path in the parameter space to find the solution, or when the Pareto set is contained in some local regions of the parameter space, as is often the case for non-convex multiobjective problems. While it is true that the computational costs of an offline phase could be amortized in the context of multiobjective optimization for a reasonably small dimension of the parameter space, due to the vast number of scalarized PS problems that need to be solved in the online phase, the disadvantage of the offline/online splitting in this setting is the lack of control of the accuracy of the Pareto optimal solutions.
Indeed, to the best of our knowledge there are no suitable error indicators for the greedy algorithm which guarantee a certified accuracy of the reduced-order Pareto optimal points w.r.t. the full-order ones. Fortunately, the focus has shifted recently towards adapting the RB space while proceeding with the optimization method. This procedure is followed, e.g., by the methods presented in [14,15,17,18]. The advantage of these methods over methods based on an offline/online splitting is that they compute first-order critical points of the full-order optimization problem. Let us specify that in [14,17,18] the authors proposed and progressively improved an RB method combined with a TR algorithm, based on more general results presented in [15]. Such a method constructs the RB space adaptively while the optimizer is computing the optimal solution. Our focus here is on further improving the method in [17], which can be considered the most general among the TR-RB methods.
For any of the above-mentioned methods, a-posteriori error estimates are crucial: they provide upper bounds on the approximation error made by the RB space in reconstructing the solution for a given parameter without any full-order solution at hand. In the case of optimization, one is also interested in estimating the error in reconstructing the cost functional and its gradient. For our model, we can use the following estimates:
Theorem 8.
Let $u \in U_{\mathrm{ad}}$ be arbitrary and denote by $\alpha(u)$ the coercivity constant of the bilinear form $a(u; \cdot, \cdot)$. By Remark 4, it holds $\alpha(u) \geq \alpha_{\min} > 0$. Let the residual $r_{\mathrm{st}}(u; \cdot) \in V'$ be given by $r_{\mathrm{st}}(u; \varphi) := F(\varphi) - a(u; S^\ell(u), \varphi)$ for all $\varphi \in V$. Then it holds
$$\big\| S(u) - S^\ell(u) \big\|_V \leq \Delta_{\mathrm{st}}(u) := \frac{\| r_{\mathrm{st}}(u; \cdot) \|_{V'}}{\alpha(u)}.$$
For $i = 1, \dots, k$ the residual $r_{\mathrm{adj}}^{(i)}(u; \cdot) \in V'$ of the adjoint equations is given by $r_{\mathrm{adj}}^{(i)}(u; \varphi) := -\sigma_\Omega^{(i)} \langle S^\ell(u) - y_\Omega^{(i)}, \varphi \rangle_H - a(u; \varphi, \mathcal{A}_i^\ell(u))$ for all $\varphi \in V$. Then it holds
$$\big\| \mathcal{A}_i(u) - \mathcal{A}_i^\ell(u) \big\|_V \leq \Delta_{\mathrm{adj}}^{(i)}(u) := \frac{\| r_{\mathrm{adj}}^{(i)}(u; \cdot) \|_{V'} + \sigma_\Omega^{(i)} \Delta_{\mathrm{st}}(u)}{\alpha(u)}.$$
Furthermore, for $i = 1, \dots, k$ we have
$$\big| \hat J_i(u) - \hat J_i^\ell(u) \big| \leq \Delta_{\mathrm{st}}(u)\, \big\| r_{\mathrm{adj}}^{(i)}(u; \cdot) \big\|_{V'} + \sigma_\Omega^{(i)}\, \Delta_{\mathrm{st}}(u)^2 =: \Delta_{\hat J_i}(u),$$
$$\big\| \nabla \hat J_i(u) - \nabla \hat J_i^\ell(u) \big\|_U \leq \big\| \nabla_u a(u; \cdot, \cdot) \big\| \Big( \big\| S^\ell(u) \big\|_V\, \Delta_{\mathrm{adj}}^{(i)}(u) + \Delta_{\mathrm{st}}(u)\, \Delta_{\mathrm{adj}}^{(i)}(u) + \Delta_{\mathrm{st}}(u)\, \big\| \mathcal{A}_i^\ell(u) \big\|_V \Big) =: \Delta_{\nabla \hat J_i}(u).$$
Proof. 
A proof of the a-posteriori error estimates for the state and adjoint can be found in [35]. For the cost function and the gradient, we refer to [18] (Proposition 2.5). □
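As a concrete illustration of the state estimate in Theorem 8, the dual residual norm and the bound $\Delta_{\mathrm{st}}(u)$ can be computed as follows. This is a hypothetical NumPy sketch on a generic symmetric toy system with the Euclidean inner product standing in for the $V$-inner product; it is not the paper's FE model:

```python
import numpy as np

def dual_norm(r, M_V):
    """||r||_{V'} = sqrt(r^T M_V^{-1} r), computed via the Riesz representative."""
    return float(np.sqrt(r @ np.linalg.solve(M_V, r)))

def state_error_estimator(u, y_red, Aq, f, V, M_V, alpha):
    """Delta_st(u) = ||r_st(u;.)||_{V'} / alpha(u) with r = f - A(u) V y_red."""
    A = sum(uq * A_ for uq, A_ in zip(u, Aq))
    return dual_norm(f - A @ (V @ y_red), M_V) / alpha

# toy symmetric full-order system; V-inner product taken as Euclidean (M_V = I)
rng = np.random.default_rng(1)
n = 40
Aq = [B @ B.T + n * np.eye(n)
      for B in (rng.standard_normal((n, n)) for _ in range(2))]
f = rng.standard_normal(n)
u = np.array([1.0, 0.5])
A = sum(uq * A_ for uq, A_ in zip(u, Aq))
M_V = np.eye(n)
alpha = np.linalg.eigvalsh(A).min()               # coercivity constant of a(u;.,.)

V, _ = np.linalg.qr(rng.standard_normal((n, 3)))  # basis NOT containing S(u)
y_red = np.linalg.solve(V.T @ A @ V, V.T @ f)     # reduced Galerkin solution
true_err = np.linalg.norm(np.linalg.solve(A, f) - V @ y_red)
est = state_error_estimator(u, y_red, Aq, f, V, M_V, alpha)
```

By construction `est` is a guaranteed upper bound for `true_err`; in an actual RB implementation the dual norm would of course be evaluated through preassembled parameter-separable quantities rather than through the full-order residual.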
Note that we only need the reduced-order state and adjoint state to evaluate the a-posteriori error estimates. For our example, the computation of the coercivity constant $\alpha(u)$ is cheap, see Lemma 2. In more general examples, this might not be the case. Thus, one often uses a quickly computable lower bound $\alpha_{\mathrm{LB}}(u)$ instead. Possible methods for computing such a lower bound are, e.g., the min-theta approach (cf. [35]) or the Successive Constraint Method (SCM) (cf. [36]). In situations in which the computation or the estimation of the coercivity constant is complicated, the TR-RB algorithms presented in [19,20] have the advantage that they do not require the computation or estimation of the coercivity constant, but only rely on asymptotic error estimates consisting of residual-based error indicators. Note finally that the computation of the terms $\| r_{\mathrm{st}}(u; \cdot) \|_{V'}$ and $\| r_{\mathrm{adj}}^{(i)}(u; \cdot) \|_{V'}$ is not possible in an infinite-dimensional setting. Even after discretization with the FE method, the cost of computing these terms depends on the dimension of the full-order model, which contradicts the requirement of having a computationally cheap estimate. However, in our case, the parameter-separability of the bilinear form $a(u; \cdot, \cdot)$ can be exploited to preassemble certain quantities in such a way that the computational cost for evaluating $\| r_{\mathrm{st}}(u; \cdot) \|_{V'}$ and $\| r_{\mathrm{adj}}^{(i)}(u; \cdot) \|_{V'}$ only depends on the dimension of the RB space; see, e.g., [36]. Finally, we apply the RB method to (MPPOP): for a given RB space $V^\ell$ the reduced-order MPPOP reads
$$\min\ \hat J^\ell(u) = \big( \hat J_1^\ell(u), \dots, \hat J_k^\ell(u) \big)^{\mathsf T} \quad \text{s.t.} \quad u \in U_{\mathrm{ad}}.$$
For an arbitrary reference point $z \in \mathbb{R}^k$ and target direction $r \in \mathbb{R}^k$, the reduced-order PS problem reads
$$\min_{(u,t)}\ t \quad \text{s.t.} \quad (t, u) \in \mathbb{R} \times U_{\mathrm{ad}} \ \text{ and } \ \hat J_i^\ell(u) - z_i \leq t\, r_i, \quad i = 1, \dots, k. \qquad (P_{z,r}^{\mathrm{PS},\ell})$$
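For the target direction $r = (1, \dots, 1)$, the PS scalarization is simply a min-max problem: for fixed $u$ the optimal $t$ is $\max_i \big( \hat J_i^\ell(u) - z_i \big)$. The following toy sketch (with two explicit hypothetical objectives instead of the PDE-constrained ones) illustrates this equivalence by brute-force sampling:

```python
import numpy as np

def ps_value(u, objectives, z):
    """For fixed u, the optimal t of the PS problem with r = (1, ..., 1) is
    t*(u) = max_i (J_i(u) - z_i); the PS problem then reads min_u t*(u)."""
    return max(Ji(u) - zi for Ji, zi in zip(objectives, z))

# toy bi-objective on U_ad = [-2, 2], solved by brute-force sampling
objectives = [lambda u: (u - 1.0) ** 2, lambda u: (u + 1.0) ** 2]
z = np.array([-0.1, -0.1])              # reference point below both objectives
grid = np.linspace(-2.0, 2.0, 4001)
vals = [ps_value(u, objectives, z) for u in grid]
u_star = float(grid[int(np.argmin(vals))])  # compromise parameter, here u* = 0
```

For these two symmetric parabolas the PS solution is the compromise $u^\ast = 0$, where both shifted objectives are equal; in the actual algorithm this inner problem is of course solved by the augmented Lagrangian method rather than by sampling.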
One could then outline an algorithm similar to Algorithm 1 by using an offline/online splitting. Because of the above-mentioned disadvantages, we focus on combining the PSPs with the TR-RB method from [17] and extend it with respect to the method in [16]. The TR method introduces new aspects to the RB implementation, such as the adaptive construction of the RB space; see the next section for further details.

4. The TR-RB Method

We briefly introduce the method from [17] and clarify how to apply it in combination with the PS method. In Section 4.2 we highlight our extension of this method and how it can reduce the computational time. The basic idea of a TR method is to compute a first-order critical point of a costly optimization problem by iteratively solving cheap-to-solve approximations in local regions of the admissible space in which these model approximations can be trusted (i.e., are accurate enough). In such a way, one can derive a globally convergent method which terminates in a finite number of steps. For each outer iteration $j \geq 0$ of the TR method, the cheap approximation of the objective is generally indicated by $m^{(j)}$ and the trust region is described by a radius $\delta^{(j)}$. To simplify the exposition, let us stick with the case $U = \mathbb{R}^m \times \mathbb{R}$, as in Section 3. The TR method then solves, for each $j \geq 0$, the following constrained optimization sub-problem
$$\min_{v \in U}\ m^{(j)}(v) \quad \text{s.t.} \quad \| v \|_2 \leq \delta^{(j)}, \quad \tilde u := u^{(j)} + v \in U_{\mathrm{ad}}. \tag{11}$$
Under suitable assumptions, problem (11) admits a unique solution $\bar v^{(j)}$, which is used to compute the next outer iterate $u^{(j+1)} = u^{(j)} + \bar v^{(j)}$. To further simplify the presentation of the algorithm in [17], let us present it for a general cost functional $J$. Later in this section we will give more details about its application to the MPPOP and the PS method. The TR-RB version of problem (11) is
$$\min_{\tilde u \in U_{\mathrm{ad}}}\ J^{\ell,(j)}(\tilde u) \quad \text{s.t.} \quad q^{(j)}(\tilde u) := \frac{\Delta_J^{\ell,(j)}(\tilde u)}{J^{\ell,(j)}(\tilde u)} \leq \delta^{(j)}, \tag{12}$$
where $J^{\ell,(j)}$ is the cost functional w.r.t. the reduced-order model at the $j$-th iteration and $\Delta_J^{\ell,(j)}(\tilde u)$ is an estimate for the error $| J(\tilde u) - J^{\ell,(j)}(\tilde u) |$. Looking at (12), one clearly sees that the role of the model function $m^{(j)}$ is played by the reduced-order cost functional. This is perfectly in line with the TR spirit of having a cheap-to-solve approximation of the original optimization problem. The trust regions are instead defined through the RB error estimator, which is in fact the quantity we use to check the quality of the approximation. Let us mention at this point that there are also different approaches. In [19,20] the authors incorporated the usual trust-region constraint as seen in (11) into a TR-RB algorithm. In [18] the importance of introducing a correction term on the RB level to improve the performance of the method is also discussed. We point out that this only has to be done if one chooses two separate RB spaces for the state and adjoint equations (see also [17]); this will not be the case for our application. In Algorithm 2, we report the method from [17]. In what follows, we guide the reader through the features of the algorithm. At first, we need to initialize the reduced-order model at the initial guess $u^{(0)}$. This means computing $S(u^{(0)})$ and $\mathcal{A}_i(u^{(0)})$ for $i = 1, \dots, k$ and generating the RB space $V^{\ell,(0)}$ as their span. Similarly, updating the RB space $V^{\ell,(j)}$ at the point $u^{(j+1)}$ means computing the full-order quantities $S(u^{(j+1)})$ and $\mathcal{A}_i(u^{(j+1)})$ for $i = 1, \dots, k$ and adding them to the RB space by a Gram-Schmidt orthonormalization.
In Line 3 of Algorithm 2, it is required to compute the so-called approximated generalized Cauchy (AGC) point. We report here its definition according to [15,18].
Definition 14.
Let $\kappa \in (0,1)$ and $\kappa_{\mathrm{arm}} \in (0,1)$ be backtracking parameters. For the current iterate $u^{(j)}$ define $d^{(j)} := \nabla J^{\ell,(j)}(u^{(j)})$. Let $\alpha \in \mathbb{N}$ be the smallest number for which the two conditions
$$J^{\ell,(j)}\big( P_{U_{\mathrm{ad}}}(u^{(j)} - \kappa^\alpha d^{(j)}) \big) \leq J^{\ell,(j)}(u^{(j)}) - \frac{\kappa_{\mathrm{arm}}}{\kappa^\alpha} \big\| P_{U_{\mathrm{ad}}}(u^{(j)} - \kappa^\alpha d^{(j)}) - u^{(j)} \big\|_U^2, \tag{13}$$
$$q^{(j)}\big( P_{U_{\mathrm{ad}}}(u^{(j)} - \kappa^\alpha d^{(j)}) \big) \leq \delta^{(j)} \tag{14}$$
are satisfied, where $P_{U_{\mathrm{ad}}} : U \to U_{\mathrm{ad}}$ is the canonical projection onto the closed and convex set $U_{\mathrm{ad}}$. Then we define the AGC point as $u_{\mathrm{AGC}}^{(j)} := P_{U_{\mathrm{ad}}}(u^{(j)} - \kappa^\alpha d^{(j)})$.
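The AGC point of Definition 14 amounts to a projected Armijo backtracking with an additional trust-region test. A minimal NumPy sketch, with a generic objective, projection and hypothetical parameter values standing in for the reduced-order quantities, could look like:

```python
import numpy as np

def agc_point(u, grad, J, proj, q=None, delta=np.inf,
              kappa=0.5, kappa_arm=1e-4, max_iter=50):
    """Smallest backtracking step kappa^alpha satisfying the projected
    Armijo condition and, if given, the trust-region test q(cand) <= delta."""
    d = grad(u)
    Ju = J(u)
    step = 1.0                                    # kappa^0
    for _ in range(max_iter):
        cand = proj(u - step * d)
        armijo = J(cand) <= Ju - (kappa_arm / step) * float((cand - u) @ (cand - u))
        inside = q is None or q(cand) <= delta
        if armijo and inside:
            return cand
        step *= kappa                             # increase alpha by one
    return u                                      # fallback: no step accepted

# demo: J(u) = ||u||^2 on the box [0.5, 2]^2, no trust-region constraint
J = lambda u: float(u @ u)
grad = lambda u: 2.0 * u
proj = lambda u: np.clip(u, 0.5, 2.0)
u_agc = agc_point(np.array([2.0, 2.0]), grad, J, proj)
```

In the demo the first trial step already satisfies the Armijo condition, so the AGC point is the projected full gradient step onto the lower corner of the box.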
The TR-RB subproblem (12) is then solved in Line 4 using a projected Newton-CG algorithm with the AGC point as a warm start and the following termination criteria
$$\big\| u - P_{U_{\mathrm{ad}}}\big( u - \nabla J^{\ell,(j)}(u) \big) \big\|_U \leq \tau_{\mathrm{sub}}, \qquad \beta_{\mathrm{bound}}\, \delta^{(j)} \leq q^{(j)}(u) \leq \delta^{(j)}. \tag{15}$$
The first condition in (15) is the standard first-order criticality condition with tolerance $\tau_{\mathrm{sub}} \in (0,1)$; the second one was already introduced in [14] to avoid too many iterations close to the TR boundary, which is generally an area where we already start to trust the model function less. The parameter $\beta_{\mathrm{bound}}$ is usually chosen close to one exactly for this purpose.
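The stopping test (15) combines projected-gradient stationarity with a near-boundary check; a small sketch (tolerances and the box projection are illustrative assumptions) reads:

```python
import numpy as np

def subproblem_terminate(u, grad_u, proj, q_u, delta,
                         tau_sub=1e-6, beta_bound=0.95):
    """Stop the subproblem solver at (approximate) projected-gradient
    stationarity, or when the iterate sits close to the TR boundary,
    where the reduced model is trusted less anyway."""
    stationary = np.linalg.norm(u - proj(u - grad_u)) <= tau_sub
    near_boundary = beta_bound * delta <= q_u <= delta
    return bool(stationary or near_boundary)

proj = lambda v: np.clip(v, 0.0, 1.0)
# stationary: the gradient step leaves the box, projection maps back to u
stop_stationary = subproblem_terminate(np.zeros(2), np.ones(2), proj, 0.0, 1.0)
# interior, non-stationary, far from the TR boundary: keep iterating
keep_going = subproblem_terminate(np.full(2, 0.5), np.array([1.0, 0.0]), proj, 0.1, 1.0)
# non-stationary but q(u) close to delta: stop
stop_boundary = subproblem_terminate(np.full(2, 0.5), np.array([1.0, 0.0]), proj, 0.98, 1.0)
```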
Algorithm 2: TR-RB algorithm

1: Initialize the reduced-order model at $u^{(0)}$, set $j = 0$ and Loop_flag=True;
2: while Loop_flag do
3:   Compute the AGC point $u_{\mathrm{AGC}}^{(j)}$;
4:   Compute $u^{(j+1)}$ as solution of (12) with stopping criteria (15);
5:   if $J^{\ell,(j)}(u^{(j+1)}) + \Delta_J^{\ell,(j)}(u^{(j+1)}) < J^{\ell,(j)}(u_{\mathrm{AGC}}^{(j)})$ then
6:     Accept $u^{(j+1)}$, set $\delta^{(j+1)} = \delta^{(j)}$, compute $\varrho^{(j)}$ and $g(u^{(j+1)})$;
7:     if $g(u^{(j+1)}) \leq \tau_{\mathrm{FOC}}$ then
8:       Set Loop_flag=False;
9:     else
10:      if $\varrho^{(j)} \geq \eta_\varrho$ then
11:        Enlarge the TR radius $\delta^{(j+1)} = \beta_1^{-1} \delta^{(j)}$;
12:      end if
13:      if not Skip_enrichment_flag$(j)$ then
14:        Update the RB model at $u^{(j+1)}$;
15:      end if
16:    end if
17:  else if $J^{\ell,(j)}(u^{(j+1)}) - \Delta_J^{\ell,(j)}(u^{(j+1)}) > J^{\ell,(j)}(u_{\mathrm{AGC}}^{(j)})$ then
18:    if $\beta_1 \delta^{(j)} \leq \delta_{\min}$ or Skip_enrichment_flag$(j-1)$ then
19:      Update the RB model at $u^{(j+1)}$;
20:    end if
21:    Reject $u^{(j+1)}$, shrink the radius $\delta^{(j+1)} = \beta_1 \delta^{(j)}$ and go to 4;
22:  else
23:    Compute $J(u^{(j+1)})$, $g(u^{(j+1)})$, $\varrho^{(j)}$ and set $\delta^{(j+1)} = \beta_1^{-1} \delta^{(j)}$;
24:    if $g(u^{(j+1)}) \leq \tau_{\mathrm{FOC}}$ then
25:      Set Loop_flag=False;
26:    else
27:      if Skip_enrichment_flag$(j)$ and $\varrho^{(j)} \geq \eta_\varrho$ then
28:        Accept $u^{(j+1)}$;
29:      else if $J(u^{(j+1)}) \leq J^{\ell,(j)}(u_{\mathrm{AGC}}^{(j)})$ then
30:        Accept $u^{(j+1)}$ and update the RB model;
31:        if $\varrho^{(j)} < \eta_\varrho$ then
32:          Set $\delta^{(j+1)} = \delta^{(j)}$;
33:        end if
34:      else
35:        if $\beta_1 \delta^{(j)} \leq \delta_{\min}$ or Skip_enrichment_flag$(j-1)$ then
36:          Update the RB model at $u^{(j+1)}$;
37:        end if
38:        Reject $u^{(j+1)}$, set $\delta^{(j+1)} = \beta_1 \delta^{(j)}$ and go to 4;
39:      end if
40:    end if
41:  end if
42:  Set $j = j + 1$;
43: end while
An important aspect of TR methods is the decision to accept or reject the step $u^{(j+1)}$. Generally, one asks for the so-called sufficient decrease condition $J^{\ell,(j+1)}(u^{(j+1)}) \leq J^{\ell,(j)}(u_{\mathrm{AGC}}^{(j)})$; cf. [15]. Note that this condition requires updating the RB space before being sure that the step will be accepted. If the step is rejected, we have performed a costly update without the possibility of exploiting it. Because of this fact, Ref. [14] proposed a sufficient (Line 5) and a necessary (Line 17) condition for the sufficient decrease condition. In [18] it is also noted that the full-order quantities $J(u^{(j+1)})$ and $\nabla J(u^{(j+1)})$ are cheaply available after updating the RB space. Additionally, Ref. [17] introduced the possibility of skipping a redundant enrichment, which is particularly useful at the late stage of the method, when we are close to the optimum. This prevents the dimension of the RB space from growing too fast, so that the cheap-to-solve property is preserved. The three conditions to be checked in order to decide whether to skip the update of the RB space are contained in the following skipping parameter
$$\mathrm{Skip\_enrichment\_flag}(j) := \Big[\, q^{(j)}(u^{(j+1)}) \leq \beta_q\, \delta^{(j+1)} \ \text{ and } \ \frac{\big| g(u^{(j+1)}) - g^{\ell,(j)}(u^{(j+1)}) \big|}{g^{\ell,(j)}(u^{(j+1)})} \leq \tau_g \ \text{ and } \ \frac{\big\| \nabla J^{\ell,(j)}(u^{(j+1)}) - \nabla J(u^{(j+1)}) \big\|_U}{\big\| \nabla J^{\ell,(j)}(u^{(j+1)}) \big\|_U} \leq \min\{ \tau_{\mathrm{grad}}, \beta_{\mathrm{grad}}\, \delta^{(j+1)} \} \,\Big],$$
where $\beta_q, \beta_{\mathrm{grad}}, \tau_g, \tau_{\mathrm{grad}} \in (0,1)$ are given parameters and
$$g(u) := \big\| u - P_{U_{\mathrm{ad}}}\big( u - \nabla J(u) \big) \big\|_U, \qquad g^{\ell,(j)}(u) := \big\| u - P_{U_{\mathrm{ad}}}\big( u - \nabla J^{\ell,(j)}(u) \big) \big\|_U.$$
Note also that $g(u) = 0$ is nothing else than the standard first-order condition for optimization problems with constraints on the parameter set. This is the reason why Algorithm 2 terminates when $g(u^{(j+1)}) \leq \tau_{\mathrm{FOC}}$ holds with $\tau_{\mathrm{FOC}} \in (0,1)$. For more details on the skipping condition, we refer to [17]. Typically, TR methods also have the option of shrinking (enlarging) the TR radius $\delta^{(j)}$ by some factor $\beta_1 \in (0,1)$ ($\beta_1^{-1} > 1$, respectively). In the case of Algorithm 2, we shrink the radius if a point is rejected. We also compute the ratio
$$\varrho^{(j)} := \frac{J(u^{(j)}) - J(u^{(j+1)})}{J^{\ell,(j)}(u^{(j)}) - J^{\ell,(j)}(u^{(j+1)})}.$$
If this ratio is greater than a parameter $\eta_\varrho \in [0.75, 1]$, then the radius is enlarged. Algorithm 2 can be proven to converge under some technical assumptions on the problem. We summarize everything in the following theorem (cf. [17]).
Theorem 9.
Suppose that $U_{\mathrm{ad}} = [u_a, u_b] \subset \mathbb{R}^P$ for some $u_a, u_b \in \mathbb{R}^P$ with $u_a \leq u_b$. Assume that $J$ and $J^{\ell,(j)}$ ($j \in \mathbb{N}$) are strictly positive, $J$ is continuously Fréchet differentiable and $J^{\ell,(j)}$ is even twice continuously Fréchet differentiable for all $j \in \mathbb{N}$. Moreover, $\nabla J^{\ell,(j)}$ is uniformly Lipschitz-continuous with respect to $j$. Suppose that there is $\delta_{\min} > 0$ such that for every $j \in \mathbb{N}$ there exists a TR radius $\delta^{(j)} \geq \delta_{\min}$ for which there is a solution $u^{(j+1)}$ of the TR-RB subproblem (12) which is accepted by Algorithm 2. Assume that the family of functions $(q^{(j)})_{j \in \mathbb{N}}$ is uniformly continuous w.r.t. the parameter $u$ and the index $j$. Then every accumulation point $\bar u$ of the sequence of iterates $(u^{(j)})_{j \in \mathbb{N}}$ is a first-order critical point of the full-order optimization problem, i.e., it holds
$$\big\| \bar u - P_{U_{\mathrm{ad}}}\big( \bar u - \nabla J(\bar u) \big) \big\|_U = 0.$$
In particular, Algorithm 2 terminates after finitely many steps.
Although many of the assumptions in Theorem 9 are quite technical and needed for the proof, one can show that they are reasonable in the case of the RB method; cf. [17].

4.1. The TR-RB Algorithm Applied to the PS Method

In this section we show how Algorithm 2 can be applied to the PS method. To this end, we recall the following lemma from [16].
Lemma 4.
There are constants $C_J, C_{\nabla J}, C_{\nabla^2 J} > 0$ such that for any $i \in \{1, \dots, k\}$, any $u \in U_{\mathrm{ad}}$ and any choice of the RB space $V^\ell$ it holds
$$\big| \hat J_i^\ell(u) \big| \leq C_J, \qquad \big\| \nabla \hat J_i^\ell(u) \big\|_U \leq C_{\nabla J}, \qquad \big\| \nabla^2 \hat J_i^\ell(u) \big\|_{\mathcal{L}(U)} \leq C_{\nabla^2 J}.$$
Lemma 4 immediately implies that the reduced-order gradient is uniformly Lipschitz-continuous with respect to $\ell$. We have to solve $(P_{z,r}^{\mathrm{PS}})$. We follow the approach in [16], where the target direction $r = (1, \dots, 1)$ is chosen and an augmented Lagrangian method is used. Given a penalty parameter $\mu > 0$, the augmented Lagrangian for $(P_{z,r}^{\mathrm{PS}})$ is
$$L_A((u, t, s), \lambda; \mu) := t + \sum_{i=1}^k \lambda_i\, c_i(u, t, s) + \frac{\mu}{2} \sum_{i=1}^k c_i(u, t, s)^2$$
with $c_i(u, t, s) = \hat J_i(u) - z_i - t + s_i$. The idea is to iteratively solve the subproblems
$$\min\ L_A((u, t, s), \lambda; \mu) \quad \text{s.t.} \quad (u, t, s) \in U_{\mathrm{ad}} \times \mathbb{R} \times \mathbb{R}_{\geq 0}^k \tag{17}$$
approximately and then update the Lagrange multiplier $\lambda$ and the penalty parameter $\mu$ until the termination criteria
$$\| c(u, t, s) \|_{\mathbb{R}^k} < \tau_{\mathrm{EC}},$$
$$\big\| (u, t, s) - P_{\mathrm{ad}}\big( (u, t, s) - \nabla_{(u,t,s)} L_A((u, t, s), \lambda; \mu) \big) \big\|_{U \times \mathbb{R} \times \mathbb{R}^k} < \tau_{\mathrm{FOC}}$$
are satisfied for some tolerances $\tau_{\mathrm{EC}}, \tau_{\mathrm{FOC}} \in (0,1)$, where $P_{\mathrm{ad}} : U \times \mathbb{R} \times \mathbb{R}^k \to U_{\mathrm{ad}} \times \mathbb{R} \times \mathbb{R}_{\geq 0}^k$ is the canonical projection onto $U_{\mathrm{ad}} \times \mathbb{R} \times \mathbb{R}_{\geq 0}^k$. For further details, we refer to [16] (Appendix B). We then want to combine the augmented Lagrangian method with the TR-RB algorithm to solve problem $(P_{z,r}^{\mathrm{PS}})$. To do so, we apply Algorithm 2 to solve each subproblem (17). We first define the reduced-order augmented Lagrangian
$$L_A^\ell((u, t, s), \lambda; \mu) := t + \sum_{i=1}^k \lambda_i\, c_i^\ell(u, t, s) + \frac{\mu}{2} \sum_{i=1}^k c_i^\ell(u, t, s)^2,$$
with $c_i^\ell(u, t, s) = \hat J_i^\ell(u) - z_i - t + s_i$, which leads to the reduced-order subproblem
$$\min\ L_A^\ell((u, t, s), \lambda; \mu) \quad \text{s.t.} \quad (u, t, s) \in U_{\mathrm{ad}} \times \mathbb{R} \times \mathbb{R}_{\geq 0}^k.$$
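The outer augmented Lagrangian iteration (approximately minimize $L_A$, then update $\lambda \leftarrow \lambda + \mu\, c$) can be sketched as follows. This is a hypothetical NumPy toy with two explicit scalar objectives and a crude projected-gradient inner solver standing in for the TR-RB subproblem solves; all parameter values are illustrative:

```python
import numpy as np

def alm_ps(J, gradJ, z, u_bounds, t_bounds, s_bounds, u0,
           mu=10.0, n_outer=25, n_inner=600, lr=2e-3):
    """Augmented Lagrangian loop for the PS problem with slack variables:
       min t  s.t.  c_i = J_i(u) - z_i - t + s_i = 0,  s_i >= 0.
    The inner minimization of L_A is done by projected gradient descent."""
    k = len(z)
    u = np.atleast_1d(np.array(u0, dtype=float))
    t, s = float(t_bounds[1]), np.zeros(k)
    lam = np.zeros(k)
    for _ in range(n_outer):
        for _ in range(n_inner):
            c = np.array([Ji(u) for Ji in J]) - z - t + s
            w = lam + mu * c                          # dL_A/dc_i
            gu = sum(wi * gi(u) for wi, gi in zip(w, gradJ))
            gt = 1.0 - w.sum()                        # d/dt of t + sum_i w_i c_i
            u = np.clip(u - lr * gu, *u_bounds)       # projected gradient steps
            t = float(np.clip(t - lr * gt, *t_bounds))
            s = np.clip(s - lr * w, *s_bounds)        # dL_A/ds_i = w_i
        c = np.array([Ji(u) for Ji in J]) - z - t + s
        lam = lam + mu * c                            # multiplier update
    return u, t, s, lam

# toy bi-objective in one parameter: the min-max compromise is u = 0, t = 1
J = [lambda u: float((u[0] - 1.0) ** 2), lambda u: float((u[0] + 1.0) ** 2)]
gradJ = [lambda u: 2.0 * (u - 1.0), lambda u: 2.0 * (u + 1.0)]
z = np.zeros(2)
u_opt, t_opt, s_opt, lam_opt = alm_ps(J, gradJ, z, (-2.0, 2.0), (0.0, 5.0),
                                      (0.0, 5.0), u0=[1.5])
```

At the solution both constraints are active, the slacks vanish, and the multipliers balance the two objective gradients; in the actual method each inner minimization is of course carried out by Algorithm 2 on the reduced-order model.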
Note that in this case the admissible set $U_{\mathrm{ad}} \times \mathbb{R} \times \mathbb{R}_{\geq 0}^k$ is unbounded, which collides with the first assumption of Theorem 9. Nevertheless, Ref. [16] showed that the $(P_{z,r}^{\mathrm{PS}})$ problem is also equivalent to
$$\min\ t \quad \text{s.t.} \quad (t, u) \in [t_{\min}, t_{\max}] \times U_{\mathrm{ad}} \ \text{ and } \ \hat J(u) - z \leq t. \tag{22}$$
There remains the problem that the admissible set for the slack variables $s$ is given by $[0, \infty)^k$. However, computing the partial derivative of the augmented Lagrangian $L_A$ with respect to $s_i$, we obtain
$$\frac{\partial L_A}{\partial s_i}((u, t, s), \lambda; \mu) = \lambda_i + \mu\, \big( \hat J_i(u) - z_i - t + s_i \big) \geq \lambda_i + \mu\, ( - z_i - t_{\max} + s_i ).$$
Thus, $L_A$ is strictly monotonically increasing in $s_i$ for $s_i > -\lambda_i / \mu + z_i + t_{\max} =: s_i^{\max}$. Thus, given the Lagrange multiplier $\lambda$ and the penalty parameter $\mu$, we can restrict the slack variable $s_i$ to the interval $[0, s_i^{\max}]$. This does not modify the solvability or the solution of the augmented Lagrangian subproblem. By setting $X_{\mathrm{ad}} := U_{\mathrm{ad}} \times [t_{\min}, t_{\max}] \times [0, s^{\max}]$, the equivalent formulation of the augmented Lagrangian subproblem corresponding to (22) reads
$$\min_{(u, t, s) \in X_{\mathrm{ad}}}\ L_A((u, t, s), \lambda; \mu). \tag{23}$$
Similarly, the reduced-order augmented Lagrangian subproblem is given by
$$\min\ L_A^\ell((u, t, s), \lambda; \mu) \quad \text{s.t.} \quad (u, t, s) \in X_{\mathrm{ad}}.$$
Therefore, the goal is to apply Algorithm 2 to solve the subproblem (23). To this end, we define $x = (u, t, s) \in U \times \mathbb{R} \times \mathbb{R}^k$, $J(x) = L_A(x, \lambda; \mu)$ and $J^{\ell,(j)}(x) = L_A^{\ell,(j)}(x, \lambda; \mu)$ for any reference point $z \in \mathbb{R}^k$, any Lagrange multiplier $\lambda \in \mathbb{R}^k$ and any penalty parameter $\mu > 0$. Furthermore, using the a-posteriori estimates of the individual objectives (cf. Theorem 8), we have that
$$\big| J(x) - J^{\ell,(j)}(x) \big| \leq \sum_{i=1}^k \Big( |\lambda_i| + \mu\, \big| \hat J_i^{\ell,(j)}(u) - z_i - t + s_i \big| \Big)\, \Delta_{\hat J_i}^{\ell,(j)}(u) + \sum_{i=1}^k \frac{\mu}{2}\, \Delta_{\hat J_i}^{\ell,(j)}(u)^2 =: \Delta_J^{\ell,(j)}(u)$$
for all $u \in U_{\mathrm{ad}}$, which can be used as an a-posteriori error estimate in the TR-RB algorithm. According to Theorem 9, we still need to show the strict positivity of the costs $J$ and $J^{\ell,(j)}$ and the uniform Lipschitz continuity of the gradients $\nabla J^{\ell,(j)}$. For the first property, we note that the objectives $J$ and $J^{\ell,(j)}$ are bounded from below by $C := t_{\min} - \sum_{i=1}^k \lambda_i^2 / (2\mu)$. Since $C$ depends only on fixed parameters of the optimization problems, we can add $-C + 1$ to the cost functions to obtain strict positivity. Obviously, this does not change the minimizers. The second property is a bit more technical and we prove it in the following lemma.
Lemma 5.
Let the Lagrange multiplier $\lambda$ and the penalty parameter $\mu$ be given. Then the function $J(\cdot) := L_A(\cdot, \lambda; \mu)$ and the reduced-order functions $J^{\ell,(j)}(\cdot) := L_A^{\ell,(j)}(\cdot, \lambda; \mu)$, $j \in \mathbb{N}$, are twice continuously Fréchet differentiable, and the gradients $\nabla J^{\ell,(j)}$ are uniformly Lipschitz continuous with respect to $j$.
Proof. 
Due to Corollary 1 the cost functions $\hat J_1, \dots, \hat J_k$ are twice continuously Fréchet differentiable. Thus, the function $(u, t, s) \mapsto L_A((u, t, s), \lambda; \mu)$ is also twice continuously Fréchet differentiable as a composition of twice continuously Fréchet differentiable functions. Similarly, the reduced-order augmented Lagrangians $L_A^{\ell,(j)}((\cdot, \cdot, \cdot), \lambda; \mu)$ are twice continuously Fréchet differentiable for all $j \in \mathbb{N}$. We have that
$$\nabla^2 L_A^{\ell,(j)}((u, t, s), \lambda; \mu)(h^u, h^t, h^s) = \begin{pmatrix} \displaystyle \sum_{i=1}^k \Big[ \big( \lambda_i + \mu\, c_i^{\ell,(j)} \big)\, \nabla^2 \hat J_i^{\ell,(j)}(u) h^u + \mu \big( d_i^{\ell,(j)} - h^t + h_i^s \big)\, \nabla \hat J_i^{\ell,(j)}(u) \Big] \\[2mm] \displaystyle k \mu\, h^t - \mu \sum_{i=1}^k \big( d_i^{\ell,(j)} + h_i^s \big) \\[1mm] \mu \big( d_1^{\ell,(j)} + h_1^s - h^t \big) \\ \vdots \\ \mu \big( d_k^{\ell,(j)} + h_k^s - h^t \big) \end{pmatrix}$$
for any $h = (h^u, h^t, h^s) \in U \times \mathbb{R} \times \mathbb{R}^k$, where $c_i^{\ell,(j)} := \hat J_i^{\ell,(j)}(u) - z_i - t + s_i$ and $d_i^{\ell,(j)} := \langle \nabla \hat J_i^{\ell,(j)}(u), h^u \rangle_U$ for $i \in \{1, \dots, k\}$. Using Lemma 4, we obtain that the Hessian $\nabla^2 L_A^{\ell,(j)}((u, t, s), \lambda; \mu)$ can be bounded independently of $(u, t, s)$ and $j$. Using the mean value theorem, we conclude that the gradients $\nabla L_A^{\ell,(j)}((\cdot, \cdot, \cdot), \lambda; \mu)$ are Lipschitz-continuous with a constant $C_L$ uniformly in $j$. □
As a consequence of Theorem 9, we have that Algorithm 2 applied to solve the augmented Lagrangian subproblem (23) converges after finitely many steps to a first-order critical point of (23).
Remark 6.
Algorithm 2 constructs and updates the RB space during the optimization procedure. In the case of the PS method, we are free to choose what to do with the space constructed during the TR-RB procedure. For example, we can reuse it for the next augmented Lagrangian subproblem (and also for the next reference point). We explored different ideas (see also [16]), but we report here only the two most interesting and efficient ones:
(1)
Use one common RB space for all the subproblems and reference points, i.e., use a single space $V^\ell$ (which is, of course, updated in the process) for solving the MOP. This strategy gains efficiency because the space successively captures the relevant full-order solutions encountered during the iterations. Therefore, thanks to the possibility of skipping an enrichment (which is the costly part of Algorithm 2), we expect more and more speed-up, together with accuracy, as the algorithm proceeds.
(2)
Use multiple (local) RB spaces. This idea is already exploited in [16,37,38]. In this case, we do not use the previously obtained RB space for the next minimization problem. We generate instead $k$ initial spaces $V_1^\ell, \dots, V_k^\ell$, resulting from the minimization of the objectives $\hat J_1, \dots, \hat J_k$. (Note that this procedure does not require extra computational cost, since we need to solve these problems for the hierarchical PS method anyway.) Then at the beginning of every PS problem, we can decide to use the space $V_i^\ell$ for which $q^{(0)}(u^{(0)}) < \beta_q \delta^{(0)}$ and $\dim V_i^\ell \leq \ell_{\max}$, with $\ell_{\max} \in \mathbb{N}$ being a predefined maximal number of basis functions. If several spaces satisfy these conditions, we select the one for which the value $q^{(0)}(u^{(0)})$ is the smallest. If instead there is no space fulfilling these conditions, we initialize a new space $V_{k+1}^\ell$ by using the full-order quantities $S(u^{(0)})$ and $\mathcal{A}_i(u^{(0)})$ for $i = 1, \dots, k$.
Although these two techniques are already efficient, we noticed that they share a common problem: the number of RB basis functions might grow too fast and prevent a good speed-up. This is particularly the case for the first strategy. To fix this issue, we propose different strategies to remove basis functions from $V^\ell$ in Section 4.2. This approach was not considered in [14,16,17,18] and, to our knowledge, it has not been addressed in the literature yet. In reduced-order optimization this is meaningful, since the reduced-order model might otherwise grow too fast; see, e.g., [33] in the case of proper orthogonal decomposition.

4.2. How to Reduce the Number of Basis Functions

We point out that what is described in this section can also be applied to Algorithm 2 from [17] in general, without any relation to the PS method. In particular, the strategies for reducing the number of basis functions presented in this section can be used not only for PDE-constrained multiobjective optimization problems, but also for any other problem formulation containing PDE-constrained optimization problems. Therefore, we again use the general notation $J$ for the cost, as in the beginning of this section. The methodology for removing a basis function comes from the observation that some basis elements might not be used during the optimization process. Suppose that we start from a point $u^{(0)}$ very far from the optimum. Clearly, after $j$ iterations the point $u^{(j)}$ is in a completely different region of the admissible set compared to the one of the starting point. Hence, the basis functions built for $u^{(0)}$ might give a negligible contribution in spanning the reduced-order model at the point $u^{(j)}$. If this is the case, we can expect that these functions will not play any further role for the subsequent points either, and therefore they can be removed to reduce the dimension of the RB space. Our methodologies for removing basis functions are thus based on Remark 6 and check which basis functions give a negligible contribution at the current iterate of the TR-RB algorithm. Notice that every technique we propose from now on is applied after updating the RB space in the TR-RB algorithm. The aim is to modify the updated RB space in order to provide a new RB space with a reduced number of basis functions.
Technique T1.
The first proposed technique is based on the computation of the so-called Fourier coefficients. Given v V and a set of orthonormal basis functions { ψ n } n = 1 V , the n-th Fourier coefficient is defined as c F ( n ) ( v ) : = v , ψ n V . Now, T1 consists in computing c F ( n ) ( S ( u ( j + 1 ) ) ) and c F ( n ) ( A i ( u ( j + 1 ) ) ) , i = 1 , , k , for  n = 1 , , and remove the basis function ψ n for which
$$\zeta^{(n)} := \max\left\{ \frac{\big|c_F^{(n)}(S(u^{(j+1)}))\big|^2}{\sum_{\eta=1}^{\ell}\big|c_F^{(\eta)}(S(u^{(j+1)}))\big|^2},\ \max_{i=1,\dots,k} \frac{\big|c_F^{(n)}(A_i(u^{(j+1)}))\big|^2}{\sum_{\eta=1}^{\ell}\big|c_F^{(\eta)}(A_i(u^{(j+1)}))\big|^2} \right\}$$
is below a certain tolerance. Note, in fact, that the Fourier coefficients indicate the order of magnitude of the contribution of a given basis function to reconstructing the new snapshots that we want to add when updating the RB space. Strategy T1 is also based on the assumption that the snapshots we want to include in an update are the most relevant ones for the new TR subproblem, because they correspond to the last accepted optimization step u^(j+1). The advantage of T1 is that the required Fourier coefficients are already available from the Gram-Schmidt orthogonalization performed during the update of the RB space. There is, however, a possible drawback of T1 due to the tolerance we set: even important basis functions may be removed, although one thinks that the tolerance is small enough. Because of this, we would like to have a criterion to decide in an unbiased way which basis functions should be removed.
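As an illustration, the scores ζ^(n) can be computed directly from the Fourier coefficients of the new snapshots. The following sketch assumes the basis is stored as an M-orthonormal matrix (M being the Gram matrix of the inner product on V); the function names are ours and purely illustrative, not taken from the authors' implementation:

```python
import numpy as np

def t1_removal_scores(Psi, snapshots, M):
    """Score each of the ell basis functions by its largest relative squared
    Fourier coefficient over all new snapshots, cf. the definition of zeta(n).

    Psi       : (N, ell) matrix whose columns are M-orthonormal basis functions
    snapshots : list of (N,) vectors, e.g. [S(u), A_1(u), ..., A_k(u)]
    M         : (N, N) inner-product (Gram) matrix of the discrete space V
    """
    ell = Psi.shape[1]
    zeta = np.zeros(ell)
    for v in snapshots:
        c = Psi.T @ (M @ v)            # Fourier coefficients <v, psi_n>_V
        rel = c**2 / np.sum(c**2)      # relative squared contributions
        zeta = np.maximum(zeta, rel)   # maximum over all snapshots
    return zeta

def t1_remove(Psi, snapshots, M, tol=1e-6):
    """T1: drop every basis function whose score zeta(n) falls below tol."""
    zeta = t1_removal_scores(Psi, snapshots, M)
    keep = zeta >= tol
    return Psi[:, keep], keep
```

In an actual TR-RB implementation the coefficients `c` would be reused from the Gram-Schmidt step of the RB update instead of being recomputed.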
Technique T2.
This approach is based on the following idea: once a point u^(j+1) is accepted by the TR-RB algorithm and the RB space is updated, we compute a provisional AGC point u_AGC^(j+1),prov (cf. Definition 14) with respect to the previously updated RB space. One robustness criterion that we demand is that, after removing basis functions, this provisional AGC point is still inside the new TR (note that the TR depends on the reduced-order model due to the inequality constraint in (12) and therefore changes if we remove basis functions), although it might not coincide with the actual AGC point u_AGC^(j+1) that we compute after removing basis functions according to Line 3 in Algorithm 2 (note that the reduced-order cost function changes by removing a basis function, so that the first term in (13) also differs after this removal). If we do not demand this robustness criterion, we can expect a deterioration of the TR performance due to a lack of accuracy of the RB model in the steepest descent direction. Another important aspect is to guarantee the convergence of the TR-RB method, which implies checking that the conditions for accepting the point u^(j+1) are still fulfilled after the basis functions have been removed.
In summary, the difference with respect to T1 is to remove basis functions starting from the one with the smallest value of ζ^(n) and proceeding in ascending order until one of the following conditions is satisfied:
$$\frac{\Delta_{J^{\mathrm{rem},(j+1)}}\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)}{J^{\mathrm{rem},(j+1)}\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)} > \beta_q\,\delta^{(j+1)}, \tag{25a}$$
$$\frac{\Delta_{\nabla J^{\mathrm{rem},(j+1)}}\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)}{\big\|\nabla J^{\mathrm{rem},(j+1)}\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)\big\|_U} > \min\big\{\tau_{\mathrm{grad}},\,\beta_{\mathrm{grad}}\,\delta^{(j+1)}\big\}, \tag{25b}$$
$$\frac{\big\|\nabla J^{\mathrm{rem},(j+1)}\big(u^{(j+1)}\big)-\nabla J\big(u^{(j+1)}\big)\big\|_U}{\big\|\nabla J^{\mathrm{rem},(j+1)}\big(u^{(j+1)}\big)\big\|_U} > \min\big\{\tau_{\mathrm{grad}},\,\beta_{\mathrm{grad}}\,\delta^{(j+1)}\big\}, \tag{25c}$$
$$\frac{\big|g\big(u^{(j+1)}\big)-g^{\mathrm{rem},(j+1)}\big(u^{(j+1)}\big)\big|}{\big|g^{\mathrm{rem},(j+1)}\big(u^{(j+1)}\big)\big|} > \tau_g, \tag{25d}$$
$$J^{\mathrm{rem},(j+1)}\big(u^{(j+1)}\big) > J^{\ell,(j)}\big(u_{\mathrm{AGC}}^{(j)}\big), \tag{25e}$$
$$J^{\mathrm{rem},(j+1)}\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)-J\big(u^{(j+1)}\big) > -\kappa_{\mathrm{arm}}\,\big\|u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}-u^{(j+1)}\big\|_U^2. \tag{25f}$$
If one of the conditions (25) holds, we re-add the basis function to the RB space and finish the removal, continuing with the TR-RB procedure. T2 is summarized in Algorithm 3.
Algorithm 3: Summary of T2
1: Follow the steps in Algorithm 2 until the RB model is updated at u^(j+1);
2: Compute a provisional AGC point u_AGC^(j+1),prov by using the reduced-order cost function w.r.t. the updated RB model;
3: Compute ζ^(n) for n ∈ {1, …, ℓ};
4: while none of the conditions in (25) is fulfilled do
5:    Out of all remaining basis functions, remove the one with the smallest value of ζ^(n) from the RB space;
6: end while
7: Add the last removed basis function back to the RB space;
8: Proceed with Algorithm 2 with the RB space obtained by performing Steps 2–7;
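The remove/check/re-add loop of Algorithm 3 (Lines 3–7) can be sketched as follows. The callback `conditions_violated`, standing in for the checks (25a)–(25f), and the function name are placeholders of ours; the sketch only illustrates the loop structure, not the authors' implementation:

```python
import numpy as np

def t2_basis_removal(zeta, conditions_violated):
    """Remove basis functions in ascending order of zeta(n) until one of the
    safeguard conditions (25a)-(25f) fires; the last tentative removal is
    then not committed, which corresponds to re-adding that basis function.

    zeta                : array of scores, one per basis function
    conditions_violated : callback taking a boolean keep-mask and returning
                          True if any condition in (25) holds on the
                          reduced space described by that mask
    Returns the final keep-mask of the reduced RB space.
    """
    keep = np.ones(len(zeta), dtype=bool)
    for n in np.argsort(zeta):           # ascending order of zeta(n)
        trial = keep.copy()
        trial[n] = False                 # tentatively remove basis function n
        if conditions_violated(trial):   # Lines 4-7: stop, do not commit
            break
        keep = trial                     # removal accepted
    return keep
```

Evaluating the callback on a trial mask before committing avoids mutating the RB space, but it is mathematically equivalent to the remove-then-re-add formulation of Algorithm 3.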
Let us explain the meaning of (25). First, the superscript rem indicates that the space used to compute the quantity is the RB space obtained after removing a basis function. Condition (25a) checks that the provisional AGC point remains inside an accurate-enough region of the TR. Condition (25b) is in the spirit of (25a), but for the gradient of the objective. Conditions (25c) and (25d) are based on the criteria for skipping the enrichment and are checked to ensure convergence and robustness of the method after the removal. For a similar reason we need to check that the sufficient decrease condition is fulfilled as well (cf. (25e)). Finally, (25f) enforces that the provisional AGC point is still a Cauchy point. In this way, we are sure that Algorithm 2 converges even after performing the basis removal (cf. [17,18]). In this sense, T2 introduces an unbiased way to deal with the technique introduced in T1. There are still a few aspects to comment on before implementing T2. First, note that all the above-mentioned conditions are cheaply computable, since they are based either on reduced-order quantities or on full-order quantities that are already available because of the RB update. Second, conditions (25a) and (25b) require efficient and reliable error estimators. Although for the PS method the efficiency of Δ_{J^{ℓ,(j)}} is acceptable, the same is not true for an error estimator Δ_{∇J^{ℓ,(j)}} based on the a-posteriori estimates of the gradients of the individual objectives. These estimators generally produce a huge overestimation, which makes them useless in practice. We notice, in fact, that condition (25b) is immediately triggered in the case of the PS method, so that we cannot remove any basis function. For this reason, we address this issue by two different, related approaches:
Technique T2a.
We replace the numerator of (25b) by
$$\big\|\nabla J^{\mathrm{rem},(j+1)}\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)-\nabla J\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)\big\|_U,$$
which is the true error we want to estimate but which is unfortunately costly: it requires the computation of the full-order quantities S(u_AGC^(j+1),prov) and A_i(u_AGC^(j+1),prov), i = 1, …, k.
Technique T2b.
We replace the numerator of (25b) by
$$\big\|\nabla J^{\mathrm{rem},(j+1)}\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)-\nabla J^{\ell,(j+1)}\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)\big\|_U,$$
which is a cheap approximation of the true error; however, we can expect it to be reliable only after enough steps of Algorithm 2.
Clearly, if a good estimate of the gradient error is at hand, T2 can still be used in its original form.
Technique T3.
Another drawback of T2 is the fact that we first need to remove a basis function in order to check (25). This implies that when we stop the removal, we need to add back the last removed basis function, because it contains important information; cf. Line 7 of Algorithm 3. This wastes time in the modified Algorithm 2. We therefore add the option of introducing numerical tolerances for each of the conditions (25). In this way, the modified algorithm will generally stop before an important basis function is removed, at the price of possibly leaving one or a few redundant basis functions in the RB space. We consider this a meaningful modification in view of the time wasted by reintroducing the removed basis function into the RB space; cf. Section 5. We refer to this last strategy as T3.

5. Numerical Experiments

In this section we test Algorithm 2 and compare it with the results obtained in [16] (Section 3.2.2). We use the same numerical setting, which we briefly report here. Let the domain Ω be the two-dimensional unit square, split into the four subdomains Ω_1 = (0, 0.5) × (0, 0.5), Ω_2 = (0, 0.5) × (0.5, 1), Ω_3 = (0.5, 1) × (0, 0.5) and Ω_4 = (0.5, 1) × (0.5, 1). For each Ω_i, we consider a corresponding diffusion coefficient u_i^κ ∈ ℝ in (3) for i = 1, …, 4. The reaction term r(x) is set to be constantly equal to 1 for all x ∈ Ω. We impose homogeneous Neumann boundary conditions (i.e., α = 0) and a source term f(x) = ∑_{i=1}^4 c_i χ_{Ω_i}(x) with c_1 ≈ 2.76, c_2 ≈ 0.96, c_3 ≈ 0.51 and c_4 ≈ 1.66 generated randomly in order to obtain a problem with a non-convex Pareto front. For the spatial discretization of the state equation, we apply the Finite Element (FE) method with 1340 nodes and piecewise linear basis functions. For (MPPOP) we choose the following three objectives:
$$\hat{J}_1(u) := \frac{1}{2}\big\|S(u)-y_\Omega^{(1)}\big\|_H^2 + \frac{\varepsilon}{2}\big\|u-u_d^{(1)}\big\|_U^2,\quad \hat{J}_2(u) := \frac{1}{2}\big\|S(u)-y_\Omega^{(2)}\big\|_H^2 + \frac{\varepsilon}{2}\big\|u-u_d^{(2)}\big\|_U^2,\quad \hat{J}_3(u) := \frac{0.05}{2}\big\|u-u_d^{(3)}\big\|_U^2$$
with ε = 0.002 , the desired states
$$y_\Omega^{(1)}(x) := \chi_{(0,0.5)\times(0,1)}(x),\qquad y_\Omega^{(2)}(x) := \chi_{(0.5,1)\times(0,1)}(x),$$
and the desired parameter values
$$u_d^{(1)} = u_d^{(2)} := (2, 0, 0, 0, 0.3)^{\top},\qquad u_d^{(3)} := (2, 1, 1, 1, 0.3)^{\top}.$$
The lower and upper parameter bounds are given by
$$u_a = (2, 0.1, 0.1, 0.1, 0.3)^{\top} \quad\text{and}\quad u_b = (2, 4, 4, 4, 0.3)^{\top},$$
respectively. This implies that u_1^κ = 2 and u^r = 0.3 are treated as constants, and we only optimize over the three parameters u_2^κ, u_3^κ and u_4^κ. Note furthermore that the desired parameters u_d^(1) = u_d^(2) are not admissible. In fact, as with the parameters of the source term, they were chosen such that the resulting Pareto front is non-convex.
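The inadmissibility of u_d^(1) = u_d^(2) can be verified directly from the parameter bounds and desired values given above; the following quick check uses exactly those numbers (the helper name `admissible` is ours):

```python
import numpy as np

# Problem data from Section 5, ordered as (u_1^kappa, ..., u_4^kappa, u^r)
u_a  = np.array([2.0, 0.1, 0.1, 0.1, 0.3])   # lower parameter bounds
u_b  = np.array([2.0, 4.0, 4.0, 4.0, 0.3])   # upper parameter bounds
u_d1 = np.array([2.0, 0.0, 0.0, 0.0, 0.3])   # desired parameter u_d^(1) = u_d^(2)
u_d3 = np.array([2.0, 1.0, 1.0, 1.0, 0.3])   # desired parameter u_d^(3)

# A parameter is admissible iff it lies componentwise in [u_a, u_b]
admissible = lambda u: bool(np.all((u_a <= u) & (u <= u_b)))

print(admissible(u_d1))  # False: entries 0.0 violate the lower bound 0.1
print(admissible(u_d3))  # True
```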
For the choice of the initial value of the PSPs corresponding to reference points for the entire problem (Ĵ_1, Ĵ_2, Ĵ_3) we proceed as follows: Let ū_i be the minimizer of Ĵ_i for i = 1, 2, 3. Recall that the sets D_i have been introduced in Definition 7-(ii). Then, if z ∈ D_i, we choose ū_i as the initial value for solving (P_{z,r}^{PS}). We additionally choose the shifting vector d̃ = 0.001 · (1, 1, 1)^⊤, while the grid size h for the reference point grid is set to h_PSM = 0.003.

5.1. Parameter Choices for the TR-RB Algorithm

There are many parameters used in the TR-RB algorithm, which we will specify and briefly comment on in this section.
  • The initial TR radius is chosen as δ^(0) = 0.1, the tolerance for increasing the TR radius is set to η_ϱ = 0.75, and the factor for shrinking the TR radius to β_1 = 0.5. For the minimal TR radius we use δ_min = 10^{-16}.
  • For the Armijo backtracking strategy, we use the constants κ_arm = 10^{-4} and κ = 0.5.
  • The tolerance of the first-order condition is set to τ_FOC = τ_FOC,sub^(i), where τ_FOC,sub^(i) is the tolerance for the first-order condition of the current augmented Lagrangian subproblem. Moreover, we choose τ_sub = 0.5 τ_FOC as the tolerance of the first-order condition of the TR-subproblem and β_bound = 0.9 as the constant in (15).
  • For checking the necessity of updating the RB space, we choose τ_g = 1, τ_grad = 0.1, β_grad = 0.2 and β_q = 0.005.
  • The tolerance chosen in T1 (cf. Section 4.2) for truncating the Fourier coefficients is 10^{-6}. We choose the same tolerance for T3 in order to stop the removal algorithm before deleting important basis functions, i.e., we subtract it from the right-hand side of (25a)–(25f).
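For illustration, the parameter choices above can be collected in a single configuration mapping. The key names are ours, chosen for readability; they do not come from the authors' code:

```python
# TR-RB parameter choices from Section 5.1 (illustrative grouping only)
tr_rb_params = {
    "delta_0": 0.1,          # initial TR radius
    "eta_rho": 0.75,         # tolerance for increasing the TR radius
    "beta_1": 0.5,           # shrinking factor for the TR radius
    "delta_min": 1e-16,      # minimal TR radius
    "kappa_arm": 1e-4,       # Armijo constant
    "kappa": 0.5,            # backtracking factor
    "tau_sub_factor": 0.5,   # tau_sub = 0.5 * tau_FOC, i.e. ratio tau_FOC/tau_sub = 2
    "beta_bound": 0.9,       # constant in (15)
    "tau_g": 1.0,            # enrichment-skipping tolerances
    "tau_grad": 0.1,
    "beta_grad": 0.2,
    "beta_q": 0.005,
    "tol_fourier": 1e-6,     # truncation tolerance for T1 (also reused in T3)
}

# The ratio discussed in the text below: too large a value (already 5)
# slows the method down considerably.
ratio_foc_sub = 1.0 / tr_rb_params["tau_sub_factor"]
```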
In our numerical experiments we notice that the method without basis removal is quite robust in terms of computational time and required PDE solves with respect to all the parameters, except for the ratio τ_FOC/τ_sub between the tolerances for the first-order conditions of the current augmented Lagrangian subproblem and of the TR-subproblem. In our experiments we choose this ratio to be 2, but we observe that a too large ratio (already 5 is sufficient) slows down the method considerably. The reason is that the TR-subproblems are then solved to an unnecessarily high accuracy, which requires a lot of numerical effort but does not benefit the overall optimization. Regarding the techniques introduced in Section 4.2, T1 depends heavily on the choice of the tolerance for truncating the Fourier coefficients: the smaller the tolerance, the fewer basis functions are removed. If we remove too many basis functions (e.g., with a tolerance of 10^{-4}), T1 becomes less stable and the method needs more iterations to converge, which generally leads to more enrichment steps and slows it down. Conversely, removing few basis functions (e.g., with a tolerance of 10^{-8}) implies no significant difference between T1 and the method without removal. In contrast to T1, the techniques T2, T2a and T2b depend only on the same parameters which influence the behavior of the algorithm without removal. Their performance is also robust with respect to all these parameters in terms of the number of basis functions removed. For T3 the same discussion applies, but this method is in addition sensitive to the tolerance chosen to stop the removal algorithm before deleting presumably important basis functions. On the one hand, if this tolerance is too high (e.g., 10^{-2}), the method will not remove enough basis functions to influence the performance of the algorithm. On the other hand, if it is too low (e.g., 10^{-8}), T3 is essentially equivalent to T2.

5.2. Numerical Results

In this section, we focus mainly on the comparison of our proposed TR-RB variants, commenting only briefly on the full-order versus the reduced-order model. For detailed comments and results on the PS method applied at the FE and RB level, we refer to [16] (Section 3.2.2). First, to validate our approach, we show in Figure 1 the Pareto fronts obtained by the method in [16] (left) and by our method (right). As one can see, there is no visible difference. The approximation error is, in fact, of the order of 10^{-6} on average for a Pareto point computed by any of the proposed techniques (i.e., T1, T2a, T2b and T3). This can essentially be explained by the fact that the termination criterion of Algorithm 2 relies on the full-order model. Therefore, any computed point is first-order critical for the FE model, up to the chosen stopping tolerance. Let us remark that this is not typical for model order reduction, where generally there is an additional approximation error due to the inaccuracy of the reduced-order model.
In Figure 2 we compare the computational time of Algorithm 2 for all the proposed techniques (cf. Section 4.2) against the full-order FE model and the algorithm in [16]. Compared with the FE method, we save between 41% and 59% of the computational time. Considering the fact that we do not have an approximation error in reconstructing the Pareto points, we obtain the same result in approximately half of the time by using any of the TR-RB variants. This speed-up will grow further with an increasing number of degrees of freedom for the FE method, since the number of required FE solves of the PDE is significantly smaller for the TR-RB algorithms than for the FE method; cf. Table 1.
Furthermore, in almost all cases we obtain a speed-up of the TR-RB algorithm by using the proposed techniques for reducing the number of basis functions. Depending on the strategy from Remark 6, one technique performs better than the others. Here we try to explain this phenomenon in detail. Let us focus on the common RB space first. In this case, every technique helps to save computational time. This is clearly the effect of removing redundant basis functions, which are particularly frequently included when using a large common RB space. This is the reason why T1 appears to be the most effective, since it is the cheapest among the techniques (as mentioned, checking it implies no additional cost). T2a is more robust, but it comes at the price of evaluating the full-order gradient at the new AGC point and thus turns out to be slower than T1. In principle, T2b should overcome this problem, but the inaccuracy of the RB space in the beginning yields a bad approximation of (25b), resulting in the removal of too many basis functions, which leads to a worse approximation in the subsequent steps. This worsening of the approximation results in a far larger number of enrichment steps towards the end of the algorithm, which also negatively influences the computational time. T3 is comparable with T2a, meaning that for this example we remove many basis functions in only a few instances, rather than frequently removing a few basis functions. Figure 3b confirms the above remarks for the case of a common RB space. In this figure we report the number of basis functions obtained at the end of Algorithm 2 when it is applied to compute each Pareto optimal point in the PS method.
Now, let us focus on the left group of columns in Figure 2 (and thus on Figure 3a), which corresponds to the computational times in the case of local RB spaces (cf. Remark 6). This case is a bit more delicate, since the use of local RB spaces makes the results harder to interpret. Here the problem of T1 emerges: the fact that this technique removes basis functions without any robustness criterion implies that the method slows down. In the case of local spaces, in fact, we do not have the same amount of redundant basis functions as can occur for a common RB space. Therefore, we should only remove the basis functions which are actually redundant. As one can see in Figure 3a, T1 removes a significantly larger number of basis functions than the other techniques. Here the criteria introduced in T2a play their role in a positive way: they counteract the effect of T1 so that the computational time is comparable to the one in [16]. The further simplification introduced in T2b yields an additional speed-up. In contrast to the common RB space, the local spaces provide a sufficiently good accuracy for approximating (25b) also at the beginning of the optimization. This is beneficial for the algorithm, since evaluating the criterion in T2b is far cheaper than in T2a, where full-order solves of the state and adjoint equation are needed to compute the gradient at the new AGC point. Additionally, T3 further improves on T2a and T2b in terms of computational time, since in the case of local RB spaces it is more probable that we indeed remove only a few basis functions, but more frequently than in the case of one common RB space. In this case, it is important to have tolerances that let us stop before removing an important basis function, saving the time for reintroducing it into the RB space.
In conclusion, comparing our fastest method (i.e., Algorithm 2 with local RB spaces and T3) to the slowest (i.e., the method of [16] with a common RB space), we obtain essentially the same results (the approximation error is of the order of 10^{-6}) while saving approximately 30% of the computational time, which is roughly 300 s. This shows that it is worth investing time and resources in efficient techniques for reducing the number of basis functions in the RB space when using an adaptive TR-RB algorithm. Particularly in the case of multiobjective optimization, this becomes crucial for a large number of cost functionals k: to obtain the same resolution of the Pareto front as in Figure 1 for a large k, we need to solve the PSPs for many more points, implying a higher risk of having redundant basis functions.

6. Conclusions

We showed the applicability and convergence of the TR-RB algorithm in the context of multi-objective PDE-constrained parameter optimization problems. We presented and analyzed novel ways of reducing the dimension of the RB space during the optimization procedure. To our knowledge, basis reduction strategies have not yet been proposed for the RB method, although they are common for other model order reduction techniques. Such a removal significantly improved the performance of the TR-RB algorithm in the context of multiobjective optimization, leading to an accurate solution faster than the already existing techniques. The presented example contained only three parameters to be optimized. However, based on the results in [17] (Section 4.4) for an example with 28 parameters and on the various examples in [39] (Sections 3.5.4–3.5.6), we expect all of the TR-RB methods to scale well with an increasing number of parameters. As for the multi-objective optimization by the PS method, the numerical effort grows exponentially with the number of cost functions k, but is independent of the number of parameters m if m ≥ k − 1. Moreover, the presented techniques for removing reduced basis functions can also be extended to other applications in which sequences of parametric PDE-constrained optimization problems must be solved. In future work, one can try to extend the convergence theory of the presented TR-RB algorithm to a larger class of PDEs than the one treated here, e.g., parabolic PDEs [14] or non-affine parameter-to-state couplings. Due to the general formulation of the convergence result, we are optimistic that this is possible. Moreover, one can try to achieve further improvements concerning the robustness of the method and derive tighter a-posteriori error estimators, in particular for the gradient of the cost function. This is also of great interest in the RB community.
Another interesting idea would be to incorporate the usual trust-region condition, based on the (Euclidean) distance from the current iterate, into the presented TR-RB algorithm. In [19] the usual trust-region condition actually performed slightly better than a residual-based error estimate as the trust-region constraint for some of the considered problems. Since we use not merely a residual-based error estimate but an error estimate of the actual cost function, a comparison between the different approaches is definitely of interest.

Author Contributions

Conceptualization, S.B., L.M. and S.V.; methodology, S.B., L.M. and S.V.; software, S.B. and L.M.; formal analysis, S.B., L.M. and S.V.; investigation, S.B., L.M. and S.V.; writing—original draft preparation, S.B., L.M. and S.V.; writing—review and editing, S.B., L.M. and S.V.; funding acquisition, S.V. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG) for the project Localized Reduced Basis Methods for PDE-constrained Parameter Optimization under contract VO 1658/6-1.

Acknowledgments

The authors thank Tim Keil, Mario Ohlberger and Felix Schindler from University of Münster (Germany) for the fruitful exchange of ideas on the topic.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
AGC     Approximated generalized Cauchy
CG      Conjugate gradient
FE      Finite element
MOP     Multiobjective optimization problem
MPPOP   Multiobjective parametric PDE-constrained optimization problem
PDE     Partial differential equation
PS      Pascoletti-Serafini
RB      Reduced basis
s.t.    subject to
TR      Trust-region

References

  1. Ehrgott, M. Multicriteria Optimization, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2005.
  2. Miettinen, K. Nonlinear Multiobjective Optimization; Kluwer Academic Publishers: Cambridge, MA, USA, 1999.
  3. Zadeh, L. Optimality and non-scalar-valued performance criteria. IEEE Trans. Autom. Control 1963, 8, 59–60.
  4. Eichfelder, G. Adaptive Scalarization Methods in Multiobjective Optimization; Springer: Berlin/Heidelberg, Germany, 2008.
  5. Pascoletti, A.; Serafini, P. Scalarizing vector optimization problems. J. Optim. Theory Appl. 1984, 42, 499–524.
  6. Hinze, M.; Pinnau, R.; Ulbrich, M.; Ulbrich, S. Optimization with PDE Constraints; Springer Science + Business Media B.V.: Berlin/Heidelberg, Germany, 2009.
  7. Schilders, W.H.; Van der Vorst, H.A.; Rommes, J. Model Order Reduction; Springer: Berlin/Heidelberg, Germany, 2008.
  8. Hesthaven, J.S.; Rozza, G.; Stamm, B. Certified Reduced Basis Methods for Parametrized Partial Differential Equations; SpringerBriefs in Mathematics: Heidelberg, Germany, 2016.
  9. Patera, A.T.; Rozza, G. Reduced Basis Approximation and a Posteriori Error Estimation for Parametrized Partial Differential Equations; MIT Pappalardo Graduate Monographs in Mechanical Engineering: Cambridge, MA, USA, 2007.
  10. Banholzer, S.; Gebken, B.; Reichle, L.; Volkwein, S. ROM-based inexact subdivision methods for PDE-constrained multiobjective optimization. Math. Comput. Appl. 2021, 26, 32.
  11. Iapichino, L.; Ulbrich, S.; Volkwein, S. Multiobjective PDE-constrained optimization using the reduced-basis method. Adv. Comput. Math. 2017, 43, 945–972.
  12. Schu, M. Adaptive Trust-Region POD Methods and Their Application in Finance. Ph.D. Thesis, University of Trier, Trier, Germany, 2012. Available online: https://ubt.opus.hbz-nrw.de/opus45-ubtr/frontdoor/deliver/index/docId/574/file/PhD_Thesis_Schu.pdf (accessed on 28 April 2022).
  13. Arian, E.; Fahl, M.; Sachs, W.S. Trust-Region Proper Orthogonal Decomposition for Flow Controls; Technical Report No. 2000–2025; Institute for Computer Applications in Science and Engineering, NASA Langley Research Center: Hampton, VA, USA, 2000.
  14. Qian, E.; Grepl, M.; Veroy, K.; Willcox, K. A certified trust region reduced basis approach to PDE-constrained optimization. SIAM J. Sci. Comput. 2017, 39, S434–S460.
  15. Yue, Y.; Meerbergen, K. Accelerating optimization of parametric linear systems by model order reduction. SIAM J. Optim. 2013, 23, 1344–1370.
  16. Banholzer, S. ROM-Based Multiobjective Optimization with PDE Constraints. Ph.D. Thesis, University of Konstanz, Konstanz, Germany, 2021. Available online: http://nbn-resolving.de/urn:nbn:de:bsz:352-2-1g98y1ic7inp29 (accessed on 28 April 2022).
  17. Banholzer, S.; Keil, T.; Mechelli, L.; Ohlberger, M.; Schindler, F.; Volkwein, S. An adaptive projected Newton non-conforming dual approach for trust-region reduced basis approximation of PDE-constrained parameter optimization. arXiv 2020, arXiv:2012.11653.
  18. Keil, T.; Mechelli, L.; Ohlberger, M.; Schindler, F.; Volkwein, S. A non-conforming dual approach for adaptive trust-region reduced basis approximation of PDE-constrained optimization. ESAIM M2AN 2021, 55, 1239–1269.
  19. Yano, M.; Huang, T.; Zahr, M.J. A globally convergent method to accelerate topology optimization using on-the-fly model reduction. Comput. Methods Appl. Mech. Eng. 2021, 375, 113635.
  20. Zahr, M.J.; Carlberg, K.T.; Kouri, D.P. An efficient, globally convergent method for optimization under uncertainty using adaptive model reduction and sparse grids. SIAM/ASA J. Uncertain. Quantif. 2019, 7, 877–912.
  21. Kouri, D.P.; Heinkenschloss, M.; Ridzal, D.; van Bloemen Waanders, B.G. A trust-region algorithm with adaptive stochastic collocation for PDE optimization under uncertainty. SIAM J. Sci. Comput. 2013, 35, A1847–A1879.
  22. Kouri, D.P.; Heinkenschloss, M.; Ridzal, D.; van Bloemen Waanders, B.G. Inexact objective function evaluations in a trust-region algorithm for PDE-constrained optimization under uncertainty. SIAM J. Sci. Comput. 2014, 36, A3011–A3029.
  23. Grüne, L.; Pannek, J. Nonlinear Model Predictive Control: Theory and Algorithms, 2nd ed.; Springer: London, UK, 2016.
  24. Borwein, J.M. On the existence of Pareto efficient points. Math. Oper. Res. 1983, 8, 64–73.
  25. Hartley, R. On cone-efficiency, cone-convexity and cone-compactness. SIAM J. Appl. Math. 1978, 34, 211–222.
  26. Sawaragi, Y.; Nakayama, H.; Tanino, T. Theory of Multiobjective Optimization; Elsevier: Amsterdam, The Netherlands, 1985.
  27. Wierzbicki, A.P. The use of reference objectives in multiobjective optimization. In Multiple Criteria Decision Making Theory and Application; Springer: Berlin/Heidelberg, Germany, 1980; pp. 468–486.
  28. Mueller-Gritschneder, D.; Graeb, H.; Schlichtmann, U. A successive approach to compute the bounded Pareto front of practical multiobjective optimization problems. SIAM J. Optim. 2009, 20, 915–934.
  29. De Motta, R.S.; Afonso, S.M.B.; Lyra, P.R.M. A modified NBI and NC method for the solution of N-multiobjective optimization problems. Struct. Multidiscip. Optim. 2012, 46, 239–259.
  30. Khaledian, K.; Soleimani-damaneh, M. A new approach to approximate the bounded Pareto front. Math. Methods Oper. Res. 2015, 82, 211–228.
  31. Lowe, T.J.; Thisse, J.-F.; Ward, J.E.; Wendell, R.E. On efficient solutions to multiple objective mathematical programs. Manag. Sci. 1984, 30, 1346–1349.
  32. Sayın, S. Measuring the quality of discrete representations of efficient sets in multiple objective mathematical programming. Math. Program. 2000, 87, 543–560.
  33. Mechelli, L. POD-Based State-Constrained Economic Model Predictive Control of Convection-Diffusion Phenomena. Ph.D. Thesis, University of Konstanz, Konstanz, Germany, 2019. Available online: http://nbn-resolving.de/urn:nbn:de:bsz:352-2-2zoi8n9sxknm1 (accessed on 28 April 2022).
  34. Evans, L.C. Partial Differential Equations; American Mathematical Society: Providence, RI, USA, 2010.
  35. Haasdonk, B. Reduced basis methods for parametrized PDEs—A tutorial introduction for stationary and instationary problems. In Model Order Reduction and Approximation: Theory and Algorithms; Benner, P., Ohlberger, M., Cohen, A., Willcox, K., Eds.; SIAM: Philadelphia, PA, USA, 2017; pp. 65–136.
  36. Rozza, G.; Huynh, D.B.P.; Patera, A.T. Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential equations. Arch. Comput. Methods Eng. 2008, 15, 229–275.
  37. Beermann, D.; Dellnitz, M.; Peitz, S.; Volkwein, S. Set-oriented multi-objective optimal control of PDEs using proper orthogonal decomposition. In Reduced-Order Modeling (ROM) for Simulation and Optimization; Keiper, W., Milde, A., Volkwein, S., Eds.; Springer International Publishing: Berlin/Heidelberg, Germany, 2018; pp. 47–72.
  38. Haasdonk, B.; Dihlmann, M.; Ohlberger, M. A training set and multiple bases generation approach for parameterized model reduction based on adaptive grids in parameter space. Math. Comput. Model. Dyn. Syst. 2011, 17, 423–442.
  39. Keil, T. Adaptive Reduced Basis Methods for Multiscale Problems and Large-Scale PDE-Constrained Optimization. Ph.D. Thesis, WWU Münster, Münster, Germany, 2022.
Figure 1. (a) Algorithm 2, no removal, local RB spaces. (b) Algorithm 2, T3, local RB spaces.
Figure 2. Computational times in seconds for Algorithm 2 with or without basis removal and using the two strategies in Remark 6 for initializing the RB space.
Figure 3. Number of basis functions used to compute each Pareto optimal point. (a) Local RB space. (b) Common RB space. In brackets: average number of basis functions.
Table 1. Total PDE and only FE solves for the tested methods.
Method                        # Total PDE Solves    # FE Solves
FE                            433378                433378
Common RB Space, No Removal   493254                20743
Common RB Space, T1           493282                20786
Common RB Space, T2a          497032                20838
Common RB Space, T2b          497032                20752
Common RB Space, T3           493985                20792
Local RB Space, No Removal    497072                20773
Local RB Space, T1            497589                20893
Local RB Space, T2a           507064                21226
Local RB Space, T2b           507064                20857
Local RB Space, T3            502911                21023
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Banholzer, S.; Mechelli, L.; Volkwein, S. A Trust Region Reduced Basis Pascoletti-Serafini Algorithm for Multi-Objective PDE-Constrained Parameter Optimization. Math. Comput. Appl. 2022, 27, 39. https://doi.org/10.3390/mca27030039