Article

A Trust Region Reduced Basis Pascoletti-Serafini Algorithm for Multi-Objective PDE-Constrained Parameter Optimization

Department of Mathematics and Statistics, University of Konstanz, Universitätsstraße 10, 78464 Konstanz, Germany
*
Author to whom correspondence should be addressed.
Math. Comput. Appl. 2022, 27(3), 39; https://doi.org/10.3390/mca27030039
Submission received: 18 January 2022 / Revised: 28 April 2022 / Accepted: 29 April 2022 / Published: 3 May 2022
(This article belongs to the Special Issue Computational Methods for Coupled Problems in Science and Engineering)

Abstract:
In the present paper, non-convex multi-objective parameter optimization problems are considered that are governed by elliptic parametrized partial differential equations (PDEs). To solve these problems numerically, the Pascoletti-Serafini scalarization is applied and the obtained scalar optimization problems are solved by an augmented Lagrangian method. However, due to the PDE constraints, the numerical solution is very expensive, so that a model reduction is employed using the reduced basis (RB) method. The quality of the RB approximation is ensured by a trust-region strategy, which does not require any offline procedure in which the RB functions are computed by a greedy algorithm. Moreover, convergence of the proposed method is guaranteed, and different techniques to prevent the excessive growth of the number of basis functions are explored. Numerical examples illustrate the efficiency of the proposed solution technique.

1. Introduction

Multi-objective optimization plays an important role in many applications, e.g., in industry, medicine or engineering. Typical examples are the minimization of costs with simultaneous quality optimization in production, or the minimization of CO₂ emissions in energy generation with simultaneous cost minimization. These problems lead to multi-objective optimization problems (MOPs), in which we want to achieve an optimal compromise with respect to all given objectives at the same time. Usually, the different objectives are contradictory, so that there exist infinitely many optimal compromises. The set of these compromises is called the Pareto set. The goal is to approximate the Pareto set in an efficient way, which turns out to be more expensive than solving a single-objective optimization problem.
Since MOPs are of great importance, there exist several algorithms to solve them. Among the most popular methods are scalarization methods, which transform MOPs into scalar problems. For example, in the weighted sum method [1,2,3], convex combinations of the original objectives are optimized. However, in our case the multi-objective optimization problem
\[
\min \hat{J}(u) = \big(\hat{J}_1(u), \ldots, \hat{J}_k(u)\big)^T \quad \text{subject to (s.t.)} \quad u \in U_{\mathrm{ad}} \tag{MOP}
\]
is non-convex with a bounded, non-empty, convex and closed set U_ad. In that case, a suitable scalarization method for solving (MOP) is the Pascoletti-Serafini (PS) scalarization [4,5]: For a chosen reference point z ∈ ℝ^k and a given target direction r ∈ ℝ^k with r_i > 0 for all i ∈ {1, …, k}, the Pascoletti-Serafini problem is given by
\[
\min t \quad \text{s.t.} \quad (t,u) \in \mathbb{R} \times U_{\mathrm{ad}} \ \text{ and } \ \hat{J}(u) - z \le t\,r. \tag{$P^{PS}_{z,r}$}
\]
In the present paper ( P z , r PS ) is solved by an augmented Lagrangian approach. However, in our case the evaluation of the objective J ^ requires the solution of an elliptic partial differential equation (PDE) for the given parameter u. This implies further that for the computation of the gradients J ^ i , i = 1 , , k , adjoint PDEs have to be solved; cf. [6]. Here, surrogate models offer a promising tool to reduce the computational effort significantly [7]. Examples are dimensional reduction techniques such as the Reduced Basis (RB) method [8,9]. In an offline phase, a low-dimensional surrogate model of the PDE is constructed by using, e.g., the greedy algorithm, cf. [8,10,11]. In the online phase, only the RB model is used to solve the PDE, which saves a lot of computing time.
Since the early 2000s, the combination of model order reduction with trust-region algorithms in the setting of PDE-constrained optimization has been present in the literature; cf. [12,13]. The idea in these methods is to replace the usual quadratic model function in each trust-region step by a reduced-order approximation of the cost function. More recent publications followed and enhanced this approach by using a-posteriori error estimates for the cost function and its gradient; cf. [14,15]. These works were the starting point for the trust-region reduced basis methods developed in [16,17,18]. Let us mention that [19,20] have proposed similar methods for the combination of reduced-order and trust-region methods based on previous work on trust-region algorithms for PDE-constrained optimization under uncertainty; cf. [21,22]. In contrast to the approach followed by [14,15,16,17,18], these methods do not use rigorous a-posteriori error estimates but rather asymptotic error indicators, which still allow for a global convergence result. Here we propose an extension of the method in [16] for solving multi-objective PDE-constrained parameter optimization problems, which is based on a combination of the trust-region reduced basis method presented in [17,18] and the PS method. In particular, we discuss different strategies to handle the increasing number of reduced basis functions, which is crucial in order to guarantee good performance of the algorithm. Notice that our approach is designed for applications in which the multi-objective PDE-constrained parameter optimization problem has to be solved only once. For that reason, our trust-region reduced basis method does not rely on any offline computations.
These proposed strategies are not only interesting in the field of multi-objective optimization by the PS method, but can also be used in other applications where many PDE-constrained optimization problems must be solved and it is hence crucial to keep the number of reduced basis functions small enough, as, e.g., in model predictive control; cf. [23].
The paper is organized as follows: In Section 2 we introduce a general MOP and explain the PS method, in particular, a hierarchical version of the PS algorithm which turns out to be very efficient in the numerical realization. The concrete PDE-constrained MOP is investigated in Section 3. The trust-region RB method and its combination with the PS method is described in Section 4. Convergence is ensured and the algorithmic realization of the approach is explained. Numerical examples are discussed in detail in Section 5. Finally, we draw some conclusions.

2. Multi-Objective Optimization

Let (U, ⟨·,·⟩_U) be a real Hilbert space, U_ad ⊆ U non-empty, convex and closed, k ≥ 2 arbitrary and Ĵ_1, …, Ĵ_k : U_ad ⊆ U → ℝ be given real-valued functions. In this manuscript, we also assume that U_ad is bounded; this assumption will be required later for the convergence of our method. Note that one can derive similar results in this section if U_ad is unbounded by introducing additional assumptions; cf. [16]. To shorten the notation, we write Ĵ := (Ĵ_1, …, Ĵ_k)^T : U_ad → ℝ^k. In the following, we deal with the multi-objective optimization problem
\[
\min \hat{J}(u) \quad \text{s.t.} \quad u \in U_{\mathrm{ad}}. \tag{MOP}
\]
Definition 1.
(a)
The functions J ^ 1 , , J ^ k are called cost or objective functions. Analogously, the vector-valued function J ^ : U ad R k is named the (multi-objective) cost or (multi-objective) objective function.
(b)
The Hilbert space U is named the admissible space, the set U ad is called the admissible set and a vector u U ad is called admissible.
(c)
The space R k is named the objective space and the image set J ^ ( U ad ) is called the objective set. A vector y = J ^ ( u ) J ^ ( U ad ) is called objective point.
Definition 2
(Partial ordering on ℝ^k). On ℝ^k we define the partial ordering ≤ as
\[
x \le y \ :\Longleftrightarrow\ \forall\, i \in \{1,\ldots,k\}:\ x_i \le y_i
\]
for all x, y ∈ ℝ^k. Moreover, we define
\[
x < y \ :\Longleftrightarrow\ \forall\, i \in \{1,\ldots,k\}:\ x_i < y_i.
\]
For convenience, we write
\[
x \lneq y \ :\Longleftrightarrow\ x \le y \ \&\ x \ne y
\]
for all x, y ∈ ℝ^k and define the two sets \(\mathbb{R}^k_{\le} := \{y \in \mathbb{R}^k \mid y \le 0\}\) and \(\mathbb{R}^k_{\lneq} := \{y \in \mathbb{R}^k \mid y \lneq 0\}\). Analogously, the relations ≥, > and ≩ as well as the sets \(\mathbb{R}^k_{\ge}\) and \(\mathbb{R}^k_{\gneq}\) are defined.
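For illustration, the ordering relations of Definition 2 can be written down directly; a minimal sketch in Python, with NumPy arrays standing in for vectors in ℝ^k (function names are our own, not from the paper):

```python
import numpy as np

def leq(x, y):
    """x <= y: componentwise less-or-equal (partial ordering of Definition 2)."""
    return bool(np.all(np.asarray(x) <= np.asarray(y)))

def lt(x, y):
    """x < y: strictly less in every component."""
    return bool(np.all(np.asarray(x) < np.asarray(y)))

def lneq(x, y):
    """x <= y and x != y: x is less-or-equal and strictly better somewhere."""
    x, y = np.asarray(x), np.asarray(y)
    return bool(np.all(x <= y) and np.any(x < y))
```

Note that ≤ is only a partial ordering: for x = (1, 2) and y = (2, 1), neither leq(x, y) nor leq(y, x) holds.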
Definition 3
(Pareto optimality).
(a)
An admissible vector u ¯ U ad and its corresponding objective point y ¯ : = J ^ ( u ¯ ) J ^ ( U ad ) are called (locally) weakly Pareto optimal if there is no u ˜ U ad (in a neighborhood of u ¯ ) with J ^ ( u ˜ ) < J ^ ( u ¯ ) . The sets
\[
U_{\mathrm{opt,w}} := \{u \in U_{\mathrm{ad}} \mid u \text{ is weakly Pareto optimal}\} \subseteq U_{\mathrm{ad}}, \qquad U_{\mathrm{opt,w,loc}} := \{u \in U_{\mathrm{ad}} \mid u \text{ is locally weakly Pareto optimal}\} \subseteq U_{\mathrm{ad}}
\]
are said to be the weak Pareto set and the locally weak Pareto set, respectively. The sets
\[
J_{\mathrm{opt,w}} := \hat{J}(U_{\mathrm{opt,w}}) \subseteq \mathbb{R}^k, \qquad J_{\mathrm{opt,w,loc}} := \hat{J}(U_{\mathrm{opt,w,loc}}) \subseteq \mathbb{R}^k,
\]
are the weak Pareto front and the locally weak Pareto front, respectively.
(b)
An admissible vector u ¯ U ad and its corresponding objective point y ¯ : = J ^ ( u ¯ ) J ^ ( U ad ) are called (locally) Pareto optimal if there is no u ˜ U ad (in a neighborhood of u ¯ ) with J ^ ( u ˜ ) J ^ ( u ¯ ) . The sets
\[
U_{\mathrm{opt}} := \{u \in U_{\mathrm{ad}} \mid u \text{ is Pareto optimal}\} \subseteq U_{\mathrm{ad}}, \qquad U_{\mathrm{opt,loc}} := \{u \in U_{\mathrm{ad}} \mid u \text{ is locally Pareto optimal}\} \subseteq U_{\mathrm{ad}}
\]
are called the Pareto set and the local Pareto set, respectively. The sets
\[
J_{\mathrm{opt}} := \hat{J}(U_{\mathrm{opt}}) \subseteq \mathbb{R}^k, \qquad J_{\mathrm{opt,loc}} := \hat{J}(U_{\mathrm{opt,loc}}) \subseteq \mathbb{R}^k
\]
are called the Pareto front and the local Pareto front, respectively.
If we talk about the different notions of (local) (weak) Pareto optimality in one sentence, we use the notation U opt , ( w ) , ( loc ) to keep the sentence compact. Analogously, U opt , ( w ) , loc , U opt , ( loc ) , J opt , ( w ) , ( loc ) etc. are to be understood. An example with the different concepts of Pareto optimality can be found in [16] (Example 1.2.6).
The next theorem about a sufficient condition for the existence of Pareto optimal points goes back to [24]. It also appears in a similar form in [25,26].
Theorem 1.
Suppose that there is y ∈ Ĵ(U_ad) + ℝ^k_≥ such that the set (y − ℝ^k_≥) ∩ (Ĵ(U_ad) + ℝ^k_≥) is compact. Then it holds J_opt ≠ ∅.
Proof. 
This is a slight generalization of [1] (Theorem 2.10) using the argument that adding R k to the set J ^ ( U ad ) does not change the Pareto front J opt .      □
Given any y = Ĵ(u) ∈ Ĵ(U_ad) with y ∉ J_opt, it follows directly from the definition of Pareto optimality that there is ȳ = Ĵ(ū) ∈ Ĵ(U_ad) with ȳ ≨ y. However, even if the Pareto front J_opt is not empty (e.g., since the assumptions of Theorem 1 are satisfied), it is not clear that there is ȳ ∈ J_opt with ȳ ≤ y. If this property holds for all y ∈ Ĵ(U_ad) ∖ J_opt, the set J_opt is said to be externally stable; cf. [1,26].
Definition 4.
The set J_opt is said to be externally stable if for every y ∈ Ĵ(U_ad) there is ȳ ∈ J_opt with ȳ ≤ y. This is equivalent to Ĵ(U_ad) ⊆ J_opt + ℝ^k_≥.
Especially for the investigation of suitable solution methods for solving (MOP), we are interested in guaranteeing that the Pareto front is externally stable. The next result provides a sufficient condition for this property.
Theorem 2.
If for every y ∈ Ĵ(U_ad) + ℝ^k_≥ the set (y − ℝ^k_≥) ∩ (Ĵ(U_ad) + ℝ^k_≥) is compact, then J_opt is externally stable.
Proof. 
For a proof of a similar version of this theorem, we refer to [1] (Theorem 2.21). □
Among the methods to solve multi-objective optimization problems, the ones based on scalarization techniques appear frequently in the literature. Let us mention here the weighted-sum method [1,3], the Euclidean reference point method [27] and the PS method [4,5]. Since in our case the set Ĵ(U_ad) + ℝ^k_≥ is non-convex, we apply the PS method, which is proven to be able to solve a non-convex (MOP).

2.1. The PS Method

For a chosen reference point z ∈ ℝ^k and a given target direction r ∈ ℝ^k_>, the PS problem is given by
\[
\min t \quad \text{s.t.} \quad (t,u) \in \mathbb{R} \times U_{\mathrm{ad}} \ \text{ and } \ \hat{J}(u) - z \le t\,r. \tag{$P^{PS}_{z,r}$}
\]
Equivalently, we can state the PS problem as a scalarized problem. For z ∈ ℝ^k and r ∈ ℝ^k_> we define the scalarization function
\[
g_{z,r} : \mathbb{R}^k \to \mathbb{R}, \qquad x \mapsto g_{z,r}(x) := \max_{1 \le i \le k} \frac{1}{r_i}\,(x_i - z_i),
\]
and the PS scalarized function
\[
\hat{J}_{g_{z,r}}(u) := g_{z,r}(\hat{J}(u)) = \max_{1 \le i \le k} \frac{1}{r_i}\big(\hat{J}_i(u) - z_i\big) \quad \text{for } u \in U_{\mathrm{ad}}.
\]
Then the reformulated PS problem is given by
\[
\min \hat{J}_{g_{z,r}}(u) \quad \text{s.t.} \quad u \in U_{\mathrm{ad}}. \tag{$RP^{PS}_{z,r}$}
\]
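To make the scalarization concrete, the following sketch evaluates the scalarized function on a toy bi-objective problem Ĵ(u) = (u², (u − 1)²) on U_ad = [0, 1] (a hypothetical example of our own, not from the paper) and minimizes it over a parameter grid:

```python
import numpy as np

def g_zr(x, z, r):
    """Pascoletti-Serafini scalarization g_{z,r}(x) = max_i (x_i - z_i) / r_i."""
    x, z, r = (np.asarray(v, float) for v in (x, z, r))
    return float(np.max((x - z) / r))

# Toy bi-objective problem: J(u) = (u^2, (u - 1)^2) on U_ad = [0, 1].
# Every u in [0, 1] is Pareto optimal; minimizing the scalarized function
# over a grid returns the Pareto point "targeted" by the pair (z, r).
us = np.linspace(0.0, 1.0, 1001)
J = np.column_stack([us**2, (us - 1.0) ** 2])
z = np.array([-0.1, -0.1])            # reference point below the front
r = np.array([1.0, 1.0])              # target direction
vals = np.max((J - z) / r, axis=1)    # scalarized values on the grid
u_bar = us[np.argmin(vals)]           # symmetric data, so u_bar is near 0.5
```

Changing z moves the targeted point along the Pareto front, which is exactly the mechanism the PS method exploits.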
The following theorem proved in [16] (Theorem 1.7.3) ensures the equivalence between ( P z , r PS ) and ( RP z , r PS ).
Theorem 3.
Let z R k and r R > k be arbitrary. On the one hand, if  ( u ¯ , t ¯ ) is a global (local) solution of ( P z , r PS ), then u ¯ is a global (local) solution of ( RP z , r PS ) with minimal function value t ¯ . On the other hand, if  u ¯ is a global (local) solution of ( RP z , r PS ), then ( u ¯ , t ¯ ) with t ¯ : = max 1 i k ( J ^ i ( u ¯ ) z i ) / r i is a global (local) solution of ( P z , r PS ).
Assumption 1.
The cost functions J ^ 1 , , J ^ k are weakly lower semi-continuous and bounded from below.
Theorem 4.
Let Assumption 1 be satisfied and z R k as well as r R > k be arbitrary. Then ( RP z , r PS ) has a global solution u ¯ U opt .
Proof. 
A proof of this statement can be found in [16] (Corollary 1.7.12). □
The previous result also shows that the existing global solution of ( RP z , r PS ) belongs to the Pareto set. To guarantee a good reconstruction of the Pareto set by the PS method, one needs that, given a (weakly) Pareto optimal point, it is possible to choose the parameters z and r such that this point solves ( RP z , r PS ). This is stated in [16] (Theorem 1.7.13), which we report here for clarity.
Theorem 5.
Let ū ∈ U_opt,w be arbitrary. Then for every r ∈ ℝ^k_> and every t̄ ∈ ℝ we have that ū is a global solution of ( RP z , r PS ) for the reference point z := Ĵ(ū) − t̄ r. If even ū ∈ U_opt, any other global solution ũ of ( RP z , r PS ) satisfies Ĵ(ũ) = Ĵ(ū).
Remark 1.
We refer the reader to [16] (Lemma 1.7.15) for the derivation of first-order necessary optimality condition for a global solution of ( P z , r PS ).
Thus, the PS method can in principle compute every (locally) (weakly) Pareto optimal point, so that many algorithms based on the PS method have been proposed. Here we only mention the ones which are related to (but differ from) our proposed technique. Our main idea is to keep the parameter r fixed while varying the reference point z. This was also proposed in [4], but the method turns out to be, on the one hand, not numerically efficient for k > 2 and, on the other hand, not numerically applicable in some cases for k > 2. In [28], the authors provide assumptions on the Pareto front to ensure that the so-called trade-off limits (i.e., points on the Pareto front which cannot be improved in at least one component) are given by the solutions to subproblems. Their idea was then to find these trade-off points first and then compute the rest of the Pareto front. A similar idea, but with the use of Centroidal Voronoi Tessellations, was presented by [29]. Finally, [30] shows and fixes some problematic behavior associated with the algorithm in [28]. We follow the idea of the mentioned contributions of hierarchically solving subproblems of (MOP), but with the focus of finding a set of reference points, by looking at subproblems, for which we can obtain Pareto optimal points. We are then not interested in finding ‘boundary’ points (i.e., the trade-off limits) of the Pareto front and then filling its ‘interior’ as in [28,29,30], but rather aim to partly generalize this approach. In what follows, we characterize which reference points are necessary and/or sufficient for computing the entire (local) (weak) Pareto front. First, we recall the following well-defined solution mappings of ( RP z , r PS ); cf. [16] (Definition 1.7.16).
Definition 5.
We define the set-valued mappings
\[
\begin{aligned}
Q_{\mathrm{opt,w}} &: \mathbb{R}^k \rightrightarrows U_{\mathrm{opt,w}}, & z &\mapsto \{u \in U_{\mathrm{ad}} \mid u \text{ is a global solution of } (RP^{PS}_{z,r})\},\\
Q_{\mathrm{opt,w,loc}} &: \mathbb{R}^k \rightrightarrows U_{\mathrm{opt,w,loc}}, & z &\mapsto \{u \in U_{\mathrm{ad}} \mid u \text{ is a local solution of } (RP^{PS}_{z,r})\},\\
Q_{\mathrm{opt,(loc)}} &: \mathbb{R}^k \rightrightarrows U_{\mathrm{opt,(loc)}}, & z &\mapsto Q_{\mathrm{opt,w,(loc)}}(z) \cap U_{\mathrm{opt,(loc)}}.
\end{aligned}
\]
From Theorem 3, it follows that Q_opt,(w),(loc)(ℝ^k) = U_opt,(w),(loc), i.e., by solving ( RP z , r PS ) for all z ∈ ℝ^k, we obtain all (locally) (weakly) Pareto optimal points. Furthermore, if Assumption 1 is satisfied, we infer from Theorem 4 that Q_opt,(w),(loc)(z) ≠ ∅ for all z ∈ ℝ^k. We also introduce the notion of a (locally) (weakly) Pareto sufficient set for the PS method.
Definition 6.
A set Z R k is called (locally) (weakly) Pareto sufficient if we have Q opt , ( w ) , ( loc ) ( Z ) = U opt , ( w ) , ( loc ) .
Hence, a (locally) (weakly) Pareto sufficient set contains the reference points which allow us to compute the entire (local) (weak) Pareto front. Clearly, the set R k is (locally) (weakly) Pareto sufficient, but this fact is not computationally useful. The next lemma gives a first condition towards this computational efficiency.
Lemma 1.
Let Z ⊆ ℝ^k be arbitrary. Z is (locally) (weakly) Pareto sufficient if
\[
\forall\, \bar{u} \in U_{\mathrm{opt,(w),(loc)}}\ \exists\, t \in \mathbb{R}:\ \hat{J}(\bar{u}) - t\,r \in Z. \tag{1}
\]
Proof. 
Let Z ⊆ ℝ^k be such that (1) holds. Let ū ∈ U_opt,(w),(loc) be arbitrary. We need to show that there is a z ∈ Z with ū ∈ Q_opt,(w),(loc)(z). Indeed, by (1) there is t ∈ ℝ with z := Ĵ(ū) − t r ∈ Z, and by Theorem 5 we already have ū ∈ Q_opt,(w),(loc)(z). □
To proceed we introduce the concepts of ideal point and shifted ideal point, which will first be used to define a set of shifted coordinate planes D. On this set we can then define a set of reference points Z opt , ( w ) , ( loc ) D which turns out to be an optimal Pareto sufficient set (The word ‘optimal’ here means that removing any point from the set will cause the loss of the Pareto sufficient property).
Definition 7.
(a)
We define the ideal objective point y^id ∈ (ℝ ∪ {−∞})^k by y_i^id := inf_{u ∈ U_ad} Ĵ_i(u) for all i ∈ {1, …, k}.
(b)
For an arbitrary vector d̃ ∈ ℝ^k_> define the shifted ideal point ỹ^id := y^id − d̃. Let D_i ⊆ ℝ^k be given by D_i := {y ∈ ℝ^k | y ≥ ỹ^id, y_i = ỹ_i^id} for all i ∈ {1, …, k}. Then the set D ⊆ ℝ^k is defined by D := ∪_{i=1}^k D_i.
(c)
We define Z_opt,(w),(loc)^D := {z ∈ D | ∃ ū ∈ U_opt,(w),(loc) ∃ t ∈ ℝ : z = Ĵ(ū) − t r}.
(d)
For any y ∈ ℝ^k we set t_D(y) := min_{i ∈ {1,…,k}} (y_i − ỹ_i^id)/r_i ∈ ℝ.
Remark 2.
It is proved in [16] (Lemma 1.7.24) that
\[
Z^{D}_{\mathrm{opt,(w),(loc)}} = \big\{\, \hat{J}(\bar{u}) - t_D(\hat{J}(\bar{u}))\, r \ \big|\ \bar{u} \in U_{\mathrm{opt,(w),(loc)}} \,\big\}.
\]
Furthermore, the set Z_opt,(w),(loc)^D is (locally) (weakly) Pareto sufficient, and there is a Lipschitz continuous bijection between Z_opt^D and the Pareto front J_opt. Unfortunately, there is no bijection between Z_opt,(w),(loc)^D and J_opt,(w),(loc) in general, but the set Z_opt,(w),(loc)^D is still (locally) (weakly) Pareto sufficient. Therefore, it can nevertheless be used for the computation of the Pareto front.
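Remark 2 suggests a simple way to generate reference points from (approximate) Pareto optimal objective values: project each point onto the shifted coordinate planes D along the direction r. A minimal numerical sketch of this projection (function name is our own):

```python
import numpy as np

def reference_point(y, y_tilde_id, r):
    """z = y - t_D(y) * r with t_D(y) = min_i (y_i - y_tilde_id_i) / r_i
    (Definition 7 (d) and Remark 2); z lies on one of the planes D_i,
    since the minimizing coordinate of z equals the shifted ideal value."""
    y, yt, r = (np.asarray(v, float) for v in (y, y_tilde_id, r))
    t = float(np.min((y - yt) / r))
    return y - t * r

# Example: for y = (3, 5), shifted ideal point (0, 1) and r = (1, 2),
# t_D(y) = min(3/1, 4/2) = 2, hence z = (1, 1), and z_2 = 1 lies on D_2.
z = reference_point([3.0, 5.0], [0.0, 1.0], [1.0, 2.0])
```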

2.2. Hierarchical PS Method

Due to Definition 7 and Remark 2, the set Z_opt,(w),(loc)^D can only be computed once the set U_opt,(w),(loc) is available. Clearly, this characterization of Z_opt,(w),(loc)^D is not useful for a numerical algorithm, since the availability of U_opt,(w),(loc) means that we have already solved (MOP). Fortunately, in [16,31] it is shown that the Pareto set has a hierarchical structure. This means that the (weak) Pareto front and the (weak) Pareto set of (MOP) are contained in the union of the (weak) Pareto fronts and (weak) Pareto sets of all of its subproblems. This particular structure of the Pareto set can be exploited to set up a hierarchical algorithm for obtaining a superset of Z_opt,(w),(loc)^D without having to compute the entire (local) (weak) Pareto set U_opt,(w),(loc) first. We start the explanation of the hierarchical algorithm by introducing the notion of a subproblem and related notations.
Definition 8.
For an index set I ⊆ {1, …, k} we denote by Ĵ_I the multi-objective cost function (Ĵ_i)_{i ∈ I} : U_ad → ℝ^I, and call the problem
\[
\min \hat{J}_I(u) \quad \text{s.t.} \quad u \in U_{\mathrm{ad}} \tag{MOP$_I$}
\]
a subproblem of (MOP). For I, K ⊆ {1, …, k} with K ⊆ I,
(a)
and for every y ∈ ℝ^I we denote by y^K := (y_i)_{i ∈ K} ∈ ℝ^K the canonical projection onto ℝ^K.
(b)
the set U opt , ( w ) , ( loc ) I : = { u U ad u is ( loc . ) ( weak . ) Pareto optimal for ( MOP I ) } denotes the (local) (weak) Pareto set and the set J opt , ( w ) , ( loc ) I : = J ^ I ( U opt , ( w ) , ( loc ) I ) R I denotes the (local) (weak) Pareto front of the subproblem (MOPI).
(c)
the (local) (weak) nadir objective point for the subproblem (MOPI) is defined by
\[
y_i^{\mathrm{nad},I,(w),(loc)} := \sup \big\{\, y_i \ \big|\ y \in J^{I}_{\mathrm{opt,(w),(loc)}} \,\big\} \quad \text{for all } i \in I.
\]
Given a subproblem (MOPI), it is straightforward to define the PS problem for this setting.
Definition 9.
Let I ⊆ {1, …, k} be arbitrary. For a given reference point z ∈ ℝ^I and the target direction r ∈ ℝ^k_>, we define the PS problem for (MOPI) by
\[
\min t \quad \text{s.t.} \quad (t,u) \in \mathbb{R} \times U_{\mathrm{ad}} \ \text{ and } \ \hat{J}_I(u) - z \le t\,r^I. \tag{$P^{PS}_{I,z,r}$}
\]
Again, it is possible to show that ( P I , z , r PS ) is equivalent (in the sense of Theorem 3) to the problem
\[
\min \ \max_{i \in I} \frac{1}{r_i}\big(\hat{J}_i(u) - z_i\big) \quad \text{s.t.} \quad u \in U_{\mathrm{ad}}. \tag{$RP^{PS}_{I,z,r}$}
\]
Let us mention that the statements proved in Section 2.1 can be adapted for the PS method for the subproblems. Similarly, we can also generalize the definition of the shifted coordinate plane D and the (locally) (weakly) Pareto sufficient set of reference points Z opt , ( w ) , ( loc ) D to this setting.
Definition 10.
Let I ⊆ {1, …, k} be arbitrary. Given the vector d̃ ∈ ℝ^k_> and the shifted ideal point ỹ^id ∈ ℝ^k, which were both introduced in Definition 7, let D_i^I ⊆ ℝ^I be given by
\[
D_i^I := \big\{\, y \in \mathbb{R}^I \ \big|\ y \ge (\tilde{y}^{\mathrm{id}})^I,\ y_i = \tilde{y}_i^{\mathrm{id}} \,\big\} \quad \text{for } i \in I.
\]
Then the set D^I ⊆ ℝ^I is defined by D^I := ∪_{i ∈ I} D_i^I. Moreover, for all K ⊆ {1, …, k} we define the sets
\[
Z^{D^I\!,\,K}_{\mathrm{opt,(w),(loc)}} := \big\{\, z \in D^I \ \big|\ \exists\, \bar{u} \in U^{K}_{\mathrm{opt,(w),(loc)}}\ \exists\, t \in \mathbb{R}:\ z = \hat{J}_I(\bar{u}) - t\,r^I \,\big\}.
\]
To ease the notation, we write Z_opt,(w),(loc)^{D^I} := Z_opt,(w),(loc)^{D^I, I}. If I = {1, …, k}, we set Z_opt,(w),(loc)^{D, K} := Z_opt,(w),(loc)^{D^I, K} and Z_opt,(w),(loc)^D := Z_opt,(w),(loc)^{D^I, I}. Finally, for any y ∈ ℝ^I we set t_{D^I}(y) := min_{i ∈ I} (y_i − ỹ_i^id)/r_i ∈ ℝ.
Note that also Remark 2 can be rewritten for the subproblems.
The main ingredient of the hierarchical PS method is the result that a superset of Z_opt,(w),(loc)^{D^I} can be computed by using the sets U_opt,(w),(loc)^K for all K ⊊ I. In other words, in contrast to Definition 10, only the Pareto optimal solutions to all subproblems, but not those of the problem itself, are needed to compute the (locally) (weakly) Pareto sufficient set of reference points Z_opt,(w),(loc)^{D^I} for the subproblem (MOPI). The very technical details of the analytical derivation and verification of this result are omitted here to ease and shorten the presentation; a reader interested in the details is referred to [16] (Sections 1.7.4.2-1.7.4.4). Building on this result, the idea of the hierarchical PS method is to iteratively solve subproblems with an increasing number of cost functions. During this procedure, the required reference points for the current subproblem can be computed by using the Pareto optimal solutions of all of its subproblems as described above.
Before we formulate the hierarchical algorithm, we give the necessary numerical condition in order to compute a numerical approximation of the set Z opt , ( w ) , ( loc ) D I by using the numerical solution to all subproblems.
To do so, we introduce a grid on D I as follows.
Definition 11.
Let I ⊆ {1, …, k} be arbitrary. For a given grid size h > 0 and any i ∈ I, we define
\[
Z_i^{h,I} := \Big\{\, z \in D_i^I \ \Big|\ \forall\, j \in I \setminus \{i\}\ \exists\, k \in \mathbb{N}_0:\ z_j = \tilde{y}_j^{\mathrm{id}} + \tfrac{h}{2} + k\,h \ \ \&\ \ z_j \le y_j^{\mathrm{nad},I,w} - \bar{t}_i\, r_j \,\Big\}.
\]
Furthermore, we set Z^{h,I} := ∪_{i ∈ I} Z_i^{h,I}. If I = {1, …, k}, we write Z^h := Z^{h,I}.
The idea is to choose only reference points that lie on the grid Z^{h,I} and do not satisfy the condition
\[
\exists\, K \subsetneq I\ \exists\, (\bar{u}, \bar{t}, \bar{z}) \in UTZ^{\mathrm{num}}(K):\ z^K = \bar{z}^K \ \ \&\ \ z^{I \setminus K} \le \hat{J}_{I \setminus K}(\bar{u}) - \bar{t}\, r^{I \setminus K}, \tag{2}
\]
where UTZ^num(K) is a numerical approximation of UTZ(K) = {(u, d̃_j, ỹ_j^id) | u ∈ Ũ_opt,w(I)}. An explanation for excluding points based on (2) can be found in [16] (Section 1.7.4.5). Finally, we describe the proposed numerical hierarchical PS method in Algorithm 1.
Remark 3.
In [32], the author introduces three different quality criteria for the numerical implementation of a scalarization method, which we discuss here for the presented hierarchical PS method.
(a)
Coverage: Every part of the Pareto set and front has to be represented in the sets U opt , w num and J opt , w num , respectively. This can be measured by
\[
\mathrm{cov}(J_{\mathrm{opt,(w),(loc)}}) := \max_{\bar{y} \in J_{\mathrm{opt,(w),(loc)}}} \ \min_{y \in J^{\mathrm{num}}_{\mathrm{opt,(w),(loc)}}} \ \|\bar{y} - y\|.
\]
In the case of Algorithm 1, we have that cov ( J opt , ( w ) , ( loc ) ) = O ( h ) (cf. [16] (Remark 1.7.69-(a))).
(b)
Uniformity: The points on the Pareto set and front should be distributed (almost) equidistantly; cf. [16] (Remark 1.7.69-(b)).
(c)
Cardinality: The number of points contained in the numerical approximation should be reasonable. In the case of Algorithm 1, it is not possible to estimate a priori the number of elements computed by the method. It is, however, possible to show a bound which can be computed when the nadir objective point y^{nad,(w)} is known (cf. [16], Remark 1.7.69-(c)).
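The coverage criterion of Remark 3 (a) is straightforward to evaluate for finite point sets; a sketch with the (discretized) true front and its approximation given as rows of NumPy arrays (function name is our own):

```python
import numpy as np

def coverage(front, approx):
    """cov = max over points of the (discretized) Pareto front of the
    Euclidean distance to the nearest point of the approximation."""
    F, A = np.asarray(front, float), np.asarray(approx, float)
    # pairwise distances between all front points and all approximation points
    dists = np.linalg.norm(F[:, None, :] - A[None, :, :], axis=2)
    return float(dists.min(axis=1).max())
```

A small coverage value means every part of the front is represented; for Algorithm 1 one expects cov = O(h) as stated above.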
Algorithm 1: Solving (MOP) numerically by the hierarchical PS method
1:  for j = 1 : k do
2:      Set I := {j};
3:      Compute U_opt,w^num(I) = {u | u minimizes Ĵ_j};
4:      Choose d̃_j, compute y_j^id and set ỹ_j^id = y_j^id − d̃_j;
5:      Set UTZ^num(I) = {(u, d̃_j, ỹ_j^id) | u ∈ U_opt,w^num(I)};
6:  end for
7:  for i = 2 : k do
8:      for all I ⊆ {1, …, k} with |I| = i do
9:          Initialize U_opt,w^num(I) = ∪_{K ⊊ I} U_opt,w^num(K) and UTZ^num(I) = ∅;
10:         Compute the reference points Z^num(I) = {z ∈ Z^{h,I} | ¬(2)};
11:         while Z^num(I) ≠ ∅ do
12:             Choose z ∈ Z^num(I) and remove z from Z^num(I);
13:             Solve (P^PS_{I,z,r}) / (RP^PS_{I,z,r});
14:             Set U_opt,w^num(I) ← U_opt,w^num(I) ∪ Q_opt,w^I(z);
15:             Set UTZ^num(I) ← UTZ^num(I) ∪ {(ū, t̄, z) | (ū, t̄) gl. sol. of (P^PS_{I,z,r})};
16:             Add solutions of PSPs with respect to redundant reference points: Set
                UTZ^num(I) ← UTZ^num(I) ∪ {(ū, t̄, z̃) | (ū, t̄) gl. sol. of (P^PS_{I,z,r}), z̃ ∈ Z^num(I) ∩ [z − (t̄ r^I − (Ĵ_I(ū) − z)), z]};
17:             Remove redundant reference points: Set
                Z^num(I) ← Z^num(I) ∖ [z − (t̄ r^I − (Ĵ_I(ū) − z)), z] for all ū ∈ Q_opt,(w)^I(z);
18:         end while
19:     end for
20: end for
21: if computeParetoFront == true then
22:     Remove all u ∈ U_opt,w^num({1, …, k}) with u ∉ U_opt by a non-dominance test;
23: end if
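The non-dominance test used in the last step of Algorithm 1 can be sketched as follows for a finite candidate set, with objective values stored as rows of an array (a minimal illustration of our own, not the implementation used in the paper):

```python
import numpy as np

def nondominated_mask(Y):
    """Boolean mask of the rows of Y that are non-dominated within Y:
    a row y is dominated if another row y' satisfies y' <= y and y' != y."""
    Y = np.asarray(Y, float)
    n = Y.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        if not mask[i]:
            # anything row i dominates was already removed by i's dominator
            continue
        # rows weakly below row i in all components and strictly below in one
        dominated = np.all(Y[i] <= Y, axis=1) & np.any(Y[i] < Y, axis=1)
        mask[dominated] = False
    return mask
```

Keeping only the rows flagged True yields the numerical Pareto front approximation.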

3. The Non-Convex Parametric PDE-Constrained MOP

Before defining our exemplary MOP, we introduce the PDE model which will later serve as an equality constraint. Let Ω R d , d { 2 , 3 } , be a bounded domain with Lipschitz-continuous boundary Γ = Ω . Furthermore, let Ω 1 , , Ω m be a pairwise disjoint decomposition of the domain Ω and set Γ i : = Ω i Ω for all i = 1 , , m . Then we are interested in the following elliptic diffusion-reaction equation with Robin boundary condition:
\[
-\nabla \cdot \Big( \sum_{i=1}^m u_i^\kappa\, \chi_{\Omega_i}(x)\, \nabla y(x) \Big) + u^r\, r(x)\, y(x) = f(x) \quad \text{a.e. in } \Omega, \tag{3a}
\]
\[
u_i^\kappa\, \frac{\partial y}{\partial n}(s) + \alpha\, y(s) = \alpha\, y_a(s) \quad \text{a.e. on } \Gamma_i. \tag{3b}
\]
For every i ∈ {1, …, m}, the parameter u_i^κ > 0 represents the diffusion coefficient on the subdomain Ω_i. By r ∈ L^∞(Ω) we denote a reaction function, which is supposed to satisfy r > 0 a.e. in Ω and is controlled by the scalar parameter u^r > 0. On the right-hand side of (3a), we have the source term f ∈ L²(Ω). The constant α > 0 in (3b) models the heat exchange with the outside of the domain Ω, where a temperature of y_a ∈ L²(Γ) is assumed. In total, the parameter space is given by U = ℝ^m × ℝ, and any parameter u ∈ U can be written as the vector u = (u^κ, u^r)^T with u^κ = (u_1^κ, …, u_m^κ)^T ∈ ℝ^m. Setting H = L²(Ω) and V = H¹(Ω), the weak formulation of (3) is
\[
a(u; y, \varphi) = F(\varphi) \quad \text{for all } \varphi \in V \tag{4}
\]
for any u U . In (4) the parameter-dependent symmetric bilinear form a ( u ; · , · ) : V × V R is given by
\[
a(u; \varphi, \psi) := \sum_{i=1}^m u_i^\kappa \int_{\Omega_i} \nabla\varphi(x) \cdot \nabla\psi(x) \,\mathrm{d}x + u^r \int_{\Omega} r(x)\, \varphi(x)\, \psi(x) \,\mathrm{d}x + \alpha \int_{\Gamma} \varphi(s)\, \psi(s) \,\mathrm{d}s
\]
for all φ, ψ ∈ V and u ∈ U. The linear functional F ∈ V′ is defined by
\[
F(\varphi) := \int_{\Omega} f(x)\, \varphi(x) \,\mathrm{d}x + \alpha \int_{\Gamma} y_a(s)\, \varphi(s) \,\mathrm{d}s \quad \text{for all } \varphi \in V.
\]
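To illustrate the structure of (3) and its parameter dependence, the following sketch assembles and solves a one-dimensional analog (single subdomain, m = 1) with finite differences and Robin conditions at both ends. All names and the discretization are illustrative assumptions of ours, not the discretization used in the paper (which works in 2D/3D):

```python
import numpy as np

def solve_1d(u_kappa, u_r, alpha, r, f, y_a, N=200, L=1.0):
    """Finite-difference sketch of a 1D analog of (3):
       -(u_kappa * y')' + u_r * r(x) * y = f(x)    on (0, L),
       u_kappa * dy/dn + alpha * y = alpha * y_a   at x = 0 and x = L,
    where dy/dn is the outward normal derivative."""
    h = L / N
    x = np.linspace(0.0, L, N + 1)
    A = np.zeros((N + 1, N + 1))
    b = np.zeros(N + 1)
    for i in range(1, N):  # interior second-order stencil
        A[i, i - 1] = A[i, i + 1] = -u_kappa / h**2
        A[i, i] = 2.0 * u_kappa / h**2 + u_r * r(x[i])
        b[i] = f(x[i])
    # Robin boundary rows (first-order one-sided differences)
    A[0, 0], A[0, 1] = u_kappa / h + alpha, -u_kappa / h
    A[N, N], A[N, N - 1] = u_kappa / h + alpha, -u_kappa / h
    b[0] = b[N] = alpha * y_a
    return x, np.linalg.solve(A, b)

# Consistency check: for f = u_r * r * c and y_a = c the solution is y = c.
c, u_r = 3.0, 0.5
x, y = solve_1d(u_kappa=2.0, u_r=u_r, alpha=1.0,
                r=lambda s: 1.0 + s, f=lambda s: u_r * (1.0 + s) * c, y_a=c)
```

Note how the matrix depends affinely on (u_kappa, u_r); this linear parameter dependence is exactly what the gradient formulas below and the RB method exploit.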
Lemma 2.
(a)
For all u U it holds
\[
\|a(u;\cdot,\cdot)\|_{L(V,V)} \le C\,\|u\|_U
\]
with a constant C > 0 , which does not depend on u.
(b)
For all u ∈ U with u^κ > 0 in ℝ^m and u^r > 0, it holds
\[
a(u;\varphi,\varphi) \ge \min\{u_1^\kappa, \ldots, u_m^\kappa,\, u^r\}\, \|\varphi\|_V^2 \quad \text{for all } \varphi \in V.
\]
(c)
The mapping F ∈ V′ is well-defined.
Proof. 
All statements follow from similar arguments of [33] (Lemma 1.4), where related operators were considered in the parabolic case.      □
Theorem 6.
Let u ∈ U with u > 0 be arbitrary. Then there is a unique solution y = y(u) ∈ V of (3). Moreover, the estimate
\[
\|y\|_V \le C \big( \|f\|_{L^2(\Omega)} + \|y_a\|_{L^2(\Gamma)} \big) \tag{5}
\]
holds with a constant C > 0 , which depends continuously on u, but is independent of f and y a .
Proof. 
The claims follow from the Lax-Milgram theorem (cf. [34]) and Lemma 2. □
Definition 12.
Let u_min^κ ∈ (0, ∞)^m and u_min^r > 0 be arbitrary. Then we define the closed set
\[
U_{\mathrm{eq}} := \{\, u \in U \mid u^\kappa \ge u_{\mathrm{min}}^\kappa,\ u^r \ge u_{\mathrm{min}}^r \,\}.
\]
In view of Theorem 6, it is possible to define the solution operator S : U eq V , which maps any parameter u U eq to the unique solution y = S ( u ) V of (4).
Remark 4.
Due to Lemma 2, we can conclude that a(u; φ, φ) ≥ α_min ‖φ‖_V² for all φ ∈ V and u ∈ U_eq, where α_min := min{(u_min^κ)_1, …, (u_min^κ)_m, u_min^r} > 0. In particular, the constant C in (5) can be chosen independently of u if we restrict ourselves to parameters u ∈ U_eq.
Theorem 7.
The solution operator S : U_eq → V is twice continuously Fréchet differentiable. For the first derivative S′ : U_eq → L(U, V), we have that for any u ∈ U_eq and h ∈ U the function y_h := S′(u)h ∈ V solves the equation
\[
a(u; y_h, \varphi) = -\nabla_u a(u; S(u), \varphi)\, h \quad \text{for all } \varphi \in V.
\]
The second derivative S″ : U_eq → L(U, L(U, V)) is given as follows: For any u ∈ U_eq and h_1, h_2 ∈ U, the function y_{h_1,h_2} := S″(u)(h_1, h_2) solves the equation
\[
a(u; y_{h_1,h_2}, \varphi) = -\nabla_u a(u; S'(u)h_1, \varphi)\, h_2 - \nabla_u a(u; S'(u)h_2, \varphi)\, h_1 \quad \text{for all } \varphi \in V.
\]
Remark 5.
By ∇_u a we denote the partial derivative of the mapping a w.r.t. the parameter u. Since a is linear in u, it holds
\[
\nabla_u a(u; \varphi, \psi)\, h = a(h; \varphi, \psi), \qquad \nabla_u^2 a(u; \varphi, \psi) = 0 \in L(U,U)
\]
for all u , h U and all φ , ψ V . In particular, we can identify u a ( u ; φ , ψ ) U by
\[
\nabla_u a(u; \varphi, \psi) = \begin{pmatrix} \int_{\Omega_1} \nabla\varphi(x) \cdot \nabla\psi(x) \,\mathrm{d}x \\ \vdots \\ \int_{\Omega_m} \nabla\varphi(x) \cdot \nabla\psi(x) \,\mathrm{d}x \\ \int_{\Omega} r(x)\, \varphi(x)\, \psi(x) \,\mathrm{d}x \end{pmatrix} \in U
\]
by using the Riesz representation theorem.
We are now ready to state the multiobjective parametric PDE-constrained optimization problem (MPPOP). Let k N be fixed and
\[
\sigma_\Omega^{(1)}, \ldots, \sigma_\Omega^{(k)} \ge 0 \quad \text{as well as} \quad \sigma_U^{(1)}, \ldots, \sigma_U^{(k)} \ge 0
\]
be non-negative weights. Furthermore, denote by y Ω ( 1 ) , , y Ω ( k ) H the desired states and by u d ( 1 ) , , u d ( k ) U the desired parameters. Then we define the multiobjective essential cost functions J ^ 1 , , J ^ k : U eq R by
J ^ i ( u ) : = σ Ω ( i ) 2 S ( u ) y Ω ( i ) H 2 + σ U ( i ) 2 u u d ( i ) U 2 for all u U eq and i { 1 , , k } .
Moreover, u a , u b with u a u b are lower and upper bounds on the parameter u which we assume to be finite. We define U ad : = { u U u a u u b } and we assume that U ad U eq holds. Note that U ad is a closed, convex and bounded set because of the finiteness assumption on u a and u b . We are interested in solving
min u U ad J ^ ( u ) = min u U ad J ^ 1 ( u ) , , J ^ k ( u ) T .
Note that, thanks to the assumptions on $U_{\mathrm{ad}}$ and $\sigma_U^{(i)}$, the costs $\hat J_1, \dots, \hat J_k$ satisfy Assumption 1. This problem therefore fits into the framework of non-convex multiobjective optimization, and Algorithm 1 can be applied. The non-convexity stems from the way the bilinear form depends on the parameter $u$: it makes the solution mapping non-linear and thus the MPPOP non-convex. To close this section, we derive expressions for the gradient and Hessian of the cost functionals $\hat J_1, \dots, \hat J_k$. We define the $i$-th adjoint equation and its solution operator as follows.
Definition 13.
For $i = 1, \dots, k$, the solution operator of the $i$-th adjoint equation is $\mathcal{A}_i : U_{\mathrm{eq}} \to V$, where for any given $u \in U_{\mathrm{eq}}$ the adjoint state $p^{(i)} := \mathcal{A}_i(u)$ solves the equation
$$a(u; \varphi, p^{(i)}) = -\sigma_\Omega^{(i)} \big\langle S(u) - y_\Omega^{(i)}, \varphi \big\rangle_H \quad \text{for all } \varphi \in V.$$
As shown in [16], these operators satisfy the following two results:
Lemma 3.
The solution operator $\mathcal{A}_i : U_{\mathrm{eq}} \to V$ is continuously Fréchet differentiable for every $i = 1, \dots, k$. For the first derivative $\mathcal{A}_i' : U_{\mathrm{eq}} \to \mathcal{L}(U, V)$, we have that for any $u \in U_{\mathrm{eq}}$ and $h \in U$ the function $p^{(i),h} := \mathcal{A}_i'(u)h \in V$ solves the equation
$$a(u; \varphi, p^{(i),h}) = -\nabla_u a(u; \varphi, \mathcal{A}_i(u))\,h - \sigma_\Omega^{(i)} \big\langle S'(u)h, \varphi \big\rangle_H \quad \text{for all } \varphi \in V.$$
Corollary 1.
Let $U_{\mathrm{ad}} \subseteq U_{\mathrm{eq}}$, $u \in U_{\mathrm{ad}}$ and $h \in U$ be arbitrary. Then for $i = 1, \dots, k$ the cost functions $\hat J_i$ are twice continuously Fréchet differentiable and it holds
$$\nabla \hat J_i(u) = \nabla_u a(u; S(u), \mathcal{A}_i(u)) + \sigma_U^{(i)} \big( u - u_d^{(i)} \big),$$
$$\nabla^2 \hat J_i(u)\,h = \nabla_u a(u; S'(u)h, \mathcal{A}_i(u)) + \nabla_u a(u; S(u), \mathcal{A}_i'(u)h) + \sigma_U^{(i)} h,$$
where we use the representation of $\nabla_u a(u; S(u), \mathcal{A}_i(u)) \in U'$ in $U$, cf. Remark 5.

The RB Method for MPPOP

One of the limitations of solving the MPPOP directly with the PS method is the high computational cost. Algorithm 1, in fact, requires solving the state and adjoint equations a large number of times in order to approximate the Pareto set efficiently. Unfortunately, the numerical evaluation of the state and adjoint solution operators is costly due to the high number of degrees of freedom required to apply, for example, the FE method. For this reason, we use the RB method. In the following we explain how the RB method can be applied to our model. From Theorem 6, we know that the weak form of the state equation admits a unique solution for any parameter $u \in U_{\mathrm{eq}}$. This allows us to define the solution operator $S : U_{\mathrm{eq}} \to V$. Now, let us consider the so-called solution manifold $\mathcal{M} := \{ S(u) \mid u \in U_{\mathrm{eq}} \} \subset V$. The goal of the RB method is to provide a low-dimensional subspace $V^\ell \subset V$ which is a good approximation of $\mathcal{M}$. The subspace $V^\ell$ is defined as the span of linearly independent snapshots $S(u_1), \dots, S(u_\ell)$ for selected parameters $u_1, \dots, u_\ell \in U_{\mathrm{eq}}$. Clearly, $V^\ell$ has dimension $\ell$ and the snapshots constitute its basis. Let us postpone the discussion on how to select good parameters for generating $V^\ell$. Given an RB space $V^\ell$, we obtain the reduced-order state equation by a Galerkin projection:
$$a(u; y^\ell, \psi) = F(\psi) \quad \text{for all } \psi \in V^\ell. \tag{8}$$
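Since the bilinear form is parameter-separable (cf. Remark 5), the Galerkin projection can be prepared once and then assembled cheaply for every new parameter. The following NumPy sketch illustrates this on a generic toy full-order system; the matrices, sizes and names are illustrative assumptions, not the paper's discretization:

```python
import numpy as np

def reduced_operators(Aq_list, f, V):
    """Precompute the Galerkin-projected quantities V^T A_q V and V^T f."""
    return [V.T @ Aq @ V for Aq in Aq_list], V.T @ f

def reduced_state_solve(u, Aq_red, f_red):
    """Assemble A(u) = sum_q u_q (V^T A_q V), affine in u, and solve the
    small reduced-order system for the RB coefficients of y^l."""
    A = sum(uq * Aq for uq, Aq in zip(u, Aq_red))
    return np.linalg.solve(A, f_red)

# toy full-order model: three SPD affine blocks (purely illustrative)
rng = np.random.default_rng(0)
n = 60
Aq = [B @ B.T + n * np.eye(n)
      for B in (rng.standard_normal((n, n)) for _ in range(3))]
f = rng.standard_normal(n)
u = np.array([1.0, 2.0, 0.5])

y_full = np.linalg.solve(sum(uq * A_ for uq, A_ in zip(u, Aq)), f)
# RB basis spanned by one snapshot plus a filler direction
V, _ = np.linalg.qr(np.column_stack([y_full, rng.standard_normal(n)]))
Aq_red, f_red = reduced_operators(Aq, f, V)
y_red = reduced_state_solve(u, Aq_red, f_red)
err = np.linalg.norm(V @ y_red - y_full)  # Galerkin is exact on span(V) here
```

Because the snapshot $S(u)$ lies in the span of the basis, the Galerkin projection reproduces it exactly; for other parameters the reduced solve costs only $O(\ell^2)$ per affine term plus one small dense solve.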
Also for the reduced-order equation, we have unique solvability for all parameters $u \in U_{\mathrm{eq}}$. The solution map $S^\ell : U_{\mathrm{eq}} \to V^\ell$, which maps any parameter $u \in U_{\mathrm{eq}}$ to the unique solution $y^\ell = S^\ell(u) \in V^\ell$ of (8), is then well-defined. We can similarly define a reduced-order adjoint equation and essential cost functional. For $i = 1, \dots, k$, we define the essential reduced-order cost functions $\hat J_i^\ell : U_{\mathrm{eq}} \to \mathbb{R}$ by
$$\hat J_i^\ell(u) := \frac{\sigma_\Omega^{(i)}}{2}\, \big\| S^\ell(u) - y_\Omega^{(i)} \big\|_H^2 + \frac{\sigma_U^{(i)}}{2}\, \big\| u - u_d^{(i)} \big\|_U^2,$$
the reduced-order adjoint equation by
$$a(u; \psi, p^{(i),\ell}) = -\sigma_\Omega^{(i)} \big\langle S^\ell(u) - y_\Omega^{(i)}, \psi \big\rangle_H \quad \text{for all } \psi \in V^\ell$$
and the reduced-order adjoint solution operator $\mathcal{A}_i^\ell : U_{\mathrm{eq}} \to V^\ell$. Following Corollary 1, it is possible to represent the gradient and the Hessian of the essential reduced-order cost functions $\hat J_i^\ell$ for $i = 1, \dots, k$ by simply replacing the operators $S$ and $\mathcal{A}_i$ by their respective reduced-order versions $S^\ell$ and $\mathcal{A}_i^\ell$. There are still two aspects which remain to be clarified: first, how to generate an RB space which guarantees a good approximation of the state and adjoint solution manifolds and, second, how to estimate a-posteriori (i.e., without explicitly evaluating the full-order solution operators $S$ and $\mathcal{A}_i$) the error of such an approximation.
For the first aspect, one can think of building an RB space either prior to solving the reduced-order optimization problem or while solving it. The first approach is the so-called offline/online decomposition; cf. [35]. This technique exploits a greedy algorithm in the offline phase, which iteratively searches for the parameter at which the approximation error between the full- and reduced-order state and adjoint variables is the largest. The RB space is then enriched (by solving the full-order state and adjoint equations at the respective parameter and orthonormalizing the newly computed snapshots with respect to the current RB basis) until a pre-defined tolerance for the approximation error is reached. Once the RB space is computed, the online phase can start: the optimization problem is solved fast on the reduced-order level. Although this technique is still widely used in the literature, it shows many disadvantages in the context of optimization. First, it suffers from the curse of dimensionality: for a high-dimensional parameter space it is too costly to explore the entire parameter space with a greedy procedure. Second, it is counter-intuitive to prepare an RB space which is accurate for any parameter when the optimization method usually follows a (short) path in the parameter space to find the solution, or when the Pareto set is contained in some local regions of the parameter space, as is often the case for non-convex multiobjective problems. While it is true that the computational costs of an offline phase could be amortized in the context of multiobjective optimization for a reasonably small dimension of the parameter space, due to the vast number of scalarized PS problems that need to be solved in the online phase, the disadvantage of the offline/online splitting in this setting is the lack of control of the accuracy of the Pareto optimal solutions.
Indeed, to the best of our knowledge there are no suitable error indicators for the greedy algorithm which guarantee a certified accuracy of the reduced-order Pareto optimal points w.r.t. the full-order ones. Fortunately, the focus has shifted recently towards adapting the RB space while proceeding with the optimization method. This procedure is followed, e.g., by the methods presented in [14,15,17,18]. The advantage of these methods over methods based on an offline/online splitting is that they compute first-order critical points of the full-order optimization problem. Let us specify that in [14,17,18] the authors proposed and progressively improved an RB method combined with a TR algorithm, based on more general results presented in [15]. Such a method constructs the RB space adaptively while the optimizer is computing the optimal solution. Our focus here is on further improving the method in [17], which can be considered the most general among the TR-RB methods.
For any of the above-mentioned methods, a-posteriori error estimates are crucial: they provide upper bounds on the approximation error made by the RB space in reconstructing the solution for a given parameter without any full-order solution at hand. In the case of optimization, one is also interested in estimating the error in reconstructing the cost functional and its gradient. For our model, we can use the following estimates:
Theorem 8.
Let $u \in U_{\mathrm{ad}}$ be arbitrary and denote by $\alpha(u)$ the coercivity constant of the bilinear form $a(u; \cdot, \cdot)$. By Remark 4, it holds $\alpha(u) \geq \alpha_{\min} > 0$. Let the residual $r_{\mathrm{st}}(u; \cdot) \in V'$ be given by $r_{\mathrm{st}}(u; \varphi) := F(\varphi) - a(u; S^\ell(u), \varphi)$ for all $\varphi \in V$. Then it holds
$$\big\| S(u) - S^\ell(u) \big\|_V \leq \Delta_{\mathrm{st}}(u) := \frac{\| r_{\mathrm{st}}(u; \cdot) \|_{V'}}{\alpha(u)}.$$
For $i = 1, \dots, k$ the residual $r_{\mathrm{adj}}^{(i)}(u; \cdot) \in V'$ of the adjoint equations is given by $r_{\mathrm{adj}}^{(i)}(u; \varphi) := -\sigma_\Omega^{(i)} \langle S^\ell(u) - y_\Omega^{(i)}, \varphi \rangle_H - a(u; \varphi, \mathcal{A}_i^\ell(u))$ for all $\varphi \in V$. Then it holds
$$\big\| \mathcal{A}_i(u) - \mathcal{A}_i^\ell(u) \big\|_V \leq \Delta_{\mathrm{adj}}^{(i)}(u) := \frac{\| r_{\mathrm{adj}}^{(i)}(u; \cdot) \|_{V'} + \sigma_\Omega^{(i)} \Delta_{\mathrm{st}}(u)}{\alpha(u)}.$$
Furthermore, for $i = 1, \dots, k$ we have
$$\big| \hat J_i(u) - \hat J_i^\ell(u) \big| \leq \Delta_{\mathrm{st}}(u)\, \big\| r_{\mathrm{adj}}^{(i)}(u; \cdot) \big\|_{V'} + \sigma_\Omega^{(i)}\, \Delta_{\mathrm{st}}(u)^2 =: \Delta_{\hat J_i}(u),$$
$$\big\| \nabla \hat J_i(u) - \nabla \hat J_i^\ell(u) \big\|_U \leq \big\| \nabla_u a(u; \cdot, \cdot) \big\| \Big( \big\| S^\ell(u) \big\|_V\, \Delta_{\mathrm{adj}}^{(i)}(u) + \Delta_{\mathrm{st}}(u)\, \Delta_{\mathrm{adj}}^{(i)}(u) + \Delta_{\mathrm{st}}(u)\, \big\| \mathcal{A}_i^\ell(u) \big\|_V \Big) =: \Delta_{\nabla \hat J_i}(u).$$
Proof. 
A proof of the a-posteriori error estimates for the state and adjoint can be found in [35]. For the cost function and the gradient, we refer to [18] (Proposition 2.5). □
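As a concrete illustration of the state estimate in Theorem 8, the dual residual norm and the bound $\Delta_{\mathrm{st}}(u)$ can be computed as follows. This is a hypothetical NumPy sketch on a generic symmetric toy system with the Euclidean inner product standing in for the $V$-inner product; it is not the paper's FE model:

```python
import numpy as np

def dual_norm(r, M_V):
    """||r||_{V'} = sqrt(r^T M_V^{-1} r), computed via the Riesz representative."""
    return float(np.sqrt(r @ np.linalg.solve(M_V, r)))

def state_error_estimator(u, y_red, Aq, f, V, M_V, alpha):
    """Delta_st(u) = ||r_st(u;.)||_{V'} / alpha(u) with r = f - A(u) V y_red."""
    A = sum(uq * A_ for uq, A_ in zip(u, Aq))
    return dual_norm(f - A @ (V @ y_red), M_V) / alpha

# toy symmetric full-order system; V-inner product taken as Euclidean (M_V = I)
rng = np.random.default_rng(1)
n = 40
Aq = [B @ B.T + n * np.eye(n)
      for B in (rng.standard_normal((n, n)) for _ in range(2))]
f = rng.standard_normal(n)
u = np.array([1.0, 0.5])
A = sum(uq * A_ for uq, A_ in zip(u, Aq))
M_V = np.eye(n)
alpha = np.linalg.eigvalsh(A).min()               # coercivity constant of a(u;.,.)

V, _ = np.linalg.qr(rng.standard_normal((n, 3)))  # basis NOT containing S(u)
y_red = np.linalg.solve(V.T @ A @ V, V.T @ f)     # reduced Galerkin solution
true_err = np.linalg.norm(np.linalg.solve(A, f) - V @ y_red)
est = state_error_estimator(u, y_red, Aq, f, V, M_V, alpha)
```

By construction `est` is a guaranteed upper bound for `true_err`; in an actual RB implementation the dual norm would of course be evaluated through preassembled parameter-separable quantities rather than through the full-order residual.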
Note that we only need the reduced-order state and adjoint state to evaluate the a-posteriori error estimates. For our example, the computation of the coercivity constant $\alpha(u)$ is cheap, see Lemma 2. In more general examples, this might not be the case. Thus, one often uses a quickly computable lower bound $\alpha_{\mathrm{LB}}(u)$ instead. Possible methods for computing such a lower bound are, e.g., the min-theta approach (cf. [35]) or the Successive Constraint Method (SCM) (cf. [36]). In situations in which the computation or the estimation of the coercivity constant is complicated, the TR-RB algorithms presented in [19,20] have the advantage that they do not require the computation or estimation of the coercivity constant, but only rely on asymptotic error estimates consisting of residual-based error indicators. Note finally that the computation of the terms $\| r_{\mathrm{st}}(u; \cdot) \|_{V'}$ and $\| r_{\mathrm{adj}}^{(i)}(u; \cdot) \|_{V'}$ is not possible in an infinite-dimensional setting. Even after discretization with the FE method, the cost of computing these terms depends on the dimension of the full-order model, which contradicts the requirement of having a computationally cheap estimate. However, in our case, the parameter-separability of the bilinear form $a(u; \cdot, \cdot)$ can be exploited to preassemble certain quantities in such a way that the computational cost for evaluating $\| r_{\mathrm{st}}(u; \cdot) \|_{V'}$ and $\| r_{\mathrm{adj}}^{(i)}(u; \cdot) \|_{V'}$ only depends on the dimension of the RB space; see, e.g., [36]. Finally, we apply the RB method to (MPPOP): for a given RB space $V^\ell$ the reduced-order MPPOP reads
$$\min\ \hat J^\ell(u) = \big( \hat J_1^\ell(u), \dots, \hat J_k^\ell(u) \big)^{\mathsf T} \quad \text{s.t.} \quad u \in U_{\mathrm{ad}}.$$
For an arbitrary reference point $z \in \mathbb{R}^k$ and target direction $r \in \mathbb{R}^k$, the reduced-order PS problem reads
$$\min_{(u,t)}\ t \quad \text{s.t.} \quad (t, u) \in \mathbb{R} \times U_{\mathrm{ad}} \ \text{ and } \ \hat J_i^\ell(u) - z_i \leq t\, r_i, \quad i = 1, \dots, k. \qquad (P_{z,r}^{\mathrm{PS},\ell})$$
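For the target direction $r = (1, \dots, 1)$, the PS scalarization is simply a min-max problem: for fixed $u$ the optimal $t$ is $\max_i \big( \hat J_i^\ell(u) - z_i \big)$. The following toy sketch (with two explicit hypothetical objectives instead of the PDE-constrained ones) illustrates this equivalence by brute-force sampling:

```python
import numpy as np

def ps_value(u, objectives, z):
    """For fixed u, the optimal t of the PS problem with r = (1, ..., 1) is
    t*(u) = max_i (J_i(u) - z_i); the PS problem then reads min_u t*(u)."""
    return max(Ji(u) - zi for Ji, zi in zip(objectives, z))

# toy bi-objective on U_ad = [-2, 2], solved by brute-force sampling
objectives = [lambda u: (u - 1.0) ** 2, lambda u: (u + 1.0) ** 2]
z = np.array([-0.1, -0.1])              # reference point below both objectives
grid = np.linspace(-2.0, 2.0, 4001)
vals = [ps_value(u, objectives, z) for u in grid]
u_star = float(grid[int(np.argmin(vals))])  # compromise parameter, here u* = 0
```

For these two symmetric parabolas the PS solution is the compromise $u^\ast = 0$, where both shifted objectives are equal; in the actual algorithm this inner problem is of course solved by the augmented Lagrangian method rather than by sampling.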
One could then outline an algorithm similar to Algorithm 1 by using an offline/online splitting. Because of the above-mentioned disadvantages, we focus on combining the PSPs with the TR-RB method from [17] and extend it with respect to the method in [16]. The TR method introduces new aspects to the RB implementation, such as the adaptive construction of the RB space; see the next section for further details.

4. The TR-RB Method

We briefly introduce the method from [17] and clarify how to apply it in combination with the PS method. In Section 4.2 we highlight our extension of this method and how it can reduce the computational time. The basic idea of a TR method is to compute a first-order critical point of a costly optimization problem by iteratively solving cheap-to-solve approximations in local regions of the admissible space in which these model approximations can be trusted (i.e., are accurate enough). In such a way, one can derive a globally convergent method which terminates in a finite number of steps. For each outer iteration $j \geq 0$ of the TR method, the cheap approximation of the objective is generally indicated by $m^{(j)}$ and the trust region is described by a radius $\delta^{(j)}$. To simplify the exposition, let us stick with the case $U = \mathbb{R}^m \times \mathbb{R}$, as in Section 3. The TR method then solves, for each $j \geq 0$, the following constrained optimization sub-problem
$$\min_{v \in U}\ m^{(j)}(v) \quad \text{s.t.} \quad \| v \|_2 \leq \delta^{(j)}, \quad \tilde u := u^{(j)} + v \in U_{\mathrm{ad}}. \tag{11}$$
Under suitable assumptions, problem (11) admits a unique solution $\bar v^{(j)}$, which is used to compute the next outer iterate $u^{(j+1)} = u^{(j)} + \bar v^{(j)}$. To further simplify the presentation of the algorithm in [17], let us present it for a general cost functional $J$. Later in this section we will give more details about its application to the MPPOP and the PS method. The TR-RB version of problem (11) is
$$\min_{\tilde u \in U_{\mathrm{ad}}}\ J^{\ell,(j)}(\tilde u) \quad \text{s.t.} \quad q^{(j)}(\tilde u) := \frac{\Delta_J^{\ell,(j)}(\tilde u)}{J^{\ell,(j)}(\tilde u)} \leq \delta^{(j)}, \tag{12}$$
where $J^{\ell,(j)}$ is the cost functional w.r.t. the reduced-order model at the $j$-th iteration and $\Delta_J^{\ell,(j)}(\tilde u)$ is an estimate for the error $| J(\tilde u) - J^{\ell,(j)}(\tilde u) |$. Looking at (12), one clearly sees that the role of the model function $m^{(j)}$ is played by the reduced-order cost functional. This is perfectly in line with the TR spirit of having a cheap-to-solve approximation of the original optimization problem. The trust regions are instead defined through the RB error estimator, which is in fact the quantity we use to check the quality of the approximation. Let us mention at this point that there are also different approaches. In [19,20] the authors incorporated the usual trust-region constraint as seen in (11) into a TR-RB algorithm. In [18] the importance of introducing a correction term on the RB level to improve the performance of the method is also discussed. We point out that this only has to be done if one chooses two separate RB spaces for the state and adjoint equations (see also [17]); this will not be the case for our application. In Algorithm 2, we report the method from [17]. In what follows, we guide the reader through the features of the algorithm. At first, we need to initialize the reduced-order model at the initial guess $u^{(0)}$. This means computing $S(u^{(0)})$ and $\mathcal{A}_i(u^{(0)})$ for $i = 1, \dots, k$ and generating the RB space $V^{\ell,(0)}$ as their span. Similarly, updating the RB space $V^{\ell,(j)}$ at the point $u^{(j+1)}$ means computing the full-order quantities $S(u^{(j+1)})$ and $\mathcal{A}_i(u^{(j+1)})$ for $i = 1, \dots, k$ and adding them to the RB space by a Gram-Schmidt orthonormalization.
In Line 3 of Algorithm 2, it is required to compute the so-called approximated generalized Cauchy (AGC) point. We report here its definition according to [15,18].
Definition 14.
Let $\kappa \in (0,1)$ and $\kappa_{\mathrm{arm}} \in (0,1)$ be backtracking parameters. For the current iterate $u^{(j)}$ define $d^{(j)} := \nabla J^{\ell,(j)}(u^{(j)})$. Let $\alpha \in \mathbb{N}$ be the smallest number for which the two conditions
$$J^{\ell,(j)}\big( P_{U_{\mathrm{ad}}}(u^{(j)} - \kappa^\alpha d^{(j)}) \big) \leq J^{\ell,(j)}(u^{(j)}) - \frac{\kappa_{\mathrm{arm}}}{\kappa^\alpha} \big\| P_{U_{\mathrm{ad}}}(u^{(j)} - \kappa^\alpha d^{(j)}) - u^{(j)} \big\|_U^2, \tag{13}$$
$$q^{(j)}\big( P_{U_{\mathrm{ad}}}(u^{(j)} - \kappa^\alpha d^{(j)}) \big) \leq \delta^{(j)} \tag{14}$$
are satisfied, where $P_{U_{\mathrm{ad}}} : U \to U_{\mathrm{ad}}$ is the canonical projection onto the closed and convex set $U_{\mathrm{ad}}$. Then we define the AGC point as $u_{\mathrm{AGC}}^{(j)} := P_{U_{\mathrm{ad}}}(u^{(j)} - \kappa^\alpha d^{(j)})$.
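The AGC point of Definition 14 amounts to a projected Armijo backtracking with an additional trust-region test. A minimal NumPy sketch, with a generic objective, projection and hypothetical parameter values standing in for the reduced-order quantities, could look like:

```python
import numpy as np

def agc_point(u, grad, J, proj, q=None, delta=np.inf,
              kappa=0.5, kappa_arm=1e-4, max_iter=50):
    """Smallest backtracking step kappa^alpha satisfying the projected
    Armijo condition and, if given, the trust-region test q(cand) <= delta."""
    d = grad(u)
    Ju = J(u)
    step = 1.0                                    # kappa^0
    for _ in range(max_iter):
        cand = proj(u - step * d)
        armijo = J(cand) <= Ju - (kappa_arm / step) * float((cand - u) @ (cand - u))
        inside = q is None or q(cand) <= delta
        if armijo and inside:
            return cand
        step *= kappa                             # increase alpha by one
    return u                                      # fallback: no step accepted

# demo: J(u) = ||u||^2 on the box [0.5, 2]^2, no trust-region constraint
J = lambda u: float(u @ u)
grad = lambda u: 2.0 * u
proj = lambda u: np.clip(u, 0.5, 2.0)
u_agc = agc_point(np.array([2.0, 2.0]), grad, J, proj)
```

In the demo the first trial step already satisfies the Armijo condition, so the AGC point is the projected full gradient step onto the lower corner of the box.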
The TR-RB subproblem (12) is then solved in Line 4 using a projected Newton-CG algorithm with the AGC point as a warm start and the following termination criteria
$$\big\| u - P_{U_{\mathrm{ad}}}\big( u - \nabla J^{\ell,(j)}(u) \big) \big\|_U \leq \tau_{\mathrm{sub}}, \qquad \beta_{\mathrm{bound}}\, \delta^{(j)} \leq q^{(j)}(u) \leq \delta^{(j)}. \tag{15}$$
The first condition in (15) is the standard first-order criticality condition with tolerance $\tau_{\mathrm{sub}} \in (0,1)$; the second one was already introduced in [14] to avoid too many iterations close to the TR boundary, which is generally an area where we already start to trust the model function less. The parameter $\beta_{\mathrm{bound}}$ is usually chosen close to one exactly for this purpose.
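The stopping test (15) combines projected-gradient stationarity with a near-boundary check; a small sketch (tolerances and the box projection are illustrative assumptions) reads:

```python
import numpy as np

def subproblem_terminate(u, grad_u, proj, q_u, delta,
                         tau_sub=1e-6, beta_bound=0.95):
    """Stop the subproblem solver at (approximate) projected-gradient
    stationarity, or when the iterate sits close to the TR boundary,
    where the reduced model is trusted less anyway."""
    stationary = np.linalg.norm(u - proj(u - grad_u)) <= tau_sub
    near_boundary = beta_bound * delta <= q_u <= delta
    return bool(stationary or near_boundary)

proj = lambda v: np.clip(v, 0.0, 1.0)
# stationary: the gradient step leaves the box, projection maps back to u
stop_stationary = subproblem_terminate(np.zeros(2), np.ones(2), proj, 0.0, 1.0)
# interior, non-stationary, far from the TR boundary: keep iterating
keep_going = subproblem_terminate(np.full(2, 0.5), np.array([1.0, 0.0]), proj, 0.1, 1.0)
# non-stationary but q(u) close to delta: stop
stop_boundary = subproblem_terminate(np.full(2, 0.5), np.array([1.0, 0.0]), proj, 0.98, 1.0)
```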
Algorithm 2: TR-RB algorithm

1: Initialize the reduced-order model at $u^{(0)}$, set $j = 0$ and Loop_flag=True;
2: while Loop_flag do
3:   Compute the AGC point $u_{\mathrm{AGC}}^{(j)}$;
4:   Compute $u^{(j+1)}$ as solution of (12) with stopping criteria (15);
5:   if $J^{\ell,(j)}(u^{(j+1)}) + \Delta_J^{\ell,(j)}(u^{(j+1)}) < J^{\ell,(j)}(u_{\mathrm{AGC}}^{(j)})$ then
6:     Accept $u^{(j+1)}$, set $\delta^{(j+1)} = \delta^{(j)}$, compute $\varrho^{(j)}$ and $g(u^{(j+1)})$;
7:     if $g(u^{(j+1)}) \leq \tau_{\mathrm{FOC}}$ then
8:       Set Loop_flag=False;
9:     else
10:      if $\varrho^{(j)} \geq \eta_\varrho$ then
11:        Enlarge the TR radius $\delta^{(j+1)} = \beta_1^{-1} \delta^{(j)}$;
12:      end if
13:      if not Skip_enrichment_flag$(j)$ then
14:        Update the RB model at $u^{(j+1)}$;
15:      end if
16:    end if
17:  else if $J^{\ell,(j)}(u^{(j+1)}) - \Delta_J^{\ell,(j)}(u^{(j+1)}) > J^{\ell,(j)}(u_{\mathrm{AGC}}^{(j)})$ then
18:    if $\beta_1 \delta^{(j)} \leq \delta_{\min}$ or Skip_enrichment_flag$(j-1)$ then
19:      Update the RB model at $u^{(j+1)}$;
20:    end if
21:    Reject $u^{(j+1)}$, shrink the radius $\delta^{(j+1)} = \beta_1 \delta^{(j)}$ and go to 4;
22:  else
23:    Compute $J(u^{(j+1)})$, $g(u^{(j+1)})$, $\varrho^{(j)}$ and set $\delta^{(j+1)} = \beta_1^{-1} \delta^{(j)}$;
24:    if $g(u^{(j+1)}) \leq \tau_{\mathrm{FOC}}$ then
25:      Set Loop_flag=False;
26:    else
27:      if Skip_enrichment_flag$(j)$ and $\varrho^{(j)} \geq \eta_\varrho$ then
28:        Accept $u^{(j+1)}$;
29:      else if $J(u^{(j+1)}) \leq J^{\ell,(j)}(u_{\mathrm{AGC}}^{(j)})$ then
30:        Accept $u^{(j+1)}$ and update the RB model;
31:        if $\varrho^{(j)} < \eta_\varrho$ then
32:          Set $\delta^{(j+1)} = \delta^{(j)}$;
33:        end if
34:      else
35:        if $\beta_1 \delta^{(j)} \leq \delta_{\min}$ or Skip_enrichment_flag$(j-1)$ then
36:          Update the RB model at $u^{(j+1)}$;
37:        end if
38:        Reject $u^{(j+1)}$, set $\delta^{(j+1)} = \beta_1 \delta^{(j)}$ and go to 4;
39:      end if
40:    end if
41:  end if
42:  Set $j = j + 1$;
43: end while
An important aspect of TR methods is the decision to accept or reject the step $u^{(j+1)}$. Generally, one asks for the so-called sufficient decrease condition $J^{\ell,(j+1)}(u^{(j+1)}) \leq J^{\ell,(j)}(u_{\mathrm{AGC}}^{(j)})$; cf. [15]. Note that this condition requires updating the RB space before being sure that the step will be accepted. If the step is rejected, we have performed a costly update without the possibility of exploiting it. Because of this fact, Ref. [14] proposed a sufficient (Line 5) and a necessary (Line 17) condition for the sufficient decrease condition. In [18] it is also noted that the full-order quantities $J(u^{(j+1)})$ and $\nabla J(u^{(j+1)})$ are cheaply available after updating the RB space. Additionally, Ref. [17] introduced the possibility of skipping a redundant enrichment, which is particularly useful at the late stage of the method, when we are close to the optimum. This prevents the dimension of the RB space from growing too fast, so that the cheap-to-solve property is preserved. The three conditions to be checked in order to decide whether to skip the update of the RB space are contained in the following skipping parameter
$$\mathrm{Skip\_enrichment\_flag}(j) := \Big[\, q^{(j)}(u^{(j+1)}) \leq \beta_q\, \delta^{(j+1)} \ \text{ and } \ \frac{\big| g(u^{(j+1)}) - g^{\ell,(j)}(u^{(j+1)}) \big|}{g^{\ell,(j)}(u^{(j+1)})} \leq \tau_g \ \text{ and } \ \frac{\big\| \nabla J^{\ell,(j)}(u^{(j+1)}) - \nabla J(u^{(j+1)}) \big\|_U}{\big\| \nabla J^{\ell,(j)}(u^{(j+1)}) \big\|_U} \leq \min\{ \tau_{\mathrm{grad}}, \beta_{\mathrm{grad}}\, \delta^{(j+1)} \} \,\Big],$$
where $\beta_q, \beta_{\mathrm{grad}}, \tau_g, \tau_{\mathrm{grad}} \in (0,1)$ are given parameters and
$$g(u) := \big\| u - P_{U_{\mathrm{ad}}}\big( u - \nabla J(u) \big) \big\|_U, \qquad g^{\ell,(j)}(u) := \big\| u - P_{U_{\mathrm{ad}}}\big( u - \nabla J^{\ell,(j)}(u) \big) \big\|_U.$$
Note also that $g(u) = 0$ is nothing else than the standard first-order condition for optimization problems with constraints on the parameter set. This is the reason why Algorithm 2 terminates when $g(u^{(j+1)}) \leq \tau_{\mathrm{FOC}}$ holds with $\tau_{\mathrm{FOC}} \in (0,1)$. For more details on the skipping condition, we refer to [17]. Typically, TR methods also have the option of shrinking (enlarging) the TR radius $\delta^{(j)}$ by some factor $\beta_1 \in (0,1)$ ($\beta_1^{-1} > 1$, respectively). In the case of Algorithm 2, we shrink the radius if a point is rejected. We also compute the ratio
$$\varrho^{(j)} := \frac{J(u^{(j)}) - J(u^{(j+1)})}{J^{\ell,(j)}(u^{(j)}) - J^{\ell,(j)}(u^{(j+1)})}.$$
If this ratio is greater than a parameter $\eta_\varrho \in [0.75, 1]$, then the radius is enlarged. Algorithm 2 can be proven to converge under some technical assumptions on the problem. We summarize everything in the following theorem (cf. [17]).
Theorem 9.
Suppose that $U_{\mathrm{ad}} = [u_a, u_b] \subset \mathbb{R}^P$ for some $u_a, u_b \in \mathbb{R}^P$ with $u_a \leq u_b$. Assume that $J$ and $J^{\ell,(j)}$ ($j \in \mathbb{N}$) are strictly positive, $J$ is continuously Fréchet differentiable and $J^{\ell,(j)}$ is even twice continuously Fréchet differentiable for all $j \in \mathbb{N}$. Moreover, $\nabla J^{\ell,(j)}$ is uniformly Lipschitz-continuous with respect to $j$. Suppose that there is $\delta_{\min} > 0$ such that for every $j \in \mathbb{N}$ there exists a TR radius $\delta^{(j)} \geq \delta_{\min}$ for which there is a solution $u^{(j+1)}$ of the TR-RB subproblem (12) which is accepted by Algorithm 2. Assume that the family of functions $(q^{(j)})_{j \in \mathbb{N}}$ is uniformly continuous w.r.t. the parameter $u$ and the index $j$. Then every accumulation point $\bar u$ of the sequence of iterates $(u^{(j)})_{j \in \mathbb{N}}$ is a first-order critical point of the full-order optimization problem, i.e., it holds
$$\big\| \bar u - P_{U_{\mathrm{ad}}}\big( \bar u - \nabla J(\bar u) \big) \big\|_U = 0.$$
In particular, Algorithm 2 terminates after finitely many steps.
Although many of the assumptions in Theorem 9 are quite technical and needed for the proof, one can show that they are reasonable in the case of the RB method; cf. [17].

4.1. The TR-RB Algorithm Applied to the PS Method

In this section we show how Algorithm 2 can be applied to the PS method. To this end, we recall the following lemma from [16].
Lemma 4.
There are constants $C_J, C_{\nabla J}, C_{\nabla^2 J} > 0$ such that for any $i \in \{1, \dots, k\}$, any $u \in U_{\mathrm{ad}}$ and any choice of the RB space $V^\ell$ it holds
$$\big| \hat J_i^\ell(u) \big| \leq C_J, \qquad \big\| \nabla \hat J_i^\ell(u) \big\|_U \leq C_{\nabla J}, \qquad \big\| \nabla^2 \hat J_i^\ell(u) \big\|_{\mathcal{L}(U)} \leq C_{\nabla^2 J}.$$
Lemma 4 immediately implies that the reduced-order gradient is uniformly Lipschitz-continuous with respect to $\ell$. We have to solve $(P_{z,r}^{\mathrm{PS}})$. We follow the approach in [16], where the target direction $r = (1, \dots, 1)$ is chosen and an augmented Lagrangian method is used. Given a penalty parameter $\mu > 0$, the augmented Lagrangian for $(P_{z,r}^{\mathrm{PS}})$ is
$$L_A((u, t, s), \lambda; \mu) := t + \sum_{i=1}^k \lambda_i\, c_i(u, t, s) + \frac{\mu}{2} \sum_{i=1}^k c_i(u, t, s)^2$$
with $c_i(u, t, s) = \hat J_i(u) - z_i - t + s_i$. The idea is to iteratively solve the subproblems
$$\min\ L_A((u, t, s), \lambda; \mu) \quad \text{s.t.} \quad (u, t, s) \in U_{\mathrm{ad}} \times \mathbb{R} \times \mathbb{R}_{\geq 0}^k \tag{17}$$
approximately and then update the Lagrange multiplier $\lambda$ and the penalty parameter $\mu$ until the termination criteria
$$\| c(u, t, s) \|_{\mathbb{R}^k} < \tau_{\mathrm{EC}},$$
$$\big\| (u, t, s) - P_{\mathrm{ad}}\big( (u, t, s) - \nabla_{(u,t,s)} L_A((u, t, s), \lambda; \mu) \big) \big\|_{U \times \mathbb{R} \times \mathbb{R}^k} < \tau_{\mathrm{FOC}}$$
are satisfied for some tolerances $\tau_{\mathrm{EC}}, \tau_{\mathrm{FOC}} \in (0,1)$, where $P_{\mathrm{ad}} : U \times \mathbb{R} \times \mathbb{R}^k \to U_{\mathrm{ad}} \times \mathbb{R} \times \mathbb{R}_{\geq 0}^k$ is the canonical projection onto $U_{\mathrm{ad}} \times \mathbb{R} \times \mathbb{R}_{\geq 0}^k$. For further details, we refer to [16] (Appendix B). We then want to combine the augmented Lagrangian method with the TR-RB algorithm to solve problem $(P_{z,r}^{\mathrm{PS}})$. To do so, we apply Algorithm 2 to solve each subproblem (17). We first define the reduced-order augmented Lagrangian
$$L_A^\ell((u, t, s), \lambda; \mu) := t + \sum_{i=1}^k \lambda_i\, c_i^\ell(u, t, s) + \frac{\mu}{2} \sum_{i=1}^k c_i^\ell(u, t, s)^2,$$
with $c_i^\ell(u, t, s) = \hat J_i^\ell(u) - z_i - t + s_i$, which leads to the reduced-order subproblem
$$\min\ L_A^\ell((u, t, s), \lambda; \mu) \quad \text{s.t.} \quad (u, t, s) \in U_{\mathrm{ad}} \times \mathbb{R} \times \mathbb{R}_{\geq 0}^k.$$
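The outer augmented Lagrangian iteration (approximately minimize $L_A$, then update $\lambda \leftarrow \lambda + \mu\, c$) can be sketched as follows. This is a hypothetical NumPy toy with two explicit scalar objectives and a crude projected-gradient inner solver standing in for the TR-RB subproblem solves; all parameter values are illustrative:

```python
import numpy as np

def alm_ps(J, gradJ, z, u_bounds, t_bounds, s_bounds, u0,
           mu=10.0, n_outer=25, n_inner=600, lr=2e-3):
    """Augmented Lagrangian loop for the PS problem with slack variables:
       min t  s.t.  c_i = J_i(u) - z_i - t + s_i = 0,  s_i >= 0.
    The inner minimization of L_A is done by projected gradient descent."""
    k = len(z)
    u = np.atleast_1d(np.array(u0, dtype=float))
    t, s = float(t_bounds[1]), np.zeros(k)
    lam = np.zeros(k)
    for _ in range(n_outer):
        for _ in range(n_inner):
            c = np.array([Ji(u) for Ji in J]) - z - t + s
            w = lam + mu * c                          # dL_A/dc_i
            gu = sum(wi * gi(u) for wi, gi in zip(w, gradJ))
            gt = 1.0 - w.sum()                        # d/dt of t + sum_i w_i c_i
            u = np.clip(u - lr * gu, *u_bounds)       # projected gradient steps
            t = float(np.clip(t - lr * gt, *t_bounds))
            s = np.clip(s - lr * w, *s_bounds)        # dL_A/ds_i = w_i
        c = np.array([Ji(u) for Ji in J]) - z - t + s
        lam = lam + mu * c                            # multiplier update
    return u, t, s, lam

# toy bi-objective in one parameter: the min-max compromise is u = 0, t = 1
J = [lambda u: float((u[0] - 1.0) ** 2), lambda u: float((u[0] + 1.0) ** 2)]
gradJ = [lambda u: 2.0 * (u - 1.0), lambda u: 2.0 * (u + 1.0)]
z = np.zeros(2)
u_opt, t_opt, s_opt, lam_opt = alm_ps(J, gradJ, z, (-2.0, 2.0), (0.0, 5.0),
                                      (0.0, 5.0), u0=[1.5])
```

At the solution both constraints are active, the slacks vanish, and the multipliers balance the two objective gradients; in the actual method each inner minimization is of course carried out by Algorithm 2 on the reduced-order model.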
Note that in this case the admissible set $U_{\mathrm{ad}} \times \mathbb{R} \times \mathbb{R}_{\geq 0}^k$ is unbounded, which collides with the first assumption of Theorem 9. Nevertheless, Ref. [16] showed that the $(P_{z,r}^{\mathrm{PS}})$ problem is also equivalent to
$$\min\ t \quad \text{s.t.} \quad (t, u) \in [t_{\min}, t_{\max}] \times U_{\mathrm{ad}} \ \text{ and } \ \hat J(u) - z \leq t. \tag{22}$$
There remains the problem that the admissible set for the slack variables $s$ is given by $[0, \infty)^k$. However, computing the partial derivative of the augmented Lagrangian $L_A$ with respect to $s_i$, we obtain
$$\frac{\partial L_A}{\partial s_i}((u, t, s), \lambda; \mu) = \lambda_i + \mu\, \big( \hat J_i(u) - z_i - t + s_i \big) \geq \lambda_i + \mu\, ( - z_i - t_{\max} + s_i ).$$
Thus, $L_A$ is strictly monotonically increasing in $s_i$ for $s_i > -\lambda_i / \mu + z_i + t_{\max} =: s_i^{\max}$. Thus, given the Lagrange multiplier $\lambda$ and the penalty parameter $\mu$, we can restrict the slack variable $s_i$ to the interval $[0, s_i^{\max}]$. This does not modify the solvability or the solution of the augmented Lagrangian subproblem. By setting $X_{\mathrm{ad}} := U_{\mathrm{ad}} \times [t_{\min}, t_{\max}] \times [0, s^{\max}]$, the equivalent formulation of the augmented Lagrangian subproblem corresponding to (22) reads
$$\min_{(u, t, s) \in X_{\mathrm{ad}}}\ L_A((u, t, s), \lambda; \mu). \tag{23}$$
Similarly, the reduced-order augmented Lagrangian subproblem is given by
$$\min\ L_A^\ell((u, t, s), \lambda; \mu) \quad \text{s.t.} \quad (u, t, s) \in X_{\mathrm{ad}}.$$
Therefore, the goal is to apply Algorithm 2 to solve the subproblem (23). To this end, we define $x = (u, t, s) \in U \times \mathbb{R} \times \mathbb{R}^k$, $J(x) = L_A(x, \lambda; \mu)$ and $J^{\ell,(j)}(x) = L_A^{\ell,(j)}(x, \lambda; \mu)$ for any reference point $z \in \mathbb{R}^k$, any Lagrange multiplier $\lambda \in \mathbb{R}^k$ and any penalty parameter $\mu > 0$. Furthermore, using the a-posteriori estimates of the individual objectives (cf. Theorem 8), we have that
$$\big| J(x) - J^{\ell,(j)}(x) \big| \leq \sum_{i=1}^k \Big( |\lambda_i| + \mu\, \big| \hat J_i^{\ell,(j)}(u) - z_i - t + s_i \big| \Big)\, \Delta_{\hat J_i}^{\ell,(j)}(u) + \sum_{i=1}^k \frac{\mu}{2}\, \Delta_{\hat J_i}^{\ell,(j)}(u)^2 =: \Delta_J^{\ell,(j)}(u)$$
for all $u \in U_{\mathrm{ad}}$, which can be used as an a-posteriori error estimate in the TR-RB algorithm. According to Theorem 9, we still need to show the strict positivity of the costs $J$ and $J^{\ell,(j)}$ and the uniform Lipschitz continuity of the gradients $\nabla J^{\ell,(j)}$. For the first property, we note that the objectives $J$ and $J^{\ell,(j)}$ are bounded from below by $C := t_{\min} - \sum_{i=1}^k \lambda_i^2 / (2\mu)$. Since $C$ depends only on fixed parameters of the optimization problems, we can add $-C + 1$ to the cost functions to obtain strict positivity. Obviously, this does not change the minimizers. The second property is a bit more technical and we prove it in the following lemma.
Lemma 5.
Let the Lagrange multiplier $\lambda$ and the penalty parameter $\mu$ be given. Then the function $J(\cdot) := L_A(\cdot, \lambda; \mu)$ and the reduced-order functions $J^{\ell,(j)}(\cdot) := L_A^{\ell,(j)}(\cdot, \lambda; \mu)$, $j \in \mathbb{N}$, are twice continuously Fréchet differentiable, and the gradients $\nabla J^{\ell,(j)}$ are uniformly Lipschitz continuous with respect to $j$.
Proof. 
Due to Corollary 1 the cost functions $\hat J_1, \dots, \hat J_k$ are twice continuously Fréchet differentiable. Thus, the function $(u, t, s) \mapsto L_A((u, t, s), \lambda; \mu)$ is also twice continuously Fréchet differentiable as a composition of twice continuously Fréchet differentiable functions. Similarly, the reduced-order augmented Lagrangians $L_A^{\ell,(j)}((\cdot, \cdot, \cdot), \lambda; \mu)$ are twice continuously Fréchet differentiable for all $j \in \mathbb{N}$. We have that
$$\nabla^2 L_A^{\ell,(j)}((u, t, s), \lambda; \mu)(h^u, h^t, h^s) = \begin{pmatrix} \displaystyle \sum_{i=1}^k \Big[ \big( \lambda_i + \mu\, c_i^{\ell,(j)} \big)\, \nabla^2 \hat J_i^{\ell,(j)}(u) h^u + \mu \big( d_i^{\ell,(j)} - h^t + h_i^s \big)\, \nabla \hat J_i^{\ell,(j)}(u) \Big] \\[2mm] \displaystyle k \mu\, h^t - \mu \sum_{i=1}^k \big( d_i^{\ell,(j)} + h_i^s \big) \\[1mm] \mu \big( d_1^{\ell,(j)} + h_1^s - h^t \big) \\ \vdots \\ \mu \big( d_k^{\ell,(j)} + h_k^s - h^t \big) \end{pmatrix}$$
for any $h = (h^u, h^t, h^s) \in U \times \mathbb{R} \times \mathbb{R}^k$, where $c_i^{\ell,(j)} := \hat J_i^{\ell,(j)}(u) - z_i - t + s_i$ and $d_i^{\ell,(j)} := \langle \nabla \hat J_i^{\ell,(j)}(u), h^u \rangle_U$ for $i \in \{1, \dots, k\}$. Using Lemma 4, we obtain that the Hessian $\nabla^2 L_A^{\ell,(j)}((u, t, s), \lambda; \mu)$ can be bounded independently of $(u, t, s)$ and $j$. Using the mean value theorem, we conclude that the gradients $\nabla L_A^{\ell,(j)}((\cdot, \cdot, \cdot), \lambda; \mu)$ are Lipschitz-continuous with a constant $C_L$ uniformly in $j$. □
As a consequence of Theorem 9, we have that Algorithm 2 applied to solve the augmented Lagrangian subproblem (23) converges after finitely many steps to a first-order critical point of (23).
Remark 6.
Algorithm 2 constructs and updates the RB space during the optimization procedure. In the case of the PS method, we are free to choose what to do with the space constructed during the TR-RB procedure. For example, we can reuse it for the next augmented Lagrangian subproblem (and also for the next reference point). We explored different ideas (see also [16]), but we report here only the two most interesting and efficient ones:
(1)
Use one common RB space for all the subproblems and reference points, i.e., use a single space $V^\ell$ (which is, of course, updated in the process) for solving the MOP. This strategy gains efficiency because the space successively captures the relevant full-order solutions encountered during the iterations. Therefore, thanks to the possibility of skipping an enrichment (which is the costly part of Algorithm 2), we expect more and more speed-up, together with accuracy, as the algorithm proceeds.
(2)
Use multiple (local) RB spaces. This idea is already exploited in [16,37,38]. In this case, we do not use the previously obtained RB space for the next minimization problem. We generate instead $k$ initial spaces $V_1^\ell, \dots, V_k^\ell$, resulting from the minimization of the objectives $\hat J_1, \dots, \hat J_k$. (Note that this procedure does not require extra computational cost, since we need to solve these problems for the hierarchical PS method anyway.) Then at the beginning of every PS problem, we can decide to use the space $V_i^\ell$ for which $q^{(0)}(u^{(0)}) < \beta_q \delta^{(0)}$ and $\dim V_i^\ell \leq \ell_{\max}$, with $\ell_{\max} \in \mathbb{N}$ being a predefined maximal number of basis functions. If several spaces satisfy these conditions, we select the one for which the value $q^{(0)}(u^{(0)})$ is the smallest. If instead there is no space fulfilling these conditions, we initialize a new space $V_{k+1}^\ell$ by using the full-order quantities $S(u^{(0)})$ and $\mathcal{A}_i(u^{(0)})$ for $i = 1, \dots, k$.
Although these two techniques are already efficient, we noticed that they share a common problem: the number of RB basis functions might grow too fast and prevent a good speed-up. This is particularly the case for the first strategy. To fix this issue, we propose different strategies to remove basis functions from $V^\ell$ in Section 4.2. This approach was not considered in [14,16,17,18] and, to our knowledge, it has not been addressed in the literature yet. In reduced-order optimization this is meaningful, since the reduced-order model might otherwise grow too fast; see, e.g., [33] in the case of proper orthogonal decomposition.

4.2. How to Reduce the Number of Basis Functions

We point out that what is described in this section can also be applied to Algorithm 2 from [17] in general, without any relation to the PS method. In particular, the strategies for reducing the number of basis functions presented in this section can be used not only for PDE-constrained multiobjective optimization problems, but also for any other problem formulation containing PDE-constrained optimization problems. Therefore, we again use the general notation $J$ for the cost, as in the beginning of this section. The methodology for removing a basis function comes from the observation that some basis elements might not be used during the optimization process. Suppose that we start from a point $u^{(0)}$ very far from the optimum. Clearly, after $j$ iterations the point $u^{(j)}$ is in a completely different region of the admissible set compared to the one of the starting point. Hence, the basis functions built for $u^{(0)}$ might give a negligible contribution in spanning the reduced-order model at the point $u^{(j)}$. If this is the case, we can expect that these functions will not play any further role for the subsequent points either, and therefore they can be removed to reduce the dimension of the RB space. Our methodologies for removing basis functions are thus based on Remark 6 and check which basis functions give a negligible contribution at the current iterate of the TR-RB algorithm. Notice that every technique we propose from now on is applied after updating the RB space in the TR-RB algorithm. The aim is to modify the updated RB space in order to provide a new RB space with a reduced number of basis functions.
Technique T1.
The first proposed technique is based on the computation of the so-called Fourier coefficients. Given v V and a set of orthonormal basis functions { ψ n } n = 1 V , the n-th Fourier coefficient is defined as c F ( n ) ( v ) : = v , ψ n V . Now, T1 consists in computing c F ( n ) ( S ( u ( j + 1 ) ) ) and c F ( n ) ( A i ( u ( j + 1 ) ) ) , i = 1 , , k , for  n = 1 , , and remove the basis function ψ n for which
$$\zeta^{(n)} := \max\left\{ \frac{\big|c_F^{(n)}(S(u^{(j+1)}))\big|^2}{\sum_{\eta=1}^{\ell}\big|c_F^{(\eta)}(S(u^{(j+1)}))\big|^2},\ \max_{i=1,\dots,k} \frac{\big|c_F^{(n)}(A_i(u^{(j+1)}))\big|^2}{\sum_{\eta=1}^{\ell}\big|c_F^{(\eta)}(A_i(u^{(j+1)}))\big|^2} \right\}$$
is below a certain tolerance. Note, in fact, that the Fourier coefficients indicate the order of magnitude of the contribution of a given basis function to reconstructing the new snapshots that we want to add when updating the RB space. Strategy T1 is also based on the assumption that the snapshots we want to include in an update are the most relevant ones for the new TR subproblem, because they correspond to the last accepted optimization step u^(j+1). The advantage of T1 is that the required Fourier coefficients are already available from the Gram-Schmidt orthogonalization performed during the update of the RB space. There is, however, a possible drawback of T1 due to the tolerance we set: even important basis functions may be removed, although one thinks that the tolerance is small enough. Because of this, we would like to have a criterion to decide in an unbiased way which basis functions should be removed.
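As an illustration, the scores ζ^(n) can be computed directly from the Fourier coefficients of the new snapshots. The following sketch assumes the basis is stored as an M-orthonormal matrix (M being the Gram matrix of the inner product on V); the function names are ours and purely illustrative, not taken from the authors' implementation:

```python
import numpy as np

def t1_removal_scores(Psi, snapshots, M):
    """Score each of the ell basis functions by its largest relative squared
    Fourier coefficient over all new snapshots, cf. the definition of zeta(n).

    Psi       : (N, ell) matrix whose columns are M-orthonormal basis functions
    snapshots : list of (N,) vectors, e.g. [S(u), A_1(u), ..., A_k(u)]
    M         : (N, N) inner-product (Gram) matrix of the discrete space V
    """
    ell = Psi.shape[1]
    zeta = np.zeros(ell)
    for v in snapshots:
        c = Psi.T @ (M @ v)            # Fourier coefficients <v, psi_n>_V
        rel = c**2 / np.sum(c**2)      # relative squared contributions
        zeta = np.maximum(zeta, rel)   # maximum over all snapshots
    return zeta

def t1_remove(Psi, snapshots, M, tol=1e-6):
    """T1: drop every basis function whose score zeta(n) falls below tol."""
    zeta = t1_removal_scores(Psi, snapshots, M)
    keep = zeta >= tol
    return Psi[:, keep], keep
```

In an actual TR-RB implementation the coefficients `c` would be reused from the Gram-Schmidt step of the RB update instead of being recomputed.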
Technique T2.
This approach is based on the following idea: once a point u^(j+1) is accepted by the TR-RB algorithm and the RB space is updated, we compute a provisional AGC point u_AGC^(j+1),prov (cf. Definition 14) with respect to the previously updated RB space. One robustness criterion that we demand is that, after removing basis functions, this provisional AGC point is still inside the new TR (note that the TR depends on the reduced-order model due to the inequality constraint in (12) and therefore changes if we remove basis functions), although it might not coincide with the actual AGC point u_AGC^(j+1) that we compute after removing basis functions according to Line 3 in Algorithm 2 (note that the reduced-order cost function changes by removing a basis function, so that the first term in (13) also differs after this removal). If we do not demand this robustness criterion, we can expect a deterioration of the TR performance due to a lack of accuracy of the RB model in the steepest descent direction. Another important aspect is to guarantee the convergence of the TR-RB method, which implies checking that the conditions for accepting the point u^(j+1) are still fulfilled after the basis functions have been removed.
In summary, the difference with respect to T1 is to remove basis functions starting from the one with the smallest value of ζ^(n) and proceeding in ascending order until one of the following conditions is satisfied:
$$\frac{\Delta_{J^{\mathrm{rem},(j+1)}}\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)}{J^{\mathrm{rem},(j+1)}\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)} > \beta_q\,\delta^{(j+1)}, \tag{25a}$$
$$\frac{\Delta_{\nabla J^{\mathrm{rem},(j+1)}}\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)}{\big\|\nabla J^{\mathrm{rem},(j+1)}\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)\big\|_U} > \min\big\{\tau_{\mathrm{grad}},\,\beta_{\mathrm{grad}}\,\delta^{(j+1)}\big\}, \tag{25b}$$
$$\frac{\big\|\nabla J^{\mathrm{rem},(j+1)}\big(u^{(j+1)}\big)-\nabla J\big(u^{(j+1)}\big)\big\|_U}{\big\|\nabla J^{\mathrm{rem},(j+1)}\big(u^{(j+1)}\big)\big\|_U} > \min\big\{\tau_{\mathrm{grad}},\,\beta_{\mathrm{grad}}\,\delta^{(j+1)}\big\}, \tag{25c}$$
$$\frac{\big|g\big(u^{(j+1)}\big)-g^{\mathrm{rem},(j+1)}\big(u^{(j+1)}\big)\big|}{\big|g^{\mathrm{rem},(j+1)}\big(u^{(j+1)}\big)\big|} > \tau_g, \tag{25d}$$
$$J^{\mathrm{rem},(j+1)}\big(u^{(j+1)}\big) > J^{\ell,(j)}\big(u_{\mathrm{AGC}}^{(j)}\big), \tag{25e}$$
$$J^{\mathrm{rem},(j+1)}\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)-J\big(u^{(j+1)}\big) > -\kappa_{\mathrm{arm}}\,\big\|u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}-u^{(j+1)}\big\|_U^2. \tag{25f}$$
If one of the conditions (25) holds, we re-add the basis function to the RB space and finish the removal, continuing with the TR-RB procedure. T2 is summarized in Algorithm 3.
Algorithm 3: Summary of T2
1: Follow the steps in Algorithm 2 until the RB model is updated at u^(j+1);
2: Compute a provisional AGC point u_AGC^(j+1),prov by using the reduced-order cost function w.r.t. the updated RB model;
3: Compute ζ^(n) for n ∈ {1, …, ℓ};
4: while none of the conditions in (25) is fulfilled do
5:    Out of all remaining basis functions, remove the one with the smallest value of ζ^(n) from the RB space;
6: end while
7: Add the last removed basis function back to the RB space;
8: Proceed with Algorithm 2 with the RB space obtained by performing Steps 2–7;
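The remove/check/re-add loop of Algorithm 3 (Lines 3–7) can be sketched as follows. The callback `conditions_violated`, standing in for the checks (25a)–(25f), and the function name are placeholders of ours; the sketch only illustrates the loop structure, not the authors' implementation:

```python
import numpy as np

def t2_basis_removal(zeta, conditions_violated):
    """Remove basis functions in ascending order of zeta(n) until one of the
    safeguard conditions (25a)-(25f) fires; the last tentative removal is
    then not committed, which corresponds to re-adding that basis function.

    zeta                : array of scores, one per basis function
    conditions_violated : callback taking a boolean keep-mask and returning
                          True if any condition in (25) holds on the
                          reduced space described by that mask
    Returns the final keep-mask of the reduced RB space.
    """
    keep = np.ones(len(zeta), dtype=bool)
    for n in np.argsort(zeta):           # ascending order of zeta(n)
        trial = keep.copy()
        trial[n] = False                 # tentatively remove basis function n
        if conditions_violated(trial):   # Lines 4-7: stop, do not commit
            break
        keep = trial                     # removal accepted
    return keep
```

Evaluating the callback on a trial mask before committing avoids mutating the RB space, but it is mathematically equivalent to the remove-then-re-add formulation of Algorithm 3.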
Let us explain the meaning of (25). First, the superscript rem indicates that the space used to compute the quantity is the RB space obtained after removing a basis function. Condition (25a) checks that the provisional AGC point remains inside an accurate-enough region of the TR. Condition (25b) is in the spirit of (25a), but for the gradient of the objective. Conditions (25c) and (25d) are based on the criteria for skipping the enrichment and are checked to ensure convergence and robustness of the method after the removal. For a similar reason we need to check that the sufficient decrease condition is fulfilled as well (cf. (25e)). Finally, (25f) enforces that the provisional AGC point is still a Cauchy point. In this way, we are sure that Algorithm 2 converges even after performing the basis removal (cf. [17,18]). In this sense, T2 introduces an unbiased way to deal with the technique introduced in T1. There are still a few aspects to comment on before implementing T2. First, note that all the above-mentioned conditions are cheaply computable, since they are based either on reduced-order quantities or on full-order quantities that are already available because of the RB update. Second, conditions (25a) and (25b) require efficient and reliable error estimators. Although for the PS method the efficiency of Δ_{J^{ℓ,(j)}} is acceptable, the same is not true for an error estimator Δ_{∇J^{ℓ,(j)}} based on the a-posteriori estimates of the gradients of the individual objectives. These estimators generally produce a huge overestimation, which makes them useless in practice. We notice, in fact, that condition (25b) is immediately triggered in the case of the PS method, so that we cannot remove any basis function. For this reason, we address this issue by two different, related approaches:
Technique T2a.
We replace the numerator of (25b) by
$$\big\|\nabla J^{\mathrm{rem},(j+1)}\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)-\nabla J\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)\big\|_U,$$
which is the true error we want to estimate but which is unfortunately costly: it requires the computation of the full-order quantities S(u_AGC^(j+1),prov) and A_i(u_AGC^(j+1),prov), i = 1, …, k.
Technique T2b.
We replace the numerator of (25b) by
$$\big\|\nabla J^{\mathrm{rem},(j+1)}\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)-\nabla J^{\ell,(j+1)}\big(u_{\mathrm{AGC}}^{(j+1),\mathrm{prov}}\big)\big\|_U,$$
which is a cheap approximation of the true error; however, we can expect it to be reliable only after enough steps of Algorithm 2.
Clearly, if a good estimate of the gradient error is at hand, T2 can still be used in its original form.
Technique T3.
Another drawback of T2 is the fact that we first need to remove a basis function in order to check (25). This implies that when we stop the removal, we need to add back the last removed basis function, because it contains important information; cf. Line 7 of Algorithm 3. This wastes time in the modified Algorithm 2. We therefore add the option of introducing numerical tolerances for each of the conditions (25). In this way, the modified algorithm will generally stop before an important basis function is removed, at the price of possibly leaving one or a few redundant basis functions in the RB space. We consider this a meaningful modification in view of the time wasted by reintroducing the removed basis function into the RB space; cf. Section 5. We refer to this last strategy as T3.

5. Numerical Experiments

In this section we test Algorithm 2 and compare it with the results obtained in [16] (Section 3.2.2). We use the same numerical setting, which we briefly report here. Let the domain Ω be the two-dimensional unit square, split into the four subdomains Ω_1 = (0, 0.5) × (0, 0.5), Ω_2 = (0, 0.5) × (0.5, 1), Ω_3 = (0.5, 1) × (0, 0.5) and Ω_4 = (0.5, 1) × (0.5, 1). For each Ω_i, we consider a corresponding diffusion coefficient u_i^κ ∈ ℝ in (3) for i = 1, …, 4. The reaction term r(x) is set to be constantly equal to 1 for all x ∈ Ω. We impose homogeneous Neumann boundary conditions (i.e., α = 0) and a source term f(x) = ∑_{i=1}^4 c_i χ_{Ω_i}(x) with c_1 ≈ 2.76, c_2 ≈ 0.96, c_3 ≈ 0.51 and c_4 ≈ 1.66 generated randomly in order to obtain a problem with a non-convex Pareto front. For the spatial discretization of the state equation, we apply the Finite Element (FE) method with 1340 nodes and piecewise linear basis functions. For (MPPOP) we choose the following three objectives:
$$\hat{J}_1(u) := \frac{1}{2}\big\|S(u)-y_\Omega^{(1)}\big\|_H^2 + \frac{\varepsilon}{2}\big\|u-u_d^{(1)}\big\|_U^2,\quad \hat{J}_2(u) := \frac{1}{2}\big\|S(u)-y_\Omega^{(2)}\big\|_H^2 + \frac{\varepsilon}{2}\big\|u-u_d^{(2)}\big\|_U^2,\quad \hat{J}_3(u) := \frac{0.05}{2}\big\|u-u_d^{(3)}\big\|_U^2$$
with ε = 0.002 , the desired states
$$y_\Omega^{(1)}(x) := \chi_{(0,0.5)\times(0,1)}(x),\qquad y_\Omega^{(2)}(x) := \chi_{(0.5,1)\times(0,1)}(x),$$
and the desired parameter values
$$u_d^{(1)} = u_d^{(2)} := (2, 0, 0, 0, 0.3)^{\top},\qquad u_d^{(3)} := (2, 1, 1, 1, 0.3)^{\top}.$$
The lower and upper parameter bounds are given by
$$u_a = (2, 0.1, 0.1, 0.1, 0.3)^{\top} \quad\text{and}\quad u_b = (2, 4, 4, 4, 0.3)^{\top},$$
respectively. This implies that u_1^κ = 2 and u^r = 0.3 are treated as constants, and we only optimize over the three parameters u_2^κ, u_3^κ and u_4^κ. Note furthermore that the desired parameters u_d^(1) = u_d^(2) are not admissible. In fact, as with the parameters of the source term, they were chosen such that the resulting Pareto front is non-convex.
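The inadmissibility of u_d^(1) = u_d^(2) can be verified directly from the parameter bounds and desired values given above; the following quick check uses exactly those numbers (the helper name `admissible` is ours):

```python
import numpy as np

# Problem data from Section 5, ordered as (u_1^kappa, ..., u_4^kappa, u^r)
u_a  = np.array([2.0, 0.1, 0.1, 0.1, 0.3])   # lower parameter bounds
u_b  = np.array([2.0, 4.0, 4.0, 4.0, 0.3])   # upper parameter bounds
u_d1 = np.array([2.0, 0.0, 0.0, 0.0, 0.3])   # desired parameter u_d^(1) = u_d^(2)
u_d3 = np.array([2.0, 1.0, 1.0, 1.0, 0.3])   # desired parameter u_d^(3)

# A parameter is admissible iff it lies componentwise in [u_a, u_b]
admissible = lambda u: bool(np.all((u_a <= u) & (u <= u_b)))

print(admissible(u_d1))  # False: entries 0.0 violate the lower bound 0.1
print(admissible(u_d3))  # True
```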
For the choice of the initial value of the PSPs corresponding to reference points for the entire problem (Ĵ_1, Ĵ_2, Ĵ_3) we proceed as follows: Let ū_i be the minimizer of Ĵ_i for i = 1, 2, 3. Recall that the sets D_i have been introduced in Definition 7-(ii). Then, if z ∈ D_i, we choose ū_i as the initial value for solving (P_{z,r}^{PS}). We additionally choose the shifting vector d̃ = 0.001 · (1, 1, 1)^⊤, while the grid size h for the reference point grid is set to h_PSM = 0.003.

5.1. Parameter Choices for the TR-RB Algorithm

There are many parameters used in the TR-RB algorithm, which we will specify and briefly comment on in this section.
  • The initial TR radius is chosen as δ^(0) = 0.1, the tolerance for increasing the TR radius is set to η_ϱ = 0.75, and the factor for shrinking the TR radius to β_1 = 0.5. For the minimal TR radius we use δ_min = 10^{-16}.
  • For the Armijo backtracking strategy, we use the constants κ_arm = 10^{-4} and κ = 0.5.
  • The tolerance of the first-order condition is set to τ_FOC = τ_FOC,sub^(i), where τ_FOC,sub^(i) is the tolerance for the first-order condition of the current augmented Lagrangian subproblem. Moreover, we choose τ_sub = 0.5 τ_FOC as the tolerance of the first-order condition of the TR-subproblem and β_bound = 0.9 as the constant in (15).
  • For checking the necessity of updating the RB space, we choose τ_g = 1, τ_grad = 0.1, β_grad = 0.2 and β_q = 0.005.
  • The tolerance chosen in T1 (cf. Section 4.2) for truncating the Fourier coefficients is 10^{-6}. We choose the same tolerance for T3 in order to stop the removal algorithm before deleting important basis functions, i.e., we subtract it from the right-hand side of (25a)–(25f).
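For illustration, the parameter choices above can be collected in a single configuration mapping. The key names are ours, chosen for readability; they do not come from the authors' code:

```python
# TR-RB parameter choices from Section 5.1 (illustrative grouping only)
tr_rb_params = {
    "delta_0": 0.1,          # initial TR radius
    "eta_rho": 0.75,         # tolerance for increasing the TR radius
    "beta_1": 0.5,           # shrinking factor for the TR radius
    "delta_min": 1e-16,      # minimal TR radius
    "kappa_arm": 1e-4,       # Armijo constant
    "kappa": 0.5,            # backtracking factor
    "tau_sub_factor": 0.5,   # tau_sub = 0.5 * tau_FOC, i.e. ratio tau_FOC/tau_sub = 2
    "beta_bound": 0.9,       # constant in (15)
    "tau_g": 1.0,            # enrichment-skipping tolerances
    "tau_grad": 0.1,
    "beta_grad": 0.2,
    "beta_q": 0.005,
    "tol_fourier": 1e-6,     # truncation tolerance for T1 (also reused in T3)
}

# The ratio discussed in the text below: too large a value (already 5)
# slows the method down considerably.
ratio_foc_sub = 1.0 / tr_rb_params["tau_sub_factor"]
```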
In our numerical experiments we notice that the method without basis removal is quite robust in terms of computational time and required PDE solves with respect to all the parameters, except for the ratio τ_FOC/τ_sub between the tolerances for the first-order conditions of the current augmented Lagrangian subproblem and of the TR-subproblem. In our experiments we choose this ratio to be 2, but we observe that a too large ratio (already 5 is sufficient) slows down the method considerably. The reason is that the TR-subproblems are then solved to an unnecessarily high accuracy, which requires a lot of numerical effort but does not benefit the overall optimization. Regarding the techniques introduced in Section 4.2, T1 depends heavily on the choice of the tolerance for truncating the Fourier coefficients: the smaller the tolerance, the fewer basis functions are removed. If we remove too many basis functions (e.g., with a tolerance of 10^{-4}), T1 becomes less stable and the method needs more iterations to converge, which generally leads to more enrichment steps and slows it down. Conversely, removing few basis functions (e.g., with a tolerance of 10^{-8}) implies no significant difference between T1 and the method without removal. In contrast to T1, the techniques T2, T2a and T2b depend only on the same parameters which influence the behavior of the algorithm without removal. Their performance is also robust with respect to all these parameters in terms of the number of basis functions removed. For T3 the same discussion applies, but this method is in addition sensitive to the tolerance chosen to stop the removal algorithm before deleting presumably important basis functions. On the one hand, if this tolerance is too high (e.g., 10^{-2}), the method will not remove enough basis functions to influence the performance of the algorithm. On the other hand, if it is too low (e.g., 10^{-8}), T3 is essentially equivalent to T2.

5.2. Numerical Results

In this section, we focus mainly on the comparison of our proposed TR-RB variants, commenting only briefly on the full-order versus the reduced-order model. For detailed comments and results on the PS method applied at the FE and RB level, we refer to [16] (Section 3.2.2). First, to validate our approach, we show in Figure 1 the Pareto fronts obtained by the method in [16] (left) and by our method (right). As one can see, there is no visible difference. The approximation error is, in fact, of the order of 10^{-6} on average for a Pareto point computed by any of the proposed techniques (i.e., T1, T2a, T2b and T3). This can essentially be explained by the fact that the termination criterion of Algorithm 2 relies on the full-order model. Therefore, any computed point is first-order critical for the FE model, up to the chosen stopping tolerance. Let us remark that this is not typical for model order reduction, where generally there is an additional approximation error due to the inaccuracy of the reduced-order model.
In Figure 2 we compare the computational time of Algorithm 2 for all the proposed techniques (cf. Section 4.2) against the full-order FE model and the algorithm in [16]. Compared with the FE method, we save between 41% and 59% of the computational time. Considering the fact that we do not have an approximation error in reconstructing the Pareto points, we obtain the same result in approximately half of the time by using any of the TR-RB variants. This speed-up will grow further with an increasing number of degrees of freedom for the FE method, since the number of required FE solves of the PDE is significantly smaller for the TR-RB algorithms than for the FE method; cf. Table 1.
Furthermore, in almost all cases we obtain a speed-up of the TR-RB algorithm by using the proposed techniques for reducing the number of basis functions. Depending on the strategy from Remark 6, one technique performs better than the others. Here we try to explain this phenomenon in detail. Let us focus on the common RB space first. In this case, every technique helps to save computational time. This is clearly the effect of removing redundant basis functions, which are particularly frequently included when using a large common RB space. This is the reason why T1 appears to be the most effective, since it is the cheapest among the techniques (as mentioned, checking it implies no additional cost). T2a is more robust, but it comes at the price of evaluating the full-order gradient at the new AGC point and thus turns out to be slower than T1. In principle, T2b should overcome this problem, but the inaccuracy of the RB space in the beginning yields a bad approximation of (25b), resulting in the removal of too many basis functions, which leads to a worse approximation in the subsequent steps. This worsening of the approximation results in a far larger number of enrichment steps towards the end of the algorithm, which also negatively influences the computational time. T3 is comparable with T2a, meaning that for this example we remove many basis functions in only a few instances, rather than frequently removing a few basis functions. Figure 3b confirms the above remarks for the case of a common RB space. In this figure we report the number of basis functions obtained at the end of Algorithm 2 when it is applied to compute each Pareto optimal point in the PS method.
Now, let us focus on the left group of columns in Figure 2 (and thus on Figure 3a), which corresponds to the computational times in the case of local RB spaces (cf. Remark 6). This case is a bit more delicate, since the use of local RB spaces makes the results harder to interpret. Here the problem of T1 emerges: the fact that this technique removes basis functions without any robustness criterion implies that the method slows down. In the case of local spaces, in fact, we do not have the same amount of redundant basis functions as can occur for a common RB space. Therefore, we should only remove the basis functions which are actually redundant. As one can see in Figure 3a, T1 removes a significantly larger number of basis functions than the other techniques. Here the criteria introduced in T2a play their role in a positive way: they counteract the effect of T1 so that the computational time is comparable to the one in [16]. The further simplification introduced in T2b yields an additional speed-up. In contrast to the common RB space, the local spaces provide a sufficiently good accuracy for approximating (25b) also at the beginning of the optimization. This is beneficial for the algorithm, since evaluating the criterion in T2b is far cheaper than in T2a, where full-order solves of the state and adjoint equation are needed to compute the gradient at the new AGC point. Additionally, T3 further improves on T2a and T2b in terms of computational time, since in the case of local RB spaces it is more probable that we indeed remove only a few basis functions, but more frequently than in the case of one common RB space. In this case, it is important to have tolerances that let us stop before removing an important basis function, saving the time for reintroducing it into the RB space.
In conclusion, comparing our fastest method (i.e., Algorithm 2 with local RB spaces and T3) to the slowest (i.e., the method of [16] with a common RB space), we obtain essentially the same results (the approximation error is of the order of 10^{-6}) while saving approximately 30% of the computational time, which is roughly 300 s. This shows that it is worth investing time and resources in efficient techniques for reducing the number of basis functions in the RB space when using an adaptive TR-RB algorithm. Particularly in the case of multiobjective optimization, this becomes crucial for a large number of cost functionals k: to obtain the same resolution of the Pareto front as in Figure 1 for a large k, we need to solve the PSPs for many more points, implying a higher risk of having redundant basis functions.

6. Conclusions

We showed the applicability and convergence of the TR-RB algorithm in the context of multi-objective PDE-constrained parameter optimization problems. We presented and analyzed novel ways of reducing the dimension of the RB space during the optimization procedure. To our knowledge, basis reduction strategies have not yet been proposed for the RB method, although they are common for other model order reduction techniques. Such a removal significantly improved the performance of the TR-RB algorithm in the context of multiobjective optimization, leading to an accurate solution faster than the already existing techniques. The presented example contained only three parameters to be optimized. However, based on the results in [17] (Section 4.4) for an example with 28 parameters and on the various examples in [39] (Sections 3.5.4–3.5.6), we expect all of the TR-RB methods to scale well with an increasing number of parameters. As for the multi-objective optimization by the PS method, the numerical effort grows exponentially with the number of cost functions k, but is independent of the number of parameters m if m ≥ k − 1. Moreover, the presented techniques for removing reduced basis functions can also be extended to other applications in which sequences of parametric PDE-constrained optimization problems must be solved. In future work, one can try to extend the convergence theory of the presented TR-RB algorithm to a larger class of PDEs than the one treated here, e.g., parabolic PDEs [14] or non-affine parameter-to-state couplings. Due to the general formulation of the convergence result, we are optimistic that this is possible. Moreover, one can try to achieve further improvements concerning the robustness of the method and derive tighter a-posteriori error estimators, in particular for the gradient of the cost function. This is also of great interest in the RB community.
Another interesting idea would be to incorporate the usual trust-region condition, based on the (Euclidean) distance from the current iterate, into the presented TR-RB algorithm. In [19] the usual trust-region condition actually performed slightly better than a residual-based error estimate as the trust-region constraint for some of the considered problems. Since we use not merely a residual-based error estimate but an error estimate of the actual cost function, a comparison between the different approaches is definitely of interest.

Author Contributions

Conceptualization, S.B., L.M. and S.V.; methodology, S.B., L.M. and S.V.; software, S.B. and L.M.; formal analysis, S.B., L.M. and S.V.; investigation, S.B., L.M. and S.V.; writing—original draft preparation, S.B., L.M. and S.V.; writing—review and editing, S.B., L.M. and S.V.; funding acquisition, S.V. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG) for the project Localized Reduced Basis Methods for PDE-constrained Parameter Optimization under contract VO 1658/6-1.

Acknowledgments

The authors thank Tim Keil, Mario Ohlberger and Felix Schindler from University of Münster (Germany) for the fruitful exchange of ideas on the topic.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
AGC     Approximated generalized Cauchy
CG      Conjugate gradient
FE      Finite element
MOP     Multiobjective optimization problem
MPPOP   Multiobjective parametric PDE-constrained optimization problem
PDE     Partial differential equation
PS      Pascoletti-Serafini
RB      Reduced basis
s.t.    subject to
TR      Trust-region

References

  1. Ehrgott, M. Multicriteria Optimization, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2005.
  2. Miettinen, K. Nonlinear Multiobjective Optimization; Kluwer Academic Publishers: Cambridge, MA, USA, 1999.
  3. Zadeh, L. Optimality and non-scalar-valued performance criteria. IEEE Trans. Autom. Control 1963, 8, 59–60.
  4. Eichfelder, G. Adaptive Scalarization Methods in Multiobjective Optimization; Springer: Berlin/Heidelberg, Germany, 2008.
  5. Pascoletti, A.; Serafini, P. Scalarizing vector optimization problems. J. Optim. Theory Appl. 1984, 42, 499–524.
  6. Hinze, M.; Pinnau, R.; Ulbrich, M.; Ulbrich, S. Optimization with PDE Constraints; Springer Science + Business Media B.V.: Berlin/Heidelberg, Germany, 2009.
  7. Schilders, W.H.; Van der Vorst, H.A.; Rommes, J. Model Order Reduction; Springer: Berlin/Heidelberg, Germany, 2008.
  8. Hesthaven, J.S.; Rozza, G.; Stamm, B. Certified Reduced Basis Methods for Parametrized Partial Differential Equations; SpringerBriefs in Mathematics: Heidelberg, Germany, 2016.
  9. Patera, A.T.; Rozza, G. Reduced Basis Approximation and a Posteriori Error Estimation for Parametrized Partial Differential Equations; MIT Pappalardo Graduate Monographs in Mechanical Engineering: Cambridge, MA, USA, 2007.
  10. Banholzer, S.; Gebken, B.; Reichle, L.; Volkwein, S. ROM-based inexact subdivision methods for PDE-constrained multiobjective optimization. Math. Comput. Appl. 2021, 26, 32.
  11. Iapichino, L.; Ulbrich, S.; Volkwein, S. Multiobjective PDE-constrained optimization using the reduced-basis method. Adv. Comput. Math. 2017, 43, 945–972.
  12. Schu, M. Adaptive Trust-Region POD Methods and Their Application in Finance. Ph.D. Thesis, University of Trier, Trier, Germany, 2012. Available online: https://ubt.opus.hbz-nrw.de/opus45-ubtr/frontdoor/deliver/index/docId/574/file/PhD_Thesis_Schu.pdf (accessed on 28 April 2022).
  13. Arian, E.; Fahl, M.; Sachs, W.S. Trust-Region Proper Orthogonal Decomposition for Flow Controls; Technical Report No. 2000–2025; Institute for Computer Applications in Science and Engineering, NASA Langley Research Center: Hampton, VA, USA, 2000.
  14. Qian, E.; Grepl, M.; Veroy, K.; Willcox, K. A certified trust region reduced basis approach to PDE-constrained optimization. SIAM J. Sci. Comput. 2017, 39, S434–S460.
  15. Yue, Y.; Meerbergen, K. Accelerating optimization of parametric linear systems by model order reduction. SIAM J. Optim. 2013, 23, 1344–1370.
  16. Banholzer, S. ROM-Based Multiobjective Optimization with PDE Constraints. Ph.D. Thesis, University of Konstanz, Konstanz, Germany, 2021. Available online: http://nbn-resolving.de/urn:nbn:de:bsz:352-2-1g98y1ic7inp29 (accessed on 28 April 2022).
  17. Banholzer, S.; Keil, T.; Mechelli, L.; Ohlberger, M.; Schindler, F.; Volkwein, S. An adaptive projected Newton non-conforming dual approach for trust-region reduced basis approximation of PDE-constrained parameter optimization. arXiv 2020, arXiv:2012.11653.
  18. Keil, T.; Mechelli, L.; Ohlberger, M.; Schindler, F.; Volkwein, S. A non-conforming dual approach for adaptive trust-region reduced basis approximation of PDE-constrained optimization. ESAIM M2AN 2021, 55, 1239–1269.
  19. Yano, M.; Huang, T.; Zahr, M.J. A globally convergent method to accelerate topology optimization using on-the-fly model reduction. Comput. Methods Appl. Mech. Eng. 2021, 375, 113635.
  20. Zahr, M.J.; Carlberg, K.T.; Kouri, D.P. An efficient, globally convergent method for optimization under uncertainty using adaptive model reduction and sparse grids. SIAM/ASA J. Uncertain. Quantif. 2019, 7, 877–912.
  21. Kouri, D.P.; Heinkenschloss, M.; Ridzal, D.; van Bloemen Waanders, B.G. A trust-region algorithm with adaptive stochastic collocation for PDE optimization under uncertainty. SIAM J. Sci. Comput. 2013, 35, A1847–A1879.
  22. Kouri, D.P.; Heinkenschloss, M.; Ridzal, D.; van Bloemen Waanders, B.G. Inexact objective function evaluations in a trust-region algorithm for PDE-constrained optimization under uncertainty. SIAM J. Sci. Comput. 2014, 36, A3011–A3029.
  23. Grüne, L.; Pannek, J. Nonlinear Model Predictive Control: Theory and Algorithms, 2nd ed.; Springer: London, UK, 2016.
  24. Borwein, J.M. On the existence of Pareto efficient points. Math. Oper. Res. 1983, 8, 64–73.
  25. Hartley, R. On cone-efficiency, cone-convexity and cone-compactness. SIAM J. Appl. Math. 1978, 34, 211–222.
  26. Sawaragi, Y.; Nakayama, H.; Tanino, T. Theory of Multiobjective Optimization; Elsevier: Amsterdam, The Netherlands, 1985.
  27. Wierzbicki, A.P. The use of reference objectives in multiobjective optimization. In Multiple Criteria Decision Making Theory and Application; Springer: Berlin/Heidelberg, Germany, 1980; pp. 468–486.
  28. Mueller-Gritschneder, D.; Graeb, H.; Schlichtmann, U. A successive approach to compute the bounded Pareto front of practical multiobjective optimization problems. SIAM J. Optim. 2009, 20, 915–934.
  29. De Motta, R.S.; Afonso, S.M.B.; Lyra, P.R.M. A modified NBI and NC method for the solution of N-multiobjective optimization problems. Struct. Multidiscip. Optim. 2012, 46, 239–259.
  30. Khaledian, K.; Soleimani-damaneh, M. A new approach to approximate the bounded Pareto front. Math. Methods Oper. Res. 2015, 82, 211–228.
  31. Lowe, T.J.; Thisse, J.-F.; Ward, J.E.; Wendell, R.E. On efficient solutions to multiple objective mathematical programs. Manag. Sci. 1984, 30, 1346–1349.
  32. Sayın, S. Measuring the quality of discrete representations of efficient sets in multiple objective mathematical programming. Math. Program. 2000, 87, 543–560.
  33. Mechelli, L. POD-Based State-Constrained Economic Model Predictive Control of Convection-Diffusion Phenomena. Ph.D. Thesis, University of Konstanz, Konstanz, Germany, 2019. Available online: http://nbn-resolving.de/urn:nbn:de:bsz:352-2-2zoi8n9sxknm1 (accessed on 28 April 2022).
  34. Evans, L.C. Partial Differential Equations; American Mathematical Society: Providence, RI, USA, 2010.
  35. Haasdonk, B. Reduced basis methods for parametrized PDEs—A tutorial introduction for stationary and instationary problems. In Model Order Reduction and Approximation: Theory and Algorithms; Benner, P., Ohlberger, M., Cohen, A., Willcox, K., Eds.; SIAM: Philadelphia, PA, USA, 2017; pp. 65–136.
  36. Rozza, G.; Huynh, D.B.P.; Patera, A.T. Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential equations. Arch. Comput. Methods Eng. 2008, 15, 229–275.
  37. Beermann, D.; Dellnitz, M.; Peitz, S.; Volkwein, S. Set-oriented multi-objective optimal control of PDEs using proper orthogonal decomposition. In Reduced-Order Modeling (ROM) for Simulation and Optimization; Keiper, W., Milde, A., Volkwein, S., Eds.; Springer International Publishing: Berlin/Heidelberg, Germany, 2018; pp. 47–72.
  38. Haasdonk, B.; Dihlmann, M.; Ohlberger, M. A training set and multiple bases generation approach for parameterized model reduction based on adaptive grids in parameter space. Math. Comput. Model. Dyn. Syst. 2011, 17, 423–442.
  39. Keil, T. Adaptive Reduced Basis Methods for Multiscale Problems and Large-Scale PDE-Constrained Optimization. Ph.D. Thesis, WWU Münster, Münster, Germany, 2022.
Figure 1. (a) Algorithm 2, no removal, local RB spaces. (b) Algorithm 2, T3, local RB spaces.
Figure 2. Computational times in seconds for Algorithm 2 with or without basis removal and using the two strategies in Remark 6 for initializing the RB space.
Figure 3. Number of basis functions used to compute each Pareto optimal point. (a) Local RB space. (b) Common RB space. In brackets: average number of basis functions.
Table 1. Total PDE and only FE solves for the tested methods.
Method                        # Total PDE Solves    # FE Solves
FE                            433378                433378
Common RB Space, No Removal   493254                20743
Common RB Space, T1           493282                20786
Common RB Space, T2a          497032                20838
Common RB Space, T2b          497032                20752
Common RB Space, T3           493985                20792
Local RB Space, No Removal    497072                20773
Local RB Space, T1            497589                20893
Local RB Space, T2a           507064                21226
Local RB Space, T2b           507064                20857
Local RB Space, T3            502911                21023
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Banholzer, S.; Mechelli, L.; Volkwein, S. A Trust Region Reduced Basis Pascoletti-Serafini Algorithm for Multi-Objective PDE-Constrained Parameter Optimization. Math. Comput. Appl. 2022, 27, 39. https://doi.org/10.3390/mca27030039