Article

The Hypervolume Newton Method for Constrained Multi-Objective Optimization Problems

Affiliations:
1. Leiden Institute of Advanced Computer Science, Leiden University, 2333 CA Leiden, The Netherlands
2. School of Engineering and Sciences, Tecnológico de Monterrey, Av. Lago de Guadalupe Km 3.5, Atizapán de Zaragoza, Mexico City 52926, Mexico
3. Computer Science Department, Cinvestav-IPN, Mexico City 07360, Mexico
* Author to whom correspondence should be addressed.
Math. Comput. Appl. 2023, 28(1), 10; https://doi.org/10.3390/mca28010010
Submission received: 1 November 2022 / Revised: 24 December 2022 / Accepted: 4 January 2023 / Published: 9 January 2023

Abstract

Recently, the Hypervolume Newton Method (HVN) has been proposed as a fast and precise indicator-based method for solving unconstrained bi-objective optimization problems with differentiable objective functions. The HVN is defined on the space of (vectorized) fixed cardinality sets of decision space vectors for a given multi-objective optimization problem (MOP) and seeks to maximize the hypervolume indicator by adopting the Newton–Raphson method for deterministic numerical optimization. To extend its scope to non-convex optimization problems, the HVN method was hybridized with a multi-objective evolutionary algorithm (MOEA), which resulted in a competitive solver for continuous unconstrained bi-objective optimization problems. In this paper, we extend the HVN to constrained MOPs with, in principle, any number of objectives. As in the original variant, the first- and second-order derivatives of the involved functions have to be given either analytically or numerically. We demonstrate the applicability of the extended HVN on a set of challenging benchmark problems and show that the new method can readily solve equality constraints with high precision and, to some extent, also inequality constraints. We finally use HVN as a local search engine within an MOEA and show the benefit of this hybrid method on several benchmark problems.

1. Introduction

Multi-objective optimization problems (MOPs)—i.e., problems where several objectives have to be optimized concurrently—naturally arise in many applications (e.g., [1,2,3,4]). As an example, in many portfolio problems, one is interested in maximizing the expected return and social responsibility or sustainability while minimizing the risk of a financial portfolio ([5,6]). In multi-objective optimization, we distinguish between the decision space, which contains the vectors of decision variables, and the objective space, which is the space $\mathbb{R}^k$ of $k$-dimensional real vectors and contains the images of the vector-valued objective function. A typical approach to the solution of MOPs is to compute or approximate the non-dominated (or efficient) set with respect to the Pareto dominance order (the image of which in the objective space is called the Pareto front). One important characteristic of (continuous) MOPs is that, in regular cases, the Pareto front is a manifold of dimension $k-1$, where $k$ denotes the number of objective functions. In general, it is possible that parts of the Pareto front are of lower dimension, but the Pareto front never has dimension greater than $k-1$. Since, in the continuous case, the non-dominated set and the Pareto front can contain infinitely many points, they are usually approximated by a finite set of points. In particular, in the area of evolutionary multi-objective optimization (EMO), many performance indicators have been proposed that promote optimal approximations of the Pareto front (e.g., [7,8,9,10]). While their definitions differ slightly, most aim to obtain (more or less) evenly spread solutions along the Pareto front.
Interestingly, with the hypervolume indicator [10], there exists an indicator that does not require knowledge of the location of the true Pareto front. Still, its maximization leads to well-distributed approximation sets consisting of only non-dominated solutions. In this work, by “well-distributed” we mean that the objective points cover the Pareto front well and are gap-free when the population size is large. At the maximum of the hypervolume indicator, the density of objective points is inversely proportional to the local curvature of the Pareto front [11]. This is where the idea of set-scalarization comes into play. In set-scalarization methods, rather than focusing on the improvement of single points of the approximation set, the focus is on the optimization of a fixed cardinality set as a single entity with respect to a set-based indicator, e.g., the hypervolume indicator. The objective function of the set-scalarization method, in our case the hypervolume indicator, provides a mapping from the set of fixed cardinality sets in the decision space to a scalar that has to be maximized. Due to the above-mentioned properties of the hypervolume indicator, the resulting set will provide a well-distributed set of points on the Pareto front.
Multi-objective evolutionary algorithms (MOEAs) have long since adopted the idea of set-scalarization. The so-called indicator-based MOEAs (e.g., [12,13,14]) use performance indicators to guide the search, e.g., by indicator-based selection. Among numerical methods, the set-scalarization approach was first addressed in gradient-based hypervolume maximization [15,16,17,18,19] and in the optimization of the averaged Hausdorff metric [14]. More recently, the approach was generalized to second-order methods with the Hypervolume Newton Method (HVN), a set-scalarization-based Newton–Raphson method for the maximization of the hypervolume indicator value of a given MOP (e.g., [20,21]). However, this method has so far only been discussed for unconstrained bi-objective optimization problems, which limits its applicability.
In this paper, we extend the HVN to constrained MOPs with a general number of objectives. To this end, we present the HVN for equality-constrained problems and further discuss a straightforward active set method to handle inequalities. Since the HVN is a highly local method, we also discuss the hybridization of this method with an MOEA. Finally, we present numerical results indicating the strength of the novel approaches.
The remainder of this paper is organized as follows: in Section 2, we present the necessary background required for understanding the sequel, and we review the related work. In Section 3, we present the HVN for constrained multi-objective optimization problems. Section 4 presents the numerical results of the constrained HVN as a standalone algorithm and a local search strategy within a hybrid evolutionary algorithm. Finally, we conclude and give possible paths for future research in Section 5.

2. Background and Related Work

2.1. Notations

We will always denote a finite Pareto approximation set by $X = \{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(\mu)}\} \subseteq \mathbb{R}^n$. When differentiating a set function, e.g., the hypervolume, over the input set, we often concatenate the points in $X$ into a much longer vector, i.e., $\mathbf{X} = [\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(\mu)}] \in \mathbb{R}^{\mu n}$. To make our discussion less cumbersome, we abuse the notation $\mathbf{X}$ slightly such that it can be interpreted as a finite set in $\mathbb{R}^n$ or as an $\mathbb{R}^{\mu n}$-vector, depending on the context. (See [16] for a detailed formal discussion of the mapping between fixed cardinality sets and vectors.) We will explain the meaning of $\mathbf{X}$ on the spot whenever it is unclear from the text. We will always denote by $\nabla$ and $\nabla^2$ the gradient/Jacobian and Hessian operators on real-valued functions, respectively, when the domain of such a function is clear from the text. Otherwise, we write the derivative operator as $\partial/\partial \mathbf{X}$. When expressing the Hessian matrix, we use the numerator layout for matrix calculus notation [22].
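As a concrete illustration of this identification (a minimal NumPy sketch of our own, not part of the authors' implementation), a population stored as a $\mu \times n$ matrix can be flattened into the concatenated $\mathbb{R}^{\mu n}$-vector and recovered again:

```python
import numpy as np

# A population of mu = 3 decision points in R^n (n = 2), stored row-wise.
X_set = np.array([[0.1, 0.9],
                  [0.5, 0.5],
                  [0.9, 0.1]])
mu, n = X_set.shape

# Interpretation as a single point in R^{mu * n}: concatenate the rows.
X_vec = X_set.reshape(mu * n)

# The inverse mapping recovers the set of decision points.
assert np.array_equal(X_vec.reshape(mu, n), X_set)
```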

2.2. Multi-Objective Optimization

A real-valued multi-objective optimization problem (MOP) involves minimizing multiple objective functions simultaneously, i.e., $F = (f_1, \ldots, f_k)$, $f_i: \mathcal{X} \to \mathbb{R}$, $\mathcal{X} \subseteq \mathbb{R}^n$, $i \in \{1, \ldots, k\}$. For every $\mathbf{y}^{(1)}, \mathbf{y}^{(2)} \in \mathbb{R}^k$, we say $\mathbf{y}^{(1)}$ weakly dominates $\mathbf{y}^{(2)}$ (written as $\mathbf{y}^{(1)} \preceq \mathbf{y}^{(2)}$) iff $y_i^{(1)} \leq y_i^{(2)}$ for all $i \in [1..k]$. The Pareto order $\prec$ on $\mathbb{R}^k$ is defined as: $\mathbf{y}^{(1)} \prec \mathbf{y}^{(2)}$ iff $\mathbf{y}^{(1)} \preceq \mathbf{y}^{(2)}$ and $\mathbf{y}^{(1)} \neq \mathbf{y}^{(2)}$. A point $\mathbf{x} \in \mathcal{X}$ is called efficient or (Pareto) optimal iff there is no $\mathbf{x}' \in \mathcal{X}$ with $F(\mathbf{x}') \prec F(\mathbf{x})$. The set $P_Q$ of all Pareto optimal solutions of a MOP is called the Pareto set, and its image $F(P_Q)$ is called the Pareto front. Typically, i.e., under certain (mild) assumptions on the model, one can assume that the Pareto set and front of a given continuous MOP form, at least locally, an object of dimension $k-1$ ([23]).
The Pareto order can also be extended to families of sets [10], i.e., we say $A \prec B$ iff for every $\mathbf{y} \in B$ there exists $\mathbf{y}' \in A$ with $\mathbf{y}' \preceq \mathbf{y}$. The set of all efficient points of $\mathcal{X}$ is called the efficient set, and the image of the efficient set under $F$ is called the Pareto front. Multi-objective optimization algorithms (MOAs) often employ a finite multiset $X = \{\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(\mu)}\}$ to approximate the efficient set, whose image under $F$ is denoted by $Y$. Multi-objective optimization is an active research field that has produced many algorithms for the approximation of the entire Pareto set/front of a given MOP. There exist, for instance, scalarization methods and mathematical programming techniques that transform the given MOP into an auxiliary scalar optimization problem (SOP) (e.g., [24]). By solving a clever sequence of such SOPs, one can in many cases obtain suitable Pareto front approximations (e.g., [25,26,27,28]). In [29], a Newton method is proposed for multi-objective optimization. Next to these point-wise iterative local search strategies, there exist global set-based algorithms such as cell-to-cell mapping and subdivision techniques ([30,31,32]) as well as specialized evolutionary algorithms ([33,34,35,36]). In particular, there exist indicator-based evolutionary algorithms (IBEAs) that aim for Pareto front approximations that are optimal with respect to a given performance indicator (e.g., [12,13,14]). Widely used performance indicators are the Generational Distance (GD [7]), the Inverted Generational Distance and its variants ([8,37,38]), the averaged Hausdorff distance $\Delta_p$ ([9,39,40]), and the hypervolume indicator, which we will use in this work and briefly review in the next section.
Finally, there exist multi-objective continuation methods that make use of the fact that the solution set forms at least locally a manifold (e.g., [23,41,42,43,44,45,46]).

2.3. Hypervolume Indicator and Its First-Order Derivatives

The hypervolume indicator (HV) [10,47] is defined as the Lebesgue measure of the compact set dominated by a Pareto front approximation set $Y \subset \mathbb{R}^k$ and cut from above by a reference point $\mathbf{r}$:
$$\operatorname{HV}(Y; \mathbf{r}) = \lambda_k\left(\left\{\,\mathbf{p} : \exists\, \mathbf{y} \in Y\ (\mathbf{y} \preceq \mathbf{p}) \wedge \mathbf{p} \preceq \mathbf{r}\,\right\}\right),$$
where $\lambda_k$ denotes the Lebesgue measure in $\mathbb{R}^k$. HV is Pareto compliant, i.e., for all $Y \prec Y'$, $\operatorname{HV}(Y; \mathbf{r}) > \operatorname{HV}(Y'; \mathbf{r})$, and it is extensively used to assess the quality of approximation sets to the Pareto front, e.g., in SMS-EMOA [12] and in multi-objective Bayesian optimization [48]. Being a set function, its derivative is cumbersome to define. (The derivative of a set function is not defined for an arbitrary family of sets. For some special cases, it can be defined directly, e.g., on Jordan-measurable sets [49].) Therefore, we follow the generic set-based approach for MOPs [16], which considers a finite approximation set of $\mu$ vectors as a point in $\mathbb{R}^{\mu n}$, i.e., $\mathbf{X} = [\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(\mu)}] \in \mathbb{R}^{\mu n}$. Similarly, the image of $\mathbf{X}$ under $F$ can also be represented by an $\mathbb{R}^{\mu k}$-vector: $\mathbf{Y} = [F(\mathbf{x}^{(1)}), F(\mathbf{x}^{(2)}), \ldots, F(\mathbf{x}^{(\mu)})]$. In this sense, the objective function $F$ is also extended as follows:
$$\mathbf{F}: \mathcal{X}^\mu \to \mathbb{R}^{\mu k}, \quad \mathbf{X} \mapsto \left[\,F(X_1, \ldots, X_n),\ F(X_{n+1}, \ldots, X_{2n}),\ \ldots,\ F(X_{(\mu-1)n+1}, \ldots, X_{\mu n})\,\right].$$
Taking $\mathbf{F}$, we can express the hypervolume indicator as a function on $\mathbb{R}^{\mu n}$:
$$H_F: \mathbb{R}^{\mu n} \to \mathbb{R}_{\geq 0}, \quad \mathbf{X} \mapsto \operatorname{HV}(\mathbf{F}(\mathbf{X}); \mathbf{r}).$$
We will henceforth omit the reference point $\mathbf{r}$ in $H_F$ for simplicity. It is straightforward to express the gradient of $H_F$ with respect to $\mathbf{X}$ using the chain rule, as reported in our previous works [16,19]: $\nabla H_F(\mathbf{X}) = (\partial H_F / \partial \mathbf{F})(\partial \mathbf{F} / \partial \mathbf{X})$; there, we also discussed the time complexity of computing the hypervolume gradient. It is noted here that, as an alternative to computing the gradient of the entire set, it has also been suggested to compute only the gradient of a single point with respect to the hypervolume indicator; this approach is referred to as hypervolume scalarization [50].
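To make the composition $H_F(\mathbf{X}) = \operatorname{HV}(\mathbf{F}(\mathbf{X}); \mathbf{r})$ and its gradient concrete, the following minimal sketch (our own illustration, using JAX automatic differentiation rather than the analytical chain rule of [16,19]) evaluates a two-dimensional hypervolume of the image set and differentiates it with respect to the concatenated decision vector; the toy bi-objective map and the assumption of mutually non-dominated objective points are ours:

```python
import jax
import jax.numpy as jnp

def F(x):
    """Toy bi-objective map of one decision point x in R^2 (minimization)."""
    return jnp.array([jnp.sum((x - 1.0) ** 2), jnp.sum((x + 1.0) ** 2)])

def hv_2d(Y, r):
    """2-D hypervolume of mutually non-dominated points Y w.r.t. reference r."""
    Ys = Y[jnp.argsort(Y[:, 0])]                   # sort by f1 ascending
    right = jnp.concatenate([Ys[1:, 0], r[:1]])    # right edge of each strip
    return jnp.sum((right - Ys[:, 0]) * (r[1] - Ys[:, 1]))

mu, n = 3, 2
r = jnp.array([20.0, 20.0])

def H_F(X_vec):
    """Set-based hypervolume H_F : R^{mu*n} -> R (to be maximized)."""
    Y = jax.vmap(F)(X_vec.reshape(mu, n))          # image of the set under F
    return hv_2d(Y, r)

X = jnp.array([-0.8, -0.8, 0.0, 0.0, 0.8, 0.8])    # concatenated decision vector
print(H_F(X))                                      # hypervolume value
print(jax.grad(H_F)(X))                            # gradient w.r.t. X, shape (6,)
```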

2.4. Hypervolume Hessian and Hypervolume Newton Method

Here, we assume F is at least twice continuously differentiable. In general, the Hessian matrix of the hypervolume indicator can be expressed as follows:
$$\nabla^2 H_F = \frac{\partial}{\partial \mathbf{X}}\left(\frac{\partial H_F}{\partial \mathbf{F}}\,\frac{\partial \mathbf{F}}{\partial \mathbf{X}}\right) = \frac{\partial}{\partial \mathbf{X}}\left(\frac{\partial H_F}{\partial \mathbf{F}}\right)\frac{\partial \mathbf{F}}{\partial \mathbf{X}} + \frac{\partial H_F}{\partial \mathbf{F}}\,\frac{\partial^2 \mathbf{F}}{\partial \mathbf{X}\,\partial \mathbf{X}} = \left(\frac{\partial \mathbf{F}}{\partial \mathbf{X}}\right)^{\!\top}\frac{\partial^2 H_F}{\partial \mathbf{F}\,\partial \mathbf{F}}\,\frac{\partial \mathbf{F}}{\partial \mathbf{X}} + \frac{\partial H_F}{\partial \mathbf{F}}\,\frac{\partial^2 \mathbf{F}}{\partial \mathbf{X}\,\partial \mathbf{X}}.$$
Note that in the above expression, $\partial^2 H_F / \partial \mathbf{F}\, \partial \mathbf{F}$ and $\partial^2 \mathbf{F} / \partial \mathbf{X}\, \partial \mathbf{X}$ denote the Hessian of the hypervolume indicator with respect to the objective points and the Hessian of the objective function $\mathbf{F}$, respectively. In our previous work [21], we derived the analytical expression of $\nabla^2 H_F$ for bi-objective cases and analyzed the structure and properties of the hypervolume Hessian matrix. In addition, we implemented a standalone hypervolume Newton (HVN) algorithm for unconstrained MOPs. Moreover, we showed that the Hessian $\nabla^2 H_F$ is a block tridiagonal matrix in bi-objective cases and provided a non-singularity condition, which states that the Hessian is singular only on a null subset of $\mathbb{R}^{\mu n}$ [21], thereby ascertaining that applying the HVN method is safe.
The analytical expression of the Hessian matrix for higher dimensions contains the derivatives $\partial^2 H_F / \partial x_i^{(\ell)} \partial x_j^{(m)}$, $m = 1, \ldots, \mu$, $\ell = 1, \ldots, \mu$, $i = 1, \ldots, n$, $j = 1, \ldots, n$. To compute these derivatives analytically, the chain rule can be applied (see [21]). In [21], however, the Hessian matrix of the second mapping—from the points in the objective space $(\mathbf{y}^{(1)}, \ldots, \mathbf{y}^{(k)})$ to the hypervolume indicator—was only given analytically for two dimensions. The Hessian matrix of this second mapping can be generalized to $k$-dimensional objective spaces, and it is continuous in regular cases. Here, we only sketch the construction of this matrix and leave the detailed analysis for future research. It is known that, in the $k$-dimensional case, the first derivatives $\partial \operatorname{HV} / \partial y_i$ are given by the $(k-1)$-dimensional Lebesgue measure of the $(k-1)$-dimensional faces of the attainment surface that separates the dominated space from the non-dominated space (see Figure 1, $\partial \operatorname{HV} / \partial y_3^{(1)}$). These faces themselves have derivatives that are given by the $(k-2)$-dimensional Lebesgue measure of the $(k-2)$-dimensional segments (or patches) at the boundary of these faces, which also change continuously with $y_i$ (see Figure 1, e.g., $\partial^2 \operatorname{HV} / \partial y_1^{(1)} \partial y_3^{(1)}$ and $\partial^2 \operatorname{HV} / \partial y_2^{(2)} \partial y_3^{(1)}$). Note that the points in the objective space need to be in general position to guarantee differentiability; otherwise, only one-sided differentiability applies, and one of the two one-sided derivatives, namely the one for which the perturbed coordinate falls into the dominated subspace, is always zero [16].
In this work, however, rather than investigating in detail the analytical and computational properties of the Hessian for more than two objective functions, we compute the second-order derivative $\partial^2 H_F / \partial \mathbf{F}\, \partial \mathbf{F}$ with the automatic differentiation (AD) method [51] and focus on solving equality-constrained MOPs using the Hessian matrix of the hypervolume indicator.
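As a small illustration of this route (our own sketch; the paper's implementation may differ), the term $\partial^2 \operatorname{HV} / \partial \mathbf{F}\, \partial \mathbf{F}$ for the bi-objective case can be obtained by applying an AD Hessian operator to the hypervolume seen as a function of the flattened objective vectors, assuming the points are mutually non-dominated:

```python
import jax
import jax.numpy as jnp

r = jnp.array([20.0, 20.0])

def hv_2d(Y, r):
    """2-D hypervolume of mutually non-dominated points Y w.r.t. reference r."""
    Ys = Y[jnp.argsort(Y[:, 0])]
    right = jnp.concatenate([Ys[1:, 0], r[:1]])
    return jnp.sum((right - Ys[:, 0]) * (r[1] - Ys[:, 1]))

# HV as a function of the flattened objective vectors Y in R^{mu*k}, k = 2.
HV_of_Y = lambda Y_vec: hv_2d(Y_vec.reshape(-1, 2), r)

Y = jnp.array([0.08, 6.48, 2.0, 2.0, 6.48, 0.08])   # three non-dominated points
print(jax.hessian(HV_of_Y)(Y))                      # (6, 6) matrix d^2 HV / dY dY
```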

3. Hypervolume Newton Method for Constrained MOPs

In this section, we first describe the base method of HVN for the treatment of equality constrained MOPs and will then discuss how to deal with inequalities and with dominated points that may be computed during the run of the Newton method.

3.1. Handling Equalities

Consider a continuous equality-constrained MOP of the form
$$\min_{\mathbf{x} \in \mathcal{X}} F(\mathbf{x}), \quad \text{s.t.} \quad h(\mathbf{x}) = 0,$$
where $h(\mathbf{x}) = (h_1(\mathbf{x}), \ldots, h_p(\mathbf{x}))^\top$ and $h_i: \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, p$, is the $i$-th equality constraint. The objective map is defined by $F: \mathcal{X} \subseteq \mathbb{R}^n \to \mathbb{R}^k$, where $f_i: \mathcal{X} \subseteq \mathbb{R}^n \to \mathbb{R}$ is the $i$-th individual objective to be considered in the MOP. The feasible set is given by:
$$Q = \{\mathbf{x} \in \mathcal{X} : h(\mathbf{x}) = 0\}.$$
The set (population) based hypervolume optimization problem we are considering in this work is the following one:
$$\max_{\substack{X \subseteq Q \\ |X| = \mu}} \operatorname{HV}(F(X)),$$
where $\operatorname{HV}(F(X))$ denotes the hypervolume value for a given set $X = \{\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(\mu)}\}$ of magnitude $\mu \in \mathbb{N}$, where each $\mathbf{x}^{(i)} \in \mathbb{R}^n$. Note that the set $X \subseteq Q$ can be interpreted as a point in $\mathbb{R}^{\mu n}$ (via considering $\mathbf{X} = (x_1^{(1)}, \ldots, x_n^{(1)}, x_1^{(2)}, \ldots, x_n^{(2)}, \ldots, x_1^{(\mu)}, \ldots, x_n^{(\mu)})$), and hence, problem (4) can be identified with a scalar objective optimization problem of dimension $\mu n$.
The feasibility of $X$ (i.e., $X \subseteq Q$) is equivalent to
$$h_i(\mathbf{x}^{(j)}) = 0, \quad i = 1, \ldots, p, \quad j = 1, \ldots, \mu.$$
For the related set-based equality constraints, we define, for $i \in \{1, \ldots, p\}$ and $j \in \{1, \ldots, \mu\}$,
$$h_{i,j}: \mathbb{R}^{\mu n} \to \mathbb{R}, \qquad h_{i,j}(\mathbf{X}) = h_i(\mathbf{x}^{(j)}).$$
For checking the feasibility of all decision points, we define $\bar{h}: \mathbb{R}^{\mu n} \to \mathbb{R}^{\mu p}$ via
$$\bar{h}(\mathbf{X}) = \begin{pmatrix} h_{1,1}(\mathbf{X}) \\ h_{2,1}(\mathbf{X}) \\ \vdots \\ h_{p,1}(\mathbf{X}) \\ h_{1,2}(\mathbf{X}) \\ h_{2,2}(\mathbf{X}) \\ \vdots \\ h_{p,2}(\mathbf{X}) \\ \vdots \\ h_{p,\mu}(\mathbf{X}) \end{pmatrix} =: \begin{pmatrix} \bar{h}_1(\mathbf{X}) \\ \bar{h}_2(\mathbf{X}) \\ \vdots \\ \bar{h}_p(\mathbf{X}) \\ \bar{h}_{p+1}(\mathbf{X}) \\ \bar{h}_{p+2}(\mathbf{X}) \\ \vdots \\ \bar{h}_{2p}(\mathbf{X}) \\ \vdots \\ \bar{h}_{\mu p}(\mathbf{X}) \end{pmatrix},$$
then its Jacobian is given by
$$\bar{H} := \nabla \bar{h}(\mathbf{X}) = \operatorname{diag}\left(H(\mathbf{x}^{(1)}), \ldots, H(\mathbf{x}^{(\mu)})\right) \in \mathbb{R}^{\mu p \times \mu n},$$
where
$$H(\mathbf{x}^{(i)}) = \begin{pmatrix} \nabla h_1(\mathbf{x}^{(i)})^\top \\ \vdots \\ \nabla h_p(\mathbf{x}^{(i)})^\top \end{pmatrix} \in \mathbb{R}^{p \times n}.$$
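A minimal NumPy/SciPy sketch of this block-diagonal assembly (our own illustration, with a single spherical equality constraint standing in for $h$ and an analytic Jacobian instead of AD) is:

```python
import numpy as np
from scipy.linalg import block_diag

def h(x):                      # p = 1 equality constraint: unit sphere
    return np.array([np.sum(x ** 2) - 1.0])

def H(x):                      # its p x n Jacobian (analytic here, AD in general)
    return 2.0 * x.reshape(1, -1)

X = np.array([[0.6, 0.7], [0.1, 1.2], [-0.9, 0.3]])     # mu = 3 points in R^2
h_bar = np.concatenate([h(x) for x in X])               # stacked values, R^{mu*p}
H_bar = block_diag(*[H(x) for x in X])                  # Jacobian, R^{mu*p x mu*n}
print(h_bar.shape, H_bar.shape)                         # (3,) (3, 6)
```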
The Karush-Kuhn-Tucker (KKT) equations of the problem (4) hence read as
$$\nabla H_F(\mathbf{X}) + \bar{H}^\top \lambda = 0, \qquad \bar{h}(\mathbf{X}) = 0,$$
for a Lagrange multiplier (or dual variable) $\lambda \in \mathbb{R}^{\mu p}$, which directly leads to the root-finding problem
$$G: \mathbb{R}^{\mu(n+p)} \to \mathbb{R}^{\mu(n+p)}, \qquad G(\mathbf{X}, \lambda) = \begin{pmatrix} \nabla H_F(\mathbf{X}) + \bar{H}^\top \lambda \\ \bar{h}(\mathbf{X}) \end{pmatrix} = 0,$$
where $\lambda \in \mathbb{R}^{\mu p}$. The Jacobian of $G$ at $(\mathbf{X}, \lambda)$ is given by
$$DG(\mathbf{X}, \lambda) = \begin{pmatrix} \nabla^2 H_F(\mathbf{X}) + M & \bar{H}^\top \\ \bar{H} & 0 \end{pmatrix} \in \mathbb{R}^{\mu(n+p) \times \mu(n+p)},$$
where
$$M = \sum_{j=1}^{\mu p} \lambda_j \nabla^2 \bar{h}_j(\mathbf{X}) \in \mathbb{R}^{\mu n \times \mu n}.$$
Denoting by $\mathbf{X}_t \in \mathbb{R}^{\mu n}$ and $\lambda_t \in \mathbb{R}^{\mu p}$ the variables in iteration $t$, a Newton step for problem (11) is given by
$$\begin{pmatrix} \mathbf{X}_{t+1} \\ \lambda_{t+1} \end{pmatrix} = \begin{pmatrix} \mathbf{X}_t \\ \lambda_t \end{pmatrix} - DG(\mathbf{X}_t, \lambda_t)^{-1}\, G(\mathbf{X}_t, \lambda_t).$$
In our computations, we have omitted M in D G . A Newton step is hence obtained by solving
$$\begin{pmatrix} \nabla^2 H_F(\mathbf{X}_t) & \bar{H}^\top \\ \bar{H} & 0 \end{pmatrix} \begin{pmatrix} \mathbf{X}_{t+1} - \mathbf{X}_t \\ \lambda_{t+1} - \lambda_t \end{pmatrix} = -\begin{pmatrix} \nabla H_F(\mathbf{X}_t) + \bar{H}^\top \lambda_t \\ \bar{h}(\mathbf{X}_t) \end{pmatrix}.$$
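Assembling and solving this bordered system amounts to a few lines of dense linear algebra; the following sketch (our own illustration with a hypothetical helper name, not the paper's implementation, which exploits sparsity and a Cholesky-type factorization) performs one such Newton step:

```python
import numpy as np

def newton_step(hess_HF, H_bar, grad_HF, h_bar, lam):
    """One constrained hypervolume Newton step (M omitted, dense solve).

    hess_HF : (mu*n, mu*n)  Hessian of H_F at X
    H_bar   : (mu*p, mu*n)  Jacobian of the stacked constraints
    grad_HF : (mu*n,)       gradient of H_F at X
    h_bar   : (mu*p,)       stacked constraint values
    lam     : (mu*p,)       current Lagrange multipliers
    """
    mn, mp = hess_HF.shape[0], H_bar.shape[0]
    KKT = np.block([[hess_HF, H_bar.T],
                    [H_bar, np.zeros((mp, mp))]])
    rhs = -np.concatenate([grad_HF + H_bar.T @ lam, h_bar])
    delta = np.linalg.solve(KKT, rhs)
    return delta[:mn], delta[mn:]          # (delta_X, delta_lambda)
```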

3.2. Handling Inequalities

In order to handle inequalities, we have chosen an active set approach which we will discuss in the following. This approach is straightforward; however, it has led to satisfying results in our computations, in particular when the initial candidate set was computed by the evolutionary algorithm.
Assume problem (2) contains inequalities of the form
$$g(\mathbf{x}) \leq 0,$$
where $g(\mathbf{x}) = (g_1(\mathbf{x}), \ldots, g_m(\mathbf{x}))^\top$ and $g_i: \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, m$, is the $i$-th inequality constraint. Analogous to the equality-constrained case, we define the feasibility of $\mathbf{X} = (x_1^{(1)}, \ldots, x_n^{(1)}, x_1^{(2)}, \ldots, x_n^{(2)}, \ldots, x_1^{(\mu)}, \ldots, x_n^{(\mu)})$ by
$$g_i(\mathbf{x}^{(j)}) \leq 0, \quad i = 1, \ldots, m, \quad j = 1, \ldots, \mu.$$
Define, for $i \in \{1, \ldots, m\}$ and $j \in \{1, \ldots, \mu\}$,
$$g_{i,j}: \mathbb{R}^{\mu n} \to \mathbb{R}, \qquad g_{i,j}(\mathbf{X}) = g_i(\mathbf{x}^{(j)}),$$
and $\bar{g}: \mathbb{R}^{\mu n} \to \mathbb{R}^{\mu m}$ by
$$\bar{g}(\mathbf{X}) = \begin{pmatrix} g_{1,1}(\mathbf{X}) \\ g_{2,1}(\mathbf{X}) \\ \vdots \\ g_{m,1}(\mathbf{X}) \\ g_{1,2}(\mathbf{X}) \\ g_{2,2}(\mathbf{X}) \\ \vdots \\ g_{m,2}(\mathbf{X}) \\ \vdots \\ g_{m,\mu}(\mathbf{X}) \end{pmatrix} =: \begin{pmatrix} \bar{g}_1(\mathbf{X}) \\ \bar{g}_2(\mathbf{X}) \\ \vdots \\ \bar{g}_m(\mathbf{X}) \\ \bar{g}_{m+1}(\mathbf{X}) \\ \bar{g}_{m+2}(\mathbf{X}) \\ \vdots \\ \bar{g}_{2m}(\mathbf{X}) \\ \vdots \\ \bar{g}_{\mu m}(\mathbf{X}) \end{pmatrix}.$$
The active set strategy we have used is as follows: if, for an inequality constraint, it holds that
$$\bar{g}_l(\mathbf{X}) > -\mathrm{tol}$$
for a given tolerance $\mathrm{tol} > 0$ at $\mathbf{X}$, then we impose the equality
$$\bar{g}_l(\mathbf{X}) = 0$$
(i.e., it is added to the set of equalities), while all other inequalities are disregarded at $\mathbf{X}$.
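A sketch of this bookkeeping (our own illustration with a hypothetical helper name; the tolerance and the comparison follow the rule stated above) is:

```python
import numpy as np

def active_inequalities(g_bar_vals, tol=1e-4):
    """Return the indices of stacked inequality constraints g_bar that are
    (nearly) active or violated at X, i.e., g_bar_l(X) > -tol; these are
    imposed as equalities in the next Newton step, all others are ignored."""
    return np.flatnonzero(g_bar_vals > -tol)

g_bar = np.array([-0.3, -5.0e-5, 0.02, -1.2])   # example stacked values
print(active_inequalities(g_bar))               # -> [1 2]
```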

3.3. Handling Dominated Points

Since Newton's method tends to take relatively long steps, it often occurs that some decision points become dominated after a Newton step/iteration. Therefore, it is necessary to discuss how the equality-constrained HVN method behaves in this case. For reasons that will become clear during our discussion, we investigate two scenarios: (1) infeasible and dominated points and (2) feasible but dominated points.
For the first scenario, we consider the simplest case, where $p = 1$ and there is only one dominated point. Without loss of generality, we can assume that, for an approximation set $X = \{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(\mu)}\} \subseteq \mathcal{X}$, the point $\mathbf{x}^{(1)}$ is dominated by at least one of the remaining $\mu - 1$ points (as the indices are assigned to $X$ arbitrarily). Denoting by $\mathbf{X}^{(-1)}$ the approximation set after removing $\mathbf{x}^{(1)}$, we can express the constraint function on $\mathbf{X}^{(-1)}$ as:
$$\bar{h}^*: \mathbb{R}^{(\mu-1)n} \to \mathbb{R}^{\mu-1}, \quad \mathbf{X}^{(-1)} \mapsto \left(\bar{h}^*_2(\mathbf{X}^{(-1)}), \bar{h}^*_3(\mathbf{X}^{(-1)}), \ldots, \bar{h}^*_\mu(\mathbf{X}^{(-1)})\right), \qquad \bar{h}^*_j: \mathbb{R}^{(\mu-1)n} \to \mathbb{R}, \quad \mathbf{X}^{(-1)} \mapsto h(\mathbf{x}^{(j)}), \quad j \in [2..\mu].$$
Note that we are only considering the special case of one constraint, i.e., $p = 1$. The root-finding problem $G$ can be re-expressed in the following form, equivalent to Equation (11):
$$G(\mathbf{X}, \lambda) = \begin{pmatrix} \lambda_1 \nabla h(\mathbf{x}^{(1)}) \\ \nabla H_F\big(\mathbf{X}^{(-1)}\big) + \sum_{j=2}^{\mu} \lambda_j \nabla \bar{h}^*_j\big(\mathbf{X}^{(-1)}\big) \\ h(\mathbf{x}^{(1)}) \\ \bar{h}^*\big(\mathbf{X}^{(-1)}\big) \end{pmatrix}.$$
Let $\mu' = \mu - 1$ and $\mathbf{H}\big(\mathbf{X}^{(-1)}\big) = \left[\nabla \bar{h}^*_2\big(\mathbf{X}^{(-1)}\big), \ldots, \nabla \bar{h}^*_\mu\big(\mathbf{X}^{(-1)}\big)\right] \in \mathbb{R}^{\mu' n \times \mu'}$; we express the derivative of $G$ as a block matrix:
$$DG(\mathbf{X}, \lambda) = \begin{pmatrix} \lambda_1 \nabla^2 h(\mathbf{x}^{(1)}) & \mathbf{0}_{n \times \mu' n} & \nabla h(\mathbf{x}^{(1)}) & \mathbf{0}_{n \times \mu'} \\ \mathbf{0}_{\mu' n \times n} & \nabla^2 H_F\big(\mathbf{X}^{(-1)}\big) + \sum_{j=2}^{\mu} \lambda_j \nabla^2 \bar{h}^*_j\big(\mathbf{X}^{(-1)}\big) & \mathbf{0}_{\mu' n \times 1} & \mathbf{H}\big(\mathbf{X}^{(-1)}\big) \\ \nabla h(\mathbf{x}^{(1)})^\top & \mathbf{0}_{1 \times \mu' n} & 0 & \mathbf{0}_{1 \times \mu'} \\ \mathbf{0}_{\mu' \times n} & \mathbf{H}\big(\mathbf{X}^{(-1)}\big)^\top & \mathbf{0}_{\mu' \times 1} & \mathbf{0}_{\mu' \times \mu'} \end{pmatrix}.$$
Note that the upper-left $2 \times 2$ block of blocks equals $\nabla^2 H_F(\mathbf{X}) + \sum_{i=1}^{\mu} \lambda_i \nabla^2 \bar{h}_i(\mathbf{X})$. The inverse of $DG$ can be obtained by applying the Schur complement recursively (first consider the block partition indicated above and then apply it again within each partition), provided that both $\nabla^2 h(\mathbf{x}^{(1)})$ and $\nabla^2 H_F\big(\mathbf{X}^{(-1)}\big)$ are non-singular.
After simplification, the inverse of D G admits the following form:
$$[DG(\mathbf{X}, \lambda)]^{-1} = \begin{pmatrix} \left(\mathbf{I}_{n\times n} - (\mathbf{g}^\top A \mathbf{g})^{-1} A \mathbf{g}\mathbf{g}^\top\right) A & \mathbf{0}_{n \times \mu' n} & (\mathbf{g}^\top A \mathbf{g})^{-1} A \mathbf{g} & \mathbf{0}_{n \times \mu'} \\ \mathbf{0}_{\mu' n \times n} & B\left(\mathbf{I} - \mathbf{H}(\mathbf{H}^\top B \mathbf{H})^{-1}\mathbf{H}^\top B\right) & \mathbf{0}_{\mu' n \times 1} & \mathbf{0}_{\mu' n \times \mu'} \\ (\mathbf{g}^\top A \mathbf{g})^{-1} (A\mathbf{g})^\top & \mathbf{0}_{1 \times \mu' n} & -(\mathbf{g}^\top A \mathbf{g})^{-1} & \mathbf{0}_{1 \times \mu'} \\ \mathbf{0}_{\mu' \times n} & \mathbf{0}_{\mu' \times \mu' n} & \mathbf{0}_{\mu' \times 1} & -(\mathbf{H}^\top B \mathbf{H})^{-1} \end{pmatrix},$$
where $\mathbf{g} = \nabla h(\mathbf{x}^{(1)})$, $A = \left[\lambda_1 \nabla^2 h(\mathbf{x}^{(1)})\right]^{-1}$, $\mathbf{H} = \mathbf{H}\big(\mathbf{X}^{(-1)}\big)$, and
$$B = \left(\nabla^2 H_F\big(\mathbf{X}^{(-1)}\big) + \sum_{j=2}^{\mu} \lambda_j \nabla^2 \bar{h}^*_j\big(\mathbf{X}^{(-1)}\big)\right)^{-1}.$$
The first row of blocks is of particular interest to us since it determines the search step of x ( 1 ) . It is obvious that
$$\Delta \mathbf{x}^{(1)} = -\left[DG(\mathbf{X}, \lambda)^{-1}\right]_{[1:n,\ 1:\mu(n+1)]} G(\mathbf{X}, \lambda) = -\left(\lambda_1 \left(\mathbf{I}_{n\times n} - (\mathbf{g}^\top A \mathbf{g})^{-1} A \mathbf{g}\mathbf{g}^\top\right) A \mathbf{g} + h(\mathbf{x}^{(1)})\,(\mathbf{g}^\top A \mathbf{g})^{-1} A \mathbf{g}\right) = -\frac{h(\mathbf{x}^{(1)})}{\nabla h(\mathbf{x}^{(1)})^\top \left[\nabla^2 h(\mathbf{x}^{(1)})\right]^{-1} \nabla h(\mathbf{x}^{(1)})}\,\left[\nabla^2 h(\mathbf{x}^{(1)})\right]^{-1} \nabla h(\mathbf{x}^{(1)}),$$
where the notation $M_{[1:n,\ 1:\mu(n+1)]}$ takes rows $1$ to $n$ and columns $1$ to $\mu(n+1)$ of the matrix $M$. Similarly, the search step of the dual variable is:
$$\Delta \lambda_1 = -\left[DG(\mathbf{X}, \lambda)^{-1}\right]_{[\mu n + 1,\ 1:\mu(n+1)]} G(\mathbf{X}, \lambda) = -\left(\lambda_1 (\mathbf{g}^\top A \mathbf{g})^{-1} (A\mathbf{g})^\top \mathbf{g} - (\mathbf{g}^\top A \mathbf{g})^{-1} h(\mathbf{x}^{(1)})\right) = \lambda_1 \left(\frac{h(\mathbf{x}^{(1)})}{\nabla h(\mathbf{x}^{(1)})^\top \left[\nabla^2 h(\mathbf{x}^{(1)})\right]^{-1} \nabla h(\mathbf{x}^{(1)})} - 1\right).$$
Now, consider the function $\hat{h}(\mathbf{x}) = h^2(\mathbf{x})/2$, whose first- and second-order derivatives are:
$$\nabla \hat{h}(\mathbf{x}) = h(\mathbf{x}) \nabla h(\mathbf{x}), \qquad \nabla^2 \hat{h}(\mathbf{x}) = h(\mathbf{x}) \nabla^2 h(\mathbf{x}) + \nabla h(\mathbf{x}) \nabla h(\mathbf{x})^\top.$$
The global minimum of $\hat{h}$ corresponds to the feasible set, i.e., $h(\mathbf{x}) = 0$. Hence, Newton iterations that minimize $\hat{h}$ will equivalently find the feasible set. Computing the Newton direction of $\hat{h}$, we have:
$$-\nabla^2 \hat{h}(\mathbf{x})^{-1} \nabla \hat{h}(\mathbf{x}) = -\left[h(\mathbf{x}) \nabla^2 h(\mathbf{x})\right]^{-1} \left(\mathbf{I}_{n \times n} - \frac{\nabla h(\mathbf{x}) \nabla h(\mathbf{x})^\top \left[h(\mathbf{x}) \nabla^2 h(\mathbf{x})\right]^{-1}}{1 + \nabla h(\mathbf{x})^\top \left[h(\mathbf{x}) \nabla^2 h(\mathbf{x})\right]^{-1} \nabla h(\mathbf{x})}\right) h(\mathbf{x}) \nabla h(\mathbf{x}) = -\frac{h(\mathbf{x})}{h(\mathbf{x}) + \nabla h(\mathbf{x})^\top \left[\nabla^2 h(\mathbf{x})\right]^{-1} \nabla h(\mathbf{x})}\,\left[\nabla^2 h(\mathbf{x})\right]^{-1} \nabla h(\mathbf{x}).$$
Setting x = x ( 1 ) in the above equation and comparing it to Equation (22), we notice that the Newton direction of h ^ and the hypervolume Newton step Δ x ( 1 ) only differ by a scalar, which can be neglected in practice since we implement a step-size control to re-scale the search step (see the following sub-section). Therefore, we conclude that for infeasible and dominated points, our HVN method (Equation (15)) only considers the constraint function and moves such decision points to the feasible set rapidly (ideally at quadratic speed when the point is close to the feasible set). This satisfactory property allows for handling infeasible and dominated points without modifying our HVN method.
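This behavior is easy to check numerically; the toy sketch below (our own check, using the spherical constraint $h(\mathbf{x}) = \lVert \mathbf{x} \rVert^2 - 1$ as an assumed example) iterates the Newton step of $\hat{h} = h^2/2$ from an infeasible point and prints $|h(\mathbf{x})|$, which shrinks roughly quadratically once the iterate is close to the feasible set:

```python
import numpy as np

def h(x):       return np.sum(x ** 2) - 1.0        # equality constraint
def grad_h(x):  return 2.0 * x
def hess_h(x):  return 2.0 * np.eye(x.size)

x = np.array([1.6, -0.9, 0.4])                     # infeasible starting point
for it in range(6):
    grad = h(x) * grad_h(x)                                      # gradient of h^2/2
    hess = h(x) * hess_h(x) + np.outer(grad_h(x), grad_h(x))     # Hessian of h^2/2
    x = x + np.linalg.solve(hess, -grad)                         # Newton step
    print(it, abs(h(x)))                           # |h| -> 0, roughly quadratically
```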
In addition, due to this property, an infeasible point will eventually lie on the feasible set, where it can still be dominated if other feasible points exist. This is precisely the second scenario of our discussion, in which the hypervolume contribution of feasible but dominated points is zero. To move such points, we propose to employ the well-known non-dominated sorting procedure [36], where we partition all feasible points into “layers” of mutually non-dominated points (formally, anti-chains of the Pareto order) and compute the Newton direction for each layer (using Equation (15)) regardless of the other, dominating layers. In this manner, the HVN method can move all feasible points along the feasible set to achieve a good distribution, as sketched below.
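The layer partition used here is the standard non-dominated sorting; a compact quadratic-time sketch (our own illustration, not the NSGA-II implementation used in the experiments) is:

```python
import numpy as np

def non_dominated_layers(Y):
    """Partition objective vectors Y (rows, minimization) into layers
    L1, L2, ... of mutually non-dominated points; returns index lists."""
    remaining = list(range(len(Y)))
    layers = []
    while remaining:
        front = [i for i in remaining
                 if not any(np.all(Y[j] <= Y[i]) and np.any(Y[j] < Y[i])
                            for j in remaining if j != i)]
        layers.append(front)
        remaining = [i for i in remaining if i not in front]
    return layers

Y = np.array([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0], [3.0, 3.0], [5.0, 5.0]])
print(non_dominated_layers(Y))   # [[0, 1, 2], [3], [4]]
```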

3.4. The HVN Method for Constrained MOPs

Taking the above considerations into account, we devise and implement in this section a standalone HVN algorithm, which is outlined in Algorithm 1. First, we check whether any decision point is feasible (i.e., $h(\mathbf{x}) = 0$ for some $\mathbf{x}$), where feasibility is tested numerically with a pre-defined small threshold (e.g., $10^{-4}$ in this work) for the equality constraints. Then, we employ the non-dominated sorting procedure [36] to partition the feasible points $\mathbf{X}_f$ into “layers” of mutually non-dominated points, and the Newton direction (Equation (15)) is calculated separately on each layer. Writing $L$ for the indices of the points in a layer and $\mathbf{X}_f[L]$ for the subset indexed by $L$, we express this partitioning as $\mathbf{X}_f[L_1] \prec \mathbf{X}_f[L_2] \prec \cdots \prec \mathbf{X}_f[L_q]$, with $L_i \cap L_j = \emptyset$ for $i \neq j$ and $\bigcup_i L_i \subseteq [1..\mu]$. Note that the dominance relation for the remaining infeasible and dominated points is not well-defined when the equality constraints are taken into account, since they are incomparable to the feasible ones (and also among themselves). In this case, we simply merge them into the first layer $L_1$ and compute the Newton direction thereof, which is justified by the observations in Equations (22) and (24): the resulting search direction for the infeasible and dominated points is a Newton direction of the function $h^2/2$. With this treatment, a special case arises when there are no feasible points at all, which usually happens in the first few iterations of the algorithm.
Algorithm 1: Standalone hypervolume Newton algorithm for equality-constrained MOPs
Finally, another important aspect is the step-size control for each Newton step. We propose maintaining an individual step-size for each partition, determined using the well-known backtracking line search with the Armijo condition [52]. In detail, this method starts with an initial step-size $\sigma_0$ and tests whether the Euclidean norm of $G(\mathbf{X}, \lambda)$ has decreased sufficiently after applying the Newton step to the primal-dual pair $(\mathbf{X}, \lambda)$. Since the Newton direction for equality-constrained problems (Equation (15)) is not necessarily an ascent direction for the hypervolume, we take the Euclidean norm $\|G(\mathbf{X}, \lambda)\|$ as the convergence measure, since (1) the optimality condition is $G(\mathbf{X}, \lambda) = 0$ (Equation (11)) and (2) the Newton step is always a descent direction for $\|G(\mathbf{X}, \lambda)\|$. Let $\mathbf{Z} = (\mathbf{X}, \lambda)$ be the primal-dual variable and $\Delta \mathbf{Z} = -[DG(\mathbf{Z})]^{-1} G(\mathbf{Z})$; then $\left(\frac{d}{d\sigma}\|G(\mathbf{Z} + \sigma \Delta \mathbf{Z})\|\right)\big|_{\sigma=0} = -\|G(\mathbf{Z})\| \leq 0$. If the test fails, we halve the step-size and repeat the test. Notably, for infeasible and dominated points, the test checks whether the squared constraint value has decreased sufficiently, since the HVN method computes the Newton direction of $h^2/2$ for those points. In our implementation, we use at most six iterations of this test, resulting in a minimal step-size of $\sigma_0/64$. As for the initial step-size $\sigma_0$, the commonly used value $\sigma_0 = 1$ often leads to Newton steps that jump out of the decision space when the Newton direction is large or the point is close to the boundary of the decision space. Therefore, we set it to the minimum of one and the maximal step-size that the primal vector $\mathbf{X}$ can take without leaving the decision space, i.e., $\sigma_0 = \min\{1, \sigma_{\max}\}$. The value of $\sigma_{\max}$ can be calculated in a straightforward way when the decision space is a convex and compact subset of $\mathbb{R}^n$, e.g., a hyperbox.
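A sketch of this step-size control (our own illustration with hypothetical names; the residual-decrease test and the halving schedule follow the description above) is:

```python
import numpy as np

def backtrack(Z, delta_Z, G, sigma0=1.0, max_halvings=6, c=1e-4):
    """Halving line search on ||G||: accept the first sigma in
    sigma0, sigma0/2, ... that sufficiently decreases the residual norm;
    after max_halvings failed tests, fall back to sigma0 / 2**max_halvings."""
    norm0 = np.linalg.norm(G(Z))
    sigma = sigma0
    for _ in range(max_halvings):
        if np.linalg.norm(G(Z + sigma * delta_Z)) <= (1.0 - c * sigma) * norm0:
            return Z + sigma * delta_Z, sigma
        sigma *= 0.5
    return Z + sigma * delta_Z, sigma     # minimal step sigma0 / 64 for 6 halvings
```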

3.5. Computational Cost

The above method requires knowledge of the Jacobian and the Hessian of both the objective and the constraint functions. In this work, we have used automatic differentiation (AD) techniques [53]. Note that finite differences can also be utilized when the AD computation is not applicable. The AD computation takes at most four times the multiply–add operations used in evaluating the function value [54]. Hence, to make a fair comparison between HVN and MOEA methods, we count $4$ function evaluations (FEs) and $4 + 6n$ FEs for the computational cost of each AD-computed Jacobian and Hessian, respectively. In total, the number of FEs consumed in each iteration is:
$$\#\text{FEs}: \underbrace{\mu}_{F} + \underbrace{\mu}_{h} + \underbrace{4\mu}_{\nabla F} + \underbrace{4\mu}_{\nabla h} + \underbrace{(4+6n)\mu}_{\nabla^2 F} + \underbrace{(4+6n)\mu}_{\nabla^2 h} + \underbrace{6(4\mu + 4\mu)}_{\text{step-size control}} = (69 + 12n)\mu,$$
which accounts for the evaluations of the objective function, the constraint function, the Jacobians of the objective and constraint functions, the Hessians of the objective and constraint functions, and the backtracking line search for the step-size. Computing the hypervolume Hessian takes $\Theta((\mu n)^3)$ time in addition to the AD computation of the derivatives in Equation (1). For solving Equation (15), we use a Cholesky decomposition, which has a computational complexity of $O((\mu(n+p))^3)$. It would certainly be desirable either to have an analytic expression of the HV Hessian or to exploit, within the AD computation, the block structure that this matrix is known to have; we have to leave this for future research.
We have implemented the standalone algorithm in Python, which is accessible at https://github.com/wangronin/HypervolumeDerivatives (accessed on 1 November 2022).

4. Numerical Results

In this section, we present numerical results for the HVN, both as a standalone algorithm and as a local search engine within the NSGA-III algorithm.

4.1. HVN as Standalone Algorithm

We showcase the behavior of the proposed Newton method as a standalone method on three example problems:
( P 1 ) : F ( x ) = ( x 1 ) 2 , ( x + 1 ) 2 , h ( x ) = x 2 1 , X = [ 2 , 2 ] 2 , r = [ 20 , 20 ] . ( P 2 ) : F ( x ) = ( x ( 1 , 1 , 0 ) ) 2 , ( x ( 1 , 1 , 0 ) ) 2 , ( x ( 1 , 1 , 0 ) ) 2 , h ( x ) = x 2 3 3 1 , 0 , 1.5 2 1 , X = [ 2 , 2 ] 3 , r = [ 38 , 38 , 38 ] . ( P 3 ) : F ( x ) = ( x + ( 1 , 1 , 1 ) ) 2 , ( x + ( 1 , 0 , 0 ) ) 2 , ( x + ( 2 , 2 , 4 ) ) 2 , g ( x ) = x 0 , X = [ 4 , 4 ] 3 , r = [ 90 , 90 , 90 ] .
Importantly, we use initializations of the decision points that are specific to each problem in order to investigate the behavior of the standalone HVN with respect to the characteristics of each problem; we do not aim to provide a unified and systematic initialization method for the standalone HVN in this section. Note that problem P3 defines an inequality constraint on the first component $x_0$ of the decision point, where the feasible set is $\{\mathbf{x} \in \mathcal{X}: x_0 \geq 0\}$, and the optimum lies on the active set of $g$, i.e., on $x_0 = 0$. This problem is meant to test whether the proposed HVN algorithm can solve inequality-constrained problems where the optimum lies on the active set of the constraint. To measure the empirical performance of the HVN algorithm, we take the Euclidean norm $\|G(\mathbf{X}, \lambda)\|$, since the Newton direction is not necessarily an ascent direction for the hypervolume.
Moreover, since it is well known that Newton-like methods can be affected by the choice of initial solutions, we investigate the performance of the HVN algorithm on problem P1 with three different initializations. Specifically, in the two-dimensional decision space, we create $\mu = 50$ initial decision points on the line segment $x_2 = x_1 - 2$, $x_1 \in [0, 2]$, where we determine the values of $x_1$ by (i) taking evenly spaced points (linear), (ii) logistic-transforming evenly spaced points (which makes the points denser around the tails of the segment), or (iii) logit-transforming evenly spaced points (which yields a higher density of points in the middle). The results are illustrated in Figure 2 and Table 1, which show a set of well-distributed points on the feasible set in the objective space (the red dashed sphere) for all three initializations; a sketch of these initializations is given below. In addition, the empirical convergence rate is quadratic regardless of the choice of initialization, as reported in Table 1.
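The three initializations can be reproduced with a few lines of NumPy (our own sketch; the exact transform scaling and the assumed segment $x_2 = x_1 - 2$ are illustrative choices):

```python
import numpy as np

mu = 50
t = np.linspace(0.01, 0.99, mu)                  # evenly spaced parameters in (0, 1)

def on_segment(s):
    """Map s in [0, 1] onto the segment x2 = x1 - 2, x1 in [0, 2]."""
    x1 = 2.0 * s
    return np.column_stack([x1, x1 - 2.0])

logistic = lambda z: 1.0 / (1.0 + np.exp(-z))
logit    = lambda s: np.log(s / (1.0 - s))
rescale  = lambda z: (z - z.min()) / (z.max() - z.min())

X_linear   = on_segment(t)                              # (i)   evenly spaced
X_logistic = on_segment(logistic(10.0 * (t - 0.5)))     # (ii)  denser at the tails
X_logit    = on_segment(rescale(logit(t)))              # (iii) denser in the middle
```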
The results on problem P2 are depicted in Figure 3 and Table 2 for three different sizes $\mu \in \{20, 40, 60\}$ of the approximation set. The initial decision points are sampled uniformly at random in the convex hull of the three points $(1, 1, 0)$, $(1, 1, 0)$, and $(1, 0, 0)$. Whereas the final approximation set is well distributed in the objective space, we observe that the empirical convergence of $\|G(\mathbf{X}, \lambda)\|$ is considerably rugged in the first 20–25 iterations, after which quadratic convergence appears. This is attributed to the fact that decision points often become dominated in the first couple of iterations on this problem, resulting in a zero hypervolume gradient for them and hence a quite large norm $\|G(\mathbf{X}, \lambda)\|$. Nevertheless, the proposed treatment of those dominated points (Algorithm 1), which is based on the non-dominated sorting procedure, is capable of bringing the dominated points to the active set at quadratic speed. The same ruggedness is seen in the convergence chart of problem P3 (shown in Figure 4). On this problem, we again take $\mu \in \{20, 40, 60\}$, and the initial decision points are sampled uniformly at random in the feasible space $[0, 4] \times [-4, 4]^2$. We extend the HVN algorithm slightly for this inequality-constrained problem in the following way: whenever a decision point is feasible, i.e., $g(\mathbf{x}) \leq 0$, and sufficiently distant from the active set ($g(\mathbf{x}) = 0$, shown as the red plane in Figure 4), we ignore the constraint function when computing the Newton step. When a feasible decision point is sufficiently close to the active set (the distance is less than $10^{-4}$ in our implementation), we treat $g(\mathbf{x})$ as an equality constraint and use Equation (15) to compute the Newton step.
Moreover, we test the standalone HVN method on large-scale, more complicated MOPs. We choose the well-known DTLZ problems with one spherical constraint [55,56] and $\mu = 200$ decision points, resulting in a relatively large Hessian matrix (for an 11-dimensional decision space and one constraint, $DG(\mathbf{X}, \lambda)$ is of size $2400 \times 2400$). In this case, we use sparse matrix operations for computational efficiency, exploiting the sparsity of the Hessian. Since the DTLZ problems are highly multi-modal, the initial approximation set is generated in a local vicinity of the Pareto set, i.e., $\mathbf{X}^* + 0.02\,\mathcal{U}(0, 1)$, where $\mathbf{X}^*$ is sampled uniformly at random on the Pareto set. We execute the standalone HVN method for 15 iterations and illustrate the results in Figure 5. In the plot, we observe well-distributed final points (green dots), in contrast to the non-uniform initial ones (black crosses), showing that the standalone HVN works properly as a local method for large-scale problems.

4.2. HVN within NSGA-III

In this section, we investigate the empirical performance of the HVN algorithm on the more complicated, equality-constrained DTLZ (Eq-DTLZ) problems [55,56] and their inverted counterparts (Eq-IDTLZ). As Newton-like algorithms are local methods, running the standalone algorithm (Algorithm 1) alone would stagnate in local Pareto sets. Therefore, we hybridize the HVN algorithm with an MOEA: we first execute the MOEA for a pre-defined budget to overcome local optima and get close to the global Pareto set, and then initialize the HVN algorithm with the final approximation set of the MOEA to make local refinements. We summarize this hybrid approach in Algorithm 2. Notably, in line 3, we transfer the whole approximation set (rather than only the non-dominated points) to HVN upon termination of the MOEA, since the standalone HVN method is able to move dominated points towards the feasible set at quadratic speed, as shown in Section 3.3.
Algorithm 2: Hybridization of HVN and MOEA
The following empirical study aims to check whether the hybridization approach can achieve a better final approximation set/front than an MOEA alone under the same computational budget. As test problems, a single spherical constraint $h(\mathbf{x}) = (x_1 - 0.5)^2 + (x_2 - 0.5)^2 - 0.16$ is imposed on problems DTLZ1–4. The decision space is $[0, 1]^{11}$, the reference point for HVN is $\mathbf{r} = (1, 1, 1)$, and the approximation set is of size 200. Here, we choose the well-known NSGA-III algorithm [34,35], where the equality constraints are handled using the adaptive $\varepsilon$ constraint handling technique. We utilize the implementation in the pymoo library (https://pymoo.org/constraints/eps.html, accessed on 1 November 2022). The method considers a solution feasible subject to a small $\varepsilon$ threshold, which decreases linearly to zero. The initial value of $\varepsilon$ is set to the average constraint value of the initial population. In our experiments, we let $\varepsilon$ decrease to zero after 50% of the iterations of NSGA-III. In addition, we use Das and Dennis's approach [28] to generate well-spaced reference directions (18 partitions, which lead to 190 directions) for NSGA-III. As for its hyperparameters, we use the default settings: $\eta = 30$ and $p = 1$ for simulated binary crossover and $\eta = 20$ for polynomial mutation. Furthermore, the hybrid algorithm first executes NSGA-III with $\mu = 200$ for 1000 iterations and then runs the HVN method for 10 iterations. In HVN, the total function evaluations and AD operations take ca. 270 s of CPU time on an Intel(R) Core(TM) i5-8257U CPU. Considering the CPU time of a single function evaluation, which is on average ca. $5.6 \times 10^{-5}$ s measured on the same hardware, the total function evaluations plus the AD operations are equivalent to roughly $4.8 \times 10^{5}$ FEs. Therefore, the total budget of the hybrid algorithm is roughly $4.8 \times 10^{5}/200 + 1000 \approx 3400$ iterations. We execute the standalone NSGA-III algorithm for the same number of iterations to keep the comparison fair.
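For reference, a minimal pymoo setup along the lines described above might look as follows (a sketch of our own, not the exact experimental script; the class and module names follow pymoo 0.6 and may differ in other versions, and the problem wrapper is an assumed example):

```python
import numpy as np
from pymoo.core.problem import Problem
from pymoo.problems import get_problem
from pymoo.algorithms.moo.nsga3 import NSGA3
from pymoo.constraints.eps import AdaptiveEpsilonConstraintHandling
from pymoo.util.ref_dirs import get_reference_directions
from pymoo.optimize import minimize

class EqDTLZ2(Problem):
    """DTLZ2 with the spherical equality constraint used in the experiments."""
    def __init__(self):
        super().__init__(n_var=11, n_obj=3, n_eq_constr=1, xl=0.0, xu=1.0)
        self.dtlz2 = get_problem("dtlz2", n_var=11, n_obj=3)

    def _evaluate(self, X, out, *args, **kwargs):
        out["F"] = self.dtlz2.evaluate(X)
        out["H"] = (X[:, 0] - 0.5) ** 2 + (X[:, 1] - 0.5) ** 2 - 0.16

ref_dirs = get_reference_directions("das-dennis", 3, n_partitions=18)  # 190 dirs
algorithm = AdaptiveEpsilonConstraintHandling(
    NSGA3(pop_size=200, ref_dirs=ref_dirs), perc_eps_until=0.5)
res = minimize(EqDTLZ2(), algorithm, ("n_gen", 1000), seed=1, verbose=False)
# res.pop would then be handed over to the HVN refinement phase (Algorithm 2).
```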
We first depict one example of the final approximation set (only the non-dominated subset is shown) in Figure 6 for both methods, where we clearly observe that the hybridization achieves many more non-dominated points than NSGA-III. Second, we show, in Table 3, the hypervolume indicator value and the number of final non-dominated points for both algorithms, obtained from 15 independent runs. In addition, we compute the above metrics for the hybrid algorithm right before the HVN phase starts (NSGA-III (1000) in the table), showing the progress that HVN manages to make. From the results, we conclude that the hybrid algorithm significantly improves the hypervolume metric and outputs substantially more non-dominated points than NSGA-III alone. We conjecture that the observed advantage of the hybrid algorithm is very likely attributable to HVN's ability to move dominated points to the feasible set with quadratic convergence (see Section 3), which disregards the objective function and thereby its multi-modal landscape.

5. Conclusions

In this paper, we propose a hypervolume Newton method (HVN) for equality-constrained multi-objective optimization problems (MOPs) under the assumption that both the objective and the constraint functions are twice continuously differentiable. Based on previous works on the set-oriented hypervolume Hessian matrix and the hypervolume Newton method for unconstrained MOPs, we propose the generalization of the HVN to equality-constrained problems and also elaborate a treatment for inequality constraints based on an active set approach, which regards an inequality constraint as an equality if its value is within some small tolerance. In addition, we devised and tested two resulting algorithms: the standalone HVN method as an efficient local optimizer and a hybridization of the HVN and an MOEA for solving complicated and multi-modal MOPs. Moreover, we discuss in detail the search direction for dominated points obtained from the set-oriented Newton step, where we prove that, for dominated and infeasible points, the computed search step is the Newton step of the squared equality constraint function. Therefore, our HVN method can efficiently steer both non-dominated and dominated decision points.
We first illustrate the empirical behavior of the standalone algorithm on three simple MOPs, where we observe quadratic convergence of the two-norm of the root-finding problem G. Then, on the highly multi-modal DTLZ problems with one spherical constraint (Eq-DTLZ), we test the local convergence of the standalone HVN algorithm with a relatively large approximation set ($\mu = 200$) by initializing the approximation set in a neighborhood of the Pareto set, which shows fast convergence to well-distributed points on the feasible set. Finally, we benchmark the hybrid algorithm against NSGA-III on the Eq-DTLZ1-4 and Eq-IDTLZ1-4 problems, where we observe that, with roughly the same computational budget, the hybrid algorithm achieves substantially more non-dominated points in the final population, which leads to significantly higher hypervolume values. We conjecture that this advantage is attributable to (1) the fast local convergence of the HVN method and (2) HVN's ability to move infeasible and dominated points.
For future works, we contemplate (1) testing the hybridization of the HVN method with other EMOAs for more than three objectives, e.g., SMS-EMOA, to investigate the benefit of the HVN method in a broader setup; (2) comparing the hybrid HVN method to other state-of-the-art algorithms, e.g., MOEA/D (decomposition-based), EHVI-EGO (Bayesian optimization), or the average Hausdorff distance-based Newton method (mathematical optimization) on complex, or even real-world MOPs with multiple non-linear constraint functions; (3) investigating the analytical expression (as sketched in Figure 1) and computation of the hypervolume Hessian matrix, which can reduce the computation cost of the HVN method; (4) devising generic methodologies to handle inequality constraints for the HVN method, which will make it more applicable in practice; (5) extending the HVN to methods that provide non-zero sub-gradients for dominated points as in [17,18]; and (6) incorporating a surrogate-assisted method for tackling high-dimensional and complex problems, e.g., as in [57].

Author Contributions

Conceptualization, O.S., H.W., A.D., M.E. and V.A.S.H.; methodology, O.S. and H.W.; software, M.E.; geometrical analysis and visualization, H.W. and V.A.S.H.; validation, H.W., M.E. and O.S.; formal analysis, H.W., O.S. and M.E.; investigation, H.W. and O.S.; resources, O.S.; data curation, H.W.; writing—original draft preparation, all; writing—review and editing, all; visualization, H.W.; supervision, O.S. and M.E.; project administration, O.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

We have hosted all the data sets of this work on Zenodo: https://doi.org/10.5281/zenodo.7509148, accessed on 31 October 2022.

Acknowledgments

We dedicate this work to Kalyanmoy Deb for his pioneering, inspiring, and fundamental contributions to the evolutionary multi-objective optimization (EMO) community, and particularly for his famous non-dominated sorting procedure, which plays a crucial role in this work in order to efficiently handle dominated points that can be generated throughout the Newton iteration.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Stewart, T.; Bandte, O.; Braun, H.; Chakraborti, N.; Ehrgott, M.; Göbelt, M.; Jin, Y.; Nakayama, H. Real-World Applications of Multiobjective Optimization. In Proceedings of the Multiobjective Optimization, Lecture Notes in Computer Science; Slowinski, R., Ed.; Springer: Berlin/Heidelberg, Germany, 2008; Volume 5252, pp. 285–327. [Google Scholar]
  2. Deb, K. Evolutionary multi-objective optimization: Past, present and future. In Proceedings of the 22nd Annual Conference on Genetic and Evolutionary Computation (GECCO ’20), Cancún, Mexico, 8–12 July 2020; pp. 343–372. [Google Scholar]
  3. Aguilera-Rueda, V.J.; Cruz-Ramírez, N.; Mezura-Montes, E. Data-Driven Bayesian Network Learning: A Bi-Objective Approach to Address the Bias-Variance Decomposition. Math. Comput. Appl. 2020, 25, 37. [Google Scholar] [CrossRef]
  4. Frausto-Solis, J.; Hernández-Ramírez, L.; Castilla-Valdez, G.; González-Barbosa, J.J.; Sánchez-Hernández, J.P. Chaotic Multi-Objective Simulated Annealing and Threshold Accepting for Job Shop Scheduling Problem. Math. Comput. Appl. 2021, 26, 8. [Google Scholar] [CrossRef]
  5. Utz, S.; Wimmer, M.; Hirschberger, M.; Steuer, R.E. Tri-criterion inverse portfolio optimization with application to socially responsible mutual funds. Eur. J. Oper. Res. 2014, 234, 491–498. [Google Scholar] [CrossRef] [Green Version]
  6. Estrada-Padilla, A.; Lopez-Garcia, D.; Gómez-Santillán, C.; Fraire-Huacuja, H.J.; Cruz-Reyes, L.; Rangel-Valdez, N.; Morales-Rodríguez, M.L. Modeling and Optimizing the Multi-Objective Portfolio Optimization Problem with Trapezoidal Fuzzy Parameters. Math. Comput. Appl. 2021, 26, 36. [Google Scholar] [CrossRef]
  7. Van Veldhuizen, D.A. Multiobjective Evolutionary Algorithms: Classifications, Analyses, and New Innovations; Technical Report; Air Force Institute of Technology: Wright-Patterson AFB, OH, USA, 1999. [Google Scholar]
  8. Coello, C.A.C.; Cortés, N.C. Solving Multiobjective Optimization Problems Using an Artificial Immune System. Genet. Program. Evolvable Mach. 2005, 6, 163–190. [Google Scholar] [CrossRef]
  9. Schütze, O.; Esquivel, X.; Lara, A.; Coello, C.A.C. Using the averaged Hausdorff distance as a performance measure in evolutionary multiobjective optimization. IEEE Trans. Evol. Comput. 2012, 16, 504–522. [Google Scholar] [CrossRef]
  10. Zitzler, E.; Thiele, L.; Laumanns, M.; Fonseca, C.M.; da Fonseca, V.G. Performance assessment of multiobjective optimizers: An analysis and review. IEEE Trans. Evol. Comput. 2003, 7, 117–132. [Google Scholar] [CrossRef] [Green Version]
  11. Auger, A.; Bader, J.; Brockhoff, D.; Zitzler, E. Theory of the hypervolume indicator: Optimal μ-distributions and the choice of the reference point. In Proceedings of the Foundations of Genetic Algorithms, 10th ACM SIGEVO International Workshop, FOGA 2009, Orlando, FL, USA, 9–11 January 2009; Garibay, I.I., Jansen, T., Wiegand, R.P., Wu, A.S., Eds.; ACM: New York, NY, USA, 2009; pp. 87–102. [Google Scholar] [CrossRef] [Green Version]
  12. Beume, N.; Naujoks, B.; Emmerich, M.T.M. SMS-EMOA: Multiobjective selection based on dominated hypervolume. Eur. J. Oper. Res. 2007, 181, 1653–1669. [Google Scholar] [CrossRef]
  13. Bader, J.; Zitzler, E. HypE: An Algorithm for Fast Hypervolume-Based Many-Objective Optimization. Evol. Comput. 2011, 19, 45–76. [Google Scholar] [CrossRef]
  14. Schütze, O.; Domínguez-Medina, C.; Cruz-Cortés, N.; de la Fraga, L.G.; Sun, J.Q.; Toscano, G.; Landa, R. A scalar optimization approach for averaged Hausdorff approximations of the Pareto front. Eng. Optim. 2016, 48, 1593–1617. [Google Scholar] [CrossRef]
  15. Emmerich, M.; Deutz, A.; Beume, N. Gradient-Based/Evolutionary Relay Hybrid for Computing Pareto Front Approximations Maximizing the S-Metric. In Proceedings of the Hybrid Metaheuristics; Bartz-Beielstein, T., Blesa Aguilera, M.J., Blum, C., Naujoks, B., Roli, A., Rudolph, G., Sampels, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; pp. 140–156. [Google Scholar]
  16. Emmerich, M.; Deutz, A.H. Time Complexity and Zeros of the Hypervolume Indicator Gradient Field. In Proceedings of the EVOLVE—A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation III EVOLVE 2012, Mexico City, Mexico, 7–9 August 2012; Studies in Computational Intelligence. Schuetze, O., Coello, C.A.C., Tantar, A., Tantar, E., Bouvry, P., Moral, P.D., Legrand, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 500, pp. 169–193. [Google Scholar] [CrossRef]
  17. Wang, H.; Ren, Y.; Deutz, A.; Emmerich, M. On steering dominated points in hypervolume indicator gradient ascent for bi-objective optimization. In NEO 2015; Springer: Berlin/Heidelberg, Germany, 2017; pp. 175–203. [Google Scholar]
  18. Deist, T.M.; Maree, S.C.; Alderliesten, T.; Bosman, P.A. Multi-objective optimization by uncrowded hypervolume gradient ascent. In Proceedings of the International Conference on Parallel Problem Solving from Nature; Springer: Berlin/Heidelberg, Germany, 2020; pp. 186–200. [Google Scholar]
  19. Wang, H.; Deutz, A.H.; Bäck, T.; Emmerich, M. Hypervolume Indicator Gradient Ascent Multi-objective Optimization. In Proceedings of the Evolutionary Multi-Criterion Optimization—9th International Conference, EMO 2017, Münster, Germany, 19–22 March 2017; Trautmann, H., Rudolph, G., Klamroth, K., Schütze, O., Wiecek, M.M., Jin, Y., Grimme, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2017; Volume 10173, pp. 654–669. [Google Scholar] [CrossRef]
  20. Sosa Hernández, V.A.; Schütze, O.; Emmerich, M. Hypervolume maximization via set based Newton’s method. In EVOLVE-A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation V; Springer: Berlin/Heidelberg, Germany, 2014; pp. 15–28. [Google Scholar]
  21. Sosa-Hernández, V.A.; Schütze, O.; Wang, H.; Deutz, A.H.; Emmerich, M. The Set-Based Hypervolume Newton Method for Bi-Objective Optimization. IEEE Trans. Cybern. 2020, 50, 2186–2196. [Google Scholar] [CrossRef] [PubMed]
  22. Petersen, K.B.; Pedersen, M.S. The matrix cookbook. Tech. Univ. Den. 2008, 7, 510. [Google Scholar]
  23. Hillermeier, C. Nonlinear Multiobjective Optimization: A Generalized Homotopy Approach; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2001; Volume 135. [Google Scholar]
  24. Miettinen, K. Nonlinear Multiobjective Optimization; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; Volume 12. [Google Scholar]
  25. Klamroth, K.; Tind, J.; Wiecek, M. Unbiased Approximation in Multicriteria Optimization. Math. Methods Oper. Res. 2002, 56, 413–437. [Google Scholar] [CrossRef]
  26. Fliege, J. Gap-free computation of Pareto-points by quadratic scalarizations. Math. Methods Oper. Res. 2004, 59, 69–89. [Google Scholar] [CrossRef]
  27. Eichfelder, G. Adaptive Scalarization Methods in Multiobjective Optimization; Springer: Berlin/Heidelberg, Germany, 2008. [Google Scholar]
  28. Das, I.; Dennis, J.E. Normal-Boundary Intersection: A New Method for Generating the Pareto Surface in Nonlinear Multicriteria Optimization Problems. SIAM J. Optim. 1998, 8, 631–657. [Google Scholar] [CrossRef] [Green Version]
  29. Fliege, J.; Drummond, L.G.; Svaiter, B.F. Newton’s method for multiobjective optimization. SIAM J. Optim. 2009, 20, 602–626. [Google Scholar] [CrossRef] [Green Version]
  30. Dellnitz, M.; Schütze, O.; Hestermeyer, T. Covering Pareto Sets by Multilevel Subdivision Techniques. J. Optim. Theory Appl. 2005, 124, 113–155. [Google Scholar] [CrossRef]
  31. Hernández, C.; Naranjani, Y.; Sardahi, Y.; Liang, W.; Schütze, O.; Sun, J.Q. Simple Cell Mapping Method for Multi-objective Optimal Feedback Control Design. Int. J. Dyn. Control. 2013, 1, 231–238. [Google Scholar] [CrossRef]
  32. Sun, J.Q.; Xiong, F.R.; Schütze, O.; Hernández, C. Cell Mapping Methods—Algorithmic Approaches and Applications; Springer: Berlin/Heidelberg, Germany, 2019. [Google Scholar]
  33. Zhang, Q.; Li, H. MOEA/D: A Multi-objective Evolutionary Algorithm Based on Decomposition. IEEE Trans. Evol. Comput. 2007, 11, 712–731. [Google Scholar] [CrossRef]
  34. Deb, K.; Jain, H. An Evolutionary Many-Objective Optimization Algorithm Using Reference-Point-Based Nondominated Sorting Approach, Part I: Solving Problems With Box Constraints. IEEE Trans. Evol. Comput. 2014, 18, 577–601. [Google Scholar] [CrossRef]
  35. Jain, H.; Deb, K. An Evolutionary Many-Objective Optimization Algorithm Using Reference-Point Based Nondominated Sorting Approach, Part II: Handling Constraints and Extending to an Adaptive Approach. IEEE Trans. Evol. Comput. 2014, 18, 602–622. [Google Scholar] [CrossRef]
  36. Deb, K.; Agrawal, S.; Pratap, A.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197. [Google Scholar] [CrossRef] [Green Version]
  37. Ishibuchi, H.; Masuda, H.; Nojima, Y. A Study on Performance Evaluation Ability of a Modified Inverted Generational Distance Indicator; Association for Computing Machinery: New York, NY, USA, 2015. [Google Scholar] [CrossRef]
  38. Dilettoso, E.; Rizzo, S.A.; Salerno, N. A Weakly Pareto Compliant Quality Indicator. Math. Comput. Appl. 2017, 22, 25. [Google Scholar] [CrossRef] [Green Version]
  39. Rudolph, G.; Schütze, O.; Grimme, C.; Domínguez-Medina, C.; Trautmann, H. Optimal averaged Hausdorff archives for bi-objective problems: Theoretical and numerical results. Comput. Optim. Appl. 2016, 64, 589–618. [Google Scholar] [CrossRef]
  40. Bogoya, J.M.; Vargas, A.; Schütze, O. The Averaged Hausdorff Distances in Multi-Objective Optimization: A Review. Mathematics 2019, 7, 894. [Google Scholar] [CrossRef] [Green Version]
  41. Schütze, O.; Dell’Aere, A.; Dellnitz, M. On Continuation Methods for the Numerical Treatment of Multi-Objective Optimization Problems. In Proceedings of the Practical Approaches to Multi-Objective Optimization; Number 04461 in Dagstuhl Seminar Proceedings; Branke, J., Deb, K., Miettinen, K., Steuer, R.E., Eds.; Internationales Begegnungs- und Forschungszentrum (IBFI): Schloss Dagstuhl, Germany, 2005; Available online: http://drops.dagstuhl.de/opus/volltexte/2005/349 (accessed on 1 November 2022).
  42. Martin, B.; Goldsztejn, A.; Granvilliers, L.; Jermann, C. On continuation methods for non-linear bi-objective optimization: Towards a certified interval-based approach. J. Glob. Optim. 2014, 64, 3–16. [Google Scholar] [CrossRef] [Green Version]
  43. Martín, A.; Schütze, O. Pareto Tracer: A predictor-corrector method for multi-objective optimization problems. Eng. Optim. 2018, 50, 516–536. [Google Scholar] [CrossRef]
  44. Schütze, O.; Cuate, O.; Martín, A.; Peitz, S.; Dellnitz, M. Pareto Explorer: A global/local exploration tool for many-objective optimization problems. Eng. Optim. 2020, 52, 832–855. [Google Scholar] [CrossRef]
  45. Beltrán, F.; Cuate, O.; Schütze, O. The Pareto Tracer for General Inequality Constrained Multi-Objective Optimization Problems. Math. Comput. Appl. 2020, 25, 80. [Google Scholar] [CrossRef]
  46. Bolten, M.; Doganay, O.T.; Gottschalk, H.; Klamroth, K. Tracing Locally Pareto-Optimal Points by Numerical Integration. SIAM J. Control. Optim. 2021, 59, 3302–3328. [Google Scholar] [CrossRef]
  47. Zitzler, E.; Thiele, L. Multiobjective Optimization Using Evolutionary Algorithms—A Comparative Case Study. In Proceedings of the Parallel Problem Solving from Nature—PPSN V, 5th International Conference, Amsterdam, The Netherlands, 27–30 September 1998; Lecture Notes in Computer Science. Eiben, A.E., Bäck, T., Schoenauer, M., Schwefel, H., Eds.; Springer: Berlin/Heidelberg, Germany, 1998; Volume 1498, pp. 292–304. [Google Scholar] [CrossRef]
  48. Emmerich, M.; Yang, K.; Deutz, A.H.; Wang, H.; Fonseca, C.M. A Multicriteria Generalization of Bayesian Global Optimization. In Advances in Stochastic and Deterministic Global Optimization; Pardalos, P.M., Zhigljavsky, A., Zilinskas, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2016; Volume 107, pp. 229–242. [Google Scholar] [CrossRef]
  49. DiBenedetto, E. Real Analysis; Springer: Berlin/Heidelberg, Germany, 2002. [Google Scholar]
  50. Paquete, L.; Schulze, B.; Stiglmayr, M.; Lourenço, A.C. Computing representations using hypervolume scalarizations. Comput. Oper. Res. 2022, 137, 105349. [Google Scholar] [CrossRef]
  51. Margossian, C.C. A Review of Automatic Differentiation and its Efficient Implementation. WIREs Data Mining Knowl. Discov. 2019, 9, e1305. [Google Scholar] [CrossRef] [Green Version]
  52. Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: Berlin/Heidelberg, Germany, 1999. [Google Scholar] [CrossRef] [Green Version]
  53. Baydin, A.G.; Pearlmutter, B.A.; Radul, A.A.; Siskind, J.M. Automatic Differentiation in Machine Learning: A Survey. J. Mach. Learn. Res. 2017, 18, 153:1–153:43. [Google Scholar]
  54. Griewank, A.; Walther, A. Evaluating Derivatives—Principles and Techniques of Algorithmic Differentiation, 2nd ed.; SIAM: Philadelphia, PA, USA, 2008. [Google Scholar] [CrossRef]
  55. Cuate, O.; Uribe, L.; Lara, A.; Schütze, O. A benchmark for equality constrained multi-objective optimization. Swarm Evol. Comput. 2020, 52, 100619. [Google Scholar] [CrossRef]
  56. Cuate, O.; Uribe, L.; Lara, A.; Schütze, O. Dataset on a Benchmark for Equality Constrained Multi-objective Optimization. Data Brief 2020, 29, 105130. [Google Scholar] [CrossRef]
  57. Fu, C.; Wang, P.; Zhao, L.; Wang, X. A distance correlation-based Kriging modeling method for high-dimensional problems. Knowl. Based Syst. 2020, 206, 106356. [Google Scholar] [CrossRef]
Figure 1. Example of a hypervolume indicator Hessian computation in three-dimensional objective space with a collection of points $\{\mathbf{y}^{(1)}, \mathbf{y}^{(2)}, \mathbf{y}^{(3)}\}$ and reference point $\mathbf{r}$.
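For readers who want to reproduce the setting of Figure 1 numerically, the following is a minimal sketch of computing the dominated hypervolume of three mutually non-dominated points in a three-dimensional objective space (minimization), using inclusion-exclusion over the axis-aligned boxes spanned by each point and the reference point. The concrete points and reference point below are illustrative placeholders, not the ones used in the paper, and the code is not the authors' implementation.

```python
import numpy as np

def box_volume(lower, upper):
    """Volume of the axis-aligned box [lower, upper]; zero if the box is empty."""
    edges = np.maximum(upper - lower, 0.0)
    return float(np.prod(edges))

def hypervolume_3pts(Y, r):
    """Dominated hypervolume (minimization) of up to three points Y w.r.t. reference r,
    via inclusion-exclusion over the boxes [y, r]."""
    Y = [np.asarray(y, dtype=float) for y in Y]
    r = np.asarray(r, dtype=float)
    hv = 0.0
    # volumes of the single boxes [y_i, r]
    for i in range(len(Y)):
        hv += box_volume(Y[i], r)
    # pairwise intersections: the box [max(y_i, y_j) componentwise, r]
    for i in range(len(Y)):
        for j in range(i + 1, len(Y)):
            hv -= box_volume(np.maximum(Y[i], Y[j]), r)
    # triple intersection
    if len(Y) == 3:
        hv += box_volume(np.maximum.reduce(Y), r)
    return hv

# Illustrative (hypothetical) points and reference point in 3D objective space
Y = [(0.2, 0.5, 0.7), (0.5, 0.2, 0.6), (0.7, 0.6, 0.1)]
r = (1.0, 1.0, 1.0)
print(hypervolume_3pts(Y, r))
```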
Figure 2. On problem P1, the convergence of the HVN method is shown for three different initializations of the starting approximation set ($\mu = 50$): linear (top row), logistic (middle row), and logit spacing (bottom row). We depict the final approximation set (left column; green stars), the corresponding objective points (middle column; green stars), and the evolution of the HV value and $G(X, \lambda)$ (right column).
Figure 3. On problem P2 with a spherical constraint, we depict, for three sizes of the approximation set ($\mu \in \{20, 40, 60\}$; from top to bottom), the final approximation set (left column; green stars), the corresponding objective points (middle column; green stars), and the evolution of the HV value and $G(X, \lambda)$ (right column). The initial points are sampled uniformly at random in the convex hull of the three points $(1, 1, 0)$, $(1, 1, 0)$, and $(1, 0, 0)$.
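The initialization described in the caption of Figure 3 (points drawn uniformly at random in the convex hull of three vertices) can be reproduced with barycentric weights drawn from a symmetric Dirichlet distribution. The sketch below is a generic illustration of this sampling scheme, not the code used in the paper; the vertex coordinates are placeholders.

```python
import numpy as np

def sample_in_triangle(vertices, n, rng=None):
    """Draw n points uniformly at random in the convex hull (triangle) of three vertices.

    Dirichlet(1, 1, 1) weights are uniform on the probability simplex, so the
    convex combinations below are uniform on the triangle spanned by the vertices."""
    rng = np.random.default_rng(rng)
    V = np.asarray(vertices, dtype=float)      # shape (3, d)
    W = rng.dirichlet(np.ones(3), size=n)      # shape (n, 3), rows sum to 1
    return W @ V                               # shape (n, d)

# Placeholder vertices; the actual vertices for problem P2 are those given in the caption
vertices = [(1.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
X0 = sample_in_triangle(vertices, n=20, rng=42)
```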
Figure 4. On problem P3 with a spherical constraint, we depict, for three sizes of the initial approximation set ($\mu \in \{20, 40, 60\}$; from top to bottom), the final approximation set (left column; green stars), the corresponding objective points (middle column; green stars), and the evolution of the HV value and $G(X)$ (right column). The initial decision points are sampled uniformly at random in the feasible space of $[0, 4] \times [-4, 4]^2$.
Figure 5. On the Eq-DTLZ1–3 problems, the HVN method starts from a small local perturbation (black crosses) of the Pareto set (a sphere in the decision space), i.e., $X^* + 0.02\,U(0, 1)$, where $X^*$ (of size 200) is sampled uniformly at random on the Pareto set. The final approximation set of the HVN method is depicted as green points. Only the first three search dimensions are shown for the decision space.
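The starting sets of Figure 5 combine a uniform sample on a sphere with the additive perturbation $0.02\,U(0,1)$. A minimal sketch of this construction is given below; the sphere center, radius, and number of decision variables are placeholder assumptions for illustration, not the exact problem data.

```python
import numpy as np

def perturbed_sphere_sample(n, dim, radius=0.5, center=0.5, rng=None):
    """Sample n points uniformly on a sphere (in the first three coordinates, as a
    placeholder for the Pareto set of the Eq-DTLZ problems) and add the perturbation
    0.02 * U(0, 1) described in the caption of Figure 5."""
    rng = np.random.default_rng(rng)
    X = np.full((n, dim), float(center))
    # uniform directions on the 2-sphere via normalized Gaussian vectors
    G = rng.standard_normal((n, 3))
    G /= np.linalg.norm(G, axis=1, keepdims=True)
    X[:, :3] = center + radius * G
    return X + 0.02 * rng.uniform(0.0, 1.0, size=(n, dim))

X0 = perturbed_sphere_sample(n=200, dim=11, rng=0)
```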
Figure 6. On the Eq-DTLZ2 (a) and Eq-IDTLZ1 (b) problems, we compare the hybridization of HVN and NSGA-III to standalone NSGA-III with roughly the same budget: the hybrid algorithm first executes NSGA-III with $\mu = 200$ for 1000 iterations and then runs the HVN method for 10 iterations. For HVN, the function evaluations and automatic differentiation (AD) take ca. 270 s of CPU time on an Intel(R) Core(TM) i5-8257U CPU, which corresponds to ca. $4.8 \times 10^5$ FEs. Hence, for the standalone NSGA-III, we set $3400$ $(= 4.8 \times 10^5 / 200 + 1000)$ iterations in total with $\mu = 200$. We use the same hyperparameter settings for the standalone NSGA-III as for the one used in the hybridization. The decision space is $[0, 1]^{11}$, and the reference point is $(1, 1, 1)$ for HVN.
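The iteration budget for the standalone NSGA-III run in Figure 6 follows directly from the function-evaluation count attributed to the HVN phase. A small sanity check of that arithmetic:

```python
# Budget matching used in Figure 6: convert the ~4.8e5 FEs spent by the HVN phase
# into additional NSGA-III iterations at population size mu = 200.
hvn_fes = 4.8e5
mu = 200
nsga3_base_iters = 1000
total_iters = int(hvn_fes / mu + nsga3_base_iters)
assert total_iters == 3400
```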
Table 1. The evolution of $G(X, \lambda)$ on problem P1 with three different initialization strategies.
Iteration | Linear | Logistic | Logit
1 | $4.23 \times 10^{1}$ | $4.55 \times 10^{1}$ | $4.20 \times 10^{1}$
2 | $2.33 \times 10^{1}$ | $2.54 \times 10^{1}$ | $2.27 \times 10^{1}$
3 | $8.81 \times 10^{0}$ | $1.01 \times 10^{1}$ | $8.52 \times 10^{0}$
4 | $8.19 \times 10^{0}$ | $7.82 \times 10^{0}$ | $8.30 \times 10^{0}$
5 | $2.29 \times 10^{0}$ | $2.17 \times 10^{0}$ | $2.29 \times 10^{0}$
6 | $1.06 \times 10^{-1}$ | $8.77 \times 10^{-2}$ | $1.11 \times 10^{-1}$
7 | $1.91 \times 10^{-4}$ | $3.48 \times 10^{-4}$ | $1.93 \times 10^{-3}$
8 | $7.38 \times 10^{-10}$ | $7.05 \times 10^{-7}$ | $1.03 \times 10^{-5}$
9 | $1.76 \times 10^{-14}$ | $1.06 \times 10^{-12}$ | $1.55 \times 10^{-10}$
10 | $1.62 \times 10^{-14}$ | $1.79 \times 10^{-14}$ | $2.33 \times 10^{-14}$
Table 2. The evolution of $G(X, \lambda)$ on problems P2 and P3.
Iteration | P2, $\mu = 20$ | P2, $\mu = 40$ | P2, $\mu = 60$ | P3, $\mu = 20$ | P3, $\mu = 40$ | P3, $\mu = 60$
1 | $1.365 \times 10^{1}$ | $1.055 \times 10^{1}$ | 18.036836 | $1.433 \times 10^{3}$ | 569.121097 | 438.983791
2 | $9.454 \times 10^{0}$ | $9.259 \times 10^{0}$ | 16.083916 | $9.368 \times 10^{2}$ | 541.523806 | 434.703270
3 | $1.247 \times 10^{1}$ | $8.977 \times 10^{0}$ | 16.403643 | $1.197 \times 10^{3}$ | 444.774066 | 365.952443
4 | $1.589 \times 10^{1}$ | $8.628 \times 10^{0}$ | 18.052126 | $9.522 \times 10^{2}$ | 261.636562 | 362.014326
5 | $9.791 \times 10^{0}$ | $6.888 \times 10^{0}$ | 12.364802 | $6.194 \times 10^{2}$ | 212.841570 | 341.897644
6 | $8.618 \times 10^{0}$ | $1.123 \times 10^{1}$ | 3.899254 | $5.232 \times 10^{2}$ | 145.076665 | 254.253017
7 | $8.024 \times 10^{0}$ | $8.779 \times 10^{0}$ | 11.323440 | $3.557 \times 10^{2}$ | 103.986300 | 240.719767
8 | $4.737 \times 10^{0}$ | $7.632 \times 10^{0}$ | 13.320606 | $2.419 \times 10^{2}$ | 57.592159 | 165.603954
9 | $9.037 \times 10^{-1}$ | $7.090 \times 10^{0}$ | 2.543622 | $1.511 \times 10^{2}$ | 12.628821 | 109.411195
10 | $7.393 \times 10^{-2}$ | $1.816 \times 10^{0}$ | 5.984437 | $8.527 \times 10^{1}$ | 0.104307 | 70.516402
11 | $1.182 \times 10^{-1}$ | $2.660 \times 10^{-1}$ | 5.749496 | $3.732 \times 10^{1}$ | 0.097777 | 41.699152
12 | $4.399 \times 10^{-2}$ | $2.877 \times 10^{-1}$ | 0.702964 | $3.248 \times 10^{0}$ | 0.097013 | 19.525977
13 | $1.535 \times 10^{-1}$ | $3.232 \times 10^{-2}$ | 2.240449 | $2.008 \times 10^{0}$ | 0.096634 | 0.447690
14 | $2.299 \times 10^{-1}$ | $3.694 \times 10^{-3}$ | 13.274468 | $1.829 \times 10^{0}$ | 0.096256 | 0.257345
15 | $7.425 \times 10^{-2}$ | $7.159 \times 10^{-2}$ | 15.201915 | $5.425 \times 10^{-2}$ | 0.005277 | 2.066379
16 | $1.572 \times 10^{-2}$ | $1.378 \times 10^{-2}$ | 11.273571 | $1.735 \times 10^{-1}$ | 0.002934 | 2.016149
17 | $4.216 \times 10^{-4}$ | $1.231 \times 10^{-3}$ | 3.318978 | $6.372 \times 10^{-6}$ | 0.001602 | 1.019636
18 | $1.630 \times 10^{-7}$ | $5.630 \times 10^{-4}$ | 2.818340 | $9.702 \times 10^{-3}$ | 0.001552 | 0.944297
19 | $1.674 \times 10^{-13}$ | $5.454 \times 10^{-4}$ | 0.400360 | $9.373 \times 10^{-5}$ | 0.001528 | 4.904926
20 | $1.733 \times 10^{-13}$ | $5.411 \times 10^{-4}$ | 0.335107 | $2.546 \times 10^{-8}$ | 0.001522 | 2.937413
21 | $1.803 \times 10^{-13}$ | $4.697 \times 10^{-4}$ | 0.074058 | $7.243 \times 10^{-12}$ | 0.001516 | 3.118031
22 | $1.761 \times 10^{-13}$ | $5.901 \times 10^{-4}$ | 0.081798 | $8.897 \times 10^{-12}$ | 0.001139 | 0.336917
23 | $1.384 \times 10^{-13}$ | $6.140 \times 10^{-5}$ | 0.057776 | $6.654 \times 10^{-12}$ | 0.001072 | 0.004270
24 | $1.020 \times 10^{-13}$ | $4.508 \times 10^{-7}$ | 0.029809 | $7.210 \times 10^{-12}$ | 0.001010 | 0.000866
25 | $9.765 \times 10^{-14}$ | $2.759 \times 10^{-11}$ | 0.001956 | $5.851 \times 10^{-12}$ | 0.000994 | 0.000489
26 | $9.788 \times 10^{-14}$ | $7.794 \times 10^{-13}$ | 0.081949 | $5.851 \times 10^{-12}$ | 0.000990 | 0.000477
27 | $1.177 \times 10^{-13}$ | $7.767 \times 10^{-13}$ | 0.031275 | $5.851 \times 10^{-12}$ | 0.000954 | 0.000459
28 | $1.176 \times 10^{-13}$ | $7.688 \times 10^{-13}$ | 0.000492 | $5.851 \times 10^{-12}$ | 0.128140 | 0.000460
29 | $1.052 \times 10^{-13}$ | $5.918 \times 10^{-13}$ | 0.000053 | $5.851 \times 10^{-12}$ | 0.252721 | 0.000460
30 | $1.314 \times 10^{-13}$ | $6.821 \times 10^{-13}$ | 0.000002 | $5.851 \times 10^{-12}$ | 0.022233 | 0.000460
Table 3. On the Eq-DTLZ1–4 and Eq-IDTLZ1–4 problems, the sample mean and standard error of the hypervolume (HV) value and of the number of final non-dominated (ND) points over 15 independent runs for each algorithm. The hypervolume values are computed with reference point $(1, 1, 1)$ for all problems except Eq-DTLZ4, Eq-IDTLZ3, and Eq-IDTLZ4, for which we use $(1.2, 5 \times 10^{-3}, 5 \times 10^{-4})$, $(800, 800, 700)$, and $(0.4, 0.6, 0.6)$, respectively. The initial population size is $\mu = 200$ for all algorithms. Hybridization = NSGA-III (iter = 1000) + HVN (iter = 10), which consumes roughly the same CPU time on function evaluations as NSGA-III for 3400 iterations (see the caption of Figure 6 for details).
Algorithm | HV (Eq-DTLZ1) | #ND (Eq-DTLZ1) | HV (Eq-DTLZ2) | #ND (Eq-DTLZ2) | HV (Eq-DTLZ3) | #ND (Eq-DTLZ3) | HV (Eq-DTLZ4) | #ND (Eq-DTLZ4)
NSGA-III (1000) | 0.867 ± 1.4 $\times 10^{-3}$ | 28.4 ± 0.7 | 0.297 ± 1.9 $\times 10^{-3}$ | 32.7 ± 0.9 | 0.292 ± 1.9 $\times 10^{-3}$ | 26.0 ± 1.0 | 8.4 $\times 10^{-4}$ ± 7.0 $\times 10^{-5}$ | 12.3 ± 0.8
Hybridization | 0.876 ± 2.4 $\times 10^{-4}$ | 80.9 ± 2.0 | 0.324 ± 3.6 $\times 10^{-4}$ | 95.3 ± 1.9 | 0.321 ± 6.6 $\times 10^{-4}$ | 75.2 ± 2.4 | 1.1 $\times 10^{-3}$ ± 5.1 $\times 10^{-5}$ | 200.0 ± 0.0
NSGA-III (3400) | 0.873 ± 4.5 $\times 10^{-4}$ | 38.5 ± 1.3 | 0.304 ± 9.2 $\times 10^{-4}$ | 32.6 ± 0.9 | 0.301 ± 1.1 $\times 10^{-3}$ | 30.1 ± 0.7 | 9.2 $\times 10^{-4}$ ± 5.2 $\times 10^{-5}$ | 14.5 ± 0.6

Algorithm | HV (Eq-IDTLZ1) | #ND (Eq-IDTLZ1) | HV (Eq-IDTLZ2) | #ND (Eq-IDTLZ2) | HV (Eq-IDTLZ3) | #ND (Eq-IDTLZ3) | HV (Eq-IDTLZ4) | #ND (Eq-IDTLZ4)
NSGA-III (1000) | 0.517 ± 1.8 $\times 10^{-2}$ | 23.2 ± 0.5 | 3.224 ± 2.0 $\times 10^{-2}$ | 74.1 ± 1.2 | 1.5 $\times 10^{9}$ ± 8.0 $\times 10^{6}$ | 81.7 ± 1.6 | 8.4 $\times 10^{-4}$ ± 7.0 $\times 10^{-5}$ | 12.3 ± 0.8
Hybridization | 0.534 ± 1.5 $\times 10^{-3}$ | 112.1 ± 2.1 | 3.388 ± 1.7 $\times 10^{-2}$ | 198.2 ± 0.4 | 1.6 $\times 10^{9}$ ± 5.4 $\times 10^{6}$ | 197.1 ± 0.4 | 1.1 $\times 10^{-3}$ ± 5.1 $\times 10^{-5}$ | 200.0 ± 0.0
NSGA-III (3400) | 0.529 ± 2.9 $\times 10^{-4}$ | 33.4 ± 0.4 | 3.359 ± 4.7 $\times 10^{-3}$ | 88.3 ± 0.4 | 1.5 $\times 10^{9}$ ± 2.5 $\times 10^{6}$ | 92.1 ± 0.8 | 9.2 $\times 10^{-4}$ ± 5.2 $\times 10^{-5}$ | 14.5 ± 0.6
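The "mean ± standard error over 15 independent runs" entries in Table 3 can be obtained from raw per-run indicator values as sketched below; the 15 hypervolume values shown are hypothetical placeholders for one (problem, algorithm) cell, not data from the paper.

```python
import numpy as np

def mean_and_standard_error(values):
    """Sample mean and standard error (sample std / sqrt(n)) of per-run indicator values."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    sem = values.std(ddof=1) / np.sqrt(values.size)
    return mean, sem

# Placeholder: 15 hypothetical per-run hypervolume values
hv_runs = [0.866, 0.868, 0.869, 0.865, 0.867, 0.870, 0.866, 0.868,
           0.867, 0.864, 0.869, 0.866, 0.868, 0.867, 0.865]
print(mean_and_standard_error(hv_runs))
```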
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
