Article

Non-Convex Optimization: Using Preconditioning Matrices for Optimally Improving Variable Bounds in Linear Relaxations

Victor Reyes 1,* and Ignacio Araya 2
1 Escuela de Informática y Telecomunicaciones, Universidad Diego Portales, Santiago 8370068, Chile
2 Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Valparaíso 2340000, Chile
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(16), 3549; https://doi.org/10.3390/math11163549
Submission received: 11 July 2023 / Revised: 8 August 2023 / Accepted: 10 August 2023 / Published: 17 August 2023
(This article belongs to the Special Issue Mathematical Modeling, Optimization and Machine Learning)

Abstract: The performance of branch-and-bound algorithms for solving non-convex optimization problems greatly depends on convex relaxation techniques. These generate convex regions which are used for improving the bounds of variable domains. In particular, convex polyhedral regions can be represented by a linear system A.x = b. Bounds of variable domains can then be improved by minimizing and maximizing each variable in the linear system. Optimally reducing or contracting variable domains in linear systems, however, is an expensive task: it requires solving up to two linear programs for each variable (one for each variable bound). Suboptimal strategies, such as preconditioning, may offer satisfactory approximations of the optimal reduction at a lower cost. In non-square linear systems, a preconditioner P can be chosen such that P.A is close to a diagonal matrix. Thus, the projection of the equivalent system P.A.x = P.b over x, by using an iterative method such as Gauss–Seidel, can significantly improve the contraction. In this paper, we show how to generate an optimal preconditioner, i.e., a preconditioner that helps the Gauss–Seidel method to optimally reduce the variable domains. Despite the cost of generating the preconditioner, it can be re-used in sub-regions of the search space without losing too much effectiveness. Experimental results show that, when used for reducing domains in non-square linear systems, the approach is significantly more effective than Gauss-based elimination techniques. Finally, the approach also shows promising results when used as a component of a solver for non-convex optimization problems.

1. Introduction

Non-convex optimization refers to the process of finding the minimum of a nonlinear function within a non-convex region, if it exists. In such regions, the function might have multiple local minima, maxima, or saddle points [1,2]. Usually, these kinds of problems can be defined as follows:
$$\min_{x \in \mathbf{x}} \; f(x) \quad \text{s.t.} \quad g(x) \leq 0, \tag{1}$$
with x ∈ R^n the set of variables varying in the box x (an interval x_i = [x̲_i, x̄_i] defines the set of reals x_i such that x̲_i ≤ x_i ≤ x̄_i; a box x is a Cartesian product of intervals x_1 × ⋯ × x_i × ⋯ × x_n), f: R^n → R a real-valued objective function, and g: R^n → R^m a set of inequality constraints. Notice that f and g may be non-convex functions. Interval-based branch-and-bound (B&B) techniques are commonly used for solving non-convex global optimization or constraint satisfaction problems. Still, the approach relies heavily on convex relaxation techniques [3]. Convex relaxations are used to transform the original non-convex problem into a convex one. This involves generating a convex and generally polyhedral region that contains the optimal solution of the original problem. The polyhedral region is represented by a non-square linear system A.x = b, where A ∈ R^{m×n}, x ∈ R^n, and b ∈ R^m. The vector x includes some variables from the original problem and auxiliary variables, with unbounded domains, corresponding to the inequalities. Once the relaxation is generated, the objective is to reduce or contract the variable domains of the original problem. Lower and upper bounds of variable domains can be found by minimizing and maximizing each variable of the linear system. The method, which solves the 2n linear programs, is called optimization-based bound tightening (obbt [4]) or PolytopeHull [5], and it is used by several global optimization solvers such as αBB [6], ANTIGONE [7], Couenne [8], LaGO [9], SCIP [10] and IbexOpt [5,11]. Due to its expensiveness, obbt is mostly applied at the root node and, within the search tree, only with limited frequency or based on its success rate. ANTIGONE, for instance, measures the success of obbt by the reduction in the box volume and disables it for all child nodes once the rate of reduction drops below a given threshold. The method is expensive, and some improvements have been proposed in order to: (1) reduce the number of linear programs to be solved; (2) accelerate the convergence of the simplex algorithm; and (3) generate projection inequalities that approximate the contraction performed by the 2n linear programs [12].
Machine learning techniques have also been applied to reduce the expensiveness of obbt. In [13], the authors propose a deep neural network (DNN) capable of predicting, from a convex relaxation of an AC optimal power flow problem, the subset of variables whose bound tightening can still contribute the best improvement to the relaxation. The results show promising outcomes for these kinds of problems, demonstrating a 6.3× speed-up in obbt run times. In another study [14], a different machine learning technique, deep value-based reinforcement learning, was used to enhance the simplex method. This was achieved by combining two well-known pivoting rules (Dantzig and Steepest Edge [15,16]) and using the algorithm's current status to decide when to switch between them. This approach also showed promising results for solving complex problems.
In this work, we deal with the same problem as obbt, i.e., we want to improve the domain bounds of a variable vector x by using a non-square linear system A . x = b . In other words, we want to find a minimal box x such that all the solutions of the system belong to x .
An interval variant of the Gauss–Seidel algorithm can be used for contracting x (i.e., reducing the domains of the variables). However, it does not work well without proper preconditioning [17]. In non-square systems, a preconditioning matrix P can be chosen such that P.A is close to a diagonal matrix. Thus, the projection of the equivalent system P.A.x = P.b over x, by using an iterative method such as Gauss–Seidel, can be significantly improved. Gauss–Seidel applies the following operation for contracting the domain of each variable x_k:
$$x_k \leftarrow x_k \cap \frac{1}{\hat a_{ik}}\left(b_i - \sum_{j=1,\, j \neq k}^{n} \hat a_{ij}\, x_j\right), \quad i \in \{1, \dots, m\} \tag{2}$$
where a ^ i j are the coefficients of the matrix A ^ = P . A .
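As a concrete illustration, the following minimal Python sketch implements step (2) for intervals represented as plain (lo, hi) pairs; the helper names are ours and do not come from any particular interval library, and outward rounding (which a rigorous interval implementation would need) is deliberately ignored.

```python
# Minimal sketch of the Gauss-Seidel contraction step (2); intervals are
# (lo, hi) tuples, A_hat is the preconditioned matrix P.A (list of rows),
# and b is a vector of reals. Outward rounding is ignored in this sketch.

def imul(a, iv):
    """Multiply the interval iv = (lo, hi) by the scalar a."""
    lo, hi = a * iv[0], a * iv[1]
    return (min(lo, hi), max(lo, hi))

def gauss_seidel_step(A_hat, b, x, i, k):
    """Contract x[k] with row i: x_k <- x_k ∩ (b_i - sum_{j!=k} a_ij*x_j)/a_ik."""
    lo, hi = b[i], b[i]
    for j, xj in enumerate(x):
        if j == k:
            continue
        t = imul(A_hat[i][j], xj)
        lo, hi = lo - t[1], hi - t[0]      # interval subtraction
    a = A_hat[i][k]
    if a == 0.0:
        return x[k]                        # this row gives no projection on x_k
    proj = imul(1.0 / a, (lo, hi))
    # intersection; an empty result (lo > hi) proves the box infeasible
    return (max(x[k][0], proj[0]), min(x[k][1], proj[1]))
```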
Techniques used for preconditioning non-square matrices include the Gauss–Jordan elimination method [18], which can provide a preconditioning matrix P by recording the row (or column) operations performed on A. This method constructs a pseudo-diagonal matrix in an iterative way, by selecting a subset of m variables as pivots. Usually, the current maximum absolute value of A is selected as the pivot in each step. In [19], the authors propose to select the pivot by using five priority rules (e.g., selecting columns with at least two values, selecting rows with fewer values, etc.). Another technique is the least squares method [20]. This technique constructs the system A^T·A.x = A^T·b, which provides us with the solution x = A^T(A·A^T)^{-1}·b, known as the Moore–Penrose pseudoinverse solution. However, the inverse (A·A^T)^{-1} may not exist if A is not full rank. In that case, the well-known singular-value decomposition (SVD) method [21] can be used in order to determine the pseudoinverse of A.
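For instance, a least-squares preconditioner can be obtained in a few lines with NumPy; this is only a sketch of the idea described above, using np.linalg.pinv, which internally relies on the SVD and therefore also covers the rank-deficient case:

```python
import numpy as np

# Sketch: Moore-Penrose preconditioner for an underdetermined system A.x = b.
# For a full-row-rank A, pinv(A) equals A^T (A A^T)^{-1}; otherwise the SVD
# machinery inside np.linalg.pinv still yields a pseudoinverse.
rng = np.random.default_rng(0)
A = rng.uniform(-10.0, 10.0, size=(3, 5))   # m = 3 equations, n = 5 variables

P = np.linalg.pinv(A)       # n x m preconditioner
A_hat = P @ A               # n x n; a projector onto the row space of A,
                            # close to (but not exactly) a diagonal matrix
```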
In this work, we propose three methods which construct preconditioning matrices in order to deal with non-square linear systems. Two of them are based on solving linear programs. The first one constructs an n × m matrix P by solving n linear programs. The solution of the i-th linear program corresponds to the i-th row of P and minimizes the size of the projection over the variable x_i (right part in (2)). The second method constructs a 2n × m matrix P by solving 2n linear programs. The solutions of the first n linear programs correspond to rows in P that maximize the lower bound of the projection over each of the variables. The other n linear programs minimize the upper bounds of the projections. Furthermore, this P is able to improve the domain bounds of the variables optimally, i.e., it leads to the smallest box that contains the feasible region. Finally, we realized that the problem of finding the optimal preconditioning matrix is equivalent to finding the dual feasible solutions of the 2n linear programs solved by obbt. An equivalent method is proposed in [12], where the authors, instead of generating a preconditioning matrix, directly generate a set of redundant inequalities for improving the bounds of the variables.
We also propose a heuristic for constructing preconditioners based on the Gauss–Jordan pivoting. It takes into account the current variable domains and constructs preconditioners that, when used with a Gauss–Seidel approach, offer a better contraction compared to the ones generated by other state-of-the-art heuristics.
The paper is organized as follows. Section 2 provides basic notions related to interval arithmetic and some remarks about linear systems. In Section 3 we present an example of using preconditioning for contracting a linear system. In Section 4 we describe in detail the three contributions of this paper. Section 5 reports the experimental results. Finally, Section 6 presents our conclusions and future work.

2. Background

In this section, we introduce some basic concepts related to interval arithmetic and interval linear systems. For more details and definitions, refer to [22].

2.1. Intervals

An interval x_i = [x̲_i, x̄_i] defines the set of reals x_i s.t. x̲_i ≤ x_i ≤ x̄_i, where x̲_i and x̄_i are floating-point numbers. The size or width of x_i is defined as wid(x_i) = x̄_i − x̲_i. mid(x_i) denotes the midpoint of x_i, i.e., mid(x_i) = (x̲_i + x̄_i)/2. A box x = (x_1, …, x_n) represents the Cartesian product of intervals x_1 × ⋯ × x_n. The size of a box is wid(x) = max_{x_i ∈ x} wid(x_i). The perimeter of a box is per(x) = Σ_{i=1}^n wid(x_i). The hull of a set of vectors in R^n corresponds to the minimal box containing all of these vectors.
Interval arithmetic defines the extension of unary and binary operators; for instance,
$$\begin{aligned}
x_1 + x_2 &= [\underline{x}_1 + \underline{x}_2,\; \overline{x}_1 + \overline{x}_2]\\
x_1 - x_2 &= [\underline{x}_1 - \overline{x}_2,\; \overline{x}_1 - \underline{x}_2]\\
x_1 \cdot x_2 &= [\min(\underline{x}_1\underline{x}_2,\, \underline{x}_1\overline{x}_2,\, \overline{x}_1\underline{x}_2,\, \overline{x}_1\overline{x}_2),\; \max(\underline{x}_1\underline{x}_2,\, \underline{x}_1\overline{x}_2,\, \overline{x}_1\underline{x}_2,\, \overline{x}_1\overline{x}_2)]\\
\log(x_1) &= [\log(\underline{x}_1),\; \log(\overline{x}_1)]
\end{aligned} \tag{3}$$
A function f: IR^n → IR is factorable if it can be computed in a finite number of simple steps, using unary and binary operators. An interval function f: IR^n → IR is said to be an extension of a real factorable function f to intervals if
$$\forall\, \mathbf{x} \in \mathbb{IR}^n, \quad f(\mathbf{x}) \supseteq \{ f(x) \mid x \in \mathbf{x} \}.$$
The optimal image f_opt(x) is the sharpest interval containing the image of f over x. There are several kinds of extensions; in particular, the natural extension f_N corresponds to mapping a real n-dimensional function f to intervals by using interval arithmetic.

2.2. Linear Systems

A linear system A.x = b is a set of m > 1 equations over a set of n > 1 variables. Without loss of generality, we consider A as a matrix of real coefficients with m rows and n columns; b ∈ R^m corresponds to an m-dimensional vector of real values and x ∈ R^n corresponds to the vector of variables. Initial domains of the variables are represented by a box x, which is generally reduced or contracted in order to converge to the hull of the solutions. Linear systems can be classified into three types: square, overdetermined and underdetermined.
Square systems are the most common and most studied type; in them, the number of linearly independent equations is equal to the number of variables. Cheap suboptimal methods can be used for contracting x. For instance, the Gauss–Seidel algorithm updates, at each step, the box x by performing the contraction step (2).
None of these methods works well without proper preconditioning. A good, but computationally expensive, preconditioning matrix when A is square corresponds to P = A^{-1}, i.e., the inverse of A. If A is not singular (nor numerically close to singular), then P.A will correspond to the identity matrix, and the problem can be directly solved: x = P.b.
The second type of linear system belongs to the overdetermined category. In this case, the number of equations is greater than the number of variables. As it is not possible to compute the inverse matrix (m ≠ n), other suboptimal methods, such as the Gauss–Jordan elimination technique [18], can be used for generating a preconditioner P through the row operations performed by the method. Gauss–Jordan elimination is a variant of the Gaussian elimination technique. This method is usually used to solve linear systems and to find the inverse of any invertible matrix. Gauss–Jordan transforms an n × n sub-matrix of A into a pseudo-identity matrix. The algorithm selects an element a_ij (known as the pivot) of A and, by performing row operations, leaves a_ij equal to 1 and the other elements of column j equal to 0. Notice that no element of column j can be selected as a pivot in later iterations.
A more recent technique, known as the subsquares approach, is proposed in [23]. This method sequentially extracts square n × n systems from the original overdetermined system, performing a contraction of x by using each one of them. As there are $\binom{m}{n}$ possible square subsystems, the authors propose a heuristic to select only a fraction of them. Even if it does not compute the hull, it gives good results compared to other classical approaches.
Finally, underdetermined linear systems have more variables than equations (i.e., m < n). In this case, if we apply Gauss–Jordan elimination, a matrix P.A = [I R] is generated, where I corresponds to an identity matrix of size m × m and R represents a residual m × (n − m) matrix.
As P . A is not diagonal, the contraction is not optimal. Small values for the residual matrix are preferred in order to obtain better contractions. Thus, the order in which the pivots are selected is crucial. A reasonable and widely used strategy for the pivoting process corresponds to selecting, in each iteration of the Gauss–Jordan elimination, the current maximum absolute value of the matrix A [24].
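A sketch of this max-absolute-value pivoting, written so that the row operations are mirrored in a matrix P, is shown below; this is our own illustrative implementation, not the one used in the paper's experiments.

```python
import numpy as np

def gauss_jordan_preconditioner(A):
    """Gauss-Jordan elimination with max-|a_ij| pivoting; returns P such
    that P @ A is the pseudo-diagonal reduced matrix."""
    m, n = A.shape
    M = A.astype(float).copy()
    P = np.eye(m)                          # accumulates the row operations
    free_rows, used_cols = list(range(m)), set()
    for _ in range(m):
        # pivot rule: current maximum absolute value among admissible cells
        best, bi, bj = 0.0, -1, -1
        for i in free_rows:
            for j in range(n):
                if j not in used_cols and abs(M[i, j]) > best:
                    best, bi, bj = abs(M[i, j]), i, j
        if bi < 0:
            break                          # numerically rank-deficient
        piv = M[bi, bj]
        M[bi] /= piv; P[bi] /= piv         # make the pivot equal to 1
        for i in range(m):
            if i != bi:
                f = M[i, bj]
                M[i] -= f * M[bi]          # cancel column bj in other rows
                P[i] -= f * P[bi]          # mirror the operation in P
        free_rows.remove(bi); used_cols.add(bj)
    return P
```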
In this paper we focus on dealing with this last type of linear systems, i.e., the underdetermined ones.

3. Example: Contracting a Linear System

In order to explain preconditioning and to describe the new proposed methods, we will consider an underdetermined linear system example. First, notice that a system of constraints a ≤ A.x + c ≤ b can be represented by a linear system A.x = y, where y ∈ [a − c, b − c] is an auxiliary vector of interval variables. A.x = y is equivalent to the linear system A′.x′ = 0, where x′ = (x, y) and A′ = [A −I], with I an identity matrix of size m × m. Thus, in the following, and without loss of generality, we deal with the problem A.x = 0, with A ∈ R^{m×n} and x ∈ R^n.
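Building this homogeneous form is mechanical; here is a short NumPy sketch (the interval bounds on y would be stored separately as the domains of the new variables):

```python
import numpy as np

def augment(A):
    """Return A' = [A  -I] so that A.x = y becomes A'.(x, y) = 0."""
    m = A.shape[0]
    return np.hstack([A, -np.eye(m)])
```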
Example 1.
Consider the following underdetermined linear system (coefficients and domains were randomly generated):
$$\begin{pmatrix} 7.31 & -6.95 & -5.28 & 4.90 & 0.09 \\ 1.01 & -3.03 & -5.77 & 8.12 & 9.43 \\ -6.72 & 1.34 & -5.25 & 9.96 & 1.09 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
where the domains are x_1 = [−1.565, 2.880], x_2 = [0.478, 4.463], x_3 = [−1.038, 6.032], x_4 = [0.048, 3.615] and x_5 = [−1.076, 2.647].
If we apply the Gauss–Seidel steps (2) directly, no contraction is achieved in any of the five variables. On the other hand, if we apply the Gauss–Jordan technique by pivoting on the maximum absolute value of A in each iteration, we obtain the following preconditioning matrix:
$$P = \begin{pmatrix} 0.091 & 0.004 & -0.048 \\ -0.069 & 0.113 & -0.058 \\ 0.069 & -0.009 & 0.073 \end{pmatrix}$$
Consequently, for P . A we obtain
$$P.A = \begin{pmatrix} 1 & -0.714 & -0.253 & 0 & 0 \\ 0 & 0.059 & 0.017 & 0 & 1 \\ 0 & -0.353 & -0.699 & 1 & 0 \end{pmatrix}$$
By applying contraction steps (2) on the new system P.A.x = P.b, we obtain a contraction in some domains: x_1 → [0.078, 2.880], x_2 → [0.478, 4.400], x_3 → [−1.038, 4.922] and x_5 → [−0.352, −0.009].

4. Toward an Optimal Contraction of Non-Square Linear Systems

In this section we describe in detail three new approaches for dealing with non-square linear systems A . x = b . All of them construct a preconditioning matrix P which attempts to improve the projection performed by the Gauss–Seidel contraction step (2).
The first proposal corresponds to an improvement of the Gauss-pivot selection heuristic by taking into account information of the box x . The second and third proposals aim to construct the preconditioning matrix by solving linear programs.

4.1. Improving the Gauss-Pivoting Heuristic

At the end of Section 2.2, we explained that the Gauss–Jordan technique can be used for generating a preconditioning matrix P for the system A.x = b. From (2) we can see that, in order to increase the likelihood of contracting a variable x_k, the interval evaluation of $\frac{1}{\hat a_{ik}} \left( b_i - \sum_{j,\, j \neq k} \hat a_{ij}\, x_j \right)$, where â_ij are the coefficients of the matrix Â = P.A, should be as tight as possible. The width of this interval is
$$\frac{1}{|\hat a_{ik}|} \sum_{j,\, j \neq k} |\hat a_{ij}| \cdot \text{wid}(x_j) \tag{4}$$
When applying Gauss–Jordan elimination, an indirect way of reducing this size is by pivoting the variable which maximizes the value of | a ^ i k | in each iteration. When doing this, in a way, we are selecting the row i for contracting x k : | a ^ i k | is the largest value in the row i, thus, according to (4), when | a ^ i k | is large, it is more likely we will obtain a tight projection over x k . In addition, the Gauss–Jordan method removes the coefficient related to this variable from the other rows, benefiting the contraction over the other variables. Notice that the P . A matrix in Example 1 has a distinct property. In each row, the pivoted value is 1 and corresponds to the largest absolute value. As a result, other values in the corresponding columns are canceled out, providing a beneficial projection over the other variables.
Following the same idea, we think that it is also relevant to take into account the width of the domain of the next pivoting variable. For instance, if the width of x_k in (2) is too small, the likelihood of contracting this variable will be small too. On the contrary, if x_k is large, it is more likely to contract its domain. If we normalize the variable domains to intervals e_j = [−1, 1], i.e., $e_j := \frac{x_j - \text{mid}(x_j)}{\text{wid}(x_j)}$, then the width of the projection over e_k is equal to:
$$\frac{1}{|\hat a_{ik}| \cdot \text{wid}(x_k)} \sum_{j,\, j \neq k} |\hat a_{ij}| \cdot \text{wid}(x_j) \tag{5}$$
Thus, we propose, as a pivoting heuristic, to select the element | a ^ i k | , such that | a ^ i k | · wid ( x k ) is maximized.
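In code, the only change with respect to the max-absolute-value rule sketched in Section 2.2 is the pivot score; a hypothetical helper (with intervals as (lo, hi) pairs) would be:

```python
def pivot_score(M, x, i, j):
    """Score of candidate pivot a_ij: |a_ij| * wid(x_j)."""
    return abs(M[i, j]) * (x[j][1] - x[j][0])
```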
Example 2.
If we use the Gauss–Jordan elimination method, using as the pivoting rule the value that maximizes the product |a_ij| · wid(x_j), in the example we obtain the following preconditioner:
$$P = \begin{pmatrix} -0.067 & 0.113 & -0.056 \\ -0.098 & 0.013 & -0.105 \\ 0.066 & 0.008 & -0.075 \end{pmatrix}$$
Consequently, for P . A we obtain
$$P.A = \begin{pmatrix} 0 & 0.050 & 0 & 0.025 & 1 \\ 0 & 0.505 & 1 & -1.428 & 0 \\ 1 & -0.586 & 0 & -0.361 & 0 \end{pmatrix}$$
Once we perform Equation (2), we obtain additional contraction compared to just pivoting on the cell with the largest absolute value: x_1 → [0.297, 2.880] ⊂ [0.078, 2.880] and x_5 → [−0.319, −0.025] ⊂ [−0.352, −0.009].

4.2. Linear-Based Preconditioning

Despite offering good projections over x , the Gauss–Jordan-based preconditioning methods rarely lead to optimal contractions. In this section, we describe two methods that directly focus on optimizing the projection over variables.

4.2.1. Minimizing the Size of the Interval Projection

First, we attempt to construct a preconditioning vector p = (p_1, p_2, …, p_m) such that the projection of the system p.A.x = 0 over the interval x_k has a minimum size. The values of the vector â = p.A are computed as:
$$\hat a_j = \sum_{i=1}^{m} p_i \cdot a_{ij}, \quad j = 1, \dots, n, \tag{6}$$
where a i j are the coefficients of matrix A. Thus, taking into account the interval projection size (4), a preconditioning vector p for minimizing this size (related to a variable x k ) can be generated by solving the following linear program:
$$\begin{aligned} \text{minimize} \quad & \sum_{j=1,\, j \neq k}^{n} |\hat a_j| \cdot \text{wid}(x_j) \\ \text{s.t.} \quad & \hat a_k = 1 \\ & \hat a_j = \sum_{i=1}^{m} p_i \cdot a_{ij}, \quad j = 1, \dots, n \end{aligned} \tag{7}$$
Notice that, by adding the constraint â_k = 1, we can remove the quotient 1/|â_ik| of Formula (4) from the objective function.
For constructing the preconditioning matrix P, we have to solve the linear program for each variable x_k that we want to contract and include the preconditioning vectors as rows in P.
In order to deal with the absolute values inside the objective function, we replace |â_j| by auxiliary variables u_j. Then, we add the constraints u_j ≥ â_j and u_j ≥ −â_j, and we solve the equivalent linear program by using the simplex algorithm.
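A possible assembly of this LP for scipy.optimize.linprog is sketched below; the variable layout z = (p_1..p_m, â_1..â_n, u_1..u_n) and all names are our own, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import linprog

def min_size_preconditioner_row(A, widths, k):
    """Solve LP (7) for variable k; returns the preconditioning vector p."""
    m, n = A.shape
    N = m + 2 * n                       # z = (p, a_hat, u)
    c = np.zeros(N)
    c[m + n:] = widths                  # minimize sum_j wid(x_j) * u_j ...
    c[m + n + k] = 0.0                  # ... excluding j = k
    # equalities: a_hat_j - sum_i p_i a_ij = 0, and a_hat_k = 1
    A_eq = np.zeros((n + 1, N)); b_eq = np.zeros(n + 1)
    for j in range(n):
        A_eq[j, m + j] = 1.0
        A_eq[j, :m] = -A[:, j]
    A_eq[n, m + k] = 1.0; b_eq[n] = 1.0
    # inequalities encoding u_j >= |a_hat_j|
    A_ub = np.zeros((2 * n, N)); b_ub = np.zeros(2 * n)
    for j in range(n):
        A_ub[j, m + j] = 1.0;      A_ub[j, m + n + j] = -1.0     #  a_hat_j <= u_j
        A_ub[n + j, m + j] = -1.0; A_ub[n + j, m + n + j] = -1.0 # -a_hat_j <= u_j
    bounds = [(None, None)] * (m + n) + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m]                    # the preconditioning vector p for x_k
```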
Example 3.
For contracting x 1 of Example 1, we would generate the following linear program:
$$\begin{aligned} \text{minimize} \quad & 4.45 u_1 + 3.98 u_2 + 7.07 u_3 + 3.57 u_4 + 3.72 u_5 \\ \text{s.t.} \quad & \hat a_1 = 1 \\ & u_r \geq \hat a_r, \quad r = 1, \dots, 5 \\ & u_r \geq -\hat a_r, \quad r = 1, \dots, 5 \\ & \hat a_1 = 7.31 p_1 + 1.01 p_2 - 6.72 p_3 \\ & \hat a_2 = -6.95 p_1 - 3.03 p_2 + 1.34 p_3 \\ & \hat a_3 = -5.28 p_1 - 5.77 p_2 - 5.25 p_3 \\ & \hat a_4 = 4.90 p_1 + 8.12 p_2 + 9.96 p_3 \\ & \hat a_5 = 0.08 p_1 + 9.43 p_2 + 1.09 p_3, \end{aligned}$$
with optimal solution p* = (0.066, 0.008, −0.075). By solving the linear programs related to the other variables, we would obtain the following preconditioner P:
$$P = \begin{pmatrix} 0.066 & 0.008 & -0.075 \\ -0.127 & -0.006 & 0.068 \\ -0.098 & 0.013 & -0.105 \\ 0.023 & -0.011 & 0.098 \\ -0.067 & 0.113 & -0.056 \end{pmatrix}$$
Finally, P . A is
$$P.A = \begin{pmatrix} 1 & -0.586 & 0 & -0.361 & 0 \\ -1.400 & 1 & 0.354 & 0 & 0 \\ 0 & 0.505 & 1 & -1.428 & 0 \\ -0.495 & 0 & -0.574 & 1 & 0 \\ 0 & 0.050 & 0 & 0.025 & 1 \end{pmatrix}$$
Compared to the pivoting heuristic in Example 2, the preconditioner obtained by solving linear programs offers additional contraction on x_2 → [0.478, 4.400] ⊂ [0.478, 4.463] and x_5 → [−0.316, −0.025] ⊂ [−0.319, −0.025].

4.2.2. Minimizing/Maximizing the Upper/Lower Bound of the Interval Projection

Extending the idea of the previous section, we now attempt to construct preconditioning vectors p = (p_1, p_2, …, p_m) such that the projection of the system p.A.x = 0 over the interval x_k has a minimum (maximum) upper (lower) bound. Considering that â = p.A, the upper bound of the projection of â.x = 0 over the interval x_k is
$$-\frac{1}{\hat a_k} \sum_{j,\, j \neq k}^{n} \underline{\hat a_j\, x_j} \tag{8}$$
Minimizing (8) is equivalent to maximizing $\sum_{j,\, j \neq k}^{n} \underline{\hat a_j\, x_j}$ with â_k = 1. We can replace $\underline{\hat a_j\, x_j}$ by auxiliary variables w_j and two inequalities: w_j ≤ â_j·x̲_j and w_j ≤ â_j·x̄_j. Finally, we obtain the following linear program, equivalent to minimizing (8):
$$\begin{aligned} \text{maximize} \quad & \sum_{j=1,\, j \neq k}^{n} w_j \\ \text{s.t.} \quad & \hat a_k = 1 \\ & \hat a_j = \sum_{i=1}^{m} p_i \cdot a_{ij}, \quad j = 1, \dots, n \\ & w_j \leq \hat a_j\, \underline{x}_j, \quad j = 1, \dots, n;\; j \neq k \\ & w_j \leq \hat a_j\, \overline{x}_j, \quad j = 1, \dots, n;\; j \neq k \end{aligned} \tag{9}$$
An opposite and analogous reasoning can be performed in order to obtain a linear problem for maximizing the lower bound of the projection of p . A . x = 0 over x k .
Then, solving the linear programs results in obtaining preconditioning vectors p. By means of Gauss–Seidel projections, each of these vectors is capable of improving one of the bounds of an interval x k . The obtained vectors p can be included in a preconditioning matrix P (duplicated vectors can be discarded).
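Under the same variable layout as the sketch in Section 4.2.1, LP (9) differs only in the objective and in the inequalities on the auxiliary variables w_j. Again, this is a hedged sketch of how one might assemble it, not the authors' code:

```python
import numpy as np
from scipy.optimize import linprog

def upper_bound_preconditioner_row(A, lo, hi, k):
    """Solve LP (9) for variable k; z = (p, a_hat, w)."""
    m, n = A.shape
    N = m + 2 * n
    c = np.zeros(N)
    c[m + n:] = -1.0                    # maximize sum w_j == minimize -sum w_j
    c[m + n + k] = 0.0                  # j = k is excluded from the objective
    A_eq = np.zeros((n + 1, N)); b_eq = np.zeros(n + 1)
    for j in range(n):
        A_eq[j, m + j] = 1.0
        A_eq[j, :m] = -A[:, j]          # a_hat_j = sum_i p_i a_ij
    A_eq[n, m + k] = 1.0; b_eq[n] = 1.0 # a_hat_k = 1
    A_ub = np.zeros((2 * n, N)); b_ub = np.zeros(2 * n)
    for j in range(n):
        A_ub[j, m + n + j] = 1.0;     A_ub[j, m + j] = -lo[j]      # w_j <= a_hat_j*lo_j
        A_ub[n + j, m + n + j] = 1.0; A_ub[n + j, m + j] = -hi[j]  # w_j <= a_hat_j*hi_j
    bounds = [(None, None)] * N
    bounds[m + n + k] = (0.0, 0.0)      # w_k is unused; pin it to zero
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m]
```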
Example 4.
By solving the two linear programs for each variable in Example 1, we obtain the following preconditioning matrix (recall that each row corresponds to a solution vector p of a linear program):
$$P = \begin{pmatrix} 0.066 & 0.008 & -0.075 \\ -0.127 & -0.006 & 0.068 \\ -0.098 & 0.013 & -0.105 \\ 0.023 & -0.011 & 0.098 \\ -0.067 & 0.113 & -0.056 \\ 0.069 & -0.009 & 0.073 \\ -0.061 & 0.113 & -0.062 \end{pmatrix}$$
The first five rows were obtained by minimizing the upper bounds of the projections, while the last two were obtained by maximizing the lower bounds of the projections. The missing rows correspond to duplicated ones. Finally, we obtain the following matrix P . A :
$$P.A = \begin{pmatrix} 1 & -0.586 & 0 & -0.361 & 0 \\ -1.400 & 1 & 0.354 & 0 & 0 \\ 0 & 0.505 & 1 & -1.428 & 0 \\ -0.495 & 0 & -0.574 & 1 & 0 \\ 0 & 0.050 & 0 & 0.025 & 1 \\ 0 & -0.353 & -0.699 & 1 & 0 \\ 0.083 & 0 & -0.003 & 0 & 1 \end{pmatrix}$$
Compared to the preconditioner obtained in Section 4.2.1, the preconditioner in this section offers additional contraction on x_5 → [−0.245, −0.025] ⊂ [−0.316, −0.025].
Proposition 1.
Let p be an optimal solution of the linear program (9). Then, by using the system p.A.x = 0 and Gauss–Seidel, we can optimally improve the upper bound of the interval x_k.
Proof. 
The optimal upper bound of an interval domain x_k is equivalent to the maximum value of x_k subject to the constraint system A.x = 0, i.e.,
$$\begin{aligned} \text{maximize} \quad & x_k \\ \text{s.t.} \quad & \sum_{j=1}^{n} a_{ij} \cdot x_j = 0, \quad i = 1, \dots, m \\ & \underline{x}_j \leq x_j \leq \overline{x}_j, \quad j = 1, \dots, n \end{aligned} \tag{10}$$
We first consider the dual problem of (10). Let π ∈ R^m be the vector associated with the equality constraints, l ∈ R^n the vector associated with the bound constraints x̲_j ≤ x_j, and u ∈ R^n the vector associated with the bound constraints x_j ≤ x̄_j. Thus, the dual problem of (10) can be stated as follows:
$$\begin{aligned} \text{minimize} \quad & \sum_{i=1}^{m} 0 \cdot \pi_i + \sum_{j=1,\, j \neq k}^{n} \underline{x}_j \cdot l_j - \sum_{j=1,\, j \neq k}^{n} \overline{x}_j \cdot u_j \\ \text{s.t.} \quad & \sum_{i=1}^{m} a_{ij} \cdot \pi_i + l_j - u_j = 0, \quad j = 1, \dots, n;\; j \neq k \\ & \sum_{i=1}^{m} a_{ik} \cdot \pi_i = -1 \\ & l, u \geq 0, \quad \pi \ \text{free} \end{aligned} \tag{11}$$
By defining p_i = −π_i and â_j = Σ_{i=1}^m p_i · a_{ij}, and by developing the previous linear program, we obtain
$$\begin{aligned} \text{maximize} \quad & \sum_{j=1,\, j \neq k}^{n} (\overline{x}_j \cdot u_j - \underline{x}_j \cdot l_j) \\ \text{s.t.} \quad & -\hat a_j + l_j - u_j = 0, \quad j = 1, \dots, n;\; j \neq k \\ & \sum_{i=1}^{m} p_i \cdot a_{ij} = \hat a_j, \quad j = 1, \dots, n;\; j \neq k \\ & \hat a_k = 1 \\ & l, u \geq 0, \quad p \ \text{free} \end{aligned} \tag{12}$$
Let w_j = x̄_j·u_j − x̲_j·l_j, for j ≠ k. Notice that, if the first constraint is multiplied by x̄_j, we obtain the following result:
$$\begin{aligned} \overline{x}_j \cdot \hat a_j + \overline{x}_j \cdot u_j &= \overline{x}_j \cdot l_j \quad && /\ \text{adding } -\underline{x}_j \cdot l_j \\ \overline{x}_j \cdot \hat a_j + w_j &= \overline{x}_j \cdot l_j - \underline{x}_j \cdot l_j \\ w_j &= -\overline{x}_j \cdot \hat a_j + l_j\,(\overline{x}_j - \underline{x}_j), \end{aligned}$$
as l_j ≥ 0, we can deduce that w_j ≥ −x̄_j·â_j. Using the same procedure, but multiplying by x̲_j instead, it can be deduced that w_j ≥ −x̲_j·â_j. Finally, we reach the same linear program stated in (9).
As finding the best preconditioning vector p for projecting over the upper bound of x_k is equivalent to the dual linear problem of finding an optimal upper bound of x_k, then, according to the duality theorem, the values of the optimal solutions are the same. In other words, by using the preconditioned system p.A.x = 0 and the Gauss–Seidel procedure, we can optimally improve the upper bound of x_k.    □
Proposition 1 can be extended to lower bounds in a straightforward way. It is important to highlight that an equivalent proposition was derived in [12] by directly using the duality theory of linear programming.

5. Experiments

For validating our approach, we first compare the different strategies for contracting variable domains related to linear systems (see Section 5.1). Then, we observe the contraction power of a preconditioner P in boxes which are smaller than the box used for generating P (see Section 5.2). Finally, we include the preconditioning-based strategies into a non-convex optimization solver and compare the results with those of a standard strategy (see Section 5.3).
For the first experiments, we generated several sets of benchmark instances by using a random linear system generator (https://github.com/vareyesr/linear-generator, accessed on 1 October 2022). The generator constructs rectangular linear systems A . x = b with n variables and m constraints ( n > m ). Each constraint i has the following structure:
$$\sum_{j=1}^{n} a_{ij}\, x_j = b_i,$$
where a_ij corresponds to a real value between −10 and 10.
Without loss of generality, we set b to the null vector (i.e., x equal to the null vector is always a solution of the problem). Additionally, for all the performed experiments, the number of variables was fixed at n = 20. The number of constraints m varied from 12 to 19. For each value of m, 20 systems were generated. The bounds of each variable domain were initially set to random values uniformly distributed in the range [−50, 0] for the lower bound and [0, 50] for the upper bound. It is important to note that each variable domain included the value 0 to prevent empty solutions or manifolds.
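A sketch of such a generator (our reconstruction of the setup described above, not the code in the linked repository) is:

```python
import numpy as np

def random_system(n=20, m=15, seed=0):
    """Random A.x = b with b = 0 and domains that always contain 0."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-10.0, 10.0, size=(m, n))
    lo = rng.uniform(-50.0, 0.0, size=n)   # lower bounds in [-50, 0]
    hi = rng.uniform(0.0, 50.0, size=n)    # upper bounds in [0, 50]
    return A, np.zeros(m), lo, hi
```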
All the strategies explained in the previous sections were incorporated into Ibex 2.8.9 [25], a C++ state-of-the-art library for constraint processing over real numbers.

5.1. Contracting Power

Figure 1 reports a comparison between the different strategies. The plot on the left side shows, on problems with different numbers of constraints, the average relative width of the most contracted interval w.r.t. the width of the optimally contracted interval. Considering that an input box is contracted to x′, the corresponding relative width is wid(x′_i)/wid(x*_i), where i = argmin_{i ∈ {1,…,n}} wid(x′_i) and x* is the optimally contracted box. The optimal contraction is performed by obbt.
On the other hand, the right plot shows the average relative perimeter of the contracted box, w.r.t. the perimeter of the optimally contracted box. The relative perimeter is computed analogously to the relative width.
All the strategies perform the Gauss–Seidel procedure for contracting x on the system P . A . x = 0 . The strategy Gauss max constructs the preconditioning matrix P by using the Gauss elimination method with the maximum heuristic, while the strategy Gauss max-diam constructs P by using the heuristic that takes into account the size of the interval domains x i . The strategy LP min-size constructs P by solving the linear programs (7) that minimize the size of the projection intervals. Recall that the strategy that constructs P by solving the linear programs (9) is optimal; thus, its results are not reported in the plots.
From the figure, we can see that the linear-based preconditioner is the closest to obbt. The largest difference in terms of relative width/perimeter between all the strategies occurs when the number of constraints is 17.
On the other hand, when the number of rows is 19, all the strategies are optimal. The reason is that, in this case, all the preconditioners behave like A^{-1}, as the number of rows of A is almost equal to the number of columns.
A similar situation occurs when the number of constraints is low. In this case, it seems that the initial domains are close to the global hull-consistency [26]. Global hull-consistency occurs when each domain bound is part of a solution.
Even if LP min-size is the one with the best contraction among the three strategies, its contraction is suboptimal; this is because the strategy uses only one preconditioning vector p for improving both bounds of each interval. Additionally, we can see that by taking into account the domain sizes in the Gauss pivoting heuristic (Gauss max-diam), we reach a significantly better contraction compared to its counterpart.

5.2. Sustainability

In a second series of experiments, we evaluate the sustainability of the approaches. That is, we want to know how long in the search we could use the same preconditioner P without losing too much effectiveness in contraction. In addition to the strategies used in the first series of experiments, we have included the strategy LP-opt, which constructs a preconditioning matrix that offers the same contraction power as obbt.
The experiment involves first generating a preconditioning matrix P using the same set of benchmarks and initial box  x as the previous experiment. Then, we arbitrarily and randomly reduce this box to a fraction of its original width ( 50 % , 10 % and 1 % ). Plots in Figure 2 report the average relative width obtained by the contraction performed by the strategies by using the reduced box. In this way, we simulate, in a certain way, what happens in an iteration of a B&B solver after some subdivisions and domain reductions of the initial box x . The optimal contraction is performed by obbt on the reduced box.
In the plots, we can observe that as the widths decrease, the strategies move away from the optimal contraction achieved by the reference strategy. Additionally, it is evident that the linear-based strategies consistently outperform the Gauss-based ones. Similar to the previous experiment, the Gauss-based strategy considering the size of the interval domains performs better than its counterpart, even when the size of the box is small.
When the width of the reduced box is 50% of the original width, we can see that the contraction performed by LP-opt is the best among all the strategies. However, when the width is small (10% or 1% of the original width), LP min-size is more effective than LP-opt. We think that, as the preconditioning vectors p generated by the strategy LP min-size focus on improving both bounds of a variable at the same time, they are probably more adaptable to changes in the interval domain bounds.

5.3. Non-Convex Optimization Problems

As a final series of experiments, we implemented a preconditioning-based method for filtering variable domains (obbt-gs α) and included it in an interval B&B global optimizer: IbexOpt [11]. In general, interval B&B methods solve non-convex optimization problems, such as (1), by performing a branch-and-bound schema from an initial node [3]. At each iteration, a box from the list of remaining boxes is selected and processed. Once such a selection is performed, the box is divided into two or more sub-boxes by using a bisection strategy. New boxes are then treated by one or several contractors. Contractors attempt to remove inconsistent values from the bounds of the intervals without losing solutions; this process is known as contraction. Contraction algorithms can be primarily divided into three categories: interval analysis contractors (e.g., Interval Newton [27]), constraint programming contractors (e.g., constraint propagation methods such as HC4/FBBT or 3BCID), and linear-relaxation-based contractors, such as obbt. Finally, the upper bounding method is applied. This method consists in finding feasible solutions in the box to be used for pruning the search space.
Thus, in the following experiment, we propose to use obbt-gs α instead of obbt as the linear-relaxation-based contractor of IbexOpt. Algorithm 1 shows the method. As input, obbt-gs α receives the current box x , the objective function f and the functions g related to the constraint system g ( x ) 0 . It returns a contracted box x c .
First, if the matrix P.A has not been created, or certain conditions (which will be explained later) are met, then the constraint system is linearized with a traditional linearization technique (e.g., AF2 [28] or XNewton [5]), resulting in a linear system A.x = b (note that b is an interval vector). Then, the method obbt contracts the box and generates the preconditioning matrix P by using the strategy LP-opt. The matrices P.A and P.b are computed and stored for future use. The variable x_p (the last preconditioned box) is updated to the current box x. On the other hand, if the conditions are not met, then the much cheaper Gauss–Seidel method is applied, using the stored P.A and P.b, for contracting the current box x instead of obbt.
Algorithm 1: The obbt-gs α contractor for reducing box domains.
The preconditioning matrix is recomputed, and the box is contracted by using obbt, if one of the following conditions is met (a sketch of this control flow is given after the list):
  • P.A has not been created.
  • It is not the turn for applying Gauss–Seidel: the user-defined parameter F indicates the frequency of applying Gauss–Seidel instead of obbt. For example, if F = 1/5, then GS_turn(F) returns true once every 5 calls.
  • The space related to the current box is not contained in the space used for computing the current preconditioning matrix, i.e., x ⊄ x_p. This occurs when the algorithm finishes a branch of the search tree and starts another one.
  • The box is too small compared to the last one used for preconditioning, i.e., wid(x) < α · wid(x_p), with 0 ≤ α ≤ 1 a user-defined parameter.
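The following Python-style sketch summarizes this control flow; every helper (gs_turn, linearize, obbt_with_duals, gauss_seidel, the box predicates) and the cached state are our own hypothetical names, standing in for the actual Ibex components:

```python
# Hedged sketch of the obbt-gs_alpha contractor; all helpers are assumed.
def obbt_gs_alpha(x, f, g, state, F, alpha):
    refresh = (
        state.PA is None                       # P.A has not been created yet
        or not gs_turn(F)                      # this call is an obbt turn
        or not contained_in(x, state.x_p)      # left the preconditioned region
        or wid(x) < alpha * wid(state.x_p)     # box too small w.r.t. x_p
    )
    if refresh:
        A, b = linearize(f, g, x)              # e.g., AF2 or XNewton relaxation
        x_c, P = obbt_with_duals(A, b, x)      # optimal contraction + duals
        state.PA, state.Pb, state.x_p = P @ A, P @ b, x_c
        return x_c
    return gauss_seidel(state.PA, state.Pb, x) # cheap reuse of the preconditioner
```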
Example 5.
Consider the following simple example to illustrate the algorithm:
$$\begin{aligned} \min \quad & f(x) = 0.4 x_1^2 + 2 x_1 + x_2 \\ \text{subject to:} \quad & g(x) = -x_1^3 + 5 x_1^2 - 10(x_1 + x_2) \leq 0. \end{aligned}$$
Suppose that, after performing some search, we reach a node where x_1 = [−5, 5] and x_2 = [−5, 5], and that the current best solution x̃ has a value of f(x̃) = 3. Since we are not interested in feasible suboptimal solutions, the additional constraint f(x) ≤ 3 is also considered in the system.
When Algorithm 1 processes the box, it linearizes the constraints. Figure 3-left graphically illustrates the example. Constraints are projected on the box, i.e., we plot f(x) = 3 and g(x) = 0. The feasible region corresponds to the area below the objective function curve and above the constraint curve. The optimal solution of the problem, i.e., x* = (−0.51, 0.66), is also plotted. Notice that g(x) is not convex and its linearization is only valid inside the box.
Straight lines represent the linearization of the constraints corresponding to
$$\begin{aligned} -2x_1 - x_2 &\in [-3, +\infty) \quad (\text{linearization of } f(x) \leq 3) \\ x_1 + x_2 &\in [0, +\infty) \quad (\text{linearization of } g(x) \leq 0). \end{aligned}$$
Next, obbt is applied to contract the variables x_1 and x_2 (see Figure 3-right). That is, bounds are found by maximizing and minimizing each variable over the linear system. The corresponding dual optimal solutions are used for generating the preconditioning vectors p_k for each bound (notice that this is equivalent to solving the linear program (9), according to the proof of Proposition 1).
Finally, the matrix P allows us to construct the linear system  P . A . x = P . b :
$$\begin{pmatrix} 1 & 1 & 0 & -1 \\ 0 & 1 & -1 & -2 \\ 1 & 0 & 1 & 1 \\ 2 & 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ w_1 \\ w_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$
Notice that w_1 ∈ [−3, +∞) and w_2 ∈ [0, +∞) correspond to auxiliary variables. Applying Gauss–Seidel on this system reaches the same contraction as obbt in the current box, and the system will be re-used in subsequent iterations of the algorithm (line 10 of the algorithm).
The set of instances (42 in total) was selected from the COCONUT benchmarks [29] for global optimization (https://arnold-neumaier.at/glopt/coconut/Benchmark/Benchmark.html, accessed on 1 December 2022). We selected all the instances solved by the default strategy (i.e., IbexOpt using obbt) in a time period between 2 and 3600 s. Details of these instances are shown in Table 1, where n represents the number of variables, m represents the number of constraints, and #nonlinear represents the number of nonlinear constraints. It is important to note that all of these benchmarks (1) have a nonlinear objective function, and (2) are differentiable except at certain points where discontinuities may exist.
In Figure 4, we report the average percentage difference in the number of processed boxes (left) and in CPU time (right) of the strategies w.r.t. the default strategy, for different values of F (the percentage difference is computed as (s − s*)/s* · 100, where s corresponds to the value reported by a strategy and s* to that of the default strategy). Notice that a negative value denotes that the strategy reduced its time or its number of boxes compared to the default strategy, while positive values indicate an increase. We consider the following strategies: obbt-gs α for different values of α, and obbt-gs. obbt-gs does not take into account the condition wid(x) < α · wid(x_p), i.e., α = 0.
From the results, we can see that as we increase the frequency of applying obbt (i.e., we reduce the value of F), the average percentage differences, evidently, approach 0. Although obbt-gs α seems to be less effective in contraction than obbt, when the value of F is adequately chosen, we may reach an improvement in CPU time. For instance, when F = 1/4, the increase in the search tree size is compensated by a reduction in the CPU time required for processing nodes, and we reach an average percentage difference of −6.6% (α = 10^{-3}).
In Figure 5, we report the relative gains of obbt-gs α w.r.t. the reference strategy, with different configurations of the parameters. The best results are reported when α = 10^{-3} and F = 1/4. When the condition wid(x) < α · wid(x_p) is not considered, i.e., α = 0, the best results are reported when F = 1/5. Additionally, obbt with F = 1/b consists in applying obbt b out of b+1 times, i.e., it is equivalent to obbt-gs with α = 0 but without using Gauss–Seidel.
Finally, Table 2 reports results on the most relevant instances. In this table, we considered benchmark instances which were solved by the reference strategy in a time greater than 2 s and having a difference, in terms of both the number of boxes and CPU time, of at least 10% compared with the other strategies (16 instances). The columns labeled Boxes (CPU) report the number of boxes (CPU time in seconds) required by each strategy. The table also presents the percentage difference in CPU time (Δt) compared to the default strategy. The strategy obbt-gs uses the parameter values F = 1/5 and α = 0.0, i.e., the best configuration found when α is fixed to 0.0. The strategy obbt-gs α uses the parameter values F = 1/4 and α = 10^{-3}, i.e., the best configuration found for the parameters. The last row reports the average relative gain in CPU time, for the considered set of instances, compared with the reference strategy obbt.
First, notice that obbt-gs α reports, on average, the greatest reductions in terms of CPU time (an average percentage difference of −12.7%). This shows that a well-preconditioned matrix can provide a better balance between contraction power and CPU cost compared to using the traditional obbt strategy, which, although it offers optimal contraction, is much more expensive. On the other hand, if we compare the results with obbt-gs, the parameter α seems to be important; that is, it seems important to update the preconditioning matrix when the boxes are too small compared to the box used in the previous update.
In some instances, such as ex6_2_8, ex8_5_2_1, hs100 and hs113, we can observe important gains in terms of CPU time, even if the number of boxes increases considerably w.r.t. the obbt strategy. This is due to the fact that the worsening of the contraction effectiveness is highly compensated by the time complexity reduction of the Gauss–Seidel method. On the other hand, notice that the reference strategy outperforms obbt-gs and obbt-gs α by more than 10% in CPU time in only two instances (dualc8 and chembis). In these cases, the time complexity reduction of the Gauss–Seidel method is not enough to compensate for the significant increase in the number of boxes.

6. Conclusions

In this work, we propose three methods for generating preconditioning matrices for non-square linear systems A.x = b. These preconditioners can be used for improving the contraction achieved by iterative methods such as the Gauss–Seidel algorithm. The first method generates a preconditioner by using a Gauss–Jordan elimination that takes into account the width of the intervals for selecting the next pivot. In this way, we select the row i maximizing |a_ik| · wid(x_k) for contracting x_k, instead of simply selecting the row maximizing |a_ik| as in previous approaches. Additionally, we have presented two preconditioners generated by solving linear programs. They are focused, respectively, on minimizing the interval size and on optimizing the bounds of the variable domains.
The experiments show promising results. On the one hand, by using the preconditioner based on Gauss–Jordan elimination, we obtain a better contraction than using its counterpart which selects the maximum coefficient of A for pivoting. On the other hand, the preconditioners generated by solving linear programs outperform the ones based on Gauss–Jordan elimination.
We also show that, by using the preconditioner focused on optimizing the interval bounds of a box x, we reach an optimal contraction of this box. In addition, when a smaller box x′ ⊆ x is contracted, although the preconditioner does not offer an optimal contraction, it is still better than the Gauss–Jordan-based strategies. Finally, we propose a simple contractor (obbt-gs α) that replaces obbt calls by Gauss–Seidel iterations on the system P.A.x = P.b when the current box is similar to the box used for generating the last available preconditioner P (and linearization A.x = b). Otherwise, obbt is applied normally and a new preconditioner (and linearization) is generated for future calls to the method. obbt-gs α shows promising results when included in a solver for non-convex optimization problems.
As a future work, we plan to design a more intelligent mechanism for updating P. It should update P only when it is needed, e.g., when some relevant coefficients in A or some relevant bounds of variable domains suffer significant changes. To achieve this, we propose exploring the use of deep learning mechanisms or machine learning algorithms to determine when it is necessary to update P. By training a model on historical data and monitoring changes in the problem structure, we can identify key indicators that trigger the need for a new preconditioner. This would optimize the usage of the obbt-gs α method, reducing unnecessary overheads while ensuring improved convergence rates when relevant changes occur.
Additionally, we aim to investigate the impact of different preconditioning techniques on a broader range of non-convex optimization problems. Understanding how the proposed preconditioners perform on various problem classes and problem sizes will provide valuable insights into their versatility and effectiveness in different scenarios.

Author Contributions

Validation, V.R.; Investigation, V.R. and I.A.; Writing—original draft, V.R. and I.A. All authors have read and agreed to the published version of the manuscript.

Funding

Victor Reyes is supported by Fondecyt project 11230225, and Ignacio Araya is supported by Fondecyt project 1200035.

Data Availability Statement

IbexOpt can be downloaded from https://github.com/ibex-team/ibex-lib (accessed on 1 October 2022). The benchmarks used in Section 5.1 and Section 5.2 were generated by using https://github.com/vareyesr/linear-generator (accessed on 1 October 2022). The benchmarks used in Section 5.3 can be found at https://arnold-neumaier.at/glopt/coconut/Benchmark/Benchmark.html (accessed on 1 December 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bertsekas, D. Nonlinear Programming, 3rd ed.; Athena Scientific Optimization and Computation Series; Athena Scientific: Belmont, MA, USA, 2016. Available online: http://www.athenasc.com/nonlinbook.html (accessed on 1 August 2023).
  2. Boyd, S.P.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
  3. Araya, I.; Reyes, V. Interval branch-and-bound algorithms for optimization and constraint satisfaction: A survey and prospects. J. Glob. Optim. 2016, 65, 837–866.
  4. Locatelli, M.; Schoen, F. Global Optimization: Theory, Algorithms and Applications; SIAM: Philadelphia, PA, USA, 2013.
  5. Araya, I.; Trombettoni, G.; Neveu, B. A contractor based on convex interval Taylor. In Proceedings of the International Conference on Integration of Artificial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming, Nantes, France, 28 May–1 June 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 1–16.
  6. Adjiman, C.S.; Dallwig, S.; Floudas, C.A.; Neumaier, A. A global optimization method, αBB, for general twice-differentiable constrained NLPs—I. Theoretical advances. Comput. Chem. Eng. 1998, 22, 1137–1158.
  7. Misener, R.; Floudas, C.A. ANTIGONE: Algorithms for continuous/integer global optimization of nonlinear equations. J. Glob. Optim. 2014, 59, 503–526.
  8. Belotti, P.; Lee, J.; Liberti, L.; Margot, F.; Wächter, A. Branching and bounds tightening techniques for non-convex MINLP. Optim. Methods Softw. 2009, 24, 597–634.
  9. Nowak, I.; Vigerske, S. LaGO: A (heuristic) branch and cut algorithm for nonconvex MINLPs. Cent. Eur. J. Oper. Res. 2008, 16, 127–138.
  10. Achterberg, T. SCIP: Solving constraint integer programs. Math. Program. Comput. 2009, 1, 1–41.
  11. Trombettoni, G.; Araya, I.; Neveu, B.; Chabert, G. Inner regions and interval linearizations for global optimization. In Proceedings of the AAAI, San Francisco, CA, USA, 7–11 August 2011.
  12. Gleixner, A.M.; Berthold, T.; Müller, B.; Weltge, S. Three enhancements for optimization-based bound tightening. J. Glob. Optim. 2017, 67, 731–757.
  13. Cengil, F.; Nagarajan, H.; Bent, R.; Eksioglu, S.; Eksioglu, B. Learning to accelerate globally optimal solutions to the AC Optimal Power Flow problem. Electr. Power Syst. Res. 2022, 212, 108275.
  14. Suriyanarayana, V.; Tavaslioglu, O.; Patel, A.B.; Schaefer, A.J. DeepSimplex: Reinforcement Learning of Pivot Rules Improves the Efficiency of Simplex Algorithm in Solving Linear Programming Problems. 2019. Available online: https://openreview.net/pdf?id=SkgvvCVtDS (accessed on 29 July 2023).
  15. Forrest, J.J.; Goldfarb, D. Steepest-edge simplex algorithms for linear programming. Math. Program. 1992, 57, 341–374.
  16. Dantzig, G.B.; Orden, A.; Wolfe, P. The generalized simplex method for minimizing a linear form under linear inequality restraints. Pac. J. Math. 1955, 5, 183–195.
  17. Niki, H.; Kohno, T.; Morimoto, M. The preconditioned Gauss–Seidel method faster than the SOR method. J. Comput. Appl. Math. 2008, 219, 59–71.
  18. Hansen, E.; Walster, G.W. Solving overdetermined systems of interval linear equations. Reliab. Comput. 2006, 12, 239–243.
  19. Ceberio, M.; Granvilliers, L. Solving nonlinear equations by abstraction, Gaussian elimination, and interval methods. In Proceedings of the International Workshop on Frontiers of Combining Systems, Santa Margherita Ligure, Italy, 8–10 April 2002; Springer: Berlin/Heidelberg, Germany, 2002; pp. 117–131.
  20. Abdi, H. The method of least squares. Encycl. Meas. Stat. 2007, 1, 530–532.
  21. Golub, G.H.; Reinsch, C. Singular value decomposition and least squares solutions. In Linear Algebra; Springer: Berlin/Heidelberg, Germany, 1971; pp. 134–151.
  22. Jaulin, L.; Kieffer, M.; Didrit, O.; Walter, É. Interval Analysis; Springer: Berlin/Heidelberg, Germany, 2001.
  23. Horáček, J.; Hladík, M. Subsquares approach—A simple scheme for solving overdetermined interval linear systems. In Proceedings of the International Conference on Parallel Processing and Applied Mathematics, Warsaw, Poland, 8–11 September 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 613–622.
  24. Domes, F.; Neumaier, A. Rigorous filtering using linear relaxations. J. Glob. Optim. 2012, 53, 441–473.
  25. Chabert, G.; Jaulin, L. Contractor programming. Artif. Intell. 2009, 173, 1079–1100.
  26. Benhamou, F.; Goualard, F.; Granvilliers, L.; Puget, J.F. Revising hull and box consistency. In Proceedings of the International Conference on Logic Programming, Las Cruces, NM, USA, 29 November–4 December 1999; Citeseer: State College, PA, USA, 1999.
  27. Moore, R.E. Methods and Applications of Interval Analysis; SIAM: Philadelphia, PA, USA, 1979.
  28. Ninin, J.; Messine, F.; Hansen, P. A reliable affine relaxation method for global optimization. 4OR 2015, 13, 247–277.
  29. Shcherbina, O.; Neumaier, A.; Sam-Haroud, D.; Vu, X.H.; Nguyen, T.V. Benchmarking global optimization and constraint satisfaction codes. In Proceedings of the Global Optimization and Constraint Satisfaction: First International Workshop COCOS 2002, Valbonne-Sophia Antipolis, France, 15–18 October 2002; Springer: Berlin/Heidelberg, Germany, 2003; pp. 211–222.
Figure 1. Average relative sizes of the contracted boxes w.r.t. the optimal contracted boxes. (left) Relative width of the most contracted variable; (right) relative perimeter after contraction.
Figure 2. Relative width of the most contracted interval on linear systems with a different number of constraints. Each strategy uses the initial box x for generating the preconditioning matrix P. Then, the contraction is performed on a randomly generated box x′ ⊆ x with (a) 50%, (b) 10% and (c) 1% of the width of x.
Figure 3. Example showing the constraints f(x) ≤ 3 and g(x) ≤ 0 projected over a box x = x_1 × x_2. Straight lines represent the linearization of the functions over the box. The point (−0.51, 0.66) corresponds to the solution minimizing f(x) subject to the constraint. On the right figure we can see, in blue, the box generated after applying obbt over the initial box and the linear system.
Figure 4. Performance profile. Comparison between the results reported by different configurations of α for the obbt-gs strategy: (left) percentage of contraction obtained by a strategy given a certain number of calls (F); (right) percentage of CPU time spent by a strategy.
Figure 5. Summary of the best results, i.e., obbt-gs with an α of 10^{-3} and obbt-gs without the α parameter. Additionally, obbt represents the curve where only obbt is applied (without Gauss–Seidel). (left) Percentage of contraction obtained by a strategy given a certain number of calls (F); (right) percentage of CPU time spent by a strategy.
Table 1. Details of the benchmark instances used in the experiments (n: number of variables; m: number of constraints; #nonlinear: number of nonlinear constraints).

Benchmark | n | m | #nonlinear
avgasa | 8 | 10 | 0
chembis | 11 | 4 | 0
dipigri | 7 | 4 | 4
dixchlng | 10 | 5 | 5
dualc8 | 8 | 15 | 0
ex2_1_7 | 20 | 10 | 0
ex2_1_8 | 24 | 20 | 0
ex2_1_9 | 10 | 1 | 0
ex5_4_4 | 27 | 19 | 6
ex6_1_3 | 12 | 9 | 6
ex6_1_3bis | 6 | 3 | 0
ex6_2_10 | 6 | 3 | 0
ex6_2_11 | 3 | 1 | 0
ex6_2_12 | 4 | 2 | 0
ex6_2_14 | 4 | 2 | 0
ex6_2_8 | 3 | 1 | 0
ex7_2_8 | 8 | 4 | 4
ex7_3_4bis | 7 | 14 | 2
ex7_3_5bis | 4 | 6 | 2
ex8_1_3 | 2 | 2 | 0
ex8_4_4-1 | 17 | 12 | 12
ex8_4_4bis | 5 | 4 | 0
ex8_4_5 | 15 | 11 | 11
ex8_4_5bis | 4 | 1 | 0
ex8_5_1 | 6 | 5 | 2
ex8_5_1-1 | 6 | 5 | 2
ex8_5_1bis | 7 | 6 | 2
ex8_5_2_1 | 6 | 5 | 2
ex8_5_4 | 5 | 4 | 2
ex8_5_5 | 5 | 4 | 2
ex8_5_6 | 6 | 4 | 2
hhfair | 27 | 25 | 6
hs056 | 7 | 4 | 4
hs100 | 7 | 4 | 4
hs113 | 10 | 8 | 8
hs119 | 16 | 8 | 0
hydro | 30 | 24 | 6
meanvar | 7 | 2 | 0
schwefel5 | 5 | 5 | 0
schwefel5-abs | 5 | 5 | 0
srcpm | 38 | 20 | 0
srcpm-1 | 39 | 20 | 0
Table 2. CPU times (in seconds) and number of boxes for the reference strategy obbt, obbt-gs (with parameter values F = 1/5 and α = 0.0) and obbt-gs α (with parameter values F = 1/4 and α = 10^{-3}).

Instance | obbt Boxes | obbt CPU | obbt-gs Boxes | obbt-gs CPU | obbt-gs Δt | obbt-gs α Boxes | obbt-gs α CPU | obbt-gs α Δt
ex6_2_12 | 7854 | 10.8 | 9854 | 11.1 | 3.3% | 10,290 | 10.1 | −6.2%
ex6_2_8 | 31,793 | 41.5 | 45,649 | 38.2 | −8.2% | 46,079 | 32.6 | −21.7%
ex8_4_4bis | 77,506 | 114 | 114,091 | 99.1 | −13.1% | 120,760 | 87.1 | −26.6%
ex8_5_2_1 | 9616 | 23.6 | 13,587 | 21.6 | −8.7% | 14,853 | 19.2 | −18.9%
schwefel5-abs | 6329 | 8.6 | 9485 | 6.27 | −27.1% | 11,283 | 7.97 | −7.3%
ex6_1_3 | 16,553 | 122 | 22,837 | 112.4 | −7.9% | 21,875 | 93.1 | −23.8%
srcpm | 337 | 7.02 | 597 | 6.3 | −10.1% | 600 | 6.30 | −10.3%
ex2_1_7 | 2515 | 17.9 | 2939 | 16.3 | −9.0% | 2609 | 15.1 | −15.8%
ex8_5_1 | 4135 | 10.1 | 5072 | 7.07 | −30.1% | 4899 | 8.41 | −16.8%
dixchlng | 1787 | 8.30 | 1935 | 6.71 | −19.7% | 2158 | 6.51 | −22.1%
hs100 | 2667 | 6.75 | 3753 | 5.71 | −15.4% | 4050 | 4.8 | −28.3%
hs113 | 4769 | 19.3 | 6817 | 16.3 | −15.9% | 7593 | 15.0 | −22.5%
hhfair | 1609 | 21.9 | 2132 | 18.9 | −13.8% | 2185 | 25.2 | 15.4%
dualc8 | 190,092 | 532 | 331,281 | 749 | 40.9% | 322,698 | 611 | 14.9%
chembis | 483,627 | 1425 | 802,579 | 1725 | 21.1% | 500,690 | 1462 | 2.6%
ex8_4_5 | 16,934 | 216 | 27,942 | 204 | −5.8% | 17,368 | 191 | −11.5%
 | | | | | avg: −8.5% | | | avg: −12.7%