Article

First-Order Conditions for Set-Constrained Optimization

Steven M. Rovnyak 1, Edwin K. P. Chong 2 and James Rovnyak 3
1 Department of Electrical and Computer Engineering, Indiana University-Purdue University, Indianapolis, IN 46202, USA
2 Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523, USA
3 Department of Mathematics, University of Virginia, Charlottesville, VA 22904, USA
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(20), 4274; https://doi.org/10.3390/math11204274
Submission received: 15 September 2023 / Revised: 3 October 2023 / Accepted: 10 October 2023 / Published: 13 October 2023
(This article belongs to the Special Issue Optimization Theory, Method and Application)

Abstract: A well-known first-order necessary condition for a point to be a local minimizer of a given function is the non-negativity of the dot product of the gradient and a vector in a feasible direction. This paper proposes a series of alternative first-order necessary conditions and corresponding first-order sufficient conditions that seem not to appear in standard texts. The conditions assume a nonzero gradient. The methods use extensions of the notions of gradient, differentiability, and twice differentiability. Examples, including one involving the Karush–Kuhn–Tucker (KKT) theorem, illustrate the scope of the conditions.

1. Introduction

Set-constrained and set-unconstrained optimization use several theorems that include a first-order necessary condition, as well as second-order conditions that are necessary and/or sufficient for a point to be a local minimizer [1,2]. These theorems require the objective function to be once or twice continuously differentiable.
The first-order necessary condition for unconstrained optimization requires the gradient to be zero at a minimizer [3,4,5]. The first-order necessary condition for set-constrained optimization requires the dot product of the gradient and a vector in a feasible direction to be non-negative. When constraints are defined in terms of differentiable functions, the first-order necessary condition takes the form of the first-order Lagrange and Karush–Kuhn–Tucker (KKT) conditions [2,6]. Reference [5] determined the solution to linear programming problems using a first-order necessary condition. Such conditions have been studied for control systems governed by ordinary differential equations [7], stochastic differential equations [8], and stochastic evolution equations [9].
Throughout, we assume a nonzero gradient. Our main results present a series of four sufficient conditions and four corresponding necessary conditions for a point to be a local minimizer of a given function f (Theorems 1–8). Each is of the first-order type, including those that assume twice differentiability. Theorems 1 and 2 describe the behavior of f in cones determined by the gradient. Theorems 3 and 4 replace the geometrical viewpoint by analytical conditions involving sequences. The analytical versions use generalizations of the notions of gradient and differentiability that simplify statements and proofs. Theorems 5 and 6 are refinements when f is twice differentiable. A previous version of this paper included a remark that the analytical conditions are unverifiable. However, the last two results, Theorems 7 and 8, which return to the geometrical view, are proved precisely by verifying the analytical conditions. They replace the original cones by larger regions that we call α-cones. An α-cone is an ordinary cone when α = 1 and a paraboloid when α = 2. The results fail for half-planes, and a paraboloid is a limiting case for what is possible. Example 5 illustrates a class of problems that do not meet the criteria for a strict local minimizer in the KKT theory but are covered by Theorem 7.
We remark that the cones used here are different from the cones of descent directions in [2], which are actually half-planes. Our sufficient conditions do not guarantee that a point with a nonzero gradient is a strict local minimizer on a half-plane. Example 1 gives a function whose gradient is nonzero at a point that is nevertheless not a strict local minimizer on a half-plane.
Convex optimization problems require the objective function to be convex, and convexity is essentially a second-order condition. The first-order conditions that we propose do not require the objective function to be convex. The requirement that a function be twice continuously differentiable is different from the condition of convexity, which constrains the values of the second derivatives.
Notation and terminology. Points of R^n are written as column vectors x = [x_1, …, x_n]^⊤ with norm ‖x‖ = (|x_1|^2 + ⋯ + |x_n|^2)^{1/2}. A subset of R^n is called a neighborhood of a point x* if it contains a disk ‖x − x*‖ < ε for some ε > 0, and a neighborhood of a set Ω ⊆ R^n if it is a neighborhood of every point in Ω. A point x* ∈ Ω is a strict local minimizer of a function f : Ω → R if there is an ε > 0 such that f(x) > f(x*) whenever x ∈ Ω and 0 < ‖x − x*‖ < ε, and a local minimizer if the condition holds with > replaced by ≥. Local maximizers and strict local maximizers are defined similarly by reversing the inequalities.

2. First-Order Conditions for Local Minimizers

The gradient of a real-valued function f defined on a subset Ω of R^n is defined by
∇f(x) = [∂f/∂x_1, …, ∂f/∂x_n]^⊤,
whenever the partial derivatives exist. Gradients appear in a standard first-order necessary condition for local minimizers. Consider a function f : Ω → R that is C^1 on a neighborhood of a set Ω ⊆ R^n and a point x* ∈ Ω. If x* is a local minimizer of f, then d^⊤ ∇f(x*) ≥ 0 for every vector d in a feasible direction, that is, a direction such that some straight line segment with endpoint x* lies entirely within Ω [1] (Theorem 6.1). Hence, if d^⊤ ∇f(x*) < 0 for some feasible direction d at x*, the standard necessary condition implies that x* is not a local minimizer. However, it may occur that x* has no feasible direction within Ω, and then it is impossible for the standard necessary condition to give such information. For example, feasible directions are impossible for any set Ω whose points have only rational coordinates. For an elementary example, consider the objective function f(x) = −x_1 on the set
Ω = { x ∈ R^2 : 0 ≤ x_1 ≤ 1, x_1^3 ≤ x_2 ≤ x_1^2 }.
Then, f attains a maximum value on Ω at x* = 0. The point x* admits no feasible direction within Ω (proof: any feasible direction must be along a line segment x_2 = c x_1, c > 0; then, for all sufficiently small positive x_1, x_1^3 ≤ c x_1 ≤ x_1^2, and hence c ≤ x_1 for all such x_1, which is impossible), and thus the standard necessary condition yields no information.
Feasible directions play no role in our results, and instead all that is needed is that ∇f(x*) ≠ 0. Theorem 2 (FONC 1) is a new first-order necessary condition for local minimizers. It is proved using a corresponding new first-order sufficient condition, Theorem 1 (FOSC 1). Corollary 1 is a companion sufficiency condition for strict local maximizers. In the example f(x) = −x_1 on the set Ω defined by (1) and x* = 0, the gradient is ∇f(0) = [−1, 0]^⊤, and therefore:
(1)
Theorem 2 is applicable and implies that 0 is not a strict local minimizer, because every opposite cone |x_2| ≤ c x_1, c > 0, contains points of Ω arbitrarily near 0.
(2)
Corollary 1 is applicable and implies that 0 is a strict local maximizer, because Ω is entirely contained in the opposite cone |x_2| ≤ x_1 (see the numerical sketch below).
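The following short Python sketch is ours and is not part of the original analysis; it uses the sign conventions reconstructed above (f(x) = −x_1). It samples points of the set (1) near 0 and checks that f never exceeds f(0) = 0 there and that every sampled point lies in the cone |x_2| ≤ x_1.

```python
# A quick numerical sanity check of the introductory example (a sketch, not from the
# paper): sample points of Omega = {0 <= x1 <= 1, x1^3 <= x2 <= x1^2} near 0 and
# confirm (i) f(x) = -x1 <= f(0) = 0, and (ii) every sampled point lies in the
# opposite cone |x2| <= x1.
import numpy as np

def f(x):
    return -x[0]

rng = np.random.default_rng(0)
for _ in range(10000):
    x1 = rng.uniform(0.0, 1.0)
    x2 = rng.uniform(x1**3, x1**2)          # x1^3 <= x2 <= x1^2
    x = np.array([x1, x2])
    assert f(x) <= f(np.zeros(2))           # 0 maximizes f on Omega
    assert abs(x[1]) <= x[0] + 1e-12        # Omega lies in the cone |x2| <= x1
print("All sampled points of Omega satisfy f(x) <= f(0) and |x2| <= x1.")
```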
We need a few more preliminaries. A real-valued function f on a subset Ω of R^n is said to be differentiable at x* if the domain of f contains a neighborhood of x*, all partial derivatives of f exist at x*, and the function r(x) defined by
f(x) − f(x*) = (x − x*)^⊤ ∇f(x*) + r(x)
satisfies
lim_{x→x*} r(x)/‖x − x*‖ = 0.
When n = 1 , this is equivalent to the existence of a derivative at the point, but in general the simple existence of partial derivatives does not imply differentiability. A convenient sufficient condition for differentiability is that f is defined and C 1 on some neighborhood of x * [10] (Th. 9 on p. 113). Differentiability is equivalent to the existence of a first-order Taylor approximation, as in [1] (Th. 5.5 on pp. 64–65) or [10] (Th. 2 on p. 160).
Any two nonzero vectors d_1, d_2 in R^n determine an angle θ ∈ [0, π] such that
cos(θ) = d_1^⊤ d_2 / (‖d_1‖ ‖d_2‖).
This notion is implicit in the definition of a cone in R n .
Definition 1. 
If x*, d ∈ R^n, d ≠ 0, and 0 < δ < 1, the set consisting of x* together with all points x ≠ x* in R^n that satisfy
(x − x*)^⊤ d / (‖x − x*‖ ‖d‖) ≥ δ
is denoted by K_δ(x*, d) and called a cone with vertex x* and direction d. The opposite cone is the cone with direction −d, namely
K_δ(x*, −d).
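As an illustration of Definition 1, a minimal membership test for K_δ(x*, d) might look as follows; this is our sketch, and the function name and interface are our own.

```python
# A minimal sketch (not from the paper) of a membership test for the cone
# K_delta(x_star, d) of Definition 1: x = x_star, or the angle condition
# (x - x_star)^T d >= delta * ||x - x_star|| * ||d|| holds.
import numpy as np

def in_cone(x, x_star, d, delta):
    """Return True if x lies in K_delta(x_star, d), with 0 < delta < 1 and d != 0."""
    v = np.asarray(x, float) - np.asarray(x_star, float)
    nv = np.linalg.norm(v)
    if nv == 0.0:                       # the vertex itself belongs to the cone
        return True
    return float(v @ d) >= delta * nv * np.linalg.norm(d)

# Example: with d = [1, 0] and delta = 1/sqrt(2), the cone is |x2| <= x1.
print(in_cone([1.0, 0.5], [0, 0], [1.0, 0.0], 1 / np.sqrt(2)))   # True
print(in_cone([1.0, 2.0], [0, 0], [1.0, 0.0], 1 / np.sqrt(2)))   # False
```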
Our first result is a first-order sufficient condition (FOSC) for a local minimizer.
Theorem 1 
(FOSC 1). Let f be a real-valued function that is defined and C^1 on a neighborhood of a point x* ∈ R^n such that d* = ∇f(x*) ≠ 0. Assume that Ω is a set in the domain of f that contains x* and is contained in some cone K_δ(x*, d*), δ ∈ (0, 1). Then, x* is a strict local minimizer of f on Ω.
We note a consequence that will be useful in Theorem 2.
Corollary 1. 
For f as in Theorem 1, x* is a strict local maximizer of f on any set in the domain of f that contains x* and is contained in K_δ(x*, −d*) for some δ ∈ (0, 1).
The corollary follows by applying Theorem 1 with f replaced by −f.
Proof of Theorem 1. 
Let Ω be a set in the domain of f that contains x* and is contained in K_δ(x*, d*), δ ∈ (0, 1). Since f is C^1 on a neighborhood of x*, f is differentiable at x*. Thus,
f(x) − f(x*) = (x − x*)^⊤ d* + r(x),
where lim_{x→x*} r(x)/‖x − x*‖ = 0. Therefore, we may choose η > 0 so that the punctured disk 0 < ‖x − x*‖ < η is contained in the domain of f and
|r(x)| / ‖x − x*‖ < δ ‖d*‖
whenever 0 < ‖x − x*‖ < η. Suppose x ∈ Ω and 0 < ‖x − x*‖ < η. Then, x ∈ K_δ(x*, d*), so (x − x*)^⊤ d* ≥ δ ‖x − x*‖ ‖d*‖. By (4),
f(x) − f(x*) ≥ δ ‖x − x*‖ ‖d*‖ + r(x).
Since 0 < ‖x − x*‖ < η, by (5), |r(x)| < δ ‖d*‖ ‖x − x*‖, and hence by (6),
f(x) − f(x*) > 0.
Therefore, x * is a strict local minimizer of f on Ω . □
A corresponding first-order necessary condition (FONC) is deduced with the aid of Corollary 1. We first introduce some useful terminology.
Definition 2. 
A point x* of Ω is called isolated if there exists an ε > 0 such that the punctured disk 0 < ‖x − x*‖ < ε contains no points of Ω, or, equivalently, there is no sequence {x_n}_{n=1}^∞ in Ω \ {x*} such that x_n → x*.
In the extreme case that a set consists of just one point, that point is isolated in the set because the alternative would imply the existence of other points in the set. Isolated points occur in our setting in the next result, which is illustrated in Figure 1.
Theorem 2 
(FONC 1). Let f be a real-valued function that is defined and C^1 on a neighborhood of a point x* ∈ R^n such that d* = ∇f(x*) ≠ 0. If x* is a strict local minimizer of f on some subset Ω of R^n, then x* is an isolated point of K_δ(x*, −d*) ∩ Ω for every δ ∈ (0, 1).
Proof. 
Suppose x* is a strict local minimizer of f on some subset Ω of R^n. We argue by contradiction to show that x* is an isolated point of K_δ(x*, −d*) ∩ Ω for every δ ∈ (0, 1). Assume that this conclusion is false for some δ ∈ (0, 1). Then, the cone K_δ(x*, −d*) contains a sequence x_1, x_2, x_3, … in Ω \ {x*} that converges to x*. Since x* is a strict local minimizer of f on Ω, there exists ε > 0 such that f(x) > f(x*) whenever 0 < ‖x − x*‖ < ε and x ∈ Ω. By the definition of convergence, ‖x_n − x*‖ < ε for all sufficiently large n, and hence f(x_n) > f(x*) for all sufficiently large n. However, by Corollary 1, x* is a strict local maximizer of f on K_δ(x*, −d*) ∩ Ω, so f(x_n) < f(x*) for all sufficiently large n, which is a contradiction. The result follows. □

3. Analytical Versions of the Conditions and Generalized Differentiability

In this section, we present analytical versions of the conditions for local minimizers given in Section 2. They are stated in a more general setting that uses extensions of the notions of differentiability and gradient to arbitrary sets. The analytical versions are in some ways more transparent and lead to generalizations of Theorems 1 and 2 in Section 5.
Definition 3 
(Generalized Differentiability). Let Ω be a subset of R^n. We say that a function f : Ω → R is differentiable at a point x* ∈ Ω if (1) x* is not an isolated point of Ω, and (2) there is a vector g(x*) in R^n such that the function r(x) defined by
f(x) − f(x*) = (x − x*)^⊤ g(x*) + r(x)
satisfies
lim_{x→x*} r(x)/‖x − x*‖ = 0,
or, equivalently, for every sequence {x_n}_{n=1}^∞ in Ω \ {x*} converging to x*, the sequence {r_n}_{n=1}^∞ defined by
f(x_n) − f(x*) = (x_n − x*)^⊤ g(x*) + r_n
satisfies
lim_{n→∞} r_n/‖x_n − x*‖ = 0.
Any such vector g(x*) is denoted by ∇f(x*) and called a gradient of f at x*.
By [10] (Th. 9 on p. 113), the condition (8) is automatically met if f has an extension to a function f̃ that is C^1 on a neighborhood of x*, and then we can choose g(x*) = ∇f̃(x*). In general, gradients are not unique, but for our purpose any choice will work.
For a fixed f and x * , the set of all gradients g ( x * ) is a closed convex set. We shall not need this fact and omit a proof.
Theorem 3 
(FOSC 2). Let Ω be a subset of R^n, and let f : Ω → R be differentiable at some point x* ∈ Ω with gradient ∇f(x*) ≠ 0. Assume that for every sequence {x_n}_{n=1}^∞ in Ω \ {x*} with x_n → x* there is a δ > 0 such that
(x_n − x*)^⊤ ∇f(x*) ≥ δ ‖x_n − x*‖
for all sufficiently large n. Then, x* is a strict local minimizer of f over Ω.
Proof. 
Argue by contradiction. If x* is not a strict local minimizer of f, there is a sequence {x_n}_{n=1}^∞ in Ω \ {x*} with x_n → x* such that f(x_n) ≤ f(x*) for all n. Define {r_n}_{n=1}^∞ by (9). Then, by (11),
f(x_n) − f(x*) = (x_n − x*)^⊤ ∇f(x*) + r_n ≥ δ ‖x_n − x*‖ + r_n
for all sufficiently large n. Since f is differentiable at x*, r_n/‖x_n − x*‖ → 0 as n → ∞ by (10). Dividing by ‖x_n − x*‖, we obtain
[f(x_n) − f(x*)] / ‖x_n − x*‖ ≥ δ + r_n/‖x_n − x*‖ > 0
for all sufficiently large n. The result that f(x_n) > f(x*) for all sufficiently large n contradicts our choice of the sequence {x_n}_{n=1}^∞ to satisfy f(x_n) ≤ f(x*) for all n. The theorem follows. □
The corresponding necessary condition is conveniently stated in contrapositive form.
Theorem 4 
(FONC 2). Let Ω be a subset of R^n, and let f : Ω → R be differentiable at some point x* ∈ Ω with gradient ∇f(x*) ≠ 0. If there exist a sequence {x_n}_{n=1}^∞ in Ω \ {x*} converging to x* and a number δ > 0 such that
(x_n − x*)^⊤ ∇f(x*) ≤ −δ ‖x_n − x*‖
for all sufficiently large n, then x* is not a local minimizer of f on Ω.
Equivalently, if x* is a local minimizer of f over Ω, then there exist no such sequence {x_n}_{n=1}^∞ and number δ > 0.
Proof. 
Assume we are given such a sequence {x_n}_{n=1}^∞ and δ > 0. Define {r_n}_{n=1}^∞ by (9). Then, by (14),
f(x_n) − f(x*) = (x_n − x*)^⊤ ∇f(x*) + r_n ≤ −δ ‖x_n − x*‖ + r_n
for all sufficiently large n. Since f is differentiable at x*, r_n/‖x_n − x*‖ → 0 as n → ∞ by (10). Thus, dividing (15) by ‖x_n − x*‖, we see that
[f(x_n) − f(x*)] / ‖x_n − x*‖ ≤ −δ + r_n/‖x_n − x*‖ < 0
for all sufficiently large n. Hence, f(x_n) < f(x*) for all sufficiently large n. Since x_n → x*, x* is not a local minimizer. □
Example 1. 
The inequality (11) in Theorem 3 (FOSC 2) cannot be weakened to
(x_n − x*)^⊤ ∇f(x*) > 0.
Choose x* = 0, Ω = {0} ∪ {x ∈ R^2 : x_1 > 0}, and f(x) = x_1 − x_2^2, x ∈ Ω. Then, we can take ∇f(x*) = [1, 0]^⊤, and any sequence {x_n}_{n=1}^∞ in Ω \ {0} with x_n → 0 satisfies (17). However, 0 is not a local minimizer because f(0) = 0, and the sequence {x_n}_{n=1}^∞ in Ω \ {0} defined by x_n = [1/(2n^2), 1/n]^⊤, n ≥ 1, converges to 0 and satisfies f(x_n) < 0 for all n ≥ 1.
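A small numerical check of Example 1 follows; this is our sketch, not part of the paper, and it uses the signs as reconstructed above.

```python
# Numerical check of Example 1 (a sketch): on Omega = {0} together with {x1 > 0},
# with f(x) = x1 - x2^2, every point of the sequence satisfies the weak inequality
# (17), yet f(x_n) < 0 = f(0), so 0 is not a local minimizer.
def f(x1, x2):
    return x1 - x2**2

grad_f0 = (1.0, 0.0)                                  # gradient of f at x* = 0
for n in range(1, 11):
    x1, x2 = 1.0 / (2 * n**2), 1.0 / n
    assert x1 * grad_f0[0] + x2 * grad_f0[1] > 0      # inequality (17) holds
    assert f(x1, x2) < 0                              # but f(x_n) < f(0) = 0
print("Inequality (17) holds along the sequence, yet f(x_n) < f(0).")
```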
Example 2. 
We cannot relax the inequality (14) in Theorem 4 (FONC 2) to
(x_n − x*)^⊤ ∇f(x*) < 0.
Choose x* = 0, Ω = {x ∈ R^2 : 2x_1 ≥ −x_2^2}, f(x) = x_1 + x_2^2, x ∈ Ω, and ∇f(x*) = [1, 0]^⊤. See Figure 2. The sequence x_n = [−1/(2n^2), 1/n]^⊤, n ≥ 1, belongs to Ω \ {0} and converges to 0. It satisfies (18) because
(x_n − x*)^⊤ ∇f(x*) = [−1/(2n^2), 1/n] [1, 0]^⊤ = −1/(2n^2) < 0, n ≥ 1.
Nevertheless, f(0) = 0 is the minimum value of f attained on Ω. For, if x ∈ Ω \ {0} and x_2 ≠ 0, then f(x) = x_1 + x_2^2 ≥ −(1/2)x_2^2 + x_2^2 > 0, and f(x) = x_1 > 0 whenever x ∈ Ω \ {0} and x_2 = 0.
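A companion numerical check of Example 2 follows; again this is our sketch, using the signs as reconstructed above.

```python
# Numerical check of Example 2 (a sketch): the sequence x_n = [-1/(2n^2), 1/n] lies in
# Omega = {2*x1 >= -x2^2}, satisfies the strict inequality (18), and yet
# f(x) = x1 + x2^2 attains its minimum value f(0) = 0 on Omega.
import numpy as np

def f(x):
    return x[0] + x[1]**2

grad_f0 = np.array([1.0, 0.0])
for n in range(1, 11):
    x = np.array([-1.0 / (2 * n**2), 1.0 / n])
    assert 2 * x[0] + x[1]**2 >= -1e-12        # x_n lies in Omega (boundary point)
    assert float(x @ grad_f0) < 0              # inequality (18) holds
    assert f(x) > 0                            # nevertheless f(x_n) > f(0) = 0
print("(18) holds along the sequence, but f attains its minimum at 0 on Omega.")
```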

4. Refinements Based on Twice Differentiability

The additional smoothness of f yields stronger results.
Definition 4 
(Generalized Twice Differentiability). Let Ω be a subset of R^n. We say that a function f : Ω → R is twice differentiable at a point x* ∈ Ω if (1) x* is not an isolated point of Ω, and (2) there exist g(x*) ∈ R^n and H(x*) ∈ R^{n×n} such that the function r(x) defined by
f(x) − f(x*) = (x − x*)^⊤ g(x*) + (x − x*)^⊤ H(x*) (x − x*) + r(x)
satisfies
lim_{x→x*} r(x)/‖x − x*‖^2 = 0,
or, equivalently, such that for any sequence {x_n}_{n=1}^∞ in Ω \ {x*} converging to x*,
f(x_n) − f(x*) = (x_n − x*)^⊤ g(x*) + (x_n − x*)^⊤ H(x*) (x_n − x*) + r_n,
where
lim_{n→∞} r_n/‖x_n − x*‖^2 = 0.
Any such vector g(x*) is denoted ∇f(x*) and called a gradient of f at x*, and any such matrix H(x*) is called a Hessian of f at x*.
Twice differentiability implies differentiability according to Lemma 1 below. By [10] (Th. 3 on p. 160), f is twice differentiable at x* if it has an extension to a function f̃ that is C^2 on a neighborhood of x*. In this case, g(x*) and H(x*) can be chosen as the usual gradient and Hessian of f̃.
Definition 5. 
Given sequences {a_n}_{n=1}^∞ and {b_n}_{n=1}^∞ of real numbers, we write (1) a_n = O(b_n) to mean that there exists M > 0 such that |a_n| ≤ M |b_n| for all sufficiently large n, and (2) a_n = o(b_n) to mean that b_n ≠ 0 for all sufficiently large n and lim_{n→∞} a_n/b_n = 0.
Lemma 1. 
Let Ω be a subset of R^n, and let f : Ω → R be twice differentiable at some point x* ∈ Ω with gradient ∇f(x*). Then, for any sequence {x_n}_{n=1}^∞ in Ω \ {x*} converging to x*,
f(x_n) − f(x*) = (x_n − x*)^⊤ ∇f(x*) + O(‖x_n − x*‖^2).
Proof. 
We start with (21), assuming some choice of Hessian H(x*). Let M = ‖H(x*)‖ be the matrix bound of H(x*). Then, by the triangle inequality and Cauchy–Schwarz inequality,
|(x_n − x*)^⊤ H(x*) (x_n − x*) + r_n| ≤ |(x_n − x*)^⊤ H(x*) (x_n − x*)| + |r_n| ≤ M ‖x_n − x*‖^2 + (|r_n|/‖x_n − x*‖^2) ‖x_n − x*‖^2.
By (22), r_n/‖x_n − x*‖^2 → 0, and hence |r_n|/‖x_n − x*‖^2 ≤ 1 for all sufficiently large n. Therefore, |(x_n − x*)^⊤ H(x*) (x_n − x*) + r_n| ≤ (M + 1) ‖x_n − x*‖^2 for all sufficiently large n. □
Theorem 5 
(FOSC 3). Let Ω be a subset of R^n, and let f : Ω → R be twice differentiable at some point x* ∈ Ω, with gradient ∇f(x*) ≠ 0. Assume that for every sequence {x_n}_{n=1}^∞ in Ω \ {x*} with x_n → x*, there is a sequence {δ_n}_{n=1}^∞ of positive numbers such that ‖x_n − x*‖^2 = o(δ_n) and
(x_n − x*)^⊤ ∇f(x*) ≥ δ_n
for all sufficiently large n. Then, x* is a strict local minimizer of f over Ω.
Proof. 
If the conclusion is not true, there exists a sequence {x_n}_{n=1}^∞ in Ω \ {x*} with x_n → x* such that f(x_n) ≤ f(x*) for all n. Then, by hypothesis, there is a sequence {δ_n}_{n=1}^∞ of positive numbers satisfying (24) such that ‖x_n − x*‖^2 = o(δ_n). By Lemma 1 and (24),
f(x_n) − f(x*) = (x_n − x*)^⊤ ∇f(x*) + O(‖x_n − x*‖^2) ≥ δ_n + O(‖x_n − x*‖^2)
for all sufficiently large n. Hence, since ‖x_n − x*‖^2 = o(δ_n),
f(x_n) − f(x*) ≥ δ_n [1 + O(‖x_n − x*‖^2)/δ_n] > 0
for all sufficiently large n. Thus, f(x_n) > f(x*) for all sufficiently large n, contradicting our choice of the sequence {x_n}_{n=1}^∞. The result follows. □
Theorem 6 
(FONC 3). Let Ω be a subset of R^n, and let f : Ω → R be twice differentiable at some point x* ∈ Ω, with gradient ∇f(x*) ≠ 0. If there exist a sequence {x_n}_{n=1}^∞ in Ω \ {x*} converging to x* and a sequence {δ_n}_{n=1}^∞ of positive numbers such that ‖x_n − x*‖^2 = o(δ_n) and
(x_n − x*)^⊤ ∇f(x*) ≤ −δ_n
for all sufficiently large n, then x* is not a local minimizer of f on Ω.
Equivalently, if x* is a local minimizer of f on Ω, no such sequences {x_n}_{n=1}^∞ and {δ_n}_{n=1}^∞ can exist.
Proof. 
Let {x_n}_{n=1}^∞ and {δ_n}_{n=1}^∞ be sequences with the properties stated in the theorem. By Lemma 1 and (26),
f(x_n) − f(x*) = (x_n − x*)^⊤ ∇f(x*) + O(‖x_n − x*‖^2) ≤ −δ_n + O(‖x_n − x*‖^2)
for all sufficiently large n. Hence, since ‖x_n − x*‖^2 = o(δ_n),
f(x_n) − f(x*) ≤ δ_n [−1 + O(‖x_n − x*‖^2)/δ_n] < 0
for all sufficiently large n. Thus, f(x_n) < f(x*) for all sufficiently large n, and therefore x* is not a local minimizer. □
Remark 1. 
If f, Ω, x*, ∇f(x*) ≠ 0 satisfy the conditions for a strict local minimizer in Theorem 3 (FOSC 2), they satisfy the conditions in Theorem 5 (FOSC 3) by choosing δ_n = δ ‖x_n − x*‖, n ≥ 1. Similarly, if f, Ω, x*, ∇f(x*) ≠ 0 meet the conditions for a local non-minimizer in Theorem 4 (FONC 2), they meet the conditions in Theorem 6 (FONC 3). Examples 3 and 4 show that both converse statements fail.
Example 3. 
See Figure 3. Set f(x) = x_1, x = [x_1, x_2]^⊤, on Ω = {x ∈ R^2 : x_1 ≥ |x_2|^{3/2}} and let x* = 0. Then, f is twice differentiable at x* with gradient ∇f(x*) = [1, 0]^⊤. The point x* is a strict local minimizer because f(0) = 0 and f(x) > 0 for every other x ∈ Ω. The condition for a strict local minimizer in Theorem 3 (FOSC 2) is not satisfied. For example, the sequence
x_n = [1/n, 1/n^{2/3}]^⊤, n ≥ 1,
is in Ω \ {x*}, x_n → 0, and the inequality (11) implies 1/n ≥ δ (1/n^2 + 1/n^{4/3})^{1/2}, which is impossible. On the other hand, the condition in Theorem 5 (FOSC 3) is satisfied. Consider any sequence x_n = [x_{n,1}, x_{n,2}]^⊤, n ≥ 1, in Ω \ {x*} with x_n → x*. Choosing δ_n = x_{n,1}, we readily find that (x_n − x*)^⊤ ∇f(x*) ≥ δ_n, and ‖x_n − x*‖^2 = o(δ_n).
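The following sketch (ours, not from the paper) tabulates the relevant ratios along the sequence above, illustrating why no fixed δ in (11) can work while the choice δ_n = x_{n,1} fits Theorem 5.

```python
# Numerical illustration of Example 3 (a sketch): along x_n = [1/n, 1/n^(2/3)], which
# lies on the boundary of Omega = {x1 >= |x2|^(3/2)}, the ratio required by (11)
# decreases to 0 (so no fixed delta > 0 works), while with delta_n = x_{n,1} the
# inequality (24) holds and ||x_n||^2 / delta_n also tends to 0.
import numpy as np

for n in (10, 100, 1000, 10000):
    x = np.array([1.0 / n, 1.0 / n**(2.0 / 3.0)])
    ratio_11 = x[0] / np.linalg.norm(x)              # must exceed a fixed delta for (11)
    delta_n = x[0]
    ratio_24 = float(x @ [1.0, 0.0]) / delta_n       # equals 1, so (24) holds
    small_o = np.linalg.norm(x)**2 / delta_n         # tends to 0 as n grows
    print(f"n={n:6d}  (11)-ratio={ratio_11:.4f}  (24)-ratio={ratio_24:.1f}  "
          f"||x||^2/delta_n={small_o:.4f}")
```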
Example 4. 
See Figure 4. Set f(x) = x_1, x = [x_1, x_2]^⊤, on Ω = {x ∈ R^2 : x_1 ≥ −|x_2|^{3/2}} and let x* = 0. Then, f is twice differentiable at x* with gradient ∇f(x*) = [1, 0]^⊤. Since f(x) < 0 on the curve x_1 = −|x_2|^{3/2}, x_2 ≠ 0, x* is not a local minimizer of f on Ω. It is not possible to show this using Theorem 4 (FONC 2), because it is impossible to find a sequence satisfying the conditions there. However, the conditions of Theorem 6 (FONC 3) are met by choosing
x_n = [−1/n^{3/2}, 1/n]^⊤ and δ_n = 1/n^{3/2}
for all n ≥ 1.
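A numerical check of this example follows; this is our sketch and it relies on the signs as reconstructed here, so the region and sequence should be read as assumptions.

```python
# Numerical check of Example 4 (a sketch, assuming the reconstruction f(x) = x1 and
# Omega = {x1 >= -|x2|^(3/2)}): along x_n = [-1/n^(3/2), 1/n] with delta_n = 1/n^(3/2),
# inequality (26) holds with equality and ||x_n||^2 / delta_n -> 0, so Theorem 6
# applies; f(x_n) < 0 = f(0) confirms that 0 is not a local minimizer.
import numpy as np

grad_f0 = np.array([1.0, 0.0])
for n in (10, 100, 1000):
    x = np.array([-1.0 / n**1.5, 1.0 / n])
    delta_n = 1.0 / n**1.5
    assert x[0] >= -abs(x[1])**1.5 - 1e-15           # x_n lies in Omega (boundary)
    assert float(x @ grad_f0) <= -delta_n + 1e-18    # inequality (26)
    print(f"n={n:5d}  f(x_n)={x[0]:.2e}  "
          f"||x_n||^2/delta_n={np.linalg.norm(x)**2 / delta_n:.4f}")
```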

5. Applications of the Analytical Conditions

Example 3 suggests looking for generalizations of Theorems 1 and 2 to larger regions. In this section, we show that such generalizations exist, again assuming twice differentiability.
Definition 6. 
Let x*, d ∈ R^n, d ≠ 0, and α > 0 be given. For each β > 0, define K_{α,β}(x*, d) ⊆ R^n as the set consisting of the point x*, together with all points x ≠ x* such that (1) ‖u‖ ≥ β ‖v‖^α and (2) u^⊤ d > 0, where
u = [(x − x*)^⊤ d / (d^⊤ d)] d
is the projection of x − x* in the direction d, and
v = (x − x*) − u
is the component of x − x* in the orthogonal direction. We call K_{α,β}(x*, d) an α-cone. The opposite α-cone is the set
K_{α,β}(x*, −d).
See Figure 5. For α = 1, K_{α,β}(x*, d) is a cone as in Definition 1. For α = 2, it is a paraboloid.
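A minimal membership test for the α-cone of Definition 6 might look as follows; this is our sketch, and the names are illustrative.

```python
# A minimal sketch (not from the paper) of a membership test for the alpha-cone
# K_{alpha,beta}(x_star, d) of Definition 6: split x - x_star into its projection u
# onto d and the orthogonal part v, then require u.d > 0 and ||u|| >= beta * ||v||^alpha.
import numpy as np

def in_alpha_cone(x, x_star, d, alpha, beta):
    w = np.asarray(x, float) - np.asarray(x_star, float)
    if np.linalg.norm(w) == 0.0:                    # the vertex belongs to the set
        return True
    d = np.asarray(d, float)
    u = (w @ d) / (d @ d) * d                       # projection of x - x_star onto d
    v = w - u                                       # orthogonal component
    return (u @ d) > 0 and np.linalg.norm(u) >= beta * np.linalg.norm(v)**alpha

# With d = [1, 0], alpha = 2, beta = 1 this is the paraboloid x1 >= x2^2 of Remark 2.
print(in_alpha_cone([0.25, 0.5], [0, 0], [1, 0], 2, 1))   # boundary point: True
print(in_alpha_cone([0.20, 0.5], [0, 0], [1, 0], 2, 1))   # outside: False
```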
Theorem 7 
(FOSC 4). Let Ω be a subset of R^n, and let f : Ω → R be twice differentiable at some point x* ∈ Ω, with nonzero gradient d* = ∇f(x*). If Ω ⊆ K_{α,β}(x*, d*) for some α ∈ [1, 2) and β > 0, then x* is a strict local minimizer of f over Ω.
As in Theorem 1, we can apply Theorem 7 with f replaced by −f.
Corollary 2.
Let Ω be a subset of R^n, and let f : Ω → R be twice differentiable at some point x* ∈ Ω, with nonzero gradient d* = ∇f(x*). If Ω ⊆ K_{α,β}(x*, −d*) for some α ∈ [1, 2) and β > 0, then x* is a strict local maximizer of f over Ω.
Remark 2. 
Theorem 7 fails for α = 2. For an example, let x* = 0, x = [x_1, x_2]^⊤, and
Ω = {x ∈ R^2 : x_1 ≥ x_2^2}.
Define
f(x) = x_1 − 2x_2^2, x ∈ Ω.
Then, f is twice differentiable at x*, d* = ∇f(x*) = [1, 0]^⊤, and Ω = K_{2,1}(x*, d*) (see Figure 5). If Theorem 7 were true for α = 2, then x* = 0 would be a strict local minimizer for f on Ω. However, f assumes positive, negative, and zero values at points of Ω arbitrarily close to x*, so x* is not a strict local minimizer, and therefore α = 2 cannot be allowed in the theorem.
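The following sketch (ours, using f as reconstructed above) exhibits points of Ω arbitrarily close to 0 where f is positive, zero, and negative, as claimed.

```python
# Numerical illustration of Remark 2 (a sketch): on the 2-cone Omega = {x1 >= x2^2},
# the function f(x) = x1 - 2*x2^2 takes positive, negative, and zero values at points
# of Omega arbitrarily close to 0, so 0 is not a strict local minimizer.
def f(x1, x2):
    return x1 - 2 * x2**2

for n in (10, 100, 1000):
    t = 1.0 / n
    pos  = (t, 0.0)            # on the axis: f = t > 0
    zero = (2 * t**2, t)       # on x1 = 2*x2^2 (inside Omega): f = 0
    neg  = (t**2, t)           # on the boundary x1 = x2^2: f = -t^2 < 0
    for p in (pos, zero, neg):
        assert p[0] >= p[1]**2          # all three points lie in Omega
    print(f"t={t:7.4f}  f>0: {f(*pos):.2e}  f=0: {f(*zero):.1e}  f<0: {f(*neg):.2e}")
```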
Proof of Theorem 7. 
We prove the theorem by verifying the condition for a strict local minimizer in Theorem 5 (FOSC 3). Let {x_n}_{n=1}^∞ be a sequence in Ω \ {x*} such that x_n → x*. We shall construct a sequence {δ_n}_{n=1}^∞ of positive numbers satisfying (24) for all sufficiently large n such that ‖x_n − x*‖^2 = o(δ_n). For all n ≥ 1, set
u_n = [(x_n − x*)^⊤ d* / (d*^⊤ d*)] d* and v_n = (x_n − x*) − u_n.
Then, ‖u_n‖ ≥ β ‖v_n‖^α and u_n^⊤ d* > 0 for all n by the definition of K_{α,β}(x*, d*). Since u_n and v_n are orthogonal,
‖x_n − x*‖^2 = ‖u_n‖^2 + ‖v_n‖^2 ≤ ‖u_n‖^2 + (1/β^{2/α}) ‖u_n‖^{2/α}.
Since x_n → x*, u_n → 0, and hence ‖u_n‖ < 1 for all sufficiently large n, say n ≥ n_0. Since α ≥ 1, 2/α ≤ 2, and hence ‖u_n‖^2 ≤ ‖u_n‖^{2/α} for all n ≥ n_0 (because ‖u_n‖ < 1). Thus,
‖x_n − x*‖^2 ≤ ‖u_n‖^{2/α} + (1/β^{2/α}) ‖u_n‖^{2/α} = γ ‖u_n‖^{2/α}, n ≥ n_0,
where γ = 1 + 1/β^{2/α}. We assume also that α < 2 and hence 2/α > 1, say 2/α = 1 + ε with ε > 0. Then,
‖x_n − x*‖^2 ≤ γ ‖u_n‖^{2/α} = γ [(x_n − x*)^⊤ d* / ‖d*‖]^{1+ε} = γ [(x_n − x*)^⊤ d* / ‖d*‖] [(x_n − x*)^⊤ d* / ‖d*‖]^ε ≤ γ [(x_n − x*)^⊤ d* / ‖d*‖] ‖x_n − x*‖^ε,
by the Cauchy–Schwarz inequality and the fact that (x_n − x*)^⊤ d* = u_n^⊤ d* > 0. Therefore,
(x_n − x*)^⊤ d* ≥ ‖x_n − x*‖^{2−ε} ‖d*‖ / γ, n ≥ n_0.
Setting δ_n = ‖x_n − x*‖^{2−ε} ‖d*‖ / γ, we obtain (x_n − x*)^⊤ d* ≥ δ_n for all sufficiently large n and
lim_{n→∞} ‖x_n − x*‖^2 / δ_n = lim_{n→∞} γ ‖x_n − x*‖^ε / ‖d*‖ = 0.
We have verified the requirements in Theorem 5, and therefore x * is a strict local minimizer of f over Ω by that result. □
Theorem 8 
(FONC 4). Let Ω be a subset of R^n, and let f : Ω → R be twice differentiable at some point x* ∈ Ω, with nonzero gradient d* = ∇f(x*). If x* is a strict local minimizer of f on Ω, then x* is an isolated point of K_{α,β}(x*, −d*) ∩ Ω for every α ∈ [1, 2) and β > 0.
Proof. 
Assume that x* is a strict local minimizer of f on Ω, and, if possible, that x* is not an isolated point of K_{α,β}(x*, −d*) ∩ Ω for some α ∈ [1, 2) and β > 0. Then, there is a sequence {x_n}_{n=1}^∞ that belongs to both K_{α,β}(x*, −d*) and Ω \ {x*} such that x_n → x*. By Corollary 2, f(x_n) < f(x*) for all sufficiently large n, contradicting our assumption that x* is a strict local minimizer of f on Ω. The theorem follows. □
Examples 5 and 6 are set in the context of the Karush–Kuhn–Tucker (KKT) theorem [2,6], which allows constraint conditions to be expressed in terms of inequalities. We follow the account in [1], in which the KKT theorem appears as Theorem 21.1, and Theorem 21.3 is the corresponding second-order sufficient condition (SOSC). Theorems 9 and 10 below are specializations of these results to the cases that concern us here. Theorems 21.1 and 21.3 in [1] allow additional Lagrange-type conditions that play no role in our examples.
Theorem 9 
(KKT Theorem). Let f, g : R^n → R be given C^1 functions. Assume that x* ∈ R^n is a local minimizer for f subject to the condition g(x) ≤ 0, and that x* is a regular point for g in the sense that ∇g(x*) ≠ 0. Then, there is a real number μ* ≥ 0 such that
(1)
μ* g(x*) = 0;
(2)
∇f(x*) + μ* ∇g(x*) = 0.
The corresponding sufficient condition requires the stronger assumption that the given functions f , g are C 2 . The Hessians F , G for f , g are the n × n matrices of second-order partials of f , g .
Theorem 10 
(SOSC). Let f, g : R^n → R be given C^2 functions. Assume that x* ∈ R^n satisfies g(x*) ≤ 0 and we can find a real number μ* ≥ 0 satisfying the following conditions:
(1)
μ* g(x*) = 0.
(2)
∇f(x*) + μ* ∇g(x*) = 0.
(3)
If F, G are the Hessians of f, g and L(x*, μ*) = F(x*) + μ* G(x*), then y^⊤ L(x*, μ*) y > 0 for all y ∈ R^n such that y ≠ 0 and ∇g(x*)^⊤ y = 0.
Then, x* is a strict local minimizer for f subject to the condition g(x) ≤ 0.
Example 5. 
Set x* = 0, f(x) = x_1 − x_1^2 − x_2^2, and g(x) = |x_2|^{3/2} − x_1 for all x = [x_1, x_2]^⊤ in R^2. Then, f ∈ C^2 and g ∈ C^1 on R^2. The set Ω = {x ∈ R^2 : g(x) ≤ 0} is an α-cone with α = 3/2 in the direction ∇f(x*) = [1, 0]^⊤ (see Figure 3). Therefore, by Theorem 7 (FOSC 4), x* is a strict local minimizer for f subject to the constraint g(x) ≤ 0. However, this cannot be shown with Theorem 10 because g ∉ C^2, which is a hypothesis in Theorem 10. To see why this is a problem, consider the form L(x, μ*) = F(x) + μ* G(x) that appears in condition (3). At any point x = [x_1, x_2]^⊤ with x_2 ≠ 0, the Hessian of g is given by
G(x) = [∂^2 g/∂x_i ∂x_j]_{i,j=1}^2 = [[0, 0], [0, (3/4)|x_2|^{−1/2}]].
The second partial ∂^2 g/∂x_2^2 does not exist at any point on the line x_2 = 0. Thus, G(x*) is undefined. Hence, the Lagrangian L(x*, μ*) = F(x*) + μ* G(x*) in condition (3) of Theorem 10 is undefined, and therefore Theorem 10 cannot be applied. We remark that this example is within the scope of Theorem 9 (KKT Theorem), and the conditions (1) and (2) there are satisfied with μ* = 1.
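A numerical cross-check of Example 5 follows; this is our sketch, with f and g as reconstructed above, and the sampling radius is an arbitrary choice.

```python
# Numerical check of Example 5 (a sketch): with f(x) = x1 - x1^2 - x2^2 and
# g(x) = |x2|^(3/2) - x1, the KKT conditions of Theorem 9 hold at 0 with mu* = 1,
# and sampling the constraint set near 0 suggests that 0 is a strict local
# minimizer, as Theorem 7 guarantees.
import numpy as np

f = lambda x: x[0] - x[0]**2 - x[1]**2
grad_f0 = np.array([1.0, 0.0])             # gradient of f at 0
grad_g0 = np.array([-1.0, 0.0])            # gradient of g at 0
mu_star = 1.0
print("KKT condition (2):", grad_f0 + mu_star * grad_g0)   # should be [0, 0]

rng = np.random.default_rng(1)
worst = np.inf
for _ in range(100000):
    x2 = rng.uniform(-0.05, 0.05)
    x1 = rng.uniform(abs(x2)**1.5, 0.05)   # g(x) <= 0, i.e. x1 >= |x2|^(3/2)
    worst = min(worst, f(np.array([x1, x2])))
print("smallest sampled f value near 0 (f(0) = 0):", worst)
```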
Example 6. 
Set x* = 0, f(x) = x_1 − x_1^2 − x_2^2, and g(x) = x_2^2 − x_1 for all x = [x_1, x_2]^⊤ in R^2. Then, f, g ∈ C^2 on R^2. In this example, x* is not a local minimizer of f subject to the constraint g(x) ≤ 0. For example, for x ≠ x* on the boundary x_1 = x_2^2 of the constraint set,
f(x) = x_2^2 − x_1^2 − x_2^2 = −x_1^2 < 0.
Might this example contradict Theorem 7 or Theorem 10? Fortunately, no, and it is instructive to see why. Theorem 7 is not applicable because the constraint set Ω = {x ∈ R^2 : g(x) ≤ 0} is an α-cone with α = 2, and it is shown in Remark 2 that Theorem 7 fails for α = 2. To see that Theorem 10 is also not applicable, let us check the required conditions (1)–(3):
(1) 
Since g(x*) = 0, μ* g(x*) = 0 for all μ* ≥ 0.
(2) 
For μ* = 1, ∇f(x*) + μ* ∇g(x*) = [1, 0]^⊤ + μ* [−1, 0]^⊤ = 0.
(3) 
In our example,
L(x*, μ*) = F(x*) + μ* G(x*) = [[−2, 0], [0, −2]] + [[0, 0], [0, 2]] = [[−2, 0], [0, 0]].
Therefore,
y^⊤ L(x*, μ*) y = [0, y_2] [[−2, 0], [0, 0]] [0, y_2]^⊤ = 0
for every y = [0, y_2]^⊤ such that y_2 ≠ 0, that is, for all y ≠ 0 such that ∇g(x*)^⊤ y = 0.
In view of (28), the positive definiteness condition in (3) fails, and hence Theorem 10 cannot be applied to this example.
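A short cross-check of this computation follows; it is our sketch, with the Hessians of f and g evaluated at 0 as reconstructed above.

```python
# Numerical cross-check of Example 6 (a sketch): the Hessians of f(x) = x1 - x1^2 - x2^2
# and g(x) = x2^2 - x1 at 0 give L(0, 1) = F(0) + G(0) = diag(-2, 0), so y^T L y = 0
# for y = [0, y2], and the positive definiteness requirement in condition (3) fails.
import numpy as np

F0 = np.array([[-2.0, 0.0], [0.0, -2.0]])   # Hessian of f at 0
G0 = np.array([[0.0, 0.0], [0.0, 2.0]])     # Hessian of g at 0
L = F0 + 1.0 * G0                           # mu* = 1
y = np.array([0.0, 1.0])                    # grad g(0)^T y = 0 exactly when y1 = 0
print("L(0,1) =\n", L)
print("y^T L y =", float(y @ L @ y))        # prints 0.0
```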

6. Conclusions

The first-order necessary conditions in this paper contribute to the literature on first-order optimality conditions by introducing results stronger than the standard ones. The new first-order necessary conditions imply the standard first-order necessary conditions, including those in [1,5]. We introduced first-order sufficient conditions that we did not find elsewhere in the literature. Our explanation of why the new conditions are stronger used two-dimensional examples. However, the method is applicable to general n-dimensional problems, including linear programming.
We proposed first-order sufficient conditions for set-constrained optimization that do not require the objective function to be convex or the constraint equations to be differentiable. Conditions that require the function to be convex are essentially second-order conditions. Our conditions only require the gradient of the objective function to be nonzero at a candidate minimizer, and they are essentially first-order conditions even when we apply them to problems where the objective function is twice differentiable.
When the given function is continuously differentiable at x * and the gradient is nonzero, the simplest form of the sufficient condition says that there is a cone with a vertex at x * , and x * is a strict local minimizer on the cone. This sufficient condition was employed to prove a corresponding necessary condition that does not use feasible directions and instead uses the topological notion of an isolated point in a set.
We introduced generalized differentiability and reformulated the first-order conditions in terms of convergent sequences. The new differentiability does not require the objective function to be defined on an open neighborhood of x * . It only requires the function to be defined on the constraint set.
We refined the first-order conditions for a minimizer to twice differentiable functions in terms of α-cones. The sufficiency version says that a twice differentiable function with a nonzero gradient has a strict local minimizer at the vertex of an α-cone whose axis is the gradient direction. We presented a problem with an α-cone constraint set where the new sufficiency condition shows that the candidate point is a strict local minimizer. This problem satisfies the necessary condition of the KKT method, but the corresponding sufficient condition cannot be applied because the Hessian of the constraint function is undefined at the candidate minimizer.

Author Contributions

Conceptualization, S.M.R. and E.K.P.C.; methodology, S.M.R., E.K.P.C. and J.R.; writing—original draft, S.M.R.; writing—review and editing, J.R. All authors have read and agreed to the published version of the manuscript.

Funding

S.M. Rovnyak was supported in part by the National Science Foundation under grant ECCS-1711521. E.K.P. Chong was supported in part by the National Science Foundation under grant CCF-2006788.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors thank Henry Rovnyak for help with the LaTeX document and TikZ figures.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Chong, E.K.P.; Zak, S.H. An Introduction to Optimization, 4th ed.; John Wiley & Sons Inc.: Hoboken, NJ, USA, 2013. [Google Scholar]
  2. Forst, W.; Hoffmann, D. Optimization—Theory and Practice; Springer Undergraduate Texts in Mathematics and Technology; Springer: New York, NY, USA, 2010. [Google Scholar]
  3. Sioshansi, R.; Conejo, A.J. Optimization in Engineering: Models and Algorithms; Springer Optimization and Its Applications; Springer: Cham, Switzerland, 2017; Volume 120. [Google Scholar]
  4. Butenko, S.; Pardalos, P.M. Numerical Methods and Optimization; Chapman & Hall/CRC Numerical Analysis and Scientific Computing; CRC Press: Boca Raton, FL, USA, 2014. [Google Scholar]
  5. Kochenderfer, M.J.; Wheeler, T.A. Algorithms for Optimization; MIT Press: Cambridge, MA, USA, 2019. [Google Scholar]
  6. Luenberger, D.G. Optimization by Vector Space Methods; John Wiley & Sons, Inc.: New York, NY, USA, 1969. [Google Scholar]
  7. Lewis, A.D. Maximum Principle. Online Lecture Notes. 2006. Available online: https://mast.queensu.ca/~andrew/teaching/pdf/maximum-principle.pdf (accessed on 9 October 2023).
  8. Peng, S.G. A general stochastic maximum principle for optimal control problems. SIAM J. Control Optim. 1990, 28, 966–979. [Google Scholar] [CrossRef]
  9. Lu, Q. Second order necessary conditions for optimal control problems of stochastic evolution equations. In Proceedings of the 35th Chinese Control Conference (CCC), Chengdu, China, 27–29 July 2016. [Google Scholar]
  10. Marsden, J.E.; Tromba, A.J. Vector Calculus, 6th ed.; W.H. Freeman & Company: New York, NY, USA, 2012. [Google Scholar]
Figure 1. This figure illustrates Theorem 2 (FONC 1). If x* is a strict local minimizer of f on some subset Ω of R^n, then x* is an isolated point of K_δ(x*, −d*) ∩ Ω for every δ ∈ (0, 1).
Figure 2. Region Ω, point x*, and gradient ∇f(x*) for Example 2.
Figure 3. Region Ω, point x*, and gradient ∇f(x*) for Examples 3 and 5.
Figure 4. Region Ω, point x*, and gradient ∇f(x*) for Example 4.
Figure 5. This figure illustrates Definition 6 when x* = 0 and d is in the positive x_1-direction. The α-cone K_{α,β}(x*, d) is the region to the right of the curve together with the curve itself.

