Article

Convergence Rates for Hestenes’ Gram–Schmidt Conjugate Direction Method without Derivatives in Numerical Optimization

1 Department of Mathematics and Statistics, The University of Toledo, Toledo, OH 43606, USA
2 Department of Mathematics and Statistics, Stephen F. Austin State University, Nacogdoches, TX 75962, USA
* Author to whom correspondence should be addressed.
AppliedMath 2023, 3(2), 268-285; https://doi.org/10.3390/appliedmath3020015
Submission received: 27 December 2022 / Revised: 21 February 2023 / Accepted: 27 February 2023 / Published: 24 March 2023

Abstract

In this work, we studied convergence rates, using the quotient convergence factors and root convergence factors described by Ortega and Rheinboldt, for Hestenes’ Gram–Schmidt conjugate direction method without derivatives. We performed computations in order to compare this conjugate direction method, for minimizing a nonquadratic function f, with Newton’s method, for solving $\nabla f = 0$. Our primary purpose was to implement Hestenes’ CGS method with no derivatives and determine convergence rates.

1. Introduction

The conjugate gradient (CG) and conjugate direction (CD) methods have been extended to the optimization of nonquadratic functions by several authors. Fletcher and Reeves [1] gave a direct extension of the conjugate gradient (CG) method. An approach to conjugate direction (CD) methods using only function values was developed by Powell [2]. Davidon [3] developed a variable metric algorithm, which was later modified by Fletcher and Powell [4]. According to Davidon [3], variable metric methods are considered to be very effective techniques for optimizing a nonquadratic function.
In 1952, Hestenes and Stiefel [5] developed conjugate direction (CD) methods for minimizing a quadratic function defined on a finite dimensional space. One of their objectives was to find efficient computational methods for solving a large system of linear equations. In 1964, Fletcher and Reeves [1] extended the conjugate gradient (CG) method of Hestenes and Stiefel [5] to nonquadratic functions. The method presented here is related to those described by G.S. Smith [6], M.J.D. Powell [2] and W.I. Zangwill [7]. The method of Smith is also described by Fletcher [8] on pp. 9–10, Brent [9] on p. 124 and Hestenes [10] on p. 210. In addition, Nocedal [11] explored the possibility of nonlinear conjugate gradient methods converging without restarts and with the use of practical line search. In the field of numerical optimization, a number of additional authors, including Kelley [12] and Zang and Li [13], among others, investigated a wide range of approaches in the use of conjugate direction methods.
The primary purpose of this work is to implement Hestenes’ Gram–Schmidt conjugate direction method without derivatives, which uses function values with no line searches. We will refer to this method as the GSCD method; Hestenes refers to it as the CGS method. We illustrate the procedure numerically, computing asymptotic constants and the quotient convergence factors of Ortega and Rheinboldt [14]. In reference to Hestenes [10], p. 202, where he states that the CGS method has Newton’s algorithm as its limit, Russak [15] shows that n-step superlinear convergence is possible. We verify numerically that the GSCD procedure converges quadratically under appropriate conditions.
As for notation, we use capital letters, such as $A, B, C, \ldots$, to denote matrices and lower-case letters, such as $a, b, c, \ldots$, for scalars. $A^*$ denotes the transpose of the matrix A. If F is a real-valued differentiable function of n real variables, we denote its gradient at x by $F'(x)$ and the Hessian of F at x by $F''(x)$. We use subscripts to distinguish vectors and superscripts to denote components when these distinctions are made together, for example, $x_k = (x_k^1, \ldots, x_k^n)$.
The method of steepest descent is due to Cauchy [16]. It is one of the oldest and most obvious ways to find a minimum point of a function f.
There are two versions of steepest descent. The one due to Cauchy, which we call an iterative method, uses line searches; another, described by Eells [17] in Equation (10), p. 783, uses a differential equation of steepest descent. In Equation (4.3) we describe another version of the differential equation of steepest descent. Numerically, however, both have flaws. The iterative method is generally quite slow, as illustrated by Rosenbrock’s banana valley function [18].
Newton’s method applied to $\nabla f = 0$, where f is a function to be minimized, is another approach for finding a minimum of the function f. Newton’s method has rapid convergence, but it is costly because of derivative evaluations. Hestenes’ CGS method without derivatives [10], p. 202, has Newton’s method as its limit as $\sigma \to 0$.
If the minimizing function is strongly convex quadratic and the line search is exact, then, in theory, all choices for the search direction in standard conjugate gradient algorithms are equivalent. However, for nonquadratic functions, each choice of the search direction leads to standard conjugate gradient algorithms with very different performances [19].
In this article, we investigate quotient convergence factors and root convergence factors, and we computationally compare the conjugate Gram–Schmidt direction method with Newton’s method. There are other notions of convergence for the conjugate gradient, conjugate direction, gradient, Newton and steepest descent methods, such as superlinear convergence [20,21,22], Wall [23] root convergence and Ostrowski convergence factors [24]; for this research, however, quotient convergence factors are the most appropriate for establishing quadratic convergence.
In this article, the well-known conjugate direction algorithm for minimizing a quadratic function is modified to become an algorithm for minimizing a nonquadratic function, in the manner described in Section 2. The algorithm uses the gradient estimates and Hessian matrix estimates described in Section 3. In Section 4, a test example of minimizing a nonquadratic function by the developed derivative-free conjugate direction algorithm is analyzed. The advantage of this approach over Newton’s method is efficiency. The results are of theoretical and practical interest to specialists in the theory and methods of optimization.

2. Methodology

In this section, we present a class of CD algorithms for minimizing functions defined on an n-dimensional space. The reader is referred to Stein [25] and Hestenes [10], pp. 135–137 and pp. 199–202, respectively, for more details.

2.1. The Method of CD

Let A be a positive definite real symmetric $n \times n$ matrix, let k be a constant n-vector and let c be a fixed real scalar. Throughout this section, F denotes the function defined on Euclidean n-space $E^n$ by
$$F(x) = \tfrac{1}{2}\, x^* A x - k^* x + c, \qquad (1)$$
where x is in $E^n$.
Suppose $1 \le m \le n$. Let $S_m$ be the linear subspace spanned by a set $\{p_1, \ldots, p_m\}$ of m linearly independent and, hence, nonzero vectors. Let $x_1$ be any vector in $E^n$. Then, the m-dimensional plane $P_m$ through $x_1$ obtained by translating the subspace $S_m$ is defined by
$$P_m = \{\, x : x = x_1 + \alpha_1 p_1 + \cdots + \alpha_m p_m,\ \alpha_i \in \mathbb{R}\ (i = 1, \ldots, m) \,\}.$$
Two vectors, p and q, in $E^n$ are said to be A-orthogonal or conjugate if $p^* A q = 0$. A set $\{p_1, \ldots, p_m\}$ of nonzero vectors in $E^n$ is said to be mutually A-orthogonal or mutually conjugate if
$$p_i^* A p_j = 0 \quad \text{for } i \ne j \ (i, j = 1, \ldots, m).$$
Theorem 1
([25]). Let $S_m$ be a subspace of $E^n$, where $\{p_1, \ldots, p_m\}$ is a basis for $S_m$, $1 \le m \le n$. Further assume that $p_1, \ldots, p_m$ is a mutually A-orthogonal set of vectors. Let $x_1$ be any vector in $E^n$ and let x be in $P_m$. Then, the following conditions are equivalent:
1. 
x minimizes F on P m .
2. 
$F'(x)$, the gradient of F at x, is orthogonal to the subspace $S_m$.
3. 
$x = x_1 + a_1 p_1 + \cdots + a_m p_m$, where $a_i = c_i/d_i$, $c_i = -p_i^* F'(x_1)$, $d_i = p_i^* A p_i$, $i = 1, \ldots, m$.
Let $x_i = x_1 + a_1 p_1 + \cdots + a_{i-1} p_{i-1}$, $i = 1, \ldots, m$, so that $x_{i+1} = x_i + a_i p_i$. Then the quantity $c_i$ defined in (3) above is also given by
$$c_i = -p_i^* F'(x_i), \quad i = 1, \ldots, m.$$
Moreover, there is a unique vector $x_0$ in the m-dimensional plane $P_m$ through $x_1$ translating $S_m$ such that $x_0$ minimizes the function F given by (1) on $P_m$.
Proof. 
First, we are going to show that F has at least one minimizing vector in $P_m$. Let p be any vector in $P_m$ and let $M = F(p)$. Since A is positive definite, there is an $R > 0$ such that $\|x\| > R$ implies $F(x) > M$. Hence, $F(x) \le M$ implies $\|x\| \le R$. Since $C = \{x : \|x\| \le R\} \cap P_m$ is a compact set in $E^n$ on which F is continuous, F has at least one minimizing vector $p_0$ in the compact set C. Outside this compact set C, F assumes only larger values. Thus, $p_0$ is a minimizing vector for F in $P_m$.
To show that (1) implies (2), assume that x minimizes F on $P_m$. Then,
$$p_j^* F'(x) = \left.\frac{d}{d\alpha}\, F(x + \alpha p_j)\right|_{\alpha = 0} = 0,$$
for $j = 1, \ldots, m$. So $F'(x)$ is orthogonal to every vector in the basis of $S_m$ and, hence, is orthogonal to $S_m$.
To show (2) implies (1), suppose that $F'(x)$ is orthogonal to $S_m$. Let v be any vector in $P_m$. We are going to show that $F(v) > F(x)$ unless $v = x$. By Taylor’s theorem we have the following:
$$F(v) = F(x) + (v - x)^* F'(x) + \tfrac{1}{2}\,(v - x)^* A (v - x).$$
Since $v - x$ is a vector in $S_m$, we have $(v - x)^* F'(x) = 0$. In addition, $(v - x)^* A (v - x) > 0$ unless $v = x$, because A is positive definite. Thus, $F(v) > F(x)$ unless $v = x$. Hence, x is the unique absolute minimum for F in $P_m$.
Now we can prove that there is a unique vector $x_0$ in $P_m$ minimizing F on $P_m$. Earlier we established that there is at least one minimizing vector $p_0$ for F in $P_m$. Since (1) implies (2), $F'(p_0)$ is orthogonal to $S_m$. From the proof that (2) implies (1), it now follows that $p_0$ is the unique absolute minimum for F in $P_m$.
To show that (2) implies (3), write $x = x_1 + a_1 p_1 + \cdots + a_m p_m$, since x is in $P_m$, and assume that $F'(x)$ is orthogonal to the subspace $S_m$. We are going to show that $a_i = c_i/d_i$, where $c_i = -p_i^* F'(x_1)$, $d_i = p_i^* A p_i$, $i = 1, \ldots, m$. Note that $Ax = Ax_1 + a_1 A p_1 + \cdots + a_m A p_m$, and hence $Ax - k = Ax_1 - k + a_1 A p_1 + \cdots + a_m A p_m$.
Since $F'(x) = Ax - k$, then
$$F'(x) = F'(x_1) + a_1 A p_1 + \cdots + a_m A p_m.$$
For $i = 1, \ldots, m$, we have
$$p_i^* F'(x) = p_i^* F'(x_1) + a_1\, p_i^* A p_1 + \cdots + a_m\, p_i^* A p_m.$$
Since $\{p_1, \ldots, p_m\}$ is a mutually A-orthogonal set of vectors, $p_i^* F'(x) = p_i^* F'(x_1) + a_i\, p_i^* A p_i$, $i = 1, \ldots, m$. Since $F'(x)$ is orthogonal to the subspace $S_m$, $p_i^* F'(x) = 0$, $i = 1, \ldots, m$. Thus, $a_i\, p_i^* A p_i = -p_i^* F'(x_1)$. Since $p_i \ne 0$, $i = 1, \ldots, m$, and A is positive definite, $p_i^* A p_i \ne 0$, $i = 1, \ldots, m$. If we let $c_i = -p_i^* F'(x_1)$ and $d_i = p_i^* A p_i$, then
$$a_i = \frac{c_i}{d_i}, \quad i = 1, \ldots, m.$$
To show that (3) implies (2), we can use what was established in the previous proof. Suppose that $x = x_1 + a_1 p_1 + \cdots + a_m p_m$, where $a_i = c_i/d_i$, $c_i = -p_i^* F'(x_1)$, $d_i = p_i^* A p_i$, $i = 1, \ldots, m$. We want to show that $p_i^* F'(x) = 0$, $i = 1, \ldots, m$. Since
$$p_i^* F'(x) = p_i^* F'(x_1) + a_i\, p_i^* A p_i,$$
and $a_i = -p_i^* F'(x_1)/p_i^* A p_i$, we have $p_i^* F'(x) = 0$, $i = 1, \ldots, m$. Hence, $F'(x)$ is orthogonal to $S_m$. Thus, (1)–(3) are equivalent.
Now we are going to show that the quantity $c_i = -p_i^* F'(x_1)$, $i = 1, \ldots, m$, in (3) is also given by $c_i = -p_i^* F'(x_i)$, $i = 1, \ldots, m$.
Since $x_{i+1} = x_i + a_i p_i$, $i = 1, \ldots, m-1$, then
$$A x_{i+1} = A x_i + a_i A p_i, \qquad A x_{i+1} - k = A x_i - k + a_i A p_i, \qquad F'(x_{i+1}) = F'(x_i) + a_i A p_i, \quad i = 1, \ldots, m-1.$$
Thus,
$$F'(x_{i+1}) = F'(x_1) + a_1 A p_1 + \cdots + a_i A p_i, \quad i = 1, \ldots, m-1,$$
and, by conjugacy of $\{p_1, \ldots, p_m\}$, we have
$$p_i^* F'(x_i) = p_i^* \left[ F'(x_1) + a_1 A p_1 + \cdots + a_{i-1} A p_{i-1} \right] = p_i^* F'(x_1), \quad i = 1, \ldots, m.$$
Hence,
$$-p_i^* F'(x_1) = -p_i^* F'(x_i), \quad i = 1, \ldots, m.$$
This completes the proof of the theorem.    □

2.2. A Class of Minimization Algorithms

Now, we shall describe a class of minimization algorithms known as the method of CDs. The significance of the formulas given in (3) of Theorem 1 is indicated below.
Suppose $\{p_1, \ldots, p_m\}$, $1 \le m \le n$, is a conjugate set of nonzero vectors and that $P_m$ is the m-dimensional plane through $x_1$ obtained by translating the subspace $S_m$ generated by $\{p_1, \ldots, p_m\}$. Then, the minimum of F given by (1) on $P_m$ is attained at $x_0$, which we will call $x_{m+1}$, where $x_{m+1} = x_1 + a_1 p_1 + \cdots + a_m p_m$; refer to Theorem 1. Now assume that $p_{m+1}$ is a nonzero vector that has been constructed to be conjugate to $p_i$, $i = 1, \ldots, m$, and let $P_{m+1}$ denote the $(m+1)$-dimensional plane through $x_1$ obtained by translating the subspace $S_{m+1}$ generated by $\{p_1, \ldots, p_m, p_{m+1}\}$. It turns out that it is not necessary to solve a new $(m+1)$-dimensional minimization problem to determine the minimizing vector $x_{m+2}$ on $P_{m+1}$.
The minimizing vector $x_{m+2}$ on $P_{m+1}$ is obtained by a one-dimensional minimization of F about the vector $x_{m+1}$ in the direction $p_{m+1}$. This follows directly from the following formulas found in Theorem 1:
$$x_{m+2} = x_{m+1} + a_{m+1}\, p_{m+1},$$
and
$$a_{m+1} = \frac{c_{m+1}}{d_{m+1}}, \qquad c_{m+1} = -p_{m+1}^* F'(x_{m+1}), \qquad d_{m+1} = p_{m+1}^* A p_{m+1}.$$
Note that $a_{m+1}$ depends upon $x_{m+1}$ and $p_{m+1}$ and explicitly involves no other x or p terms. Thus, the minimizing vector $x_{m+1}$ on $P_m$ results from m consecutive one-dimensional minimizations, starting at $x_1$ and proceeding along the CDs $p_1, \ldots, p_m$ successively. The way of obtaining a mutually conjugate set $\{p_1, \ldots, p_m\}$ of vectors is not specified in general. Thus, the method of CDs is really a class of algorithms, where a specific algorithm depends upon the choice of $\{p_1, \ldots, p_m\}$. In practice, the vector $p_{k+1}$ needed for the $(k+1)$th iteration in finding $x_{k+1}$ is usually constructed from information obtained at the kth iteration. The following class of algorithms is referred to as the method of CDs: for $k = 1, \ldots, n$, we find
$$x_{k+1} = x_k + a_k p_k,$$
$$a_k = \frac{c_k}{d_k}, \qquad c_k = -p_k^* F'(x_1), \qquad d_k = p_k^* A p_k.$$
Alternatively, $c_k$ may be given by
$$c_k = -p_k^* F'(x_k).$$
If $F'(x_m) = 0$ for some $1 \le m \le n$, then the algorithm terminates and $x_m$ minimizes F on $E^n$. Furthermore, since F is quadratic, the algorithm terminates in n steps or fewer.
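The class of algorithms above can be sketched numerically. The following is a minimal illustration of our own (the function name and test problem are not from the paper): given a precomputed set of mutually A-orthogonal directions, each step is the one-dimensional minimization $a_i = c_i/d_i$.

```python
import numpy as np

def cd_minimize(A, k, x1, P):
    """Minimize F(x) = 0.5 x^T A x - k^T x + c along the mutually
    A-orthogonal columns p_1, ..., p_n of P, starting from x1.
    Each step uses a_i = c_i / d_i with c_i = -p_i^T F'(x_i) and
    d_i = p_i^T A p_i, where F'(x) = A x - k."""
    x = np.asarray(x1, dtype=float).copy()
    for i in range(P.shape[1]):
        p = P[:, i]
        c = -p @ (A @ x - k)      # c_i = -p_i^T F'(x_i)
        d = p @ (A @ p)           # d_i = p_i^T A p_i
        x = x + (c / d) * p       # one-dimensional minimization step
    return x
```

For a quadratic F the loop reaches the exact minimizer $A^{-1}k$ in at most n steps, matching the termination property stated above.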

2.3. Special Inner Product and the Gram–Schmidt Process

Let A be a positive definite symmetric $n \times n$ matrix. Define a special inner product $(x, y)$ by
$$(x, y) = x^* A y,$$
where x and y are column vectors.
Let
$$u_1^* = (1, 0, \ldots, 0), \quad u_2^* = (0, 1, 0, \ldots, 0), \quad \ldots, \quad u_n^* = (0, 0, \ldots, 0, 1).$$
Then, using the special inner product above, we apply the Gram–Schmidt process to the linearly independent vectors $u_1, u_2, \ldots, u_n$ to obtain a set of mutually A-orthogonal vectors $p_1, p_2, \ldots, p_n$, where A-orthogonality is relative to the special inner product, as performed by Hestenes and Stiefel [5] on p. 425.
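As a concrete illustration (a sketch of our own, not code from the paper), the Gram–Schmidt process under the inner product $(x, y) = x^* A y$ can be written as follows:

```python
import numpy as np

def gram_schmidt_A(A, U):
    """Apply Gram-Schmidt with the special inner product (x, y) = x^T A y
    to the linearly independent columns of U, producing mutually
    A-orthogonal vectors p_1, ..., p_n."""
    directions = []
    for u in U.T:
        p = u.astype(float).copy()
        for q in directions:
            # subtract the A-projection of u onto each earlier direction
            p -= (q @ (A @ u)) / (q @ (A @ q)) * q
        directions.append(p)
    return np.column_stack(directions)
```

Starting from the standard basis $u_1, \ldots, u_n$, the resulting matrix $P$ satisfies $p_i^* A p_j = 0$ for $i \ne j$.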

3. Results

A brief description of the CG method is given below using a quadratic function:
$$F(x) = \tfrac{1}{2}\, x^* A x - k^* x + c.$$
The CG method is the CD method, which is described previously, with the first CD being in the direction of the negative gradient of function F. The remaining CDs can be determined in a variety of ways, and the CG procedure described by Hestenes [10] is given below.

3.1. CG—Algorithms for Nonquadratic Approximations

One can apply the CG method to the quadratic function in z, namely $F(z)$, to obtain a minimum of $F(z)$. Let f be a function of n variables; then
$$F(z) = f(x_1) + (f'(x_1))^* z + \tfrac{1}{2}\, z^* f''(x_1)\, z.$$
Assume that the Hessian matrix $f''(x_1)$ is positive definite symmetric, which implies that $F(z)$ has a unique minimum $\bar z_{\min}$. Then,
$$F'(z) = f'(x_1) + f''(x_1)\, z.$$
Applying Newton’s method to $F'(z) = 0$, we get
$$f'(x_1) + f''(x_1)\, z = 0,$$
$$(f''(x_1))^{-1} f'(x_1) + z = 0 \quad \text{(multiplying by } (f''(x_1))^{-1}\text{)},$$
$$\bar z_{\min} = -(f''(x_1))^{-1} f'(x_1).$$
Remark 1.
We solved $F'(\bar z) = \bar 0$ directly to obtain $\min F(z)$.
In general, Newton’s method is used to solve $\bar f(\bar z) = \bar 0$ for $\bar z$. It is given by
$$z_{n+1} = z_n - J_n^{-1} f(z_n), \quad n = 0, 1, 2, \ldots,$$
where $z_0$ is an initial guess and $J_n$ is the Jacobian matrix, i.e.,
$$J_n = \begin{pmatrix} \dfrac{\partial f^{(1)}(z_n)}{\partial z_1} & \cdots & \dfrac{\partial f^{(1)}(z_n)}{\partial z_n} \\ \vdots & & \vdots \\ \dfrac{\partial f^{(n)}(z_n)}{\partial z_1} & \cdots & \dfrac{\partial f^{(n)}(z_n)}{\partial z_n} \end{pmatrix}.$$
Now, we apply Newton’s method by taking $\bar f$ to be $F'$ and assuming that F and its second partial derivatives are continuous. So, one can apply Newton’s method to $F'(z) = \bar 0$, with $z_1 = \bar 0$ as the initial point, to obtain the minimum point $\bar z_{\min}$ of F, where
$$z_{n+1} = z_n - J_n^{-1} F'(z_n) = z_n - (f''(x_1))^{-1} F'(z_n).$$
Then,
$$z_2 = z_1 - (f''(x_1))^{-1} F'(z_1) = \bar 0 - (f''(x_1))^{-1} F'(\bar 0),$$
where we take $z_1 = \bar 0$. Since
$$F'(\bar z) = f'(x_1) + f''(x_1)\,\bar z, \qquad F'(\bar 0) = f'(x_1) + f''(x_1)\,\bar 0 = f'(x_1),$$
then
$$z_2 = \bar 0 - (f''(x_1))^{-1} f'(x_1).$$
For convenience in exposition, we include formulas below from Hestenes [10], pp. 136–137 and pp. 199–202.
Here, the first step of Newton’s method is applied to $F'(z) = 0$, and $z_2$ also turns out to be the unique minimum of $F(z)$ (a quadratic function with a positive definite symmetric Hessian), i.e.,
$$z_2 = -(f''(x_1))^{-1} f'(x_1),$$
which satisfies $F'(z_2) = 0$. Therefore, Newton’s method terminates in one iteration [10].
The initial formulas for b k and c k given in Algorithm 1 imply the basic CG relations
p k * r k + 1 = 0 , s k * p k + 1 = 0 .
Algorithm 1 CG algorithm
Step 1: Select an initial point $x_1$. Set $r_1 = -f'(x_1)$, $p_1 = r_1$, $z_1 = 0$.
for $k = 1, \ldots, n$ do perform the following iteration:
  Step 2: $s_k = f''(x_1)\, p_k$,
  Step 3: $a_k = c_k/d_k$, $d_k = p_k^* s_k$, $c_k = p_k^* r_k$ or $c_k = p_k^* r_1$,
  Step 4: $z_{k+1} = z_k + a_k p_k$, $r_{k+1} = r_k - a_k s_k$,
  Step 5: $p_{k+1} = r_{k+1} + b_k p_k$, $b_k = -\dfrac{s_k^* r_{k+1}}{d_k}$ or $b_k = \dfrac{|r_{k+1}|^2}{|r_k|^2}$.
end for
Step 6: When $k = n$, take the next estimate of the minimum point $x_0$ of f to be the point $\bar x_1 = x_1 + z_{n+1}$.
Then choose $\bar x_1$ as the final estimate if $|f'(\bar x_1)|$ is sufficiently small.
Otherwise, reset $x_1 = \bar x_1$ and repeat the CG cycle (Step 1)–(Step 5).
The CG cycle can terminate prematurely at the mth step if $r_{m+1} = 0$. In this case, we replace $x_1$ by $\bar x_1 = x_1 + z_{m+1}$ and restart the algorithm.
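One cycle of Algorithm 1 can be sketched as follows, assuming the gradient and Hessian of f are available as callables (the names and the test problem are our own, not from the paper):

```python
import numpy as np

def cg_cycle(grad, hess, x1, tol=1e-12):
    """One cycle of the CG algorithm (Algorithm 1): r1 = -f'(x1),
    p1 = r1, s_k = f''(x1) p_k, with early termination when the
    residual vanishes."""
    H = hess(x1)
    z = np.zeros_like(np.asarray(x1, dtype=float))
    r = -grad(x1)
    p = r.copy()
    for _ in range(len(x1)):
        s = H @ p
        d = p @ s
        a = (p @ r) / d
        z = z + a * p
        r_next = r - a * s
        if np.linalg.norm(r_next) < tol:   # premature termination, r_{m+1} = 0
            break
        b = (r_next @ r_next) / (r @ r)    # b_k = |r_{k+1}|^2 / |r_k|^2
        p = r_next + b * p
        r = r_next
    return x1 + z
```

For a quadratic f, one cycle lands on the exact minimizer, in agreement with the n-step termination property.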
If we take $A = f''(x_1)$, where A is positive definite symmetric, then we establish the formula
$$f''(x_1)^{-1} = \sum_{k=1}^{n} \frac{p_k\, p_k^*}{d_k}$$
for the inverse of $f''(x_1)$.
Since Step 2 sets $s_k = f''(x_1)\, p_k$ and
$$\lim_{\sigma \to 0} \frac{f'(x_1 + \sigma p_k) - f'(x_1)}{\sigma} = f''(x_1)\, p_k,$$
we may rewrite the vector $s_k$ in Algorithm 1 as a difference quotient (see Hestenes [10]). Therefore, without computing the second derivative, we take
$$s_k = \frac{f'(x_1 + \sigma p_k) - f'(x_1)}{\sigma}.$$
In view of the development of Algorithms 1 and 2, each cycle of n steps is clearly comparable to one Newton step.
Thus, we replace $c_k = p_k^* r_k$ by $c_k = p_k^* r_1$ and obtain the following relation
$$z_{n+1} = \sum_{k=1}^{n} \frac{c_k\, p_k}{d_k} = \sum_{k=1}^{n} \frac{p_k\, p_k^*\, r_1}{d_k} = H(x_1, \sigma)(r_1) = -H(x_1, \sigma)\, f'(x_1),$$
where
$$H(x_1, \sigma) = \sum_{k=1}^{n} \frac{p_k\, p_k^*}{d_k}, \qquad r_1 = -f'(x_1).$$
Algorithm 2 CG algorithm without derivative
Step 1: Initially select $x_1$ and choose a positive constant σ. Set $z_1 = 0$, $r_1 = -f'(x_1)$, $p_1 = r_1$.
for $k = 1, \ldots, n$ do perform the following iteration:
  Step 2: $s_k = \dfrac{f'(x_1 + \sigma p_k) - f'(x_1)}{\sigma}$, $\sigma_k = \sigma |p_k|$,
  Step 3: $a_k = c_k/d_k$, $d_k = p_k^* s_k$, $c_k = p_k^* r_k$,
  Step 4: $z_{k+1} = z_k + a_k p_k$, $r_{k+1} = r_k - a_k s_k$,
  Step 5: $p_{k+1} = r_{k+1} + b_k p_k$, $b_k = -\dfrac{s_k^* r_{k+1}}{d_k}$.
end for
Step 6: When $k = n$, then $\bar x_1 = x_1 + z_{n+1}$ is to be the next estimate of the minimum point $x_0$ of f.
Then accept $\bar x_1$ as the final estimate of $x_0$ if $|f'(\bar x_1)|$ is sufficiently small.
Otherwise, reset $x_1 = \bar x_1$ and repeat the CG cycle (Step 1)–(Step 5).
The new initial point $\bar x_1 = x_1 + z_{n+1}$ generated by one cycle of the modified Algorithm 2 is, therefore, given by the Newton-type formula
$$\bar x_1 = x_1 - H(x_1, \sigma)\, f'(x_1).$$
So, we have $\lim_{\sigma \to 0} H(x_1, \sigma) = f''(x_1)^{-1}$. The above algorithm approximates the Newton algorithm
$$\bar x_1 = x_1 - f''(x_1)^{-1} f'(x_1)$$
and has this algorithm as a limit as $\sigma \to 0$. Therefore, Algorithm 2 will have nearly identical convergence features to Newton’s algorithm if σ is replaced by $\sigma^2$ at the end of each cycle.
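The cycle with the Hessian-vector product replaced by the gradient difference quotient of Step 2 can be sketched as follows (our own naming and test problem; for a quadratic f the quotient is exact up to roundoff, so one cycle essentially reaches the minimizer):

```python
import numpy as np

def cg_cycle_no_hessian(grad, x1, sigma=1e-6):
    """One cycle of Algorithm 2: s_k = (f'(x1 + sigma p_k) - f'(x1)) / sigma
    replaces the Hessian-vector product, so only first derivatives are used."""
    g1 = grad(x1)
    z = np.zeros_like(np.asarray(x1, dtype=float))
    r = -g1
    p = r.copy()
    for _ in range(len(x1)):
        s = (grad(x1 + sigma * p) - g1) / sigma   # difference quotient
        d = p @ s
        a = (p @ r) / d
        z = z + a * p
        r_next = r - a * s
        b = -(s @ r_next) / d                     # b_k = -s_k^T r_{k+1} / d_k
        p = r_next + b * p
        r = r_next
    return x1 + z
```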

3.2. Conjugate Gram–Schmidt (CGS)—Algorithms for Nonquadratic Functions

With an appropriate initial point $x_1$, we can derive the algorithm described by Hestenes [10] on p. 135, which relates Newton’s method to a CGS algorithm. Since [10]
$$\lim_{\sigma \to 0} \frac{f'(x_1 + \sigma p_k) - f'(x_1)}{\sigma} = f''(x_1)\, p_k, \qquad (11)$$
we can approximate the vector $f''(x_1)\, p_k$ by the vector
$$s_k = \frac{f'(x_1 + \sigma p_k) - f'(x_1)}{\sigma}$$
with a small value of σ. Then, we obtain the following modification of Newton’s algorithm, the CGS algorithm (see Hestenes [10]). Conversely, if in Step 2 of Algorithm 3 we substitute
$$s_k = f''(x_1)\, p_k$$
and repeat the CGS algorithm, we obtain Newton’s algorithm.
Algorithm 3 CGS algorithm
Step 1: Select a point $x_1$, a small positive constant $\sigma > 0$ and n linearly independent vectors $u_1, \ldots, u_n$; set $z_1 = 0$, $r_1 = -f'(x_1)$, $p_1 = u_1$.
for $k = 1, \ldots, n$, having obtained $z_k$, $r_k$ and $p_k$, do perform the following iteration:
  Step 2: $s_k = \dfrac{f'(x_1 + \sigma p_k) - f'(x_1)}{\sigma}$, $\sigma_k = \sigma |p_k|$,
  Step 3: $d_k = p_k^* s_k$, $c_k = p_k^* r_1$, $a_k = c_k/d_k$,
  Step 4: $z_{k+1} = z_k + a_k p_k$,
  Step 5: $b_{k+1,j} = \dfrac{s_j^* u_{k+1}}{d_j}$ $(j = 1, \ldots, k)$,
  Step 6: $p_{k+1} = u_{k+1} - b_{k+1,1}\, p_1 - \cdots - b_{k+1,k}\, p_k$.
end for
Step 7: When $z_{n+1}$ has been computed, the cycle is terminated; set $\bar x_1 = x_1 + z_{n+1}$.
Then choose $\bar x_1$ as the final estimate if $|f'(\bar x_1)|$ is sufficiently small.
Otherwise, reset $x_1 = \bar x_1$ and repeat the CGS cycle (Step 1)–(Step 6).
In view of (11), for small $\sigma > 0$, the CGS Algorithm 3 is a good approximation of Newton’s algorithm and has it as a limit as $\sigma \to 0$.
A simple modification of Algorithm 3 is obtained by replacing the formulas in Step 2 and Step 3 with the following, as described in Hestenes [10]:
$$s_k = \frac{f'(x_1 + \sigma p_k) - f'(x_1)}{\sigma}, \qquad \sigma_k = \sigma |p_k|,$$
$$x_k = x_1 + z_k, \qquad d_k = p_k^* s_k, \qquad c_k = -p_k^* f'(x_k), \qquad a_k = \frac{c_k}{d_k}.$$
A CGS algorithm for nonquadratic functions is obtained from the following relations. The ratios
$$c(\sigma) = \frac{f(x - \sigma p) - f(x + \sigma p)}{2\sigma},$$
$$d(\sigma) = \frac{f(x - \sigma p) - 2 f(x) + f(x + \sigma p)}{\sigma^2}$$
have the properties
$$\lim_{\sigma \to 0} c(\sigma) = -p^* f'(x), \qquad \lim_{\sigma \to 0} d(\sigma) = p^* f''(x)\, p,$$
where p is a nonzero vector. Moreover, for a given vector $u \ne 0$, the ratio
$$c(\alpha, \sigma) = \frac{f(x + \alpha u - \sigma p) - f(x + \alpha u + \sigma p)}{2\sigma}$$
has the property that
$$\lim_{\alpha \to 0}\, \lim_{\sigma \to 0} \frac{c(\sigma) - c(\alpha, \sigma)}{\alpha} = u^* f''(x)\, p.$$
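The ratios above are ordinary central differences, and their limiting behavior is easy to check numerically. The following sketch is our own (the test function is chosen purely for illustration):

```python
import numpy as np

def c_ratio(f, x, p, sigma):
    """c(sigma): central difference approximating -p^T f'(x)."""
    return (f(x - sigma * p) - f(x + sigma * p)) / (2 * sigma)

def d_ratio(f, x, p, sigma):
    """d(sigma): second central difference approximating p^T f''(x) p."""
    return (f(x - sigma * p) - 2 * f(x) + f(x + sigma * p)) / sigma ** 2
```

For $f(v) = v_1^3 + 2 v_1 v_2^2$ at $x = (1, 2)$ with $p = (1, 1)$, the gradient is $(11, 8)$ and the Hessian is $\begin{pmatrix} 6 & 8 \\ 8 & 4 \end{pmatrix}$, so $c(\sigma) \to -19$ and $d(\sigma) \to 26$.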
The details are as follows. Suppose $p_1, p_2, \ldots, p_n$ is an orthogonal basis that spans the same vector space as the linearly independent vectors $u_1, u_2, \ldots, u_n$, where the inner product $(x, y)$ is defined by $x^* A y$ for a positive definite symmetric matrix A. Then, the Gram–Schmidt process works as follows:
$$\bar p_1 = u_1, \qquad p_1 = \frac{\bar p_1}{|\bar p_1|},$$
$$\bar p_2 = u_2 - \frac{(u_2, p_1)}{(p_1, p_1)}\, p_1, \qquad p_2 = \frac{\bar p_2}{|\bar p_2|},$$
$$\bar p_3 = u_3 - \frac{(u_3, p_1)}{(p_1, p_1)}\, p_1 - \frac{(u_3, p_2)}{(p_2, p_2)}\, p_2 = u_3 - \frac{p_1^* A u_3}{p_1^* A p_1}\, p_1 - \frac{p_2^* A u_3}{p_2^* A p_2}\, p_2, \qquad p_3 = \frac{\bar p_3}{|\bar p_3|},$$
$$\vdots$$
$$\bar p_{k+1} = u_{k+1} - \frac{p_1^* A u_{k+1}}{p_1^* A p_1}\, p_1 - \cdots - \frac{p_k^* A u_{k+1}}{p_k^* A p_k}\, p_k, \qquad p_{k+1} = \frac{\bar p_{k+1}}{|\bar p_{k+1}|}.$$
Take $A = f''(x_1)$; then
$$\bar p_{k+1} = u_{k+1} - \frac{p_1^* f''(x_1)\, u_{k+1}}{p_1^* f''(x_1)\, p_1}\, p_1 - \cdots - \frac{p_k^* f''(x_1)\, u_{k+1}}{p_k^* f''(x_1)\, p_k}\, p_k.$$
We already proved that
$$p_k^* A p_k = d_k, \quad \text{i.e.,} \quad p_k^* f''(x_1)\, p_k = d_k.$$
Then,
$$\bar p_{k+1} = u_{k+1} - \frac{p_1^* f''(x_1)\, u_{k+1}}{d_1}\, p_1 - \cdots - \frac{p_k^* f''(x_1)\, u_{k+1}}{d_k}\, p_k.$$
We also know that
$$p_k^* f''(x_1) = s_k^*.$$
Therefore,
$$\bar p_{k+1} = u_{k+1} - \frac{s_1^* u_{k+1}}{d_1}\, p_1 - \cdots - \frac{s_k^* u_{k+1}}{d_k}\, p_k = u_{k+1} - b_{k+1,1}\, p_1 - \cdots - b_{k+1,k}\, p_k,$$
and $p_{k+1} = \bar p_{k+1}/|\bar p_{k+1}|$.
Now, using function values only, a conjugate Gram–Schmidt process without derivatives is described by Hestenes [10] as follows, as the CGS routine without derivatives (Algorithm 4):
Algorithm 4 CGS algorithm without derivatives
Step 1: Select an initial point $x_1$, a small $\sigma > 0$ and a set of linearly independent unit vectors $u_1, \ldots, u_n$; set $z_1 = 0$, $p_1 = u_1$, $\alpha = 2\sigma$, $\gamma_0 = 0$; compute $f(x_1)$.
for $k = 1, \ldots, n$, having obtained $z_k$, $p_1, \ldots, p_k$ and $\gamma_{k-1}$, do perform the following iteration:
  Step 2: $d_k = \dfrac{f(x_1 - \sigma p_k) - 2 f(x_1) + f(x_1 + \sigma p_k)}{\sigma^2}$,
  Step 3: $c_k = \dfrac{f(x_1 - \sigma p_k) - f(x_1 + \sigma p_k)}{2\sigma}$,
  Step 4: $\gamma_k = \max[\gamma_{k-1}, |c_k|]$,
  Step 5: $a_k = c_k/d_k$, $z_{k+1} = z_k + a_k p_k$,
  Step 6: $p_{k+1} = u_{k+1} - b_{k+1,1}\, p_1 - \cdots - b_{k+1,k}\, p_k$, with the coefficients $b_{k+1,j}$ computed from difference quotients as described below.
end for
Step 7: When $z_{n+1}$ has been computed, the cycle is terminated; set $\bar x_1 = x_1 + z_{n+1}$.
Then choose $\bar x_1$ as the final estimate if $\gamma_n$ is sufficiently small; $\bar x_1$ is the minimum of f.
Otherwise, reset $x_1 = \bar x_1$ and repeat the CGS cycle (Step 1)–(Step 6) with the initial condition $\gamma_0 = 0$.
In addition, the conjugate Gram–Schmidt method without derivatives is described by Dennemeyer and Mookini [26]. Their program uses notation different from Hestenes’, but it implements the same procedure.
Initial step: select an initial point $x_1$, a small $\sigma > 0$ and a set of linearly independent vectors $u_1, \ldots, u_n$;
set $h_1 = 0$, $p_1 = u_1$, $\alpha = 2\sigma$, $\gamma_0 = 0$ and compute $f(x_1)$.
Iterative steps: given $x_1$, $p_1, \ldots, p_k$, $h_k$, compute
$$d_k = \frac{f(x_1 - \sigma p_k) - 2 f(x_1) + f(x_1 + \sigma p_k)}{\sigma^2}, \quad c_k = \frac{f(x_1 - \sigma p_k) - f(x_1 + \sigma p_k)}{2\sigma}, \quad \gamma_k = \max\{\gamma_{k-1}, |c_k|\}, \quad a_k = \frac{c_k}{d_k}, \quad h_{k+1} = h_k + a_k p_k;$$
for $j = 1, \ldots, k$ compute
$$c_{k+1,j} = \frac{f(x_1 + \alpha u_{k+1} - \sigma p_j) - f(x_1 + \alpha u_{k+1} + \sigma p_j)}{2\sigma}, \qquad a_{k+1,j} = \frac{c_{k+1,j}}{d_j}, \qquad b_{k+1,j} = \frac{a_{k+1,j} - a_j}{\alpha};$$
then,
$$p_{k+1} = u_{k+1} + \sum_{j=1}^{k} b_{k+1,j}\, p_j.$$
Terminate when $h_{n+1}$ is obtained, and set $x_{n+1} = x_1 + h_{n+1}$. If the value $\gamma_n$ is small enough, $x_{n+1}$ is the minimum point of f. Otherwise, set $x_1 = x_{n+1}$ and repeat the program.
The term $\gamma_n$ is used to terminate the algorithm because the gradient is not explicitly computed. Another termination test is to check whether $\max_j |a_j| < \epsilon$ for a tolerance ϵ chosen beforehand. Both of these tests were used in the computations of Dennemeyer and Mookini [26], and the results were comparable.
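Putting the iterative steps together, one cycle of the derivative-free routine in the Dennemeyer–Mookini form can be sketched as below. This is our own implementation of the formulas above, not code from the paper; for a quadratic f the central differences are exact, so one cycle reproduces the A-orthogonal Gram–Schmidt directions and lands on the minimizer up to roundoff.

```python
import numpy as np

def cgs_cycle_no_derivatives(f, x1, sigma=1e-4):
    """One cycle of the CGS routine using function values only.
    c_k and d_k are central differences; the coefficients b_{k+1,j}
    are built from the shifted ratios c_{k+1,j}."""
    n = len(x1)
    alpha = 2 * sigma
    U = np.eye(n)                     # u_1, ..., u_n: standard basis
    f1 = f(x1)
    h = np.zeros(n)
    P, a_vals, d_vals = [], [], []
    p = U[:, 0].copy()
    for k in range(n):
        d = (f(x1 - sigma * p) - 2 * f1 + f(x1 + sigma * p)) / sigma ** 2
        c = (f(x1 - sigma * p) - f(x1 + sigma * p)) / (2 * sigma)
        a = c / d
        h = h + a * p
        P.append(p); a_vals.append(a); d_vals.append(d)
        if k + 1 < n:
            u = U[:, k + 1]
            p = u.copy()
            for j in range(k + 1):
                c_shift = (f(x1 + alpha * u - sigma * P[j])
                           - f(x1 + alpha * u + sigma * P[j])) / (2 * sigma)
                b = (c_shift / d_vals[j] - a_vals[j]) / alpha
                p = p + b * P[j]      # p_{k+1} = u_{k+1} + sum_j b_{k+1,j} p_j
    return x1 + h
```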

4. Discussion

In this section, we present a computation to illustrate convergence rates, as well as the relationship between that computation and Newton’s method. Two of the most important questions in the study of iterative processes are the following: (a) whether the iterations converge; and (b) how fast the convergence is. We introduce the idea of rates of convergence, as described by Ortega and Rheinboldt [14].

4.1. Rates of Convergence

A precise formulation of the asymptotic rate of convergence of a sequence $\{x_k\}$ converging to $x^*$ is motivated by the fact that estimates of the form
$$\|x_{k+1} - x^*\| \le C\, \|x_k - x^*\|^p, \qquad (13)$$
for all $k = 1, 2, \ldots$, often arise naturally in the study of certain iterative processes.
Definition 1.
Let $\{x_k\}$ be a sequence of points in $\mathbb{R}^n$ that converges to a point $x^*$, and let $1 \le p < \infty$. Ortega and Rheinboldt [14] define the quantities
$$Q_p\{x_k\} = \begin{cases} 0 & \text{if } x_k = x^* \text{ for all but finitely many } k, \\[4pt] \limsup_{k \to \infty} \dfrac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|^p} & \text{if } x_k \ne x^* \text{ for all but finitely many } k, \\[4pt] +\infty & \text{otherwise,} \end{cases}$$
and refer to them as quotient convergence factors, or Q-factors for short.
Definition 2.
Let $C(\mathcal{I}, x^*)$ denote the set of all sequences with limit $x^*$ that are generated by an iterative process $\mathcal{I}$. Then
$$Q_p(\mathcal{I}, x^*) = \sup\{\, Q_p\{x_k\} : \{x_k\} \in C(\mathcal{I}, x^*) \,\}, \qquad 1 \le p < +\infty,$$
are the Q-factors of $\mathcal{I}$ at $x^*$ with respect to the norm in which the $Q_p\{x_k\}$ are computed.
Note that if $Q_p\{x_k\} < +\infty$ for some p with $1 \le p < \infty$, then, for any $\epsilon > 0$, there is a positive integer K such that (13) above holds for all $k \ge K$ with $C = Q_p\{x_k\} + \epsilon$. If $0 < Q_p\{x_k\} < \infty$, we say that $\{x_k\}$ converges to $x^*$ with Q-order of convergence p, and if $Q_p\{x_k\} = 0$ for some fixed p satisfying $1 \le p < \infty$, we say that $\{x_k\}$ has superconvergence of Q-order p to $x^*$. For example, if $0 < Q_1\{x_k\} < +\infty$, so that $0 < C < 1$ in (13), we say that $\{x_k\}$ converges to $x^*$ linearly. In addition, if $Q_1\{x_k\} = 0$, we say that $\{x_k\}$ converges to $x^*$ superlinearly.
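As a numerical illustration (our own sketch, not from the paper), the quotients defining $Q_p$ can be estimated directly from the tail of an iteration. For Newton’s method on $f(x) = x^2 - 2$, which converges quadratically to $\sqrt{2}$, the $p = 2$ quotients approach $|f''(x^*)/(2 f'(x^*))| = 1/(2\sqrt{2}) \approx 0.354$:

```python
import numpy as np

def q_quotients(seq, x_star, p):
    """Quotients ||x_{k+1} - x*|| / ||x_k - x*||^p, whose lim sup is Q_p."""
    errs = [np.linalg.norm(np.atleast_1d(x) - x_star) for x in seq]
    return [errs[i + 1] / errs[i] ** p
            for i in range(len(errs) - 1) if errs[i] > 0]

# Newton iterates for f(x) = x^2 - 2, converging quadratically to sqrt(2)
xs = [1.5]
for _ in range(3):
    xs.append(0.5 * (xs[-1] + 2.0 / xs[-1]))
```

A bounded, nonzero sequence of $p = 2$ quotients is numerical evidence of Q-order 2, the behavior we verify for the GSCD procedure.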
Definition 3.
One other method of describing the convergence rate involves the root convergence factors [14]:
$$R_p\{x_k\} = \begin{cases} \limsup_{k \to \infty} \|x_k - x^*\|^{1/k} & \text{if } p = 1, \\[4pt] \limsup_{k \to \infty} \|x_k - x^*\|^{1/p^k} & \text{if } p > 1. \end{cases}$$

4.2. Acceleration

One acceleration procedure is the following. First, apply n CD steps to an initial point $x_1$ to obtain a point $x_{n+1} = y_1$; then, take $y_1$ as a new initial point and apply n CD steps again to obtain another point $y_2$. Check for acceleration by evaluating $Q = F(y_2 + (y_2 - y_1))$: if $Q < F(y_2)$, accelerate by taking $y_2 + (y_2 - y_1)$ as the new initial point; if $Q \ge F(y_2)$, take $y_2$ as the new initial point. After two more applications of the CD method, we check for acceleration again. The procedure continues in this manner [25].
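This acceleration loop can be sketched as follows. The names are hypothetical, and the `cycle` argument stands in for one pass of n CD steps; in the test a plain gradient step is used only to exercise the accept/reject logic, not as the CD method itself.

```python
import numpy as np

def accelerate(F, cycle, x1, rounds=20):
    """Run the cycle twice, then test the extrapolated point
    y2 + (y2 - y1); keep it only if it lowers F."""
    x = np.asarray(x1, dtype=float)
    for _ in range(rounds):
        y1 = cycle(x)
        y2 = cycle(y1)
        trial = y2 + (y2 - y1)      # extrapolation along the last move
        x = trial if F(trial) < F(y2) else y2
    return x
```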

4.3. Test Function

4.3.1. Rosenbrock’s Banana Valley Function

We carry out the following computations for Rosenbrock’s banana valley function ($n = 2$). This function possesses a steep-sided valley that is nearly parabolic in shape. First, we determine values in the domain of Rosenbrock’s function for which its Hessian matrix is positive definite symmetric. The function is non-negative:
$$f(x, y) = 100\,(y - x^2)^2 + (x - 1)^2 \ge 0.$$
We have
$$f_x = 200\,(y - x^2)(-2x) + 2\,(x - 1) = -400x\,(y - x^2) + 2\,(x - 1),$$
$$f_{xx} = -400\,(y - x^2) - 400x\,(-2x) + 2 = 1200 x^2 - 400 y + 2,$$
and
$$f_{xy} = -400 x, \qquad f_y = 200\,(y - x^2), \qquad f_{yy} = 200.$$
Therefore, the Hessian matrix is positive definite symmetric if and only if Sylvester’s criterion holds:
$$1200 x^2 - 400 y + 2 > 0 \quad \text{and} \quad 200\,(1200 x^2 - 400 y + 2) - 160000\, x^2 > 0.$$
The first condition gives $1200 x^2 + 2 > 400 y$, i.e., $y < 3 x^2 + \frac{1}{200}$, and the second gives
$$1200 x^2 - 400 y + 2 - 800 x^2 > 0, \qquad 400 x^2 + 2 > 400 y, \qquad y < x^2 + \tfrac{1}{200}.$$
Since the second condition implies the first, the Hessian matrix is positive definite symmetric if and only if $y < x^2 + \frac{1}{200}$.
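The boundary $y = x^2 + \frac{1}{200}$ can be checked numerically by testing the eigenvalues of the Hessian (a small sketch of our own):

```python
import numpy as np

def rosenbrock_hessian(x, y):
    """Hessian of f(x, y) = 100 (y - x^2)^2 + (x - 1)^2."""
    return np.array([[1200 * x ** 2 - 400 * y + 2, -400 * x],
                     [-400 * x, 200.0]])

def hessian_is_positive_definite(x, y):
    """Positive definiteness via the (all positive) eigenvalue test."""
    return bool(np.all(np.linalg.eigvalsh(rosenbrock_hessian(x, y)) > 0))
```

Just below the curve $y = x^2 + \frac{1}{200}$ the Hessian is positive definite; just above it, it is not.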
Figure 1 shows the maximal convex level set on which the Hessian is positive definite symmetric in the interior for Rosenbrock’s Banana Valley Function.

4.3.2. Kantorovich’s Function

The following function,
$$F(x, y) = (3 x^2 y + y^2 - 1)^2 + (x^4 + x y^3 - 1)^2,$$
which is non-negative, i.e., $F(x, y) \ge 0$, is called Kantorovich’s function.
Calculating the Hessian matrix for Kantorovich’s function, we find that
F x x = 72 x 2 y 2 + 12 ( 3 x 2 y + y 2 1 ) y + 2 ( 4 x 3 + y 3 ) 2 + 24 ( x 4 + x y 3 1 ) x 2 ,
F x y = 12 ( 3 x 2 + 2 y ) x y + 12 ( 3 x 2 y + y 2 1 ) x + 6 x y 2 ( 4 x 3 + y 3 ) + 6 ( x 4 + x y 3 1 ) y 2
and
F_yy = 2 ( 3x^2 + 2y )^2 + 12x^2 y + 4y^2 - 4 + 18x^2 y^4 + 12 ( x^4 + xy^3 - 1 ) xy .
Since F is a sum of squares, minimizing it is equivalent to solving the nonlinear system 3x^2 y + y^2 - 1 = 0 , x^4 + xy^3 - 1 = 0 . For the initial point ( 0.98 , 0.32 ) , we obtain the minimum point at (0.992779, 0.306440) [25].
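As a sketch of this equivalence (not the paper’s CGS program), one can apply Newton’s method directly to the residual system in double precision; the names below are ours:

```python
import numpy as np

def newton_kantorovich(x, y, tol=1e-12, max_iter=50):
    """Newton's method on g = h = 0, where
    g = 3x^2 y + y^2 - 1 and h = x^4 + x y^3 - 1,
    equivalent to minimizing F = g^2 + h^2 (F >= 0)."""
    for _ in range(max_iter):
        g = 3 * x**2 * y + y**2 - 1
        h = x**4 + x * y**3 - 1
        if abs(g) < tol and abs(h) < tol:
            break
        # Jacobian of (g, h)
        J = np.array([[6 * x * y, 3 * x**2 + 2 * y],
                      [4 * x**3 + y**3, 3 * x * y**2]])
        dx, dy = np.linalg.solve(J, [-g, -h])
        x, y = x + dx, y + dy
    return x, y
```

Starting from (0.98, 0.32), the iteration settles on the minimum point reported above.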

4.4. Numerical Computation

The goal of this numerical computation is to provide a system of iterative approaches for finding these extreme points [10]. A significant point is that a Newton step can be replaced by a CD sequence of n linear minimizations in n appropriately chosen directions.
It is important to keep in mind that a function behaves like a quadratic function in the neighborhood of a nondegenerate minimum point. Conjugacy can be thought of as a generalization of orthogonality: conjugate direction methods substitute conjugate bases for orthogonal bases in the foundational structure. The formulas for determining the minimum point of a quadratic function take their simplest forms when the CD technique is followed.
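This substitution can be made concrete: running Gram–Schmidt with the inner product ⟨u, v⟩ = uᵀAv, for a symmetric positive definite matrix A, turns any basis into an A-conjugate one. A minimal sketch (names ours):

```python
import numpy as np

def gram_schmidt_conjugate(A, vectors):
    """Produce directions p_i with p_i^T A p_j = 0 for i != j
    by Gram-Schmidt in the A-inner product."""
    ps = []
    for v in vectors:
        p = v.astype(float).copy()
        for q in ps:
            # subtract the A-projection of v onto each earlier direction
            p -= (q @ A @ v) / (q @ A @ q) * q
        ps.append(p)
    return ps
```

On a quadratic with Hessian A, successive line minimizations along such directions reach the minimizer in n steps.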
The conjugate direction algorithms for minimizing a quadratic function discussed in the current work were initially presented by Hestenes and Stiefel [5]. Davidon [3] and Fletcher and Powell [4] are best known for the modifications and extensions they made to these methods, although numerous other authors contributed as well.
The iterative methods described above apply to many problems: least squares fitting, linear and nonlinear systems of equations, and optimization problems with and without constraints [25]. The computing performance and numerical results of these techniques, and comparisons among them, have received significant attention in recent years, focused on unconstrained optimization problems and large-scale applications [19,27].
The Rosenbrock function of two variables, considered in Section 4.3, was introduced by Rosenbrock [18] as a simple test function for minimization algorithms. We chose ( x 1 , y 1 ) = ( -1.2 , 1 ) as the initial point. We applied algorithm (4.4a)–(4.4f) with σ = 0.1 × 10^-120 , using 400-digit accuracy. Algorithm (4) is basically Newton’s algorithm.
The final estimate of ( x 0 , y 0 ) has more than 150-digit accuracy. The successive quotients 0.8574 , 0.0274 , 0.2433 , 0.0030 , 0.2000 , 0.0030 , 0.2000 that lead to the quotient convergence factor oscillate. Their lim sup, 0.2000 , gives the quotient convergence factor, which indicates quadratic convergence.
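The quadratic behavior of these quotients can be illustrated in double precision (which cannot reproduce the paper’s 400-digit runs) by applying Newton’s method to ∇f = 0 for Rosenbrock’s function; the sketch below, with names of our choosing, records the error quotients e_{k+1}/e_k^2 whose lim sup estimates the Q_2 factor:

```python
import numpy as np

def grad(v):
    # gradient of f(x, y) = 100(y - x^2)^2 + (x - 1)^2
    x, y = v
    return np.array([-400 * x * (y - x**2) + 2 * (x - 1),
                     200 * (y - x**2)])

def hess(v):
    # Hessian, as computed in Section 4.3.1
    x, y = v
    return np.array([[1200 * x**2 - 400 * y + 2, -400 * x],
                     [-400 * x, 200.0]])

xstar = np.array([1.0, 1.0])   # known minimizer
v = np.array([-1.2, 1.0])      # standard starting point
errors = []
for _ in range(50):
    err = np.linalg.norm(v - xstar)
    if err < 1e-13:
        break
    errors.append(err)
    v = v - np.linalg.solve(hess(v), grad(v))   # Newton step on grad f = 0

# quotients e_{k+1} / e_k^2; their lim sup estimates the Q_2 factor
quotients = [errors[k + 1] / errors[k]**2 for k in range(len(errors) - 1)]
```

In double precision the iteration converges in a handful of steps, and the quotients oscillate in the manner described above.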
For σ = 0.1 × 10^-120 , ρ = 0.2 × 10^-120 , ϵ = 0.1 × 10^-60 and the initial values, we obtained the following computations for Rosenbrock’s function f using the Gram–Schmidt conjugate direction method without derivatives (the CGS method, no derivatives) and Newton’s method applied to ∇f = 0 (see [28]).
For additional information regarding the programming, please refer to the supplementary material.

4.5. Differential Equations of Steepest Descent

The following equations are known as the differential equations of steepest descent:
dx ( t ) / dt = -∇F ( x ( t ) ) ,
and
dx ( t ) / dt = -∇F ( x ( t ) ) / | | ∇F ( x ( t ) ) | |^2 .
The solution to either differential equation of steepest descent with initial condition x 1 ( 0 ) = -1.2 , x 2 ( 0 ) = 1.0 is shown in Figure 3; one can refer to Equation (10), p. 783, in Eells [17]. For Equation (14), the solution will not reach the minimum for finite values of t. For Equation (15), the solution will approach the minimum, but the right-hand side blows up at the minimum.
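A simple forward-Euler integration of Equation (14) illustrates this behavior (a sketch, not the paper’s computation; the step size and horizon are our choices): the flow descends quickly into the valley but is still far from the minimum after a moderate time.

```python
import numpy as np

def grad_F(v):
    # gradient of Rosenbrock's banana valley function
    x, y = v
    return np.array([-400 * x * (y - x**2) + 2 * (x - 1),
                     200 * (y - x**2)])

F = lambda v: 100 * (v[1] - v[0]**2)**2 + (v[0] - 1)**2

# forward-Euler integration of dx/dt = -grad F(x), Equation (14)
v = np.array([-1.2, 1.0])
dt = 1e-4                   # small step: the narrow valley makes the flow stiff
for _ in range(20000):      # integrate up to t = 2
    v = v - dt * grad_F(v)
```

The function value drops sharply at first, but the trajectory then creeps along the valley floor toward (1, 1) without reaching it.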
From a numerical point of view, the differential equation approach has to be used with caution. Rosenbrock [18] pointed out that the iterative method of steepest descent with line searches was not effective in steep valleys. The iterative method was introduced by Cauchy [16].
In summary, the method of steepest descent is not effective and does not compare with Hestenes’ CGS method with no derivatives, which is almost numerically equivalent to Newton’s method applied to grad ( f ) = 0 , where f is the function to be minimized.
Below are level curves of Rosenbrock’s banana valley function, which we used to compare Hestenes’ CGS method, Newton’s method and the steepest descent methods. In Figure 2, the level curves show that the minimizer is at ( 1 , 1 ) . Level curves are plotted for function values 4.0 , 4.1 , 4.25 , 4.5 in Figure 3, where both the iterative method and the ODE approach for steepest descent are illustrated. The curve y = x^2 follows the valley floor in the graph.
We use the CGS method for computation. Rosenbrock’s banana valley function
F ( x 1 , x 2 ) = ( 1 - x 1 )^2 + 100 ( x 2 - x 1^2 )^2 ,
attains its minimum at ( 1 , 1 ) .
This example provided us with geometric illustrations in Figure 2. For the specific algorithms, please refer to Section 3 for the Gram–Schmidt conjugate direction method and Newton’s method in order to compare the two methods alongside one another.
The outcomes of the numerical experiments performed on the standard test function using the CGS method are reported above. Based on these data, it is clear that this particular implementation of the CGS method is quite effective.

5. Conclusions

In this paper, we introduced a class of CD algorithms that, for small values of n, provided effective minimization methods. As n grew, however, the algorithms became more and more costly to run.
The computer program above showed that the CGS algorithm without derivatives can reproduce Newton’s method. Since the Hessian matrix of Rosenbrock’s function is positive definite symmetric, satisfying Sylvester’s criterion, on the convex set identified in Section 4.3.1, the CGS method converged whenever we began anywhere in that closed convex set near the minimum.
Using quotient convergence factors, one can see that for Rosenbrock’s function the sequence converged quadratically. In particular, the numerical computation in Section 4.4 revealed that the asymptotic constant oscillated between 0.20000 and 0.00307 , so the quotient convergence factor of Ortega and Rheinboldt [14] was approximately Q 2 { x k } = 0.20000 , which indicates quadratic convergence. The results agreed with those for Newton’s method.
Moreover, the CGS algorithm uses function evaluations and difference quotients for gradient and Hessian evaluations; it requires neither accurate gradient evaluations nor accurate function minimizations. This approach is the most efficient algorithm discussed in this study, yet it is extremely sensitive both to the choice of σ used for difference quotients and to the choice of ρ used for scaling.
The Gram–Schmidt conjugate direction method without derivatives has been used quite successfully in a variety of applications, including radar design by Norman Olsen [27] in developing corporate feed systems for antennas and aperture distributions for antenna arrays. He tuned the parameters σ and ρ in our GSCD computer programs to obtain successful radar designs.

Supplementary Materials

Supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/appliedmath3020015/s1.

Author Contributions

Conceptualization, I.S.J. and M.N.R.; methodology, I.S.J.; software, I.S.J.; validation, M.N.R. and I.S.J.; formal analysis, M.N.R.; investigation, M.N.R.; resources, I.S.J.; data curation, I.S.J.; writing—original draft preparation, I.S.J.; writing—review and editing, M.N.R.; visualization, M.N.R.; supervision, I.S.J.; project administration, I.S.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

CD: conjugate direction;
CG: conjugate gradient;
CGS: conjugate Gram–Schmidt;
GSCD: Gram–Schmidt conjugate direction.

References

1. Fletcher, R.; Reeves, C. Function minimization by conjugate gradients. Comput. J. 1964, 7, 149–154.
2. Powell, M. An Efficient Method of Finding the Minimum of a Function of Several Variables Without Calculating Derivatives. Comput. J. 1964, 7, 155–162.
3. Davidon, W.C. Variable Metric Method for Minimization; A.E.C. Research and Development Report ANL-5990 (Revision 2); Argonne National Lab.: Lemont, IL, USA, 1959.
4. Fletcher, R.; Powell, M. A rapidly convergent descent method for minimization. Comput. J. 1963, 6, 163–168.
5. Hestenes, M.R.; Stiefel, E. Methods of Conjugate Gradients for Solving Linear Systems. J. Res. Natl. Bur. Stand. 1952, 49, 409–436.
6. Smith, C.S. The Automatic Computation of Maximum Likelihood Estimates; Scientific Department, National Coal Board: Bretby, UK, 1962; pp. 7.1–7.3.
7. Zangwill, W.I. Minimizing a Function without Calculating Derivatives. Comput. J. 1967, 10, 293–296.
8. Fletcher, R. A Review of Methods for Unconstrained Optimization; Academic Press: New York, NY, USA, 1969; pp. 1–12.
9. Brent, R.P. Algorithms for Minimization without Derivatives; Prentice-Hall: Englewood Cliffs, NJ, USA, 1973.
10. Hestenes, M.R. Conjugate Direction Methods in Optimization; Springer: New York, NY, USA, 1980.
11. Nocedal, J.; Wright, S.J. Conjugate gradient methods. In Numerical Optimization; Springer: New York, NY, USA, 2006; pp. 101–134.
12. Kelley, C.T. Iterative Methods for Optimization; SIAM: Philadelphia, PA, USA, 1999.
13. Zhang, L. An improved Wei–Yao–Liu nonlinear conjugate gradient method for optimization computation. Appl. Math. Comput. 2009, 215, 2269–2274.
14. Ortega, J.; Rheinboldt, W.C. Iterative Solution of Nonlinear Equations in Several Variables; Academic Press: New York, NY, USA, 1970.
15. Russak, I.B. Convergence of the conjugate Gram–Schmidt method. J. Optim. Theory Appl. 1981, 33, 163–173.
16. Cauchy, A. Méthode générale pour la résolution des systèmes d’équations simultanées. C. R. Hebd. Séances Acad. Sci. 1847, 25, 536–538.
17. Eells, J. A setting for global analysis. Bull. Am. Math. Soc. 1966, 72, 751–807.
18. Rosenbrock, H.H. An automatic method for finding the greatest or least value of a function. Comput. J. 1960, 3, 175–184.
19. Andrei, N. Nonlinear Conjugate Gradient Methods for Unconstrained Optimization; Springer Optimization and Its Applications; Springer Nature Switzerland AG: Cham, Switzerland, 2021; ISBN 978-3-030-42952-2.
20. Jakovlev, M. On the solution of nonlinear equations by iterations. Dokl. Akad. Nauk SSSR 1964, 156, 522–524.
21. Jakovlev, M. On the solution of nonlinear equations by an iteration method. Sibirskii Matematicheskii Zhurnal 1964, 5, 1428–1430. (In Russian)
22. Jakovlev, M. The solution of systems of nonlinear equations by a method of differentiation with respect to a parameter. USSR Comput. Math. Math. Phys. 1964, 4, 198–203. (In Russian)
23. Wall, D. The order of an iteration formula. Math. Comp. 1956, 10, 167–168.
24. Ostrowski, A. Solution of Equations and Systems of Equations, 2nd ed.; Academic Press: New York, NY, USA, 1966.
25. Stein, I., Jr. Conjugate Direction Algorithms in Numerical Analysis and Optimization: Final Report; U.S. Army Research Office, DAHC 04-74-G-0006, National Science Foundation GP-40175, and University of Toledo Faculty Research Grant; University of Toledo: Toledo, OH, USA, 1975.
26. Dennemeyer, R.F.; Mookini, E.H. CGS Algorithms for Unconstrained Minimization of Functions. J. Optim. Theory Appl. 1975, 16, 67–85.
27. Olsen, N.C. (Consultant, Lockheed, Palmdale, CA, USA). Private communication to Ivie Stein, Jr., 2005.
28. Raihen, M.N. Convergence Rates for Hestenes’ Gram–Schmidt Conjugate Direction Method without Derivatives in Numerical Optimization. Master’s Thesis, University of Toledo, Toledo, OH, USA, 2017.
Figure 1. Maximal convex level set for Rosenbrock’s banana valley function.
Figure 2. Level Curves of Rosenbrock’s banana valley function.
Figure 3. Curve of steepest descent and level curves for Rosenbrock’s banana valley function.

Share and Cite

MDPI and ACS Style

Stein, I., Jr.; Raihen, M.N. Convergence Rates for Hestenes’ Gram–Schmidt Conjugate Direction Method without Derivatives in Numerical Optimization. AppliedMath 2023, 3, 268-285. https://doi.org/10.3390/appliedmath3020015
