Article

An Improved Modification of Accelerated Double Direction and Double Step-Size Optimization Schemes

by
Milena J. Petrović
1,*,
Dragana Valjarević
1,
Dejan Ilić
2,
Aleksandar Valjarević
3 and
Julija Mladenović
4
1
Faculty of Sciences and Mathematics, University of Pristina in Kosovska Mitrovica, Lole Ribara 29, 38220 Kosovska Mitrovica, Serbia
2
Faculty of Sciences and Mathematics, University of Niš, Višegradska 33, 18106 Niš, Serbia
3
Faculty of Geography, University of Belgrade, Studentski Trg 3/III, 11000 Belgrade, Serbia
4
Faculty of Mathematics, University of Belgrade, Studentski Trg 16, 11000 Belgrade, Serbia
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(2), 259; https://doi.org/10.3390/math10020259
Submission received: 20 November 2021 / Revised: 6 January 2022 / Accepted: 13 January 2022 / Published: 15 January 2022

Abstract
We propose an improved variant of the accelerated gradient optimization models for solving unconstrained minimization problems. Merging the positive features of the double direction and the double step-size accelerated gradient models, we define an iterative method of a simpler form which is generally more effective. The performed convergence analysis shows that the defined iterative method is at least linearly convergent for uniformly convex and strictly convex quadratic functions. Numerical test results confirm the efficiency of the developed model regarding the CPU time, the number of iterations and the number of function evaluations.

1. Accelerated Double Direction and Double Step Size Methods Overview

In order to define an efficient optimization model for solving unconstrained nonlinear tasks, we approach the matter on multiple fronts. One of the primary goals is ensuring fast convergence, desirably close to the Newton method's convergence rate. On the other hand, we would like to avoid the possibly complicated calculations that arise from evaluating the Hessian's second-order partial derivatives. That is why the quasi-Newton approach is a good starting point for developing an optimization method with good performance profiles. The benefits of the quasi-Newton methods are well known. One of the main characteristics of these iterations is the preservation of good convergence features, although the Hessian, i.e., the Hessian's inverse, is not explicitly used. Instead, an appropriately defined approximation of the Hessian, or of its inverse, is used in these methods. This way, the quasi-Newton methods preserve a good convergence rate and, at the same time, avoid the possible difficulties of Hessian calculations. In this paper, we use the quasi-Newton concept to define an efficient minimization scheme for solving unconstrained minimization problems, posed as:
\min f(x), \quad x \in \mathbb{R}^n,
where f ( x ) is an objective function.
When defining optimization iterative models based on the quasi-Newton form, we can start with the following general iteration:
x_{k+1} = x_k + t_k d_k,
where x_k stands for the current iterative point, x_{k+1} is the next one, t_k is the iterative step length and d_k is the search direction of the k-th iteration. For iterations of the quasi-Newton type, the search direction is defined through the gradient. In addition, an iterative direction vector has to fulfill the descent condition, i.e.,
g_k^T d_k < 0.
In condition (3), by g k , we denote the gradient of the objective function at x k . Furthermore, we adopt the usual notations:
g(x) = \nabla f(x), \quad G(x) = \nabla^2 f(x), \quad g_k = \nabla f(x_k), \quad G_k = \nabla^2 f(x_k),
where \nabla f(x) and \nabla^2 f(x) are the standard notations for the gradient and the Hessian of the goal function, respectively.
The way of defining the iterative step length t_k and the iterative search direction vector d_k directly influences the method's efficiency. In addition, some authors [1,2,3,4,5] singled out one more parameter, equally important as the other two, that contributes to the method's performance characteristics. That is the iterative acceleration parameter, often denoted by γ_k. In [1], the author denoted this parameter by θ_k, and its iterative value is expressed by the relation (5). Researchers on this topic justifiably distinguished a class of accelerated gradient schemes. In [3], for example, the authors numerically confirmed a more than evident performance progress in favor of the accelerated method when compared to its non-accelerated version. Here are some expressions of the acceleration factors defined in the accelerated gradient models mentioned above. These acceleration parameters are also listed in [6]:
\theta_k^{AGD} = \frac{t_k\, g_k^T g_k}{t_k\, y_k^T g_k},
\gamma_{k+1}^{SM} = 2\gamma_k\, \frac{\gamma_k \left[ f(x_{k+1}) - f(x_k) \right] + t_k \|g_k\|^2}{t_k^2 \|g_k\|^2},
\gamma_{k+1}^{ADD} = 2\, \frac{f(x_{k+1}) - f(x_k) - \alpha_k g_k^T \left( \alpha_k d_k - \gamma_k^{-1} g_k \right)}{\alpha_k^2 \left( \alpha_k d_k - \gamma_k^{-1} g_k \right)^T \left( \alpha_k d_k - \gamma_k^{-1} g_k \right)},
\gamma_{k+1}^{ADSS} = 2\, \frac{f(x_{k+1}) - f(x_k) + \left( \alpha_k \gamma_k^{-1} + \beta_k \right) \|g_k\|^2}{\left( \alpha_k \gamma_k^{-1} + \beta_k \right)^2 \|g_k\|^2},
\gamma_{k+1}^{TADSS} = 2\, \frac{f(x_{k+1}) - f(x_k) + \psi_k \|g_k\|^2}{\psi_k^2 \|g_k\|^2}, \quad \psi_k = \alpha_k \left( \gamma_k^{-1} - 1 \right) + 1.
Interesting ideas of the double step length and the double direction approach in defining an efficient minimization iteration are presented in [3,4]. In both of these studies, the authors used properly determined accelerating characteristics. In this paper, we use the proven good properties of each of these models, i.e., of the accelerated double direction method, or shortly the ADD method, as well as of the accelerated double step size method, the ADSS method.
The ADD iteration is defined by the following expression:
x_{k+1} = x_k + \alpha_k^2 d_k - \alpha_k \gamma_k^{-1} g_k,
where \gamma_k = \gamma_k^{ADD} > 0 is the acceleration parameter. The iterative step length \alpha_k is derived using Armijo's Backtracking inexact line search algorithm. The variable d_k stands for the second direction vector, and it is calculated by the following rule:
d_k(t) = \begin{cases} d_k^{*}, & k \leq m-1 \\ \sum_{i=2}^{m} t^{\,i-1} d_{k-i+1}^{*}, & k \geq m \end{cases}
where d_k^{*} is the solution of the problem \min_{d \in \mathbb{R}^n} \Phi_k(d),
\Phi_k(d) = \nabla f(x_k)^T d + \frac{1}{2} d^T \left( \gamma_{k+1} I \right) d = g(x_k)^T d + \frac{1}{2} \gamma_{k+1} \|d\|^2.
The two search directions in the ADD method are d_k, defined by the previous rule, and -\gamma_k^{-1} g_k. One of the main results in [3] is that the ADD algorithm requires a lower number of iterations than the accelerated gradient descent method, denoted as the SM method, which is presented in [2]. The iterative form of the SM method is given by the expression:
x_{k+1} = x_k - t_k \gamma_k^{-1} g_k,
where t_k is the iterative step length value, and \gamma_k \equiv \gamma_k^{SM} is the acceleration parameter of the SM iteration expressed by the relation (6).
The accelerated double step size model, i.e., the ADSS, is defined as
x_{k+1} = x_k - \alpha_k \gamma_k^{-1} g_k - \beta_k g_k = x_k - \left( \alpha_k \gamma_k^{-1} + \beta_k \right) g_k.
Parameters \alpha_k > 0 and \beta_k > 0 are two iterative step lengths, calculated by two different Backtracking procedures, and \gamma_k = \gamma_k^{ADSS} > 0 is the ADSS iterative acceleration parameter. In the ADSS iteration, we can identify the vector direction as:
-\left( \alpha_k \gamma_k^{-1} + \beta_k \right) g_k.
The transformed ADSS method, or in short the TADSS, is obtained from the ADSS scheme under the additional condition \alpha_k + \beta_k = 1. The TADSS iteration is defined as:
x_{k+1} = x_k - \left[ \alpha_k \left( \gamma_k^{-1} - 1 \right) + 1 \right] g_k.
From expression (13), we conclude that the defined vector direction has the form of a negative gradient direction. Having that in mind, it depends on the step length parameters as well as on the iterative value of the acceleration parameter. Numerical experiments from [4] show that the ADSS iteration outperforms the ADD [3] and the SM [2] schemes regarding all three of the analyzed metrics: the number of iterations, CPU time and the number of function evaluations.
We are motivated to define a method which is an improved merged version of the accelerated double direction and double step size methods. At the same time, the proposed model should be of a simpler form than the ADD and the ADSS schemes are. We achieve this simpler form by removing one of the Backtracking algorithms from the ADSS iteration and by replacing the rule (11) in the ADD scheme with the gradient descent rule. Under these assumptions, we expect the proposed iterative method to converge at least at the same rate as the ADD and the ADSS methods do. The modified iteration, based on the mentioned accelerated gradient descent algorithms, should preserve the positive sides of its predecessors but also exceed them regarding the performance profiles of all tested metrics.
The paper is organized in the following way: In Section 2, we define the improved version of the ADD and the ADSS schemes. The convergence analysis of the defined model is carried out in Section 3. Numerical test results are compared, analyzed and displayed in Section 4.

2. Modified Accelerated Double Direction and Double Step Size Method

Taking into account the iterative form of the accelerated ADD method, as well as the good performance features of the accelerated double step size ADSS scheme with respect to all three tested metrics, we propose the following iterative model for solving large-scale unconstrained minimization problems:
x_{k+1} = x_k - \left( \alpha_k \gamma_k^{-1} + \alpha_k^2 \right) g_k \equiv x_k - \alpha_k \gamma_k^{-1} g_k - \alpha_k^2 g_k.
Iterative scheme (15) presents a merged variant of the ADD and the ADSS methods, keeping the favorable aspects of each included gradient scheme. We denote the iterative rule (15) as the modified accelerated double direction and double step size method, or in short, modADS. In the modADS scheme, one iterative search direction is -\gamma_k^{-1} g_k, and the other is simply the negative gradient direction. The two step lengths, \alpha_k and \alpha_k^2, are obtained using one Backtracking procedure. Basically, our main goal in generating the modADS method is to define an improved merged version of the accelerated double direction and double step size methods. Having that in mind, we want to preserve the positive aspects of each of these two baseline models. The form of the ADD iteration contains only one iterative step length value, i.e., one Backtracking procedure is applied. That was the main motivation to substitute the second iterative value \beta_k from the ADSS iteration with \alpha_k^2. In this way, we preserve the form of the ADD iteration in the new modADS scheme.
On the other hand, from the results presented in [4], we know that the second search direction d k defined in the ADD iteration by (11) causes an increase in the number of function evaluations. Therefore, instead of it, just like in the ADSS iteration, in the new modADS process we simply use the gradient descent direction for the second search direction, as well.
There are certainly many different options for defining the second iterative step length in the double-direction and double step size models that differ from our choice: α k 2 . That question is still open. Since the modADS belongs to the class of accelerated double direction and double step size methods and presents a merged form of the ADD and the ADSS iteration, the choice to keep α k 2 as the second step length value was a natural one. Additionally, according to the TADSS iteration (14), it could be said that the TADSS corresponds to a different choice of second step size β k of the ADSS iteration. Therefore, this is also a motivation to define the modADS in a presented way and to compare the performance features of these two similar approaches.
So, the common elements of the ADD, the ADSS and the proposed modADS iterative forms are the iterative step length value \alpha_k and the search direction vector -\gamma_k^{-1} g_k. The other search direction in the modADS is -g_k, just like in the ADSS scheme. Still, as previously explained, the second step-size value of the new method differs from the one, \beta_k, applied in the ADSS model. Instead of using an additional inexact line search technique to calculate the second iterative step length value, in the modADS, we use only one Backtracking procedure and define the second step length parameter as the squared value of the Backtracking outcome \alpha_k. This way, we evidently provide a decrease in the computational time, the number of needed iterations and the number of function evaluations. We confirm this statement in Section 4 by a comparative analysis of the performance profiles of each of the tested models.
The algorithm of the Backtracking procedure by which we calculate the iterative step length value is given by the following steps (a code sketch is given after the steps):
  • The objective function f(x), the search direction d_k at the point x_k and numbers 0 < \sigma < 0.5 and \beta \in (0, 1) are required;
  • Set \alpha = 1;
  • While f(x_k + \alpha d_k) > f(x_k) + \sigma \alpha g_k^T d_k, take \alpha := \alpha \beta;
  • Return \alpha_k = \alpha.
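For illustration, the following short Python sketch codes this Backtracking procedure. It is only a reading aid (the original experiments were written in C++), and the function name and calling convention are our own; the default values σ = 0.0001 and β = 0.8 are the ones used later in Section 4.

import numpy as np

def backtracking(f, x_k, g_k, d_k, sigma=1e-4, beta=0.8):
    # Armijo rule: start from alpha = 1 and shrink by beta until the
    # sufficient-decrease test f(x_k + alpha d_k) <= f(x_k) + sigma*alpha*g_k^T d_k holds.
    alpha = 1.0
    f_k = f(x_k)
    slope = float(np.dot(g_k, d_k))   # g_k^T d_k, negative for a descent direction
    while f(x_k + alpha * d_k) > f_k + sigma * alpha * slope:
        alpha *= beta
    return alpha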
We now derive the iterative value of the acceleration parameter using the second-order Taylor expansion of the modADS iteration (15). To avoid cumbersome expressions in that process, we simplify the relation (15) using the following substitution:
x_{k+1} = x_k - s_k g_k,
where s_k = \alpha_k \gamma_k^{-1} + \alpha_k^2 = \alpha_k \left( \gamma_k^{-1} + \alpha_k \right). The second-order Taylor polynomial of (16) is then:
f(x_{k+1}) \approx f(x_k) - s_k\, g_k^T g_k + \frac{1}{2} s_k^2\, g_k^T \nabla^2 f(\xi)\, g_k.
In relation (17), \nabla^2 f(\xi) stands for the Hessian of the objective function, and the variable \xi fulfills the following conditions:
\xi \in [x_k, x_{k+1}], \quad \xi = x_k + \delta (x_{k+1} - x_k) = x_k - \delta s_k g_k, \quad 0 \leq \delta \leq 1.
We replace the Hessian \nabla^2 f(\xi) with an appropriately defined scalar diagonal matrix
\gamma_{k+1} I,
where \gamma_{k+1} is the acceleration parameter we are searching for:
f(x_{k+1}) \approx f(x_k) - s_k \|g_k\|^2 + \frac{1}{2} s_k^2 \gamma_{k+1} \|g_k\|^2.
From the previous expression, we can easily compute the iterative value of the acceleration factor:
\gamma_{k+1} = 2\, \frac{f(x_{k+1}) - f(x_k) + s_k \|g_k\|^2}{s_k^2 \|g_k\|^2} = 2\, \frac{f(x_{k+1}) - f(x_k) + \alpha_k \left( \gamma_k^{-1} + \alpha_k \right) \|g_k\|^2}{\alpha_k^2 \left( \gamma_k^{-1} + \alpha_k \right)^2 \|g_k\|^2}.
We are only interested in the positive γ k + 1 values because, in that case, both of the second order necessary and the second order sufficient conditions are fulfilled. However, if in some iterative steps we calculate a negative value for the acceleration parameter, then we simply set γ k + 1 = 1 . This choice of γ k + 1 transforms our modADS iteration into the standard gradient descent iterative method, i.e.,
x_{k+2} = x_{k+1} - \alpha_{k+1} (1 + \alpha_{k+1}) g_{k+1} \equiv x_{k+1} - t_{k+1} g_{k+1},
for some t_{k+1} = \alpha_{k+1} (1 + \alpha_{k+1}).
For initial values 0 < \rho < 1, 0 < \tau < 1, the starting point x_0 and \gamma_0 = 1, we now present the modADS algorithm (a code sketch is given after the steps):
  Step 1. Set k = 0, compute f(x_0), g_0 and take \gamma_0 = 1;
  Step 2. If \|g_k\| < \epsilon, then go to Step 8, else continue with Step 3;
  Step 3. Apply the Backtracking algorithm to calculate the iterative step length \alpha_k;
  Step 4. Compute x_{k+1} using (15);
  Step 5. Determine the acceleration parameter \gamma_{k+1} using (19);
  Step 6. If \gamma_{k+1} < 0, then take \gamma_{k+1} = 1;
  Step 7. Set k := k + 1 and go to Step 2;
  Step 8. Return x_{k+1} and f(x_{k+1}).
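The following minimal Python sketch assembles the whole modADS loop from the steps above and the Backtracking sketch given earlier. It is our reading of the algorithm, not the authors' implementation; in particular, we assume that the Armijo test is performed along the direction -\gamma_k^{-1} g_k, as in the SM method [2], and the names modADS, grad, eps and max_iter are ours. The second stopping test and the time limiter from Section 4 are omitted for brevity.

def modADS(f, grad, x0, eps=1e-6, max_iter=10000, sigma=1e-4, beta=0.8):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    gamma = 1.0                                   # Step 1: gamma_0 = 1
    for _ in range(max_iter):
        if np.linalg.norm(g) < eps:               # Step 2: gradient-norm test
            break
        d = -g / gamma                            # assumed Armijo direction -gamma^{-1} g_k
        alpha = backtracking(f, x, g, d, sigma, beta)              # Step 3
        s = alpha / gamma + alpha**2              # s_k = alpha_k gamma_k^{-1} + alpha_k^2
        x_new = x - s * g                         # Step 4: iteration (15)
        g_sq = float(np.dot(g, g))
        gamma_new = 2.0 * (f(x_new) - f(x) + s * g_sq) / (s**2 * g_sq)   # Step 5: relation (19)
        if gamma_new < 0:                         # Step 6: fall back to the gradient descent value
            gamma_new = 1.0
        x, g, gamma = x_new, grad(x_new), gamma_new                # Step 7
    return x, f(x)                                # Step 8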

3. Convergence Analysis

In this section, we prove that the modADS iteration linearly converges on the sets of uniformly convex functions and strictly convex quadratic functions. We analyze these two function sets separately.

3.1. Set of Uniformly Convex Functions

To prove the linear convergence properties, we are using the following two statements from [7,8]:
Proposition 1. 
If the function f: \mathbb{R}^n \to \mathbb{R} is twice continuously differentiable and uniformly convex on \mathbb{R}^n, then:
(1)
the function f has a lower bound on the level set L_0 = \{ x \in \mathbb{R}^n \mid f(x) \leq f(x_0) \}, where x_0 \in \mathbb{R}^n is available;
(2)
the gradient g is Lipschitz continuous in an open convex set B which contains L_0, i.e., there exists L > 0 such that:
\|g(x) - g(y)\| \leq L \|x - y\|, \quad \forall x, y \in B.
Lemma 1. 
Under the assumptions of Proposition 1, there exist real numbers m and M satisfying:
0 < m \leq 1 \leq M,
such that f(x) has a unique minimizer x^{*} and
m \|y\|^2 \leq y^T \nabla^2 f(x)\, y \leq M \|y\|^2, \quad \forall x, y \in \mathbb{R}^n;
\frac{1}{2} m \|x - x^{*}\|^2 \leq f(x) - f(x^{*}) \leq \frac{1}{2} M \|x - x^{*}\|^2, \quad \forall x \in \mathbb{R}^n;
m \|x - y\|^2 \leq \left( g(x) - g(y) \right)^T (x - y) \leq M \|x - y\|^2, \quad \forall x, y \in \mathbb{R}^n.
In the following lemma, we show that the objective function to which the modADS iteration is applied is bounded below. We also estimate the amount by which the function value decreases in each iteration. The proof is analogous to the one in [2].
Lemma 2. 
Let the sequence \{x_k\} be defined by (15), and let f be a uniformly convex function. Then:
f(x_k) - f(x_{k+1}) \geq \mu \|g_k\|^2,
for
\mu = \min \left\{ \frac{\sigma}{M},\ \frac{\sigma (1 - \sigma) \beta}{L} \right\},
where L > 0 is the Lipschitz constant from Proposition 1, and M \in \mathbb{R} is defined in Lemma 1.
The fact that the modADS model converges at least linearly is proved in the next Theorem 1.
Theorem 1. 
The sequence \{x_k\}, defined by (15) and applied to a uniformly convex and twice continuously differentiable objective function f, converges linearly to its solution x^{*} and
\lim_{k \to \infty} \|g_k\| = 0.
Proof. 
From Lemma 2, we know that the sequence of objective function values generated by the modADS process is bounded below and decreasing, so it is evident that:
\lim_{k \to \infty} \left( f(x_k) - f(x_{k+1}) \right) = 0.
This equality, combined with the result of Lemma 2, i.e., the relation (24), leads us to the following conclusion:
\lim_{k \to \infty} \|g_k\| = 0.
Let us now prove that the sequence \{x_k\}, generated by (15), converges to the solution x^{*}, i.e.,
\lim_{k \to \infty} \|x_k - x^{*}\| = 0.
To prove (29), we substitute y = x^{*} in (23):
m \|x - x^{*}\|^2 \leq \left( g(x) - g(x^{*}) \right)^T (x - x^{*}) \leq M \|x - x^{*}\|^2.
Using the Mean Value Theorem and the Cauchy–Schwarz inequality, we further obtain:
m \|x - x^{*}\| \leq \|g(x)\| \leq M \|x - x^{*}\|.
From (24) and (30), we have the following estimations:
\mu \|g_k\|^2 \geq \mu m^2 \|x_k - x^{*}\|^2 \geq \frac{2 \mu m^2}{M} \left( f(x_k) - f(x^{*}) \right) \to 0, \quad k \to \infty,
which confirms (29).
To complete the proof, it remains to show that the modADS process is linearly convergent. To do this, we practically need to prove that
\rho^2 \equiv \frac{2 \mu m^2}{M} < 1.
We know from Lemma 2 that there are two possible values of the variable \mu: \mu = \frac{\sigma}{M} and \mu = \frac{\sigma (1 - \sigma) \beta}{L}:
  • \mu = \frac{\sigma}{M}: In this case, we have:
    \rho^2 = \frac{2 \mu m^2}{M} = 2 \cdot \frac{\sigma}{M} \cdot \frac{m^2}{M} = \frac{2 \sigma m^2}{M^2} \leq 2 \sigma < 1,
    since \sigma \in (0, \frac{1}{2}) and m < M.
  • \mu = \frac{\sigma (1 - \sigma) \beta}{L}: For this \mu value, using the inequality m \leq L, we obtain the same bound:
    \rho^2 = \frac{2 \mu m^2}{M} = 2 \cdot \frac{\beta \sigma (1 - \sigma)}{L} \cdot \frac{m^2}{M} < 2 \cdot \frac{1}{2} \cdot 1 \cdot \frac{m^2}{L \cdot M} = \frac{m^2}{L \cdot M} \leq \frac{L \cdot m}{L \cdot M} = \frac{m}{M} < 1,
which completes the proof. □

3.2. Set of Strictly Convex Quadratics

Now, let us suppose that the objective function is a strictly convex quadratic function, expressed as:
f(x) = \frac{1}{2} x^T A x - b^T x,
where A is a real n \times n matrix, which is symmetric and positive definite, and b \in \mathbb{R}^n is a given vector. Let us denote and sort the eigenvalues of the matrix A as
\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n.
Our goal now is to prove the convergence of the modADS iteration when applied to a strictly convex quadratic. However, before we state the main theorem of this subsection, we prove one auxiliary lemma which estimates the iterative variable s_k \equiv \alpha_k \left( \gamma_k^{-1} + \alpha_k \right) with respect to the smallest and the largest eigenvalues of the matrix A.
Lemma 3. 
The smallest and the largest eigenvalues of the matrix A satisfy the inequalities:
\frac{\sigma}{2 \lambda_n} \leq \alpha_{k+1} \left( \gamma_{k+1}^{-1} + \alpha_{k+1} \right) \leq \frac{1}{\lambda_1} + 1,
where \gamma_{k+1} and \alpha_{k+1} are the iterative acceleration parameter and step length value of the modADS iteration, respectively.
Proof. 
For the strictly convex quadratic function (31), the difference of its values at two successive points is:
f(x_{k+1}) - f(x_k) = \frac{1}{2} x_{k+1}^T A x_{k+1} - b^T x_{k+1} - \frac{1}{2} x_k^T A x_k + b^T x_k
= \frac{1}{2} (x_k - s_k g_k)^T A (x_k - s_k g_k) - b^T (x_k - s_k g_k) - \frac{1}{2} x_k^T A x_k + b^T x_k
= \frac{1}{2} x_k^T A x_k - \frac{1}{2} s_k x_k^T A g_k - \frac{1}{2} s_k g_k^T A x_k + \frac{1}{2} s_k^2 g_k^T A g_k - b^T x_k + s_k b^T g_k - \frac{1}{2} x_k^T A x_k + b^T x_k
= - \frac{1}{2} s_k x_k^T A g_k - \frac{1}{2} s_k g_k^T A x_k + \frac{1}{2} s_k^2 g_k^T A g_k + s_k b^T g_k,
i.e.,
f(x_{k+1}) - f(x_k) = - \frac{1}{2} s_k x_k^T A g_k - \frac{1}{2} s_k g_k^T A x_k + \frac{1}{2} s_k^2 g_k^T A g_k + s_k b^T g_k.
Matrix A is symmetric and positive definite, so we can apply the symmetry condition x_k^T A g_k = g_k^T A x_k, as well as b^T g_k = g_k^T b. We can also use the fact that the gradient of the function (31) is g_k = A x_k - b and transform (33) into:
f(x_{k+1}) - f(x_k) = - \frac{1}{2} s_k \left[ g_k^T A x_k + x_k^T A g_k - s_k g_k^T A g_k - b^T g_k - b^T g_k \right]
= - \frac{1}{2} s_k \left[ g_k^T (A x_k - b) + g_k^T (A x_k - b) - s_k g_k^T A g_k \right]
= - \frac{1}{2} s_k \left[ g_k^T g_k + g_k^T g_k - s_k g_k^T A g_k \right]
= - s_k g_k^T g_k + \frac{1}{2} s_k^2 g_k^T A g_k.
If we replace the derived expression for the difference between function values at two successive iterations into (19), we obtain:
\gamma_{k+1} = 2\, \frac{- s_k g_k^T g_k + \frac{1}{2} s_k^2 g_k^T A g_k + s_k g_k^T g_k}{s_k^2 g_k^T g_k} = \frac{g_k^T A g_k}{g_k^T g_k}.
From (34), we conclude that \gamma_{k+1} is the Rayleigh quotient of the real symmetric matrix A at the gradient vector g_k, so the following holds:
\lambda_1 \leq \gamma_{k+1} \leq \lambda_n, \quad \forall k \in \mathbb{N}.
Since 0 < \alpha_{k+1} \leq 1, the following estimations are valid:
s_{k+1} = \alpha_{k+1} \left( \gamma_{k+1}^{-1} + \alpha_{k+1} \right) = \alpha_{k+1} \gamma_{k+1}^{-1} + \alpha_{k+1}^2 \leq \frac{1}{\gamma_{k+1}} + \alpha_{k+1} \leq \frac{1}{\lambda_1} + \alpha_{k+1} \leq \frac{1}{\lambda_1} + 1.
To prove the left-hand side of (32), we take the relation t_k > \frac{\eta (1 - \sigma) \gamma_k}{L}, proved in [2]. With the notation used in this scheme, the previous inequality becomes:
\alpha_k > \frac{\beta (1 - \sigma) \gamma_k}{L}.
We take into account the parameter limitations, i.e., \sigma \in (0, \frac{1}{2}), \beta \in (\sigma, 1) and 0 < \alpha_{k+1} \leq 1, and that leads us to:
s_{k+1} = \alpha_{k+1} \left( \gamma_{k+1}^{-1} + \alpha_{k+1} \right) = \alpha_{k+1} \gamma_{k+1}^{-1} + \alpha_{k+1}^2 > \frac{\beta (1 - \sigma) \gamma_{k+1}}{L} \cdot \frac{1}{\gamma_{k+1}} = \frac{\beta (1 - \sigma)}{L} \geq \frac{\sigma \left( 1 - \frac{1}{2} \right)}{L} = \frac{\sigma}{2 L} \geq \frac{\sigma}{2 \lambda_n}.
The last inequality arises from the fact that the largest eigenvalue \lambda_n has the property of the Lipschitz constant L:
\|g(x) - g(y)\| = \|A x - A y\| = \|A (x - y)\| \leq \|A\| \|x - y\| = \lambda_n \|x - y\|.
This analysis confirms that (32) holds. □
Theorem 2. 
Suppose the relation \lambda_n < \frac{2 \lambda_1}{1 + \lambda_1} holds for the smallest and the largest eigenvalues of the matrix A in the strictly convex quadratic function (31). Then, considering the modADS iteration applied to (31), the following holds:
g_k = \sum_{i=1}^{n} d_i^k v_i,
where
\left( d_i^{k+1} \right)^2 \leq \delta^2 \left( d_i^k \right)^2, \quad \delta = \max \left\{ 1 - \frac{\lambda_1}{2 \lambda_n},\ \lambda_n \left( \frac{1}{\lambda_1} + 1 \right) - 1 \right\},
for some real parameters d_1^k, d_2^k, \ldots, d_n^k. With that:
\lim_{k \to \infty} \|g_k\| = 0.
Proof. 
Let \{v_1, v_2, \ldots, v_n\} be the set of orthonormal eigenvectors of the matrix A in expression (31). Assume that the sequence \{x_k\} is generated by the iterative rule (15). Then, the gradient of the function (31) at the (k+1)-th iterative point is:
g_{k+1} = A (x_k - s_k g_k) - b = A x_k - b - s_k A g_k = g_k - s_k A g_k = (I - s_k A) g_k,
since g_k = A x_k - b. Applying (37), we obtain:
g_{k+1} = \sum_{i=1}^{n} d_i^{k+1} v_i = \sum_{i=1}^{n} (1 - s_k \lambda_i) d_i^k v_i.
To prove (37), it is enough to show that |1 - s_k \lambda_i| \leq \delta. Since
|1 - s_k \lambda_i| = \begin{cases} 1 - s_k \lambda_i, & s_k \lambda_i \leq 1 \\ s_k \lambda_i - 1, & s_k \lambda_i > 1 \end{cases}
we analyze two cases:
  • 1 \geq s_k \lambda_i \geq \frac{\lambda_1}{2 \lambda_n} \implies 1 - s_k \lambda_i \leq 1 - \frac{\lambda_1}{2 \lambda_n} \leq \delta;
  • 1 < s_k \lambda_i \leq \lambda_n \left( \frac{1}{\lambda_1} + 1 \right) \implies s_k \lambda_i - 1 \leq \lambda_n \left( \frac{1}{\lambda_1} + 1 \right) - 1 \leq \delta.
From (37), we have that the square of the gradient norm is:
\|g_k\|^2 = \sum_{i=1}^{n} \left( d_i^k \right)^2,
and since the parameter \delta \in (0, 1), we derive the final conclusion (39). □
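As an illustration of this subsection (not part of the original paper), the short Python check below reuses the modADS sketch from Section 2 (import numpy as np is assumed). It builds a small strictly convex quadratic with a deliberately tight spectrum, so that every factor |1 - s_kλ_i| stays below 1, runs the sketch on it, and verifies numerically that the acceleration parameter is a Rayleigh quotient of A and therefore lies in [λ_1, λ_n], as in (34); all helper names are ours.

rng = np.random.default_rng(0)
n = 30
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal basis
lam = np.linspace(0.5, 0.6, n)                     # tight spectrum keeps |1 - s_k*lambda_i| < 1
A = Q @ np.diag(lam) @ Q.T                         # symmetric positive definite matrix
b = rng.standard_normal(n)

f = lambda x: 0.5 * x @ A @ x - b @ x              # the strictly convex quadratic (31)
grad = lambda x: A @ x - b

x_min, f_min = modADS(f, grad, np.zeros(n))
print(np.linalg.norm(grad(x_min)))                 # expected to fall below the 1e-6 tolerance

# For any nonzero g, g^T A g / g^T g is a Rayleigh quotient of A,
# hence it lies between the smallest and the largest eigenvalue.
g = grad(rng.standard_normal(n))
rq = (g @ A @ g) / (g @ g)
assert lam[0] - 1e-9 <= rq <= lam[-1] + 1e-9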

4. Numerical Outcomes and Comparative Analysis

In this section, we display the numerical results by which we compare the relevant methods. As comparative models, in addition to the modADS method presented in this paper, we primarily chose the accelerated double direction (ADD) method introduced in [3] and the accelerated double step-size (ADSS) iteration from [4]. This is a natural choice of comparative optimization processes since the derived modADS algorithm originates from these two accelerated gradient schemes and our basic goal is the improvement of this class of methods. Then, we investigate the impact of the Backtracking parameter β by testing two more values of this parameter. The TADSS method, presented in [5], and the modADS introduced in this paper present two different ways of reducing the double step-size ADSS scheme into a single step length iteration. Due to this fact, we compare these two methods as well. Finally, we complete the numerical comparative analysis by comparing the defined modADS model with two more general gradient descent methods: Cauchy's gradient method (GD) and Andrei's accelerated gradient method (AGD) from [1].
The ADD scheme brought benefits regarding the reduction in the needed number of iterations compared with its non-accelerated version and the SM method from [2]. Furthermore, in [4], the ADSS showed undisputed advances with respect to all three of the tested metrics: the number of iterations, the CPU time and the number of function evaluations. It was compared with the SM and the ADD schemes.
All codes are written in the Visual C++ programming language and run on a workstation with an Intel(R) Core(TM) 2.3 GHz processor. The following values of the Backtracking parameters are taken: σ = 0.0001 and β = 0.8.
The stopping criteria are:
\|g_k\| \leq 10^{-6} \quad \text{and} \quad \frac{|f(x_{k+1}) - f(x_k)|}{1 + |f(x_k)|} \leq 10^{-16}.
We chose 10 values for the number of variables of each test function: 100; 500; 1000; 3000; 5000; 10,000; 15,000; 20,000; 25,000 and 30,000. As the final result for one test function, we sum all 10 outcomes. We measured all three performance characteristics: the number of iterations, the CPU time and the number of function evaluations. If, for a certain number of variables and for some test function, the applied model does not finish the test process within some defined time, we put the constant t_e, the time-limiter parameter, in Table 1 and Table 2 (a small code sketch of the stopping tests is given after Remark 1).
Remark 1. 
The time-limiter parameter is introduced in [3]. It is posed as an indicator for stopping the code execution after some defined time, t_e = 120 s.
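A small Python helper (our own construction, with assumed names) shows one way to code these stopping tests together with the time limiter t_e from Remark 1; whether the two tolerance tests are combined with a logical "and" is our reading of the text.

import time
import numpy as np

def should_stop(f_prev, f_new, g_new, start_time, t_e=120.0):
    # Gradient test and relative function-decrease test from the stopping criteria above.
    small_gradient = np.linalg.norm(g_new) <= 1e-6
    small_decrease = abs(f_new - f_prev) / (1.0 + abs(f_prev)) <= 1e-16
    # Remark 1: abort the run once the execution time exceeds t_e = 120 s.
    timed_out = (time.time() - start_time) > t_e
    return (small_gradient and small_decrease) or timed_out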
In Listing 1, we list the set of test functions examined in this research (a code sketch of one of them is given after the listing). We applied all three compared methods to each of these functions. The proposed functions are taken from the collection of unconstrained optimization test functions introduced in [9].
Listing 1. Test functions.
1. Extended Penalty
2. Perturbed Quadratic
3. Raydan-1
4. Diagonal 1
5. Diagonal 3
6. Generalized Tridiagonal-1
7. Extended Tridiagonal-1
8. Extended Three Expon. Terms
9. Diagonal 4
10. Extended Himmelblau
11. Quadr. Diag. Perturbed
12. Quadratic QF1
13. Exten. Quadr. Penalty QP1
14. Exten. Quadr. Penalty QP2
15. Quadratic QF2
16. Extended EP1
17. Extended Tridiagonal-2
18. Arwhead
19. Almost Perturbed Quadratic
20. Engval1
21. Quartc
22. Generalized Quartic
23. Diagonal 7
24. Diagonal 8
25. Diagonal 9
26. DIXON3DQ
27. NONSCOMP
28. HIMMELH
29. Power (Cute)
30. Sine
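To make the test setting concrete, here is a Python sketch of one entry from Listing 1, the Extended Himmelblau function, written from the standard pairwise form of the classical Himmelblau function; the exact formulation and the starting points should be checked against the collection [9], and the call at the end simply reuses the modADS sketch from Section 2 with an arbitrary starting point.

def extended_himmelblau(x):
    # Classical Himmelblau terms applied to consecutive variable pairs (x_1, x_2), (x_3, x_4), ...
    x1, x2 = x[0::2], x[1::2]
    return float(np.sum((x1**2 + x2 - 11.0)**2 + (x1 + x2**2 - 7.0)**2))

def extended_himmelblau_grad(x):
    g = np.zeros_like(x)
    x1, x2 = x[0::2], x[1::2]
    r1 = x1**2 + x2 - 11.0
    r2 = x1 + x2**2 - 7.0
    g[0::2] = 4.0 * r1 * x1 + 2.0 * r2
    g[1::2] = 2.0 * r1 + 4.0 * r2 * x2
    return g

n = 100   # the smallest problem size used in the experiments
x_best, f_best = modADS(extended_himmelblau, extended_himmelblau_grad, np.ones(n))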
In Table 1, we display the results concerning the number of iterations metric. All three of the models provide very good numerical outcomes regarding the number of needed iterations. As expected, modADS and ADSS have an equal number of iterations for many test functions, precisely 21 out of 30. This is due to the modADS iterative form having similar characteristics to those of the ADSS iteration. All three models give the same number of iterations in three cases. Additionally, each of the modADS and the ADD gives the lowest number of iterations in 6 out of 30 cases, while the ADSS does so in only 1 of 30 cases. A general view shows that modADS gives the final outcomes for all 30 test functions, ADD for 26 and ADSS for 29. ADD exceeded the time limit for the Diagonal 7, Diagonal 8, Power (Cute) and Sine functions. The execution time is exceeded only for the Sine function when the ADSS model is applied.
Regarding the speed of execution of each comparative model, from the obtained numerical outcomes, we can see that the modADS and ADSS models perform almost equally, and that is why we did not display the results obtained for this metric. Both models give zeros for the CPU time in 29 out of 30 cases, and only modADS was successfully applied to the remaining test function (Sine), while the ADSS iteration exceeded the execution time limit in this case. The ADD model has the worst outcomes for this characteristic, with four t_e breaks.
The contents of the Table 2 show the number of function evaluations for all three of the tested models. It is obvious that the modADS achieved the greatest improvement regarding this performance characteristic, when compared to the other two test processes. This method convincingly gives the lowest number of function evaluations in 29 out of 30 cases. The ADSS has the best outcome in 1 case only, while the ADD has very high numbers as results regarding this metric for almost all 30 test functions.
The average values concerning the three analyzed criteria for all comparative models are displayed in Table 3. We included the results of these computations achieved on the 26 out of 30 test functions on which we could apply all methods without breaking the execution time. From this table, we can obtain a general impression of the performance features of the generated modADS process in comparison to its forerunners. We see that this new accelerated variant is equally fast as the ADSS scheme, it slightly outperforms the ADSS regarding the number of iterations metric and it evidently gives a significant shift in the number of evaluations. When compared with the ADD iteration, the modADS iteration improves upon it multiple times over regarding all three performance profiles. More precisely, the modADS gives a 4 times lower average number of iterations, a more than 142 times lower number of function evaluations and it is multiple times faster than the ADD process.
We now analyze the dependence of the approaches on the Backtracking parameter β. As mentioned before in this section, in all previously displayed results, the value of this parameter was set to β = 0.8 in the algorithms of all three comparative models. We conducted 600 additional tests over the modADS, the ADD and the ADSS algorithms for 2 more values of this parameter: β = 0.3 and β = 0.6. For that purpose, we chose the first 10 test functions from Listing 1. In Table 4 and Table 5, we display the sums of the obtained results regarding the number of iterations and the number of evaluations for these three comparative models. As expected, the modADS behaves relative to the ADD and the ADSS methods much as it does in the case of β = 0.8. Concerning the number of iterations, for both beta values, the modADS acts similarly to the ADSS method. Regarding the number of evaluations, again for each of the 2 additional beta values, it gives the best results in 7 out of 10 cases when compared to the ADSS and in all 10 cases in comparison to the ADD scheme.
Furthermore, we compare the performance metrics of the modADS and the transformed ADSS, i.e., the TADSS. In [5], the authors confirmed that the TADSS provides better numerical outcomes regarding the number of iterations, CPU time and number of function evaluations in comparison with the ADSS scheme on 22 chosen test functions. From the results presented in the previous Table 1, Table 2, Table 3, Table 4 and Table 5, we concluded that the modADS behaves similarly to the ADSS regarding the number of iterations and the CPU time, but it provides a lower number of evaluations. Due to the results from [5], we may expect that the TADSS has better performance results than the modADS with respect to the number of iterations. In Table 6, we present the achieved test results not only for the 22 test functions from [5] but for all 30 test functions from Listing 1. In addition, we show in Table 7 a more general overview of the average results regarding all analyzed metrics.
Although the results from Table 6 illustrate that the TADSS provides a lower number of iterations in 17 out of 30 test functions, the general average outcomes still confirm that the modADS provides more than 3 times better outcomes with respect to this metric than the TADSS process. According to the Table 6 results, when we analyze the number of function evaluations, the modADS and the TADSS obtain an equal number of best outcomes. Yet, from the results presented in Table 7, we are assured that the modADS is almost three times more effective on this matter when compared to the TADSS iteration. From Table 6, we can also notice that for the Sine function, the TADSS process exceeds the execution time.
To achieve a more general view of the performance features of the modADS method, we conducted additional comparisons with the classical gradient method, defined by Cauchy, and with the accelerated gradient method from [1]. We further denote these comparative methods by GD and AGD, respectively. The execution times were very long for the previously chosen numbers of variables. For that reason, we reduced this set to the following 10 smaller values: 10, 100, 200, 300, 500, 700, 800, 1000, 2000 and 3000. We tested the first 15 test functions from Listing 1 by applying the modADS, the GD and the AGD iterative rules. The sums of the 450 additional test outcomes are displayed in Table 8, Table 9 and Table 10.
From Table 8, we can see that the modADS clearly gives the lowest number of iterations compared to the GD and the AGD methods for all 15 test functions.
The CPU execution time needed when the 3 comparative models are applied to the first 15 test functions is listed in Table 9. We see that, except in four cases when all three methods have the same (zero) outcomes, the modADS is again the dominant model regarding this aspect, as well.
The number of objective function evaluations achieved by the modADS, the GD and the AGD are illustrated in the Table 10. General conclusions over this performance metric are the same as regarding the number of iterations (Table 8), i.e., the modADS has the best outcomes for all 15 test functions.
As a summary, we display in Table 11 the comparisons of the average results obtained by the three comparative methods (modADS, GD and AGD) regarding all three performance characteristics. The results displayed in this table confirm that the modADS requires an approximately 417 times lower number of iterations compared to the GD method and an approximately 263 times lower number of iterations compared to the AGD method. Regarding the needed number of evaluations, the modADS outperforms the GD and the AGD methods by more than 1420 times.

5. Discussion

We defined an optimization model for solving large-scale unconstrained minimization problems. This method belongs to the class of accelerated gradient iterations with quasi-Newton features. The presented modADS method can be classified in this manner since it contains a scalar matrix approximation of the Hessian, instead of the Hessian itself, guided by a scalar, the so-called acceleration parameter. Previous research on accelerated gradient optimization models confirms that the presence of this parameter directly improves the performance profiles in comparison with the relevant non-accelerated version [3]. In this paper, we chose to derive this acceleration parameter from the second-order Taylor expansion of the posed iteration.
The modADS originates from the accelerated double direction and double step size methods, and the conducted convergence analysis is similar to the one in [4]. It confirmed that the developed model is linearly convergent on the sets of uniformly convex and strictly convex quadratic functions.
The outcomes of the numerical experiments conducted on the modADS, the ADD and the ADSS methods for three values of the Backtracking parameter β show a convincing improvement in reducing the number of function evaluations in favor of the developed model. The ADSS method has one execution break, while the ADD has four. The modADS highly outperforms the ADD method regarding all analyzed metrics.
When compared with the Cauchy’s gradient method and the Andrei’s accelerated gradient descent method from [1], the modADS outperforms these models multiple times concerning all performance metrics.

6. Conclusions

The proposed iterative rule has the elements of the accelerated double step size ADSS method [4] and the accelerated double direction ADD method [3]. In defining modADS, as in the previously mentioned methods, we kept the inexact line search Backtracking technique [10] to define the iterative step length value.
We conducted the convergence analysis and proved that the proposed modADS process is at least linearly convergent for the uniformly convex and strictly convex quadratic functions.
Through the numerical experiments, we generally conclude that, when compared with the baseline methods, the modADS algorithm has more similarities with the ADSS scheme than with the ADD method. Moreover, it improves upon both comparative models, primarily because only the modADS method provides numerical outcomes for all 30 test functions, without exception, which confirms the stability of the defined model. In comparison to the classical gradient descent method and the accelerated gradient descent method from [1], the defined modADS shows convincing progress regarding all monitored features.
From all of the above, we conclude that the proposed accelerated gradient minimization model is an effective and efficient algorithm which can be applied to solving many unconstrained optimization tasks.

Author Contributions

Conceptualization, M.J.P.; methodology, M.J.P. and D.V.; software, M.J.P.; validation, M.J.P., D.V., D.I. and A.V.; formal analysis, M.J.P., D.V. and D.I.; investigation, M.J.P., A.V. and J.M.; resources, M.J.P. and D.V.; data curation, M.J.P., D.V., A.V. and J.M.; writing—original draft preparation, M.J.P.; writing—review and editing, D.V., D.I. and A.V.; visualization, A.V. and J.M.; supervision, M.J.P., D.V. and D.I.; project administration, A.V. and J.M.; funding acquisition, A.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research was financially supported by internal-junior project IJ-0202, Faculty of Sciences and Mathematics, University of Priština in Kosovska Mitrovica.

Data Availability Statement

The data results are available upon request.

Acknowledgments

The first author gratefully acknowledges support from the project Grant No. 174025 by Ministry of Education and Science of Republic of Serbia.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.


References

  1. Andrei, N. An acceleration of gradient descent algorithm with backtracking for unconstrained optimization. Numer. Algor. 2006, 42, 63–73.
  2. Stanimirović, P.S.; Miladinović, M.B. Accelerated gradient descent methods with line search. Numer. Algor. 2010, 54, 503–520.
  3. Petrović, M.J.; Stanimirović, P.S. Accelerated Double Direction Method for Solving Unconstrained Optimization Problems. Math. Probl. Eng. 2014, 2014, 965104.
  4. Petrović, M.J. An accelerated Double Step Size method in unconstrained optimization. Appl. Math. Comput. 2015, 250, 309–319.
  5. Stanimirović, P.S.; Petrović, M.J.; Milovanović, G.V. A Transformation of Accelerated Double Step Size Method for Unconstrained Optimization. Math. Probl. Eng. 2015, 2015, 283679.
  6. Petrović, M.J.; Ivanović, M.; Djordjević, M. Comparative performance analysis of some accelerated and hybrid accelerated gradient models. Univ. Thought Publ. Nat. Sci. 2019, 9, 57–61.
  7. Ortega, J.M.; Rheinboldt, W.C. Iterative Solution of Nonlinear Equations in Several Variables; Academic Press: London, UK, 1970.
  8. Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 1970.
  9. Andrei, N. An Unconstrained Optimization Test Functions Collection. Adv. Model. Optim. 2008, 10, 1–15. Available online: http://www.apmath.spbu.ru/cnsa/pdf/obzor/An%20Unconstrained%20Optimization%20Test%20Functions%20Collection.pdf (accessed on 6 January 2022).
  10. Armijo, L. Minimization of functions having Lipschitz continuous first partial derivatives. Pac. J. Math. 1966, 16, 1–3.
Table 1. Number of iterations, modADS, ADD and ADSS.

Function Number | modADS | ADD | ADSS
1. | 50 | 73 | 50
2. | 432 | 82 | 432
3. | 31 | 88 | 31
4. | 60 | 83 | 135
5. | 41 | 82 | 44
6. | 80 | 110 | 76
7. | 70 | 120 | 70
8. | 40 | 100 | 40
9. | 783 | 100 | 781
10. | 70 | 100 | 70
11. | 428 | 91 | 428
12. | 470 | 84 | 470
13. | 60 | 91 | 60
14. | 61 | 81 | 61
15. | 60 | 85 | 60
16. | 40 | 100 | 40
17. | 80 | 111 | 80
18. | 60 | 89 | 60
19. | 432 | 82 | 432
20. | 70 | 100 | 70
21. | 10 | 10 | 10
22. | 76 | 102 | 70
23. | 2202 | > t_e | 2203
24. | 2215 | > t_e | 2215
25. | 32 | 80 | 34
26. | 235 | 131 | 235
27. | 10 | 10 | 10
28. | 10 | 10 | 10
29. | 3870 | > t_e | 5083
30. | 2061 | > t_e | > t_e
Table 2. Number of function evaluations, modADS, ADD and ADSS.

Function Number | modADS | ADD | ADSS
1. | 1242 | 228,132 | 1703
2. | 1240 | 154,355 | 1793
3. | 837 | 131,637 | 4804
4. | 3484 | 140,862 | 16,997
5. | 5384 | 127,188 | 876
6. | 410 | 176,018 | 690
7. | 250 | 186,657 | 420
8. | 220 | 104,690 | 350
9. | 1756 | 223,240 | 2593
10. | 320 | 206,110 | 480
11. | 1245 | 249,238 | 1797
12. | 1283 | 159,256 | 1861
13. | 570 | 254,480 | 824
14. | 573 | 154,821 | 827
15. | 582 | 189,159 | 809
16. | 350 | 278,890 | 490
17. | 300 | 71,354 | 420
18. | 602 | 254,487 | 854
19. | 1239 | 154,050 | 1792
20. | 300 | 130,390 | 460
21. | 30 | 40 | 40
22. | 7023 | 123,052 | 617
23. | 4424 | > t_e | 6639
24. | 4480 | > t_e | 6715
25. | 457 | 143,701 | 714
26. | 1218 | 251,955 | 1692
27. | 30 | 40 | 40
28. | 30 | 40 | 40
29. | 7760 | > t_e | 15,279
30. | 126,094 | > t_e | > t_e
Table 3. ModADS, ADD and ADSS average outcomes of all 3 analyzed metrics obtained on 26 test functions from Listing 1.

Average Metrics | modADS | ADD | ADSS
Number of iterations | 145.81 | 583.23 | 148.42
CPU time (s) | 0 | 135.85 | 0
Number of function evaluations | 1191.35 | 157,455.46 | 1691.65
Table 4. Number of iterations for β = 0.3 and β = 0.6.
Function
Number
modADS 0.3ADD 0.3ADSS 0.3modADS 0.6ADD 0.6ADSS 0.6
1.507250507250
2.4328143243282432
3.7258635497631
4.3378764083102
5.408146438244
6.83100788011076
7.340100707011070
8.40100404010040
9.788100781783100781
10.7090707010070
Table 5. Number of evaluations for β = 0.3 and β = 0.6.
Function
Number
modADS 0.3ADD 0.3ADSS 0.3modADS 0.6ADD 0.6ADSS 0.6
1.342165,094778620184,5371068
2.945117,65914981037128,6571587
3.4078101,862273743103,993338
4.19596,897944665121,55813,181
5.436109,8384503642116,573594
6.250142,4822137296148,475532
7.89684,46036022083,040380
8.220116,10122028090,350270
9.1626182,68824531656189,4102493
10.210126,380400250183,600266
Table 6. Number of iterations and number of function evaluations, modADS and TADSS.

Function Number | modADS num.it. | TADSS num.it. | modADS num.eval. | TADSS num.eval.
1. | 50 | 40 | 1242 | 1082
2. | 432 | 10,973 | 1240 | 29,624
3. | 31 | 1183 | 837 | 9355
4. | 60 | 22 | 3484 | 349
5. | 41 | 23 | 5384 | 439
6. | 80 | 60 | 410 | 412
7. | 70 | 60 | 250 | 250
8. | 40 | 40 | 220 | 400
9. | 783 | 40 | 1756 | 270
10. | 70 | 60 | 320 | 300
11. | 428 | 6915 | 1245 | 34,053
12. | 470 | 5314 | 1283 | 14,650
13. | 60 | 50 | 570 | 570
14. | 61 | 86 | 573 | 672
15. | 60 | 50 | 582 | 563
16. | 40 | 167 | 350 | 776
17. | 80 | 620 | 300 | 1993
18. | 60 | 50 | 602 | 582
19. | 432 | 10,715 | 1239 | 29,150
20. | 70 | 60 | 300 | 290
21. | 10 | 10 | 30 | 30
22. | 76 | 60 | 7023 | 256
23. | 2202 | 199 | 4424 | 572
24. | 2215 | 174 | 4480 | 696
25. | 32 | 24 | 457 | 448
26. | 235 | 10 | 1218 | 30
27. | 10 | 10 | 30 | 30
28. | 10 | 10 | 30 | 40
29. | 3870 | 1752 | 7760 | 8644
30. | 2061 | > t_e | 126,094 | > t_e
Table 7. ModADS and TADSS average outcomes of all 3 analyzed metrics obtained on 29 test functions from Listing 1.

Average Metrics | modADS | TADSS
Number of iterations | 416.48 | 1337.14
CPU time (s) | 0.07 | 2.97
Number of function evaluations | 47,639 | 136,506
Table 8. The number of iterations for the first 15 test functions obtained by the modADS, GD and AGD methods.

Function Number | modADS | GD | AGD
1. | 52 | 2058 | 271
2. | 599 | 50,863 | 61,678
3. | 44 | 20,823 | 15,344
4. | 58 | 11,650 | 11,563
5. | 59 | 19,178 | 29,673
6. | 80 | 888 | 583
7. | 70 | 678,648 | 1768
8. | 40 | 1784 | 396
9. | 788 | 8484 | 100
10. | 70 | 1295 | 321
11. | 595 | 354,364 | 549,164
12. | 608 | 53,103 | 62,996
13. | 61 | 579 | 182
14. | 61 | 86,323 | 109,632
15. | 61 | 63,745 | 11,797
Table 9. CPU time for the first 15 test functions obtained by the modADS, GD and AGD methods.

Function Number | modADS | GD | AGD
1. | 0 | 1 | 0
2. | 0 | 116 | 150
3. | 0 | 11 | 6
4. | 0 | 7 | 8
5. | 0 | 22 | 37
6. | 0 | 0 | 0
7. | 0 | 198 | 0
8. | 0 | 0 | 0
9. | 0 | 0 | 0
10. | 0 | 0 | 0
11. | 0 | 1414 | 3000
12. | 0 | 1445 | 192
13. | 0 | 0 | 0
14. | 0 | 673 | 785
15. | 0 | 767 | 20
Table 10. The number of evaluations for the first 15 test functions obtained by the modADS, GD and AGD methods.

Function Number | modADS | GD | AGD
1. | 929 | 42,549 | 5822
2. | 1469 | 1,747,145 | 1,971,495
3. | 1260 | 416,274 | 240,666
4. | 3137 | 355,313 | 316,838
5. | 3126 | 577,545 | 838,896
6. | 414 | 14,456 | 8321
7. | 250 | 3,457,777 | 7102
8. | 220 | 17,968 | 3413
9. | 1766 | 165,938 | 1110
10. | 320 | 24,565 | 5591
11. | 1472 | 11,880,543 | 16,276,884
12. | 1466 | 1,656,738 | 1,823,829
13. | 466 | 9679 | 2163
14. | 467 | 2,489,732 | 2,719,409
15. | 477 | 2,390,405 | 356,569
Table 11. The average number of all 3 analyzed metrics obtained on the first 15 test functions from Listing 1.

Average Metrics | modADS | GD | AGD
Number of iterations | 216.4 | 90,252.33 | 57,031.2
CPU time (s) | 0 | 310.27 | 279.87
Number of function evaluations | 1149.27 | 1,683,108.47 | 1,638,540.53
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
