Improved Gradient Descent Iterations for Solving Systems of Nonlinear Equations

Stanimirović, Predrag S.; Shaini, Bilall I.; Sabi’u, Jamilu; Shah, Abdullah; Petrović, Milena J.; Ivanov, Branislav; Cao, Xinwei; Stupina, Alena; Li, Shuai

doi:10.3390/a16020064

Open AccessArticle

Improved Gradient Descent Iterations for Solving Systems of Nonlinear Equations

by

Predrag S. Stanimirović

^1,2,*

,

Bilall I. Shaini

³,

Jamilu Sabi’u

⁴,

Abdullah Shah

⁵,

Milena J. Petrović

⁶

,

Branislav Ivanov

⁷

,

Xinwei Cao

^8,*,

Alena Stupina

² and

Shuai Li

⁹

¹

Faculty of Sciences and Mathematics, University of Niš, Višegradska 33, 18000 Niš, Serbia

²

Laboratory “Hybrid Methods of Modelling and Optimization in Complex Systems”, Siberian Federal University, Prosp. Svobodny 79, 660041 Krasnoyarsk, Russia

³

Department of Mathematics, Faculty of Applied Sciences, State University of Tetova, St. Ilinden, n.n., 1220 Tetovo, North Macedonia

⁴

Department of Mathematics, Yusuf Maitama Sule University, Kano 700282, Nigeria

⁵

Department of Mathematics and Statistics, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia

⁶

Faculty of Sciences and Mathematics, University of Pristina in Kosovska Mitrovica, Lole Ribara 29, 38220 Kosovska Mitrovica, Serbia

⁷

Technical Faculty in Bor, University of Belgrade, Vojske Jugoslavije 12, 19210 Bor, Serbia

⁸

School of Business, Jiangnan University, Lihu Blvd, Wuxi 214122, China

⁹

Faculty of Science and Engineering, Zienkiewicz Centre for Computational Engineering, Swansea University, Swansea SA1 8EN, UK

^*

Authors to whom correspondence should be addressed.

Algorithms 2023, 16(2), 64; https://doi.org/10.3390/a16020064

Submission received: 24 November 2022 / Revised: 7 January 2023 / Accepted: 16 January 2023 / Published: 18 January 2023

(This article belongs to the Special Issue Computational Methods and Optimization for Numerical Analysis)

Download

Browse Figures

Versions Notes

Abstract

:

This research proposes and investigates some improvements in gradient descent iterations that can be applied for solving system of nonlinear equations (SNE). In the available literature, such methods are termed improved gradient descent methods. We use verified advantages of various accelerated double direction and double step size gradient methods in solving single scalar equations. Our strategy is to control the speed of the convergence of gradient methods through the step size value defined using more parameters. As a result, efficient minimization schemes for solving SNE are introduced. Linear global convergence of the proposed iterative method is confirmed by theoretical analysis under standard assumptions. Numerical experiments confirm the significant computational efficiency of proposed methods compared to traditional gradient descent methods for solving SNE.

Keywords:

nonlinear equations; gradient descent methods; nonlinear programming; Jacobian

MSC:

90C53; 65K05; 49M37

1. Introduction, Preliminaries, and Motivation

Our intention is to solve a system of nonlinear equations (SNE) of the general form

F (x) = 0, x \in R^{n},

(1)

where

R

is the set of real numbers,

R^{n}

denotes the set of n-dimensional vectors from

R

, and

F : R^{n} \mapsto R^{n}

,

F (x) = {(F_{1} (x), \dots, F_{n} (x))}^{T}

, and

F_{i} : R^{n} \mapsto R

is the ith component of F. It is assumed that F is a continuously differentiable mapping. The nonlinear optimization problem (1) is equivalent to the subsequent minimization of the following goal function f:

min_{x \in R^{n}} f (x), f (x) = \frac{1}{2} {∥ F (x) ∥}^{2} = \frac{1}{2} \sum_{i = 1}^{n} {(F_{i} (x))}^{2} .

(2)

The equivalence of (1) and (2) is widely used in science and practical applications. In such problems, the solution to SNE (1) comes down to solving a related least-squares problem (2). In addition to that, the application of the adequate nonlinear optimization method in solving (1) is a common and efficient technique. Some well-known schemes for solving (1) are based on successive linearization, where the search direction

d_{k}

is obtained by solving the equation

F (x_{k}) + F^{'} (x_{k}) d_{k} = 0,

(3)

where

F^{'} (x_{k}) \equiv J_{F} (x_{k})

, and

J_{F} (x) = [\frac{\partial F_{1} (x)}{\partial x_{j}}]

is the Jacobian matrix of

F (x)

. Therefore, the Newton iterative scheme for solving (1) is defined as

x_{k + 1} = x_{k} + t_{k} d_{k} = x_{k} - t_{k} {(F^{'} (x_{k}))}^{- 1} F (x_{k}),

(4)

where

t_{k}

is a positive parameter that stands for the steplength value.

1.1. Overview of Methods for Solving SNE

Most popular iterations for solving (1) use appropriate approximations

B_{k}

of the Jacobian matrix

F^{'} (x_{k})

. These iterations are of the form

x_{k + 1} = x_{k} + t_{k} d_{k}

, where

t_{k}

is the steplength, and

d_{k}

is the search direction obtained as a solution to the SNE

B_{k} d_{k} + F (x_{k}) = 0 .

(5)

For simplicity, we will use notations

F_{k} : = F (x_{k}), y_{k} : = F_{k + 1} - F_{k}, s_{k} : = x_{k + 1} - x_{k} .

(6)

The BFGS approximations are defined on the basis of the secant equation

B_{k + 1} s_{k} = y_{k}

. The BFGS updates

B_{k + 1} = B_{k} - \frac{B_{k} s_{k} s_{k}^{T} B_{k}}{s_{k}^{T} B_{k} s_{k}} + \frac{y_{k} y_{k}^{T}}{y_{k}^{T} s_{k}}

with an initial approximation

B_{0} \in R^{n \times n}

were considered in [1].

Further on, we list and briefly describe relevant minimization methods that exploit the equivalence between (1) and (2). The efficiency and applicability of these algorithms highly motivated the research presented in this paper. The number of methods that we mention below confirms the applicability of this direction in solving SNE. In addition, there is an evident need to develop and constantly upgrade the performances of optimization methods for solving (1).

There are numerous methods which can be used to solve the problem (1). Many of them are developed in [2,3,4,5,6,7]. Derivative-free methods for solving SNE were considered in [8,9,10]. These methods are proposed as appropriate adaptations of double direction and steplength methods in nonlinear optimization and the approximation of the Jacobian with a diagonal matrix whose entries are defined utilizing of an appropriate parameter. One approach based on various modifications of the Broyden method was proposed in [11,12]. A derivative-free conjugate gradient (CG) iterations for solving SNE were proposed in [13].

A descent Dai–Liao CG method for solving large-scale SNE was proposed in [14]. Novel hybrid and modified CG methods for finding a solution to SNE were originated in [15,16], respectively. An extension of a modified three-term CG method that can be applied for solving equations with convex constraints was presented in [17]. A diagonal quasi-Newton approach for solving large-scale nonlinear systems was considered in [18,19]. A quasi-Newton method, defined based on an improved diagonal Jacobian approximation, for solving nonlinear systems was proposed in [20]. Abdullah et al. in [21] proposed a double direction method for solving nonlinear equations. The first direction is the steepest descent direction, while the second direction is the proposed CG direction. Two derivative-free modifications of the CG-based method for solving large-scale systems

F (x) = 0

were presented in [22]. These methods are applicable in the case when the Jacobian of

F (x)

is not accessible. An efficient approximation to the Jacobian matrix with a computational effort similar to that of matrix-free settings was proposed in [23]. Such efficiency was achieved when a diagonal matrix generates a Jacobian approximation. This method possesses low memory space requirements because the method is defined without computing exact gradient and Jacobian. Waziri et al. in [24] followed the approach based on the approximation of the Jacobian inverse by a nonsingular diagonal matrix. A fast and computationally efficient method concerning memory requirements was proposed in [25], and it uses an approximation of the Jacobian by an adequate diagonal matrix. A two-step generalized scheme of the Jacobian approximation was given in [26]. Further on, an iterative scheme which is based on a modification of the Dai–Liao CG method, classical Newton iterates, and the standard secant equation was suggested in [27]. A three-step method based on a proper diagonal updating was presented in [28]. A hybridization of FR and PRP conjugate gradient methods was given in [29]. The method in [29] can be considered as a convex combination of the PRP method and the FR method while using the hyperplane projection technique. A diagonal Jacobian method was derived from data from two preceding steps, and a weak secant equation was investigated in [30]. An iterative modified Newton scheme based on diagonal updating was proposed in [31]. Solving nonlinear monotone operator equations via a modified symmetric rank-one update is given in [32]. In [33], the authors used a new approach in solving nonlinear systems by simply considering them in the form of multi-objective optimization problems.

It is essential to mention that the analogous idea of avoiding the second derivative in the classical Newton’s method for solving nonlinear equations is exploited in deriving several iterative methods of various orders for solving nonlinear equations [34,35,36,37]. Moreover, some derivative-free iterative methods were developed for solving nonlinear equations [38,39]. Furthermore, some alternative approaches were conducted for solving complex symmetric linear systems [40] or a Sylvester matrix equation [41].

Trust region methods have become very popular algorithms for solving nonlinear equations and general nonlinear problems [37,42,43,44].

The systems of nonlinear equations (1) have various applications [15,29,45,46,47,48], for example in solving the

ℓ_{1}

-norm problem arising from compressing sensing [49,50,51,52], in variational inequalities problems [53,54], and optimal power flow equations [55] among others.

Viewed statistically, the Newton method and different forms of quasi-Newton methods have been frequently used in solving SNE. Unfortunately, methods of the Newton family are not efficient in solving large-scale SNE problems since they are based on the Jacobian matrix. A similar drawback applies to all methods based on various matrix approximations of the Jacobian matrix in each iteration. Numerous adaptations and improvements of the CG iterative class exist as one solution applicable to large-scale problems. We intend to use the simplest Jacobian approximation using an appropriate diagonal matrix. Our goal is to define computationally effective methods for solving large-scale SNEs using the simplest of Jacobian approximations. The realistic basis for our expectations is the known efficient methods used to optimize individual nonlinear functions.

The remaining sections have the following general structure. The introduction, preliminaries, and motivation are included in Section 1. An overview of methods for solving SNE is presented in Section 1.1 to complete the presentation and explain the motivation. The motivation for the current study is described in Section 1.2. Section 2 proposes several multiple-step-size methods for solving nonlinear equations. Convergence analysis of the proposed methods is investigated in Section 3. Section 4 contains several numerical examples obtained on main standard test problems of various dimensions.

1.2. Motivation

The following standard designations will be used. We adopt the standard notations for the gradient

g (x) : = \nabla f (x)

and the Hessian

G (x) : = \nabla^{2} f (x)

of the objective function

f (x)

. Further,

g_{k} = g (x_{k})

denotes the gradient vector for f in the point

x_{k}

. An appropriate identity matrix will be denoted by I.

Our research is motivated by two trends in solving minimization problems. These streams are described as two subsequent parts of the current subsection. A nonlinear multivariate unconstrained minimization problem is defined as

min f (x), x \in R^{n},

(7)

where

f (x) : R^{n} \mapsto R

is a uniformly convex or strictly convex continuously differentiable function bounded from below.

1.2.1. Improved Gradient Descent Methods as Motivation

The most general iteration for solving (7) is expressed as

x_{k + 1} = x_{k} + t_{k} d_{k} .

(8)

In (8),

x_{k + 1}

presents a new approximation point based on the previous

x_{k}

. Positive parameter

t_{k}

stays for the steplength value, while

d_{k}

presents the search direction vector, which is generated based on the descent condition

g_{k}^{T} d_{k} < 0 .

The direction vector

d_{k}

may be defined in various ways. This vital element is often determined using the features of the function gradient. In one of the earliest optimization schemes, the gradient descent method (GD), this variable is defined as negative of the gradient direction, i.e.,

d_{k} = - g_{k}

. In the line search variant of the Newton method, the search direction presents the solution to the system of nonlinear equations

G_{k} d = - g_{k}

with respect to

d

, where

G_{k} : = G (x_{k}) = ▽^{2} f (x_{k})

denotes the Hessian matrix.

Unlike traditional GD algorithms for nonlinear unconstrained minimization, which are defined based on a single step size

t_{k}

, a class of improved gradient descent (

I G D

) algorithms define the final step size using two or more steps size scaling parameters. Such algorithms were classified and investigated in [56]. Obtained numerical results confirm that the usage of appropriate additional scaling parameters decreases the number of iterations. Typically, one of the parameters is defined using the inexact line search, while the second one is defined using the first terms of the Taylor expansion of the goal function.

A frequently investigated class of minimization methods that can be applied for solving the problem (7) use the following iterative rule

x_{k + 1} = x_{k} - θ_{k} t_{k} g_{k} .

(9)

In (9), the parameter

t_{k}

represents the step size in the kth iteration. The originality of the iteration (9) is expressed though the acceleration variable

θ_{k}

. This type of optimization scheme with acceleration parameter was originated in [57]. Later, in [58], the authors justifiably named such models as accelerated gradient descent methods (AGD methods shortly). Further research on this topic confirmed that the acceleration parameter generally improves the performance of the gradient method.

The Newton method with included line search technique is defined by the following iterative rule

x_{k + 1} = x_{k} - t_{k} G_{k}^{- 1} g_{k},

(10)

wherein

G_{k}^{- 1}

stands for the inverse of the Hessian matrix

G_{k}

. Let

B_{k}

be a symmetric positive definite matrix such that

∥ B_{k} - G_{k} ∥ < ϵ

, for arbitrary matrix norm

∥ . ∥

and for a given tolerance

ϵ

. Further, let

H_{k}

be a positive definite approximation of the Hessian’s inverse

G_{k}^{- 1}

. This approach leads to the relation (11) which is the quasi-Newton method with line search:

x_{k + 1} = x_{k} - t_{k} H_{k} g_{k} .

(11)

Updates of

H_{k}

can be defined as solutions to the quasi-Newton equation

H_{k + 1} y_{k} = s_{k},

(12)

where

s_{k} = x_{k + 1} - x_{k}

,

y_{k} = g_{k + 1} - g_{k}

. There is a class of iterations (11) in which there is no ultimate requirement for the

H_{k}

to satisfy the quasi-Newton equation. Such a class of iterates is known as modified Newton methods [59].

The idea in [58] is usage of a proper diagonal approximation of the Hessian

B_{k} = γ_{k} I, γ_{k} > 0, γ_{k} \in R .

(13)

Applying the approximation (13) of

B_{k}

, the matrix

H_{k}

can be approximated by the simple scalar matrix

H_{k} = γ_{k}^{- 1} I .

(14)

In this way, the quasi-Newton line search scheme (11) is transformed into a kind of

A G D

iteration, called the

S M

method and presented in [58] as

x_{k + 1} = x_{k} - γ_{k}^{- 1} t_{k} g_{k} .

(15)

The positive quantity

γ_{k}

is the convergence acceleration parameter which improves the behavior of the generated iterative loop. In [56], methods of the form (15) are termed as improved gradient descent methods (IGD). Commonly, the primary step size

t_{k}

is calculated through the features of some inexact line search algorithms. An additional acceleration parameter

γ_{k}

is usually determined by the Taylor expansion of the goal function. This way of generating acceleration parameter is confirmed as a good choice in [56,58,60,61,62].

The choice

γ_{k} : = 1

in the

I G D

iterations (15) reveal the

G D

iterations

x_{k + 1} = x_{k} - t_{k} g_{k} .

(16)

On the other hand, if the acceleration

γ_{k}

is well-defined, then the step size

t_{k} : = 1

in the

I G D

iterations (15) is acceptable in most cases [63], which leads to a kind of the

G D

iterative principle:

x_{k + 1} = x_{k} - γ_{k}^{- 1} g_{k} .

(17)

Barzilai and Borwein in [64] proposed two efficient

I G D

variants, known as

B B

method variants, where the steplength

γ_{k}^{B B}

was defined as an approximation

H_{k} = γ_{k}^{B B} I

. Therefore, the replacement

γ_{k}^{- 1} : = γ_{k}^{B B}

in (17) leads to the

B B

iterative rule

x_{k + 1} = x_{k} - γ_{k}^{B B} g_{k} .

The scaling parameter

γ_{k}^{B B}

in the basic version is defined upon the minimization of the vector norm

min_{γ} {∥ s_{k - 1} - γ y_{k - 1} ∥}^{2}

, which gives

γ_{k}^{B B} = \frac{s_{k - 1}^{T} y_{k - 1}}{y_{k - 1}^{T} y_{k - 1}} .

(18)

The steplength

γ_{k}^{B B}

in the dual method is produced by the minimization

min_{γ} {∥ γ s_{k - 1} - y_{k - 1} ∥}^{2}

, which yields

γ_{k}^{B B} = \frac{s_{k - 1}^{T} s_{k - 1}}{s_{k - 1}^{T} y_{k - 1}} .

(19)

The

B B

iterations were modified and investigated in a number of publications [65,66,67,68,69,70,71,72,73,74,75,76,77,78,79]. The so-called Scalar Correction (

S C

) method from [80] proposed the trial steplength in (17) defined by

γ_{k + 1}^{S C} = \{\begin{matrix} \frac{s_{k}^{T} r_{k}}{y_{k}^{T} r_{k}}, & y_{k}^{T} r_{k} > 0 \\ \frac{∥ s_{k} ∥}{∥ y_{k} ∥}, & y_{k}^{T} r_{k} \leq 0 \end{matrix}, r_{k} = s_{k} - γ_{k} y_{k} .

(20)

The

S C

iterations are defined as

x_{k + 1} = x_{k} - γ_{k}^{S C} g_{k} .

A kind of steepest descent and

B B

iterations relaxed by a parameter

θ_{k} \in (0, 2)

were proposed in [81]. The so-called Relaxed Gradient Descent Quasi Newton methods, (shortly

R G D Q N

and

R G D Q N 1

), expressed by

x_{k + 1} = x_{k} - θ_{k} t_{k} γ_{k}^{- 1} g_{k},

(21)

are introduced in [82]. Here,

θ_{k}

presents the relaxation parameter. This value is chosen randomly within the

(0, 1)

interval in the

R G D Q N

schemes and by the relation

θ_{k} = \frac{γ_{k}}{t_{k} γ_{k + 1}}

in the

R G D Q N 1

algorithm.

1.2.2. Discretization of Gradient Neural Networks (GNN) as Motivation

Our second motivation arises from discretizing gradient neural network (GNN) design. A GNN evolution can be defined in three steps. Further details can be found in [83,84].

The bulleted lists look like this:

Step1GNN.: Define underlying error matrix $E (t)$ by the interchange of the unknown matrix in the actual problem by the unknown time-varying matrix $V (t)$ , which will be approximated over time $t \geq 0$ . The scalar objective of a GNN is just the Frobenius norm of $E (t)$ :

$ε (t) = \frac{{∥ E (t) ∥}_{F}^{2}}{2}, {∥ E ∥}_{F} = \sqrt{Tr (E^{T} E)} .$
Step2GNN.: Compute the gradient $\frac{\partial ε (t)}{\partial V} = \nabla ε (t)$ of the objective $ε (t)$ .
Step3GNN.: Apply the dynamic GNN evolution, which relates the time derivative $\dot{V} (t)$ and direction opposite to the gradient of $ε (t)$ :

$\dot{V} (t) = \frac{d V (t)}{d t} = - γ \frac{\partial ε (t)}{\partial V}, V (0) = V_{0} .$

(22)

Here,

V (t)

is the activation state variables matrix,

t \in [0, + \infty)

is the time,

γ > 0

is the gain parameter, and

\dot{V} (t)

is the time derivative of

V (t)

.

The discretization of

\dot{V} (t)

by the Euler forward-difference rule is given by

\dot{V} (t) \approx (V_{k + 1} - V_{k}) / τ,

(23)

where

τ

is the sampling time and

V_{k} = V (t = k τ)

,

k = 1, 2, \dots

[84]. The approximation (23) transforms the continuous-time GNN evolution (23) into discrete-time iterations

\frac{V_{k + 1} - V_{k}}{τ} = - γ \frac{\partial ε (t)}{\partial V} = - γ \nabla ε (t) .

Derived discretization of the GNN design is just a GD method for nonlinear optimization:

V_{k + 1} = V_{k} - β_{k} \nabla ε (t), β_{k} = τ γ > 0,

(24)

where

β_{k} = τ γ > 0

is the step size. So, the step size

β_{k}

is defined as a product of two parameters, in which the parameter

γ

should be “as large as possible”, while

τ

should be “as small as possible”. Such considerations may add additional points of view to multiple parameters gradient optimization methods.

Our idea is to generalize the IGD iterations considered in [56] to the problem of solving SNE. One observable analogy is that the gain parameter

γ

from (22) corresponds to the parameter

γ_{k}

from (15). In addition, the sampling time

τ

can be considered as an analogy to the primary step size

t_{k} \in (0, 1)

, which is defined by an inexact line search. Iterations defined as IGD iterations adopted to solve SNE will be called IGDN class.

2. Multiple Step-Size Methods for Solving SNE

The term “multiple step-size methods” is related to the class of gradient-based iterative methods for solving SNE employing a step size defined using two or more appropriately defined parameters. The final goal is to improve the efficiency of classical gradient methods. Two strategies are used in finding approximate parameters: inexact line search and the Taylor expansion.

2.1. IGDN Methods for Solving SNE

Our aim is to simplify the update of the Jacobian

F^{'} (x_{k}) : = J_{k}

. Following (13), it is appropriate to approximate the Jacobian with a diagonal matrix

F^{'} (x_{k}) \approx γ_{k} I .

(25)

Then,

B_{k} = γ_{k} I

in (5) produces the search direction

d_{k} = - γ_{k}^{- 1} F_{k}

, and the iterations (8) are transformed into

x_{k + 1} = x_{k} - t_{k} γ_{k}^{- 1} F_{k} .

(26)

The final step size in iterations (26) is defined using two step size parameters:

t_{k}

and

γ_{k}

. Iterations that fulfill pattern (26) are an analogy of

I G D

methods for nonlinear unconstrained optimization and will be termed as

I G D N

class of methods.

Using the experience of nonlinear optimization, the steplength parameter

γ_{k}

can be defined appropriately using the Taylor expansion of

F (x)

:

F_{k + 1} = F_{k} + F^{'} (ξ_{k}) (x_{k + 1} - x_{k}), ξ_{k} \in [x_{k}, x_{k + 1}]

On the basis of (25), it is appropriate to use

F^{'} (ξ_{k}) \approx γ_{k} I

, which implies

F_{k + 1} - F_{k} = γ_{k} (x_{k + 1} - x_{k}) .

(27)

Using (27) and applying notation (6), one obtains the following updates of

γ_{k}

:

γ_{k} = \frac{y_{k}^{T} y_{k}}{y_{k}^{T} s_{k}} = \frac{s_{k}^{T} y_{k}}{s_{k}^{T} s_{k}} .

It can be noticed that the iterative rule (26) matches with

B B

iteration [64]. So, we introduced the

B B

method for solving SNE. Our further contribution is the introduction of appropriate restrictions on the scaling parameter. To that end, Theorem 1 reveals values of

γ_{k}

which decrease the objective functions included in

F_{k}

. The inequality

F_{k + 1} \leq F_{k}

means

{(F_{k + 1})}_{i} \leq {(F_{k})}_{i}

,

i = 1, \dots, n

.

Theorem 1.

If the condition

γ_{k + 1} \leq \frac{γ_{k}}{t_{k}}

is satisfied, then the

I G D N

iterations (26) satisfy

F_{k + 1} \leq F_{k} .

Proof.

As a consequence of (26) and (27), one can verify

F_{k + 1} = F_{k} - t_{k} γ_{k + 1} γ_{k}^{- 1} F_{k} = (1 - t_{k} γ_{k + 1} γ_{k}^{- 1}) F_{k} .

(28)

In view of

t_{k}, γ_{k + 1}, γ_{k} \geq 0

, it follows that

1 - t_{k} γ_{k + 1} γ_{k}^{- 1} \leq 1

. On the other hand, the inequality

1 - t_{k} γ_{k + 1} γ_{k}^{- 1} \geq 0

is satisfied in the case

γ_{k + 1} \leq \frac{γ_{k}}{t_{k}}

. Now, (28) implies

{(F_{k + 1})}_{i} \leq {(F_{k})}_{i}

,

i = 1, \dots, n

, which needs to be proven. □

So, appropriate update

γ_{k + 1}

can be defined as follows:

γ_{k + 1} = \{\begin{matrix} \frac{y_{k}^{T} y_{k}}{y_{k}^{T} s_{k}} = \frac{s_{k}^{T} y_{k}}{s_{k}^{T} s_{k}}, & y_{k}^{T} s_{k} \geq 0, \\ \frac{γ_{k}}{t_{k}}, & y_{k}^{T} s_{k} < 0 . \end{matrix}

(29)

Now, we are able to generate the value of the next approximation in the form

x_{k + 2} = x_{k + 1} - t_{k + 1} γ_{k + 1}^{- 1} F_{k + 1} .

(30)

The step size

t_{k + 1}

in (30) can be determined using the nonmonotone line search. More precisely,

t_{k}

is defined by

t_{k} = max \{1, s^{k}\}

, where

s \in (0, 1)

, and the integer k is defined from the line search

f (x_{k} + t_{k} d_{k}) - f (x_{k}) \leq - ω_{1} | | t_{k} F (x_{k}) {| |}^{2} - ω_{2} | | t_{k} d_{k} {| |}^{2} + η_{k} f (x_{k}),

(31)

wherein

ω_{1} > 0

,

ω_{2} > 0

, are constants, and

\{η_{k}\}

is a positive sequence such that

\sum_{k = 0}^{\infty} η_{k} < \infty .

(32)

The equality (28) can be rewritten in the equivalent form

y_{k} = - t_{k} γ_{k + 1} γ_{k}^{- 1} F_{k},

(33)

which gives

γ_{k + 1} = - \frac{γ_{k} F_{k}^{T} y_{k}}{t_{k} F_{k}^{T} F_{k}} .

Further, an application of Theorem 1 gives the following additional update for the acceleration parameter

γ_{k}

:

γ_{k + 1} = \{\begin{matrix} - \frac{γ_{k} F_{k}^{T} y_{k}}{t_{k} F_{k}^{T} F_{k}}, & \frac{F_{k}^{T} y_{k}}{F_{k}^{T} F_{k}} \notin (- 1, 0), \\ \frac{γ_{k}}{t_{k}}, & \frac{F_{k}^{T} y_{k}}{F_{k}^{T} F_{k}} \in (- 1, 0) . \end{matrix}

(34)

Corollary 1.

I G D N

iterations (26) determined by (34) satisfy

F_{k + 1} \leq F_{k} .

Proof.

Clearly, (34) initiates

γ_{k + 1} \leq \frac{γ_{k}}{t_{k}}

, and the proof follows from Theorem 1. □

Further on, the implementation framework of the

I G D N

method is presented in Algorithm 1.

Algorithm 1 The IGDN iterations based on (29), (30) or (34), (30).

Require:: Vector function $F (x)$ , $ϵ > 0$ and initialization $x_{0} \in R^{n}$ .
1:: For $k = 0$ chose $γ_{0} = 1$ and $F (x_{0})$ .
2:: Check the output criterion; if $∥F (x_{k})∥ \leq ϵ$ is fulfilled then stop the algorithm; else, continue performing the next step.
3:: (Line search) Compute $t_{k} \in (0, 1]$ using (31).
4:: Compute $x_{k + 1}$ using (30).
5:: Determine $γ_{k + 1}$ using (29) or (34).
6:: $k : = k + 1$ .
7:: Return to Step 2.
8:: Outputs: $x_{k + 1}$ , $F (x_{k + 1})$ .

Remark 1.

The IGDN algorithm defined by (29) (resp. by (34)) will be denoted by

I G D N

(29) (resp. by

I G D N

(34)). Mathematically,

I G D N

(29) and

I G D N

(34) are equivalent. The numerical comparison of these algorithms will be performed later.

2.2. A Class of Accelerated Double Direction (ADDN) Methods

In [61], an optimization method was defined by the iterative rule

x_{k + 1} = x_{k} + t_{k} d_{k} + t_{k}^{2} c_{k},

(35)

where

t_{k}

denotes the value of the steplength parameter, and

d_{k}

,

c_{k}

are the search directions vectors. The vector

d_{k}

is defined as in the SM-method from [58], which gives

d_{k} = {γ_{k}}^{- 1} g_{k}

, and further

x_{k + 1} = x_{k} - t_{k} {γ_{k}}^{- 1} g_{k} + t_{k}^{2} c_{k} .

(36)

We want to apply this strategy in solving (1). First of all, the vector

c_{k}

can be defined according to [85]. An appropriate definition of

c_{k}

is still open.

Assuming again

B_{k} = γ_{k} I

, the vector

d_{k}

from (5) becomes

d_{k} = - γ_{k}^{- 1} F_{k}

, which transforms (35) into

x_{k + 1} = x_{k} - t_{k} γ_{k}^{- 1} F_{k} + t_{k}^{2} c_{k} .

(37)

We propose the steplength

γ_{k + 1}

arising from the Taylor expansion (27) and defined as in (29). In addition, it is possible to use an alternative approach. More precisely, in this case, (27) yields to

F_{k + 1} = F_{k} - γ_{k + 1} (- t_{k} γ_{k}^{- 1} F_{k} + t_{k}^{2} c_{k}) .

As a consequence,

γ_{k + 1}

can be defined utilizing

γ_{k + 1} = - \frac{γ_{k} y_{k}^{T} y_{k}}{y_{k}^{T} (- t_{k} F_{k} + γ_{k} t_{k}^{2} c_{k})} .

(38)

The problem

γ_{k + 1} < 0

in (38) is solved using

γ_{k + 1} = 1

.

We can easily conclude that the next iteration is then generated by

x_{k + 2} = x_{k + 1} - t_{k + 1} F_{k + 1} + t_{k}^{2} c_{k + 1} .

The

A D D N

iterations are defined in Algorithm 2.

Algorithm 2 The

A D D N

iterations based on (37), (38).

Require:: Functions $F (x)$ , $ϵ > 0$ and a given initial vector $x_{0} \in R^{n}$ .
1:: For $k = 0$ chose $γ_{0} = 1$ and $F (x_{0})$ .
2:: Check the stop criterion; if $∥F (x_{k})∥ \leq ϵ$ is satisfied then stop the algorithm; else, continue with Step 3:.
3:: (Line search) Find $t_{k} \in (0, 1]$ using inexact line search procedure.
4:: Compute $x_{k + 1}$ using (37).
5:: Determine $γ_{k + 1}$ using (38).
6:: In case $γ_{k + 1} < 0$ , apply $γ_{k + 1} = 1$ .
7:: $k : = k + 1$ .
8:: Back to Step 2.
9:: Outputs: $x_{k + 1}$ , $F (x_{k + 1})$ .

2.3. A Class of Accelerated Double Step Size (ADSSN) Methods

If the steplength

t_{k}^{2}

is replaced by another steplength

l_{k}

in (35), it can be obtained

x_{k + 1} = x_{k} + t_{k} d_{k} + l_{k} c_{k} .

(39)

Here, the parameters

t_{k}, l_{k} \geq 0

are two independent step size values, and the vectors

d_{k}

,

c_{k}

define the search directions of the proposed iterative scheme (39).

Motivation for this type of iterations arises from [60]. The author of this paper suggested a model of the form (39) with two-step size parameters. This method is actually defined by substituting the parameter

{t_{k}}^{2}

from (35) with another step size parameter

l_{k}

. Both step size values are computed by independent inexact line search algorithms.

Since we aim to unify search directions, it is possible to use

d_{k} : = - γ_{k}^{- 1} F_{k}, c_{k} : = - F_{k} .

(40)

The substitution of chosen parameters (40) into (39) produces

x_{k + 1} = x_{k} - (t_{k} γ_{k}^{- 1} + l_{k}) F_{k} .

(41)

The final step size,

(t_{k} γ_{k}^{- 1} + l_{k})

, in the iterations (41) are defined combining three step size parameters:

t_{k}

,

l_{k}

, and

γ_{k}

. Again, the parameter

γ_{k + 1}

is defined using the Taylor series of the form

F (x_{k + 1}) = F (x_{k}) - γ_{k + 1} (t_{k} γ_{k}^{- 1} + l_{k}) F (x_{k}) .

As a consequence,

γ_{k + 1}

can be computed by

γ_{k + 1} = - \frac{γ_{k} F_{k}^{T} y_{k}}{(t_{k} + γ_{k} l_{k}) F_{k}^{T} F_{k}} .

Theorem 2.

If the condition

γ_{k + 1} \leq \frac{γ_{k}}{t_{k} + γ_{k} l_{k}}

holds, then the iterations (41) satisfy

F_{k + 1} \leq F_{k} .

Proof.

Taking (27) in conjunction with (41), one can verify

F_{k + 1} = F_{k} - γ_{k + 1} (t_{k} γ_{k}^{- 1} + l_{k}) F_{k} = F_{k} (1 - γ_{k + 1} (t_{k} γ_{k}^{- 1} + l_{k})) .

Clearly,

γ_{k + 1} \leq \frac{γ_{k}}{t_{k} + γ_{k} l_{k}}

implies

1 - γ_{k + 1} (t_{k} γ_{k}^{- 1} + l_{k}) \geq 0

. The proof follows from

t_{k} \geq 0, γ_{k + 1}, γ_{k} \geq 0

, which ensures

1 - γ_{k + 1} (t_{k} γ_{k}^{- 1} + l_{k}) \leq 1

. □

In view of Theorem 2, it is reasonable to define the following update for

γ_{k + 1}

in the

A D S S N

method:

γ_{k + 1} = \{\begin{matrix} - \frac{γ_{k} F_{k}^{T} y_{k}}{(t_{k} + γ_{k} l_{k}) F_{k}^{T} F_{k}}, & \frac{F_{k}^{T} y_{k}}{F_{k}^{T} F_{k}} \notin (- 1, 0), \\ \frac{γ_{k}}{t_{k} + γ_{k} l_{k}}, & \frac{F_{k}^{T} y_{k}}{F_{k}^{T} F_{k}} \in (- 1, 0) . \end{matrix}

(42)

Once the accelerated parameter

γ_{k + 1} > 0

is determined, the values of step size parameters

t_{k + 1}

and

l_{k + 1}

are defined. Then, it is possible to generate the next point:

F_{k + 2} = F_{k + 1} - γ_{k + 2} (t_{k + 1} γ_{k + 1}^{- 1} + l_{k + 1}) F_{k + 1} .

In order to derive appropriate values of the parameters

t_{k + 1}

and

l_{k + 1}

, we investigate the function

Φ_{k + 1} (t, l) = F_{k + 1} - γ_{k + 2} (γ_{k + 1}^{- 1} t + l) F_{k + 1} .

The gradient of

Φ_{k + 1} (t, l)

is equal to

g (Φ_{k + 1} (t, l)) = \{Φ_{k + 1} {(t, l)}_{t}^{'}, Φ_{k + 1} {(t, l)}_{l}^{'}\} = \{- γ_{k + 2} γ_{k + 1}^{- 1} F_{k + 1}, - γ_{k + 2} F_{k + 1}\} .

(43)

Therefore,

Φ_{k + 1} (0, 0) = F_{k + 1} .

In addition,

g (Φ_{k + 1} (t, l)) = {0, 0} ⟺ F_{k + 1} = 0 .

(44)

Therefore, the function

Φ_{k + 1} (t, l)

is well-defined.

Step scaling parameters

t_{k}

and

l_{k}

can be determined using two successive line search procedures (31).

Corollary 2.

The

A D S S N

iterations determined by (41) satisfy

F_{k + 1} \leq F_{k} .

Proof.

Clearly, the definition of

γ_{k + 1}

in (42) implies

γ_{k + 1} \leq \frac{γ_{k}}{t_{k}}

, and the proof follows from Theorem 2. □

The

A D S S N

iterations are defined in Algorithm 3.

Algorithm 3 The

A D S S N

iteration based on (41) and (42).

Require:: Chosen $F (x)$ , $ϵ > 0$ and an initialization $x_{0} \in R^{n}$ .
1:: For $k = 0$ chose $γ_{0} = 1$ and $F (x_{0})$ .
2:: Check the test criterion; if $∥F (x_{k})∥ \leq ϵ$ holds, then stop; else, continue with Step 3:.
3:: Find $t_{k}$ using inexact line search.
4:: Find $l_{k}$ using inexact line search.
5:: Compute $x_{k + 1}$ using (41).
6:: Determine the scalar $γ_{k + 1}$ using (42).
7:: $k : = k + 1$ .
8:: Return to Step 2:.
9:: Outputs: $x_{k + 1}$ and $F (x_{k + 1})$ .

Remark 2.

Step 6 of Algorithm 3 is defined according to Theorem 2.

2.4. Simplified ADSSN

Applying the relation

t_{k} + l_{k} = 1

(45)

between the step size parameters

t_{k}

and

l_{k}

in the

A D S S N

iterative rule (41), the

A D S S N

iteration is transformed to

x_{k + 1} = x_{k} - [t_{k} (γ_{k}^{- 1} - 1) + 1] F (x_{k}) .

(46)

The convex combination (45) of step size parameters

t_{k}

and

l_{k}

that appear in the

A D S S N

scheme (41) was originally proposed in [62] and applied in an iterative method for solving the unconstrained optimization problem (7). The assumption (45) represents a trade-off between the steplength parameters

t_{k}

and

l_{k}

. In [62], it was shown that the induced single step size method shows better performance characteristics in general. The constraint (45) initiates the reduction of the two-parameter

A D S S N

rule into a single step size transformed

A D S S N

(shortly

T A D S S N

) iterative method (46).

We can spot that the

T A D S S N

method is a modified version of

I G D N

iterations, based on the replacement of the product

t_{k} γ_{k}^{- 1}

, from the classical IGDN iteration, by the multiplying factor

t_{k} (γ_{k}^{- 1} - 1) + 1

.

The substitution

ϕ_{k} : = t_{k} (γ_{k}^{- 1} - 1) + 1

will be used to simplify the presentation. Here, the accelerated parameter value

γ_{k + 1}

is calculated by (29).

Corollary 3.

Iterations (46) satisfy

F_{k + 1} = F_{k} - γ_{k + 1} ϕ_{k} F_{k} .

(47)

Proof.

It follows from (27) and (46). □

In view of (47), it is possible to conclude

γ_{k + 1} = - \frac{F_{k}^{T} y_{k}}{ϕ_{k} F_{k}^{T} F_{k}} .

Corollary 4 gives some useful restrictions on this rule.

Corollary 4.

If the condition

γ_{k + 1} \leq \frac{γ_{k}}{t_{k} + γ_{k} (1 - t_{k})}

holds, then the iterations (41) satisfy

F_{k + 1} \leq F_{k} .

Proof.

It follows from Theorem 1 and

l_{k} = 1 - t_{k}

. □

In view of Corollary 4, it is reasonable to define the following update for

γ_{k + 1}

in the

T A D S S N

method:

γ_{k + 1} = \{\begin{matrix} - \frac{F_{k}^{T} y_{k}}{ϕ_{k} F_{k}^{T} F_{k}}, & \frac{F_{k}^{T} y_{k}}{ϕ_{k} F_{k}^{T} F_{k}} \leq 0 \\ \frac{γ_{k}}{t_{k} + γ_{k} (1 - t_{k})}, & \frac{F_{k}^{T} y_{k}}{ϕ_{k} F_{k}^{T} F_{k}} > 0 . \end{matrix}

(48)

Then,

x_{k + 2}

is equal to

x_{k + 2} = x_{k + 1} - ϕ_{k + 1} F_{k + 1} .

Algorithm 4 The ADSSN iteration based on (46) and (48).

Require:: Chosen $F (x)$ , $ϵ > 0$ and $x_{0} \in R^{n}$ .
1:: For $k = 0$ chose $γ_{0} = 1$ and $F (x_{0})$ .
2:: Check the termination criterion; if $∥F (x_{k})∥ \leq ϵ$ holds then stop; else, go to Step 3:.
3:: (Line search) Apply (31) and generate the step size value $t_{k}$ .
4:: Compute $l_{k} = 1 - t_{k}$ .
5:: Compute $x_{k + 1}$ using (46).
6:: Determine the scaling factor $γ_{k + 1}$ using (48).
7:: $k : = k + 1$ .
8:: Return to Step 2.
9:: Output: $x_{k + 1}$ , $F (x_{k + 1})$ .

3. Convergence Analysis

The level set is defined as

Ω = \{x \in R^{n} | ∥F (x)∥ \leq ∥F (x_{0})∥\},

(49)

where

x_{0} \in R^{n}

is an initial approximation.

Therewith, the next assumptions are needed:

$(A_{1})$: The level set $Ω$ defined in (49) is bounded below.
$(A_{2})$: Lipschitz continuity holds for the vector function F, i.e., $∥F (x) - F (y)∥ \leq r ∥x - y∥$ for all $x, y \in R^{n}$ and $r > 0$ .
$(A_{3})$: The Jacobian $F^{'} (x)$ is bounded.

Lemma 1.

Suppose the assumption

(A_{2})

holds. If the sequence

\{x_{k}\}

is obtained by the

I G D N

(29) iterations, then

y_{k}^{T} s_{k} \leq r {∥s_{k}∥}^{2}, r > 0 .

(50)

Proof.

Obviously,

y_{k}^{T} s_{k} = s_{k}^{T} y_{k} = s_{k}^{T} (F_{k + 1} - F_{k}) .

(51)

Therefore, assuming

(A_{2})

, it is possible to derive

y_{k}^{T} s_{k} \leq ∥s_{k}∥ ∥F_{k + 1} - F_{k}∥ \leq r {∥s_{k}∥}^{2} .

(52)

Previous estimation confirms that (50) is satisfied with r defined by the Lipschitz condition in

(A_{2})

. □

For the convergence results of the remaining algorithms, we need to prove the finiteness of

γ_{k}

,

d_{k}

, and the remaining results follow trivially.

Lemma 2.

The

γ_{k}

generated by

I G D N

(29) is bounded by the Lipschitz constant r.

Proof.

Clearly, the complemental step size

γ_{k}

defined by (29) satisfies

γ_{k + 1} = \frac{y_{k}^{T} s_{k}}{{∥s_{k}∥}^{2}} \leq \frac{r {∥s_{k}∥}^{2}}{{∥s_{k}∥}^{2}} = r,

(53)

which leads to the conclusion

γ_{k} \leq r

. □

Lemma 3.

The additional step size

γ_{k}

generated by

I G D N

(34) is bounded as follows:

γ_{k} \leq \frac{1}{\prod_{i = 0}^{k - 1} t_{i}}

(54)

Proof.

The updating rule (34) satisfies

γ_{k + 1} \leq \frac{γ_{k}}{t_{k}}

. Continuing in the same way, one concludes

γ_{k + 1} \leq \frac{γ_{0}}{\prod_{i = 0}^{k} t_{i}} .

The proof can be finished using

γ_{0} = 1

. □

Lemma 4.

The additional scaling parameter

γ_{k}

generated by (42) is bounded as follows:

γ_{k} \leq \frac{1}{\prod_{i = 0}^{k - 1} (t_{i} + γ_{i} l_{i})} .

(55)

Lemma 5.

The directions

d_{k}

used in

I G D N

(29) and

I G D N

(34) algorithms are descent directions.

Proof.

Since

d_{k} = - γ_{k}^{- 1} F_{k},

(56)

an application of the scalar product of both sides in (56) with

F_{k}^{T}

in conjunction with Lemma 2 leads to the following conclusion for

I G D N

(29) iterations:

F_{k}^{T} d_{k} = - γ_{k}^{- 1} F_{k}^{T} F_{k} \leq - \frac{1}{r} {∥F_{k}∥}^{2} < 0 .

(57)

With Lemma 3, it can be concluded that

I G D N

(34) iterations imply the following:

F_{k}^{T} d_{k} = - γ_{k}^{- 1} F_{k}^{T} F_{k} \leq - (\prod_{i = 0}^{k - 1} t_{i}) {∥F_{k}∥}^{2} < 0 .

(58)

The proof is complete. □

Lemma 6.

The direction

d_{k}

used in

A D S S N

algorithms is a descent direction.

Proof.

Since

d_{k} = - (t_{k} γ_{k}^{- 1} + l_{k}) F_{k},

(59)

after using the scalar product of both sides in (59) with

F_{k}^{T}

and taking into account Lemma 4, we obtain

\begin{matrix} F_{k}^{T} d_{k} & = - (t_{k} γ_{k}^{- 1} + l_{k}) F_{k}^{T} F_{k} \\ = - \frac{1}{γ_{k}} (t_{k} + l_{k} γ_{k}) F_{k}^{T} F_{k} \\ \leq - \frac{1}{\frac{1}{\prod_{i = 0}^{k - 1} (t_{i} + γ_{i} l_{i})}} (t_{k} + l_{k} γ_{k}) F_{k}^{T} F_{k} \\ = - (\prod_{i = 0}^{k} (t_{i} + γ_{i} l_{i})) {∥F_{k}∥}^{2} < 0 . \end{matrix}

(60)

The proof is complete. □

Theorem 3.

The vector

F_{k + 1}

generated by

I G D N

(34) is a descent direction.

Proof.

According to (34), it follows

γ_{k + 1} = - \frac{γ_{k} F_{k}^{T} y_{k}}{t_{k} F_{k}^{T} F_{k}} = - \frac{γ_{k} F_{k}^{T} (F_{k + 1} - F_{k})}{t_{k} F_{k}^{T} F_{k}} = - \frac{γ_{k} F_{k}^{T} F_{k + 1}}{t_{k} {∥ F_{k} ∥}^{2}} + \frac{γ_{k}}{t_{k}} .

As a consequence,

γ_{k + 1} \leq \frac{γ_{k}}{t_{k}}

implies

F_{k}^{T} F_{k + 1} \geq 0

, which means that

F_{k + 1}

is a descent direction. □

Theorem 4.

The vector

F_{k + 1}

generated by

A D S S N

iterations (41) is a descent direction.

Lemma 7.

If the assumptions

(A_{1})

and

(A_{2})

are valid, then the norm of the direction vector

d_{k}

generated by

I G D N

(29) is bounded.

Proof.

The norm

∥d_{k}∥

can be estimated as

\begin{matrix} ∥d_{k}∥ & = ∥- γ_{k}^{- 1} F_{k}∥ \\ \leq |- γ_{k}^{- 1}| ∥F_{k}∥ . \end{matrix}

(61)

As an implication of

(A_{1})

, one can conclude

∥F_{k}∥ \leq M

, which in conjunction with Lemma 2 further approximates

∥d_{k}∥

in (61) by

∥d_{k}∥ \leq w

,

w = \frac{1}{r} M > 0 .

□

Lemma 8.

If the assumptions

(A_{1})

and

(A_{2})

hold, then the norm of the direction vector

d_{k}

generated by

I G D N

(34) is bounded.

Proof.

As an implication of

(A_{1})

, one can conclude

∥F_{k}∥ \leq M

, which in conjunction with (54) and (61) further approximates

∥d_{k}∥

in (61) by

∥d_{k}∥ \leq w

,

w = (\prod_{i = 0}^{k - 1} t_{i}) M > 0 .

□

Lemma 9.

If the assumptions

(A_{1})

and

(A_{2})

are active, then the norm of the direction vector

d_{k}

generated by

A D S S N

is bounded.

Proof.

Following the proof used in Lemma 8, it can be verified that

∥d_{k}∥ \leq (\prod_{i = 0}^{k - 1} (t_{i} γ_{i}^{- 1} + l_{i})) M > 0 .

□

Now, we are going to establish the global convergence of

I G D N

(29) and

I G D N

(34) and

A D S S N

iterations.

Theorem 5.

If the assumptions

(A_{2})

and

(A_{3})

are satisfied and

x_{k}

are iterations generated by

I G D N

(29), then

lim_{k \to \infty} ∥F (x_{k})∥ = 0 .

(62)

Proof.

The search direction is defined by

d_{k} = - γ_{k}^{- 1} F_{k} .

Starting from the apparent relation

F_{k}^{T} d_{k} = - γ_{k}^{- 1} {∥F_{k}∥}^{2},

we can conclude

{∥F_{k}∥}^{2} = - F_{k}^{T} d_{k} γ_{k} .

(63)

Finally, (57) implies

F_{k}^{T} d_{k} < 0

, which further implies

- F_{k}^{T} d_{k} > 0

. From Lemma 2, using (63) and

- F_{k}^{T} d_{k} > 0

, it follows that

\begin{matrix} {∥F_{k}∥}^{2} & = γ_{k} | - F_{k}^{T} d_{k} | \\ \leq r |F_{k}^{T} d_{k}| \\ \leq r ∥ F_{k} ∥ ∥ d_{k} ∥ . \end{matrix}

(64)

Based on Lemma 7, it can be concluded

{∥F_{k}∥}^{2} \leq r ∥ F_{k} ∥ ∥ d_{k} ∥ \leq r w ∥ F_{k} ∥ .

(65)

By Lemma 5, we can deduce that the norm of the function

F (x_{k})

is decreasing along the direction

d_{k}

, which means

∥F (x_{k + 1})∥ \leq ∥F (x_{k})∥

is true for every k. Based on this fact, it follows

0 \leq {∥F_{k}∥}^{2} \leq r w ∥ F_{k} ∥ ⟶ 0,

(66)

which directly implies

lim_{k \to \infty} ∥F (x_{k})∥ = 0

(67)

and completes the proof. □

Theorem 6.

If the assumptions

(A_{2})

and

(A_{3})

are satisfied and

x_{k}

are iterations generated by

I G D N

(34), then (62) is valid.

Proof.

The search direction of

I G D N

(34) satisfies (63). Finally, since

γ_{k}

is bounded as in (54), and

d_{k}

is a descent direction (Lemma 8). iIt can be concluded

\begin{matrix} 0 \leq {∥F_{k}∥}^{2} & \leq (\frac{1}{\prod_{i = 0}^{k - 1} t_{i}}) |F_{k}^{T} d_{k}| \\ \leq (\frac{1}{\prod_{i = 0}^{k - 1} t_{i}}) ∥F_{k}∥ ∥d_{k}∥ \\ \leq (\frac{1}{\prod_{i = 0}^{k - 1} t_{i}}) w ∥F_{k}∥ ⟶ 0, \end{matrix}

(68)

which implies the desired result. □

Theorem 7.

If the assumptions

(A_{2})

and

(A_{3})

are satisfied and

x_{k}

are iterations generated by

A D S S N

iterations (41), then (62) is valid.

4. Numerical Experience

In order to confirm the efficiency of the presented

I G D N

and

A D S S N

processes, we compare them with the

E M F D

iterations from [8]. We explore performances of both

I G D N

variants defined by Algorithm 1, depending on chosen acceleration parameter

γ_{k}

. These variants are denoted as

I G D N

(29) and

I G D N

(34).

The following values of needed parameters are used:

$I G D N$ algorithms are defined using $ω_{1} = ω_{2} = 10^{- 4}$ , $α_{0} = 0.01$ , $s = 0.2$ , $ϵ = 10^{- 4}$ , and $η_{k} = \frac{1}{{(k + 1)}^{2}} .$
$E M F D$ method is defined using $ω_{1} = ω_{2} = 10^{- 4}$ , $α_{0} = 0.01$ , $s = 0.2$ , $ϵ = 10^{- 4}$ , and $η_{k} = \frac{1}{{(k + 1)}^{2}}$ .

We use the following initial points (IP shortly) for the iterations:

x_{1} = o n e s (1, \dots, 1)

,

x_{2} = (1, \frac{1}{2}, \frac{1}{3}, \dots, \frac{1}{n})

,

x_{3} = (0.1, 0.1, \dots, 0.1)

,

x_{4} = (\frac{1}{n}, \frac{2}{n}, \dots, 1)

,

x_{5} = (1 - \frac{1}{n}, 1 - \frac{2}{n}, \dots, 0)

,

x_{6} = (- 1, \dots, - 1)

,

x_{7} = (n - \frac{1}{n}, n - \frac{2}{n}, \dots, n - 1)

,

x_{8} = (\frac{1}{2}, 1, \frac{2}{3}, \dots, \frac{2}{n})

.

The considered nine test problems are listed below.

Problem 1 (P1) [86] Nonsmooth Function

F (x_{i}) = 2 x_{i} - sin |x_{i}|

, for

i = 1, 2, \dots, n

.

Problem 2 (P2) [87]

F (x_{i}) = min \{min (x_{i}, x_{i}^{2}), max (|x_{i}|, x_{i}^{3})\}

,

i = 2, 3, \dots, n

.

Problem 3 (P3) [87] Strictly Convex Function I

F (x_{i}) = exp (x_{i}) - 1

, for

i = 1, 2, \dots, n

.

Problem 4 (P4) [87]

F_{1} (x) = h x_{1} + x_{2} - 1,

F_{i} (x) = x_{i - 1} + h x_{i} + x_{i - 1} - 1,

i = 2, 3, \dots, n - 1

,

h = 2.5

F_{n} (x) = x_{n - 1} + h x_{n} - 1

.

Problem 5 (P5) [87]

F_{1} (x) = x_{1} + exp (cos (h x_{1} + x_{2})),

F_{i} (x) = x_{i} + exp (cos (h x_{i - 1} + x_{i} + x_{i + 1})),

for

i = 2, 3, \dots, n - 1

,

h = \frac{1}{n + 1}

F_{n} (x) = x_{n} + exp (cos (h x_{n - 1} + x_{n}))

Problem 6 (P6) [87]

F_{1} (x) = 2 x_{1} + sin (x) - 1,

F_{i} (x) = - 2 x_{i - 1} + 2 x_{i} + 2 sin (x_{i}) - 1,

for

i = 2, 3, \dots, n - 1

,

h = 2.5

F_{n} (x) = - 2 x_{n} + sin (x_{n}) - 1

.

Problem 7 (P7) [87]

F_{1} (x) = 3 x_{1}^{3} + x_{2} - 5 + sin (x_{1} - x_{2}) sin (x_{1} + x_{2}),

F_{i} (x) = 3 x_{i}^{3} + 2 x_{i + 1} - 5 sin (x_{i} - x_{i + 1}) + 4 x_{i} - x_{i - 1} exp (x_{i - 1} - x_{i}) - 3,

for

i = 2, 3, \dots, n - 1

,

F_{n} (x) = - x_{n - 1} exp (x_{n - 1} - x_{n}) + 4 x_{n} - 3

.

Problem 8 (P8) [86]

F (x_{i}) = x_{i} - sin |x_{i} - 1|

, for

i = 1, 2, \dots, n

.

Problem 9 (P9) [86]

F (x_{i}) = 2 x_{i} - sin |x_{i}|

, for

i = 1, 2, \dots, n

.

All tested methods are analyzed concerning three main computational aspects: number of iterations (iter), number of function evaluations (fval), and the CPU time (CPU). Performances of analyzed models are investigated on nine listed problems, applied on eight marked initial points, for five variables: 1000, 5000, 10,000, 50,000, 100,000.

According to obtained results,

I G D N

(29) and

I G D N

(34) have better performances in comparison to the

E M F D

method from [8]. Both variants of

I G D N

algorithms outperform the

E M F D

method in all considered performances. In the next Table 1 (IGDN-EMFD comparisons), we display the best comparative analysis achievements of all methods regarding three tested profiles: iter, fval, and CPU.

The

I G D N

(29) variant gives the best results in 52 out of 360 cases, considering the minimal number of iterations. Further,

I G D N

(34) has the lowest outcomes in 33 out of 230 cases. These variants have the same minimal number of iterations in total, 181 out of 360 cases. All tree models require equal minimal number of iterations in 23 out of 360 cases, while the

E M F D

methods give the minimal number of iterations in 71 out of 360 cases. Considering the needed number of iterations,

I G D N

variants reach the minimal values in 265 out of 360 cases, as stated in the column

I G D N

total.

Regarding the fval metric, the results are as follows: 52 out of 360 cases are in favor to

I G D N

(29), 33 out of 360 with respect to

I G D N

(34), 180 out of 360 when both

I G D N

variants have the same minimal fval, while in 24 out of 360 cases all three methods give equal fval minimal values, and 71 out of 360 are in favor to the

E M F D

method. The total minimal fval values achieved under the application of some

I G D N

variants are the same as the total minimal iter numbers, i.e., 265 out of 360.

Concerning the CPU time, numerical outcomes are absolutely in favor of

I G D N

variants, i.e., in 355 out of 360 cases, while the

E M F D

is faster only in 5 out of 360 outcomes.

Obtained numerical results justify better performance characteristics of the

A D S S N

method, which is defined by Algorithm 3, compared to the

E M F D

method. Actually, the

A D S S N

scheme outperforms the

E M F D

iteration regarding all analyzed metrics: iter, fval, CPU time, and additionally with respect to the norm of the objective function. The summary review of obtained numerical values is presented in Table 2 (ADSSN-EMFD comparisons).

Results arranged in Table 2 confirm huge dominance of the

A D S S N

scheme in comparison with the

E M F D

method. Considering the number of iterations, the

A D S S N

method obtains 282 minimal values, while the

E M F D

wins in only 55 instances. Similar outcomes are recorded regarding the fval profile. The most convincing results are achieved considering the CPU time metric, by which the

A D S S N

model outperforms the

E M F D

in 359 out of 360 cases.

This section finishes with a graphical analysis of the performance features of the considered methods. In the subsequent Figure 1, Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6, we display Dolan and Moré [88] performance profiles of compared models in relation to tested metrics: iter, fval, and CPU.

Figure 1, Figure 2 and Figure 3 exhibit the clear superiority of

I G D N

(29) and

I G D N

(34) iterations compared to corresponding

E M F D

iterations regarding the analyzed characteristics iter (resp. fval, CPU time). Further, the theoretical equivalence between

I G D N

(29) and

I G D N

(34) implies their identical responses on testing criteria iter and fval, represented in Figure 1 and Figure 2. However, Figure 3 demonstrates slightly better performances of

I G D N

(34) with respect to

I G D N

(29), which implies that the updating rule (34) is slightly better compared to (29) concerning the execution time. So,

I G D N

(34) is computationally the most effective algorithm.

In the rest of this section, we compare

A D S S N

and

E M F D

.

Figure 4, Figure 5 and Figure 6 exhibit clear superiority of

A D S S N

iterations compared to corresponding

E M F D

iterations regarding all three analyzed performance profiles, iter, fval, and CPU.

5. Conclusions

The traditional gradient descent optimization schemes for solving SNE form a class of methods termed the

G D N

class. A single step size parameter characterizes methods belonging to that class. We aim to upgrade the traditional

G D N

iterates by introducing the improved gradient descent iterations (

I G D N

), which include complex steplength values defined by several parameters. In this way, we justified the assumption that applying two or more quantities in defining the composed step size parameters generally improves the performance of an underlying iterative process.

Numerical results confirm the evident superiority of

I G D N

methods in comparison with

E M F D

iterations from [8], which indicates the superiority of

I G D N

methods over traditional

G D N

methods considering all three analyzed features: iter, fval, and CPU. Confirmation of excellent performance of the presented models is also given through graphically displayed Dolan and Moré’s performance profiles.

The problem of solving SNE by applying some efficient accelerated gradient optimization models is of great interest to the optimization community. In that regard, the question of further upgrading

I G D N

,

A D D N

, and

A D S S N

type of methods is still open.

One possibility for further research is proper exploitation of the results presented in Theorems 1–2 in defining proper updates of the scaling parameter

γ_{k}

. In addition, it will be interesting to examine and exploit similar results in solving classical nonlinear optimization problems.

Author Contributions

Conceptualization, P.S.S. and M.J.P.; methodology, P.S.S., M.J.P. and B.I.; software, B.I. and J.S.; validation, B.I. and J.S.; formal analysis, P.S.S., B.I., A.S. (Abdullah Shah) and J.S.; investigation, X.C., S.L. and J.S.; data curation, B.I., J.S. and A.S. (Abdullah Shah); writing—original draft preparation, P.S.S., J.S. and B.I.S.; writing—review and editing, M.J.P., B.I.S., X.C., A.S. (Alena Stupina) and S.L.; visualization, B.I., J.S. and B.I.S.; project administration, A.S. (Alena Stupina); funding acquisition, A.S. (Alena Stupina). All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Science and Higher Education of the Russian Federation (Grant No. 075-15-2022-1121).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

Predrag Stanimirović is supported by the Science Fund of the Republic of Serbia, (No. 7750185, Quantitative Automata Models: Fundamental Problems and Applications-QUAM). Predrag Stanimirović acknowledges support Grant No. 451-03-68/2022-14/200124 given by Ministry of Education, Science and Technological Development, Republic of Serbia. Milena J. Petrović acknowledges support Grant No.174025 given by Ministry of Education, Science and Technological Development, Republic of Serbia. Milena J. Petrović acknowledges support from the internal-junior project IJ-0202 given by the Faculty of Sciences and Mathematics, University of Priština in Kosovska Mitrovica, Serbia.

Conflicts of Interest

The authors declare no conflict of interest.

References

Yuan, G.; Lu, X. A new backtracking inexact BFGS method for symmetric nonlinear equations. Comput. Math. Appl. 2008, 55, 116–129. [Google Scholar] [CrossRef] [Green Version]
Abubakar, A.B.; Kumam, P. An improved three–term derivative–free method for solving nonlinear equations. Comput. Appl. Math. 2018, 37, 6760–6773. [Google Scholar] [CrossRef]
Cheng, W. A PRP type method for systems of monotone equations. Math. Comput. Model. 2009, 50, 15–20. [Google Scholar] [CrossRef]
Hu, Y.; Wei, Z. Wei–Yao–Liu conjugate gradient projection algorithm for nonlinear monotone equations with convex constraints. Int. J. Comput. Math. 2015, 92, 2261–2272. [Google Scholar] [CrossRef]
La Cruz, W. A projected derivative–free algorithm for nonlinear equations with convex constraints. Optim. Methods Softw. 2014, 29, 24–41. [Google Scholar] [CrossRef]
La Cruz, W. A spectral algorithm for large–scale systems of nonlinear monotone equations. Numer. Algorithms 2017, 76, 1109–1130. [Google Scholar] [CrossRef]
Papp, Z.; Rapajić, S. FR type methods for systems of large–scale nonlinear monotone equations. Appl. Math. Comput. 2015, 269, 816–823. [Google Scholar] [CrossRef]
Halilu, A.S.; Waziri, M.Y. En enhanced matrix-free method via double steplength approach for solving systems of nonlinear equations. Int. J. Appl. Math. Res. 2017, 6, 147–156. [Google Scholar] [CrossRef] [Green Version]
Halilu, A.S.; Waziri, M.Y. A transformed double steplength method for solving large-scale systems of nonlinear equations. J. Numer. Math. Stochastics 2017, 9, 20–32. [Google Scholar]
Waziri, M.Y.; Muhammad, H.U.; Halilu, A.S.; Ahmed, K. Modified matrix-free methods for solving system of nonlinear equations. Optimization 2021, 70, 2321–2340. [Google Scholar] [CrossRef]
Osinuga, I.A.; Dauda, M.K. Quadrature based Broyden-like method for systems of nonlinear equations. Stat. Optim. Inf. Comput. 2018, 6, 130–138. [Google Scholar] [CrossRef]
Muhammad, K.; Mamat, M.; Waziri, M.Y. A Broyden’s-like method for solving systems of nonlinear equations. World Appl. Sci. J. 2013, 21, 168–173. [Google Scholar]
Ullah, N.; Sabi’u, J.; Shah, A. A derivative–free scaling memoryless Broyden–Fletcher–Goldfarb–Shanno method for solving a system of monotone nonlinear equations. Numer. Linear Algebra Appl. 2021, 28, e2374. [Google Scholar] [CrossRef]
Abubakar, A.B.; Kumam, P. A descent Dai–Liao conjugate gradient method for nonlinear equations. Numer. Algorithms 2019, 81, 197–210. [Google Scholar] [CrossRef]
Aji, S.; Kumam, P.; Awwal, A.M.; Yahaya, M.M.; Kumam, W. Two Hybrid Spectral Methods With Inertial Effect for Solving System of Nonlinear Monotone Equations With Application in Robotics. IEEE Access 2021, 9, 30918–30928. [Google Scholar] [CrossRef]
Dauda, M.K.; Usman, S.; Ubale, H.; Mamat, M. An alternative modified conjugate gradient coefficient for solving nonlinear system of equations. Open J. Sci. Technol. 2019, 2, 5–8. [Google Scholar] [CrossRef]
Zheng, L.; Yang, L.; Liang, Y. A conjugate gradient projection method for solving equations with convex constraints. J. Comput. Appl. Math. 2020, 375, 112781. [Google Scholar] [CrossRef]
Waziri, M.Y.; Aisha, H.A. A diagonal quasi-Newton method for system of nonlinear equations. Appl. Math. Comput. Sci. 2014, 6, 21–30. [Google Scholar]
Waziri, M.Y.; Leong, W.J.; Hassan, M.A.; Monsi, M. Jacobian computation-free Newton’s method for systems of nonlinear equations. J. Numer. Math. Stochastics 2010, 2, 54–63. [Google Scholar]
Waziri, M.Y.; Majid, Z.A. An improved diagonal Jacobian approximation via a new quasi-Cauchy condition for solving large-scale systems of nonlinear equations. J. Appl. Math. 2013, 2013, 875935. [Google Scholar] [CrossRef] [Green Version]
Abdullah, H.; Waziri, M.Y.; Yusuf, S.O. A double direction conjugate gradient method for solving large-scale system of nonlinear equations. J. Math. Comput. Sci. 2017, 7, 606–624. [Google Scholar]
Yan, Q.-R.; Peng, X.-Z.; Li, D.-H. A globally convergent derivative-free method for solving large-scale nonlinear monotone equations. J. Comput. Appl. Math. 2010, 234, 649–657. [Google Scholar] [CrossRef] [Green Version]
Leong, W.J.; Hassan, M.A.; Yusuf, M.W. A matrix-free quasi-Newton method for solving large-scale nonlinear systems. Comput. Math. Appl. 2011, 62, 2354–2363. [Google Scholar] [CrossRef] [Green Version]
Waziri, M.Y.; Leong, W.J.; Mamat, M. A two-step matrix-free secant method for solving large-scale systems of nonlinear equations. J. Appl. Math. 2012, 2012, 348654. [Google Scholar] [CrossRef] [Green Version]
Waziri, M.Y.; Leong, W.J.; Hassan, M.A.; Monsi, M. A new Newton’s Method with diagonal Jacobian approximation for systems of nonlinear equations. J. Math. Stat. 2010, 6, 246–252. [Google Scholar] [CrossRef]
Waziri, M.Y.; Leong, W.J.; Mamat, M.; Moyi, A.U. Two-step derivative-free diagonally Newton’s method for large-scale nonlinear equations. World Appl. Sci. J. 2013, 21, 86–94. [Google Scholar]
Yakubu, U.A.; Mamat, M.; Mohamad, M.A.; Rivaie, M.; Sabi’u, J. A recent modification on Dai–Liao conjugate gradient method for solving symmetric nonlinear equations. Far East J. Math. Sci. 2018, 103, 1961–1974. [Google Scholar] [CrossRef]
Uba, L.Y.; Waziri, M.Y. Three-step derivative-free diagonal updating method for solving large-scale systems of nonlinear equations. J. Numer. Math. Stochastics 2014, 6, 73–83. [Google Scholar]
Zhou, Y.; Wu, Y.; Li, X. A New Hybrid PRPFR Conjugate Gradient Method for Solving Nonlinear Monotone Equations and Image Restoration Problems. Math. Probl. Eng. 2020, 2020, 6391321. [Google Scholar] [CrossRef]
Waziri, M.Y.; Leong, W.J.; Mamat, M. An efficient solver for systems of nonlinear equations with singular Jacobian via diagonal updating. Appl. Math. Sci. 2010, 4, 3403–3412. [Google Scholar]
Waziri, M.Y.; Leong, W.J.; Hassan, M.A. Diagonal Broyden-like method for large-scale systems of nonlinear equations. Malays. J. Math. Sci. 2012, 6, 59–73. [Google Scholar]
Abubakar, A.B.; Sabi’u, J.; Kumam, P.; Shah, A. Solving nonlinear monotone operator equations via modified SR1 update. J. Appl. Math. Comput. 2021, 67, 343–373. [Google Scholar] [CrossRef]
Grosan, C.; Abraham, A. A new approach for solving nonlinear equations systems. IEEE Trans. Syst. Man Cybern. 2008, 38, 698–714. [Google Scholar] [CrossRef]
Dehghan, M.; Hajarian, M. New iterative method for solving nonlinear equations with fourth-order convergence. Int. J. Comput. Math. 2010, 87, 834–839. [Google Scholar] [CrossRef]
Dehghan, M.; Hajarian, M. Fourth-order variants of Newton’s method without second derivatives for solving nonlinear equations. Eng. Comput. 2012, 29, 356–365. [Google Scholar] [CrossRef]
Kaltenbacher, B.; Neubauer, A.; Scherzer, O. Iterative Regularization Methods for Nonlinear III—Posed Problems; De Gruyter: Berlin, Germany; New York, NY, USA, 2008. [Google Scholar]
Wang, Y.; Yuan, Y. Convergence and regularity of trust region methods for nonlinear ill-posed problems. Inverse Probl. 2005, 21, 821–838. [Google Scholar] [CrossRef] [Green Version]
Dehghan, M.; Hajarian, M. Some derivative free quadratic and cubic convergence iterative formulas for solving nonlinear equations. Comput. Appl. Math. 2010, 29, 19–30. [Google Scholar] [CrossRef]
Dehghan, M.; Hajarian, M. On some cubic convergence iterative formulae without derivatives for solving nonlinear equations. Int. J. Numer. Methods Biomed. Eng. 2011, 27, 722–731. [Google Scholar] [CrossRef]
Dehghan, M.; Shirilord, A. Accelerated double-step scale splitting iteration method for solving a class of complex symmetric linear systems. Numer. Algorithms 2020, 83, 281–304. [Google Scholar] [CrossRef]
Dehghan, M.; Shirilord, A. A generalized modified Hermitian and skew-Hermitian splitting (GMHSS) method for solving complex Sylvester matrix equation. Appl. Math. Comput. 2019, 348, 632–651. [Google Scholar] [CrossRef]
Bellavia, S.; Gurioli, G.; Morini, B.; Toint, P.L. Trust-region algorithms: Probabilistic complexity and intrinsic noise with applications to subsampling techniques. EURO J. Comput. Optim. 2022, 10, 100043. [Google Scholar] [CrossRef]
Bellavia, S.; Krejić, N.; Morini, B.; Rebegoldi, S. A stochastic first-order trust-region method with inexact restoration for finite-sum minimization. Comput. Optim. Appl. 2023, 84, 53–84. [Google Scholar] [CrossRef]
Bellavia, S.; Krejić, N.; Morini, B. Inexact restoration with subsampled trust-region methods for finite-sum minimization. Comput. Optim. Appl. 2020, 76, 701–736. [Google Scholar] [CrossRef]
Eshaghnezhad, M.; Effati, S.; Mansoori, A. A Neurodynamic Model to Solve Nonlinear Pseudo-Monotone Projection Equation and Its Applications. IEEE Trans. Cybern. 2017, 47, 3050–3062. [Google Scholar] [CrossRef]
Meintjes, K.; Morgan, A.P. A methodology for solving chemical equilibrium systems. Appl. Math. Comput. 1987, 22, 333–361. [Google Scholar] [CrossRef]
Crisci, S.; Piana, M.; Ruggiero, V.; Scussolini, M. A regularized affine–acaling trust–region method for parametric imaging of dynamic PET data. SIAM J. Imaging Sci. 2021, 14, 418–439. [Google Scholar] [CrossRef]
Bonettini, S.; Zanella, R.; Zanni, L. A scaled gradient projection method for constrained image deblurring. Inverse Probl. 2009, 25, 015002. [Google Scholar] [CrossRef] [Green Version]
Liu, J.K.; Du, X.L. A gradient projection method for the sparse signal reconstruction in compressive sensing. Appl. Anal. 2018, 97, 2122–2131. [Google Scholar] [CrossRef]
Liu, J.K.; Li, S.J. A projection method for convex constrained monotone nonlinear equations with applications. Comput. Math. Appl. 2015, 70, 2442–2453. [Google Scholar] [CrossRef]
Xiao, Y.; Zhu, H. A conjugate gradient method to solve convex constrained monotone equations with applications in compressive sensing. J. Math. Anal. Appl. 2013, 405, 310–319. [Google Scholar] [CrossRef]
Awwal, A.M.; Wang, L.; Kumam, P.; Mohammad, H.; Watthayu, W. A Projection Hestenes–Stiefel Method with Spectral Parameter for Nonlinear Monotone Equations and Signal Processing. Math. Comput. Appl. 2020, 25, 27. [Google Scholar] [CrossRef]
Fukushima, M. Equivalent differentiable optimization problems and descent methods for asymmetric variational inequality problems. Math. Program. 1992, 53, 99–110. [Google Scholar] [CrossRef]
Qian, G.; Han, D.; Xu, L.; Yang, H. Solving nonadditive traffic assignment problems: A self-adaptive projection–auxiliary problem method for variational inequalities. J. Ind. Manag. Optim. 2013, 9, 255–274. [Google Scholar] [CrossRef]
Ghaddar, B.; Marecek, J.; Mevissen, M. Optimal power flow as a polynomial optimization problem. IEEE Trans. Power Syst. 2016, 31, 539–546. [Google Scholar] [CrossRef] [Green Version]
Ivanov, B.; Stanimirović, P.S.; Milovanović, G.V.; Djordjević, S.; Brajević, I. Accelerated multiple step-size methods for solving unconstrained optimization problems. Optim. Methods Softw. 2021, 36, 998–1029. [Google Scholar] [CrossRef]
Andrei, N. An acceleration of gradient descent algorithm with backtracking for unconstrained optimization. Numer. Algorithms 2006, 42, 63–73. [Google Scholar] [CrossRef]
Stanimirović, P.S.; Miladinović, M.B. Accelerated gradient descent methods with line search. Numer. Algorithms 2010, 54, 503–520. [Google Scholar] [CrossRef]
Sun, W.; Yuan, Y.-X. Optimization Theory and Methods: Nonlinear Programming; Springer: New York, NY, USA, 2006. [Google Scholar]
Petrović, M.J. An Accelerated Double Step Size model in unconstrained optimization. Appl. Math. Comput. 2015, 250, 309–319. [Google Scholar] [CrossRef]
Petrović, M.J.; Stanimirović, P.S. Accelerated Double Direction method for solving unconstrained optimization problems. Math. Probl. Eng. 2014, 2014, 965104. [Google Scholar] [CrossRef]
Stanimirović, P.S.; Milovanović, G.V.; Petrović, M.J.; Kontrec, N. A Transformation of accelerated double step size method for unconstrained optimization. Math. Probl. Eng. 2015, 2015, 283679. [Google Scholar] [CrossRef] [Green Version]
Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: New York, NY, USA, 1999. [Google Scholar]
Barzilai, J.; Borwein, J.M. Two-point step size gradient method. IMA J. Numer. Anal. 1988, 8, 141–148. [Google Scholar] [CrossRef]
Dai, Y.H. Alternate step gradient method. Optimization 2003, 52, 395–415. [Google Scholar] [CrossRef]
Dai, Y.H.; Fletcher, R. On the asymptotic behaviour of some new gradient methods. Math. Program. 2005, 103, 541–559. [Google Scholar] [CrossRef]
Dai, Y.H.; Liao, L.Z. R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 2002, 22, 1–10. [Google Scholar] [CrossRef]
Dai, Y.H.; Yuan, J.Y.; Yuan, Y. Modified two-point step-size gradient methods for unconstrained optimization. Comput. Optim. Appl. 2002, 22, 103–109. [Google Scholar] [CrossRef]
Dai, Y.H.; Yuan, Y. Alternate minimization gradient method. IMA J. Numer. Anal. 2003, 23, 377–393. [Google Scholar] [CrossRef]
Dai, Y.H.; Yuan, Y. Analysis of monotone gradient methods. J. Ind. Manag. Optim. 2005, 1, 181–192. [Google Scholar] [CrossRef]
Dai, Y.H.; Zhang, H. Adaptive two-point step size gradient algorithm. Numer. Algorithms 2001, 27, 377–385. [Google Scholar] [CrossRef]
Raydan, M. On the Barzilai and Borwein choice of steplength for the gradient method. IMA J. Numer. Anal. 1993, 13, 321–326. [Google Scholar] [CrossRef]
Raydan, M. The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J. Optim. 1997, 7, 26–33. [Google Scholar] [CrossRef]
Vrahatis, M.N.; Androulakis, G.S.; Lambrinos, J.N.; Magoulas, G.D. A class of gradient unconstrained minimization algorithms with adaptive step-size. J. Comput. Appl. Math. 2000, 114, 367–386. [Google Scholar] [CrossRef] [Green Version]
Yuan, Y. A new step size for the steepest descent method. J. Comput. Math. 2006, 24, 149–156. [Google Scholar]
Frassoldati, G.; Zanni, L.; Zanghirati, G. New adaptive step size selections in gradient methods. J. Ind. Manag. Optim. 2008, 4, 299–312. [Google Scholar] [CrossRef]
Serafino, D.; Ruggiero, V.; Toraldo, G.; Zanni, L. On the steplength selection in gradient methods for unconstrained optimization. Appl. Math. Comput. 2018, 318, 176–195. [Google Scholar] [CrossRef] [Green Version]
Crisci, S.; Porta, F.; Ruggiero, V.; Zanni, L. Spectral properties of Barzilai–Borwein rules in solving singly linearly constrained optimization problems subject to lower and upper bounds. SIAM J. Optim. 2020, 30, 1300–1326. [Google Scholar] [CrossRef]
Crisci, S.; Porta, F.; Ruggiero, V.; Zanni, L. Hybrid limited memory gradient projection methods for box–constrained optimization problems. Comput. Optim. Appl. 2023, 84, 151–189. [Google Scholar] [CrossRef]
Miladinović, M.; Stanimirović, P.S.; Miljković, S. Scalar Correction method for solving large scale unconstrained minimization problems. J. Optim. Theory Appl. 2011, 151, 304–320. [Google Scholar] [CrossRef]
Raydan, M.; Svaiter, B.F. Relaxed steepest descent and Cauchy-Barzilai-Borwein method. Comput. Optim. Appl. 2002, 21, 155–167. [Google Scholar] [CrossRef]
Djordjević, S.S. Two modifications of the method of the multiplicative parameters in descent gradient methods. Appl. Math. Comput. 2012, 218, 8672–8683. [Google Scholar]
Zhang, Y.; Yi, C. Zhang Neural Networks and Neural-Dynamic Method; Nova Science Publishers, Inc.: New York, NY, USA, 2011. [Google Scholar]
Zhang, Y.; Ma, W.; Cai, B. From Zhang neural network to Newton iteration for matrix inversion. IEEE Trans. Circuits Syst. I Regul. Pap. 2009, 56, 1405–1415. [Google Scholar] [CrossRef]
Djuranovic-Miličić, N.I.; Gardašević-Filipović, M. A multi-step curve search algorithm in nonlinear optimization - nondifferentiable case. Facta Univ. Ser. Math. Inform. 2010, 25, 11–24. [Google Scholar]
Zhou, W.J.; Li, D.H. A globally convergent BFGS method for nonlinear monotone equations without any merit functions. Math. Comput. 2008, 77, 2231–2240. [Google Scholar] [CrossRef]
La Cruz, W.; Martínez, J.; Raydan, M. Spectral residual method without gradient information for solving large-scale nonlinear systems of equations. Math. Comput. 2006, 75, 1429–1448. [Google Scholar] [CrossRef] [Green Version]
Dolan, E.; Moré, J. Benchmarking optimization software with performance profiles. Math. Program. 2002, 91, 201–213. [Google Scholar] [CrossRef]

Figure 1. Performance profile of

I G D N

versus

E M F D

[8] with respect to iter.

Figure 1. Performance profile of

I G D N

versus

E M F D

[8] with respect to iter.

Figure 2. Performance profile of

I G D N

versus

E M F D

[8] with respect to fval.

Figure 2. Performance profile of

I G D N

versus

E M F D

[8] with respect to fval.

Figure 3. Performance profile of

I G D N

versus

E M F D

[8] with respect to CPU.

Figure 3. Performance profile of

I G D N

versus

E M F D

[8] with respect to CPU.

Figure 4. Performance profile of

A D S S N

versus

E M F D

[8] with respect to iter.

Figure 4. Performance profile of

A D S S N

versus

E M F D

[8] with respect to iter.

Figure 5. Performance profile of

A D S S N

versus

E M F D

[8] with respect to fval.

Figure 5. Performance profile of

A D S S N

versus

E M F D

[8] with respect to fval.

Figure 6. Performance profile of

A D S S N

versus

E M F D

[8] with respect to CPU.

Figure 6. Performance profile of

A D S S N

versus

E M F D

[8] with respect to CPU.

Table 1. IGDN-EMFD comparisons.

Methods	(29)	(34)	(29) = (34)	(29) = (34) = $EMFD$	$EMFD$	$IGDN$ Total
iter	52	32	181	23	72	265
fval	52	33	180	24	71	265
CPU (sec)	214	141	0	0	5	355

Table 2. IADSSN-EMFD comparisons.

Methods	ADSSN	EMFD	ADSSN = EMFD
iter	282	55	23
fval	281	56	23
CPU (sec)	359	1	0

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Stanimirović, P.S.; Shaini, B.I.; Sabi’u, J.; Shah, A.; Petrović, M.J.; Ivanov, B.; Cao, X.; Stupina, A.; Li, S. Improved Gradient Descent Iterations for Solving Systems of Nonlinear Equations. Algorithms 2023, 16, 64. https://doi.org/10.3390/a16020064

AMA Style

Stanimirović PS, Shaini BI, Sabi’u J, Shah A, Petrović MJ, Ivanov B, Cao X, Stupina A, Li S. Improved Gradient Descent Iterations for Solving Systems of Nonlinear Equations. Algorithms. 2023; 16(2):64. https://doi.org/10.3390/a16020064

Chicago/Turabian Style

Stanimirović, Predrag S., Bilall I. Shaini, Jamilu Sabi’u, Abdullah Shah, Milena J. Petrović, Branislav Ivanov, Xinwei Cao, Alena Stupina, and Shuai Li. 2023. "Improved Gradient Descent Iterations for Solving Systems of Nonlinear Equations" Algorithms 16, no. 2: 64. https://doi.org/10.3390/a16020064

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Improved Gradient Descent Iterations for Solving Systems of Nonlinear Equations

Abstract

1. Introduction, Preliminaries, and Motivation

1.1. Overview of Methods for Solving SNE

1.2. Motivation

1.2.1. Improved Gradient Descent Methods as Motivation

1.2.2. Discretization of Gradient Neural Networks (GNN) as Motivation

2. Multiple Step-Size Methods for Solving SNE

2.1. IGDN Methods for Solving SNE

2.2. A Class of Accelerated Double Direction (ADDN) Methods

2.3. A Class of Accelerated Double Step Size (ADSSN) Methods

2.4. Simplified ADSSN

3. Convergence Analysis

4. Numerical Experience

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI