Development of an Efficient Variable Step-Size Gradient Method Utilizing Variable Fractional Derivatives

Ye, Luotang; Chen, Yanmao; Liu, Qixian

doi:10.3390/fractalfract7110789

Open AccessArticle

Development of an Efficient Variable Step-Size Gradient Method Utilizing Variable Fractional Derivatives

by

Luotang Ye

,

Yanmao Chen

and

Qixian Liu

^*

Department of Mechanics, Shenzhen Campus of Sun Yat-sen University, No. 66 Gongchang Road, Shenzhen 518107, China

^*

Author to whom correspondence should be addressed.

Fractal Fract. 2023, 7(11), 789; https://doi.org/10.3390/fractalfract7110789

Submission received: 23 September 2023 / Revised: 21 October 2023 / Accepted: 24 October 2023 / Published: 30 October 2023

(This article belongs to the Special Issue Modeling, Optimization, and Control of Fractional-Order Neural Networks and Nonlinear Systems)

Download

Browse Figures

Versions Notes

Abstract

:

The fractional gradient method has garnered significant attention from researchers. The common view regarding fractional-order gradient methods is that they have a faster convergence rate compared to classical gradient methods. However, through conducting theoretical convergence analysis, we have revealed that the maximum convergence rate of the fractional-order gradient method is the same as that of the classical gradient method. This discovery implies that the superiority of fractional gradients may not reside in achieving fast convergence rates compared to the classical gradient method. Building upon this discovery, a novel variable fractional-type gradient method is proposed with an emphasis on automatically adjusting the step size. Theoretical analysis confirms the convergence of the proposed method. Numerical experiments demonstrate that the proposed method can converge to the extremum point both rapidly and accurately. Additionally, the Armijo criterion is introduced to ensure that the proposed gradient methods, along with various existing gradient methods, can select the optimal step size at each iteration. The results indicate that, despite the proposed method and existing gradient methods having the same theoretical maximum convergence speed, the introduced variable step size mechanism in the proposed method consistently demonstrates superior convergence stability and performance when applied to practical problems.

Keywords:

gradient descent method; variable fractional derivative; variable step size; rapid convergence rate; high convergence precision

1. Introduction

It has been more than 300 years since fractional calculus was first proposed. For a long time, due to the lack of corresponding physical background, fractional calculus was regarded as a purely theoretical study of mathematics. With the development of science and technology, researchers found that fractional calculus can provide a good description of modeling some abnormal behaviors such as diffusion process [1], cybernetics [2,3], viscoelasticity [4,5], biology [6], etc. The long-memory properties of fractional calculus not only reflect local characteristics at the time of evaluation but also reflect the entire history of the function. Therefore, fractional calculus was extensively used in complex nonlinear dynamical systems with memory and genetic properties [7,8,9].

As a natural extension of the constant fractional derivative, variable fractional derivative is a relatively new branch in the field of fractional calculus and has also been widely applied in various mechanical problems such as contact dynamics [10], viscoelasticity [11], fracture mechanics [12], vibration control [13], etc. In the variable fractional derivative, the order can be continuously varied as a function of the integral (differential) independent variable, which can be classified into three categories according to the memory characteristics of the order [14].

As one of the standard optimization algorithms, the gradient method (GM) has been widely used in many engineering applications, such as adaptive filtering [15], image processing [16], and system identification [17]. Recently, the fractional gradient method has attracted the attention of many scholars. The basic principle of this method is to replace the first-order derivative with a fractional-order derivative [18]. However, Pu et al. [19] found that the fractional gradient method cannot guarantee to converge to the extreme point. This shortcoming has been overcome by introducing the variable initial value [20] and truncation technique [21]. After this, the above-improved methods were successfully applied in many scenarios, such as parameter identification [22,23,24,25,26,27], signal processing [28,29,30,31,32,33], and neural networks [34,35,36,37,38,39]. Wei et al. [40] recently divided the existing fractional-type gradient method into three categories, such as variable initial value type, truncation type, and variable fractional-order type. On this basis [40], they further developed a new method that can converge to the real extreme point by replacing the constant fractional order with a variable fractional order.

Research on the fractional gradient method has been progressing rapidly. It is important to note that many of these studies emphasize the higher convergence rate of the fractional gradient method compared to the classical gradient method. However, the research also suggests that the maximum convergence rate of the fractional gradient method is the same as that of the classical gradient method. This implies that the advantage of the fractional gradient method may not lie solely in its higher convergence rate.

Therefore, it is crucial to explore the advantages of the fractional gradient method from a different perspective. It is well-known that gradient-based methods require a small iteration step size to achieve high accuracy, while a large iteration step size is needed for rapid convergence. Consequently, the classical gradient method, with its fixed step size, cannot simultaneously achieve high precision and a rapid convergence rate. Hence, there is a need to propose a new gradient method that is capable of achieving both high precision and a rapid convergence rate.

To address this issue, this paper presents a novel gradient descent method based on variable fractional derivatives, aiming to achieve both rapid convergence rate and high convergence precision. The underlying principle of this method is to offer a larger iteration step size in the initial stages of the iteration, facilitating rapid convergence. Conversely, towards the end of the iteration, the step size gradually decreases to a fixed and smaller value, ensuring a higher level of convergence precision. The main contributions of this paper are outlined as follows:

The fractional-type methods currently in existence can be broadly categorized into three types. Theoretical convergence analysis has revealed that the maximum convergence rate of fractional-type gradient methods is comparable to that of classical gradient methods. These findings suggest that the advantage of fractional gradients may not necessarily lie in their superior convergence rates compared to the classical gradient method.
We reveal that the fractional order operator’s key impact on the algorithm lies in the variable step size mechanism. Based on this discovery, a novel variable step-size gradient method is proposed by combining the existing three kinds of gradient methods. This method can overcome the shortcoming that the classical gradient method cannot balance the rapid convergence rate and high convergence precision.
Numerical simulations are provided to demonstrate the effectiveness of the proposed variable fractional-order gradient method. It can be observed that the proposed method converges rapidly and accurately to the extremum of convex optimization problems. Moreover, under optimal step-size conditions, the proposed method exhibits remarkable capability, maintaining a high convergence performance even when other methods struggle to achieve their maximum convergence speeds in practical simulations.

The structure of the paper is as follows: In Section 2, several definitions, such as the variable fractional derivative and a lemma about convex function, are introduced. The novel gradient method based on the variable fractional derivative is proposed in Section 3. In Section 4, the theoretical convergence analysis of the proposed algorithm is presented. Numerical simulations are presented to illustrate the effectiveness of the proposed method in Section 5. Finally, the paper is concluded in the Section 6.

2. Classification of Fractional Gradient Methods

Ever since fractional calculus was invented, the definition of fractional-order derivatives has been changing and improving. In recent years, scholars have put forward various definitions of fractional-order derivatives. However, there are three widely accepted definitions which are the Riemann–Liouville definition, the Grunwald–Letnikov definition, and the Caputo definition. Previous studies have shown that the fractional-order gradient method with Riemann–Liouville definition cannot converge to the true extreme point of the objective function [21,25]. Therefore, in this paper, we adopt the Caputo definition, given as follows:

{}_{t_{0}}^{C}𝒟_{t_{k}}^{α} f (t) = \frac{1}{Γ (n - α)} \int_{t_{0}}^{t} \frac{f^{(n)} (τ)}{{(t - τ)}^{α - n + 1}} d τ

(1)

where

n = [α]

is the upper integer bound on

α

and

Γ (\cdot)

is the gamma function. Alternatively, it can be rewritten in a similar form to the conventional Taylor series:

{}_{t_{0}}^{C}𝒟_{t_{k}}^{α} f (t) = \sum_{i = n}^{\infty} \frac{f^{(i)} (t_{0})}{Γ (i + 1 - α)} {(t - t_{0})}^{i - α}

(2)

The variable fractional-order Caputo derivative of

f (t)

with the order

α (t) > 0

on the interval

(t_{0}, t)

is defined as:

{}_{t_{0}}^{C}𝒟_{t_{k}}^{α (t)} f (t) = \frac{1}{Γ [n - α (t)]} \int_{t_{0}}^{t} \frac{1}{{(t - τ)}^{1 + α (t) - n}} \frac{d^{n} f (τ)}{d τ^{n}} d τ

(3)

where

n = [α (t)]

is the upper integer bound on α(t) and

Γ (\cdot)

is the gamma function. We emphasize that the dynamics of all the systems considered in this study start from

t = 0

without any prior history. Thus, all functions of interest are identically zero for all instants before

t = 0

.

Suppose that

f (t)

is a smooth convex function with a unique extreme point

\hat{t}

. It is well known that each iteration step of the fractional-order gradient method (FOGM) is formulated as:

t_{k + 1} = t_{k} - ρ {}_{t_{0}}^{C}𝒟_{t_{k}}^{α} f (t)

(4)

where

ρ > 0

is the iteration step size. Note that the FOGM will degenerate to its counterpart integer-order gradient method as

α \to 1

.

It is well known that the fractional-order gradient method cannot converge to the real extreme value. To circumvent this problem, scholars propose the following three improvement versions, listed as follows:

Type-I (variable initial value strategy): Let the initial value

t_{0}

of the fractional order derivative

{}_{t_{0}}^{C}𝒟_{t_{k}}^{α} f (t)

change to

t_{k - 1}

with the iteration; this improved iteration format can be formulated as:

t_{k + 1} = t_{k} - ρ {}_{t_{k - 1}}^{C}𝒟_{t_{k}}^{α} f (t)

(5)

Type-II (truncation strategy): According to [37], the Caputo fractional-order derivative can be also rewritten in another form as a Taylor series:

{}_{t_{0}}^{C}𝒟_{t_{k}}^{α} f (t) = \sum_{i = n}^{\infty} (\begin{matrix} t_{0} - n \\ i - n \end{matrix}) \frac{f^{(i)} (t)}{Γ (i + 1 - α)} {(t - t_{0})}^{i - α}

(6)

Under this Taylor expansion of the scheme, by truncating the higher-order terms and preserving the first-order terms

\frac{1}{Γ (2 - α)} 𝒟 f (t) {(t - t_{0})}^{1 - α}

, the iteration format of this method can be formulated as:

t_{k + 1} = t_{k} - ρ 𝒟 f (t_{k}) {(t_{k} - t_{0})}^{1 - α}

(7)

Type-III (variable fractional-order strategy): By designing different variable fractional-order schemes, let the variable fractional-order derivative

α (t)

converge to

1

when the function

f (t)

converges to the real extreme value

f (\hat{t})

. The iteration format of this method can be formulated as:

t_{k + 1} = t_{k} - ρ {}_{t_{0}}^{C}𝒟_{t_{k}}^{α (t_{k})} f (t)

(8)

with

α (t_{k}) \to 1

as the iteration converges. We have thus provided a summary of the main improvement strategies in the literature.

Remark 1.

It should be noted that both Type-I and Type-III require calculating the fractional derivative at each iteration, which is very time-consuming for the cases where the objective function cannot be expressed explicitly. Whereas, in Type-II, the convergence accuracy of the algorithm is affected due to the truncation of the higher-order terms of the fractional derivatives. In [21], Chen proposed a novel fractional-order gradient method that combines Type-I and Type-II. However, this hybrid method may lead to a numerical oscillator in some cases, such that it cannot always achieve a satisfactory convergence accuracy.

A significant amount of research conducted using numerical simulations has found that the fractional gradient method, compared to the traditional gradient method, has a faster convergence rate advantage. In the subsequent subsection, we will conduct an in-depth theoretical analysis of the convergence rate of the fractional gradient method to verify whether it possesses a greater convergence speed advantage compared to the traditional gradient method.

3. Convergence Analysis of Fractional Gradient Method

In this section, we will theoretically analyze the convergence rate of the fractional-type gradient algorithm. To do this, we introduce the following Lemma:

Lemma 1.

Suppose that

f (.)

is a smooth convex function; then, it must satisfy the following Lipschitz continuity condition

f (y) \leq f (x) + 𝒟 f {(x)}^{T} (y - x) + \frac{M}{2} {∥ y - x ∥}_{2}^{2}

(9)

and strong convex condition

f (y) \geq f (x) + 𝒟 f {(x)}^{T} (y - x) + \frac{m}{2} {∥ y - x ∥}_{2}^{2}

(10)

where M and m are arbitrary constants.

Firstly, we rewrite Type-I, -II and -III as the following unified form:

t_{k + 1} = t_{k} - ρ Φ (t_{k})

(11)

where

Φ (t_{k})

represents the iteration functions such as

{}_{t_{k - 1}}^{C}𝒟_{t_{k}}^{α} f (t)

,

𝒟 f (t_{k}) {(t_{k} - t_{0})}^{1 - α}

and

{}_{t_{0}}^{C}𝒟_{t_{k}}^{α (t_{k})} f (t)

. Obviously,

Φ (t_{k})

is a bounded function. Comparing inequality (9) and algorithm (11), we let

y = t_{k + 1}

,

x = t_{k}

and substitute them into Equation (9). One arrives at:

\begin{matrix} f (t_{k + 1}) - f (t_{k}) \leq - 𝒟 f (t_{k}) ρ Φ (t_{k}) + \frac{M}{2} {∥ ρ Φ (t_{k}) ∥}_{2}^{2} \\ \leq - [ρ μ_{k} - \frac{M}{2} {(ρ μ_{k})}^{2}] {∥ 𝒟 f (t_{k}) ∥}_{2}^{2} \end{matrix}

(12)

with

μ_{k} = Φ (t_{k}) / 𝒟 f (t_{k})

. Note that Equation (12) degenerates to its counterpart integer-order gradient descent method as

μ_{k} \to 1

. For this case, let

g (ρ μ_{k}) = ρ μ_{k} - \frac{M}{2} {(ρ μ_{k})}^{2}

. One can ascertain that

g (ρ μ_{k})

has a minimum value of

g {(ρ μ_{k})}_{m i n} = \frac{1}{2 M}

when

ρ μ_{k} = \frac{1}{M}

. By substituting

g ({(ρ μ_{k})}_{m i n} = \frac{1}{2 M}

into inequality (12), one arrives at

\begin{matrix} f (t_{k + 1}) - f (t_{k}) \leq - \frac{1}{2 M} {| | 𝒟 f (t_{k}) | |}_{2}^{2} \end{matrix}

(13)

On the other hand, we let the right-hand side of Equation (10) be the function

ψ (y)

that takes the extreme value when

y = x - \frac{1}{m} 𝒟 f (x)

. Substituting this into Equation (10), one can yield

\begin{matrix} f (y) \geq f (x) - \frac{1}{2 m} {| | 𝒟 f (x) | |}_{2}^{2} \end{matrix}

(14)

Let

y = \hat{t}

be the optimal point and

x = t_{k + 1}

; Equation (14) becomes:

\begin{matrix} f (\hat{t}) \geq f (t_{k + 1}) - \frac{1}{2 m} {| | 𝒟 f (t_{k + 1}) | |}_{2}^{2} \end{matrix}

(15)

Subtracting optimal value

f (\hat{t})

from both sides of Equation (13) yields

\begin{matrix} f (t_{k + 1}) - f (\hat{t}) \leq f (t_{k}) - f (\hat{t}) - \frac{1}{2 M} {| | 𝒟 f (t_{k}) | |}_{2}^{2} \end{matrix}

(16)

Combining Equations (15) and (16) further yields

∥ \frac{f (t_{k + 1}) - f (\hat{t})}{f (t_{k}) - f (\hat{t})} ∥ \leq ∥ 1 - \frac{m}{M} ∥ < 1

(17)

The difference between

f (t_{k + 1}) - f (\hat{t})

and

f (t_{k}) - f (\hat{t})

is proportional and

f (t_{k + 1})

is closer to the optimal value

f (\hat{t})

than

f (t_{k})

as

∥ 1 - \frac{m}{M} ∥

is always less than 1. Thus, we theoretically prove that the fractional-type gradient method is always convergence.

In addition, Equation (17) also provides an estimate of the maximum convergence rate of the fractional-type gradient method, which is the same as the classical gradient method. That is to say, the advantage of the fractional-type gradient method may not be that it has a faster convergence rate than the classical gradient method. However, a related question arises about the practical feasibility of the classical gradient method to achieve its theoretical maximum convergence rate in real-world problem-solving contexts. This prompts us to rediscover the advantage of the fractional gradient method.

Remark 2.

We further rewrite Equation (11) as:

t_{k + 1} = t_{k} - ρ μ_{k} 𝒟 f (t_{k})

(18)

Herein,

μ_{k} = Φ (t_{k}) / 𝒟 f (t_{k})

. Note that Equation (18), the fractional-type gradient method, adds an auxiliary iteration function

μ_{k}

when compared to the classical gradient method. Note that

μ_{k}

is a variable that changes with the number of iterations k. Hence,

μ_{k}

can be regarded as a regulation parameter to adjust the step size of the fractional-type gradient method at each iteration.

Remark 3.

It is widely acknowledged that the classical gradient method encounters difficulties in achieving a satisfactory trade-off between convergence rate and precision. Nonetheless, the fractional-type gradient method, with its inherent variable step-size characteristic, presents a prospective solution to address this dilemma. In view of this, our objective is to investigate a fresh standpoint regarding the merits of the fractional gradient method in comparison to the classical gradient method.

4. A Novel Variable Step-Size Gradient Method

4.1. Establishment of the Novel Gradient Method

To balance the trade-off between convergence rate and precision, we propose a novel fractional-type gradient method by combining the Type-I, -II and -III. We construct the novel method starting from Type-III:

t_{k + 1} = t_{k} - ρ {}_{t_{0}}^{C}𝒟_{t_{k}}^{α (t_{k})} f (t)

(19)

where

α (t_{k})

is designed as follows:

α (t_{k}) = 1 - u e^{- \frac{v}{t_{k} - t_{k - 1}}}

(20)

Here,

u

and

v

are control parameters that can adjust the convergence rate and range of the algorithm (19). Obviously, if only the proposed algorithm (19) is convergent, the iteration points

t_{k}

and

t_{k - 1}

are close enough that the variable fraction-order

α (t)

tends to 1. Therefore, this scheme can guarantee that it converges to the true extreme point.

As noted above, it is very time-consuming to calculate the fractional derivative at each iteration. To avoid this problem, we further introduce the variable initial value strategy (Type-I) and truncation strategy (Type-II).

Firstly, by replacing the initial value

t_{0}

with variable initial value

t_{k}

, Equation (19) can be rewritten as

t_{k + 2} = t_{k + 1} - ρ {}_{t_{k}}^{C}𝒟_{t_{k + 1}}^{α (t_{k})} f (t)

(21)

where

0 < α (t) < 1

and

ρ > 0

. According to Equation (6), the variable fractional derivative can be expressed as the following equivalent form:

{}_{t_{k}}^{C}𝒟_{t_{k + 1}}^{α (t)} f (t) = \sum_{i = 0}^{\infty} \frac{𝒟^{(i + 1)} f (t_{k})}{Γ (i + 2 - α (t))} {(t_{k + 1} - t_{k})}^{i + 1 - α (t_{k})}

(22)

Considering that the first-order item plays a dominant role, Algorithm (21) can be truncated as

t_{k + 2} = t_{k + 1} - \frac{ρ}{Γ (2 - α (t))} 𝒟 f (t_{k + 1}) {(t_{k + 1} - t_{k})}^{1 - α (t_{k})}

(23)

Since

\frac{ρ}{Γ (2 - α (t))}

can be seen as part of the step size, it can be simplified as

t_{k + 2} = t_{k + 1} - ρ 𝒟 f (t_{k + 1}) {(t_{k + 1} - t_{k})}^{1 - α (t_{k})}

(24)

To force the step size to be positive and avoid singularity, the proposed method can be further modified as:

t_{k + 2} = t_{k + 1} - ρ 𝒟 f (t_{k + 1}) {(| t_{k + 1} - t_{k} | + δ)}^{1 - α (t_{k})}

(25)

here,

δ

is selected as a very small positive constant so that it hardly affects the convergence of the proposed algorithm. Up to now, we have completed a description of the proposed algorithm, which can be called a truncation variable fractional-order gradient method (TVFOGM). The flow chart of the TVFOGM is shown in Algorithm 1.

Algorithm 1: TVFOGM

1: Initialize

ρ, δ, u, v, k_{m a x}

and

t_{0}

.

2: Calculate

t_{1}

by conventional gradient method:

t_{1} = t_{0} - ρ 𝒟 f (t_{0})

3: Calculate the variable fractional order by:

α (t_{k}) = 1 - u e^{- \frac{v}{t_{k} - t_{k - 1}}}

4: Calculate the next iteration point

t_{k} (k \geq 2)

by:

t_{k + 2} = t_{k + 1} - ρ 𝒟 (t_{k + 1}) {(| t_{k + 1} - t_{k} | + δ)}^{1 - α (t_{k})}

5: Check if

t_{k}

meets the accuracy requirements or

k > k_{m a x}

,

if not, go to step 3.

4.2. Convergence Analysis of TVFOGM

In this section, we will theoretically prove that the proposed algorithm can converge to the true extremum point using the reduction to absurdity. Assume that

t_{k}

converges to a different point

\tilde{t} \neq \hat{t}

and

𝒟 f (\tilde{t}) \neq 0

. It is concluded that, for any sufficiently small positive scalar

ε

, there exists a sufficient large number

N \in ℕ

such that:

| t_{k} - \tilde{t} | < ε < | \hat{t} - \tilde{t} |, \forall k > N

Thus

d = \inf_{k > N} | 𝒟 f (t_{k}) | > 0

must hold. According to Inequality (24), one can obtain:

\begin{matrix} | t_{k + 2} - t_{k + 1} | = ρ | 𝒟 f (t_{k + 1}) | {(| t_{k + 1} - t_{k} | + δ)}^{1 - α (t_{k})} \\ \geq ρ d {(| t_{k + 1} - t_{k} | + δ)}^{1 - α (t_{k})} \end{matrix}

where

δ

is a given positive real number. Therefore,

| t_{k + 2} - t_{k + 1} | \geq ρ d {(| t_{k + 1} - t_{k} | + δ)}^{1 - α (t_{k})} \geq ρ d {(| t_{k + 1} - t_{k} |)}^{1 - α (t_{k})}

. Take

ε < \frac{{(ρ d)}^{1 / α (t_{k})}}{2}

; thus,

ρ d > {(2 ε)}^{α (t_{k})}

, and then:

\begin{matrix} | t_{k + 2} - t_{k + 1} | \geq ρ d {| t_{k + 1} - t_{k} |}^{1 - α (t)} \\ > {(2 ε)}^{α (t_{k})} {| t_{k + 1} - t_{k} |}^{1 - α (t_{k})} \end{matrix}

Since

| t_{k + 1} - t_{k} | < | t_{k + 1} - \tilde{t} | + | \tilde{t} - t_{k} | < 2 ε

, then

{(2 ε)}^{α (t_{k})} > {| t_{k + 1} - t_{k} |}^{α (t_{k})}

, it finally yields:

\begin{matrix} | t_{k + 2} - t_{k + 1} | > {(2 ε)}^{α (t_{k})} {| t_{k + 1} - t_{k} |}^{1 - α (t_{k})} \\ > {| t_{k + 1} - t_{k} |}^{α (t_{k})} {| t_{k + 1} - t_{k} |}^{1 - α (t_{k})} \\ > | t_{k + 1} - t_{k} | \end{matrix}

In this way, we obtain the conclusion

| t_{k + 2} - t_{k + 1} | > | t_{k + 1} - t_{k} |

, which contradicts the above premise so that the proposed algorithm is convergent and converges to the only extreme point. In addition, the convergence rate of the proposed method can still be estimated by Equation (17). That is to say, the maximum convergence rate of the proposed method is also the same as the classical gradient method.

Remark 4.

The proposed method provides an additional auxiliary iterative function

μ_{k} = {(| t_{k + 1} - t_{k} | + δ)}^{1 - α (t_{k})}

to adjust the iteration step automatically according to the difference between adjacent iterations

t_{k + 1}

and

t_{k}

. Concretely, it can provide a larger iteration step size as there is a large difference between

t_{k + 1}

and

t_{k}

. As the iteration converges, the value of

t_{k + 1}

and

t_{k}

are very close to each other; then, the step size of the proposed method will reduce to a fixed value

ρ

as the auxiliary iterative function will tend to 1.

Remark 5.

By leveraging this characteristic, the suggested algorithm offers a swift convergence rate during the initial stage of iteration. Additionally, it can attain a high convergence accuracy as the iteration progresses when a suitably small value is assigned to

ρ

. In other words, the proposed method effectively addresses a fundamental limitation of the conventional gradient method, which fails to strike a balance between efficiency and precision. Moreover, the inclusion of a variable fractional-order operator augments the algorithm’s ability to achieve optimal convergence rates throughout the optimization process.

4.3. Incorporating the Armijo Criterion

To further emphasize the advantages of the proposed method, we introduce the Armijo Criterion, which is a mathematical condition used in optimization algorithms to determine an appropriate step size along a search direction. This ensures that each step taken during the optimization process results in a sufficient reduction in the objective function value. Algorithm 2 shows the specific flow of the algorithm.

Algorithm 2: Armijo Criterion

(1): Input of current point $t_{k} .$ Search direction $d_{k} .$ Objective function $f (t) .$ Current gradient $g_{k}$ . Armijo constant $β$ (usually a value between 0 and 1). A positive constant $σ$ (typically between 0 and 0.5).
(2): If the inequality $f (t_{k} + β d_{k}) \leq f (t_{k}) + σ β g_{k}^{T} d_{k}$ is satisfied, let the iteration step size $ρ = β$ and stop algorithm. If not, go to Step 3.
(3): Let $β = β / 2$ ; turn to Step 2.

Remark 6.

By incorporating the Armijo Criterion into the TVFOGM, several significant enhancements can be achieved. Firstly, the convergence speed of the algorithm can be elevated to approach its theoretical maximum, thereby optimizing its efficiency. Additionally, the inclusion of the Armijo Criterion bolsters the stability of the algorithm, ensuring its robust performance across diverse functions. Most importantly, it efficiently fortifies the algorithm’s convergence abilities, enabling it to effectively tackle a wide array of functions.

5. Numerical Experiments

In this section, we validate the convergence performance of the proposed TVFOGM through several numerical experiments. To do this, we compare it with the classical gradient method (GM), the previous three types of FOGM, and FOGM in [21]. The optimal fractional order

α

of previous FOGMs is set as 1.5.

Experiment 1.

Firstly, we consider a simple quadratic function

f (t) = {(t - 5)}^{2}

. Obviously, the extreme point of this function is

\hat{t} = 5

. Parameter

ρ

is the iteration step size and

ε

is the convergence precision. In this case, the additional auxiliary parameter

v

is sete asa fixed value of

v = ρ

, and parameter

u

is set to be −1.5. The termination condition is

| 𝒟 f (t_{k}) | < ε

or the maximal iteration number equals 50,000. At this point, we consider the method to be invalid. Take the initial point

t_{0} =

100; Table 1 shows the convergence number of various algorithms converging to the same precision. Compared with classical GM, the previous three types of FOGM do not improve the convergence ability. However, the FOGM in [21] can reduce the number of convergences, but cannot achieve a higher convergence precision. In this case, the proposed TVFOGM can balance the rapid convergence rate and high convergence precision. To explore this characteristic more clearly, the iteration process of these methods is shown in Figure 1.

Figure 1 shows the convergence rate and precision of the proposed TVFOGM, previous FOGM, and classical GM, respectively. The previous FOGM can converge rapidly to the extreme point, but the convergence precision is limited. For the same

ρ

, the proposed TVFOGM has a larger convergence rate than other methods in the initial stage, and then reduces to the same convergence rate as the classical GM after convergence to a certain precision. This reflects that the proposed method has the characteristic of a variable step size, and the precision is at least two orders of magnitude higher than that of the classical GM. For the selection of parameter

u

, we adopted three different values, and the performance of the TVFOGM did not change greatly. The robustness of the TVFOGM will be further studied in the following calculation.

Figure 2 shows the required number of iteration steps that converge to the same precision (10⁻²) for the classical GM and proposed TVFOGM, respectively. The convergence rate of classical GM is very sensitive to the step size

ρ

because its number of iterations strongly depends on the iteration step size, as shown in Figure 1A. However, the number of iteration steps for the proposed TVFOGM almost remains at the same small magnitude as step size

ρ

varies. This means that the proposed TVFOGM is more robust and efficient.

Figure 3 further shows the effect of the

u

on the convergence of the proposed TVFOGM. Subjecting this to the different ρ values shows that there seems to be a minimum number of iterations for the proposed TVFOGM, e.g., about

u = - 1.5

. The difference between the minimum values of the four cases is very small. In addition, the influence of auxiliary parameter

u

on the number of iterations of the TVFOGM is almost negligible. This indicates that the selection of parameter

u

has a larger degree of freedom.

Through the above discussions, we verified that the proposed method can achieve both a high convergence rate and precision. In addition, the method is not sensitive to the step size

ρ

, so it is recommended to choose a smaller step size

ρ

to guarantee higher convergence precision.

Experiment 2.

Here, we further consider a more complex objective function, given as

f (t) = \frac{1}{t} + l n t

with extreme point is

\hat{t} = 1

. The characteristic of this objective function is that the gradient of the function is very small when t is given a large value. Therefore, if the initial value of the iteration is given as a large value, the iteration rate will be very slow in the initial stage.

In order to improve the convergence rate, a larger iteration step size

ρ

should be selected for the classical gradient method. As shown in Figure 4A, when a larger iteration step size

ρ =

2 (green line) is chosen, although there is a rapid convergence rate in the initial stage, the final convergence precision is very low. When choosing a smaller iteration step size, the classical GM can achieve a rapid convergence precision size, but its convergence rate will be reduced. However, the proposed TVFOGM can achieve both rapid convergence rate and high convergence precision due to it essentially being a variable step-size method, as shown by the black line in Figure 4A. We also discuss the selection of auxiliary parameters

v

and

u

. As shown in Figure 4B, the two additional auxiliary parameters have little effect on the convergence rate of the proposed method so that the parameters can be selected to be approximately the same as the previous Simulation I, such as

v = 1

and

u =

−1.5. Given the same

ρ

and number of iterations, one can observe that the proposed TVFOGM can quickly converge to the extreme point, whereas the classical GM and FOGM only iterate forward a short distance in Figure 5.

Experiment 3.

Here, we consider a two-dimensional objective function

f (x, y) = 2 {(x - 1)}^{2} + {(y - 5)}^{2}

. Obviously, the target point here is at point

(1, 5)

. In this simulation, the iteration start point of all algorithms is set to the same point

(20, 20)

, the iteration step size

ρ

is set to a small value given as

0.01

, the parameter

v

is set to be equal to

ρ

, the parameter

u

is also selected as

u = - 1.5

and the number of times that the iteration is terminated is set to 20. Figure 6 shows the iteration processes of GM, FOGM, and TVFOGM. It can be seen that the proposed TVFOGM can also rapidly converge to the target point. More importantly, it can also reach a high convergence precision, as listed in Table 2.

Experiment 4.

As a classical technique for chaos suppression, the time-delayed feedback controlling strategy has been widely developed by stabilizing unstable periodic orbits (UPOs) embedded in chaotic systems. A critical issue for achieving a high level of controlling precision is to search for an appropriate time delay. This section utilizes the proposed TVFOGM to determine the optimal time delay in the delayed feedback controller. The time delay is adjusted within the iterative scheme provided by the proposed method, and finally converges to the period of the target UPO. Also consider the paradigmatic Rössler system subjected to time-delayed feedback.

\begin{array}{l} \dot{x} = - y - z \\ \dot{y} = x + a y + u (t) \\ \dot{z} = b + z (x - c) \end{array}

(26)

Given the initial value

x (0) = 1

,

y (0) = 2

,

z (0) = 1

, and parameters

a = 0.2

,

b = 0.2

,

c = 5.7

, the uncontrolled system (i.e., u(t) = 0) will exhibit chaotic behavior, as shown in Figure 7A. Take the control function

u (t) = η [y (t - τ) - y (t)]

. The results shown in Figure 7B,C are the periodic solutions of the controlled system with the time delay

τ

as either 5 or 6, both with the feedback gain of

η = 0.2

.

It should be pointed out that the attained periodic solutions are not necessarily the UPO embedded in the uncontrolled system. Next, we will use the TVFOGM to obtain the optimal stable solution

τ

of the system. Take the target function (27) for the system:

ϕ_{T} (τ) = \sum_{s = 0}^{S} {[e (t_{s})]}^{2} = \sum_{s = 0}^{S} {[y (t_{s} - τ) - y (t_{s})]}^{2}

(27)

where

t_{s}

s are selected as timepoints that are uniformly distributed over a fixed interval [0, T]. By applying the proposed method to the above objective function, one can obtain:

τ_{k + 2} = τ_{k + 1} - ρ {(| τ_{k + 1} - τ_{k} | + δ)}^{1 - α (τ_{k})} \frac{\partial ϕ_{T} (τ_{k + 1})}{\partial τ}

where the gradient of

ϕ_{T} (τ)

with respect to

τ

is obtained using numerical difference such as

\frac{\partial ϕ_{T} (τ_{k + 1})}{\partial τ} = \frac{ϕ_{T} (τ_{k + 1} + Δ τ) - ϕ_{T} (τ_{k + 1})}{Δ τ}

with

Δ τ

as a small quantity. In this case, we need to adjust the parameter

v

slightly to ensure the convergence of the proposed algorithm as the objective function (27) is not a strictly convex function. For instance, the parameter selection of

v

is set to

v = 0.1

for the identification of period-1 and period-2 orbits, while for period-3 orbits, it will also be selected as

v = ρ

. The selection of other parameters is the same as the above numerical simulation.

As shown in Figure 8, the proposed method can also quickly identify the periods of unstable period-1, period-2, and period-3 orbits embedded in a chaotic system (26), while the classical GM and FOGM are very inefficient. This shows that the proposed TVFOGM has better adaptability than the classical GM and FOGM. By using the obtained optimal time delays, we successfully achieved the stabilization of period-1, period-2, and period-3 unstable periodic orbits, as shown in Figure 9B–D. Figure 10 shows the corresponding control errors, which verify the precision of the obtained optimal time delays. Table 3 and Table 4 compare the number of iteration steps and iteration time of the three gradient methods, from which it can be seen that the proposed method has a higher convergence efficiency.

Through this example, the proposed method is also shown to efficiently handle non-convex optimization problems. This is because, in this example, only one parameter

v

needs to be adjusted, and the selection of other parameters remains unchanged from the previous example.

Experiment 5.

In the preceding four simulations, we examined the performance of the algorithms under a fixed iteration step size. The outcomes indicate that through the regulation of this fixed step size, TVFOGM not only enhances convergence accuracy but also significantly improves convergence speed, rendering it an effective optimization technique. In Section 3, we theoretically prove that the theoretical maximum convergence rate of the classical gradient method is equivalent to that of the fractional-type methods. To evaluate the performance of these methods under optimal step-size conditions, we introduce the Armijo criterion shown in Section 4.3 in this simulation.

The Armijo criterion ensures that each iteration step of the algorithm converges toward the current optimal choice. Furthermore, we will employ 10 benchmark test functions [41,42] which are characterized as having a valley shape, bowl shape, plate shape and steep ridges to facilitate a comprehensive comparative analysis of the performance of these methods. The characteristics of these 10 benchmark test functions are documented in Table 5.

The convergence performance of the three methods is shown in Table 6. Iteration number

k

indicates the number of iterations used by the method to converge to the extreme point

\hat{t}

, and its limit is set at 20,000. When the number of iterations

k

of the method exceeds 20,000, we consider this method to be invalid for this problem.

t_{0}

is the initial point of the method iteration,

t_{k}

is the final convergence point, and

f (t_{k})

is the final convergence function value. According to [25] the parameter α of FOGM is selected as α = 0.8 and α = 1.2. The parameter

u

of TVFOGM is also selected as

u = - 1.5

. As the iteration step size

ρ

of each iteration is constantly changing, the parameter

v

is set as 1 for convenience.

As depicted in Table 5, the incorporation of the Armijo criterion leads to significant results for the three methods, allowing them to achieve the fastest convergence rates in certain cases, such as functions 4, 7, and 9. However, when testing functions 2, 3, and 5, it becomes evident that the FOGM lags behind GM in terms of performance, whereas TVFOGM exhibits a slight advantage over GM. On the other hand, when testing functions 1, 6, and 10, it becomes apparent that the convergence ability of GM substantially declines. In contrast, the TVFOGM method demonstrates the ability to simultaneously achieve rapid convergence and increased convergence accuracy in these specific scenarios. While TVFOGM effectively addresses the limitation that classical GM faces in achieving both convergence speed and accuracy, an observation can be made regarding function 8. In scenarios characterized by multiple extreme points, it is clear that gradient-based methods struggle to converge to the global optimal point, resulting in random final convergence points. Therefore, our future efforts will focus on devising strategies to overcome this challenge and ensure that the algorithm converges to the global optimal point, which represents a critical area for future improvements.

6. Conclusions

In this paper, we conducted an analysis of existing fractional-type gradient methods in three categories. Through convergence analysis, we discovered that the maximum convergence rate of fractional-type gradient methods is the same as that of classical gradient methods. This indicates that the advantage of fractional gradients may not solely lie in their faster convergence rates compared to classical gradient methods. This realization motivated us to reexamine the benefits of fractional gradient methods.

Through theoretical analysis, it is evident that fractional-type gradient methods are essentially variable-step-size methods, offering an effective means to dynamically adjust the step size during optimization. Based on this insight, we propose a novel variable fractional gradient method that focuses on automatic step size adjustment. By doing so, this method overcomes the limitations of classical gradient methods in balancing rapid convergence rates with high convergence precision.

Theoretical analysis demonstrates the convergence of the proposed variable fractional gradient method. Numerical simulations are provided to illustrate the effectiveness of the proposed approach. It is evident that the method achieves rapid and accurate convergence to the extreme point. Moreover, the numerical analysis reveals that the proposed method exhibits insensitivity to additional auxiliary parameters. Thus, for most optimization problems, the auxiliary parameters u can be chosen as −1.5, while the selection of v should be appropriately adjusted based on the specific optimization problem. It is worth emphasizing that selecting a small value for the parameter ρ is recommended to achieve a high convergence precision.

Additionally, we conducted tests on the three methods using 10 benchmark test functions with the inclusion of the Armijo criterion. The findings demonstrate that, despite having the same theoretical maximum convergence speed, TVFOGM consistently displays superior convergence stability and performance when applied to real-world problems.

A critical focus for future enhancements is to ensure the method’s convergence towards the global optimal point in scenarios involving multiple extreme points. Additionally, it is essential to delve into the challenging task of stability analysis for the proposed TVFOGM and explore its application to more intricate optimization problems like neural networks and parameter identification. These advancements would broaden the method’s scope and effectiveness in practical applications. As a result, our proposed TVFOGM should be further extended to RNN-based and CNN-based networks.

Author Contributions

Conceptualization, Q.L.; Methodology, L.Y.; Investigation, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Science Foundation of China grant number 11702333, Shenzhen Science and Technology Program (202206193000001, 20220817204433001) and National Science Foundation of China (12172387).

Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

Jiménez, S.; Usero, D.; Vázquez, L.; Velasco, M.P. Fractional Diffusion Models for the Atmosphere of Mars. Fractal Fract. 2018, 2, 1. [Google Scholar] [CrossRef]
Yin, C.; Chen, Y.; Zhong, S. Fractional-Order Sliding Mode Based Extremum Seeking Control of a Class of Nonlinear Systems. Automatica 2014, 50, 3173–3181. [Google Scholar] [CrossRef]
Dinh, T.N.; Kamal, S.; Pandey, R.K. Fractional-Order System: Control Theory and Applications. Fractal Fract. 2023, 7, 48. [Google Scholar] [CrossRef]
Rossikhin, Y.A.; Shitikova, M.V. Application of Fractional Calculus for Dynamic Problems of Solid Mechanics: Novel Trends and Recent Results. Appl. Mech. Rev. 2010, 63, 010801. [Google Scholar] [CrossRef]
Suzuki, J.L.; Naghibolhosseini, M.; Zayernouri, M. A General Return-Mapping Framework for Fractional Visco-Elasto-Plasticity. Fractal Fract. 2022, 6, 715. [Google Scholar] [CrossRef] [PubMed]
Freed, A.D.; Diethelm, K. Fractional Calculus in Biomechanics: A 3D Viscoelastic Model Using Regularized Fractional Derivative Kernels with Application to the Human Calcaneal Fat Pad. Biomech. Model. Mechanobiol. 2006, 5, 203–215. [Google Scholar] [CrossRef]
Malik, M.; Vijayakumar, V.; Shukla, A. Controllability of Discrete-Time Semilinear Riemann–Liouville-like Fractional Equations. Chaos Solitons Fractals 2023, 175, 113959. [Google Scholar] [CrossRef]
Vadivoo, B.S.; Jothilakshmi, G.; Almalki, Y.; Debbouche, A.; Lavanya, M. Relative Controllability Analysis of Fractional Order Differential Equations with Multiple Time Delays. Appl. Math. Comput. 2022, 428, 127192. [Google Scholar] [CrossRef]
Dhayal, R.; Gómez-Aguilar, J.F.; Torres-Jiménez, J. Stability Analysis of Atangana–Baleanu Fractional Stochastic Differential Systems with Impulses. Int. J. Syst. Sci. 2022, 53, 3481–3495. [Google Scholar] [CrossRef]
Ingman, D.; Suzdalnitsky, J.; Zeifman, M. Constitutive Dynamic-Order Model for Nonlinear Contact Phenomena. J. Appl. Mech. 2000, 67, 383–390. [Google Scholar] [CrossRef]
Ramirez, L.E.S.; Coimbra, C.F.M. A Variable Order Constitutive Relation for Viscoelasticity. Ann. Phys. 2007, 519, 543–552. [Google Scholar] [CrossRef]
Patnaik, S.; Semperlotti, F. Variable-Order Fracture Mechanics and Its Application to Dynamic Fracture. NPJ Comput. Mater. 2021, 7, 27. [Google Scholar] [CrossRef]
Diaz, G.; Coimbra, C.F.M. Nonlinear Dynamics and Control of a Variable Order Oscillator with Application to the van Der Pol Equation. Nonlinear Dyn. 2009, 56, 145–157. [Google Scholar] [CrossRef]
Lorenzo, C.F.; Hartley, T.T. Variable Order and Distributed Order Fractional Operators. Nonlinear Dyn. 2002, 29, 57–98. [Google Scholar] [CrossRef]
Lin, J.-Y.; Liao, C.-W. New IIR Filter-Based Adaptive Algorithm in Active Noise Control Applications: Commutation Error-Introduced LMS Algorithm and Associated Convergence Assessment by a Deterministic Approach. Automatica 2008, 44, 2916–2922. [Google Scholar] [CrossRef]
Balla-Arabe, S.; Gao, X.; Wang, B. A Fast and Robust Level Set Method for Image Segmentation Using Fuzzy Clustering and Lattice Boltzmann Method. IEEE Trans. Cybern. 2013, 43, 910–920. [Google Scholar] [CrossRef]
Angulo, M.T. Nonlinear Extremum Seeking Inspired on Second Order Sliding Modes. Automatica 2015, 57, 51–55. [Google Scholar] [CrossRef]
Tan, Y.; He, Z.; Tian, B. A Novel Generalization of Modified LMS Algorithm to Fractional Order. IEEE Signal Process. Lett. 2015, 22, 1244–1248. [Google Scholar] [CrossRef]
Pu, Y.-F.; Zhou, J.-L.; Zhang, Y.; Zhang, N.; Huang, G.; Siarry, P. Fractional Extreme Value Adaptive Training Method: Fractional Steepest Descent Approach. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 653–662. [Google Scholar] [CrossRef]
Cheng, S.; Wei, Y.; Chen, Y.; Li, Y.; Wang, Y. An Innovative Fractional Order LMS Based on Variable Initial Value and Gradient Order. Signal Process. 2017, 133, 260–269. [Google Scholar] [CrossRef]
Chen, Y.; Gao, Q.; Wei, Y.; Wang, Y. Study on Fractional Order Gradient Methods. Appl. Math. Comput. 2017, 314, 310–321. [Google Scholar] [CrossRef]
Chaudhary, N.I.; Raja, M.A.Z.; He, Y.; Khan, Z.A.; Tenreiro Machado, J.A. Design of Multi Innovation Fractional LMS Algorithm for Parameter Estimation of Input Nonlinear Control Autoregressive Systems. Appl. Math. Model. 2021, 93, 412–425. [Google Scholar] [CrossRef]
Chaudhary, N.I.; Raja, M.A.Z.; Khan, Z.A.; Mehmood, A.; Shah, S.M. Design of Fractional Hierarchical Gradient Descent Algorithm for Parameter Estimation of Nonlinear Control Autoregressive Systems. Chaos Solitons Fractals 2022, 157, 111913. [Google Scholar] [CrossRef]
Cheng, S.; Wei, Y.; Sheng, D.; Wang, Y. Identification for Hammerstein Nonlinear Systems Based on Universal Spline Fractional Order LMS Algorithm. Commun. Nonlinear Sci. Numer. Simul. 2019, 79, 104901. [Google Scholar] [CrossRef]
Liu, J.; Zhai, R.; Liu, Y.; Li, W.; Wang, B.; Huang, L. A Quasi Fractional Order Gradient Descent Method with Adaptive Stepsize and Its Application in System Identification. Appl. Math. Comput. 2021, 393, 125797. [Google Scholar] [CrossRef]
Cao, Y.; Su, S. Fractional Gradient Descent Algorithms for Systems with Outliers: A Matrix Fractional Derivative or a Scalar Fractional Derivative. Chaos Solitons Fractals 2023, 174, 113881. [Google Scholar] [CrossRef]
Chen, Y.; Wei, Y.; Liang, S.; Wang, Y. Indirect Model Reference Adaptive Control for a Class of Fractional Order Systems. Commun. Nonlinear Sci. Numer. Simul. 2016, 39, 458–471. [Google Scholar] [CrossRef]
Iqbal, F.; Tufail, M.; Ahmed, S.; Akhtar, M.T. A Fractional Taylor Series-Based Least Mean Square Algorithm, and Its Application to Power Signal Estimation. Signal Process. 2022, 193, 108405. [Google Scholar] [CrossRef]
Yin, W.; Wei, Y.; Liu, T.; Wang, Y. A Novel Orthogonalized Fractional Order Filtered-x Normalized Least Mean Squares Algorithm for Feedforward Vibration Rejection. Mech. Syst. Signal Process. 2019, 119, 138–154. [Google Scholar] [CrossRef]
Zubair, S.; Chaudhary, N.I.; Khan, Z.A.; Wang, W. Momentum Fractional LMS for Power Signal Parameter Estimation. Signal Process. 2018, 142, 441–449. [Google Scholar] [CrossRef]
Zhang, X.; Ding, F. Optimal Adaptive Filtering Algorithm by Using the Fractional-Order Derivative. IEEE Signal Process. Lett. 2022, 29, 399–403. [Google Scholar] [CrossRef]
Zhang, H.; Mo, L. A Novel LMS Algorithm with Double Fractional Order. Circuits Syst. Signal Process. 2023, 42, 1236–1260. [Google Scholar] [CrossRef]
Cheng, S.; Wei, Y.; Chen, Y.; Liang, S.; Wang, Y. A Universal Modified LMS Algorithm with Iteration Order Hybrid Switching. ISA Trans. 2017, 67, 67–75. [Google Scholar] [CrossRef] [PubMed]
Wang, Y.; He, Y.; Zhu, Z. Study on Fast Speed Fractional Order Gradient Descent Method and Its Application in Neural Networks. Neurocomputing 2022, 489, 366–376. [Google Scholar] [CrossRef]
Zhang, H.; Pu, Y.-F.; Xie, X.; Zhang, B.; Wang, J.; Huang, T. A Global Neural Network Learning Machine: Coupled Integer and Fractional Calculus Operator with an Adaptive Learning Scheme. Neural Netw. 2021, 143, 386–399. [Google Scholar] [CrossRef]
Han, X.; Dong, J. Applications of Fractional Gradient Descent Method with Adaptive Momentum in BP Neural Networks. Appl. Math. Comput. 2023, 448, 127944. [Google Scholar] [CrossRef]
Chen, M.-R.; Chen, B.-P.; Zeng, G.-Q.; Lu, K.-D.; Chu, P. An Adaptive Fractional-Order BP Neural Network Based on Extremal Optimization for Handwritten Digits Recognition. Neurocomputing 2020, 391, 260–272. [Google Scholar] [CrossRef]
Khan, S.; Naseem, I.; Malik, M.A.; Togneri, R.; Bennamoun, M. A Fractional Gradient Descent-Based RBF Neural Network. Circuits Syst. Signal Process. 2018, 37, 5311–5332. [Google Scholar] [CrossRef]
Khan, S.; Ahmad, J.; Naseem, I.; Moinuddin, M. A Novel Fractional Gradient-Based Learning Algorithm for Recurrent Neural Networks. Circuits Syst. Signal Process. 2018, 37, 593–612. [Google Scholar] [CrossRef]
Wei, Y.; Kang, Y.; Yin, W.; Wang, Y. Generalization of the Gradient Method with Fractional Order Gradient Direction. J. Frankl. Inst. 2020, 357, 2514–2532. [Google Scholar] [CrossRef]
Jamil, M.; Yang, X.-S. A Literature Survey of Benchmark Functions for Global Optimisation Problems. Int. J. Math. Model. Numer. Optim. 2013, 4, 150–194. [Google Scholar] [CrossRef]
Yao, X.; Liu, Y.; Lin, G. Evolutionary Programming Made Faster. IEEE Trans. Evol. Comput. 1999, 3, 82–102. [Google Scholar] [CrossRef]

$Fractalfract 07 00789 g001$

Figure 1. The convergence precision and rate of the integer-order GM, previous FOGM, and proposed TVFOGM, respectively.

$Fractalfract 07 00789 g001$

$Fractalfract 07 00789 g002$

Figure 2. The number of iterations of GM and TVFOGM for the different step sizes.

$Fractalfract 07 00789 g002$

$Fractalfract 07 00789 g003$

Figure 3. The number of iterations of TVFOGM under different parameter

u

values.

Figure 3. The number of iterations of TVFOGM under different parameter

u

values.

$Fractalfract 07 00789 g003$

$Fractalfract 07 00789 g004$

Figure 4. (A): The iterative process of the integer-order GM and TVFOGM (with u = −1.5 and v = 1), respectively. (B): The iterative number of the proposed TVFOGM for different parameters

v

. (C): The iterative number of the proposed TVFOGM for different parameters

u

.

Figure 4. (A): The iterative process of the integer-order GM and TVFOGM (with u = −1.5 and v = 1), respectively. (B): The iterative number of the proposed TVFOGM for different parameters

v

. (C): The iterative number of the proposed TVFOGM for different parameters

u

.

$Fractalfract 07 00789 g004$

$Fractalfract 07 00789 g005$

Figure 5. The iterative process of the integer-order GM, FOGM, and TVFOGM, respectively.

$Fractalfract 07 00789 g005$

$Fractalfract 07 00789 g006$

Figure 6. The iterative process of the integer-order GM, FOGM, and TVFOGM respectively.

$Fractalfract 07 00789 g006$

$Fractalfract 07 00789 g007$

Figure 7. (A) Chaotic response of the uncontrolled system, and the stabilized period solutions of the controlled system with k = 0.2, (B)

τ =

5, and (C)

τ =

6.

Figure 7. (A) Chaotic response of the uncontrolled system, and the stabilized period solutions of the controlled system with k = 0.2, (B)

τ =

5, and (C)

τ =

6.

$Fractalfract 07 00789 g007$

$Fractalfract 07 00789 g008$

Figure 8. The iterative process of the integer-order GM, FOGM, and TVFOGM, respectively.

$Fractalfract 07 00789 g008$

$Fractalfract 07 00789 g009$

Figure 9. Chaotic (A), period-1 (B), period-2 (C), and period-3 (D) responses of the Rossler system, stabilized by the presented controllers. The parameters are selected as (B)

τ_{o p} =

5.890; (C)

τ_{o p} =

11.845; and (D)

τ_{o p} =

17.581 all with k = 0.2.

Figure 9. Chaotic (A), period-1 (B), period-2 (C), and period-3 (D) responses of the Rossler system, stabilized by the presented controllers. The parameters are selected as (B)

τ_{o p} =

5.890; (C)

τ_{o p} =

11.845; and (D)

τ_{o p} =

17.581 all with k = 0.2.

$Fractalfract 07 00789 g009$

$Fractalfract 07 00789 g010$

Figure 10. Controlling errors obtained by the proposed TVFOGM for period-1, period-2, and period-3 orbits, respectively.

$Fractalfract 07 00789 g010$

Table 1. Convergence numbers of each method with different precisions

ε

and iteration step sizes

ρ

.

Table 1. Convergence numbers of each method with different precisions

ε

and iteration step sizes

ρ

.

Iteration Step Size	Convergence Precision	GM	FOGM-Type-1	FOGM-Type-2	FOGM-Type-3	FOGM [21]	TVFOGM
$ρ =$	$ε = 10^{- 3}$	45	50,000	443	50,000	36	21
$0.1$	$ε = 10^{- 5}$	65	50,000	665	50,000	50,000	41
	$ε = 10^{- 7}$	86	50,000	887	50,000	50,000	62
$ρ =$	$ε = 10^{- 3}$	488	523	4497	50,000	178	30
$0.01$	$ε = 10^{- 5}$	716	50,000	6739	50,000	183	221
	$ε = 10^{- 7}$	944	50,000	8981	50,000	50,000	449

Table 2. Convergence errors of GM, FOGM, and TVFOGM at iteration 20, respectively.

GM	FOGM	TVFOGM
257.4713	166.2380	0.0047

Table 3. Convergence series for the time delay provided by TVFOGM, GM, and FOGM.

Methods:	TVFOGM	GM	FOGM
1 iteration	8.000	8.000	8.000
2 iterations	7.868	7.868	7.868
3 iterations	7.330	7.738	7.511
4 iterations	6.994	7.573	7.251
5 iterations	6.521	7.399	6.944
6 iterations	6.277	7.229	6.704
14 iterations	5.890	6.282	5.890
15 iterations	5.890	6.213	5.885
36 iterations		5.890	5.895
37 iterations		5.890	5.883
38 iterations			5.895

Table 4. Convergence time of TVFOGM, GM, and FOGM, respectively.

TVFOGM	GM	FOGM
9.40 s	22.54 s	NAN

Table 5. The 10 benchmark functions and their features.

No.	Function Name	Expression	Feature	$Extreme Point \hat{t}$	$Min f (\hat{t})$
1	Sphere Function	$f (t) = \sum_{i = 1}^{3} t_{i}^{2}$	Bowl-shaped	(0, 0, 0)	0
2	Schwefel’s Problem 2.22	$f (t) = \sum_{i = 1}^{3} \| t_{i} \| + \prod_{i = 1}^{3} \| t_{i} \|$	Smooth, multi-peaked	(0, 0, 0)	0
3	Schwefel’s Problem 1.2	$f (t) = \sum_{i = 1}^{3} {(\sum_{j = 1}^{i} t_{j})}^{2}$	Bowl-shaped	(0, 0, 0)	0
4	Schwefel’s Problem 2.21	$f (t) = \underset{i}{m a x} {\| t_{i} \|, 1 \leq i \leq 3}$	Valley-shaped	(0, 0, 0)	0
5	Rosenbrock’s Function	$f (t) = \sum_{i = 1}^{3} [100 {(t_{i + 1} - t_{i}^{2})}^{2} + {(t_{i} - 1)}^{2}]$	Valley-shaped	(1, 1, 1)	0
6	Sphere Function	$f (t) = \sum_{i = 1}^{3} {(\| t_{i} + 2 \|)}^{2}$	Bowl-shaped	(−2, −2, −2)	0
7	Booth	$f (t) = {(t_{1} - 2 t_{2} - 7)}^{2} + {(2 t_{1} + t_{2} - 5)}^{2}$	Plate-shaped,	(3.4, −1.8)	0
8	Goldstein-Price Function	$f (t)$	several local minima	(0, −1)	3
9	Hartman’s Function	$f (t) = - \sum_{i = 1}^{4} c_{i} \exp [- \sum_{j = 1}^{3} a_{i j} {(t_{j} - p_{i j})}^{2}]$	Steep ridges/drops	(0.114, 0.556, 0.852)	−3.8628
10	McCormick Function	$f (t) = \sin (t_{1} + t_{2}) + {(t_{1} - t_{2})}^{2} - 1.5 t_{1} + 2.5 t_{2} + 1$	Plate-shaped	(−0.547, −1.547)	−1.9132

Notes: 1. The expression of function 8 is

\begin{matrix} f (t) = [1 + {(t_{1} + t_{2} + 1)}^{2} (19 - 14 t_{1} + 3 t_{1}^{2} - 14 t_{2} + 6 t_{1} t_{2} + 3 t_{2}^{2})] \\ \times [30 + {(2 t_{1} - 3 t_{2})}^{2} (18 - 32 t_{1} + 12 t_{1}^{2} + 48 t_{2} - 36 t_{1} t_{2} + 27 t_{2}^{2})] \end{matrix}

. 2. In the expression of function 9,

c_{i} = (1, 1.2, 3, 3.2)

,

a_{i j} = (3, 10, 30; 0.1, 10, 35; 3, 10, 30, 0.1, 10, 35)

,

p_{i j} = (0.3689, 0.1170, 0.2673; 0.4699, 0.4387, 0.7470; 0.1091, 0.8732, 0.5547; 0.03815, 0.5743, 0.8828)

.

Table 6. Numerical results of TVFOGM on 10 benchmark functions.

No.	Initial Point	Method	Iteration Number	$t_{k}$	$f (t_{k})$
1	(2, 2, 2)	GM	20,000	(0, 0, 0)	1.2741 × 10⁻⁵
		FOGM (α = 0.8)	8	(0, 0, 0)	3.0123 × 10⁻⁸
		FOGM (α = 1.2)	7	(0, 0, 0)	2.0317 × 10⁻⁷
		TVFOGM	6	(0, 0, 0)	1.6982 × 10⁻¹⁵
2	(2, 2, 2)	GM	27	(0, 0, 0)	5.1533 × 10⁻¹⁷
		FOGM (α = 0.8)	60	(0, 0, 0)	7.4117 × 10⁻¹⁷
		FOGM (α = 1.2)	45	(0, 0, 0)	1.5587 × 10⁻¹⁶
		TVFOGM	22	(0, 0, 0)	6.9001 × 10⁻¹⁷
3	(1, 1, 1)	GM	36	(0, 0, 0)	2.3833 × 10⁻⁷
		FOGM (α = 0.8)	44	(0, 0, 0)	7.3228 × 10⁻⁷
		FOGM (α = 1.2)	45	(0, 0, 0)	1.8001 × 10⁻⁹
		TVFOGM	31	(0, 0, 0)	2.8866 × 10⁻⁷
4	(10, 10, 10)	GM	43	(0, 0, 0)	1.2761 × 10⁻¹⁷
		FOGM (α = 0.8)	46	(0, 0, 0)	2.9384 × 10⁻¹⁷
		FOGM (α = 1.2)	75	(0, 0, 0)	4.4939 × 10⁻¹⁷
		TVFOGM	40	(0, 0, 0)	5.2108 × 10⁻¹⁷
5	(−1, 5, −2)	GM	421	(1, 1, 1)	8.7176 × 10⁻⁷
		FOGM (α = 0.8)	668	(1, 1, 1)	7.8917 × 10⁻⁷
		FOGM (α = 1.2)	609	(1, 1, 1)	2.3693 × 10⁻⁹
		TVFOGM	419	(1, 1, 1)	9.3046 × 10⁻⁷
6	(10, 5, 1)	GM	3465	(−2, −2, −2)	2.5029 × 10⁻⁷
		FOGM (α = 0.8)	15	(−2, −2, −2)	3.1444 × 10⁻⁸
		FOGM (α = 1.2)	11	(−2, −2, −2)	1.8627 × 10⁻⁹
		TVFOGM	6	(−2, −2, −2)	1.8650 × 10⁻⁹
7	(2, 2)	GM	8	(3.4, −1.8)	1.6006 × 10⁻⁸
		FOGM (α = 0.8)	12	(3.4, −1.8)	6.9082 × 10⁻¹⁰
		FOGM (α = 1.2)	10	(3.4, −1.8)	1.2007 × 10⁻⁸
		TVFOGM	9	(3.4, −1.8)	1.8144 × 10⁻⁸
8	(5, 10)	GM	28	(−0.6, −0.4)	30.0000
		FOGM (α = 0.8)	20,000	(1.8, 0.2)	84.0104
		FOGM (α = 1.2)	20,000	(1.8, 0.2)	84.0003
		TVFOGM	118	(0, −1)	3.0000
9	(0.5, 0.5, 0.5)	GM	320	(0.114, 0.556, 0.852)	−3.8628
		FOGM (α = 0.8)	303	(0.114, 0.556, 0.852)	−3.8628
		FOGM (α = 1.2)	230	(0.114, 0.556, 0.852)	−3.8628
		TVFOGM	285	(0.114, 0.556, 0.852)	−3.8628
10	(0.1, 0.3)	GM	22	(−0.547, −1.547)	−1.9132
		FOGM (α = 0.8)	12	(−0.547, −1.547)	−1.9132
		FOGM (α = 1.2)	13	(−0.548, −1.548)	−1.9132
		TVFOGM	9	(−0.547, −1.547)	−1.9132

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ye, L.; Chen, Y.; Liu, Q. Development of an Efficient Variable Step-Size Gradient Method Utilizing Variable Fractional Derivatives. Fractal Fract. 2023, 7, 789. https://doi.org/10.3390/fractalfract7110789

AMA Style

Ye L, Chen Y, Liu Q. Development of an Efficient Variable Step-Size Gradient Method Utilizing Variable Fractional Derivatives. Fractal and Fractional. 2023; 7(11):789. https://doi.org/10.3390/fractalfract7110789

Chicago/Turabian Style

Ye, Luotang, Yanmao Chen, and Qixian Liu. 2023. "Development of an Efficient Variable Step-Size Gradient Method Utilizing Variable Fractional Derivatives" Fractal and Fractional 7, no. 11: 789. https://doi.org/10.3390/fractalfract7110789

Article Menu

Development of an Efficient Variable Step-Size Gradient Method Utilizing Variable Fractional Derivatives

Abstract

1. Introduction

2. Classification of Fractional Gradient Methods

3. Convergence Analysis of Fractional Gradient Method

4. A Novel Variable Step-Size Gradient Method

4.1. Establishment of the Novel Gradient Method

4.2. Convergence Analysis of TVFOGM

4.3. Incorporating the Armijo Criterion

5. Numerical Experiments

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI