Article

Solving the Adaptive Cubic Regularization Sub-Problem Using the Lanczos Method

School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou 510520, China
* Author to whom correspondence should be addressed.
Symmetry 2022, 14(10), 2191; https://doi.org/10.3390/sym14102191
Submission received: 14 September 2022 / Revised: 4 October 2022 / Accepted: 9 October 2022 / Published: 18 October 2022
(This article belongs to the Special Issue Tensors and Matrices in Symmetry with Applications)

Abstract

The adaptive cubic regularization method solves an unconstrained optimization model by using a third-order regularization term to approximate the objective function at each iteration. As in the trust-region method, the computation of the sub-problem strongly affects the overall efficiency. The Lanczos method is a useful tool for simplifying the objective function of the sub-problem. In this paper, we implement the adaptive cubic regularization method with the aid of the Lanczos method and analyze the error of the Lanczos approximation. We show that both the error between the Lanczos objective function and the original cubic term and the error between the solution of the Lanczos approximation and the solution of the original cubic sub-problem are bounded in terms of the condition number of the optimal Hessian matrix. Furthermore, we compare the numerical performance of the adaptive cubic regularization algorithm with and without the Lanczos approximation on unconstrained optimization problems. Numerical experiments show that the Lanczos method improves the computational efficiency of the adaptive cubic method remarkably.

1. Introduction

For the unconstrained optimization problem
$$\min_{x \in \mathbb{R}^n} f(x),$$
Cartis et al. [1] proposed an adaptive cubic regularization (ACR) algorithm. It is an alternative to classical globalization techniques: it uses a cubic over-estimator of the objective function as a regularization technique and replaces the Lipschitz constant in the cubic Taylor-series model with an adaptive parameter $\sigma$. At each iteration, the objective function is approximated by a cubic function. Numerical experiments in [1] show that the ACR is comparable with the trust-region method for small-scale problems. Although the method has been shown to have powerful local and global convergence properties, the practicality and efficiency of the adaptive cubic regularization method depend critically on how efficiently its sub-problem is solved at each iteration.
For solving the trust-region sub-problem, many efficient algorithms have been proposed. They fall into three broad categories: accurate methods for dense problems, accurate methods for large sparse problems, and approximation methods for large-scale problems. The first category consists of accurate methods for dense problems, such as the classical algorithm proposed by Moré and Sorensen [2], which uses Newton's method to iteratively solve symmetric positive definite linear systems via the Cholesky factorization. The second category consists of accurate methods for large sparse problems. For instance, the Lanczos method was employed to solve the large-scale trust-region sub-problem through a parameterized eigenvalue problem [3,4]. Another accurate approach [5] is based on a parametric eigenvalue problem within a semi-definite framework and employs the Lanczos method for the smallest eigenvalue as a black box. Hager [6] and Erway et al. [7] utilized subspace projection algorithms for accurate methods. The third category consists of approximation methods for large-scale problems. The generalized Lanczos trust-region method (GLTR) [8,9] was proposed as an improvement of the Steihaug [10]–Toint [11] conjugate-gradient method. For the GLTR method, Zhang et al. established prior upper bounds [12] and posterior error bounds [13] on the optimal objective value and the optimal solution between the original trust-region sub-problem and its projected counterparts.
For solving cubic model sub-problems, many algorithms are extensions of trust-region algorithms. Cartis et al. [1] applied Newton's method to the sub-problem of ACR, which employs a Cholesky factorization at each iteration; this approach usually applies to small-scale problems. Moreover, Cartis et al. briefly described the use of the Lanczos method for the ACR sub-problem in [1]. Carmon and Duchi [14] used gradient descent to approximate the cubic-regularized Newton step and gave its convergence rate; however, this rate is worse than that of the Krylov subspace method. Birgin et al. [15] proposed a Newton-like method for unconstrained optimization whose sub-problem is similar to, but different from, that of ACR. They introduced a mixed factorization, which is cheaper than the Cholesky factorization. Brás et al. [16] used the Lanczos method to efficiently solve the sub-problems associated with a special type of cubic model and also embedded the Lanczos method in a large-scale trust-region strategy. Furthermore, an accelerated first-order method for the ACR sub-problem was developed by Jiang et al. [17].
In this paper, we employ the Lanczos method to solve the sub-problem of the adaptive cubic regularization method (ACRL) for large-scale problems. The ACRL algorithm consists of three main steps. First, it generates the $j$th Krylov subspace using the Lanczos method. Next, it projects the original sub-problem onto this subspace to obtain a smaller sub-problem. Finally, it solves the resulting smaller sub-problem to obtain an approximate solution. This procedure amounts to minimizing the local model of the objective function over a sequence of small subspaces; as a result, the ACRL is applicable to large-scale problems. Moreover, we analyze the error of the Lanczos approximation. For unconstrained optimization problems, we perform numerical experiments and compare our method with the variant that does not use the Lanczos approximation (ACRN).
The outline of this paper is as follows. In Section 2, we introduce the adaptive cubic regularization method and its optimality condition. The method using the Lanczos algorithm to solve the ACR sub-problem is introduced in Section 3. In Section 4, we show the error bounds of the approximate solution and approximate objective value obtained using the ACRL method. Numerical experiments demonstrating the efficiency of the algorithm are given in Section 5. Finally, we give some concluding remarks in Section 6.

2. Preliminaries

Throughout the paper, a matrix is represented by a capital letter, while a lower case bold letter is used for a vector and a lower case letter for a scalar.
The adaptive cubic regularization method [1,18] was proposed by Cartis et al. for unconstrained optimization problems. It uses a cubic over-estimator of the objective function as a regularization technique to compute the step at each iteration. Assume that $x_k$ is the current iterate, the objective function $f(x)$ is twice continuously differentiable, and its Hessian matrix $H(x) = \nabla_{xx} f(x)$ is globally Lipschitz continuous. For any $p \in \mathbb{R}^n$, expanding $f(x_k + p)$ in a Taylor series at the point $x_k$, we obtain
$$f(x_k + p) = f(x_k) + p^T g(x_k) + \frac{1}{2} p^T H(x_k) p + \int_0^1 (1-t)\, p^T \left[ H(x_k + tp) - H(x_k) \right] p \, dt \le f(x_k) + p^T g(x_k) + \frac{1}{2} p^T H_k p + \frac{1}{6} L \|p\|^3, \tag{1}$$
where $g(x) = \nabla_x f(x)$, $H(x) = \nabla_{xx} f(x)$, and $L$ is the Lipschitz constant. Here, and for the remainder of this paper, $\|\cdot\|$ denotes the $\ell_2$ norm. The inequality follows from the Lipschitz property of $\nabla_{xx} f(x)$. In [1], Cartis et al. proposed to replace the constant $\frac{1}{2}L$ in Equation (1) with a dynamic positive parameter $\sigma_k$. In the cubic regularization model, the matrix $H(x)$ need not be globally or even locally Lipschitz continuous in general. Furthermore, an approximation of $H(x)$ by a symmetric matrix $B_k$ is employed at each iteration. Therefore, the model
$$\min\ m_k(p) := f(x_k) + p^T g(x_k) + \frac{1}{2} p^T B_k p + \frac{1}{3} \sigma_k \|p\|^3 \tag{2}$$
is used to approximate $f$ around $x_k$ at each iteration. The adaptive cubic regularization sub-problem then aims to compute a descent direction $p$. Dropping the constant term, the sub-problem takes the form
$$\min\ q_k(p) := p^T g_k + \frac{1}{2} p^T B_k p + \frac{1}{3} \sigma_k \|p\|^3, \tag{3}$$
in which $g_k$ is short for $g(x_k)$.
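To make the model concrete, the following minimal NumPy sketch (our illustration, not code from the paper) evaluates $q_k$ and its gradient; by Theorem 1 below, the gradient $g_k + B_k p + \sigma_k \|p\| p$ vanishes exactly at a global minimizer.

```python
import numpy as np

def q_k(p, g, B, sigma):
    """Cubic model (3): q_k(p) = p^T g + (1/2) p^T B p + (sigma/3) ||p||^3."""
    return p @ g + 0.5 * p @ (B @ p) + (sigma / 3.0) * np.linalg.norm(p) ** 3

def grad_q_k(p, g, B, sigma):
    """Gradient of (3): g + B p + sigma ||p|| p."""
    return g + B @ p + sigma * np.linalg.norm(p) * p
```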
Cartis et al. introduced the following global optimality result of ACR, which is similar to the optimality conditions of the trust-region method.
Theorem 1
([1], Theorem 3.1). The vector $p_{opt}$ is a global minimizer of the sub-problem (3) if and only if there is a scalar $\lambda_{opt} \ge 0$ satisfying the system of equations
$$(B_k + \lambda_{opt} I)\, p_{opt} = -g_k,$$
where $\lambda_{opt} = \sigma_k \|p_{opt}\|$ and $B_k + \lambda_{opt} I$ is a positive semi-definite matrix. If $B_k + \lambda_{opt} I$ is positive definite, then $p_{opt}$ is unique.
The trust-region sub-problem [19] minimizes $g_k^T p + \frac{1}{2} p^T B_k p$ within an $\ell_2$-norm trust region $\|p\| \le \Delta_k$, where $\Delta_k > 0$ is the trust-region radius. For a trust-region sub-problem, the vector $p_{opt}$ satisfies $\lambda_{opt}(\Delta_k - \|p_{opt}\|) = 0$, which means either $\lambda_{opt} = 0$ or $\|p_{opt}\| = \Delta_k$. When both the trust-region sub-problem and the cubic regularization sub-problem approximate the original objective function accurately enough, Theorem 1 gives $\Delta_k = \lambda_{opt}/\sigma_k$. Therefore, the parameter $\sigma_k$ in the ACR algorithm is inversely proportional to the trust-region radius, and it plays the same role as the trust-region radius when we adjust the estimation accuracy of the sub-problem.

3. Computation of the ACR Sub-Problem with the Lanczos Method

The Lanczos algorithm [20] was proposed to solve sparse linear systems and to find the eigenvalues of sparse matrices. It builds an orthonormal basis $Q_j = [q_0, q_1, \ldots, q_j]$ for the Krylov space $\mathcal{K}_j(B, g) := \mathrm{span}\{g, Bg, B^2 g, \ldots, B^j g\}$. With this basis, the original symmetric matrix $B$ is transformed into a tridiagonal matrix.
Normally, the dimension of $\mathcal{K}_j(B, g)$ increases by one as $j$ increases by one. However, the Lanczos process may break down, and the dimension of $\mathcal{K}_j(B, g)$ stops increasing at a certain $j$. We define $j_{max}$ as the smallest nonnegative integer at which the Lanczos process breaks down. If the dimension of the Krylov space is much smaller than the size of the matrix, projecting $B$ onto a $(j+1)$-dimensional subspace greatly reduces storage and speeds up the computation. Specifically, the Lanczos method constructs $Q_j$ such that $Q_j^T B Q_j = T_j$ is tridiagonal. We state the procedure in the following algorithm.
Algorithm 1 computes an orthogonal matrix $Q_j = [q_0, q_1, \ldots, q_j] \in \mathbb{R}^{n \times (j+1)}$, where
$$T_j = \begin{pmatrix} \alpha_0 & \beta_1 & & \\ \beta_1 & \alpha_1 & \ddots & \\ & \ddots & \ddots & \beta_j \\ & & \beta_j & \alpha_j \end{pmatrix} \tag{4}$$
is tridiagonal. Moreover, it follows directly from Algorithm 1 that
$$Q_j^T Q_j = I, \qquad T_j = Q_j^T B Q_j, \qquad Q_j^T g = \beta_0 e_1, \tag{5}$$
where $e_1$ is the first unit vector of length $j+1$.
Algorithm 1 Lanczos algorithm
1: $j = 0$, $\beta_0 = \|g\|$, $q_0 = g / \beta_0$, $q_{-1} = 0$
2: while $\beta_j \neq 0$ do
3:     $\alpha_j = q_j^T B q_j$
4:     $r_j = (B - \alpha_j I)\, q_j - \beta_j q_{j-1}$
5:     $\beta_{j+1} = \|r_j\|$
6:     if $\beta_{j+1} \neq 0$ then $q_{j+1} = r_j / \beta_{j+1}$
7:     $j = j + 1$
8: end while
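For illustration, here is a minimal NumPy sketch of Algorithm 1 (our own, not code from the paper). It stores all Lanczos vectors, since they are needed later to recover $p_j^k$; in floating-point arithmetic the columns of $Q_j$ gradually lose orthogonality, and a practical code would reorthogonalize.

```python
import numpy as np

def lanczos(B, g, max_steps, tol=1e-12):
    """Algorithm 1: build Q with orthonormal columns spanning K_j(B, g)
    and the tridiagonal T = Q^T B Q of (4)."""
    beta0 = np.linalg.norm(g)
    q, q_prev = g / beta0, np.zeros_like(g)
    Q, alphas, betas = [q], [], []
    beta = 0.0
    for _ in range(max_steps):
        w = B @ q - beta * q_prev            # r_j = (B - alpha_j I) q_j - beta_j q_{j-1}
        alpha = q @ w
        w = w - alpha * q
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if beta < tol:                       # breakdown: the Krylov space stops growing
            break
        betas.append(beta)
        q_prev, q = q, w / beta
        Q.append(q)
    j = len(alphas)                          # dimension of the projected problem
    T = np.diag(alphas) + np.diag(betas[:j - 1], 1) + np.diag(betas[:j - 1], -1)
    return np.column_stack(Q[:j]), T, beta0
```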
For a large-scale trust-region sub-problem, an effective approach is to solve it approximately using Krylov subspace methods. The Lanczos algorithm, as one of the Krylov subspace methods, was first introduced in [8] for the trust-region method. As in the trust-region setting, the Lanczos algorithm is also suitable for solving the cubic regularization sub-problem. By employing Algorithm 1, we find
p j k = Q j k u j k : = arg min p K j ( B k , g k ) q k ( p ) K j ( B k , g k ) ,
where $q_k(p)$ is defined by (3). The original sub-problem (3) is thus transformed into the sub-problem
$$\min\ q_k(u) := \beta_0 u^T e_1 + \frac{1}{2} u^T T_j u + \frac{1}{3} \sigma_k \|u\|_2^3. \tag{7}$$
Theorem 1 shows that $u$ is a global minimizer of the above sub-problem if and only if the pair $(u, \lambda)$ satisfies
$$(T_j + \lambda I)\, u = -\beta_0 e_1 \quad \text{and} \quad \lambda^2 = \sigma_k^2\, u^T u, \tag{8}$$
where $T_j + \lambda I$ is positive semi-definite. Equation (8) can be solved by Newton's method ([1], Algorithm 6.1). Newton's method for the sub-problem requires factorizing $B + \lambda I$ for various values of $\lambda$; when the scale of the original problem is large, applying this iteration directly is very expensive.
In summary, an approximation $p_j^k$ of the solution of the ACR sub-problem (3) is obtained in the following steps. First, we apply $j$ steps of the Lanczos method to the cubic function in (3) to obtain a tridiagonal matrix $T_j$. Then, we use Newton's method on the small sub-problem with matrix $T_j$ to compute the Lagrange multiplier $\lambda^k$ and $u_j^k$. Finally, the matrix $Q_j$ is used to recover $p_j^k$; note that the Lanczos vectors therefore need to be saved. We sketch the algorithm as follows.
In the GLTR algorithm, a restarting strategy for the degenerate case, in which multiple global solutions $p_{opt}$ exist, was discussed in ([8], Theorem 5.8). A similar restarting strategy also applies to the ACRL, although we discuss it only from a theoretical perspective. We therefore mainly consider the nondegenerate case in the following analysis.

4. Convergence Analysis

Theorem 1 shows that we seek a pair $(p_{opt}, \lambda_{opt})$ satisfying
$$(B_k + \lambda_{opt} I)\, p_{opt} = -g_k \quad \text{and} \quad \lambda_{opt} = \sigma_k \|p_{opt}\|. \tag{9}$$
Then, we have $\|p_{opt}\| = \lambda_{opt}/\sigma_k$. In this section, under the assumption $\sigma_k > 0$ and supposing Equation (9) is satisfied, we analyze the error between the optimal objective value $q_k(p_{opt})$ of the original sub-problem and the optimal objective value $q_k(p_j^k)$ of the sub-problem in the subspace $\mathcal{K}_j(B_k, g_k)$ generated by Algorithm 2, as well as the distance between $p_{opt}$ and $p_j^k$.
We set
$$B_k^{opt} := B_k + \lambda_{opt} I, \tag{10}$$
which is positive definite in the nondegenerate case. The spectral condition number of $B_k^{opt}$ is
$$\kappa(B_k^{opt}) = \frac{\theta_n + \lambda_{opt}}{\theta_1 + \lambda_{opt}}, \tag{11}$$
where $\theta_1 \le \theta_2 \le \cdots \le \theta_n$ are the eigenvalues of $B_k$. We define
$$q_k^{opt}(p) := \frac{1}{2} p^T B_k^{opt} p + p^T g_k + \frac{1}{3} \sigma_k \|p\|^3 = q_k(p) + \frac{1}{2} \lambda_{opt} \|p\|^2, \tag{12}$$
in which $q_k(p)$ is defined by (3).
Next, for the vector $p_j^k$ defined in (6), we analyze the errors
$$\|p_j^k - p_{opt}\| \quad \text{and} \quad |q_k(p_j^k) - q_k(p_{opt})|.$$
Algorithm 2 The ACRL method
1: for $j = 0, 1, 2, \ldots$ do
2:     Obtain $T_j$ and $Q_j$ from Algorithm 1
3:     Solve the tridiagonal sub-problem (7) by Newton's method to obtain $u_j^k$
4:     $p_j^k = Q_j u_j^k$
5: end for
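A compact sketch of Algorithm 2 follows (our illustration, reusing the hypothetical lanczos and solve_projected_subproblem helpers above): it enlarges the Krylov subspace until the gradient of the model $q_k$ at the lifted iterate is sufficiently small. For clarity it rebuilds the Lanczos basis at each $j$; an efficient implementation extends $Q_j$ and $T_j$ by one column per iteration.

```python
import numpy as np

def acrl_subproblem(B, g, sigma, tol=1e-6, max_dim=None):
    """Algorithm 2 sketch: approximately minimize q_k over growing Krylov subspaces."""
    max_dim = max_dim if max_dim is not None else g.size
    for j in range(1, max_dim + 1):
        Q, T, beta0 = lanczos(B, g, j)
        u, lam = solve_projected_subproblem(T, beta0, sigma)
        p = Q @ u                                          # lift back: p_j^k = Q_j u_j^k
        grad = g + B @ p + sigma * np.linalg.norm(p) * p   # gradient of (3) at p
        if np.linalg.norm(grad) <= tol * np.linalg.norm(g):
            break
    return p, lam
```

In exact arithmetic, the loop stops at the latest at the breakdown index $j_{max}$, where, in the nondegenerate case, the Krylov space already contains the exact minimizer.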
Theorem 2.
Suppose (3) is nondegenerate, $\|p_{opt}\| = \lambda_{opt}/\sigma_k$, and $p_j^k$ is the $j$th approximation of $p_{opt}$ generated by the ACRL satisfying $\|p_j^k\| = \lambda_{opt}/\sigma_k$. Then, for any nonzero $\tilde{p} \in \mathcal{K}_j(B_k, g_k)$, we have
$$0 \le q_k(p_j^k) - q_k(p_{opt}) \le 2\, \|B_k^{opt}\| \, \|\tilde{p} - p_{opt}\|^2 \tag{13}$$
and
$$\|p_j^k - p_{opt}\| \le 2 \sqrt{\kappa(B_k^{opt})} \, \|\tilde{p} - p_{opt}\|. \tag{14}$$
Proof. 
It can be seen that $\left| \|\tilde{p}\| - \frac{\lambda_{opt}}{\sigma_k} \right| = \big| \|\tilde{p}\| - \|p_{opt}\| \big| \le \|\tilde{p} - p_{opt}\|$. Then, we obtain
$$\left| 1 - \frac{\lambda_{opt}}{\sigma_k \|\tilde{p}\|} \right| \le \frac{\|\tilde{p} - p_{opt}\|}{\|\tilde{p}\|}. \tag{15}$$
Let $s = v - p_{opt}$, where
$$v = \frac{\lambda_{opt}}{\sigma_k} \cdot \frac{\tilde{p}}{\|\tilde{p}\|}. \tag{16}$$
Based on (16), we obtain
$$\|v\| = \|p_{opt}\| = \frac{\lambda_{opt}}{\sigma_k}. \tag{17}$$
We immediately have
$$\|s\| = \|p_{opt} - v\| \le \|p_{opt} - \tilde{p}\| + \|\tilde{p} - v\| = \|p_{opt} - \tilde{p}\| + \|\tilde{p}\| \left| 1 - \frac{\lambda_{opt}}{\sigma_k \|\tilde{p}\|} \right| \le 2\, \|p_{opt} - \tilde{p}\|, \tag{18}$$
where the last inequality follows from (15).
Furthermore, for any $0 \le i \le j_{max} - 1$,
$$q_k(p_i^k) = \min_{p \in \mathcal{K}_i(B_k, g_k)} q_k(p) \ge q_k(p_{i+1}^k) = \min_{p \in \mathcal{K}_{i+1}(B_k, g_k)} q_k(p) \ge q_k(p_{opt}) = \min q_k(p).$$
Therefore, since $v$ is a nonzero multiple of $\tilde{p}$ and hence $v \in \mathcal{K}_j(B_k, g_k)$, we have
$$\begin{aligned} 0 \le q_k(p_j^k) - q_k(p_{opt}) &\le q_k(v) - q_k(p_{opt}) = q_k(s + p_{opt}) - q_k(p_{opt}) \\ &= \frac{1}{2} s^T B_k s + s^T (B_k p_{opt} + g_k) + \frac{1}{3} \sigma_k \left( \|v\|^3 - \|p_{opt}\|^3 \right) \\ &= \frac{1}{2} s^T B_k s - \lambda_{opt}\, s^T p_{opt} \qquad \text{by (17) and (9)} \\ &= \frac{1}{2} s^T (B_k + \lambda_{opt} I)\, s \\ &\le \frac{\|B_k^{opt}\|}{2} \|s\|^2 \le 2\, \|B_k^{opt}\| \, \|\tilde{p} - p_{opt}\|^2 \qquad \text{by (18)}. \end{aligned} \tag{19}$$
From
$$\left( \frac{\lambda_{opt}}{\sigma_k} \right)^2 = \|v\|^2 = \|s\|^2 + \|p_{opt}\|^2 + 2\, s^T p_{opt} \tag{20}$$
and (17), we get $s^T p_{opt} = -\|s\|^2/2 = -s^T s / 2$. Then, the last equality in (19) holds. The conclusion in (13) follows from the above analysis.
Next, we prove the inequality (14). From the definition of $q_k^{opt}$ in (12) and $\|p_{opt}\| = \|p_j^k\|$, we have
$$q_k^{opt}(p_j^k) - q_k^{opt}(p_{opt}) = q_k(p_j^k) + \frac{1}{2} \lambda_{opt} \|p_j^k\|^2 - q_k(p_{opt}) - \frac{1}{2} \lambda_{opt} \|p_{opt}\|^2 = q_k(p_j^k) - q_k(p_{opt}). \tag{21}$$
Furthermore, we obtain
$$q_k^{opt}(p_j^k) - q_k^{opt}(p_{opt}) = \frac{1}{2} (p_j^k)^T B_k^{opt} p_j^k + (p_j^k)^T g_k - \frac{1}{2} p_{opt}^T B_k^{opt} p_{opt} - p_{opt}^T g_k = \frac{1}{2} (p_j^k - p_{opt})^T B_k^{opt} (p_j^k - p_{opt}),$$
where the last equality follows from (9) and (10). Then,
$$q_k^{opt}(p_j^k) - q_k^{opt}(p_{opt}) \ge \frac{1}{2} (\theta_1 + \lambda_{opt}) \|p_j^k - p_{opt}\|^2. \tag{22}$$
Combining (13), (21), and (22), we get
$$\frac{1}{2} (\theta_1 + \lambda_{opt}) \|p_j^k - p_{opt}\|^2 \le q_k(p_j^k) - q_k(p_{opt}) \le 2\, \|B_k^{opt}\| \, \|\tilde{p} - p_{opt}\|^2. \tag{23}$$
Since $\|B_k^{opt}\| = \theta_n + \lambda_{opt}$, combining (23) with (11) shows that the inequality (14) holds. □

5. Numerical Experiments

To show the efficiency of the Lanczos method for improving the adaptive cubic regularization algorithm, we perform the following two numerical experiments. In this section, we compare the numerical performance of the adaptive cubic regularization algorithm using the Lanczos approximation (ACRL) with that of the adaptive cubic regularization algorithm using only Newton's method (ACRN) on unconstrained optimization problems.
The ACRL and ACRN algorithms are implemented with the following parameters:
$$\eta_1 = 0.1, \quad \eta_2 = 0.8, \quad \gamma_1 = 0.25, \quad \gamma_2 = 1.2, \quad \gamma_3 = 2.$$
Convergence in both algorithms for the sub-problem is declared as soon as
$$\|\nabla q_k(p_j^k)\| \le \min\left( 0.0001,\ \frac{\|p_j^k\|}{\max(1, \sigma_k)} \right) \|\nabla q_k(0)\|,$$
or once more than the maximum number of iterations, which we set to 2000, has been performed. All numerical experiments in this paper were performed on a laptop with an i5-10210U CPU at 1.60 GHz and 16.0 GB of RAM.
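The paper does not spell out the outer update rule for $\sigma_k$, so the following sketch uses the standard ACR acceptance test of [1]; the specific way $\gamma_2$ and $\gamma_3$ enter below is our assumption, made only to show where the listed parameters could act.

```python
import numpy as np

def acr(f, grad, hess, x0, sigma0=1.0, eta1=0.1, eta2=0.8,
        gamma1=0.25, gamma2=1.2, gamma3=2.0, tol=1e-6, max_iter=2000):
    """Outer ACR loop (sketch). The sigma-update mapping below is an assumption."""
    x, sigma = x0.astype(float), sigma0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) <= tol:
            break
        p, _ = acrl_subproblem(B, g, sigma)       # hypothetical helper from Section 3
        pred = -(p @ g + 0.5 * p @ (B @ p) + (sigma / 3.0) * np.linalg.norm(p) ** 3)
        rho = (f(x) - f(x + p)) / pred            # actual vs. predicted decrease
        if rho >= eta1:                           # successful step: accept
            x = x + p
        if rho >= eta2:                           # very successful: relax regularization
            sigma = gamma1 * sigma
        elif rho >= eta1:                         # successful: mild increase (assumed role of gamma2)
            sigma = gamma2 * sigma
        else:                                     # unsuccessful: strengthen (assumed role of gamma3)
            sigma = gamma3 * sigma
    return x
```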
Example 1
(Generalized Rosenbrock function [21]). The Generalized Rosenbrock function is a non-convex function, introduced by Howard H. Rosenbrock in 1960, defined as follows:
$$f(x) = \sum_{i=1}^{n-1} \left[ c\, (x_{i+1} - x_i^2)^2 + (1 - x_i)^2 \right], \quad c = 100. \tag{24}$$
From (24), the minimizer is obviously $x^* = (1, 1, \ldots, 1)^T$, with minimum $f(x^*) = 0$.
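For the experiments, the function, its gradient, and its tridiagonal Hessian can be coded directly; a sketch of ours, with $c = 100$, follows.

```python
import numpy as np

def rosenbrock(x, c=100.0):
    """Generalized Rosenbrock function (24)."""
    return np.sum(c * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

def rosenbrock_grad(x, c=100.0):
    """Analytic gradient; each variable couples only with its neighbours."""
    g = np.zeros_like(x)
    t = x[1:] - x[:-1] ** 2
    g[:-1] += -4.0 * c * t * x[:-1] - 2.0 * (1.0 - x[:-1])
    g[1:] += 2.0 * c * t
    return g

def rosenbrock_hess(x, c=100.0):
    """Analytic Hessian, a tridiagonal matrix."""
    n = x.size
    H = np.zeros((n, n))
    d = np.arange(n - 1)
    H[d, d] += -4.0 * c * (x[1:] - 3.0 * x[:-1] ** 2) + 2.0
    H[d + 1, d + 1] += 2.0 * c
    H[d, d + 1] = H[d + 1, d] = -4.0 * c * x[:-1]
    return H
```

With these callbacks, the hypothetical acr driver sketched above can be run as, e.g., acr(rosenbrock, rosenbrock_grad, rosenbrock_hess, np.full(1000, -1.0)).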
In Table 1, we show the results of the ACRL and the ACRN for computing the minima of the Generalized Rosenbrock function with dimensions from 10 to 20,000. In addition to the dimension of the Generalized Rosenbrock function, we give the number of iterations ("Iter."), the total CPU time required in seconds, and the relative error between the computed result and the exact minimum ("Err."). It can be seen that using the Lanczos method to solve the adaptive cubic regularization sub-problem of the Generalized Rosenbrock function is much more efficient than not using it. Moreover, the Lanczos variant is not only faster but also more accurate, especially when the problem scale is relatively large.
Example 2
(Eigenvalues of tensors arising from hypergraphs). Next, we consider the problem of computing extreme eigenvalues of sparse tensors arising from a hypergraph. An adaptive cubic regularization method on a Stiefel manifold, named ACRCET, was proposed to solve tensor eigenvalue problems [22]. We compare the numerical performance of the ACRL and the ACRN when applied to the sub-problem of the ACRCET. Before the experiments, we first introduce the concepts of tensor eigenvalues and hypergraphs.
A real $m$th-order $n$-dimensional tensor $\mathcal{A} \in \mathbb{R}^{[m,n]}$ has $n^m$ entries
$$\{ a_{i_1 i_2 \cdots i_m} \}$$
for $i_j \in \{1, 2, \ldots, n\}$ and $j \in \{1, 2, \ldots, m\}$. If the value of $a_{i_1 i_2 \cdots i_m}$ is invariant under any permutation of its indices, $\mathcal{A}$ is a symmetric tensor.
Qi [23] defined a scalar $\Lambda \in \mathbb{R}$ as a Z-eigenvalue of $\mathcal{A}$ and a nonzero vector $x \in \mathbb{R}^n$ as its associated Z-eigenvector if they satisfy
$$\mathcal{A} x^{m-1} = \Lambda x \quad \text{and} \quad x^T x = 1. \tag{25}$$
Definition 1
(Hypergraph). A hypergraph is defined as $G = (V, E)$, where $V = \{1, 2, \ldots, n\}$ is the vertex set and $E = \{e_1, e_2, \ldots, e_m\}$ is the edge set with $e_p \subseteq V$ for $p = 1, 2, \ldots, m$. If $|e_p| = r \ge 2$ for $p = 1, 2, \ldots, m$ and $e_i \neq e_j$ when $i \neq j$, we call $G$ an $r$-uniform hypergraph.
For each vertex $i \in V$, the degree $d(i)$ is defined as
$$d(i) = \big| \{ e_p : i \in e_p,\ e_p \in E \} \big|. \tag{26}$$
Definition 2
(Adjacency tensor and Laplacian tensor). The adjacency tensor $\mathcal{A} \in \mathbb{R}^{[m,n]}$ of an $m$-uniform hypergraph $G$ is a symmetric tensor with entries
$$a_{i_1 \cdots i_m} = \begin{cases} \frac{1}{(m-1)!} & \text{if } \{i_1, \ldots, i_m\} \in E, \\ 0 & \text{otherwise}. \end{cases} \tag{27}$$
For an $m$-uniform hypergraph $G$, the degree tensor $\mathcal{D}$ is the diagonal tensor whose $i$th diagonal element is $d(i)$. Then, the Laplacian tensor $\mathcal{L}$ is defined as
$$\mathcal{L} = \mathcal{D} - \mathcal{A}. \tag{28}$$
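Since forming all $n^m$ entries is prohibitive, $\mathcal{L} x^{m-1}$ can be applied edge-by-edge: with the scaling in (27), each edge $e$ contributes $\prod_{u \in e \setminus \{i\}} x_u$ to component $i$ for every $i \in e$. A small sketch of ours:

```python
import numpy as np

def laplacian_apply(edges, n, x):
    """y = L x^{m-1} for the Laplacian tensor (28) of an m-uniform hypergraph,
    computed edge-by-edge without forming the n^m tensor entries."""
    m = len(edges[0])
    degrees = np.zeros(n)
    for e in edges:
        degrees[list(e)] += 1.0
    y = degrees * x ** (m - 1)                    # degree part: (D x^{m-1})_i = d(i) x_i^{m-1}
    for e in edges:
        xe = x[list(e)]
        for pos, i in enumerate(e):
            y[i] -= np.prod(np.delete(xe, pos))   # product of x over e \ {i}
    return y

# Rayleigh-quotient check of (25) on a toy 4-uniform hypergraph (two edges):
edges = [(0, 1, 2, 3), (2, 3, 4, 5)]
x = np.ones(6) / np.sqrt(6.0)                     # a unit vector, x^T x = 1
y = laplacian_apply(edges, 6, x)
Lambda = x @ y                                    # for a Z-eigenpair, L x^{m-1} = Lambda x
```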
A triangle has three vertices and three edges. In this example, we subdivide a triangle by connecting the midpoints of its edges; the $s$th-order subdivision of a triangle then has $4^s$ faces, each of which is a triangle. As shown in Figure 1, the three vertices of each small triangle, together with its center, are regarded as an edge of a 4-uniform hypergraph $G_{T_s}$.
We compute the largest Z-eigenvalue of the Laplacian tensor $\mathcal{L}(G_{T_s})$ via the ACRCET method, using the ACRL and the ACRN, respectively. In each run, 10 points on the unit sphere are randomly chosen, and 10 estimated eigenvalues are calculated; we take the best one as the estimated largest eigenvalue. For different subdivision orders $s$, the computational results, including the estimated largest Z-eigenvalue, the total number of iterations, and the total CPU time (in seconds) of the 10 runs, are reported in Table 2.
It can be seen that both the ACRL and the ACRN find all the largest eigenvalues. However, the ACRL takes almost no time compared with the ACRN: when $s = 6$, the ACRL costs only about 236 s, while the ACRN needs 103,900 s. The numerical comparison between the ACRL and the ACRN verifies that the Lanczos method dramatically accelerates the solution of the ACR sub-problem (3) and is powerful for large-scale problems.

6. Conclusions

In this paper, we have used the Lanczos method to solve the adaptive cubic regularization sub-problem (the ACRL method). The ACRL method first projects a large-scale ACR sub-problem (3) onto a much smaller sub-problem (7) using the Lanczos method and then solves the smaller sub-problem (7) using Newton's method. For the convergence analysis, we established prior error bounds on the differences between the approximate objective value $q_k(p_j^k)$ and the approximate solution $p_j^k$ and their corresponding optimal counterparts. Numerical experiments illustrate that the ACRL method greatly improves computational efficiency and performs well even for large-scale problems.

Author Contributions

Methodology, Z.Z. and J.C.; writing—original draft preparation, Z.Z.; writing—review and editing, J.C.; supervision, J.C.; funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant No. 11901118 and No. 62073087.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cartis, C.; Gould, N.I.; Toint, P.L. Adaptive cubic regularisation methods for unconstrained optimization. Part I: Motivation, convergence and numerical results. Math. Program. 2011, 127, 245–295.
  2. Moré, J.J.; Sorensen, D.C. Computing a trust region step. SIAM J. Sci. Stat. Comput. 1983, 4, 553–572.
  3. Sorensen, D.C. Minimization of a large-scale quadratic function subject to a spherical constraint. SIAM J. Optim. 1997, 7, 141–161.
  4. Rojas, M.; Santos, S.A.; Sorensen, D.C. A new matrix-free algorithm for the large-scale trust-region subproblem. SIAM J. Optim. 2000, 11, 611–646.
  5. Rendl, F.; Wolkowicz, H. A semidefinite framework for trust region subproblems with applications to large scale minimization. Math. Program. 1997, 77, 273–299.
  6. Hager, W.W. Minimizing a quadratic over a sphere. SIAM J. Optim. 2001, 12, 188–208.
  7. Erway, J.B.; Gill, P.E.; Griffin, J.D. Iterative methods for finding a trust-region step. SIAM J. Optim. 2009, 20, 1110–1131.
  8. Gould, N.I.; Lucidi, S.; Roma, M.; Toint, P.L. Solving the trust-region subproblem using the Lanczos method. SIAM J. Optim. 1999, 9, 504–525.
  9. Conn, A.R.; Gould, N.I.; Toint, P.L. Trust Region Methods; SIAM: Philadelphia, PA, USA, 2000; pp. 91–105.
  10. Steihaug, T. The conjugate gradient method and trust regions in large scale optimization. SIAM J. Numer. Anal. 1983, 20, 626–637.
  11. Toint, P. Towards an efficient sparsity exploiting Newton method for minimization. In Sparse Matrices and Their Uses; Academic Press: Cambridge, MA, USA, 1981; pp. 57–88.
  12. Zhang, L.H.; Shen, C.; Li, R.C. On the generalized Lanczos trust-region method. SIAM J. Optim. 2017, 27, 2110–2142.
  13. Zhang, L.; Yang, W.; Shen, C.; Feng, J. Error bounds of Lanczos approach for trust-region subproblem. Front. Math. China 2018, 13, 459–481.
  14. Carmon, Y.; Duchi, J. Gradient descent finds the cubic-regularized nonconvex Newton step. SIAM J. Optim. 2019, 29, 2146–2178.
  15. Birgin, E.G.; Martínez, J.M. A Newton-like method with mixed factorizations and cubic regularization for unconstrained minimization. Comput. Optim. Appl. 2019, 73, 707–753.
  16. Brás, C.P.; Martínez, J.M.; Raydan, M. Large-scale unconstrained optimization using separable cubic modeling and matrix-free subspace minimization. Comput. Optim. Appl. 2020, 75, 169–205.
  17. Jiang, R.; Yue, M.C.; Zhou, Z. An accelerated first-order method with complexity analysis for solving cubic regularization subproblems. Comput. Optim. Appl. 2021, 79, 471–506.
  18. Cartis, C.; Gould, N.I.; Toint, P.L. Adaptive cubic regularisation methods for unconstrained optimization. Part II: Worst-case function- and derivative-evaluation complexity. Math. Program. 2011, 130, 295–319.
  19. Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: Berlin/Heidelberg, Germany, 1999; pp. 69–71.
  20. Parlett, B.N.; Reid, J.K. Tracking the progress of the Lanczos algorithm for large symmetric eigenproblems. IMA J. Numer. Anal. 1981, 1, 135–155.
  21. Andrei, N. An unconstrained optimization test functions collection. Adv. Model. Optim. 2008, 10, 147–161.
  22. Chang, J.; Zhu, Z. An adaptive cubic regularization method for computing extreme eigenvalues of tensors. arXiv 2022, arXiv:2209.04971.
  23. Qi, L. Eigenvalues of a real supersymmetric tensor. J. Symb. Comput. 2005, 40, 1302–1324.
Figure 1. Four-uniform hypergraphs: subdivision of a triangle.
Table 1. Results for computing the minima of the Generalized Rosenbrock function.

| n | ACRL Iter. | ACRL Time (s) | ACRL Err. | ACRN Iter. | ACRN Time (s) | ACRN Err. |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | 11 | 0.01 | $4.05 \times 10^{-14}$ | 8 | 0.01 | $4.85 \times 10^{-11}$ |
| 500 | 17 | 0.05 | $8.33 \times 10^{-13}$ | 12 | 0.88 | $1.10 \times 10^{-13}$ |
| 1000 | 21 | 0.27 | $7.35 \times 10^{-13}$ | 12 | 4.56 | $5.53 \times 10^{-11}$ |
| 5000 | 23 | 6.24 | $1.47 \times 10^{-15}$ | 9 | 191.95 | $1.41 \times 10^{-12}$ |
| 10,000 | 21 | 20.33 | $2.42 \times 10^{-15}$ | 11 | 1524.88 | $6.29 \times 10^{-13}$ |
| 20,000 | 21 | 77.8 | $6.90 \times 10^{-13}$ | 14 | 16,166.62 | $3.03 \times 10^{-11}$ |
Table 2. Results for finding the largest Z-eigenvalues of $\mathcal{L}(G_{T_s})$, where $n$ is the number of vertices and $m$ the number of edges of $G_{T_s}$.

| s | n | m | ACRL Iter. | ACRL Time (s) | ACRL $\lambda_{max}^Z(\mathcal{L}(G_{T_s}))$ | ACRN Iter. | ACRN Time (s) | ACRN $\lambda_{max}^Z(\mathcal{L}(G_{T_s}))$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 10 | 4 | 53 | 0.04 | 3 | 53 | 0.05 | 3 |
| 2 | 31 | 16 | 57 | 0.05 | 6 | 56 | 0.17 | 6 |
| 3 | 109 | 64 | 55 | 0.08 | 6 | 54 | 0.79 | 6 |
| 4 | 409 | 256 | 56 | 0.78 | 6 | 54 | 13.05 | 6 |
| 5 | 1585 | 1024 | 67 | 13.02 | 6 | 66 | 727.77 | 6 |
| 6 | 6241 | 4096 | 81 | 236.35 | 6 | 85 | 103,900 | 6 |
Back to TopTop