Article

High-Dimensional Covariance Estimation via Constrained Lq-Type Regularization

1 School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China
2 Department of Statistics, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
3 Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(4), 1022; https://doi.org/10.3390/math11041022
Submission received: 23 January 2023 / Revised: 14 February 2023 / Accepted: 15 February 2023 / Published: 17 February 2023
(This article belongs to the Section Computational and Applied Mathematics)

Abstract: High-dimensional covariance matrix estimation is one of the fundamental and important problems in multivariate analysis and has a wide range of applications in many fields. In practice, it is common that a covariance matrix is composed of a low-rank matrix and a sparse matrix. In this paper, we estimate the covariance matrix by solving a constrained $L_q$-type regularized optimization problem. We establish the first-order optimality conditions for this problem by using the proximal mapping and the subspace method. The proposed stationary point degenerates to the first-order stationary points of the unconstrained $L_q$-regularized sparse or low-rank optimization problems. A smoothing alternating updating method is proposed to find an estimator for the covariance matrix, and we establish the convergence of the proposed method. The numerical simulation results show the effectiveness of the proposed approach for high-dimensional covariance estimation.

1. Introduction

High-dimensional covariance matrix estimation is an important topic in statistical inference and learning. The objective is to estimate the covariance matrix of p variables based on sample data of size n. Much statistical analysis of high-dimensional data involves covariance estimation. In a high-dimensional setting (usually $p\gg n$), researchers usually assume that the covariance matrix is sparse and have proposed various regularization techniques [1,2,3,4,5,6,7,8,9] to consistently estimate the covariance matrix $\Sigma$. However, this assumption is not appropriate in many applications, such as financial analysis depending on equity market risks, gene expressions stimulated by cytokines, etc. A more reasonable assumption, based on the factor model [10,11,12,13,14], is that the high-dimensional covariance matrix is the sum of a low-rank matrix and a sparse matrix. Specifically, a factor model can be formulated as
$$Y_i = B f_i + u_i, \quad i = 1,\ldots,n, \tag{1}$$
where $Y_i = (Y_{1i},\ldots,Y_{pi})^T$ is the observed variable, $B = (b_1,\ldots,b_p)^T$ is the matrix of factor loadings, $f_i$ is a $K\times 1$ vector of common factors, and $u_i = (u_{1i},\ldots,u_{pi})^T$ is the idiosyncratic error component, uncorrelated with $f_i$. Under this model, the covariance matrix of $Y_i$ is given by
$$\Sigma = B\,\mathrm{cov}(f_i)\,B^T + \Sigma_u := L + S, \tag{2}$$
where $L$ is the covariance matrix of $Bf_i$ and $S$ is the covariance matrix of $u_i$. Since $K$ is usually less than $p$, $L$ is a low-rank matrix. More generally, there is a large class of problems where one is interested in estimating the covariance matrix of $X_i$ from the noisy observations $Y_i = X_i + u_i$, where the noise $u_i$ is uncorrelated with the signal $X_i$. In addition, if $X_i$ is generated from a linear system so that some of its elements are linear functions of the others, then the covariance matrix of $X_i$ has low rank. Based on the structural decomposition (2), this paper proposes a novel approach to estimating the covariance matrix by solving a constrained low-rank and sparse regularized optimization problem.
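To make the decomposition (2) concrete, the following minimal sketch (Python with NumPy; all dimensions, seeds, and variable names are illustrative and not taken from the paper) simulates data from a small factor model and compares the sample covariance with the low-rank-plus-sparse truth.

```python
import numpy as np

# Minimal sketch of the factor model Y_i = B f_i + u_i and the induced
# decomposition Sigma = B cov(f) B^T + Sigma_u =: L + S (all names illustrative).
rng = np.random.default_rng(0)
p, K, n = 50, 3, 100

B = rng.normal(size=(p, K))                 # factor loadings
f = rng.normal(size=(n, K))                 # common factors, cov(f) = I_K
u = rng.normal(scale=0.5, size=(n, p))      # idiosyncratic errors, uncorrelated with f
Y = f @ B.T + u                             # observations, one row per sample

L_true = B @ B.T                            # low-rank part, rank K << p
S_true = 0.25 * np.eye(p)                   # sparse (here diagonal) part
Sigma_true = L_true + S_true

Sigma_sample = np.cov(Y, rowvar=False)      # ordinary sample covariance
print(np.linalg.matrix_rank(L_true), np.linalg.norm(Sigma_sample - Sigma_true, 2))
```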
Under the structural decomposition (2), researchers have proposed several approaches to estimate the covariance matrix. When the common factors in Model (1) are known and observable and $S$ is diagonal, Reference [10] proposed a least-squares-based covariance estimation (LS-CE). The core idea of the LS-CE is first to estimate $B$ by the least-squares method and then to obtain the final estimator assuming that $S$ is diagonal. Compared to the sample covariance matrix estimator, the LS-CE estimator is always invertible, even in a high-dimensional setting. Moreover, this estimator is shown to have substantial gains when estimating the inverse of the covariance matrix; however, it does not have much advantage when estimating the covariance matrix itself. Instead of assuming a diagonal $S$, Reference [11] assumed that $S$ is sparse and applied the adaptive thresholding technique of [4] to obtain an estimator for $S$. Then, a least-squares-based covariance estimator with conditional sparsity (LS-CS-CE) under Model (1) was studied in [11]. More recently, a robust approach based on Huber loss minimization was proposed in [13]. This approach first estimates the joint covariance matrix of the observed data and the factors and then uses it to recover the covariance matrix of the observed data. Moreover, the proposed estimator in [13] only requires that the fourth moment of the data exists; this method is applicable to a wider range of data, including sub-Gaussian and elliptical distributions. When the common factors are unknown and unobservable, Reference [12] proposed a non-parametric estimator via thresholding the principal orthogonal complements, which is called the principal orthogonal complement thresholding (POET) method. The POET uses the first $K$ principal components to estimate $L$ and applies the adaptive thresholding technique [4] to the remaining $(p-K)$ components to estimate $S$. However, as stated in [9], using the simple thresholding approaches in [1,2,3,4,5,9] to estimate $S$ may lead to covariance estimators in [11,12,13] that are not necessarily positive semi-definite.
Moreover, as suggested in [14], an intuitive optimization-type approach is to solve the problem
$$\min_{L,S\in\mathbb{R}^{p\times p}}\ \frac{1}{2}\|L+S-\tilde\Sigma\|_F^2 + \lambda_L\,\mathrm{rank}(L) + \lambda_S\sum_{(u,v)\in D^c} I(|S_{uv}|), \tag{3}$$
where $\tilde\Sigma$ is the sample covariance matrix, $\mathrm{rank}(L)$ denotes the rank of the matrix $L$, $I(|S_{uv}|) = 1$ if $|S_{uv}|\ne 0$ and $I(|S_{uv}|) = 0$ if $|S_{uv}| = 0$, $D^c := \{(u,v)\in\mathbb{N}\times\mathbb{N}: 1\le u\le p,\ 1\le v\le p,\ u\ne v\}$ is the index set of the non-diagonal elements of $S$, and $\lambda_L$ and $\lambda_S$ are tuning parameters. To facilitate the calculation, Reference [14] proposed the low-rank and sparse covariance estimator (LOREC) via solving the following convex relaxation optimization problem:
$$\min_{L,S\in\mathbb{R}^{p\times p}}\ \frac{1}{2}\|L+S-\tilde\Sigma\|_F^2 + \lambda_L\|L\|_* + \lambda_S\sum_{(u,v)\in D^c}|S_{uv}|, \tag{4}$$
where $\|L\|_* := \sum_{i=1}^p\sigma_i(L)$ denotes the nuclear norm of $L$ and $\sigma_i(L)$ is the $i$th largest singular value of $L$. However, the estimators for $L$ and $S$ proposed in [14] are not guaranteed to be positive semi-definite. Clearly, the optimization problems (3) and (4) can be considered as special cases of the robust principal component analysis (RPCA) in [15] and its convex relaxation, respectively.
In this paper, we propose an optimization model similar to some models of RPCA, which is formulated as
$$\min_{L,S\in\mathbb{R}^{p\times p}}\ \mathrm{rank}(L) + \lambda_S\|S\|_0, \quad \mathrm{s.t.}\ L+S = \tilde\Sigma,$$
where $\|S\|_0$ denotes the number of non-zero elements of $S$. Reference [15] showed that the principal component pursuit estimate, obtained by solving
$$\min_{L,S\in\mathbb{R}^{p\times p}}\ \|L\|_* + \lambda_S\sum_{u=1}^p\sum_{v=1}^p|S_{uv}|, \quad \mathrm{s.t.}\ L+S = \tilde\Sigma,$$
exactly recovers the low-rank $L$ and the sparse $S$. Furthermore, Reference [16] suggested that $L$ and $S$ can be estimated by solving the following unconstrained optimization problem:
$$\min_{L,S\in\mathbb{R}^{p\times p}}\ \|L\|_* + \lambda_S\sum_{u=1}^p\sum_{v=1}^p|S_{uv}| + \frac{\lambda_L}{2}\|L+S-\tilde\Sigma\|_F^2.$$
More references about the RPCA and related works based on convex and non-convex regularization can be found in [17,18,19,20,21].
The main contributions of this paper can be summarized as follows:
  • We construct a constrained adaptive $L_q$-type regularized optimization problem for covariance estimation, which is a non-smooth and non-convex problem. This optimization problem is based on the structural decomposition (2) and the idea of RPCA in [16], but is not limited to the factor model (1).
  • We study the first-order optimality condition of this problem and define a class of hybrid first-order stationary points. We establish the relationship between the first-order stationary points and the global minimizers of this problem.
  • We propose a smoothing alternating updating method to solve the constructed non-smooth, non-convex optimization problem and establish its convergence. The simulation results show that the proposed smoothing alternating updating method is efficacious for the constrained adaptive $L_q$-type regularized optimization problem.
The rest of this paper is organized as follows. In Section 2, we give some notations and preliminaries. In Section 3, we construct a constrained adaptive L q -type regularized optimization problem for covariance estimation. In Section 4, we study the first-order optimality condition of this problem and define a class of hybrid first-order stationary points. In Section 5, we propose a smoothing alternating updating method to solve the non-smooth, non-convex optimization problem and establish its convergence. The simulation results are given in Section 6. The conclusions and discussion are given in Section 7, while the mathematical proofs are given in Section 8.

2. Notations and Preliminaries

Throughout this paper, we use the following notation. The sets of $p\times p$ positive semi-definite symmetric and positive definite symmetric matrices are denoted by $\mathcal{S}_+^p$ and $\mathcal{S}_{++}^p$, respectively. It is known that $\mathcal{S}_+^p$ and $\mathcal{S}_{++}^p$ are cones.
Given matrices $X := (X_{uv})_{u,v=1}^p\in\mathbb{R}^{p\times p}$ and $Y := (Y_{uv})_{u,v=1}^p\in\mathbb{R}^{p\times p}$, where $X_{uv}$ is the $(u,v)$th element of $X$, we write $\langle X, Y\rangle := \sum_{u=1}^p\sum_{v=1}^p X_{uv}Y_{uv}$, and $X\circ Y$ denotes the Hadamard product, $(X\circ Y)_{uv} := X_{uv}Y_{uv}$ for all $u,v = 1,\ldots,p$. Define the elementwise maximum norm (or max norm) $|X|_{\max} := \max_{u,v}|X_{uv}|$, the Frobenius norm $\|X\|_F := \big(\sum_{u=1}^p\sum_{v=1}^p X_{uv}^2\big)^{1/2}$, and the spectral norm $\|X\|_2 := \sup_{\|x\|_2\le 1}\|Xx\|_2$, where $\|x\|_2 := \big(\sum_{i=1}^p x_i^2\big)^{1/2}$ is the 2-norm of the vector $x$. Moreover, given a matrix $X\in\mathbb{R}^{p\times p}$ and an index set $\Gamma\subseteq\{(u,v): 1\le u\le p,\ 1\le v\le p\}$, $X_\Gamma$ denotes the array of the same dimension such that $[X_\Gamma]_{uv} = X_{uv}$ for $(u,v)\in\Gamma$ and $[X_\Gamma]_{uv}$ is undefined for $(u,v)\in\Gamma^c$. For convenience, we write the $(u,v)$th element of $X_\Gamma$ as $X_{uv}$ regardless of whether it is defined or not.
Given a vector $x\in\mathbb{R}^p$, $\mathrm{Diag}(x)$ denotes the square matrix with the vector $x$ on its diagonal and zeros elsewhere. Given a matrix $X\in\mathcal{S}_+^p$, let $D_i(X)$ denote the $i$th largest eigenvalue of $X$, $i = 1,\ldots,p$, and write $D(X) = (D_1(X),\ldots,D_p(X))^T\in\mathbb{R}^p$. Define the sign operator
$$\mathrm{sign}(t) = \begin{cases} 1, & \text{if } t > 0,\\ 0, & \text{if } t = 0,\\ -1, & \text{if } t < 0, \end{cases}$$
and set $D := \{(u,v)\in\mathbb{N}\times\mathbb{N}: 1\le u\le p,\ 1\le v\le p,\ u = v\}$.
Let $f: \mathbb{R}^{p\times p}\to\bar{\mathbb{R}}$ be a convex function, and let $\bar X\in\mathrm{dom} f$. The subdifferential of $f$ at $\bar X$ is defined by
$$\partial f(\bar X) := \left\{\xi\in\mathbb{R}^{p\times p}:\ \lim_{t\downarrow 0}\frac{f(\bar X + tZ) - f(\bar X)}{t} \ge \langle\xi, Z\rangle \ \text{ for all } Z\in\mathbb{R}^{p\times p}\right\},$$
where $\mathrm{dom} f := \{X\in\mathbb{R}^{p\times p}: f(X) < \infty\}$ denotes the domain of $f$ and $\bar{\mathbb{R}} := (-\infty, \infty]$. In other words, $\partial f(\bar X)$ is the closed convex hull of all points of the form $\lim_{i\to\infty}\nabla f(X_i)$, where $\{X_i\}$ is any sequence that converges to $\bar X$ while avoiding the set on which $f$ fails to be differentiable. Moreover, the singular subdifferential of $f$ at $\bar X$ is defined by
$$\partial^{\infty} f(\bar X) := \left\{\xi\in\mathbb{R}^{p\times p}:\ (\xi, 0)\in N_{\mathrm{epi} f}\big(\bar X, f(\bar X)\big)\right\},$$
where $\mathrm{epi} f := \{(X,t)\in\mathbb{R}^{p\times p}\times\mathbb{R}:\ X\in\mathrm{dom} f,\ t\ge f(X)\}$ denotes the epigraph of $f$ and $N_{\mathrm{epi} f}(\bar X, f(\bar X))$ is the normal cone to $\mathrm{epi} f$ at $(\bar X, f(\bar X))$.
Finally, we give the definition of the proximal mapping used in this paper. Given a closed set $\Omega$ and a function $f:\mathbb{R}^{p\times p}\to\bar{\mathbb{R}}$, the proximal mapping on $\Omega$ of $f$ is the operator given by
$$\mathrm{Prox}_{f,\Omega}(X) = \operatorname*{argmin}_{U\in\Omega}\left\{f(U) + \frac{1}{2}\|U - X\|_F^2\right\} \quad \text{for any } X\in\mathbb{R}^{p\times p}.$$
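As a simple illustration of this proximal mapping, when $\Omega = \mathbb{R}^{p\times p}$ and $f(U) = \lambda\|U\|_1$ (the entrywise $L_1$ norm, used here only as an example rather than in Problem (7) itself), the proximal mapping reduces to entrywise soft thresholding; the following sketch shows this special case with illustrative names.

```python
import numpy as np

# Example of the proximal mapping: with Omega = R^{p x p} and f(U) = lam * ||U||_1
# (entrywise), Prox_{f,Omega}(X) is entrywise soft thresholding.
def prox_l1(X, lam):
    return np.sign(X) * np.maximum(np.abs(X) - lam, 0.0)

X = np.array([[1.5, -0.2], [0.4, -2.0]])
print(prox_l1(X, lam=0.5))   # entries with |X_uv| <= 0.5 are set to zero
```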

3. Covariance Estimation with Adaptive L q -Type Regularization

Considering that sparse covariance matrix estimation is intrinsically a heteroscedastic problem ([4]) and that $L$ and $S$ are both positive semi-definite matrices, we estimate the covariance matrix by solving the following adaptive $L_q$-type regularized optimization problem:
$$\min_{L,S\in\mathcal{S}_+^p}\ F(L,S) := \frac{1}{2}\|L+S-\tilde\Sigma\|_F^2 + \lambda_L\|L\|_q^q + \sum_{(u,v)\in D^c}\lambda_{uv}^S\,|S_{uv}|^q, \tag{7}$$
where $q\in(0,1)$, $\|L\|_q^q := \sum_{i=1}^p\sigma_i(L)^q$ is the Schatten-$q$ quasi-norm of the matrix $L$, and $\lambda_L$ and $\lambda_{uv}^S$ are tuning parameters. Note that, although we use the same $q$ for $L$ and $S$ here for simplicity, they can be different in general. Our approach is called the $L_q$-regularized covariance estimation with conditional sparsity ($L_q$-CSC). The final estimator $\hat\Sigma := \hat L + \hat S$ is called the $L_q$-regularized covariance estimator with conditional sparsity ($L_q$-CSCE).
To deal with heteroscedasticity in sparse covariance matrix estimation, we adopt the adaptive thresholding approach of References [4,12] to calculate the entry-dependent threshold $\lambda_{uv}^S$ in (7) for $S_{uv}$ as
$$\lambda_{uv}^S := \lambda\left(\tilde\sigma_{uu}\,\tilde\sigma_{vv}\,\frac{\log p}{n}\right)^{1/2},$$
where $\lambda > 0$ is a constant. Note that these entry-dependent tuning parameters for $S$ in Problem (7) depend on the variances of the corresponding variables. The parameter $\tilde\sigma_{uu}$ can be taken as the $u$th diagonal element of the POET estimator of $S$. It is worth mentioning that, when the data matrix is normalized, the covariance matrix is equivalent to its correlation matrix, in which case all tuning parameters equal $\lambda\sqrt{\log p/n}$ and the adaptive $L_q$ regularization function acting on $S$ becomes the ordinary $L_q$ regularization function.
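A minimal sketch of how the entry-dependent tuning parameters $\lambda_{uv}^S$ above could be computed is given below (Python/NumPy; the function name and the choice of diagonal variance estimates are illustrative assumptions).

```python
import numpy as np

# Sketch of the entry-dependent tuning parameters lambda_{uv}^S,
# assuming sigma_tilde is a vector of diagonal variance estimates (e.g., from the POET).
def adaptive_thresholds(sigma_tilde, n, lam):
    p = sigma_tilde.shape[0]
    lam_S = lam * np.sqrt(np.outer(sigma_tilde, sigma_tilde) * np.log(p) / n)
    np.fill_diagonal(lam_S, 0.0)   # diagonal entries of S are not penalized (D^c only)
    return lam_S
```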
We now discuss the selection of $\tilde\Sigma$. The most common choice is the ordinary sample covariance matrix
$$\frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar Y)(Y_i - \bar Y)^T,$$
where $\bar Y = n^{-1}\sum_{i=1}^n Y_i$ and $Y_i$ is the $i$th sample vector, $i = 1,\ldots,n$. The ordinary sample covariance is suitable for data with exponential-type tails or polynomial-type tails; see [1,4,8]. However, in many practical problems, the data tend to be heavy-tailed. References [9,13,22] defined a Huber minimization sample covariance matrix, which is suitable for data with a finite fourth moment. Moreover, Reference [9] also defined a rank-based covariance matrix and a median-of-means covariance matrix for data under other distributional conditions.
Finally, we discuss some basic properties of Problem (7).
Proposition 1. 
The following statements about $F(L,S): \mathbb{R}^{p\times p}\times\mathbb{R}^{p\times p}\to\mathbb{R}_+$ in Problem (7) hold:
(i)
$F(L,S)$ is proper, closed, and coercive, i.e., $\lim_{\|(L,S)\|_F\to\infty}F(L,S) = \infty$.
(ii)
$F(L,S)$ attains its minimal value over $\mathcal{S}_+^p\times\mathcal{S}_+^p$.
(iii)
Let $f(L,S) = \frac{1}{2}\|L+S-\tilde\Sigma\|_F^2$; the gradient $\nabla f(L,S)$ is Lipschitz continuous with constant 2, that is, for any $(L_1,S_1),(L_2,S_2)\in\mathbb{R}^{p\times p}\times\mathbb{R}^{p\times p}$, it holds that
$$\|\nabla f(L_1,S_1) - \nabla f(L_2,S_2)\|_F \le 2\,\|(L_1,S_1)-(L_2,S_2)\|_F.$$
(iv)
Given a matrix $L\in\mathbb{R}^{p\times p}$, the partial gradient $\nabla_S f(L,S)$ is Lipschitz continuous with constant 1, that is, for any $S_1,S_2\in\mathbb{R}^{p\times p}$, it holds that
$$\|\nabla_S f(L,S_1) - \nabla_S f(L,S_2)\|_F \le \|S_1 - S_2\|_F.$$
Moreover, given a matrix $S\in\mathbb{R}^{p\times p}$, the partial gradient $\nabla_L f(L,S)$ is Lipschitz continuous with constant 1.

4. Necessary Optimality Conditions

In this section, we define a class of first-order stationary points for Problem (7), which is a hybridization of the first-order stationary points in [23,24,25] and the fixed point in [25]. Furthermore, we prove that any global minimizer of Problem (7) is a first-order stationary point of Problem (7). We also define an approximate variant of this stationary point. It is worth mentioning that the constraints in Problem (7) are treated as ordinary closed convex set constraints in this paper.
For the simple unconstrained $L_q$-regularized sparse optimization problem [23,24],
$$\min_{x\in\mathbb{R}^p}\ f(x) + \lambda\|x\|_q^q,$$
where $\|x\|_q^q := \sum_{i=1}^p|x_i|^q$, its first-order stationary point is defined as a point satisfying
$$\mathrm{Diag}(x)\,\nabla f(x) + \lambda q\,|x|^q = 0.$$
For the simple unconstrained $L_q$-regularized rank optimization problem [25],
$$\min_{X\in\mathbb{R}^{m\times n}}\ f(X) + \lambda\|X\|_q^q,$$
its first-order stationary point (or fixed point) is defined as a point satisfying
$$X \in \mathrm{Prox}_{(\lambda/\gamma)\|\cdot\|_q^q,\ \mathbb{R}^{m\times n}}\left(X - \frac{1}{\gamma}\nabla f(X)\right),$$
where $\gamma$ is a constant greater than the Lipschitz constant of $\nabla f(X)$. Inspired by these works, we can define a class of hybrid first-order stationary points of Problem (7).
Definition 1. 
$(L,S)\in\mathcal{S}_+^p\times\mathcal{S}_+^p$ is said to be a first-order stationary point of Problem (7) if $(L,S)$ satisfies
$$L \in \mathrm{Prox}_{(\lambda_L/\gamma)\|\cdot\|_q^q,\ \mathcal{S}_+^p}\left(L - \frac{1}{\gamma}(L+S-\tilde\Sigma)\right), \tag{9}$$
$$0 \in S\circ(L+S-\tilde\Sigma) + Q + S\circ\big\{Y:\ Y_{\Gamma\cup D}\in\partial h_{\Gamma\cup D}(S_{\Gamma\cup D})\big\}, \tag{10}$$
where $Q := (Q_{uv})_{u,v=1}^p$ with
$$Q_{uv} = \begin{cases} q\,\lambda_{uv}^S|S_{uv}|^q, & \text{if } (u,v)\in D^c,\\ 0, & \text{if } (u,v)\in D, \end{cases}$$
$\Gamma = \{(u,v): S_{uv}\ne 0\}$, $h_{\Gamma\cup D}(S_{\Gamma\cup D}) := \delta_{\mathcal{S}_+^p}([S_{\Gamma\cup D}:0])$ with $\delta_{\mathcal{S}_+^p}$ the indicator function of $\mathcal{S}_+^p$, $Y = [Y_{\Gamma\cup D}: Y_{(\Gamma\cup D)^c}]$, and $Y_{(\Gamma\cup D)^c}$ can be any finite array.
Next, we investigate the relationship between the global minimizer of Problem (7) and the first-order stationary point of Problem (7) without any qualification conditions.
Theorem 1. 
Suppose that $(L^*,S^*)\in\mathcal{S}_+^p\times\mathcal{S}_+^p$ is a global minimizer of Problem (7), and let $\Gamma = \{(u,v)\in\mathbb{N}\times\mathbb{N}: S^*_{uv}\ne 0\}$. Then, $(L^*,S^*)$ is a first-order stationary point of Problem (7) with $\gamma > 1$.
Again, under some qualification conditions, the form of the first-order stationary point of Problem (7) can be analyzed further. We now discuss the specific form of the subdifferential of $h_{\Gamma\cup D}$ at $S_{\Gamma\cup D}$. Note that
$$h_{\Gamma\cup D}(X_{\Gamma\cup D}) = \delta_{\mathcal{S}_+^p}([X_{\Gamma\cup D}:0]) = \delta_{\mathcal{S}_+^p}\big(\mathcal{L}_{\Gamma\cup D}(X_{\Gamma\cup D})\big),$$
where $\mathcal{L}_{\Gamma\cup D}(X_{\Gamma\cup D}) := [X_{\Gamma\cup D}:0]$ is the linear mapping acting on the array $X_{\Gamma\cup D}$. The adjoint mapping $\mathcal{L}^*_{\Gamma\cup D}: \mathbb{R}^{p\times p}\to\mathbb{R}^{p\times p}$ is defined by
$$\big\langle\mathcal{L}_{\Gamma\cup D}(X_{\Gamma\cup D}),\ Y\big\rangle = \big\langle X_{\Gamma\cup D},\ \mathcal{L}^*_{\Gamma\cup D}(Y)\big\rangle \quad \text{for all } X_{\Gamma\cup D} \text{ and } Y\in\mathbb{R}^{p\times p}.$$
Specifically, $\mathcal{L}^*_{\Gamma\cup D}(Y) = Y_{\Gamma\cup D}$. Clearly, $\delta_{\mathcal{S}_+^p}$ is convex, and $\mathcal{L}_{\Gamma\cup D}$ and $\mathcal{L}^*_{\Gamma\cup D}$ are linear mappings. It follows from Theorem 23.9 in [26] and Theorem 2.51 in [27] that, if $S\in\mathcal{S}_+^p$, then
$$\big\{\mathcal{L}^*_{\Gamma\cup D}(Y):\ Y\in N_{\mathcal{S}_+^p}(S)\big\} \subseteq \partial h_{\Gamma\cup D}(S_{\Gamma\cup D}), \tag{11}$$
where $N_{\mathcal{S}_+^p}(S)$ denotes the normal cone to $\mathcal{S}_+^p$ at $S$. Moreover, suppose that the qualification condition $\mathrm{Ker}\,\mathcal{L}^*_{\Gamma\cup D}\cap\partial^{\infty}\delta_{\mathcal{S}_+^p}(S) = \{0\}$ holds, where $\mathrm{Ker}\,\mathcal{L}^*_{\Gamma\cup D}$ is the kernel of the linear mapping $\mathcal{L}^*_{\Gamma\cup D}$; then
$$\big\{\mathcal{L}^*_{\Gamma\cup D}(Y):\ Y\in N_{\mathcal{S}_+^p}(S)\big\} = \partial h_{\Gamma\cup D}(S_{\Gamma\cup D}).$$
Based on these discussions, the following statement holds: if $(L,S)$ satisfies
$$L \in \mathrm{Prox}_{\lambda_L\|\cdot\|_q^q,\ \mathcal{S}_+^p}(\tilde\Sigma - S), \qquad 0 \in S\circ(L+S-\tilde\Sigma) + Q + S\circ\big\{Y:\ Y\in N_{\mathcal{S}_+^p}(S)\big\},$$
where $Q$ is defined in Definition 1, and the qualification condition $\mathrm{Ker}\,\mathcal{L}^*_{\Gamma\cup D}\cap\partial^{\infty}\delta_{\mathcal{S}_+^p}(S) = \{0\}$ holds, then $(L,S)$ is a first-order stationary point of Problem (7).
Obviously, the qualification condition $\mathrm{Ker}\,\mathcal{L}^*_{\Gamma\cup D}\cap\partial^{\infty}\delta_{\mathcal{S}_+^p}(S) = \{0\}$ holds automatically if $\delta_{\mathcal{S}_+^p}$ is locally Lipschitz continuous at $S$. In other words, if $S\in\mathrm{int}(\mathcal{S}_+^p) = \mathcal{S}_{++}^p$, this qualification condition holds. Indeed, the true covariance matrix $S^o$ is often assumed to be positive definite [7,8]. When $S\in\mathcal{S}_{++}^p$, we can establish the following lower bound theory for the first-order stationary points of Problem (7).
Theorem 2. 
Let $(L,S)$ be a first-order stationary point of Problem (7) satisfying $F(L,S)\le F(L_0,S_0)+\epsilon$ for some $L_0, S_0$ and $\epsilon\ge 0$, and let $\Gamma := \{(u,v): S_{uv}\ne 0\}$. If $S\in\mathcal{S}_{++}^p$, then
$$|S_{uv}| \ge \left(\frac{q\,\min\{\lambda_{uv}^S\}}{\sqrt{2\,[F(L_0,S_0)+\epsilon]}}\right)^{\frac{1}{1-q}} \quad \forall\,(u,v)\in D^c\cap\Gamma.$$
Here, we only give the lower bound theory for the first-order stationary points of Problem (7). With reference to the definitions of the first- and second-order stationary points established by [23,24] for the general unconstrained $L_q$-regularized optimization, a class of second-order necessary optimality conditions of Problem (7) and its lower bound theory can also be considered.
Note that Problem (7) is actually a very challenging optimization problem due to the positive semi-definite constraints and the non-Lipschitz regularization terms; even finding a first-order stationary point is difficult. Here, we define a class of approximate first-order stationary points of Problem (7).
Definition 2. 
We say that $(L,S)\in\mathcal{S}_+^p\times\mathcal{S}_+^p$ is an $\varepsilon$-approximate first-order stationary point ($\varepsilon\ge 0$) of Problem (7) if there exists a set $\Gamma\subseteq\{(u,v): 1\le u,v\le p,\ S_{uv}\ne 0\}$ such that
$$\mathrm{dist}\left(L,\ \mathrm{Prox}_{(\lambda_L/\gamma)\|\cdot\|_q^q,\ \mathcal{S}_+^p}\left(L - \frac{1}{\gamma}(L+S-\tilde\Sigma)\right)\right) \le \varepsilon, \tag{13}$$
$$\mathrm{dist}\Big(0,\ \big\{L_{\Gamma\cup D} + S_{\Gamma\cup D} - \tilde\Sigma_{\Gamma\cup D} + Q_{\Gamma\cup D} + \Xi_{\Gamma\cup D}:\ \Xi_{\Gamma\cup D}\in\partial h_{\Gamma\cup D}(S_{\Gamma\cup D})\big\}\Big) \le \varepsilon, \tag{14}$$
$$\|S_{(\Gamma\cup D)^c}\|_F \le \varepsilon, \tag{15}$$
where Q is defined in Definition 1.
Moreover, if $(L,S)$ is a first-order stationary point of Problem (7), then it is an $\varepsilon$-approximate first-order stationary point of Problem (7) with $\varepsilon = 0$; conversely, if $(L,S)$ is an $\varepsilon$-approximate first-order stationary point of Problem (7) with $\varepsilon = 0$, then it is a first-order stationary point of Problem (7). In practice, the stopping point of any algorithm for an optimization problem is usually an approximate stationary point, rather than a stationary point.

5. Numerical Optimization

In this section, we propose a smoothing alternating updating (SAU) method to solve Problem (7), which combines the smoothing projected gradient step in [28,29] and the proximal gradient step in [30] under the framework of the ordinary alternating minimization method. The convergence of the SAU method is established.

5.1. A Smooth Approximation Optimization Problem

Referring to the smoothing technique in [23,29], a class of smooth approximations of Problem (7) can be given by
$$\min_{L,S\in\mathcal{S}_+^p}\ F_\mu(L,S) := \frac{1}{2}\|L+S-\tilde\Sigma\|_F^2 + \lambda_L\|L\|_q^q + \sum_{(u,v)\in D^c}\lambda_{uv}^S\,\psi_\mu(S_{uv})^q, \tag{16}$$
where $\mu\in(0,\infty)$ and
$$\psi_\mu(S_{uv}) = \begin{cases} |S_{uv}|, & |S_{uv}| > \mu,\\ \dfrac{S_{uv}^2}{2\mu} + \dfrac{\mu}{2}, & |S_{uv}| \le \mu. \end{cases}$$
Obviously, $\psi_\mu(S_{uv})$ is continuously differentiable and
$$\big(\psi_\mu(S_{uv})^q\big)' = \begin{cases} q\,|S_{uv}|^{q-1}\,\mathrm{sign}(S_{uv}), & |S_{uv}| > \mu,\\ q\left(\dfrac{S_{uv}^2}{2\mu}+\dfrac{\mu}{2}\right)^{q-1}\dfrac{S_{uv}}{\mu}, & |S_{uv}| \le \mu. \end{cases}$$
Furthermore,
$$\nabla_S F_\mu(L,S) = L + S - \tilde\Sigma + \Psi(S),$$
where
$$\big(\Psi(S)\big)_{uv} = \begin{cases} \lambda_{uv}^S\big(\psi_\mu(S_{uv})^q\big)', & (u,v)\in D^c,\\ 0, & (u,v)\in D. \end{cases}$$
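The smoothing function $\psi_\mu$ and the partial gradient $\nabla_S F_\mu$ above translate directly into vectorized code; the following sketch is one possible implementation (names are illustrative, and the tuning parameters $\lambda_{uv}^S$ are assumed to be stored as a matrix with zero diagonal).

```python
import numpy as np

# Sketch of the smoothing penalty derivative and the partial gradient
# nabla_S F_mu(L, S) = L + S - Sigma_tilde + Psi(S) defined above.
def psi_mu_q_grad(S, mu, q):
    absS = np.abs(S)
    inner = np.where(absS > mu, absS, S**2 / (2 * mu) + mu / 2)   # psi_mu(S_uv) > 0
    grad_inner = np.where(absS > mu, np.sign(S), S / mu)          # psi_mu'(S_uv)
    return q * inner ** (q - 1) * grad_inner                      # derivative of psi_mu(S_uv)^q

def grad_S_F_mu(L, S, Sigma_tilde, lam_S, mu, q):
    Psi = lam_S * psi_mu_q_grad(S, mu, q)    # lam_S has zero diagonal, so (u,v) in D stays 0
    return L + S - Sigma_tilde + Psi
```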
Next, we define a class of hybrid first-order stationary points of Problem (16).
Definition 3. 
We say that $(L,S)\in\mathcal{S}_+^p\times\mathcal{S}_+^p$ is a first-order stationary point of Problem (16) if $(L,S)$ satisfies
$$L \in \mathrm{Prox}_{(\lambda_L/\gamma)\|\cdot\|_q^q,\ \mathcal{S}_+^p}\left(L - \frac{1}{\gamma}(L+S-\tilde\Sigma)\right), \tag{19}$$
$$0 \in \nabla_S F_\mu(L,S) + Y, \quad Y\in N_{\mathcal{S}_+^p}(S). \tag{20}$$
Similar to the proof of Theorem 1, it is easy to show that any global solution of Problem (16) is a first-order stationary point of Problem (16) with $\gamma > 1$. The following conclusions establish the relationship between the first-order stationary points of Problems (7) and (16), which is the core of the convergence analysis of the SAU method in the next subsection.
Theorem 3. 
Let $\{\mu_k\}$ denote a sequence with $\mu_k > 0$, $k = 1,2,\ldots$, and $\mu_k\to 0$ as $k\to\infty$, and let $(L_{\mu_k},S_{\mu_k})$ be a first-order stationary point of Problem (16) with $\mu = \mu_k$. Then, the following statements hold:
(i)
If $\mu_k\le\varepsilon/p^2$, then $(L_{\mu_k},S_{\mu_k})$ is an $\varepsilon$-approximate first-order stationary point of Problem (7).
(ii)
Any accumulation point of $(L_{\mu_k},S_{\mu_k})$ is a first-order stationary point of Problem (7).

5.2. Smoothing Alternating Updating Method

Theorem 3 provides a feasible procedure to obtain an approximate solution of Problem (7) by solving Problem (16). This subsection gives the framework of the SAU method for Problem (7) and establishes its convergence.
First, we give the framework of the alternating updating (AU) method for solving Problem (16).
Algorithm 1. 
Given $\alpha_0 > 0$, $\beta\in(0,1)$, $\gamma > 1$, choose an initial point $(L^0,S^0)\in\mathcal{S}_+^p\times\mathcal{S}_+^p$. For $m = 0, 1, 2, \ldots$, repeat the following two steps until convergence:
Step 1. Update $L$ by solving the following optimization problem:
$$L^{m+1} = \operatorname*{argmin}_{L\in\mathcal{S}_+^p}\left\{\big\langle\nabla_L f(L^m,S^m),\ L-L^m\big\rangle + \frac{\gamma}{2}\|L-L^m\|_F^2 + \lambda_L\|L\|_q^q\right\}. \tag{21}$$
Step 2. Compute
$$S^{m+1} = \operatorname*{argmin}_{S\in\mathcal{S}_+^p}\ \big\|S - \big(S^m - \alpha_m\nabla_S F_\mu(L^{m+1},S^m)\big)\big\|_F^2,$$
where $\alpha_m = \alpha_0\beta^{\ell_m}$ and $\ell_m$ is the smallest non-negative integer $\ell$ such that
$$F_\mu\big(L^{m+1},\ S^m(\alpha_0\beta^{\ell})\big) \le F_\mu(L^{m+1}, S^m) - \frac{c}{2}\big\|S^m(\alpha_0\beta^{\ell}) - S^m\big\|_F^2,$$
and $S^m(\alpha) = \operatorname*{argmin}_{S\in\mathcal{S}_+^p}\big\|S - \big(S^m - \alpha\nabla_S F_\mu(L^{m+1},S^m)\big)\big\|_F^2$.
The AU method proposed in this paper applies the proximal gradient step [30] to update $L$ and uses the smoothing projected gradient step [28,29] to update $S$. A key component at each iteration of the proposed AU method is to solve the singular-value minimization Problem (21), which can be rewritten as follows:
$$\min_{L\in\mathcal{S}_+^p}\ \frac{1}{2}\left\|L - \Big(L^m - \frac{1}{\gamma}\nabla_L f(L^m,S^m)\Big)\right\|_F^2 + \frac{\lambda_L}{\gamma}\|L\|_q^q. \tag{23}$$
Theorem 3.4 of [30] provides a solution to Problem (23). To facilitate the expression, define the vector q-thresholding operator $H(d) := (h_v(d_1),\ldots,h_v(d_p))^T$ with $v = 2\lambda_L/\gamma$, where
$$h_v(d_i) = \begin{cases} h_{v,q}(d_i), & d_i > d^*,\\ \big(v(1-q)\big)^{1/(2-q)} \ \text{or}\ 0, & d_i = d^*,\\ 0, & d_i < d^*, \end{cases} \tag{24}$$
$d^* = \frac{2-q}{2(1-q)}\big(v(1-q)\big)^{1/(2-q)}$, and $h_{v,q}(d_i)$ is the unique solution in $(\bar d, +\infty)$ of the single-variable minimization problem
$$\min_{x\ge 0}\ g_{d_i}(x) := x^2 - 2d_i x + v x^q,$$
where $\bar d = \big(vq(1-q)/2\big)^{1/(2-q)}$. Moreover, $h_{v,q}(d_i)$ is differentiable and strictly increasing on $[d^*, +\infty)$. As suggested in [25], $h_{v,q}(d_i)$ can simply be obtained by numerical methods, for example, the Newton method
$$x_{z+1} = x_z - \frac{q v (x_z)^{q-1}/2 + x_z - d_i}{q(q-1)v(x_z)^{q-2}/2 + 1}$$
with the initial point $x_0 = 1.5\,\bar d$.
Theorem 4 
(Theorem 3.4 in [30]). Let $U\,\mathrm{Diag}(d)\,U^T$ be the eigenvalue decomposition of $L^m - \frac{1}{\gamma}\nabla_L f(L^m,S^m)$. Then, $L^* = U\,\mathrm{Diag}(H(d))\,U^T$ is an optimal solution to Problem (23), where $H(d)$ is defined in (24) with $v = 2\lambda_L/\gamma$.
When $q = 1/2$, $H(d)$ has a closed-form expression, namely $h_{v,q}(d_i) = \frac{2}{3}d_i\left(1+\cos\left(\frac{2\pi}{3}-\frac{2\varphi(d_i)}{3}\right)\right)$ with $\varphi(d_i) = \arccos\left(\frac{v}{8}\left(\frac{d_i}{3}\right)^{-3/2}\right)$ and $d^* = \frac{\sqrt[3]{54}}{4}\,v^{2/3}$. Refer to [30] for the details.
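A sketch of the vector q-thresholding operator $H(d)$ and the resulting $L$-update of Step 1 is given below; the closed form is used for $q = 1/2$ and the Newton iteration from the text otherwise (the number of Newton steps, the function names, and the treatment of the boundary case $d_i = d^*$ are illustrative assumptions).

```python
import numpy as np

# Sketch of the vector q-thresholding operator H(d) from Theorem 4 (v = 2*lam_L/gamma).
def q_threshold(d, v, q, newton_iters=50):
    d = np.asarray(d, dtype=float)
    d_star = (2 - q) / (2 * (1 - q)) * (v * (1 - q)) ** (1 / (2 - q))
    out = np.zeros_like(d)
    for i, di in enumerate(d):
        if di <= d_star:
            continue                              # at or below the threshold: take 0
        if q == 0.5:
            phi = np.arccos(v / 8 * (di / 3) ** (-1.5))
            out[i] = 2 / 3 * di * (1 + np.cos(2 * np.pi / 3 - 2 * phi / 3))
        else:
            d_bar = (v * q * (1 - q) / 2) ** (1 / (2 - q))
            x = 1.5 * d_bar                       # initial point suggested in the text
            for _ in range(newton_iters):         # Newton iteration on g'(x) = 0
                num = q * v * x ** (q - 1) / 2 + x - di
                den = q * (q - 1) * v * x ** (q - 2) / 2 + 1
                x = x - num / den
            out[i] = x
    return out

def update_L(L_m, grad_L, lam_L, gamma, q):
    # Step 1 of the AU method via Theorem 4: eigen-decompose L^m - (1/gamma)*grad_L
    # and apply H(d) to the eigenvalues.
    d, U = np.linalg.eigh(L_m - grad_L / gamma)
    return (U * q_threshold(d, 2 * lam_L / gamma, q)) @ U.T
```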
Another key component at each iteration of the proposed AU method is the projection problem over $\mathcal{S}_+^p$, which can be written as follows:
$$\min_{S\in\mathcal{S}_+^p}\ \|S - C^m\|_F^2, \tag{25}$$
where $C^m := S^m - \alpha_m\nabla_S F_\mu(L^{m+1},S^m)$. It is well known that, if $C^m$ has the eigen-decomposition $C^m := \sum_{i=1}^p\lambda_i v_i v_i^T$, then $S^* = \sum_{i=1}^p\max\{\lambda_i, 0\}\,v_i v_i^T$ is the optimal solution of Problem (25).
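The projection in Problem (25) can be sketched as follows (the explicit symmetrization is an implementation detail and an assumption, not part of the paper's statement).

```python
import numpy as np

# Sketch of the projection step (25): project C^m onto the positive semi-definite cone
# by truncating the negative eigenvalues of the (symmetrized) matrix.
def project_psd(C):
    C = (C + C.T) / 2                      # keep the iterate symmetric
    eigval, eigvec = np.linalg.eigh(C)
    eigval = np.maximum(eigval, 0.0)
    return (eigvec * eigval) @ eigvec.T
```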
By (23) and (25), it is easy to find that, when the initial matrices L 0 and S 0 in the AU method are symmetric, the matrices for L and S output by the AU method are also symmetric. The next theorem gives the convergence of the AU method.
Theorem 5. 
Let { ( L m , S m ) } be the sequence generated by the AU method. Then, the following statements hold:
(i)
$F_\mu(L^{m+1},S^{m+1}) \le F_\mu(L^m,S^m) - \frac{c}{2}\|S^{m+1}-S^m\|_F^2 - \frac{\gamma-1}{2}\|L^{m+1}-L^m\|_F^2$.
(ii)
$\{(L^m,S^m)\}$ is bounded.
(iii)
$\lim_{m\to\infty}\|L^{m+1}-L^m\|_F = 0$ and $\lim_{m\to\infty}\|S^{m+1}-S^m\|_F = 0$.
(iv)
Any accumulation point of { ( L m , S m ) } is a first-order stationary point of Problem (16).
We now give the framework of the SAU method for solving Problem (16).
Algorithm 2. 
Given $\tau\in(0,1)$, choose an initial smoothing parameter $\mu_0$. For $k = 0, 1, 2, \ldots$, repeat the following two steps until convergence:
Step 1. Solve Problem (16) with $\mu_k$ by the AU method, and obtain $(L_{\mu_k},S_{\mu_k})$.
Step 2. Set $\mu_{k+1} = \tau\mu_k$, and return to Step 1.
Theorem 3 shows that any accumulation point of $(L_{\mu_k},S_{\mu_k})$ is a first-order stationary point of Problem (7), which guarantees the convergence of the SAU method. In practice, $k$ cannot be taken to infinity and $\mu_k$ cannot be taken as 0; the $L_q$-CSCE obtained by the SAU method is therefore often an $\varepsilon$-approximate first-order stationary point of Problem (7). Indeed, we stop the SAU method when it reaches a maximum number of iterations $k_{\max}$ or when $\mu_k < \varepsilon$, where $\varepsilon > 0$ is a small constant. Moreover, in order to improve the efficiency of the SAU method, a useful trick is to take the output point $(L_{\mu_k},S_{\mu_k})$ as the initial point of the AU method for Problem (16) with $\mu_{k+1}$. Thus, if the initial point of the AU method for solving Problem (16) with $\mu_0$ is symmetric, then the final matrices for $L$ and $S$ generated by the SAU method are also symmetric.
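A minimal sketch of the SAU outer loop is given below; it assumes that a routine au_method implementing Algorithm 1 is available, and all parameter values, names, and the warm-start signature are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the SAU outer loop (Algorithm 2), assuming an AU solver
# au_method(Sigma_tilde, lam_L, lam_S, q, mu, init) for Problem (16) exists.
def sau_method(Sigma_tilde, lam_L, lam_S, q, mu0=1.0, tau=0.5, eps=1e-4, k_max=30):
    L = np.zeros_like(Sigma_tilde)                    # symmetric starting point
    S = np.diag(np.diag(Sigma_tilde))
    mu = mu0
    for _ in range(k_max):
        if mu < eps:                                  # stopping rule from the text
            break
        L, S = au_method(Sigma_tilde, lam_L, lam_S, q, mu, init=(L, S))  # warm start
        mu *= tau                                     # shrink the smoothing parameter
    return L, S
```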

6. Simulations

In this section, we study the performance of the $L_q$-CSC method and the associated algorithms. The following models are considered in this section:
  • Model 1 (banded matrix with ordering). $\Sigma = \mathrm{blockdiag}(A_1, A_2)$, where $A_1 = (\sigma_{uv})$, $1\le u,v\le p/2$, $\sigma_{uv} = (1 - |u-v|/10)_+$, and $A_2 = 4 I_{p/2\times p/2}$. $\Sigma$ is a two-block diagonal matrix, $A_1$ is a banded and sparse covariance matrix, and $A_2$ is a diagonal matrix with four along the diagonal.
  • Model 2 (simple factor). $\Sigma = U D U^T + I$, where $U\in\mathbb{R}^{p\times 3}$ has orthonormal columns generated uniformly at random and $D = 8I$.
  • Model 3 (random factor). $\Sigma = U D U^T + \Sigma_{\mathrm{Sparse}}$, where $U\in\mathbb{R}^{p\times 3}$ has orthonormal columns generated uniformly at random, $D = 8I$, and $\Sigma_{\mathrm{Sparse}}$ is the sparse matrix generated by Model 1.
The matrix $\Sigma$ in Model 1 is sparse, while the matrices in Models 2 and 3 satisfy the structural decomposition (2); a sketch of how these covariance matrices can be generated is given below.
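The sketch below (Python/NumPy; the use of a QR factorization to draw random orthonormal columns, and all names, are illustrative assumptions) generates the covariance matrices of Models 1-3.

```python
import numpy as np

# Sketch of the three simulation models for Sigma.
def model1(p):
    # banded sparse block A1 with sigma_uv = (1 - |u-v|/10)_+ and diagonal block A2 = 4*I
    idx = np.arange(p // 2)
    A1 = np.maximum(1 - np.abs(idx[:, None] - idx[None, :]) / 10, 0.0)
    A2 = 4 * np.eye(p - p // 2)
    Sigma = np.zeros((p, p))
    Sigma[:p // 2, :p // 2] = A1
    Sigma[p // 2:, p // 2:] = A2
    return Sigma

def random_orthonormal(p, K, rng):
    # U in R^{p x K} with random orthonormal columns (via QR of a Gaussian matrix)
    Q, _ = np.linalg.qr(rng.normal(size=(p, K)))
    return Q

def model2(p, rng):
    U = random_orthonormal(p, 3, rng)
    return U @ (8 * np.eye(3)) @ U.T + np.eye(p)

def model3(p, rng):
    U = random_orthonormal(p, 3, rng)
    return U @ (8 * np.eye(3)) @ U.T + model1(p)

rng = np.random.default_rng(0)
Y = rng.multivariate_normal(np.zeros(100), model2(100, rng), size=100)  # n = 100 samples
```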

6.1. Comparison with Representative Methods

This subsection studies the advantages and disadvantages of the $L_q$-CSCE relative to existing methods. Four representative estimators (the adaptive thresholding estimator (ATE) with the hard thresholding rule in [4], the optimization-type estimator (OTE) in [8], the POET in [12], and the LOREC in [14]) were used for comparison. The first two estimators are based on the assumption of a sparse covariance matrix: the ATE is a representative simple thresholding estimator, and the OTE is a representative optimization-type estimator. The POET and the LOREC, like our estimator, are based on the structural decomposition (2).
Under each model, we set $n = 100$, and all random samples are generated from the normal distribution with mean 0 and covariance matrix $\Sigma$, for $p = 30, 100, 200$. The losses are measured by three matrix norms: the spectral norm, the max norm, and the Frobenius norm. The standard errors are given in parentheses. It is known that $q = 0.5$ is a good choice for the variable selection problem and the low-rank problem [24,25]. For convenience, the parameter $q$ in Problem (7) is taken as 0.5. In this subsection, we do not consider different choices of $q$.
For Model 1, the tuning parameters $\lambda_L$ in Problems (4) and (7) are taken as $10^8$. Moreover, we set $K = 0$ in the POET. Under these settings, the LOREC, POET, and $L_q$-CSCE degenerate into the simple soft thresholding estimator [1,2], the ATE, and the constrained adaptive thresholding estimator, respectively. Table 1 reports the means and standard errors of the five estimators under the three losses. In our analysis, the performance of the POET is almost the same as that of the ATE. Note that, compared with the ATE, OTE, POET, and $L_q$-CSCE, the LOREC has a large bias under the spectral norm loss and the Frobenius norm loss, which is caused by the bias of the $L_1$ penalty. Moreover, the $L_q$-CSCE performs best under the spectral norm loss and the Frobenius norm loss. All methods perform similarly under the max norm.
For Models 2 and 3, we choose the tuning parameters in the LOREC and $L_q$-CSCE by a grid search and set $K = 3$ in the POET. Table 2 and Table 3 report the means and standard errors of the five estimators under the three losses. Note that the ATE and OTE perform poorly for Models 2 and 3. In contrast, the LOREC, POET, and $L_q$-CSCE are more effective. The $L_q$-CSCE performs best under the spectral norm loss and the Frobenius norm loss. As with Model 1, all methods perform similarly under the max norm. It is worth mentioning that the adaptive technique has no effect for Model 2.
Moreover, Table 4 records the number of times that the estimators of $L$ and $S$ produced by the POET, LOREC, and $L_q$-CSCE are positive semi-definite. Clearly, all estimators of $L$ and $S$ from the $L_q$-CSC method are positive semi-definite. In contrast, the estimators of $S$ from the LOREC and POET are not guaranteed to be positive semi-definite.
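A count such as the one reported in Table 4 only requires a numerical positive semi-definiteness check; a minimal sketch (the tolerance is an assumption) is given below.

```python
import numpy as np

# Sketch of the positive semi-definiteness check behind the counts in Table 4.
def is_psd(M, tol=1e-8):
    return bool(np.all(np.linalg.eigvalsh((M + M.T) / 2) >= -tol))
```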

6.2. Choice of q

In the previous simulations, we only considered the setting $q = 0.5$. This subsection supplements some simulations to study the influence of the choice of $q$ on the performance of the $L_q$-CSCE. Here, we consider the cases $q = 0.1, 0.3, 0.5, 0.7$, and 0.9.
Table 5, Table 6 and Table 7 report the means and standard errors of the $L_q$-CSCE with different $q$ under the three losses. Note that the choice of $q$ does not affect the results of the $L_q$-CSCE under the max norm loss. Under the other losses, the performance of the $L_q$-CSCE depends on the choice of $q$ to a varying degree across the models. Specifically, there is little difference between $q = 0.1$, $q = 0.3$, and $q = 0.5$ under the spectral norm loss and the max norm loss. The $L_q$-CSCE with $q = 0.7$ and $q = 0.9$ performs poorly under the Frobenius norm, especially when $q = 0.9$. Note that, when $q = 0.1$, the $L_q$-CSCE is slightly worse than with $q = 0.3$ and $q = 0.5$. We conjecture that a smaller $q$ makes the objective more non-convex and, thus, more difficult to minimize, whereas the estimator induced by the $L_q$ function with a large $q$ has a large deviation.
Problem (23) has an explicit solution when q = 0.5 , which makes the iteration of the SAU method simple. When q = 0.1 , 0.3 , 0.7 , and 0.9 , it is necessary to embed other numerical methods to find the approximate solution of Problem (23). Therefore, in practical applications, we recommend the setting where q = 0.5 .

7. Discussion and Conclusions

In this paper, we considered the problem of estimating a high-dimensional covariance matrix from noisy data and studied a class of estimators based on the structural decomposition (2). We estimated the high-dimensional covariance matrix by solving the constrained adaptive $L_q$-type regularized optimization Problem (7). This paper provides a systematic study of Problem (7). We first defined a class of hybrid first-order stationary points of Problem (7) and analyzed the relationship between these stationary points and the global solutions of Problem (7). Secondly, we proposed an SAU method for solving Problem (7) and established its convergence. The final simulation results showed that the proposed SAU method can solve Problem (7) effectively and that the resulting $L_q$-CSCE has good numerical performance.

8. Proof

This section provides the detailed proof of the conclusions in this paper.
Proof of Proposition 1. 
(i)-(ii) The properness of $F(L,S)$ is obvious. Note that $F$ is continuous over its domain and $\mathrm{dom}(F)$ is closed. Theorem 2.8 in [31] shows that $F$ is closed. Furthermore, if $\|L\|_F\to\infty$ or $\|S\|_F\to\infty$, at least one of the three terms in $F$ tends to infinity. Thus, the coerciveness of $F(L,S)$ holds. It then follows from Theorem 2.14 in [31] that $F(L,S)$ attains its minimal value over $\mathcal{S}_+^p\times\mathcal{S}_+^p$.
(iii) By direct calculation, $\nabla f(L,S) = [\nabla_L f(L,S)^T, \nabla_S f(L,S)^T]^T$, where
$$\nabla_L f(L,S) = \nabla_S f(L,S) = L + S - \tilde\Sigma.$$
For any $(L_1,S_1), (L_2,S_2)\in\mathbb{R}^{p\times p}\times\mathbb{R}^{p\times p}$, it holds that
$$\begin{aligned} \|\nabla f(L_1,S_1)-\nabla f(L_2,S_2)\|_F^2 &= \|\nabla_L f(L_1,S_1)-\nabla_L f(L_2,S_2)\|_F^2 + \|\nabla_S f(L_1,S_1)-\nabla_S f(L_2,S_2)\|_F^2\\ &= 2\,\|L_1-L_2+S_1-S_2\|_F^2\\ &= 2\,\|L_1-L_2\|_F^2 + 2\,\|S_1-S_2\|_F^2 + 4\,\langle L_1-L_2,\ S_1-S_2\rangle\\ &\le 4\,\|L_1-L_2\|_F^2 + 4\,\|S_1-S_2\|_F^2 = 4\,\|(L_1,S_1)-(L_2,S_2)\|_F^2, \end{aligned}$$
which implies that the gradient $\nabla f(L,S)$ is Lipschitz continuous with constant 2.
(iv) By direct calculation, $\nabla_S f(L,S) = L+S-\tilde\Sigma$. For any $S_1,S_2\in\mathbb{R}^{p\times p}$, it holds that $\nabla_S f(L,S_1)-\nabla_S f(L,S_2) = S_1-S_2$. Then, the partial gradient $\nabla_S f(L,S)$ is Lipschitz continuous with constant 1. The same result holds for the partial gradient $\nabla_L f(L,S)$. □
Proof of Theorem 1. 
Since $(L^*,S^*)$ is a global minimizer of Problem (7), $L^*$ is also a global minimizer of the problem
$$\min_{L\in\mathcal{S}_+^p}\ f(L,S^*) + \lambda_L\|L\|_q^q.$$
By (iv) of Proposition 1, we have that
$$f(L_1,S) \le f(L_2,S) + \langle\nabla_L f(L_2,S),\ L_1-L_2\rangle + \frac{1}{2}\|L_1-L_2\|_F^2, \quad \forall\, L_1,L_2\in\mathbb{R}^{p\times p}.$$
It follows that, for any $L\in\mathcal{S}_+^p$,
$$\begin{aligned} f(L^*,S^*)+\lambda_L\|L^*\|_q^q &\le f(L,S^*)+\lambda_L\|L\|_q^q\\ &\le f(L,S^*)+\lambda_L\|L\|_q^q + \frac{\gamma-1}{2}\|L-L^*\|_F^2\\ &\le f(L^*,S^*)+\langle\nabla_L f(L^*,S^*),\ L-L^*\rangle + \frac{1}{2}\|L-L^*\|_F^2 + \lambda_L\|L\|_q^q + \frac{\gamma-1}{2}\|L-L^*\|_F^2\\ &= f(L^*,S^*)+\langle\nabla_L f(L^*,S^*),\ L-L^*\rangle + \frac{\gamma}{2}\|L-L^*\|_F^2 + \lambda_L\|L\|_q^q. \end{aligned}$$
Thus,
$$L^* \in \operatorname*{argmin}_{L\in\mathcal{S}_+^p}\left\{\langle\nabla_L f(L^*,S^*),\ L-L^*\rangle + \frac{\gamma}{2}\|L-L^*\|_F^2 + \lambda_L\|L\|_q^q\right\} = \mathrm{Prox}_{(\lambda_L/\gamma)\|\cdot\|_q^q,\ \mathcal{S}_+^p}\left(L^* - \frac{1}{\gamma}(L^*+S^*-\tilde\Sigma)\right).$$
Hence, (9) holds.
We now prove that (10) holds for $(L^*,S^*)$. Observe that $S^*$ is a global minimizer of the problem
$$\min_{S\in\mathcal{S}_+^p}\ f(L^*,S) + \sum_{(u,v)\in D^c}\lambda_{uv}^S|S_{uv}|^q,$$
which can be equivalently rewritten as
$$\min_{S\in\mathbb{R}^{p\times p}}\ f(L^*,S) + \sum_{(u,v)\in D^c}\lambda_{uv}^S|S_{uv}|^q + \delta_{\mathcal{S}_+^p}(S).$$
Then, $S^*_{\Gamma}$ is a global minimizer of the problem
$$\min_{S_{\Gamma}}\ f\big(L^*, [S_{\Gamma}: S^*_{\Gamma^c}]\big) + \sum_{(u,v)\in D^c\cap\Gamma}\lambda_{uv}^S|S_{uv}|^q + \delta_{\mathcal{S}_+^p}\big([S_{\Gamma}: S^*_{\Gamma^c}]\big). \tag{26}$$
It follows from the first-order optimality condition of Problem (26) and Exercise 10.10 in [32] that there exists a $Y_{\Gamma}\in\partial h_{\Gamma}(S^*_{\Gamma})$ such that
$$0 = L^*_{uv} + S^*_{uv} - \tilde\Sigma_{uv} + q\,\lambda_{uv}^S\,\mathrm{sign}(S^*_{uv})|S^*_{uv}|^{q-1} + Y_{uv} \quad \text{for all } (u,v)\in\Gamma\cap D^c, \tag{27}$$
$$0 = L^*_{uv} + S^*_{uv} - \tilde\Sigma_{uv} + Y_{uv} \quad \text{for all } (u,v)\in\Gamma\cap D. \tag{28}$$
Moreover, let $Y = [Y_{\Gamma}: Y_{\Gamma^c}]$, where $Y_{\Gamma^c}$ is some finite array. For any $(u,v)\in\Gamma^c\cap D^c$, we have
$$S^*_{uv}\big(L^*_{uv}+S^*_{uv}-\tilde\Sigma_{uv}\big) + q\,\lambda_{uv}^S|S^*_{uv}|^q + S^*_{uv}Y_{uv} = 0, \tag{29}$$
and for any $(u,v)\in\Gamma^c\cap D$, we have
$$S^*_{uv}\big(L^*_{uv}+S^*_{uv}-\tilde\Sigma_{uv}\big) + S^*_{uv}Y_{uv} = 0. \tag{30}$$
Combining (27)-(30) with the definition of $Y$, (10) holds. Thus, $(L^*,S^*)$ is a first-order stationary point of Problem (7). □
Proof of Theorem 2. 
Since $(L,S)$ is a first-order stationary point of Problem (7) and $S\in\mathcal{S}_{++}^p$, by Definition 1, it holds that
$$0 = S\circ(L+S-\tilde\Sigma) + Q, \tag{31}$$
where $Q := (Q_{uv})_{u,v=1}^p$ and
$$Q_{uv} = \begin{cases} q\,\lambda_{uv}^S|S_{uv}|^q, & \text{if } (u,v)\in D^c,\\ 0, & \text{if } (u,v)\in D. \end{cases}$$
For all $(u,v)\in D^c\cap\Gamma$, by (31), we have
$$0 = S_{uv}\big(L_{uv}+S_{uv}-\tilde\Sigma_{uv}\big) + q\,\lambda_{uv}^S|S_{uv}|^q. \tag{32}$$
Moreover, statement (iv) in Proposition 1 shows that $\nabla_S f(L,S)$ is Lipschitz continuous with constant 1, and it follows that
$$f(L,Y) \le f(L,X) + \langle\nabla_S f(L,X),\ Y-X\rangle + \frac{1}{2}\|Y-X\|_F^2, \quad \forall\, X,Y\in\mathbb{R}^{p\times p}.$$
Taking $X = S$ and $Y = S - \nabla_S f(L,S)$, we obtain
$$f\big(L,\ S-\nabla_S f(L,S)\big) \le f(L,S) - \frac{1}{2}\|\nabla_S f(L,S)\|_F^2.$$
Note that
$$f\big(L,\ S-\nabla_S f(L,S)\big) \ge 0, \qquad f(L,S) \le F(L,S) \le F(L_0,S_0)+\epsilon.$$
It follows that
$$\|\nabla_S f(L,S)\|_F \le \sqrt{2\,[F(L_0,S_0)+\epsilon]}.$$
This, together with the relation (32), implies that
$$q\,\lambda_{uv}^S|S_{uv}|^{q-1} = \big|\big(\nabla_S f(L,S)\big)_{uv}\big| \le \|\nabla_S f(L,S)\|_F \le \sqrt{2\,[F(L_0,S_0)+\epsilon]}, \quad \forall\,(u,v)\in D^c\cap\Gamma.$$
Thus, we have
$$|S_{uv}| \ge \left(\frac{q\,\min\{\lambda_{uv}^S\}}{\sqrt{2\,[F(L_0,S_0)+\epsilon]}}\right)^{\frac{1}{1-q}} \quad \forall\,(u,v)\in D^c\cap\Gamma,$$
which completes this proof. □
Proof of Theorem 3. 
(i) We just need to prove that Relation (20) is a sufficient condition for (14) and that the inequality (15) holds. Since $(L_{\mu_k},S_{\mu_k})$ is a first-order stationary point of Problem (16), there exists a matrix $Y_{\mu_k}\in N_{\mathcal{S}_+^p}(S_{\mu_k})$ such that
$$0 = \nabla_S F_{\mu_k}(L_{\mu_k},S_{\mu_k}) + Y_{\mu_k}.$$
Let $N := \{(u,v)\in D^c:\ |(S_{\mu_k})_{uv}| > \mu_k\}$. It holds that
$$0 = (L_{\mu_k})_{N\cup D} + (S_{\mu_k})_{N\cup D} - \tilde\Sigma_{N\cup D} + [\Psi(S_{\mu_k})]_{N\cup D} + (Y_{\mu_k})_{N\cup D}.$$
By (11), it is known that
$$(Y_{\mu_k})_{N\cup D} \in \mathcal{L}^*_{N\cup D}\big(N_{\mathcal{S}_+^p}(S_{\mu_k})\big) \subseteq \partial h_{N\cup D}\big((S_{\mu_k})_{N\cup D}\big).$$
Then,
$$\mathrm{dist}\Big(0,\ \big\{(L_{\mu_k})_{N\cup D} + (S_{\mu_k})_{N\cup D} - \tilde\Sigma_{N\cup D} + [\Psi(S_{\mu_k})]_{N\cup D} + Y_{N\cup D}:\ Y_{N\cup D}\in\partial h_{N\cup D}\big((S_{\mu_k})_{N\cup D}\big)\big\}\Big) \le \varepsilon.$$
Moreover, it follows from the condition $\mu_k\le\varepsilon/p^2$ that
$$\big\|(S_{\mu_k})_{(N\cup D)^c}\big\|_F \le \sum_{(u,v)\in(N\cup D)^c}\big|(S_{\mu_k})_{uv}\big| \le \mu_k\,p^2 \le \varepsilon.$$
Therefore, $(L_{\mu_k},S_{\mu_k})$ is an $\varepsilon$-approximate first-order stationary point of Problem (7).
(ii) Let $(L^*,S^*)$ be an accumulation point of $(L_{\mu_k},S_{\mu_k})$. By the definition of the first-order stationary point of Problem (16), for any $(u,v)\in N^c\cap D^c$, there exists a matrix $Y_{\mu_k}\in N_{\mathcal{S}_+^p}(S_{\mu_k})$ such that
$$0 = (L_{\mu_k})_{uv} + (S_{\mu_k})_{uv} - \tilde\Sigma_{uv} + q\,\lambda_{uv}^S\left(\frac{(S_{\mu_k})_{uv}^2}{2\mu_k}+\frac{\mu_k}{2}\right)^{q-1}\frac{(S_{\mu_k})_{uv}}{\mu_k} + (Y_{\mu_k})_{uv}. \tag{34}$$
Note that
$$0 \le q\,\lambda_{uv}^S\left(\frac{(S_{\mu_k})_{uv}^2}{2\mu_k}+\frac{\mu_k}{2}\right)^{q-1}\frac{(S_{\mu_k})_{uv}^2}{\mu_k} \le q\,\lambda_{uv}^S\left(\frac{(S_{\mu_k})_{uv}^2}{\mu_k}\right)^{q} \le q\,\lambda_{uv}^S\,\big|(S_{\mu_k})_{uv}\big|^q. \tag{35}$$
If $|(S_{\mu_k})_{uv}|\le\mu_k$ for arbitrarily large $k$, then $S^*_{uv} = \lim_{k\to\infty}(S_{\mu_k})_{uv} = 0$, and by (35), we have
$$\lim_{k\to\infty} q\,\lambda_{uv}^S\left(\frac{(S_{\mu_k})_{uv}^2}{2\mu_k}+\frac{\mu_k}{2}\right)^{q-1}\frac{(S_{\mu_k})_{uv}^2}{\mu_k} = 0 \quad\text{and}\quad q\,\lambda_{uv}^S|S^*_{uv}|^q = 0.$$
Since $Y_{\mu_k}\in N_{\mathcal{S}_+^p}(S_{\mu_k})$, we have $\lim_{k\to\infty}Y_{\mu_k}\in N_{\mathcal{S}_+^p}(S^*)$.
This, together with the relation (34), implies that, for any $(u,v)\in N^c\cap D^c$,
$$0 \in S^*_{uv}\big(L^*_{uv}+S^*_{uv}-\tilde\Sigma_{uv}\big) + q\,\lambda_{uv}^S|S^*_{uv}|^q + S^*_{uv}Y_{uv}, \quad Y\in N_{\mathcal{S}_+^p}(S^*). \tag{36}$$
Moreover, for any $(u,v)\in N\cup D$, we have
$$0 = (L_{\mu_k})_{uv} + (S_{\mu_k})_{uv} - \tilde\Sigma_{uv} + Q^{\mu_k}_{uv} + (Y_{\mu_k})_{uv}, \tag{37}$$
where
$$Q^{\mu_k}_{uv} = \begin{cases} 0, & (u,v)\in D,\\ q\,\lambda_{uv}^S\,\mathrm{sign}\big((S_{\mu_k})_{uv}\big)\big|(S_{\mu_k})_{uv}\big|^{q-1}, & (u,v)\in(N\cup D)\setminus D. \end{cases}$$
For any $(u,v)\in N\cup D$, it follows that
$$0 \in S^*_{uv}\big(L^*_{uv}+S^*_{uv}-\tilde\Sigma_{uv}\big) + S^*_{uv}Q_{uv} + S^*_{uv}Y_{uv}, \quad Y\in N_{\mathcal{S}_+^p}(S^*), \tag{38}$$
where
$$Q_{uv} = \begin{cases} 0, & (u,v)\in D,\\ q\,\lambda_{uv}^S\,\mathrm{sign}(S^*_{uv})\big|S^*_{uv}\big|^{q-1}, & (u,v)\in(N\cup D)\setminus D. \end{cases}$$
Since the $Y_{\mu_k}$ in (34) coincides with that in (37), by (36) and (38), we have that $(L^*,S^*)$ is a first-order stationary point of Problem (7). □
Proof of Theorem 5. 
(i) By (iv) of Proposition 1, we have that
$$\frac{1}{2}\|L_1+S-\tilde\Sigma\|_F^2 \le \frac{1}{2}\|L_2+S-\tilde\Sigma\|_F^2 + \langle L_2+S-\tilde\Sigma,\ L_1-L_2\rangle + \frac{1}{2}\|L_1-L_2\|_F^2, \quad \forall\, L_1,L_2\in\mathbb{R}^{p\times p}.$$
It follows that
$$\begin{aligned} F_\mu(L^{m+1},S^m) &= \frac{1}{2}\|L^{m+1}+S^m-\tilde\Sigma\|_F^2 + \lambda_L\|L^{m+1}\|_q^q + \sum_{(u,v)\in D^c}\lambda_{uv}^S\,\psi_\mu(S^m_{uv})^q\\ &\le \frac{1}{2}\|L^m+S^m-\tilde\Sigma\|_F^2 + \langle L^m+S^m-\tilde\Sigma,\ L^{m+1}-L^m\rangle + \frac{1}{2}\|L^{m+1}-L^m\|_F^2 + \lambda_L\|L^{m+1}\|_q^q + \sum_{(u,v)\in D^c}\lambda_{uv}^S\,\psi_\mu(S^m_{uv})^q\\ &\le \frac{1}{2}\|L^m+S^m-\tilde\Sigma\|_F^2 + \langle L^m+S^m-\tilde\Sigma,\ L^{m+1}-L^m\rangle + \frac{\gamma}{2}\|L^{m+1}-L^m\|_F^2 + \lambda_L\|L^{m+1}\|_q^q + \sum_{(u,v)\in D^c}\lambda_{uv}^S\,\psi_\mu(S^m_{uv})^q - \frac{\gamma-1}{2}\|L^{m+1}-L^m\|_F^2\\ &\le \frac{1}{2}\|L^m+S^m-\tilde\Sigma\|_F^2 + \lambda_L\|L^m\|_q^q + \sum_{(u,v)\in D^c}\lambda_{uv}^S\,\psi_\mu(S^m_{uv})^q - \frac{\gamma-1}{2}\|L^{m+1}-L^m\|_F^2\\ &= F_\mu(L^m,S^m) - \frac{\gamma-1}{2}\|L^{m+1}-L^m\|_F^2, \end{aligned}$$
where the third inequality is due to the update of $L$ in the AU method. Then,
$$F_\mu(L^{m+1},S^{m+1}) \le F_\mu(L^{m+1},S^m) - \frac{c}{2}\|S^{m+1}-S^m\|_F^2 \le F_\mu(L^m,S^m) - \frac{c}{2}\|S^{m+1}-S^m\|_F^2 - \frac{\gamma-1}{2}\|L^{m+1}-L^m\|_F^2.$$
(ii) It is easy to show that $F_\mu$ is coercive, which implies that $F_\mu$ is level bounded. By (i) of this theorem, we have $F_\mu(L^m,S^m)\le F_\mu(L^0,S^0)$. Then, $\{(L^m,S^m)\}$ is bounded.
(iii) It follows from the inequality in (i) that
$$\sum_{m=0}^{\infty}\left(\frac{c}{2}\|S^{m+1}-S^m\|_F^2 + \frac{\gamma-1}{2}\|L^{m+1}-L^m\|_F^2\right) \le \sum_{m=0}^{\infty}\big(F_\mu(L^m,S^m)-F_\mu(L^{m+1},S^{m+1})\big) \le F_\mu(L^0,S^0) - \lim_{m\to\infty}F_\mu(L^m,S^m) < \infty,$$
where the last inequality is due to $F_\mu$ being bounded from below. Hence, $\lim_{m\to\infty}\|L^{m+1}-L^m\|_F = 0$ and $\lim_{m\to\infty}\|S^{m+1}-S^m\|_F = 0$.
(iv) Let $(L^*,S^*)$ be an accumulation point of $(L^m,S^m)$. We assume that $\lim_{j\to\infty}L^{m_j} = L^*$ and $\lim_{j\to\infty}S^{m_j} = S^*$, where $m_j\to\infty$ as $j\to\infty$. By the results in (iii),
$$\lim_{j\to\infty}L^{m_j+1} = \lim_{j\to\infty}\big(L^{m_j}+(L^{m_j+1}-L^{m_j})\big) = L^*, \qquad \lim_{j\to\infty}S^{m_j+1} = \lim_{j\to\infty}\big(S^{m_j}+(S^{m_j+1}-S^{m_j})\big) = S^*.$$
By the AU method in this paper, we have that
$$L^{m_j+1} = \operatorname*{argmin}_{L\in\mathcal{S}_+^p}\left\{\langle\nabla_L f(L^{m_j},S^{m_j}),\ L-L^{m_j}\rangle + \frac{\gamma}{2}\|L-L^{m_j}\|_F^2 + \lambda_L\|L\|_q^q\right\},$$
$$\left\langle S^{m_j+1} - \big(S^{m_j} - \alpha_{m_j}\nabla_S F_\mu(L^{m_j+1},S^{m_j})\big),\ S - S^{m_j+1}\right\rangle \ge 0, \quad \forall\, S\in\mathcal{S}_+^p.$$
Taking the limits on both sides, it follows that
$$L^* = \operatorname*{argmin}_{L\in\mathcal{S}_+^p}\left\{\langle\nabla_L f(L^*,S^*),\ L-L^*\rangle + \frac{\gamma}{2}\|L-L^*\|_F^2 + \lambda_L\|L\|_q^q\right\}, \qquad \big\langle\nabla_S F_\mu(L^*,S^*),\ S-S^*\big\rangle \ge 0, \quad \forall\, S\in\mathcal{S}_+^p,$$
i.e., $(L^*,S^*)$ is a first-order stationary point of Problem (16). □

Author Contributions

Methodology, X.W., L.K. and L.W.; software, X.W. and Z.Y.; writing—original draft, X.W., L.K. and L.W.; validation, X.W., L.K., L.W. and Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 12071022) and the 111 Project of China (Grant No. B16002).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bickel, P.; Levina, E. Covariance regularization by thresholding. Ann. Stat. 2008, 36, 2577–2604. [Google Scholar] [CrossRef]
  2. El Karoui, N. Operator norm consistent estimation of large dimensional sparse covariance matrices. Ann. Stat. 2008, 36, 2717–2756. [Google Scholar] [CrossRef]
  3. Rothman, A.; Levina, E.; Zhu, J. Generalized thresholding of large covariance matrices. J. Am. Stat. Assoc. 2009, 104, 177–186. [Google Scholar] [CrossRef]
  4. Cai, T.; Liu, W. Adaptive thresholding for sparse covariance matrix estimation. J. Am. Stat. Assoc. 2011, 106, 672–684. [Google Scholar] [CrossRef] [Green Version]
  5. Cai, T.; Zhou, H. Minimax estimation of large covariance matrices under ℓ1 norm. Stat. Sinica 2008, 36, 2577–2604. [Google Scholar]
  6. Rothman, A. Positive definite estimators of large covariance matrices. Biometrika 2012, 99, 733–740. [Google Scholar] [CrossRef]
  7. Xue, L.; Ma, S.; Zou, H. Positive definite L1 penalized estimation of large covariance matrices. J. Am. Stat. Assoc. 2012, 107, 1480–1491. [Google Scholar] [CrossRef] [Green Version]
  8. Cui, Y.; Leng, C.; Sun, D. Sparse estimation of high-dimensional correlation matrices. Comput. Stat. Data. An. 2016, 93, 390–403. [Google Scholar] [CrossRef]
  9. Avella-Medina, M.; Battey, H.; Fan, J.; Li, Q. Robust estimation of high-dimensional covariance and precision matrices. Biometrika 2018, 105, 271–284. [Google Scholar] [CrossRef] [Green Version]
  10. Fan, J.; Fan, Y.; Lv, J. High dimensional covariance matrix estimation using a factor model. J. Econom. 2008, 147, 186–197. [Google Scholar] [CrossRef] [Green Version]
  11. Fan, J.; Liao, Y.; Mincheva, M. High dimensional covariance matrix estimation in approximate factor models. Ann. Stat. 2011, 39, 3320–3356. [Google Scholar] [CrossRef] [PubMed]
  12. Fan, J.; Liao, Y.; Mincheva, M. Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Ser. B 2013, 75, 603–680. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  13. Fan, J.; Wang, W.; Zhong, Y. Robust covariance estimation for approximate factor models. J. Econom. 2019, 208, 5–22. [Google Scholar] [CrossRef] [PubMed]
  14. Luo, X. High dimensional low rank and sparse covariance matrix estimation via convex minimization. Biometrika 2018, 105, 271–284. [Google Scholar]
  15. Candès, E.; Li, X.; Ma, Y.; Wright, J. Robust principal component analysis? J. ACM 2011, 58, 1–37. [Google Scholar]
  16. Feng, J.; Xu, H.; Yan, S. Online robust PCA via stochastic optimization. In Proceedings of the 26th International Conference on Neural Information Processing Systems, New York, NY, USA, 5 December 2013; pp. 404–412. [Google Scholar]
  17. Kang, Z.; Peng, C.; Cheng, Q. Robust PCA via nonconvex rank approximation. In Proceedings of the 2015 IEEE International Conference on Data Mining, Atlantic City, NJ, USA, 14–17 November 2015; pp. 1550–4786. [Google Scholar]
  18. Song, W.; Zhu, J.; Li, Y.; Chen, C. Image alignment by online robust PCA via stochastic gradient descent. IEEE Trans. Circ. Syst. Vid 2016, 26, 1241–1250. [Google Scholar] [CrossRef]
  19. Bouwmans, T.; Jaced, S.; Zhang, H.; Lin, Z.; Ricardo, O. On the applications of robust PCA in image and video processing. Proc. IEEE 2018, 106, 1427–1457. [Google Scholar] [CrossRef] [Green Version]
  20. Javed, S.; Mahmood, A.; Al-Maadeed, S.; Bouwmans, T.; Jung, S.K. Moving object detection in complex scene using spatiotemporal structured-sparse RPCA. IEEE Trans. Image Process 2018, 28, 1007–1022. [Google Scholar] [CrossRef]
  21. Liu, Q.; Li, X. Efficient low-rank matrix factorization based on 1,ϵ-norm for online background subtraction. IEEE T Circ. Syst. Vid 2022, 32, 4900–4904. [Google Scholar] [CrossRef]
  22. Fan, J.; Li, Q.; Wang, Y. Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. J. R. Stat. Soc. Ser. B 2017, 79, 247–265. [Google Scholar] [CrossRef] [Green Version]
  23. Chen, X.; Xu, F.; Ye, Y. Lower bound theory of nonzero entries in solutions of L2-Lp minimization. SIAM J. Sci. Comput. 2010, 32, 2832–2852. [Google Scholar] [CrossRef]
  24. Lu, Z. Iterative reweighted minimization methods for Lp regularized unconstrained nonlinear programming. Math. Program. 2014, 147, 277–307. [Google Scholar] [CrossRef]
  25. Peng, D.; Xiu, N.; Yu, J. Global optimality and fixed point continuation algorithm for non-Lipschitz ℓp regularized matrix minimization. Sci. China Math. 2018, 61, 171–184. [Google Scholar] [CrossRef]
  26. Rockafellar, R.T. Convex Analysis, 2nd ed; Princeton University Press: Princeton, NJ, USA, 1970. [Google Scholar]
  27. Mordukhovich, B.S.; Nguyen, M.N. An Easy Path to Convex Analysis and Applications; Morgan and Claypool: San Rafael, CA, USA, 2014. [Google Scholar]
  28. Zhang, C.; Chen, X. Smoothing projected gradient method and its application to stochastic linear complementarity problems. SIAM J. Optimiz. 2009, 20, 627–649. [Google Scholar] [CrossRef]
  29. Chen, X.; Ng, M.K.; Zhang, C. Nonconvex Lp regularization and box constrained model for image restoration. IEEE T Image Process 2012, 21, 4709–4721. [Google Scholar] [CrossRef]
  30. Chen, Y.; Xiu, N.; Peng, D. Global solutions of non-Lipschitz S2-Sp minimization over the positive semidefinite cone. Optim. Lett. 2014, 8, 2053–2064. [Google Scholar] [CrossRef]
  31. Beck, A. First-Order Methods in Optimization; Society for Industrial and Applied Mathematics and Mathematical Optimization Society: Philadelphia, PA, USA, 2017. [Google Scholar]
  32. Rockafellar, R.T.; Wets, R.J.-B. Variational Analysis; Springer: Berlin/Heidelberg, Germany, 1998. [Google Scholar]
Table 1. Simulation results for Model 1 over 100 replications. The standard errors are given in parentheses.
p | SE | ATE | OTE | POET | LOREC | Lq-CSCE
(Spectral norm)
30 | 3.76 (0.58) | 1.73 (0.50) | 1.89 (1.04) | 1.74 (0.48) | 3.04 (0.43) | 1.74 (0.48)
100 | 8.58 (0.67) | 2.69 (0.44) | 2.70 (0.50) | 2.61 (0.49) | 4.94 (0.38) | 2.60 (0.50)
200 | 14.10 (0.82) | 4.73 (0.52) | 3.27 (0.42) | 3.02 (0.45) | 6.17 (0.38) | 3.02 (0.46)
(Max norm)
30 | 1.25 (0.23) | 1.17 (0.28) | 1.16 (0.28) | 1.16 (0.28) | 1.22 (0.22) | 1.16 (0.28)
100 | 1.59 (0.26) | 1.59 (0.26) | 1.48 (0.32) | 1.51 (0.33) | 1.50 (0.25) | 1.51 (0.33)
200 | 1.73 (0.21) | 1.73 (0.21) | 1.63 (0.25) | 1.67 (0.25) | 1.63 (0.19) | 1.67 (0.25)
(Frobenius norm)
30 | 7.69 (0.52) | 3.17 (0.48) | 3.32 (1.03) | 3.22 (0.43) | 6.35 (0.47) | 3.12 (0.44)
100 | 25.30 (0.64) | 8.10 (0.66) | 7.24 (0.52) | 6.73 (0.49) | 16.06 (0.58) | 6.41 (0.52)
200 | 50.50 (0.79) | 21.41 (0.80) | 12.11 (0.48) | 9.80 (0.47) | 25.59 (0.62) | 9.25 (0.49)
Table 2. Simulation results for Model 2 over 100 replications. The standard errors are given in parentheses.
p | SE | ATE | OTE | POET | LOREC | Lq-CSCE
(Spectral norm)
30 | 3.30 (0.75) | 3.31 (0.10) | 3.28 (0.03) | 3.77 (0.79) | 3.23 (0.73) | 3.27 (0.77)
100 | 4.99 (0.75) | 7.60 (0.08) | 7.62 (0.13) | 5.48 (0.85) | 4.60 (0.62) | 4.60 (0.63)
200 | 6.85 (0.59) | 8.36 (0.79) | 8.65 (0.49) | 7.40 (0.28) | 6.30 (0.25) | 6.27 (0.63)
(Max norm)
30 | 0.73 (0.18) | 0.74 (0.17) | 0.73 (0.18) | 0.75 (0.19) | 0.72 (0.17) | 0.72 (0.16)
100 | 0.57 (0.08) | 0.62 (0.05) | 0.62 (0.05) | 0.56 (0.09) | 0.55 (0.07) | 0.50 (0.07)
200 | 0.54 (0.07) | 0.54 (0.07) | 0.52 (0.08) | 0.54 (0.07) | 0.52 (0.07) | 0.51 (0.07)
(Frobenius norm)
30 | 5.59 (0.58) | 5.70 (0.59) | 5.58 (0.57) | 5.47 (0.76) | 5.17 (0.58) | 4.98 (0.63)
100 | 12.62 (0.40) | 13.58 (0.05) | 13.49 (0.24) | 10.26 (0.57) | 10.06 (0.40) | 7.91 (0.48)
200 | 22.54 (0.39) | 14.21 (0.09) | 13.94 (0.02) | 16.04 (0.53) | 18.71 (0.39) | 14.37 (0.48)
Table 3. Simulation results for Model 3 over 100 replications. The standard errors are given in parentheses.
p | SE | ATE | OTE | POET | LOREC | Lq-CSCE
(Spectral norm)
30 | 5.59 (1.04) | 5.57 (0.85) | 5.53 (1.03) | 5.73 (1.09) | 5.57 (1.04) | 5.33 (0.99)
100 | 10.09 (0.77) | 9.71 (0.76) | 10.01 (0.77) | 10.19 (0.78) | 9.52 (0.76) | 9.21 (0.96)
200 | 15.00 (0.93) | 13.17 (0.84) | 14.22 (0.89) | 15.69 (0.95) | 14.41 (0.93) | 11.28 (1.11)
(Max norm)
30 | 1.63 (0.30) | 1.70 (0.28) | 1.63 (0.30) | 1.65 (0.29) | 1.63 (0.30) | 1.59 (0.29)
100 | 1.71 (0.29) | 1.71 (0.29) | 1.71 (0.29) | 1.71 (0.29) | 1.67 (0.29) | 1.72 (0.37)
200 | 1.76 (0.22) | 1.76 (0.22) | 1.75 (0.22) | 1.76 (0.22) | 1.75 (0.21) | 1.60 (0.16)
(Frobenius norm)
30 | 10.44 (0.88) | 11.09 (0.94) | 10.34 (0.87) | 10.54 (0.91) | 10.39 (0.88) | 10.10 (0.84)
100 | 27.99 (0.77) | 27.48 (0.76) | 27.78 (0.77) | 27.76 (0.78) | 25.69 (0.76) | 19.66 (0.96)
200 | 52.86 (0.75) | 49.25 (0.74) | 50.16 (0.73) | 49.76 (0.76) | 49.09 (0.74) | 32.23 (0.78)
Table 4. Record of the number of positive semi-definite matrices over 100 replications.
Method p = 30 p = 100 p = 200
L S L S L S
Model1
ATE------
OTE------
POET100010001000
LOREC1008610001000
L q -CSCE100100100100100100
Model2
ATE------
OTE------
POET10010010021000
LOREC010053100100100
L q -CSCE100100100100100100
Model3
ATE------
OTE------
POET100010001000
LOREC31002500
L q -CSCE100100100100100100
Table 5. Simulation results of the Lq-CSCE with different q for Model 1 over 100 replications. The standard errors are given in parentheses.
p | q = 0.1 | q = 0.3 | q = 0.5 | q = 0.7 | q = 0.9
(Spectral norm)
30 | 1.70 (0.49) | 1.71 (0.48) | 1.66 (0.51) | 1.68 (0.46) | 2.12 (0.43)
100 | 2.61 (0.54) | 2.61 (0.54) | 2.62 (0.55) | 2.80 (0.46) | 3.71 (0.32)
200 | 3.10 (0.52) | 3.11 (0.52) | 3.10 (0.53) | 3.42 (0.49) | 4.10 (0.46)
(Max norm)
30 | 1.18 (0.31) | 1.18 (0.32) | 1.13 (0.30) | 1.21 (0.32) | 1.21 (0.33)
100 | 1.53 (0.33) | 1.53 (0.32) | 1.53 (0.34) | 1.54 (0.31) | 1.52 (0.28)
200 | 1.65 (0.26) | 1.65 (0.27) | 1.65 (0.33) | 1.66 (0.32) | 1.66 (0.30)
(Frobenius norm)
30 | 3.12 (0.52) | 3.15 (0.51) | 3.10 (0.54) | 3.73 (0.53) | 4.45 (0.46)
100 | 6.38 (0.54) | 6.34 (0.54) | 6.34 (0.52) | 6.43 (0.64) | 7.21 (0.55)
200 | 9.31 (0.53) | 9.22 (0.55) | 9.21 (0.57) | 9.70 (0.51) | 11.12 (0.49)
Table 6. Simulation results of the Lq-CSCE with different q for Model 2 over 100 replications. The standard errors are given in parentheses.
p | q = 0.1 | q = 0.3 | q = 0.5 | q = 0.7 | q = 0.9
(Spectral norm)
30 | 3.41 (0.75) | 3.41 (0.74) | 3.40 (0.72) | 3.39 (0.68) | 3.38 (0.62)
100 | 4.81 (0.75) | 4.75 (0.10) | 4.77 (0.03) | 4.74 (0.79) | 4.83 (0.79)
200 | 6.35 (0.75) | 6.30 (0.10) | 6.31 (0.03) | 6.40 (0.79) | 6.41 (0.79)
(Max norm)
30 | 0.73 (0.25) | 0.74 (0.18) | 0.74 (0.18) | 0.73 (0.18) | 0.74 (0.19)
100 | 0.57 (0.11) | 0.53 (0.10) | 0.53 (0.07) | 0.53 (0.04) | 0.54 (0.05)
200 | 0.53 (0.10) | 0.50 (0.07) | 0.51 (0.09) | 0.52 (0.07) | 0.54 (0.08)
(Frobenius norm)
30 | 5.58 (0.62) | 5.36 (0.65) | 5.33 (0.64) | 5.32 (0.61) | 5.52 (0.56)
100 | 8.15 (0.62) | 8.09 (0.52) | 8.11 (0.50) | 8.30 (0.52) | 9.14 (0.57)
200 | 14.81 (0.65) | 14.49 (0.57) | 14.62 (0.53) | 14.74 (0.56) | 15.55 (0.49)
Table 7. Simulation results of the Lq-CSCE with different q for Model 3 over 100 replications. The standard errors are given in parentheses.
p | q = 0.1 | q = 0.3 | q = 0.5 | q = 0.7 | q = 0.9
(Spectral norm)
30 | 5.80 (1.01) | 5.63 (0.99) | 5.51 (0.98) | 5.71 (0.95) | 5.65 (1.00)
100 | 8.91 (1.04) | 8.89 (0.90) | 8.97 (0.93) | 8.85 (0.84) | 9.21 (0.79)
200 | 12.40 (1.13) | 12.20 (1.04) | 12.21 (1.01) | 12.34 (1.06) | 13.37 (0.99)
(Max norm)
30 | 1.60 (0.35) | 1.56 (0.29) | 1.56 (0.28) | 1.56 (0.31) | 1.60 (0.29)
100 | 1.73 (0.43) | 1.73 (0.31) | 1.74 (0.34) | 1.76 (0.33) | 1.69 (0.31)
200 | 1.71 (0.29) | 1.71 (0.22) | 1.72 (0.19) | 1.73 (0.19) | 1.78 (0.21)
(Frobenius norm)
30 | 10.12 (0.91) | 9.98 (0.87) | 10.03 (0.89) | 10.14 (0.84) | 10.42 (0.86)
100 | 20.60 (1.03) | 20.01 (0.99) | 20.04 (1.01) | 20.45 (0.95) | 21.36 (0.83)
200 | 31.56 (0.99) | 31.40 (0.90) | 31.29 (0.84) | 31.49 (0.79) | 34.47 (0.72)
