Article

High-Dimensional Covariance Estimation via Constrained Lq-Type Regularization

1 School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China
2 Department of Statistics, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
3 Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(4), 1022; https://doi.org/10.3390/math11041022
Submission received: 23 January 2023 / Revised: 14 February 2023 / Accepted: 15 February 2023 / Published: 17 February 2023
(This article belongs to the Section Computational and Applied Mathematics)

Abstract: High-dimensional covariance matrix estimation is one of the fundamental and important problems in multivariate analysis and has a wide range of applications in many fields. In practice, it is common that a covariance matrix is composed of a low-rank matrix and a sparse matrix. In this paper, we estimate the covariance matrix by solving a constrained $L_q$-type regularized optimization problem. We establish the first-order optimality conditions for this problem by using the proximal mapping and the subspace method. The proposed stationary point degenerates to the first-order stationary points of the unconstrained $L_q$-regularized sparse or low-rank optimization problems. A smoothing alternating updating method is proposed to find an estimator for the covariance matrix, and we establish the convergence of the proposed method. The numerical simulation results show the effectiveness of the proposed approach for high-dimensional covariance estimation.

1. Introduction

High-dimensional covariance matrix estimation is an important topic in statistical inference and learning. The objective is to estimate the covariance matrix of p variables based on sample data of size n. Much statistical analysis of high-dimensional data involves covariance estimation. In a high-dimensional setting (usually $p\gg n$), researchers usually assume that the covariance matrix is sparse and have proposed various regularization techniques [1,2,3,4,5,6,7,8,9] to consistently estimate the covariance matrix $\Sigma$. However, this assumption is not appropriate in many applications, such as financial analysis depending on equity market risks, gene expressions stimulated by cytokines, etc. A more reasonable assumption, based on the factor model [10,11,12,13,14], is that the high-dimensional covariance matrix is the sum of a low-rank matrix and a sparse matrix. Specifically, a factor model can be formulated as
$$Y_i = B f_i + u_i, \quad i = 1,\ldots,n, \tag{1}$$
where $Y_i = (Y_{1i},\ldots,Y_{pi})^T$ is the observed variable, $B = (b_1,\ldots,b_p)^T$ is the matrix of factor loadings, $f_i$ is a $K\times 1$ vector of common factors, and $u_i = (u_{1i},\ldots,u_{pi})^T$ is the idiosyncratic error component, uncorrelated with $f_i$. Under this model, the covariance matrix of $Y_i$ is given by
$$\Sigma = B\,\mathrm{cov}(f_i)\,B^T + \Sigma_u := L + S, \tag{2}$$
where $L$ is the covariance matrix of $Bf_i$ and $S$ is the covariance matrix of $u_i$. Since $K$ is usually less than $p$, $L$ is a low-rank matrix. More generally, there is a large class of problems where one is interested in estimating the covariance matrix of $X_i$ from the noisy observations $Y_i = X_i + u_i$, where the noise $u_i$ is uncorrelated with the signal $X_i$. In addition, if $X_i$ is generated from a linear system so that some of its elements are linear functions of the others, then the covariance matrix of $X_i$ has low rank. Based on the structural decomposition (2), this paper proposes a novel approach to estimating the covariance matrix by solving a constrained low-rank and sparse regularized optimization problem.
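To make the decomposition (2) concrete, the following minimal sketch (Python with NumPy; all dimensions, seeds, and variable names are illustrative and not taken from the paper) simulates data from a small factor model and compares the sample covariance with the low-rank-plus-sparse truth.

```python
import numpy as np

# Minimal sketch of the factor model Y_i = B f_i + u_i and the induced
# decomposition Sigma = B cov(f) B^T + Sigma_u =: L + S (all names illustrative).
rng = np.random.default_rng(0)
p, K, n = 50, 3, 100

B = rng.normal(size=(p, K))                 # factor loadings
f = rng.normal(size=(n, K))                 # common factors, cov(f) = I_K
u = rng.normal(scale=0.5, size=(n, p))      # idiosyncratic errors, uncorrelated with f
Y = f @ B.T + u                             # observations, one row per sample

L_true = B @ B.T                            # low-rank part, rank K << p
S_true = 0.25 * np.eye(p)                   # sparse (here diagonal) part
Sigma_true = L_true + S_true

Sigma_sample = np.cov(Y, rowvar=False)      # ordinary sample covariance
print(np.linalg.matrix_rank(L_true), np.linalg.norm(Sigma_sample - Sigma_true, 2))
```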
Under the structural decomposition (2), researchers have proposed several approaches to estimate the covariance matrix. When the common factors in Model (1) are known and observable and $S$ is diagonal, Reference [10] proposed a least-squares-based covariance estimation (LS-CE). The core idea of the LS-CE is first to estimate $B$ by the least-squares method and then to obtain the final estimator assuming that $S$ is diagonal. Compared to the sample covariance matrix estimator, the LS-CE estimator is always invertible, even in a high-dimensional setting. Moreover, this estimator is shown to have substantial gains when estimating the inverse of the covariance matrix; however, it does not have much advantage when estimating the covariance matrix itself. Instead of assuming a diagonal $S$, Reference [11] assumed that $S$ is sparse and applied the adaptive thresholding technique of [4] to obtain an estimator for $S$. Then, a least-squares-based covariance estimator with conditional sparsity (LS-CS-CE) under Model (1) was studied in [11]. More recently, a robust approach based on Huber loss minimization was proposed in [13]. This approach first estimates the joint covariance matrix of the observed data and the factors and then uses it to recover the covariance matrix of the observed data. Moreover, the proposed estimator in [13] only requires that the fourth moment of the data exists; this method is applicable to a wider range of data, including sub-Gaussian and elliptical distributions. When the common factors are unknown and unobservable, Reference [12] proposed a non-parametric estimator via thresholding the principal orthogonal complements, which is called the principal orthogonal complement thresholding (POET) method. The POET uses the first $K$ principal components to estimate $L$ and applies the adaptive thresholding technique [4] to the remaining $(p-K)$ components to estimate $S$. However, as stated in [9], using the simple thresholding approaches in [1,2,3,4,5,9] to estimate $S$ may lead to covariance estimators in [11,12,13] that are not necessarily positive semi-definite.
Moreover, as suggested in [14], an intuitive optimization-type approach is to solve the problem
$$\min_{L,S\in\mathbb{R}^{p\times p}}\ \frac{1}{2}\|L+S-\tilde\Sigma\|_F^2 + \lambda_L\,\mathrm{rank}(L) + \lambda_S\sum_{(u,v)\in D^c} I(|S_{uv}|), \tag{3}$$
where $\tilde\Sigma$ is the sample covariance matrix, $\mathrm{rank}(L)$ denotes the rank of the matrix $L$, $I(|S_{uv}|) = 1$ if $|S_{uv}|\ne 0$ and $I(|S_{uv}|) = 0$ if $|S_{uv}| = 0$, $D^c := \{(u,v)\in\mathbb{N}\times\mathbb{N}: 1\le u\le p,\ 1\le v\le p,\ u\ne v\}$ is the index set of the non-diagonal elements of $S$, and $\lambda_L$ and $\lambda_S$ are tuning parameters. To facilitate the calculation, Reference [14] proposed the low-rank and sparse covariance estimator (LOREC) via solving the following convex relaxation optimization problem:
$$\min_{L,S\in\mathbb{R}^{p\times p}}\ \frac{1}{2}\|L+S-\tilde\Sigma\|_F^2 + \lambda_L\|L\|_* + \lambda_S\sum_{(u,v)\in D^c}|S_{uv}|, \tag{4}$$
where $\|L\|_* := \sum_{i=1}^p\sigma_i(L)$ denotes the nuclear norm of $L$ and $\sigma_i(L)$ is the $i$th largest singular value of $L$. However, the estimators for $L$ and $S$ proposed in [14] are not guaranteed to be positive semi-definite. Clearly, the optimization problems (3) and (4) can be considered as special cases of the robust principal component analysis (RPCA) in [15] and its convex relaxation, respectively.
In this paper, we propose an optimization model similar to some models of RPCA, which is formulated as
$$\min_{L,S\in\mathbb{R}^{p\times p}}\ \mathrm{rank}(L) + \lambda_S\|S\|_0, \quad \mathrm{s.t.}\ L+S = \tilde\Sigma,$$
where $\|S\|_0$ denotes the number of non-zero elements of $S$. Reference [15] showed that the principal component pursuit estimate, obtained by solving
$$\min_{L,S\in\mathbb{R}^{p\times p}}\ \|L\|_* + \lambda_S\sum_{u=1}^p\sum_{v=1}^p|S_{uv}|, \quad \mathrm{s.t.}\ L+S = \tilde\Sigma,$$
exactly recovers the low-rank $L$ and the sparse $S$. Furthermore, Reference [16] suggested that $L$ and $S$ can be estimated by solving the following unconstrained optimization problem:
$$\min_{L,S\in\mathbb{R}^{p\times p}}\ \|L\|_* + \lambda_S\sum_{u=1}^p\sum_{v=1}^p|S_{uv}| + \frac{\lambda_L}{2}\|L+S-\tilde\Sigma\|_F^2.$$
More references about the RPCA and related works based on convex and non-convex regularization can be found in [17,18,19,20,21].
The main contributions of this paper can be summarized as follows:
  • We construct a constrained adaptive $L_q$-type regularized optimization problem for covariance estimation, which is a non-smooth and non-convex problem. This optimization problem is based on the structural decomposition (2) and the idea of RPCA in [16], but is not limited to the factor model (1).
  • We study the first-order optimality condition of this problem and define a class of hybrid first-order stationary points. We establish the relationship between the first-order stationary points and the global minimizers of this problem.
  • We propose a smoothing alternating updating method to solve the constructed non-smooth, non-convex optimization problem and establish its convergence. The simulation results show that the proposed smoothing alternating updating method is efficacious for the constrained adaptive $L_q$-type regularized optimization problem.
The rest of this paper is organized as follows. In Section 2, we give some notations and preliminaries. In Section 3, we construct a constrained adaptive L q -type regularized optimization problem for covariance estimation. In Section 4, we study the first-order optimality condition of this problem and define a class of hybrid first-order stationary points. In Section 5, we propose a smoothing alternating updating method to solve the non-smooth, non-convex optimization problem and establish its convergence. The simulation results are given in Section 6. The conclusions and discussion are given in Section 7, while the mathematical proofs are given in Section 8.

2. Notations and Preliminaries

Throughout this paper, we use the following notation. The sets of $p\times p$ positive semi-definite symmetric and positive definite symmetric matrices are denoted by $\mathcal{S}_+^p$ and $\mathcal{S}_{++}^p$, respectively. It is known that $\mathcal{S}_+^p$ and $\mathcal{S}_{++}^p$ are cones.
Given matrices $X := (X_{uv})_{u,v=1}^p\in\mathbb{R}^{p\times p}$ and $Y := (Y_{uv})_{u,v=1}^p\in\mathbb{R}^{p\times p}$, where $X_{uv}$ is the $(u,v)$th element of $X$, we write $\langle X, Y\rangle := \sum_{u=1}^p\sum_{v=1}^p X_{uv}Y_{uv}$, and $X\circ Y$ denotes the Hadamard product, $(X\circ Y)_{uv} := X_{uv}Y_{uv}$ for all $u,v = 1,\ldots,p$. Define the elementwise maximum norm (or max norm) $|X|_{\max} := \max_{u,v}|X_{uv}|$, the Frobenius norm $\|X\|_F := \big(\sum_{u=1}^p\sum_{v=1}^p X_{uv}^2\big)^{1/2}$, and the spectral norm $\|X\|_2 := \sup_{\|x\|_2\le 1}\|Xx\|_2$, where $\|x\|_2 := \big(\sum_{i=1}^p x_i^2\big)^{1/2}$ is the 2-norm of the vector $x$. Moreover, given a matrix $X\in\mathbb{R}^{p\times p}$ and an index set $\Gamma\subseteq\{(u,v): 1\le u\le p,\ 1\le v\le p\}$, $X_\Gamma$ denotes the array of the same dimension such that $[X_\Gamma]_{uv} = X_{uv}$ for $(u,v)\in\Gamma$ and $[X_\Gamma]_{uv}$ is undefined for $(u,v)\in\Gamma^c$. For convenience, we write the $(u,v)$th element of $X_\Gamma$ as $X_{uv}$ regardless of whether it is defined or not.
Given a vector $x\in\mathbb{R}^p$, $\mathrm{Diag}(x)$ denotes the square matrix with the vector $x$ on its diagonal and zeros elsewhere. Given a matrix $X\in\mathcal{S}_+^p$, let $D_i(X)$ denote the $i$th largest eigenvalue of $X$, $i = 1,\ldots,p$, and write $D(X) = (D_1(X),\ldots,D_p(X))^T\in\mathbb{R}^p$. Define the sign operator
$$\mathrm{sign}(t) = \begin{cases} 1, & \text{if } t > 0,\\ 0, & \text{if } t = 0,\\ -1, & \text{if } t < 0, \end{cases}$$
and set $D := \{(u,v)\in\mathbb{N}\times\mathbb{N}: 1\le u\le p,\ 1\le v\le p,\ u = v\}$.
Let $f: \mathbb{R}^{p\times p}\to\bar{\mathbb{R}}$ be a convex function, and let $\bar X\in\mathrm{dom} f$. The subdifferential of $f$ at $\bar X$ is defined by
$$\partial f(\bar X) := \left\{\xi\in\mathbb{R}^{p\times p}:\ \lim_{t\downarrow 0}\frac{f(\bar X + tZ) - f(\bar X)}{t} \ge \langle\xi, Z\rangle \ \text{ for all } Z\in\mathbb{R}^{p\times p}\right\},$$
where $\mathrm{dom} f := \{X\in\mathbb{R}^{p\times p}: f(X) < \infty\}$ denotes the domain of $f$ and $\bar{\mathbb{R}} := (-\infty, \infty]$. In other words, $\partial f(\bar X)$ is the closed convex hull of all points of the form $\lim_{i\to\infty}\nabla f(X_i)$, where $\{X_i\}$ is any sequence that converges to $\bar X$ while avoiding the set on which $f$ fails to be differentiable. Moreover, the singular subdifferential of $f$ at $\bar X$ is defined by
$$\partial^{\infty} f(\bar X) := \left\{\xi\in\mathbb{R}^{p\times p}:\ (\xi, 0)\in N_{\mathrm{epi} f}\big(\bar X, f(\bar X)\big)\right\},$$
where $\mathrm{epi} f := \{(X,t)\in\mathbb{R}^{p\times p}\times\mathbb{R}:\ X\in\mathrm{dom} f,\ t\ge f(X)\}$ denotes the epigraph of $f$ and $N_{\mathrm{epi} f}(\bar X, f(\bar X))$ is the normal cone to $\mathrm{epi} f$ at $(\bar X, f(\bar X))$.
Finally, we give the definition of the proximal mapping used in this paper. Given a closed set $\Omega$ and a function $f:\mathbb{R}^{p\times p}\to\bar{\mathbb{R}}$, the proximal mapping on $\Omega$ of $f$ is the operator given by
$$\mathrm{Prox}_{f,\Omega}(X) = \operatorname*{argmin}_{U\in\Omega}\left\{f(U) + \frac{1}{2}\|U - X\|_F^2\right\} \quad \text{for any } X\in\mathbb{R}^{p\times p}.$$
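As a simple illustration of this proximal mapping, when $\Omega = \mathbb{R}^{p\times p}$ and $f(U) = \lambda\|U\|_1$ (the entrywise $L_1$ norm, used here only as an example rather than in Problem (7) itself), the proximal mapping reduces to entrywise soft thresholding; the following sketch shows this special case with illustrative names.

```python
import numpy as np

# Example of the proximal mapping: with Omega = R^{p x p} and f(U) = lam * ||U||_1
# (entrywise), Prox_{f,Omega}(X) is entrywise soft thresholding.
def prox_l1(X, lam):
    return np.sign(X) * np.maximum(np.abs(X) - lam, 0.0)

X = np.array([[1.5, -0.2], [0.4, -2.0]])
print(prox_l1(X, lam=0.5))   # entries with |X_uv| <= 0.5 are set to zero
```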

3. Covariance Estimation with Adaptive L q -Type Regularization

Considering that sparse covariance matrix estimation is intrinsically a heteroscedastic problem ([4]) and that $L$ and $S$ are both positive semi-definite matrices, we estimate the covariance matrix by solving the following adaptive $L_q$-type regularized optimization problem:
$$\min_{L,S\in\mathcal{S}_+^p}\ F(L,S) := \frac{1}{2}\|L+S-\tilde\Sigma\|_F^2 + \lambda_L\|L\|_q^q + \sum_{(u,v)\in D^c}\lambda_{uv}^S\,|S_{uv}|^q, \tag{7}$$
where $q\in(0,1)$, $\|L\|_q^q := \sum_{i=1}^p\sigma_i(L)^q$ is the Schatten-$q$ quasi-norm of the matrix $L$, and $\lambda_L$ and $\lambda_{uv}^S$ are tuning parameters. Note that, although we use the same $q$ for $L$ and $S$ here for simplicity, they can be different in general. Our approach is called the $L_q$-regularized covariance estimation with conditional sparsity ($L_q$-CSC). The final estimator $\hat\Sigma := \hat L + \hat S$ is called the $L_q$-regularized covariance estimator with conditional sparsity ($L_q$-CSCE).
To deal with heteroscedasticity in sparse covariance matrix estimation, we adopt the adaptive thresholding approach of References [4,12] to calculate the entry-dependent threshold $\lambda_{uv}^S$ in (7) for $S_{uv}$ as
$$\lambda_{uv}^S := \lambda\left(\tilde\sigma_{uu}\,\tilde\sigma_{vv}\,\frac{\log p}{n}\right)^{1/2},$$
where $\lambda > 0$ is a constant. Note that these entry-dependent tuning parameters for $S$ in Problem (7) depend on the variances of the corresponding variables. The parameter $\tilde\sigma_{uu}$ can be taken as the $u$th diagonal element of the POET estimator of $S$. It is worth mentioning that, when the data matrix is normalized, the covariance matrix is equivalent to its correlation matrix, in which case all tuning parameters equal $\lambda\sqrt{\log p/n}$ and the adaptive $L_q$ regularization function acting on $S$ becomes the ordinary $L_q$ regularization function.
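A minimal sketch of how the entry-dependent tuning parameters $\lambda_{uv}^S$ above could be computed is given below (Python/NumPy; the function name and the choice of diagonal variance estimates are illustrative assumptions).

```python
import numpy as np

# Sketch of the entry-dependent tuning parameters lambda_{uv}^S,
# assuming sigma_tilde is a vector of diagonal variance estimates (e.g., from the POET).
def adaptive_thresholds(sigma_tilde, n, lam):
    p = sigma_tilde.shape[0]
    lam_S = lam * np.sqrt(np.outer(sigma_tilde, sigma_tilde) * np.log(p) / n)
    np.fill_diagonal(lam_S, 0.0)   # diagonal entries of S are not penalized (D^c only)
    return lam_S
```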
We now discuss the selection of $\tilde\Sigma$. The most common choice is the ordinary sample covariance matrix
$$\frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar Y)(Y_i - \bar Y)^T,$$
where $\bar Y = n^{-1}\sum_{i=1}^n Y_i$ and $Y_i$ is the $i$th sample vector, $i = 1,\ldots,n$. The ordinary sample covariance is suitable for data with exponential-type tails or polynomial-type tails; see [1,4,8]. However, in many practical problems, the data tend to be heavy-tailed. References [9,13,22] defined a Huber minimization sample covariance matrix, which is suitable for data with a finite fourth moment. Moreover, Reference [9] also defined a rank-based covariance matrix and a median-of-means covariance matrix for data under other distributional conditions.
Finally, we discuss some basic properties of Problem (7).
Proposition 1. 
The following statements about $F(L,S): \mathbb{R}^{p\times p}\times\mathbb{R}^{p\times p}\to\mathbb{R}_+$ in Problem (7) hold:
(i)
$F(L,S)$ is proper, closed, and coercive, i.e., $\lim_{\|(L,S)\|_F\to\infty}F(L,S) = \infty$.
(ii)
$F(L,S)$ attains its minimal value over $\mathcal{S}_+^p\times\mathcal{S}_+^p$.
(iii)
Let $f(L,S) = \frac{1}{2}\|L+S-\tilde\Sigma\|_F^2$; the gradient $\nabla f(L,S)$ is Lipschitz continuous with constant 2, that is, for any $(L_1,S_1),(L_2,S_2)\in\mathbb{R}^{p\times p}\times\mathbb{R}^{p\times p}$, it holds that
$$\|\nabla f(L_1,S_1) - \nabla f(L_2,S_2)\|_F \le 2\,\|(L_1,S_1)-(L_2,S_2)\|_F.$$
(iv)
Given a matrix $L\in\mathbb{R}^{p\times p}$, the partial gradient $\nabla_S f(L,S)$ is Lipschitz continuous with constant 1, that is, for any $S_1,S_2\in\mathbb{R}^{p\times p}$, it holds that
$$\|\nabla_S f(L,S_1) - \nabla_S f(L,S_2)\|_F \le \|S_1 - S_2\|_F.$$
Moreover, given a matrix $S\in\mathbb{R}^{p\times p}$, the partial gradient $\nabla_L f(L,S)$ is Lipschitz continuous with constant 1.

4. Necessary Optimality Conditions

In this section, we define a class of first-order stationary points for Problem (7), which is a hybridization of the first-order stationary points in [23,24,25] and the fixed point in [25]. Furthermore, we prove that any global minimizer of Problem (7) is a first-order stationary point of Problem (7). We also define an approximate variant of this stationary point. It is worth mentioning that the constraints in Problem (7) are treated as ordinary closed convex set constraints in this paper.
For the simple unconstrained $L_q$-regularized sparse optimization problem [23,24],
$$\min_{x\in\mathbb{R}^p}\ f(x) + \lambda\|x\|_q^q,$$
where $\|x\|_q^q := \sum_{i=1}^p|x_i|^q$, its first-order stationary point is defined as a point satisfying
$$\mathrm{Diag}(x)\,\nabla f(x) + \lambda q\,|x|^q = 0.$$
For the simple unconstrained $L_q$-regularized rank optimization problem [25],
$$\min_{X\in\mathbb{R}^{m\times n}}\ f(X) + \lambda\|X\|_q^q,$$
its first-order stationary point (or fixed point) is defined as a point satisfying
$$X \in \mathrm{Prox}_{(\lambda/\gamma)\|\cdot\|_q^q,\ \mathbb{R}^{m\times n}}\left(X - \frac{1}{\gamma}\nabla f(X)\right),$$
where $\gamma$ is a constant greater than the Lipschitz constant of $\nabla f(X)$. Inspired by these works, we can define a class of hybrid first-order stationary points of Problem (7).
Definition 1. 
$(L,S)\in\mathcal{S}_+^p\times\mathcal{S}_+^p$ is said to be a first-order stationary point of Problem (7) if $(L,S)$ satisfies
$$L \in \mathrm{Prox}_{(\lambda_L/\gamma)\|\cdot\|_q^q,\ \mathcal{S}_+^p}\left(L - \frac{1}{\gamma}(L+S-\tilde\Sigma)\right), \tag{9}$$
$$0 \in S\circ(L+S-\tilde\Sigma) + Q + S\circ\big\{Y:\ Y_{\Gamma\cup D}\in\partial h_{\Gamma\cup D}(S_{\Gamma\cup D})\big\}, \tag{10}$$
where $Q := (Q_{uv})_{u,v=1}^p$ with
$$Q_{uv} = \begin{cases} q\,\lambda_{uv}^S|S_{uv}|^q, & \text{if } (u,v)\in D^c,\\ 0, & \text{if } (u,v)\in D, \end{cases}$$
$\Gamma = \{(u,v): S_{uv}\ne 0\}$, $h_{\Gamma\cup D}(S_{\Gamma\cup D}) := \delta_{\mathcal{S}_+^p}([S_{\Gamma\cup D}:0])$ with $\delta_{\mathcal{S}_+^p}$ the indicator function of $\mathcal{S}_+^p$, $Y = [Y_{\Gamma\cup D}: Y_{(\Gamma\cup D)^c}]$, and $Y_{(\Gamma\cup D)^c}$ can be any finite array.
Next, we investigate the relationship between the global minimizer of Problem (7) and the first-order stationary point of Problem (7) without any qualification conditions.
Theorem 1. 
Suppose that $(L^*,S^*)\in\mathcal{S}_+^p\times\mathcal{S}_+^p$ is a global minimizer of Problem (7), and let $\Gamma = \{(u,v)\in\mathbb{N}\times\mathbb{N}: S^*_{uv}\ne 0\}$. Then, $(L^*,S^*)$ is a first-order stationary point of Problem (7) with $\gamma > 1$.
Again, under some qualification conditions, the form of the first-order stationary point of Problem (7) can be analyzed further. We now discuss the specific form of the subdifferential of $h_{\Gamma\cup D}$ at $S_{\Gamma\cup D}$. Note that
$$h_{\Gamma\cup D}(X_{\Gamma\cup D}) = \delta_{\mathcal{S}_+^p}([X_{\Gamma\cup D}:0]) = \delta_{\mathcal{S}_+^p}\big(\mathcal{L}_{\Gamma\cup D}(X_{\Gamma\cup D})\big),$$
where $\mathcal{L}_{\Gamma\cup D}(X_{\Gamma\cup D}) := [X_{\Gamma\cup D}:0]$ is the linear mapping acting on the array $X_{\Gamma\cup D}$. The adjoint mapping $\mathcal{L}^*_{\Gamma\cup D}: \mathbb{R}^{p\times p}\to\mathbb{R}^{p\times p}$ is defined by
$$\big\langle\mathcal{L}_{\Gamma\cup D}(X_{\Gamma\cup D}),\ Y\big\rangle = \big\langle X_{\Gamma\cup D},\ \mathcal{L}^*_{\Gamma\cup D}(Y)\big\rangle \quad \text{for all } X_{\Gamma\cup D} \text{ and } Y\in\mathbb{R}^{p\times p}.$$
Specifically, $\mathcal{L}^*_{\Gamma\cup D}(Y) = Y_{\Gamma\cup D}$. Clearly, $\delta_{\mathcal{S}_+^p}$ is convex, and $\mathcal{L}_{\Gamma\cup D}$ and $\mathcal{L}^*_{\Gamma\cup D}$ are linear mappings. It follows from Theorem 23.9 in [26] and Theorem 2.51 in [27] that, if $S\in\mathcal{S}_+^p$, then
$$\big\{\mathcal{L}^*_{\Gamma\cup D}(Y):\ Y\in N_{\mathcal{S}_+^p}(S)\big\} \subseteq \partial h_{\Gamma\cup D}(S_{\Gamma\cup D}), \tag{11}$$
where $N_{\mathcal{S}_+^p}(S)$ denotes the normal cone to $\mathcal{S}_+^p$ at $S$. Moreover, suppose that the qualification condition $\mathrm{Ker}\,\mathcal{L}^*_{\Gamma\cup D}\cap\partial^{\infty}\delta_{\mathcal{S}_+^p}(S) = \{0\}$ holds, where $\mathrm{Ker}\,\mathcal{L}^*_{\Gamma\cup D}$ is the kernel of the linear mapping $\mathcal{L}^*_{\Gamma\cup D}$; then
$$\big\{\mathcal{L}^*_{\Gamma\cup D}(Y):\ Y\in N_{\mathcal{S}_+^p}(S)\big\} = \partial h_{\Gamma\cup D}(S_{\Gamma\cup D}).$$
Based on these discussions, the following statement holds: if $(L,S)$ satisfies
$$L \in \mathrm{Prox}_{\lambda_L\|\cdot\|_q^q,\ \mathcal{S}_+^p}(\tilde\Sigma - S), \qquad 0 \in S\circ(L+S-\tilde\Sigma) + Q + S\circ\big\{Y:\ Y\in N_{\mathcal{S}_+^p}(S)\big\},$$
where $Q$ is defined in Definition 1, and the qualification condition $\mathrm{Ker}\,\mathcal{L}^*_{\Gamma\cup D}\cap\partial^{\infty}\delta_{\mathcal{S}_+^p}(S) = \{0\}$ holds, then $(L,S)$ is a first-order stationary point of Problem (7).
Obviously, the qualification condition $\mathrm{Ker}\,\mathcal{L}^*_{\Gamma\cup D}\cap\partial^{\infty}\delta_{\mathcal{S}_+^p}(S) = \{0\}$ holds automatically if $\delta_{\mathcal{S}_+^p}$ is locally Lipschitz continuous at $S$. In other words, if $S\in\mathrm{int}(\mathcal{S}_+^p) = \mathcal{S}_{++}^p$, this qualification condition holds. Indeed, the true covariance matrix $S^o$ is often assumed to be positive definite [7,8]. When $S\in\mathcal{S}_{++}^p$, we can establish the following lower bound theory for the first-order stationary points of Problem (7).
Theorem 2. 
Let $(L,S)$ be a first-order stationary point of Problem (7) satisfying $F(L,S)\le F(L_0,S_0)+\epsilon$ for some $L_0, S_0$ and $\epsilon\ge 0$, and let $\Gamma := \{(u,v): S_{uv}\ne 0\}$. If $S\in\mathcal{S}_{++}^p$, then
$$|S_{uv}| \ge \left(\frac{q\,\min\{\lambda_{uv}^S\}}{\sqrt{2\,[F(L_0,S_0)+\epsilon]}}\right)^{\frac{1}{1-q}} \quad \forall\,(u,v)\in D^c\cap\Gamma.$$
Here, we only give the lower bound theory for the first-order stationary points of Problem (7). With reference to the definitions of the first- and second-order stationary points established by [23,24] for the general unconstrained $L_q$-regularized optimization, a class of second-order necessary optimality conditions of Problem (7) and its lower bound theory can also be considered.
Note that Problem (7) is actually a very challenging optimization problem due to the positive semi-definite constraints and the non-Lipschitz regularization terms; even finding a first-order stationary point is difficult. Here, we define a class of approximate first-order stationary points of Problem (7).
Definition 2. 
We say that $(L,S)\in\mathcal{S}_+^p\times\mathcal{S}_+^p$ is an $\varepsilon$-approximate first-order stationary point ($\varepsilon\ge 0$) of Problem (7) if there exists a set $\Gamma\subseteq\{(u,v): 1\le u,v\le p,\ S_{uv}\ne 0\}$ such that
$$\mathrm{dist}\left(L,\ \mathrm{Prox}_{(\lambda_L/\gamma)\|\cdot\|_q^q,\ \mathcal{S}_+^p}\left(L - \frac{1}{\gamma}(L+S-\tilde\Sigma)\right)\right) \le \varepsilon, \tag{13}$$
$$\mathrm{dist}\Big(0,\ \big\{L_{\Gamma\cup D} + S_{\Gamma\cup D} - \tilde\Sigma_{\Gamma\cup D} + Q_{\Gamma\cup D} + \Xi_{\Gamma\cup D}:\ \Xi_{\Gamma\cup D}\in\partial h_{\Gamma\cup D}(S_{\Gamma\cup D})\big\}\Big) \le \varepsilon, \tag{14}$$
$$\|S_{(\Gamma\cup D)^c}\|_F \le \varepsilon, \tag{15}$$
where Q is defined in Definition 1.
Moreover, if $(L,S)$ is a first-order stationary point of Problem (7), then it is an $\varepsilon$-approximate first-order stationary point of Problem (7) with $\varepsilon = 0$; conversely, if $(L,S)$ is an $\varepsilon$-approximate first-order stationary point of Problem (7) with $\varepsilon = 0$, then it is a first-order stationary point of Problem (7). In practice, the stopping point of any algorithm for an optimization problem is usually an approximate stationary point, rather than a stationary point.

5. Numerical Optimization

In this section, we propose a smoothing alternating updating (SAU) method to solve Problem (7), which combines the smoothing projected gradient step in [28,29] and the proximal gradient step in [30] under the framework of the ordinary alternating minimization method. The convergence of the SAU method is established.

5.1. A Smooth Approximation Optimization Problem

Referring to the smoothing technique in [23,29], a class of smooth approximations of Problem (7) can be given by
$$\min_{L,S\in\mathcal{S}_+^p}\ F_\mu(L,S) := \frac{1}{2}\|L+S-\tilde\Sigma\|_F^2 + \lambda_L\|L\|_q^q + \sum_{(u,v)\in D^c}\lambda_{uv}^S\,\psi_\mu(S_{uv})^q, \tag{16}$$
where $\mu\in(0,\infty)$ and
$$\psi_\mu(S_{uv}) = \begin{cases} |S_{uv}|, & |S_{uv}| > \mu,\\ \dfrac{S_{uv}^2}{2\mu} + \dfrac{\mu}{2}, & |S_{uv}| \le \mu. \end{cases}$$
Obviously, $\psi_\mu(S_{uv})$ is continuously differentiable and
$$\big(\psi_\mu(S_{uv})^q\big)' = \begin{cases} q\,|S_{uv}|^{q-1}\,\mathrm{sign}(S_{uv}), & |S_{uv}| > \mu,\\ q\left(\dfrac{S_{uv}^2}{2\mu}+\dfrac{\mu}{2}\right)^{q-1}\dfrac{S_{uv}}{\mu}, & |S_{uv}| \le \mu. \end{cases}$$
Furthermore,
$$\nabla_S F_\mu(L,S) = L + S - \tilde\Sigma + \Psi(S),$$
where
$$\big(\Psi(S)\big)_{uv} = \begin{cases} \lambda_{uv}^S\big(\psi_\mu(S_{uv})^q\big)', & (u,v)\in D^c,\\ 0, & (u,v)\in D. \end{cases}$$
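The smoothing function $\psi_\mu$ and the partial gradient $\nabla_S F_\mu$ above translate directly into vectorized code; the following sketch is one possible implementation (names are illustrative, and the tuning parameters $\lambda_{uv}^S$ are assumed to be stored as a matrix with zero diagonal).

```python
import numpy as np

# Sketch of the smoothing penalty derivative and the partial gradient
# nabla_S F_mu(L, S) = L + S - Sigma_tilde + Psi(S) defined above.
def psi_mu_q_grad(S, mu, q):
    absS = np.abs(S)
    inner = np.where(absS > mu, absS, S**2 / (2 * mu) + mu / 2)   # psi_mu(S_uv) > 0
    grad_inner = np.where(absS > mu, np.sign(S), S / mu)          # psi_mu'(S_uv)
    return q * inner ** (q - 1) * grad_inner                      # derivative of psi_mu(S_uv)^q

def grad_S_F_mu(L, S, Sigma_tilde, lam_S, mu, q):
    Psi = lam_S * psi_mu_q_grad(S, mu, q)    # lam_S has zero diagonal, so (u,v) in D stays 0
    return L + S - Sigma_tilde + Psi
```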
Next, we define a class of hybrid first-order stationary points of Problem (16).
Definition 3. 
We say that $(L,S)\in\mathcal{S}_+^p\times\mathcal{S}_+^p$ is a first-order stationary point of Problem (16) if $(L,S)$ satisfies
$$L \in \mathrm{Prox}_{(\lambda_L/\gamma)\|\cdot\|_q^q,\ \mathcal{S}_+^p}\left(L - \frac{1}{\gamma}(L+S-\tilde\Sigma)\right), \tag{19}$$
$$0 \in \nabla_S F_\mu(L,S) + Y, \quad Y\in N_{\mathcal{S}_+^p}(S). \tag{20}$$
Similar to the proof of Theorem 1, it is easy to show that any global solution of Problem (16) is a first-order stationary point of Problem (16) with $\gamma > 1$. The following conclusions establish the relationship between the first-order stationary points of Problems (7) and (16), which is the core of the convergence analysis of the SAU method in the next subsection.
Theorem 3. 
Let $\{\mu_k\}$ denote a sequence with $\mu_k > 0$, $k = 1,2,\ldots$, and $\mu_k\to 0$ as $k\to\infty$, and let $(L_{\mu_k},S_{\mu_k})$ be a first-order stationary point of Problem (16) with $\mu = \mu_k$. Then, the following statements hold:
(i)
If $\mu_k\le\varepsilon/p^2$, then $(L_{\mu_k},S_{\mu_k})$ is an $\varepsilon$-approximate first-order stationary point of Problem (7).
(ii)
Any accumulation point of $(L_{\mu_k},S_{\mu_k})$ is a first-order stationary point of Problem (7).

5.2. Smoothing Alternating Updating Method

Theorem 3 provides a feasible procedure to obtain an approximate solution of Problem (7) by solving Problem (16). This subsection gives the framework of the SAU method for Problem (7) and establishes its convergence.
First, we give the framework of the alternating updating (AU) method for solving Problem (16).
Algorithm 1. 
Given $\alpha_0 > 0$, $\beta\in(0,1)$, $\gamma > 1$, choose an initial point $(L^0,S^0)\in\mathcal{S}_+^p\times\mathcal{S}_+^p$. For $m = 0, 1, 2, \ldots$, repeat the following two steps until convergence:
Step 1. Update $L$ by solving the following optimization problem:
$$L^{m+1} = \operatorname*{argmin}_{L\in\mathcal{S}_+^p}\left\{\big\langle\nabla_L f(L^m,S^m),\ L-L^m\big\rangle + \frac{\gamma}{2}\|L-L^m\|_F^2 + \lambda_L\|L\|_q^q\right\}. \tag{21}$$
Step 2. Compute
$$S^{m+1} = \operatorname*{argmin}_{S\in\mathcal{S}_+^p}\ \big\|S - \big(S^m - \alpha_m\nabla_S F_\mu(L^{m+1},S^m)\big)\big\|_F^2,$$
where $\alpha_m = \alpha_0\beta^{\ell_m}$ and $\ell_m$ is the smallest non-negative integer $\ell$ such that
$$F_\mu\big(L^{m+1},\ S^m(\alpha_0\beta^{\ell})\big) \le F_\mu(L^{m+1}, S^m) - \frac{c}{2}\big\|S^m(\alpha_0\beta^{\ell}) - S^m\big\|_F^2,$$
and $S^m(\alpha) = \operatorname*{argmin}_{S\in\mathcal{S}_+^p}\big\|S - \big(S^m - \alpha\nabla_S F_\mu(L^{m+1},S^m)\big)\big\|_F^2$.
The AU method proposed in this paper applies the proximal gradient step [30] to update $L$ and uses the smoothing projected gradient step [28,29] to update $S$. A key component at each iteration of the proposed AU method is to solve the singular-value minimization Problem (21), which can be rewritten as follows:
$$\min_{L\in\mathcal{S}_+^p}\ \frac{1}{2}\left\|L - \Big(L^m - \frac{1}{\gamma}\nabla_L f(L^m,S^m)\Big)\right\|_F^2 + \frac{\lambda_L}{\gamma}\|L\|_q^q. \tag{23}$$
Theorem 3.4 of [30] provides a solution to Problem (23). To facilitate the expression, define the vector q-thresholding operator $H(d) := (h_v(d_1),\ldots,h_v(d_p))^T$ with $v = 2\lambda_L/\gamma$, where
$$h_v(d_i) = \begin{cases} h_{v,q}(d_i), & d_i > d^*,\\ \big(v(1-q)\big)^{1/(2-q)} \ \text{or}\ 0, & d_i = d^*,\\ 0, & d_i < d^*, \end{cases} \tag{24}$$
$d^* = \frac{2-q}{2(1-q)}\big(v(1-q)\big)^{1/(2-q)}$, and $h_{v,q}(d_i)$ is the unique solution in $(\bar d, +\infty)$ of the single-variable minimization problem
$$\min_{x\ge 0}\ g_{d_i}(x) := x^2 - 2d_i x + v x^q,$$
where $\bar d = \big(vq(1-q)/2\big)^{1/(2-q)}$. Moreover, $h_{v,q}(d_i)$ is differentiable and strictly increasing on $[d^*, +\infty)$. As suggested in [25], $h_{v,q}(d_i)$ can simply be obtained by numerical methods, for example, the Newton method
$$x_{z+1} = x_z - \frac{q v (x_z)^{q-1}/2 + x_z - d_i}{q(q-1)v(x_z)^{q-2}/2 + 1}$$
with the initial point $x_0 = 1.5\,\bar d$.
Theorem 4 
(Theorem 3.4 in [30]). Let $U\,\mathrm{Diag}(d)\,U^T$ be the eigenvalue decomposition of $L^m - \frac{1}{\gamma}\nabla_L f(L^m,S^m)$. Then, $L^* = U\,\mathrm{Diag}(H(d))\,U^T$ is an optimal solution to Problem (23), where $H(d)$ is defined in (24) with $v = 2\lambda_L/\gamma$.
When $q = 1/2$, $H(d)$ has a closed-form expression, namely $h_{v,q}(d_i) = \frac{2}{3}d_i\left(1+\cos\left(\frac{2\pi}{3}-\frac{2\varphi(d_i)}{3}\right)\right)$ with $\varphi(d_i) = \arccos\left(\frac{v}{8}\left(\frac{d_i}{3}\right)^{-3/2}\right)$ and $d^* = \frac{\sqrt[3]{54}}{4}\,v^{2/3}$. Refer to [30] for the details.
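A sketch of the vector q-thresholding operator $H(d)$ and the resulting $L$-update of Step 1 is given below; the closed form is used for $q = 1/2$ and the Newton iteration from the text otherwise (the number of Newton steps, the function names, and the treatment of the boundary case $d_i = d^*$ are illustrative assumptions).

```python
import numpy as np

# Sketch of the vector q-thresholding operator H(d) from Theorem 4 (v = 2*lam_L/gamma).
def q_threshold(d, v, q, newton_iters=50):
    d = np.asarray(d, dtype=float)
    d_star = (2 - q) / (2 * (1 - q)) * (v * (1 - q)) ** (1 / (2 - q))
    out = np.zeros_like(d)
    for i, di in enumerate(d):
        if di <= d_star:
            continue                              # at or below the threshold: take 0
        if q == 0.5:
            phi = np.arccos(v / 8 * (di / 3) ** (-1.5))
            out[i] = 2 / 3 * di * (1 + np.cos(2 * np.pi / 3 - 2 * phi / 3))
        else:
            d_bar = (v * q * (1 - q) / 2) ** (1 / (2 - q))
            x = 1.5 * d_bar                       # initial point suggested in the text
            for _ in range(newton_iters):         # Newton iteration on g'(x) = 0
                num = q * v * x ** (q - 1) / 2 + x - di
                den = q * (q - 1) * v * x ** (q - 2) / 2 + 1
                x = x - num / den
            out[i] = x
    return out

def update_L(L_m, grad_L, lam_L, gamma, q):
    # Step 1 of the AU method via Theorem 4: eigen-decompose L^m - (1/gamma)*grad_L
    # and apply H(d) to the eigenvalues.
    d, U = np.linalg.eigh(L_m - grad_L / gamma)
    return (U * q_threshold(d, 2 * lam_L / gamma, q)) @ U.T
```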
Another key component at each iteration of the proposed AU method is the projection problem over $\mathcal{S}_+^p$, which can be written as follows:
$$\min_{S\in\mathcal{S}_+^p}\ \|S - C^m\|_F^2, \tag{25}$$
where $C^m := S^m - \alpha_m\nabla_S F_\mu(L^{m+1},S^m)$. It is well known that, if $C^m$ has the eigen-decomposition $C^m := \sum_{i=1}^p\lambda_i v_i v_i^T$, then $S^* = \sum_{i=1}^p\max\{\lambda_i, 0\}\,v_i v_i^T$ is the optimal solution of Problem (25).
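The projection in Problem (25) can be sketched as follows (the explicit symmetrization is an implementation detail and an assumption, not part of the paper's statement).

```python
import numpy as np

# Sketch of the projection step (25): project C^m onto the positive semi-definite cone
# by truncating the negative eigenvalues of the (symmetrized) matrix.
def project_psd(C):
    C = (C + C.T) / 2                      # keep the iterate symmetric
    eigval, eigvec = np.linalg.eigh(C)
    eigval = np.maximum(eigval, 0.0)
    return (eigvec * eigval) @ eigvec.T
```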
By (23) and (25), it is easy to find that, when the initial matrices L 0 and S 0 in the AU method are symmetric, the matrices for L and S output by the AU method are also symmetric. The next theorem gives the convergence of the AU method.
Theorem 5. 
Let { ( L m , S m ) } be the sequence generated by the AU method. Then, the following statements hold:
(i)
$F_\mu(L^{m+1},S^{m+1}) \le F_\mu(L^m,S^m) - \frac{c}{2}\|S^{m+1}-S^m\|_F^2 - \frac{\gamma-1}{2}\|L^{m+1}-L^m\|_F^2$.
(ii)
$\{(L^m,S^m)\}$ is bounded.
(iii)
$\lim_{m\to\infty}\|L^{m+1}-L^m\|_F = 0$ and $\lim_{m\to\infty}\|S^{m+1}-S^m\|_F = 0$.
(iv)
Any accumulation point of { ( L m , S m ) } is a first-order stationary point of Problem (16).
We now give the framework of the SAU method for solving Problem (16).
Algorithm 2. 
Given $\tau\in(0,1)$, choose an initial smoothing parameter $\mu_0$. For $k = 0, 1, 2, \ldots$, repeat the following two steps until convergence:
Step 1. Solve Problem (16) with $\mu_k$ by the AU method, and obtain $(L_{\mu_k},S_{\mu_k})$.
Step 2. Set $\mu_{k+1} = \tau\mu_k$, and return to Step 1.
Theorem 3 shows that any accumulation point of $(L_{\mu_k},S_{\mu_k})$ is a first-order stationary point of Problem (7), which guarantees the convergence of the SAU method. In practice, $k$ cannot be taken to infinity and $\mu_k$ cannot be taken as 0; the $L_q$-CSCE obtained by the SAU method is therefore often an $\varepsilon$-approximate first-order stationary point of Problem (7). Indeed, we stop the SAU method when it reaches a maximum number of iterations $k_{\max}$ or when $\mu_k < \varepsilon$, where $\varepsilon > 0$ is a small constant. Moreover, in order to improve the efficiency of the SAU method, a useful trick is to take the output point $(L_{\mu_k},S_{\mu_k})$ as the initial point of the AU method for Problem (16) with $\mu_{k+1}$. Thus, if the initial point of the AU method for solving Problem (16) with $\mu_0$ is symmetric, then the final matrices for $L$ and $S$ generated by the SAU method are also symmetric.
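A minimal sketch of the SAU outer loop is given below; it assumes that a routine au_method implementing Algorithm 1 is available, and all parameter values, names, and the warm-start signature are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the SAU outer loop (Algorithm 2), assuming an AU solver
# au_method(Sigma_tilde, lam_L, lam_S, q, mu, init) for Problem (16) exists.
def sau_method(Sigma_tilde, lam_L, lam_S, q, mu0=1.0, tau=0.5, eps=1e-4, k_max=30):
    L = np.zeros_like(Sigma_tilde)                    # symmetric starting point
    S = np.diag(np.diag(Sigma_tilde))
    mu = mu0
    for _ in range(k_max):
        if mu < eps:                                  # stopping rule from the text
            break
        L, S = au_method(Sigma_tilde, lam_L, lam_S, q, mu, init=(L, S))  # warm start
        mu *= tau                                     # shrink the smoothing parameter
    return L, S
```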

6. Simulations

In this section, we study the performance of the $L_q$-CSC method and the associated algorithms. The following models are considered in this section:
  • Model 1 (banded matrix with ordering). $\Sigma = \mathrm{blockdiag}(A_1, A_2)$, where $A_1 = (\sigma_{uv})$, $1\le u,v\le p/2$, $\sigma_{uv} = (1 - |u-v|/10)_+$, and $A_2 = 4 I_{p/2\times p/2}$. $\Sigma$ is a two-block diagonal matrix, $A_1$ is a banded and sparse covariance matrix, and $A_2$ is a diagonal matrix with four along the diagonal.
  • Model 2 (simple factor). $\Sigma = U D U^T + I$, where $U\in\mathbb{R}^{p\times 3}$ has orthonormal columns generated uniformly at random and $D = 8I$.
  • Model 3 (random factor). $\Sigma = U D U^T + \Sigma_{\mathrm{Sparse}}$, where $U\in\mathbb{R}^{p\times 3}$ has orthonormal columns generated uniformly at random, $D = 8I$, and $\Sigma_{\mathrm{Sparse}}$ is the sparse matrix generated by Model 1.
The matrix $\Sigma$ in Model 1 is sparse, while the matrices in Models 2 and 3 satisfy the structural decomposition (2); a sketch of how these covariance matrices can be generated is given below.
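The sketch below (Python/NumPy; the use of a QR factorization to draw random orthonormal columns, and all names, are illustrative assumptions) generates the covariance matrices of Models 1-3.

```python
import numpy as np

# Sketch of the three simulation models for Sigma.
def model1(p):
    # banded sparse block A1 with sigma_uv = (1 - |u-v|/10)_+ and diagonal block A2 = 4*I
    idx = np.arange(p // 2)
    A1 = np.maximum(1 - np.abs(idx[:, None] - idx[None, :]) / 10, 0.0)
    A2 = 4 * np.eye(p - p // 2)
    Sigma = np.zeros((p, p))
    Sigma[:p // 2, :p // 2] = A1
    Sigma[p // 2:, p // 2:] = A2
    return Sigma

def random_orthonormal(p, K, rng):
    # U in R^{p x K} with random orthonormal columns (via QR of a Gaussian matrix)
    Q, _ = np.linalg.qr(rng.normal(size=(p, K)))
    return Q

def model2(p, rng):
    U = random_orthonormal(p, 3, rng)
    return U @ (8 * np.eye(3)) @ U.T + np.eye(p)

def model3(p, rng):
    U = random_orthonormal(p, 3, rng)
    return U @ (8 * np.eye(3)) @ U.T + model1(p)

rng = np.random.default_rng(0)
Y = rng.multivariate_normal(np.zeros(100), model2(100, rng), size=100)  # n = 100 samples
```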

6.1. Comparison with Representative Methods

This subsection studies the advantages and disadvantages of the $L_q$-CSCE relative to existing methods. Four representative estimators (the adaptive thresholding estimator (ATE) with the hard thresholding rule in [4], the optimization-type estimator (OTE) in [8], the POET in [12], and the LOREC in [14]) were used for comparison. The first two estimators are based on the assumption of a sparse covariance matrix: the ATE is a representative simple thresholding estimator, and the OTE is a representative optimization-type estimator. The POET and the LOREC, like our estimator, are based on the structural decomposition (2).
Under each model, we set $n = 100$, and all random samples are generated from the normal distribution with mean 0 and covariance matrix $\Sigma$, for $p = 30, 100, 200$. The losses are measured by three matrix norms: the spectral norm, the max norm, and the Frobenius norm. The standard errors are given in parentheses. It is known that $q = 0.5$ is a good choice for the variable selection problem and the low-rank problem [24,25]. For convenience, the parameter $q$ in Problem (7) is taken as 0.5. In this subsection, we do not consider different choices of $q$.
For Model 1, the tuning parameters $\lambda_L$ in Problems (4) and (7) are taken as $10^8$. Moreover, we set $K = 0$ in the POET. Under these settings, the LOREC, POET, and $L_q$-CSCE degenerate into the simple soft thresholding estimator [1,2], the ATE, and the constrained adaptive thresholding estimator, respectively. Table 1 reports the means and standard errors of the five estimators under the three losses. In our analysis, the performance of the POET is almost the same as that of the ATE. Note that, compared with the ATE, OTE, POET, and $L_q$-CSCE, the LOREC has a large bias under the spectral norm loss and the Frobenius norm loss, which is caused by the bias of the $L_1$ penalty. Moreover, the $L_q$-CSCE performs best under the spectral norm loss and the Frobenius norm loss. All methods perform similarly under the max norm.
For Models 2 and 3, we choose the tuning parameters in the LOREC and $L_q$-CSCE by a grid search and set $K = 3$ in the POET. Table 2 and Table 3 report the means and standard errors of the five estimators under the three losses. Note that the ATE and OTE perform poorly for Models 2 and 3. In contrast, the LOREC, POET, and $L_q$-CSCE are more effective. The $L_q$-CSCE performs best under the spectral norm loss and the Frobenius norm loss. As with Model 1, all methods perform similarly under the max norm. It is worth mentioning that the adaptive technique has no effect for Model 2.
Moreover, Table 4 records the number of times that the estimators of $L$ and $S$ produced by the POET, LOREC, and $L_q$-CSCE are positive semi-definite. Clearly, all estimators of $L$ and $S$ from the $L_q$-CSC method are positive semi-definite. In contrast, the estimators of $S$ from the LOREC and POET are not guaranteed to be positive semi-definite.
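A count such as the one reported in Table 4 only requires a numerical positive semi-definiteness check; a minimal sketch (the tolerance is an assumption) is given below.

```python
import numpy as np

# Sketch of the positive semi-definiteness check behind the counts in Table 4.
def is_psd(M, tol=1e-8):
    return bool(np.all(np.linalg.eigvalsh((M + M.T) / 2) >= -tol))
```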

6.2. Choice of q

In the previous simulations, we only considered the setting $q = 0.5$. This subsection supplements some simulations to study the influence of the choice of $q$ on the performance of the $L_q$-CSCE. Here, we consider the cases $q = 0.1, 0.3, 0.5, 0.7$, and 0.9.
Table 5, Table 6 and Table 7 report the means and standard errors of the $L_q$-CSCE with different $q$ under the three losses. Note that the choice of $q$ does not affect the results of the $L_q$-CSCE under the max norm loss. Under the other losses, the performance of the $L_q$-CSCE depends on the choice of $q$ to a varying degree across the models. Specifically, there is little difference between $q = 0.1$, $q = 0.3$, and $q = 0.5$ under the spectral norm loss and the max norm loss. The $L_q$-CSCE with $q = 0.7$ and $q = 0.9$ performs poorly under the Frobenius norm, especially when $q = 0.9$. Note that, when $q = 0.1$, the $L_q$-CSCE is slightly worse than with $q = 0.3$ and $q = 0.5$. We conjecture that a smaller $q$ makes the objective more non-convex and, thus, more difficult to minimize, whereas the estimator induced by the $L_q$ function with a large $q$ has a large deviation.
Problem (23) has an explicit solution when q = 0.5 , which makes the iteration of the SAU method simple. When q = 0.1 , 0.3 , 0.7 , and 0.9 , it is necessary to embed other numerical methods to find the approximate solution of Problem (23). Therefore, in practical applications, we recommend the setting where q = 0.5 .

7. Discussion and Conclusions

In this paper, we considered the problem of estimating a high-dimensional covariance matrix from noisy data and studied a class of estimators based on the structural decomposition (2). We estimated the high-dimensional covariance matrix by solving the constrained adaptive $L_q$-type regularized optimization Problem (7). This paper provides a systematic study of Problem (7). We first defined a class of hybrid first-order stationary points of Problem (7) and analyzed the relationship between these stationary points and the global solutions of Problem (7). Secondly, we proposed an SAU method for solving Problem (7) and established its convergence. The final simulation results showed that the proposed SAU method can solve Problem (7) effectively and that the resulting $L_q$-CSCE has good numerical performance.

8. Proof

This section provides the detailed proof of the conclusions in this paper.
Proof of Proposition 1. 
(i)-(ii) The properness of $F(L,S)$ is obvious. Note that $F$ is continuous over its domain and $\mathrm{dom}(F)$ is closed. Theorem 2.8 in [31] shows that $F$ is closed. Furthermore, if $\|L\|_F\to\infty$ or $\|S\|_F\to\infty$, at least one of the three terms in $F$ tends to infinity. Thus, the coerciveness of $F(L,S)$ holds. It then follows from Theorem 2.14 in [31] that $F(L,S)$ attains its minimal value over $\mathcal{S}_+^p\times\mathcal{S}_+^p$.
(iii) By direct calculation, $\nabla f(L,S) = [\nabla_L f(L,S)^T, \nabla_S f(L,S)^T]^T$, where
$$\nabla_L f(L,S) = \nabla_S f(L,S) = L + S - \tilde\Sigma.$$
For any $(L_1,S_1), (L_2,S_2)\in\mathbb{R}^{p\times p}\times\mathbb{R}^{p\times p}$, it holds that
$$\begin{aligned} \|\nabla f(L_1,S_1)-\nabla f(L_2,S_2)\|_F^2 &= \|\nabla_L f(L_1,S_1)-\nabla_L f(L_2,S_2)\|_F^2 + \|\nabla_S f(L_1,S_1)-\nabla_S f(L_2,S_2)\|_F^2\\ &= 2\,\|L_1-L_2+S_1-S_2\|_F^2\\ &= 2\,\|L_1-L_2\|_F^2 + 2\,\|S_1-S_2\|_F^2 + 4\,\langle L_1-L_2,\ S_1-S_2\rangle\\ &\le 4\,\|L_1-L_2\|_F^2 + 4\,\|S_1-S_2\|_F^2 = 4\,\|(L_1,S_1)-(L_2,S_2)\|_F^2, \end{aligned}$$
which implies that the gradient $\nabla f(L,S)$ is Lipschitz continuous with constant 2.
(iv) By direct calculation, $\nabla_S f(L,S) = L+S-\tilde\Sigma$. For any $S_1,S_2\in\mathbb{R}^{p\times p}$, it holds that $\nabla_S f(L,S_1)-\nabla_S f(L,S_2) = S_1-S_2$. Then, the partial gradient $\nabla_S f(L,S)$ is Lipschitz continuous with constant 1. The same result holds for the partial gradient $\nabla_L f(L,S)$. □
Proof of Theorem 1. 
Since $(L^*,S^*)$ is a global minimizer of Problem (7), $L^*$ is also a global minimizer of the problem
$$\min_{L\in\mathcal{S}_+^p}\ f(L,S^*) + \lambda_L\|L\|_q^q.$$
By (iv) of Proposition 1, we have that
$$f(L_1,S) \le f(L_2,S) + \langle\nabla_L f(L_2,S),\ L_1-L_2\rangle + \frac{1}{2}\|L_1-L_2\|_F^2, \quad \forall\, L_1,L_2\in\mathbb{R}^{p\times p}.$$
It follows that, for any $L\in\mathcal{S}_+^p$,
$$\begin{aligned} f(L^*,S^*)+\lambda_L\|L^*\|_q^q &\le f(L,S^*)+\lambda_L\|L\|_q^q\\ &\le f(L,S^*)+\lambda_L\|L\|_q^q + \frac{\gamma-1}{2}\|L-L^*\|_F^2\\ &\le f(L^*,S^*)+\langle\nabla_L f(L^*,S^*),\ L-L^*\rangle + \frac{1}{2}\|L-L^*\|_F^2 + \lambda_L\|L\|_q^q + \frac{\gamma-1}{2}\|L-L^*\|_F^2\\ &= f(L^*,S^*)+\langle\nabla_L f(L^*,S^*),\ L-L^*\rangle + \frac{\gamma}{2}\|L-L^*\|_F^2 + \lambda_L\|L\|_q^q. \end{aligned}$$
Thus,
$$L^* \in \operatorname*{argmin}_{L\in\mathcal{S}_+^p}\left\{\langle\nabla_L f(L^*,S^*),\ L-L^*\rangle + \frac{\gamma}{2}\|L-L^*\|_F^2 + \lambda_L\|L\|_q^q\right\} = \mathrm{Prox}_{(\lambda_L/\gamma)\|\cdot\|_q^q,\ \mathcal{S}_+^p}\left(L^* - \frac{1}{\gamma}(L^*+S^*-\tilde\Sigma)\right).$$
Hence, (9) holds.
We now prove that (10) holds for $(L^*,S^*)$. Observe that $S^*$ is a global minimizer of the problem
$$\min_{S\in\mathcal{S}_+^p}\ f(L^*,S) + \sum_{(u,v)\in D^c}\lambda_{uv}^S|S_{uv}|^q,$$
which can be equivalently rewritten as
$$\min_{S\in\mathbb{R}^{p\times p}}\ f(L^*,S) + \sum_{(u,v)\in D^c}\lambda_{uv}^S|S_{uv}|^q + \delta_{\mathcal{S}_+^p}(S).$$
Then, $S^*_{\Gamma}$ is a global minimizer of the problem
$$\min_{S_{\Gamma}}\ f\big(L^*, [S_{\Gamma}: S^*_{\Gamma^c}]\big) + \sum_{(u,v)\in D^c\cap\Gamma}\lambda_{uv}^S|S_{uv}|^q + \delta_{\mathcal{S}_+^p}\big([S_{\Gamma}: S^*_{\Gamma^c}]\big). \tag{26}$$
It follows from the first-order optimality condition of Problem (26) and Exercise 10.10 in [32] that there exists a $Y_{\Gamma}\in\partial h_{\Gamma}(S^*_{\Gamma})$ such that
$$0 = L^*_{uv} + S^*_{uv} - \tilde\Sigma_{uv} + q\,\lambda_{uv}^S\,\mathrm{sign}(S^*_{uv})|S^*_{uv}|^{q-1} + Y_{uv} \quad \text{for all } (u,v)\in\Gamma\cap D^c, \tag{27}$$
$$0 = L^*_{uv} + S^*_{uv} - \tilde\Sigma_{uv} + Y_{uv} \quad \text{for all } (u,v)\in\Gamma\cap D. \tag{28}$$
Moreover, let $Y = [Y_{\Gamma}: Y_{\Gamma^c}]$, where $Y_{\Gamma^c}$ is some finite array. For any $(u,v)\in\Gamma^c\cap D^c$, we have
$$S^*_{uv}\big(L^*_{uv}+S^*_{uv}-\tilde\Sigma_{uv}\big) + q\,\lambda_{uv}^S|S^*_{uv}|^q + S^*_{uv}Y_{uv} = 0, \tag{29}$$
and for any $(u,v)\in\Gamma^c\cap D$, we have
$$S^*_{uv}\big(L^*_{uv}+S^*_{uv}-\tilde\Sigma_{uv}\big) + S^*_{uv}Y_{uv} = 0. \tag{30}$$
Combining (27)-(30) with the definition of $Y$, (10) holds. Thus, $(L^*,S^*)$ is a first-order stationary point of Problem (7). □
Proof of Theorem 2. 
Since $(L,S)$ is a first-order stationary point of Problem (7) and $S\in\mathcal{S}_{++}^p$, by Definition 1, it holds that
$$0 = S\circ(L+S-\tilde\Sigma) + Q, \tag{31}$$
where $Q := (Q_{uv})_{u,v=1}^p$ and
$$Q_{uv} = \begin{cases} q\,\lambda_{uv}^S|S_{uv}|^q, & \text{if } (u,v)\in D^c,\\ 0, & \text{if } (u,v)\in D. \end{cases}$$
For all $(u,v)\in D^c\cap\Gamma$, by (31), we have
$$0 = S_{uv}\big(L_{uv}+S_{uv}-\tilde\Sigma_{uv}\big) + q\,\lambda_{uv}^S|S_{uv}|^q. \tag{32}$$
Moreover, statement (iv) in Proposition 1 shows that $\nabla_S f(L,S)$ is Lipschitz continuous with constant 1, and it follows that
$$f(L,Y) \le f(L,X) + \langle\nabla_S f(L,X),\ Y-X\rangle + \frac{1}{2}\|Y-X\|_F^2, \quad \forall\, X,Y\in\mathbb{R}^{p\times p}.$$
Taking $X = S$ and $Y = S - \nabla_S f(L,S)$, we obtain
$$f\big(L,\ S-\nabla_S f(L,S)\big) \le f(L,S) - \frac{1}{2}\|\nabla_S f(L,S)\|_F^2.$$
Note that
$$f\big(L,\ S-\nabla_S f(L,S)\big) \ge 0, \qquad f(L,S) \le F(L,S) \le F(L_0,S_0)+\epsilon.$$
It follows that
$$\|\nabla_S f(L,S)\|_F \le \sqrt{2\,[F(L_0,S_0)+\epsilon]}.$$
This, together with the relation (32), implies that
$$q\,\lambda_{uv}^S|S_{uv}|^{q-1} = \big|\big(\nabla_S f(L,S)\big)_{uv}\big| \le \|\nabla_S f(L,S)\|_F \le \sqrt{2\,[F(L_0,S_0)+\epsilon]}, \quad \forall\,(u,v)\in D^c\cap\Gamma.$$
Thus, we have
$$|S_{uv}| \ge \left(\frac{q\,\min\{\lambda_{uv}^S\}}{\sqrt{2\,[F(L_0,S_0)+\epsilon]}}\right)^{\frac{1}{1-q}} \quad \forall\,(u,v)\in D^c\cap\Gamma,$$
which completes this proof. □
Proof of Theorem 3. 
(i) We just need to prove that Relation (20) is a sufficient condition for (14) and that the inequality (15) holds. Since $(L_{\mu_k},S_{\mu_k})$ is a first-order stationary point of Problem (16), there exists a matrix $Y_{\mu_k}\in N_{\mathcal{S}_+^p}(S_{\mu_k})$ such that
$$0 = \nabla_S F_{\mu_k}(L_{\mu_k},S_{\mu_k}) + Y_{\mu_k}.$$
Let $N := \{(u,v)\in D^c:\ |(S_{\mu_k})_{uv}| > \mu_k\}$. It holds that
$$0 = (L_{\mu_k})_{N\cup D} + (S_{\mu_k})_{N\cup D} - \tilde\Sigma_{N\cup D} + [\Psi(S_{\mu_k})]_{N\cup D} + (Y_{\mu_k})_{N\cup D}.$$
By (11), it is known that
$$(Y_{\mu_k})_{N\cup D} \in \mathcal{L}^*_{N\cup D}\big(N_{\mathcal{S}_+^p}(S_{\mu_k})\big) \subseteq \partial h_{N\cup D}\big((S_{\mu_k})_{N\cup D}\big).$$
Then,
$$\mathrm{dist}\Big(0,\ \big\{(L_{\mu_k})_{N\cup D} + (S_{\mu_k})_{N\cup D} - \tilde\Sigma_{N\cup D} + [\Psi(S_{\mu_k})]_{N\cup D} + Y_{N\cup D}:\ Y_{N\cup D}\in\partial h_{N\cup D}\big((S_{\mu_k})_{N\cup D}\big)\big\}\Big) \le \varepsilon.$$
Moreover, it follows from the condition $\mu_k\le\varepsilon/p^2$ that
$$\big\|(S_{\mu_k})_{(N\cup D)^c}\big\|_F \le \sum_{(u,v)\in(N\cup D)^c}\big|(S_{\mu_k})_{uv}\big| \le \mu_k\,p^2 \le \varepsilon.$$
Therefore, $(L_{\mu_k},S_{\mu_k})$ is an $\varepsilon$-approximate first-order stationary point of Problem (7).
(ii) Let $(L^*,S^*)$ be an accumulation point of $(L_{\mu_k},S_{\mu_k})$. By the definition of the first-order stationary point of Problem (16), for any $(u,v)\in N^c\cap D^c$, there exists a matrix $Y_{\mu_k}\in N_{\mathcal{S}_+^p}(S_{\mu_k})$ such that
$$0 = (L_{\mu_k})_{uv} + (S_{\mu_k})_{uv} - \tilde\Sigma_{uv} + q\,\lambda_{uv}^S\left(\frac{(S_{\mu_k})_{uv}^2}{2\mu_k}+\frac{\mu_k}{2}\right)^{q-1}\frac{(S_{\mu_k})_{uv}}{\mu_k} + (Y_{\mu_k})_{uv}. \tag{34}$$
Note that
$$0 \le q\,\lambda_{uv}^S\left(\frac{(S_{\mu_k})_{uv}^2}{2\mu_k}+\frac{\mu_k}{2}\right)^{q-1}\frac{(S_{\mu_k})_{uv}^2}{\mu_k} \le q\,\lambda_{uv}^S\left(\frac{(S_{\mu_k})_{uv}^2}{\mu_k}\right)^{q} \le q\,\lambda_{uv}^S\,\big|(S_{\mu_k})_{uv}\big|^q. \tag{35}$$
If $|(S_{\mu_k})_{uv}|\le\mu_k$ for arbitrarily large $k$, then $S^*_{uv} = \lim_{k\to\infty}(S_{\mu_k})_{uv} = 0$, and by (35), we have
$$\lim_{k\to\infty} q\,\lambda_{uv}^S\left(\frac{(S_{\mu_k})_{uv}^2}{2\mu_k}+\frac{\mu_k}{2}\right)^{q-1}\frac{(S_{\mu_k})_{uv}^2}{\mu_k} = 0 \quad\text{and}\quad q\,\lambda_{uv}^S|S^*_{uv}|^q = 0.$$
Since $Y_{\mu_k}\in N_{\mathcal{S}_+^p}(S_{\mu_k})$, we have $\lim_{k\to\infty}Y_{\mu_k}\in N_{\mathcal{S}_+^p}(S^*)$.
This, together with the relation (34), implies that, for any $(u,v)\in N^c\cap D^c$,
$$0 \in S^*_{uv}\big(L^*_{uv}+S^*_{uv}-\tilde\Sigma_{uv}\big) + q\,\lambda_{uv}^S|S^*_{uv}|^q + S^*_{uv}Y_{uv}, \quad Y\in N_{\mathcal{S}_+^p}(S^*). \tag{36}$$
Moreover, for any $(u,v)\in N\cup D$, we have
$$0 = (L_{\mu_k})_{uv} + (S_{\mu_k})_{uv} - \tilde\Sigma_{uv} + Q^{\mu_k}_{uv} + (Y_{\mu_k})_{uv}, \tag{37}$$
where
$$Q^{\mu_k}_{uv} = \begin{cases} 0, & (u,v)\in D,\\ q\,\lambda_{uv}^S\,\mathrm{sign}\big((S_{\mu_k})_{uv}\big)\big|(S_{\mu_k})_{uv}\big|^{q-1}, & (u,v)\in(N\cup D)\setminus D. \end{cases}$$
For any $(u,v)\in N\cup D$, it follows that
$$0 \in S^*_{uv}\big(L^*_{uv}+S^*_{uv}-\tilde\Sigma_{uv}\big) + S^*_{uv}Q_{uv} + S^*_{uv}Y_{uv}, \quad Y\in N_{\mathcal{S}_+^p}(S^*), \tag{38}$$
where
$$Q_{uv} = \begin{cases} 0, & (u,v)\in D,\\ q\,\lambda_{uv}^S\,\mathrm{sign}(S^*_{uv})\big|S^*_{uv}\big|^{q-1}, & (u,v)\in(N\cup D)\setminus D. \end{cases}$$
Since the $Y_{\mu_k}$ in (34) coincides with that in (37), by (36) and (38), we have that $(L^*,S^*)$ is a first-order stationary point of Problem (7). □
Proof of Theorem 5. 
(i) By (iv) of Proposition 1, we have that
$$\frac{1}{2}\|L_1+S-\tilde\Sigma\|_F^2 \le \frac{1}{2}\|L_2+S-\tilde\Sigma\|_F^2 + \langle L_2+S-\tilde\Sigma,\ L_1-L_2\rangle + \frac{1}{2}\|L_1-L_2\|_F^2, \quad \forall\, L_1,L_2\in\mathbb{R}^{p\times p}.$$
It follows that
$$\begin{aligned} F_\mu(L^{m+1},S^m) &= \frac{1}{2}\|L^{m+1}+S^m-\tilde\Sigma\|_F^2 + \lambda_L\|L^{m+1}\|_q^q + \sum_{(u,v)\in D^c}\lambda_{uv}^S\,\psi_\mu(S^m_{uv})^q\\ &\le \frac{1}{2}\|L^m+S^m-\tilde\Sigma\|_F^2 + \langle L^m+S^m-\tilde\Sigma,\ L^{m+1}-L^m\rangle + \frac{1}{2}\|L^{m+1}-L^m\|_F^2 + \lambda_L\|L^{m+1}\|_q^q + \sum_{(u,v)\in D^c}\lambda_{uv}^S\,\psi_\mu(S^m_{uv})^q\\ &\le \frac{1}{2}\|L^m+S^m-\tilde\Sigma\|_F^2 + \langle L^m+S^m-\tilde\Sigma,\ L^{m+1}-L^m\rangle + \frac{\gamma}{2}\|L^{m+1}-L^m\|_F^2 + \lambda_L\|L^{m+1}\|_q^q + \sum_{(u,v)\in D^c}\lambda_{uv}^S\,\psi_\mu(S^m_{uv})^q - \frac{\gamma-1}{2}\|L^{m+1}-L^m\|_F^2\\ &\le \frac{1}{2}\|L^m+S^m-\tilde\Sigma\|_F^2 + \lambda_L\|L^m\|_q^q + \sum_{(u,v)\in D^c}\lambda_{uv}^S\,\psi_\mu(S^m_{uv})^q - \frac{\gamma-1}{2}\|L^{m+1}-L^m\|_F^2\\ &= F_\mu(L^m,S^m) - \frac{\gamma-1}{2}\|L^{m+1}-L^m\|_F^2, \end{aligned}$$
where the third inequality is due to the update of $L$ in the AU method. Then,
$$F_\mu(L^{m+1},S^{m+1}) \le F_\mu(L^{m+1},S^m) - \frac{c}{2}\|S^{m+1}-S^m\|_F^2 \le F_\mu(L^m,S^m) - \frac{c}{2}\|S^{m+1}-S^m\|_F^2 - \frac{\gamma-1}{2}\|L^{m+1}-L^m\|_F^2.$$
(ii) It is easy to show that $F_\mu$ is coercive, which implies that $F_\mu$ is level bounded. By (i) of this theorem, we have $F_\mu(L^m,S^m)\le F_\mu(L^0,S^0)$. Then, $\{(L^m,S^m)\}$ is bounded.
(iii) It follows from the inequality in (i) that
$$\sum_{m=0}^{\infty}\left(\frac{c}{2}\|S^{m+1}-S^m\|_F^2 + \frac{\gamma-1}{2}\|L^{m+1}-L^m\|_F^2\right) \le \sum_{m=0}^{\infty}\big(F_\mu(L^m,S^m)-F_\mu(L^{m+1},S^{m+1})\big) \le F_\mu(L^0,S^0) - \lim_{m\to\infty}F_\mu(L^m,S^m) < \infty,$$
where the last inequality is due to $F_\mu$ being bounded from below. Hence, $\lim_{m\to\infty}\|L^{m+1}-L^m\|_F = 0$ and $\lim_{m\to\infty}\|S^{m+1}-S^m\|_F = 0$.
(iv) Let $(L^*,S^*)$ be an accumulation point of $(L^m,S^m)$. We assume that $\lim_{j\to\infty}L^{m_j} = L^*$ and $\lim_{j\to\infty}S^{m_j} = S^*$, where $m_j\to\infty$ as $j\to\infty$. By the results in (iii),
$$\lim_{j\to\infty}L^{m_j+1} = \lim_{j\to\infty}\big(L^{m_j}+(L^{m_j+1}-L^{m_j})\big) = L^*, \qquad \lim_{j\to\infty}S^{m_j+1} = \lim_{j\to\infty}\big(S^{m_j}+(S^{m_j+1}-S^{m_j})\big) = S^*.$$
By the AU method in this paper, we have that
$$L^{m_j+1} = \operatorname*{argmin}_{L\in\mathcal{S}_+^p}\left\{\langle\nabla_L f(L^{m_j},S^{m_j}),\ L-L^{m_j}\rangle + \frac{\gamma}{2}\|L-L^{m_j}\|_F^2 + \lambda_L\|L\|_q^q\right\},$$
$$\left\langle S^{m_j+1} - \big(S^{m_j} - \alpha_{m_j}\nabla_S F_\mu(L^{m_j+1},S^{m_j})\big),\ S - S^{m_j+1}\right\rangle \ge 0, \quad \forall\, S\in\mathcal{S}_+^p.$$
Taking the limits on both sides, it follows that
$$L^* = \operatorname*{argmin}_{L\in\mathcal{S}_+^p}\left\{\langle\nabla_L f(L^*,S^*),\ L-L^*\rangle + \frac{\gamma}{2}\|L-L^*\|_F^2 + \lambda_L\|L\|_q^q\right\}, \qquad \big\langle\nabla_S F_\mu(L^*,S^*),\ S-S^*\big\rangle \ge 0, \quad \forall\, S\in\mathcal{S}_+^p,$$
i.e., $(L^*,S^*)$ is a first-order stationary point of Problem (16). □

Author Contributions

Methodology, X.W., L.K. and L.W.; software, X.W. and Z.Y.; writing—original draft, X.W., L.K. and L.W.; validation, X.W., L.K., L.W. and Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 12071022) and the 111 Project of China (Grant No. B16002).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bickel, P.; Levina, E. Covariance regularization by thresholding. Ann. Stat. 2008, 36, 2577–2604. [Google Scholar] [CrossRef]
  2. El Karoui, N. Operator norm consistent estimation of large dimensional sparse covariance matrices. Ann. Stat. 2008, 36, 2717–2756. [Google Scholar] [CrossRef]
  3. Rothman, A.; Levina, E.; Zhu, J. Generalized thresholding of large covariance matrices. J. Am. Stat. Assoc. 2009, 104, 177–186. [Google Scholar] [CrossRef]
  4. Cai, T.; Liu, W. Adaptive thresholding for sparse covariance matrix estimation. J. Am. Stat. Assoc. 2011, 106, 672–684. [Google Scholar] [CrossRef] [Green Version]
  5. Cai, T.; Zhou, H. Minimax estimation of large covariance matrices under ℓ1 norm. Stat. Sinica 2008, 36, 2577–2604. [Google Scholar]
  6. Rothman, A. Positive definite estimators of large covariance matrices. Biometrika 2012, 99, 733–740. [Google Scholar] [CrossRef]
  7. Xue, L.; Ma, S.; Zou, H. Positive definite L1 penalized estimation of large covariance matrices. J. Am. Stat. Assoc. 2012, 107, 1480–1491. [Google Scholar] [CrossRef] [Green Version]
  8. Cui, Y.; Leng, C.; Sun, D. Sparse estimation of high-dimensional correlation matrices. Comput. Stat. Data. An. 2016, 93, 390–403. [Google Scholar] [CrossRef]
  9. Avella-Medina, M.; Battey, H.; Fan, J.; Li, Q. Robust estimation of high-dimensional covariance and precision matrices. Biometrika 2018, 105, 271–284. [Google Scholar] [CrossRef] [Green Version]
  10. Fan, J.; Fan, Y.; Lv, J. High dimensional covariance matrix estimation using a factor model. J. Econom. 2008, 147, 186–197. [Google Scholar] [CrossRef] [Green Version]
  11. Fan, J.; Liao, Y.; Mincheva, M. High dimensional covariance matrix estimation in approximate factor models. Ann. Stat. 2011, 39, 3320–3356. [Google Scholar] [CrossRef] [PubMed]
  12. Fan, J.; Liao, Y.; Mincheva, M. Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Ser. B 2013, 75, 603–680. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  13. Fan, J.; Wang, W.; Zhong, Y. Robust covariance estimation for approximate factor models. J. Econom. 2019, 208, 5–22. [Google Scholar] [CrossRef] [PubMed]
  14. Luo, X. High dimensional low rank and sparse covariance matrix estimation via convex minimization. Biometrika 2018, 105, 271–284. [Google Scholar]
  15. Candès, E.; Li, X.; Ma, Y.; Wright, J. Robust principal component analysis? J. ACM 2011, 58, 1–37. [Google Scholar]
  16. Feng, J.; Xu, H.; Yan, S. Online robust PCA via stochastic optimization. In Proceedings of the 26th International Conference on Neural Information Processing Systems, New York, NY, USA, 5 December 2013; pp. 404–412. [Google Scholar]
  17. Kang, Z.; Peng, C.; Cheng, Q. Robust PCA via nonconvex rank approximation. In Proceedings of the 2015 IEEE International Conference on Data Mining, Atlantic City, NJ, USA, 14–17 November 2015; pp. 1550–4786. [Google Scholar]
  18. Song, W.; Zhu, J.; Li, Y.; Chen, C. Image alignment by online robust PCA via stochastic gradient descent. IEEE Trans. Circ. Syst. Vid 2016, 26, 1241–1250. [Google Scholar] [CrossRef]
  19. Bouwmans, T.; Jaced, S.; Zhang, H.; Lin, Z.; Ricardo, O. On the applications of robust PCA in image and video processing. Proc. IEEE 2018, 106, 1427–1457. [Google Scholar] [CrossRef] [Green Version]
  20. Javed, S.; Mahmood, A.; Al-Maadeed, S.; Bouwmans, T.; Jung, S.K. Moving object detection in complex scene using spatiotemporal structured-sparse RPCA. IEEE Trans. Image Process 2018, 28, 1007–1022. [Google Scholar] [CrossRef]
  21. Liu, Q.; Li, X. Efficient low-rank matrix factorization based on 1,ϵ-norm for online background subtraction. IEEE T Circ. Syst. Vid 2022, 32, 4900–4904. [Google Scholar] [CrossRef]
  22. Fan, J.; Li, Q.; Wang, Y. Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. J. R. Stat. Soc. Ser. B 2017, 79, 247–265. [Google Scholar] [CrossRef] [Green Version]
  23. Chen, X.; Xu, F.; Ye, Y. Lower bound theory of nonzero entries in solutions of L2-Lp minimization. SIAM J. Sci. Comput. 2010, 32, 2832–2852. [Google Scholar] [CrossRef]
  24. Lu, Z. Iterative reweighted minimization methods for Lp regularized unconstrained nonlinear programming. Math. Program. 2014, 147, 277–307. [Google Scholar] [CrossRef]
  25. Peng, D.; Xiu, N.; Yu, J. Global optimality and fixed point continuation algorithm for non-Lipschitz ℓp regularized matrix minimization. Sci. China Math. 2018, 61, 171–184. [Google Scholar] [CrossRef]
  26. Rockafellar, R.T. Convex Analysis, 2nd ed; Princeton University Press: Princeton, NJ, USA, 1970. [Google Scholar]
  27. Mordukhovich, B.S.; Nguyen, M.N. An Easy Path to Convex Analysis and Applications; Morgan and Claypool: San Rafael, CA, USA, 2014. [Google Scholar]
  28. Zhang, C.; Chen, X. Smoothing projected gradient method and its application to stochastic linear complementarity problems. SIAM J. Optimiz. 2009, 20, 627–649. [Google Scholar] [CrossRef]
  29. Chen, X.; Ng, M.K.; Zhang, C. Nonconvex Lp regularization and box constrained model for image restoration. IEEE T Image Process 2012, 21, 4709–4721. [Google Scholar] [CrossRef]
  30. Chen, Y.; Xiu, N.; Peng, D. Global solutions of non-Lipschitz S2-Sp minimization over the positive semidefinite cone. Optim. Lett. 2014, 8, 2053–2064. [Google Scholar] [CrossRef]
  31. Beck, A. First-Order Methods in Optimization; Society for Industrial and Applied Mathematics and Mathematical Optimization Society: Philadelphia, PA, USA, 2017. [Google Scholar]
  32. Rockafellar, R.T.; Wets, R.J.-B. Variational Analysis; Springer: Berlin/Heidelberg, Germany, 1998. [Google Scholar]
Table 1. Simulation results for Model 1 over 100 replications. The standard errors are given in parentheses.
p | SE | ATE | OTE | POET | LOREC | Lq-CSCE
(Spectral norm)
30 | 3.76 (0.58) | 1.73 (0.50) | 1.89 (1.04) | 1.74 (0.48) | 3.04 (0.43) | 1.74 (0.48)
100 | 8.58 (0.67) | 2.69 (0.44) | 2.70 (0.50) | 2.61 (0.49) | 4.94 (0.38) | 2.60 (0.50)
200 | 14.10 (0.82) | 4.73 (0.52) | 3.27 (0.42) | 3.02 (0.45) | 6.17 (0.38) | 3.02 (0.46)
(Max norm)
30 | 1.25 (0.23) | 1.17 (0.28) | 1.16 (0.28) | 1.16 (0.28) | 1.22 (0.22) | 1.16 (0.28)
100 | 1.59 (0.26) | 1.59 (0.26) | 1.48 (0.32) | 1.51 (0.33) | 1.50 (0.25) | 1.51 (0.33)
200 | 1.73 (0.21) | 1.73 (0.21) | 1.63 (0.25) | 1.67 (0.25) | 1.63 (0.19) | 1.67 (0.25)
(Frobenius norm)
30 | 7.69 (0.52) | 3.17 (0.48) | 3.32 (1.03) | 3.22 (0.43) | 6.35 (0.47) | 3.12 (0.44)
100 | 25.30 (0.64) | 8.10 (0.66) | 7.24 (0.52) | 6.73 (0.49) | 16.06 (0.58) | 6.41 (0.52)
200 | 50.50 (0.79) | 21.41 (0.80) | 12.11 (0.48) | 9.80 (0.47) | 25.59 (0.62) | 9.25 (0.49)
Table 2. Simulation results for Model 2 over 100 replications. The standard errors are given in parentheses.
p | SE | ATE | OTE | POET | LOREC | Lq-CSCE
(Spectral norm)
30 | 3.30 (0.75) | 3.31 (0.10) | 3.28 (0.03) | 3.77 (0.79) | 3.23 (0.73) | 3.27 (0.77)
100 | 4.99 (0.75) | 7.60 (0.08) | 7.62 (0.13) | 5.48 (0.85) | 4.60 (0.62) | 4.60 (0.63)
200 | 6.85 (0.59) | 8.36 (0.79) | 8.65 (0.49) | 7.40 (0.28) | 6.30 (0.25) | 6.27 (0.63)
(Max norm)
30 | 0.73 (0.18) | 0.74 (0.17) | 0.73 (0.18) | 0.75 (0.19) | 0.72 (0.17) | 0.72 (0.16)
100 | 0.57 (0.08) | 0.62 (0.05) | 0.62 (0.05) | 0.56 (0.09) | 0.55 (0.07) | 0.50 (0.07)
200 | 0.54 (0.07) | 0.54 (0.07) | 0.52 (0.08) | 0.54 (0.07) | 0.52 (0.07) | 0.51 (0.07)
(Frobenius norm)
30 | 5.59 (0.58) | 5.70 (0.59) | 5.58 (0.57) | 5.47 (0.76) | 5.17 (0.58) | 4.98 (0.63)
100 | 12.62 (0.40) | 13.58 (0.05) | 13.49 (0.24) | 10.26 (0.57) | 10.06 (0.40) | 7.91 (0.48)
200 | 22.54 (0.39) | 14.21 (0.09) | 13.94 (0.02) | 16.04 (0.53) | 18.71 (0.39) | 14.37 (0.48)
Table 3. Simulation results for Model 3 over 100 replications. The standard errors are given in parentheses.
p | SE | ATE | OTE | POET | LOREC | Lq-CSCE
(Spectral norm)
30 | 5.59 (1.04) | 5.57 (0.85) | 5.53 (1.03) | 5.73 (1.09) | 5.57 (1.04) | 5.33 (0.99)
100 | 10.09 (0.77) | 9.71 (0.76) | 10.01 (0.77) | 10.19 (0.78) | 9.52 (0.76) | 9.21 (0.96)
200 | 15.00 (0.93) | 13.17 (0.84) | 14.22 (0.89) | 15.69 (0.95) | 14.41 (0.93) | 11.28 (1.11)
(Max norm)
30 | 1.63 (0.30) | 1.70 (0.28) | 1.63 (0.30) | 1.65 (0.29) | 1.63 (0.30) | 1.59 (0.29)
100 | 1.71 (0.29) | 1.71 (0.29) | 1.71 (0.29) | 1.71 (0.29) | 1.67 (0.29) | 1.72 (0.37)
200 | 1.76 (0.22) | 1.76 (0.22) | 1.75 (0.22) | 1.76 (0.22) | 1.75 (0.21) | 1.60 (0.16)
(Frobenius norm)
30 | 10.44 (0.88) | 11.09 (0.94) | 10.34 (0.87) | 10.54 (0.91) | 10.39 (0.88) | 10.10 (0.84)
100 | 27.99 (0.77) | 27.48 (0.76) | 27.78 (0.77) | 27.76 (0.78) | 25.69 (0.76) | 19.66 (0.96)
200 | 52.86 (0.75) | 49.25 (0.74) | 50.16 (0.73) | 49.76 (0.76) | 49.09 (0.74) | 32.23 (0.78)
Table 4. Record of the number of positive semi-definite matrices over 100 replications.
Method p = 30 p = 100 p = 200
L S L S L S
Model1
ATE------
OTE------
POET100010001000
LOREC1008610001000
L q -CSCE100100100100100100
Model2
ATE------
OTE------
POET10010010021000
LOREC010053100100100
L q -CSCE100100100100100100
Model3
ATE------
OTE------
POET100010001000
LOREC31002500
L q -CSCE100100100100100100
Table 5. Simulation results of the Lq-CSCE with different q for Model 1 over 100 replications. The standard errors are given in parentheses.
p | q = 0.1 | q = 0.3 | q = 0.5 | q = 0.7 | q = 0.9
(Spectral norm)
30 | 1.70 (0.49) | 1.71 (0.48) | 1.66 (0.51) | 1.68 (0.46) | 2.12 (0.43)
100 | 2.61 (0.54) | 2.61 (0.54) | 2.62 (0.55) | 2.80 (0.46) | 3.71 (0.32)
200 | 3.10 (0.52) | 3.11 (0.52) | 3.10 (0.53) | 3.42 (0.49) | 4.10 (0.46)
(Max norm)
30 | 1.18 (0.31) | 1.18 (0.32) | 1.13 (0.30) | 1.21 (0.32) | 1.21 (0.33)
100 | 1.53 (0.33) | 1.53 (0.32) | 1.53 (0.34) | 1.54 (0.31) | 1.52 (0.28)
200 | 1.65 (0.26) | 1.65 (0.27) | 1.65 (0.33) | 1.66 (0.32) | 1.66 (0.30)
(Frobenius norm)
30 | 3.12 (0.52) | 3.15 (0.51) | 3.10 (0.54) | 3.73 (0.53) | 4.45 (0.46)
100 | 6.38 (0.54) | 6.34 (0.54) | 6.34 (0.52) | 6.43 (0.64) | 7.21 (0.55)
200 | 9.31 (0.53) | 9.22 (0.55) | 9.21 (0.57) | 9.70 (0.51) | 11.12 (0.49)
Table 6. Simulation results of the Lq-CSCE with different q for Model 2 over 100 replications. The standard errors are given in parentheses.
p | q = 0.1 | q = 0.3 | q = 0.5 | q = 0.7 | q = 0.9
(Spectral norm)
30 | 3.41 (0.75) | 3.41 (0.74) | 3.40 (0.72) | 3.39 (0.68) | 3.38 (0.62)
100 | 4.81 (0.75) | 4.75 (0.10) | 4.77 (0.03) | 4.74 (0.79) | 4.83 (0.79)
200 | 6.35 (0.75) | 6.30 (0.10) | 6.31 (0.03) | 6.40 (0.79) | 6.41 (0.79)
(Max norm)
30 | 0.73 (0.25) | 0.74 (0.18) | 0.74 (0.18) | 0.73 (0.18) | 0.74 (0.19)
100 | 0.57 (0.11) | 0.53 (0.10) | 0.53 (0.07) | 0.53 (0.04) | 0.54 (0.05)
200 | 0.53 (0.10) | 0.50 (0.07) | 0.51 (0.09) | 0.52 (0.07) | 0.54 (0.08)
(Frobenius norm)
30 | 5.58 (0.62) | 5.36 (0.65) | 5.33 (0.64) | 5.32 (0.61) | 5.52 (0.56)
100 | 8.15 (0.62) | 8.09 (0.52) | 8.11 (0.50) | 8.30 (0.52) | 9.14 (0.57)
200 | 14.81 (0.65) | 14.49 (0.57) | 14.62 (0.53) | 14.74 (0.56) | 15.55 (0.49)
Table 7. Simulation results of the Lq-CSCE with different q for Model 3 over 100 replications. The standard errors are given in parentheses.
p | q = 0.1 | q = 0.3 | q = 0.5 | q = 0.7 | q = 0.9
(Spectral norm)
30 | 5.80 (1.01) | 5.63 (0.99) | 5.51 (0.98) | 5.71 (0.95) | 5.65 (1.00)
100 | 8.91 (1.04) | 8.89 (0.90) | 8.97 (0.93) | 8.85 (0.84) | 9.21 (0.79)
200 | 12.40 (1.13) | 12.20 (1.04) | 12.21 (1.01) | 12.34 (1.06) | 13.37 (0.99)
(Max norm)
30 | 1.60 (0.35) | 1.56 (0.29) | 1.56 (0.28) | 1.56 (0.31) | 1.60 (0.29)
100 | 1.73 (0.43) | 1.73 (0.31) | 1.74 (0.34) | 1.76 (0.33) | 1.69 (0.31)
200 | 1.71 (0.29) | 1.71 (0.22) | 1.72 (0.19) | 1.73 (0.19) | 1.78 (0.21)
(Frobenius norm)
30 | 10.12 (0.91) | 9.98 (0.87) | 10.03 (0.89) | 10.14 (0.84) | 10.42 (0.86)
100 | 20.60 (1.03) | 20.01 (0.99) | 20.04 (1.01) | 20.45 (0.95) | 21.36 (0.83)
200 | 31.56 (0.99) | 31.40 (0.90) | 31.29 (0.84) | 31.49 (0.79) | 34.47 (0.72)
