Article

De-Biased Graphical Lasso for High-Frequency Data

Mathematics and Informatics Center and Graduate School of Mathematical Sciences, The University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8914, Japan
Entropy 2020, 22(4), 456; https://doi.org/10.3390/e22040456
Submission received: 6 February 2020 / Revised: 10 April 2020 / Accepted: 14 April 2020 / Published: 17 April 2020

Abstract

This paper develops a new statistical inference theory for the precision matrix of high-frequency data in a high-dimensional setting. The focus is not only on point estimation but also on interval estimation and hypothesis testing for entries of the precision matrix. To accomplish this purpose, we establish an abstract asymptotic theory for the weighted graphical Lasso and its de-biased version without specifying the form of the initial covariance estimator. We also extend the scope of the theory to the case that a known factor structure is present in the data. The developed theory is applied to the concrete situation where we can use the realized covariance matrix as the initial covariance estimator, and we obtain a feasible asymptotic distribution theory to construct (simultaneous) confidence intervals and (multiple) testing procedures for entries of the precision matrix.

1. Introduction

In high-frequency financial econometrics, covariance matrix estimation of asset returns has been extensively studied in the past two decades. High-frequency financial data are commonly modeled as a discretely observed semimartingale for which the quadratic covariation matrix plays the role of the covariance matrix, so their treatment often differs from that in a standard i.i.d. setting. In recent years, motivated by applications to portfolio allocation and risk management in a large-scale asset universe, the high-dimensionality problem has attracted much attention in this area. Since the 2000s, great progress has been made in high-dimensional covariance estimation from i.i.d. data, so researchers are naturally led to apply the techniques developed therein to the context of high-frequency data. For example, Wang and Zou [1] have applied the entry-wise shrinkage methods considered in [2,3] to estimating the covariance matrix of high-frequency data which are asynchronously observed with noise. See also [4,5,6,7] for further developments in this approach.

In the meantime, it is well-recognized that the factor structure is an important ingredient, both theoretically and empirically, for financial data. In the context of high-dimensional covariance estimation from high-frequency data, this perspective was first taken into account by Fan et al. [8] and subsequently built up by, among others, [9,10,11]. Other common methods used in i.i.d. settings have also been investigated in the literature of high-frequency financial econometrics. Hautsch et al. [12] and Morimoto and Nagata [13] formally apply eigenvalue regularization methods based on random matrix theory to high-frequency data. Lam et al. [14] adapt the non-linear shrinkage estimator of [15] to a high-frequency data setting with the help of the spectral distribution theory for the realized covariance matrix developed in [16]. Brownlees et al. [17] employ the $\ell_1$-penalized Gaussian MLE, known as the graphical Lasso, to estimate the precision matrix (the inverse of the covariance matrix) of high-frequency data. The last approach is closely related to the methodology we will focus on.

Despite the recent advances in this topic as above, most studies in this area focus only on point estimation of covariance and precision matrices, and there is little work on interval estimation and hypothesis testing for these objects. A few exceptions are [18,19,20]. The first two articles are concerned with continuous-time factor models: Kong and Liu [18] propose a test for the constancy of the factor loading matrix, while Pelger [19] assumes constant loadings and develops an asymptotic distribution theory to make inference for the factors and loadings. Meanwhile, Koike [20] establishes a high-dimensional central limit theorem for the realized covariance matrix which allows us to construct simultaneous confidence regions or carry out multiple testing for entries of the high-dimensional covariance matrix of high-frequency data.

The aim of this study is to develop such a statistical inference theory for the precision matrix of high-frequency data. This is naturally motivated by the fact that the precision matrix of asset returns plays an important role in mean-variance analysis of portfolio allocation (see e.g., [21], Chapter 5). We accomplish this purpose by imposing a sparsity assumption on the precision matrix.
Such an assumption has a clear interpretation in connection with Gaussian graphical models: For a Gaussian random vector $\xi = (\xi^1, \dots, \xi^d)^\top$ with covariance matrix $\Sigma$, $\xi^i$ and $\xi^j$ are conditionally independent given the other components if and only if the $(i,j)$-th entry of $\Sigma^{-1}$ is equal to 0, so the sparsity of $\Sigma^{-1}$ is interpreted as the sparsity of the edge structure of the conditional independence graph associated with $\xi$. We refer to Chapter 13 of [22] and references therein for more details on graphical models. This standpoint also makes it interesting to estimate the precision matrix of financial data in view of the recent attention to financial network analysis such as [23].
Statistical inference for high-dimensional sparse precision matrices has been actively studied in the recent literature, and various methodologies have been proposed; see [24] for an overview. Among others, this paper studies (a weighted version of) the de-biased (or de-sparsified) graphical Lasso in the context of high-frequency data. The de-biased graphical Lasso was introduced in Janková and van de Geer [25], where its theoretical properties were investigated in the i.i.d. case. In this paper, we consider its weighted version discussed in [24] because of its theoretically preferable behavior due to its adaptive nature (see Remarks 1 and 2). Compared to the i.i.d. case, we need to handle a new theoretical difficulty in the application to high-frequency data, which is caused by the non-ergodic nature of the problem: the precision matrix of high-frequency data is generally stochastic and not (stochastically) independent of the observation data. In our context, the precision matrix appears in the coefficients of the linear approximation of the de-biased estimator (see Lemma 1), so it spoils the martingale structure of the linear approximation which we usually have in the i.i.d. case. In a low-dimensional setting, this issue is typically resolved by the concept of stable convergence (see e.g., [26]), but the applicability of this approach is questionable in our setting due to the high-dimensionality (see pages 1451–1452 of [20] for a discussion). Instead, we rely on the recent high-dimensional central limit theory of [20] to establish the asymptotic distribution theory for the de-biased estimator, where we settle the above difficulty with the help of Malliavin calculus.
The graphical Lasso is an example of penalized estimation methods. We shall mention that penalized estimation has recently become an active research topic in the setting of asymptotic statistics for stochastic processes. For example, penalized quasi-likelihood estimation for stochastic processes has been developed in the fixed-dimensional setting by [27,28,29,30], while estimation for linearly parameterized high-dimensional diffusion models has been studied in [31,32]. Compared to these articles, this paper is novel in that it develops an asymptotic distribution theory in a high-dimensional setting.
The rest of this paper is organized as follows. In Section 2 we develop an abstract asymptotic theory for the weighted graphical Lasso based on a generic estimator for the quadratic covariation matrix of a high-dimensional semimartingale. This allows us to flexibly apply the developed theory to various settings arising in high-frequency financial econometrics. In Section 3 we extend the scope of the theory to a situation where a known factor structure is present in data and a sparsity assumption is imposed on the precision matrix of the residual process rather than that of the original process. In Section 4, we apply the abstract theory developed in Section 3 to a concrete setting where we observe the process at equidistant times without jumps and noise. Section 5 conducts a Monte Carlo study to assess the finite sample performance of the asymptotic theory, while Section 6 performs a simple real data analysis for illustration. All the technical proofs are collected in the Appendix A, Appendix B, Appendix C and Appendix D.
Notation 1.
Throughout the paper, we assume $d \geq 2$. $^\top$ stands for the transpose of a matrix. For a vector $x \in \mathbb{R}^d$, we write the $i$-th component of $x$ as $x^i$ for $i = 1, \dots, d$. For two vectors $x, y \in \mathbb{R}^d$, the statement $x \leq y$ means $x^i \leq y^i$ for all $i = 1, \dots, d$. The identity matrix of size $d$ is denoted by $E_d$. We write $\mathbb{R}^{l \times k}$ for the set of all $l \times k$ matrices. $\mathcal{S}_d$ denotes the set of all $d \times d$ symmetric matrices. $\mathcal{S}_d^{+}$ denotes the set of all $d \times d$ positive semidefinite matrices. $\mathcal{S}_d^{++}$ denotes the set of all $d \times d$ positive definite matrices. For an $l \times k$ matrix $A$, the $(i,j)$-th entry of $A$ is denoted by $A^{ij}$. Also, $A^{i\cdot}$ and $A^{\cdot j}$ denote the $i$-th row vector and the $j$-th column vector, respectively (both are regarded as column vectors). We write $\operatorname{vec}(A)$ for the vectorization of $A$:
$$\operatorname{vec}(A) := (A^{11}, \dots, A^{l1}, A^{12}, \dots, A^{l2}, \dots, A^{1k}, \dots, A^{lk})^\top \in \mathbb{R}^{lk}.$$
For every $w \in [1, \infty]$, we set
$$\|A\|_w := \begin{cases} \left(\sum_{i=1}^{l}\sum_{j=1}^{k} |A^{ij}|^w\right)^{1/w} & \text{if } w < \infty, \\ \max_{1 \leq i \leq l}\max_{1 \leq j \leq k} |A^{ij}| & \text{if } w = \infty. \end{cases}$$
Also, we write $|||A|||_w$ for the $\ell_w$-operator norm of $A$:
$$|||A|||_w := \sup\{\|Ax\|_w : x \in \mathbb{R}^k,\ \|x\|_w = 1\}.$$
It is well-known that $|||A|||_1 = \max_{1 \leq j \leq k}\sum_{i=1}^{l}|A^{ij}|$ and $|||A|||_\infty = \max_{1 \leq i \leq l}\sum_{j=1}^{k}|A^{ij}|$. When $l = k$, $\operatorname{diag}(A)$ denotes the diagonal matrix with the same diagonal entries as $A$, and we set $A^- := A - \operatorname{diag}(A)$. If $A$ is symmetric, we denote by $\Lambda_{\max}(A)$ and $\Lambda_{\min}(A)$ the maximum and minimum eigenvalues of $A$, respectively. For two matrices $A$ and $B$, $A \otimes B$ denotes their Kronecker product. When $A$ and $B$ have the same size, we write $A \circ B$ for their Hadamard product.
For a random variable $\xi$ and $p \in (0, \infty]$, $\|\xi\|_p$ denotes the $L^p$-norm of $\xi$. For an $l$-dimensional semimartingale $X = (X_t)_{t \in [0,1]}$ and a $k$-dimensional semimartingale $Y = (Y_t)_{t \in [0,1]}$, we define $\Sigma_{XY} := [X, Y]_1 := ([X^i, Y^j]_1)_{1 \leq i \leq l, 1 \leq j \leq k}$. We write $\Sigma_X = \Sigma_{XX}$ for short. If $\Sigma_X$ is a.s. invertible, we write $\Theta_X := \Sigma_X^{-1}$.

2. Estimators and Abstract Results

Given a stochastic basis $\mathcal{B} = (\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in [0,1]}, P)$, we consider a $d$-dimensional semimartingale $Y = (Y_t)_{t \in [0,1]}$ defined there. We assume $\Sigma_Y = [Y, Y]_1$ is a.s. invertible. In this paper, we consider the asymptotic theory such that the dimension $d$ possibly depends on a parameter $n \in \mathbb{N}$ so that $d = d_n \to \infty$ as $n \to \infty$. Consequently, both $\mathcal{B}$ and $Y$ may also depend on $n$. However, following the custom of the literature, we omit the indices $n$ from these objects and many other ones appearing below.
Our aim is to estimate the precision matrix $\Theta_Y = \Sigma_Y^{-1}$ when we have an estimator $\hat{\Sigma}_n$ for $\Sigma_Y$; as a corollary, we can also estimate $\Sigma_Y$ itself. We assume that $\hat{\Sigma}_n$ is an $\mathcal{S}_d^{+}$-valued random variable all of whose diagonal entries are a.s. positive, but we do not specify the form of $\hat{\Sigma}_n$ because the asymptotic theory developed in this section depends on the properties of $\hat{\Sigma}_n$ rather than its construction. This is convenient because construction of the estimator depends heavily on the observation scheme for $Y$ (with or without noise, synchronous or not, continuous or discontinuous and so on; see [33] for details). In Section 4 we illustrate how we apply the abstract theory developed in this and the next sections to a concrete situation.
We use the weighted graphical Lasso to estimate $\Theta_Y$ (cf. [24]). The weighted graphical Lasso estimator $\hat{\Theta}_\lambda$ with penalty parameter $\lambda > 0$ based on $\hat{\Sigma}_n$ is defined by
$$\hat{\Theta}_\lambda := \arg\min_{\Theta \in \mathcal{S}_d^{++}} \left\{ \operatorname{tr}(\Theta \hat{\Sigma}_n) - \log\det\Theta + \lambda \sum_{i \neq j} \hat{V}_n^{ii}\hat{V}_n^{jj}\,|\Theta^{ij}| \right\}, \qquad (1)$$
where $\hat{V}_n := \operatorname{diag}(\hat{\Sigma}_n)^{1/2}$. According to the proof of [34] (Lemma 1), the optimization problem in Equation (1) has a unique solution when $\lambda > 0$, $\hat{\Sigma}_n$ is positive semidefinite and all the diagonal entries of $\hat{\Sigma}_n$ are positive, so $\hat{\Theta}_\lambda$ is a.s. well-defined in our setting. In the following we allow $\lambda$ to be a random variable because we typically select $\lambda$ in a data-driven way.
To analyze the theoretical property of $\hat{\Theta}_\lambda$, it is convenient to consider the graphical Lasso estimator $\hat{K}_\lambda$ based on the correlation matrix estimator $\hat{R}_n := \hat{V}_n^{-1}\hat{\Sigma}_n\hat{V}_n^{-1}$ as follows:
$$\hat{K}_\lambda := \arg\min_{K \in \mathcal{S}_d^{++}} \left\{ \operatorname{tr}(K \hat{R}_n) - \log\det K + \lambda \|K^-\|_1 \right\}.$$
We can easily check $\hat{\Theta}_\lambda = \hat{V}_n^{-1}\hat{K}_\lambda\hat{V}_n^{-1}$.
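For concreteness, the following is a minimal R sketch of this construction; it assumes (as in the glasso package) that glassoFast accepts a matrix-valued penalty so that only the off-diagonal entries are penalized, and all function and variable names are illustrative rather than part of the paper.

```r
# Weighted graphical Lasso (Equation (1)): run the graphical Lasso on the
# correlation matrix estimator R_hat and rescale by the estimated volatilities.
library(glassoFast)

weighted_glasso <- function(Sigma_hat, lambda) {
  d     <- nrow(Sigma_hat)
  v     <- sqrt(diag(Sigma_hat))             # diagonal of V_hat = diag(Sigma_hat)^{1/2}
  R_hat <- Sigma_hat / tcrossprod(v)         # correlation matrix estimator R_hat
  rho   <- lambda * (1 - diag(d))            # penalize off-diagonal entries only
  K_hat <- glassoFast(R_hat, rho = rho)$wi   # graphical Lasso based on R_hat
  Theta <- K_hat / tcrossprod(v)             # Theta_hat = V_hat^{-1} K_hat V_hat^{-1}
  list(Theta = Theta, K = K_hat)
}
```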
Remark 1.
As pointed out in Rothman et al. [35] and Janková and van de Geer [24], the graphical Lasso based on correlation matrices is theoretically preferable to that based on covariance matrices (so the weighted graphical Lasso is also preferable). In particular, we do not need to impose the so-called irrepresentability condition on Σ Y to derive the theoretical properties of our estimators, which contrasts with Brownlees et al. [17] (see Assumption 2 in [17]). See also Remark 2 for an additional discussion.
We introduce some notation related to the sparsity assumptions we will impose on $\Theta_Y$. Let $A \in \mathcal{S}_d$. For $j = 1, \dots, d$, we set $D_j(A) := \{i : A^{ij} \neq 0, i \neq j\}$ and $d_j(A) := \#D_j(A)$. Then we define $d(A) := \max_{1 \leq j \leq d} d_j(A)$. We also define $S(A) := \bigcup_{j=1}^{d}(D_j(A) \times \{j\}) = \{(i,j) : A^{ij} \neq 0, i \neq j\}$ and $s(A) := \#S(A)$. These quantities have a clear interpretation when the matrix $A$ represents the edge structure of some graph so that $A^{ij} \neq 0$ is equivalent to the presence of an edge between vertices $i$ and $j$ for $i \neq j$; in this case, $d_j(A)$ is the number of edges adjacent to vertex $j$ (which is called the degree of vertex $j$) and $s(A)$ is the total number of edges contained in the graph.
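As a small base-R illustration (not part of the paper), these sparsity measures can be computed directly from a symmetric matrix:

```r
# Sparsity measures of a symmetric matrix A:
# d_j(A): number of non-zero off-diagonal entries in column j (degree of vertex j),
# d(A):   maximum degree, s(A): number of non-zero off-diagonal entries.
sparsity_measures <- function(A, tol = 0) {
  nz <- abs(A) > tol
  diag(nz) <- FALSE
  d_j <- colSums(nz)
  list(d = max(d_j), s = sum(nz))
}
```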
To derive our asymptotic results, we will impose the following structural assumptions on Σ Y .
[A1]
$\Lambda_{\max}(\Sigma_Y) + 1/\Lambda_{\min}(\Sigma_Y) = O_p(1)$ as $n \to \infty$.
[A2]
$s(\Theta_Y) = O_p(s_n)$ as $n \to \infty$ for some sequence $s_n \in [1, \infty)$, $n = 1, 2, \dots$.
[A3]
$d(\Theta_Y) = O_p(d_n)$ as $n \to \infty$ for some sequence $d_n \in [1, \infty)$, $n = 1, 2, \dots$.
[A1] is standard in the literature; see e.g., Condition A1 in [24]. [A2] states that the sparsity of Θ Y is controlled by the deterministic sequence s n ; we will require the growth rate of s n to be moderate. [A3] is another sparsity assumption on Θ Y . It is weaker than [A2] in the sense that it always holds true with d n = s n under [A2]. However, we can generally take d n smaller than s n .

2.1. Consistency

Set $V_Y := \operatorname{diag}(\Sigma_Y)^{1/2}$, $R_Y := V_Y^{-1}\Sigma_Y V_Y^{-1}$ and $K_Y := R_Y^{-1}$.
Proposition 1.
Assume [A1]–[A2]. Let $(\lambda_n)_{n=1}^{\infty}$ be a sequence of positive-valued random variables satisfying the following conditions:
[B1]
$\lambda_n^{-1}\|\hat{\Sigma}_n - \Sigma_Y\|_\infty \to^p 0$ as $n \to \infty$.
[B2]
$s_n\lambda_n \to^p 0$ as $n \to \infty$.
Then we have
$$\lambda_n^{-1}\|\hat{K}_{\lambda_n} - K_Y\|_2 = O_p(s_n), \qquad \lambda_n^{-1}|||\hat{K}_{\lambda_n} - K_Y|||_w = O_p(s_n)$$
and
$$\lambda_n^{-1}|||\hat{\Theta}_{\lambda_n} - \Theta_Y|||_w = O_p(s_n), \qquad \lambda_n^{-1}|||\hat{\Theta}_{\lambda_n}^{-1} - \Sigma_Y|||_2 = O_p(s_n)$$
as $n \to \infty$ for any $w \in [1, \infty]$.
Proposition 1 is essentially a rephrasing of Theorem 14.1.3 in [24]. To get a better convergence rate in Proposition 1, we should choose $\lambda_n$ as small as possible, where a lower bound for $\lambda_n$ is determined through [B1] by the convergence rate of $\hat{\Sigma}_n$ in the $\infty$-norm. One typically derives this convergence rate by establishing entry-wise concentration inequalities for $\hat{\Sigma}_n$. Such inequalities have already been established for various covariance estimators used in high-frequency financial econometrics; see Theorems 1–2 and Lemma 3 in [36], Theorem 1 in [4], Theorem 1 in [37], and Theorem 2 in [17] for example. We note, however, that $\hat{\Sigma}_n$ should be positive semidefinite to ensure that the graphical Lasso has a unique solution. This property is not necessarily ensured by many covariance estimators used in this area. In this regard, we mention that pre-averaging and realized kernel estimators have versions that ensure this property, for which relevant bounds are available in [6] (Theorem 2) and [11] (Lemma 1).
Remark 2
(Comparison to Brownlees et al. [17]). Compared with [17] (Theorem 1), Proposition 1 has two major theoretical improvements. First, Proposition 1 does not assume the so-called irrepresentability condition, which is imposed in [17] (Theorem 1) as Assumption 2. In fact, under the assumptions of Proposition 1, the unweighted graphical Lasso estimator adopted in [17] would have the convergence rate $(s_n + d)\lambda_n$ (rather than $s_n\lambda_n$ in our case) for estimating $\Theta_Y$ in the norm $|||\cdot|||_w$, in view of [24] (Theorem 14.1.2). This means that we would need to select $\lambda_n$ so that $d\lambda_n \to^p 0$ as $n \to \infty$ to ensure consistency, which is much stronger than the corresponding assumption [B2] in our setting. Since $\lambda_n$ typically converges to 0 no faster than $1/\sqrt{n}$ with $n$ being the sample size (cf. Section 4), the condition $d\lambda_n \to^p 0$ excludes high-dimensional settings such that $d \gtrsim \sqrt{n}$.
Second, Proposition 1 gives consistency in the $\ell_w$-operator norm for all $w \in [1, \infty]$, while [17] (Theorem 1) only shows consistency in the $\infty$-norm. We shall remark that consistency in matrix operator norms is important in applications. For example, the consistency of $\hat{\Theta}_{\lambda_n}$ in the $\ell_2$-operator norm implies that the eigenvalues of $\hat{\Theta}_{\lambda_n}$ consistently estimate the corresponding eigenvalues of $\Theta_Y$. Also, the consistency in the $\ell_\infty$-operator norm ensures $\|\hat{\Theta}_{\lambda_n}x - \Theta_Y x\|_\infty \to^p 0$ as $n \to \infty$ for any $x \in \mathbb{R}^d$ such that $\|x\|_\infty = O(1)$. This result is important for portfolio allocation because the weight vector of the global minimum variance portfolio is given by $\Theta_Y\mathbf{1}/(\mathbf{1}^\top\Theta_Y\mathbf{1})$ when the assets have covariance matrix $\Sigma_Y$, where $\mathbf{1} = (1, \dots, 1)^\top \in \mathbb{R}^d$; see e.g., [21] (Section 5.2).
On the other hand, unlike [17] (Theorem 1), we do not show selection consistency (i.e., $P(S(\hat{\Theta}_{\lambda_n}) = S(\Theta_Y)) \to 1$ as $n \to \infty$) under our assumptions. Indeed, in the linear regression setting, it is known that an irrepresentability-type condition is necessary for the selection consistency of the Lasso; see [22] (Section 7.5.3) for more details. This suggests that our estimator would not have the oracle property in the sense of [38] in general. However, we shall remark that the asymptotic mixed normality of the de-biased estimator stated below can be used to construct an estimator with selection consistency via thresholding as in e.g., [39] (Section 3.1) and [40] (Section 4.2). See Corollary 2 and the subsequent discussion for details.

2.2. Asymptotic Mixed Normality

The following lemma states that $\hat{\Theta}_{\lambda_n} - \Theta_Y$ is asymptotically linear in $\hat{\Sigma}_n - \Sigma_Y$ after bias correction when $\Theta_Y$ is sufficiently sparse.
Lemma 1.
Suppose that the assumptions of Proposition 1 and [A3] are satisfied. Then we have
$$\lambda_n^{-2}\left\|\hat{\Theta}_{\lambda_n} - \Theta_Y - \Gamma_n + \Theta_Y(\hat{\Sigma}_n - \Sigma_Y)\Theta_Y\right\|_\infty = O_p(s_n d_n)$$
as $n \to \infty$, where $\Gamma_n := -(\hat{\Theta}_{\lambda_n} - \hat{\Theta}_{\lambda_n}\hat{\Sigma}_n\hat{\Theta}_{\lambda_n})$.
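Under the sign convention for $\Gamma_n$ reconstructed above, the de-biased estimator $\hat{\Theta}_{\lambda_n} - \Gamma_n$ coincides with the familiar de-sparsified form $2\hat{\Theta}_{\lambda_n} - \hat{\Theta}_{\lambda_n}\hat{\Sigma}_n\hat{\Theta}_{\lambda_n}$; a one-line R sketch (illustrative only, continuing the helper from Section 2):

```r
# De-biased weighted graphical Lasso (Lemma 1):
# Theta_hat - Gamma_n = 2 * Theta_hat - Theta_hat %*% Sigma_hat %*% Theta_hat.
debias_glasso <- function(Theta_hat, Sigma_hat) {
  2 * Theta_hat - Theta_hat %*% Sigma_hat %*% Theta_hat
}
```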
Lemma 1 is an almost straightforward consequence of Equation (4) and the Karush–Kuhn–Tucker (KKT) conditions for the optimization problem in Equation (1). As a consequence of this lemma, we obtain the following result, which states that the "de-biased" weighted graphical Lasso estimator $\hat{\Theta}_{\lambda_n} - \Gamma_n$ of $\Theta_Y$ inherits the asymptotic mixed normality of $\hat{\Sigma}_n$.
Proposition 2.
Suppose that the assumptions of Lemma 1 are satisfied. For every $n \in \mathbb{N}$, let $a_n > 0$, let $C_n$ be a $d^2 \times d^2$ positive semidefinite random matrix and let $J_n$ be an $m \times d^2$ random matrix, where $m = m_n$ may depend on $n$. Assume $a_n|||J_n|||_\infty\lambda_n^2 s_n d_n \log(m+1) \to^p 0$ as $n \to \infty$. Assume also that
$$\lim_{n \to \infty}\sup_{y \in \mathbb{R}^m}\left|P\left(a_n\tilde{J}_n\operatorname{vec}(\hat{\Sigma}_n - \Sigma_Y) \leq y\right) - P\left(\tilde{J}_nC_n^{1/2}\zeta_n \leq y\right)\right| = 0 \qquad (5)$$
and
$$\lim_{b \downarrow 0}\limsup_{n \to \infty}P\left(\min\operatorname{diag}(\tilde{J}_nC_n\tilde{J}_n^\top) < b\right) = 0$$
as $n \to \infty$, where $\tilde{J}_n := J_n(\Theta_Y \otimes \Theta_Y)$ and $\zeta_n$ is a $d^2$-dimensional standard Gaussian vector independent of $\mathcal{F}$, which is defined on an extension of the probability space $(\Omega, \mathcal{F}, P)$ if necessary. Then,
$$\lim_{n \to \infty}\sup_{y \in \mathbb{R}^m}\left|P\left(a_nJ_n\operatorname{vec}(\hat{\Theta}_{\lambda_n} - \Gamma_n - \Theta_Y) \leq y\right) - P\left(\tilde{J}_nC_n^{1/2}\zeta_n \leq y\right)\right| = 0.$$
In a standard i.i.d. setting such that $\Theta_Y$ is non-random, we can usually verify Equation (5) by the classical Lindeberg central limit theorem when $m = 1$ and $J_n$ is non-random, because $a_n\tilde{J}_n\operatorname{vec}(\hat{\Sigma}_n - \Sigma_Y)$ can be written as a sum of independent random variables; see the proof of [25] (Theorem 1) for example. By contrast, $\Theta_Y$ is generally random and not independent of $\hat{\Sigma}_n - \Sigma_Y$ in our setting, so $a_n\tilde{J}_n\operatorname{vec}(\hat{\Sigma}_n - \Sigma_Y)$ may not be a martingale even if $\operatorname{vec}(\hat{\Sigma}_n - \Sigma_Y)$ is a martingale. In the case that $d$ is fixed, we typically resolve this issue by proving stable convergence in law of $\operatorname{vec}(\hat{\Sigma}_n - \Sigma_Y)$; see e.g., [26] for details. However, extension of this approach to the case that $d \to \infty$ as $n \to \infty$ is far from trivial, as discussed at the beginning of [20] (Section 3). For this reason, [20] gives a result to directly establish convergence of the type in Equation (5) in a high-dimensional setting. This result will be used in Section 4 to apply our abstract theory to a more concrete setting.
Remark 3.
Proposition 2 also allows $m$ to diverge as $n \to \infty$, which is necessary when we need to derive an asymptotic approximation of the joint distribution of $\operatorname{vec}(\hat{\Theta}_{\lambda_n} - \Gamma_n - \Theta_Y)$. Such an approximation can be used to make simultaneous inference for entries of $\Theta_Y$; see [40] for example.

3. Factor Structure

In financial applications, it is often important to take account of the factor structure of asset prices. In fact, many empirical studies have documented the existence of common factors in financial markets (e.g., [41] (Section 6.5)). Also, factor models play a dominant role in asset pricing theory (cf. [21] (Chapter 9)). When common factors are present across asset returns, the precision matrix cannot be sparse because all pairs of assets are partially correlated, given the other assets, through the common factors. Therefore, in such a situation, it is common practice to impose a sparsity assumption on the precision matrix of the residual process obtained after removing the co-movements induced by the factors (see e.g., [17] (Section 4.2) and [42] (Section 4.2)). In this section, we adapt the theory developed in Section 2 to such an application.
Specifically, suppose that we have an $r$-dimensional known factor process $X$, and consider the following continuous-time factor model:
$$Y = \beta X + Z. \qquad (7)$$
Here, $\beta$ is a non-random $d \times r$ matrix and $Z$ is a $d$-dimensional semimartingale such that $[Z, X]_1 = 0$. $\beta$ and $Z$ represent the factor loading matrix and the residual process of the model, respectively. This model is widely used in high-frequency financial econometrics; see [8,9,11] in the context of high-dimensional covariance matrix estimation. One restriction of the model in Equation (7) is that the factor loading $\beta$ is assumed to be constant, but there is empirical evidence that $\beta$ may be regarded as constant over short time intervals (one week or less); see [18,43] for instance.
Remark 4.
The number of factors $r$ possibly depends on $n$ and (slowly) diverges as $n \to \infty$. Also, $\beta$ may depend on $n$.
We are interested in estimating $\Sigma_Y$ based on observation data for $X$ and $Y$ while taking account of the factor structure given by Equation (7). Suppose that we have generic estimators $\hat{\Sigma}_{Y,n}$, $\hat{\Sigma}_{X,n}$ and $\hat{\Sigma}_{YX,n}$ for $\Sigma_Y$, $\Sigma_X$ and $\Sigma_{YX}$, respectively. $\hat{\Sigma}_{Y,n}$, $\hat{\Sigma}_{X,n}$ and $\hat{\Sigma}_{YX,n}$ are assumed to be random variables taking values in $\mathcal{S}_d$, $\mathcal{S}_r^{+}$ and $\mathbb{R}^{d \times r}$, respectively. Now, by assumption we have
$$\Sigma_Y = \beta\Sigma_X\beta^\top + \Sigma_Z. \qquad (8)$$
Assume $\Sigma_X$ is a.s. invertible. Then $\beta$ can be written as $\beta = \Sigma_{YX}\Sigma_X^{-1}$. Therefore, we can naturally estimate $\beta$ by $\hat{\beta}_n := \hat{\Sigma}_{YX,n}\hat{\Sigma}_{X,n}^{-1}$, provided that $\hat{\Sigma}_{X,n}$ is invertible. In practical applications, the invertibility of $\hat{\Sigma}_{X,n}$ is usually not problematic because the number of factors $r$ is sufficiently small compared to the sample size. However, it is theoretically convenient to (formally) define $\hat{\beta}_n$ also in the case that $\hat{\Sigma}_{X,n}$ is singular. For this reason, we take an $\mathcal{S}_r^{++}$-valued random variable $\hat{\Sigma}_{X,n}^{-}$ such that $\hat{\Sigma}_{X,n}^{-} = \hat{\Sigma}_{X,n}^{-1}$ on the event where $\hat{\Sigma}_{X,n}$ is invertible, and redefine $\hat{\beta}_n$ as $\hat{\beta}_n := \hat{\Sigma}_{YX,n}\hat{\Sigma}_{X,n}^{-}$. This does not affect the asymptotic properties of our estimators because $\hat{\Sigma}_{X,n}$ is asymptotically invertible under the assumptions we will impose. Now, from Equation (8), $\Sigma_Z$ is estimated by
$$\hat{\Sigma}_{Z,n} := \hat{\Sigma}_{Y,n} - \hat{\beta}_n\hat{\Sigma}_{X,n}\hat{\beta}_n^\top. \qquad (9)$$
Since $\hat{\Sigma}_{Z,n}$ might be a poor estimator for $\Sigma_Z$ because $d$ can be extremely large in our setting, we apply the weighted graphical Lasso to $\hat{\Sigma}_{Z,n}$ in order to estimate $\Sigma_Z$. Specifically, we construct the weighted graphical Lasso estimator $\hat{\Theta}_{Z,\lambda}$ based on $\hat{\Sigma}_{Z,n}$ as follows:
$$\hat{\Theta}_{Z,\lambda} := \arg\min_{\Theta \in \mathcal{S}_d^{++}} \left\{ \operatorname{tr}(\Theta\hat{\Sigma}_{Z,n}) - \log\det\Theta + \lambda\sum_{i \neq j}\sqrt{\hat{\Sigma}_{Z,n}^{ii}\hat{\Sigma}_{Z,n}^{jj}}\,|\Theta^{ij}| \right\}. \qquad (10)$$
Then $\Sigma_Z$ is estimated by the inverse of $\hat{\Theta}_{Z,\lambda}$. Hence our final estimator for $\Sigma_Y$ is constructed as
$$\hat{\Sigma}_{Y,\lambda} := \hat{\beta}_n\hat{\Sigma}_{X,n}\hat{\beta}_n^\top + \hat{\Theta}_{Z,\lambda}^{-1}. \qquad (11)$$
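A compact R sketch of this factor-adjusted construction (Equations (9)–(11)) is given below, reusing the weighted_glasso helper sketched in Section 2. For simplicity it inverts $\hat{\Sigma}_{X,n}$ directly with solve(), whereas the paper formally handles a possibly singular $\hat{\Sigma}_{X,n}$ as described above; all names are illustrative assumptions.

```r
# Factor-adjusted weighted graphical Lasso (Equations (9)-(11)).
factor_wglasso <- function(Sigma_Y_hat, Sigma_X_hat, Sigma_YX_hat, lambda) {
  beta_hat    <- Sigma_YX_hat %*% solve(Sigma_X_hat)                     # beta_hat = Sigma_YX Sigma_X^{-1}
  Sigma_Z_hat <- Sigma_Y_hat - beta_hat %*% Sigma_X_hat %*% t(beta_hat)  # Equation (9)
  Theta_Z_hat <- weighted_glasso(Sigma_Z_hat, lambda)$Theta              # Equation (10)
  Sigma_Y_lam <- beta_hat %*% Sigma_X_hat %*% t(beta_hat) +
                 solve(Theta_Z_hat)                                      # Equation (11)
  list(beta = beta_hat, Sigma_Z = Sigma_Z_hat,
       Theta_Z = Theta_Z_hat, Sigma_Y = Sigma_Y_lam)
}
```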
Remark 5.
Although we will impose the assumptions which guarantee that the optimization problem in Equation (10) asymptotically has the unique solution with probability 1, it may have no solution for a fixed n. Thus, we formally define Θ ^ Z , λ as an S d + + -valued random variable such that Θ ^ Z , λ is defined by Equation (10) on the event where the optimization problem in Equation (10) has the unique solution.
Remark 6
(Positive definiteness of Σ ^ Y , λ ). Since Θ ^ Z , λ 1 is positive definite by construction, Σ ^ Y , λ is positive definite (note that we assume Σ ^ X , n is positive semidefinite).
We will impose the following structural assumptions on the model:
[C1]
$\|\Sigma_Y\|_\infty = O_p(1)$ and $\|\beta\|_\infty = O(1)$ as $n \to \infty$.
[C2]
$\Lambda_{\max}(\Sigma_Z) + 1/\Lambda_{\min}(\Sigma_Z) = O_p(1)$ as $n \to \infty$.
[C3]
$\|\Sigma_X\|_\infty + 1/\Lambda_{\min}(\Sigma_X) = O_p(1)$ as $n \to \infty$.
[C4]
$s(\Theta_Z) = O_p(s_n)$ as $n \to \infty$ for some sequence $s_n \in [1, \infty)$, $n = 1, 2, \dots$.
[C5]
$d(\Theta_Z) = O_p(d_n)$ as $n \to \infty$ for some sequence $d_n \in [1, \infty)$, $n = 1, 2, \dots$.
[C6]
There is a positive definite $r \times r$ matrix $B$ such that $|||d^{-1}\beta^\top\beta - B|||_2 \to 0$ and $\Lambda_{\min}(B)^{-1} = O(1)$ as $n \to \infty$.
[C1]–[C3] are natural structural assumptions on the model and standard in the literature; see e.g., Assumptions 2.1 and 3.3 in [44]. [C4]–[C5] are sparsity assumptions on the precision matrix of the residual process and necessary for our application of the (weighted) graphical Lasso. [C6] requires the factors to have non-negligible impact on almost all assets and is also standard in the context of covariance matrix estimation based on a factor model; see e.g., Assumption 3.5 in [44] and Assumption 6 in [8].
The following result establishes the consistency of the residual precision matrix estimator Θ ^ Z , λ .
Proposition 3.
Assume [C1]–[C4]. Let $(\lambda_n)_{n=1}^{\infty}$ be a sequence of positive-valued random variables satisfying the following conditions:
[D1]
$\lambda_n^{-1}\|\hat{\Sigma}_{X,n} - \Sigma_X\|_\infty \to^p 0$, $\lambda_n^{-1}\|\hat{\Sigma}_{YX,n} - \beta\hat{\Sigma}_{X,n}\|_\infty \to^p 0$ and $\lambda_n^{-1}\|\breve{\Sigma}_{Z,n} - \Sigma_Z\|_\infty \to^p 0$ as $n \to \infty$, where $\breve{\Sigma}_{Z,n} := \hat{\Sigma}_{Y,n} - \hat{\Sigma}_{YX,n}\beta^\top - \beta\hat{\Sigma}_{YX,n}^\top + \beta\hat{\Sigma}_{X,n}\beta^\top$.
[D2]
$(s_n + r)\lambda_n \to^p 0$ as $n \to \infty$.
[D3]
$P(\bar{\Sigma}_n \in \mathcal{S}_{r+d}^{+}) \to 1$ as $n \to \infty$, where
$$\bar{\Sigma}_n := \begin{pmatrix} \hat{\Sigma}_{X,n} & \hat{\Sigma}_{YX,n}^\top \\ \hat{\Sigma}_{YX,n} & \hat{\Sigma}_{Y,n} \end{pmatrix}.$$
Then $\lambda_n^{-1}|||\hat{\Theta}_{Z,\lambda_n} - \Theta_Z|||_w = O_p(s_n)$ and $\lambda_n^{-1}|||\hat{\Theta}_{Z,\lambda_n}^{-1} - \Sigma_Z|||_2 = O_p(s_n)$ as $n \to \infty$ for any $w \in [1, \infty]$.
Remark 7.
(a) Since $\Sigma_{ZX} = \Sigma_{YX} - \beta\Sigma_X$ and $\Sigma_Z = \Sigma_Y - \Sigma_{YX}\beta^\top - \beta\Sigma_{XY} + \beta\Sigma_X\beta^\top$, the quantities $\hat{\Sigma}_{YX,n} - \beta\hat{\Sigma}_{X,n}$ and $\breve{\Sigma}_{Z,n}$ are seen as natural estimators for $\Sigma_{ZX}$ $(= 0)$ and $\Sigma_Z$, respectively, if $\beta$ were known. In this sense, [D1] is a natural extension of [B1]. In particular, if $r = O(1)$ as $n \to \infty$, [D1] follows from the convergences $\lambda_n^{-1}\|\hat{\Sigma}_{X,n} - \Sigma_X\|_\infty \to^p 0$, $\lambda_n^{-1}\|\hat{\Sigma}_{YX,n} - \Sigma_{YX}\|_\infty \to^p 0$ and $\lambda_n^{-1}\|\hat{\Sigma}_{Y,n} - \Sigma_Y\|_\infty \to^p 0$ under [C1], which are typically derived from entry-wise concentration inequalities for $\hat{\Sigma}_{X,n}$, $\hat{\Sigma}_{YX,n}$ and $\hat{\Sigma}_{Y,n}$.
(b) [D3] ensures that $\hat{\Sigma}_{Z,n}$ is asymptotically positive semidefinite. This is necessary for guaranteeing that the optimization problem in Equation (10) asymptotically has a unique solution with probability 1.
From Proposition 3 we can also derive the convergence rates of the estimators $\hat{\Sigma}_{Z,\lambda_n}$ and $\hat{\Sigma}_{Z,\lambda_n}^{-1}$ in appropriate norms, which may be seen as counterparts of Theorems 1–2 in [8].
Proposition 4.
Under the assumptions of Proposition 3, $\lambda_n^{-1}\|\hat{\Sigma}_{Z,\lambda_n} - \Sigma_Z\|_\infty = O_p(s_n + r^2)$ as $n \to \infty$.
Proposition 5.
Under the assumptions of Proposition 3, we additionally assume [C5]–[C6]. Then, $\lambda_n^{-1}|||\hat{\Sigma}_{Y,\lambda_n}^{-1} - \Sigma_Y^{-1}|||_2 = O_p(s_n + r)$ and $\lambda_n^{-1}|||\hat{\Sigma}_{Y,\lambda_n}^{-1} - \Sigma_Y^{-1}|||_\infty = O_p(r^{3/2}d_n(s_n + r))$ as $n \to \infty$.
Next we present the high-dimensional asymptotic mixed normality of the de-biased version of Θ ^ Z , λ .
Proposition 6.
Suppose that the assumptions of Proposition 3 and [C5] are satisfied. For every $n \in \mathbb{N}$, let $a_n > 0$, let $C_n$ be a $d^2 \times d^2$ positive semidefinite random matrix and let $J_n$ be an $m \times d^2$ random matrix, where $m = m_n$ may depend on $n$. Assume $a_n|||J_n|||_\infty\lambda_n^2 s_n d_n \log(m+1) \to^p 0$ as $n \to \infty$. Assume also that
$$\lim_{n \to \infty}\sup_{y \in \mathbb{R}^m}\left|P\left(a_n\tilde{J}_{Z,n}\operatorname{vec}(\breve{\Sigma}_{Z,n} - \Sigma_Z) \leq y\right) - P\left(\tilde{J}_{Z,n}C_n^{1/2}\zeta_n \leq y\right)\right| = 0 \qquad (12)$$
and
$$\lim_{b \downarrow 0}\limsup_{n \to \infty}P\left(\min\operatorname{diag}(\tilde{J}_{Z,n}C_n\tilde{J}_{Z,n}^\top) < b\right) = 0$$
as $n \to \infty$, where $\tilde{J}_{Z,n} := J_n(\Theta_Z \otimes \Theta_Z)$ and $\zeta_n$ is a $d^2$-dimensional standard Gaussian vector independent of $\mathcal{F}$, which is defined on an extension of the probability space $(\Omega, \mathcal{F}, P)$ if necessary. Then,
$$\lim_{n \to \infty}\sup_{y \in \mathbb{R}^m}\left|P\left(a_nJ_n\operatorname{vec}(\hat{\Theta}_{Z,\lambda_n} - \Gamma_{Z,n} - \Theta_Z) \leq y\right) - P\left(\tilde{J}_{Z,n}C_n^{1/2}\zeta_n \leq y\right)\right| = 0,$$
where $\Gamma_{Z,n} := -(\hat{\Theta}_{Z,\lambda_n} - \hat{\Theta}_{Z,\lambda_n}\hat{\Sigma}_{Z,n}\hat{\Theta}_{Z,\lambda_n})$.
Remark 8.
It is worth mentioning that condition Equation (12) is stated for Σ ˘ Z , n rather than Σ ^ Z , n . In other words, for deriving the asymptotic distribution, we do not need to take account of the effect of plugging β ^ n into β, at least in the first order. This is thanks to Lemma A11.
Although it is generally difficult to derive the asymptotic mixed normality of (the de-biased version of) $\hat{\Sigma}_{Y,\lambda_n}^{-1}$, this is possible when $d$ is sufficiently large. In fact, in such a situation, the entry-wise behavior of $\Sigma_Y^{-1}$ is dominated by $\Theta_Z$, as described by the following lemma:
Lemma 2.
Under the assumptions of Proposition 5, $\|\hat{\Sigma}_{Y,\lambda_n}^{-1} - \hat{\Theta}_{Z,\lambda_n}\|_\infty = O_p(rd_n/d)$ and $\|\Sigma_Y^{-1} - \Theta_Z\|_\infty = O_p(rd_n/d)$ as $n \to \infty$.
Consequently, we obtain the following result.
Proposition 7.
Suppose that the assumptions of Proposition 6 and [C6] are satisfied. Suppose also that $a_n|||J_n|||_\infty rd_n\log(m+1)/d \to 0$ as $n \to \infty$. Then we have
$$\lim_{n \to \infty}\sup_{y \in \mathbb{R}^m}\left|P\left(a_nJ_n\operatorname{vec}(\hat{\Sigma}_{Y,\lambda_n}^{-1} - \Gamma_{Z,n} - \Sigma_Y^{-1}) \leq y\right) - P\left(\tilde{J}_{Z,n}C_n^{1/2}\zeta_n \leq y\right)\right| = 0.$$

4. Application to Realized Covariance Matrix

In this section, we apply the abstract theory developed above to the simplest situation where the processes have no jumps and are observed at equidistant times without noise. Specifically, we consider the continuous-time factor model in Equation (7) and assume that both $Y$ and $X$ are observed at the equidistant time points $h/n$, $h = 0, 1, \dots, n$. In this case, $\Sigma_Y = [Y, Y]_1$ is naturally estimated by the realized covariance matrix:
$$\hat{\Sigma}_{Y,n} := \widehat{[Y, Y]}_1^n := \sum_{h=1}^{n}(Y_{h/n} - Y_{(h-1)/n})(Y_{h/n} - Y_{(h-1)/n})^\top. \qquad (14)$$
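The realized covariance matrices used here (for $Y$, for $X$, and for the cross term defined next) can be computed from the discrete observations in a few lines of R; the sketch below is purely illustrative and the names are assumptions.

```r
# Realized covariance matrix (Equation (14)) from equidistant observations.
# Y_obs, X_obs: (n+1) x d and (n+1) x r matrices of observations at times h/n.
realized_cov <- function(Y_obs, X_obs = Y_obs) {
  dY <- diff(Y_obs)      # increments Y_{h/n} - Y_{(h-1)/n}
  dX <- diff(X_obs)
  t(dY) %*% dX           # sum over h of the outer products of increments
}
```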
Analogously, we define $\hat{\Sigma}_{X,n} := \widehat{[X, X]}_1^n$ and $\hat{\Sigma}_{YX,n} := \widehat{[Y, X]}_1^n$. In addition, we assume that $Z$ and $X$ are respectively $d$-dimensional and $r$-dimensional continuous Itô semimartingales given by
$$Z_t = Z_0 + \int_0^t \mu_s\,ds + \int_0^t \sigma_s\,dW_s, \qquad X_t = X_0 + \int_0^t \tilde{\mu}_s\,ds + \int_0^t \tilde{\sigma}_s\,dW_s,$$
where $\mu = (\mu_s)_{s \in [0,1]}$ and $\tilde{\mu} = (\tilde{\mu}_s)_{s \in [0,1]}$ are respectively $d$-dimensional and $r$-dimensional $(\mathcal{F}_t)$-progressively measurable processes, $\sigma = (\sigma_s)_{s \in [0,1]}$ and $\tilde{\sigma} = (\tilde{\sigma}_s)_{s \in [0,1]}$ are respectively $\mathbb{R}^{d \times d}$-valued and $\mathbb{R}^{r \times d}$-valued $(\mathcal{F}_t)$-progressively measurable processes, and $W = (W_s)_{s \in [0,1]}$ is a $d$-dimensional standard $(\mathcal{F}_t)$-Wiener process. To apply the convergence rate results to this setting, we impose the following assumptions:
[E1]
For all $n, \nu \in \mathbb{N}$, we have an event $\Omega_n(\nu) \in \mathcal{F}$ and $(\mathcal{F}_t)$-progressively measurable processes $\mu(\nu) = (\mu(\nu)_s)_{s \in [0,1]}$, $\tilde{\mu}(\nu) = (\tilde{\mu}(\nu)_s)_{s \in [0,1]}$, $\sigma(\nu) = (\sigma(\nu)_s)_{s \in [0,1]}$ and $\tilde{\sigma}(\nu) = (\tilde{\sigma}(\nu)_s)_{s \in [0,1]}$ which take values in $\mathbb{R}^d$, $\mathbb{R}^r$, $\mathbb{R}^{d \times d}$ and $\mathbb{R}^{r \times d}$, respectively, and they satisfy the following conditions:
(i)
$\lim_{\nu \to \infty}\limsup_{n \to \infty}P(\Omega_n(\nu)^c) = 0$.
(ii)
$\mu = \mu(\nu)$, $\tilde{\mu} = \tilde{\mu}(\nu)$, $\sigma = \sigma(\nu)$ and $\tilde{\sigma} = \tilde{\sigma}(\nu)$ on $\Omega_n(\nu)$ for all $\nu \in \mathbb{N}$.
(iii)
For all $\nu \in \mathbb{N}$, there is a constant $C_\nu > 0$ such that
$$\sup_{n \in \mathbb{N}}\sup_{0 \leq t \leq 1}\sup_{\omega \in \Omega}\left\{\|\mu(\nu)_t(\omega)\|_\infty + \|\tilde{\mu}(\nu)_t(\omega)\|_\infty + \|c(\nu)_t(\omega)\|_\infty + \|\tilde{c}(\nu)_t(\omega)\|_\infty\right\} \leq C_\nu,$$
where $c(\nu)_t := \sigma(\nu)_t\sigma(\nu)_t^\top$ and $\tilde{c}(\nu)_t := \tilde{\sigma}(\nu)_t\tilde{\sigma}(\nu)_t^\top$.
[E2]
$r = O(d)$ and $(\log d)/n \to 0$ as $n \to \infty$.
[E1] is a local boundedness assumption on the coefficient processes and typical in the literature: For example, [E1] is satisfied when μ , μ ˜ , σ and σ ˜ are all bounded by some locally bounded process independent of n. This latter condition is imposed in [8], among others. [E2] restricts the growth rates of d and r. It is indeed an adaptation of [D1] to the present setting.
Theorem 1.
Assume [C1]–[C4] and [E1]–[E2]. Let $\lambda_n$ be a sequence of positive-valued random variables such that $\lambda_n^{-1}\sqrt{(\log d)/n} \to^p 0$ and $(s_n + r)\lambda_n \to^p 0$ as $n \to \infty$. Then $\lambda_n^{-1}|||\hat{\Theta}_{Z,\lambda_n} - \Theta_Z|||_w = O_p(s_n)$, $\lambda_n^{-1}|||\hat{\Theta}_{Z,\lambda_n}^{-1} - \Sigma_Z|||_2 = O_p(s_n)$ and $\lambda_n^{-1}\|\hat{\Sigma}_{Y,\lambda_n} - \Sigma_Y\|_\infty = O_p(s_n + r^2)$ as $n \to \infty$ for any $w \in [1, \infty]$. Moreover, if we additionally assume [C5]–[C6], then $\lambda_n^{-1}|||\hat{\Sigma}_{Y,\lambda_n}^{-1} - \Sigma_Y^{-1}|||_2 = O_p(s_n + r)$ and $\lambda_n^{-1}|||\hat{\Sigma}_{Y,\lambda_n}^{-1} - \Sigma_Y^{-1}|||_\infty = O_p(r^{3/2}d_n(s_n + r))$ as $n \to \infty$.
Remark 9
(Optimal convergence rate). From Theorem 1, the convergence rate of $\hat{\Theta}_{Z,\lambda_n}$ to $\Theta_Z$ in the $\ell_w$-operator norm for any $w \in [1, \infty]$ can be arbitrarily close to $s_n\sqrt{(\log d)/n}$, which is similar to that in a standard i.i.d. setting (cf. Theorem 14.1.3 in [24]). On the other hand, in the Gaussian i.i.d. setting without factor structure, the minimax optimal rate for this problem is known to be $d(\Theta_Z)\sqrt{(\log d)/n}$ (see [45] (Theorem 1.1) and [46] (Theorem 5)), which can be faster than $s_n\sqrt{(\log d)/n}$. In a standard i.i.d. setting, this rate can be attained by using a node-wise penalized regression (see e.g., [46] (Section 3.1)), so it would be interesting to study the convergence rate of such a method in our setting. We leave it to future research. In the meantime, such a method does not ensure the positive definiteness of the estimated precision matrix in general, so our estimator would be preferable for some practical applications such as portfolio allocation.
Next we derive the asymptotic mixed normality of the de-biased estimator in the present setting. As announced, we accomplish this purpose with the help of Malliavin calculus. In the following we will freely use standard concepts and notation from Malliavin calculus. We refer to [47,48] (Chapter 1) for detailed treatments of this subject.
We consider the Malliavin calculus with respect to $W$. For any real number $p \geq 1$ and any integer $k \geq 1$, $\mathbb{D}^{k,p}$ denotes the stochastic Sobolev space of random variables which are $k$ times differentiable in the Malliavin sense and whose derivatives up to order $k$ have finite moments of order $p$. If $F \in \mathbb{D}^{k,p}$, we denote by $D^kF$ the $k$-th Malliavin derivative of $F$, which is a random variable taking values in $L^2([0,1]^k; (\mathbb{R}^d)^{\otimes k})$. Here, we identify the space $(\mathbb{R}^d)^{\otimes k}$ with the set of all $d$-dimensional $k$-way arrays, i.e., real-valued functions on $\{1, \dots, d\}^k$. Since $D^kF$ is a random function on $[0,1]^k$, we can consider the value $D^kF(t_1, \dots, t_k)$ evaluated at $(t_1, \dots, t_k) \in [0,1]^k$. We denote this value by $D_{t_1,\dots,t_k}F$. Moreover, since $D_{t_1,\dots,t_k}F$ takes values in $(\mathbb{R}^d)^{\otimes k}$, we can consider the value $D_{t_1,\dots,t_k}F(a_1, \dots, a_k)$ evaluated at $(a_1, \dots, a_k) \in \{1, \dots, d\}^k$. This value is denoted by $D_{t_1,\dots,t_k}^{(a_1,\dots,a_k)}F$. We remark that the variable $D_{t_1,\dots,t_k}F$ is defined only a.e. on $[0,1]^k \times \Omega$ with respect to the product of the Lebesgue measure on $[0,1]^k$ and $P$. Therefore, if $D_{t_1,\dots,t_k}F$ satisfies some property a.e. on $[0,1]^k \times \Omega$ with respect to this product measure, by convention we will always take a version of $D_{t_1,\dots,t_k}F$ satisfying that property everywhere on $[0,1]^k \times \Omega$ if necessary. We set $\mathbb{D}^{k,\infty} := \bigcap_{p=1}^{\infty}\mathbb{D}^{k,p}$. We denote by $\mathbb{D}^{k,\infty}(\mathbb{R}^d)$ the space of all $d$-dimensional random variables $F$ such that $F^i \in \mathbb{D}^{k,\infty}$ for every $i = 1, \dots, d$. The space $\mathbb{D}^{k,\infty}(\mathbb{R}^{d \times r})$ is defined in an analogous way. Finally, for any $(\mathbb{R}^d)^{\otimes k}$-valued random variable $F$ and $p \in (0, \infty]$, we set
$$\|F\|_{p,2} := \Bigl\|\sqrt{\textstyle\sum_{a_1,\dots,a_k=1}^{d}F(a_1,\dots,a_k)^2}\Bigr\|_p.$$
We also need to define some variables related to the "asymptotic" covariance matrices of the estimators. We define the $d^2 \times d^2$ random matrix $C_n$ by
$$C_n^{(i-1)d+j,(k-1)d+l} := n\sum_{h=1}^{n}\left\{\left(\int_{(h-1)/n}^{h/n}c_s^{ik}\,ds\right)\left(\int_{(h-1)/n}^{h/n}c_s^{jl}\,ds\right) + \left(\int_{(h-1)/n}^{h/n}c_s^{il}\,ds\right)\left(\int_{(h-1)/n}^{h/n}c_s^{jk}\,ds\right)\right\}, \quad i,j,k,l = 1,\dots,d,$$
where $c_s := \sigma_s\sigma_s^\top$. Then we set $V_n := (\Theta_Z \otimes \Theta_Z)C_n(\Theta_Z \otimes \Theta_Z)$ and $S_n := \operatorname{diag}(V_n)^{1/2}$. In addition, under [E1], we define $C_n(\nu)$ similarly to $C_n$ with $\sigma$ replaced by $\sigma(\nu)$. $C_n$ and $V_n$ play the roles of the asymptotic covariance matrices of $\breve{\Sigma}_{Z,n}$ and $\hat{\Theta}_{Z,\lambda_n}$, respectively.
We impose the following assumptions on the model.
(F1)
We have [E1], and $\Sigma_Z(\nu) := \int_0^1 c(\nu)_t\,dt$ is a.s. invertible for all $n, \nu \in \mathbb{N}$. Moreover, for all $n, \nu \in \mathbb{N}$ and $t \in [0,1]$, $\mu(\nu)_t \in \mathbb{D}^{1,\infty}(\mathbb{R}^d)$, $\sigma(\nu)_t \in \mathbb{D}^{2,\infty}(\mathbb{R}^{d \times r})$ and
$$\sup_{n \in \mathbb{N}}\max_{1 \leq i \leq d}\sup_{0 \leq s,t \leq 1}\|D_s\mu(\nu)_t^i\|_{\infty,2} < \infty,$$
$$\sup_{n \in \mathbb{N}}\max_{1 \leq i \leq d}\left\{\sup_{0 \leq s,t \leq 1}\|D_s\sigma(\nu)_t^{i\cdot}\|_{\infty,2} + \sup_{0 \leq s,t,u \leq 1}\|D_{s,t}\sigma(\nu)_u^{i\cdot}\|_{\infty,2}\right\} < \infty,$$
$$\sup_{n \in \mathbb{N}}\left\{\max_{1 \leq i \leq d}\Theta_Z(\nu)^{ii} + \max_{1 \leq k \leq d^2}1/V_n(\nu)^{kk}\right\} < \infty,$$
where $\Theta_Z(\nu) := \Sigma_Z(\nu)^{-1}$ and $V_n(\nu) := (\Theta_Z(\nu) \otimes \Theta_Z(\nu))C_n(\nu)(\Theta_Z(\nu) \otimes \Theta_Z(\nu))$.
(F2)
The $d \times d$ matrix $Q_Z := (1_{\{\Theta_Z^{ij} \neq 0\}})_{1 \leq i,j \leq d}$ is non-random and $d(Q_Z) = O(1)$ as $n \to \infty$.
(F3)
$r = O(d)$ and $(\log d)^{13}/n \to 0$ as $n \to \infty$.
We give a few remarks on these assumptions. First, [F1] imposes the (local) Malliavin differentiability on the coefficient processes of the residual process Z and the local boundedness on their Malliavin derivatives. Such an assumption is necessary for the application of the high-dimensional mixed normal limit theorem of [20] to our setting (see Lemma A16). Please note that we do not need to impose this type of assumption on the factor process X. We also remark that analogous assumptions are sometimes used in the literature of high-frequency financial econometrics even in low-dimensional settings; see e.g., [49,50]. Second, [F2] is clearly understood when we consider a Gaussian graphical model associated with Σ Z : The non-randomness of Q Z implies that the edge structure of this Gaussian graphical model is determined in a non-random manner (by conditioning, it is indeed sufficient that the edge structure is determined independently of the driving Wiener process W). Also, we remark that the condition d ( Q Z ) = O ( 1 ) is equivalent to [C5] with d n = 1 . It is seemingly possible to relax this condition so that it allows a diverging sequence d n as long as d n ( log d ) κ / n 0 for an appropriate constant κ > 0 . However, to determine the precise value of κ , we need to carefully revise the proof of Lemma A16 so that it allows the quantity inside sup n N in (A7) to diverge as n . To avoid such an additional complexity, we restrict our attention to the case of d n = 1 . Third, the condition ( log d ) 13 / n 0 in [F3] is used again for applying the high-dimensional CLT of [20].
Now we are ready to state our result. Let $\mathcal{A}^{\mathrm{re}}(d^2)$ be the set of all hyperrectangles in $\mathbb{R}^{d^2}$, i.e., $\mathcal{A}^{\mathrm{re}}(d^2)$ consists of all sets $A$ of the form $A = \{x \in \mathbb{R}^{d^2} : a_j \leq x_j \leq b_j \text{ for all } j = 1, \dots, d^2\}$ for some $a_j \leq b_j$, $j = 1, \dots, d^2$.
Theorem 2.
Assume [C1]–[C4] and [F1]–[F3]. Let $\lambda_n$ be a sequence of positive-valued random variables such that $\lambda_n^{-1}\sqrt{(\log d)/n} \to^p 0$, $(s_n + r)\lambda_n \to^p 0$ and $\lambda_n^2 s_n\sqrt{n\log d} \to^p 0$ as $n \to \infty$. Then we have
$$\sup_{A \in \mathcal{A}^{\mathrm{re}}(d^2)}\left|P\left(\sqrt{n}\operatorname{vec}(\hat{\Theta}_{Z,\lambda_n} - \Gamma_{Z,n} - \Theta_Z) \in A\right) - P\left(V_n^{1/2}\zeta_n \in A\right)\right| \to 0$$
and
$$\sup_{A \in \mathcal{A}^{\mathrm{re}}(d^2)}\left|P\left(\sqrt{n}S_n^{-1}\operatorname{vec}(\hat{\Theta}_{Z,\lambda_n} - \Gamma_{Z,n} - \Theta_Z) \in A\right) - P\left(S_n^{-1}V_n^{1/2}\zeta_n \in A\right)\right| \to 0$$
as $n \to \infty$.
Remark 10.
$\lambda_n$ is typically chosen to be of order as close to $\sqrt{\log d/n}$ as possible, so $\lambda_n^2 s_n\sqrt{n\log d} \to^p 0$ is almost equivalent to $s_n(\log d)^{3/2}/\sqrt{n} \to 0$. This is stronger than the condition $s_n(\log d)/\sqrt{n} \to 0$ which is used to derive the asymptotic normality of the de-biased weighted graphical Lasso estimator in [24] (Theorem 14.1.6) (note that we assume $d(\Theta_Z) = O_p(1)$). This is because Theorem 2 derives approximations of the joint distributions of the de-biased estimator and its Studentization, while [24] (Theorem 14.1.6) focuses only on approximation of their marginal distributions.
Theorem 2 is statistically infeasible in the sense that $V_n$ is unobservable. Thus, we need to estimate it from the data. Since $\Theta_Z$ is naturally estimated by $\hat{\Theta}_{Z,\lambda_n}$, we construct an estimator for $C_n$. Define the $d^2$-dimensional random vectors $\hat{\chi}_h$ by
$$\hat{\chi}_h := \operatorname{vec}\left((\hat{Z}_{h/n} - \hat{Z}_{(h-1)/n})(\hat{Z}_{h/n} - \hat{Z}_{(h-1)/n})^\top\right), \qquad h = 1, \dots, n,$$
where $\hat{Z}_{h/n} := Y_{h/n} - \hat{\beta}_nX_{h/n}$. Then we set
$$\hat{C}_n := n\sum_{h=1}^{n}\hat{\chi}_h\hat{\chi}_h^\top - \frac{n}{2}\sum_{h=1}^{n-1}\left(\hat{\chi}_h\hat{\chi}_{h+1}^\top + \hat{\chi}_{h+1}\hat{\chi}_h^\top\right).$$
Lemma 3.
Suppose that the assumptions of Theorem 2 are satisfied. Suppose also that $r^2\sqrt{(\log d)/n} = O(1)$ as $n \to \infty$ and that there is a constant $\gamma \in (0, \frac{1}{2}]$ such that
$$\sup_{0 < t \leq 1 - \frac{1}{n}}\max_{1 \leq i,j \leq d}\left\|c(\nu)_{t+\frac{1}{n}}^{ij} - c(\nu)_t^{ij}\right\|_2 = O(n^{-\gamma})$$
as $n \to \infty$ for all $\nu \in \mathbb{N}$. Then, $\|\hat{C}_n - C_n\|_\infty = O_p(r(\log d)^{5/2}/\sqrt{n} + n^{-\gamma})$ as $n \to \infty$.
Let us set $\hat{V}_n := (\hat{\Theta}_{Z,\lambda_n} \otimes \hat{\Theta}_{Z,\lambda_n})\hat{C}_n(\hat{\Theta}_{Z,\lambda_n} \otimes \hat{\Theta}_{Z,\lambda_n})$ and $\hat{S}_n := \operatorname{diag}(\hat{V}_n)^{1/2}$.
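In R, these quantities can be assembled directly from the estimated residual increments. The following sketch is illustrative only (names are assumptions) and is practical only for small $d$, since $\hat{C}_n$ is a $d^2 \times d^2$ matrix.

```r
# Estimators of the asymptotic covariance quantities: C_hat_n (from the chi_hat_h),
# V_hat_n = (Theta (x) Theta) C_hat_n (Theta (x) Theta), and S_hat_n = diag(V_hat_n)^{1/2}.
avar_estimators <- function(Z_hat_obs, Theta_Z_hat) {
  dZ  <- diff(Z_hat_obs)                                         # increments of Z_hat
  n   <- nrow(dZ)
  chi <- t(apply(dZ, 1, function(z) as.vector(tcrossprod(z))))   # rows are chi_hat_h
  C_hat <- n * crossprod(chi) -
    (n / 2) * (crossprod(chi[-n, , drop = FALSE], chi[-1, , drop = FALSE]) +
               crossprod(chi[-1, , drop = FALSE], chi[-n, , drop = FALSE]))
  TT    <- kronecker(Theta_Z_hat, Theta_Z_hat)
  V_hat <- TT %*% C_hat %*% TT
  list(C = C_hat, V = V_hat, S = diag(sqrt(diag(V_hat))))
}
```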
Corollary 1.
Under the assumptions of Lemma 3, we have the following results:
(a)
Assume $s_n\lambda_n\log d \to^p 0$ and $r(\log d)^{7/2}/\sqrt{n} + n^{-\gamma}\log d \to 0$ as $n \to \infty$. Then,
$$\lim_{n \to \infty}\sup_{A \in \mathcal{A}^{\mathrm{re}}(d^2)}\left|P\left(\sqrt{n}\hat{S}_n^{-1}\operatorname{vec}(\hat{\Theta}_{Z,\lambda_n} - \Gamma_{Z,n} - \Theta_Z) \in A\right) - P\left(S_n^{-1}V_n^{1/2}\zeta_n \in A\right)\right| = 0.$$
(b)
Assume $s_n\lambda_n(\log d)^2 \to^p 0$ and $r(\log d)^{9/2}/\sqrt{n} + n^{-\gamma}(\log d)^2 \to 0$ as $n \to \infty$. Then,
$$\sup_{A \in \mathcal{A}^{\mathrm{re}}(d^2)}\left|P\left(\hat{V}_n^{1/2}\zeta_n \in A \mid \mathcal{F}\right) - P\left(V_n^{1/2}\zeta_n \in A \mid \mathcal{F}\right)\right| \to^p 0, \qquad \sup_{A \in \mathcal{A}^{\mathrm{re}}(d^2)}\left|P\left(\hat{S}_n^{-1}\hat{V}_n^{1/2}\zeta_n \in A \mid \mathcal{F}\right) - P\left(S_n^{-1}V_n^{1/2}\zeta_n \in A \mid \mathcal{F}\right)\right| \to^p 0$$
as $n \to \infty$.
Corollary 1(a) particularly implies that
$$\lim_{n \to \infty}\max_{1 \leq i,j \leq d}\sup_{x \in \mathbb{R}}\left|P\left(\frac{\sqrt{n}(\hat{\Theta}_{Z,\lambda_n}^{ij} - \Gamma_{Z,n}^{ij} - \Theta_Z^{ij})}{\hat{s}_n^{ij}} \leq x\right) - \Phi(x)\right| = 0, \qquad (21)$$
where $\hat{s}_n^{ij} := \hat{S}_n^{(i-1)d+j,(i-1)d+j}$ and $\Phi$ is the standard normal distribution function. This result can be used to construct entry-wise confidence intervals for $\Theta_Z$. Meanwhile, combining Corollary 1(b) with [20] (Proposition 3.2), we can estimate the quantiles of $\max_{k \in \mathcal{K}}(V_n^{1/2}\zeta_n)^k$ and $\max_{k \in \mathcal{K}}(S_n^{-1}V_n^{1/2}\zeta_n)^k$ for a given set of indices $\mathcal{K} \subset \{1, \dots, d^2\}$ by simulation. Such a result can be used to construct simultaneous confidence intervals and control the family-wise error rate in multiple testing for entries of $\Theta_Z$; see Sections 2.3–2.4 of [51] for details.
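For instance, an entry-wise confidence interval based on Equation (21) could be computed as in the following illustrative R sketch, where Theta_deb denotes the de-biased estimate $\hat{\Theta}_{Z,\lambda_n} - \Gamma_{Z,n}$ and S_hat the matrix $\hat{S}_n$ from the previous sketch (both names are hypothetical):

```r
# Entry-wise (1 - alpha) confidence interval for Theta_Z^{ij} based on Equation (21).
ci_entry <- function(Theta_deb, S_hat, i, j, n, alpha = 0.05) {
  d    <- nrow(Theta_deb)
  s_ij <- S_hat[(i - 1) * d + j, (i - 1) * d + j]   # s_hat_n^{ij}
  half <- qnorm(1 - alpha / 2) * s_ij / sqrt(n)
  c(lower = Theta_deb[i, j] - half, upper = Theta_deb[i, j] + half)
}
```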
As announced, another application of our result is to construct an estimator with selection consistency via thresholding. This is carried out by using the following result:
Corollary 2.
Let $\alpha_n \in (0,1)$ $(n = 1, 2, \dots)$ satisfy $\alpha_n \to \alpha$ and $\log\alpha_n = O(\log d)$ as $n \to \infty$ for some $\alpha \in [0,1)$. Define $c_n := \Phi^{-1}\left(1 - \frac{\alpha_n}{d(d-1)}\right)$ and
$$\hat{S}_n(\Theta_Z) := \left\{(i,j) : i \neq j \text{ and } \frac{\sqrt{n}\,|\hat{\Theta}_{Z,\lambda_n}^{ij} - \Gamma_{Z,n}^{ij}|}{\hat{s}_n^{ij}} > c_n\right\}.$$
Then, under the assumptions of Corollary 1(a), we have
$$\liminf_{n \to \infty}P\left(\hat{S}_n(\Theta_Z) = S(\Theta_Z)\right) \geq 1 - 2\alpha,$$
provided that $\sqrt{n/\log d}\,\min_{(i,j) \in S(\Theta_Z)}|\Theta_Z^{ij}| \to^p \infty$ as $n \to \infty$.
Please note that the last condition is satisfied if $\min_{(i,j) \in S(\Theta_Z)}|\Theta_Z^{ij}|$ is bounded away from zero because $\sqrt{n/\log d} \to \infty$ under our assumptions. Taking the sequence $\alpha_n$ so that $\alpha = 0$ in Corollary 2, we can asymptotically recover the support of $\Theta_Z$. In this case, if we define $\tilde{\Theta}_{Z,\lambda_n} = (\tilde{\Theta}_{Z,\lambda_n}^{ij})_{1 \leq i,j \leq d}$ by
$$\tilde{\Theta}_{Z,\lambda_n}^{ij} = \begin{cases}\hat{\Theta}_{Z,\lambda_n}^{ij} - \Gamma_{Z,n}^{ij} & \text{if } i = j \text{ or } (i,j) \in \hat{S}_n(\Theta_Z),\\ 0 & \text{otherwise},\end{cases}$$
$\tilde{\Theta}_{Z,\lambda_n}$ will be oracle in the sense of [38]. However, we note that the estimator $\tilde{\Theta}_{Z,\lambda_n}$ would not be continuous in the data, so it would not satisfy the third desirable property in [38] (p. 1349). To construct an oracle estimator for $\Theta_Z$ which is continuous in the data, we will need to consider a non-concave penalized estimator as in [52]. This is left to future research.
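A direct R transcription of this thresholding step (illustrative only; Theta_deb and S_hat are the hypothetical objects used in the earlier sketches) might look as follows:

```r
# Support recovery by thresholding (Corollary 2) and the resulting thresholded
# estimator Theta_tilde; alpha_n is the nominal level of the procedure.
threshold_precision <- function(Theta_deb, S_hat, n, alpha_n = 0.05) {
  d    <- nrow(Theta_deb)
  c_n  <- qnorm(1 - alpha_n / (d * (d - 1)))
  s_n  <- matrix(diag(S_hat), d, d, byrow = TRUE)   # s_hat_n^{ij} = S_hat^{(i-1)d+j,(i-1)d+j}
  keep <- sqrt(n) * abs(Theta_deb) / s_n > c_n      # estimated support S_hat_n(Theta_Z)
  diag(keep) <- TRUE                                # diagonal entries are always retained
  Theta_tilde <- ifelse(keep, Theta_deb, 0)
  list(support = which(keep & row(keep) != col(keep), arr.ind = TRUE),
       Theta_tilde = Theta_tilde)
}
```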

5. Simulation Study

5.1. Implementation

To implement the proposed estimation procedure, we need to solve the optimization problem in Equation (10). Among many existing algorithms to solve this problem, we employ the GLASSOFAST algorithm of [53], which is an improved implementation of the popular GLASSO algorithm of [54] and implemented in the R package glassoFast.
The remaining problem is how to select the penalty parameter $\lambda$. Following [17,55], we select it by minimizing the following formally defined Bayesian information criterion (BIC):
$$\mathrm{BIC}(\lambda) := n\left\{\operatorname{tr}(\hat{\Theta}_{Z,\lambda}\hat{\Sigma}_{Z,n}) - \log\det\hat{\Theta}_{Z,\lambda}\right\} + (\log n)\sum_{i \neq j}1_{\{\hat{\Theta}_{Z,\lambda}^{ij} \neq 0\}}.$$
The minimization is carried out by grid search. The grid $\{\lambda_1, \dots, \lambda_m\}$ is constructed analogously to the R package glmnet (see Section 2.5 of [56] for details): First, as the maximum value $\lambda_{\max}$ of the grid, we take the smallest value for which all the off-diagonal entries of $\hat{\Theta}_{Z,\lambda_{\max}}$ are zero; in our case, $\lambda_{\max}$ is set to the maximum modulus of the off-diagonal entries of $\hat{\Sigma}_{Z,n}$ (cf. [57] (Corollary 1)). Next, we take a constant $\varepsilon > 0$ and set $\lambda_{\min} := \varepsilon\lambda_{\max}$ as the minimum value of the grid. Finally, we construct the values $\lambda_1, \dots, \lambda_m$ increasing from $\lambda_{\min}$ to $\lambda_{\max}$ on the log scale:
$$\lambda_i = \exp\left\{\log(\lambda_{\min}) + \frac{i-1}{m-1}\log(\lambda_{\max}/\lambda_{\min})\right\}, \qquad i = 1, \dots, m.$$
We use $\varepsilon = \sqrt{(\log d)/n}$ and $m = 10$ in our experiments. (The computation procedure of the weighted graphical Lasso described here is implemented in the R package yuima as the function cce.factor with the option regularize="glasso" since version 1.9.2.)
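The BIC-based selection could be sketched in R as follows, reusing the hypothetical weighted_glasso helper; this is an illustration of the description above, not the yuima implementation.

```r
# Select lambda by minimizing the formal BIC over a glmnet-style log-scale grid.
select_lambda_bic <- function(Sigma_Z_hat, n, m = 10) {
  d   <- nrow(Sigma_Z_hat)
  off <- row(Sigma_Z_hat) != col(Sigma_Z_hat)
  lambda_max <- max(abs(Sigma_Z_hat[off]))          # smallest lambda giving a diagonal estimate
  lambda_min <- sqrt(log(d) / n) * lambda_max       # epsilon = sqrt((log d)/n)
  grid <- exp(seq(log(lambda_min), log(lambda_max), length.out = m))
  bic <- sapply(grid, function(lam) {
    Theta  <- weighted_glasso(Sigma_Z_hat, lam)$Theta
    loglik <- sum(Theta * Sigma_Z_hat) -
      as.numeric(determinant(Theta, logarithm = TRUE)$modulus)
    n * loglik + log(n) * sum(abs(Theta[off]) > 1e-10)
  })
  grid[which.min(bic)]
}
```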

5.2. Simulation Design

We basically follow the setting of [8]. We simulate the model (7) with the following specification: For the factor process $X$, we set $r = 3$ and
$$dX_t^j = \mu^j\,dt + \sqrt{v_t^j}\,dW_t^j, \qquad dv_t^j = \kappa^j(\theta^j - v_t^j)\,dt + \eta^j\sqrt{v_t^j}\left(\rho^j\,dW_t^j + \sqrt{1-(\rho^j)^2}\,d\tilde{W}_t^j\right), \qquad j = 1, 2, 3,$$
where $W^1, W^2, W^3, \tilde{W}^1, \tilde{W}^2, \tilde{W}^3$ are independent standard Wiener processes. We set $\kappa = (3, 4, 5)$, $\theta = (0.09, 0.04, 0.06)$, $\eta = (0.3, 0.4, 0.3)$, $\rho = (-0.6, -0.4, -0.25)$ and $\mu = (0.05, 0.03, 0.02)$. The initial value $v_0^j$ is drawn from the stationary distribution of the process $(v_t^j)_{t \in [0,1]}$, i.e., the gamma distribution with shape $2\kappa^j\theta^j/(\eta^j)^2$ and rate $2\kappa^j/(\eta^j)^2$. The entries of the loading matrix $\beta$ are independently drawn as $\beta^{i1} \sim_{\mathrm{i.i.d.}} U[0.25, 2.25]$ and $\beta^{i2}, \beta^{i3} \sim_{\mathrm{i.i.d.}} U[-0.5, 0.5]$ ($U[a,b]$ denotes the uniform distribution on $[a,b]$). Finally, as the residual process $Z$, we take a $d$-dimensional Wiener process with covariance matrix $Q$. We consider the following two designs for $Q$:
Design 1
Q is a block diagonal matrix with 10 blocks of size ( d / 10 ) × ( d / 10 ) . Each block has diagonal entries independently generated from U [ 0.2 , 0.5 ] and a constant correlation of 0.25 .
Design 2
We simulate a Chung–Lu random graph $\mathcal{G}$ and set $Q := (E_d + D - A)^{-1}$, where $D$ and $A$ are respectively the degree and adjacency matrices of the random graph $\mathcal{G}$. Formally, given a weight vector $w \in \mathbb{R}^d$ with $w \geq 0$, $A$ is defined as a $d \times d$ symmetric random matrix such that all the diagonal entries of $A$ are equal to 0 and the off-diagonal upper triangular entries are generated by independent Bernoulli variables so that $P(A^{ij} = 1) = 1 - P(A^{ij} = 0) = w^iw^j/\sum_{k=1}^{d}w^k$ for $i < j$. Then, $D$ is defined as the diagonal matrix such that the $j$-th diagonal entry of $D$ is given by $d_j(A) = \sum_{i=1}^{d}A^{ij}$. The weight vector $w$ is specified as follows: For every $i = 1, \dots, d$, we set $w^i := c\{(i + i_0 - 1)/d\}^{-1/(\alpha-1)}$ with $i_0 := d(c/w_M)^{\alpha-1}$ and $c := \frac{\alpha-2}{\alpha-1}$, where we use $\alpha = 2.5$ and $w_M = d^{0.45}$.
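A small R sketch that generates $Q$ according to this description (illustrative only; the min() call is merely a guard in case a product of weights exceeds the normalization):

```r
# Residual covariance of Design 2: Q = (E_d + D - A)^{-1} for a Chung-Lu random graph.
make_design2_Q <- function(d, alpha = 2.5, wM = d^0.45) {
  cc <- (alpha - 2) / (alpha - 1)
  i0 <- d * (cc / wM)^(alpha - 1)
  w  <- cc * ((1:d + i0 - 1) / d)^(-1 / (alpha - 1))   # power-law weights
  A  <- matrix(0, d, d)
  for (i in 1:(d - 1)) for (j in (i + 1):d) {
    A[i, j] <- A[j, i] <- rbinom(1, 1, min(1, w[i] * w[j] / sum(w)))
  }
  D <- diag(colSums(A))                                # degree matrix
  solve(diag(d) + D - A)
}
```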
Design 1 is the same one as in [8]. Design 2 is motivated by the recent work of Barigozzi et al. [42], which reports that several characteristics of the residual precision matrix of the S&P 500 assets exhibit power-law behaviors and they are well-described by the power-law partial correlation network model proposed in [42]; the specification in Design 2 is the same one as in the simulation study of [42].
We observe the processes Y and X at the equidistant times h / n , h = 0 , 1 , , n . We set d = 500 and vary n as n { 78 , 130 , 195 , 390 , 780 } . We run 10,000 Monte Carlo iterations for each experiment.

5.3. Results

We begin by assessing the estimation accuracy of the proposed estimator in various norms. For comparison, we consider the following 5 different methods to estimate Σ Y :
RC
We simply use the realized covariance matrix $\widehat{[Y, Y]}_1^n$ defined by Equation (14) to estimate $\Sigma_Y$.
glasso
We estimate $\Sigma_Y^{-1}$ by the (unweighted) graphical Lasso based on $\widehat{[Y, Y]}_1^n$. Then, $\Sigma_Y$ is estimated by its inverse.
wglasso
We estimate $\Sigma_Y^{-1}$ by the weighted graphical Lasso based on $\widehat{[Y, Y]}_1^n$ (i.e., the estimator defined by Equation (1) with $\hat{\Sigma}_n = \widehat{[Y, Y]}_1^n$). Then, $\Sigma_Y$ is estimated by its inverse.
f-glasso
We estimate $\Sigma_Z^{-1}$ by the (unweighted) graphical Lasso based on $\hat{\Sigma}_{Z,n}$ defined by Equation (9) with $\hat{\Sigma}_{Y,n} = \widehat{[Y, Y]}_1^n$ and $\hat{\Sigma}_{X,n} = \widehat{[X, X]}_1^n$. Then, $\Sigma_Y$ is estimated by Equation (11) with $\hat{\Theta}_{Z,\lambda}$ being the estimator so constructed.
f-wglasso
We estimate $\Sigma_Z^{-1}$ by the weighted graphical Lasso based on $\hat{\Sigma}_{Z,n}$ defined by Equation (9) with $\hat{\Sigma}_{Y,n} = \widehat{[Y, Y]}_1^n$ and $\hat{\Sigma}_{X,n} = \widehat{[X, X]}_1^n$. Then, $\Sigma_Y$ is estimated by Equation (11) with $\hat{\Theta}_{Z,\lambda}$ being the estimator so constructed.
In addition, for Design 1, we also consider the estimator proposed in [8]: Assuming that we know which entries of $\Sigma_Z$ are zero, we estimate $\Sigma_Y$ by $\hat{\beta}_n\hat{\Sigma}_{X,n}\hat{\beta}_n^\top + (\hat{\Sigma}_{Z,n}^{ij}1_{\{\Sigma_Z^{ij} \neq 0\}})_{1 \leq i,j \leq d}$. We label this method f-thr. Since the estimates of RC and f-thr are not always non-singular, we use their Moore–Penrose generalized inverses to estimate $\Sigma_Y^{-1}$ when they are singular. Please note that the methods glasso and f-glasso correspond to those proposed in [17], while wglasso and f-wglasso are those proposed in this paper. We report the simulation results in Table 1 and Table 2.
We first focus on the accuracy of estimating the precision matrix $\Sigma_Y^{-1}$. The tables reveal the excellent performance of the graphical Lasso-based methods. In particular, they outperform f-thr in Design 1 except for the case $n = 780$, even when we ignore the factor structure of the model. Nevertheless, the tables also show an apparent benefit of taking the factor structure into account when constructing the graphical Lasso type estimators. When we compare the weighted graphical Lasso estimators with the unweighted versions, the weighted ones tend to outperform the unweighted ones as $n$ increases, especially when the factor structure is taken into account. This is more pronounced in Design 2. It is also worth mentioning that the estimation errors for $\Sigma_Y^{-1}$ in the method RC are greater at $n = 390, 780$ than those at $n = 78, 130, 195$. This is presumably due to a "resonance" effect between the sample size $n$ and the dimension $d$ coming from the use of the Moore–Penrose generalized inverse, which is well-known in multivariate analysis (see e.g., [58]): The estimation error for the precision matrix by the generalized inverse of the sample covariance matrix drastically increases as $n$ approaches $d$. Theoretically, this occurs because the smallest non-zero eigenvalue of the sample covariance matrix tends to 0 as $n$ approaches $d$.
Turning to the estimation accuracy for $\Sigma_Y$ in terms of the $\infty$-norm, we find little advantage in using the graphical Lasso type methods over the realized covariance matrix: f-glasso and f-wglasso tend to outperform RC at small values of $n$, but the differences in performance become less clear as $n$ increases. From a theoretical point of view, this is not surprising because the realized covariance matrix is a consistent estimator for $\Sigma_Y$ in the $\infty$-norm with the convergence rate $\sqrt{(\log d)/n}$; this can be seen from e.g., Lemma A15. Meanwhile, in Design 1, f-thr performs the best in terms of estimating $\Sigma_Y$ at all values of $n$.
Next we assess the accuracy of the mixed normal approximation for the de-biased estimator. For this purpose, we construct entry-wise confidence intervals for $\Theta_Z$ based on Equation (21) (taking the factor structure into account) and evaluate their empirical coverages. Table 3 reports these coverages averaged over the sets $\{(i,j) : i \neq j, \Theta_Z^{ij} = 0\}$ and $\{(i,j) : i \neq j, \Theta_Z^{ij} \neq 0\}$, respectively. We see from the table that the asymptotic approximation works very well for constructing confidence intervals for the zero entries of $\Theta_Z$. By contrast, confidence intervals for non-zero entries of $\Theta_Z$ tend to over-cover, especially in Design 1. However, these coverage distortions become moderate at larger values of $n$, which suggests that the normal approximation starts to work for relatively large sample sizes.

6. Empirical Application

To illustrate the applicability of the proposed method to real data analysis, we conduct a simple empirical study using high-frequency financial data. We take 1 March 2018 as the observation interval [ 0 , 1 ] and the log-price processes of the component stocks of the S&P 500 index as the process Y. In addition, as is often performed in the literature, we regard the SPDR S&P 500 ETF (SPY) as the observable factor process X. We use 5-minute returns to compute the estimators presented in Section 4. The dataset is provided by Bloomberg. Please note that our setting implies d = 504 and n = 77 , yielding a high-dimensional setting considered in this paper (note that our dataset does not contain observations at the market opening).
The selection procedure presented in Section 5.1 suggests $\lambda_n \approx 0.272$. We then estimate the support $S(\Theta_Z)$ of $\Theta_Z$ by the estimator $\hat{S}_n(\Theta_Z)$ of Corollary 2 with $\alpha_n = 0.05$. Figure 1 shows the partial correlation network induced by $\hat{S}_n(\Theta_Z)$, drawn with the R package igraph. Specifically, it depicts the undirected graph whose vertices are the S&P 500 component stocks and whose edge set is given by $\hat{S}_n(\Theta_Z)$. To illuminate the relationship between the network and sector structures, we color the vertices according to their Global Industry Classification Standard (GICS) sectors. We find that there are strong interconnections in several sectors such as Consumer Staples, Energy, Real Estate and Utilities. The figure also suggests that the network has some characteristics that are commonly observed in scale-free networks: It consists of a giant component with several hubs and a few small components. This is consistent with an observation made in [42]. Indeed, in [42] the authors have proposed a model for $\Theta_Z$ that induces a scale-free partial correlation network. According to their model, the decay of the largest eigenvalues of $\Theta_Z$ also exhibits power-law behavior. More precisely, letting $\Lambda_1 \geq \cdots \geq \Lambda_d$ be the ordered eigenvalues of $\Theta_Z$, we have $\Lambda_i \approx i^{-\alpha}$ with some $\alpha > 0$ for moderate $i$ and large $d$. It is then interesting to check whether this is the case in our dataset. Figure 2 shows the log–log size–rank plot for the 50 largest eigenvalues of $\hat{\Theta}_{Z,\lambda_n}$. We see that, except for the three largest eigenvalues, they clearly display power-law behavior.

7. Conclusions

In this paper, we have developed a generic asymptotic theory to estimate the high-dimensional precision matrix of high-frequency data using the weighted graphical Lasso. We have shown that the consistency of the weighted graphical Lasso estimator in matrix operator norms follows from the consistency of the initial estimator in the $\infty$-norm, while the asymptotic mixed normality of its de-biased version follows from that of the initial estimator, where the asymptotic mixed normality has been formulated appropriately for the high-dimensional setting considered here. Our theory also encompasses a situation where a known factor structure is present in the data. In such a situation, we have applied the weighted graphical Lasso to the residual process obtained after removing the effect of factors.
We have applied the developed theory to the concrete situation where we can use the realized covariance matrix as the initial covariance estimator. We have derived the desirable asymptotic mixed normality of the realized covariance matrix by an application of the recent high-dimensional central limit theorem obtained in [20], where Malliavin calculus resolves the main theoretical difficulties caused by the high-dimensionality. Consequently, we have obtained a feasible asymptotic distribution theory to conduct inference for entries of the precision matrix. A Monte Carlo study has shown the good finite sample performance of our asymptotic theory.
A natural direction for future work is to apply the developed theory to a more complex situation where the process is asynchronously observed with noise and/or jumps. To accomplish this purpose, we need to establish the high-dimensional asymptotic mixed normality of relevant covariance estimators.

Funding

This research was funded by JST CREST Grant Number JPMJCR14D7 and JSPS KAKENHI Grant Numbers JP17H01100, JP18H00836, JP19K13668.

Acknowledgments

The author thanks two anonymous referees for their constructive comments that substantially improved the original version of this paper.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
i.i.d.   independent and identically distributed
MLE      maximum likelihood estimation
KKT      Karush–Kuhn–Tucker
CLT      central limit theorem

Appendix A. Matrix Inequalities

This appendix collects some elementary (but less trivial) inequalities for matrices used in the proofs of the main results.
Lemma A1.
Let A ∈ S_d. Then Λ_min(A) ≤ A_ii ≤ Λ_max(A) for every i = 1, …, d.
Proof. 
See Theorem 14 in [59] (Chapter 11). □
Lemma A2.
Let A ∈ S_d^+ and B ∈ R^{d×r}. Then Λ_max(B^⊤AB) ≤ Λ_max(B^⊤B) Λ_max(A) and Λ_min(B^⊤AB) ≥ Λ_min(B^⊤B) Λ_min(A).
Proof. 
Let x ∈ R^r be a unit eigenvector of B^⊤AB associated with Λ_max(B^⊤AB). Then, by Theorem 4 in [59] (Chapter 11) we have Λ_max(B^⊤AB) = x^⊤B^⊤ABx ≤ Λ_max(A) x^⊤B^⊤Bx ≤ Λ_max(A) Λ_max(B^⊤B). Therefore, we obtain the first inequality. The second one can be shown analogously. □
Lemma A3.
Let A, B ∈ S_d. Then |Λ_max(A) − Λ_max(B)| ∨ |Λ_min(A) − Λ_min(B)| ≤ |||A − B|||_2.
Proof. 
Noting the identity |||C|||_2 = Λ_max(C) ∨ (−Λ_min(C)) holding for any symmetric matrix C, the desired result follows from Weyl’s inequality (cf. Corollary 4.3.15 in [60]). □
Lemma A4.
For any A ∈ S_d, |||A|||_1 = |||A|||_∞ ≤ √(d(A)) |||A|||_2.
Proof. 
This is a straightforward consequence of the Schwarz inequality. □
Lemma A5.
Let A, B ∈ R^{r×r}. If A is invertible and |||A^{-1}(B − A)|||_w < 1 for some w ∈ [1, ∞], then B is invertible and
|||B^{-1} − A^{-1}|||_w ≤ |||A^{-1}|||_w |||A^{-1}(B − A)|||_w / (1 − |||A^{-1}(B − A)|||_w).
Proof. 
See pages 381–382 of [60]. □
Lemma A6.
Let A ∈ S_r and B, C ∈ R^{d×r}. Then
‖BAC^⊤‖_∞ ≤ |||A|||_2 max_{1≤i≤d} ‖B_{i·}‖_2 max_{1≤j≤d} ‖C_{j·}‖_2 ≤ r |||A|||_2 ‖B‖_∞ ‖C‖_∞.
Proof. 
This result has essentially been shown in [61] (Lemma A.7). Since A is symmetric, there is an orthogonal matrix U ∈ R^{r×r} such that Λ := U^⊤AU is a diagonal matrix. Now, for any i, j ∈ {1, …, d},
|(BAC^⊤)_{ij}| = |B_{i·} A C_{j·}^⊤| = |B_{i·} U Λ U^⊤ C_{j·}^⊤| = |Σ_{k=1}^r Λ_kk (B_{i·}U)_k (C_{j·}U)_k| ≤ max_{1≤k≤r} |Λ_kk| ‖U^⊤B_{i·}^⊤‖_2 ‖U^⊤C_{j·}^⊤‖_2 = |||A|||_2 ‖B_{i·}‖_2 ‖C_{j·}‖_2 ≤ r |||A|||_2 ‖B‖_∞ ‖C‖_∞.
This yields the desired result. □
Lemma A7.
Let A, B, C ∈ R^{d×d}. Then, for any i, j = 1, …, d,
|(BAC^⊤)_{ij}|² ≤ (Σ_{k=1}^d |B_ik|) (Σ_{l=1}^d |C_jl|) Σ_{k,l=1}^d |B_ik C_jl| (A_kl)².
Proof. 
This is a straightforward consequence of the Schwarz inequality. □
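The following base-R sketch is a numerical sanity check of Lemmas A3 and A5 on an arbitrary, well-conditioned example; no part of the proofs relies on it.
```r
## Numerical sanity check of Lemmas A3 and A5 (illustration only).
set.seed(1)
d <- 5
A <- diag(d) + crossprod(matrix(rnorm(d * d), d)) / d      # symmetric, eigenvalues >= 1
B <- A + crossprod(matrix(rnorm(d * d), d)) / (100 * d)    # small symmetric perturbation
specnorm <- function(M) max(svd(M)$d)                      # |||M|||_2 via singular values

## Lemma A3 (Weyl): |Lambda_max(A) - Lambda_max(B)| <= |||A - B|||_2
abs(max(eigen(A)$values) - max(eigen(B)$values)) <= specnorm(A - B)

## Lemma A5 with w = 2 (here |||A^{-1}(B - A)|||_2 < 1 by construction)
del <- specnorm(solve(A) %*% (B - A))
specnorm(solve(B) - solve(A)) <= specnorm(solve(A)) * del / (1 - del)
```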

Appendix B. Proofs for Section 2

Appendix B.1. Proof of Proposition 1

The following result has essentially been proven in [24] and gives an estimate for the “deterministic part” of oracle inequalities for graphical Lasso type estimators.
Proposition A1.
Let A 0 , A S d and assume A A 0 λ 0 for some λ 0 > 0 . Assume also that there are numbers L > 1 and λ > 0 such that L 1 Λ min ( A 0 ) Λ max ( A 0 ) L , 2 λ 0 λ ( 8 L c L ) 1 and 8 c L 2 s λ 2 + 2 c L λ 0 2 diag ( A ) diag ( A 0 ) 2 2 λ 0 / ( 2 L ) , where s : = s ( B 0 ) and c L : = 8 L 2 . Set B 0 : = A 0 1 . Then, for any B S d + + satisfying
tr B A log det B + λ B 1 tr B 0 A log det B 0 + λ B 0 1 ,
it holds that
B B 0 2 2 / c L + λ B B 0 1 8 c L 2 s λ 2 + 2 c L λ 0 2 diag ( A ) diag ( A 0 ) 2 2 .
We first prove Proposition A1 under an additional assumption:
Lemma A8.
Proposition A1 holds true if we additionally have B B 0 2 1 / ( 2 L ) .
Proof. 
Set Δ = B B 0 . By assumption we have Δ 2 1 / ( 2 L ) , so Lemma 2 in [24] implies that
E ( Δ ) : = tr Δ A 0 log det Δ + B 0 log det B 0
is well-defined and we have
E ( Δ ) c Δ 2 ,
where c = c L 1 . Moreover, Equation (A1) yields
E ( Δ ) + λ B 1 = tr Δ ( A A 0 ) + tr B A log det B + λ B 1 tr B 0 A + log det B 0 tr Δ ( A A 0 ) + tr B 0 A log det B 0 + λ B 0 1 tr B 0 A + log det B 0 = tr Δ ( A A 0 ) + λ B 0 1 .
Now, note that tr ( A 1 B 1 ) = tr ( A 1 B 1 ) + tr ( diag ( A 1 ) diag ( B 1 ) ) and | tr ( A 1 B 1 ) | A 1 B 1 1 for any A 1 , B 1 R d × d . Thus, we infer that
| tr Δ ( A A 0 ) | Δ 1 A A 0 + diag ( Δ ) 2 diag ( A ) diag ( A 0 ) 2 λ 0 Δ 1 + diag ( A ) diag ( A 0 ) 2 diag ( Δ ) 2 ,
where we use A A 0 λ 0 in the last line. Combining this with Equations (A2) and (A3), we conclude that
c Δ 2 2 + λ B 1 λ 0 Δ 1 + diag ( A ) diag ( A 0 ) 2 diag ( Δ ) 2 + λ B 0 1 .
Let S : = S ( B 0 ) . Also, for a subset I of { 1 , , d } 2 and a d × d matrix U, we define the d × d matrix U I = ( U I i j ) 1 i , j d by U I i j = U i j 1 { ( i , j ) I } . Then, by definition and assumption, we have B 1 = B S 1 + B S c 1 , Δ 1 = Δ S 1 + B S c 1 , B 0 1 Δ S 1 + B S 1 , λ 2 λ 0 , so we deduce
c Δ 2 2 + λ 2 B S c 1 3 λ 2 Δ S 1 + diag ( A ) diag ( A 0 ) 2 diag ( Δ ) 2 .
Consequently, we obtain
2 c Δ 2 2 + λ Δ 1 = 2 c Δ 2 2 + λ ( B S c 1 + Δ S 1 ) 4 λ Δ S 1 + 2 diag ( A ) diag ( A 0 ) 2 diag ( Δ ) 2 4 λ s Δ S 2 + 2 diag ( A ) diag ( A 0 ) 2 diag ( Δ ) 2 ( Schwarz inequality ) 8 s λ 2 / c 2 + c Δ S 2 2 / 2 + 2 diag ( A ) diag ( A 0 ) 2 2 / c + c diag ( Δ ) 2 2 / 2 ,
where we use the inequality x y ( x 2 + y 2 ) / 2 in the last line. Since Δ 2 2 = diag ( Δ ) 2 2 + Δ 2 2 , we conclude that
c Δ 2 2 + λ Δ 1 8 s λ 2 / c 2 + 2 diag ( A ) diag ( A 0 ) 2 2 / c ,
which completes the proof. □
Proof of Proposition A1.
Thanks to Lemma A8, it suffices to prove B B 0 2 1 / ( 2 L ) .
Set B ˜ = α B + ( 1 α ) B 0 with α = M / ( M + B B 0 2 ) and M = 1 / ( 2 L ) . By definition we have B ˜ B 0 2 M = 1 / ( 2 L ) . Moreover, Equation (A1) and the convexity of the loss function imply that
tr ( B ˜ A ) log det ( B ˜ ) + λ B 1 tr B 0 A log det B 0 + λ B 0 1 .
Therefore, we can apply Lemma A8 with replacing B by B ˜ , and thus we obtain
B ˜ B 0 2 2 / c L + λ B ˜ B 0 1 8 c L 2 s λ 2 + 2 c L λ 0 2 diag ( A ) diag ( Σ ) 2 2 .
In particular, we have
B ˜ B 0 2 2 c L λ 0 / ( 2 L ) 1 / ( 16 L 2 ) ,
so we obtain B ˜ B 0 2 1 / ( 4 L ) = M / 2 . By construction this yields B B 0 2 M = 1 / ( 2 L ) , which completes the proof. □
Proof of Proposition 1.
Thanks to Lemma 7.2 of [45], it suffices to consider the case w = 1 .
For any L , n N , we define the set Ω n , L Ω by
Ω n , L : = { R ^ n R Y λ n / 2 } { L 1 Λ min ( R Y ) Λ max ( R Y ) L } { s ( K Y ) L s n } { 8 c L 2 L s n λ n > 1 / ( 4 L ) } ,
where c L : = 8 L 2 . Then we have
lim L lim sup n P ( Ω n , L c ) = 0 .
In fact, noting that Lemma A1 and [A1] yield
max 1 j d Σ Y j j + 1 / Σ Y j j = O p ( 1 ) ,
[A1]–[A2], [B1] and Lemma A2 imply that λ n 1 R ^ n R Y = o p ( 1 ) , s ( K Y ) = O p ( s n ) and Λ max ( R Y ) + 1 / Λ min ( R Y ) = O p ( 1 ) . Finally, [B2] yields lim n P ( 8 L s n λ n > 1 / ( 4 L ) ) = 0 for all L. Now, note that λ n 1 / ( 16 L c L 2 ) ( 8 L c L ) 1 on the set Ω n , L . Therefore, applying Proposition A1 with λ : = λ n and λ 0 : = λ n / 2 , for any fixed L we have
K ^ λ n K Y 2 2 / c L + λ n K ^ λ n K Y 1 8 c L 2 s n λ n 2 on Ω n , L .
Consequently, we obtain
lim sup n P K ^ λ n K Y 2 > 64 L 3 s n λ n lim sup n P ( Ω n , L c )
and
lim sup n P K ^ λ n K Y 1 > 512 L 4 s n λ n lim sup n P ( Ω n , L c ) .
Therefore, we conclude that
lim sup M lim sup n P K ^ λ n K Y 2 > M s n λ n lim sup L lim sup n P ( Ω n , L c ) = 0
and
lim sup M lim sup n P K ^ λ n K Y 1 > M s n λ n lim sup L lim sup n P ( Ω n , L c ) = 0 ,
which yields K ^ λ n K Y 2 = O p ( s n λ n ) and K ^ λ n K Y 1 = O p ( s n λ n ) . In particular, we obtain the first convergence of Equation (3). Moreover, since we have
| | | K ^ λ n K Y | | | 1 diag ( K ^ λ n ) diag ( K Y ) + K ^ λ n K Y 1 K ^ λ n K Y 2 + K ^ λ n K Y 1 ,
we also obtain the second convergence of Equation (3).
Now we prove Equation (4). First, Equation (A4) and [B1] yield λ n 1 | | | V ^ n V | | | 1 p 0 . Since | | | V | | | 1 = O p ( 1 ) , | | | K Y | | | 1 = O p ( s n ) and λ n = o p ( 1 ) by Equation (A4), [A2] and [B2], we obtain | | | V ^ n | | | 1 = O p ( 1 ) and | | | K ^ λ n | | | 1 = O p ( s n ) . Since
| | | Θ ^ λ n Θ Y | | | 1 = | | | V ^ n K ^ λ n V ^ n V K Y V | | | 1 | | | V ^ n V | | | 1 | | | K ^ λ n | | | 1 | | | V ^ n | | | 1 + | | | V | | | 1 | | | K ^ λ n K Y | | | 1 | | | V ^ n | | | 1 + | | | V | | | 1 | | | K Y | | | 1 | | | V ^ n V | | | 1 ,
we obtain the first convergence of Equation (4). Next, since | | | Θ ^ λ n Θ Y | | | 2 = o p ( 1 ) by the above result, [A1] and Lemma A3 yield | | | Θ ^ λ n 1 | | | 2 = Λ min ( Θ ^ λ n ) 1 = O p ( 1 ) . Since | | | Θ ^ λ n 1 Σ Y | | | 2 | | | Θ ^ λ n 1 | | | 2 | | | Θ Y Θ ^ λ n | | | 2 | | | Σ Y | | | 2 , we obtain the second convergence of Equation (4). □
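For readers who wish to reproduce the estimator analyzed in this proof, the following is a minimal computational sketch in the above notation: the graphical Lasso is run on the correlation-scaled matrix R̂_n and the result K̂_λn is mapped back through the scaling matrix V̂_n, so that Θ̂_λn = V̂_n K̂_λn V̂_n. The use of the glasso package, the scalar penalty on the correlation scale, and the unpenalized diagonal are assumptions of this sketch, not a description of the exact implementation.
```r
## Hedged sketch of the correlation-scaled (weighted) graphical Lasso step.
## `Sigma_hat` is a hypothetical initial covariance estimate (e.g., a realized covariance).
library(glasso)

wglasso <- function(Sigma_hat, lambda) {
  V_hat <- diag(1 / sqrt(diag(Sigma_hat)))     # V_hat = diag(Sigma_hat_jj)^{-1/2}
  R_hat <- V_hat %*% Sigma_hat %*% V_hat       # correlation-type matrix R_hat
  K_hat <- glasso(R_hat, rho = lambda, penalize.diagonal = FALSE)$wi
  V_hat %*% K_hat %*% V_hat                    # Theta_hat_lambda = V_hat K_hat V_hat
}
```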

Appendix B.2. Proof of Lemma 1

First, by Proposition 14.4.3 of [62] there is a (not necessarily measurable) d × d random matrix Z ^ n such that
Σ ^ n Θ ^ λ n 1 + λ n V ^ n Z ^ n V ^ n = 0 , Z ^ n 1 ,
and Z ^ n i j = sign ( Θ ^ λ n i j ) if Θ ^ λ n i j 0 . Consequently, it holds that
Σ ^ n Θ ^ λ n E d + λ n V ^ n Z ^ n V ^ n Θ ^ λ n = 0 .
Therefore, we have
Θ ^ λ n Θ Y + Θ Y ( Σ ^ n Σ Y ) Θ Y = Θ ^ λ n Θ Y + Θ Y ( Σ ^ n Σ Y ) Θ ^ λ n Θ Y ( Σ ^ n Σ Y ) ( Θ ^ λ n Θ Y ) = Θ ^ λ n Θ Y + Θ Y ( E d λ n V ^ n Z ^ n V ^ n Θ ^ λ n Σ Y Θ ^ λ n ) Θ Y ( Σ ^ n Σ Y ) ( Θ ^ λ n Θ Y ) = λ n Θ Y V ^ n Z ^ n V ^ n Θ ^ λ n Θ Y ( Σ ^ n Σ Y ) ( Θ ^ λ n Θ Y ) = λ n ( Θ ^ λ n Θ Y ) V ^ n Z ^ n V ^ n Θ ^ λ n ( Θ ^ λ n Θ ^ λ n Σ ^ n Θ ^ λ n ) Θ Y ( Σ ^ n Σ Y ) ( Θ ^ λ n Θ Y ) ,
so we obtain
Θ ^ λ n Θ Y Γ n + Θ Y ( Σ ^ n Σ Y ) Θ Y λ n | | | Θ ^ λ n Θ Y | | | V ^ n Z ^ n V ^ n Θ ^ λ n + | | | Θ Y | | | Σ ^ n Σ Y | | | Θ ^ λ n Θ Y | | | λ n | | | Θ ^ λ n Θ Y | | | | | | V ^ n | | | 2 | | | Θ ^ λ n | | | + | | | Θ Y | | | Σ ^ n Σ Y | | | Θ ^ λ n Θ Y | | | .
Now the desired result follows from Proposition 1 and Lemma A4. □
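As an illustration of how this decomposition is used, the following sketch computes a de-biased estimate under the assumption, as in the de-sparsified graphical Lasso of [24,25], that the correction term is Γ_n = Θ̂_λn Σ̂_n Θ̂_λn − Θ̂_λn, so that the de-biased estimator Θ̂_λn − Γ_n equals 2Θ̂_λn − Θ̂_λn Σ̂_n Θ̂_λn.
```r
## Hedged sketch of the de-biasing step under the assumed form of Gamma_n.
debias <- function(Theta_hat, Sigma_hat) {
  2 * Theta_hat - Theta_hat %*% Sigma_hat %*% Theta_hat
}
```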

Appendix B.3. Proof of Proposition 2

In the light of Lemma 3.1 of [20], it is enough to prove
log ( m + 1 ) J n vec Θ ^ λ n Θ Y Γ n J ˜ n vec Σ ^ n Σ p 0
as n . Please note that vec ( A B C ) = ( C A ) vec ( B ) for any d × d matrices A , B , C (cf. Theorem 2 in [59] (Chapter 2)). Thus, we obtain the desired result once we prove
log ( m + 1 ) J n Θ ^ λ n Θ Y Γ n + Θ Y Σ ^ n Σ Θ Y p 0
as n . This follows from Lemma 1 and the assumptions of the proposition. □

Appendix C. Proofs for Section 3

Appendix C.1. Proof of Proposition 3

Set Ω n : = { | | | Σ X 1 ( Σ ^ X , n Σ X ) | | | 2 1 / 2 } .
Lemma A9.
Under the assumptions of Proposition 3, we have the following results:
(a)
On the event Ω n , Σ ^ X , n is invertible and | | | Σ ^ X , n 1 Σ X 1 | | | 2 2 | | | Σ X 1 | | | 2 | | | Σ X 1 ( Σ ^ X , n Σ X ) | | | 2 .
(b)
λ n 1 | | | Σ ^ X , n Σ X | | | 2 = o p ( r ) and | | | Σ ^ X , n | | | 2 = O p ( r ) as n .
(c)
P ( Ω n ) 1 as n .
Proof. 
(a) is a direct consequence of Lemma A5. (b) follows from [C3], [D2] and the inequalities | | | Σ ^ X , n Σ X | | | 2 r Σ ^ X , n Σ X and | | | Σ X | | | 2 r Σ X . (c) follows from the inequality | | | Σ X 1 ( Σ ^ X , n Σ X ) | | | 2 r | | | Σ X 1 | | | 2 Σ ^ X , n Σ X . □
Lemma A10.
Under the assumptions of Proposition 3, λ n 2 Σ ^ Z , n Σ ˘ Z , n = o p ( r ) as n .
Proof. 
Since β ^ n = Σ ^ Y X , n Σ ^ X , n 1 on the event Ω n , we have
Σ ^ Z , n Σ ˘ Z , n = β ^ n Σ ^ X , n β ^ n + Σ ^ Y X , n β + β Σ ^ Y X , n β Σ ^ X , n β = Σ ^ Y X , n ( β ^ n β ) + β Σ ^ X , n ( β ^ n β ) = ( Σ ^ Y X , n β Σ ^ X , n ) Σ ^ X , n 1 ( Σ ^ Y X , n β Σ ^ X , n ) on Ω n .
Therefore, Lemma A6 yields
Σ ^ Z , n Σ ˘ Z , n r | | | Σ ^ X , n 1 | | | 2 Σ ^ Y X , n β Σ ^ X , n 2 on Ω n .
Now, by [C3] and Lemma A9 we have | | | Σ ^ X , n 1 | | | 2 1 Ω n = O p ( 1 ) , so we obtain λ n 2 Σ ^ Z , n Σ ˘ Z , n 1 Ω n = o p ( r ) by [D1]. Since P ( Ω n ) 1 by Lemma A9(c), we complete the proof. □
Lemma A11.
Under the assumptions of Proposition 3, λ n 1 Σ ^ Z , n Σ Z p 0 and P ( min 1 i d Σ ^ Z , n i i > 0 ) 1 as n .
Proof. 
The first claim immediately follows from Lemma A10 and [D1]. The second one is a consequence of the first one, Lemma A1 and [C2]. □
Proof of Proposition 3.
Set E n : = Ω n { Σ ¯ n S d + } { min 1 i d Σ ^ Z , n i i > 0 } . From Equation (0.8.5.3) in [60], we have Σ ^ Z , n S d + on E n . Hence, from the proof of [34] (Lemma 1), the optimization problem in Equation (10) has the unique solution on E n . Since P ( E n ) 1 as n by [D3] and Lemmas A9 and A11, the desired result follows once we prove λ n 1 Σ ^ Z , n Σ Z p 0 as n according to Proposition 1. This has already been established in Lemma A11. □

Appendix C.2. Proof of Proposition 4

We first establish some asymptotic properties of β ^ n which are necessary for the subsequent proofs.
Lemma A12.
Under the assumptions of Proposition 3, we have the following results:
(a)
β 2 = O p ( d ) as n .
(b)
λ n 1 max 1 i d β ^ n i · β i · 2 = o p ( r ) and max 1 i d β ^ n i · 2 = O p ( r ) as n .
(c)
λ n 1 β ^ n β 2 = o p ( d r ) and β ^ n 2 = O p ( d ) as n .
(d)
λ n 1 | | | β ^ n β | | | 1 = o p ( d r ) and | | | β ^ n | | | 1 = O p ( d ) as n .
(e)
λ n 1 | | | β ^ n β | | | = o p ( r ) and | | | β ^ n | | | = O p ( r ) as n .
Proof. 
(a) Since Σ X Λ min ( Σ X ) E r is positive semidefinite, β Σ X β Λ min ( Σ X ) β β is also positive semidefinite. Thus, Σ Y Λ min ( Σ X ) β β is positive definite by Equation (8). This implies that 0 tr ( Σ Y Λ min ( Σ X ) β β ) = tr ( Σ Y ) Λ min ( Σ X ) β 2 2 . Since tr ( Σ Y ) = O p ( d ) by [C1], we obtain β 2 2 = O p ( d ) by [C3].
(b) By Lemma A9, on the event Ω n , we have β ^ n = Σ ^ Y X , n Σ ^ X , n 1 . Hence, for every i = 1 , , d ,
β ^ n i · β i · 2 = ( Σ ^ Y X , n i · β Σ ^ X , n i · ) Σ ^ X , n 1 2 | | | Σ ^ X , n 1 | | | 2 Σ ^ Y X , n i · β Σ ^ X , n i · 2 r | | | Σ ^ X , n 1 | | | 2 Σ ^ Y X , n β Σ ^ X , n on Ω n .
Since | | | Σ ^ X , n 1 | | | 2 1 Ω n = O p ( 1 ) by Lemma A9, λ n 1 max 1 i d β ^ n i · β i · 1 Ω n = o p ( r ) by [D1]. Since P ( Ω n c ) 0 by Lemma A9, we obtain λ n 1 max 1 i d β ^ n i · β i · 2 = o p ( r ) . Since max 1 i d β i · 2 r β = O ( r ) by [C1], we also obtain max 1 i d β ^ n i · = O p ( r ) .
(c) This follows from (a)–(b) and r λ n = o p ( 1 ) .
(d) This is a direct consequence of (b).
(e) This follows from (b) and the Schwarz inequality. □
Proof of Proposition 4.
Since A | | | A | | | 2 for any matrix A, in view of Proposition 3 it suffices to prove λ n 1 β ^ n Σ ^ X , n β ^ n β Σ X β = O p ( r 2 ) . By Lemma A6 we have
β ^ n Σ ^ X , n β ^ n β Σ X β | | | Σ ^ X , n | | | 2 max 1 i d β ^ n i · β i · 2 max 1 i d β ^ n i · 2 + | | | Σ ^ X , n Σ X | | | 2 max 1 i d β i · 2 max 1 i d β ^ n i · 2 + | | | Σ X | | | 2 max 1 i d β i · 2 max 1 i d β ^ n i · β i · 2 .
Therefore, the desired result follows from Lemmas A9, A12(b) and assumption. □

Appendix C.3. Proof of Proposition 5

Set Π : = ( Σ X 1 + β Σ Z 1 β ) 1 and Π ^ n : = ( Σ ^ X , n + β ^ n Θ ^ Z , λ n β ^ n ) 1 .
Lemma A13.
Under the assumptions of Proposition 5, we have the following results:
(a)
Λ min ( β β ) 1 = O ( d 1 ) as n .
(b)
| | | Π | | | 2 = O p ( d 1 ) as n .
(c)
λ n 1 | | | Π ^ n Π | | | 2 = O p ( d 1 ( s n + r ) ) and | | | Π ^ n | | | 2 = O p ( d 1 ) as n .
Proof. 
(a) By Lemma A3 we have | Λ min ( d 1 β β ) Λ min ( B ) | | | | d 1 β β B | | | 2 . Hence the desired result follows from [C6].
(b) Since | | | Π | | | 2 = Λ min ( Σ X 1 + β Σ Z 1 β ) 1 and Σ X 1 is positive definite, Corollary 4.3.12 in [60] and Lemma A2 yield
| | | Π | | | 2 Λ min ( β Σ Z 1 β ) 1 Λ min ( β β ) 1 Λ min ( Σ Z 1 ) 1 = Λ min ( β β ) 1 Λ max ( Σ Z ) .
Thus, the desired result follows from claim (a) and [C2].
(c) First, since we have
| | | β ^ n Θ ^ Z , λ n β ^ n β Θ Z β | | | 2 | | | β ^ n β | | | 2 | | | Θ ^ Z , λ n | | | 2 | | | β ^ n | | | 2 + | | | β | | | 2 | | | Θ ^ Z , λ n Θ Z | | | 2 | | | β ^ n | | | 2 + | | | β | | | 2 | | | Θ Z | | | 2 | | | β ^ n β | | | 2 ,
Lemma A12(a) and (c) and Proposition 3 yield λ n 1 | | | β ^ n Θ ^ Z , λ n β ^ n β Θ Z β | | | 2 = O p ( d s n ) . Combining this with Lemma A9 and (b), we obtain λ n 1 | | | Π ( Π ^ n 1 Π 1 ) | | | 2 1 Ω n = O p ( s n + r ) . Now let us set Ω n , 1 : = Ω n { | | | Π ( Π ^ n 1 Π 1 ) | | | 2 1 / 2 } . Then, using (b) and Lemmas A5 and A9(c), we obtain λ n 1 | | | Π ^ n Π | | | 2 1 Ω n , 1 = O p ( d 1 ( s n + r ) ) and P ( Ω n , 1 c ) 0 . This completes the proof. □
Proof of Proposition 5.
By Sherman–Morrison–Woodbury formula (cf. Equation (0.7.4.1) in [60]), for any w { 2 , } we have
| | | Σ ^ Y , λ n 1 Σ Y 1 | | | w | | | Θ ^ Z , λ n Θ Z | | | w + | | | ( Θ ^ Z , λ n Θ Z ) β ^ n Π ^ n β ^ n Θ ^ Z , λ n | | | w + | | | Θ Z ( β ^ n β ) Π ^ n β ^ n Θ ^ Z , λ n | | | w + | | | Θ Z β ( Π ^ n Π ) β ^ n Θ ^ Z , λ n | | | w + | | | Θ Z β Π ( β ^ n β ) Θ ^ Z , λ n | | | w + | | | Θ Z β Π β ( Θ ^ Z , λ n Θ Z ) | | | w = : Δ 1 + Δ 2 + Δ 3 + Δ 4 + Δ 5 + Δ 6 .
Proposition 3 yields λ n 1 Δ 1 = O p ( s n ) . Moreover, noting that | | | Θ Z | | | = O p ( d n ) by Lemma A4 and [C2], Proposition 3 and Lemmas A12 and A13 imply that λ n 1 ( Δ 2 + Δ 6 ) = O p ( s n ) , λ n 1 ( Δ 3 + Δ 5 ) = o p ( r ) and λ n 1 Δ 4 = O p ( s n + r ) when w = 2 and λ n 1 ( Δ 2 + Δ 6 ) = O p ( r 3 / 2 s n d n ) , λ n 1 Δ 3 = o p ( r 3 / 2 d n ) , λ n 1 Δ 4 = O p ( r 3 / 2 ( s n + r ) d n ) and λ n 1 Δ 5 = o p ( r 2 d n ) when w = . This completes the proof. □

Appendix C.4. Proof of Proposition 6

We apply Proposition 2 to Σ ^ Z , n . From the arguments in the proof of Proposition 3, it remains to check condition Equation (5). More precisely, we need to prove
lim n sup y R m P a n J ˜ Z , n vec Σ ^ Z , n Σ Z y P J ˜ Z , n C n 1 / 2 ζ n y = 0 .
Thanks to Lemma 3.1 in [20] and Equation (12), this claim follows once we prove log ( m + 1 ) a n J ˜ Z , n vec ( Σ ^ Z , n Σ Z ) a n J ˜ Z , n vec ( Σ ˘ Z , n Σ Z ) 0 . Since we have
a n J ˜ Z , n vec ( Σ ^ Z , n Σ Z ) a n J ˜ Z , n vec ( Σ ˘ Z , n Σ Z ) a n | | | J n | | | | | | Θ Z | | | 2 Σ ^ Z , n Σ ˘ Z , n
and | | | Θ Z | | | = O p ( d n ) by Lemma A4, the desired result follows from Lemma A10 and assumption. □

Appendix C.5. Proof of Lemma 2 and Proposition 7

We use the same notation as in Appendix C.3. By the Sherman–Morrison–Woodbury formula we have
Σ ^ Y , λ n 1 Θ Z , λ n Θ ^ Z , λ n β ^ n Π ^ n β ^ n Θ ^ Z , λ n r Θ ^ Z , λ n β ^ n 2 | | | Π ^ n | | | 2 r | | | Θ ^ Z , λ n | | | 2 β ^ n 2 | | | Π ^ n | | | 2 ,
where the second inequality follows from Lemma A6. Since | | | Θ Z | | | 2 = O ( d n ) by Lemma A4, we have | | | Θ ^ Z , λ n | | | 2 = O p ( d n ) by Proposition 3. We also have β ^ n = O p ( 1 ) by [C1], [D2] and Lemma A12(b). Consequently, we obtain Σ ^ Y , λ n 1 Θ Z , λ n = O p ( r d n / d ) by Lemma A13. Similarly, we can prove Σ Y 1 Θ Z = O p ( r d n / d ) . Therefore, we complete the proof of Lemma 2.
Proposition 7 is an immediate consequence of Proposition 6, Lemma 2 and [20] (Lemma 3.1). □

Appendix D. Proofs for Section 4

Appendix D.1. Proof of Theorem 1

The proof relies on the following concentration inequalities for discretized quadratic covariations of continuous martingales:
Lemma A14.
Let M = ( M t ) t [ 0 , 1 ] and N = ( N t ) t [ 0 , 1 ] be two continuous martingales. Suppose that there is a constant L > 0 such that
| [ M , M ] t [ M , M ] s | | [ N , N ] t [ N , N ] s | L | t s |
for all s , t [ 0 , 1 ] . Then, for any θ > 0 , there is a constant C L , θ > 0 which depends only on L and θ such that
P n [ M , N ] ^ 1 n [ M , N ] 1 > x 2 exp C L , θ x 2
for all n N and x [ 0 , θ n ] .
Remark A1.
Similar estimates to Lemma A14 have already been obtained in the literature (see e.g., [36] (Lemma 3), [4] (Lemma 10) and [8] (Lemma A.1)). Since we use slightly different assumptions from the existing ones, we give its proof in Appendix E for the sake of completeness.
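As a quick numerical illustration of this concentration phenomenon (again, not used in the proofs), the following sketch simulates two correlated Brownian motions and compares the Monte Carlo standard deviation of the discretization error with the n^{-1/2} rate; the correlation, the sample size and the number of replications are arbitrary choices.
```r
## Discretized quadratic covariation of two correlated Brownian motions:
## [M, N]^hat_1 concentrates around [M, N]_1 = rho at the rate n^{-1/2}.
set.seed(1)
n <- 1000; rho <- 0.5; reps <- 2000
err <- replicate(reps, {
  dW1 <- rnorm(n, sd = sqrt(1 / n))
  dW2 <- rho * dW1 + sqrt(1 - rho^2) * rnorm(n, sd = sqrt(1 / n))
  sum(dW1 * dW2) - rho                         # discretization error
})
c(monte_carlo_sd = sd(err), theory = sqrt((1 + rho^2) / n))   # both of order n^{-1/2}
```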
Define the ( d + r ) -dimensional semimartingale Z ¯ = ( Z ¯ t ) t [ 0 , 1 ] by Z ¯ t = ( Z t 1 , , Z t d , X t 1 , , X t r ) .
Lemma A15.
Assume [E1] and log ( d + r ) / n 0 as n . Then, [ Z ¯ , Z ¯ ] ^ 1 n [ Z ¯ , Z ¯ ] 1 = O p ( log ( d + r ) / n ) as n .
Proof. 
For all n , ν N and t [ 0 , 1 ] , set
μ ¯ ( ν ) t = μ ( ν ) t μ ˜ ( ν ) t , σ ¯ ( ν ) t = σ ( ν ) t σ ˜ ( ν ) t .
Then we define the processes A ¯ ( ν ) = ( A ¯ ( ν ) t ) t [ 0 , 1 ] and M ¯ ( ν ) = ( M ¯ ( ν ) t ) t [ 0 , 1 ] by A ¯ ( ν ) t = 0 t μ ¯ ( ν ) s d s and M ¯ ( ν ) t = 0 t σ ¯ ( ν ) s d W s . By the local property of Itô integrals (cf. [47], pp. 17–18), we have Z ¯ = Z ¯ ( ν ) : = A ¯ ( ν ) + M ¯ ( ν ) on Ω n ( ν ) . Hence, for every L > 0 , it holds that
P [ Z ¯ , Z ¯ ] ^ 1 n [ Z ¯ , Z ¯ ] 1 > L log ( d + r ) / n P [ Z ¯ ( ν ) , Z ¯ ( ν ) ] ^ 1 n [ Z ¯ ( ν ) , Z ¯ ( ν ) ] 1 > L log ( d + r ) / n + P ( Ω n ( ν ) c ) .
Therefore, the proof is completed once we show that
lim L lim sup n P [ Z ¯ ( ν ) , Z ¯ ( ν ) ] ^ 1 n [ Z ¯ ( ν ) , Z ¯ ( ν ) ] 1 > L log ( d + r ) / n = 0
for any fixed ν > 0 . We decompose the target quantity as
[ Z ¯ ( ν ) , Z ¯ ( ν ) ] ^ 1 n [ Z ¯ ( ν ) , Z ¯ ( ν ) ] 1 = ( [ M ¯ ( ν ) , M ¯ ( ν ) ] ^ 1 n [ M ¯ ( ν ) , M ¯ ( ν ) ] 1 ) + [ A ¯ ( ν ) , A ¯ ( ν ) ] ^ 1 n + [ A ¯ ( ν ) , M ¯ ( ν ) ] ^ 1 n + [ M ¯ ( ν ) , A ¯ ( ν ) ] ^ 1 n = : I n + II n + III n + IV n .
First we consider I n . Since we have | [ M ¯ ( ν ) i , M ¯ ( ν ) i ] t [ M ¯ ( ν ) i , M ¯ ( ν ) i ] s | C ν | t s | for all s , t [ 0 , 1 ] and i { 1 , , d + r } by [E1], by Lemma A14 there is a constant C > 0 such that
max 1 i , j d + r P n I n i j > x 2 e C x 2
for all n N and x [ 0 , n ] . Therefore, for every L [ 0 , n / log ( d + r ) ] we obtain
P I n > L log ( d + r ) n i , j = 1 d + r P n I n i j > L log ( d + r ) 2 ( d + r ) 2 C L 2 .
Hence, noting the assumption n / log ( d + r ) , we conclude that
lim L lim sup n P I n > L log ( d + r ) n = 0 .
Next, by [E1] we have II n C ν 2 / n . Therefore, we obtain II n = O ( n 1 ) = O ( log ( d + r ) / n ) . Third, we consider III n . By the Schwarz inequality we have
III n II n max 1 j d + r [ M ¯ ( ν ) j , M ¯ ( ν ) j ] ^ 1 n .
From the above result we have II n = O ( 1 / n ) . Meanwhile, using the inequality x | x y | + y holding for all x , y 0 , we have
max 1 j d + r [ M ¯ ( ν ) j , M ¯ ( ν ) j ] ^ 1 n I n + max 1 j d + r [ M ¯ ( ν ) j , M ¯ ( ν ) j ] 1 I n + C ν .
Hence the above result yields max 1 j d + r [ M ¯ ( ν ) j , M ¯ ( ν ) j ] ^ 1 n = O p ( 1 ) . Thus, we conclude that III n = O p ( 1 / n ) = O p ( log ( d + r ) / n ) . Finally, since IV n = III n , we complete the proof. □
Proof of Theorem 1.
In view of Propositions 3–5, it suffices to check [D1]. Noting that Σ ^ Y X , n β Σ ^ X , n = [ Z , X ] ^ 1 n and Σ ˘ Z , n = [ Z , Z ] ^ 1 n , [D1] immediately follows from Lemma A15. □

Appendix D.2. Proof of Theorem 2

Our proof relies on the following “high-dimensional” asymptotic mixed normality of the realized covariance matrix:
Lemma A16
([20], Theorem 4.2(b)). Assume [F1]. For every n, let X n be an m × d 2 random matrix and Υ n be an m × d 2 non-random matrix such that | | | Υ n | | | 1 , where m = m n possibly depends on n. Define Ξ n : = Υ n X n . Suppose that for all n , ν N , we have X n ( ν ) D 2 , ( R m × d 2 ) such that X n = X n ( ν ) on Ω n ( ν ) and
lim b 0 lim sup n P ( min diag ( Ξ n ( ν ) C n ( ν ) Ξ n ( ν ) ) < b ) = 0 ,
sup n N max 1 i m max 1 j d 2 X n ( ν ) i j + sup 0 t 1 D t X n ( ν ) i j , 2 + sup 0 s , t 1 D s , t X n ( ν ) i j , 2 < ,
where Ξ n ( ν ) : = Υ n X n ( ν ) . Suppose also | | | Υ n | | | 5 ( log d m ) 13 2 0 as n . Then we have
sup y R m P Ξ n vec [ Z , Z ] ^ 1 n [ Z , Z ] 1 y P ( Ξ n C n 1 / 2 ζ n y ) 0
as n .
To apply Lemma A16 to the present setting, we prove some auxiliary results.
Lemma A17.
Let A 1 , A 2 , B 1 , B 2 R d × d . Then ( A 1 A 2 ) ( B 1 B 2 ) = ( A 1 B 1 ) ( A 2 B 2 ) .
Proof. 
This follows from a straightforward computation. □
Lemma A18.
Assume [F1]. Then, for any n , ν N and t [ 0 , 1 ] , c ( ν ) t D 2 , ( R d × d ) , C n ( ν ) D 2 , ( R d 2 × d 2 ) and
sup n N max 1 i , j d sup 0 t , u 1 D s c ( ν ) t i j , 2 + sup 0 t , u , v 1 D u , v c ( ν ) t i j , 2 < , sup n N max 1 k , l d 2 sup 0 u 1 D u C n ( ν ) k l , 2 + sup 0 u , v 1 D u , v C n ( ν ) k l , 2 < .
Proof. 
This directly follows from Lemmas B.11–B.12 in [20]. □
Lemma A19.
Assume [F1]–[F2]. For any n , ν N , Θ Z ( ν ) D 2 , ( R d × d ) and
sup n N max 1 i , j d sup 0 t 1 D t Θ Z ( ν ) i j , 2 + sup 0 s , t 1 D s , t Θ Z ( ν ) i j , 2 < .
Proof. 
First, by Remark 15.87 in [48] and Lemma A18, Σ Z ( ν ) D 2 , ( R d × d ) and D k Σ Z ( ν ) = 0 1 D k c ( ν ) s d s for k = 1 , 2 . In particular, we have
sup n N max 1 i , j d sup 0 t 1 D t Σ Z ( ν ) i j , 2 + sup 0 s , t 1 D s , t Σ Z ( ν ) i j , 2 <
by Lemma A18 and Equation (16). Next, by Theorem 15.78 in [48] and Theorem 4 in [59] (Chapter 8), we have Θ Z ( ν ) D 2 , ( R d × d ) with D s ( a ) Θ Z ( ν ) = Θ Z ( ν ) D s ( a ) Σ Z ( ν ) Θ Z ( ν ) and
D s , t ( a , b ) Θ Z ( ν ) = Θ Z ( ν ) D t ( b ) Σ Z ( ν ) Θ Z ( ν ) D s ( a ) Σ Z ( ν ) Θ Z ( ν ) Θ Z ( ν ) D s , t ( a , b ) Σ Z ( ν ) Θ Z ( ν ) + Θ Z ( ν ) D s ( a ) Σ Z ( ν ) Θ Z ( ν ) D t ( b ) Σ Z ( ν ) Θ Z ( ν )
for all s , t [ 0 , 1 ] and a , b { 1 , , d } . Therefore, by Lemma A7 we have
D s Θ Z ( ν ) i j 2 | | | Θ Z ( ν ) | | | 2 max 1 k , l d D s Σ Z ( ν ) k l 2
for all i , j = 1 , , d . Then, noting that Q Z is non-random, we have ( 1 { Θ Z ( ν ) i j 0 } ) 1 i , j d = Q Z by assumption. Therefore, we obtain | | | Θ Z ( ν ) | | | | | | Q Z | | | Θ Z ( ν ) . Hence, Equation (A9), [F2] and Equation (17) yield sup n N max 1 i , j d sup 0 t 1 D t Θ Z ( ν ) i j , 2 < . In the meantime, by Lemma A7 we also have
D s , t Θ Z ( ν ) i j 2 2 b = 1 d k = 1 d Θ Z ( ν ) D t ( b ) Σ Z ( ν ) Θ Z ( ν ) i k 2 | | | Θ Z ( ν ) | | | max 1 k , l d D s Σ Z ( ν ) k l 2 + | | | Θ Z ( ν ) | | | 2 max 1 k , l d D s , t Σ Z ( ν ) k l 2 = 2 b = 1 d k = 1 d D t ( b ) Θ Z ( ν ) i k 2 | | | Θ Z ( ν ) | | | max 1 k , l d D s Σ Z ( ν ) k l 2 + | | | Θ Z ( ν ) | | | 2 max 1 k , l d D s , t Σ Z ( ν ) k l 2 .
Now, since Q Z is non-random, we have D t ( b ) Θ Z ( ν ) = Q Z D t ( b ) Θ Z ( ν ) . Therefore, the Schwarz inequality yields
D s , t Θ Z ( ν ) i j 2 2 | | | Q Z | | | k = 1 d b = 1 d D t ( b ) Θ Z ( ν ) i k 2 | | | Θ Z ( ν ) | | | max 1 k , l d D s Σ Z ( ν ) k l 2 + | | | Θ Z ( ν ) | | | 2 max 1 k , l d D s , t Σ Z ( ν ) k l 2 2 | | | Q Z | | | max 1 k , l d D t Θ Z ( ν ) k l 2 | | | Θ Z ( ν ) | | | max 1 k , l d D s Σ Z ( ν ) k l 2 + | | | Θ Z ( ν ) | | | 2 max 1 k , l d D s , t Σ Z ( ν ) k l 2 .
Consequently, we conclude sup n N max 1 i , j d sup 0 s , t 1 D s , t Θ Z ( ν ) i j , 2 < by [F2], Equation (17) and the results proved above. □
Lemma A20.
Assume [F1]–[F2]. For any n , ν N , Θ Z ( ν ) Θ Z ( ν ) , V n ( ν ) D 2 , ( R d 2 × d 2 ) and
sup n N max 1 i , j d 2 sup 0 t 1 D t { Θ Z ( ν ) Θ Z ( ν ) } i j , 2 + sup 0 s , t 1 D s , t { Θ Z ( ν ) Θ Z ( ν ) } i j , 2 < ,
sup n N max 1 i , j d 2 sup 0 t 1 D t V n ( ν ) i j , 2 + sup 0 s , t 1 D s , t V n ( ν ) i j , 2 < .
Proof. 
First, Corollary 15.80 in [48], Equation (17) and Lemma A19 imply that Θ Z ( ν ) Θ Z ( ν ) D 2 , ( R 2 d 2 × d 2 ) and Equation (A10) holds true. Next, Corollary 15.80 in [48] and Lemma A18 imply that V n ( ν ) D 2 , ( R d 2 × d 2 ) and
D s ( a ) V n ( ν ) = { D s ( a ) Θ Z 2 ( ν ) } C n ( ν ) Θ Z 2 ( ν ) + Θ Z 2 ( ν ) { D s ( a ) C n ( ν ) } Θ Z 2 ( ν ) + Θ Z 2 ( ν ) C n ( ν ) { D s ( a ) Θ Z 2 ( ν ) }
and
D s , t ( a , b ) V n ( ν ) = { D s , t ( a , b ) Θ Z 2 ( ν ) } C n ( ν ) Θ Z 2 ( ν ) + { D s ( a ) Θ Z 2 ( ν ) } { D t ( b ) C n ( ν ) } Θ Z 2 ( ν ) + { D s ( a ) Θ Z 2 ( ν ) } C n ( ν ) { D t ( b ) Θ Z 2 ( ν ) } + { D s ( b ) Θ Z 2 ( ν ) } { D s ( a ) C n ( ν ) } Θ Z 2 ( ν ) + Θ Z 2 ( ν ) { D s , t ( a , b ) C n ( ν ) } Θ Z 2 ( ν ) + Θ Z 2 ( ν ) { D s ( a ) C n ( ν ) } { D t ( b ) Θ Z 2 ( ν ) } + { D t ( b ) Θ Z 2 ( ν ) } C n ( ν ) { D s ( a ) Θ Z 2 ( ν ) } + Θ Z 2 ( ν ) { D t ( b ) C n ( ν ) } { D s ( a ) Θ Z 2 ( ν ) } + Θ Z 2 ( ν ) C n ( ν ) { D s , t ( a , b ) Θ Z 2 ( ν ) }
for any s , t [ 0 , 1 ] , where Θ Z 2 ( ν ) : = Θ Z ( ν ) Θ Z ( ν ) . Thus, by Lemma A7 we obtain
max 1 i , j d 2 D s V n ( ν ) i j 2 2 max 1 i d 2 a = 1 d k = 1 d 2 D s ( a ) Θ Z 2 ( ν ) i k 2 C n ( ν ) | | | Θ Z 2 ( ν ) | | | + | | | Θ Z 2 ( ν ) | | | 2 max 1 i , j d 2 D s C n ( ν ) i j 2
and
max 1 i , j d 2 D s , t V n ( ν ) i j 2 2 max 1 i d 2 sup 0 s , t 1 a , b = 1 d k = 1 d 2 D s , t ( a , b ) Θ Z 2 ( ν ) i k 2 C n ( ν ) | | | Θ Z 2 ( ν ) | | | + 4 max 1 i , j , l d 2 sup 0 s , t 1 a = 1 d k = 1 d 2 D s ( a ) Θ Z 2 ( ν ) i k 2 D t C n ( ν ) j l 2 | | | Θ Z 2 ( ν ) | | | + 2 max 1 i d 2 sup 0 s 1 C n ( ν ) a = 1 d k = 1 d 2 D s ( a ) Θ Z 2 ( ν ) i k 2 + max 1 i , j d 2 sup 0 s , t 1 | | | Θ Z 2 ( ν ) | | | 2 D s , t C n ( ν ) i j 2 .
Now, as pointed out in the proof of Lemma A19, we have Θ Z ( ν ) = Q Z Θ Z ( ν ) . Therefore, Lemma A17 yields Θ Z 2 ( ν ) = ( Q Z Q Z ) Θ Z 2 ( ν ) . Since Q Z is non-random by [F2], we have D s ( a ) Θ Z 2 ( ν ) = ( Q Z Q Z ) D s ( a ) Θ Z 2 ( ν ) . Thus, using the Schwarz inequality repeatedly, we obtain
max 1 i , j d 2 D s V n ( ν ) i j 2 2 max 1 i j d 2 | | | Q Z | | | 2 D s Θ Z 2 ( ν ) i j 2 C n ( ν ) | | | Θ Z 2 ( ν ) | | | + | | | Θ Z 2 ( ν ) | | | 2 max 1 i , j d 2 D s C n ( ν ) i j 2
and
max 1 i , j d 2 D s , t V n ( ν ) i j 2 2 max 1 i , j d 2 sup 0 s , t 1 | | | Q Z | | | 2 D s , t Θ Z 2 ( ν ) i j 2 C n ( ν ) | | | Θ Z 2 ( ν ) | | | + 4 max 1 i , j , k , l d 2 sup 0 s , t 1 | | | Q Z | | | 2 D s Θ Z 2 ( ν ) i k 2 D t C n ( ν ) j l 2 | | | Θ Z 2 ( ν ) | | | + 2 max 1 i , j d 2 sup 0 s 1 C n ( ν ) | | | Q Z | | | 2 D s Θ Z 2 ( ν ) i j 2 + max 1 i , j d 2 sup 0 s , t 1 | | | Θ Z 2 ( ν ) | | | 2 D s , t C n ( ν ) i j 2 .
Hence we complete the proof by Lemma A18, Equation (A10) and assumption. □
Proof of Theorem 2.
Set U n : = n vec ( Θ ^ Z , λ n Γ Z , n Θ Z ) . Define the 2 d 2 × d 2 matrices J n , 1 and J n , 2 by
J n , 1 = E d 2 E d 2 , J n , 2 = S n 1 S n 1 .
Then we have
sup A A re ( d 2 ) P U n A P V n 1 / 2 ζ n A = sup y R 2 d 2 P J n , 1 U n y P J n , 1 V n 1 / 2 ζ n y
and
sup A A re ( d 2 ) P S n 1 U n A P S n 1 V n 1 / 2 ζ n A = sup y R 2 d 2 P J n , 2 U n y P J n , 2 V n 1 / 2 ζ n y .
Therefore, in view of Proposition 6, it suffices to check [D1] and Equations (12)–(13) for J n { J n , 1 , J n , 2 } . We have already checked [D1] in the proof of Theorem 1. Meanwhile, Equation (13) immediately follows from [E1] and Equation (17). To check Equation (12), we apply Lemma A16 with Ξ n = J n ( Θ Z Θ Z ) (note that Σ ˘ Z , n = [ Z , Z ] ^ 1 n ). Set
Υ n = Q Z Q Z Q Z Q Z .
Then we have Ξ n = Υ n Ξ n by Lemma A17. Since Υ n is non-random by [F2], we can apply Lemma A16 with X n = Ξ n once we show that for every ν N , there is an X n ( ν ) D 2 , ( R m × d 2 ) such that X n = X n ( ν ) on Ω n ( ν ) and Equations (A6)–(A7) hold true. Now we separately consider the two cases.
Case 1: J n = J n , 1 . In this case, we set X n ( ν ) : = J n , 1 ( Θ Z ( ν ) Θ Z ( ν ) ) . By [E1] we have X n = X n ( ν ) on Ω n ( ν ) , while Equations (A6)–(A7) follow from Equation (17) and Lemma A20, respectively.
Case 2: J n = J n , 2 . In this case, we set X n ( ν ) : = J n , 2 ( ν ) ( Θ Z ( ν ) Θ Z ( ν ) ) , where
J n , 2 ( ν ) = S n ( ν ) 1 S n ( ν ) 1 .
By [E1] we have X n = X n ( ν ) on Ω n ( ν ) , while Equation (A6) is evident because Ξ n ( ν ) C n ( ν ) Ξ n ( ν ) is the identity matrix in this case. Therefore, it remains to prove Equation (A7). Noting that S n ( ν ) is a diagonal matrix, Equation (A7) follows from Corollary 15.80 in [48] and Lemma A20 once we show that S n ( ν ) k k D 2 , for every k = 1 , , d 2 and
sup n N max 1 k d 2 S n ( ν ) k k + sup 0 t 1 D t S n ( ν ) k k , 2 + sup 0 s , t 1 D s , t S n ( ν ) k k , 2 < .
Since we can write S n ( ν ) k k = ( V n ( ν ) k k ) 5 / 2 ( V n ( ν ) k k ) 3 , we obtain the desired result by combining Theorem 15.78 and Lemma 15.152 in [48] with Lemma A20. □

Appendix D.3. Proof of Lemma 3

We use the following notation: For a d-dimensional process U = ( U t ) t [ 0 , 1 ] , we set Δ h n U : = U h / n U ( h 1 ) / n , h = 1 , , n . Also, we set χ h : = vec [ Δ h n Z ( Δ h n Z ) ] for h = 1 , , n and
C ˜ n : = n h = 1 n χ h χ h n 2 h = 1 n 1 χ h χ h + 1 + χ h + 1 χ h .
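For concreteness, the following is a hedged sketch of how C̃_n as displayed above can be computed from a hypothetical n × d matrix dZ whose h-th row contains the h-th increment of the residual process Z.
```r
## Hedged sketch of the estimator C_tilde_n defined above.
avar_estimator <- function(dZ) {
  n <- nrow(dZ)
  chi <- t(apply(dZ, 1, function(z) as.vector(tcrossprod(z))))        # rows chi_h = vec(dZ_h dZ_h')
  term1 <- n * crossprod(chi)                                         # n * sum_h chi_h chi_h'
  lag <- crossprod(chi[-n, , drop = FALSE], chi[-1, , drop = FALSE])  # sum_h chi_h chi_{h+1}'
  term1 - (n / 2) * (lag + t(lag))                                    # subtract lag-one correction
}
```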
Lemma A21.
Assume [E1]. Then h = 1 n ( Δ h n Z 4 + Δ h n X 4 ) = O p ( log 2 ( d + r ) / n ) as n .
Proof. 
We use the same notation as in the proof of Lemma A15. Then, we need to prove h = 1 n Δ h n Z ¯ 4 = O p ( log 2 ( d + r ) / n ) as n . For every ν N and L > 0 , we have
P h = 1 n Δ h n Z ¯ 4 > L P h = 1 n Δ h n Z ¯ ( ν ) 4 > L + P ( Ω n ( ν ) c ) .
Hence it suffices to prove h = 1 n Δ h n Z ¯ ( ν ) 4 = O p ( log 2 ( d + r ) / n ) as n for any fixed ν N . By Lemma A23 there is a universal constant c > 0 such that Δ h n M ¯ ( ν ) j p c p Δ h n [ M ¯ ( ν ) j , M ¯ ( ν ) j ] p for all p 2 . Thus, by [E1] we obtain Δ h n Z ¯ ( ν ) j p C ν / n + c C ν p / n . Therefore, by [63] (Proposition 2.5.2), there is a constant C > 0 such that max j , h Δ h n Z ¯ ( ν ) j ψ 2 C / n for all n, where ξ ψ 2 : = inf { Λ > 0 : E [ exp ( | ξ | / Λ ) ] 2 } for a random variable ξ . Thus, [64] (Lemma 2.2.2) implies that there is a constant C > 0 such that max h Δ h n Z ¯ ( ν ) ψ 2 C log ( d + r ) / n for all n. Thus, we obtain
E h = 1 n Δ h n Z ¯ ( ν ) 4 4 ! 4 C log 2 ( d + r ) n ,
so the desired result follows from the Markov inequality. □
Lemma A22.
Assume [C1]–[C4] and [E1]. Then h = 1 n χ ^ h χ h 2 = O p ( r 2 ( log d ) 3 / n 2 ) as n .
Proof. 
Since Z ^ h / n = Z h / n ( β ^ n β ) X h / n , we have
χ ^ h χ h = vec [ ( β ^ n β ) Δ h n X ( Δ h n Z ) ] vec [ Δ h n Z ( ( β ^ n β ) Δ h n X ) ] + vec [ ( β ^ n β ) Δ h n X ( ( β ^ n β ) Δ h n X ) ] .
Now, since vec ( x y ) x y for any x , y R d , it holds that
χ ^ h χ h 2 ( β ^ n β ) Δ h n X Δ h n Z + ( β ^ n β ) Δ h n X 2 2 | | | β ^ n β | | | Δ h n X Δ h n Z + | | | β ^ n β | | | 2 Δ h n X 2 .
Therefore, we obtain
h = 1 n χ ^ h χ h 2 2 | | | β ^ n β | | | 2 h = 1 n ( Δ h n X 4 + Δ h n Z 4 ) + 2 | | | β ^ n β | | | 4 h = 1 n Δ h n X 4 .
Now, noting Lemma A15, we infer that | | | β ^ n β | | | = O p ( r ( log d ) / n ) from the proof of Lemma A12(e). Thus, we complete the proof by Lemma A21. □
Proof of Lemma 3.
Since C ˜ n C n = O p ( ( log d ) 2 / n + n γ ) by Proposition 4.1 in [20], it suffices to prove C ^ n C ˜ n = O p ( r ( log d ) 5 / 2 / n ) . Since vec ( x y ) x y for any x , y R d , Lemma A21 yields h = 1 n χ h 2 = O p ( ( log d ) 2 / n ) . Combining this with Lemma A22 and r 2 ( log d ) / n = O ( 1 ) , we also obtain h = 1 n χ ^ h 2 = O p ( ( log d ) 2 / n ) . Now the desired result follows from the Schwarz inequality and Lemma A22. □

Appendix D.4. Proof of Corollary 1

(a) Since | | | Θ Z | | | = O p ( 1 ) by Equation (17) and [F2], we have C n + V n = O p ( 1 ) by [E1] and λ n 1 | | | Θ ^ Z , λ n Θ ^ Z , λ n Θ Z Θ Z | | | = O p ( s n ) by Theorem 1. Combining this with Lemma 3 and assumption, we obtain V ^ n = O p ( 1 ) and ( log d ) V ^ n V n p 0 . Noting Equation (17) and the fact that S ^ n is a diagonal matrix, we also obtain | | | S ^ n 1 | | | = O p ( 1 ) and ( log d ) | | | S ^ n 1 S n 1 | | | p 0 . Since Equation (18) yields n vec ( Θ ^ Z , λ n Γ Z , n Θ Z ) = O p ( log d ) , we obtain log d n ( S ^ n 1 S n ) vec ( Θ ^ Z , λ n Γ Z , n Θ Z ) p 0 . Now the desired result follows from Theorem 2 and [20] Lemma 3.1.
(b) The same argument as above implies that ( log d ) 2 V ^ n V n p 0 and ( log d ) 2 | | | S ^ n 1 S n 1 | | | p 0 . Thus, the desired result follows from [20] Proposition 3.1. □

Appendix D.5. Proof of Corollary 2

First, we have by Corollary 1(a)
lim sup n P max 1 i < j d n | Θ ^ Z , λ n i j Γ Z , n i j Θ Z i j | s ^ n i j > c n = lim sup n P max 1 i < j d | ( S n 1 V n 1 / 2 ζ n ) ( i 1 ) d + j | > c n lim sup n d ( d 1 ) 2 · 2 ( 1 Φ ( c n ) ) = lim sup n α n = α .
Hence we have
lim sup n P ( S ^ ( Θ Z ) S ( Θ Z ) ) lim sup n P max ( i , j ) : Θ Z i j = 0 n | Θ ^ Z , λ n i j Γ Z , n i j | s ^ n i j > c n lim sup n P max 1 i < j d n | Θ ^ Z , λ n i j Γ Z , n i j Θ Z i j | s ^ n i j > c n α
and
lim sup n P ( S ^ ( Θ Z ) S ( Θ Z ) ) lim sup n P min ( i , j ) S ( Θ Z ) n | Θ ^ Z , λ n i j Γ Z , n i j | s ^ n i j c n lim sup n P min ( i , j ) S ( Θ Z ) n | Θ Z i j | s ^ n i j max 1 i < j d n | Θ ^ Z , λ n i j Γ Z , n i j Θ Z i j | s ^ n i j + c n lim sup n P min ( i , j ) S ( Θ Z ) n | Θ Z i j | s ^ n i j 2 c n + lim sup n P max 1 i < j d n | Θ ^ Z , λ n i j Γ Z , n i j Θ Z i j | s ^ n i j > c n lim sup n P min ( i , j ) S ( Θ Z ) | Θ Z i j | 2 c n n max i , j s ^ n i j + α .
Now, we have max i , j s ^ n i j = O p ( 1 ) from the proof of Corollary 1(a). Moreover, since α n d ( d 1 ) / 2 = 1 Φ ( c n ) e c n 2 / 2 by Chernoff’s inequality, we have c n 2 log α n d ( d 1 ) / 2 . Hence c n = O ( log d ) as n by assumption. Consequently, we obtain
lim sup n P min ( i , j ) S ( Θ Z ) | Θ Z i j | 2 c n n max i , j s ^ n i j = 0
by assumption. This completes the proof.
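For reference, a hedged sketch of the resulting support estimator: the threshold c_n solves d(d − 1)/2 · 2(1 − Φ(c_n)) = α_n as in the display above, and an off-diagonal entry is retained when its studentized de-biased estimate exceeds c_n. The inputs Theta_deb (de-biased estimate) and s_hat (estimated standard deviations) are hypothetical.
```r
## Hedged sketch of the support estimator from Corollary 2.
support_estimate <- function(Theta_deb, s_hat, n, alpha_n = 0.05) {
  d <- nrow(Theta_deb)
  c_n <- qnorm(1 - alpha_n / (d * (d - 1)))    # from d(d-1)/2 * 2(1 - Phi(c_n)) = alpha_n
  S <- sqrt(n) * abs(Theta_deb) / s_hat > c_n  # entry-wise tests
  diag(S) <- FALSE                             # only off-diagonal entries are tested here
  S
}
```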

Appendix E. Proof of Lemma A14

In this appendix we prove Lemma A14 with the help of two general martingale inequalities. The first one is the Burkholder-Davis-Gundy inequality with a sharp constant:
Lemma A23
(Barlow and Yor [65], Proposition 4.2). There is a universal constant c > 0 such that
sup 0 t T | M t | p c p [ M , M ] T 1 / 2 p
for any p [ 2 , ) and any continuous martingale M = ( M t ) t [ 0 , T ] with M 0 = 0 .
The second one is a Bernstein-type inequality for martingales:
Lemma A24.
Let ( ξ i ) i = 1 n be a martingale difference sequence with respect to the filtration ( G i ) i = 0 n . Suppose that there are constants a , b > 0 such that i = 1 n E [ | ξ i | k G i 1 ] k ! a k 2 b 2 / 2 a.s. for any integer k 2 . Then, for any x 0 ,
P max 1 m n i = 1 m ξ i x 2 exp x 2 b 2 + b b 2 + 2 a x .
Proof. 
This is a special case of Pinelis [66] (Theorem 3.3). In fact, since R is a Hilbert space, we can apply this result with X = R and D = 1 in the notation of that paper. □
Proof of Lemma A14.
For every h = 1 , , n , set
ξ n , h : = n t h 1 t h ( M t M t h 1 ) d N t + t h 1 t h ( N t N t h 1 ) d M t .
Itô’s formula yields
n [ M , N ] ^ 1 n [ M , N ] 1 = h = 1 n ξ n , h .
Also, by assumption ( ξ n , h ) h = 1 n is a martingale difference with respect to ( F t h ) h = 0 n . Moreover, for any integer k 2 , we have
E [ | ξ n , h | k F t h 1 ] 2 k 1 n k / 2 E t h 1 t h ( M t M t h 1 ) d N t k + t h 1 t h ( N t N t h 1 ) d M t k F t h 1 2 k 1 n k / 2 c k k k / 2 E t h 1 t h ( M t M t h 1 ) 2 d [ N , N ] t k / 2 + t h 1 t h ( N t N t h 1 ) 2 d [ M , M ] t k / 2 F t h 1 ( Lemma A 23 ) 2 k 1 c k k k / 2 L k / 2 E sup t h 1 < t t h | M t M t h 1 | k + sup t h 1 < t t h | N t N t h 1 | k F t h 1 ( ( A 5 ) ) 2 k 1 c 2 k k k L k / 2 E ( [ M , M ] t h [ M , M ] t h 1 ) k / 2 + ( [ N , N ] t h [ N , N ] t h 1 ) k / 2 F t h 1 ( Lemma A 23 ) 2 k c 2 k k k L k n k / 2 ( ( A 5 ) ) ,
where c > 0 is the universal constant appearing in Lemma A23. Thus, using Stirling’s formula, we obtain
h = 1 n E [ | ξ n , h | k F t h 1 ] 2 k c 2 k e k 2 π k k ! L k n k / 2 1 k ! 2 a 0 n k 2 b 0 2 ,
where a 0 : = 2 e c 2 L and b 0 : = 2 2 c 2 L e / ( 2 π ) 1 / 4 . Hence, Lemma A24 yields
P h = 1 n ξ n , h x 2 exp x 2 b 0 2 + b 0 b 0 2 + 2 ( a 0 / n ) x
for every x 0 . Consequently, when x [ 0 , θ n ] for some θ > 0 , we have
P h = 1 n ξ n , h x 2 exp C L , θ x 2
with C L , θ : = ( b 0 2 + b 0 b 0 2 + 2 a 0 θ ) 1 . This completes the proof. □

References

  1. Wang, Y.; Zou, J. Vast volatility matrix estimation for high-frequency financial data. Ann. Statist. 2010, 38, 943–978.
  2. Bickel, P.J.; Levina, E. Covariance regularization by thresholding. Ann. Statist. 2008, 36, 2577–2604.
  3. Bickel, P.J.; Levina, E. Regularized estimation of large covariance matrices. Ann. Statist. 2008, 36, 199–227.
  4. Tao, M.; Wang, Y.; Zhou, H. Optimal sparse volatility matrix estimation for high-dimensional Itô processes with measurement errors. Ann. Statist. 2013, 41, 1816–1864.
  5. Tao, M.; Wang, Y.; Chen, X. Fast convergence rates in estimating large volatility matrices using high-frequency financial data. Econom. Theory 2013, 29, 838–856.
  6. Kim, D.; Wang, Y.; Zou, J. Asymptotic theory for large volatility matrix estimation based on high-frequency financial data. Stoch. Process. Appl. 2016, 126, 3527–3577.
  7. Kim, D.; Kong, X.B.; Li, C.X.; Wang, Y. Adaptive thresholding for large volatility matrix estimation based on high-frequency financial data. J. Econom. 2018, 203, 69–79.
  8. Fan, J.; Furger, A.; Xiu, D. Incorporating global industrial classification standard into portfolio allocation: A simple factor-based large covariance matrix estimator with high-frequency data. J. Bus. Econom. Statist. 2016, 34, 489–503.
  9. Aït-Sahalia, Y.; Xiu, D. Using principal component analysis to estimate a high dimensional factor model with high-frequency data. J. Econom. 2017, 201, 384–399.
  10. Fan, J.; Kim, D. Robust high-dimensional volatility matrix estimation for high-frequency factor model. J. Am. Statist. Assoc. 2018, 113, 1268–1283.
  11. Dai, C.; Lu, K.; Xiu, D. Knowing factors or factor loadings, or neither? Evaluating estimators of large covariance matrices with noisy and asynchronous data. J. Econom. 2019, 208, 43–79.
  12. Hautsch, N.; Kyj, L.M.; Oomen, R.C. A blocking and regularization approach to high-dimensional realized covariance estimation. J. Appl. Econom. 2012, 27, 625–645.
  13. Morimoto, T.; Nagata, S. Robust estimation of a high-dimensional integrated covariance matrix. Commun. Statist. Simul. Comput. 2017, 46, 1102–1112.
  14. Lam, C.; Feng, P.; Hu, C. Nonlinear shrinkage estimation of large integrated covariance matrices. Biometrika 2017, 104, 481–488.
  15. Ledoit, O.; Wolf, M. Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann. Statist. 2012, 40, 1024–1060.
  16. Zheng, X.; Li, Y. On the estimation of integrated covariance matrices of high dimensional diffusion processes. Ann. Statist. 2011, 39, 3121–3151.
  17. Brownlees, C.; Nualart, E.; Sun, Y. Realized networks. J. Appl. Econom. 2018, 33, 986–1006.
  18. Kong, X.B.; Liu, C. Testing against constant factor loading matrix with large panel high-frequency data. J. Econom. 2018, 204, 301–319.
  19. Pelger, M. Large-dimensional factor modeling based on high-frequency observations. J. Econom. 2019, 208, 23–42.
  20. Koike, Y. Mixed-normal limit theorems for multiple Skorohod integrals in high-dimensions, with application to realized covariance. Electron. J. Stat. 2019, 13, 1443–1522.
  21. Cochrane, J.H. Asset Pricing, revised ed.; Princeton University Press: Princeton, NJ, USA, 2005.
  22. Bühlmann, P.; van de Geer, S. Statistics for High-Dimensional Data; Springer: Berlin/Heidelberg, Germany, 2011.
  23. Acemoglu, D.; Carvalho, V.M.; Ozdaglar, A.; Tahbaz-Salehi, A. The network origins of aggregate fluctuations. Econometrica 2012, 80, 1977–2016.
  24. Janková, J.; van de Geer, S. Inference in high-dimensional graphical models. In Handbook of Graphical Models; CRC Press: Boca Raton, FL, USA, 2018; Chapter 14; pp. 325–351.
  25. Janková, J.; van de Geer, S. Confidence intervals for high-dimensional inverse covariance estimation. Electron. J. Stat. 2015, 9, 1205–1229.
  26. Podolskij, M.; Vetter, M. Understanding limit theorems for semimartingales: A short survey. Stat. Neerl. 2010, 64, 329–351.
  27. De Gregorio, A.; Iacus, S.M. Adaptive LASSO-type estimation for multivariate diffusion processes. Econom. Theory 2012, 28, 838–860.
  28. Masuda, H.; Shimizu, Y. Moment convergence in regularized estimation under multiple and mixed-rates asymptotics. Math. Methods Statist. 2017, 26, 81–110.
  29. Kinoshita, Y.; Yoshida, N. Penalized quasi likelihood estimation for variable selection. arXiv 2019, arXiv:1910.12871.
  30. Suzuki, T.; Yoshida, N. Penalized least squares approximation methods and their applications to stochastic processes. Jpn. J. Stat. Data Sci. 2020, forthcoming.
  31. Fujimori, K. The Dantzig selector for a linear model of diffusion processes. Stat. Inference Stoch. Process. 2019, 22, 475–498.
  32. Gaïffas, S.; Matulewicz, G. Sparse inference of the drift of a high-dimensional Ornstein–Uhlenbeck process. J. Multivar. Anal. 2019, 169, 1–20.
  33. Koike, Y.; Yoshida, N. Covariance estimation and quasi-likelihood analysis. In Financial Mathematics, Volatility and Covariance Modelling; Chevallier, J., Goutte, S., Guerreiro, D., Saglio, S., Sanhaji, B., Eds.; Routledge: London, UK, 2019; Volume 2, Chapter 12; pp. 308–335.
  34. Duchi, J.; Gould, S.; Koller, D. Projected subgradient methods for learning sparse Gaussians. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, Helsinki, Finland, 9–12 July 2008; AUAI Press: Arlington, VA, USA, 2008; pp. 153–160.
  35. Rothman, A.J.; Bickel, P.J.; Levina, E.; Zhu, J. Sparse permutation invariant covariance estimation. Electron. J. Stat. 2008, 2, 494–515.
  36. Fan, J.; Li, Y.; Yu, K. Vast volatility matrix estimation using high-frequency data for portfolio selection. J. Am. Statist. Assoc. 2012, 107, 412–428.
  37. Kim, D.; Wang, Y. Sparse PCA-based on high-dimensional Itô processes with measurement errors. J. Multivar. Anal. 2016, 152, 172–189.
  38. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 2001, 96, 1348–1360.
  39. Ren, Z.; Sun, T.; Zhang, C.H.; Zhou, H.H. Asymptotic normality and optimalities in estimation of large Gaussian graphical models. Ann. Statist. 2015, 43, 991–1026.
  40. Chang, J.; Qiu, Y.; Yao, Q.; Zou, T. Confidence regions for entries of a large precision matrix. J. Econom. 2018, 206, 57–82.
  41. Campbell, J.Y.; Lo, A.W.; MacKinlay, A.C. The Econometrics of Financial Markets; Princeton University Press: Princeton, NJ, USA, 1997.
  42. Barigozzi, M.; Brownlees, C.; Lugosi, G. Power-law partial correlation network models. Electron. J. Stat. 2018, 12, 2905–2929.
  43. Reiß, M.; Todorov, V.; Tauchen, G. Nonparametric test for a constant beta between Itô semi-martingales based on high-frequency data. Stoch. Process. Appl. 2015, 125, 2955–2988.
  44. Fan, J.; Liao, Y.; Mincheva, M. High-dimensional covariance matrix estimation in approximate factor models. Ann. Statist. 2011, 39, 3320–3356.
  45. Cai, T.T.; Liu, W.; Zhou, H.H. Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation. Ann. Statist. 2016, 44, 455–488.
  46. Cai, T.T.; Ren, Z.; Zhou, H.H. Estimating structured high-dimensional covariance and precision matrices: Optimal rates and adaptive estimation. Electron. J. Stat. 2016, 10, 1–59.
  47. Nualart, D. The Malliavin Calculus and Related Topics, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2006.
  48. Janson, S. Gaussian Hilbert Space; Cambridge University Press: Cambridge, UK, 1997.
  49. Clément, E.; Gloter, A. Limit theorems in the Fourier transform method for the estimation of multivariate volatility. Stoch. Process. Appl. 2011, 121, 1097–1124.
  50. Christensen, K.; Podolskij, M.; Thamrongrat, N.; Veliyev, B. Inference from high-frequency data: A subsampling approach. J. Econom. 2017, 197, 245–272.
  51. Belloni, A.; Chernozhukov, V.; Chetverikov, D.; Hansen, C.; Kato, K. High-dimensional econometrics and regularized GMM. arXiv 2018, arXiv:1806.01888.
  52. Lam, C.; Fan, J. Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 2009, 37, 4254–4278.
  53. Sustik, M.A.; Calderhead, B. GLASSOFAST: An efficient GLASSO Implementation; UTCS Technical Report TR-12-29; The University of Texas at Austin: Austin, TX, USA, 2012.
  54. Friedman, J.; Hastie, T.; Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 2008, 9, 432–441.
  55. Yuan, M.; Lin, Y. Model selection and estimation in the Gaussian graphical model. Biometrika 2007, 94, 19–35.
  56. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 2010, 33, 1.
  57. Witten, D.M.; Friedman, J.H.; Simon, N. New insights and faster computations for the graphical lasso. J. Comput. Graph. Statist. 2011, 20, 892–900.
  58. Hoyle, D.C. Accuracy of pseudo-inverse covariance learning—A random matrix theory analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 1470–1481.
  59. Magnus, J.R.; Neudecker, H. Matrix Differential Calculus with Applications in Statistics and Econometrics; Wiley: New York, NY, USA, 1988.
  60. Horn, R.A.; Johnson, C.R. Matrix Analysis, 2nd ed.; Cambridge University Press: Cambridge, UK, 2013.
  61. Ogihara, T. Parametric inference for nonsynchronously observed diffusion processes in the presence of market microstructure noise. Bernoulli 2018, 24, 3318–3383.
  62. Lange, K. Optimization, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2013.
  63. Vershynin, R. High-Dimensional Probability; Cambridge University Press: Cambridge, UK, 2018.
  64. Van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes; Springer: Berlin/Heidelberg, Germany, 1996.
  65. Barlow, M.T.; Yor, M. Semi-martingale inequalities via the Garsia–Rodemich–Rumsey lemma and application to local times. J. Funct. Anal. 1982, 49, 198–229.
  66. Pinelis, I. Optimum bounds for the distributions of martingales in Banach spaces. Ann. Probab. 1994, 22, 1679–1706.
Figure 1. Partial correlation network of the S&P 500 component stocks on 1 March 2018.
Figure 2. Log-log size-rank plot for the eigenvalues of the estimated residual precision matrix of the S&P 500 component stocks on 1 March 2018.
Table 1. Estimation accuracy of different methods in Design 1.

  Metric                        n     RC        glasso   wglasso  f-glasso  f-wglasso  f-thr
  |||Σ̂_Y^{-1} − Σ_Y^{-1}|||     78    22.431    18.857   19.083   15.122    15.130     416.197
                                130   26.307    17.931   17.954   14.353    14.353     93.242
                                195   45.795    17.447   17.471   13.923    13.928     50.605
                                390   722.381   16.687   16.678   11.306    10.806     25.335
                                780   423.434   15.965   15.908   9.387     8.851      15.227
  |||Σ̂_Y^{-1} − Σ_Y^{-1}|||_2   78    6.576     4.270    4.263    3.419     3.420      138.442
                                130   6.508     3.654    3.468    3.193     3.193      28.384
                                195   6.480     3.381    3.271    3.094     3.097      14.307
                                390   203.038   3.009    3.015    2.133     2.100      6.446
                                780   93.354    2.788    2.855    1.782     1.693      3.562
  ‖Σ̂_Y − Σ_Y‖                   78    0.361     0.432    0.441    0.351     0.351      0.347
                                130   0.279     0.311    0.296    0.281     0.281      0.268
                                195   0.227     0.255    0.250    0.241     0.241      0.219
                                390   0.160     0.181    0.189    0.166     0.169      0.154
                                780   0.112     0.130    0.143    0.118     0.119      0.108
Note. RC: realized covariance matrix; glasso: graphical Lasso; wglasso: weighted graphical Lasso; f-glasso: graphical Lasso with taking the factor structure into account; f-wglasso: weighted graphical Lasso with taking the factor structure into account; f-thr: location-based thresholding with taking the factor structure into account (the method of [8]). The results are based on 10,000 Monte Carlo iterations.
Table 2. Estimation accuracy of different methods in Design 2.

  Metric                        n     RC        glasso   wglasso  f-glasso  f-wglasso
  |||Σ̂_Y^{-1} − Σ_Y^{-1}|||     78    47.934    43.144   43.055   35.347    35.263
                                130   48.266    43.166   41.750   34.767    34.284
                                195   50.049    42.806   40.571   34.154    32.835
                                390   338.847   41.060   37.801   33.100    29.934
                                780   401.447   38.886   34.961   32.163    23.121
  |||Σ̂_Y^{-1} − Σ_Y^{-1}|||_2   78    17.805    13.557   13.522   7.857     7.843
                                130   17.798    13.543   12.628   7.954     7.866
                                195   17.752    13.319   11.630   8.006     7.742
                                390   87.239    12.296   9.888    8.059     7.416
                                780   55.619    11.189   8.522    8.065     6.072
  ‖Σ̂_Y − Σ_Y‖                   78    0.669     0.723    0.721    0.632     0.631
                                130   0.509     0.678    0.572    0.489     0.481
                                195   0.412     0.567    0.470    0.403     0.390
                                390   0.289     0.298    0.339    0.282     0.273
                                780   0.203     0.198    0.252    0.197     0.192
Note. RC: realized covariance matrix; glasso: graphical Lasso; wglasso: weighted graphical Lasso; f-glasso: graphical Lasso with taking the factor structure into account; f-wglasso: weighted graphical Lasso with taking the factor structure into account. The results are based on 10,000 Monte Carlo iterations.
Table 3. Average coverages of entry-wise confidence intervals.

                           Design 1            Design 2
  Entries           n      95%      99%        95%      99%
  Θ_Z^{ij} = 0      78     95.21    99.04      95.22    99.04
                    130    95.13    99.03      95.13    99.03
                    195    95.09    99.02      95.09    99.02
                    390    95.04    99.01      95.05    99.01
                    780    95.02    99.00      95.02    99.01
  Θ_Z^{ij} ≠ 0      78     99.33    99.87      95.16    99.03
                    130    99.82    99.96      95.90    99.18
                    195    99.97    99.99      96.36    99.26
                    390    96.00    99.20      96.65    99.33
                    780    96.09    99.22      96.41    99.27
This table reports the average coverages of entry-wise confidence intervals for the residual precision matrix Θ_Z over the sets { (i, j) : i ≠ j, Θ_Z^{ij} = 0 } and { (i, j) : i ≠ j, Θ_Z^{ij} ≠ 0 }, respectively. The confidence intervals are constructed based on the normal approximation Equation (21). The results are based on 10,000 Monte Carlo iterations.
