Article

De-Biased Graphical Lasso for High-Frequency Data

Mathematics and Informatics Center and Graduate School of Mathematical Sciences, The University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8914, Japan
Entropy 2020, 22(4), 456; https://doi.org/10.3390/e22040456
Submission received: 6 February 2020 / Revised: 10 April 2020 / Accepted: 14 April 2020 / Published: 17 April 2020

Abstract

This paper develops a new statistical inference theory for the precision matrix of high-frequency data in a high-dimensional setting. The focus is not only on point estimation but also on interval estimation and hypothesis testing for entries of the precision matrix. To accomplish this purpose, we establish an abstract asymptotic theory for the weighted graphical Lasso and its de-biased version without specifying the form of the initial covariance estimator. We also extend the scope of the theory to the case that a known factor structure is present in the data. The developed theory is applied to the concrete situation where we can use the realized covariance matrix as the initial covariance estimator, and we obtain a feasible asymptotic distribution theory to construct (simultaneous) confidence intervals and (multiple) testing procedures for entries of the precision matrix.

1. Introduction

In high-frequency financial econometrics, covariance matrix estimation of asset returns has been extensively studied in the past two decades. High-frequency financial data are commonly modeled as a discretely observed semimartingale for which the quadratic covariation matrix plays the role of the covariance matrix, so their treatment often differs from that in a standard i.i.d. setting. In recent years, motivated by applications to portfolio allocation and risk management in a large-scale asset universe, the high-dimensionality problem has attracted much attention in this area. Since the 2000s, great progress has been made in high-dimensional covariance estimation from i.i.d. data, so researchers are naturally led to apply the techniques developed therein to the context of high-frequency data. For example, Wang and Zou [1] have applied the entry-wise shrinkage methods considered in [2,3] to estimating the covariance matrix of high-frequency data which are asynchronously observed with noise. See also [4,5,6,7] for further developments in this approach.

In the meantime, it is well-recognized that the factor structure is an important ingredient, both theoretically and empirically, for financial data. In the context of high-dimensional covariance estimation from high-frequency data, this perspective was first taken into account by Fan et al. [8] and subsequently built up by, among others, [9,10,11]. Other common methods used in i.i.d. settings have also been investigated in the literature of high-frequency financial econometrics. Hautsch et al. [12] and Morimoto and Nagata [13] formally apply eigenvalue regularization methods based on random matrix theory to high-frequency data. Lam et al. [14] adapt the non-linear shrinkage estimator of [15] to a high-frequency data setting with the help of the spectral distribution theory for the realized covariance matrix developed in [16]. Brownlees et al. [17] employ the $\ell_1$-penalized Gaussian MLE, known as the graphical Lasso, to estimate the precision matrix (the inverse of the covariance matrix) of high-frequency data. The last approach is closely related to the methodology we will focus on.

Despite the recent advances in this topic as above, most studies in this area focus only on point estimation of covariance and precision matrices, and there is little work on interval estimation and hypothesis testing for these objects. A few exceptions are [18,19,20]. The first two articles are concerned with continuous-time factor models: Kong and Liu [18] propose a test for the constancy of the factor loading matrix, while Pelger [19] assumes constant loadings and develops an asymptotic distribution theory to make inference for the factors and loadings. Meanwhile, Koike [20] establishes a high-dimensional central limit theorem for the realized covariance matrix which allows us to construct simultaneous confidence regions or carry out multiple testing for entries of the high-dimensional covariance matrix of high-frequency data.

The aim of this study is to develop such a statistical inference theory for the precision matrix of high-frequency data. This is naturally motivated by the fact that the precision matrix of asset returns plays an important role in mean-variance analysis of portfolio allocation (see e.g., [21], Chapter 5). We accomplish this purpose by imposing a sparsity assumption on the precision matrix.
Such an assumption has a clear interpretation in connection with Gaussian graphical models: For a Gaussian random vector $\xi = (\xi^1, \dots, \xi^d)^\top$ with covariance matrix $\Sigma$, $\xi^i$ and $\xi^j$ are conditionally independent given the other components if and only if the $(i,j)$-th entry of $\Sigma^{-1}$ is equal to 0, so the sparsity of $\Sigma^{-1}$ is interpreted as the sparsity of the edge structure of the conditional independence graph associated with $\xi$. We refer to Chapter 13 of [22] and references therein for more details on graphical models. This standpoint also makes it interesting to estimate the precision matrix of financial data in view of the recent attention to financial network analysis such as [23].
Statistical inference for high-dimensional sparse precision matrices has been actively studied in the recent literature, and various methodologies have been proposed; see [24] for an overview. Among others, this paper studies (a weighted version of) the de-biased (or de-sparsified) graphical Lasso in the context of high-frequency data. The de-biased graphical Lasso was introduced in Janková and van de Geer [25], where its theoretical properties were investigated in the i.i.d. case. In this paper, we consider its weighted version discussed in [24] because of its theoretically preferable behavior due to its adaptive nature (see Remarks 1 and 2). Compared to the i.i.d. case, we need to handle a new theoretical difficulty in the application to high-frequency data, which is caused by the non-ergodic nature of the problem: the precision matrix of high-frequency data is generally stochastic and not (stochastically) independent of the observation data. In our context, the precision matrix appears in the coefficients of the linear approximation of the de-biased estimator (see Lemma 1), so it spoils the martingale structure of the linear approximation which we usually have in the i.i.d. case. In a low-dimensional setting, this issue is typically resolved by the concept of stable convergence (see e.g., [26]), but the applicability of this approach is questionable in our setting due to the high-dimensionality (see pages 1451–1452 of [20] for a discussion). Instead, we rely on the recent high-dimensional central limit theory of [20] to establish the asymptotic distribution theory for the de-biased estimator, where we settle the above difficulty with the help of Malliavin calculus.
The graphical Lasso is an example of penalized estimation methods. We shall mention that penalized estimation has recently become an active research topic in the setting of asymptotic statistics for stochastic processes. For example, penalized quasi-likelihood estimation for stochastic processes has been developed in the fixed-dimensional setting by [27,28,29,30], while estimation for linearly parameterized high-dimensional diffusion models has been studied in [31,32]. Compared to these articles, this paper is novel in that it develops an asymptotic distribution theory in a high-dimensional setting.
The rest of this paper is organized as follows. In Section 2 we develop an abstract asymptotic theory for the weighted graphical Lasso based on a generic estimator for the quadratic covariation matrix of a high-dimensional semimartingale. This allows us to flexibly apply the developed theory to various settings arising in high-frequency financial econometrics. In Section 3 we extend the scope of the theory to a situation where a known factor structure is present in data and a sparsity assumption is imposed on the precision matrix of the residual process rather than that of the original process. In Section 4, we apply the abstract theory developed in Section 3 to a concrete setting where we observe the process at equidistant times without jumps and noise. Section 5 conducts a Monte Carlo study to assess the finite sample performance of the asymptotic theory, while Section 6 performs a simple real data analysis for illustration. All the technical proofs are collected in the Appendix A, Appendix B, Appendix C and Appendix D.
Notation 1.
Throughout the paper, we assume $d \geq 2$. $^\top$ stands for the transpose of a matrix. For a vector $x \in \mathbb{R}^d$, we write the $i$-th component of $x$ as $x^i$ for $i = 1, \dots, d$. For two vectors $x, y \in \mathbb{R}^d$, the statement $x \leq y$ means $x^i \leq y^i$ for all $i = 1, \dots, d$. The identity matrix of size $d$ is denoted by $E_d$. We write $\mathbb{R}^{l \times k}$ for the set of all $l \times k$ matrices. $\mathcal{S}_d$ denotes the set of all $d \times d$ symmetric matrices. $\mathcal{S}_d^{+}$ denotes the set of all $d \times d$ positive semidefinite matrices. $\mathcal{S}_d^{++}$ denotes the set of all $d \times d$ positive definite matrices. For an $l \times k$ matrix $A$, the $(i,j)$-th entry of $A$ is denoted by $A^{ij}$. Also, $A^{i\cdot}$ and $A^{\cdot j}$ denote the $i$-th row vector and the $j$-th column vector, respectively (both are regarded as column vectors). We write $\operatorname{vec}(A)$ for the vectorization of $A$:
$$\operatorname{vec}(A) := (A^{11}, \dots, A^{l1}, A^{12}, \dots, A^{l2}, \dots, A^{1k}, \dots, A^{lk})^\top \in \mathbb{R}^{lk}.$$
For every $w \in [1, \infty]$, we set
$$\|A\|_w := \begin{cases} \left(\sum_{i=1}^{l}\sum_{j=1}^{k} |A^{ij}|^w\right)^{1/w} & \text{if } w < \infty, \\ \max_{1 \leq i \leq l}\max_{1 \leq j \leq k} |A^{ij}| & \text{if } w = \infty. \end{cases}$$
Also, we write $|||A|||_w$ for the $\ell_w$-operator norm of $A$:
$$|||A|||_w := \sup\{\|Ax\|_w : x \in \mathbb{R}^k,\ \|x\|_w = 1\}.$$
It is well-known that $|||A|||_1 = \max_{1 \leq j \leq k}\sum_{i=1}^{l}|A^{ij}|$ and $|||A|||_\infty = \max_{1 \leq i \leq l}\sum_{j=1}^{k}|A^{ij}|$. When $l = k$, $\operatorname{diag}(A)$ denotes the diagonal matrix with the same diagonal entries as $A$, and we set $A^- := A - \operatorname{diag}(A)$. If $A$ is symmetric, we denote by $\Lambda_{\max}(A)$ and $\Lambda_{\min}(A)$ the maximum and minimum eigenvalues of $A$, respectively. For two matrices $A$ and $B$, $A \otimes B$ denotes their Kronecker product. When $A$ and $B$ have the same size, we write $A \circ B$ for their Hadamard product.
For a random variable $\xi$ and $p \in (0, \infty]$, $\|\xi\|_p$ denotes the $L^p$-norm of $\xi$. For an $l$-dimensional semimartingale $X = (X_t)_{t \in [0,1]}$ and a $k$-dimensional semimartingale $Y = (Y_t)_{t \in [0,1]}$, we define $\Sigma_{XY} := [X, Y]_1 := ([X^i, Y^j]_1)_{1 \leq i \leq l, 1 \leq j \leq k}$. We write $\Sigma_X = \Sigma_{XX}$ for short. If $\Sigma_X$ is a.s. invertible, we write $\Theta_X := \Sigma_X^{-1}$.

2. Estimators and Abstract Results

Given a stochastic basis $\mathcal{B} = (\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in [0,1]}, P)$, we consider a $d$-dimensional semimartingale $Y = (Y_t)_{t \in [0,1]}$ defined there. We assume $\Sigma_Y = [Y, Y]_1$ is a.s. invertible. In this paper, we consider the asymptotic theory such that the dimension $d$ possibly depends on a parameter $n \in \mathbb{N}$ so that $d = d_n \to \infty$ as $n \to \infty$. Consequently, both $\mathcal{B}$ and $Y$ may also depend on $n$. However, following the custom of the literature, we omit the indices $n$ from these objects and many other ones appearing below.
Our aim is to estimate the precision matrix $\Theta_Y = \Sigma_Y^{-1}$ when we have an estimator $\hat{\Sigma}_n$ for $\Sigma_Y$; as a corollary, we can also estimate $\Sigma_Y$ itself. We assume that $\hat{\Sigma}_n$ is an $\mathcal{S}_d^{+}$-valued random variable all of whose diagonal entries are a.s. positive, but we do not specify the form of $\hat{\Sigma}_n$ because the asymptotic theory developed in this section depends on the properties of $\hat{\Sigma}_n$ rather than its construction. This is convenient because construction of the estimator depends heavily on the observation scheme for $Y$ (with or without noise, synchronous or not, continuous or discontinuous and so on; see [33] for details). In Section 4 we illustrate how we apply the abstract theory developed in this and the next sections to a concrete situation.
We use the weighted graphical Lasso to estimate $\Theta_Y$ (cf. [24]). The weighted graphical Lasso estimator $\hat{\Theta}_\lambda$ with penalty parameter $\lambda > 0$ based on $\hat{\Sigma}_n$ is defined by
$$\hat{\Theta}_\lambda := \arg\min_{\Theta \in \mathcal{S}_d^{++}} \left\{ \operatorname{tr}(\Theta \hat{\Sigma}_n) - \log\det\Theta + \lambda \sum_{i \neq j} \hat{V}_n^{ii}\hat{V}_n^{jj}\,|\Theta^{ij}| \right\}, \qquad (1)$$
where $\hat{V}_n := \operatorname{diag}(\hat{\Sigma}_n)^{1/2}$. According to the proof of [34] (Lemma 1), the optimization problem in Equation (1) has a unique solution when $\lambda > 0$, $\hat{\Sigma}_n$ is positive semidefinite and all the diagonal entries of $\hat{\Sigma}_n$ are positive, so $\hat{\Theta}_\lambda$ is a.s. well-defined in our setting. In the following we allow $\lambda$ to be a random variable because we typically select $\lambda$ in a data-driven way.
To analyze the theoretical property of $\hat{\Theta}_\lambda$, it is convenient to consider the graphical Lasso estimator $\hat{K}_\lambda$ based on the correlation matrix estimator $\hat{R}_n := \hat{V}_n^{-1}\hat{\Sigma}_n\hat{V}_n^{-1}$ as follows:
$$\hat{K}_\lambda := \arg\min_{K \in \mathcal{S}_d^{++}} \left\{ \operatorname{tr}(K \hat{R}_n) - \log\det K + \lambda \|K^-\|_1 \right\}.$$
We can easily check $\hat{\Theta}_\lambda = \hat{V}_n^{-1}\hat{K}_\lambda\hat{V}_n^{-1}$.
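For concreteness, the following is a minimal R sketch of this construction; it assumes (as in the glasso package) that glassoFast accepts a matrix-valued penalty so that only the off-diagonal entries are penalized, and all function and variable names are illustrative rather than part of the paper.

```r
# Weighted graphical Lasso (Equation (1)): run the graphical Lasso on the
# correlation matrix estimator R_hat and rescale by the estimated volatilities.
library(glassoFast)

weighted_glasso <- function(Sigma_hat, lambda) {
  d     <- nrow(Sigma_hat)
  v     <- sqrt(diag(Sigma_hat))             # diagonal of V_hat = diag(Sigma_hat)^{1/2}
  R_hat <- Sigma_hat / tcrossprod(v)         # correlation matrix estimator R_hat
  rho   <- lambda * (1 - diag(d))            # penalize off-diagonal entries only
  K_hat <- glassoFast(R_hat, rho = rho)$wi   # graphical Lasso based on R_hat
  Theta <- K_hat / tcrossprod(v)             # Theta_hat = V_hat^{-1} K_hat V_hat^{-1}
  list(Theta = Theta, K = K_hat)
}
```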
Remark 1.
As pointed out in Rothman et al. [35] and Janková and van de Geer [24], the graphical Lasso based on correlation matrices is theoretically preferable to that based on covariance matrices (so the weighted graphical Lasso is also preferable). In particular, we do not need to impose the so-called irrepresentability condition on Σ Y to derive the theoretical properties of our estimators, which contrasts with Brownlees et al. [17] (see Assumption 2 in [17]). See also Remark 2 for an additional discussion.
We introduce some notation related to the sparsity assumptions we will impose on $\Theta_Y$. Let $A \in \mathcal{S}_d$. For $j = 1, \dots, d$, we set $D_j(A) := \{i : A^{ij} \neq 0, i \neq j\}$ and $d_j(A) := \#D_j(A)$. Then we define $d(A) := \max_{1 \leq j \leq d} d_j(A)$. We also define $S(A) := \bigcup_{j=1}^{d}(D_j(A) \times \{j\}) = \{(i,j) : A^{ij} \neq 0, i \neq j\}$ and $s(A) := \#S(A)$. These quantities have a clear interpretation when the matrix $A$ represents the edge structure of some graph so that $A^{ij} \neq 0$ is equivalent to the presence of an edge between vertices $i$ and $j$ for $i \neq j$; in this case, $d_j(A)$ is the number of edges adjacent to vertex $j$ (which is called the degree of vertex $j$) and $s(A)$ is the total number of edges contained in the graph.
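As a small base-R illustration (not part of the paper), these sparsity measures can be computed directly from a symmetric matrix:

```r
# Sparsity measures of a symmetric matrix A:
# d_j(A): number of non-zero off-diagonal entries in column j (degree of vertex j),
# d(A):   maximum degree, s(A): number of non-zero off-diagonal entries.
sparsity_measures <- function(A, tol = 0) {
  nz <- abs(A) > tol
  diag(nz) <- FALSE
  d_j <- colSums(nz)
  list(d = max(d_j), s = sum(nz))
}
```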
To derive our asymptotic results, we will impose the following structural assumptions on Σ Y .
[A1]
$\Lambda_{\max}(\Sigma_Y) + 1/\Lambda_{\min}(\Sigma_Y) = O_p(1)$ as $n \to \infty$.
[A2]
$s(\Theta_Y) = O_p(s_n)$ as $n \to \infty$ for some sequence $s_n \in [1, \infty)$, $n = 1, 2, \dots$.
[A3]
$d(\Theta_Y) = O_p(d_n)$ as $n \to \infty$ for some sequence $d_n \in [1, \infty)$, $n = 1, 2, \dots$.
[A1] is standard in the literature; see e.g., Condition A1 in [24]. [A2] states that the sparsity of Θ Y is controlled by the deterministic sequence s n ; we will require the growth rate of s n to be moderate. [A3] is another sparsity assumption on Θ Y . It is weaker than [A2] in the sense that it always holds true with d n = s n under [A2]. However, we can generally take d n smaller than s n .

2.1. Consistency

Set $V_Y := \operatorname{diag}(\Sigma_Y)^{1/2}$, $R_Y := V_Y^{-1}\Sigma_Y V_Y^{-1}$ and $K_Y := R_Y^{-1}$.
Proposition 1.
Assume [A1]–[A2]. Let $(\lambda_n)_{n=1}^{\infty}$ be a sequence of positive-valued random variables satisfying the following conditions:
[B1]
$\lambda_n^{-1}\|\hat{\Sigma}_n - \Sigma_Y\|_\infty \to^p 0$ as $n \to \infty$.
[B2]
$s_n\lambda_n \to^p 0$ as $n \to \infty$.
Then we have
$$\lambda_n^{-1}\|\hat{K}_{\lambda_n} - K_Y\|_2 = O_p(s_n), \qquad \lambda_n^{-1}|||\hat{K}_{\lambda_n} - K_Y|||_w = O_p(s_n)$$
and
$$\lambda_n^{-1}|||\hat{\Theta}_{\lambda_n} - \Theta_Y|||_w = O_p(s_n), \qquad \lambda_n^{-1}|||\hat{\Theta}_{\lambda_n}^{-1} - \Sigma_Y|||_2 = O_p(s_n)$$
as $n \to \infty$ for any $w \in [1, \infty]$.
Proposition 1 is essentially a rephrasing of Theorem 14.1.3 in [24]. To get a better convergence rate in Proposition 1, we should choose $\lambda_n$ as small as possible, where a lower bound for $\lambda_n$ is determined through [B1] by the convergence rate of $\hat{\Sigma}_n$ in the $\infty$-norm. One typically derives this convergence rate by establishing entry-wise concentration inequalities for $\hat{\Sigma}_n$. Such inequalities have already been established for various covariance estimators used in high-frequency financial econometrics; see Theorems 1–2 and Lemma 3 in [36], Theorem 1 in [4], Theorem 1 in [37], and Theorem 2 in [17] for example. We note, however, that $\hat{\Sigma}_n$ should be positive semidefinite to ensure that the graphical Lasso has a unique solution. This property is not necessarily ensured by many covariance estimators used in this area. In this regard, we mention that pre-averaging and realized kernel estimators have versions that ensure this property, for which relevant bounds are available in [6] (Theorem 2) and [11] (Lemma 1).
Remark 2
(Comparison to Brownlees et al. [17]). Compared with [17] (Theorem 1), Proposition 1 has two major theoretical improvements. First, Proposition 1 does not assume the so-called irrepresentability condition, which is imposed in [17] (Theorem 1) as Assumption 2. In fact, under the assumptions of Proposition 1, the unweighted graphical Lasso estimator adopted in [17] would have the convergence rate $(s_n + d)\lambda_n$ (rather than $s_n\lambda_n$ in our case) for estimating $\Theta_Y$ in the norm $|||\cdot|||_w$, in view of [24] (Theorem 14.1.2). This means that we would need to select $\lambda_n$ so that $d\lambda_n \to^p 0$ as $n \to \infty$ to ensure consistency, which is much stronger than the corresponding assumption [B2] in our setting. Since $\lambda_n$ typically converges to 0 no faster than $1/\sqrt{n}$ with $n$ being the sample size (cf. Section 4), the condition $d\lambda_n \to^p 0$ excludes high-dimensional settings such that $d \gtrsim \sqrt{n}$.
Second, Proposition 1 gives consistency in the $\ell_w$-operator norm for all $w \in [1, \infty]$, while [17] (Theorem 1) only shows consistency in the $\infty$-norm. We shall remark that consistency in matrix operator norms is important in applications. For example, the consistency of $\hat{\Theta}_{\lambda_n}$ in the $\ell_2$-operator norm implies that the eigenvalues of $\hat{\Theta}_{\lambda_n}$ consistently estimate the corresponding eigenvalues of $\Theta_Y$. Also, the consistency in the $\ell_\infty$-operator norm ensures $\|\hat{\Theta}_{\lambda_n}x - \Theta_Y x\|_\infty \to^p 0$ as $n \to \infty$ for any $x \in \mathbb{R}^d$ such that $\|x\|_\infty = O(1)$. This result is important for portfolio allocation because the weight vector of the global minimum variance portfolio is given by $\Theta_Y\mathbf{1}/(\mathbf{1}^\top\Theta_Y\mathbf{1})$ when the assets have covariance matrix $\Sigma_Y$, where $\mathbf{1} = (1, \dots, 1)^\top \in \mathbb{R}^d$; see e.g., [21] (Section 5.2).
On the other hand, unlike [17] (Theorem 1), we do not show selection consistency (i.e., $P(S(\hat{\Theta}_{\lambda_n}) = S(\Theta_Y)) \to 1$ as $n \to \infty$) under our assumptions. Indeed, in the linear regression setting, it is known that an irrepresentability-type condition is necessary for the selection consistency of the Lasso; see [22] (Section 7.5.3) for more details. This suggests that our estimator would not have the oracle property in the sense of [38] in general. However, we shall remark that the asymptotic mixed normality of the de-biased estimator stated below can be used to construct an estimator with selection consistency via thresholding as in e.g., [39] (Section 3.1) and [40] (Section 4.2). See Corollary 2 and the subsequent discussion for details.

2.2. Asymptotic Mixed Normality

The following lemma states that $\hat{\Theta}_{\lambda_n} - \Theta_Y$ is asymptotically linear in $\hat{\Sigma}_n - \Sigma_Y$ after bias correction when $\Theta_Y$ is sufficiently sparse.
Lemma 1.
Suppose that the assumptions of Proposition 1 and [A3] are satisfied. Then we have
$$\lambda_n^{-2}\left\|\hat{\Theta}_{\lambda_n} - \Theta_Y - \Gamma_n + \Theta_Y(\hat{\Sigma}_n - \Sigma_Y)\Theta_Y\right\|_\infty = O_p(s_n d_n)$$
as $n \to \infty$, where $\Gamma_n := -(\hat{\Theta}_{\lambda_n} - \hat{\Theta}_{\lambda_n}\hat{\Sigma}_n\hat{\Theta}_{\lambda_n})$.
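Under the sign convention for $\Gamma_n$ reconstructed above, the de-biased estimator $\hat{\Theta}_{\lambda_n} - \Gamma_n$ coincides with the familiar de-sparsified form $2\hat{\Theta}_{\lambda_n} - \hat{\Theta}_{\lambda_n}\hat{\Sigma}_n\hat{\Theta}_{\lambda_n}$; a one-line R sketch (illustrative only, continuing the helper from Section 2):

```r
# De-biased weighted graphical Lasso (Lemma 1):
# Theta_hat - Gamma_n = 2 * Theta_hat - Theta_hat %*% Sigma_hat %*% Theta_hat.
debias_glasso <- function(Theta_hat, Sigma_hat) {
  2 * Theta_hat - Theta_hat %*% Sigma_hat %*% Theta_hat
}
```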
Lemma 1 is an almost straightforward consequence of Equation (4) and the Karush–Kuhn–Tucker (KKT) conditions for the optimization problem in Equation (1). As a consequence of this lemma, we obtain the following result, which states that the "de-biased" weighted graphical Lasso estimator $\hat{\Theta}_{\lambda_n} - \Gamma_n$ of $\Theta_Y$ inherits the asymptotic mixed normality of $\hat{\Sigma}_n$.
Proposition 2.
Suppose that the assumptions of Lemma 1 are satisfied. For every $n \in \mathbb{N}$, let $a_n > 0$, let $C_n$ be a $d^2 \times d^2$ positive semidefinite random matrix and let $J_n$ be an $m \times d^2$ random matrix, where $m = m_n$ may depend on $n$. Assume $a_n|||J_n|||_\infty\lambda_n^2 s_n d_n \log(m+1) \to^p 0$ as $n \to \infty$. Assume also that
$$\lim_{n \to \infty}\sup_{y \in \mathbb{R}^m}\left|P\left(a_n\tilde{J}_n\operatorname{vec}(\hat{\Sigma}_n - \Sigma_Y) \leq y\right) - P\left(\tilde{J}_nC_n^{1/2}\zeta_n \leq y\right)\right| = 0 \qquad (5)$$
and
$$\lim_{b \downarrow 0}\limsup_{n \to \infty}P\left(\min\operatorname{diag}(\tilde{J}_nC_n\tilde{J}_n^\top) < b\right) = 0$$
as $n \to \infty$, where $\tilde{J}_n := J_n(\Theta_Y \otimes \Theta_Y)$ and $\zeta_n$ is a $d^2$-dimensional standard Gaussian vector independent of $\mathcal{F}$, which is defined on an extension of the probability space $(\Omega, \mathcal{F}, P)$ if necessary. Then,
$$\lim_{n \to \infty}\sup_{y \in \mathbb{R}^m}\left|P\left(a_nJ_n\operatorname{vec}(\hat{\Theta}_{\lambda_n} - \Gamma_n - \Theta_Y) \leq y\right) - P\left(\tilde{J}_nC_n^{1/2}\zeta_n \leq y\right)\right| = 0.$$
In a standard i.i.d. setting such that $\Theta_Y$ is non-random, we can usually verify Equation (5) by the classical Lindeberg central limit theorem when $m = 1$ and $J_n$ is non-random, because $a_n\tilde{J}_n\operatorname{vec}(\hat{\Sigma}_n - \Sigma_Y)$ can be written as a sum of independent random variables; see the proof of [25] (Theorem 1) for example. By contrast, $\Theta_Y$ is generally random and not independent of $\hat{\Sigma}_n - \Sigma_Y$ in our setting, so $a_n\tilde{J}_n\operatorname{vec}(\hat{\Sigma}_n - \Sigma_Y)$ may not be a martingale even if $\operatorname{vec}(\hat{\Sigma}_n - \Sigma_Y)$ is a martingale. In the case that $d$ is fixed, we typically resolve this issue by proving stable convergence in law of $\operatorname{vec}(\hat{\Sigma}_n - \Sigma_Y)$; see e.g., [26] for details. However, extension of this approach to the case that $d \to \infty$ as $n \to \infty$ is far from trivial, as discussed at the beginning of [20] (Section 3). For this reason, [20] gives a result to directly establish convergence of the type in Equation (5) in a high-dimensional setting. This result will be used in Section 4 to apply our abstract theory to a more concrete setting.
Remark 3.
Proposition 2 also allows $m$ to diverge as $n \to \infty$, which is necessary when we need to derive an asymptotic approximation of the joint distribution of $\operatorname{vec}(\hat{\Theta}_{\lambda_n} - \Gamma_n - \Theta_Y)$. Such an approximation can be used to make simultaneous inference for entries of $\Theta_Y$; see [40] for example.

3. Factor Structure

In financial applications, it is often important to take account of the factor structure of asset prices. In fact, many empirical studies have documented the existence of common factors in financial markets (e.g., [41] (Section 6.5)). Also, factor models play a dominant role in asset pricing theory (cf. [21] (Chapter 9)). When common factors are present across asset returns, the precision matrix cannot be sparse because all pairs of assets are partially correlated, given the other assets, through the common factors. Therefore, in such a situation, it is common practice to impose a sparsity assumption on the precision matrix of the residual process obtained after removing the co-movements induced by the factors (see e.g., [17] (Section 4.2) and [42] (Section 4.2)). In this section, we adapt the theory developed in Section 2 to such an application.
Specifically, suppose that we have an $r$-dimensional known factor process $X$, and consider the following continuous-time factor model:
$$Y = \beta X + Z. \qquad (7)$$
Here, $\beta$ is a non-random $d \times r$ matrix and $Z$ is a $d$-dimensional semimartingale such that $[Z, X]_1 = 0$. $\beta$ and $Z$ represent the factor loading matrix and the residual process of the model, respectively. This model is widely used in high-frequency financial econometrics; see [8,9,11] in the context of high-dimensional covariance matrix estimation. One restriction of the model in Equation (7) is that the factor loading $\beta$ is assumed to be constant, but there is empirical evidence that $\beta$ may be regarded as constant over short time intervals (one week or less); see [18,43] for instance.
Remark 4.
The number of factors $r$ possibly depends on $n$ and (slowly) diverges as $n \to \infty$. Also, $\beta$ may depend on $n$.
We are interested in estimating $\Sigma_Y$ based on observation data for $X$ and $Y$ while taking account of the factor structure given by Equation (7). Suppose that we have generic estimators $\hat{\Sigma}_{Y,n}$, $\hat{\Sigma}_{X,n}$ and $\hat{\Sigma}_{YX,n}$ for $\Sigma_Y$, $\Sigma_X$ and $\Sigma_{YX}$, respectively. $\hat{\Sigma}_{Y,n}$, $\hat{\Sigma}_{X,n}$ and $\hat{\Sigma}_{YX,n}$ are assumed to be random variables taking values in $\mathcal{S}_d$, $\mathcal{S}_r^{+}$ and $\mathbb{R}^{d \times r}$, respectively. Now, by assumption we have
$$\Sigma_Y = \beta\Sigma_X\beta^\top + \Sigma_Z. \qquad (8)$$
Assume $\Sigma_X$ is a.s. invertible. Then $\beta$ can be written as $\beta = \Sigma_{YX}\Sigma_X^{-1}$. Therefore, we can naturally estimate $\beta$ by $\hat{\beta}_n := \hat{\Sigma}_{YX,n}\hat{\Sigma}_{X,n}^{-1}$, provided that $\hat{\Sigma}_{X,n}$ is invertible. In practical applications, the invertibility of $\hat{\Sigma}_{X,n}$ is usually not problematic because the number of factors $r$ is sufficiently small compared to the sample size. However, it is theoretically convenient to (formally) define $\hat{\beta}_n$ also in the case that $\hat{\Sigma}_{X,n}$ is singular. For this reason, we take an $\mathcal{S}_r^{++}$-valued random variable $\hat{\Sigma}_{X,n}^{-}$ such that $\hat{\Sigma}_{X,n}^{-} = \hat{\Sigma}_{X,n}^{-1}$ on the event where $\hat{\Sigma}_{X,n}$ is invertible, and redefine $\hat{\beta}_n$ as $\hat{\beta}_n := \hat{\Sigma}_{YX,n}\hat{\Sigma}_{X,n}^{-}$. This does not affect the asymptotic properties of our estimators because $\hat{\Sigma}_{X,n}$ is asymptotically invertible under the assumptions we will impose. Now, from Equation (8), $\Sigma_Z$ is estimated by
$$\hat{\Sigma}_{Z,n} := \hat{\Sigma}_{Y,n} - \hat{\beta}_n\hat{\Sigma}_{X,n}\hat{\beta}_n^\top. \qquad (9)$$
Since $\hat{\Sigma}_{Z,n}$ might be a poor estimator for $\Sigma_Z$ because $d$ can be extremely large in our setting, we apply the weighted graphical Lasso to $\hat{\Sigma}_{Z,n}$ in order to estimate $\Sigma_Z$. Specifically, we construct the weighted graphical Lasso estimator $\hat{\Theta}_{Z,\lambda}$ based on $\hat{\Sigma}_{Z,n}$ as follows:
$$\hat{\Theta}_{Z,\lambda} := \arg\min_{\Theta \in \mathcal{S}_d^{++}} \left\{ \operatorname{tr}(\Theta\hat{\Sigma}_{Z,n}) - \log\det\Theta + \lambda\sum_{i \neq j}\sqrt{\hat{\Sigma}_{Z,n}^{ii}\hat{\Sigma}_{Z,n}^{jj}}\,|\Theta^{ij}| \right\}. \qquad (10)$$
Then $\Sigma_Z$ is estimated by the inverse of $\hat{\Theta}_{Z,\lambda}$. Hence our final estimator for $\Sigma_Y$ is constructed as
$$\hat{\Sigma}_{Y,\lambda} := \hat{\beta}_n\hat{\Sigma}_{X,n}\hat{\beta}_n^\top + \hat{\Theta}_{Z,\lambda}^{-1}. \qquad (11)$$
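A compact R sketch of this factor-adjusted construction (Equations (9)–(11)) is given below, reusing the weighted_glasso helper sketched in Section 2. For simplicity it inverts $\hat{\Sigma}_{X,n}$ directly with solve(), whereas the paper formally handles a possibly singular $\hat{\Sigma}_{X,n}$ as described above; all names are illustrative assumptions.

```r
# Factor-adjusted weighted graphical Lasso (Equations (9)-(11)).
factor_wglasso <- function(Sigma_Y_hat, Sigma_X_hat, Sigma_YX_hat, lambda) {
  beta_hat    <- Sigma_YX_hat %*% solve(Sigma_X_hat)                     # beta_hat = Sigma_YX Sigma_X^{-1}
  Sigma_Z_hat <- Sigma_Y_hat - beta_hat %*% Sigma_X_hat %*% t(beta_hat)  # Equation (9)
  Theta_Z_hat <- weighted_glasso(Sigma_Z_hat, lambda)$Theta              # Equation (10)
  Sigma_Y_lam <- beta_hat %*% Sigma_X_hat %*% t(beta_hat) +
                 solve(Theta_Z_hat)                                      # Equation (11)
  list(beta = beta_hat, Sigma_Z = Sigma_Z_hat,
       Theta_Z = Theta_Z_hat, Sigma_Y = Sigma_Y_lam)
}
```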
Remark 5.
Although we will impose the assumptions which guarantee that the optimization problem in Equation (10) asymptotically has the unique solution with probability 1, it may have no solution for a fixed n. Thus, we formally define Θ ^ Z , λ as an S d + + -valued random variable such that Θ ^ Z , λ is defined by Equation (10) on the event where the optimization problem in Equation (10) has the unique solution.
Remark 6
(Positive definiteness of Σ ^ Y , λ ). Since Θ ^ Z , λ 1 is positive definite by construction, Σ ^ Y , λ is positive definite (note that we assume Σ ^ X , n is positive semidefinite).
We will impose the following structural assumptions on the model:
[C1]
$\|\Sigma_Y\|_\infty = O_p(1)$ and $\|\beta\|_\infty = O(1)$ as $n \to \infty$.
[C2]
$\Lambda_{\max}(\Sigma_Z) + 1/\Lambda_{\min}(\Sigma_Z) = O_p(1)$ as $n \to \infty$.
[C3]
$\|\Sigma_X\|_\infty + 1/\Lambda_{\min}(\Sigma_X) = O_p(1)$ as $n \to \infty$.
[C4]
$s(\Theta_Z) = O_p(s_n)$ as $n \to \infty$ for some sequence $s_n \in [1, \infty)$, $n = 1, 2, \dots$.
[C5]
$d(\Theta_Z) = O_p(d_n)$ as $n \to \infty$ for some sequence $d_n \in [1, \infty)$, $n = 1, 2, \dots$.
[C6]
There is a positive definite $r \times r$ matrix $B$ such that $|||d^{-1}\beta^\top\beta - B|||_2 \to 0$ and $\Lambda_{\min}(B)^{-1} = O(1)$ as $n \to \infty$.
[C1]–[C3] are natural structural assumptions on the model and standard in the literature; see e.g., Assumptions 2.1 and 3.3 in [44]. [C4]–[C5] are sparsity assumptions on the precision matrix of the residual process and necessary for our application of the (weighted) graphical Lasso. [C6] requires the factors to have non-negligible impact on almost all assets and is also standard in the context of covariance matrix estimation based on a factor model; see e.g., Assumption 3.5 in [44] and Assumption 6 in [8].
The following result establishes the consistency of the residual precision matrix estimator Θ ^ Z , λ .
Proposition 3.
Assume [C1]–[C4]. Let $(\lambda_n)_{n=1}^{\infty}$ be a sequence of positive-valued random variables satisfying the following conditions:
[D1]
$\lambda_n^{-1}\|\hat{\Sigma}_{X,n} - \Sigma_X\|_\infty \to^p 0$, $\lambda_n^{-1}\|\hat{\Sigma}_{YX,n} - \beta\hat{\Sigma}_{X,n}\|_\infty \to^p 0$ and $\lambda_n^{-1}\|\breve{\Sigma}_{Z,n} - \Sigma_Z\|_\infty \to^p 0$ as $n \to \infty$, where $\breve{\Sigma}_{Z,n} := \hat{\Sigma}_{Y,n} - \hat{\Sigma}_{YX,n}\beta^\top - \beta\hat{\Sigma}_{YX,n}^\top + \beta\hat{\Sigma}_{X,n}\beta^\top$.
[D2]
$(s_n + r)\lambda_n \to^p 0$ as $n \to \infty$.
[D3]
$P(\bar{\Sigma}_n \in \mathcal{S}_{r+d}^{+}) \to 1$ as $n \to \infty$, where
$$\bar{\Sigma}_n := \begin{pmatrix} \hat{\Sigma}_{X,n} & \hat{\Sigma}_{YX,n}^\top \\ \hat{\Sigma}_{YX,n} & \hat{\Sigma}_{Y,n} \end{pmatrix}.$$
Then $\lambda_n^{-1}|||\hat{\Theta}_{Z,\lambda_n} - \Theta_Z|||_w = O_p(s_n)$ and $\lambda_n^{-1}|||\hat{\Theta}_{Z,\lambda_n}^{-1} - \Sigma_Z|||_2 = O_p(s_n)$ as $n \to \infty$ for any $w \in [1, \infty]$.
Remark 7.
(a) Since $\Sigma_{ZX} = \Sigma_{YX} - \beta\Sigma_X$ and $\Sigma_Z = \Sigma_Y - \Sigma_{YX}\beta^\top - \beta\Sigma_{XY} + \beta\Sigma_X\beta^\top$, the quantities $\hat{\Sigma}_{YX,n} - \beta\hat{\Sigma}_{X,n}$ and $\breve{\Sigma}_{Z,n}$ are seen as natural estimators for $\Sigma_{ZX}$ $(= 0)$ and $\Sigma_Z$, respectively, if $\beta$ were known. In this sense, [D1] is a natural extension of [B1]. In particular, if $r = O(1)$ as $n \to \infty$, [D1] follows from the convergences $\lambda_n^{-1}\|\hat{\Sigma}_{X,n} - \Sigma_X\|_\infty \to^p 0$, $\lambda_n^{-1}\|\hat{\Sigma}_{YX,n} - \Sigma_{YX}\|_\infty \to^p 0$ and $\lambda_n^{-1}\|\hat{\Sigma}_{Y,n} - \Sigma_Y\|_\infty \to^p 0$ under [C1], which are typically derived from entry-wise concentration inequalities for $\hat{\Sigma}_{X,n}$, $\hat{\Sigma}_{YX,n}$ and $\hat{\Sigma}_{Y,n}$.
(b) [D3] ensures that $\hat{\Sigma}_{Z,n}$ is asymptotically positive semidefinite. This is necessary for guaranteeing that the optimization problem in Equation (10) asymptotically has a unique solution with probability 1.
From Proposition 3 we can also derive the convergence rates of the estimators $\hat{\Sigma}_{Z,\lambda_n}$ and $\hat{\Sigma}_{Z,\lambda_n}^{-1}$ in appropriate norms, which may be seen as counterparts of Theorems 1–2 in [8].
Proposition 4.
Under the assumptions of Proposition 3, $\lambda_n^{-1}\|\hat{\Sigma}_{Z,\lambda_n} - \Sigma_Z\|_\infty = O_p(s_n + r^2)$ as $n \to \infty$.
Proposition 5.
Under the assumptions of Proposition 3, we additionally assume [C5]–[C6]. Then, $\lambda_n^{-1}|||\hat{\Sigma}_{Y,\lambda_n}^{-1} - \Sigma_Y^{-1}|||_2 = O_p(s_n + r)$ and $\lambda_n^{-1}|||\hat{\Sigma}_{Y,\lambda_n}^{-1} - \Sigma_Y^{-1}|||_\infty = O_p(r^{3/2}d_n(s_n + r))$ as $n \to \infty$.
Next we present the high-dimensional asymptotic mixed normality of the de-biased version of Θ ^ Z , λ .
Proposition 6.
Suppose that the assumptions of Proposition 3 and [C5] are satisfied. For every $n \in \mathbb{N}$, let $a_n > 0$, let $C_n$ be a $d^2 \times d^2$ positive semidefinite random matrix and let $J_n$ be an $m \times d^2$ random matrix, where $m = m_n$ may depend on $n$. Assume $a_n|||J_n|||_\infty\lambda_n^2 s_n d_n \log(m+1) \to^p 0$ as $n \to \infty$. Assume also that
$$\lim_{n \to \infty}\sup_{y \in \mathbb{R}^m}\left|P\left(a_n\tilde{J}_{Z,n}\operatorname{vec}(\breve{\Sigma}_{Z,n} - \Sigma_Z) \leq y\right) - P\left(\tilde{J}_{Z,n}C_n^{1/2}\zeta_n \leq y\right)\right| = 0 \qquad (12)$$
and
$$\lim_{b \downarrow 0}\limsup_{n \to \infty}P\left(\min\operatorname{diag}(\tilde{J}_{Z,n}C_n\tilde{J}_{Z,n}^\top) < b\right) = 0$$
as $n \to \infty$, where $\tilde{J}_{Z,n} := J_n(\Theta_Z \otimes \Theta_Z)$ and $\zeta_n$ is a $d^2$-dimensional standard Gaussian vector independent of $\mathcal{F}$, which is defined on an extension of the probability space $(\Omega, \mathcal{F}, P)$ if necessary. Then,
$$\lim_{n \to \infty}\sup_{y \in \mathbb{R}^m}\left|P\left(a_nJ_n\operatorname{vec}(\hat{\Theta}_{Z,\lambda_n} - \Gamma_{Z,n} - \Theta_Z) \leq y\right) - P\left(\tilde{J}_{Z,n}C_n^{1/2}\zeta_n \leq y\right)\right| = 0,$$
where $\Gamma_{Z,n} := -(\hat{\Theta}_{Z,\lambda_n} - \hat{\Theta}_{Z,\lambda_n}\hat{\Sigma}_{Z,n}\hat{\Theta}_{Z,\lambda_n})$.
Remark 8.
It is worth mentioning that condition Equation (12) is stated for Σ ˘ Z , n rather than Σ ^ Z , n . In other words, for deriving the asymptotic distribution, we do not need to take account of the effect of plugging β ^ n into β, at least in the first order. This is thanks to Lemma A11.
Although it is generally difficult to derive the asymptotic mixed normality of (the de-biased version of) $\hat{\Sigma}_{Y,\lambda_n}^{-1}$, this is possible when $d$ is sufficiently large. In fact, in such a situation, the entry-wise behavior of $\Sigma_Y^{-1}$ is dominated by $\Theta_Z$, as described by the following lemma:
Lemma 2.
Under the assumptions of Proposition 5, $\|\hat{\Sigma}_{Y,\lambda_n}^{-1} - \hat{\Theta}_{Z,\lambda_n}\|_\infty = O_p(rd_n/d)$ and $\|\Sigma_Y^{-1} - \Theta_Z\|_\infty = O_p(rd_n/d)$ as $n \to \infty$.
Consequently, we obtain the following result.
Proposition 7.
Suppose that the assumptions of Proposition 6 and [C6] are satisfied. Suppose also that $a_n|||J_n|||_\infty rd_n\log(m+1)/d \to 0$ as $n \to \infty$. Then we have
$$\lim_{n \to \infty}\sup_{y \in \mathbb{R}^m}\left|P\left(a_nJ_n\operatorname{vec}(\hat{\Sigma}_{Y,\lambda_n}^{-1} - \Gamma_{Z,n} - \Sigma_Y^{-1}) \leq y\right) - P\left(\tilde{J}_{Z,n}C_n^{1/2}\zeta_n \leq y\right)\right| = 0.$$

4. Application to Realized Covariance Matrix

In this section, we apply the abstract theory developed above to the simplest situation where the processes have no jumps and are observed at equidistant times without noise. Specifically, we consider the continuous-time factor model in Equation (7) and assume that both $Y$ and $X$ are observed at the equidistant time points $h/n$, $h = 0, 1, \dots, n$. In this case, $\Sigma_Y = [Y, Y]_1$ is naturally estimated by the realized covariance matrix:
$$\hat{\Sigma}_{Y,n} := \widehat{[Y, Y]}_1^n := \sum_{h=1}^{n}(Y_{h/n} - Y_{(h-1)/n})(Y_{h/n} - Y_{(h-1)/n})^\top. \qquad (14)$$
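The realized covariance matrices used here (for $Y$, for $X$, and for the cross term defined next) can be computed from the discrete observations in a few lines of R; the sketch below is purely illustrative and the names are assumptions.

```r
# Realized covariance matrix (Equation (14)) from equidistant observations.
# Y_obs, X_obs: (n+1) x d and (n+1) x r matrices of observations at times h/n.
realized_cov <- function(Y_obs, X_obs = Y_obs) {
  dY <- diff(Y_obs)      # increments Y_{h/n} - Y_{(h-1)/n}
  dX <- diff(X_obs)
  t(dY) %*% dX           # sum over h of the outer products of increments
}
```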
Analogously, we define $\hat{\Sigma}_{X,n} := \widehat{[X, X]}_1^n$ and $\hat{\Sigma}_{YX,n} := \widehat{[Y, X]}_1^n$. In addition, we assume that $Z$ and $X$ are respectively $d$-dimensional and $r$-dimensional continuous Itô semimartingales given by
$$Z_t = Z_0 + \int_0^t \mu_s\,ds + \int_0^t \sigma_s\,dW_s, \qquad X_t = X_0 + \int_0^t \tilde{\mu}_s\,ds + \int_0^t \tilde{\sigma}_s\,dW_s,$$
where $\mu = (\mu_s)_{s \in [0,1]}$ and $\tilde{\mu} = (\tilde{\mu}_s)_{s \in [0,1]}$ are respectively $d$-dimensional and $r$-dimensional $(\mathcal{F}_t)$-progressively measurable processes, $\sigma = (\sigma_s)_{s \in [0,1]}$ and $\tilde{\sigma} = (\tilde{\sigma}_s)_{s \in [0,1]}$ are respectively $\mathbb{R}^{d \times d}$-valued and $\mathbb{R}^{r \times d}$-valued $(\mathcal{F}_t)$-progressively measurable processes, and $W = (W_s)_{s \in [0,1]}$ is a $d$-dimensional standard $(\mathcal{F}_t)$-Wiener process. To apply the convergence rate results to this setting, we impose the following assumptions:
[E1]
For all $n, \nu \in \mathbb{N}$, we have an event $\Omega_n(\nu) \in \mathcal{F}$ and $(\mathcal{F}_t)$-progressively measurable processes $\mu(\nu) = (\mu(\nu)_s)_{s \in [0,1]}$, $\tilde{\mu}(\nu) = (\tilde{\mu}(\nu)_s)_{s \in [0,1]}$, $\sigma(\nu) = (\sigma(\nu)_s)_{s \in [0,1]}$ and $\tilde{\sigma}(\nu) = (\tilde{\sigma}(\nu)_s)_{s \in [0,1]}$ which take values in $\mathbb{R}^d$, $\mathbb{R}^r$, $\mathbb{R}^{d \times d}$ and $\mathbb{R}^{r \times d}$, respectively, and they satisfy the following conditions:
(i)
$\lim_{\nu \to \infty}\limsup_{n \to \infty}P(\Omega_n(\nu)^c) = 0$.
(ii)
$\mu = \mu(\nu)$, $\tilde{\mu} = \tilde{\mu}(\nu)$, $\sigma = \sigma(\nu)$ and $\tilde{\sigma} = \tilde{\sigma}(\nu)$ on $\Omega_n(\nu)$ for all $\nu \in \mathbb{N}$.
(iii)
For all $\nu \in \mathbb{N}$, there is a constant $C_\nu > 0$ such that
$$\sup_{n \in \mathbb{N}}\sup_{0 \leq t \leq 1}\sup_{\omega \in \Omega}\left\{\|\mu(\nu)_t(\omega)\|_\infty + \|\tilde{\mu}(\nu)_t(\omega)\|_\infty + \|c(\nu)_t(\omega)\|_\infty + \|\tilde{c}(\nu)_t(\omega)\|_\infty\right\} \leq C_\nu,$$
where $c(\nu)_t := \sigma(\nu)_t\sigma(\nu)_t^\top$ and $\tilde{c}(\nu)_t := \tilde{\sigma}(\nu)_t\tilde{\sigma}(\nu)_t^\top$.
[E2]
$r = O(d)$ and $(\log d)/n \to 0$ as $n \to \infty$.
[E1] is a local boundedness assumption on the coefficient processes and typical in the literature: For example, [E1] is satisfied when μ , μ ˜ , σ and σ ˜ are all bounded by some locally bounded process independent of n. This latter condition is imposed in [8], among others. [E2] restricts the growth rates of d and r. It is indeed an adaptation of [D1] to the present setting.
Theorem 1.
Assume [C1]–[C4] and [E1]–[E2]. Let $\lambda_n$ be a sequence of positive-valued random variables such that $\lambda_n^{-1}\sqrt{(\log d)/n} \to^p 0$ and $(s_n + r)\lambda_n \to^p 0$ as $n \to \infty$. Then $\lambda_n^{-1}|||\hat{\Theta}_{Z,\lambda_n} - \Theta_Z|||_w = O_p(s_n)$, $\lambda_n^{-1}|||\hat{\Theta}_{Z,\lambda_n}^{-1} - \Sigma_Z|||_2 = O_p(s_n)$ and $\lambda_n^{-1}\|\hat{\Sigma}_{Y,\lambda_n} - \Sigma_Y\|_\infty = O_p(s_n + r^2)$ as $n \to \infty$ for any $w \in [1, \infty]$. Moreover, if we additionally assume [C5]–[C6], then $\lambda_n^{-1}|||\hat{\Sigma}_{Y,\lambda_n}^{-1} - \Sigma_Y^{-1}|||_2 = O_p(s_n + r)$ and $\lambda_n^{-1}|||\hat{\Sigma}_{Y,\lambda_n}^{-1} - \Sigma_Y^{-1}|||_\infty = O_p(r^{3/2}d_n(s_n + r))$ as $n \to \infty$.
Remark 9
(Optimal convergence rate). From Theorem 1, the convergence rate of $\hat{\Theta}_{Z,\lambda_n}$ to $\Theta_Z$ in the $\ell_w$-operator norm for any $w \in [1, \infty]$ can be arbitrarily close to $s_n\sqrt{(\log d)/n}$, which is similar to that in a standard i.i.d. setting (cf. Theorem 14.1.3 in [24]). On the other hand, in the Gaussian i.i.d. setting without factor structure, the minimax optimal rate for this problem is known to be $d(\Theta_Z)\sqrt{(\log d)/n}$ (see [45] (Theorem 1.1) and [46] (Theorem 5)), which can be faster than $s_n\sqrt{(\log d)/n}$. In a standard i.i.d. setting, this rate can be attained by using a node-wise penalized regression (see e.g., [46] (Section 3.1)), so it would be interesting to study the convergence rate of such a method in our setting. We leave it to future research. In the meantime, such a method does not ensure the positive definiteness of the estimated precision matrix in general, so our estimator would be preferable for some practical applications such as portfolio allocation.
Next we derive the asymptotic mixed normality of the de-biased estimator in the present setting. As announced, we accomplish this purpose with the help of Malliavin calculus. In the following we will freely use standard concepts and notation from Malliavin calculus. We refer to [47,48] (Chapter 1) for detailed treatments of this subject.
We consider the Malliavin calculus with respect to $W$. For any real number $p \geq 1$ and any integer $k \geq 1$, $\mathbb{D}^{k,p}$ denotes the stochastic Sobolev space of random variables which are $k$ times differentiable in the Malliavin sense and whose derivatives up to order $k$ have finite moments of order $p$. If $F \in \mathbb{D}^{k,p}$, we denote by $D^kF$ the $k$-th Malliavin derivative of $F$, which is a random variable taking values in $L^2([0,1]^k; (\mathbb{R}^d)^{\otimes k})$. Here, we identify the space $(\mathbb{R}^d)^{\otimes k}$ with the set of all $d$-dimensional $k$-way arrays, i.e., real-valued functions on $\{1, \dots, d\}^k$. Since $D^kF$ is a random function on $[0,1]^k$, we can consider the value $D^kF(t_1, \dots, t_k)$ evaluated at $(t_1, \dots, t_k) \in [0,1]^k$. We denote this value by $D_{t_1,\dots,t_k}F$. Moreover, since $D_{t_1,\dots,t_k}F$ takes values in $(\mathbb{R}^d)^{\otimes k}$, we can consider the value $D_{t_1,\dots,t_k}F(a_1, \dots, a_k)$ evaluated at $(a_1, \dots, a_k) \in \{1, \dots, d\}^k$. This value is denoted by $D_{t_1,\dots,t_k}^{(a_1,\dots,a_k)}F$. We remark that the variable $D_{t_1,\dots,t_k}F$ is defined only a.e. on $[0,1]^k \times \Omega$ with respect to the product of the Lebesgue measure on $[0,1]^k$ and $P$. Therefore, if $D_{t_1,\dots,t_k}F$ satisfies some property a.e. on $[0,1]^k \times \Omega$ with respect to this product measure, by convention we will always take a version of $D_{t_1,\dots,t_k}F$ satisfying that property everywhere on $[0,1]^k \times \Omega$ if necessary. We set $\mathbb{D}^{k,\infty} := \bigcap_{p=1}^{\infty}\mathbb{D}^{k,p}$. We denote by $\mathbb{D}^{k,\infty}(\mathbb{R}^d)$ the space of all $d$-dimensional random variables $F$ such that $F^i \in \mathbb{D}^{k,\infty}$ for every $i = 1, \dots, d$. The space $\mathbb{D}^{k,\infty}(\mathbb{R}^{d \times r})$ is defined in an analogous way. Finally, for any $(\mathbb{R}^d)^{\otimes k}$-valued random variable $F$ and $p \in (0, \infty]$, we set
$$\|F\|_{p,2} := \Bigl\|\sqrt{\textstyle\sum_{a_1,\dots,a_k=1}^{d}F(a_1,\dots,a_k)^2}\Bigr\|_p.$$
We also need to define some variables related to the "asymptotic" covariance matrices of the estimators. We define the $d^2 \times d^2$ random matrix $C_n$ by
$$C_n^{(i-1)d+j,(k-1)d+l} := n\sum_{h=1}^{n}\left\{\left(\int_{(h-1)/n}^{h/n}c_s^{ik}\,ds\right)\left(\int_{(h-1)/n}^{h/n}c_s^{jl}\,ds\right) + \left(\int_{(h-1)/n}^{h/n}c_s^{il}\,ds\right)\left(\int_{(h-1)/n}^{h/n}c_s^{jk}\,ds\right)\right\}, \quad i,j,k,l = 1,\dots,d,$$
where $c_s := \sigma_s\sigma_s^\top$. Then we set $V_n := (\Theta_Z \otimes \Theta_Z)C_n(\Theta_Z \otimes \Theta_Z)$ and $S_n := \operatorname{diag}(V_n)^{1/2}$. In addition, under [E1], we define $C_n(\nu)$ similarly to $C_n$ with $\sigma$ replaced by $\sigma(\nu)$. $C_n$ and $V_n$ play the roles of the asymptotic covariance matrices of $\breve{\Sigma}_{Z,n}$ and $\hat{\Theta}_{Z,\lambda_n}$, respectively.
We impose the following assumptions on the model.
(F1)
We have [E1], and $\Sigma_Z(\nu) := \int_0^1 c(\nu)_t\,dt$ is a.s. invertible for all $n, \nu \in \mathbb{N}$. Moreover, for all $n, \nu \in \mathbb{N}$ and $t \in [0,1]$, $\mu(\nu)_t \in \mathbb{D}^{1,\infty}(\mathbb{R}^d)$, $\sigma(\nu)_t \in \mathbb{D}^{2,\infty}(\mathbb{R}^{d \times r})$ and
$$\sup_{n \in \mathbb{N}}\max_{1 \leq i \leq d}\sup_{0 \leq s,t \leq 1}\|D_s\mu(\nu)_t^i\|_{\infty,2} < \infty,$$
$$\sup_{n \in \mathbb{N}}\max_{1 \leq i \leq d}\left\{\sup_{0 \leq s,t \leq 1}\|D_s\sigma(\nu)_t^{i\cdot}\|_{\infty,2} + \sup_{0 \leq s,t,u \leq 1}\|D_{s,t}\sigma(\nu)_u^{i\cdot}\|_{\infty,2}\right\} < \infty,$$
$$\sup_{n \in \mathbb{N}}\left\{\max_{1 \leq i \leq d}\Theta_Z(\nu)^{ii} + \max_{1 \leq k \leq d^2}1/V_n(\nu)^{kk}\right\} < \infty,$$
where $\Theta_Z(\nu) := \Sigma_Z(\nu)^{-1}$ and $V_n(\nu) := (\Theta_Z(\nu) \otimes \Theta_Z(\nu))C_n(\nu)(\Theta_Z(\nu) \otimes \Theta_Z(\nu))$.
(F2)
The $d \times d$ matrix $Q_Z := (1_{\{\Theta_Z^{ij} \neq 0\}})_{1 \leq i,j \leq d}$ is non-random and $d(Q_Z) = O(1)$ as $n \to \infty$.
(F3)
$r = O(d)$ and $(\log d)^{13}/n \to 0$ as $n \to \infty$.
We give a few remarks on these assumptions. First, [F1] imposes the (local) Malliavin differentiability on the coefficient processes of the residual process Z and the local boundedness on their Malliavin derivatives. Such an assumption is necessary for the application of the high-dimensional mixed normal limit theorem of [20] to our setting (see Lemma A16). Please note that we do not need to impose this type of assumption on the factor process X. We also remark that analogous assumptions are sometimes used in the literature of high-frequency financial econometrics even in low-dimensional settings; see e.g., [49,50]. Second, [F2] is clearly understood when we consider a Gaussian graphical model associated with Σ Z : The non-randomness of Q Z implies that the edge structure of this Gaussian graphical model is determined in a non-random manner (by conditioning, it is indeed sufficient that the edge structure is determined independently of the driving Wiener process W). Also, we remark that the condition d ( Q Z ) = O ( 1 ) is equivalent to [C5] with d n = 1 . It is seemingly possible to relax this condition so that it allows a diverging sequence d n as long as d n ( log d ) κ / n 0 for an appropriate constant κ > 0 . However, to determine the precise value of κ , we need to carefully revise the proof of Lemma A16 so that it allows the quantity inside sup n N in (A7) to diverge as n . To avoid such an additional complexity, we restrict our attention to the case of d n = 1 . Third, the condition ( log d ) 13 / n 0 in [F3] is used again for applying the high-dimensional CLT of [20].
Now we are ready to state our result. Let $\mathcal{A}^{\mathrm{re}}(d^2)$ be the set of all hyperrectangles in $\mathbb{R}^{d^2}$, i.e., $\mathcal{A}^{\mathrm{re}}(d^2)$ consists of all sets $A$ of the form $A = \{x \in \mathbb{R}^{d^2} : a_j \leq x_j \leq b_j \text{ for all } j = 1, \dots, d^2\}$ for some $a_j \leq b_j$, $j = 1, \dots, d^2$.
Theorem 2.
Assume [C1]–[C4] and [F1]–[F3]. Let $\lambda_n$ be a sequence of positive-valued random variables such that $\lambda_n^{-1}\sqrt{(\log d)/n} \to^p 0$, $(s_n + r)\lambda_n \to^p 0$ and $\lambda_n^2 s_n\sqrt{n\log d} \to^p 0$ as $n \to \infty$. Then we have
$$\sup_{A \in \mathcal{A}^{\mathrm{re}}(d^2)}\left|P\left(\sqrt{n}\operatorname{vec}(\hat{\Theta}_{Z,\lambda_n} - \Gamma_{Z,n} - \Theta_Z) \in A\right) - P\left(V_n^{1/2}\zeta_n \in A\right)\right| \to 0$$
and
$$\sup_{A \in \mathcal{A}^{\mathrm{re}}(d^2)}\left|P\left(\sqrt{n}S_n^{-1}\operatorname{vec}(\hat{\Theta}_{Z,\lambda_n} - \Gamma_{Z,n} - \Theta_Z) \in A\right) - P\left(S_n^{-1}V_n^{1/2}\zeta_n \in A\right)\right| \to 0$$
as $n \to \infty$.
Remark 10.
$\lambda_n$ is typically chosen to be of order as close to $\sqrt{\log d/n}$ as possible, so $\lambda_n^2 s_n\sqrt{n\log d} \to^p 0$ is almost equivalent to $s_n(\log d)^{3/2}/\sqrt{n} \to 0$. This is stronger than the condition $s_n(\log d)/\sqrt{n} \to 0$ which is used to derive the asymptotic normality of the de-biased weighted graphical Lasso estimator in [24] (Theorem 14.1.6) (note that we assume $d(\Theta_Z) = O_p(1)$). This is because Theorem 2 derives approximations of the joint distributions of the de-biased estimator and its Studentization, while [24] (Theorem 14.1.6) focuses only on approximation of their marginal distributions.
Theorem 2 is statistically infeasible in the sense that $V_n$ is unobservable. Thus, we need to estimate it from the data. Since $\Theta_Z$ is naturally estimated by $\hat{\Theta}_{Z,\lambda_n}$, we construct an estimator for $C_n$. Define the $d^2$-dimensional random vectors $\hat{\chi}_h$ by
$$\hat{\chi}_h := \operatorname{vec}\left((\hat{Z}_{h/n} - \hat{Z}_{(h-1)/n})(\hat{Z}_{h/n} - \hat{Z}_{(h-1)/n})^\top\right), \qquad h = 1, \dots, n,$$
where $\hat{Z}_{h/n} := Y_{h/n} - \hat{\beta}_nX_{h/n}$. Then we set
$$\hat{C}_n := n\sum_{h=1}^{n}\hat{\chi}_h\hat{\chi}_h^\top - \frac{n}{2}\sum_{h=1}^{n-1}\left(\hat{\chi}_h\hat{\chi}_{h+1}^\top + \hat{\chi}_{h+1}\hat{\chi}_h^\top\right).$$
Lemma 3.
Suppose that the assumptions of Theorem 2 are satisfied. Suppose also that $r^2\sqrt{(\log d)/n} = O(1)$ as $n \to \infty$ and that there is a constant $\gamma \in (0, \frac{1}{2}]$ such that
$$\sup_{0 < t \leq 1 - \frac{1}{n}}\max_{1 \leq i,j \leq d}\left\|c(\nu)_{t+\frac{1}{n}}^{ij} - c(\nu)_t^{ij}\right\|_2 = O(n^{-\gamma})$$
as $n \to \infty$ for all $\nu \in \mathbb{N}$. Then, $\|\hat{C}_n - C_n\|_\infty = O_p(r(\log d)^{5/2}/\sqrt{n} + n^{-\gamma})$ as $n \to \infty$.
Let us set $\hat{V}_n := (\hat{\Theta}_{Z,\lambda_n} \otimes \hat{\Theta}_{Z,\lambda_n})\hat{C}_n(\hat{\Theta}_{Z,\lambda_n} \otimes \hat{\Theta}_{Z,\lambda_n})$ and $\hat{S}_n := \operatorname{diag}(\hat{V}_n)^{1/2}$.
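In R, these quantities can be assembled directly from the estimated residual increments. The following sketch is illustrative only (names are assumptions) and is practical only for small $d$, since $\hat{C}_n$ is a $d^2 \times d^2$ matrix.

```r
# Estimators of the asymptotic covariance quantities: C_hat_n (from the chi_hat_h),
# V_hat_n = (Theta (x) Theta) C_hat_n (Theta (x) Theta), and S_hat_n = diag(V_hat_n)^{1/2}.
avar_estimators <- function(Z_hat_obs, Theta_Z_hat) {
  dZ  <- diff(Z_hat_obs)                                         # increments of Z_hat
  n   <- nrow(dZ)
  chi <- t(apply(dZ, 1, function(z) as.vector(tcrossprod(z))))   # rows are chi_hat_h
  C_hat <- n * crossprod(chi) -
    (n / 2) * (crossprod(chi[-n, , drop = FALSE], chi[-1, , drop = FALSE]) +
               crossprod(chi[-1, , drop = FALSE], chi[-n, , drop = FALSE]))
  TT    <- kronecker(Theta_Z_hat, Theta_Z_hat)
  V_hat <- TT %*% C_hat %*% TT
  list(C = C_hat, V = V_hat, S = diag(sqrt(diag(V_hat))))
}
```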
Corollary 1.
Under the assumptions of Lemma 3, we have the following results:
(a)
Assume $s_n\lambda_n\log d \to^p 0$ and $r(\log d)^{7/2}/\sqrt{n} + n^{-\gamma}\log d \to 0$ as $n \to \infty$. Then,
$$\lim_{n \to \infty}\sup_{A \in \mathcal{A}^{\mathrm{re}}(d^2)}\left|P\left(\sqrt{n}\hat{S}_n^{-1}\operatorname{vec}(\hat{\Theta}_{Z,\lambda_n} - \Gamma_{Z,n} - \Theta_Z) \in A\right) - P\left(S_n^{-1}V_n^{1/2}\zeta_n \in A\right)\right| = 0.$$
(b)
Assume $s_n\lambda_n(\log d)^2 \to^p 0$ and $r(\log d)^{9/2}/\sqrt{n} + n^{-\gamma}(\log d)^2 \to 0$ as $n \to \infty$. Then,
$$\sup_{A \in \mathcal{A}^{\mathrm{re}}(d^2)}\left|P\left(\hat{V}_n^{1/2}\zeta_n \in A \mid \mathcal{F}\right) - P\left(V_n^{1/2}\zeta_n \in A \mid \mathcal{F}\right)\right| \to^p 0, \qquad \sup_{A \in \mathcal{A}^{\mathrm{re}}(d^2)}\left|P\left(\hat{S}_n^{-1}\hat{V}_n^{1/2}\zeta_n \in A \mid \mathcal{F}\right) - P\left(S_n^{-1}V_n^{1/2}\zeta_n \in A \mid \mathcal{F}\right)\right| \to^p 0$$
as $n \to \infty$.
Corollary 1(a) particularly implies that
$$\lim_{n \to \infty}\max_{1 \leq i,j \leq d}\sup_{x \in \mathbb{R}}\left|P\left(\frac{\sqrt{n}(\hat{\Theta}_{Z,\lambda_n}^{ij} - \Gamma_{Z,n}^{ij} - \Theta_Z^{ij})}{\hat{s}_n^{ij}} \leq x\right) - \Phi(x)\right| = 0, \qquad (21)$$
where $\hat{s}_n^{ij} := \hat{S}_n^{(i-1)d+j,(i-1)d+j}$ and $\Phi$ is the standard normal distribution function. This result can be used to construct entry-wise confidence intervals for $\Theta_Z$. Meanwhile, combining Corollary 1(b) with [20] (Proposition 3.2), we can estimate the quantiles of $\max_{k \in \mathcal{K}}(V_n^{1/2}\zeta_n)^k$ and $\max_{k \in \mathcal{K}}(S_n^{-1}V_n^{1/2}\zeta_n)^k$ for a given set of indices $\mathcal{K} \subset \{1, \dots, d^2\}$ by simulation. Such a result can be used to construct simultaneous confidence intervals and control the family-wise error rate in multiple testing for entries of $\Theta_Z$; see Sections 2.3–2.4 of [51] for details.
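For instance, an entry-wise confidence interval based on Equation (21) could be computed as in the following illustrative R sketch, where Theta_deb denotes the de-biased estimate $\hat{\Theta}_{Z,\lambda_n} - \Gamma_{Z,n}$ and S_hat the matrix $\hat{S}_n$ from the previous sketch (both names are hypothetical):

```r
# Entry-wise (1 - alpha) confidence interval for Theta_Z^{ij} based on Equation (21).
ci_entry <- function(Theta_deb, S_hat, i, j, n, alpha = 0.05) {
  d    <- nrow(Theta_deb)
  s_ij <- S_hat[(i - 1) * d + j, (i - 1) * d + j]   # s_hat_n^{ij}
  half <- qnorm(1 - alpha / 2) * s_ij / sqrt(n)
  c(lower = Theta_deb[i, j] - half, upper = Theta_deb[i, j] + half)
}
```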
As announced, another application of our result is to construct an estimator with selection consistency via thresholding. This is carried out by using the following result:
Corollary 2.
Let $\alpha_n \in (0,1)$ $(n = 1, 2, \dots)$ satisfy $\alpha_n \to \alpha$ and $\log\alpha_n = O(\log d)$ as $n \to \infty$ for some $\alpha \in [0,1)$. Define $c_n := \Phi^{-1}\left(1 - \frac{\alpha_n}{d(d-1)}\right)$ and
$$\hat{S}_n(\Theta_Z) := \left\{(i,j) : i \neq j \text{ and } \frac{\sqrt{n}\,|\hat{\Theta}_{Z,\lambda_n}^{ij} - \Gamma_{Z,n}^{ij}|}{\hat{s}_n^{ij}} > c_n\right\}.$$
Then, under the assumptions of Corollary 1(a), we have
$$\liminf_{n \to \infty}P\left(\hat{S}_n(\Theta_Z) = S(\Theta_Z)\right) \geq 1 - 2\alpha,$$
provided that $\sqrt{n/\log d}\,\min_{(i,j) \in S(\Theta_Z)}|\Theta_Z^{ij}| \to^p \infty$ as $n \to \infty$.
Please note that the last condition is satisfied if $\min_{(i,j) \in S(\Theta_Z)}|\Theta_Z^{ij}|$ is bounded away from zero because $\sqrt{n/\log d} \to \infty$ under our assumptions. Taking the sequence $\alpha_n$ so that $\alpha = 0$ in Corollary 2, we can asymptotically recover the support of $\Theta_Z$. In this case, if we define $\tilde{\Theta}_{Z,\lambda_n} = (\tilde{\Theta}_{Z,\lambda_n}^{ij})_{1 \leq i,j \leq d}$ by
$$\tilde{\Theta}_{Z,\lambda_n}^{ij} = \begin{cases}\hat{\Theta}_{Z,\lambda_n}^{ij} - \Gamma_{Z,n}^{ij} & \text{if } i = j \text{ or } (i,j) \in \hat{S}_n(\Theta_Z),\\ 0 & \text{otherwise},\end{cases}$$
$\tilde{\Theta}_{Z,\lambda_n}$ will be oracle in the sense of [38]. However, we note that the estimator $\tilde{\Theta}_{Z,\lambda_n}$ would not be continuous in the data, so it would not satisfy the third desirable property in [38] (p. 1349). To construct an oracle estimator for $\Theta_Z$ which is continuous in the data, we will need to consider a non-concave penalized estimator as in [52]. This is left to future research.
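A direct R transcription of this thresholding step (illustrative only; Theta_deb and S_hat are the hypothetical objects used in the earlier sketches) might look as follows:

```r
# Support recovery by thresholding (Corollary 2) and the resulting thresholded
# estimator Theta_tilde; alpha_n is the nominal level of the procedure.
threshold_precision <- function(Theta_deb, S_hat, n, alpha_n = 0.05) {
  d    <- nrow(Theta_deb)
  c_n  <- qnorm(1 - alpha_n / (d * (d - 1)))
  s_n  <- matrix(diag(S_hat), d, d, byrow = TRUE)   # s_hat_n^{ij} = S_hat^{(i-1)d+j,(i-1)d+j}
  keep <- sqrt(n) * abs(Theta_deb) / s_n > c_n      # estimated support S_hat_n(Theta_Z)
  diag(keep) <- TRUE                                # diagonal entries are always retained
  Theta_tilde <- ifelse(keep, Theta_deb, 0)
  list(support = which(keep & row(keep) != col(keep), arr.ind = TRUE),
       Theta_tilde = Theta_tilde)
}
```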

5. Simulation Study

5.1. Implementation

To implement the proposed estimation procedure, we need to solve the optimization problem in Equation (10). Among many existing algorithms to solve this problem, we employ the GLASSOFAST algorithm of [53], which is an improved implementation of the popular GLASSO algorithm of [54] and implemented in the R package glassoFast.
The remaining problem is how to select the penalty parameter $\lambda$. Following [17,55], we select it by minimizing the following formally defined Bayesian information criterion (BIC):
$$\mathrm{BIC}(\lambda) := n\left\{\operatorname{tr}(\hat{\Theta}_{Z,\lambda}\hat{\Sigma}_{Z,n}) - \log\det\hat{\Theta}_{Z,\lambda}\right\} + (\log n)\sum_{i \neq j}1_{\{\hat{\Theta}_{Z,\lambda}^{ij} \neq 0\}}.$$
The minimization is carried out by grid search. The grid $\{\lambda_1, \dots, \lambda_m\}$ is constructed analogously to the R package glmnet (see Section 2.5 of [56] for details): First, as the maximum value $\lambda_{\max}$ of the grid, we take the smallest value for which all the off-diagonal entries of $\hat{\Theta}_{Z,\lambda_{\max}}$ are zero; in our case, $\lambda_{\max}$ is set to the maximum modulus of the off-diagonal entries of $\hat{\Sigma}_{Z,n}$ (cf. [57] (Corollary 1)). Next, we take a constant $\varepsilon > 0$ and set $\lambda_{\min} := \varepsilon\lambda_{\max}$ as the minimum value of the grid. Finally, we construct the values $\lambda_1, \dots, \lambda_m$ increasing from $\lambda_{\min}$ to $\lambda_{\max}$ on the log scale:
$$\lambda_i = \exp\left\{\log(\lambda_{\min}) + \frac{i-1}{m-1}\log(\lambda_{\max}/\lambda_{\min})\right\}, \qquad i = 1, \dots, m.$$
We use $\varepsilon = \sqrt{(\log d)/n}$ and $m = 10$ in our experiments. (The computation procedure of the weighted graphical Lasso described here is implemented in the R package yuima as the function cce.factor with the option regularize="glasso" since version 1.9.2.)
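The BIC-based selection could be sketched in R as follows, reusing the hypothetical weighted_glasso helper; this is an illustration of the description above, not the yuima implementation.

```r
# Select lambda by minimizing the formal BIC over a glmnet-style log-scale grid.
select_lambda_bic <- function(Sigma_Z_hat, n, m = 10) {
  d   <- nrow(Sigma_Z_hat)
  off <- row(Sigma_Z_hat) != col(Sigma_Z_hat)
  lambda_max <- max(abs(Sigma_Z_hat[off]))          # smallest lambda giving a diagonal estimate
  lambda_min <- sqrt(log(d) / n) * lambda_max       # epsilon = sqrt((log d)/n)
  grid <- exp(seq(log(lambda_min), log(lambda_max), length.out = m))
  bic <- sapply(grid, function(lam) {
    Theta  <- weighted_glasso(Sigma_Z_hat, lam)$Theta
    loglik <- sum(Theta * Sigma_Z_hat) -
      as.numeric(determinant(Theta, logarithm = TRUE)$modulus)
    n * loglik + log(n) * sum(abs(Theta[off]) > 1e-10)
  })
  grid[which.min(bic)]
}
```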

5.2. Simulation Design

We basically follow the setting of [8]. We simulate the model (7) with the following specification: For the factor process $X$, we set $r = 3$ and
$$dX_t^j = \mu^j\,dt + \sqrt{v_t^j}\,dW_t^j, \qquad dv_t^j = \kappa^j(\theta^j - v_t^j)\,dt + \eta^j\sqrt{v_t^j}\left(\rho^j\,dW_t^j + \sqrt{1-(\rho^j)^2}\,d\tilde{W}_t^j\right), \qquad j = 1, 2, 3,$$
where $W^1, W^2, W^3, \tilde{W}^1, \tilde{W}^2, \tilde{W}^3$ are independent standard Wiener processes. We set $\kappa = (3, 4, 5)$, $\theta = (0.09, 0.04, 0.06)$, $\eta = (0.3, 0.4, 0.3)$, $\rho = (-0.6, -0.4, -0.25)$ and $\mu = (0.05, 0.03, 0.02)$. The initial value $v_0^j$ is drawn from the stationary distribution of the process $(v_t^j)_{t \in [0,1]}$, i.e., the gamma distribution with shape $2\kappa^j\theta^j/(\eta^j)^2$ and rate $2\kappa^j/(\eta^j)^2$. The entries of the loading matrix $\beta$ are independently drawn as $\beta^{i1} \sim_{\mathrm{i.i.d.}} U[0.25, 2.25]$ and $\beta^{i2}, \beta^{i3} \sim_{\mathrm{i.i.d.}} U[-0.5, 0.5]$ ($U[a,b]$ denotes the uniform distribution on $[a,b]$). Finally, as the residual process $Z$, we take a $d$-dimensional Wiener process with covariance matrix $Q$. We consider the following two designs for $Q$:
Design 1
Q is a block diagonal matrix with 10 blocks of size ( d / 10 ) × ( d / 10 ) . Each block has diagonal entries independently generated from U [ 0.2 , 0.5 ] and a constant correlation of 0.25 .
Design 2
We simulate a Chung–Lu random graph $\mathcal{G}$ and set $Q := (E_d + D - A)^{-1}$, where $D$ and $A$ are respectively the degree and adjacency matrices of the random graph $\mathcal{G}$. Formally, given a weight vector $w \in \mathbb{R}^d$ with $w \geq 0$, $A$ is defined as a $d \times d$ symmetric random matrix such that all the diagonal entries of $A$ are equal to 0 and the off-diagonal upper triangular entries are generated by independent Bernoulli variables so that $P(A^{ij} = 1) = 1 - P(A^{ij} = 0) = w^iw^j/\sum_{k=1}^{d}w^k$ for $i < j$. Then, $D$ is defined as the diagonal matrix such that the $j$-th diagonal entry of $D$ is given by $d_j(A) = \sum_{i=1}^{d}A^{ij}$. The weight vector $w$ is specified as follows: For every $i = 1, \dots, d$, we set $w^i := c\{(i + i_0 - 1)/d\}^{-1/(\alpha-1)}$ with $i_0 := d(c/w_M)^{\alpha-1}$ and $c := \frac{\alpha-2}{\alpha-1}$, where we use $\alpha = 2.5$ and $w_M = d^{0.45}$.
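A small R sketch that generates $Q$ according to this description (illustrative only; the min() call is merely a guard in case a product of weights exceeds the normalization):

```r
# Residual covariance of Design 2: Q = (E_d + D - A)^{-1} for a Chung-Lu random graph.
make_design2_Q <- function(d, alpha = 2.5, wM = d^0.45) {
  cc <- (alpha - 2) / (alpha - 1)
  i0 <- d * (cc / wM)^(alpha - 1)
  w  <- cc * ((1:d + i0 - 1) / d)^(-1 / (alpha - 1))   # power-law weights
  A  <- matrix(0, d, d)
  for (i in 1:(d - 1)) for (j in (i + 1):d) {
    A[i, j] <- A[j, i] <- rbinom(1, 1, min(1, w[i] * w[j] / sum(w)))
  }
  D <- diag(colSums(A))                                # degree matrix
  solve(diag(d) + D - A)
}
```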
Design 1 is the same one as in [8]. Design 2 is motivated by the recent work of Barigozzi et al. [42], which reports that several characteristics of the residual precision matrix of the S&P 500 assets exhibit power-law behaviors and they are well-described by the power-law partial correlation network model proposed in [42]; the specification in Design 2 is the same one as in the simulation study of [42].
We observe the processes Y and X at the equidistant times h / n , h = 0 , 1 , , n . We set d = 500 and vary n as n { 78 , 130 , 195 , 390 , 780 } . We run 10,000 Monte Carlo iterations for each experiment.

5.3. Results

We begin by assessing the estimation accuracy of the proposed estimator in various norms. For comparison, we consider the following 5 different methods to estimate Σ Y :
RC
We simply use the realized covariance matrix $\widehat{[Y, Y]}_1^n$ defined by Equation (14) to estimate $\Sigma_Y$.
glasso
We estimate $\Sigma_Y^{-1}$ by the (unweighted) graphical Lasso based on $\widehat{[Y, Y]}_1^n$. Then, $\Sigma_Y$ is estimated by its inverse.
wglasso
We estimate $\Sigma_Y^{-1}$ by the weighted graphical Lasso based on $\widehat{[Y, Y]}_1^n$ (i.e., the estimator defined by Equation (1) with $\hat{\Sigma}_n = \widehat{[Y, Y]}_1^n$). Then, $\Sigma_Y$ is estimated by its inverse.
f-glasso
We estimate $\Sigma_Z^{-1}$ by the (unweighted) graphical Lasso based on $\hat{\Sigma}_{Z,n}$ defined by Equation (9) with $\hat{\Sigma}_{Y,n} = \widehat{[Y, Y]}_1^n$ and $\hat{\Sigma}_{X,n} = \widehat{[X, X]}_1^n$. Then, $\Sigma_Y$ is estimated by Equation (11) with $\hat{\Theta}_{Z,\lambda}$ being the estimator so constructed.
f-wglasso
We estimate $\Sigma_Z^{-1}$ by the weighted graphical Lasso based on $\hat{\Sigma}_{Z,n}$ defined by Equation (9) with $\hat{\Sigma}_{Y,n} = \widehat{[Y, Y]}_1^n$ and $\hat{\Sigma}_{X,n} = \widehat{[X, X]}_1^n$. Then, $\Sigma_Y$ is estimated by Equation (11) with $\hat{\Theta}_{Z,\lambda}$ being the estimator so constructed.
In addition, for Design 1, we also consider the estimator proposed in [8]: Assuming that we know which entries of $\Sigma_Z$ are zero, we estimate $\Sigma_Y$ by $\hat{\beta}_n\hat{\Sigma}_{X,n}\hat{\beta}_n^\top + (\hat{\Sigma}_{Z,n}^{ij}1_{\{\Sigma_Z^{ij} \neq 0\}})_{1 \leq i,j \leq d}$. We label this method f-thr. Since the estimates of RC and f-thr are not always non-singular, we use their Moore–Penrose generalized inverses to estimate $\Sigma_Y^{-1}$ when they are singular. Please note that the methods glasso and f-glasso correspond to those proposed in [17], while wglasso and f-wglasso are those proposed in this paper. We report the simulation results in Table 1 and Table 2.
We first focus on the accuracy of estimating the precision matrix $\Sigma_Y^{-1}$. The tables reveal the excellent performance of the graphical Lasso-based methods. In particular, they outperform f-thr in Design 1 except for the case $n = 780$, even when we ignore the factor structure of the model. Nevertheless, the tables also show an apparent benefit of taking the factor structure into account when constructing the graphical Lasso type estimators. When we compare the weighted graphical Lasso estimators with the unweighted versions, the weighted ones tend to outperform the unweighted ones as $n$ increases, especially when the factor structure is taken into account. This is more pronounced in Design 2. It is also worth mentioning that the estimation errors for $\Sigma_Y^{-1}$ in the method RC are greater at $n = 390, 780$ than those at $n = 78, 130, 195$. This is presumably due to a "resonance" effect between the sample size $n$ and the dimension $d$ coming from the use of the Moore–Penrose generalized inverse, which is well-known in multivariate analysis (see e.g., [58]): The estimation error for the precision matrix by the generalized inverse of the sample covariance matrix drastically increases as $n$ approaches $d$. Theoretically, this occurs because the smallest non-zero eigenvalue of the sample covariance matrix tends to 0 as $n$ approaches $d$.
Turning to the estimation accuracy for $\Sigma_Y$ in terms of the $\infty$-norm, we find little advantage in using the graphical Lasso type methods over the realized covariance matrix: f-glasso and f-wglasso tend to outperform RC at small values of $n$, but the differences in performance become less clear as $n$ increases. From a theoretical point of view, this is not surprising because the realized covariance matrix is a consistent estimator for $\Sigma_Y$ in the $\infty$-norm with the convergence rate $\sqrt{(\log d)/n}$; this can be seen from e.g., Lemma A15. Meanwhile, in Design 1, f-thr performs the best in terms of estimating $\Sigma_Y$ at all values of $n$.
Next we assess the accuracy of the mixed normal approximation for the de-biased estimator. For this purpose, we construct entry-wise confidence intervals for $\Theta_Z$ based on Equation (21) (taking the factor structure into account) and evaluate their empirical coverages. Table 3 reports these coverages averaged over the sets $\{(i,j) : i \neq j, \Theta_Z^{ij} = 0\}$ and $\{(i,j) : i \neq j, \Theta_Z^{ij} \neq 0\}$, respectively. We see from the table that the asymptotic approximation works very well for constructing confidence intervals for the zero entries of $\Theta_Z$. By contrast, confidence intervals for non-zero entries of $\Theta_Z$ tend to over-cover, especially in Design 1. However, these coverage distortions become moderate at larger values of $n$, which suggests that the normal approximation starts to work for relatively large sample sizes.

6. Empirical Application

To illustrate the applicability of the proposed method to real data analysis, we conduct a simple empirical study using high-frequency financial data. We take 1 March 2018 as the observation interval [ 0 , 1 ] and the log-price processes of the component stocks of the S&P 500 index as the process Y. In addition, as is often performed in the literature, we regard the SPDR S&P 500 ETF (SPY) as the observable factor process X. We use 5-minute returns to compute the estimators presented in Section 4. The dataset is provided by Bloomberg. Please note that our setting implies d = 504 and n = 77 , yielding a high-dimensional setting considered in this paper (note that our dataset does not contain observations at the market opening).
The selection procedure presented in Section 5.1 suggests $\lambda_n \approx 0.272$. We then estimate the support $S(\Theta_Z)$ of $\Theta_Z$ by the estimator $\hat{S}_n(\Theta_Z)$ of Corollary 2 with $\alpha_n = 0.05$. Figure 1 shows the partial correlation network induced by $\hat{S}_n(\Theta_Z)$, drawn with the R package igraph. Specifically, it depicts the undirected graph whose vertices are the S&P 500 component stocks and whose edge set is given by $\hat{S}_n(\Theta_Z)$. To illuminate the relationship between the network and sector structures, we color the vertices according to their Global Industry Classification Standard (GICS) sectors. We find that there are strong interconnections in several sectors such as Consumer Staples, Energy, Real Estate and Utilities. The figure also suggests that the network has some characteristics that are commonly observed in scale-free networks: It consists of a giant component with several hubs and a few small components. This is consistent with an observation made in [42]. Indeed, in [42] the authors have proposed a model for $\Theta_Z$ that induces a scale-free partial correlation network. According to their model, the decay of the largest eigenvalues of $\Theta_Z$ also exhibits power-law behavior. More precisely, letting $\Lambda_1 \geq \cdots \geq \Lambda_d$ be the ordered eigenvalues of $\Theta_Z$, we have $\Lambda_i \approx i^{-\alpha}$ with some $\alpha > 0$ for moderate $i$ and large $d$. It is then interesting to check whether this is the case in our dataset. Figure 2 shows the log–log size–rank plot for the 50 largest eigenvalues of $\hat{\Theta}_{Z,\lambda_n}$. We see that, except for the three largest eigenvalues, they clearly display power-law behavior.

7. Conclusions

In this paper, we have developed a generic asymptotic theory to estimate the high-dimensional precision matrix of high-frequency data using the weighted graphical Lasso. We have shown that the consistency of the weighted graphical Lasso estimator in matrix operator norms follows from the consistency of the initial estimator in the $\infty$-norm, while the asymptotic mixed normality of its de-biased version follows from that of the initial estimator, where the asymptotic mixed normality has been formulated appropriately for the high-dimensional setting considered here. Our theory also encompasses a situation where a known factor structure is present in the data. In such a situation, we have applied the weighted graphical Lasso to the residual process obtained after removing the effect of factors.
We have applied the developed theory to the concrete situation where we can use the realized covariance matrix as the initial covariance estimator. We have derived the desirable asymptotic mixed normality of the realized covariance matrix by an application of the recent high-dimensional central limit theorem obtained in [20], where Malliavin calculus resolves the main theoretical difficulties caused by the high-dimensionality. Consequently, we have obtained a feasible asymptotic distribution theory to conduct inference for entries of the precision matrix. A Monte Carlo study has shown the good finite sample performance of our asymptotic theory.
A natural direction for future work is to apply the developed theory to a more complex situation where the process is asynchronously observed with noise and/or jumps. To accomplish this purpose, we need to establish the high-dimensional asymptotic mixed normality of relevant covariance estimators.

Funding

This research was funded by JST CREST Grant Number JPMJCR14D7 and JSPS KAKENHI Grant Numbers JP17H01100, JP18H00836, JP19K13668.

Acknowledgments

The author thanks two anonymous referees for their constructive comments that substantially improved the original version of this paper.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
i.i.d.   independent and identically distributed
MLE      maximum likelihood estimation
KKT      Karush–Kuhn–Tucker
CLT      central limit theorem

Appendix A. Matrix Inequalities

This appendix collects some elementary (but less trivial) inequalities for matrices used in the proofs of the main results.
Lemma A1.
Let A ∈ S_d. Then Λ_min(A) ≤ A_ii ≤ Λ_max(A) for every i = 1, …, d.
Proof. 
See Theorem 14 in [59] (Chapter 11). □
Lemma A2.
Let A ∈ S_d^+ and B ∈ R^{d×r}. Then Λ_max(B^⊤AB) ≤ Λ_max(B^⊤B) Λ_max(A) and Λ_min(B^⊤AB) ≥ Λ_min(B^⊤B) Λ_min(A).
Proof. 
Let x ∈ R^r be a unit eigenvector of B^⊤AB associated with Λ_max(B^⊤AB). Then, by Theorem 4 in [59] (Chapter 11) we have Λ_max(B^⊤AB) = x^⊤B^⊤ABx ≤ Λ_max(A) x^⊤B^⊤Bx ≤ Λ_max(A) Λ_max(B^⊤B). Therefore, we obtain the first inequality. The second one can be shown analogously. □
Lemma A3.
Let A, B ∈ S_d. Then |Λ_max(A) − Λ_max(B)| ∨ |Λ_min(A) − Λ_min(B)| ≤ |||A − B|||_2.
Proof. 
Noting the identity |||C|||_2 = Λ_max(C) ∨ (−Λ_min(C)) holding for any symmetric matrix C, the desired result follows from Weyl’s inequality (cf. Corollary 4.3.15 in [60]). □
Lemma A4.
For any A ∈ S_d, |||A|||_1 = |||A|||_∞ ≤ √(d(A)) |||A|||_2.
Proof. 
This is a straightforward consequence of the Schwarz inequality. □
Lemma A5.
Let A, B ∈ R^{r×r}. If A is invertible and |||A^{-1}(B − A)|||_w < 1 for some w ∈ [1, ∞], then B is invertible and
|||B^{-1} − A^{-1}|||_w ≤ |||A^{-1}|||_w |||A^{-1}(B − A)|||_w / (1 − |||A^{-1}(B − A)|||_w).
Proof. 
See pages 381–382 of [60]. □
Lemma A6.
Let A ∈ S_r and B, C ∈ R^{d×r}. Then
‖BAC^⊤‖_∞ ≤ |||A|||_2 max_{1≤i≤d} ‖B_{i·}‖_2 max_{1≤j≤d} ‖C_{j·}‖_2 ≤ r |||A|||_2 ‖B‖_∞ ‖C‖_∞.
Proof. 
This result has essentially been shown in [61] (Lemma A.7). Since A is symmetric, there is an orthogonal matrix U ∈ R^{r×r} such that Λ := U^⊤AU is a diagonal matrix. Now, for any i, j ∈ {1, …, d},
|(BAC^⊤)_{ij}| = |B_{i·} A C_{j·}^⊤| = |B_{i·} U Λ U^⊤ C_{j·}^⊤| = |Σ_{k=1}^r Λ_kk (B_{i·}U)_k (C_{j·}U)_k| ≤ max_{1≤k≤r} |Λ_kk| ‖U^⊤B_{i·}^⊤‖_2 ‖U^⊤C_{j·}^⊤‖_2 = |||A|||_2 ‖B_{i·}‖_2 ‖C_{j·}‖_2 ≤ r |||A|||_2 ‖B‖_∞ ‖C‖_∞.
This yields the desired result. □
Lemma A7.
Let A, B, C ∈ R^{d×d}. Then, for any i, j = 1, …, d,
|(BAC^⊤)_{ij}|² ≤ (Σ_{k=1}^d |B_ik|) (Σ_{l=1}^d |C_jl|) Σ_{k,l=1}^d |B_ik C_jl| (A_kl)².
Proof. 
This is a straightforward consequence of the Schwarz inequality. □
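The following base-R sketch is a numerical sanity check of Lemmas A3 and A5 on an arbitrary, well-conditioned example; no part of the proofs relies on it.
```r
## Numerical sanity check of Lemmas A3 and A5 (illustration only).
set.seed(1)
d <- 5
A <- diag(d) + crossprod(matrix(rnorm(d * d), d)) / d      # symmetric, eigenvalues >= 1
B <- A + crossprod(matrix(rnorm(d * d), d)) / (100 * d)    # small symmetric perturbation
specnorm <- function(M) max(svd(M)$d)                      # |||M|||_2 via singular values

## Lemma A3 (Weyl): |Lambda_max(A) - Lambda_max(B)| <= |||A - B|||_2
abs(max(eigen(A)$values) - max(eigen(B)$values)) <= specnorm(A - B)

## Lemma A5 with w = 2 (here |||A^{-1}(B - A)|||_2 < 1 by construction)
del <- specnorm(solve(A) %*% (B - A))
specnorm(solve(B) - solve(A)) <= specnorm(solve(A)) * del / (1 - del)
```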

Appendix B. Proofs for Section 2

Appendix B.1. Proof of Proposition 1

The following result has essentially been proven in [24] and gives an estimate for the “deterministic part” of oracle inequalities for graphical Lasso type estimators.
Proposition A1.
Let A 0 , A S d and assume A A 0 λ 0 for some λ 0 > 0 . Assume also that there are numbers L > 1 and λ > 0 such that L 1 Λ min ( A 0 ) Λ max ( A 0 ) L , 2 λ 0 λ ( 8 L c L ) 1 and 8 c L 2 s λ 2 + 2 c L λ 0 2 diag ( A ) diag ( A 0 ) 2 2 λ 0 / ( 2 L ) , where s : = s ( B 0 ) and c L : = 8 L 2 . Set B 0 : = A 0 1 . Then, for any B S d + + satisfying
tr B A log det B + λ B 1 tr B 0 A log det B 0 + λ B 0 1 ,
it holds that
B B 0 2 2 / c L + λ B B 0 1 8 c L 2 s λ 2 + 2 c L λ 0 2 diag ( A ) diag ( A 0 ) 2 2 .
We first prove Proposition A1 under an additional assumption:
Lemma A8.
Proposition A1 holds true if we additionally have B B 0 2 1 / ( 2 L ) .
Proof. 
Set Δ = B B 0 . By assumption we have Δ 2 1 / ( 2 L ) , so Lemma 2 in [24] implies that
E ( Δ ) : = tr Δ A 0 log det Δ + B 0 log det B 0
is well-defined and we have
E ( Δ ) c Δ 2 ,
where c = c L 1 . Moreover, Equation (A1) yields
E ( Δ ) + λ B 1 = tr Δ ( A A 0 ) + tr B A log det B + λ B 1 tr B 0 A + log det B 0 tr Δ ( A A 0 ) + tr B 0 A log det B 0 + λ B 0 1 tr B 0 A + log det B 0 = tr Δ ( A A 0 ) + λ B 0 1 .
Now, note that tr ( A 1 B 1 ) = tr ( A 1 B 1 ) + tr ( diag ( A 1 ) diag ( B 1 ) ) and | tr ( A 1 B 1 ) | A 1 B 1 1 for any A 1 , B 1 R d × d . Thus, we infer that
| tr Δ ( A A 0 ) | Δ 1 A A 0 + diag ( Δ ) 2 diag ( A ) diag ( A 0 ) 2 λ 0 Δ 1 + diag ( A ) diag ( A 0 ) 2 diag ( Δ ) 2 ,
where we use A A 0 λ 0 in the last line. Combining this with Equations (A2) and (A3), we conclude that
c Δ 2 2 + λ B 1 λ 0 Δ 1 + diag ( A ) diag ( A 0 ) 2 diag ( Δ ) 2 + λ B 0 1 .
Let S : = S ( B 0 ) . Also, for a subset I of { 1 , , d } 2 and a d × d matrix U, we define the d × d matrix U I = ( U I i j ) 1 i , j d by U I i j = U i j 1 { ( i , j ) I } . Then, by definition and assumption, we have B 1 = B S 1 + B S c 1 , Δ 1 = Δ S 1 + B S c 1 , B 0 1 Δ S 1 + B S 1 , λ 2 λ 0 , so we deduce
c Δ 2 2 + λ 2 B S c 1 3 λ 2 Δ S 1 + diag ( A ) diag ( A 0 ) 2 diag ( Δ ) 2 .
Consequently, we obtain
2 c Δ 2 2 + λ Δ 1 = 2 c Δ 2 2 + λ ( B S c 1 + Δ S 1 ) 4 λ Δ S 1 + 2 diag ( A ) diag ( A 0 ) 2 diag ( Δ ) 2 4 λ s Δ S 2 + 2 diag ( A ) diag ( A 0 ) 2 diag ( Δ ) 2 ( Schwarz inequality ) 8 s λ 2 / c 2 + c Δ S 2 2 / 2 + 2 diag ( A ) diag ( A 0 ) 2 2 / c + c diag ( Δ ) 2 2 / 2 ,
where we use the inequality x y ( x 2 + y 2 ) / 2 in the last line. Since Δ 2 2 = diag ( Δ ) 2 2 + Δ 2 2 , we conclude that
c Δ 2 2 + λ Δ 1 8 s λ 2 / c 2 + 2 diag ( A ) diag ( A 0 ) 2 2 / c ,
which completes the proof. □
Proof of Proposition A1.
Thanks to Lemma A8, it suffices to prove B B 0 2 1 / ( 2 L ) .
Set B ˜ = α B + ( 1 α ) B 0 with α = M / ( M + B B 0 2 ) and M = 1 / ( 2 L ) . By definition we have B ˜ B 0 2 M = 1 / ( 2 L ) . Moreover, Equation (A1) and the convexity of the loss function imply that
tr ( B ˜ A ) log det ( B ˜ ) + λ B 1 tr B 0 A log det B 0 + λ B 0 1 .
Therefore, we can apply Lemma A8 with replacing B by B ˜ , and thus we obtain
B ˜ B 0 2 2 / c L + λ B ˜ B 0 1 8 c L 2 s λ 2 + 2 c L λ 0 2 diag ( A ) diag ( Σ ) 2 2 .
In particular, we have
B ˜ B 0 2 2 c L λ 0 / ( 2 L ) 1 / ( 16 L 2 ) ,
so we obtain B ˜ B 0 2 1 / ( 4 L ) = M / 2 . By construction this yields B B 0 2 M = 1 / ( 2 L ) , which completes the proof. □
Proof of Proposition 1.
Thanks to Lemma 7.2 of [45], it suffices to consider the case w = 1 .
For any L , n N , we define the set Ω n , L Ω by
Ω n , L : = { R ^ n R Y λ n / 2 } { L 1 Λ min ( R Y ) Λ max ( R Y ) L } { s ( K Y ) L s n } { 8 c L 2 L s n λ n > 1 / ( 4 L ) } ,
where c L : = 8 L 2 . Then we have
lim L lim sup n P ( Ω n , L c ) = 0 .
In fact, noting that Lemma A1 and [A1] yield
max 1 j d Σ Y j j + 1 / Σ Y j j = O p ( 1 ) ,
[A1]–[A2], [B1] and Lemma A2 imply that λ n 1 R ^ n R Y = o p ( 1 ) , s ( K Y ) = O p ( s n ) and Λ max ( R Y ) + 1 / Λ min ( R Y ) = O p ( 1 ) . Finally, [B2] yields lim n P ( 8 L s n λ n > 1 / ( 4 L ) ) = 0 for all L. Now, note that λ n 1 / ( 16 L c L 2 ) ( 8 L c L ) 1 on the set Ω n , L . Therefore, applying Proposition A1 with λ : = λ n and λ 0 : = λ n / 2 , for any fixed L we have
K ^ λ n K Y 2 2 / c L + λ n K ^ λ n K Y 1 8 c L 2 s n λ n 2 on Ω n , L .
Consequently, we obtain
lim sup n P K ^ λ n K Y 2 > 64 L 3 s n λ n lim sup n P ( Ω n , L c )
and
lim sup n P K ^ λ n K Y 1 > 512 L 4 s n λ n lim sup n P ( Ω n , L c ) .
Therefore, we conclude that
lim sup M lim sup n P K ^ λ n K Y 2 > M s n λ n lim sup L lim sup n P ( Ω n , L c ) = 0
and
lim sup M lim sup n P K ^ λ n K Y 1 > M s n λ n lim sup L lim sup n P ( Ω n , L c ) = 0 ,
which yields K ^ λ n K Y 2 = O p ( s n λ n ) and K ^ λ n K Y 1 = O p ( s n λ n ) . In particular, we obtain the first convergence of Equation (3). Moreover, since we have
| | | K ^ λ n K Y | | | 1 diag ( K ^ λ n ) diag ( K Y ) + K ^ λ n K Y 1 K ^ λ n K Y 2 + K ^ λ n K Y 1 ,
we also obtain the second convergence of Equation (3).
Now we prove Equation (4). First, Equation (A4) and [B1] yield λ n 1 | | | V ^ n V | | | 1 p 0 . Since | | | V | | | 1 = O p ( 1 ) , | | | K Y | | | 1 = O p ( s n ) and λ n = o p ( 1 ) by Equation (A4), [A2] and [B2], we obtain | | | V ^ n | | | 1 = O p ( 1 ) and | | | K ^ λ n | | | 1 = O p ( s n ) . Since
| | | Θ ^ λ n Θ Y | | | 1 = | | | V ^ n K ^ λ n V ^ n V K Y V | | | 1 | | | V ^ n V | | | 1 | | | K ^ λ n | | | 1 | | | V ^ n | | | 1 + | | | V | | | 1 | | | K ^ λ n K Y | | | 1 | | | V ^ n | | | 1 + | | | V | | | 1 | | | K Y | | | 1 | | | V ^ n V | | | 1 ,
we obtain the first convergence of Equation (4). Next, since | | | Θ ^ λ n Θ Y | | | 2 = o p ( 1 ) by the above result, [A1] and Lemma A3 yield | | | Θ ^ λ n 1 | | | 2 = Λ min ( Θ ^ λ n ) 1 = O p ( 1 ) . Since | | | Θ ^ λ n 1 Σ Y | | | 2 | | | Θ ^ λ n 1 | | | 2 | | | Θ Y Θ ^ λ n | | | 2 | | | Σ Y | | | 2 , we obtain the second convergence of Equation (4). □
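For readers who wish to reproduce the estimator analyzed in this proof, the following is a minimal computational sketch in the above notation: the graphical Lasso is run on the correlation-scaled matrix R̂_n and the result K̂_λn is mapped back through the scaling matrix V̂_n, so that Θ̂_λn = V̂_n K̂_λn V̂_n. The use of the glasso package, the scalar penalty on the correlation scale, and the unpenalized diagonal are assumptions of this sketch, not a description of the exact implementation.
```r
## Hedged sketch of the correlation-scaled (weighted) graphical Lasso step.
## `Sigma_hat` is a hypothetical initial covariance estimate (e.g., a realized covariance).
library(glasso)

wglasso <- function(Sigma_hat, lambda) {
  V_hat <- diag(1 / sqrt(diag(Sigma_hat)))     # V_hat = diag(Sigma_hat_jj)^{-1/2}
  R_hat <- V_hat %*% Sigma_hat %*% V_hat       # correlation-type matrix R_hat
  K_hat <- glasso(R_hat, rho = lambda, penalize.diagonal = FALSE)$wi
  V_hat %*% K_hat %*% V_hat                    # Theta_hat_lambda = V_hat K_hat V_hat
}
```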

Appendix B.2. Proof of Lemma 1

First, by Proposition 14.4.3 of [62] there is a (not necessarily measurable) d × d random matrix Z ^ n such that
Σ ^ n Θ ^ λ n 1 + λ n V ^ n Z ^ n V ^ n = 0 , Z ^ n 1 ,
and Z ^ n i j = sign ( Θ ^ λ n i j ) if Θ ^ λ n i j 0 . Consequently, it holds that
Σ ^ n Θ ^ λ n E d + λ n V ^ n Z ^ n V ^ n Θ ^ λ n = 0 .
Therefore, we have
Θ ^ λ n Θ Y + Θ Y ( Σ ^ n Σ Y ) Θ Y = Θ ^ λ n Θ Y + Θ Y ( Σ ^ n Σ Y ) Θ ^ λ n Θ Y ( Σ ^ n Σ Y ) ( Θ ^ λ n Θ Y ) = Θ ^ λ n Θ Y + Θ Y ( E d λ n V ^ n Z ^ n V ^ n Θ ^ λ n Σ Y Θ ^ λ n ) Θ Y ( Σ ^ n Σ Y ) ( Θ ^ λ n Θ Y ) = λ n Θ Y V ^ n Z ^ n V ^ n Θ ^ λ n Θ Y ( Σ ^ n Σ Y ) ( Θ ^ λ n Θ Y ) = λ n ( Θ ^ λ n Θ Y ) V ^ n Z ^ n V ^ n Θ ^ λ n ( Θ ^ λ n Θ ^ λ n Σ ^ n Θ ^ λ n ) Θ Y ( Σ ^ n Σ Y ) ( Θ ^ λ n Θ Y ) ,
so we obtain
Θ ^ λ n Θ Y Γ n + Θ Y ( Σ ^ n Σ Y ) Θ Y λ n | | | Θ ^ λ n Θ Y | | | V ^ n Z ^ n V ^ n Θ ^ λ n + | | | Θ Y | | | Σ ^ n Σ Y | | | Θ ^ λ n Θ Y | | | λ n | | | Θ ^ λ n Θ Y | | | | | | V ^ n | | | 2 | | | Θ ^ λ n | | | + | | | Θ Y | | | Σ ^ n Σ Y | | | Θ ^ λ n Θ Y | | | .
Now the desired result follows from Proposition 1 and Lemma A4. □
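As an illustration of how this decomposition is used, the following sketch computes a de-biased estimate under the assumption, as in the de-sparsified graphical Lasso of [24,25], that the correction term is Γ_n = Θ̂_λn Σ̂_n Θ̂_λn − Θ̂_λn, so that the de-biased estimator Θ̂_λn − Γ_n equals 2Θ̂_λn − Θ̂_λn Σ̂_n Θ̂_λn.
```r
## Hedged sketch of the de-biasing step under the assumed form of Gamma_n.
debias <- function(Theta_hat, Sigma_hat) {
  2 * Theta_hat - Theta_hat %*% Sigma_hat %*% Theta_hat
}
```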

Appendix B.3. Proof of Proposition 2

In the light of Lemma 3.1 of [20], it is enough to prove
log ( m + 1 ) J n vec Θ ^ λ n Θ Y Γ n J ˜ n vec Σ ^ n Σ p 0
as n . Please note that vec ( A B C ) = ( C A ) vec ( B ) for any d × d matrices A , B , C (cf. Theorem 2 in [59] (Chapter 2)). Thus, we obtain the desired result once we prove
log ( m + 1 ) J n Θ ^ λ n Θ Y Γ n + Θ Y Σ ^ n Σ Θ Y p 0
as n . This follows from Lemma 1 and the assumptions of the proposition. □

Appendix C. Proofs for Section 3

Appendix C.1. Proof of Proposition 3

Set Ω n : = { | | | Σ X 1 ( Σ ^ X , n Σ X ) | | | 2 1 / 2 } .
Lemma A9.
Under the assumptions of Proposition 3, we have the following results:
(a)
On the event Ω n , Σ ^ X , n is invertible and | | | Σ ^ X , n 1 Σ X 1 | | | 2 2 | | | Σ X 1 | | | 2 | | | Σ X 1 ( Σ ^ X , n Σ X ) | | | 2 .
(b)
λ n 1 | | | Σ ^ X , n Σ X | | | 2 = o p ( r ) and | | | Σ ^ X , n | | | 2 = O p ( r ) as n .
(c)
P ( Ω n ) 1 as n .
Proof. 
(a) is a direct consequence of Lemma A5. (b) follows from [C3], [D2] and the inequalities | | | Σ ^ X , n Σ X | | | 2 r Σ ^ X , n Σ X and | | | Σ X | | | 2 r Σ X . (c) follows from the inequality | | | Σ X 1 ( Σ ^ X , n Σ X ) | | | 2 r | | | Σ X 1 | | | 2 Σ ^ X , n Σ X . □
Lemma A10.
Under the assumptions of Proposition 3, λ n 2 Σ ^ Z , n Σ ˘ Z , n = o p ( r ) as n .
Proof. 
Since β ^ n = Σ ^ Y X , n Σ ^ X , n 1 on the event Ω n , we have
Σ ^ Z , n Σ ˘ Z , n = β ^ n Σ ^ X , n β ^ n + Σ ^ Y X , n β + β Σ ^ Y X , n β Σ ^ X , n β = Σ ^ Y X , n ( β ^ n β ) + β Σ ^ X , n ( β ^ n β ) = ( Σ ^ Y X , n β Σ ^ X , n ) Σ ^ X , n 1 ( Σ ^ Y X , n β Σ ^ X , n ) on Ω n .
Therefore, Lemma A6 yields
Σ ^ Z , n Σ ˘ Z , n r | | | Σ ^ X , n 1 | | | 2 Σ ^ Y X , n β Σ ^ X , n 2 on Ω n .
Now, by [C3] and Lemma A9 we have | | | Σ ^ X , n 1 | | | 2 1 Ω n = O p ( 1 ) , so we obtain λ n 2 Σ ^ Z , n Σ ˘ Z , n 1 Ω n = o p ( r ) by [D1]. Since P ( Ω n ) 1 by Lemma A9(c), we complete the proof. □
Lemma A11.
Under the assumptions of Proposition 3, λ n 1 Σ ^ Z , n Σ Z p 0 and P ( min 1 i d Σ ^ Z , n i i > 0 ) 1 as n .
Proof. 
The first claim immediately follows from Lemma A10 and [D1]. The second one is a consequence of the first one, Lemma A1 and [C2]. □
Proof of Proposition 3.
Set E n : = Ω n { Σ ¯ n S d + } { min 1 i d Σ ^ Z , n i i > 0 } . From Equation (0.8.5.3) in [60], we have Σ ^ Z , n S d + on E n . Hence, from the proof of [34] (Lemma 1), the optimization problem in Equation (10) has the unique solution on E n . Since P ( E n ) 1 as n by [D3] and Lemmas A9 and A11, the desired result follows once we prove λ n 1 Σ ^ Z , n Σ Z p 0 as n according to Proposition 1. This has already been established in Lemma A11. □

Appendix C.2. Proof of Proposition 4

We first establish some asymptotic properties of β ^ n which are necessary for the subsequent proofs.
Lemma A12.
Under the assumptions of Proposition 3, we have the following results:
(a)
β 2 = O p ( d ) as n .
(b)
λ n 1 max 1 i d β ^ n i · β i · 2 = o p ( r ) and max 1 i d β ^ n i · 2 = O p ( r ) as n .
(c)
λ n 1 β ^ n β 2 = o p ( d r ) and β ^ n 2 = O p ( d ) as n .
(d)
λ n 1 | | | β ^ n β | | | 1 = o p ( d r ) and | | | β ^ n | | | 1 = O p ( d ) as n .
(e)
λ n 1 | | | β ^ n β | | | = o p ( r ) and | | | β ^ n | | | = O p ( r ) as n .
Proof. 
(a) Since Σ X Λ min ( Σ X ) E r is positive semidefinite, β Σ X β Λ min ( Σ X ) β β is also positive semidefinite. Thus, Σ Y Λ min ( Σ X ) β β is positive definite by Equation (8). This implies that 0 tr ( Σ Y Λ min ( Σ X ) β β ) = tr ( Σ Y ) Λ min ( Σ X ) β 2 2 . Since tr ( Σ Y ) = O p ( d ) by [C1], we obtain β 2 2 = O p ( d ) by [C3].
(b) By Lemma A9, on the event Ω n , we have β ^ n = Σ ^ Y X , n Σ ^ X , n 1 . Hence, for every i = 1 , , d ,
β ^ n i · β i · 2 = ( Σ ^ Y X , n i · β Σ ^ X , n i · ) Σ ^ X , n 1 2 | | | Σ ^ X , n 1 | | | 2 Σ ^ Y X , n i · β Σ ^ X , n i · 2 r | | | Σ ^ X , n 1 | | | 2 Σ ^ Y X , n β Σ ^ X , n on Ω n .
Since | | | Σ ^ X , n 1 | | | 2 1 Ω n = O p ( 1 ) by Lemma A9, λ n 1 max 1 i d β ^ n i · β i · 1 Ω n = o p ( r ) by [D1]. Since P ( Ω n c ) 0 by Lemma A9, we obtain λ n 1 max 1 i d β ^ n i · β i · 2 = o p ( r ) . Since max 1 i d β i · 2 r β = O ( r ) by [C1], we also obtain max 1 i d β ^ n i · = O p ( r ) .
(c) This follows from (a)–(b) and r λ n = o p ( 1 ) .
(d) This is a direct consequence of (b).
(e) This follows from (b) and the Schwarz inequality. □
Proof of Proposition 4.
Since A | | | A | | | 2 for any matrix A, in view of Proposition 3 it suffices to prove λ n 1 β ^ n Σ ^ X , n β ^ n β Σ X β = O p ( r 2 ) . By Lemma A6 we have
β ^ n Σ ^ X , n β ^ n β Σ X β | | | Σ ^ X , n | | | 2 max 1 i d β ^ n i · β i · 2 max 1 i d β ^ n i · 2 + | | | Σ ^ X , n Σ X | | | 2 max 1 i d β i · 2 max 1 i d β ^ n i · 2 + | | | Σ X | | | 2 max 1 i d β i · 2 max 1 i d β ^ n i · β i · 2 .
Therefore, the desired result follows from Lemmas A9, A12(b) and assumption. □

Appendix C.3. Proof of Proposition 5

Set Π : = ( Σ X 1 + β Σ Z 1 β ) 1 and Π ^ n : = ( Σ ^ X , n + β ^ n Θ ^ Z , λ n β ^ n ) 1 .
Lemma A13.
Under the assumptions of Proposition 5, we have the following results:
(a)
Λ min ( β β ) 1 = O ( d 1 ) as n .
(b)
| | | Π | | | 2 = O p ( d 1 ) as n .
(c)
λ n 1 | | | Π ^ n Π | | | 2 = O p ( d 1 ( s n + r ) ) and | | | Π ^ n | | | 2 = O p ( d 1 ) as n .
Proof. 
(a) By Lemma A3 we have | Λ min ( d 1 β β ) Λ min ( B ) | | | | d 1 β β B | | | 2 . Hence the desired result follows from [C6].
(b) Since | | | Π | | | 2 = Λ min ( Σ X 1 + β Σ Z 1 β ) 1 and Σ X 1 is positive definite, Corollary 4.3.12 in [60] and Lemma A2 yield
| | | Π | | | 2 Λ min ( β Σ Z 1 β ) 1 Λ min ( β β ) 1 Λ min ( Σ Z 1 ) 1 = Λ min ( β β ) 1 Λ max ( Σ Z ) .
Thus, the desired result follows from claim (a) and [C2].
(c) First, since we have
| | | β ^ n Θ ^ Z , λ n β ^ n β Θ Z β | | | 2 | | | β ^ n β | | | 2 | | | Θ ^ Z , λ n | | | 2 | | | β ^ n | | | 2 + | | | β | | | 2 | | | Θ ^ Z , λ n Θ Z | | | 2 | | | β ^ n | | | 2 + | | | β | | | 2 | | | Θ Z | | | 2 | | | β ^ n β | | | 2 ,
Lemma A12(a) and (c) and Proposition 3 yield λ n 1 | | | β ^ n Θ ^ Z , λ n β ^ n β Θ Z β | | | 2 = O p ( d s n ) . Combining this with Lemma A9 and (b), we obtain λ n 1 | | | Π ( Π ^ n 1 Π 1 ) | | | 2 1 Ω n = O p ( s n + r ) . Now let us set Ω n , 1 : = Ω n { | | | Π ( Π ^ n 1 Π 1 ) | | | 2 1 / 2 } . Then, using (b) and Lemmas A5 and A9(c), we obtain λ n 1 | | | Π ^ n Π | | | 2 1 Ω n , 1 = O p ( d 1 ( s n + r ) ) and P ( Ω n , 1 c ) 0 . This completes the proof. □
Proof of Proposition 5.
By Sherman–Morrison–Woodbury formula (cf. Equation (0.7.4.1) in [60]), for any w { 2 , } we have
| | | Σ ^ Y , λ n 1 Σ Y 1 | | | w | | | Θ ^ Z , λ n Θ Z | | | w + | | | ( Θ ^ Z , λ n Θ Z ) β ^ n Π ^ n β ^ n Θ ^ Z , λ n | | | w + | | | Θ Z ( β ^ n β ) Π ^ n β ^ n Θ ^ Z , λ n | | | w + | | | Θ Z β ( Π ^ n Π ) β ^ n Θ ^ Z , λ n | | | w + | | | Θ Z β Π ( β ^ n β ) Θ ^ Z , λ n | | | w + | | | Θ Z β Π β ( Θ ^ Z , λ n Θ Z ) | | | w = : Δ 1 + Δ 2 + Δ 3 + Δ 4 + Δ 5 + Δ 6 .
Proposition 3 yields λ n 1 Δ 1 = O p ( s n ) . Moreover, noting that | | | Θ Z | | | = O p ( d n ) by Lemma A4 and [C2], Proposition 3 and Lemmas A12 and A13 imply that λ n 1 ( Δ 2 + Δ 6 ) = O p ( s n ) , λ n 1 ( Δ 3 + Δ 5 ) = o p ( r ) and λ n 1 Δ 4 = O p ( s n + r ) when w = 2 and λ n 1 ( Δ 2 + Δ 6 ) = O p ( r 3 / 2 s n d n ) , λ n 1 Δ 3 = o p ( r 3 / 2 d n ) , λ n 1 Δ 4 = O p ( r 3 / 2 ( s n + r ) d n ) and λ n 1 Δ 5 = o p ( r 2 d n ) when w = . This completes the proof. □

Appendix C.4. Proof of Proposition 6

We apply Proposition 2 to Σ ^ Z , n . From the arguments in the proof of Proposition 3, it remains to check condition Equation (5). More precisely, we need to prove
lim n sup y R m P a n J ˜ Z , n vec Σ ^ Z , n Σ Z y P J ˜ Z , n C n 1 / 2 ζ n y = 0 .
Thanks to Lemma 3.1 in [20] and Equation (12), this claim follows once we prove log ( m + 1 ) a n J ˜ Z , n vec ( Σ ^ Z , n Σ Z ) a n J ˜ Z , n vec ( Σ ˘ Z , n Σ Z ) 0 . Since we have
a n J ˜ Z , n vec ( Σ ^ Z , n Σ Z ) a n J ˜ Z , n vec ( Σ ˘ Z , n Σ Z ) a n | | | J n | | | | | | Θ Z | | | 2 Σ ^ Z , n Σ ˘ Z , n
and | | | Θ Z | | | = O p ( d n ) by Lemma A4, the desired result follows from Lemma A10 and assumption. □

Appendix C.5. Proof of Lemma 2 and Proposition 7

We use the same notation as in Appendix C.3. By the Sherman–Morrison–Woodbury formula we have
Σ ^ Y , λ n 1 Θ Z , λ n Θ ^ Z , λ n β ^ n Π ^ n β ^ n Θ ^ Z , λ n r Θ ^ Z , λ n β ^ n 2 | | | Π ^ n | | | 2 r | | | Θ ^ Z , λ n | | | 2 β ^ n 2 | | | Π ^ n | | | 2 ,
where the second inequality follows from Lemma A6. Since | | | Θ Z | | | 2 = O ( d n ) by Lemma A4, we have | | | Θ ^ Z , λ n | | | 2 = O p ( d n ) by Proposition 3. We also have β ^ n = O p ( 1 ) by [C1], [D2] and Lemma A12(b). Consequently, we obtain Σ ^ Y , λ n 1 Θ Z , λ n = O p ( r d n / d ) by Lemma A13. Similarly, we can prove Σ Y 1 Θ Z = O p ( r d n / d ) . Therefore, we complete the proof of Lemma 2.
Proposition 7 is an immediate consequence of Proposition 6, Lemma 2 and [20] (Lemma 3.1). □

Appendix D. Proofs for Section 4

Appendix D.1. Proof of Theorem 1

The proof relies on the following concentration inequalities for discretized quadratic covariations of continuous martingales:
Lemma A14.
Let M = ( M t ) t [ 0 , 1 ] and N = ( N t ) t [ 0 , 1 ] be two continuous martingales. Suppose that there is a constant L > 0 such that
| [ M , M ] t [ M , M ] s | | [ N , N ] t [ N , N ] s | L | t s |
for all s , t [ 0 , 1 ] . Then, for any θ > 0 , there is a constant C L , θ > 0 which depends only on L and θ such that
P n [ M , N ] ^ 1 n [ M , N ] 1 > x 2 exp C L , θ x 2
for all n N and x [ 0 , θ n ] .
Remark A1.
Similar estimates to Lemma A14 have already been obtained in the literature (see e.g., [36] (Lemma 3), [4] (Lemma 10) and [8] (Lemma A.1)). Since we use slightly different assumptions from the existing ones, we give its proof in Appendix E for the sake of completeness.
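As a quick numerical illustration of this concentration phenomenon (again, not used in the proofs), the following sketch simulates two correlated Brownian motions and compares the Monte Carlo standard deviation of the discretization error with the n^{-1/2} rate; the correlation, the sample size and the number of replications are arbitrary choices.
```r
## Discretized quadratic covariation of two correlated Brownian motions:
## [M, N]^hat_1 concentrates around [M, N]_1 = rho at the rate n^{-1/2}.
set.seed(1)
n <- 1000; rho <- 0.5; reps <- 2000
err <- replicate(reps, {
  dW1 <- rnorm(n, sd = sqrt(1 / n))
  dW2 <- rho * dW1 + sqrt(1 - rho^2) * rnorm(n, sd = sqrt(1 / n))
  sum(dW1 * dW2) - rho                         # discretization error
})
c(monte_carlo_sd = sd(err), theory = sqrt((1 + rho^2) / n))   # both of order n^{-1/2}
```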
Define the ( d + r ) -dimensional semimartingale Z ¯ = ( Z ¯ t ) t [ 0 , 1 ] by Z ¯ t = ( Z t 1 , , Z t d , X t 1 , , X t r ) .
Lemma A15.
Assume [E1] and log ( d + r ) / n 0 as n . Then, [ Z ¯ , Z ¯ ] ^ 1 n [ Z ¯ , Z ¯ ] 1 = O p ( log ( d + r ) / n ) as n .
Proof. 
For all n , ν N and t [ 0 , 1 ] , set
μ ¯ ( ν ) t = μ ( ν ) t μ ˜ ( ν ) t , σ ¯ ( ν ) t = σ ( ν ) t σ ˜ ( ν ) t .
Then we define the processes A ¯ ( ν ) = ( A ¯ ( ν ) t ) t [ 0 , 1 ] and M ¯ ( ν ) = ( M ¯ ( ν ) t ) t [ 0 , 1 ] by A ¯ ( ν ) t = 0 t μ ¯ ( ν ) s d s and M ¯ ( ν ) t = 0 t σ ¯ ( ν ) s d W s . By the local property of Itô integrals (cf. [47], pp. 17–18), we have Z ¯ = Z ¯ ( ν ) : = A ¯ ( ν ) + M ¯ ( ν ) on Ω n ( ν ) . Hence, for every L > 0 , it holds that
P [ Z ¯ , Z ¯ ] ^ 1 n [ Z ¯ , Z ¯ ] 1 > L log ( d + r ) / n P [ Z ¯ ( ν ) , Z ¯ ( ν ) ] ^ 1 n [ Z ¯ ( ν ) , Z ¯ ( ν ) ] 1 > L log ( d + r ) / n + P ( Ω n ( ν ) c ) .
Therefore, the proof is completed once we show that
lim L lim sup n P [ Z ¯ ( ν ) , Z ¯ ( ν ) ] ^ 1 n [ Z ¯ ( ν ) , Z ¯ ( ν ) ] 1 > L log ( d + r ) / n = 0
for any fixed ν > 0 . We decompose the target quantity as
[ Z ¯ ( ν ) , Z ¯ ( ν ) ] ^ 1 n [ Z ¯ ( ν ) , Z ¯ ( ν ) ] 1 = ( [ M ¯ ( ν ) , M ¯ ( ν ) ] ^ 1 n [ M ¯ ( ν ) , M ¯ ( ν ) ] 1 ) + [ A ¯ ( ν ) , A ¯ ( ν ) ] ^ 1 n + [ A ¯ ( ν ) , M ¯ ( ν ) ] ^ 1 n + [ M ¯ ( ν ) , A ¯ ( ν ) ] ^ 1 n = : I n + II n + III n + IV n .
First we consider I n . Since we have | [ M ¯ ( ν ) i , M ¯ ( ν ) i ] t [ M ¯ ( ν ) i , M ¯ ( ν ) i ] s | C ν | t s | for all s , t [ 0 , 1 ] and i { 1 , , d + r } by [E1], by Lemma A14 there is a constant C > 0 such that
max 1 i , j d + r P n I n i j > x 2 e C x 2
for all n N and x [ 0 , n ] . Therefore, for every L [ 0 , n / log ( d + r ) ] we obtain
P I n > L log ( d + r ) n i , j = 1 d + r P n I n i j > L log ( d + r ) 2 ( d + r ) 2 C L 2 .
Hence, noting the assumption n / log ( d + r ) , we conclude that
lim L lim sup n P I n > L log ( d + r ) n = 0 .
Next, by [E1] we have II n C ν 2 / n . Therefore, we obtain II n = O ( n 1 ) = O ( log ( d + r ) / n ) . Third, we consider III n . By the Schwarz inequality we have
III n II n max 1 j d + r [ M ¯ ( ν ) j , M ¯ ( ν ) j ] ^ 1 n .
From the above result we have II n = O ( 1 / n ) . Meanwhile, using the inequality x | x y | + y holding for all x , y 0 , we have
max 1 j d + r [ M ¯ ( ν ) j , M ¯ ( ν ) j ] ^ 1 n I n + max 1 j d + r [ M ¯ ( ν ) j , M ¯ ( ν ) j ] 1 I n + C ν .
Hence the above result yields max 1 j d + r [ M ¯ ( ν ) j , M ¯ ( ν ) j ] ^ 1 n = O p ( 1 ) . Thus, we conclude that III n = O p ( 1 / n ) = O p ( log ( d + r ) / n ) . Finally, since IV n = III n , we complete the proof. □
Proof of Theorem 1.
In view of Propositions 3–5, it suffices to check [D1]. Noting that Σ ^ Y X , n β Σ ^ X , n = [ Z , X ] ^ 1 n and Σ ˘ Z , n = [ Z , Z ] ^ 1 n , [D1] immediately follows from Lemma A15. □

Appendix D.2. Proof of Theorem 2

Our proof relies on the following “high-dimensional” asymptotic mixed normality of the realized covariance matrix:
Lemma A16
([20], Theorem 4.2(b)). Assume [F1]. For every n, let X n be an m × d 2 random matrix and Υ n be an m × d 2 non-random matrix such that | | | Υ n | | | 1 , where m = m n possibly depends on n. Define Ξ n : = Υ n X n . Suppose that for all n , ν N , we have X n ( ν ) D 2 , ( R m × d 2 ) such that X n = X n ( ν ) on Ω n ( ν ) and
lim b 0 lim sup n P ( min diag ( Ξ n ( ν ) C n ( ν ) Ξ n ( ν ) ) < b ) = 0 ,
sup n N max 1 i m max 1 j d 2 X n ( ν ) i j + sup 0 t 1 D t X n ( ν ) i j , 2 + sup 0 s , t 1 D s , t X n ( ν ) i j , 2 < ,
where Ξ n ( ν ) : = Υ n X n ( ν ) . Suppose also | | | Υ n | | | 5 ( log d m ) 13 2 0 as n . Then we have
sup y R m P Ξ n vec [ Z , Z ] ^ 1 n [ Z , Z ] 1 y P ( Ξ n C n 1 / 2 ζ n y ) 0
as n .
To apply Lemma A16 to the present setting, we prove some auxiliary results.
Lemma A17.
Let A 1 , A 2 , B 1 , B 2 R d × d . Then ( A 1 A 2 ) ( B 1 B 2 ) = ( A 1 B 1 ) ( A 2 B 2 ) .
Proof. 
This follows from a straightforward computation. □
Lemma A18.
Assume [F1]. Then, for any n , ν N and t [ 0 , 1 ] , c ( ν ) t D 2 , ( R d × d ) , C n ( ν ) D 2 , ( R d 2 × d 2 ) and
sup n N max 1 i , j d sup 0 t , u 1 D s c ( ν ) t i j , 2 + sup 0 t , u , v 1 D u , v c ( ν ) t i j , 2 < , sup n N max 1 k , l d 2 sup 0 u 1 D u C n ( ν ) k l , 2 + sup 0 u , v 1 D u , v C n ( ν ) k l , 2 < .
Proof. 
This directly follows from Lemmas B.11–B.12 in [20]. □
Lemma A19.
Assume [F1]–[F2]. For any n , ν N , Θ Z ( ν ) D 2 , ( R d × d ) and
sup n N max 1 i , j d sup 0 t 1 D t Θ Z ( ν ) i j , 2 + sup 0 s , t 1 D s , t Θ Z ( ν ) i j , 2 < .
Proof. 
First, by Remark 15.87 in [48] and Lemma A18, Σ Z ( ν ) D 2 , ( R d × d ) and D k Σ Z ( ν ) = 0 1 D k c ( ν ) s d s for k = 1 , 2 . In particular, we have
sup n N max 1 i , j d sup 0 t 1 D t Σ Z ( ν ) i j , 2 + sup 0 s , t 1 D s , t Σ Z ( ν ) i j , 2 <
by Lemma A18 and Equation (16). Next, by Theorem 15.78 in [48] and Theorem 4 in [59] (Chapter 8), we have Θ Z ( ν ) D 2 , ( R d × d ) with D s ( a ) Θ Z ( ν ) = Θ Z ( ν ) D s ( a ) Σ Z ( ν ) Θ Z ( ν ) and
D s , t ( a , b ) Θ Z ( ν ) = Θ Z ( ν ) D t ( b ) Σ Z ( ν ) Θ Z ( ν ) D s ( a ) Σ Z ( ν ) Θ Z ( ν ) Θ Z ( ν ) D s , t ( a , b ) Σ Z ( ν ) Θ Z ( ν ) + Θ Z ( ν ) D s ( a ) Σ Z ( ν ) Θ Z ( ν ) D t ( b ) Σ Z ( ν ) Θ Z ( ν )
for all s , t [ 0 , 1 ] and a , b { 1 , , d } . Therefore, by Lemma A7 we have
D s Θ Z ( ν ) i j 2 | | | Θ Z ( ν ) | | | 2 max 1 k , l d D s Σ Z ( ν ) k l 2
for all i , j = 1 , , d . Then, noting that Q Z is non-random, we have ( 1 { Θ Z ( ν ) i j 0 } ) 1 i , j d = Q Z by assumption. Therefore, we obtain | | | Θ Z ( ν ) | | | | | | Q Z | | | Θ Z ( ν ) . Hence, Equation (A9), [F2] and Equation (17) yield sup n N max 1 i , j d sup 0 t 1 D t Θ Z ( ν ) i j , 2 < . In the meantime, by Lemma A7 we also have
D s , t Θ Z ( ν ) i j 2 2 b = 1 d k = 1 d Θ Z ( ν ) D t ( b ) Σ Z ( ν ) Θ Z ( ν ) i k 2 | | | Θ Z ( ν ) | | | max 1 k , l d D s Σ Z ( ν ) k l 2 + | | | Θ Z ( ν ) | | | 2 max 1 k , l d D s , t Σ Z ( ν ) k l 2 = 2 b = 1 d k = 1 d D t ( b ) Θ Z ( ν ) i k 2 | | | Θ Z ( ν ) | | | max 1 k , l d D s Σ Z ( ν ) k l 2 + | | | Θ Z ( ν ) | | | 2 max 1 k , l d D s , t Σ Z ( ν ) k l 2 .
Now, since Q Z is non-random, we have D t ( b ) Θ Z ( ν ) = Q Z D t ( b ) Θ Z ( ν ) . Therefore, the Schwarz inequality yields
D s , t Θ Z ( ν ) i j 2 2 | | | Q Z | | | k = 1 d b = 1 d D t ( b ) Θ Z ( ν ) i k 2 | | | Θ Z ( ν ) | | | max 1 k , l d D s Σ Z ( ν ) k l 2 + | | | Θ Z ( ν ) | | | 2 max 1 k , l d D s , t Σ Z ( ν ) k l 2 2 | | | Q Z | | | max 1 k , l d D t Θ Z ( ν ) k l 2 | | | Θ Z ( ν ) | | | max 1 k , l d D s Σ Z ( ν ) k l 2 + | | | Θ Z ( ν ) | | | 2 max 1 k , l d D s , t Σ Z ( ν ) k l 2 .
Consequently, we conclude sup n N max 1 i , j d sup 0 s , t 1 D s , t Θ Z ( ν ) i j , 2 < by [F2], Equation (17) and the results proved above. □
Lemma A20.
Assume [F1]–[F2]. For any n , ν N , Θ Z ( ν ) Θ Z ( ν ) , V n ( ν ) D 2 , ( R d 2 × d 2 ) and
sup n N max 1 i , j d 2 sup 0 t 1 D t { Θ Z ( ν ) Θ Z ( ν ) } i j , 2 + sup 0 s , t 1 D s , t { Θ Z ( ν ) Θ Z ( ν ) } i j , 2 < ,
sup n N max 1 i , j d 2 sup 0 t 1 D t V n ( ν ) i j , 2 + sup 0 s , t 1 D s , t V n ( ν ) i j , 2 < .
Proof. 
First, Corollary 15.80 in [48], Equation (17) and Lemma A19 imply that Θ Z ( ν ) Θ Z ( ν ) D 2 , ( R 2 d 2 × d 2 ) and Equation (A10) holds true. Next, Corollary 15.80 in [48] and Lemma A18 imply that V n ( ν ) D 2 , ( R d 2 × d 2 ) and
D s ( a ) V n ( ν ) = { D s ( a ) Θ Z 2 ( ν ) } C n ( ν ) Θ Z 2 ( ν ) + Θ Z 2 ( ν ) { D s ( a ) C n ( ν ) } Θ Z 2 ( ν ) + Θ Z 2 ( ν ) C n ( ν ) { D s ( a ) Θ Z 2 ( ν ) }
and
D s , t ( a , b ) V n ( ν ) = { D s , t ( a , b ) Θ Z 2 ( ν ) } C n ( ν ) Θ Z 2 ( ν ) + { D s ( a ) Θ Z 2 ( ν ) } { D t ( b ) C n ( ν ) } Θ Z 2 ( ν ) + { D s ( a ) Θ Z 2 ( ν ) } C n ( ν ) { D t ( b ) Θ Z 2 ( ν ) } + { D s ( b ) Θ Z 2 ( ν ) } { D s ( a ) C n ( ν ) } Θ Z 2 ( ν ) + Θ Z 2 ( ν ) { D s , t ( a , b ) C n ( ν ) } Θ Z 2 ( ν ) + Θ Z 2 ( ν ) { D s ( a ) C n ( ν ) } { D t ( b ) Θ Z 2 ( ν ) } + { D t ( b ) Θ Z 2 ( ν ) } C n ( ν ) { D s ( a ) Θ Z 2 ( ν ) } + Θ Z 2 ( ν ) { D t ( b ) C n ( ν ) } { D s ( a ) Θ Z 2 ( ν ) } + Θ Z 2 ( ν ) C n ( ν ) { D s , t ( a , b ) Θ Z 2 ( ν ) }
for any s , t [ 0 , 1 ] , where Θ Z 2 ( ν ) : = Θ Z ( ν ) Θ Z ( ν ) . Thus, by Lemma A7 we obtain
max 1 i , j d 2 D s V n ( ν ) i j 2 2 max 1 i d 2 a = 1 d k = 1 d 2 D s ( a ) Θ Z 2 ( ν ) i k 2 C n ( ν ) | | | Θ Z 2 ( ν ) | | | + | | | Θ Z 2 ( ν ) | | | 2 max 1 i , j d 2 D s C n ( ν ) i j 2
and
max 1 i , j d 2 D s , t V n ( ν ) i j 2 2 max 1 i d 2 sup 0 s , t 1 a , b = 1 d k = 1 d 2 D s , t ( a , b ) Θ Z 2 ( ν ) i k 2 C n ( ν ) | | | Θ Z 2 ( ν ) | | | + 4 max 1 i , j , l d 2 sup 0 s , t 1 a = 1 d k = 1 d 2 D s ( a ) Θ Z 2 ( ν ) i k 2 D t C n ( ν ) j l 2 | | | Θ Z 2 ( ν ) | | | + 2 max 1 i d 2 sup 0 s 1 C n ( ν ) a = 1 d k = 1 d 2 D s ( a ) Θ Z 2 ( ν ) i k 2 + max 1 i , j d 2 sup 0 s , t 1 | | | Θ Z 2 ( ν ) | | | 2 D s , t C n ( ν ) i j 2 .
Now, as pointed out in the proof of Lemma A19, we have Θ Z ( ν ) = Q Z Θ Z ( ν ) . Therefore, Lemma A17 yields Θ Z 2 ( ν ) = ( Q Z Q Z ) Θ Z 2 ( ν ) . Since Q Z is non-random by [F2], we have D s ( a ) Θ Z 2 ( ν ) = ( Q Z Q Z ) D s ( a ) Θ Z 2 ( ν ) . Thus, using the Schwarz inequality repeatedly, we obtain
max 1 i , j d 2 D s V n ( ν ) i j 2 2 max 1 i j d 2 | | | Q Z | | | 2 D s Θ Z 2 ( ν ) i j 2 C n ( ν ) | | | Θ Z 2 ( ν ) | | | + | | | Θ Z 2 ( ν ) | | | 2 max 1 i , j d 2 D s C n ( ν ) i j 2
and
max 1 i , j d 2 D s , t V n ( ν ) i j 2 2 max 1 i , j d 2 sup 0 s , t 1 | | | Q Z | | | 2 D s , t Θ Z 2 ( ν ) i j 2 C n ( ν ) | | | Θ Z 2 ( ν ) | | | + 4 max 1 i , j , k , l d 2 sup 0 s , t 1 | | | Q Z | | | 2 D s Θ Z 2 ( ν ) i k 2 D t C n ( ν ) j l 2 | | | Θ Z 2 ( ν ) | | | + 2 max 1 i , j d 2 sup 0 s 1 C n ( ν ) | | | Q Z | | | 2 D s Θ Z 2 ( ν ) i j 2 + max 1 i , j d 2 sup 0 s , t 1 | | | Θ Z 2 ( ν ) | | | 2 D s , t C n ( ν ) i j 2 .
Hence we complete the proof by Lemma A18, Equation (A10) and assumption. □
Proof of Theorem 2.
Set U n : = n vec ( Θ ^ Z , λ n Γ Z , n Θ Z ) . Define the 2 d 2 × d 2 matrices J n , 1 and J n , 2 by
J n , 1 = E d 2 E d 2 , J n , 2 = S n 1 S n 1 .
Then we have
sup A A re ( d 2 ) P U n A P V n 1 / 2 ζ n A = sup y R 2 d 2 P J n , 1 U n y P J n , 1 V n 1 / 2 ζ n y
and
sup A A re ( d 2 ) P S n 1 U n A P S n 1 V n 1 / 2 ζ n A = sup y R 2 d 2 P J n , 2 U n y P J n , 2 V n 1 / 2 ζ n y .
Therefore, in view of Proposition 6, it suffices to check [D1] and Equations (12)–(13) for J n { J n , 1 , J n , 2 } . We have already checked [D1] in the proof of Theorem 1. Meanwhile, Equation (13) immediately follows from [E1] and Equation (17). To check Equation (12), we apply Lemma A16 with Ξ n = J n ( Θ Z Θ Z ) (note that Σ ˘ Z , n = [ Z , Z ] ^ 1 n ). Set
Υ n = Q Z Q Z Q Z Q Z .
Then we have Ξ n = Υ n Ξ n by Lemma A17. Since Υ n is non-random by [F2], we can apply Lemma A16 with X n = Ξ n once we show that for every ν N , there is an X n ( ν ) D 2 , ( R m × d 2 ) such that X n = X n ( ν ) on Ω n ( ν ) and Equations (A6)–(A7) hold true. Now we separately consider the two cases.
Case 1: J n = J n , 1 . In this case, we set X n ( ν ) : = J n , 1 ( Θ Z ( ν ) Θ Z ( ν ) ) . By [E1] we have X n = X n ( ν ) on Ω n ( ν ) , while Equations (A6)–(A7) follow from Equation (17) and Lemma A20, respectively.
Case 2: J n = J n , 2 . In this case, we set X n ( ν ) : = J n , 2 ( ν ) ( Θ Z ( ν ) Θ Z ( ν ) ) , where
J n , 2 ( ν ) = S n ( ν ) 1 S n ( ν ) 1 .
By [E1] we have X n = X n ( ν ) on Ω n ( ν ) , while Equation (A6) is evident because Ξ n ( ν ) C n ( ν ) Ξ n ( ν ) is the identity matrix in this case. Therefore, it remains to prove Equation (A7). Noting that S n ( ν ) is a diagonal matrix, Equation (A7) follows from Corollary 15.80 in [48] and Lemma A20 once we show that S n ( ν ) k k D 2 , for every k = 1 , , d 2 and
sup n N max 1 k d 2 S n ( ν ) k k + sup 0 t 1 D t S n ( ν ) k k , 2 + sup 0 s , t 1 D s , t S n ( ν ) k k , 2 < .
Since we can write S n ( ν ) k k = ( V n ( ν ) k k ) 5 / 2 ( V n ( ν ) k k ) 3 , we obtain the desired result by combining Theorem 15.78 and Lemma 15.152 in [48] with Lemma A20. □

Appendix D.3. Proof of Lemma 3

We use the following notation: For a d-dimensional process U = ( U t ) t [ 0 , 1 ] , we set Δ h n U : = U h / n U ( h 1 ) / n , h = 1 , , n . Also, we set χ h : = vec [ Δ h n Z ( Δ h n Z ) ] for h = 1 , , n and
C ˜ n : = n h = 1 n χ h χ h n 2 h = 1 n 1 χ h χ h + 1 + χ h + 1 χ h .
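For concreteness, the following is a hedged sketch of how C̃_n as displayed above can be computed from a hypothetical n × d matrix dZ whose h-th row contains the h-th increment of the residual process Z.
```r
## Hedged sketch of the estimator C_tilde_n defined above.
avar_estimator <- function(dZ) {
  n <- nrow(dZ)
  chi <- t(apply(dZ, 1, function(z) as.vector(tcrossprod(z))))        # rows chi_h = vec(dZ_h dZ_h')
  term1 <- n * crossprod(chi)                                         # n * sum_h chi_h chi_h'
  lag <- crossprod(chi[-n, , drop = FALSE], chi[-1, , drop = FALSE])  # sum_h chi_h chi_{h+1}'
  term1 - (n / 2) * (lag + t(lag))                                    # subtract lag-one correction
}
```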
Lemma A21.
Assume [E1]. Then h = 1 n ( Δ h n Z 4 + Δ h n X 4 ) = O p ( log 2 ( d + r ) / n ) as n .
Proof. 
We use the same notation as in the proof of Lemma A15. Then, we need to prove h = 1 n Δ h n Z ¯ 4 = O p ( log 2 ( d + r ) / n ) as n . For every ν N and L > 0 , we have
P h = 1 n Δ h n Z ¯ 4 > L P h = 1 n Δ h n Z ¯ ( ν ) 4 > L + P ( Ω n ( ν ) c ) .
Hence it suffices to prove h = 1 n Δ h n Z ¯ ( ν ) 4 = O p ( log 2 ( d + r ) / n ) as n for any fixed ν N . By Lemma A23 there is a universal constant c > 0 such that Δ h n M ¯ ( ν ) j p c p Δ h n [ M ¯ ( ν ) j , M ¯ ( ν ) j ] p for all p 2 . Thus, by [E1] we obtain Δ h n Z ¯ ( ν ) j p C ν / n + c C ν p / n . Therefore, by [63] (Proposition 2.5.2), there is a constant C > 0 such that max j , h Δ h n Z ¯ ( ν ) j ψ 2 C / n for all n, where ξ ψ 2 : = inf { Λ > 0 : E [ exp ( | ξ | / Λ ) ] 2 } for a random variable ξ . Thus, [64] (Lemma 2.2.2) implies that there is a constant C > 0 such that max h Δ h n Z ¯ ( ν ) ψ 2 C log ( d + r ) / n for all n. Thus, we obtain
E h = 1 n Δ h n Z ¯ ( ν ) 4 4 ! 4 C log 2 ( d + r ) n ,
so the desired result follows from the Markov inequality. □
Lemma A22.
Assume [C1]–[C4] and [E1]. Then h = 1 n χ ^ h χ h 2 = O p ( r 2 ( log d ) 3 / n 2 ) as n .
Proof. 
Since Z ^ h / n = Z h / n ( β ^ n β ) X h / n , we have
χ ^ h χ h = vec [ ( β ^ n β ) Δ h n X ( Δ h n Z ) ] vec [ Δ h n Z ( ( β ^ n β ) Δ h n X ) ] + vec [ ( β ^ n β ) Δ h n X ( ( β ^ n β ) Δ h n X ) ] .
Now, since vec ( x y ) x y for any x , y R d , it holds that
χ ^ h χ h 2 ( β ^ n β ) Δ h n X Δ h n Z + ( β ^ n β ) Δ h n X 2 2 | | | β ^ n β | | | Δ h n X Δ h n Z + | | | β ^ n β | | | 2 Δ h n X 2 .
Therefore, we obtain
h = 1 n χ ^ h χ h 2 2 | | | β ^ n β | | | 2 h = 1 n ( Δ h n X 4 + Δ h n Z 4 ) + 2 | | | β ^ n β | | | 4 h = 1 n Δ h n X 4 .
Now, noting Lemma A15, we infer that | | | β ^ n β | | | = O p ( r ( log d ) / n ) from the proof of Lemma A12(e). Thus, we complete the proof by Lemma A21. □
Proof of Lemma 3.
Since C ˜ n C n = O p ( ( log d ) 2 / n + n γ ) by Proposition 4.1 in [20], it suffices to prove C ^ n C ˜ n = O p ( r ( log d ) 5 / 2 / n ) . Since vec ( x y ) x y for any x , y R d , Lemma A21 yields h = 1 n χ h 2 = O p ( ( log d ) 2 / n ) . Combining this with Lemma A22 and r 2 ( log d ) / n = O ( 1 ) , we also obtain h = 1 n χ ^ h 2 = O p ( ( log d ) 2 / n ) . Now the desired result follows from the Schwarz inequality and Lemma A22. □

Appendix D.4. Proof of Corollary 1

(a) Since | | | Θ Z | | | = O p ( 1 ) by Equation (17) and [F2], we have C n + V n = O p ( 1 ) by [E1] and λ n 1 | | | Θ ^ Z , λ n Θ ^ Z , λ n Θ Z Θ Z | | | = O p ( s n ) by Theorem 1. Combining this with Lemma 3 and assumption, we obtain V ^ n = O p ( 1 ) and ( log d ) V ^ n V n p 0 . Noting Equation (17) and the fact that S ^ n is a diagonal matrix, we also obtain | | | S ^ n 1 | | | = O p ( 1 ) and ( log d ) | | | S ^ n 1 S n 1 | | | p 0 . Since Equation (18) yields n vec ( Θ ^ Z , λ n Γ Z , n Θ Z ) = O p ( log d ) , we obtain log d n ( S ^ n 1 S n ) vec ( Θ ^ Z , λ n Γ Z , n Θ Z ) p 0 . Now the desired result follows from Theorem 2 and [20] Lemma 3.1.
(b) The same argument as above implies that ( log d ) 2 V ^ n V n p 0 and ( log d ) 2 | | | S ^ n 1 S n 1 | | | p 0 . Thus, the desired result follows from [20] Proposition 3.1. □

Appendix D.5. Proof of Corollary 2

First, we have by Corollary 1(a)
lim sup n P max 1 i < j d n | Θ ^ Z , λ n i j Γ Z , n i j Θ Z i j | s ^ n i j > c n = lim sup n P max 1 i < j d | ( S n 1 V n 1 / 2 ζ n ) ( i 1 ) d + j | > c n lim sup n d ( d 1 ) 2 · 2 ( 1 Φ ( c n ) ) = lim sup n α n = α .
Hence we have
lim sup n P ( S ^ ( Θ Z ) S ( Θ Z ) ) lim sup n P max ( i , j ) : Θ Z i j = 0 n | Θ ^ Z , λ n i j Γ Z , n i j | s ^ n i j > c n lim sup n P max 1 i < j d n | Θ ^ Z , λ n i j Γ Z , n i j Θ Z i j | s ^ n i j > c n α
and
lim sup n P ( S ^ ( Θ Z ) S ( Θ Z ) ) lim sup n P min ( i , j ) S ( Θ Z ) n | Θ ^ Z , λ n i j Γ Z , n i j | s ^ n i j c n lim sup n P min ( i , j ) S ( Θ Z ) n | Θ Z i j | s ^ n i j max 1 i < j d n | Θ ^ Z , λ n i j Γ Z , n i j Θ Z i j | s ^ n i j + c n lim sup n P min ( i , j ) S ( Θ Z ) n | Θ Z i j | s ^ n i j 2 c n + lim sup n P max 1 i < j d n | Θ ^ Z , λ n i j Γ Z , n i j Θ Z i j | s ^ n i j > c n lim sup n P min ( i , j ) S ( Θ Z ) | Θ Z i j | 2 c n n max i , j s ^ n i j + α .
Now, we have max i , j s ^ n i j = O p ( 1 ) from the proof of Corollary 1(a). Moreover, since α n d ( d 1 ) / 2 = 1 Φ ( c n ) e c n 2 / 2 by Chernoff’s inequality, we have c n 2 log α n d ( d 1 ) / 2 . Hence c n = O ( log d ) as n by assumption. Consequently, we obtain
lim sup n P min ( i , j ) S ( Θ Z ) | Θ Z i j | 2 c n n max i , j s ^ n i j = 0
by assumption. This completes the proof.
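For reference, a hedged sketch of the resulting support estimator: the threshold c_n solves d(d − 1)/2 · 2(1 − Φ(c_n)) = α_n as in the display above, and an off-diagonal entry is retained when its studentized de-biased estimate exceeds c_n. The inputs Theta_deb (de-biased estimate) and s_hat (estimated standard deviations) are hypothetical.
```r
## Hedged sketch of the support estimator from Corollary 2.
support_estimate <- function(Theta_deb, s_hat, n, alpha_n = 0.05) {
  d <- nrow(Theta_deb)
  c_n <- qnorm(1 - alpha_n / (d * (d - 1)))    # from d(d-1)/2 * 2(1 - Phi(c_n)) = alpha_n
  S <- sqrt(n) * abs(Theta_deb) / s_hat > c_n  # entry-wise tests
  diag(S) <- FALSE                             # only off-diagonal entries are tested here
  S
}
```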

Appendix E. Proof of Lemma A14

In this appendix we prove Lemma A14 with the help of two general martingale inequalities. The first one is the Burkholder-Davis-Gundy inequality with a sharp constant:
Lemma A23
(Barlow and Yor [65], Proposition 4.2). There is a universal constant c > 0 such that
sup 0 t T | M t | p c p [ M , M ] T 1 / 2 p
for any p [ 2 , ) and any continuous martingale M = ( M t ) t [ 0 , T ] with M 0 = 0 .
The second one is a Bernstein-type inequality for martingales:
Lemma A24.
Let ( ξ i ) i = 1 n be a martingale difference sequence with respect to the filtration ( G i ) i = 0 n . Suppose that there are constants a , b > 0 such that i = 1 n E [ | ξ i | k G i 1 ] k ! a k 2 b 2 / 2 a.s. for any integer k 2 . Then, for any x 0 ,
P max 1 m n i = 1 m ξ i x 2 exp x 2 b 2 + b b 2 + 2 a x .
Proof. 
This is a special case of Pinelis [66] (Theorem 3.3). In fact, since R is a Hilbert space, we can apply this result with X = R and D = 1 in the notation of that paper. □
Proof of Lemma A14.
For every h = 1 , , n , set
ξ n , h : = n t h 1 t h ( M t M t h 1 ) d N t + t h 1 t h ( N t N t h 1 ) d M t .
Itô’s formula yields
n [ M , N ] ^ 1 n [ M , N ] 1 = h = 1 n ξ n , h .
Also, by assumption ( ξ n , h ) h = 1 n is a martingale difference with respect to ( F t h ) h = 0 n . Moreover, for any integer k 2 , we have
E [ | ξ n , h | k F t h 1 ] 2 k 1 n k / 2 E t h 1 t h ( M t M t h 1 ) d N t k + t h 1 t h ( N t N t h 1 ) d M t k F t h 1 2 k 1 n k / 2 c k k k / 2 E t h 1 t h ( M t M t h 1 ) 2 d [ N , N ] t k / 2 + t h 1 t h ( N t N t h 1 ) 2 d [ M , M ] t k / 2 F t h 1 ( Lemma A 23 ) 2 k 1 c k k k / 2 L k / 2 E sup t h 1 < t t h | M t M t h 1 | k + sup t h 1 < t t h | N t N t h 1 | k F t h 1 ( ( A 5 ) ) 2 k 1 c 2 k k k L k / 2 E ( [ M , M ] t h [ M , M ] t h 1 ) k / 2 + ( [ N , N ] t h [ N , N ] t h 1 ) k / 2 F t h 1 ( Lemma A 23 ) 2 k c 2 k k k L k n k / 2 ( ( A 5 ) ) ,
where c > 0 is the universal constant appearing in Lemma A23. Thus, using Stirling’s formula, we obtain
h = 1 n E [ | ξ n , h | k F t h 1 ] 2 k c 2 k e k 2 π k k ! L k n k / 2 1 k ! 2 a 0 n k 2 b 0 2 ,
where a 0 : = 2 e c 2 L and b 0 : = 2 2 c 2 L e / ( 2 π ) 1 / 4 . Hence, Lemma A24 yields
P h = 1 n ξ n , h x 2 exp x 2 b 0 2 + b 0 b 0 2 + 2 ( a 0 / n ) x
for every x 0 . Consequently, when x [ 0 , θ n ] for some θ > 0 , we have
P h = 1 n ξ n , h x 2 exp C L , θ x 2
with C L , θ : = ( b 0 2 + b 0 b 0 2 + 2 a 0 θ ) 1 . This completes the proof. □

References

  1. Wang, Y.; Zou, J. Vast volatility matrix estimation for high-frequency financial data. Ann. Statist. 2010, 38, 943–978.
  2. Bickel, P.J.; Levina, E. Covariance regularization by thresholding. Ann. Statist. 2008, 36, 2577–2604.
  3. Bickel, P.J.; Levina, E. Regularized estimation of large covariance matrices. Ann. Statist. 2008, 36, 199–227.
  4. Tao, M.; Wang, Y.; Zhou, H. Optimal sparse volatility matrix estimation for high-dimensional Itô processes with measurement errors. Ann. Statist. 2013, 41, 1816–1864.
  5. Tao, M.; Wang, Y.; Chen, X. Fast convergence rates in estimating large volatility matrices using high-frequency financial data. Econom. Theory 2013, 29, 838–856.
  6. Kim, D.; Wang, Y.; Zou, J. Asymptotic theory for large volatility matrix estimation based on high-frequency financial data. Stoch. Process. Appl. 2016, 126, 3527–3577.
  7. Kim, D.; Kong, X.B.; Li, C.X.; Wang, Y. Adaptive thresholding for large volatility matrix estimation based on high-frequency financial data. J. Econom. 2018, 203, 69–79.
  8. Fan, J.; Furger, A.; Xiu, D. Incorporating global industrial classification standard into portfolio allocation: A simple factor-based large covariance matrix estimator with high-frequency data. J. Bus. Econom. Statist. 2016, 34, 489–503.
  9. Aït-Sahalia, Y.; Xiu, D. Using principal component analysis to estimate a high dimensional factor model with high-frequency data. J. Econom. 2017, 201, 384–399.
  10. Fan, J.; Kim, D. Robust high-dimensional volatility matrix estimation for high-frequency factor model. J. Am. Statist. Assoc. 2018, 113, 1268–1283.
  11. Dai, C.; Lu, K.; Xiu, D. Knowing factors or factor loadings, or neither? Evaluating estimators of large covariance matrices with noisy and asynchronous data. J. Econom. 2019, 208, 43–79.
  12. Hautsch, N.; Kyj, L.M.; Oomen, R.C. A blocking and regularization approach to high-dimensional realized covariance estimation. J. Appl. Econom. 2012, 27, 625–645.
  13. Morimoto, T.; Nagata, S. Robust estimation of a high-dimensional integrated covariance matrix. Commun. Statist. Simul. Comput. 2017, 46, 1102–1112.
  14. Lam, C.; Feng, P.; Hu, C. Nonlinear shrinkage estimation of large integrated covariance matrices. Biometrika 2017, 104, 481–488.
  15. Ledoit, O.; Wolf, M. Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann. Statist. 2012, 40, 1024–1060.
  16. Zheng, X.; Li, Y. On the estimation of integrated covariance matrices of high dimensional diffusion processes. Ann. Statist. 2011, 39, 3121–3151.
  17. Brownlees, C.; Nualart, E.; Sun, Y. Realized networks. J. Appl. Econom. 2018, 33, 986–1006.
  18. Kong, X.B.; Liu, C. Testing against constant factor loading matrix with large panel high-frequency data. J. Econom. 2018, 204, 301–319.
  19. Pelger, M. Large-dimensional factor modeling based on high-frequency observations. J. Econom. 2019, 208, 23–42.
  20. Koike, Y. Mixed-normal limit theorems for multiple Skorohod integrals in high-dimensions, with application to realized covariance. Electron. J. Stat. 2019, 13, 1443–1522.
  21. Cochrane, J.H. Asset Pricing, revised ed.; Princeton University Press: Princeton, NJ, USA, 2005.
  22. Bühlmann, P.; van de Geer, S. Statistics for High-Dimensional Data; Springer: Berlin/Heidelberg, Germany, 2011.
  23. Acemoglu, D.; Carvalho, V.M.; Ozdaglar, A.; Tahbaz-Salehi, A. The network origins of aggregate fluctuations. Econometrica 2012, 80, 1977–2016.
  24. Janková, J.; van de Geer, S. Inference in high-dimensional graphical models. In Handbook of Graphical Models; CRC Press: Boca Raton, FL, USA, 2018; Chapter 14; pp. 325–351.
  25. Janková, J.; van de Geer, S. Confidence intervals for high-dimensional inverse covariance estimation. Electron. J. Stat. 2015, 9, 1205–1229.
  26. Podolskij, M.; Vetter, M. Understanding limit theorems for semimartingales: A short survey. Stat. Neerl. 2010, 64, 329–351.
  27. De Gregorio, A.; Iacus, S.M. Adaptive LASSO-type estimation for multivariate diffusion processes. Econom. Theory 2012, 28, 838–860.
  28. Masuda, H.; Shimizu, Y. Moment convergence in regularized estimation under multiple and mixed-rates asymptotics. Math. Methods Statist. 2017, 26, 81–110.
  29. Kinoshita, Y.; Yoshida, N. Penalized quasi likelihood estimation for variable selection. arXiv 2019, arXiv:1910.12871.
  30. Suzuki, T.; Yoshida, N. Penalized least squares approximation methods and their applications to stochastic processes. Jpn. J. Stat. Data Sci. 2020, forthcoming.
  31. Fujimori, K. The Dantzig selector for a linear model of diffusion processes. Stat. Inference Stoch. Process. 2019, 22, 475–498.
  32. Gaïffas, S.; Matulewicz, G. Sparse inference of the drift of a high-dimensional Ornstein–Uhlenbeck process. J. Multivar. Anal. 2019, 169, 1–20.
  33. Koike, Y.; Yoshida, N. Covariance estimation and quasi-likelihood analysis. In Financial Mathematics, Volatility and Covariance Modelling; Chevallier, J., Goutte, S., Guerreiro, D., Saglio, S., Sanhaji, B., Eds.; Routledge: London, UK, 2019; Volume 2, Chapter 12; pp. 308–335.
  34. Duchi, J.; Gould, S.; Koller, D. Projected subgradient methods for learning sparse Gaussians. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, Helsinki, Finland, 9–12 July 2008; AUAI Press: Arlington, VA, USA, 2008; pp. 153–160.
  35. Rothman, A.J.; Bickel, P.J.; Levina, E.; Zhu, J. Sparse permutation invariant covariance estimation. Electron. J. Stat. 2008, 2, 494–515.
  36. Fan, J.; Li, Y.; Yu, K. Vast volatility matrix estimation using high-frequency data for portfolio selection. J. Am. Statist. Assoc. 2012, 107, 412–428.
  37. Kim, D.; Wang, Y. Sparse PCA-based on high-dimensional Itô processes with measurement errors. J. Multivar. Anal. 2016, 152, 172–189.
  38. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 2001, 96, 1348–1360.
  39. Ren, Z.; Sun, T.; Zhang, C.H.; Zhou, H.H. Asymptotic normality and optimalities in estimation of large Gaussian graphical models. Ann. Statist. 2015, 43, 991–1026.
  40. Chang, J.; Qiu, Y.; Yao, Q.; Zou, T. Confidence regions for entries of a large precision matrix. J. Econom. 2018, 206, 57–82.
  41. Campbell, J.Y.; Lo, A.W.; MacKinlay, A.C. The Econometrics of Financial Markets; Princeton University Press: Princeton, NJ, USA, 1997.
  42. Barigozzi, M.; Brownlees, C.; Lugosi, G. Power-law partial correlation network models. Electron. J. Stat. 2018, 12, 2905–2929.
  43. Reiß, M.; Todorov, V.; Tauchen, G. Nonparametric test for a constant beta between Itô semi-martingales based on high-frequency data. Stoch. Process. Appl. 2015, 125, 2955–2988.
  44. Fan, J.; Liao, Y.; Mincheva, M. High-dimensional covariance matrix estimation in approximate factor models. Ann. Statist. 2011, 39, 3320–3356.
  45. Cai, T.T.; Liu, W.; Zhou, H.H. Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation. Ann. Statist. 2016, 44, 455–488.
  46. Cai, T.T.; Ren, Z.; Zhou, H.H. Estimating structured high-dimensional covariance and precision matrices: Optimal rates and adaptive estimation. Electron. J. Stat. 2016, 10, 1–59.
  47. Nualart, D. The Malliavin Calculus and Related Topics, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2006.
  48. Janson, S. Gaussian Hilbert Space; Cambridge University Press: Cambridge, UK, 1997.
  49. Clément, E.; Gloter, A. Limit theorems in the Fourier transform method for the estimation of multivariate volatility. Stoch. Process. Appl. 2011, 121, 1097–1124.
  50. Christensen, K.; Podolskij, M.; Thamrongrat, N.; Veliyev, B. Inference from high-frequency data: A subsampling approach. J. Econom. 2017, 197, 245–272.
  51. Belloni, A.; Chernozhukov, V.; Chetverikov, D.; Hansen, C.; Kato, K. High-dimensional econometrics and regularized GMM. arXiv 2018, arXiv:1806.01888.
  52. Lam, C.; Fan, J. Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 2009, 37, 4254–4278.
  53. Sustik, M.A.; Calderhead, B. GLASSOFAST: An efficient GLASSO Implementation; UTCS Technical Report TR-12-29; The University of Texas at Austin: Austin, TX, USA, 2012.
  54. Friedman, J.; Hastie, T.; Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 2008, 9, 432–441.
  55. Yuan, M.; Lin, Y. Model selection and estimation in the Gaussian graphical model. Biometrika 2007, 94, 19–35.
  56. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 2010, 33, 1.
  57. Witten, D.M.; Friedman, J.H.; Simon, N. New insights and faster computations for the graphical lasso. J. Comput. Graph. Statist. 2011, 20, 892–900.
  58. Hoyle, D.C. Accuracy of pseudo-inverse covariance learning—A random matrix theory analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 1470–1481.
  59. Magnus, J.R.; Neudecker, H. Matrix Differential Calculus with Applications in Statistics and Econometrics; Wiley: New York, NY, USA, 1988.
  60. Horn, R.A.; Johnson, C.R. Matrix Analysis, 2nd ed.; Cambridge University Press: Cambridge, UK, 2013.
  61. Ogihara, T. Parametric inference for nonsynchronously observed diffusion processes in the presence of market microstructure noise. Bernoulli 2018, 24, 3318–3383.
  62. Lange, K. Optimization, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2013.
  63. Vershynin, R. High-Dimensional Probability; Cambridge University Press: Cambridge, UK, 2018.
  64. Van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes; Springer: Berlin/Heidelberg, Germany, 1996.
  65. Barlow, M.T.; Yor, M. Semi-martingale inequalities via the Garsia–Rodemich–Rumsey lemma and application to local times. J. Funct. Anal. 1982, 49, 198–229.
  66. Pinelis, I. Optimum bounds for the distributions of martingales in Banach spaces. Ann. Probab. 1994, 22, 1679–1706.
Figure 1. Partial correlation network of the S&P 500 component stocks on 1 March 2018.
Figure 2. Log-log size-rank plot for the eigenvalues of the estimated residual precision matrix of the S&P 500 component stocks on 1 March 2018.
Table 1. Estimation accuracy of different methods in Design 1.

  Metric                        n     RC        glasso   wglasso  f-glasso  f-wglasso  f-thr
  |||Σ̂_Y^{-1} − Σ_Y^{-1}|||     78    22.431    18.857   19.083   15.122    15.130     416.197
                                130   26.307    17.931   17.954   14.353    14.353     93.242
                                195   45.795    17.447   17.471   13.923    13.928     50.605
                                390   722.381   16.687   16.678   11.306    10.806     25.335
                                780   423.434   15.965   15.908   9.387     8.851      15.227
  |||Σ̂_Y^{-1} − Σ_Y^{-1}|||_2   78    6.576     4.270    4.263    3.419     3.420      138.442
                                130   6.508     3.654    3.468    3.193     3.193      28.384
                                195   6.480     3.381    3.271    3.094     3.097      14.307
                                390   203.038   3.009    3.015    2.133     2.100      6.446
                                780   93.354    2.788    2.855    1.782     1.693      3.562
  ‖Σ̂_Y − Σ_Y‖                   78    0.361     0.432    0.441    0.351     0.351      0.347
                                130   0.279     0.311    0.296    0.281     0.281      0.268
                                195   0.227     0.255    0.250    0.241     0.241      0.219
                                390   0.160     0.181    0.189    0.166     0.169      0.154
                                780   0.112     0.130    0.143    0.118     0.119      0.108
Note. RC: realized covariance matrix; glasso: graphical Lasso; wglasso: weighted graphical Lasso; f-glasso: graphical Lasso with taking the factor structure into account; f-wglasso: weighted graphical Lasso with taking the factor structure into account; f-thr: location-based thresholding with taking the factor structure into account (the method of [8]). The results are based on 10,000 Monte Carlo iterations.
Table 2. Estimation accuracy of different methods in Design 2.

  Metric                        n     RC        glasso   wglasso  f-glasso  f-wglasso
  |||Σ̂_Y^{-1} − Σ_Y^{-1}|||     78    47.934    43.144   43.055   35.347    35.263
                                130   48.266    43.166   41.750   34.767    34.284
                                195   50.049    42.806   40.571   34.154    32.835
                                390   338.847   41.060   37.801   33.100    29.934
                                780   401.447   38.886   34.961   32.163    23.121
  |||Σ̂_Y^{-1} − Σ_Y^{-1}|||_2   78    17.805    13.557   13.522   7.857     7.843
                                130   17.798    13.543   12.628   7.954     7.866
                                195   17.752    13.319   11.630   8.006     7.742
                                390   87.239    12.296   9.888    8.059     7.416
                                780   55.619    11.189   8.522    8.065     6.072
  ‖Σ̂_Y − Σ_Y‖                   78    0.669     0.723    0.721    0.632     0.631
                                130   0.509     0.678    0.572    0.489     0.481
                                195   0.412     0.567    0.470    0.403     0.390
                                390   0.289     0.298    0.339    0.282     0.273
                                780   0.203     0.198    0.252    0.197     0.192
Note. RC: realized covariance matrix; glasso: graphical Lasso; wglasso: weighted graphical Lasso; f-glasso: graphical Lasso with taking the factor structure into account; f-wglasso: weighted graphical Lasso with taking the factor structure into account. The results are based on 10,000 Monte Carlo iterations.
Table 3. Average coverages of entry-wise confidence intervals.

                           Design 1            Design 2
  Entries           n      95%      99%        95%      99%
  Θ_Z^{ij} = 0      78     95.21    99.04      95.22    99.04
                    130    95.13    99.03      95.13    99.03
                    195    95.09    99.02      95.09    99.02
                    390    95.04    99.01      95.05    99.01
                    780    95.02    99.00      95.02    99.01
  Θ_Z^{ij} ≠ 0      78     99.33    99.87      95.16    99.03
                    130    99.82    99.96      95.90    99.18
                    195    99.97    99.99      96.36    99.26
                    390    96.00    99.20      96.65    99.33
                    780    96.09    99.22      96.41    99.27
This table reports the average coverages of entry-wise confidence intervals for the residual precision matrix Θ_Z over the sets { (i, j) : i ≠ j, Θ_Z^{ij} = 0 } and { (i, j) : i ≠ j, Θ_Z^{ij} ≠ 0 }, respectively. The confidence intervals are constructed based on the normal approximation Equation (21). The results are based on 10,000 Monte Carlo iterations.
