Article

“A Bias Recognized Is a Bias Sterilized”: The Effects of a Bias in Forecast Evaluation

1 Facultad de Economía y Empresa, Universidad Diego Portales, Santiago 8370179, Chile
2 School of Business, Universidad Adolfo Ibáñez, Santiago 8380629, Chile
Mathematics 2022, 10(2), 171; https://doi.org/10.3390/math10020171
Submission received: 16 November 2021 / Revised: 30 December 2021 / Accepted: 1 January 2022 / Published: 6 January 2022

Abstract:
Are traditional tests of forecast evaluation well behaved when the competing (nested) model is biased? No, they are not. In this paper, we show analytically and via simulations that, under the null hypothesis of no encompassing, a bias in the nested model may severely distort the size properties of traditional out-of-sample tests in economic forecasting. Not surprisingly, these size distortions depend on the magnitude of the bias and the persistency of the additional predictors. We consider two different cases: (i) There is both in-sample and out-of-sample bias in the nested model. (ii) The bias is present exclusively out-of-sample. To address the former case, we propose a modified encompassing test (MENC-NEW) robust to a bias in the null model. Akin to the ENC-NEW statistic, the asymptotic distribution of our test is a functional of stochastic integrals of quadratic Brownian motions. While this distribution is not pivotal, we can easily estimate the nuisance parameters. To address the second case, we derive the new asymptotic distribution of the ENC-NEW, showing that critical values may differ remarkably. Our Monte Carlo simulations reveal that the MENC-NEW (and the ENC-NEW with adjusted critical values) is reasonably well-sized even when the ENC-NEW (with standard critical values) exhibits rejection rates three times higher than the nominal size.

1. Introduction

“Fortunately for serious minds, a bias recognized is a bias sterilized.” (Benjamin Haydon)
Diebold and Mariano (1995) [1] and West’s (1996) [2] seminal papers are typically pinpointed as the Big Bang of the forecast evaluation literature in economics and finance. Even though both papers propose asymptotically normal tests for forecast evaluation, the proper environment of each paper is different. The former considers the case of comparing forecasts (i.e., the forecasts are assumed to be given), while the latter focuses on the case of comparing models (i.e., the forecasts are constructed through estimated parametric models). Put simply, the key contribution of West’s asymptotic theory is that it accounts for parameter uncertainty.
While the asymptotic theory of [2] is quite general and allows a variety of estimation techniques and loss functions, it is not universal. One of the key assumptions in West’s theory is a full rank condition over the long-run variance of the loss function when parameters are set at their true values. One of the most iconic cases in which this condition is not fulfilled is the comparison of two competing nested models: Under the null hypothesis of no encompassing, both models are identical, and standard tests of forecast evaluation become degenerate. As pointed out by Clark and West (2006) [3] and West (2006) [4], for the case of MSPE, this degeneracy is not only important in a theoretical sense, but also in an empirical one: “[…] use of standard critical values usually results in very poorly sized tests, with far too few rejections. As well, the usual statistic has very poor power.” [4] p. 119 (A note of caution here. We are not arguing that this degeneracy necessarily implies that tests become undersized. Some of the simulations in [5] suggest both types of size distortions: sometimes tests are undersized, sometimes they are oversized.).
Not surprisingly, an important strand of this literature focuses on the case of nested models. Some of the most influential empirical papers in economic forecasting (such as Meese and Rogoff (1983, 1987) [6,7] and Goyal and Welch (2003, 2008) [8,9], among many others) consider the random walk as one of the most relevant benchmarks. Anecdotally, and just to emphasize the relevance and popularity of nested model comparisons, to date, the number of citations of [2] is slightly lower than that of Clark and West (2007) [10] (1552 and 1615, respectively), even though the latter is exclusively useful for nested models (and published a decade later).
An interesting approach dealing with nested models is the one of Clark and McCracken (2001, 2005) [5,11] and McCracken (2007) [12] (henceforth, CM). Let $P$ be the number of forecasts, $R$ the initial number of observations used to estimate our parameters, $T = P + R$, and $f_{t+1}(\hat{\beta}_t)$ the loss function for one-step-ahead forecasts, constructed with a vector of estimated parameters $\hat{\beta}_t$. The intuition of CM asymptotics is as follows: when models are nested, traditional tests of encompassing/accuracy become degenerate under the null hypothesis in the sense that $P^{-0.5}\sum_{t=R}^{T} f_{t+1}(\hat{\beta}_t) \to_p 0$. However, CM notice that $\sum_{t=R}^{T} f_{t+1}(\hat{\beta}_t)$ need not be degenerate. Based on this intuition, CM derive the asymptotic distribution of some traditional tests, such as the encompassing test of Harvey, Leybourne, and Newbold (1998) [13] (ENC-t), and a simple modification of the ENC-t (labeled ENC-NEW). CM show that the asymptotic distributions of these tests are non-standard: they are functionals of integrals of quadratic Brownian motions. Under specific conditions, however, these statistics are free of nuisance parameters (See Clark and McCracken (2013) [14] for a detailed discussion of these conditions.). For those cases, CM tabulate their critical values, which depend on how we update our parameters (either rolling, recursive, or fixed), the number of excess parameters in the nesting model ($k_2$), and the asymptotic limit of the ratio P/R.
Even though the asymptotic theory of CM is useful in many contexts, it has some important caveats. In particular, Assumption 3 in [5,12] requires that the generalized forecast errors form a martingale difference sequence. This condition may be, of course, violated in the case of misspecification in the null model. As commented by [12]: “The assumption has the side effect of imposing a type of correct model specification […] and thus rules out applications where the predictive models are misspecified” [12], p. 725. In our opinion, this is a somewhat overlooked aspect of CM results (Even though they take a completely different approach, the most related papers addressing misspecification in predictive models are those of Chao, Corradi and Swanson (2001) [15], Armah and Swanson (2008) [16], and Corradi and Swanson (2004, 2006, 2007) [17,18,19]. These authors propose bootstrap procedures that are robust to different types of misspecifications. In contrast, we derive the new asymptotic distributions in the presence of bias and propose a modification to the ENC-NEW.). We show that the use of CM critical values ignoring the effects of these misspecifications may lead to severe size distortions; specifically, we address the effects of a bias in the predictive models on the asymptotic distributions (The implication of a bias in the nested model is that, under the null hypothesis, the forecasting errors of both models are biased. In essence, we show that a bias in the forecast errors distorts the size properties of traditional tests.).
Why not simply use a correctly specified model, then? First, our view about econometric forecasting models coincides with Box (1976) [20] in the sense that “All models are wrong, but some of them are useful.” In this context, as forecasting models are just parsimonious representations of complex phenomena, it is reasonable to expect some misspecification in the predictive models. Moreover, the econometric framework of CM considers exclusively linear models estimated by OLS; thus, any nonlinearity in the true model may introduce some type of misspecification. Second, if we knew in advance the “correct” model, then “there may be no compelling reason for performing an out-of-sample version of the particular test (…) in the first place.” [17], p. 188. As noticed by Inoue and Kilian (2002) [21], in-sample analyses tend to outperform their out-of-sample counterparts in terms of power (when models are correctly specified); as a consequence, if we knew for sure that our models are “correct”, then the framework of CM may not be relevant to begin with: “After all, under the assumption of correct specification under the null, why not simply carry out in-sample inference, for the sake of efficiency?” [16], p. 196.
If there is indeed a bias in the null model, then by construction, the critical values provided by CM are no longer valid, as they rule out this possibility. In this context, we distinguish between two relevant cases: (i) there is a bias both in-sample and out-of-sample, and (ii) we find the bias exclusively out-of-sample. A leading case in which (i) is relevant is the following: If the expected value of the target variable is not zero, and the benchmark is the zero-forecast from a driftless random walk (DRW), then by construction, both models are biased under the null hypothesis, and inference may be severely incorrect when using CM critical values. As the DRW is one of the most important benchmarks in the economic/financial forecasting literature, we think this is an important extension of CM’s work, and we focus on this leading case (Some empirical papers using the DRW as a benchmark and the ENC-NEW with CM critical values are Pincheira and Hardy (2019, 2021) [22,23] and Pincheira et al. (2021) [24,25], to name a few. We emphasize, however, that these papers consider multiple benchmarks as the nested model (not exclusively the DRW). In this sense, we are not questioning their main conclusions by any means.). Regarding (ii), an important case is a shift in the expected value of the target variable: because out-of-sample analyses require multiple in-sample estimations of the parameters, a structural break in the expected value of the target variable (outside the estimation window) may introduce an out-of-sample bias.
Based on CM, we relax Assumption 3 in [5] and derive the new distribution for the ENC-NEW imposing a bias in the null model (either out-of-sample or both in-sample and out-of-sample). While this is a subtle change in the set of assumptions, it has important implications for the asymptotic theory. In particular, the quadratic Brownian motions in CM arise because of the martingale difference terms: put simply, the orthogonality condition is assumed to hold both in-sample and out-of-sample. In contrast, in our case, the quadratic Brownian motions arise because of both the predictors and the martingale difference terms. Moreover, if the bias appears both in-sample and out-of-sample, the persistency of the predictors shifts the expected value of the integrals of quadratic Brownian motions; thus, a re-centering is required: our MENC-NEW is simply a re-centered version of the ENC-NEW. Of course, in the absence of a bias, our MENC-NEW reduces to the ENC-NEW. As expected, we show that this new asymptotic distribution depends on the magnitude of the bias and the persistency of the predictors. Even though the asymptotic distribution of the MENC-NEW (and the ENC-NEW with adjusted critical values) is not pivotal, the nuisance parameters can be easily estimated (one of these nuisance parameters being, of course, the magnitude of the bias).
To assess the adequacy of our approach, we provide five different sets of Monte Carlo simulations. The first two sets are designed to evaluate the size properties of our MENC-NEW with both an in-sample and out-of-sample bias. The third and fourth sets are designed to evaluate the size properties of the ENC-NEW with our adjusted critical values when there is only an out-of-sample bias (e.g., a structural break). Finally, the last set of simulations evaluates the power properties of our test. We compare the performance of our approach against the ENC-NEW (with CM critical values), the ENC-t ([5]), and the Wild Clark and West (WCW-t) proposed by Pincheira, Hardy, and Muñoz (2021) [26]. Our results are crystal clear in terms of size: (i) A bias in the null model may severely distort the size properties of the ENC-NEW and the ENC-t. For instance, under the null hypothesis, some of our simulations show that both tests may exhibit an empirical size of over 30% when the nominal size is just 10%. (ii) In sharp contrast, our approach is reasonably well-sized in all our simulations, and it seems to outperform the other tests in most of our simulations. As expected, our approach dominates the rest of the tests when the data-generating process considers a greater bias. (iii) Next to our approach, the WCW-t exhibits the best size properties. However, as Pincheira, Hardy, and Muñoz (2021) [26] noticed, this advantage comes with a cost: the WCW-t displays significantly less power than the competitors. (iv) Even though the ENC-NEW is significantly more oversized than our test, there are no significant differences between both approaches in terms of power.
The rest of the paper is organized as follows. Section 2 provides a brief literature review on forecast evaluation in economics and finance. Section 3 considers the case of the DRW being the benchmark model (thus, a presence of a bias both in-sample and out-of-sample). In this case, we propose the MENC-NEW as the proper approach. Section 4 considers that the in-sample bias may be different from the out-of-sample bias. The interesting case here is an in-sample bias of zero but an out-of-sample bias different from zero (e.g., a shift in the drift of the target variable). In this case, there is no need for a re-centering term, although we derive the new asymptotic distribution of the ENC-NEW (thus, it is necessary to simulate new critical values). Section 5 considers four different Monte Carlo simulations to evaluate the size properties of our approach compared to traditional tests. At the end of Section 5, we evaluate the power of our test. Finally, Section 6 concludes.

2. Literature Review

Forecasting is a crucial area of study in financial econometrics: “It is obvious that forecasts are of great importance and widely used in economics and finance. Quite simply, good forecasts lead to good decisions.” Diebold and Lopez (1996) [27], page 241. Of course, a forecast has to be good in order to be useful, but how do we assess the quality of the forecast? This is the raison d’être of the literature on forecast evaluation. See West (2006) [4] and Clark and McCracken (2013) [28] for great reviews on forecast evaluation.
Forecast evaluation has a long tradition in economics and finance. For instance, West (2006) [4] recognizes Wilson (1934) [29] as one of the earlier examples. Other famous works are Fair and Shiller (1989, 1990) [30,31] and Meese and Rogoff (1983, 1988) [6,7]. The main intuition in these empirical papers is the following: if a model is a reasonably good representation of the target variable, then this model should accurately forecast this target variable. In this sense, forecast evaluation is at the core of empirical time series works.
Diebold and Mariano (1995) [1] (DM) and West (1996) [2] are two famous seminal papers addressing formal procedures to evaluate forecasts. In short, DM propose a direct application of the Central Limit Theorem for stationary series. In some sense, the DM approach is appealing since it relies on weak assumptions on the loss functions and the forecasting errors. Moreover, the Central Limit Theorem ensures asymptotic normality, and consequently, conducting inference in this framework is straightforward. Nevertheless, an important caveat of the DM approach is that they consider forecasts as primitives: they do not consider the effects of parameter uncertainty. Intuitively, the DM approach is valid for evaluating forecasts but not for inference about models.
The main contribution of [2] is considering the case of forecasting models. The author provides a formal procedure for testing predictive ability at the population level. In essence, West (1996) [2] addresses how estimation error on the model’s parameters may affect the proper inference about predictive ability. West notices that, under specific conditions, estimation error vanishes asymptotically; if that is the case, then the approaches of DM and West are equivalent. Notably, the asymptotic theory of West rules out the comparison of nested models: West requires a full rank condition over the variance of the loss function. Intuitively, in nested models, the null hypothesis of equal predictive ability means that both models are identical; thus the asymptotic distribution of the test becomes degenerate.
The comparison of nested models is extremely relevant in economics and finance. A common practice is to compare the predictive performance of a model with a simple benchmark. One of the most commonly considered benchmarks in the empirical literature is the random walk, which is typically nested in more complex predictive models. Even though the random walk is very simple, it is difficult to outperform empirically. As famously noticed by Meese and Rogoff (1983) [6]: “We find that a random walk model would have predicted major-country exchange rates during the recent floating-rate period as well as any of our candidate models” [6], page 3. See Goyal and Welch (2008) [9], Rossi (2013) [32], and Meese and Rogoff (1988) [7] for more examples of how difficult it is to outperform a simple random walk with more complex nesting models.
Due to its relevance, an important strand of the literature focuses on formal procedures to compare the predictive performance of nested models. Clark and West [3,10] (CW) propose an adjusted mean squared prediction error test and show via simulations that this test works reasonably well with standard normal critical values. Recently, Pincheira, Hardy, and Muñoz (2021) [26] (PHM) consider a simple modification of the CW test. This modification prevents the core statistic from becoming degenerate under the null hypothesis. Using the asymptotic theory of [2], PHM show that their test is asymptotically standard normal. The main caveat of their approach is that their modification erodes some of the power properties of the CW test.
A different approach to dealing with nested models is that of Clark and McCracken (2001, 2005) [5,11] and McCracken (2007) [12]. Using a completely different asymptotic theory, the authors consider some of the most popular tests of forecast evaluation and derive their asymptotic distributions for the special case of nested models. The authors show that the asymptotic distributions of these statistics are no longer standard: they are functionals of quadratic Brownian motions. These distributions depend on the estimation scheme considered to update the parameters of the model (either rolling or recursive), on the number of excess parameters in the nesting model, and on the asymptotic limit of the ratio P/R. Under some specific conditions, CM show that these distributions are pivotal, and they provide tables with the relevant critical values. Additionally, CM propose a different regularization of the encompassing test proposed by [13]: they label this new encompassing test the ENC-NEW. Notably, the simulations of CM suggest that the ENC-NEW has an edge in terms of power compared to other traditional tests.
One problem with the approach of CM is that it requires the strong assumption that the nested model is correctly specified. In essence, this assumption rules out the possibility of a bias in the benchmark model: CM critical values are no longer valid if that is the case. To the best of our knowledge, this is the first paper that addresses how this type of misspecification affects the asymptotic distribution of these tests. First, using the same asymptotic theory as CM, we analyze how a bias in the null model modifies the asymptotic distribution of the ENC-NEW. This asymptotic distribution is not pivotal and relies on the magnitude of the bias and the persistency of the additional predictor in the nesting model. Second, we show via simulations that ignoring this bias, and using CM critical values, may lead to severe size distortions. Not surprisingly, these size distortions depend on the magnitude of the bias and the persistency of the additional predictor.

3. The DRW as the Null Model and the MENC-NEW Test

The theory of CM imposes that the benchmark model is correctly specified. This section illustrates the effects of failing this condition because of a bias both in-sample and out-of-sample. The leading case is the DRW as the benchmark model: If the expected value of the target variable is not zero, then both models are biased under the null hypothesis of no encompassing. Considering the relevance of the DRW in the forecasting literature, we focus exclusively on this leading case. With some technical subtleties, the generalization to a linear parametric benchmark is straightforward and considered in Section 4. In fact, the results of Section 3 are special cases of our decompositions in Section 4.
Following CM notation, let $k_2$ be the number of parameters in the nesting model, and $W(s)$ be a $(k_2 \times 1)$ vector Brownian motion with covariance kernel equal to the identity matrix. Let $Y_{t+1}$ be our scalar target variable, such that $Y_{t+1} = \Delta + u_{t+1}$, where $u_{t+1}$ is simply white noise with variance $\sigma_u^2$, and $\Delta$ captures the expected value of $Y_{t+1}$. Consider two competing models for one-step-ahead forecasts. The first one is simply the zero-forecast from a DRW, with forecast errors denoted by $u_{1,t+1}$:
$$Y_{t+1} = u_{1,t+1} \qquad (\text{model 1: null model})$$
The second one is a linear parametric model
$$Y_{t+1} = X_t'\beta^* + u_{2,t+1} \qquad (\text{model 2: alternative model})$$
Notice that $\beta^*$ denotes the $(k_2 \times 1)$ vector of true (population) parameters, and $X_t$ is assumed to be a zero-mean covariance-stationary vector of $k_2$ predictors. Let $\hat{\beta}_t$ be the OLS estimator of $\beta^*$ constructed with either a rolling, recursive, or fixed scheme.
As noticed by [2], under the null hypothesis of no encompassing, $\beta^* = 0$ and both models are identical for all $t$. Therefore, traditional tests of encompassing/accuracy become degenerate under West’s asymptotic theory. To address this issue, CM propose a new statistic (ENC-NEW) and derive its distribution using a different set of asymptotics:
$$\text{ENC-NEW} = P\,\frac{P^{-1}\sum_{t}\hat{u}_{1,t+1}\left(\hat{u}_{1,t+1}-\hat{u}_{2,t+1}\right)}{P^{-1}\sum_{t}\hat{u}_{2,t+1}^{2}}$$
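As a concrete illustration, the statistic can be computed directly from the two out-of-sample forecast-error series. The sketch below is a minimal NumPy implementation of the formula above; the function name and array inputs are our own illustrative choices, not code from the original papers.

```python
import numpy as np

def enc_new(e1, e2):
    """ENC-NEW statistic from out-of-sample forecast errors.

    e1: errors of the nested (null) model; e2: errors of the nesting model.
    Computes P * [P^{-1} sum e1*(e1 - e2)] / [P^{-1} sum e2^2].
    """
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    P = e1.size
    numerator = np.mean(e1 * (e1 - e2))  # P^{-1} sum of u1*(u1 - u2)
    denominator = np.mean(e2 ** 2)       # P^{-1} sum of u2^2
    return P * numerator / denominator
```

When the two error series coincide (as under the null with no parameter uncertainty), the numerator, and hence the statistic, is exactly zero.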
The limit distribution of the ENC-NEW depends on how parameters are updated: either $\int_{R/T}^{1} s^{-1} W(s)'\,dW(s)$ (recursive) or $(T/R)\int_{R/T}^{1}\left(W(s)-W\!\left(s-\tfrac{R}{T}\right)\right)'dW(s)$ (rolling); see [5], page 93. One of the key assumptions in CM is that model 1 is correctly specified under the null hypothesis. Notice, however, that the DRW is free of parameter uncertainty ($\hat{u}_{1,t+1} = u_{1,t+1}$), and
$$Y_{t+1} = \hat{u}_{1,t+1} = u_{1,t+1} = \Delta + u_{t+1}$$
In contrast, for the case of model 2
$$Y_{t+1} = X_t'\hat{\beta}_t + \hat{u}_{2,t+1}$$
where $\hat{u}_{2,t+1} = X_t'\left(\beta^* - \hat{\beta}_t\right) + u_{2,t+1}$.
Under the null of no encompassing, $\beta^* = 0$, and $u_{1,t+1} = u_{2,t+1} = \Delta + u_{t+1}$. This is the key difference with CM asymptotic theory: under the null hypothesis, both models are misspecified (biased); the reason is that neither of them captures the mean of our target variable. By construction, we will observe this type of misspecification whenever $\Delta = E\,Y_{t+1} \neq 0$ and the nested model is the zero-forecast from a driftless random walk. We focus on this leading case as the DRW is one of the most frequently used benchmarks in financial forecasting.
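This leading case is straightforward to simulate. The sketch below, our own construction, generates one draw of both out-of-sample error series under the null hypothesis ($\beta^* = 0$), with model 2 estimated recursively by OLS; the parameter values ($\Delta = 0.5$, an AR(1) predictor with coefficient $0.8$) are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_errors(T=200, R=100, delta=0.5, rho=0.8):
    """One draw of out-of-sample errors under the null (beta* = 0).

    Model 1 (DRW): the forecast is zero, so e1_{t+1} = Y_{t+1} = delta + u_{t+1}.
    Model 2: recursive OLS of Y_{j+1} on X_j using data up to time t (k2 = 1).
    """
    u = rng.standard_normal(T + 1)
    y = delta + u[1:]                    # Y_{t+1} = delta + u_{t+1}, biased under H0
    x = np.zeros(T)                      # zero-mean AR(1) predictor
    for t in range(1, T):
        x[t] = rho * x[t - 1] + rng.standard_normal()
    e1, e2 = [], []
    for t in range(R, T):                # one-step-ahead, recursive scheme
        X, Y = x[:t], y[:t]
        beta = (X @ Y) / (X @ X)         # OLS without intercept
        e1.append(y[t])                  # DRW zero forecast
        e2.append(y[t] - x[t] * beta)
    return np.array(e1), np.array(e2)
```

Note that both error series inherit the bias: their sample means fluctuate around $\Delta$, which is precisely what distorts the standard asymptotics.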
Let us now illustrate the effects of this bias over the asymptotic distribution of the ENC-NEW. For clarity of exposition, we focus our discussion and most of our proofs on the recursive scheme (since they are all similar). However, we do provide the main results for both the rolling and recursive schemes in Theorem 2.
From West (1996), under the null hypothesis, we have $P^{-1}\sum_t \hat{u}_{2,t+1}^2 \to \sigma_2^2 = E\,u_{t+1}^2 + \Delta^2 = \sigma_u^2 + \Delta^2$ as $P, R \to \infty$. In what follows, we assume that $X_t u_{t+1}$ (rather than $X_t u_{2,t+1}$) is a martingale difference sequence. Now consider the following decomposition of the core statistic:
$$\sum_{t=R}^{T-1}\hat{u}_{1,t+1}\left(\hat{u}_{1,t+1}-\hat{u}_{2,t+1}\right) = \sum_{t=R}^{T-1}u_{1,t+1}\left(u_{1,t+1}-\hat{u}_{2,t+1}\right) = \sum_{t=R}^{T-1}u_{1,t+1}X_t'\left(\hat{\beta}_t-\beta^*\right) = \sum_{t=R}^{T-1}\left(\Delta+u_{t+1}\right)X_t'\left(\hat{\beta}_t-\beta^*\right)$$
$$= \sum_{t=R}^{T-1}\left(\Delta+u_{t+1}\right)X_t'\Big(\sum_{j=1}^{t-1}X_jX_j'\Big)^{-1}\Big(\sum_{j=1}^{t-1}X_ju_{2,j+1}\Big) = \sum_{t=R}^{T-1}\left(\Delta+u_{t+1}\right)X_t'\Big(\sum_{j=1}^{t-1}X_jX_j'\Big)^{-1}\Big(\sum_{j=1}^{t-1}X_j\left(\Delta+u_{j+1}\right)\Big)$$
$$= \sum_{t=R}^{T-1}u_{t+1}X_t'\Big(\sum_{j=1}^{t-1}X_jX_j'\Big)^{-1}\Big(\sum_{j=1}^{t-1}X_ju_{j+1}\Big) \qquad (1)$$
$$\;+\;\Delta\sum_{t=R}^{T-1}X_t'\Big(\sum_{j=1}^{t-1}X_jX_j'\Big)^{-1}\Big(\sum_{j=1}^{t-1}X_ju_{j+1}\Big) \qquad (2)$$
$$\;+\;\Delta\sum_{t=R}^{T-1}u_{t+1}X_t'\Big(\sum_{j=1}^{t-1}X_jX_j'\Big)^{-1}\Big(\sum_{j=1}^{t-1}X_j\Big) \qquad (3)$$
$$\;+\;\Delta^2\sum_{t=R}^{T-1}X_t'\Big(\sum_{j=1}^{t-1}X_jX_j'\Big)^{-1}\Big(\sum_{j=1}^{t-1}X_j\Big) \qquad (4)$$
This decomposition explicitly shows the effects of parameter uncertainty on the core statistic, and it makes clear that the asymptotic distributions of the ENC-t and the ENC-NEW depend on how parameters are updated (either rolling or recursive). Notice that (1) is precisely the same term as Lemma A6 in [33] for the numerator of the ENC-NEW in a recursive scheme. We may interpret (1) as the core statistic when there is no bias; if that is the case, then $\Delta = 0$, and consequently $(2) = (3) = (4) = 0$, and the distribution of the ENC-NEW is given by Theorem 3.3 of [5]. Nevertheless, terms (2)–(4) in the ENC-NEW arise because $\Delta$ may not be zero. In some sense, we can view the Clark and McCracken (2001) [5] results as the special case in which $\Delta = 0$. To establish the new asymptotic distribution of the ENC-NEW, we use the following results.
Lemma 1.
(a)
$$\sum_{t=R}^{T-1}\Big(\frac{T}{t}\Big)T^{-0.5}u_{t+1}X_t'\Big(\frac{1}{t}\sum_{j=1}^{t-1}X_jX_j'\Big)^{-1}\Big(T^{-0.5}\sum_{j=1}^{t-1}X_ju_{j+1}\Big) = \sum_{t=R}^{T-1}\Big(\frac{T}{t}\Big)T^{-0.5}u_{t+1}X_t'\left(EX_tX_t'\right)^{-1}\Big(T^{-0.5}\sum_{j=1}^{t-1}X_ju_{j+1}\Big) + o_p(1)$$
(b)
$$\Delta\sum_{t=R}^{T-1}\Big(\frac{T}{t}\Big)T^{-0.5}X_t'\Big(\frac{1}{t}\sum_{j=1}^{t-1}X_jX_j'\Big)^{-1}\Big(T^{-0.5}\sum_{j=1}^{t-1}X_ju_{j+1}\Big) = \Delta\sum_{t=R}^{T-1}\Big(\frac{T}{t}\Big)T^{-0.5}X_t'\left(EX_tX_t'\right)^{-1}\Big(T^{-0.5}\sum_{j=1}^{t-1}X_ju_{j+1}\Big) + o_p(1)$$
(c)
$$\Delta\sum_{t=R}^{T-1}\Big(\frac{T}{t}\Big)T^{-0.5}u_{t+1}X_t'\Big(\frac{1}{t}\sum_{j=1}^{t-1}X_jX_j'\Big)^{-1}\Big(T^{-0.5}\sum_{j=1}^{t-1}X_j\Big) = \Delta\sum_{t=R}^{T-1}\Big(\frac{T}{t}\Big)T^{-0.5}u_{t+1}X_t'\left(EX_tX_t'\right)^{-1}\Big(T^{-0.5}\sum_{j=1}^{t-1}X_j\Big) + o_p(1)$$
(d)
$$\Delta^2\sum_{t=R}^{T-1}\Big(\frac{T}{t}\Big)T^{-0.5}X_t'\Big(\frac{1}{t}\sum_{j=1}^{t-1}X_jX_j'\Big)^{-1}\Big(T^{-0.5}\sum_{j=1}^{t-1}X_j\Big) = \Delta^2\sum_{t=R}^{T-1}\Big(\frac{T}{t}\Big)T^{-0.5}X_t'\left(EX_tX_t'\right)^{-1}\Big(T^{-0.5}\sum_{j=1}^{t-1}X_j\Big) + o_p(1)$$
(e)
$$\sum_{t=R}^{T-1}\hat{u}_{1,t+1}\left(\hat{u}_{1,t+1}-\hat{u}_{2,t+1}\right) = \sum_{t=R}^{T-1}\Big(\frac{T}{t}\Big)T^{-0.5}u_{t+1}X_t'\left(EX_tX_t'\right)^{-1}\Big(T^{-0.5}\sum_{j=1}^{t-1}X_ju_{j+1}\Big) + \Delta\sum_{t=R}^{T-1}\Big(\frac{T}{t}\Big)T^{-0.5}X_t'\left(EX_tX_t'\right)^{-1}\Big(T^{-0.5}\sum_{j=1}^{t-1}X_ju_{j+1}\Big) + \Delta\sum_{t=R}^{T-1}\Big(\frac{T}{t}\Big)T^{-0.5}u_{t+1}X_t'\left(EX_tX_t'\right)^{-1}\Big(T^{-0.5}\sum_{j=1}^{t-1}X_j\Big) + \Delta^2\sum_{t=R}^{T-1}\Big(\frac{T}{t}\Big)T^{-0.5}X_t'\left(EX_tX_t'\right)^{-1}\Big(T^{-0.5}\sum_{j=1}^{t-1}X_j\Big) + o_p(1)$$
Proof of Lemma 1.
See Appendix A. □
Theorem 1 next establishes the asymptotic distribution of $\sum_t \hat{u}_{1,t+1}\left(\hat{u}_{1,t+1}-\hat{u}_{2,t+1}\right)$.
Theorem 1.
(a)
$$\sum_{t=R}^{T-1}\Big(\frac{T}{t}\Big)T^{-0.5}u_{t+1}X_t'\left(EX_tX_t'\right)^{-1}\Big(T^{-0.5}\sum_{j=1}^{t-1}X_ju_{j+1}\Big) \Rightarrow \sigma_u^2\int_{\lambda}^{1}s^{-1}W_1(s)'\,dW_1(s)$$
(b)
$$\Delta\sum_{t=R}^{T-1}\Big(\frac{T}{t}\Big)T^{-0.5}X_t'\left(EX_tX_t'\right)^{-1}\Big(T^{-0.5}\sum_{j=1}^{t-1}X_ju_{j+1}\Big) \Rightarrow \Delta\,\sigma_u\int_{\lambda}^{1}s^{-1}W_1(s)'\left(EX_tX_t'\right)^{-1}V^{LP}(X_t)\,dW_2(s)$$
(c)
$$\Delta\sum_{t=R}^{T-1}\Big(\frac{T}{t}\Big)T^{-0.5}u_{t+1}X_t'\left(EX_tX_t'\right)^{-1}\Big(T^{-0.5}\sum_{j=1}^{t-1}X_j\Big) \Rightarrow \Delta\,\sigma_u\int_{\lambda}^{1}s^{-1}W_2(s)'V^{LP}(X_t)\left(EX_tX_t'\right)^{-1}\,dW_1(s)$$
(d)
$$\Delta^2\sum_{t=R}^{T-1}\Big(\frac{T}{t}\Big)T^{-0.5}X_t'\left(EX_tX_t'\right)^{-1}\Big(T^{-0.5}\sum_{j=1}^{t-1}X_j\Big) \Rightarrow \Delta^2\int_{\lambda}^{1}s^{-1}W_2(s)'V^{LP}(X_t)\left(EX_tX_t'\right)^{-1}\,dW_2(s) + \Lambda_\lambda$$
(e)
$$\sum_{t=R}^{T-1}\hat{u}_{1,t+1}\left(\hat{u}_{1,t+1}-\hat{u}_{2,t+1}\right) \Rightarrow \sigma_u^2\int_{\lambda}^{1}s^{-1}W_1(s)'\,dW_1(s) + \Delta\,\sigma_u\int_{\lambda}^{1}s^{-1}W_1(s)'\left(EX_tX_t'\right)^{-1}V^{LP}(X_t)\,dW_2(s) + \Delta\,\sigma_u\int_{\lambda}^{1}s^{-1}W_2(s)'V^{LP}(X_t)\left(EX_tX_t'\right)^{-1}\,dW_1(s) + \Delta^2\int_{\lambda}^{1}s^{-1}W_2(s)'V^{LP}(X_t)\left(EX_tX_t'\right)^{-1}\,dW_2(s) + \Lambda_\lambda$$
Proof of Theorem 1.
See Appendix B. □
Here, $V^{LP}(X_t)$ is the long-run variance of $X_t$ and $\lambda = R/T$. As expected, if $\Delta = 0$ (no bias), the asymptotic distribution of $\sum_t \hat{u}_{1,t+1}(\hat{u}_{1,t+1}-\hat{u}_{2,t+1})$ is simply that of Theorem 1a, $\sigma_u^2\int_{\lambda}^{1}s^{-1}W_1(s)'\,dW_1(s)$ (the same as CM). In contrast, if $\Delta \neq 0$, the limit distribution of $\sum_t \hat{u}_{1,t+1}(\hat{u}_{1,t+1}-\hat{u}_{2,t+1})$ is given by Theorem 1e. Compared to CM, this distribution depends on three additional nuisance parameters: the magnitude of the bias $\Delta$, the persistency of the additional predictors $V^{LP}(X_t)$, and $\Lambda_\lambda$ (which is also related to the persistency of the predictors). Consequently, our Monte Carlo simulations in Section 5 reveal that the size distortions of the ENC-NEW with CM critical values are more severe with greater bias and persistency. Finally, Theorem 1e establishes that the asymptotic distribution of $\sum_t \hat{u}_{1,t+1}(\hat{u}_{1,t+1}-\hat{u}_{2,t+1})$ depends not only on $W_1$, but also on $W_2$; while the former appears because of $X_t u_{t+1}$ (as in CM), the latter arises because of $X_t$. Both standard Brownian motions are assumed to be independent.
Notice that the term $\Lambda_\lambda$ in Theorem 1d,e arises because of the correlation between $X_t$ and $X_j$ (i.e., the correlation between the predictor in the estimation window and the evaluation window). Put simply, $\Lambda_\lambda$ depends on the covariances $E(X_tX_j)$. As $W_1$ and $W_2$ are assumed to be independent, we must re-center the term $\Delta^2\int_{\lambda}^{1}s^{-1}W_2(s)'V^{LP}(X_t)\left(EX_tX_t'\right)^{-1}dW_2(s)$ whenever $E(X_tX_j) \neq 0$. See White (2014) [34], p. 198 for a similar discussion. Notably, our proposed MENC-NEW is simply a re-centered version of the ENC-NEW, corrected by the autocovariances that shift the distribution. The re-centering term is simply the expected value of $\Delta^2\int_{\lambda}^{1}s^{-1}W_2(s)'V^{LP}(X_t)\left(EX_tX_t'\right)^{-1}dW_2(s)$. For clarity of exposition, consider the following example: suppose that $k_2 = 1$; then the re-centering term is simply given by
$$\Lambda_\lambda = \Delta^2\left(EX_t^2\right)^{-1}\sum_{t=R}^{T-1}\Big(\frac{1}{t}\Big)\sum_{j=1}^{t-1}E\left(X_tX_j\right)$$
While the sum of the covariances $E(X_tX_j)$ is unknown, it can be estimated. Our proposed MENC-NEW is simply
$$\text{MENC-NEW} = \frac{P\left[P^{-1}\sum_{t}\hat{u}_{1,t+1}\left(\hat{u}_{1,t+1}-\hat{u}_{2,t+1}\right)\right]-\hat{\Lambda}_\lambda}{P^{-1}\sum_{t}\hat{u}_{2,t+1}^{2}}$$
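For $k_2 = 1$, the re-centering term can be estimated by replacing the population autocovariances $E(X_tX_j)$ with their sample counterparts. The sketch below is a plug-in construction of our own; the function names and the untruncated autocovariance sum are illustrative choices, not the authors' exact estimator.

```python
import numpy as np

def lambda_hat(x, R, delta_hat):
    """Plug-in estimate of the re-centering term for k2 = 1:
    delta^2 * (E X_t^2)^{-1} * sum_{t=R}^{T-1} (1/t) * sum_{j=1}^{t-1} E(X_t X_j)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    ex2 = np.mean(x ** 2)
    # gamma[h] estimates the autocovariance E(X_t X_{t-h}) of the zero-mean predictor
    gamma = np.array([np.mean(x[h:] * x[: T - h]) for h in range(T)])
    return delta_hat ** 2 / ex2 * sum(gamma[1:t].sum() / t for t in range(R, T))

def menc_new(e1, e2, lam_hat):
    """Re-centered ENC-NEW: subtract the estimated shift from the numerator."""
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    return (np.sum(e1 * (e1 - e2)) - lam_hat) / np.mean(e2 ** 2)
```

With `lam_hat = 0` the statistic reduces to the ENC-NEW, mirroring the $\Delta = 0$ (no-bias) case in which no re-centering is needed.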
As $\hat{\Lambda}_\lambda$ depends on the sum of the covariances, its estimation depends on $\lambda$ (i.e., how we split the data). Notice that this re-centering term will be particularly important for both the recursive and the rolling scheme. Theorem 2 next establishes the asymptotic distribution of our MENC-NEW for each updating scheme.
Theorem 2.
The asymptotic distribution of the MENC-NEW = P P 1 t u ^ 1 , t + 1 ( u ^ 1 , t + 1 u ^ 2 , t + 1 ) Λ ^ λ P 1 t u ^ 2 , t + 1 2 is given by
M E N C N E W P , R ( r e c u r s i v e )   1 σ u 2 + Δ 2 { σ u 2 λ 1 s 1 W 1 ( s ) d W 1 ( s ) + Δ σ u λ 1 s 1 W 1 ( s ) ( E X t X t ) 0.5 V L P ( X t ) d W 2 ( s ) + Δ σ u λ 1 s 1 W 1 ( s ) ( E X t X t ) 0.5 V L P ( X t ) d W 2 ( s ) + Δ 2 λ 1 s 1 W 2 ( s ) V L P ( X t ) ( E X t X t ) 1 d W 2 ( s ) }
M E N C N E W P , R ( r o l l i n g ) λ 1 σ u 2 + Δ 2 { σ u 2 λ 1 [ W 1 ( s ) W 1 ( s λ ) ] d W 1 ( s ) + Δ σ u λ 1 [ W 1 ( s ) W 1 ( s λ ) ] ( E X t X t ) 0.5 V L P ( X t ) d W 2 ( s ) + Δ σ u λ 1 [ W 2 ( s ) W 2 ( s λ ) ] V L P ( X t ) ( E X t X t ) 0.5 d W 1 ( s ) + Δ 2 λ 1 [ W 2 ( s ) W 2 ( s λ ) ] V L P ( X t ) ( E X t X t ) 1 d W 2 ( s ) }
where $R/T \to \lambda$ as $T, R \to \infty$.
Proof of Theorem 2.
This is a direct application of Theorem 1 and the definition of the MENC-NEW. □
Not surprisingly, if we set Δ = 0 (i.e., no bias), the asymptotic distribution of the MENC-NEW is either $\int_\lambda^1 s^{-1} W_1(s)\,dW_1(s)$ (recursive) or $\lambda^{-1}\int_\lambda^1 [W_1(s)-W_1(s-\lambda)]\,dW_1(s)$ (rolling), which is precisely Theorem 3c in [5]. In other words, Theorem 2 encompasses the case of unbiasedness.
The main caveat of our approach is that these distributions are not pivotal: to obtain our critical values, we must simulate the quadratic Brownian motions using estimates of the nuisance parameters. Notably, relaxing the assumption Δ = 0 significantly complicates the asymptotic distribution of our test. Notice that the distribution of the MENC-NEW depends on (i) the number of excess parameters k2; (ii) how we update our parameters (either recursive or rolling); (iii) the persistency of the additional predictors (through V_LP(X_t)); (iv) the magnitude and sign of the bias (through Δ); (v) the variances (E X_t X_t')^{-1} and σ_u²; and (vi) how we split our dataset (through λ). In contrast, the distribution of the ENC-NEW depends only on (i), (ii), and (vi); for this reason, CM are able to provide a set of critical values. We emphasize, however, that ignoring these additional parameters may distort the empirical size of these tests, as we show in Section 5.
Let us illustrate numerically how the critical values of the ENC-NEW might change with a bias. Suppose we are interested in a recursive scheme, and we use fifty percent of the sample to estimate our parameters and fifty percent to evaluate our forecasts (R/T = 0.5, or P/R = 1). Suppose the alternative model considers just one predictor X_t (i.e., k2 = 1). Under the assumption of no bias, CM establish the asymptotic distribution of the ENC-NEW in a recursive scheme as $\int_\lambda^1 s^{-1} W(s)\,dW(s)$, with λ = lim R/T, and tabulate the corresponding critical values. In this case, the critical values for the 10%, 5%, and 1% significance levels are 0.984, 1.584, and 3.209, respectively.
Nevertheless, Theorem 1 establishes that the proper asymptotic distribution of the ENC-NEW with a bias under the null model should be
$$\frac{1}{\sigma_u^2+\Delta^2}\Big(\sigma_u^2\int_\lambda^1 s^{-1}W_1(s)\,dW_1(s) + \Delta\sigma_u\int_\lambda^1 s^{-1}W_1(s)\,(E X_t X_t')^{-0.5}V_{LP}(X_t)\,dW_2(s) + \Delta\sigma_u\int_\lambda^1 s^{-1}W_2(s)'V_{LP}(X_t)\,(E X_t X_t')^{-0.5}\,dW_1(s) + \Delta^2\int_\lambda^1 s^{-1}W_2(s)'V_{LP}(X_t)\,(E X_t X_t')^{-1}\,dW_2(s) + \Lambda_\lambda\Big)$$
Suppose the predictor in the alternative model follows an AR(1) process: $X_t = 0.5 X_{t-1} + u_{X,t}$, with $V(u_{X,t}) = 0.1$. In that case, $V_{LP}(X_t) = 0.1/(1-0.5)^2 = 0.4$ and $E X_t^2 = 0.1/(1-0.5^2) = 0.1333$. Set Δ = 0.5 and σ_u² = 0.1. The critical values for the 10%, 5%, and 1% significance levels in this new scenario are 2.380, 3.760, and 7.676, respectively. In this case, the bias in the null model shifts the critical values significantly to the right: in this illustration, the critical values are more than twice those tabulated by CM. Consequently, using CM critical values may lead to severely oversized tests. This is supported by our Monte Carlo simulations.
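The AR(1) arithmetic in this example is easy to verify. A quick check (our own sketch, with hypothetical names), using the standard stationary AR(1) moments — long-run variance $V(u_X)/(1-\rho)^2$ and unconditional second moment $V(u_X)/(1-\rho^2)$:

```python
def ar1_nuisance(rho, innov_var):
    """Nuisance parameters of a stationary AR(1): X_t = rho*X_{t-1} + u_t."""
    v_lp = innov_var / (1.0 - rho) ** 2    # long-run variance V_LP(X_t)
    ex2 = innov_var / (1.0 - rho ** 2)     # unconditional second moment E[X_t^2]
    return v_lp, ex2

print(ar1_nuisance(0.5, 0.1))  # (0.4, 0.1333...)
```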

4. The General Case of a Parametric Nested Benchmark and the Asymptotic Distribution of the ENC-NEW

This section generalizes our previous results by considering a parametric nested benchmark (possibly different from the DRW). Here, we allow for different in-sample and out-of-sample biases. As we estimate our parameters by OLS, in-sample residuals are mean zero by construction, so it is reasonable to think that any bias is present exclusively out-of-sample. For this reason, we first derive the asymptotic distribution of the ENC-NEW allowing for both an in-sample and an out-of-sample bias (Δ_IS and Δ_oos, respectively, with possibly Δ_IS ≠ Δ_oos), and then we focus on the particular (but more relevant) case of Δ_IS = 0.
Let X_1 and X_2 be the (k1 × 1) and (k × 1) vectors of predictors of the nested and nesting models, respectively. In contrast to the previous section, k = k1 + k2; in other words, k2 is the number of additional predictors considered in the nesting model. Let J be a (k × k1) selection matrix, constructed by stacking a (k1 × k1) identity matrix on top of a (k2 × k1) block of zeroes. In contrast to the previous section, additional terms arise in our decomposition because model 1 is also affected by parameter uncertainty. We focus our discussion and most of our proofs on the recursive scheme. We do provide, however, the main results for both the rolling and recursive schemes. First, notice that
$$\hat u_{1,t+1} = u_{1,t+1} - X_{1t}'(\hat\beta_{1t}-\beta_1^*) = (u_{t+1}+\Delta_{oos}) - X_{1t}'(\hat\beta_{1t}-\beta_1^*)$$
$$\hat u_{2,t+1} = u_{2,t+1} - X_{2t}'(\hat\beta_{2t}-\beta_2^*) = (u_{t+1}+\Delta_{oos}) - X_{2t}'(\hat\beta_{2t}-\beta_2^*)$$
Then, the core statistic from the ENC-NEW is simply given by
$$\sum_{t=R}^{T-1}\hat u_{1,t+1}(\hat u_{1,t+1}-\hat u_{2,t+1}) = \sum_{t=R}^{T-1}\big[(u_{t+1}+\Delta_{oos}) - X_{1t}'(\hat\beta_{1t}-\beta_1^*)\big]\big[X_{2t}'(\hat\beta_{2t}-\beta_2^*) - X_{1t}'(\hat\beta_{1t}-\beta_1^*)\big]$$
$$= \sum_{t=R}^{T-1}(u_{t+1}+\Delta_{oos})\big[X_{2t}'(\hat\beta_{2t}-\beta_2^*) - X_{1t}'(\hat\beta_{1t}-\beta_1^*)\big] \qquad (5)$$
$$- \sum_{t=R}^{T-1} X_{1t}'(\hat\beta_{1t}-\beta_1^*)\big[X_{2t}'(\hat\beta_{2t}-\beta_2^*) - X_{1t}'(\hat\beta_{1t}-\beta_1^*)\big] \qquad (6)$$
To establish the distribution of (5) and (6), we need additional results, provided by Lemmas 2 through 5. While these lemmas are similar to those of CM, they do not follow immediately from their results.
Lemma 2.
sup t T 0.5 | 1 t j = 1 t 1 X l j | = O p ( 1 ) , with  l = 1 , 2 .
Proof of Lemma 2.
See Appendix C. □
Lemma 3.
(a) 
t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { 1 t j = 1 t 1 X 1 j X 1 j } 1 X 1 t X 2 t { 1 t j = 1 t 1 X 2 j X 2 j } 1 { 1 t j = 1 t 1 X 2 j u j + 1 } = t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 X 1 } 1 E X 1 t X 2 t { E X 2 X 2 } 1 { 1 t j = 1 t 1 X 2 j u j + 1 } + o p ( 1 )
(b) 
t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { 1 t j = 1 t 1 X 1 j X 1 j } 1 X 1 t X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j u j + 1 } = t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 X 1 } 1 { 1 t j = 1 t 1 X 1 j u j + 1 } + o p ( 1 )
(c) 
t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { 1 t j = 1 t 1 X 1 j X 1 j } 1 X 1 t X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j } = t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 X 1 } 1 { 1 t j = 1 t 1 X 1 j } + o p ( 1 )
(d) 
t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { 1 t j = 1 t 1 X 1 j X 1 j } 1 X 1 t X 2 t { 1 t j = 1 t 1 X 2 j X 2 j } 1 { 1 t j = 1 t 1 X 2 j } = t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 X 1 } 1 E X 1 t X 2 t { E X 2 X 2 } 1 { 1 t j = 1 t 1 X 2 j } + o p ( 1 )
(e) 
t = R T 1 { 1 t j = 1 t 1 X 1 j } { 1 t j = 1 t 1 X 1 j X 1 j } 1 X 1 t X 2 t { 1 t j = 1 t 1 X 2 j X 2 j } 1 { 1 t j = 1 t 1 X 2 j u j + 1 } = t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 X 1 } 1 E X 1 t X 2 t { E X 2 X 2 } 1 { 1 t j = 1 t 1 X 2 j u j + 1 } + o p ( 1 )
(f) 
t = R T 1 { 1 t j = 1 t 1 X 1 j } { 1 t j = 1 t 1 X 1 j X 1 j } 1 X 1 t X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j } = t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 X 1 } 1 { 1 t j = 1 t 1 X 1 j } + o p ( 1 )
(g) 
t = R T 1 { 1 t j = 1 t 1 X 1 j } { 1 t j = 1 t 1 X 1 j X 1 j } 1 X 1 t X 2 t { 1 t j = 1 t 1 X 2 j X 2 j } 1 { 1 t j = 1 t 1 X 2 j } = t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 X 1 } 1 E X 1 t X 2 t { E X 2 X 2 } 1 { 1 t j = 1 t 1 X 2 j } + o p ( 1 )
Proof of Lemma 3.
The proofs of Lemmas 3a,b follow from Lemmas A2 and A3b in CM. While the proofs of Lemmas 3c through 3g are not immediate from CM, they follow the same arguments and strategies. Since the proofs are very similar, Appendix D shows our proof exclusively for Lemma 3c. □
Lemma 4.
$\sum_{t=R}^{T-1} X_{1t}'(\hat\beta_{1t}-\beta_1^*)\big[X_{2t}'(\hat\beta_{2t}-\beta_2^*) - X_{1t}'(\hat\beta_{1t}-\beta_1^*)\big]$ is $o_p(1)$.
Proof of Lemma 4.
See Appendix E and Appendix F. □
Lemma 5.
t = R T 1 u ^ 1 , t + 1 ( u ^ 1 , t + 1 u ^ 2 , t + 1 ) =       t = R   T 1 u t + 1 X 2 t [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] j = 1 t 1 X 2 j u t + 1 + Δ IS t = R T 1 u t + 1 X 2 t [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] j = 1 t 1 X 2 j + Δ oos t = R T 1 X 2 t [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] j = 1 t 1 X 2 j u t + 1 + Δ IS Δ oos t = R T 1 X 2 t [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] j = 1 t 1 X 2 j + o p ( 1 )
Proof of Lemma 5.
See Appendix G. □
Theorem 3.
Let S u X = σ u ( E X 2 t X 2 t ) 1 ,
(a) 
t = R T 1 u t + 1 X 2 t [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] j = 1 t 1 X 2 j u t + 1 λ 1 s 1 W 1 ( s ) S u X   [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   S u X d W 1 ( s )
(b) 
Δ IS t = R T 1 u t + 1 X 2 t [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] j = 1 t 1 X 2 j Δ IS λ 1 s 1 W 2 ( s ) V L P ( X 2 t )   [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   S u X d W 1 ( s )
(c) 
Δ oos t = R T 1 X 2 t [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] j = 1 t 1 X 2 j u t + 1 Δ oos λ 1 s 1 W 1 ( s ) S u X [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] V L P ( X 2 t ) d W 2 ( s )
(d) 
Δ oos Δ IS t = R T 1 X 2 t [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] j = 1 t 1 X 2 j Δ oos Δ IS λ 1 s 1 W 2 ( s ) V L P ( X 2 t ) [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   V L P ( X 2 t ) d W 2 ( s ) + Λ λ
(e) 
t = R T 1 u ^ 1 , t + 1 ( u ^ 1 , t + 1 u ^ 2 , t + 1 ) λ 1 s 1 W 1 ( s ) S u X   [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   S u X d W 1 ( s ) + Δ IS λ 1 s 1 W 2 ( s ) V L P ( X 2 t )   [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   S u X d W 1 ( s ) + Δ oos λ 1 s 1 W 1 ( s ) S u X   [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   V L P ( X 2 t ) d W 2 ( s ) + Δ oos Δ IS λ 1 s 1 W 2 ( s ) V L P ( X 2 t )   [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   V L P ( X 2 t ) d W 2 ( s ) + Λ λ + o p ( 1 )
(f) 
If there is no in-sample bias (i.e., Δ I S = 0 ), then
t = R T 1 u ^ 1 , t + 1 ( u ^ 1 , t + 1 u ^ 2 , t + 1 ) λ 1 s 1 W 1 ( s ) S u X   [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   S u X d W 1 ( s )
+ Δ o o s λ 1 s 1 W 1 ( s ) S u X [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] V L P ( X 2 t ) d W 2 ( s )
Proof of Theorem 3.
Theorem 3e is simply the combination of Theorems 3a through 3d. Theorem 3f is equivalent to Theorem 3e imposing Δ_IS = 0. The proofs of Theorems 3a–3d are direct from Lemmas 3 through 5. □
Three things from Theorem 3 are worth mentioning. First, the re-centering term of the MENC-NEW is relevant in the distribution because of the term $\Delta_{oos}\Delta_{IS}\sum_{t=R}^{T-1} X_{2t}'[(E X_2 X_2')^{-1} - J(E X_1 X_1')^{-1}J']\sum_{j=1}^{t-1} X_{2j}$. The reason is that the correlation between X_2j and X_2t may not be zero; in other words, we may find a correlation between the predictor in the estimation window and in the evaluation window. Second, if there is no in-sample bias (i.e., Δ_IS = 0), then the re-centering term is not required (thus, we could simply use the ENC-NEW with different critical values). In essence, the term $\Delta_{oos}\sum_{t=R}^{T-1} X_{2t}'[(E X_2 X_2')^{-1} - J(E X_1 X_1')^{-1}J']\sum_{j=1}^{t-1} X_{2j}u_{j+1}$ does not require re-centering, since X_2j u_{j+1} (the in-sample term) and X_2t (the out-of-sample predictor) are assumed to be uncorrelated. Finally, the first term in the distribution of Theorem 3f is akin to CM if we are willing to assume that the errors are serially uncorrelated and homoscedastic. Notice, however, that even in that case, our distribution is different because of the out-of-sample bias. In contrast to CM, these additional terms depend on the long-run variance of X_2t (V_LP(X_2t)), on an additional Brownian motion W_2 and, of course, on the out-of-sample bias Δ_oos. Theorem 4a next establishes the asymptotic distribution of the MENC-NEW in the general case in which Δ_IS, Δ_oos ≠ 0. In Theorem 4b, we consider the special (although more interesting) case in which Δ_IS = 0, Δ_oos ≠ 0; in this case, we establish the new asymptotic distribution of the ENC-NEW, as the re-centering term is not required.
Theorem 4.
(a) 
The asymptotic distribution of our MENC-NEW = P P 1 t u ^ 1 , t + 1 ( u ^ 1 , t + 1 u ^ 2 , t + 1 ) Λ ^ λ P 1 t u ^ 2 , t + 1 2 is simply given by
M E N C N E W P , T ( R e c u r s i v e )
1 σ u 2 + Δ oos 2 [ λ 1 s 1 W 1 ( s ) S u X   [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   S u X d W 1 ( s ) + Δ I S λ 1 s 1 W 2 ( s ) V L P ( X 2 t )   [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   S u X d W 1 ( s ) + Δ oos λ 1 s 1 W 1 ( s ) S u X [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] V L P ( X 2 t ) d W 2 ( s ) + Δ oos Δ IS λ 1 s 1 W 2 ( s ) V L P ( X 2 t )   [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   V L P ( X 2 t ) d W 2 ( s ) ]  
M E N C N E W P , T ( R o l l i n g )
λ 1 σ u 2 + Δ oos 2 [ λ 1 ( W 1 ( s ) W 1 ( s λ ) ) S u X   [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   S u X d W 1 ( s ) + Δ I S λ 1 ( W 2 ( s ) W 2 ( s λ ) ) V L P ( X 2 t )   [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   S u X d W 1 ( s ) + Δ oos λ 1 ( W 1 ( s ) W 1 ( s λ ) ) S u X [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] V L P ( X 2 t ) d W 2 ( s ) + Δ oos Δ IS λ 1 ( W 2 ( s ) W 2 ( s λ ) ) V L P ( X 2 t )   [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   V L P ( X 2 t ) d W 2 ( s ) ]
(b) 
If there is no in-sample bias (i.e., Δ I S = 0 ), then the asymptotic distribution of the ENC-NEW is simply
E N C N E W P , T ( r e c u r s i v e )
1 σ u 2 + Δ oos 2 [ λ 1 s 1 W 1 ( s ) S u X   [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   S u X d W 1 ( s ) + Δ oos λ 1 s 1 W 1 ( s ) S u X [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] V L P ( X 2 t ) d W 2 ( s ) ]
E N C N E W P , T ( r o l l i n g )
λ 1 σ u 2 + Δ oos 2 [ λ 1 ( W 1 ( s ) W 1 ( s λ ) ) S u x   [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   S u X d W 1 ( s ) ]
+ λ 1 Δ oos σ u 2 + Δ oos 2 [ λ 1 ( W 1 ( s ) W 1 ( s λ ) ) S u x   [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ]   V L P ( X 2 t ) d W 2 ( s ) ]
Proof of Theorem 4.
The proofs of Theorem 4a,b are immediate from Theorem 3e,f and the definition of the MENC-NEW and the ENC-NEW. □
As expected from Theorem 3, even in the case of Δ_IS = 0, the out-of-sample bias shifts the asymptotic distribution of the ENC-NEW because of the term $\Delta_{oos}\sum_{t=R}^{T-1} X_{2t}'[(E X_2 X_2')^{-1} - J(E X_1 X_1')^{-1}J']\sum_{j=1}^{t-1} X_{2j}u_{j+1}$. Akin to CM, the distribution of the ENC-NEW depends on how parameters are estimated (either rolling or recursive), on the number of excess parameters k2, and on how we split our dataset (through λ = lim_{P,T→∞} R/T). Nevertheless, the new asymptotic distribution of the ENC-NEW depends on more nuisance parameters than in CM: it depends on the magnitude of the out-of-sample bias Δ_oos and on the persistency of the predictors in the nesting model (V_LP(X_2t)), and it is necessary to simulate a second Brownian motion W_2. In Section 5 next, we show that ignoring these additional terms may severely distort the empirical size of the ENC-NEW.

5. Monte Carlo Simulations

In this section, we consider four different sets of simulations to study the size properties of our test compared to some traditional out-of-sample tests: the ENC-NEW and the ENC-t with CM critical values, and the “Wild Clark and West” (WCW) of [26] (henceforth PHM).
The data generating process (DGP) in Section 5.1 is designed to study the effects of a bias both in-sample and out of sample (e.g., the benchmark model is the DRW). The experiment in Section 5.2 is akin to Section 5.1, but this time we set our parameters to match the empirical specifications of [22,23,24] using the Chilean peso and lead prices. In these two experiments, we consider the MENC-NEW according to Theorem 2.
Section 5.3 and Section 5.4 consider two different DGPs that introduce an out-of-sample bias. In these cases, the benchmark model is the random walk (with drift). As a consequence, there is no in-sample bias, and no re-centering term is required (according to Theorem 3). We use the ENC-NEW with corrected critical values in these two cases, following Theorem 4 (ENC-NEW*).
Finally, Section 5.5 studies the power properties of the MENC-NEW. To this end, we impose the alternative hypothesis over the DGP of Section 5.2.

5.1. Size Properties of the MENC-NEW: The Benchmark Model Is the DRW and E Y t 0

To illustrate the effects of a bias in the nested model, we consider a simple DGP for both the target variable Y t and the predictor X t :
Y t + 1 = α + β X t + ε t + 1
X t + 1 = ρ X t + u t + 1
where ε_{t+1} and u_{t+1} are drawn from independent standard normal distributions. The initial value Y_1 for each simulation is set at the unconditional expected value of Y_{t+1} (E Y_t = α). We consider two competing one-step-ahead forecasts. The first model is simply a zero-forecast (e.g., the DRW), and the second model uses exclusively X_t as a predictor with no drift (i.e., Y_t^{f(1)} = γ X_t).
To evaluate the empirical size of each test, we set β = 0. In this simple DGP, we introduce the bias through the parameter α, and we modify the persistency of the predictor X_{t+1} through ρ. If α = 0, then the proper critical values for this setup are those of CM. We argue, however, that severe distortions may appear when α ≠ 0, especially for highly persistent predictors. To illustrate the severity of these size distortions, we consider different sets of values for these two parameters: we set α to either 0.3, 0.5, 0.7, or 1, and ρ to either 0.6 or 0.8. Additional simulations with different values of these parameters are available upon request. While this is an extremely simple DGP, it clearly illustrates the size distortions of traditional tests of out-of-sample evaluation.
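The DGP above can be simulated in a few lines. The sketch below is our own illustration (function and argument names are hypothetical), drawing one path with the null imposed by setting beta = 0:

```python
import numpy as np

def simulate_dgp(T, alpha, beta, rho, rng):
    """One draw from the Section 5.1 DGP:
       Y_{t+1} = alpha + beta*X_t + eps_{t+1},  X_{t+1} = rho*X_t + u_{t+1}."""
    eps = rng.standard_normal(T)
    u = rng.standard_normal(T)
    X = np.zeros(T)
    Y = np.zeros(T)
    Y[0] = alpha                      # initial value at the unconditional mean
    for t in range(T - 1):
        X[t + 1] = rho * X[t] + u[t + 1]
        Y[t + 1] = alpha + beta * X[t] + eps[t + 1]
    return Y, X

rng = np.random.default_rng(0)
Y, X = simulate_dgp(200, alpha=0.5, beta=0.0, rho=0.8, rng=rng)
```

Under β = 0, Y is i.i.d. around α, so both competing forecasts are biased whenever α ≠ 0, which is the situation the MENC-NEW is designed for.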
We consider sample sizes of T = 100, 150, and 200. As the asymptotic distribution of most of these tests depends on how we split this sample, we consider P/R = 2 and 4, using a recursive scheme of estimation (Additional results considering P/R = 0.4, 1, and 3 are available upon request with similar conclusions.). We consider a total of 5000 Monte Carlo simulations for each exercise.
To obtain the critical values suggested by Theorem 2, we closely follow the simulations in CM. In each of the 5000 simulations, we generate 10,000 Brownian motions through simple random walks, each using an independent sequence of 10,000 i.i.d. increments drawn from a normal distribution with a mean of 0 and a variance of 1/10,000. Then, we obtain the integrals by summing the weighted quadratics of the random walks over the entire evaluation window (from (R + 1)/T through T/T). In every Monte Carlo simulation, we estimate the nuisance parameters of the MENC-NEW (such as the bias and the long-run variance of X_t) with the synthetic sample data, according to Theorem 2. Finally, the critical values are simply the 90th and 95th percentiles of our 10,000 simulated stochastic integrals for the 10% and 5% significance levels, respectively.
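For concreteness, this simulation procedure can be sketched as follows. The code is our own illustration of the CM-style construction for the unbiased benchmark case, $\int_\lambda^1 s^{-1}W(s)\,dW(s)$, with fewer replications and grid points than the paper's 10,000 for speed; extending it to the MENC-NEW distribution of Theorem 2 requires adding the Δ and V_LP(X_t) terms with a second, independent Brownian motion:

```python
import numpy as np

def cm_critical_values(lam, n_sims=2000, n_steps=1000, seed=0):
    """Monte Carlo critical values of int_{lam}^{1} s^{-1} W(s) dW(s),
    approximating W by a scaled random walk."""
    rng = np.random.default_rng(seed)
    dW = rng.standard_normal((n_sims, n_steps)) / np.sqrt(n_steps)
    W = np.cumsum(dW, axis=1)               # W(s) on the grid s = 1/n, ..., 1
    s = np.arange(1, n_steps + 1) / n_steps
    start = int(lam * n_steps)              # evaluation window starts at lam
    # Ito sum: W is evaluated one grid point before each increment is applied
    stats = np.sum(W[:, start:-1] / s[start:-1] * dW[:, start + 1:], axis=1)
    return np.percentile(stats, [90, 95, 99])
```

With lam = 0.5, the resulting 90th/95th/99th percentiles should be close to the CM values 0.984, 1.584, and 3.209 quoted earlier, up to Monte Carlo and discretization error.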
To evaluate the empirical size of the ENC-t and ENC-NEW, we consider the critical values available in CM for the recursive scheme with the corresponding P/R and k2 = 1. In a recent paper, PHM propose a new asymptotically normal test labeled the Wild Clark and West (WCW). The authors show that parameter uncertainty in their framework is asymptotically irrelevant. The strategy of PHM is to introduce an independent random variable θ_t in the ENC-t core statistic that prevents degeneracy under the null hypothesis of no encompassing. PHM propose a Gaussian distribution for θ_t (θ_t ~ N(1, ϕ²)) such that
$$\mathrm{WCW}_t = \frac{P^{-1/2}\sum_{t=R+1}^{T}\hat e_{1,t+1}\left(\hat e_{1,t+1}-\theta_t \hat e_{2,t+1}\right)}{\sqrt{\hat S_{ff}}}$$
where ê_{1,t+1} and ê_{2,t+1} are the forecast errors of the nested and nesting models, respectively, and Ŝ_ff is a consistent estimate of the long-run variance of ê_{1,t+1}(ê_{1,t+1} − θ_t ê_{2,t+1}). In this case, we consider the HAC estimator proposed by Newey and West (1987, 1994) [35,36]. While PHM do not provide an optimal value for the tuning parameter ϕ, they give some recommendations. In particular, we set ϕ = V ( e ^ 2 , t ) 0.1 , as suggested by the authors (see PHM, page 6).
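As an illustration, the WCW-t statistic can be computed as follows. This is our own sketch (function names are hypothetical); the demeaning step and the Bartlett-kernel bandwidth rule are common implementation choices for a Newey–West long-run variance, not values prescribed by PHM:

```python
import numpy as np

def wcw_t(e1, e2, phi, rng, hac_lags=None):
    """Wild Clark and West t-statistic with theta_t ~ N(1, phi^2)."""
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    P = e1.size
    theta = rng.normal(1.0, phi, size=P)
    f = e1 * (e1 - theta * e2)                     # core adjusted sequence
    if hac_lags is None:                           # a common bandwidth rule
        hac_lags = int(np.floor(4 * (P / 100.0) ** (2.0 / 9.0)))
    fc = f - f.mean()
    s_ff = fc @ fc / P                             # lag-zero variance term
    for lag in range(1, hac_lags + 1):
        w = 1.0 - lag / (hac_lags + 1.0)           # Bartlett kernel weight
        s_ff += 2.0 * w * (fc[lag:] @ fc[:-lag]) / P
    return np.sqrt(P) * f.mean() / np.sqrt(s_ff)
```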
Table 1 and Table 2 display our results for nominal sizes of 10% and 5%, using P/R = 2 and 4, respectively. Each panel considers a different parametrization of our DGP. Three things about Table 1 and Table 2 are worth mentioning. First, consistent with Theorem 2, the ENC-t and the ENC-NEW become severely oversized as the bias and the persistency of the predictor increase. Notably, the last panel of Table 2 reports an empirical size about three times higher than the nominal size for both statistics. The average size of the ENC-NEW across all exercises goes from 18.2% (11.5%) to 30.8% (21.6%) with a nominal size of 10% (5%). Second, the MENC-NEW is reasonably well-sized in most of our exercises. Notice that the size properties of our test improve significantly with larger samples; this is reasonable, since the asymptotic distribution of our test relies on nuisance parameters estimated in each simulation. The average size of our test goes from 10.3% (5.4%) to 12.9% (7.5%) across all exercises. Moreover, if we focus on the simulations with the largest sample, T = 200, our results range from 9.5% (4.9%) to 12.3% (6.9%). Finally, while the WCW-t is generally well-sized in these exercises, we make a note of caution. While we see improvements in size compared to the ENC-t and ENC-NEW, PHM show that these improvements come at a cost: the WCW tends to exhibit less power than other tests. Our results in Section 5.5 generally support this idea: the power of the WCW is sometimes less than half that of our test.

5.2. Size Properties of the MENC-NEW with a DGP

In this section, we use the same DGP as in Section 5.1, but we set the parameters in our simulation to match the econometric setup of [22,23,24] (PH). Based on the present-value model for exchange rate determination (Campbell and Shiller (1987) [37] and Engel and West (2005) [38]), PH show that the Chilean exchange rate has the ability to predict base-metal prices. Table 2 in PH suggests the following econometric specification:
Δ l n ( C P t ) = β [ Δ l n ( E R t 1 ) + Δ l n ( E R t 2 ) ] + e t
where E R t stands for the Chilean peso and C P t for a generic commodity price (in this case, we use lead prices). PH use this econometric specification to determine whether this model can outperform the zero-forecast model (e.g., they evaluate the null hypothesis H 0 : β = 0 ). We argue that some distortions may appear if E [ Δ l n ( C P t ) ] 0 , as both models are biased under the null hypothesis.
Let Y t = Δ l n ( C P t ) and X t = [ Δ l n ( E R t 1 ) + Δ l n ( E R t 2 ) ] . Using the annualized monthly returns of the Chilean peso and lead prices from September 1999 through August 2020, and using the same DGP of Section 5.1 (under the null hypothesis β = 0 ), we estimate ρ , σ ε 2 = V ( ε t + 1 ) , σ u 2 = V ( u t + 1 ) , and δ = C o r r ( ε t + 1 , u t + 1 ) :
$$Y_{t+1} = \alpha + \beta X_t + \varepsilon_{t+1}$$
$$X_{t+1} = 0.53\, X_t + u_{t+1}$$
where $(\varepsilon_t, u_t)' \sim N\!\left(0,\begin{pmatrix}\sigma_\varepsilon^2 & \delta\sigma_u\sigma_\varepsilon \\ \delta\sigma_u\sigma_\varepsilon & \sigma_u^2\end{pmatrix}\right)$ with δ = 0.36, σ_ε = 1.05, and σ_u = 0.50. To introduce the bias, we consider two different values for α (0.3 and 0.6). Akin to Section 5.1, we compare out-of-sample the zero-forecast against a model that uses exclusively X_t (Y_t^{f(1)} = γ X_t). We estimate the parameter γ recursively by OLS. Table 3 and Table 4 exhibit our results for P/R = 1 and 2, respectively. Additional exercises are available upon request. Notice that the only difference across the panels of each table is the magnitude of the bias.
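Drawing the correlated innovations (ε_t, u_t) for this DGP is straightforward. A sketch (our own illustration, with hypothetical names), building the covariance matrix implied by δ, σ_ε, and σ_u:

```python
import numpy as np

def draw_innovations(P, sigma_eps, sigma_u, delta, rng):
    """Draw P pairs (eps_t, u_t), jointly normal with Corr(eps, u) = delta."""
    cov = np.array([[sigma_eps ** 2, delta * sigma_u * sigma_eps],
                    [delta * sigma_u * sigma_eps, sigma_u ** 2]])
    draws = rng.multivariate_normal(np.zeros(2), cov, size=P)
    return draws[:, 0], draws[:, 1]   # eps, u

rng = np.random.default_rng(0)
eps, u = draw_innovations(10000, sigma_eps=1.05, sigma_u=0.50, delta=0.36, rng=rng)
```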
The main conclusions of Table 3 and Table 4 are similar to those of Table 1 and Table 2. First, consistent with Theorem 2, the ENC-t and ENC-NEW exhibit substantial size distortions when the bias increases. The average size of both tests is roughly 12–13% (7%) when α = 0.3. Nevertheless, when the bias increases (α = 0.6), the average size of both tests is roughly 17–18% (11%). Notably, the MENC-NEW seems to be reasonably well-sized in all our exercises. Specifically, the empirical size of our test ranges between 8.5% and 10.5% (4.3–6.1%), with an average empirical size across all our exercises of 9.8% (5.1%). Akin to the previous section, the size of our test seems to improve with the sample size; this is reasonable, since we need to estimate some nuisance parameters to determine our critical values. Finally, the WCW exhibits mixed results compared to the ENC-t and ENC-NEW. In particular, the WCW is less oversized for α = 0.6, but it tends to underperform for α = 0.3.

5.3. Size Properties of the ENC-NEW with Adjusted Critical Values: The Effects of an Out-of-Sample Bias

In this DGP we introduce an out-of-sample bias by shifting the expected value of the target variable Y t . The key point is that this shift happens in the evaluation window t > R . As the parameters are recursively estimated in each period, the OLS estimator will not fully accommodate this shift in small samples. As a consequence, we will observe an out-of-sample bias in both models under the null hypothesis. In contrast to Section 5.1, there is no need to re-center the ENC-NEW, as suggested by Theorem 3. Nevertheless, the asymptotic distribution of the ENC-NEW is now given by Theorem 4b, and it is necessary to obtain new critical values.
To illustrate the effects of this bias, we introduce this shift immediately after the estimation window at t = R + 1 through a change in the drift. Let α 1 and α 2 be the drifts before and after the shift, respectively:
$$Y_t = \alpha_1 + \beta X_{t-1} + \varepsilon_t, \quad t \le R$$
$$Y_t = \alpha_2 + \beta X_{t-1} + \varepsilon_t, \quad t \ge R+1$$
where ε t is an independent standard normal random variable and X t is a stationary AR(1) process
$$X_t = \alpha_x + \rho X_{t-1} + v_t \quad \text{for all } t$$
We consider two competing models for the target variable Y t
Y t = c 1 + v 1 , t   ( M 1 )
Y t = c 2 + γ X t 1 + v 2 , t   ( M 2 )
where c_1, c_2, and γ are regression parameters, and v_{1,t}, v_{2,t} stand for the forecast errors. Notice that M1 forecasts Y_t using its historical sample mean, and it is identical to M2 whenever γ = 0. As the parameters in M1 and M2 are estimated in-sample in each recursive window by OLS, we should expect no in-sample bias whatsoever. Nevertheless, the OLS estimator will not fully capture the out-of-sample change in the drift in small samples. According to Theorem 3, if there is no in-sample bias, then the re-centering term of the MENC-NEW is unnecessary. In this case, we may simply consider the ENC-NEW with corrected critical values (as suggested by the asymptotic distribution in Theorem 4b). We consider different parametrizations for this DGP. In particular, we allow different levels of persistency in the predictor X_t (ρ = 0.4 and 0.8) and different changes in the drift parameter (e.g., α_1 = 0 → α_2 = 0.3). The idea of using different changes in the drift is to manage the magnitude of the out-of-sample bias.
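The mechanism can be illustrated directly: when the drift shifts at t = R + 1, the recursive sample mean (M1's forecast) keeps averaging pre-shift observations, so its out-of-sample errors have a non-zero mean in finite samples. A small sketch of this (our own illustration, with hypothetical parameter values and β = 0):

```python
import numpy as np

def mean_oos_error_after_shift(T=200, R=100, alpha1=0.0, alpha2=0.3,
                               n_sims=500, seed=0):
    """Average out-of-sample error of the recursive-mean forecast (M1)
    when the drift shifts from alpha1 to alpha2 at t = R + 1."""
    rng = np.random.default_rng(seed)
    biases = np.empty(n_sims)
    for i in range(n_sims):
        alpha = np.where(np.arange(T) < R, alpha1, alpha2)
        Y = alpha + rng.standard_normal(T)
        csum = np.cumsum(Y)
        t_idx = np.arange(R, T - 1)
        # forecast error: Y_{t+1} minus the recursive mean of Y_1, ..., Y_t
        errors = Y[t_idx + 1] - csum[t_idx] / (t_idx + 1)
        biases[i] = errors.mean()
    return biases.mean()
```

With alpha2 − alpha1 = 0.3, the average out-of-sample error stays well above zero: this is the out-of-sample bias that the corrected critical values of Theorem 4b account for.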
Similar to Section 5.1 and Section 5.2, we consider 5000 Monte Carlo simulations using a recursive scheme with P/R = 1 and T = 100, 150, and 200. To evaluate the empirical size of each test, we set β = 0. To simulate the corrected critical values of the ENC-NEW, we consider the asymptotic distribution suggested by Theorem 4b. Akin to the previous section, in each of the 5000 simulations, we generate 10,000 Brownian motions, each using an independent sequence of 10,000 i.i.d. increments drawn from a normal distribution with mean 0 and variance 1/10,000. Then, we obtain the integrals by summing the weighted quadratics of the random walks over the entire evaluation window (from (R + 1)/T through T/T). We estimate the respective nuisance parameters in each simulation using the simulated data, according to Theorem 4b. The critical values are the 90th and 95th percentiles of our 10,000 simulated stochastic integrals for the 10% and 5% significance levels, respectively.
Each panel in Table 5 exhibits our results using a different parameterization of our DGP. Three features of Table 5 are worth mentioning. First, the ENC-NEW with corrected critical values (ENC-NEW*) is generally well-sized in all exercises. In particular, the average size of the ENC-NEW* goes from 8.8% (4.4%) to 11.5% (6.1%), the most oversized exercise exhibits a rejection rate of 11.6% (6.6%), and the average size across all exercises is 10.0% (5.16%). Second, the ENC-NEW and the ENC-t are again significantly more oversized. For instance, the average size of the ENC-t goes from 13.0% (7.3%) to 14.9% (9.0%), with an average size across all exercises of 13.7% (7.87%). Finally, consistent with Theorem 4, the worst results for the ENC-NEW and ENC-t occur in the simulations with a stronger persistency in the predictor and a larger positive shift in the drift (first panel). In these exercises, the size of the ENC-t ranges from 14.5% (8.8%) to 15.6% (9.3%), with an average of 14.9% (9.0%). In sharp contrast, the ENC-NEW with corrected critical values has an empirical size ranging from 11.3% (5.0%) to 11.6% (6.6%), with an average size of 11.5% (6.1%).

5.4. Size Properties of the ENC-NEW with Adjusted Critical Values: Persistency in the Drift

In this DGP, we introduce an out-of-sample bias by drifting the expected value of the target variable Y t . The key point is that the drift in Y t follows a persistent, although stationary, AR(1) process. As the parameters are recursively estimated in each period, the OLS estimator will not be able to accommodate this shift in small samples, and as a consequence, we will observe an out-of-sample bias in both models under the null hypothesis. In contrast to Section 5.1, there is no in-sample bias, and there is no need to re-center the ENC-NEW, as suggested by Theorem 3. Nevertheless, the asymptotic distribution of the ENC-NEW is now given by Theorem 4b.
Consider the following DGP for the target variable Y t and the predictor X t
Y t = α y , t + β X t 1 + u t
X t = α x + ρ X t 1 + ε t
where u t and ε t are independent standard normal random variables. We consider two competing models
Y t = c 1 + v 1 , t   ( M 1 )
Y t = c 2 + γ X t 1 + v 2 , t   ( M 2 )
where c 1 , c 2 , and γ are simply regression parameters, and v 1 , t , v 2 , t stand for the forecast errors. Notice that M1 forecasts Y t using its historical sample mean, and it is identical to M2 whenever γ = 0 . The interesting thing about this DGP is that we allow the drift of Y t to follow a persistent (although stationary) AR(1) process
α y , t = α α + ρ α α y , t 1 + e t
where e_t is an independent standard normal random variable. As the parameters in M1 and M2 are estimated in-sample in each recursive window by OLS, we should expect no in-sample bias whatsoever. Nevertheless, the OLS estimator will not capture the out-of-sample changes in the drift. As the drift changes persistently in each period, it may introduce an out-of-sample bias. According to Theorem 3, if there is no in-sample bias, then the re-centering term of the MENC-NEW is unnecessary. In this case, we may simply consider the ENC-NEW with corrected critical values (as suggested by the asymptotic distribution in Theorem 4b).
Table 6 and Table 7 report our results for P/R = 1 and 2, respectively (additional results are available upon request). Consistent with our previous simulations, the ENC-NEW with corrected critical values is reasonably well-sized, especially with large sample sizes. The size of the ENC-NEW* ranges from 7.3% (3.0%) to 13.1% (7.2%), with an average size across all exercises of 10.5% (5.2%). In sharp contrast, the ENC-NEW and the ENC-t are severely oversized in all our simulations: some exercises exhibit an empirical size four times its nominal size. For instance, the last panel of Table 7 shows an average empirical size of 31.2% (22.8%) and 31.2% (23.3%) for the ENC-NEW and ENC-t, respectively. Moreover, the average size of the ENC-NEW across all exercises is 26.3% (18.6%), more than two (three) times its nominal size. Finally, the WCW-t seems to be correctly sized in most of our exercises, with an empirical size ranging from 9.1% (4.6%) to 13.1% (6.8%).

5.5. Power Properties of the MENC-NEW

In this section, we provide some simulations evaluating the power properties of the MENC-NEW compared to the ENC-NEW, ENC-t, and WCW-t. We use the same DGP of Section 5.2, but this time we impose the alternative hypothesis setting β 0 . Table 8 and Table 9 next exhibit our power results introducing a bias of α = 0.15 ,   0.30 , and 0.60, considering P/R = 1 and 2, respectively. In both tables, we impose the alternative by setting β = 0.6 . Additional simulations using different values for β deliver a similar message, and they are available upon request.
First, consistent with [26] (PHM), the size improvements of the WCW-t come at a cost: a deterioration in power. As commented by PHM: “In terms of power, results have been rather mixed, although CW has frequently exhibited some more power. All in all, our simulations reveal that asymptotic normality and size corrections come with a cost: the introduction of a random variable erodes some of the power of WCW.” [26], page 3. In particular, the power of the WCW-t ranges from 33.0% (26.7%) to 50.7% (50.1%), with an average power of 44.5% (40.7%). Second, the ENC-NEW, ENC-t, and MENC-NEW exhibit significantly more power than the WCW-t, generally more than double. For instance, the average power of the MENC-NEW is 92.2% (88%). Finally, while the power of the ENC-NEW and the MENC-NEW is very similar, we do observe a small edge for the ENC-NEW. This is, of course, expected, as the ENC-NEW and ENC-t are extremely oversized in this environment, according to the results of Section 5.2. Not surprisingly, the differences in power become more important with a higher bias (e.g., last panel): the advantage of the ENC-NEW in terms of power coincides with the ENC-NEW being particularly more oversized. We find the results of Table 8 and Table 9 encouraging, since the differences in power are mostly negligible, even though the size of the ENC-NEW is sometimes more than double the nominal size.

6. Concluding Remarks

In this paper, we show analytically and via simulations that a bias in the null model may severely distort the asymptotic distribution of some traditional tests of forecast evaluation in nested model comparisons. These distortions are more severe with an increasing bias and a higher persistency of the additional predictors. We address two relevant cases: (i) the presence of a bias both in-sample and out-of-sample (e.g., the driftless random walk as the benchmark model); (ii) the presence of a bias exclusively out-of-sample (e.g., a shift in the expected value of the target variable). To deal with the former case, we consider a simple modification of the ENC-NEW (MENC-NEW) robust to a bias in the null model. In essence, the MENC-NEW introduces a re-centering term in the ENC-NEW that appears because of the additional predictors’ bias and persistency. In both cases, the relevant asymptotic distributions are not pivotal; akin to CM, they are functionals of stochastic integrals, but they also depend on the magnitude of the bias and the persistency of the predictors.
Based on CM, we derive the new distribution for the ENC-NEW imposing a bias in the null model (either out-of-sample or both in-sample and out-of-sample). While this is a subtle change in the set of assumptions, it has important implications for the asymptotic theory. In particular, the quadratic Brownian motions in CM arise because of the martingale difference terms: put simply, the orthogonality condition is assumed to hold both in-sample and out-of-sample. In contrast, in our case, the quadratic Brownian motions arise because of both the predictors and the martingale difference terms. Moreover, if the bias appears both in-sample and out-of-sample, the persistency of the predictors shifts the expected value of the integrals of quadratic Brownian motions, so a re-centering is required: our MENC-NEW is simply a re-centered version of the ENC-NEW. Of course, in the absence of a bias, our MENC-NEW reduces to the ENC-NEW. As expected, we show that this new asymptotic distribution depends on the magnitude of the bias and the persistency of the predictors. Even though the asymptotic distribution of the MENC-NEW (and the ENC-NEW with adjusted critical values) is not pivotal, the nuisance parameters can be easily estimated (one of these nuisance parameters being, of course, the magnitude of the bias).
Our Monte Carlo simulations reveal that our MENC-NEW (and the ENC-NEW with adjusted critical values) is reasonably well-sized even when the ENC-NEW and ENC-t (with CM critical values) exhibit rejection rates three to four times higher than the nominal size. The severity of the size distortion is, of course, related to the magnitude of the bias, as demonstrated by our decompositions.
Our results are important since the “correct specification” assumption in CM is often overlooked in empirical work. We show that ignoring this type of misspecification (bias) may severely change the asymptotic distribution of traditional tests. We suggest four interesting avenues for future research. First, it is important to study the effects of other cases of misspecification. For instance, a usual finding in the empirical literature is that forecasts are often auto-inefficient, in the sense that C o r r ( X t , u t + 1 ) ≠ 0 . This type of misspecification may also affect the asymptotic distribution of out-of-sample tests and, consequently, distort the size and power properties. Second, in our view, the main caveat of our approach is that the asymptotic distribution of our test is not pivotal. In this sense, the empirical researcher must simulate the critical values to conduct inference correctly. An interesting contribution may be the development of asymptotically normal tests, or at least an approach free of nuisance parameters, in order to simplify the forecast evaluation. Third, our approach could be extended to a direct multi-step-ahead framework, along the lines of [11]. Fourth, akin to CM, all our derivations are based on non-linear least squares (the least squares estimator being, of course, a special case). It may be interesting to study how the asymptotic distribution of these tests is affected by different estimation methods (e.g., ridge regressions).
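To illustrate the critical-value simulation mentioned above, the sketch below approximates the scalar no-bias functional ∫_λ^1 s^{-1} W(s) dW(s) on a discrete grid and reads off an upper quantile. The grid size, number of replications, and value of λ are arbitrary illustrative choices; the full procedure would instead simulate the complete limiting distribution of the relevant theorem, evaluated at the estimated nuisance parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 0.5, 2000, 5000        # illustrative: lambda = lim R/T, grid, draws

s = np.arange(1, n + 1) / n           # grid on (0, 1]
mask = s >= lam                       # integration range [lambda, 1]
stats = np.empty(reps)
for r in range(reps):
    dW = rng.standard_normal(n) / np.sqrt(n)   # Brownian increments
    W = np.cumsum(dW)
    # Ito-sum approximation of  int_lambda^1 s^{-1} W(s) dW(s)
    stats[r] = np.sum((W[mask] / s[mask])[:-1] * dW[mask][1:])

cv_10 = float(np.quantile(stats, 0.90))        # simulated 10% critical value
```

The same scheme extends directly to the biased case: one simulates the functional implied by the theorem at the estimated bias and long-run variance, and uses the resulting quantiles in place of the CM tables.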

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Proof Lemma 1

Lemma 1e is simply the combination of Lemmas 1a through 1d. We first show the result for Lemma 1(a).
Proof Lemma 1(a). From (1):
t = R T 1 ( T t ) T 0.5 u t + 1 X t ( 1 t j = 1 t 1 X j X j ) 1 ( T 0.5 j = 1 t 1 X j u j + 1 )  
If we add and subtract ( E X t X t ) 1
t = R T 1 ( T t ) T 0.5 u t + 1 X t ( 1 t j = 1 t 1 X j X j ) 1 ( T 0.5 j = 1 t 1 X j u j + 1 ) = t = R T 1 ( T t ) T 0.5 u t + 1 X t ( E X t X t ) 1 ( T 0.5 j = 1 t 1 X j u j + 1 )  
+ T 0.5 t = R T 1 ( T t ) T 0.5 u t + 1 X t T 0.5 { ( 1 t j = 1 t 1 X j X j ) 1 ( E X t X t ) 1 } ( T 0.5 j = 1 t 1 X j u j + 1 )  
It suffices then to show that the last term on the RHS is o p ( 1 ) . From Lemma A1(b) in [33], sup t T 0.5 | vec [ ( 1 t j = 1 t 1 X j X j ) 1 ] vec [ ( E X t X t ) 1 ] | is O p ( 1 ) .
Then, that t = R T 1 ( T t ) T 0.5 u t + 1 X t T 0.5 { ( 1 t j = 1 t 1 X j X j ) 1 ( E X t X t ) 1 } ( T 0.5 j = 1 t 1 X j u j + 1 ) is O p ( 1 ) follows from the fact that X j u j + 1 is a martingale difference, from Theorem 7.19 in [34], Corollary 29.19 of Davidson (1994) [39], and Theorem 3.1 of Hansen (1992) [40]. The proof is then complete since T 0.5 is o p ( 1 ) . □
As expected, the previous proof is akin to Lemma A2 from [33]. The remainder of the proof is similar, with one crucial difference: (2)–(4) depend on X t rather than on the martingale difference sequence X t u t + 1 . In other words, these terms depend not only on the generalized forecast errors but also on the additional predictor X t . The proofs of Lemmas 1b through 1d are very similar and follow the same arguments; hence, we only show the proof of Lemma 1b.
Proof Lemma 1(b).
From the proof of Lemma 1a, it follows that
= t = R T 1 ( T t ) T 0.5 X t ( E X t X t ) 1 ( T 0.5 j = 1 t 1 X j u j + 1 )  
+ T 0.5 t = R T 1 ( T t ) T 0.5 X t T 0.5 { ( 1 t j = 1 t 1 X j X j ) 1 ( E X t X t ) 1 } ( T 0.5 j = 1 t 1 X j u j + 1 )
Again, we only need to show that the last term on the RHS is o p ( 1 ) . From Lemma A1(b) in [33], sup t T 0.5 | vec [ ( 1 t j = 1 t 1 X j X j ) 1 ] vec [ ( E X t X t ) 1 ] | is O p ( 1 ) .
Then, that t = R T 1 ( T t ) T 0.5 X t T 0.5 { ( 1 t j = 1 t 1 X j X j ) 1 ( E X t X t ) 1 } ( T 0.5 j = 1 t 1 X j u j + 1 ) is O p ( 1 ) follows from the fact that X t is covariance stationary, Theorem 7.17 in [34], Corollary 29.19 of [39], and Equation (3) in [40] (mixing sequences). Then, the proof is complete since T 0.5 is o p ( 1 ) . □
Notice that this result is similar to Lemma A2 in [33], with one crucial difference: we apply the FCLT for stationary ergodic series (to X t ), rather than the FCLT for a martingale difference sequence.

Appendix B. Proof Theorem 1

Proof Theorem 1(a).
Let R / T → λ as T , R → ∞ . Since X j u j + 1 is a martingale difference sequence, from Theorem 7.19 in White (2014) [34], the invariance principle (FCLT) for martingale differences applies (see McLeish (1974) [41] and Hall (1977) [42]). From Corollary 29.19 of [39], it follows that
$$ T^{-0.5} \sum_{j=1}^{t-1} X_j u_{j+1} \;\Rightarrow\; (E X_t X_t)^{0.5}\, \sigma_u W_1(s) $$
From the Continuous Mapping theorem
$$ (E X_t X_t)^{-1} \left( \tfrac{T}{t} \right) \left( T^{-0.5} \sum_{j=1}^{t-1} X_j u_{j+1} \right) \;\Rightarrow\; s^{-1} (E X_t X_t)^{-0.5}\, \sigma_u W_1(s) $$
Finally, from Theorem 3.1 of Hansen (1992), it follows that
$$ \sum_{t=R}^{T-1} \left( \tfrac{T}{t} \right) T^{-0.5} u_{t+1} X_t (E X_t X_t)^{-1} \left( T^{-0.5} \sum_{j=1}^{t-1} X_j u_{j+1} \right) \;\Rightarrow\; \int_{\lambda}^{1} s^{-1} W_1(s)\, \sigma_u^2 \, dW_1(s) $$
This result is akin to Lemma A6 in [33] for the numerator of the ENC-NEW with a recursive scheme. We may interpret this term as the asymptotic distribution of the ENC-NEW when no bias is present.
Proof Theorem 1(b).
That $T^{-0.5} \sum_{j=1}^{t-1} X_j u_{j+1} \Rightarrow (E X_t X_t)^{0.5} \sigma_u W_1(s)$ follows immediately from the proof of Theorem 1a. From the continuous mapping theorem,
$$ \Delta \left( \tfrac{T}{t} \right) (E X_t X_t)^{-1} \left( T^{-0.5} \sum_{j=1}^{t-1} X_j u_{j+1} \right) \;\Rightarrow\; \Delta\, s^{-1} (E X_t X_t)^{-0.5}\, \sigma_u W_1(s) $$
As X t is covariance stationary, it is also globally covariance stationary (see Definition 7.14 in [34]). According to Theorem 7.17 in [34], we may apply the FCLT for stationary ergodic series. Then, from Corollary 29.19 of [39] and Equation (3) in [40] (mixing sequences), it follows that
$$ \Delta \sum_{t=R}^{T-1} \left( \tfrac{T}{t} \right) T^{-0.5} X_t (E X_t X_t)^{-1} \left( T^{-0.5} \sum_{j=1}^{t-1} X_j u_{j+1} \right) \;\Rightarrow\; \Delta \int_{\lambda}^{1} s^{-1} W_1(s)\, (E X_t X_t)^{-0.5} \sigma_u V_{LP}(X)^{0.5} \, dW_2(s) $$
Proof Theorem 1(c).
This proof is similar to that of Theorem 1b. We use the Donsker principle for stationary ergodic series (Theorem 7.17 in [34]) so that
$$ T^{-0.5} \sum_{j=1}^{t-1} X_j \;\Rightarrow\; V_{LP}(X)^{0.5}\, \sigma_u W_2(s) $$
From the continuous mapping theorem
$$ \Delta \left( \tfrac{T}{t} \right) (E X_t X_t)^{-1} \left( T^{-0.5} \sum_{j=1}^{t-1} X_j \right) \;\Rightarrow\; \Delta\, s^{-1} (E X_t X_t)^{-1} V_{LP}(X)^{0.5}\, \sigma_u W_2(s) $$
Then, the desired result follows from Corollary 29.19 of [39] and Equation (3) in [40]
$$ \Delta \sum_{t=R}^{T-1} \left( \tfrac{T}{t} \right) T^{-0.5} u_{t+1} X_t (E X_t X_t)^{-1} \left( T^{-0.5} \sum_{j=1}^{t-1} X_j \right) \;\Rightarrow\; \Delta \int_{\lambda}^{1} s^{-1} W_2(s)\, V_{LP}(X)\, \sigma_u (E X_t X_t)^{-0.5} \, dW_1(s) $$
Proof Theorem 1(d).
That
$$ \Delta^2 \left( \tfrac{T}{t} \right) (E X_t X_t)^{-1} \left( T^{-0.5} \sum_{j=1}^{t-1} X_j \right) \;\Rightarrow\; \Delta^2\, s^{-1} (E X_t X_t)^{-1} V_{LP}(X)^{0.5}\, \sigma_u W_2(s) $$
follows directly from the proof of Theorem 1c. From Corollary 29.19 of [39] and Equation (3) in [40],
$$ \Delta^2 \sum_{t=R}^{T-1} \left( \tfrac{T}{t} \right) T^{-0.5} X_t (E X_t X_t)^{-1} \left( T^{-0.5} \sum_{j=1}^{t-1} X_j \right) \;\Rightarrow\; \Delta^2 \int_{\lambda}^{1} s^{-1} W_2(s)\, V_{LP}(X) (E X_t X_t)^{-1} \, dW_2(s) + \Lambda \lambda $$
Proof Theorem 1(e).
Theorem 1e is simply the combination of Theorem 1a through 1d. □

Appendix C

Proof of Lemma 2.
First notice that sup t T 0.5 | 1 t j = 1 t 1 X l j | ≤ ( T / R ) sup t | T 0.5 j = 1 t 1 X l j | .
As T/R is bounded, it is sufficient to show that sup t | T 0.5 j = 1 t 1 X l j | is O p ( 1 ) . Then, by the FCLT, it follows that T 0.5 j = 1 t 1 X l j ⇒ V L P ( X l ) W ( s ) .
Then, as noticed by CM, from Lemma 2.1 in Corradi, Swanson and Olivetti (2001) [43] it follows that
sup t | T 0.5 j = 1 t 1 X l j | ⇒ sup λ ≤ s ≤ 1 | V L P ( X l ) W ( s ) | = O p ( 1 )
The proof is complete. □
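As a purely numerical aside (not part of the proof), the boundedness claim is easy to visualize: for a stationary AR(1) predictor with an illustrative persistence of 0.6, the supremum of the scaled partial sums sup t | T^{-0.5} Σ_{j<t} X_j | remains stable as T grows, consistent with O p ( 1 ):

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.6                              # illustrative AR(1) persistence

def sup_scaled_partial_sum(T):
    """sup_t | T^{-1/2} sum_{j<t} X_j | for one stationary AR(1) draw."""
    e = rng.standard_normal(T)
    X = np.empty(T)
    X[0] = e[0] / np.sqrt(1 - rho**2)  # stationary initial condition
    for t in range(1, T):
        X[t] = rho * X[t - 1] + e[t]
    return np.abs(np.cumsum(X)).max() / np.sqrt(T)

# Median over replications for increasing T: stable across T, i.e., O_p(1)
meds = [float(np.median([sup_scaled_partial_sum(T) for _ in range(200)]))
        for T in (200, 800, 3200)]
```

The medians neither diverge nor vanish with T, mirroring the weak convergence of the scaled partial sums to a (bounded in probability) functional of Brownian motion.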

Appendix D

Proof of Lemma 3.
With minor technical subtleties, this proof is similar to that of Lemma A2 in CM. The first step is to show that
t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { 1 t j = 1 t 1 X 1 j X 1 j } 1 X 1 t X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j } = t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 X 1 } 1 X 1 t X 1 t { E X 1 X 1 } 1 { 1 t j = 1 t 1 X 1 j } + o p ( 1 )
First, we add and subtract E ( X 1 t X 1 t ) 1 twice
t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { 1 t j = 1 t 1 X 1 j X 1 j } 1 X 1 t X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j } =
t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 X 1 } 1 X 1 t X 1 t { E X 1 X 1 } 1 { 1 t j = 1 t 1 X 1 j } + t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } [ { 1 t j = 1 t 1 X 1 j X 1 j } 1 { E X 1 X 1 } 1 ] X 1 t X 1 t [ { 1 t j = 1 t 1 X 1 j X 1 j } 1 { E X 1 X 1 } 1 ] { 1 t j = 1 t 1 X 1 j } + t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 X 1 } 1 X 1 t X 1 t [ { 1 t j = 1 t 1 X 1 j X 1 j } 1 { E X 1 X 1 } 1 ] { 1 t j = 1 t 1 X 1 j } + t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } [ { 1 t j = 1 t 1 X 1 j X 1 j } 1 { E X 1 X 1 } 1 ] X 1 t X 1 t { E X 1 X 1 } 1 { 1 t j = 1 t 1 X 1 j }
Then, we only need to show that the following term is o p ( 1 )
t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } [ { 1 t j = 1 t 1 X 1 j X 1 j } 1 { E X 1 X 1 } 1 ] X 1 t X 1 t [ { 1 t j = 1 t 1 X 1 j X 1 j } 1 { E X 1 X 1 } 1 ] { 1 t j = 1 t 1 X 1 j } + t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 X 1 } 1 X 1 t X 1 t [ { 1 t j = 1 t 1 X 1 j X 1 j } 1 { E X 1 X 1 } 1 ] { 1 t j = 1 t 1 X 1 j } + t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } [ { 1 t j = 1 t 1 X 1 j X 1 j } 1 { E X 1 X 1 } 1 ] X 1 t X 1 t { E X 1 X 1 } 1 { 1 t j = 1 t 1 X 1 j }
Moreover, each of the three terms is o p ( 1 ) . The proof for each term follows the same argument; thus, we show it only for the first term. Taking absolute value
| t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } [ { 1 t j = 1 t 1 X 1 j X 1 j } 1 { E X 1 X 1 } 1 ] X 1 t X 1 t [ { 1 t j = 1 t 1 X 1 j X 1 j } 1 { E X 1 X 1 } 1 ] { 1 t j = 1 t 1 X 1 j } | k 4 P T ( 1 P t = R T 1 | X 1 t X 1 t | ) ( sup t | T 0.25 1 t j = 1 t 1 X 1 j u j + 1 | ) ( sup t | T 0.25 1 t j = 1 t 1 X 1 j | ) ( sup t T 0.25 | { 1 t j = 1 t 1 X 1 j X 1 j } 1 { E X 1 X 1 } 1 | ) ( sup t T 0.25 | { 1 t j = 1 t 1 X 1 j X 1 j } 1 { E X 1 X 1 } 1 | )
Given that X 1 t is assumed to be stationary, 1 P t = R T 1 | X 1 t X 1 t | is bounded. That sup t | T 0.25 1 t j = 1 t 1 X 1 j u j + 1 | and sup t T 0.25 | { 1 t j = 1 t 1 X 1 j X 1 j } 1 { E X 1 X 1 } 1 | are o p ( 1 ) follows directly from Lemmas A1(a) and (b) in CM. Finally, from Lemma 4, sup t T 0.5 | 1 t j = 1 t 1 X j | = O p ( 1 ) ; then sup t T 0.25 | 1 t j = 1 t 1 X j | = o p ( 1 ) , and the first part of the proof is complete.
Now we show that
t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 X 1 } 1 X 1 t X 1 t { E X 1 X 1 } 1 { 1 t j = 1 t 1 X 1 j } = t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 X 1 } 1 { 1 t j = 1 t 1 X 1 j } + o p ( 1 )
First, we add and subtract E X 1 t X 1 t
t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 X 1 } 1 X 1 t X 1 t { E X 1 X 1 } 1 { 1 t j = 1 t 1 X 1 j } = t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 X 1 } 1 { 1 t j = 1 t 1 X 1 j } + t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 X 1 } 1 ( X 1 t X 1 t E X 1 t X 1 t ) { E X 1 X 1 } 1 { 1 t j = 1 t 1 X 1 j }
Then, we show that
t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 X 1 } 1 ( X 1 t X 1 t E X 1 t X 1 t ) { E X 1 X 1 } 1 { 1 t j = 1 t 1 X 1 j } is o p ( 1 ) :
t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 X 1 } 1 ( X 1 t X 1 t E X 1 X 1 ) { E X 1 X 1 } 1 { 1 t j = 1 t 1 X 1 j } = T 0.5 t = R T 1 { ( T t ) 2 [ T 0.5 j = 1 t 1 X 1 j { E X 1 X 1 } 1 j = 1 t 1 X 1 j u j + 1 { E X 1 X 1 } 1 v e c [ T 0.5 ( X 1 t X 1 t E X 1 t X 1 t ) ] ] }
As t = R T 1 { ( T t ) 2 [ T 0.5 j = 1 t 1 X 1 j { E X 1 X 1 } 1 j = 1 t 1 X 1 j u j + 1 { E X 1 X 1 } 1 v e c [ T 0.5 ( X 1 t X 1 t E X 1 t X 1 t ) ] ] } is bounded and T 0.5 is o p ( 1 ) , the proof is complete. □

Appendix E

Proof of Lemma 4.
While this result is similar to Lemma A3 in [33], it is not immediate from their paper. Notice that t = R T 1 X 1 t ( β ^ 1 t − β 1 * ) [ X 2 t ( β ^ 2 t − β 2 * ) − X 1 t ( β ^ 1 t − β 1 * ) ] is simply
t = R T 1 X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j ( u j + 1 + Δ IS ) }   ( X 2 t { 1 t j = 1 t 1 X 2 j X 2 j } 1 { 1 t j = 1 t 1 X 2 j ( u j + 1 + Δ IS ) } X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j ( u j + 1 + Δ IS ) } )
Then, we separate into eight terms
t = R T 1 X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j u j + 1 }   ( X 2 t { 1 t j = 1 t 1 X 2 j X 2 j } 1 { 1 t j = 1 t 1 X 2 j u j + 1 } X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j u j + 1 } )  
t = R T 1 X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j u j + 1 }   ( X 2 t { 1 t j = 1 t 1 X 2 j X 2 j } 1 { 1 t j = 1 t 1 X 2 j u j + 1 } Δ X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j } )  
t = R T 1 X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j u j + 1 }   ( Δ X 2 t { 1 t j = 1 t 1 X 2 j X 2 j } 1 { 1 t j = 1 t 1 X 2 j } X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j u j + 1 } )  
t = R T 1 X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j u j + 1 }   ( Δ IS X 2 t { 1 t j = 1 t 1 X 2 j X 2 j } 1 { 1 t j = 1 t 1 X 2 j } Δ IS X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j } )  
Δ IS t = R T 1 X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j }   ( X 2 t { 1 t j = 1 t 1 X 2 j X 2 j } 1 { 1 t j = 1 t 1 X 2 j u j + 1 } X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j u j + 1 } )
Δ IS t = R T 1 X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j }   ( X 2 t { 1 t j = 1 t 1 X 2 j X 2 j } 1 { 1 t j = 1 t 1 X 2 j u j + 1 } Δ IS X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j } )
Δ IS t = R T 1 X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j }   ( Δ IS X 2 t { 1 t j = 1 t 1 X 2 j X 2 j } 1 { 1 t j = 1 t 1 X 2 j } X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j u j + 1 } )
Δ IS 2 t = R T 1 X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j }   ( X 2 t { 1 t j = 1 t 1 X 2 j X 2 j } 1 { 1 t j = 1 t 1 X 2 j } X 1 t { 1 t j = 1 t 1 X 1 j X 1 j } 1 { 1 t j = 1 t 1 X 1 j } )  
In Appendix F, we show that (A1), (A2)+(A3), (A4), (A5), (A6)+(A7), and (A8) are o p ( 1 ) , and the proof is complete. □

Appendix F

Proof that (A1), (A2)+(A3), (A4), (A5), (A6)+(A7), and (A8) in Appendix E are o p ( 1 ) .
Proof that (A1) is  o p ( 1 ) .
This is immediate from Lemma A3 in [33]. Using Lemma 4, notice that (A1) equals
t = R T 1 X 1 t { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j u j + 1 }   X 2 t { E X 2 t X 2 t } 1 { 1 t j = 1 t 1 X 2 j u j + 1 } + t = R T 1 X 1 t { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j u j + 1 } X 1 t { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j u j + 1 } + o p ( 1 )
= t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 X 1 t X 2 t { E X 2 t X 2 t } 1 { 1 t j = 1 t 1 X 2 j u j + 1 } + t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 X 1 t X 1 t { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j u j + 1 } + o p ( 1 )
= t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 E X 1 t X 2 t { E X 2 t X 2 t } 1 { 1 t j = 1 t 1 X 2 j u j + 1 } + t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 E X 1 t X 1 t { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j u j + 1 } + o p ( 1 )
Notice that E X 1 t X 2 t = J E X 2 t X 2 t
= t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 J { 1 t j = 1 t 1 X 2 j u j + 1 } + t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j u j + 1 } + o p ( 1 )
As X 1 t = J X 2 t , the proof is complete. □
Proof that (A2) and (A3) are  o p ( 1 ) .
Using Lemma 4, rewrite (A2) as
t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 E X 1 t X 2 t { E X 2 t X 2 t } 1 { 1 t j = 1 t 1 X 2 j u j + 1 } + Δ t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j } + o p ( 1 )
= t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 J { 1 t j = 1 t 1 X 2 j u j + 1 } + Δ t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j } + o p ( 1 )
Again, from Lemma 4, we write (A3) as
Δ t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 E X 1 t X 2 t { E X 2 t X 2 t } 1 { 1 t j = 1 t 1 X 2 j } + t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 E X 1 t X 1 t { 1 t j = 1 t 1 X 1 j u j + 1 } + o p ( 1 )
= Δ t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 J { 1 t j = 1 t 1 X 2 j } + t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j u j + 1 } + o p ( 1 )
As X 1 t = J X 2 t , and adding (A2) and (A3), the proof is complete. □
Proof that (A4) is  o p ( 1 ) .
Notice from Lemma 4 that (A4) is equal to
Δ t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 E X 1 t X 2 t { E X 2 t X 2 t } 1 { 1 t j = 1 t 1 X 2 j } + Δ t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 E X 1 t X 1 t { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j } + o p ( 1 )
= Δ t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 J { 1 t j = 1 t 1 X 2 j } + Δ t = R T 1 { 1 t j = 1 t 1 X 1 j u j + 1 } { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j } + o p ( 1 )
As X 1 t = J X 2 t , (A4) is o p ( 1 )   and the proof is complete. □
Proof that (A5) is  o p ( 1 ) .
From (A5), using Lemma 4,
Δ t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 t X 1 t } 1 E X 1 t X 2 t { E X 2 t X 2 t } 1 { 1 t j = 1 t 1 X 2 j u j + 1 } + Δ t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 t X 1 t } 1 E X 1 t X 1 t { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j u j + 1 } + o p ( 1 )
= Δ t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 t X 1 t } 1 J { 1 t j = 1 t 1 X 2 j u j + 1 } + Δ t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j u j + 1 } + o p ( 1 )
As X 1 t = J X 2 t , it follows that (A5) is o p ( 1 ) and the proof is complete. □
Proof that (A6) and (A7) are  o p ( 1 ) .
From Lemma 4, notice that (A6) is equal to
Δ t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 t X 1 t } 1 E X 1 t X 2 t { E X 2 t X 2 t } 1 { 1 t j = 1 t 1 X 2 j u j + 1 } + Δ t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 t X 1 t } 1 E X 1 t X 1 t { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j } + o p ( 1 )
= Δ t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 t X 1 t } 1 J { 1 t j = 1 t 1 X 2 j u j + 1 } + Δ 2 t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j } + o p ( 1 )
Now, based on Lemma 4, write (A7) as
= Δ 2 t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 t X 1 t } 1 J { 1 t j = 1 t 1 X 2 j } + Δ t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j u j + 1 } + o p ( 1 )
As X 1 t = J X 2 t , we add (A6) and (A7), and the proof is complete. □
Proof that (A8) is  o p ( 1 ) .
Notice from Lemma 4 that (A8) is equal to
Δ 2 t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 t X 1 t } 1 E X 1 t X 2 t { E X 2 t X 2 t } 1 { 1 t j = 1 t 1 X 2 j } + Δ 2 t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 t X 1 t } 1 E X 1 t X 1 t { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j } + o p ( 1 )
Δ 2 t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 t X 1 t } 1 J { 1 t j = 1 t 1 X 2 j } + Δ 2 t = R T 1 { 1 t j = 1 t 1 X 1 j } { E X 1 t X 1 t } 1 { 1 t j = 1 t 1 X 1 j } + o p ( 1 )
As X 1 t = J X 2 t , the proof is complete. □

Appendix G

Proof of Lemma 5.
From (5), t = R T 1 u ^ 1 t + 1 ( u ^ 1 t + 1 − u ^ 2 t + 1 )
= t = R T 1 u t + 1 [ X 2 t ( β ^ 2 t − β 2 * ) − X 1 t ( β ^ 1 t − β 1 * ) ]  
+ t = R T 1 Δ oos [ X 2 t ( β ^ 2 t − β 2 * ) − X 1 t ( β ^ 1 t − β 1 * ) ]  
+ o p ( 1 )
First, notice that X 1 j u j + 1 = J X 2 j u j + 1 . Then, use the fact that, under the null hypothesis, u 2 j + 1 = u 1 j + 1 = u j + 1 + Δ oos . Finally, combining these results with Lemma A10 in CM, it follows that
t = R T 1 u t + 1 [ X 2 t ( β ^ 2 t β 2 * ) X 1 t ( β ^ 1 t β 1 * ) ] = t = R T 1 u t + 1 X 2 t ( 1 t j = 1 t 1 X 2 j X 2 j ) 1 ( 1 t j = 1 t 1 X 2 j u 2 j + 1 ) J t = R T 1 u t + 1 X 2 t ( 1 t j = 1 t 1 X 1 j X 1 j ) 1 ( 1 t j = 1 t 1 X 1 j u 1 j + 1 )
= t = R T 1 u t + 1 X 2 t [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] j = 1 t 1 ( u j + 1 + Δ IS ) X 2 j + o p ( 1 )
Then,
t = R T 1 u t + 1 X 2 t [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] j = 1 t 1 ( u j + 1 + Δ IS ) X 2 j
= t = R T 1 u t + 1 X 2 t [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] j = 1 t 1 X 2 j u j + 1  
+ Δ IS t = R T 1 u t + 1 X 2 t [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] j = 1 t 1 X 2 j  
Second, following similar arguments
t = R T 1 Δ oos [ X 2 t ( β ^ 2 t − β 2 * ) − X 1 t ( β ^ 1 t − β 1 * ) ] = Δ oos t = R T 1 X 2 t [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] j = 1 t 1 ( u j + 1 + Δ IS ) X 2 j + o p ( 1 )
Then,
Δ oos t = R T 1 X 2 t [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] j = 1 t 1 ( u j + 1 + Δ IS ) X 2 j
= Δ oos t = R T 1 X 2 t [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] j = 1 t 1 X 2 j u j + 1  
+ Δ oos Δ I S t = R T 1 X 2 t [ ( E X 2 X 2 ) 1 J ( E X 1 X 1 ) 1 J ] j = 1 t 1 X 2 j  
The proof is complete. □

References

1. Diebold, F.X.; Mariano, R.S. Comparing Predictive Accuracy. J. Bus. Econ. Stat. 1995, 13, 253–263.
2. West, K.D. Asymptotic Inference about Predictive Ability. Econometrica 1996, 64, 1067–1084.
3. Clark, T.E.; West, K.D. Using out-of-sample mean squared prediction errors to test the martingale difference hypothesis. J. Econom. 2006, 135, 155–186.
4. West, K.D. Forecast Evaluation. Handb. Econ. Forecast. 2006, 1, 99–134.
5. Clark, T.E.; McCracken, M.W. Tests of equal forecast accuracy and encompassing for nested models. J. Econom. 2001, 105, 85–110.
6. Meese, R.A.; Rogoff, K. Empirical exchange rate models of the seventies: Do they fit out of sample? J. Int. Econ. 1983, 14, 3–24.
7. Meese, R.; Rogoff, K. Was it real? The exchange rate–interest differential relation over the modern floating-rate period. J. Financ. 1988, 43, 933–948.
8. Goyal, A.; Welch, I. Predicting the equity premium with dividend ratios. Manag. Sci. 2003, 49, 639–654.
9. Welch, I.; Goyal, A. A comprehensive look at the empirical performance of equity premium prediction. Rev. Financ. Stud. 2008, 21, 1455–1508.
10. Clark, T.E.; West, K.D. Approximately normal tests for equal predictive accuracy in nested models. J. Econom. 2007, 138, 291–311.
11. Clark, T.E.; McCracken, M.W. Evaluating direct multistep forecasts. Econom. Rev. 2005, 24, 369–404.
12. McCracken, M.W. Asymptotics for out of sample tests of Granger causality. J. Econom. 2007, 140, 719–752.
13. Harvey, D.S.; Leybourne, S.J.; Newbold, P. Tests for forecast encompassing. J. Bus. Econ. Stat. 1998, 16, 254–259.
14. Clark, T.; McCracken, M. Evaluating the accuracy of forecasts from vector autoregressions. In Vector Autoregressive Modeling–New Developments and Applications: Essays in Honor of Christopher A. Sims; Fomby, T., Kilian, L., Murphy, A., Eds.; Emerald Group Publishing: Bingley, UK, 2013.
15. Chao, J.; Corradi, V.; Swanson, N. Out-of-sample tests for Granger causality. Macroecon. Dyn. 2001, 5, 598–620.
16. Armah, N.; Swanson, N.R. Predictive inference under model misspecification. In Forecasting in the Presence of Structural Breaks and Model Uncertainty; Emerald Group Publishing Limited: Bingley, UK, 2008.
17. Corradi, V.; Swanson, N. Some recent developments in predictive accuracy testing with nested models and (generic) nonlinear alternatives. Int. J. Forecast. 2004, 20, 185–199.
18. Corradi, V.; Swanson, N. Bootstrap conditional distribution tests in the presence of dynamic misspecification. J. Econom. 2006, 133, 779–806.
19. Corradi, V.; Swanson, N. Nonparametric bootstrap procedures for predictive inference based on recursive estimation schemes. Int. Econ. Rev. 2007, 48, 67–109.
20. Box, G. Science and Statistics. J. Am. Stat. Assoc. 1976, 71, 791–799.
21. Inoue, A.; Kilian, L. In-sample or out-of-sample tests of predictability: Which one should we use? Econom. Rev. 2002, 23, 371–402.
22. Pincheira, P.; Hardy, N. Forecasting base metal prices with the Chilean exchange rate. Resour. Policy 2019, 62, 256–281.
23. Pincheira, P.; Hardy, N. Forecasting aluminum prices with commodity currencies. Resour. Policy 2021, 73, 102066.
24. Pincheira, P.; Bentancor, A.; Hardy, N.; Jarsun, N. Forecasting fuel prices with the Chilean exchange rate: Going beyond the commodity currency hypothesis. Energy Econ. 2021, 105802.
25. Pincheira, P.; Hardy, N.; Henriquez, C.; Tapia, I.; Bentancor, A. Forecasting Base Metal Prices with an International Stock Index. 2021. Available online: https://ssrn.com/abstract=3849161 (accessed on 1 September 2021).
26. Pincheira, P.; Hardy, N.; Muñoz, F. “Go Wild for a While!”: A New Test for Forecast Evaluation in Nested Models. Mathematics 2021, 9, 2254.
27. Diebold, F.X.; Lopez, J.A. Forecast evaluation and combination. Handb. Stat. 1996, 14, 241–268.
28. Clark, T.; McCracken, M. Advances in forecast evaluation. Handb. Econ. Forecast. 2013, 2, 1107–1201.
29. Wilson, E.B. The periodogram of American business activity. Q. J. Econ. 1934, 48, 375–417.
30. Fair, R.C.; Shiller, R.J. The informational content of ex ante forecasts. Rev. Econ. Stat. 1989, 71, 325–331.
31. Fair, R.C.; Shiller, R.J. Comparing information in forecasts from econometric models. Am. Econ. Rev. 1990, 80, 375–389.
32. Rossi, B. Exchange rate predictability. J. Econ. Lit. 2013, 51, 1063–1119.
33. Clark, T.; McCracken, M. Not-for-Publication Appendix to “Tests of Equal Forecast Accuracy and Encompassing for Nested Models”. 2000. Available online: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.161.4290 (accessed on 1 January 2021).
34. White, H. Asymptotic Theory for Econometricians; Academic Press: Cambridge, MA, USA, 2014.
35. Newey, W.; West, K. A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica 1987, 55, 703–708.
36. Newey, W.; West, K. Automatic lag selection in covariance matrix estimation. Rev. Econ. Stud. 1994, 61, 631–653.
37. Campbell, J.Y.; Shiller, R.J. Cointegration and Tests of Present Value Models. J. Political Econ. 1987, 95, 1062–1088.
38. Engel, C.; West, K.D. Exchange Rates and Fundamentals. J. Political Econ. 2005, 113, 485–517.
39. Davidson, J. Stochastic Limit Theory: An Introduction for Econometricians; OUP Oxford: Oxford, UK, 1994.
40. Hansen, B. Convergence to stochastic integrals for dependent heterogeneous processes. Econom. Theory 1992, 8, 489–500.
41. McLeish, D. Dependent central limit theorems and invariance principles. Ann. Probab. 1974, 2, 620–628.
42. Hall, P. Martingale invariance principles. Ann. Probab. 1977, 5, 875–887.
43. Corradi, V.; Swanson, N.; Olivetti, C. Predictive ability with cointegrated variables. J. Econom. 2001, 104, 315–358.
Table 1. Empirical Size with P/R = 2.
                    Nominal Size 10%                     Nominal Size 5%
          ENC-NEW  ENC-t   WCW-t   MENC-NEW   ENC-NEW  ENC-t   WCW-t   MENC-NEW
α = 0.3, ρ = 0.8
T = 100    0.192   0.195   0.134    0.136      0.119   0.122   0.069    0.074
T = 150    0.188   0.188   0.133    0.128      0.119   0.121   0.069    0.072
T = 200    0.187   0.184   0.114    0.123      0.119   0.116   0.061    0.065
Ave        0.189   0.189   0.127    0.129      0.119   0.120   0.066    0.070
α = 0.5, ρ = 0.6
T = 100    0.183   0.187   0.116    0.107      0.114   0.117   0.060    0.057
T = 150    0.182   0.185   0.113    0.103      0.115   0.122   0.057    0.056
T = 200    0.182   0.181   0.105    0.100      0.116   0.112   0.053    0.050
Ave        0.182   0.184   0.111    0.103      0.115   0.117   0.057    0.054
α = 0.7, ρ = 0.6
T = 100    0.233   0.227   0.112    0.109      0.157   0.156   0.064    0.064
T = 150    0.230   0.227   0.104    0.105      0.154   0.162   0.055    0.062
T = 200    0.230   0.220   0.101    0.097      0.154   0.148   0.047    0.051
Ave        0.231   0.225   0.106    0.104      0.155   0.155   0.055    0.059
α = 1.0, ρ = 0.6
T = 100    0.291   0.280   0.113    0.114      0.208   0.198   0.062    0.067
T = 150    0.279   0.270   0.106    0.105      0.204   0.198   0.055    0.067
T = 200    0.276   0.261   0.098    0.096      0.201   0.186   0.050    0.053
Ave        0.282   0.270   0.106    0.105      0.204   0.194   0.056    0.062
Notes: Each entry report the percentage of rejections under the null hypothesis β = 0 . The table’s left (right) side reports our results with a nominal size of 10% (5%). ρ is related to the persistency of the predictor and α with the magnitude of the bias. For each exercise we consider 5000 Monte Carlo simulations. T stands for the sample size. We use a recursive scheme to update our parameters. P and R stand for the number of observations in the evaluation and estimation windows, respectively. We determine the critical values (CV) for the MENC-NEW according to Theorem 2. CV for the WCW are standard normal, while CV for the ENC-t and the ENC-NEW are those tabulated in [5] for k 2 = 1, P/R = 2 and a recursive method.
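The notes above describe the bookkeeping behind every entry in these tables: simulate the data-generating process under the null many times, apply the test at the nominal level, and report the fraction of rejections. A minimal sketch of that empirical-size computation, using an ordinary two-sided t-test on a slope coefficient in place of the ENC-NEW purely for illustration (the function name and toy DGP are not from the paper):

```python
import numpy as np

def empirical_size(n_sims=1000, T=100, seed=0):
    """Fraction of Monte Carlo replications in which a two-sided 10%
    t-test on the slope rejects, when the data are generated under
    the null (the slope is truly zero)."""
    rng = np.random.default_rng(seed)
    crit = 1.645  # two-sided 10% critical value (normal approximation)
    rejections = 0
    for _ in range(n_sims):
        x = rng.standard_normal(T)
        y = rng.standard_normal(T)  # beta = 0: x has no predictive content
        X = np.column_stack([np.ones(T), x])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        s2 = resid @ resid / (T - 2)
        var_slope = s2 * np.linalg.inv(X.T @ X)[1, 1]
        rejections += abs(coef[1] / np.sqrt(var_slope)) > crit
    return rejections / n_sims
```

A well-sized test returns a value close to the nominal 0.10; the size distortions documented in the tables correspond to this ratio drifting far above the nominal level.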
Table 2. Empirical Size for P/R = 4.

                   Nominal Size 10%                        Nominal Size 5%
           ENC-NEW  ENC-t  WCW-t  MENC-NEW       ENC-NEW  ENC-t  WCW-t  MENC-NEW
α = 0.3, ρ = 0.8
T = 100     0.205   0.199  0.124   0.137          0.122   0.118  0.061   0.079
T = 150     0.201   0.195  0.120   0.131          0.121   0.122  0.064   0.078
T = 200     0.192   0.187  0.104   0.119          0.116   0.114  0.054   0.069
Ave         0.199   0.194  0.116   0.129          0.120   0.118  0.060   0.075
α = 0.5, ρ = 0.6
T = 100     0.198   0.192  0.114   0.111          0.120   0.114  0.056   0.061
T = 150     0.193   0.193  0.106   0.105          0.121   0.121  0.054   0.059
T = 200     0.191   0.187  0.096   0.098          0.117   0.109  0.044   0.049
Ave         0.194   0.191  0.105   0.105          0.119   0.115  0.051   0.056
α = 0.7, ρ = 0.6
T = 100     0.278   0.270  0.113   0.134          0.187   0.176  0.057   0.081
T = 150     0.276   0.261  0.107   0.126          0.193   0.183  0.056   0.077
T = 200     0.274   0.256  0.095   0.110          0.180   0.165  0.045   0.062
Ave         0.276   0.262  0.105   0.123          0.187   0.175  0.053   0.073
α = 1.0, ρ = 0.6
T = 100     0.321   0.301  0.103   0.120          0.218   0.209  0.052   0.067
T = 150     0.302   0.288  0.095   0.114          0.220   0.206  0.052   0.064
T = 200     0.301   0.284  0.089   0.095          0.209   0.189  0.043   0.052
Ave         0.308   0.291  0.096   0.110          0.216   0.201  0.049   0.061

Notes: Each entry reports the percentage of rejections under the null hypothesis β = 0. The table’s left (right) side reports our results with a nominal size of 10% (5%). ρ governs the persistence of the predictor and α the magnitude of the bias. Each exercise uses 5000 Monte Carlo simulations. T stands for the sample size. We use a recursive scheme to update our parameters. P and R stand for the number of observations in the evaluation and estimation windows, respectively. We determine the critical values (CV) for the MENC-NEW according to Theorem 2. CV for the WCW are standard normal, while CV for the ENC-t and the ENC-NEW are those tabulated in [5] for k2 = 1, P/R = 4, and a recursive method.
Table 3. Empirical Size for P/R = 1.

                   Nominal Size 10%                        Nominal Size 5%
           ENC-NEW  ENC-t  WCW-t  MENC-NEW       ENC-NEW  ENC-t  WCW-t  MENC-NEW
Bias = 0.3
T = 100     0.127   0.136  0.142   0.105          0.075   0.082  0.075   0.059
T = 150     0.120   0.134  0.132   0.100          0.074   0.081  0.073   0.054
T = 200     0.121   0.125  0.121   0.097          0.064   0.073  0.059   0.043
Ave         0.123   0.132  0.132   0.101          0.071   0.079  0.069   0.052
Bias = 0.6
T = 100     0.177   0.186  0.133   0.105          0.113   0.120  0.065   0.061
T = 150     0.175   0.183  0.123   0.103          0.112   0.119  0.058   0.052
T = 200     0.169   0.172  0.113   0.091          0.106   0.108  0.056   0.044
Ave         0.174   0.180  0.123   0.100          0.110   0.116  0.060   0.052

Notes: Each entry reports the percentage of rejections under the null hypothesis β = 0. The table’s left (right) side reports our results with a nominal size of 10% (5%). We introduce the bias in this DGP through the parameter α. Each exercise uses 5000 Monte Carlo simulations. T stands for the sample size. We use a recursive scheme to update our parameters. P and R stand for the number of observations in the evaluation and estimation windows, respectively. We determine the critical values (CV) for the MENC-NEW according to Theorem 2. CV for the WCW are standard normal, while CV for the ENC-t and the ENC-NEW are those tabulated in [5] for k2 = 1, P/R = 1, and a recursive method.
Table 4. Empirical Size for P/R = 2.

                   Nominal Size 10%                        Nominal Size 5%
           ENC-NEW  ENC-t  WCW-t  MENC-NEW       ENC-NEW  ENC-t  WCW-t  MENC-NEW
Bias = 0.3
T = 100     0.120   0.127  0.128   0.098          0.069   0.073  0.070   0.056
T = 150     0.127   0.134  0.129   0.101          0.069   0.079  0.060   0.051
T = 200     0.113   0.121  0.121   0.090          0.062   0.069  0.061   0.043
Ave         0.120   0.127  0.126   0.096          0.067   0.074  0.064   0.050
Bias = 0.6
T = 100     0.174   0.175  0.120   0.097          0.105   0.114  0.058   0.057
T = 150     0.180   0.183  0.112   0.100          0.115   0.124  0.058   0.054
T = 200     0.176   0.177  0.101   0.085          0.101   0.104  0.049   0.045
Ave         0.177   0.178  0.111   0.094          0.107   0.114  0.055   0.052

Notes: Each entry reports the percentage of rejections under the null hypothesis β = 0. The table’s left (right) side reports our results with a nominal size of 10% (5%). We introduce the bias in this DGP through the parameter α. Each exercise uses 5000 Monte Carlo simulations. T stands for the sample size. We use a recursive scheme to update our parameters. P and R stand for the number of observations in the evaluation and estimation windows, respectively. We determine the critical values (CV) for the MENC-NEW according to Theorem 2. CV for the WCW are standard normal, while CV for the ENC-t and the ENC-NEW are those tabulated in [5] for k2 = 1, P/R = 2, and a recursive method.
Table 5. Empirical Size with an out-of-sample bias and P/R = 1.

                   Nominal Size 10%                        Nominal Size 5%
           ENC-NEW  ENC-t  WCW-t  ENC-NEW*        ENC-NEW  ENC-t  WCW-t  ENC-NEW*
ρ = 0.8, α1 = 0, α2 = 0.3, αX = 0.2
T = 100     0.142   0.145  0.141   0.116          0.086   0.088  0.077   0.066
T = 150     0.146   0.156  0.136   0.115          0.086   0.093  0.074   0.060
T = 200     0.141   0.145  0.121   0.113          0.088   0.088  0.067   0.058
Ave         0.143   0.149  0.133   0.115          0.087   0.090  0.073   0.061
ρ = 0.8, α1 = 0.1, α2 = 0.3, αX = 0.2
T = 100     0.130   0.126  0.141   0.104          0.073   0.072  0.078   0.054
T = 150     0.128   0.135  0.135   0.099          0.073   0.076  0.072   0.048
T = 200     0.125   0.129  0.127   0.098          0.073   0.072  0.072   0.049
Ave         0.128   0.130  0.134   0.100          0.073   0.073  0.074   0.050
ρ = 0.4, α1 = 0.2, α2 = 0.3, αX = 0.4
T = 100     0.120   0.136  0.137   0.094          0.069   0.077  0.074   0.048
T = 150     0.112   0.126  0.132   0.080          0.060   0.071  0.071   0.040
T = 200     0.121   0.131  0.128   0.090          0.068   0.071  0.069   0.045
Ave         0.118   0.131  0.132   0.088          0.066   0.073  0.071   0.044

Notes: Each entry reports the percentage of rejections under the null hypothesis β = 0. The table’s left (right) side reports our results with a nominal size of 10% (5%). We introduce the out-of-sample bias in this DGP through the parameters α1 (the initial drift of Y_t) and α2 (the drift of Y_t after the change). The parameter ρ governs the persistence of the predictor X_t, and αX denotes the drift of the predictor. Each exercise uses 5000 Monte Carlo simulations. T stands for the sample size. We use a recursive scheme to update our parameters. P and R stand for the number of observations in the evaluation and estimation windows, respectively. We determine the corrected critical values (CV) for the ENC-NEW according to Theorem 2 (ENC-NEW*). CV for the WCW are standard normal, while CV for the ENC-t and the ENC-NEW are those tabulated in [5] for k2 = 1, P/R = 1, and a recursive method.
Table 6. Empirical Size for P/R = 1.

                   Nominal Size 10%                        Nominal Size 5%
           ENC-NEW  ENC-t  WCW-t  ENC-NEW*        ENC-NEW  ENC-t  WCW-t  ENC-NEW*
ρ = 0.4, ρα = 0.9, αα = 0.01, αx = 0.3
T = 100     0.219   0.235  0.120   0.125          0.145   0.157  0.060   0.065
T = 150     0.207   0.220  0.112   0.106          0.137   0.141  0.056   0.049
T = 200     0.215   0.218  0.107   0.104          0.149   0.151  0.058   0.044
Ave         0.214   0.224  0.113   0.112          0.144   0.150  0.058   0.053
ρ = 0.5, ρα = 0.95, αα = 0.01, αx = 0.3
T = 100     0.264   0.276  0.128   0.131          0.189   0.204  0.063   0.072
T = 150     0.260   0.271  0.108   0.110          0.186   0.193  0.054   0.053
T = 200     0.270   0.274  0.115   0.105          0.197   0.201  0.058   0.052
Ave         0.265   0.274  0.117   0.115          0.191   0.199  0.058   0.059
ρ = 0.6, ρα = 0.9, αα = 0.05, αx = 0.4
T = 100     0.303   0.302  0.131   0.115          0.222   0.229  0.068   0.064
T = 150     0.289   0.292  0.113   0.090          0.215   0.214  0.060   0.042
T = 200     0.300   0.294  0.108   0.085          0.225   0.221  0.057   0.040
Ave         0.297   0.296  0.117   0.097          0.221   0.221  0.062   0.049

Notes: Each entry reports the percentage of rejections under the null hypothesis β = 0. The table’s left (right) side reports our results with a nominal size of 10% (5%). ρα and αα denote the persistence and the drift of α_y,t, respectively. The parameter ρ governs the persistence of the predictor X_t, and αX denotes the drift of the predictor. Each exercise uses 5000 Monte Carlo simulations. T stands for the sample size. We use a recursive scheme to update our parameters. P and R stand for the number of observations in the evaluation and estimation windows, respectively. We determine the corrected critical values (CV) for the ENC-NEW according to Theorem 2 (ENC-NEW*). CV for the WCW are standard normal, while CV for the ENC-t and the ENC-NEW are those tabulated in [5] for k2 = 1, P/R = 1, and a recursive method.
Table 7. Empirical Size for P/R = 2.

                   Nominal Size 10%                        Nominal Size 5%
           ENC-NEW  ENC-t  WCW-t  ENC-NEW*        ENC-NEW  ENC-t  WCW-t  ENC-NEW*
ρ = 0.4, ρα = 0.9, αα = 0.01, αx = 0.3
T = 100     0.216   0.236  0.111   0.121          0.145   0.156  0.053   0.063
T = 150     0.210   0.219  0.096   0.104          0.140   0.145  0.047   0.050
T = 200     0.220   0.218  0.096   0.099          0.148   0.155  0.048   0.042
Ave         0.215   0.224  0.101   0.108          0.144   0.152  0.049   0.052
ρ = 0.5, ρα = 0.95, αα = 0.01, αx = 0.3
T = 100     0.273   0.291  0.107   0.126          0.192   0.208  0.056   0.064
T = 150     0.272   0.285  0.092   0.103          0.186   0.203  0.046   0.050
T = 200     0.278   0.286  0.096   0.099          0.197   0.207  0.047   0.047
Ave         0.274   0.287  0.098   0.109          0.192   0.206  0.050   0.054
ρ = 0.6, ρα = 0.9, αα = 0.05, αx = 0.4
T = 100     0.316   0.319  0.116   0.115          0.226   0.239  0.062   0.059
T = 150     0.309   0.314  0.101   0.086          0.225   0.227  0.052   0.040
T = 200     0.310   0.303  0.091   0.073          0.234   0.232  0.049   0.030
Ave         0.312   0.312  0.103   0.091          0.228   0.233  0.054   0.043

Notes: Each entry reports the percentage of rejections under the null hypothesis β = 0. The table’s left (right) side reports our results with a nominal size of 10% (5%). ρα and αα denote the persistence and the drift of α_y,t, respectively. The parameter ρ governs the persistence of the predictor X_t, and αX denotes the drift of the predictor. Each exercise uses 5000 Monte Carlo simulations. T stands for the sample size. We use a recursive scheme to update our parameters. P and R stand for the number of observations in the evaluation and estimation windows, respectively. We determine the corrected critical values (CV) for the ENC-NEW according to Theorem 2 (ENC-NEW*). CV for the WCW are standard normal, while CV for the ENC-t and the ENC-NEW are those tabulated in [5] for k2 = 1, P/R = 2, and a recursive method.
Table 8. Raw power with nominal sizes 10% and 5%, and P/R = 1.

                   Nominal Size 10%                        Nominal Size 5%
           ENC-NEW  ENC-t  WCW-t  MENC-NEW       ENC-NEW  ENC-t  WCW-t  MENC-NEW
Bias = 0.15
T = 100     0.925   0.883  0.419   0.923          0.878   0.801  0.364   0.871
T = 150     0.985   0.963  0.467   0.984          0.969   0.919  0.440   0.967
T = 200     0.997   0.991  0.501   0.997          0.994   0.976  0.489   0.993
Ave         0.969   0.946  0.462   0.968          0.947   0.899  0.431   0.944
Bias = 0.3
T = 100     0.905   0.856  0.395   0.885          0.848   0.765  0.338   0.823
T = 150     0.970   0.945  0.453   0.963          0.947   0.890  0.417   0.933
T = 200     0.993   0.984  0.493   0.991          0.988   0.955  0.471   0.982
Ave         0.956   0.928  0.447   0.947          0.928   0.870  0.409   0.913
Bias = 0.6
T = 100     0.807   0.751  0.330   0.738          0.739   0.659  0.267   0.630
T = 150     0.909   0.873  0.395   0.861          0.868   0.802  0.343   0.792
T = 200     0.962   0.931  0.450   0.928          0.936   0.885  0.407   0.892
Ave         0.892   0.851  0.392   0.842          0.847   0.782  0.339   0.771

Notes: Each entry reports the percentage of rejections under the alternative hypothesis β = 0.6. The table’s left (right) side reports our results with a nominal size of 10% (5%). We introduce the bias in this DGP through the parameter α. Each exercise uses 5000 Monte Carlo simulations. T stands for the sample size. We use a recursive scheme to update our parameters. P and R stand for the number of observations in the evaluation and estimation windows, respectively. We determine the critical values (CV) for the MENC-NEW according to Theorem 2. CV for the WCW are standard normal, while CV for the ENC-t and the ENC-NEW are those tabulated in [5] for k2 = 1, P/R = 1, and a recursive method.
Table 9. Raw power with nominal sizes 10% and 5%, and P/R = 2.

                   Nominal Size 10%                        Nominal Size 5%
           ENC-NEW  ENC-t  WCW-t  MENC-NEW       ENC-NEW  ENC-t  WCW-t  MENC-NEW
Bias = 0.15
T = 100     0.938   0.929  0.450   0.934          0.896   0.870  0.409   0.893
T = 150     0.985   0.980  0.485   0.985          0.973   0.961  0.465   0.971
T = 200     0.997   0.997  0.507   0.996          0.994   0.989  0.501   0.994
Ave         0.973   0.969  0.481   0.972          0.954   0.940  0.458   0.953
Bias = 0.3
T = 100     0.914   0.906  0.432   0.900          0.865   0.835  0.385   0.837
T = 150     0.976   0.968  0.473   0.969          0.954   0.937  0.440   0.941
T = 200     0.995   0.992  0.502   0.994          0.989   0.978  0.488   0.982
Ave         0.962   0.955  0.469   0.954          0.936   0.917  0.438   0.920
Bias = 0.6
T = 100     0.830   0.814  0.364   0.748          0.757   0.728  0.303   0.655
T = 150     0.918   0.905  0.416   0.864          0.875   0.847  0.373   0.795
T = 200     0.969   0.958  0.468   0.938          0.946   0.925  0.438   0.902
Ave         0.906   0.892  0.416   0.850          0.859   0.833  0.372   0.784

Notes: Each entry reports the percentage of rejections under the alternative hypothesis β = 0.6. The table’s left (right) side reports our results with a nominal size of 10% (5%). We introduce the bias in this DGP through the parameter α. Each exercise uses 5000 Monte Carlo simulations. T stands for the sample size. We use a recursive scheme to update our parameters. P and R stand for the number of observations in the evaluation and estimation windows, respectively. We determine the critical values (CV) for the MENC-NEW according to Theorem 2. CV for the WCW are standard normal, while CV for the ENC-t and the ENC-NEW are those tabulated in [5] for k2 = 1, P/R = 2, and a recursive method.
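The tables’ notes repeatedly reference the recursive scheme (an expanding estimation window of initial size R, followed by an evaluation window of P one-step-ahead forecasts) and the ENC-NEW statistic of [5]. A sketch of that forecast loop, under simplifying assumptions not taken from the paper: an intercept-only nested model, a single extra predictor, and the ENC-NEW written as P · mean(ê1(ê1 − ê2)) / mean(ê2²) following Clark and McCracken:

```python
import numpy as np

def enc_new(y, x, R):
    """ENC-NEW comparing model 1 (intercept only, the nested model) with
    model 2 (intercept plus x), using one-step-ahead forecasts and a
    recursive (expanding-window) scheme: the first R observations form
    the initial estimation window, the remaining P = T - R the
    evaluation window."""
    T = len(y)
    P = T - R
    e1 = np.empty(P)  # forecast errors of the nested model
    e2 = np.empty(P)  # forecast errors of the nesting model
    for i, t in enumerate(range(R, T)):
        # model 1 forecast: recursive sample mean of y up to t - 1
        e1[i] = y[t] - y[:t].mean()
        # model 2 forecast: OLS of y on [1, x], re-estimated on data up to t - 1
        X = np.column_stack([np.ones(t), x[:t]])
        b, *_ = np.linalg.lstsq(X, y[:t], rcond=None)
        e2[i] = y[t] - (b[0] + b[1] * x[t])
    # ENC-NEW = P * mean(e1 * (e1 - e2)) / mean(e2 ** 2)
    return P * np.mean(e1 * (e1 - e2)) / np.mean(e2 ** 2)
```

When x genuinely predicts y, the numerator stays positive and the statistic grows with P; under the null it converges to the nonstandard functional of Brownian motions discussed in the paper, which is why simulated or tabulated critical values are required.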
Hardy, N. “A Bias Recognized Is a Bias Sterilized”: The Effects of a Bias in Forecast Evaluation. Mathematics 2022, 10, 171. https://doi.org/10.3390/math10020171