Second-Order Least Squares Estimation in Nonlinear Time Series Models with ARCH Errors

Salamh, Mustafa; Wang, Liqun

doi:10.3390/econometrics9040041

Open AccessEditor’s ChoiceArticle

Second-Order Least Squares Estimation in Nonlinear Time Series Models with ARCH Errors

by

Mustafa Salamh

¹

and

Liqun Wang

^2,*

¹

Department of Statistics, Cairo University, Giza 12613, Egypt

²

Department of Statistics, University of Manitoba, Winnipeg, MB R3T 2N2, Canada

^*

Author to whom correspondence should be addressed.

Econometrics 2021, 9(4), 41; https://doi.org/10.3390/econometrics9040041

Submission received: 20 September 2021 / Revised: 23 November 2021 / Accepted: 23 November 2021 / Published: 27 November 2021

Download

Browse Figures

Versions Notes

Abstract

:

Many financial and economic time series exhibit nonlinear patterns or relationships. However, most statistical methods for time series analysis are developed for mean-stationary processes that require transformation, such as differencing of the data. In this paper, we study a dynamic regression model with nonlinear, time-varying mean function, and autoregressive conditionally heteroscedastic errors. We propose an estimation approach based on the first two conditional moments of the response variable, which does not require specification of error distribution. Strong consistency and asymptotic normality of the proposed estimator is established under strong-mixing condition, so that the results apply to both stationary and mean-nonstationary processes. Moreover, the proposed approach is shown to be superior to the commonly used quasi-likelihood approach and the efficiency gain is significant when the (conditional) error distribution is asymmetric. We demonstrate through a real data example that the proposed method can identify a more accurate model than the quasi-likelihood method.

Keywords:

nonlinear dynamic model; ARCH error; mixing process; mean nonstationarity; second order least squares; semiparametric efficiency; econometric modeling; financial time series

1. Introduction

Dynamic models have been widely applied in the analysis of economic and financial data. Most theories and methods are developed for mean-stationary data generating processes, specifically ARMA processes, although ARIMA processes have also been studied (Koul and Ling 2006; Ling 2003; Ling and McAleer 2003; Meitz and Saikkonen 2008). However, many financial and economic variables exhibit nonlinear behaviour or relationships (e.g., Enders 2010; Franses and Van Dijk 2000). As pointed out by Li et al. (2002), consistent estimation of variance parameters may be misleading or impossible if the conditional mean function is not adequately specified. Therefore, more general and flexible models with time varying mean functions are desirable to capture the nonlinear dynamic behaviour and the structural relationships in the real data.

On the other hand, the autoregressive conditional heteroscedasticity (ARCH) model and its various generalizations have been widely used to analyze economic and financial data. These models allow both the conditional means and variances of a process to jointly evolve over time. The mainstream method for estimation and inference in generalized ARCH (GARCH) models is likelihood based (e.g., Engle 1982; Engle and Gonzalez-Rivera 1991; Weiss 1986), although the estimating function approach is also studied (Li and Turtle 2000). So far most research focuses on the quasi-likelihood method for various generalizations of the ARCH error component while keeping the process mean function very simple, e.g., constant or linear. Therefore from both theoretical and practical points of view, it is important to develop methodologies for mean nonstationary ARCH processes.

In this paper, we consider a model with a fairly general time varying nonlinear mean function and ARCH error that covers both stationary and mean-nonstationary processes. In particular, we propose a so-called second-order least squares (SLS) approach based on the first two conditional moments of the process. We establish the consistency and asymptotic normality for the proposed estimator under general mixing conditions. We demonstrate that this approach is more efficient than the commonly used quasi-likelihood method and the efficiency gain is significant when the conditional error distribution is asymmetric. We also carry out extensive simulations to study finite sample properties of our proposed estimator and compare its performance with other related estimators. Our results show that in most cases the optimal SLS estimator has superior performance over the estimating function estimators based on the same set of moments. Finally, we apply our approach to the empirical example of the U.K. inflation of Engle (1982), which leads to a different model specification than the quasi-likelihood method.

It is worthwhile to note that some researchers have obtained more efficient estimators than the Gaussian quasi-maximum likelihood estimator (QMLE) by assuming various parametric families of error distributions (see a recent survey by Zhu and Li (2015)). However, our approach is based on the first two conditional moments of the process only and does not require any distributional assumptions. The SLS method was first used by Wang (2003, 2004) to estimate the nonlinear measurement error models. Later, it was extended to the nonlinear longitudinal data models by Wang (2007) and to the censored linear models by Abarin and Wang (2009). Wang and Leblanc (2008) showed that under a nonlinear cross-sectional data model, the SLS estimator is asymptotically more efficient than the ordinary least squares estimator when the error term has nonzero third moment, and both estimators are equally efficient otherwise. Further, Kim and Ma (2012) showed that the SLS estimator attains the optimal semiparametric efficiency bound in general. More recently, Rosadi and Filzmoser (2019); Rosadi and Peiris (2014) and Salamh and Wang (2021) used this method in dynamic models. It has also been applied in the optimal design problems by several researchers, e.g., Bose and Mukerjee (2015); Gao and Zhou (2017); Yin and Zhou (2017) and He and Yue (2019).

The paper is organized as follows. In Section 2 we introduce the model and SLS estimator and establish its consistency and asymptotic normality. In Section 3 we derive the optimal SLS estimator and propose a feasible version of it. We also investigate the efficiency gain of the optimal SLS estimator relative to the QMLE and highlight the differences between our approach and that of the estimating functions. In Section 4 we carry out Monte Carlo simulations to study the finite sample behavior of the SLS estimator in the cases of both skewed and leptokurtic conditional error distributions, and compare it with the QMLE and some other related estimators. In Section 5 we apply the SLS approach to an empirical analysis of the U.K. inflation data. Finally, conclusions and discussion are given in Section 6, while regularity assumptions and mathematical proofs are given in the Appendix A.

2. Model and SLS Estimation

Let

\{(y_{t}, x_{t}^{'})\}

be a sequence of random vectors defined on a complete probability space

(Ω, F, P)

and denote

v_{t} = {(x_{t}^{'}, y_{t - 1}, x_{t - 1}^{'}, \dots, y_{t - τ}, x_{t - τ}^{'})}^{'}

for some nonnegative integer

τ

. We consider the model

y_{t} = f_{t} (v_{t}, θ_{0}) + ϵ_{t}, t \in Z,

(1)

where

f_{t} : R^{υ} \times Θ \to R^{1}

are known measurable functions on

R^{υ}

for each

θ \in Θ \subset R^{q}

, and continuous on

Θ

uniformly in t a.s.-P. Further, denote the

σ

-field

F_{t - 1} = F \{x_{i}, y_{i - 1}, i \leq t\}

and assume that

ϵ_{t} = σ_{t} ε_{t}

satisfying

E (ε_{t} | F_{t - 1}) = 0 a . s . - P, E (ε_{t}^{2} | F_{t - 1}) = 1 a . s . - P, σ_{t}^{2} = ϕ_{00} + \sum_{i = 1}^{p} ϕ_{0 i} ϵ_{t - i}^{2},

(2)

where

ϕ_{00}, ϕ_{0 p} > 0

, and

ϕ_{0 i} \geq 0

for

i = 1, 2, \dots, p - 1

. It is easy to see that the linear model with ARCH error is a special case of model (1) and (2). Our main goal is to estimate

γ_{0} = {(θ_{0}^{'}, ϕ_{0}^{'})}^{'} \in Γ

which is a compact subset of

R^{q + p + 1}

.

Given the random sample

(y_{T}, x_{T}^{'}, \dots, y_{1 - p - τ}, x_{1 - p - τ}^{'})

, let

ϵ_{t} (θ) = y_{t} - f_{t} (v_{t}, θ)

,

σ_{t}^{2} (γ) = ϕ_{0} + \sum_{i = 1}^{p} ϕ_{i} ϵ_{t - i}^{2} (θ)

and

h_{t} (γ) = {(ϵ_{t} (θ), y_{t}^{2} - f_{t}^{2} (v_{t}, θ) - σ_{t}^{2} (γ))}^{'}

. Then the second-order least squares (SLS) estimator is defined as the

F

–measurable function satisfying

{\hat{γ}}_{T} = \underset{γ \in Γ}{argmin} Q_{T} (γ) a . s . - P,

(3)

where

Q_{T} (γ) = T^{- 1} \sum_{t = 1}^{T} h_{t}^{'} (γ) W_{t} h_{t} (γ)

and

W_{t}

is a nonnegative definite matrix that is measurable, with respect to

F \{v_{t}, \dots, v_{t - p}\}

.

Next we establish the consistency and asymptotic normality of

{\hat{γ}}_{T}

under the regularity conditions given in Appendix A. The consistency of

{\hat{γ}}_{T}

follows from the uniform convergence of

Q_{T} (γ)

on

Γ

to a non-random sequence

{\bar{Q}}_{T} (γ)

which have unique minimizer at

γ_{0}

for sufficiently large T.

Theorem 1.

Under Assumptions A1–A3,

{\hat{γ}}_{T} \overset{a . s .}{⟶} γ_{0}

, as

T \to \infty

.

Theorem 2.

Under Assumptions A1–A9,

V_{T}^{- 1 / 2} {\bar{A}}_{T} (γ_{0}) \sqrt{T} ({\hat{γ}}_{T} - γ_{0}) \overset{d}{⟶} N (0, I_{q + p + 1})

, as

T \to \infty

, where

{\bar{A}}_{T} (γ_{0}) = 2 T^{- 1} \sum_{t = 1}^{T} E \{\nabla_{γ} h_{t}^{'} (γ_{0}) W_{t} \nabla_{γ^{'}} h_{t} (γ_{0})\}

(4)

and

V_{T} = 4 T^{- 1} \sum_{t = 1}^{T} E \{\nabla_{γ} h_{t}^{'} (γ_{0}) W_{t} h_{t} (γ_{0}) h_{t}^{'} (γ_{0}) W_{t} \nabla_{γ^{'}} h_{t} (γ_{0})\} .

(5)

The proofs are given in the Appendix A.

3. Optimal SLS Estimator

From Theorem 2 the asymptotic covariance (acov) of

\sqrt{T} ({\hat{γ}}_{T} - γ_{0})

is given by

{\bar{A}}_{T}^{- 1} (γ_{0}) V_{T} {\bar{A}}_{T}^{- 1} (γ_{0})

which depends on the weights

W_{t}

,

t = 1, 2, \dots, T

. Therefore it is of interest to find the (asymptotically) optimal estimator, say

{\hat{γ}}_{T}^{o}

, which has the smallest asymptotic variance in the class of estimators defined by (3), i.e., for any estimator

{\hat{γ}}_{T}

satisfying (3),

acov \sqrt{T} ({\hat{γ}}_{T} - γ_{0}) - acov \sqrt{T} ({\hat{γ}}_{T}^{o} - γ_{0})

is nonnegative definite. The following theorem gives the optimal choice of

W_{t}

to achieve this goal.

Theorem 3.

Suppose

U_{t} = E \{h_{t} (γ_{0}) h_{t}^{'} (γ_{0}) v_{t}, \dots, v_{t - p}\}

is nonsingular a.s.-P, and Assumptions A2, A3, A6–A9 hold with

W_{t} = U_{t}^{- 1}

. Then, the asymptotically optimal SLS (OSLS) estimator

{\hat{γ}}_{T}^{o}

is obtained by using

W_{t} = U_{t}^{- 1}

,

t = 1, 2, \dots, T

. Further, the corresponding (inverse) optimal covariance matrix is given by

\begin{matrix} {a c o v}^{- 1} \sqrt{T} ({\hat{γ}}_{T}^{o} - γ_{0}) & = & T^{- 1} \sum_{t = 1}^{T} E \{\nabla_{γ} h_{t}^{'} (γ_{0}) U_{t}^{- 1} \nabla_{γ^{'}} h_{t} (γ_{0})\}, \end{matrix}

(6)

\begin{matrix} = & T^{- 1} \sum_{t = 1}^{T} E \{B_{t}^{'} Ω_{t}^{- 1} B_{t}\}, \end{matrix}

(7)

where

B_{t}^{'} = (\begin{matrix} \nabla_{θ} f_{t} (v_{t}, θ_{0}) & \nabla_{θ} σ_{t}^{2} (γ_{0}) \\ 0 & \nabla_{ϕ} σ_{t}^{2} (γ_{0}) \end{matrix}),

(8)

and

Ω_{t} = σ_{t}^{2} (γ_{0}) (\begin{matrix} 1 & σ_{t} (γ_{0}) E (ε_{t}^{3} v_{t}, \dots, v_{t - p}) \\ \cdot & σ_{t}^{2} (γ_{0}) [E (ε_{t}^{4} v_{t}, \dots, v_{t - p}) - 1] \end{matrix}) .

(9)

However, the OSLS estimator

{\hat{γ}}_{T}^{o}

is infeasible because

U_{t}

depends on

γ_{0}

and

E (ε_{t}^{j} v_{t}, \dots, v_{t - p})

,

j = 3, 4

. In practice a two-step procedure can be used as follows. First, a first-step consistent estimator of

γ_{0}

is calculated, such as the QMLE or simply the SLS

{\hat{γ}}_{T}

using the identity weight matrix. Second, the residuals

{\hat{ε}}_{t}

are calculated and suitable autoregressive models are fitted to

{\hat{ε}}_{t}^{3}

and

{\hat{ε}}_{t}^{4}

respectively. Finally, these fitted values are substituted in

U_{t}

and the second-step estimator is calculated using the estimated optimal weights

W_{t} = {\hat{U}}_{t}^{- 1}

. Under some general conditions this two-step estimator is consistent and, moreover, it has the same asymptotic variance as in (7) if

{\hat{U}}_{t}

is consistent for

U_{t}

. Henceforth, this two-step estimator will be called the feasible optimal SLS estimator (FSLS). Note that fitting the autoregressive models is useful if the errors

\{ε_{t}\}

are not i.i.d., otherwise the sample means of

{\hat{ε}}_{t}^{3}

and

{\hat{ε}}_{t}^{4}

can be used. Alternatively, if the conditioning set

\{v_{t}, \dots, v_{t - p}\}

is reasonably small, one can use nonparametric estimators of the conditional skewness and kurtosis to obtain

{\hat{U}}_{t}

. More details about the two-step estimators can be found in White (1996, Section 6.3).

In the rest of this section we investigate the efficiency gain of the OSLS estimator compared to the Gaussian QMLE, which is one of the most popular methods of estimation in GARCH models. The asymptotic properties of the QMLE are studied by Weiss (1986) and Bollerslev and Wooldridge (1992). Specifically for model (1) and (2), the Gaussian QMLE is defined as

{\hat{γ}}_{T}^{Q} = \underset{γ \in Γ}{argmin} T^{- 1} \sum_{t = 1}^{T} log σ_{t}^{2} (γ) + \frac{ϵ_{t}^{2} (θ)}{σ_{t}^{2} (γ)} a . s . - P .

(10)

Under similar conditions as Assumptions A2–A9 and similar to the proofs of Theorems 1 and 2, we can show that

{\hat{γ}}_{T}^{Q}

is

\sqrt{T}

-consistent with

acov \sqrt{T} ({\hat{γ}}_{T}^{Q} - γ_{0})

given by

T {(\sum_{t = 1}^{T} E \{B_{t}^{'} Σ_{t}^{- 1} B_{t}\})}^{- 1} (\sum_{t = 1}^{T} E \{B_{t}^{'} Σ_{t}^{- 1} Ω_{t} Σ_{t}^{- 1} B_{t}\}) {(\sum_{t = 1}^{T} E \{B_{t}^{'} Σ_{t}^{- 1} B_{t}\})}^{- 1},

(11)

where

B_{t}

and

Ω_{t}

are defined in (8) and (9) respectively and

Σ_{t} = (\begin{matrix} σ_{t}^{2} (γ_{0}) & 0 \\ 0 & 2 σ_{t}^{4} (γ_{0}) \end{matrix}) .

Further, similar to the proof of Theorem 3, we can show that

acov \sqrt{T} a^{'} ({\hat{γ}}_{T}^{o} - γ_{0}) \leq acov \sqrt{T} a^{'} ({\hat{γ}}_{T}^{Q} - γ_{0})

for any

a \in R^{q + p + 1}

, and the equality holds if and only if for

t = 1, 2, \dots, T

,

Ω_{t} Σ_{t}^{- 1} B_{t} a = B_{t} {(\sum_{t = 1}^{T} E \{B_{t}^{'} Ω_{t}^{- 1} B_{t}\})}^{- 1} (\sum_{t = 1}^{T} E \{B_{t}^{'} Σ_{t}^{- 1} B_{t}\}) a a . s . - P .

(12)

The above general condition can be simplified under specific settings. For example, if the process

\{(y_{t}, x_{t}^{'}, σ_{t}, ϵ_{t})\}

is stationary with

E (ε_{t}^{3} v_{t}, \dots, v_{t - p}) = 0

, and

E (ε_{t}^{4} v_{t}, \dots, v_{t - p})

=

μ_{4}

, then it can be shown that Equation (12) is equivalent to

a_{1}^{'} (I_{q} - C) \nabla_{θ} f_{t} (v_{t}, θ_{0}) = 0, a_{1}^{'} (\frac{μ_{4} - 1}{2} I_{q} - C) \nabla_{θ} σ_{t}^{2} (γ_{0}) = 0,

(13)

where

C = (C_{1} + \frac{1}{2} C_{2}) {(C_{1} + \frac{1}{μ_{4} - 1} C_{2})}^{- 1}

,

C_{1} = E \{σ_{t}^{- 2} (γ_{0}) \nabla_{θ} f_{t} (v_{t}, θ_{0}) \nabla_{θ^{'}} f_{t} (v_{t}, θ_{0})\}

,

C_{2} = E \{σ_{t}^{- 4} (γ_{0}) \nabla_{θ} σ_{t}^{2} (γ_{0}) \nabla_{θ^{'}} σ_{t}^{2} (γ_{0})\}

and

a_{1}

is the subvector of the first q elements of

a

.

Since it is difficult to quantify the difference between the asymptotic covariance matrices in (7) and (11) in general, in the following we calculate some examples under a simple AR(1) model with ARCH(1) error

y_{t} = θ_{0} y_{t - 1} + ϵ_{t}, σ_{t}^{2} = 1 - ϕ_{0} + ϕ_{0} ϵ_{t - 1}^{2},

(14)

where the innovations

\{ε_{t} = ϵ_{t} / σ_{t}\}

are i.i.d. with zero mean and unit variance. We consider both symmetric and skewed distributions for

ε_{t}

. In particular, we choose Gamma distribution with various shape parameters and Student’s t distribution with various degrees of freedom to reflect the different degrees of skewness and kurtosis, respectively.

The asymptotic variances of the OSLS and QMLE for various parameter values are given in Table 1, where the asymptotic variance of the true maximum likelihood estimator (MLE) is also given as a benchmark. The results show clearly that the efficiency gain of the OSLS over the QMLE is significant in the case of highly skewed distributions such as Gamma, while this is less so in the case of symmetric distributions such as Student’s t. Note that the QMLE and OSLS of

ϕ_{0}

have the same asymptotic variances under the Student’s t distribution, which is consistent with the theoretical result (13).

In order to see how much loss of efficiency in the QMLE is recovered by the OSLS estimator, next we calculate the relative reduction in the QMLE efficiency-loss (inefficiency)

R I E L (a^{'} γ_{0}) = 100 \times \frac{acov \sqrt{T} a^{'} ({\hat{γ}}_{T}^{Q} - γ_{0}) - acov \sqrt{T} a^{'} ({\hat{γ}}_{T}^{o} - γ_{0})}{acov \sqrt{T} a^{'} ({\hat{γ}}_{T}^{Q} - γ_{0}) - acov \sqrt{T} a^{'} ({\hat{γ}}_{T}^{M} - γ_{0})},

where

{\hat{γ}}_{T}^{M}

is the true MLE for

γ_{0}

. This measure also indicates which estimator approaches the asymptotic variance lower bound faster.

Figure 1 contains a sample of the numerical outputs. First, Figure 1a shows that 45–60% of the inefficiency of

{\hat{θ}}_{T}^{Q}

is recovered by

{\hat{θ}}_{T}^{o}

when the innovations

\{ε_{t}\}

have a heavy tail Student’s t distribution. Further,

R I E L (θ_{0})

declines sharply as the degrees of freedom increases, indicating that as the distribution of

\{ε_{t}\}

gets close to the Gaussian, the QMLE improves quickly and it gets close to the OSLS estimator, and both of them approach the variance lower bound. However, the situation in Figure 1b,c is opposite, where

R I E L (θ_{0})

and

R I E L (ϕ_{0})

are increasing with the shape parameter of the Gamma distribution. This indicates that the OSLS improves significantly faster than the QMLE as the skewed distribution gets closer to the Gaussian. In other words, the efficiency loss of the QMLE is persistent and therefore it is not desirable under asymmetric conditional error distribution.

4. Simulation Studies

In this section we carry out Monte Carlo simulations to study the finite sample behaviour of the feasible optimal SLS estimator (FSLS)

{\hat{γ}}_{T}^{o}

and compare it with some other related, commonly used estimators.

4.1. Comparison with Quasi-MLE

We first compare the FSLS with the quasi-maximum likelihood estimator (QMLE). Specifically, we generate the data from the AR(1)-ARCH(1) model in (14) with innovations

\{ε_{t}\}

drawn from the standardized distributions of different levels of skewness and kurtosis. We consider various sample sizes including

T =

10,000 to approximate the asymptotic results. In each simulation, we vary the values of the parameters

(θ_{0}, ϕ_{0})

to represent different levels of persistence in the mean and variance components. For each estimator, we calculate the mean estimates

({\hat{θ}}_{0}, {\hat{ϕ}}_{0})

and the root mean squared errors

(R M S E ({\hat{θ}}_{0}), R M S E ({\hat{ϕ}}_{0}))

based on 3000 independent replications over 4 different pairs of true values

(θ_{0}, ϕ_{0})

.

Table 2 contains the summary results for Gamma (2,1) innovations, which show clearly that the FSLS outperforms the QMLE for all sample sizes and panels, while both estimators have the same degree of bias. We have also done the simulations with Student’s

t (5)

innovations and the results show that the FSLS has moderate gain of efficiency over the QMLE, which performs fairly well in small samples.

To understand how the values of

(θ_{0}, ϕ_{0})

, shape parameter, and sample size T affect the relative RMSE of the FSLS compared to QMLE, we use the numerical results of the simulations to fit two regression equations with

R R M S E ({\hat{θ}}_{T}^{o}) = R M S E ({\hat{θ}}_{T}^{o}) / R M S E ({\hat{θ}}_{T}^{Q})

and

R R M S E ({\hat{ϕ}}_{T}^{o})

as the response variables, respectively. The results in Table 3 show that the shape parameter has a positive effect on both RRMSE

({\hat{θ}}_{T}^{o})

and RRMSE

({\hat{ϕ}}_{T}^{o})

, while T is negatively associated with both RRMSEs, indicating that the outperformance of the FSLS in large samples is more evident than in small samples if the innovations distribution is skewed. Moreover, the negative sign of

θ_{0}

in the RRMSE

({\hat{θ}}_{T}^{o})

equation indicates that the performance of the QMLE improves quickly as the value of

θ_{0}

gets larger.

4.2. Comparison with Estimating Function Estimators

Our approach is related to the estimating function (EF) approach. Following Durairajan (1992), it can be shown that under models (1) and (2) the EF estimator

{\hat{γ}}_{T}^{E F}

is obtained by solving the estimating equation

\sum_{t = 1}^{T} B_{t}^{'} (γ) Ω_{t}^{- 1} (γ) (\begin{matrix} ϵ_{t} (γ) \\ ϵ_{t}^{2} (γ) - σ_{t}^{2} (γ) \end{matrix}) = 0,

(15)

where

B_{t}^{'} (γ)

and

Ω_{t} (γ)

are given in (8) and (9) with

θ_{0}

,

γ_{0}

and

E (ε_{t}^{j} v_{t}, \dots, v_{t - p})

,

j = 3, 4

replaced by

θ

,

γ

and

E_{γ} (ε_{t}^{j} (γ) v_{t}, \dots, v_{t - p})

,

j = 3, 4

respectively. The EF in (15) is optimal with respect to the so-called Godambe’s information criterion. Moreover, under some regularity conditions similar to Assumptions A2–A9, the EF estimator can be shown to be

\sqrt{T}

-consistent with

acov \sqrt{T} ({\hat{γ}}_{T}^{E F} - γ_{0})

given by Equation (7). However, although the FSLS and EF estimators have the same asymptotic variance, they are distinct in the following aspects. First, the FSLS is an extremum estimator while the EF estimator represents a solution of the optimal estimating Equation (15). Second, if

E_{γ} (ε_{t}^{j} (γ) v_{t}, \dots, v_{t - p}), j = 3, 4

are known functions of

γ

, then the EF estimator can be calculated in one step, while the FSLS remains to be a two-step estimator due the dependence of

W_{t}

on

γ_{0}

. Third, the two estimators may behave differently in finite-sample situations because they have different estimating equations. This can be seen by comparing Equation (15) with the first order condition for the FSLS which can be written as

\sum_{t = 1}^{T} B_{t}^{'} (γ) H_{t} (θ) Ω_{t}^{- 1} H_{t}^{'} (θ) (\begin{matrix} ϵ_{t} (γ) \\ ϵ_{t}^{2} (γ) - σ_{t}^{2} (γ) \end{matrix}) = 0,

where

H_{t} (θ) = (\begin{matrix} 1 & 2 f_{t} (v_{t}, θ) - 2 f_{t} (v_{t}, θ_{0}) \\ 0 & 1 \end{matrix}) .

Next we calculate some numerical examples to compare the FSLS with four different versions of the EF estimators that are commonly used in practice. First, since

ε_{t}

are i.i.d., Equation (15) can be written as

\sum_{t = 1}^{T} B_{t}^{'} (γ_{1}) Ω_{t}^{- 1} (γ_{2}, μ_{3}, μ_{4}) (\begin{matrix} ϵ_{t} (γ) \\ ϵ_{t}^{2} (γ) - σ_{t}^{2} (γ) \end{matrix}) = 0 .

(16)

Then we calculate four variants of the EF estimator as follows: the estimator EF0 is obtained by taking

γ_{1} = γ_{2} = {\hat{γ}}_{T}^{Q}

,

μ_{3} = 1 / T \sum_{t = 1}^{T} ε_{t}^{3} ({\hat{γ}}_{T}^{Q})

and

μ_{4} = 1 / T \sum_{t = 1}^{T} ε_{t}^{4} ({\hat{γ}}_{T}^{Q})

; EF1 is the same as EF0 except that

γ_{1} = γ

; EF is the same as EF0 except that

γ_{1} = γ_{2} = γ

; and EF2 is obtained by letting

γ_{1} = γ_{2} = γ

,

μ_{3} = 1 / T \sum_{t = 1}^{T} ε_{t}^{3} (γ)

and

μ_{4} = 1 / T \sum_{t = 1}^{T} ε_{t}^{4} (γ)

. The four variants have the same asymptotic covariance matrix.

Figure 2 shows the relative RMSE of the FSLS over the RMSE of the QMLE and EF estimators, respectively. Figure 2a is based on 4790 simulations with 3000 independent replications where

\{ε_{t}\}

are generated from the standardized Gamma distribution with shape parameters 2, 3, 4, 5, 6, and 7 respectively. Similarly, Figure 2b is based on 4050 simulations with

\{ε_{t}\}

generated from the standardized Student’s t distribution with df 5, 6, 7, 8, and 9, respectively. In all cases, sample size T varies over a range (30, 40, 50, 60, 70, 80, 90, 100, 500, and 1000) and the RMSE of the estimators are calculated on the parameters grid

\{(0.1, 0.1), (0.1, 0.2), \dots, (0.1, 0.9), (0.2, 0.1), \dots, (0.9, 0.9)\}

.

The results show clearly that the FSLS outperform the EF estimators for

θ_{0}

in almost all cases and for

ϕ_{0}

in the majority of the cases. In particular, results of EF2 show that replacing the nuisance parameters with highly nonlinear functions of the estimated parameter makes the performance worse. Therefore in practice, a two-step EF estimator such as the EF or EF1 should be recommended. Finally, the QMLE performs reasonably well in the case of symmetric error distributions, while not so well in the case of skewed error distributions.

5. Application

In this section we apply our method to an empirical example of Engle (1982) (see also Enders 2010), who used an AR model with ARCH error to study the wage/price spiral in the U.K. over the period 1958Q2–1977Q2. Specifically, let

p_{t}

denote the log of the consumer price index and

w_{t}

denote the log of the index of the nominal wage rates. Then

y_{t} = p_{t} - p_{t - 1}

and

r_{t} = w_{t} - p_{t}

are the rate of inflation and real wage, respectively. Engle (1982) first fitted the following equation using the least squares (LS) method

\begin{matrix} y_{t} = & 0.0257 & + & 0.334 & y_{t - 1} & + & 0.408 & y_{t - 4} & - & 0.404 & y_{t - 5} & + & 0.0559 & r_{t - 1} + ϵ_{t}, \\ (0.006) & (0.103) & (0.110) & (0.114) & (0.014) \\ {\hat{σ}}_{t}^{2} = & 8.9 \times 10^{- 5}, \end{matrix}

(17)

where the standard errors are in parentheses. Since the Lagrange multiplier (LM) test for ARCH(1) error was not significant, but for ARCH(4) process was instead, the following conditional variance equation was specified

σ_{t}^{2} = ϕ_{0} + ϕ_{1} (0.4 ϵ_{t - 1}^{2} + 0.3 ϵ_{t - 2}^{2} + 0.2 ϵ_{t - 3}^{2} + 0.1 ϵ_{t - 4}^{2}),

(18)

where the two-parameter variance function with declining weights was chosen to satisfy the nonnegativity and stationarity constraints. Further, Engle (1982) fitted Equations (17) and (18) jointly using the Gaussian quasi-likelihood method, where all coefficients (except the first lag in the inflation rate) were significant at level

0.05

.

Since we could not obtain the wage rates before 1963, we use the data from 1963Q1 through 1982Q1 to compensate for the 19 missing quarters. The data were obtained from the OECD website http://dx.doi.org/10.1787/data-00052-en (accessed on 15 January 2014, see OECD, Main Economic Indicators–complete database). To see if there is major structural difference in our data compared to the data used by Engle (1982), we have done an initial investigation and found no evidence of structural change by replacing the 19 quarters. So our results are to some extent comparable with those in Engle (1982).

We start by fitting the following regression model for inflation using the LS method

y_{t} = θ_{0} + θ_{1} y_{t - 1} + θ_{2} y_{t - 2} + θ_{3} y_{t - 3} + θ_{4} y_{t - 4} + θ_{5} y_{t - 5} + θ_{6} r_{t - 1} + ϵ_{t}

(19)

The results are shown in the first part of Table 4 under Model-I (LS), where the White’s correction for the standard errors are reported in parentheses.

We calculate the Ljung-Box statistic Q for

{\hat{ε}}_{t}

(denoted by Q1) and

{\hat{ε}}_{t}^{2}

(denoted by Q2) at lags 5, 10, 15, and 20. They are all insignificant at level

0.1

except for Q2(5), which agrees with Engle (1982) in including four lags in the variance equation. Further, we use the squared residuals from this regression to fit an ARCH(4) model for the conditional variance

σ_{t}^{2} = ϕ_{0} + ϕ_{1} ϵ_{t - 1}^{2} + ϕ_{2} ϵ_{t - 2}^{2} + ϕ_{3} ϵ_{t - 3}^{2} + ϕ_{4} ϵ_{t - 4}^{2}

(20)

The results are shown in the second part of Table 4 under Model-I (LS). Again, the ARCH(4) model is confirmed by the LM test at significance level 0.05. We report only the LS estimates of the variance function without standard errors because those estimates are only used as starting values to compute the QMLE. The Q1 and Q2 statistics from Model-I (LS) (in the third part of Table 4) indicate that the mean and variance equations are fairly well specified since none of these diagnostics are significant at level 0.1. Therefore we fit the Model-I again using the QMLE, which is more efficient than the LS procedure. Although the diagnostics of the standardized innovations from Model-I (QMLE) do not show serial correlation of the first or second order, all coefficients in the variance function are insignificant except for the constant term. This contradicts the ARCH(4) that we found before to be correctly specified. However, this can be explained by the lack of efficiency in the QMLE due to the moderate level of skewness in the corresponding residuals.

On the other hand, our FSLS estimation yields a significant fourth lag in the variance function in addition to the correct specification as indicated by Q1 and Q2. Accordingly, we use the model fitted by FSLS in stepwise regression algorithm to obtain a reduced model (Model-II in Table 4)

y_{t} = θ_{0} + θ_{1} y_{t - 1} + θ_{4} y_{t - 4} + θ_{5} y_{t - 5} + θ_{6} r_{t - 1} + ϵ_{t}

(21)

σ_{t}^{2} = ϕ_{0} + ϕ_{4} ϵ_{t - 4}^{2}

(22)

Note that while the mean equation is identical to that in Engle (1982), only the fourth lag is significant in the variance equation. The above ARCH structure can only be detected by using the full model that is more flexible than the two-parameter variance function in Equation (18). Moreover, the more efficient FSLS estimation yields the ARCH(4) structure, while the QMLE would conclude with a misspecified homoscedastic model.

6. Conclusions and Discussion

Although the ARCH-type models have been extensively studied for decades, most theories and methods are developed for stationary data processes whose conditional mean functions are either ARMA or simple linear functions of covariates. Moreover, recent research has been focusing on generalization of the error component while leaving the mean function to be simple linear form. However, many economic and financial time series are nonlinear and/or nonstationary in the mean, therefore, data transformation is required in order to apply standard methodologies in the analysis.

In this paper, we proposed the second-order least squares (SLS) approach to estimate a flexible and general model with nonlinear and time-varying conditional mean and ARCH conditional variance function. This approach is applicable to both stationary and mean-nonstationary processes and therefore can be used to analyze the transformed as well as the original data. Another advantage of this approach is that it does not require specification of the underlying distribution for the errors. Moreover, the feasible optimal SLS estimator (FSLS) is more efficient than the commonly used QMLE, and the efficiency gain is significant when the conditional error distribution is asymmetric. We have demonstrated through a real data example that the efficiency gain of the proposed approach leads to a more accurate model than the QMLE. The third and fourth conditional moments of the innovation provide useful information that is utilized by the FSLS (through the weight matrix) to gain efficiency over the QMLE. This information is even more important in the case of skewed and/or leptokurtic error distributions. Our simulation studies also show that the SLS approach has better finite sample properties than the estimating function approach based on the same set of conditional moments.

There are some issues remaining to be studied in the future. Some assumptions for the asymptotic theories are sufficient but not necessary. They are adopted here mainly due to the way of the proofs we used. As is common in statistics and econometrics, there are usually different ways to prove an asymptotic theory, and each of them requires a specific set of assumptions. Therefore it is interesting to explore the possibilities of establishing the asymptotic properties for the FSLS estimator under the similar conditions for the QMLE. Indeed, our Monte Carlo simulation studies have shown that the FSLS estimator performs well in the finite sample situations even if the innovation does not have the finite moments of order higher than four. From the application point of view, it is also important to extend the method of this paper to models with more general GARCH errors. This is possible by modifying some assumptions to adapt for the mixing process.

Author Contributions

Conceptualization, M.S. and L.W.; methodology, M.S. and L.W.; software, M.S.; validation, M.S. and L.W.; formal analysis, M.S.; investigation, M.S.; resources, L.W.; data curation, M.S.; writing—original draft preparation, M.S.; writing—review and editing, L.W.; visualization, M.S.; supervision, L.W.; project administration, L.W.; funding acquisition, L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) grant number 546719.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset in the Application section was accessible at OECD website http://dx.doi.org/10.1787/data-00052-en (accessed on 15 January 2014).

Acknowledgments

We are grateful to the Editor and the three anonymous referees for their comments and suggestions, which were helpful in improving the previous version of this paper. The research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). The first author also gratefully acknowledges financial support by the University of Manitoba Graduate Fellowship and Manitoba Graduate Scholarship.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In this Appendix we first list the technical assumptions that are sufficient for the asymptotic properties of the SLS estimator

{\hat{γ}}_{T}

, followed by the mathematical proofs for the theorems. To simplify the notation, we denote

f_{t} (\cdot, θ) = f_{t} (v_{t}, θ)

.

Appendix A.1. Regularity Conditions

We make the following assumptions for the consistency of the SLS estimator

{\hat{γ}}_{T}

.

Assumption A1.

The process

\{(y_{t}, x_{t}^{'})\}

is strong mixing of size

- a

, for some

a > 1

. That is, there exists

δ > 0

such that

α (m) = O (m^{- a - δ})

, where

α (m) = sup_{n} sup_{\{F \in F_{- \infty}^{n}, G \in F_{n + m}^{\infty}\}} | P (F \cap G) - P (F) P (G) |,

F_{- \infty}^{n} = F \{(y_{t}, x_{t}^{'}), t \leq n\}

and

F_{n + m}^{\infty} = F \{(y_{t}, x_{t}^{'}), t \geq n + m\}

.

This is a high-level assumption which allows for considerable dependence and heterogeneity in the underlying process. As noted by White and Domowitz (1984), it preserves the asymptotic independence of the observed process even under further transformations. This assumption can be justified on a case by case basis. For example, if

\sum_{i = 1}^{p} ϕ_{0 i} < 1

, then

\{ϵ_{t}\}

is strong mixing with geometric rate if the innovation noise sequence

\{ε_{t}\}

is i.i.d., has finite second moment, and Lebesgue density being strictly positive in a neighbourhood of zero (Lindner 2009). The geometric memory decay implies that a can be set to an arbitrarily large number. It can also be shown that finite order Gaussian ARMA processes are strong mixing (Ibragimov and Linnik 1971, pp. 312–13).

Assumption A2.

Let

∥.∥

be the Euclidean norm. Then for some

r > \frac{a}{a - 1}

,

sup_{t \in N} E {\{∥W_{t}∥ (1 + \sum_{i = 0}^{p} ϵ_{t - i}^{4} + sup_{Θ} f_{t - i}^{4} (\cdot, θ))\}}^{r} < \infty .

Assumption A3.

For any open neighbourhood

N ⊊ Γ

of

γ_{0}

, there exists

T_{0} (N)

such that

inf_{T \geq T_{0}} T^{- 1} \sum_{t = 1}^{T} min_{γ \in N^{c} \cap Γ} E \{{(h_{t} (γ) - h_{t} (γ_{0}))}^{'} W_{t} (h_{t} (γ) - h_{t} (γ_{0}))\} > 0 .

Note that Assumption A2 ensures the uniform convergence of

Q_{T} (γ)

and Assumption A3 is sufficient for parameter identification. If the process

(y_{t}, x_{t}^{'})

is stationary,

f_{t} = f : R^{υ} \times Θ \to R^{1}

and

W_{t} = W (v_{t}, \dots, v_{t - p})

is positive definite a.s.-P, then Assumption A3 is equivalent to that

f (\cdot, θ) = f (\cdot, θ_{0})

a . s . - P

only if

θ = θ_{0}

.

Further, we make the following additional assumptions for the asymptotic normality of

{\hat{γ}}_{T}

.

Assumption A4.

The true value

γ_{0}

is an interior point of Γ.

Assumption A5.

The random functions

f_{t} (\cdot, θ)

are twice continuously differentiable on Γ uniformly in t a.s.-P.

Assumption A6.

For some

r > \frac{a}{a - 1}

, it holds

\begin{matrix} sup_{t \in N} E \{∥W_{t}∥ sup_{Θ} ({∥\nabla_{θ}^{2} f_{t} (\cdot, θ)∥}^{2} + \sum_{i = 0}^{p} {∥\nabla_{θ} f_{t - i} (\cdot, θ)∥}^{4} + \sum_{i = 1}^{p} ϵ_{t - i}^{2} {∥\nabla_{θ}^{2} f_{t - i} (\cdot, θ)∥}^{2} \\ {+ \sum_{i = 0}^{p} f_{t - i}^{2} (\cdot, θ) {∥\nabla_{θ}^{2} f_{t - i} (\cdot, θ)∥}^{2})\}}^{r} < \infty . \end{matrix}

Assumption A7.

The sequence

{\bar{A}}_{T} (γ_{0}) = 2 T^{- 1} \sum_{t = 1}^{T} E \{\nabla_{γ} h_{t}^{'} (γ_{0}) W_{t} \nabla_{γ^{'}} h_{t} (γ_{0})\}

is bounded and

{lim inf}_{T \to \infty} | {\bar{A}}_{T} (γ_{0}) | > 0

.

Assumption A8.

For some

r > \frac{a}{a - 1}

,

\begin{matrix} sup_{t \in N} E \{{∥W_{t}∥}^{2} (1 + f_{t}^{2} (\cdot, θ_{0}) {∥\nabla_{θ} f_{t} (\cdot, θ_{0})∥}^{2} + \sum_{i = 1}^{p} ϵ_{t - i}^{2} {∥\nabla_{θ} f_{t - i} (\cdot, θ_{0})∥}^{2} \\ {+ {∥\nabla_{θ} f_{t} (\cdot, θ_{0})∥}^{2} + \sum_{i = 1}^{p} ϵ_{t - i}^{4}) (1 + \sum_{i = 0}^{p} ϵ_{t - i}^{4} + ϵ_{t}^{2} f_{t}^{2} (\cdot, θ_{0}))\}}^{r} < \infty . \end{matrix}

Assumption A9.

The sequence

V_{T} = 4 T^{- 1} \sum_{t = 1}^{T} E \{\nabla_{γ} h_{t}^{'} (γ_{0}) W_{t} h_{t} (γ_{0}) h_{t}^{'} (γ_{0}) W_{t} \nabla_{γ^{'}} h_{t} (γ_{0})\}

is bounded and

{lim inf}_{T \to \infty} | V_{T} | > 0

.

Assumptions A2, A6, and A8 are for general cases and can be simplified for specific choice of

W_{t}

. For example, for the optimal weight

W_{t} = U_{t}^{- 1}

in Section 3, these assumptions can be simplified to the following assumptions, respectively.

Assumption A10

(Assumption A2). For

k = 0, 1, \dots, p

, and some

r > \frac{a}{a - 1}

,

sup_{t \in N} E {\{ε_{t}^{4} + σ_{t}^{- 4} sup_{Θ} f_{t - k}^{4} (\cdot, θ)\}}^{r} < \infty .

Assumption A11

(Assumption A6). For

s = 1, 2

and

k = 0, 1, \dots, p

, and some

r > \frac{a}{a - 1}

,

sup_{t \in N} E {\{σ_{t}^{- 4} sup_{Θ} {∥\nabla_{θ}^{s} f_{t - k} (\cdot, θ)∥}^{4}\}}^{r} < \infty .

Assumption A12

(Assumption A8). For

k = 0, 1, \dots, p

, and some

r > \frac{a}{a - 1}

,

sup_{t \in N} E {\{ε_{t}^{8} + σ_{t}^{- 8} f_{t}^{8} (\cdot, θ_{0}) + σ_{t}^{- 8} {∥\nabla_{θ} f_{t - k} (\cdot, θ_{0})∥}^{8}\}}^{r} < \infty .

Appendix A.2. Proof of Theorem 1

First, by using Hölder’s inequality and

C_{r}

inequality, we can easily verify that the sequence

\{h_{t}^{'} (γ) W_{t} h_{t} (γ)\}

is dominated by uniformly

L_{r}

-bounded variables (i.e.,

{sup}_{t} E {| h_{t}^{'} (γ) W_{t} h_{t} (γ) |}^{r} < \infty

). Therefore,

{\bar{Q}}_{T} (γ) = T^{- 1} \sum_{t = 1}^{T} E \{h_{t}^{'} (γ) W_{t} h_{t} (γ)\}

is well defined and is continuous on

Γ

uniformly in T. Then by the uniform law of large numbers (ULLN) (White and Domowitz 1984, Theorem 2.3), we have

{sup}_{γ \in Γ} | Q_{T} (γ) - {\bar{Q}}_{T} (γ) | \overset{a . s .}{⟶} 0 as T \to \infty

. Further, since

\{h_{t} (γ_{0}), F_{t}\}

is a martingale difference sequence and

W_{t}

is measurable–

F_{t - 1}

, we have

E \{h_{t}^{'} (γ) W_{t} h_{t} (γ)\} = E \{{(h_{t} (γ) - h_{t} (γ_{0}))}^{'} W_{t} (h_{t} (γ) - h_{t} (γ_{0}))\} + E \{h_{t}^{'} (γ_{0}) W_{t} h_{t} (γ_{0})\} .

Since

W_{t}

is non-negative definite a.s.-P, Assumption A3 ensures the uniqueness of the minimum of

{\bar{Q}}_{T} (γ)

for sufficiently large T. Thus the result follows from Theorem 3.4 of White (1996). □

Appendix A.3. Proof of Theorem 2

The proof consists of the following four steps.

(i) First we apply the mean value theorem for random functions to the first order condition for a minimum of

Q_{T} (γ)

. Since

{\hat{γ}}_{T} \overset{a . s .}{⟶} γ_{0}

and

γ_{0}

is interior to

Γ

, there is a neighbourhood

N \subset Γ

of

γ_{0}

such that

{\hat{γ}}_{T} \in N

a . s .

for sufficiently large T. Further, since

f_{t} (θ)

is twice continuously differentiable on

Θ

uniformly in t, by Jennrich (1969, Lemma 3), for sufficiently large T,

\nabla_{γ}^{2} Q_{T} ({\tilde{γ}}_{T}) ({\hat{γ}}_{T} (ω) - γ_{0}) = - \nabla_{γ} Q_{T} (γ_{0}),

(A1)

where

\nabla_{γ}^{2} Q_{T} (γ)

is the Hessian matrix of

Q_{T} (γ)

and

∥{\tilde{γ}}_{T} - γ_{0}∥ \leq ∥{\hat{γ}}_{T} - γ_{0}∥

.

(ii) Let

{\bar{A}}_{T} (γ) = 2 T^{- 1} \sum_{t = 1}^{T} E \{\nabla_{γ} h_{t}^{'} (γ) W_{t} \nabla_{γ^{'}} h_{t} (γ)\}

and we show that

∥\nabla_{γ}^{2} Q_{T} ({\tilde{γ}}_{T}) - {\bar{A}}_{T} (γ_{0})∥ \overset{a . s .}{⟶} 0 as T ⟶ \infty .

(A2)

Using Hölder’s, triangle and

C_{r}

inequalities we can verify that Assumptions A2 and A6 imply that

∥A_{t} (γ)∥

are dominated by uniformly

L_{r}

-bounded functions, where

A_{t} (γ) = 2 \nabla_{γ} h_{t}^{'} (γ) W_{t} \nabla_{γ^{'}} h_{t} (γ) + 2 (h_{t}^{'} (γ) W_{t} ⨂ I_{q + p + 1}) \nabla_{γ^{'}} vec (\nabla_{γ} h_{t}^{'} (γ)) .

Hence by the ULLN we have

sup_{γ \in Γ} ∥\nabla_{γ}^{2} Q_{T} (γ) - T^{- 1} \sum_{t = 1}^{T} E A_{t} (γ)∥ \overset{a . s .}{⟶} 0 as T ⟶ \infty .

Moreover, by the triangle inequality we have

\begin{matrix} ∥\nabla_{γ}^{2} Q_{T} ({\tilde{γ}}_{T}) - T^{- 1} \sum_{t = 1}^{T} E A_{t} (γ_{0})∥ & \leq sup_{γ \in Γ} ∥\nabla_{γ}^{2} Q_{T} (γ) - T^{- 1} \sum_{t = 1}^{T} E A_{t} (γ)∥ \\ + sup_{K \in N} ∥K^{- 1} \sum_{t = 1}^{K} {E A}_{t} ({\tilde{γ}}_{T}) - K^{- 1} \sum_{t = 1}^{K} E A_{t} (γ_{0})∥ a . s . \end{matrix}

Since

E A_{t} (γ)

is continuous on

Γ

uniformly in t and

E A_{t} (γ_{0}) = 2 E \{\nabla_{γ} h_{t}^{'} (γ_{0}) W_{t} \nabla_{γ^{'}} h_{t} (γ_{0})\}

, Equation (A2) follows by letting

T \to \infty

in the last inequality. Further, since

{\bar{A}}_{T} (γ_{0})

and

V_{T}

are uniformly nonsingular by Assumptions A7 and A9 respectively, for sufficiently large T, we have

\begin{matrix} V_{T}^{- 1 / 2} {\bar{A}}_{T} (γ_{0}) \sqrt{T} ({\hat{γ}}_{T} - γ_{0}) = - V_{T}^{- 1 / 2} \sqrt{T} \nabla_{γ} Q_{T} (γ_{0}) \\ + V_{T}^{- 1 / 2} {\bar{A}}_{T} (γ_{0}) ({\bar{A}}_{T}^{- 1} (γ_{0}) - \nabla_{γ}^{2} Q_{T}^{- 1} ({\tilde{γ}}_{T})) V_{T}^{1 / 2} V_{T}^{- 1 / 2} \sqrt{T} \nabla_{γ} Q_{T} (γ_{0}) . \end{matrix}

(A3)

(iii) Now we use Cramér-Wold device (Rao 1973, p. 123) to show that

V_{T}^{- 1 / 2} \sqrt{T} \nabla_{γ} Q_{T} (γ_{0}) \overset{d}{⟶} N (0, I_{q + p + 1}) as T \to \infty .

Let

λ \in R^{q + p + 1}

with

∥λ∥ = 1

. Then it is sufficent to show that

T^{- 1 / 2} \sum_{t = 1}^{T} λ^{'} V_{T}^{- 1 / 2} S_{t} (γ_{0}) \overset{d}{⟶} N (0, 1) as T \to \infty

, where

S_{t} (γ_{0}) = 2 \nabla_{γ} h_{t}^{'} (γ_{0}) W_{t} h_{t} (γ_{0})

. By Assumption A9 we have

V_{T}^{- 1 / 2} = O (1)

and, therefore, by Assumption A8 and applying again the Hölder’s and

C_{r}

inequality the double array

\{m_{T t} = λ^{'} V_{T}^{- 1 / 2} S_{t} (γ_{0})\}

is uniformly

L_{r}

-bounded for all T sufficiently large. Further, since

\{h_{t} (γ_{0}), F_{t}\}

is a martingale difference, we have

E (m_{T t}) = 0

and

var (T^{- 1 / 2} \sum_{t = 1}^{T} m_{T t}) = 1

for all T sufficiently large. It follows from Theorem 14.1 of Davidson (1994) and Assumption A1 that

\{m_{T t}\}

to be strong mixing of size

- a

. Hence by Theorem 5.20 of White (2001) we have

T^{- 1 / 2} \sum_{t = 1}^{T} m_{T t} \overset{d}{⟶} N (0, 1)

as

T \to \infty

.

(iv) By (A2), Assumption A7 and Theorem 2.16 of White (2001), we have

\nabla_{γ}^{2} Q_{T}^{- 1} ({\tilde{γ}}_{T}) - {\bar{A}}_{T}^{- 1} (γ_{0}) = o_{p} (1)

. Since

V_{T}^{- 1 / 2} \sqrt{T} \nabla_{γ} Q_{T} (γ_{0}) = O_{p} (1)

from (iii), it follows that

V_{T}^{- 1 / 2} {\bar{A}}_{T} (γ_{0}) ({\bar{A}}_{T}^{- 1} (γ_{0}) - \nabla_{γ}^{2} Q_{T}^{- 1} ({\tilde{γ}}_{T})) \sqrt{T} \nabla_{γ} Q_{T} (γ_{0}) = o_{p} (1) .

By the method of subsequences (Davidson 1994, Theorem 18.6), there exists a subsequence

\{T^{'}\}

such that

V_{T^{'}}^{- 1 / 2} {\bar{A}}_{T^{'}} (γ_{0}) ({\bar{A}}_{T^{'}}^{- 1} (γ_{0}) - \nabla_{γ}^{2} Q_{T^{'}}^{- 1} ({\tilde{γ}}_{T})) \sqrt{T^{'}} \nabla_{γ} Q_{T^{'}} (γ_{0}) \overset{a . s .}{⟶} 0 as T^{'} \to \infty,

which implies

V_{T^{'}}^{- 1 / 2} \sqrt{T^{'}} ({\bar{A}}_{T^{'}} (γ_{0}) ({\hat{γ}}_{T^{'}} - γ_{0}) + \nabla_{γ} Q_{T^{'}} (γ_{0})) \overset{a . s .}{⟶} 0 as T^{'} \to \infty .

Finally since

\{T^{'}\}

is arbitrary, we have

V_{T}^{- 1 / 2} {\bar{A}}_{T} (γ_{0}) \sqrt{T} ({\hat{γ}}_{T} - γ_{0}) + V_{T}^{- 1 / 2} \sqrt{T} \nabla_{γ} Q_{T} (γ_{0}) \overset{P}{\to} 0 as T ⟶ \infty,

and the proof is completed by applying the result (2c.4.12) of Rao (1973). □

Appendix A.4. Proof of Theorem 3

Let

R^{'} = \frac{1}{\sqrt{T}} (R_{1}^{'}, R_{2}^{'}, \dots, R_{T}^{'})

,

M^{'} = \frac{1}{\sqrt{T}} (M_{1}^{'}, M_{2}^{'}, \dots, M_{T}^{'})

,

R_{t}^{'} = \nabla_{γ} h_{t}^{'} (γ_{0}) W_{t} U_{t}^{1 / 2}

and

M_{t}^{'} = \nabla_{γ} h_{t}^{'} (γ_{0}) U_{t}^{- 1 / 2}

. Then the proof follows by noting that

E \{{(R - M E^{- 1} \{M^{'} M\} E \{M^{'} R\})}^{'} (R - M E^{- 1} \{M^{'} M\} E \{M^{'} R\})\}

(A4)

is nonnegative definite matrix. Moreover the equality in (A4) holds if

W_{t} = U_{t}^{- 1}

,

t = 1, 2, \dots, T

, which justifies Equation (6). The equivalence between Equations (6) and (7) follows from substituting

Ω_{t}^{- 1}

and

B_{t}

in Equation (7). □

References

Abarin, Taraneh, and Liqun Wang. 2009. Second-order least squares estimation of censored regression models. Journal of Statistical Planning and Inference 139: 125–35. [Google Scholar] [CrossRef]
Bollerslev, Tim, and Jeffrey M. Wooldridge. 1992. Quasi-maximum likelihood estimation and inference in dynamic models with time-varying covariances. Econometric Reviews 11: 143–72. [Google Scholar] [CrossRef]
Bose, Mausumi, and Rahul Mukerjee. 2015. Optimal design measures under asymmetric errors, with application to binary design points. Journal of Statistical Planning and Inference 159: 28–36. [Google Scholar] [CrossRef] [Green Version]
Davidson, James. 1994. Stochastic Limit Theory: An Introduction for Econometricians. Oxford: Oxford University Press. [Google Scholar]
Durairajan, T. M. 1992. Optimal estimating function for non-orthogonal model. Journal of Statistical Planning and Inference 33: 381–84. [Google Scholar] [CrossRef]
Enders, Walter. 2010. Applied Econometric Time Series. New York: Wiley. [Google Scholar]
Engle, Robert F. 1982. Autoregressive conditional heteroscedasticity with estimates of the variance of united kingdom inflation. Econometrica 50: 987–1007. [Google Scholar] [CrossRef]
Engle, Robert F., and Gloria Gonzalez-Rivera. 1991. Semiparametric ARCH models. Journal of Business & Economic Statistics 9: 345–59. [Google Scholar]
Franses, Philip Hans, and Dick Van Dijk. 2000. Non-Linear Time Series Models in Empirical Finance. Cambridge: Cambridge University Press. [Google Scholar]
Gao, Lucy L., and Julie Zhou. 2017. D-optimal designs based on the second-order least squares estimator. Statistical Papers 58: 77–94. [Google Scholar] [CrossRef]
He, Lei, and Rong-Xian Yue. 2019. R-optimality criterion for regression models with asymmetric errors. Journal of Statistical Planning and Inference 199: 318–26. [Google Scholar] [CrossRef]
Ibragimov, I. A., and Yu V. Linnik. 1971. Independent and Stationary Sequences of Random Variables. Gröningen: Wolters-Noordhoff. [Google Scholar]
Jennrich, Robert I. 1969. Asymptotic properties of non-linear least squares estimators. The Annals of Mathematical Statistics 40: 633–43. [Google Scholar] [CrossRef]
Kim, Mijeong, and Yanyuan Ma. 2012. The efficiency of the second-order nonlinear least squares estimator and its extension. Annals of the Institute of Statistical Mathematics 64: 751–64. [Google Scholar] [CrossRef] [Green Version]
Koul, Hira L., and Shiqing Ling. 2006. Fitting an error distribution in some heteroscedastic time series models. The Annals of Statistics 34: 994–1012. [Google Scholar] [CrossRef] [Green Version]
Li, David X., and Harry J. Turtle. 2000. Semiparametric ARCH models: An estimating function approach. Journal of Business & Economic Statistics 18: 174–86. [Google Scholar]
Li, Wai Keung, Shiqing Ling, and Michael McAleer. 2002. Recent theoretical results for time series models with GARCH errors. Journal of Economic Surveys 16: 245–69. [Google Scholar] [CrossRef]
Lindner, Alexander M. 2009. Stationarity, Mixing, Distributional Properties and Moments of GARCH (p, q)–Processes. Handbook of Financial Time Series; Berlin: Springer, pp. 43–69. [Google Scholar]
Ling, Shiqing. 2003. Adaptive estimators and tests of stationary and nonstationary short-and long-memory ARFIMA–GARCH models. Journal of the American Statistical Association 98: 955–67. [Google Scholar] [CrossRef]
Ling, Shiqing, and Michael McAleer. 2003. On adaptive estimation in nonstationary ARMA models with GARCH errors. The Annals of Statistics 31: 642–74. [Google Scholar] [CrossRef]
Meitz, Mika, and Pentti Saikkonen. 2008. Stability of nonlinear AR-GARCH models. Journal of Time Series Analysis 29: 453–75. [Google Scholar] [CrossRef] [Green Version]
Rao, Calyampudi Radhakrishna. 1973. Linear Statistical Inference and Its Applications. New York: Wiley. [Google Scholar]
Rosadi, D, and Peter Filzmoser. 2019. Robust second-order least-squares estimation for regression models with autoregressive errors. Statistical Papers 60: 105–22. [Google Scholar] [CrossRef]
Rosadi, Dedi, and Shelton Peiris. 2014. Second-order least-squares estimation for regression models with autocorrelated errors. Computational Statistics 29: 931–43. [Google Scholar] [CrossRef]
Salamh, Mustafa, and Liqun Wang. 2021. Second-order least squares method for dynamic panel data models with application. Journal of Risk and Financial Management 14: 410. [Google Scholar] [CrossRef]
Wang, Liqun. 2003. Estimation of nonlinear Berkson-type measurement error models. Statistica Sinica 13: 1201–10. [Google Scholar]
Wang, Liqun. 2004. Estimation of nonlinear models with Berkson measurement errors. The Annals of Statistics 32: 2559–79. [Google Scholar] [CrossRef] [Green Version]
Wang, Liqun. 2007. A unified approach to estimation of nonlinear mixed effects and Berkson measurement error models. Canadian Journal of Statistics 35: 233–48. [Google Scholar] [CrossRef]
Wang, Liqun, and Alexandre Leblanc. 2008. Second-order nonlinear least squares estimation. Annals of the Institute of Statistical Mathematics 60: 883–900. [Google Scholar] [CrossRef]
Weiss, Andrew A. 1986. Asymptotic theory for ARCH models: Estimation and testing. Econometric Theory 2: 107–31. [Google Scholar] [CrossRef] [Green Version]
White, Halbert. 1996. Estimation, Inference and Specification Analysis. Cambridge, UK: Cambridge University Press. [Google Scholar]
White, Halbert. 2001. Asymptotic Theory for Econometricians. New York: Academic Press. [Google Scholar]
White, Halbert, and Ian Domowitz. 1984. Nonlinear regression with dependent observations. Econometrica 52: 143–61. [Google Scholar] [CrossRef] [Green Version]
Yin, Yue, and Julie Zhou. 2017. Optimal designs for regression models using the second-order least squares estimator. Statistica Sinica 27: 1841–56. [Google Scholar] [CrossRef] [Green Version]
Zhu, Ke, and Wai Keung Li. 2015. A new Pearson-type QMLE for conditionally heteroskedastic models. Journal of Business and Economic Statistics 33: 552–65. [Google Scholar] [CrossRef] [Green Version]

Figure 1. Reduction (%) in the QMLE efficiency-loss. (a)

ε_{t} \sim

Student’s t,

ϕ_{0} = 0.5

; (b)

ε_{t} \sim

Gamma,

ϕ_{0} = 0.5

; (c)

ε_{t} \sim

Gamma,

θ_{0} = 0.5

.

Figure 1. Reduction (%) in the QMLE efficiency-loss. (a)

ε_{t} \sim

Student’s t,

ϕ_{0} = 0.5

; (b)

ε_{t} \sim

Gamma,

ϕ_{0} = 0.5

; (c)

ε_{t} \sim

Gamma,

θ_{0} = 0.5

.

Figure 2. Ratio of the RMSE of FSLS to the RMSE of QML and EF estimators respectively. (a)

ε_{t} \sim

Gamma with shape parameter (2, 3, 4, 5, 6, 7). (b)

ε_{t} \sim

Student’s t with df (5, 6, 7, 8, 9).

Figure 2. Ratio of the RMSE of FSLS to the RMSE of QML and EF estimators respectively. (a)

ε_{t} \sim

Gamma with shape parameter (2, 3, 4, 5, 6, 7). (b)

ε_{t} \sim

Student’s t with df (5, 6, 7, 8, 9).

Table 1. Asymptotic variances of OSLS, QMLE, and MLE under AR(1)-ARCH(1) model.

		$v ({\hat{θ}}_{0})$	$v ({\hat{ϕ}}_{0})$	$v ({\hat{θ}}_{0})$	$v ({\hat{ϕ}}_{0})$	$v ({\hat{θ}}_{0})$	$v ({\hat{ϕ}}_{0})$	$v ({\hat{θ}}_{0})$	$v ({\hat{ϕ}}_{0})$
		$θ_{0} = 0.2$		$θ_{0} = 0.2$		$θ_{0} = 0.8$		$θ_{0} = 0.8$
		$ϕ_{0} = 0.2$		$ϕ_{0} = 0.6$		$ϕ_{0} = 0.2$		$ϕ_{0} = 0.6$
Gamma (2)	OSLS	0.83	2.89	0.81	1.23	0.24	2.77	0.19	1.23
	QML	1.35	4.48	1.63	2.02	0.42	4.48	0.38	2.03
	ML	0.15	0.25	0.08	0.14	0.04	0.06	0.04	0.13
Gamma (8)	OSLS	0.97	2.08	0.88	0.94	0.31	2.06	0.22	0.94
	QML	1.18	2.52	1.09	1.15	0.38	2.51	0.28	1.14
	ML	0.87	1.44	0.69	0.64	0.27	1.35	0.19	0.62
Gamma (12)	OSLS	1.02	2.00	0.90	0.90	0.32	1.99	0.23	0.90
	QML	1.17	2.30	1.05	1.04	0.37	2.29	0.27	1.04
	ML	0.97	1.58	0.79	0.69	0.30	1.50	0.21	0.72
Gamma (20)	OSLS	1.06	1.93	0.91	0.88	0.34	1.93	0.24	0.88
	QML	1.15	2.11	1.00	0.96	0.37	2.11	0.26	0.96
	ML	1.03	1.67	0.86	0.73	0.33	1.62	0.23	0.75
t (5)	OSLS	1.34	6.32	1.51	2.86	0.37	6.26	0.29	2.86
	QML	1.56	6.32	2.34	2.86	0.41	6.26	0.41	2.86
	ML	1.05	2.44	1.04	1.11	0.29	2.41	0.21	1.11
t (7)	OSLS	1.26	3.30	1.26	1.51	0.37	3.32	0.29	1.51
	QML	1.30	3.30	1.41	1.51	0.38	3.32	0.32	1.51
	ML	1.10	2.31	1.05	1.05	0.32	2.34	0.25	1.05
t (13)	OSLS	1.20	2.31	1.08	1.06	0.37	2.33	0.26	1.06
	QML	1.20	2.31	1.10	1.06	0.37	2.33	0.26	1.06
	ML	1.15	2.12	1.03	0.97	0.36	2.17	0.24	0.97

Table 2. Simulation results for FSLS and QMLE under AR(1)-ARCH(1) model with

ε_{t} \sim

Gamma(2,1).

Table 2. Simulation results for FSLS and QMLE under AR(1)-ARCH(1) model with

ε_{t} \sim

Gamma(2,1).

T		${\hat{θ}}_{0}$	$RMSE ({\hat{θ}}_{0})$	${\hat{ϕ}}_{0}$	$RMSE ({\hat{ϕ}}_{0})$	${\hat{θ}}_{0}$	$RMSE ({\hat{θ}}_{0})$	${\hat{ϕ}}_{0}$	$RMSE ({\hat{ϕ}}_{0})$
		(a): $θ_{0} = ϕ_{0} = 0.2$				(b): $θ_{0} = 0.2$ , $ϕ_{0} = 0.6$
60	QMLE	0.21	0.149	0.30	0.213	0.19	0.152	0.58	0.163
	FSLS	0.21	0.119	0.27	0.172	0.19	0.121	0.59	0.136
100	QMLE	0.20	0.116	0.25	0.169	0.20	0.121	0.58	0.134
	FSLS	0.20	0.091	0.23	0.138	0.20	0.094	0.59	0.108
1000	QMLE	0.20	0.038	0.20	0.063	0.20	0.040	0.60	0.037
	FSLS	0.20	0.029	0.20	0.052	0.20	0.028	0.60	0.030
10,000	QMLE	0.20	0.012	0.20	0.020	0.20	0.013	0.60	0.012
	OSLS	0.20	0.009	0.20	0.016	0.20	0.009	0.60	0.010
		(c): $θ_{0} = 0.8$ , $ϕ_{0} = 0.2$				(d): $θ_{0} = 0.8$ , $ϕ_{0} = 0.6$
60	QMLE	0.77	0.098	0.29	0.215	0.77	0.097	0.59	0.161
	FSLS	0.78	0.073	0.27	0.178	0.78	0.077	0.59	0.139
100	QMLE	0.78	0.074	0.25	0.172	0.78	0.071	0.58	0.128
	FSLS	0.79	0.054	0.23	0.140	0.79	0.054	0.59	0.106
1000	QMLE	0.80	0.021	0.20	0.062	0.80	0.021	0.60	0.038
	FSLS	0.80	0.016	0.19	0.050	0.80	0.015	0.60	0.030
10,000	QMLE	0.80	0.007	0.20	0.020	0.80	0.007	0.60	0.012
	OSLS	0.80	0.005	0.20	0.016	0.80	0.005	0.60	0.010

Table 3. Effect of the shape, sample size, and parameter values on the RRMSEs under Gamma distribution. All coefficients are significant at 0.0001 level.

	Coefficients					$R^{2}$	Error df
	Const	Shape	T	$θ_{0}$	$ϕ_{0}$	$R^{2}$	Error df
$R R M S E ({\hat{θ}}_{T}^{o})$	0.76546	0.02887	$- 0.00007$	$- 0.02783$	0.02525	0.78692	4785
$R R M S E ({\hat{ϕ}}_{T}^{o})$	0.79952	0.02078	$- 0.00003$	0.01013	0.00962	0.70135	4785

Table 4. Fitted Model-I (19) and (20) and Model-II (21) and (22), where standard errors are in parentheses. The superscript

^{a}

indicates statistical significance at 5% level.

Q 1 (n)

(

Q 2 (n)

) is the Ljung-Box statistic for the (squared) standardized innovations and the corresponding p-values are in parentheses. JB is the standard Jarque-Bera test.

Table 4. Fitted Model-I (19) and (20) and Model-II (21) and (22), where standard errors are in parentheses. The superscript

^{a}

indicates statistical significance at 5% level.

Q 1 (n)

(

Q 2 (n)

) is the Ljung-Box statistic for the (squared) standardized innovations and the corresponding p-values are in parentheses. JB is the standard Jarque-Bera test.

Coef.	Model-I			Model-II
Coef.	LS	QMLE	FSLS	QMLE	FSLS
	Conditional Mean Equation
$θ_{0}$	${0.071}^{a}$	${0.079}^{a}$	${0.049}^{a}$	${0.067}^{a}$	${0.063}^{a}$
	(0.016)	(0.015)	(0.014)	(0.015)	(0.013)
$θ_{1}$	${0.417}^{a}$	${0.339}^{a}$	${0.280}^{a}$	${0.323}^{a}$	${0.257}^{a}$
	(0.141)	(0.1)	(0.093)	(0.096)	(0.086)
$θ_{2}$	$- 0.039$	$0.004$	0.050	–	–
	(0.095)	(0.086)	(0.081)
$θ_{3}$	$- 0.180$	$- {0.235}^{a}$	−0.158	–	–
	(0.148)	(0.101)	(0.094)
$θ_{4}$	${0.436}^{a}$	${0.481}^{a}$	${0.563}^{a}$	${0.328}^{a}$	${0.339}^{a}$
	(0.184)	(0.108)	(0.101)	(0.116)	(0.104)
$θ_{5}$	$- {0.350}^{a}$	$- {0.310}^{a}$	$- {0.294}^{a}$	$- {0.246}^{a}$	$- {0.234}^{a}$
	(0.101)	(0.094)	(0.088)	(0.094)	(0.084)
$θ_{6}$	${0.076}^{a}$	${0.086}^{a}$	${0.051}^{a}$	${0.073}^{a}$	${0.067}^{a}$
	(0.018)	(0.017)	(0.016)	(0.017)	(0.015)
	Conditional Variance Equation
$ϕ_{0}$	$0.0001$	${0.000}^{a}$	0.000	${0.000}^{a}$	$0.000$
		(0.000)	(0.000)	(0.000)	(0.000)
$ϕ_{1}$	$0.1064$	0.093	0.021	–	–
		(0.134)	(0.126)
$ϕ_{2}$	$0.0000$	0.000	0.000	–	–
		(0.077)	(0.072)
$ϕ_{3}$	$0.0806$	0.100	0.102	–	–
		(0.146)	(0.137)
$ϕ_{4}$	$0.3364$	$0.389$	${0.479}^{a}$	0.553	${0.556}^{a}$
		(0.252)	(0.235)	(0.308)	(0.281)
	Diagnostic Statistics of the Standardized Innovations
Q1(5)	0.9 (0.97)	1.0 (0.96)	2.1 (0.83)	2.0 (0.85)	1.9 (0.86)
Q1(10)	4.4 (0.93)	7.4 (0.68)	6.4 (0.78)	8.4 (0.59)	8.5 (0.58)
Q1(15)	12.8 (0.62)	16.1 (0.38)	14.1 (0.52)	19.6 (0.19)	18.7 (0.23)
Q2(5)	3.7 (0.59)	4.4 (0.50)	0.4 (0.99)	6.5 (0.26)	6.4 (0.27)
Q2(10)	5.7 (0.84)	6.9 (0.74)	1.3 (0.99)	8.2 (0.61)	8.1 (0.62)
Q2(15)	9.2 (0.87)	9.8 (0.83)	2.7 (0.99)	9.1 (0.87)	9.4 (0.85)
Skewness	0.78	0.61	1.67	0.73	0.99
Kurtosis	4.07	3.49	8.48	3.73	4.27
JB	${11.0}^{a}$	5.35	$115^{a}$	${8.12}^{a}$	${16.93}^{a}$

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Salamh, M.; Wang, L. Second-Order Least Squares Estimation in Nonlinear Time Series Models with ARCH Errors. Econometrics 2021, 9, 41. https://doi.org/10.3390/econometrics9040041

AMA Style

Salamh M, Wang L. Second-Order Least Squares Estimation in Nonlinear Time Series Models with ARCH Errors. Econometrics. 2021; 9(4):41. https://doi.org/10.3390/econometrics9040041

Chicago/Turabian Style

Salamh, Mustafa, and Liqun Wang. 2021. "Second-Order Least Squares Estimation in Nonlinear Time Series Models with ARCH Errors" Econometrics 9, no. 4: 41. https://doi.org/10.3390/econometrics9040041

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Second-Order Least Squares Estimation in Nonlinear Time Series Models with ARCH Errors

Abstract

1. Introduction

2. Model and SLS Estimation

3. Optimal SLS Estimator

4. Simulation Studies

4.1. Comparison with Quasi-MLE

4.2. Comparison with Estimating Function Estimators

5. Application

6. Conclusions and Discussion

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A

Appendix A.1. Regularity Conditions

Appendix A.2. Proof of Theorem 1

Appendix A.3. Proof of Theorem 2

Appendix A.4. Proof of Theorem 3

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI