Article

A Comparison of Some Bayesian and Classical Procedures for Simultaneous Equation Models with Weak Instruments

1 Fannie Mae, 1100 15th St NW, Washington, DC 20005, USA
2 Department of Economics, University at Albany, SUNY, Albany, NY 12222, USA
* Author to whom correspondence should be addressed.
Econometrics 2019, 7(3), 33; https://doi.org/10.3390/econometrics7030033
Submission received: 18 March 2019 / Revised: 3 July 2019 / Accepted: 23 July 2019 / Published: 29 July 2019

Abstract

We compare the finite sample performance of a number of Bayesian and classical procedures for limited information simultaneous equations models with weak instruments by a Monte Carlo study. We consider Bayesian approaches developed by Chao and Phillips, Geweke, Kleibergen and van Dijk, and Zellner. Amongst the sampling theory methods, OLS, 2SLS, LIML, Fuller’s modified LIML, and the jackknife instrumental variable estimator (JIVE) due to Angrist et al. and Blomquist and Dahlberg are also considered. Since the posterior densities and their conditionals in Chao and Phillips and Kleibergen and van Dijk are nonstandard, we use a novel “Gibbs within Metropolis–Hastings” algorithm, which only requires the availability of the conditional densities from the candidate generating density. Our results show that with very weak instruments, there is no single estimator that is superior to others in all cases. When endogeneity is weak, Zellner’s MELO does the best. When the endogeneity is not weak and ρω12 > 0, where ρ is the correlation coefficient between the structural and reduced form errors, and ω12 is the covariance between the unrestricted reduced form errors, the Bayesian method of moments (BMOM) outperforms all other estimators by a wide margin. When the endogeneity is not weak and βρ < 0 (β being the structural parameter), the Kleibergen and van Dijk approach seems to work very well. Surprisingly, the performance of JIVE was disappointing in all our experiments.

1. Introduction

Research on Bayesian analysis of the simultaneous equations models addresses a problem, raised initially by Maddala (1976), and now recognized as related to the problem of local nonidentification when diffuse/flat priors are used in traditional Bayesian analysis, e.g., Drèze (1976); Drèze and Morales (1976), and Drèze and Richard (1983).1 In this paper, we examine the approaches developed by Chao and Phillips (1998, hereafter CP), Geweke (1996), Kleibergen and van Dijk (1998, hereafter KVD), and Zellner (1998). The idea in KVD was to treat an overidentified simultaneous equations model (SEM) as a linear model with nonlinear parameter restrictions, and has been extended further in Kleibergen and Zivot (2003). While KVD focused mainly on resolving the problem of local nonidentification, CP explored further the consequences of using a Jeffreys prior. By deriving the exact and (asymptotically) approximate representations for the posterior density of the structural parameter, CP showed that the use of a Jeffreys prior brings Bayesian inference closer to classical inference in the sense that this prior choice leads to posterior distributions which exhibit Cauchy-like tail behavior akin to the LIML estimator. Geweke (1996), being aware of the potential problem of local nonidentification, suggests a shrinkage prior such that the posterior density is properly defined for each parameter. In another approach, Zellner (1998) suggested a finite sample Bayesian method of moments (BMOM) procedure based on given data without specifying a likelihood function or introducing any sampling assumptions.
For the Bayesian approaches considered, while Geweke (1996) proposed Gibbs sampling (GS) to evaluate the posterior density with a shrinkage prior, the posterior densities as well as their conditional densities resulting from CP and KVD are nonstandard and cannot be readily simulated. In the category of “block-at-a-time” approach, we suggest a novel MCMC procedure, which we call a “Gibbs within M–H” algorithm. The advantage of this algorithm is that it only requires the availability of the conditional densities from the candidate generating density. These conditional densities are used in a Gibbs sampler to simulate the candidate generating density, whose drawings on convergence are then weighted to generate drawings from the target density in a Metropolis–Hastings (M–H) algorithm. In this study, we will focus on weak instruments, where the classical approach has been particularly problematic.2 Ni and Sun (2003) have studied similar issues in the context of vector autoregressive models, see also Ni et al. (2007). Radchenko and Tsurumi (2006) used many of the procedures analyzed in this paper to estimate a gasoline demand model using an MCMC algorithm.
The main objective of the present paper is to compare the small sample performance of some Bayesian and classical approaches using Monte Carlo simulations. For the purpose of comparison, a number of classical methods including OLS, 2SLS, LIML, Fuller’s modified LIML, and a jackknife instrumental variables estimator (JIVE) originally due to Angrist et al. (1999) and Blomquist and Dahlberg (1999) are also computed from the generated data. Our simulation results from repeated sampling experiments provide some unambiguous guidelines for empirical practitioners.
The plan of the paper is as follows. In Section 2, we set up the model. Section 3 reviews in limited detail the recent Bayesian approaches and JIVE. Section 4 suggests a new MCMC procedure for evaluating the posterior distributions for CP and KVD, and discusses the convergence diagnostics implemented. Section 5 presents simulation results and some discussions. Section 6 contains the main conclusions.

2. The Model

Consider the following limited information formulation of the m-equation simultaneous equations model (LISEM):
$$y_1 = Y_2\beta + Z_1\gamma + u \tag{1}$$
$$Y_2 = Z_1\Pi_1 + Z_2\Pi_2 + V_2 \tag{2}$$
where y1: (T × 1) and Y2: (T × (m − 1)) are the m included endogenous variables; Z1: (T × k1) is an observation matrix of exogenous variables included in the structural Equation (1); Z2: (T × k2) is an observation matrix of exogenous variables excluded from (1); and u and V2 are, respectively, a T × 1 vector and a T × (m − 1) matrix of random disturbances to the system. We assume that (u, V2) ∼ N (0, Σ ⊗ IT), where the m × m covariance matrix is positive definite symmetric (pds) and is partitioned conformably with the rows of (u, V2) as follows
$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{21}' \\ \sigma_{21} & \Sigma_{22} \end{pmatrix}$$
The likelihood function for the model described by (1) and (2) can be written as
$$L(\beta,\gamma,\Pi_1,\Pi_2,\Sigma \mid Y,Z) = (2\pi)^{-Tm/2}\,|\Sigma|^{-T/2}\exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left[\Sigma^{-1}(u,V_2)'(u,V_2)\right]\right\}, \tag{3}$$
where Y = ( y 1 , Y 2 ) and Z = ( Z 1 , Z 2 ) .
The structural model described by (1) and (2) can alternatively be written in its reduced form
$$\begin{pmatrix} y_1 & Y_2 \end{pmatrix} = \begin{pmatrix} Z_1 & Z_2 \end{pmatrix}\begin{pmatrix} \pi_1 & \Pi_1 \\ \Pi_2\beta & \Pi_2 \end{pmatrix} + \begin{pmatrix} \xi_1 & V_2 \end{pmatrix} \tag{4}$$
where π1 = γ + Π1β, ξ1 = u + V2β, (ξ1, V2) ~ N(0, Ω ⊗ IT), and Σ = C′ΩC with
$$C = \begin{pmatrix} 1 & 0 \\ -\beta & I_{m-1} \end{pmatrix}.$$
The likelihood function corresponding to this alternative representation is:
$$L^*(\beta,\gamma,\Pi_1,\Pi_2,\Omega \mid Y,Z) = (2\pi)^{-Tm/2}\,|\Omega|^{-T/2}\exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left[\Omega^{-1}(\xi_1,V_2)'(\xi_1,V_2)\right]\right\} \tag{5}$$
The likelihood functions (3) and (5) are equivalent since the Jacobian between Ω and Σ is unity.
Geweke (1996) considers the following reduced rank regression specification3
$$Y = Z_1 A + Z_2\Theta + E, \tag{6}$$
where A = (Π1, π1), Θ = Π2Φ̲ with Φ̲ = (I_{m−1}, β), and E = (V2, ξ1) ~ N(0, Σ̲ ⊗ IT) with
$$\underline{\Sigma}^{-1} = \begin{pmatrix} \underline{\Sigma}^{11} & \underline{\Sigma}^{12} \\ \underline{\Sigma}^{21} & \underline{\Sigma}^{22} \end{pmatrix}$$
partitioned conformably with the rows of (V2, ξ1). Obviously, (6) is equivalent to (4) and the corresponding likelihood function is similar to (5).
Note that in the absence of restrictions on the covariance structure, (1) is fully identified if and only if rank(Π2) = m − 1 ≤ k2.

3. Review of Some Bayesian Formulations

Among the Bayesian approaches, Geweke (1996) used a shrinkage prior such that all parameters are identified (in the sense that a proper posterior distribution exists) even when Π2 has reduced rank. KVD treated overidentified SEMs as linear models with nonlinear parameter restrictions using the singular value decomposition. A diffuse or natural conjugate prior for the parameters of the embedding linear model results in the posterior for the parameters of the SEM having zero weight in the region of the parameter space where Π2 has reduced rank. This is a feature of the Jacobian of the transformation from the multivariate linear model to the SEM. CP derived a prior by applying Jeffreys’ principle to the model described by (1) and (2) and the assumptions regarding the disturbances. An important advantage of the Jeffreys prior in the present context is that it places no weight in the region of the parameter space where rank(Π2) < (m − 1) and relatively low weight in close neighborhoods of this region where the model is nearly unidentified.

3.1. Zellner’s Bayesian Method of Moments Approach (BMOM)

Among the various Bayesian treatments of SEM proposed by Zellner (1971, 1978, 1986, 1994, 1998), the Bayesian method of moments approach applies the principle of maximum entropy and generates optimal estimates which can be evaluated by double K-class estimators. Given the unrestricted reduced form equation y 1 = Z π ¯ 1 + ξ 1 , Zellner (1998) considered a balanced loss function,
$$L_b = \omega L_g + (1-\omega)L_p = \omega\,(y_1 - X\hat{\delta})'(y_1 - X\hat{\delta}) + (1-\omega)\,(Z\bar{\pi}_1 - X\hat{\delta})'(Z\bar{\pi}_1 - X\hat{\delta}), \qquad 0 \le \omega \le 1,$$
where X = ( Y 2 , Z 1 ) , δ = ( β , γ ) , and δ ^ is an estimate of δ . The BMOM estimate that minimizes E L b , where the expectation is taken with respect to a probability density function of the π matrices of the unrestricted reduced form equations, is given by
$$\begin{pmatrix} \hat{\beta} \\ \hat{\gamma} \end{pmatrix} = \begin{pmatrix} Y_2'Y_2 - K_1\hat{V}_2'\hat{V}_2 & Y_2'Z_1 \\ Z_1'Y_2 & Z_1'Z_1 \end{pmatrix}^{-1}\begin{pmatrix} (Y_2 - K_2\hat{V}_2)'y_1 \\ Z_1'y_1 \end{pmatrix},$$
where K1 = 1 − k/(T − k), K2 = 1 − (1 − ω)k/(T − k) with 0 ≤ ω ≤ 1, and V̂2 = (I − Z(Z′Z)⁻¹Z′)Y2.
BMOM estimate will vary depending on the value of ω . When ω = 1 , it is the optimal estimate resulting from a “goodness of fit” loss function Lg. When ω = 0, it is the optimal estimate given by a precision of estimation loss function Lp. Meanwhile, the well-known minimum expected loss (MELO) estimator is derived using a precision of estimation loss function and may be evaluated as a K-class estimator with
$$K_1 = K_2 = 1 - k/(T - k - m - 1).$$
Similar to the BMOM method, Conley et al. (2008) developed a Bayesian semiparametric approach to the instrumental variable problem assuming linear structural and reduced form equations, but with unspecified error distributions.
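Because the BMOM and MELO estimates are evaluated through the double K-class formula above, they can be computed directly once (K1, K2) are fixed. The following sketch, in Python with NumPy, is our own illustration of that formula (the variable names are ours and the snippet is not the authors' code).

```python
import numpy as np

def double_k_class(y1, Y2, Z1, Z2, K1, K2):
    """Evaluate the double K-class formula above for given (K1, K2).

    y1: (T,) left-hand-side variable; Y2: (T, m-1) included endogenous
    regressors; Z1: (T, k1) included exogenous variables; Z2: (T, k2) excluded
    exogenous variables.  Returns (beta_hat, gamma_hat).
    """
    Z = np.hstack([Z1, Z2])
    # V2_hat = (I - Z (Z'Z)^{-1} Z') Y2, the unrestricted reduced-form residuals
    V2 = Y2 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ Y2)
    A = np.block([[Y2.T @ Y2 - K1 * (V2.T @ V2), Y2.T @ Z1],
                  [Z1.T @ Y2, Z1.T @ Z1]])
    b = np.concatenate([(Y2 - K2 * V2).T @ y1, Z1.T @ y1])
    delta = np.linalg.solve(A, b)
    m1 = Y2.shape[1]
    return delta[:m1], delta[m1:]

# Illustrative (K1, K2) choices discussed in the text:
#   OLS:  K1 = K2 = 0        2SLS: K1 = K2 = 1
#   BMOM (omega = 0.75): K1 = 1 - k/(T - k), K2 = 1 - (1 - omega)*k/(T - k)
```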

3.2. The Geweke Approach

Geweke (1996) assumes the following reference prior
$$|\underline{\Sigma}|^{-(m+v+1)/2}\exp\left[-\tfrac{1}{2}\,\mathrm{tr}\,S\underline{\Sigma}^{-1}\right]\exp\left[-\tfrac{\tau^2}{2}\left(\beta'\beta + \mathrm{tr}\,\Pi_2'\Pi_2 + \mathrm{tr}\,A'A\right)\right],$$
which is the product of an independent inverted Wishart distribution for Σ̲ with v degrees of freedom and scale matrix S, and independent N(0, τ2) shrinkage priors for each element of β and Π2. Geweke derived the respective conditional posterior distributions, which may be used to generate drawings from the joint posterior distribution through Gibbs sampling. For the vector of parameters (Σ̲⁻¹, A, Π2, β), the full conditional densities are as follows:
(1) Conditional density of Σ̲⁻¹:
$$\underline{\Sigma}^{-1} \mid (\Pi_2, \beta, A, Z, Y) \sim \text{Wishart}\,(T + v,\ G^{-1}), \tag{9}$$
where G = S + (Y − Z1A − Z2Θ)′(Y − Z1A − Z2Θ).
(2) Conditional density of A:
$$\mathrm{vec}(A) \mid (\Pi_2, \beta, \underline{\Sigma}^{-1}, Z, Y) \sim N\!\left(\left[\underline{\Sigma}^{-1}\otimes Z_1'Z_1 + \tau^2 I_{mk_1}\right]^{-1}\left[\underline{\Sigma}^{-1}\otimes Z_1'Z_1\right]\mathrm{vec}(\hat{A}),\ \left[\underline{\Sigma}^{-1}\otimes Z_1'Z_1 + \tau^2 I_{mk_1}\right]^{-1}\right), \tag{10}$$
where Â = (Z1′Z1)⁻¹Z1′(Y − Z2Θ).
(3) Conditional density of Π2:
$$\mathrm{vec}(\Pi_2) \mid (\beta, \underline{\Sigma}^{-1}, A, Z, Y) \sim N\!\left(\left[\tilde{\Sigma}^{11}\otimes Z_2'Z_2 + \tau^2 I_{k_2(m-1)}\right]^{-1}\left[\tilde{\Sigma}^{11}\otimes Z_2'Z_2\right]\mathrm{vec}(\hat{\Pi}_2),\ \left[\tilde{\Sigma}^{11}\otimes Z_2'Z_2 + \tau^2 I_{k_2(m-1)}\right]^{-1}\right), \tag{11}$$
where Π̂2 = Θ̂[Φ̲⁺ + Φ̲⁰Σ̃²¹(Σ̃¹¹)⁻¹] and Θ̂ = (Z2′Z2)⁻¹Z2′(Y − Z1A). Here Φ̲⁺ is the Moore–Penrose generalized inverse of Φ̲, the columns of Φ̲⁺ and Φ̲⁰ are orthogonal, and C ≡ (Φ̲⁺, Φ̲⁰) is an m × m nonsingular matrix. Finally, Σ̃ⁱʲ denotes the partitioning of Σ̃⁻¹ = (C′Σ̲C)⁻¹ conformably with Y = (Y2, y1).
(4) Conditional density of β:
$$\beta \mid (\Pi_2, \underline{\Sigma}^{-1}, A, Z, Y) \sim N\!\left(\left[\underline{\Sigma}^{22}\,\Pi_2'Z_2'Z_2\Pi_2 + \tau^2 I_{m-1}\right]^{-1}\left[\underline{\Sigma}^{22}\,\Pi_2'Z_2'Z_2\Pi_2\right]\hat{\beta},\ \left[\underline{\Sigma}^{22}\,\Pi_2'Z_2'Z_2\Pi_2 + \tau^2 I_{m-1}\right]^{-1}\right), \tag{12}$$
where
$$\hat{\beta} = (\Pi_2'Z_2'Z_2\Pi_2)^{-1}\Pi_2'Z_2'(Y_2 - Z_1\Pi_1)\,\underline{\Sigma}^{12}(\underline{\Sigma}^{22})^{-1} - \underline{\Sigma}^{12}(\underline{\Sigma}^{22})^{-1} + (\Pi_2'Z_2'Z_2\Pi_2)^{-1}\Pi_2'Z_2'(y_1 - Z_1\pi_1).$$
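As an illustration of how one draw of β from the conditional normal in (12) might be coded, the sketch below assumes the other blocks (A, Π2, Σ̲⁻¹) have already been drawn in the current Gibbs scan. The variable names and the use of the precision blocks Σ̲¹² and Σ̲²² follow our reconstruction of the formulas above and should be checked against Geweke (1996).

```python
import numpy as np

def draw_beta(y1, Y2, Z1, Z2, pi1, Pi1, Pi2, Sig12, Sig22, tau2, rng):
    """One Gibbs draw of beta from the conditional normal in (12).

    Sig12 (shape (m-1,)) and Sig22 (a scalar) are blocks of the current draw of
    the precision matrix underline-Sigma^{-1}; tau2 is the shrinkage
    hyperparameter; the remaining arguments are the data and the current draws
    of the other parameter blocks.
    """
    ZPi = Z2 @ Pi2                                    # T x (m-1)
    m1 = Pi2.shape[1]
    M = ZPi.T @ ZPi                                   # Pi2' Z2' Z2 Pi2
    # beta_hat as reconstructed below (12): a GLS-type correction using V2
    beta_hat = (np.linalg.solve(M, ZPi.T @ (Y2 - Z1 @ Pi1)) @ (Sig12 / Sig22)
                - Sig12 / Sig22
                + np.linalg.solve(M, ZPi.T @ (y1 - Z1 @ pi1)))
    P = Sig22 * M + tau2 * np.eye(m1)                 # conditional precision
    mean = np.linalg.solve(P, Sig22 * (M @ beta_hat))
    return rng.multivariate_normal(mean, np.linalg.inv(P))
```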

3.3. The Chao and Phillips Approach

Using Jeffreys prior, CP obtains exact and approximate analytic expressions for the posterior density of the structural coefficient β in the LISEM (1) and (2). Their formulas are found to exhibit Cauchy-like tails analogous to comparable results in the classical literature on LIML estimation. For the model (1) and (2) under normality assumption for the disturbances, a Jeffreys prior on the parameters, θ = ( β , γ , Π 1 , Π 2 , Σ ) , is of the form
$$p(\beta,\gamma,\Pi_1,\Pi_2,\Sigma) \propto \left|E\left\{-\frac{\partial^2}{\partial\theta\,\partial\theta'}\ln L(\theta \mid Y,Z)\right\}\right|^{1/2} \propto |\sigma_{11}|^{(k_2-m+1)/2}\,|\Sigma|^{-(k+m+1)/2}\,|\Pi_2'Z_2'Q_{Z_1}Z_2\Pi_2|^{1/2}, \tag{13}$$
where ln L(θ | Y, Z) is the log-likelihood function as specified in (3), and QX = IT − PX, PX = X(X′X)⁻¹X′. As first noted by Poirier (1996), the prior in (13) places no weight where rank(Π2) < (m − 1) through the factor |Π2′Z2′Q_{Z1}Z2Π2|^{1/2}.
The joint posterior of the parameters of LISEM (1) and (2) is constructed as proportional to the product of the prior (13) and the likelihood function (3),
$$p(\beta,\gamma,\Pi_1,\Pi_2,\Sigma \mid Y,Z) \propto p(\beta,\gamma,\Pi_1,\Pi_2,\Sigma)\,L(\beta,\gamma,\Pi_1,\Pi_2,\Sigma \mid Y,Z)$$
$$\propto |\sigma_{11}|^{(k_2-m+1)/2}\,|\Sigma|^{-(T+k+m+1)/2}\,|\Pi_2'Z_2'Q_{Z_1}Z_2\Pi_2|^{1/2}\times\exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left[\Sigma^{-1}(u,V_2)'(u,V_2)\right]\right\}, \tag{14}$$
where (u, V2) is defined in (1) and (2). Note that (14) or its conditionals do not belong to any standard class of probability density functions.

3.4. The Kleibergen and van Dijk Approach

To solve the problem of local nonidentification and to avoid the so-called Borel–Kolmogorov paradox, see Billingsley (1986) and Poirier (1995), KVD considered (4) as a multivariate linear model with nonlinear parameter restrictions:
$$\begin{pmatrix} y_1 & Y_2 \end{pmatrix} = \begin{pmatrix} Z_1 & Z_2 \end{pmatrix}\begin{pmatrix} \pi_1 & \Pi_1 \\ \phi_1 & \Phi_2 \end{pmatrix} + \begin{pmatrix} \xi_1 & V_2 \end{pmatrix}, \tag{15}$$
where ϕ1 is a k2 × 1 vector and Φ2 is a k2 × (m − 1) matrix. Denote Φ = (ϕ1, Φ2). The reduced form model (4) is obtained if a reduced rank restriction is imposed on the linear model (15) such that rank(Φ) = (m − 1) instead of m.
Using a singular value decomposition (SVD) of Φ , they show that (15) is identical to the so-called unrestricted reduced form (URF) model,5
$$\begin{pmatrix} y_1 & Y_2 \end{pmatrix} = Z_1\begin{pmatrix} \pi_1 & \Pi_1 \end{pmatrix} + Z_2\Pi_2 B + Z_2\Pi_{2\perp}\lambda B_{\perp} + \begin{pmatrix} \xi_1 & V_2 \end{pmatrix}, \tag{16}$$
where B = (β  I_{m−1}) and λ is a (k2 − m + 1) × 1 vector. Π2⊥ and B⊥ are the orthogonal complements of Π2 and B, respectively, such that Π2⊥′Π2 ≡ 0, B⊥B′ ≡ 0, Π2⊥′Π2⊥ ≡ I_{k2−m+1}, and B⊥B⊥′ ≡ 1 (i.e., Π2⊥ = (−Π22Π21⁻¹  I_{k2−m+1})′(I_{k2−m+1} + Π22Π21⁻¹Π21⁻¹′Π22′)^{−1/2}, where Π2 = (Π21′  Π22′)′, Π21: (m − 1) × (m − 1), Π22: (k2 − m + 1) × (m − 1), and B⊥ = (1 + β′β)^{−1/2}(1  −β′)).
There is a one-to-one correspondence between the parameters in (15) and (16). The SVD of Φ is
$$\Phi = USV', \tag{17}$$
where U: k2 × k2 with U′U = I_{k2}; V: m × m with V′V = I_m; and S: k2 × m is a rectangular matrix containing the (nonnegative) singular values, in decreasing order, on its main diagonal (i.e., s11, s22, …, smm). Rewrite
$$U = \begin{pmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{pmatrix}, \qquad S = \begin{pmatrix} S_1 & 0 \\ 0 & S_2 \end{pmatrix}, \qquad V = \begin{pmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{pmatrix}, \tag{18}$$
where U11, S1, v21: (m − 1) × (m − 1); v12: 1 × 1; v11, v22: (m − 1) × 1; U12: (m − 1) × (k2 − m + 1); U21: (k2 − m + 1) × (m − 1); U22: (k2 − m + 1) × (k2 − m + 1); and S2: (k2 − m + 1) × 1. The following relationship between (Π2, β, λ) and (U, S, V) then results: Π2 = (U11′  U21′)′ S1 v21, β = v21⁻¹ v11, and
$$\lambda = (U_{22}'U_{22})^{-1/2}\,U_{22}'S_2\,v_{12}\,(v_{12}'v_{12})^{-1/2}. \tag{19}$$
Note that λ is obtained through pre- and postmultiplication of S2 by orthogonal matrices, while S2 contains the smallest singular values of Φ and is invariant with respect to the ordering of the variables contained in Y and Z2.
According to KVD, the above shows that the model described by (1) and (2) can be considered as equivalent to the linear model (16) with a nonlinear (reduced rank) restriction λ = 0 on the parameters. Therefore, the priors and posteriors of the parameters of the LISEM (1) and (2) may be constructed as proportional to the priors and posteriors of the parameters of the linear model (16) evaluated at λ = 0 .
A diffuse (Jeffreys) prior for the parameters (π1, Π1, Φ, Ω) of the linear model,6
$$p(\pi_1,\Pi_1,\Phi,\Omega) \propto |\Omega|^{-(k+m+1)/2} \propto |\Omega|^{-(m+1)/2}\,|\Omega^{-1}\otimes Z'Z|^{1/2}, \tag{20}$$
where k = k1 + k2, implies the following prior for the parameters (β, π1, Π1, Π2, Ω) of the LISEM (4):
$$p(\beta,\pi_1,\Pi_1,\Pi_2,\Omega) \propto p(\pi_1,\Pi_1,\Phi(\Pi_2,\beta,\lambda),\Omega)\big|_{\lambda=0}\,\big|J(\Phi,(\Pi_2,\beta,\lambda))\big|_{\lambda=0}\big| \propto |\Omega|^{-(m+1)/2}\,|\Omega^{-1}\otimes Z'Z|^{1/2}\,\big|J(\Phi,(\Pi_2,\beta,\lambda))\big|_{\lambda=0}\big|$$
$$\propto |\Omega|^{-(m+1)/2}\,|\Omega^{-1}\otimes Z'Z|^{1/2}\times\big|\big(B'\otimes I_{k_2}\;\;\; e_1\otimes\Pi_2\;\;\; B_{\perp}'\otimes\Pi_{2\perp}\big)\big|, \tag{21}$$
where e1 = (1, 0, 0, …, 0)′. Note that the prior (21) is the Jeffreys prior of the unrestricted reduced form (16) evaluated at λ = 0. Most importantly, |J(Φ, (Π2, β, λ))|_{λ=0}| = 0 when Π2 has reduced rank. This feature of the KVD approach eliminates the potential impact of local nonidentification.
The joint posterior of the parameters of the LISEM (4) is readily constructed as proportional to the product of the prior (21) and the likelihood function (5),
$$p(\beta,\pi_1,\Pi_1,\Pi_2,\Omega \mid Y,Z) \propto p(\beta,\pi_1,\Pi_1,\Pi_2,\Omega)\,L^*(\beta,\gamma,\Pi_1,\Pi_2,\Omega \mid Y,Z)$$
$$\propto |\Omega|^{-(T+m+1)/2}\,|\Omega^{-1}\otimes Z'Z|^{1/2}\times\big|\big(B'\otimes I_{k_2}\;\;\; e_1\otimes\Pi_2\;\;\; B_{\perp}'\otimes\Pi_{2\perp}\big)\big|$$
$$\times\exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left[\Omega^{-1}\left(\begin{pmatrix} y_1 & Y_2\end{pmatrix} - \begin{pmatrix} Z_1 & Z_2\end{pmatrix}\begin{pmatrix}\pi_1 & \Pi_1 \\ \Pi_2\beta & \Pi_2\end{pmatrix}\right)'\left(\begin{pmatrix} y_1 & Y_2\end{pmatrix} - \begin{pmatrix} Z_1 & Z_2\end{pmatrix}\begin{pmatrix}\pi_1 & \Pi_1 \\ \Pi_2\beta & \Pi_2\end{pmatrix}\right)\right]\right\} \tag{22}$$
Unfortunately, the above posterior or its conditional densities do not belong to a known class of probability density functions.

3.5. The Jackknife Instrumental Variable Estimator (JIVE)

Motivated by split sample instrumental variables estimators, Angrist et al. (1999) and Blomquist and Dahlberg (1999) independently suggested a jackknife instrumental variable estimator (JIVE). For model (1) and (2), JIVE is given by
$$\hat{\delta}_{\text{jive}} = (\hat{X}_{\text{jive}}'X)^{-1}(\hat{X}_{\text{jive}}'y_1), \tag{23}$$
where X ^ jive is the T × ( m 1 + k 1 ) matrix with t-th row defined by
$$Z_t\hat{\Pi}_{(t)} = Z_t\left(Z_{(t)}'Z_{(t)}\right)^{-1}\left(Z_{(t)}'X_{(t)}\right) = \frac{Z_t\hat{\Pi} - h_tX_t}{1 - h_t},$$
where Z_{(t)} and X_{(t)} are the (T − 1) × k and (T − 1) × (m − 1 + k1) matrices obtained after eliminating the t-th rows of Z and X, respectively, Π̂ = (Z′Z)⁻¹(Z′X), and h_t = Z_t(Z′Z)⁻¹Z_t′. In JIVE, the instrument is independent of the disturbances even in finite samples, which is achieved by using a ‘leave-one-out’ jackknife-type fitted value in place of the usual unrestricted reduced form predictions.
Angrist et al. (1999) also proposed a second jackknife estimator that is a slight modification of (23). Similar to their study, we found that its performance is very similar to JIVE, and is not reported here.
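A compact sketch of JIVE as defined in (23), using the leave-one-out identity above so that T separate first-stage regressions are not needed, is given below; the implementation is our own illustration, not the code used by Angrist et al. (1999) or Blomquist and Dahlberg (1999).

```python
import numpy as np

def jive(y1, X, Z):
    """Jackknife IV estimator (23), computed via the leave-one-out identity.

    X = (Y2, Z1) collects the right-hand-side variables and Z = (Z1, Z2) the
    instruments, as in the text.
    """
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    Pi_hat = ZtZ_inv @ (Z.T @ X)                   # full-sample first-stage fit
    h = np.einsum('ti,ij,tj->t', Z, ZtZ_inv, Z)    # leverages h_t = Z_t (Z'Z)^{-1} Z_t'
    # leave-one-out fitted values: (Z_t Pi_hat - h_t X_t) / (1 - h_t)
    X_jive = (Z @ Pi_hat - h[:, None] * X) / (1.0 - h[:, None])
    return np.linalg.solve(X_jive.T @ X, X_jive.T @ y1)
```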

4. Posterior Simulator: “Gibbs within M–H” Algorithm

Given the full conditional densities in (9) through (12) for the four blocks of parameters, evaluating the joint posterior densities by Gibbs sampling is straightforward, see Geweke (1996) for a detailed description. Although Geweke’s (1996) shrinkage prior does not meet the argument in KVD that the implied prior/posterior on the parameters of an embedding linear model should be well-behaved, we found that the use of Geweke’s shrinkage prior does not lead to a reducible Markov Chain. With the specification of a shrinkage prior, when Π 2 has reduced rank, the joint posterior density still depends on β and will not exhibit any asymptotic cusp. In the following, we only discuss the posterior simulation for CP and KVD.
KVD suggested two simulation algorithms for the posterior (22): an Importance sampler and a Metropolis–Hastings algorithm. We found that their M–H algorithm performs unsatisfactorily with low acceptance rate even for reasonable parameter specifications.7 As mentioned earlier, since the posteriors (14) and (22) as well as their conditional posteriors do not belong to any standard class of probability density functions, Gibbs sampling cannot be used. In this section, we suggest an alternative simulation algorithm which combines Gibbs sampling (see Casella and George (1992) and Chib and Greenberg (1996)) and the Metropolis–Hastings algorithm (see Metropolis et al. 1953; Hastings 1970; Smith and Roberts 1993; Tierney 1994; Chib and Greenberg 1995). Our algorithm is different from the “M–H within Gibbs” algorithm and can find its usefulness in other applications as well.
To generate drawings from the target density p(x), we use a candidate-generating density r(x). An Independence sampler, which is a special case of the M–H sampler, in algorithmic form is as follows:
0. Choose starting values x⁰.
1. Draw xⁱ from r(x).
2. Accept xⁱ with probability
$$\alpha(x^{i-1},x^i) = \begin{cases}\min\left(\dfrac{p(x^i)\,r(x^{i-1})}{p(x^{i-1})\,r(x^i)},\,1\right), & \text{if } p(x^{i-1})\,r(x^i) > 0,\\[6pt] 1, & \text{if } p(x^{i-1})\,r(x^i) = 0,\end{cases}\tag{24}$$
otherwise xⁱ = xⁱ⁻¹.
3. Set i = i + 1. Go to 1.
It is generally not feasible to draw all elements of the vector x simultaneously. A block-at-a-time possibility was first discussed in (Hastings 1970, sct. 2.4) and then in Chib and Greenberg (1995) along with an example.
Chib and Greenberg (1995) considered applying the M–H algorithm in turn to sub-blocks of the vector x , which presumes that the target density p(x) may be manipulated to generate full conditional densities for each of the sub-blocks of x , conditioning on other elements of x. However, the full conditionals are sometimes not readily available from the target density for empirical investigators. The posteriors (14) and (22) happen to fall in this category. In this latter case, problems come up at step 1 while trying to generate drawings from the joint marginal density r ( x ) . Note that these drawings, whether accepted or rejected at step 2, satisfy the necessary reversibility condition if step 1 is performed successfully.
To simplify the notation, we consider a vector x which contains two blocks, x = ( x 1 , x 2 ) . KVD used the fact that
$$r(x_1, x_2) = r(x_1)\,r(x_2 \mid x_1) \tag{25}$$
and suggested drawing x1ⁱ from r(x1) and then drawing x2ⁱ from r(x2 | x1ⁱ). The pair (x1ⁱ, x2ⁱ) is then taken as a drawing from r(x). It turns out that this strategy gives a very low acceptance rate at step 2 in simulation studies for various reasonable parameter values. Sometimes the move never takes place and the posterior has all its mass at the parameter values of the first drawing. The reason for the failure is that information is not updated at subsequent drawings and the transition kernel of (25) is static.
If the full conditionals r ( x 1 | x 2 ) and r ( x 2 | x 1 ) are available, which is usually true for many standard densities, we propose to use them in a Gibbs sampler to make independent drawings from the invariant density r ( x ) after the Markov chain has converged.
The combined algorithm, which we call “Gibbs within M–H”, is thus as follows:
0. Choose starting values x⁰ = (x1⁰, x2⁰).
1. Draw x1ⁱ from r(x1 | x2ⁱ⁻¹), then draw x2ⁱ from r(x2 | x1ⁱ).
2. Accept xⁱ = (x1ⁱ, x2ⁱ) with probability α(xⁱ⁻¹, xⁱ) as defined in (24); otherwise xⁱ = xⁱ⁻¹.
3. Set i = i + 1. Go to 1.
As explained, step 1 is the Gibbs step and step 2 is the M–H step in our combined algorithm. In the following subsections, we describe the steps for implementing the above procedure to generate drawings from the posteriors (14) and (22).8
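To fix ideas, the following toy sketch applies the “Gibbs within M–H” steps to a hypothetical two-block target; the densities p and r below are illustrative stand-ins chosen for simplicity, not the posteriors (14) or (22).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-block target p and candidate r (both unnormalized); these
# are hypothetical stand-ins, not the posteriors (14) or (22).
def log_p(x1, x2):
    return -0.5 * (x1**2 + x2**2 + 0.5 * (x1 * x2)**2)

def log_r(x1, x2):            # bivariate normal with correlation 0.8
    return -0.5 * (x1**2 - 1.6 * x1 * x2 + x2**2) / (1.0 - 0.8**2)

def draw_x1_given_x2(x2):     # full conditionals of the candidate density r
    return rng.normal(0.8 * x2, np.sqrt(1.0 - 0.8**2))

def draw_x2_given_x1(x1):
    return rng.normal(0.8 * x1, np.sqrt(1.0 - 0.8**2))

x1, x2 = 0.0, 0.0             # step 0: starting values
draws = []
for i in range(20000):
    # step 1 (Gibbs): draw a candidate from r through its full conditionals
    c1 = draw_x1_given_x2(x2)
    c2 = draw_x2_given_x1(c1)
    # step 2 (M-H): accept with probability (24), a ratio of the weights p/r
    log_alpha = (log_p(c1, c2) - log_r(c1, c2)) - (log_p(x1, x2) - log_r(x1, x2))
    if np.log(rng.uniform()) < log_alpha:
        x1, x2 = c1, c2
    draws.append((x1, x2))
```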

4.1. Implementing the CP Approach

Note that the posterior in the CP approach is proportional to the product of the prior, which is uniformly bounded, and the likelihood function, which can be sampled by a Gibbs sampler. Therefore, we choose the candidate-generating density the way suggested by Chib and Greenberg (1995): we use the likelihood function, L ( β , γ , Π 1 , Π 2 , Σ | Y , Z ) , as the candidate generating density for the posterior (14). Using precision matrix Σ−1, the simulation steps are as follows,
0. Choose starting values (β⁰, γ⁰, Π1⁰, Π2⁰, Σ⁻¹,⁰).
1. Draw Σ⁻¹,ⁱ from p(Σ⁻¹ | βⁱ⁻¹, γⁱ⁻¹, Π1ⁱ⁻¹, Π2ⁱ⁻¹, Y, Z); then draw (βⁱ, γⁱ, Π1ⁱ, Π2ⁱ) from p(β, γ, Π1, Π2 | Σ⁻¹,ⁱ, Y, Z).
2. Accept (βⁱ, γⁱ, Π1ⁱ, Π2ⁱ, Σ⁻¹,ⁱ) as a drawing from the posterior (14) with probability
$$\min\left(\frac{|\sigma_{11}^{\,i}|^{(k_2-m+1)/2}\,|\Sigma^{-1,i}|^{(k+m+1)/2}\,|\Pi_2^{i\,\prime}Z_2'Q_{Z_1}Z_2\Pi_2^{i}|^{1/2}}{|\sigma_{11}^{\,i-1}|^{(k_2-m+1)/2}\,|\Sigma^{-1,(i-1)}|^{(k+m+1)/2}\,|\Pi_2^{i-1\,\prime}Z_2'Q_{Z_1}Z_2\Pi_2^{i-1}|^{1/2}},\ 1\right);$$
otherwise, (βⁱ, γⁱ, Π1ⁱ, Π2ⁱ, Σ⁻¹,ⁱ) = (βⁱ⁻¹, γⁱ⁻¹, Π1ⁱ⁻¹, Π2ⁱ⁻¹, Σ⁻¹,⁽ⁱ⁻¹⁾).
3. Set i = i + 1. Go to 1.
The conditional densities used in the first step are constructed as follows (see Percy (1992) and Chib and Greenberg (1996)): Rewrite the model (1) and (2) as a SUR model,
$$y_t = W_t\delta + \begin{pmatrix} u_t \\ V_{2,t}' \end{pmatrix},$$
where
$$y_t = \begin{pmatrix} y_{1,t} \\ Y_{2,t}' \end{pmatrix}, \qquad W_t = \begin{pmatrix} (Y_{2,t}\;\; Z_{1,t}) & 0 \\ 0 & I_{m-1}\otimes Z_t \end{pmatrix},$$
and δ = (β′, γ′, (vec(Π1′  Π2′)′)′)′. Then
$$p(\Sigma^{-1} \mid \delta, Y, Z) \propto |\Sigma^{-1}|^{(T-2(m+1))/2}\exp\left[-\tfrac{1}{2}\,\mathrm{tr}(\Sigma^{-1}H)\right],$$
which follows a Wishart distribution with (T − m − 1) degrees of freedom, where H = Σ_{t=1}^{T}(y_t − W_tδ)(y_t − W_tδ)′, and
$$p(\delta \mid \Sigma^{-1}, Y, Z) = N\!\left(\Big(\sum_{t=1}^{T}W_t'\Sigma^{-1}W_t\Big)^{-1}\Big(\sum_{t=1}^{T}W_t'\Sigma^{-1}y_t\Big),\ \Big(\sum_{t=1}^{T}W_t'\Sigma^{-1}W_t\Big)^{-1}\right).$$
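A sketch of one scan of the Gibbs candidate generator built from these two SUR conditionals is given below; the Wishart draw uses SciPy, and the data structures (lists of y_t and W_t) are our own choices for illustration rather than the authors' implementation.

```python
import numpy as np
from scipy.stats import wishart

def gibbs_scan_cp_candidate(y, W, delta, T, m, rng):
    """One scan of the Gibbs candidate generator for the CP posterior.

    y is a list of the T vectors y_t (each of length m) and W a list of the T
    design matrices W_t from the SUR form in the text; delta is the current
    draw of the stacked coefficient vector.  Returns (Sigma_inv, delta_new).
    """
    # draw Sigma^{-1} | delta from the Wishart conditional with T - m - 1 d.f.
    resid = [y[t] - W[t] @ delta for t in range(T)]
    H = sum(np.outer(e, e) for e in resid)
    Sigma_inv = wishart.rvs(df=T - m - 1, scale=np.linalg.inv(H), random_state=rng)
    # draw delta | Sigma^{-1} from its multivariate normal conditional
    A = sum(W[t].T @ Sigma_inv @ W[t] for t in range(T))
    b = sum(W[t].T @ Sigma_inv @ y[t] for t in range(T))
    cov = np.linalg.inv(A)
    return Sigma_inv, rng.multivariate_normal(cov @ b, cov)
```

The candidate pair so drawn is then accepted or rejected in the M–H step using the prior ratio shown in step 2 of the algorithm above.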

4.2. Implementing the KVD Approach

KVD proposed to use the posterior of the unrestricted linear model (16), p(β, λ, Π2, Ω | Y, Z), as the candidate generating density for the posterior (22), p(β, Π2, Ω | Y, Z), where the parameters (π1, Π1) have been concentrated out. First (Φ, Ω) is generated from p(Φ, Ω | Y, Z), and then (β, λ, Π2) is obtained from Φ using (18) and (19). However, λ is also sampled, and it is not present in the posterior p(β, Π2, Ω | Y, Z). Therefore, KVD assume that λ is generated by a conditional density of the form
$$g(\lambda \mid \beta,\Pi_2,\Omega) = (2\pi)^{-(k_2-m+1)/2}\,|B_{\perp}\Omega^{-1}B_{\perp}'|^{(k_2-m+1)/2}\,|\Pi_{2\perp}'Z_2'M_{Z_1}Z_2\Pi_{2\perp}|^{1/2}\times\exp\left[-\tfrac{1}{2}\,B_{\perp}\Omega^{-1}B_{\perp}'\,(\lambda-\hat{\lambda})'\Pi_{2\perp}'Z_2'M_{Z_1}Z_2\Pi_{2\perp}(\lambda-\hat{\lambda})\right], \tag{29}$$
where λ̂ = (Π2⊥′Z2′M_{Z1}Z2Π2⊥)⁻¹Π2⊥′Z2′M_{Z1}(Y − Z2Π2B)Ω⁻¹B⊥′(B⊥Ω⁻¹B⊥′)⁻¹.
Therefore, the density p(β, λ, Π2, Ω | Y, Z) is used to approximate the posterior g(λ | β, Π2, Ω) p(β, Π2, Ω | Y, Z). The weight function, defined as the ratio of the posterior and the candidate generating density, becomes
$$\omega(\beta,\lambda,\Pi_2,\Omega) = \frac{g(\lambda \mid \beta,\Pi_2,\Omega)\,p(\beta,\Pi_2,\Omega \mid Y,Z)}{p(\beta,\lambda,\Pi_2,\Omega \mid Y,Z)} = \frac{\big|J(\Phi,(\Pi_2,\beta,\lambda))\big|_{\lambda=0}\big|}{\big|J(\Phi,(\Pi_2,\beta,\lambda))\big|}\;g(\lambda \mid \beta,\Pi_2,\Omega)\big|_{\lambda=0}, \tag{30}$$
where the Jacobian matrix J ( Φ , ( Π 2 , β , λ ) ) as well as J ( Φ , ( Π 2 , β , λ ) ) | λ = 0 have been carefully derived in KVD9. Note that ω ( · ) = p ( · ) / r ( · ) , so (30) may be used in the “GS within M–H” algorithm to simplify (24).
Similar to the way we implemented the CP approach, it is more convenient to work with the precision matrix Ω−1 in the conditional densities. Applying the procedure outlined above, the steps involved in constructing the Markov chain for the posterior (22) are summarized as follows,
0. Choose starting values (Φ⁰, Ω⁻¹,⁰).
1. Draw Ω⁻¹,ⁱ from p(Ω⁻¹ | Φⁱ⁻¹, Y, Z); then draw Φⁱ from p(Φ | Ω⁻¹,ⁱ, Y, Z).
2. Perform a singular value decomposition of Φⁱ = UⁱSⁱVⁱ′.
3. Compute βⁱ, λⁱ, Π2ⁱ according to (18) and (19).
4. Compute ω(βⁱ, λⁱ, Π2ⁱ, Ω⁻¹,ⁱ) according to (29) and (30).
5. Draw (π1ⁱ, Π1ⁱ) from p(π1, Π1 | Ω⁻¹,ⁱ, Φⁱ(Π2ⁱ, βⁱ, λ), Y, Z)|_{λ=0}.
6. Accept (βⁱ, π1ⁱ, Π1ⁱ, Π2ⁱ, Ω⁻¹,ⁱ) as a drawing from the posterior with probability
$$\min\left(\frac{\omega(\beta^i, \lambda^i, \Pi_2^i, \Omega^{-1,i})}{\omega(\beta^{i-1}, \lambda^{i-1}, \Pi_2^{i-1}, \Omega^{-1,(i-1)})},\ 1\right);$$
otherwise, (βⁱ, λⁱ, Π2ⁱ, Ω⁻¹,ⁱ) = (βⁱ⁻¹, λⁱ⁻¹, Π2ⁱ⁻¹, Ω⁻¹,⁽ⁱ⁻¹⁾).
7. Set i = i + 1. Go to 1.
Note that the conditional densities used in the first step are as follows:
$$p(\Omega^{-1} \mid \Phi, Y, Z) \propto |\Omega^{-1}|^{(T+k_2-m-1)/2}\exp\left[-\tfrac{1}{2}\,\mathrm{tr}(\Omega^{-1}G)\right],$$
which follows a Wishart distribution W_m(T + k2, G⁻¹) with (T + k2) degrees of freedom, where G = Y′Q_ZY + (Φ − Φ̂)′Z2′M_{Z1}Z2(Φ − Φ̂) and Φ̂ = (Z2′M_{Z1}Z2)⁻¹Z2′M_{Z1}Y. In addition,
$$p(\Phi \mid \Omega^{-1}, Y, Z) \propto |\Omega^{-1}|^{k_2/2}\exp\left[-\tfrac{1}{2}\,\mathrm{tr}\!\left[\Omega^{-1}(\Phi-\hat{\Phi})'Z_2'M_{Z_1}Z_2(\Phi-\hat{\Phi})\right]\right],$$
which is a matricvariate normal density.
The conditional density used in step 5 is
$$p(\pi_1, \Pi_1 \mid \Omega^{-1}, \Phi(\Pi_2,\beta,\lambda), Y, Z) \propto |\Omega^{-1}|^{k_1/2}\exp\left[-\tfrac{1}{2}\,\mathrm{tr}\!\left[\Omega^{-1}(\Lambda-\hat{\Lambda})'Z_1'Z_1(\Lambda-\hat{\Lambda})\right]\right],$$
evaluated at λ = 0, where Λ = (π1  Π1) and Λ̂ = (Z1′Z1)⁻¹Z1′(Y − Z2Φ).
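For the two-equation case (m = 2) used in our experiments, steps 2–3 of the chain amount to recovering (β, Π2, λ) from a draw of Φ. The sketch below does this through the SVD; the orthogonal complement Π2⊥ is computed here by a QR factorization, which may differ from the normalization in the text by a rotation, so the snippet should be read as an illustration of the decomposition in (16) rather than a literal transcription of (18) and (19).

```python
import numpy as np

def recover_from_phi(Phi):
    """Recover (beta, Pi2, lam) from a draw of Phi when m = 2.

    Phi is k2 x 2, with columns ordered as (phi_1, Phi_2) as in (15).  The
    orthogonal complement Pi2_perp is taken from a QR factorization and may
    differ from the normalization in the text by a rotation.
    """
    U, s, Vt = np.linalg.svd(Phi)          # singular values in decreasing order
    v1 = Vt.T[:, 0]                        # dominant right singular vector
    beta = v1[0] / v1[1]                   # B = (beta, 1) spans the same row space as v1'
    B = np.array([beta, 1.0])
    Pi2 = U[:, 0] * s[0] * v1[1]           # dominant part written as Pi2 * B
    B_perp = np.array([1.0, -beta]) / np.sqrt(1.0 + beta**2)
    q, _ = np.linalg.qr(Pi2[:, None], mode='complete')
    Pi2_perp = q[:, 1:]                    # k2 x (k2 - 1) orthonormal complement of Pi2
    R = Phi - np.outer(Pi2, B)             # remainder equals Pi2_perp * lam * B_perp
    lam = Pi2_perp.T @ (R @ B_perp)
    return beta, Pi2, lam
```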

4.3. Convergence Diagnosis

One important implementation issue associated with MCMC methods is that of determining the number of iterations required. There are various informal or formal methods for the diagnosis of convergence, see Cowles and Carlin (1996) and Brooks and Roberts (1998) for comprehensive reviews and recommendations. Since the posterior densities in (14) and (22) resulting from CP and KVD do not have moments of any positive integer order, most of the methods proposed in the MCMC literature which require the existence of at least the first moment (posterior mean) are ruled out. We are left with a very few alternatives that can be used in our context.
First, the popular Raftery and Lewis (1992) method has been recognized as the best for estimating the convergence rate of the Markov chain if quantiles of the posterior density are of major interest, although the method does not provide any information as to the convergence rate of the chain as a whole. Because we are interested in the posterior modes and medians for β associated with the Bayesian approaches, we will largely rely on Raftery and Lewis’ method to determine the number of burn-ins and the subsequent number of iterations required to attain specified accuracy (e.g., estimating the 0.50 quantile in any posterior within ±0.05 with probability 0.95). However, we do not adopt their suggested skip-interval: MacEachern and Berliner (1994) showed that estimation quality is always degraded by discarding samples. We also experimented with skip-intervals and found that the results are basically the same provided a sufficiently large number of iterations is run, but this is inefficient and sometimes infeasible in terms of computation time.
For each specification in our Monte Carlo study with repeated experiments, we determined the number of burn-ins and subsequent number of iterations by running the publicly available FORTRAN code gibbsit on MCMC output of 10,000 iterations from three or more testing replications. For KVD and CP approaches, the number of burn-ins for both the GS step and the M–H algorithm were estimated. It was found that the number of burn-ins in the GS step is negligible for most cases. However, we discarded more iterations as the transient phase than the estimated number of burn-ins.10 The estimated number of subsequent iterations across testing replications was stable for the Gibbs sampler (in both Geweke approach and the GS step for KVD and CP approaches), but it varied a lot for the M–H procedures, which is also demonstrated by the variation in acceptance rates over repeated experiments. We used a generous value for the number of subsequent iterations when feasible.
Second, for MCMC output from each testing replication, we also applied other convergence diagnostic methods, including percentiles derived from every quarter of the long chain, Yu and Mykland (1998)’s CUSUM plot, and Brooks (1996)’s D-sequence statistic. While the CUSUM partial sums actually involve averaging over sampling drawings, the computation of Brooks’ statistic is justified on the basis that it is designed to measure the frequency of back and forth movement in the MCMC algorithm. However, these diagnostics may sometimes provide contradictory outcomes so that one has to be extra careful in interpreting them before making a judgment on convergence.
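For completeness, a minimal sketch of the CUSUM-type path of Yu and Mykland (1998) applied to MCMC output is given below; because the posteriors (14) and (22) have no finite moments, applying it to a bounded transform of the draws (as in the commented example) is our own workaround rather than a recommendation from the original paper.

```python
import numpy as np

def cusum_path(draws):
    """CUSUM path in the spirit of Yu and Mykland (1998) for a scalar MCMC chain.

    Returns the running partial sums of deviations from the overall mean.  A
    smooth, slowly wandering path suggests poor mixing; a 'hairy' path suggests
    good mixing.  Because the posteriors (14) and (22) have no finite moments,
    we apply it to a bounded transform of the beta draws (our own workaround),
    e.g. an indicator for being below the sample median.
    """
    draws = np.asarray(draws, dtype=float)
    return np.cumsum(draws - draws.mean())

# Example usage with a chain of beta draws (hypothetical variable name):
# path = cusum_path(beta_draws <= np.median(beta_draws))
```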

5. Simulation Results and Discussions

In this section, we present results of Monte Carlo experiments and discuss some of the findings. As mentioned before, for the purpose of comparison, we also computed a number of single K-class estimators including OLS, 2SLS, LIML, and Fuller’s modified LIML. In summary, the set of K-class estimators for the structural coefficients in model (1) and (2) is given by:
$$\begin{pmatrix} \hat{\beta} \\ \hat{\gamma} \end{pmatrix} = \begin{pmatrix} Y_2'Y_2 - K_1\hat{V}_2'\hat{V}_2 & Y_2'Z_1 \\ Z_1'Y_2 & Z_1'Z_1 \end{pmatrix}^{-1}\begin{pmatrix} (Y_2 - K_2\hat{V}_2)'y_1 \\ Z_1'y_1 \end{pmatrix},$$
where V̂2 = Q_ZY2 — see Equation (7) above.
The following LISEM estimators have been considered:
(1) Ordinary least squares (OLS): K1 = K2 = 0.
(2) Two stage least squares (2SLS): K1 = K2 = 1.
(3) Zellner’s (1978) Bayesian minimum expected loss estimator (MELO): K1 = K2 = 1 − k/(T − k − m − 1).
(4) Zellner’s Bayesian method of moments relative to a balanced loss function (BMOM):11 K1 = 1 − k/(T − k), K2 = 1 − (1 − ω)k/(T − k) with ω = 0.75.
(5) Classical LIML. We compute classical LIML as an iterated Aitken estimator (see Pagan (1979) and Gao and Lahiri (2000a)).
(6) Fuller (1977) modified LIML estimators (Fuller1 and Fuller4): K1 = K2 = λ* − α/(T − k) for α = 1, 4, where
$$\lambda^* = \min_{\beta}\ \frac{(y_1 - Y_2\beta)'\,Q_{Z_1}\,(y_1 - Y_2\beta)}{(y_1 - Y_2\beta)'\,Q_Z\,(y_1 - Y_2\beta)}$$
and it is computed using the LIML estimate.
(7) JIVE.
(8) Posterior mode and median from the Geweke (1996) approach using Gibbs sampling. The values of the hyperparameters are chosen to be τ2 = 0.01, v = m(m + 1)/2, S = 0.01 Im.12
(9) Mode and median of the marginal density of β based on classical LIML from Gibbs sampling (LIML-GS). LIML-GS is a byproduct of the “Gibbs within M–H” algorithm for the CP approach since the likelihood function is used as the candidate-generating density to explore the CP posterior.
(10) Posterior mode and median from the CP approach using the “Gibbs within M–H” algorithm.
(11) Posterior mode and median from the KVD approach using the “Gibbs within M–H” algorithm.
For the Bayesian approaches and LIML-GS, we report both (posterior) mode and median to show possible asymmetry in the marginal densities of β . Any preference for one over the other will depend on the researcher’s loss function. We obtain 16 estimates for each generated data set. The data are generated from the model,
$$y_1 = Y_2\beta + u, \qquad Y_2 = Z_2\pi + V_2,$$
where y1 and Y2 are T × 1 such that m = 2, and Z2 is T × k2. We further specify β = 1 and
$$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.$$
For |ρ| we used 0.20, 0.60, and 0.95.13 Z2 is simulated from a N(0, I_{k2} ⊗ IT) distribution and (u, V2) from a N(0, Σ ⊗ IT) distribution. A constant term is added in each equation, i.e., Z1 is a T × 1 vector of ones.
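The data generating process can be sketched as follows. The way the first-stage coefficients are scaled to hit a target R̄2 is a stand-in for the ±2.5% empirical control described below, and the function is illustrative rather than the exact code used for the tables.

```python
import numpy as np

def simulate_lisem(T, k2, rho, beta=1.0, rbar2=0.10, rng=None):
    """Generate one data set from the Monte Carlo design described above.

    Z2 is N(0, I); (u, V2) is bivariate normal with unit variances and
    correlation rho; the first-stage coefficients are scaled so that the
    population R^2 of the regression of Y2 on Z2 equals rbar2 (a stand-in for
    the paper's empirical +/-2.5% control of R-bar-squared).
    """
    rng = rng or np.random.default_rng()
    Z1 = np.ones((T, 1))                       # constant term in each equation
    Z2 = rng.standard_normal((T, k2))
    # equal coefficients pi with k2*pi^2/(k2*pi^2 + 1) = rbar2
    pi = np.full(k2, np.sqrt(rbar2 / ((1.0 - rbar2) * k2)))
    cov = np.array([[1.0, rho], [rho, 1.0]])
    u, V2 = rng.multivariate_normal([0.0, 0.0], cov, size=T).T
    Y2 = Z2 @ pi + V2
    y1 = beta * Y2 + u
    return y1, Y2, Z1, Z2
```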
The simulation results are reported in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9, Table 10, Table 11, Table 12 and Table 13. Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12 are for cases with   ρ   >   0 , each table reporting results for one specification.
Table 13 summarizes the results for cases with ρ < 0 for BMOM and KVD, for which a negative ρ made a surprising difference. As mentioned before, we focus on the estimates of the structural parameter β. Specifically, we analyze the sensitivity of the various estimates of β with respect to the strength of the instrumental variables Z, the degree of overidentification (k2 − m + 1), the degree of endogeneity (ρ), and the sample size (T). We will also examine whether the performance of an estimator is symmetric with respect to the sign of the parameter ρ, an issue generally overlooked in the literature.14
Note that the strength of the instrumental variables for the included endogenous variable Y2 is measured in terms of the adjusted R 2 by regressing Y2 on Z = (Z1, Z2). In the data generating process, we controlled R ¯ 2 to be within ±2.5% of the specified value to reduce unnecessary variation. We did not experiment with extremely small R ¯ 2 (say, 0.01 or less). In these cases, the mean values of all estimators approached the point of concentration ω 12 / ω 22 , which is equal to ( β + ρ ) for our data generating process (DGP).
For each specification, the number of replications is 400. The number of burn-ins (nburn_GS and nburn_MH), and subsequent number of iterations (n) determined at the convergence diagnosis step are reported in the footnotes to each table.
The average acceptance rate and its standard deviation (in parentheses) across replications for each M–H routine are reported as well. To evaluate alternative estimators, we computed mean, standard deviation (Std), root of mean squared errors (RMSE), and mean absolute deviation (MAD) over repeated experiments for all the estimators considered.15 Since LIML, posterior densities for CP and KVD, as well as 2SLS in the just-identified case do not have finite moments of positive order in finite samples, one should interpret the computed mean, standard deviation and RMSE across replications for these estimators with caution. In this sense, the MAD across replications is a preferred measure to consider.
We will first look at cases reported in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12 with ρ > 0. In Table 1, we consider a case (T = 50, ρ = 0.60, k2 = 4) with moderately strong instruments (R̄2 = 0.40). It is found that with reasonably strong instruments all estimators designed for simultaneous equations perform reasonably well. As expected, OLS is seriously biased. BMOM has a slight edge over others in terms of RMSE and MAD. For all Bayesian approaches and LIML-GS, the medians perform a little better than the modes, and CP does better than KVD, in terms of bias, RMSE, and MAD. Notice that the classical LIML estimates are different from LIML-GS (mode or median). As noted by Drèze (1976), from a Bayesian viewpoint, LIML produces an estimate of β conditionally on the overidentifying restrictions, the modal values of all the remaining parameters, and a uniform prior. In other words, the concentrated likelihood function of β after concentrating out (i.e., maximizing with respect to) other reduced-form and nuisance parameters is a conditional density.
However, LIML-GS is a marginal density with all other parameters being integrated out. Due to possible asymmetry in the distribution of the nuisance parameters, the modal/median values of LIML-GS may not coincide with classical LIML estimates. In all our experiments, we find that the median-unbiasedness property of (conditional) LIML does not carry over to the marginal LIML (i.e., LIML-GS); however, the former generally has a much larger standard deviation than the latter. In a way, LIML-GS brings the classical LIML estimator close to its Bayesian counterpart for the purpose of comparison.
It is interesting to note that across all our tables, the difference between LIML-GS and CP can only be attributed to the importance of Jeffreys prior. Compared to LIML-GS, typically CP has a smaller bias, but slightly larger standard deviation, even though the differences are very small. In some cases, however, the use of Jeffreys prior reduces the bias in CP quite substantially. For example, in Table 4 with T = 50 and a high degree of overidentification, the bias is reduced from 0.36 to 0.25.
A simple case when the structural model is just identified (k2 = 1) is reported in Table 2. For this case it is well known that classical LIML coincides with 2SLS. The KVD approach does not accommodate the case of just-identification since (15) requires k2 > (m − 1).16 In this case, we find that CP-Mode produces results closer to LIML-GS-Mode than to LIML. CP (1998) showed that for a two-equation just-identified SEM in orthonormal canonical form, the posterior density of β with Jeffreys prior has precisely the same functional form as the density of the finite sample distribution of the corresponding LIML estimator as obtained by Mariano and McDonald (1979). Our simulation results show that the assumption of orthonormal canonical form is crucial for their exact correspondence, which cannot be extended to a general SEM.17 In general, the Bayesian marginal density is not the same as the classical conditional density. Interestingly, JIVE is considerably more biased and has larger standard deviation than 2SLS. Also, CP-Median and LIML-GS-Median perform significantly worse than their modes. This is because in an exactly identified model with weak instruments, the probability of local nonidentification is substantial, and the resulting nonstandard marginal density exhibits a very high variance. The same result holds true for Geweke-Median, but to a lesser extent. Thus, for exactly identified SEMs with very weak instruments, mode of the marginal density is a more dependable measure of β . We should point out that in all other cases in this study, the medians generally turned out to be more preferable than the modes in terms of bias, RMSE, and MAD (see Table 11 and Table 12, for instance).
Results reported in Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12 consider cases with general overidentification and weak instruments. As noted in the literature, OLS and 2SLS are median-biased in the direction of the correlation coefficient ρ, and the bias in 2SLS grows with the degree of overidentification and decreases as sample size increases. Results in Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and Table 10 confirm these results. Since MELO is a single K-class estimator with 0 < K < 1, its performance is always between the OLS and 2SLS estimates. The bias in MELO shows the same pattern as that of 2SLS. With moderate simultaneity, the median-bias in 2SLS can be as large as about 40% of the true value (see Table 8). We note that MELO, LIML-GS-Mode, and KVD-Mode or KVD-Median are also median-biased in the direction of ρ. However, the bias in JIVE is consistently in the opposite direction of ρ. Classical LIML is remarkably median-unbiased when the instrumental variables are not very weak, which is well documented in the literature. We find that LIML is median-biased in the direction of ρ when the instruments are very weak (Table 8), which is consistent with the finding in Staiger and Stock (1997) using local-to-zero asymptotic theory. Even in this situation, the bias of LIML is much smaller than that of any other estimator, except BMOM.
The MAD of OLS is very close to its bias (i.e., relatively small Std) across all cases and it implies that OLS method is robust in the sense that it does not suffer from heavy tails or outlying estimates, see Zellner (1998). In this sense, MELO and BMOM are all robust with relatively small standard deviations across replications. However, OLS exhibits large bias in the presence of simultaneity and is not so appealing. It is known that for a degree of overidentification strictly less than seven, 2SLS would have a smaller asymptotic mean squared error (AMSE) than LIML, cf. Mariano and McDonald (1979) and references therein. In cases with weak instruments the situation gets more complicated in finite samples. In our experiments, LIML has larger RMSE and MAD than 2SLS except in Table 11 and Table 12 where ρ was 0.95. Note that the degree of overidentification is 8.0 in Table 4, Table 6, Table 8 and Table 10.
Among classical estimators, JIVE turns out to be the least appealing. Monte Carlo simulations in Angrist et al. (1999) showed that JIVE has slight median bias in the opposite direction of ρ (but less than 2SLS) and has heavier tails than LIML. Our Table 6 is comparable to panel 2 of their Table 1, and the results are similar. Our other experiments show that JIVE may also have large absolute bias (larger than LIML) in cases with weak instruments, sometimes even greater than 2SLS (see Table 2). Generally, JIVE has slightly less bias than 2SLS, but this gain is overshadowed by an enlarged standard deviation, so that in finite samples it has no advantage over 2SLS in terms of MAD and RMSE. We also find that JIVE has greater RMSE and MAD than LIML. Blomquist and Dahlberg (1999) experimented with much larger sample sizes than ours. Comparing our Table 4 with Table 6 and with an unreported simulation with a sample size of 500, we found that the relative gain in JIVE is greater than that of other estimators as sample size increases, even though its relatively low standing remains valid. Examined from different angles, these results are very similar to those reported by Davidson and MacKinnon (2006a, 2006b).18
Fuller’s modified LIML estimators are included because Fuller1 is designed to minimize the median-bias, and Fuller4 to minimize the mean-squared error. It seems that this conclusion is also problematic in the presence of weak instruments. Between the two, Fuller1 has smaller median-bias, and Fuller4 has smaller standard deviations across replications. However, in terms of RMSE or MAD, Fuller4 shows no advantage over Fuller1 in most of the cases.
Because all the estimators except OLS are consistent and their asymptotic distributions are also the same, results in Table 3, Table 4, Table 5 and Table 6 confirm that their bias and dispersion decrease as sample size increases. But if the instruments are very weak (see Table 7 and Table 8), their bias and dispersion may remain significant, a point emphasized forcefully by Zellner (1998). However, when the endogeneity is not strong (see Table 9 and Table 10), their bias and dispersion may not be a big concern for some of the estimators.
Across all cases, we find that the bias in BMOM is small if ρ is not too small and the structural Equation (1) is overidentified. As sample size increases or degree of over-identification rises, the observed bias in BMOM decreases. The most striking feature of BMOM is that it exhibits the smallest MAD and Std when ρ is not too small. MELO shows slightly smaller MAD and Std than BMOM if ρ is small (see Table 9 and Table 10). In cases with very weak instruments and a high degree of overidentification, the MAD of BMOM is only one-fourth of that of other estimators (see Table 8). These are in accordance with Tsurumi (1990)’s finding that in many cases, ZEM has the least relative mean absolute deviation. Meanwhile, if ρ is very small and the structural equation is overidentified, the bias in BMOM can be large; 2SLS, LIML-GS, Geweke, and CP perform remarkably well in these situations.
Next, we examine in more detail the performance of the Bayesian approaches. Overall, the median bias resulting from these approaches exhibits the same pattern as the bias of 2SLS: it increases with the degree of overidentification and decreases as sample size rises. The Geweke (1996) approach uses a shrinkage prior, but its performance is comparable with LIML-GS and CP. The median-bias of Geweke-Mode is the same as or slightly less than that of LIML-GS-Mode, and the bias of Geweke-Median is always slightly less than that of LIML-GS-Median. Similar relationships are observed for the MADs. These reflect the impact of the (informative) shrinkage prior on the posterior density.
For each specification, the acceptance rate in the M–H algorithm using CP approach is stable while that using KVD approach shows huge variation across replications. The acceptance rate for CP is generally above 40%, except when sample size is small and the degree of overidentification is high. This shows that the posterior of CP is largely dominated by the likelihood function (3) and the Jeffreys prior generally carries little information. Second, in terms of the computed standard deviations (Stds) of the estimates across replications, CP-Mode has larger dispersion than LIML-GS-Mode, and CP-Median has larger dispersion than LIML-GS-Median. These also shed light on the notion that Jeffreys prior is less informative than a uniform prior. However, between the Jeffreys prior (13) used by CP and the implied prior (21) resulting from diffuse/Jeffreys prior on a linear model used by KVD, it is not clear which one is less informative.
As for the KVD (1998) approach, we observe that it performs as well as any other estimator if the instruments are not weak (see Table 1). But when the instruments are weak and ρ is positive, KVD shows more bias and a higher MAD than CP. In Table 4, with T = 50 and a high degree of overidentification, KVD performs as badly as OLS.
Next, we consider cases with negative ρ; the results are summarized in Table 13. We replicate each case in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12 with the same specification except that ρ is negative. Since the performance of all estimators except BMOM and KVD was basically the same with respect to the sign of ρ, we only report results on these two in Table 13. We find that when ρ changes sign, the bias of BMOM does not change sign and even increases in magnitude. Also note that the computed Stds for BMOM when ρ < 0 are close to the respective ones when ρ > 0. Therefore, for cases with ρ < 0, BMOM has large RMSEs/MADs and loses its attraction. Note that BMOM is the same as the double K-class estimator (DKC) with the K values fixed. This asymmetry in the performance of DKC is not well recognized in the literature, and has been discussed in Gao and Lahiri (2001). The observed asymmetry in its bias with respect to ρ in our experiments is readily explained by examining an expression for the mean of the double K-class estimator in Dwivedi and Srivastava (1984, Theorem 1). We can express δ̂_DKC as:
$$\hat{\delta}_{DKC} = \hat{\delta}_{K_1} + \begin{pmatrix} Y_2'Y_2 - K_1\hat{V}_2'\hat{V}_2 & Y_2'Z_1 \\ Z_1'Y_2 & Z_1'Z_1 \end{pmatrix}^{-1}\begin{pmatrix} (K_1 - K_2)\hat{V}_2'y_1 \\ 0 \end{pmatrix},$$
where δ ^ K 1 is a single K-class estimator with characterizing scalar K 1 . When Z 1 Z 2 = 0 , which is satisfied in our experimental specifications, a double K-class estimator of β may be written as
$$\hat{\beta}_{DKC} = \hat{\beta}_{K_1} + (K_1 - K_2)\,\frac{Y_2'Q_Zy_1}{Y_2'\Delta Y_2},$$
where Δ = (1 − K1)Q_{Z1} + K1P_{Z2}. Observe that for 0 < K1 < 1, β̂_{K1} is biased in the direction of ρ, as noted in Mariano and McDonald (1979). Note also that Y2′ΔY2 > 0, and Y2′Q_Zy1 provides an estimate of ω12. Although Dwivedi and Srivastava (1984) explored the dominance of the double K-class over the K-class using the exact MSE criterion, their guidelines for the selection of K2 for a given K1 are not entirely valid, because the conditions were derived from a small Monte Carlo simulation with positive ω12 and negative ρ only. Since K1 < K2 for BMOM, when ρ and ω12 have opposite signs, the second term in β̂_DKC will be of the same sign as the bias of β̂_{K1}, and therefore β̂_DKC (hence BMOM) will exhibit large bias. Otherwise, when ρω12 > 0, the bias is mitigated. Based on our simulation results, we found that the sign of ρ has no effect on the standard deviation of BMOM. This finding shows that the greater RMSE of BMOM when ρω12 < 0 is due to the aggravated bias. For the specification corresponding to Table 4 in Table 13 (i.e., T = 50, ρ = −0.60, k2 = 4, R̄2 = 0.10), we find that for a given K1 = 0.947, RMSE is minimized if K2 is chosen to be 0.829, which is much less than K1, and less than the K2 = 0.987 used in BMOM. See Gao and Lahiri (2001) for further details.
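The decomposition above is easy to verify numerically. The sketch below computes β̂_{K1} and β̂_DKC for the m = 2 case directly from the projection matrices; it assumes Z1′Z2 = 0, which holds only approximately in our simulated samples since Z1 is a constant.

```python
import numpy as np

def dkc_beta_decomposition(y1, Y2, Z1, Z2, K1, K2):
    """Check the decomposition of the double K-class estimate of beta (m = 2).

    Assumes Z1'Z2 = 0, which holds only approximately in the simulated samples
    because Z1 is a constant and Z2 is mean zero only in expectation.
    Returns (beta_K1, beta_DKC) with beta_DKC = beta_K1 + correction term.
    """
    y1, Y2 = np.ravel(y1), np.ravel(Y2)
    T = len(y1)
    Z = np.hstack([Z1, Z2])
    proj = lambda X: X @ np.linalg.solve(X.T @ X, X.T)
    QZ, QZ1, PZ2 = np.eye(T) - proj(Z), np.eye(T) - proj(Z1), proj(Z2)
    Delta = (1.0 - K1) * QZ1 + K1 * PZ2
    denom = Y2 @ Delta @ Y2
    beta_K1 = (Y2 @ Delta @ y1) / denom
    beta_DKC = beta_K1 + (K1 - K2) * (Y2 @ QZ @ y1) / denom
    return beta_K1, beta_DKC
```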
In Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12 we found that KVD with ρ > 0 performs very poorly, often with substantial bias and relatively high RMSE and MAD. CP uniformly dominates KVD in these cases. However, with ρ < 0 the picture turns around remarkably well in favor of KVD. As we see in Table 13, across all cases the bias tends to be negative and relatively small. With other parameter values being the same, KVD with ρ < 0 has significantly less RMSE and MAD than cases when ρ > 0, and performs unequivocally the best among all estimators when endogeneity is strong. However, since this observed asymmetry is essentially a finite sample problem with KVD, the improved performance when ρ < 0 becomes less significant when the sample size increases from 50 to 100. With ρ < 0 the overall performance of KVD is very comparable to that of CP, if not slightly better in some cases.
After experimenting with widely different negative and positive values of β and ρ , we found out that the performance of KVD is dependent on the sign of β ρ , rather than on the sign of ρ . When β ρ > 0, it performs very unsatisfactorily as documented in Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12. Kleibergen and Zivot (2003) have derived exact analytical expressions for the conditional densities of β given Ω for both the KVD and CP posteriors. They show that the difference between the two is in the Jacobian relating the unrestricted linear multivariate model to the restricted reduced form model. We expect that this additional term may account for the asymmetry in KVD with respect to β ρ . In our experiments, we found that in finite samples, when β ρ > 0, the reduced rank restriction using singular value decomposition shifts the marginal posterior for KVD away from the marginal posterior of the linear multivariate model. However, when the sample size gets large, the problem seems to go away.

6. Conclusions

This paper examines the relative merits of some contemporary developments in the Bayesian and classical analysis of limited information simultaneous equations models in situations where the instruments are very weak. Since the posterior densities and their conditionals in the Bayesian approaches developed by Chao and Phillips (1998) and Kleibergen and van Dijk (1998) are nonstandard, we proposed and implemented a “Gibbs within Metropolis–Hastings” algorithm, which only requires the availability of the conditional densities from the candidate-generating density. These conditional densities are used in a Gibbs sampler (GS) to simulate the candidate generating density, whose drawings, after convergence, are then weighted to generate drawings from the target density in a Metropolis–Hastings (M–H) algorithm. We rely on Raftery and Lewis (1992) to determine the number of burn-ins, and the subsequent number of required iterations in order to ensure convergence. Through a MCMC simulation study, our results provide useful guidelines for empirical practitioners.
The first comforting result is that with reasonably strong instruments (marginal R ¯ 2 in excess of 0.40), all estimators perform equally well in finite samples. In cases with very weak instruments (marginal R ¯ 2 less than 0.10), there is no single estimator that is superior to others in all cases—a conclusion also reached by Andrews and Stock (2005). When endogeneity is weak ( ρ less than 0.20), Zellner’s MELO does the best. When the endogeneity is relatively strong (ρ in excess of 0.60) and ρ ω 12 > 0, BMOM outperforms all other estimators by wide margins. When the endogeneity is strong but β ρ < 0, the KVD approach seems to get very appealing; but, otherwise, its performance is surprisingly poor. With β ρ > 0, as the sample size gets larger, the performance of KVD improves rapidly. Fortunately, the Geweke and CP approaches exhibit no such asymmetry and their performances based on bias, RMSE, and MAD are very similar. Based on the medians of marginal posteriors, their performance ranking is consistently a distant second. The record of JIVE is quite disappointing across all our experiments and is not recommended in practice. Even though JIVE is slightly less biased than 2SLS in most cases, its standard deviation is considerably higher, particularly in small samples. The most remarkable result in this study is that poor instruments can affect the performance of different estimators differently, depending on the signs and magnitudes of certain key parameters of the model. Given the finding that even in finite samples with very weak instruments BMOM and KVD perform remarkably well on certain parts of the parameter space, more research is needed to understand the reasons for the asymmetry and find ways to fix the problem. Another important caveat of our comparative study is that it was done in an iid setting. Heteroskedastic and autocorrelated errors, particularly in highly leveraged regressions, can affect inferences based on alternative instrument variable regressions differentially relative to ordinary least squares, see Young (2019). These issues remain unresolved.

Author Contributions

The two authors contributed equally to this work.

Funding

This research received no external funding.

Acknowledgments

An earlier version of this paper was presented at the 1999 Joint Statistical Meetings and the 2000 Winter Meetings of the Econometric Society. We are grateful to the late Arnold Zellner for his early encouragement with this work, and to Ingolf Dittmann, Yuichi Kitamura, Roberto Mariano, Eric Zivot, Herman van Dijk and two anonymous referees for many helpful comments and suggestions. Peter Dang re-typed the whole manuscript. The responsibility for any remaining errors and shortcomings is solely ours.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof that the claim |J(Φ, (Π2, β, λ))| ≥ |J(Φ, (Π2, β, λ))|λ=0 is invalid, cf. footnote 9.
In the current notation, we need to show that |J(Φ, (Π21, θ2, β, λ))| ≥ |J(Φ, (Π21, θ2, β, λ))|λ=0 can fail, where θ2 = Π22Π21^{−1}. The key fact is that J(Φ, (Π21, θ2, β, λ))|λ=0 and W are not orthogonal, where W = J(Φ, (Π21, θ2, β, λ)) − J(Φ, (Π21, θ2, β, λ))|λ=0.
Consider a simple case with m = k2 = 2. In this case,
\Phi = \begin{pmatrix} 1 \\ \theta_2 \end{pmatrix} \Pi_{21} \begin{pmatrix} \beta & 1 \end{pmatrix} + \begin{pmatrix} \theta_2 \\ -1 \end{pmatrix} (1+\theta_2^2)^{-1/2}\,\lambda\,(1+\beta^2)^{-1/2} \begin{pmatrix} 1 & -\beta \end{pmatrix}.
Denote K = (1 + θ2^2)^{−1/2}(1 + β^2)^{−1/2}. Therefore,
J(\Phi, (\Pi_{21}, \theta_2, \beta, \lambda))\big|_{\lambda=0} =
\begin{pmatrix}
\beta & 0 & \Pi_{21} & K\theta_2 \\
\beta\theta_2 & \beta\Pi_{21} & \theta_2\Pi_{21} & -K \\
1 & 0 & 0 & -K\beta\theta_2 \\
\theta_2 & \Pi_{21} & 0 & K\beta
\end{pmatrix},
W =
\begin{pmatrix}
0 & \lambda K (1+\theta_2^2)^{-1} & -\lambda K (1+\beta^2)^{-1}\beta\theta_2 & 0 \\
0 & \lambda K (1+\theta_2^2)^{-1}\theta_2 & \lambda K (1+\beta^2)^{-1}\beta & 0 \\
0 & -\lambda K (1+\theta_2^2)^{-1}\beta & -\lambda K (1+\beta^2)^{-1}\theta_2 & 0 \\
0 & -\lambda K (1+\theta_2^2)^{-1}\beta\theta_2 & \lambda K (1+\beta^2)^{-1} & 0
\end{pmatrix}.
It is easy to check that (J(Φ, (Π21, θ2, β, λ))|λ=0)W′ is not a zero matrix, although its third row consists of zeros. Interestingly,
\big(J(\Phi, (\Pi_{21}, \theta_2, \beta, \lambda))\big|_{\lambda=0}\big)' W =
\begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & \lambda K \Pi_{21} & 0 \\
0 & \lambda K \Pi_{21} & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}.
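The non-orthogonality is also easy to verify numerically. The following sketch (Python/NumPy, with arbitrary illustrative parameter values) encodes the two matrices displayed above and confirms that their cross-product has the pattern shown in the last display.

```python
# Numerical check that J(Phi, .)|_{lambda=0} and W are not orthogonal
# (m = k2 = 2 case above); the parameter values are arbitrary.
import numpy as np

Pi21, th2, beta, lam = 0.7, 0.4, 1.0, 0.3
K = (1 + th2 ** 2) ** -0.5 * (1 + beta ** 2) ** -0.5

# J evaluated at lambda = 0; rows ordered as (Phi11, Phi21, Phi12, Phi22),
# columns as (Pi21, theta2, beta, lambda).
A = np.array([
    [beta,        0.0,          Pi21,        K * th2],
    [beta * th2,  beta * Pi21,  th2 * Pi21,  -K],
    [1.0,         0.0,          0.0,         -K * beta * th2],
    [th2,         Pi21,         0.0,         K * beta],
])

# W = J - J|_{lambda=0}
a = lam * K / (1 + th2 ** 2)
b = lam * K / (1 + beta ** 2)
W = np.array([
    [0.0,  a,                -b * beta * th2, 0.0],
    [0.0,  a * th2,           b * beta,       0.0],
    [0.0, -a * beta,         -b * th2,        0.0],
    [0.0, -a * beta * th2,    b,              0.0],
])

# Only the (2,3) and (3,2) entries (1-based) are nonzero, both equal lambda*K*Pi21.
print(np.round(A.T @ W, 6))
print(round(lam * K * Pi21, 6))
```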

References

  1. Ackerberg, Daniel A., and Paul J. Devereux. 2006. Comment on case against JIVE. Journal of Applied Econometrics 21: 835–38. [Google Scholar] [CrossRef]
  2. Andrews, Donald W. K., and James H. Stock. 2005. Inference with weak instruments. In Advances in Economics and Econometrics, Theory and Applications. Edited by Richard Blundell, Whitney K. Newey and Torsten Persson. Ninth World Congress of the Econometric Society. Cambridge: Cambridge University Press, vol. III. [Google Scholar]
  3. Andrews, Donald W. K., James H. Stock, and Liang Sun. 2019. Weak Instruments in IV Regression: Theory and Practice. Annual Review of Economics. forthcoming. [Google Scholar]
  4. Angrist, Joshua D., Guido W. Imbens, and Alan Krueger. 1999. Jackknife instrumental variables estimation. Journal of Applied Econometrics 14: 57–67. [Google Scholar] [CrossRef]
  5. Billingsley, Patrick. 1986. Probability and Measure. New York: Wiley. [Google Scholar]
  6. Blomquist, Sören, and Matz Dahlberg. 1999. Small sample properties of LIML and jackknife IV estimators: Experiments with weak instruments. Journal of Applied Econometrics 14: 69–88. [Google Scholar] [CrossRef]
  7. Blomquist, Sören, and Matz Dahlberg. 2006. The case against Jive: A comment. Journal of Applied Econometrics 21: 839–41. [Google Scholar] [CrossRef]
  8. Bound, John, David A. Jaeger, and Regina M. Baker. 1995. Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association 90: 443–50. [Google Scholar] [CrossRef]
  9. Brooks, Stephen P. 1996. Quantitative Convergence Diagnosis for MCMC via CUSUMS. Technical Report. Bristol: University of Bristol. [Google Scholar]
  10. Brooks, Stephen P., and Gareth O. Roberts. 1998. Assessing convergence of Markov chain Monte Carlo algorithms. Statistics and Computing 8: 319–35. [Google Scholar] [CrossRef]
  11. Buse, Adolf. 1992. The bias of instrumental variable estimators. Econometrica 60: 173–80. [Google Scholar] [CrossRef]
  12. Casella, George, and Edward I. George. 1992. Explaining the Gibbs sampler. The American Statistician 46: 167–74. [Google Scholar]
  13. Chao, John C., and Peter C. B. Phillips. 1998. Posterior distributions in limited information analysis of the simultaneous equations model using the Jeffreys prior. Journal of Econometrics 87: 49–86. [Google Scholar] [CrossRef]
  14. Chib, Siddhartha, and Edward Greenberg. 1995. Understanding the Metropolis–Hastings algorithm. The American Statistician 49: 327–35. [Google Scholar]
  15. Chib, Siddhartha, and Edward Greenberg. 1996. Markov chain Monte Carlo simulation methods in econometrics. Econometric Theory 12: 409–31. [Google Scholar] [CrossRef]
  16. Conley, Timothy G., Christian B. Hansen, Robert McCulloch, and Peter E. Rossi. 2008. A semi-parametric Bayesian approach to the instrumental variable problem. Journal of Econometrics 144: 276–305. [Google Scholar] [CrossRef]
  17. Cowles, Mary K., and Bradley P. Carlin. 1996. Markov chain Monte Carlo convergence diagnosis: A comparative review. Journal of the American Statistical Association 91: 883–904. [Google Scholar] [CrossRef]
  18. Davidson, Russell, and James G. MacKinnon. 2006a. The case against JIVE. Journal of Applied Econometrics 21: 827–33. [Google Scholar] [CrossRef]
  19. Davidson, Russell, and James G. MacKinnon. 2006b. The case against JIVE: Reply. Journal of Applied Econometrics 21: 843–44. [Google Scholar] [CrossRef]
  20. Drèze, Jacques H. 1976. Bayesian limited information analysis of the simultaneous equations model. Econometrica 44: 1045–75. [Google Scholar] [CrossRef]
  21. Drèze, Jacques H., and Juan-Antonio A. Morales. 1976. Bayesian full information analysis of simultaneous equations. Journal of the American Statistical Association 71: 329–54. [Google Scholar]
  22. Drèze, Jacques H., and Jean François Richard. 1983. Bayesian analysis of simultaneous equation systems. In Handbook of Econometrics. Edited by Zvi Griliches and Michael Intriligator. Amsterdam: North Holland. [Google Scholar]
  23. Dwivedi, Tryambakeshwar D., and Virendra K. Srivastava. 1984. Exact finite sample properties of double k-class estimators in simultaneous equations. Journal of Econometrics 25: 263–83. [Google Scholar] [CrossRef]
  24. Fuller, Wayne. A. 1977. Some properties of a modification of the limited information estimator. Econometrica 45: 939–53. [Google Scholar] [CrossRef]
  25. Gao, Chuanming, and Kajal Lahiri. 2000a. Further consequences of viewing LIML as an iterated Aitken estimator. Journal of Econometrics 98: 187–202. [Google Scholar] [CrossRef]
  26. Gao, Chuanming, and Kajal Lahiri. 2000b. MCMC algorithms for two recent Bayesian limited information estimators. Economics Letters 66: 121–26. [Google Scholar] [CrossRef]
  27. Gao, Chuanming, and Kajal Lahiri. 2001. A Note on the double k-class estimator in simultaneous equations. Journal of Econometrics 108: 101–11. [Google Scholar] [CrossRef]
  28. Geweke, John. 1996. Bayesian reduced rank regression in econometrics. Journal of Econometrics 75: 121–46. [Google Scholar] [CrossRef]
  29. Hastings, Wilfred K. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57: 97–109. [Google Scholar] [CrossRef]
  30. Kleibergen, Frank. 1997. Equality Restricted Random Variables: Densities and Sampling Algorithms. Econometric Institute Report 9662/A. Rotterdam: Erasmus University Rotterdam. [Google Scholar]
  31. Kleibergen, Frank. 1998. Conditional Densities in Econometrics. Econometric Institute Research Papers EI 9853, Erasmus School of Economics (ESE), Discussion Paper. Rotterdam: Erasmus University Rotterdam. [Google Scholar]
  32. Kleibergen, Frank, and Herman K. van Dijk. 1998. Bayesian simultaneous equation analysis using reduced rank structures. Econometric Theory 14: 701–43. [Google Scholar] [CrossRef]
  33. Kleibergen, Frank, and Eric Zivot. 2003. Bayesian and classical approaches to instrumental variable regression. Journal of Econometrics 114: 29–72. [Google Scholar] [CrossRef]
  34. MacEachern, Steven N., and L. Mark Berliner. 1994. Subsampling the Gibbs sampler. The American Statistician 48: 188–90. [Google Scholar]
  35. Maddala, Gangadharrao S. 1976. Weak priors and sharp posteriors in simultaneous equation models. Econometrica 44: 345–51. [Google Scholar] [CrossRef]
  36. Maddala, Gangadharrao S., and Jinook Jeong. 1992. On the exact small sample distribution of the instrumental variable estimator. Econometrica 60: 181–83. [Google Scholar] [CrossRef]
  37. Mariano, Roberto S., and James B. McDonald. 1979. A note on the distribution functions of LIML and 2SLS structural coefficient in exactly identified case. Journal of the American Statistical Association 74: 847–48. [Google Scholar] [CrossRef]
  38. Metropolis, Nicholas, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. 1953. Equation of state calculations by fast computing machines. Journal of Chemical Physics 21: 1087–92. [Google Scholar] [CrossRef]
  39. Ni, Shawn, and Dongchu Sun. 2003. Noninformative priors and frequentist risks of Bayesian estimators of vector-autoregressive models. Journal of Econometrics 115: 159–97. [Google Scholar] [CrossRef]
  40. Ni, Shawn, Dongchu Sun, and Xiaoqian Sun. 2007. Intrinsic Bayesian estimation of vector autoregression impulse responses. Journal of Business & Economic Statistics 25: 163–76. [Google Scholar]
  41. Pagan, Adrian R. 1979. Some consequences of viewing LIML as an iterated Aitken estimator. Economics Letters 3: 369–72. [Google Scholar] [CrossRef]
  42. Percy, David F. 1992. Prediction for seemingly unrelated regressions. Journal of the Royal Statistical Society B 54: 243–52. [Google Scholar] [CrossRef]
  43. Poirier, Dale J. 1995. Intermediate Statistics and Econometrics. Cambridge: MIT Press. [Google Scholar]
  44. Poirier, Dale J. 1996. Prior beliefs about fit. In Bayesian Statistics 5: Proceedings of the Fifth Valencia International Meeting. Edited by Jose M. Bernardo, James O. Berger, A. Philip Dawid and Adrian F. M. Smith. Oxford: Clarendon Press. [Google Scholar]
  45. Radchenko, Stanislav, and Hiroki Tsurumi. 2006. Limited information Bayesian analysis of a simultaneous equation with an autocorrelated error term and its application to the U.S. gasoline market. Journal of Econometrics 133: 31–49. [Google Scholar] [CrossRef]
  46. Raftery, Adrian E., and Stephen M. Lewis. 1992. How many iterations in the Gibbs sampler? In Bayesian Statistics 4. Proceedings of the Fourth Valencia International Meeting. Edited by Jose M. Bernardo, Adrian F. M. Smith, A. Philip Dawid and James O. Berger. Oxford: Oxford University Press. [Google Scholar]
  47. Smith, Adrian F. M., and Gareth O. Roberts. 1993. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. Journal of the Royal Statistical Society B 55: 3–23. [Google Scholar] [CrossRef]
  48. Staiger, Douglas, and James H. Stock. 1997. Instrumental variables regression with weak instruments. Econometrica 65: 557–86. [Google Scholar] [CrossRef]
  49. Tierney, Luke. 1994. Markov chains for exploring posterior distributions. Annals of Statistics 22: 1701–67. [Google Scholar] [CrossRef]
  50. Tsurumi, Hiroki. 1990. Comparing Bayesian and non-Bayesian limited information estimators. In Bayesian and Likelihood Methods in Statistics and Econometrics. Edited by Seymour Geisser, James S. Hodges, S. James Press and Arnold Zellner. Amsterdam: North-Holland. [Google Scholar]
  51. Young, Alwyn. 2019. Consistency without Inference: Instrumental Variables in Practical Application. London: London School of Economics. [Google Scholar]
  52. Yu, Bin, and Per Mykland. 1998. Looking at Markov samplers through Cusum path plots: A simple diagnostic idea. Statistics and Computing 8: 275–86. [Google Scholar] [CrossRef]
  53. Zellner, Arnold. 1971. An Introduction to Bayesian Inference in Econometrics. New York: Wiley. [Google Scholar]
  54. Zellner, Arnold. 1978. Estimation of functions of population means and regression coefficients: A minimum expected loss (MELO) approach. Journal of Econometrics 8: 127–58. [Google Scholar] [CrossRef]
  55. Zellner, Arnold. 1986. Further results on Bayesian minimum expected loss (MELO) estimates and posterior distributions for structural coefficients. In Advances in Econometrics. Edited by Daniel L. Slottje. Amsterdam: Elsevier, vol. 5, pp. 171–82. [Google Scholar]
  56. Zellner, Arnold. 1994. Bayesian and Non-Bayesian estimation using balanced loss functions. In Statistical Decision Theory and Related Topics. Edited by Shanti S. Gupta and James O. Berger. New York: Springer, vol. V, chp. 28. pp. 377–90. [Google Scholar]
  57. Zellner, Arnold. 1998. The finite sample properties of simultaneous equations’ estimates and estimators: Bayesian and non-Bayesian approaches. Journal of Econometrics 83: 185–212. [Google Scholar] [CrossRef]
  58. Zellner, Arnold, Luc Bauwens, and Herman K. van Dijk. 1988. Bayesian specification analysis and estimation of simultaneous equation models using Monte Carlo methods. Journal of Econometrics 38: 39–72. [Google Scholar] [CrossRef]
  59. Zellner, Arnold, Tomohiro Ando, Nalan Baştürk, Lennart Hoogerheide, and Herman K. van Dijk. 2014. Bayesian analysis of instrumental variable models: Acceptance-rejection within direct Monte Carlo. Econometric Reviews 33: 3–35. [Google Scholar] [CrossRef]
1
Zellner (1998) and Zellner et al. (2014) contain a comprehensive review of the finite sample properties of SEM estimators, and emphasize the need for finite sample optimal estimation procedures for such models. Andrews and Stock (2005) review recent developments in methods that deal with weak instruments in IV regression models, and present new testing results under “many weak-IV asymptotics”.
2
There has been a lot of interest in the estimation of LISEM with weak instruments. See Buse (1992); Bound et al. (1995); Staiger and Stock (1997); Angrist et al. (1999); Blomquist and Dahlberg (1999), among others. More recently, Andrews et al. (2019) review the literature on weak instruments in linear IV regression, and suggest that weak instruments remain an important issue in empirical practice.
3
Geweke (1996) considered a more general specification. To facilitate comparison, for the Geweke approach only, we have denoted Y = (Y2, y1).
4
The expressions for the conditional densities of Π2 and β given in (Geweke 1996, expressions (11) and (13)) contain some typographical errors and are corrected here in (11) and (12).
5
Note that neither this formulation nor the singular value decomposition changes the identification status of the LISEM specified by (1) and (2). If rank(Π2) < (m − 1), β is locally nonidentified.
6
This is the prior suggested in Drèze (1976). Zellner (1971) and Zellner et al. (1988) used a similar prior with −(m + 1)/2 in the exponent.
7
Zellner et al. (2014) suggested a variant of this approach called Acceptance-Rejection within Direct Monte Carlo (ARDMC) to evaluate the posterior density, and report substantial gain in computational efficiency, particularly with weak instruments. They also studied the existence conditions for posterior moments of the parameters of interest in terms of the number of available instruments being greater than the number of endogenous variables plus the order of the moment.
8
Gao and Lahiri (2000b) illustrated the algorithm empirically with a simple labor supply model.
9
See also Kleibergen (1997, 1998). Note that their claimed relationship that |J(Φ, (Π2, β, λ))| ≥ |J(Φ, (Π2, β, λ))|λ=0 is analytically incorrect; see the Appendix A for proof.
10
In practice, there is often a concern about possible underestimation of the true length of the burn-in period using the Raftery and Lewis method if the quantile of interest is not properly prespecified; see Brooks and Roberts (1998).
11
Tsurumi (1990) used ω = 0.75 for Zellner’s extended MELO (ZEM) in his experiments. BMOM and ZEF are almost identical in our context.
12
We found that the median-bias and dispersion of the posterior density of β from the Geweke (1996) approach increase as τ 2 gets larger. Although one might suspect that the convergence of the Gibbs sampler could be slow with smaller values of τ 2, our convergence diagnostics did not confirm this concern.
13
We do not report cases with |ρ| = 0.99 or 1. As pointed out by Maddala and Jeong (1992), when the instruments are weak and |ρ| is very close to one, the exact finite sample distribution of the IV estimator is bimodal. Our experiments show that the marginal posterior density of β from the Bayesian approaches exhibits a similar pattern.
14
Denote Ω = (ω11, ω12; ω12, ω22). Using Σ = CΩC′, we have σ11 = ω11 − 2βω12 + β²ω22, σ12 = ω12 − βω22, and σ22 = ω22. Letting ρ = σ12/(σ11σ22)^{1/2}, the second relationship may be rewritten as
β − ω12/ω22 = −ρ(σ11/ω22)^{1/2}.
If Σ is normalized as in (35) with σ11 = ω22 = 1, then ω12 = β + ρ. Therefore, in our context, given β = 1, the sign and magnitude of ρ (or ω12) have a special significance.
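For completeness, the algebra in this footnote can be checked symbolically. The sketch below (Python/SymPy) assumes the triangular transformation C = [[1, −β], [0, 1]], which is consistent with the three relations stated above but is an inference from them rather than a quotation of the paper’s equations.

```python
# Symbolic check of sigma11 = omega11 - 2*beta*omega12 + beta^2*omega22,
# sigma12 = omega12 - beta*omega22, sigma22 = omega22, and of omega12 = beta + rho
# under the normalization sigma11 = omega22 = 1.  C is an assumed triangular map.
import sympy as sp

beta, w11, w12, w22 = sp.symbols('beta omega11 omega12 omega22', real=True)
C = sp.Matrix([[1, -beta], [0, 1]])
Omega = sp.Matrix([[w11, w12], [w12, w22]])
Sigma = (C * Omega * C.T).expand()

print(Sigma[0, 0])  # omega11 - 2*beta*omega12 + beta**2*omega22
print(Sigma[0, 1])  # omega12 - beta*omega22
print(Sigma[1, 1])  # omega22

rho = Sigma[0, 1] / sp.sqrt(Sigma[0, 0] * Sigma[1, 1])
# Impose sigma11 = 1 and omega22 = 1; rho then reduces to omega12 - beta.
sol = sp.solve([sp.Eq(Sigma[0, 0], 1), sp.Eq(w22, 1)], [w11, w22], dict=True)[0]
print(sp.simplify(rho.subs(sol)))  # prints omega12 - beta, i.e., omega12 = beta + rho
```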
15
Medians were also calculated. Since they were very close to the corresponding means in all our experiments, we did not report them in this paper.
16
When k2 = (m − 1), a diffuse prior in (20) for the linear model implies that the prior for the parameters of the LISEM (4) is
p(β, π1, Π1, Π2, Ω) ∝ |Ω|^{−(k+m+1)/2} |Π2|,
and the prior for the parameters of the LISEM (1) and (2) is
p(β, γ, Π1, Π2, Σ) ∝ |Σ|^{−(k+m+1)/2} |Π2|,
which is identical to the Jeffreys prior; see also expressions (22) and (42) in CP.
17
Note that the relationship between the standardized parameter vector and the original parameter vector involves the nuisance parameters, cf. Chao and Phillips (1998). However, when a SEM is in orthonormal canonical form (i.e., the exogenous regressors are orthonormal and the disturbance covariance matrix Ω is an identity matrix), both the density of the random parameter β from the CP approach and the probability density of the classical LIML estimator for β are conditional on this information.
18
Ackerberg and Devereux (2006) and Blomquist and Dahlberg (2006) have suggested some ad hoc adjustments to the original JIVE formula to improve its performance.
Table 1. T = 50, ρ = 0.60, k2 = 4, R2 = 0.40.
Estimator        Mean   Std    RMSE   MAD
OLS              1.348  0.089  0.359  0.348
2SLS             1.045  0.144  0.151  0.121
MELO             1.115  0.126  0.171  0.144
BMOM             0.967  0.127  0.131  0.102
LIML             0.998  0.152  0.152  0.118
Fuller1          1.015  0.147  0.148  0.116
Fuller4          1.061  0.136  0.149  0.120
JIVE             0.957  0.178  0.183  0.141
Geweke_Mode      1.056  0.140  0.151  0.122
Geweke_Median    1.031  0.143  0.146  0.116
LIML_GS_Mode     1.061  0.139  0.152  0.123
LIML_GS_Median   1.036  0.142  0.146  0.116
CP_Mode          1.046  0.144  0.151  0.121
CP_Median        1.021  0.145  0.147  0.115
KVD_Mode         1.090  0.148  0.173  0.143
KVD_Median       1.079  0.137  0.158  0.130
Notes: Number of replications: 400. Geweke: nburn = 100, n = 2000. CP: nburn_GS = 100, nburn_MH = 100, n = 5000, acceptance rate = 0.482 (0.015). KVD: nburn_GS = 100, nburn_MH = 100, n = 4000, acceptance rate = 0.215 (0.136).
Table 2. T = 50, ρ = 0.60, k2 = 1, R2 = 0.10.
Estimator        Mean   Std    RMSE   MAD
OLS              1.537  0.111  0.548  0.537
2SLS             1.030  0.345  0.346  0.267
MELO             1.173  0.262  0.314  0.248
BMOM             0.881  0.264  0.290  0.229
LIML             1.030  0.345  0.346  0.267
Fuller1          1.107  0.300  0.319  0.245
Fuller4          1.250  0.219  0.332  0.277
JIVE             0.803  0.491  0.529  0.409
Geweke_Mode      1.089  0.331  0.343  0.265
Geweke_Median    0.907  0.518  0.526  0.358
LIML_GS_Mode     1.091  0.313  0.326  0.255
LIML_GS_Median   0.778  1.386  1.404  0.592
CP_Mode          1.108  0.309  0.327  0.256
CP_Median        0.797  1.383  1.398  0.580
KVD_Mode         n.a.   n.a.   n.a.   n.a.
KVD_Median       n.a.   n.a.   n.a.   n.a.
Notes: Number of replications: 400. Geweke: nburn = 100, n = 3000. CP: nburn_GS = 200, nburn_MH = 200, n = 10,000, acceptance rate = 0.551 (0.023).
Table 3. T = 50, ρ = 0.60, k2 = 4, R2 = 0.10.
Estimator        Mean   Std    RMSE   MAD
OLS              1.539  0.111  0.550  0.539
2SLS             1.231  0.279  0.362  0.296
MELO             1.366  0.186  0.411  0.368
BMOM             0.943  0.184  0.193  0.154
LIML             1.043  0.579  0.581  0.386
Fuller1          1.143  0.367  0.394  0.307
Fuller4          1.281  0.244  0.372  0.307
JIVE             0.816  0.568  0.597  0.474
Geweke_Mode      1.244  0.287  0.377  0.309
Geweke_Median    1.204  0.309  0.370  0.300
LIML_GS_Mode     1.260  0.268  0.373  0.308
LIML_GS_Median   1.220  0.298  0.370  0.300
CP_Mode          1.230  0.293  0.372  0.301
CP_Median        1.194  0.315  0.370  0.298
KVD_Mode         1.351  0.384  0.520  0.389
KVD_Median       1.381  0.367  0.529  0.405
Notes: Number of replications: 400. Geweke: nburn = 100, n = 2000. CP: nburn_GS = 100, nburn_MH = 100, n = 10,000, acceptance rate = 0.475 (0.010). KVD: nburn_GS = 100, nburn_MH = 100, n = 3000, acceptance rate = 0.400 (0.217).
Table 4. T = 50, ρ = 0.60, k2 = 9, R2 = 0.10.
Estimator        Mean   Std    RMSE   MAD
OLS              1.535  0.111  0.546  0.535
2SLS             1.363  0.221  0.425  0.371
MELO             1.463  0.139  0.483  0.463
BMOM             0.969  0.132  0.136  0.106
LIML             1.090  0.864  0.869  0.534
Fuller1          1.182  0.479  0.512  0.366
Fuller4          1.302  0.291  0.419  0.333
JIVE             0.706  0.933  0.978  0.728
Geweke_Mode      1.357  0.239  0.430  0.367
Geweke_Median    1.350  0.245  0.427  0.361
LIML_GS_Mode     1.375  0.218  0.328  0.380
LIML_GS_Median   1.367  0.228  0.432  0.374
CP_Mode          1.215  0.629  0.665  0.466
CP_Median        1.255  0.388  0.464  0.346
KVD_Mode         1.550  0.376  0.666  0.556
KVD_Median       1.573  0.322  0.657  0.576
Notes: Number of replications: 400. Geweke: nburn = 100, n = 1000. CP: nburn_GS = 200, nburn_MH = 200, n = 10,000, acceptance rate = 0.242 (0.040). KVD: nburn_GS = 200, nburn_MH = 100, n = 10,000, acceptance rate = 0.267 (0.188).
Table 5. T = 100, ρ = 0.60, k2 = 4, R2 = 0.10.
Estimator        Mean   Std    RMSE   MAD
OLS              1.538  0.077  0.543  0.538
2SLS             1.138  0.208  0.250  0.200
MELO             1.257  0.156  0.301  0.264
BMOM             0.954  0.156  0.163  0.127
LIML             1.023  0.280  0.281  0.210
Fuller1          1.069  0.250  0.259  0.197
Fuller4          1.171  0.195  0.259  0.209
JIVE             0.914  0.320  0.331  0.262
Geweke_Mode      1.149  0.215  0.262  0.208
Geweke_Median    1.111  0.228  0.254  0.198
LIML_GS_Mode     1.162  0.205  0.261  0.209
LIML_GS_Median   1.117  0.225  0.254  0.199
CP_Mode          1.155  0.207  0.259  0.206
CP_Median        1.107  0.228  0.252  0.196
KVD_Mode         1.233  0.205  0.310  0.258
KVD_Median       1.215  0.210  0.301  0.243
Notes: Number of replications: 400. Geweke: nburn = 100, n = 2000. CP: nburn_GS = 200, nburn_MH = 200, n = 10,000, acceptance rate = 0.616 (0.008). KVD: nburn_GS = 200, nburn_MH = 100, n = 10,000, acceptance rate = 0.312 (0.175).
Table 6. T = 100, ρ = 0.60, k2 = 9, R2 = 0.10.
Estimator        Mean   Std    RMSE   MAD
OLS              1.542  0.078  0.548  0.542
2SLS             1.258  0.197  0.325  0.274
MELO             1.376  0.134  0.399  0.376
BMOM             0.972  0.132  0.135  0.110
LIML             1.003  0.437  0.437  0.291
Fuller1          1.071  0.311  0.319  0.243
Fuller4          1.180  0.233  0.294  0.232
JIVE             0.927  0.408  0.414  0.333
Geweke_Mode      1.253  0.201  0.323  0.269
Geweke_Median    1.238  0.206  0.315  0.261
LIML_GS_Mode     1.265  0.196  0.330  0.278
LIML_GS_Median   1.247  0.202  0.319  0.266
CP_Mode          1.196  0.264  0.329  0.266
CP_Median        1.192  0.232  0.301  0.240
KVD_Mode         1.371  0.278  0.464  0.382
KVD_Median       1.395  0.269  0.478  0.397
Notes: Number of replications: 400. Geweke: nburn = 100, n = 1000. CP: nburn_GS = 200, nburn_MH = 200, n = 6000, acceptance rate = 0.434 (0.029). KVD: nburn_GS = 200, nburn_MH = 200, n = 10,000, acceptance rate = 0.210 (0.179).
Table 7. T = 100, ρ = 0.60, k2 = 4, R2 = 0.05.
Estimator        Mean   Std    RMSE   MAD
OLS              1.565  0.080  0.571  0.565
2SLS             1.254  0.282  0.380  0.309
MELO             1.376  0.184  0.419  0.379
BMOM             0.953  0.183  0.189  0.150
LIML             1.052  0.584  0.586  0.392
Fuller1          1.158  0.377  0.409  0.307
Fuller4          1.296  0.244  0.384  0.317
JIVE             0.833  0.638  0.659  0.527
Geweke_Mode      1.264  0.285  0.388  0.314
Geweke_Median    1.224  0.316  0.387  0.305
LIML_GS_Mode     1.274  0.283  0.394  0.320
LIML_GS_Median   1.232  0.310  0.387  0.306
CP_Mode          1.263  0.295  0.395  0.318
CP_Median        1.223  0.316  0.387  0.304
KVD_Mode         1.388  0.389  0.549  0.418
KVD_Median       1.394  0.315  0.504  0.414
Notes: Number of replications: 400. Geweke: nburn = 100, n = 2000. CP: nburn_GS = 100, nburn_MH = 100, n = 4000, acceptance rate = 0.611 (0.009). KVD: nburn_GS = 200, nburn_MH = 200, n = 8000, acceptance rate = 0.442 (0.224).
Table 8. T = 100, ρ = 0.60, k2 = 9, R2 = 0.05.
Estimator        Mean   Std    RMSE   MAD
OLS              1.574  0.076  0.579  0.574
2SLS             1.386  0.219  0.444  0.394
MELO             1.478  0.131  0.496  0.478
BMOM             0.979  0.129  0.131  0.105
LIML             1.139  0.882  0.893  0.545
Fuller1          1.224  0.477  0.527  0.389
Fuller4          1.335  0.280  0.437  0.358
JIVE             0.844  0.823  0.838  0.663
Geweke_Mode      1.385  0.243  0.455  0.395
Geweke_Median    1.380  0.246  0.453  0.390
LIML_GS_Mode     1.397  0.230  0.459  0.404
LIML_GS_Median   1.387  0.236  0.453  0.396
CP_Mode          1.338  0.465  0.575  0.433
CP_Median        1.337  0.311  0.459  0.376
KVD_Mode         1.584  0.462  0.745  0.592
KVD_Median       1.608  0.368  0.711  0.610
Notes: Number of replications: 400. Geweke: nburn = 100, n = 2000. CP: nburn_GS = 200, nburn_MH = 200, n = 10,000, acceptance rate = 0.433 (0.035). KVD: nburn_GS = 200, nburn_MH = 200, n = 10,000, acceptance rate = 0.371 (0.221).
Table 9. T = 100, ρ = 0.20, k2 = 4, R2 = 0.10.
Estimator        Mean   Std    RMSE   MAD
OLS              1.172  0.090  0.194  0.174
2SLS             1.046  0.253  0.257  0.206
MELO             1.083  0.189  0.206  0.164
BMOM             0.859  0.190  0.237  0.195
LIML             1.017  0.333  0.333  0.260
Fuller1          1.029  0.298  0.299  0.236
Fuller4          1.059  0.235  0.242  0.192
JIVE             0.957  0.417  0.419  0.340
Geweke_Mode      1.053  0.251  0.257  0.200
Geweke_Median    1.041  0.267  0.270  0.214
LIML_GS_Mode     1.058  0.244  0.251  0.197
LIML_GS_Median   1.044  0.265  0.269  0.212
CP_Mode          1.054  0.255  0.261  0.205
CP_Median        1.040  0.271  0.274  0.218
KVD_Mode         1.131  0.368  0.391  0.237
KVD_Median       1.161  0.328  0.365  0.245
Notes: Number of replications: 400. Geweke: nburn = 100, n = 1000. CP: nburn_GS = 100, nburn_MH = 100, n = 5000, acceptance rate = 0.615 (0.011). KVD: nburn_GS = 100, nburn_MH = 100, n = 1000, acceptance rate = 0.548 (0.200).
Table 10. T = 100, ρ = 0.20, k2 = 9, R2 = 0.10.
Estimator        Mean   Std    RMSE   MAD
OLS              1.179  0.096  0.203  0.181
2SLS             1.085  0.214  0.230  0.182
MELO             1.124  0.146  0.192  0.154
BMOM             0.823  0.143  0.228  0.193
LIML             0.992  0.397  0.397  0.301
Fuller1          1.015  0.347  0.347  0.270
Fuller4          1.055  0.267  0.273  0.216
JIVE             0.991  0.481  0.481  0.390
Geweke_Mode      1.084  0.218  0.234  0.184
Geweke_Median    1.079  0.223  0.237  0.187
LIML_GS_Mode     1.087  0.212  0.229  0.181
LIML_GS_Median   1.082  0.218  0.233  0.185
CP_Mode          1.054  0.308  0.313  0.223
CP_Median        1.063  0.254  0.262  0.207
KVD_Mode         1.249  0.234  0.342  0.283
KVD_Median       1.286  0.235  0.370  0.308
Notes: Number of replications: 400. Geweke: nburn = 100, n = 1000. CP: nburn_GS = 100, nburn_MH = 200, n = 5000, acceptance rate = 0.456 (0.023). KVD: nburn_GS = 100, nburn_MH = 100, n = 5000, acceptance rate = 0.413 (0.202).
Table 11. T = 50, ρ = 0.95, k2 = 4, R2 = 0.10.
Estimator        Mean   Std    RMSE   MAD
OLS              1.846  0.052  0.848  0.846
2SLS             1.359  0.180  0.402  0.363
MELO             1.572  0.118  0.584  0.572
BMOM             1.057  0.118  0.131  0.102
LIML             0.988  0.404  0.404  0.255
Fuller1          1.169  0.196  0.259  0.221
Fuller4          1.417  0.120  0.434  0.417
JIVE             0.637  0.611  0.711  0.478
Geweke_Mode      1.347  0.302  0.460  0.358
Geweke_Median    1.277  0.377  0.468  0.305
LIML_GS_Mode     1.338  0.155  0.372  0.345
LIML_GS_Median   1.252  0.194  0.318  0.281
CP_Mode          1.314  0.162  0.353  0.325
CP_Median        1.234  0.194  0.304  0.266
KVD_Mode         1.411  0.379  0.559  0.428
KVD_Median       1.462  0.463  0.654  0.514
Notes: Number of replications: 400. Geweke: nburn = 100, n = 3000. CP: nburn_GS = 200, nburn_MH = 200, n = 10,000, acceptance rate = 0.476 (0.010). KVD: nburn_GS = 200, nburn_MH = 200, n = 10,000, acceptance rate = 0.036 (0.038).
Table 12. T = 100, ρ = 0.95, k2 = 4, R2 = 0.10.
Estimator        Mean   Std    RMSE   MAD
OLS              1.850  0.033  0.851  0.850
2SLS             1.230  0.126  0.262  0.234
MELO             1.414  0.094  0.425  0.414
BMOM             1.044  0.095  0.105  0.082
LIML             1.025  0.170  0.172  0.132
Fuller1          1.095  0.142  0.171  0.143
Fuller4          1.264  0.099  0.282  0.265
JIVE             0.873  0.199  0.236  0.191
Geweke_Mode      1.216  0.117  0.246  0.223
Geweke_Median    1.150  0.127  0.197  0.172
LIML_GS_Mode     1.227  0.118  0.256  0.235
LIML_GS_Median   1.158  0.128  0.203  0.180
CP_Mode          1.221  0.116  0.250  0.228
CP_Median        1.154  0.127  0.200  0.176
KVD_Mode         1.258  0.207  0.331  0.280
KVD_Median       1.252  0.294  0.387  0.260
Notes: Number of replications: 400. Geweke: nburn = 100, n = 3000. CP: nburn_GS = 200, nburn_MH = 200, n = 10,000, acceptance rate = 0.626 (0.007). KVD: nburn_GS = 200, nburn_MH = 200, n = 10,000, acceptance rate = 0.022 (0.022).
Table 13. Performance of BMOM and KVD when ρ < 0.
Estimator     Mean   Std    RMSE   MAD    Remarks
T = 50, ρ = −0.60, k2 = 4, R2 = 0.40
BMOM          0.852  0.129  0.196  0.165  Compare Table 1.
KVD_Mode      0.971  0.150  0.152  0.119  Acceptance rate for
KVD_Median    0.999  0.153  0.152  0.119  KVD: 0.713 (0.130)
T = 50, ρ = −0.60, k2 = 4, R2 = 0.10
BMOM          0.551  0.191  0.488  0.453  Compare Table 3.
KVD_Mode      0.851  0.327  0.359  0.271  Acceptance rate for
KVD_Median    0.934  0.341  0.347  0.267  KVD: 0.680 (0.133)
T = 50, ρ = −0.60, k2 = 9, R2 = 0.10
BMOM          0.420  0.136  0.600  0.580  Compare Table 4.
KVD_Mode      0.857  0.367  0.393  0.296  Acceptance rate for
KVD_Median    0.927  0.399  0.406  0.291  KVD: 0.482 (0.155)
T = 100, ρ = −0.60, k2 = 4, R2 = 0.10
BMOM          0.676  0.160  0.362  0.326  Compare Table 5.
KVD_Mode      0.901  0.213  0.235  0.186  Acceptance rate for
KVD_Median    0.964  0.237  0.239  0.190  KVD: 0.772 (0.110)
T = 100, ρ = −0.60, k2 = 9, R2 = 0.10
BMOM          0.531  0.129  0.486  0.469  Compare Table 6.
KVD_Mode      0.903  0.240  0.258  0.200  Acceptance rate for
KVD_Median    0.952  0.247  0.252  0.198  KVD: 0.614 (0.138)
T = 100, ρ = −0.60, k2 = 4, R2 = 0.05
BMOM          0.514  0.181  0.519  0.486  Compare Table 7.
KVD_Mode      0.813  0.306  0.358  0.285  Acceptance rate for
KVD_Median    0.908  0.362  0.373  0.287  KVD: 0.720 (0.128)
T = 100, ρ = −0.60, k2 = 9, R2 = 0.05
BMOM          0.407  0.131  0.608  0.593  Compare Table 8.
KVD_Mode      0.848  0.424  0.450  0.312  Acceptance rate for
KVD_Median    0.907  0.349  0.361  0.275  KVD: 0.585 (0.144)
T = 100, ρ = −0.20, k2 = 4, R2 = 0.10
BMOM          0.753  0.195  0.314  0.266  Compare Table 9.
KVD_Mode      1.002  0.267  0.267  0.208  Acceptance rate for
KVD_Median    1.037  0.291  0.293  0.218  KVD: 0.699 (0.162)
T = 100, ρ = −0.20, k2 = 9, R2 = 0.10
BMOM          0.673  0.159  0.364  0.328  Compare Table 10.
KVD_Mode      1.093  0.318  0.331  0.233  Acceptance rate for
KVD_Median    1.129  0.279  0.307  0.241  KVD: 0.553 (0.181)
T = 50, ρ = −0.95, k2 = 4, R2 = 0.10
BMOM          0.427  0.120  0.585  0.573  Compare Table 11.
KVD_Mode      0.737  0.244  0.359  0.312  Acceptance rate for
KVD_Median    0.836  0.246  0.295  0.239  KVD: 0.173 (0.112)
T = 100, ρ = −0.95, k2 = 4, R2 = 0.10
BMOM          0.589  0.097  0.422  0.411  Compare Table 12.
KVD_Mode      0.815  0.155  0.241  0.209  Acceptance rate for
KVD_Median    0.889  0.153  0.189  0.156  KVD: 0.179 (0.103)
Notes: Number of replications: 500.
