Bayesian Logistic Regression Model for Sub-Areas

Chen, Lu; Nandram, Balgobin

doi:10.3390/stats6010013

Open AccessArticle

Bayesian Logistic Regression Model for Sub-Areas

by

Lu Chen

^1,*

and

Balgobin Nandram

²

¹

National Institute of Statistical Sciences, 1750 K Street NW Suite 1100, Washington, DC 20006, USA

²

Worcester Polytechnic Institute, Stratton Hall 103, Worcester, MA 01609, USA

^*

Author to whom correspondence should be addressed.

Stats 2023, 6(1), 209-231; https://doi.org/10.3390/stats6010013

Submission received: 27 October 2022 / Revised: 4 January 2023 / Accepted: 17 January 2023 / Published: 29 January 2023

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

Many population-based surveys have binary responses from a large number of individuals in each household within small areas. One example is the Nepal Living Standards Survey (NLSS II), in which health status binary data (good versus poor) for each individual from sampled households (sub-areas) are available in the sampled wards (small areas). To make an inference for the finite population proportion of individuals in each household, we use the sub-area logistic regression model with reliable auxiliary information. The contribution of this model is twofold. First, we extend an area-level model to a sub-area level model. Second, because there are numerous sub-areas, standard Markov chain Monte Carlo (MCMC) methods to find the joint posterior density are very time-consuming. Therefore, we provide a sampling-based method, the integrated nested normal approximation (INNA), which permits fast computation. Our main goal is to describe this hierarchical Bayesian logistic regression model and to show that the computation is much faster than the exact MCMC method and also reasonably accurate. The performance of our method is studied by using NLSS II data. Our model can borrow strength from both areas and sub-areas to obtain more efficient and precise estimates. The hierarchical structure of our model captures the variation in the binary data reasonably well.

Keywords:

hierarchical Bayesian model; integrated nested normal approximation; MCMC; metropolis sampler; numerical integration; parallel computing

1. Introduction

The Nepal Living Standard Survey (NLSS) II (see [1]) is a two-stage stratified sampling. A random sample of wards (areas) were selected from six strata and 12 households (sub-areas) were selected from each sampled ward. All individuals in each sampled household were interviewed. One interest is on health status, a binary variable. To make smooth estimates of the finite population proportion of the individuals with good health in each household, we focus on hierarchical Bayesian (HB) models with sub-area random effects to obtain reliable “indirect” estimates for numerous small areas or sub-areas. Most of the sample surveys are designed to provide reliable “direct” estimates of interests for large areas or domains (e.g., state level, national level). However, direct estimates are not reliable for areas or domains for which only small samples or no samples are available—see [2].

In many applications, some areas, e.g., states and wards, are sampled; in each sampled area, a sample of sub-areas, e.g., counties and households, is further selected. Ref. [3] proposed a one-fold hierarchical Bayesian logistic regression model and applied the model to NLSS II data. The main objective is to make an inference for the finite population proportion of individuals with a specific character for each area. However, the one-fold model ignores the sub-area level structure in the data. As an extension of [3], we are particularly interested in small area models that can capture the hierarchical structure of the NLSS II data in this paper. Although the one-fold basic models are very popular and in common use in producing reliable estimates, the hierarchical structure of the data and the consistency between the estimates for different levels may not hold. In particular, the sampling designs of many population-based surveys were two-stage stratified sampling as NLSS II. But if we use a one-fold unit level model to fit the data, the sub-area level effects will have been ignored. Ref. [4] studied the case that the data follow a normal model with a two-stage (three-stage) hierarchical structure, while the fitted model has a one-stage (two-stage) hierarchical structure using posterior predictive p-values. Ref. [5] discussed the ability to detect a three-stage model when a two-stage model is actually fitted.

Two-fold models are an important extension of basic small area models. Many authors have considered the problems and proposed these kinds of models. Much of the literature focuses on continuous data. Ref. [6] proposed a sub-area level model which provides model-based estimates that account for the hierarchical structure of data. Two-fold sub-area level models were studied by [2,7,8,9], and many others. This type of model is an area-level model which extends the Fay-Harriot model (see [10]) to the sub-area level. Two-fold nested error regression models were considered by [11,12]. On the other hand, some literature focus on the categorical data. Ref. [13] described a HB model to make an inference about the finite population proportion under two-stage clustering sampling. Ref. [14] extended the Beta-Binomial model to the two-fold model and used Gibbs sampling to obtain the posterior estimates. Ref. [15] showed that the two-fold Beta-Binomial model is preferable over the one-fold one if the data have a hierarchical structure. Ref. [16] extended [15] to accommodate heterogeneous correlations. They used a HB model to make a posterior inference about the finite population proportion of each area, accounting for intracluster correlations. Ref. [17] discussed the sub-area Beta-Binomial model and applied the model to estimate the finite population proportion of healthy individuals in each household covered by the NLSS II, assuming no covariate was available.

Bayesian logistic regression models with random effects are suitable for handling binary data with covariates. Ref. [18] discussed discrimination between the logit and the complementary log-log link functions by using the logistic regression model. Roberts, Ref. [19], discussed logistic regression for the sample survey data (not small area estimation). Ref. [20] showed how to accelerate the Gibbs sampler for a model with latent variables introduced earlier by [21] for Bayesian probit analysis. Ref. [22], discussed the logistic regression model by using the empirical Bayesian approach. Ref. [23] showed how to analyze binary data with covariates to maintain conjugacy for both the logistic and Poisson regression models. The analysis of binary data with covariates under nonignorable nonresponse was discussed by [24]. Ref. [3] proposed a hierarchical Bayesian logistic regression model for binary data in a small area estimation. This model is a unit level model without a sub-area effect. Our two-fold sub-area model is an extension of this logistic regression model. We add the sub-area level random effect into the model which can capture the hierarchical structure of the sampled data. At the same time, we add more hyper-parameters into the model, which make the inference more complicated. However, we propose an approximation method called the integrated nested normal approximation (INNA), which solves the difficulties.

The other side of our application is that there are numerous small areas (households and individuals) and MCMC methods, which involve complicated integrals, and cannot handle them efficiently. “Big data” are defined as data that are too big to comfortably process on a single machine [25]. The researchers considered consensus Monte Carlo methods that split the data across several machines. They proposed algorithms that perform distributed approximate Bayesian analyses in order to minimize the communication between computers. The parallel MCMC methods for non-Gaussian posterior distributions were discussed by [26]. Fortunately, in survey sampling, the design generally uses a stratification which is not artificial, and, in this case, consensus Monte Carlo may not be needed; it will be a good idea for a large stratum.

The integration involved in Bayesian inference is usually intractable, which is true for our logistic regression model. The approximation techniques are desired. The procedure we used to approximate the posterior density of the parameters of the logistic regression sub-area model, INNA, is similar to the integrated nested Laplace approximation (INLA) originally proposed by [27], but they are actually different. INLA is a quite popular algorithm and an alternative to MCMC for big data analysis if the joint posterior density is very complicated. It requires posterior modes, and, for numerous small areas, the computation of modes becomes time-consuming and challenging for the logistic regression model or any generalized linear mixed models. Yet, INLA has found many useful applications, such as in Poisson regression by [28], and in spatial point pattern data by [29]. We note that INLA can be problematic, especially for logistic and Poisson hierarchical regression models, even if the modes can be computed. Ref. [30], attempted to improve INLA using a copula-based correction, which adds complexity to INLA. Our approximation method, INNA, which does not require finding posterior modes, uses a sampling-based procedure accommodated by the multiplication rule of probability. Instead of finding the posterior modes, INNA finds the approximate modes in closed form, facilitated by the empirical logistic transform ([31]) and the second-order Taylor series approximation.

On the other hand, two-fold models can capture the heterogeneity between samples within not only areas but also sub-areas. Many model-based estimation techniques for the sampling variances have been considered in the literature, but most of them for the area-level model: see [32,33,34].

In Section 2, a full description of a sub-area HB logistic regression model is given. In Section 3, we describe the integrated nested normal approximation (INNA) computation method and some theoretical results are provided. The exact MCMC method is presented in Appendix A. The exact method refers to MCMC methods without further approximation. In Section 4, we apply the model to the NLSS II data to provide smoothed estimates of the household proportions of members in good health for both sampled and nonsampled households. Some comparisons between INNA and the exact method are presented. Finally, in Section 5, we make concluding remarks and discuss the future work.

2. Sub-Area Logistic Regression Model

In this section, we discuss the sub-area HB logistic regression model at the unit level. In the NLSS II data, we have binary data (good health versus poor health) for each individual within a household, and these households are within wards. The observations are available at the unit level and so is the reliable auxiliary information. However, the model and method we proposed for small areas and sub-areas is not only for this application on NLSS II data. It can be also applied to other population-based surveys with binary responses which contain small areas or/and sub-areas.

Suppose that there are L small areas (wards) in the finite population and that, within the

i^{th}

area, there are

N_{i}

sub-areas (households). Within the

j^{th}

sub-area, there are

M_{i j}

individuals. We assume that

ℓ (< L)

areas are sampled and a simple random sample of

n_{i} (< N_{i})

households is taken from the

i^{th}

area. All individuals in the sampled households are sampled. Here, we assume the survey weights are the same within all households in each area. Actually, the design is almost self-weighting.

Let

y_{i j k}, k = 1, \dots, m_{i j}, j = 1, \dots, n_{i}, i = 1, \dots \dots, ℓ

denote the binary responses. Let

\underset{\tilde{}}{y} = {(y_{i j k}, k = 1, \dots, m_{i j}, j = 1, \dots, n_{i}, i = 1, \dots \dots, ℓ)}^{'}

. Let

y_{i j} = \sum_{k = 1}^{m_{i j}} y_{i j k}

be the number with response 1 and

m_{i j}

be the total number of people who responded. Let

{\underset{\tilde{}}{x}}_{i j k} = {(1, x_{i j k 1}, \dots, x_{i j k p})}^{'}

be the

(p + 1)

vector with p covariates for individuals and an intercept.

We use P to represent the population proportion and p as the sample proportion. Let

p_{i j}

be the corresponding sample probability of

y_{i j}

,

j = 1, \dots, n_{i}, i = 1, \dots, ℓ

.

The primary interests are the finite population proportions of the households, which are

P_{i j} = \frac{1}{M_{i j}} \sum_{k = 1}^{M_{i j}} y_{i j k}, j = 1, \dots, N_{i}, i = 1, . . \dots, ℓ

and the finite population proportions of the areas, which are

P_{i} = \frac{1}{N_{i}} \sum_{j = 1}^{N_{i}} \sum_{k = 1}^{M_{i j}} y_{i j k}, i = 1, \dots \dots, ℓ .

In the content of the logistic regression model, the two-fold hierarchical Bayesian logistic regression model for the sub-area means,

μ_{i j}

, is

y_{i j k} | \underset{\tilde{}}{β}, ν_{i}, μ_{i j} \overset{i n d}{\sim} Bernoulli \{\frac{e^{{\underset{\tilde{}}{x}}_{i j k}^{'} \underset{\tilde{}}{β} + ν_{i} + μ_{i j}}}{1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} \underset{\tilde{}}{β} + ν_{i} + μ_{i j}}}\}, k = 1, \dots, m_{i j},

μ_{i j} | σ^{2} \overset{i i d}{\sim} Normal (0, σ^{2}), j = 1, \dots, n_{i},

(1)

ν_{i} | δ^{2} \overset{i i d}{\sim} Normal (0, δ^{2}), i = 1, \dots, ℓ,

π (\underset{\tilde{}}{β}, δ^{2}, σ^{2}) \propto \frac{1}{{(1 + δ^{2})}^{2}} \frac{1}{{(1 + σ^{2})}^{2}}, δ^{2} > 0, σ^{2} > 0 .

Here,

μ_{i j}, i = 1, \dots, ℓ; j = 1, \dots, n_{i}

are the sub-area level random effects, which are not in the area-level model in [3].

ν_{i}, i = 1, \dots, ℓ

are the area random effects and

\underset{\tilde{}}{β} = {(β_{0}, β_{1}, \dots, β_{p})}^{'}

are the regression coefficients, with

σ^{2}, δ^{2}

as the variance of the random effects, respectively.

In order to apply our approximation method and make an inference for posterior distribution, we use an equivalent model.

First, we separate

\underset{\tilde{}}{β}

into

β_{0}

and

{\underset{\tilde{}}{β}}_{(0)}

, where

{\underset{\tilde{}}{β}}_{(0)} = {(β_{1}, β_{2}, \dots, β_{p})}^{T}

. We set

β_{0}

as the mean of

\underset{\tilde{}}{ν}

, and then we can omit the intercept term from the covariate

{\underset{\tilde{}}{x}}_{i j k}

. Second, we introduce a new parameter,

w_{i j} = ν_{i} + μ_{i j}

, in order to set

ν_{i}

and

μ_{i j}

independently and make it easy to make an inference for both of them. We have

y_{i j k} | w_{i j}, {\underset{\tilde{}}{β}}_{(0)} \overset{i n d}{\sim} Bernoulli \{\frac{e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}}}{1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}}}\}, k = 1, \dots, m_{i j},

w_{i j} | ν_{i}, σ^{2} \overset{i n d}{\sim} Normal (ν_{i}, σ^{2}), j = 1, \dots, n_{i},

(2)

ν_{i} | β_{0}, δ^{2} \overset{i i d}{\sim} Normal (β_{0}, δ^{2}), i = 1, \dots, ℓ,

π (\underset{\tilde{}}{β}, δ^{2}, σ^{2}) \propto \frac{1}{{(1 + δ^{2})}^{2}} \frac{1}{{(1 + σ^{2})}^{2}}, δ^{2} > 0, σ^{2} > 0 .

The joint posterior density for the parameters is

\begin{matrix} π (\underset{\tilde{}}{v}, \underset{\tilde{}}{w}, \underset{\tilde{}}{β}, σ^{2}, δ^{2} | \underset{\tilde{}}{y}) & \propto \prod_{i = 1}^{ℓ} \prod_{j = 1}^{n_{i}} \prod_{k = 1}^{m_{i j}} [\frac{e^{({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}) y_{i j k}}}{1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}}}] \times {(\frac{1}{\sqrt{2 π σ^{2}}})}^{n} exp \{- \sum_{i = 1}^{l} \sum_{j = 1}^{n i} \frac{{(w_{i j} - ν_{i})}^{2}}{2 σ^{2}}\} \\ \times {(\frac{1}{\sqrt{2 π δ^{2}}})}^{l} exp \{- \sum_{i = 1}^{l} \frac{{(ν_{i} - β_{0})}^{2}}{2 δ^{2}}\} \frac{1}{{(1 + σ^{2})}^{2}} \frac{1}{{(1 + δ^{2})}^{2}} . \end{matrix}

(3)

The posterior density is a non-standard multivariate density, and there are difficulties in fitting it using MCMC methods, more so when

n_{i}, m_{i j}

are large. This motivates our approximate methods.

3. Integrated Nested Normal Approximation Method

In this section, we discuss the INNA method for the sub-area HB logistic regression model. It is an extension of the INNA method in [3]. INNA method is not required to find the posterior modes. Due to the large amount of sub-areas, it would be time-consuming to find all posterior modes, which is why we did not choose the popular INLA method. In detail, we discuss the approximation of the joint posterior density (3).

Notice that the joint posterior density (3) is very complicated and it is the expit part,

\prod_{i = 1}^{ℓ} \prod_{j = 1}^{n_{i}} \prod_{k = 1}^{m_{i j}} [\frac{e^{({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}) y_{i j k}}}{1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}}}]

, that causes the difficulties. In the following, we discuss how to approximate this term to normal density functions by using Laplace approximation, the second-order multivariate Taylor-series approximation and the empirical logistic transform (ELT). This is the key contribution in the paper. Then we use the multiplication rule to approximate the joint posterior density,

π_{a} (\underset{\tilde{}}{w}, \underset{\tilde{}}{ν}, \underset{\tilde{}}{β}, σ^{2}, δ^{2} ∣ \underset{\tilde{}}{y}) \propto π_{a} (\underset{\tilde{}}{w} ∣ \underset{\tilde{}}{ν}, {\underset{\tilde{}}{β}}_{(0)}, σ^{2}, \underset{\tilde{}}{y}) π_{a} (\underset{\tilde{}}{ν} ∣ β_{0}, δ^{2}, \underset{\tilde{}}{y}) π_{a} ({\underset{\tilde{}}{β}}_{(0)} ∣ \underset{\tilde{}}{y}) π_{a} (\underset{\tilde{}}{β}, σ^{2}, δ^{2} ∣ \underset{\tilde{}}{y}),

where the first three densities on the right-hand side are all multivariate normal densities. Therefore, we can draw samples and make inference through the approximate joint posterior density.

Let

f (\underset{\tilde{}}{τ}) = e^{h (\underset{\tilde{}}{τ})}

denote the density of a vector of parameters

\underset{\tilde{}}{τ}

. Let

\underset{\tilde{}}{g}

denote the gradient vector and H the Hessian matrix at some point

{\underset{\tilde{}}{τ}}^{*}

.

Lemma 1.

Let

h (\underset{\tilde{}}{τ})

be a logconcave density function with the parameter

\underset{\tilde{}}{τ}

. Then,

\underset{\tilde{}}{τ}

approximately has a multivariate normal distribution,

\underset{\tilde{}}{τ} \overset{}{\sim} N o r m a l ({\underset{\tilde{}}{τ}}^{*} - H^{- 1} \underset{\tilde{}}{g}, - H^{- 1}) .

Proof.

Simply applying the second-order multivariate Taylor series of

h (\underset{\tilde{}}{τ})

at

{\underset{\tilde{}}{τ}}^{*}

gives

f (\underset{\tilde{}}{τ}) \approx f ({\underset{\tilde{}}{τ}}^{*}) + {(\underset{\tilde{}}{τ} - {\underset{\tilde{}}{τ}}^{*})}^{'} \underset{\tilde{}}{g} + \frac{1}{2} {(\underset{\tilde{}}{τ} - {\underset{\tilde{}}{τ}}^{*})}^{'} H (\underset{\tilde{}}{τ} - {\underset{\tilde{}}{τ}}^{*}) .

Due to the logconcavity of

h (\underset{\tilde{}}{τ})

, its Hessian Matrix

- H

is posit-definite, which can be the covariance matrix. Notice that we are not required to use the mode of

h (\underset{\tilde{}}{τ})

. We do not need to find the solution of the gradient vector

\underset{\tilde{}}{g} = 0

. Therefore,

{\underset{\tilde{}}{τ}}^{*}

does not have to be the solution but some other point. It is worth noticing that the term,

- H^{- 1} \underset{\tilde{}}{g}

, is a correction to

{\underset{\tilde{}}{τ}}^{*}

. □

To illustrate the approximation steps, we start with a simpler model with flat priors for

{\underset{\tilde{}}{β}}_{(0)}

and the

\underset{\tilde{}}{w}

, according to model (2). That is,

y_{i j k} | w_{i j}, {\underset{\tilde{}}{β}}_{(0)} \overset{i n d}{\sim} Bernoulli \{\frac{e^{{\underset{\tilde{}}{x}}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}}}{1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}}}\}, k = 1, \dots, m_{i j}, j = 1, \dots, n_{i}, i = 1, \dots, ℓ,

p (\underset{\tilde{}}{w}, {\underset{\tilde{}}{β}}_{(0)}) = 1 .

(4)

The joint posterior density is

π (\underset{\tilde{}}{w}, {\underset{\tilde{}}{β}}_{(0)} | \underset{\tilde{}}{y}) \propto \prod_{i = 1}^{ℓ} \prod_{j = 1}^{n_{i}} \prod_{k = 1}^{m_{i j}} \{\frac{e^{({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}) y_{i j k}}}{1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}}}\} .

(5)

The logarithm of the joint posterior density (or log likelihood) is

Δ = h (\underset{\tilde{}}{τ}) = \sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} \{({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}) y_{i j k} - log (1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}})\} .

Let

{\underset{\tilde{}}{τ}}^{'} = ({\underset{\tilde{}}{μ}}^{'}, {\underset{\tilde{}}{β}}_{(0)}^{'})

. In our method, we find a convenient point to expand the log-likelihood in a second-order multivariate Taylor-series expansion.

To begin with, let

{\bar{y}}_{i j} = \frac{1}{m_{i j}} \sum_{k = 1}^{m_{i j}} y_{i j k}

. We use the empirical logistic transform

z_{i j}

to get an estimate of

w_{i j}

, where

{\hat{w}}_{i j}^{*} = z_{i j} = log \{\frac{{\bar{y}}_{i j} + \frac{1}{2 m_{i j}}}{1 - {\bar{y}}_{i j} + \frac{1}{2 m_{i j}}}\}, i = 1, \dots, ℓ; j = 1, \dots, n_{i} .

First, we discuss how to find the quasi mode of

{\underset{\tilde{}}{β}}_{(0)}

. We plug

{\hat{w}}_{i j}^{*}

into the log likelihood function

Δ

and consider it as a function of

{\underset{\tilde{}}{β}}_{(0)}

only as

q ({\underset{\tilde{}}{β}}_{(0)})

, and we get

q ({\underset{\tilde{}}{β}}_{(0)}) = \sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} [({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + {\hat{w}}_{i j}^{*}) y_{i j k} - log (1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + {\hat{w}}_{i j}^{*}})] .

The first derivative of

q ({\underset{\tilde{}}{β}}_{(0)})

is

\begin{matrix} q^{'} ({\underset{\tilde{}}{β}}_{(0)}) & = \sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} \{{\underset{\tilde{}}{x}}_{i j k} y_{i j k} - \frac{{\underset{\tilde{}}{x}}_{i j k} e^{({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + {\hat{w}}_{i j}^{*})}}{1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + {\hat{w}}_{i j}^{*}}}\} \\ = \sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} \{{\underset{\tilde{}}{x}}_{i j k} y_{i j k} - {\underset{\tilde{}}{x}}_{i j k} {[1 + e^{- ({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + {\hat{w}}_{i j}^{*})}]}^{- 1}\} . \end{matrix}

Usually we should set

q^{'} ({\underset{\tilde{}}{β}}_{(0)})

equal to zero and find the modes as the maximum likelihood estimator (MLE) of

{\underset{\tilde{}}{β}}_{(0)}

. But here, it is not easy to solve the equation due to the complexity of

q^{'} ({\underset{\tilde{}}{β}}_{(0)})

. We use the first-order Taylor series to approximate it and then simplify

q^{'} ({\underset{\tilde{}}{β}}_{(0)})

so that we can get quasi-modes of

{\underset{\tilde{}}{β}}_{(0)}

.

The first-order Taylor expansion of

{(1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + {\hat{w}}_{i j}^{*}})}^{- 1}

equals

(1 - e^{- ({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + {\hat{w}}_{i j}^{*})})

. Notice that by Taylor series,

e^{- ({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + {\hat{w}}_{i j}^{*})} \approx 1 - ({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + {\hat{w}}_{i j}^{*})

. Then we can get

\begin{matrix} q^{'} ({\underset{\tilde{}}{β}}_{(0)}) & \approx \sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} \{{\underset{\tilde{}}{x}}_{i j k} y_{i j k} - {\underset{\tilde{}}{x}}_{i j k} [(1 - e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + {\hat{w}}_{i j}^{*}})]\} \\ \approx \sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} \{{\underset{\tilde{}}{x}}_{i j k} y_{i j k} - {\underset{\tilde{}}{x}}_{i j k} [(1 - 1 + ({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + {\hat{w}}_{i j}^{*}))]\} \\ = \sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} \{{\underset{\tilde{}}{x}}_{i j k} (y_{i j k} - {\hat{w}}_{i j}^{*}) - {\underset{\tilde{}}{x}}_{i j k} {\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)}\} . \end{matrix}

We can get the quasi-modes of

{\underset{\tilde{}}{β}}_{(0)}

by solving the equation

q^{'} ({\underset{\tilde{}}{β}}_{(0)}) = 0

. That is,

{\underset{\tilde{}}{β}}_{(0)}^{*} = {[\sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} {\underset{\tilde{}}{x}}_{i j k} {\underset{\tilde{}}{x}}_{i j k}^{'}]}^{- 1} [\sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} {\underset{\tilde{}}{x}}_{i j k} (y_{i j k} - {\hat{w}}_{i j}^{*})] .

Second, we obtain quasi-modes for the

w_{i j}

, a refinement of the

z_{i}

. Plug

{\underset{\tilde{}}{β}}_{(0)}^{*}

into the likelihood function

Δ

and consider it as function

w_{i j}

only:

g (w_{i j}) = \sum_{k = 1}^{m_{i j}} [({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)}^{*} + w_{i j}) y_{i j k} - log (1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{{(0)}^{*}} + w_{i j}})] .

Similarly, after applying Taylor expansion, we get the approximate first derivative of

g (w_{i j})

\begin{matrix} g^{'} (w_{i j}) & = \sum_{k = 1}^{m_{i j}} \{y_{i j k} - {[1 + e^{- ({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)}^{*} + w_{i j})}]}^{- 1}\} \\ \approx \sum_{k = 1}^{m_{i j}} \{y_{i j k} - (1 - e^{- w_{i j}} e^{- {\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)}^{*}})\} . \end{matrix}

We can obtain the approximate posterior mode of

w_{i j}

by solving the equation

g^{'} (w_{i j}) = 0

.

w_{i j}^{*} = log \{\frac{\sum_{k = 1}^{m_{i j}} e^{- {\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)}^{*}}}{m_{i j} (1 - {\bar{y}}_{i j})}\} .

Notice that the term

1 - {\bar{y}}_{i j}

in denominator may cause trouble if

{\bar{y}}_{i j} = 1

for some is and js. Here, we borrow the idea from ELT and make a small adjustment in order to avoid a zero denominator. That is,

w_{i j}^{*} \approx log \{\frac{\sum_{k = 1}^{m_{i j}} e^{- {\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)}^{*}}}{m_{i j} (1 - {\bar{y}}_{i j} + \frac{1}{2 m_{i j}})}\} i = 1, \dots, ℓ, j = 1, \dots, n_{i} .

Let

{\underset{\tilde{}}{τ}}^{*'} = ({\underset{\tilde{}}{μ}}^{*'}, {\underset{\tilde{}}{β}}_{(0)}^{*'})

. Next, we evaluate

\underset{\tilde{}}{g}

and H at the quasi-modes

\underset{\tilde{}}{τ} = {\underset{\tilde{}}{τ}}^{*}

can also be obtained as

\underset{\tilde{}}{g} = {(\begin{matrix} \frac{\partial Δ}{\partial w_{11}} \dots \frac{\partial Δ}{\partial w_{ℓ n_{ℓ}}} \frac{\partial Δ}{\partial {\underset{\tilde{}}{β}}_{(0)}} \end{matrix})}_{\underset{\tilde{}}{w} = {\underset{\tilde{}}{w}}^{*}, {\underset{\tilde{}}{β}}_{(0)} = {\underset{\tilde{}}{β}}_{(0)}^{*}}^{T},

H = {(\begin{matrix} \frac{\partial^{2} Δ}{\partial w_{11}^{2}} & \dots & \frac{\partial^{2} Δ}{\partial w_{11} \partial w_{ℓ n_{ℓ}}} & \frac{\partial^{2} Δ}{\partial w_{11} \partial {\underset{\tilde{}}{β}}_{(0)}} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & \dots & \frac{\partial^{2} Δ}{\partial w_{ℓ n_{ℓ}}^{2}} & \frac{\partial^{2} Δ}{\partial w_{ℓ n_{ℓ}} \partial {\underset{\tilde{}}{β}}_{(0)}} \\ \frac{\partial^{2} Δ}{\partial w_{11} \partial {\underset{\tilde{}}{β}}_{(0)}} & \dots & \frac{\partial^{2} Δ}{\partial w_{ℓ n_{ℓ}} \partial {\underset{\tilde{}}{β}}_{(0)}} & \frac{\partial^{2} Δ}{\partial {\underset{\tilde{}}{β}}_{(0)}^{2}} \end{matrix})}_{\underset{\tilde{}}{w} = {\underset{\tilde{}}{w}}^{*}, {\underset{\tilde{}}{β}}_{(0)} = {\underset{\tilde{}}{β}}_{(0)}^{*}} .

The partial derivatives can be expressed in terms of response

y_{i j k}

and covariates

{\underset{\tilde{}}{x}}_{i j k}

as

\frac{\partial Δ}{\partial {\underset{\tilde{}}{β}}_{(0)}} = \sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} \{{\underset{\tilde{}}{x}}_{i j k} y_{i j k} - \frac{{\underset{\tilde{}}{x}}_{i j k} e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)}^{*} + w_{i j}^{*}}}{1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)}^{*} + w_{i j}^{*}}}\},

\frac{\partial Δ}{\partial w_{i j}} = \sum_{k = 1}^{m_{i j}} (y_{i j k} - \frac{e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)}^{*} + w_{i j}^{*}}}{1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)}^{*} + w_{i j}^{*}}}),

\frac{\partial^{2} Δ}{\partial {\underset{\tilde{}}{β}}_{(0)}^{2}} = - \sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} \frac{{\underset{\tilde{}}{x}}_{i j k} {\underset{\tilde{}}{x}}_{i j k}^{'} e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)}^{*} + w_{i j}^{*}}}{{(1 + e^{{\underset{\tilde{}}{x}}_{i j}^{'} {\underset{\tilde{}}{β}}_{(0)}^{*} + w_{i j}^{*}})}^{2}},

\frac{\partial^{2} Δ}{\partial w_{i j}^{2}} = - \sum_{k = 1}^{m_{i j}} \frac{e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)}^{*} + w_{i j}^{*}}}{{(1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)}^{*} + w_{i j}^{*}})}^{2}},

\frac{\partial^{2} Δ}{\partial μ_{i} \partial {\underset{\tilde{}}{β}}_{(0)}} = - \sum_{k = 1}^{m_{i j}} \frac{{\underset{\tilde{}}{x}}_{i j k} e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)}^{*} + w_{i j}^{*}}}{{(1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)}^{*} + w_{i j}^{*}})}^{2}},

where

i = 1, \dots, ℓ, j = 1, \dots, n_{i}

.

For the convenience of computation, denote

\underset{\tilde{}}{g} = (\begin{matrix} {\underset{\tilde{}}{g}}_{1} \\ {\underset{\tilde{}}{g}}_{2} \end{matrix})

and

H = - (\begin{matrix} D C^{'} \\ C B \end{matrix}),

where

{\underset{\tilde{}}{g}}_{1} = {(\begin{matrix} \frac{\partial Δ}{\partial w_{11}} \dots \frac{\partial Δ}{\partial w_{ℓ n_{ℓ}}} \end{matrix})}^{T}, {\underset{\tilde{}}{g}}_{2} = \frac{\partial Δ}{\partial {\underset{\tilde{}}{β}}_{(0)}},

B = - \frac{\partial^{2} Δ}{\partial {\underset{\tilde{}}{β}}_{(0)}^{2}}, C = - (\begin{matrix} \frac{\partial^{2} Δ}{\partial w_{11} \partial {\underset{\tilde{}}{β}}_{(0)}} \dots \frac{\partial^{2} Δ}{\partial w_{ℓ n_{ℓ}} \partial {\underset{\tilde{}}{β}}_{(0)}} \end{matrix}), D = - (\begin{matrix} \frac{\partial^{2} Δ}{\partial w_{11}^{2}} \dots 0 \\ : ⋱ : \\ 0 \dots \frac{\partial^{2} Δ}{\partial w_{ℓ n_{ℓ}}^{2}} \end{matrix}) .

Let

- H^{- 1} = {(\begin{matrix} D C^{'} \\ C B \end{matrix})}^{- 1} = (\begin{matrix} E F^{'} \\ F G \end{matrix})

, where

E = D^{- 1} + D^{- 1} C^{'} {(B - C D^{- 1} C^{'})}^{- 1} C D^{- 1}, F = - {(B - C D^{- 1} C^{'})}^{- 1} C D^{- 1}, G = {(B - C D^{- 1} C^{'})}^{- 1} .

Lemma 2.

Assuming that the design matrix is full-rank and

0 < \sum_{k = 1}^{m_{i j}} y_{i j k} < m_{i j}, j = 1, \dots, n_{i}; i = 1, \dots, ℓ

, the posterior density,

\underset{\tilde{}}{τ} | \underset{\tilde{}}{y}

in (5), is logconcave.

Proof.

If

0 < \sum_{k = 1}^{m_{i j}} y_{i j k} < m_{i j}, i = 1, \dots, ℓ, j = 1, \dots, n_{i}

, there are solutions to the gradient vector set to zero.

Let

p_{i j k} = \frac{e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}}}{1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}}}, k = 1, \dots, m_{i} j, j = 1, \dots, n_{i}, i = 1, \dots, ℓ

. Then, A, B and C of the negative Hessian matrix can be written as,

B = - \frac{\partial^{2} Δ}{\partial {\underset{\tilde{}}{β}}_{(0)}^{2}} = \sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} p_{i j k} (1 - p_{i j k}) {\underset{\tilde{}}{x}}_{i j k} {\underset{\tilde{}}{x}}_{i j k}^{'},

D = d i a g o n a l (d_{i j}), d_{i j} = \frac{\partial^{2} Δ}{\partial w_{i j}^{2}} = \sum_{k = 1}^{m_{i j}} p_{i j k} (1 - p_{i j k}),

C = ({\underset{\tilde{}}{c}}_{i j}), {\underset{\tilde{}}{c}}_{i j} = \frac{\partial^{2} Δ}{\partial w_{i j} \partial {\underset{\tilde{}}{β}}_{(0)}} = \sum_{k = 1}^{m_{i j}} p_{i j k} (1 - p_{i j k}) {\underset{\tilde{}}{x}}_{i j k},

where

j = 1, \dots, n_{i}, i = 1, \dots, ℓ .

It is obvious that D is positive-definite. Thus, to show that

- H

is positive-definite, we need to show that its Schur complement of D,

S = B - C D^{- 1} C^{'}

, is positive-definite (e.g., see [35]). Let

ω_{i j k} = p_{i j k} (1 - p_{i j k}) / \sum_{k = 1}^{m_{i j}} p_{i j k} (1 - p_{i j k}), k = 1, \dots, m_{i j}, j = 1, \dots, n_{i}, i = 1, \dots, ℓ

. The Schur complement is

S = \sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} p_{i j k} (1 - p_{i j k}) \sum_{k = 1}^{m_{i j}} ω_{i j k} {\underset{\tilde{}}{x}}_{i j k} {\underset{\tilde{}}{x}}_{i j k}^{'} - \sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} p_{i j k} (1 - p_{i j k}) \sum_{k = 1}^{m_{i j}} ω_{i j k} {\underset{\tilde{}}{x}}_{i j k} \sum_{k = 1}^{m_{i j}} ω_{i j k} {\underset{\tilde{}}{x}}_{i j k}^{'} .

It is now easy to show that

S = \sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} ω_{i j k} ({\underset{\tilde{}}{x}}_{i j k} - \sum_{k = 1}^{m_{i j}} ω_{i j k} {\underset{\tilde{}}{x}}_{i j k}) {({\underset{\tilde{}}{x}}_{i j k} - \sum_{k = 1}^{m_{i j}} ω_{i j k} {\underset{\tilde{}}{x}}_{i j k})}^{'} .

Therefore,

- H

is positive-definite, and

\underset{\tilde{}}{τ} | \underset{\tilde{}}{y}

is logconcave. □

Finally, according to the Lemmas 1 and 2, we can establish the approximation Theorem.

Theorem 1.

Assuming that the design matrix is full-rank and

0 < \sum_{k = 1}^{m_{i j}} y_{i j k} < m_{i j}, j = 1, \dots, n_{i}, i = 1, \dots, ℓ

, the posterior density,

\underset{\tilde{}}{τ} | \underset{\tilde{}}{y}

in (5) is approximately a multivariate normal density, and the conditional posterior density of

\underset{\tilde{}}{w} | {\underset{\tilde{}}{β}}_{(0)}, \underset{\tilde{}}{y}

and

{\underset{\tilde{}}{β}}_{(0)} | \underset{\tilde{}}{y}

can also be approximated by multivariate normal distributions.

Proof.

The proof is given in Appendix B. □

Therefore, we can approximate that logit expit term

\prod_{i = 1}^{ℓ} \prod_{j = 1}^{n_{i}} \prod_{k = 1}^{m_{i j}} [\frac{e^{({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}) y_{i j k}}}{1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}}}]

into two multivariate densities by Theorem 1. And then we can get our approximate two-fold Bayesian logistic regression model.

Recall the posterior density of our two-fold logistic model is

π (\underset{\tilde{}}{w}, \underset{\tilde{}}{ν}, \underset{\tilde{}}{β}, σ^{2}, δ^{2} ∣ \underset{\tilde{}}{y}) \propto π (\underset{\tilde{}}{y} | \underset{\tilde{}}{w}, {\underset{\tilde{}}{β}}_{(0)}) π (\underset{\tilde{}}{w} ∣ \underset{\tilde{}}{ν}, σ^{2}) π (\underset{\tilde{}}{ν} ∣ β_{0}, δ^{2}) π ({\underset{\tilde{}}{β}}_{(0)}, β_{0}, σ^{2}, δ^{2})

The likelihood function

π (\underset{\tilde{}}{y} | \underset{\tilde{}}{w}, {\underset{\tilde{}}{β}}_{(0)})

can be approximated by the multivariate normal distribution by Theorem 1. Combining the prior values of

\underset{\tilde{}}{w}

and

\underset{\tilde{}}{ν}

given by our Bayesian Logistic model and the results in Theroem 1, we can obtain our INNA model

\underset{\tilde{}}{w} | {\underset{\tilde{}}{β}}_{(0)}, \underset{\tilde{}}{y} \overset{}{\sim} Normal {{\underset{\tilde{}}{μ}}_{w} - D^{- 1} C^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}), D^{- 1}}

{\underset{\tilde{}}{β}}_{(0)} | \underset{\tilde{}}{y} \overset{}{\sim} Normal {{\underset{\tilde{}}{μ}}_{β}, G}

\underset{\tilde{}}{w} | \underset{\tilde{}}{ν}, σ^{2} \overset{i n d}{\sim} Normal ({\underset{\tilde{}}{μ}}_{ν}, σ^{2} I),

\underset{\tilde{}}{ν} | β_{0}, δ^{2} \overset{i i d}{\sim} Normal (β_{0} \underset{\tilde{}}{j}, δ^{2} I),

π ({\underset{\tilde{}}{β}}_{(0)}, β_{0}, δ^{2}, σ^{2}) \propto \frac{1}{{(1 + δ^{2})}^{2}} \frac{1}{{(1 + σ^{2})}^{2}}, δ^{2} > 0, σ^{2} > 0,

where

{\underset{\tilde{}}{μ}}_{ν}^{'} = {(\underset{n_{1}}{\underset{⏟}{ν_{1}, \dots, ν_{1}}} \dots \underset{n_{ℓ}}{\underset{⏟}{ν_{ℓ}, \dots, ν_{ℓ}}})}^{'}

and

\underset{\tilde{}}{j}

is a vector of ones.

Using Bayes’ Theorem and the multiplication rule, the posterior density

π (\underset{\tilde{}}{w}, \underset{\tilde{}}{ν}, \underset{\tilde{}}{β}, σ^{2}, δ^{2} ∣ \underset{\tilde{}}{y})

can be approximated as

\begin{matrix} π_{a} (\underset{\tilde{}}{w}, \underset{\tilde{}}{ν}, \underset{\tilde{}}{β}, σ^{2}, δ^{2} ∣ \underset{\tilde{}}{y}) & \propto π_{a} (\underset{\tilde{}}{w} ∣ \underset{\tilde{}}{ν}, {\underset{\tilde{}}{β}}_{(0)}, σ^{2}, \underset{\tilde{}}{y}) π_{a} (\underset{\tilde{}}{ν} ∣ β_{0}, δ^{2}, \underset{\tilde{}}{y}) π_{a} ({\underset{\tilde{}}{β}}_{(0)} ∣ \underset{\tilde{}}{y}) π_{a} (\underset{\tilde{}}{β}, σ^{2}, δ^{2} ∣ \underset{\tilde{}}{y}) \\ = e^{- \frac{1}{2} \{{[\underset{\tilde{}}{w} - ({\underset{\tilde{}}{μ}}_{w} - D^{- 1} C^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}))]}^{'} D [\underset{\tilde{}}{w} - ({\underset{\tilde{}}{μ}}_{w} - D^{- 1} C^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}))]\}} \\ \times e^{- \frac{1}{2} \{{[\underset{\tilde{}}{w} - {\underset{\tilde{}}{μ}}_{ν}]}^{'} {(σ^{2} I)}^{- 1} [\underset{\tilde{}}{w} - {\underset{\tilde{}}{μ}}_{ν}] + {[\underset{\tilde{}}{ν} - β_{0} \underset{\tilde{}}{j}]}^{'} {(δ^{2} I)}^{- 1} [\underset{\tilde{}}{ν} - β_{0} \underset{\tilde{}}{j}] + {[{\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}]}^{'} G^{- 1} [{\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}]\}} \\ \times \frac{{| D |}^{1 / 2}}{| δ^{2} {I |}^{1 / 2} | σ^{2} {I |}^{1 / 2} {| G |}^{1 / 2}} \frac{1}{{(1 + σ^{2})}^{2}} \frac{1}{{(1 + δ^{2})}^{2}} \end{matrix}

(6)

Therefore, we can get the following key result.

Theorem 2.

Using the multiplication rule, the joint posterior density,

π (\underset{\tilde{}}{w}, \underset{\tilde{}}{ν}, \underset{\tilde{}}{β}, σ^{2}, δ^{2} ∣ \underset{\tilde{}}{y})

in (6), can be approximated by

π_{a} (\underset{\tilde{}}{w}, \underset{\tilde{}}{ν}, \underset{\tilde{}}{β}, σ^{2}, δ^{2} ∣ \underset{\tilde{}}{y}) \propto π_{a} (\underset{\tilde{}}{w} ∣ \underset{\tilde{}}{ν}, {\underset{\tilde{}}{β}}_{(0)}, σ^{2}, \underset{\tilde{}}{y}) π_{a} (\underset{\tilde{}}{ν} ∣ β_{0}, δ^{2}, \underset{\tilde{}}{y}) π_{a} ({\underset{\tilde{}}{β}}_{(0)} ∣ \underset{\tilde{}}{y}) π_{a} (\underset{\tilde{}}{β}, σ^{2}, δ^{2} ∣ \underset{\tilde{}}{y}),

where the first three densities on the right-hand side are all multivariate normal densities.

Proof.

The proof is given in Appendix C. □

The INNA is actually a random sampler. First, we draw samples for

σ^{2}, δ^{2}

from

π (σ^{2}, δ^{2} | \underset{\tilde{}}{y})

. The posterior distribution of

σ^{2}, δ^{2} | \underset{\tilde{}}{y}

does not have standardized form. Here, we use the grid method and numerical integration to sample

σ^{2}

and

δ^{2}

. Since

0 < σ^{2} < \infty

and

0 < δ^{2} < \infty

, we make a transformation to

ϕ_{1} = \frac{1}{1 + σ^{2}}

and

ϕ_{2} = \frac{1}{1 + δ^{2}}

so that we get

0 < ϕ_{1} < 1

and

0 < ϕ_{2} < 1

. Then, the posterior density of

ϕ_{1}, ϕ_{2} | \underset{\tilde{}}{y}

is

\begin{matrix} π_{a} (ϕ_{1}, ϕ_{2} | \underset{\tilde{}}{y}) & \propto {\{{|\begin{matrix} \begin{matrix} δ_{0}^{2} {\underset{\tilde{}}{γ}}^{'} \\ \underset{\tilde{}}{γ} Δ_{(0)} \end{matrix} \end{matrix}|}^{- \frac{1}{2}} \times \prod_{i = 1}^{l} {(\frac{1}{\sum_{j = 1}^{n_{i}} σ_{i j}^{2}} + \frac{1}{δ^{2}})}^{\frac{1}{2}} \frac{1}{| δ^{2} D + \frac{δ^{2}}{σ^{2}} {I |}^{1 / 2}}\}}_{ϕ_{1} = \frac{1}{1 + σ^{2}}, ϕ_{2} = \frac{1}{1 + δ^{2}}} \\ \times exp \{- \frac{1}{2} {({\underset{\tilde{}}{μ}}_{w} + D^{- 1} C^{'} {\underset{\tilde{}}{μ}}_{β})}^{'} {(D^{- 1} + σ^{2} I + δ^{2} I)}^{- 1} ({\underset{\tilde{}}{μ}}_{w} + D^{- 1} C^{'} {\underset{\tilde{}}{μ}}_{β}) + {\underset{\tilde{}}{μ}}_{β}^{'} G^{- 1} {\underset{\tilde{}}{μ}}_{β}\} \\ \times exp {\{- \frac{1}{2} {(\begin{matrix} β_{0} - ω_{0} \\ {\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{ω}}_{(0)} \end{matrix})}^{'} (\begin{matrix} δ_{0}^{2} {\underset{\tilde{}}{γ}}^{'} \\ \underset{\tilde{}}{γ} Δ_{(0)} \end{matrix}) (\begin{matrix} β_{0} - ω_{0} \\ {\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{ω}}_{(0)} \end{matrix})\}}_{ϕ_{1} = \frac{1}{1 + σ^{2}}, ϕ_{2} = \frac{1}{1 + δ^{2}}} . \end{matrix}

We need to draw

ϕ_{1}, ϕ_{2}

together. The joint density can be rewritten as

π (ϕ_{1}, ϕ_{2} | \underset{\tilde{}}{y}) = π (ϕ_{2} | ϕ_{1}) π (ϕ_{1} | \underset{\tilde{}}{y}) = π (ϕ_{2} | ϕ_{1}) \int_{0}^{1} π (ϕ_{1}, ϕ_{2} | \underset{\tilde{}}{y}) d ϕ_{2} .

We plug each grid of

ϕ_{1} \in (0, 1)

into

\int_{0}^{1} π (ϕ_{1}, ϕ_{2} | \underset{\tilde{}}{y}) d ϕ_{2}

and then use numerical integration to get the density of

(ϕ_{1} | \underset{\tilde{}}{y})

. After we plug all the 100 grids, we can get 100 value of

π (ϕ_{1} | \underset{\tilde{}}{y})

and then draw

ϕ_{1}

from them, i.e.

ϕ_{1}^{(h)}

. Next, we plug

ϕ_{1}^{(h)}

into

π (ϕ_{2} | ϕ_{1})

and use grid method to draw

ϕ_{2}^{(h)}

. We repeat those steps 10,000 times to get the sample of

(ϕ_{1}^{(h)}, ϕ_{2}^{(h)}), h = 1, \dots, 10,000 .

Once we get samples for

ϕ_{1}, ϕ_{2}

, we transform them back to

σ^{2}

and

δ^{2}

respectively. Second, given

σ^{2}, δ^{2}

, we can simply draw samples of

\underset{\tilde{}}{β}

from the approximate multivariate normal distribution

π_{a} (\underset{\tilde{}}{β} ∣ σ^{2}, δ^{2}, \underset{\tilde{}}{y})

. Third, we can draw samples of

ν_{i}

independently given

\underset{\tilde{}}{β}, δ^{2}

and data from the approximate normal distribution

π_{a} (\underset{\tilde{}}{ν} | β_{0}, δ^{2}, \underset{\tilde{}}{y})

. Finally, samples of

w_{i j}

independently given

\underset{\tilde{}}{ν}, \underset{\tilde{}}{β}, σ^{2}

can be obtained from the approximate normal distribution

π_{a} (\underset{\tilde{}}{w} | \underset{\tilde{}}{ν}, {\underset{\tilde{}}{β}}_{(0)}, σ^{2}, \underset{\tilde{}}{y})

. Notice that the last three steps are very simple, just drawing samples from normal densities. In addition,

w_{i j}

and

ν_{i}

are all independent so that we can draw them simultaneously. Therefore, those latter steps permit fast computing.

In order to check if INNA method can provide resonal results, we apply the MCMC logistic regression exact method to the sub-area model. The idea of exact method is to get full conditional posterior distributions for all of the parameters in the model, and then get a large number of independent samples of each parameter with its full conditional posterior density. Details are given in Appendix A.

There are two differences between these two methods. First, both methods are sampling-based. The approximate method implements random samples and the exact method uses numerical integration method and Markov chains. Second,

π_{a} (\underset{\tilde{}}{β}, σ^{2}, δ^{2} ∣ \underset{\tilde{}}{y})

is used for the INNA method. In the exact method, a Metroplis step is used for the

π (\underset{\tilde{}}{β}, σ^{2}, δ^{2} ∣ \underset{\tilde{}}{y})

. This is very time-consuming in the exact method. On the other hand, the exact method actually uses the INNA method. We use M-H sampler draw samples for

\underset{\tilde{}}{ν}

and

\underset{\tilde{}}{w}

, respectively. Proposal functions are

π_{a} (\underset{\tilde{}}{ν} ∣ {\underset{\tilde{}}{β}}_{(0)}, β_{0}, σ^{2}, \underset{\tilde{}}{y})

and

π_{a} (\underset{\tilde{}}{w} ∣ \underset{\tilde{}}{ν}, {\underset{\tilde{}}{β}}_{(0)}, σ^{2}, \underset{\tilde{}}{y})

, respectively, from the INNA method.

4. Numerical Example

4.1. Nepal Living Standards Survey II

The performance of our method is studied using the Nepal Living Standard Survey (NLSS II), conducted in the years 2003–2004. The main objective of the NLSS II is to track changes in and the progress of national living standards and social indicators of the Nepalese population. It is an integrated survey which covers samples from the whole country and runs throughout the year.

The NLSS II gathers information on a variety of aspects. It has collected data on demographics, housing, education, health, fertility, employment, income, agricultural activity, consumption, and various other areas. The sampling design of NLSS II is two-stage stratified sampling. Nepal is stratified into Primary Sampling Units (PSUs) and, within each PSU, there are a number of households (sub-area) selected. All household members in the sample were interviewed.

In detail, the NLSS II has records for 20,263 individuals from 3912 households (sub-areas) from 326 PSUs (areas) from a population of 60,262 households and about two million Nepalese. A sample of PSUs was selected from the strata using probability proportional to size (PPS) sampling and 12 households were systematically selected from each PSU. The survey is self-weighted and some adjustments were made after conducting the survey for non-responses or missing data. For simplicity, in this paper, we assume all samples have the same weight. Table 1 shows the distribution of all samples by stratum.

We chose four relevant covariates which can influence health status from the same NLSS II survey for our two-fold logistic regression model. They are age, nativity, sex and religion. We created binary variables for nativity (Indigenous = 1, Non-indigenous = 0) and religion ((Hindu = 1, Non-Hindu = 0), sex (Male = 1, Female = 0). Table 2 shows the details of these four covariates. In the model fitting, we standardized the age covariate. Older age and a child’s age are more vulnerable times than younger age. Indigenous people can have different health statuses from migrated people.

According to the 2001 census data, only about 0.091% of households and only 0.904% of PSU were sampled. The NLSS II was designed to provide reliable estimates only at the stratum level, or even larger areas than the stratum. It cannot give estimates in small areas (PSU or household level) since the sample sizes are too small. Therefore, we need to use statistical models to fit the available data and find reliable estimates in small areas. In our study, we chose the binary variable, health status, from the health section of the questionnaire.

4.2. Numerical Comparison

We used data from NLSS II to illustrate our sub-area logistic regression model. We predicted the household proportions of members in good health for 18,924 households (sampled and non-sampled). Bayesian bootstrap by [36] was applied to get non-sampled auxiliary information. This analysis was based on 1224 sample households from 102 wards (PSUs) in strata 6. Our primary purposes were to show that our model can provide good estimates and to compare the approximate method with the exact method when there are random effects at the household level.

We used Rcpp [37] and RcppArmadillo [38] packages in R [39] to fit the model based on both the approximation, INNA, method and the exact method to this NLSS II dataset. For the INNA method, we began with 10,000 iterations and a burn-in of 1000 and we kept only every ninth sample. Finally, 1000 samples were obtained for constructing the posterior distributions of all the parameters. The exact method was very time-consuming, taking about 30 h to finish. However, the INNA approximation method can get samples in 8 min. When we have a large number of areas or sub-areas, the approximation method will make enormous savings.

Convergence diagnostics were conducted. The convergences of the hyperparameters

(\underset{\tilde{}}{β}, σ^{2}, δ^{2})

were monitored by the Geweke test of stationarity [40] and the effective sample sizes. The p-values and effective sample sizes are shown in Table 3, resulting in good convergence for both methods. Table 3 also shows the posterior means (PMs) and associated posterior standard deviations (PMs) of the hyperparameters. The PMs are very close between these two methods. The PSDs are slightly larger for the exact method than for the INNA method, but they are reasonably close.

In Figure 1, Figure 2 and Figure 3, we compare the PMs, PSDs and posterior coefficient of variations (PCVs) in the household level as our primary purpose. We can see that the PMs are very close, nearly lying on the 45-degree line through the origin. The PSDs are slightly spread out and thicker, but all points still lie on the 45-degree line and so do the PCVs. Overall, these approximations are acceptable in the data analysis. Figure 4, Figure 5 and Figure 6 we compare respectively to the PMs, PSDs and PCVs at the ward level. The plots of the PMs are still very good. Notice that other two plots of PSDs and PCVs are more spread out than those in the household level. Again, though, the approximate method and the exact method are reasonably close.

We also compare the approximate method with the exact method using the five number summaries (the minimum values, the first quartiles, the median, the third quartiles, and the maximum values) with respect to the PMs, PSDs and PCVs of the finite population proportions at the household level and ward level in Table 4 and Table 5. The PMs from both methods at the household level have larger variations than those at the ward level. The PCVs at the ward level are generally much smaller than at the household level. The summaries of the PMs, PSDs and PCVs within households and wards between the approximate and exact methods are very close.

We conclude that the approximation method at the household level is reasonable. The approximation is desirable because one can perform the computations in real time.

5. Conclusions and Future Work

The sub-area HB logistic regression model can be applied to analyze the binary response variable. This model is an extension of the HB logistic regression area-level model, which ignores the actual hierarchical structure of the data. We propose an approximation method, INNA, to fit the model. For large datasets, it is very unrealistic to use the MCMC method to fit the model. We propose the approximation method, INNA, which saves time significantly because there is no need to compute numerous modes. In the numerical example, we can show that INNA can provide reliable estimates as well. An illustrative example of the NLSS II is presented in order to compare the approximation method and the exact method. It shows that, when there are a large number of areas and sub-areas, the approximation method can be efficient and it can also provide reasonable estimates.

INNA is a method for approximate Bayesian inference based on Laplace’s method, the second-order multivariate Taylor-series approximation and the empirical logistic transform (ELT). It can be applied to all HB logistic regression models, for which it can be a fast and accurate alternative to the Markov chain Monte Carlo methods. The comparison and model results illustrate the performance of the INNA methods based on the sub-area model.

There will be many future works on the two-fold small areas model. First, in this paper, we assume equal survey weights since the NLSS II is a self-weighted sampling. However, after the data are collected, the sampling weights are usually adjusted for various characteristics or based on nonresponse as well. Incorporating those survey weights into the model is also very important. Generally, we need to consider these weights in the model. The NLSS II is a national population-based survey. We should rescale the sample weights to sum to an equivalent sample size. That is, we consider the adjusted weight as

w_{i j k}^{*} = \hat{n} (\frac{w_{i j k}}{\sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} w_{i j k}})

, where

\hat{n} = \frac{{(\sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} w_{i j k})}^{2}}{\sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \sum_{k = 1}^{m_{i j}} w_{i j k}^{2}}

as an equivalent sample. Introducing the sampling weights, we can obtain an updated normalized likelihood function. Based on the updated likelihood function and the same prior in the two-fold model, we can have a full Bayesian analysis on the updated model and then project the finite population proportion of the family members with good health in each household.

Second, we focus on the binary data. Actually, there are four options in the health status questionnaire. The Multinomial-Dirichlet model can be an extension of the polychotomous data. Third, the two-fold sub-area level models can also be extended to three-fold models if the data have an additional hierarchical structure; actually, the NLSS II has this structure (households within wards, wards within districts). Fourth, in our models, we consider parametric priors. Introducing the Dirichlet process as a prior might make our method more robust to its specifications.

Author Contributions

Conceptualization, B.N. and L.C.; methodology, B.N. and L.C.; software, L.C.; validation, B.N. and L.C.; formal analysis, L.C.; investigation, L.C.; resources, B.N.; writing—original draft preparation, L.C.; writing—review and editing, B.N. and L.C.; visualization, L.C.; supervision, B.N. All authors have read and agreed to the published version of the manuscript.

Funding

Balgobin Nandram was supported by a grant from the Simons Foundation (#353953, Balgobin Nandram).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Exact Method for Sub-Area Logistic Regression Model

Recall that the joint posterior distribution of our two-fold logistic regression model is the joint posterior density for the parameters is

\begin{matrix} π (\underset{\tilde{}}{v}, \underset{\tilde{}}{w}, \underset{\tilde{}}{β}, σ^{2}, δ^{2} | \underset{\tilde{}}{y}) & \propto \prod_{i = 1}^{ℓ} \prod_{j = 1}^{n_{i}} \prod_{k = 1}^{m_{i j}} [\frac{e^{({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}) y_{i j k}}}{1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}}}] \times {(\frac{1}{\sqrt{2 π σ^{2}}})}^{n} exp \{- \sum_{i = 1}^{l} \sum_{j = 1}^{n i} \frac{{(w_{i j} - ν_{i})}^{2}}{2 σ^{2}}\} \\ \times {(\frac{1}{\sqrt{2 π δ^{2}}})}^{l} exp \{- \sum_{i = 1}^{l} \frac{{(ν_{i} - β_{0})}^{2}}{2 δ^{2}}\} \frac{1}{{(1 + σ^{2})}^{2}} \frac{1}{{(1 + δ^{2})}^{2}} . \end{matrix}

We can see that the form of the joint posterior density is very complicated. It is very time consuming to draw all the posterior samples if applying the exact MCMC method. But the exact method will provide reliable estimates of all parameters, so in order to test the performance of our approximation method, we need to apply MCMC method on our model and then compare the performance of two methods. We use Metropolis-Hastings sampler to draw samples for

\underset{\tilde{}}{β}, σ^{2}, δ^{2}

together and then draw

\underset{\tilde{}}{ν}

given

\underset{\tilde{}}{β}, σ^{2}, δ^{2}

samples. At last, we use MH method to draw

\underset{\tilde{}}{w}

given

\underset{\tilde{}}{ν}, \underset{\tilde{}}{β}, σ^{2}, δ^{2}

samples.

In order to draw samples for

\underset{\tilde{}}{β}, σ^{2}, δ^{2}

together, we need to integrate out

\underset{\tilde{}}{w}

and

\underset{\tilde{}}{v}

. First, we integrate out

\underset{\tilde{}}{w}

from the joint posterior density

π (\underset{\tilde{}}{v}, \underset{\tilde{}}{w}, \underset{\tilde{}}{β}, σ^{2}, δ^{2} | \underset{\tilde{}}{y})

to get

\begin{matrix} π (\underset{\tilde{}}{ν}, \underset{\tilde{}}{β}, σ^{2}, δ^{2} | \underset{\tilde{}}{y}) & \propto \int_{Ω}^{} \prod_{i = 1}^{ℓ} \{\prod_{j = 1}^{n_{i}} [\prod_{k = 1}^{m_{i j}} \frac{e^{({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + μ_{i} + w_{i j}) y_{i j}}}{1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + μ_{i} + w_{i j}}}] \frac{1}{\sqrt{2 π σ^{2}}} e^{- \frac{{(w_{i j} - ν_{i})}^{2}}{2 σ^{2}}}\} d \underset{\tilde{}}{w} \\ \times {(\frac{1}{\sqrt{2 π δ^{2}}})}^{l} exp \{- \sum_{i = 1}^{l} \frac{{(ν_{i} - β_{0})}^{2}}{2 δ^{2}}\} \frac{1}{{(1 + σ^{2})}^{2}} \frac{1}{{(1 + δ^{2})}^{2}} . \end{matrix}

Notice that the integrant is not any simple distribution function, so we use Monte Carlo numberical integration to approximate the integrals. Let

z_{i j}^{w} = \frac{w_{i j} - ν_{i}}{σ}

. Notice that

z_{i j}^{w}

follows standard normal distribution. For standard normal density, 99.7% of data will fall within 3 standard deviations of the mean, which corresponds to the interval

[- 3, 3]

. Therefore, we bounded the integration domain to

[- 3, 3]

and divide the interval to M equal subintervals

[p_{a - 1}, p_{a}], a = 1, \dots, M

. Then we can get an approximate but very accurate joint density

\begin{matrix} π (\underset{\tilde{}}{ν}, \underset{\tilde{}}{β}, σ^{2}, δ^{2} | \underset{\tilde{}}{y}) & \propto \prod_{i = 1}^{ℓ} \prod_{j = 1}^{n_{i}} \{\sum_{a = 1}^{M} \int_{p_{a - 1}}^{p_{a}} \frac{e^{\sum_{k = 1}^{m_{i j}} ({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}) y_{i j k}}}{\prod_{k = 1}^{m_{i j}} [1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}}]} \frac{1}{\sqrt{2 π σ^{2}}} e^{- \frac{{(w_{i j} - ν_{i})}^{2}}{2 σ^{2}}} d w_{i j}\} \\ \times {(\frac{1}{\sqrt{2 π δ^{2}}})}^{l} exp \{- \sum_{i = 1}^{l} \frac{{(ν_{i} - β_{0})}^{2}}{2 δ^{2}}\} \frac{1}{{(1 + σ^{2})}^{2}} \frac{1}{{(1 + δ^{2})}^{2}} \\ \propto \prod_{i = 1}^{ℓ} \prod_{j = 1}^{n_{i}} \{\sum_{a = 1}^{M} \int_{p_{a - 1}}^{p_{a}} \frac{e^{\sum_{k = 1}^{m_{i j}} ({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + σ z_{i j}^{w} + ν_{i}) y_{i j k}}}{\prod_{k = 1}^{m_{i j}} [1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + σ z_{i j}^{w} + ν_{i}}]} \frac{1}{\sqrt{2 π}} e^{- \frac{{(z_{i j}^{w})}^{2}}{2}} d z_{i j}^{w}\} \\ \times {(\frac{1}{\sqrt{2 π δ^{2}}})}^{l} exp \{- \sum_{i = 1}^{l} \frac{{(ν_{i} - β_{0})}^{2}}{2 δ^{2}}\} \frac{1}{{(1 + σ^{2})}^{2}} \frac{1}{{(1 + δ^{2})}^{2}} \end{matrix}

Let

{\bar{z}}_{a}^{w} = \frac{p_{a} - p_{a - 1}}{2}

, which is the midpoint of each interval

[p_{a - 1}, p_{a}], a = 1, \dots, M

. We use midpoint rule to approximate the definite integrals. We divide the interval

[- 3, 3]

into 100 subintervals, and so we use 100 midpoints to get the approximate joint posterior distribution

\begin{matrix} π (\underset{\tilde{}}{ν}, \underset{\tilde{}}{β}, σ^{2}, δ^{2} | \underset{\tilde{}}{y}) & \approx \prod_{i = 1}^{ℓ} \prod_{j = 1}^{n_{i}} \{\sum_{a = 1}^{100} \frac{e^{\sum_{k = 1}^{m_{i j}} ({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + σ {\bar{z}}_{a}^{w} + ν_{i}) y_{i j k}}}{\prod_{k = 1}^{m_{i j}} [1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + σ {\bar{z}}_{a}^{w} + ν_{i}}]} (Φ (a) - Φ (a - 1))\} \\ \times {(\frac{1}{\sqrt{2 π δ^{2}}})}^{l} exp \{- \sum_{i = 1}^{l} \frac{{(ν_{i} - β_{0})}^{2}}{2 δ^{2}}\} \frac{1}{{(1 + σ^{2})}^{2}} \frac{1}{{(1 + δ^{2})}^{2}} \end{matrix}

Similarly, let

z_{i}^{ν} = \frac{ν_{i} - β_{0}}{δ}

and

{\bar{z}}_{b}^{ν} = \frac{p_{b} - p_{b - 1}}{2}, b = 1, \dots, 100

. We use the midpoint rule to approximate the definite integral with respect to

\underset{\tilde{}}{ν}

and then get the posterior density of

\underset{\tilde{}}{β}, σ^{2}, δ^{2} | \underset{\tilde{}}{y}

\begin{matrix} π (\underset{\tilde{}}{β}, σ^{2}, δ^{2} | \underset{\tilde{}}{y}) & \approx \prod_{i = 1}^{ℓ} \{\sum_{b = 1}^{100} [\prod_{j = 1}^{n_{i}} (\sum_{a = 1}^{100} \frac{e^{\sum_{k = 1}^{m_{i j}} ({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + β_{0} + {\bar{z}}_{a}^{w} δ + {\bar{z}}_{b}^{ν} σ) y_{i j k}}}{\prod_{k = 1}^{m_{i j}} [1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + β_{0} + {\bar{z}}_{a}^{w} δ + {\bar{z}}_{b}^{ν} σ}]} Δ (Φ (p_{a})))] Δ (Φ (p_{b}))\} \\ \times \frac{1}{{(1 + σ^{2})}^{2}} \frac{1}{{(1 + δ^{2})}^{2}} . \end{matrix}

We propose to draw samples from

\underset{\tilde{}}{β}, σ^{2}, δ^{2}

jointly by applying M-H sampler. Target function is

π (\underset{\tilde{}}{β}, σ^{2}, δ^{2} | \underset{\tilde{}}{y})

. We set the proposal function as

(\begin{matrix} \underset{\tilde{}}{β} \\ log σ^{2} \\ log δ^{2} \end{matrix}) | \underset{\tilde{}}{y} \sim Normal \{(\begin{matrix} {\bar{\underset{\tilde{}}{β}}}_{a} \\ log \bar{σ_{a}^{2}} \\ log \bar{δ_{a}^{2}} \end{matrix}), σ_{t}^{2} Σ_{a}\},

where

\frac{t}{σ_{t}^{2}} \sim χ_{t}^{2}

, Chi-square on t degree of freedom, i.e.

(\begin{matrix} \underset{\tilde{}}{β} \\ log σ^{2} \\ log δ^{2} \end{matrix}) | \underset{\tilde{}}{y} \sim

Student’s t. Here t is tuning constant.

We also use M-H sampler draw samples for

\underset{\tilde{}}{ν}

and

\underset{\tilde{}}{w}

respectively. Proposal functions are

π_{a} (\underset{\tilde{}}{ν} ∣ {\underset{\tilde{}}{β}}_{(0)}, β_{0}, σ^{2}, \underset{\tilde{}}{y})

and

π_{a} (\underset{\tilde{}}{w} ∣ \underset{\tilde{}}{ν}, {\underset{\tilde{}}{β}}_{(0)}, σ^{2}, \underset{\tilde{}}{y})

respectively from the INNA method.

The target function to draw

\underset{\tilde{}}{ν}

is

\begin{matrix} π (\underset{\tilde{}}{ν} | \underset{\tilde{}}{β}, σ^{2}, δ^{2}, \underset{\tilde{}}{y}) & \propto \prod_{i = 1}^{ℓ} \prod_{j = 1}^{n_{i}} \{\sum_{a = 1}^{100} \frac{e^{\sum_{k = 1}^{m_{i j}} ({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + σ {\bar{z}}_{a}^{w} + ν_{i}) y_{i j k}}}{\prod_{k = 1}^{m_{i j}} [1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + σ {\bar{z}}_{a}^{w} + ν_{i}}]} (Φ (a) - Φ (a - 1))\} \\ \times {(\frac{1}{\sqrt{2 π δ^{2}}})}^{l} exp \{- \sum_{i = 1}^{l} \frac{{(ν_{i} - β_{0})}^{2}}{2 δ^{2}}\} . \end{matrix}

After we get samples for

\underset{\tilde{}}{ν}

, we can use M-H sampler to draw

\begin{matrix} π (\underset{\tilde{}}{w} | \underset{\tilde{}}{ν}, {\underset{\tilde{}}{β}}_{(0)}, σ^{2}, \underset{\tilde{}}{y}) & \propto \prod_{i = 1}^{ℓ} \prod_{j = 1}^{n_{i}} \prod_{k = 1}^{m_{i j}} [\frac{e^{({\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}) y_{i j k}}}{1 + e^{{\underset{\tilde{}}{x}}_{i j k}^{'} {\underset{\tilde{}}{β}}_{(0)} + w_{i j}}}] \times {(\frac{1}{\sqrt{2 π σ^{2}}})}^{n} exp \{- \sum_{i = 1}^{l} \sum_{j = 1}^{n_{i}} \frac{{(w_{i j} - ν_{i})}^{2}}{2 σ^{2}}\} . \end{matrix}

Appendix B. Proof of Theorem 1

Proof.

By Lemma 2, the posterior density is logconcave. Then according to Lemma 1, the posterior distribution

\underset{\tilde{}}{τ} | \underset{\tilde{}}{y}

is approximately a multivariate normal distribution.

By Lemma 1, evaluating all quantities at

{\underset{\tilde{}}{τ}}^{*}

, the mean is

(\begin{matrix} {\underset{\tilde{}}{μ}}_{w} \\ {\underset{\tilde{}}{μ}}_{β} \end{matrix}) = {\underset{\tilde{}}{τ}}^{*} - H^{- 1} \underset{\tilde{}}{g} = (\begin{matrix} {\underset{\tilde{}}{w}}^{*} \\ {\underset{\tilde{}}{β}}_{(0)}^{*} \end{matrix}) + (\begin{matrix} E F^{'} \\ F G \end{matrix}) (\begin{matrix} {\underset{\tilde{}}{g}}_{1} \\ {\underset{\tilde{}}{g}}_{2} \end{matrix}) = (\begin{matrix} {\underset{\tilde{}}{w}}^{*} + E {\underset{\tilde{}}{g}}_{1} + F^{'} {\underset{\tilde{}}{g}}_{2} \\ {\underset{\tilde{}}{β}}_{(0)}^{*} + F {\underset{\tilde{}}{g}}_{1} + G {\underset{\tilde{}}{g}}_{2} \end{matrix}) .

Also, the covariance matrix is

- H^{- 1} = {(\begin{matrix} D C^{'} \\ C B \end{matrix})}^{- 1} = (\begin{matrix} E F^{'} \\ F G \end{matrix}) .

Therefore, by Lemma 1, the approximate joint posterior density of

\underset{\tilde{}}{w}, {\underset{\tilde{}}{β}}_{(0)} | \underset{\tilde{}}{y}

is

(\begin{matrix} \underset{\tilde{}}{w} \\ {\underset{\tilde{}}{β}}_{(0)} \end{matrix}) | \underset{\tilde{}}{y} \overset{}{\sim} Normal \{(\begin{matrix} {\underset{\tilde{}}{μ}}_{w} \\ {\underset{\tilde{}}{μ}}_{β} \end{matrix}), (\begin{matrix} E F^{'} \\ F G \end{matrix})\} .

Finally, using the property of the multivariate normal density, the conditional posterior density of

\underset{\tilde{}}{w} | {\underset{\tilde{}}{β}}_{(0)}, \underset{\tilde{}}{y}

and

{\underset{\tilde{}}{β}}_{(0)} | \underset{\tilde{}}{y}

can also be approximated by multivariate normal distributions,

\underset{\tilde{}}{w} | {\underset{\tilde{}}{β}}_{(0)}, \underset{\tilde{}}{y} \overset{}{\sim} Normal {{\underset{\tilde{}}{μ}}_{w} - D^{- 1} C^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}), D^{- 1}} a n d {\underset{\tilde{}}{β}}_{(0)} | \underset{\tilde{}}{y} \overset{}{\sim} Normal {{\underset{\tilde{}}{μ}}_{β}, G},

where

{\underset{\tilde{}}{μ}}_{w} = {\underset{\tilde{}}{w}}^{*} + E {\underset{\tilde{}}{g}}_{1} + F^{'} {\underset{\tilde{}}{g}}_{2} a n d {\underset{\tilde{}}{μ}}_{β} = {\underset{\tilde{}}{β}}_{(0)}^{*} + F {\underset{\tilde{}}{g}}_{1} + G {\underset{\tilde{}}{g}}_{2} .

□

Appendix C. Proof of Theorem 2

Proof.

First, look at the exponent terms containing

\underset{\tilde{}}{w}

in the above approximate posterior density function

\begin{matrix} {[\underset{\tilde{}}{w} - ({\underset{\tilde{}}{μ}}_{w} - D^{- 1} C^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}))]}^{'} D [\underset{\tilde{}}{w} - ({\underset{\tilde{}}{μ}}_{w} - D^{- 1} C^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}))] + {[\underset{\tilde{}}{w} - {\underset{\tilde{}}{μ}}_{ν}]}^{'} {(σ^{2} I)}^{- 1} [\underset{\tilde{}}{w} - {\underset{\tilde{}}{μ}}_{ν}] \\ = Σ_{\underset{\tilde{}}{w}}^{'} (D + \frac{1}{σ^{2}} I) Σ_{\underset{\tilde{}}{w}} + {[{\underset{\tilde{}}{μ}}_{w} - D^{- 1} C^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}) - {\underset{\tilde{}}{μ}}_{ν}]}^{'} {(D^{- 1} + σ^{2} I)}^{- 1} [{\underset{\tilde{}}{μ}}_{w} - D^{- 1} C^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}) - {\underset{\tilde{}}{μ}}_{ν}], \end{matrix}

where

Σ_{\underset{\tilde{}}{w}} = [\underset{\tilde{}}{w} - {(D + \frac{1}{σ^{2}} I)}^{- 1} (D {\underset{\tilde{}}{μ}}_{w} - C^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}) + \frac{1}{σ^{2}} {\underset{\tilde{}}{μ}}_{ν})]

.

Then it can show that the

π_{a} (\underset{\tilde{}}{w} ∣ \underset{\tilde{}}{ν}, {\underset{\tilde{}}{β}}_{(0)}, σ^{2}, \underset{\tilde{}}{y})

is

\underset{\tilde{}}{w} | \underset{\tilde{}}{ν}, {\underset{\tilde{}}{β}}_{(0)}, σ^{2}, \underset{\tilde{}}{y} \overset{app}{\sim} Normal \{{(D + \frac{1}{σ^{2}} I)}^{- 1} (D {\underset{\tilde{}}{μ}}_{w} - C^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}) + \frac{1}{σ^{2}} {\underset{\tilde{}}{μ}}_{ν}), {(D + \frac{1}{σ^{2}} I)}^{- 1}\} .

Notice that

(D + \frac{1}{σ^{2}} I)

is diagonal matrix. Then given

\underset{\tilde{}}{ν}, {\underset{\tilde{}}{β}}_{(0)}, σ^{2}, \underset{\tilde{}}{y}

, all

w_{i j}

s are independent. This is an important result because parallel computation can be done for

w_{i j}

, which accommodates time-consuming and massive storage challenges in big data analysis. This result holds for the exact conditional posterior density of the

μ_{i j}

. Since

\underset{\tilde{}}{w}

has a multivariate normal distribution, we can integrate out

\underset{\tilde{}}{w}

from the joint approximate posterior density

π_{a} (\underset{\tilde{}}{w}, \underset{\tilde{}}{ν}, \underset{\tilde{}}{β}, σ^{2}, δ^{2} ∣ \underset{\tilde{}}{y})

, and obtain the joint posterior density of

\underset{\tilde{}}{ν}, \underset{\tilde{}}{β}, σ^{2}

and

δ^{2}

\begin{matrix} π_{a} (\underset{\tilde{}}{ν}, \underset{\tilde{}}{β}, σ^{2}, δ^{2} ∣ \underset{\tilde{}}{y}) & \propto e^{- \frac{1}{2} \{{[{\underset{\tilde{}}{μ}}_{ν} - D^{- 1} C^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}) - {\underset{\tilde{}}{μ}}_{w}]}^{'} {(D^{- 1} + σ^{2} I)}^{- 1} [{\underset{\tilde{}}{μ}}_{ν} - D^{- 1} C^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}) - {\underset{\tilde{}}{μ}}_{w}]\}} \\ \times e^{- \frac{1}{2} \{{[\underset{\tilde{}}{ν} - β_{0} \underset{\tilde{}}{j}]}^{'} {(δ^{2} I)}^{- 1} [\underset{\tilde{}}{ν} - β_{0} \underset{\tilde{}}{j}] + {[{\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}]}^{'} G^{- 1} [{\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}]\}} \\ \times \frac{{| D |}^{1 / 2}}{| δ^{2} {I |}^{1 / 2} | D + \frac{1}{σ^{2}} {I |}^{1 / 2} {| G |}^{1 / 2}} \frac{1}{{(1 + σ^{2})}^{2}} \frac{1}{{(1 + δ^{2})}^{2}} . \end{matrix}

Next, we will show that the approximate conditional posterior density of

ν_{i}

is also normal distribution and all

ν_{i}

s are independent as well. Here we consider each

ν_{i}

. Let

\sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} = n

,

{(σ_{i j}^{2})}_{n \times n} = D^{- 1} + σ^{2} I

,

{({\underset{\tilde{}}{t}}_{i j})}_{n \times 1} = D^{- 1} C

and

{(μ_{w_{i j}})}_{n \times 1} = {\underset{\tilde{}}{μ}}_{w}

.

Look at the exponent only containing

ν_{i}, i = 1, \dots, ℓ .

in the

π_{a} (\underset{\tilde{}}{ν}, \underset{\tilde{}}{β}, σ^{2}, δ^{2} ∣ \underset{\tilde{}}{y})

\begin{matrix} \sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \frac{1}{σ_{i j}^{2}} {[ν_{i} - μ_{w_{i j}} + {\underset{\tilde{}}{t}}_{i j}^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β})]}^{2} + \frac{1}{δ^{2}} \sum_{i = 1}^{ℓ} {(ν_{i} - β_{0})}^{2} \\ = \sum_{i = 1}^{ℓ} {(\frac{1}{\sum_{j = 1}^{n_{i}} σ_{i j}^{2}} + \frac{1}{δ^{2}})}^{- 1} {\{ν_{i} - \frac{(\frac{1}{\sum_{j = 1}^{n_{i}} σ_{i j}^{2}}) [{\bar{μ}}_{w_{i}} - {\bar{\underset{\tilde{}}{t}}}_{i}^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β})] + \frac{1}{δ^{2}} β_{0}}{\frac{1}{\sum_{j = 1}^{n_{i}} σ_{i j}^{2}} + \frac{1}{δ^{2}}}\}}^{2} \\ + \sum_{i = 1}^{ℓ} {(\frac{1}{1 / \sum_{j = 1}^{n_{i}} σ_{i j}^{2}} + δ^{2})}^{- 1} {\{{\bar{μ}}_{w_{i}} - {\bar{\underset{\tilde{}}{t}}}_{i}^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}) - β_{0}\}}^{2} \\ + \sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \frac{1}{σ_{i j}^{2}} {\{{({\bar{\underset{\tilde{}}{t}}}_{i} - {\underset{\tilde{}}{t}}_{i j})}^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}) - ({\bar{μ}}_{w_{i}} - μ_{w_{i j}})\}}^{2}, \end{matrix}

where

{\bar{μ}}_{w_{i}} = \frac{1}{n_{i}} \sum_{j = 1}^{n_{i}} μ_{w_{i j}}

and

{\bar{\underset{\tilde{}}{t}}}_{i} = \frac{1}{n_{i}} \sum_{j = 1}^{n_{i}} {\underset{\tilde{}}{t}}_{i j}

.

Then it is easy to see that

ν_{i} | β_{0}, σ^{2}, δ^{2}, \underset{\tilde{}}{y} \overset{app}{\sim} Normal \{\frac{(\frac{1}{\sum_{j = 1}^{n_{i}} σ_{i j}^{2}}) [{\bar{μ}}_{w_{i}} - {\bar{\underset{\tilde{}}{t}}}_{i}^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β})] + \frac{1}{δ^{2}} β_{0}}{\frac{1}{\sum_{j = 1}^{n_{i}} σ_{i j}^{2}} + \frac{1}{δ^{2}}}, \frac{1}{\sum_{j = 1}^{n_{i}} σ_{i j}^{2}} + \frac{1}{δ^{2}}\} .

Similarly, we can use parallel computing to draw

ν_{i}, i = 1, \dots, ℓ

as well since all of them are independent given

{\underset{\tilde{}}{β}}_{(0)}, β_{0}, σ^{2}, δ^{2}

. Then we can integrate out

\underset{\tilde{}}{ν}

from the joint approximate posterior density

π_{a} (\underset{\tilde{}}{ν}, \underset{\tilde{}}{β}, σ^{2}, δ^{2} ∣ \underset{\tilde{}}{y})

and obtain the joint posterior density of

\underset{\tilde{}}{β}, σ^{2}

and

δ^{2}

\begin{matrix} π_{a} (\underset{\tilde{}}{β}, σ^{2}, δ^{2} | \underset{\tilde{}}{y}) & \propto exp \{- \frac{1}{2} \sum_{i = 1}^{ℓ} {(\frac{1}{1 / \sum_{j = 1}^{n_{i}} σ_{i j}^{2}} + δ^{2})}^{- 1} {[{\bar{μ}}_{w_{i}} - {\bar{\underset{\tilde{}}{t}}}_{i}^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}) - β_{0}]}^{2}\} \\ \times exp \{- \frac{1}{2} \sum_{i = 1}^{ℓ} \sum_{j = 1}^{n_{i}} \frac{1}{σ_{i j}^{2}} {[{{({\bar{\underset{\tilde{}}{t}}}_{i} - {\underset{\tilde{}}{t}}_{i j})}^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}) - ({\bar{μ}}_{w_{i}} - μ_{w_{i j}})]}^{2}\} \\ \times \prod_{i = 1}^{l} {(\frac{1}{\sum_{j = 1}^{n_{i}} σ_{i j}^{2}} + \frac{1}{δ^{2}})}^{\frac{1}{2}} \frac{1}{| δ^{2} {I |}^{1 / 2} {| D + \frac{1}{σ^{2}} I |}^{1 / 2}} \frac{1}{{(1 + σ^{2})}^{2}} \frac{1}{{(1 + δ^{2})}^{2}} \\ = e^{- \frac{1}{2} {({\underset{\tilde{}}{μ}}_{w} - D^{- 1} C^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}) - β_{0} \underset{\tilde{}}{j})}^{'} {(D^{- 1} + σ^{2} I + δ^{2} I)}^{- 1} ({\underset{\tilde{}}{μ}}_{w} - D^{- 1} C^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}) - β_{0} \underset{\tilde{}}{j})} \\ \times e^{- \frac{1}{2} {({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β})}^{'} G^{- 1} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β})} \\ \times \prod_{i = 1}^{l} {(\frac{1}{\sum_{j = 1}^{n_{i}} σ_{i j}^{2}} + \frac{1}{δ^{2}})}^{\frac{1}{2}} \frac{1}{| δ^{2} D + \frac{δ^{2}}{σ^{2}} {I |}^{1 / 2}} \frac{1}{{(1 + σ^{2})}^{2}} \frac{1}{{(1 + δ^{2})}^{2}} . \end{matrix}

Next we assume that the conditional posterior density of

\underset{\tilde{}}{β} | σ^{2}, δ^{2}, \underset{\tilde{}}{y}

has an approximate multivariate normal density,

(\begin{matrix} β_{0} \\ {\underset{\tilde{}}{β}}_{(0)} \end{matrix}) | σ^{2}, δ^{2}, \underset{\tilde{}}{y} \sim Normal \{(\begin{matrix} ω_{0} \\ {\underset{\tilde{}}{ω}}_{(0)} \end{matrix}), {(\begin{matrix} δ_{0}^{2} {\underset{\tilde{}}{γ}}^{'} \\ \underset{\tilde{}}{γ} Δ_{(0)} \end{matrix})}^{- 1}\},

which is denoted by

π_{a} (\underset{\tilde{}}{β} ∣ σ^{2}, δ^{2}, \underset{\tilde{}}{y})

. The density function is

π_{a} (\underset{\tilde{}}{β} ∣ σ^{2}, δ^{2}, \underset{\tilde{}}{y}) \propto {|\begin{matrix} (\begin{matrix} δ_{0}^{2} {\underset{\tilde{}}{γ}}^{'} \\ \underset{\tilde{}}{γ} Δ_{(0)} \end{matrix}) \end{matrix}|}^{\frac{1}{2}} \times e^{- \frac{1}{2} {(\begin{matrix} β_{0} - ω_{0} \\ {\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{ω}}_{(0)} \end{matrix})}^{'} (\begin{matrix} δ_{0}^{2} {\underset{\tilde{}}{γ}}^{'} \\ \underset{\tilde{}}{γ} Δ_{(0)} \end{matrix}) (\begin{matrix} β_{0} - ω_{0} \\ {\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{ω}}_{(0)} \end{matrix})}

So the exponent terms are

{(\begin{matrix} β_{0} - ω_{0} \\ {\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{ω}}_{(0)} \end{matrix})}^{'} (\begin{matrix} δ_{0}^{2} {\underset{\tilde{}}{γ}}^{'} \\ \underset{\tilde{}}{γ} Δ_{(0)} \end{matrix}) (\begin{matrix} β_{0} - ω_{0} \\ {\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{ω}}_{(0)} \end{matrix}) .

Consider the exponent terms containing

{\underset{\tilde{}}{β}}_{(0)}

and

β_{0}

\begin{matrix} {({\underset{\tilde{}}{μ}}_{w} - D^{- 1} C^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}) - β_{0} \underset{\tilde{}}{j})}^{'} {(D^{- 1} + σ^{2} I + δ^{2} I)}^{- 1} ({\underset{\tilde{}}{μ}}_{w} - D^{- 1} C^{'} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}) - β_{0} \underset{\tilde{}}{j}) \\ + {({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β})}^{'} G^{- 1} ({\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{μ}}_{β}) \\ = {\underset{\tilde{}}{β}}_{(0)}^{'} [C D^{- 1} {(D^{- 1} + σ^{2} I + δ^{2} I)}^{- 1} D^{- 1} C^{'} + G^{- 1}] {\underset{\tilde{}}{β}}_{(0)} + {\underset{\tilde{}}{j}}^{'} {(D^{- 1} + σ^{2} I + δ^{2} I)}^{- 1} \underset{\tilde{}}{j} β_{0}^{2} \\ - 2 [{({\underset{\tilde{}}{μ}}_{w} + D^{- 1} C^{'} {\underset{\tilde{}}{μ}}_{β})}^{'} (D^{- 1} + σ^{2} I + δ^{2} I) D^{- 1} C^{'} + {\underset{\tilde{}}{μ}}_{β}^{'} G^{- 1}] {\underset{\tilde{}}{β}}_{(0)} \\ - 2 [{({\underset{\tilde{}}{μ}}_{w} + D^{- 1} C^{'} {\underset{\tilde{}}{μ}}_{β})}^{'} {(D^{- 1} + σ^{2} I + δ^{2} I)}^{- 1} \underset{\tilde{}}{j}] β_{0} + 2 C D^{- 1} {(D^{- 1} + σ^{2} I + δ^{2} I)}^{- 1} \underset{\tilde{}}{j} β_{0} {\underset{\tilde{}}{β}}_{(0)} \\ + {(D^{- 1} C^{'} {\underset{\tilde{}}{μ}}_{β} + {\underset{\tilde{}}{μ}}_{w})}^{'} {(D^{- 1} + σ^{2} I + δ^{2} I)}^{- 1} \underset{\tilde{}}{j} β_{0} {\underset{\tilde{}}{β}}_{(0)} \\ {({\underset{\tilde{}}{μ}}_{w} + D^{- 1} C^{'} {\underset{\tilde{}}{μ}}_{β})}^{'} {(D^{- 1} + σ^{2} I + δ^{2} I)}^{- 1} ({\underset{\tilde{}}{μ}}_{w} + D^{- 1} C^{'} {\underset{\tilde{}}{μ}}_{β}) + {\underset{\tilde{}}{μ}}_{β}^{'} G^{- 1} {\underset{\tilde{}}{μ}}_{β} . \end{matrix}

We know those two exponent parts are equal, so we have

Δ_{(0)} = C D^{- 1} {(D^{- 1} + σ^{2} I + δ^{2} I)}^{- 1} D^{- 1} C^{'} + G^{- 1},

δ_{0}^{2} = {\underset{\tilde{}}{j}}^{'} {(D^{- 1} + σ^{2} I + δ^{2} I)}^{- 1} \underset{\tilde{}}{j},

\underset{\tilde{}}{γ} = C D^{- 1} {(D^{- 1} + σ^{2} I + δ^{2} I)}^{- 1} \underset{\tilde{}}{j},

(\begin{matrix} ω_{0} \\ {\underset{\tilde{}}{ω}}_{(0)} \end{matrix}) = {(\begin{matrix} δ_{0}^{2} {\underset{\tilde{}}{γ}}^{'} \\ \underset{\tilde{}}{γ} Δ_{(0)} \end{matrix})}^{- 1} (\begin{matrix} {({\underset{\tilde{}}{μ}}_{w} + D^{- 1} C^{'} {\underset{\tilde{}}{μ}}_{β})}^{'} {(D^{- 1} + σ^{2} I + δ^{2} I)}^{- 1} \underset{\tilde{}}{j} \\ {({\underset{\tilde{}}{μ}}_{w} + D^{- 1} C^{'} {\underset{\tilde{}}{μ}}_{β})}^{'} (D^{- 1} + σ^{2} I + δ^{2} I) D^{- 1} C^{'} + {\underset{\tilde{}}{μ}}_{β}^{'} G^{- 1} \end{matrix}) .

That is,

\underset{\tilde{}}{β} | σ^{2}, δ^{2}, \underset{\tilde{}}{y}

approximately follows multivariate normal distribution,

(\begin{matrix} β_{0} \\ {\underset{\tilde{}}{β}}_{(0)} \end{matrix}) | σ^{2}, δ^{2}, \underset{\tilde{}}{y} \sim Normal \{(\begin{matrix} ω_{0} \\ {\underset{\tilde{}}{ω}}_{(0)} \end{matrix}), {(\begin{matrix} δ_{0}^{2} {\underset{\tilde{}}{γ}}^{'} \\ \underset{\tilde{}}{γ} Δ_{(0)} \end{matrix})}^{- 1}\},

Then we can easily integrate out

\underset{\tilde{}}{β}

from the joint density of

\underset{\tilde{}}{β}, σ^{2}, δ^{2} | \underset{\tilde{}}{y}

, and get the posterior density of

σ^{2}, δ^{2} | \underset{\tilde{}}{y}

\begin{matrix} π_{a} (σ^{2}, δ^{2} | \underset{\tilde{}}{y}) & \propto {|\begin{matrix} \begin{matrix} δ_{0}^{2} {\underset{\tilde{}}{γ}}^{'} \\ \underset{\tilde{}}{γ} Δ_{(0)} \end{matrix} \end{matrix}|}^{- \frac{1}{2}} \times \prod_{i = 1}^{l} {(\frac{1}{\sum_{j = 1}^{n_{i}} σ_{i j}^{2}} + \frac{1}{δ^{2}})}^{\frac{1}{2}} \frac{1}{| δ^{2} D + \frac{δ^{2}}{σ^{2}} {I |}^{1 / 2}} \frac{1}{{(1 + σ^{2})}^{2}} \frac{1}{{(1 + δ^{2})}^{2}} \\ \times exp \{- \frac{1}{2} {({\underset{\tilde{}}{μ}}_{w} + D^{- 1} C^{'} {\underset{\tilde{}}{μ}}_{β})}^{'} {(D^{- 1} + σ^{2} I + δ^{2} I)}^{- 1} ({\underset{\tilde{}}{μ}}_{w} + D^{- 1} C^{'} {\underset{\tilde{}}{μ}}_{β}) + {\underset{\tilde{}}{μ}}_{β}^{'} G^{- 1} {\underset{\tilde{}}{μ}}_{β}\} \\ \times exp \{- \frac{1}{2} {(\begin{matrix} β_{0} - ω_{0} \\ {\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{ω}}_{(0)} \end{matrix})}^{'} (\begin{matrix} δ_{0}^{2} {\underset{\tilde{}}{γ}}^{'} \\ \underset{\tilde{}}{γ} Δ_{(0)} \end{matrix}) (\begin{matrix} β_{0} - ω_{0} \\ {\underset{\tilde{}}{β}}_{(0)} - {\underset{\tilde{}}{ω}}_{(0)} \end{matrix})\} . \end{matrix}

□

References

Central Bureau of Statistics Thapathali. Nepal living Standards Survey 2003/04; Statistical Report; Central Bureau of Statistics Thapathali: Kathmandu, Nepal, 2004; Volume 1.
Rao, J.N.; Molina, I. Small Area Estimation; John Wiley & Sons: Hoboken, NJ, USA, 2015. [Google Scholar]
Nandram, B.; Chen, L.; Fu, S.-T.; Manandhar, B. Bayesian Logistic Regression for Small Areas with Numerous Households. Stat. Appl. 2018, 1, 171–205. [Google Scholar]
Yan, G.; Sedransk, J. Bayesian Diagnostic Techniques for Detecting Hierarchical Structure. Bayesian Anal. 2007, 2, 735–760. [Google Scholar]
Yan, G.; Sedransk, J. A Note on Bayesian Residuals as A Hierarchical Model Diagnostic Technique. In Stat. Pap. 2010, 51, 1. [Google Scholar] [CrossRef]
Fuller, W.A.; Goyeneche, J.J. Estimation of the State Variance Component. Unpublished Manuscript. 1988. [Google Scholar]
Torabi, M.; Rao, J.N.K. On Small Area Estimation under a Sub-Area Level Model. In J. Multivar. Anal. 2014, 127, 36–55. [Google Scholar] [CrossRef]
Chen, L.; Nandram, B.; Cruze, N.B. Hierarchical Bayesian Model with Inequality Constraints for US County Estimates. J. Off. Stat. 2022, 38, 709–732. [Google Scholar] [CrossRef]
Erciulescu, A.L.; Cruze, N.B.; Nandram, B. Model-Based County Level Crop Estimates Incorporating Auxiliary Sources of Information. J. R. Stat. Soc. Ser. (Stat. Soc.) 2019, 182, 283–303. [Google Scholar] [CrossRef]
Fay, R.E.; Herriot, R.A. Estimates of Income for Small Places: An Application of James-Stein Procedures to Census Data. J. Am. Stat. Assoc. 1979, 74, 269–277. [Google Scholar] [CrossRef]
Stukel, D.M.; Rao, J.N.K. Estimation of Regression Models with Nested Error Structure and Unequal Error Variances Under Two and Three Stage Cluster Sampling. Stat. Probab. Lett. 1997, 35, 401–407. [Google Scholar] [CrossRef]
Stukel, D.M.; Rao, J.N.K. On Small-Area Estimation under Two-Fold Nested Error Regression Models. J. Stat. Plan. Inference 1999, 78, 131–147. [Google Scholar] [CrossRef]
Nandram, B.; Sedransk, J. Bayesian Predictive Inference for a Finite Population Proportion: Two-Stage Cluster Sampling. J. R. Stat. Soc. Ser. B (Methodol.) 1993, 55, 399–408. [Google Scholar] [CrossRef]
You, Y.; Reiss, P. Hierarchical Bayes Small Area Estimation of Response Rates for an Expenditure Survey. In Proceedings of the Survey Methods Section; Statistical Society of Canada: Ottawa, ON, Canada, 2000; pp. 123–128. [Google Scholar]
Nandram, B. Bayesian Predictive Inference of a Proportion Under a Twofold Small-Area Model. J. Off. Stat. 2016, 32, 187–208. [Google Scholar] [CrossRef]
Lee, D.; Nandram, B.; Kim, D. Bayesian Predictive Inference of a Proportion under a Two-Fold Small Area Model with Heterogeneous Correlations. Surv. Methodol. 2017, 17, 69–92. [Google Scholar]
Chen, L.; Nandram, B. A Hierarchical Bayesian Beta-Binomial Model for Sub-areas. In Applied Statistical Methods. ISGES 2020, Springer Proceedings in Mathematics & Statistics; Hanagal, D.D., Latpate, R.V., Chandra, G., Eds.; Springer: Singapore, 2022; pp. 23–40. [Google Scholar]
Nandram, B. Discrimination between Complementary Log-log and Logistic Model for Ordinal Data. Commun. Stat. Theory Methods 1989, 18, 2155–2164. [Google Scholar] [CrossRef]
Roberts, G.; Rao, J.N.K.; Kumar, S. Logistic Regression Analysis of Sample Survey Data. Biometrika 1987, 74, 1–12. [Google Scholar] [CrossRef]
Nandram, B.; Chen, M.-H. Reparameterizing the Generalized Linear Model to Accelerate Gibbs Sampler Convergence. J. Stat. Comput. Simul. 1996, 54, 129–144. [Google Scholar] [CrossRef]
Albert, J.H.; Chib, S. Bayesian Analysis of Binary and Polychotomous Response Data. J. Am. Stat. Assoc. 1993, 88, 669–679. [Google Scholar] [CrossRef]
Farrell, P.J.; MacGibbon, B.; Tomberlin, T.J. Empirical Bayes Small Area Estimation Using Logistic Regression Models and Summary Statistics. J. Bus. Econ. Stat. 1997, 15, 101–108. [Google Scholar]
Nandram, B.; Erhardt, E. Fitting Bayesian Two-Stage Generalized Linear Models Using Random Samples via the SIR Algorithm. Sankhya 2010, 66, 733–755. [Google Scholar]
Nandram, B.; Choi, J.W. A Bayesian Analysis of Body Mass Index Data from Small Domains Under Nonignorable Nonresponse and Selection. J. Am. Stat. Assoc. 2010, 105, 120–135. [Google Scholar] [CrossRef]
Scott, S.L.; Blocker, A.W.; Bonassi, F.V.; Chipman, H.A.; George, E.I.; McCulloch, R.E. Bayes and Big Data: The Consensus Monte Carlo Algorithm; Technical Report; Google, Inc.: Mountain View, CA, USA, 2013; pp. 1–22. [Google Scholar]
Miroshnikov, A.; Wei, Z.; Conlon, E.M. Parallel Markov chain Monte Carlo for non-Gaussian posterior distributions. Stat 2015, 4, 304–319. [Google Scholar] [CrossRef] [Green Version]
Rue, H.; Martino, S.; Chopin, N. Approximate Bayesian Inference for Latent Gaussian Models Using Integrated Nested Laplace Approximations. J. R. Stat. Soc. (Ser. B) 2009, 71, 319–392. [Google Scholar] [CrossRef]
Fong, Y.; Rue, H.; Wakefield, J. Bayesian inference for generalized linear mixed models. Biostatistics 2010, 11, 397–412. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Illian, J.B.; Sørbye, S.H.; Rue, H. A toolbox for fitting complex spatial point process models using integrated nested Laplace approximation (INLA). Ann. Appl. Stat. 2012, 6, 1499–1530. [Google Scholar] [CrossRef] [Green Version]
Ferkingstad, E.; Rue, H. Improving the INLA approach for approximate Bayesian inference for latent Gaussian models. Electron. J. Stat. 2015, 9, 2706–2731. [Google Scholar] [CrossRef]
Cox, D.R.; Snell, E.J. Analysis of Binary Data, 2nd ed.; Chapman and Hall/CRC: London, UK, 1989. [Google Scholar]
Wang, J.; Fuller, W.A. The Mean Squared Error of Small Area Predictors Constructed With Estimated Area Variances. J. Am. Stat. Assoc. 2003, 98, 716–723. [Google Scholar] [CrossRef]
Yan, G.; Sedransk, J. Small Area Estimation Using Area Level Models and Estimated Sampling Variances. Surv. Methodol. 2006, 32, 97–103. [Google Scholar]
Erciulescu, A.L.; Berg, E. Small area estimates for the conservation effects assessment project. In In Frontiers of Hierarchical Modeling in Observational Studies, Complex Surveys and Big Data: A Conference Honoring Professor Malay Ghosh, College Park, MD, USA; Women in Statistics Conference: Cary, NC, USA, 2014. [Google Scholar]
Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
Rubin, D.B. The Bayesian Bootstrap. Ann. Stat. 1981, 9, 130–134. [Google Scholar] [CrossRef]
Eddelbuettel, D.; Francois, R. Rcpp: Seamless R and C++ Integration. J. Stat. Softw. 2011, 40, 1–18. [Google Scholar] [CrossRef] [Green Version]
Eddelbuettel, D.; Sanderson, C. Rcpparmadillo: Accelerating R with High-performance C++ Linear Algebra. Comput. Stat. Data Anal. 2014, 71, 1054–1063. [Google Scholar] [CrossRef]
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2019. [Google Scholar]
Geweke, J. Evaluating the Accuracy of Sampling-Based Approaches to the Calculation of Posterior Moments. Bayesian Stat. 1992, 169–193. [Google Scholar]

Figure 1. Comparison of the INNA method and the exact method using the PSDs of the household proportions.

Figure 2. Comparison of the INNA method and the exact method using the PSDs of the household proportions.

Figure 3. Comparison of the INNA method and the exact method using the CVs of the household proportions.

Figure 4. Comparison of the INNA method and the exact method using the PMs of the ward proportions.

Figure 5. Comparison of the INNA method and the exact method using the PSDs of the ward proportions.

Figure 6. Comparison of the INNA method and the exact method using the CVs of the ward proportions.

Table 1. Distribution of wards and households in the sample.

Strata	Mountains	Kathemandu	Urban Hill	Rural Hills	Urban Tarai	Rural Tarai	Total
PSU	32	34	28	96	34	102	326
Households	384	408	336	1152	408	1224	3912
Individuals	1949	1954	1467	5755	2104	7034	20,263

Table 2. The descriptives of 4 covariates.

Covariates		Frequency	Percentage
Age	0–14	7765	38.32
	15–59	10,951	54.04
	60+	1547	7.64
Gender	Male	9763	48.18
	Female	10,500	51.82
Nativity	Indigeous	11,903	41.25
	Non-Indigous	8,360	58.75
Religion	Hingdu	16,378	80.83
	Non-Hingdu	3385	19.17

Table 3. Posterior means (PM), associated posterior standard deviations (PSD), Geweke test p-values and effective sample sizes (ESS) of hyperparameters based on the INNA and exact method.

Method	INNA				Exact
Estimator	PM	PSD	p-Value	ESS	PM	PSD	p-Value	ESS
$β_{0}$	0.802	0.323	0.688	780	0.840	0.405	0.264	1000
$β_{1}$	0.675	0.015	0.759	857	0.654	0.016	0.156	858
$β_{2}$	0.302	0.017	0.453	894	0.313	0.019	0.655	1000
$β_{3}$	−0.862	0.017	0.605	915	−0.824	0.017	0.418	863
$β_{4}$	−0.338	0.013	0.718	937	−0.360	0.011	0.839	1000
$σ^{2}$	115.248	35.601	0.408	1000	108.959	38.719	0.615	911
$δ^{2}$	30.002	1.146	0.448	731	29.034	1.697	0.490	780

Table 4. Comparison of posterior inference about the finite population proportions using the five-number summaries at the household level.

Households		Min	Q1	Mean	Q3	Max
PM	INNA	0.049	0.541	0.565	0.584	0.981
	Exact	0.050	0.542	0.566	0.584	0.984
PSD	INNA	0.040	0.246	0.441	0.465	0.500
	Exact	0.038	0.248	0.442	0.466	0.501
PCV	INNA	0.041	0.379	0.762	0.852	1.549
	Exact	0.039	0.384	0.768	0.852	1.577

Table 5. Comparison of posterior inference about the finite population proportions using five-number summaries at the ward level.

Wards		Min	Q1	Mean	Q3	Max
PM	INNA	0.449	0.537	0.563	0.589	0.684
	Exact	0.450	0.537	0.564	0.590	0.683
PSD	INNA	0.056	0.059	0.063	0.066	0.077
	Exact	0.058	0.061	0.064	0.066	0.077
PCV	INNA	0.095	0.103	0.113	0.121	0.163
	Exact	0.097	0.103	0.113	0.122	0.165

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Chen, L.; Nandram, B. Bayesian Logistic Regression Model for Sub-Areas. Stats 2023, 6, 209-231. https://doi.org/10.3390/stats6010013

AMA Style

Chen L, Nandram B. Bayesian Logistic Regression Model for Sub-Areas. Stats. 2023; 6(1):209-231. https://doi.org/10.3390/stats6010013

Chicago/Turabian Style

Chen, Lu, and Balgobin Nandram. 2023. "Bayesian Logistic Regression Model for Sub-Areas" Stats 6, no. 1: 209-231. https://doi.org/10.3390/stats6010013

Article Menu

Bayesian Logistic Regression Model for Sub-Areas

Abstract

1. Introduction

2. Sub-Area Logistic Regression Model

3. Integrated Nested Normal Approximation Method

4. Numerical Example

4.1. Nepal Living Standards Survey II

4.2. Numerical Comparison

5. Conclusions and Future Work

Author Contributions

Funding

Conflicts of Interest

Appendix A. Exact Method for Sub-Area Logistic Regression Model

Appendix B. Proof of Theorem 1

Appendix C. Proof of Theorem 2

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI