Proceeding Paper

Variational Bayesian Approximation (VBA) with Exponential Families and Covariance Estimation †

by Seyedeh Azadeh Fallah Mortezanejad 1 and Ali Mohammad-Djafari 2,3,*

1 School of Automotive and Traffic Engineering, Jiangsu University, Zhenjiang 212013, China
2 International Science Consulting and Training (ISCT), 91440 Bures sur Yvette, France
3 Shanfeng Company, Shaoxing 312352, China
* Author to whom correspondence should be addressed.
Presented at the 42nd International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Garching, Germany, 3–7 July 2023.
Phys. Sci. Forum 2023, 9(1), 12; https://doi.org/10.3390/psf2023009012
Published: 30 November 2023

Abstract: Variational Bayesian Approximation (VBA) is a fast technique for approximate Bayesian computation. The main idea is to approximate the joint posterior distribution of all the unknown variables with a simple expression. Mean-Field Variational Bayesian Approximation (MFVBA) is a particular case developed for large-scale problems, where the approximating probability law is separable in all the variables. A well-known drawback of MFVBA is that it tends to underestimate the variances of the variables even though it estimates the means well, which can lead to poor inference results. When the approximating distribution is chosen in an exponential family, a fixed point algorithm is obtained for evaluating the means; however, this does not solve the problem of underestimated variances. In this paper, we propose a modified VBA method with exponential families that first estimates the posterior mean and then improves the estimation of the posterior covariance. We demonstrate the performance of the procedure with an example.

1. Introduction

Bayesian computation uses Bayes' theorem to revise the probabilities of unknowns based on prior knowledge and data. When the likelihood and prior expressions are available, we can write the expression of the posterior law. However, this expression requires computing the integral in the denominator of the Bayes formula, called the evidence. Approximate Bayesian Computation (ABC) is a class of algorithms used to perform Bayesian computation without exactly computing the evidence. The first class of these methods generates samples from the posterior law; it includes all the Monte Carlo sampling methods, such as slice sampling, nested sampling, and Markov chain Monte Carlo (MCMC). The second category includes methods that directly compute the means and variances, such as the Variational Bayesian Approximation (VBA) methods.
MCMC is a class of algorithms for sampling from a probability distribution and estimating the posterior distribution. Metropolis et al. (1953) [1] introduced the first MCMC algorithm, the Metropolis algorithm. Hastings (1970) [2] brought it closer to applied statistics by generalizing it to asymmetric proposal distributions.
VBA methods were introduced by Jordan et al. (1998) [3]. They were motivated by the need to improve Bayesian inference for large and complex models that are intractable by conventional methods such as MCMC. VBA is a fast and robust scheme for performing Bayesian computations in high-dimensional problems. The main idea is to approximate the high-complexity posterior probability law by a simple, lower-complexity probability law, using the Kullback–Leibler divergence as the approximation criterion.
In many cases, the VBA methods are faster than the MCMC methods. VBA can produce two types of posterior approximations. The first is a free-form product of distributions, such as the conjugate exponential family, known as the Mean-Field Approximation (MFA). The second is a fixed-form posterior distribution, such as a multivariate Gaussian with a proper parametrization of the model (Sarkka and Nummenmaa, 2009 [4]). MFA is a general approach in graphical models, particularly in hierarchical Bayesian models.
The foundation of VBA is minimizing the Kullback–Leibler (KL) divergence between the approximating probability distribution and the exact posterior distribution. Rohde and Wand (2016) [5] worked on semi-parametric Mean-Field Variational Bayesian Approximation (MFVBA), a combination of the KL criterion and MFA. A significant drawback of MFVBA is that it underestimates the uncertainties of the variables and is uninformative about their covariances. Giordano et al. (2015, 2018) [6,7] corrected MFVBA with a method named Linear Response Variational Bayes (LRVB), which is able to provide proper uncertainties.
MFVBA is a valuable approximation method for estimating the posterior mean in the Bayesian framework. When using an exponential family for the approximation, we obtain a fixed point algorithm for computing the means. In this paper, we show how to use VBA with exponential families to approximate the posterior means and, at the same time, evaluate the posterior covariance more precisely. Via an example, we also compare the MFVBA and MCMC posterior distributions.
This paper is organized as follows: In Section 2, we present the main idea of MFVBA in the particular case of exponential family (EF) distributions. In Section 3, we modify the covariance matrix resulting from MFVBA to get a better approximation. In Section 4, we give details of the covariance matrix expression in the EF case and show that the covariance computation requires a matrix inversion, which is very costly in high-dimensional problems. In Section 5, we provide an example of how to compute the posterior distribution and compare the computational costs of the proposed method and MCMC. In Section 6, we present the main conclusions.

2. Mean–Field Variational Bayesian Approximation (MFVBA)

VBA approximates posterior distributions when the exact expression of the posterior is too complex or too costly to compute. MFA simplifies VBA by assuming that the approximating distribution factorizes into independent marginals, which means that each unknown parameter has its own distribution, independent of the others.
Suppose that $p$ is approximated by $q^*$ and $\theta$ is the vector of unknown parameters. We define $V = \mathrm{Cov}_{q^*}(\theta)$ and $\Sigma = \mathrm{Cov}_p(\theta)$. We know that $V < \Sigma$: $V$ is a poor estimate (an underestimate) of $\Sigma$, even if $m^* = \mathrm{E}_{q^*}[\theta] = \mathrm{E}_p[\theta]$. The question is how to find a better estimate of $\Sigma$. One solution in the case of the exponential family, proposed by Giordano et al. (2015) [6], is the LRVB estimate $\Sigma = (I - VH)^{-1}V$, where $I$ is the identity matrix and $H$ is the Hessian matrix of the log posterior $\ln p$. To show this, let us go through the details step by step.

2.1. Main Idea

  • General Bayesian framework: Given the likelihood $p(D|\theta)$ and the prior $p(\theta)$, the expression of the posterior $p(\theta|D)$ is given by:
    $$p(\theta|D) = \frac{p(D|\theta)\,p(\theta)}{p(D)}.$$
  • MFVBA: Approximate $p(\theta|D)$ by a separable $q(\theta)$:
    $$q(\theta) = \prod_j q_j(\theta_j),$$
    by minimizing:
    $$\mathrm{KL}(q:p) = \int q \ln\frac{q}{p} = \mathrm{E}_q[\ln q] - \mathrm{E}_q[\ln p].$$
  • Noting by $\mathcal{L} = \mathrm{E}_q[\ln p]$ the expected log posterior and by $S = -\mathrm{E}_q[\ln q]$ the entropy of $q$, we can also write:
    $$q^* = \arg\min_q \mathrm{KL}(q:p) = \arg\max_q \mathcal{E}, \quad \text{with } \mathcal{E} = S + \mathcal{L}.$$
  • $\mathcal{E}$ is the evidence lower bound (ELBO).
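A short check of this naming, with $\ln p$ taken as the log joint $\ln p(\theta, D)$ (the convention used in Section 5, Equation (17)):
$$\ln p(D) = \underbrace{\mathrm{E}_q[\ln p(\theta, D)] + S}_{\mathcal{E}} + \mathrm{KL}\big(q : p(\cdot|D)\big) \;\ge\; \mathcal{E},$$
since the KL divergence is non-negative. Minimizing $\mathrm{KL}(q : p(\cdot|D))$ is therefore equivalent to maximizing $\mathcal{E}$, and the evidence $p(D)$ plays no role in the optimization.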

2.2. VBA and Exponential Family

  • If $q$ is chosen in an exponential family:
    $$q(\theta|\eta) = \exp\left[\eta^\top \theta - A(\eta)\right],$$
    then it is entirely characterized by its mean $m = \mathrm{E}_q[\theta]$, and so is:
    $$q^*(\theta|\eta^*) = \exp\left[\eta^{*\top} \theta - A(\eta^*)\right],$$
    with the same characterization by the mean $m^* = \mathrm{E}_{q^*}[\theta]$.
  • We can then consider the objective $\mathcal{E}$ as a function of $m$, and the first-order optimality condition is:
    $$\left.\frac{\partial \mathcal{E}}{\partial m}\right|_{m=m^*} = 0.$$

2.3. MFVBA, Exponential Family (EF) and Fixed Point Algorithm

  • VBA + exponential family with $m = \mathrm{E}_q[\theta]$:
    $$\left.\frac{\partial \mathcal{E}}{\partial m}\right|_{m=m^*} = 0 \;\Longleftrightarrow\; \left.\frac{\partial \mathcal{E}}{\partial m}\right|_{m=m^*} + m^* = m^* \;\Longleftrightarrow\; M(m^*) = m^*.$$
  • Iterating this fixed point algorithm:
    $$m^{(k)} = M(m^{(k-1)}), \quad \text{with } M(m) := \frac{\partial \mathcal{E}}{\partial m} + m,$$
    converges to $m^* = \mathrm{E}_{q^*}[\theta]$.
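To make the iteration concrete, here is a minimal sketch in Python. The gradient of $\mathcal{E}$ with respect to $m$ is model-specific, so `grad_E` is a hypothetical callable the user must supply; the convergence test and iteration cap are our own choices, not part of the paper.

```python
import numpy as np

def fixed_point_vba(grad_E, m0, tol=1e-8, max_iter=1000):
    """Iterate m <- M(m) = dE/dm + m until the update stops moving.

    grad_E : callable returning the gradient of the ELBO E with respect
             to the expectation parameters m (model-specific, assumed).
    m0     : initial guess for the expectation parameters m.
    """
    m = np.asarray(m0, dtype=float)
    for _ in range(max_iter):
        m_new = grad_E(m) + m          # M(m) := dE/dm + m
        if np.linalg.norm(m_new - m) < tol:
            return m_new               # converged to m* = E_{q*}[theta]
        m = m_new
    raise RuntimeError("fixed point iteration did not converge")
```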

3. VBA, EF and Covariance Estimation

  • Noting $V = \mathrm{Cov}_{q^*}(\theta)$ and $\Sigma = \mathrm{Cov}_p(\theta)$, we know that $V < \Sigma$.
  • $V$ is a poor estimate (an underestimate) of $\Sigma$, even if $m^* = \mathrm{E}_{q^*}[\theta] = \mathrm{E}_p[\theta]$.
  • The question is now: how to find a better estimate of $\Sigma$?
  • One solution, proposed by Giordano et al. (2015) [6], is LRVB. The main idea is to perturb $p(\theta|D)$ by an exponential family term to obtain $p_t(\theta|D)$:
    $$p_t(\theta|D) = p(\theta|D) \exp\left[t^\top\theta - C(t)\right],$$
    such that:
    $$\ln p_t(\theta|D) = \ln p(\theta|D) + t^\top\theta - C(t).$$
  • $C(t)$ is the cumulant generating function of $p(\theta|D)$, so:
    $$\Sigma = \mathrm{Cov}_p(\theta) = \left.\frac{\partial^2 C(t)}{\partial t\,\partial t^\top}\right|_{t=0} = \left.\frac{\partial\,\mathrm{E}_{p_t}[\theta]}{\partial t}\right|_{t=0} = \left.\frac{\partial m_t^*}{\partial t}\right|_{t=0},$$
    where the last equality uses the LRVB assumption that the VBA means are exact, $\mathrm{E}_{p_t}[\theta] \approx m_t^* = \mathrm{E}_{q_t^*}[\theta]$.
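A one-line verification that $C(t)$ is indeed the cumulant generating function: normalization of $p_t$ forces
$$1 = \int p(\theta|D)\, e^{t^\top\theta - C(t)}\, d\theta \;\Longrightarrow\; C(t) = \ln \mathrm{E}_p\!\left[e^{t^\top\theta}\right],$$
whose gradient and Hessian at $t = 0$ give $\mathrm{E}_p[\theta]$ and $\mathrm{Cov}_p(\theta)$, respectively.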

4. Linear Perturbation Method for Better Covariance Estimation

To summarize this method:
  • Perturb $p(\theta|D)$ by an exponential family term to obtain:
    $$p_t(\theta|D) = p(\theta|D)\exp\left[t^\top\theta - C(t)\right].$$
    Thus, up to terms that do not depend on $m$, we have:
    $$\mathrm{KL}(q : p_t) = \mathrm{KL}(q : p) - t^\top m \;\Longrightarrow\; \mathcal{E}_t = \mathcal{E} + t^\top m,$$
    so the first-order condition $\frac{\partial \mathcal{E}_t}{\partial m} = 0$ becomes the fixed point equation $M_t(m) = m$ with:
    $$M_t(m) = M(m) + t.$$
  • Differentiating the fixed point relation $m_t^* = M_t(m_t^*)$ with respect to $t$, we have:
    $$\frac{\partial m_t^*}{\partial t} = \left.\frac{\partial M_t}{\partial m}\right|_{m=m_t^*}\frac{\partial m_t^*}{\partial t} + \frac{\partial M_t}{\partial t} = \left.\frac{\partial M_t}{\partial m}\right|_{m=m_t^*}\frac{\partial m_t^*}{\partial t} + I.$$
    Evaluating at $t = 0$, we obtain:
    $$\Sigma = \left.\frac{\partial m_t^*}{\partial t}\right|_{t=0} = \frac{\partial M}{\partial m}\,\Sigma + I = \left(\frac{\partial^2 \mathcal{E}}{\partial m\,\partial m^\top} + I\right)\Sigma + I \;\Longrightarrow\; \Sigma = -\left(\frac{\partial^2 \mathcal{E}}{\partial m\,\partial m^\top}\right)^{-1},$$
    and finally, since $\mathcal{E} = \mathcal{L} + S$:
    $$\Sigma = -\left(\frac{\partial^2 \mathcal{L}}{\partial m\,\partial m^\top} + \frac{\partial^2 S}{\partial m\,\partial m^\top}\right)^{-1}.$$
  • Noting that the entropy of an exponential family is:
    $$S = -\eta^\top m + A(\eta) \;\Longrightarrow\; \frac{\partial S}{\partial m} = -\eta \;\Longrightarrow\; \frac{\partial^2 S}{\partial m\,\partial m^\top} = -\frac{\partial \eta}{\partial m} = -V^{-1},$$
    we get:
    $$\Sigma = -\left(\frac{\partial^2 \mathcal{L}}{\partial m\,\partial m^\top} + \frac{\partial^2 S}{\partial m\,\partial m^\top}\right)^{-1} = -\left(H - V^{-1}\right)^{-1} = \left(V^{-1} - H\right)^{-1}.$$
  • Finally, using the matrix inversion lemma, we obtain:
    $$\Sigma = (I - VH)^{-1} V.$$
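The correction itself is a single linear solve. A minimal sketch (ours, not the authors' code), assuming $V$ and $H$ are already available as NumPy arrays:

```python
import numpy as np

def lrvb_covariance(V, H):
    """LRVB-corrected posterior covariance: Sigma = (I - V H)^{-1} V.

    Solving the linear system (I - V H) Sigma = V is cheaper and more
    numerically stable than forming the inverse explicitly.
    """
    d = V.shape[0]
    return np.linalg.solve(np.eye(d) - V @ H, V)
```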

4.1. Dealing with Nuisance Parameters

  • If we are only interested in some of the variables $\alpha$, i.e., $\theta = (\alpha, z)$ where $z$ are nuisance parameters, then we can write:
    $$\Sigma = \begin{pmatrix} \Sigma_\alpha & \Sigma_{\alpha z} \\ \Sigma_{z\alpha} & \Sigma_z \end{pmatrix}.$$
  • Using the corresponding partition of $V$ and $H$ for $\theta = (\alpha, z)$, we obtain (see the sketch below):
    $$\Sigma_\alpha = \left(I_\alpha - V_\alpha H_\alpha - V_\alpha H_{\alpha z}\left(I_z - V_z H_z\right)^{-1} V_z H_{z\alpha}\right)^{-1} V_\alpha.$$
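In code, this block formula avoids solving the full $(\alpha, z)$ system when only the $\alpha$ block is needed. A sketch under the same assumptions as above, with `idx_a` a hypothetical boolean mask selecting the $\alpha$ entries:

```python
import numpy as np

def lrvb_covariance_alpha(V, H, idx_a):
    """LRVB covariance of the parameters of interest (alpha block only).

    V, H  : full MFVBA covariance and Hessian over theta = (alpha, z).
    idx_a : boolean mask selecting the alpha entries of theta.
    """
    idx_z = ~idx_a
    Va, Vz = V[np.ix_(idx_a, idx_a)], V[np.ix_(idx_z, idx_z)]
    Ha, Hz = H[np.ix_(idx_a, idx_a)], H[np.ix_(idx_z, idx_z)]
    Haz, Hza = H[np.ix_(idx_a, idx_z)], H[np.ix_(idx_z, idx_a)]
    Ia, Iz = np.eye(Va.shape[0]), np.eye(Vz.shape[0])
    # inner = (I_z - Vz Hz)^{-1} Vz Hza, solved rather than inverted
    inner = np.linalg.solve(Iz - Vz @ Hz, Vz @ Hza)
    return np.linalg.solve(Ia - Va @ Ha - Va @ Haz @ inner, Va)
```

The large $z$-block system is solved once against the thin matrix $V_z H_{z\alpha}$, and the final solve is only in the small $\alpha$ dimension.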

4.2. The Whole Algorithm

  • Having chosen the prior $p(\theta)$ and the likelihood $p(D|\theta)$, find the expression of the posterior law:
    $$p(\theta|D) \propto p(D|\theta)\, p(\theta).$$
  • Choose an exponential family $q(\theta)$ and find the expressions of:
    $$\mathcal{L} = \mathrm{E}_q[\ln p], \quad S = -\mathrm{E}_q[\ln q], \quad \text{and} \quad \mathcal{E} = \mathcal{L} + S,$$
    as functions of the expectation parameters $m = \mathrm{E}_q[\theta]$.
  • Find the expression of $M(m) = \frac{\partial \mathcal{E}}{\partial m} + m$ and iterate the fixed point update $m^{(k)} = M(m^{(k-1)})$ until convergence.
  • When $m^*$ is obtained, compute:
    $$V = -\left(\left.\frac{\partial^2 S}{\partial m\,\partial m^\top}\right|_{m^*}\right)^{-1} \quad \text{and} \quad H = \left.\frac{\partial^2 \mathcal{L}}{\partial m\,\partial m^\top}\right|_{m^*}.$$
  • Compute $\Sigma = (I - VH)^{-1}V$.
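Putting the steps together, a hypothetical end-to-end driver might look as follows. Here `grad_E`, `neg_inv_hess_S`, and `hess_L` are model-specific callables the user must derive from the chosen $q$ and $p$ (they are assumptions of this sketch, not names from the paper), and `fixed_point_vba` / `lrvb_covariance` are the sketches given earlier.

```python
# Step-by-step LRVB pipeline (a sketch, not the authors' code):
m_star = fixed_point_vba(grad_E, m0)   # VBA means via the fixed point
V = neg_inv_hess_S(m_star)             # V = -(d^2 S / dm dm^T)^{-1} at m*
H = hess_L(m_star)                     # H =   d^2 L / dm dm^T  at m*
Sigma = lrvb_covariance(V, H)          # Sigma = (I - V H)^{-1} V
```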

5. Numerical Experimentations of Normal–Poisson Distribution

The Normal–Poisson example is a non-conjugate generalized mixed model. The observed data are Poisson counts $y_n$ with a design vector $x_n$, for $n = 1, \ldots, N$. The considered model, shown in Figure 1, is:
$$y_n | z_n \sim \mathcal{P}\left(\exp(z_n)\right),$$
where
$$z_n | \beta, \tau \sim \mathcal{N}\left(\beta x_n, \tau^{-1}\right), \quad \beta \sim \mathcal{N}\left(0, \sigma_\beta^2\right), \quad \text{and} \quad \tau \sim \Gamma\left(\alpha_\tau, \beta_\tau\right).$$
In the first step, we approximate the joint posterior distribution of $\beta$, $\tau$ and $z = (z_1, \ldots, z_N)$ via MFVBA:
$$q(\beta, \tau, z) = q_{N+1}(\beta)\, q_{N+2}(\tau) \prod_{n=1}^N q_n(z_n).$$
We need the log joint distribution $\ln p(\beta, \tau, z, y)$, whose expectation under $q$ gives $\mathcal{L}$:
$$\ln p(\beta, \tau, z, y) = -\frac{\beta^2}{2\sigma_\beta^2} + \left(\alpha_\tau + \frac{N}{2} - 1\right)\ln\tau - \beta_\tau\tau - \frac{\tau}{2}\sum_{n=1}^N z_n^2 - \frac{\tau\beta^2}{2}\sum_{n=1}^N x_n^2 + \tau\beta\sum_{n=1}^N z_n x_n + \sum_{n=1}^N z_n y_n - \sum_{n=1}^N \exp(z_n) - \sum_{n=1}^N \ln(y_n!) + C,$$
where $C$ is a constant that does not depend on the unknown variables. We start by computing the distributions of $z_n$ for $n = 1, \ldots, N$. To simplify notation, $q_{-n}$ denotes the product of all $N+2$ factors except the $n$th one, and $\langle \cdot \rangle_q$ denotes the expectation over the density $q$. Thus, $q_n(z_n)$, the density of $z_n$, is obtained from:
$$\langle \ln p(\beta, \tau, z, y) \rangle_{q_{-n}} \equiv -\frac{\langle\tau\rangle_{q_{N+2}}}{2}\, z_n^2 + \left(\langle\tau\rangle_{q_{N+2}}\langle\beta\rangle_{q_{N+1}}\, x_n + y_n\right) z_n - \exp(z_n).$$
Since this expression does not correspond to a standard distribution because of the $\exp(z_n)$ term, we replace that term by its expectation, which is a function of $\langle z_n\rangle_{q_n}$ and $\langle z_n^2\rangle_{q_n}$ only and can be calculated using the Normal moment-generating function. Moreover, we have $\langle\tau\rangle_{q_{N+2}} = \frac{\alpha_\tau}{\beta_\tau}$ and $\langle\beta\rangle_{q_{N+1}} = 0$. Thus,
$$\langle \ln p(\beta, \tau, z, y) \rangle_{q_{-n}} \equiv -\frac{\alpha_\tau}{2\beta_\tau}\, z_n^2 + \bar{y}_n\, z_n,$$
which corresponds to a Normal distribution for $z_n$, $\mathcal{N}\left(\frac{\beta_\tau \bar{y}_n}{\alpha_\tau}, \frac{\beta_\tau}{\alpha_\tau}\right)$, for $n = 1, \ldots, N$. The process is quite similar for $q_{N+2}(\tau)$ and $q_{N+1}(\beta)$. The corresponding expressions for $q_{N+2}$ and $q_{N+1}$ are shown below, respectively:
$$\langle \ln p(\beta, \tau, z, y) \rangle_{q_{1,\ldots,N+1}} \equiv \left(\alpha_\tau + \frac{N}{2} - 1\right)\ln\tau - \frac{\tau}{2}\left[2\beta_\tau + \sum_{n=1}^N\left(\langle z_n^2\rangle_{q_n} + \langle\beta^2\rangle_{q_{N+1}}\, x_n^2 - 2\langle\beta\rangle_{q_{N+1}}\, x_n \langle z_n\rangle_{q_n}\right)\right],$$
and
$$\langle \ln p(\beta, \tau, z, y) \rangle_{q_{1,\ldots,N,N+2}} \equiv -\frac{\beta^2}{2\sigma_\beta^2} - \frac{\langle\tau\rangle_{q_{N+2}}\,\beta^2}{2}\sum_{n=1}^N x_n^2 + \langle\tau\rangle_{q_{N+2}}\,\beta\sum_{n=1}^N \langle z_n\rangle_{q_n}\, x_n.$$
Thus, we have:
$$\tau \sim \Gamma\left(\alpha_\tau + \frac{N}{2},\; \frac{1}{2}\left[2\beta_\tau + \frac{\beta_\tau}{\alpha_\tau}N + \frac{\beta_\tau^2}{\alpha_\tau^2}\sum_{n=1}^N \bar{y}_n^2 + \sigma_\beta^2\sum_{n=1}^N x_n^2\right]\right),$$
and, writing $D = 2\beta_\tau + \frac{\beta_\tau}{\alpha_\tau}N + \frac{\beta_\tau^2}{\alpha_\tau^2}\sum_{n=1}^N \bar{y}_n^2$ for brevity:
$$\beta \sim \mathcal{N}\left(\frac{\sigma_\beta^2\left(2\alpha_\tau + N\right)\frac{\beta_\tau}{\alpha_\tau}\sum_{n=1}^N \bar{y}_n x_n}{D + \sigma_\beta^2\left(2\alpha_\tau + N + 1\right)\sum_{n=1}^N x_n^2},\; \frac{\sigma_\beta^2\left[D + \sigma_\beta^2\sum_{n=1}^N x_n^2\right]}{D + \sigma_\beta^2\left(2\alpha_\tau + N + 1\right)\sum_{n=1}^N x_n^2}\right).$$
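These closed forms translate directly into code. A sketch (ours; the paper gives no code), with `ybar` the vector of per-$n$ observation means $\bar{y}_n$ and the default hyperparameters of the experiment below:

```python
import numpy as np

def mfvba_normal_poisson(x, ybar, sigma_b2=10.0, a_t=1.0, b_t=1.0):
    """Closed-form MFVBA factor parameters for the Normal-Poisson model,
    following the update equations above."""
    N = len(x)
    Sx2 = np.sum(x**2)
    D = 2*b_t + (b_t/a_t)*N + (b_t/a_t)**2 * np.sum(ybar**2)
    # q_n(z_n) = Normal(b_t*ybar_n/a_t, b_t/a_t)
    z_mean = (b_t/a_t) * ybar
    z_var = (b_t/a_t) * np.ones(N)
    # q_{N+2}(tau) = Gamma(a_t + N/2, rate)
    tau_shape = a_t + N/2
    tau_rate = 0.5 * (D + sigma_b2*Sx2)
    # q_{N+1}(beta) = Normal(beta_mean, beta_var)
    denom = D + sigma_b2*(2*a_t + N + 1)*Sx2
    beta_mean = sigma_b2*(2*a_t + N)*(b_t/a_t)*np.sum(ybar*x) / denom
    beta_var = sigma_b2*(D + sigma_b2*Sx2) / denom
    return z_mean, z_var, tau_shape, tau_rate, beta_mean, beta_var
```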
In the second step, we calculate the diagonal covariance matrix $V$:
$$V = \begin{pmatrix} \Sigma_\beta & 0 & 0 \\ 0 & \Sigma_\tau & 0 \\ 0 & 0 & \Sigma_z \end{pmatrix},$$
where, with $D$ as above:
$$\Sigma_\beta = \frac{\sigma_\beta^2\left[D + \sigma_\beta^2\sum_{n=1}^N x_n^2\right]}{D + \sigma_\beta^2\left(2\alpha_\tau + N + 1\right)\sum_{n=1}^N x_n^2}, \qquad \Sigma_\tau = \frac{2\left(2\alpha_\tau + N\right)}{\left[D + \sigma_\beta^2\sum_{n=1}^N x_n^2\right]^2},$$
and
$$\Sigma_{z_n} = \frac{\beta_\tau}{\alpha_\tau}, \quad n = 1, \ldots, N.$$
Thus, $\Sigma_z = \mathrm{diag}(\Sigma_{z_1}, \ldots, \Sigma_{z_N})$. The third step is to compute the Hessian matrix $H$:
$$H = \begin{pmatrix} \frac{\partial^2 \mathcal{L}}{\partial \beta^2} & \frac{\partial^2 \mathcal{L}}{\partial \beta\,\partial \tau} & \frac{\partial^2 \mathcal{L}}{\partial \beta\,\partial z} \\ \frac{\partial^2 \mathcal{L}}{\partial \tau\,\partial \beta} & \frac{\partial^2 \mathcal{L}}{\partial \tau^2} & \frac{\partial^2 \mathcal{L}}{\partial \tau\,\partial z} \\ \frac{\partial^2 \mathcal{L}}{\partial z\,\partial \beta} & \frac{\partial^2 \mathcal{L}}{\partial z\,\partial \tau} & \frac{\partial^2 \mathcal{L}}{\partial z^2} \end{pmatrix}.$$
Since $H$ is symmetric in our case study, we only need to calculate the main diagonal and the upper triangle. The Hessian depends on the values of $m$:
$$m = \left(\mathrm{E}_q(\beta), \mathrm{E}_q(\beta^2), \mathrm{E}_q(\tau), \mathrm{E}_q(\ln\tau), \mathrm{E}_q(z_1), \mathrm{E}_q(z_1^2), \ldots, \mathrm{E}_q(z_N), \mathrm{E}_q(z_N^2)\right)^\top.$$
Thus, the diagonal of $H$ as a function of $m$ is:
$$\frac{\partial^2 \mathcal{L}}{\partial \beta^2} = -\frac{1}{\sigma_\beta^2} - \mathrm{E}_q(\tau)\sum_{n=1}^N x_n^2, \quad \frac{\partial^2 \mathcal{L}}{\partial \tau^2} = -\left(\alpha_\tau + \frac{N}{2} - 1\right)\mathrm{E}_q\!\left(\frac{1}{\tau^2}\right), \quad \frac{\partial^2 \mathcal{L}}{\partial z_n^2} = -\mathrm{E}_q(\tau) - \mathrm{E}_q\!\left(\exp(z_n)\right), \; n = 1, \ldots, N,$$
where $\mathcal{L}$ is defined in Equation (17). The upper triangle is:
$$\frac{\partial^2 \mathcal{L}}{\partial \beta\,\partial \tau} = -\mathrm{E}_q(\beta)\sum_{n=1}^N x_n^2 + \sum_{n=1}^N \mathrm{E}_q(z_n)\, x_n, \quad \frac{\partial^2 \mathcal{L}}{\partial \beta\,\partial z_n} = \mathrm{E}_q(\tau)\, x_n, \quad \frac{\partial^2 \mathcal{L}}{\partial \tau\,\partial z_n} = -\mathrm{E}_q(z_n) + \mathrm{E}_q(\beta)\, x_n, \; n = 1, \ldots, N.$$
The last step is to compute $\Sigma$ using (15). For the numerical experiment, we generate data from the Normal–Poisson model with:
$$\sigma_\beta^2 = 10, \quad \alpha_\tau = 1, \quad \beta_\tau = 1, \quad N = 4.$$
We generate $x_n$ for $n = 1, \ldots, N$ from the Normal distribution $\mathcal{N}(2, 1)$. The number of repetitions for each $y_n$ is 100, so $y$ is an $N \times 100$ matrix. The only data we use are the observation matrix $y$ and the design vector $x$.
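A sketch of this data generation (the seed and the assumption that all 100 repetitions of $y_n$ share the same latent $z_n$ are ours; the paper does not specify them):

```python
import numpy as np

rng = np.random.default_rng(0)           # seed chosen arbitrarily
N, R = 4, 100
sigma_b2, a_t, b_t = 10.0, 1.0, 1.0

x = rng.normal(2.0, 1.0, size=N)                   # design vector
beta = rng.normal(0.0, np.sqrt(sigma_b2))          # regression weight
tau = rng.gamma(a_t, 1.0 / b_t)                    # precision (scale = 1/rate)
z = rng.normal(beta * x, 1.0 / np.sqrt(tau))       # latent log-intensities
y = rng.poisson(np.exp(z)[:, None], size=(N, R))   # N x 100 observations
ybar = y.mean(axis=1)                              # per-n observation means
```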
The covariance matrix $V$ obtained from MFVBA is diagonal:
$$V = \begin{pmatrix} 1.48927 & 0 & \mathbf{0}^\top \\ 0 & 0.00071 & \mathbf{0}^\top \\ \mathbf{0} & \mathbf{0} & I \end{pmatrix},$$
where $\mathbf{0}$ and $I$ are the zero vector of dimension $N$ and the $N \times N$ identity matrix, respectively. The Hessian matrix $H$ and the matrix $I - VH$ are:
$$H = \begin{pmatrix} -0.67147 & -0.09617 & -0.05557 & -0.03031 & -0.13360 & -0.07026 \\ -0.09617 & -1639.9471 & -0.21738 & -0.02130 & -0.02651 & -0.06653 \\ -0.05557 & -0.21738 & -2.20665 & 0 & 0 & 0 \\ -0.03031 & -0.02130 & 0 & -1.78014 & 0 & 0 \\ -0.13360 & -0.02651 & 0 & 0 & -1.86901 & 0 \\ -0.07026 & -0.06653 & 0 & 0 & 0 & -1.69561 \end{pmatrix},$$
and
$$I - VH = \begin{pmatrix} 2 & 0.14322 & 0.08276 & 0.04514 & 0.19897 & 0.10464 \\ 6.8318 \times 10^{-5} & 2.16501 & 0.00015 & 1.5130 \times 10^{-5} & 1.8832 \times 10^{-5} & 4.7264 \times 10^{-5} \\ 0.05557 & 0.21738 & 3.20665 & 0 & 0 & 0 \\ 0.03031 & 0.02130 & 0 & 2.78014 & 0 & 0 \\ 0.13360 & 0.02651 & 0 & 0 & 2.86901 & 0 \\ 0.07026 & 0.06653 & 0 & 0 & 0 & 2.69561 \end{pmatrix}.$$
The covariance matrix via LRVB is:
$$\Sigma = \begin{pmatrix} 0.74986 & 2.3409 \times 10^{-5} & 0.01299 & 0.00818 & 0.03492 & 0.01955 \\ 2.3408 \times 10^{-5} & 0.00033 & 2.1838 \times 10^{-5} & 2.2586 \times 10^{-6} & 4.1220 \times 10^{-6} & 8.7090 \times 10^{-6} \\ 0.01299 & 2.1838 \times 10^{-5} & 0.31208 & 0.00014 & 0.00060 & 0.00034 \\ 0.00817 & 2.2586 \times 10^{-6} & 0.00014 & 0.35978 & 0.00038 & 0.00021 \\ 0.03492 & 4.1220 \times 10^{-6} & 0.00060 & 0.00038 & 0.35018 & 0.00091 \\ 0.01955 & 8.7090 \times 10^{-6} & 0.00034 & 0.00021 & 0.00091 & 0.37148 \end{pmatrix}.$$
The sparsity patterns of the $V$, $H$ and $I - VH$ matrices are shown in Figure 2.
In addition, we simulate the posterior distribution via MCMC. This simulation has five chains and one divergence. The estimated marginal distribution for each unknown variable is shown in Figures 3 and 4 via MFVBA and MCMC, respectively; more details are given in Table 1. The joint distribution of the unknown variables is approximated via MFVBA and MCMC. The MFVBA distribution is the product of the marginal densities of the unknown variables; therefore, the variables are independent in MFVBA and the covariance matrix is diagonal. LRVB is a correction method for the covariance matrix, so the corrected matrix is no longer diagonal, as presented in (18). Table 1 also gives more details about the MCMC results. The Monte Carlo Standard Error (MCSE) is a chain-accuracy measure and quantifies how large the estimation noise is; based on the MCSE of the mean and sd, the noise here is small. The ess-bulk column gives an estimated bulk effective sample size based on rank-normalized draws, and the ess-tail column gives an estimated tail effective sample size, computed as the minimum of the effective sample sizes at the 5% and 95% quantiles. The last column is the r-hat convergence diagnostic, which compares the between-chain and within-chain estimates for model variables and other univariate quantities.
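The diagnostics reported in Table 1 (MCSE, ess-bulk, ess-tail, r-hat) match the output of ArviZ's summary function. A minimal sketch of how such a comparison run could be set up, assuming PyMC as the sampler (the paper does not name its MCMC implementation) and reusing `x`, `y`, and the hyperparameters from the data-generation sketch above:

```python
import numpy as np
import pymc as pm
import arviz as az

with pm.Model():
    tau = pm.Gamma("tau", alpha=1.0, beta=1.0)
    beta = pm.Normal("beta", mu=0.0, sigma=np.sqrt(10.0))
    z = pm.Normal("z", mu=beta * x, sigma=1.0 / pm.math.sqrt(tau), shape=4)
    pm.Poisson("y", mu=pm.math.exp(z)[:, None], observed=y)
    trace = pm.sample(chains=5)        # five chains, as in the text

# mean, sd, hdi 3%/97%, MCSE, ess_bulk, ess_tail and r_hat per variable
print(az.summary(trace))
```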

6. Conclusions

Bayesian computations are used to infer unknown parameters from data. ABC methods are applied to perform approximate computation, particularly in high-dimensional problems. Two classes of approximate computational methods are MCMC and VBA. MCMC generates a sequence of samples that converges to the target posterior distribution. VBA approximates the expression of the joint posterior distribution of the unknown parameters in a complex model by a simple expression. VBA has some advantages over MCMC, such as being faster, more scalable and more versatile, and it is guaranteed to converge to a local minimum of the KL divergence.
VBA's drawback is that it underestimates the covariance matrix of the unknowns. To address this, we use LRVB to approximate the covariance more precisely. However, the LRVB computations still need the inversion of matrices, which is costly when the dimension of the unknown parameters is high. We work through a numerical example to show the whole VBA and LRVB process as well as the performance of the proposed method, and compare it with classical MCMC results. In this example, the VBA implementation ran much faster than the MCMC, and with the aid of LRVB we obtained an explicit form of the joint distribution with a corrected covariance matrix.

Author Contributions

Conceptualization, S.A.F.M. and A.M.-D.; methodology, S.A.F.M. and A.M.-D.; software, S.A.F.M.; validation, A.M.-D.; formal analysis, S.A.F.M. and A.M.-D.; investigation, S.A.F.M. and A.M.-D.; resources, S.A.F.M. and A.M.-D.; data curation, S.A.F.M. and A.M.-D.; writing—original draft preparation, S.A.F.M. and A.M.-D.; writing—review and editing, S.A.F.M. and A.M.-D.; visualization, S.A.F.M. and A.M.-D.; supervision, A.M.-D.; project administration, A.M.-D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable to this article.

Conflicts of Interest

The author Ali Mohammad-Djafari is Scientific Research Leader of Shanfeng Company. The authors declare no conflict of interest.

References

  1. Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 1953, 21, 1087–1092. [Google Scholar] [CrossRef]
  2. Hastings, W.K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 1970, 57, 97–109. [Google Scholar] [CrossRef]
  3. Jordan, M.I.; Ghahramani, Z.; Jaakkola, T.S.; Saul, L.K. An introduction to variational methods for graphical models. In Learning in Graphical Models; Springer: Dordrecht, The Netherlands, 1998; pp. 105–161. [Google Scholar]
  4. Sarkka, S.; Nummenmaa, A. Recursive noise adaptive Kalman filtering by variational Bayesian approximations. IEEE Trans. Autom. Control 2009, 54, 596–600. [Google Scholar] [CrossRef]
  5. Rohde, D.; Wand, M.P. Semiparametric mean field variational Bayes: General principles and numerical issues. J. Mach. Learn. Res. 2016, 17, 5975–6021. [Google Scholar]
  6. Giordano, R.J.; Broderick, T.; Jordan, M.I. Linear response methods for accurate covariance estimates from mean field variational Bayes. Adv. Neural Inf. Process. Syst. 2015, 28, 1441–1449. [Google Scholar]
  7. Giordano, R.; Broderick, T.; Jordan, M.I. Covariances, robustness and variational Bayes. J. Mach. Learn. Res. 2018, 19, 1–49. [Google Scholar]
Figure 1. The Normal–Poisson hierarchical model when N = 4 with sample size 100.
Figure 2. Sparsity patterns for the matrices using model (16). (a) VBA covariance matrix V; (b) Hessian matrix H; (c) I − VH matrix.
Figure 3. The final estimates of the marginal densities for each variable via the MFVBA method.
Figure 4. The final marginals via the MCMC method.
Table 1. Summary of the marginal densities for the unknown variables.

| Unknown | MFVBA mean | MFVBA sd | LRVB sd | MCMC mean | MCMC sd | hdi-3% | hdi-97% | MCSE-mean | MCSE-sd | ess-bulk | ess-tail | r-hat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| β | 0.044 | 1.220 | 0.866 | 1.616 | 1.769 | 5.175 | 1.620 | 0.148 | 0.194 | 186.0 | 43.0 | 1.11 |
| τ | 0.047 | 0.027 | 0.018 | 0.361 | 0.188 | 0.057 | 0.685 | 0.024 | 0.017 | 38.0 | 51.0 | 1.11 |
| z_0 | 0.270 | 1 | 0.559 | 1.316 | 0.177 | 1.664 | 1.028 | 0.008 | 0.006 | 468.0 | 316.0 | 1.02 |
| z_1 | 0.050 | 1 | 0.600 | 3.051 | 0.410 | 3.783 | 2.252 | 0.021 | 0.015 | 382.0 | 372.0 | 1.00 |
| z_2 | 0.100 | 1 | 0.592 | 2.375 | 0.332 | 3.029 | 1.790 | 0.015 | 0.011 | 476.0 | 415.0 | 1.02 |
| z_3 | 0 | 1 | 0.610 | 7.334 | 4.641 | 14.366 | 3.311 | 0.868 | 0.620 | 45.0 | 55.0 | 1.13 |
