
Approximate Bayesian Inference

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (22 June 2021) | Viewed by 58066

Printed Edition Available!
A printed edition of this Special Issue is available here.

Special Issue Editor


Dr. Pierre Alquier
Guest Editor
Center for Advanced Intelligence Project (AIP), RIKEN, Tokyo 103-0027, Japan
Interests: statistical learning theory; mathematical statistics; Bayesian statistics; aggregation of estimators; approximate posterior inference

Special Issue Information

Already extremely popular in statistical inference, Bayesian methods are also becoming popular in machine learning and AI problems, where it is important for any device not only to predict well, but also to quantify the uncertainty of its predictions.

Traditionally, Bayesian estimators were implemented using Monte Carlo methods, such as the Metropolis–Hastings or the Gibbs sampler. These algorithms target the exact posterior distribution. However, many modern models in statistics are simply too complex to use such methodologies. In machine learning, the volume of the data used in practice makes Monte Carlo methods too slow to be useful.
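To make this concrete, here is a minimal random-walk Metropolis–Hastings sketch in Python (the function and parameter names are illustrative, not taken from any particular package). It targets a user-supplied log-posterior exactly, which is precisely what becomes impractical when the model is very complex or every posterior evaluation requires a pass over a massive dataset.

```python
import numpy as np

def metropolis_hastings(log_posterior, x0, n_samples=10_000, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings targeting the density exp(log_posterior)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    log_p = log_posterior(x)
    for i in range(n_samples):
        proposal = x + step * rng.standard_normal(x.size)   # symmetric proposal
        log_p_prop = log_posterior(proposal)
        if np.log(rng.uniform()) < log_p_prop - log_p:      # accept/reject step
            x, log_p = proposal, log_p_prop
        samples[i] = x
    return samples

# Toy example: sample from a standard 2D Gaussian "posterior".
draws = metropolis_hastings(lambda x: -0.5 * x @ x, x0=np.zeros(2))
```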

Motivated by these applications, many faster algorithms have recently been proposed that target an approximation of the posterior.

1) A first family of methods still relies on Monte Carlo simulations but targets an approximation of the posterior. For example, approximate versions of Metropolis–Hastings based on subsampling, or Langevin Monte Carlo methods, are extremely useful when the sample size or the dimension of the data is too large. The ABC algorithm is useful when the model is generative, in the sense that it is simple to sample from it, even though its likelihood may be intractable (a minimal ABC rejection sketch is given after this list).

2) Another interesting class of methods relies on optimization algorithms to approximate the posterior by a member of a tractable family of probability distributions—for example, variational approximations, Laplace approximations, the EP algorithm, etc. (a Laplace-approximation sketch is also given after this list).
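To make the two families above concrete, here are two minimal Python sketches; the function names, priors, and tolerances are illustrative choices rather than methods prescribed by this Special Issue. The first is a plain ABC rejection sampler for a generative model with an intractable likelihood: draw a parameter from the prior, simulate data, and keep the draw only if the simulated data fall within a tolerance of the observed data.

```python
import numpy as np

def abc_rejection(simulate, prior_sample, observed, distance, eps, n_draws=500, rng=None):
    """Keep prior draws whose simulated data fall within eps of the observed data."""
    rng = np.random.default_rng() if rng is None else rng
    accepted = []
    while len(accepted) < n_draws:
        theta = prior_sample(rng)
        fake = simulate(theta, rng)
        if distance(fake, observed) <= eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy generative model: 50 Gaussian draws with unknown mean theta.
obs = np.random.default_rng(0).normal(1.5, 1.0, size=50)
posterior_draws = abc_rejection(
    simulate=lambda th, rng: rng.normal(th, 1.0, size=50),
    prior_sample=lambda rng: rng.normal(0.0, 5.0),
    observed=obs,
    distance=lambda a, b: abs(a.mean() - b.mean()),   # summary-statistic distance
    eps=0.1,
)
```

The second sketches the Laplace approximation from the optimization-based family: the posterior is replaced by a Gaussian centred at the posterior mode, with covariance given by an estimate of the inverse Hessian of the negative log-posterior at that mode (here, the crude inverse-Hessian estimate returned by BFGS; in practice one would compute the exact Hessian).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

def laplace_approximation(neg_log_posterior, x0):
    """Gaussian centred at the posterior mode; covariance from BFGS's inverse-Hessian estimate."""
    res = minimize(neg_log_posterior, x0, method="BFGS")
    return multivariate_normal(mean=res.x, cov=res.hess_inv)

# Toy target: posterior proportional to exp(-(x1^2 + x2^2)/2 - x1*x2/4).
approx = laplace_approximation(lambda z: 0.5 * (z @ z) + 0.25 * z[0] * z[1],
                               x0=np.array([1.0, -2.0]))
print(approx.mean, approx.cov)
```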

Of course, even though these algorithms are much faster than exact methods, it is extremely important to quantify what is lost in accuracy with respect to the exact posterior. For some of the previous methods, such results are still only partially available. Recent work established the very good scaling properties of Langevin Monte Carlo with the dimension of the data. Another series of papers connected the question of the accuracy of variational approximations to the PAC–Bayes literature in machine learning and obtained convergence results.

The objective of this Special Issue is to provide the latest advances in approximate Monte Carlo methods and in approximations of the posterior: design of efficient algorithms, study of the statistical properties of these algorithms, and challenging applications.

Dr. Pierre Alquier
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Bayesian statistics
  • variational approximations
  • EP algorithm
  • Langevin Monte Carlo
  • Laplace approximations
  • Approximate Bayesian Computation (ABC)
  • Markov chain Monte Carlo (MCMC)
  • PAC–Bayes

Published Papers (19 papers)


Editorial


12 pages, 263 KiB  
Editorial
Approximate Bayesian Inference
by Pierre Alquier
Entropy 2020, 22(11), 1272; https://doi.org/10.3390/e22111272 - 10 Nov 2020
Cited by 10 | Viewed by 5137
Abstract
This is the Editorial article summarizing the scope of the Special Issue: Approximate Bayesian Inference. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

Research


25 pages, 4200 KiB  
Article
Coupled VAE: Improved Accuracy and Robustness of a Variational Autoencoder
by Shichen Cao, Jingjing Li, Kenric P. Nelson and Mark A. Kon
Entropy 2022, 24(3), 423; https://doi.org/10.3390/e24030423 - 18 Mar 2022
Cited by 7 | Viewed by 4223
Abstract
We present a coupled variational autoencoder (VAE) method, which improves the accuracy and robustness of the model representation of handwritten numeral images. The improvement is measured in both increasing the likelihood of the reconstructed images and in reducing divergence between the posterior and a prior latent distribution. The new method weighs outlier samples with a higher penalty by generalizing the original evidence lower bound function using a coupled entropy function based on the principles of nonlinear statistical coupling. We evaluated the performance of the coupled VAE model using the Modified National Institute of Standards and Technology (MNIST) dataset and its corrupted modification C-MNIST. Histograms of the likelihood that the reconstruction matches the original image show that the coupled VAE improves the reconstruction and this improvement is more substantial when seeded with corrupted images. All five corruptions evaluated showed improvement. For instance, with the Gaussian corruption seed the accuracy improves by 10^14 (from 10^−57.2 to 10^−42.9) and robustness improves by 10^22 (from 10^−109.2 to 10^−87.0). Furthermore, the divergence between the posterior and prior distribution of the latent distribution is reduced. Thus, in contrast to the β-VAE design, the coupled VAE algorithm improves model representation, rather than trading off the performance of the reconstruction and latent distribution divergence. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

27 pages, 1813 KiB  
Article
Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly
by Le Li and Benjamin Guedj
Entropy 2021, 23(11), 1534; https://doi.org/10.3390/e23111534 - 18 Nov 2021
Cited by 1 | Viewed by 1891
Abstract
When confronted with massive data streams, summarizing data with dimension reduction methods such as PCA raises theoretical and algorithmic pitfalls. A principal curve acts as a nonlinear generalization of PCA, and the present paper proposes a novel algorithm to automatically and sequentially learn principal curves from data streams. We show that our procedure is supported by regret bounds with optimal sublinear remainder terms. A greedy local search implementation (called slpc, for sequential learning principal curves) that incorporates both sleeping experts and multi-armed bandit ingredients is presented, along with its regret computation and performance on synthetic and real-life data. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

13 pages, 418 KiB  
Article
Still No Free Lunches: The Price to Pay for Tighter PAC-Bayes Bounds
by Benjamin Guedj and Louis Pujol
Entropy 2021, 23(11), 1529; https://doi.org/10.3390/e23111529 - 18 Nov 2021
Cited by 9 | Viewed by 2100
Abstract
“No free lunch” results state the impossibility of obtaining meaningful bounds on the error of a learning algorithm without prior assumptions and modelling, which is more or less realistic for a given problem. Some models are “expensive” (strong assumptions, such as sub-Gaussian tails), others are “cheap” (simply finite variance). As it is well known, the more you pay, the more you get: in other words, the most expensive models yield the more interesting bounds. Recent advances in robust statistics have investigated procedures to obtain tight bounds while keeping the cost of assumptions minimal. The present paper explores and exhibits what the limits are for obtaining tight probably approximately correct (PAC)-Bayes bounds in a robust setting for cheap models. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

18 pages, 1419 KiB  
Article
A Scalable Bayesian Sampling Method Based on Stochastic Gradient Descent Isotropization
by Giulio Franzese, Dimitrios Milios, Maurizio Filippone and Pietro Michiardi
Entropy 2021, 23(11), 1426; https://doi.org/10.3390/e23111426 - 28 Oct 2021
Cited by 3 | Viewed by 1694
Abstract
Stochastic gradient (SG)-based algorithms for Markov chain Monte Carlo sampling (SGMCMC) tackle large-scale Bayesian modeling problems by operating on mini-batches and injecting noise on SG steps. The sampling properties of these algorithms are determined by user choices, such as the covariance of the injected noise and the learning rate, and by problem-specific factors, such as assumptions on the loss landscape and the covariance of SG noise. However, current SGMCMC algorithms applied to popular complex models such as Deep Nets cannot simultaneously satisfy the assumptions on loss landscapes and on the behavior of the covariance of the SG noise, while operating with the practical requirement of non-vanishing learning rates. In this work we propose a novel practical method, which makes the SG noise isotropic, using a fixed learning rate that we determine analytically. Extensive experimental validations indicate that our proposal is competitive with the state of the art on SGMCMC. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

20 pages, 424 KiB  
Article
PAC-Bayes Unleashed: Generalisation Bounds with Unbounded Losses
by Maxime Haddouche, Benjamin Guedj, Omar Rivasplata and John Shawe-Taylor
Entropy 2021, 23(10), 1330; https://doi.org/10.3390/e23101330 - 12 Oct 2021
Cited by 12 | Viewed by 2411
Abstract
We present new PAC-Bayesian generalisation bounds for learning problems with unbounded loss functions. This extends the relevance and applicability of the PAC-Bayes learning framework, where most of the existing literature focuses on supervised learning problems with a bounded loss function (typically assumed to take values in the interval [0;1]). In order to relax this classical assumption, we propose to allow the range of the loss to depend on each predictor. This relaxation is captured by our new notion of HYPothesis-dependent rangE (HYPE). Based on this, we derive a novel PAC-Bayesian generalisation bound for unbounded loss functions, and we instantiate it on a linear regression problem. To make our theory usable by the largest audience possible, we include discussions on actual computation, practicality and limitations of our assumptions. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

14 pages, 326 KiB  
Article
Differentiable PAC–Bayes Objectives with Partially Aggregated Neural Networks
by Felix Biggs and Benjamin Guedj
Entropy 2021, 23(10), 1280; https://doi.org/10.3390/e23101280 - 29 Sep 2021
Cited by 11 | Viewed by 2375
Abstract
We make two related contributions motivated by the challenge of training stochastic neural networks, particularly in a PAC–Bayesian setting: (1) we show how averaging over an ensemble of stochastic neural networks enables a new class of partially-aggregated estimators, proving that these lead to unbiased lower-variance output and gradient estimators; (2) we reformulate a PAC–Bayesian bound for signed-output networks to derive in combination with the above a directly optimisable, differentiable objective and a generalisation guarantee, without using a surrogate loss or loosening the bound. We show empirically that this leads to competitive generalisation guarantees and compares favourably to other methods for training such networks. Finally, we note that the above leads to a simpler PAC–Bayesian training scheme for sign-activation networks than previous work. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

20 pages, 579 KiB  
Article
Meta-Strategy for Learning Tuning Parameters with Guarantees
by Dimitri Meunier and Pierre Alquier
Entropy 2021, 23(10), 1257; https://doi.org/10.3390/e23101257 - 27 Sep 2021
Cited by 3 | Viewed by 1944
Abstract
Online learning methods, similar to the online gradient algorithm (OGA) and exponentially weighted aggregation (EWA), often depend on tuning parameters that are difficult to set in practice. We consider an online meta-learning scenario, and we propose a meta-strategy to learn these parameters from past tasks. Our strategy is based on the minimization of a regret bound. It allows us to learn the initialization and the step size in OGA with guarantees. It also allows us to learn the prior or the learning rate in EWA. We provide a regret analysis of the strategy, which allows us to identify settings where meta-learning indeed improves on learning each task in isolation. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

16 pages, 465 KiB  
Article
Fast Compression of MCMC Output
by Nicolas Chopin and Gabriel Ducrocq
Entropy 2021, 23(8), 1017; https://doi.org/10.3390/e23081017 - 06 Aug 2021
Cited by 2 | Viewed by 2373
Abstract
We propose cube thinning, a novel method for compressing the output of an MCMC (Markov chain Monte Carlo) algorithm when control variates are available. It allows resampling of the initial MCMC sample (according to weights derived from control variates), while imposing equality constraints on the averages of these control variates, using the cube method (an approach that originates from survey sampling). The main advantage of cube thinning is that its complexity does not depend on the size of the compressed sample. This compares favourably to previous methods, such as Stein thinning, the complexity of which is quadratic in that quantity. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

45 pages, 704 KiB  
Article
Accelerated Diffusion-Based Sampling by the Non-Reversible Dynamics with Skew-Symmetric Matrices
by Futoshi Futami, Tomoharu Iwata, Naonori Ueda and Issei Sato
Entropy 2021, 23(8), 993; https://doi.org/10.3390/e23080993 - 30 Jul 2021
Cited by 3 | Viewed by 3628
Abstract
Langevin dynamics (LD) has been extensively studied theoretically and practically as a basic sampling technique. Recently, the incorporation of non-reversible dynamics into LD is attracting attention because it accelerates the mixing speed of LD. Popular choices for non-reversible dynamics include underdamped Langevin dynamics (ULD), which uses second-order dynamics and perturbations with skew-symmetric matrices. Although ULD has been widely used in practice, the application of skew acceleration is limited, even though it is expected to show superior performance theoretically. Current work lacks a theoretical understanding of issues that are important to practitioners, including the selection criteria for skew-symmetric matrices, quantitative evaluations of acceleration, and the large memory cost of storing skew matrices. In this study, we theoretically and numerically clarify these problems by analyzing acceleration focusing on how the skew-symmetric matrix perturbs the Hessian matrix of potential functions. We also present a practical algorithm that accelerates the standard LD and ULD, which uses novel memory-efficient skew-symmetric matrices under parallel-chain Monte Carlo settings. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)
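As a rough illustration of the kind of non-reversible, skew-symmetric acceleration studied in this paper, the sketch below is a generic Euler–Maruyama discretisation of dX = -(I + S) ∇U(X) dt + √2 dW with S skew-symmetric; it is not the authors' memory-efficient parallel-chain algorithm. The continuous-time dynamics still leave exp(-U) invariant, while the discretisation carries the usual step-size bias.

```python
import numpy as np

def skew_langevin(grad_U, x0, S, step=1e-2, n_steps=5000, rng=None):
    """Euler-Maruyama steps of dX = -(I + S) grad_U(X) dt + sqrt(2) dW, S skew-symmetric."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    I = np.eye(x.size)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        drift = -(I + S) @ grad_U(x)                       # skew term breaks reversibility
        x = x + step * drift + np.sqrt(2 * step) * rng.standard_normal(x.size)
        samples[k] = x
    return samples

# Toy target: anisotropic Gaussian with U(x) = 0.5 * x^T A x, so grad_U(x) = A x.
A = np.array([[3.0, 0.0], [0.0, 0.5]])
S = np.array([[0.0, 1.0], [-1.0, 0.0]])                    # skew-symmetric perturbation
draws = skew_langevin(lambda x: A @ x, x0=np.zeros(2), S=S)
```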

34 pages, 2438 KiB  
Article
Flexible and Efficient Inference with Particles for the Variational Gaussian Approximation
by Théo Galy-Fajou, Valerio Perrone and Manfred Opper
Entropy 2021, 23(8), 990; https://doi.org/10.3390/e23080990 - 30 Jul 2021
Cited by 2 | Viewed by 2676
Abstract
Variational inference is a powerful framework, used to approximate intractable posteriors through variational distributions. The de facto standard is to rely on Gaussian variational families, which come with numerous advantages: they are easy to sample from, simple to parametrize, and many expectations are known in closed-form or readily computed by quadrature. In this paper, we view the Gaussian variational approximation problem through the lens of gradient flows. We introduce a flexible and efficient algorithm based on a linear flow leading to a particle-based approximation. We prove that, with a sufficient number of particles, our algorithm converges linearly to the exact solution for Gaussian targets, and a low-rank approximation otherwise. In addition to the theoretical analysis, we show, on a set of synthetic and real-world high-dimensional problems, that our algorithm outperforms existing methods with Gaussian targets while performing on a par with non-Gaussian targets. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

20 pages, 1588 KiB  
Article
ABCDP: Approximate Bayesian Computation with Differential Privacy
by Mijung Park, Margarita Vinaroz and Wittawat Jitkrittum
Entropy 2021, 23(8), 961; https://doi.org/10.3390/e23080961 - 27 Jul 2021
Cited by 3 | Viewed by 2309
Abstract
We developed a novel approximate Bayesian computation (ABC) framework, ABCDP, which produces differentially private (DP) and approximate posterior samples. Our framework takes advantage of the sparse vector technique (SVT), widely studied in the differential privacy literature. SVT incurs the privacy cost only when a condition (whether a quantity of interest is above/below a threshold) is met. If the condition is sparsely met during the repeated queries, SVT can drastically reduce the cumulative privacy loss, unlike the usual case where every query incurs the privacy loss. In ABC, the quantity of interest is the distance between observed and simulated data, and only when the distance is below a threshold can we take the corresponding prior sample as a posterior sample. Hence, applying SVT to ABC is an organic way to transform an ABC algorithm to a privacy-preserving variant with minimal modification, but yields the posterior samples with a high privacy level. We theoretically analyzed the interplay between the noise added for privacy and the accuracy of the posterior samples. We apply ABCDP to several data simulators and show the efficacy of the proposed framework. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

43 pages, 644 KiB  
Article
Variational Message Passing and Local Constraint Manipulation in Factor Graphs
by İsmail Şenöz, Thijs van de Laar, Dmitry Bagaev and Bert de Vries
Entropy 2021, 23(7), 807; https://doi.org/10.3390/e23070807 - 24 Jun 2021
Cited by 15 | Viewed by 6092
Abstract
Accurate evaluation of Bayesian model evidence for a given data set is a fundamental problem in model development. Since evidence evaluations are usually intractable, in practice variational free energy (VFE) minimization provides an attractive alternative, as the VFE is an upper bound on negative model log-evidence (NLE). In order to improve tractability of the VFE, it is common to manipulate the constraints in the search space for the posterior distribution of the latent variables. Unfortunately, constraint manipulation may also lead to a less accurate estimate of the NLE. Thus, constraint manipulation implies an engineering trade-off between tractability and accuracy of model evidence estimation. In this paper, we develop a unifying account of constraint manipulation for variational inference in models that can be represented by a (Forney-style) factor graph, for which we identify the Bethe Free Energy as an approximation to the VFE. We derive well-known message passing algorithms from first principles, as the result of minimizing the constrained Bethe Free Energy (BFE). The proposed method supports evaluation of the BFE in factor graphs for model scoring and development of new message passing-based inference algorithms that potentially improve evidence estimation accuracy. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

34 pages, 1413 KiB  
Article
Understanding the Variability in Graph Data Sets through Statistical Modeling on the Stiefel Manifold
by Clément Mantoux, Baptiste Couvy-Duchesne, Federica Cacciamani, Stéphane Epelbaum, Stanley Durrleman and Stéphanie Allassonnière
Entropy 2021, 23(4), 490; https://doi.org/10.3390/e23040490 - 20 Apr 2021
Cited by 2 | Viewed by 2775
Abstract
Network analysis provides a rich framework to model complex phenomena, such as human brain connectivity. It has proven efficient to understand their natural properties and design predictive models. In this paper, we study the variability within groups of networks, i.e., the structure of connection similarities and differences across a set of networks. We propose a statistical framework to model these variations based on manifold-valued latent factors. Each network adjacency matrix is decomposed as a weighted sum of matrix patterns with rank one. Each pattern is described as a random perturbation of a dictionary element. As a hierarchical statistical model, it enables the analysis of heterogeneous populations of adjacency matrices using mixtures. Our framework can also be used to infer the weight of missing edges. We estimate the parameters of the model using an Expectation-Maximization-based algorithm. Experimenting on synthetic data, we show that the algorithm is able to accurately estimate the latent structure in both low and high dimensions. We apply our model on a large data set of functional brain connectivity matrices from the UK Biobank. Our results suggest that the proposed model accurately describes the complex variability in the data set with a small number of degrees of freedom. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

40 pages, 4144 KiB  
Article
“Exact” and Approximate Methods for Bayesian Inference: Stochastic Volatility Case Study
by Yuliya Shapovalova
Entropy 2021, 23(4), 466; https://doi.org/10.3390/e23040466 - 15 Apr 2021
Cited by 4 | Viewed by 3691
Abstract
We conduct a case study in which we empirically illustrate the performance of different classes of Bayesian inference methods to estimate stochastic volatility models. In particular, we consider how different particle filtering methods affect the variance of the estimated likelihood. We review and compare particle Markov chain Monte Carlo (MCMC), Riemannian manifold Hamiltonian Monte Carlo (RMHMC), fixed-form variational Bayes, and the integrated nested Laplace approximation to estimate the posterior distribution of the parameters. Additionally, we conduct the review from the point of view of whether these methods are (1) easily adaptable to different model specifications; (2) adaptable to higher dimensions of the model in a straightforward way; (3) feasible in the multivariate case. We show that when using the stochastic volatility model for methods comparison, various data-generating processes have to be considered to make a fair assessment of the methods. Finally, we present a challenging specification of the multivariate stochastic volatility model, which is rarely used to illustrate the methods but constitutes an important practical application. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

39 pages, 480 KiB  
Article
PAC-Bayes Bounds on Variational Tempered Posteriors for Markov Models
by Imon Banerjee, Vinayak A. Rao and Harsha Honnappa
Entropy 2021, 23(3), 313; https://doi.org/10.3390/e23030313 - 06 Mar 2021
Cited by 2 | Viewed by 1716
Abstract
Datasets displaying temporal dependencies abound in science and engineering applications, with Markov models representing a simplified and popular view of the temporal dependence structure. In this paper, we consider Bayesian settings that place prior distributions over the parameters of the transition kernel of a Markov model, and seek to characterize the resulting, typically intractable, posterior distributions. We present a Probably Approximately Correct (PAC)-Bayesian analysis of variational Bayes (VB) approximations to tempered Bayesian posterior distributions, bounding the model risk of the VB approximations. Tempered posteriors are known to be robust to model misspecification, and their variational approximations do not suffer the usual problems of overconfident approximations. Our results tie the risk bounds to the mixing and ergodic properties of the Markov data-generating model. We illustrate the PAC-Bayes bounds through a number of example Markov models, and also consider the situation where the Markov model is misspecified. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

16 pages, 862 KiB  
Article
Approximate Bayesian Computation for Discrete Spaces
by Ilze A. Auzina and Jakub M. Tomczak
Entropy 2021, 23(3), 312; https://doi.org/10.3390/e23030312 - 06 Mar 2021
Cited by 4 | Viewed by 2601
Abstract
Many real-life processes are black-box problems, i.e., the internal workings are inaccessible or a closed-form mathematical expression of the likelihood function cannot be defined. For continuous random variables, likelihood-free inference problems can be solved via Approximate Bayesian Computation (ABC). However, an optimal alternative for discrete random variables is yet to be formulated. Here, we aim to fill this research gap. We propose an adjusted population-based MCMC ABC method by re-defining the standard ABC parameters to discrete ones and by introducing a novel Markov kernel that is inspired by differential evolution. We first assess the proposed Markov kernel on a likelihood-based inference problem, namely discovering the underlying diseases based on a QMR-DT network and, subsequently, the entire method on three likelihood-free inference problems: (i) the QMR-DT network with the unknown likelihood function, (ii) learning a binary neural network, and (iii) neural architecture search. The obtained results indicate the high potential of the proposed framework and the superiority of the new Markov kernel. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

19 pages, 1606 KiB  
Article
Variationally Inferred Sampling through a Refined Bound
by Víctor Gallego and David Ríos Insua
Entropy 2021, 23(1), 123; https://doi.org/10.3390/e23010123 - 19 Jan 2021
Cited by 4 | Viewed by 2555
Abstract
In this work, a framework to boost the efficiency of Bayesian inference in probabilistic models is introduced by embedding a Markov chain sampler within a variational posterior approximation. We call this framework “refined variational approximation”. Its strengths are its ease of implementation and the automatic tuning of sampler parameters, leading to a faster mixing time through automatic differentiation. Several strategies to approximate evidence lower bound (ELBO) computation are also introduced. Its efficient performance is showcased experimentally using state-space models for time-series data, a variational encoder for density estimation and a conditional variational autoencoder as a deep Bayes classifier. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

33 pages, 1542 KiB  
Article
Dynamics of Coordinate Ascent Variational Inference: A Case Study in 2D Ising Models
by Sean Plummer, Debdeep Pati and Anirban Bhattacharya
Entropy 2020, 22(11), 1263; https://doi.org/10.3390/e22111263 - 06 Nov 2020
Cited by 8 | Viewed by 2700
Abstract
Variational algorithms have gained prominence over the past two decades as a scalable computational environment for Bayesian inference. In this article, we explore tools from the dynamical systems literature to study the convergence of coordinate ascent algorithms for mean field variational inference. Focusing on the Ising model defined on two nodes, we fully characterize the dynamics of the sequential coordinate ascent algorithm and its parallel version. We observe that in the regime where the objective function is convex, both algorithms are stable and exhibit convergence to the unique fixed point. Our analyses reveal interesting discordances between these two versions of the algorithm in the regime where the objective function is non-convex. In fact, the parallel version exhibits a periodic oscillatory behavior which is absent in the sequential version. Drawing intuition from the Markov chain Monte Carlo literature, we empirically show that a parameter expansion of the Ising model, popularly called the Edward–Sokal coupling, leads to an enlargement of the regime of convergence to the global optima. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)
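The sequential-versus-parallel contrast described in this abstract can be reproduced with the standard mean-field updates for a two-node Ising model; the sketch below uses generic notation (coupling J, external fields h1, h2) and is only a toy illustration of the phenomenon, not the authors' analysis.

```python
import numpy as np

def cavi_two_node_ising(J, h1, h2, n_iter=100, parallel=False, m_init=(0.5, -0.5)):
    """Mean-field updates m_i = tanh(J*m_j + h_i) for p(x1, x2) ~ exp(J*x1*x2 + h1*x1 + h2*x2)."""
    m1, m2 = m_init
    trace = [(m1, m2)]
    for _ in range(n_iter):
        if parallel:                 # both updates use the previous iterate
            m1, m2 = np.tanh(J * m2 + h1), np.tanh(J * m1 + h2)
        else:                        # sequential: the m2 update sees the fresh m1
            m1 = np.tanh(J * m2 + h1)
            m2 = np.tanh(J * m1 + h2)
        trace.append((m1, m2))
    return np.array(trace)

# Strong coupling, no external field: the parallel scheme can lock into a
# period-two oscillation while the sequential scheme settles at a fixed point.
print(cavi_two_node_ising(J=2.0, h1=0.0, h2=0.0, parallel=True)[-3:])
print(cavi_two_node_ising(J=2.0, h1=0.0, h2=0.0, parallel=False)[-3:])
```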
