
Approximate Bayesian Inference

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (22 June 2021) | Viewed by 58066

Printed Edition Available!
A printed edition of this Special Issue is available here.

Special Issue Editor


Dr. Pierre Alquier
Guest Editor
Center for Advanced Intelligence Project (AIP), RIKEN, Tokyo 103-0027, Japan
Interests: statistical learning theory; mathematical statistics; Bayesian statistics; aggregation of estimators; approximate posterior inference

Special Issue Information

Already extremely popular in statistical inference, Bayesian methods are also becoming popular in machine learning and AI problems, where it is important for any device not only to predict well, but also to quantify the uncertainty of its predictions.

Traditionally, Bayesian estimators were implemented using Monte Carlo methods, such as the Metropolis–Hastings or the Gibbs sampler. These algorithms target the exact posterior distribution. However, many modern models in statistics are simply too complex to use such methodologies. In machine learning, the volume of the data used in practice makes Monte Carlo methods too slow to be useful.
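To make this concrete, here is a minimal random-walk Metropolis–Hastings sketch in Python (the function and parameter names are illustrative, not taken from any particular package). It targets a user-supplied log-posterior exactly, which is precisely what becomes impractical when the model is very complex or every posterior evaluation requires a pass over a massive dataset.

```python
import numpy as np

def metropolis_hastings(log_posterior, x0, n_samples=10_000, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings targeting the density exp(log_posterior)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    log_p = log_posterior(x)
    for i in range(n_samples):
        proposal = x + step * rng.standard_normal(x.size)   # symmetric proposal
        log_p_prop = log_posterior(proposal)
        if np.log(rng.uniform()) < log_p_prop - log_p:      # accept/reject step
            x, log_p = proposal, log_p_prop
        samples[i] = x
    return samples

# Toy example: sample from a standard 2D Gaussian "posterior".
draws = metropolis_hastings(lambda x: -0.5 * x @ x, x0=np.zeros(2))
```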

Motivated by these applications, many faster algorithms have recently been proposed that target an approximation of the posterior.

1) A first family of methods still relies on Monte Carlo simulations but targets an approximation of the posterior. For example, approximate versions of Metropolis–Hastings based on subsampling, or Langevin Monte Carlo methods, are extremely useful when the sample size or the dimension of the data is too large. The ABC algorithm is useful when the model is generative, in the sense that it is simple to sample from it, even though its likelihood may be intractable (a minimal ABC rejection sketch is given after this list).

2) Another interesting class of methods relies on optimization algorithms to approximate the posterior by a member of a tractable family of probability distributions—for example, variational approximations, Laplace approximations, the EP algorithm, etc. (a Laplace-approximation sketch is also given after this list).
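To make the two families above concrete, here are two minimal Python sketches; the function names, priors, and tolerances are illustrative choices rather than methods prescribed by this Special Issue. The first is a plain ABC rejection sampler for a generative model with an intractable likelihood: draw a parameter from the prior, simulate data, and keep the draw only if the simulated data fall within a tolerance of the observed data.

```python
import numpy as np

def abc_rejection(simulate, prior_sample, observed, distance, eps, n_draws=500, rng=None):
    """Keep prior draws whose simulated data fall within eps of the observed data."""
    rng = np.random.default_rng() if rng is None else rng
    accepted = []
    while len(accepted) < n_draws:
        theta = prior_sample(rng)
        fake = simulate(theta, rng)
        if distance(fake, observed) <= eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy generative model: 50 Gaussian draws with unknown mean theta.
obs = np.random.default_rng(0).normal(1.5, 1.0, size=50)
posterior_draws = abc_rejection(
    simulate=lambda th, rng: rng.normal(th, 1.0, size=50),
    prior_sample=lambda rng: rng.normal(0.0, 5.0),
    observed=obs,
    distance=lambda a, b: abs(a.mean() - b.mean()),   # summary-statistic distance
    eps=0.1,
)
```

The second sketches the Laplace approximation from the optimization-based family: the posterior is replaced by a Gaussian centred at the posterior mode, with covariance given by an estimate of the inverse Hessian of the negative log-posterior at that mode (here, the crude inverse-Hessian estimate returned by BFGS; in practice one would compute the exact Hessian).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

def laplace_approximation(neg_log_posterior, x0):
    """Gaussian centred at the posterior mode; covariance from BFGS's inverse-Hessian estimate."""
    res = minimize(neg_log_posterior, x0, method="BFGS")
    return multivariate_normal(mean=res.x, cov=res.hess_inv)

# Toy target: posterior proportional to exp(-(x1^2 + x2^2)/2 - x1*x2/4).
approx = laplace_approximation(lambda z: 0.5 * (z @ z) + 0.25 * z[0] * z[1],
                               x0=np.array([1.0, -2.0]))
print(approx.mean, approx.cov)
```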

Of course, even though these algorithms are much faster than exact methods, it is extremely important to quantify what is lost in accuracy with respect to the exact posterior. For some of the previous methods, such results are still only partially available. Recent work established the very good scaling properties of Langevin Monte Carlo with the dimension of the data. Another series of papers connected the question of the accuracy of variational approximations to the PAC–Bayes literature in machine learning and obtained convergence results.

The objective of this Special Issue is to provide the latest advances in approximate Monte Carlo methods and in approximations of the posterior: design of efficient algorithms, study of the statistical properties of these algorithms, and challenging applications.

Dr. Pierre Alquier
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Bayesian statistics
  • variational approximations
  • EP algorithm
  • Langevin Monte Carlo
  • Laplace approximations
  • Approximate Bayesian Computation (ABC)
  • Markov chain Monte Carlo (MCMC)
  • PAC–Bayes

Published Papers (19 papers)


Editorial


12 pages, 263 KiB  
Editorial
Approximate Bayesian Inference
by Pierre Alquier
Entropy 2020, 22(11), 1272; https://doi.org/10.3390/e22111272 - 10 Nov 2020
Cited by 10 | Viewed by 5137
Abstract
This is the Editorial article summarizing the scope of the Special Issue: Approximate Bayesian Inference. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

Research


25 pages, 4200 KiB  
Article
Coupled VAE: Improved Accuracy and Robustness of a Variational Autoencoder
by Shichen Cao, Jingjing Li, Kenric P. Nelson and Mark A. Kon
Entropy 2022, 24(3), 423; https://doi.org/10.3390/e24030423 - 18 Mar 2022
Cited by 7 | Viewed by 4223
Abstract
We present a coupled variational autoencoder (VAE) method, which improves the accuracy and robustness of the model representation of handwritten numeral images. The improvement is measured in both increasing the likelihood of the reconstructed images and in reducing divergence between the posterior and a prior latent distribution. The new method weighs outlier samples with a higher penalty by generalizing the original evidence lower bound function using a coupled entropy function based on the principles of nonlinear statistical coupling. We evaluated the performance of the coupled VAE model using the Modified National Institute of Standards and Technology (MNIST) dataset and its corrupted modification C-MNIST. Histograms of the likelihood that the reconstruction matches the original image show that the coupled VAE improves the reconstruction and this improvement is more substantial when seeded with corrupted images. All five corruptions evaluated showed improvement. For instance, with the Gaussian corruption seed the accuracy improves by 10^14 (from 10^−57.2 to 10^−42.9) and robustness improves by 10^22 (from 10^−109.2 to 10^−87.0). Furthermore, the divergence between the posterior and prior distribution of the latent distribution is reduced. Thus, in contrast to the β-VAE design, the coupled VAE algorithm improves model representation, rather than trading off the performance of the reconstruction and latent distribution divergence. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

27 pages, 1813 KiB  
Article
Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly
by Le Li and Benjamin Guedj
Entropy 2021, 23(11), 1534; https://doi.org/10.3390/e23111534 - 18 Nov 2021
Cited by 1 | Viewed by 1891
Abstract
When confronted with massive data streams, summarizing data with dimension reduction methods such as PCA raises theoretical and algorithmic pitfalls. A principal curve acts as a nonlinear generalization of PCA, and the present paper proposes a novel algorithm to automatically and sequentially learn principal curves from data streams. We show that our procedure is supported by regret bounds with optimal sublinear remainder terms. A greedy local search implementation (called slpc, for sequential learning principal curves) that incorporates both sleeping experts and multi-armed bandit ingredients is presented, along with its regret computation and performance on synthetic and real-life data. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

13 pages, 418 KiB  
Article
Still No Free Lunches: The Price to Pay for Tighter PAC-Bayes Bounds
by Benjamin Guedj and Louis Pujol
Entropy 2021, 23(11), 1529; https://doi.org/10.3390/e23111529 - 18 Nov 2021
Cited by 9 | Viewed by 2100
Abstract
“No free lunch” results state the impossibility of obtaining meaningful bounds on the error of a learning algorithm without prior assumptions and modelling, which is more or less realistic for a given problem. Some models are “expensive” (strong assumptions, such as sub-Gaussian tails), others are “cheap” (simply finite variance). As it is well known, the more you pay, the more you get: in other words, the most expensive models yield the more interesting bounds. Recent advances in robust statistics have investigated procedures to obtain tight bounds while keeping the cost of assumptions minimal. The present paper explores and exhibits what the limits are for obtaining tight probably approximately correct (PAC)-Bayes bounds in a robust setting for cheap models. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

18 pages, 1419 KiB  
Article
A Scalable Bayesian Sampling Method Based on Stochastic Gradient Descent Isotropization
by Giulio Franzese, Dimitrios Milios, Maurizio Filippone and Pietro Michiardi
Entropy 2021, 23(11), 1426; https://doi.org/10.3390/e23111426 - 28 Oct 2021
Cited by 3 | Viewed by 1694
Abstract
Stochastic gradient (SG)-based algorithms for Markov chain Monte Carlo sampling (SGMCMC) tackle large-scale Bayesian modeling problems by operating on mini-batches and injecting noise on SG steps. The sampling properties of these algorithms are determined by user choices, such as the covariance of the injected noise and the learning rate, and by problem-specific factors, such as assumptions on the loss landscape and the covariance of SG noise. However, current SGMCMC algorithms applied to popular complex models such as Deep Nets cannot simultaneously satisfy the assumptions on loss landscapes and on the behavior of the covariance of the SG noise, while operating with the practical requirement of non-vanishing learning rates. In this work we propose a novel practical method, which makes the SG noise isotropic, using a fixed learning rate that we determine analytically. Extensive experimental validations indicate that our proposal is competitive with the state of the art on SGMCMC. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

20 pages, 424 KiB  
Article
PAC-Bayes Unleashed: Generalisation Bounds with Unbounded Losses
by Maxime Haddouche, Benjamin Guedj, Omar Rivasplata and John Shawe-Taylor
Entropy 2021, 23(10), 1330; https://doi.org/10.3390/e23101330 - 12 Oct 2021
Cited by 12 | Viewed by 2411
Abstract
We present new PAC-Bayesian generalisation bounds for learning problems with unbounded loss functions. This extends the relevance and applicability of the PAC-Bayes learning framework, where most of the existing literature focuses on supervised learning problems with a bounded loss function (typically assumed to take values in the interval [0;1]). In order to relax this classical assumption, we propose to allow the range of the loss to depend on each predictor. This relaxation is captured by our new notion of HYPothesis-dependent rangE (HYPE). Based on this, we derive a novel PAC-Bayesian generalisation bound for unbounded loss functions, and we instantiate it on a linear regression problem. To make our theory usable by the largest audience possible, we include discussions on actual computation, practicality and limitations of our assumptions. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

14 pages, 326 KiB  
Article
Differentiable PAC–Bayes Objectives with Partially Aggregated Neural Networks
by Felix Biggs and Benjamin Guedj
Entropy 2021, 23(10), 1280; https://doi.org/10.3390/e23101280 - 29 Sep 2021
Cited by 11 | Viewed by 2375
Abstract
We make two related contributions motivated by the challenge of training stochastic neural networks, particularly in a PAC–Bayesian setting: (1) we show how averaging over an ensemble of stochastic neural networks enables a new class of partially-aggregated estimators, proving that these lead to unbiased lower-variance output and gradient estimators; (2) we reformulate a PAC–Bayesian bound for signed-output networks to derive in combination with the above a directly optimisable, differentiable objective and a generalisation guarantee, without using a surrogate loss or loosening the bound. We show empirically that this leads to competitive generalisation guarantees and compares favourably to other methods for training such networks. Finally, we note that the above leads to a simpler PAC–Bayesian training scheme for sign-activation networks than previous work. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

20 pages, 579 KiB  
Article
Meta-Strategy for Learning Tuning Parameters with Guarantees
by Dimitri Meunier and Pierre Alquier
Entropy 2021, 23(10), 1257; https://doi.org/10.3390/e23101257 - 27 Sep 2021
Cited by 3 | Viewed by 1944
Abstract
Online learning methods, similar to the online gradient algorithm (OGA) and exponentially weighted aggregation (EWA), often depend on tuning parameters that are difficult to set in practice. We consider an online meta-learning scenario, and we propose a meta-strategy to learn these parameters from past tasks. Our strategy is based on the minimization of a regret bound. It allows us to learn the initialization and the step size in OGA with guarantees. It also allows us to learn the prior or the learning rate in EWA. We provide a regret analysis of the strategy, which allows us to identify settings where meta-learning indeed improves on learning each task in isolation. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

16 pages, 465 KiB  
Article
Fast Compression of MCMC Output
by Nicolas Chopin and Gabriel Ducrocq
Entropy 2021, 23(8), 1017; https://doi.org/10.3390/e23081017 - 06 Aug 2021
Cited by 2 | Viewed by 2373
Abstract
We propose cube thinning, a novel method for compressing the output of an MCMC (Markov chain Monte Carlo) algorithm when control variates are available. It allows resampling of the initial MCMC sample (according to weights derived from control variates), while imposing equality constraints on the averages of these control variates, using the cube method (an approach that originates from survey sampling). The main advantage of cube thinning is that its complexity does not depend on the size of the compressed sample. This compares favourably to previous methods, such as Stein thinning, the complexity of which is quadratic in that quantity. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

45 pages, 704 KiB  
Article
Accelerated Diffusion-Based Sampling by the Non-Reversible Dynamics with Skew-Symmetric Matrices
by Futoshi Futami, Tomoharu Iwata, Naonori Ueda and Issei Sato
Entropy 2021, 23(8), 993; https://doi.org/10.3390/e23080993 - 30 Jul 2021
Cited by 3 | Viewed by 3628
Abstract
Langevin dynamics (LD) has been extensively studied theoretically and practically as a basic sampling technique. Recently, the incorporation of non-reversible dynamics into LD is attracting attention because it accelerates the mixing speed of LD. Popular choices for non-reversible dynamics include underdamped Langevin dynamics (ULD), which uses second-order dynamics and perturbations with skew-symmetric matrices. Although ULD has been widely used in practice, the application of skew acceleration is limited, even though it is expected to show superior performance theoretically. Current work lacks a theoretical understanding of issues that are important to practitioners, including the selection criteria for skew-symmetric matrices, quantitative evaluations of acceleration, and the large memory cost of storing skew matrices. In this study, we theoretically and numerically clarify these problems by analyzing acceleration focusing on how the skew-symmetric matrix perturbs the Hessian matrix of potential functions. We also present a practical algorithm that accelerates the standard LD and ULD, which uses novel memory-efficient skew-symmetric matrices under parallel-chain Monte Carlo settings. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)
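As a rough illustration of the kind of non-reversible, skew-symmetric acceleration studied in this paper, the sketch below is a generic Euler–Maruyama discretisation of dX = -(I + S) ∇U(X) dt + √2 dW with S skew-symmetric; it is not the authors' memory-efficient parallel-chain algorithm. The continuous-time dynamics still leave exp(-U) invariant, while the discretisation carries the usual step-size bias.

```python
import numpy as np

def skew_langevin(grad_U, x0, S, step=1e-2, n_steps=5000, rng=None):
    """Euler-Maruyama steps of dX = -(I + S) grad_U(X) dt + sqrt(2) dW, S skew-symmetric."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    I = np.eye(x.size)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        drift = -(I + S) @ grad_U(x)                       # skew term breaks reversibility
        x = x + step * drift + np.sqrt(2 * step) * rng.standard_normal(x.size)
        samples[k] = x
    return samples

# Toy target: anisotropic Gaussian with U(x) = 0.5 * x^T A x, so grad_U(x) = A x.
A = np.array([[3.0, 0.0], [0.0, 0.5]])
S = np.array([[0.0, 1.0], [-1.0, 0.0]])                    # skew-symmetric perturbation
draws = skew_langevin(lambda x: A @ x, x0=np.zeros(2), S=S)
```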

34 pages, 2438 KiB  
Article
Flexible and Efficient Inference with Particles for the Variational Gaussian Approximation
by Théo Galy-Fajou, Valerio Perrone and Manfred Opper
Entropy 2021, 23(8), 990; https://doi.org/10.3390/e23080990 - 30 Jul 2021
Cited by 2 | Viewed by 2676
Abstract
Variational inference is a powerful framework, used to approximate intractable posteriors through variational distributions. The de facto standard is to rely on Gaussian variational families, which come with numerous advantages: they are easy to sample from, simple to parametrize, and many expectations are known in closed-form or readily computed by quadrature. In this paper, we view the Gaussian variational approximation problem through the lens of gradient flows. We introduce a flexible and efficient algorithm based on a linear flow leading to a particle-based approximation. We prove that, with a sufficient number of particles, our algorithm converges linearly to the exact solution for Gaussian targets, and a low-rank approximation otherwise. In addition to the theoretical analysis, we show, on a set of synthetic and real-world high-dimensional problems, that our algorithm outperforms existing methods with Gaussian targets while performing on a par with non-Gaussian targets. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

20 pages, 1588 KiB  
Article
ABCDP: Approximate Bayesian Computation with Differential Privacy
by Mijung Park, Margarita Vinaroz and Wittawat Jitkrittum
Entropy 2021, 23(8), 961; https://doi.org/10.3390/e23080961 - 27 Jul 2021
Cited by 3 | Viewed by 2309
Abstract
We developed a novel approximate Bayesian computation (ABC) framework, ABCDP, which produces differentially private (DP) and approximate posterior samples. Our framework takes advantage of the sparse vector technique (SVT), widely studied in the differential privacy literature. SVT incurs the privacy cost only when a condition (whether a quantity of interest is above/below a threshold) is met. If the condition is sparsely met during the repeated queries, SVT can drastically reduce the cumulative privacy loss, unlike the usual case where every query incurs the privacy loss. In ABC, the quantity of interest is the distance between observed and simulated data, and only when the distance is below a threshold can we take the corresponding prior sample as a posterior sample. Hence, applying SVT to ABC is an organic way to transform an ABC algorithm to a privacy-preserving variant with minimal modification, but yields the posterior samples with a high privacy level. We theoretically analyzed the interplay between the noise added for privacy and the accuracy of the posterior samples. We apply ABCDP to several data simulators and show the efficacy of the proposed framework. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

43 pages, 644 KiB  
Article
Variational Message Passing and Local Constraint Manipulation in Factor Graphs
by İsmail Şenöz, Thijs van de Laar, Dmitry Bagaev and Bert de Vries
Entropy 2021, 23(7), 807; https://doi.org/10.3390/e23070807 - 24 Jun 2021
Cited by 15 | Viewed by 6092
Abstract
Accurate evaluation of Bayesian model evidence for a given data set is a fundamental problem in model development. Since evidence evaluations are usually intractable, in practice variational free energy (VFE) minimization provides an attractive alternative, as the VFE is an upper bound on negative model log-evidence (NLE). In order to improve tractability of the VFE, it is common to manipulate the constraints in the search space for the posterior distribution of the latent variables. Unfortunately, constraint manipulation may also lead to a less accurate estimate of the NLE. Thus, constraint manipulation implies an engineering trade-off between tractability and accuracy of model evidence estimation. In this paper, we develop a unifying account of constraint manipulation for variational inference in models that can be represented by a (Forney-style) factor graph, for which we identify the Bethe Free Energy as an approximation to the VFE. We derive well-known message passing algorithms from first principles, as the result of minimizing the constrained Bethe Free Energy (BFE). The proposed method supports evaluation of the BFE in factor graphs for model scoring and development of new message passing-based inference algorithms that potentially improve evidence estimation accuracy. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

34 pages, 1413 KiB  
Article
Understanding the Variability in Graph Data Sets through Statistical Modeling on the Stiefel Manifold
by Clément Mantoux, Baptiste Couvy-Duchesne, Federica Cacciamani, Stéphane Epelbaum, Stanley Durrleman and Stéphanie Allassonnière
Entropy 2021, 23(4), 490; https://doi.org/10.3390/e23040490 - 20 Apr 2021
Cited by 2 | Viewed by 2775
Abstract
Network analysis provides a rich framework to model complex phenomena, such as human brain connectivity. It has proven efficient to understand their natural properties and design predictive models. In this paper, we study the variability within groups of networks, i.e., the structure of connection similarities and differences across a set of networks. We propose a statistical framework to model these variations based on manifold-valued latent factors. Each network adjacency matrix is decomposed as a weighted sum of matrix patterns with rank one. Each pattern is described as a random perturbation of a dictionary element. As a hierarchical statistical model, it enables the analysis of heterogeneous populations of adjacency matrices using mixtures. Our framework can also be used to infer the weight of missing edges. We estimate the parameters of the model using an Expectation-Maximization-based algorithm. Experimenting on synthetic data, we show that the algorithm is able to accurately estimate the latent structure in both low and high dimensions. We apply our model on a large data set of functional brain connectivity matrices from the UK Biobank. Our results suggest that the proposed model accurately describes the complex variability in the data set with a small number of degrees of freedom. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

40 pages, 4144 KiB  
Article
“Exact” and Approximate Methods for Bayesian Inference: Stochastic Volatility Case Study
by Yuliya Shapovalova
Entropy 2021, 23(4), 466; https://doi.org/10.3390/e23040466 - 15 Apr 2021
Cited by 4 | Viewed by 3691
Abstract
We conduct a case study in which we empirically illustrate the performance of different classes of Bayesian inference methods to estimate stochastic volatility models. In particular, we consider how different particle filtering methods affect the variance of the estimated likelihood. We review and compare particle Markov chain Monte Carlo (MCMC), Riemannian manifold Hamiltonian Monte Carlo (RMHMC), fixed-form variational Bayes, and the integrated nested Laplace approximation to estimate the posterior distribution of the parameters. Additionally, we conduct the review from the point of view of whether these methods are (1) easily adaptable to different model specifications; (2) adaptable to higher dimensions of the model in a straightforward way; (3) feasible in the multivariate case. We show that when using the stochastic volatility model for methods comparison, various data-generating processes have to be considered to make a fair assessment of the methods. Finally, we present a challenging specification of the multivariate stochastic volatility model, which is rarely used to illustrate the methods but constitutes an important practical application. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

39 pages, 480 KiB  
Article
PAC-Bayes Bounds on Variational Tempered Posteriors for Markov Models
by Imon Banerjee, Vinayak A. Rao and Harsha Honnappa
Entropy 2021, 23(3), 313; https://doi.org/10.3390/e23030313 - 06 Mar 2021
Cited by 2 | Viewed by 1716
Abstract
Datasets displaying temporal dependencies abound in science and engineering applications, with Markov models representing a simplified and popular view of the temporal dependence structure. In this paper, we consider Bayesian settings that place prior distributions over the parameters of the transition kernel of a Markov model, and seek to characterize the resulting, typically intractable, posterior distributions. We present a Probably Approximately Correct (PAC)-Bayesian analysis of variational Bayes (VB) approximations to tempered Bayesian posterior distributions, bounding the model risk of the VB approximations. Tempered posteriors are known to be robust to model misspecification, and their variational approximations do not suffer the usual problems of overconfident approximations. Our results tie the risk bounds to the mixing and ergodic properties of the Markov data-generating model. We illustrate the PAC-Bayes bounds through a number of example Markov models, and also consider the situation where the Markov model is misspecified. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

16 pages, 862 KiB  
Article
Approximate Bayesian Computation for Discrete Spaces
by Ilze A. Auzina and Jakub M. Tomczak
Entropy 2021, 23(3), 312; https://doi.org/10.3390/e23030312 - 06 Mar 2021
Cited by 4 | Viewed by 2601
Abstract
Many real-life processes are black-box problems, i.e., the internal workings are inaccessible or a closed-form mathematical expression of the likelihood function cannot be defined. For continuous random variables, likelihood-free inference problems can be solved via Approximate Bayesian Computation (ABC). However, an optimal alternative for discrete random variables is yet to be formulated. Here, we aim to fill this research gap. We propose an adjusted population-based MCMC ABC method by re-defining the standard ABC parameters to discrete ones and by introducing a novel Markov kernel that is inspired by differential evolution. We first assess the proposed Markov kernel on a likelihood-based inference problem, namely discovering the underlying diseases based on a QMR-DT network and, subsequently, the entire method on three likelihood-free inference problems: (i) the QMR-DT network with the unknown likelihood function, (ii) learning a binary neural network, and (iii) neural architecture search. The obtained results indicate the high potential of the proposed framework and the superiority of the new Markov kernel. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

19 pages, 1606 KiB  
Article
Variationally Inferred Sampling through a Refined Bound
by Víctor Gallego and David Ríos Insua
Entropy 2021, 23(1), 123; https://doi.org/10.3390/e23010123 - 19 Jan 2021
Cited by 4 | Viewed by 2555
Abstract
In this work, a framework to boost the efficiency of Bayesian inference in probabilistic models is introduced by embedding a Markov chain sampler within a variational posterior approximation. We call this framework “refined variational approximation”. Its strengths are its ease of implementation and the automatic tuning of sampler parameters, leading to a faster mixing time through automatic differentiation. Several strategies to approximate evidence lower bound (ELBO) computation are also introduced. Its efficient performance is showcased experimentally using state-space models for time-series data, a variational encoder for density estimation and a conditional variational autoencoder as a deep Bayes classifier. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)

33 pages, 1542 KiB  
Article
Dynamics of Coordinate Ascent Variational Inference: A Case Study in 2D Ising Models
by Sean Plummer, Debdeep Pati and Anirban Bhattacharya
Entropy 2020, 22(11), 1263; https://doi.org/10.3390/e22111263 - 06 Nov 2020
Cited by 8 | Viewed by 2700
Abstract
Variational algorithms have gained prominence over the past two decades as a scalable computational environment for Bayesian inference. In this article, we explore tools from the dynamical systems literature to study the convergence of coordinate ascent algorithms for mean field variational inference. Focusing on the Ising model defined on two nodes, we fully characterize the dynamics of the sequential coordinate ascent algorithm and its parallel version. We observe that in the regime where the objective function is convex, both algorithms are stable and exhibit convergence to the unique fixed point. Our analyses reveal interesting discordances between these two versions of the algorithm in the regime where the objective function is non-convex. In fact, the parallel version exhibits a periodic oscillatory behavior which is absent in the sequential version. Drawing intuition from the Markov chain Monte Carlo literature, we empirically show that a parameter expansion of the Ising model, popularly called the Edward–Sokal coupling, leads to an enlargement of the regime of convergence to the global optima. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)
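The sequential-versus-parallel contrast described in this abstract can be reproduced with the standard mean-field updates for a two-node Ising model; the sketch below uses generic notation (coupling J, external fields h1, h2) and is only a toy illustration of the phenomenon, not the authors' analysis.

```python
import numpy as np

def cavi_two_node_ising(J, h1, h2, n_iter=100, parallel=False, m_init=(0.5, -0.5)):
    """Mean-field updates m_i = tanh(J*m_j + h_i) for p(x1, x2) ~ exp(J*x1*x2 + h1*x1 + h2*x2)."""
    m1, m2 = m_init
    trace = [(m1, m2)]
    for _ in range(n_iter):
        if parallel:                 # both updates use the previous iterate
            m1, m2 = np.tanh(J * m2 + h1), np.tanh(J * m1 + h2)
        else:                        # sequential: the m2 update sees the fresh m1
            m1 = np.tanh(J * m2 + h1)
            m2 = np.tanh(J * m1 + h2)
        trace.append((m1, m2))
    return np.array(trace)

# Strong coupling, no external field: the parallel scheme can lock into a
# period-two oscillation while the sequential scheme settles at a fixed point.
print(cavi_two_node_ising(J=2.0, h1=0.0, h2=0.0, parallel=True)[-3:])
print(cavi_two_node_ising(J=2.0, h1=0.0, h2=0.0, parallel=False)[-3:])
```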
