# Coupled VAE: Improved Accuracy and Robustness of a Variational Autoencoder


## Abstract


## 1. Introduction

## 2. The Variational Autoencoder

#### 2.1. VAE Loss Function

#### 2.2. Comparison with Other Generative Machine Learning Methods

## 3. Accounting for Risk with Coupled Entropy

#### 3.1. Assessing Probabilistic Forecasts with the Generalized Mean

#### 3.2. Definition of Negative Coupled ELBO

**Definition 1.**

## 4. Results Using the MNIST Handwritten Numerals

The dimension of the latent layer **z** can be from 2 to 20. Taking the latent layer **z** as the input, the decoder computes the probability distribution of each pixel using a Bernoulli or Gaussian distribution and outputs the corresponding 784 parameters to reconstruct an image. We used a specific number of images from the training set as the batch size and a fixed number of epochs. Additionally, for the learned MNIST manifold, visualizations of the learned data and the reproduced results were plotted. The algorithm and experiments were developed with Python and the TensorFlow library. Our Python code can be found in the Data Availability Statement.
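To make the decoder step concrete, the following minimal sketch maps a latent sample **z** to the 784 Bernoulli parameters of a 28 × 28 MNIST reconstruction. It is a hypothetical single-layer illustration in plain NumPy, not the authors' trained TensorFlow network; the weight initialization and layer sizes are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

latent_dim = 20        # the paper varies this from 2 to 20
output_dim = 28 * 28   # 784 pixel parameters for one MNIST image

# Hypothetical untrained decoder weights; a real decoder would be a
# trained multi-layer network.
W = rng.normal(scale=0.1, size=(latent_dim, output_dim))
b = np.zeros(output_dim)

def decode(z):
    """Map a latent sample z to 784 Bernoulli parameters in (0, 1)."""
    logits = z @ W + b
    return 1.0 / (1.0 + np.exp(-logits))   # sigmoid keeps outputs in (0, 1)

z = rng.standard_normal(latent_dim)   # a sample from the latent layer
pixel_probs = decode(z)               # one Bernoulli parameter per pixel
image = pixel_probs.reshape(28, 28)   # arranged as the reconstructed image
```

Each entry of `pixel_probs` is the probability that the corresponding pixel is on, which is exactly the quantity the reconstruction likelihood in the loss function evaluates.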

^{rd} epoch; when $\kappa =0.5$, the loss function has a computational error at the 8th epoch. Further investigations of the computational bounds of the algorithm are planned. The specific relationship between the coupling $\kappa $ and the probabilities for the input images is shown in Table 1. The increased Robustness metric shows that the modified loss does improve the robustness of the reconstructed images. In the next section, we also examine the performance of the divergence between the posterior and prior distributions of the latent layer.

## 5. Visualization of Latent Distribution

## 6. Performance with Corrupted Images

## 7. Discussion and Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Appendix A

#### Appendix A.1. Derivation of Negative Coupled ELBO

#### Appendix A.2. Origin of the Generalized Probability Metrics

## References


**Figure 3.** A histogram of the likelihoods that the VAE-reconstructed images match the input images. The objective of the coupled VAE research is to demonstrate that the Robustness, which is the −2/3 generalized mean, can be increased by penalizing the cost of producing outlier reconstructions. The Accuracy is the exponential of the average log-likelihood and the Decisiveness is the arithmetic mean.
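The three metrics named in this caption are all instances of the generalized (power) mean of the reconstruction likelihoods, with exponents 1 (Decisiveness), 0 (Accuracy, the geometric-mean limit), and −2/3 (Robustness). A minimal sketch, with hypothetical likelihood values chosen only to illustrate the ordering:

```python
import numpy as np

def generalized_mean(p, r):
    """Generalized (power) mean of likelihoods p with exponent r.

    r = 1 gives the arithmetic mean (Decisiveness); r -> 0 the geometric
    mean (Accuracy); r = -2/3 the Robustness metric."""
    p = np.asarray(p, dtype=float)
    if r == 0:
        # geometric mean: exponential of the average log-likelihood
        return np.exp(np.mean(np.log(p)))
    return np.mean(p ** r) ** (1.0 / r)

# Hypothetical reconstruction likelihoods, including one outlier (0.05)
likelihoods = [0.9, 0.8, 0.5, 0.05]

decisiveness = generalized_mean(likelihoods, 1.0)
accuracy = generalized_mean(likelihoods, 0.0)
robustness = generalized_mean(likelihoods, -2.0 / 3.0)
```

Because the power mean is monotonic in its exponent, the three metrics always satisfy Robustness ≤ Accuracy ≤ Decisiveness, and the negative exponent makes the Robustness metric most sensitive to the low-likelihood outliers that the coupled loss penalizes.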

**Figure 4.** (**a**) The MNIST input images and (**b**) the output images generated by the original VAE. (**c**–**e**) The output images generated by the modified coupled VAE model show small improvements in detail and clarity. For instance, the fifth digit in the first row of the input images is ‘4’, but the output image from the original VAE looks more like ‘9’ than ‘4’, while the coupled VAE method produced the ‘4’ correctly.

**Figure 5.** The histograms of likelihood for the reconstruction of the input images with various coupling $\kappa $ values. The red, blue, and green lines represent the arithmetic mean (Decisiveness), geometric mean (Accuracy), and −2/3 mean (Robustness), respectively. The minimal value of the Robustness metric indicates that the original VAE suffers from poor robustness. As $\kappa $ increases, the Robustness and Accuracy improve while the Decisiveness is mostly unchanged.

**Figure 6.** The rose plots of the various standard deviation values in 20 dimensions. The range and average values of these standard deviations are reduced as the coupling increases.

**Figure 7.** The standard deviation of latent variable samples near the three generalized mean metrics. The red, blue, and green lines represent samples near the Decisiveness, Accuracy, and Robustness, respectively. As $\kappa $ increases, the values of $\sigma $ fluctuate less and decrease toward 0. Magnified plots are shown to visualize the results further.

**Figure 8.** The histogram likelihood plots with a two-dimensional latent variable. As in the 20-dimensional model, the increased values of the geometric mean metric and the −2/3 mean metric show that the accuracy and robustness of the VAE model have been improved.

**Figure 9.** The rose plots of the various mean (top four panels) and standard deviation (bottom four panels) values in 2 dimensions. The range of the means is reduced and the mean values move closer to 0 as the coupling increases.

**Figure 10.** The plot of the latent space of the VAE trained for 200 epochs on MNIST with various $\kappa $ values. Different numerals cluster together more tightly as the coupling $\kappa $ increases.

**Figure 11.** Visualization of the learned data manifold for the generative models, with the axes given by the values of each dimension of the latent variables. The distinct digits each occupy different regions of the latent space and transform smoothly from one digit to another.

**Figure 12.** The images with 5 different corruptions are shown in the first row. The reconstructed images for $\kappa =0.0$ and $\kappa =0.1$ are shown in the second and third rows, respectively. The qualitative visual improvement in clarity using the coupling is modest.

**Figure 13.** The histograms of marginal likelihood for the MNIST images with **Gaussian** corruption are shown. All three metrics increase as the coupling parameter $\kappa $ increases. The Robustness improves the most, the Accuracy (central tendency) next, and the Decisiveness the least. From $\kappa =0.0$ to $\kappa =0.1$, the Robustness improves from ${10}^{-109.2}$ to ${10}^{-87.0}$, the Accuracy improves from ${10}^{-57.2}$ to ${10}^{-42.9}$, and the Decisiveness improves from ${10}^{-16.8}$ to ${10}^{-13.6}$.

**Figure 14.** The histograms of marginal likelihood for the MNIST images with **glass blur** corruption are shown. All three metrics increase as the coupling parameter $\kappa $ increases from 0 to 0.1.

**Figure 15.** The histograms of marginal likelihood for the MNIST images with **impulse noise** corruption are shown. All three metrics increase as the coupling $\kappa $ increases from 0 to 0.1.

**Figure 16.** The histograms of marginal likelihood for the MNIST images with **shot noise** corruption are shown. All three metrics increase as the coupling parameter $\kappa $ increases from 0 to 0.1.

**Figure 17.** The histograms of marginal likelihood for the MNIST images with **shear** corruption are shown. All three metrics increase as the coupling parameter $\kappa $ increases from 0 to 0.1.

**Table 1.** The Decisiveness, Accuracy, and Robustness of the reconstruction likelihood as a function of the coupling $\kappa $.

| Coupling $\mathit{\kappa}$ | Decisiveness | Accuracy | Robustness |
|---|---|---|---|
| 0 | $1.31\times {10}^{-15}$ | $2.41\times {10}^{-39}$ | $1.40\times {10}^{-79}$ |
| $0.025$ | $6.61\times {10}^{-15}$ | $5.89\times {10}^{-35}$ | $9.91\times {10}^{-81}$ |
| $0.05$ | $7.18\times {10}^{-12}$ | $5.80\times {10}^{-32}$ | $1.31\times {10}^{-73}$ |
| $0.1$ | $1.34\times {10}^{-12}$ | $7.09\times {10}^{-29}$ | $2.57\times {10}^{-71}$ |

**Table 2.** Components of the coupled ELBO with a 2-dimensional latent layer under different values of the coupling. The improvement in the coupled KL-divergence is slight, while the improvement in the coupled reconstruction loss is larger.

| Coupling $\mathit{\kappa}$ | Coupled KL-Divergence | Coupled RE Loss | Coupled ELBO | KL Proportion | RE Proportion |
|---|---|---|---|---|---|
| 0 | 5.8 ± 1.7 | 166.5 ± 52.2 | 172.3 | 3.38% | 96.62% |
| $0.025$ | 5.7 ± 1.6 | 156.4 ± 49.8 | 162.1 | 3.53% | 96.47% |
| $0.05$ | 5.6 ± 1.6 | 149.2 ± 46.6 | 154.8 | 3.61% | 96.39% |
| $0.075$ | 5.6 ± 1.7 | 141.1 ± 44.6 | 146.7 | 3.82% | 96.18% |

**Table 3.** The components of the coupled ELBO under **Gaussian** corruptions are provided in the table. The coupled KL-divergence initially increases when moving away from the standard VAE design from $\kappa =0$ to $\kappa =0.025$; however, it then steadily decreases with increasing $\kappa $. The coupled reconstruction loss (column three) shows steady improvement. The overall negative coupled ELBO shows consistent improvement as the coupling increases. The relative importance of the divergence and reconstruction varies as the coupling increases, but in each case it is approximately a $15\%$ to $85\%$ relative weighting.

| Coupling $\mathit{\kappa}$ | Coupled KL-Divergence | Coupled RE Loss | Coupled ELBO | KL Proportion | RE Proportion |
|---|---|---|---|---|---|
| $\kappa =0$ | 23.9 ± 3.8 | 131.6 ± 40.7 | 155.5 | 15.34% | 84.66% |
| $\kappa =0.025$ | 29.6 ± 2.3 | 119.9 ± 38.5 | 149.5 | 19.80% | 80.20% |
| $\kappa =0.05$ | 26.0 ± 0.9 | 111.1 ± 36.5 | 137.1 | 18.94% | 81.06% |
| $\kappa =0.075$ | 21.4 ± 0.5 | 104.4 ± 34.3 | 125.8 | 16.98% | 83.02% |
| $\kappa =0.1$ | 18.4 ± 0.6 | 98.9 ± 32.7 | 117.3 | 15.71% | 84.28% |

**Table 4.** The components of the coupled ELBO under **glass blur** corruptions are provided in the table. The coupled KL-divergence initially increases when moving away from the standard VAE design with $\kappa \le 0.025$, but it then steadily decreases with increasing $\kappa $. The coupled reconstruction loss shows steady improvement. The overall negative coupled ELBO shows consistent improvement as $\kappa $ increases.

| Coupling $\mathit{\kappa}$ | Coupled KL-Divergence | Coupled RE Loss | Coupled ELBO | KL Proportion | RE Proportion |
|---|---|---|---|---|---|
| $\kappa =0$ | 22.3 ± 3.5 | 196.1 ± 55.3 | 218.4 | 10.19% | 89.81% |
| $\kappa =0.025$ | 29.4 ± 2.0 | 178.8 ± 50.1 | 208.2 | 14.12% | 85.88% |
| $\kappa =0.05$ | 25.5 ± 0.7 | 164.1 ± 45.7 | 189.6 | 13.44% | 86.56% |
| $\kappa =0.075$ | 20.9 ± 0.4 | 154.0 ± 43.0 | 174.9 | 11.96% | 88.04% |
| $\kappa =0.1$ | 18.0 ± 0.4 | 145.1 ± 40.0 | 163.1 | 11.05% | 88.95% |

**Table 5.** The components of the coupled ELBO under **impulse noise** corruptions are provided in the table. The coupled KL-divergence initially increases when moving away from the standard VAE design with $\kappa \le 0.025$, but it then steadily decreases with increasing $\kappa $. The overall negative coupled ELBO shows consistent improvement as $\kappa $ increases.

| Coupling $\mathit{\kappa}$ | Coupled KL-Divergence | Coupled RE Loss | Coupled ELBO | KL Proportion | RE Proportion |
|---|---|---|---|---|---|
| $\kappa =0$ | 24.2 ± 3.8 | 170.7 ± 34.7 | 195.0 | 12.43% | 87.57% |
| $\kappa =0.025$ | 29.9 ± 2.2 | 148.0 ± 31.0 | 177.9 | 16.81% | 83.19% |
| $\kappa =0.05$ | 26.0 ± 0.8 | 131.6 ± 28.5 | 157.7 | 16.52% | 83.48% |
| $\kappa =0.075$ | 21.4 ± 0.6 | 120.9 ± 26.7 | 142.3 | 15.05% | 84.95% |
| $\kappa =0.1$ | 18.5 ± 0.6 | 111.8 ± 25.2 | 130.3 | 14.21% | 85.79% |

**Table 6.** The components of the coupled ELBO under **shot noise** corruptions are provided in the table. The coupled KL-divergence increases when moving away from the standard VAE design with $\kappa \le 0.025$, but it then steadily decreases with increasing $\kappa $. The coupled reconstruction loss shows steady improvement. The overall negative coupled ELBO shows consistent improvement as $\kappa $ increases.

| Coupling $\mathit{\kappa}$ | Coupled KL-Divergence | Coupled RE Loss | Coupled ELBO | KL Proportion | RE Proportion |
|---|---|---|---|---|---|
| $\kappa =0$ | 23.9 ± 3.8 | 98.9 ± 28.3 | 122.8 | 19.45% | 80.55% |
| $\kappa =0.025$ | 29.9 ± 2.4 | 88.9 ± 26.2 | 118.8 | 25.14% | 74.86% |
| $\kappa =0.05$ | 26.1 ± 1.0 | 81.8 ± 25.0 | 108.0 | 24.21% | 75.80% |
| $\kappa =0.075$ | 21.6 ± 0.7 | 77.6 ± 23.9 | 99.2 | 21.80% | 78.20% |
| $\kappa =0.1$ | 18.6 ± 0.6 | 73.4 ± 22.8 | 92.0 | 20.17% | 79.83% |

**Table 7.** The components of the coupled ELBO under **shear** corruptions are provided. The coupled KL-divergence increases when moving away from the standard VAE design with $\kappa \le 0.025$, but it then steadily decreases with increasing $\kappa $. The coupled reconstruction loss shows steady improvement. The overall negative coupled ELBO shows consistent improvement as $\kappa $ increases.

| Coupling $\mathit{\kappa}$ | Coupled KL-Divergence | Coupled RE Loss | Coupled ELBO | KL Proportion | RE Proportion |
|---|---|---|---|---|---|
| $\kappa =0$ | 24.8 ± 4.0 | 114.1 ± 31.7 | 138.9 | 17.85% | 82.15% |
| $\kappa =0.025$ | 30.4 ± 2.4 | 102.3 ± 29.0 | 132.7 | 22.92% | 77.08% |
| $\kappa =0.05$ | 26.1 ± 0.9 | 94.7 ± 27.5 | 120.8 | 21.61% | 78.39% |
| $\kappa =0.075$ | 21.8 ± 0.7 | 89.5 ± 26.3 | 111.3 | 19.61% | 80.39% |
| $\kappa =0.1$ | 18.6 ± 0.6 | 84.9 ± 24.9 | 103.5 | 17.97% | 82.03% |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Cao, S.; Li, J.; Nelson, K.P.; Kon, M.A.
Coupled VAE: Improved Accuracy and Robustness of a Variational Autoencoder. *Entropy* **2022**, *24*, 423.
https://doi.org/10.3390/e24030423
