TURBO: The Swiss Knife of AutoEncoders
Abstract
1. Introduction
- Highlighting the main limitations of the IBN principle and the need for a new framework;
- Introducing and explaining the details of the TURBO framework, and motivating several use cases;
- Reviewing well-known models through the lens of the TURBO framework, showing how it is a straightforward generalisation of them;
- Linking the TURBO framework to additional related models, opening the door to further studies and applications;
- Showcasing several applications where the TURBO framework gives either state-of-the-art or competitive results compared to existing methods.
2. Notations and Definitions
3. Background: From IBN to TURBO
3.1. Min-Max Game: Or Bottleneck Training
3.1.1. VAE from BIBAE Perspectives
3.1.2. GAN from BIBAE Perspectives
3.1.3. CLUB
3.2. Max-Max Game: Or Physically Meaningful Latent Space
4. TURBO
4.1. General Objective Function
4.2. Generalisation of Many Models
4.2.1. AAE
4.2.2. GAN and WGAN
4.2.3. pix2pix and SRGAN
4.2.4. CycleGAN
4.2.5. Flows
4.3. Extension to Additional Models
ALAE
5. Applications
5.1. TURBO in High-Energy Physics: TurboSim
5.2. TURBO in Astronomy: Hubble-to-Webb
5.3. TURBO in Anti-Counterfeiting: Digital Twin
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
TURBO  Two-way Unidirectional Representations by Bounded Optimisation
IBN  Information Bottleneck
BIBAE  Bounded Information Bottleneck AutoEncoder
GAN  Generative Adversarial Network
WGAN  Wasserstein GAN
VAE  Variational AutoEncoder
InfoVAE  Information-maximising VAE
AAE  Adversarial AutoEncoder
pix2pix  Image-to-Image Translation with Conditional GAN
SRGAN  Super-Resolution GAN
CycleGAN  Cycle-Consistent GAN
ALAE  Adversarial Latent AutoEncoder
KLD  Kullback–Leibler Divergence
OTUS  Optimal-Transport-based Unfolding and Simulation
LPIPS  Learned Perceptual Image Patch Similarity
FID  Fréchet Inception Distance
MSE  Mean Squared Error
SSIM  Structural SIMilarity
PSNR  Peak Signal-to-Noise Ratio
CDP  Copy Detection Pattern
UMAP  Uniform Manifold Approximation and Projection
Appendix A. Notations Summary
Notation  Description 

$p(\mathbf{x},\mathbf{z})$, $p(\mathbf{x}|\mathbf{z})$, $p(\mathbf{z}|\mathbf{x})$, $p(\mathbf{x})$, $p(\mathbf{z})$  Data joint, conditional and marginal distributions. Short notations for $p_{\mathrm{x},\mathrm{z}}(\mathbf{x},\mathbf{z})$, $p_{\mathrm{x}}(\mathbf{x})$, etc.
$q_{\varphi}(\mathbf{x},\mathbf{z})$, $q_{\varphi}(\mathbf{x}|\mathbf{z})$, $q_{\varphi}(\mathbf{z}|\mathbf{x})$  Encoder joint and conditional distributions as defined in Equation (1).
$\tilde{q}_{\varphi}(\mathbf{z}) := \int p(\mathbf{x})\, q_{\varphi}(\mathbf{z}|\mathbf{x})\, \mathrm{d}\mathbf{x}$  Approximated marginal distribution of synthetic data in the encoder latent space.
$\widehat{q}_{\varphi}(\mathbf{z}) := \int \tilde{p}_{\theta}(\mathbf{x})\, q_{\varphi}(\mathbf{z}|\mathbf{x})\, \mathrm{d}\mathbf{x}$  Approximated marginal distribution of synthetic data in the encoder reconstructed space.
$p_{\theta}(\mathbf{x},\mathbf{z})$, $p_{\theta}(\mathbf{x}|\mathbf{z})$, $p_{\theta}(\mathbf{z}|\mathbf{x})$  Decoder joint and conditional distributions as defined in Equation (2).
$\tilde{p}_{\theta}(\mathbf{x}) := \int p(\mathbf{z})\, p_{\theta}(\mathbf{x}|\mathbf{z})\, \mathrm{d}\mathbf{z}$  Approximated marginal distribution of synthetic data in the decoder latent space.
$\widehat{p}_{\theta}(\mathbf{x}) := \int \tilde{q}_{\varphi}(\mathbf{z})\, p_{\theta}(\mathbf{x}|\mathbf{z})\, \mathrm{d}\mathbf{z}$  Approximated marginal distribution of synthetic data in the decoder reconstructed space.
$I(\mathbf{X};\mathbf{Z})$, $I_{\varphi}(\mathbf{X};\tilde{\mathbf{Z}})$, $I_{\theta}(\tilde{\mathbf{X}};\mathbf{Z})$  Mutual information as defined in Equation (3) and below. Subscripts mean that parametrised distributions are involved in the space denoted by a tilde.
$\mathcal{I}_{\varphi}^{\mathrm{z}}(\mathbf{X};\mathbf{Z})$, $\mathcal{I}_{\varphi,\theta}^{\mathrm{x}}(\mathbf{X};\tilde{\mathbf{Z}})$, $\mathcal{I}_{\theta}^{\mathrm{x}}(\mathbf{X};\mathbf{Z})$, $\mathcal{I}_{\varphi,\theta}^{\mathrm{z}}(\tilde{\mathbf{X}};\mathbf{Z})$  Lower bounds to mutual information as derived in Appendix C. Superscripts denote for which variable the corresponding loss terms are computed, subscripts denote the involved parametrised distributions, and tildes follow the notations of the bounded mutual information.
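The four lower bounds in the last row all rest on the same generic variational step (a sketch only; the exact per-path forms are derived in Appendix C). For the decoder direction, replacing the intractable posterior $p(\mathbf{x}|\mathbf{z})$ with the parametrised model $p_{\theta}(\mathbf{x}|\mathbf{z})$ gives:

```latex
\begin{aligned}
I(\mathbf{X};\mathbf{Z})
  &= \mathbb{E}_{p(\mathbf{x},\mathbf{z})}\!\left[\log\frac{p(\mathbf{x}|\mathbf{z})}{p(\mathbf{x})}\right] \\
  &= H(\mathbf{X})
   + \mathbb{E}_{p(\mathbf{x},\mathbf{z})}\!\left[\log p_{\theta}(\mathbf{x}|\mathbf{z})\right]
   + \mathbb{E}_{p(\mathbf{z})}\!\left[D_{\mathrm{KL}}\!\left(p(\mathbf{x}|\mathbf{z})\,\|\,p_{\theta}(\mathbf{x}|\mathbf{z})\right)\right] \\
  &\ge H(\mathbf{X})
   + \mathbb{E}_{p(\mathbf{x},\mathbf{z})}\!\left[\log p_{\theta}(\mathbf{x}|\mathbf{z})\right],
\end{aligned}
```

since the KLD is non-negative. Maximising such a bound turns the mutual information into a tractable reconstruction-type log-likelihood term; applying the same argument along each of the four paths yields the four $\mathcal{I}$ terms above.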
Appendix B. BIBAE Full Derivation
Appendix B.1. Minimised Terms
Appendix B.2. Maximised Terms
Appendix C. TURBO Full Derivation
Appendix C.1. Direct Path, Encoder Space
Appendix C.2. Direct Path, Decoder Space
Appendix C.3. Reverse Path, Decoder Space
Appendix C.4. Reverse Path, Encoder Space
Appendix D. ALAE Modified Term
References
BIBAE vs. TURBO:
- Paradigm: BIBAE minimises the mutual information between the input space and the latent space while maximising the mutual information between the latent space and the output space; TURBO maximises both the mutual information between the input space and the latent space and the mutual information between the latent space and the output space.
- Encoding: BIBAE performs one-way encoding; TURBO performs two-way encoding.
- Distributions: BIBAE considers the data and latent space distributions independently; TURBO considers them jointly.
- Targeted tasks:
- Advantages:
- Drawbacks:
- Particular cases: VAE, GAN and VAE/GAN (BIBAE); AAE, GAN, pix2pix, SRGAN, CycleGAN and Flows (TURBO).
- Related models: InfoVAE and CLUB (BIBAE); ALAE (TURBO).
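The two-way encoding idea can be illustrated with a deliberately simple linear toy (an illustrative sketch, not the paper's implementation): the same encoder/decoder pair is evaluated on both the direct path $x \to \tilde{z} \to \hat{x}$ and the reverse path $z \to \tilde{x} \to \hat{z}$, giving paired reconstruction terms that TURBO minimises jointly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear "encoder" and "decoder" on 3-D data. Here the decoder
# is the exact inverse of the encoder, so both paths reconstruct perfectly;
# in practice both path losses are minimised jointly during training.
W = rng.normal(size=(3, 3))   # encoder matrix: z = x @ W
V = np.linalg.inv(W)          # decoder matrix: x = z @ V

X = rng.normal(size=(100, 3))  # samples standing in for the data space
Z = rng.normal(size=(100, 3))  # samples standing in for the latent space

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Direct path: x -> z_tilde -> x_hat
direct_recon = mse(X, (X @ W) @ V)
# Reverse path: z -> x_tilde -> z_hat
reverse_recon = mse(Z, (Z @ V) @ W)
```

A one-way (BIBAE-style) objective would only ever compute `direct_recon`; the reverse path is what forces the decoder to behave sensibly on samples drawn directly from the latent prior.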
          | Z space   | X space              | Rec. space
Model     | $E^{b}$   | $E^{\mathrm{jet}1}$  | $m_{tt}$
TurboSim  | 3.96      | 4.43                 | 2.97
OTUS      | 2.76      | 5.75                 | 15.8
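The metric behind these scores is not stated in this excerpt. If, for illustration, one compares the marginal distributions of an observable with an empirical 1-D Wasserstein-1 distance (an assumption; the variable names below are hypothetical), it can be computed from equal-size samples via the sorted-sample coupling:

```python
import numpy as np

def wasserstein_1d(a, b):
    """Empirical 1-D Wasserstein-1 distance between equal-size samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, size=10_000)   # e.g. a simulated observable
generated = rng.normal(0.1, 1.0, size=10_000)   # e.g. the model's output

distance = wasserstein_1d(reference, generated)  # roughly the 0.1 mean shift
```

For two equal-size empirical samples, sorting both and averaging the absolute differences is exactly the optimal transport plan in one dimension.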
Model     | MSE ↓   | SSIM ↑ | PSNR ↑ | LPIPS ↓ | FID ↓
CycleGAN  | 0.0097  | 0.83   | 20.11  | 0.48    | 128.1
pix2pix   | 0.0021  | 0.93   | 26.78  | 0.44    | 54.58
TURBO     | 0.0026  | 0.92   | 25.88  | 0.41    | 43.36
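Assuming the images are normalised to $[0, 1]$ (an assumption), PSNR is fully determined by MSE via $\mathrm{PSNR} = -10\log_{10}(\mathrm{MSE})$, and the reported MSE and PSNR columns are mutually consistent to within the rounding of the MSE values:

```python
import math

def psnr_from_mse(mse, peak=1.0):
    """PSNR in dB for a given MSE and peak signal value."""
    return 10.0 * math.log10(peak ** 2 / mse)

# Reported (MSE, PSNR) pairs from the table above.
pairs = [(0.0097, 20.11), (0.0021, 26.78), (0.0026, 25.88)]
recomputed = [psnr_from_mse(mse) for mse, _ in pairs]
```

Each recomputed value lands within about 0.03 dB of the reported PSNR, the residual coming from the MSE being quoted to only two significant figures.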
Model           | FID$_{x\to\tilde{z}}$ ↓ | FID$_{z\to\tilde{x}}$ ↓ | Hamming ↓ | MSE ↓ | SSIM ↑
W/O processing  | 304   | 304   | 0.24  | 0.18  | 0.48
CycleGAN        | 3.87  | 4.45  | 0.15  | 0.05  | 0.73
pix2pix         | 3.37  | 8.57  | 0.11  | 0.05  | 0.76
TURBO           | 3.16  | 6.60  | 0.09  | 0.04  | 0.78
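For CDP verification, the Hamming column plausibly denotes the normalised bit-disagreement rate between the binarised reference template and the processed acquisition; the 0.5 binarisation threshold and all names below are illustrative assumptions, not the paper's pipeline:

```python
import numpy as np

def normalised_hamming(a, b, threshold=0.5):
    """Fraction of disagreeing bits after binarising two grayscale patterns."""
    return float(np.mean((a > threshold) != (b > threshold)))

rng = np.random.default_rng(2)
template = rng.random((64, 64))  # hypothetical CDP template, values in [0, 1]
acquired = np.clip(template + rng.normal(0.0, 0.2, template.shape), 0.0, 1.0)

score = normalised_hamming(template, acquired)  # 0 = identical, 1 = inverted
```

A lower score means the processed acquisition recovers the template's bits more faithfully, matching the ↓ direction of the table's Hamming column.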
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Quétant, G.; Belousov, Y.; Kinakh, V.; Voloshynovskiy, S. TURBO: The Swiss Knife of AutoEncoders. Entropy 2023, 25, 1471. https://doi.org/10.3390/e25101471