# Measure of Similarity between GMMs Based on Autoencoder-Generated Gaussian Component Representations

## Abstract

**:**

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Baseline GMM Similarity Measures

#### 2.2. GMM Similarity Measures Based on Autoencoder-Generated Representations

#### 2.2.1. Autoencoder Architectures

#### 2.2.2. Ground Distances Regularization

#### 2.2.3. Forming the Feature Map Regularizer ${\mathcal{R}}_{FMR}(F,G)$

#### 2.2.4. Forming the GMM Similarity Measure Based on Autoencoder-Generated Representations

#### 2.3. Computational Complexity

## 3. Experimental Results

#### 3.1. Network Architectures

#### 3.2. Performances

## 4. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning
---|---
AI | Artificial Intelligence
CNN | Convolutional Neural Network
DNN | Deep Neural Network
DPLM | Distance Preservation to the Local Mean
EM | Expectation Maximization
EMD | Earth Mover’s Distance
FMR | Feature Map Regularized
GMM | Gaussian Mixture Model
KL | Kullback–Leibler
KNN | K-Nearest Neighbors
KTH | Royal Institute of Technology in Stockholm
LP | Linear Programming
MB | Matching-Based
MC | Monte Carlo
ML | Machine Learning
MLE | Maximum Likelihood Estimate
MSE | Mean Squared Error
PDF | Probability Density Function
SPD | Symmetric Positive Definite
TIPS | Textures Under Varying Illumination, Pose and Scale
UC | Unscented Transform
UIUC | University of Illinois Urbana-Champaign
UMD | University of Maryland
VB | Variational Bound
VAE | Variational AutoEncoder
WA | Weighted Average

## References


**Figure 1.** Illustration of the proposed fully connected autoencoder architecture for the low-dimensional embedding of Gaussian components, represented by $vect\left({P}_{i}\right)$, into $h\in {\mathbb{R}}^{l}$. Colors indicate the symmetric architecture of the network and the goal of learning a unique embedding of the original Gaussian components (middle vector $h$, shown in red).
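The network in Figure 1 compresses the vectorized Gaussian-component representations $vect(P_i)$ into a low-dimensional code $h$ and trains by reconstruction. As a rough illustration of the idea only, the sketch below uses a deliberately minimal tied-weight linear autoencoder with an MSE objective; the input size, bottleneck size, depth, and training schedule are all simplifying assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the vectorized representations vect(P_i):
# n random D-dimensional inputs (hypothetical sizes, not the paper's).
n, D, l = 64, 16, 4            # D: input dim, l: bottleneck dim
X = rng.normal(size=(n, D))

# Minimal tied-weight linear autoencoder: encode h = W^T x, decode x_hat = W h.
W = rng.normal(scale=0.1, size=(D, l))
lr = 0.01

def loss(W):
    E = X @ W @ W.T - X        # reconstruction error over the whole batch
    return np.mean(E ** 2)

loss0 = loss(W)
for _ in range(500):
    E = X @ W @ W.T - X
    # gradient of the reconstruction loss w.r.t. W (up to a constant factor)
    W -= lr * 2.0 * (X.T @ E + E.T @ X) @ W / n
loss1 = loss(W)

h = X[0] @ W                   # low-dimensional code for the first sample
```

Training drives $W^{T}W$ toward an orthonormal basis of the leading principal subspace, so the reconstruction error decreases; the deeper nonlinear network of Figure 1 plays the same role with more capacity.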

**Figure 2.** Illustration of the proposed FMR-regularized CNN autoencoder for the low-dimensional embedding of Gaussian components represented by SPD matrices ${P}_{i}\in {\mathrm{Sym}}_{++}(d+1)$ into $h\in {\mathbb{R}}^{l}$.
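Figure 2 represents each Gaussian component as an SPD matrix $P_i \in \mathrm{Sym}_{++}(d+1)$. A standard construction in the literature (due to Lovrić et al.) embeds $\mathcal{N}(\mu,\Sigma)$ as $P = |\Sigma|^{-1/(d+1)}\left[\begin{smallmatrix} \Sigma + \mu\mu^{T} & \mu \\ \mu^{T} & 1 \end{smallmatrix}\right]$; whether the paper uses exactly this normalization is an assumption here, but the sketch shows the shape of the mapping:

```python
import numpy as np

def gaussian_to_spd(mu, sigma):
    """Embed a d-dimensional Gaussian N(mu, sigma) into Sym++(d+1).

    Sketch of the standard Lovric-style embedding
    P = |sigma|^(-1/(d+1)) * [[sigma + mu mu^T, mu], [mu^T, 1]];
    the exact normalization is an assumption, not taken from the paper.
    """
    d = mu.shape[0]
    top = np.hstack([sigma + np.outer(mu, mu), mu[:, None]])
    bottom = np.hstack([mu[None, :], np.ones((1, 1))])
    P = np.vstack([top, bottom])
    return P * np.linalg.det(sigma) ** (-1.0 / (d + 1))

# Hypothetical 2-D component parameters.
mu = np.array([1.0, -0.5])
sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
P = gaussian_to_spd(mu, sigma)
```

Positive definiteness follows because $v^{T}\Sigma v + (\mu^{T}v + t)^{2} > 0$ for any nonzero $(v, t)$, so the resulting $(d+1)\times(d+1)$ matrix is a valid input for the SPD-aware CNN autoencoder.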

**Table 1.** Recognition accuracies of the proposed GMM-Autoenc-based measures compared with KL-based and DPLM-based GMM similarity measures on the UIUC database.

| GMM Sim. Meas. | Accuracy | | | | | |
|---|---|---|---|---|---|---|
| | $m$ = 1 | | $m$ = 5 | | $m$ = 10 | |
| $KL_{MB}$ | 0.82 | | 0.80 | | 0.80 | |
| $KL_{WA}$ | 0.82 | | 0.82 | | 0.82 | |
| $KL_{VB}$ | 0.82 | | 0.82 | | 0.82 | |
| | $\tilde{l}$ = 5 | $\tilde{l}$ = 7 | $\tilde{l}$ = 5 | $\tilde{l}$ = 7 | $\tilde{l}$ = 5 | $\tilde{l}$ = 7 |
| $uDPLM_{MB}$ | 0.72 | 0.81 | 0.73 | 0.74 | 0.79 | 0.79 |
| $uDPLM_{WA}$ | 0.72 | 0.81 | 0.73 | 0.74 | 0.80 | 0.80 |
| $uDPLM_{VB}$ | 0.72 | 0.81 | 0.73 | 0.74 | 0.80 | 0.80 |
| | $l$ = 20 | $l$ = 30 | $l$ = 20 | $l$ = 30 | $l$ = 20 | $l$ = 30 |
| GMM-Autoenc${}_{1,AE}$ | 0.75 | 0.81 | 0.75 | 0.76 | 0.80 | 0.80 |
| GMM-Autoenc${}_{2,AE}$ | 0.75 | 0.81 | 0.76 | 0.76 | 0.80 | 0.80 |
| GMM-Autoenc${}_{3,AE}$ | 0.76 | 0.80 | 0.76 | 0.77 | 0.81 | 0.81 |
| GMM-Autoenc${}_{1,AECNN}$ | 0.75 | 0.80 | 0.73 | 0.77 | 0.81 | 0.80 |
| GMM-Autoenc${}_{2,AECNN}$ | 0.76 | 0.81 | 0.71 | 0.78 | 0.81 | 0.81 |
| GMM-Autoenc${}_{3,AECNN}$ | 0.77 | 0.81 | 0.71 | 0.79 | 0.81 | 0.80 |
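The $KL_{MB}$, $KL_{WA}$, and $KL_{VB}$ baselines in the table approximate the intractable KL divergence between two GMMs from the closed-form KL divergence between individual Gaussian components. A minimal sketch of the matching-based (MB) variant follows; the matching rule and weighting below follow the formulation usually attributed to Goldberger et al. and should be read as an assumption, not the paper's code:

```python
import numpy as np

def kl_gauss(mu1, S1, mu2, S2):
    """Closed-form KL divergence KL( N(mu1,S1) || N(mu2,S2) )."""
    d = mu1.shape[0]
    S2inv = np.linalg.inv(S2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(S2inv @ S1) + diff @ S2inv @ diff - d
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

def kl_matching_based(wf, muf, Sf, wg, mug, Sg):
    """Matching-based KL approximation between GMMs f and g:
    each component of f is matched to its cheapest component of g."""
    total = 0.0
    for a, m1, C1 in zip(wf, muf, Sf):
        costs = [kl_gauss(m1, C1, m2, C2) - np.log(b)
                 for b, m2, C2 in zip(wg, mug, Sg)]
        j = int(np.argmin(costs))
        total += a * (kl_gauss(m1, C1, mug[j], Sg[j]) + np.log(a / wg[j]))
    return total

# Two toy 2-component GMMs in 2-D (hypothetical parameters).
w = np.array([0.6, 0.4])
mus = np.array([[0.0, 0.0], [4.0, 4.0]])
covs = np.array([np.eye(2), np.eye(2)])

d_same = kl_matching_based(w, mus, covs, w, mus, covs)        # identical GMMs
d_diff = kl_matching_based(w, mus, covs, w, mus + 3.0, covs)  # displaced copy
```

For identical, well-separated mixtures the matched pairs coincide and the measure is zero; displacing the components yields a strictly positive value, which is the behavior the KNN classifier exploits.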

**Table 2.** Recognition accuracies of the proposed GMM-Autoenc-based measures compared with KL-based and DPLM-based GMM similarity measures on the KTH-TIPS database.

| GMM Sim. Meas. | Accuracy | | | | | |
|---|---|---|---|---|---|---|
| | $m$ = 1 | | $m$ = 5 | | $m$ = 10 | |
| $KL_{MB}$ | 0.78 | | 0.74 | | 0.75 | |
| $KL_{WA}$ | 0.78 | | 0.78 | | 0.78 | |
| $KL_{VB}$ | 0.78 | | 0.78 | | 0.78 | |
| | $\tilde{l}$ = 5 | $\tilde{l}$ = 7 | $\tilde{l}$ = 5 | $\tilde{l}$ = 7 | $\tilde{l}$ = 5 | $\tilde{l}$ = 7 |
| $uDPLM_{MB}$ | 0.57 | 0.73 | 0.69 | 0.71 | 0.63 | 0.72 |
| $uDPLM_{WA}$ | 0.57 | 0.73 | 0.72 | 0.75 | 0.64 | 0.75 |
| $uDPLM_{VB}$ | 0.57 | 0.73 | 0.72 | 0.75 | 0.63 | 0.75 |
| | $l$ = 20 | $l$ = 30 | $l$ = 20 | $l$ = 30 | $l$ = 20 | $l$ = 30 |
| GMM-Autoenc${}_{1,AE}$ | 0.71 | 0.75 | 0.73 | 0.75 | 0.72 | 0.74 |
| GMM-Autoenc${}_{2,AE}$ | 0.71 | 0.74 | 0.72 | 0.74 | 0.72 | 0.75 |
| GMM-Autoenc${}_{3,AE}$ | 0.72 | 0.75 | 0.73 | 0.75 | 0.73 | 0.76 |
| GMM-Autoenc${}_{1,AECNN}$ | 0.72 | 0.75 | 0.73 | 0.74 | 0.73 | 0.73 |
| GMM-Autoenc${}_{2,AECNN}$ | 0.73 | 0.75 | 0.71 | 0.75 | 0.74 | 0.76 |
| GMM-Autoenc${}_{3,AECNN}$ | 0.73 | 0.76 | 0.72 | 0.77 | 0.74 | 0.77 |

**Table 3.** Recognition accuracies of the proposed GMM-Autoenc-based measures compared with KL-based and DPLM-based GMM similarity measures on the UMD database.

| GMM Sim. Meas. | Accuracy | | | | | |
|---|---|---|---|---|---|---|
| | $m$ = 1 | | $m$ = 5 | | $m$ = 10 | |
| $KL_{MB}$ | 0.75 | | 0.73 | | 0.72 | |
| $KL_{WA}$ | 0.75 | | 0.75 | | 0.75 | |
| $KL_{VB}$ | 0.75 | | 0.75 | | 0.75 | |
| | $\tilde{l}$ = 5 | $\tilde{l}$ = 7 | $\tilde{l}$ = 5 | $\tilde{l}$ = 7 | $\tilde{l}$ = 5 | $\tilde{l}$ = 7 |
| $uDPLM_{MB}$ | 0.73 | 0.74 | 0.72 | 0.72 | 0.70 | 0.72 |
| $uDPLM_{WA}$ | 0.73 | 0.74 | 0.73 | 0.74 | 0.71 | 0.75 |
| $uDPLM_{VB}$ | 0.73 | 0.74 | 0.73 | 0.74 | 0.71 | 0.75 |
| | $l$ = 20 | $l$ = 30 | $l$ = 20 | $l$ = 30 | $l$ = 20 | $l$ = 30 |
| GMM-Autoenc${}_{1,AE}$ | 0.74 | 0.74 | 0.73 | 0.73 | 0.71 | 0.72 |
| GMM-Autoenc${}_{2,AE}$ | 0.73 | 0.74 | 0.73 | 0.74 | 0.73 | 0.74 |
| GMM-Autoenc${}_{3,AE}$ | 0.74 | 0.75 | 0.74 | 0.75 | 0.73 | 0.75 |
| GMM-Autoenc${}_{1,AECNN}$ | 0.74 | 0.75 | 0.73 | 0.73 | 0.72 | 0.72 |
| GMM-Autoenc${}_{2,AECNN}$ | 0.75 | 0.75 | 0.74 | 0.75 | 0.74 | 0.74 |
| GMM-Autoenc${}_{3,AECNN}$ | 0.74 | 0.75 | 0.74 | 0.75 | 0.74 | 0.75 |
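The accuracies in Tables 1-3 come from nearest-neighbor classification over pairwise GMM (dis)similarities. A generic sketch of that kind of protocol is shown below; the precomputed distance matrix, the value of $k$, and the voting rule are all assumptions for illustration, not the paper's exact evaluation pipeline:

```python
import numpy as np

def knn_accuracy(D, y_train, y_test, k=1):
    """k-NN classification from a precomputed dissimilarity matrix
    D of shape (n_test, n_train), with majority voting over the
    k closest training items (generic sketch, details assumed)."""
    preds = np.empty(len(y_test), dtype=y_train.dtype)
    for i, row in enumerate(D):
        nn = np.argsort(row)[:k]                    # k closest training GMMs
        vals, counts = np.unique(y_train[nn], return_counts=True)
        preds[i] = vals[np.argmax(counts)]          # majority vote
    return float(np.mean(preds == y_test))

# Toy dissimilarities: 4 test items, 6 training items, 2 texture classes.
y_train = np.array([0, 0, 0, 1, 1, 1])
y_test = np.array([0, 0, 1, 1])
rng = np.random.default_rng(1)
D = rng.uniform(1.0, 2.0, size=(4, 6))
D[:2, :3] -= 1.0    # class-0 test items are made close to class-0 training items
D[2:, 3:] -= 1.0    # class-1 test items are made close to class-1 training items
acc = knn_accuracy(D, y_train, y_test, k=3)
```

Any of the GMM similarity measures compared in the tables can be dropped in as the source of `D`, which is what makes this a like-for-like comparison.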

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kalušev, V.; Popović, B.; Janev, M.; Brkljač, B.; Ralević, N.
Measure of Similarity between GMMs Based on Autoencoder-Generated Gaussian Component Representations. *Axioms* **2023**, *12*, 535.
https://doi.org/10.3390/axioms12060535

**AMA Style**

Kalušev V, Popović B, Janev M, Brkljač B, Ralević N.
Measure of Similarity between GMMs Based on Autoencoder-Generated Gaussian Component Representations. *Axioms*. 2023; 12(6):535.
https://doi.org/10.3390/axioms12060535

**Chicago/Turabian Style**

Kalušev, Vladimir, Branislav Popović, Marko Janev, Branko Brkljač, and Nebojša Ralević.
2023. "Measure of Similarity between GMMs Based on Autoencoder-Generated Gaussian Component Representations" *Axioms* 12, no. 6: 535.
https://doi.org/10.3390/axioms12060535