# Visualization of High-Dimensional Data by Pairwise Fusion Matrices Using t-SNE

^{1}

^{2}

^{*}

## Abstract

**:**

## 1. Introduction

## 2. Stochastic Neighbor Embedding (SNE)

## 3. t-Distributed Stochastic Neighbor Embedding (t-SNE)

Algorithm 1: t-Distributed Stochastic Neighbor Embedding [17] |

Data: data set X = ${x}_{1}$, ${x}_{2}$, ${x}_{3}$, …, ${x}_{n}$ |

cost function parameters: perplexity Perp |

optimization parameters: number of iterations T, learning rate h, momentum a(t). |

Result: low-dimensional data representation ${Y}^{T}$ = ${y}_{1}$, ${y}_{2}$, ${y}_{3}$, …,${y}_{n}$ |

begin |

compute pairwise affinities ${p}_{j|i}$ with perplexity Perp (using Equation (1)) |

set ${p}_{i|j}=\frac{{p}_{j|i}+\phantom{\rule{4pt}{0ex}}{p}_{i|i}}{2n}$ sample initial solution $Y\left(0\right)$ = ${y}_{1}$, ${y}_{2}$, …, ${y}_{n}$ from $N(0,{10}^{-4})$ |

for $k\leftarrow 1$ to N |

compute low-dimensional affinities ${q}_{\mathrm{ij}}$ (using Equation (5)) |

compute gradient $\frac{\delta C}{\delta y}$ (using Equation (6)) |

set ${Y}^{t}=\phantom{\rule{4pt}{0ex}}{Y}^{t-1}+\eta \frac{\delta C}{\delta Y}+\phantom{\rule{4pt}{0ex}}\alpha \left(t\right)({Y}^{t-1\phantom{\rule{4pt}{0ex}}}-\phantom{\rule{4pt}{0ex}}{Y}^{t-2})$ |

end |

end |

## 4. Motivation

- First, there is dearth of research visualizing the high-dimensional data of Urdu handwritten text at character level.
- Second, there is ambivalence as to whether to use the heterogeneous data fusion from multiple data sources while visualizing high-dimensional data.
- Third, there is need to enhance the visualization of Urdu handwritten digit data by increasing the inter-cluster separation.

## 5. Our Approach

#### 5.1. Dataset

#### 5.2. Experimental Setup and Results

#### 5.3. Applying Standard t-SNE

#### 5.4. Fusion Matrix

#### 5.5. Experimental Results of PCA, AE and Modified t-SNE

## 6. Discussion

## 7. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

- Bahlmann, C. Directional features in online handwriting recognition. Pattern Recognit.
**2006**, 39, 115–125. [Google Scholar] [CrossRef] [Green Version] - Razzak, M.I.; Anwar, F.; Husain, S.A.; Belaid, A.; Sher, M. HMM and fuzzy logic: A hybrid approach for online Urdu script-based languages character recognition. Knowl.-Based Syst.
**2010**, 23, 914–923. [Google Scholar] [CrossRef] - Herrera-Luna, E.C.; Felipe-Riveron, E.M.; Godoy-Calderon, S. A supervised algorithm with a new differentiated-weighting scheme for identifying the author of a handwritten text. Pattern Recognit. Lett.
**2011**, 32, 1139–1144. [Google Scholar] [CrossRef] - Carbonell, M.; Villegas, M.; Fornés, A.; Lladós, J. Joint recognition of handwritten text and named entities with a neural end-to-end model. arXiv, 2018; arXiv:1803.06252. [Google Scholar]
- Shinde, A.; Shinde, A. Overlapping character recognition for handwritten text using discriminant hidden Semi-Markov model. In Intelligent Computing and Information and Communication; Springer: Singapore, 2018; pp. 163–172. [Google Scholar]
- Mori, S.; Nishida, H.; Yamada, H. Optical Character Recognition; John Wiley & Sons, Inc.: New York, NY, USA, 1999. [Google Scholar]
- Schantz, H.F. The History of OCR, Optical Character Recognition; Recognition Technologies Users Association: Manchester, VT, USA, 1982. [Google Scholar]
- Khan, N.H.; Adnan, A.; Basar, S. An analysis of off-line and on-line approaches in urdu character recognition. In Proceedings of the 15th International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases (AIKED ’16), Venice, Italy, 29–31 January 2016. [Google Scholar]
- Akram, Q.U.A.; Hussain, S. Ligature-based font size independent OCR for Noori Nastalique writing style. In Proceedings of the 2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR), Nancy, France, 3–5 April 2017; pp. 129–133. [Google Scholar]
- Melville, P.; Mooney, R.J. Diverse ensembles for active learning. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; p. 74. [Google Scholar]
- Weinstein, J.N.; Collisson, E.A.; Mills, G.B.; Shaw, K.R.M.; Ozenberger, B.A.; Ellrott, K.; Shmulevich, I.; Sander, C.; Stuart, J.M.; Network, C.G.A.R.; et al. The cancer genome atlas pan-cancer analysis project. Nat. Genet.
**2013**, 45, 1113. [Google Scholar] [CrossRef] [PubMed] - Ravi, L.; Yan, Q.; Dascalu, S.M.; Harris, F.C., Jr. A survey of visualization techniques and tools for environmental data. In Proceedings of the 2013 Intl. Conference on Computers and Their Applications (CATA 2013), Honolulu, HI, USA, 4–6 March 2013. [Google Scholar]
- De Oliveira, M.F.; Levkowitz, H. From visual data exploration to visual data mining: A survey. IEEE Trans. Vis. Comput. Graph.
**2003**, 9, 378–394. [Google Scholar] [CrossRef] - Keim, D.A.; Ankerst, M.; Kriegel, H.-P. Recursive pattern: A technique for visualizing very large amounts of data. In Proceedings of the 6th Conference on Visualization’95; IEEE Computer Society, Atlanta, GA, USA, 29 October–9 November 1995; p. 279. [Google Scholar]
- Rekimoto, J.; Green, M. The information cube: Using transparency in 3rd information visualization. In Proceedings of the Third Annual Workshop on Information Technologies & Systems (WITS93), Orlando, FL, USA, 5 December 1993; pp. 125–132. [Google Scholar]
- Pickett, R.M.; Grinstein, G.G. Iconographic displays for visualizing multidimensional data. In Proceedings of the 1988 IEEE Conference on Systems, Man, and Cybernetics, Beijing, China, 18–12 August 1988; Volume 514, p. 519. [Google Scholar]
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Lear. Res.
**2008**, 9, 2579–2605. [Google Scholar] - Maaten, L. Learning a parametric embedding by preserving local structure. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, Clearwater Beach, FL, USA, 8–12 June 2009; pp. 384–391. [Google Scholar]
- Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol.
**1933**, 24, 417. [Google Scholar] [CrossRef] - Torgerson, W.S. Multidimensional scaling: I. Theory and method. Psychometrika
**1952**, 17, 401–419. [Google Scholar] [CrossRef] - Lee, J.A.; Verleysen, M. Nonlinear Dimensionality Reduction; Springer Science & Business Media: New York, NY, USA, 2007. [Google Scholar]
- Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science
**2000**, 290, 2323–2326. [Google Scholar] [CrossRef] - Belkin, M.; Niyogi, P. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 3–8 December 2002; pp. 585–591. [Google Scholar]
- Weinberger, K.Q.; Sha, F.; Saul, L.K. Learning a kernel matrix for nonlinear dimensionality reduction. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; p. 106. [Google Scholar]
- Hinton, G.E.; Roweis, S.T. Stochastic neighbor embedding. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–13 December 2003; pp. 857–864. [Google Scholar]
- Demartines, P.; Hérault, J. Curvilinear component analysis: A self-organizing neural network for nonlinear mapping of data sets. IEEE Trans. Neural Netw.
**1997**, 8, 148–154. [Google Scholar] [CrossRef] [PubMed] - Song, L.; Gretton, A.; Borgwardt, K.M.; Smola, A.J. Colored maximum variance unfolding. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 12–13 December 2008; pp. 1385–1392. [Google Scholar]
- Liou, C.-Y.; Cheng, W.-C.; Liou, J.-W.; Liou, D.-R. Autoencoder for words. Neurocomputing
**2014**, 139, 84–96. [Google Scholar] [CrossRef] - Joyce, J.M. Kullback-leibler divergence. In International Encyclopedia of Statistical Science; Springer: Berlin/Heidelberg, Germany, 2011; pp. 720–722. [Google Scholar]
- Lin, J. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory
**1991**, 37, 145–151. [Google Scholar] [CrossRef] [Green Version] - Gashi, I.; Stankovic, V.; Leita, C.; Thonnard, O. An experimental study of diversity with off-the-shelf antivirus engines. In Proceedings of the 2009 Eighth IEEE International Symposium on Network Computing and Applications (NCA 2009), Cambridge, MA, USA, 9–11 July 2009; pp. 4–11. [Google Scholar]
- Hamel, P.; Eck, D. Learning Features from Music Audio with Deep Belief Networks. In Proceedings of the 11th International Society for Music Information Retrieval Conference ISMIR, Utrecht, The Netherlands, 9–13 August 2010; Volume 10, pp. 339–344. [Google Scholar]
- Jamieson, A.R.; Giger, M.L.; Drukker, K.; Li, H.; Yuan, Y.; Bhooshan, N. Exploring nonlinear feature space dimension reduction and data representation in breast CADx with Laplacian eigenmaps and t-SNE. Med. Phys.
**2010**, 37, 339–351. [Google Scholar] [CrossRef] [PubMed] - Wallach, I.; Lilien, R. The protein–small-molecule database, a non-redundant structural resource for the analysis of protein-ligand binding. Bioinformatics
**2009**, 25, 615–620. [Google Scholar] [CrossRef] [PubMed] - Najafabadi, M.M.; Villanustre, F.; Khoshgoftaar, T.M.; Seliya, N.; Wald, R.; Muharemagic, E. Deep learning applications and challenges in big data analytics. J. Big Data
**2015**, 2, 1. [Google Scholar] [CrossRef] - Kiselev, V.Y.; Kirschner, K.; Schaub, M.T.; Andrews, T.; Yiu, A.; Chandra, T.; Natarajan, K.N.; Reik, W.; Barahona, M.; Green, A.R.; et al. SC3: Consensus clustering of single-cell RNA-seq data. Nat. Methods
**2017**, 14, 483. [Google Scholar] [CrossRef] - Pezzotti, N.; Lelieveldt, B.P.; van der Maaten, L.; Höllt, T.; Eisemann, E.; Vilanova, A. Approximated and user steerable tSNE for progressive visual analytics. IEEE Trans. Vis. Comput. Graph.
**2017**, 23, 1739–1752. [Google Scholar] [CrossRef] - Wattenberg, M.; Viégas, F.; Johnson, I. How to use t-sne effectively. Distill
**2016**, 1, e2. [Google Scholar] [CrossRef] - De Ridder, D.; Duin, R.P. Sammon’s mapping using neural networks: a comparison. Pattern Recognit. Lett.
**1997**, 18, 1307–1316. [Google Scholar] [CrossRef] - Tenenbaum, J.B.; De Silva, V.; Langford, J.C. A global geometric framework for nonlinear dimensionality reduction. Science
**2000**, 290, 2319–2323. [Google Scholar] [CrossRef] [PubMed]

**Figure 4.**Sample of Urdu handwritten numerals, in comparison with Persian and Arabic digits, with equivalent English digits from 0 to 9 (left to right).

**Figure 5.**Results of standard t-SNE on our data set having pixel-based data of Urdu handwritten numerals.

**Figure 6.**Results of standard t-SNE on our data set having pixel-based and structure-based data of Urdu handwritten numerals.

**Figure 7.**Visualization of handwritten Urdu numerals (pixel-based features only) using (

**a**) autoencoders, (

**b**) principal component analysis and (

**c**) standard t-SNE.

**Figure 9.**Visualization of handwritten Urdu numerals (fusion matrix) using (

**a**) autoencoders, (

**b**) principal component analysis and (

**c**) t-SNE.

**Figure 10.**Visualization of our handwritten Urdu numeral data using original images from the data set (all 5000 images).

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Husnain, M.; Missen, M.M.S.; Mumtaz, S.; Luqman, M.M.; Coustaty, M.; Ogier, J.-M.
Visualization of High-Dimensional Data by Pairwise Fusion Matrices Using t-SNE. *Symmetry* **2019**, *11*, 107.
https://doi.org/10.3390/sym11010107

**AMA Style**

Husnain M, Missen MMS, Mumtaz S, Luqman MM, Coustaty M, Ogier J-M.
Visualization of High-Dimensional Data by Pairwise Fusion Matrices Using t-SNE. *Symmetry*. 2019; 11(1):107.
https://doi.org/10.3390/sym11010107

**Chicago/Turabian Style**

Husnain, Mujtaba, Malik Muhammad Saad Missen, Shahzad Mumtaz, Muhammad Muzzamil Luqman, Mickaël Coustaty, and Jean-Marc Ogier.
2019. "Visualization of High-Dimensional Data by Pairwise Fusion Matrices Using t-SNE" *Symmetry* 11, no. 1: 107.
https://doi.org/10.3390/sym11010107