# Limit Theorems as Blessing of Dimensionality: Neural-Oriented Overview


## Abstract


## 1. Introduction: From Curse of Dimensionality to Blessing of Dimensionality

#### 1.1. First, a Curse

#### 1.2. Then, a Blessing

#### 1.3. Example

#### 1.4. Other Examples

#### 1.5. Are These Lucky Examples or a General Trend?

#### 1.6. It Is a General Trend

#### 1.7. Caution: Blessing-of-Dimensionality Is Not a Panacea

- The fact that limit theorems can explain some empirical successes does not mean, of course, that these blessing-of-dimensionality results are the only reason for these empirical successes: sometimes, as we have mentioned, the multi-dimensional data is actually intrinsically low-dimensional.
- The fact that limit theorems often make data processing easier does not mean that as the data complexity increases, the analysis always becomes simpler: many problems remain complex. At present, there is no clear general understanding of when the blessing of dimensionality occurs and where it does not occur. It would be nice to find such an understanding.

#### 1.8. What We Do in This Paper

#### 1.9. How This Paper Is Structured

## 2. Two Main Sources of Dimensionality: Spatial and Temporal

- First, at each moment of time, there is usually a large number of phenomena—located, in general, at different points in space—that need to be taken into account. Even if we use a few parameters to describe each of these phenomena, overall, we will need a very large number of parameters to describe all these phenomena—and thus, the dimensionality of the problems grows. We will call this dimensionality of spatial origin, or simply spatial dimensionality, for short. The above-mentioned Central Limit Theorem is a good example of spatial dimensionality.
- Furthermore, there may be parameters describing the history of the analyzed phenomenon—which also affect its current state. What naturally comes to mind is that the values of physical quantities change with time. In some cases, we observe these changes and we can analyze the corresponding time series. In other cases, we only observe the final results of these changes: e.g., inside a sensor, the original value may be transformed many times, and what we get as a resulting signal is the result of all these past transformations. In yet other cases, what changes are the simulated values—e.g., when we apply iterative algorithms. We will call the resulting dimensionality of temporal origin, or simply temporal dimensionality.

- Newton’s iterative method $$x^{(k+1)} = x^{(k)} - \frac{f\left(x^{(k)}\right)}{f'\left(x^{(k)}\right)}$$
- the gradient descent method $$x_i^{(k+1)} = x_i^{(k)} - \alpha \cdot \left.\frac{\partial f}{\partial x_i}\right|_{x = x^{(k)}}$$

- For Newton’s method, if $x^{(k)} \to x$, then, in the limit, we get $x = x - \frac{f(x)}{f'(x)}$, which implies that $f(x) = 0$.
- For the gradient descent, if $x^{(k)} \to x$, then, in the limit, we get $x_i = x_i - \alpha \cdot \frac{\partial f}{\partial x_i}$, which implies that $\frac{\partial f}{\partial x_i} = 0$. Thus, the limit point is always a stationary point, which is a necessary (but, as is well known, not sufficient) condition for it being the location of the minimum.
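Both iterations, and the fixed-point property just described, can be illustrated by a short sketch (the test functions, starting points, and step size below are made-up illustrative choices, not from the paper):

```python
# Sketch of the two iterations discussed above: Newton's method for
# root-finding and gradient descent for minimization, each run until
# successive iterates stop changing.

def newton(f, f_prime, x0, tol=1e-12, max_iter=100):
    """Iterate x <- x - f(x)/f'(x); a fixed point satisfies f(x) = 0."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / f_prime(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def gradient_descent(grad, x0, alpha=0.1, tol=1e-12, max_iter=10_000):
    """Iterate x_i <- x_i - alpha * df/dx_i; a fixed point is stationary."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x

# Root of f(x) = x^2 - 2: Newton's limit point solves f(x) = 0.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)

# Minimum of f(x, y) = (x - 1)^2 + (y + 2)^2: the gradient vanishes at (1, -2).
argmin = gradient_descent(lambda x: [2 * (x[0] - 1), 2 * (x[1] + 2)], [0.0, 0.0])
```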

## 3. Dimensionality of Spatial Origin

#### 3.1. Not-Well-Known Consequences of the Central Limit Theorem

#### 3.1.1. Why Are Many Things in the World Discrete?

#### 3.1.2. This Puzzling Discreteness Has Been Observed before

- when we assume that the reconstructed signal is monotonic, the reconstructed function is often (piece-wise) constant;
- if we additionally assume that the signal is once differentiable, the result is usually once differentiable, but rarely twice differentiable, etc.

#### 3.1.3. Tsirelson’s Explanation

- the fraction of the points that are the closest to one of the sides tends to 0, while
- the fraction of the points $\overrightarrow{o}$ for which the closest point from the set A is one of A’s vertices tends to 1.

#### 3.1.4. Methodological Consequence

#### 3.1.5. Resulting Discreteness Is Only Approximate

#### 3.2. Uncertainty Quantification and Probabilistic Limit Theorems—Including Theorems beyond Normal Distributions

#### 3.2.1. Need for Data Processing

#### 3.2.2. Need for Uncertainty Quantification

#### 3.2.3. Possibility of Linearization

#### 3.2.4. Here, the Central Limit Theorem Can Help

#### 3.2.5. How Can We Actually Estimate $\sigma $?

#### 3.2.6. So What?

#### 3.2.7. Need for Interval Uncertainty

#### 3.2.8. Is the Corresponding Distribution Gaussian?

#### 3.2.9. Uncertainty Quantification: Case of Interval Uncertainty

- for ${c}_{i}>0$, when the value $\Delta {x}_{i}$ is the largest, i.e., when $\Delta {x}_{i}={\Delta}_{i}$, and
- for ${c}_{i}<0$, when the value $\Delta {x}_{i}$ is the smallest, i.e., when $\Delta {x}_{i}=-{\Delta}_{i}$.
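In the linearized case, the two observations above lead to the worst-case bound $\Delta = \sum_i |c_i| \cdot \Delta_i$; a minimal sketch (the sensitivities and bounds are made-up illustrative values):

```python
# Worst-case bound for a linearized result Delta_y = sum_i c_i * Delta_x_i,
# with each Delta_x_i in [-Delta_i, Delta_i]. Per the two cases above, the
# maximum is attained at Delta_x_i = sign(c_i) * Delta_i, so the largest
# possible |Delta_y| is sum_i |c_i| * Delta_i.

def interval_error_bound(c, delta):
    """Largest possible |sum_i c_i * dx_i| over all |dx_i| <= delta_i."""
    return sum(abs(ci) * di for ci, di in zip(c, delta))

# Illustrative (made-up) sensitivities c_i and bounds Delta_i:
c = [2.0, -0.5, 1.5]
delta = [0.1, 0.4, 0.2]
bound = interval_error_bound(c, delta)  # 2*0.1 + 0.5*0.4 + 1.5*0.2 = 0.7
```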

#### 3.2.10. How to Estimate Uncertainty in the Interval Case

#### 3.2.11. Another Limit Distribution Comes to the Rescue

- the actual measurement error is always located inside the interval $[-\Delta ,\Delta ]$, while
- the Cauchy-distributed random variable has a non-zero probability to be anywhere, in particular, outside the interval.
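The resulting Cauchy-deviate Monte-Carlo technique can be sketched as follows; here the data-processing algorithm f is treated as a black box, and the scale of the (approximately) Cauchy-distributed results is recovered by maximum likelihood. The specific f and the bounds $\Delta_i$ are illustrative assumptions, not taken from the paper:

```python
# Sketch of the Cauchy-deviate technique for interval uncertainty:
# simulate inputs with Cauchy-distributed errors of scales Delta_i;
# in the linear approximation, f(x + dx) - f(x) is then Cauchy with
# scale Delta = sum_i |c_i| * Delta_i, the desired worst-case bound.
import math
import random

def cauchy_deviate_bound(f, x, delta, n_samples=20000, seed=1):
    rng = random.Random(seed)
    y0 = f(x)
    dys = []
    for _ in range(n_samples):
        # Cauchy(0, Delta_i) sample via the inverse-CDF transform:
        dx = [d * math.tan(math.pi * (rng.random() - 0.5)) for d in delta]
        dys.append(f([xi + dxi for xi, dxi in zip(x, dx)]) - y0)
    # Maximum-likelihood estimate of the Cauchy scale: solve
    # sum_k 1 / (1 + (dy_k / Delta)^2) = n/2 by bisection (the left-hand
    # side increases monotonically in Delta).
    def g(s):
        return sum(1.0 / (1.0 + (dy / s) ** 2) for dy in dys)
    lo, hi = 1e-12, max(abs(dy) for dy in dys)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(mid) < n_samples / 2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For a linear f (made-up coefficients), the estimate should approach
# the exact bound sum_i |c_i| * Delta_i = 0.7:
f = lambda x: 2.0 * x[0] - 0.5 * x[1] + 1.5 * x[2]
est = cauchy_deviate_bound(f, [1.0, 1.0, 1.0], [0.1, 0.4, 0.2])
```

Note that, as the two bullets above warn, the simulated inputs can fall far outside the intervals; this does not matter here because only the linearized behavior of f enters the estimate.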

#### 3.3. What If We Have No Information about Probabilities

#### 3.3.1. Formulation of the Problem

#### 3.3.2. 1-D Case

- every point from the set X is $\epsilon $-close to some point from the interval I; and
- every point from the interval I is $\epsilon $-close to some point from the set X.
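This two-sided $\epsilon$-closeness is exactly the condition that the Hausdorff distance between X and I is at most $\epsilon$; for a finite set X and an interval I = [a, b] it can be computed directly (a small sketch with made-up numbers):

```python
# Hausdorff distance between a finite set X and an interval I = [a, b]:
# X and I are eps-close in both directions above iff this distance <= eps.

def hausdorff_set_interval(xs, a, b):
    xs = sorted(xs)
    # Direction 1: farthest any point of X is from the interval I.
    d1 = max(max(a - x, x - b, 0.0) for x in xs)
    # Direction 2: farthest any point of I is from the set X. The distance
    # t -> min_x |t - x| is piecewise linear, so its maximum over [a, b] is
    # attained at an endpoint or at a midpoint of consecutive points of X.
    def dist_to_X(t):
        return min(abs(t - x) for x in xs)
    cands = [a, b] + [(xs[i] + xs[i + 1]) / 2 for i in range(len(xs) - 1)
                      if a <= (xs[i] + xs[i + 1]) / 2 <= b]
    d2 = max(dist_to_X(t) for t in cands)
    return max(d1, d2)

# Example with made-up numbers: X = {0, 0.4, 1}, I = [0, 1]:
# the point of I farthest from X is 0.7, at distance 0.3.
h = hausdorff_set_interval([0.0, 0.4, 1.0], 0.0, 1.0)
```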

**Theorem 1.**

**Theorem 2.**

#### 3.3.3. General Case

- The inclusion $X\subseteq {X}^{\left(1\right)}+\dots +{X}^{\left(N\right)}$ follows from the fact that each element x can be represented as the sum $x={N}^{-1}\cdot x+\dots +{N}^{-1}\cdot x$; and
- the opposite inclusion ${X}^{\left(1\right)}+\dots +{X}^{\left(N\right)}\subseteq X$ follows from the fact that the set X is convex and thus, once the elements ${x}^{\left(1\right)},\dots ,{x}^{\left(N\right)}$ belong to this set, their convex combination ${N}^{-1}\cdot {x}^{\left(1\right)}+\dots +{N}^{-1}\cdot {x}^{\left(N\right)}$ also belongs to X.
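For intervals, which are exactly the convex sets of the 1-D case, this decomposition $X = X^{(1)} + \dots + X^{(N)}$ with $X^{(i)} = N^{-1} \cdot X$ is easy to check numerically (a toy sketch; the interval and N are arbitrary illustrative choices):

```python
# Toy check of the decomposition of a convex set into a Minkowski sum of
# N scaled copies, for an interval X = [a, b]: each piece is
# X_i = (1/N) * X = [a/N, b/N], and Minkowski summation of intervals
# simply adds the corresponding endpoints.

def minkowski_sum(p, q):
    """Minkowski sum of two intervals given as (lo, hi) pairs."""
    return (p[0] + q[0], p[1] + q[1])

def scale(p, t):
    """Scale an interval by a positive factor t."""
    return (t * p[0], t * p[1])

X = (2.0, 5.0)
N = 10
pieces = [scale(X, 1.0 / N) for _ in range(N)]
total = pieces[0]
for piece in pieces[1:]:
    total = minkowski_sum(total, piece)
# total recovers X (up to floating-point rounding)
```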

**Theorem 3.**

**Theorem 4.**

#### 3.4. Important Open Questions

#### 3.4.1. What if We Only Have Partial Information About Probabilities?

#### 3.4.2. Possible Approach and Natural Generalizations of the Central Limit Theorem

#### 3.4.3. What if We Are Interested in the Extreme Case?

## 4. Dimensionality of Temporal Origin

#### 4.1. Case Study

#### 4.2. Let Us Formulate This Idea in Precise Terms

- A sequence ${f}_{i}$ whose compositions tend to some function $f\in F$; and
- a sequence ${g}_{i}$ whose compositions tend to some function $g\in F$,

- then, in the case when we first apply all the ${f}_{i}$-transformations and then all the ${g}_{i}$-transformations, the resulting limit function $g(f(x))$ should also belong to the family F. Thus, the desired family F of all possible limit functions should be closed under composition.
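As a toy illustration of such closure under composition (not the paper's actual family of limit transformations): affine maps $f(x) = a \cdot x + b$ form a family that is closed under composition, so compositions of its members, and limits of such compositions, never leave the family.

```python
# Toy illustration of a family closed under composition: affine maps
# f(x) = a*x + b, represented by their coefficient pairs (a, b).

def compose_affine(g, f):
    """Coefficients of g(f(x)) for g = (a2, b2), f = (a1, b1):
    g(f(x)) = a2*(a1*x + b1) + b2 = (a2*a1)*x + (a2*b1 + b2)."""
    a1, b1 = f
    a2, b2 = g
    return (a2 * a1, a2 * b1 + b2)

f = (2.0, 1.0)    # f(x) = 2x + 1
g = (0.5, -3.0)   # g(x) = x/2 - 3
h = compose_affine(g, f)  # g(f(x)) = x - 2.5, still affine
```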

#### 4.3. Enter Norbert Wiener

- When the object is very far, all we see is a formless blur—in other words, objects obtained from one another by arbitrary smooth transformations cannot be distinguished.
- When the object gets closer, we can detect whether it is smooth or has sharp angles. We may see a circle as an ellipse, or a square as a rhombus (diamond). At this stage, images obtained by a projective transformation are indistinguishable.
- When the object gets even closer, we can detect which lines are parallel, but we may not yet detect the angles. For example, we are not sure whether what we see is a rectangle or a parallelogram. This stage corresponds to affine transformation.
- Then, we have a stage of similarity transformations—when we detect the shape, but cannot yet detect its size.
- Finally, when the object is close enough, we can detect both its shape and its size.

#### 4.4. So, What Are the Limit Transformations?

#### 4.5. A Similar Conclusion Can Be Made about All Possible Reasonable Transformations

#### 4.6. What Are the Implications for Neural Networks?

## 5. Conclusions

- For the corresponding transformations—as shown, e.g., by the description of all possible limit and/or reasonable transformations, and by the resulting theoretical explanation of the efficiency of sigmoid activation functions;
- for the system’s uncertainty—as shown, e.g., by the use of limit distributions such as normal and Cauchy to make uncertainty quantification more efficient, and by the use of limit theorems to explain the ubiquity of interval uncertainty; and
- for the desired result of the system’s analysis—as shown, e.g., by a limit-theorem-based explanation of why it is usually possible to meaningfully classify objects into a small finite number of classes.

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Kainen, P.C. Utilizing geometric anomalies of high dimension: When complexity makes computations easier. In Computer-Intensive Methods in Control and Signal Processing; Warwick, K., Kárný, M.M., Eds.; Springer: New York, NY, USA, 1997; pp. 283–294. [Google Scholar]
- Sheskin, D.J. Handbook of Parametric and Non-Parametric Statistical Procedures; Chapman & Hall/CRC: London, UK, 2011. [Google Scholar]
- Donoho, D.L. High-dimensional data analysis: The curses and blessings of dimensionality. In Proceedings of the American Mathematical Society Conference on Math Challenges of the 21st Century, Los Angeles, CA, USA, 6–12 August 2000. [Google Scholar]
- Gorban, A.N.; Tyukin, I.Y.; Romanenko, I. The blessing of dimensionality: Separation theorems in the thermodynamic limit. IFAC-PapersOnLine **2016**, 49, 64–69. [Google Scholar] [CrossRef]
- Gorban, A.N.; Golubkov, V.; Grechuk, B.; Mirkes, E.M.; Tyukin, I.Y. Correction of AI systems by linear discriminants: Probabilistic foundations. Inf. Sci. **2018**, 466, 303–322. [Google Scholar] [CrossRef] [Green Version]
- Gorban, A.N.; Tyukin, I.Y. Blessing of dimensionality: Mathematical foundations of the statistical physics of data. Phil. Trans. R. Soc. A **2018**, 376, 20170237. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Vershynin, R. High-Dimensional Probability: An Introduction with Applications in Data Science; Cambridge University Press: Cambridge, UK, 2018. [Google Scholar]
- Gorban, A.N.; Makarov, V.A.; Tyukin, I.Y. The unreasonable effectiveness of small neural ensembles in high-dimensional brain. Phys. Life Rev. **2019**, 29, 55–88. [Google Scholar] [CrossRef]
- Kreinovich, V. The heresy of unheard-of simplicity: Comment on “The unreasonable effectiveness of small neural ensembles in high-dimensional brain” by A. N. Gorban, V. A. Makarov, and I. Y. Tyukin. Phys. Life Rev. **2019**, 29, 93–95. [Google Scholar] [CrossRef]
- Grechuk, B.; Gorban, A.N.; Tyukin, I.Y. General stochastic separation theorems with optimal bounds. Neural Netw. **2021**, 138, 33–56. [Google Scholar] [CrossRef]
- Tyukin, I.Y.; Higham, D.J.; Gorban, A.N. On adversarial examples and stealth attacks in artificial intelligence systems. In Proceedings of the International Joint Conference on Neural Networks IJCNN’2020, Glasgow, UK, 19–24 July 2020; pp. 1–6. [Google Scholar]
- Alexander, D.M.; Jurica, P.; Trengove, C.; Nikolaev, A.R.; Gepshtein, S.; Zvyagintsev, M.; Mathiak, K.; Schulze-Bonhage, A.; Ruescher, J.; Ball, T.; et al. Traveling waves and trial averaging: The nature of single-trial and averaged brain responses in large-scale cortical signals. Neuroimage **2013**, 73, 95–112. [Google Scholar] [CrossRef] [Green Version]
- Alexander, D.M.; Trengove, C.; van Leeuwen, C. Donders is dead: Cortical traveling waves and the limits of mental chronometry in cognitive neuroscience. Cogn. Process. **2015**, 16, 365–375. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006. [Google Scholar]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
- Tsirel’son, B.S. A geometrical approach to maximum likelihood estimation for infinite-dimensional Gaussian location. Theory Probab. Its Appl. **1982**, 27, 411–418. [Google Scholar] [CrossRef]
- Nguyen, H.T.; Wu, B.; Kreinovich, V. Our reasoning is clearly fuzzy, so why is crisp logic so often adequate? Int. J. Intell. Technol. Appl. Stat. (IJITAS) **2015**, 8, 133–137. [Google Scholar]
- Einstein, A. Collected Papers of Albert Einstein; Princeton University Press: Princeton, NJ, USA, 2009. [Google Scholar]
- Schilpp, P.A. Albert Einstein: Philosopher-Scientist; MJF Books: New York, NY, USA, 2001. [Google Scholar]
- Kumar, M. Quantum: Einstein, Bohr, and the Great Debate about the Nature of Reality; W. W. Norton & Company: New York, NY, USA, 2011. [Google Scholar]
- Rabinovich, S.G. Measurement Errors and Uncertainties: Theory and Practice; Springer: New York, NY, USA, 2005. [Google Scholar]
- Feynman, R.; Leighton, R.; Sands, M. The Feynman Lectures on Physics; Addison Wesley: Boston, MA, USA, 2005. [Google Scholar]
- Moore, R.E.; Kearfott, R.B.; Cloud, M.J. Introduction to Interval Analysis; SIAM: Philadelphia, PA, USA, 2009. [Google Scholar]
- Mayer, G. Interval Analysis and Automatic Result Verification; de Gruyter: Berlin, Germany, 2017. [Google Scholar]
- Kreinovich, V.; Lakeyev, A.; Rohn, J.; Kahl, P. Computational Complexity and Feasibility of Data Processing and Interval Computations; Kluwer: Dordrecht, The Netherlands, 1998. [Google Scholar]
- Kreinovich, V.; Ferson, S. A new Cauchy-based black-box technique for uncertainty in risk analysis. Reliab. Eng. Syst. Saf. **2004**, 85, 267–279. [Google Scholar] [CrossRef] [Green Version]
- Kreinovich, V. Why intervals? A simple limit theorem that is similar to limit theorems from statistics. Reliab. Comput. **1995**, 1, 33–40. [Google Scholar] [CrossRef]
- Roginskaya, M.M.; Shulman, V.S. On Minkowski sums of many small sets. Funct. Anal. Its Appl. **2018**, 52, 233–235. [Google Scholar] [CrossRef]
- Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: New York, NY, USA, 2006. [Google Scholar]
- Urenda, J.C.; Kosheleva, O.; Kreinovich, V. How to describe measurement errors: A natural generalization of the Central Limit Theorem beyond normal (and other infinitely divisible) distributions. In Advanced Mathematical and Computational Tools in Metrology and Testing XII; Pavese, F., Forbes, A.B., Zhang, N.F., Chunovkina, A.G., Eds.; World Scientific: Singapore, 2021; to appear. [Google Scholar]
- Embrechts, P.; Klüppelberg, C.; Mikosch, T. Modelling Extremal Events for Insurance and Finance; Springer: Berlin/Heidelberg, Germany, 1997. [Google Scholar]
- Kotz, S.; Nadarajah, S. Extreme Value Distributions: Theory and Applications; Imperial College Press: London, UK, 2000. [Google Scholar]
- Coles, S. An Introduction to Statistical Modeling of Extreme Values; Springer: London, UK, 2001. [Google Scholar]
- Beirlant, J.; Goegebeur, Y.; Segers, J.; Teugels, J. Statistics of Extremes: Theory and Applications; Wiley: New York, NY, USA, 2004. [Google Scholar]
- de Haan, L.; Ferreira, A. Extreme Value Theory: An Introduction; Springer: New York, NY, USA, 2006. [Google Scholar]
- Resnick, S.I. Extreme Values, Regular Variation and Point Processes; Springer: New York, NY, USA, 2008. [Google Scholar]
- Novak, S.Y. Extreme Value Methods with Applications to Finance; Chapman & Hall/CRC Press: London, UK, 2011. [Google Scholar]
- Gumbel, E.J. Statistics of Extremes; Dover: New York, NY, USA, 2013. [Google Scholar]
- Kreinovich, V.; Nguyen, H.T.; Sriboonchitta, S.; Kosheleva, O. Modeling extremal events is not easy: Why the extreme value theorem cannot be as general as the central limit theorem. In Uncertainty Modeling; Kreinovich, V., Ed.; Springer: Cham, Switzerland, 2017; pp. 123–134. [Google Scholar]
- Wiener, N. Cybernetics, or Control and Communication in the Animal and the Machine, 3rd ed.; MIT Press: Cambridge, MA, USA, 1962. [Google Scholar]
- Guillemin, V.M.; Sternberg, S. An algebraic model of transitive differential geometry. Bull. Am. Math. Soc. **1964**, 70, 16–47. [Google Scholar] [CrossRef] [Green Version]
- Singer, I.M.; Sternberg, S. Infinite groups of Lie and Cartan, Part 1. J. D’Analyse Math. **1965**, 15, 1–113. [Google Scholar] [CrossRef]
- Nguyen, H.T.; Kreinovich, V. Applications of Continuous Mathematics to Computer Science; Kluwer: Dordrecht, The Netherlands, 1997. [Google Scholar]
- Zapata, F.; Kosheleva, O.; Kreinovich, V. Wiener’s conjecture about transformation groups helps predict which fuzzy techniques work better. In Proceedings of the 2014 Annual Conference of the North American Fuzzy Information Processing Society NAFIPS’2014, Boston, MA, USA, 24–26 June 2014. [Google Scholar]
- Kreinovich, V.; Quintana, C. Neural networks: What non-linearity to choose? In Proceedings of the 4th University of New Brunswick Artificial Intelligence Workshop, Fredericton, NB, Canada, 15–18 October 1991; pp. 627–637. [Google Scholar]
- Kreinovich, V.; Kosheleva, O. Optimization under uncertainty explains empirical success of deep learning heuristics. In Black Box Optimization, Machine Learning and No-Free Lunch Theorems; Pardalos, P., Rasskazova, V., Vrahatis, M.N., Eds.; Springer: Cham, Switzerland, 2021; pp. 195–220. [Google Scholar]

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kreinovich, V.; Kosheleva, O.
Limit Theorems as Blessing of Dimensionality: Neural-Oriented Overview. *Entropy* **2021**, *23*, 501.
https://doi.org/10.3390/e23050501
