# Recurrence Networks in Natural Languages

## Abstract

## 1. Introduction

## 2. Methods

#### 2.1. Recurrence Networks

#### 2.2. Network Metrics

- Density ($\rho$): The density of a network is defined as $$\rho =\frac{2g}{n(n-1)},$$ where $g$ is the number of edges and $n$ is the number of nodes.
- Closeness centrality (${K}_{c}$): Measures the centrality of a given node $i$ in the network, defined as the number of nodes divided by the sum of the lengths of the shortest paths between the node and all other nodes in the graph [29], $${K}_{c}=\frac{n}{\sum _{j}{d}_{ij}},$$ where ${d}_{ij}$ is the length of the shortest path between nodes $i$ and $j$.
- Clustering coefficient (${C}_{i}$): Measures the degree of transitivity in connectivity among the nearest neighbors of a node $i$ [29]. In recurrence terms, ${C}_{i}$ represents the extent to which the neighbors of a node (pattern) $i$ are also recurrent among themselves. Specifically, ${C}_{i}$ is given by $${C}_{i}=\frac{2{E}_{i}}{{k}_{i}({k}_{i}-1)},$$ where ${E}_{i}$ is the number of edges among the neighbors of node $i$ and ${k}_{i}$ is its degree.
- Average nearest-neighbor degree (${\overline{k}}_{nn,i}$): This measure gives the mean degree of the neighbors of a given node [30,31,32]. The behavior of this quantity as a function of the node’s degree reveals whether high-degree nodes connect with other high-degree nodes (assortativity) or preferentially connect to low-degree ones (disassortativity) [29]. For unweighted networks, ${\overline{k}}_{nn,i}$ is calculated as $${\overline{k}}_{nn,i}=\frac{1}{{k}_{i}}\sum _{j=1}^{n}{A}_{ij}{k}_{j},$$ where ${A}_{ij}$ are the elements of the adjacency matrix.
- Assortative mixing coefficient by degree (${A}_{r}$): This measure quantifies the tendency of nodes with many connections to be connected to other nodes with many (or few) connections [33]. Formally, the coefficient is given by $${A}_{r}=\frac{{\sum}_{ij}{A}_{ij}({k}_{i}-\mu )({k}_{j}-\mu )}{{\sum}_{ij}{A}_{ij}{({k}_{i}-\mu )}^{2}},$$ where $\mu ={\sum}_{ij}{A}_{ij}{k}_{i}/{\sum}_{ij}{A}_{ij}$ is the mean degree of the node found at the end of an edge.
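The five metrics above can be sketched in plain Python as follows. This is only an illustration, not the implementation used for the results. It assumes an undirected, unweighted, connected graph stored as a dictionary of neighbor sets, takes $g$ as the number of edges and $n$ as the number of nodes, and takes $\mu$ as the mean degree over all edge ends. The function names and the small test graph are hypothetical.

```python
from collections import deque

def density(adj):
    """rho = 2g / (n(n-1)), with g edges and n nodes (assumed meanings)."""
    n = len(adj)
    g = sum(len(nb) for nb in adj.values()) // 2  # each edge is stored twice
    return 2 * g / (n * (n - 1))

def closeness(adj, i):
    """K_c = n / sum_j d_ij; d_ij from breadth-first search (connected graph assumed)."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(adj) / sum(dist.values())

def clustering(adj, i):
    """C_i = 2 E_i / (k_i (k_i - 1)); E_i counts edges among the neighbors of i."""
    nb = adj[i]
    k = len(nb)
    if k < 2:
        return 0.0  # convention chosen here for nodes with fewer than two neighbors
    e = sum(1 for u in nb for v in nb if u < v and v in adj[u])
    return 2 * e / (k * (k - 1))

def knn(adj, i):
    """Average degree of the neighbors of node i (node i must have degree >= 1)."""
    return sum(len(adj[j]) for j in adj[i]) / len(adj[i])

def assortativity(adj):
    """A_r over all ordered pairs (i, j) with A_ij = 1.

    Undefined (division by zero) when every node has the same degree.
    """
    ends = [(len(adj[i]), len(adj[j])) for i in adj for j in adj[i]]
    mu = sum(ki for ki, _ in ends) / len(ends)
    num = sum((ki - mu) * (kj - mu) for ki, kj in ends)
    den = sum((ki - mu) ** 2 for ki, _ in ends)
    return num / den

# Hypothetical test graph: a triangle (0, 1, 2) with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(density(adj), closeness(adj, 2), clustering(adj, 2), knn(adj, 2), assortativity(adj))
```

For recurrence networks with many nodes, a graph library would normally replace these loops; the sketch only makes the definitions concrete.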

## 3. Results

## 4. Discussion and Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Zipf, G.K. The Psycho-Biology of Language: An Introduction to Dynamic Philology; Houghton Mifflin: Boston, MA, USA, 1935.
2. Grzybek, P. History and methodology of word length studies. In Contributions to the Science of Text and Language. Text, Speech and Language Technology; Grzybek, P., Ed.; Springer: Dordrecht, The Netherlands, 2006; Volume 31, pp. 15–90.
3. Piantadosi, S.T.; Tily, H.; Gibson, E. Word lengths are optimized for efficient communication. Proc. Natl. Acad. Sci. USA **2011**, 108, 3526–3529.
4. Solé, R.V.; Corominas-Murtra, B.; Valverde, S.; Steels, L. Language networks: Their structure, function, and evolution. Complexity **2010**, 15, 20–26.
5. Seoane, L.F.; Solé, R. The morphospace of language networks. Sci. Rep. **2018**, 8, 10465.
6. Rêgo, H.H.A.; Braunstein, L.A.; D’Agostino, G.; Stanley, H.E.; Miyazima, S. When a Text Is Translated Does the Complexity of Its Vocabulary Change? Translations and Target Readerships. PLoS ONE **2014**, 9, e110213.
7. Kosmidis, K.; Kalampokis, A.; Argyrakis, P. Language time series analysis. Physica A **2006**, 370, 808–816.
8. Lacasa, L.; Luque, B.; Ballesteros, F.; Luque, J.; Nuño, J.C. From time series to complex networks: The visibility graph. Proc. Natl. Acad. Sci. USA **2008**, 105, 4972–4975.
9. Luque, B.; Lacasa, L.; Ballesteros, F.; Luque, J. Horizontal visibility graphs: Exact results for random time series. Phys. Rev. E **2009**, 80, 046103.
10. Ausloos, M. Generalized Hurst exponent and multifractal function of original and translated texts mapped into frequency and length time series. Phys. Rev. E **2012**, 86, 031108.
11. Donner, R.V.; Zou, Y.; Donges, J.F.; Marwan, N.; Kurths, J. Recurrence networks—a novel paradigm for nonlinear time series analysis. New J. Phys. **2010**, 12, 033025.
12. Rodriguez, E.; Aguilar-Cornejo, M.; Femat, R.; Alvarez-Ramirez, J. Scale and time dependence of serial correlations in word-length time series of written texts. Physica A **2014**, 414, 378–386.
13. Arbesman, S.; Strogatz, S.H.; Vitevitch, M.S. Comparative Analysis of Networks of Phonologically Similar Words in English and Spanish. Entropy **2010**, 12, 327.
14. De Arruda, H.F.; Marinho, V.Q.; Costa, L.d.F.; Amancio, D.R. Paragraph-based representation of texts: A complex networks approach. Inf. Process. Manag. **2019**, 56, 479–494.
15. Susuki, S.; Hirata, Y.; Aihara, K. Definition of distance for marked point process data and its application to recurrence plot-based analysis of exchange tick data of foreign currencies. Int. J. Bifurcat. Chaos **2010**, 20, 3699–3708.
16. Trulla, L.; Giuliani, A.; Zbilut, J.; Webber, C., Jr. Recurrence quantification analysis of the logistic equation with transients. Phys. Lett. A **1996**, 223, 255–260.
17. Marwan, N.; Donges, J.F.; Zou, Y.; Donner, R.V.; Kurths, J. Complex network approach for recurrence analysis of time series. Phys. Lett. A **2009**, 373, 4246–4254.
18. Zou, Y.; Donner, R.V.; Marwan, N.; Donges, J.F.; Kurths, J. Complex network approaches to nonlinear time series analysis. Phys. Rep. **2019**, 787, 1–97.
19. Liu, H.; Cong, J. Language clustering with word co-occurrence networks based on parallel texts. Sci. Bull. **2013**, 58, 1139–1144.
20. Abramov, O.; Mehler, A. Automatic Language Classification by means of Syntactic Dependency Networks. J. Quant. Linguist. **2011**, 18, 291–336.
21. Martinčić-Ipšić, S.; Margan, D.; Meštrović, A. Multilayer network of language: A unified framework for structural analysis of linguistic subsystems. Physica A **2016**, 457, 117–128.
22. Montemurro, M.A.; Zanette, D.H. Universal Entropy of Word Ordering across Linguistic Families. PLoS ONE **2011**, 6, e19875.
23. Altmann, E.G.; Pierrehumbert, J.B.; Motter, A.E. Beyond Word Frequency: Bursts, Lulls, and Scaling in the Temporal Distributions of Words. PLoS ONE **2009**, 4, e7678.
24. Doxas, I.; Dennis, S.; Oliver, W.L. The dimensionality of discourse. Proc. Natl. Acad. Sci. USA **2010**, 107, 4866–4871.
25. Grassberger, P. Generalized dimensions of strange attractors. Phys. Lett. A **1983**, 97, 227–230.
26. Hernández-Gómez, C.; Basurto-Flores, R.; Obregón-Quintana, B.; Guzmán-Vargas, L. Evaluating the Irregularity of Natural Languages. Entropy **2017**, 19, 521.
27. Pincus, S.M. Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. USA **1991**, 88, 2297–2301.
28. Richman, J.S.; Moorman, J.R. Physiological time-series analysis using approximate entropy and sample entropy. Am. J. Physiol. Heart Circ. Physiol. **2000**, 278, H2039–H2049.
29. Newman, M.E.J. Networks: An Introduction; Oxford University Press: Oxford, UK, 2010.
30. Pastor-Satorras, R.; Vázquez, A.; Vespignani, A. Dynamical and Correlation Properties of the Internet. Phys. Rev. Lett. **2001**, 87, 258701.
31. Maslov, S.; Sneppen, K. Specificity and Stability in Topology of Protein Networks. Science **2002**, 296, 910–913.
32. Barrat, A.; Barthélemy, M.; Pastor-Satorras, R.; Vespignani, A. The architecture of complex weighted networks. Proc. Natl. Acad. Sci. USA **2004**, 101, 3747–3752.
33. Newman, M.E.J. Mixing patterns in networks. Phys. Rev. E **2003**, 67, 026126.
34. Available online: https://figshare.com/articles/Recurrence_networks_in_natural_languages/7885376 (accessed on 23 May 2019).
35. Marwan, N.; Wessel, N.; Meyerfeldt, U.; Schirdewan, A.; Kurths, J. Recurrence-plot-based measures of complexity and their application to heart-rate-variability data. Phys. Rev. E **2002**, 66, 026702.
36. Kalimeri, M.; Constantoudis, V.; Papadimitriou, C.; Karamanos, K.; Diakonos, F.K.; Papageorgiou, H. Entropy analysis of word-length series of natural language texts: Effects of text language and genre. Int. J. Bifurcat. Chaos **2012**, 22, 1250223.
37. Kalimeri, M.; Constantoudis, V.; Papadimitriou, C.; Karamanos, K.; Diakonos, F.K.; Papageorgiou, H. Word-length Entropies and Correlations of Natural Language Written Texts. J. Quant. Linguist. **2015**, 22, 101–118.
38. Guzmán-Vargas, L.; Obregón-Quintana, B.; Aguilar-Velázquez, D.; Hernández-Pérez, R.; Liebovitch, L. Word-length correlations and memory in large texts: A visibility network analysis. Entropy **2015**, 17, 7798–7810.
39. Watts, D.J.; Strogatz, S.H. Collective dynamics of ‘small-world’ networks. Nature **1998**, 393, 440–442.
40. Newman, M.E.J. Assortative Mixing in Networks. Phys. Rev. Lett. **2002**, 89, 208701.
41. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. **1936**, 7, 179–188.
42. Vitevitch, M.S. What Can Graph Theory Tell Us About Word Learning and Lexical Retrieval? J. Speech Lang. Hear. Res. **2008**, 51, 408–422.
43. Arbesman, S.; Strogatz, S.H.; Vitevitch, M.S. The structure of phonological networks across multiple languages. Int. J. Bifurcat. Chaos **2010**, 20, 679–685.
44. Chan, K.Y.; Vitevitch, M.S. Network Structure Influences Speech Production. Cogn. Sci. **2010**, 34, 685–697.

**Figure 1.** Log-linear plot of the density $\rho$ vs. the distance $r$ for several values of the pattern length $m$. We show the cases $m=6,7,8,9,10$, with $r$ running from 1 to ${r}_{max}=m-1$. The fit corresponds to the case $m=10$ and yields $d\approx 1.3$.

**Figure 2.** Representative metrics of recurrence-pattern networks for different languages. (**a**) Density for languages grouped by linguistic families. (**b**) Average clustering coefficient $\mathcal{C}$. (**c**) Closeness centrality. (**d**) Assortativity coefficient. Vertical bars indicate the standard deviation of the data.

**Figure 3.** Mean nearest-neighbor connectivity as a function of the degree for the (**a**) Germanic, (**b**) Romance, (**c**) Slavic, and (**d**) Uralic linguistic families. For each language, we also show the values of ${k}_{nn}$ corresponding to shuffled texts. In all cases, a scaling behavior of the form ${k}_{nn}\sim {k}^{\delta}$ is observed. We estimate the scaling exponent for degree values $10<k<500$, yielding the average values $\overline{\delta}\approx 0.49$ and $\overline{{\delta}_{r}}\approx 0.47$ for the original and random data, respectively. As a guide for the eye, the dashed line has slope $0.5$.
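One common way to estimate a scaling exponent such as $\delta$ in ${k}_{nn}\sim {k}^{\delta}$ is an ordinary least-squares fit of $\log {k}_{nn}$ against $\log k$ over the chosen degree range. The sketch below assumes that approach (the fitting procedure is not specified here) and checks it on synthetic data only. The function name and the synthetic series are hypothetical.

```python
import math

def scaling_exponent(ks, knns, k_min=10, k_max=500):
    """Least-squares slope of log(k_nn) vs. log(k) for k_min < k < k_max.

    Assumes a pure power law k_nn ~ k**delta within the window.
    """
    pts = [(math.log(k), math.log(y))
           for k, y in zip(ks, knns)
           if k_min < k < k_max and y > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx

# Synthetic data (not taken from any text): an exact power law with exponent 0.5.
ks = list(range(1, 1000))
knns = [3.0 * k ** 0.5 for k in ks]
delta = scaling_exponent(ks, knns)
print(round(delta, 6))
```

On real degree data, logarithmic binning of $k$ before fitting is often preferred, because the many low-degree points otherwise dominate the fit.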

**Figure 4.** Results of the classification analysis applied to European languages. (**a**) Results of the linear discriminant method. We show the projection of density values from pattern lengths $m=5,6,7$. For each value of $m$ and for each language, we considered ten segments of length ${10}^{4}$ to obtain ten $\rho$ values. Languages were then labeled according to the linguistic family to which they belong (Romance, Germanic, Slavic, Uralic). (**b**) Results of applying the k-nearest-neighbor classification method to the data in panel (**a**), assigning the same label to languages of the same family. We used $k=20$ neighbors in the classifier. The families are segregated, except for the Uralic family, which leads to two disjoint regions. (**c**) Confusion matrix. The classifier clearly distinguishes almost all language families, except Uralic, which it has trouble distinguishing from Slavic and Romance.

**Table 1.** Symmetric recurrence matrix for the beginning of Hamlet’s famous soliloquy: To-be-or-not-to-be. Here $N=18$ and we set $m=3$. The resulting matrix has 16 rows and columns.

$\mathit{r}=2$ | To_ | o_b | _be | be_ | e_o | _or | or_ | r_n | _no | not | ot_ | t_t | _to | to_ | o_b | _be |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
To_ | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
o_b | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
_be | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
be_ | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
e_o | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
_or | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
or_ | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 |
r_n | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
_no | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
not | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
ot_ | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 |
t_t | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
_to | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
to_ | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
o_b | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
_be | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
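The construction behind a matrix like Table 1 can be sketched as follows. The distance between patterns is not defined in the text above, so this sketch makes three assumptions: the distance is the Hamming distance (number of differing positions), two patterns are recurrent when that distance is at most $r$, and letter case is ignored. The function names are hypothetical, and underscores stand for spaces as in the table.

```python
def patterns(text, m):
    """All overlapping substrings of length m; a text of N symbols yields N - m + 1."""
    return [text[i:i + m] for i in range(len(text) - m + 1)]

def hamming(a, b):
    """Number of positions at which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def recurrence_matrix(text, m, r):
    """Binary matrix with entry 1 when two patterns differ in at most r positions.

    Assumptions: Hamming distance, threshold '<= r', case-insensitive comparison.
    """
    pats = patterns(text.lower(), m)
    return [[1 if hamming(p, q) <= r else 0 for q in pats] for p in pats]

text = "To_be_or_not_to_be"  # N = 18 symbols
R = recurrence_matrix(text, m=3, r=2)
print(len(R), R[0])
```

Under these assumptions the matrix has 16 rows and columns, and its first row coincides with the first row of Table 1. Interpreting the nonzero off-diagonal entries as edges between patterns gives the adjacency matrix of the corresponding recurrence network.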

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Baeza-Blancas, E.; Obregón-Quintana, B.; Hernández-Gómez, C.; Gómez-Meléndez, D.; Aguilar-Velázquez, D.; Liebovitch, L.S.; Guzmán-Vargas, L.
Recurrence Networks in Natural Languages. *Entropy* **2019**, *21*, 517.
https://doi.org/10.3390/e21050517
