# A Model for Scale-Free Networks: Application to Twitter


## Abstract


## 1. Introduction

## 2. Complex Networks

- Degree distribution: In an undirected graph $G=(V,E)$, the degree of a node ${v}_{i}$ is the number of edges that connect it to other nodes (excluding self-links), i.e.,$$\deg {v}_{i}=\left|\{{e}_{ij}\in E:j\ne i\}\right|.$$The degree distribution $P\left(k\right)$ is the fraction of nodes that have degree $k$:$$P\left(k\right)=\frac{\left|\{{v}_{i}\in V:\deg {v}_{i}=k\}\right|}{\left|V\right|}.$$In Erdős–Rényi random networks, it has been shown [18] that $P\left(k\right)$ follows a Poisson distribution whose peak is located at $\langle k\rangle $ ($\langle \cdot \rangle $ denotes the expectation value). However, in many real-world networks, $P\left(k\right)$ follows a power-law distribution, that is,$$P\left(k\right)\sim {k}^{-\lambda}.$$Other functional dependencies on $k$ are also found in the degree distributions of real-world networks. Truncated power laws, exponential functions and Gaussian distributions are some of the non-power-law forms that arise in many situations [20]. In the case of directed graphs, there are in- and out-degrees (referring to incoming and outgoing edges, respectively), each with its own degree distribution, and the two distributions may follow different scaling laws.
- Clustering coefficient: This coefficient measures the tendency of a graph to form clusters of highly-related (i.e., connected) nodes, and it can be defined at a local and at a global level. For a particular node (local level), its clustering coefficient represents the degree of connection among its neighbors. The set of neighbors ${N}_{i}$ of a given node ${v}_{i}$ in a directed graph is defined as the set of nodes that are bidirectionally connected to ${v}_{i}$ (i.e., there is an edge from ${v}_{i}$ to the neighbor and also an edge from the same neighbor to ${v}_{i}$):$${N}_{i}=\{{v}_{j}\in V:{e}_{ij}\in E,{e}_{ji}\in E\}.$$The local clustering coefficient ${C}_{i}$ is the fraction of the possible directed links among these neighbors that actually exist:$${C}_{i}=\frac{\left|\{{e}_{jk}\in E:{v}_{j},{v}_{k}\in {N}_{i}\}\right|}{\left|{N}_{i}\right|\left(\left|{N}_{i}\right|-1\right)}.$$At the global level, the clustering coefficient of the graph is the average of the local coefficients over its $n$ nodes:$$C=\frac{1}{n}\sum _{i=1}^{n}{C}_{i}.$$For comparison, the clustering coefficient of a random network with $n$ nodes and average degree $\langle k\rangle $ is$${C}_{rand}=\frac{\langle k\rangle}{n}.$$
- Average path length: This concept is defined as the average distance between any two nodes of the network, assuming that all nodes of the graph are connected to each other (i.e., for all ${v}_{i},{v}_{j}\in V$, there is a sequence of nodes $\{{v}_{i},{v}_{{a}_{1}},{v}_{{a}_{2}},\dots ,{v}_{{a}_{l}},{v}_{j}\}$, called a path from ${v}_{i}$ to ${v}_{j}$ of length $l\ge 0$, such that $\{{e}_{i{a}_{1}},{e}_{{a}_{1}{a}_{2}},\dots ,{e}_{{a}_{l}j}\}\subset E$). Otherwise, one discards all isolated nodes and works with the resulting connected component(s) of the graph. The distance $d({v}_{i},{v}_{j})$ from node ${v}_{i}$ to node ${v}_{j}$ in the same connected component is then the length of the shortest path joining ${v}_{i}$ to ${v}_{j}$. Note that, in the case of directed graphs, $d({v}_{i},{v}_{j})\ne d({v}_{j},{v}_{i})$ in general. The average path length ${A}_{length}$ is calculated according to:$${A}_{length}=\frac{1}{n(n-1)}\sum _{i\ne j}d({v}_{i},{v}_{j}).$$Real-world networks usually fall into the category of small-world networks, which are characterized by an average path length that is very small compared to the size of the network. This property is not unique to them, however, since it can also be found in random networks [21].
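
The clustering coefficient and average path length defined above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: the graph is assumed to be a dictionary `adj` mapping every node to the set of nodes it points to, the helper names are hypothetical, and ${C}_{i}$ is taken to be zero for nodes with fewer than two neighbors (a convention the text does not state).

```python
from collections import deque


def neighbors(adj, i):
    """Nodes bidirectionally connected to i (the set N_i in the text)."""
    return {j for j in adj.get(i, ()) if j != i and i in adj.get(j, ())}


def local_clustering(adj, i):
    """C_i = |{e_jk in E : v_j, v_k in N_i}| / (|N_i| (|N_i| - 1))."""
    n_i = neighbors(adj, i)
    if len(n_i) < 2:
        return 0.0  # assumed convention: undefined ratio is treated as 0
    links = sum(1 for j in n_i for k in adj.get(j, ()) if k in n_i and k != j)
    return links / (len(n_i) * (len(n_i) - 1))


def global_clustering(adj):
    """C = (1/n) * sum of the local coefficients."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


def bfs_distances(adj, source):
    """Shortest directed distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def average_path_length(adj):
    """A_length = sum of d(v_i, v_j) over ordered pairs, divided by n(n-1)."""
    n = len(adj)
    total = 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) != n:
            raise ValueError("every node must be reachable from every other node")
        total += sum(dist.values())
    return total / (n * (n - 1))


# Toy graph: a fully reciprocal triangle a-b-c plus d, reciprocally linked to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(global_clustering(adj))    # (1/3 + 1 + 1 + 0) / 4
print(average_path_length(adj))  # 16 / 12
```

Running one breadth-first search per node costs on the order of $n\cdot(|V|+|E|)$ operations, so this exact computation is only practical for small graphs.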

## 3. Twitter as a Complex Network

- $V=\{{v}_{1},{v}_{2},\dots ,{v}_{n}\}$ is the set of nodes (or vertices) of the graph. Here, each ${v}_{i}$ represents a Twitter user, and n is the number of users. By definition, the number of nodes is the cardinality of the graph (in symbols: $n=\left|G\right|$).
- E is the set of directed edges (or links). An edge ${e}_{ij}$ is an ordered pair of nodes of the form $({v}_{i},{v}_{j})$, meaning that the edge goes from ${v}_{i}$ to ${v}_{j}$. If there exists a relationship between two users ${v}_{i}$ and ${v}_{j}$ in the sense that ${v}_{i}$ follows ${v}_{j}$ (in Twitter terms), then there is a corresponding edge ${e}_{ij}=({v}_{i},{v}_{j})\in E$ in the Twitter graph.
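
As a small illustration of this representation, the sketch below builds $V$ and $E$ from a list of follow relationships and computes the in- and out-degree distributions, normalizing by the number of nodes so that each distribution sums to one. The function names and the toy user names are hypothetical, and users who appear in no follow relationship are simply absent from $V$ in this sketch.

```python
from collections import Counter


def build_graph(follows):
    """follows: iterable of (follower, followee) pairs, i.e. edges (v_i, v_j)."""
    edges = {(u, v) for u, v in follows if u != v}  # self-links are dropped
    nodes = {u for edge in edges for u in edge}
    return nodes, edges


def degree_distributions(nodes, edges):
    """Return (P_in, P_out), each mapping a degree k to the fraction of nodes."""
    out_deg = Counter(u for u, _ in edges)  # number of friends
    in_deg = Counter(v for _, v in edges)   # number of followers
    n = len(nodes)
    p_in = Counter(in_deg.get(v, 0) for v in nodes)
    p_out = Counter(out_deg.get(v, 0) for v in nodes)
    return ({k: c / n for k, c in p_in.items()},
            {k: c / n for k, c in p_out.items()})


follows = [("alice", "bob"), ("carol", "bob"), ("bob", "alice")]
nodes, edges = build_graph(follows)
p_in, p_out = degree_distributions(nodes, edges)
print(p_in)   # one third of the users each have 0, 1 and 2 followers
print(p_out)  # every user follows exactly one other user
```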

#### 3.1. Data Gathering

#### 3.2. Degree Distribution

**Figure 1.** Outgoing degree distribution of Twitter’s network. A few users have an enormous out-degree (number of friends), whereas the majority have at most 1000 friends.

**Figure 2.** Incoming degree distribution of Twitter’s network. A few users have an enormous in-degree (number of followers), whereas the majority have fewer than 100 followers.

#### 3.3. Clustering Coefficient

Random network clustering coefficient | Twitter network actual clustering coefficient | Ratio |
---|---|---|
$3.74\times {10}^{-7}$ | $0.096$ ($9.6\%$) | $256,684.5$ |
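
The figures in the table can be checked with a line of arithmetic. The second expression is only a rough inference: it assumes that the random-network value comes from ${C}_{rand}=\langle k\rangle /n$ and that $n$ is the $51,217,936$ users quoted for the real Twitter network in Table 2.

$$\frac{C}{{C}_{rand}}=\frac{0.096}{3.74\times {10}^{-7}}\approx 2.567\times {10}^{5},\qquad \langle k\rangle =n\,{C}_{rand}\approx 51,217,936\times 3.74\times {10}^{-7}\approx 19.2.$$

Under those assumptions, the observed clustering is more than five orders of magnitude above what a random network with roughly 19 links per user would show.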

#### 3.4. Average Path Length

#### 3.4.1. Method

- For every node that is not in the hub subnetwork, find one path from it to any hub. Call $d({v}_{i},{V}_{h})$ its length, $\langle d({v}_{i},{V}_{h})\rangle $ the average of all distances $d({v}_{i},{V}_{h})$ and ${max}_{toHub}={max}_{{v}_{i}\notin {V}_{h}}d({v}_{i},{V}_{h})$ their maximum.
- For every hub, find one path to any other hub. Call $d({h}_{i},{h}_{j})$ its length, $\langle d({h}_{i},{h}_{j})\rangle $ the average of all distances $d({h}_{i},{h}_{j})$ and ${max}_{withinHub}={max}_{{h}_{i},{h}_{j}\in {V}_{h}}d({h}_{i},{h}_{j})$ their maximum.
- For every node that is not in the hub subnetwork, find one path from any hub to that node. Call $d({V}_{h},{v}_{i})$ its length, $\langle d({V}_{h},{v}_{i})\rangle $ the average of all distances $d({V}_{h},{v}_{i})$ and ${max}_{fromHub}={max}_{{v}_{i}\notin {V}_{h}}d({V}_{h},{v}_{i})$ their maximum.

- It is always possible to find a path from any node to the hub subnetwork.
- There is a path between any two nodes within the hub subnetwork.
- It is always possible to go from any node of the hub subnetwork to any other node in the graph.
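
The three measurements above can be sketched with breadth-first searches. Note the substitution: the method only asks for one path per node, whereas this sketch uses BFS and therefore returns shortest paths, so it is an illustration under that assumption and not the authors' procedure. The graph is assumed to be a dictionary `adj` from each node to the set of nodes it follows, and all function and key names are hypothetical.

```python
from collections import deque


def multi_source_bfs(adj, sources):
    """Hop count from the nearest source to every reachable node."""
    dist = {s: 0 for s in sources}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def reverse_graph(adj):
    """Same nodes, every edge flipped."""
    rev = {u: set() for u in adj}
    for u, outs in adj.items():
        for w in outs:
            rev.setdefault(w, set()).add(u)
    return rev


def summarize(lengths):
    """Return (maximum, average); raises on an empty collection."""
    lengths = list(lengths)
    return max(lengths), sum(lengths) / len(lengths)


def hub_statistics(adj, hubs):
    hubs = set(hubs)
    # d(v_i, V_h): search backwards from the hubs along reversed edges.
    to_hub = multi_source_bfs(reverse_graph(adj), hubs)
    # d(V_h, v_i): search forwards from the hubs.
    from_hub = multi_source_bfs(adj, hubs)
    # d(h_i, h_j): one search per hub, keeping only distances to other hubs.
    within = [d for h in hubs
              for g, d in multi_source_bfs(adj, [h]).items()
              if g in hubs and g != h]
    max_to, avg_to = summarize(d for v, d in to_hub.items() if v not in hubs)
    max_in, avg_in = summarize(within)
    max_from, avg_from = summarize(d for v, d in from_hub.items() if v not in hubs)
    return {"max_to_hub": max_to, "avg_to_hub": avg_to,
            "max_within_hub": max_in, "avg_within_hub": avg_in,
            "max_from_hub": max_from, "avg_from_hub": avg_from}


# Toy graph: hubs h1 and h2 follow each other; a follows h1; h2 -> b -> c.
adj = {"h1": {"h2"}, "h2": {"h1", "b"}, "a": {"h1"}, "b": {"c"}, "c": set()}
stats = hub_statistics(adj, {"h1", "h2"})
print(stats)
```

In the toy graph the three maxima are 1, 1 and 2, and the longest shortest path that exists (from `a` to `c`) has length 4, which equals their sum.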

#### 3.4.2. Results

- ${max}_{toHub}$: The maximum length of all of the paths found between any node and the hub subnetwork is$${max}_{toHub}=45,$$and their average length is$$\langle d({v}_{i},{V}_{h})\rangle =1.79.$$The distribution of path lengths is shown in Figure 4, where it can be checked that most of them have a length of three or less. Moreover, a total of $42,213,921$ paths (each with a different initial node) were found. Since there are $43,027,729$ users with at least one friend, the search for a path from a node outside the hub subnetwork to a hub was successful in $98.10\%$ of the cases.

  No. of Paths | Path Length |
  ---|---|
  … | … |
  10 | 9 |
  51 | 8 |
  387 | 7 |
  3,789 | 6 |
  41,337 | 5 |
  461,414 | 4 |
  4,998,876 | 3 |
  21,755,660 | 2 |
  14,951,325 | 1 |
  1,000 | 0 |

  **Figure 4.** Path length distributions ending at a node of the hub network. The few paths with a length greater than or equal to 10 have been omitted in the table.
- ${max}_{withinHub}$: The maximum length of all of the paths found from one hub to another is$${max}_{withinHub}=3,$$and their average length is$$\langle d({h}_{i},{h}_{j})\rangle =1.30.$$Figure 5 shows the distribution of path lengths in the hub subnetwork.

  No. of Paths | Path Length |
  ---|---|
  35 | 3 |
  299,884 | 2 |
  699,081 | 1 |
  1,000 | 0 |
- ${max}_{fromHub}$: The maximum length of all of the paths found from the hub subnetwork to any other node is$${max}_{fromHub}=12,$$and their average length is$$\langle d({V}_{h},{v}_{i})\rangle =2.19.$$The distribution of path lengths is shown in Figure 6, where it can be seen that most of them have a length of three or less. Moreover, the total count of paths (each with a different final node) was $48,044,814$. Since there are $48,192,718$ users with at least one follower, the search for a path from the hub subnetwork to a node outside it was successful in $99.69\%$ of the cases.

  No. of Paths | Path Length |
  ---|---|
  1 | 12 |
  1 | 11 |
  1 | 10 |
  2 | 9 |
  7 | 8 |
  83 | 7 |
  1,033 | 6 |
  18,578 | 5 |
  330,156 | 4 |
  14,467,363 | 3 |
  27,089,900 | 2 |
  6,136,689 | 1 |
  1,000 | 0 |
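
The reported averages can be reproduced from the tabulated counts, which is a useful sanity check on the transcription. The short sketch below does this for the two distributions that are listed in full (within the hub subnetwork and from the hub subnetwork). The dictionary and function names are illustrative only.

```python
def mean_length(table):
    """table maps a path length to the number of paths of that length."""
    total_paths = sum(table.values())
    return sum(length * count for length, count in table.items()) / total_paths


# Counts copied from the tables above (path length -> number of paths).
within_hub = {3: 35, 2: 299_884, 1: 699_081, 0: 1_000}
from_hub = {12: 1, 11: 1, 10: 1, 9: 2, 8: 7, 7: 83, 6: 1_033, 5: 18_578,
            4: 330_156, 3: 14_467_363, 2: 27_089_900, 1: 6_136_689, 0: 1_000}

print(sum(within_hub.values()))           # 1000000 hub pairs
print(round(mean_length(within_hub), 2))  # 1.3
print(sum(from_hub.values()))             # 48044814
print(round(mean_length(from_hub), 2))    # 2.19
```

The within-hub total of one million paths, with 1,000 of length zero, is consistent with a hub subnetwork of 1,000 nodes in which every ordered pair was connected.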

#### 3.4.3. Resulting Upper Bounds
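
The following is a sketch of how the figures from Section 3.4.2 combine, written under the three assumptions of the method and not necessarily in the form the authors state it. A path from any node ${v}_{i}$ to any node ${v}_{j}$ can be built by going from ${v}_{i}$ to a hub, then from that hub to a second hub, and finally from the second hub to ${v}_{j}$, so whenever the three partial paths exist:

$$d({v}_{i},{v}_{j})\le {max}_{toHub}+{max}_{withinHub}+{max}_{fromHub}=45+3+12=60.$$

Adding the three average lengths in the same way gives a rough estimate for the average path length. This step additionally assumes that the hubs reached in the first and last legs are representative of all hub pairs:

$${A}_{length}\lesssim 1.79+1.30+2.19=5.28.$$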

## 4. The Model

**Figure 7.** Growth process in the model: (i) creation of a new node that links to an existing node; (ii) creation of a new node that is the target of a link from an existing node; (iii) creation of a link between existing nodes. In (i) and (ii), the new node is a filled black circle, while in (i), (ii) and (iii), the new link is a dashed line.

- with probability p, a new node is created and attaches to an existing node by a directed link,
- with probability q, a new node is created and an existing node attaches to it by a directed link and
- with probability r, a directed link is created between two existing nodes.

- (i) the attachment rate $A(i,j)$, defined as the probability that a newly-introduced node links to an existing node with i incoming and j outgoing links,
- (ii) the attachment rate $B(i,j)$, defined as the probability that a newly-introduced node is linked by an existing node with i incoming and j outgoing links,
- (iii) the creation rate $C({i}_{1},{j}_{1}|{i}_{2},{j}_{2})$, defined as the probability of adding a new link from an $({i}_{1},{j}_{1})$ node to an $({i}_{2},{j}_{2})$ node.
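
The growth process can be sketched as a short simulation. The functional forms of $A$, $B$ and $C$ are not given in this excerpt, so the code below assumes simple preferential forms purely for illustration: targets are chosen with weight in-degree plus one, sources with weight out-degree plus one, and $C$ factorizes into those two choices. The two-node seed graph and all function names are likewise assumptions.

```python
import random


def pick(rng, weights):
    """Index drawn with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights)[0]


def grow_network(steps, p, q, r, seed=0):
    """Simulate `steps` growth events; returns (in_deg, out_deg, edges)."""
    if abs(p + q + r - 1.0) > 1e-9:
        raise ValueError("p + q + r must equal 1")
    rng = random.Random(seed)
    # Assumed seed graph: two nodes that link to each other.
    in_deg, out_deg = [1, 1], [1, 1]
    edges = {(0, 1), (1, 0)}
    for _ in range(steps):
        x = rng.random()
        new = len(in_deg)
        if x < p:
            # (i) new node links to an existing node (assumed A ~ in-degree + 1)
            target = pick(rng, [d + 1 for d in in_deg])
            in_deg.append(0)
            out_deg.append(1)
            in_deg[target] += 1
            edges.add((new, target))
        elif x < p + q:
            # (ii) existing node links to the new node (assumed B ~ out-degree + 1)
            source = pick(rng, [d + 1 for d in out_deg])
            in_deg.append(1)
            out_deg.append(0)
            out_deg[source] += 1
            edges.add((source, new))
        else:
            # (iii) new link between existing nodes (assumed factorized C)
            source = pick(rng, [d + 1 for d in out_deg])
            target = pick(rng, [d + 1 for d in in_deg])
            if source != target and (source, target) not in edges:
                edges.add((source, target))
                out_deg[source] += 1
                in_deg[target] += 1
    return in_deg, out_deg, edges


in_deg, out_deg, edges = grow_network(steps=2000, p=0.3, q=0.2, r=0.5, seed=1)
print(len(in_deg), len(edges), max(in_deg), max(out_deg))
```

Events of type (iii) that would create a self-link or duplicate an existing link are skipped in this sketch, so the number of links can be slightly smaller than the number of steps plus two.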

## 5. Application of the Model to Twitter

**Figure 8.** Outgoing degree distribution of Twitter’s network computed with the proposed model using 5000 nodes.

**Figure 9.** Incoming degree distribution of Twitter’s network computed with the proposed model using 5000 nodes.

**Table 2.** Comparison between the real Twitter network of $51,217,936$ users and the modeled network using 5000 nodes.

Network | ${\lambda}_{\mathrm{IN}}$ | ${\lambda}_{\mathrm{OUT}}$ | Clustering Coeff. |
---|---|---|---|
Real Twitter network | $-1.8778$ | $-2.1715$ | $0.096$ ($9.6\%$) |
Modeled Twitter network | $-1.8227$ | $-2.1361$ | $0.020$ ($2\%$) |
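
The negative values in the table read naturally as slopes of the degree distribution on a log-log plot. How the exponents were actually estimated is not stated in this excerpt, so the following is only one plausible sketch: an ordinary least-squares fit of $\log P(k)$ against $\log k$ for $k\ge 1$. The function name and the synthetic data are hypothetical.

```python
import math
from collections import Counter


def loglog_slope(degrees):
    """Least-squares slope of log P(k) versus log k, using degrees k >= 1."""
    counts = Counter(k for k in degrees if k >= 1)
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    n = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / n) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic check: node counts proportional to k^-2 should give a slope of -2.
degrees = [k for k in range(1, 7) for _ in range(3600 // (k * k))]
print(loglog_slope(degrees))
```

A plain fit of this kind gives every degree value the same weight, including the sparsely populated tail, so it should be treated as a rough estimate.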

## 6. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Al-Kandari, A.; Hasanen, M. The impact of the Internet on political attitudes in Kuwait and Egypt. Telemat. Inform. **2012**, 29, 245–253.
2. Tahrir Square in Madrid: Spain’s Lost Generation Finds Its Voice. Der Spiegel. Retrieved 7 July 2011. Available online: http://www.spiegel.de (accessed on 8 August 2015).
3. Barabasi, A.L.; Albert, R. Emergence of scaling in random networks. Science **1999**, 286, 509–512.
4. Simon, H.A. On a class of skew distribution functions. Biometrika **1955**, 42, 425–440.
5. Dorogovtsev, S.N.; Mendes, J.F.; Samukhin, A.N. Structure of growing networks with preferential linking. Phys. Rev. Lett. **2000**, 85, 4633–4636.
6. Krapivsky, P.L.; Redner, S. Organization of growing random networks. Phys. Rev. E **2001**, 63, 066123.
7. Dorogovtsev, S.N.; Mendes, J.F. Effect of the accelerating growth of communications networks on their structure. **2001**, 63.
8. Dorogovtsev, S.N.; Mendes, J.F. Accelerated growth of networks. In Handbook of Graphs and Networks: From the Genome to the Internet; Wiley: Hoboken, NJ, USA, 2002.
9. Dorogovtsev, S.N.; Mendes, J.F. Scaling behavior of developing and decaying networks. EuroPhys. Lett. **2000**, 1, 33.
10. Krapivsky, P.L.; Redner, S. A statistical physics perspective on Web growth. Comput. Netw. **2002**, 39, 261–276.
11. Iribarren, J.L.; Moro, E. Branching dynamics of viral information spreading. Phys. Rev. E **2011**, 84, 046116.
12. Iribarren, J.L.; Moro, E. Information diffusion epidemics in social networks. Phys. Rev. Lett. **2007**.
13. Ver Steeg, G.; Galstyan, A. Information-theoretic measures of influence based on content dynamics. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, Rome, Italy, 4–8 February 2013; pp. 3–12.
14. Ghosh, R.; Surachawala, T.; Lerman, K. Entropy-Based Classification of “Retweeting” Activity on Twitter; CoRR: Los Angeles, CA, USA, 2011.
15. Garcia-Herranz, M.; Moro, E.; Cebrian, M.; Christakis, N.A.; Fowler, J.H. Using friends as sensors to detect global-scale contagious outbreaks. PLoS ONE **2014**, 9, e92413.
16. Kim, M.; Newth, D.; Christen, P. Trends of news diffusion in social media based on crowd phenomena. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion, Seoul, Korea, 7–11 April 2014; pp. 753–758.
17. Newman, M.E.J. The Structure and Function of Complex Networks. SIAM Rev. **2003**, 45, 167–256.
18. Albert, R.; Barabási, A.L. Statistical mechanics of complex networks. Rev. Mod. Phys. **2002**, 74, 47–97.
19. Albert, R.; Barabási, A.L. Scale-Free Networks: A Decade and Beyond. Science **2009**, 325, 412–413.
20. Strogatz, S.H. Exploring complex networks. Nature **2001**, 410, 268–276.
21. Watts, D.J.; Strogatz, S. Collective dynamics of “small-world” networks. Nature **1998**, 393, 440–442.
22. Twitter API. Available online: http://dev.twitter.com/doc (accessed on 8 August 2015).
23. Klout. Available online: http://klout.com (accessed on 8 August 2015).
24. Twitalyzer. Available online: http://twitalyzer.com/ (accessed on 8 August 2015).
25. Cha, M.; Haddadi, H.; Benevenuto, F.; Gummadi, K.P. Measuring User Influence in Twitter: The Million Follower Fallacy. In Proceedings of the 4th International AAAI Conference on Weblogs and Social Media (ICWSM), Washington, DC, USA, 23–26 May 2010.
26. Broder, A.; Kumar, R.; Maghoul, F.; Raghavan, P.; Rajagopalan, S.; Stata, R.; Tomkins, A.; Wiener, J. Graph structure in the web. Comput. Netw. **2000**, 33, 309–320.
27. Newman, M.E.J. The structure of scientific collaboration networks. Proc. Natl. Acad. Sci. USA **2001**, 98, 404–409.
28. Floyd, R.W. Algorithm 97: Shortest path. Commun. ACM **1962**, 5, 345.
29. Johnson, D.B. Efficient Algorithms for Shortest Paths in Sparse Networks. J. ACM **1977**, 24, 1–13.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Aparicio, S.; Villazón-Terrazas, J.; Álvarez, G.
A Model for Scale-Free Networks: Application to Twitter. *Entropy* **2015**, *17*, 5848-5867.
https://doi.org/10.3390/e17085848
