Object-Based Dynamics: Applying Forman–Ricci Flow on a Multigraph to Assess the Impact of an Object on The Network Structure

Cohen, Haim; Nachshon, Yinon; Maril, Anat; Naim, Paz M.; Jost, Jürgen; Saucan, Emil

doi:10.3390/axioms11090486

Open AccessArticle

Object-Based Dynamics: Applying Forman–Ricci Flow on a Multigraph to Assess the Impact of an Object on The Network Structure

by

Haim Cohen

^1,2

,

Yinon Nachshon

¹,

Anat Maril

^1,3,

Paz M. Naim

¹,

Jürgen Jost

^4,5 and

Emil Saucan

^6,*

¹

Department of Cognitive Science, The Hebrew University of Jerusalem, Jerusalem 91905, Israel

²

The Federmann Center for the Study of Rationality, Jerusalem 91904, Israel

³

Department of Psychology, The Hebrew University of Jerusalem, Jerusalem 91905, Israel

⁴

Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany

⁵

The Santa Fe Institute, Santa Fe, NM 87501, USA

⁶

Department of Applied Mathematics, Braude College, Karmiel 2161002, Israel

^*

Author to whom correspondence should be addressed.

Axioms 2022, 11(9), 486; https://doi.org/10.3390/axioms11090486

Submission received: 18 August 2022 / Revised: 12 September 2022 / Accepted: 16 September 2022 / Published: 19 September 2022

(This article belongs to the Special Issue Discrete Curvatures and Laplacians)

Download

Browse Figures

Versions Notes

Abstract

:

Temporal information plays a central role in shaping the structure of a network. In this paper, we consider the impact of an object on network structure over time. More specifically, we use a novel object-based dynamic measure to reflect the extent to which an object that is represented in the network by a vertex affects the topology of the network over time. By way of multigraph and Forman–Ricci flow, we assess the object’s impact on graph weights by comparing two graphs, one in which the object is present and one in which the object is absent. After using a case study to demonstrate the impact of Forman–Ricci flow on the network structure, we apply our measure in a semantic network to assess the effects of a word on the interactions between other words that follow it. In addition, we compare our novel measure to centrality and curvature measures so that we can ascertain the advantages of our measure over ones that already exist.

Keywords:

complex network; Forman–Ricci curvature; Ricci flow; semantic network; multigraph

MSC:

53E20; 53Z99; 51K99

1. Introduction

Complex network theory provides a useful way to represent complex relationships among different objects; previous research has focused on social, genetic, neurological, psychopathological, economic, and semantic contexts [1,2,3,4,5,6]. The structural elements are, first, a set of vertices within the network, representing elements such as individuals, genes, or words, and second, a set of edges representing the interaction between those vertices. This paper focuses on directed and weighted networks, in other words, networks in which interactions may occur in a specific direction and with a certain strength.

One of the basic assumptions in many studies involving complex networks is that static networks can model real-world phenomena. The problem with this assumption is that real-world interactions, such as social or economic transactions, evolve. The network, then, must be dynamic, with a temporal dimension that captures information about the changing interactions between objects. Dynamical network analysis such as temporal network [7] or stream graph [8] amplifies the temporal dimension in the analysis. A temporal network captures changes in the interactions between vertices as well as the appearance and disappearance of vertices.

In a dynamical evolution, one can investigate differences between networks [9] or take a dynamical approach to community detection [10]. Our own aim is to detect the extent to which an object changes the network’s structure over time, and we include here the effect on the neighborhood of the interaction, too. In other words, we want to see how an object changes not only the interaction between a pair of objects but also the larger environment that surrounds that interaction.

In a social setting, for instance, the presence of a specific individual may change the way other individuals interact; in the language of a network, the individual affects the set of vertices in the environment of the interaction. In a commercial context, the purchase of product A may affect the rest of the buying process, leading to the purchase of products B and C together. Without the purchase of A, B and C might not have been purchased at the same time.

In this paper, we harness tools from differential geometry and complex networks to investigate the interplay between an object and the dynamics of network structure. We offer a novel measure, the object-based dynamic measure, to account for how an object changes the network topology and the interactions between other objects over time. Our measure is based on two essential elements. First is a multigraph representation for each object. In this multigraph, any pair of vertices receives two sets of edges. One set expresses the interaction between the vertices in the absence of the object, and the other set expresses the interactions in the presence of the object. This multigraph representation allows us to compare the set of interactions with and without the presence of a given object. The second element of our measure is the computation of Ricci flow as a regularization term for each set of edges separately. Ricci flow allows us to extract the geometrical information that reflects the core structure of the network [11,12]. By comparing metrics of the same size for both sets of edges—in other words, the metric both with and without the presence of the object—we can assess the influence of that object on the core geometry of the network.

Our object-based dynamic measure can be used in two ways. The first option takes into account the direction of the effect. As we will explain later, when a negative value represents an object that changes the metric, the structure is tree-like, and when a positive value represents an object that changes the metric, the structure is spherical. The second option takes into account the absolute value of the object-based dynamic, which represents the extent that an object changes the metric. In this case, we know only that a significant change has occurred, not whether the structure is tree-like or spherical. This study presents both options.

After presenting the measure and its essential elements, we present the case study of a semantic network. We apply our object-based dynamic perspective to an ongoing semantic retrieval task so that we can assess a word’s impact, if any, on the semantic interactions that follow that word. More specifically, we examine a graph that represents a standard semantic task: the participants’ production, over the course of one minute, of as many unique words as possible within a semantic category, such as animals. In this context, we can assess the impact of a specific word on the interactions that follow. In the stream of associations, sun-fire-sphere, for example, the word “sun” affects the semantic relationship between “fire” and “sphere.” In the presence of “sun”, “fire” and “sphere” are closely related, but in the absence of “sun,” the relationship between those two words would be weaker.

The purpose of our semantic case study is threefold. First, we want to demonstrate the effect of Ricci-flow as a regularization term on the network structure. Following the studies of Weber, Jost, and Saucan [11] and Ni, Lin, Luo, and Gao [13], which examined different uses of Olivier-Ricci flow and Forman–Ricci flow, we propose a comparison of Forman–Ricci flow between two graphs as a way to examine the impact of an object on the graph weights. Second, we are interested in the results when we apply the measure in a semantic context: we want to examine the extent to which a word affects the interactions between other words that follow it. Third, we want to demonstrate that the object-based dynamic score is not a graph measure but a comparison between two graphs for each object.

This third goal notwithstanding, we compare our novel measure to the centrality measures of degree, closeness, betweenness, LDC, and PageRank. This focus on centrality measures makes sense because a vertex that controls the flow of information is likely to affect the relationship between other vertices over time. We also compare our object-based dynamic to the curvature-based measures Menger, 1-dimensional Forman, 2-dimensional Forman, and Haantjes since these measures provide insight into the local geometric properties near the vertex as well as the topological structure of the network [14,15,16,17]. Finally, we compare our object-based dynamic to the psycholinguistic measures of word frequency and word location since both of these measures reflect attributes of words that may express local changes in the environment of a given word [18]. Through these comparisons, we investigate whether our word-based dynamic measure contributes information beyond what is already provided by centrality, curvature-based, and psycholinguistic measures. We aim to show that the effect of a word on its environment, as measured by our object-based dynamic, is not captured by existing measures.

This paper makes three major contributions: a new perspective in the analysis of network dynamics, with a focus on the effect of an object on network structure; a novel object-based dynamic measure; and the application of this perspective and measure to a semantic graph.

The remainder of this paper is organized as follows. Section 2 presents the data we use in our semantic study case, along with some basic assumptions. Section 3 presents our results with reference to Forman–Ricci flow and our object-based dynamic measure, along with a comparison between our measure and others that already exist. Finally, we present our conclusions in Section 4.

2. Materials and Methods

The experiment was conducted on two groups (N = 2047), both consisting of native Hebrew speakers. One group was recruited from the Hebrew University of Jerusalem (HUJI: N = 691; M:F = 1:1.002; mean age = 24.6 years; range: 18–39); the members of this group received coupons for coffee. The second group (P4A: N = 1356; M:F = 1:1.96; mean age = 29.07 years; range: 18–40) was recruited through the website www.panel4all.co.il (accessed on 15 September 2022), and participants were compensated with gift certificates from the panel4all organization. The ethics committee of the Department of Psychology at the Hebrew University of Jerusalem approved all experimental procedures.

Participants were given one minute in a category fluency test (CFT); the task was to produce as many unique words as possible within the semantic category of animal names. Participants from HUJI were recorded on a Philips DVT4010, and soundtracks were transcribed with the PRAAT program [19], which gave us the words as well as the time signatures for the beginning and end of each word. Participants from P4A were recorded on a phone application, and these soundtracks, too, were transcribed via PRAAT.

Two lists were generated for each participant: a list of words and a list of timestamps, with each timestamp indicating the start time of the word’s retrieval. The timestamps started at 0, indicating the beginning of the recording, and ended at 60.

We begin by describing the manner in which we construct the semantic graph. Let

G = (V, E, w)

be an edge-weighted and directed graph, with

V = \{v_{1}, v_{2} \dots v_{n}\}

a set of vertices and

E = \{(v_{i}, v_{j})\}

a set of edges or links between words

v_{i}

and

v_{j}

in V. The words are the vertices, and the edges reflect the relationship between words. The weight

w_{i j}

was based on the assumption that the closer the semantic relationship between two words, the faster the transition between these two words [20]. The weights of the edges were based on a proposal by Nachshon, Cohen, and Maril [21] and were calculated by a “distance” function that assumes, as expected for a metric, that the “distance” is non-negative. We did not assume symmetry, and we also allowed violations of the triangle inequality. Our “distances,” therefore, did not constitute a true metric.

Our “distance” function calculates the weights of the edges as follows. For any ordered pair of vertices

v_{i}, v_{j}

, p = p(s) is a sublist of participant s; the sublist starts at

v_{i}

and ends at

v_{j}

and denotes the amount of normalized time that it took s to traverse from

v_{i}

to

v_{j}

. Here, each timestamp was normalized by the number of words that s produced. The “distance” function has two free variables that determine the upper and lower boundaries, WS (window size) and MS (minimum subjects):

1.: The upper boundary window size (WS) defines the maximum number of words between $v_{i} and v_{j}$ . The sublist p is therefore taken into account when the number of words between $v_{i}$ and $v_{j}$ is less than or equal to the number WS.
2.: MS is a number defining the lower boundary, which is the minimum number of p’s containing $v_{i}$ and $v_{j}$ in that order, and with, at most, WS words between them.

Let P be the set of the amount of time it took any p to traverse from

v_{i}

to

v_{j}

up to WS words. The distance between the ordered pair

v_{i}

and

v_{j}

is then defined as the median of the set P, if |P| > MS. Otherwise, there will be no distance between

v_{i}

and

v_{j}

. In other words, a distance is well-defined iff |P| > MS. The length of any path on the graph will be the sum of the weights (or distances) of the edges composing it, as defined above. Additionally,

d_{i j}

denotes the shortest path, given by Dijkstra’s shortest path algorithm, from

v_{i}

to

v_{j}

based on

w_{i j}

.

By aggregating temporal information into a pre-defined window size, WS enables us to collapse temporal information into a snapshot static network. The greater the window size, the more fully the snapshot summarizes the broader temporal information. As a result, manipulation of window size allows us to compare snapshots with different degrees of temporary resolution. Descriptive analyses, such as the number of vertices with edges as a function of the free parameters WS and MS, are presented in Appendix A.

3. Results

This part of our paper is divided into three subsections. In Section 3.1, we focus on Forman–Ricci flow, the geometric tool that we use in our object-based dynamic measure, and offer an empirical demonstration of its effect on network structure. In Section 3.2, we present our novel measure and demonstrate how it can be used in a semantic context to quantify the impact of a word on the interactions between the words that follow. Section 3.3 compares our measure with centralities, scalar Ricci-curvature, frequency, and word location.

3.1. Forman–Ricci Flow

Our object-based dynamic measure relies on the notion of Forman–Ricci flow. In this subsection, we introduce Forman–Ricci curvature and then explain Forman–Ricci flow. We then present several analyses that demonstrate the impact of Forman–Ricci flow on the graph weights.

3.1.1. Forman–Ricci Curvature

In Riemannian geometry, the sectional curvature of a tangent plane captures, among other things, the dispersion of geodesics. Three basic behaviors can be distinguished for two geodesics with the “same” velocity and starting point. If they move away exponentially, the result is negative curvature (hyperbolic space). If they move away linearly, the result is zero curvature (Euclidean space). Finally, if they move away logarithmically, the result is positive curvature (spherical space). Ricci curvature of a given geodesic c generalizes the dispersion of geodesics in all the planes to which c belongs. In other words, Ricci curvature, which is defined on smooth manifolds, measures the degree to which the manifold deviates from being locally Euclidean in various tangential directions. Ricci curvature, therefore, carries information about a given geodesic’s local environment. In the discrete case, Ricci curvature is an edge measure that describes how well the neighborhood of the edge is connected; the measure, therefore, provides information not only on edge but also on the neighborhood of an edge.

Here, we focus on Forman–Ricci curvature. Forman [22] introduced a discrete notion of Ricci curvature based on the Bochner–Weitzenböck formula, which preserves the notion of the dispersion of geodesics in a discrete setting [23]. Forman defined a discrete notion of Ricci curvature for general geometric objects, the so-called weighted CW complexes. For our purposes, it is sufficient to introduce Ricci curvature for the limiting case of 1-dimensional simplicial complexes [24], i.e., graphs. For a directed and weighted graph, we use the 1-dimensional Forman–Ricci curvature (FR1) formulated by Saucan, Sreejith, Vivek-Ananth, Jost, and Samal [16]. In this version, three variables determine whether and to what degree FR1 is positive or negative. The key attributes for estimating the Ricci-curvature for edge e =

(v_{1}, v_{2}

) in a directed and weighted graph are the following:

1.: The smaller the weight $w_{e}$ of the edge $e$ , the more positive the curvature of $e$ .
2.: The greater the weighted in/out-degree of $v_{1}$ and $v_{1}$ , the more positive the curvature of $e$ . The weighted in-degree of vertex $v_{1}$ takes into account the weights $e_{1 j}$ , where $e_{1 j} = (v_{1,} v_{j}), j \neq 2$ , while the weighted out-degree of vertex $v_{2}$ takes into account the weights $e_{j 2},$ where $e_{j 2} = (v_{j,} v_{2}), j \neq 1$ .
3.: The smaller the $\deg (v_{1})$ and/or $\deg (v_{2})$ , the more positive the curvature of $e$ .

Forman–Ricci curvature

F R 1 (e)

in a directed and weighted graph is given by the following formula:

F R 1 (e) = w_{e} (\frac{w_{v_{1}}}{w_{e}} - \sum_{e_{I, v_{1}} ~ e} \frac{w_{v_{1}}}{\sqrt{w_{e} w_{e_{I, v_{1}}}}}) + w_{e} (\frac{w_{v_{2}}}{w_{e}} - \sum_{e_{O, v_{2}} ~ e} \frac{w_{v_{2}}}{\sqrt{w_{e} w_{e_{O, v_{2}}}}})

(1)

where

e

denotes the edge between vertices

v_{1}

and

v_{2}

,

w_{e}

denotes the weight of the edge

e

, and

w_{v_{1}}

and

w_{v_{2}}

denote the weights associated with vertices

v_{1}

and

v_{2}

. We define

w_{v_{i}} = 1

,

i = 1, 2

(In Forman’s formula (1) it is possible to take into account the weights of the vertices

w_{v_{1}}

and

w_{v_{2}}

. However, in this framework, we used the minimal necessary assumptions, and we, therefore, defined the weights of the vertices as 1. Further research may take up different definitions of the vertices’ weights). In addition, let

e_{I, v_{1}} ~ e

denote the incoming edges incident on vertex

v_{1}

and

e_{O, v_{2}} ~ e

denote the outgoing edges emanating from vertex

v_{2}

. The scalar

F R 1

curvature of vertex v in a graph is defined as follows:

F R 1 (v) = \sum_{T_{e} ~ v} F R 1 (e)

(2)

To clarify the link between Forman–Ricci curvature and the dispersion of geodesics, let us examine that link on an undirected and combinatorial graph, in other words, a 1-dimensional complex with only 0-simplices (vertices) and 1-simplices (edges). In this case, all weights—both edge and vertex ones—equal 1; therefore, Formula (1) simplifies as follows:

F R 1 (e) = 4 - \deg (v_{1}) - \deg (v_{2})

(3)

Thus, in this simple case, only the number of the edges incident to

e

, affects the magnitude of Forman–Ricci curvature. In consequence, the larger the number of the edges incident to

e

, the more negative the curvature, hence the information that passes through

e

disperses to other vertices. In other words, given that in Formula (3), the number of edges incident to

e

is determined only by the degree of vertices—the smaller the number of edges incident to

e

, the less dispersed the information passing through

e

.

3.1.2. Forman–Ricci Flow

Hamilton [25] introduced the notion of Ricci flow to reflect how the metric deforms a Riemannian manifold and to smooth out irregularities in the metric. More precisely, Ricci flow changes the metric such that regions with negative Ricci curvature expand and regions with positive Ricci curvature shrink.

In a discrete setting, a combinatorial version was introduced by Chow and Luo [26]. As in the continuous case, Ricci flow processes expand edges with highly negative Ricci curvature and shrink edges with highly positive Ricci curvature. The present article relies on the work of Weber, Jost, and Saucan [11], which applied Forman–Ricci flow for anomaly detection in the network, as well as the work of Ni, Lin, Luo, and Gao [13], which applied Ollivier–Ricci flow to reveal the community structure of a network and demonstrated that the removal of edges with high weights enables the detection of communities in the network.

Following the work of these researchers, for each iteration t, we define Ricci flow as follows:

w_{i j}^{t + 1} = d_{i j}^{t} - (η \cdot κ_{i j}^{t} \cdot d_{i j}^{t}),

(4)

where

η

is the learning rate of the gradient descent process, and

d_{i j}

denotes the length of the shortest path, given by Dijkstra’s shortest path algorithm, from

v_{i}

to

v_{j}

based on

w_{i j}^{t}

. Forman–Ricci curvature from

v_{i}

to

v_{j}

is denoted by

κ_{i j}^{t}

. In each iteration, the input is a weighted and directed graph G, and the output is a weighted and directed G with edge weight given by the Ricci flow metric. For each iteration, we followed these steps:

1.: Apply Dijkstra’s shortest path algorithm for any edge based on $w_{i j}^{t}$ to calculate $d_{i j}^{t}$ ;
2.: Compute the Forman–Ricci curvature $κ_{i j}^{t}$ for any edge based on $d_{i j}^{t}$ ;
3.: Update the edge weight by Formula (4);
4.: Repeat steps 1–3 for N iterations.

3.1.3. Analysis

The aim of this section is to illustrate the process of Forman–Ricci flow by describing the evolution of the metric as a function of Forman–Ricci curvature. Ricci flow changes the metric such that regions with negative Ricci curvature expand and regions with positive Ricci curvature shrink.

The analysis was conducted on the semantic network described in Section 1. The edge weights were defined by the “distance function” mentioned above, which includes the two free parameters of minimum subjects per edge (MS) and window size (WS), or the maximum number of words that determine the distance between a given pair of words. For the purpose of describing Forman–Ricci flow, we selected parameters WS = 7 and MS = 10. Note that for any graph constructed from any parameters WS and MS, the direction of the effect between Forman curvature and the change in the metric are the same. In addition, we ran 20 iterations of Forman–Ricci flow (Formula 4) when the learning rate was 0.001. See Figure 1 for the distribution of Forman–Ricci curvature and the weights before and after Forman–Ricci flow. Figure 2 illustrates the evolution of the graph as a function of Forman–Ricci curvature such that the greater the curvature of an edge, the higher the edge weight in the following Ricci flow iteration.

Figure 2 illustrates the fundamental relationship between Forman curvature at time t and the weight at time t + 1. A negative Forman curvature entails greater weights, while a positive Forman curvature entails smaller weights. The free parameters of the function that determines the weights affect the robustness of the graph and the temporal resolution of the weights: MS indicates the minimum number of people who moved from i to j, and the larger the WS, the more the weight from i to j can be based on relationships that contain a greater number of objects between them. It turns out that choosing other parameters for WS and MS changes the starting point of the analysis by establishing a different set of weights, but such a choice does not change the fundamental relationship between the curvature and the weight.

Our approach relies on the notion of Ricci flow in order to detect the core structure of the network. Consistent with Ni, Lin, Luo, and Gao [13], in a graph G in which the vertices are well connected such that the edges have positive curvature, then the structure of G after a number of iterations of Ricci flow is spherical. If the edges have negative curvature, then the structure of G after a number of iterations of Ricci flow is tree-shaped. As a result, in a graph that features areas with dense negative curvature surrounded by other areas containing edges with negative curvature, we find a division into communities

To estimate the effect of Forman–Ricci flow on the network structure, particularly the extent to which G is distributed to dense communities, we compare the network’s internal validity before and after calculating Forman–Ricci flow. Internal validity refers to the process of evaluating the quality of a clustering structure without referring to external information. Note that we use an internal validity approach because there is no ground truth.

We conducted the comparison between the internal validity of graph weights and the Forman–Ricci flow metrics as follows. First, for each set of weights, we initialized four different lists of clusters given by agglomerative, Birch, K-means, and greedy modularity communities on a range of a number of communities (2, 3, …, 30). Then, for each list of clusters and number of clusters, we computed internal validity with two different measures, Silhouette and Calinski–Harabasz scores. For greedy modularity communities, we computed internal validity by coverage and performance, with a high score indicating dense and well-separated communities. We anticipated that the Forman–Ricci flow process would increase the internal validation of the graph since, in the iterative process, we expected the metric to change so that the graph would split into more distinct communities than at the outset. We performed the analysis via the Scikit-learn package in Python [27]. For more details regarding the cluster algorithms and the internal validity measures, see Appendix A.

The results indicate that for the Calinski–Harabasz score, Ricci flow yields better results in all cases. For the Silhouette score, Ricci flow was better for all cases in Birch and for 28 out of 29 in agglomerative and K-means. In all cases, Ricci flow yields better results for coverage and performance scores (Figure 3). All told, the Forman–Ricci flow metric yields a better clustering performance than graph weights.

3.2. Object-Based Dynamic Measure

Here, we introduce our novel measure based on Forman–Ricci flow. This measure is applicable to many contexts, but to illustrate its use, we focus on the way it can quantify the impact of a word on semantic relations between other words.

To capture the object-based dynamic, we need to consider a multigraph

\hat{G} = (V, E, w, I)

. A network is called a multigraph if a pair of vertices can be connected with multiple edges. The additional information

I

is a class of disjointed sets, and in our case,

I

is an ordered set of conditional event indexes:

I = \{{\hat{G}|}_{s}, {\hat{G}|}_{\neg s}\}

. That is, two graphs represent any object

s

,

{\hat{G}|}_{s}

represents the weights between any two objects iff

s

appears before the pair of objects, and

{\hat{G}|}_{\neg s}

represent the weights between any two objects iff

s

does not appear before the pair of objects.

Formally speaking, for

\hat{G} = (V, E, w, I)

, we use the following steps to quantify the extent to which an object s changes the interactions between other objects:

1.: Define index $I$ for graph $\hat{G}$ such that $I = \{{\hat{G}|}_{s}, {\hat{G}|}_{\neg s}\},$ so that $s$ is represented by two graphs.
2.: For each element in $I$ , define the weights for ${\hat{G}|}_{s}$ as $w_{i j} \in {\hat{G}|}_{s} : = w_{i j} | s_{i - k}$ , for ${\hat{G}|}_{\neg s}$ as $w_{i j} \in, {\hat{G}|}_{\neg s} : = w_{i j} | \neg s_{i - k}$ , for $k = 1 \cup 2$ and $j \leq i + W S$ .
3.: Compute steps 1–4 of the Forman–Ricci flow algorithm for each of the graphs $\{{\hat{G}|}_{s}, {\hat{G}|}_{\neg s}\}$ .
4.: For any $w_{i j}$ and $w_{i j} \in {\hat{G}|}_{s}$ and $w_{i j} \in {\hat{G}|}_{\neg s}$ .

δ (s) = \frac{1}{N} \sum_{w_{i j} \in {\hat{G}|}_{s} and {\hat{G}|}_{\neg s}} {\hat{G}|}_{s} (w_{i j}) - {\hat{G}|}_{\neg s} (w_{i j})

(5)

where N denotes the number of

w_{i j}

that belongs to both graphs.

This object-based dynamic measure

δ (s)

can be used in two ways. The first, a directional approach, considers the direction of the effect. When an object that changes the weights is represented by a negative value, the structure is tree-like in the presence of s, meaning that the network conditional to s is significantly more diffuse than a network that is conditional to not s. When the object is represented by a positive value, the structure is spherical, meaning that the network conditional to s is more densely packed than a network that is conditional to not s. A value close to 0 indicates that there are no significant differences between the graphs. The second approach does not take direction into account. Instead, it considers the absolute value of

δ (s)

, which represents the extent to which an object changes the metric in any direction. We examine both approaches in the next section, and we present descriptive analyses regarding

δ (s)

in Appendix A. The significant difference between the two measures is that in the directed version, a high value indicates a spherical-like structure, while a low value indicates a tree-like structure. The non-directed measure answers the fundamental question of whether the object affects the metric: a low value indicates no effect, and a high value indicates an effect.

By considering both measures, it is possible to check whether existing measures establish a correlation with spherical-like or tree-like structures or with the degree of influence of an object on the metric.

3.2.1. Study Case

Attention to the ever-changing nature of the semantic network has recently gained momentum [28,29], and here we demonstrate the application of our novel measure to this new perspective. In the context of an ongoing semantic retrieval task, we use our object-based dynamic measure to assess the impact of a word S on the interactions between the words that follow. By comparing the structure of the Ricci flow metric in a network that is conditioned by the utterance of the word S to the structure of the Ricci flow metric in a network that is conditioned by not S, we can identify the impact of S on the interactions between the following words. In the following sections, we show that certain sets of objects significantly change the metric; we examine both the directional and nondirectional approaches and compare those approaches with measures that already exist.

3.2.2. Analysis

Our analysis aims to distinguish between the set of words that significantly affect the metric and the set of words that do not. To make this distinction, we assigned each word an object-based dynamic score given by

δ (s)

for any possible combination between the free parameters WS = (2,4,5,6,8,9) and MS = (3,5,7,9,15), the number of iterations of Forman–Ricci flow = (3,4,5), and a learning rate of 0.001. We based the score on 90 different networks to estimate the result’s stability for weights at different degrees of locality (WS) and power (MS and number of iterations). Finally, we performed a bootstrap test for each word to assess whether that word significantly changed the metric. The bootstrap was performed on the list of values in which the word received a

δ (s)

value (see Figure 4), when in each iteration (number of iterations = 10,000), the mean of a mini-batch

(N = 5)

was computed. Then, each word was bounded between the upper and lower numbers given by 10,000 mini-batch means, where the boundaries were corrected for multiple compressions via the Bonferroni correction, such that the lower boundary was

2.5 / l

and the upper boundary

100 - (2.5 / l)

, with

l = 130

, representing the number of compressions. Finally, if the value 0 was not included in the boundaries, we determined that the word’s effect on the metric was significantly different from 0; otherwise, we determined that the result was non-significant.

In sum, out of 130 words, 61 significantly changed the network structure, and 69 were non-significant (Figure 5). Additionally, in most cases, the words had small

δ (s)

values, which made the network structure more tree-like than it would have been in the absence of those words. We will include a comprehensive discussion about the direction of a word’s effect (tree-like or spherical) in a follow-up study. Additional analyses can be found in the appendix.

3.3. Exploratory Comparison Analysis

To determine whether our object-based dynamic measure,

δ (s)

, contributes new information, this section will compare

δ (s)

with already existing measures. We will examine both the directional and nondirectional approaches and compare them with centrality measures, scalar curvature measures, and word frequency and location. We will see whether the effect of an object on its environment, as measured in the object-based dynamic, is reflected in those alternative measures.

3.3.1. Centrality Measures

Centrality measures capture how a vertex influences the flow of information in the network, and in this section, we focus on two sets of this type of measure. The first set, which captures how close a vertex is to other objects, includes the number of triangles, closeness, and PageRank. The second set, which captures how a word functions as an intermediary, includes betweenness and LDC. We will examine empirically the extent to which

δ (s)

differs from all these other measures, but we must first introduce the measures:

The number of triangles: This measure calculates the number of undirected three-cliques for each vertex in the graph. The number of triangles is used to detect vertices that belong to numerous cliques. It is worth noting that this measure is closely related to the clustering coefficient.
Closeness: This measure is the inverse of farness, which is defined as the mean of the shortest paths to all other vertices [30,31]. Closeness can be interpreted as the expected time of arriving at a word through the graph’s shortest paths. The gist of this metric is to assign more importance to the vertices that are closest. The definition is as follows:

$C (v) = \frac{N - 1}{\sum_{u} δ (v, u)}$

(6)

where $δ (v, u)$ represents the length of the shortest path between $v$ and $u$ . This measure takes weight into account by averaging the shortest paths that emerge from $v$ .
PageRank: The basic idea of the PageRank algorithm, first introduced in a Google paper [32], is that a central vertex is determined not only by the number of incoming edges (in-degree) but also by the level of importance of the incoming vertices. T represents the set of vertices, $N_{u}$ represents the number of vertices to which vertex v points, and $S_{V}$ represents the set of vertices pointing to vertex v. Finally, $α$ is the damping factor of the probability of jumping from a given vertex to another random vertex in the graph. PageRank is computed as follows:

$PR (v) = \sum_{u \in S_{V}} \frac{P R (u)}{N_{u}} + \frac{1 - α}{T}$

(7)
Betweenness: This measure was introduced by Freeman [33] in order to quantify the extent to which a vertex tends to be on the shortest paths between other vertices—in other words, to serve as an intermediary. Betweenness for a vertex v is defined as follows:

$γ (v) = \sum_{i \neq v \neq j \in V} \frac{σ_{i j} (v)}{σ_{i j}}$

(8)

where $σ_{i j} (v)$ represents the number of shortest paths between i and j that go through v, and $σ_{i j}$ represents the total number of shortest paths between i and j.
LDC: In Cohen et al. [34], we recently proposed local detour centrality (LDC), a novel centrality measure for weighted and directed graphs that quantifies the tendency of a given vertex to lie on the shortest possible path between other vertices. In other words, LDC indicates whether the possible path between $v_{1}$ and $v_{2}$ is significantly closer when $v$ lies between them, in contrast to cases in which $v$ does not lie between them. Let $G = (V, E)$ be an edge-weighted and directed graph in which $w (v_{i}, v_{j})$ represents the weights from $v_{i}$ to $v_{j}$ and let $δ (v_{i}, v_{j})$ denote the shortest path from $v_{i}$ to $v_{j}$ based on Dijkstra’s shortest path algorithm. For any vertex $v$ ,
- Let $L = \{v_{1}, v_{2} \dots v_{n}\}$ such that any $v_{i} \in L$ if $δ (v, v_{i}) \leq r$ or $δ (v_{i}, v) \leq r$ . The number $r = \frac{1}{|V (G)|} \sum_{v_{i,} v_{j} \in V} δ (v_{i}, v_{j})$ is called the threshold.
- Let $G_{v} \subset G$ be a complete graph with $V (G_{v}) = L$ where, in this case, $w (v_{i}, v_{j}) = δ (v_{i}, v_{j})$ .

To calculate local intermediateness, for any vertex

v

, let us first introduce some further notations:

Let

G N = (V, E)

be the complete graph, where the weights are calculated according to the Dijkstra’s shortest path algorithm on the following weights:

w^{'} (v_{i}, v_{j}) = \{\begin{matrix} \max_{v_{k,} v_{l} \in V} (w (v_{k}, v_{l})) & v_{j} = v o r v_{i} = v \\ w (v_{i}, v_{j}) & o t h e r w i s e \end{matrix}

Since both graphs

G_{\bar{v}}

,

G_{v}

have the same edges but not the same weights, we denote E’ = E(

G_{\bar{v}}

) = E(

G_{v}

). Then, the local detour centrality (LDC) of the vertex

v

is defined as follows:

L D C (v) = \frac{1}{|L|} \sum_{e \in E^{'}} (G_{\bar{v}}_{e} - G_{v}_{e})

(9)

Analysis: In sum, centrality measures capture the extent to which a vertex dictates the flow of information in a static graph. Here, we examine whether centrality computed in a static graph encodes information about changes to the metric over time.

For any possible combination between the free parameters WS (window size), comprising the values (2,4,5,6,8,9); MS (minimum subjects per edge), comprising the values (3,5,7,9,15); the number of iterations of Forman–Ricci flow = (3,4,5); and a learning rate of Ricci flow algorithm at 0.001, each word received a list of values given by the object-based dynamic scores

δ (s)

and |

δ (s)

| as well as each centrality measure described above. We based the score on 90 different networks so that we could estimate the stability of the result for weights at different degrees of locality (WS) and power (MS and number of iterations).

For the directional approach, the mean correlation between

δ (s)

and the alternative centrality measures were as follows: number of triangles (M = 0.19, SD = 0.1), PageRank (M = 0.1, SD = 0.1), closeness (M = 0.2, SD = 0.1), betweenness (M = 0.2, SD = 0.1), and LDC (M = 0.17, SD = 0.1). For the nondirectional approach, the mean correlation between the absolute value

δ (s)

and the alternative centrality measures were as follows: number of triangles (M = 0.15, SD = 0.07), PageRank (M = 0.13, SD = 0.08), closeness (M = 0.12, SD = 0.1), betweenness (M = 0.11, SD = 0.09), and LDC (M = 0.11, SD = 0.13) (Figure 5).

All in all, we found a high correlation among centrality measures but a weak correlation between those measures and the object-based dynamic score. In comparison with the nondirectional approach, the directional approach yielded a slightly higher correlation to the alternative centrality measures.

3.3.2. Scalar Curvature

As we have already mentioned, Ricci curvature is classically defined on smooth manifolds and measures the deviation of the manifold from being locally Euclidean in various tangential directions. Ricci curvature is an edge-based measure defined by averaging all sectional curvatures of the planes where the edge is present. The measure captures two essential geometric properties: the growth of volume and the dispersion of geodesics. As a result, Ricci-curvature provides insight into the local geometric properties near the edge as well as the topological structure of the network. Scalar-curvature is a vertex measure that is calculated by averaging all Ricci curvatures in all directions in which the vertex appears. One might therefore expect Scalar curvature to capture local temporal changes in the network. In this section, we compare both object-based dynamic

δ (s)

and |

δ (s)

| to different discretizations of Ricci-curvature: Menger, 1-dimensional Forman, 2-dimensional Forman, and Haantjes.

Our analysis did not include Ollivier-Ricci curvature [35] because of its computational complexity compared to other measures and because of its metric assumptions; by contrast, we calculate the curvature directly on the weights. Since previous studies have shown that Ollivier-Ricci curvature maintains a high correlation with Forman–Ricci curvature in many networks [15], we have found it sufficient to focus on two different perspectives of Forman–Ricci curvature in this framework.

1.: Haantjes curvature: This curvature [36] is a path-based measure and can be generalized to a path of length $n$ by replacing path ab ∪ bc with a path of length $n$ . For a general discrete notion of Haantjes curvature, let $π = (v_{0}, \dots, v_{n})$ be a directed path between the vertices $v_{0}$ and $v_{n}$ . The simplified Haantjes-Ricci curvature of the path $π$ is then defined as follows [37]:

$κ_{H}^{2} (π) = \frac{l (π) - d (v_{0}, v_{n})}{d (v_{0}, v_{n})}$

(10)

Next, the Haantjes-scalar curvature of $v$ in $G$ is defined as follows:

$κ s_{H}^{2} (v) = \sum_{π ~ v} κ_{H}^{2} (π)$

(11)

where $π ~ v$ denotes the paths that are anchored to vertex $v$ . Note that while Menger-Ricci and Forman–Ricci take into account only triangles or simple paths of length 2, Haantjes-Ricci considers paths of any length; in this study, however, we limit the analysis to triangles because of computational considerations. The resulting simplicial complexes are canonical geometric structures that are commonly used to model higher-order correlations in networks.
2.: Menger–Ricci curvature: In 1930 Menger introduced a discrete definition of Ricci curvature K(T) for any three vertices in the network. Let (M, d) be a metric space and T = T(a, b, c) be a triangle with sides lengths a, b, c, and denote $p = \frac{a + b + c}{2}$ . Then the Menger curvature of T is given by

$K_{m} (T) = \frac{\sqrt{p (p - a) (p - b) (p - c)}}{a b c}$

(12)

The Menger–Ricci curvature of a vertex e in a network can be defined as

$K_{m} (v) = \sum_{T_{e} ~ v} K_{m} (e)$

(13)

where T_e ∼ e denotes the triangles adjacent to the vertex v. Intuitively, if an edge is part of several triangles in the network, that edge will have a high positive Menger-Ricci curvature.
Note that the definition of Menger-Ricci in (12) relies on the geometry of the Euclidian plane. This geometric assumption is incorrect in many cases, however, including hyperbolic spaces [14]. By contrast with Menger-Ricci curvature, Haantjes and Forman–Ricci curvatures do not assume any background geometry.
3.: Forman–Ricci curvature: There are different ways to calculate Forman–Ricci curvature of a directed and weighted network. Formula (1) described 1-dimensional Forman–Ricci curvature, and here we present an extension by Saucan, Sreejith, Vivek-Ananth, Jost, and Samal [16], the 2-dimensional Augmented Forman–Ricci curvature (AFR). FR1 considers only the pairwise correlation between vertices; while computing FR1 for the directed edge $e = (v_{1}, v_{2})$ , we consider only the incoming edges to $v_{1}$ and the outcoming edges from $v_{2}$ . By contrast, AFR considers 2-dimensional faces—cases in which three vertices form a triangle—and potentially higher-order faces as well. Although there are four different types of directed triangles, we consider only two types (see Figure 6). The directed triangular face t formed by vertices ( $v_{1}, v_{2}, v_{3})$ and edges {( $v_{1}, v_{2}), (v_{2}, v_{3}), (v_{3}, v_{1})}$ makes a positive contribution. In this case, since the information from $v_{3}$ “flows” back to $v_{1}$ , the triangle represents a spherical structure that is consistent with the interpretation of a positive Forman–Ricci curvature. The directed triangular face t formed by edges {( $v_{1}, v_{2}), (v_{1}, v_{3}), (v_{3}, v_{2})}$ makes a negative contribution. This triangle represents a tree-like structure since the information “flows” out from vertex $v_{1}$ to vertex $v_{2}$ and so also from vertex $v_{1}$ to $v_{3}$ and from there to $v_{2}$ .

The augmented Forman–Ricci curvature

A F R (e)

in a directed and weighted graph is given by the following equation:

A F R (e) = w_{e} (\sum_{e < t} \frac{w_{e}}{w_{t}} + (\frac{w_{v_{1}}}{w_{e}} + \frac{w_{v_{2}}}{w_{e}})) - \sum_{e_{I, v_{1}}, e_{O, v_{2}} ~ e, e_{I, v_{1}}, e_{O, v_{2}} ≮ t} (\frac{w_{v_{1}}}{\sqrt{w_{e} w_{e_{I, v_{1}}}}} + \frac{w_{v_{2}}}{\sqrt{w_{e} w_{e_{O, v_{2}}}}})

(14)

Here w_t denotes the weight of face t, and σ < τ means that σ is a face of τ. The augmented Forma–Ricci curvature of a vertex v in a network can be defined as

A F R (v) = \sum_{T_{e} ~ v} F R 1 (e)

(15)

Note that in order to calculate

κ s_{H}^{2}

,

K_{m} (v), F R 1 (v)

and

A F R (v)

in all of our statistical analyses, we use the median and not the sum for statistical reasons.

Analysis: As we did in our previous analysis, we constructed 90 networks based on the different possible combinations of the free parameters WS, MS, and number of iterations. For the directional approach, the mean correlation between

δ (s)

and the alternative scalar curvature measures were as follows: Haantjes [M = 0.21, SD = 0.14], Menger [M = 0.22, SD = 0.12], 1D-Forman [M = 0.21, SD = 0.17]. and 2D-Forman [M = −0.18, SD = 0.12]. For the nondirectional approach, the mean correlation between the absolute value

δ (s)

and the alternative scalar curvature measures were as follows: Haantjes [M = 0.09, SD = 0.15], Menger [M = 0.11, SD = 0.12], 1D-Forman [M = −0.02, SD = 0.16] and 2D-Forman [M = −0.1, SD = 0.12] (Figure 7).

In sum, we found a low degree of correlation between the object-based dynamic score and the different discretizations of scalar curvature for both approaches. The directional approach correlates with the alternative scalar curvature measures more closely than the nondirectional approach, and this result is consistent with our previous analysis. In addition, we found a nontrivial and high degree of correlation between Menger–Ricci curvature and 2-dimensional Forman–Ricci curvature. This observation is highly relevant since it shows that for all practical purposes, the elementary Menger curvature and the far less intuitive 2-dimensional Forman–Ricci curvature essentially coincide. This coincidence demonstrates that the combinatorically defined Forman curvature also captures the essential geometry of the underlying simplicial complex.

3.3.3. Frequency and Word Location

Word frequency effects are manifested in many ways, including lexical access [38], lexical decision-making [39], perceptual identification [40], and recall [41]. In addition, word frequency is strongly related to lexical similarity, so word frequency reflects the strength of the connection to other words [42]. The effect of word frequency on access and similarity raises the possibility that word frequency influences the upcoming semantic interactions. Furthermore, in a serial search, word location is negatively correlated with log-frequency [43,44]. Another measure that is closely related to word frequency is degree centrality. For a directed graph

G

and the vertex

v ϵ V

, the in-degree of vertex v refers to the number of arcs that incident from v, and the out-degree refers to the number of arcs that incident to v. As will be shown below, word frequency is strongly correlated with degree distribution. Given all these considerations, in this analysis, we examine three strongly related but dissimilar measures—frequency, location, and degree centrality—and test the correlation between each of them and our object-based dynamic measure.

Analysis: As in our previous analyses, we constructed 90 networks based on the different possible combinations of the free parameters WS, MS, and number of iterations. For the directional approach, these were the mean correlations between

δ (s)

and each measure: log-frequency (M = 0.28, SD = 0.14), average location (M = −0.33, SD = 0.14), degree-in (M = 0.15, SD = 0.12), and degree-out (M = 0.24, SD = 0.12). For the nondirectional approach, the mean correlations between absolute

δ (s)

and each measure were as follows: log-frequency (M = 0.19, SD = 0.11 ), average location (M = −0.14, SD = 0.12), degree-in (M = 0.08, SD = 0.15), and degree-out (M = 0.17, SD = 0.11) (Figure 8).

In sum, the correlation with log-frequency, degree-in, and degree-out is low for both directional and non-directional approaches. We found the strongest connection between the directed approach and word location; words that change their environment tend to appear at the beginning of the task. By contrast, the correlation among log-frequency, word location, degree-in, and degree-out was near perfect.

4. Discussion

The explosion of big data increasingly includes temporal information, which plays a central role in shaping network structure. This study attempts to include temporal information in network analysis by shifting from a static to a temporal perspective. We present our object-based dynamic approach, which investigates the extent to which an object that is represented in the network by a vertex affects the network’s topology over time. We explore how the presence of an object may affect subsequent interactions between other objects, leading to topological changes in the network.

By combining tools from complex networks and differential geometry, we can present a fine-grained description of the object-based dynamic approach, an approach that enables us, like other researchers who have combined the same tools, to study applied problems [45,46,47]. The use of multigraphs provides a useful representation of network evolution, with each set of edges representing a distinct and discrete event in the flow of information over time. In our case, each object receives two sets of edges, one expressing the interactions conditional to the presence of another specific object, and the other expressing the interactions conditional to that object’s absence. With these two sets of edges, we can investigate whether the object’s direction affects the shape of the network. To do this, we implement Forman–Ricci flow separately to each set, aiming for a regularization effect for each set of edges and a revelation of their core topology. By allowing us to compare the geometry of the two sets, this process enables us to determine the impact of an object’s presence.

The main advantage of our measure is that it does not estimate the relative contribution of a vertex for a given set of weights, as can be done using various graph measures. Instead, the estimation is based on a comparison between two sets of weights, which makes it possible to examine the degree of influence of an object over time. Since the measure is object- rather than vertex-based, it does not represent a property of a vertex in the graph, and as a result, it cannot be used in the context of a structural analysis of the graph. It functions instead as a comparative measure between objects, shedding light on the direction and strength of an object’s influence on the relationships between other objects. As a result, one can distinguish between centrality measures and object-based dynamic measures. Centrality measures examine the connections that pass through, enter, or exit a given vertex in a graph with a defined set of weights. Here, we estimate the impact of a chosen object by comparing it to distinct sets of weights, the weights conditional to its presence and the weights conditional to its absence.

This comparative feature of our object-based approach makes it applicable to the analysis of social networks. To this aim, one might build two graphs for each person, with one graph representing the social interactions among other people given that person’s presence, and the other graph representing the social interactions among the same set of people given that person’s absence. By comparing the graphs, it would be possible to examine to what extent that person influences social relations. Here, a major constraint is the need for serial data in the construction of networks. The effect of the chosen person on the group can be examined only if we compare the social interactions that follow the person’s specific contribution to the social interactions that take place without that previous contribution. Data that would enable such a comparison include timestamped posts or responses in a social network; these would allow the researcher to see how the group’s interactions after that person’s input compared to the group’s interactions after other people’s input.

Our own context, however, is the semantic network. More specifically, we assess the impact of a word retrieved from memory on the subsequent words that the agent retrieves. Our purpose is to describe the effect of Forman–Ricci flow as a regularization term on the weights. The networks achieved with and without the use of this Forman–Ricci flow are significantly different on various internal validity scores; after regularization, the network cluster topology is much more defined. This finding is consistent with Ni, Lin, Luo, and Gao [13], but they use Ollivier curvature and not Forman curvature as we do. Another of our goals is to uncover the set of words whose presence significantly changes the topology. Finally, we examine whether the temporal perspective represented by our object-based dynamic indeed contributes new information beyond that provided by existing measures that are based on a static graph. At least for the empirical data provided by the retrieval task, we demonstrate that centrality measures, scalar curvature measures, word location, and frequency are poorly correlated with the object-based dynamic score.

A potential follow-up study might examine the differences between Ricci flow based on 1-dimensional Forman curvature and Ricci flow based on 2-dimensional Forman curvature. This research might investigate the differences in the speed with which the graph is characterized by singularity and the creation of a network characterized by well-defined, distinct communities.

In the context of a semantic network, our object-based measure both resembles and differs from the work of Nachshon, Cohen, and Maril [21], which demonstrates how a specific word might influence relations between other words. Their demonstration considers all the triplets (A,B,C) that violate the triangle inequality assumption (AB + BC ≥ AC). By replacing AC with the conditional edge BC preceded by A, one can assess whether the conditional edge will continue to violate triangle inequality. This approach, like ours, investigates the impact of an object on other interactions. However, while Nachshon, Cohen, and Maril examine local influence, a single edge at a time, this paper takes a global approach, building a whole new network to examine an object’s influence on the network’s structure. Further investigation will examine the fundamental difference between words that affect the metric and words that do not, and then take up the difference between words whose presence creates a tree-like space and words whose presence creates a spherical space.

Author Contributions

Conceptualization, H.C., Y.N., P.M.N., A.M. and E.S.; methodology, H.C. and Y.N.; formal analysis, H.C., Y.N., P.M.N., J.J. and E.S.; statistical analysis, H.C.; writing—original draft preparation, H.C.; writing—review and editing, H.C., J.J. A.M. and E.S.; visualization, H.C.; supervision, A.M. and E.S.; All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by the German-Israeli Foundation [grant I-1514-304.6/2019 to Emil Saucan and Jürgen Jost] and the Israel Science Foundation [grant 1471/20 to Anat Maril].

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the ethics committee of the Department of Psychology at the Hebrew University of Jerusalem (15 November 2021).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

All of the Python code and raw data material are publicly available via OSF (https://osf.io/832tc/, accessed on 17 August 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

This appendix includes the following elements: (1) descriptive statistics of the semantic graph and the object-based dynamic measure and (2) additional information regarding the cluster algorithms and internal validity measures.

Appendix A.1. Descriptive Analysis

Figure A1. (a) This graph presents the percentage of words belonging to the following frequency ranges: (1) 0–5; (2) 6–15; (3) 16–30; (4) 31–60; (5) 61–100; (6) 101–500; (7) >500. (b) This pie chart presents the empirical cumulative distribution of word frequency.

Figure A2. (a) Each cell in this heatmap represents the number of vertices that receive at least one weight given by the “distance function” for the pair of parameters MS (minimum subjects per edge) and WS (window size). (b) This heatmap is for the number of edges for the pair of parameters MS and WS. The x-axis denotes the range of MS values, and the y-axis denotes the range of WS values. The color highlights the number of vertices/edges. Bluer cells indicate a lower number of vertices/edges; redder cells indicate a higher number of vertices/edges.

A network was constructed for any possible combination among WS, MS, and the threshold, a total of 90 networks. For each network, a set of words received an object-based dynamic score. Figure A3 illustrates the number of scores that each word received out of 90 networks. In some cases, a word received a small number of scores since the combination of the free parameters sometimes left the word with no defined connection to other words.

Figure A3. The number of object-based dynamic scores that each word received in 90 networks where the free parameters of the object-based dynamic were WS, MS, and the threshold.

Two concerns arise here. First, is there a correlation between the number of scores and the object-based dynamic measure? To determine the answer, on the one hand, we extracted the average value of the object-based dynamic score for all the values obtained by up to 90 networks (i.e., obtained by all the possible combinations of the parameters WS and MS). On the other hand, we calculated the number of times, out of those 90 networks, that each word received a score. As Figure A4 shows, we found significant correlation between the mean score and the number of scores (

n = 130, r = 0.297, p = 0.001

). However, by removing 25 values that are 1 SD from the mean, the correlation is no longer significant (

n = 105, r = 0.167, p = 0.08

).

A second concern is whether a word that significantly changes the metric may be affected by the number of scores. Figure A4 shows the distribution of the number of scores of non-significant words compared to that of significant words. A Mann–Whitney U test revealed a significant difference between the groups (U = 1224, n₁ = 61, n₂ = 69, p < 0.001, two-tailed). The greater the number of scores, the higher the probability that the word will not be significant.

Figure A4. (a) This panel presents the correlation between the number of scores and the mean object-based dynamic score. (b) This panel presents the correlation after removing dots that are 1SD higher or lower than the mean. (c) This panel presents histograms of the number of scores for sets of significant and non-significant words.

Trivially, these findings suggest that sample size should be taken into account. A more reliable estimate of the number of words that significantly affect the metric would be the set of words characterized by high frequency. If one wants to take a conservative approach, one should remember that out of 23 words that received scores in at least 80 cases, 7 words significantly changed the metric.

Appendix A.2. Internal Validity Scores

In Section 2, we evaluated the differences in the cluster structure of the network before and after Ricci flow for lists of clusters given by agglomerative, Birch, K-means, and greedy modularity communities on a range of a number of communities (2, 3, …, 30). An internal validation was performed with the following calculations to assess the quality of the partition given by agglomerative, Birch, and K-means. The silhouette score is defined for each point as follows:

s i l (p) = \frac{c l o - d i f f}{\max (x, y)}

where clo denotes the average distance between point p and all other points in the same cluster, and diff denotes the average distance between point p and all points in the next nearest cluster. While clo expresses the closeness of points in the same cluster, diff expresses the differences between clusters. The silhouette score for the list of clusters is the average sil for all points p in the network.

The second score is the Calinski–Harabasz, for data set E of size

n_{E}

, which was clustered in K sets. Let us first define the between-set dispersion,

t r (B_{K})

, and the trace of within-set dispersion,

t r (W_{b})

:

t r (B_{K}) = \sum_{q = 1}^{k} \sum_{x \in c_{q}} (x - c_{q}) {(x - c_{q})}^{T}

t r (W_{b}) = \sum_{q = 1}^{k} \sum_{x \in c_{q}} (x - c_{q}) {(x - c_{q})}^{T}

The Calinski–Harabasz score is defined as the ratio between

t r (B_{K})

and

t r (W_{K})

:

S H = \frac{t r (B_{K})}{t r (W_{K})} \cdot \frac{n_{E} - k}{k - 1}

For both silhouette and Calinski–Harabasz, a high score means that the model has a well-defined cluster structure.

To assess the quality of the partition given by greedy modularity communities, we calculated coverage and performance. Coverage is defined by dividing the number of intra-community edges by the total number of edges. Thus, when a network has a well-defined cluster structure—in other words, when all edges of the network fall within clusters—the coverage is 1. The second measure, performance, is calculated for each cluster c as follows. Let C be a cluster, E the set of edges, and n the number of vertices.

P (c) = \frac{|\{(i, j) \in E, C_{i} = C_{j}\}| + |\{(i, j) \notin E, C_{i} \neq C_{j}\}|}{n (n - 1) / 2}

By definition, 0 ≤

P (c)

≤ 1. This score counts the number of occasions that two vertices belonging to the same community are connected by an edge and adds the number of occasions that two vertices belonging to different communities are not connected by an edge. Thus, the greater the value of P(c), the more well-defined the graph’s cluster structure.

References

Chatterjee, T.; Albert, R.; Thapliyal, S.; Azarhooshang, N.; DasGupta, B. Detecting network anomalies using Forman–Ricci curvature and a case study for human brain networks. Sci. Rep. 2021, 11, 1–14. [Google Scholar] [CrossRef] [PubMed]
Tannenbaum, A.; Sander, C.; Zhu, L.; Sandhu, R.; Kolesov, I.; Reznik, E.; Senbabaoglu, Y.; Georgiou, T. Ricci Curvature and Robustness of Cancer Networks. Available online: https://arxiv.org/abs/1502.04512 (accessed on 17 August 2022).
Pouryahya, M.; Mathews, J.; Tannenbaum, A. Comparing Three Notions of Discrete Ricci Curvature on Biological Networks. Available online: https://arxiv.org/abs/1712.02943 (accessed on 17 August 2022).
Sandhu, R.S.; Georgiou, T.T.; Tannenbaum, A.R. Ricci curvature: An economic indicator for market fragility and systemic risk. Sci. Adv. 2016, 2, e1501495. [Google Scholar] [CrossRef]
Hofmann, S.G.; Curtiss, J.; McNally, R.J. A complex network perspective on clinical science. Perspect Psychol. Sci. 2016, 11, 597–605. [Google Scholar] [CrossRef] [PubMed]
Rubinov, M.; Sporns, O. Complex network measures of brain connectivity: Uses and interpretations. Neuroimage 2010, 52, 1059–1069. [Google Scholar] [CrossRef]
Holme, P.; Saramäki, J. Temporal networks. Phys. Rep. 2012, 519, 97–125. [Google Scholar] [CrossRef]
Latapy, M.; Viard, T.; Magnien, C. Stream graphs and link streams for the modeling of interactions over time. Soc. Netw. Anal. Min. 2018, 8, 1–29. [Google Scholar] [CrossRef]
Banerjee, A. Structural distance and evolutionary relationship of networks. Biosystems 2012, 107, 186–196. [Google Scholar] [CrossRef]
Rossetti, G.; Cazabet, R. Community discovery in dynamic networks: A survey. ACM Comput. Surv. 2019, 51, 1–37. [Google Scholar] [CrossRef]
Weber, M.; Jost, J.; Saucan, E. Forman-Ricci flow for change detection in large dynamic data sets. Axioms 2016, 5, 26. [Google Scholar] [CrossRef]
Weber, M.; Saucan, E.; Jost, J. Coarse geometry of evolving networks. J. Complex Netw. 2018, 6, 706–732. [Google Scholar] [CrossRef]
Ni, C.C.; Lin, Y.Y.; Luo, F.; Gao, J. Community detection on networks with Ricci flow. Sci. Rep. 2019, 9, 1–12. [Google Scholar] [CrossRef] [PubMed]
Saucan, E.; Samal, A.; Jost, J. A simple differential geometry for complex networks. Netw. Sci. 2021, 9, S106–S133. [Google Scholar] [CrossRef]
Samal, A.; Sreejith, R.P.; Gu, J.; Liu, S.; Saucan, E.; Jost, J. Comparative analysis of two discretizations of Ricci curvature for complex networks. Sci. Rep. 2018, 8, 1–16. [Google Scholar] [CrossRef]
Saucan, E.; Sreejith, R.P.; Vivek-Ananth, R.P.; Jost, J.; Samal, A. Discrete Ricci curvatures for directed networks. Chaos Solitons Fractals 2019, 118, 347–360. [Google Scholar] [CrossRef]
Sreejith, R.P.; Mohanraj, K.; Jost, J.; Saucan, E.; Samal, A. Forman curvature for complex networks. J. Stat. Mech. 2016, 6, 063206. [Google Scholar] [CrossRef]
Caldwell-Harris, C.L. Frequency effects in reading are powerful–but is contextual diversity the more important variable? Lang Linguist Compass 2021, 15, e12444. [Google Scholar] [CrossRef]
Boersma, P.; Weenink, D. Praat: Doing Phonetics by Computer (Version 6.1.39). Available online: https://www.praat.org (accessed on 16 July 2020).
Collins, A.M.; Loftus, E.F. A spreading-activation theory of semantic processing. Psychol. Rev. 1975, 82, 407–428. [Google Scholar] [CrossRef]
Nachshon, Y.; Cohen, H.; Maril, A. Empirical Evidence for a Semantic Distance in a Patch: Investigating Symmetry and the Triangle Inequality Violations. Available online: https://psyarxiv.com/fhkq8/ (accessed on 10 August 2022). [CrossRef]
Forman, R. Bochner’s method for cell complexes and combinatorial Ricci curvature. Discrete Comput. Geom. 2003, 29, 323–374. [Google Scholar] [CrossRef]
Bauer, F.; Hua, B.; Jost, J.; Liu, S.; Wang, G. The geometric meaning of curvature: Local and nonlocal aspects of Ricci curvature. In Modern Approaches to Discrete Curvature; Najman, L., Romon, P., Eds.; Lecture Notes in Mathematics; Springer: Cham, Switzerland, 2017; Volume 2184, pp. 1–62. [Google Scholar] [CrossRef]
Aktas, M.E.; Akbas, E.; El Fatmaoui, A. Persistence homology of networks: Methods and applications. Appl. Netw. Sci. 2019, 4, 1–28. [Google Scholar] [CrossRef]
Hamilton, R.S. Three-manifolds with positive Ricci curvature. J. Differential Geom. 1982, 17, 255–306. [Google Scholar] [CrossRef]
Chow, B.; Luo, F. Combinatorial Ricci flows on surfaces. J. Differential Geom. 2003, 63, 97–129. [Google Scholar] [CrossRef]
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Müller, A.; Nothman, J.; Louppe, P.; et al. E. Scikit-learn: Machine learning in Python. J. Mach Learn Res. 2011, 12, 2825–2830. [Google Scholar] [CrossRef]
Hills, T.T.; Kenett, Y.N. Is the mind a network? maps, vehicles, and skyhooks in cognitive network science. Top Cogn. Sci. 2022, 14, 189–208. [Google Scholar] [CrossRef]
Wulff, D.U.; Hills, T.; Hertwig, R. Memory Is One Representation Not Many: Evidence Against Wormholes in Memory. Available online: https://psyarxiv.com/b5ynj/ (accessed on 16 January 2020). [CrossRef]
Borgatti, S.P.; Everett, M.G. A graph-theoretic perspective on centrality. Soc. Netw. 2006, 28, 466–484. [Google Scholar] [CrossRef]
Freeman, L.C. Centrality in social networks conceptual clarification. Soc. Netw. 1978, 1, 215–239. [Google Scholar] [CrossRef]
Brin, S.; Page, L. The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. ISDN Syst. 1998, 30, 107–117. [Google Scholar] [CrossRef]
Freeman, L.C. A set of measures of centrality based on betweenness. Sociometry 1977, 40, 35–41. [Google Scholar] [CrossRef]
Cohen, H.; Nachshon, Y.; Naim, P.M.; Jost, J.; Saucan, E.; Maril, A. Local Detour Centrality: A Novel Local Centrality Measure for Weighted Networks. Available online: https://arxiv.org/abs/2208.03158 (accessed on 5 August 2020).
Ollivier, Y. Ricci curvature of metric spaces. C. R. Math. 2007, 345, 643–646. [Google Scholar] [CrossRef]
Haantjes, J. Distance geometry. Curvature in abstract metric spaces. Proc. Kon. Ned. Akad Wetenseh. 1947, 50, 496–508. [Google Scholar]
Samal, A.; Pharasi, H.K.; Ramaia, S.J.; Kannan, H.; Saucan, E.; Jost, J.; Chakraborti, A. Network geometry and market instability. R. Soc. Open Sci. 2021, 8, 1–18. [Google Scholar] [CrossRef]
Forster, K.I.; Chambers, S.M. Lexical access and naming time. J. Verbal Learning Verbal Behav. 1973, 12, 627–635. [Google Scholar] [CrossRef]
Scarborough, D.L.; Cortese, C.; Scarborough, H.S. Frequency and repetition effects in lexical memory. J. Exp. Psychol. Hum. Percept. Perform. 1977, 3, 1–17. [Google Scholar] [CrossRef]
Morton, J. Interaction of information in word recognition. Psychol. Rev. 1969, 76, 165–178. [Google Scholar] [CrossRef]
Brysbaert, M.; New, B. Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behav. Res. Methods 2009, 41, 977–990. [Google Scholar] [CrossRef] [PubMed]
Nelson, D.L.; McEvoy, C.L. What is this thing called frequency? Mem. Cognit. 2000, 28, 509–522. [Google Scholar] [CrossRef]
Adelman, J.S.; Brown, G.D.A. Modeling lexical decision: The form of frequency and diversity effects. Psychol. Rev. 2008, 115, 214–229. [Google Scholar] [CrossRef] [PubMed]
Rubenstein, H.; Garfield, L.; Millikan, J.A. Homographic entries in the internal lexicon. J. Verbal Learn. Verbal Behav. 1970, 9, 487–494. [Google Scholar] [CrossRef]
Aguirre-Hernández, B.; Frías-Armenta, M.E.; Muciño-Raymundo, J. Geometry and dynamics of the Schur-Cohn stability algorithm for one variable polynomials. Math. Control. Signal. Syst. 2019, 31, 545–587. [Google Scholar] [CrossRef]
Gu, X.; Wang, Y.; Chan, T.F.; Thompson, P.M.; Yau, S.T. Genus zero surface conformal mapping and its application to brain surface mapping. In Information Processing in Medical Imaging; Lecture Notes in Computer Science; Taylor, C., Noble, J.A., Eds.; Springer: Berlin/Heidelberg, Germany, 2003; Volume 2732, pp. 172–184. ISBN 978-3-540-40560-3. [Google Scholar]
Angenent, S.; Pichon, E.; Tannenbaum, A. Mathematical methods in medical image processing. Bull. New Ser. Am. Math. Soc. 2006, 43, 365–396. [Google Scholar] [CrossRef] [Green Version]

Figure 1. The distribution of (a) weights, (b) Forman–Ricci curvature, and (c) weights after 20 iterations of Forman–Ricci flow.

Figure 2. The evolution of edge weights as a function of Forman–Ricci curvature (represented by the different colors) over 20 iterations of Forman–Ricci flow on a semantic network with 183 vertices and 6018 edges. Each line represents the evolution of an edge weight along iterations of Forman–Ricci flow.

Figure 3. The clustering performance of the Forman–Ricci flow metric compared to graph weights for four different cluster algorithms and two internal validity scores.

Figure 4. The upper panel represents the set of non-significant words, and the lower panel the set of significant words. Each bar represents the range between the upper and lower boundaries given by the bootstrap test. The numbers represent the number of values out of 90 networks that each word received.

Figure 5. Spearman correlations between object-based dynamic and centrality measures. “Dynamic” denotes the directed approach, and “Dynamic abs” denotes the undirected approach.

Figure 6. Four archetypical triangles in a directed network. For a directed edge from

v_{1}

to

v_{2}

, colored in black, the incoming edges are colored in blue and the outgoing edges are colored in red. Type (a) represents a backward loop that contributes positively to the curvature. Type (b) represents a forward loop that contributes negatively to the curvature. Since Types (c,d) have no defined direction, we do not take them into account in our calculations.

Figure 6. Four archetypical triangles in a directed network. For a directed edge from

v_{1}

to

v_{2}

, colored in black, the incoming edges are colored in blue and the outgoing edges are colored in red. Type (a) represents a backward loop that contributes positively to the curvature. Type (b) represents a forward loop that contributes negatively to the curvature. Since Types (c,d) have no defined direction, we do not take them into account in our calculations.

Figure 7. Spearman correlations between object-based dynamic and scalar curvature measures. “Dynamic” denotes the directed approach, and “Dynamic abs” denotes the undirected approach.

Figure 8. This graph presents the Spearman correlations between the object-based dynamic and log-frequency, average location, degree-in, and degree-out.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Cohen, H.; Nachshon, Y.; Maril, A.; Naim, P.M.; Jost, J.; Saucan, E. Object-Based Dynamics: Applying Forman–Ricci Flow on a Multigraph to Assess the Impact of an Object on The Network Structure. Axioms 2022, 11, 486. https://doi.org/10.3390/axioms11090486

AMA Style

Cohen H, Nachshon Y, Maril A, Naim PM, Jost J, Saucan E. Object-Based Dynamics: Applying Forman–Ricci Flow on a Multigraph to Assess the Impact of an Object on The Network Structure. Axioms. 2022; 11(9):486. https://doi.org/10.3390/axioms11090486

Chicago/Turabian Style

Cohen, Haim, Yinon Nachshon, Anat Maril, Paz M. Naim, Jürgen Jost, and Emil Saucan. 2022. "Object-Based Dynamics: Applying Forman–Ricci Flow on a Multigraph to Assess the Impact of an Object on The Network Structure" Axioms 11, no. 9: 486. https://doi.org/10.3390/axioms11090486

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Object-Based Dynamics: Applying Forman–Ricci Flow on a Multigraph to Assess the Impact of an Object on The Network Structure

Abstract

1. Introduction

2. Materials and Methods

3. Results

3.1. Forman–Ricci Flow

3.1.1. Forman–Ricci Curvature

3.1.2. Forman–Ricci Flow

3.1.3. Analysis

3.2. Object-Based Dynamic Measure

3.2.1. Study Case

3.2.2. Analysis

3.3. Exploratory Comparison Analysis

3.3.1. Centrality Measures

3.3.2. Scalar Curvature

3.3.3. Frequency and Word Location

4. Discussion

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Appendix A

Appendix A.1. Descriptive Analysis

Appendix A.2. Internal Validity Scores

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI