Article

Wasserstein Dissimilarity for Copula-Based Clustering of Time Series with Spatial Information

by
Alessia Benevento
and
Fabrizio Durante
*
Dipartimento di Scienze dell’Economia, Università del Salento, 73100 Lecce, Italy
*
Author to whom correspondence should be addressed.
Mathematics 2024, 12(1), 67; https://doi.org/10.3390/math12010067
Submission received: 13 November 2023 / Revised: 12 December 2023 / Accepted: 21 December 2023 / Published: 24 December 2023
(This article belongs to the Special Issue Stochastic Processes: Theory, Simulation and Applications)

Abstract

The clustering of time series with geo-referenced data requires a suitable dissimilarity matrix that captures the comovements of the time series while taking the spatial constraints into account. In this paper, we propose a new way to compute such a dissimilarity matrix, merging both types of information, which leverages the Wasserstein distance. We then adopt a quasi-Gaussian assumption that yields more convenient formulas in terms of the joint correlation matrix. The method is illustrated in a case study involving climatological data.

1. Introduction

Time-series cluster analysis collects various unsupervised learning techniques for organizing data points collected over time into groups based on their similarity. The general objective is to maximize data similarity within clusters and minimize it across clusters (see, for instance, [1,2]).
Usually, such methods differ in how the similarity among time series is defined, which can be based on (a) the observed time series values or suitable transformations thereof; (b) specific features extracted from the time series, such as autocorrelations, periodograms, etc.; or (c) the data-generating stochastic process (see [3]).
Copula-based clustering methods for time series tend to group time series according to the degree of cross-sectional dependence among different patterns (see [4] and references therein). Typical choices include rank-correlation measures [5], tail dependence coefficients [6,7,8], and copula distances and/or divergences [9,10,11,12,13]. A distinctive feature is that the obtained cluster composition is invariant under monotone transformations of the original time series; hence, it is usually more robust against the presence of outliers. Copula-based algorithms thus focus on the joint comovements among time series and, as such, are particularly used in risk analysis due to their ability to detect extreme scenarios that occur when both time series tend to fall or rise at the same time [6,7,14,15,16].
When time series are collected at different geographic locations, i.e., geo-referenced data are considered where a variable is observed over time at a static spatial location, copula-based algorithms may, hence, detect the impacts of extreme events across space [17]. In such cases, it may be of interest to obtain clusters that show geographical proximity in order to enhance their interpretability. See, for instance, [18,19]. These situations occur not only in environmental science, but also in the financial context. In this latter case, the attributes may describe the economic distance between the markets and can provide deeper insights into the dependence structure of stock returns (see, e.g., [20,21,22]).
The goal of this paper is to consider clustering of time series for geo-referenced data. In particular, we focus on the construction of a novel pairwise dissimilarity among time series that takes into account the spatial information.
To fix ideas, consider n ≥ 2 real-valued time series x_i = (x_{i1}, …, x_{iT}) of length T, where x_{it} represents the observed value of the i-th time series (i = 1, …, n) at the t-th period (t = 1, …, T). Each time series is equipped with additional information on the phenomenon under consideration, represented by a p-dimensional vector of attributes, s_i = (s_{i1}, …, s_{ip}) ∈ ℝ^p, where s_{ik} represents the value of the i-th unit (i = 1, …, n) at the k-th attribute (k = 1, …, p). The additional attributes are typically the geographic coordinates of the location at which the i-th time series is observed (thus, p = 2, and latitude and longitude are collected). For this reason, we will refer to these vectors as the spatial information.
The classical strategy in spatio-temporal clustering (see, e.g., [23]) is to start from an (n × n) matrix of pairwise dissimilarities among x_1, …, x_n. This matrix is then modified by a nonlinear function of the separating distances among s_1, …, s_n. Finally, the modified dissimilarity matrix is used as the input to obtain the clustering partition via algorithms such as (agglomerative) hierarchical clustering, PAM, and fuzzy C-means (see, e.g., [24,25,26]). In the copula-based framework, such an approach has been adopted, for instance, in [27,28], which consider the temporal matrix induced by the (pairwise) copula parameters. In [29], instead, the correlation-based dissimilarity matrix is transformed by means of the spatial information, taking into account the intrinsic geometry of the space of correlation matrices. According to [30], all these methods tend to produce a smooth dissimilarity matrix without, however, reinforcing the spatial contiguity of the resulting clusters.
In [11] (see also [31]), a different strategy has been proposed. It consists of associating with the spatial information (usually contained in the Euclidean space) a suitable copula (the spatial copula). This object is then combined with the temporal copula to obtain an element of the copula space that merges temporal and spatial information and is used for the calculation of the associated dissimilarity.
The present work exploits tools from optimal transport and the Wasserstein metric (see, e.g., [32,33]) to modify the framework of [11] by introducing the following main novelties: (a) the dissimilarity measure is constructed by equipping the space C of bivariate copulas with the Wasserstein metric d_{W_2}; (b) the temporal and spatial information is merged (at the level of pairwise association) via geodesic curves [34] (also known as weighted barycenters [35]) in (C, d_{W_2}). Such an approach has the following advantages:
  • The Wasserstein distance allows for a meaningful comparison between distributions even when they do not admit densities. This property is not shared by the most common distances and divergences, such as the total variation distance, the Hellinger distance, or the Kullback–Leibler divergence (see, e.g., [36]).
  • The Wasserstein metric seems more appropriate for measuring the distance between copulas, since it does not lead to counter-intuitive clusters (see, e.g., [37]).
As a possible limitation, however, the calculations cannot always be made explicit. For this reason, we rely on a quasi-Gaussian approximation of the presented framework, as in [38].
The paper is organized as follows. In Section 2, we review some tools from optimal transport theory that will be used in the manuscript. In Section 3, we present the methodology underlying the computation of a dissimilarity matrix. In Section 4, we present a quasi-Gaussian approximation of the dissimilarity matrix extraction. Section 5 illustrates the effect of the spatial dependence on the whole procedure. Finally, Section 6 presents an empirical application. Section 7 concludes the paper.

2. Background on Optimal Transport, Wasserstein Distance, and Copulas

In order to extract the spatial information and to merge the temporal and spatial dependencies, we will leverage the 2-Wasserstein distance between probability measures [39,40] and optimal transport theory [32,33]. To this end, we recall some basic facts that will be used in this paper (see, e.g., [32,33,39]).
Given a domain Ω ⊆ ℝ^d (that we take to be compact and convex for simplicity), we define the space W_2(Ω) as the space of probability measures on Ω with finite moments of order 2, endowed with the distance d_{W_2} defined through

$$ d_{W_2}(\mu_X, \mu_Y) = \inf_{\gamma \in \Gamma(\mu_X, \mu_Y)} \left( \int_{\Omega \times \Omega} |x - y|^2 \, d\gamma(x, y) \right)^{1/2}, $$

where Γ(μ_X, μ_Y) denotes the collection of all joint measures on Ω × Ω whose marginals are μ_X and μ_Y, respectively.
It is possible to prove that the above minimization problem has a solution, which is unique (and it is concentrated on the graph of a map f O T : Ω Ω , called the optimal transport map) if μ X is absolutely continuous with respect to the Lebesgue measure (actually, it is enough that μ X vanishes on small sets [32]). Moreover, d W 2 is a distance in W 2 ( Ω ) .
The space W_2(Ω) is a geodesic space where, for two measures μ_X and μ_Y, the unique geodesic curve connecting them is obtained through

$$ \mu(\alpha) = \big( (1-\alpha)\, \mathrm{id} + \alpha\, f_{\mathrm{OT}} \big)_{\#}\, \mu_X, $$

where f_OT is the optimal transport map from μ_X to μ_Y, id is the identity mapping on ℝ^d, and α ∈ [0, 1]. Here, T_#μ denotes the push-forward of the measure μ under the mapping T. Such a curve is also known as the displacement interpolation [34]. This provides a useful interpolation between μ_X and μ_Y, which is, in general, different from the convex combination of the two measures. As a matter of fact, μ(α) coincides with the weighted barycenter [35] of (μ_X, 1−α) and (μ_Y, α), i.e., it solves the minimization problem

$$ \mu(\alpha) = \operatorname*{arg\,min}_{\mu} \; (1-\alpha)\, d_{W_2}^2(\mu_X, \mu) + \alpha\, d_{W_2}^2(\mu_Y, \mu) $$
over all possible probability measures μ (see also [41,42]). Wasserstein barycenters have found various applications in machine learning, especially in dealing with image recognition (see, for instance, [39]).
The actual computations related to Wasserstein distances simplify greatly when we deal with Gaussian measures. In fact, the Wasserstein distance between centered Gaussian distributions is well known, and it is given by the so-called Bures–Wasserstein distance between their covariance matrices. Specifically, let X ∼ N(0, Σ_1) and Y ∼ N(0, Σ_2) be two Gaussian random vectors with positive semi-definite covariance matrices Σ_1 and Σ_2, respectively. Then, the following holds (see, e.g., Theorem 2.2 in [43]):

$$ d_{W_2}^2(X, Y) = d_{W_2}^2(\Sigma_1, \Sigma_2) = \operatorname{tr}(\Sigma_1) + \operatorname{tr}(\Sigma_2) - 2 \operatorname{tr}\!\left( \left( \Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2} \right)^{1/2} \right). $$

Such a formula also extends to the case when Σ_1 and Σ_2 are singular (see [44], p. 239). In the particular case that X and Y are also two-dimensional, the Wasserstein distance has the following expression (see [44], p. 239):

$$ d_{W_2}^2(X, Y) = \operatorname{tr}(\Sigma_1) + \operatorname{tr}(\Sigma_2) - 2 \left( \operatorname{tr}(\Sigma_1 \Sigma_2) + 2 \det(\Sigma_1 \Sigma_2)^{1/2} \right)^{1/2}. $$

The (weighted) barycenter of the Gaussian vectors X and Y is a Gaussian vector whose covariance matrix is of the form (see, e.g., Lemma 2.3 in [43])

$$ \Sigma(\alpha) = \Sigma_1 \,\#_\alpha\, \Sigma_2 = \Sigma_1^{-1/2} \left( (1-\alpha)\, \Sigma_1 + \alpha \left( \Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2} \right)^{1/2} \right)^{2} \Sigma_1^{-1/2}. $$
Standard references for the study of the Wasserstein distance for Gaussian measures include [44,45,46,47].
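The Gaussian formulas above lend themselves to a direct numerical check. The following minimal sketch (function names are ours, not the authors' code) computes the squared Bures–Wasserstein distance and the weighted barycenter covariance with NumPy/SciPy:

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_wasserstein_sq(S1, S2):
    # Squared 2-Wasserstein distance between N(0, S1) and N(0, S2):
    # tr(S1) + tr(S2) - 2 tr((S1^{1/2} S2 S1^{1/2})^{1/2}).
    root = sqrtm(S1)
    cross = sqrtm(root @ S2 @ root)
    return float(np.real(np.trace(S1) + np.trace(S2) - 2.0 * np.trace(cross)))

def bw_geodesic(S1, S2, alpha):
    # Weighted Bures-Wasserstein barycenter Sigma_1 #_alpha Sigma_2;
    # requires S1 to be nonsingular.
    root = sqrtm(S1)
    inv_root = np.linalg.inv(root)
    mid = (1.0 - alpha) * S1 + alpha * sqrtm(root @ S2 @ root)
    return np.real(inv_root @ mid @ mid @ inv_root)
```

For 2 × 2 covariance matrices, the output of `bures_wasserstein_sq` agrees with the bivariate closed form above, and the geodesic reduces to the two endpoints at α = 0 and α = 1.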
Beyond the Gaussian cases, explicit expressions for the Wasserstein distance and/or the optimal couplings for multivariate measures are rare. However, the involved probability measures can be approximated by taking consistent empirical versions of the input measures. See, for instance, [39,40].
In the following, we will mainly consider doubly stochastic measures [48], i.e., probability measures on [0, 1]² whose margins coincide with the Lebesgue measure on [0, 1]. These measures are better known as copulas. The space C of bivariate copulas can be ordered with respect to the pointwise order among functions. It contains the Fréchet–Hoeffding upper-bound copula M(u, v) = min(u, v), which captures the maximal degree of similarity (comonotonicity) between two random variables. Moreover, the Fréchet–Hoeffding lower bound is given by the copula W(u, v) = max(u + v − 1, 0), which captures the counter-monotonic behavior of two variables. Another notable copula is Π(u, v) = uv, which represents the independence of random variables.
The set C , considered as a compact subset of W 2 ( [ 0 , 1 ] 2 ) with respect to weak convergence, will be equipped with the metric d W 2 . In general, the use of the Wasserstein metric for capturing dependence aspects has been recently employed in [38,49,50], among others. Typically, copula-based dependence measurements are expressed as a discrepancy of the estimated copula from the independent case. Here, instead, we will consider the Wasserstein distance between a copula and the comonotonicity copula M. This perspective has been also recently considered in [36] in a Bayesian setting.
Copulas have been recently adopted in the construction of various dissimilarity measures that are used as input to clustering algorithms (see, for instance, [51] and references therein). The use of a dissimilarity based on the Wasserstein distance between copulas appeared, for instance, in [37] in order to measure the distances among different Gaussian copulas for clustering purposes (see also [52]). Here, we extend the range of applications to the case of time series with geo-referenced information.

3. The Methodology

Our aim is to cluster n 2 time series, each of them equipped with a (static) p-dimensional vector of attributes associated with each unit, which are typically geographic coordinates of the location at which the time series are observed.
Given the input time series, the (static) attributes, and a fixed weighting parameter α ∈ [0, 1], we aim to find an (n × n) dissimilarity matrix Δ(α) that merges all the previous information. The (i, j)-entry Δ_ij(α) of Δ(α) captures the temporal dependence between units i and j while taking into account the similarity between s_i and s_j. The relative importance of the two criteria is summarized by the parameter α; in particular, α = 0 corresponds to no influence of the spatial component.
For every pair (i, j) with i ≠ j, Δ_ij(α) is constructed in the following way:
  • Determine the copula C_ij^ts that describes the temporal dependence between the i-th and j-th time series.
  • Determine the copula C_ij^sp that captures the spatial proximity between the attribute vectors s_i and s_j associated with the i-th and j-th time series.
  • Merge C_ij^ts and C_ij^sp into one single copula C_ij(α) that represents their weighted barycenter. This copula depends on the tuning parameter α. Then, define

    $$ \Delta_{ij}(\alpha) = d_{W_2}(M, C_{ij}(\alpha)), $$

    i.e., the distance of C_ij(α) from the comonotonicity copula M, which represents maximal concordance.
Once the dissimilarity matrix is obtained, it can be used as an input of various algorithms like hierarchical agglomerative methods, medoids-based procedures, etc.
The previously described procedure is illustrated in detail below.

3.1. Extract the Temporal Dependence

In order to capture the dependence among the time series, it is usual in the context of copula-based clustering (see, for instance, [4]) to proceed in two steps. First, we filter the serial dependence using univariate time-series models, such as the ARMA–GARCH family, and then we model the cross-sectional dependence using a copula for the residuals (see, for instance, [53,54,55,56]).
To fix ideas, we assume that the multivariate time series X_t = (X_{1t}, …, X_{nt}) follows the model

$$ X_t = \mu_t(\theta) + \sigma_t(\theta)\, \varepsilon_t, \qquad (t = 1, \dots, T), $$

where the innovations ε_t = (ε_{1t}, …, ε_{nt}) are independent and identically distributed, with E(ε_{it}) = 0 and V(ε_{it}) = 1 for i = 1, …, n, and with continuous joint distribution function H. Moreover, μ_t and σ_t are the (time-varying) conditional mean and standard deviation, respectively; they are both F_{t−1}-measurable and independent of ε_t. Here, F_{t−1} contains information from the past and possibly information from exogenous variables as well (see, e.g., [55]). Since the distribution function H is continuous, there exists a unique copula C so that, for all x ∈ ℝ^n,

$$ H(x) = C(F_1(x_1), \dots, F_n(x_n)), $$

where F_i is the distribution function of ε_{it} for every t. Defining U_t = (F_1(ε_{1t}), …, F_n(ε_{nt})), one obtains that U_1, …, U_T are independent and identically distributed with distribution function C. Since the marginal distributions are unknown, U_t is not observable. However, given an estimator θ̂ of θ, we can compute the residuals

$$ e_t = \frac{x_t - \mu_t(\hat{\theta})}{\sigma_t(\hat{\theta})} $$

for t = 1, …, T. The ranks associated with the residuals contain the information about the copula among the time series for any fixed t and, as such, can be used to capture the cross-sectional dependence (see, e.g., [56]). Specifically, for any t = 1, …, T, let r_{it} be the rank of e_{it} among the residuals e_{i1}, …, e_{iT} of the i-th time series. The multivariate scaled ranks

$$ u_t = \left( \frac{r_{1t}}{T+1}, \dots, \frac{r_{nt}}{T+1} \right) $$
are the so-called pseudo-observations.
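The rank transformation above can be sketched in a few lines (function name ours); the residuals are arranged as a (T, n) array, with one column per time series:

```python
import numpy as np
from scipy.stats import rankdata

def pseudo_observations(residuals):
    # Column-wise scaled ranks r_it / (T + 1), mapping each series of
    # residuals into (0, 1); residuals has shape (T, n).
    T = residuals.shape[0]
    return rankdata(residuals, axis=0) / (T + 1)
```

The scaling by T + 1 (rather than T) keeps the pseudo-observations strictly inside the unit interval, which is convenient for subsequent copula estimation.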
Now, for every pair (i, j), i ≠ j, the copula C_ij^ts that describes the cross-sectional dependence between the i-th and j-th time series can be obtained from the associated pseudo-observations (u_{it}, u_{jt})_{t=1,…,T}. Here, one can adopt:
  • a parametric approach, i.e., one assumes that C_ij^ts belongs to a given parametric family of copulas, whose parameter can be fitted via, e.g., maximum likelihood techniques (see, e.g., [57]);
  • a non-parametric approach, which assumes that C_ij^ts coincides with a smoothed version of the empirical copula associated with the pseudo-observations, such as the empirical checkerboard copula [58,59] or the empirical beta copula [60].
In the following, if not otherwise stated, we assume that every C i j ts is absolutely continuous (with respect to the Lebesgue measure).

3.2. Extract the Spatial Dependence

In order to describe the geographical proximity, we preliminarily select a specific family of copulas that can be parameterized as (C_θ)_{θ∈Θ}, with Θ = [θ_min, θ_max] ⊂ ℝ. In particular, we assume that M ∈ {C_{θ_min}, C_{θ_max}}, i.e., the family includes the Fréchet–Hoeffding upper bound as a limiting case. Moreover, the following technical condition is assumed:

$$ \theta \mapsto d_{W_2}(M, C_\theta) \quad \text{is continuous and strictly monotone in } \theta. $$

The continuity should be interpreted in the sense of the topology of uniform convergence, which, for copulas, is equivalent to pointwise convergence (see, e.g., [48]). Roughly speaking, condition (10) implies a one-to-one correspondence between the copula parameter and the Wasserstein distance between M and any member of the given parametric family.
Now, for every pair (i, j), i ≠ j, we assume that the copula C_ij^sp that describes the proximity between the attributes s_i and s_j belongs to (C_θ)_{θ∈Θ}, i.e., C_ij^sp = C_{θ_ij}. Moreover, we assume that θ_ij depends only on the (normalized) distance d_ij between s_i and s_j.
Furthermore, we want to ensure that C_{θ_ij} approaches M, i.e., it is close to the comonotonic case, when the normalized geographic distance is close to zero. To this end, we propose to select, for every pair (i, j), θ_ij as the unique value of θ satisfying the equality:

$$ \frac{d_{W_2}(M, C_\theta)}{\max_{\theta \in \Theta} d_{W_2}(M, C_\theta)} = \frac{\operatorname{dist}(s_i, s_j)}{\max_{i,j} \operatorname{dist}(s_i, s_j)} \;(= d_{ij}). $$

The existence and uniqueness of such a parameter is guaranteed by (10).
Various copula families satisfy the conditions stated above.
Example 1.
Consider the parametric family of copulas of type

$$ C_\theta^{\mathrm{Fre}}(u, v) = (1-\theta)\, M(u, v) + \theta\, W(u, v), $$

where θ ∈ [0, 1], and M and W are the Fréchet–Hoeffding upper- and lower-bound copulas, respectively. Copulas of type (12) belong to the so-called Fréchet class, which was suggested as a possible model for spatial dependence in [11]. For such copulas, we have (see Appendix A)

$$ d_{W_2}(M, C_\theta^{\mathrm{Fre}}) = \sqrt{\theta/3}. $$

Moreover, the family is continuous in θ with respect to uniform convergence.
Now, according to Equation (11), if such a family is used to model spatial dependence, then the parameter is chosen so that

$$ \frac{\sqrt{\theta_{ij}/3}}{\sqrt{1/3}} = \frac{\operatorname{dist}(s_i, s_j)}{\max_{i,j} \operatorname{dist}(s_i, s_j)} = d_{ij}, $$

which gives θ_ij = d_ij².
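Assuming the closed form above, mapping a matrix of pairwise attribute distances to Fréchet-family parameters reduces to squaring the normalized distances. The helper below is our own illustration, not part of the authors' code:

```python
import numpy as np

def frechet_spatial_parameter(dist_matrix):
    # theta_ij = (dist_ij / max dist)^2: zero distance gives theta = 0
    # (the comonotone copula M), maximal distance gives theta = 1 (W).
    d = np.asarray(dist_matrix, dtype=float)
    return (d / d.max()) ** 2
```

Note that the quadratic mapping makes the spatial copula approach M faster than linearly as two locations get closer.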

3.3. Create the Dissimilarity Measure

In order to merge the temporal and spatial information, for a fixed α ∈ [0, 1], we associate with each pair of units (i, j), i ≠ j, the copula C_ij(α) given by the displacement interpolation (2). Namely, C_ij(α) is the copula associated with the probability measure μ(α) that solves the minimization problem

$$ \mu(\alpha) = \operatorname*{arg\,min}_{\mu \in W_2([0,1]^2)} \; (1-\alpha)\, d_{W_2}^2\!\left(\mu_{C_{ij}^{\mathrm{ts}}}, \mu\right) + \alpha\, d_{W_2}^2\!\left(\mu_{C_{ij}^{\mathrm{sp}}}, \mu\right), $$

where μ_{C_ij^ts} and μ_{C_ij^sp} are the measures associated with the copulas C_ij^ts and C_ij^sp, respectively. The existence of such a copula is guaranteed under the assumption that C_ij^ts is absolutely continuous. Moreover, in such a case, μ(α) is also absolutely continuous and admits a unique copula in view of Sklar's theorem [48].
Remark 1.
Generally, given two copulas C 1 and C 2 , the copula of their weighted barycenter, given by (2), does not coincide with the convex combination of the two copulas. This fact follows from, e.g., [34], and it is illustrated in Figure 1.
Finally, according to a general way of calculating dissimilarity as a distance from the comonotonic case [51], we define the dissimilarity between the time series i and j as

$$ \Delta_{ij}(\alpha) = d_{W_2}(M, C_{ij}(\alpha)). $$

4. The Quasi-Gaussian Approach

Although theoretically appealing, the actual computation of the dissimilarity in (14) is involved. In fact, μ(α) (and, hence, C_ij(α)) cannot generally be calculated in closed form. Therefore, following seminal ideas in [38], we use a quasi-Gaussian approach based on correlation matrices in order to define a modified dissimilarity. The main idea is to replace the copula space with the space of all bivariate Gaussian distributions with standard marginals. Such distributions are called G-copulas in [38], and their space will be denoted by G. In this latter case, the calculations can be carried out purely in terms of the corresponding correlation matrices, as recalled in Section 2.
We recall that each G-copula is characterized by a Gaussian vector with mean (0, 0) and correlation matrix

$$ \Sigma(\theta) = \begin{pmatrix} 1 & \theta \\ \theta & 1 \end{pmatrix}. $$
Then, for every pair ( i , j ) , i j , we proceed as follows.
  • The copula C_ij^ts from Section 3.1 is replaced with the G-copula with correlation matrix Σ(θ̂_ij^ts), where θ̂_ij^ts equals the estimate of the normal score correlation among the involved observations (as suggested in [38]).
  • The copula C_ij^sp from Section 3.2 is replaced with the G-copula with correlation matrix Σ(θ̂_ij^sp), where θ̂_ij^sp is the unique value θ that solves

    $$ \frac{d_{W_2}(\Sigma(1), \Sigma(\theta))}{\max_{\theta \in [0,1]} d_{W_2}(\Sigma(1), \Sigma(\theta))} = d_{ij}, $$

    where d_ij is the normalized distance between s_i and s_j. Notice that Σ(1) is the (singular) correlation matrix showing maximal dependence (comonotonic case). For the existence and uniqueness of θ̂_ij^sp, see Remark 2. In particular, we notice that since θ ∈ [0, 1], the maximal spatial distance is interpreted by the vector of independent components.
  • For a fixed α ∈ (0, 1), the copula C_ij(α) from Section 3.3 is replaced with the G-copula having the correlation matrix

    $$ \Sigma(\alpha) = D^{-1/2}\, \bar{\Sigma}\, D^{-1/2}, $$

    where Σ̄ = Σ(θ̂_ij^ts) #_α Σ(θ̂_ij^sp) is the weighted barycenter of Σ(θ̂_ij^ts) and Σ(θ̂_ij^sp) given by (6), and D is the diagonal matrix with the diagonal entries of Σ̄. In other words, we transform the covariance matrix of the weighted barycenter into a correlation matrix via the natural projection.
Summarizing, in the quasi-Gaussian approach, we define the modified dissimilarity between the time series i and j as

$$ \tilde{\Delta}_{ij}(\alpha) = d_{W_2}(\Sigma(1), \Sigma(\alpha)), $$

i.e., in terms of the Wasserstein distance between two correlation matrices associated with elements of G. This is analogous to Equation (14), since both Σ(1) and M represent the comonotonic case, while C_ij(α) and Σ(α) correspond to the weighted barycenters in their respective spaces.
Remark 2.
In the bivariate case, the Wasserstein distance between two Gaussian distributions is obtained via Formula (5). In particular, if we consider the singular correlation matrix Σ(1), we obtain that

$$ d_{W_2}^2(\Sigma(1), \Sigma(\theta)) = 4 - 2\sqrt{2 + 2\theta}. $$

The plot of θ ↦ d_{W_2}(Σ(1), Σ(θ)) is visualized in Figure 2.
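The whole quasi-Gaussian pipeline for a single pair (i, j) can be sketched as follows. This is our own minimal implementation (function names are ours): the closed form in Remark 2 is inverted analytically to obtain θ̂_ij^sp from d_ij, and the barycenter formula (6) is used for 2 × 2 correlation matrices:

```python
import numpy as np
from scipy.linalg import sqrtm

def corr(theta):
    # Bivariate correlation matrix Sigma(theta).
    return np.array([[1.0, theta], [theta, 1.0]])

def theta_sp_from_distance(d):
    # Invert d_W2(Sigma(1), Sigma(theta)) / d_W2(Sigma(1), Sigma(0)) = d
    # using d_W2^2(Sigma(1), Sigma(theta)) = 4 - 2*sqrt(2 + 2*theta):
    # d = 0 gives theta = 1 (comonotone), d = 1 gives theta = 0 (independence).
    return ((2.0 - d**2 * (2.0 - np.sqrt(2.0)))**2 - 2.0) / 2.0

def modified_dissimilarity(theta_ts, theta_sp, alpha):
    # Weighted barycenter of the two G-copula correlation matrices,
    # projected back to a correlation matrix, then its Wasserstein
    # distance from the comonotone matrix Sigma(1); theta_ts in (-1, 1).
    S1, S2 = corr(theta_ts), corr(theta_sp)
    root = sqrtm(S1)
    inv_root = np.linalg.inv(root)
    mid = (1.0 - alpha) * S1 + alpha * sqrtm(root @ S2 @ root)
    bar = np.real(inv_root @ mid @ mid @ inv_root)   # barycenter covariance
    d_inv = np.diag(1.0 / np.sqrt(np.diag(bar)))
    theta_alpha = (d_inv @ bar @ d_inv)[0, 1]        # projected correlation
    return np.sqrt(4.0 - 2.0 * np.sqrt(2.0 + 2.0 * theta_alpha))
```

At α = 0 the spatial component plays no role and the dissimilarity reduces to the pure temporal one, in line with the role of α described in Section 3.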

5. An Illustration with a Fuzzy-PAM Algorithm

Given an (n × n) dissimilarity matrix among time series, various algorithms can be exploited to provide a suitable clustering. Here, we will apply a (fuzzy) partitioning-around-medoids (PAM) clustering method for time series whose output expresses the membership degree of each time series to a cluster (see [11]). As is well known, the main advantage of PAM is that the prototypes (i.e., medoids) of each cluster are actually observed time series rather than average time series, which is often very appealing for the interpretation of the selected clusters (see, e.g., [61]). For a fixed number K of clusters, the fuzzy PAM algorithm can be formalized as follows:
$$ \min_{u_{ik}} \; \sum_{i=1}^{n} \sum_{k=1}^{K} u_{ik}^{\,p}\, \Delta_{ik} \qquad \text{s.t.} \quad \sum_{k=1}^{K} u_{ik} = 1, \quad u_{ik} \geq 0, $$

where u_ik indicates the membership degree of the i-th unit to the k-th cluster (k = 1, …, K), and p > 1 is a weighting exponent that controls the fuzziness of the obtained partition (hereinafter, p = 1.5). Here, Δ_ik is a suitable dissimilarity between the time series of the i-th unit and the time series of the k-th medoid.
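For fixed medoids, the inner minimization over the memberships has a closed-form solution via Lagrange multipliers, u_ik ∝ Δ_ik^{−1/(p−1)}. The sketch below (function name ours; it assumes strictly positive dissimilarities and is only one step of the full medoid-search algorithm of [11]) illustrates this update:

```python
import numpy as np

def membership_degrees(Delta, p=1.5):
    # Closed-form minimizer of sum_ik u_ik^p * Delta_ik subject to rows
    # of u summing to 1: u_ik proportional to Delta_ik^(-1/(p-1)).
    # Delta is the (n, K) matrix of dissimilarities to the K medoids;
    # all entries are assumed strictly positive.
    w = np.asarray(Delta, dtype=float) ** (-1.0 / (p - 1.0))
    return w / w.sum(axis=1, keepdims=True)
```

As p decreases towards 1, the memberships concentrate on the nearest medoid, so the partition becomes crisper; larger p yields fuzzier memberships.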
To illustrate the proposed algorithm, we consider the fuzzy PAM algorithm applied to the dissimilarity given in (16). We consider a scenario similar to the one in Section 3.1 of [11]. Specifically, we consider n = 48 time series of innovations of length T = 100. We assume that, for each i = 1, …, 48, the i-th time series has been observed with a vector of attributes s_i such that dist(s_i, s_j) = dist(s_j, s_i) = 1 if 1 ≤ i ≤ 12 and 13 ≤ j ≤ 48; otherwise, dist(s_i, s_j) = 0. Roughly speaking, the time series can be either contiguous (distance equal to 0) or not (distance equal to 1).
Moreover, the time series have a temporal dependence given by the following copula model:
$$ C(u_1, \dots, u_{48}) = C_1(u_1, \dots, u_{24}) \cdot C_2(u_{25}, \dots, u_{48}), $$

where C_1 and C_2 are copulas belonging to the Frank family (see, e.g., [48]), with a pairwise Kendall's τ ∈ {0.25, 0.50, 0.75}.
Figure 3 reports the membership degree of unit 1 to the same cluster of unit 13 at different levels of α (which is the weight assigned to the spatial component). Clearly, when α = 0 (i.e., no spatial information) units 1 and 13 tend to belong to the same cluster, since they are positively associated via the copula C 1 . However, when α increases, the spatial component plays a major role and, roughly speaking, it moves unit 13 far from unit 1, i.e., into a different cluster.
It is important to highlight the effects of different choices of the spatial component. When we assume that the maximal spatial distance between two units corresponds to a zero correlation in the Gaussian model (Figure 3, on the left), the membership degree of unit 1 to its natural temporal cluster tends to decline sharply only for α close to 0.5 . Instead, the decline comes earlier when the maximal spatial distance between two units corresponds to a correlation equal to 1 in the Gaussian model (Figure 3, on the right). We notice that this latter choice was also adopted in [11] when the Fréchet family of copulas was used.

6. An Empirical Application

In this section, we apply the fuzzy-PAM clustering algorithm to time series representing the summer temperature maxima of Italy from 1971 to 2023. The data have been downloaded from the Climate Data Store (https://cds.climate.copernicus.eu/, accessed on 1 September 2023), which collects global climate and weather data of the past eight decades. Specifically, we focus on JJA (June–July–August) maxima of daily maximum temperatures over a grid of the Italian land, with n = 527 grid points. Similarly to [62], we detrend the observed temperatures from the long-term warming trend, following a two-step procedure. First, we remove the multi-year climatological average from the daily temperature maxima within the dataset. Then, from these temperature residuals, we remove the 92-day running average. After this detrending process, we consider the maximum value of each season for each grid point, leading to a collection of time series of length T = 53. Given the set of detrended time series and the geographical locations, we compute the temporal and the spatial dependencies, as described in Section 4. Specifically, for every pair (i, j), i ≠ j, we use the estimate of the normal score correlation to compute the temporal correlation matrix Σ(θ̂_ij^ts) as in step 1, and we use the normalized distances d_ij to compute the spatial correlation matrix Σ(θ̂_ij^sp) according to step 2. Initially, we consider the fuzzy-PAM algorithm applied to the dissimilarity given in (16) for α = 0, representing the pure temporal case. We select the number of clusters via the fuzzy silhouette (FS) index [63], a suitable measure for fuzzy clustering algorithms that is computed as the weighted average of the individual silhouettes. The higher the FS value, the better the units are assigned to the clusters, simultaneously minimizing intra-cluster distances and maximizing inter-cluster distances.
We represent the obtained clusters in the pure temporal case ( α = 0 ) in Figure 4. From the maximization of the FS we have that the optimal number of clusters is K = 3 .
In the pure temporal case, time series belonging to the same cluster may not be contiguous. This aspect can be mitigated by combining the two dissimilarities via Equation (16) for some α ∈ [0, 1]. As an illustration, we show the cluster composition when α = 0.30 (clearly, other values are possible, although a range of 0.05–0.30 seems generally reasonable, as suggested in [30]).
In order to choose the optimal number of clusters for α = 0.30, we again maximize the fuzzy silhouette index. Table 1 shows the fuzzy silhouette index for different values of K ∈ {2, …, 10}. Thus, K = 3 is selected.
The corresponding clustering configuration is shown in Figure 5.
We note that, as α increases, the clusters become more spatially contiguous. Clearly, however, some time series still exhibit a behavior that is not driven by geographic proximity. Such time series are of particular interest for detecting spatial anomalies in a geographic area, and their behavior should be the object of in-depth investigation.

7. Conclusions

In this paper, we proposed a way to define a suitable dissimilarity matrix capturing the comovements of time series subject to spatial constraints. The ultimate goal was the clustering of time series with geo-referenced information. The proposed extraction of the dissimilarity matrix leverages the 2-Wasserstein distance between probability measures and optimal transport theory. Moreover, as the computational aspects of such an extraction can be demanding in practical applications, we proposed a quasi-Gaussian approximation based on correlation matrices in order to define a modified dissimilarity and reduce the complexity and computational burden of the procedure. In the second part of the paper, we presented an illustration with a fuzzy-PAM algorithm to show the effect of the spatial dependence on the whole procedure. Lastly, we presented a case study showing the effects of the proposed method on real data involving climatic time series.

Author Contributions

All authors contributed equally to this work; A.B. mainly focused on validation and investigation; F.D. mainly focused on investigation and visualization. All authors have read and agreed to the published version of the manuscript.

Funding

A.B. acknowledges the support of Regione Puglia via the Programma Regionale “RIPARTI (assegni di RIcerca per riPARTire con le Imprese)”-research project “FIRST: a Framework for Innovation in Risk management to support Territories” (code: c19a5daa). F.D. has been supported by MIUR-PRIN 2017, Project “Stochastic Models for Complex Systems” (No. 2017JFFHSH) and by MIUR-PRIN 2022, Project “Statistical Mechanics of Learning Machines: from algorithmic and information theoretical limits to new biologically inspired paradigms” (No. 20229T9EAT). The work of F.D. has been carried out with partial financial support from ICSC—Centro Nazionale di Ricerca in High Performance Computing, Big Data and Quantum Computing, funded by European Union NextGenerationEU (CUP F83C22000740001). F.D. is also a member of the group GNAMPA of INdAM (Istituto Nazionale di Alta Matematica).

Data Availability Statement

The datasets used in this study are available online at https://cds.climate.copernicus.eu/, accessed on 1 September 2023.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Here, we compute the Wasserstein distance $d_{W_2}(M, C_\theta^{\mathrm{Fre}})$, where $C_\theta^{\mathrm{Fre}} = \theta M + (1-\theta) W$. Due to the probabilistic interpretation of the copulas $M$, $W$ and of their convex combination (see, e.g., [48]), this corresponds to computing
$$\inf \left( \mathbb{E}\left( \left\| (U, U) - (V,\, 2ZV - V - Z + 1) \right\|^2 \right) \right)^{1/2},$$
where the infimum is taken over all possible joint distributions of the involved random variables. Here, $U$ and $V$ are random variables uniformly distributed on $[0, 1]$, while $Z$ is a Bernoulli variable, independent of $(U, V)$, with $Z = 1$ with probability $\theta$ and $Z = 0$ with probability $1 - \theta$. Thus, $(U, U) \sim M$ and $(V,\, 2ZV - V - Z + 1) \sim C_\theta^{\mathrm{Fre}}$, since the second coordinate equals $V$ when $Z = 1$ and $1 - V$ when $Z = 0$. We start by computing the expression to be minimized.
$$
\begin{aligned}
\mathbb{E}\,\big\| (U, U) - (V,\, 2ZV - V - Z + 1) \big\|^2
&= \mathbb{E}\big( (U - V)^2 + (U - (2ZV - V - Z + 1))^2 \big) \\
&= \mathbb{E}\big( U^2 + V^2 - 2UV + U^2 + 4Z^2V^2 + V^2 + Z^2 + 1 - 4ZUV + 2UV \\
&\qquad\; + 2UZ - 2U - 4ZV^2 - 4Z^2V + 4ZV + 2VZ - 2V - 2Z \big) \\
&= \mathbb{E}\big( 2U^2 + 2V^2 + Z^2 + 1 + 6VZ + 4Z^2V^2 - 4ZUV + 2UZ \\
&\qquad\; - 2U - 2V - 2Z - 4ZV^2 - 4Z^2V \big) \\
&= 4\,\mathbb{E}(U^2) + \mathbb{E}(Z^2) - 1 + 2\theta + 4\,\mathbb{E}(Z^2)\,\mathbb{E}(V^2) - 4\theta\,\mathbb{E}(UV) - 4\theta\,\mathbb{E}(V^2) - 2\,\mathbb{E}(Z^2) \\
&= \tfrac{4}{3} + \theta - 1 + 2\theta + \tfrac{4}{3}\theta - 4\theta\,\mathbb{E}(UV) - \tfrac{4}{3}\theta - 2\theta \\
&= \tfrac{1}{3} + \theta - 4\theta\,\mathbb{E}(UV)
= \tfrac{1}{3} + \theta - 4\theta \cdot \tfrac{\rho + 3}{12}
= \tfrac{1}{3} + \theta - \tfrac{\theta}{3}(\rho + 3).
\end{aligned}
$$
The minimum value of $\tfrac{1}{3} + \theta - \tfrac{\theta}{3}(\rho + 3)$ is reached for $\rho = 1$ and equals $\tfrac{1}{3} - \tfrac{1}{3}\theta = \tfrac{1 - \theta}{3}$; hence $d_{W_2}(M, C_\theta^{\mathrm{Fre}}) = \sqrt{(1 - \theta)/3}$. For the previous chain of equalities, we used the following facts:
  • $\mathbb{E}(Z) = \theta$ and $\mathbb{E}(Z^2) = \theta$;
  • $\mathrm{Var}(U) = \tfrac{1}{12}$ and $\mathbb{E}(U^2) = \mathbb{E}(V^2) = \tfrac{1}{3}$;
  • $\mathbb{E}(UV) = \tfrac{\rho + 3}{12}$, where $\rho$ is the Pearson correlation between $U$ and $V$.
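The final formula can be sanity-checked numerically. The following Monte Carlo sketch (variable names illustrative) simulates the comonotone coupling $U = V$ (i.e., $\rho = 1$) with $Z$ independent of $V$; the sample mean of the squared distance should approach $(1 - \theta)/3$.

```python
import numpy as np

# Monte Carlo check of the Appendix computation: with U = V and
# Z ~ Bernoulli(theta) independent of V, the mean squared distance
# E||(U, U) - (V, 2ZV - V - Z + 1)||^2 equals (1 - theta)/3.
rng = np.random.default_rng(0)
theta = 0.4
n = 200_000
V = rng.uniform(size=n)
U = V                                  # comonotone coupling: rho = 1
Z = rng.binomial(1, theta, size=n)
S = 2 * Z * V - V - Z + 1              # second coordinate of C_theta^Fre
sq = (U - V) ** 2 + (U - S) ** 2
print(sq.mean(), (1 - theta) / 3)      # MC estimate vs. exact value
```

Indeed, when $Z = 1$ the squared distance vanishes, while when $Z = 0$ it equals $(2V - 1)^2$, whose expectation is $1/3$.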

References

  1. Hennig, C.; Meila, M.; Murtagh, F.; Rocci, R. Handbook of Cluster Analysis; Chapman and Hall/CRC Handbook of Modern Statistical Methods; CRC Press: Boca Raton, FL, USA, 2016. [Google Scholar]
  2. Maharaj, E.A.; D’Urso, P.; Caiado, J. Time Series Clustering and Classification; Chapman Hall/CRC Computing and Data Analytics Series; CRC Press: Boca Raton, FL, USA, 2019. [Google Scholar]
  3. Caiado, J.; Maharaj, E.A.; D’Urso, P. Time-series clustering. In Handbook of Cluster Analysis; Hennig, C., Meila, M., Murtagh, F., Rocci, R., Eds.; CRC Press: Boca Raton, FL, USA, 2016; pp. 241–263. [Google Scholar]
  4. Di Lascio, F.; Durante, F.; Pappadà, R. Copula–based clustering methods. In Copulas and Dependence Models with Applications; Úbeda Flores, M., de Amo, E., Durante, F., Fernández Sánchez, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2017; pp. 49–67. [Google Scholar]
  5. Marti, G.; Nielsen, F.; Bińkowski, M.; Donnat, P. A Review of Two Decades of Correlations, Hierarchies, Networks and Clustering in Financial Markets. In Progress in Information Geometry: Theory and Applications; Nielsen, F., Ed.; Springer: Cham, Switzerland, 2021; pp. 245–274. [Google Scholar]
  6. De Luca, G.; Zuccolotto, P. A tail dependence-based dissimilarity measure for financial time series clustering. Adv. Data Anal. Classif. 2011, 5, 323–340. [Google Scholar] [CrossRef]
  7. De Luca, G.; Zuccolotto, P. Hierarchical time series clustering on tail dependence with linkage based on a multivariate copula approach. Internat. J. Approx. Reason. 2021, 139, 88–103. [Google Scholar] [CrossRef]
  8. Durante, F.; Pappadà, R.; Torelli, N. Clustering of time series via non–parametric tail dependence estimation. Statist. Pap. 2015, 56, 701–721. [Google Scholar] [CrossRef]
  9. Bonanomi, A.; Nai Ruscone, M.; Osmetti, S.A. Dissimilarity measure for ranking data via mixture of copulae. Stat. Anal. Data Min. Asa Data Sci. J. 2019, 12, 412–425. [Google Scholar] [CrossRef]
  10. De Keyser, S.; Gijbels, I. Hierarchical variable clustering via copula-based divergence measures between random vectors. Int. J. Approx. Reason. 2023, 21, 109090. [Google Scholar] [CrossRef]
  11. Disegna, M.; D’Urso, P.; Durante, F. Copula-based fuzzy clustering of spatial time series. Spat. Stat. 2017, 21, 209–225. [Google Scholar] [CrossRef]
  12. Kojadinovic, I. Agglomerative hierarchical clustering of continuous variables based on mutual information. Comput. Stat. Data Anal. 2004, 46, 269–294. [Google Scholar] [CrossRef]
  13. Zhang, B.; An, B. Clustering time series based on dependence structure. PLoS ONE 2018, 13, e0206753. [Google Scholar] [CrossRef]
  14. De Luca, G.; Zuccolotto, P. A double clustering algorithm for financial time series based on extreme events. Stat. Risk Model. 2017, 34, 1–12. [Google Scholar] [CrossRef]
  15. Durante, F.; Pappadà, R.; Torelli, N. Clustering of financial time series in risky scenarios. Adv. Data Anal. Classif. 2014, 8, 359–376. [Google Scholar] [CrossRef]
  16. Pappadà, R.; Durante, F.; Salvadori, G.; De Michele, C. Clustering of concurrent flood risks via Hazard Scenarios. Spat. Stat. 2018, 23, 124–142. [Google Scholar] [CrossRef]
  17. Saunders, K.R.; Stephenson, A.G.; Karoly, D.J. A regionalisation approach for rainfall based on extremal dependence. Extremes 2021, 24, 1386–1999. [Google Scholar] [CrossRef]
  18. Fouedjio, F. Clustering of multivariate geostatistical data. WIREs Comput Stat. 2020, 12, e1510. [Google Scholar] [CrossRef]
  19. Kopczewska, K. Spatial machine learning: New opportunities for regional science. Ann. Reg. Sci. 2022, 68, 713–755. [Google Scholar] [CrossRef]
  20. Asgharian, H.; Hess, W.; Liu, L. A spatial analysis of international stock market linkages. J. Bank. Financ. 2013, 37, 4738–4754. [Google Scholar] [CrossRef]
  21. Fernández-Avilés, G.; Montero, J.M.; Orlov, A.G. Spatial modeling of stock market comovements. Fin. Res. Lett. 2012, 9, 202–212. [Google Scholar] [CrossRef]
  22. Hüttner, A.; Scherer, M.; Gräler, B. Geostatistical modeling of dependent credit spreads: Estimation of large covariance matrices and imputation of missing data. J. Bank. Financ. 2020, 118, 105897. [Google Scholar] [CrossRef]
  23. Oliver, M.A.; Webster, R. A geostatistical basis for spatial weighting in multivariate classification. Math. Geol. 1989, 21, 15–35. [Google Scholar] [CrossRef]
  24. Coppi, R.; D’Urso, P.; Giordani, P. A fuzzy clustering model for multivariate spatial time series. J. Class. 2010, 27, 54–88. [Google Scholar] [CrossRef]
  25. Fouedjio, F. A hierarchical clustering method for multivariate geostatistical data. Spat. Stat. 2016, 18, 333–351. [Google Scholar] [CrossRef]
  26. D’Urso, P.; Vitale, V. A robust hierarchical clustering for georeferenced data. Spat. Stat. 2020, 35, 100407. [Google Scholar] [CrossRef]
  27. Di Lascio, F.; Menapace, A.; Pappadà, R. A spatially-weighted AMH copula-based dissimilarity measure for clustering variables: An application to urban thermal efficiency. Environmetrics 2023, e2828. [Google Scholar] [CrossRef]
  28. Zuccolotto, P.; De Luca, G.; Metulini, R.; Carpita, M. Modeling and clustering of traffic flows time series in a flood prone area. In Proceedings of the Statistics and Data Science Conference; Cerchiello, P., Agosto, A., Osmetti, S., Spelta, A., Eds.; EGEA: Pavia, Italy, 2023; pp. 113–118. [Google Scholar]
  29. Benevento, A.; Durante, F. Correlation-based hierarchical clustering of time series with spatial constraints. Spat. Stat. 2024, 59, 100797. [Google Scholar] [CrossRef]
  30. Romary, T.; Ors, F.; Rivoirard, J.; Deraisme, J. Unsupervised classification of multivariate geostatistical data: Two algorithms. Comput. Geosci. 2015, 85, 96–103. [Google Scholar] [CrossRef]
  31. Benevento, A.; Durante, F.; Pappadà, R. An approach to cluster time series extremes with spatial constraints. In Proceedings of the Book of the Short Papers SIS 2023; Chelli, F., Ciommi, M., Ingrassia, S., Mariani, F., Recchioni, M., Eds.; Pearson: London, UK, 2023; pp. 679–684. [Google Scholar]
  32. Villani, C. Optimal Transport Old and New; Grundlehren der mathematischen Wissenschaften; Springer: Berlin, Germany, 2009; Volume 338. [Google Scholar]
  33. Santambrogio, F. Optimal Transport for Applied Mathematicians; Birkhäuser: Cham, Switzerland, 2015. [Google Scholar]
  34. McCann, R.J. A convexity principle for interacting gases. Adv. Math. 1997, 128, 153–179. [Google Scholar] [CrossRef]
  35. Agueh, M.; Carlier, G. Barycenters in the Wasserstein space. SIAM J. Math. Anal. 2011, 43, 904–924. [Google Scholar] [CrossRef]
  36. Catalano, M.; Lijoi, A.; Prünster, I. Measuring dependence in the Wasserstein distance for Bayesian nonparametric models. Ann. Stat. 2021, 49, 2916–2947. [Google Scholar] [CrossRef]
  37. Marti, G.; Andler, S.; Nielsen, F.; Donnat, P. Optimal transport vs. Fisher-Rao distance between copulas for clustering multivariate time series. In Proceedings of the 2016 IEEE Statistical Signal Processing Workshop (SSP), Palma de Mallorca, Spain, 26–29 June 2016; pp. 1–5. [Google Scholar]
  38. Mordant, G.; Segers, J. Measuring dependence between random vectors via optimal transport. J. Multivar. Anal. 2022, 189, 104912. [Google Scholar] [CrossRef]
  39. Peyré, G.; Cuturi, M. Computational Optimal Transport: With Applications to Data Science. Found. Trends® Mach. Learn. 2019, 11, 355–607. [Google Scholar] [CrossRef]
  40. Panaretos, V.M.; Zemel, Y. Statistical aspects of Wasserstein distances. Annu. Rev. Stat. Appl. 2019, 6, 405–431. [Google Scholar] [CrossRef]
  41. Cuturi, M.; Doucet, A. Fast computation of Wasserstein barycenters. In Proceedings of the 31st International Conference on Machine Learning; Xing, E., Jebara, T., Eds.; Proceedings of Machine Learning Research: Beijing, China, 2014; Volume 32, pp. 685–693. [Google Scholar]
  42. Puccetti, G.; Rüschendorf, L.; Vanduffel, S. On the computation of Wasserstein barycenters. J. Multivar. Anal. 2020, 176, 16. [Google Scholar] [CrossRef]
  43. Takatsu, A. Wasserstein geometry of Gaussian measures. Osaka J. Math. 2011, 48, 1005–1026. [Google Scholar]
  44. Givens, C.; Shortt, R.M. A class of Wasserstein metrics for probability distributions. Mich. Math. J. 1984, 31, 231–240. [Google Scholar] [CrossRef]
  45. Dowson, D.C.; Landau, B.V. The Fréchet distance between multivariate normal distributions. J. Multivar. Anal. 1982, 12, 450–455. [Google Scholar] [CrossRef]
  46. Knott, M.; Smith, C.S. On the optimal mapping of distributions. J. Optim. Theory Appl. 1984, 43, 39–49. [Google Scholar] [CrossRef]
  47. Olkin, I.; Pukelsheim, F. The distance between two random vectors with given dispersion matrices. Linear Algebra Appl. 1982, 48, 257–263. [Google Scholar] [CrossRef]
  48. Durante, F.; Sempi, C. Principles of Copula Theory; CRC Press: Boca Raton, FL, USA, 2016. [Google Scholar]
  49. Nielsen, F.; Marti, G.; Ray, S.; Pyne, S. Clustering patterns connecting COVID-19 dynamics and human mobility using optimal transport. Sankhyā Ser. B 2021, 83, 167–184. [Google Scholar] [CrossRef]
  50. Wiesel, J. Measuring association with Wasserstein distances. Bernoulli 2022, 28, 2816–2832. [Google Scholar] [CrossRef]
  51. Fuchs, S.; Di Lascio, F.M.L.; Durante, F. Dissimilarity functions for rank-invariant hierarchical clustering of continuous variables. Comput. Statist. Data Anal. 2021, 159, 107201. [Google Scholar] [CrossRef]
  52. Marti, G.; Andler, S.; Nielsen, F.; Donnat, P. Exploring and measuring non-linear correlations: Copulas, Lightspeed Transportation and Clustering. In Proceedings of the NIPS 2016 Time Series Workshop; PMLR: Beijing, China, 2017; pp. 59–69. [Google Scholar]
  53. Chen, X.; Fan, Y. Estimation and model selection of semiparametric copula-based multivariate dynamic models under copula misspecification. J. Econom. 2006, 135, 125–154. [Google Scholar] [CrossRef]
  54. Patton, A. A review of copula models for economic time series. J. Multivar. Anal. 2012, 110, 4–18. [Google Scholar] [CrossRef]
  55. Rémillard, B. Goodness-of-Fit Tests for Copulas of Multivariate Time Series. Econometrics 2017, 5, 13. [Google Scholar] [CrossRef]
  56. Nasri, B.R.; Rémillard, B.N. Copula-based dynamic models for multivariate time series. J. Multivar. Anal. 2019, 172, 107–121. [Google Scholar] [CrossRef]
  57. Hofert, M.; Kojadinovic, I.; Mächler, M.; Yan, J. Elements of Copula Modeling with R; Springer: Cham, Switzerland, 2018. [Google Scholar]
  58. Genest, C.; Nešlehová, J.G.; Rémillard, B. Asymptotic behavior of the empirical multilinear copula process under broad conditions. J. Multivar. Anal. 2017, 159, 82–110. [Google Scholar] [CrossRef]
  59. Pfeifer, D.; Mändle, A.; Ragulina, O.; Girschig, C. New copulas based on general partitions-of-unity. III: The continuous case. Depend. Model. 2019, 7, 181–201. [Google Scholar] [CrossRef]
  60. Segers, J.; Sibuya, M.; Tsukahara, H. The empirical beta copula. J. Multivar. Anal. 2017, 155, 35–51. [Google Scholar] [CrossRef]
  61. Kaufman, L.; Rousseeuw, P.J. Finding Groups in Data: An Introduction to Cluster Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2009. [Google Scholar]
  62. Bador, M.; Naveau, P.; Gilleland, E.; Castellà, M.; Arivelo, T. Spatial clustering of summer temperature maxima from the CNRM-CM5 climate model ensembles & E-OBS over Europe. Weather Clim. Extrem. 2015, 9, 17–24. [Google Scholar]
  63. Campello, R.J.; Hruschka, E.R. A fuzzy extension of the silhouette width criterion for cluster analysis. Fuzzy Sets Syst. 2006, 157, 2858–2875. [Google Scholar] [CrossRef]
Figure 1. Random sample from a Gaussian copula C 1 with parameter 0.9 (upper left), a Gaussian copula C 2 with parameter −0.9 (upper right), ( C 1 + C 2 ) / 2 (lower left), and the (equally weighted) copula associated with the barycenter of C 1 and C 2 (lower right).
Figure 2. Graph of θ ↦ d W 2 ( C ( 1 ) , C ( θ ) ) , where θ ∈ [ −1 , 1 ] .
Figure 3. Membership degree of unit 1 to the same cluster of unit 13 for different values of the dissimilarity matrix of (16). Solid line: τ = 0.25 ; dashed line: τ = 0.50 ; dotted line: τ = 0.75 . (Left) Maximal spatial correlation equal to 0. (Right) Maximal spatial correlation equal to 1 . The results are mean values over R = 25 replications from model (17).
Figure 4. Membership representation of the pure temporal clustering. Darker colors and bigger points represent a higher membership degree. The crossed points are the medoids of each group.
Figure 5. Membership representation of the clustering for α = 0.30 . Darker colors and bigger points represent a higher membership degree. The crossed points are the medoids of each group.
Table 1. Fuzzy silhouette index for α = 0.30 and K { 2 , , 10 } .
K           2      3      4      5      6      7      8      9      10
FS index    0.176  0.208  0.194  0.163  0.120  0.154  0.118  0.105  0.102