Article

A Generative Model for Correlated Graph Signals

Pavel Loskot
ZJU-UIUC Institute, Haining 314400, China
Mathematics 2021, 9(23), 3078; https://doi.org/10.3390/math9233078
Submission received: 31 October 2021 / Revised: 24 November 2021 / Accepted: 27 November 2021 / Published: 29 November 2021
(This article belongs to the Special Issue Random Processes on Graphs)

Abstract

A graph signal is a random vector with a partially known statistical description. The observations are usually sufficient to determine marginal distributions of graph node variables and their pairwise correlations representing the graph edges. However, the curse of dimensionality often prevents estimating a full joint distribution of all variables from the available observations. This paper introduces a computationally effective generative model to sample from arbitrary but known marginal distributions with defined pairwise correlations. Numerical experiments show that the proposed generative model is generally accurate for correlation coefficients with magnitudes up to about 0.3, whilst larger correlations can be obtained at the cost of distribution approximation accuracy. The generative models of graph signals can also be used to sample multivariate distributions for which closed-form mathematical expressions are not known or are too complex.

1. Introduction

The observations of many real-world systems can be studied as multiple time series. Provided that the pairwise relationships between the time series are implicitly or explicitly defined, it is common to refer to these data models as graph signals [1]. In most cases, only one feature representing the pairwise relationships is considered. The pairwise relationships that are not explicitly stated can assume some implicit value, such as a zero covariance (i.e., being uncorrelated), or they should be assumed to be undefined (i.e., unknown). The graph edges can also indicate statistical or causal dependencies [2]. Consequently, graph signals can be defined as random vectors with incomplete knowledge of their statistics. The random variables then represent the nodes of the graph, and the pairwise relationships are the graph edges.
The graph variables can be arranged into a random vector or matrix in an arbitrary order. One can also assume a graph search and follow a path over the graph edges through the graph nodes to construct the random vector. The random vectors and matrices can be conveniently processed using the well-established framework of linear algebra combined with methods in statistical signal processing [3] and machine learning. In the literature on graph signal processing, the mainstream approach assumes a frequency domain representation of graph signals using the singular value decomposition (SVD) of graph adjacency, incidence, or Laplacian matrices [4].
In general, a random vector or matrix is fully statistically described by a joint distribution of all the constituent random variables. A comprehensive survey of multivariate distributions can be found in [5]. Mathematical expressions of multivariate distributions may contain the pairwise correlations explicitly, as is the case for the multivariate normal distribution, but in most cases, the correlations can be calculated from other distribution parameters. The main challenge in fitting multivariate distributions to observed data is that the number of observations required to achieve a certain goodness-of-fit grows exponentially with the number of dimensions (the curse of dimensionality). In addition, multivariate distributions are often difficult to sample, and numerically complex algorithms are required to perform statistical inferences [6].
A related problem of generating random variables with defined moments was considered in the literature. In particular, a linear transformation was utilized in [7,8] to generate nonnormal variates with the defined univariate skewness, kurtosis, and pairwise covariances. This method was extended in [9] to assume multivariate skewness and kurtosis.
In this paper, it is assumed that a random vector is described by the marginal distributions and the pairwise correlations of its elements. The task is to define a generative model to efficiently generate multivariate samples satisfying the given statistical constraints without resorting to fitting a multivariate distribution to the observed data. It is proposed to approximate the unknown multivariate distribution with the defined statistical constraints by a multivariate mixture distribution having independent marginals. In addition, it is shown that the component distributions of the marginal mixture distributions can be conjugate distributions. This approach enables a definition of a universal procedure for constructing generative models of multivariate graph signals that can be readily sampled. The limitation of the proposed procedure is that there is a tradeoff between how accurately the marginal distributions can be approximated and the achievable pairwise correlations. However, it is likely that the proposed generative procedure could be further modified to improve this tradeoff.
The rest of the paper is organized as follows. The research problem is stated in Section 2. The generative model of graph signals is introduced in Section 3. Numerical examples for bivariate distributions are presented in Section 4. The obtained results are discussed in Section 5, and Section 6 concludes the paper.

2. Problem Statement

Assume that a sufficient number of discrete-time observations of $N$ stochastic time series, $X_i$, $i \in \{1, 2, \dots, N\}$, have been collected, so that the following quantities can be determined with good accuracy:
  • marginal densities $f_i(X)$ of $X_i$, for $i \in \{1, 2, \dots, N\}$; and    (C1)
  • covariances $C_{ij} = \operatorname{cov}[X_i, X_j] = \mathrm{E}\left[(X_i - \bar{X}_i)(X_j - \bar{X}_j)\right]$, for $i, j \in \{1, 2, \dots, N\}$,    (C2)
where $X_i$ is a random variable representing the samples in the $i$-th time series, and $\mathrm{E}[\cdot]$ denotes expectation. Note that the existence of time-invariant densities implies stationarity as well as knowledge of all the moments of the univariate random variable $X_i$. Furthermore, only continuous multivariate distributions are considered in this paper, and, unless otherwise stated, $X_i \in \mathbb{R}$.
If the constraints C1 and C2 are determined from real-world observations with sufficient accuracy, they are guaranteed to be consistent, and at least one corresponding multivariate distribution must exist. However, for an arbitrarily chosen set of marginal distributions and (unnormalized) covariances, there may, in general, be no multivariate distribution satisfying these constraints. The constraint C2 may be modified to assume the normalized correlation coefficients instead of covariances.
The joint moments, including covariances and the corresponding Pearson correlation coefficients, can be estimated from data by the method of sample moments [10]. The marginal densities can be efficiently estimated from data by various nonparametric methods using histograms, kernels [11], and diffusion [12]. However, estimating the joint probability density of all $N$ variates $X_i$ may be problematic, since it may require a very large number of observations, especially for more complex distributions in many dimensions [13]. Consequently, given the marginal distributions $f_i(X)$ and the pairwise covariances $C_{ij}$, the task is to construct a generative model for generating random samples of the vector $\boldsymbol{X} = [X_1, X_2, \dots, X_N]$ whose elements satisfy the constraints C1 and C2 given above. No other constraints are adopted in this paper, although it may be desirable to require that the generative model is also numerically efficient. In addition, the generative procedure to obtain random samples satisfying the constraints C1 and C2 should be sufficiently general in order to allow for different types of marginal distributions, including the case of non-identical marginal distributions.
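As an illustration of how the constraints C1 and C2 can be obtained in practice, the following Python sketch estimates the marginal densities with a Gaussian kernel density estimator (standing in for the nonparametric methods of [11,12]) and the covariances and correlation coefficients by sample moments; the function name and the synthetic data are illustrative assumptions, not part of the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde

def estimate_constraints(data):
    """data: (N, T) array holding N time series with T samples each."""
    # C1: nonparametric kernel estimates of the marginal densities f_i
    marginals = [gaussian_kde(row) for row in data]
    # C2: sample covariances C_ij and Pearson correlation coefficients
    C = np.cov(data)
    rho = np.corrcoef(data)
    return marginals, C, rho

# synthetic example: two correlated series with rho = 0.3
rng = np.random.default_rng(0)
z = rng.standard_normal((2, 10_000))
data = np.vstack([z[0], 0.3 * z[0] + np.sqrt(1 - 0.3**2) * z[1]])
marginals, C, rho = estimate_constraints(data)
print(rho[0, 1])  # close to 0.3
```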

3. Constructing a Multivariate Distribution from Its Marginals

The constraints C1 and C2 do not uniquely define the joint distribution $f(\boldsymbol{X})$. In particular, the marginal distributions can be used to obtain all general and central moments of the individual random variables $X_i$, whereas the pairwise covariances are the only joint statistics assumed to be known. Provided that the marginals $f_i$ are of the same, more common type, the corresponding multivariate density may have already been identified in the literature [5]. However, even if the mathematical expression of the desired multivariate distribution $f(\boldsymbol{X})$ is available, it may be too complex to sample from, or to accurately fit to the observations using, for example, least-squares regression or other parameter estimation methods [3]. Another strategy, which is investigated in this paper, is to construct the joint distribution $f$ from the known marginals $f_i$, $i = 1, 2, \dots, N$, under the covariance constraints.
Proposition 1.
The joint density $f$ with the given marginals $f_i$, $i = 1, 2, \dots, N$, under mild covariance constraints can be well approximated by the mixture distribution,
$$f(\boldsymbol{X}) = \sum_{k=1}^{K} \alpha_k \tilde{f}_k(\boldsymbol{X}) \qquad (1)$$
of $K$ joint component densities $\tilde{f}_k$, with the weighting factors $\alpha_k > 0$, $\forall k$, and $\sum_{k=1}^{K} \alpha_k = 1$.
The mixture decomposition (1) of $f$ requires not only that all the components $\tilde{f}_k$ are identified but also that they can be sampled from efficiently. In order to overcome the latter challenge, it is proposed here to adopt an independence assumption and express each component $\tilde{f}_k$ as a product of univariate densities, i.e., the mixture decomposition (1) is rewritten as,
$$f(\boldsymbol{X}) = \sum_{k=1}^{K} \alpha_k \prod_{i=1}^{N} \tilde{f}_{ki}(X_i). \qquad (2)$$
The advantage of assuming the mixture decomposition (2) is that it is generally much easier to sample from univariate than from multivariate distributions. The disadvantage of decomposition (2) is that the independence assumption limits the achievable pairwise correlations between the elements of $\boldsymbol{X}$.
Denote $\boldsymbol{X}_{-i} = \{X_1, X_2, \dots, X_N\} \setminus \{X_i\}$, and $\boldsymbol{X}_{-ij} = \{X_1, X_2, \dots, X_N\} \setminus \{X_i, X_j\}$. The marginal distributions corresponding to decomposition (2) are obtained as
$$f_i(X_i) = \int_{\mathbb{R}^{N-1}} f(\boldsymbol{X})\, \mathrm{d}\boldsymbol{X}_{-i} = \sum_{k=1}^{K} \alpha_k \tilde{f}_{ki}(X_i), \qquad (3)$$
whilst the bivariate marginal densities are computed as
$$f_{ij}(X_i, X_j) = \int_{\mathbb{R}^{N-2}} f(\boldsymbol{X})\, \mathrm{d}\boldsymbol{X}_{-ij} = \sum_{k=1}^{K} \alpha_k \tilde{f}_{ki}(X_i)\, \tilde{f}_{kj}(X_j). \qquad (4)$$
The corresponding mean value is
$$\bar{X}_i = \mathrm{E}[X_i] = \sum_{k=1}^{K} \alpha_k\, \mathrm{E}_{\tilde{f}_{ki}}[X_i] = \sum_{k=1}^{K} \alpha_k \bar{X}_{ki}, \qquad (5)$$
and the second moments are computed as
$$\begin{aligned} \operatorname{var}[X_i] &= \mathrm{E}[X_i^2] - \mathrm{E}[X_i]^2 = \sum_{k=1}^{K} \alpha_k\, \mathrm{E}_{\tilde{f}_{ki}}[X_i^2] - \sum_{k=1}^{K} \sum_{l=1}^{K} \alpha_k \alpha_l \bar{X}_{ki} \bar{X}_{li} \\ \operatorname{corr}[X_i, X_j] &= \mathrm{E}[X_i X_j] = \sum_{k=1}^{K} \alpha_k \bar{X}_{ki} \bar{X}_{kj} \\ \operatorname{cov}[X_i, X_j] &= \mathrm{E}[X_i X_j] - \mathrm{E}[X_i]\, \mathrm{E}[X_j] = \sum_{k=1}^{K} \alpha_k \bar{X}_{ki} \bar{X}_{kj} - \sum_{k=1}^{K} \sum_{l=1}^{K} \alpha_k \alpha_l \bar{X}_{ki} \bar{X}_{lj}. \end{aligned} \qquad (6)$$
Consequently and importantly, for $N \geq 2$ and $K \geq 2$, the pairwise covariances are given by the mixture coefficients $\alpha_k$ and the mean values $\bar{X}_{ki}$ of the component distributions $\tilde{f}_{ki}(X_i)$.
Given the marginals $f_i$ and the covariances $\operatorname{cov}[X_i, X_j]$, the values of $\alpha_k$ and $\bar{X}_{ki}$ must satisfy Equations (5) and (6). This represents a system of $N + \binom{N}{2}$ equations with $NK + K$ unknowns, where $\binom{N}{2} = N(N-1)/2$ is a binomial coefficient. Provided that the $K$ coefficients $\alpha_k$ can be determined by the $N$ univariate decompositions (3), the number of unknowns can be reduced to $NK$. The number of degrees of freedom, defined as the difference between the number of unknowns and the number of equations, i.e., $D_{\text{free}} = NK - N - \binom{N}{2}$, is shown in Figure 1 versus the distribution dimension $N$ for several different values of $K$. The points above the horizontal dashed line in Figure 1 indicate the underdetermined cases, when the number of unknowns is greater than the number of constraints. A unique solution may exist if $N = 2(K-1) + 1$, i.e., when $N$ is odd.
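The moment identities (5) and (6) can be verified numerically. The following sketch (an illustration, not one of the paper's experiments) draws samples from a mixture of independent unit-variance normal components and compares the empirical covariances with the analytic expression (6); all parameter values are assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 3, 2
alpha = np.array([0.4, 0.6])
means = rng.normal(size=(K, N))                  # component means Xbar_ki

# analytic moments from (5) and (6)
m = alpha @ means                                # mixture means, Equation (5)
cov_an = np.einsum('k,ki,kj->ij', alpha, means, means) - np.outer(m, m)  # (6)

# Monte Carlo: pick component k with probability alpha_k, then sample the
# N coordinates independently (unit-variance normal components)
n = 200_000
k = rng.choice(K, size=n, p=alpha)
X = rng.standard_normal((n, N)) + means[k]
cov_mc = np.cov(X.T)

# the off-diagonal (pairwise) covariances agree; the diagonal entries of the
# Monte Carlo estimate additionally include the component variance (here 1)
print(np.round(cov_an, 3))
print(np.round(cov_mc, 3))
```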

Bivariate Case

For $N = 2$ and $K = 2$, the number of degrees of freedom is $D_{\text{free}} = 1$. Assuming two random variables $X$ and $Y$, the mixture decomposition (2) can be rewritten as,
$$f(X, Y) = \alpha\, \tilde{f}_{X1}(X)\, \tilde{f}_{Y1}(Y) + (1 - \alpha)\, \tilde{f}_{X2}(X)\, \tilde{f}_{Y2}(Y) \qquad (7)$$
where $0 < \alpha < 1$. Note that, for $\alpha = 0$ or $\alpha = 1$, the variables $X$ and $Y$ are assumed to be independent, and thus, uncorrelated. The corresponding marginal distributions are the mixtures
$$f_X(X) = \alpha\, \tilde{f}_{X1}(X) + (1 - \alpha)\, \tilde{f}_{X2}(X), \quad f_Y(Y) = \alpha\, \tilde{f}_{Y1}(Y) + (1 - \alpha)\, \tilde{f}_{Y2}(Y) \qquad (8)$$
having the following first- and second-order statistics
$$\begin{aligned} \bar{X} &= \alpha \bar{X}_1 + (1 - \alpha) \bar{X}_2 \\ \bar{Y} &= \alpha \bar{Y}_1 + (1 - \alpha) \bar{Y}_2 \\ \operatorname{var}[X] &= \alpha \operatorname{var}[X_1] + (1 - \alpha) \operatorname{var}[X_2] + \alpha (1 - \alpha) (\bar{X}_1 - \bar{X}_2)^2 \\ \operatorname{var}[Y] &= \alpha \operatorname{var}[Y_1] + (1 - \alpha) \operatorname{var}[Y_2] + \alpha (1 - \alpha) (\bar{Y}_1 - \bar{Y}_2)^2 \\ \operatorname{corr}[X, Y] &= \alpha \bar{X}_1 \bar{Y}_1 + (1 - \alpha) \bar{X}_2 \bar{Y}_2 \\ \operatorname{cov}[X, Y] &= \alpha (1 - \alpha) (\bar{X}_1 - \bar{X}_2)(\bar{Y}_1 - \bar{Y}_2) \end{aligned} \qquad (9)$$
where the mean and the variance of $\tilde{f}_{X1}$ are denoted as $\bar{X}_1$ and $\operatorname{var}[X_1]$, respectively. Similar notation is used for the other component distributions. Note that the correlation between the variables $X$ and $Y$ increases with the difference of the means of the corresponding component distributions. Conversely, the variables $X$ and $Y$ are uncorrelated if either the means $\bar{X}_1 = \bar{X}_2$, or $\bar{Y}_1 = \bar{Y}_2$. The expression $\alpha(1 - \alpha) \geq 0$ is maximized for $\alpha = 1/2$.
Since $D_{\text{free}} = 1$, the mean values of the component distributions in (7) can be expressed as functions of one parameter. For example, choosing $\bar{X}_1$ as this parameter, the other means are computed as
$$\bar{X}_2 = \frac{\bar{X} - \alpha \bar{X}_1}{1 - \alpha}, \quad \bar{Y}_1 = \frac{\operatorname{cov}[X, Y]\,(1 - \alpha) + \alpha \bar{Y} (\bar{X}_1 - \bar{X})}{\alpha (\bar{X}_1 - \bar{X})}, \quad \bar{Y}_2 = \frac{-\operatorname{cov}[X, Y] + \bar{Y} (\bar{X}_1 - \bar{X})}{\bar{X}_1 - \bar{X}} \qquad (10)$$
by solving the equations for $\bar{X}$, $\bar{Y}$, and $\operatorname{cov}[X, Y]$ given in (9). Substituting the expressions in (9), the Pearson correlation coefficient is computed as,
$$\rho_{XY} = \frac{\operatorname{cov}[X, Y]}{\sqrt{\operatorname{var}[X] \operatorname{var}[Y]}} = \frac{\alpha (1 - \alpha)(\bar{X}_1 - \bar{X}_2)(\bar{Y}_1 - \bar{Y}_2)}{\sqrt{\left( \sigma_1^2 + \alpha (1 - \alpha)(\bar{X}_1 - \bar{X}_2)^2 \right) \left( \sigma_2^2 + \alpha (1 - \alpha)(\bar{Y}_1 - \bar{Y}_2)^2 \right)}} \qquad (11)$$
where $\sigma_1^2 = \alpha \operatorname{var}[X_1] + (1 - \alpha) \operatorname{var}[X_2]$ and $\sigma_2^2 = \alpha \operatorname{var}[Y_1] + (1 - \alpha) \operatorname{var}[Y_2]$. Consequently, the combined variances $\sigma_1^2$ and $\sigma_2^2$ limit the achievable correlations between the variables $X$ and $Y$ in the generative model (7). Only when $\sigma_1^2 = \sigma_2^2 = 0$ can the correlation coefficient reach the maximum magnitude, $\rho_{XY} = \pm 1$.
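For reference, expression (11) can be evaluated directly. The following Python helper is a plain transcription of (11); the function name and the parameter values are illustrative.

```python
import numpy as np

def rho_xy(alpha, mX1, mX2, mY1, mY2, varX1, varX2, varY1, varY2):
    s1 = alpha * varX1 + (1 - alpha) * varX2     # sigma_1^2 in (11)
    s2 = alpha * varY1 + (1 - alpha) * varY2     # sigma_2^2 in (11)
    a = alpha * (1 - alpha)
    num = a * (mX1 - mX2) * (mY1 - mY2)
    return num / np.sqrt((s1 + a * (mX1 - mX2)**2) * (s2 + a * (mY1 - mY2)**2))

# shrinking the combined variances drives |rho| towards its maximum of 1
print(rho_xy(0.5, 1, -1, 1, -1, 1.0, 1.0, 1.0, 1.0))   # 0.5
print(rho_xy(0.5, 1, -1, 1, -1, 0.0, 0.0, 0.0, 0.0))   # 1.0
```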
The mixture decompositions of the marginals defined in (8) can be obtained using different strategies. The marginal distributions defined by the constraint C1 can be common univariate distributions, or they can be defined as univariate mixture distributions from the outset. The latter case can be resolved by curve fitting to the observed data, so here, we investigate the former case.
Proposition 2.
The marginal distributions (8) can be approximated by conjugate mixtures. The conjugate mixture components are of the same type as the resulting marginal distributions, but they have their parameters determined by the constraints (9).
Hence, assume that the mixture components $\tilde{f}_{X1}$ and $\tilde{f}_{X2}$ in (8) are obtained by a linear transformation of the marginal distribution $f_X$, i.e., let [14]
$$\tilde{f}_{X1}(X) = \frac{1}{s_1} f_X\!\left( \frac{X - m_1}{s_1} \right), \quad \tilde{f}_{X2}(X) = \frac{1}{s_2} f_X\!\left( \frac{X - m_2}{s_2} \right) \qquad (12)$$
where the shifts $m_1, m_2 \in \mathbb{R}$, whereas the scalings $s_1 = 1 - \epsilon_1$ and $s_2 = 1 - \epsilon_2$, for some small $\epsilon_1, \epsilon_2 > 0$, satisfy the variance constraint in (9). The marginal distribution $f_Y$ is approximated similarly, and independently of $f_X$.
Substituting (12) into (8), the value of the mixture coefficient $\alpha$ can be determined to optimize the goodness of fit. In particular, the conjugate mixture distributions can be locally linearized about $X_0$, as indicated in Figure 2. Then, for $X: |X - X_0| < \epsilon$, the distributions can be approximated as linear functions, i.e.,
$$f_X(X) \approx g X + o, \quad \tilde{f}_{X1}(X) \approx g_1 (X + m_1) + o, \quad \tilde{f}_{X2}(X) \approx g_2 (X - m_2) + o \qquad (13)$$
where $g$, $g_1$, and $g_2$ are the gradients, $o$ is the common offset, and $m_1, m_2 > 0$ are the shifts. Substituting into (8), we obtain
$$\alpha g_1 \left( 1 + \frac{m_1}{m_2} \right) = g, \quad (1 - \alpha)\, g_2 \left( 1 + \frac{m_2}{m_1} \right) = g, \qquad (14)$$
which is, crucially, independent of the actual value of $X_0$. In the case when $g_1 = g_2$, a rule of thumb for choosing the value of the mixture coefficient $\alpha$ is obtained as
$$m_1 \alpha = m_2 (1 - \alpha) \qquad (15)$$
so that $\alpha (\bar{X} + m_1) + (1 - \alpha)(\bar{X} - m_2) = \bar{X}$, as in (9). Hence, the value of $\alpha$ can be chosen somewhat arbitrarily, as long as the condition (15) and the constraints (9) are satisfied.
Alternatively, for conjugate mixture components, the decomposition (8) can be rewritten as,
$$f_X(X) \approx \alpha g_1 f_X(X + \Delta X) + (1 - \alpha)\, g_2 f_X(X - \Delta X). \qquad (16)$$
Equation (16) can be interpreted as linear digital filtering of the signal $f_X(X)$ in the variable $X$. The filter coefficients $\alpha g_1$ and $(1 - \alpha) g_2$ are separated by $2 \Delta X$. The approximation (16) is then more exact, provided that the filter does not distort the filtered signal $f_X(X)$, i.e., when the filter bandwidth is wider than the bandwidth of the signal [15]. The signal and filter bandwidths are determined by the magnitudes of their Fourier transforms. In particular, since the signal $f_X(X)$ is also a distribution, we can assume the characteristic function of $f_X(X)$, i.e., $\phi_X(s) = \mathrm{E}_X\!\left[ e^{\mathrm{j} s X} \right]$, $\mathrm{j} = \sqrt{-1}$, which is known or can be obtained for most univariate distributions. The filter bandwidth is obtained by computing the magnitude of its transfer function $T(s)$, i.e.,
$$T(s) = \int_{-\infty}^{\infty} \left[ \alpha g_1 \delta(X + \Delta X) + (1 - \alpha)\, g_2 \delta(X - \Delta X) \right] e^{-\mathrm{j} 2 \pi s X}\, \mathrm{d}X = \alpha g_1 e^{\mathrm{j} 2 \pi s \Delta X} + (1 - \alpha)\, g_2 e^{-\mathrm{j} 2 \pi s \Delta X}. \qquad (17)$$

4. Numerical Examples

The case of the following three bivariate distributions is considered: normal, gamma, and normal-exponential distributions [14]. Although generating correlated normal samples is straightforward, which is a rare exception among multivariate distributions, the normal distribution is considered mainly to validate the proposed generative model.
The first experiment investigates approximations of the selected univariate distributions by a mixture of the two component distributions defined in (8), i.e., the approximation $\tilde{f}_X = \alpha \tilde{f}_{X1} + (1 - \alpha) \tilde{f}_{X2}$ of $f_X$. The approximation accuracy is quantified by the Kullback–Leibler (KL) divergence between the target distribution $f_X$ and its mixture approximation $\tilde{f}_X$. The KL divergence is defined as,
$$\mathrm{KL}(\tilde{f}_X \,\|\, f_X) = \int_{-\infty}^{\infty} \tilde{f}_X(X) \log \frac{\tilde{f}_X(X)}{f_X(X)}\, \mathrm{d}X. \qquad (18)$$
Moreover, using Jensen's inequality for the logarithm, it is straightforward to show that
$$\mathrm{KL}(f_X \,\|\, \tilde{f}_X) \leq \sum_{i=1}^{K} \alpha_i\, \mathrm{KL}(f_X \,\|\, \tilde{f}_{Xi}). \qquad (19)$$
In order to reduce the number of free parameters, the mixture components are assumed to be conjugates of the target distribution, i.e., $\tilde{f}_{X1}$ and $\tilde{f}_{X2}$ are of the same type as $f_X$, and have the means $\bar{X}_1 = \bar{X} + \Delta \bar{X}$ and $\bar{X}_2 = \bar{X} - \Delta \bar{X}$, where $\Delta \bar{X} \geq 0$. Consequently, $\alpha = 1/2$ in all the experiments, in accordance with (15).
In the case of the normal distribution, there are two distribution parameters, i.e., the mean $\bar{X}$ and the variance $\operatorname{var}[X]$. In order to account for the variance constraint in (9), the variances of the component distributions $\tilde{f}_{X1}$ and $\tilde{f}_{X2}$ have been equally reduced to $p \times \operatorname{var}[X]$, $0 < p \leq 1$. The gamma distribution also has two parameters, i.e., the shape $k > 0$ and the scale $\theta > 0$. Given the scale $\theta$, the shapes of the two component distributions $\tilde{f}_{X1}$ and $\tilde{f}_{X2}$ are set to $k_{1,2} = (\bar{X} \pm \Delta \bar{X})/\theta$, respectively. The normal-exponential distribution (or exponentially modified normal distribution) is described by three parameters, i.e., the mean and the variance of the normal distribution and the rate of the exponential distribution. The variance of the normal part of both components $\tilde{f}_{X1}$ and $\tilde{f}_{X2}$ was reduced to $0.9 \operatorname{var}[X]$, and the variance of the exponential part was left unchanged.
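As a hedged illustration of this experiment (the exact parameter grids of Figure 3 are not reproduced), the following sketch numerically integrates the KL divergence (18) for a normal target with conjugate components whose variances are reduced to $p \operatorname{var}[X]$ and whose means are displaced by $\pm \Delta \bar{X}$, with $\alpha = 1/2$.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def kl_mixture_vs_normal(dx, p=0.9, mean=0.0, var=1.0):
    """KL( f~_X || f_X ) per (18) for a normal target with conjugate components."""
    c1 = norm(mean + dx, np.sqrt(p * var))       # component means: mean +/- dx,
    c2 = norm(mean - dx, np.sqrt(p * var))       # variances reduced to p * var
    target = norm(mean, np.sqrt(var))
    def integrand(x):
        fmix = 0.5 * c1.pdf(x) + 0.5 * c2.pdf(x)  # alpha = 1/2, per (15)
        return fmix * np.log(fmix / target.pdf(x))
    return quad(integrand, mean - 12, mean + 12)[0]

for dx in (0.1, 0.3, 0.5):
    print(dx, kl_mixture_vs_normal(dx))  # KL for several displacements
```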
The KL values for all three distributions considered are shown on a log scale in Figure 3. It is observed that the approximations $\tilde{f}_X$ of $f_X$ are visually accurate for KL values below $10^{-2}$. Hence, the mixture approximation $\tilde{f}_X$ is rather accurate for some parameter values of the target distribution, and mainly for smaller displacements $\Delta \bar{X}$, as expected.
Next, we investigate the achievable magnitudes of the correlation coefficient between the random variables $X$ and $Y$, which are both generated by the mixture approximations (8). The same three marginal distributions are considered, i.e., normal, gamma, and normal-exponential distributions with the same parameters as in Figure 3. Here, the benefit of defining the bivariate distributions as the mixtures (7) to generate correlated random samples becomes evident. In particular, with probability $\alpha$, the distributions $\tilde{f}_{X1}$ and $\tilde{f}_{Y1}$ are sampled independently, and with probability $(1 - \alpha)$, the samples of $X$ and $Y$ are independently generated from the distributions $\tilde{f}_{X2}$ and $\tilde{f}_{Y2}$, respectively. Thus, the correlated bivariate samples are generated by independently sampling from the four univariate distributions $\tilde{f}_{X1}$, $\tilde{f}_{X2}$, $\tilde{f}_{Y1}$, and $\tilde{f}_{Y2}$. The generation of normal samples is trivial, and readily available in many numerical software packages. For the gamma distribution, a generator of gamma samples is either available (e.g., in Matlab as the function gamrnd), or it can be constructed [16]. Finally, the normal-exponential distributed samples are simply the sum of samples from the two underlying distributions.
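The sampling procedure just described amounts to drawing a component index (the first pair with probability $\alpha$, the second with probability $1 - \alpha$) and then sampling the two variables independently from the selected components. A minimal Python sketch follows, using the conjugate gamma construction with shapes $k_{1,2} = (\bar{X} \pm \Delta \bar{X})/\theta$; the helper name and the parameter values are assumptions for this example.

```python
import numpy as np
from scipy.stats import gamma

def sample_bivariate(n, alpha, compX, compY, rng):
    """compX, compY: (component 1, component 2) pairs of frozen distributions."""
    use2 = rng.random(n) >= alpha                # True with probability (1 - alpha)
    X = np.where(use2, compX[1].rvs(n, random_state=rng),
                       compX[0].rvs(n, random_state=rng))
    Y = np.where(use2, compY[1].rvs(n, random_state=rng),
                       compY[0].rvs(n, random_state=rng))
    return X, Y

# conjugate gamma components with shapes k_{1,2} = (mean +/- dx) / theta
rng = np.random.default_rng(2)
theta, mean, dx = 1.0, 4.0, 1.0
comp = (gamma((mean + dx) / theta, scale=theta),
        gamma((mean - dx) / theta, scale=theta))
X, Y = sample_bivariate(100_000, 0.5, comp, comp, rng)
print(np.corrcoef(X, Y)[0, 1])                   # about 0.2 for these parameters
```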
The achievable correlation coefficient has been measured empirically for $10^5$ bivariate random samples. The results are shown in Figure 4. The curves are in good agreement with the theoretical values given by expression (11). More importantly, the following conclusion can be drawn from comparing the corresponding results in Figure 3 and Figure 4. The accurate approximation of the marginal distributions by the proposed generative mixture model limits the achievable values of the correlation coefficient to about $0.2$ or $0.3$, depending on the specific distributions and their parameters considered.

5. Discussion

The full statistical description of multiple time series may be difficult or impossible to obtain from a limited number of observations. The incomplete statistics give rise to interesting signal processing problems involving graph signals. The graph signals can be more formally defined as follows.
Definition 1.
A graph signal is a set of random variables representing space-time observations of stochastic processes or phenomena, for which some marginal or conditional distributions and some statistical moments are known. However, the statistical description is incomplete in the sense that a full joint distribution of all the random variables cannot be uniquely determined from the prior knowledge and the available observations.
Provided that graph signals are represented as random vectors or matrices, many techniques of statistical signal processing or even machine learning can be used. For instance, Bayesian inference of parameters and unobserved model states requires a full statistical description of observed samples as the joint density or conditional density. Learning the model may require generating enough labeled data, which are consistent with the observations. Knowledge of the joint distribution is also important in controllability and observability of stochastic systems, for instance, to make optimum decisions under uncertainty. In these cases, constructing a generative model of graph observations and graph responses is crucial.
In this paper, the generative model of multiple random observations is constrained by the marginal distributions and the second-order statistics. This does not uniquely define the corresponding joint distribution, but it can be further constrained by other, higher-order moments [7,8]. The behavior of the higher- and lower-order moments is described by Hölder's inequality [17]. An open research problem is whether there always exists at least one multivariate distribution given the set of marginal distributions and the pairwise correlations or covariances; such a distribution is guaranteed to exist if the observations have been obtained from a real-world system or a simulated model. There are multivariate distributions, such as the multivariate Cauchy distribution, for which the correlations cannot be defined. In addition, the marginal or conditional distributions may have some of their parameters undefined, which increases the number of degrees of freedom. The unknown parameters could then be estimated from the known moments or other known statistics. Furthermore, practical problems often require generating samples that are correlated in more than one dimension.
In general, there is a tradeoff between accurately approximating the marginals as mixture distributions and the achievable magnitude of the correlation coefficient, as indicated by Equation (11). This has been observed in the numerical examples involving two random variables, assuming conjugate components in the mixture approximations of the marginal distributions. The main advantage of the proposed generative model is that it can be readily sampled. The accuracy–correlation tradeoff could be improved by assuming nonconjugate mixture component distributions or by considering other types of generative models, albeit at the cost of the sampling efficacy. Alternatively, the pairwise covariance of the samples $X_i$ and $Y_i$ can be reduced by simple averaging. In particular, let $\bar{X} = \frac{1}{N_1} \sum_{i=1}^{N_1} X_i$ and $\bar{Y} = \frac{1}{N_2} \sum_{i=1}^{N_2} Y_i$, and assume that the samples $X_i$ and $Y_i$ are otherwise uncorrelated. Thus, let $\mathrm{E}[X_i X_j] = \mathrm{E}[Y_i Y_j] = \mathrm{E}[X_i Y_j] = 0$, for $i \neq j$, whereas $\mathrm{E}[X_i Y_i] \neq 0$. It is then straightforward to show that the correlation coefficient defined in (11) changes to
$$\rho_{\bar{X}\bar{Y}} = \sqrt{\frac{N_2}{N_1}}\, \rho_{XY}, \quad N_2 \leq N_1. \qquad (20)$$
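The averaging rule (20) is straightforward to check numerically; the following sketch (with assumed illustrative parameters) averages $N_1$ samples of $X$ and $N_2$ samples of $Y$ and compares the empirical correlation with $\sqrt{N_2/N_1}\,\rho_{XY}$.

```python
import numpy as np

rng = np.random.default_rng(3)
rho, N1, N2, trials = 0.3, 8, 2, 100_000
z = rng.standard_normal((2, trials, N1))
X = z[0]
Y = rho * z[0] + np.sqrt(1 - rho**2) * z[1]      # corr(X_i, Y_i) = rho, zero otherwise
Xbar = X.mean(axis=1)                            # average of N1 samples of X
Ybar = Y[:, :N2].mean(axis=1)                    # average of N2 samples of Y
print(np.corrcoef(Xbar, Ybar)[0, 1])             # empirical value
print(np.sqrt(N2 / N1) * rho)                    # theoretical value 0.15 from (20)
```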
Another strategy worth exploring is to investigate the kernel approximations of multivariate densities [11] and also the bounds on these densities. For instance, for any events $A$ and $B$, the joint probability can be bounded as
$$\Pr[A] + \Pr[B] - 1 \leq \Pr[A \cap B] \leq \sqrt{\Pr[A] \Pr[B]}. \qquad (21)$$
Then, for $A = \{X: X \leq x\}$ and $B = \{Y: Y \leq y\}$, the joint cumulative distribution function $F_{XY}$ can be bounded as
$$F_X(x) + F_Y(y) - 1 \leq F_{XY}(x, y) \leq \sqrt{F_X(x) F_Y(y)}. \qquad (22)$$
For $N = 2$, the bivariate joint cumulative distribution function is obtained by integrating Equation (2), i.e.,
$$F_{XY}(x, y) = \sum_{k=1}^{K} \alpha_k F_{Xk}(x) F_{Yk}(y). \qquad (23)$$
Assuming Equation (22), it can be bounded as
$$\sum_{k=1}^{K} \alpha_k \left( F_{Xk}(x) + F_{Yk}(y) - 1 \right) \leq \sum_{k=1}^{K} \alpha_k F_{Xk}(x) F_{Yk}(y) \leq \sqrt{\sum_{k=1}^{K} \alpha_k F_{Xk}(x) \sum_{k=1}^{K} \alpha_k F_{Yk}(y)}. \qquad (24)$$
Moreover, in many stochastic systems, the densities evolve in time, so the discrete mixture (1) could be rewritten for the case of continuous time $t$ as
$$f(X) = \int_{\mathbb{R}} \alpha(t)\, \tilde{f}(X; t)\, \mathrm{d}t, \qquad (25)$$
where $\alpha(t)$ is another probability distribution, i.e., $\alpha(t) \geq 0$ and $\int_{\mathbb{R}} \alpha(t)\, \mathrm{d}t = 1$. The expression (25) then represents a mean-time density of the system response.
Lastly, the generative model constructed in this paper exactly fits the first and the second statistical moments whilst approximating the marginal distributions. An alternative strategy to construct a generative model may instead emphasize fitting exactly the marginal distributions and relaxing the constraints involving the statistical moments.

6. Conclusions

The graph signals were defined as random vectors with incomplete knowledge of their statistics. A generative probabilistic model was then proposed to sample graph signals from given marginal distributions with given pairwise correlations. The generative model approximated the multivariate distribution by a mixture of independent univariate densities, which were then sampled to generate correlated random sequences consistent with the observations. The numerical results were presented for a bivariate case of three specific marginal distributions. The results confirmed that the proposed generative model experiences a tradeoff between accurately approximating the marginal densities and the achievable correlations, with the correlation coefficient magnitudes not greater than about $0.3$. However, the cross-correlations of the observed samples can be reduced by simple averaging. Future work will focus on improving the approximation–correlation tradeoff and on defining generative models with estimation of unknown model parameters.

Funding

This research was funded by a start-up research grant provided by Zhejiang University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Dong, X.; Thanou, D.; Rabbat, M.; Frossard, P. Learning Graphs From Data. IEEE Signal Process. Mag. 2019, 36, 44–63.
  2. Pearl, J. Causal inference in statistics: An overview. Stat. Surv. 2009, 3, 96–146.
  3. Kay, S.M. Fundamentals of Statistical Signal Processing: Estimation Theory; Prentice Hall: Upper Saddle River, NJ, USA, 1993; Volume I.
  4. Ortega, A.; Frossard, P.; Kovačević, J.; Moura, J.M.F.; Vandergheynst, P. Graph Signal Processing: Overview, Challenges, and Applications. Proc. IEEE 2018, 106, 808–828.
  5. Kotz, S.; Balakrishnan, N.; Johnson, N.L. Continuous Multivariate Distributions, 2nd ed.; John Wiley & Sons: New York, NY, USA, 2000.
  6. Gelman, A.; Carlin, J.B.; Stern, H.S.; Dunson, D.B.; Vehtari, A.; Rubin, D.B. Bayesian Data Analysis, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2014.
  7. Foldnes, N.; Olsson, U.H. A simple simulation technique for non-normal data with pre-specified skewness, kurtosis and covariance matrix. Multivar. Behav. Res. 2016, 51, 207–219.
  8. Lyhagen, J. A method to generate multivariate data with the desired moments. Commun. Stat. Simul. Comput. 2008, 37, 2063–2075.
  9. Qu, W.; Liu, H.; Zhang, Z. A method of generating multivariate non-normal random numbers with desired multivariate skewness and kurtosis. Behav. Res. Methods 2020, 52, 939–946.
  10. Loskot, P. Polynomial Representations of High-Dimensional Observations of Random Processes. Mathematics 2021, 9, 123.
  11. Wȩglarczyk, S. Kernel density estimation and its application. ITM Web Conf. 2018, 23, 00037.
  12. Botev, Z.I.; Grotowski, J.F.; Kroese, D.P. Kernel Density Estimation via Diffusion. Ann. Stat. 2010, 38, 2916–2957.
  13. O’Brien, T.A.; Kashinath, K.; Cavanaugh, N.R.; Collins, W.D.; O’Brien, J.P. A fast and objective multidimensional kernel density estimation method: FastKDE. Comput. Stat. Data Anal. 2016, 101, 148–160.
  14. Papoulis, A.; Pillai, S.U. Probability, Random Variables, and Stochastic Processes, 4th ed.; McGraw-Hill: New York, NY, USA, 2002.
  15. Thede, L. Practical Analog and Digital Filter Design; Artech House: Norwood, MA, USA, 2004.
  16. Kundu, D.; Gupta, R.D. A convenient way of generating gamma random variables using generalized exponential distribution. Comput. Stat. Data Anal. 2007, 51, 2796–2802.
  17. Beckenbach, E.; Bellman, R. Inequalities; Springer: Berlin/Heidelberg, Germany, 1961.
Figure 1. The number of degrees of freedom to determine the $NK$ unknown coefficients $\bar{X}_{ki}$ from the constraints C1 and C2.
Figure 2. Linearization of the distributions in the mixture decomposition of the marginal distribution $f_X$ in the vicinity of an arbitrary point $X_0$.
Figure 3. The Kullback–Leibler divergence of approximating the named univariate distributions by the two-component mixture distributions, as a function of the component distributions' displacement.
Figure 4. The correlation coefficient of the named bivariate distributions as a function of the component distributions' displacement.
