1. Introduction
It is well known that, for the purpose of modeling dependence in a risk-management setting, the multivariate normal distribution is not flexible enough, and its use can therefore lead to a misleading assessment of risk(s). Indeed, the multivariate normal has light tails and its copula is tail-independent, so that inference based on this model heavily underestimates the probability of joint extreme events. An important class of distributions that generalizes this simple model is that of
normal variance mixtures. A random vector
$\mathit{X}=({X}_{1},\dots ,{X}_{d})$ follows a normal variance mixture, denoted by
$\mathit{X}\sim {\mathrm{NVM}}_{d}(\mathit{\mu},\mathsf{\Sigma},{F}_{W})$, if, in distribution,
$$\mathit{X}=\mathit{\mu}+\sqrt{W}\,A\mathit{Z}, \tag{1}$$
where
$\mathit{\mu}\in {\mathbb{R}}^{d}$ is the
location (vector),
$\mathsf{\Sigma}=A{A}^{\top}$ for
$A\in {\mathbb{R}}^{d\times k}$ denotes the symmetric, positive semidefinite
scale (matrix) and
$W\sim {F}_{W}$ is a nonnegative random variable independent of the random vector
$\mathit{Z}\sim {N}_{k}(\mathbf{0},{I}_{k})$ (where
${I}_{k}\in {\mathbb{R}}^{k\times k}$ denotes the identity matrix); see, for example, (
McNeil et al. 2015, Section 6.2) or (
Hintz et al. 2020). Here, the random variable
W can be thought of as a shock mixing the normal
$\mathit{Z}$, thus allowing
$\mathit{X}$ to have different tail behavior and dependence structure than the special case of a multivariate normal.
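To fix ideas, the stochastic representation (1) can be simulated in a few lines. The following sketch (in Python rather than the R implementation used later in the paper; the parameters are our own toy choices) uses an inverse-gamma mixing distribution, in which case each margin of $\mathit{X}$ is univariate t, and compares a sample quantile of one margin with the corresponding t quantile.

```python
import numpy as np
from scipy import stats

# Sample X = mu + sqrt(W) * A Z for an ungrouped normal variance mixture.
# Illustrative sketch (toy parameters of our own choosing): with
# W ~ IG(nu/2, nu/2), X is multivariate t and each margin is t_nu.
rng = np.random.default_rng(1)
d, nu = 3, 4.0
Sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])
A = np.linalg.cholesky(Sigma)              # Sigma = A A^T

n = 100_000
W = stats.invgamma.rvs(nu/2, scale=nu/2, size=n, random_state=rng)
Z = rng.standard_normal((n, d))
X = np.sqrt(W)[:, None] * (Z @ A.T)        # mu = 0
# margin check: sample 95% quantile of X_1 vs. the t_nu quantile
print(np.quantile(X[:, 0], 0.95), stats.t.ppf(0.95, df=nu))
```

The two printed numbers should agree up to Monte Carlo error, illustrating that the mixing variable W changes the (here normal) tail behavior of the components.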
The multivariate
t distribution with
$\nu >0$ degrees of freedom (dof) is also a special case of (
1), for
$W\sim IG(\nu /2,\nu /2)$; a random variable (rv)
W is said to follow an inverse-gamma distribution with shape
$\alpha >0$ and rate
$\beta >0$, notation
$W\sim IG(\alpha ,\beta )$, if
W has density
${f}_{W}\left(w\right)={\beta}^{\alpha}{w}^{-\alpha -1}\exp(-\beta /w)/\mathsf{\Gamma}\left(\alpha \right)$ for
$w>0$ (here,
$\mathsf{\Gamma}\left(x\right)={\int}_{0}^{\infty}{t}^{x-1}\exp(-t)\,\mathrm{d}t$ denotes the gamma function). If
$W\sim IG(\nu /2,\nu /2)$, then
${X}_{j}\sim {t}_{\nu}({\mu}_{j},{\mathsf{\Sigma}}_{jj})$,
$j=1,\dots ,d$, so that all margins are univariate
t with the same dof
$\nu $. The
t copula, which is the implicitly derived copula from
$\mathit{X}\sim {t}_{d}(\nu ,\mathbf{0},P)$ for a correlation matrix
P via Sklar’s theorem, is a widely used copula in risk management; see, e.g.,
Demarta and McNeil (
2005). It allows one to model pairwise dependencies, including tail dependence, flexibly via the correlation matrix
P. When
$P={I}_{d}$, all
k-dimensional margins of
$\mathit{X}$ are identically distributed. To overcome this limitation, one can allow different margins to have different dof. On a copula level, this leads to the notion of grouped
t copulas of
Daul et al. (
2003) and generalized
t copulas of
Luo and Shevchenko (
2010).
In this paper, we, more generally, define
grouped normal variance mixtures via the stochastic representation
$$\mathit{X}=\mathit{\mu}+\mathrm{diag}\big(\sqrt{{W}_{1}},\dots ,\sqrt{{W}_{d}}\big)\,A\mathit{Z}, \tag{2}$$
where
$\mathit{W}=({W}_{1},\dots ,{W}_{d})$ is a
d-dimensional nonnegative and comonotone random vector with
${W}_{j}\sim {F}_{{W}_{j}}$ that is independent of
$\mathit{Z}$. Denote by
${F}_{W}^{\leftarrow}\left(u\right)=inf\{w\ge 0:{F}_{W}\left(w\right)\ge u\}$ the quantile function of a random variable
W. Comonotonicity of the
${W}_{j}$ implies the stochastic representation
$$\mathit{W}=\big({F}_{{W}_{1}}^{\leftarrow}\left(U\right),\dots ,{F}_{{W}_{d}}^{\leftarrow}\left(U\right)\big),\qquad U\sim \mathrm{U}(0,1). \tag{3}$$
If a
d-dimensional random vector
$\mathit{X}$ satisfies (
2) with
$\mathit{W}$ given as in (
3), we use the notation
$\mathit{X}\sim \mathrm{gNVM}(\mathit{\mu},\mathsf{\Sigma},{F}_{\mathit{W}})$ where
${F}_{\mathit{W}}\left(\mathit{w}\right)=\mathbb{P}(\mathit{W}\le \mathit{w})$ for
$\mathit{w}\in {\mathbb{R}}^{d}$ and the inequality is understood componentwise.
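Sampling a grouped normal variance mixture differs from the ungrouped case only in that a single uniform U drives all d quantile transforms. A minimal Python sketch of this comonotone construction (toy parameters of our own choosing; inverse-gamma mixing, so margin j is ${t}_{{\nu}_{j}}$):

```python
import numpy as np
from scipy import stats

# Grouped NVM sampling: ONE uniform U per sample drives all comonotone
# mixing variables W_j = F_{W_j}^{<-}(U). Sketch with our own toy parameters;
# here W_j ~ IG(nu_j/2, nu_j/2), so margin j is t_{nu_j}.
rng = np.random.default_rng(7)
nus = np.array([3.0, 6.0])                  # S = d = 2, one dof per component
A = np.linalg.cholesky(np.array([[1.0, 0.5],
                                 [0.5, 1.0]]))
n = 50_000
U = rng.random(n)                           # single uniform per sample
W = stats.invgamma.ppf(U[:, None], nus/2, scale=nus/2)   # (n, 2), comonotone
Z = rng.standard_normal((n, 2))
X = np.sqrt(W) * (Z @ A.T)                  # mu = 0
print(np.quantile(X[:, 0], 0.95), stats.t.ppf(0.95, df=nus[0]))
```

Since both quantile transforms are increasing in the same U, sorting the sample by ${W}_{1}$ also sorts ${W}_{2}$, which is exactly the comonotonicity in (3).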
As mentioned above, in the case of an (ungrouped) normal variance mixture distribution from (
1), the scalar random variable (rv)
W can be regarded as a shock affecting all components of
$\mathit{X}$. In the more general setting considered in this paper where
$\mathit{W}$ is a vector of comonotone mixing rvs, different, perfectly dependent random variables affect different margins of
$\mathit{X}$. By moving from a scalar mixing rv to a comonotone random vector, one obtains non-elliptical distributions well beyond the classical multivariate
t case, giving rise to flexible modeling of joint and marginal body and tail behaviors. The price to pay for this generalization is significant computational challenges: Not even the density of a grouped
t distribution is available in closed form.
At first glance, the definition given in (
2) does not indicate any “grouping” yet. However, Equation (
3) allows one to group components of the random vector
$\mathit{X}$ such that all components within a group have the same mixing distribution. More precisely, let
$\mathit{W}$ be split into
S subvectors, i.e.,
$\mathit{W}=({\mathit{W}}_{1},\dots ,{\mathit{W}}_{S})$ where
${\mathit{W}}_{k}$ has dimension
${d}_{k}$ for
$k=1,\dots ,S$ and
${\sum}_{k=1}^{S}{d}_{k}=d$. Now let each
${\mathit{W}}_{k}$ have stochastic representation
${\mathit{W}}_{k}=({F}_{{W}_{k}}^{\leftarrow}\left(U\right),\dots ,{F}_{{W}_{k}}^{\leftarrow}\left(U\right))$. Hence, all univariate margins of the subvector
${\mathit{W}}_{k}$ are identically distributed. This implies that all margins of the corresponding subvector
${\mathit{X}}_{k}$ are of the same type.
An example is the copula derived from
$\mathit{X}$ in (
2) when
${F}_{{W}_{k}}=IG\left({\nu}_{k}/2,{\nu}_{k}/2\right)$ for
$k=1,\dots ,S$; this is the aforementioned grouped
t copula. Here, different margins of the copula follow (potentially) different
t copulas with different dof, allowing for more flexibility in modeling pairwise dependencies. A grouped
t copula with
$S=d$, that is, when each component has its own mixing distribution, was proposed in
Venter et al. (
2007) (therein called “individuated
t copula”) and studied in more detail in
Luo and Shevchenko (
2010) (therein called “
t copula with multiple dof”). If
$S=1$, the classical
t copula with exactly one dof parameter is recovered.
For notational convenience, derivations in this paper are often done for the case
$S=d$, so that the
${F}_{{W}_{j}}$ are all different; the case
$S<d$, that is when grouping is present, is merely a special case where some of the
${F}_{{W}_{j}}$ are identical. That being said, we chose to keep the name “grouped” to refer to this class of models so as to reflect the original motivation for this type of model, e.g., as in
Daul et al. (
2003), where it is used to model the components of a portfolio in which there are subgroups representing different business sectors.
Previous work on grouped
t copulas and their corresponding distributions includes some algorithms for the tasks needed to handle these models, but was mostly focused on demonstrating the superiority of this class of models over special cases such as the multivariate normal or
t distribution. More precisely, in
Daul et al. (
2003), the grouped
t copula was introduced and applied to model an internationally diversified credit portfolio of 92 risk factors split into 8 subgroups. It was demonstrated that the grouped
t copula is superior to both the Gaussian and
t copula with regard to modeling the tail dependence present in the data.
Luo and Shevchenko (
2010) also study the grouped
t copula and, unlike in
Daul et al. (
2003), allow group sizes of 1 (corresponding to
$S=d$ in our definition). They provide calibration methods to fit the copula to data and furthermore study bivariate characteristics of the grouped
t copula, including symmetry properties and tail dependence.
However, to the best of our knowledge, there currently does not exist an encompassing body of work providing all algorithms and formulas required to handle these copulas and their corresponding distributions, both in terms of evaluating distributional quantities and in terms of general fitting algorithms. In particular, not even the problem of computing the distribution and density function of a grouped t copula has been addressed. Our paper fills this gap by providing a complete set of algorithms for performing the main computational tasks associated with these distributions and their associated copulas, and does so in as automated a way as possible. This is done not only for grouped t copulas, but (in many cases) for the more general grouped normal variance mixture distributions/copulas, which allow for even further flexibility in modeling the shock variables $\mathit{W}$. Furthermore, we assume that the only available information about the distribution of the ${W}_{j}$ is the marginal quantile function in the form of a “black box”, meaning that we can only evaluate these quantile functions but have no mathematical expression for them (so that neither the density nor the distribution function of ${W}_{j}$ is available in closed form).
Our work includes the following contributions: (i) we develop an algorithm to evaluate the distribution function of a grouped NVM model. Our method only requires the user to provide a function that evaluates the quantile function of the
${W}_{j}$ through a black box. As such, different mixing distributions can be studied by merely providing a quantile function without having to implement an integration routine for the model at hand; (ii) as mentioned above, the density function for a grouped
t distribution does not exist in closed form, nor does it for the more general grouped NVM case. We provide an adaptive algorithm to estimate this density function in a very general setting. The adaptive mechanism we propose ensures that the estimation procedure is precise even for points that are far from the mean; (iii) to estimate Kendall’s tau and Spearman’s rho for a two-dimensional grouped NVM copula, we provide a representation as an expectation, which in turn leads to an easy-to-approximate two- or three-dimensional integral; (iv) we provide an algorithm to estimate the copula and its density associated with the grouped
t copula, and fitting algorithms to estimate the parameters of a grouped NVM copula based on a dataset. While the problem of parameter estimation was already studied in
Daul et al. (
2003) and
Luo and Shevchenko (
2010), the computation of the copula density which is required for the joint estimation of all dof parameters has not been investigated in full generality for arbitrary dimensions yet, which is a gap we fill in this paper.
The four items from the list of contributions described in the previous paragraph correspond to
Section 3,
Section 4,
Section 5 and
Section 6 of the paper.
Section 2 includes a brief presentation of the notation used, basic properties of grouped NVM distributions and a description of randomized quasi-Monte Carlo methods that are used throughout the paper, since most quantities of interest require the approximation of integrals.
Section 7 provides a discussion. The proofs are given in
Section 8.
All our methods are implemented in the
R package
nvmix (
Hofert et al. (
2020)) and all numerical results are reproducible with the demo
grouped_mixtures.
3. Distribution Function
Let $-\infty \le \mathit{a}<\mathit{b}\le \infty $ componentwise (entries $\pm \infty $ to be interpreted as the corresponding limits). Then $F(\mathit{a},\mathit{b})=\mathbb{P}(\mathit{a}<\mathit{X}\le \mathit{b})$ is the probability that the random vector $\mathit{X}$ falls in the hyperrectangle spanned by the lower-left and upper-right endpoints $\mathit{a}$ and $\mathit{b}$, respectively. If $\mathit{a}=(-\infty ,\dots ,-\infty )$, we recover $F(\mathit{a},\mathit{x})=F\left(\mathit{x}\right)=\mathbb{P}({X}_{1}\le {x}_{1},\dots ,{X}_{d}\le {x}_{d})$, which is the (cumulative) distribution function of $\mathit{X}$.
Assume wlog that
$\mathit{\mu}=\mathbf{0}$, otherwise adjust
$\mathit{a}$,
$\mathit{b}$ accordingly. Then
$$F(\mathit{a},\mathit{b})=\int_{0}^{1}{\mathsf{\Phi}}_{\mathsf{\Sigma}}\big({D}_{u}^{-1}\mathit{a},{D}_{u}^{-1}\mathit{b}\big)\,\mathrm{d}u,\qquad {D}_{u}=\mathrm{diag}\Big(\sqrt{{F}_{{W}_{1}}^{\leftarrow}\left(u\right)},\dots ,\sqrt{{F}_{{W}_{d}}^{\leftarrow}\left(u\right)}\Big), \tag{11}$$
where
${\mathsf{\Phi}}_{\mathsf{\Sigma}}(\mathit{a},\mathit{b})=\mathbb{P}(\mathit{a}<\mathit{Y}\le \mathit{b})$ for
$\mathit{Y}\sim {N}_{d}(\mathbf{0},\mathsf{\Sigma})$. Note that the function
${\mathsf{\Phi}}_{\mathsf{\Sigma}}(\mathit{a},\mathit{b})$ itself is a
ddimensional integral for which no closed formula exists and is typically approximated via numerical methods; see, e.g.,
Genz (
1992).
Comonotonicity of the
${W}_{j}$ allowed us to write
$F(\mathit{a},\mathit{b})$ as a
$(d+1)$-dimensional integral; had the
${W}_{j}$ a different dependence structure, this convenience would be lost and the resulting integral in (
11) could be up to
$2d$-dimensional (e.g., when all
${W}_{j}$ are independent).
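As a numerical illustration of this representation, one can integrate the normal probability over the single mixing uniform directly. The crude Python sketch below (toy bivariate grouped t with our own parameters, upper limit only, so that $F(\mathit{a},\mathit{b})$ is a cdf value; the algorithm of Section 3.1 is far more efficient) averages ${\mathsf{\Phi}}_{\mathsf{\Sigma}}$ over a midpoint grid in u.

```python
import numpy as np
from scipy import stats

# Crude check of the (d+1)-dimensional representation: average the
# multivariate normal probability Phi_Sigma(D_u^{-1} a, D_u^{-1} b) over the
# single mixing uniform u (a = -infinity here, so the result is the cdf at b).
# Toy bivariate grouped t example with parameters of our own choosing.
nus = np.array([3.0, 6.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
b = np.array([1.0, 1.0])
mvn = stats.multivariate_normal(mean=np.zeros(2), cov=Sigma)

u = (np.arange(1000) + 0.5) / 1000          # midpoint rule over (0, 1)
W = stats.invgamma.ppf(u[:, None], nus/2, scale=nus/2)   # comonotone W_j(u)
est = np.mean([mvn.cdf(b / np.sqrt(w)) for w in W])
print(est)
```

Each inner evaluation is itself a d-dimensional normal probability, which is what the transformation described next avoids.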
3.1. Estimation
In
Hintz et al. (
2020), randomized quasi-Monte Carlo (RQMC) methods have been derived to approximate the distribution function of a normal variance mixture
$\mathit{X}\sim {\mathrm{NVM}}_{d}(\mathit{\mu},\mathsf{\Sigma},{F}_{W})$ from (
1). Grouped normal variance mixtures can be dealt with similarly, thanks to the comonotonicity of the mixing random variables in
$\mathit{W}$.
In order to apply RQMC to the problem of estimating
$F(\mathit{a},\mathit{b})$, we need to transform
$F(\mathit{a},\mathit{b})$ to an integral over the unit hypercube. To this end, we first address
${\mathsf{\Phi}}_{\mathsf{\Sigma}}$. Let
$C={\left({C}_{ij}\right)}_{i,j=1}^{d}$ be the Cholesky factor of
$\mathsf{\Sigma}$ (a lower triangular matrix such that
$C{C}^{\top}=\mathsf{\Sigma}$). We assume that
$\mathsf{\Sigma}$ has full rank which implies
${C}_{jj}>0$ for
$j=1,\dots ,d$.
Genz (
1992) (see also
Genz and Bretz (
1999,
2002,
2009)) uses a series of transformations, relying on
C being a lower triangular matrix, to write
$${\mathsf{\Phi}}_{\mathsf{\Sigma}}(\mathit{a},\mathit{b})=\int_{{(0,1)}^{d-1}}\prod_{i=1}^{d}\big({\widehat{e}}_{i}-{\widehat{d}}_{i}\big)\,\mathrm{d}({u}_{1},\dots ,{u}_{d-1}), \tag{12}$$
where the
${\widehat{d}}_{i}$ and
${\widehat{e}}_{i}$ are recursively defined via ${\widehat{e}}_{1}=\mathsf{\Phi}({b}_{1}/{C}_{11})$ and
$${\widehat{e}}_{i}=\mathsf{\Phi}\left(\frac{{b}_{i}-\sum_{j=1}^{i-1}{C}_{ij}\,{\mathsf{\Phi}}^{-1}\big({\widehat{d}}_{j}+{u}_{j}({\widehat{e}}_{j}-{\widehat{d}}_{j})\big)}{{C}_{ii}}\right),\qquad i=2,\dots ,d,$$
where $\mathsf{\Phi}$ and ${\mathsf{\Phi}}^{-1}$ denote the standard normal distribution and quantile function, and
${\widehat{d}}_{i}$ is
${\widehat{e}}_{i}$ with
${b}_{i}$ replaced by
${a}_{i}$ for
$i=1,\dots ,d$. Note that the final integral in (12) is
$(d-1)$-dimensional.
Combining the representation (12) of
${\mathsf{\Phi}}_{\mathsf{\Sigma}}$ with Equation (11) yields
$$F(\mathit{a},\mathit{b})=\int_{{(0,1)}^{d}}g(\mathit{u})\,\mathrm{d}\mathit{u}, \tag{14}$$
where
$$g(\mathit{u})=\prod_{i=1}^{d}{g}_{i}({u}_{0},\dots ,{u}_{i-1}),\qquad {g}_{i}={e}_{i}-{d}_{i}, \tag{15}$$
for
$\mathit{u}=({u}_{0},{u}_{1},\dots ,{u}_{d-1})\in {(0,1)}^{d}$. The
${e}_{i}$ are recursively defined by ${e}_{1}=\mathsf{\Phi}\big({b}_{1}/\big({C}_{11}\sqrt{{F}_{{W}_{1}}^{\leftarrow}({u}_{0})}\big)\big)$ and
$${e}_{i}=\mathsf{\Phi}\left(\frac{{b}_{i}/\sqrt{{F}_{{W}_{i}}^{\leftarrow}({u}_{0})}-\sum_{j=1}^{i-1}{C}_{ij}\,{\mathsf{\Phi}}^{-1}\big({d}_{j}+{u}_{j}({e}_{j}-{d}_{j})\big)}{{C}_{ii}}\right)$$
for
$i=2,\dots ,d$, and the
${d}_{i}$ are the
${e}_{i}$ with
${b}_{i}$ replaced by
${a}_{i}$ for
$i=1,\dots ,d$.
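The recursion just described can be sketched as follows (Python; helper names and parameters are ours, with $\mathit{a}=-\mathbf{\infty}$ so that all ${d}_{i}$ vanish, and a scrambled Sobol' point set standing in for the RQMC machinery of Section 2.2):

```python
import numpy as np
from scipy.stats import norm, invgamma, qmc

# Sketch of the separation-of-variables integrand g(u) for a grouped t with
# upper limits b and a = -infinity (so all d_i = 0), integrated over (0,1)^d
# with a scrambled Sobol' sequence. Names and parameters are ours, not nvmix's.
def g_eval(U, b, C, nus):
    """Evaluate g row-wise for points U in (0,1)^d; C is the Cholesky factor."""
    n, d = U.shape
    W = invgamma.ppf(U[:, 0][:, None], nus/2, scale=nus/2)   # comonotone W_j(u0)
    bs = b / np.sqrt(W)                                      # scaled limits
    y = np.zeros((n, d))             # y_j = Phi^{-1}(u_j * e_j)
    g = np.ones(n)
    for i in range(d):
        e = norm.cdf((bs[:, i] - y[:, :i] @ C[i, :i]) / C[i, i])
        g *= e                       # g_i = e_i - d_i with d_i = 0
        if i < d - 1:
            y[:, i] = norm.ppf(U[:, i + 1] * e)
    return g

nus = np.array([3.0, 6.0])
C = np.linalg.cholesky(np.array([[1.0, 0.5], [0.5, 1.0]]))
b = np.array([1.0, 1.0])
U = qmc.Sobol(d=2, scramble=True, seed=42).random(2**12)
U = np.clip(U, 1e-12, 1 - 1e-12)     # keep the quantile transforms finite
print(g_eval(U, b, C, nus).mean())   # estimates P(X_1 <= 1, X_2 <= 1)
```

Note that the quantile functions ${F}_{{W}_{j}}^{\leftarrow}$ only enter through the first coordinate ${u}_{0}$, in line with the $(d+1)$-dimensional representation.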
Summarizing, we were able to write $F(\mathit{a},\mathit{b})$ as an integral over the d-dimensional unit hypercube. Our algorithm to approximate $F(\mathit{a},\mathit{b})$ consists of two steps:
First, a greedy reordering algorithm is applied to the inputs
$\mathit{a}$,
$\mathit{b}$,
$\mathsf{\Sigma}$. It reorders the components
$1,\dots ,d$ of
$\mathit{a}$ and
$\mathit{b}$ as well as the corresponding rows and columns in
$\mathsf{\Sigma}$ in a way that the expected ranges of
${g}_{i}$ in (
15) are increasing with the index
i for
$i=1,\dots ,d$. Observe that the integration variable
${u}_{i}$ is present in all remaining
$d-i+1$ integrals in (
14) whose ranges are determined by the ranges of
${g}_{1},\dots ,{g}_{i}$; reordering the variables according to expected ranges therefore (in the vast majority of cases) reduces the overall variability of
g (namely,
$\mathrm{var}\left(g\left(\mathit{U}\right)\right)$ for
$\mathit{U}\sim \mathrm{U}{(0,1)}^{d}$). Reordering also makes the first variables “more important” than the last ones, thereby reducing the effective dimension of the integrand. This is particularly beneficial for quasi-Monte Carlo methods, as these methods are known to perform well in high-dimensional problems with low effective dimension; see, e.g.,
Caflisch et al. (
1997),
Wang and Sloan (
2005). For a detailed description of the method, see (
Hintz et al. 2020, Algorithm 3.2) (with
${a}_{j}/{\mu}_{\sqrt{W}}$ replaced by
${a}_{j}/{\mu}_{\sqrt{{W}_{j}}}$ and similarly for
${b}_{j}$ for
$j=1,\dots ,d$ to account for the generalization); similar reordering strategies have been proposed in
Gibson et al. (
1994) for calculating multivariate normal and in
Genz and Bretz (
2002) for multivariate
t probabilities.
Second, an RQMC algorithm as described in
Section 2.2 is applied to approximate the integral in (
14) with reordered
$\mathit{a}$,
$\mathit{b}$,
$\mathsf{\Sigma}$ and
${F}_{\mathit{W}}$. Instead of integrating
g from (
15) directly, antithetic variates are employed so that effectively, the function
$\tilde{g}\left(\mathit{u}\right)=(g\left(\mathit{u}\right)+g(\mathbf{1}-\mathit{u}))/2$ is integrated.
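Antithetic variates are a generic variance-reduction device; the toy Python snippet below (our own example, not the paper's integrand) shows the pairing $\tilde{g}(u)=(g(u)+g(1-u))/2$ and the resulting variance reduction for a monotone integrand.

```python
import numpy as np

# Antithetic variates: average g(u) with g(1-u). For a monotone g the two
# evaluations are negatively correlated, which reduces the estimator variance.
# Toy integrand (ours): g(u) = exp(u) with known integral e - 1.
rng = np.random.default_rng(0)
g = np.exp
u = rng.random(100_000)
plain = g(u)                       # crude MC evaluations
anti = 0.5 * (g(u) + g(1.0 - u))   # antithetic pairs
print(plain.mean(), anti.mean())   # both estimate e - 1
print(plain.var(), anti.var())     # antithetic variance is much smaller
```

For exp on (0,1) the antithetic estimator reduces the variance by well over an order of magnitude at the cost of one extra function evaluation per point.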
The algorithm to estimate $F(\mathit{a},\mathit{b})$ just described is implemented in the function pgnvmix() of the R package nvmix.
3.2. Numerical Results
In order to assess the performance of our algorithm described in
Section 3.1, we estimate the error as a function of the number of function evaluations. Three estimators are considered. First, the “Crude MC” estimator is constructed by sampling
${\mathit{X}}_{1},\dots ,{\mathit{X}}_{n}\overset{\mathrm{ind.}}{\sim}\mathrm{gNVM}(\mathit{\mu},\mathsf{\Sigma},{F}_{\mathit{W}})$ and estimating
$\mathbb{P}(\mathit{X}\le \mathit{x})$ by
${\widehat{\mu}}_{n}^{\mathrm{MC}}=(1/n){\sum}_{i=1}^{n}{\mathbb{1}}_{\{{\mathit{X}}_{i}\le \mathit{x}\}}$. The second and third estimators are based on the integrand
g from (
15), which is integrated once using MC (“g (MC)”) and once using a randomized Sobol’ sequence (“g (sobol)”). In either case, variable reordering is applied first.
We perform our experiments for an inverse-gamma mixture. As motivated in the introduction, an important special case of (grouped) normal variance mixtures is obtained when the mixing distribution is inverse-gamma. In the ungrouped case when
$\mathit{X}\sim {\mathrm{NVM}}_{d}(\mathit{\mu},\mathsf{\Sigma},{F}_{W})$ with
$W\sim IG(\nu /2,\nu /2)$, the distribution of
$\mathit{X}$ is multivariate
t (notation
$\mathit{X}\sim {t}_{d}(\nu ,\mathit{\mu},\mathsf{\Sigma})$) with density
$$f(\mathit{x})=\frac{\mathsf{\Gamma}\big(\frac{\nu +d}{2}\big)}{\mathsf{\Gamma}\big(\frac{\nu}{2}\big){(\nu \pi )}^{d/2}\sqrt{|\mathsf{\Sigma}|}}{\left(1+\frac{{(\mathit{x}-\mathit{\mu})}^{\top}{\mathsf{\Sigma}}^{-1}(\mathit{x}-\mathit{\mu})}{\nu}\right)}^{-\frac{\nu +d}{2}},\qquad \mathit{x}\in {\mathbb{R}}^{d}. \tag{17}$$
The distribution function of
$\mathit{X}\sim {t}_{d}(\nu ,\mathit{\mu},\mathsf{\Sigma})$ does not admit a closed form; estimation of the latter was discussed for instance in
Genz and Bretz (
2009),
Hintz et al. (
2020),
Cao et al. (
2020). The same holds for a grouped inverse-gamma mixture model. If
${W}_{j}\sim IG({\nu}_{j}/2,{\nu}_{j}/2)$ for
$j=1,\dots ,d$, the random vector
$\mathit{X}$ follows a grouped
t distribution, denoted by
$\mathit{X}\sim {gt}_{d}({\nu}_{1},\dots ,{\nu}_{d};\mathit{\mu},\mathsf{\Sigma})$ or by
$\mathit{X}\sim {gt}_{d}(\mathit{\nu},\mathit{\mu},\mathsf{\Sigma})$ for
$\mathit{\nu}=({\nu}_{1},\dots ,{\nu}_{d})$. If
$1<S<d$, denote by
${d}_{1},\dots ,{d}_{S}$ the group sizes. In this case, we use the notation
$\mathit{X}\sim {gt}_{d}({\nu}_{1},\dots ,{\nu}_{S};{d}_{1},\dots ,{d}_{S};\mathit{\mu},\mathsf{\Sigma})$ or
$\mathit{X}\sim {gt}_{d}(\mathit{\nu},\mathit{d},\mathit{\mu},\mathsf{\Sigma})$ for
$\mathit{d}=({d}_{1},\dots ,{d}_{S})$. If
$S=1$, it follows that
$\mathit{X}\sim {t}_{d}({\nu}_{1},\mathit{\mu},\mathsf{\Sigma})$.
For our numerical examples to test the performance of our procedure for estimating
$F(\mathit{a},\mathit{b})$, assume
$\mathit{X}\sim {gt}_{d}(\mathit{\nu},\mathbf{0},P)$ for a correlation matrix
P. We perform the experiment in
$d=5$ with
$\mathit{\nu}=(1.5,2.5,\dots ,5.5)$ and in
$d=20$ with
$\mathit{\nu}=(1,1.25,\dots ,5.5,5.75)$. The following is repeated 15 times: Sample an upper limit
$\mathit{b}\sim \mathrm{U}{(0,3\sqrt{d})}^{d}$ and a correlation matrix
P (sampled based on a random Wishart matrix via the function
rWishart() in R). Then estimate
$\mathbb{P}(\mathit{X}\le \mathit{b})$ using the three aforementioned methods with various sample sizes, and estimate the error for the MC estimators based on a CLT argument and for the RQMC estimator as described in
Section 2.2.
Figure 1 reports the average absolute errors for each sample size over the 15 runs.
Convergence speed, as measured by the regression coefficient $\alpha $ of $log\left(\widehat{\epsilon}\right)=\alpha log\left(n\right)+c$, where $\widehat{\epsilon}$ is the estimated error, is displayed in the legend. As expected, the MC estimators have an overall convergence speed of $1/\sqrt{n}$; however, the crude estimator has a much larger variance than the MC estimator based on the function g. The RQMC estimator (“g (sobol)”) not only shows a much faster convergence speed than its MC counterparts, but also a smaller variance.
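The convergence rates in the legend can be obtained by a least-squares fit on the log-log scale; a minimal sketch (Python, with a mock error sequence standing in for the estimated errors):

```python
import numpy as np

# Fit log(err) = alpha * log(n) + c to (sample size, error) pairs; alpha is
# the empirical convergence rate. A mock MC-like sequence err = 3/sqrt(n)
# stands in for the estimated errors.
ns = 2 ** np.arange(7, 15)
errs = 3.0 / np.sqrt(ns)
alpha, c = np.polyfit(np.log(ns), np.log(errs), deg=1)
print(alpha)   # -> -0.5 for this 1/sqrt(n) sequence
```

An RQMC error sequence would yield a noticeably steeper slope (closer to $-1$) in the same fit.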
4. Density Function
Let us now focus on the density of
$\mathit{X}\sim \mathrm{gNVM}(\mathit{\mu},\mathsf{\Sigma},{F}_{\mathit{W}})$, where we assume that
$\mathsf{\Sigma}$ has full rank in order for the density to exist. As mentioned in the introduction, the density of
$\mathit{X}$ is typically not available in closed form, not even in the case of a grouped
t distribution. The same conditioning argument used to derive (
11) yields that the density of
$\mathit{X}\sim {\mathrm{gNVM}}_{d}(\mathit{\mu},\mathsf{\Sigma},{F}_{\mathit{W}})$ evaluated at
$\mathit{x}\in {\mathbb{R}}^{d}$ can be written as
$$f(\mathit{x})=\int_{0}^{1}\frac{\exp\Big(-\frac{1}{2}{D}^{2}\big({D}_{u}^{-1}(\mathit{x}-\mathit{\mu});\mathbf{0},\mathsf{\Sigma}\big)\Big)}{{(2\pi )}^{d/2}\sqrt{|\mathsf{\Sigma}|}\,\prod_{j=1}^{d}\sqrt{{F}_{{W}_{j}}^{\leftarrow}\left(u\right)}}\,\mathrm{d}u=:\int_{0}^{1}h\left(u\right)\,\mathrm{d}u, \tag{18}$$
where ${D}_{u}=\mathrm{diag}\big(\sqrt{{F}_{{W}_{1}}^{\leftarrow}\left(u\right)},\dots ,\sqrt{{F}_{{W}_{d}}^{\leftarrow}\left(u\right)}\big)$ and
${D}^{2}(\mathit{x};\mathit{\mu},\mathsf{\Sigma})={(\mathit{x}-\mathit{\mu})}^{\top}{\mathsf{\Sigma}}^{-1}(\mathit{x}-\mathit{\mu})$ denotes the (squared) Mahalanobis distance of
$\mathit{x}\in {\mathbb{R}}^{d}$ from
$\mathit{\mu}$ with respect to
$\mathsf{\Sigma}$ and the integrand
$h\left(u\right)$ is defined in an obvious manner. Except for some special cases (e.g., when all
${W}_{j}$ are inverse-gamma with the same parameters), this integral cannot be computed explicitly, so that we rely on a numerical approximation thereof.
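The one-dimensional integral representation of the density is straightforward to implement. The Python sketch below (helper names ours; $\mathit{\mu}=\mathbf{0}$ and inverse-gamma mixing, i.e., a grouped t) evaluates the integrand $h(u)$ and, as a sanity check in the ungrouped special case ${\nu}_{1}={\nu}_{2}$, recovers the known multivariate t density via a simple quadrature.

```python
import numpy as np
from scipy.stats import invgamma, multivariate_t

# Integrand h(u) of the 1-d integral representation of the grouped-NVM density
# (names are ours; mu = 0, inverse-gamma mixing, i.e. a grouped t model).
def h(u, x, Sigma, nus):
    d = len(x)
    W = invgamma.ppf(np.atleast_1d(u)[:, None], nus/2, scale=nus/2)  # (n, d)
    xs = x / np.sqrt(W)                                   # D_u^{-1} x
    maha = np.einsum('ij,jk,ik->i', xs, np.linalg.inv(Sigma), xs)
    _, logdet = np.linalg.slogdet(Sigma)
    return np.exp(-0.5 * (maha + d * np.log(2 * np.pi) + logdet)
                  - 0.5 * np.log(W).sum(axis=1))

# ungrouped special case nu_1 = nu_2 = 6: the integral is the t_6 density
nus = np.array([6.0, 6.0])
Sigma = np.eye(2)
x = np.array([1.0, 2.0])
u = (np.arange(20_000) + 0.5) / 20_000                    # midpoint rule
est = h(u, x, Sigma, nus).mean()
true = np.exp(multivariate_t(loc=np.zeros(2), shape=Sigma, df=6).logpdf(x))
print(est, true)
```

In the genuinely grouped case (unequal ${\nu}_{j}$) no closed-form reference value exists, which is precisely why the adaptive estimation procedure of the next subsection is needed.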
4.1. Estimation
From (
18), we find that computing the density
$f\left(\mathit{x}\right)$ of
$\mathit{X}\sim {\mathrm{gNVM}}_{d}(\mathit{\mu},\mathsf{\Sigma},{F}_{\mathit{W}})$ evaluated at
$\mathit{x}\in {\mathbb{R}}^{d}$ requires the estimation of a univariate integral. As interest often lies in the logarithmic density (or log-density) rather than the actual density (e.g., in likelihood-based methods, where the log-likelihood function of a random sample is optimized over some parameter space), we directly consider the problem of estimating
$log\left(\mu \right)$ for
$\mu ={\int}_{0}^{1}h\left(u\right)\phantom{\rule{0.166667em}{0ex}}\mathrm{d}u$ with
h given in (
18).
Since
$\mu $ is expressed as an integral over
$(0,1)$, RQMC methods to estimate
$log\left(\mu \right)$ from
Section 2.2 can be applied directly to the problem in this form. If the log-density needs to be evaluated at several
${\mathit{x}}_{1},\dots ,{\mathit{x}}_{N}$, one can use the same pointsets
${\tilde{P}}_{n,b}$ and therefore the same realizations of the mixing random vector
$\mathit{W}$ for all inputs. This avoids costly evaluations of the quantile functions
${F}_{{W}_{j}}^{\leftarrow}$.
Estimating
$log\left(f\left(\mathit{x}\right)\right)$ via RQMC as just described works well for input
$\mathit{x}$ of moderate size, but deteriorates if
$\mathit{x}$ is far away from the mean. To see this,
Figure 2 shows the integrand
h for three different input
$\mathit{x}$ and three different settings for
${F}_{\mathit{W}}$. If
$\mathit{x}$ is “large”, most of the mass is contained in a small subdomain of
$(0,1)$ containing the abscissa of the maximum of
h. If an integration routine is not able to detect this peak, the density is substantially underestimated. A further complication arises because we are estimating the log-density rather than the density. The unboundedness of the natural logarithm at 0 makes estimation of
$log\left(\mu \right)$ for small
$\mu $ challenging, both from a theoretical and a computational point of view due to finite machine precision.
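The numerical side of this issue can be seen in a two-line example: averaging h-values whose exponentials underflow must be done on the log scale, e.g., with a log-sum-exp (sketch in Python; the log h-values are artificial):

```python
import numpy as np
from scipy.special import logsumexp

# Estimating log(mu) for mu = (1/n) sum_i h(u_i) when the h(u_i) are tiny:
# exponentiating first underflows to 0, while combining the log h-values
# with log-sum-exp stays accurate. (Artificial log h-values for illustration.)
log_h = np.array([-750.0, -752.0, -760.0])        # exp(-750) underflows in float64
with np.errstate(divide='ignore'):
    naive = np.log(np.mean(np.exp(log_h)))        # -> -inf
stable = logsumexp(log_h) - np.log(log_h.size)    # correct log of the mean
print(naive, stable)
```

Such a log-scale combination is what is meant by taking a “proper logarithm” in the algorithm described below.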
In (
Hintz et al. 2020, Section 4), an adaptive RQMC algorithm is proposed to efficiently estimate the log-density of
$\mathit{X}\sim {\mathrm{NVM}}_{d}(\mathit{\mu},\mathsf{\Sigma},{F}_{W})$. We generalize this method to the grouped case. The grouped case is more complicated because the distribution is not elliptical, hence the density does not only depend on
$\mathit{x}$ through
${D}^{2}(\mathit{x},\mathit{\mu},\mathsf{\Sigma})$. Furthermore, the height of the (unique) maximum of
h in the ungrouped case can be easily computed without simulation, which helps the adaptive procedure find the relevant region; in the grouped case, the value of the maximum is usually not available. Lastly,
S (as opposed to 1) quantile evaluations are needed to obtain one function value
$h\left(u\right)$; from a run time perspective, evaluating these quantile functions is the most expensive part.
If
$\mathit{x}$ is “large”, the idea is to apply RQMC only in a relevant region
$({u}_{l},{u}_{r})$ with
${argmax}_{u}h\left(u\right)=:{u}^{*}\in ({u}_{l},{u}_{r})$. More precisely, given a threshold
${\epsilon}_{\mathrm{th}}$ with
$0<{\epsilon}_{\mathrm{th}}<{h}_{max}={max}_{u\in (0,1)}h\left(u\right)$, choose
${u}_{l},{u}_{r}$ (
l for “left” and
r for “right”) with
$0\le {u}_{l}\le {u}^{*}\le {u}_{r}\le 1$ so that
$h\left(u\right)>{\epsilon}_{\mathrm{th}}$ if and only if
$u\in ({u}_{l},{u}_{r})$. For instance, take
$${\epsilon}_{\mathrm{th}}={10}^{-{k}_{\mathrm{th}}}\,{h}_{max} \tag{19}$$
with
${k}_{\mathrm{th}}=10$, so that
${\epsilon}_{\mathrm{th}}$ is 10 orders of magnitude smaller than
${h}_{max}$.
One can then apply RQMC (with a proper logarithm) in the region $({u}_{l},{u}_{r})$ (by replacing every ${u}_{i,b}\in (0,1)$ by ${u}_{i,b}^{\prime}={u}_{l}+({u}_{r}-{u}_{l}){u}_{i,b}\in ({u}_{l},{u}_{r})$), producing an estimate for $log\left({\int}_{{u}_{l}}^{{u}_{r}}h\left(u\right)\,\mathrm{d}u\right)$. By construction, the remaining regions do not contribute significantly to the overall integral anyway, so that a rather quick integration routine suffices there. Note that neither ${h}_{max}$ nor ${u}_{l},{u}_{r}$ are known explicitly. However, ${h}_{max}$ can be estimated from pilot runs and ${u}_{l},{u}_{r}$ can be approximated using bisections.
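A sketch of the bisection step (Python, with an artificial sharply peaked h standing in for the density integrand; all names are ours):

```python
import numpy as np

# Find the relevant region (u_l, u_r) where h exceeds eps_th = 10^{-k_th} * h_max
# via bisection on each side of the maximizer (toy peaked h for illustration).
def bisect_edge(h, lo, hi, eps_th, it=60):
    """Root of h(u) - eps_th in [lo, hi], assuming a single sign change."""
    for _ in range(it):
        mid = 0.5 * (lo + hi)
        if (h(lo) - eps_th) * (h(mid) - eps_th) <= 0:
            hi = mid               # sign change in [lo, mid]
        else:
            lo = mid
    return 0.5 * (lo + hi)

h = lambda u: np.exp(-((u - 0.9) / 0.01)**2)     # sharp peak at u* = 0.9
h_max, k_th = 1.0, 10
eps_th = h_max * 10.0**(-k_th)
u_l = bisect_edge(h, 0.0, 0.9, eps_th)           # h increasing on [0, u*]
u_r = bisect_edge(h, 0.9, 1.0, eps_th)           # h decreasing on [u*, 1]
print(u_l, u_r)
```

In the actual algorithm, the pairs stored during the pilot run provide both the estimate of ${h}_{max}$ and the starting intervals for these bisections.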
Summarizing, we propose the following method to estimate $log\left(f\left({\mathit{x}}_{i}\right)\right)$, $i=1,\dots ,N$, for given inputs ${\mathit{x}}_{1},\dots ,{\mathit{x}}_{N}$ and error tolerance $\epsilon $.
This algorithm is implemented in the function dgnvmix(, log = TRUE) in the R package nvmix, which by default uses a relative error tolerance.
The advantage of the proposed algorithm is that little run time is spent on estimating “easy” integrals, thanks to the pilot run in Step 1. If ${n}_{0}={2}^{10}$ and $B=15$ (the current defaults in the nvmix package), this step gives 15 360 pairs $(u,{F}_{\mathit{W}}^{\leftarrow}\left(u\right))$. These pairs give good starting values for the bisections to find ${u}_{l},{u}_{r}$. Note that no additional quantile evaluations are needed to estimate the less important regions $(0,{u}_{l})$ and $({u}_{r},1)$.
4.2. Numerical Results
Luo and Shevchenko (
2010) are faced with almost the same integration problem when estimating the density of a bivariate grouped
t copula. They use a globally adaptive integration scheme from
Piessens et al. (
2012) to integrate
h. While this procedure works well for a range of inputs, it deteriorates for input
$\mathit{x}$ with large components.
Consider first
$\mathit{X}\sim {t}_{d}(\nu ,\mathbf{0},{I}_{d})$ and recall that the density of
$\mathit{X}$ is known and given by (
17); this is useful to test our estimation procedure. As such, let
$\mathit{X}\sim {t}_{2}(\nu =6,\mathbf{0},{I}_{2})$ and consider the problem of evaluating the density of
$\mathit{X}$ at
$\mathit{x}\in \{(0,0),\,(5,5),\,(25,25),\,(50,50)\}$. Some values of the corresponding integrands are shown in
Figure 2. In
Table 1, true and estimated (log)density values are reported; once estimated using the
R function
integrate(), which is based on the QUADPACK package of
Piessens et al. (
2012) and once using
dgnvmix(), which is based on Algorithm 1. Clearly, the
integrate() integration routine is not capable of detecting the peak when input
$\mathit{x}$ is large, yielding substantially flawed estimates. The estimates obtained from
dgnvmix(), however, are quite close to the true values even far out in the tail.
Algorithm 1: Adaptive RQMC Algorithm to Estimate $log\left(f\left({\mathit{x}}_{1}\right)\right),\dots ,log\left(f\left({\mathit{x}}_{N}\right)\right)$.
Given ${\mathit{x}}_{1},\dots ,{\mathit{x}}_{N}$, $\mathsf{\Sigma}$, $\epsilon $, ${\epsilon}_{\mathrm{th}}$, ${n}_{0}$, estimate $log\left(f\left({\mathit{x}}_{l}\right)\right)$, $l=1,\dots ,N$, via:
1. Compute ${\widehat{\mu}}_{logf\left({\mathit{x}}_{i}\right),{n}_{0}}^{\mathrm{RQMC}}$ with sample size ${n}_{0}$ using the same random numbers for all input ${\mathit{x}}_{i}$, $i=1,\dots ,N$. Store all uniforms with corresponding quantile evaluations ${F}_{\mathit{W}}^{\leftarrow}$ in a list $\mathcal{L}$.
2. If all estimates ${\widehat{\mu}}_{logf\left({\mathit{x}}_{i}\right),{n}_{0}}^{\mathrm{RQMC}}$, $i=1,\dots ,N$, meet the error tolerance $\epsilon $, go to Step 4. Otherwise let ${\mathit{x}}_{s}$, $s=1,\dots ,{N}^{\prime}$ with $1\le {N}^{\prime}\le N$ be the inputs whose error estimates exceed the error tolerance.
3. For each remaining input ${\mathit{x}}_{s}$, $s=1,\dots ,{N}^{\prime}$, do:
 (a) Use all pairs $(u,{F}_{\mathit{W}}^{\leftarrow}\left(u\right))$ in $\mathcal{L}$ to compute values of $h\left(u\right)$ and set ${\widehat{h}}_{max}={max}_{u\in \mathcal{L}}h\left(u\right)$. If the largest value of h is obtained for the largest (smallest) u in the list $\mathcal{L}$, set ${u}^{*}=1$ (${u}^{*}=0$).
 (b) If ${u}^{*}=1$, set ${u}_{r}=1$ and if ${u}^{*}=0$, set ${u}_{l}=0$. Unless already specified, use bisections to find ${u}_{l}$ and ${u}_{r}$ such that ${u}_{l}<{u}^{*}<{u}_{r}$ and ${u}_{l}$ (${u}_{r}$) is the smallest (largest) u such that $h\left(u\right)>{\epsilon}_{\mathrm{th}}$ from (19) with ${h}_{max}$ replaced by ${\widehat{h}}_{max}$. Starting intervals for the bisections can be found from the values in $\mathcal{L}$.
 (c) If ${u}_{l}>0$, approximate $log\left({\int}_{0}^{{u}_{l}}h\left(u\right)\,\mathrm{d}u\right)$ using a trapezoidal rule with proper logarithm and knots ${u}_{1}^{\prime},\dots ,{u}_{m}^{\prime}$, where the ${u}_{i}^{\prime}$ are those u's in $\mathcal{L}$ satisfying $u\le {u}_{l}$. Call the approximation ${\widehat{\mu}}_{(0,{u}_{l})}\left({\mathit{x}}_{s}\right)$. If ${u}_{l}=0$, set ${\widehat{\mu}}_{(0,{u}_{l})}\left({\mathit{x}}_{s}\right)=-\infty $.
 (d) If ${u}_{r}<1$, approximate $log\left({\int}_{{u}_{r}}^{1}h\left(u\right)\,\mathrm{d}u\right)$ using a trapezoidal rule with proper logarithm and knots ${u}_{1}^{\prime \prime},\dots ,{u}_{p}^{\prime \prime}$, where the ${u}_{i}^{\prime \prime}$ are those u's in $\mathcal{L}$ satisfying $u\ge {u}_{r}$. Call the approximation ${\widehat{\mu}}_{({u}_{r},1)}\left({\mathit{x}}_{s}\right)$. If ${u}_{r}=1$, set ${\widehat{\mu}}_{({u}_{r},1)}\left({\mathit{x}}_{s}\right)=-\infty $.
 (e) Estimate $log\left({\int}_{{u}_{l}}^{{u}_{r}}h\left(u\right)\,\mathrm{d}u\right)$ via RQMC. That is, compute ${\widehat{\mu}}_{logf,n}^{\mathrm{RQMC}}$ from (10) where every ${u}_{i,b}\in (0,1)$ is replaced by ${u}_{i,b}^{\prime}={u}_{l}+({u}_{r}-{u}_{l}){u}_{i,b}\in ({u}_{l},{u}_{r})$. Increase n until the error tolerance $\epsilon $ is met. Then set ${\widehat{\mu}}_{({u}_{l},{u}_{r})}=log({u}_{r}-{u}_{l})+{\widehat{\mu}}_{logf,n}^{\mathrm{RQMC}}$, which estimates $log\left({\int}_{{u}_{l}}^{{u}_{r}}h\left(u\right)\,\mathrm{d}u\right)$.
4. Return ${\widehat{\mu}}_{logf\left({\mathit{x}}_{l}\right)}^{\mathrm{RQMC}}$, $l=1,\dots ,N$.
The preceding discussion focused on the classical multivariate
t setting, as the density is known in this case. Next, consider a grouped inverse-gamma mixture model and let
$\mathit{X}\sim {gt}_{d}(\mathit{\nu},\mathit{\mu},\mathsf{\Sigma})$. The density
${f}_{\mathit{\nu},\mathit{\mu},\mathsf{\Sigma}}^{gt}$ of
$\mathit{X}\sim {gt}_{d}(\mathit{\nu},\mathit{\mu},\mathsf{\Sigma})$ is not available in closed form, so that here we indeed need to rely on estimation of the latter. The following experiment is performed for
$\mathit{X}\sim {gt}_{2}(\mathit{\nu},\mathbf{0},{I}_{2})$ with
$\mathit{\nu}=(3,6)$ and for
$\mathit{X}\sim {gt}_{10}(\mathit{\nu},\mathbf{0},{I}_{10})$ where
$\mathit{\nu}=(3,\dots ,3,6,\dots ,6)$ (corresponding to two groups of size 5 each). First, a sample of size 2500 is drawn from a more heavy-tailed grouped
t distribution (with degrees of freedom
${\mathit{\nu}}^{\prime}=(1,2)$ and
${\mathit{\nu}}^{\prime}=(1,\dots ,1,2,\dots ,2)$, respectively) and then the logdensity function of
$\mathit{X}\sim {gt}_{d}(\mathit{\nu},\mathbf{0},{I}_{d})$ is evaluated at the sample. The results are shown in
Figure 3.
It is clear from the plots that integrate() again gives wrong approximations to $f\left(\mathit{x}\right)$ for inputs $\mathit{x}$ far out in the tail; for small inputs $\mathit{x}$, the results from integrate() and from dgnvmix() coincide. Furthermore, it can be seen that the density function is not monotone in the Mahalanobis distance (as grouped normal mixtures are no longer elliptical). The plot also includes the log-density functions of ungrouped d-dimensional t distributions with 3 and 6 degrees of freedom, respectively. The log-density function of the grouped mixture with $\mathit{\nu}=(3,6)$ is not bounded by either; in fact, the grouped mixture shows heavier tails than both the t distribution with 3 dof and the one with 6 dof.
5. Kendall's Tau and Spearman's Rho
Two widely used measures of association are the rank correlation coefficients Spearman’s rho ${\rho}_{\mathrm{S}}$ and Kendall’s tau ${\rho}_{\tau}$. For elliptical models, one can easily compute Spearman’s rho as a function of the copula parameter $\rho $ which can be useful in estimating the matrix P nonparametrically. For grouped mixtures, however, this is not easily possible. In this section, integral representations for Spearman’s rho and Kendall’s tau in the general grouped NVM case are derived.
If $\mathit{X}=({X}_{1},{X}_{2})\sim F$ is a random vector with continuous margins ${F}_{1},{F}_{2}$, then ${\rho}_{\mathrm{S}}({X}_{1},{X}_{2})=\rho ({F}_{1}\left({X}_{1}\right),{F}_{2}\left({X}_{2}\right))$ and ${\rho}_{\tau}({X}_{1},{X}_{2})=\mathbb{P}\left(({X}_{1}-{Y}_{1})({X}_{2}-{Y}_{2})>0\right)-\mathbb{P}\left(({X}_{1}-{Y}_{1})({X}_{2}-{Y}_{2})<0\right)$, where $({Y}_{1},{Y}_{2})\sim F$ is independent of $({X}_{1},{X}_{2})$ and $\rho (X,Y)=\mathrm{cov}(X,Y)/\sqrt{\mathrm{var}\left(X\right)\mathrm{var}\left(Y\right)}$ is the linear correlation between $X$ and $Y$. Both ${\rho}_{\mathrm{S}}$ and ${\rho}_{\tau}$ depend only on the copula of $F$.
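Both rank correlations can be estimated empirically via their rank-based definitions. The following Python sketch (illustrative, not part of nvmix) checks the sample version of Kendall's tau against the elliptical identity $\rho_{\tau}=\frac{2}{\pi}\arcsin(\rho)$ for a bivariate normal example, the simplest ungrouped normal variance mixture:

```python
import numpy as np
from scipy.stats import kendalltau

# bivariate normal sample (an NVM with W = 1); for elliptical models,
# Kendall's tau satisfies rho_tau = 2 * arcsin(rho) / pi
rng = np.random.default_rng(42)
rho = 0.5
X = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=50000)
tau_hat = kendalltau(X[:, 0], X[:, 1])[0]   # rank-based sample estimate
tau_true = 2 * np.arcsin(rho) / np.pi       # = 1/3 for rho = 0.5
```

For grouped mixtures no such closed-form benchmark exists, which is precisely why the integral representations derived below are needed.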
If $\mathit{X}\sim {\mathrm{ELL}}_{2}(\mathit{\mu},\mathsf{\Sigma},{F}_{R})$ is elliptical and $\rho ={\mathsf{\Sigma}}_{12}/\sqrt{{\mathsf{\Sigma}}_{11}{\mathsf{\Sigma}}_{22}}$, then ${\rho}_{\tau}=\frac{2}{\pi}\arcsin\left(\rho \right)$; see (Lindskog et al. 2003, Theorem 2). This formula holds only approximately for grouped normal variance mixtures. In Daul et al. (2003), an expression was derived for Kendall's tau of bivariate, grouped t copulas. Their result is easily extended to the more general grouped normal variance mixture case; see Section 8 for the proof.
Proposition 1. Let $\mathit{X}\sim {\mathit{gNVM}}_{2}(\mathit{\mu},\mathsf{\Sigma},{F}_{\mathit{W}})$ and $\rho ={\mathsf{\Sigma}}_{12}/\sqrt{{\mathsf{\Sigma}}_{11}{\mathsf{\Sigma}}_{22}}$. Then ${\rho}_{\tau}$ admits the integral representation (21), where $U,\tilde{U}\overset{\mathrm{ind.}}{\sim}\mathrm{U}(0,1)$.
Next, we address Spearman's rho ${\rho}_{\mathrm{S}}$. For computing ${\rho}_{\mathrm{S}}$, it is useful to study $\mathbb{P}({X}_{1}>0,{X}_{2}>0)$. If $\mathit{X}\sim {\mathrm{ELL}}_{2}(\mathit{\mu},P,{F}_{R})$ where $P$ is a correlation matrix with ${P}_{12}=\rho$ and $\mathbb{P}(\mathit{X}=\mathbf{0})=0$, then $\mathbb{P}({X}_{1}>0,{X}_{2}>0)=1/4+\arcsin\left(\rho \right)/\left(2\pi \right)$; see, e.g., (McNeil et al. 2015, Proposition 7.41). Using the same technique, we can show that this result also holds for grouped normal variance mixtures; see Section 8 for the proof.
Proposition 2. Let $\mathit{X}\sim {\mathit{gNVM}}_{2}(\mathit{\mu},\mathsf{\Sigma},{F}_{\mathit{W}})$ and $\rho ={\mathsf{\Sigma}}_{12}/\sqrt{{\mathsf{\Sigma}}_{11}{\mathsf{\Sigma}}_{22}}$. Then $\mathbb{P}({X}_{1}>0,{X}_{2}>0)=1/4+\arcsin\left(\rho \right)/\left(2\pi \right)$.
Remark 1. If $\mathit{Y}$ follows a grouped elliptical distribution in the sense of (5), a very similar idea can be used to show that $\mathbb{P}({Y}_{1}>0,{Y}_{2}>0)=1/4+\arcsin\left(\rho \right)/\left(2\pi \right)$.
Next, we derive a new expression for Spearman's rho ${\rho}_{\mathrm{S}}$ for bivariate grouped normal variance mixture distributions; see Section 8 for the proof.
Proposition 3. Let $\mathit{X}\sim {\mathit{gNVM}}_{2}(\mathbf{0},P,{F}_{\mathit{W}})$ and $\rho ={P}_{12}$. Then ${\rho}_{\mathrm{S}}$ admits the integral representation (22), where $U,\tilde{U},\overline{U}\overset{\mathrm{ind.}}{\sim}\mathrm{U}(0,1)$.
Numerical Results
Let $\mathit{X}\sim {\mathrm{gNVM}}_{2}(\mathbf{0},P,{F}_{\mathit{W}})$. It follows from Proposition 1 that ${\rho}_{\tau}({X}_{1},{X}_{2})$ can be written as an integral over the unit square; similarly, Proposition 3 yields an integral representation of ${\rho}_{\mathrm{S}}({X}_{1},{X}_{2})$ over the unit cube. Hence, both ${\rho}_{\tau}({X}_{1},{X}_{2})$ and ${\rho}_{\mathrm{S}}({X}_{1},{X}_{2})$ can be expressed as integrals over the d-dimensional unit hypercube with $d\in \{2,3\}$, so that RQMC methods as described in Section 2.2 can be applied directly to the problem in this form to estimate ${\rho}_{\tau}({X}_{1},{X}_{2})$ and ${\rho}_{\mathrm{S}}({X}_{1},{X}_{2})$, respectively. This is implemented in the function corgnvmix() (with method = "kendall" or method = "spearman") of the R package nvmix.
As an example, we consider bivariate grouped t distributions with $\mathit{\nu}\in \{(1,2),(4,8),(1,5),(4,20),(1,\infty ),(4,\infty )\}$ and plot the estimated ${\rho}_{\tau}$ as a function of $\rho$ in Figure 4. The elliptical case (corresponding to equal dof) is included for comparison. When the pairwise dof are close and $\rho$ is not too close to 1, the elliptical approximation is quite satisfactory. However, when the dof are further apart, there is a significant difference between the estimated ${\rho}_{\tau}$ and the elliptical approximation. This is highlighted in the plot on the right-hand side, which displays the relative difference $({\rho}_{\tau}^{\mathrm{ell}}-{\rho}_{\tau})/{\rho}_{\tau}^{\mathrm{ell}}$. Intuitively, it makes sense that the approximation deteriorates when the dof are further apart, as the closer the dof, the “closer” the model is to being elliptical.
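For reference, the elliptical benchmark and the relative difference displayed in Figure 4 are elementary to compute; a small Python sketch (function names are ours, chosen for illustration):

```python
import numpy as np

def tau_elliptical(rho):
    """Kendall's tau of a bivariate elliptical copula: 2 * arcsin(rho) / pi."""
    return 2 * np.arcsin(rho) / np.pi

def relative_difference(tau_ell, tau_grouped):
    """Relative difference (tau_ell - tau) / tau_ell shown in Figure 4,
    with tau_grouped an (RQMC) estimate for the grouped model."""
    return (tau_ell - tau_grouped) / tau_ell

# e.g., rho = 0.5 gives tau = 1/3 in the elliptical case
tau = tau_elliptical(0.5)
```

Only the grouped estimate itself requires numerical integration (corgnvmix() in nvmix); the elliptical reference curve is exact.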
6. Copula Setting
So far, the focus of this paper has been on grouped normal variance mixtures. This section addresses grouped normal variance mixture copulas, i.e., the copulas derived from $\mathit{X}\sim {\mathrm{gNVM}}_{d}(\mathit{\mu},\mathsf{\Sigma},{F}_{\mathit{W}})$ via Sklar's theorem. The first part addresses grouped NVM copulas in full generality and provides formulas for the copula, its density and the tail dependence coefficients. The second part details the important special case of inverse-gamma mixture copulas, that is, copulas derived from a grouped t distribution $\mathit{X}\sim {gt}_{d}(\mathit{\nu},\mathit{\mu},\mathsf{\Sigma})$. The third part discusses estimation of the copula and its density, whereas the fourth part answers the question of how copula parameters can be fitted to a dataset. The last part of this section includes numerical examples.
6.1. Grouped Normal Variance Mixture Copulas
Copulas provide a flexible tool for modeling dependent risks, as they allow one to model the margins separately from the dependence between the margins. Let $\mathit{X}\sim F$ be a d-dimensional random vector with continuous margins ${F}_{1},\dots ,{F}_{d}$. Consider the random vector $\mathit{U}=({U}_{1},\dots ,{U}_{d})=({F}_{1}\left({X}_{1}\right),\dots ,{F}_{d}\left({X}_{d}\right))$; note that ${U}_{j}\sim \mathrm{U}(0,1)$ for $j=1,\dots ,d$. The copula $C$ of $F$ (or $\mathit{X}$) is the distribution function of the margin-free $\mathit{U}$, i.e., $C(\mathit{u})=\mathbb{P}({U}_{1}\le {u}_{1},\dots ,{U}_{d}\le {u}_{d})$. If $F$ is absolutely continuous and the margins ${F}_{1},\dots ,{F}_{d}$ are strictly increasing and continuous, the copula density is given by
$c(\mathit{u})=\frac{f({F}_{1}^{\leftarrow}\left({u}_{1}\right),\dots ,{F}_{d}^{\leftarrow}\left({u}_{d}\right))}{{\prod}_{j=1}^{d}{f}_{j}\left({F}_{j}^{\leftarrow}\left({u}_{j}\right)\right)},$ (18)
where $f$ denotes the (joint) density of $F$ and ${f}_{j}$ is the marginal density of ${F}_{j}$. For more about copulas and their applications to risk management see, e.g., Embrechts et al. (2001); Nelsen (2007).
Since copulas are invariant under strictly increasing marginal transformations, we may w.l.o.g. assume that $\mathit{\mu}=\mathbf{0}$ and that $\mathsf{\Sigma}=P$ is a correlation matrix, and consider $\mathit{X}\sim {\mathrm{gNVM}}_{d}(\mathbf{0},P,{F}_{\mathit{W}})$. Using (11), we find that the grouped normal variance mixture copula is given by
${C}_{P,{F}_{\mathit{W}}}^{\mathrm{gNVM}}(\mathit{u})={F}_{\mathit{X}}({F}_{1}^{\leftarrow}\left({u}_{1}\right),\dots ,{F}_{d}^{\leftarrow}\left({u}_{d}\right)),$ (24)
and its density can be computed using (18) as
${c}_{P,{F}_{\mathit{W}}}^{\mathrm{gNVM}}(\mathit{u})=\frac{{f}_{\mathit{X}}({F}_{1}^{\leftarrow}\left({u}_{1}\right),\dots ,{F}_{d}^{\leftarrow}\left({u}_{d}\right))}{{\prod}_{j=1}^{d}{f}_{j}\left({F}_{j}^{\leftarrow}\left({u}_{j}\right)\right)},$ (25)
where ${F}_{j}$ and ${f}_{j}$ denote the distribution function and density function of ${X}_{j}\sim {\mathrm{NVM}}_{1}(0,1,{F}_{{W}_{j}})$ for $j=1,\dots ,d$; directly considering $\log\left({c}_{P,{F}_{\mathit{W}}}^{\mathrm{gNVM}}\left(\mathit{u}\right)\right)$ also makes (25) more robust to compute.
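In the ungrouped t case, every ingredient of the copula-density formula (25) is available in closed form, so the log-scale evaluation can be illustrated directly with scipy (an illustrative Python sketch, not the nvmix code):

```python
import numpy as np
from scipy.stats import multivariate_t, t

def log_t_copula_density(u, nu, P):
    """Evaluate log c(u) = log f(F_1^{-1}(u_1), ..., F_d^{-1}(u_d))
    - sum_j log f_j(F_j^{-1}(u_j)) for the (ungrouped) t copula."""
    x = t.ppf(u, df=nu)                              # marginal quantiles
    mvt = multivariate_t(loc=np.zeros(len(u)), shape=P, df=nu)
    return mvt.logpdf(x) - t.logpdf(x, df=nu).sum()  # log scale throughout

P = np.array([[1.0, 0.5], [0.5, 1.0]])
lc = log_t_copula_density(np.array([0.3, 0.7]), nu=4.0, P=P)
# radial symmetry of the copula implies c(u) = c(1 - u)
lc_flip = log_t_copula_density(np.array([0.7, 0.3]), nu=4.0, P=P)
```

In the grouped case, the joint and marginal pieces are no longer in closed form and must themselves be estimated, as discussed in Section 6.3.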
In the remainder of this subsection, some useful properties of gNVM copulas are derived. In particular, we study symmetry properties, rank correlation and tail dependence coefficients.
6.1.1. Radial Symmetry and Exchangeability
A d-dimensional random vector $\mathit{X}$ is radially symmetric about $\mathit{\mu}\in {\mathbb{R}}^{d}$ if $\mathit{X}-\mathit{\mu}\overset{\mathrm{d}}{=}\mathit{\mu}-\mathit{X}$. It is evident from (2) that $\mathit{X}\sim {\mathrm{gNVM}}_{d}(\mathit{\mu},\mathsf{\Sigma},{F}_{\mathit{W}})$ is radially symmetric about its location vector $\mathit{\mu}$. In layman's terms, this implies that jointly large values of $\mathit{X}$ are as likely as jointly small values of $\mathit{X}$. Radial symmetry also implies that ${c}_{P,{F}_{\mathit{W}}}^{\mathrm{gNVM}}\left(\mathit{u}\right)={c}_{P,{F}_{\mathit{W}}}^{\mathrm{gNVM}}(\mathbf{1}-\mathit{u})$.
If $({X}_{\mathsf{\Pi}\left(1\right)},\dots ,{X}_{\mathsf{\Pi}\left(d\right)})\overset{\mathrm{d}}{=}({X}_{1},\dots ,{X}_{d})$ for all permutations $\mathsf{\Pi}$ of $\{1,\dots ,d\}$, the random vector $\mathit{X}$ is called exchangeable. The same definition applies to copulas. If $\mathit{X}\sim {\mathrm{gNVM}}_{d}(\mathbf{0},{I}_{d},{F}_{\mathit{W}})$, then $\mathit{X}$ is in general not exchangeable unless ${F}_{{W}_{1}}=\dots ={F}_{{W}_{d}}$, in which case $\mathit{X}\sim {\mathrm{NVM}}_{d}(\mathbf{0},{I}_{d},{F}_{{W}_{1}})$. The lack of exchangeability implies that ${c}_{{I}_{d},{F}_{\mathit{W}}}^{\mathrm{gNVM}}({u}_{1},\dots ,{u}_{d})\ne {c}_{{I}_{d},{F}_{\mathit{W}}}^{\mathrm{gNVM}}({u}_{\mathsf{\Pi}\left(1\right)},\dots ,{u}_{\mathsf{\Pi}\left(d\right)})$, in general.
6.1.2. Tail Dependence Coefficients
Consider a bivariate
${C}_{P,{F}_{\mathit{W}}}^{\mathrm{gNVM}}$ copula. Such copula is radially symmetric, hence the lower and upper tail dependence coefficients are equal, i.e.,
${\lambda}_{l}={\lambda}_{u}=:\lambda \in [0,1]$, where
for
$({U}_{1},{U}_{2})\sim {C}_{P,{F}_{\mathit{W}}}^{\mathrm{gNVM}}$. In the case where only the quantile functions
${F}_{{W}_{j}}^{\leftarrow}$ are available, no simple expression for
$\lambda $ is available. In
Luo and Shevchenko (
2010),
$\lambda $ is derived for grouped
t copulas, as will be discussed in
Section 6.2. Following the arguments used in their proof, the following lemma provides a new expression for
$\lambda $ in the more general normal variance mixture case.
Proposition 4. The tail dependence coefficient λ for a bivariate ${C}_{P,{F}_{\mathit{W}}}^{\mathrm{gNVM}}$ with $\rho ={P}_{12}$ satisfies
where, for $i,j\in \{1,2\}$,
6.2. Inverse-Gamma Mixtures
If $\mathit{X}\sim {t}_{d}(\nu ,\mathbf{0},P)$ for a positive definite correlation matrix $P$, the copula of $\mathit{X}$ extracted via Sklar's theorem is the well-known t copula, denoted by ${C}_{\nu ,P}^{t}$. This copula is given by
${C}_{\nu ,P}^{t}(\mathit{u})={F}_{\mathit{X}}({t}_{\nu}^{-1}\left({u}_{1}\right),\dots ,{t}_{\nu}^{-1}\left({u}_{d}\right)),$ (26)
where ${t}_{\nu}$ and ${t}_{\nu}^{-1}$ denote the distribution function and quantile function of a univariate standard t distribution. Note that (26) is merely the distribution function of $\mathit{X}\sim {t}_{d}(\nu ,\mathbf{0},P)$ evaluated at the quantiles ${t}_{\nu}^{-1}\left({u}_{1}\right),\dots ,{t}_{\nu}^{-1}\left({u}_{d}\right)$. The copula density ${c}_{\nu ,P}^{t}\left(\mathit{u}\right)$ follows from (18). The (upper and lower) tail dependence coefficient $\lambda$ of the bivariate ${C}_{\nu ,P}^{t}$ with $\rho ={P}_{12}$ is well known to be
$\lambda =2\,{t}_{\nu +1}\left(-\sqrt{(\nu +1)(1-\rho )/(1+\rho )}\right);$
see (Demarta and McNeil 2005, Proposition 1). The multivariate t distribution being elliptical implies the formula ${\rho}_{\tau}=2\arcsin\left(\rho \right)/\pi$ for Kendall's tau.
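The tail dependence coefficient of the bivariate t copula, $\lambda =2\,t_{\nu+1}\big(-\sqrt{(\nu+1)(1-\rho)/(1+\rho)}\big)$ from (Demarta and McNeil 2005, Proposition 1), is straightforward to evaluate numerically (an illustrative Python sketch):

```python
import numpy as np
from scipy.stats import t

def t_copula_tail_dep(nu, rho):
    """Tail dependence coefficient of the bivariate t copula:
    lambda = 2 * t_{nu+1}(-sqrt((nu + 1) * (1 - rho) / (1 + rho)))."""
    return 2.0 * t.cdf(-np.sqrt((nu + 1) * (1 - rho) / (1 + rho)), df=nu + 1)

lam = t_copula_tail_dep(4.0, 0.5)  # strictly positive for any rho > -1
```

Note that $\lambda>0$ for every finite $\nu$, in contrast to the tail-independent Gaussian copula obtained in the limit $\nu\to\infty$.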
A closed formula for Spearman's rho is not available, but our Proposition 3 provides an integral representation that can be evaluated numerically.
Next, consider a grouped inverse-gamma mixture model. If $\mathit{X}\sim {gt}_{d}(\mathit{\nu},\mathbf{0},P)$, the copula of $\mathit{X}$ is the grouped t copula, denoted by ${C}_{\mathit{\nu},P}^{gt}$. From (24), the copula obtains by evaluating the joint distribution function of $\mathit{X}$ at the marginal t quantiles, and the copula density follows from (25). The (lower and upper) tail dependence coefficient $\lambda$ of ${C}_{{\nu}_{1},{\nu}_{2},P}^{gt}$ is given in (Luo and Shevchenko 2010, Equation (26)); in their formula, ${f}_{{\chi}_{\nu}^{2}}$ denotes the density of a ${\chi}_{\nu}^{2}$ distribution.
Finally, consider rank correlation coefficients for grouped t copulas. No closed formula for either Kendall's tau or Spearman's rho exists in the grouped t case. An exact integral representation of ${\rho}_{\tau}$ for ${C}_{{\nu}_{1},{\nu}_{2},P}^{gt}$ follows from Proposition 1. No substantial simplification of (21) therein can be achieved by considering the special case ${W}_{j}\sim IG({\nu}_{j}/2,{\nu}_{j}/2)$. In order to compute ${\rho}_{\tau}$, one can either numerically integrate (21) (as will be discussed in the next subsection) or use the approximation ${\rho}_{\tau}\approx \frac{2}{\pi}\arcsin\left(\rho \right)$, which was shown to be a “very accurate” approximation in Daul et al. (2003). For Spearman's rho, no closed formula can be derived either, not even in the ungrouped t copula case, so that the integral (22) in Proposition 3 needs to be computed numerically, as will be discussed in the next subsection.
The discussion in this section highlights that moving from a scalar mixing rv W (as in the classical t case) to comonotone mixing rvs ${W}_{1},\dots ,{W}_{S}$ (as in the grouped t case) introduces challenges from a computational point of view. While in the classical t setting, the density, Kendall’s tau and the tail dependence coefficient are available in closed form, all of these quantities need to be estimated in the more general grouped setting. Efficient estimation of these important quantities is discussed in the next subsection.
6.3. Estimation of the Copula and Its Density
Consider a
ddimensional normal variance mixture copula
${C}_{P,{F}_{\mathit{W}}}^{\mathrm{gNVM}}$. From (
24), it follows that
where
${F}_{\mathit{X}}$ is the distribution function of
$\mathit{X}\sim \mathrm{gNVM}(\mathbf{0},P,{F}_{\mathit{W}})$ and
${F}_{j}$ is the distribution function of
${\mathrm{NVM}}_{1}(0,1,{F}_{{W}_{j}})$ for
$j=1,\dots ,d$. If the margins are known (as in the case of an inversegamma mixture), evaluating the copula is no harder than evaluating the distribution function of
$\mathit{X}$ so that the methods described in
Section 3.1 can be applied.
When the mixing rvs
${W}_{j}$ are only known through their quantile functions in the form of a “black box”, one needs to estimate the marginal quantiles
${F}_{j}$ of
F first. Note that
which can be estimated using RQMC. The quantile
${F}_{j}^{\leftarrow}\left({u}_{j}\right)$ can then be estimated by numerically solving
${F}_{j}\left(x\right)=u$ for
x, for instance using a bisection algorithm or Newton’s method.
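This two-stage procedure (estimate the marginal distribution function, then invert it by root finding) can be sketched as follows. The representation $F_j(x)=\mathbb{E}[\Phi(x/\sqrt{W_j})]$ follows from the mixture construction (1); the midpoint rule below stands in for a proper RQMC point set, and brentq plays the role of the bisection step (an illustrative Python sketch, not the nvmix implementation):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def F_j(x, quantile_W, n=4096):
    """Estimate F_j(x) = E[Phi(x / sqrt(W_j))] given only the quantile
    function of the mixing variable W_j ("black box" setting)."""
    v = (np.arange(n) + 0.5) / n     # midpoint rule stands in for RQMC
    w = quantile_W(v)
    return np.mean(norm.cdf(x / np.sqrt(w)))

def F_j_inverse(u, quantile_W):
    """Numerically solve F_j(x) = u for x (bracketing root finder)."""
    return brentq(lambda x: F_j(x, quantile_W) - u, -1e3, 1e3)

# sanity check: with W = 1 a.s., the margin is standard normal,
# so F_j_inverse(0.975) should return the normal 97.5% quantile
q = F_j_inverse(0.975, lambda v: np.ones_like(v))
```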
The general form of gNVM copula densities was given in (25). Again, if the margins are known, the only unknown quantity is the joint density ${f}_{\mathit{X}}$, which can be estimated using the adaptive RQMC procedure proposed in Section 4.1. If the margins are not available, ${F}_{j}^{\leftarrow}$ can be estimated as discussed above. The marginal densities ${f}_{j}$ can be estimated using an adaptive RQMC algorithm similar to the one developed in Section 4.1; see also (Hintz et al. 2020, Section 4).
Remark 2. Estimating the copula density is the most challenging problem discussed in this paper if we assume that ${F}_{\mathit{W}}$ is only known via its marginal quantile functions. Evaluating the copula density ${c}_{P,{F}_{\mathit{W}}}^{\mathrm{gNVM}}$ at one $\mathit{u}\in {[0,1]}^{d}$ requires estimation of:
the marginal quantiles ${F}_{j}^{\leftarrow}\left({u}_{j}\right)$, which involves estimation of ${F}_{j}$ and then numerical root finding, for each $j=1,\dots ,d$,
the marginal densities evaluated at the quantiles ${f}_{j}\left({F}_{j}^{\leftarrow}\left({u}_{j}\right)\right)$ for $j=1,\dots ,d$. This involves estimation of the density of a univariate normal variance mixture,
the joint density evaluated at the quantiles $f({F}_{1}^{\leftarrow}\left({u}_{1}\right),\dots ,{F}_{d}^{\leftarrow}\left({u}_{d}\right))$, which is another one-dimensional integration problem.
It follows from Remark 2 that, while estimation of ${c}_{P,{F}_{\mathit{W}}}^{\mathrm{gNVM}}$ is theoretically possible with the methods proposed in this paper, the problem becomes computationally intractable for large dimensions $d$. If the margins are known, however, our proposed methods are efficient and accurate, as demonstrated in the next subsection, where we focus on the important case of a grouped t model. Our methods to estimate the copula and the density of ${C}_{\mathit{\nu},P}^{gt}$ are implemented in the functions pgStudentcopula() and dgStudentcopula() in the R package nvmix.
6.4. Fitting Copula Parameters to a Dataset
In this subsection, we discuss estimation methods for grouped normal variance mixture copulas. Let ${\mathit{X}}_{1},\dots ,{\mathit{X}}_{n}$ be independent and distributed according to some distribution with ${C}_{P,{F}_{\mathit{W}}}^{\mathrm{gNVM}}$ as underlying copula, with ${\mathit{X}}_{i}=({X}_{i,1},\dots ,{X}_{i,d})$ and group sizes ${d}_{1},\dots ,{d}_{S}$ with ${\sum}_{j=1}^{S}{d}_{j}=d$. Furthermore, let ${\mathit{\nu}}_{k}$ be (a vector of) parameters of the kth mixing distribution for $k=1,\dots ,S$; for instance, in the grouped t case, ${\mathit{\nu}}_{k}={\nu}_{k}$ is the degrees of freedom for group k. Finally, denote by $\mathit{\nu}=({\mathit{\nu}}_{1},\dots ,{\mathit{\nu}}_{S})$ the vector consisting of all mixing parameters. Note that we assume that the group structure is given. We are interested in estimating the parameter vector $\mathit{\nu}$ and the matrix P of the underlying copula ${C}_{P,{F}_{\mathit{W}}}^{\mathrm{gNVM}}$.
In Daul et al. (2003), this problem was discussed for the grouped t copula where ${d}_{k}\ge 2$ for $k=1,\dots ,S$. In this case, all subgroups are t copulas and Daul et al. (2003) suggest estimating the dof ${\nu}_{1},\dots ,{\nu}_{S}$ separately in each subgroup. Computationally, this is rather simple, as the density of the ungrouped t copula is known analytically. Luo and Shevchenko (2010) consider the grouped t copula with $S=d$, so ${d}_{k}=1$ for $k=1,\dots ,d$. Since any univariate margin of a copula is uniformly distributed, separate estimation is not feasible. As such, Luo and Shevchenko (2010) suggest estimating ${\nu}_{1},\dots ,{\nu}_{S}$ jointly by maximizing the copula likelihood of the grouped mixture. In both references, the matrix $P$ is estimated by estimating pairwise Kendall's tau and using the approximate identity ${\rho}_{\tau}({X}_{i},{X}_{j})\approx 2\arcsin\left({\rho}_{i,j}\right)/\pi$ for $i\ne j$. Although we have shown in Section 5 that in some cases this approximation can be too crude, our assessment is that, in the context of the fitting examples considered in the present section, it is sufficiently accurate. Luo and Shevchenko (2010) also consider joint estimation of $(P,\mathit{\nu})$ by maximizing the corresponding copula likelihood simultaneously over all $d+d(d-1)/2$ parameters. Their numerical results in $d=2$ suggest that this does not lead to a significant improvement. In large dimensions $d>2$, however, the optimization problem becomes intractable, so that the first, nonparametric approach for estimating $P$ is likely to be preferred.
We combine the two estimation methods, applied to the general case of a grouped normal variance mixture, in Algorithm 2.
Algorithm 2: Estimation of the Copula Parameters $\mathit{\nu}$ and $P$ of ${C}_{P,{F}_{\mathit{W}}}^{\mathrm{gNVM}}$.
Given iid ${\mathit{X}}_{1},\dots ,{\mathit{X}}_{n}$, estimate $\mathit{\nu}$ and $P$ of the underlying ${C}_{P,{F}_{\mathit{W}}}^{\mathrm{gNVM}}$ as follows:
1. Estimation of P. Estimate Kendall's tau ${\rho}_{\tau}({X}_{i},{X}_{j})$ for each pair $1\le i<j\le d$. Use the approximate identity ${\rho}_{\tau}({X}_{i},{X}_{j})\approx 2\arcsin\left({\rho}_{i,j}\right)/\pi$ to find the estimates ${\rho}_{i,j}$. Then combine the estimates ${\rho}_{i,j}$ into a correlation matrix $\widehat{P}$, which may have to be modified to ensure positive definiteness.
2. Transformation to pseudo-observations. If necessary, transform the data ${\mathit{X}}_{1},\dots ,{\mathit{X}}_{n}$ to pseudo-observations ${\mathit{U}}_{1},\dots ,{\mathit{U}}_{n}$ from the underlying copula, for instance by setting ${U}_{i,j}={R}_{i,j}/(n+1)$, where ${R}_{i,j}$ is the rank of ${X}_{i,j}$ among ${X}_{1,j},\dots ,{X}_{n,j}$.
3. Initial parameters. Maximize the copula log-likelihood for each subgroup $k$ with ${d}_{k}\ge 2$ over its respective parameters separately. That is, if ${\mathit{U}}_{i}^{\left(k\right)}=({U}_{i,{d}_{k-1}+1},\dots ,{U}_{i,{d}_{k-1}+{d}_{k}})$ (where ${d}_{0}=0$) denotes the subvector of ${\mathit{U}}_{i}$ belonging to group $k$, and if ${\widehat{P}}^{\left(k\right)}$ is defined accordingly, solve the corresponding optimization problems (29). For “groups” with ${d}_{k}=1$, choose the initial estimate ${\widehat{\mathit{\nu}}}_{0}^{\left(k\right)}$ from prior/expert experience or as a hard-coded value.
4. Joint estimation. With initial estimates ${\widehat{\mathit{\nu}}}_{0}^{\left(k\right)}$, $k=1,\dots ,S$, at hand, optimize the full copula likelihood (30) to estimate $\mathit{\nu}$.

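Steps 1 and 2 of Algorithm 2 can be sketched in Python; the eigenvalue-clipping repair used to enforce positive definiteness is one common choice, not necessarily the one used by nvmix (illustrative code, not the package implementation):

```python
import numpy as np
from scipy.stats import kendalltau, rankdata

def estimate_P(X):
    """Step 1: rho_{ij} = sin(pi * tau_hat_{ij} / 2) from pairwise Kendall's
    tau; repair to a positive definite correlation matrix by clipping
    eigenvalues and rescaling to unit diagonal."""
    d = X.shape[1]
    P = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            tau = kendalltau(X[:, i], X[:, j])[0]
            P[i, j] = P[j, i] = np.sin(np.pi * tau / 2)
    lam, V = np.linalg.eigh(P)
    P = V @ np.diag(np.clip(lam, 1e-8, None)) @ V.T
    s = np.sqrt(np.diag(P))
    return P / np.outer(s, s)

def pseudo_observations(X):
    """Step 2: U_{ij} = R_{ij} / (n + 1) with R_{ij} the column-wise rank."""
    return rankdata(X, axis=0) / (X.shape[0] + 1)

# sanity check on a bivariate normal sample with rho = 0.6
rng = np.random.default_rng(7)
rho = 0.6
X = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=4000)
P_hat = estimate_P(X)
U = pseudo_observations(X)
```

Steps 3 and 4 then maximize the (estimated) copula log-likelihood over these pseudo-observations, which is where the density-estimation machinery of Section 4.1 enters.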
The method proposed in Daul et al. (2003) returns the initial estimates obtained in Step 3. A potential drawback of this approach is that it fails to correctly account for the dependence between the groups. Indeed, the dependence between a component in group ${k}_{1}$ and a component in group ${k}_{2}$ (e.g., measured by Kendall's tau or by the tail-dependence coefficient) is determined by both ${\mathit{\nu}}^{\left({k}_{1}\right)}$ and ${\mathit{\nu}}^{\left({k}_{2}\right)}$. As such, these parameters should be estimated jointly. Note that the copula density is not available in closed form, not even in the grouped t case, so that each call of the likelihood function in (30) requires the approximation of $n$ integrals. This poses numerical challenges, as the estimated likelihood function is typically “bumpy”, having many local maxima due to estimation errors.
If ${F}_{\mathit{W}}$ is only known via its marginal quantile functions, as is the general theme of this paper, the optimization problems in (29) and (30) become intractable (unless $d$ and $n$ are small) due to the numerical challenges involved in the estimation of the copula density; see also Remark 2. We leave the problem of fitting grouped normal variance mixture copulas in full generality (where the distribution of the mixing random variables ${W}_{j}$ is only specified via marginal quantile functions in the form of a “black box”) for future research. Instead, we focus on the important case of a grouped t copula. Here, the quantile functions ${F}_{j}^{\leftarrow}$ (of ${X}_{j}$) and the densities ${f}_{j}$ are known for $j=1,\dots ,d$, since the margins are all t distributed. This substantially simplifies the underlying numerical procedure. Our method is implemented in the function fitgStudentcopula() of the R package nvmix. The numerical optimizations in Steps 3 and 4 are passed to the R optimizer optim() and the copula density is estimated as in Section 6.3.
Example 1. Consider a 6-dimensional grouped t copula with three groups of size 2 each and degrees of freedom 1, 4 and 7, respectively. We perform the following experiment: We sample a correlation matrix $P$ using the R function rWishart(). Then, for each sample size $n\in \{250,500,\dots ,1750,2000\}$, we repeat sampling ${\mathit{X}}_{1},\dots ,{\mathit{X}}_{n}$ 15 times and, in each case, estimate the degrees of freedom once using the method in Daul et al. (2003) (i.e., by estimating the dof in each group separately) and once using our method from the previous section. The true matrix $P$ is used in the fitting, so that the focus is really on estimating the dof. The results are displayed in Figure 5. The estimates on the left are obtained for each group separately; on the right, the dof were estimated jointly by maximizing the full copula likelihood (with initial estimates obtained as in the left figure). Clearly, the jointly estimated parameters are much closer to their true values (which are known in this simulation study and indicated by horizontal lines), and it can be confirmed that the variance decreases with increasing sample size $n$.
Example 2. Let us now consider the negative logarithmic returns of the constituents of the Dow Jones 30 index from 1 January 2014 to 31 December 2015 ($n=503$ data points obtained from the R package qrmdata of Hofert and Hornik (2016)) and, after de-GARCHing, fit a grouped t copula to the standardized residuals. We choose the natural grouping induced by the industry sectors of the 30 constituents and merge groups of size 1 so that 9 groups are left. Figure 6 displays the estimates obtained for various specifications of maxit, the maximum number of iterations of the underlying optimizer (note that the current default of optim() is as low as maxit = 500). The points for maxit = 0 correspond to the initial estimates found from separately fitting t copulas to the groups.
The initial estimates differ significantly from the maximum likelihood estimates (MLEs) obtained from the joint estimation of the dof. Note also that the MLEs change with increasing maxit argument, even though they no longer change drastically once 1500 or more iterations are used. Note that the initial parameters result in a much more heavy-tailed model than the MLEs. Figure 6 also displays the estimated log-likelihood of the parameters found by the fitting procedure. The six lines correspond to the estimated log-likelihood using six different seeds. It can be seen that estimating the dof jointly (as opposed to group-wise) yields a substantially larger log-likelihood, whereas increasing the parameter maxit (beyond a necessary minimum) only gives a minor improvement. In order to examine the impact of the different estimates on the underlying copula in terms of its tail behavior, Figure 7 displays the probability $C(u,\dots ,u)$, estimated using methods from Section 6.3, as a function of $u$; in a risk-management context, $C(u,\dots ,u)$ is the probability of a jointly large loss, hence a rare event. An absolute error tolerance of ${10}^{-7}$ was used to estimate the copula. The figure also includes the corresponding probability for the ungrouped t copula, for which the dof were estimated to be 6.3. Figure 7 indicates that the initial estimates yield the most heavy-tailed model. This seems reasonable, since all initial estimates for the dof range between 0.9 and 5.3 (with average 2.8). The models obtained from the MLEs exhibit the smallest tail probability, indicating that these are the least heavy-tailed models considered here. This is in line with Figure 6, which shows that the dof are substantially larger than the initial estimates. The ungrouped t copula is more heavy-tailed than the fitted grouped one (with MLEs) but less heavy-tailed than the fitted grouped one with initial estimates.
This example demonstrates that it is generally advisable to estimate the dof jointly when grouped modeling is of interest, rather than group-wise as suggested in Daul et al. (2003). Indeed, in this particular example, the initial estimates give a model that substantially overestimates the risk of jointly large losses. As can be seen from Figure 6, optimizing an estimated log-likelihood function is not at all trivial, in particular when many parameters are involved. Indeed, the underlying optimizer never detected convergence, which is why the user needs to carefully assess which specification of maxit to use. We plan to explore more elaborate optimization procedures which perform better in large dimensions for this problem in the future.
Example 3. In this example, we consider the problem of mean-variance (MV) portfolio optimization in the classical Markowitz (1952) setting. Consider $d$ assets, and denote by ${\mathit{\mu}}_{t}$ and ${\mathsf{\Sigma}}_{t}$ the expected return vector on the risky assets in excess of the risk-free rate and the variance-covariance (VCV) matrix of asset returns in the portfolio at time $t$, respectively. We assume that an investor chooses the weights ${\mathit{x}}_{t}$ of the portfolio to maximize the quadratic utility function $U\left({\mathit{x}}_{t}\right)={\mathit{x}}_{t}^{\top}{\mathit{\mu}}_{t}-\frac{\gamma}{2}{\mathit{x}}_{t}^{\top}{\mathsf{\Sigma}}_{t}{\mathit{x}}_{t}$, where in what follows we assume the risk-aversion parameter $\gamma =1$. When there are no short-selling (or other) constraints, one finds the optimal ${\mathit{x}}_{t}$ as ${\mathit{x}}_{t}={\mathsf{\Sigma}}_{t}^{-1}{\mathit{\mu}}_{t}$. As in Low et al. (2016), we consider relative portfolio weights, obtained by normalizing ${\mathit{x}}_{t}$. As such, the investor needs to estimate ${\mathit{\mu}}_{t}$ and ${\mathsf{\Sigma}}_{t}$. If we assume no short-selling, i.e., ${x}_{t,j}\ge 0$ for $j=1,\dots ,d$, the optimization problem can be solved numerically, for instance using the R package quadprog of Turlach et al. (2019). Assume we have return data for the $d$ assets stored in vectors ${\mathit{y}}_{t}$, $t=1,\dots ,T$, and a sampling window $0<M<T$. We perform an experiment similar to Low et al. (2016) and compare a historical approach with a model-based approach to estimate ${\mathit{\mu}}_{t}$ and ${\mathsf{\Sigma}}_{t}$. The main steps are as follows:
1. In each period $t=M+1,\dots ,T$, estimate ${\mathit{\mu}}_{t}$ and ${\mathsf{\Sigma}}_{t}$ using the $M$ previous return data ${\mathit{y}}_{i}$, $i=t-M,\dots ,t-1$.
2. Compute the optimal portfolio weights ${\mathit{w}}_{t}$ and the out-of-sample return ${r}_{t}={\mathit{w}}_{t}^{\top}{\mathit{y}}_{t}$.
In the historical approach, ${\mathit{\mu}}_{t}$ and ${\mathsf{\Sigma}}_{t}$ in the first step are merely computed as the sample mean vector and sample VCV matrix of the past return data. Our model-based approach is a simplification of the approach used in Low et al. (2016). In particular, to estimate ${\mathit{\mu}}_{t}$ and ${\mathsf{\Sigma}}_{t}$ in the first step, the following is done in each time period:
1a. Fit marginal ARMA(1,1)–GARCH(1,1) models with standardized t innovations to ${\mathit{y}}_{i}$, $i=t-M,\dots ,t-1$.
1b. Extract the standardized residuals and fit a grouped t copula to the pseudo-observations thereof.
1c. Sample $n$ vectors from the fitted copula, transform the margins by applying the quantile function of the respective standardized t distribution and, based on these $n$ d-dimensional residuals, sample from the fitted ARMA(1,1)–GARCH(1,1) models, giving a total of $n$ simulated return vectors, say ${\mathit{y}}_{i}'$, $i=1,\dots ,n$.
1d. Estimate ${\mathit{\mu}}_{t}$ and ${\mathsf{\Sigma}}_{t}$ from ${\mathit{y}}_{i}'$, $i=1,\dots ,n$.
The historical and model-based approaches each produce $T-M$ out-of-sample returns, from which we can estimate the certainty-equivalent return (CER) and the Sharpe ratio (SR) as $\widehat{\mathrm{CER}}={\widehat{\mu}}_{r}-\frac{\gamma}{2}{\widehat{\sigma}}_{r}^{2}$ and $\widehat{\mathrm{SR}}={\widehat{\mu}}_{r}/{\widehat{\sigma}}_{r}$, where ${\widehat{\mu}}_{r}$ and ${\widehat{\sigma}}_{r}$ denote the sample mean and sample standard deviation of the $T-M$ out-of-sample returns; see also Tu and Zhou (2011). Note that larger, positive values of the SR and CER indicate better portfolio performance. We consider logarithmic returns of the constituents of the Dow Jones 30 index from 1 January 2013 to 31 December 2014 ($n=503$ data points obtained from the R package qrmdata of Hofert and Hornik (2016)), a sampling window of $M=250$ days, $n={10}^{4}$ samples to estimate ${\mathit{\mu}}_{t}$ and ${\mathsf{\Sigma}}_{t}$ in the model-based approach, a risk-free interest rate of zero and no transaction costs. We report (in percent) the point estimates ${\widehat{\mu}}_{r}$, $\widehat{\mathrm{CER}}$ and $\widehat{\mathrm{SR}}$ for the historical approach and for the model-based approach based on an ungrouped and a grouped t copula in Table 2, assuming no short-selling. To limit the run time for this illustrative example, the degrees of freedom for the grouped and ungrouped t copula are estimated once and held fixed throughout all time periods $t=M+1,\dots ,T$. We see that the point estimates for the grouped model exceed those for the ungrouped model.
7. Discussion and Conclusions
We introduced the class of grouped normal variance mixtures and provided efficient algorithms to work with this class of distributions: estimating the distribution function and log-density function, estimating the copula and its density, estimating Spearman's rho and Kendall's tau, and estimating the parameters of a grouped NVM copula given a dataset. Most algorithms (and functions in the package nvmix) merely require the user to provide the quantile function(s) of the mixing distributions, which makes the algorithms presented in this paper (and their implementation in the R package nvmix) widely applicable in practice.
We saw that the distribution function (and hence the copula) of grouped NVM distributions can be efficiently estimated even in high dimensions using RQMC algorithms. The density function of grouped NVM distributions is in general not available in closed form, not even for the grouped t distribution, so one has to rely on its estimation. Our proposed adaptive algorithm is capable of estimating the log-density accurately and efficiently, even in high dimensions. Fitting grouped normal variance mixture copulas, such as the grouped t copula, to data is an important yet challenging task due to the lack of a tractable density function. Thanks to our adaptive procedure for estimating the density, the parameters can be estimated jointly in the special case of a grouped t copula. As demonstrated in the previous section, it is indeed advisable to estimate the dof jointly, as otherwise one might severely over- or underestimate the joint tails.
A computational challenge that we plan to further investigate is the optimization of the estimated loglikelihood function, which is currently slow and lacks a reliable convergence criterion that can be used for automation. Another avenue for future research is to study how one can, for a given multivariate dataset, assign the components to homogeneous groups.