High-Dimensional Distributionally Robust Mean-Variance Efficient Portfolio Selection

Zhang, Zhonghui; Jing, Huarui; Kao, Chihwa

doi:10.3390/math11051272


by Zhonghui Zhang ¹,*, Huarui Jing ² and Chihwa Kao ³

¹ Institute of Banking and Money, Nanjing Audit University, Nanjing 210017, China

² Department of Economics and Finance, The University of the South, Sewanee, TN 37383, USA

³ Department of Economics, University of Connecticut, Storrs, CT 06269, USA

* Author to whom correspondence should be addressed.

Mathematics 2023, 11(5), 1272; https://doi.org/10.3390/math11051272

Submission received: 20 January 2023 / Revised: 3 March 2023 / Accepted: 3 March 2023 / Published: 6 March 2023

(This article belongs to the Special Issue Mathematical Economics and Spatial Econometrics)


Abstract:

This paper introduces a novel distributionally robust mean-variance portfolio estimator based on the projection robust Wasserstein (PRW) distance. This approach addresses the issue of increasing conservatism of portfolio allocation strategies due to high-dimensional data. Our simulation results show the robustness of the PRW-based estimator in the presence of noisy data and its ability to achieve a higher Sharpe ratio than regular Wasserstein distances when dealing with a large number of assets. Our empirical study also demonstrates that the proposed portfolio estimator outperforms classic “plug-in” methods using various covariance estimators in terms of risk when evaluated out of sample.

Keywords:

mean variance portfolio; high dimension; distributionally robust optimization; projection robust Wasserstein distance

MSC:

91G10

1. Introduction

The mean-variance efficient portfolio (MVP) introduced by [1] continues to be the workhorse for determining optimal asset allocations. The MVP model minimizes portfolio risk (as measured by the portfolio variance) subject to a constraint on the expected portfolio return and the requirement that the weights sum to one. Mathematically,

min_{ϕ} ϕ^{'} {Var}_{P} (R) ϕ, s.t. ϕ^{'} 1 = 1, E_{P} (ϕ^{'} R) \geq α,

(1)

where $ϕ$ is a vector of portfolio weights, R is the asset return, $α$ is the required return, and $E_{P}$ and ${Var}_{P}$ are, respectively, the expectation vector and covariance matrix under the distribution P. We assume that $ϕ^{'} {Var}_{P} (R) ϕ$ exists throughout the paper. However, it is well known that the standard “plug-in” approach, which replaces population parameters with sample estimates, can lead to poor out-of-sample performance [2,3,4].

A special case of the MVP is the global minimum–variance portfolio (GMVP):

min_{ϕ} ϕ^{'} {Var}_{P} (R) ϕ, s . t . ϕ^{'} 1 = 1,

(2)

which minimizes portfolio risk without any constraint on the expected return. The GMVP has a closed-form solution, $ϕ^{*} = \frac{{Var}_{P} {(R)}^{- 1} 1}{1^{'} {Var}_{P} {(R)}^{- 1} 1}$, and its risk, $R_{\min} = \frac{1}{1^{'} {Var}_{P} {(R)}^{- 1} 1}$, depends solely on the covariance matrix of asset returns. There is substantial empirical evidence that the GMVP generally outperforms other MVPs in terms of out-of-sample risk [5,6]. However, the performance of the GMVP depends on the accuracy of the covariance matrix estimate. When the matrix dimension is large compared to the sample size, the sample covariance matrix may not perform well. Several methods have been proposed to overcome this issue and improve the estimation of the covariance matrix for GMVP in high-dimensional scenarios. Among others, ref. [7] proposed a principal orthogonal complement thresholding (POET) method for estimating the large covariance matrix. Ref. [8] studied GMVP under two scenarios: (1) minimum risk decays to zero at a faster rate than $1 / \sqrt{N}$ and (2) minimum risk decays to zero at a slower rate or is bounded from below. They found that using the POET estimator alone is not enough to achieve accurate risk estimates under the second scenario and proposed a unified estimator that provides consistent shape risk estimates in both scenarios. Ref. [9] introduced a consistent estimator for the minimum variance in a high-frequency setting, while [10,11,12] proposed shrinkage estimators that exhibit good out-of-sample performance.
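
The closed form quoted above follows from a short Lagrangian argument. The sketch below writes $Σ$ for ${Var}_{P} (R)$ and assumes $Σ$ is positive definite; the multiplier $λ$ is auxiliary notation introduced only for this derivation.

```latex
\begin{aligned}
\mathcal{L}(\phi,\lambda) &= \phi'\Sigma\phi - 2\lambda\,(\phi'\mathbf{1} - 1), \\
\partial_{\phi}\mathcal{L} = 0 \;&\Rightarrow\; \Sigma\phi = \lambda\mathbf{1}
  \;\Rightarrow\; \phi = \lambda\,\Sigma^{-1}\mathbf{1}, \\
\phi'\mathbf{1} = 1 \;&\Rightarrow\; \lambda = \frac{1}{\mathbf{1}'\Sigma^{-1}\mathbf{1}},
  \qquad \phi^{*} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}'\Sigma^{-1}\mathbf{1}}, \\
R_{\min} &= \phi^{*\prime}\Sigma\phi^{*} = \lambda^{2}\,\mathbf{1}'\Sigma^{-1}\mathbf{1}
  = \frac{1}{\mathbf{1}'\Sigma^{-1}\mathbf{1}}.
\end{aligned}
```

Because both $ϕ^{*}$ and $R_{\min}$ depend on the data only through $Σ^{- 1}$, any error in the covariance estimate passes directly into the weights, which is why the estimators reviewed above concentrate on the covariance matrix.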

In addition to the aforementioned methods that focus on improving the covariance estimate, robust optimization and distributionally robust optimization (DRO) are also popular choices, as they are designed to handle uncertainty in decision making. Robust estimators, such as those proposed by [13,14], are designed to provide meaningful results even if the empirical distribution deviates from the assumed distribution, though they are not as efficient as maximum likelihood estimators if the underlying model is correctly specified. For example, ref. [15] demonstrated that the use of robust estimates significantly improves the stability properties of MVP. Ref. [2] proposed a portfolio using robust estimators, where robust estimation and portfolio optimization are performed in a single step. DRO was first introduced by [16] for the newsvendor problem and models uncertainty through an ambiguity set. This allows for the incorporation of estimation errors into optimization problems. Ref. [17] (BCZ, hereafter) used a DRO approach with a Wasserstein distance-based ambiguity set for MVP and formulated it as an optimization problem:

min_{ϕ} max_{P \in U_{δ} (P_{T})} ϕ^{'} {Var}_{P} (R) ϕ, s . t . ϕ^{'} 1 = 1, E_{P} (ϕ^{'} R) \geq α

(3)

with

U_{δ} (P_{T}) = \{P : W (P, P_{T}) \leq δ\} .

(4)

Here, $W (P, P_{T})$ is the Wasserstein distance, which measures the difference between two probability distributions, P and $P_{T}$. The ambiguity set, defined in Equation (4), is centered at the empirical distribution of asset returns ($P_{T}$) with a radius of $δ$. The objective of the DRO formulation is to minimize the portfolio risk under the worst-case scenario, subject to the constraints of having a total investment of one and a minimum expected return of $α$. This approach provides a way to account for estimation errors and uncertainty in MVP.

BCZ show that the min–max problem in (3) can be reformulated as

min_{ϕ} {(\sqrt{ϕ^{'} {Var}_{P_{T}} (R) ϕ} + \sqrt{δ} ∥ ϕ ∥)}^{2} s . t . ϕ^{'} 1 = 1, E_{P_{T}} (ϕ^{'} R) \geq α + \sqrt{δ} ∥ ϕ ∥,

(5)

where $∥ \cdot ∥$ is the Euclidean norm. This reformulation provides a direct distributionally robust interpretation that links regularization of the portfolio weights to the ambiguity of the distribution of asset returns. The size of the ambiguity set is related to the regularization parameter, and the shape of the transportation cost in the definition of the optimal transport discrepancy determines the type of regularization imposed. BCZ demonstrate that minimizing the portfolio variance when the distribution of parameters is from a Wasserstein metric-based ambiguity set can be formulated as imposing two-norm constraints on MVP. This demonstrates that the distributionally robust approach can recover several widely used regularization techniques in the portfolio selection literature. In particular, [18] showed that the solution to the GMVP with a one-norm constraint, where ${∥ ϕ ∥}_{1} \equiv \sum_{i = 1}^{N} | ϕ_{i} | \leq 1$, is equivalent to the solution to the short-sale constrained problem in [6]. They also showed that imposing a short-sale constraint when minimizing portfolio variance is equivalent to shrinking the extreme elements of the covariance matrix. The solution to GMVP with an A-norm constraint, where ${∥ ϕ ∥}_{A} \equiv ϕ^{'} A ϕ \leq δ$, is equivalent to the shrinkage portfolios proposed by Ledoit and Wolf. When A is equal to the one-factor covariance matrix, A-norm-constrained portfolios and the shrinkage portfolios proposed in [10] have a one-to-one correspondence. When A is equal to the identity matrix, A-norm-constrained portfolios and the shrinkage portfolios proposed in [11] have a one-to-one correspondence. Ref. [19] showed that MVP with p-norm transaction costs can be equivalently formulated as a robust portfolio problem with a mean in a moment-based ambiguity set. The unified estimator proposed by [8] was obtained by adding a one-norm constraint of ${∥ ϕ ∥}_{1} \leq δ$ to the traditional GMVP. Building a direct relationship between DRO and the function regularization commonly used in MVP is an interesting research question.

In this paper, we make a theoretical contribution by revisiting BCZ’s Wasserstein DRO and extending it to high-dimensional data. The use of the Wasserstein distance in DRO has gained significant attention due to its ability to provide an adjustable level of conservatism and exceptional out-of-sample performance, as reported by [20]. However, it is well known that Wasserstein distances are prone to the curse of dimensionality, as the sample complexity can grow rapidly with the dimension [21]. More importantly, the portfolio selection used by BCZ can lead to overly conservative selections when incorporating a large number of assets. As an example, consider the following scenario. Assume that the asset returns follow the Capital Asset Pricing Model (CAPM): $R_{i t} = β_{i} f_{t} + ϵ_{i t}$ for $i = 1, \dots, N$ and $t = 1, \dots, T$. The factor f follows an AR(1) process, for example, $f_{t} = 0.9 f_{t - 1} + e_{t}$. The values of $β_{i}$, $e_{t}$, and $ϵ_{i t}$ are randomly drawn from a normal distribution $N (0, 1)$. Figure 1 depicts the behavior of the Wasserstein distance between the empirical distributions of $R_{i t}$ and $β_{i} f_{t}$ as a function of the dimension N. The left panel in the figure is based on $T = 50$ observations, and the right panel is based on $T = 100$. Although there is only one underlying factor (f) driving the asset returns, the Wasserstein distance (Wass) increases as more assets are included. These experimental results imply that there may be a potential issue with the distributionally robust MVP optimization using the Wasserstein distance to form an ambiguity set, as defined in Equation (4). This issue arises because the regularization parameter $δ$, which determines the size of the ambiguity set, is estimated from the data and depends on how far the worst-case scenario of asset returns is from the population distribution. It should only depend on the key factors that drive asset returns, not the number of assets. However, using the regular Wasserstein distance as in Equation (5) to compute $δ$ can cause it to increase as more assets are included, even if the key factors remain fixed. This larger value of $δ$ places more weight on the regularization term and can lead to a portfolio allocation strategy that is more conservative or diversified, with less dependence on the variance of returns under the empirical distribution $P_{T}$. Therefore, it is important to ensure that the estimation of $δ$ is based on the signal factors that drive asset returns, rather than the number of assets.
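
The one-factor experiment described above can be sketched in a few lines. This is an illustrative reconstruction, not the code behind Figure 1: the helper names, the random seed, the zero-burn-in start of the AR(1) factor, and the use of SciPy's assignment solver to obtain the exact 2-Wasserstein distance between two equal-sized empirical measures are all assumptions made here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def simulate_one_factor(N, T, rng):
    """Draw returns R (T x N) and their signal part beta * f_t."""
    beta = rng.standard_normal(N)
    e = rng.standard_normal(T)
    f = np.empty(T)
    f[0] = e[0]                      # arbitrary start; no burn-in in this sketch
    for t in range(1, T):
        f[t] = 0.9 * f[t - 1] + e[t]
    signal = np.outer(f, beta)       # row t equals beta * f_t
    returns = signal + rng.standard_normal((T, N))
    return returns, signal


def empirical_w2(X, Y):
    """Exact 2-Wasserstein distance between two uniform empirical measures
    with the same number of atoms (the rows of X and Y)."""
    # Squared Euclidean cost between every pair of atoms.
    cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)   # optimal one-to-one matching
    return float(np.sqrt(cost[rows, cols].mean()))


rng = np.random.default_rng(0)
for N in (5, 20, 50):
    R, S = simulate_one_factor(N, T=50, rng=rng)
    print(N, round(empirical_w2(R, S), 3))
```

With equally weighted samples of the same size, an optimal transport plan can be taken to be a permutation, which is why an assignment solver suffices here. The printed distances should grow with N even though a single factor drives every asset.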

We address the issue of increasing conservatism or diversification of portfolio allocation strategies due to high dimensionality in BCZ’s DRO method. To overcome this challenge, we propose the use of the projection robust Wasserstein (PRW) distance introduced by [22]. Unlike the regular Wasserstein distance, the PRW distance computes the distance between two probability measures in low-dimensional subspaces, rather than the original high-dimensional space. The subspaces are selected by maximizing the Wasserstein distance between the two measures after projection. For a 1-dimensional subspace, a sliced version of Wasserstein distance is employed [23,24]. As shown in Figure 1, the PRW distance is more robust to changes in dimension and noise than the regular Wasserstein distance. In the context of MVP selection, we examine the underlying motivation for the PRW distance and focus on a Gelbrich hull defined by the mean and covariance in a k-dimensional subspace. This convex relaxation of the Wasserstein ambiguity set in BCZ maintains the robustness of the method while significantly reducing computation time, especially for large portfolios. Our second contribution is a data-driven method for selecting the degree of ambiguity ($δ$); our findings indicate that the resulting value of $δ$ is robust to variations in the estimates of the mean and covariance. Our simulation study also demonstrates that our proposed PRW-based estimator outperforms BCZ’s method when the number of assets is large.

The paper is structured as follows: In Section 2, we introduce the proposed high-dimensional distributionally subspace robust MVP (DSR-MVP) estimation method and the data-driven approaches for determining $δ$ and the number of factors. Section 3 compares the performance of our method’s estimated GMVP with several popular MVPs and the equal-weight portfolio, using high-dimensional and noisy return data. The results show that our method has the highest Sharpe ratio. Section 4 presents an empirical study, and Section 5 concludes the paper.

2. Estimation Method

In this section, we propose a portfolio selection method that is based on the PRW distance.

2.1. Low-Rank Structure and DRO Formulation

The return R is a $T \times N$ matrix with a low-rank signal component $R_{0}$ and idiosyncratic noise $ϵ$. The rank of $R_{0}$ is unknown, and it can be random or deterministic. If $R_{0}$ has a factor structure, it can be written as $F B^{'}$, where F and B are matrices of common factors and loadings. The factors, loadings, and idiosyncratic errors are unobservable. We assume that all factors are strong. These assumptions allow for consistent estimation of the number of factors k from the estimated singular values of the low-rank matrix.

To provide a formal definition of the 2-Wasserstein distance, let $P (R^{N} \times R^{N})$ be the space of Borel probability measures supported on $R^{N} \times R^{N}$. The set of couplings is defined as

Π (P, Q) = \{π \in P (R^{N} \times R^{N}) : π (A \times R^{N}) = P (A), π (R^{N} \times B) = Q (B)\},

(6)

for any Borel sets $A, B \subset R^{N}$. The 2-Wasserstein distance between two probability distributions P and Q supported on $R^{N}$ is defined as

W (P, Q) : = inf_{π \in Π (P, Q)} {(\int ‖ U - V ‖^{2} d π (U, V))}^{1 / 2} .

(7)

The ambiguity set in (4) can be understood as a Wasserstein ball centered at a nominal distribution with radius $δ$, which investors can adjust to control their conservativeness. In practice, investors may collect a set of n possible distributions $Q_{i}, i = 1, \dots, n$ and use their “center” to approximate the true distribution, which can be the Wasserstein barycenter. However, this requires the Wasserstein barycenter to be as close to the population distribution as possible.

In this paper, we propose to regularize the Wasserstein distance by projecting input measures onto lower-dimensional subspaces and computing the Wasserstein distance between these reductions. We employ the k-dimensional PRW distances between P and Q in [22], defined as

P_{k} (P, Q) : = inf_{π \in Π (P, Q)} sup_{E \in G_{k}} {(\int ‖ P_{E} (U - V) ‖^{2} d π (U, V))}^{1 / 2},

(8)

where $G_{k}$ is the Grassmannian of k-dimensional subspaces of $R^{N}$, and $P_{E}$ is the orthogonal projector onto $E$. PRW finds the worst-case direction that maximizes the probability distance between projected sample points, which is a special case of the projected Wasserstein distance using one-dimensional linear mapping. See [22] for convex relaxation and theoretical results such as the existence of $P_{k} (P, Q)$. We present our DSR-MVP estimator as follows:

ϕ^{*} = arg min_{ϕ} {(\sqrt{ϕ^{'} {Var}_{P_{T}} (R) ϕ} + \sqrt{δ_{k}} \cdot ∥ ϕ ∥)}^{2} s . t . ϕ^{'} 1 = 1, E_{P_{T}} (ϕ^{'} R) \geq \bar{α},

(9)

We modify the reformulation of BCZ’s problem (3) by using our ambiguity set defined with PRW and adjusting the regularization parameter $δ_{k}$ based on the rank of the signal component k (the reformulation in BCZ remains valid if we replace the regular Wasserstein distance with PRW; this is because they are equivalent, in the sense that $\frac{k}{N} W (P, Q) \leq P_{k} (P, Q) \leq W (P, Q)$). This adjustment is necessary as it allows the regularization parameter to depend only on the signal factors, instead of the number of assets, and thus avoids potential issues with over-conservatism. Directly selecting the regularization parameter using cross-validation cannot replace this adjustment, as we will discuss in detail in the next section.
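
To make the role of the penalty concrete, the following is a minimal sketch of problem (9) with the return constraint dropped (the GMVP case), assuming the covariance matrix is positive definite. It is not the authors' algorithm: it uses the observation that the first-order conditions of this convex problem force the solution onto the ridge-type path $ϕ \propto (Σ + λ I)^{- 1} 1$ with $λ = \sqrt{δ_{k}} \sqrt{ϕ^{'} Σ ϕ} / ∥ ϕ ∥$, and finds $λ$ by bisection. All function names are illustrative.

```python
import numpy as np


def ridge_path_weights(Sigma, lam):
    """phi(lam) proportional to (Sigma + lam*I)^{-1} 1, rescaled to sum to one."""
    N = Sigma.shape[0]
    x = np.linalg.solve(Sigma + lam * np.eye(N), np.ones(N))
    return x / x.sum()


def objective(phi, Sigma, delta_k):
    """(sqrt(phi' Sigma phi) + sqrt(delta_k) * ||phi||)^2 as in (9)."""
    return (np.sqrt(phi @ Sigma @ phi)
            + np.sqrt(delta_k) * np.linalg.norm(phi)) ** 2


def dsr_gmvp(Sigma, delta_k, n_iter=200):
    """Minimise the objective subject to phi'1 = 1 (no return constraint).

    Assumes Sigma is symmetric positive definite. A root of gap() is a
    stationary point of the convex problem, hence a global minimiser.
    """
    c = np.sqrt(delta_k)

    def gap(lam):
        phi = ridge_path_weights(Sigma, lam)
        return lam - c * np.sqrt(phi @ Sigma @ phi) / np.linalg.norm(phi)

    lo = 0.0
    # sqrt(phi' Sigma phi) / ||phi|| <= sqrt(largest eigenvalue), so gap(hi) > 0.
    hi = c * np.sqrt(np.linalg.eigvalsh(Sigma)[-1]) + 1.0
    for _ in range(n_iter):          # plain bisection on the scalar lam
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            hi = mid
        else:
            lo = mid
    return ridge_path_weights(Sigma, 0.5 * (lo + hi))


rng = np.random.default_rng(0)
A = rng.standard_normal((60, 8))
Sigma = A.T @ A / 60 + 0.1 * np.eye(8)       # synthetic covariance matrix
for d in (0.0, 0.1, 10.0):
    print(d, np.round(dsr_gmvp(Sigma, d), 3))
```

Setting `delta_k = 0` recovers the plug-in GMVP, while larger values push the weights toward the equal-weight portfolio, which is the over-conservatism that a dimension-inflated radius would induce.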

2.2. Choosing DSR-MVP Parameters

2.2.1. Data-Driven Approach for Choosing $δ_{k}$

Selecting an appropriate radius for the Wasserstein ball is crucial for obtaining good performance guarantees of DRO. The radius should balance between robustness and performance. Overestimating the size of the ambiguity set can occur in high-dimensional data, so we suggest using PRW to estimate this parameter. In this section, we present a practical and useful method for selecting $δ_{k}$.

Recall that the Markowitz portfolio selection aims to construct a portfolio with a target return $α$ while minimizing risk, given a particular population distribution $P^{*}$. Namely,

min_{ϕ} \{\frac{1}{2} ϕ^{'} {Var}_{P^{*}} (R) ϕ\} s . t . ϕ^{'} E_{P^{*}} (R) = α, ϕ^{'} 1 = 1 .

(10)

Denote $Σ_{0} = {Var}_{P^{*}} (R)$ and $μ_{0} = E_{P^{*}} (R)$. The mean-variance efficient frontier (MVEF) can be derived as

F (μ_{0}, Σ_{0}, ϕ^{*}) \equiv ϕ^{*} - Σ_{0}^{- 1} \frac{(c_{1} μ_{0}^{'} ϕ^{*} - c_{3}) μ_{0} + (c_{2} - c_{3} μ_{0}^{'} ϕ^{*}) 1}{c_{1} c_{2} - {(c_{3})}^{2}} = 0,

(11)

where $ϕ^{*}$ are the optimal portfolio weights that solve the Markowitz problem in (10), and $c_{1} = 1^{'} Σ_{0}^{- 1} 1$, $c_{2} = μ_{0}^{'} Σ_{0}^{- 1} μ_{0}$, and $c_{3} = 1^{'} Σ_{0}^{- 1} μ_{0}$.

We now allow ambiguities on $P^{*}$. Let $P \in U_{δ_{k}} (μ_{0}, Σ_{0})$, where

U_{δ} (μ_{0}, Σ_{0}) : = \{P : (μ \equiv E_{P} (R), Σ \equiv {Var}_{P} (R)) \in U_{δ} (μ_{0}, Σ_{0})\}

(12)

is the Gelbrich hull, and

U_{δ} (μ_{0}, Σ_{0}) = \{(μ, Σ) : \sqrt{∥ μ - μ_{0} ∥^{2} + sup_{\begin{matrix} P_{E} \in R^{N \times k} \\ P_{E}^{'} P_{E} \in I_{k} \end{matrix}} Tr [P_{E} (Σ + Σ_{0} - 2 {(Σ^{\frac{1}{2}} Σ_{0} Σ^{\frac{1}{2}})}^{\frac{1}{2}})]} \leq δ\}

(13)

is our uncertainty set defined in the subspace of mean vectors and covariance matrices, inspired by PRW in [22]. It follows that $U_{δ} (P^{*})$ defined in (4) is a subset of $U_{δ} (μ_{0}, Σ_{0})$, since

\begin{matrix} W (P, P^{*}) & \geq \sqrt{{∥μ - μ_{0}∥}^{2} + Tr [Σ + Σ_{0} - 2 {(Σ^{\frac{1}{2}} Σ_{0} Σ^{\frac{1}{2}})}^{\frac{1}{2}}]} \\ \geq \sqrt{∥ μ - μ_{0} ∥^{2} + sup_{\begin{matrix} P_{E} \in R^{N \times k} \\ P_{E}^{'} P_{E} \in I_{k} \end{matrix}} Tr [P_{E} (Σ + Σ_{0} - 2 {(Σ^{\frac{1}{2}} Σ_{0} Σ^{\frac{1}{2}})}^{\frac{1}{2}})]}, \end{matrix}

(14)

where $μ = E_{P} (R)$ and $Σ = {Var}_{P} (R)$. The first inequality is well known, with equality holding if P and $P^{*}$ are elliptical distributions with the same density generator. The second inequality follows from the Cauchy–Schwarz inequality:

\begin{matrix} sup_{\begin{matrix} P_{E} \in R^{N \times k} \\ P_{E}^{'} P_{E} \in I_{k} \end{matrix}} Tr [P_{E} (Σ + Σ_{0} - 2 {(Σ^{\frac{1}{2}} Σ_{0} Σ^{\frac{1}{2}})}^{\frac{1}{2}})] \leq & sup_{\begin{matrix} P_{E} \in R^{N \times k} \\ P_{E}^{'} P_{E} \in I_{k} \end{matrix}} Tr [P_{E}] \cdot Tr [(Σ + Σ_{0} - 2 {(Σ^{\frac{1}{2}} Σ_{0} Σ^{\frac{1}{2}})}^{\frac{1}{2}})] \\ \leq & k \cdot Tr [Σ + Σ_{0} - 2 {(Σ^{\frac{1}{2}} Σ_{0} Σ^{\frac{1}{2}})}^{\frac{1}{2}}], \end{matrix}

(15)

for $k \in \{1, \dots, N\}$. Consequently, the corresponding portfolio risk satisfies

(max_{P \in U_{δ} (P^{*})} ϕ^{'} Σ ϕ) \leq (max_{P \in U_{δ} (μ_{0}, Σ_{0})} ϕ^{'} Σ ϕ)

(16)

for any $ϕ$ such that $ϕ^{'} 1 = 1$ and $ϕ^{'} μ \geq \bar{α}$. Let $Φ_{P} = \{ϕ : F (μ, Σ, ϕ) = 0\}$ denote the MVEF under P. The set of all plausible MVEFs under P within the ambiguity set can be represented as

Φ_{δ} (P^{*}) = ⋃_{P \in U_{δ} (μ_{0}, Σ_{0})} Φ_{P} .

(17)

Here, $U_{δ} (μ_{0}, Σ_{0})$ denotes the set of distributions with mean and covariance constraints, and $P^{*}$ is a reference distribution. Given a confidence level $1 - ξ_{0}$, it is natural to choose $δ$ by

δ^{*} = inf \{δ : P^{*} (ϕ \in Φ_{δ} (P^{*})) \geq 1 - ξ_{0}\},

(18)

which is by definition equivalent to

δ^{*} = inf \{δ : P^{*} (\sqrt{∥ μ - μ_{0} ∥^{2} + sup_{\begin{matrix} P_{E} \in R^{N \times k} \\ P_{E}^{'} P_{E} \in I_{k} \end{matrix}} Tr [P_{E} (Σ + Σ_{0} - 2 {(Σ^{\frac{1}{2}} Σ_{0} Σ^{\frac{1}{2}})}^{\frac{1}{2}})]} \leq δ) \geq 1 - ξ_{0}\} .

(19)

The goal is to find the smallest radius that ensures, with high probability, the Gelbrich hull contains at least one distribution. Note that

\sqrt{∥ μ - μ_{0} ∥^{2} + sup_{\begin{matrix} P_{E} \in R^{N \times k} \\ P_{E}^{'} P_{E} \in I_{k} \end{matrix}} Tr [P_{E} (Σ + Σ_{0} - 2 {(Σ^{\frac{1}{2}} Σ_{0} Σ^{\frac{1}{2}})}^{\frac{1}{2}})]} = \sqrt{∥ μ - μ_{0} ∥^{2} + \sum_{i = 1}^{k} λ_{i}},

(20)

where $λ_{i}$ is the ith largest eigenvalue of $Σ + Σ_{0} - 2 {(Σ^{\frac{1}{2}} Σ_{0} Σ^{\frac{1}{2}})}^{\frac{1}{2}}$. Therefore, we can estimate this radius by

\hat{δ} = inf \{δ : P_{T} (\sqrt{∥ μ - μ_{T} ∥^{2} + \sum_{i = 1}^{k} λ_{i}} \leq δ) \geq 1 - ξ_{0}\},

(21)

where $P_{T}$, $μ_{T}$, and $Σ_{T}$ are sample estimates of $P^{*}$, $μ_{0}$, and $Σ_{0}$, respectively, and we assume that $P_{T}$ converges to $P^{*}$ as $T \to \infty$.
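
A small worked example shows why the truncated sum in (20) does not inherit the dimension dependence of the full trace. It rests on simplifying assumptions that are not part of the paper's setting: equal means, and isotropic covariances $Σ = a^{2} I_{N}$ and $Σ_{0} = b^{2} I_{N}$ with $a, b > 0$.

```latex
\begin{aligned}
\Sigma = a^{2} I_{N},\quad \Sigma_{0} = b^{2} I_{N}
  \;&\Rightarrow\;
  \Sigma + \Sigma_{0} - 2\bigl(\Sigma^{1/2}\Sigma_{0}\Sigma^{1/2}\bigr)^{1/2}
  = (a^{2} + b^{2} - 2ab)\, I_{N} = (a-b)^{2} I_{N}, \\
\text{full trace:}\quad
  \sqrt{\textstyle\sum_{i=1}^{N} \lambda_{i}} &= \sqrt{N}\,\lvert a-b\rvert, \\
\text{top-}k\text{ sum as in (20):}\quad
  \sqrt{\textstyle\sum_{i=1}^{k} \lambda_{i}} &= \sqrt{k}\,\lvert a-b\rvert .
\end{aligned}
```

For instance, with $a = 2$, $b = 1$, $N = 100$, and $k = 3$, the full-trace quantity equals 10 while the projected one is $\sqrt{3} \approx 1.73$, and only the former changes when more assets are added.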

We recommend a “backward selection” approach for selecting $δ$. Specifically, we choose a value of $δ$ that captures the asset return distributions that have consistently appeared in historical data, excluding occasional economic fluctuations such as large-scale financial crises and booms. It is important to note that a larger ambiguity set that includes more occasional economic fluctuations would result in a more conservative investment strategy. Our empirical method involves the following steps:

Step 1. Gather historical data on asset returns and partition them into appropriate time blocks based on the data frequency, for example daily blocks if the data are collected at per-second intervals. Each block represents the distribution of asset returns for that particular time period, contributing a measure $P_{d_{1}}$ to the ambiguity set of asset returns. Repeating this process for each time block, such as for each day of the year, will yield $P_{d_{1}}, P_{d_{2}}, \dots, P_{d_{365}}$ in our ambiguity set for that year.
Step 2. Select a basis $P_{d_{0}}$ from $P_{d_{1}}, P_{d_{2}}, \dots, P_{d_{365}}$. It is advisable to avoid choosing a $P_{d_{0}}$ with accidental large-scale fluctuations in asset returns.
Step 3. Compute the sample mean and sample covariance for each of $P_{d_{0}}, P_{d_{1}}, P_{d_{2}}, \dots, P_{d_{365}}$. Then, use these estimates to calculate the PRW distance, as defined in (20), between each $P_{d_{i}}$ and the basis $P_{d_{0}}$. We denote these distances as $G_{d_{1}}, G_{d_{2}}, \dots, G_{d_{365}}$. Note that calculating the Wasserstein distances directly is more time-consuming and may suffer from the curse of dimensionality if the number of assets is large, whereas the PRW distance does not. Additionally, since we choose $δ$ using a backward selection approach, using the Wasserstein distance may result in a larger value of $δ$, leading to a more conservative MVP.
Step 4. Select $δ$ as the $100 (1 - ξ_{0})$th percentile of the empirical distribution of $G_{d_{1}}, G_{d_{2}}, \dots, G_{d_{365}}$. The user-defined level $ξ_{0}$ can be determined through cross-validation. In our empirical study, we found that choosing $δ$ as the median of $G_{d_{1}}, G_{d_{2}}, \dots, G_{d_{365}}$ leads to better out-of-sample performance compared to selecting the minimum or maximum. However, users can try different percentile levels to find a more suitable value for $δ$. It is worth noting that since the distribution of $G_{d}$ depends on the choice of k, it is necessary to estimate k even when using cross-validation to determine $δ$.
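
The steps above can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: blocks are taken to be consecutive rows of a $T \times N$ return matrix, the basis is simply the first block unless specified, and the function names (`psd_sqrt`, `projected_gelbrich`, `select_delta`) are hypothetical. The distance is the right-hand side of (20) evaluated at sample moments.

```python
import numpy as np


def psd_sqrt(A):
    """Symmetric square root of a positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh((A + A.T) / 2)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def projected_gelbrich(mu, Sigma, mu0, Sigma0, k):
    """Right-hand side of (20): squared mean gap plus the k largest
    eigenvalues of Sigma + Sigma0 - 2 (Sigma^{1/2} Sigma0 Sigma^{1/2})^{1/2}."""
    root = psd_sqrt(Sigma)
    M = Sigma + Sigma0 - 2.0 * psd_sqrt(root @ Sigma0 @ root)
    lam = np.linalg.eigvalsh((M + M.T) / 2)[::-1]       # descending order
    total = np.sum((mu - mu0) ** 2) + lam[:k].sum()
    return float(np.sqrt(max(total, 0.0)))


def select_delta(returns, block_len, k, basis=0, xi0=0.5):
    """Backward selection of delta from a (T x N) return matrix.

    Steps 1-2: split into consecutive blocks and pick a basis block.
    Step 3: distance of every other block to the basis.
    Step 4: return the (1 - xi0) quantile (xi0 = 0.5 gives the median).
    """
    n_blocks = returns.shape[0] // block_len
    blocks = [returns[b * block_len:(b + 1) * block_len] for b in range(n_blocks)]
    moments = [(blk.mean(axis=0), np.cov(blk, rowvar=False)) for blk in blocks]
    mu0, Sigma0 = moments[basis]
    G = np.array([projected_gelbrich(mu, S, mu0, Sigma0, k)
                  for i, (mu, S) in enumerate(moments) if i != basis])
    return float(np.quantile(G, 1.0 - xi0)), G
```

A call such as `delta, G = select_delta(R, block_len=30, k=3)` returns both the selected radius and the block distances, so that other percentile levels can be compared without recomputing the eigen-decompositions.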

Remark 1.

Assuming bounded asset returns, let $P^{*}$ be the unknown true distribution of returns with mean vector $μ_{0}$ and covariance matrix $Σ_{0}$. Then, there exists a constant $c > 1$ that depends on $P^{*}$ such that for any confidence level $ξ_{0} \in (0, 1]$, the sample mean $μ_{T}$ and sample covariance $Σ_{T}$ satisfy the concentration inequality

P_{T} [(μ_{0}, Σ_{0}) \in U_{δ} (μ_{T}, Σ_{T})] \geq 1 - ξ_{0}

whenever $δ$ exceeds

δ_{T} (ξ_{0}) = \frac{log (c / ξ_{0})}{\sqrt{T}} .

(22)

Moreover, for all $ξ_{0} \in (0, 1)$ and $δ \geq δ_{T} (ξ_{0})$, we have

P_{T} [{ϕ^{*}}^{'} Σ_{0} ϕ^{*} \leq (max_{(μ, Σ) \in U_{δ} (μ_{T}, Σ_{T})} {ϕ^{*}}^{'} Σ ϕ^{*})] \geq 1 - ξ_{0},

(23)

where $ϕ^{*}$ is the optimal portfolio weight that solves the problem

max_{(μ, Σ) \in U_{δ} (μ_{T}, Σ_{T})} \frac{1}{2} {ϕ^{*}}^{'} Σ ϕ^{*} s.t. {ϕ^{*}}^{'} μ = α .

(24)

In particular, if $P^{*}$ is an elliptical distribution, then

(max_{(μ, Σ) \in U_{δ} (μ_{T}, Σ_{T})} {ϕ^{*}}^{'} Σ ϕ^{*}) ⟶ {ϕ^{*}}^{'} Σ_{0} ϕ^{*}

(25)

almost surely as $T \to \infty$.

Remark 1 provides a large-sample performance guarantee for our approach, which is a direct result of Theorems 21, 22, and 23 in [25]. These theorems assert that the uncertainty set $U_{δ} (μ, Σ)$ with radius $δ \geq δ_{T} (ξ_{0})$ represents a $(1 - ξ_{0})$ confidence set for the mean and covariance of the population distribution $P^{*}$. The theorems also provide an upper confidence bound on the out-of-sample performance of the MVP that is constructed based on the training samples. For convergence rates in Wasserstein distance of the empirical measure, readers can refer to [26].
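
To get a feel for the rate in (22), the threshold can be evaluated numerically. The constant c depends on the unknown $P^{*}$ and is not available in practice, so the value `c = 2.0` below is purely a placeholder chosen for illustration.

```python
import math


def delta_T(xi0, T, c):
    """Radius threshold from (22): log(c / xi0) / sqrt(T)."""
    if not (0.0 < xi0 <= 1.0) or c <= 1.0 or T <= 0:
        raise ValueError("need 0 < xi0 <= 1, c > 1 and T > 0")
    return math.log(c / xi0) / math.sqrt(T)


# c is unknown in practice; c = 2.0 is an arbitrary illustrative value.
for T in (50, 100, 400):
    print(T, round(delta_T(0.05, T, c=2.0), 4))
```

Whatever the value of c, the threshold shrinks at rate $1 / \sqrt{T}$: quadrupling the sample size halves the radius needed for the same confidence level.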

We conducted an experiment to test whether using different covariance estimators would affect the empirical distribution of $δ$ in high-dimensional settings. We generate asset returns $R_{t} \sim N (μ_{0}, Σ)$ 500 times, where $Σ$ is the sum of two positive definite matrices $Σ_{0}$ and $Σ_{e}$, with $Σ_{0}$ having a low-rank structure and $Σ_{e} = L_{e}^{'} L_{e}$. We randomly sample $μ_{0}$ and $L_{e}$ from $N (0, 1)$. Table 1 displays the empirical distribution of $δ$ using our method, which is based on the sample mean and three different covariance estimators: the sample covariance $(Σ_{sam})$, the POET estimator $(Σ_{POET})$ in [7], and the shrinkage estimator $(Σ_{LW})$ in [27]. The empirical distribution of $δ$ is robust to all listed covariance estimators, as shown in the table. We also conduct a Kolmogorov–Smirnov test on the distributions of $\hat{δ}$ using the sample covariance, POET, and shrinkage estimators. The test fails to reject the null hypothesis that any two distributions are the same given N and T. Hence, we suggest simply using the sample mean and sample covariance to estimate $δ$ in practice.

2.2.2. Methods for Choosing k

The topic of estimating the number of factors is important and has been extensively studied in the literature. Wang provided a survey of the methods and categorized them based on the asymptotic regimes of fixed N and $T \to \infty$, and both N and $T \to \infty$.

Under the regime of fixed N and $T \to \infty$, various methods for estimating the number of factors have been compared, such as the scree plot, Bartlett’s test, and cross-validation [28]. Ref. [29] further introduced percent variance, the Kaiser test, parallel analysis (PA), and the minimum average partial procedure, and selected PA as the best approach due to its practical performance, although it assumes finite-dimensional data and lacks theoretical guarantees. PA estimates the number of factors by comparing the rth eigenvalue of the sample correlation matrix to the 95th percentile of the rth eigenvalue of simulated correlation matrices, for $r = 1, \dots, N$. If the rth sample eigenvalue is larger than the 95th percentile of the simulated rth eigenvalue, and all previous $r - 1$ factors have been retained, then the rth factor is also retained. Originally, PA simulated the correlation matrix from a Gaussian distribution [30], but [31] introduced a permutation version of PA that shows better performance in simulations.
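
The retention rule of PA can be written compactly. The sketch below implements the Gaussian-simulation variant described above, not the permutation version of [31]; the function name, the number of simulated matrices, and the seed handling are assumptions made for illustration.

```python
import numpy as np


def parallel_analysis(X, n_sim=200, q=95.0, seed=0):
    """Estimate the number of factors in a (T x N) data matrix X.

    The r-th eigenvalue of the sample correlation matrix is compared with
    the q-th percentile of the r-th eigenvalue over correlation matrices of
    simulated i.i.d. Gaussian data of the same shape. Factors are retained
    sequentially until the first eigenvalue that fails the comparison.
    """
    rng = np.random.default_rng(seed)
    T, N = X.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

    simulated = np.empty((n_sim, N))
    for s in range(n_sim):
        Z = rng.standard_normal((T, N))
        simulated[s] = np.sort(
            np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    threshold = np.percentile(simulated, q, axis=0)

    k = 0
    for r in range(N):
        if observed[r] > threshold[r]:
            k += 1            # all earlier factors were retained as well
        else:
            break
    return k
```

Stopping at the first failure enforces the condition that the rth factor is kept only if all previous $r - 1$ factors were kept.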

Under the regime of $N, T \to \infty$ while k is fixed, the existing methods for estimating the low rank or the number of factors can be classified according to whether they assume strong or weak factors. Strong factors correspond to significant loadings B, where $B^{'} B / T \to Σ_{B}$, and the corresponding methods include those of [32] (BN), [33], and [34] (AH). These methods typically use information criteria or eigengaps to estimate the number of factors. In contrast, weak factors assume $B^{'} B \to Σ_{B}$ instead of $B^{'} B / T \to Σ_{B}$. Methods for estimating weak factors include [35,36] (KN), and [37]. These methods are designed for the case where the idiosyncratic components have homoscedastic noise $ϵ = σ^{2} I$ and factors have eigenvalues larger than $σ^{2} \sqrt{T / N}$ when $N, T \to \infty$, $T / N \to c > 0$. It is known [36,38] that there exists a phase transition phenomenon. That is, the factors are not detectable if their associated eigenvalues are below some threshold, even for the homoscedastic noise $ϵ = σ^{2} I$.

3. Monte Carlo Simulations

In this section, we describe our simulation experiments and discuss the comparison of different portfolio selection methods.

3.1. Setting

We simulate asset returns from a normal distribution, $N (0, Σ)$, where $Σ$ is the sum of the signal component $Σ_{s}$ and the noise component $Σ_{e}$. The signal $Σ_{s}$ is defined as $B \, {Cov} (F) B^{'}$, where F are the Fama–French three factors, B are the corresponding factor loadings obtained by regressing the monthly returns of $N = 82$ S&P 100 index constituent stocks on F, and ${Cov} (F)$ is the covariance matrix of F. The noise $Σ_{e}$ is obtained by applying the soft-thresholding approach in [39] to the sample covariance of the residuals from the regression.

3.2. Comparison of Methods of Choosing k

In this section, we compare the approaches for choosing the number of factors reviewed in Section 2.2.2. The estimation accuracy of k is known to depend on the relative magnitudes of $Σ_{s}$ and $Σ_{e}$ [37]. We thus generate returns from

R \sim N (0, Σ) s.t. Σ = \frac{Σ_{s}}{∥ Σ_{s} ∥} + γ \cdot \frac{Σ_{e}}{∥ Σ_{e} ∥} .

(26)

The signal and noise were normalized to unity using their Frobenius norm, with a constant $γ$ adjusting the signal-to-noise ratio. Four methods (PA, BN, AH, and KN) were compared for selecting the true number of factors in a dataset of 20 randomly chosen stocks with 200 observations per iteration. The true number of factors was 3, and the total number of iterations was 500. Table 2 shows the correct rank selection rate, with PA having the highest correct rate and outperforming the other methods when the noise was large. Therefore, PA is the preferred method.

3.3. Performance of Various GMVP Estimators

In this section, we compare various estimators for GMVP, including the BCZ method and our proposed approach. Specifically, we consider three plug-in GMVP estimators, $\hat{w} (Σ_{Sam})$, $\hat{w} (Σ_{POET})$, and $\hat{w} (Σ_{LW})$, which are obtained by plugging into $\hat{w} (Σ) = \frac{Σ^{- 1} 1}{1^{'} Σ^{- 1} 1}$ the sample covariance matrix $Σ_{Sam}$, the POET estimator $Σ_{POET}$ [7], and the Ledoit–Wolf shrinkage estimator $Σ_{LW}$ [27], respectively, as well as the equal-weight portfolio ${\hat{w}}_{ew}$. Our proposed GMVP estimator is obtained by solving the following optimization problem:

{\hat{w}}_{D S R} = \underset{ϕ}{argmin} {(\sqrt{ϕ^{'} {Var}_{P_{T}} (R) ϕ} + δ_{k} ∥ ϕ ∥)}^{2} s . t . ϕ^{'} 1 = 1,

(27)

where

δ_{k}

is computed using the method introduced in Section 2.2.1. We highlight the advantage of using the projected robust Wasserstein (PRW) distance over the regular Wasserstein distance when the data dimension is large. Because there is only theoretical guidance for choosing

δ

based on large samples in the BCZ method, it is not clear how to compute

δ

using empirical data. The GMVP and mean-variance portfolio (MVP) estimates obtained with the BCZ method in this paper, denoted as

{\hat{w}}_{D R}

, are computed in the same way as ours, except that

δ

is computed using the regular Wasserstein distance.
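
Problem (27) can be solved numerically in several ways; the sketch below is one illustrative option and not necessarily the solver used for the reported results. It assumes that the norm in (27) is Euclidean and that the sample covariance passed in as `sigma` is positive definite. Under these assumptions, the first-order condition of (27) implies that the minimizer has the plug-in form \hat{w}(Σ + κI) with κ = δ \sqrt{ϕ' Σ ϕ} / ∥ϕ∥, which suggests a simple fixed-point iteration on κ. The function names and the toy covariance are hypothetical.

```python
import numpy as np

def gmvp_weights(sigma):
    """Plug-in GMVP: Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(sigma.shape[0])
    x = np.linalg.solve(sigma, ones)
    return x / x.sum()

def dsr_objective(phi, sigma, delta):
    """Square root of the objective in (27): sqrt(phi' Sigma phi) + delta * ||phi||."""
    return np.sqrt(phi @ sigma @ phi) + delta * np.linalg.norm(phi)

def dsr_gmvp_weights(sigma, delta, max_iter=500, tol=1e-12):
    """Fixed-point sketch for (27); kappa is the implied ridge penalty."""
    eye = np.eye(sigma.shape[0])
    kappa = 0.0
    phi = gmvp_weights(sigma)                  # start from the plug-in GMVP (kappa = 0)
    for _ in range(max_iter):
        new_kappa = delta * np.sqrt(phi @ sigma @ phi) / np.linalg.norm(phi)
        phi = gmvp_weights(sigma + new_kappa * eye)
        if abs(new_kappa - kappa) < tol:       # stop once kappa has stabilized
            break
        kappa = new_kappa
    return phi

sigma = np.diag([0.04, 0.09, 0.16])            # toy covariance (hypothetical)
w_plug = gmvp_weights(sigma)                   # delta = 0 case
w_dsr = dsr_gmvp_weights(sigma, delta=0.5)     # regularized weights, pulled toward 1/N
```

With `delta=0` the iteration returns the plug-in GMVP unchanged, and larger values of `delta` move the weights toward the equal-weight portfolio.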

We allow for short positions in the portfolio weights. The performance of the aforementioned methods is evaluated based on the risk ratio (R.R.), as defined in [8]:

R . R . = \sqrt{\frac{R (\hat{w})}{R_{min}}} = \sqrt{\frac{{\hat{w}}^{'} Σ \hat{w}}{{w^{*}}^{'} Σ w^{*}}},

(28)

where

w^{*} = \frac{Σ^{- 1} 1}{1^{'} Σ^{- 1} 1}

is the true GMVP and

\hat{w}

denotes a GMVP estimator based on the generated returns, as introduced in Section 3.1. For both the BCZ and our proposed approaches, we estimate the empirical distributions of

δ

and

δ_{k}

from the sampled returns, and select their

25 %, 50 %

, and

75 %

percentiles as the values of

δ

and

δ_{k}

, respectively. We also consider the transaction cost (Trans) of all the listed portfolios, which is captured by the Euclidean norm of weight vectors. The results are reported in Table 3.
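
The two evaluation metrics can be computed directly from (28) and from the definition of the transaction cost proxy. The following is a minimal sketch; the function names and the 2 × 2 covariance are hypothetical and serve only to show the calculation.

```python
import numpy as np

def gmvp_weights(sigma):
    """True GMVP for covariance sigma: Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(sigma.shape[0])
    x = np.linalg.solve(sigma, ones)
    return x / x.sum()

def risk_ratio(w_hat, sigma):
    """R.R. in (28): sqrt of the portfolio variance over the minimum attainable variance."""
    w_star = gmvp_weights(sigma)
    return np.sqrt((w_hat @ sigma @ w_hat) / (w_star @ sigma @ w_star))

def transaction_cost(w_hat):
    """'Trans' proxy: Euclidean norm of the weight vector."""
    return np.linalg.norm(w_hat)

sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])               # toy true covariance (hypothetical)
w_ew = np.array([0.5, 0.5])                    # equal-weight portfolio

rr_ew = risk_ratio(w_ew, sigma)                # at least 1 by construction
trans_ew = transaction_cost(w_ew)
```

By construction the ratio equals one only when the estimated weights attain the minimum variance, so values closer to one indicate a better estimator.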

In Table 3, the values in parentheses

(\cdot)

denote the standard deviations. We use the

25 %, 50 %,

and

75 %

percentiles of the empirical distribution of

δ_{k}

to select the benchmark values for our GMVP estimator, and compare them with the other GMVP estimators. We employ a one-sided t-test with the null hypothesis:

H_{0} : R . R . (\hat{w} (Σ)) < {R . R . ({\hat{w}}_{D S R} (δ_{25 %})), R . R . ({\hat{w}}_{D S R} (δ_{50 %})), R . R . ({\hat{w}}_{D S R} (δ_{75 %}))} .

(29)

The corresponding p-values are presented in square brackets in the form of

[\cdot / \cdot / \cdot]

. As shown in the table, for all combinations of

N, T

in our experiment,

{\hat{w}}_{D S R}

exhibits a lower

R . R .

than

{\hat{w}}_{D R}

and

{\hat{w}}_{e w}

. We observe that the GMVP estimates

\hat{w} (Σ_{P O E T})

and

\hat{w} (Σ_{L W})

, which are global variance minimizers, have a lower risk ratio (

R . R .

) than our

{\hat{w}}_{D S R}

. However, this can be attributed to the fact that our

{\hat{w}}_{D S R}

is estimated by minimizing the global variance plus a regularization term. In contrast,

\hat{w} (Σ_{P O E T})

and

\hat{w} (Σ_{L W})

are pure global variance minimizers. It is important to note that our

{\hat{w}}_{D S R}

enjoys a more robust out-of-sample performance than the other GMVP estimators, as we will show later, at the expense of sacrificing some sharp risk consistency, which is defined as

\sqrt{\frac{R (\hat{w})}{R_{min}}} \overset{p}{\to} 1

as

T \to \infty

. This is because our estimator is designed to be more robust to model misspecification and market noise. Regarding transaction cost,

\hat{w} (Σ_{P O E T})

,

\hat{w} (Σ_{S a m})

, and

\hat{w} (Σ_{L W})

correspond to the case

δ = 0

, while

{\hat{w}}_{e w}

corresponds to the case

δ = \infty

. We find that larger values of the regularization parameter imply lower transaction costs.
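
The one-sided comparison in (29) can be illustrated as follows. The sketch assumes an unpaired, Welch-type statistic with a large-sample normal approximation, which is one possible reading of the test and is not specified in the text; the function name `one_sided_pvalue` and the synthetic inputs are hypothetical.

```python
import math
import numpy as np

def one_sided_pvalue(rr_other, rr_dsr):
    """p-value for H0: mean(rr_other) < mean(rr_dsr), using a normal approximation."""
    x = np.asarray(rr_other, dtype=float)
    y = np.asarray(rr_dsr, dtype=float)
    se = math.sqrt(x.var(ddof=1) / x.size + y.var(ddof=1) / y.size)
    z = (x.mean() - y.mean()) / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # upper-tail probability P(Z >= z)

# Synthetic risk ratios over 500 iterations (illustrative values only).
rng = np.random.default_rng(0)
rr_high = rng.normal(2.7, 0.35, 500)             # competitor with a higher risk ratio
rr_low = rng.normal(2.1, 0.36, 500)              # benchmark with a lower risk ratio

p_reject = one_sided_pvalue(rr_high, rr_low)     # close to 0: H0 is rejected
p_keep = one_sided_pvalue(rr_low, rr_high)       # close to 1: H0 is not rejected
```

Under this reading, a p-value near zero means the competing estimator has a significantly higher risk ratio than the benchmark, and a p-value near one means it has a lower one.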

3.4. Robustness of Various MVP Estimators

In this section, we compare the robustness of the Sharpe ratio of our DSR-MVP with that of the other MVP estimators listed in the previous section. We use the same data generating process as in Section 3.1, but with a larger set of stocks (

N = 388

) that remained in the S&P 500 index during the period 2001–2021. All MVPs were trained using the covariance

Σ = Σ_{s} + Σ_{e}

, with the required return set to be the same as that of the equal-weight portfolio. We compute the Sharpe ratios by averaging over 500 iterations based on simulated returns from

N (μ, Σ (a))

, where

μ

is a vector of average returns for each stock and

Σ (a) = Σ_{s} + a \cdot Σ_{e}

. By tuning the value of a, we adjust the magnitude of the noise relative to the signal. The results are presented in Figure 2, which shows that all MVPs have a lower Sharpe ratio when the returns are noisier. However, our DSR-MVP attains the highest Sharpe ratio among the MVPs across all levels of noise. The shaded areas represent the corresponding confidence intervals at a significance level of

5 %

. We observe that even the lower boundary of the confidence interval for the Sharpe ratio of our DSR-MVP is higher than the average Sharpe ratio of the other MVPs.
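
The noise experiment can be sketched with a plain plug-in MVP standing in for the estimators compared in Figure 2. The closed-form weights below minimize the variance subject to the required-return and budget constraints; the toy dimensions, parameter values, zero risk-free rate, and function names are assumptions made for illustration.

```python
import numpy as np

def mvp_weights(mu, sigma, target):
    """Minimize w' Sigma w subject to w' mu = target and w' 1 = 1 (closed form)."""
    ones = np.ones(len(mu))
    A = np.column_stack([mu, ones])                   # N x 2 constraint matrix
    sinv_A = np.linalg.solve(sigma, A)                # Sigma^{-1} A
    lam = np.linalg.solve(A.T @ sinv_A, np.array([target, 1.0]))
    return sinv_A @ lam

def simulated_sharpe(w, mu, sigma_s, sigma_e, a, T, rng):
    """Sharpe ratio of w on T draws from N(mu, Sigma_s + a * Sigma_e), risk-free rate 0."""
    r = rng.multivariate_normal(mu, sigma_s + a * sigma_e, size=T) @ w
    return r.mean() / r.std(ddof=1)

rng = np.random.default_rng(1)
N, k = 6, 2
B = rng.standard_normal((N, k))
sigma_s = B @ B.T * 1e-4                              # low-rank signal covariance
sigma_e = np.diag(rng.uniform(0.5, 1.5, N)) * 1e-4    # noise covariance
mu = rng.uniform(2e-4, 8e-4, N)                       # average returns

# Train once on Sigma_s + Sigma_e with the equal-weight return as the target.
w = mvp_weights(mu, sigma_s + sigma_e, target=mu.mean())

for a in (1.0, 2.0, 4.0):                             # increasing noise levels
    sr = np.mean([simulated_sharpe(w, mu, sigma_s, sigma_e, a, 250, rng)
                  for _ in range(100)])
    print(f"a = {a}: average Sharpe ratio {sr:.3f}")
```

Replacing `mvp_weights` with any other estimator leaves the evaluation loop unchanged, which is what makes the comparison across methods like-for-like.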

4. Empirical Results

In this section, we evaluate the out-of-sample performance of our DSR-MVP estimator. We use the daily returns of the stocks that remained in the S&P 500 index from 2001 to 2021. This gives us a sample of

N = 388

stocks, with

T = 5282

periods of observations.

We compare the performance of our DSR-MVP estimator with that of other MVP estimators using a rolling out-of-sample testing approach. We train the initial portfolios using data from 2001–2002, with a training set of 24 months that is updated each month by dropping the first month and adding the next month. This allows us to recalculate the portfolio weights every month. We set the required return to be the same as that of the equal-weight portfolio. The performance of each portfolio is evaluated based on its out-of-sample annualized Sharpe ratio (S.R.) and risk. We use a testing set of one month of data, with the out-of-sample period spanning 2003 to 2021, a total of 227 months.
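
The rolling scheme described above can be sketched as a generator of training and testing windows. Months are represented by integer indices, and the annualization factor of 252 trading days is an assumed convention rather than one stated in the text; both helper names are hypothetical.

```python
import math
import statistics

def rolling_windows(n_months, train=24, test=1):
    """Yield (train_months, test_months) index ranges, rolled forward by `test` months."""
    start = 0
    while start + train + test <= n_months:
        yield (range(start, start + train),
               range(start + train, start + train + test))
        start += test

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean over sample standard deviation of daily returns, scaled to a yearly horizon."""
    mean = statistics.mean(daily_returns)
    sd = statistics.stdev(daily_returns)
    return mean / sd * math.sqrt(periods_per_year)

# Example: 30 months of data with a 24-month training set and a 1-month testing set.
windows = list(rolling_windows(n_months=30))
first_train, first_test = windows[0]
```

In each window the weights would be estimated on the training months and held fixed over the testing month, after which the window moves forward by one month.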

To construct our DSR-MVP estimator, we first estimate the empirical distribution of

δ

using the approach introduced in Section 2.2.1. We use each training set as a block and estimate the mean and covariance using the daily returns from that block. We repeat this process for the remaining blocks, resulting in 228 pairs of

(\hat{μ}, \hat{Σ})

denoted as

{(\hat{μ}, \hat{Σ})}_{1}, {(\hat{μ}, \hat{Σ})}_{2}, \dots, {(\hat{μ}, \hat{Σ})}_{228}

. We choose

{(\hat{μ}, \hat{Σ})}_{1}

, which is estimated from stock returns during 2001–2002, as the basis and compute the PRW distance between

{(\hat{μ}, \hat{Σ})}_{1}

and

{(\hat{μ}, \hat{Σ})}_{t}

using (20) for

t = 2, \dots, 228

. This results in 227 PRW distances that measure the deviation of the distribution of a given block from the basis block. We plot the values of these PRW distances along with the S&P 500 index in Figure 3. Both curves are normalized by dividing by their first values. From Figure 3, we observe that the Gelbrich distances and the S&P 500 index tend to move in opposite directions. Although the former cannot be used to predict the rise and fall of the stock market, it describes the difference between the stock market return distribution in a specified period and that in the base period. We estimate the empirical distribution of

δ_{k}

from these PRW distances and set its median as the value of

δ_{k}

to construct our DSR-MVP estimator. Practitioners may also consider other percentiles and choose among them by cross-validation.
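
The block-wise procedure can be sketched as follows. Equation (20) is not reproduced here, so the code uses the plain (unprojected) Gelbrich distance between mean-covariance pairs as a stand-in for the PRW distance; the helper names and the toy blocks are hypothetical.

```python
import numpy as np

def psd_sqrt(m):
    """Symmetric square root of a positive semidefinite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def gelbrich_distance(mu1, sigma1, mu2, sigma2):
    """Gelbrich distance between the pairs (mu1, Sigma1) and (mu2, Sigma2)."""
    root2 = psd_sqrt(sigma2)
    cross = psd_sqrt(root2 @ sigma1 @ root2)
    d2 = np.sum((mu1 - mu2) ** 2) + np.trace(sigma1 + sigma2 - 2.0 * cross)
    return np.sqrt(max(d2, 0.0))                 # guard against tiny negative round-off

# Toy blocks (hypothetical): zero means and covariances s^2 * I for s = 1, 2, 3, 1.5.
N = 3
blocks = [(np.zeros(N), s**2 * np.eye(N)) for s in (1.0, 2.0, 3.0, 1.5)]
mu0, sigma0 = blocks[0]                          # basis block
dists = np.array([gelbrich_distance(mu0, sigma0, m, s) for m, s in blocks[1:]])

normalized = dists / dists[0]                    # scaled by the first value for plotting
delta_grid = np.percentile(dists, [25, 50, 75])  # candidate radii
delta_k = np.median(dists)                       # median used as the radius
```

The percentile grid gives the candidate values that could be compared by cross-validation, with the median as the default choice.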

Table 4 presents the annual risks and Sharpe ratios of the listed MVPs from 2003 to 2021. Both are computed by averaging their 12 monthly values for the year. We choose our

{\hat{w}}_{D S R}

as the benchmark and conduct the following two-sided t-tests:

H_{0} : R i s k (\hat{w}) = R i s k ({\hat{w}}_{D S R})

and

H_{0} : S . R . (\hat{w}) = S . R . ({\hat{w}}_{D S R})

, where

\hat{w} \in {\hat{w} (Σ_{s a m}), \hat{w} (Σ_{P O E T}), \hat{w} (Σ_{W F}), {\hat{w}}_{e w}, {\hat{w}}_{D R}}

. In panel (a), our DSR-MVP exhibits a significantly lower risk than the traditional “plug-in” MVPs and the equal-weight portfolio for most of the years, indicating a clear advantage in out-of-sample risk. In panel (b), we must acknowledge that, in terms of the Sharpe ratio, no single portfolio significantly dominates the others when using real market data without imposing any structure. However, this result can be interpreted as indicating that, with similar Sharpe ratio performance, our PRW-based DRO portfolio achieves lower risk than the classic “plug-in” MVPs.

5. Conclusions

This paper introduces a new PRW distance-based DRO approach to Markowitz portfolio selection. The main contribution is to extend the existing DRO method proposed by BCZ to high-dimensional data. The application of the regular Wasserstein distance in BCZ may lead to an overly conservative strategy when managing a large number of assets. In contrast, our DRO method, which employs the PRW distance, concentrates solely on the principal components that account for most of the variability in asset returns. By focusing on these key factors, our approach mitigates the conservatism issue that arises when handling many assets. Our simulation results demonstrate that our DSR-MVP attains the highest Sharpe ratio in comparison with both conventional “plug-in” MVPs and DRO based on the regular Wasserstein distance. This is especially evident when the signal component of the data has a low-rank structure and when the data are noisy. Our method provides practitioners with a comprehensive solution for estimating the optimal allocation policy based solely on asset return data without any exogenously given inputs. Furthermore, our empirical study indicates that the distributionally robust Wasserstein MVP attains lower out-of-sample risk than conventional “plug-in” MVPs built on different covariance matrix estimators. Future research can explore the connection between DRO problems and regularization in MVP using optimal transport discrepancies or other measures such as the phi-divergences, as discussed in [18]. Moreover, we find that consistently estimating the number of latent factors requires relatively strong factors. Extensions that allow for weak factors are possible, but formal theoretical studies on this issue are left for future work.

Author Contributions

Conceptualization, Z.Z.; methodology, Z.Z.; software, Z.Z. and H.J.; formal analysis, Z.Z.; writing—original draft preparation, Z.Z.; writing—review and editing, Z.Z., H.J. and C.K.; visualization, H.J.; supervision, C.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in Yahoo Finance.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Markowitz, H. Portfolio Selection. J. Financ. 1952, 7, 77–91.
2. DeMiguel, V.; Nogales, F.J. Portfolio Selection with Robust Estimation. Oper. Res. 2009, 57, 560–577.
3. Michaud, R.O. The Markowitz Optimization Enigma: Is ‘Optimized’ Optimal? Financ. Anal. J. 1989, 45, 31–42.
4. DeMiguel, V.; Garlappi, L.; Uppal, R. Optimal Versus Naive Diversification: How Inefficient is the 1-N Portfolio Strategy? Rev. Financ. Stud. 2009, 22, 1915–1953.
5. Merton, R.C. On estimating the expected return on the market: An exploratory investigation. J. Financ. Econ. 1980, 8, 323–361.
6. Jagannathan, R.; Ma, T. Risk Reduction in Large Portfolios: Why Imposing the Wrong Constraints Helps. J. Financ. 2003, 58, 1651–1683.
7. Fan, J.; Liao, Y.; Mincheva, M. Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2013, 75, 603–680.
8. Ding, Y.; Li, Y.; Zheng, X. High dimensional minimum variance portfolio estimation under statistical factor models. J. Econom. 2021, 222, 502–515.
9. Cai, T.T.; Hu, J.; Li, Y.; Zheng, X. High-dimensional minimum variance portfolio estimation based on high-frequency data. J. Econom. 2020, 214, 482–494.
10. Ledoit, O.; Wolf, M. Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. J. Empir. Financ. 2003, 10, 603–621.
11. Ledoit, O.; Wolf, M. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 2004, 88, 365–411.
12. Ledoit, O.; Wolf, M. Nonlinear Shrinkage of the Covariance Matrix for Portfolio Selection: Markowitz Meets Goldilocks. Rev. Financ. Stud. 2017, 30, 4349–4388.
13. Huber, P.J. Robust Estimation of a Location Parameter. Ann. Math. Stat. 1964, 35, 73–101.
14. Hampel, F. Contributions to the Theory of Robust Estimation; University Microfilms: Ann Arbor, MI, USA, 1976.
15. Perret-Gentil, C.; Victoria-Feser, M.P. Robust Mean-Variance Portfolio Selection. SSRN Electron. J. 2007.
16. Scarf, H. A Min-Max Solution of an Inventory Problem. In Studies in the Mathematical Theory of Inventory and Production; Rand Corporation: Santa Monica, CA, USA, 1958; pp. 201–209.
17. Blanchet, J.; Chen, L.; Zhou, X.Y. Distributionally Robust Mean-Variance Portfolio Selection with Wasserstein Distances. Manag. Sci. 2022, 68, 6382–6410.
18. DeMiguel, V.; Garlappi, L.; Nogales, F.J.; Uppal, R. A Generalized Approach to Portfolio Optimization: Improving Performance by Constraining Portfolio Norms. Manag. Sci. 2009, 55, 798–812.
19. Olivares-Nadal, A.V.; DeMiguel, V. Technical Note—A Robust Perspective on Transaction Costs in Portfolio Optimization. Oper. Res. 2018, 66, 733–739.
20. Esfahani, M.P.; Kuhn, D. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Program. 2018, 171, 115–166.
21. Lin, T.; Fan, C.; Ho, N.; Cuturi, M.; Jordan, M.I. Projection Robust Wasserstein Distance and Riemannian Optimization. Adv. Neural Inf. Process. Syst. 2020, 33, 9383–9397.
22. Paty, F.P.; Cuturi, M. Subspace Robust Wasserstein Distances. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 5072–5081.
23. Deshpande, I.; Hu, Y.T.; Sun, R.; Pyrros, A.; Siddiqui, N.; Koyejo, S.; Zhao, Z.; Forsyth, D.; Schwing, A. Max-Sliced Wasserstein Distance and its use for GANs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 10648–10656.
24. Nguyen, K.; Ho, N.; Pham, T.; Bui, H. Distributional Sliced-Wasserstein and Applications to Generative Modeling. arXiv 2020, arXiv:2002.07367.
25. Kuhn, D.; Esfahani, P.M.; Nguyen, V.A.; Shafieezadeh-Abadeh, S. Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning. In Operations Research & Management Science in the Age of Analytics; Informs: Catonsville, MD, USA, 2019.
26. Fournier, N.; Guillin, A. On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Relat. Fields 2015, 162, 707–738.
27. Ledoit, O.; Wolf, M. Spectrum estimation: A unified framework for covariance matrix estimation and PCA in large dimensions. J. Multivar. Anal. 2015, 139, 360–384.
28. Jolliffe, I.T. Principal Component Analysis; Springer: Berlin, Germany, 2002.
29. Velicer, W.F.; Eaton, C.A.; Fava, J.L. Construct Explication through Factor or Component Analysis: A Review and Evaluation of Alternative Procedures for Determining the Number of Factors or Components. In Problems and Solutions in Human Assessment: Honoring Douglas N. Jackson at Seventy; Springer US: Boston, MA, USA, 2000; Chapter 3; pp. 41–71.
30. Horn, J. A rationale and test for the number of factors in factor analysis. Psychometrika 1965, 30, 179–185.
31. Buja, A.; Eyuboglu, N. Remarks on Parallel Analysis. Multivar. Behav. Res. 1992, 27, 509–540.
32. Bai, J.; Ng, S. Determining the Number of Factors in Approximate Factor Models. Econometrica 2002, 70, 191–221.
33. Onatski, A. Determining the Number of Factors from Empirical Distribution of Eigenvalues. Rev. Econ. Stat. 2010, 92, 1004–1016.
34. Ahn, S.C.; Horenstein, A.R. Eigenvalue Ratio Test for the Number of Factors. Econometrica 2013, 81, 1203–1227.
35. Muirhead, R.J. Latent Roots and Matrix Variates: A Review of Some Asymptotic Results. Ann. Statist. 1978, 6, 5–33.
36. Kritchman, S.; Nadler, B. Non-Parametric Detection of the Number of Signals: Hypothesis Testing and Random Matrix Theory. IEEE Trans. Signal Process. 2009, 57, 3930–3941.
37. Choi, Y.; Taylor, J.; Tibshirani, R. Selecting the number of principal components: Estimation of the true rank of a noisy matrix. Ann. Statist. 2017, 45, 2590–2617.
38. Wang, J. Factor Analysis For High-dimensional Data. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 2016.
39. Rothman, A.J. Positive definite estimators of large covariance matrices. Biometrika 2012, 99, 733–740.

Figure 1. Various types of Wasserstein distances depending on the dimension N.

Figure 2. Sharpe ratio of various MVPs under noisy returns.

Figure 3. Gelbrich distances and S&P 500 index during 2003–2021.

Table 1. Empirical distribution of

\hat{δ}

based on various estimators of covariance.

			Percentile of $δ$					KS Test (p-Value)
$N$	$T$	Estimator	min	25%	50%	75%	max	H1	H2	H3
50	100	$\hat{δ} (Σ_{s a m})$	1.26	2.617	3.224	3.968	8.886
50	100	$\hat{δ} (Σ_{P O E T})$	1.298	2.639	3.216	3.982	8.897	1	0.718	0.770
50	100	$\hat{δ} (Σ_{L W})$	1.059	2.682	3.291	4.015	8.851
100	100	$\hat{δ} (Σ_{s a m})$	1.614	3.685	4.531	5.55	10.382
100	100	$\hat{δ} (Σ_{P O E T})$	1.648	3.654	4.549	5.549	10.460	1	0.740	0.614
100	100	$\hat{δ} (Σ_{L W})$	1.605	3.791	4.643	5.614	9.945
200	100	$\hat{δ} (Σ_{s a m})$	2.499	5.19	6.4	7.698	14.395
200	100	$\hat{δ} (Σ_{P O E T})$	2.446	5.157	6.388	7.681	14.364	1	0.399	0.356
200	100	$\hat{δ} (Σ_{L W})$	2.265	5.384	6.569	7.763	15.088

H1:

\hat{δ} (Σ_{s a m})

and

\hat{δ} (Σ_{P O E T})

follow same distribution. H2:

\hat{δ} (Σ_{s a m})

and

\hat{δ} (Σ_{L W})

follow same distribution. H3:

\hat{δ} (Σ_{P O E T})

and

\hat{δ} (Σ_{L W})

follow same distribution.

Table 2. Correct rate using various methods of choosing k.

$γ$	BN	AH	PA	$γ$	BN	PA
0.03	0.062	0.52	0.942	0.18	0.34	0.76
0.06	0.12	0.066	0.914	0.21	0.372	0.658
0.09	0.116	0.004	0.912	0.24	0.39	0.6
0.12	0.224	0	0.886	0.27	0.386	0.514
0.15	0.256	0	0.818	0.30	0.374	0.436

Table 3. Comparison of various GMVP estimators.

$N = 83$ , $T = 50$
						Choice of $δ$
	R.R.	p-value	Trans			25%	50%	75%
$\hat{w} (Σ_{S a m})$	2.703 (0.349)	[0/0/0]	0.466 (0.117)	${\hat{w}}_{D R}$	R.R.	5.689 (3.472) [0]	6.396 (3.991) [0]	7.757 (4.893) [0]
$\hat{w} (Σ_{P O E T})$	1.382 (0.066)	[1/1/1]	0.105 (0.004)		Trans	0.073 (0.01)	0.07 (0.011)	0.066 (0.012)
$\hat{w} (Σ_{L W})$	1.331 (0.065)	[1/1/1]	0.102 (0.004)	${\hat{w}}_{D S R}$	R.R.	2.067 (0.363)	2.174 (0.437)	2.429 (0.629)
$\hat{w} (Σ_{e w})$	9.926 (-)	[0/0/0]	0.012 (-)		Trans	0.1 (0.005)	0.097 (0.005)	0.093 (0.006)
$N = 83$ , $T = 100$
						Choice of $δ$
	R.R.	p-value	Trans			25%	50%	75%
$\hat{w} (Σ_{S a m})$	2.362 (0.365)	[0/0/0]	0.577 (0.18)	${\hat{w}}_{D R}$	R.R.	2.278 (0.369) [0]	2.354 (0.402) [0]	2.516 (0.473) [0]
$\hat{w} (Σ_{P O E T})$	1.313 (0.029)	[1/1/1]	0.104 (0.003)		Trans	0.09 (0.004)	0.089 (0.004)	0.087 (0.004)
$\hat{w} (Σ_{L W})$	1.237 (0.03)	[1/1/1]	0.109 (0.003)	${\hat{w}}_{D S R}$	R.R.	1.717 (0.151)	1.772 (0.171)	1.861 (0.204)
$\hat{w} (Σ_{e w})$	9.926 (-)	[0/0/0]	0.012 (-)		Trans	0.102 (0.003)	0.1 (0.003)	0.097 (0.003)
$N = 83$ , $T = 100$
						Choice of $δ$
	R.R.	p-value	Trans			25%	50%	75%
$\hat{w} (Σ_{S a m})$	1.3 (0.054)	[1/1/1]	0.227 (0.024)	${\hat{w}}_{D R}$	R.R.	1.699 (0.112) [0]	1.725 (0.118) [0]	1.783 (0.131) [0]
$\hat{w} (Σ_{P O E T})$	1.285 (0.017)	[1/1/1]	0.104 (0.002)		Trans	0.097 (0.002)	0.097 (0.002)	0.095 (0.002)
$\hat{w} (Σ_{L W})$	1.158 (0.021)	[1/1/1]	0.125 (0.006)	${\hat{w}}_{D S R}$	R.R.	1.519 (0.074)	1.545 (0.079)	1.597 (0.09)
$\hat{w} (Σ_{e w})$	9.926 (-)	[0/0/0]	0.012 (-)		Trans	0.103 (0.002)	0.102 (0.002)	0.1 (0.002)

p-value 0 implies that p < 0.001 and p-value 1 implies that p > 0.999.

Table 4. Performance of various MVP estimators using empirical data.

(a) Risk
Year	$\hat{w} (Σ_{sam})$	$\hat{w} (Σ_{POET})$	$\hat{w} (Σ_{WF})$	${\hat{w}}_{ew}$	${\hat{w}}_{DR}$	${\hat{w}}_{DSR}$
2003	3.23 ***	2.50 *	2.15	4.64 ***	2.14	2.10
2004	3.27 ***	3.31 ***	2.36	3.59 ***	2.46	2.35
2005	3.18 ***	3.07 ***	2.10	3.31 ***	2.4	2.12
2006	3.32 ***	2.53 ***	1.90	3.31 ***	2.01	1.91
2007	3.64 ***	3.16 *	2.31	4.51 ***	2.91	2.33
2008	6.79 *	6.53	4.49	10.81 ***	4.63	4.51
2009	6.25 ***	5.62 **	3.60	8.59 ***	3.43	3.71
2010	4.08 ***	2.67	2.56	5.23 ***	2.56	2.49
2011	3.86 **	3.11	2.66	6.49 ***	2.98	2.70
2012	3.67 ***	2.53 ***	2.05	3.79 ***	1.94	2.01
2013	3.46 ***	2.92 **	2.39	3.35 ***	2.45	2.35
2014	3.32 ***	3.39 ***	2.12	3.11 **	2.51	2.15
2015	3.97 **	4.81 ***	3.02	4.15 **	3.63	3.09
2016	4.73 ***	5.6 ***	2.73	3.84	2.79	2.56
2017	4.09 ***	2.78 ***	2.12 **	1.99	1.64	1.81
2018	4.58 ***	4.13 **	3.06	4.01 *	3.31	3.00
2019	4.42 ***	3.74 ***	2.54	3.25 **	2.36	2.32
2020	8.56 *	8.86 *	5.55	8.53	5.73	5.59
2021	5.64 ***	4.94 ***	3.04	3.83 ***	2.7	2.98
(b) Sharpe Ratio
Year	$\hat{w} (Σ_{sam})$	$\hat{w} (Σ_{POET})$	$\hat{w} (Σ_{WF})$	${\hat{w}}_{ew}$	${\hat{w}}_{DR}$	${\hat{w}}_{DSR}$
2003	0.54 *	0.75	1.15	0.95	1.24	1.21
2004	0.88	0.88	1.20	0.72	1.05	1.01
2005	0.17	0.32	0.29	0.53	0.48	0.35
2006	0.16	0.68	0.76	0.70	0.82	0.78
2007	0.20	0.31	0.35	0.39	0.47	0.44
2008	−0.05	−0.21	−0.11	−0.01	−0.13	−0.12
2009	−0.26	0.13	0.23	0.65	0.23	0.09
2010	0.48	0.48	0.49	0.76	0.57	0.50
2011	0.55	0.86	0.46	0.26	0.53	0.52
2012	0.24	0.12	0.40	0.61	0.36	0.41
2013	0.87	0.65	0.67	1.01	0.89	0.72
2014	0.49	0.64	0.83	0.72	0.82	0.87
2015	0.32	0.44	0.5	0.29	0.35	0.49
2016	0.00	0.08	0.48	0.60	0.45	0.57
2017	0.06	0.39	0.49	1.14	0.94	0.63
2018	0.09	0.47	0.33	0.34	0.51	0.46
2019	0.52	0.55	1.10	1.07	1.09	1.05
2020	0.22	−0.08	0.10	0.52	0.29	0.12
2021	0.10 *	0.29	0.73	0.74	0.72	0.76

Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
