k-Covariance: An Approach of Ensemble Covariance Estimation and Undersampling to Stabilize the Covariance Matrix in the Global Minimum Variance Portfolio

Tran, Tuan; Nguyen, Nhat; Nguyen, Trung

doi:10.3390/app12136403


by Tuan Tran ^1,2,*, Nhat Nguyen ^1 and Trung Nguyen ^1

^1 Faculty of Banking, Banking University of HCMC, Ho Chi Minh City 700000, Vietnam
^2 École Pratique des Hautes Études - PSL, 75014 Paris, France
^* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(13), 6403; https://doi.org/10.3390/app12136403

Submission received: 15 May 2022 / Revised: 15 June 2022 / Accepted: 16 June 2022 / Published: 23 June 2022


Abstract:

A covariance matrix is a key input in many computational applications, such as quantitative trading. The global minimum variance portfolio has received great attention for its performance since the 2007–2008 financial crisis, and it uses only a covariance matrix to calculate asset weights. However, the weight calculation is sensitive to outliers in the estimated covariance matrix, for example in the sample covariance matrix or in linear shrinkage covariance estimations. In this paper, we propose the use of an undersampling technique and ensemble learning to stabilize the covariance matrix by reducing the impact of outliers on the output of a covariance estimation. Experiments on an emerging stock market with three performance metrics show that our approach significantly improves the sample covariance matrix, and also the linear shrinkage to the single-index model, to the level of two other shrinkage estimations: the shrinkage to the identity matrix and the shrinkage to the constant correlation model.

Keywords:

ensemble learning; undersampling; covariance matrix; shrinkage estimation; minimum variance portfolio

1. Introduction

Beyond the linear shrinkage of a sample covariance matrix towards another well-structured matrix, Ledoit and Wolf [1,2] proposed a nonlinear shrinkage method that takes into account the population eigenvalues, which are estimated from the eigenvalue distribution of the sample dataset. The idea originates from random matrix theory, which deals effectively with noise in large-dimensional covariance matrices; however, it requires complex mathematics and is difficult for practitioners to understand. A simpler nonlinear method splits the sample dataset and directly uses the eigenvalues from the validation subset as the population eigenvalues [3]. Although this method is less accurate, it is simpler and easier to understand than the previous nonlinear approach. Another extension of linear shrinkage uses multiple target matrices: because the performance of the original linear shrinkage estimators depends on the chosen target matrix, shrinking towards several different targets avoids this dependency and makes the estimator more flexible [4].

Rather than adding complexity, a simpler approach, the portfolio of covariance estimators [5], calls the benefits of the shrinkage estimators into question. The portfolio of covariance estimators contains the sample covariance matrix and other estimators with different assumptions and different directions of error. It then simply takes their average, so that the estimation error of the sample covariance matrix is offset by the specification errors of the others. Disatnik and Benninga [6] compare those simple portfolios with the linear shrinkage portfolios; the combinations in their study could be equally weighted or even randomly weighted. They show that the out-of-sample performances of the simple portfolios and the complex shrinkage portfolios fall within the same range, at least in terms of the standard deviation of the portfolio returns. Therefore, they conclude that there is no statistically significant improvement, and hence no benefit, from using those complicated shrinkage estimators.

In practice, fund managers are not only looking for the best covariance estimator but also seeking alternatives in order to expand their fund’s capacity. Those alternatives differ from the best estimator but deliver comparable portfolio performance. The portfolio of estimators is therefore a simple but important method for investment management, and it also raises a question about the effectiveness of the shrinkage estimation. However, the sustainability of those covariance matrix estimations has not been investigated sufficiently. An advanced estimation with the highest performance but unsustainable outcomes is not preferable. For example, if a portfolio changes its positions unreasonably just because of some outliers in the data, it creates other problems for the fund manager, such as transaction costs or liquidity risk. From a risk perspective, that is not the best portfolio, and the alternatives are more attractive to fund managers if they are more sustainable.

In this study, we examine the stability of multiple covariance matrices to understand, from a data perspective, how they work in portfolio optimization, which helps fund managers to develop sustainable investments. Firstly, does the strength of the shrinkage estimation come from the combination with other matrices or from the calculation of an optimal shrinkage intensity? Secondly, could we combine the sample covariance matrix with other sample covariance matrices instead of with different matrices? The first question is important in theory because understanding the characteristics of not only the sample covariance matrix but also the shrinkage estimation is a principal research topic in various domains. The second question is important in financial practice because practitioners prefer a simple, effective, understandable and explainable method to a new, complicated and unproven one.

One problem with an estimated covariance matrix is that it fluctuates strongly and is sensitive to outliers [7,8,9,10]. In many applications in statistics and machine learning, ensemble learning and undersampling are efficient methods to reduce not only the variance of predictions but also the impact of outliers. Therefore, we propose the use of ensemble learning with an undersampling technique on a given covariance estimator to reduce the impact of outliers and stabilize the estimated covariance matrix. We test our approach with the sample covariance matrix and three linear shrinkage estimations on the Vietnam stock market from 2013 to the end of 2019. In backtests on the historical data, the out-of-sample portfolio performances show that applying our approach to the sample covariance matrix achieves results comparable to those of the shrinkage portfolios, and even better results in some cases. Compared with the plain sample covariance matrix, our portfolio performs significantly better on all performance metrics.

The rest of this paper is organized as follows. Section 2 describes the background and its formulations. Section 3 presents our methodology. Section 4 summarizes our dataset and evaluation metrics. Section 5 presents and discusses our experimental results. Section 6 concludes the paper.

2. Background

2.1. Linear Shrinkage Estimation

Since the seminal portfolio selection theory of Markowitz [11], the computation of efficient portfolios, which aims to estimate optimal weights for assets, has been an area of interest in the finance profession. It uses the mean and the covariance of stock returns as input parameters; however, the estimation error of the expected returns is larger than that of the covariance [12]. Recent studies suggest assuming equal expected returns for all stocks and using only the covariance matrix to find the portfolio with the lowest risk: in other words, the global minimum variance portfolio (GMVP) [13]. The standard statistical estimator of the covariance matrix in the GMVP is the sample covariance matrix. While the mathematics of the GMVP and the sample covariance matrix is straightforward, several studies in theory and industrial practice have pointed out the disadvantages of the sample covariance matrix in portfolio trading.

When the number of assets in the universe is greater than the number of historical returns, formally when the number of dimensions is greater than the number of observations, the sample covariance matrix estimated from those data is not invertible and is ill-conditioned. Portfolio optimization in this context suffers from high estimation error, so the portfolio adjusts its positions more frequently, which increases its risk level and decreases its profit. One statistical approach to making the covariance matrix invertible is to impose structure on the matrix, for example with a shrinkage technique that takes a weighted combination of two matrices: the sample covariance matrix and another invertible matrix [14]. The weight, also known as the shrinkage intensity, is optimized with a quadratic loss function on the given dataset without looking at out-of-sample data. In this case, there is a unique “optimal” shrinkage intensity for one dataset. On the other hand, some recent studies [15,16] split the dataset and use validation subsets to choose the “optimal” shrinkage intensity by cross-validation instead of analytical optimization, which results in different “optimal” shrinkage intensities for different pre-defined objectives.

Given a dataset of N random variables and T observations representing T returns of N assets on a universe, the GMVP is formulated as follows:

\begin{matrix} \min_{w} & w^{⊺} Σ w \\ \text{s.t.} & w^{⊺} 1 = 1 \end{matrix}

(1)

where $Σ \in \mathbb{R}^{N \times N}$ is the covariance matrix, $w = (w_{1}, \dots, w_{N})^{⊺}$ is the weight vector of the N assets and $1$ denotes the N-vector of ones. The well-known solution of Equation (1) is:

w = \frac{Σ^{- 1} 1}{1^{⊺} Σ^{- 1} 1} .

(2)
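As an illustration of Equation (2), the following is a minimal NumPy sketch rather than the implementation used in this study. The function name `gmvp_weights` and the toy covariance matrix are hypothetical, and the closed form shown here does not enforce a long-only constraint.

```python
import numpy as np

def gmvp_weights(cov):
    """Closed-form GMVP weights of Equation (2) for an invertible covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    # Solve cov @ x = 1 rather than forming the explicit inverse.
    x = np.linalg.solve(cov, ones)
    # Normalize so that the weights sum to one.
    return x / x.sum()

# Toy 3-asset covariance matrix (made-up numbers, not taken from the paper).
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
w = gmvp_weights(cov)
equal = np.ones(3) / 3
print("weights:", w)
print("GMVP variance:", w @ cov @ w, "equal-weight variance:", equal @ cov @ equal)
```

Solving the linear system instead of inverting the matrix is a numerical design choice; it fails in the same way as the inverse when the covariance matrix is singular, which is the situation discussed next.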

The most common estimator, the sample covariance matrix, is not always invertible and can be ill-conditioned. Particularly in the financial industry, when the number of assets N is much greater than the number of observations T ($N \gg T$), the inverse of the sample covariance matrix amplifies estimation error.

For high-dimensional covariance matrices, refs. [17,18,19] proposed imposing structure on the covariance matrix by using the shrinkage technique. They linearly combined the sample covariance matrix with another well-conditioned, structured matrix. Given a target matrix $F$, the linear shrinkage estimator is as follows:

\hat{Σ} = δ F + (1 - δ) S

(3)

in which $S$ is the sample covariance matrix and $δ \in (0, 1)$ is the shrinkage intensity. The optimal $δ$ is obtained by minimizing the quadratic loss of $\| F - S \|$. The target matrix $F$ is a domain-based matrix. In finance in particular, Ledoit and Wolf proposed three different targets and showed that they are empirically significantly better than the sample covariance matrix: shrinkage towards the identity matrix [17], shrinkage towards the single-index model (also known as shrinkage to market) [18] and shrinkage towards the constant correlation model [19].

2.2. Ensemble Learning and Undersampling

Besides the curse of dimensionality, the covariance matrix is also sensitive to outliers [7,8,9,10], especially a high-dimensional matrix estimated from limited input data. This leads to an unstable covariance matrix. Outliers also affect several problems in machine learning, and various approaches have been proposed to handle this issue. For example, Breiman [20] proposes the use of undersampling and ensemble learning to build one robust estimator from several weak estimators, which are low-bias but high-variance functions. Undersampling is used to build a large number of de-correlated estimators, which are then combined to reduce the variance of the final prediction. The combination rule could be a simple average for regression problems or majority voting for classification problems. This approach is not only simple to use but also efficient in many problems. Its advantages are complementary: (i) it combines the strengths of the individual estimators and (ii) an outlier in the input data, or even in the prediction of a single estimator, has little effect on the final estimator.

Since outliers make up only a small fraction of a large covariance matrix, outlier detection in this context could face another problem: the imbalance between outliers and non-outliers. Tran et al. [21] show that the undersampling technique combined with ensemble learning is efficient not only on massive datasets but also in extremely imbalanced cases. They propose splitting the original data into k segments with the same imbalance ratio, building k estimators and finally combining them into a final predictor. Those segments are different but have similar data semantics, a characteristic shared by the stock data used for covariance estimation.

3. Methodology

The goal of the GMVP is to minimize the variance of the portfolio, which requires a covariance matrix that is stable over time. Typically, the dataset for covariance estimation is weekly and is extracted from the daily dataset simply by taking the last trading day of each week (many studies use monthly data, but younger markets do not have enough monthly data for estimation). However, abnormal data exist on those last days, which leads to outliers in the covariance matrix. To reduce the impact of these outliers, we propose applying undersampling and then ensemble learning to the covariance estimation at the data level. Our sampling procedure is as follows: the daily historical return dataset is randomly sampled with replacement into k weekly subsets of the same size: in other words, we pick a random day in each week instead of the last day. Given a covariance estimator, each subset is used to estimate one covariance matrix, which serves as a covariance predictor.

The sampling process above aims to take a nearby data point in each week that carries similar information to the last day. The covariance predictors built on those subsets have similar semantics but contain different outliers. To combine the covariance predictors, we apply a simple average so that the outliers in one predictor are neutralized by the non-outliers in the other predictors. This ensemble also reduces the changes in the covariance matrix over time: in other words, the covariance matrix is more stable.
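The sampling and averaging steps described above can be sketched as follows. This is a hedged illustration, not the authors' code: the names `k_covariance`, `sample_cov` and `week_ids` are hypothetical, the data are synthetic, and how the weekly observations are derived from the picked days (for example, whether returns are re-computed between picked days) is left to the caller as an assumption.

```python
import numpy as np

def sample_cov(X):
    """Sample covariance of observations stored as rows."""
    return np.cov(X, rowvar=False)

def k_covariance(daily, week_ids, k, estimator=sample_cov, seed=None):
    """Average k covariance predictors, each built from one random day per week."""
    daily = np.asarray(daily, dtype=float)
    week_ids = np.asarray(week_ids)
    rng = np.random.default_rng(seed)
    # Row indices of the trading days that belong to each week.
    groups = [np.flatnonzero(week_ids == w) for w in np.unique(week_ids)]
    total = np.zeros((daily.shape[1], daily.shape[1]))
    for _ in range(k):
        # One randomly chosen trading day per week gives one weekly subset.
        picks = np.array([rng.choice(g) for g in groups])
        total += estimator(daily[picks])
    # Simple average of the k covariance predictors.
    return total / k

# Synthetic example: 40 weeks of 5 trading days for 4 assets.
rng = np.random.default_rng(1)
daily = rng.normal(scale=0.01, size=(200, 4))
week_ids = np.repeat(np.arange(40), 5)
cov_k = k_covariance(daily, week_ids, k=10, seed=0)
print(cov_k.shape)
```

Passing a shrinkage estimator as `estimator` would give the analogue of applying the same procedure to a linear shrinkage estimation instead of the sample covariance matrix.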

To illustrate the effectiveness of our method, we conduct an empirical experiment that compares it with the linear shrinkage covariance estimators and with the portfolios of covariance estimators. While the combinations in the portfolios of covariance estimators can be seen as a method-level ensemble, our approach is essentially a data-level ensemble. In total, there are eleven portfolios in our experiment with different covariance estimators:

the sample covariance matrix (denoted as sample);
the shrinkage to the identity matrix (denoted as STIM): the linear shrinkage estimator that combines the sample covariance matrix and the identity matrix with an optimal shrinkage intensity [17];
the shrinkage to the single-index model (denoted as SSIM): the linear shrinkage estimator that uses the single-index model of [22] as the target matrix and combines it with the sample covariance matrix with an optimal shrinkage intensity [18];
the shrinkage to the constant correlation matrix (denoted as SCCM): the linear shrinkage estimator whose target matrix assumes that all pairs of stocks have the same correlation, giving a constant-correlation covariance matrix that is combined with the sample covariance matrix with an optimal shrinkage intensity [19];
the combination of the sample matrix, the diagonal matrix and the single-index model (denoted as sTS): a portfolio whose covariance matrix is the equally weighted average, instead of an optimally weighted combination, of the sample matrix, the diagonal matrix and the single-index model [5];
the combination of the sample matrix, the single-index model and the constant correlation matrix (denoted as sSC): a portfolio whose covariance matrix is the equally weighted average, instead of an optimally weighted combination, of the sample matrix, the single-index model and the constant correlation matrix [6];
the combination of the sample matrix, the diagonal matrix and the constant correlation matrix (denoted as sTC): a portfolio whose covariance matrix is the equally weighted average, instead of an optimally weighted combination, of the sample matrix, the diagonal matrix and the constant correlation matrix;
the k-covariance portfolios (denoted as k-sample, k-STIM, k-SSIM and k-SCCM): our method applied to the sample covariance estimator, the linear shrinkage to the identity matrix, the linear shrinkage to the single-index model and the linear shrinkage to the constant correlation matrix, respectively, to build our GMVPs.
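The equally weighted combinations in the list above can be sketched for the sTC case as follows. This is an illustration under assumptions: the diagonal matrix is taken to be the diagonal of the sample covariance matrix, the constant correlation matrix uses the average of the off-diagonal sample correlations, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def constant_correlation_target(S):
    """Covariance matrix with the sample variances and one common correlation."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    std = np.sqrt(np.diag(S))
    corr = S / np.outer(std, std)
    # Average of the off-diagonal sample correlations (the diagonal sums to n).
    r_bar = (corr.sum() - n) / (n * (n - 1))
    F = r_bar * np.outer(std, std)
    # Keep the sample variances on the diagonal.
    np.fill_diagonal(F, np.diag(S))
    return F

def equal_weight_combination(matrices):
    """Simple average of covariance matrices, without optimal weighting."""
    return sum(matrices) / len(matrices)

rng = np.random.default_rng(2)
X = rng.normal(scale=0.02, size=(104, 4))   # synthetic weekly returns, 4 assets
S = np.cov(X, rowvar=False)
D = np.diag(np.diag(S))                     # diagonal matrix of sample variances (assumption)
F = constant_correlation_target(S)
sTC = equal_weight_combination([S, D, F])
print(sTC)
```

All three components share the same diagonal, so the average leaves the variances unchanged and only pulls the covariances towards zero and towards the common-correlation value.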

The backtesting process for the portfolios above is as follows: given one of the covariance estimators, a GMVP is built on the latest data at the end of each week, and its orders are executed on the first trading day of the next week and matched at the closing prices of that day. We hold the portfolio until the end of that week, then re-compute and rebalance it: in other words, the out-of-sample period is one week. The training data for the covariance estimator are two years of weekly data: in other words, the in-sample period is two years. To ensure data quality, at each date we use only the assets for which at least half of the data are non-null: in other words, assets that are at least one year past their IPOs.

In our backtesting process, to evaluate a portfolio precisely, we follow the real regulations and restrictions of the market. Firstly, instead of assuming that commissions have no impact, we apply a real commission fee to every transaction. Secondly, we use the long-term interest rate of government bonds in the portfolio performance calculations. Thirdly, we limit the maximum number of shares that can be traded at each date; to simplify the process, we assume that we can long/short up to the real traded volume in the dataset and that there is no slippage in our transactions. Last but not least, to avoid lookahead bias, the latest data we use at the end of each date are the historical data up to that date, and the orders are then filled at the next date’s prices.

4. Dataset and Evaluation Metrics

4.1. Dataset

In this study, we test our approach and the other portfolios on the Vietnam stock market, one of the emerging markets in Asia. In particular, we use the historical trading data of all stocks traded on the Ho Chi Minh City Stock Exchange (HOSE), which are available on the exchange’s website. The period of our experiment covers seven years, from 2013 to 2019, with 1744 trading days in total. Because the trading regulations do not allow short selling, we focus only on long-only portfolios. A statistical summary of the data is given in Table 1; the unit for average volume is $10^{5}$ shares and the unit for the other quantities is a basis point.

4.2. Evaluation Metrics

To evaluate the portfolios, we use three performance metrics that are commonly used in finance: the annual volatility, the Sharpe ratio and the portfolio turnover. The annual volatility ($σ$) is the standard deviation of the portfolio returns, which indicates the risk of the portfolio. In the context of the GMVP, a lower volatility is better. The Sharpe ratio was developed by Sharpe [23] to describe the profit of an investment relative to its risk. It is defined as follows:

\text{Sharpe ratio} = \frac{R - R_{f}}{σ},

(4)

where $R$ is the return of the portfolio and $R_{f}$ is the risk-free rate. For $R_{f}$, we use the yield of Vietnam’s 10-year government bond over the same period. A portfolio with a higher Sharpe ratio is a better portfolio.
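The two metrics above can be computed as in the following sketch. The annualization convention is an assumption that is not stated in the text: weekly returns, 52 periods per year, an arithmetic annual return and a sample standard deviation. The function names and the return series are hypothetical.

```python
import numpy as np

def annual_volatility(returns, periods_per_year=52):
    """Standard deviation of periodic returns, scaled to one year."""
    return np.std(returns, ddof=1) * np.sqrt(periods_per_year)

def sharpe_ratio(returns, annual_rf, periods_per_year=52):
    """Equation (4) with an arithmetic annualized return R and annual risk-free rate R_f."""
    annual_return = np.mean(returns) * periods_per_year
    return (annual_return - annual_rf) / annual_volatility(returns, periods_per_year)

# Made-up weekly portfolio returns and a made-up annual risk-free rate.
weekly_returns = np.array([0.010, -0.020, 0.015, 0.005])
vol = annual_volatility(weekly_returns)
sr = sharpe_ratio(weekly_returns, annual_rf=0.05)
print("annual volatility:", vol, "Sharpe ratio:", sr)
```

Other conventions, such as compounding the returns or using a different number of periods per year, change the numerical values but not the ranking logic of lower volatility and higher Sharpe ratio.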

The portfolio turnover measures the stability of the portfolio weights over the investment horizon [24]. In other words, it measures the fluctuation of the portfolio weights. In the scope of the GMVP, we prefer a portfolio with a lower portfolio turnover. The formula of the portfolio turnover is as follows:

\text{Portfolio Turnover} = \frac{1}{T - 1} \sum_{t = 1}^{T - 1} \sum_{i = 1}^{N} | w_{t + 1, i} - w_{t, i} |

(5)

where $w_{t, i}$ is the weight of asset i at date t. For long-only weights that sum to one, the maximum value of the portfolio turnover in Equation (5) is two, that is, a $200\%$ change from one set of portfolio weights to a completely different one, and the minimum value is $0\%$.
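Equation (5) translates directly into a few lines of NumPy. The sketch below assumes that the weights are stored with one row per rebalancing date; the function name `portfolio_turnover` and the example weights are hypothetical.

```python
import numpy as np

def portfolio_turnover(weights):
    """Equation (5): mean over dates of the summed absolute weight changes."""
    W = np.asarray(weights, dtype=float)   # shape (T, N), one row per rebalancing date
    if W.shape[0] < 2:
        raise ValueError("at least two rebalancing dates are required")
    # |w_{t+1,i} - w_{t,i}| summed over assets, then averaged over the T - 1 changes.
    return np.abs(np.diff(W, axis=0)).sum(axis=1).mean()

# Three rebalancing dates for three assets (made-up long-only weights).
W = np.array([[0.50, 0.50, 0.00],
              [0.50, 0.25, 0.25],
              [0.00, 0.00, 1.00]])
print("turnover:", portfolio_turnover(W))
```

A portfolio that moves entirely from one asset to another between two dates reaches the upper bound of two, while unchanged weights give zero.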

Only out-of-sample results are reported for those evaluation metrics. We also report the p-value for each pair of covariance estimators for all portfolio metrics in Appendix A. For the Sharpe ratio in particular, we use the bootstrapping method suggested by Ledoit and Wolf [25] for robust performance hypothesis testing.

5. Results and Discussions

In Table 2, we report the out-of-sample performances of the eleven portfolios on the three evaluation metrics. We also report the hypothesis testing results for each pair of portfolios in Table 2: Table A1 contains the p-values for the annual volatility, Table A2 for the Sharpe ratio and Table A3 for the portfolio turnover. Similarly, in Table 3, we report the out-of-sample performances of the eleven portfolios restricted to the one hundred largest assets by market capitalization ($N = 100$). Table A4, Table A5 and Table A6 contain the p-values for the annual volatility, the Sharpe ratio and the portfolio turnover, respectively, for these portfolios.

A first noteworthy observation is that the most basic method in our experiment, the sample portfolio, is significantly worse than all other portfolios on all three metrics, which shows the instability of the sample covariance matrix. The three more complicated portfolios, which use the linear shrinkage estimators, show no significant difference in either the volatility or the Sharpe ratio. However, the portfolio turnover of the SSIM portfolio is significantly higher than that of the STIM and SCCM portfolios, whose turnovers do not differ significantly from each other.

The results of our experiment show that applying undersampling and ensemble learning to the sample covariance matrix estimation significantly improves the portfolio performance in terms of the annual volatility and the Sharpe ratio. These improvements bring it to approximately the same level as the three portfolios of optimal shrinkage covariance estimations. On the portfolio turnover, however, it is better than the sample portfolio but still higher than the linear shrinkage portfolios.

The shrinkage portfolios have similar performances, except that the portfolio turnover of the SSIM portfolio is worse than that of the STIM and SCCM portfolios. Applying our approach to the shrinkage to the single-index model, which gives the k-SSIM portfolio, significantly improves the portfolio turnover of the SSIM portfolio to the same level as the STIM and SCCM portfolios. This suggests that the covariance matrix estimated by SSIM is less stable than those estimated by STIM and SCCM.

Furthermore, the combinations of more than two different matrices have the lowest portfolio turnover in our experiment and, in many cases, they are better than the other portfolios. This suggests that adding more well-structured, domain-based matrices could stabilize the portfolio. The exceptional estimator in this study, the sTS portfolio, is the combination of the sample matrix, the diagonal matrix and the single-index model, and it has the best results in most cases. Adding the diagonal matrix in this context could be seen as combining the sample portfolio with the equally weighted portfolio, which helps to increase the portfolio diversification.

Comparing the three portfolios of optimal shrinkage estimations with the three portfolios of estimators shows that they have comparable results. This suggests that the strength of the shrinkage technique mostly comes from the combination with other matrices, and that the “optimal” shrinkage intensity calculation is an additional step that fine-tunes the result. Moreover, we conclude that the combinations do not have to use different matrices: combining multiple sample covariance matrices by ensemble learning and undersampling produces performances similar to those of the advanced, optimal shrinkage estimations.

6. Conclusions

In this paper, we have presented the k-covariance estimation, which uses undersampling and ensemble learning to improve the stability of the covariance matrix in quantitative trading, particularly for the global minimum variance portfolio. Our approach reduces the impact of outliers on the covariance estimation by intervening at the data level. All elements in our combinations are homogeneous covariance matrices, for example multiple sample covariance matrices, estimated from different samples. The experimental results show that the k-covariance approach significantly improves the sample covariance matrix estimation to the performance level of the linear shrinkage estimations. Although the linear shrinkage estimations are robust, applying our approach still brings improvements in some cases, such as for the shrinkage to the single-index model. In practice, our results suggest that fund managers should focus on two shrinkage estimations, the shrinkage to the identity matrix and the shrinkage to the constant correlation matrix, which are more stable. The performances of the portfolios of estimators are also worth analyzing by the research community.

The scientific novelty of this research is the use of machine learning techniques to build and combine multiple weak covariance matrix estimations into a sustainable covariance matrix estimation. We conclude that combining multiple sample covariance matrices with our approach, and similarly applying it to the weak shrinkage estimation, reaches the same level as the highest portfolio performances of the shrinkage estimations. The improvement of the shrinkage estimations mostly comes from the combination with other matrices, whereas our approach combines multiple instances of only one kind of covariance matrix estimation, which is weak and noisy.

Furthermore, the portfolios of covariance estimations that combine more than two matrices, including the sample covariance matrix, have exceptional results, and we propose investigating these portfolios in future work. There are two limitations in our study: the first is that the data, from 2013 to the end of 2019, are dated, and the second is that our analysis was performed on a single market. Because of the COVID-19 crisis in early 2020, financial markets around the world behaved differently in that period. Therefore, we focus on the normal scenario of the markets, and the crisis scenario is worth studying in a separate paper. Another research direction is to test this approach on other markets to verify that our conclusions hold more generally.

Author Contributions

Conceptualization, T.T. and N.N.; methodology, T.T. and N.N.; formal analysis, T.T.; investigation, T.T.; data curation, N.N. and T.N.; writing—original draft preparation, T.T.; writing—review and editing, T.T., N.N. and T.N.; visualization, T.T. and N.N.; supervision, T.N.; project administration, T.N.; funding acquisition, T.T. and T.N. All authors have read and agreed to the published version of the manuscript.

Funding

The study was supported by The Youth Incubator for Science and Technology Programe, managed by Youth Development Science and Technology Center—Ho Chi Minh Communist Youth Union and Department of Science and Technology of Ho Chi Minh City, the contract number is “16/2021/HĐ-KHCNT-VƯ”.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Hypothesis Testing

Table A1. p-Value of the annual volatility for each pair of covariance estimators in Table 2.

Estimator	-	1	2	3	4	5	6	7	8	9	10
sample	1
STIM	2	<0.001
SSIM	3	<0.001	0.76
SCCM	4	<0.001	0.99	0.97
k-sample	5	<0.001	0.99	0.99	0.69
k-STIM	6	<0.001	0.26	0.09	<0.001	<0.001
k-SSIM	7	<0.001	0.35	0.13	0.001	<0.001	0.60
k-SCCM	8	<0.001	0.94	0.81	0.15	0.06	0.99	0.98
sTS	9	<0.001	0.003	<0.001	<0.001	<0.001	0.02	0.01	<0.001
sTC	10	<0.001	0.90	0.71	0.09	0.03	0.97	0.95	0.38	0.99
sSC	11	<0.001	0.90	0.72	0.09	0.03	0.97	0.95	0.38	0.99	0.51

Table A2. p-Value of the Sharpe ratio for each pair of covariance estimators in Table 2.

Estimator	-	1	2	3	4	5	6	7	8	9	10
sample	1
STIM	2	0.002
SSIM	3	<0.001	0.56
SCCM	4	0.02	0.97	0.74
k-sample	5	0.03	0.36	0.16	0.47
k-STIM	6	<0.001	0.27	0.74	0.65	0.06
k-SSIM	7	<0.001	0.33	0.50	0.48	0.004	0.75
k-SCCM	8	0.03	0.92	0.63	0.77	0.49	0.52	0.30
sTS	9	<0.001	0.007	0.03	0.04	0.02	0.16	0.26	0.04
sTC	10	0.01	0.71	0.93	0.27	0.32	0.89	0.75	0.38	0.06
sSC	11	0.01	0.71	0.93	0.28	0.32	0.89	0.75	0.38	0.06	0.44

Table A3. p-Value of the portfolio turnover for each pair of covariance estimators in Table 2.

Estimator	-	1	2	3	4	5	6	7	8	9	10
sample	1
STIM	2	<0.001
SSIM	3	<0.001	<0.001
SCCM	4	<0.001	0.79	<0.001
k-sample	5	<0.001	<0.001	<0.001	<0.001
k-STIM	6	<0.001	0.51	<0.001	0.71	<0.001
k-SSIM	7	<0.001	0.13	0.04	0.08	<0.001	0.03
k-SCCM	8	<0.001	0.27	<0.001	0.41	<0.001	0.64	0.009
sTS	9	<0.001	0.008	<0.001	0.02	<0.001	0.04	<0.001	0.12
sTC	10	<0.001	0.01	<0.001	0.02	<0.001	0.05	<0.001	0.15	0.95
sSC	11	<0.001	0.01	<0.001	0.03	<0.001	0.06	<0.001	0.15	0.93	0.98

Table A4. p-Value of the annual volatility for each pair of covariance estimators in Table 3.

Estimator	-	1	2	3	4	5	6	7	8	9	10
sample	1
STIM	2	0.02
SSIM	3	0.12	0.81
SCCM	4	0.04	0.64	0.29
k-sample	5	0.07	0.72	0.37	0.58
k-STIM	6	0.002	0.21	0.04	0.11	0.08
k-SSIM	7	0.01	0.44	0.14	0.30	0.23	0.74
k-SCCM	8	0.01	0.41	0.12	0.27	0.20	0.71	0.46
sTS	9	<0.001	0.02	0.001	0.008	0.004	0.11	0.03	0.03
sTC	10	0.009	0.38	0.11	0.25	0.19	0.69	0.44	0.47	0.95
sSC	11	0.009	0.39	0.11	0.25	0.19	0.69	0.44	0.47	0.95	0.50

Table A5. p-Value of the Sharpe ratio for each pair of covariance estimators in Table 3.

Estimator	-	1	2	3	4	5	6	7	8	9	10
sample	1
STIM	2	0.08
SSIM	3	0.02	0.43
SCCM	4	0.04	0.46	0.12
k-sample	5	0.98	0.18	0.30	0.05
k-STIM	6	0.36	0.48	0.93	0.27	0.20
k-SSIM	7	0.69	0.29	0.53	0.07	0.19	0.37
k-SCCM	8	0.20	0.87	0.49	0.34	0.07	0.51	0.10
sTS	9	0.02	0.13	0.05	0.68	0.02	0.07	0.03	0.32
sTC	10	0.09	0.46	0.21	0.88	0.09	0.28	0.12	0.38	0.71
sSC	11	0.09	0.46	0.21	0.88	0.09	0.28	0.12	0.39	0.71	0.50

Table A6. p-Value of the portfolio turnover for each pair of covariance estimators in Table 3.

Estimator	-	1	2	3	4	5	6	7	8	9	10
sample	1
STIM	2	<0.001
SSIM	3	0.08	0.02
SCCM	4	<0.001	0.96	0.02
k-sample	5	0.03	0.05	0.69	0.06
k-STIM	6	<0.001	0.29	<0.001	0.27	0.002
k-SSIM	7	<0.001	0.54	0.08	0.57	0.18	0.09
k-SCCM	8	<0.001	0.28	<0.001	0.27	0.002	0.98	0.09
sTS	9	<0.001	0.003	<0.001	0.003	<0.001	0.05	<0.001	0.05
sTC	10	<0.001	0.01	<0.001	0.01	<0.001	0.17	0.002	0.18	0.58
sSC	11	<0.001	0.01	<0.001	0.01	<0.001	0.17	0.002	0.18	0.57	0.99

Table 1. Statistical summary of the historical data on the HOSE exchange from 2013 to 2019.

	2013	2014	2015	2016	2017	2018	2019
Minimum number of stocks	304	302	303	308	324	350	380
Maximum number of stocks	316	310	313	324	350	380	387
Average daily return	16	13	6.23	5.26	10.44	−1.79	2.42
Standard deviation	815.88	276.2	354.13	283.13	255.54	275.92	249.2
Average volume	4.54	9.03	8.12	8.88	11.84	11.19	9.66

Table 2. Out-of-sample performance of eleven minimum variance portfolios with different covariance matrix estimations. All available stocks on the HOSE exchange are considered.

Estimator	$σ$ (%)	Sharpe Ratio	Portfolio Turnover (%)
sample	8.30	1.39	7.28
STIM	7.24	1.94	3.49
SSIM	7.37	2.00	4.41
SCCM	7.71	1.94	3.43
k-sample	7.80	1.77	5.62
k-STIM	7.13	2.04	3.34
k-SSIM	7.17	2.07	3.86
k-SCCM	7.52	1.91	3.23
sTS	6.79	2.24	2.88
sTC	7.46	2.01	2.89
sSC	7.47	2.01	2.90
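
As a reading aid for the three columns above, the following is a minimal sketch of how a global minimum variance portfolio and its out-of-sample metrics might be computed from a covariance estimate. The closed-form weights w = Σ⁻¹1 / (1ᵀΣ⁻¹1) are standard. The function names, the 252-period annualization, the zero risk-free rate, and the simplified turnover definition (mean total absolute weight change between consecutive rebalancing dates, ignoring price drift) are assumptions made for illustration and are not necessarily the authors' exact choices.

```python
import numpy as np

def gmv_weights(cov):
    """Global minimum variance weights: inv(cov) @ 1 / (1' inv(cov) 1)."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)  # solve instead of forming the inverse
    return x / x.sum()              # normalize so the weights sum to one

def annualized_volatility(returns, periods=252):
    """Sample standard deviation of portfolio returns, annualized (assumed 252 periods)."""
    return returns.std(ddof=1) * np.sqrt(periods)

def sharpe_ratio(returns, periods=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    return returns.mean() / returns.std(ddof=1) * np.sqrt(periods)

def average_turnover(weights):
    """Mean over rebalances of sum_i |w[t+1, i] - w[t, i]| (simplified, no drift).

    weights: array of shape (T, N), one row per rebalancing date.
    """
    return np.abs(np.diff(weights, axis=0)).sum(axis=1).mean()

# Toy usage: two uncorrelated assets with variances 1 and 4.
toy_cov = np.diag([1.0, 4.0])
toy_w = gmv_weights(toy_cov)  # the lower-variance asset gets the larger weight
```

Any of the eleven estimators in the table could supply `cov`; only the covariance input changes, which is why the comparison isolates the effect of the estimator.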

Table 3. Out-of-sample performance of eleven minimum variance portfolios with different covariance matrix estimations. Only the top one hundred assets by market capitalization are considered (N = 100).

Estimator	$σ$ (%)	Sharpe Ratio	Portfolio Turnover (%)
sample	9.81	0.82	4.30
STIM	9.34	0.99	3.17
SSIM	9.54	0.92	3.78
SCCM	9.42	1.09	3.18
k-sample	9.47	0.82	3.67
k-STIM	9.16	0.93	2.91
k-SSIM	9.31	0.86	3.32
k-SCCM	9.29	1.01	2.90
sTS	8.90	1.13	2.47
sTC	9.28	1.10	2.59
sSC	9.28	1.10	2.59

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
