Article

Smoothed Quantile Regression with Factor-Augmented Regularized Variable Selection for High Correlated Data

1 College of Science, North China University of Technology, Beijing 100144, China
2 Key Laboratory of Quantitative Remote Sensing Information Technology, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
3 School of Statistics, Renmin University of China, Beijing 100872, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(16), 2935; https://doi.org/10.3390/math10162935
Submission received: 16 July 2022 / Revised: 10 August 2022 / Accepted: 11 August 2022 / Published: 15 August 2022

Abstract

This paper studies variable selection for data sets that have heavy-tailed distributions and high correlations within blocks of covariates. Motivated by econometric and financial studies, we consider quantile regression to model the heavy-tailed data. For the case where the covariates are high dimensional and highly correlated within blocks, we use a latent factor model to reduce the correlations between the covariates and the convolution-type smoothed quantile regression (conquer) to estimate the quantile regression coefficients, and we propose a consistent variable selection strategy named factor-augmented regularized variable selection for quantile regression (Farvsqr). By principal component analysis, we obtain the latent factors and idiosyncratic components and use both as predictors instead of the highly correlated covariates. Farvsqr thus transforms the problem from variable selection with highly correlated covariates into one with weakly correlated covariates for quantile regression. Variable selection consistency is obtained under mild conditions. A simulation study and a real data application demonstrate that our method outperforms the common regularized M-estimation method LASSO.

1. Introduction

With the continuous development of data collection and storage technology, data sets that exhibit high dimensionality and high correlations within blocks of variables raise new research problems in economics, finance, genomics, statistics, machine learning, and other fields, because for such data we need to perform variable selection among highly correlated variables.
Variable selection has been studied extensively, and many methods have been developed, such as the regularized M-estimation methods, which include the LASSO [1], SCAD [2], the elastic net [3], and the Dantzig selector [4]. There are many references on the theoretical properties and algorithms of regularized M-estimation, including [5,6,7,8,9,10,11,12,13,14].
Most existing variable selection methods assume that the covariates are cross-sectionally weakly correlated, or even serially independent. However, these assumptions are easily violated in data sets that present high dimensions and high correlations within blocks of covariates, such as economic and financial data sets. For example, economic studies [15,16,17] show strong correlations within blocks of covariates. To deal with this problem, Fan et al. proposed factor-adjusted variable selection for mean regression [18].
However, mean regression cannot adequately fit skewed and heavy-tailed data, and it is not robust against outliers. Koenker and Bassett [19] proposed quantile regression (QR) to model the relationship between the response y and the covariates x. Compared to mean regression, QR has two significant advantages: (i) QR can be used to model the entire conditional distribution of y given x, and thus it provides insightful information about the relationship between y and x. The conditional distribution function of Y given x is F(y|x) = P(Y ≤ y|x). For 0 < τ < 1, the τth conditional quantile of Y given x is defined as Q_{Y|x}(τ) = inf{t : F(t|x) ≥ τ}. (ii) QR is robust against outliers and can be used to model responses whose distribution is skewed or heavy-tailed without a correct specification of the error distribution. These two advantages make QR an appealing method to capture data information that is difficult for mean regression. Readers can refer to Koenker [20] and Koenker et al. [21] for a comprehensive overview of methods, theory, computation, and many extensions of QR.
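As a concrete numerical illustration of this definition (our own sketch, not part of the original paper), the following Python snippet checks that minimizing the average check loss over a constant recovers the empirical τth quantile of a heavy-tailed sample:

```python
import numpy as np

def check_loss(u, tau):
    """Koenker-Bassett check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

rng = np.random.default_rng(0)
y = rng.standard_t(df=2, size=5000)   # heavy-tailed sample
tau = 0.75

# Minimize the average check loss over a grid of candidate constants.
grid = np.linspace(y.min(), y.max(), 4001)
losses = [check_loss(y - t, tau).mean() for t in grid]
t_hat = grid[int(np.argmin(losses))]

# The minimizer should (approximately) coincide with the empirical quantile.
print(t_hat, np.quantile(y, tau))
```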
Ando and Tsay [22] proposed factor-augmented predictors for quantile regression, but their model does not contain the idiosyncratic components of the covariates, which causes a loss of information about the explanatory variables. Therefore, following Fan et al. [18], we propose the factor-augmented regularized variable selection (Farvsqr) for quantile regression to overcome the problems caused by the correlations within the covariates. As usual, let us assume that the i-th observation's covariates x_i = (x_{i1}, ..., x_{ip})^T follow an approximate factor model,
$$x_i = \Lambda f_i + \epsilon_i, \quad (1)$$
where f i is a k × 1 vector of latent factors, Λ is a p × k loading matrix, and ϵ i is a p × 1 vector of idiosyncratic components or errors which are independent of f i .
The factor model has become one of the most popular and powerful tools in multivariate statistics and deeply impacted biology [23,24,25], economics, and finance [15,16,26]. Chamberlain and Rothschild [27] first proposed using principal component analysis (PCA) to solve the approximate factor model’s latent factors and loading matrix. Subsequently, much literature explores the factor model using the PCA method [28,29,30,31,32]. In our paper, we will use the PCA to obtain the estimators of Λ , f i , and ϵ i .
The process of Farvsqr is first to estimate model (1) and obtain the independent or weakly correlated estimators of f_i and ε_i; we then replace the highly correlated covariates x_i with these estimators. The second step is to solve a common regularized loss function. In this paper, we study Farvsqr by giving the specific parameter-solving process and the theoretical properties. Moreover, both simulation and real data application studies are presented.
The main contribution of our paper is to generalize the factor-adjusted regularized variable selection of mean regression to quantile regression to accommodate skewed and heavy-tailed data. Section 2 introduces the smoothed quantile regression and the approximate factor model. Section 3 introduces the variable selection methodology of Farvsqr. Section 4 presents the general theoretical results. Section 5 provides simulation studies, and Section 6 applies our model to the Quarterly Database for Macroeconomic Research (FRED-QD).

2. Quantile Regression and Approximate Factor Model

2.1. Notations

Now, we give some notation that will be used throughout the paper. Let I_n denote the n × n identity matrix; 0_{n×m} denotes the n × m zero matrix; 0_n and 1_n denote the zero vector and the one vector in R^n, respectively. For a matrix W, let ‖W‖_max = max_{i,j} |W_{ij}| denote its max norm, while ‖W‖_F and ‖W‖_p denote its Frobenius and induced p-norms, respectively. Let λ_min(W) denote the minimum eigenvalue of W if it is symmetric. For W ∈ R^{n×m}, I ⊆ [n] and J ⊆ [m], define W_{IJ} = (W_{ij})_{i∈I, j∈J}, W_{I·} = (W_{ij})_{i∈I, j∈[m]}, and W_{·J} = (W_{ij})_{i∈[n], j∈J}. For a vector w ∈ R^p and L ⊆ [p], define w_L = (w_i)_{i∈L} to be its subvector. Let ∇ and ∇² be the gradient and Hessian operators. For f : R^p → R and I, J ⊆ [p], define ∇_I f(x) = (∇f(x))_I and ∇²_{IJ} f(x) = (∇²f(x))_{IJ}. Let N(μ, Σ) denote the normal distribution with mean μ and covariance matrix Σ.

2.2. Regularized M-Estimator for Quantile Regression

This subsection begins with high-dimensional regression problems with heavy-tailed data. Let y = (y_1, ..., y_n)^T ∈ R^n be the response vector and x_i = (x_{i1}, ..., x_{ip})^T, i = 1, ..., n, be the p-dimensional vectors of explanatory variables. Let X = (1_n, (x_1, ..., x_n)^T) ∈ R^{n×(p+1)} be the design matrix, and let X_1 = (x_1, ..., x_n)^T ∈ R^{n×p} be the matrix containing the n samples of the p-dimensional covariate vector.
In this paper, we fit the heavy-tailed data with quantile regression. Let F_{y_i|x_i} be the conditional cumulative distribution function of y_i given x_i. Under the linear quantile regression assumption, the τth conditional quantile function is defined as
$$F_{y_i|x_i}^{-1}(\tau) = \beta_0^*(\tau) + \sum_{j=1}^{p} \beta_j^*(\tau)\, x_{ij} = (1, x_i^T)\,\beta^*(\tau), \quad (2)$$
where the quantile τ ∈ (0, 1) and β*(τ) = (β_0*(τ), β_1*(τ), ..., β_p*(τ))^T is the vector of true quantile regression coefficients, which changes with the quantile τ. For notational convenience, we omit τ in what follows.
Under the linear quantile regression assumption, the common regression coefficient estimator at a given τ can be given as [19]
$$\hat\beta \in \arg\min_{\beta\in\mathbb{R}^{p+1}} R(y, X\beta) = \arg\min_{\beta\in\mathbb{R}^{p+1}} \frac{1}{n}\sum_{i=1}^{n} \rho_\tau\big(y_i - (1, x_i^T)\beta\big), \quad (3)$$
where ρ_τ(u) = u{τ − I(u < 0)} is the check function, I(u < 0) is the indicator function, and τ is the quantile. However, as we know, the check function is not differentiable, which is very different from other widely used objective functions. This non-differentiability has two obvious disadvantages: (i) theoretical analysis of the estimator is very difficult; and (ii) gradient-based optimization methods cannot be used. Therefore, He et al. [33] proposed a smoothed quantile regression for large-scale inference, denoted conquer (convolution-type smoothed quantile regression). He et al. [33] concluded that the conquer method can improve estimation accuracy and computational efficiency for fitting large-scale linear quantile regression models compared with minimizing the check function (3). So, in our paper, we use conquer to estimate the quantile regression. The estimator is given by
$$\hat\beta \in \arg\min_{\beta\in\mathbb{R}^{p+1}} R(y, X\beta) = \arg\min_{\beta\in\mathbb{R}^{p+1}} \frac{1}{n}\sum_{i=1}^{n} L_h\big(y_i - (1, x_i^T)\beta\big), \quad (4)$$
where L_h(v) = (ρ_τ * K_h)(v) = ∫ ρ_τ(w) K_h(w − v) dw, K(·) is a symmetric and non-negative kernel function that integrates to 1, and h is the bandwidth. Referring to He et al. [33], we define:
$$K_h(v) = \frac{1}{h}K(v/h), \qquad \bar K_h(v) = \bar K(v/h), \qquad \bar K(v) = \int_{-\infty}^{v} K(w)\,dw, \quad v \in \mathbb{R}.$$
The conquer loss R(y, Xβ) is twice continuously differentiable with respect to β; the gradient vector and Hessian matrix are as follows:
$$\nabla R(y, X\beta) = \frac{1}{n}\sum_{i=1}^{n}\big\{\bar K_h\big((1, x_i^T)\beta - y_i\big) - \tau\big\}(1, x_i^T)^T, \qquad \nabla^2 R(y, X\beta) = \frac{1}{n}\sum_{i=1}^{n} K_h\big((1, x_i^T)\beta - y_i\big)(1, x_i^T)^T(1, x_i^T).$$
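To make the smoothed loss concrete, here is a minimal Python sketch (our own illustration, not the authors' code) of the conquer loss and gradient with a Gaussian kernel, for which the convolution has a closed form; the bandwidth default is a simple heuristic of our own, not the choice in He et al. [33]:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def conquer_loss_grad(beta, X, y, tau, h):
    """Smoothed quantile loss and gradient with a Gaussian kernel K = phi.

    X is the n x (p+1) design matrix whose first column is the intercept.
    The convolution rho_tau * K_h has the closed form
    L_h(r) = tau*r - r*Phi(-r/h) + h*phi(r/h), with r = y - X beta.
    """
    r = y - X @ beta
    loss = np.mean(tau * r - r * norm.cdf(-r / h) + h * norm.pdf(r / h))
    grad = X.T @ (norm.cdf(-r / h) - tau) / len(y)   # matches the gradient above
    return loss, grad

def conquer_fit(X, y, tau=0.5, h=None):
    """Unpenalized conquer estimate via L-BFGS (illustration only)."""
    n, d = X.shape
    h = h if h is not None else max(0.05, ((d - 1) / n) ** 0.25)  # ad hoc default
    res = minimize(conquer_loss_grad, np.zeros(d), args=(X, y, tau, h),
                   jac=True, method="L-BFGS-B")
    return res.x

# Toy check: recover a linear conditional median under t(2) errors.
rng = np.random.default_rng(1)
n = 2000
X1 = rng.normal(size=(n, 3))
y = 1.0 + X1 @ np.array([2.0, -1.0, 0.5]) + rng.standard_t(df=2, size=n)
X = np.column_stack([np.ones(n), X1])
print(conquer_fit(X, y, tau=0.5))   # approximately (1, 2, -1, 0.5)
```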
When β = ( β 0 , β 1 , , β p ) T is a sparse vector, it is common to estimate β through the regularized M-estimator as the following:
$$\hat\beta \in \arg\min_{\beta\in\mathbb{R}^{p+1}} \big\{R(y, X\beta) + \lambda Q(\beta)\big\} = \arg\min_{\beta\in\mathbb{R}^{p+1}} \Big\{\frac{1}{n}\sum_{i=1}^{n} L_h\big(y_i - (1, x_i^T)\beta\big) + \lambda Q(\beta)\Big\}. \quad (5)$$
We expect the estimator of (5) to satisfy two properties: ‖β̂ − β*‖ → 0 in probability for some norm ‖·‖, and P(supp(β̂) = supp(β*)) → 1 as n → ∞. Zhao and Yu [9] studied the LASSO estimator for a sparse linear model and showed that there exists an irrepresentable condition that is sufficient and almost necessary for these two properties when we assume supp(β*) = [l] = L. Let (X)_L and (X)_{L^c} denote the submatrices of X consisting of the first l columns and the remaining (p + 1 − l) columns, respectively. Then, the irrepresentable condition is:
$$\big\|(X)_{L^c}^T (X)_L \big[(X)_L^T (X)_L\big]^{-1}\big\|_{\infty} \le 1 - \gamma, \quad (6)$$
where γ ∈ (0, 1). However, when the explanatory variables are strongly correlated within blocks, the irrepresentable condition is easily violated [18].
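The following small Python check (our own illustration; the design, support, and sample sizes are arbitrary choices, and the intercept column is omitted for simplicity) computes the left-hand side of (6) and typically shows it exceeding 1 for covariates driven by common factors while staying small for a weakly correlated design:

```python
import numpy as np

def irrepresentable_stat(X, L):
    """Max row-wise l1 norm of X_{L^c}^T X_L (X_L^T X_L)^{-1}, i.e. the
    matrix infinity norm appearing in the irrepresentable condition."""
    Lc = [j for j in range(X.shape[1]) if j not in L]
    XL, XLc = X[:, L], X[:, Lc]
    A = XLc.T @ XL @ np.linalg.inv(XL.T @ XL)
    return np.abs(A).sum(axis=1).max()

rng = np.random.default_rng(0)
n, p, k = 500, 50, 3
F = rng.normal(size=(n, k))                      # common factors
Lam = rng.normal(size=(p, k))                    # loadings
X_factor = F @ Lam.T + rng.normal(size=(n, p))   # strongly block-correlated
X_weak = rng.normal(size=(n, p))                 # weakly correlated

L = [0, 1, 2]                                    # assumed support
print(irrepresentable_stat(X_factor, L))         # typically close to or above 1
print(irrepresentable_stat(X_weak, L))           # typically well below 1
```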

2.3. Approximate Factor Model

When there exist strong correlations between the covariates x_i, a common way to estimate the parameters β is through a latent factor model. Many papers in the literature have studied the latent factor model in econometrics and statistics [15,16,18,30,34].
As usual, let us assume that x_i ∈ R^p, i = 1, ..., n, follows the approximate factor model (1). As we know, the x_i, i = 1, ..., n, are the only observed variables; we need to estimate Λ, f_i, and ε_i, i = 1, ..., n. Generally, it is assumed that k is independent of n [18]. Let F = (f_1, ..., f_n)^T ∈ R^{n×k} be the latent factor matrix and ε = (ε_1, ..., ε_n)^T ∈ R^{n×p} be the error matrix. Then, Equation (1) can be written in matrix form as follows:
$$X_1 = F\Lambda^T + \varepsilon. \quad (7)$$
Here, we need to note that the x_i = (x_{i1}, ..., x_{ip})^T, i = 1, ..., n, have strong correlations within blocks and do not include the intercept, so the matrix form of the latent factor model involves X_1 rather than X. We impose the following basic assumption, which identifies the latent factor model (Assumption 1 [18]).
Assumption 1.
Assume that cov(f_i) = I_k, Λ^T Λ is diagonal, and all the eigenvalues of Λ^T Λ / p are bounded away from 0 and ∞ as p → ∞.

3. Factor-Augmented Regularized Variable Selection

3.1. Methodology

Let Λ_0 = (0_k, Λ^T)^T ∈ R^{(p+1)×k} and ε_1 = (1_n, ε) ∈ R^{n×(p+1)}. With the approximate factor model (7), we have X = FΛ_0^T + ε_1, so we can obtain:
$$X\beta = F\Lambda_0^T\beta + \varepsilon_1\beta = F\alpha + \varepsilon_1\beta, \quad (8)$$
where α = Λ_0^T β ∈ R^k. So, the regularized variable selection (5) can be written as:
$$\hat\beta \in \arg\min_{\beta\in\mathbb{R}^{p+1},\ \alpha=\Lambda_0^T\beta\in\mathbb{R}^{k}} \big\{R(y, X\beta) + \lambda Q(\beta)\big\} = \arg\min_{\beta\in\mathbb{R}^{p+1},\ \alpha=\Lambda_0^T\beta\in\mathbb{R}^{k}} \Big\{\frac{1}{n}\sum_{i=1}^{n} L_h\big(y_i - (1, \epsilon_i^T)\beta - f_i^T\alpha\big) + \lambda Q(\beta)\Big\}. \quad (9)$$
We need to estimate the coefficients of x_i, i = 1, ..., n, namely β, so we treat α as a nuisance parameter. Now, let us consider a new estimator without the constraint α = Λ_0^T β:
$$\hat\beta \in \arg\min_{\beta\in\mathbb{R}^{p+1},\ \alpha\in\mathbb{R}^{k}} \big\{R(y, X\beta) + \lambda Q(\beta)\big\} = \arg\min_{\beta\in\mathbb{R}^{p+1},\ \alpha\in\mathbb{R}^{k}} \Big\{\frac{1}{n}\sum_{i=1}^{n} L_h\big(y_i - (1, \epsilon_i^T)\beta - f_i^T\alpha\big) + \lambda Q(\beta)\Big\}. \quad (10)$$
From Equation (10), we can see that the vector (ε_i^T, f_i^T)^T can be treated as the new explanatory variables. In other words, we lift the covariate space from R^{p+1} to R^{p+1+k} with the latent factor model, and the highly dependent covariates x_i are replaced by the weakly dependent (ε_i^T, f_i^T)^T.
We have the following lemma, whose proof is given in Appendix A:
Lemma 1.
Consider model (2). Let R(y, Xβ) = (1/n) Σ_{i=1}^{n} L_h(y_i − (1, x_i^T)β), η_i = K̄_h((1, x_i^T)β* − y_i) − τ, and v_i = (1, ε_i^T, f_i^T)^T ∈ R^{p+1+k}. If E(η_i v_i) = 0_{p+1+k}, then
$$\big(\beta^*, \Lambda_0^T\beta^*\big) = \arg\min_{\beta\in\mathbb{R}^{p+1},\,\alpha\in\mathbb{R}^{k}} E\big[R(y, F\alpha + \varepsilon_1\beta)\big]. \quad (11)$$
By the latent factor model, ( ε , F ) has a much weaker correlation than X 1 . So, we can calculate the estimators by the following two steps:
  • Let X_1 ∈ R^{n×p} be the design matrix with strong cross-sectional correlations. Fit the approximate factor model (7); the estimators of Λ, F, and ε are denoted by Λ̂, F̂, and ε̂. This paper uses principal component analysis (PCA) to estimate all the parameters of the latent factor model; see, e.g., Bai [28] and Fan et al. [18,30]. More specifically, the columns of F̂/√n are the eigenvectors of X_1 X_1^T corresponding to the top k eigenvalues, and Λ̂ = n^{-1} X_1^T F̂.
  • Define V̂ = (1_n, ε̂, F̂) ∈ R^{n×(p+1+k)} and θ = (β^T, α^T)^T ∈ R^{p+1+k}. Then, β̂ is obtained as the first p + 1 entries of the estimator of θ:
$$\hat\theta \in \arg\min_{\theta\in\mathbb{R}^{p+1+k}} \big\{R(y, \hat V\theta) + \lambda Q(\theta_{[p+1]})\big\} = \arg\min_{\theta\in\mathbb{R}^{p+1+k}} \Big\{\frac{1}{n}\sum_{i=1}^{n} L_h\big(y_i - \hat v_i^T\theta\big) + \lambda Q(\theta_{[p+1]})\Big\}, \quad (12)$$
    where v ^ i T is the i-th row of the matrix V ^ .
We call the above two-step method the factor-augmented regularized variable selection for quantile regression (Farvsqr). Through the latent factor model, it transforms the quantile variable selection problem with highly correlated covariates X in (5) into one with weakly correlated or uncorrelated covariates in (12). Formula (12) is a convex problem that can be minimized via the conquer method proposed by He et al. [33].
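The following Python sketch is our own rough illustration of the two steps above: PCA on X_1, then a LASSO-penalized smoothed quantile regression on the augmented design solved by a simple proximal-gradient loop. The solver, step size, and bandwidth are our assumptions, and leaving the intercept unpenalized is our simplification; the factor coefficients α are unpenalized as in (12), and the paper itself uses the conquer algorithm of He et al. [33].

```python
import numpy as np
from scipy.stats import norm

def smoothed_qr_grad(theta, V, y, tau, h):
    """Gradient of the Gaussian-kernel smoothed quantile loss."""
    r = y - V @ theta
    return V.T @ (norm.cdf(-r / h) - tau) / len(y)

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def farvsqr(X1, y, k, tau=0.5, lam=0.1, h=None, step=0.1, n_iter=3000):
    """Two-step Farvsqr sketch: (1) PCA factors, (2) penalized smoothed QR."""
    n, p = X1.shape
    h = h if h is not None else max(0.05, (p / n) ** 0.25)   # ad hoc bandwidth

    # Step 1: columns of F_hat / sqrt(n) are the top-k eigenvectors of X1 X1^T.
    eigval, eigvec = np.linalg.eigh(X1 @ X1.T)
    F_hat = np.sqrt(n) * eigvec[:, -k:]
    Lam_hat = X1.T @ F_hat / n
    eps_hat = X1 - F_hat @ Lam_hat.T                         # idiosyncratic parts

    # Step 2: proximal gradient (ISTA) on the augmented design (1, eps_hat, F_hat).
    V = np.column_stack([np.ones(n), eps_hat, F_hat])
    theta = np.zeros(1 + p + k)
    penalized = np.zeros(theta.shape, dtype=bool)
    penalized[1:1 + p] = True        # penalize only the idiosyncratic coefficients
    for _ in range(n_iter):
        theta = theta - step * smoothed_qr_grad(theta, V, y, tau, h)
        theta[penalized] = soft_threshold(theta[penalized], step * lam)
    return theta[:1 + p], theta[1 + p:]  # (intercept, beta_1..beta_p), alpha
```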

3.2. Selection Method of λ

Throughout this study, the tuning parameter λ is selected by 10-fold cross-validation. First, we take an equally spaced sequence of 50 values ranging from 0.05 to 2 as the candidate values of λ. Second, the samples are divided into 10 folds; nine are used as the training set and one as the test set. Third, for each value of λ, we compute the estimators of model (12) on the training set, predict the test set, and select the λ that attains the minimum mean squared error on the test set.
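A minimal sketch of this cross-validation loop (our own illustration, reusing the farvsqr sketch above; predicting the held-out fold with β̂ applied to the raw covariates is our simplification):

```python
import numpy as np

def select_lambda(X1, y, k, tau=0.5, n_folds=10, seed=0):
    """10-fold CV over 50 equally spaced lambda values in [0.05, 2]."""
    lam_grid = np.linspace(0.05, 2.0, 50)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds        # random fold labels
    cv_mse = np.zeros(len(lam_grid))
    for j, lam in enumerate(lam_grid):
        for f in range(n_folds):
            tr, te = folds != f, folds == f
            beta_hat, _ = farvsqr(X1[tr], y[tr], k, tau=tau, lam=lam)
            pred = beta_hat[0] + X1[te] @ beta_hat[1:]
            cv_mse[j] += np.mean((y[te] - pred) ** 2) / n_folds
    return lam_grid[int(np.argmin(cv_mse))]
```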

4. Theoretical Analysis

In this section, we give theoretical guarantees for the estimator in Formula (12) with the LASSO penalty. As described before, β* consists of the first p + 1 elements of θ*. Let L = supp(θ*), L_1 = supp(β*), and L_2 = [p+1+k] \ L. When the explanatory variables X_1 can be fitted well by the approximate factor model (7), we can use the true augmented explanatory variables v_i = (1, ε_i^T, f_i^T)^T to solve the objective function
$$\min_{\theta\in\mathbb{R}^{p+1+k}} \big\{R(y, V\theta) + \lambda\|\theta_{[p+1]}\|_1\big\},$$
where V = (v_1, ..., v_n)^T. However, V is not observable, so we need to use its estimator V̂ to solve the objective function
$$\min_{\theta\in\mathbb{R}^{p+1+k}} \big\{R(y, \hat V\theta) + \lambda\|\theta_{[p+1]}\|_1\big\}.$$
Assumption 2.
K(z) ∈ C²(R). For some constants W_2 and W_3, we have 0 ≤ K(z) ≤ W_2 and |K′(z)| ≤ W_3.
Assumption 3.
Let θ* = ((β*)^T, (Λ_0^T β*)^T)^T. It is assumed that there exist ρ_2 > ρ > 0 (with ρ_∞ = ρ) and γ ∈ (0, 0.5) such that
$$\big\|[\nabla^2_{LL} R(y, V\theta^*)]^{-1}\big\|_{l} \le \frac{1}{4\rho_l}, \quad l = 2, \infty, \qquad \big\|\nabla^2_{L_2 L} R(y, V\theta^*)\,[\nabla^2_{LL} R(y, V\theta^*)]^{-1}\big\|_{\infty} \le 1 - 2\gamma.$$
Assumption 4.
‖V‖_max ≤ W_0/2 for some constant W_0 > 0. In addition, there exists a k × k nonsingular matrix M_0, and M = diag(I_{p+1}, M_0) (block diagonal with off-diagonal blocks 0_{(p+1)×k} and 0_{k×(p+1)}), such that for V̄ = V̂M we have ‖V̄ − V‖_max ≤ W_0/2 and
$$\sigma = \max_{j\in[p+1+k]}\Big(\frac{1}{n}\sum_{i=1}^{n}(\bar v_{ij} - v_{ij})^2\Big)^{1/2} \le \frac{4\rho\gamma}{3 W_0 W_2 |L|^2}.$$
Theorem 1.
Suppose Assumptions 2–4 hold. Define W = W_0^3 W_3 |L|^{3/2} and
$$\omega = \max_{j\in[p+1+k]}\Big|\frac{1}{n}\sum_{i=1}^{n} \bar v_{ij}\big[\bar K_h\big((1, x_i^T)\beta^* - y_i\big) - \tau\big]\Big|.$$
If 7ω/γ < λ < ρ_2 ργ / (12 W √|L|), then we have supp(β̂) ⊆ supp(β*) and
$$\|\hat\beta - \beta^*\|_{\infty} \le \frac{6\lambda}{5\rho}, \qquad \|\hat\beta - \beta^*\|_2 \le \frac{4\lambda\sqrt{|L|}}{\rho_2}, \qquad \|\hat\beta - \beta^*\|_1 \le \frac{6\lambda|L|}{5\rho}.$$

5. Simulation Study

In this section, we assess the performance of the method proposed in this paper through simulation. We compare Farvsqr with LASSO and SCAD under different simulation settings.
We generate the response y_i from the model y_i = x_i^T β* + e_i, where the true coefficients are set to β* = (6, 5, 4, 0^T)^T, and the error e_i follows one of the following three models:
(i) e_i ∼ N(0, 1);
(ii) e_i ∼ t(2);
(iii) e_i ∼ 0.1 N(0, 1) + 0.9 N(0, 9).
The covariates x_i are generated from one of the following two models (a small simulation sketch covering both designs is given after the list):
(i) Factor model: x_i = Λ f_i + ε_i with k = 3. The factors are generated from a stationary VAR(1) model f_i = Φ f_{i−1} + η_i with f_0 = 0. The (i, j)-th entry of Φ is set to 0.5 when i = j and to 0.3^{|i−j|} when i ≠ j. We draw the entries of Λ, ε_i, and η_i from the i.i.d. standard normal distribution.
(ii) Equal-correlated case: we draw x_i i.i.d. from N_p(0, Σ), where Σ has diagonal elements 1 and off-diagonal elements 0.4.
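The following Python sketch (our own illustration of the simulation designs above; seeds and helper names are ours) generates covariates from either design and responses under the three error models:

```python
import numpy as np

def gen_covariates(n, p, design="factor", k=3, seed=0):
    """Covariates from the factor design or the equal-correlated design."""
    rng = np.random.default_rng(seed)
    if design == "factor":
        # Stationary VAR(1) factors: f_i = Phi f_{i-1} + eta_i, f_0 = 0,
        # with Phi[i, j] = 0.5 if i == j and 0.3**|i-j| otherwise.
        Phi = np.array([[0.5 if i == j else 0.3 ** abs(i - j)
                         for j in range(k)] for i in range(k)])
        F = np.zeros((n, k))
        f_prev = np.zeros(k)
        for t in range(n):
            f_prev = Phi @ f_prev + rng.standard_normal(k)
            F[t] = f_prev
        Lam = rng.standard_normal((p, k))
        return F @ Lam.T + rng.standard_normal((n, p))
    # Equal-correlated design: Sigma has unit diagonal and 0.4 off-diagonal.
    Sigma = np.full((p, p), 0.4) + 0.6 * np.eye(p)
    return rng.multivariate_normal(np.zeros(p), Sigma, size=n)

def gen_response(X, error="normal", seed=1):
    """y_i = x_i^T beta* + e_i with beta* = (6, 5, 4, 0, ..., 0)^T."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    beta[:3] = [6.0, 5.0, 4.0]
    if error == "normal":
        e = rng.standard_normal(n)
    elif error == "t2":
        e = rng.standard_t(df=2, size=n)
    else:  # 0.1 N(0, 1) + 0.9 N(0, 9) mixture
        mix = rng.random(n) < 0.1
        e = np.where(mix, rng.standard_normal(n), 3.0 * rng.standard_normal(n))
    return X @ beta + e
```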
For the factor model, in order to comprehensively evaluate Farvsqr, given the quantile τ, we compare the influence of different sample sizes and of the explanatory variable's dimensionality under different error distributions. We use the estimation error ‖β̂ − β*‖_2, the average model size, the percentage of true positives (TP) for β, the percentage of true negatives (TN) for β, and the elapsed time to compare Farvsqr and LASSO. The percentages of TP and TN are defined as follows:
$$TP = \frac{1}{p}\sum_{j=1}^{p} I\big(\hat\beta_j \ne 0,\ \beta_j \ne 0,\ \mathrm{sign}(\hat\beta_j) = \mathrm{sign}(\beta_j)\big), \qquad TN = \frac{1}{p}\sum_{j=1}^{p} I\big(\hat\beta_j = 0,\ \beta_j = 0\big).$$
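For completeness, a small helper (our own sketch) computing TP and TN as the fractions defined above:

```python
import numpy as np

def tp_tn(beta_hat, beta_true):
    """Fractions of true positives (correct sign) and true negatives over p."""
    tp = np.mean((beta_hat != 0) & (beta_true != 0)
                 & (np.sign(beta_hat) == np.sign(beta_true)))
    tn = np.mean((beta_hat == 0) & (beta_true == 0))
    return tp, tn
```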
We compare the model performance of Farvsqr with LASSO under different error distributions and explanatory variable relationships; for each situation, we simulate 500 replications.
  • Influence of sample size
We compare the models with the explanatory variable's dimensionality fixed at p = 200; the sample size is set to 100, 300, 500, 800, and 1000, respectively. For each sample size, we simulate 500 replications and calculate the average estimation error, average model size, TP, TN, and elapsed time. The results are presented in Table 1, Table 2 and Table 3. From the results, we can see that under the three error distributions, for each τ and n, the average estimation error of Farvsqr is smaller than that of LASSO. For example, when τ = 0.25 and n = 1000 under the normal distribution, the average estimation errors of Farvsqr and LASSO are 0.127 and 2.586, respectively. As for the average model size, almost all the values of Farvsqr are smaller than those of LASSO, except for n = 100. For TP, all the scenarios are the same for Farvsqr and LASSO, so both can select the true non-zero variables. For elapsed time, all the values of Farvsqr are smaller than those of LASSO, so our method is more efficient. From all of the above, Farvsqr outperforms LASSO. For every quantile τ, as the sample size increases, the estimation error of Farvsqr gradually decreases, but for LASSO the effect of the sample size is not obvious. It may be that LASSO is not appropriate for the factor model, so even a larger sample size cannot overcome its deficiencies.
  • Influence of explanatory variable’s dimensionality
We compare the models with the sample size fixed at n = 1000; the explanatory variable's dimensionality is set to 200, 300, 400, 500, and 600, respectively. For each dimensionality, we simulate 500 replications and calculate the average estimation error, average model size, TP, TN, and elapsed time. The results are presented in Table 4, Table 5 and Table 6. From the results, we can see that under the three error distributions, for each τ and p, the average estimation error of Farvsqr is smaller than that of LASSO. For example, when τ = 0.25 and p = 200 under the normal distribution, the average estimation errors of Farvsqr and LASSO are 0.124 and 2.059, respectively. As for the average model size, all the values of Farvsqr are smaller than those of LASSO. For TP, all the scenarios are the same for Farvsqr and LASSO, so both can select the true non-zero variables. For TN, all the values of Farvsqr are larger than those of LASSO, so LASSO tends to select redundant variables. For elapsed time, all the values of Farvsqr are smaller than those of LASSO, so our method is more efficient. From all of the above, Farvsqr outperforms LASSO. For every quantile τ, as the dimension increases, the average estimation error also increases, which is consistent with common sense; however, the increase for Farvsqr is smaller than that for LASSO. For example, for τ = 0.25 under the normal distribution, the estimation errors of Farvsqr are 0.124 and 0.158 for p = 200 and p = 600, respectively, a relative increase of 27.42%; for LASSO, the relative increase is 85.58%. So, LASSO is more vulnerable to an increase in the variable dimension.
  • Equal correlated case
We also compare our model with LASSO under different sample sizes and explanatory variable dimensionalities for the equal-correlated case. Over 500 replications, we calculate the average estimation error, average model size, TP, TN, and elapsed time. The results are presented in Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12. From all the tables, we can see that essentially all the elapsed times of Farvsqr are shorter than those of LASSO; at the same time, the estimation error of Farvsqr is slightly larger in most situations. For the fixed explanatory variable's dimensionality p = 200, as the sample size increases, the elapsed time gradually increases for both Farvsqr and LASSO, but the relative increase is more significant for LASSO. For example, when τ = 0.25 for N(0, 1), the elapsed times of the two methods for n = 100 are 0.687 and 1.099, respectively, and for n = 1000 they are 1.965 and 3.856, respectively; the relative increase is 186% for Farvsqr and 251% for LASSO. So, the efficiency of LASSO is easily affected by the sample size, and it is not appropriate for large-sample data. Thus, Farvsqr pays only a small cost in the equal-correlated case.
From all the results above, we can draw the following conclusions:
(i) When the covariates are high dimensional with high correlations within blocks, namely when the covariates are generated from the factor model, our method Farvsqr is better than LASSO on all the evaluation indicators, including the average estimation error, average model size, TP, TN, and elapsed time.
(ii) For the factor model, the parameter estimation accuracy of LASSO is easily affected by an increase in the explanatory variable's dimension.
(iii) For the equal-correlated case, Farvsqr pays only a small cost.
(iv) For all the different scenarios, the efficiency of LASSO is easily affected by the sample size.
To further illustrate that our method is better for data that are high dimensional with high correlations within blocks, we also compare our method with SCAD and reach the same conclusions as with LASSO. Here, we only give the results under the normal distribution. Table 13 and Table 14 are, respectively, for the fixed explanatory variable's dimensionality and the fixed sample size. We note that the Farvsqr method first replaces the highly correlated covariates with weakly correlated or uncorrelated ones via the latent factor model and then minimizes (12) with the LASSO or SCAD penalty, whereas LASSO and SCAD directly minimize Formula (5), in which the covariates are highly correlated.

6. Real Data Application

In this section, we use the quarterly U.S. macroeconomic variables in the FRED-QD database [17]. The data set includes 247 variables, and the covariates in the FRED-QD data set are strongly correlated. We choose 88 complete observations from the first quarter of 2000 to the last quarter of 2021. FRED-QD is a quarterly economic database updated by the Federal Reserve Bank of St. Louis, which is publicly available at http://research.stlouisfed.org/econ/mccracken/sel/ (accessed on 28 June 2022); detailed information about the data can be found on the website. In this paper, we choose the variable GDP as the response and the other 246 variables as the explanatory variables. The density of the response is shown in Figure 1. We compare the proposed Farvsqr with LASSO in variable selection, estimation, and elapsed time. The estimation performance is evaluated by R², which is defined as:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat y_i)^2}{\sum_{i=1}^{n}(y_i - \bar y)^2},$$
where y i is the observed value at the time i, y ^ i is the predicted value, and y ¯ is the sample mean. We model the data given the quantile τ = 0.1 , τ = 0.5 , τ = 0.75 , τ = 0.9 . We evaluate the model from the R 2 , model size, and elapsed time.
The results are presented in Table 15. From the results, we can see that the model sizes of Farvsqr are 18, 19, 38, and 38 for the quantiles τ = 0.1, τ = 0.5, τ = 0.75, and τ = 0.9, respectively, whereas the model sizes of LASSO are 241, 176, 207, and 222, respectively. LASSO tends to choose more related variables. For instance, for all four quantiles, the LASSO models include all of Real PCE expenditures: durable goods, Real PCE: services, Real PCE: nondurable goods, Real gross private domestic investment, Real private fixed investment, Real gross private domestic investment: fixed investment: nonresidential: equipment, and Real private fixed investment: nonresidential, because of the strong correlations between them. Moreover, all LASSO models also include all of Number of civilians unemployed for less than 5 weeks, Number of civilians unemployed from 5 to 14 weeks, and Number of civilians unemployed from 15 to 26 weeks, again because of the strong correlations between them. Many other related variables are also included by LASSO. The elapsed times of Farvsqr are 7.6209, 8.2036, 8.3589, and 8.3493 for τ = 0.1, τ = 0.5, τ = 0.75, and τ = 0.9, respectively, while those of LASSO are 9.8736, 13.8031, 10.6616, and 10.1012; so the algorithmic efficiency of LASSO on our real data is much lower than that of Farvsqr. It may be because LASSO selects too many redundant explanatory variables, which affects not only the estimation accuracy of the model but also the efficiency of the algorithm. For R², Farvsqr is better than LASSO except for τ = 0.1. Therefore, Farvsqr is more suitable for this data set, and more generally for data sets with strong correlations between explanatory variables.

7. Conclusions

In this paper, we focus on data sets with heavy-tailed distributions, high dimensionality, and high correlations within blocks of covariates. By generalizing the factor-adjusted regularized variable selection for mean regression to quantile regression, we propose the method of factor-augmented regularized variable selection for quantile regression (Farvsqr). To facilitate the theoretical analysis and improve estimation accuracy and computational efficiency for fitting large-scale linear quantile regression models, we use the convolution-type smoothed quantile regression to estimate the quantile regression coefficients. The paper gives theoretical results for the estimators. At the same time, the simulation and real data analyses show that our method is better than LASSO. In the future, we will continue to study variable selection for quantile regression with missing data and high correlations within blocks of covariates.

Author Contributions

Conceptualization, M.T.; methodology, Y.Z.; software, Q.W.; validation, Y.Z. and Q.W.; formal analysis, M.T.; investigation, M.T.; resources, M.T.; data curation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, M.T.; visualization, M.T.; supervision, M.T.; project administration, M.T.; funding acquisition, M.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (22XNL016).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The researchers can download the FRED-QD database from the website http://research.stlouisfed.org/econ/mccracken/sel/ (accessed on 28 June 2022).

Acknowledgments

The authors would like to thank Liwen Xu for some helpful suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
QR: Quantile Regression
Conquer: Convolution-type Smoothed Quantile Regression
PCA: Principal Component Analysis

Appendix A

Appendix A.1. Proof of Lemma 1

Let V = (v_1, ..., v_n)^T and θ* = ((β*)^T, (β*)^T Λ_0)^T. Note that
$$\nabla E[R(y, V\theta)] = E\Big\{\frac{1}{n}\sum_{i=1}^{n}\big[\bar K_h(v_i^T\theta - y_i) - \tau\big]v_i\Big\} = E\big\{\big[\bar K_h(v_1^T\theta - y_1) - \tau\big]v_1\big\},$$
and v_i^T θ* = (1, x_i^T)β*. So the conclusion can be proved by
$$\nabla E[R(y, V\theta)]\big|_{\theta=\theta^*} = E\big\{\big[\bar K_h(v_1^T\theta^* - y_1) - \tau\big]v_1\big\} = E\big\{\big[\bar K_h\big((1, x_1^T)\beta^* - y_1\big) - \tau\big]v_1\big\} = E[\eta_1 v_1] = 0_{p+1+k}.$$

Appendix A.2. Proof of Theorem 1

In order to prove Theorem 1, let us first introduce Lemma A1 from Fan et al. [18]. Assume that the last k variables are not penalized. Let R(·): R^{p+1+k} → R be a convex function, and let θ* and β* = θ*_{[p+1]} be the sparse sub-vectors of interest. Then, θ* and β* are estimated by
$$\hat\theta = \arg\min_{\theta}\big\{R(\theta) + \lambda\|\theta_{[p+1]}\|_1\big\}, \qquad \hat\beta = \hat\theta_{[p+1]}.$$
Let L = supp(θ*), L_1 = supp(β*), and L_2 = [p+1+k] \ L. Then, we have Lemma A1 as follows:
Assumption A1
(Smoothness). R(θ) ∈ C²(R^{p+k+1}), and there exist A > 0 and W > 0 such that ‖∇²_{·L} R(θ) − ∇²_{·L} R(θ*)‖_∞ ≤ W ‖θ − θ*‖_2 whenever supp(θ) ⊆ L and ‖θ − θ*‖_2 ≤ A;
Assumption A2
(Restricted strong convexity). There exist ρ_2 > ρ > 0 such that ‖[∇²_{LL} R(θ*)]^{-1}‖_∞ ≤ 1/(2ρ) and ‖[∇²_{LL} R(θ*)]^{-1}‖_2 ≤ 1/(2ρ_2);
Assumption A3
(Irrepresentable condition). ‖∇²_{L_2 L} R(θ*) [∇²_{LL} R(θ*)]^{-1}‖_∞ ≤ 1 − γ for some γ ∈ (0, 1);
Lemma A1.
Under Assumptions A1–A3, if
$$\frac{7}{\gamma}\,\|\nabla R(\theta^*)\|_{\infty} < \lambda < \frac{\rho_2}{4\sqrt{|L|}}\,\min\Big\{A,\ \frac{\rho\gamma}{3W}\Big\},$$
then supp(θ̂) ⊆ L and
$$\|\hat\theta - \theta^*\|_{\infty} \le \frac{3}{5\rho}\big(\|\nabla_L R(\theta^*)\|_{\infty} + \lambda\big), \qquad \|\hat\theta - \theta^*\|_2 \le \frac{2}{\rho_2}\big(\|\nabla_L R(\theta^*)\|_2 + \lambda\sqrt{|L_1|}\big),$$
$$\|\hat\theta - \theta^*\|_1 \le \min\Big\{\frac{3}{5\rho}\big(\|\nabla_L R(\theta^*)\|_1 + \lambda|L_1|\big),\ \frac{2\sqrt{|L|}}{\rho_2}\big(\|\nabla_L R(\theta^*)\|_2 + \lambda\sqrt{|L_1|}\big)\Big\}.$$
Next, we will give the proof of the Theorem 1.
Proof of Theorem 1.
As we know, θ̂ = argmin_θ {R(y, V̂θ) + λ‖θ_{[p+1]}‖_1}. From Assumption 4, M_0 is nonsingular and M = diag(I_{p+1}, M_0). Let V̄ = V̂M, θ̄ = M^{-1}θ̂, Λ̂_0 = (0_k, Λ̂^T)^T, θ̂* = ((β*)^T, (Λ̂_0^T β*)^T)^T, and θ̄* = M^{-1}θ̂*. We can see that β̂ = θ̂_{[p+1]} = θ̄_{[p+1]} and θ̄ = argmin_θ {R(y, V̄θ) + λ‖θ_{[p+1]}‖_1}. Therefore, supp(β̂) = supp(θ̄_{[p+1]}) and ‖β̂ − β*‖ = ‖θ̄_{[p+1]} − θ̄*_{[p+1]}‖ ≤ ‖θ̄ − θ̄*‖ for any norm.
Thus, in order to study the theoretical properties of β̂, we can instead study θ̄ and the objective function R(y, V̄θ). Theorem A1 below shows that all the assumptions of Lemma A1 are fulfilled.
Let v_i^T and v̄_i^T be the i-th rows of V and V̄, respectively. We can see that R(y, V̄θ) = (1/n) Σ_{i=1}^{n} L_h(y_i − v̄_i^Tθ), ∇R(y, V̄θ) = (1/n) Σ_{i=1}^{n} {K̄_h(v̄_i^Tθ − y_i) − τ} v̄_i, and V̄θ̄* = Xβ*. Hence ‖∇R(y, V̄θ̄*)‖_∞ = ω. From the properties of vector norms, we obtain ‖∇_L R(y, V̄θ̄*)‖_∞ ≤ ω, ‖∇_L R(y, V̄θ̄*)‖_2 ≤ ω√|L|, and ‖∇_L R(y, V̄θ̄*)‖_1 ≤ ω|L|. In addition, λ > 7ω/γ ≥ ω. Applying Lemma A1, whose assumptions are verified by Theorem A1, then yields Theorem 1.  □
Theorem A1.
Under Assumptions 2–4, define W = W_0^3 W_3 |L|^{3/2}; then
(i) ‖∇²_{·L} R(y, V̄θ) − ∇²_{·L} R(y, V̄θ̄*)‖_∞ ≤ W ‖θ − θ̄*‖_2 whenever supp(θ) ⊆ L;
(ii) ‖[∇²_{LL} R(y, V̄θ̄*)]^{-1}‖_∞ ≤ 1/(2ρ);
(iii) ‖[∇²_{LL} R(y, V̄θ̄*)]^{-1}‖_2 ≤ 1/(2ρ_2);
(iv) ‖∇²_{L_2 L} R(y, V̄θ̄*) [∇²_{LL} R(y, V̄θ̄*)]^{-1}‖_∞ ≤ 1 − γ.
Proof. 
(i) Note that Vθ* = V̄θ̄* = Xβ*; then
$$\nabla^2 R(y, V\theta^*) = \frac{1}{n}\sum_{i=1}^{n} K_h(\bar v_i^T\bar\theta^* - y_i)\, v_i v_i^T, \qquad \nabla^2 R(y, \bar V\bar\theta^*) = \frac{1}{n}\sum_{i=1}^{n} K_h(\bar v_i^T\bar\theta^* - y_i)\, \bar v_i \bar v_i^T.$$
For any j, t ∈ [p+1+k] and supp(θ) ⊆ L, we have
$$\big|\nabla^2_{jt} R(y, \bar V\theta) - \nabla^2_{jt} R(y, \bar V\bar\theta^*)\big| = \Big|\frac{1}{n}\sum_{i=1}^{n}\big[K_h(\bar v_i^T\theta - y_i) - K_h(\bar v_i^T\bar\theta^* - y_i)\big]\bar v_{ij}\bar v_{it}\Big| \le \frac{1}{n}\sum_{i=1}^{n} W_3\,\big|\bar v_i^T(\theta - \bar\theta^*)\big|\,\|\bar V\|_{\max}^2. \quad (A1)$$
By the Cauchy–Schwarz inequality and ‖V̄‖_max ≤ ‖V‖_max + ‖V̄ − V‖_max ≤ W_0, for i ∈ [n] we have |v̄_i^T(θ − θ̄*)| = |v̄_{iL}^T(θ − θ̄*)_L| ≤ ‖v̄_{iL}‖_2 ‖(θ − θ̄*)_L‖_2 ≤ √|L| W_0 ‖θ − θ̄*‖_2. Plugging this result back into (A1), we obtain
$$\big|\nabla^2_{jt} R(y, \bar V\theta) - \nabla^2_{jt} R(y, \bar V\bar\theta^*)\big| \le \sqrt{|L|}\, W_3 W_0^3\, \|\theta - \bar\theta^*\|_2, \quad j, t \in [p+1+k],$$
$$\big\|\nabla^2_{\cdot L} R(y, \bar V\theta) - \nabla^2_{\cdot L} R(y, \bar V\bar\theta^*)\big\|_{\infty} \le |L|^{3/2} W_3 W_0^3\, \|\theta - \bar\theta^*\|_2 = W\, \|\theta - \bar\theta^*\|_2.$$
(ii) For any t ∈ [p+1+k], we have
$$\big\|\nabla^2_{tL} R(y, \bar V\bar\theta^*) - \nabla^2_{tL} R(y, V\theta^*)\big\|_2 = \Big\|\frac{1}{n}\sum_{i=1}^{n} K_h(\bar v_i^T\bar\theta^* - y_i)\big(\bar v_{it}\bar v_{iL}^T - v_{it} v_{iL}^T\big)\Big\|_2 \le \frac{W_2}{n}\sum_{i=1}^{n}\big\|\bar v_{it}\bar v_{iL} - v_{it} v_{iL}\big\|_2.$$
With ‖V‖_max ≤ W_0/2 and ‖V̄‖_max ≤ W_0, we can obtain
$$\big\|\bar v_{it}\bar v_{iL} - v_{it} v_{iL}\big\|_2 \le \big\|v_{it}(\bar v_{iL} - v_{iL})\big\|_2 + \big\|(\bar v_{it} - v_{it})\bar v_{iL}\big\|_2 \le \frac{W_0}{2}\big\|\bar v_{iL} - v_{iL}\big\|_2 + W_0\sqrt{|L|}\,\big|\bar v_{it} - v_{it}\big|.$$
From Assumption 4, we know that σ = max_{j∈[p+1+k]} ((1/n) Σ_{i=1}^{n} (v̄_{ij} − v_{ij})²)^{1/2}. By Jensen's inequality, for J ⊆ [p+1+k] we have
$$\frac{1}{n}\sum_{i=1}^{n}\big\|\bar v_{iJ} - v_{iJ}\big\|_2 \le \Big(\frac{1}{n}\sum_{i=1}^{n}\big\|\bar v_{iJ} - v_{iJ}\big\|_2^2\Big)^{1/2} \le \Big(\frac{|J|}{n}\max_{j\in[p+1+k]}\sum_{i=1}^{n}(\bar v_{ij} - v_{ij})^2\Big)^{1/2} \le \sqrt{|J|}\,\sigma.$$
So
$$\big\|\nabla^2_{\cdot L} R(y, \bar V\bar\theta^*) - \nabla^2_{\cdot L} R(y, V\theta^*)\big\|_{\infty} \le \sqrt{|L|}\,\max_{j\in[p+1+k]}\big\|\nabla^2_{jL} R(y, \bar V\bar\theta^*) - \nabla^2_{jL} R(y, V\theta^*)\big\|_2 \le \sqrt{|L|}\, W_2\Big(\frac{W_0}{2}\sqrt{|L|}\,\sigma + W_0\sqrt{|L|}\,\sigma\Big) \le \frac{3}{2} W_0 W_2 |L|^2\,\sigma. \quad (A2)$$
Let κ = ‖[∇²_{LL} R(y, Vθ*)]^{-1} [∇²_{LL} R(y, V̄θ̄*) − ∇²_{LL} R(y, Vθ*)]‖_∞; then we can obtain
$$\kappa \le \big\|[\nabla^2_{LL} R(y, V\theta^*)]^{-1}\big\|_{\infty}\,\big\|\nabla^2_{LL} R(y, \bar V\bar\theta^*) - \nabla^2_{LL} R(y, V\theta^*)\big\|_{\infty} \le \frac{3}{8\rho} W_0 W_2 |L|^2\,\sigma \le \frac{1}{2}, \quad (A3)$$
and
$$\big\|[\nabla^2_{LL} R(y, \bar V\bar\theta^*)]^{-1} - [\nabla^2_{LL} R(y, V\theta^*)]^{-1}\big\|_{\infty} \le \big\|[\nabla^2_{LL} R(y, V\theta^*)]^{-1}\big\|_{\infty}\,\frac{\kappa}{1-\kappa} \le \frac{1}{4\rho}. \quad (A4)$$
So
$$\big\|[\nabla^2_{LL} R(y, \bar V\bar\theta^*)]^{-1}\big\|_{\infty} \le \big\|[\nabla^2_{LL} R(y, V\theta^*)]^{-1}\big\|_{\infty} + \frac{1}{4\rho} \le \frac{1}{2\rho}.$$
(iii) The third conclusion can be obtained easily from (A4), since ‖B‖_2 ≤ ‖B‖_∞ for any symmetric matrix B. We can obtain ‖[∇²_{LL} R(y, V̄θ̄*)]^{-1} − [∇²_{LL} R(y, Vθ*)]^{-1}‖_2 ≤ 1/(4ρ) ≤ 1/(4ρ_2), and thus
$$\big\|[\nabla^2_{LL} R(y, \bar V\bar\theta^*)]^{-1}\big\|_2 \le \frac{1}{2\rho_2}.$$
(iv)
$$\big\|\nabla^2_{L_2 L} R(y, \bar V\bar\theta^*)[\nabla^2_{LL} R(y, \bar V\bar\theta^*)]^{-1} - \nabla^2_{L_2 L} R(y, V\theta^*)[\nabla^2_{LL} R(y, V\theta^*)]^{-1}\big\|_{\infty} \le \big\|\nabla^2_{L_2 L} R(y, \bar V\bar\theta^*) - \nabla^2_{L_2 L} R(y, V\theta^*)\big\|_{\infty}\,\big\|[\nabla^2_{LL} R(y, \bar V\bar\theta^*)]^{-1}\big\|_{\infty} + \big\|\nabla^2_{L_2 L} R(y, V\theta^*)\big([\nabla^2_{LL} R(y, \bar V\bar\theta^*)]^{-1} - [\nabla^2_{LL} R(y, V\theta^*)]^{-1}\big)\big\|_{\infty}.$$
From conclusion (ii) and (A2), we can obtain
$$\big\|\nabla^2_{L_2 L} R(y, \bar V\bar\theta^*) - \nabla^2_{L_2 L} R(y, V\theta^*)\big\|_{\infty}\,\big\|[\nabla^2_{LL} R(y, \bar V\bar\theta^*)]^{-1}\big\|_{\infty} \le \frac{3}{4\rho} W_0 W_2 |L|^2\,\sigma.$$
On the other hand, take A = ∇²_{L_2 L} R(y, Vθ*), B = ∇²_{LL} R(y, Vθ*), and C = ∇²_{LL} R(y, V̄θ̄*) − ∇²_{LL} R(y, Vθ*). By Assumption 3, ‖AB^{-1}‖_∞ ≤ 1 − 2γ ≤ 1, and we have
$$\big\|\nabla^2_{L_2 L} R(y, V\theta^*)\big([\nabla^2_{LL} R(y, \bar V\bar\theta^*)]^{-1} - [\nabla^2_{LL} R(y, V\theta^*)]^{-1}\big)\big\|_{\infty} = \big\|A\big[(B+C)^{-1} - B^{-1}\big]\big\|_{\infty} \le \|AB^{-1}\|_{\infty}\,\frac{\|CB^{-1}\|_{\infty}}{1-\|CB^{-1}\|_{\infty}} \le \frac{\|CB^{-1}\|_{\infty}}{1-\|CB^{-1}\|_{\infty}}.$$
From Formula (A3), we can obtain ‖CB^{-1}‖_∞ ≤ (3/(8ρ)) W_0 W_2 |L|² σ ≤ 1/2. As a result,
$$\big\|\nabla^2_{L_2 L} R(y, V\theta^*)\big([\nabla^2_{LL} R(y, \bar V\bar\theta^*)]^{-1} - [\nabla^2_{LL} R(y, V\theta^*)]^{-1}\big)\big\|_{\infty} \le \frac{3}{4\rho} W_0 W_2 |L|^2\,\sigma.$$
By combining these estimates, we have
$$\big\|\nabla^2_{L_2 L} R(y, \bar V\bar\theta^*)[\nabla^2_{LL} R(y, \bar V\bar\theta^*)]^{-1} - \nabla^2_{L_2 L} R(y, V\theta^*)[\nabla^2_{LL} R(y, V\theta^*)]^{-1}\big\|_{\infty} \le \frac{3}{4\rho} W_0 W_2 |L|^2\,\sigma + \frac{3}{4\rho} W_0 W_2 |L|^2\,\sigma = \frac{3}{2\rho} W_0 W_2 |L|^2\,\sigma \le \gamma.$$
Therefore, ‖∇²_{L_2 L} R(y, V̄θ̄*) [∇²_{LL} R(y, V̄θ̄*)]^{-1}‖_∞ ≤ (1 − 2γ) + γ = 1 − γ.  □

References

  1. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  2. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  3. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320. [Google Scholar] [CrossRef]
  4. Candes, E.; Tao, T. The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Stat. 2007, 35, 2313–2351. [Google Scholar]
  5. Donoho, D.L.; Elad, M. Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization. Proc. Natl. Acad. Sci. USA 2003, 100, 2197–2202. [Google Scholar] [CrossRef]
  6. Fan, J.; Peng, H. On non-concave penalized likelihood with diverging number of parameters. Ann. Stat. 2004, 32, 928–961. [Google Scholar] [CrossRef]
  7. Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least angle regression. Ann. Stat. 2004, 32, 407–499. [Google Scholar] [CrossRef]
  8. Meinshausen, N.; Bühlmann, P. High-dimensional graphs and variable selection with the lasso. Ann. Stat. 2006, 34, 1436–1462. [Google Scholar] [CrossRef]
  9. Zhao, P.L.; Yu, B. On model selection consistency of Lasso. J. Mach. Learn. Res. 2006, 7, 2541–2563. [Google Scholar]
  10. Fan, J.; Lv, J. Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B Stat. Methodol. 2008, 70, 849–911. [Google Scholar] [CrossRef] [PubMed]
  11. Zou, H.; Li, R. One-step sparse estimates in nonconcave penalized likelihood models. Ann. Stat. 2008, 36, 1509–1533. [Google Scholar]
  12. Bickel, P.J.; Ritov, Y.A.; Tsybakov, A.B. Simultaneous analysis of lasso and dantzig selector. Ann. Stat. 2009, 37, 1705–1732. [Google Scholar] [CrossRef]
  13. Wainwright, M.J. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 2009, 55, 2183–2202. [Google Scholar] [CrossRef]
  14. Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942. [Google Scholar] [CrossRef]
  15. Stock, J.; Watson, M. Forecasting using principal components from a large number of predictors. J. Am. Stat. Assoc. 2002, 97, 1167–1179. [Google Scholar] [CrossRef]
  16. Bai, J.; Ng, S. Determining the number of factors in approximate factor models. Econometrica 2002, 70, 191–221. [Google Scholar] [CrossRef]
  17. McCracken, M.; Ng, S. FRED-QD: A Quarterly Database for Macroeconomic Research; Federal Reserve Bank of St. Louis: St. Louis, MO, USA, 2021. [Google Scholar]
  18. Fan, J.; Ke, Y.; Wang, K. Factor-Adjusted Regularized Model Selection. J. Econom. 2020, 216, 71–85. [Google Scholar] [CrossRef]
  19. Koenker, R.; Bassett, G. Regression quantiles. Econometrica 1978, 46, 33–50. [Google Scholar] [CrossRef]
  20. Koenker, R. Quantile Regression; Cambridge University Press: Cambridge, UK, 2005. [Google Scholar]
  21. Koenker, R.; Chernozhukov, V.; He, X.; Peng, L. Handbook of Quantile Regression; CRC Press: New York, NY, USA, 2017. [Google Scholar]
  22. Ando, T.; Tsay, R.S. Quantile regression models with factor-augmented predictors and information criterion. Econom. J. 2011, 14, 1–24. [Google Scholar] [CrossRef]
  23. Hirzel, A.H.; Hausser, J.; Chessel, D.; Perrin, N. Ecological-niche factor analysis: How to compute habitat-suitability maps without absence data? Ecology 2002, 83, 2027–2036. [Google Scholar] [CrossRef]
  24. Hochreiter, S.; Clevert, D.A.; Obermayer, K. A new summarization method for affymetrix probe level data. Bioinformatics 2006, 22, 943–949. [Google Scholar] [CrossRef] [PubMed]
  25. Gonçalves, K.; Silva, A. Bayesian quantile factor models. arXiv 2020, arXiv:2002.07242. [Google Scholar]
  26. Chang, J.; Guo, B.; Yao, Q. High dimensional stochastic regression with latent factors, endogeneity and nonlinearity. J. Econom. 2015, 189, 297–312. [Google Scholar] [CrossRef]
  27. Chamberlain, G.; Rothschild, M. Arbitrage, factor structure, and mean–variance analysis on large asset markets. Econometrica 1982, 51, 1305–1324. [Google Scholar] [CrossRef]
  28. Bai, J. Inferential theory for factor models of large dimensions. Econometrica 2003, 71, 135–171. [Google Scholar] [CrossRef]
  29. Lam, C.; Yao, Q. Factor modeling for high-dimensional time series: Inference for the number of factors. Ann. Stat. 2012, 40, 694–726. [Google Scholar] [CrossRef]
  30. Fan, J.; Liao, Y.; Mincheva, M. Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Ser. (Stat. Methodol.) 2013, 75, 603–680. [Google Scholar] [CrossRef] [PubMed]
  31. Fan, J.; Liu, H.; Wang, W. Large covariance estimation through elliptical factor models. Ann. Stat. 2018, 46, 1383–1414. [Google Scholar] [CrossRef]
  32. Ando, T.; Bai, J. Quantile co-movement in financial markets: A panel quantile model with unobserved heterogeneity. J. Am. Stat. Assoc. 2020, 115, 266–279. [Google Scholar] [CrossRef]
  33. He, X.; Pan, X.; Tan, K.M.; Zhou, W.X. Smoothed quantile regression with large-scale inference. J. Econom. 2021; in press. [Google Scholar] [CrossRef]
  34. Forni, M.; Hallin, M.; Lippi, M.; Reichlin, L. The generalized dynamic factor model: One-sided estimation and forecasting. J. Am. Stat. Assoc. 2005, 100, 830–840. [Google Scholar] [CrossRef]
Figure 1. The density of the response.
Table 1. The comparison for p = 200, N(0, 1) with the factor model.
Farvsqr | LASSO
τ | n | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s) | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s)
0.25 | 100 | 0.442 | 26 | 3/200 | 174/200 | 1.338 | 2.522 | 24 | 3/200 | 176/200 | 3.876
0.25 | 300 | 0.218 | 21 | 3/200 | 179/200 | 2.697 | 2.312 | 27 | 3/200 | 173/200 | 12.818
0.25 | 500 | 0.174 | 19 | 3/200 | 181/200 | 2.951 | 2.215 | 35 | 3/200 | 165/200 | 18.989
0.25 | 800 | 0.139 | 20 | 3/200 | 180/200 | 2.108 | 2.443 | 38 | 3/200 | 162/200 | 23.558
0.25 | 1000 | 0.127 | 19 | 3/200 | 181/200 | 2.268 | 2.586 | 42 | 3/200 | 158/200 | 28.702
0.5 | 100 | 0.346 | 31 | 3/200 | 169/200 | 1.015 | 2.226 | 23 | 3/200 | 177/200 | 3.418
0.5 | 300 | 0.200 | 22 | 3/200 | 178/200 | 1.946 | 2.054 | 28 | 3/200 | 172/200 | 12.735
0.5 | 500 | 0.154 | 22 | 3/200 | 178/200 | 1.792 | 2.132 | 36 | 3/200 | 164/200 | 18.406
0.5 | 800 | 0.132 | 20 | 3/200 | 180/200 | 1.811 | 2.355 | 40 | 3/200 | 160/200 | 23.207
0.5 | 1000 | 0.116 | 19 | 3/200 | 181/200 | 2.004 | 2.594 | 45 | 3/200 | 155/200 | 27.984
0.75 | 100 | 0.418 | 26 | 3/200 | 174/200 | 1.255 | 2.457 | 22 | 3/200 | 178/200 | 3.525
0.75 | 300 | 0.228 | 21 | 3/200 | 179/200 | 2.715 | 2.218 | 26 | 3/200 | 174/200 | 12.949
0.75 | 500 | 0.171 | 20 | 3/200 | 180/200 | 3.049 | 2.241 | 34 | 3/200 | 166/200 | 19.219
0.75 | 800 | 0.141 | 21 | 3/200 | 179/200 | 2.099 | 2.474 | 39 | 3/200 | 161/200 | 24.118
0.75 | 1000 | 0.128 | 20 | 3/200 | 180/200 | 2.216 | 2.694 | 44 | 3/200 | 156/200 | 28.402
0.9 | 100 | 0.583 | 23 | 3/200 | 177/200 | 1.784 | 3.337 | 22 | 3/200 | 178/200 | 3.727
0.9 | 300 | 0.285 | 21 | 3/200 | 179/200 | 4.746 | 2.718 | 25 | 3/200 | 175/200 | 13.996
0.9 | 500 | 0.216 | 20 | 3/200 | 180/200 | 6.356 | 2.640 | 31 | 3/200 | 169/200 | 20.913
0.9 | 800 | 0.171 | 21 | 3/200 | 179/200 | 5.812 | 2.914 | 37 | 3/200 | 163/200 | 25.975
0.9 | 1000 | 0.158 | 20 | 3/200 | 180/200 | 3.923 | 3.045 | 42 | 3/200 | 158/200 | 29.713
Table 2. The comparison for p = 200, t(2) with the factor model.
Farvsqr | LASSO
τ | n | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s) | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s)
τ = 0.25 n = 1000.675223/200178/2002.5872.513203/200180/2007.432
n = 3000.347223/200178/2005.2622.074263/200174/20018.704
n = 5000.257223/200178/2005.2512.230343/200166/20027.196
n = 8000.201213/200179/2003.3892.487423/200158/20027.192
n = 10000.158233/200177/2003.4572.394403/200160/20024.866
τ = 0.5 n = 1000.545253/200175/2001.1772.256203/200180/2003.862
n = 3000.257223/200178/2002.9881.830273/200173/20013.639
n = 5000.194213/200179/2002.7282.029343/200166/20022.801
n = 8000.149203/200180/2002.5022.268413/200159/20026.217
n = 10000.127193/200181/2002.7042.321403/200160/20023.114
τ = 0.75 n = 1000.655263/200174/2001.3662.608223/200178/2003.672
n = 3000.320243/200176/2004.1142.101273/200173/20014.301
n = 5000.254223/200178/2004.7572.228343/200166/20025.602
n = 8000.182243/200176/2003.2792.435413/200159/20027.394
n = 10000.177213/200179/2003.3202.415383/200162/20025.605
τ = 0.9 n = 1001.222263/200174/2002.7793.617223/200178/2005.240
n = 3000.638223/200178/2007.5012.743263/200174/20017.543
n = 5000.487223/200178/2009.4142.738303/200170/20027.684
n = 8000.373223/200178/2009.7572.867383/200162/20031.129
n = 10000.353213/200179/2009.8442.766373/200163/20022.628
Table 3. The comparison for p = 200, 0.1 N(0, 1) + 0.9 N(0, 9) with the factor model.
Farvsqr | LASSO
τ | n | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s) | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s)
τ = 0.25 n = 1001.119283/200172/2001.5992.667223/200178/2004.086
n = 3000.620243/200176/2003.8932.762323/200168/20013.850
n = 5000.502233/200177/2004.2362.648363/200164/20021.379
n = 8000.379243/200176/2003.4612.509393/200161/20028.357
n = 10000.338233/200177/2003.4822.304393/200161/20027.955
τ = 0.5 n = 1001.049303/200170/2001.2522.359223/200178/2003.583
n = 3000.583253/200175/2003.0822.608323/200168/20013.517
n = 5000.469243/200176/2003.0742.481373/200163/20020.439
n = 8000.349243/200176/2002.8452.395403/200160/20027.249
n = 10000.311223/200178/2002.9692.204403/200160/20028.307
τ = 0.75 n = 1001.183283/200172/2001.4982.606223/200178/2003.552
n = 3000.618273/200173/2003.8822.808323/200168/20013.790
n = 5000.491233/200177/2004.1572.695363/200164/20021.212
n = 8000.380233/200177/2003.4062.531383/200162/20027.762
n = 10000.338243/200176/2003.4212.279393/200161/20027.729
τ = 0.9 n = 1001.469243/200176/2002.0783.490213/200179/2003.577
n = 3000.856213/200179/2006.4673.380313/200169/20015.179
n = 5000.640223/200178/2007.6923.175333/200167/20023.323
n = 8000.500213/200179/2007.3262.871343/200166/20029.211
n = 10000.434233/200177/2006.4272.638373/200163/20032.147
Table 4. The comparison for n = 1000, N(0, 1) with the factor model.
Farvsqr | LASSO
τ | p | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s) | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s)
τ = 0.25 p = 2000.124193/200181/2002.2682.059353/200165/20027.725
p = 3000.136213/300279/3002.9772.934523/300248/30029.432
p = 4000.149203/400380/4003.9893.417603/400340/40021.408
p = 5000.153223/500478/5004.9143.961633/500437/50018.900
p = 6000.158213/600579/6006.0133.821703/600530/60019.541
τ = 0.5 p = 2000.110213/200179/2002.0031.961373/200163/20025.812
p = 3000.126223/300278/3002.5982.818523/300248/30025.645
p = 4000.132213/400379/4003.3543.309633/400337/40022.244
p = 5000.142223/500478/5004.2383.875653/500435/50020.284
p = 6000.138233/600577/6005.1953.698723/600528/60020.908
τ = 0.75 p = 2000.120203/200180/2002.2472.051363/200164/20025.729
p = 3000.141213/300279/3002.9392.890523/300248/30028.248
p = 4000.139213/400379/4003.8303.466623/400338/40023.445
p = 5000.149203/500480/5004.8663.972623/500438/50019.635
p = 6000.148233/600577/6005.9673.870713/600529/60018.038
τ = 0.9 p = 2000.164193/200181/2003.8872.354343/200166/20029.357
p = 3000.171193/300281/3005.8193.327503/300250/30033.806
p = 4000.176203/400380/4008.1273.765573/400343/40027.258
p = 5000.181223/500478/50010.9034.461613/500439/50022.041
p = 6000.196213/600579/60013.7834.256683/600532/60020.241
Table 5. The comparison for n = 1000, t(2) with the factor model.
Farvsqr | LASSO
τ | p | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s) | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s)
τ = 0.25 p = 2000.183203/200180/2006.3112.249383/200162/20020.116
p = 3000.191233/300277/3008.9453.272553/300245/30012.822
p = 4000.208243/400376/40012.0273.074583/400342/40013.202
p = 5000.219273/500473/50015.9863.861763/500424/5009.092
p = 6000.210233/600577/60019.3674.269863/600514/6009.049
τ = 0.5 p = 2000.146203/200180/2003.8532.203393/200161/20026.025
p = 3000.142203/300280/3007.0103.116563/300244/30016.420
p = 4000.158223/400378/4009.2962.973603/400340/40012.778
p = 5000.171223/500478/50012.5453.742783/500422/50010.316
p = 6000.170233/600577/60016.5904.209903/600510/6007.255
τ = 0.75 p = 2000.182223/200178/2006.1872.251383/200162/20023.494
p = 3000.196213/300279/3008.8313.253563/300244/30014.122
p = 4000.207233/400377/40011.9743.120593/400341/40012.617
p = 5000.221223/500478/50015.7813.926773/500423/5009.743
p = 6000.223233/600577/60019.5734.292843/600516/6008.488
τ = 0.9 p = 2000.352233/200177/20013.6842.610353/200165/20017.965
p = 3000.381233/300277/30020.9083.673523/300248/30012.572
p = 4000.417233/400377/40027.1343.626583/400342/40012.338
p = 5000.432263/500474/50035.5874.360733/500427/5006.723
p = 6000.446253/600575/60033.5894.750843/600516/6009.754
Table 6. The comparison for n = 1000, 0.1 N(0, 1) + 0.9 N(0, 9) with the factor model.
Farvsqr | LASSO
τ | p | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s) | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s)
τ = 0.25 p = 2000.364213/200179/2003.4352.323403/200160/20031.611
p = 3000.387233/300277/3004.7883.281593/300241/30031.467
p = 4000.401263/400374/4006.4113.649643/400336/40025.958
p = 5000.431253/500475/5008.3403.860753/500425/50021.536
p = 6000.417253/600575/60010.5484.215853/600515/60015.388
τ = 0.5 p = 2000.333233/200177/2002.8012.267423/200158/20029.623
p = 3000.345253/300275/3003.9023.196613/300239/30029.980
p = 4000.382243/400376/4005.3773.485673/400333/40024.325
p = 5000.365273/500473/5007.0153.730773/500423/50024.365
p = 6000.384283/600572/6009.0454.028853/600515/60018.130
τ = 0.75 p = 2000.359233/200177/2003.2622.320423/200158/20030.568
p = 3000.384233/300277/3004.5893.309593/300241/30029.940
p = 4000.404253/400375/4006.2833.620623/400338/40025.600
p = 5000.407263/500474/5008.2423.825733/500427/50022.053
p = 6000.433273/600573/60010.5254.117833/600517/60015.519
τ = 0.9 p = 2000.463203/200180/2005.9102.688383/200162/20033.401
p = 3000.488223/300278/3008.7163.666543/300246/30032.895
p = 4000.512223/400378/40012.0584.011593/400341/40031.480
p = 5000.523253/500475/50015.7714.207653/500435/50025.127
p = 6000.564233/600577/60019.1284.657783/600522/60021.218
Table 7. The comparison for p = 200, N(0, 1) with the equal correlated case.
Farvsqr | LASSO
τ | n | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s) | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s)
τ = 0.25 n = 1000.519213/200179/2000.6870.474153/200185/2001.099
n = 3000.314183/200182/2001.9320.272143/200186/2001.689
n = 5000.241163/200184/2001.7370.207133/200187/2002.204
n = 8000.196163/200184/2001.8020.168133/200187/2002.694
n = 10000.172153/200185/2001.9650.144133/200187/2003.856
τ = 0.5 n = 1000.482223/200178/2000.5040.449143/200186/2000.812
n = 3000.292173/200183/2001.4010.254143/200186/2001.637
n = 5000.231153/200185/2001.4450.197133/200187/2002.114
n = 8000.184163/200184/2001.6410.157123/200188/2002.621
n = 10000.157143/200186/2001.8060.135123/200188/2003.731
τ = 0.75 n = 1000.562203/200180/2000.6330.491153/200185/2001.009
n = 3000.313173/200183/2001.9430.267153/200185/2001.717
n = 5000.261153/200185/2001.7320.215133/200187/2002.201
n = 8000.197153/200185/2001.8270.164133/200187/2002.713
n = 10000.168153/200185/2001.9550.142123/200188/2003.871
τ = 0.9 n = 1000.723183/200182/2000.9740.613143/200186/2001.602
n = 3000.419163/200184/2003.2930.351133/200187/2002.395
n = 5000.315153/200185/2004.4340.261123/200188/2002.640
n = 8000.249153/200185/2002.6900.207123/200188/2003.064
n = 10000.217143/200186/2002.5600.179123/200188/2004.264
Table 8. The comparison for p = 200, t(2) with the equal correlated case.
Farvsqr | LASSO
τ | n | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s) | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s)
τ = 0.25 n = 1000.920183/200182/2001.6170.780153/200185/2002.091
n = 3000.448183/200182/2003.9300.386153/200185/2003.560
n = 5000.357173/200183/2004.2220.306153/200185/2005.515
n = 8000.261163/200184/2004.2850.229143/200186/2006.950
n = 10000.248153/200185/2004.8490.214133/200187/2009.660
τ = 0.5 n = 1000.678213/200179/2001.1920.619143/200186/2001.417
n = 3000.340173/200183/2002.6990.300143/200186/2002.901
n = 5000.275163/200184/2002.6260.233143/200186/2004.046
n = 8000.208153/200185/2002.9430.181133/200187/2005.057
n = 10000.185153/200185/2003.1160.161133/200187/2006.456
τ = 0.75 n = 1000.886203/200180/2001.2210.767153/200185/2001.570
n = 3000.459163/200184/2003.3120.390143/200186/2002.922
n = 5000.358173/200183/2003.4640.311153/200185/2004.369
n = 8000.281183/200182/2003.3420.251153/200185/2005.341
n = 10000.233163/200184/2003.6010.202133/200187/2006.945
τ = 0.9 n = 1001.528213/200179/2002.2341.406143/200186/2003.076
n = 3000.871173/200183/2005.7240.722143/200186/2005.274
n = 5000.721173/200183/2007.3530.625153/200185/2005.912
n = 8000.564183/200182/2006.6980.498163/200184/2007.411
n = 10000.501153/200185/2006.3700.422143/200186/2009.231
Table 9. The comparison for p = 200, 0.1 N(0, 1) + 0.9 N(0, 9) with the equal correlated case.
Farvsqr | LASSO
τ | n | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s) | Estimation Error | Average Model Size | TP | TN | Elapsed Time (s)
τ = 0.25 n = 1001.619213/200179/2001.0151.409153/200185/2001.493
n = 3000.901183/200182/2003.0100.790153/200185/2002.880
n = 5000.666173/200183/2003.1620.556163/200184/2003.584
n = 8000.557173/200183/2002.9440.473153/200185/2004.093
n = 10000.502173/200183/2003.0680.417163/200184/2005.173
τ = 0.5 n = 1001.371233/200177/2000.7861.236173/200183/2001.149
n = 3000.824203/200180/2002.5180.736153/200185/2002.719
n = 5000.633183/200182/2002.5200.544153/200185/2003.367
n = 8000.513163/200184/2002.5480.434143/200186/2003.855
n = 10000.432173/200183/2002.6540.371153/200185/2004.869
τ = 0.75 n = 1001.490223/200178/2000.9401.344153/200185/2001.331
n = 3000.938163/200184/2002.9920.783153/200185/2002.855
n = 5000.713163/200184/2003.1200.599153/200185/2003.529
n = 8000.569153/200185/2002.9050.469153/200185/2003.996
n = 10000.461173/200183/2002.9820.395153/200185/2005.074
τ = 0.9 n = 1002.077173/200183/2001.3441.760143/200186/2001.928
n = 3001.123163/200184/2004.2740.961133/200187/2004.074
n = 5000.919163/200184/2005.3680.763143/200186/2004.460
n = 8000.732163/200184/2004.7830.602153/200185/2004.777
n = 10000.610153/200185/2004.4850.497143/200186/2005.943
Table 10. The comparison for n = 1000, N(0,1) with the equal correlation.

| τ | p | Farvsqr Est. Error | Farvsqr Avg. Model Size | Farvsqr TP | Farvsqr TN | Farvsqr Elapsed Time (s) | LASSO Est. Error | LASSO Avg. Model Size | LASSO TP | LASSO TN | LASSO Elapsed Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.25 | 200 | 0.166 | 15 | 3/200 | 185/200 | 2.002 | 0.140 | 12 | 3/200 | 188/200 | 3.897 |
| 0.25 | 300 | 0.177 | 15 | 3/300 | 285/300 | 2.780 | 0.154 | 12 | 3/300 | 288/300 | 4.935 |
| 0.25 | 400 | 0.209 | 17 | 3/400 | 383/400 | 3.576 | 0.180 | 14 | 3/400 | 386/400 | 6.534 |
| 0.25 | 500 | 0.193 | 16 | 3/500 | 484/500 | 4.518 | 0.164 | 14 | 3/500 | 486/500 | 8.210 |
| 0.25 | 600 | 0.210 | 18 | 3/600 | 582/600 | 5.531 | 0.182 | 15 | 3/600 | 585/600 | 9.900 |
| 0.5 | 200 | 0.148 | 15 | 3/200 | 185/200 | 1.824 | 0.128 | 12 | 3/200 | 188/200 | 3.743 |
| 0.5 | 300 | 0.169 | 16 | 3/300 | 284/300 | 2.504 | 0.146 | 12 | 3/300 | 288/300 | 4.769 |
| 0.5 | 400 | 0.190 | 17 | 3/400 | 383/400 | 3.240 | 0.167 | 14 | 3/400 | 386/400 | 6.329 |
| 0.5 | 500 | 0.173 | 19 | 3/500 | 481/500 | 4.114 | 0.153 | 14 | 3/500 | 486/500 | 8.029 |
| 0.5 | 600 | 0.199 | 17 | 3/600 | 583/600 | 4.999 | 0.172 | 16 | 3/600 | 584/600 | 9.662 |
| 0.75 | 200 | 0.164 | 16 | 3/200 | 184/200 | 1.966 | 0.138 | 13 | 3/200 | 187/200 | 3.834 |
| 0.75 | 300 | 0.184 | 17 | 3/300 | 283/300 | 2.725 | 0.160 | 12 | 3/300 | 288/300 | 4.849 |
| 0.75 | 400 | 0.206 | 16 | 3/400 | 384/400 | 3.540 | 0.176 | 14 | 3/400 | 386/400 | 6.467 |
| 0.75 | 500 | 0.190 | 17 | 3/500 | 483/500 | 4.500 | 0.162 | 14 | 3/500 | 486/500 | 8.205 |
| 0.75 | 600 | 0.214 | 17 | 3/600 | 583/600 | 5.467 | 0.188 | 16 | 3/600 | 584/600 | 9.819 |
| 0.9 | 200 | 0.203 | 16 | 3/200 | 184/200 | 2.587 | 0.178 | 12 | 3/200 | 188/200 | 4.229 |
| 0.9 | 300 | 0.222 | 17 | 3/300 | 283/300 | 3.619 | 0.196 | 12 | 3/300 | 288/300 | 5.216 |
| 0.9 | 400 | 0.252 | 15 | 3/400 | 385/400 | 4.797 | 0.212 | 14 | 3/400 | 386/400 | 6.965 |
| 0.9 | 500 | 0.244 | 16 | 3/500 | 484/500 | 6.197 | 0.206 | 15 | 3/500 | 485/500 | 8.749 |
| 0.9 | 600 | 0.269 | 16 | 3/600 | 584/600 | 7.598 | 0.227 | 15 | 3/600 | 585/600 | 10.395 |
Table 11. The comparison for n = 1000, t_2 with the equal correlation.

| τ | p | Farvsqr Est. Error | Farvsqr Avg. Model Size | Farvsqr TP | Farvsqr TN | Farvsqr Elapsed Time (s) | LASSO Est. Error | LASSO Avg. Model Size | LASSO TP | LASSO TN | LASSO Elapsed Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.25 | 200 | 0.239 | 16 | 3/200 | 184/200 | 2.509 | 0.211 | 15 | 3/200 | 185/200 | 4.951 |
| 0.25 | 300 | 0.268 | 17 | 3/300 | 283/300 | 3.655 | 0.239 | 13 | 3/300 | 287/300 | 6.187 |
| 0.25 | 400 | 0.276 | 20 | 3/400 | 380/400 | 5.034 | 0.255 | 15 | 3/400 | 385/400 | 7.654 |
| 0.25 | 500 | 0.281 | 19 | 3/500 | 481/500 | 6.824 | 0.263 | 15 | 3/500 | 485/500 | 9.662 |
| 0.25 | 600 | 0.303 | 17 | 3/600 | 583/600 | 8.607 | 0.284 | 16 | 3/600 | 584/600 | 11.108 |
| 0.5 | 200 | 0.194 | 14 | 3/200 | 186/200 | 2.001 | 0.165 | 13 | 3/200 | 187/200 | 4.339 |
| 0.5 | 300 | 0.203 | 16 | 3/300 | 284/300 | 2.856 | 0.182 | 13 | 3/300 | 287/300 | 5.500 |
| 0.5 | 400 | 0.211 | 17 | 3/400 | 383/400 | 3.966 | 0.193 | 14 | 3/400 | 386/400 | 7.039 |
| 0.5 | 500 | 0.217 | 17 | 3/500 | 483/500 | 5.426 | 0.208 | 14 | 3/500 | 486/500 | 8.957 |
| 0.5 | 600 | 0.230 | 16 | 3/600 | 584/600 | 7.134 | 0.251 | 18 | 3/600 | 582/600 | 10.568 |
| 0.75 | 200 | 0.252 | 15 | 3/200 | 185/200 | 2.402 | 0.214 | 14 | 3/200 | 186/200 | 4.828 |
| 0.75 | 300 | 0.269 | 17 | 3/300 | 283/300 | 3.550 | 0.240 | 14 | 3/300 | 286/300 | 6.226 |
| 0.75 | 400 | 0.257 | 16 | 3/400 | 384/400 | 4.882 | 0.232 | 14 | 3/400 | 386/400 | 7.572 |
| 0.75 | 500 | 0.295 | 18 | 3/500 | 482/500 | 6.614 | 0.274 | 16 | 3/500 | 484/500 | 9.456 |
| 0.75 | 600 | 0.309 | 16 | 3/600 | 584/600 | 8.560 | 0.289 | 15 | 3/600 | 585/600 | 10.910 |
| 0.9 | 200 | 0.520 | 16 | 3/200 | 184/200 | 4.497 | 0.445 | 15 | 3/200 | 185/200 | 6.764 |
| 0.9 | 300 | 0.522 | 17 | 3/300 | 283/300 | 6.599 | 0.471 | 14 | 3/300 | 286/300 | 8.048 |
| 0.9 | 400 | 0.532 | 19 | 3/400 | 381/400 | 8.677 | 0.480 | 16 | 3/400 | 384/400 | 9.320 |
| 0.9 | 500 | 0.598 | 17 | 3/500 | 483/500 | 11.297 | 0.532 | 16 | 3/500 | 484/500 | 11.131 |
| 0.9 | 600 | 0.614 | 17 | 3/600 | 583/600 | 13.743 | 0.543 | 16 | 3/600 | 584/600 | 13.034 |
Table 12. The comparison for n = 1000, 0.1 N(0,1) + 0.9 N(0,9) with the equal correlation.

| τ | p | Farvsqr Est. Error | Farvsqr Avg. Model Size | Farvsqr TP | Farvsqr TN | Farvsqr Elapsed Time (s) | LASSO Est. Error | LASSO Avg. Model Size | LASSO TP | LASSO TN | LASSO Elapsed Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.25 | 200 | 0.479 | 17 | 3/200 | 183/200 | 3.053 | 0.400 | 16 | 3/200 | 184/200 | 5.184 |
| 0.25 | 300 | 0.509 | 19 | 3/300 | 281/300 | 4.471 | 0.443 | 16 | 3/300 | 284/300 | 6.581 |
| 0.25 | 400 | 0.556 | 18 | 3/400 | 382/400 | 6.078 | 0.478 | 16 | 3/400 | 384/400 | 8.515 |
| 0.25 | 500 | 0.569 | 21 | 3/500 | 479/500 | 7.984 | 0.504 | 16 | 3/500 | 484/500 | 10.485 |
| 0.25 | 600 | 0.601 | 19 | 3/600 | 581/600 | 10.091 | 0.526 | 17 | 3/600 | 583/600 | 12.369 |
| 0.5 | 200 | 0.427 | 16 | 3/200 | 184/200 | 2.659 | 0.368 | 14 | 3/200 | 186/200 | 4.914 |
| 0.5 | 300 | 0.468 | 19 | 3/300 | 281/300 | 3.881 | 0.406 | 16 | 3/300 | 284/300 | 6.312 |
| 0.5 | 400 | 0.513 | 17 | 3/400 | 383/400 | 5.311 | 0.438 | 16 | 3/400 | 384/400 | 8.199 |
| 0.5 | 500 | 0.515 | 21 | 3/500 | 479/500 | 7.014 | 0.464 | 16 | 3/500 | 484/500 | 10.267 |
| 0.5 | 600 | 0.557 | 21 | 3/600 | 579/600 | 8.911 | 0.497 | 18 | 3/600 | 582/600 | 12.116 |
| 0.75 | 200 | 0.484 | 16 | 3/200 | 184/200 | 2.969 | 0.404 | 14 | 3/200 | 186/200 | 5.076 |
| 0.75 | 300 | 0.521 | 17 | 3/300 | 283/300 | 4.400 | 0.439 | 16 | 3/300 | 284/300 | 6.513 |
| 0.75 | 400 | 0.549 | 17 | 3/400 | 383/400 | 6.054 | 0.464 | 16 | 3/400 | 384/400 | 8.482 |
| 0.75 | 500 | 0.584 | 19 | 3/500 | 481/500 | 7.931 | 0.506 | 17 | 3/500 | 483/500 | 10.444 |
| 0.75 | 600 | 0.580 | 18 | 3/600 | 582/600 | 10.006 | 0.514 | 16 | 3/600 | 584/600 | 12.279 |
| 0.9 | 200 | 0.599 | 16 | 3/200 | 184/200 | 4.419 | 0.489 | 14 | 3/200 | 186/200 | 5.918 |
| 0.9 | 300 | 0.653 | 16 | 3/300 | 284/300 | 6.444 | 0.546 | 15 | 3/300 | 285/300 | 7.202 |
| 0.9 | 400 | 0.694 | 19 | 3/400 | 381/400 | 8.893 | 0.600 | 16 | 3/400 | 384/400 | 9.254 |
| 0.9 | 500 | 0.738 | 19 | 3/500 | 481/500 | 11.478 | 0.658 | 16 | 3/500 | 484/500 | 11.372 |
| 0.9 | 600 | 0.752 | 19 | 3/600 | 581/600 | 14.182 | 0.661 | 16 | 3/600 | 584/600 | 13.227 |
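Tables 7–12 report results under the three error distributions named in their captions: N(0,1), t_2, and the normal mixture 0.1 N(0,1) + 0.9 N(0,9). As a minimal, purely illustrative Python sketch of how such error draws could be generated (the sample size `n`, the seed, and the variable names are our own choices; the covariate design, factor structure, and regression coefficients follow the simulation settings described earlier in the paper and are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # illustrative sample size only

# Error distributions named in the captions of Tables 7-12.
eps_normal = rng.standard_normal(n)            # N(0, 1)
eps_t2 = rng.standard_t(df=2, size=n)          # t_2, heavy-tailed

# Mixture 0.1 N(0,1) + 0.9 N(0,9): draw from N(0,1) with probability 0.1,
# otherwise from N(0,9) (variance 9, i.e., standard deviation 3).
from_n01 = rng.random(n) < 0.1
eps_mix = np.where(from_n01, rng.normal(0.0, 1.0, n), rng.normal(0.0, 3.0, n))
```

The t_2 and mixture errors are the heavy-tailed and contaminated cases in which quantile-based estimation is expected to be more robust than least squares.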
Table 13. The comparison for p = 200, N(0,1) with the factor model between Farvsqr and SCAD.

| τ | n | Farvsqr Est. Error | Farvsqr Avg. Model Size | Farvsqr TP | Farvsqr TN | Farvsqr Elapsed Time (s) | SCAD Est. Error | SCAD Avg. Model Size | SCAD TP | SCAD TN | SCAD Elapsed Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.25 | 100 | 0.010 | 6 | 3/200 | 194/200 | 1.464 | 1.143 | 32 | 3/200 | 168/200 | 4.207 |
| 0.25 | 300 | 0.019 | 6 | 3/200 | 194/200 | 2.170 | 0.275 | 7 | 3/200 | 193/200 | 13.175 |
| 0.25 | 500 | 0.003 | 6 | 3/200 | 194/200 | 2.920 | 0.268 | 7 | 3/200 | 193/200 | 25.350 |
| 0.25 | 800 | 0.005 | 6 | 3/200 | 194/200 | 3.691 | 0.290 | 8 | 3/200 | 192/200 | 32.313 |
| 0.25 | 1000 | 0.002 | 6 | 3/200 | 194/200 | 4.162 | 0.469 | 13 | 3/200 | 187/200 | 22.552 |
| 0.5 | 100 | 0.027 | 6 | 3/200 | 194/200 | 1.113 | 0.503 | 12 | 3/200 | 188/200 | 3.876 |
| 0.5 | 300 | 0.015 | 6 | 3/200 | 194/200 | 2.144 | 0.210 | 7 | 3/200 | 193/200 | 14.855 |
| 0.5 | 500 | 0.009 | 6 | 3/200 | 194/200 | 2.869 | 0.201 | 6 | 3/200 | 194/200 | 27.913 |
| 0.5 | 800 | 0.004 | 6 | 3/200 | 194/200 | 3.692 | 0.224 | 7 | 3/200 | 193/200 | 28.429 |
| 0.5 | 1000 | 0.003 | 6 | 3/200 | 194/200 | 4.442 | 0.374 | 11 | 3/200 | 189/200 | 22.641 |
| 0.75 | 100 | 0.029 | 6 | 3/200 | 194/200 | 1.234 | 1.328 | 40 | 3/200 | 160/200 | 3.410 |
| 0.75 | 300 | 0.013 | 6 | 3/200 | 194/200 | 2.003 | 0.263 | 7 | 3/200 | 193/200 | 11.749 |
| 0.75 | 500 | 0.011 | 6 | 3/200 | 194/200 | 2.655 | 0.260 | 7 | 3/200 | 193/200 | 22.713 |
| 0.75 | 800 | 0.007 | 6 | 3/200 | 194/200 | 3.638 | 0.295 | 9 | 3/200 | 191/200 | 33.381 |
| 0.75 | 1000 | 0.002 | 6 | 3/200 | 194/200 | 4.082 | 0.453 | 13 | 3/200 | 187/200 | 25.003 |
| 0.9 | 100 | 0.021 | 6 | 3/200 | 194/200 | 1.644 | 3.325 | 95 | 3/200 | 105/200 | 2.602 |
| 0.9 | 300 | 0.015 | 6 | 3/200 | 194/200 | 2.070 | 2.214 | 72 | 3/200 | 128/200 | 6.157 |
| 0.9 | 500 | 0.015 | 6 | 3/200 | 194/200 | 2.547 | 1.049 | 36 | 3/200 | 164/200 | 10.351 |
| 0.9 | 800 | 0.011 | 6 | 3/200 | 194/200 | 3.008 | 0.647 | 18 | 3/200 | 182/200 | 16.733 |
| 0.9 | 1000 | 0.008 | 6 | 3/200 | 194/200 | 3.344 | 0.805 | 23 | 3/200 | 177/200 | 25.050 |
Table 14. The comparison for n = 1000, N(0,1) with the factor model between Farvsqr and SCAD.

| τ | p | Farvsqr Est. Error | Farvsqr Avg. Model Size | Farvsqr TP | Farvsqr TN | Farvsqr Elapsed Time (s) | SCAD Est. Error | SCAD Avg. Model Size | SCAD TP | SCAD TN | SCAD Elapsed Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.25 | 200 | 0.007 | 6 | 3/200 | 194/200 | 4.413 | 0.211 | 8 | 3/200 | 192/200 | 25.313 |
| 0.25 | 300 | 0.005 | 6 | 3/300 | 294/300 | 6.028 | 0.523 | 14 | 3/300 | 286/300 | 20.791 |
| 0.25 | 400 | 0.006 | 6 | 3/400 | 394/400 | 7.472 | 0.505 | 14 | 3/400 | 386/400 | 20.691 |
| 0.25 | 500 | 0.006 | 6 | 3/500 | 494/500 | 9.552 | 0.671 | 25 | 3/500 | 475/500 | 22.224 |
| 0.25 | 600 | 0.005 | 6 | 3/600 | 594/600 | 11.412 | 0.519 | 23 | 3/600 | 577/600 | 13.513 |
| 0.5 | 200 | 0.006 | 6 | 3/200 | 194/200 | 4.761 | 0.170 | 7 | 3/200 | 193/200 | 21.696 |
| 0.5 | 300 | 0.005 | 6 | 3/300 | 294/300 | 6.111 | 0.427 | 12 | 3/300 | 288/300 | 16.894 |
| 0.5 | 400 | 0.006 | 6 | 3/400 | 394/400 | 7.869 | 0.358 | 11 | 3/400 | 389/400 | 11.312 |
| 0.5 | 500 | 0.005 | 6 | 3/500 | 494/500 | 9.800 | 0.399 | 10 | 3/500 | 490/500 | 9.254 |
| 0.5 | 600 | 0.002 | 6 | 3/600 | 594/600 | 12.000 | 0.243 | 7 | 3/600 | 593/600 | 3.955 |
| 0.75 | 200 | 0.005 | 6 | 3/200 | 194/200 | 4.197 | 0.214 | 8 | 3/200 | 192/200 | 21.727 |
| 0.75 | 300 | 0.010 | 6 | 3/300 | 294/300 | 5.666 | 0.541 | 14 | 3/300 | 286/300 | 19.417 |
| 0.75 | 400 | 0.006 | 6 | 3/400 | 394/400 | 7.454 | 0.491 | 14 | 3/400 | 386/400 | 20.128 |
| 0.75 | 500 | 0.002 | 6 | 3/500 | 494/500 | 9.098 | 0.607 | 18 | 3/500 | 482/500 | 27.821 |
| 0.75 | 600 | 0.006 | 6 | 3/600 | 594/600 | 11.422 | 0.586 | 22 | 3/600 | 578/600 | 16.544 |
| 0.9 | 200 | 0.001 | 6 | 3/200 | 194/200 | 3.487 | 0.420 | 13 | 3/200 | 187/200 | 25.110 |
| 0.9 | 300 | 0.009 | 6 | 3/300 | 294/300 | 4.932 | 1.086 | 37 | 3/300 | 263/300 | 23.282 |
| 0.9 | 400 | 0.006 | 6 | 3/400 | 394/400 | 6.841 | 1.408 | 61 | 3/400 | 339/400 | 26.220 |
| 0.9 | 500 | 0.010 | 6 | 3/500 | 494/500 | 8.446 | 2.174 | 114 | 3/500 | 386/500 | 29.943 |
| 0.9 | 600 | 0.012 | 6 | 3/600 | 594/600 | 10.251 | 2.371 | 158 | 3/600 | 442/600 | 34.568 |
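For readers tabulating comparable results, the sketch below illustrates how the columns in Tables 7–14 could be computed from a single fitted coefficient vector. It is a hedged illustration: the function name `selection_metrics`, the ℓ2 estimation error, and the thresholding rule for "selected" variables are our own illustrative choices, and the "Average Model Size" column suggests these quantities are further averaged over simulation replications.

```python
import numpy as np

def selection_metrics(beta_hat, beta_true, tol=1e-8):
    """Illustrative computation of the columns reported in Tables 7-14.

    Assumes the estimation error is the l2 distance between estimated and
    true coefficients and that a variable is 'selected' when its estimated
    coefficient exceeds `tol` in absolute value; the paper's exact
    definitions may differ.
    """
    selected = np.abs(beta_hat) > tol
    active = np.abs(beta_true) > tol
    est_error = float(np.linalg.norm(beta_hat - beta_true))
    model_size = int(selected.sum())          # number of selected variables
    tp = int(np.sum(selected & active))       # true variables that are kept
    tn = int(np.sum(~selected & ~active))     # irrelevant variables dropped
    return est_error, model_size, tp, tn

# Toy check with p = 200 and 3 active coefficients, as in Tables 7-9 and 13.
rng = np.random.default_rng(1)
p = 200
beta_true = np.zeros(p)
beta_true[:3] = 1.0
noisy = beta_true + 0.05 * rng.standard_normal(p)
beta_hat = np.where(np.abs(noisy) > 0.1, noisy, 0.0)  # crude thresholding for the demo
print(selection_metrics(beta_hat, beta_true))
```

With three active variables out of p, TP counts how many of the three are recovered and TN counts how many of the remaining p − 3 are correctly excluded, which is why the tables report them as fractions of p.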
Table 15. The results of the real data.

| τ | R² (Farvsqr) | R² (LASSO) | Model Size (Farvsqr) | Model Size (LASSO) | Elapsed Time, s (Farvsqr) | Elapsed Time, s (LASSO) |
|---|---|---|---|---|---|---|
| 0.1 | 0.9988 | 0.9993 | 18 | 241 | 7.6209 | 9.8736 |
| 0.5 | 0.9998 | 0.9996 | 19 | 176 | 8.2036 | 13.8031 |
| 0.75 | 0.9998 | 0.9995 | 38 | 207 | 8.3589 | 10.6616 |
| 0.9 | 0.9998 | 0.9993 | 38 | 222 | 8.3493 | 10.1012 |
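Table 15 reports an R² value for each quantile level on the real data. This part of the paper does not restate which quantile goodness-of-fit measure the column uses; one common choice, shown below as a hypothetical Python illustration (the functions `check_loss` and `quantile_r2` are our own naming, and the paper's measure may differ), is a Koenker–Machado-type ratio of check losses.

```python
import numpy as np

def check_loss(u, tau):
    """Average quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return float(np.mean(u * (tau - (u < 0))))

def quantile_r2(y, y_hat, tau):
    """One plausible goodness-of-fit measure for quantile regression:
    1 - loss(fitted model) / loss(intercept-only model), in the spirit of
    Koenker and Machado's R^1(tau). Shown only as an assumption-laden sketch."""
    baseline = check_loss(y - np.quantile(y, tau), tau)  # tau-quantile is the intercept-only fit
    return 1.0 - check_loss(y - y_hat, tau) / baseline
```

Under any such measure, both methods fit the real data almost perfectly at every quantile level, so the practical differences in Table 15 lie in the far smaller models selected by Farvsqr and its shorter running times.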