Article

A Flexibly Conditional Screening Approach via a Nonparametric Quantile Partial Correlation

College of Mathematics and Statistics, Chongqing University, Chongqing 401331, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2022, 10(24), 4638; https://doi.org/10.3390/math10244638
Submission received: 14 November 2022 / Revised: 22 November 2022 / Accepted: 3 December 2022 / Published: 7 December 2022
(This article belongs to the Special Issue Statistical Methods for High-Dimensional and Massive Datasets)

Abstract

Considering the influence of conditional variables is crucial in statistical modeling; ignoring them may lead to misleading results. Recently, Ma, Li and Tsai proposed the quantile partial correlation (QPC)-based screening approach, which takes conditional variables into account for ultrahigh dimensional data. In this paper, we propose a nonparametric version of the quantile partial correlation (NQPC), which is able to describe the influence of conditional variables on other relevant variables more flexibly and precisely. Specifically, the NQPC first removes the effect of conditional variables by fitting two nonparametric additive models, in contrast to the conventional partial correlation, which fits two parametric models, and then computes the QPC of the resulting residuals. This measure is particularly useful when the conditional variables are highly nonlinearly correlated with both the predictors and the response. We then employ the NQPC as the screening utility, yielding a variable screening procedure termed NQPC-SIS. Theoretically, we prove that NQPC-SIS enjoys the sure screening property: with probability going to one, the selected subset recruits all the truly important predictors under mild conditions. Finally, extensive simulations and an empirical application are carried out to demonstrate the usefulness of our proposal.

1. Introduction

The variable screening technique has been demonstrated to be a computationally fast and efficient tool for solving many problems in ultrahigh dimensions. In many scientific areas, such as biological genetics, finance and econometrics, we may collect ultrahigh dimensional data sets (e.g., biomarkers, financial factors, assets and stocks), where the number $p_n$ of predictors greatly exceeds the sample size $n$. Theoretically, ultrahigh dimensionality often refers to the case where $p_n$ and $n$ satisfy $p_n = O(\exp(n^a))$ for some constant $a > 0$. Variable screening is able to reduce the computational cost, avoid the instability of algorithms, and improve the estimation accuracy; these issues arise in variable selection approaches based on LASSO [1], SCAD [2,3] or MCP [4] for ultrahigh dimensional data. Since the seminal work of [5], which pioneered the sure independence screening (SIS) procedure, many variable screening approaches have been documented over the last fifteen years, including model-based methods (e.g., [6,7,8,9,10,11]) and model-free methods [12,13,14,15,16,17,18,19,20]. These papers have shown that, with probability approaching one, the set of selected predictors contains the set of all truly important predictors.
Most marginal approaches focus only on developing effective and robust measures to characterize the marginal association between the response and an individual predictor. However, these methods do not take into consideration the influence of conditional variables or confounding factors on the response. A direct application of SIS is relatively crude, since SIS may perform poorly when the predictors are highly correlated with each other. Some predictors that are marginally weakly relevant or irrelevant, but jointly correlated with the response, may be excluded from the final model after applying marginal screening methods, which results in a high false positive rate (FPR). To surmount this weakness, an iterated screening algorithm or a penalization-based variable selection is usually offered as a refined follow-up step (e.g., [5,10]).
Conditional variable screening can be viewed as an important extension of marginal screening: it accounts for conditional information when calculating the marginal screening utility. There is relatively little work on it in the literature. To name a few, Ref. [21] proposed a conditional SIS (CIS) procedure to improve the performance of SIS, because some correlated conditional variables may boost the rank of a marginally weak predictor and reduce the number of false negatives. The paper [22] proposed a confounder-adjusted screening method for high dimensional censored data, in which additional environmental confounders are regarded as conditional variables. The researchers in [23] studied variable screening that incorporates within-subject correlation for ultrahigh dimensional longitudinal data, using some baseline variables as conditional variables. Ref. [24] proposed a conditional distance correlation-based screening procedure via a kernel smoothing method, while [25] further presented a screening procedure based on the conditional distance correlation, which is similar to [24] in methodology but differs in theory. Additionally, Ref. [11] developed a conditional quantile correlation-based screening approach using the B-spline smoothing technique. However, in [11,24,25], among others, the conditional variable considered is only univariate. Further, Ref. [21] focuses on generalized linear models and cannot handle heavy-tailed data. In this regard, we aim to develop a screener that behaves more robustly to outliers and heavy-tailed data and simultaneously accommodates more than one conditional variable. As for the choice of conditional variables, one can rely on prior knowledge such as published research or the experience of subject-matter experts. When no prior knowledge is available, one can apply a marginal screening approach, such as SIS or one of its robust variants, to select several top-ranked predictors as conditional variables.
On the other hand, to the best of our knowledge, several works have considered multiple conditional variables based on distinct partial correlations. For instance, Ref. [26] proposed a thresholded partial correlation approach to select significant variables in linear regression models. Additionally, Ref. [17] presented a screening procedure based on the quantile partial correlation of [27], referred to as QPC-SIS. More recently, Ref. [28] proposed a copula partial correlation-based screening approach. It is worth noting that the partial correlation used in both [17,28] removes the effect of conditional variables on the response and each predictor by fitting two parametric models with a linear structure. However, this manner may be ineffective, especially when the conditional variables have a nonlinear influence on the response. This motivates us to work out a flexible way to control the impact of conditional variables. Meanwhile, we also take into account robustness to outlying or heavy-tailed responses in this paper.
This paper contributes a robust and flexible conditional variable screening procedure via a partial correlation coefficient, which is a non-trivial extension of [17]. First, in order to control conditional variables precisely, we propose a nonparametric definition of the QPC, which extends that of [17] and allows for more flexibility. Specifically, we first fit two nonparametric additive models to remove the effect of the conditional variables on the response and on an individual predictor, where the B-spline smoothing technique is used to estimate the nonparametric functions. This can be viewed as a nonparametric adjustment for controlling conditional variables. From the two resulting residuals, a quantile correlation can be computed, which formulates the nonparametric QPC. Second, we use this quantity as the screening utility in variable screening; the procedure can be implemented rapidly. We refer to it as nonparametric quantile partial correlation-based screening, denoted NQPC-SIS. Third, we establish the sure screening property for NQPC-SIS under mild conditions. Compared to [17], our approach is more flexible and our theory on the sure screening property is more difficult to derive. Moreover, our screening idea can be easily transferred to existing screening methods that use other popular partial correlations.
The remainder of the paper is organized as follows. In Section 2, the NQPC-SIS is introduced. The required technical conditions are listed and the asymptotic properties are established in Section 3. Section 4 provides an iterative algorithm for further refinement. Numerical studies and an empirical analysis of a real data set are carried out in Section 5. Concluding remarks are given in Section 6. All proofs of the main results are relegated to Appendix A.

2. Methodology

2.1. A Preliminary

In this section, we formally introduce the NQPC-SIS procedure. To begin with, we give some background on the quantile correlation (QC) introduced in [27]. Let $X$ and $Y$ be two random variables, and let $E(X)$ be the expectation of $X$. The QC is defined as
$$\mathrm{qcor}_\tau(Y, X) = \frac{E[\psi_\tau(Y - Q_{\tau,Y})(X - E(X))]}{\sqrt{\operatorname{var}(I(Y - Q_{\tau,Y} > 0))\operatorname{var}(X)}},$$
where $Q_{\tau,Y}$ is the $\tau$th quantile of $Y$ and $\psi_\tau(u) = \tau - I(u < 0)$ for some quantile level $\tau \in (0,1)$; here $I(\cdot)$ denotes the indicator function. This correlation takes values between $-1$ and $1$ and, in contrast to the conventional correlation coefficient, is asymmetric in $Y$ and $X$. The QC enjoys two merits, monotone invariance in $Y$ and robustness in $Y$, owing to the use of the quantile rather than the mean in its definition; thus, the QC is little affected by outliers in $Y$. Besides, as shown in [27], $\mathrm{qcor}_\tau(Y,X)$ is closely related to quantile regression. Denote by $(a_{0\tau}^*, a_{1\tau}^*)$ the minimizer of $E\{\rho_\tau(Y - a_{0\tau} - a_{1\tau}X)\}$ with respect to $a_{0\tau}$ and $a_{1\tau}$, where $\rho_\tau(u) = u[\tau - I(u < 0)]$. Then, it follows that $\mathrm{qcor}_\tau(Y,X) = \varphi(a_{1\tau}^*)$, where $\varphi(\cdot)$ is a continuous and increasing function with $\varphi(a_{1\tau}^*) = 0$ if and only if $a_{1\tau}^* = 0$.
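To fix ideas, the sample analogue of the QC is straightforward to compute. The following is a minimal sketch in base R; the function name qcor and the simulated data are purely illustrative and not part of the paper.

```r
# Sample quantile correlation qcor_tau(Y, X): a minimal sketch in base R.
# Note var(I(Y - Q_{tau,Y} > 0)) = tau * (1 - tau) for a continuous Y.
qcor <- function(y, x, tau = 0.5) {
  q.tau <- quantile(y, probs = tau, names = FALSE)   # sample tau-th quantile of Y
  psi   <- tau - as.numeric(y - q.tau < 0)           # psi_tau(Y - Q_{tau,Y})
  num   <- mean(psi * (x - mean(x)))                 # E[psi_tau(Y - Q)(X - EX)]
  num / sqrt(tau * (1 - tau) * var(x))
}

set.seed(1)
y <- rnorm(200); x <- 0.5 * y + rnorm(200)
qcor(y, x, tau = 0.5)   # positive, reflecting the positive dependence
```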
When the QC is used as a marginal screening utility, the screening results may be misleading if the predictors are highly correlated. To overcome this problem, Ref. [17] proposed screening based on the quantile partial correlation (QPC), which reduces the effect of the conditioning predictors. For ease of presentation, write $\mathbf{X}_{-j} = (X_k, k \neq j)^T$ for $j = 1, \ldots, p_n$. The QPC in [17] is defined as
$$\mathrm{qpcor}_\tau(Y, X_j \mid \mathbf{X}_{-j}) = \frac{\operatorname{cov}(\psi_\tau(Y - \mathbf{X}_{-j}^T\alpha_{j0}),\, X_j - \mathbf{X}_{-j}^T\theta_{j0})}{\sqrt{\operatorname{var}(\psi_\tau(Y - \mathbf{X}_{-j}^T\alpha_{j0}))\operatorname{var}(X_j - \mathbf{X}_{-j}^T\theta_{j0})}} = \frac{E\{\psi_\tau(Y - \mathbf{X}_{-j}^T\alpha_{j0})(X_j - \mathbf{X}_{-j}^T\theta_{j0})\}}{\sqrt{\tau(1-\tau)\sigma_j^2}},$$
where $\sigma_j^2 = \operatorname{var}(X_j - \mathbf{X}_{-j}^T\theta_{j0})$, $\alpha_{j0} = \arg\min_{\alpha_j} E\{\rho_\tau(Y - \mathbf{X}_{-j}^T\alpha_j)\}$ and $\theta_{j0} = \arg\min_{\theta_j} E\{(X_j - \mathbf{X}_{-j}^T\theta_j)^2\}$. When applying the QPC to variable screening, we must first estimate the two quantities $\alpha_{j0}$ and $\theta_{j0}$. However, for ultrahigh dimensional data, the dimensionality of $\mathbf{X}_{-j}$ is $p_n - 1$, which can still be much larger than the sample size $n$, and it is then difficult to obtain estimators of $\alpha_{j0}$ and $\theta_{j0}$. On the other hand, it is usually believed that relatively few conditional variables are useful. Thus, it is reasonable in practice to consider a small subset of $\{k : k \neq j,\ k = 1, \ldots, p_n\}$, denoted by $S_j$. Here, $S_j$ is called the conditional set; its size is smaller than $n$ and, in the absence of prior knowledge, it can be specified as the set of previously selected variables together with the variables related to the $j$th predictor. As a result, Ref. [17] suggested using the following measure to perform variable screening:
$$\mathrm{qpcor}_\tau(Y, X_j \mid \mathbf{X}_{S_j}) = \frac{E\{\psi_\tau(Y - \mathbf{X}_{S_j}^T\alpha_{j0})(X_j - \mathbf{X}_{S_j}^T\theta_{j0})\}}{\sqrt{\tau(1-\tau)\sigma_j^2}},$$
where $\sigma_j^2 = \operatorname{var}(X_j - \mathbf{X}_{S_j}^T\theta_{j0})$, $\alpha_{j0} = \arg\min_{\alpha_j} E\{\rho_\tau(Y - \mathbf{X}_{S_j}^T\alpha_j)\}$ and $\theta_{j0} = \arg\min_{\theta_j} E\{(X_j - \mathbf{X}_{S_j}^T\theta_j)^2\}$, in which $\mathbf{X}_{S_j} = (X_k, k \in S_j)^T$.
From this definition, one can see that the QPC is just the QC between $Y$ and $X_j$ after removing the confounding effects of the conditional variables $\mathbf{X}_{S_j}$. This is done by fitting two parametric regression models: a linear quantile regression of $Y$ on $\mathbf{X}_{S_j}$, and a multivariate linear regression of $X_j$ on $\mathbf{X}_{S_j}$. The QPC is then the QC of the two residuals obtained from these fits. However, in real applications, the parametric models used to dispel the confounding effects may not be adequate, especially when a nonlinear dependence structure between the response and the predictors is present, which is quite common in high-dimensional data. This motivates us to consider a more flexible and efficient approach to control the influence of the confounding/conditional variables.
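As an illustration of this residual-based construction, the sketch below computes a sample version of the parametric QPC, assuming the 'quantreg' package; the function qpcor and all variable names are ours, not the authors' code.

```r
# Parametric QPC sketch: quantile-regress Y on X_S, least-squares-regress X_j
# on X_S, then take the QC-type moment of the two residuals (assumes quantreg).
library(quantreg)

qpcor <- function(y, xj, xS, tau = 0.5) {
  psi <- tau - as.numeric(residuals(rq(y ~ xS, tau = tau)) < 0)  # psi_tau of Y-residual
  e   <- residuals(lm(xj ~ xS))                                  # X_j - X_S' theta
  mean(psi * e) / sqrt(tau * (1 - tau) * mean(e^2))
}
```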

2.2. Proposed Method: NQPC-SIS

Without loss of generality, we assume that the predictors $\{X_j, 1 \le j \le p\}$ are standardized and that the response $Y$ is $\tau$-quantile centered, i.e., $Q_{\tau,Y} = 0$, analogously to centering the response by its mean. Then, we consider the quantile additive model
$$Y = m_1(X_1) + m_2(X_2) + \cdots + m_p(X_p) + \varepsilon,$$
where the error term satisfies $P(\varepsilon < 0 \mid \mathbf{X}) = \tau$. This means that the conditional $\tau$-quantile of $Y$ given $\mathbf{X}$ is $Q_{\tau, Y|\mathbf{X}} = m_1(X_1) + m_2(X_2) + \cdots + m_p(X_p)$. We denote by $M_* = \{j : m_j(X_j) \neq 0,\ 1 \le j \le p\}$ the active set, that is, the set of indices associated with the nonzero components in the true model, which is often assumed to be sparse.
Let $|S_j|$ be the cardinality of a set $S_j$, and let $m_{jk}$ and $g_{jk}$, $k \in S_j$, be smooth functions satisfying the conditions given below. For identifiability, we require that $\int m_{jk}(x)\,dx = 0$ and $E\{g_{jk}(X_k)\} = 0$ for all $j, k$. Set $m_j(\mathbf{X}_{S_j}) = \sum_{k \in S_j} m_{jk}(X_k)$ and $g_j(\mathbf{X}_{S_j}) = \sum_{k \in S_j} g_{jk}(X_k)$. A nonparametric version of the QPC (denoted NQPC) is formulated as
$$\varrho_\tau(Y, X_j \mid \mathbf{X}_{S_j}) = \frac{E\{\psi_\tau(Y - m_{j0}(\mathbf{X}_{S_j}))(X_j - g_{j0}(\mathbf{X}_{S_j}))\}}{\sqrt{\tau(1-\tau)\sigma_{j,0}^2}}, \tag{4}$$
where $\sigma_{j,0}^2 = \operatorname{var}(X_j - g_{j0}(\mathbf{X}_{S_j}))$, $m_{j0} = \arg\min_{m_j} E\{\rho_\tau(Y - m_j(\mathbf{X}_{S_j}))\}$ and $g_{j0} = \arg\min_{g_j} E\{(X_j - g_j(\mathbf{X}_{S_j}))^2\}$. Suppose we have a dataset $\{(Y_i, \mathbf{X}_i), i = 1, \ldots, n\}$ consisting of $n$ independent copies of $(Y, \mathbf{X})$, where the dimensionality of $\mathbf{X}_i$ is $p_n$. Let $\mathbf{X}_{i,S_j}$ be the sub-vector of $\mathbf{X}_i$ indexed by $S_j$. Then, a sample estimate of the NQPC is
$$\tilde\varrho_\tau(Y, X_j \mid \mathbf{X}_{S_j}) = \frac{n^{-1}\sum_{i=1}^n \psi_\tau(Y_i - \tilde m_j(\mathbf{X}_{i,S_j}))(X_{ij} - \tilde g_j(\mathbf{X}_{i,S_j}))}{\sqrt{\tau(1-\tau)\tilde\sigma_j^2}},$$
where $\tilde\sigma_j^2 = n^{-1}\sum_{i=1}^n (X_{ij} - \tilde g_j(\mathbf{X}_{i,S_j}))^2$, $\tilde m_j = \arg\min_{m_j} \frac{1}{n}\sum_{i=1}^n \rho_\tau(Y_i - m_j(\mathbf{X}_{i,S_j}))$ and $\tilde g_j = \arg\min_{g_j} \frac{1}{n}\sum_{i=1}^n (X_{ij} - g_j(\mathbf{X}_{i,S_j}))^2$. Since $m_{jk}$ and $g_{jk}$ are unknown nonparametric functions, $\tilde m_j$ and $\tilde g_j$ cannot be computed directly, rendering $\tilde\varrho_\tau(Y, X_j \mid \mathbf{X}_{S_j})$ infeasible. In what follows, we estimate each of the $m_{jk}$s and $g_{jk}$s by means of a nonparametric B-spline approximation.
To proceed, let $\{B_k(\cdot), k = 1, \ldots, L_n\}$ with $\|B_k\|_\infty \le 1$ denote a sequence of normalized and centered B-spline basis functions, where $L_n$ is the number of basis functions. According to the theory of B-spline approximation ([29]), for a generic smooth function $m$ there exists a vector $\gamma \in \mathbb{R}^{L_n}$ such that $m(x) \approx B(x)^T\gamma$, where $B(\cdot) = (B_1(\cdot), \ldots, B_{L_n}(\cdot))^T$. Therefore, there exist vectors $\alpha_{jk} \in \mathbb{R}^{L_n}$ and $\theta_{jk} \in \mathbb{R}^{L_n}$ such that $m_{jk}(X_k) \approx B(X_k)^T\alpha_{jk}$ and $g_{jk}(X_k) \approx B(X_k)^T\theta_{jk}$. Since $\int m_{jk}(x)\,dx = 0$ and $E\{g_{jk}(X_k)\} = 0$, the centering naturally gives $E\{B(X_k)\} = 0$ for $k \in S_j$. Write $\alpha_j = (\{\alpha_{jk}^T, k \in S_j\})^T$, $\theta_j = (\{\theta_{jk}^T, k \in S_j\})^T$ and $\mathbf{B}_j = (\{B(X_k)^T, k \in S_j\})^T$. Denote $\hat m_j(\mathbf{X}_{i,S_j}) = \mathbf{B}_{ij}^T\hat\alpha_j$, $\hat g_j(\mathbf{X}_{i,S_j}) = \mathbf{B}_{ij}^T\hat\theta_j$ and $\hat\sigma_j^2 = n^{-1}\sum_{i=1}^n (X_{ij} - \hat g_j(\mathbf{X}_{i,S_j}))^2$, where
$$\hat\alpha_j = \arg\min_{\alpha_j} \frac{1}{n}\sum_{i=1}^n \rho_\tau(Y_i - \mathbf{B}_{ij}^T\alpha_j)$$
and
$$\hat\theta_j = \arg\min_{\theta_j} \frac{1}{n}\sum_{i=1}^n (X_{ij} - \mathbf{B}_{ij}^T\theta_j)^2,$$
where $\mathbf{B}_{ij}$ denotes $\mathbf{B}_j$ with each $B(X_k)$ replaced by $B(X_{ik})$ for $i = 1, \ldots, n$ and $k \in S_j$. Then, a feasible sample estimate of the NQPC is given by
$$\hat\varrho_\tau(Y, X_j \mid \mathbf{X}_{S_j}) = \frac{n^{-1}\sum_{i=1}^n \psi_\tau(Y_i - \hat m_j(\mathbf{X}_{i,S_j}))(X_{ij} - \hat g_j(\mathbf{X}_{i,S_j}))}{\sqrt{\tau(1-\tau)\hat\sigma_j^2}}. \tag{7}$$
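For concreteness, a minimal sketch of the feasible estimator in (7) is given below, assuming the 'splines' and 'quantreg' packages; the basis size Ln and the function name nqpcor are illustrative choices, not part of the paper.

```r
# Feasible NQPC sketch (Equation (7)): expand each conditioning variable in a
# centered B-spline basis, fit additive quantile and least-squares models, and
# correlate the two residuals (assumes splines and quantreg).
library(splines)
library(quantreg)

nqpcor <- function(y, xj, XS, tau = 0.5, Ln = 5) {
  # column-centered B-spline design, one block per variable in S_j
  B <- do.call(cbind, lapply(seq_len(ncol(XS)), function(k)
    scale(bs(XS[, k], df = Ln), center = TRUE, scale = FALSE)))
  psi <- tau - as.numeric(residuals(rq(y ~ B, tau = tau)) < 0)  # psi_tau(Y - m_hat)
  e   <- residuals(lm(xj ~ B))                                  # X_j - g_hat
  mean(psi * e) / sqrt(tau * (1 - tau) * mean(e^2))             # plug into (7)
}
```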
Next, we employ the above NQPC estimator as a screening utility. To this end, denote by $\hat M_{\nu_n}$ the active set selected by the screening procedure, namely the set of variables whose absolute sample NQPC exceeds a user-specified threshold value $\nu_n$:
$$\hat M_{\nu_n} = \{j : |\hat\varrho_\tau(Y, X_j \mid \mathbf{X}_{S_j})| \ge \nu_n \text{ for } 1 \le j \le p\}.$$
We call this procedure NQPC-based variable screening, abbreviated NQPC-SIS. In the next section, we provide theoretical justification for this approach.

3. Theoretical Properties

To state our theoretical results, we first introduce some notation. Let $r_n = \max_{1\le j\le p}|S_j|$. Throughout the rest of the paper, for any matrix $\mathbf{A}$ we use $\|\mathbf{A}\| = \sqrt{\lambda_{\max}(\mathbf{A}^T\mathbf{A})}$ and $\|\mathbf{A}\|_\infty = \max_{i,j}|A_{ij}|$ to denote the operator norm and the infinity norm, and $\lambda_{\min}(\mathbf{A})$ and $\lambda_{\max}(\mathbf{A})$ stand for the minimum and maximum eigenvalues of a symmetric matrix $\mathbf{A}$. In addition, for any vector $\mathbf{a}$, $\|\mathbf{a}\| = \sqrt{\sum_i a_i^2}$ denotes the Euclidean norm.
Denote $u_j = |\varrho_\tau(Y, X_j \mid \mathbf{X}_{S_j})|$ and $\hat u_j = |\hat\varrho_\tau(Y, X_j \mid \mathbf{X}_{S_j})|$, where $\varrho_\tau(Y, X_j \mid \mathbf{X}_{S_j})$ is given in Equation (4) and $\hat\varrho_\tau(Y, X_j \mid \mathbf{X}_{S_j})$ is given in Equation (7). Further, denote $u_j^* = |\varrho_\tau^*(Y, X_j \mid \mathbf{X}_{S_j})|$, where
$$\varrho_\tau^*(Y, X_j \mid \mathbf{X}_{S_j}) = \frac{E\{\psi_\tau(Y - \mathbf{B}_j^T\alpha_{j0})(X_j - \mathbf{B}_j^T\theta_{j0})\}}{\sqrt{\tau(1-\tau)\sigma_j^2}}, \tag{8}$$
where $\sigma_j^2 = \operatorname{var}(X_j - \mathbf{B}_j^T\theta_{j0})$, $\alpha_{j0} = \arg\min_{\alpha_j} E\{\rho_\tau(Y - \mathbf{B}_j^T\alpha_j)\}$ and $\theta_{j0} = \arg\min_{\theta_j} E\{(X_j - \mathbf{B}_j^T\theta_j)^2\}$. Before establishing the uniform convergence of $\hat u_j$ to $u_j$, we first investigate the gap between $u_j$ and $u_j^*$, which helps in understanding the marginal signal level after the B-spline approximation is applied to the population utility. We need the following conditions:
(B1) 
We assume that $E\{X_j \mid \mathbf{X}_{S_j}\} = g_{j0}(\mathbf{X}_{S_j}) = \sum_{k\in S_j} g_{jk0}(X_k)$, and let $\mathcal{X}_k$ denote the support of the covariate $X_k$. There exist positive constants $C_g$ and $C_m$ such that, for any $k \in S_j$,
$$\max_{1\le j\le p}\sup_{x\in\mathcal{X}_k}|g_{jk0}(x) - B(x)^T\theta_{jk0}| \le C_g L_n^{-d}, \qquad \max_{1\le j\le p}\sup_{x\in\mathcal{X}_k}|m_{jk0}(x) - B(x)^T\alpha_{jk0}| \le C_m L_n^{-d},$$
where $d$ is defined in condition (C1) below.
(B2) 
There exist positive constants $c_{\sigma,\min}$, $c_{\sigma,\max}$, $\tilde c_{\sigma,\min}$, $\tilde c_{\sigma,\max}$ such that
$$0 < c_{\sigma,\min} \le \min_{1\le j\le p}\sigma_j^2 \le \max_{1\le j\le p}\sigma_j^2 \le c_{\sigma,\max} < \infty, \qquad 0 < \tilde c_{\sigma,\min} \le \min_{1\le j\le p}\sigma_{j,0}^2 \le \max_{1\le j\le p}\sigma_{j,0}^2 \le \tilde c_{\sigma,\max} < \infty,$$
where $\sigma_j^2$ and $\sigma_{j,0}^2$ are given in (8) and (4), respectively.
(B3) 
In a neighborhood of $\mathbf{B}_j^T\alpha_{j0}$, the conditional density of $Y$ given $(X_j, \mathbf{X}_{S_j})$, $f_{Y|(X_j,\mathbf{X}_{S_j})}(y)$, is bounded on the support of $(X_j, \mathbf{X}_{S_j})$, uniformly in $j$.
(B4) 
$\min_{j\in M_*} u_j \ge C_0 r_n n^{-\kappa}$ for some $C_0 > 0$ and $0 < \kappa < 1/2$.
Condition (B1) is the approximation error condition imposed on nonparametric functions in the B-spline smoothing literature (e.g., [11,30,31]). Condition (B2) requires the variances $\sigma_j^2$ and $\sigma_{j,0}^2$ to be uniformly bounded. Condition (B3) implies that there exists a finite constant $\bar c_f > 0$ such that, for small $\epsilon > 0$, $\sup_{|y - \mathbf{B}_j^T\alpha_{j0}| < \epsilon} f_{Y|(X_j,\mathbf{X}_{S_j})}(y) \le \bar c_f$ holds uniformly. Condition (B4) guarantees that the marginal signal of the active components in $M_*$ does not vanish. These conditions are similar to those in [17].
Proposition 1.
Under conditions (B1)–(B3), there exists a positive constant $M_1^*$ such that
$$|u_j - u_j^*| \le M_1^* r_n L_n^{-d}.$$
In addition, if condition (B4) also holds, then
$$\min_{j\in M_*} u_j^* \ge C_0\xi r_n n^{-\kappa},$$
provided that $L_n^{-d} \le C_0(1-\xi)n^{-\kappa}/M_1^*$ for some $\xi \in (0,1)$.
To establish the sure screening property, we make the following assumptions:
(C1) 
$\{m_{jk}\}$ and $\{g_{jk}\}$ belong to a class of functions $\mathcal{F}$ whose $r$th derivatives $m_{jk}^{(r)}$ and $g_{jk}^{(r)}$ exist and are Lipschitz of order $\alpha$,
$$\mathcal{F} = \{b(\cdot) : |b^{(r)}(s) - b^{(r)}(t)| \le K|s - t|^\alpha \text{ for } s, t \in [a, b]\}$$
for some positive constant $K$, where $[a, b]$ is the support of $X_k$, $r$ is a non-negative integer and $\alpha \in (0, 1]$ is such that $d = r + \alpha > 0.5$.
(C2) 
The joint density of $\mathbf{X}$, $f_{\mathbf{X}}$, is bounded by two positive numbers $b_{1f}$ and $b_{2f}$: $b_{1f} \le f_{\mathbf{X}} \le b_{2f}$. The density of $X_j$, $f_{X_j}$, is bounded away from zero and infinity uniformly in $j$; that is, there exist two positive constants $c_{1f}$ and $c_{2f}$ such that $c_{1f} \le f_{X_j}(x) \le c_{2f}$.
(C3) 
There exist two positive constants $K_1$ and $K_2$ such that $P(|X_j| > x \mid \mathbf{X}_{-j}) \le K_1\exp(-K_2^{-1}x)$ for every $j$.
(C4) 
The conditional density of $Y$ given $\mathbf{X} = \mathbf{x}$, $f_{Y|\mathbf{X}=\mathbf{x}}(y)$, satisfies a first-order Lipschitz condition, and $c_{3f} \le f_{Y|\mathbf{X}=\mathbf{x}}(y) \le c_{4f}$ for some positive constants $c_{3f}$ and $c_{4f}$ and any $y$ in a neighborhood of $\mathbf{B}_j^T\alpha_{j0}$ for $1 \le j \le p$.
(C5) 
There exist positive constants $M_1$ and $M_2$ such that $\sup_{i,j}|\mathbf{B}_{ij}^T\alpha_{j0}| \le M_1 < \infty$ and $\sup_{i,j}|\mathbf{B}_{ij}^T\theta_{j0}| \le M_2 < \infty$. Furthermore, assume that $\min_{1\le j\le p}\sigma_j^2 \ge M_3 > 0$ for some constant $M_3$.
(C6) 
There exists a constant $\xi \in (0, 1)$ such that $L_n^{-d} \le C_0(1-\xi)n^{-\kappa}/M_1^*$.
Condition (C1) is the smoothness assumption on $\{m_{jk}\}$ and $\{g_{jk}\}$ common in the nonparametric B-spline literature ([7,32]). Condition (C3) is a moment constraint on each predictor. Conditions (C2), (C4) and (C5) are similar to those imposed in [17]. Condition (C6) ensures that the marginal signal level of the truly active variables is not too weak after the B-spline approximation. The above conditions are standard in the variable screening literature (e.g., [17,28]).
According to the properties of normalized B-splines, under conditions (C1) and (C2) (cf. [33,34]), for each $j = 1, \ldots, p$ and $k = 1, \ldots, L_n$ there exist positive constants $C_1$, $C_2$ and $C_3$, independent of $j$ and $k$, such that
$$C_1 L_n^{-1} \le \lambda_{\min}(E\{B(X_j)B(X_j)^T\}) \le \lambda_{\max}(E\{B(X_j)B(X_j)^T\}) \le C_2 L_n^{-1} \tag{9}$$
and
$$E\{B_k^2(X_j)\} \le C_3 L_n^{-1}. \tag{10}$$
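These rates are easy to verify empirically. The following illustrative snippet (our own check, assuming the 'splines' package; it is not from the paper) shows that the extreme eigenvalues of the sample analogue of $E\{B(X_j)B(X_j)^T\}$ scale like $L_n^{-1}$ for a uniform covariate.

```r
# Illustrative check of the L_n^{-1} eigenvalue scaling in (9) (assumes splines).
library(splines)
set.seed(1)
x <- runif(5000)
for (Ln in c(4, 8, 16)) {
  B  <- scale(bs(x, df = Ln), center = TRUE, scale = FALSE)  # centered basis
  ev <- eigen(crossprod(B) / length(x), symmetric = TRUE)$values
  cat(sprintf("Ln = %2d: Ln * min(ev) = %.3f, Ln * max(ev) = %.3f\n",
              Ln, Ln * min(ev), Ln * max(ev)))
}
```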
The following lemma bounds the eigenvalues of the B-spline basis matrix from below and above. This result extends Lemma 3 of [32] from a fixed dimension to a diverging dimension and may be of independent interest.
Lemma 1.
Suppose that conditions (C1) and (C2) hold. Then we have
$$C_1\left(\frac{1-\delta_0}{2}\right)^{|S_j|-1} L_n^{-1} \le \lambda_{\min}(E\{\mathbf{B}_j\mathbf{B}_j^T\}) \le \lambda_{\max}(E\{\mathbf{B}_j\mathbf{B}_j^T\}) \le C_2|S_j|L_n^{-1},$$
where $\delta_0 = (1 - b_{1f}^2 b_{2f}^{-2}\zeta)^{1/2}$ for some constant $0 < \zeta < 1$.
This result reveals that $r_n$ plays an important role in bounding the eigenvalues of the B-spline basis matrix: when $r_n$ diverges rapidly, the minimum eigenvalue of the basis matrix can decay to zero at an exponential rate. Accordingly, for the following result to hold, $r_n$ cannot grow at a polynomial rate in $n$, but it may grow at a rate of order $\log n$.
Theorem 1.
Suppose that conditions (B1)–(B4) and (C1)–(C5) hold, and assume that $a_0^{-2r_n}L_n/n^{1-2\kappa} = o(1)$ and $a_0^{-2r_n}r_n^3L_n n^{-\kappa} = o(1)$.
(i)
For any $C > 0$, there exist positive constants $c_6^*$ and $c_{14}^*$ such that, for $0 < \kappa < 1/2$ and sufficiently large $n$,
$$P\left(\max_{1\le j\le p_n}|\hat u_j - u_j^*| \ge Cr_nn^{-\kappa}\right) \le p_n\left\{7\exp\big(-c_6^*a_0^{2r_n}r_n^{-2}n^{1-4\kappa}\big) + \big[116(r_nL_n)^2 + 60r_nL_n + 10\big]\exp\big(-c_{14}^*a_0^{2r_n}L_n^{-3}n^{1-2\kappa}\big)\right\},$$
where $a_0 = (1-\delta_0)/2$ and $\delta_0$ is given in Lemma 1.
(ii)
If, in addition, condition (C6) is satisfied, then by choosing $\nu_n = \tilde C_0 r_n n^{-\kappa}$ with $\tilde C_0 \le C_0\xi/2$, we have
$$P\left(M_* \subseteq \hat M_{\nu_n}\right) \ge 1 - s_n\left\{7\exp\big(-c_6^*a_0^{2r_n}r_n^{-2}n^{1-4\kappa}\big) + \big[116(r_nL_n)^2 + 60r_nL_n + 10\big]\exp\big(-c_{14}^*a_0^{2r_n}L_n^{-3}n^{1-2\kappa}\big)\right\}$$
for sufficiently large $n$, where $s_n = |M_*|$.
The above establishes the sure screening property: all relevant variables are recruited into the final model with probability going to one. The probability bound is free of $p_n$ but depends on $r_n$ and the number of basis functions $L_n$. Although this ensures that NQPC-SIS retains all important predictors with high probability, noisy variables may also be included by NQPC-SIS. Ideally, by the choice of $\nu_n$ in Theorem 1 and by requiring $\max_{j\notin M_*}|\varrho_\tau^*(Y, X_j \mid \mathbf{X}_{S_j})| = o(r_n n^{-\kappa})$, one can achieve selection consistency, i.e.,
$$P(M_* = \hat M_{\nu_n}) \to 1$$
as $n \to \infty$. This property can also be obtained from Theorem 1 by assuming that $\varrho_\tau^*(Y, X_j \mid \mathbf{X}_{S_j}) = 0$ for $j \notin M_*$; however, this would be too restrictive to check in practice. Similar to [17], we may instead assume that $\sum_{j=1}^p u_j^* = O(n^\varsigma)$ for some $\varsigma > 0$ to control the false selection rate. With this condition, we obtain the following bound on the size of the selected model.
Theorem 2.
Under the conditions of Theorem 1, choosing $\nu_n = \tilde C_0 r_n n^{-\kappa}$ with $\tilde C_0 \le C_0\xi/2$, and assuming $\sum_{j=1}^p u_j^* = O(n^\varsigma)$ for some $\varsigma > 0$, there exist a positive constant $C^*$ and constants $\tilde c_6^*$, $\tilde c_{14}^*$ such that
$$P\left(|\hat M_{\nu_n}| \le C^* r_n^{-1} n^{\kappa+\varsigma}\right) \ge 1 - p_n\left\{7\exp\big(-\tilde c_6^*a_0^{2r_n}r_n^{-2}n^{1-4\kappa}\big) + \big[116(r_nL_n)^2 + 60r_nL_n + 10\big]\exp\big(-\tilde c_{14}^*a_0^{2r_n}L_n^{-3}n^{1-2\kappa}\big)\right\}$$
for sufficiently large $n$.
This theorem reveals that an application of NQPC-SIS reduces the dimensionality from an exponential order to a polynomial order of $n$ while retaining all important predictors with probability approaching one.

4. Algorithm for NQPC-SIS

To make the NQPC-SIS practically applicable, for each $X_j$ we need to specify the conditional set $S_j$. We note that [17] developed a sequential test to identify $S_j$ via an application of Fisher's Z-transformation [35] and the partial correlation. In this section, we provide a two-stage procedure based on the nonparametric additive quantile regression model, which can be viewed as complementary to [17].
To reduce the computational burden, we first apply the quantile-adaptive model-free feature screening (Qa-SIS) proposed by [13] to select a subset of $\{X_j, 1 \le j \le p_n\}$, denoted by $\hat M_{\text{Qa-SIS}}$ with $|\hat M_{\text{Qa-SIS}}| = \lfloor 0.5 n L_n^{-1}/\log(nL_n^{-1})\rfloor + 1$, where $L_n$ is the number of basis functions used in Qa-SIS and $\lfloor a\rfloor$ denotes the largest integer not exceeding $a$. Second, for each $X_j$, if $X_j \in \hat M_{\text{Qa-SIS}}$ we set $C_j = \{X_k \mid X_k \in \hat M_{\text{Qa-SIS}}, k \neq j\}$; otherwise, $C_j$ consists of the first $|\hat M_{\text{Qa-SIS}}| - 1$ ranked variables in $\hat M_{\text{Qa-SIS}}$. Thus, $|C_j| = \lfloor 0.5 n L_n^{-1}/\log(nL_n^{-1})\rfloor$. Third, we carry out variable selection with the SCAD penalty [2] based on an additive quantile regression model for the data set $\{(X_{ij}, \mathbf{X}_{iC_j}), i = 1, \ldots, n\}$, yielding a small reduced subset denoted by $C_j^v$. This two-stage procedure helps to find the conditional subset for the $j$th variable and is incorporated in the following algorithm. With a slight abuse of notation, we use $d_n$ to denote the screening threshold parameter of the NQPC-SIS; in other words, NQPC-SIS selects the $d_n$ covariates corresponding to the first $d_n$ largest NQPCs.
Algorithm 1 has the same spirit as the QPCS algorithm of [17], which was demonstrated empirically to outperform their QTCS and QFR algorithms. In the implementation, we choose $d_n^* = \lfloor 0.5 n L_n^{-1}/\log(nL_n^{-1})\rfloor$ and $d_n = \lfloor n/\log n\rfloor$, though other choices are not excluded; according to our limited simulation experience, this choice works satisfactorily. The values of $d_n^*$ and $r_n$ cannot be too large, owing to the use of B-spline basis approximations. Theoretically, we need to specify $d_n^*$ such that $d_n^* \ge r_n$, while in practice it suffices to require $L_n d_n^* < n$.
Algorithm 1 The implementation of NQPC-SIS.
1:
Given $d_n$, set a pre-specified number $d_n^* \le d_n$ and an initial set $A^{(0)} = \emptyset$.
2:
For $k = 1, \ldots, d_n^*$:
(2a)
update $S_j = A^{(k-1)} \cup C_j^v$;
(2b)
update $A^{(k)} = A^{(k-1)} \cup \{j^*\}$, where the variable index $j^*$ is defined by
$$j^* = \arg\max_{j \notin A^{(k-1)}} |\hat\varrho_\tau(Y, X_j \mid \mathbf{X}_{S_j})|.$$
3:
For $k = d_n^* + 1, \ldots, d_n$:
(3a)
update $S_j = A^{(d_n^*)} \cup C_j^v$;
(3b)
update $A^{(k)} = A^{(k-1)} \cup \{j^*\}$, where the variable index $j^*$ is such that
$$j^* = \arg\max_{j \notin A^{(k-1)}} |\hat\varrho_\tau(Y, X_j \mid \mathbf{X}_{S_j})|.$$
4:
Repeat Step 3 until $k = d_n$. The final selected set is denoted by $\hat M$.
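A schematic R skeleton of Algorithm 1 is sketched below. It is our own illustration: nqpcor is the sketch from Section 2.2, qcor serves as a marginal fallback when the conditional set is empty, and Cjv is a list holding the pre-selected sets $C_j^v$; none of these names come from the authors' code.

```r
# Skeleton of Algorithm 1: greedy NQPC screening with a conditional set that
# grows with the selected set A up to step dn_star and is frozen afterwards.
nqpc_sis <- function(y, X, Cjv, tau = 0.5, dn_star, dn) {
  A <- integer(0)
  for (k in seq_len(dn)) {
    base <- if (k <= dn_star) A else A[seq_len(dn_star)]   # A^{(k-1)} or A^{(dn*)}
    cand <- setdiff(seq_len(ncol(X)), A)
    score <- vapply(cand, function(j) {
      Sj <- setdiff(union(base, Cjv[[j]]), j)              # S_j = A u C_j^v
      if (length(Sj) == 0) abs(qcor(y, X[, j], tau))       # marginal fallback
      else abs(nqpcor(y, X[, j], X[, Sj, drop = FALSE], tau))
    }, numeric(1))
    A <- c(A, cand[which.max(score)])                      # add j* to A^{(k)}
  }
  A   # the final selected set M_hat
}
```

For instance, with $n = 200$ and $L_n = \lfloor n^{1/5}\rfloor + 1 = 3$, the paper's recommended choices give $d_n^* = \lfloor 0.5\cdot 200/3/\log(200/3)\rfloor = 7$ and $d_n = \lfloor 200/\log 200\rfloor = 37$.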

5. Numerical Studies

5.1. Simulations

In this subsection, we conduct simulation studies to examine the finite-sample performance of the proposed NQPC-SIS. To evaluate performance, we employ three criteria: the minimum model size (MMS), i.e., the smallest number of covariates needed to contain all active variables; its robust standard deviation (RSD); and the proportion $\mathcal{P}$ of replications in which all active variables are selected, with the screening threshold parameter specified as $d_n = \lfloor n/\log n\rfloor$. Throughout this subsection, we adopt the following settings: sample size $n = 200$, number of basis functions $L_n = \lfloor n^{1/5}\rfloor + 1$, and dimensionality $p_n = 1000$. We simulate the random error $\varepsilon$ from two distributions, $N(0,1)$ and $t(3)$. Three quantile levels $\tau = 0.2, 0.5, 0.8$ are considered in all settings. For each scenario, the results are obtained over $N = 200$ replications.
Example 1.
Let $\mathbf{X} = (X_1, \ldots, X_{p_n})^T$ be a $p_n$-dimensional random vector following a multivariate normal distribution with mean zero and covariance matrix $\Sigma = (\sigma_{jk})_{1\le j,k\le p_n}$, where $\sigma_{jj} = 1$ and $\sigma_{jk} = \rho$ for $j \neq k$, except that $\sigma_{j4} = \sigma_{4j} = \sqrt{\rho}$. Generate the response as
$$Y = \beta X_1 + \beta X_2 + \beta X_3 - 3\beta\sqrt{\rho}X_4 + \varepsilon.$$
It is easily seen that the marginal Pearson correlation between $X_4$ and $Y$ is zero. We take $\rho = 0.5, 0.8$ and set $\beta = 2.5(1 + |\tau - 0.5|)$ to incorporate the quantile information.
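For reproducibility, one positive-definite construction consistent with this design uses a common factor; this is our own sketch, and it presumes the $\sqrt{\rho}$ reading of the covariance entries for $X_4$.

```r
# Example 1 data sketch: equicorrelated predictors with corr(Xj, Xk) = rho and
# corr(X4, Xj) = sqrt(rho), so that cov(X4, Y) = 3*beta*sqrt(rho) - 3*beta*sqrt(rho) = 0.
set.seed(1)
n <- 200; p <- 1000; rho <- 0.5; tau <- 0.5
beta <- 2.5 * (1 + abs(tau - 0.5))
Z <- rnorm(n)                                        # shared factor
X <- sqrt(rho) * Z + sqrt(1 - rho) * matrix(rnorm(n * p), n, p)
X[, 4] <- Z                                          # gives corr(X4, Xj) = sqrt(rho)
Y <- beta * (X[, 1] + X[, 2] + X[, 3]) - 3 * beta * sqrt(rho) * X[, 4] + rnorm(n)
round(cor(X[, 4], Y), 2)                             # near-zero marginal correlation
```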
Example 2.
We follow the simulation model of [17] and generate the response as
$$Y = \beta X_1 + \beta X_2 + \beta X_3 - 3\beta\sqrt{\rho}X_4 - 0.25\beta X_5 + \varepsilon,$$
where $\beta$, $\rho$ and $\mathbf{X}$ are defined as in Example 1, except that $\sigma_{j5} = \sigma_{5j} = 0$, so that $X_5$ is uncorrelated with $X_j$, $j \neq 5$.
Example 3.
We simulate the response from the following nonlinear model:
$$Y = 3g_1(X_1) + 3g_2(X_2) + 3g_3(X_3) + 3g_4(X_4) + 3g_5(X_5) + \varepsilon,$$
where $g_1(x) = 1.5x$, $g_2(x) = 2x(2x-1)$, $g_3(x) = \sin(2\pi x)/(2 - \sin(2\pi x))$, $g_4(x) = \sin(2\pi x)$ and $g_5(x) = e^x - 0.5$. The covariates $\mathbf{X} = (X_1, \ldots, X_{p_n})$ are simulated from the random-effects model $X_j = (W_j + tU)/(1 + t)$, $j = 1, \ldots, p_n$, where the $W_j$s and $U$ are i.i.d. $\mathrm{Unif}(0,1)$. We consider the two cases $t = 1$ and $t = 2$, corresponding to $\operatorname{corr}(X_j, X_k) = t^2/(1+t^2) = 0.5$ and $0.8$ for $j \neq k$, respectively.
Example 4.
We consider the same model as in Example 3, except that $X_2$ and $X_5$ are replaced by $X_2 = \cos(2\pi X_6) + \epsilon$ and $X_5 = (X_1 - 0.5)^2 + \epsilon$, where $\epsilon \sim N(0,1)$ is independent of $\varepsilon$, the error in the model of Example 3.
The simulation results for Examples 1–4 are shown in Table 1, Table 2, Table 3 and Table 4, respectively. The results in Table 1 show that when the true relationship between the response and the covariates is linear, the SIS, NIS and Qa-SIS methods fail to work. In comparison, both QPC-SIS and NQPC-SIS with $\tau = 0.5$ work reasonably well, although QPC-SIS slightly outperforms our NQPC-SIS when $\rho = 0.8$. This is expected because the QPC is designed for models with a linear relationship among the covariates. A similar observation can be drawn from Table 2 for Example 2, which is also a linear model, albeit one in which $X_5$ and $X_j$, $j \neq 5$, are independent. The results in Table 3 indicate that when the relationship between $Y$ and $\mathbf{X}$ is nonlinear while the relationship among the covariates is linear, our proposed NQPC-SIS performs best, followed by QPC-SIS. From Table 4, we see that when the relationship between $Y$ and $\mathbf{X}$ is nonlinear and there is also a nonlinear relationship among the components of $\mathbf{X}$, NQPC-SIS works most satisfactorily and is much better than Qa-SIS and QPC-SIS in terms of both MMS and the selection rate $\mathcal{P}$.
In addition, the simulation results of QPC-SIS and NQPC-SIS for Examples 1–4 with $\rho = 0.9$ and $\tau = 0.5$ are reported in Table 5. When the sample size increases from 200 to 400, the performance of both QPC-SIS and NQPC-SIS improves considerably; QPC-SIS and NQPC-SIS perform very competitively in Examples 1 and 2, while NQPC-SIS performs significantly better than QPC-SIS in Examples 3 and 4. This evidence indicates the effectiveness and usefulness of our NQPC-SIS.
As suggested by an anonymous reviewer, we add one more simulation to compare our NQPC-SIS with the following two approaches: (a) QC-SIS, the screening method based on the quantile correlation, which simply ignores the effect of conditional variables on the response; and (b) RFQPC-SIS, a procedure very similar to our NQPC-SIS that instead removes the effect of conditional variables by fitting random forest models. We examine the performance of these three approaches under $\tau = 0.5$ and $n = 200$ for Examples 1 to 4, where RFQPC-SIS is a variant of the NQPC method implemented with randomForest in the R package "randomForest". Note that RFQPC-SIS requires $2(p - |A^{(k)}|)$ random forest regressions in the $k$th iteration, which is highly computationally intensive. Here, we evaluate NQPC-SIS, QC-SIS and RFQPC-SIS using the effective model size (EMS) and $\mathcal{P}$, where EMS denotes the average number of true variables contained among the first $d_n = \lfloor n/\log(n)\rfloor$ selected variables over the 200 replications. The results, reported in Table 6, show that our NQPC-SIS still performs best, followed by RFQPC-SIS. Moreover, the computational cost of NQPC-SIS is much lower than that of RFQPC-SIS.

5.2. An Application to Breast Cancer Data

In this subsection, we apply the proposed NQPC-SIS to the breast cancer data reported by [36]. The data consist of 19,672 gene expression measurements and 2,149 CGH measurements on 89 cancer patient samples, available at https://github.com/bnaras/PMA/blob/master/data/breastdata.rda (accessed on 18 June 2021). Our interest here is to detect the genes that have the greatest impact on the comparative genomic hybridization (CGH) measurements; a similar goal was pursued in [25,37]. Following [37], we take the first principal component of 136 CGH measurements as the response $Y$ and the remaining 18,672 gene probes as the explanatory variables $\mathbf{X}$. For the sake of comparison, we implement a two-stage procedure: a variable screening method is applied in the first stage and a predictive regression model is fitted in the second stage. To this end, we select $d_n = \lfloor n/\log(n)\rfloor$ variables in the first stage using one of the screening methods mentioned in the simulation study: SIS, NIS, Qa-SIS, QPC-SIS or NQPC-SIS. In the second stage, we randomly select 80% of the sample as the training set and use the remaining 20% as the test set. Then, we apply a machine learning method, the regression tree, to the dimension-reduced data and examine the finite-sample performance on the test set, using the command M5P in the R package "RWeka". We use the mean absolute prediction error (MAPE), defined as
$$\mathrm{MAPE} = \frac{1}{n^{(test)}}\sum_{i=1}^{n^{(test)}}\left|Y_i^{(test)} - \hat Y_i^{(test)}\right|,$$
as our evaluation index, where $n^{(test)}$ is the number of observations in the test set and $\hat Y_i^{(test)}$ is the predicted value of $Y$ at observation $\mathbf{x}_i$ in the test set. We repeat the above procedure 500 times and report the mean and standard deviation of the 500 MAPEs in Table 7. According to Table 7, NQPC-SIS outperforms SIS, NIS and Qa-SIS. In particular, our NQPC-SIS produces the lowest prediction error (MAPE) among these methods at $\tau = 0.4$, $\tau = 0.5$ and $\tau = 0.7$. Moreover, QPC-SIS performs better than our NQPC-SIS at $\tau = 0.3$ and $\tau = 0.6$, but worse at the other three quantile levels, while Qa-SIS performs worst among these methods. This evidence supports that the proposed NQPC-SIS works well for this real data set.
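As a sketch of one pass of this evaluation loop (our illustration, assuming 'RWeka' is installed and that the response Y and the screened design matrix Xsel are already available; both names are ours):

```r
# One train/test split of the second-stage evaluation: fit an M5P regression
# tree on 80% of the screened data and compute MAPE on the held-out 20%.
library(RWeka)
mape  <- function(y, yhat) mean(abs(y - yhat))

idx   <- sample(nrow(Xsel), floor(0.8 * nrow(Xsel)))        # 80% training rows
train <- data.frame(Y = Y[idx],  Xsel[idx,  , drop = FALSE])
test  <- data.frame(Y = Y[-idx], Xsel[-idx, , drop = FALSE])
fit   <- M5P(Y ~ ., data = train)
mape(test$Y, predict(fit, newdata = test))                  # averaged over 500 repeats
```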

6. Concluding Remarks

In this paper, we proposed a nonparametric quantile partial correlation-based variable screening approach (NQPC-SIS), which can be viewed as an extension of the QPC-SIS of [17] from a parametric framework to a nonparametric one. The proposed NQPC-SIS enjoys the sure screening property under mild technical conditions. Furthermore, an implementation algorithm for NQPC-SIS is provided. Extensive numerical experiments, including simulations and a real-data analysis, are carried out for illustration. The numerical results show that NQPC-SIS works fairly well, especially when the relationships between variables are highly nonlinear.

Author Contributions

All the authors contributed to formulating the research idea, methodology, theory, algorithm design, result analysis, writing and reviewing the research. Conceptualization, X.X. and H.M.; methodology, X.X.; software, H.M.; validation, H.M.; formal analysis, H.M.; investigation, X.X.; writing—original draft preparation, H.M.; writing—review and editing, X.X.; supervision, X.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Fundamental Research Funds for the Central Universities (Grant No. 2021CDJQY-047) and National Natural Science Foundation of China (Grant No. 11801202).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Technical Proofs

Proof of Proposition 1.
First, recalling the definitions of $u_j$ and $u_j^*$, we have the algebraic decomposition
$$\sqrt{\tau(1-\tau)}\left[\varrho_\tau(Y, X_j \mid \mathbf{X}_{S_j}) - \varrho_\tau^*(Y, X_j \mid \mathbf{X}_{S_j})\right] = (\sigma_{j,0}\sigma_j)^{-1}(\sigma_j - \sigma_{j,0})E\{\psi_\tau(Y - m_{j0}(\mathbf{X}_{S_j}))(X_j - g_{j0}(\mathbf{X}_{S_j}))\} + \sigma_j^{-1}E\{[\psi_\tau(Y - m_{j0}(\mathbf{X}_{S_j})) - \psi_\tau(Y - \mathbf{B}_j^T\alpha_{j0})](X_j - g_{j0}(\mathbf{X}_{S_j}))\} - \sigma_j^{-1}E\{\psi_\tau(Y - \mathbf{B}_j^T\alpha_{j0})[g_{j0}(\mathbf{X}_{S_j}) - \mathbf{B}_j^T\theta_{j0}]\} \equiv A_1 + A_2 + A_3 \ (\text{say}). \tag{A1}$$
Due to condition (B1), we observe that
$$\sigma_j^2 - \sigma_{j,0}^2 = E\{(X_j - \mathbf{B}_j^T\theta_{j0})^2\} - E\{(X_j - g_{j0}(\mathbf{X}_{S_j}))^2\} = E\{(g_{j0}(\mathbf{X}_{S_j}) - \mathbf{B}_j^T\theta_{j0})^2\} + 2E\{(g_{j0}(\mathbf{X}_{S_j}) - \mathbf{B}_j^T\theta_{j0})(X_j - g_{j0}(\mathbf{X}_{S_j}))\} = E\{(g_{j0}(\mathbf{X}_{S_j}) - \mathbf{B}_j^T\theta_{j0})^2\},$$
where the cross-product term vanishes because $E\{X_j - g_{j0}(\mathbf{X}_{S_j}) \mid \mathbf{X}_{S_j}\} = 0$ by condition (B1). This, together with condition (B1) and the basic inequality $\sqrt{a} - \sqrt{b} \le \sqrt{a - b}$ for $a > b > 0$, gives
$$\sigma_j - \sigma_{j,0} \le \left[E\Big\{\sum_{k\in S_j}\big(g_{jk0}(X_k) - B(X_k)^T\theta_{jk0}\big)\Big\}^2\right]^{1/2} \le C_g r_n L_n^{-d}. \tag{A2}$$
Using the Cauchy–Schwarz inequality, (A2) and the fact that $|\psi_\tau(u)| \le \max(\tau, 1-\tau) \le 1$, we have
$$|A_1| \le (\sigma_{j,0}\sigma_j)^{-1}(\sigma_j - \sigma_{j,0})E\{|\psi_\tau(Y - m_{j0}(\mathbf{X}_{S_j}))|^2\}^{1/2}E\{|X_j - g_{j0}(\mathbf{X}_{S_j})|^2\}^{1/2} \le \sigma_j^{-1}(\sigma_j - \sigma_{j,0}) \le c_{\sigma,\min}^{-1/2}C_g r_n L_n^{-d}. \tag{A3}$$
For $A_2$, we note that
$$E\{[\psi_\tau(Y - m_{j0}(\mathbf{X}_{S_j})) - \psi_\tau(Y - \mathbf{B}_j^T\alpha_{j0})](X_j - g_{j0}(\mathbf{X}_{S_j}))\} = E\{(X_j - g_{j0}(\mathbf{X}_{S_j}))A_{21}\},$$
where, by a Taylor expansion,
$$A_{21} = E\{\psi_\tau(Y - m_{j0}(\mathbf{X}_{S_j})) - \psi_\tau(Y - \mathbf{B}_j^T\alpha_{j0}) \mid X_j, \mathbf{X}_{S_j}\} = f_{Y|(X_j,\mathbf{X}_{S_j})}(y^*)\big(m_{j0}(\mathbf{X}_{S_j}) - \mathbf{B}_j^T\alpha_{j0}\big),$$
with $y^*$ lying between $m_{j0}(\mathbf{X}_{S_j})$ and $\mathbf{B}_j^T\alpha_{j0}$. Hence, by conditions (B1)–(B3) and the Cauchy–Schwarz inequality, we obtain
$$|A_2| \le \sigma_j^{-1}E\{|X_j - g_{j0}(\mathbf{X}_{S_j})| \cdot |A_{21}|\} \le \sigma_j^{-1}\bar c_f E\{|m_{j0}(\mathbf{X}_{S_j}) - \mathbf{B}_j^T\alpha_{j0}| \cdot |X_j - g_{j0}(\mathbf{X}_{S_j})|\} \le \sigma_j^{-1}\bar c_f\{E[|m_{j0}(\mathbf{X}_{S_j}) - \mathbf{B}_j^T\alpha_{j0}|^2]\}^{1/2}\{E[|X_j - g_{j0}(\mathbf{X}_{S_j})|^2]\}^{1/2} \le \sigma_j^{-1}\sigma_{j,0}\bar c_f C_m r_n L_n^{-d} \le c_{\sigma,\min}^{-1/2}\tilde c_{\sigma,\max}^{1/2}\bar c_f C_m r_n L_n^{-d} \tag{A4}$$
for some constant $\bar c_f > 0$.
For $A_3$, a similar argument yields
$$|A_3| \le \sigma_j^{-1}E\{|\psi_\tau(Y - \mathbf{B}_j^T\alpha_{j0})| \cdot |g_{j0}(\mathbf{X}_{S_j}) - \mathbf{B}_j^T\theta_{j0}|\} \le \sigma_j^{-1}\{E[|\psi_\tau(Y - \mathbf{B}_j^T\alpha_{j0})|^2]\}^{1/2}\{E[|g_{j0}(\mathbf{X}_{S_j}) - \mathbf{B}_j^T\theta_{j0}|^2]\}^{1/2} \le \sigma_j^{-1}C_g r_n L_n^{-d} \le c_{\sigma,\min}^{-1/2}C_g r_n L_n^{-d}. \tag{A5}$$
Therefore, combining (A1) with the results in (A3)–(A5), we have
$$\left|\varrho_\tau(Y, X_j \mid \mathbf{X}_{S_j}) - \varrho_\tau^*(Y, X_j \mid \mathbf{X}_{S_j})\right| \le [\tau(1-\tau)]^{-1/2}c_{\sigma,\min}^{-1/2}\big(2C_g + \tilde c_{\sigma,\max}^{1/2}\bar c_f C_m\big) r_n L_n^{-d}.$$
Using the basic inequality $||a| - |b|| \le |a - b|$, we immediately conclude that
$$|u_j - u_j^*| \le M_1^* r_n L_n^{-d},$$
where $M_1^* = [\tau(1-\tau)]^{-1/2}c_{\sigma,\min}^{-1/2}(2C_g + \tilde c_{\sigma,\max}^{1/2}\bar c_f C_m)$. This completes the proof. □
Proof of Lemma 1.
Without loss of generality, suppose that $S_j = \{1, 2, \ldots, s\}$. Then $\mathbf{B}_j = (B(X_1)^T, \ldots, B(X_s)^T)^T$. Let $\|\mathbf{a}\| = 1$, where $\mathbf{a} = (\mathbf{a}_1^T, \ldots, \mathbf{a}_s^T)^T$ with $\mathbf{a}_k \in \mathbb{R}^{L_n}$. On the one hand, since $(\sum_{i=1}^n x_i)^2 \le n\sum_{i=1}^n x_i^2$ by the Cauchy–Schwarz inequality, we have
$$\mathbf{a}^T E\{\mathbf{B}_j\mathbf{B}_j^T\}\mathbf{a} = E\Big[\sum_{k=1}^s \mathbf{a}_k^T B(X_k)\Big]^2 \le s\sum_{k=1}^s \mathbf{a}_k^T E\{B(X_k)B(X_k)^T\}\mathbf{a}_k.$$
This, together with the right-hand side of (9), implies that
$$\lambda_{\max}(E\{\mathbf{B}_j\mathbf{B}_j^T\}) \le s\,\lambda_{\max}(E\{B(X_k)B(X_k)^T\}) \le C_2 s L_n^{-1}. \tag{A6}$$
On the other hand, an application of Lemma S.1 of [38] leads to
$$\mathbf{a}^T E\{\mathbf{B}_j\mathbf{B}_j^T\}\mathbf{a} = E\Big[\sum_{k=1}^s \mathbf{a}_k^T B(X_k)\Big]^2 \ge \Big(\frac{1-\delta_0}{2}\Big)^{s-1}\sum_{k=1}^s E\{\mathbf{a}_k^T B(X_k)B(X_k)^T\mathbf{a}_k\} \ge \Big(\frac{1-\delta_0}{2}\Big)^{s-1}\lambda_{\min}(E\{B(X_k)B(X_k)^T\})\sum_{k=1}^s\|\mathbf{a}_k\|^2,$$
where $\delta_0 = (1 - b_{1f}^2 b_{2f}^{-2}\zeta)^{1/2}$ for some constant $0 < \zeta < 1$, and the last step uses the fact that $\mathbf{a}^T\mathbf{A}\mathbf{a} \ge \lambda_{\min}(\mathbf{A})$ for any $\|\mathbf{a}\| = 1$. It then follows from the left-hand side of (9) that
$$\mathbf{a}^T E\{\mathbf{B}_j\mathbf{B}_j^T\}\mathbf{a} \ge \Big(\frac{1-\delta_0}{2}\Big)^{s-1}C_1 L_n^{-1}\sum_{k=1}^s\|\mathbf{a}_k\|^2 = \Big(\frac{1-\delta_0}{2}\Big)^{s-1}C_1 L_n^{-1},$$
where the last equality holds because $\sum_{k=1}^s\|\mathbf{a}_k\|^2 = \|\mathbf{a}\|^2 = 1$. This in turn implies that
$$\lambda_{\min}(E\{\mathbf{B}_j\mathbf{B}_j^T\}) \ge \Big(\frac{1-\delta_0}{2}\Big)^{s-1}C_1 L_n^{-1}. \tag{A7}$$
Hence, combining (A6) and (A7) completes the proof of Lemma 1. □
Lemma A1.
Suppose that condition (C3) holds. Then, for all $r \ge 2$,
$$E(|X_j|^r \mid \mathbf{X}_{-j}) \le K_1 K_2^r\, r!$$
holds uniformly in $j$.
Lemma A1 is the same as Lemma 1 of [11]. From it, one easily sees that $E\{|X_j|^2 \mid \mathbf{X}_{-j}\}$ is finite and bounded by $2K_1K_2^2$.
Lemma A2
(Bernstein’s inequality, Lemma 2.2.11, [39]). For independent random variables Y 1 , , Y n with mean zero and E { | Y i | r } r ! K r 2 v i / 2 for every r 2 , i = 1 , , n and some constants K , v i . Then, for x > 0 , we have
P ( | Y 1 + + Y n | > x ) 2 exp x 2 2 ( v + K x ) ,
for v i = 1 n v i .
Lemma A3
(Bernstein’s inequality, Lemma 2.2.9, [39]). For independent random variables Y 1 , , Y n with mean zero and bounded range [ M , M ] , then
P ( | Y 1 + + Y n | > x ) 2 exp x 2 2 ( v + M x / 3 ) ,
for v var ( Y i + + Y n ) .
Lemma A4
(Symmetrization, Lemma 2.3.1, [39]). Let $Z_1, \ldots, Z_n$ be independent random variables with values in $\mathcal{Z}$ and let $\mathcal{F}$ be a class of real-valued functions on $\mathcal{Z}$. Then
$$E\Big\{\sup_{f\in\mathcal{F}}|(P_n - P)f(Z)|\Big\} \le 2E\Big\{\sup_{f\in\mathcal{F}}|P_n^\varepsilon f(Z)|\Big\},$$
where $\varepsilon_1, \ldots, \varepsilon_n$ is a Rademacher sequence (i.e., an i.i.d. sequence taking the values $\pm 1$ with probability $\frac{1}{2}$) independent of $Z_1, \ldots, Z_n$, and $Pf(Z) = Ef(Z)$, $P_nf(Z) = n^{-1}\sum_{i=1}^n f(Z_i)$ and $P_n^\varepsilon f(Z) = n^{-1}\sum_{i=1}^n \varepsilon_i f(Z_i)$.
Lemma A5.
(Contraction theorem, [40]). Let $z_1, \ldots, z_n$ be nonrandom elements of some space $\mathcal{Z}$ and let $\mathcal{F}$ be a class of real-valued functions on $\mathcal{Z}$. Denote by $\varepsilon_1, \ldots, \varepsilon_n$ a Rademacher sequence. Consider Lipschitz functions $g_i : \mathbb{R} \to \mathbb{R}$, that is,
$$|g_i(s_1) - g_i(s_2)| \le |s_1 - s_2| \quad \text{for all } s_1, s_2 \in \mathbb{R}.$$
Then, for any function $f_1 : \mathcal{Z} \to \mathbb{R}$, we have
$$E\Big\{\sup_{f\in\mathcal{F}}|P_n^\varepsilon(g(f) - g(f_1))|\Big\} \le 2E\Big\{\sup_{f\in\mathcal{F}}|P_n^\varepsilon(f - f_1)|\Big\}.$$
Lemma A6
(Concentration theorem, [41]). Let $Z_1, \ldots, Z_n$ be independent random variables with values in $\mathcal{Z}$ and let $g \in \mathcal{G}$, a class of real-valued functions on $\mathcal{Z}$. Assume that for some positive constants $l_{i,g}$ and $u_{i,g}$ we have $l_{i,g} \le g(Z_i) \le u_{i,g}$ for all $g \in \mathcal{G}$. Define $D^2 = \sup_{g\in\mathcal{G}}\sum_{i=1}^n (u_{i,g} - l_{i,g})^2/n$ and $U = \sup_{g\in\mathcal{G}}|(P_n - P)g(Z)|$. Then, for any $t > 0$,
$$P(U \ge EU + t) \le \exp\left(-\frac{nt^2}{2D^2}\right).$$
Next, we need several lemmas to establish consistency inequalities for $\hat\theta_j$ and $\hat\alpha_j$. Write $\mathbf{D}_{nj} = \frac{1}{n}\sum_{i=1}^n \mathbf{B}_{ij}\mathbf{B}_{ij}^T$, $\mathbf{D}_j = E\{\mathbf{B}_{ij}\mathbf{B}_{ij}^T\} = E\{\mathbf{B}_j\mathbf{B}_j^T\}$, $\mathbf{E}_{nj} = \frac{1}{n}\sum_{i=1}^n \mathbf{B}_{ij}X_{ij}$ and $\mathbf{E}_j = E\{\mathbf{B}_{ij}X_{ij}\} = E\{\mathbf{B}_jX_j\}$. Thus $\hat\theta_j = \mathbf{D}_{nj}^{-1}\mathbf{E}_{nj}$ and $\theta_{j0} = \mathbf{D}_j^{-1}\mathbf{E}_j$.
Lemma A7.
Under conditions (C1) and (C2),
(i)
there exists a constant $C_3$ such that, for any $\delta > 0$,
$$P\big(|\lambda_{\min}(\mathbf{D}_{nj}) - \lambda_{\min}(\mathbf{D}_j)| \ge r_nL_n\delta/n\big) \le 2(r_nL_n)^2\exp\left(-\frac{\delta^2}{2(C_3L_n^{-1}n + 2\delta/3)}\right), \quad P\big(|\lambda_{\max}(\mathbf{D}_{nj} - \mathbf{D}_j)| \ge r_nL_n\delta/n\big) \le 2(r_nL_n)^2\exp\left(-\frac{\delta^2}{2(C_3L_n^{-1}n + 2\delta/3)}\right);$$
(ii)
for some positive constant $c_1$, there exists a positive constant $c_2$ such that
$$P\big(|\lambda_{\min}(\mathbf{D}_{nj}) - \lambda_{\min}(\mathbf{D}_j)| \ge c_1\lambda_{\min}(\mathbf{D}_j)\big) \le 2(r_nL_n)^2\exp\big(-c_2 a_0^{2r_n} r_n^{-2}L_n^{-3}n\big),$$
where $a_0 = (1-\delta_0)/2$ and $\delta_0$ is defined in Lemma 1; and
(iii)
in addition, for any given constant $c_2$, there exists a positive constant $c_3$ such that
$$P\big(\|\mathbf{D}_{nj}^{-1}\| \ge (1 + c_3)\|\mathbf{D}_j^{-1}\|\big) \le 2(r_nL_n)^2\exp\big(-c_2 a_0^{2r_n} r_n^{-2}L_n^{-3}n\big).$$
Proof of Lemma A7.
First, consider the proof of part (i). Denote $Q_{ij,s,t}(k,l) = B_s(X_{ik})B_t(X_{il}) - E\{B_s(X_{ik})B_t(X_{il})\}$ with $k, l \in S_j$ and $s, t = 1, \ldots, L_n$. Recalling that $\|B_t\|_\infty \le 1$, we have $|Q_{ij,s,t}(k,l)| \le 2$ and $\operatorname{var}\{Q_{ij,s,t}(k,l)\} \le E\{B_s^2(X_{ik})B_t^2(X_{il})\} \le E\{B_s^2(X_{ik})\} \le C_3L_n^{-1}$ by inequality (10). By Lemma A3, for any $\delta > 0$,
$$P\left(\Big|n^{-1}\sum_{i=1}^n Q_{ij,s,t}(k,l)\Big| > \delta/n\right) \le 2\exp\left(-\frac{\delta^2}{2(C_3L_n^{-1}n + 2\delta/3)}\right). \tag{A8}$$
Let $\mathbf{Q}_{nj} = \mathbf{D}_{nj} - \mathbf{D}_j$. It follows from Lemma 5 of [7] that $|\lambda_{\min}(\mathbf{D}_{nj}) - \lambda_{\min}(\mathbf{D}_j)| \le \max\{|\lambda_{\min}(\mathbf{Q}_{nj})|, |\lambda_{\max}(\mathbf{Q}_{nj})|\}$. Besides, it is easy to derive that for any $|S_j|L_n \times 1$ vector $\mathbf{a}$ with $\|\mathbf{a}\| = 1$, $|\mathbf{a}^T\mathbf{Q}_{nj}\mathbf{a}| \le L_n|S_j| \cdot \|\mathbf{Q}_{nj}\|_\infty$, which implies that
$$|\lambda_{\min}(\mathbf{Q}_{nj})| \le L_n|S_j| \cdot \|\mathbf{Q}_{nj}\|_\infty \quad\text{and}\quad |\lambda_{\max}(\mathbf{Q}_{nj})| \le L_n|S_j| \cdot \|\mathbf{Q}_{nj}\|_\infty. \tag{A9}$$
This, in conjunction with (A8) and the union bound of probability, yields
$$P\big(|\lambda_{\min}(\mathbf{D}_{nj}) - \lambda_{\min}(\mathbf{D}_j)| \ge r_nL_n\delta/n\big) \le P\big(\|\mathbf{Q}_{nj}\|_\infty \ge \delta/n\big) \le 2(r_nL_n)^2\exp\left(-\frac{\delta^2}{2(C_3L_n^{-1}n + 2\delta/3)}\right) \tag{A10}$$
and
$$P\big(|\lambda_{\max}(\mathbf{D}_{nj} - \mathbf{D}_j)| \ge r_nL_n\delta/n\big) \le 2(r_nL_n)^2\exp\left(-\frac{\delta^2}{2(C_3L_n^{-1}n + 2\delta/3)}\right). \tag{A11}$$
Next, consider the proof of part (ii). Let $c_1^* = 2c_1C_1/(1-\delta_0)$, where $c_1 \in (0,1)$. Employing the result (A10) and taking $\delta = c_1^* a_0^{r_n} r_n^{-1}L_n^{-2}n$, we have
$$P\big(|\lambda_{\min}(\mathbf{D}_{nj}) - \lambda_{\min}(\mathbf{D}_j)| \ge c_1\lambda_{\min}(\mathbf{D}_j)\big) \le P\big(|\lambda_{\min}(\mathbf{D}_{nj}) - \lambda_{\min}(\mathbf{D}_j)| \ge c_1^* a_0^{r_n}L_n^{-1}\big) \le 2(r_nL_n)^2\exp\left(-\frac{c_1^{*2} a_0^{2r_n} r_n^{-2}L_n^{-4}n^2}{2(C_3L_n^{-1}n + 2c_1^* a_0^{r_n} r_n^{-1}L_n^{-2}n/3)}\right) \le 2(r_nL_n)^2\exp\big(-c_2 a_0^{2r_n} r_n^{-2}L_n^{-3}n\big)$$
for some positive constant $c_2$. This proves part (ii).
Last, consider the proof of part (iii). Let $A = \lambda_{\min}(\mathbf{D}_{nj})$ and $B = \lambda_{\min}(\mathbf{D}_j)$; obviously $A, B > 0$. Using the same arguments as in [7], one can show that, for $a \in (0,1)$, $|A^{-1} - B^{-1}| \ge cB^{-1}$ implies $|A - B| \ge aB$, where $c = \frac{1}{1-a} - 1$. Thus $|\lambda_{\min}^{-1}(\mathbf{D}_{nj}) - \lambda_{\min}^{-1}(\mathbf{D}_j)| \ge (\frac{1}{1-c_1} - 1)\lambda_{\min}^{-1}(\mathbf{D}_j)$ implies $|\lambda_{\min}(\mathbf{D}_{nj}) - \lambda_{\min}(\mathbf{D}_j)| \ge c_1\lambda_{\min}(\mathbf{D}_j)$. Hence, using the fact that $\lambda_{\min}^{-1}(\mathbf{A}) = \lambda_{\max}(\mathbf{A}^{-1}) = \|\mathbf{A}^{-1}\|$ for any real symmetric invertible matrix $\mathbf{A}$, we have
$$P\big(\|\mathbf{D}_{nj}^{-1}\| \ge (1+c_3)\|\mathbf{D}_j^{-1}\|\big) \le P\big(|\lambda_{\min}^{-1}(\mathbf{D}_{nj}) - \lambda_{\min}^{-1}(\mathbf{D}_j)| \ge c_3\lambda_{\min}^{-1}(\mathbf{D}_j)\big) \le P\big(|\lambda_{\min}(\mathbf{D}_{nj}) - \lambda_{\min}(\mathbf{D}_j)| \ge c_1\lambda_{\min}(\mathbf{D}_j)\big) \le 2(r_nL_n)^2\exp\big(-c_2 a_0^{2r_n} r_n^{-2}L_n^{-3}n\big),$$
where $c_3 = 1/(1-c_1) - 1$. This completes the proof. □
Lemma A8.
Under conditions (C1)–(C3), for every $1 \le j \le p$ and any given positive constant $c_1^*$, there exists a positive constant $c_2^*$ such that
$$P\big(\|\hat\theta_j - \theta_{j0}\| \ge c_1^* a_0^{-r_n} r_n^{1/2}L_n\big) \le [8(r_nL_n)^2 + 4r_nL_n]\exp\big(-c_2^* a_0^{2r_n} r_n^{-2}L_n^{-3}n\big).$$
Proof of Lemma A8.
By the definitions of $\hat\theta_j$ and $\theta_{j0}$ and simple algebra, we have
$$\hat\theta_j - \theta_{j0} = (\mathbf{D}_{nj}^{-1} - \mathbf{D}_j^{-1})\mathbf{E}_{nj} + \mathbf{D}_j^{-1}(\mathbf{E}_{nj} - \mathbf{E}_j) \equiv I_{n1} + I_{n2} \ (\text{say}). \tag{A14}$$
In the following, we find exponential tail probabilities for $I_{n1}$ and $I_{n2}$, respectively.
We first deal with $I_{n1}$. Since $\mathbf{D}_{nj}^{-1} - \mathbf{D}_j^{-1} = \mathbf{D}_{nj}^{-1}(\mathbf{D}_j - \mathbf{D}_{nj})\mathbf{D}_j^{-1}$, we have
$$\|I_{n1}\|^2 = \mathbf{E}_{nj}^T\mathbf{D}_j^{-1}(\mathbf{D}_j - \mathbf{D}_{nj})\mathbf{D}_{nj}^{-1}\mathbf{D}_{nj}^{-1}(\mathbf{D}_j - \mathbf{D}_{nj})\mathbf{D}_j^{-1}\mathbf{E}_{nj} \le \|\mathbf{D}_j^{-1}\|^2\|\mathbf{D}_{nj}^{-1}\|^2\|\mathbf{D}_{nj} - \mathbf{D}_j\|^2\|\mathbf{E}_{nj}\|^2.$$
Thus, it follows from the triangle inequality and Lemma 1 that
$$\|I_{n1}\| \le \lambda_{\min}^{-1}(\mathbf{D}_j)\|\mathbf{D}_{nj}^{-1}\| \cdot |\lambda_{\max}(\mathbf{D}_{nj} - \mathbf{D}_j)| \cdot \|\mathbf{E}_{nj}\| \le C_1^{-1}a_0^{-|S_j|+1}L_n\|\mathbf{D}_{nj}^{-1}\| \cdot |\lambda_{\max}(\mathbf{D}_{nj} - \mathbf{D}_j)| \cdot \|\mathbf{E}_j\| + C_1^{-1}a_0^{-|S_j|+1}L_n\|\mathbf{D}_{nj}^{-1}\| \cdot |\lambda_{\max}(\mathbf{D}_{nj} - \mathbf{D}_j)| \cdot \|\mathbf{E}_{nj} - \mathbf{E}_j\| \equiv I_{n1}^{(1)} + I_{n1}^{(2)} \ (\text{say}).$$
For $I_{n1}^{(1)}$, it follows that
$$\|\mathbf{E}_j\|^2 = \sum_{k\in S_j}\sum_{l=1}^{L_n}\big(E\{B_l(X_{ik})X_{ij}\}\big)^2 \le \sum_{k\in S_j}\sum_{l=1}^{L_n}E\{B_l^2(X_{ik})X_{ij}^2\} \le \sum_{k\in S_j}\sum_{l=1}^{L_n}E\big[B_l^2(X_{ik})E\{X_{ij}^2 \mid \mathbf{X}_{-j}\}\big] \le 2K_1K_2^2C_3 r_n = C_4 r_n,$$
where $C_4 = 2K_1K_2^2C_3$ and the last inequality follows from Lemma A1 and the result in (10). Using this result, we have
$$I_{n1}^{(1)} \le C_1^{-1}C_4^{1/2}a_0^{-r_n+1}r_n^{1/2}L_n\|\mathbf{D}_{nj}^{-1}\| \cdot |\lambda_{\max}(\mathbf{D}_{nj} - \mathbf{D}_j)|.$$
Let $C_5 = (1 + c_3)a_0^2C_1^{-2}C_4^{1/2}$. Then, for any $\delta > 0$, we have
$$P\big(|I_{n1}^{(1)}| \ge C_5 a_0^{-2r_n} r_n^{3/2}L_n^3\delta/n\big) \le P\big(\|\mathbf{D}_{nj}^{-1}\| \ge (1 + c_3)\|\mathbf{D}_j^{-1}\|\big) + P\big(|\lambda_{\max}(\mathbf{D}_{nj} - \mathbf{D}_j)| \ge r_nL_n\delta/n\big).$$
Therefore, by Lemma A7, it follows that
$$P\big(|I_{n1}^{(1)}| \ge C_5 a_0^{-2r_n} r_n^{3/2}L_n^3\delta/n\big) \le 2(r_nL_n)^2\exp\big(-c_2 a_0^{2r_n} r_n^{-2}L_n^{-3}n\big) + 2(r_nL_n)^2\exp\left(-\frac{\delta^2}{2(C_3L_n^{-1}n + 2\delta/3)}\right). \tag{A15}$$
For $I_{n1}^{(2)}$, note that $\mathbf{E}_{nj} - \mathbf{E}_j = \frac{1}{n}\sum_{i=1}^n[\mathbf{B}_{ij}X_{ij} - E\{\mathbf{B}_{ij}X_{ij}\}]$ is an $|S_j|L_n \times 1$ vector whose $((k-1)L_n + l)$th component is $\frac{1}{n}\sum_{i=1}^n[B_l(X_{ik})X_{ij} - E\{B_l(X_{ik})X_{ij}\}]$, where $k \in S_j$ and $l = 1, \ldots, L_n$. Let $Z_{iklj} = B_l(X_{ik})X_{ij} - E\{B_l(X_{ik})X_{ij}\}$. Then, for every $r \ge 2$, we have
$$E\{|Z_{iklj}|^r\} \le 2^r E\{|B_l(X_{ik})X_{ij}|^r\} \le 2^r E\{B_l^2(X_{ik})|X_{ij}|^r\} \le 2^r E\big\{B_l^2(X_{ik})E(|X_{ij}|^r \mid \mathbf{X}_{-j})\big\} \le 2^r K_1K_2^r r!\,C_3L_n^{-1} = r!(2K_2)^{r-2}\cdot 8K_1K_2^2C_3L_n^{-1}/2,$$
where we have used the $C_r$ inequality $|x + y|^r \le 2^{r-1}(|x|^r + |y|^r)$ for $r \ge 2$, the fact that $\|B_l\|_\infty \le 1$, and Lemma A1. It follows from Lemma A2 that, for any $\delta > 0$,
$$P\left(\Big|\frac{1}{n}\sum_{i=1}^n Z_{iklj}\Big| \ge \delta/n\right) \le 2\exp\left(-\frac{\delta^2}{c_4L_n^{-1}n + c_5\delta}\right), \tag{A16}$$
where $c_4 = 16K_1K_2^2C_3$ and $c_5 = 4K_2$. Employing the union bound of probability and the inequality (A16), we further have
$$P\big(\|\mathbf{E}_{nj} - \mathbf{E}_j\| \ge r_n^{1/2}L_n^{1/2}\delta/n\big) \le 2r_nL_n\exp\left(-\frac{\delta^2}{c_4L_n^{-1}n + c_5\delta}\right). \tag{A17}$$
Let $C_6 = (1 + c_3)C_1^{-2}a_0^2$. Similarly to the derivation of (A15), by Lemma 1, Lemma A7 and (A17), we obtain
$$P\big(|I_{n1}^{(2)}| \ge C_6 a_0^{-2r_n} r_n^{3/2}L_n^{7/2}\delta^2/n^2\big) \le P\big(\|\mathbf{D}_{nj}^{-1}\| \cdot |\lambda_{\max}(\mathbf{D}_{nj} - \mathbf{D}_j)| \ge C_6C_1 a_0^{-r_n-1} r_nL_n^2\delta/n\big) + P\big(\|\mathbf{E}_{nj} - \mathbf{E}_j\| \ge r_n^{1/2}L_n^{1/2}\delta/n\big) \le P\big(\|\mathbf{D}_{nj}^{-1}\| \ge (1 + c_3)\|\mathbf{D}_j^{-1}\|\big) + P\big(|\lambda_{\max}(\mathbf{D}_{nj} - \mathbf{D}_j)| > r_nL_n\delta/n\big) + P\big(\|\mathbf{E}_{nj} - \mathbf{E}_j\| \ge r_n^{1/2}L_n^{1/2}\delta/n\big) \le 2(r_nL_n)^2\exp\big(-c_2 a_0^{2r_n} r_n^{-2}L_n^{-3}n\big) + 2(r_nL_n)^2\exp\left(-\frac{\delta^2}{2(C_3L_n^{-1}n + 2\delta/3)}\right) + 2r_nL_n\exp\left(-\frac{\delta^2}{c_4L_n^{-1}n + c_5\delta}\right). \tag{A18}$$
Hence, combining (A15) and (A18) gives
$$P\big(\|I_{n1}\| \ge C_5a_0^{-2r_n}r_n^{3/2}L_n^3\delta/n + C_6a_0^{-2r_n}r_n^{3/2}L_n^{7/2}\delta^2/n^2\big) \le P\big(|I_{n1}^{(1)}| \ge C_5a_0^{-2r_n}r_n^{3/2}L_n^3\delta/n\big) + P\big(|I_{n1}^{(2)}| \ge C_6a_0^{-2r_n}r_n^{3/2}L_n^{7/2}\delta^2/n^2\big) \le 2[2(r_nL_n)^2 + r_nL_n]\exp\left(-\frac{\delta^2}{c_6L_n^{-1}n + c_7\delta}\right) + 4(r_nL_n)^2\exp\big(-c_2a_0^{2r_n}r_n^{-2}L_n^{-3}n\big), \tag{A19}$$
where $c_6 = \max(2C_3, c_4)$ and $c_7 = \max(c_5, 4/3)$.
Next, we deal with the second term $I_{n2}$. Since $\|I_{n2}\|^2 = (\mathbf{E}_{nj} - \mathbf{E}_j)^T\mathbf{D}_j^{-1}\mathbf{D}_j^{-1}(\mathbf{E}_{nj} - \mathbf{E}_j) \le \|\mathbf{D}_j^{-1}\|^2\|\mathbf{E}_{nj} - \mathbf{E}_j\|^2$, we have $\|I_{n2}\| \le \lambda_{\min}^{-1}(\mathbf{D}_j)\|\mathbf{E}_{nj} - \mathbf{E}_j\| \le C_1^{-1}a_0^{-r_n+1}L_n\|\mathbf{E}_{nj} - \mathbf{E}_j\|$ by Lemma 1. Then it follows from (A17) that
$$P\big(\|I_{n2}\| \ge C_1^{-1}a_0^{-r_n+1}r_n^{1/2}L_n^{3/2}\delta/n\big) \le P\big(\|\mathbf{E}_{nj} - \mathbf{E}_j\| \ge r_n^{1/2}L_n^{1/2}\delta/n\big) \le 2r_nL_n\exp\left(-\frac{\delta^2}{c_4L_n^{-1}n + c_5\delta}\right). \tag{A20}$$
Putting (A14), (A19) and (A20) together, we find that
$$P\big(\|\hat\theta_j - \theta_{j0}\| \ge C_5a_0^{-2r_n}r_n^{3/2}L_n^3\delta/n + C_6a_0^{-2r_n}r_n^{3/2}L_n^{7/2}\delta^2/n^2 + C_1^{-1}a_0^{-r_n+1}r_n^{1/2}L_n^{3/2}\delta/n\big) \le 4[(r_nL_n)^2 + r_nL_n]\exp\left(-\frac{\delta^2}{c_6L_n^{-1}n + c_7\delta}\right) + 4(r_nL_n)^2\exp\big(-c_2a_0^{2r_n}r_n^{-2}L_n^{-3}n\big). \tag{A21}$$
Using (A21) with $\delta = a_0^{r_n}r_n^{-1}L_n^{-2}n$, we have
$$P\big(\|\hat\theta_j - \theta_{j0}\| \ge c_1^* a_0^{-r_n} r_n^{1/2}L_n\big) \le [8(r_nL_n)^2 + 4r_nL_n]\exp\big(-c_2^* a_0^{2r_n} r_n^{-2}L_n^{-3}n\big)$$
for some positive constant $c_2^*$ and sufficiently large $n$, where $c_1^* = C_5 + C_6 + C_1^{-1}a_0$. Hence, the desired result follows. □
Lemma A9.
Under conditions (C1)–(C5), for any given constant $C > 0$ and every $1 \le j \le p$, there exist positive constants $c_9$ and $c_{11}$ such that
$$P\big(\|\hat\alpha_j - \alpha_{j0}\| \ge C(r_nL_n)^{1/2}n^{-\kappa}\big) \le 2\exp\big(-c_9 a_0^{2r_n} r_n^{-2} n^{1-4\kappa}\big) + \exp\big(-c_{11} a_0^{2r_n} L_n^{-2} n^{1-2\kappa}\big).$$
Proof of Lemma A9.
Write $W_n(\alpha_j) = \frac1n\sum_{i=1}^n \{\rho_\tau(Y_i - B_{ij}^T\alpha_j) - \rho_\tau(Y_i)\}$ and $W(\alpha_j) = E\{\rho_\tau(Y - B_j^T\alpha_j) - \rho_\tau(Y)\}$. By Lemma A.2 of [13], we have, for any $\epsilon > 0$,
$$
P\big(\|\hat\alpha_j-\alpha_{j0}\| \ge \epsilon\big) \le P\Big(\sup_{\|\alpha_j-\alpha_{j0}\|\le\epsilon} |W_n(\alpha_j) - W(\alpha_j)| \ge \frac12\inf_{\|\alpha_j-\alpha_{j0}\|=\epsilon}\{W(\alpha_j) - W(\alpha_{j0})\}\Big). \tag{A22}
$$
Taking $\epsilon = C(r_nL_n)^{1/2} n^{-\kappa}$ in (A22), where $C$ is any given positive constant, we first show that there exists some positive constant $c_8$ such that
$$
\inf_{\|\alpha_j-\alpha_{j0}\| = C(r_nL_n)^{1/2} n^{-\kappa}}\{W(\alpha_j) - W(\alpha_{j0})\} \ge c_8 a_0^{r_n} r_n n^{-2\kappa}. \tag{A23}
$$
To this end, let $\alpha_j = \alpha_{j0} + C(r_nL_n)^{1/2} n^{-\kappa} u$ with $\|u\| = 1$. Invoking Knight's identity ([42], p. 121), i.e., $\rho_\tau(u-v) - \rho_\tau(u) = -v[\tau - I(u<0)] + \int_0^v [I(u\le s) - I(u\le 0)]\,ds$, we have
$$
W(\alpha_j) - W(\alpha_{j0}) = E\int_0^{C(r_nL_n)^{1/2} n^{-\kappa} B_j^T u} I\big(0 < Y - B_j^T\alpha_{j0} \le s\big)\,ds, \tag{A24}
$$
where we have used the result that $E\{B_j\psi_\tau(Y - B_j^T\alpha_{j0})\} = 0$ by the definition of $\alpha_{j0}$. Note that the right-hand side of (A24) equals
$$
E\int_0^{C(r_nL_n)^{1/2} n^{-\kappa} B_j^T u} E\big[I(0 < Y - B_j^T\alpha_{j0} \le s)\,\big|\,X\big]\,ds = E\int_0^{C(r_nL_n)^{1/2} n^{-\kappa} B_j^T u} f_{Y|X}(y^*)\,s\,ds
$$
for some $y^*$ between $B_j^T\alpha_{j0}$ and $B_j^T\alpha_{j0} + s$. By condition (C4), it follows that
$$
W(\alpha_j) - W(\alpha_{j0}) \ge \frac12 c_3^f C^2 r_nL_n n^{-2\kappa}\,E(B_j^T u)^2 \ge \frac12 c_3^f C^2 r_nL_n n^{-2\kappa}\,\lambda_{\min}\big(E\{B_jB_j^T\}\big) \ge \frac12 c_3^f C^2 C_1 a_0^{r_n-1} r_n n^{-2\kappa} = c_8 a_0^{r_n} r_n n^{-2\kappa},
$$
where $c_8 = \frac12 c_3^f C^2 C_1 a_0^{-1}$ and $a_0 = (1-\delta_0)/2$. This proves (A23). Hence, by (A22) and (A23), we arrive at
$$
\begin{aligned}
P\big(\|\hat\alpha_j-\alpha_{j0}\| \ge C(r_nL_n)^{1/2} n^{-\kappa}\big) &\le P\Big(\sup_{\|\alpha_j-\alpha_{j0}\|\le C(r_nL_n)^{1/2} n^{-\kappa}} |W_n(\alpha_j) - W(\alpha_j)| \ge \tfrac12 c_8 a_0^{r_n} r_n n^{-2\kappa}\Big) \\
&\le P\Big(\sup_{\|\alpha_j-\alpha_{j0}\|\le C(r_nL_n)^{1/2} n^{-\kappa}} \big|\{W_n(\alpha_j) - W_n(\alpha_{j0})\} - \{W(\alpha_j) - W(\alpha_{j0})\}\big| \ge \tfrac14 c_8 a_0^{r_n} r_n n^{-2\kappa}\Big) \\
&\qquad + P\Big(|W_n(\alpha_{j0}) - W(\alpha_{j0})| \ge \tfrac14 c_8 a_0^{r_n} r_n n^{-2\kappa}\Big) \equiv J_{n1} + J_{n2}. \tag{A25}
\end{aligned}
$$
In what follows, we first consider $J_{n2}$. Let $U_{ij} = [\rho_\tau(Y_i - B_{ij}^T\alpha_{j0}) - \rho_\tau(Y_i)] - E[\rho_\tau(Y - B_j^T\alpha_{j0}) - \rho_\tau(Y)]$, so that $W_n(\alpha_{j0}) - W(\alpha_{j0}) = \frac1n\sum_{i=1}^n U_{ij}$. Note that, by Knight's identity, $|\rho_\tau(u-v) - \rho_\tau(u)| \le |v|\max\{\tau, 1-\tau\} \le |v|$. So, by condition (C5), it follows that
$$
|U_{ij}| \le 2\,|\rho_\tau(Y_i - B_{ij}^T\alpha_{j0}) - \rho_\tau(Y_i)| \le 2\sup_{i,j}|B_{ij}^T\alpha_{j0}| \le 2M_1,
$$
and
$$
\mathrm{var}(U_{ij}) \le E\big[\rho_\tau(Y_i - B_{ij}^T\alpha_{j0}) - \rho_\tau(Y_i)\big]^2 \le \sup_{i,j}|B_{ij}^T\alpha_{j0}|^2 \le M_1^2.
$$
According to Lemma A3, we have
$$
J_{n2} = P\Big(\Big|\frac1n\sum_{i=1}^n U_{ij}\Big| \ge \frac14 c_8 a_0^{r_n} r_n n^{-2\kappa}\Big) \le 2\exp\bigg(-\frac{16^{-1} c_8^2 a_0^{2r_n} r_n^2 n^{2-4\kappa}}{2\big(nM_1^2 + M_1 c_8 a_0^{r_n} r_n n^{1-2\kappa}/6\big)}\bigg) \le 2\exp\big(-c_9 a_0^{2r_n} r_n^2 n^{1-4\kappa}\big) \tag{A26}
$$
for some positive constant $c_9$, provided $a_0^{r_n} r_n n^{-2\kappa} = o(1)$.
Next, we consider $J_{n1}$. Define $V_{ij}(\alpha_j) = \rho_\tau(Y_i - B_{ij}^T\alpha_j) - \rho_\tau(Y_i - B_{ij}^T\alpha_{j0})$, so that $W_n(\alpha_j) - W_n(\alpha_{j0}) = \frac1n\sum_{i=1}^n V_{ij}(\alpha_j)$. This leads to
$$
J_{n1} = P\Big(\sup_{\|\alpha_j-\alpha_{j0}\|\le C(r_nL_n)^{1/2} n^{-\kappa}} \Big|\frac1n\sum_{i=1}^n \big[V_{ij}(\alpha_j) - E\{V_{ij}(\alpha_j)\}\big]\Big| \ge \frac14 c_8 a_0^{r_n} r_n n^{-2\kappa}\Big). \tag{A27}
$$
Again, using Knight's identity, we obtain
$$
\begin{aligned}
|V_{ij}(\alpha_j)| &\le \big|B_{ij}^T(\alpha_j - \alpha_{j0})\,[I(Y_i - B_{ij}^T\alpha_{j0} < 0) - \tau]\big| + \Big|\int_0^{B_{ij}^T(\alpha_j - \alpha_{j0})} \big\{I(Y_i - B_{ij}^T\alpha_{j0} \le s) - I(Y_i - B_{ij}^T\alpha_{j0} \le 0)\big\}\,ds\Big| \\
&\le 2\,\big|B_{ij}^T(\alpha_j - \alpha_{j0})\big| \le 2(|S_j|L_n)^{1/2}\,\|\alpha_j - \alpha_{j0}\|,
\end{aligned}
$$
where the last line holds because each basis function satisfies $|B_k| \le 1$. Thus, it follows that
$$
\sup_{\|\alpha_j-\alpha_{j0}\|\le C(r_nL_n)^{1/2} n^{-\kappa}} |V_{ij}(\alpha_j)| \le 2(|S_j|L_n)^{1/2} \sup_{\|\alpha_j-\alpha_{j0}\|\le C(r_nL_n)^{1/2} n^{-\kappa}} \|\alpha_j - \alpha_{j0}\| \le 2C\,r_nL_n\,n^{-\kappa}.
$$
Let $\varepsilon_1,\ldots,\varepsilon_n$ be a Rademacher sequence independent of $V_{ij}(\alpha_j)$. By Lemmas A4 and A5, we have
$$
\begin{aligned}
E\sup\Big|\frac1n\sum_{i=1}^n \big[V_{ij}(\alpha_j) - E\{V_{ij}(\alpha_j)\}\big]\Big| &\le 2E\sup\Big|\frac1n\sum_{i=1}^n \varepsilon_i V_{ij}(\alpha_j)\Big| = 2E\sup\Big|\frac1n\sum_{i=1}^n \varepsilon_i\big[\rho_\tau(Y_i - B_{ij}^T\alpha_j) - \rho_\tau(Y_i - B_{ij}^T\alpha_{j0})\big]\Big| \\
&\le 4E\sup\Big|\frac1n\sum_{i=1}^n \varepsilon_i B_{ij}^T(\alpha_j - \alpha_{j0})\Big| \le 4C(r_nL_n)^{1/2} n^{-\kappa}\,E\Big\|\frac1n\sum_{i=1}^n \varepsilon_i B_{ij}\Big\| \\
&\le 4C(r_nL_n)^{1/2} n^{-\kappa}\Big\{E\Big\|\frac1n\sum_{i=1}^n \varepsilon_i B_{ij}\Big\|^2\Big\}^{1/2} = 4C(r_nL_n)^{1/2} n^{-\kappa}\Big\{n^{-2}\sum_{k\in S_j}\sum_{l=1}^{L_n}\sum_{i=1}^n E\big[\varepsilon_i^2 B_l^2(X_{ik})\big]\Big\}^{1/2} \\
&\le c_{10}\,r_n L_n^{1/2} n^{-1/2-\kappa},
\end{aligned}
$$
with all suprema taken over $\|\alpha_j - \alpha_{j0}\| \le C(r_nL_n)^{1/2} n^{-\kappa}$, where $c_{10} = 4C C_3^{1/2}$ and we have used (10) in the last line. With the above arguments, we can apply Lemma A6 to bound $J_{n1}$ in (A27). Set
$$
U = \sup_{\|\alpha_j-\alpha_{j0}\|\le C(r_nL_n)^{1/2} n^{-\kappa}} \Big|\frac1n\sum_{i=1}^n \big[V_{ij}(\alpha_j) - E\{V_{ij}(\alpha_j)\}\big]\Big|.
$$
Taking $t = \frac14 c_8 a_0^{r_n} r_n n^{-2\kappa} - c_{10} r_n L_n^{1/2} n^{-1/2-\kappa}$ in Lemma A6, we have
$$
\begin{aligned}
J_{n1} = P\big(U \ge \tfrac14 c_8 a_0^{r_n} r_n n^{-2\kappa}\big) &= P\big(U \ge E\{U\} + (\tfrac14 c_8 a_0^{r_n} r_n n^{-2\kappa} - E\{U\})\big) \\
&\le P\big(U \ge E\{U\} + (\tfrac14 c_8 a_0^{r_n} r_n n^{-2\kappa} - c_{10} r_n L_n^{1/2} n^{-1/2-\kappa})\big) \\
&\le \exp\bigg(-\frac{n\big(\tfrac14 c_8 a_0^{r_n} r_n n^{-2\kappa} - c_{10} r_n L_n^{1/2} n^{-1/2-\kappa}\big)^2}{2\big(2C r_nL_n n^{-\kappa}\big)^2}\bigg) \le \exp\big(-c_{11} a_0^{2r_n} L_n^{-2} n^{1-2\kappa}\big) \tag{A29}
\end{aligned}
$$
for some positive constant $c_{11}$, provided $a_0^{-2r_n} L_n/n^{1-2\kappa} = o(1)$. Plugging (A26) and (A29) into (A25) gives the desired result. □
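Knight's identity is invoked repeatedly in the proofs above. As a quick numerical sanity check of the identity (a standalone Python sketch; the random draws, grid size, and tolerance are illustrative choices, not part of the paper):

```python
import numpy as np

def rho(u, tau):
    # Check loss: rho_tau(u) = u * (tau - I(u < 0))
    return u * (tau - (u < 0))

def knight_rhs(u, v, tau, grid=200000):
    # -v * [tau - I(u < 0)] + int_0^v [I(u <= s) - I(u <= 0)] ds,
    # with the integral approximated by a signed Riemann sum
    ds = v / grid
    s = np.arange(1, grid + 1) * ds
    integral = np.sum((u <= s).astype(float) - float(u <= 0)) * ds
    return -v * (tau - (u < 0)) + integral

rng = np.random.default_rng(0)
for _ in range(5):
    u, v = rng.normal(), rng.normal()
    tau = rng.uniform(0.05, 0.95)
    lhs = rho(u - v, tau) - rho(u, tau)
    # agreement up to quadrature error at the jump of the indicator
    assert abs(lhs - knight_rhs(u, v, tau)) < 1e-4
```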
Lemma A10.
Under conditions (C1)–(C5), for every $1 \le j \le p$ and for any given constant $c_5^*$, there exist some positive constants $c_6^*$ and $c_7^*$ such that
$$
P\Big(\Big|\frac1n\sum_{i=1}^n \psi_\tau(Y_i - B_{ij}^T\hat\alpha_j)(X_{ij} - B_{ij}^T\hat\theta_j) - E\big\{\psi_\tau(Y_i - B_{ij}^T\alpha_{j0})(X_{ij} - B_{ij}^T\theta_{j0})\big\}\Big| \ge c_5^* r_n n^{-\kappa}\Big) \le 7\exp\big(-c_6^* a_0^{2r_n} r_n^2 n^{1-4\kappa}\big) + \big[8(r_nL_n)^2 + 4r_nL_n\big]\exp\big(-c_7^* a_0^{2r_n} r_n^{-2} L_n^{-3} n\big).
$$
Proof of Lemma A10.
Since $E\{B_{ij}\psi_\tau(Y_i - B_{ij}^T\alpha_{j0})\} = 0$ by definition, we have $E\{\psi_\tau(Y_i - B_{ij}^T\alpha_{j0})(X_{ij} - B_{ij}^T\theta_{j0})\} = E\{\psi_\tau(Y_i - B_{ij}^T\alpha_{j0})X_{ij}\}$. A simple decomposition gives
$$
\begin{aligned}
&\frac1n\sum_{i=1}^n \psi_\tau(Y_i - B_{ij}^T\hat\alpha_j)(X_{ij} - B_{ij}^T\hat\theta_j) - E\{\psi_\tau(Y - B_j^T\alpha_{j0})X_j\} \\
&\quad= \frac1n\sum_{i=1}^n \big[\psi_\tau(Y_i - B_{ij}^T\alpha_{j0})X_{ij} - E\{\psi_\tau(Y - B_j^T\alpha_{j0})X_j\}\big] + \frac1n\sum_{i=1}^n \big\{\psi_\tau(Y_i - B_{ij}^T\hat\alpha_j) - \psi_\tau(Y_i - B_{ij}^T\alpha_{j0})\big\}X_{ij} - \frac1n\sum_{i=1}^n \psi_\tau(Y_i - B_{ij}^T\hat\alpha_j)\,B_{ij}^T\hat\theta_j \\
&\quad\equiv \Delta_{n1j} + \Delta_{n2j} + \Delta_{n3j}.
\end{aligned}
$$
The rest is to find exponential bounds for the tail probabilities of $\Delta_{n1j}$, $\Delta_{n2j}$ and $\Delta_{n3j}$, respectively.
For $\Delta_{n1j}$, since $|\psi_\tau(u)| \le \max(\tau, 1-\tau) \le 1$, it follows from the $C_r$ inequality and Lemma A1 that, for each $r \ge 2$,
$$
E\big|\psi_\tau(Y_i - B_{ij}^T\alpha_{j0})X_{ij} - E\{\psi_\tau(Y - B_j^T\alpha_{j0})X_j\}\big|^r \le 2^r E\big|\psi_\tau(Y_i - B_{ij}^T\alpha_{j0})X_{ij}\big|^r \le 2^r E|X_{ij}|^r \le 2^r K_1K_2^r\,r! = r!\,(2K_2)^{r-2}\cdot 8K_1K_2^2/2.
$$
Invoking Lemma A2, for any $\delta > 0$, we have
$$
P\big(|\Delta_{n1j}| \ge \delta/n\big) \le 2\exp\bigg(-\frac{\delta^2}{c_{12}\,n + c_{13}\,\delta}\bigg),
$$
where $c_{12} = 16K_1K_2^2$ and $c_{13} = 4K_2$.
For $\Delta_{n2j}$, note that, for any $\delta > 0$,
$$
P\big(|\Delta_{n2j}| \ge \delta/n\big) \le P\big(|\Delta_{n2j}| \ge \delta/n,\ \|\hat\alpha_j - \alpha_{j0}\| < C(r_nL_n)^{1/2} n^{-\kappa}\big) + P\big(\|\hat\alpha_j - \alpha_{j0}\| \ge C(r_nL_n)^{1/2} n^{-\kappa}\big) \equiv H_{n1j} + H_{n2j},
$$
where a direct application of Lemma A9 yields $H_{n2j} \le 2\exp(-c_9 a_0^{2r_n} r_n^2 n^{1-4\kappa}) + \exp(-c_{11} a_0^{2r_n} L_n^{-2} n^{1-2\kappa})$. Write $\hat\alpha_j = \alpha_{j0} + C(r_nL_n)^{1/2} n^{-\kappa} u$ with $\|u\| \le 1$, and denote
$$
\Pi_{ij} = \sup_{\|u\|\le 1}\big|\big\{\psi_\tau\big(Y_i - B_{ij}^T\alpha_{j0} - C(r_nL_n)^{1/2} n^{-\kappa} B_{ij}^T u\big) - \psi_\tau(Y_i - B_{ij}^T\alpha_{j0})\big\}X_{ij}\big|.
$$
Then,
$$
H_{n1j} \le P\Big(\Big|\frac1n\sum_{i=1}^n \Pi_{ij}\Big| \ge \frac{\delta}{n}\Big).
$$
Furthermore, there exists a $u^* = (\{u_k^{*T},\,k\in S_j\})^T$ with $\|u^*\| \le 1$ and $u_k^* \in \mathbb{R}^{L_n}$ such that
$$
\begin{aligned}
E\{\Pi_{ij}\} &= E\big\{\big|\big\{\psi_\tau\big(Y_i - B_{ij}^T\alpha_{j0} - C(r_nL_n)^{1/2} n^{-\kappa} B_{ij}^T u^*\big) - \psi_\tau(Y_i - B_{ij}^T\alpha_{j0})\big\}X_{ij}\big|\big\} \le E\Big\{\Big|\int_{B_{ij}^T\alpha_{j0}}^{B_{ij}^T\alpha_{j0} + C(r_nL_n)^{1/2} n^{-\kappa} B_{ij}^T u^*} f_{Y|X}(y)\,dy\Big|\,|X_{ij}|\Big\} \\
&\le c_4^f\,C(r_nL_n)^{1/2} n^{-\kappa}\,E\big\{|B_{ij}^T u^*|\,|X_{ij}|\big\} \le c_4^f\,C(r_nL_n)^{1/2} n^{-\kappa}\,\sqrt{E|B_{ij}^T u^*|^2}\,\sqrt{E\{|X_{ij}|^2\}} \le c_{14}\,r_n n^{-\kappa}
\end{aligned}
$$
for some positive constant $c_{14}$, where we have used condition (C4) in the second line, the Cauchy–Schwarz inequality in the third line, and Lemmas 1 and A1 in the last line. Analogously to (A31), we have, for each $r \ge 2$,
$$
E\{|\Pi_{ij} - E(\Pi_{ij})|^r\} \le 2^r E\{|\Pi_{ij}|^r\} \le 2^r E\{2^r|X_{ij}|^r\} \le r!\,(4K_2)^{r-2}\cdot 32K_2^2K_1/2,
$$
and it follows from Lemma A2 that, for any $\delta > 0$,
$$
P\Big(\Big|\frac1n\sum_{i=1}^n \{\Pi_{ij} - E(\Pi_{ij})\}\Big| \ge \frac{\delta}{n}\Big) \le 2\exp\bigg(-\frac{\delta^2}{c_{15}\,n + c_{16}\,\delta}\bigg), \tag{A34}
$$
where $c_{15} = 64K_1K_2^2$ and $c_{16} = 8K_2$. Setting $\delta = c_{14} r_n n^{1-\kappa}$ in (A34), we obtain
$$
P\Big(\Big|\frac1n\sum_{i=1}^n \Pi_{ij}\Big| \ge 2c_{14} r_n n^{-\kappa}\Big) \le P\Big(\Big|\frac1n\sum_{i=1}^n \{\Pi_{ij} - E(\Pi_{ij})\}\Big| \ge 2c_{14} r_n n^{-\kappa} - E(\Pi_{ij})\Big) \le P\Big(\Big|\frac1n\sum_{i=1}^n \{\Pi_{ij} - E(\Pi_{ij})\}\Big| \ge c_{14} r_n n^{-\kappa}\Big) \le 2\exp\big(-c_{17} r_n^2 n^{1-2\kappa}\big). \tag{A35}
$$
Since $r_n^2 n^{1-2\kappa}\big/\big(a_0^{2r_n} L_n^{-2} n^{1-2\kappa}\big) \to \infty$ as $n \to \infty$, combining (A32), (A33) and (A35), we obtain
$$
P\big(|\Delta_{n2j}| \ge 2c_{14} r_n n^{-\kappa}\big) \le 2\exp\big(-c_9 a_0^{2r_n} r_n^2 n^{1-4\kappa}\big) + 3\exp\big(-c_{11} a_0^{2r_n} L_n^{-2} n^{1-2\kappa}\big) \le 5\exp\big(-c_{18} a_0^{2r_n} r_n^2 n^{1-4\kappa}\big) \tag{A36}
$$
for some positive constant $c_{18}$.
Finally, we consider $\Delta_{n3j}$. Denote $\Phi(\alpha_j) = \frac1n\sum_{i=1}^n \rho_\tau(Y_i - B_{ij}^T\alpha_j)$ and define its subdifferential as $\partial\Phi(\alpha_j) = (\{\partial\Phi_{(k-1)L_n+l}(\alpha_j): k\in S_j,\ l = 1,\ldots,L_n\})^T$ with
$$
\partial\Phi_{(k-1)L_n+l}(\alpha_j) = -\frac1n\sum_{i=1}^n \psi_\tau(Y_i - B_{ij}^T\alpha_j)B_l(X_{ik}) - \frac1n\sum_{i=1}^n I(Y_i - B_{ij}^T\alpha_j = 0)\,v_i\,B_l(X_{ik})
$$
and $v_i \in [\tau-1, \tau]$. Recalling the definition of $\hat\alpha_j$, there exists $v_i^* \in [\tau-1, \tau]$ such that $\partial\Phi_{(k-1)L_n+l}(\hat\alpha_j) = 0$. This yields
$$
\Delta_{n3j} = -\frac1n\sum_{i=1}^n I(Y_i - B_{ij}^T\hat\alpha_j = 0)\,v_i^*\,B_{ij}^T\hat\theta_j.
$$
Thus, by condition (C5), it follows that
$$
|\Delta_{n3j}| \le \frac1n\sum_{i=1}^n I(Y_i - B_{ij}^T\hat\alpha_j = 0)\,|B_{ij}^T\hat\theta_j| \le \frac1n\sum_{i=1}^n I(Y_i - B_{ij}^T\hat\alpha_j = 0)\big(|B_{ij}^T\theta_{j0}| + |B_{ij}^T(\hat\theta_j - \theta_{j0})|\big) \le \frac1n\sum_{i=1}^n I(Y_i - B_{ij}^T\hat\alpha_j = 0)\big(M_2 + (r_nL_n)^{1/2}\|\hat\theta_j - \theta_{j0}\|\big). \tag{A37}
$$
Using Lemma A8, we obtain
$$
P\big(M_2 + (r_nL_n)^{1/2}\|\hat\theta_j - \theta_{j0}\| \ge M_2 + c_1^* a_0^{-r_n} r_n L_n^{3/2}\big) \le \big[8(r_nL_n)^2 + 4r_nL_n\big]\exp\big(-c_2^* a_0^{2r_n} r_n^{-2} L_n^{-3} n\big). \tag{A38}
$$
Note that $P\big(\frac1n\sum_{i=1}^n I(Y_i - B_{ij}^T\hat\alpha_j = 0) \ge \epsilon\big) = 0$ for any $\epsilon > 0$. Letting $\epsilon = n^{-1}L_n^{3/2}$, we thus have
$$
P\Big(\frac1n\sum_{i=1}^n I(Y_i - B_{ij}^T\hat\alpha_j = 0) \ge n^{-1}L_n^{3/2}\Big) = 0. \tag{A39}
$$
Gathering (A37)–(A39) gives
$$
P\big(|\Delta_{n3j}| \ge n^{-1}L_n^{3/2}\big(M_2 + c_1^* a_0^{-r_n} r_n L_n^{3/2}\big)\big) \le \big[8(r_nL_n)^2 + 4r_nL_n\big]\exp\big(-c_2^* a_0^{2r_n} r_n^{-2} L_n^{-3} n\big). \tag{A40}
$$
Furthermore, using (A31) with $\delta = c_{14} r_n n^{1-\kappa}$, we have
$$
P\big(|\Delta_{n1j}| \ge c_{14} r_n n^{-\kappa}\big) \le 2\exp\big(-c_3^* r_n^2 n^{1-2\kappa}\big) \tag{A41}
$$
for some positive constant $c_3^*$. Accordingly, by (A36), (A40) and (A41), we obtain
$$
\begin{aligned}
&P\big(|\Delta_{n1j} + \Delta_{n2j} + \Delta_{n3j}| \ge 3c_{14} r_n n^{-\kappa} + n^{-1}L_n^{3/2}(M_2 + c_1^* a_0^{-r_n} r_n L_n^{3/2})\big) \\
&\quad\le 2\exp\big(-c_3^* r_n^2 n^{1-2\kappa}\big) + 5\exp\big(-c_{18} a_0^{2r_n} r_n^2 n^{1-4\kappa}\big) + \big[8(r_nL_n)^2 + 4r_nL_n\big]\exp\big(-c_2^* a_0^{2r_n} r_n^{-2} L_n^{-3} n\big) \\
&\quad\le 7\exp\big(-c_4^* a_0^{2r_n} r_n^2 n^{1-4\kappa}\big) + \big[8(r_nL_n)^2 + 4r_nL_n\big]\exp\big(-c_2^* a_0^{2r_n} r_n^{-2} L_n^{-3} n\big)
\end{aligned}
$$
for some positive constant $c_4^*$. As a result, the desired result follows for the given positive constant $c_5^* = 3c_{14} + M_2 + c_1^*$ and for sufficiently large $n$. □
Lemma A11.
Under conditions (C1)–(C5), for every $1 \le j \le p$ and for any given constant $c_5^*$, there exist some positive constants $c_{10}^*$ and $c_{13}^*$ such that
$$
P\big(|\hat\sigma_j^2 - \sigma_j^2| \ge c_5^* r_n n^{-\kappa}\big) \le \big[8(r_nL_n)^2 + 6r_nL_n + 2\big]\exp\big(-c_{13}^* a_0^{2r_n} L_n^{-3} n^{1-2\kappa}\big) + \big[10(r_nL_n)^2 + 4r_nL_n\big]\exp\big(-c_{10}^* a_0^{4r_n} r_n^{-3} L_n^{-4} n^{1-\kappa}\big)
$$
when $n$ is sufficiently large. In addition, for some $\tilde c_1 \in (0,1)$,
$$
P\big(|\hat\sigma_j^2 - \sigma_j^2| \ge \tilde c_1\,\sigma_j^2\big) \le \big[8(r_nL_n)^2 + 6r_nL_n + 2\big]\exp\big(-c_{13}^* a_0^{2r_n} L_n^{-3} n^{1-2\kappa}\big) + \big[10(r_nL_n)^2 + 4r_nL_n\big]\exp\big(-c_{10}^* a_0^{4r_n} r_n^{-3} L_n^{-4} n^{1-\kappa}\big).
$$
Proof of Lemma A11.
Recalling the definitions of $\hat\sigma_j^2$ and $\sigma_j^2$, we have
$$
|\hat\sigma_j^2 - \sigma_j^2| \le \Big|\frac1n\sum_{i=1}^n (X_{ij} - B_{ij}^T\theta_{j0})^2 - E\{(X_{ij} - B_{ij}^T\theta_{j0})^2\}\Big| + \Big|\frac1n\sum_{i=1}^n (X_{ij} - B_{ij}^T\hat\theta_j)^2 - \frac1n\sum_{i=1}^n (X_{ij} - B_{ij}^T\theta_{j0})^2\Big| \equiv \Xi_{n1j} + \Xi_{n2j}. \tag{A43}
$$
Let $\xi_{ij} = (X_{ij} - B_{ij}^T\theta_{j0})^2 - E\{(X_{ij} - B_{ij}^T\theta_{j0})^2\}$. For every $r \ge 2$, by the $C_r$ inequality and condition (C5), we have
$$
E\{|\xi_{ij}|^r\} \le 2^r E\{(X_{ij} - B_{ij}^T\theta_{j0})^{2r}\} \le 2^{3r-1}\{E|X_{ij}|^{2r} + M_2^{2r}\} \le 2^{3r-1}\{K_1K_2^{2r}(2r)! + M_2^{2r}\} \le 2^{3r}\tilde K_1\tilde K_2^{2r}(2r)! \le 2^{3r}\tilde K_1\tilde K_2^{2r}(2r)^r r! = r!\,(16r\tilde K_2^2)^{r-2}\cdot 512(r\tilde K_2^2)^2\tilde K_1/2,
$$
with $\tilde K_1 = \max(K_1, 1)$ and $\tilde K_2 = \max(K_2, M_2)$. Thus, by Lemma A2, it follows that
$$
P\Big(\Xi_{n1j} \ge \frac12 c_5^* r_n n^{-\kappa}\Big) \le 2\exp\big(-c_6^* r_n^2 n^{1-2\kappa}\big) \tag{A44}
$$
for some positive constant $c_6^*$. In addition, it is easily seen that
$$
\Xi_{n2j} \le (\hat\theta_j - \theta_{j0})^T D_{nj}(\hat\theta_j - \theta_{j0}) + \Big|\frac2n\sum_{i=1}^n (X_{ij} - B_{ij}^T\theta_{j0})B_{ij}^T(\hat\theta_j - \theta_{j0})\Big| \equiv \Xi_{n2j}^{(1)} + \Xi_{n2j}^{(2)},
$$
where $\Xi_{n2j}^{(1)} \le \lambda_{\max}(D_{nj})\,\|\hat\theta_j - \theta_{j0}\|^2$. Similarly, applying the arguments used in deriving Lemma A7(ii), we have that, for any constant $\tilde c_1 \in (0,1)$, there exists some finite positive constant $c_7^*$ such that
$$
P\big(|\lambda_{\max}(D_{nj})| \ge (1 + \tilde c_1)\lambda_{\max}(D_j)\big) \le 2(r_nL_n)^2\exp\big(-c_7^* a_0^{2r_n} r_n^{-2} L_n^{-3} n\big).
$$
This together with Lemma 1 yields
$$
P\big(|\lambda_{\max}(D_{nj})| \ge (1 + \tilde c_1)C_2 r_n L_n^{-1}\big) \le 2(r_nL_n)^2\exp\big(-c_7^* a_0^{2r_n} r_n^{-2} L_n^{-3} n\big). \tag{A45}
$$
Moreover, employing (A21) with $\delta = (1 + \tilde c_1)^{-1/2}C_2^{-1/2}(c_5^*/4)^{1/2}c_1^{*-1}\,a_0^{2r_n} r_n^{-3/2} L_n^{-5/2} n^{1-\kappa/2}$, we have
$$
\begin{aligned}
P\big(\|\hat\theta_j - \theta_{j0}\| \ge (1 + \tilde c_1)^{-1/2}C_2^{-1/2}(c_5^*/4)^{1/2}L_n^{1/2} n^{-\kappa/2}\big) &\le 4\big[(r_nL_n)^2 + r_nL_n\big]\exp\big(-c_8^* a_0^{4r_n} r_n^{-3} L_n^{-4} n^{1-\kappa}\big) + 4(r_nL_n)^2\exp\big(-c_2^* a_0^{2r_n} r_n^{-2} L_n^{-3} n\big) \\
&\le 4\big[2(r_nL_n)^2 + r_nL_n\big]\exp\big(-c_9^* a_0^{4r_n} r_n^{-3} L_n^{-4} n^{1-\kappa}\big)
\end{aligned}
$$
for some positive constants $c_8^*$ and $c_9^*$. This in conjunction with (A45) gives
$$
P\Big(\Xi_{n2j}^{(1)} \ge \frac14 c_5^* r_n n^{-\kappa}\Big) \le P\big(|\lambda_{\max}(D_{nj})| \ge (1 + \tilde c_1)C_2 r_n L_n^{-1}\big) + P\big(\|\hat\theta_j - \theta_{j0}\| \ge (1 + \tilde c_1)^{-1/2}C_2^{-1/2}(c_5^*/4)^{1/2}L_n^{1/2} n^{-\kappa/2}\big) \le \big[10(r_nL_n)^2 + 4r_nL_n\big]\exp\big(-c_{10}^* a_0^{4r_n} r_n^{-3} L_n^{-4} n^{1-\kappa}\big) \tag{A46}
$$
for some positive constant $c_{10}^*$. For $\Xi_{n2j}^{(2)}$, let $N_{iklj} = (X_{ij} - B_{ij}^T\theta_{j0})B_l(X_{ik})$, $k \in S_j$, $l = 1,\ldots,L_n$. Then, for every $r \ge 2$,
$$
E\{|N_{iklj}|^r\} \le E\{|X_{ij} - B_{ij}^T\theta_{j0}|^r\} \le 2^{r-1}\Big\{E|X_{ij}|^r + \sup_{i,j}|B_{ij}^T\theta_{j0}|^r\Big\} \le 2^{r-1}(K_1K_2^r\,r! + M_2^r) \le 2^{r-1}\cdot 2\tilde K_1\tilde K_2^r\,r! = r!\,(2\tilde K_2)^{r-2}\cdot 8\tilde K_1\tilde K_2^2/2,
$$
where $\tilde K_1 = \max(K_1, 1)$ and $\tilde K_2 = \max(K_2, M_2)$. Thus, it follows from Lemma A2 that
$$
P\Big(\Big|\frac1n\sum_{i=1}^n N_{iklj}\Big| \ge \frac18 c_5^* c_1^{*-1} a_0^{r_n} L_n^{-3/2} n^{-\kappa}\Big) \le 2\exp\big(-c_{11}^* a_0^{2r_n} L_n^{-3} n^{1-2\kappa}\big) \tag{A47}
$$
for some positive constant $c_{11}^*$. Note that $\|\frac1n\sum_{i=1}^n (X_{ij} - B_{ij}^T\theta_{j0})B_{ij}\| \le (r_nL_n)^{1/2}\max_{k,l}|\frac1n\sum_{i=1}^n N_{iklj}|$. This together with (A47) and the union bound of probability gives
$$
P\Big(\Big\|\frac1n\sum_{i=1}^n (X_{ij} - B_{ij}^T\theta_{j0})B_{ij}\Big\| \ge \frac18 c_5^* c_1^{*-1} a_0^{r_n} r_n^{1/2} L_n^{-1} n^{-\kappa}\Big) \le 2(r_nL_n)\exp\big(-c_{11}^* a_0^{2r_n} L_n^{-3} n^{1-2\kappa}\big). \tag{A48}
$$
Using Lemma A8 and (A48), we obtain
$$
\begin{aligned}
P\Big(\Xi_{n2j}^{(2)} \ge \frac14 c_5^* r_n n^{-\kappa}\Big) &\le P\Big(\Big\|\frac1n\sum_{i=1}^n (X_{ij} - B_{ij}^T\theta_{j0})B_{ij}\Big\|\,\|\hat\theta_j - \theta_{j0}\| \ge \frac18 c_5^* r_n n^{-\kappa}\Big) \\
&\le P\Big(\Big\|\frac1n\sum_{i=1}^n (X_{ij} - B_{ij}^T\theta_{j0})B_{ij}\Big\| \ge \frac18 c_1^{*-1}c_5^* a_0^{r_n} r_n^{1/2} L_n^{-1} n^{-\kappa}\Big) + P\big(\|\hat\theta_j - \theta_{j0}\| \ge c_1^* a_0^{-r_n} r_n^{1/2} L_n\big) \\
&\le 2(r_nL_n)\exp\big(-c_{11}^* a_0^{2r_n} L_n^{-3} n^{1-2\kappa}\big) + \big[8(r_nL_n)^2 + 4r_nL_n\big]\exp\big(-c_2^* a_0^{2r_n} r_n^{-2} L_n^{-3} n\big) \\
&\le \big[8(r_nL_n)^2 + 6r_nL_n\big]\exp\big(-c_{12}^* a_0^{2r_n} r_n^{-2} L_n^{-3} n\big) \tag{A49}
\end{aligned}
$$
for some positive constant $c_{12}^*$. Therefore, combining (A43), (A44), (A46) and (A49), we can conclude the first result of Lemma A11. Moreover, the assumption that $r_n n^{-\kappa} = o(1)$ implies $c_5^* r_n n^{-\kappa} \le \tilde c_1\sigma_j^2$ for large $n$. Hence, the second result of Lemma A11 follows from the first result. □
Proof of Theorem 1.
(i) We first show the first assertion. Let $H_{n1j} = \frac1n\sum_{i=1}^n \psi_\tau(Y_i - B_{ij}^T\hat\alpha_j)(X_{ij} - B_{ij}^T\hat\theta_j)$, $H_{n2j} = \sqrt{\hat\sigma_j^2} = \hat\sigma_j$, $h_{1j} = E\{\psi_\tau(Y_i - B_{ij}^T\alpha_{j0})(X_{ij} - B_{ij}^T\theta_{j0})\}$ and $h_{2j} = \sigma_j$. Then,
$$
\big|\hat\varrho_\tau(Y, X_j \mid X_{S_j}) - \varrho_\tau^*(Y, X_j \mid X_{S_j})\big| = H_{n2j}^{-1}h_{2j}^{-1}\big|(H_{n1j} - h_{1j})h_{2j} - h_{1j}(H_{n2j} - h_{2j})\big| \le H_{n2j}^{-1}|H_{n1j} - h_{1j}| + H_{n2j}^{-1}h_{2j}^{-1}|h_{1j}|\,|H_{n2j} - h_{2j}|.
$$
We first show that, for the given constant $C_7 = (\sqrt{1-\tilde c_1} + 1)^{-1}M_3^{1/2}c_5^*$, there exists a positive constant $c_{13}^*$ such that
$$
P\big(|H_{n2j} - h_{2j}| \ge C_7 r_n n^{-\kappa}\big) \le 2\big[8(r_nL_n)^2 + 6r_nL_n + 2\big]\exp\big(-c_{13}^* a_0^{2r_n} L_n^{-3} n^{1-2\kappa}\big) + 2\big[10(r_nL_n)^2 + 4r_nL_n\big]\exp\big(-c_{10}^* a_0^{4r_n} r_n^{-3} L_n^{-4} n^{1-\kappa}\big). \tag{A51}
$$
To this end, using the fact that $\sqrt{x} - \sqrt{y} = (x-y)/(\sqrt{x} + \sqrt{y})$ for positive $x$ and $y$, we have
$$
\begin{aligned}
P\big(|H_{n2j} - h_{2j}| \ge C_7 r_n n^{-\kappa}\big) &= P\big(|\hat\sigma_j^2 - \sigma_j^2| \ge C_7 r_n n^{-\kappa}(\hat\sigma_j + \sigma_j)\big) \\
&\le P\big(|\hat\sigma_j^2 - \sigma_j^2| \ge C_7 r_n n^{-\kappa}(\hat\sigma_j + \sigma_j),\ \hat\sigma_j^2 > (1-\tilde c_1)\sigma_j^2\big) + P\big(\hat\sigma_j^2 \le (1-\tilde c_1)\sigma_j^2\big) \\
&\le P\big(|\hat\sigma_j^2 - \sigma_j^2| \ge c_5^* r_n n^{-\kappa}\big) + P\big(|\hat\sigma_j^2 - \sigma_j^2| \ge \tilde c_1\sigma_j^2\big),
\end{aligned}
$$
where the last line uses condition (C5). This together with Lemma A11 implies (A51). Notice that, since $C_7 r_n n^{-\kappa} = o(1)$, for sufficiently large $n$ there exists a constant $\tilde c_2 \in (0,1)$ such that $C_7 r_n n^{-\kappa} \le \tilde c_2 M_3^{-1/2} \le \tilde c_2\sigma_j$. Thus,
$$
P\big(H_{n2j} \le (1-\tilde c_2)h_{2j}\big) \le P\big(|H_{n2j} - h_{2j}| \ge \tilde c_2\sigma_j\big) \le P\big(|H_{n2j} - h_{2j}| \ge C_7 r_n n^{-\kappa}\big) \le 2\big[8(r_nL_n)^2 + 6r_nL_n + 2\big]\exp\big(-c_{13}^* a_0^{2r_n} L_n^{-3} n^{1-2\kappa}\big) + 2\big[10(r_nL_n)^2 + 4r_nL_n\big]\exp\big(-c_{10}^* a_0^{4r_n} r_n^{-3} L_n^{-4} n^{1-\kappa}\big). \tag{A52}
$$
Accordingly,
$$
\begin{aligned}
P\big(H_{n2j}^{-1}|H_{n1j} - h_{1j}| \ge (1-\tilde c_2)^{-1}M_3^{1/2}c_5^* r_n n^{-\kappa}\big) &\le P\big(|H_{n1j} - h_{1j}| \ge (1-\tilde c_2)^{-1}M_3^{1/2}c_5^* r_n n^{-\kappa} H_{n2j},\ H_{n2j} > (1-\tilde c_2)h_{2j}\big) + P\big(H_{n2j} \le (1-\tilde c_2)h_{2j}\big) \\
&\le P\big(|H_{n1j} - h_{1j}| \ge c_5^* r_n n^{-\kappa}\big) + P\big(H_{n2j} \le (1-\tilde c_2)h_{2j}\big) \\
&\le 7\exp\big(-c_6^* a_0^{2r_n} r_n^2 n^{1-4\kappa}\big) + \big[8(r_nL_n)^2 + 4r_nL_n\big]\exp\big(-c_7^* a_0^{2r_n} r_n^{-2} L_n^{-3} n\big) \\
&\qquad + 2\big[8(r_nL_n)^2 + 6r_nL_n + 2\big]\exp\big(-c_{13}^* a_0^{2r_n} L_n^{-3} n^{1-2\kappa}\big) + 2\big[10(r_nL_n)^2 + 4r_nL_n\big]\exp\big(-c_{10}^* a_0^{4r_n} r_n^{-3} L_n^{-4} n^{1-\kappa}\big) \\
&\le 7\exp\big(-c_6^* a_0^{2r_n} r_n^2 n^{1-4\kappa}\big) + \big[44(r_nL_n)^2 + 20r_nL_n + 2\big]\exp\big(-c_{14}^* a_0^{2r_n} L_n^{-3} n^{1-2\kappa}\big), \tag{A53}
\end{aligned}
$$
where $c_{14}^* = \min(c_7^*, c_{10}^*, c_{13}^*)$ and the last inequality is due to $a_0^{-2r_n} r_n^3 L_n n^{-\kappa} = o(1)$. Moreover, observe that, by the definition of $\theta_{j0}$ and Lemma A1,
$$
|h_{1j}| = \big|E\{\psi_\tau(Y_i - B_{ij}^T\alpha_{j0})(X_{ij} - B_{ij}^T\theta_{j0})\}\big| = \big|E\{\psi_\tau(Y_i - B_{ij}^T\alpha_{j0})X_{ij}\}\big| \le \max(\tau, 1-\tau)\,E\{|X_{ij}|\} \le \max(\tau, 1-\tau)\,\{E(X_{ij}^2)\}^{1/2} \le M_4,
$$
where $M_4 = \max(\tau, 1-\tau)\sqrt{2K_1K_2^2}$. So it follows from condition (C5), (A51) and (A52) that
$$
\begin{aligned}
&P\big(H_{n2j}^{-1}h_{2j}^{-1}|h_{1j}|\,|H_{n2j} - h_{2j}| \ge (1-\tilde c_2)^{-1}M_4M_3^{3/2}(\sqrt{1-\tilde c_1} + 1)^{-1}c_5^* r_n n^{-\kappa}\big) \\
&\quad\le P\big(|H_{n2j} - h_{2j}| \ge (1-\tilde c_2)^{-1}M_3(\sqrt{1-\tilde c_1} + 1)^{-1}c_5^* r_n n^{-\kappa} H_{n2j},\ H_{n2j} > (1-\tilde c_2)h_{2j}\big) + P\big(H_{n2j} \le (1-\tilde c_2)h_{2j}\big) \\
&\quad\le P\big(|H_{n2j} - h_{2j}| \ge C_7 r_n n^{-\kappa}\big) + P\big(H_{n2j} \le (1-\tilde c_2)h_{2j}\big) \\
&\quad\le 4\big[8(r_nL_n)^2 + 6r_nL_n + 2\big]\exp\big(-c_{13}^* a_0^{2r_n} L_n^{-3} n^{1-2\kappa}\big) + 4\big[10(r_nL_n)^2 + 4r_nL_n\big]\exp\big(-c_{10}^* a_0^{4r_n} r_n^{-3} L_n^{-4} n^{1-\kappa}\big). \tag{A54}
\end{aligned}
$$
Put $C = (1-\tilde c_2)^{-1}c_5^*M_3^{1/2}\big[1 + M_4M_3(\sqrt{1-\tilde c_1} + 1)^{-1}\big]\big/\sqrt{\tau(1-\tau)}$. Therefore, by a direct application of (A53) and (A54), together with the fact that $\big||x| - |y|\big| \le |x - y|$, we can obtain
$$
\max_{1\le j\le p} P\big(|\hat u_j - u_j| \ge C r_n n^{-\kappa}\big) \le 7\exp\big(-c_6^* a_0^{2r_n} r_n^2 n^{1-4\kappa}\big) + \big[116(r_nL_n)^2 + 60r_nL_n + 10\big]\exp\big(-c_{14}^* a_0^{2r_n} L_n^{-3} n^{1-2\kappa}\big).
$$
This together with the union bound of probability proves the first assertion.
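To spell out this union-bound step (a one-line expansion of the display above):
$$
P\Big(\max_{1\le j\le p}|\hat u_j - u_j| \ge C r_n n^{-\kappa}\Big) \le \sum_{j=1}^{p} P\big(|\hat u_j - u_j| \ge C r_n n^{-\kappa}\big) \le p\,\Big\{7\exp\big(-c_6^* a_0^{2r_n} r_n^2 n^{1-4\kappa}\big) + \big[116(r_nL_n)^2 + 60r_nL_n + 10\big]\exp\big(-c_{14}^* a_0^{2r_n} L_n^{-3} n^{1-2\kappa}\big)\Big\}.
$$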
(ii) Next, we show the second assertion. By the choice of $\nu_n = \tilde C_0 r_n n^{-\kappa}$ with $\tilde C_0 \le C_0/2$ and condition (C6), we have
$$
\begin{aligned}
P(\mathcal{M}_* \subset \widehat{\mathcal{M}}) &\ge P\Big(\min_{j\in\mathcal{M}_*}\hat u_j > \nu_n\Big) \ge P\Big(\min_{j\in\mathcal{M}_*}u_j - \max_{j\in\mathcal{M}_*}|\hat u_j - u_j| > \nu_n\Big) \\
&= 1 - P\Big(\max_{j\in\mathcal{M}_*}|\hat u_j - u_j| \ge \min_{j\in\mathcal{M}_*}u_j - \nu_n\Big) \ge 1 - P\Big(\max_{j\in\mathcal{M}_*}|\hat u_j - u_j| \ge \nu_n\Big) \\
&\ge 1 - s_n\Big\{7\exp\big(-c_6^* a_0^{2r_n} r_n^2 n^{1-4\kappa}\big) + \big[116(r_nL_n)^2 + 60r_nL_n + 10\big]\exp\big(-c_{14}^* a_0^{2r_n} L_n^{-3} n^{1-2\kappa}\big)\Big\},
\end{aligned}
$$
where the second-to-last inequality uses $\min_{j\in\mathcal{M}_*}u_j \ge C_0 r_n n^{-\kappa} \ge 2\nu_n$ from condition (C6). This completes the proof. □
Proof of Theorem 2.
By the assumption that $\sum_{j=1}^p u_j^* = O(n^\varsigma)$, the size of $\{j: u_j^* > \tilde C_0 r_n n^{-\kappa}\}$ cannot exceed $O(r_n^{-1}n^{\kappa+\varsigma})$. Thus, for any $\delta > 0$, on the event $\mathcal{A}_n = \big\{\max_{1\le j\le p}|\hat u_j - u_j^*| \le \delta r_n n^{-\kappa}\big\}$, the size of $\{j: \hat u_j > 2\delta r_n n^{-\kappa}\}$ cannot exceed the size of $\{j: u_j^* > \delta r_n n^{-\kappa}\}$, which is bounded by $O(r_n^{-1}n^{\kappa+\varsigma})$. Then, taking $\delta = \tilde C_0$ and $\nu_n = 2\tilde C_0 r_n n^{-\kappa}$, we have
$$
P\big(|\widehat{\mathcal{M}}| \le O(r_n^{-1}n^{\kappa+\varsigma})\big) \ge P(\mathcal{A}_n) \ge 1 - P\Big(\max_{1\le j\le p}|\hat u_j - u_j^*| > \tilde C_0 r_n n^{-\kappa}\Big).
$$
Therefore, the desired conclusion follows from part (i) of Theorem 1. □
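To connect the theory with computation, the following is a minimal, self-contained Python sketch of the NQPC screening utility $\hat u_j$ analyzed above. It is an illustration under assumptions, not the authors' implementation: the spline settings, the quantile solver (statsmodels' QuantReg), the toy data, and the top-$d$ cutoff are all hypothetical choices, and the conditioning set $S_j$ is taken to be the same known set for every $j$.

```python
import numpy as np
from sklearn.preprocessing import SplineTransformer
from statsmodels.regression.quantile_regression import QuantReg

def psi(u, tau):
    # psi_tau(u) = tau - I(u < 0)
    return tau - (u < 0).astype(float)

def nqpc(y, x_j, X_cond, tau, n_knots=5):
    """Sample NQPC of (y, x_j) given conditioning variables X_cond.

    Both conditional fits are additive in B-spline bases: a tau-th
    quantile regression for y and a least-squares fit for x_j."""
    # Additive B-spline basis for the conditioning variables
    B = SplineTransformer(n_knots=n_knots, degree=3,
                          include_bias=False).fit_transform(X_cond)
    B1 = np.column_stack([np.ones(len(y)), B])
    # Remove the conditional effect on y via spline quantile regression
    u = y - B1 @ QuantReg(y, B1).fit(q=tau).params
    # Remove the conditional effect on x_j via spline least squares
    v = x_j - B1 @ np.linalg.lstsq(B1, x_j, rcond=None)[0]
    # Quantile partial correlation of the residuals
    return np.mean(psi(u, tau) * v) / np.sqrt(tau * (1 - tau) * np.mean(v**2))

# Toy illustration (hypothetical data): screen p predictors, keep top d
rng = np.random.default_rng(1)
n, p, tau = 200, 200, 0.5
X = rng.normal(size=(n, p))
y = 2 * np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=n)
X_cond = X[:, :1]                      # conditioning set, assumed known
scores = np.array([abs(nqpc(y, X[:, j], X_cond, tau)) for j in range(1, p)])
d = int(n / np.log(n))                 # a common screening model size
keep = 1 + np.argsort(scores)[::-1][:d]
```

The normalization by $\sqrt{\tau(1-\tau)\,\mathrm{var}(v)}$ mirrors the quantile-correlation scaling of [27]; in the theory above, the screening set $\widehat{\mathcal{M}}$ is formed by thresholding $\hat u_j$ at $\nu_n$, which the top-$d$ rule in this sketch approximates.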

References

1. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. B 1996, 58, 267–288.
2. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
3. Zou, H.; Li, R. One-step sparse estimates in nonconcave penalized likelihood models. Ann. Stat. 2008, 36, 1509–1533.
4. Zhang, C. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
5. Fan, J.; Lv, J. Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. B 2008, 70, 849–911.
6. Cheng, M.; Honda, T.; Li, J.; Peng, H. Nonparametric independence screening and structure identification for ultra-high dimensional longitudinal data. Ann. Stat. 2014, 42, 1819–1849.
7. Fan, J.; Feng, Y.; Song, R. Nonparametric independence screening in sparse ultra-high-dimensional additive models. J. Am. Stat. Assoc. 2011, 106, 544–557.
8. Fan, J.; Ma, Y.; Dai, W. Nonparametric independence screening in sparse ultra-high dimensional varying coefficient models. J. Am. Stat. Assoc. 2014, 109, 1270–1284.
9. Fan, J.; Song, R. Sure independence screening in generalized linear models with NP-dimensionality. Ann. Stat. 2010, 38, 3567–3604.
10. Liu, J.; Li, R.; Wu, R. Feature selection for varying coefficient models with ultrahigh-dimensional covariates. J. Am. Stat. Assoc. 2014, 109, 266–274.
11. Xia, X.; Li, J.; Fu, B. Conditional quantile correlation learning for ultrahigh dimensional varying coefficient models and its application in survival analysis. Statist. Sinica 2019, 29, 645–669.
12. Chang, J.; Tang, C.Y.; Wu, Y. Local independence feature screening for nonparametric and semiparametric models by marginal empirical likelihood. Ann. Stat. 2016, 44, 515–539.
13. He, X.; Wang, L.; Hong, H. Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann. Stat. 2013, 41, 342–369.
14. Liu, W.; Ke, Y.; Liu, J.; Li, R. Model-free feature screening and FDR control with knockoff features. J. Am. Stat. Assoc. 2022, 117, 428–443.
15. Li, J.; Zheng, Q.; Peng, L.; Huang, Z. Survival impact index and ultrahigh-dimensional model-free screening with survival outcomes. Biometrics 2016, 72, 1145–1154.
16. Li, R.; Zhong, W.; Zhu, L. Feature screening via distance correlation learning. J. Am. Stat. Assoc. 2012, 107, 1129–1139.
17. Ma, S.; Li, R.; Tsai, C. Variable screening via quantile partial correlation. J. Am. Stat. Assoc. 2017, 112, 650–663.
18. Mai, Q.; Zou, H. The fused Kolmogorov filter: A nonparametric model-free screening method. Ann. Stat. 2015, 43, 1471–1497.
19. Wu, Y.; Yin, G. Conditional quantile screening in ultrahigh-dimensional heterogeneous data. Biometrika 2015, 102, 65–76.
20. Zhou, T.; Zhu, L.; Xu, C.; Li, R. Model-free forward screening via cumulative divergence. J. Am. Stat. Assoc. 2020, 115, 1393–1405.
21. Barut, E.; Fan, J.; Verhasselt, A. Conditional sure independence screening. J. Am. Stat. Assoc. 2016, 111, 1266–1277.
22. Xia, X.; Jiang, B.; Li, J.; Zhang, W. Low-dimensional confounder adjustment and high-dimensional penalized estimation for survival analysis. Lifetime Data Anal. 2016, 22, 549–569.
23. Chu, W.; Li, R.; Reimherr, M. Feature screening for time-varying coefficient models with ultrahigh-dimensional longitudinal data. Ann. Appl. Stat. 2016, 10, 596–617.
24. Liu, Y.; Wang, Q. Model-free feature screening for ultrahigh-dimensional data conditional on some variables. Ann. Inst. Stat. Math. 2018, 70, 283–301.
25. Wen, C.; Pan, W.; Huang, M.; Wang, X. Sure independence screening adjusted for confounding covariates with ultrahigh-dimensional data. Statist. Sinica 2018, 28, 293–317.
26. Li, R.; Liu, J.; Lou, L. Variable selection via partial correlation. Statist. Sinica 2017, 27, 983–996.
27. Li, G.; Li, Y.; Tsai, C.L. Quantile correlations and quantile autoregressive modeling. J. Am. Stat. Assoc. 2015, 110, 246–261.
28. Xia, X.; Li, J. Copula-based partial correlation screening: A joint and robust approach. Statist. Sinica 2021, 31, 421–447.
29. De Boor, C. A Practical Guide to Splines; Springer: New York, NY, USA, 2001.
30. Huang, J.Z.; Wu, C.; Zhou, L. Varying-coefficient models and basis function approximation for the analysis of repeated measurements. Biometrika 2002, 89, 111–128.
31. Xia, X. Model averaging prediction for nonparametric varying-coefficient models with B-spline smoothing. Stat. Pap. 2022, 62, 2885–2905.
32. Huang, J.; Horowitz, J.; Wei, F. Variable selection in nonparametric additive models. Ann. Stat. 2010, 38, 2282–2313.
33. Stone, C. Additive regression and other nonparametric models. Ann. Stat. 1985, 13, 689–705.
34. Zhou, S.; Shen, X.; Wolfe, D.A. Local asymptotics for regression splines and confidence regions. Ann. Stat. 1998, 26, 1760–1782.
35. Kalisch, M.; Bühlmann, P. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 2007, 8, 613–636.
36. Chin, K.; DeVries, S.; Fridlyand, J.; Spellman, P.T.; Roydasgupta, R.; Kuo, W.L.; Lapuk, A.; Neve, R.M.; Qian, Z.; Ryder, T.; et al. Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell 2006, 10, 529–541.
37. Zhou, Y.; Liu, J.; Hao, Z.; Zhu, L. Model-free conditional feature screening with exposure variables. Stat. Interface 2019, 12, 239–251.
38. Chen, Z.; Fan, J.; Li, R. Error variance estimation in ultrahigh dimensional additive models. J. Am. Stat. Assoc. 2018, 113, 315–327.
39. Van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes; Springer: New York, NY, USA, 1996.
40. Ledoux, M.; Talagrand, M. Probability in Banach Spaces: Isoperimetry and Processes; Springer: Berlin, Germany, 1991.
41. Massart, P. About the constants in Talagrand's concentration inequalities for empirical processes. Ann. Probab. 2000, 28, 863–884.
42. Koenker, R. Quantile Regression; Cambridge University Press: Cambridge, UK, 2005.
Table 1. Simulation results for Example 1 when n = 200.

| ε | ρ | Method | s_n | MMS(RSD), τ=0.2 | P, τ=0.2 | MMS(RSD), τ=0.5 | P, τ=0.5 | MMS(RSD), τ=0.8 | P, τ=0.8 |
|---|---|---|---|---|---|---|---|---|---|
| N(0,1) | 0.5 | SIS | 4 | 455.5(319.3) | 0 | 437(330.3) | 0 | 434.5(372) | 0 |
| | | NIS | 4 | 451(456.5) | 0.025 | 506(421.3) | 0 | 486.5(390.5) | 0 |
| | | Qa-SIS | 4 | 466(392.5) | 0.02 | 466.5(375.5) | 0.01 | 490.5(382.3) | 0.01 |
| | | QPC-SIS | 4 | 4(0) | 1 | 4(0) | 1 | 4(0) | 1 |
| | | NQPC-SIS | 4 | 4(0) | 0.995 | 4(0) | 1 | 4(0) | 1 |
| | 0.8 | SIS | 4 | 444.5(141.3) | 0 | 458(161.8) | 0 | 452.5(188) | 0 |
| | | NIS | 4 | 489.5(274.5) | 0 | 518.5(274) | 0 | 511(285.8) | 0 |
| | | Qa-SIS | 4 | 522(372.3) | 0.01 | 510.5(358) | 0 | 560.5(292.8) | 0 |
| | | QPC-SIS | 4 | 5(2) | 0.99 | 4(1) | 1 | 5(2) | 0.98 |
| | | NQPC-SIS | 4 | 6(3) | 0.96 | 4(2) | 0.99 | 6(3) | 0.96 |
| t(3) | 0.5 | SIS | 4 | 434.5(352.8) | 0.005 | 475(343.3) | 0 | 472.5(366) | 0 |
| | | NIS | 4 | 492.5(347.5) | 0.01 | 501.5(415) | 0 | 555.5(352.3) | 0 |
| | | Qa-SIS | 4 | 510.5(390.3) | 0.005 | 481(463.3) | 0.015 | 541.5(460.3) | 0.01 |
| | | QPC-SIS | 4 | 4(0) | 1 | 4(0) | 1 | 4(0) | 1 |
| | | NQPC-SIS | 4 | 4(0) | 0.995 | 4(0) | 1 | 4(0) | 1 |
| | 0.8 | SIS | 4 | 453(135.8) | 0 | 468(200.5) | 0 | 473(283.5) | 0 |
| | | NIS | 4 | 535.5(288.3) | 0 | 507(253.3) | 0 | 507.5(368.3) | 0 |
| | | Qa-SIS | 4 | 597.5(329.3) | 0 | 578.5(374) | 0 | 591.5(366.8) | 0.005 |
| | | QPC-SIS | 4 | 6(3) | 0.915 | 5(2) | 0.975 | 6(2) | 0.945 |
| | | NQPC-SIS | 4 | 6.5(6) | 0.84 | 5(3) | 0.955 | 6(5.3) | 0.855 |
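For readers reproducing these summaries: MMS is the minimum model size needed to recruit all truly important predictors in one replication, and P is the proportion of replications in which the method succeeds within the screening threshold (the paper's own definitions govern). A small helper sketch, with the RSD formula (IQR/1.34) assumed rather than quoted from the paper:

```python
import numpy as np

def minimum_model_size(scores, true_idx):
    # Smallest number of top-ranked predictors needed to cover
    # all truly active ones (the MMS for one replication)
    order = np.argsort(np.abs(scores))[::-1]
    ranks = {j: r for r, j in enumerate(order, start=1)}
    return max(ranks[j] for j in true_idx)

def summarize(mms_list):
    # Median MMS and a robust spread; IQR/1.34 is an assumed
    # stand-in for the paper's RSD definition
    mms = np.asarray(mms_list, dtype=float)
    q1, q3 = np.percentile(mms, [25, 75])
    return np.median(mms), (q3 - q1) / 1.34
```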
Table 2. Simulation results for Example 2 when n = 200.

| ε | ρ | Method | s_n | MMS(RSD), τ=0.2 | P, τ=0.2 | MMS(RSD), τ=0.5 | P, τ=0.5 | MMS(RSD), τ=0.8 | P, τ=0.8 |
|---|---|---|---|---|---|---|---|---|---|
| N(0,1) | 0.5 | SIS | 5 | 439.5(359.5) | 0 | 477(319) | 0 | 427(324.8) | 0 |
| | | NIS | 5 | 522(362) | 0.005 | 566(429.8) | 0 | 507.5(392) | 0 |
| | | Qa-SIS | 5 | 542.5(400.3) | 0 | 565.5(351.5) | 0 | 554(340.3) | 0 |
| | | QPC-SIS | 5 | 5(0) | 1 | 5(0) | 1 | 5(0) | 1 |
| | | NQPC-SIS | 5 | 5(0) | 1 | 5(0) | 1 | 5(0) | 1 |
| | 0.8 | SIS | 5 | 436(111.3) | 0 | 479.5(232.8) | 0 | 466.5(219.3) | 0 |
| | | NIS | 5 | 523.5(246.8) | 0 | 556.5(265.8) | 0 | 527(286.3) | 0 |
| | | Qa-SIS | 5 | 557.5(376.5) | 0 | 604(358.8) | 0 | 542(363.8) | 0 |
| | | QPC-SIS | 5 | 6(2) | 0.97 | 6(2) | 0.97 | 7(3) | 0.93 |
| | | NQPC-SIS | 5 | 7(2) | 0.9 | 6(2) | 0.945 | 7(3) | 0.9 |
| t(3) | 0.5 | SIS | 5 | 478.5(347) | 0 | 451.5(308.8) | 0 | 483.5(322.8) | 0 |
| | | NIS | 5 | 535.5(384.3) | 0 | 545.5(317.8) | 0.005 | 508(353) | 0 |
| | | Qa-SIS | 5 | 597.5(389.5) | 0.005 | 568.5(341.5) | 0 | 593.5(435.8) | 0.005 |
| | | QPC-SIS | 5 | 5(0) | 0.99 | 5(0) | 1 | 5(0) | 0.995 |
| | | NQPC-SIS | 5 | 5(1) | 0.985 | 5(0) | 0.995 | 5(1) | 0.975 |
| | 0.8 | SIS | 5 | 468(286.3) | 0 | 477(238.5) | 0 | 466(229) | 0 |
| | | NIS | 5 | 530.5(324.8) | 0 | 532.5(321.3) | 0 | 525.5(245.5) | 0 |
| | | Qa-SIS | 5 | 655(391.8) | 0 | 590.5(361.5) | 0 | 591.5(374.8) | 0 |
| | | QPC-SIS | 5 | 7(21.3) | 0.765 | 7(3) | 0.88 | 8(44.8) | 0.74 |
| | | NQPC-SIS | 5 | 8(81.8) | 0.63 | 7(7.3) | 0.82 | 11(136.5) | 0.57 |
Table 3. Simulation results for Example 3 when n = 200.

| ε | τ | Method | s_n | MMS(RSD), t=1 | P, t=1 | MMS(RSD), t=2 | P, t=2 |
|---|---|---|---|---|---|---|---|
| N(0,1) | 0.2 | Qa-SIS | 5 | 695(461.5) | 0.005 | 799.5(326.5) | 0 |
| | | QPC-SIS | 5 | 46(165.3) | 0.485 | 298.5(477.5) | 0.075 |
| | | NQPC-SIS | 5 | 7(13) | 0.82 | 160(345.8) | 0.25 |
| | 0.5 | SIS | 5 | 492(805) | 0.155 | 1000(1) | 0 |
| | | NIS | 5 | 533(608) | 0.035 | 798(391) | 0.005 |
| | | Qa-SIS | 5 | 761.5(460) | 0.005 | 762.5(353) | 0 |
| | | QPC-SIS | 5 | 8.5(33) | 0.75 | 211(372.8) | 0.145 |
| | | NQPC-SIS | 5 | 5(0) | 0.96 | 29(103.8) | 0.56 |
| | 0.8 | Qa-SIS | 5 | 599(508) | 0.01 | 708(348.3) | 0 |
| | | QPC-SIS | 5 | 47(213) | 0.46 | 393(400) | 0.015 |
| | | NQPC-SIS | 5 | 6(7) | 0.86 | 156.5(388) | 0.23 |
| t(3) | 0.2 | Qa-SIS | 5 | 689.5(434) | 0 | 794(345) | 0 |
| | | QPC-SIS | 5 | 85.5(239.5) | 0.28 | 548.5(518.5) | 0.025 |
| | | NQPC-SIS | 5 | 46.5(181) | 0.46 | 487(551) | 0.055 |
| | 0.5 | SIS | 5 | 626.5(767.8) | 0.1 | 999(6) | 0 |
| | | NIS | 5 | 560.5(574.5) | 0.025 | 742(430.3) | 0 |
| | | Qa-SIS | 5 | 673.5(449.3) | 0.005 | 751.5(358.5) | 0 |
| | | QPC-SIS | 5 | 21.5(89.3) | 0.56 | 331.5(467.8) | 0.06 |
| | | NQPC-SIS | 5 | 6(5) | 0.855 | 136.5(388) | 0.21 |
| | 0.8 | Qa-SIS | 5 | 583(458.5) | 0.015 | 711.5(382.3) | 0 |
| | | QPC-SIS | 5 | 108.5(303) | 0.36 | 623(448) | 0.005 |
| | | NQPC-SIS | 5 | 28.5(136.3) | 0.52 | 413(489.5) | 0.03 |
Table 4. Simulation results for Example 4 when n = 200.

| ε | τ | Method | s_n | MMS(RSD), t=1 | P, t=1 | MMS(RSD), t=2 | P, t=2 |
|---|---|---|---|---|---|---|---|
| N(0,1) | 0.2 | Qa-SIS | 5 | 735(448.5) | 0.005 | 681.5(388.5) | 0 |
| | | QPC-SIS | 5 | 647.5(425) | 0.005 | 746.5(329.3) | 0 |
| | | NQPC-SIS | 5 | 5(1) | 0.945 | 86(334) | 0.385 |
| | 0.5 | SIS | 5 | 793(268.3) | 0 | 846.5(252.8) | 0 |
| | | NIS | 5 | 765.5(298.8) | 0 | 896.5(225.8) | 0 |
| | | Qa-SIS | 5 | 749.5(274.8) | 0 | 818(301.5) | 0 |
| | | QPC-SIS | 5 | 717.5(326.8) | 0 | 805.5(254.3) | 0 |
| | | NQPC-SIS | 5 | 5(0) | 1 | 8(60.3) | 0.705 |
| | 0.8 | Qa-SIS | 5 | 836(274) | 0 | 867.5(243.8) | 0 |
| | | QPC-SIS | 5 | 798.5(248.3) | 0 | 811.5(249.3) | 0 |
| | | NQPC-SIS | 5 | 5(1) | 0.985 | 61(355.8) | 0.44 |
| t(3) | 0.2 | Qa-SIS | 5 | 716.5(374.3) | 0 | 703(375.3) | 0 |
| | | QPC-SIS | 5 | 603(422) | 0.01 | 743.5(295.3) | 0 |
| | | NQPC-SIS | 5 | 7(22.5) | 0.78 | 317(592.3) | 0.15 |
| | 0.5 | SIS | 5 | 786.5(261.8) | 0 | 869.5(301.3) | 0 |
| | | NIS | 5 | 779(285.5) | 0 | 833.5(259.8) | 0 |
| | | Qa-SIS | 5 | 754.5(324.5) | 0 | 800(255.8) | 0 |
| | | QPC-SIS | 5 | 755.5(379.5) | 0 | 825(296.3) | 0 |
| | | NQPC-SIS | 5 | 5(0) | 0.99 | 61.5(302.3) | 0.435 |
| | 0.8 | Qa-SIS | 5 | 819.5(255.8) | 0 | 869(241.3) | 0 |
| | | QPC-SIS | 5 | 795(249.3) | 0 | 847(260.3) | 0 |
| | | NQPC-SIS | 5 | 6(9) | 0.835 | 375(576) | 0.14 |
Table 5. Simulation results for Examples 1 to 4 when ρ = 0.9 and τ = 0.5.

| Example | Method | MMS(RSD), n=200, N(0,1) | P | MMS(RSD), n=200, t(3) | P | MMS(RSD), n=400, N(0,1) | P | MMS(RSD), n=400, t(3) | P |
|---|---|---|---|---|---|---|---|---|---|
| Example 1 | QPC-SIS | 6(2) | 0.94 | 9(64) | 0.665 | 5(2) | 1 | 5(2) | 0.985 |
| | NQPC-SIS | 6(3) | 0.88 | 30.5(174.3) | 0.52 | 5(2) | 0.99 | 6(2) | 0.93 |
| Example 2 | QPC-SIS | 7(2) | 0.92 | 5.5(204.5) | 0.525 | 6(2) | 1 | 7(2) | 0.985 |
| | NQPC-SIS | 8(9.25) | 0.825 | 44.5(178.8) | 0.465 | 6(2) | 0.995 | 7(2) | 0.95 |
| Example 3 | QPC-SIS | 608.5(471.5) | 0 | 687.5(400) | 0.005 | 253(408.3) | 0.175 | 399.5(521) | 0.105 |
| | NQPC-SIS | 341(505.8) | 0.09 | 527.5(496.5) | 0.025 | 19(75) | 0.705 | 69.5(247.8) | 0.475 |
| Example 4 | QPC-SIS | 770.5(311.5) | 0 | 802.5(232.8) | 0 | 752.5(312) | 0 | 784(258.3) | 0 |
| | NQPC-SIS | 301.5(484.3) | 0.11 | 433(554.8) | 0.075 | 11(44) | 0.76 | 38.5(142.8) | 0.58 |
Table 6. Simulation results for Examples 1 to 4 when n = 200 and τ = 0.5.

| Example | Method | EMS(SD), ρ=0.5, N(0,1) | P | EMS(SD), ρ=0.5, t(3) | P | EMS(SD), ρ=0.8, N(0,1) | P | EMS(SD), ρ=0.8, t(3) | P |
|---|---|---|---|---|---|---|---|---|---|
| Example 1 | QC-SIS | 2.985(0.122) | 0 | 2.925(0.346) | 0 | 2.570(0.969) | 0 | 2.340(1.162) | 0 |
| | RFQPC-SIS | 3.995(0.071) | 0.995 | 3.975(0.157) | 0.975 | 3.675(0.470) | 0.675 | 3.460(0.500) | 0.46 |
| | NQPC-SIS | 4(0) | 1 | 4(0) | 1 | 3.985(0.122) | 0.985 | 3.930(0.256) | 0.93 |
| Example 2 | QC-SIS | 3.565(0.646) | 0 | 3.490(0.657) | 0 | 3.355(1.147) | 0 | 3.255(1.280) | 0 |
| | RFQPC-SIS | 4.745(0.437) | 0.745 | 4.630(0.504) | 0.64 | 4.475(0.609) | 0.535 | 4.215(0.649) | 0.335 |
| | NQPC-SIS | 5(0) | 1 | 5(0) | 1 | 4.960(0.221) | 0.965 | 4.785(0.447) | 0.8 |
| Example 3 | QC-SIS | 2.470(0.862) | 0.03 | 2.575(0.894) | 0 | 1.505(0.567) | 0 | 1.460(0.592) | 0 |
| | RFQPC-SIS | 4.945(0.229) | 0.945 | 4.730(0.788) | 0.75 | 4.275(0.808) | 0.465 | 3.535(0.175) | 0.175 |
| | NQPC-SIS | 4.955(0.208) | 0.955 | 4.785(0.424) | 0.79 | 4.320(0.825) | 0.49 | 3.715(1.109) | 0.265 |
| Example 4 | QC-SIS | 1.775(0.613) | 0 | 1.790(0.720) | 0 | 1.685(0.639) | 0 | 1.690(0.629) | 0 |
| | RFQPC-SIS | 3.045(0.739) | 0.02 | 3.040(0.788) | 0.045 | 2.455(0.616) | 0 | 2.350(0.528) | 0 |
| | NQPC-SIS | 5(0) | 1 | 4.980(0.140) | 0.98 | 4.590(0.731) | 0.715 | 4.205(0.864) | 0.465 |
Table 7. Prediction results for the real data on the test set, where the standard deviation is given in parentheses.

| Method | τ=0.3 | τ=0.4 | τ=0.5 | τ=0.6 | τ=0.7 |
|---|---|---|---|---|---|
| SIS | – | – | 0.8202(0.1362) | – | – |
| NIS | – | – | 0.8254(0.1348) | – | – |
| Qa-SIS | 0.8318(0.1472) | 0.8261(0.1446) | 0.8375(0.1448) | 0.8612(0.1541) | 0.8431(0.1512) |
| QPC-SIS | 0.7269(0.1267) | 0.7989(0.1458) | 0.8347(0.1471) | 0.6495(0.1155) | 1.0240(0.1732) |
| NQPC-SIS | 0.7629(0.1356) | 0.6488(0.1232) | 0.6742(0.1285) | 0.8156(0.1643) | 0.7802(0.1315) |
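Table 7 compares out-of-sample prediction across quantile levels. If, as is standard for quantile models (an assumption here; the paper's empirical section defines the exact criterion), the reported values are average check losses on the test set, they can be computed as follows:

```python
import numpy as np

def avg_check_loss(y_true, y_pred, tau):
    # Average tau-th check (quantile) loss over a held-out test set
    u = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(u * (tau - (u < 0))))
```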