Article

Partially Functional Linear Models with Linear Process Errors

1 School of Mathematical Sciences, Tongji University, Shanghai 200092, China
2 Department of Applied Mathematics, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(16), 3581; https://doi.org/10.3390/math11163581
Submission received: 20 July 2023 / Revised: 10 August 2023 / Accepted: 15 August 2023 / Published: 18 August 2023
(This article belongs to the Special Issue Statistical Modeling for Analyzing Data with Complex Structures)

Abstract

In this paper, we focus on the partial functional linear model with linear process errors generated by not necessarily independent random variables. Based on Mercer's theorem and the Karhunen–Loève expansion, we construct estimators of the slope parameter and the coefficient function in the model, establish the asymptotic normality of the parameter estimator and derive weak convergence rates for the proposed estimators. In addition, a penalized estimator of the parameter is defined via the SCAD penalty and its oracle property is investigated. The finite sample behavior of the proposed estimators is also analysed via simulations.

1. Introduction

Over the last two decades, there has been increasing interest in functional data analysis due to its extensive applications in biometrics, chemometrics, econometrics, medical research and other fields. Functional data are intrinsically infinite-dimensional; thus, classical methods for multivariate observations are no longer applicable. The functional linear model is an important model in functional data analysis and has been extensively investigated. Ramsay and Silverman [1] systematically introduced statistical analysis methods for functional data and described the regression relationship between functional covariates and scalar responses via functional linear models; further investigations include Cardot et al. [2], Cardot and Sarda [3], Li and Hsing [4], and Hall and Horowitz [5], who constructed estimators of the slope function in the functional linear model and established their convergence rates based on the functional principal component analysis (FPCA) technique. For more on the analysis of functional data, refer to Hall and Hosseini-Nasab [6], Horváth and Kokoszka [7], Hsing and Eubank [8], Ferraty and Vieu [9], among others.
In real data, the response variable is also affected by other covariates. Shin [10] proposed the following partial functional linear model:
$$Y = z^{T}\beta + \int_0^1\gamma(t)X(t)\,dt + V, \qquad (1)$$
where $Y\in\mathbb{R}$ is the response variable, $z$ is a $d$-dimensional covariate vector with $Ez = 0$, $X(t)$ is a square integrable stochastic process on $[0,1]$ with $EX(t) = 0$, $\beta$ is an unknown $d$-dimensional parameter vector, $\gamma(t)$ is a square integrable unknown coefficient function and $V$ is the regression error, which is independent of $\{X(t), z\}$.
The spline method is frequently employed to investigate functional data. Based on B-splines, Yuan and Zhang [11] used the residual sum of squares to construct a test statistic for $\beta$ in model (1); Hu and Liang [12] considered empirical likelihood in the single-index partially functional linear model when observations are missing at random; Jiang and Huang [13] discussed single-index partially functional linear quantile regression, used B-splines to approximate the link function and the slope function, and further established the convergence rates and asymptotic normality of the estimators. Bouka et al. [14] employed smoothing splines to study estimation in a spatial functional linear regression model.
However, the spline method has some drawbacks; for example, shifting a control point changes the entire curve, which makes it hard to regulate the curve's trend locally. In recent years, therefore, many researchers have become interested in the FPCA approach for analyzing functional data, since it permits a finite-dimensional analysis of an intrinsically infinite-dimensional problem. Feng and Xue [15] covered the estimation and testing of a partially functional linear varying coefficient model. Based on FPCA, Xie et al. [16] examined the asymptotic properties of a rank-based test for the hypothesis on $\beta$ in model (1). Hu et al. [17] concentrated on estimation for additive partial functional linear models with skew-normal errors. Tang et al. [18] proposed a two-step estimation procedure with FPCA in the partial functional partially linear additive model. At the same time, several papers discuss penalized estimators related to the partially functional linear model based on FPCA. For instance, Kong et al. [19] applied a group penalty to the functional predictors in a high-dimensional setting; Du et al. [20] analysed estimation and variable selection; Yao et al. [21] selected important variables via the SCAD penalty in the partially functional linear quantile model; Wu et al. [22] concentrated on constructing estimators of the parameter and the slope function when the responses are right-censored and the censoring indicators are missing at random, and proposed a variable selection procedure based on the adaptive lasso penalty.
It is known that the independence assumption for the model errors is not always appropriate in practical applications, especially for sequentially collected economic data, which often exhibit dependence in the errors. To date, Wang et al. [23] established asymptotic normality and weak convergence rates of the estimators of $\beta$ and $\gamma$, respectively, in model (1) when the errors form a stationary $\alpha$-mixing sequence, while Hu and Liang [24] used the reproducing kernel Hilbert space technique to study the parameter estimator and the convergence rate of the estimator of the slope function with missing observations under correlated errors $V_i = \sum_{j=-\infty}^{+\infty}c_je_{i-j}$, called a linear process, with $\sum_{j=-\infty}^{+\infty}|c_j| < \infty$; they further defined a penalized estimator of the parameter via the SCAD penalty and a test statistic for checking a linear hypothesis.
Motivated by the discussion in Hu and Liang [24], in this paper we focus on the partially functional linear model (1) when the regression error $V$ is a linear process generated by not necessarily independent random variables, using the FPCA method. In particular, we give the estimators $\hat{\beta}$ and $\hat{\gamma}(\cdot)$ of $\beta$ and $\gamma(\cdot)$, investigate the asymptotic normality of $\hat{\beta}$ and discuss the weak convergence rates of $\hat{\beta}$ and $\hat{\gamma}(\cdot)$. At the same time, a penalized estimator of $\beta$ is defined based on the SCAD penalty introduced by Fan and Li [25] and its oracle property is established. The finite sample behavior of the proposed estimators is also investigated via simulations.
The rest of the paper is organized as follows. In Section 2, we construct the estimators of the parameter and the slope function, including the penalized estimator of the parameter. The main results are described in Section 3. A simulation study is presented in Section 4. Conclusions are given in Section 5, and all proofs are collected in Section 6.

2. Estimators

2.1. Least Squares Estimation

Let $\{Y_i, X_i(t), z_i, 1\le i\le n\}$ come from $(Y, X(t), z)$ based on model (1), i.e.,
$$Y_i = z_i^{T}\beta + \int_0^1\gamma(t)X_i(t)\,dt + V_i, \qquad (2)$$
where $\{X_i(t), z_i, 1\le i\le n\}$ are assumed to be i.i.d. random variables, the errors $V_i = \sum_{j=-\infty}^{\infty}c_je_{i-j}$ with $\sum_{j=-\infty}^{\infty}|c_j| < \infty$ and $Ee_i = 0$. Set $C_X(s,t) = EX(s)X(t)$. The operator $T_X$ corresponding to $C_X(s,t)$ is defined by
$$(T_Xf)(\cdot) = \int_0^1C_X(\cdot,t)f(t)\,dt.$$
Then $T_X$ is non-negative definite, i.e., for $f\in L^2([0,1])$, $\langle T_Xf, f\rangle = \int_0^1\int_0^1EX(s)X(t)f(s)f(t)\,ds\,dt = E\big(\int_0^1X(s)f(s)\,ds\big)^2 \ge 0$, where $\langle f, g\rangle = \int_0^1f(t)g(t)\,dt$ for $f, g\in L^2[0,1]$. If $C_X(s,t)$ is continuous, then $C_X(\cdot,\cdot)$ has the following representation by Mercer's theorem (cf. Hsing and Eubank [8], Theorem 4.6.5, page 120):
$$C_X(s,t) = \sum_{i=1}^{\infty}\lambda_i\rho_i(s)\rho_i(t), \quad s,t\in[0,1],$$
where $\{\rho_i(s), i\ge1\}$ is an orthonormal basis of $L^2([0,1])$ and $(\lambda_i, \rho_i(s))$ are the (eigenvalue, eigenfunction) pairs of $T_X$, which satisfy $T_X\rho_i = \lambda_i\rho_i$, $\langle\rho_i,\rho_j\rangle = 0$ for $i\ne j$ and $\langle\rho_i,\rho_i\rangle = 1$. Without loss of generality, we assume $\lambda_1 > \lambda_2 > \cdots > 0$. The estimators of $(\lambda_i, \rho_i(s))$ are defined through
$$\hat{C}_X(s,t) = \frac{1}{n}\sum_{i=1}^{n}X_i(s)X_i(t) = \sum_{j=1}^{\infty}\hat{\lambda}_j\hat{\rho}_j(s)\hat{\rho}_j(t), \quad s,t\in[0,1], \quad \hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge 0,$$
where $\{(\hat{\lambda}_j, \hat{\rho}_j), j\ge1\}$ are the (eigenvalue, eigenfunction) pairs of the operator $\hat{T}_X$ corresponding to $\hat{C}_X$, with $\hat{T}_Xf(\cdot) = \int_0^1\hat{C}_X(\cdot,t)f(t)\,dt$. Here, since each $\rho_j$ is identified only up to sign, the sign of $\hat{\rho}_j$ is chosen to minimize $\|\hat{\rho}_j - \rho_j\|$ over the two possible signs, that is, $\hat{\rho}_j$ is chosen so that $\langle\hat{\rho}_j, \rho_j\rangle \ge 0$. Clearly, $\{\hat{\rho}_j, j\ge1\}$ is an orthonormal basis of $L^2([0,1])$ and $\{(\hat{\lambda}_j, \hat{\rho}_j), j\ge1\}$ satisfy $\hat{T}_X\hat{\rho}_j(t) = \hat{\lambda}_j\hat{\rho}_j(t)$.
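In practice, the empirical eigenpairs $(\hat{\lambda}_j, \hat{\rho}_j)$ are computed from curves observed on a grid by eigendecomposing the discretized covariance operator. The following is a minimal sketch, assuming centred curves on an equally spaced grid in $[0,1]$ and a Riemann-sum discretization; the function name and these implementation choices are ours, not part of the paper.

```python
import numpy as np

def empirical_fpca(X, grid):
    """Eigenpairs (lambda_hat_j, rho_hat_j) of the empirical covariance operator.

    X    : (n, l) array, rows are centred curves X_i evaluated on `grid`
    grid : (l,) equally spaced points in [0, 1]
    Returns eigenvalues (descending) and eigenfunctions as columns of an (l, l) array,
    scaled so that int_0^1 rho_hat_j(t)^2 dt ~= 1 under the Riemann approximation.
    """
    n, l = X.shape
    h = grid[1] - grid[0]                      # grid spacing
    C_hat = X.T @ X / n                        # C_hat(s, t) evaluated on the grid
    # Discretised operator: (T_hat f)(s) = int C_hat(s, t) f(t) dt  ~  (C_hat * h) f
    evals, evecs = np.linalg.eigh(C_hat * h)
    idx = np.argsort(evals)[::-1]              # sort eigenvalues in descending order
    evals, evecs = evals[idx], evecs[:, idx]
    rho_hat = evecs / np.sqrt(h)               # rescale to unit L2[0,1] norm
    return evals, rho_hat
```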
In addition, using the Karhunen–Loève expansion (cf. Hsing and Eubank [8], Theorem 2.4.13, page 34), $X(t)$ and $\gamma(t)$ have the following expansions:
$$X(t) = \sum_{j=1}^{\infty}\langle X, \rho_j\rangle\rho_j(t) = \sum_{j=1}^{\infty}\theta_j\rho_j(t), \qquad \gamma(t) = \sum_{j=1}^{\infty}\langle\gamma, \rho_j\rangle\rho_j(t) = \sum_{j=1}^{\infty}\alpha_j\rho_j(t),$$
where $\theta_j := \langle X, \rho_j\rangle$ and $\alpha_j := \langle\gamma, \rho_j\rangle$. Then $E\theta_j = 0$ and $\mathrm{Var}(\theta_j) = \lambda_j$. Thus, model (2) can be written as
$$Y_i = \beta^Tz_i + \sum_{j=1}^{\infty}\alpha_j\theta_{ij} + V_i, \qquad V_i = \sum_{j=-\infty}^{\infty}c_je_{i-j}, \quad \text{with } \theta_{ij} = \langle X_i, \rho_j\rangle. \qquad (3)$$
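The error structure $V_i = \sum_j c_je_{i-j}$ covers a wide range of dependent sequences. Purely as an illustration (not part of the paper), a linear process with finitely many nonzero coefficients $c_j$ can be simulated as follows; standard normal innovations are an assumption made here for concreteness.

```python
import numpy as np

def linear_process(n, coef, rng=None):
    """Simulate V_i = sum_j c_j e_{i-j} for a finite set of coefficients.

    coef : dict {j: c_j}; only finitely many nonzero c_j, so sum_j |c_j| < infinity
    """
    rng = np.random.default_rng(rng)
    lags = np.array(sorted(coef))
    c = np.array([coef[j] for j in lags])
    pad = int(np.abs(lags).max())
    e = rng.standard_normal(n + 2 * pad)       # innovations e_i (i.i.d. N(0,1) here)
    V = np.zeros(n)
    for i in range(n):
        V[i] = np.sum(c * e[pad + i - lags])   # pick out e_{i-j} for each stored lag j
    return V

# Example: an MA(2)-type error with c_0 = 1, c_1 = 0.5, c_2 = 0.25
V = linear_process(200, {0: 1.0, 1: 0.5, 2: 0.25}, rng=0)
```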
In order to define the estimators of $\beta$ and $\gamma(\cdot)$, we use an approximated form of (3):
$$Y_i \approx \beta^Tz_i + \sum_{j=1}^{m}\alpha_j\theta_{ij} + V_i, \quad i = 1, 2, \ldots, n, \qquad m := m_n\to\infty \ (n\to\infty), \qquad (4)$$
which can be rewritten in the matrix form $Y = Z\beta + U_m\alpha + V$, where $Y = (Y_1, \ldots, Y_n)^T$, $Z = (z_1, \ldots, z_n)^T$, $z_i = (z_{i1}, \ldots, z_{id})^T$, $U_m = (\langle X_i, \hat{\rho}_j\rangle)_{n\times m}$, $\alpha = (\alpha_1, \ldots, \alpha_m)^T$ and $V = (V_1, \ldots, V_n)^T$. The estimators $(\hat{\beta}, \hat{\alpha})$ of $(\beta, \alpha)$ are defined by minimizing the objective function
$$G(\beta, \alpha) = (Y - Z\beta - U_m\alpha)^T(Y - Z\beta - U_m\alpha).$$
Let $H_m = U_m(U_m^TU_m)^{-1}U_m^T$. When $Z^T(I - H_m)Z$ is invertible, we have
$$\hat{\beta} = \big[Z^T(I - H_m)Z\big]^{-1}Z^T(I - H_m)Y, \qquad \hat{\alpha} = (U_m^TU_m)^{-1}U_m^T(Y - Z\hat{\beta}).$$
Let $\delta_{jk}$ denote the Kronecker delta; then
$$\sum_{i=1}^{n}\langle X_i, \hat{\rho}_j\rangle\langle X_i, \hat{\rho}_k\rangle = n\int_0^1\int_0^1\hat{C}_X(s,t)\hat{\rho}_j(s)\hat{\rho}_k(t)\,ds\,dt = n\hat{\lambda}_j\delta_{jk},$$
which implies $U_m^TU_m = \mathrm{diag}(n\hat{\lambda}_1, n\hat{\lambda}_2, \ldots, n\hat{\lambda}_m)$. Put $\hat{C}_z = \frac{1}{n}\sum_{i=1}^{n}z_iz_i^T$, $\hat{C}_{zY} = \frac{1}{n}\sum_{i=1}^{n}z_iY_i$, $\hat{C}_{zX}(t) = \frac{1}{n}\sum_{i=1}^{n}z_iX_i(t)$, $\hat{C}_{Xz}(t) = \hat{C}_{zX}(t)^T$ and $\hat{C}_{YX}(t) = \frac{1}{n}\sum_{i=1}^{n}Y_iX_i(t)$.
Then $\hat{\beta}$ and $\hat{\alpha}$ can be rewritten, respectively, as
$$\hat{\beta} = \Big(\hat{C}_z - \sum_{j=1}^{m}\frac{\langle\hat{C}_{zX}, \hat{\rho}_j\rangle\langle\hat{C}_{Xz}, \hat{\rho}_j\rangle}{\hat{\lambda}_j}\Big)^{-1}\Big(\hat{C}_{zY} - \sum_{j=1}^{m}\frac{\langle\hat{C}_{zX}, \hat{\rho}_j\rangle\langle\hat{C}_{YX}, \hat{\rho}_j\rangle}{\hat{\lambda}_j}\Big), \qquad \hat{\alpha} = \Big(\frac{\langle\hat{C}_{YX} - \hat{\beta}^T\hat{C}_{zX}, \hat{\rho}_j\rangle}{\hat{\lambda}_j}\Big)_{j=1,2,\ldots,m} := (\hat{\alpha}_1, \hat{\alpha}_2, \ldots, \hat{\alpha}_m)^T,$$
where $\langle\hat{C}_{zX}, \hat{\rho}_j\rangle = \frac{1}{n}\sum_{i=1}^{n}z_i\langle X_i, \hat{\rho}_j\rangle$ and $\langle\hat{C}_{YX}, \hat{\rho}_j\rangle = \frac{1}{n}\sum_{i=1}^{n}Y_i\langle X_i, \hat{\rho}_j\rangle$. The estimator $\hat{\gamma}(t)$ of $\gamma(t)$ is defined as $\hat{\gamma}(t) = \sum_{j=1}^{m}\hat{\alpha}_j\hat{\rho}_j(t)$.
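Putting the pieces together, a minimal sketch of the least squares estimators $\hat{\beta}$, $\hat{\alpha}$ and $\hat{\gamma}$ might look as follows. It reuses the hypothetical empirical_fpca helper sketched above and the hat-matrix form $H_m = U_m(U_m^TU_m)^{-1}U_m^T$; the grid-based inner products are Riemann-sum approximations, and the function name is ours.

```python
import numpy as np

def pflm_fit(Y, Z, X, grid, m):
    """Least squares estimators of Section 2.1.

    Y : (n,) responses;  Z : (n, d) scalar covariates;  X : (n, l) curves on `grid`.
    Uses the first m empirical eigenfunctions from `empirical_fpca`.
    """
    h = grid[1] - grid[0]
    lam_hat, rho_hat = empirical_fpca(X, grid)
    U = X @ rho_hat[:, :m] * h                       # U[i, j] = <X_i, rho_hat_j>
    H = U @ np.linalg.solve(U.T @ U, U.T)            # hat matrix H_m
    P = np.eye(len(Y)) - H                           # I - H_m
    beta_hat = np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ Y)
    alpha_hat = np.linalg.solve(U.T @ U, U.T @ (Y - Z @ beta_hat))
    gamma_hat = rho_hat[:, :m] @ alpha_hat           # gamma_hat(t) on the grid
    return beta_hat, alpha_hat, gamma_hat
```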

2.2. Variable Selection

Variable selection is a crucial step when the dimensionality of the covariate $z$ in (1) is high, and it is of great interest to identify the nonzero components of $\beta$. In this paper, we adopt the SCAD penalty introduced by Fan and Li [25] to obtain a penalized estimator. In particular, the first-order derivative of the SCAD penalty function $p_\omega(t)$ is
$$p_\omega'(t) = \omega\Big\{I(t\le\omega) + \frac{(a\omega - t)_+}{(a-1)\omega}I(t>\omega)\Big\}, \quad t > 0,$$
where $\omega$ is a tuning parameter and $a = 3.7$ as suggested by Fan and Li [25]. Hence, we define the penalized estimator of $\beta$ as
$$\hat{\beta}_p = \arg\min_{\beta\in\mathbb{R}^d}G_p(\beta) := \arg\min_{\beta\in\mathbb{R}^d}\Big\{G(\beta, \alpha(\beta)) + n\sum_{j=1}^{d}p_\omega(|\beta_j|)\Big\}, \qquad (5)$$
where $\alpha(\beta) = (U_m^TU_m)^{-1}U_m^T(Y - Z\beta)$ and $\beta_j$ is the $j$-th component of $\beta$.
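For reference, the SCAD derivative above translates directly into code; the helper below is a sketch and its name is ours. In practice $\hat{\beta}_p$ can then be computed by iterating, e.g., a local quadratic or linear approximation of the penalty; that algorithmic choice is not spelled out in the paper.

```python
import numpy as np

def scad_deriv(t, omega, a=3.7):
    """First derivative p'_omega(t) of the SCAD penalty for t >= 0 (Fan and Li, 2001)."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= omega,
                    omega,                                   # flat slope omega for t <= omega
                    np.maximum(a * omega - t, 0.0) / (a - 1.0))  # tapering part for t > omega
```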
Remark 1. 
In the simulations below, the tuning parameter $\omega$ in (5) is selected by 10-fold cross-validation.
Let $\beta_0 = (\beta_{01}, \ldots, \beta_{0d})^T$ be the true value of $\beta$, and put $A = \{j: \beta_{0j}\ne 0\}$ and $d_1 = |A|$. Without loss of generality, we assume $\beta_0 = (\beta_{01}^T, \beta_{02}^T)^T$, where $\beta_{01}\in\mathbb{R}^{d_1}$ and $\beta_{02}\in\mathbb{R}^{d-d_1}$ are the nonzero and zero components of $\beta_0$, respectively, i.e., $\beta_0 = (\beta_{01}^T, 0^T)^T$, and the corresponding penalized estimator is written as $\hat{\beta}_p = (\hat{\beta}_{p1}^T, \hat{\beta}_{p2}^T)^T$.

3. Main Results

In the sequel, let $C, C_1, \ldots$ and $c_0, c, c_1, \ldots$ denote generic finite positive constants whose values may change from line to line; $a_n\asymp b_n$ means $C_1\le a_n/b_n\le C_2$; $\|f\| = \big(\int_0^1f^2(t)\,dt\big)^{1/2}$. For convenience of statement, we introduce the following notation:
$$\eta_{ik} = z_{ik} - \langle f_k, X_i\rangle = z_{ik} - \sum_{j=1}^{\infty}\frac{\langle C_{z_kX}, \rho_j\rangle\langle\rho_j, X_i\rangle}{\lambda_j}, \quad \eta_i = (\eta_{i1}, \eta_{i2}, \ldots, \eta_{id})^T, \quad C_z = \mathrm{Var}(z), \quad C_{zX} = (C_{z_1X}, \ldots, C_{z_dX})^T, \quad C_{Xz} = (C_{zX})^T, \quad f_k = \sum_{j=1}^{\infty}\frac{\langle C_{z_kX}, \rho_j\rangle\rho_j}{\lambda_j}, \quad \text{where } C_{z_kX} = \mathrm{Cov}(z_k, X),$$
and $\|z\|_{\mathbb{R}^d}^2 = z_1^2 + z_2^2 + \cdots + z_d^2$ for $z = (z_1, z_2, \ldots, z_d)^T$.
In order to state the main results of this paper, we impose the following assumptions.
(A0)
Let the random variables $\{e_i\}$ be identically distributed with $Ee_1^2 < \infty$ or square uniformly integrable, and satisfy $Ee_i = 0$.
(A1)
$\int_0^1X^2(t)\,dt < \infty$ a.s., and $E\|X\|^4 < \infty$.
(A2)
$E\|z\|_{\mathbb{R}^d}^4 < \infty$.
(A3)
For each $j$, $E\theta_j^4 \le c\lambda_j^2$, $c_1j^{-a} \le \lambda_j \le c_2j^{-a}$ and $\lambda_j - \lambda_{j+1} \ge cj^{-a-1}$ for some $a > 1$.
(A4)
For each $j$ and some $b > a/2 + 1$: (i) $|\alpha_j| \le cj^{-b}$; (ii) $|\langle C_{z_kX}, \rho_j\rangle| \le cj^{-(a+b)}$ for each $k$.
(A5)
$m \asymp n^{\frac{1}{a+2b}}$.
(A6)
Let $\eta_{1k}, \eta_{2k}, \ldots, \eta_{nk}$ be i.i.d. random variables satisfying $E(\eta_{1k}\mid X_1, X_2, \ldots, X_n) = 0$ a.s. and $E(\eta_{1k}^2\mid X_1, X_2, \ldots, X_n) = B_{kk}$ a.s., where $B_{kk}$ is the $k$-th diagonal element of $B = E(\eta_i\eta_i^T)$ and $B$ is a positive definite matrix. Assume that $E|\eta_{1k}|^{2+\delta} < \infty$ for some $\delta > 0$.
(A7)
$\liminf_{n\to\infty}\liminf_{\theta\to0^+}p_\omega'(\theta)/\omega > 0$, $\max_{j\in A}p_\omega'(|\beta_{0j}|) = o(n^{-1/2})$ and $\max_{j\in A}p_\omega''(|\beta_{0j}|) = o(1)$.
Remark 2. 
(a) 
It is easy to verify that $B = C_z - \sum_{j=1}^{\infty}\frac{\langle C_{zX}, \rho_j\rangle\langle C_{Xz}, \rho_j\rangle}{\lambda_j}$.
(b) 
(A1)–(A3), (A4)(i), (A5) and (A6) are standard regularity conditions in the partially functional linear model (cf. Shin [10]); when the $\lambda_j$ are decreasing in (A3), $\lambda_j - \lambda_{j+1} \ge cj^{-a-1}$ implies $\lambda_j \ge c_1j^{-a}$. (A1) implies $\|\hat{C}_X - C_X\|^2 := \int_0^1\int_0^1[\hat{C}_X(s,t) - C_X(s,t)]^2\,ds\,dt = O_p(n^{-1})$. In fact,
$$E\|\hat{C}_X - C_X\|^2 = \int_0^1\int_0^1E\Big[n^{-1}\sum_{i=1}^{n}X_i(s)X_i(t) - EX(s)X(t)\Big]^2ds\,dt = n^{-1}\int_0^1\int_0^1\mathrm{Var}(X(s)X(t))\,ds\,dt \le n^{-1}E\Big(\int_0^1X^2(s)\,ds\Big)^2 = n^{-1}E\|X\|^4 = O(n^{-1}),$$
which implies $\|\hat{C}_X - C_X\|^2 = O_p(n^{-1})$.
(c) 
From (A3) and (A4)(ii), we have $|\langle f_k, \rho_j\rangle| = \Big|\Big\langle\sum_{t=1}^{\infty}\frac{\langle C_{z_kX}, \rho_t\rangle\rho_t}{\lambda_t}, \rho_j\Big\rangle\Big| = \frac{|\langle C_{z_kX}, \rho_j\rangle|}{\lambda_j} \le cj^{-b}$.
Theorem 1. 
Let (A0)–(A6) hold with $E(e_i\mid\ldots, e_{i-2}, e_{i-1}) = 0$ a.s. and $E(e_i^2\mid\ldots, e_{i-2}, e_{i-1}) = \sigma^2$ a.s. for $-\infty < i < \infty$. Then $\sqrt{n}(\hat{\beta} - \beta_0)\xrightarrow{D}N\Big(0, \ \sigma^2B^{-1}\sum_{j=-\infty}^{\infty}c_j^2\Big)$.
Theorem 2. 
Let (A0)–(A6) hold. Then $\|\hat{\gamma} - \gamma\|^2 = O_p\big(n^{-(2b-1)/(a+2b)}\big)$.
Remark 3. 
When $c_0 = 1$ and $c_j = 0$ for $j\ne 0$, $V_i = e_i$ and $\{e_i\}$ are i.i.d. random variables; Theorems 1 and 2 then reduce to Theorems 3.1 and 3.2 of Shin [10], respectively.
Theorem 3. 
Suppose that (A0)–(A7) hold. If $\omega\to 0$ and $n^{1/2}\omega\to\infty$, then
(1) 
Selection consistency: $P(\hat{\beta}_{p2} = 0)\to 1$;
(2) 
Asymptotic normality: if $E(e_i\mid\ldots, e_{i-2}, e_{i-1}) = 0$ a.s. and $E(e_i^2\mid\ldots, e_{i-2}, e_{i-1}) = \sigma^2$ a.s. for $-\infty < i < \infty$, then $\sqrt{n}(\hat{\beta}_{p1} - \beta_{01})\xrightarrow{D}N\Big(0, \ \sigma^2B_{p1}^{-1}\sum_{j=-\infty}^{\infty}c_j^2\Big)$, where $B_{p1} = E(\eta_i^{(1)}\eta_i^{(1)T})$ and $\eta_i^{(1)}$ is defined through $z_i^{(1)}$, the $d_1$-dimensional subvector of $z_i$ corresponding to $\beta_{01}$, for $i = 1, 2, \ldots, n$.

4. Simulation Study

4.1. Least Squares Estimation

In this subsection, we use Monte Carlo simulation to study the performance of the proposed methods. The data are generated from the following model:
$$Y_i = z_{i1}\beta_1 + z_{i2}\beta_2 + \int_0^1\gamma(t)X_i(t)\,dt + V_i, \qquad (6)$$
where $z_{i1}\sim N(0,1)$, $z_{i2}\sim N(0,1)$ and $\beta = (\beta_1, \beta_2)^T = (1, 2)^T$. We take $X_i(t) = \sum_{j=1}^{50}\theta_j\rho_j(t)$ with $\theta_j\sim N(0, \lambda_j)$ and $\rho_j(t) = \sqrt{2}\sin((j-0.5)\pi t)$. For $\int_0^1\gamma(t)X_i(t)\,dt = \sum_{j=1}^{m}\theta_{ij}\alpha_j$ in (6), let
$$\alpha_j = \int_0^1\gamma(t)\rho_j(t)\,dt \quad\text{with}\quad \gamma(t) = \sqrt{2}\sin\frac{\pi t}{2} + 3\sqrt{3}\sin\frac{3\pi t}{2},$$
and let $m$ be chosen by the CPV method (see Horváth and Kokoszka [7], page 41), i.e.,
$$m = \min_k\{k: \mathrm{CPV}(k) \ge 85\%\} = \min_k\Big\{k: \sum_{i=1}^{k}\hat{\lambda}_i\Big/\sum_{i=1}^{n}\hat{\lambda}_i \ge 85\%\Big\}.$$
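A sketch of the CPV rule, assuming the empirical eigenvalues $\hat{\lambda}_j$ have already been computed (for instance with the FPCA sketch in Section 2.1); the helper name is ours.

```python
import numpy as np

def choose_m_cpv(lam_hat, threshold=0.85):
    """Smallest m such that the cumulative proportion of variance CPV(m) >= threshold."""
    lam = np.clip(np.asarray(lam_hat, dtype=float), 0.0, None)  # guard tiny negative eigenvalues
    cpv = np.cumsum(lam) / lam.sum()
    return int(np.argmax(cpv >= threshold) + 1)
```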
Let $V_i$ be an AR(1) process: $V_i = \phi V_{i-1} + e_i$, where $e_i\sim N(0,1)$ and $|\phi| < 1$. Thus, $EV_i^2 = 1/(1 - \phi^2)$.
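The following sketch generates one data set from model (6) under these settings. The eigenvalues $\lambda_j = ((j-0.5)\pi)^{-2}$ are an assumption on our part (they are the eigenvalues associated with the basis $\rho_j(t) = \sqrt{2}\sin((j-0.5)\pi t)$ for Brownian motion); the paper does not state its choice of $\lambda_j$, and the function name is ours.

```python
import numpy as np

def generate_data(n, phi, l=500, rng=None):
    """One sample from model (6): 50-term K-L expansion for X_i and AR(1) errors V_i."""
    rng = np.random.default_rng(rng)
    grid = np.linspace(0.0, 1.0, l)
    j = np.arange(1, 51)
    lam = ((j - 0.5) * np.pi) ** (-2.0)                 # ASSUMED eigenvalues lambda_j
    rho = np.sqrt(2.0) * np.sin(np.outer(grid, (j - 0.5) * np.pi))   # rho_j(t) on the grid
    theta = rng.standard_normal((n, 50)) * np.sqrt(lam)              # theta_ij ~ N(0, lambda_j)
    X = theta @ rho.T                                                # curves X_i on the grid
    gamma = np.sqrt(2.0) * np.sin(np.pi * grid / 2) + 3 * np.sqrt(3.0) * np.sin(3 * np.pi * grid / 2)
    z = rng.standard_normal((n, 2))                                  # z_i1, z_i2 ~ N(0, 1)
    V = np.zeros(n)                                                  # stationary AR(1) errors
    V[0] = rng.standard_normal() / np.sqrt(1 - phi ** 2)
    for i in range(1, n):
        V[i] = phi * V[i - 1] + rng.standard_normal()
    h = grid[1] - grid[0]
    Y = z @ np.array([1.0, 2.0]) + (X * gamma).sum(axis=1) * h + V   # Riemann sum for the integral
    return Y, z, X, grid
```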
In the simulation, we take $\phi = 0.1, 0.9$ and sample sizes $n = 50$ and $200$. For each sample size, we run $N = 500$ replications and take $l = 500$ equally spaced grid points in $[0, 1]$. The mean square errors (MSE) of the estimators $\hat{\gamma}(\cdot)$ and $\hat{\beta}$ of $\gamma(\cdot)$ and $\beta$ are defined, respectively, as
$$\mathrm{MSE}(\hat{\gamma}) = \frac{1}{Nl}\sum_{i=1}^{N}\sum_{k=1}^{l}[\hat{\gamma}_i(t_k) - \gamma(t_k)]^2, \qquad \mathrm{MSE}(\hat{\beta}) = \frac{1}{N}\sum_{i=1}^{N}[\hat{\beta}^{(i)} - \beta]^2.$$
In Figure 1, we draw Q–Q plots of the estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ with sample sizes $n = 50, 200$ and $\phi = 0.1, 0.9$. Figure 1 shows that more points fall near the line when $n = 200$ and $\phi = 0.1$, while more points in the tails deviate from the line when $n = 50$ and $\phi = 0.9$. This implies that the quality of the normal fit decreases as the dependence of the observations (i.e., the value of $\phi$) increases, and that the normality of the distribution of the estimators improves as the sample size $n$ increases, which confirms the asymptotic normality in Theorem 1.
In Table 1, we report the biases and MSEs of the estimators of $\beta_1$, $\beta_2$ and $\gamma(t)$. From Table 1, we can draw the following conclusions:
(1)
For the same sample size $n$, the values of $\mathrm{MSE}(\hat{\beta}_1)$, $\mathrm{MSE}(\hat{\beta}_2)$ and $\mathrm{MSE}(\hat{\gamma})$ increase as $\phi$ increases;
(2)
For the same $\phi$, if the sample size $n$ increases, then the values of $\mathrm{MSE}(\hat{\beta}_1)$, $\mathrm{MSE}(\hat{\beta}_2)$ and $\mathrm{MSE}(\hat{\gamma})$ decrease;
(3)
Changes in the sample size $n$ and in $\phi$ have little effect on the biases of $\hat{\beta}_1$ and $\hat{\beta}_2$.

4.2. Variable Selection

In this subsection, we add six independent covariates $\{z_{i3}, z_{i4}, \ldots, z_{i8}\}$ with $z_{ij}\sim N(0,1)$ $(j = 3, 4, \ldots, 8)$ to model (6) and take $\beta = (\beta_1, \beta_2, \ldots, \beta_8)^T = (1, 2, 0, 0, 0, 0, 0, 0)^T$. Let C denote the average number of components of $\beta$ correctly estimated to be zero, IC the average number of components of $\beta$ incorrectly estimated to be zero, C-fit the proportion of replications in which the model is fitted exactly, and $\mathrm{MSE}(\hat{\beta}) = \frac{1}{8N}\sum_{j=1}^{8}\sum_{i=1}^{N}[\hat{\beta}_j^{(i)} - \beta_j]^2$ the mean square error of the estimator of $\beta$; a sketch of these quantities is given below.
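A sketch of how C, IC, C-fit and MSE($\hat{\beta}$) can be computed from the $N$ replications of the penalized estimator; the function name and the zero tolerance are our own choices.

```python
import numpy as np

def selection_metrics(beta_hats, beta_true, tol=1e-8):
    """C, IC, C-fit and MSE over N replications.

    beta_hats : (N, d) array of penalized estimates;  beta_true : (d,) true vector.
    """
    est_zero = np.abs(beta_hats) < tol
    true_zero = np.abs(beta_true) < tol
    C = est_zero[:, true_zero].sum(axis=1).mean()        # avg. number correctly set to zero
    IC = est_zero[:, ~true_zero].sum(axis=1).mean()      # avg. number incorrectly set to zero
    c_fit = np.mean([np.array_equal(row, true_zero) for row in est_zero])
    mse = np.mean((beta_hats - beta_true) ** 2)          # 1/(dN) * sum of squared errors
    return C, IC, c_fit, mse
```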
Figure 2 shows Q–Q plots of the estimators of $\beta_1$ and $\beta_2$ with $n = 50, 200$ and $\phi = 0.1, 0.9$ obtained with the SCAD penalty. The performance seen in the Q–Q plots is comparable to that in Section 4.1: as the sample size increases, the Q–Q plots align more closely with normality, which supports the asymptotic normality in Theorem 3, whereas the fit deteriorates as the value of $\phi$ increases.
In Table 2, we report the values of C, IC, C-fit and MSE($\hat{\beta}$). Table 2 indicates the following conclusions:
(1)
When the sample size $n$ increases with the same $\phi$, the value of MSE($\hat{\beta}$) decreases;
(2)
 
(i)
If the sample size $n$ increases with the same $\phi$, the average number of zero coefficients correctly estimated to be zero is close to 5, and the average number of components of $\beta$ incorrectly estimated to be zero is near 0 (and equals 0 when $\phi = 0.1$ or $n = 200$). This verifies the selection consistency in Theorem 3;
(ii)
When the value of $\phi$ decreases with the same sample size, the average number of zero coefficients correctly estimated to be zero increases, and the average number of components of $\beta$ incorrectly estimated to be zero decreases;
(3)
As the sample size $n$ increases with the same $\phi$, or as $\phi$ decreases with the same sample size, the probability of exactly fitting the model increases.

5. Conclusions

Using the least squares method based on FPCA, we construct estimators of the parameter and the coefficient function in partially functional linear models with linear process errors, and establish the asymptotic normality of the parameter estimator and the convergence rate of the estimator of the coefficient function. Additionally, we use the SCAD penalty to define a penalized estimator of the parameter and discuss its oracle property.
However, the proposed method has some limitations. First, we approximate the functional part by a partial sum via the FPCA method, which may lose information due to the truncation value $m$. Second, this work considers the complete data scenario, whereas missing data often occur in practice; in the presence of missing data, the proposed method may be inapplicable. In the future, we are interested in ways to reduce this information loss and to accommodate missing data.

6. Proof of Main Results

In the proofs below, we use the following notation: for a linear operator $T$, let $\|T\| = \sup_{\|f\|=1}\|Tf\|$ and $\|T\|_H = \sup_{\|f\|=1}|Tf|$ for $f\in L^2[0,1]$; $\|A\| = \max_{ij}|A_{ij}|$ for a matrix $A = (A_{ij})$; for $k = 1, \ldots, d$, set $\eta^{(k)} = (\eta_{1k}, \eta_{2k}, \ldots, \eta_{nk})^T$, $f^{(k)} = (\langle f_k, X_1\rangle, \langle f_k, X_2\rangle, \ldots, \langle f_k, X_n\rangle)^T$, $z^{(k)} = (z_{1k}, z_{2k}, \ldots, z_{nk})^T$,
$$\hat{\psi}_k(f) = \sum_{j=1}^{m}\frac{\langle\hat{C}_{z_kX}, \hat{\rho}_j\rangle\langle\hat{\rho}_j, f\rangle}{\hat{\lambda}_j}, \qquad \psi_k(f) = \sum_{j=1}^{\infty}\frac{\langle C_{z_kX}, \rho_j\rangle\langle\rho_j, f\rangle}{\lambda_j},$$
$\hat{B} = \frac{1}{n}Z^T(I - H_m)Z$ and $M = \big(\langle\gamma, X_1\rangle, \langle\gamma, X_2\rangle, \ldots, \langle\gamma, X_n\rangle\big)^T$.
Lemma 1 
(Shin [10]).
(1) 
Let $E\theta_j^4 \le c\lambda_j^2$; then $E\|\hat{T}_X - T_X\|^2 \le n^{-1}E\|X\|^4$, $E\|\hat{C}_z - C_z\|^2 \le n^{-1}E\|z\|_{\mathbb{R}^d}^4$ and $E\|\hat{C}_{z_kX} - C_{z_kX}\|^2 \le n^{-1}\big(E\|z\|_{\mathbb{R}^d}^4\,E\|X\|^4\big)^{1/2}$ for $k = 1, 2, \ldots, d$.
(2) 
Suppose that (A1), (A3), (A4)(i), (A5) and (A6) are satisfied. Then $\|\psi_k - \hat{\psi}_k\|_H^2 = O_p\big(n^{-(2b-1)/(a+2b)}\big)$. Further, if $\{e_i\}$ are identically distributed with $Ee_1^2 < \infty$ or square uniformly integrable, then $\|\hat{B} - B\|^2 = O_p\big(n^{-(2b-1)/(a+2b)}\big)$.
Lemma 2 
(Pollard [26], page 171). Let $\{Z_{nk}, k\ge 0\}$ be a sequence of random variables and $\{\mathcal{G}_{n,k-1}, k\ge 1\}$ an increasing sequence of $\sigma$-fields such that $Z_{nk}$ is measurable with respect to $\mathcal{G}_{n,k}$ and $E(Z_{nk}\mid\mathcal{G}_{n,k-1}) = 0$ for $1\le k\le n$. Assume that $\sum_{k=1}^{n}E(Z_{nk}^2\mid\mathcal{G}_{n,k-1})\xrightarrow{p}a^2$ and $\sum_{k=1}^{n}E\big(Z_{nk}^2I(|Z_{nk}| > q)\mid\mathcal{G}_{n,k-1}\big)\xrightarrow{p}0$ for some constant $a^2 > 0$ and every $q > 0$. Then $\sum_{k=1}^{n}Z_{nk}\xrightarrow{d}N(0, a^2)$.
Lemma 3. 
For $t\in[0,1]$, let $\{X_i(t), i\ge 1\}$ be i.i.d. random variables with $EX_i(t) = 0$ and $E\|X_1\|^2 < \infty$. Set $V_i = \sum_{s=-\infty}^{\infty}c_se_{i-s}$ with $\sum_{j=-\infty}^{\infty}|c_j| < \infty$. Assume that $\{X_i(t), i\ge 1\}$ is independent of $\{e_i, i\in\mathbb{Z}\}$, and that $\{e_i\}$ are identically distributed with $Ee_1^2 < \infty$ or square uniformly integrable. Then $\big\|n^{-1/2}\sum_{i=1}^{n}X_iV_i\big\| = O_p(1)$.
Proof. 
Note that $\sup_iEV_i^2 \le \sup_i\Big\{\sum_{s=-\infty}^{\infty}c_s^2Ee_{i-s}^2 + 2\sum_{s=-\infty}^{\infty}\sum_{k>s}|c_sc_k|\big(Ee_{i-s}^2Ee_{i-k}^2\big)^{1/2}\Big\} \le c\Big(\sum_{s=-\infty}^{\infty}|c_s|\Big)^2 < \infty$. So $E\big\|n^{-1/2}\sum_{i=1}^{n}X_iV_i\big\|^2 = n^{-1}\sum_{i=1}^{n}E\|X_i\|^2EV_i^2 < \infty$, which yields $\big\|n^{-1/2}\sum_{i=1}^{n}X_iV_i\big\| = O_p(1)$.
Lemma 4. 
If (A1) and $E\theta_k^4 \le c\lambda_k^2$ for each $k$ hold, then $EP_k \le cn^{-1}\lambda_k$ and $EQ_{jk}^2 \le cn^{-1}\lambda_j\lambda_k$, where $P_k = \int_0^1\big[\int_0^1(\hat{C}_X(s,t) - C_X(s,t))\rho_k(t)\,dt\big]^2ds$ and $Q_{jk} = \int_0^1\int_0^1(\hat{C}_X(s,t) - C_X(s,t))\rho_j(s)\rho_k(t)\,ds\,dt$.
Proof. 
From $\theta_{ik} = \langle X_i, \rho_k\rangle$ and using (A1), we have
$$EP_k = \int_0^1E\Big[\frac{1}{n}\sum_{i=1}^{n}X_i(s)\int_0^1X_i(t)\rho_k(t)\,dt - E\Big(X(s)\int_0^1X(t)\rho_k(t)\,dt\Big)\Big]^2ds = \int_0^1E\Big[\frac{1}{n}\sum_{i=1}^{n}X_i(s)\langle X_i, \rho_k\rangle - E\{X(s)\langle X, \rho_k\rangle\}\Big]^2ds \le \frac{1}{n}E\Big(\int_0^1X^2(s)\,ds\cdot\theta_k^2\Big) = n^{-1}E\big(\|X\|^2\theta_k^2\big) \le n^{-1}\big(E\|X\|^4\,E\theta_k^4\big)^{1/2} \le cn^{-1}\lambda_k,$$
$$EQ_{jk}^2 = E\Big[\int_0^1\int_0^1\Big(\frac{1}{n}\sum_{i=1}^{n}X_i(s)X_i(t) - E(X(s)X(t))\Big)\rho_j(s)\rho_k(t)\,ds\,dt\Big]^2 = E\Big[\frac{1}{n}\sum_{i=1}^{n}\langle X_i, \rho_j\rangle\langle X_i, \rho_k\rangle - E\langle X, \rho_j\rangle\langle X, \rho_k\rangle\Big]^2 = \mathrm{Var}\Big(\frac{1}{n}\sum_{i=1}^{n}\theta_{ij}\theta_{ik}\Big) = n^{-1}\mathrm{Var}(\theta_j\theta_k) \le n^{-1}\big(E\theta_j^4\,E\theta_k^4\big)^{1/2} \le cn^{-1}\lambda_j\lambda_k.$$
Lemma 5. 
Let { x n } and { y n } be two sequences of independent random variables, then for any ε > 0 and c > 0 we have
$$E|x_ny_n|^2I(|x_ny_n| > \varepsilon) \le E|x_n|^2I(|x_n| > c)\,E|y_n|^2 + E|y_n|^2I(|y_n| > \varepsilon/c)\,E|x_n|^2.$$
Proof. 
Using independence between { x n } and { y n } , it follows that
$$E|x_ny_n|^2I(|x_ny_n| > \varepsilon) = E|x_ny_n|^2\big[I(|x_ny_n| > \varepsilon, |x_n| > c) + I(|x_ny_n| > \varepsilon, |x_n| \le c)\big] \le E|x_n|^2I(|x_n| > c)\,E|y_n|^2 + E|y_n|^2I(|y_n| > \varepsilon/c)\,E|x_n|^2.$$
Proof of Theorem 1. 
We write (cf. the proof of Theorem 3.1 in Shin [10])
$$\sqrt{n}(\hat{\beta} - \beta_0) = n^{-\frac12}\hat{B}^{-1}\Bigg\{\sum_{i=1}^{n}\Big(z_i - \sum_{j=1}^{m}\frac{\langle\hat{C}_{zX}, \hat{\rho}_j\rangle\langle\hat{\rho}_j, X_i\rangle}{\hat{\lambda}_j}\Big)\langle\gamma, X_i\rangle + \sum_{i=1}^{n}\Big(\sum_{j=1}^{\infty}\frac{\langle C_{zX}, \rho_j\rangle\langle X_i, \rho_j\rangle}{\lambda_j} - \sum_{j=1}^{m}\frac{\langle\hat{C}_{zX}, \hat{\rho}_j\rangle\langle\hat{\rho}_j, X_i\rangle}{\hat{\lambda}_j}\Big)V_i + \sum_{i=1}^{n}\Big(z_i - \sum_{j=1}^{\infty}\frac{\langle C_{zX}, \rho_j\rangle\langle\rho_j, X_i\rangle}{\lambda_j}\Big)V_i\Bigg\} := \hat{B}^{-1}(A_1 + A_2 + A_3).$$
Lemma 1 implies $\hat{B}\xrightarrow{p}B$. Thus, it suffices to show that $A_1 = o_p(1)$, $A_2 = o_p(1)$ and
$$A_3 = n^{-1/2}\sum_{i=1}^{n}\eta_iV_i \xrightarrow{d} N\Big(0, \ \sigma^2B\sum_{j=-\infty}^{\infty}c_j^2\Big). \qquad (7)$$
Step 1. We prove A 2 = o p ( 1 ) . By applying Lemmas 1 and 3, it follows
$$\|A_2\|_{\mathbb{R}^d} = \Big\|n^{-1/2}\sum_{i=1}^{n}\Big[\sum_{j=1}^{\infty}\frac{\langle C_{zX}, \rho_j\rangle\langle X_i, \rho_j\rangle}{\lambda_j} - \sum_{j=1}^{m}\frac{\langle\hat{C}_{zX}, \hat{\rho}_j\rangle\langle\hat{\rho}_j, X_i\rangle}{\hat{\lambda}_j}\Big]V_i\Big\|_{\mathbb{R}^d} = \Big\|n^{-1/2}(\psi_1 - \hat{\psi}_1, \ldots, \psi_d - \hat{\psi}_d)^T\Big(\sum_{i=1}^{n}X_iV_i\Big)\Big\|_{\mathbb{R}^d} \le n^{-1/2}\sum_{k=1}^{d}\|\psi_k - \hat{\psi}_k\|_H\Big\|\sum_{i=1}^{n}X_iV_i\Big\| = O_p\big(n^{-\frac{2b-1}{2(a+2b)}}\big) = o_p(1),$$
which yields A 2 = o p ( 1 ) .
Step 2. We prove $A_1 = o_p(1)$. From $z^{(k)} = \eta^{(k)} + f^{(k)}$, we know that the $k$-th ($1\le k\le d$) element of $A_1$ is
$$A_1^{(k)} = n^{-1/2}\sum_{i=1}^{n}\Big(z_{ik} - \sum_{j=1}^{m}\frac{\langle\hat{C}_{z_kX}, \hat{\rho}_j\rangle\langle\hat{\rho}_j, X_i\rangle}{\hat{\lambda}_j}\Big)\langle\gamma, X_i\rangle = n^{-1/2}z^{(k)T}(I - H_m)M = n^{-1/2}f^{(k)T}(I - H_m)M + n^{-1/2}\eta^{(k)T}(I - H_m)M.$$
Since $I - H_m = (I - H_m)^2$, $n^{-1/2}|f^{(k)T}(I - H_m)M| \le n^{1/2}\big[n^{-1}f^{(k)T}(I - H_m)f^{(k)}\big]^{1/2}\big[n^{-1}M^T(I - H_m)M\big]^{1/2}$. Then, to prove $A_1^{(k)} = o_p(1)$, we only need to prove that
$$n^{-1}M^T(I - H_m)M = O_p\big(n^{-(a+2b-1)/(a+2b)}\big) = o_p(1), \qquad (8)$$
$$n^{-1}f^{(k)T}(I - H_m)f^{(k)} = O_p\big(n^{-(a+2b-1)/(a+2b)}\big) = o_p(1), \qquad (9)$$
$$n^{-1/2}\eta^{(k)T}(I - H_m)M = O_p\big(n^{-(a+2b-1)/(2(a+2b))}\big) = o_p(1). \qquad (10)$$
Here, we only prove (8) and (10); the proof of (9) is similar by Remark 2(c).
To do this, we need the following results (i) and (ii), whose proofs can be found in Hall and Horowitz [5]:
(i)
If $A_m := A_m(n) = \{(\hat{\lambda}_j - \lambda_k)^{-2} \le 2(\lambda_j - \lambda_k)^{-2} \text{ for all } j\ne k\}$, we have $P(A_m)\to 1$.
(ii)
$\langle\rho_k, \hat{\rho}_j - \rho_j\rangle = (\hat{\lambda}_j - \lambda_k)^{-1}\int_0^1\int_0^1[\hat{C}_X(s,t) - C_X(s,t)]\hat{\rho}_j(s)\rho_k(t)\,ds\,dt$ for $j\ne k$; $\sup_j|\hat{\lambda}_j - \lambda_j| \le \|\hat{C}_X - C_X\|$. Furthermore, let $B_m = \{\frac12\lambda_m \ge \|\hat{C}_X - C_X\|\}$; then $P(B_m)\to 1$.
Note that $\gamma(t) = \sum_{j=1}^{\infty}\langle\gamma, \hat{\rho}_j\rangle\hat{\rho}_j(t)$; then, from $\sum_{i=1}^{n}\langle X_i, \hat{\rho}_j\rangle\langle X_i, \hat{\rho}_k\rangle = n\hat{\lambda}_j\delta_{jk}$, we have
$$\sum_{i=1}^{n}\langle\gamma, X_i\rangle^2 = \sum_{i=1}^{n}\Big(\sum_{j=1}^{\infty}\langle\gamma, \hat{\rho}_j\rangle\langle\hat{\rho}_j, X_i\rangle\Big)^2 = \sum_{i=1}^{n}\sum_{j=1}^{\infty}\langle\gamma, \hat{\rho}_j\rangle^2\langle\hat{\rho}_j, X_i\rangle^2 = n\sum_{j=1}^{\infty}\hat{\lambda}_j\langle\gamma, \hat{\rho}_j\rangle^2. \qquad (11)$$
Using $\sum_{i=1}^{n}\langle X_i, \hat{\rho}_j\rangle\langle\gamma, X_i\rangle = n\int_0^1\int_0^1\hat{C}_X(s,t)\hat{\rho}_j(s)\gamma(t)\,ds\,dt = n\hat{\lambda}_j\langle\hat{\rho}_j, \gamma\rangle$, we get
$$U_m^TM = \Big(\sum_{i=1}^{n}\langle X_i, \hat{\rho}_1\rangle\langle\gamma, X_i\rangle, \ \sum_{i=1}^{n}\langle X_i, \hat{\rho}_2\rangle\langle\gamma, X_i\rangle, \ \ldots, \ \sum_{i=1}^{n}\langle X_i, \hat{\rho}_m\rangle\langle\gamma, X_i\rangle\Big)^T = \big(n\hat{\lambda}_1\langle\hat{\rho}_1, \gamma\rangle, \ n\hat{\lambda}_2\langle\hat{\rho}_2, \gamma\rangle, \ \ldots, \ n\hat{\lambda}_m\langle\hat{\rho}_m, \gamma\rangle\big)^T.$$
Result (ii) implies $\frac12\lambda_j \le \hat{\lambda}_j \le \frac32\lambda_j$ on $B_m$; hence, using (11) and $U_m^TU_m = \mathrm{diag}(n\hat{\lambda}_1, n\hat{\lambda}_2, \ldots, n\hat{\lambda}_m)$, on $B_m$ we have
$$n^{-1}M^T(I - H_m)M = n^{-1}\big[M^TM - M^TU_m(U_m^TU_m)^{-1}U_m^TM\big] = n^{-1}\sum_{i=1}^{n}\langle\gamma, X_i\rangle^2 - \sum_{j=1}^{m}\hat{\lambda}_j\langle\hat{\rho}_j, \gamma\rangle^2 = \sum_{j=m+1}^{\infty}\hat{\lambda}_j\langle\hat{\rho}_j, \gamma\rangle^2 \le \frac32\sum_{j=m+1}^{\infty}\lambda_j\langle\hat{\rho}_j, \gamma\rangle^2 \le 3\sum_{j=m+1}^{\infty}\lambda_j\langle\hat{\rho}_j - \rho_j, \gamma\rangle^2 + 3\sum_{j=m+1}^{\infty}\lambda_j\langle\rho_j, \gamma\rangle^2 := 3(D_1 + D_2).$$
By (A3) and (A4)(i), we find $D_2 = \sum_{j=m+1}^{\infty}\lambda_j\alpha_j^2 \le c\sum_{j=m+1}^{\infty}j^{-(a+2b)} \le cm^{1-(a+2b)}$.
For $k\ne j$, using conclusion (ii) and Lemma 4,
$$|\langle\rho_k, \hat{\rho}_j - \rho_j\rangle| = |\hat{\lambda}_j - \lambda_k|^{-1}\Big|\int_0^1\int_0^1[\hat{C}_X(s,t) - C_X(s,t)]\hat{\rho}_j(s)\rho_k(t)\,ds\,dt\Big| \le |\hat{\lambda}_j - \lambda_k|^{-1}\Big[\int_0^1\Big(\int_0^1[\hat{C}_X(s,t) - C_X(s,t)]\rho_k(t)\,dt\Big)^2ds\cdot\int_0^1\hat{\rho}_j^2(s)\,ds\Big]^{1/2} = |\hat{\lambda}_j - \lambda_k|^{-1}P_k^{1/2}.$$
Inequality (5.16) in Hall and Horowitz [5] shows that $\|\hat{\rho}_j - \rho_j\|^2 \le 2\sum_{k: k\ne j}(\hat{\lambda}_j - \lambda_k)^{-2}\big[\int_0^1\int_0^1(\hat{C}_X(s,t) - C_X(s,t))\hat{\rho}_j(s)\rho_k(t)\,dt\,ds\big]^2 \le 2\sum_{k: k\ne j}(\hat{\lambda}_j - \lambda_k)^{-2}P_k$; thus, from $\gamma(t) = \sum_{k=1}^{\infty}\alpha_k\rho_k(t)$, we have
$$D_1 = \sum_{j=m+1}^{\infty}\lambda_j\Big(\sum_{k=1}^{\infty}\alpha_k\langle\rho_k, \hat{\rho}_j - \rho_j\rangle\Big)^2 \le 2\sum_{j=m+1}^{\infty}\lambda_j\alpha_j^2\langle\rho_j, \hat{\rho}_j - \rho_j\rangle^2 + 2\sum_{j=m+1}^{\infty}\lambda_j\Big(\sum_{k: k\ne j}\alpha_k\langle\rho_k, \hat{\rho}_j - \rho_j\rangle\Big)^2 \le 2\sum_{j=m+1}^{\infty}\lambda_j\alpha_j^2\|\hat{\rho}_j - \rho_j\|^2 + 2\sum_{j=m+1}^{\infty}\lambda_j\Big(\sum_{k: k\ne j}\alpha_k\langle\rho_k, \hat{\rho}_j - \rho_j\rangle\Big)^2 \le 4\sum_{j=m+1}^{\infty}\lambda_j\alpha_j^2\sum_{k: k\ne j}(\hat{\lambda}_j - \lambda_k)^{-2}P_k + 2\sum_{j=m+1}^{\infty}\lambda_j\Big(\sum_{k: k\ne j}|\alpha_k|\,|\hat{\lambda}_j - \lambda_k|^{-1}P_k^{1/2}\Big)^2 := 4D_{11} + 2D_{12}.$$
On $A_m$, we have $D_{11} \le 2\sum_{j=m+1}^{\infty}\lambda_j\alpha_j^2\sum_{k: k\ne j}(\lambda_j - \lambda_k)^{-2}P_k$. Obviously, from (A3) we have
$$\frac{\lambda_j}{\lambda_k} \le c\Big(\frac{k}{j}\Big)^a < c \ \Longrightarrow\ 1 - \frac{\lambda_j}{\lambda_k} > 1 - c := c \ \Longrightarrow\ \lambda_k - \lambda_j > c\lambda_k \quad \text{for } k < j,$$
and $\lambda_j - \lambda_k > c\lambda_j$ for $k > j$. Then, in view of (A3), (A4) and Lemma 4, we obtain
$$E|D_{11}| \le 2\sum_{j=m+1}^{\infty}\lambda_j\alpha_j^2\sum_{k: k\ne j}(\lambda_j - \lambda_k)^{-2}EP_k \le cn^{-1}\sum_{j=m+1}^{\infty}\lambda_j\alpha_j^2\sum_{k: k\ne j}(\lambda_j - \lambda_k)^{-2}\lambda_k \le cn^{-1}\sum_{j=m+1}^{\infty}\lambda_j\alpha_j^2\sum_{k: k<j}\lambda_k^{-2}\lambda_k + cn^{-1}\sum_{j=m+1}^{\infty}\lambda_j\alpha_j^2\sum_{k: k>j}\lambda_j^{-2}\lambda_k = cn^{-1}\sum_{j=m+1}^{\infty}\lambda_j\alpha_j^2\sum_{k: k<j}\lambda_k^{-1} + cn^{-1}\sum_{j=m+1}^{\infty}\lambda_j^{-1}\alpha_j^2\sum_{k: k>j}\lambda_k \le cn^{-1}\sum_{j=m+1}^{\infty}j^{-a-2b}\sum_{k: k<j}k^{a} + cn^{-1}\sum_{j=m+1}^{\infty}j^{a-2b}\sum_{k: k>j}k^{-a} \le cn^{-1}\sum_{j=m+1}^{\infty}j^{1-2b} + cn^{-1}\sum_{j=m+1}^{\infty}j^{1-2b} \le cn^{-1}m^{2-2b}.$$
On A m , one can write
$$D_{12} \le 2\sum_{j=m+1}^{\infty}\lambda_j\Big(\sum_{k: k\ne j}|\alpha_k|\,|\lambda_j - \lambda_k|^{-1}P_k^{1/2}\Big)^2 \le 4\sum_{j=m+1}^{\infty}\lambda_j\Big(\sum_{k: k<j}|\alpha_k|\,|\lambda_j - \lambda_k|^{-1}P_k^{1/2}\Big)^2 + 4\sum_{j=m+1}^{\infty}\lambda_j\Big(\sum_{k: k>j}|\alpha_k|\,|\lambda_j - \lambda_k|^{-1}P_k^{1/2}\Big)^2 := 4L_{21} + 4L_{22}.$$
According to (A3) and (A4), by using Lemma 4 we have
$$E|L_{21}| \le c\sum_{j=m+1}^{\infty}\lambda_jE\Big(\sum_{k: k<j}|\alpha_k|\lambda_k^{-1}P_k^{1/2}\Big)^2 = c\sum_{j=m+1}^{\infty}\lambda_j\Big[\sum_{k: k<j}|\alpha_k|^2\lambda_k^{-2}EP_k + 2\sum_{k: k<j}\sum_{s: s<k}|\alpha_k|\lambda_k^{-1}|\alpha_s|\lambda_s^{-1}E\big(P_k^{1/2}P_s^{1/2}\big)\Big] \le c\sum_{j=m+1}^{\infty}\lambda_j\Big[\sum_{k: k<j}|\alpha_k|^2\lambda_k^{-2}EP_k + 2\sum_{k: k<j}\sum_{s: s<k}|\alpha_k|\lambda_k^{-1}|\alpha_s|\lambda_s^{-1}\big(EP_k\cdot EP_s\big)^{1/2}\Big] \le cn^{-1}\sum_{j=m+1}^{\infty}\lambda_j\Big[\sum_{k: k<j}|\alpha_k|^2\lambda_k^{-1} + 2\sum_{k: k<j}\sum_{s: s<k}|\alpha_k|\lambda_k^{-1/2}|\alpha_s|\lambda_s^{-1/2}\Big] = cn^{-1}\sum_{j=m+1}^{\infty}\lambda_j\Big(\sum_{k: k<j}|\alpha_k|\lambda_k^{-1/2}\Big)^2 \le cn^{-1}\sum_{j=m+1}^{\infty}j^{-a}\Big(\sum_{k: k<j}k^{-b+a/2}\Big)^2 \le cn^{-1}m^{1-a},$$
$$E|L_{22}| \le c\sum_{j=m+1}^{\infty}\lambda_jE\Big(\sum_{k: k>j}|\alpha_k|\lambda_j^{-1}P_k^{1/2}\Big)^2 \le c\sum_{j=m+1}^{\infty}\lambda_j^{-1}\Big[\sum_{k: k>j}|\alpha_k|^2EP_k + 2\sum_{k: k>j}\sum_{s: s>k}|\alpha_k||\alpha_s|\big(EP_k\cdot EP_s\big)^{1/2}\Big] \le cn^{-1}\sum_{j=m+1}^{\infty}\lambda_j^{-1}\Big[\sum_{k: k>j}|\alpha_k|^2\lambda_j + 2\sum_{k: k>j}\sum_{s: s>k}|\alpha_k||\alpha_s|\lambda_j\Big] \le cn^{-1}\sum_{j=m+1}^{\infty}\Big(\sum_{k: k>j}|\alpha_k|\Big)^2 \le cn^{-1}\sum_{j=m+1}^{\infty}\Big(\sum_{k: k>j}k^{-b}\Big)^2 \le cn^{-1}\sum_{j=m+1}^{\infty}j^{2-2b} \le cn^{-1}m^{3-2b}.$$
So, on $A_m\cap B_m$, we have $E\big(n^{-1}M^T(I - H_m)M\big) \le c\big(m^{1-(a+2b)} + n^{-1}m^{1-a} + n^{-1}m^{3-2b} + n^{-1}m^{2-2b}\big) \le cn^{-\frac{a+2b-1}{a+2b}}$. Thus, using results (i) and (ii), as $c_0\to\infty$,
$$P\Big(n^{-1}M^T(I - H_m)M > c_0n^{-\frac{a+2b-1}{a+2b}}\Big) \le P\Big(n^{-1}M^T(I - H_m)M > c_0n^{-\frac{a+2b-1}{a+2b}},\ A_m\cap B_m\Big) + P(A_m^c) + P(B_m^c) \le \frac{E\big[n^{-1}M^T(I - H_m)M\,I_{A_m\cap B_m}\big]}{c_0n^{-\frac{a+2b-1}{a+2b}}} + P(A_m^c) + P(B_m^c) \to 0,$$
which yields (8). As for (10), we have
$$P\Big(\big|n^{-1/2}\eta^{(k)T}(I - H_m)M\big| > c_0n^{-\frac{a+2b-1}{2(a+2b)}},\ A_m\cap B_m\Big) \le \frac{1}{c_0^2n^{-\frac{a+2b-1}{a+2b}}}E\Big[\big(n^{-1/2}\eta^{(k)T}(I - H_m)M\big)^2I_{A_m\cap B_m}\Big] = \frac{1}{c_0^2n^{-\frac{a+2b-1}{a+2b}}}E\big[n^{-1}M^T(I - H_m)\eta^{(k)}\eta^{(k)T}(I - H_m)M\,I_{A_m\cap B_m}\big] = \frac{1}{c_0^2n^{-\frac{a+2b-1}{a+2b}}}E\big[n^{-1}M^T(I - H_m)E\big(\eta^{(k)}\eta^{(k)T}\mid X_1, X_2, \ldots, X_n\big)(I - H_m)M\,I_{A_m\cap B_m}\big] = \frac{B_{kk}}{c_0^2n^{-\frac{a+2b-1}{a+2b}}}E\big[n^{-1}M^T(I - H_m)M\,I_{A_m\cap B_m}\big] \to 0 \quad \text{as } c_0\to\infty.$$
Then, (10) is verified.
Step 3. We verify (7). It suffices to show that, for any nonzero vector $l = (l_1, l_2, \ldots, l_d)^T$, $l^TA_3\xrightarrow{d}N\big(0, \ \sigma^2l^TBl\sum_{j=-\infty}^{\infty}c_j^2\big)$. Write
$$l^TA_3 = n^{-\frac12}\sum_{i=1}^{n}l^T\eta_i\sum_{j=-n}^{n}c_je_{i-j} + n^{-\frac12}\sum_{i=1}^{n}l^T\eta_i\sum_{|j|>n}c_je_{i-j} := T_n + W_n.$$
By ( A 6 ) and using j = | c j | < , it follows that, for any ϵ > 0
$$P(|W_n| > \epsilon) \le \frac{1}{\epsilon^2n}E\Big(\sum_{i=1}^{n}l^T\eta_i\sum_{|j|>n}c_je_{i-j}\Big)^2 \le \frac{d}{\epsilon^2}\sum_{k=1}^{d}l_k^2B_{kk}\cdot\frac{1}{n}\sum_{i=1}^{n}E\Big(\sum_{|j|>n}c_je_{i-j}\Big)^2 \le \frac{c}{n}\sum_{i=1}^{n}\Big[\sum_{|j|>n}c_j^2Ee_{i-j}^2 + 2\sum_{|j_1|>n}\sum_{j_2>j_1}|c_{j_1}c_{j_2}|\big(Ee_{i-j_1}^2Ee_{i-j_2}^2\big)^{1/2}\Big] \le c\Big(\sum_{|j|>n}|c_j|\Big)^2 \to 0,$$
which implies W n = o p ( 1 ) .
Now, we use Lemma 2 to prove $T_n\xrightarrow{d}N\big(0, \ \sigma^2l^TBl\sum_{j=-\infty}^{\infty}c_j^2\big)$. In fact, $T_n = \sum_{s=1-n}^{2n}d_{ns}e_s = \sum_{k=1}^{3n}d_{n,k-n}e_{k-n}$, where $d_{ns} = n^{-1/2}\sum_{i=\max(1, s-n)}^{\min(n, n+s)}c_{i-s}l^T\eta_i$. Set
$$\mathcal{G}_{n,k} = \sigma\big\{(X_1, z_1), \ldots, (X_n, z_n), e_{1-n}, e_{2-n}, \ldots, e_{k-n}\big\}.$$
Then $d_{n,k-n}e_{k-n}$ is measurable with respect to $\mathcal{G}_{n,k}$, and for $1\le k\le 3n$, $E(d_{n,k-n}e_{k-n}\mid\mathcal{G}_{n,k-1}) = d_{n,k-n}E(e_{k-n}\mid e_{1-n}, e_{2-n}, \ldots, e_{k-1-n}) = 0$. Thus, from Lemma 2, we only need to verify that
$$\sum_{k=1}^{3n}E\big(d_{n,k-n}^2e_{k-n}^2\mid\mathcal{G}_{n,k-1}\big) \xrightarrow{p} \sigma^2l^TBl\sum_{j=-\infty}^{\infty}c_j^2, \qquad (13)$$
$$\sum_{k=1}^{3n}E\big(d_{n,k-n}^2e_{k-n}^2I(|d_{n,k-n}e_{k-n}| > \varepsilon)\mid\mathcal{G}_{n,k-1}\big) = o_p(1) \quad \text{for any } \varepsilon > 0. \qquad (14)$$
We first prove (13). Applying $E(e_i^2\mid\ldots, e_{i-2}, e_{i-1}) = \sigma^2$ a.s., we can write
$$\sum_{k=1}^{3n}E\big(d_{n,k-n}^2e_{k-n}^2\mid\mathcal{G}_{n,k-1}\big) = \sigma^2\sum_{k=1}^{3n}d_{n,k-n}^2 = \sigma^2\sum_{s=1-n}^{2n}d_{ns}^2 = \frac{\sigma^2}{n}\sum_{s=1-n}^{2n}\Big[\sum_{i=\max(1, s-n)}^{\min(n, n+s)}(l^T\eta_i)^2c_{i-s}^2 + 2\sum_{\max(1, s-n)\le i<j\le\min(n, n+s)}(l^T\eta_i)(l^T\eta_j)c_{i-s}c_{j-s}\Big] = \frac{\sigma^2}{n}\Big[\sum_{i=1}^{n}(l^T\eta_i)^2\sum_{j=-n}^{n}c_j^2 + 2\sum_{1\le i<j\le n}(l^T\eta_i)(l^T\eta_j)\sum_{t=-n}^{n}c_tc_{j-i+t}\Big] := G_1 + G_2.$$
The law of large numbers implies $\frac{1}{n}\sum_{i=1}^{n}(l^T\eta_i)^2\xrightarrow{p}l^TBl$; hence, from $\sum_{j=-n}^{n}c_j^2\to\sum_{j=-\infty}^{\infty}c_j^2$, we obtain $G_1\xrightarrow{p}\sigma^2l^TBl\sum_{j=-\infty}^{\infty}c_j^2$.
Since { η i , 1 i n } is a sequence of independent random vectors with E η i = 0 , we have
$$EG_2^2 = \frac{4\sigma^4}{n^2}E\Big[\sum_{i=1}^{n-1}l^T\eta_i\sum_{j=i+1}^{n}l^T\eta_j\sum_{t=-n}^{n}c_tc_{j-i+t}\Big]^2 = \frac{4\sigma^4}{n^2}\sum_{i=1}^{n-1}E(l^T\eta_i)^2\,E\Big[\sum_{j=i+1}^{n}l^T\eta_j\sum_{t=-n}^{n}c_tc_{j-i+t}\Big]^2 = \frac{4\sigma^4}{n^2}(l^TBl)^2\sum_{i=1}^{n-1}\Big[\sum_{j=i+1}^{n}\sum_{t=-n}^{n}c_t^2c_{j-i+t}^2 + 2\sum_{j=i+1}^{n}\sum_{t=-n}^{n-1}\sum_{s=t+1}^{n}c_tc_{j-i+t}c_sc_{j-i+s}\Big] \le \frac{4\sigma^4}{n^2}(l^TBl)^2\Big[\sum_{i=1}^{n-1}\Big(\sum_{t=-\infty}^{+\infty}c_t^2\Big)^2 + 2\sum_{i=1}^{n-1}\Big(\sum_{t=-\infty}^{+\infty}|c_t|\Big)^2\sum_{s=-\infty}^{+\infty}c_s^2\Big] \to 0,$$
which gives G 2 = o p ( 1 ) . Therefore, (13) is proved.
We next prove (14). Using Lemma 5, we write
$$\sum_{s=1-n}^{2n}E\big[d_{ns}^2e_s^2I(|d_{ns}e_s| > \varepsilon)\big] \le \sum_{s=1-n}^{2n}E\big[d_{ns}^2I(|d_{ns}| > \varepsilon n^{-1/4})\big]E|e_s|^2 + \sum_{s=1-n}^{2n}E\big[e_s^2I(|e_s| > n^{1/4})\big]Ed_{ns}^2 := R_1 + R_2.$$
According to the moment inequality for sums of independent random variables, we can write
$$R_1 \le \frac{c}{\varepsilon^{\delta}}n^{-1-\delta/4}\sum_{s=1-n}^{2n}E\Big|\sum_{i=\max(1, s-n)}^{\min(n, n+s)}c_{i-s}l^T\eta_i\Big|^{2+\delta} \le cn^{-1-\delta/4}\Bigg[\sum_{s=1-n}^{2n}\Big(\sum_{i=\max(1, s-n)}^{\min(n, n+s)}E(l^T\eta_ic_{i-s})^2\Big)^{(2+\delta)/2} + \sum_{s=1-n}^{2n}\sum_{i=\max(1, s-n)}^{\min(n, n+s)}E|l^T\eta_ic_{i-s}|^{2+\delta}\Bigg] := R_{11} + R_{12}.$$
By ( A 6 ) , we have
$$R_{11} \le cn^{-1-\delta/4}\sum_{s=1-n}^{2n}\Big(\sum_{i=\max(1, s-n)}^{\min(n, n+s)}c_{i-s}^2\Big)^{(2+\delta)/2} \le cn^{-\delta/4}\Big(\sum_{j=-\infty}^{\infty}|c_j|\Big)^{2+\delta} \to 0, \qquad R_{12} \le cn^{-1-\delta/4}\sum_{i=1}^{n}\sum_{j=-\infty}^{\infty}|c_j|^{2+\delta} \le cn^{-\delta/4}\sum_{j=-\infty}^{\infty}|c_j|^{2+\delta} \to 0.$$
When $\{e_i\}$ are identically distributed, from $Ee_s^2 = \sigma^2 < \infty$ we have $Ee_s^2I(|e_s| > n^{1/4})\to 0$; when $\{e_i\}$ are square uniformly integrable, we have $\sup_sEe_s^2I(|e_s| > n^{1/4})\to 0$. Hence $R_2\to 0$. Therefore, $\sum_{k=1}^{3n}E\big[d_{n,k-n}^2e_{k-n}^2I(|d_{n,k-n}e_{k-n}| > \varepsilon)\big]\to 0$, which verifies (14). □
Proof of Theorem 2. 
Applying Lemmas 1 and 3 and $\sup_iEV_i^2 < \infty$, and following the line of the proof of Theorem 3.2 in Shin [10], one can prove this result. □
Proof of Theorem 3. 
(1) Let $a_n = \max_{j\in A}p_\omega'(|\beta_{0j}|)$ and $b_n = n^{-1/2} + a_n$. We first prove that, for any $\epsilon > 0$ and a large constant $C > 0$,
$$P\Big(\inf_{\|u\|_{\mathbb{R}^d} = C}G_p(\beta_0 + b_nu) > G_p(\beta_0)\Big) \ge 1 - \epsilon, \qquad (15)$$
which implies that there exists a local minimizer $\hat{\beta}_p$ such that $\|\hat{\beta}_p - \beta_0\|_{\mathbb{R}^d} = O_p(b_n) = O_p(n^{-1/2})$.
In fact, $G(\beta, \alpha(\beta)) = (Y - Z\beta)^T(I - H_m)(Y - Z\beta)$. Let $I_4 = n\sum_{j\in A}\{p_\omega(|\beta_{0j} + b_nu_j|) - p_\omega(|\beta_{0j}|)\}$. Using a Taylor expansion, we have
$$I_4 = \sum_{j\in A}\Big[nb_np_\omega'(|\beta_{0j}|)\mathrm{sign}(\beta_{0j})u_j + \frac12nb_n^2p_\omega''(|\beta_{0j}|)u_j^2(1 + o(1))\Big],$$
and from $p_\omega(|b_nu_j|) - p_\omega(0) \ge 0$, it follows that
$$G_p(\beta_0 + b_nu) - G_p(\beta_0) \ge G(\beta_0 + b_nu, \alpha(\beta_0 + b_nu)) - G(\beta_0, \alpha(\beta_0)) + n\sum_{j\in A}\{p_\omega(|\beta_{0j} + b_nu_j|) - p_\omega(|\beta_{0j}|)\} = b_nG'(\beta_0, \alpha(\beta_0))^Tu + \frac12b_n^2u^TG''(\beta_0, \alpha(\beta_0))u + I_4 = -2b_n\big[(Y - Z\beta)\big|_{\beta=\beta_0}\big]^T(I - H_m)Zu + b_n^2u^TZ^T(I - H_m)Zu + I_4 = -2b_nM^T(I - H_m)Zu - 2b_nV^T(I - H_m)Zu + b_n^2u^TZ^T(I - H_m)Zu + I_4 := I_1 + I_2 + I_3 + I_4,$$
where $G'(\beta_0, \alpha(\beta_0)) = \frac{\partial G(\beta_0, \alpha(\beta_0))}{\partial\beta}$, $G''(\beta_0, \alpha(\beta_0)) = \frac{\partial^2G(\beta_0, \hat{\alpha})}{\partial\beta\partial\beta^T}$ and $V = (V_1, V_2, \ldots, V_n)^T$. Now, (8), (10) and (9) imply $\|Z^T(I - H_m)M\| = O_p\big(n^{\frac{1}{2(a+2b)}}\big)$. Hence, from Theorem 2 and $b > a/2 + 1$ we have
$$|I_1| \le 2b_n\|Z^T(I - H_m)M\|_{\mathbb{R}^d}\|u\|_{\mathbb{R}^d} = O_p\big(n^{\frac{1}{2(a+2b)}}b_n\big)\|u\|_{\mathbb{R}^d} = o_p\big(nb_n^2\|u\|_{\mathbb{R}^d}\big).$$
Next, we consider $I_2$. Clearly, $I_2 = -2\sqrt{n}b_nu^TA_2 - 2\sqrt{n}b_nu^TA_3$, where $A_2$ and $A_3$ are defined in the proof of Theorem 1; moreover, from the proof of Theorem 1 it holds that $\sqrt{n}b_nu^TA_2 = O_p\big(n^{(a+1)/(2(a+2b))}b_n\|u\|_{\mathbb{R}^d}\big) = o_p\big(nb_n^2\|u\|_{\mathbb{R}^d}\big)$ and $\sqrt{n}b_nu^TA_3 = O_p\big(b_n\|u\|_{\mathbb{R}^d}\big) = o_p\big(nb_n^2\|u\|_{\mathbb{R}^d}\big)$. Thus, $I_2 = o_p\big(nb_n^2\|u\|_{\mathbb{R}^d}\big)$.
From Lemma 1, we have $\hat{B} = B + o_p(1)$, which implies $I_3 = b_n^2u^TZ^T(I - H_m)Zu = nb_n^2u^TBu + o_p\big(nb_n^2\|u\|_{\mathbb{R}^d}^2\big)$.
As for I 4 , from (A7) we get
$$|I_4| \le cnb_na_n\|u\|_{\mathbb{R}^d} + nb_n^2\|u\|_{\mathbb{R}^d}^2\max_{j\in A}p_\omega''(|\beta_{0j}|) = o\big(nb_n^2\|u\|_{\mathbb{R}^d} + nb_n^2\|u\|_{\mathbb{R}^d}^2\big).$$
Therefore, for $\|u\|_{\mathbb{R}^d} = C$, we have $I_1 + I_2 + I_3 + I_4 = nb_n^2u^TBu + o_p(nb_n^2)$, which yields (15) since $B = E(\eta_i\eta_i^T)$ is a positive definite matrix.
Note that $G_p(\beta) = G(\beta_0, \alpha(\beta_0)) + G'(\beta_0, \alpha(\beta_0))^T(\beta - \beta_0) + \frac12(\beta - \beta_0)^TG''(\beta_0, \alpha(\beta_0))(\beta - \beta_0) + n\sum_{j=1}^{d}p_\omega(|\beta_j|) = G(\beta_0, \alpha(\beta_0)) - 2(M + V)^T(I - H_m)Z(\beta - \beta_0) + (\beta - \beta_0)^TZ^T(I - H_m)Z(\beta - \beta_0) + n\sum_{j=1}^{d}p_\omega(|\beta_j|)$. Then
$$G_p'(\beta) = -2Z^T(I - H_m)M - 2Z^T(I - H_m)V + 2Z^T(I - H_m)Z(\beta - \beta_0) + ng(\beta), \qquad (16)$$
where $g(\beta) = \big(p_\omega'(|\beta_1|)\mathrm{sign}(\beta_1), \ldots, p_\omega'(|\beta_d|)\mathrm{sign}(\beta_d)\big)^T$. Thus, for any $\beta$ satisfying $\|\beta - \beta_0\|_{\mathbb{R}^d} = O_p(n^{-1/2})$, by (A7), Theorem 2, $n^{1/2}\omega\to\infty$ and $b > a/2 + 1$, we find
$$\frac{\partial G_p(\beta)}{\partial\beta_j} = O_p\big(n^{\frac{1}{2(a+2b)}}\big) + O_p(\sqrt{n}) + np_\omega'(|\beta_j|)\mathrm{sign}(\beta_j) = n\omega\big\{\omega^{-1}p_\omega'(|\beta_j|)\mathrm{sign}(\beta_j) + o_p(1)\big\};$$
then the sign of $\partial G_p(\beta)/\partial\beta_j$ is dominated by the sign of $\beta_j$. Note that $\hat{\beta}_p = (\hat{\beta}_{p1}^T, \hat{\beta}_{p2}^T)^T = \arg\min_{\beta\in\mathbb{R}^d}G_p(\beta)$ is the estimator of $\beta_0 = (\beta_{01}^T, 0^T)^T$. Then $P(\hat{\beta}_{p2} = 0)\to 1$ as $n\to\infty$.
(2) From the proof of (1) above, we know $\|\hat{\beta}_{p1} - \beta_{01}\|_{\mathbb{R}^{d_1}} = O_p(n^{-1/2})$ and $G_p'(\hat{\beta}_{p1}) = 0$. Hence, from (16) we get
$$\sqrt{n}(\hat{\beta}_{p1} - \beta_{01}) = \hat{B}_{p1}^{-1}\Big[n^{-1/2}Z_{p1}^T(I - H_m)M + n^{-1/2}Z_{p1}^T(I - H_m)V - \frac{\sqrt{n}}{2}g(\hat{\beta}_{p1})\Big], \qquad (17)$$
where $\hat{B}_{p1} = n^{-1}Z_{p1}^T(I - H_m)Z_{p1}$, $g(\hat{\beta}_{p1}) = \big(p_\omega'(|\hat{\beta}_l|)\mathrm{sign}(\hat{\beta}_l)\big)_{l\in A}$ and $Z_{p1} = (z_1^{(1)}, z_2^{(1)}, \ldots, z_n^{(1)})^T$. By Lemma 1, we obtain $\hat{B}_{p1}\xrightarrow{p}B_{p1}$. Then, by (A7),
$$\sqrt{n}\,g(\hat{\beta}_{p1}) = \sqrt{n}\big\{g(\beta_{01}) + g'(\beta_{01})(\hat{\beta}_{p1} - \beta_{01})[1 + o(1)]\big\} = o_p(1).$$
Thus, by (8), (10) and (9), (17) can be rewritten as
$$\sqrt{n}(\hat{\beta}_{p1} - \beta_{01}) = B_{p1}^{-1}\,n^{-1/2}Z_{p1}^T(I - H_m)V + o_p(1).$$
Similar to the proof of Theorem 1, we have $n^{-1/2}Z_{p1}^T(I - H_m)V = A_2^{(1)} + A_3^{(1)}$, where $A_2^{(1)}$ and $A_3^{(1)}$ are defined analogously to $A_2$ and $A_3$, with $z_i$ replaced by $z_i^{(1)}$ for $i = 1, 2, \ldots, n$. Then, following the lines of Step 1 and Step 3 in the proof of Theorem 1, one can verify $\sqrt{n}(\hat{\beta}_{p1} - \beta_{01})\xrightarrow{D}N\big(0, \ \sigma^2B_{p1}^{-1}\sum_{j=-\infty}^{\infty}c_j^2\big)$. □

Author Contributions

Conceptualization, Y.H.; Methodology, Y.H.; Software, Y.H.; Formal analysis, Y.H.; Data curation, Y.H.; Writing—original draft, Y.H.; Writing—review & editing, Z.P.; Visualization, Z.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

In this paper, the Monte Carlo simulation method was used for data analysis, and R software was used to generate the required data. No new data were created in this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis; Springer: New York, NY, USA, 1997.
2. Cardot, H.; Ferraty, F.; Sarda, P. Spline estimators for the functional linear model. Stat. Sin. 2003, 13, 571–591.
3. Cardot, H.; Sarda, P. Linear Regression Models for Functional Data. In The Art of Semiparametrics; Contributions to Statistics; Physica-Verlag: Heidelberg, Germany, 2006; pp. 49–66.
4. Li, Y.; Hsing, T. On rates of convergence in functional linear regression. J. Multivar. Anal. 2007, 98, 1782–1804.
5. Hall, P.; Horowitz, J.L. Methodology and convergence rates for functional linear regression. Ann. Stat. 2007, 35, 70–91.
6. Hall, P.; Hosseini-Nasab, M. On properties of functional principal components analysis. J. R. Stat. Soc. Ser. B Stat. Methodol. 2006, 68, 109–126.
7. Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer: New York, NY, USA, 2012.
8. Hsing, T.; Eubank, R. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators; John Wiley & Sons, Ltd.: Chichester, UK, 2015.
9. Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis: Theory and Practice; Springer: New York, NY, USA, 2006.
10. Shin, H. Partial functional linear regression. J. Stat. Plan. Inference 2009, 139, 3405–3418.
11. Yuan, M.G.; Zhang, Y. Test for the parametric part in partial functional linear regression based on B-spline. Commun. Stat.-Simul. Comput. 2021, 50, 1–15.
12. Hu, Y.P.; Liang, H.Y. Empirical likelihood in single-index partially functional linear model with missing observations. Commun. Stat.-Theory Methods 2022.
13. Jiang, Z.Q.; Huang, Z.S. Single-index partially functional linear quantile regression. Commun. Stat.-Theory Methods 2022.
14. Bouka, S.; Dabo-Niang, S.; Nkiet, G.M. On estimation and prediction in spatial functional linear regression model. Lith. Math. J. 2023, 63, 13–30.
15. Feng, S.; Xue, L. Partially functional linear varying coefficient model. Statistics 2016, 50, 717–732.
16. Xie, T.F.; Cao, R.Y.; Yu, P. Rank-based test for partial functional linear regression models. J. Syst. Sci. Complex. 2020, 33, 1571–1584.
17. Hu, Y.; Xue, L.; Zhao, J.; Zhang, L. Skew-normal partial functional linear model and homogeneity test. J. Stat. Plan. Inference 2020, 204, 116–127.
18. Tang, Q.G.; Tu, W.; Kong, L.L. Estimation for partial functional partially linear additive model. Comput. Stat. Data Anal. 2023, 177, 107584.
19. Kong, D.; Xue, K.; Yao, F.; Zhang, H.H. Partially functional linear regression in high dimensions. Biometrika 2016, 103, 147–159.
20. Du, J.; Xu, D.; Cao, R. Estimation and variable selection for partially functional linear models. J. Korean Stat. Soc. 2018, 474, 436–449.
21. Yao, F.; Sue-Chee, S.; Wang, F. Regularized partially functional quantile regression. J. Multivar. Anal. 2017, 156, 39–56.
22. Wu, C.X.; Ling, N.X.; Vieu, P.; Liang, W.J. Partially functional linear quantile regression model and variable selection with censoring indicators MAR. J. Multivar. Anal. 2023, 197, 105189.
23. Wang, Y.F.; Du, J.; Zhang, Z.G. Partial functional linear models with dependent errors. Acta Math. Appl. Sin. 2017, 40, 49–65. (In Chinese)
24. Hu, Y.P.; Liang, H.Y. Functional regression with dependent error and missing observation in reproducing kernel Hilbert spaces. J. Korean Stat. Soc. 2023.
25. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
26. Pollard, D. Convergence of Stochastic Processes; Springer: New York, NY, USA, 1984.
Figure 1. Q–Q plots of $\hat{\beta}_1$ and $\hat{\beta}_2$ with n = 50 (first row), n = 200 (second row), ϕ = 0.1 (first and second columns) and ϕ = 0.9 (third and fourth columns).
Figure 2. Q–Q plots of the estimators of $(\beta_1, \beta_2)$ with n = 50 (first row), n = 200 (second row), ϕ = 0.1 (first and second columns) and ϕ = 0.9 (third and fourth columns), obtained with the SCAD method.
Table 1. Values of bias and MSE for β̂1, β̂2 and γ̂ with n = 50, 200 and ϕ = 0.1, 0.9.

n   | ϕ   | MSE(β̂1) | MSE(β̂2) | MSE(γ̂) | Bias(β̂1) | Bias(β̂2)
50  | 0.1 | 0.0207  | 0.0245  | 1.8649 | 0.0040   | −0.0050
50  | 0.9 | 0.1010  | 0.1288  | 6.1297 | 0.0078   | −0.0118
200 | 0.1 | 0.0051  | 0.0052  | 0.4606 | 0.0044   | −0.0058
200 | 0.9 | 0.0289  | 0.0268  | 2.1421 | 0.0040   | −0.0120
Table 2. Values of C, IC, C-fit and MSE(β̂).

n   | ϕ   | C     | IC    | C-fit  | MSE(β̂)
50  | 0.1 | 4.94  | 0     | 0.8675 | 0.0847
50  | 0.9 | 4.12  | 0.048 | 0.759  | 0.5204
200 | 0.1 | 5.424 | 0     | 0.928  | 0.0173
200 | 0.9 | 4.982 | 0     | 0.873  | 0.0900