Article

High-Dimensional Statistics: Non-Parametric Generalized Functional Partially Linear Single-Index Model

1 Ecole Nationale des Sciences Appliquées, Université Cadi Ayyad, Marrakech 40000, Morocco
2 Laboratoire AGEIS EA 7407, Université Grenoble Alpes, AGIM Team, UFR SHS, BP. 47, CEDEX 09, 38040 Grenoble, France
3 Institut de Mathématiques de Toulouse, Université Paul Sabatier, CEDEX 09, 31062 Toulouse, France
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(15), 2704; https://doi.org/10.3390/math10152704
Submission received: 27 June 2022 / Revised: 18 July 2022 / Accepted: 26 July 2022 / Published: 30 July 2022

Abstract: We study the non-parametric estimation of partially linear generalized single-index functional models, where the systematic component of the model has a flexible functional semi-parametric form with a general link function. We suggest an efficient and practical approach to estimating (I) the single-index link function, (II) the single-index coefficients, as well as (III) the non-parametric functional component of the model. The estimation procedure combines quasi-likelihood, polynomial splines and kernel smoothing. We then derive the asymptotic properties, with rates, of the estimators of each component of the model. Their asymptotic normality is also established. By making use of the spline approximation and the Fisher scoring algorithm, we show that our approach has numerical advantages in terms of practical efficiency and computational stability. A computational study on data is provided to illustrate the good practical behavior of our methodology.

1. Introduction

Generalized linear models (GLM) encompass several parametric regression models by positing a parametric relationship between the mean response and certain covariates through a link function that is often canonical or assumed known; see [1,2]. This is not always desirable, because in practice the link function may be unknown or more complicated.
Several models have been developed to overcome this problem, such as non-parametric and semi-parametric regression models. However, the curse of dimensionality arises and limits the use of such models, and efforts have been made to circumvent this difficulty. The remedy proceeds along two lines: the approximation of the link functions and the reduction of dimension. To this end, the generalized additive model (GAM), in which the non-parametric component is replaced by a sum of univariate functions, was recommended by Hastie et al. [3] and detailed by Wood [4]. The main criticism of this type of model is that it does not take into account the interactions between the explanatory variables. Thus, the single-index model (SIM) was developed by Härdle et al. [5] and Hristache et al. [6], because it makes it possible to reduce the dimension and to soften restrictive parametric assumptions by replacing several covariates by a linear combination of them. Subsequently, the partially linear single-index model (PLSIM), which makes it possible to model discrete explanatory variables in the linear part, was developed by Liang et al. [7] and by Chen et al. [8] for the case of longitudinal data. Generalized partially linear single-index (GPLSIM) models based on kernel smoothing to estimate the single-index link function first appeared in the work of Carroll et al. [9], while Cao and Wang [10] used penalized spline smoothing of the quasi-likelihood together with the Fisher scoring technique, which is theoretically more reliable and relevant.
Notice also that some covariates may be functional and are not taken into account by these models. It should be remembered that several works have focused on the study of functional variables (see, for example, Ramsay and Silverman [11] and Ferraty and Vieu [12]). Note also that semi-functional partial linear regression was studied by Aneiros-Perez and Vieu [13], and then partial linear modeling with multi-functional covariates by Aneiros-Perez and Vieu [14]. We can also refer to several works on this subject, such as Horvath and Kokoszka [15], Kokoszka and Reimherr [16], Schumaker [17], Ould-Said et al. [18], Ouassou and Rachdi [19,20], Laksaci et al. [21], Cao et al. [22], Li and Lu [23] and Yao et al. [24]. We specifically cite, for example, Yu et al. [25] for the study of the partially functional linear single-index regression model and Yu and Ruppert [26] for a comprehensive review of the penalized spline smoothing methodology for the PLSIM, in which the underlying regression function is assumed to be a spline function with a fixed number of knots. Partially linear generalized single-index models for functional data (PLGSIMF) have been studied by Rachdi et al. [27] and Alahiane et al. [28] using B-spline expansions and the quasi-likelihood function, where the functional part is linear.
In this paper, we study the generalized non-parametric functional partially linear single-index model (GNPFPLSIM), in which functional covariates are taken into account. Notice that in this model (I) the link function is unknown, (II) the number of knots increases with the sample size, and (III) the functional regression component is estimated jointly with the unknown link function and the non-parametric single-index function, using an iterative algorithm based on spline smoothing and on the maximization of the quasi-likelihood function.
We use Fisher's scoring algorithm to solve the maximization problem iteratively. We also provide a generalized cross-validation method to select the number of knots in the spline approximation, and we use kernel methods for the functional data.
We also provide the convergence rates of our different estimators of the different parameters of GNPFPLSIM.
The rest of this paper is organized as follows. In Section 2 and Section 3, we present some preliminaries, develop the estimation methodology and describe an iterative algorithm, based on the maximization of the quasi-likelihood function, for computing the proposed estimators. Some asymptotic properties of the proposed estimators are given in Section 4. A simulation study and an application to real data are presented in Section 5. The technical lemmas allowing us to prove Theorems 1, 2 and 3 are presented in Appendix A.
Notice finally that in order to save space, proofs of various results obtained are grouped in a supplementary file to this paper.

2. Some Preliminaries

Let Y be a scalar response variable and $(X, Z) \in \mathbb{R}^d \times \mathcal{H}$ be the predictor vector, where $X = (X_1, \ldots, X_d)^\top$ and Z belongs to $\mathcal{H}$, where $(\mathcal{H}, \delta)$ is a semi-metric space of functions defined on $[0, 1]$; i.e., Z is a functional random variable and δ is a semi-metric.
For a fixed $(x, z) \in \mathbb{R}^d \times \mathcal{H}$, we assume that the conditional density function of the response Y given $(X, Z) = (x, z)$ belongs to a canonical exponential family, which is given by
\[ f_{Y \mid X = x,\, Z = z}(y) = \exp\left\{ y\, \xi(x, z) - B(\xi(x, z)) + C(y) \right\}, \tag{1} \]
where B and C are two known functions, and where ξ is the unknown natural parameter of the generalized parametric linear models, which is linked to the dependent variable by
\[ \mu(x, z) = \mathbb{E}\left[ Y \mid X = x, Z = z \right] = B'(\xi(x, z)), \]
where $B'$ denotes the first derivative of the function B (see [10,29]).
In what follows, we model $g(\mu(x, z))$ as a generalized non-parametric functional partially linear single-index model (GNPFPLSIM):
\[ g(\mu(x, z)) = \eta_0(\alpha^\top x) + r(z), \]
where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_d)^\top \in \mathbb{R}^d$ is the single-index coefficient vector of dimension d, $r(\cdot)$ is the unknown non-parametric functional component, and $\eta_0(\cdot)$ is the unknown single-index link function, which is assumed to be sufficiently smooth.
Remark 1.
Notice the following:
  • For identifiability reasons, we assume that $\|\alpha\|_d = 1$ and that the first component of α is positive, i.e., $\alpha_1 > 0$, where $\|\cdot\|_d$ denotes the Euclidean norm on $\mathbb{R}^d$.
  • In order to identify the function $\eta_0(\cdot)$, we define its support as $[a, b]$, where $a = \inf \alpha^\top X$ and $b = \sup \alpha^\top X$.
  • The GNPFPLSIM includes as special cases the linear model (LM), the single-index model (SIM), as well as the partially linear model (PLM), the PLSIM, and the non-parametric models.
  • In the definition of the real canonical link function g, we assume that the functional random variable $Z = \{ Z(t),\ t \in [0, 1] \}$ is valued in $\mathcal{H}$ and such that
\[ \mathbb{E}[Z] = 0, \quad \mathbb{E}(\varepsilon \mid X, Z) = 0 \quad \text{and} \quad \mathrm{var}(\varepsilon \mid X, Z) = \sigma^2. \]
  • If the conditional variance $\mathrm{var}(Y \mid X = x, Z = z) = \sigma^2 V(\mu(x, z))$, where $V(\cdot)$ is an unknown positive function, then the estimation of the mean function $g(\mu)$ may be obtained by replacing the log-likelihood $f_{Y \mid X = x, Z = z}$, given by (1), by the quasi-likelihood $Q(u, v)$, which is given, for any real numbers u and v, by
\[ \frac{\partial Q(u, v)}{\partial u} = \frac{v - u}{\sigma^2 V(u)} = \frac{v - u}{\mathrm{var}(Y \mid X = x, Z = z)}, \]
    and which may be written as follows:
\[ Q(u, v) = \int_v^u \frac{v - t}{\sigma^2 V(t)}\, dt. \]
  • The regression operator $r(\cdot)$, which is a nonlinear operator from $\mathcal{H}$ into $\mathbb{R}$, satisfies
\[ r \in \mathcal{C}_{\mathcal{H}}^{0}, \quad \text{where} \quad \mathcal{C}_{\mathcal{H}}^{0} = \left\{ f : \mathcal{H} \to \mathbb{R} \ \text{such that} \ \lim_{\delta(Z, Z') \to 0} f(Z') = f(Z) \right\}, \]
    or there exists $\beta > 0$ such that $r \in \mathrm{Lip}_{\mathcal{H}, \beta}$, where
\[ \mathrm{Lip}_{\mathcal{H}, \beta} = \left\{ f : \mathcal{H} \to \mathbb{R} \ :\ \exists\, C \in \mathbb{R}_+^*,\ \forall\, Z, Z' \in \mathcal{H},\ \left| f(Z) - f(Z') \right| < C\, \delta(Z, Z')^{\beta} \right\}. \]
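As a concrete illustration of the quasi-likelihood above (this sketch is not part of the paper), the following Python snippet evaluates $Q(u,v) = \int_v^u (v - t)/(\sigma^2 V(t))\,dt$ numerically and checks it against the closed form obtained for an assumed Poisson-type variance function $V(t) = t$ with $\sigma^2 = 1$:

```python
import numpy as np

def quasi_likelihood(u, y, V=lambda t: t, sigma2=1.0, n_grid=100_001):
    """Q(u, y) = integral from y to u of (y - t) / (sigma^2 V(t)) dt,
    so that dQ/du = (y - u) / (sigma^2 V(u)); trapezoidal rule on a fine grid."""
    t = np.linspace(y, u, n_grid)
    return np.trapz((y - t) / (sigma2 * V(t)), t)

# For V(t) = t and sigma^2 = 1, the integral has the closed form
# Q(u, y) = y*log(u/y) - (u - y), used here as a sanity check.
u, y = 2.0, 3.0
assert abs(quasi_likelihood(u, y) - (y * np.log(u / y) - (u - y))) < 1e-8
```

In the Gaussian case ($V \equiv 1$), the same integral reduces to $-(v - u)^2/(2\sigma^2)$, i.e., the usual least-squares criterion up to a constant.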

3. Estimation Methodology

Let $(X_i, Y_i, Z_i)$, for $i = 1, \ldots, n$, be an independent and identically distributed (i.i.d.) n-sample of $(X, Y, Z)$. Then, for each $i = 1, \ldots, n$,
\[ g\left( \mu(X_i, Z_i) \right) = \eta_0(\alpha^\top X_i) + r(Z_i). \tag{6} \]
Let $v \in \mathbb{N}^*$ and $\kappa \in (0, 1]$ be such that $p = v + \kappa > 1.5$. We denote by $\mathcal{H}^{(p)}$ the collection of functions g defined on $[a, b]$ whose vth derivative $g^{(v)}$ exists and satisfies the following Lipschitz condition of order κ:
\[ \left| g^{(v)}(m') - g^{(v)}(m) \right| \le C \left| m' - m \right|^{\kappa} \quad \text{for all } a \le m', m \le b. \]
We introduce a knot sequence $(k_m)$ in the interval $[a, b]$ with J interior knots, such that
\[ k_{-r+1} = \cdots = k_{-1} = k_0 = a < k_1 < \cdots < k_J < k_{J+1} = \cdots = k_{J+r} = b, \]
where $J = J_n$ increases with the sample size n.
Definition 1.
A function $s(\cdot)$ is said to belong to the space of polynomial splines $S_n$ of order $\nu \ge 1$ on an interval $[a, b]$ if
  • $s(\cdot)$ is a polynomial of degree $\nu - 1$ on each sub-interval $I_j = [k_j, k_{j+1})$, for $j = 0, \ldots, J_n - 1$, and on $I_{J_n} = [k_{J_n}, b]$;
  • $s(\cdot)$ is $(\nu - 2)$-times continuously differentiable on $[a, b]$.
Let $N_n = J_n + \nu$ be the number of B-spline basis functions, and let $B_j(u)$, $j = 1, \ldots, N_n$, be the B-spline basis functions of order ν. Moreover, let $h = (b - a)/(J_n + 1)$ be the distance between neighboring knots. Then, a function $\eta_0(\cdot) \in \mathcal{H}^{(p)}$ may be approximated by a function $\tilde{\eta} \in S_n$ with $\tilde{\eta}(\cdot) = \tilde{\gamma}^\top B(\cdot)$, where $B(\cdot) = \left( B_1(\cdot), B_2(\cdot), \ldots, B_{N_n}(\cdot) \right)^\top$ is the vector of B-splines of order ν (see de Boor [30]).
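To make the spline setup concrete, here is a small Python sketch (illustrative, not from the paper) that builds the $N_n = J_n + \nu$ B-spline basis functions of order $\nu = 4$ (cubic splines) on $[a, b]$, with the boundary knots repeated as described above, via the standard Cox–de Boor recursion; the knot placement and evaluation point are arbitrary choices:

```python
import numpy as np

def bspline_basis(u, knots, nu):
    """Evaluate all N = len(knots) - nu B-spline basis functions of order nu
    (degree nu - 1) at a point u, using the Cox-de Boor recursion."""
    # order-1 (piecewise-constant) splines
    B = np.array([1.0 if knots[j] <= u < knots[j + 1] else 0.0
                  for j in range(len(knots) - 1)])
    for k in range(2, nu + 1):                    # build up to order nu
        Bk = np.zeros(len(knots) - k)
        for j in range(len(knots) - k):
            left = right = 0.0
            if knots[j + k - 1] > knots[j]:       # skip zero-length supports
                left = (u - knots[j]) / (knots[j + k - 1] - knots[j]) * B[j]
            if knots[j + k] > knots[j + 1]:
                right = (knots[j + k] - u) / (knots[j + k] - knots[j + 1]) * B[j + 1]
            Bk[j] = left + right
        B = Bk
    return B

# Cubic splines (order nu = 4) on [a, b] = [0, 1] with J = 5 interior knots:
a, b, nu, J = 0.0, 1.0, 4, 5
interior = np.linspace(a, b, J + 2)[1:-1]
knots = np.r_[[a] * nu, interior, [b] * nu]       # boundary knots repeated nu times
B = bspline_basis(0.37, knots, nu)
assert B.shape == (J + nu,)                       # N_n = J_n + nu basis functions
assert abs(B.sum() - 1.0) < 1e-12                 # partition of unity on [a, b)
```

The partition-of-unity check reflects a basic property of B-spline bases on their base interval, and the spline approximation $\tilde{\eta}(u) = \tilde{\gamma}^\top B(u)$ is then just a dot product with these basis values.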
So, our estimation process consists of two steps as follows.

3.1. The First Step

Let, with a slight abuse of notation,
\[ Y_i = g\left( \mu(X_i, Z_i) \right) \quad \text{for } 1 \le i \le n \]
denote the transformed responses. A non-parametric estimator of the regression operator $r(\cdot)$ is defined by
\[ \hat{r}(z) = \frac{ \sum_{i=1}^n Y_i\, K\left( \delta(z, Z_i)/h_1 \right) }{ \sum_{i=1}^n K\left( \delta(z, Z_i)/h_1 \right) } = \sum_{i=1}^n \omega_{i, h_1}(z)\, Y_i, \]
where $\omega_{i, h_1}(z) = K\left( \delta(z, Z_i)/h_1 \right) \big/ \sum_{j=1}^n K\left( \delta(z, Z_j)/h_1 \right)$, and the kernel $K : \mathbb{R} \to \mathbb{R}^+$, which is supported within $(0, 1)$, is of
  • Type 1, if $\int K = 1$ and $c_1 \mathbf{1}_{[0,1]} \le K \le c_2 \mathbf{1}_{[0,1]}$ for some constants $0 < c_1 < c_2$,
or
  • Type 2, if $\int K = 1$, K is differentiable on $(0, 1)$ and $c_2 \le K' \le c_1$ for some constants $-\infty < c_2 < c_1 < 0$,
and the sequence $h_1 = h_{1,n} > 0$ is the bandwidth (the smoothing parameter).
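This first-step estimator is a functional Nadaraya–Watson smoother. A minimal Python sketch follows (illustrative, not from the paper; the $L^2$ semi-metric and the one-sided Epanechnikov kernel, a kernel supported on $[0,1)$, are assumed choices):

```python
import numpy as np

def r_hat(z, Z_sample, Y, t, h1):
    """Functional Nadaraya-Watson estimator
       r_hat(z) = sum_i w_i(z) Y_i,  w_i(z) = K(d(z,Z_i)/h1) / sum_j K(d(z,Z_j)/h1).
    `z` is a curve sampled on the grid `t`; `Z_sample` stacks the sample curves
    row-wise; `Y` holds the (transformed) responses."""
    d = np.sqrt(np.trapz((Z_sample - z) ** 2, t, axis=1))   # L2 semi-metric delta(z, Z_i)
    u = d / h1
    K = np.where(u < 1.0, 0.75 * (1.0 - u ** 2), 0.0)       # one-sided Epanechnikov kernel
    if K.sum() == 0.0:
        return np.nan                                       # no curve in the h1-ball around z
    return (K / K.sum()) @ Y
```

With a very large bandwidth every curve receives positive weight and the estimator returns a weighted mean of the responses; with a very small bandwidth it may find no neighbors at all, which is why bandwidth selection matters in practice.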

3.2. The Second Step

By plugging the non-parametric estimator $\hat{r}(\cdot)$ into (6), we consider the model
\[ g\left( \mu(X_i, Z_i) \right) = \gamma^\top B(\alpha^\top X_i) + \hat{r}(Z_i) \quad \text{for } i = 1, \ldots, n. \tag{7} \]
The mean function estimator $\hat{\mu}$ is obtained by evaluating the parameters $\hat{\theta} = (\hat{\alpha}, \hat{\gamma})$ and inverting Equation (7). In fact, $\hat{\theta} = (\hat{\alpha}, \hat{\gamma})$ is determined by maximizing the following quasi-likelihood:
\[ \hat{\theta} = (\hat{\alpha}, \hat{\gamma}) = \arg\max_{\theta = (\alpha, \gamma) \in \mathbb{R}^d \times \mathbb{R}^N} L(\theta), \]
where
\[ L(\theta) := L(\alpha, \gamma) = \frac{1}{n} \sum_{i=1}^n Q\left( g^{-1}\left( \gamma^\top B(\alpha^\top X_i) + \hat{r}(Z_i) \right), Y_i \right) = \frac{1}{n} \sum_{i=1}^n Q\left( g^{-1}(m_i), Y_i \right), \]
with
\[ m_i = \gamma^\top B(\alpha^\top X_i) + \hat{r}(Z_i) = \gamma^\top B(U_i) + \hat{r}(Z_i), \quad \text{where } U_i = \alpha^\top X_i, \]
\[ m_{0i} = \gamma_0^\top B(\alpha_0^\top X_i) + \hat{r}(Z_i) = \gamma_0^\top B(U_{0i}) + \hat{r}(Z_i), \quad \text{where } U_{0i} = \alpha_0^\top X_i, \]
and
\[ m_0 = \gamma_0^\top B(\alpha_0^\top X) + \hat{r}(Z) = \gamma_0^\top B(U_0) + \hat{r}(Z), \quad \text{where } U_0 = \alpha_0^\top X, \]
where $\alpha_0$, $\gamma_0$ and $\eta_0(\cdot)$ denote the true values of $\alpha$, $\gamma$ and $\eta(\cdot)$, respectively.
In order to handle the constraints $\|\alpha\| = 1$ and $\alpha_1 > 0$ on the d-dimensional index α, we proceed by a re-parametrization (see Yu and Ruppert [26]):
\[ \alpha = \alpha(\tau) = \left( \sqrt{1 - \|\tau\|^2},\ \tau^\top \right)^{\top} \quad \text{for } \tau \in \mathbb{R}^{d-1}. \]
The true value $\tau_0$ of τ must satisfy $\|\tau_0\| \le 1$; we assume that $\|\tau_0\| < 1$.
The Jacobian matrix of $\alpha : \tau \mapsto \alpha(\tau)$, of dimension $d \times (d - 1)$, is
\[ J(\tau) = \begin{pmatrix} -\tau^\top / \sqrt{1 - \|\tau\|^2} \\ I_{(d-1) \times (d-1)} \end{pmatrix}. \]
Notice that τ is unconstrained and of dimension one lower than that of α.
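The re-parametrization and its Jacobian translate directly into code. The following Python sketch is illustrative (the finite-difference comparison is only a sanity check of the formula for $J(\tau)$):

```python
import numpy as np

def alpha(tau):
    """Map tau in R^{d-1} (with ||tau|| < 1) to alpha on the unit sphere, alpha_1 > 0."""
    tau = np.asarray(tau, dtype=float)
    return np.concatenate([[np.sqrt(1.0 - tau @ tau)], tau])

def jacobian(tau):
    """d x (d-1) Jacobian of tau -> alpha(tau):
       first row -tau^T / sqrt(1 - ||tau||^2), then the identity I_{d-1}."""
    tau = np.asarray(tau, dtype=float)
    top = -tau / np.sqrt(1.0 - tau @ tau)
    return np.vstack([top, np.eye(tau.size)])

tau = np.array([0.3, -0.2])
a = alpha(tau)
assert abs(a @ a - 1.0) < 1e-12 and a[0] > 0          # unit norm, positive first component

# finite-difference check of the Jacobian
eps = 1e-7
fd = np.column_stack([(alpha(tau + eps * e) - alpha(tau)) / eps for e in np.eye(2)])
assert np.allclose(fd, jacobian(tau), atol=1e-6)
```

Because τ is unconstrained, standard unconstrained optimizers (or the Fisher scoring iteration below) can be applied to τ, and the constraint on α is restored afterwards through $\alpha(\tau)$.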
Recall that, since $\eta_0 \in \mathcal{H}^{(p)}$, there exists $\tilde{\eta} \in S_n$ such that $\|\eta_0 - \tilde{\eta}\|_\infty = O(h^p)$, with $\tilde{\eta}(\cdot) = \tilde{\gamma}^\top B(\cdot)$. Thus, let
\[ \tilde{\alpha} = \arg\max_{\|\alpha\|_d = 1} \frac{1}{n} \sum_{i=1}^n Q\left( g^{-1}\left( \tilde{\eta}(\alpha^\top X_i) + \hat{r}(Z_i) \right), Y_i \right). \]
Then
\[ \tilde{\tau} = \arg\max_{\tau \in \mathbb{R}^{d-1}} \tilde{l}(\tau), \]
where
\[ \tilde{l}(\tau) = \frac{1}{n} \sum_{i=1}^n Q\left( g^{-1}\left( \tilde{\eta}\left( \alpha(\tau)^\top X_i \right) + \hat{r}(Z_i) \right), Y_i \right). \]
We define $\tilde{\theta}_\tau = (\tilde{\tau}, \tilde{\gamma})$ such that
\[ (\tilde{\tau}, \tilde{\gamma}) = \arg\max_{\tau \in \mathbb{R}^{d-1},\, \gamma \in \mathbb{R}^N} \frac{1}{n} \sum_{i=1}^n Q\left( g^{-1}\left( \gamma^\top B\left( \alpha(\tau)^\top X_i \right) + \hat{r}(Z_i) \right), Y_i \right). \]
Notice that $\theta_\tau = (\tau, \gamma)$ is of dimension $(d - 1) + N$, while $\theta = (\alpha(\tau), \gamma)$ is of dimension $d + N$. Then $\tilde{l}(\theta_\tau)$ becomes
\[ \tilde{l}(\theta_\tau) = \frac{1}{n} \sum_{i=1}^n Q\left( g^{-1}\left( \gamma^\top B\left( \alpha(\tau)^\top X_i \right) + \hat{r}(Z_i) \right), Y_i \right) = \frac{1}{n} \sum_{i=1}^n Q\left( g^{-1}(m_i), Y_i \right). \]
For $l = 1, 2$, we denote
\[ \rho_l(m) = \frac{ \left( \frac{d}{dm} g^{-1}(m) \right)^{l} }{ \sigma^2 V\left( g^{-1}(m) \right) } \quad \text{and} \quad q_l(m, y) = \frac{\partial^l}{\partial m^l} Q\left( g^{-1}(m), y \right). \]
Then
\[ q_1(m, y) = \left( y - g^{-1}(m) \right) \rho_1(m) \quad \text{and} \quad q_2(m, y) = \left( y - g^{-1}(m) \right) \rho_1'(m) - \rho_2(m). \]
The score vector is
\[ S(\theta_\tau) = \frac{\partial L}{\partial \theta_\tau}(\theta) \Big|_{\theta = \theta_\tau} = \frac{1}{n} \sum_{i=1}^n q_1(m_i, Y_i)\, \xi_i(\tau, \gamma), \]
where
\[ \xi_i(\tau, \gamma) = \begin{pmatrix} \gamma^\top B'\left( \alpha(\tau)^\top X_i \right) J(\tau)^\top X_i \\ B\left( \alpha(\tau)^\top X_i \right) \end{pmatrix}. \]
The (negative expected) Hessian matrix of the quasi-likelihood function is therefore
\[ H(\theta_\tau) = -\mathbb{E}\left[ \frac{\partial S(\theta)}{\partial \theta_\tau^\top} \Big|_{\theta = \theta_\tau} \right] = \frac{1}{n} \sum_{i=1}^n \rho_2(m_i)\, \xi_i(\tau, \gamma)\, \xi_i(\tau, \gamma)^{\top}. \]
We have
\[ \tilde{\theta}_\tau = (\tilde{\tau}, \tilde{\gamma}) = \arg\max_{\theta_\tau = (\tau, \gamma) \in \mathbb{R}^{d-1} \times \mathbb{R}^N} \tilde{L}(\theta_\tau). \]
Then, expanding the score around $\hat{\theta}_\tau$,
\[ 0 = \frac{\partial \tilde{L}}{\partial \theta_\tau}(\theta) \Big|_{\theta = \tilde{\theta}_\tau} \approx \frac{\partial \tilde{L}}{\partial \theta_\tau}(\theta) \Big|_{\theta = \hat{\theta}_\tau} + \frac{\partial^2 \tilde{L}}{\partial \theta_\tau\, \partial \theta_\tau^\top}(\theta) \Big|_{\theta = \hat{\theta}_\tau} \left( \tilde{\theta}_\tau - \hat{\theta}_\tau \right). \]
By replacing the observed Hessian $\frac{\partial^2 \tilde{L}}{\partial \theta_\tau\, \partial \theta_\tau^\top}(\theta) \big|_{\theta = \hat{\theta}_\tau}$ by its expectation, we obtain
\[ S(\hat{\theta}_\tau) - H(\hat{\theta}_\tau) \left( \tilde{\theta}_\tau - \hat{\theta}_\tau \right) = 0, \]
and then
\[ \tilde{\theta}_\tau = \hat{\theta}_\tau + H(\hat{\theta}_\tau)^{-1} S(\hat{\theta}_\tau). \]
The Fisher scoring update equations therefore become
\[ \theta_\tau^{(k+1)} = \theta_\tau^{(k)} + H\left(\theta_\tau^{(k)}\right)^{-1} S\left(\theta_\tau^{(k)}\right) = \theta_\tau^{(k)} + \left[ \sum_{i=1}^n \rho_2\left(m_i^{(k)}\right) \xi_i\left(\tau^{(k)}, \gamma^{(k)}\right) \xi_i\left(\tau^{(k)}, \gamma^{(k)}\right)^{\top} \right]^{-1} \sum_{i=1}^n \left( Y_i - \mu_i^{(k)} \right) \rho_1\left(m_i^{(k)}\right) \xi_i\left(\tau^{(k)}, \gamma^{(k)}\right), \]
where, for $1 \le i \le n$,
\[ m_i^{(k)} = \gamma^{(k)\top} B\left( \alpha(\tau^{(k)})^\top X_i \right) + \hat{r}(Z_i), \qquad \mu_i^{(k)} = g^{-1}\left( m_i^{(k)} \right), \]
\[ \hat{\eta}(t) = \hat{\gamma}^\top B(t) \approx \gamma^{(k)\top} B(t) = \sum_{j=1}^{N} \gamma_j^{(k)} B_j(t), \qquad \hat{m}_i = \hat{\gamma}^\top B\left( \alpha(\hat{\tau})^\top X_i \right) + \hat{r}(Z_i) \approx \sum_{j=1}^{N} \gamma_j^{(k)} B_j\left( \alpha(\tau^{(k)})^\top X_i \right) + \hat{r}(Z_i). \]
Then $\hat{\mu}_i = g^{-1}(\hat{m}_i)$ and, at convergence, $\hat{\alpha} = \alpha(\tau^{(k)})$ is the estimator of the single-index coefficient vector of the GNPFPLSIM model.
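As a rough illustration of one Fisher scoring update, the following Python sketch implements the simplest special case, the Gaussian/identity-link model, where $\rho_2 \equiv 1$ and $q_1(m, y) = y - m$ (all names are hypothetical; the spline basis and its derivative are passed in as callables):

```python
import numpy as np

def fisher_scoring_step(tau, gamma, X, r_hat_Z, Y, basis, dbasis):
    """One Fisher-scoring update for theta_tau = (tau, gamma) in the
    Gaussian / identity-link case (rho_2 = 1, rho_1 = 1, q_1(m, y) = y - m).
    `basis(u)` and `dbasis(u)` return B(u) and B'(u) as (N,) arrays."""
    a = np.concatenate([[np.sqrt(1.0 - tau @ tau)], tau])      # alpha(tau)
    J = np.vstack([-tau / np.sqrt(1.0 - tau @ tau), np.eye(tau.size)])
    U = X @ a
    S = np.zeros(tau.size + gamma.size)                        # score vector
    H = np.zeros((S.size, S.size))                             # information matrix
    for i in range(len(Y)):
        m_i = gamma @ basis(U[i]) + r_hat_Z[i]
        # xi_i = ( gamma^T B'(U_i) J(tau)^T X_i , B(U_i) )
        xi = np.concatenate([(gamma @ dbasis(U[i])) * (J.T @ X[i]), basis(U[i])])
        S += (Y[i] - m_i) * xi
        H += np.outer(xi, xi)
    step = np.linalg.solve(H, S)
    return tau + step[:tau.size], gamma + step[tau.size:]
```

At the true parameters with noiseless responses the score vanishes, so the update is a fixed point; in practice the step is iterated from an initial value until the change in $\theta_\tau$ is negligible.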

4. Some Asymptotic Properties

In this section, we present the asymptotic properties of the estimators for the non-parametric components, the parametric components, the single-index and the almost complete convergence of the functional regression operator of the GNPFPLSIM model. For this aim, we will need some assumptions.

4.1. Assumptions

Let φ, $\varphi_1$ and $\varphi_2$ be measurable functions on $[a, b]$. We define the empirical inner product and its corresponding norm as follows:
\[ \langle \varphi_1, \varphi_2 \rangle_n = \frac{1}{n} \sum_{i=1}^n \varphi_1(U_i)\, \varphi_2(U_i) \quad \text{and} \quad \|\varphi\|_n^2 = \frac{1}{n} \sum_{i=1}^n \varphi^2(U_i), \quad \text{where } U_i = \alpha^\top X_i. \]
If φ, $\varphi_1$ and $\varphi_2$ are $L^2$-integrable, we define the theoretical inner product and its corresponding norm as follows:
\[ \langle \varphi_1, \varphi_2 \rangle = \mathbb{E}\left[ \varphi_1(U)\, \varphi_2(U) \right] \quad \text{and} \quad \|\varphi\|_2^2 = \mathbb{E}\left[ \varphi^2(U) \right] = \int_a^b \varphi^2(u) f(u)\, du. \]
We assume that
(C1)
η 0 ( · ) H ( p ) .
(C2)
For all $m \in \mathbb{R}$ and for all y in the range of the response variable Y, we have, for $k = 1, 2$,
\[ q_2(m, y) < 0 \quad \text{and} \quad c_q < \left| q_k(m, y) \right| < C_q, \]
for some positive constants $c_q$ and $C_q$.
(C3)
The νth-order partial derivative of the joint density function of X satisfies the Lipschitz condition of order κ ($\kappa \in (0, 1]$).
The marginal density function of $\alpha^\top X$ is continuous, bounded away from zero, and supported within $[a, b]$.
(C4)
For any vector τ, there exist positive constants $c_\tau$ and $C_\tau$ such that
\[ c_\tau I_{t \times t} \le \mathbb{E}\left[ \begin{pmatrix} 1 \\ X \end{pmatrix} \begin{pmatrix} 1 \\ X \end{pmatrix}^{\top} \Big|\ \alpha(\tau)^\top X = \alpha(\tau)^\top x \right] \le C_\tau I_{t \times t}, \]
where $t = 1 + d + N_n$.
(C5)
The number $N_n$ of knots satisfies $n^{\frac{1}{2(p+1)}} \ll N_n \ll n^{\frac{1}{8}}$ (with $p > 3$).
(C6)
The variable $Z \in (\mathcal{H}, \delta)$, where $(\mathcal{H}, \delta)$ is a semi-metric space.
(C7)
1. The operator $r(\cdot) \in \mathcal{C}_{\mathcal{H}}^{0}$, where $\mathcal{C}_{\mathcal{H}}^{0} = \left\{ f : \mathcal{H} \to \mathbb{R} \text{ such that } \lim_{\delta(x, x') \to 0} f(x') = f(x) \right\}$.
2. There exists $\beta > 0$ such that $r(\cdot) \in \mathrm{Lip}_{\mathcal{H}, \beta}$.
3. For all $\varepsilon > 0$, $P\left( Z \in B(z, \varepsilon) \right) = \varphi_z(\varepsilon) > 0$.
4. The bandwidth $h_1$ satisfies $h_1 \to 0$ and $\log n / (n\, \varphi_z(h_1)) \to 0$ as $n \to \infty$.
5. The kernel K is of Type 1 or Type 2.
6. For all $m \ge 2$, $\mathbb{E}\left[ |Y|^m \mid Z = z \right] \le \sigma_m(z) < \infty$, where $\sigma_m(\cdot)$ is a continuous function of z.
(C8)
There exist positive constants $C_\rho$, $C_\rho^*$ and $M_0$, with $0 < C_\rho, C_\rho^*, M_0 < \infty$, such that
\[ \left| \rho_1(m_0) \right| \le C_\rho \quad \text{and} \quad \left| \rho_1(m) - \rho_1(m_0) \right| \le C_\rho^* \left| m - m_0 \right| \quad \text{for all } \left| m - m_0 \right| \le M_0. \]
(C9)
There exist positive constants $C_g$, $C_g^*$ and $M_1$, with $0 < C_g, C_g^*, M_1 < \infty$, such that the link function g satisfies
\[ \left| \frac{d}{dm} g^{-1}(m_0) \right| \le C_g \quad \text{and} \quad \left| \frac{d}{dm} g^{-1}(m) - \frac{d}{dm} g^{-1}(m_0) \right| \le C_g^* \left| m - m_0 \right| \]
for all $\left| m - m_0 \right| \le M_1$.
(C10)
 There exists a positive constant $C_0$ such that $\mathbb{E}\left[ \varepsilon^2 \mid U_{\tau,0} \right] \le C_0$ almost surely (a.s.).
Remark 2.
1. If the kernel K is of Type 1, then there exist two generic constants $c' > 0$ and $c'' > 0$ such that
\[ c'\, \varphi_z(h_1) \le \mathbb{E}\left[ K\left( h_1^{-1} \delta(z, Z) \right) \right] \le c''\, \varphi_z(h_1). \]
2. If the kernel K is of Type 2 and if there exist $c_3 > 0$ and $\varepsilon_0 > 0$ such that, for all $\varepsilon < \varepsilon_0$, $\int_0^\varepsilon \varphi_z(u)\, du > c_3\, \varepsilon\, \varphi_z(\varepsilon)$, then there exist two generic constants $c' > 0$ and $c'' > 0$ such that, for $h_1$ small enough,
\[ c'\, \varphi_z(h_1) \le \mathbb{E}\left[ K\left( h_1^{-1} \delta(z, Z) \right) \right] \le c''\, \varphi_z(h_1). \]

4.2. The Consistency Study

Lemma 1.
Under assumptions (C1)–(C5), we have
\[ \sqrt{n}\left( \alpha(\tilde{\tau}) - \alpha(\tau_0) \right) \xrightarrow{\;\mathcal{D}\;} \mathcal{N}\left( 0,\; J(\tau_0)\, A^{-1} \Sigma_1 A^{-1} J(\tau_0)^{\top} \right) \]
and
\[ \alpha(\tilde{\tau}) - \alpha(\tau_0) = O_P\left( \frac{1}{\sqrt{n}} \right), \]
where $\xrightarrow{\;\mathcal{D}\;}$ (respectively, $O_P$) denotes convergence in distribution (respectively, boundedness in probability), and $\Sigma_1$ and A are two matrices defined in Appendix A.
Lemma 2.
Under assumptions (C1)–(C5), we have
\[ \hat{\theta} - \tilde{\theta} = O_P\left( \sqrt{N_n}\, h^{p+1} + \sqrt{\frac{N_n}{n h}} \right). \]
The proofs of the previous results are based on the following lemmas and, among others, on the papers of Pollard [31] and Stone [32].
Lemma 3
(Lemma A.1 in Huang [33]). For any $\lambda > 0$, let $\Theta_n = \left\{ \eta(\alpha_0^\top x) : \eta \in S_n,\ \|\eta - \eta_0\|_2 \le \lambda \right\}$. Then, for any $\varepsilon \le \lambda$,
\[ \log N_{[\,]}\left( \varepsilon, \Theta_n, L_2(P) \right) \le C\, N_n \log\left( \frac{\lambda}{\varepsilon} \right). \]
Lemma 4
(Lemma A.2 in Wang and Yang [29] and Lemma A.4 in Xue and Yang [34]). Under assumptions (C1)–(C5), we have
\[ A_n = \sup_{\eta_1, \eta_2 \in S_n} \left| \frac{ \langle \eta_1, \eta_2 \rangle_n - \langle \eta_1, \eta_2 \rangle }{ \|\eta_1\|_2\, \|\eta_2\|_2 } \right| = O_{a.co.}\left( \sqrt{\frac{\log n}{n h}} \right), \]
where $O_{a.co.}$ denotes the Landau symbol "O" for almost-complete convergence.
Let $D_{i,\theta} = \begin{pmatrix} \gamma^\top B'\left( \alpha(\tilde{\tau})^\top X_i \right) J(\tau)^\top & 0 \\ 0 & B\left( \alpha(\tilde{\tau})^\top X_i \right) \end{pmatrix}$, and denote
\[ W_{n,\theta} = \frac{1}{n} \sum_{i=1}^n D_{i,\theta}^\top \begin{pmatrix} X_i \\ 1 \end{pmatrix} \begin{pmatrix} X_i \\ 1 \end{pmatrix}^{\top} D_{i,\theta} \quad \text{and} \quad W_\theta = \frac{1}{n} \sum_{i=1}^n \mathbb{E}\left[ D_{i,\theta}^\top \begin{pmatrix} X_i \\ 1 \end{pmatrix} \begin{pmatrix} X_i \\ 1 \end{pmatrix}^{\top} D_{i,\theta} \right]. \]
Then, we have the following lemma.
Lemma 5
(Lemma A.3 in the Supplementary Material of Wang and Yang [29]). Under assumptions (C1)–(C8), there exists $C > 0$ such that
\[ \sup_\theta \left\| W_\theta^{-1} \right\|_2 \le C N_n \quad a.co. \qquad \text{and} \qquad \sup_\theta \left\| W_{n,\theta}^{-1} \right\|_2 \le C N_n \quad a.co., \]
where $\|M\|_2 = \sup_{x \ne 0} \frac{\|Mx\|}{\|x\|} = \sup_{\|x\| = 1} \|Mx\|$.

4.2.1. Almost Complete Convergence of the Functional Kernel Estimator of r ^

Theorem 1.
Under assumptions (C1)–(C7), we have
\[ \hat{r}(z) \xrightarrow{\;a.co.\;} r(z) \quad \text{as } n \to \infty. \]
In order to prove Theorem 1, we will need the following lemmas, whose proofs are given in the Supplementary Material. The proof is based on the following decomposition:
\[ \hat{r}(z) - r(z) = \frac{1}{\hat{r}_1(z)} \left[ \left( \hat{r}_2(z) - \mathbb{E}\,\hat{r}_2(z) \right) - \left( r(z) - \mathbb{E}\,\hat{r}_2(z) \right) \right] - \frac{r(z)}{\hat{r}_1(z)} \left( \hat{r}_1(z) - 1 \right), \]
where
\[ \hat{r}(z) = \frac{\hat{r}_2(z)}{\hat{r}_1(z)}, \quad \text{with} \quad \hat{r}_2(z) = \frac{1}{n} \sum_{i=1}^n Y_i \Delta_i, \quad \hat{r}_1(z) = \frac{1}{n} \sum_{i=1}^n \Delta_i \quad \text{and} \quad \Delta_i = \frac{ K\left( h_1^{-1} \delta(z, Z_i) \right) }{ \mathbb{E}\left[ K\left( h_1^{-1} \delta(z, Z) \right) \right] }. \]
Lemma 6.
Under assumptions (C1)–(C6), (C7)-1 and (C7)-4, we have
\[ \lim_{n \to +\infty} \mathbb{E}\,\hat{r}_2(z) = r(z). \]
Lemma 7.
Under assumptions (C1)–(C7), we have the following:
(i) If assumptions (C7)-3 to (C7)-5 are satisfied, then
\[ \hat{r}_2(z) - \mathbb{E}\,\hat{r}_2(z) = O_{a.co.}\left( \sqrt{\frac{\log n}{n\, \varphi_z(h_1)}} \right). \]
(ii) If assumptions (C7)-3 and (C7)-4 are satisfied, then
\[ \hat{r}_1(z) - 1 = O_{a.co.}\left( \sqrt{\frac{\log n}{n\, \varphi_z(h_1)}} \right). \]
Lemma 8
(Corollary of Bernstein’s inequality).
(i) If, for all $m \ge 2$, there exists $c_m > 0$ such that $\mathbb{E}|W_1|^m \le c_m a^{2(m-1)}$, then, for all $\varepsilon > 0$,
\[ P\left( \left| \frac{1}{n} \sum_{i=1}^n W_i \right| > \varepsilon \right) \le 2 \exp\left( -\frac{\varepsilon^2 n}{2 a^2 (1 + \varepsilon)} \right). \]
(ii) If $W_i = W_{i,n}$ depends on n and if, for all $m \ge 2$, there exists $c_m > 0$ such that $\mathbb{E}|W_1|^m \le c_m a_n^{2(m-1)}$, with $u_n = a_n^2 \log n / n \to 0$ as $n \to \infty$, then
\[ \frac{1}{n} \sum_{i=1}^n W_i = O_{a.co.}\left( \sqrt{u_n} \right). \]

4.2.2. Estimation of the Non-Parametric Function

Theorem 2.
Under assumptions (C1)–(C7), we have
\[ \left\| \hat{\eta} - \eta_0 \right\|_2 = O_P\left( \sqrt{N_n}\left( \frac{1}{\sqrt{n h}} + h^{p} \right) \right) \]
and
\[ \left\| \hat{\eta} - \eta_0 \right\|_n = O_P\left( \sqrt{N_n}\left( \frac{1}{\sqrt{n h}} + h^{p} \right) \right). \]
The proof of Theorem 2 is given in Appendix A.

4.2.3. Estimation of the Parametric Components

Theorem 3.
Under assumptions (C1)–(C10), the quasi-likelihood estimator $\hat{\alpha}$, with the constraint $\|\hat{\alpha}\| = 1$, is asymptotically normal; i.e.,
\[ \sqrt{n}\left( \hat{\alpha} - \alpha_0 \right) \xrightarrow{\;\mathcal{D}\;} \mathcal{N}\left( 0,\; J(\tau_0)\, D^{-1} J(\tau_0)^{\top} \right), \]
where $D = \mathbb{E}\left[ \rho_2(m_0(T)) \left( \eta_0'(U_{\tau,0})\, J(\tau_0)^\top \Phi(X) \right) \left( \eta_0'(U_{\tau,0})\, J(\tau_0)^\top \Phi(X) \right)^{\top} \right]$.
The proof of Theorem 3 and the used technical lemmas are given in Appendix A.

5. A Simulation Study

We aim to illustrate numerically the convergence of different estimators of the parameters τ , γ , the non-parametric function η and the regression operator r of Y on Z. We conduct this numerical study in the Gaussian and in the logistic cases.
The conditional density of $Y \mid X = x, Z = z$ is given by
\[ f_{Y \mid X = x,\, Z = z}(y) = \exp\left\{ y\, \xi(x, z) - B(\xi(x, z)) + C(y) \right\}. \]
We deduce that
\[ \mu(x, z) = \mathbb{E}\left[ Y \mid X = x, Z = z \right] = B'(\xi(x, z)) \quad \text{and} \quad \mathrm{var}(Y \mid X = x, Z = z) = \sigma^2 V\left( \mu(x, z) \right) = \sigma^2 B''(\xi(x, z)). \]
We consider the model given by the following equation:
\[ g\left( \mu(X_i, Z_i) \right) = \sin\left( \frac{\pi\left( \alpha^\top X_i - A \right)}{B - A} \right) + r(Z_i) + \varepsilon_i \quad \text{for } i = 1, \ldots, n. \tag{15} \]
The responses $Y_i$ are simulated according to Equation (15) (see Figure 1), the components of $X_i$ are drawn uniformly over the interval $[-0.5, 0.5]$, and the errors satisfy $\varepsilon_i \sim \mathcal{N}(0, 0.025)$. Moreover, we take the following coefficients:
\[ \alpha = \frac{1}{\sqrt{3}}(1, 1, 1)^\top, \quad A = \frac{\sqrt{3}}{2} - \frac{1.645}{\sqrt{12}} \quad \text{and} \quad B = \frac{\sqrt{3}}{2} + \frac{1.645}{\sqrt{12}}. \]
The functional random variable $Z_i(\cdot)$ is taken as $Z(t) = f(U([0, T]))$, where $f(t) = g(\mathcal{B}(1, 0.5), t)$ and $g(a, t) = 2a(1 - 2a)\sin(t\, \pi\, \mathcal{N}(0, 1))$.
The regression operator r of Y on Z is defined as follows:
\[ r(Z) = \int_0^1 \frac{1}{1 + Z(t)}\, dt. \]
The number of knots is selected according to the formula $C\, n^{\frac{1}{2r}} \log(n)$, where $C \in [0.3, 1]$ (see Cao and Wang [10]). We choose $C = 0.6$ and we generate 2000 sample replications of size $n = 500$.
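For concreteness, the Gaussian-case data generation can be sketched in Python as follows (illustrative, not the paper's code: the exact construction of the functional covariate Z is more involved, so a simple stand-in curve is used, and the reading of the constants A and B is an assumption):

```python
import numpy as np

rng = np.random.default_rng(2022)
n = 500
t = np.linspace(0.0, 1.0, 100)                       # discretization grid for Z(t)

alpha = np.ones(3) / np.sqrt(3)                      # alpha = (1, 1, 1)' / sqrt(3)
A = np.sqrt(3) / 2 - 1.645 / np.sqrt(12)             # assumed reading of A and B
B = np.sqrt(3) / 2 + 1.645 / np.sqrt(12)

X = rng.uniform(-0.5, 0.5, size=(n, 3))
# Simplified stand-in for the paper's functional covariate Z(t), valued in [0, 1]:
Z = np.abs(np.sin(np.outer(rng.normal(size=n), np.pi * t)))
r_Z = np.trapz(1.0 / (1.0 + Z), t, axis=1)           # r(Z) = \int_0^1 dt / (1 + Z(t))
eps = rng.normal(0.0, np.sqrt(0.025), size=n)        # Var(eps) = 0.025

eta = np.sin(np.pi * (X @ alpha - A) / (B - A))      # single-index component
Y = eta + r_Z + eps                                  # Gaussian case: identity link
```

The logistic case would instead pass the linear predictor through the inverse logit and draw Bernoulli responses; the rest of the pipeline is unchanged.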
Then, the computed bias, the standard deviation (SD) and the mean squared error (MSE) with respect to (I) the parameter τ , and (II) the parameter γ are summarized in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and Table 10.
Notice that, in the first step, we estimate the regression operator r using the functional kernel regression estimator, implemented via the R routine funopare.knn.lcv. The obtained mean squared error is MSE = 0.17.
In the second (plug-in) step, we estimate the parameters of the following model by using our GNFPLSIM algorithm, as described before:
\[ g\left( \mu(X_i, Z_i) \right) = \sin\left( \frac{\pi\left( \alpha^\top X_i - A \right)}{B - A} \right) + \hat{r}(Z_i) + \varepsilon_i \quad \text{for } i = 1, \ldots, n. \]
To compute the bias, SD and MSE, we recorded 2000 replications of the GNFPLSIM algorithm in the Gaussian case (Table 1, Table 2, Table 3, Table 4 and Table 5) and in the logistic case (Table 6, Table 7, Table 8, Table 9 and Table 10) with n = 500 as follows.
The quality of the estimators is illustrated through these simulations: the method works quite well, and the bias, SD and MSE are generally reasonably low. The parametric and non-parametric components, the single index and the nonlinear regression operator r of Y on Z are computed by the procedure described above. The tables also indicate the consistency of $\hat{\alpha}$, in the sense that the bias, SD and MSE decrease as the sample size increases.
We developed our algorithm in both cases: the identity link function and the logistic link function. The simulations show that the GNFPLSIM algorithm works well in both cases.
In Figure 2, we illustrate 500 realizations of the functional random variable Z and the predicted response versus the true response.
We present below, in Figure 3, the single index estimated by the model in both cases: Gaussian and logistic cases.
We observe that the single index estimated by our model fits the true single index well.
We present below, in Figure 4, the systematic component $\eta(\cdot)$ estimated by the model in both the Gaussian and logistic cases.
Our model provides a good approximation of the non-parametric function $\eta(\cdot)$. To quantify this, we use the square root of the average squared errors (RASE) criterion (see Lai et al. [35]) in both the Gaussian and the logistic cases:
\[ RASE = \left( \frac{1}{n} \sum_{i=1}^n \left( \hat{\eta}(u_i) - \eta(u_i) \right)^2 \right)^{1/2}. \]
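The RASE criterion transcribes directly into Python (illustrative; the test function $\eta$ below is an arbitrary choice, not the one from the simulation):

```python
import numpy as np

def rase(eta_hat, eta_true, u):
    """Square root of the average squared errors over the evaluation points u."""
    return np.sqrt(np.mean((eta_hat(u) - eta_true(u)) ** 2))

u = np.linspace(0.0, 1.0, 200)
eta0 = lambda v: np.sin(np.pi * v)
assert rase(eta0, eta0, u) == 0.0                            # perfect fit gives RASE = 0
assert abs(rase(lambda v: eta0(v) + 0.1, eta0, u) - 0.1) < 1e-12  # constant offset of 0.1
```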
Table 11 summarizes the sample means, medians and variances of the RASE for different sample sizes in the Gaussian case.
Table 12 summarizes the same quantities in the logistic case.
We conclude that, as the sample size n increases from 500 to 1000, the sample mean, median and variance of the RASE decrease.

Application to Tecator Data

In this paragraph, we apply the GNFPLSIM model to the Tecator data, which are well known in functional data analysis. These data can be downloaded from the following link: http://lib.stat.cmu.edu/datasets/tecator (accessed on 1 March 2022). For more details, see Ferraty and Vieu [12].
For 215 finely chopped pieces of meat (see Figure 5), the Tecator data contain the corresponding fat contents ($Y_i$, $i = 1, \ldots, 215$), the near-infrared absorbance spectra ($Z_i$, $i = 1, \ldots, 215$) observed at 100 equally spaced wavelengths in the range 850–1050 nm, the protein content $X_{1,i}$ and the moisture content $X_{2,i}$.
We try to predict the fat content of the finely chopped meat samples.
The following figure shows the absorbance curves.
We randomly split the sample into two sub-samples: a training sample $I_1$ of size 160 and a test sample $I_2$ of size 55. The training sample is used to estimate the parameters, and the test sample is employed to assess the quality of the predictions. To evaluate our model, we use the mean squared error of prediction (MSEP), as in Aneiros-Pérez and Vieu [13], defined as follows:
\[ MSEP = \frac{1}{\# I_2} \sum_{i \in I_2} \left( Y_i - \hat{Y}_i \right)^2 \Big/ \mathrm{var}_{I_2}(Y_i), \]
where $\hat{Y}_i$ is the predicted value based on the training sample and $\mathrm{var}_{I_2}$ is the empirical variance of the response variable over the test sample.
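The MSEP can be transcribed in Python as follows (illustrative; written for a generic test sample rather than the specific 160/55 split used here):

```python
import numpy as np

def msep(y_test, y_pred):
    """Mean squared error of prediction over the test sample,
    normalized by the empirical variance of the test responses."""
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_test - y_pred) ** 2) / np.var(y_test)

y = np.array([1.0, 2.0, 3.0, 4.0])
assert msep(y, y) == 0.0                               # perfect prediction
assert abs(msep(y, np.full(4, y.mean())) - 1.0) < 1e-12  # predicting the mean gives MSEP = 1
```

The normalization by $\mathrm{var}_{I_2}(Y_i)$ makes the criterion scale-free: an MSEP below 1 means the model predicts better than the naive test-sample mean.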
Table 13 and Table 14 show the performance of our GNPFPLSIM model in comparison with other models. We can conclude that GNPFPLSIM is competitive for such data.
Figure 6 shows the estimator of the non-parametric function $\eta(\cdot)$ obtained by the model in both the Gaussian and logistic cases.
Figure 7 compares the fat content with its estimate by the model in both the Gaussian and logistic cases.
We can see that our model fits the fat content of the 215 pieces of meat well.

6. Summary

In this paper, we introduced estimators for the non-parametric generalized functional partially linear single-index model (GNPFPLSIM). Our estimators are obtained via kernel methods and the Fisher scoring update equation derived from the quasi-likelihood function, together with the normalized B-spline basis and its derivatives.
We prove the $\sqrt{n}$-consistency and asymptotic normality of our estimators. First, we define the estimator $\hat{r}$, which converges almost completely to the true regression operator r. Second, we establish, with rates, the convergence of the estimator $\hat{\eta}$ to the true non-parametric function η. Finally, we establish, with rates, the convergence of the estimator $\hat{\alpha}$ to the true single-index parameter α.
A numerical study reveals that our estimation procedure performs well in higher dimensions. The quality of the estimators is illustrated via simulations and real data.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math10152704/s1.

Author Contributions

Formal analysis, M.A.; Investigation, I.O.; Methodology, M.R. and P.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In what follows, we present the technical lemmas that are used for the proof of the previous theorems.
In what follows, for any probability measure Q, define $L_2(Q) = \left\{ f : Q f^2 = \int f^2\, dQ < \infty \right\}$. Let $\mathcal{F}$ be a subclass of $L_2(Q)$ and, for all $f \in \mathcal{F}$, let $\|f\| = \left( \int f^2\, dQ \right)^{1/2}$. Denote by $N\left( \delta, \mathcal{F}, L_2(Q) \right)$ the δ-covering number of $\mathcal{F}$, i.e., the smallest value of N for which there exist functions $f_1, f_2, \ldots, f_N$ such that, for each $f \in \mathcal{F}$, there exists $j \in \{1, \ldots, N\}$ with $\|f - f_j\| < \delta$; that is, $\mathcal{F} \subset \bigcup_{j=1}^N B(f_j, \delta)$. Notice that the $f_j$'s are not necessarily in $\mathcal{F}$.
For two functions l and u, a bracket $[l, u]$ is the set of functions f such that $l \le f \le u$: $[l, u] = \{ f : l \le f \le u \}$.
The δ-covering number with bracketing, $N_{[\,]}\left( \delta, \mathcal{F}, L_2(Q) \right)$, is defined as the smallest value of N necessary to cover the whole of $\mathcal{F}$, for which there exist pairs of functions $\left( f_j^L, f_j^U \right)$, $j = 1, \ldots, N$, with $\|f_j^U - f_j^L\| \le \delta$, such that, for each $f \in \mathcal{F}$, there is a $j \in \{1, \ldots, N\}$ with $f_j^L \le f \le f_j^U$ (the $f_j^U$ and $f_j^L$ do not necessarily belong to $\mathcal{F}$).
The δ-entropy with bracketing is $\log N_{[\,]}\left( \delta, \mathcal{F}, L_2(Q) \right)$. The uniform entropy integral $J_{[\,]}\left( \delta, \mathcal{F}, L_2(Q) \right)$ is defined as
\[ J_{[\,]}\left( \delta, \mathcal{F}, L_2(Q) \right) = \int_0^\delta \left( 1 + \log N_{[\,]}\left( \varepsilon, \mathcal{F}, L_2(Q) \right) \right)^{1/2} d\varepsilon. \]
Let $Q_n$ be the empirical measure associated with Q, i.e., $Q_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}(\cdot)$, so that
\[ Q_n f = \mathbb{E}_{Q_n}[f] = \int f\, dQ_n = \frac{1}{n} \sum_{i=1}^n f(X_i). \]
Denote by $\mathbb{G}_n = \sqrt{n}\left( Q_n - Q \right)$ the standardized empirical process indexed by $\mathcal{F}$, and by $\left\| \mathbb{G}_n \right\|_{\mathcal{F}} = \sup_{f \in \mathcal{F}} \left| \mathbb{G}_n f \right|$ for any measurable class of functions $\mathcal{F}$.
For all $f \in \mathcal{F}$, we have $Q f = \mathbb{E}_Q[f(X)] = \int f\, dQ$ and
\[ \mathbb{G}_n f = \sqrt{n}\left( Q_n f - Q f \right) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \left( f(X_i) - \mathbb{E}[f(X)] \right). \]
Lemma A1
(Lemma 3.4.2 in Van Der Vaart and Wellner [36]). Let $M_0 > 0$ and let $\mathcal{F}$ be a uniformly bounded class of measurable functions such that
\[ \text{for all } f \in \mathcal{F}, \quad \|f\|_\infty < M_0 \quad \text{and} \quad Q f^2 < \delta^2. \]
Then
\[ \mathbb{E}_Q \left\| \mathbb{G}_n \right\|_{\mathcal{F}} \le c_0\, J_{[\,]}\left( \delta, \mathcal{F}, L_2(Q) \right) \left( 1 + \frac{ J_{[\,]}\left( \delta, \mathcal{F}, L_2(Q) \right) }{ \delta^2 \sqrt{n} } M_0 \right), \]
where $c_0$ is a finite constant that does not depend on n.
Lemma A2
(Lemma A.1 in Huang [33]). For any $\lambda > 0$, let $\Theta_n = \left\{ \eta(\alpha_0^\top x) : \eta \in S_n,\ \|\eta - \eta_0\|_2 \le \lambda \right\}$. Then, for any $\varepsilon \le \lambda$,
\[ \log N_{[\,]}\left( \varepsilon, \Theta_n, L_2(P) \right) \le c\, N_n \log\left( \frac{\lambda}{\varepsilon} \right). \]
Recall that $N_n$ is the number of B-spline basis functions of order r.
Lemma A3
(Lemma A.2, page 3, in Wang and Yang [29] and Lemma A.4, page 1442, in Xue and Yang [34]). Let $S_n$ be the space of all polynomial spline functions of order r on $[a, b]$. Under conditions (C1)–(C5), we have
\[ A_n = \sup_{\eta_1, \eta_2 \in S_n} \left| \frac{ \langle \eta_1, \eta_2 \rangle_n - \langle \eta_1, \eta_2 \rangle }{ \|\eta_1\|_2\, \|\eta_2\|_2 } \right| = O_{a.s.}\left( \sqrt{\frac{\log n}{n h}} \right). \]
Let
\[ D_{i,\theta} = \begin{pmatrix} \gamma^\top B'\left( \alpha(\tilde{\tau})^\top X_i \right) J(\tau)^\top & 0 \\ 0 & B\left( \alpha(\tilde{\tau})^\top X_i \right) \end{pmatrix}. \]
Denote
\[ W_{n,\theta} = \frac{1}{n} \sum_{i=1}^n D_{i,\theta}^\top \begin{pmatrix} X_i \\ 1 \end{pmatrix} \begin{pmatrix} X_i \\ 1 \end{pmatrix}^{\top} D_{i,\theta} \]
and
\[ W_\theta = \frac{1}{n} \sum_{i=1}^n \mathbb{E}\left[ D_{i,\theta}^\top \begin{pmatrix} X_i \\ 1 \end{pmatrix} \begin{pmatrix} X_i \\ 1 \end{pmatrix}^{\top} D_{i,\theta} \right]. \]
Lemma A4
(Lemma A.3 in Wang and Yang [29]). Under conditions (C1)–(C8), there exists $C > 0$ such that
$$ \sup_{\theta} \big\| W_{\theta}^{-1} \big\|_2 \leq C\, N_n \ \ a.s. \quad \text{and} \quad \sup_{\theta} \big\| W_{n,\theta}^{-1} \big\|_2 \leq C\, N_n \ \ a.s. $$
Recall that
$$ \|A\|_2 = \sup_{x \neq 0} \frac{\|A x\|}{\|x\|} = \sup_{\|x\| = 1} \|A x\|. $$
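This operator norm equals the largest singular value of $A$ and dominates the ratio $\|Ax\|/\|x\|$ over any set of test directions. A small numerical sketch, with a randomly generated $A$ chosen purely for illustration:

```python
import numpy as np

# The spectral norm ||A||_2 = sup_{x != 0} ||Ax|| / ||x|| equals the largest
# singular value of A; np.linalg.norm(A, ord=2) computes exactly that.
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))                    # a random matrix, for illustration

spectral = np.linalg.norm(A, ord=2)            # largest singular value of A

# Approximate the supremum over many random unit directions x
xs = rng.normal(size=(3, 20_000))
xs /= np.linalg.norm(xs, axis=0)               # normalize columns so ||x|| = 1
sup_ratio = np.linalg.norm(A @ xs, axis=0).max()

print(spectral, sup_ratio)                     # sup_ratio <= spectral, nearly equal
```

No sampled direction can exceed the spectral norm, and with enough directions the supremum is approached from below.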
In what follows, we state the lemmas needed to prove Theorem 3. The proofs of these lemmas and of the theorem are given in Appendix A.
Lemma A5.
Under conditions (C1)–(C8), we have
$$ \frac{1}{n} \sum_{i=1}^{n} \rho_2(m_{0i}) \big\{ \hat{\eta}(U_{\tau,0i}) - \eta_0(U_{\tau,0i}) \big\}\, \eta_0'(U_{\tau,0i})\, J(\tau_0)\, \Phi(X_i) = O_P\!\left( \frac{1}{\sqrt{n}} \right), $$
$$ \frac{1}{n} \sum_{i=1}^{n} \rho_2(m_{0i})\, \eta_0'(U_{\tau,0i})\, \Phi(X_i)\, \Upsilon(U_{\tau,0i})^{\top} J(\tau_0) \big( \hat{\tau} - \tau_0 \big) = O_P\!\left( \frac{1}{\sqrt{n}} \right), $$
where
$$ \Upsilon(u_{\tau,0}) = \frac{\mathbb{E}\big[ X \rho_2(m_0(T)) \mid U_{\tau,0} = u_{\tau,0} \big]}{\mathbb{E}\big[ \rho_2(m_0(T)) \mid U_{\tau,0} = u_{\tau,0} \big]}, \qquad \Gamma(u_{\tau,0}) = \frac{\mathbb{E}\big[ W \rho_2(m_0(T)) \mid U_{\tau,0} = u_{\tau,0} \big]}{\mathbb{E}\big[ \rho_2(m_0(T)) \mid U_{\tau,0} = u_{\tau,0} \big]}, $$
$$ \Phi(x) = \Phi(U_{\tau,0}, x) = x - \Upsilon(u_{\tau,0}) \quad \text{and} \quad \Psi(w) = \Psi(U_{\tau,0}, w) = w - \Gamma(u_{\tau,0}). $$
Lemma A6.
Under conditions (C1)–(C8), we have
$$ \frac{1}{n} \sum_{i=1}^{n} \rho_2(m_{0i}) \big\{ \hat{\eta}(U_{\tau,0i}) - \eta_0(U_{\tau,0i}) \big\}\, \Psi(T_i) = O_P\!\left( \frac{1}{\sqrt{n}} \right), $$
$$ \frac{1}{n} \sum_{i=1}^{n} \rho_2(m_{0i})\, \eta_0'(U_{\tau,0i})\, \Psi(T_i)\, \Upsilon(U_{\tau,0i})^{\top} J(\tau_0) \big( \hat{\tau} - \tau_0 \big) = O_P\!\left( \frac{1}{\sqrt{n}} \right). $$

References

  1. McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Chapman and Hall: London, UK, 1989.
  2. Nelder, J.A.; Wedderburn, R.W.M. Generalized Linear Models. J. R. Stat. Soc. Ser. A 1972, 135, 370–384.
  3. Hastie, T.J.; Tibshirani, R.J. Generalized Additive Models; Chapman and Hall: London, UK, 1990.
  4. Wood, S.N. Generalized Additive Models: An Introduction with R, 2nd ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2017.
  5. Härdle, W.; Hall, P.; Ichimura, H. Optimal Smoothing in Single-Index Models. Ann. Stat. 1993, 21, 157–178.
  6. Hristache, M.; Juditsky, A.; Spokoiny, V. Direct Estimation of the Index Coefficient in a Single-Index Model. Ann. Stat. 2001, 29, 595–623.
  7. Liang, H.; Wang, N. Partially Linear Single-Index Measurement Error Models. Stat. Sin. 2005, 15, 99–116.
  8. Chen, J.; Li, D.; Liang, H.; Wang, S. Semiparametric GEE Analysis of Partially Linear Single-Index Models for Longitudinal Data. Ann. Stat. 2015, 43, 1682–1715.
  9. Carroll, R.J.; Fan, J.; Gijbels, I.; Wand, M.P. Generalized Partially Linear Single-Index Models. J. Am. Stat. Assoc. 1997, 92, 477–489.
  10. Wang, L.; Cao, G. Efficient Estimation for Generalized Partially Linear Single-Index Models. Bernoulli 2018, 24, 1101–1127.
  11. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis; Springer: New York, NY, USA, 2005.
  12. Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis: Theory and Practice; Springer Series in Statistics; Springer: Berlin, Germany, 2006.
  13. Aneiros-Perez, G.; Vieu, P. Semi-Functional Partial Linear Regression. Stat. Probab. Lett. 2006, 76, 1102–1110.
  14. Aneiros-Perez, G.; Vieu, P. Partial Linear Modelling with Multi-Functional Covariates. Comput. Stat. 2015, 30, 647–671.
  15. Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer Series in Statistics; Springer: New York, NY, USA, 2012.
  16. Kokoszka, P.; Reimherr, M. Introduction to Functional Data Analysis; Chapman and Hall/CRC Press: Boca Raton, FL, USA, 2021.
  17. Schumaker, L.L. Spline Functions: Basic Theory; Wiley: New York, NY, USA, 1981; Volume 14.
  18. Ould-Said, E.; Ouassou, I.; Rachdi, M. Functional Statistics and Applications; Contributions to Statistics; Springer: Berlin, Germany, 2013.
  19. Ouassou, I.; Rachdi, M. Stein Type Estimation of the Regression Operator for Functional Data. Adv. Appl. Stat. Sci. 2010, 1, 233–250.
  20. Ouassou, I.; Rachdi, M. Regression Operator Estimation by Delta-Sequences Method for Functional Data and its Applications. AStA Adv. Stat. Anal. 2012, 92, 451–465.
  21. Laksaci, A.; Kaid, Z.; Alahiane, M.; Ouassou, I.; Rachdi, M. Non-Parametric Estimations of the Conditional Density and Mode When the Regressor and the Response are Curves. Commun. Stat. Theory Methods 2022.
  22. Cao, R.; Du, J.; Zhou, J.; Xie, T. FPCA-Based Estimation for Generalized Functional Partially Linear Models. Stat. Pap. 2020, 61, 2715–2735.
  23. Li, C.S.; Lu, M. A Lack-of-Fit Test for Generalized Linear Models via Single-Index Techniques. Comput. Stat. 2018, 33, 731–756.
  24. Yao, D.S.; Chen, W.X.; Long, C.X. Parametric Estimation for the Simple Linear Regression Model under Moving Extremes Ranked Set Sampling Design. Appl. Math. J. Chin. Univ. 2021, 36, 269–277.
  25. Yu, P.; Du, J.; Zhang, Z. Single-Index Partially Functional Linear Regression Model. Stat. Pap. 2020, 61, 1107–1123.
  26. Yu, Y.; Ruppert, D. Penalized Spline Estimation for Partially Linear Single-Index Models. J. Am. Stat. Assoc. 2002, 97, 1042–1054.
  27. Rachdi, M.; Alahiane, M.; Ouassou, I.; Vieu, P. Generalized Functional Partially Linear Single-Index Models. In Functional and High-Dimensional Statistics and Related Fields; Springer International Publishing: Cham, Switzerland, 2020; pp. 221–228.
  28. Alahiane, M.; Ouassou, I.; Rachdi, M.; Vieu, P. Partially Linear Generalized Single Index Models for Functional Data (PLGSIMF). Stats 2021, 4, 793–813.
  29. Wang, L.; Yang, L. Spline Estimation of Single-Index Models. Stat. Sin. 2009, 19, 765–783.
  30. De Boor, C. A Practical Guide to Splines, Revised ed.; Applied Mathematical Sciences; Springer: Berlin, Germany, 2001; Volume 27.
  31. Pollard, D. Asymptotics for Least Absolute Deviation Regression Estimators. Econom. Theory 1991, 7, 186–199.
  32. Stone, C.J. The Dimensionality Reduction Principle for Generalized Additive Models. Ann. Stat. 1986, 14, 590–606.
  33. Huang, J. Efficient Estimation of the Partly Linear Additive Cox Model. Ann. Stat. 1999, 27, 1536–1563.
  34. Xue, L.; Yang, L. Additive Coefficient Modeling via Polynomial Spline. Stat. Sin. 2006, 16, 1423–1446.
  35. Lai, P.; Tian, Y.; Lian, H. Estimation and Variable Selection for Generalised Partially Linear Single-Index Models. J. Nonparametr. Stat. 2014, 26, 171–185.
  36. Van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes: With Applications to Statistics; Springer: New York, NY, USA, 1996.
Figure 1. Responses of the testing sample versus predicted responses (step 1).
Figure 2. On the left plot: 500 realizations of the functional random variable Z; on the right plot: the predicted response (x-axis) compared to the true response (y-axis).
Figure 3. On the left plot: single-index versus predicted single-index, Gaussian case. On the right plot: single-index versus predicted single-index, logistic case.
Figure 4. On the left plot: the non-parametric function η ( . ) versus its estimator η ^ ( . ) , Gaussian case. On the right plot: the non-parametric function η ( . ) versus its estimator η ^ ( . ) , logistic case.
Figure 5. A sample of 100 absorbance curves Z .
Figure 6. On the left plot: estimated non-parametric function η ^ ( . ) , Gaussian case. On the right plot: estimated non-parametric function η ^ ( . ) , logistic case.
Figure 7. On the left plot: the fat content and its estimation, Gaussian case. On the right plot: the fat content and its estimation, logistic case.
Table 1. Bias, SD and MSE according to the parameter τ for GNFPLSIM with the identity link function and n = 500.

        τ₁             τ₂
Bias    0.0004         −0.0005
SD      0.0006         0.0013
MSE     5.20 × 10⁻⁷    1.94 × 10⁻⁶
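Since the reported MSE is the sum of the squared bias and the squared standard deviation, the Table 1 entries can be reproduced directly (a quick consistency check, not part of the original simulation code):

```python
# Consistency check of Table 1 (identity link, n = 500): MSE = Bias^2 + SD^2.
bias = [0.0004, -0.0005]   # tau_1, tau_2
sd = [0.0006, 0.0013]
mse = [b**2 + s**2 for b, s in zip(bias, sd)]
print(mse)   # ≈ [5.2e-07, 1.94e-06], matching the reported values
```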
Table 2. Bias, SD and MSE evolutions with respect to the parameter γ variation for GNFPLSIM with the identity link function and n = 500.

        γ₁              γ₂              γ₃              γ₄              γ₅
Bias    −0.0054         0.0258          −0.0387         0.0289          −0.0093
SD      0.0123          0.0165          0.0214          0.0152          0.0064
MSE     1.8045 × 10⁻⁴   9.3789 × 10⁻⁴   1.9556 × 10⁻³   1.0662 × 10⁻³   1.2745 × 10⁻⁴
Table 3. Bias, SD and MSE evolutions with respect to the parameter γ variation for GNFPLSIM with the identity link function and n = 500.

        γ₆             γ₇             γ₈             γ₉             γ₁₀
Bias    0.0072         0.0006         −0.0043        0.0006         −0.0056
SD      0.0062         0.0046         0.0024         0.0036         0.0042
MSE     9.028 × 10⁻⁵   2.152 × 10⁻⁵   2.425 × 10⁻⁵   1.332 × 10⁻⁵   4.900 × 10⁻⁵
Table 4. Bias, SD and MSE evolutions with respect to the parameter γ variation for GNFPLSIM with the identity link function and n = 500.

        γ₁₁            γ₁₂            γ₁₃            γ₁₄            γ₁₅
Bias    0.0009         0.0027         −0.0057        0.0031         0.0092
SD      0.0056         0.0034         0.0028         0.0057         0.0051
MSE     3.217 × 10⁻⁵   1.885 × 10⁻⁵   4.033 × 10⁻⁵   4.210 × 10⁻⁵   1.1065 × 10⁻⁴
Table 5. Bias, SD and MSE evolutions with respect to the parameter γ variation for GNFPLSIM with the identity link function and n = 500.

        γ₁₆            γ₁₇            γ₁₈             γ₁₉             γ₂₀             γ₂₁
Bias    0.0009         0.0027         −0.0057         0.0031          0.0092          0.0051
SD      0.0046         0.0084         0.0142          0.0154          0.0232          0.0131
MSE     2.792 × 10⁻⁵   8.017 × 10⁻⁵   2.2373 × 10⁻⁴   4.1140 × 10⁻⁴   7.1780 × 10⁻⁴   6.3386 × 10⁻⁴
Table 6. Bias, SD and MSE according to the parameter τ for GNFPLSIM with the logistic link function and n = 500.

        τ₁              τ₂
Bias    −0.0084         0.0047
SD      0.0103          0.0108
MSE     1.7665 × 10⁻⁴   1.3873 × 10⁻⁴
Table 7. Bias, SD and MSE evolutions with respect to the parameter γ variation for GNFPLSIM with the logistic link function and n = 500.

        γ₁              γ₂              γ₃              γ₄              γ₅
Bias    −0.0043         0.0352          −0.0389         0.0383          −0.0065
SD      0.0107          0.0234          0.0223          0.0136          0.0058
MSE     1.3298 × 10⁻³   1.7866 × 10⁻³   2.0105 × 10⁻³   1.6518 × 10⁻³   7.589 × 10⁻⁴
Table 8. Bias, SD and MSE evolutions with respect to the parameter γ variation for GNFPLSIM with the logistic link function and n = 500.

        γ₆              γ₇             γ₈              γ₉              γ₁₀
Bias    0.0087          0.0006         0.0467          0.0003          −0.0054
SD      0.0070          0.0061         0.0051          0.0047          0.0026
MSE     1.2469 × 10⁻⁴   3.757 × 10⁻⁵   2.2069 × 10⁻³   2.2180 × 10⁻⁵   3.5920 × 10⁻⁵
Table 9. Bias, SD and MSE evolutions with respect to the parameter γ variation for GNFPLSIM with the logistic link function and n = 500.

        γ₁₁             γ₁₂             γ₁₃             γ₁₄             γ₁₅
Bias    0.0006          0.0053          −0.0083         0.0036          −0.0072
SD      0.0041          0.0027          0.0072          0.0064          0.0052
MSE     1.7170 × 10⁻⁵   3.5380 × 10⁻⁵   1.2073 × 10⁻⁴   5.3920 × 10⁻⁵   7.888 × 10⁻⁵
Table 10. Bias, SD and MSE evolutions with respect to the parameter γ variation for GNFPLSIM with the logistic link function and n = 500.

        γ₁₆            γ₁₇            γ₁₈             γ₁₉             γ₂₀             γ₂₁
Bias    0.0027         0.0035         −0.0048         0.0215          −0.0187         −0.0214
SD      0.0063         0.0078         0.0127          0.0215          0.0254          0.0213
MSE     4.698 × 10⁻⁵   7.309 × 10⁻⁵   1.8433 × 10⁻⁴   9.2450 × 10⁻⁴   9.9485 × 10⁻⁴   9.1165 × 10⁻⁴
Table 11. The RASE criterion for the non-parametric function η(·) for both cases n = 500 and n = 1000.

Gaussian case    Mean     Median   Variance
n = 500          0.028    0.024    0.004
n = 1000         0.027    0.022    0.002
Table 12. The RASE criterion for the non-parametric function η(·) for both cases n = 500 and n = 1000.

Logistic case    Mean     Median   Variance
n = 500          0.038    0.043    0.027
n = 1000         0.029    0.039    0.016
Table 13. The MSEPs for different models: Gaussian case.

Functional models                                                    MSEP
Model 1 (GNPFPLSIM): g(μ(Xᵢ, Zᵢ)) = η(α₁X₁,ᵢ + α₂X₂,ᵢ) + r(Zᵢ)       0.019
Model 2 (GNPFPLM):   g(μ(Xᵢ, Zᵢ)) = α₁X₁,ᵢ + α₂X₂,ᵢ + r(Zᵢ)          0.059
Model 3 (SIM):       Yᵢ = η(α₁X₁,ᵢ + α₂X₂,ᵢ) + εᵢ                    1.102
Model 4 (FM):        Yᵢ = r(Zᵢ) + εᵢ                                 1.831
Table 14. The MSEPs for different models: logistic case.

Functional models                                                    MSEP
Model 1 (GNPFPLSIM): g(μ(Xᵢ, Zᵢ)) = η(α₁X₁,ᵢ + α₂X₂,ᵢ) + r(Zᵢ)       0.039
Model 2 (GNPFPLM):   g(μ(Xᵢ, Zᵢ)) = α₁X₁,ᵢ + α₂X₂,ᵢ + r(Zᵢ)          0.093
Model 3 (SIM):       Yᵢ = η(α₁X₁,ᵢ + α₂X₂,ᵢ) + εᵢ                    1.102
Model 4 (FM):        Yᵢ = r(Zᵢ) + εᵢ                                 1.831
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Alahiane, M.; Ouassou, I.; Rachdi, M.; Vieu, P. High-Dimensional Statistics: Non-Parametric Generalized Functional Partially Linear Single-Index Model. Mathematics 2022, 10, 2704. https://doi.org/10.3390/math10152704
