Article

Universal Local Linear Kernel Estimators in Nonparametric Regression

1 Sobolev Institute of Mathematics, 630090 Novosibirsk, Russia
2 Department of Probability Theory, Lomonosov Moscow State University, 119234 Moscow, Russia
3 Department of Epidemiology of Noncommunicable Diseases, National Medical Research Center for Therapy and Preventive Medicine, 101990 Moscow, Russia
* Authors to whom correspondence should be addressed.
Mathematics 2022, 10(15), 2693; https://doi.org/10.3390/math10152693
Submission received: 29 June 2022 / Revised: 24 July 2022 / Accepted: 25 July 2022 / Published: 29 July 2022
(This article belongs to the Special Issue New Advances and Applications of Extreme Value Theory)

Abstract: New local linear estimators are proposed for a wide class of nonparametric regression models. The estimators are uniformly consistent regardless of whether the traditional dependence conditions on the design elements are satisfied. The estimators are the solutions of a specially weighted least-squares method. The design can be fixed or random and need not meet classical regularity or independence conditions. As an application, several estimators are constructed for the mean of dense functional data. The theoretical results of the study are illustrated by simulations. An example of processing real medical data from the epidemiological cross-sectional study ESSE-RF is included. We compare the new estimators with the estimators best known for such studies.

1. Introduction

In this paper, we consider a nonparametric regression model where the bivariate observations $\{(X_1,z_1),\ldots,(X_n,z_n)\}$ satisfy the following equations:
$$X_i = f(z_i) + \varepsilon_i, \quad i=1,\ldots,n, \qquad (1)$$
where $\{f(t),\ t\in[0,1]\}$ is an unknown random function (process) which is almost surely continuous, the design $\{z_i;\ i=1,\ldots,n\}$ consists of a set of observable random variables with possibly unknown distributions lying in $[0,1]$, and the design points are not necessarily independent or identically distributed. We consider the design as a triangular array, i.e., the random variables $\{z_i;\ i=1,\ldots,n\}$ may depend on $n$. In particular, this scheme includes regression models with fixed design. The random regression function $f(t)$ is not supposed to be design-independent. Below we state some fairly standard regression-analysis conditions on the random errors $\{\varepsilon_i;\ i=1,\ldots,n\}$; in particular, they are supposed to be centered but not necessarily independent or identically distributed.
The paper is devoted to constructing uniformly consistent estimators for the regression function $f(t)$ under minimal assumptions on the correlation of the design points.
The most popular kernel estimation procedures in the classical case of a nonrandom regression function are apparently those of Nadaraya–Watson, Priestley–Chao, and Gasser–Müller, local polynomial estimators, and their modifications (e.g., see [1,2,3,4,5]). We are primarily interested in the dependence conditions on the design elements $\{z_i\}$. In this regard, the huge number of publications in the field of nonparametric regression can be conditionally divided into two groups: the first comprises papers dealing with a random design, and the second those dealing with a fixed design.
In the papers dealing with random design, either independent and identically distributed observations are considered or, as a rule, stationary sequences of observations satisfying one or another known form of dependence. In particular, various types of mixing conditions, schemes of moving averages, associated random variables, Markov or martingale properties, and so on have been used. In this regard, we note, for example, the papers [3,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22]. In the recent papers [23,24,25,26], nonstationary sequences of design elements with one or another special type of dependence are considered (Markov chains, autoregression, partial sums of moving averages, etc.). In the case of fixed design, in the overwhelming majority of works, certain regularity conditions on the design are assumed (e.g., see [9,10,27,28,29,30,31,32,33]). Thus, the nonrandom design points $z_i$ are most often given by the formula $z_i = g(i/n) + o(1/n)$ with some function $g$ of bounded variation, where the error $o(1/n)$ is uniform in all $i=1,\ldots,n$. If $g$ is linear, then we obtain the so-called equidistant design. Another version of the regularity condition is the relation $\max_{i\le n}(z_i - z_{i-1}) = O(1/n)$ (here it is assumed that the design elements are arranged in increasing order).
The problem of uniform approximation of a regression function has been studied by many authors (e.g., see [7,9,10,14,15,17,20,22,26,30,34,35,36], and the references there).
In connection with studying the random regression function $f(t)$, we note, for example, the papers [37,38,39,40,41,42,43,44,45,46], where the mean and covariance functions of the random regression function $f$ are estimated in the case when, for $N$ independent copies $f_1,\ldots,f_N$ of the function $f$, noisy values of each of these trajectories are observed at some collection of design elements (the design can be either common to all trajectories or different from series to series). Estimation of the mean and covariance functions is an actively developing area of nonparametric estimation, especially in the last couple of decades, which is both of independent interest and plays an important role in subsequent analysis of the random process $f$ (e.g., see [39,40,45,47,48,49]). We consider one variant of this problem as an application of the main result.
The purpose of this article is to construct estimators that are uniformly consistent (in the sense of convergence in probability) not only under the dependence conditions mentioned in the review above, but also for significantly different correlation structures of observations, when the conditions of ergodicity or stationarity are not satisfied, nor are the classical mixing conditions and other well-known dependence restrictions. Note that the proposed estimators belong to the class of local linear kernel estimators, but with weights differing from those in the classical version. Instead of the original observations, we consider their concomitants associated with the variational series based on the design observations, and the spacings of this series are taken as additional weights in the corresponding weighted least-squares method generating the above-mentioned new estimators. It is important to emphasize that these estimators are universal with respect to the nature of dependence of the observations: the design can be either fixed and not necessarily regular, or random, while not necessarily satisfying the traditional correlation conditions. In particular, the only condition on the design points that guarantees the uniform consistency of the new estimators is that they densely fill the domain of definition of the regression function. In our opinion, this condition is very natural and, in fact, it is necessary for reconstructing the function on the domain of the design elements. Previously, similar ideas were implemented in [50] for slightly different estimators (see Section 4 for details). Similar conditions on the design elements were also used in [51,52] in nonparametric regression, and in [53,54,55] in nonlinear regression.
The paper has the following structure. Section 2 contains the main results. Section 3 discusses the problem of estimating the mean function of a stochastic process. A comparison of the universal local linear estimators with some known ones is given in Section 4. Section 5 contains some results of computer simulation. In Section 6, we compare the results of using the new universal local linear estimators with the most common approaches of data analysis based on the epidemiological research ESSE-RF. In Section 7, we briefly summarize the results of the study. The proofs of the results from Section 2, Section 3, and Section 4 are deferred to Section 8.

2. Main Results

We need a number of assumptions.
$(\mathbf D)$ The observations $X_1,\ldots,X_n$ are represented in the form (1), where the unknown random regression function $\{f(t),\ t\in[0,1]\}$ is almost surely continuous. The design points $\{z_i;\ i=1,\ldots,n\}$ form a set of observable random variables with values in $[0,1]$ having, generally speaking, unknown distributions, and they are not necessarily independent or identically distributed. Moreover, the random variables $\{z_i;\ i=1,\ldots,n\}$ may depend on $n$, i.e., they can be considered as an array of design observations. The random function $f(t)$ may be design-dependent.
$(\mathbf E)$ For all $n\ge 1$, the unobservable random errors $\{\varepsilon_i;\ i=1,\ldots,n\}$ satisfy, with probability 1, the following conditions for all $i,j\le n$ and $i\ne j$:
$$\mathsf E_{\mathcal F_n}\varepsilon_i = 0, \qquad \sup_{i\le n}\mathsf E_{\mathcal F_n}\varepsilon_i^2 \le \sigma^2, \qquad \mathsf E_{\mathcal F_n}\varepsilon_i\varepsilon_j = 0, \qquad (2)$$
where the constant $\sigma^2>0$ may be unknown and does not depend on $n$, and the symbol $\mathsf E_{\mathcal F_n}$ stands for the conditional expectation given the σ-field $\mathcal F_n$ generated both by the paths of the random process $f(\cdot)$ and by the random variables $\{z_i;\ i=1,\ldots,n\}$.
$(\mathbf K)$ A kernel $K(t)$, $t\in\mathbb R$, is equal to zero outside the interval $[-1,1]$ and is the density of a symmetric distribution with support in $[-1,1]$, i.e., $K(t)\ge 0$, $K(-t) = K(t)$ for all $t\in[-1,1]$, and $\int_{-1}^{1}K(t)\,dt = 1$. We assume that the function $K(t)$ satisfies the Lipschitz condition with a constant $L\ge 1$ and that $K(\pm 1) = 0$.
In what follows, we denote by $\kappa_j$, $j=0,1,2,3$, the absolute $j$th moment of the distribution with density $K(t)$, i.e., $\kappa_j = \int_{-1}^{1}|u|^j K(u)\,du$. Put $K_h(t) = h^{-1}K(h^{-1}t)$. It is clear that $K_h(s)$ is a probability density with support lying in $[-h,h]$. We also need the notation
$$\|K\|_2^2 = \int_{-1}^{1}K^2(u)\,du, \qquad \kappa_j(\alpha) = \int_{-1}^{\alpha}t^j K(t)\,dt, \quad \alpha\in[0,1],\ j=0,1,2,3.$$
Remark 1.
We emphasize that assumption $(\mathbf D)$ includes the fixed-design situation. We consider the segment $[0,1]$ as the area of design variation solely for simplicity of exposition of the approach. In the general case, instead of the segment $[0,1]$, one can consider an arbitrary Jordan measurable subset of $\mathbb R$.
Further, we denote by $z_{n:1}\le\cdots\le z_{n:n}$ the order statistics constructed from the sample $\{z_i;\ i=1,\ldots,n\}$. Put
$$z_{n:0} := 0, \qquad z_{n:n+1} := 1, \qquad \Delta z_{ni} := z_{n:i} - z_{n:i-1}, \quad i=1,\ldots,n+1.$$
For every $i$, the response variable and the random error from (1) associated with the order statistic $z_{n:i}$ will be denoted by $X_{ni}$ and $\varepsilon_{ni}$, respectively. It is easy to see that the new errors $\{\varepsilon_{ni};\ i=1,\ldots,n\}$ satisfy condition $(\mathbf E)$ as well. Next, by $O_p(\eta_n)$ we denote a random variable $\zeta_n$ such that, for all $M>0$, one has
$$\limsup_{n\to\infty}\mathsf P\big(|\zeta_n|/\eta_n > M\big) \le \beta(M),$$
where $\lim_{M\to\infty}\beta(M) = 0$, the $\{\eta_n\}$ are positive (possibly random) variables, and the function $\beta(M)$ may depend on the kernel $K$ and on $\sigma^2$. We agree that, throughout what follows, all limits, unless otherwise stated, are taken as $n\to\infty$.
Let us introduce one more constraint, which is the crucial condition of the paper (in particular, the only condition on design points that guarantees the existence of a uniformly consistent estimator; see also the comments at the end of the section).
$(\mathbf{D_0})$ The following limit relation holds: $\delta_n := \max_{1\le i\le n+1}\Delta z_{ni} \overset{p}{\to} 0$.
Finally, for any $h\in(0,1)$, we introduce into consideration the following class of estimators for the regression function $f$:
$$\hat f_{n,h}(t) := I(\delta_n \le c^* h)\sum_{i=1}^{n}\frac{w_{n2}(t) - (t - z_{n:i})\,w_{n1}(t)}{w_{n0}(t)\,w_{n2}(t) - w_{n1}^2(t)}\,X_{ni}\,K_h(t - z_{n:i})\,\Delta z_{ni}, \qquad (3)$$
where $I(\cdot)$ is the indicator function,
$$c^* \equiv c^*(K) := \frac{\kappa_2 - \kappa_1^2}{96L\,(6L + \kappa_2 + \kappa_1/2)} < \frac{1}{864L}; \qquad (4)$$
hereinafter, we use the notation
$$w_{nj}(t) := \sum_{i=1}^{n}(t - z_{n:i})^j\,K_h(t - z_{n:i})\,\Delta z_{ni}, \quad j=0,1,2,3.$$
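To make the construction concrete, the following R sketch computes the estimator (3) at a single point $t$ directly from the formulas above. It is not the authors' implementation: the names tricube and ull_estimate are ours, and the tricubic kernel anticipates the choice made in Section 5.

```r
# Minimal sketch of the estimator (3); a direct transcription of the formulas.
tricube <- function(t) (70/81) * pmax(0, 1 - abs(t)^3)^3   # kernel K of Section 5

ull_estimate <- function(t, z, X, h, K = tricube, c_star = NULL) {
  ord <- order(z); zs <- z[ord]; Xs <- X[ord]  # order statistics and concomitants
  dz  <- diff(c(0, zs))                        # spacings Delta z_{ni}
  if (!is.null(c_star)) {                      # indicator I(delta_n <= c*h) in (3);
    delta_n <- max(diff(c(0, zs, 1)))          # Remark 13 notes it may be dropped
    if (delta_n > c_star * h) return(0)
  }
  u  <- t - zs
  Kh <- K(u / h) / h                           # K_h(t - z_{n:i})
  w0 <- sum(Kh * dz)                           # w_{n0}(t)
  w1 <- sum(u * Kh * dz)                       # w_{n1}(t)
  w2 <- sum(u^2 * Kh * dz)                     # w_{n2}(t)
  sum((w2 - u * w1) / (w0 * w2 - w1^2) * Xs * Kh * dz)
}
```

The argument c_star corresponds to the kernel-dependent constant $c^*(K)$ from (4); we leave it optional because, as noted in Remark 13 below, the indicator factor does not affect the asymptotic properties of the estimator.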
Remark 2.
It is easy to see that the difference $\kappa_2 - \kappa_1^2$ is the variance of a non-degenerate distribution; thus, it is strictly positive.
Remark 3.
It is easy to verify that the kernel estimator (3), without the indicator factor, is the first coordinate of the two-dimensional weighted least-squares estimate, i.e., of the two-dimensional point $(a^*, b^*)$ at which the following minimum is attained:
$$\min_{a,b}\sum_{i=1}^{n}\big(X_{ni} - a - b(t - z_{n:i})\big)^2\,K_h(t - z_{n:i})\,\Delta z_{ni}. \qquad (5)$$
Thus, the proposed class of estimators in a certain sense (in fact, by construction) is close to the classical local linear kernel estimators, but in the weighted least squares method (5) we use slightly different weights.
Remark 4.
In the case when there are multiple (tied) design points, some spacings $\Delta z_{ni}$ vanish, and we lose part of the sample information in the estimator (3). In this case, before using the estimator (3), it is proposed to reduce the sample slightly: replace the observations $X_i$ sharing the same design point $z_i$ with their sample mean and keep only one design point out of each group of ties. The averaged observations have less noise, so, despite the smaller size of the new sample, we do not lose the information contained in the original sample.
Let us further agree to denote by $C_j$, $j\ge 1$, absolute positive constants, and by $C_j^*$ positive constants depending only on the kernel $K$. The main result of this section is as follows.
Theorem 1.
Let conditions $(\mathbf D)$, $(\mathbf E)$, and $(\mathbf K)$ be satisfied. Then, for any fixed $h\in(0,1/2)$, with probability 1,
$$\sup_{t\in[0,1]}|\hat f_{n,h}(t) - f(t)| \le C_1^*\,\omega_f(h) + \zeta_n(h), \qquad (6)$$
where $\omega_f(h) := \sup_{u,v\in[0,1]:\,|u-v|\le h}|f(u) - f(v)|$ and the random variable $\zeta_n(h)$ meets the relation
$$\mathsf P\big(\zeta_n(h) > y,\ \delta_n \le c^* h\big) \le C_2^*\,\frac{\sigma^2\,\mathsf E\delta_n}{h^2 y^2}, \qquad (7)$$
with the constant $c^*$ from (4).
Remark 5.
As follows from the proof of Theorem 1, the constants $C_1^*$ and $C_2^*$ have the following structure:
$$C_1^* = \frac{C_1 L^2}{\kappa_2 - \kappa_1^2}, \qquad C_2^* = \frac{C_2 L^4}{(\kappa_2 - \kappa_1^2)^2}.$$
Remark 6.
Since $\delta_n \le 1$, under condition $(\mathbf{D_0})$ the limit relation $\mathsf E\delta_n \to 0$ holds. Therefore, taking Theorem 1 into account, we can assert that $\zeta_n(h) = O_p\big(h^{-1}(\mathsf E\delta_n)^{1/2}\big)$. Thus, the bandwidth $h$ can be determined, for example, by the relation
$$h_n = \sup\Big\{h>0:\ \mathsf P\big(\omega_f(h) \ge h^{-1}(\mathsf E\delta_n)^{1/2}\big) \le h^{-1}(\mathsf E\delta_n)^{1/2}\Big\}. \qquad (8)$$
It is easy to see that, when $(\mathbf{D_0})$ is satisfied, the limit relations $h_n\to 0$ and $h_n^{-1}(\mathsf E\delta_n)^{1/2}\to 0$ hold. In fact, the value $h_n$ equalizes in $h$ the orders of smallness in probability of both terms on the right-hand side of relation (6). Note also that, for nonrandom $f$, one can choose $h\equiv h_n$ as a solution to the equation
$$h^{-1}(\mathsf E\delta_n)^{1/2} = \omega_f(h). \qquad (9)$$
It is clear that this solution tends to zero as $n$ grows.
The relations (8) and (9) allow us to obtain the order of smallness of the optimal bandwidth $h$, but not its optimal value. In practice, $h$ can be chosen, for example, by cross-validation.
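As an illustration of the last remark, here is a sketch of a 10-fold cross-validation choice of $h$ over a logarithmic grid, in the spirit of the procedure later used in Section 5; ull_estimate is the hypothetical helper from the sketch after (3).

```r
# Sketch: choose h by 10-fold cross-validation, minimizing the validation MSE.
cv_bandwidth <- function(z, X, h_grid, folds = 10) {
  fold_id <- sample(rep_len(seq_len(folds), length(z)))
  mse <- sapply(h_grid, function(h) {
    err <- unlist(lapply(seq_len(folds), function(k) {
      tr <- fold_id != k
      pred <- sapply(z[!tr], function(t) ull_estimate(t, z[tr], X[tr], h))
      (pred - X[!tr])^2
    }))
    mean(err, na.rm = TRUE)  # NaN can occur when no design points fall within h
  })
  h_grid[which.min(mse)]
}
# e.g., a logarithmic grid of 20 values as in Section 5:
# h_opt <- cv_bandwidth(z, X, exp(seq(log(0.01), log(0.9), length.out = 20)))
```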
From Theorem 1 and Remark 6 it is easy to obtain the following corollary.
Corollary 1.
Let conditions $(\mathbf D)$, $(\mathbf{D_0})$, $(\mathbf K)$, and $(\mathbf E)$ be satisfied, let the regression function $f(t)$ be nonrandom, and let $\mathcal C$ be an arbitrary subset of equicontinuous functions in $C[0,1]$ (for example, a precompact set). Then
$$\gamma_n(\mathcal C) = \sup_{f\in\mathcal C}\,\sup_{t\in[0,1]}|\hat f_{n,\tilde h_n}(t) - f(t)| \overset{p}{\to} 0, \qquad (10)$$
where $\tilde h_n$ is defined by Equation (9) in which the modulus of continuity $\omega_f(h)$ is replaced with the universal modulus $\omega_{\mathcal C}(h) = \sup_{f\in\mathcal C}\omega_f(h)$. Moreover, the asymptotic relation $\gamma_n(\mathcal C) = O_p(\omega_{\mathcal C}(\tilde h_n))$ holds.
Remark 7.
It is easy to see that, for a nonrandom $f(t)$, the modulus of continuity in (9) can be replaced by one or another upper bound for $\omega_{\mathcal C}(h)$, which yields the corresponding upper bound for $\gamma_n(\mathcal C)$. Consider the case $\mathsf E\delta_n = O(1/n)$. If $\mathcal C$ consists of functions $f(t)$ satisfying the Hölder condition with exponent $\alpha\in(0,1]$ and a universal constant, then $\tilde h_n = O\big(n^{-1/(2(1+\alpha))}\big)$ and $\omega_{\mathcal C}(\tilde h_n) = O\big(n^{-\alpha/(2(1+\alpha))}\big)$. In particular, if the functions from $\mathcal C$ satisfy the Lipschitz condition ($\alpha=1$) with a universal constant, then $\gamma_n(\mathcal C) = O_p(n^{-1/4})$.
From Theorem 1 and Remark 6 we obtain the following corollary.
Corollary 2.
Let conditions $(\mathbf D)$, $(\mathbf{D_0})$, $(\mathbf K)$, and $(\mathbf E)$ be satisfied, and let the modulus of continuity $\omega_f(h)$ of the random regression function $f(t)$ admit, with probability 1, the upper bound $\omega_f(h) \le \zeta\,d(h)$, where $\zeta>0$ is a random variable and $d(h)$ is a positive continuous nonrandom function such that $d(h)\to 0$ as $h\to 0$. Then
$$\sup_{t\in[0,1]}|\hat f_{n,\hat h_n}(t) - f(t)| \overset{p}{\to} 0,$$
where the value $\hat h_n$ is defined in (9) with $\omega_f(h)$ replaced by $d(h)$.
Let us discuss condition $(\mathbf{D_0})$ in more detail. Obviously, condition $(\mathbf{D_0})$ is satisfied for any nonrandom regular design (this is a case of nonidentically distributed $\{z_i\}$ depending on $n$). If the $\{z_i\}$ are independent and identically distributed and the interval $[0,1]$ is the support of the distribution of $z_1$, then condition $(\mathbf{D_0})$ is also satisfied. In particular, if the distribution density of $z_1$ is separated from zero on $[0,1]$, then $\delta_n = O(\log n/n)$ holds (see details in [50]). If $\{z_i;\ i\ge 1\}$ is a stationary sequence with a marginal distribution supported on $[0,1]$ and satisfying an α-mixing condition, then condition $(\mathbf{D_0})$ is also satisfied (see Remark 8 below). Note that the dependence of the random variables $\{z_i\}$ satisfying condition $(\mathbf{D_0})$ can be much stronger, which is illustrated by the following example.
Example 1.
Let the sequence of random variables $\{z_i;\ i\ge 1\}$ be defined by the relation
$$z_i = \nu_i u_i^l + (1 - \nu_i)u_i^r, \qquad (11)$$
where $\{u_i^l\}$ and $\{u_i^r\}$ are independent and uniformly distributed on $[0,1/2]$ and $[1/2,1]$, respectively, and the sequence $\{\nu_i\}$ does not depend on $\{u_i^l\}$, $\{u_i^r\}$ and consists of Bernoulli random variables with success probability $1/2$; i.e., the distribution of each $z_i$ is an equal-weight mixture of the two uniform distributions on the corresponding intervals. The dependence between the random variables $\nu_i$ is defined, for every natural number $i$, by the equalities $\nu_{2i-1} = \nu_1$ and $\nu_{2i} = 1 - \nu_1$. In this case, the random variables $\{z_i;\ i\ge 1\}$ in (11) form a stationary sequence of random variables uniformly distributed on the segment $[0,1]$ and satisfying condition $(\mathbf{D_0})$. On the other hand, for all natural numbers $m$ and $n$,
$$\mathsf P\big(z_{2m} \le 1/2,\ z_{2n-1} \le 1/2\big) = 0.$$
Thus, none of the known conditions of weak dependence of random variables (in particular, the mixing conditions) are satisfied here.
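The construction of Example 1 is easy to simulate. The following R sketch (ours) draws the sequence (11) and shows numerically that the maximum spacing $\delta_n$ still tends to zero despite the strong dependence.

```r
# Sketch: simulate the design (11) of Example 1 and inspect delta_n.
simulate_example1 <- function(n) {
  nu1 <- rbinom(1, 1, 1/2)
  nu  <- ifelse(seq_len(n) %% 2 == 1, nu1, 1 - nu1)  # nu_{2i-1}=nu_1, nu_{2i}=1-nu_1
  nu * runif(n, 0, 1/2) + (1 - nu) * runif(n, 1/2, 1)
}
delta_n <- function(z) max(diff(c(0, sort(z), 1)))   # maximum spacing
sapply(c(100, 1000, 10000), function(n) delta_n(simulate_example1(n)))
```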
According to the scheme of this example, it is possible to construct various sequences of dependent random variables uniformly distributed on $[0,1]$ by choosing sequences of Bernoulli switches with the conditions $\nu_{j_k} = 1$ and $\nu_{l_k} = 0$ for infinite collections of indices $\{j_k\}$ and $\{l_k\}$. In this case, condition $(\mathbf{D_0})$ will also be satisfied, but the corresponding sequence $\{z_i\}$ (not necessarily stationary) may not even satisfy the strong law of large numbers. For example, this is the case when $\nu_j = 1 - \nu_1$ for $j = 2^{2k-1},\ldots,2^{2k}-1$ and $\nu_j = \nu_1$ for $j = 2^{2k},\ldots,2^{2k+1}-1$, where $k=1,2,\ldots$ (i.e., we randomly choose one of the two segments $[0,1/2]$ and $[1/2,1]$ into which we throw the first point, and then alternate the selection of one of the two segments over blocks of the following sizes: $1, 2, 2^2, 2^3$, etc.). Indeed, introduce the notation $n_k = 2^{2k}-1$, $\tilde n_k = 2^{2k+1}-1$, $S_m = \sum_{i=1}^{m}z_i$, and note that, for all elementary events from the event $\{\nu_1 = 1\}$, one has
$$\frac{S_{n_k}}{n_k} = \frac{1}{n_k}\sum_{i\in N_{1,k}}u_i^l + \frac{1}{n_k}\sum_{i\in N_{2,k}}u_i^r,$$
where $N_{1,k}$ and $N_{2,k}$ are the sets of indices for which the observations $\{z_i,\ i\le n_k\}$ lie in the intervals $[0,1/2]$ or $[1/2,1]$, respectively. It is easy to see that $\#(N_{1,k}) = n_k/3$ and $\#(N_{2,k}) = 2\#(N_{1,k})$. Hence, $S_{n_k}/n_k \to 7/12$ almost surely as $k\to\infty$ due to the strong law of large numbers for the sequences $\{u_i^l\}$ and $\{u_i^r\}$. On the other hand, as $k\to\infty$, for all elementary events from $\{\nu_1 = 1\}$, one has
$$\frac{S_{\tilde n_k}}{\tilde n_k} = \frac{1}{\tilde n_k}\sum_{i\in\tilde N_{1,k}}u_i^l + \frac{1}{\tilde n_k}\sum_{i\in\tilde N_{2,k}}u_i^r \to \frac{5}{12}, \qquad (12)$$
where $\tilde N_{1,k}$ and $\tilde N_{2,k}$ are the sets of indices for which the observations $\{z_i,\ i\le\tilde n_k\}$ lie in the intervals $[0,1/2]$ or $[1/2,1]$, respectively. Proving the convergence in (12), we took into account that $\#(\tilde N_{1,k}) = (2^{2k+2}-1)/3$ and $\#(\tilde N_{2,k}) = 2n_k/3$, i.e., $\#(\tilde N_{1,k}) = 2\#(\tilde N_{2,k})+1$.
Similar arguments are valid for all elementary events from $\{\nu_1 = 0\}$.
Remark 8.
In the case of not necessarily independent or identically distributed random variables $\{z_i\}$, condition $(\mathbf{D_0})$ will be fulfilled if, for all $\delta\in(0,1)$,
$$p_n(\delta) \equiv \sup_{|\Delta|=\delta}\mathsf P\Big(\bigcap_{i\le n}\{z_i\notin\Delta\}\Big) \to 0, \qquad (13)$$
where the supremum is taken over all intervals $\Delta\subset[0,1]$ of length $\delta$. Indeed, for any natural $N>1$, divide the interval $[0,1]$ into $N$ subintervals $\Delta_k$, $k=1,\ldots,N$, of length $1/N$. Then one has
$$\mathsf P\Big(\max_{1\le i\le n+1}\Delta z_{ni} > \frac{2}{N}\Big) \le \sum_{k=1}^{N}\mathsf P\Big(\bigcap_{i\le n}\{z_i\notin\Delta_k\}\Big) \le N\max_k\mathsf P\Big(\bigcap_{i\le n}\{z_i\notin\Delta_k\}\Big) \le N p_n(1/N),$$
since the event $\{\max_{1\le i\le n+1}\Delta z_{ni} > 2/N\}$ implies the existence of an interval $\Delta_k$ of length $1/N$ that does not contain any points of the collection $\{z_i\}$. Thereby, condition (13) implies the limit relation $\max_{i\le n+1}\Delta z_{ni}\overset{p}{\to}0$, which is equivalent to convergence with probability 1 due to the monotonicity of the sequence $\max_{i\le n+1}\Delta z_{ni}$. In particular, if the $\{z_i\}$ are independent and identically distributed, then $p_n(\delta) = e^{-c(\delta)n}$ with $c(\delta)>0$; i.e., as $n\to\infty$, the finite collection $\{z_i\}$ with probability 1 forms a refining partition of the segment $[0,1]$. It is easy to show that if $\{z_i;\ i\ge 1\}$ is a stationary sequence satisfying an α-mixing condition and having a marginal distribution with support $[0,1]$, then (13) is valid.

3. Estimating the Mean Function of a Stochastic Process

Consider the following statement of the problem of estimating the expectation of an almost surely continuous stochastic process $f(t)$. There are $N$ independent copies of the regression Equation (1):
$$X_{i,j} = f_j(z_{i,j}) + \varepsilon_{i,j}, \quad i=1,\ldots,n,\ j=1,\ldots,N, \qquad (14)$$
where $f(t), f_1(t),\ldots,f_N(t)$, $t\in[0,1]$, are independent identically distributed almost surely continuous unknown random processes, the set $\{\varepsilon_{i,j};\ i=1,\ldots,n\}$ satisfies condition $(\mathbf E)$ for every $j$, and the set $\{z_{i,j};\ i=1,\ldots,n\}$ meets conditions $(\mathbf D)$ and $(\mathbf{D_0})$ for every $j$ (here and below, the index $j$ labels the copy of Model (1) under consideration). In particular, under the assumption that condition $(\mathbf K)$ is valid, we denote by $\hat f_{n,h,j}(t)$, $j=1,\ldots,N$, the estimator given by relation (3) with the values from (1) replaced by the corresponding characteristics from (14). Finally, an estimator for the mean function is determined by the equality
$$\bar{\hat f}_{N,n,h}(t) = \frac{1}{N}\sum_{j=1}^{N}\hat f_{n,h,j}(t). \qquad (15)$$
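A minimal R sketch of the averaging in (15), assuming the hypothetical per-series estimator ull_estimate from the sketch in Section 2: each copy is smoothed separately on a common grid and the results are averaged.

```r
# Sketch of (15): estimate each trajectory on a grid, then average over copies.
# Z and X are n x N matrices; column j holds the design and observations of copy j.
mean_function_estimate <- function(grid, Z, X, h) {
  fits <- sapply(seq_len(ncol(Z)), function(j)
    sapply(grid, function(t) ull_estimate(t, Z[, j], X[, j], h)))
  rowMeans(fits)  # values of the mean-function estimator (15) on the grid
}
```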
As a consequence of Theorem 1, we obtain the following assertion.
Theorem 2.
Let Model (14) satisfy the above-mentioned conditions and, moreover,
$$\mathsf E\sup_{t\in[0,1]}|f(t)| < \infty, \qquad (16)$$
while the sequences $h\equiv h_n\to 0$ and $N\equiv N_n\to\infty$ meet the restrictions
$$h^{-2}\,\mathsf E\delta_n \to 0 \quad \text{and} \quad N\,\mathsf P(\delta_n > c^* h) \to 0. \qquad (17)$$
Then
$$\sup_{t\in[0,1]}\big|\bar{\hat f}_{N,n,h}(t) - \mathsf E f(t)\big| \overset{p}{\to} 0. \qquad (18)$$
Remark 9.
If condition (16) is replaced with the slightly stronger constraint
$$\mathsf E\sup_{t\in[0,1]}f^2(t) < \infty,$$
then, under conditions similar to (17), one can prove the uniform consistency of the estimator
$$\hat M_{N,n,h}(t_1,t_2) = \frac{1}{N}\sum_{j=1}^{N}\hat f_{n,h,j}(t_1)\,\hat f_{n,h,j}(t_2), \quad t_1,t_2\in[0,1],$$
for the unknown mixed second moment $\mathsf E f(t_1)f(t_2)$, where $h\equiv h_n$ and $N\equiv N_n$ satisfy (17). The arguments proving this fact are quite similar to those proving Theorem 2, and they are omitted. In other words, under the above-mentioned restrictions, the estimator
$$\widehat{\mathrm{Cov}}_{N,n,h}(t_1,t_2) = \hat M_{N,n,h}(t_1,t_2) - \bar{\hat f}_{N,n,h}(t_1)\,\bar{\hat f}_{N,n,h}(t_2)$$
is uniformly consistent for the covariance function of the random regression function $f(t)$.
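Given the matrix of per-series fits from the previous sketch, the estimators of Remark 9 reduce to elementary matrix operations; a short sketch:

```r
# Sketch of Remark 9: fits is the (grid points x N) matrix of per-series fits.
second_moment <- function(fits) tcrossprod(fits) / ncol(fits)  # M-hat(t1, t2)
cov_estimate  <- function(fits) {                              # Cov-hat(t1, t2)
  second_moment(fits) - tcrossprod(rowMeans(fits))
}
```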
Remark 10.
The problem of estimating the mean and covariance functions plays a fundamental role in so-called functional data analysis (see, for example, [39,40,47,48]). The property of uniform consistency of certain estimators of the mean function, which is important in the context of the problem under consideration, was considered, for example, in [37,40,43,45,47]. For a random design, as a rule, it is assumed that all its elements are independent identically distributed random variables (see, for example, [37,38,40,42,43,44,45,46,56,57]). In the case where the design is deterministic, certain regularity conditions discussed above in the Introduction are usually used. Moreover, in the problem of estimating the mean function, it is customary to subdivide designs into certain types depending on how densely the design points fill the domain of the regression function. The literature focuses on two types of data: either the design is in some sense “sparse” (for example, the number of design elements in each series is uniformly bounded [37,38,40,56,57]), or the design is somewhat “dense” (the number of elements in each series grows with the number of series [37,40,44,57,58]). Theorem 2 deals with the second of these types of design under condition $(\mathbf{D_0})$ in each of the independent series. Note that our formulation of the problem of estimating the mean function also includes the situation of a general deterministic design.
Note that the methodologies for estimating the mean function used for dense or sparse data are often different (see, for example, [48,49]). In the situation of a growing number of observations in each series, it is natural to first estimate the trajectories of the random regression function in each series and then average over all series (e.g., see [38,44,56]). This is exactly what we do in (15), following this conventional approach.

4. Comparison with Some Known Approaches

In [50], under the conditions of the present paper, the following estimators were studied:
$$f^*_{n,h}(t) = \frac{\sum_{i=1}^{n}X_{ni}K_h(t - z_{n:i})\,\Delta z_{ni}}{\sum_{i=1}^{n}K_h(t - z_{n:i})\,\Delta z_{ni}} \equiv \frac{\sum_{i=1}^{n}X_{ni}K_h(t - z_{n:i})\,\Delta z_{ni}}{w_{n0}(t)}. \qquad (19)$$
Notice that
$$f^*_{n,h}(t) \equiv \arg\min_a\sum_{i=1}^{n}(X_{ni} - a)^2 K_h(t - z_{n:i})\,\Delta z_{ni}. \qquad (20)$$
It is interesting to compare the new estimators $\hat f_{n,h}(t)$ with the estimators $f^*_{n,h}(t)$ from [50] as well as with other estimators (for example, the Nadaraya–Watson estimators $\hat f_{NW}(t)$ and the classical local linear estimators $\hat f_{LL}(t)$). Throughout this section, we assume that conditions $(\mathbf D)$, $(\mathbf K)$, and $(\mathbf E)$ are satisfied and the regression function $f(t)$ is nonrandom. Moreover, we need the following constraint.
$(\mathbf{IID})$ The regression function $f(t)$ in Model (1) is twice continuously differentiable, and the errors $\{\varepsilon_i\}$ are independent, identically distributed, centered, and independent of the design $\{z_i\}$, whose elements are independent and identically distributed. In addition, the distribution function of the random variable $z_1$ has a strictly positive density $p(t)$ continuously differentiable on $(0,1)$.
Such severe restrictions on the parameters of the regression model are explained both by difficulties in calculating the asymptotic representation for the variances of the estimators $\hat f_{n,h}(t)$ and $f^*_{n,h}(t)$ and by the properties of the Nadaraya–Watson estimators, which are very sensitive to the nature of the dependence of the design elements.
For any statistical estimator $\tilde f_n(t)$ of the regression function $f(t)$, we will use the notation $\mathrm{Bias}\,\tilde f_n(t)$ for its bias, i.e., $\mathrm{Bias}\,\tilde f_n(t) := \mathsf E\tilde f_n(t) - f(t)$. Put $\bar f = \sup_{t\in[0,1]}|f(t)|$ and, for $j=0,1,2,3$, introduce the notation
$$w_j(t) = \int_{0}^{1}(t-z)^j K_h(t-z)\,dz = \int_{z\in[0,1]:\,|t-z|\le h}(t-z)^j K_h(t-z)\,dz, \quad t\in[0,1]. \qquad (21)$$
The following asymptotic representation for the bias and variance of the estimator $f^*_{n,h}(t)$ was obtained in [50].
Proposition 1.
Let condition $(\mathbf{IID})$ be fulfilled and $\inf_{t\in[0,1]}p(t) > 0$. If $n\to\infty$ and $h\to 0$ so that $(\log n)^{-1}hn \to\infty$, $h^{-2}\mathsf E\delta_n \to 0$, and $h^{-3}\mathsf E\delta_n^2 \to 0$, then, for any $t\in(0,1)$, the following asymptotic relations are valid:
$$\mathrm{Bias}\,f^*_{n,h}(t) = \frac{h^2\kappa_2}{2}f''(t) + o(h^2), \qquad \mathrm{Var}\,f^*_{n,h}(t) \sim \frac{2\sigma^2}{hn\,p(t)}\,\|K\|_2^2.$$
Note that the first statement, concerning the asymptotic behavior of the bias in Proposition 1, was actually proved for arbitrarily dependent design elements provided condition $(\mathbf{D_0})$ is met. The following two propositions and corollaries are also obtained without any assumptions on the dependence of the design elements; only the conditional centering and conditional orthogonality of the errors from condition $(\mathbf E)$ are used.
Proposition 2.
Let $h<1/2$. Then, for any fixed $t\in[h,1-h]$,
$$\mathrm{Bias}\,\hat f_{n,h}(t) = \mathrm{Bias}\,f^*_{n,h}(t) + \gamma_{n,h}(t), \qquad \mathrm{Var}\,\hat f_{n,h}(t) = \mathrm{Var}\,f^*_{n,h}(t) + \rho_{n,h}(t),$$
where
$$|\gamma_{n,h}(t)| \le C_3^*\,\bar f\,h^{-1}\mathsf E\delta_n, \qquad |\rho_{n,h}(t)| \le C_4^*\big(\sigma^2 + \bar f^{\,2}\big)h^{-1}\mathsf E\delta_n.$$
Proposition 3.
Let the regression function $f(t)$ be twice continuously differentiable. Then, for any fixed $t\in(0,1)$,
$$\mathrm{Bias}\,\hat f_{n,h}(t) = \frac{f''(t)}{2}\,B_0(t) + O(\mathsf E\delta_n/h) + o(h^2), \qquad (22)$$
where
$$B_0(t) = \frac{w_2^2(t) - w_3(t)w_1(t)}{w_0(t)w_2(t) - w_1^2(t)}. \qquad (23)$$
Moreover,
$$\mathrm{Bias}\,f^*_{n,h}(t) = -f'(t)\,\frac{w_1(t)}{w_0(t)} + \frac{f''(t)}{2}\,\frac{w_2(t)}{w_0(t)} + O(\mathsf E\delta_n) + o(h^2), \qquad (24)$$
besides, the error terms $o(h^2)$ and $O(\cdot)$ in (22) and (24) are uniform in $t$.
Corollary 3.
Let the regression function $f(t)$ be twice continuously differentiable, $h\to 0$, and $h^{-3}\mathsf E\delta_n \to 0$. Then, for each fixed $t\in(0,1)$ such that $f''(t)\ne 0$, the following asymptotic relations are valid:
$$\mathrm{Bias}\,\hat f_{n,h}(t) \sim \mathrm{Bias}\,f^*_{n,h}(t) \sim \frac{f''(t)}{2}\,\kappa_2 h^2.$$
Corollary 4.
Suppose that, under the conditions of the previous corollary, $f$ has nonzero first and second derivatives in a neighborhood of zero. Then, for any fixed positive $\alpha<1$ such that $\kappa_1(\alpha)<0$, the following asymptotic relations hold:
$$\mathrm{Bias}\,\hat f_{n,h}(\alpha h) \sim \frac12\,h^2 D(\alpha)\,f''(0+), \qquad \mathrm{Bias}\,f^*_{n,h}(\alpha h) \sim -h\,\frac{\kappa_1(\alpha)}{\kappa_0(\alpha)}\,f'(0+),$$
where
$$D(\alpha) = \frac{\kappa_2^2(\alpha) - \kappa_3(\alpha)\kappa_1(\alpha)}{\kappa_0(\alpha)\kappa_2(\alpha) - \kappa_1^2(\alpha)}.$$
Note that, due to the Cauchy–Bunyakovsky inequality and the properties of the density $K(\cdot)$, the strict inequality $\kappa_0(\alpha)\kappa_2(\alpha) - \kappa_1^2(\alpha) > 0$ holds for any $\alpha\in[0,1]$.
Remark 11.
Similar relations take place in a neighborhood of the right boundary of the segment $[0,1]$, when $t = 1-\alpha h$ for any $\alpha\le 1$. In this case, in the above asymptotics, one simply needs to replace the right-hand derivatives at zero by the analogous (nonzero) left-hand derivatives at the point 1, and instead of the quantities $\kappa_j(\alpha)$ one must substitute $\tilde\kappa_j(\alpha) = \int_{-\alpha}^{1}v^j K(v)\,dv = (-1)^j\kappa_j(\alpha)$. In this case, the coefficient $D(\alpha)$ will not change, and the corresponding coefficient on the right-hand side of the second asymptotic relation will only change its sign.
Thus, the qualitative difference between the estimators $f^*_{n,h}(t)$ and $\hat f_{n,h}(t)$ is observed only in neighborhoods of the boundary points 0 and 1: for the estimator $f^*_{n,h}(t)$, in the $h$-neighborhoods of these points, the order of smallness of the bias is $h$, while for $\hat f_{n,h}(t)$ this order is $h^2$. Such a connection between the estimators (3) and (19) seems quite natural in view of relations (5) and (20) and the known relationship at the boundary points between the Nadaraya–Watson estimators $\hat f_{NW}(t)$ and the local linear estimators $\hat f_{LL}(t)$.
Remark 12.
If condition $(\mathbf{IID})$ is satisfied then, for the bias and variance of the estimators $\hat f_{NW}(t)$ and $\hat f_{LL}(t)$, the following asymptotic representations are well known (see, for example, [1]); they are valid for any $t\in(0,1)$ under broad conditions on the parameters of the model under consideration:
$$\mathrm{Bias}\,\hat f_{NW}(t) = \frac{h^2\kappa_2}{2p(t)}\big(f''(t)p(t) + 2f'(t)p'(t)\big) + o(h^2), \qquad \mathrm{Var}\,\hat f_{NW}(t) \sim \frac{\sigma^2}{hn\,p(t)}\,\|K\|_2^2,$$
$$\mathrm{Bias}\,\hat f_{LL}(t) = \frac{h^2\kappa_2}{2}\,f''(t) + o(h^2), \qquad \mathrm{Var}\,\hat f_{LL}(t) \sim \frac{\sigma^2}{hn\,p(t)}\,\|K\|_2^2.$$
The above asymptotic representations show that, if the assumptions $(\mathbf{IID})$ are valid, then the variances of the Nadaraya–Watson estimator $\hat f_{NW}(t)$ and of the local linear estimator $\hat f_{LL}(t)$ are, under broad conditions, asymptotically half the variances of the estimators $f^*_{n,h}(t)$ and $\hat f_{n,h}(t)$, respectively. However, the mean squared error of any estimator equals the sum of the variance and the squared bias, and the latter, for the compared estimators, is asymptotically determined by the quantities $|f''(t)p(t) + 2f'(t)p'(t)|$ and $|f''(t)p(t)|$, respectively. In other words, if the standard deviation $\sigma$ of the errors is not very large and
$$|f''(t)p(t) + 2f'(t)p'(t)| > |f''(t)p(t)|, \qquad (25)$$
then the estimator $f^*_{n,h}(t)$ or $\hat f_{n,h}(t)$ may be more accurate than $\hat f_{NW}(t)$. This effect for the estimator $f^*_{n,h}(t)$ is confirmed by the results of the computer simulations in [50].
Note also that, in order to choose the bandwidth $h$ that is optimal in a certain sense, the orders of smallness of the bias and of the standard deviation of the estimator are usually equated. In other words, if the assumptions $(\mathbf{IID})$ are fulfilled then, for all four types of estimators considered here, we need to solve the relation $h^2 \asymp (nh)^{-1/2}$. Thus, the optimal bandwidth has the standard order $h \asymp n^{-1/5}$.
Remark 13.
The estimators $\hat f_{n,h}(t)$ and $f^*_{n,h}(t)$ given in (3) and (19) can be defined a little differently, depending on the choice of one or another partition with the highlighted points $\{z_i;\ i=1,\ldots,n\}$ of the domain of the regression function underlying these estimators. For example, using the Voronoi partition of the segment $[0,1]$, an estimator of the form (19) can be given by the equality
$$\tilde f^*_{n,h}(t) = \frac{\sum_{i=1}^{n}X_{ni}K_h(t - z_{n:i})\,\tilde\Delta z_{ni}}{\sum_{i=1}^{n}K_h(t - z_{n:i})\,\tilde\Delta z_{ni}}, \qquad (26)$$
where $\tilde\Delta z_{n1} = \Delta z_{n1} + \Delta z_{n2}/2$, $\tilde\Delta z_{nn} = \Delta z_{nn}/2 + \Delta z_{n\,n+1}$, and $\tilde\Delta z_{ni} = (\Delta z_{ni} + \Delta z_{n\,i+1})/2$ for $i=2,\ldots,n-1$. Looking through the proofs in [50], it is easy to see that in this case all the properties of the estimator $\tilde f^*_{n,h}(t)$ are preserved, except for the asymptotic representation of the variance. Repeating (with obvious changes) the arguments proving Proposition 1 in [50], we have
$$\mathrm{Var}\,\tilde f^*_{n,h}(t) \sim \frac{1.5\,\sigma^2}{hn\,p(t)}\,\|K\|_2^2.$$
Thus, in the case of independent and identically distributed design points, the asymptotic variance of the estimator can be somewhat reduced by choosing one or another partition.
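The Voronoi weights of Remark 13 are straightforward to compute from the spacings; a short R sketch (names ours):

```r
# Sketch: the Voronoi weights of Remark 13, from the sorted design points.
voronoi_weights <- function(z_sorted) {
  dz <- diff(c(0, z_sorted, 1))           # Delta z_{n1}, ..., Delta z_{n,n+1}
  n  <- length(z_sorted)
  w  <- (dz[1:n] + dz[2:(n + 1)]) / 2     # interior: (Delta z_i + Delta z_{i+1})/2
  w[1] <- dz[1] + dz[2] / 2               # boundary corrections
  w[n] <- dz[n] / 2 + dz[n + 1]
  w
}
```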
Similarly, in the definition (3) of the estimators $\hat f_{n,h}(t)$, the quantities $\{\Delta z_{ni}\}$ can be replaced by the Voronoi weights $\{\tilde\Delta z_{ni}\}$. It is also worth noting that the indicator factor involved in the definition (3) of the estimator $\hat f_{n,h}(t)$ does not affect the asymptotic properties of the estimator given in Theorem 1; we only needed it to compute the exact asymptotic behavior of the bias of the estimator.

5. Simulations

In the following computer simulations, instead of estimator (3), we used the equivalent estimator $\hat f_{n,h}(t)$ of the weighted least-squares method defined by the relation
$$\big(\hat f_{n,h}(t),\ \hat b(t)\big) = \arg\min_{a,b}\sum_{i=1}^{n}\big(X_{ni} - a - b(t - z_{n:i})\big)^2 K_h(t - z_{n:i})\,\tilde\Delta z_{ni}, \qquad (27)$$
where the quantities $\tilde\Delta z_{ni}$ are defined in Remark 13 above. Estimator (27) differs from estimator (3) by excluding the indicator factor and replacing $\Delta z_{ni}$ with $\tilde\Delta z_{ni}$, which is not essential (see Remark 13). If several observations occurred at one design point, they were replaced by a single observation equal to their arithmetic mean (see Remark 4 above). Although the notation $\hat f_{n,h}(t)$ in (27) differs somewhat from the same notation in (3), we retain it, as this will not lead to ambiguity.
In the simulations below, we also consider the local constant estimator $\tilde f^*_{n,h}(t)$ from (26), which can be defined by the equality
$$\tilde f^*_{n,h}(t) \equiv \arg\min_a\sum_{i=1}^{n}(X_{ni} - a)^2 K_h(t - z_{n:i})\,\tilde\Delta z_{ni}. \qquad (28)$$
Here we also replace the observations corresponding to one design point by their arithmetic mean.
Recall that the Nadaraya–Watson estimator differs from (28) by the absence of the factors $\tilde\Delta z_{ni}$ in the weighting coefficients:
$$\hat f_{NW}(t) = \frac{\sum_{i=1}^{n}X_{ni}K_h(t - z_{n:i})}{\sum_{i=1}^{n}K_h(t - z_{n:i})}.$$
The Nadaraya–Watson estimators are also weighted least-squares estimators:
$$\hat f_{NW}(t) \equiv \arg\min_a\sum_{i=1}^{n}(X_{ni} - a)^2 K_h(t - z_{n:i}).$$
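Since (27), (28), and the Nadaraya–Watson estimator are all weighted least-squares fits differing only in their weights, they can be sketched uniformly with R's lm(). The helpers voronoi_weights and tricube are the hypothetical functions from our earlier sketches (tricube is the tricubic kernel defined below).

```r
# Sketch: (27), (28) and Nadaraya-Watson as weighted least squares.
# z is assumed sorted, with tied design points already averaged out (Remark 4).
fit_at <- function(t, z, X, h, K = tricube, type = c("ULL", "ULC", "NW")) {
  type <- match.arg(type)
  Kh <- K((t - z) / h) / h
  w  <- if (type == "NW") Kh else Kh * voronoi_weights(z)
  keep <- w > 0
  if (sum(keep) < 2) return(NA)
  if (type == "ULL") {
    d <- data.frame(X = X[keep], u = t - z[keep])
    coef(lm(X ~ u, data = d, weights = w[keep]))[[1]]  # intercept a in (27)
  } else {
    weighted.mean(X[keep], w[keep])                    # arg min over constants a
  }
}
```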
In the following examples, estimators (27) and (28), which will be called universal local linear (ULL) and universal local constant (ULC), respectively, will be compared with the linear regression (LR) estimator, the Nadaraya–Watson (NW) estimator, LOESS of order 1, as well as with estimators of generalized additive models (GAM) and of random forest (RF). For the LOESS estimators, the R loess() function was used. Calculating the ULL estimator with the custom script was on average 3.2 times slower than calculating the LOESS estimator with the R loess() function. This may be explained by the fact that the ULL estimator was implemented in the R language (in contrast to loess(), whose body is implemented in C and Fortran) and was not optimized for performance.
It is worth noting that, in the examples below, the best results were obtained by the new estimators ULL (27) and ULC (28), the LOESS estimator of order 1, and the Nadaraya–Watson estimator.
With regard to the simulation examples, the main difference between the ULL (27) and ULC (28) estimators and the Nadaraya–Watson and LOESS ones is that ULL (27) and ULC (28) are “more local”. This means that, if a function $f(z)$ is estimated on a design interval A with a “small” number of observations adjacent to a design interval B with a “large” number of observations, the Nadaraya–Watson and LOESS estimators will primarily seek to adjust to the “large” cluster of observations on the interval B. At the same time, ULL (27) and ULC (28) will treat observations on intervals of equal lengths equally, regardless of the distribution of design points over the intervals.
In the examples below, for all of the kernel estimators, that is, the Nadaraya–Watson ones, LOESS, ULL (27), and ULC (28), we used the tricubic kernel
$$K(t) = \frac{70}{81}\max\{0,\ (1 - |t|^3)^3\}.$$
We chose the tricubic kernel because it is employed in the R function loess(), which was used in the simulations.
The accuracy of the models was estimated with respect to the maximum error and the mean squared error. In all the examples below, except Example 3, the maximum error was estimated on the uniform grid of 1001 points on the segment $[0,10]$ by the formula
$$\max_{j=1,\ldots,1001}|\check f(t_j) - f(t_j)|,$$
where $t_j$ are the grid points of the segment $[0,10]$, $t_1 = 0$, $t_{1001} = 10$, $\check f(t_j)$ are the values of the constructed estimator at the grid points, and $f(t_j)$ are the true values of the estimated function. In Example 3, a grid of 1001 points was taken on the interval from the minimum to the maximum design point. That was done in order to avoid assessing the quality of extrapolation, since, in that example, the minimum design point could fall far from 0.
The mean squared error was calculated for one random splitting of the whole sample into training and validation samples in a proportion of 80% to 20%, according to the formula
$$\frac{1}{m}\sum_{j=1}^{m}\big(\check f(z_j) - X_j\big)^2,$$
where $m$ is the validation sample size, $z_j$ are the validation sample design points, $X_j$ are the noisy observations of the predicted function in the validation sample, and $\check f$ is the estimate computed from the training sample. The splittings into training and validation samples were identical for all models.
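Both accuracy criteria are elementary to compute; a sketch, under the assumption that f_hat is a vectorized estimate and f_true the target function:

```r
# Sketch: the two accuracy criteria of this section.
grid <- seq(0, 10, length.out = 1001)
max_error <- function(f_hat, f_true) max(abs(f_hat(grid) - f_true(grid)))
mse_valid <- function(f_hat, z_val, X_val) mean((f_hat(z_val) - X_val)^2)
```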
For each of the kernel estimators, the parameter $h$ of the kernel $K_h$ was determined by cross-validation, minimizing the mean squared error, with the set of observations randomly partitioned into 10 folds. The same partitions were used for all the kernel estimators.
When calculating the mean squared error, the cross-validation for choosing $h$ was carried out on the training set; to calculate the maximum error, the cross-validation was performed on the whole sample. For the Nadaraya–Watson models, as well as for ULL (27) and ULC (28), the parameter $h$ was selected from 20 values located on the logarithmic grid from $\max\{0.0001,\ 1.1\max_i\Delta z_{ni}\}$ to 0.9. For LOESS, the parameter span was chosen in the same way from 20 values located on the logarithmic grid from 0.0001 to 0.9.
The simulations also included testing basic statistical learning algorithms: linear regression without regularization, a generalized additive model, and a random forest [59]. The generalized additive model was trained using the R library mgcv.
Thin-plate splines were used, the optimal form of which was selected by generalized cross-validation. The random forest was trained using the R library randomForest. The number of trees was chosen to be 1000 based on the out-of-bag error plot for a random forest with five observations per leaf. The optimal number of observations in the random forest leaves was chosen by 10-fold cross-validation on a logarithmic grid of 20 values from 5 to 2000.
In each example, 1000 realizations of different training and validation sets were generated, and the errors were calculated for each of them. In each realization, 5000 observations were generated. The results of the calculations are presented below in boxplots, where every box shows the median and the 1st and 3rd quartiles. The plots do not show the results of linear regression since, in the examples, its results appeared to be significantly worse than those of the other models. The mean squared and maximum errors of ULL (27) were compared with the errors of the LOESS estimator by the paired Wilcoxon test. The summaries of the errors over the 1000 realizations of different training and validation sets are reported as median (1st quartile, 3rd quartile).
The examples of this section were constructed so that the distribution of design points is “highly nonuniform”. Potentially, this could demonstrate the advantage of the new ULL estimator (27) over known estimation approaches.
Example 2.
Let us set the target function
$$f(z) = (z-5)^2 + 10, \quad 0\le z\le 10,$$
and let the noise be centered Gaussian with standard deviation $\sigma = 2$ (Figure 1). In each realization, we draw 4500 independent design points uniformly distributed on the segment $z\in[0,5]$ and 500 independent design points uniformly distributed on the segment $z\in[5,10]$.
The results are presented in Figure 2. For the maximum error, the advantage of the estimators of order 1 (LOESS and ULL (27)) over the estimators of order 0 (the Nadaraya–Watson and ULC (28)) is noticeable, while ULL (27) turns out to be the best of all considered estimators, in particular, ULL (27) performs better than LOESS: 0.6357 (0.4993, 0.8224) vs. 0.6582 (0.5205, 0.8508), p = 0.019.
For the mean squared error, all models, except random forest and linear regression, show similar results. Moreover, ULL (27) turns out to be the best of the considered ones, although the difference between ULL (27) and LOESS is not statistically significant: 4.017 (3.896, 4.139) vs. 4.030 (3.906, 4.154), p = 0.11.
Example 3.
The piecewise linear target function is shown in Figure 3. For the sake of simplicity of presentation, we do not give the formula defining this function. Here, the centered Gaussian noise has standard deviation $\sigma = 2$. The design points are independent and identically distributed with density proportional to the function $(z-5)^2 + 2$, $0\le z\le 10$.
The results are presented in Figure 4. The Nadaraya–Watson estimator appears to be the best model both for the maximum error and for the mean squared error. For both errors, ULL (27) is better than LOESS (p < 0.0001 for the maximum error, p = 0.0030 for the mean squared error).
Example 4.
In this example, the design points are strongly dependent. We define them as follows: $z_i := s(Ai)$, $i=1,\ldots,n$, where $A$ is a positive number such that $A/\pi$ is irrational (we chose $A = 0.0002$ in this example),
$$s(t) := 10\,\Big|\sum_{k=1}^{100}\eta_k\cos(tk)\Big| \quad \text{with} \quad \eta_k := k^{-1}\psi_k\Big(\sum_{j=1}^{100}j^{-1}\psi_j\Big)^{-1},$$
and the $\psi_j$ are independent random variables uniformly distributed on $[0,1]$ and independent of the noise. It was shown in [50] that the random sequence $s(Ai)$ is asymptotically everywhere dense on $[0,10]$ with probability 1.
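This strongly dependent design can be generated directly from its definition; an R sketch with the constants of the text:

```r
# Sketch: the strongly dependent design z_i = s(A*i) of Example 4.
generate_design4 <- function(n, A = 0.0002) {
  psi <- runif(100)                        # psi_1, ..., psi_100
  eta <- psi / seq_len(100)                # k^{-1} psi_k ...
  eta <- eta / sum(eta)                    # ... normalized as in the text
  s   <- function(t) 10 * abs(sum(eta * cos(t * seq_len(100))))
  sapply(seq_len(n) * A, s)
}
```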
The target function is
$$f(z) = 0.2\big((z-5)^2 + 25\big)\cos\big((z-5)^2/2\big) + 60,$$
shown in Figure 5.
The results are presented in Figure 6. For maximum error, ULL (27) turns out to be the best of all the considered estimators. In particular, ULL (27) is better than LOESS: 1.757 (1.491, 2.053) vs. 2.538 (2.216, 2.886), p < 0.0001.
The median mean squared error for ULL (27) also turns out to be the smallest of those considered. In that sense, ULL (27) is better than LOESS, but the difference is not significant: 4.166 (4.025, 4.751) vs. 4.219 (4.096, 4.338), p = 0.92.
Example 5.
In this example, the target function was the same as in Example 4. The difference from the previous example is that 50,000 design points were generated by the same technique, and then 5000 of those points were selected. This allowed us to fill the domain of $f$ with design elements “more uniformly” than in the previous example, while preserving the clusters of design points.
The results are presented in Figure 7. For maximum error, ULL (27) turns out to be the best of all the considered estimators. In particular, ULL (27) is better than LOESS: 2.872 (2.369, 3.488) vs. 9.435 (5.719, 10.9), p < 0.0001 .
For the mean squared error, the best estimator is LOESS. ULL (27) is worse than LOESS: 5.108 (4.535, 6.597) vs. 4.378 (4.229, 4.541), p < 0.0001 , but it is better than the other estimators considered.

6. Real Data Application

In this section, we consider an application of the models considered in the previous section to the data collected in the multicenter study “Epidemiology of cardiovascular diseases in the regions of the Russian Federation”. In that study, representative samples of unorganized male and female populations aged 25–64 years from 13 regions of the Russian Federation were studied. The study was approved by the Ethics Committees of the three federal centers: State Research Center for Preventive Medicine, Russian Cardiology Research and Production Complex, Almazov Federal Medical Research Center. Each participant provided written informed consent for the study. The study was described in detail in [60].
One of the urgent problems of modern medicine is to study the relationship between heart rate (HR) and systolic arterial blood pressure (SBP), especially for low observation values. Therefore we will choose SBP as the outcome, and HR as the predictor. The association between these variables was previously estimated to be nonlinear [61]. The general analysis included 6597 participants from four regions of the Russian Federation. The levels of SBP and HR were statistically significantly pairwise different between the selected regions. Thus, the hypothesis of the independence of design points was violated.
In this section, the maximum error cannot be calculated because the exact form of the relationship is unknown, so only the mean squared error is reported. The mean squared error was calculated for 1000 random partitions of the entire set of observations into training (80%) and validation (20%) samples.
The results are presented in Figure 8. Here, the GAM estimator and the kernel estimators showed similar results, which were better than the results of both the linear regression and random forest.
The best estimator turned out to be ULC (28), although its difference from the Nadaraya–Watson estimator was not statistically significant: 220.2 (215.4, 225.9) vs. 220.4 (215.4, 225.8), p = 0.91 . The difference between ULL (27) and LOESS was not significant too: 220.4 (215.4, 225.9) vs. 220.6 (215.6, 226.1), p = 0.52 .

7. Conclusions

In this paper, for a wide class of nonparametric regression models with a random design, universal uniformly consistent kernel estimators are proposed for an unknown random regression function of a scalar argument. These estimators belong to the class of local linear estimators. However, in contrast to the vast majority of previously known results, traditional conditions of dependence of design elements are not needed for the consistency of the new estimators. The design can be either fixed and not necessarily regular, or random and not necessarily consisting of independent or weakly dependent random variables. With regard to design elements, the only condition that is required is the dense filling of the regression function domain with the design points.
Explicit upper bounds are found for the rate of uniform convergence in probability of the new estimators to an unknown random regression function. The only characteristic explicitly included in these estimators is the maximum spacing statistic of the variational series of design elements, which requires only the convergence to zero in probability of the maximum spacing as the sample size tends to infinity. The advantage of this condition over the classical ones is that it is insensitive to the forms of dependence of the design observations. Note that this condition is, in fact, necessary, since only when the design densely fills the regression function domain is it possible to reconstruct the regression function with some accuracy. As a corollary of the main result, we obtain consistent estimators for the mean function of continuous random processes.
In the simulation examples of Section 5, the new estimators were compared with known kernel estimators. In some of the examples, the new estimators proved to be the most accurate. In the application to real medical data considered in Section 6, the accuracy of new estimators was also comparable with that of the best-known kernel estimators.

8. Proofs

In this section, we prove the assertions stated in Section 2, Section 3, and Section 4. Denote
$$\beta_{n,i}(t) := \frac{w_{n2}(t) - (t - z_{n:i})\,w_{n1}(t)}{w_{n0}(t)\,w_{n2}(t) - w_{n1}^2(t)}.$$
Taking into account the relations $X_{ni} = f(z_{n:i}) + \varepsilon_{ni}$, $i=1,\ldots,n$, and the identity
$$\sum_{i=1}^{n}\beta_{n,i}(t)\,K_h(t - z_{n:i})\,\Delta z_{ni} \equiv 1,$$
we obtain the representation
$$\hat f_{n,h}(t) = f(t) - f(t)\,I(\delta_n > c^* h) + \hat r_{n,h}(f,t) + \hat\nu_{n,h}(t),$$
where
$$\hat r_{n,h}(f,t) = I(\delta_n \le c^* h)\sum_{i=1}^{n}\beta_{n,i}(t)\big(f(z_{n:i}) - f(t)\big)K_h(t - z_{n:i})\,\Delta z_{ni},$$
$$\hat\nu_{n,h}(t) = I(\delta_n \le c^* h)\sum_{i=1}^{n}\beta_{n,i}(t)\,K_h(t - z_{n:i})\,\Delta z_{ni}\,\varepsilon_{ni}.$$
We emphasize that, in view of the properties of the density $K_h(\cdot)$, the domain of summation in the last two sums, as well as in all the sums defining the quantities $w_{nj}(t)$, coincides with the set $A_{n,h}(t) = \{i:\ |t - z_{n:i}|\le h,\ 1\le i\le n\}$, which is a crucial point for the further analysis.
Lemma 1.
For $h<1/2$, the following equalities are valid:
$$\inf_{t\in[0,1]}\big(w_0(t)w_2(t) - w_1^2(t)\big) = \tfrac14(\kappa_2-\kappa_1^2)h^2, \qquad \inf_{t\in[0,1]}w_0(t) = 1/2, \qquad (35)$$
$$\sup_{t\in[0,1]}|w_j(t)| = 2^{-(j-2[j/2])}\,\kappa_j h^j, \quad j=0,1,2,3. \qquad (36)$$
Moreover, on the set of elementary events such that $\delta_n \le c^* h$, the following inequalities hold:
$$\sup_{t\in[0,1]}|w_{nj}(t)| \le 3Lh^j, \qquad \sup_{t\in[0,1]}|w_{nj}(t) - w_j(t)| \le 12L\delta_n h^{j-1}, \quad j=0,1,2,3, \qquad (37)$$
$$\inf_{t\in[0,1]}\big(w_{n0}(t)w_{n2}(t) - w_{n1}^2(t)\big) \ge \tfrac18(\kappa_2-\kappa_1^2)h^2, \qquad \inf_{t\in[0,1]}w_{n0}(t) \ge 1/4, \qquad (38)$$
$$|w_{nj}(t_2) - w_{nj}(t_1)| \le 18Lh^{j-1}|t_2 - t_1| \quad \text{for all } t_1,t_2\in[0,1],\ j=0,1,2. \qquad (39)$$
Proof. 
Let us prove (35) and (36). First of all, note that, due to the Cauchy–Bunyakovsky–Schwarz inequality, $w_0(t)w_2(t) - w_1^2(t) \ge 0$ for all $t\in[0,1]$, and this difference is continuous in $t$. First, consider the simplest case $h \le t \le 1-h$. For such $t$, after changing the integration variable in the definition (21) of the quantities $w_j(t)$, we have
$$w_j(t) = \int_{t-h}^{t+h}(t-z)^j K_h(t-z)\,dz = h^j\int_{-1}^{1}v^j K(v)\,dv, \qquad (40)$$
i.e., $w_0(t)\equiv 1$, $w_1(t)\equiv 0$, and $w_2(t)\equiv h^2\kappa_2$. In other words, on the segment $[h,1-h]$, the following identity is valid:
$$w_0(t)w_2(t) - w_1^2(t) \equiv h^2\kappa_2. \qquad (41)$$
We now consider the case $t = \alpha h$ for $\alpha\in[0,1]$. Then
$$w_j(\alpha h) = \int_{0}^{(1+\alpha)h}(\alpha h - z)^j K_h(\alpha h - z)\,dz = h^j\kappa_j(\alpha). \qquad (42)$$
Next, by (42), we obtain
$$\frac{d}{d\alpha}\Big(h^{-2}\big(w_0(\alpha h)w_2(\alpha h) - w_1^2(\alpha h)\big)\Big) = \frac{d}{d\alpha}\big(\kappa_0(\alpha)\kappa_2(\alpha) - \kappa_1^2(\alpha)\big) = K(\alpha)\Big(\alpha^2\int_{-1}^{\alpha}K(v)\,dv + \int_{-1}^{\alpha}v^2K(v)\,dv - 2\alpha\int_{-1}^{\alpha}vK(v)\,dv\Big) \ge 0$$
in view of the relation $\int_{-1}^{\alpha}vK(v)\,dv \le 0$, which holds for $\alpha\in[0,1]$ since $K(v)$ is an even function. The symmetric case $t = 1-\alpha h$, $\alpha\in[0,1]$, is studied similarly. From here and (41), we obtain the first relation in (35):
$$\inf_{t\in[0,1]}\big\{w_0(t)w_2(t) - w_1^2(t)\big\} = w_0(0)w_2(0) - w_1^2(0) = \tfrac14 h^2(\kappa_2 - \kappa_1^2).$$
The second relation in (35) follows directly from (42). Moreover, the above-mentioned arguments and the representations (40) and (42) imply (36).
Further, the first estimate in (37) is obvious in view of the above remark about the domain of summation in the definition of the functions $w_{nj}(t)$ and the relations
$$\sup_{s\in[-1,1]}K(s) \le L, \qquad \sum_{i\in A_{n,h}(t)}\Delta z_{ni} \le 2h + \delta_n \le 3h. \qquad (43)$$
The second estimate in (37) immediately follows from the well-known bound for the error of approximating the integral of a smooth function on a finite closed interval by its Riemann sums:
$$\Big|\sum_{i\in A_{n,h}(t)}g_{t,j}(z_{n:i})\,\Delta z_{ni} - \int_{z\in[0,1]:\,|t-z|\le h}g_{t,j}(z)\,dz\Big| \le (2h+\delta_n)\,\delta_n\,L_{g_{t,j}}, \qquad (44)$$
where the functions $g_{t,j}(z) = (t-z)^j K_h(t-z)$, $j=0,1,2,3$, are defined for all $z\in[0\vee(t-h),\ 1\wedge(t+h)]$, and $L_{g_{t,j}}$ is the Lipschitz constant of the function $g_{t,j}(z)$. It is easy to verify that $\sup_{t\in[0,1]}L_{g_{t,j}} \le 4Lh^{j-2}$ for all $h\in(0,1/2)$ and $j=0,1,2,3$. So, on the set of elementary events $\{\delta_n \le c^* h\}$ (recall that $c^* < 1$), the right-hand side of (44) can be replaced by $12L\delta_n h^{j-1}$.
In addition, taking (36) and (37) into account, we obtain
$$|w_{n0}(t)w_{n2}(t) - w_0(t)w_2(t)| \le w_{n0}(t)\,|w_{n2}(t) - w_2(t)| + w_2(t)\,|w_{n0}(t) - w_0(t)| \le 9L\delta_n(3L + \kappa_2)h,$$
$$|w_{n1}^2(t) - w_1^2(t)| \le |w_{n1}(t) - w_1(t)|\,\big(|w_{n1}(t)| + |w_1(t)|\big) \le 9L\delta_n(3L + \kappa_1/2)h.$$
Hence follows the estimate
$$\big|w_{n0}(t)w_{n2}(t) - w_{n1}^2(t) - w_0(t)w_2(t) + w_1^2(t)\big| \le 9L\delta_n(6L + \kappa_2 + \kappa_1/2)h. \qquad (45)$$
The inequalities in (38) follow from (35), (45), and the definition of the constant $c^*$. To prove (39), note that
$$w_{nj}(t_2) - w_{nj}(t_1) = \sum_{i=1}^{n}\Big((t_2 - z_{n:i})^j K_h(t_2 - z_{n:i}) - (t_1 - z_{n:i})^j K_h(t_1 - z_{n:i})\Big)\Delta z_{ni}$$
$$= \sum_{i\in A_{n,h}(t_1)\cup A_{n,h}(t_2)}\Big((t_2 - z_{n:i})^j K_h(t_2 - z_{n:i}) - (t_1 - z_{n:i})^j K_h(t_1 - z_{n:i})\Big)\Delta z_{ni}, \qquad (46)$$
where we can use the estimates $|(t_2 - z_{n:i})^j - (t_1 - z_{n:i})^j| \le 2h^{j-1}|t_2 - t_1|$ for $j=0,1,2$ and $|t_k - z_{n:i}| \le h$ for $k=1,2$, and also the inequalities
$$|K_h(t_2 - z_{n:i}) - K_h(t_1 - z_{n:i})| \le Lh^{-2}|t_2 - t_1|,$$
$$\sum_{i\in A_{n,h}(t_1)\cup A_{n,h}(t_2)}\Delta z_{ni} \le 4h + 2\delta_n \le 6h.$$
Thus, Lemma 1 is proved. □
Lemma 2.
For any positive $h<1/2$, the following estimate is valid:
$$\sup_{t\in[0,1]}|\hat r_{n,h}(f,t)| \le C_1^*\,\omega_f(h), \quad \text{with } C_1^* = \frac{C_1 L^2}{\kappa_2 - \kappa_1^2}.$$
Proof. 
Without loss of generality, the required estimate can be derived on the set of elementary events determined by the condition $\delta_n \le c^* h$. Then the assertion of the lemma follows from the inequality
$$|\hat r_{n,h}(f,t)| \le \omega_f(h)\,\frac{w_{n2}(t)}{w_{n0}(t)w_{n2}(t) - w_{n1}^2(t)}\sum_{i\in A_{n,h}(t)}K_h(t - z_{n:i})\,\Delta z_{ni} + \omega_f(h)\,\frac{|w_{n1}(t)|}{w_{n0}(t)w_{n2}(t) - w_{n1}^2(t)}\sum_{i\in A_{n,h}(t)}|t - z_{n:i}|\,K_h(t - z_{n:i})\,\Delta z_{ni},$$
the estimates in (43), and Lemma 1. □
Lemma 3.
For any $y>0$ and $h<1/2$, on the set of elementary events such that $\delta_n \le c^* h$, the following estimate is valid:
$$\mathsf P_{\mathcal F_n}\Big(\sup_{t\in[0,1]}|\hat\nu_{n,h}(t)| > y\Big) \le C_2^*\,\frac{\sigma^2\delta_n}{h^2y^2}, \quad \text{with } C_2^* = \frac{C_2 L^4}{(\kappa_2 - \kappa_1^2)^2},$$
where the symbol $\mathsf P_{\mathcal F_n}$ denotes the conditional probability given the σ-field $\mathcal F_n$.
Proof. 
Put
$$\mu_{n,h}(t)=\sum_{i\in A_{n,h}(t)}h^{-2}\alpha_{n,i}(t)\,K_h(t-z_{n:i})\,\Delta z_{ni}\,\varepsilon_{ni},$$
where $\alpha_{n,i}(t)=w_{n2}(t)-(t-z_{n:i})\,w_{n1}(t)$, and note that from Lemma 1 and the conditions of Lemma 3 it follows that, firstly, $h^{-2}|\alpha_{n,i}(t)|\le 6L$ whenever $i\in A_{n,h}(t)$ and, secondly,
$$|\hat\nu_{n,h}(t)|\le 8\,(\kappa_2-\kappa_1^2)^{-1}\,|\mu_{n,h}(t)|.$$
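The first of these two facts is immediate if one uses the estimates $|w_{nj}(t)|\le 3Lh^j$, $j=1,2$ (our reading of the form of the bounds in (43)):
$$h^{-2}|\alpha_{n,i}(t)|\le h^{-2}\big(|w_{n2}(t)|+|t-z_{n:i}|\,|w_{n1}(t)|\big)\le h^{-2}\big(3Lh^2+h\cdot 3Lh\big)=6L\quad\text{for }i\in A_{n,h}(t).$$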
The distribution tail of the random variable $\sup_{t\in[0,1]}|\mu_{n,h}(t)|$ will be estimated by the so-called chaining method proposed by A. N. Kolmogorov for estimating the tail of the supremum norm of a stochastic process with almost surely continuous trajectories (see [62]). First of all, note that the set $[0,1]$ under the supremum sign can be replaced by the set of dyadic rational points
$$R=\{j/2^k;\ j=1,\dots,2^k-1;\ k\ge 1\}.$$
Thus,
$$\sup_{t\in[0,1]}|\mu_{n,h}(t)|=\sup_{t\in R}|\mu_{n,h}(t)|\le\max_{j=1,\dots,2^m-1}\big|\mu_{n,h}(j2^{-m})\big|+\sum_{k=m+1}^{\infty}\max_{j=1,\dots,2^k-2}\big|\mu_{n,h}((j+1)2^{-k})-\mu_{n,h}(j2^{-k})\big|,$$
where the natural number $m$ is defined by the equality $m=\lceil|\log_2h|\rceil$ (here $\lceil a\rceil$ is the minimal natural number greater than or equal to $a$). One has
$$\mathbf{P}_{\mathcal{F}_n}\Big(\sup_{t\in[0,1]}|\mu_{n,h}(t)|>y\Big)\le\mathbf{P}_{\mathcal{F}_n}\Big(\max_{j=1,\dots,2^m-1}|\mu_{n,h}(j2^{-m})|>a_my\Big)+\sum_{k=m+1}^{\infty}\mathbf{P}_{\mathcal{F}_n}\Big(\max_{j=1,\dots,2^k-2}\big|\mu_{n,h}((j+1)2^{-k})-\mu_{n,h}(j2^{-k})\big|>a_ky\Big)\le\sum_{j=1}^{2^m-1}\mathbf{P}_{\mathcal{F}_n}\big(|\mu_{n,h}(j2^{-m})|>a_my\big)+\sum_{k=m+1}^{\infty}\sum_{j=1}^{2^k-2}\mathbf{P}_{\mathcal{F}_n}\big(\big|\mu_{n,h}((j+1)2^{-k})-\mu_{n,h}(j2^{-k})\big|>a_ky\big),$$
where $a_m,a_{m+1},\dots$ is a sequence of positive numbers such that $a_m+a_{m+1}+\dots=1$.
Let us now estimate each of the terms on the right-hand side of (50). Using Markov's inequality for the second moment and the estimates in (43), we obtain
$$\mathbf{P}_{\mathcal{F}_n}\big(|\mu_{n,h}(j2^{-m})|>a_my\big)\le\frac{(6L)^2}{(a_my)^2}\sum_{i\in A_{n,h}(j2^{-m})}K_h^2(j2^{-m}-z_{n:i})\,(\Delta z_{ni})^2\,\sigma^2\le\frac{(6L)^2\sigma^2}{(a_my)^2}\cdot\frac{\delta_n(2h+\delta_n)}{h^2}\le\frac{C_3L^2\sigma^2}{(a_my)^2}\,\frac{\delta_n}{h}.$$
Further,
$$\mathbf{P}_{\mathcal{F}_n}\big(\big|\mu_{n,h}((j+1)2^{-k})-\mu_{n,h}(j2^{-k})\big|>a_ky\big)\le\frac{1}{(a_ky)^2h^4}\sum_{i=1}^n\mathbf{E}_{\mathcal{F}_n}\Big(\big[\alpha_{n,i}((j+1)2^{-k})K_h((j+1)2^{-k}-z_{n:i})-\alpha_{n,i}(j2^{-k})K_h(j2^{-k}-z_{n:i})\big]\Delta z_{ni}\,\varepsilon_{ni}\Big)^2\le\frac{\sigma^2}{(a_ky)^2h^4}\sum_{i=1}^n\big[\alpha_{n,i}((j+1)2^{-k})K_h((j+1)2^{-k}-z_{n:i})-\alpha_{n,i}(j2^{-k})K_h(j2^{-k}-z_{n:i})\big]^2(\Delta z_{ni})^2\le\frac{C_4\sigma^2L^4}{(a_ky)^2}\cdot\frac{2^{-2k}\delta_n(4h+2\delta_n)}{h^4}\le\frac{C_5\sigma^2L^4}{(a_ky)^2}\cdot\frac{2^{-2k}\delta_n}{h^3}.$$
Here, we took into account that the summation range in (52) actually coincides with the set
$$\big\{i:\ i\in A_{n,h}((j+1)2^{-k})\cup A_{n,h}(j2^{-k})\big\},$$
and hence, due to the relation $|(j+1)/2^k-j/2^k|=2^{-k}\le h$ for $k>m$, the estimate (46) is valid for $t_1=j2^{-k}$ and $t_2=(j+1)2^{-k}$. Moreover, we used the estimates
$$\sup_tK_h(t)\le Lh^{-1},\qquad\big|K_h(u)-K_h(v)\big|\le Lh^{-2}|u-v|,$$
and the following inequalities, which hold in the indicated range of the parameters (see Lemma 1):
$$\big|\alpha_{n,i}((j+1)2^{-k})-\alpha_{n,i}(j2^{-k})\big|\le C_6Lh2^{-k},\qquad\big|\alpha_{n,i}(j2^{-k})\big|\le C_7Lh^2,$$
$$\big|\alpha_{n,i}((j+1)2^{-k})K_h((j+1)2^{-k}-z_{n:i})-\alpha_{n,i}(j2^{-k})K_h(j2^{-k}-z_{n:i})\big|\le C_8L^22^{-k}.$$
We now obtain from (50)–(52) that
$$\mathbf{P}_{\mathcal{F}_n}\Big(\sup_{t\in[0,1]}|\mu_{n,h}(t)|>y\Big)\le\frac{C_9}{y^2}\,\sigma^2L^4\,\frac{\delta_n}{h}\Big(\frac{2^m}{a_m^2}+h^{-2}\sum_{k=m+1}^{\infty}\frac{2^{-k+1}}{a_k^2}\Big).$$
The optimal sequence $\{a_k\}$ minimizing the right-hand side of this inequality is $a_m=c\,2^{m/3}$ and $a_k=c\,h^{-2/3}2^{(-k+1)/3}$ for $k=m+1,m+2,\dots$, where $c$ is determined by the relation $a_m+a_{m+1}+\dots=1$.
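The form of this optimal sequence can be recovered by a standard Lagrange multiplier computation; we sketch it for the reader. Writing the expression in parentheses as $\sum_{k\ge m}b_ka_k^{-2}$ with $b_m=2^m$ and $b_k=h^{-2}2^{-k+1}$ for $k>m$, one minimizes it under the constraint $\sum_{k\ge m}a_k=1$:
$$\frac{\partial}{\partial a_k}\Big(\sum_{l\ge m}\frac{b_l}{a_l^2}+\lambda\sum_{l\ge m}a_l\Big)=-\frac{2b_k}{a_k^3}+\lambda=0\quad\Longrightarrow\quad a_k\propto b_k^{1/3},$$
which gives exactly the indicated $a_m$ and $a_k$, the minimal value of the sum being $\big(\sum_{k\ge m}b_k^{1/3}\big)^3$.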
For the indicated sequence, we conclude that
$$\mathbf{P}_{\mathcal{F}_n}\Big(\sup_{t\in[0,1]}|\mu_{n,h}(t)|>y\Big)\le\frac{C_{10}}{y^2}\,\sigma^2L^4\,\frac{\delta_n}{h}\Big(2^{m/3}+h^{-2/3}2^{-m/3}\big(2+2^{1/3}+2^{2/3}\big)\Big)^3\le\frac{C_{11}}{y^2}\,\sigma^2L^4\,\frac{\delta_n}{h^2}.$$
The assertion of the lemma follows from (49). □
Proof of Theorem 1. 
The assertion follows from Lemmas 2 and 3 if we set
$$\zeta_n(h)=\sup_{t\in[0,1]}|\hat\nu_{n,h}(t)|+\sup_{t\in[0,1]}|f(t)|\,I(\delta_n>c_*h)$$
and take into account the relation
$$\mathbf{P}\big(\zeta_n(h)>y,\ \delta_n\le c_*h\big)=\mathbf{E}\big\{I(\delta_n\le c_*h)\,\mathbf{P}_{\mathcal{F}_n}\big(\zeta_n(h)>y\big)\big\},$$
which completes the proof. □
To prove Theorem 2, we need the two auxiliary assertions below.
Lemma 4.
If condition (16) is fulfilled, then $\lim_{\varepsilon\to0}\mathbf{E}\,\omega_f(\varepsilon)=0$, and for independent copies of the a.s. continuous random process $f(t)$ the following law of large numbers is valid: as $N\to\infty$,
$$\sup_{t\in[0,1]}\big|\bar f_N(t)-\mathbf{E}f(t)\big|\overset{p}{\to}0,\quad\text{where}\quad\bar f_N(t)=N^{-1}\sum_{j=1}^Nf_j(t).$$
Proof. 
The first assertion of the lemma follows from (16) and Lebesgue's dominated convergence theorem. We put
$$\omega_{\bar f_N}(\varepsilon)=\sup_{t,s:\,|t-s|\le\varepsilon}\big|\bar f_N(t)-\bar f_N(s)\big|,\qquad\omega_{\mathbf{E}f}(\varepsilon)=\sup_{t,s:\,|t-s|\le\varepsilon}\big|\mathbf{E}f(t)-\mathbf{E}f(s)\big|.$$
For any fixed natural $k$, one has
$$\sup_{t\in[0,1]}\big|\bar f_N(t)-\mathbf{E}f(t)\big|\le\max_{0\le i\le k}\big|\bar f_N(i/k)-\mathbf{E}f(i/k)\big|+\max_{1\le i\le k}\sup_{(i-1)/k\le t\le i/k}\big|\bar f_N(t)-\bar f_N(i/k)\big|+\max_{1\le i\le k}\sup_{(i-1)/k\le t\le i/k}\big|\mathbf{E}f(t)-\mathbf{E}f(i/k)\big|\le\max_{0\le i\le k}\big|\bar f_N(i/k)-\mathbf{E}f(i/k)\big|+\omega_{\bar f_N}(1/k)+\omega_{\mathbf{E}f}(1/k).$$
Put $\omega_{f_j}(\varepsilon)=\sup_{t,s:\,|t-s|\le\varepsilon}|f_j(t)-f_j(s)|$ and note that $\omega_{\mathbf{E}f}(\varepsilon)\le\mathbf{E}\,\omega_f(\varepsilon)$ and, as $N\to\infty$,
$$\bar f_N(i/k)\overset{p}{\to}\mathbf{E}f(i/k),\qquad\omega_{\bar f_N}(\varepsilon)\le\frac1N\sum_{j=1}^N\omega_{f_j}(\varepsilon)\overset{p}{\to}\mathbf{E}\,\omega_f(\varepsilon).$$
Therefore, the right-hand side of (54) does not exceed $\mathbf{E}\,\omega_f(1/k)+o_p(1)$, and, by the arbitrariness of $k$ and the first statement of the lemma, relation (53) follows. □
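The statement of Lemma 4 is easy to visualize numerically. The following sketch (a minimal illustration only; the particular process, grid, and sample sizes are our own choices and are not taken from the paper) averages independent copies of an a.s. continuous random process and reports the uniform deviation from the mean function:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 201)  # grid approximating the sup over [0, 1]

def sample_path():
    # an a.s. continuous random process with E f(t) = 0:
    # a random trigonometric polynomial with standard normal coefficients
    a, b, c = rng.normal(size=3)
    return a * np.sin(2 * np.pi * t) + b * np.cos(2 * np.pi * t) + c * t

Ef = np.zeros_like(t)  # here E f(t) = 0 for all t
for N in (10, 100, 1000, 10000):
    fbar = np.mean([sample_path() for _ in range(N)], axis=0)
    # sup_t |fbar_N(t) - E f(t)| shrinks, at the usual N^(-1/2) rate here
    print(N, np.max(np.abs(fbar - Ef)))
```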
Lemma 5.
Under the conditions of Theorem 2, the following limit relation holds:
$$\frac1N\sum_{j=1}^N\Delta_{n,h,j}\overset{p}{\to}0,\quad\text{where}\quad\Delta_{n,h,j}=\sup_{t\in[0,1]}\big|f^*_{n,h,j}(t)-f_j(t)\big|.$$
Proof. 
Let the sequences $h=h_n\to0$ and $N=N_n$ be such that condition (17) is fulfilled. Introduce the events $B_{n,h,j}=\{\delta_{n,j}\le c_*h\}$, $j=1,\dots,N$. For any positive $\nu$ one has
$$\mathbf{P}\Big(\frac1N\sum_{j=1}^N\Delta_{n,h,j}>\nu\Big)\le\mathbf{P}\Big(\frac1N\sum_{j=1}^N\Delta_{n,h,j}I(B_{n,h,j})>\nu\Big)+N\,\mathbf{P}\big(\overline{B_{n,h,1}}\big).$$
Next, from Theorem 1 we obtain
$$\mathbf{E}\,\Delta_{n,h,j}I(B_{n,h,j})\le C_1^*\,\mathbf{E}\,\omega_f(h)+\int_0^{\infty}\mathbf{P}\big(\zeta_n(h)>y,\ \delta_n\le c_*h\big)\,dy\le C_1^*\,\mathbf{E}\,\omega_f(h)+h^{-1}(\mathbf{E}\delta_n)^{1/2}+\int_{h^{-1}(\mathbf{E}\delta_n)^{1/2}}^{\infty}\mathbf{P}\big(\zeta_n(h)>y,\ \delta_n\le c_*h\big)\,dy\le C_1^*\,\mathbf{E}\,\omega_f(h)+(1+C_2^*\sigma^2)\,h^{-1}(\mathbf{E}\delta_n)^{1/2}.$$
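The last inequality deserves a one-line expansion: by Lemma 3, after averaging the conditional bound over $\mathcal{F}_n$, one has $\mathbf{P}(\zeta_n(h)>y,\ \delta_n\le c_*h)\le C_2^*\sigma^2\,\mathbf{E}\delta_n/(h^2y^2)$, so, for $y_0=h^{-1}(\mathbf{E}\delta_n)^{1/2}$,
$$\int_{y_0}^{\infty}\frac{C_2^*\sigma^2\,\mathbf{E}\delta_n}{h^2y^2}\,dy=\frac{C_2^*\sigma^2\,\mathbf{E}\delta_n}{h^2y_0}=C_2^*\sigma^2\,h^{-1}(\mathbf{E}\delta_n)^{1/2},$$
which, together with the trivial bound $\mathbf{P}\le1$ on $[0,y_0]$, gives the factor $1+C_2^*\sigma^2$.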
To complete the proof of the lemma, it remains to apply Markov's inequality to the first probability on the right-hand side of (56) and then use the last estimate, the limit relations in (17), and the first statement of Lemma 4. □
Proof of Theorem 2. 
Theorem 2 follows immediately from Lemmas 4 and 5. □
Proof of Proposition 2. 
For the estimator $f^*_{n,h}(t)$ defined in (19), we need the following representation:
$$f^*_{n,h}(t)=f(t)+r^*_{n,h}(f,t)+\nu^*_{n,h}(t),$$
where
$$r^*_{n,h}(f,t)=w_{n0}^{-1}(t)\sum_{i=1}^n\big(f(z_{n:i})-f(t)\big)K_h(t-z_{n:i})\,\Delta z_{ni},\qquad\nu^*_{n,h}(t)=w_{n0}^{-1}(t)\sum_{i=1}^nK_h(t-z_{n:i})\,\Delta z_{ni}\,\varepsilon_{ni}.$$
In view of the representations (34) and (57), we obtain
$$\mathrm{Bias}\,\hat f_{n,h}(t)=\mathbf{E}\,\hat r_{n,h}(f,t)+f(t)\,\mathbf{P}(\delta_n>c_*h)=\sum_{i=1}^n\mathbf{E}\big\{I(\delta_n\le c_*h)\,\beta_{n,i}(t)\big(f(z_{n:i})-f(t)\big)K_h(t-z_{n:i})\,\Delta z_{ni}\big\}+f(t)\,\mathbf{P}(\delta_n>c_*h),$$
$$\mathrm{Bias}\,f^*_{n,h}(t)=\mathbf{E}\,r^*_{n,h}(f,t)=\sum_{i=1}^n\mathbf{E}\big\{I(\delta_n\le c_*h)\,w_{n0}^{-1}(t)\big(f(z_{n:i})-f(t)\big)K_h(t-z_{n:i})\,\Delta z_{ni}\big\}+\tau_n,$$
where $|\tau_n|\le\omega_f(h)\,\mathbf{P}(\delta_n>c_*h)$. Further, it follows from Lemma 1 that, under the condition $\delta_n\le c_*h$, for any point $t\in[h,1-h]$ one has
$$\sup_{i\in A_{n,h}(t)}\big|\beta_{n,i}(t)-w_{n0}^{-1}(t)\big|\le C_5^*\,\delta_nh^{-1}.$$
When deriving relation (60), we also took into account that $w_0(t)=1$ and $w_1(t)=0$ for all $t\in[h,1-h]$ (see the proof of Lemma 1). Now, using the relations (43), (58)–(60) and Lemma 1, it is easy to derive the first assertion of the proposition, since
$$\big|\mathrm{Bias}\,\hat f_{n,h}(t)-\mathrm{Bias}\,f^*_{n,h}(t)\big|\le C_5^*\,h^{-1}\omega_f(h)\,\mathbf{E}\Big\{\delta_nI(\delta_n\le c_*h)\sum_{i=1}^nK_h(t-z_{n:i})\,\Delta z_{ni}\Big\}+\big(|f(t)|+\omega_f(h)\big)\,\mathbf{P}(\delta_n>c_*h)\le C_6^*\,\omega_f(h)\,h^{-1}\mathbf{E}\delta_n+\big(|f(t)|+\omega_f(h)\big)\,\mathbf{P}(\delta_n>c_*h).$$
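Here the inner sum is nothing but $w_{n0}(t)$; assuming, as above, the bound $w_{n0}(t)\le 3L$ from (43), the first expectation is estimated as
$$\mathbf{E}\Big\{\delta_nI(\delta_n\le c_*h)\sum_{i=1}^nK_h(t-z_{n:i})\,\Delta z_{ni}\Big\}=\mathbf{E}\big\{\delta_nI(\delta_n\le c_*h)\,w_{n0}(t)\big\}\le 3L\,\mathbf{E}\delta_n,$$
which is where the constant $C_6^*$ comes from.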
To prove the second assertion, first note that
$$\mathrm{Var}\,\hat f_{n,h}(t)=\mathrm{Var}\,\hat\nu_{n,h}(t)+\mathrm{Var}\big(\hat r_{n,h}(f,t)+f(t)\,I(\delta_n>c_*h)\big)=\mathrm{Var}\,\hat\nu_{n,h}(t)+\mathrm{Var}\,\hat r_{n,h}(f,t)+f^2(t)\,\mathbf{P}(\delta_n>c_*h)\,\mathbf{P}(\delta_n\le c_*h),$$
$$\mathrm{Var}\,f^*_{n,h}(t)=\mathrm{Var}\,\nu^*_{n,h}(t)+\mathrm{Var}\,r^*_{n,h}(f,t).$$
Thus, we need to compare the two variances on the right-hand side of the first equality with the corresponding variances in the second one. Using (43) and (60), we obtain
$$\big|\mathrm{Var}\,\hat\nu_{n,h}(t)-\mathrm{Var}\,\nu^*_{n,h}(t)\big|\le\sigma^2\,\mathbf{E}\Big\{\sum_{i=1}^nI(\delta_n\le c_*h)\,\big|\beta_{n,i}^2(t)-w_{n0}^{-2}(t)\big|\,K_h^2(t-z_{n:i})\,(\Delta z_{ni})^2\Big\}+\sigma^2\,\mathbf{P}(\delta_n>c_*h)\le C_7^*\,\sigma^2h^{-1}\,\mathbf{E}\Big\{\delta_nI(\delta_n\le c_*h)\sum_{i=1}^nhK_h^2(t-z_{n:i})\,\Delta z_{ni}\Big\}+\sigma^2\,\mathbf{P}(\delta_n>c_*h)\le C_8^*\,\sigma^2h^{-1}\,\mathbf{E}\delta_n;$$
when deriving this estimate, we took into account that
$$\sum_{i=1}^nw_{n0}^{-2}(t)\,K_h^2(t-z_{n:i})\,(\Delta z_{ni})^2\le\Big(\sum_{i=1}^nw_{n0}^{-1}(t)\,K_h(t-z_{n:i})\,\Delta z_{ni}\Big)^2=1.$$
To estimate the difference $|\mathrm{Var}\,\hat r_{n,h}(f,t)-\mathrm{Var}\,r^*_{n,h}(f,t)|$, note that the bound $C_9^*\bar f^2h^{-1}\mathbf{E}\delta_n$ for the modulus of the difference between the squared expectations of the random variables $\hat r_{n,h}(f,t)$ and $r^*_{n,h}(f,t)$ is essentially contained in (47) and (61). The difference of the second moments of these random variables is estimated similarly, using (43), (60), and (61):
$$\big|\mathbf{E}\,\hat r^2_{n,h}(f,t)-\mathbf{E}\,r^{*2}_{n,h}(f,t)\big|\le\mathbf{E}\big\{\big|\hat r_{n,h}(f,t)-r^*_{n,h}(f,t)\big|\cdot\big|\hat r_{n,h}(f,t)+r^*_{n,h}(f,t)\big|\big\}\le C_{10}^*\,\bar f^2h^{-1}\,\mathbf{E}\delta_n,$$
which completes the proof. □
Proof of Proposition 3. 
From the definition of $\beta_{n,i}(t)$ in (32) it follows that, for any $t\in[0,1]$,
$$\sum_{i=1}^n\beta_{n,i}(t)\,(z_{n:i}-t)\,K_h(t-z_{n:i})\,\Delta z_{ni}=0,\qquad\sum_{i=1}^n\beta_{n,i}(t)\,(z_{n:i}-t)^2K_h(t-z_{n:i})\,\Delta z_{ni}=D_n^{-1}(t)\big(w_{n2}^2(t)-w_{n3}(t)\,w_{n1}(t)\big)=:B_n(t),$$
where $D_n(t):=w_{n0}(t)w_{n2}(t)-w_{n1}^2(t)$. Expanding the function $f(\cdot)$ by the Taylor formula in a neighborhood of the point $t$ (up to the second derivative), we obtain from the above identities, using (32), (58), and Lemma 1, that for any point $t$,
$$\mathrm{Bias}\,\hat f_{n,h}(t)=\mathbf{E}\Big\{I(\delta_n\le c_*h)\sum_{i=1}^n\beta_{n,i}(t)\big(f(z_{n:i})-f(t)\big)K_h(t-z_{n:i})\,\Delta z_{ni}\Big\}+f(t)\,\mathbf{P}(\delta_n>c_*h)=\frac{f''(t)}{2}\,\mathbf{E}\big\{I(\delta_n\le c_*h)\,B_n(t)\big\}+f(t)\,\mathbf{P}(\delta_n>c_*h)+o(h^2)=\frac{f''(t)}{2}\,B_0(t)+O(\mathbf{E}\delta_n/h)+o(h^2);$$
moreover, the $O$- and $o$-symbols on the right-hand side of (62) are uniform in $t$. Note that $B_0(t)=O(h^2)$ holds for any $t$.
Next, since $|w_j(t)|\,w_0^{-1}(t)\le h^j$ and $|w_{nj}(t)|\,w_{n0}^{-1}(t)\le h^j$ for $j=1,2$ and all natural $n$, the following asymptotic representation holds:
$$\mathrm{Bias}\,f^*_{n,h}(t)=\sum_{i=1}^n\mathbf{E}\big\{w_{n0}^{-1}(t)\big(f(z_{n:i})-f(t)\big)K_h(t-z_{n:i})\,\Delta z_{ni}\big\}=-f'(t)\,\mathbf{E}\Big\{\frac{w_{n1}(t)}{w_{n0}(t)}\,I(\delta_n\le c_*h)\Big\}+\frac{f''(t)}{2}\,\mathbf{E}\Big\{\frac{w_{n2}(t)}{w_{n0}(t)}\,I(\delta_n\le c_*h)\Big\}+O\big(h\,\mathbf{P}(\delta_n>c_*h)\big)+o(h^2)=-f'(t)\,\frac{w_1(t)}{w_0(t)}+\frac{f''(t)}{2}\,\frac{w_2(t)}{w_0(t)}+O(\mathbf{E}\delta_n)+o(h^2).\ \square$$
Proof of Corollary 3. 
Without loss of generality, we can assume that $t\in[h,1-h]$. Then, as noted in the proof of Lemma 1, for the indicated $t$ one has $w_0(t)=1$, $w_1(t)=0$, and $w_2(t)=\kappa_2h^2$, i.e., $B_0(t)=\kappa_2h^2$. □
Proof of Corollary 4. 
This assertion follows from Proposition 3 and (42). □
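For readers who want to experiment with the estimators analyzed above, the following sketch implements the local constant estimator $f^*_{n,h}$ of (19) and the local linear weights $\beta_{n,i}(t)=D_n^{-1}(t)\big(w_{n2}(t)-(t-z_{n:i})\,w_{n1}(t)\big)$ used throughout the proofs. It is a minimal illustration under our own conventions, not the authors' code: $\Delta z_{ni}$ is taken as the forward spacing of the ordered design, the kernel is Epanechnikov, and all names are ours.

```python
import numpy as np

def K(u):
    # Epanechnikov kernel supported on [-1, 1] (our choice; any bounded,
    # Lipschitz, even density on [-1, 1] fits the assumptions of the paper)
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

def universal_estimators(t, z, x, h):
    """Local constant f*_{n,h}(t) and local linear estimator at points t.

    z, x: design points in [0, 1] and responses; dz plays the role of
    Delta z_{ni}, taken here as forward spacings of the ordered design.
    """
    order = np.argsort(z)
    z, x = z[order], x[order]
    dz = np.diff(z, prepend=0.0)                      # Delta z_{ni}, our convention
    u = (t[:, None] - z[None, :]) / h
    Kh = K(u) / h                                     # K_h(t - z_{n:i})
    w0 = (Kh * dz).sum(axis=1)                        # w_{n0}(t)
    w1 = (Kh * (t[:, None] - z) * dz).sum(axis=1)     # w_{n1}(t)
    w2 = (Kh * (t[:, None] - z)**2 * dz).sum(axis=1)  # w_{n2}(t)
    f_lc = (Kh * dz * x).sum(axis=1) / w0             # local constant, as in (19)
    D = w0 * w2 - w1**2                               # D_n(t)
    beta = (w2[:, None] - (t[:, None] - z) * w1[:, None]) / D[:, None]  # beta_{n,i}(t)
    f_ll = (beta * Kh * dz * x).sum(axis=1)           # local linear estimator
    return f_lc, f_ll

# toy run on a highly non-uniform design
rng = np.random.default_rng(1)
z = rng.beta(0.5, 0.5, size=500)
x = np.sin(3 * z) + 0.1 * rng.normal(size=z.size)
t = np.linspace(0.05, 0.95, 19)
f_lc, f_ll = universal_estimators(t, z, x, h=0.1)
```

On designs with strong clustering (such as the Beta(0.5, 0.5) draw above), the $\Delta z$-weighting is what keeps both estimators stable; replacing dz by 1/n in f_lc recovers the classical Nadaraya–Watson-type normalization.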

Author Contributions

Conceptualization, Y.L. and E.Y.; data curation, V.K. and S.S.; formal analysis, P.R. and V.K.; investigation, S.S.; software, P.R. and V.K.; methodology, Y.L., I.B., P.R., V.K. and E.Y.; visualization, P.R.; writing—original draft, Y.L., I.B. and P.R.; writing—review and editing, Y.L., I.B., E.Y., P.R. and V.K. All authors have read and agreed to the published version of the manuscript.

Funding

The study of Yu. Linke, I. Borisov, and P. Ruzankin was supported within the framework of the state contract of the Sobolev Institute of Mathematics, project FWNF-2022-0009.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the three ethics committees: National Medical Research Center for Therapy and Preventive Medicine, Russian Cardiology Research-and-Production Complex, and Federal Almazov North-West Medical Research Centre.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data cannot be shared publicly because of the regulations of the Ethics Committee of the National Medical Research Center for Therapy and Preventive Medicine. Deidentified data will be provided to any qualified investigator on reasonable request. Proposals will be reviewed and approved by the researchers, the local regulatory authorities, and the Ethics Committee of the National Medical Research Center for Therapy and Preventive Medicine. Once a proposal has been approved, the data can be transferred through a secure online platform after the signing of a data access agreement and a confidentiality agreement.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fan, J.; Gijbels, I. Local Polynomial Modelling and Its Applications; Chapman and Hall: London, UK, 1996.
  2. Fan, J.; Yao, Q. Nonlinear Time Series: Nonparametric and Parametric Methods; Springer: Berlin/Heidelberg, Germany, 2003.
  3. Györfi, L.; Kohler, M.; Krzyzak, A.; Walk, H. A Distribution-Free Theory of Nonparametric Regression; Springer: Berlin/Heidelberg, Germany, 2002.
  4. Härdle, W. Applied Nonparametric Regression; Cambridge University Press: Cambridge, UK, 1990.
  5. Müller, H.-G. Nonparametric Regression Analysis of Longitudinal Data; Springer: New York, NY, USA, 1988.
  6. Chu, C.K.; Deng, W.-S. An interpolation method for adapting to sparse design in multivariate nonparametric regression. J. Statist. Plann. Inference 2003, 116, 91–111.
  7. Devroye, L.P. The uniform convergence of the Nadaraya–Watson regression function estimate. Can. J. Stat. 1979, 6, 179–191.
  8. Gasser, T.; Engel, J. The choice of weights in kernel regression estimation. Biometrika 1990, 77, 277–381.
  9. Hansen, B.E. Uniform convergence rates for kernel estimation with dependent data. Econom. Theory 2008, 24, 726–748.
  10. Härdle, W.; Luckhaus, S. Uniform consistency of a class of regression function estimators. Ann. Statist. 1984, 12, 612–623.
  11. Hong, S.Y.; Linton, O.B. Asymptotic Properties of a Nadaraya–Watson Type Estimator for Regression Functions of Infinite Order; Cemmap Working Paper No. CWP53/16; Centre for Microdata Methods and Practice (Cemmap): London, UK, 2016.
  12. Jiang, J.; Mack, Y.P. Robust local polynomial regression for dependent data. Stat. Sin. 2001, 11, 705–722.
  13. Kulik, R.; Lorek, P. Some results on random design regression with long memory errors and predictors. J. Statist. Plann. Inference 2011, 141, 508–523.
  14. Liero, H. Strong uniform consistency of nonparametric regression function estimates. Probab. Theory Relat. Fields 1989, 82, 587–614.
  15. Li, X.; Yang, W.; Hu, S. Uniform convergence of estimator for nonparametric regression with dependent data. J. Inequal. Appl. 2016, 142, 1–12.
  16. Linton, O.B.; Jacho-Chavez, D.T. On internally corrected and symmetrized kernel estimators for nonparametric regression. Test 2010, 19, 166–186.
  17. Mack, Y.P.; Silverman, B.W. Weak and strong uniform consistency of kernel regression estimates. Z. Wahrscheinlichkeitstheor. Verw. Geb. 1982, 61, 405–415.
  18. Masry, E. Nonparametric regression estimation for dependent functional data. Stoch. Proc. Appl. 2005, 115, 155–177.
  19. Müller, H.-G. Density adjusted kernel smoothers for random design nonparametric regression. Stat. Probab. Lett. 1997, 36, 161–172.
  20. Nadaraya, E.A. Remarks on non-parametric estimates for density functions and regression curves. Theory Probab. Appl. 1970, 15, 134–137.
  21. Roussas, G.G. Nonparametric regression estimation under mixing conditions. Stoch. Proc. Appl. 1990, 36, 107–116.
  22. Shen, J.; Xie, Y. Strong consistency of the internal estimator of nonparametric regression with dependent data. Stat. Probab. Lett. 2013, 83, 1915–1925.
  23. Chen, J.; Gao, J.; Li, D. Estimation in semi-parametric regression with non-stationary regressors. Bernoulli 2012, 18, 678–702.
  24. Karlsen, H.A.; Myklebust, T.; Tjøstheim, D. Nonparametric estimation in a nonlinear cointegration type model. Ann. Statist. 2007, 35, 252–299.
  25. Linton, O.; Wang, Q. Nonparametric transformation regression with nonstationary data. Econom. Theory 2016, 32, 1–29.
  26. Wang, Q.; Chan, N. Uniform convergence rates for a class of martingales with application in non-linear cointegrating regression. Bernoulli 2014, 20, 207–230.
  27. Benelmadani, D.; Benhenni, K.; Louhichi, S. Trapezoidal rule and sampling designs for the nonparametric estimation of the regression function in models with correlated errors. Statistics 2020, 54, 59–96.
  28. Benhenni, K.; Hedli-Griche, S.; Rachdi, M. Estimation of the regression operator from functional fixed-design with correlated errors. J. Multivar. Anal. 2010, 101, 476–490.
  29. Beran, J.; Feng, Y. Local polynomial estimation with a FARIMA-GARCH error process. Bernoulli 2001, 7, 733–750.
  30. Gu, W.; Roussas, G.G.; Tran, L.T. On the convergence rate of fixed design regression estimators for negatively associated random variables. Stat. Probab. Lett. 2007, 77, 1214–1224.
  31. Tang, X.; Xi, M.; Wu, Y.; Wang, X. Asymptotic normality of a wavelet estimator for asymptotically negatively associated errors. Stat. Probab. Lett. 2018, 140, 191–201.
  32. Wu, J.S.; Chu, C.K. Nonparametric estimation of a regression function with dependent observations. Stoch. Proc. Appl. 1994, 50, 149–160.
  33. Zhou, X.; Zhu, F. Asymptotics for L1-wavelet method for nonparametric regression. J. Inequal. Appl. 2020, 216, 1–11.
  34. Einmahl, U.; Mason, D.M. Uniform in bandwidth consistency of kernel-type function estimators. Ann. Statist. 2005, 33, 1380–1403.
  35. Ioannides, D.A. Consistent nonparametric regression: Some generalizations in the fixed design case. J. Nonparametr. Stat. 1993, 2, 203–213.
  36. Liang, H.-Y.; Jing, B.-Y. Asymptotic properties for estimates of nonparametric regression models based on negatively associated sequences. J. Multivar. Anal. 2005, 95, 227–245.
  37. Zhou, L.; Lin, H.; Liang, H. Efficient estimation of the nonparametric mean and covariance functions for longitudinal and sparse functional data. J. Amer. Statist. Assoc. 2018, 113, 1550–1564.
  38. Hall, P.; Müller, H.-G.; Wang, J.-L. Properties of principal component methods for functional and longitudinal data analysis. Ann. Statist. 2006, 34, 1493–1517.
  39. Kokoszka, P.; Reimherr, M. Introduction to Functional Data Analysis; Chapman and Hall/CRC: Boca Raton, FL, USA, 2017.
  40. Li, Y.; Hsing, T. Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data. Ann. Statist. 2010, 38, 3321–3351.
  41. Lin, Z.; Wang, J.-L. Mean and covariance estimation for functional snippets. J. Amer. Statist. Assoc. 2020, 117, 348–360.
  42. Yao, F. Asymptotic distributions of nonparametric regression estimators for longitudinal or functional data. J. Multivar. Anal. 2007, 98, 40–56.
  43. Yao, F.; Müller, H.-G.; Wang, J.-L. Functional data analysis for sparse longitudinal data. J. Amer. Statist. Assoc. 2005, 100, 577–590.
  44. Zhang, J.-T.; Chen, J. Statistical inferences for functional data. Ann. Statist. 2007, 35, 1052–1079.
  45. Zhang, X.; Wang, J.-L. From sparse to dense functional data and beyond. Ann. Statist. 2016, 44, 2281–2321.
  46. Zheng, S.; Yang, L.; Härdle, W. A smooth simultaneous confidence corridor for the mean of sparse functional data. J. Amer. Statist. Assoc. 2014, 109, 661–673.
  47. Hsing, T.; Eubank, R. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators; Wiley: Hoboken, NJ, USA, 2015.
  48. Müller, H.-G. Functional modelling and classification of longitudinal data. Scand. J. Statist. 2005, 32, 223–246.
  49. Wang, J.-L.; Chiou, J.-M.; Müller, H.-G. Functional data analysis. Ann. Rev. Statist. 2016, 3, 257–295.
  50. Borisov, I.S.; Linke, Y.Y.; Ruzankin, P.S. Universal weighted kernel-type estimators for some class of regression models. Metrika 2021, 84, 141–166.
  51. Linke, Y.Y. Towards insensitivity of Nadaraya–Watson estimators to design correlation. Theory Probab. Appl. 2022, 67.
  52. Linke, Y.Y.; Borisov, I.S. Insensitivity of Nadaraya–Watson estimators to design correlation. Commun. Stat. Theory Methods 2021.
  53. Linke, Y.Y.; Borisov, I.S. Constructing initial estimators in one-step estimation procedures of nonlinear regression. Statist. Probab. Lett. 2017, 120, 87–94.
  54. Linke, Y.Y. Asymptotic properties of one-step M-estimators. Commun. Stat. Theory Methods 2019, 48, 4096–4118.
  55. Linke, Y.Y.; Borisov, I.S. Constructing explicit estimators in nonlinear regression problems. Theory Probab. Appl. 2018, 63, 22–44.
  56. Cai, T.T.; Yuan, M. Optimal estimation of the mean function based on discretely sampled functional data: Phase transition. Ann. Statist. 2011, 39, 2330–2355.
  57. Wu, H.; Zhang, J.-T. Nonparametric Regression Methods for Longitudinal Data Analysis: Mixed-Effects Modeling Approaches; John Wiley and Sons: Hoboken, NJ, USA, 2006.
  58. Cao, G.; Wang, L.; Li, Y.; Yang, L. Oracle-efficient confidence envelopes for covariance functions in dense functional data. Stat. Sin. 2016, 26, 359–383.
  59. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: New York, NY, USA, 2009.
  60. Shalnova, S.A.; Drapkina, O.M. Significance of the ESSE-RF study for the development of prevention in Russia. Cardiovasc. Ther. Prev. 2020, 19, 2602. (In Russian)
  61. Shalnova, S.A.; Kutsenko, V.A.; Kapustina, A.V.; Yarovaya, E.B.; Balanova YuA, E.S.; Imaeva, A.E.; Maksimov, S.A.; Muromtseva, G.A.; Kulakova, N.V.; Kalachikova, O.N.; et al. Associations of Blood Pressure and Heart Rate and Their Contribution to the Development of Cardiovascular Complications and All-Cause Mortality in the Russian Population of 25–64 Years. Ration. Pharmacother. Cardiol. 2020, 16, 759–769. (In Russian)
  62. Chentsov, N.N. Weak convergence of stochastic processes whose trajectories have no discontinuities of the second kind and the heuristic approach to the Kolmogorov–Smirnov tests. Theory Probab. Appl. 1956, 1, 140–144.
Figure 1. Example 2. Sample observations, target function, and two estimators.
Figure 2. The maximum (left) and mean-squared (right) errors in Example 2. For the mean-squared error, the random forest model performed worse (10.97 (10.55, 11.39)) than the GAM model and the kernel estimators, so the results of the random forest model “did not fit” into the plot.
Figure 3. Example 3. Sample observations, target function, and two estimators.
Figure 4. The maximum (left) and mean-squared (right) errors in Example 3. For the mean-squared error, the random forest model performed worse (6.699 (6.412, 7.046)) than the GAM model and the kernel estimators, so the results of the random forest model “did not fit” into the plot.
Figure 5. Example 4. Sample observations, target function, and two estimators.
Figure 6. The maximum (left) and mean-squared (right) errors in Example 4. As before, for the mean-squared error, the results of the random forest model (13.95 (11.69, 16.18)) are not shown in full on the graph. In addition, the outliers for the GAM, NW, ULC, and ULL estimators are “cut off” in this graph.
Figure 7. The maximum (left) and mean-squared (right) errors in Example 5. As before, for the mean-squared error, the results of the random forest model are not shown in full on the graph. In addition, the outliers for the NW, ULC, and ULL estimators are “cut off” in this graph.
Figure 8. Mean-squared prediction error of the dependence of BP on HR.