Article

Estimation of Star-Shaped Distributions

by Eckhard Liebscher 1,* and Wolf-Dieter Richter 2
1 Department of Engineering and Natural Sciences, University of Applied Sciences Merseburg, 06217 Merseburg, Germany
2 Institute of Mathematics, University of Rostock, Ulmenstraße 69, Haus 3, 18057 Rostock, Germany
* Author to whom correspondence should be addressed.
Risks 2016, 4(4), 44; https://doi.org/10.3390/risks4040044
Submission received: 2 September 2016 / Accepted: 18 November 2016 / Published: 30 November 2016

Abstract:
Scatter plots of multivariate data sets motivate modeling of star-shaped distributions beyond elliptically contoured ones. We study properties of estimators for the density generator function, the star-generalized radius distribution and the density in a star-shaped distribution model. For the generator function and the star-generalized radius density, we consider a non-parametric kernel-type estimator. This estimator is combined with a parametric estimator for the contours which are assumed to follow a parametric model. Therefore, the semiparametric procedure features the flexibility of nonparametric estimators and the simple estimation and interpretation of parametric estimators. Alternatively, we consider pure parametric estimators for the density. For the semiparametric density estimator, we prove rates of uniform, almost sure convergence which coincide with the corresponding rates of one-dimensional kernel density estimators when excluding the center of the distribution. We show that the standardized density estimator is asymptotically normally distributed. Moreover, the almost sure convergence rate of the estimated distribution function of the star-generalized radius is derived. A particular new two-dimensional distribution class is adapted here to agricultural and financial data sets.

1. Introduction

The classes of multivariate Gaussian and elliptically contoured distributions have served as the probabilistic basis of many multivariate statistical models over a period of several decades. Accounts of the theory of elliptically contoured distributions may be found in [1,2,3]. The book [4] by Fang and Anderson contains an extensive chapter on statistical inference for elliptically contoured distributions. The theory of elliptically contoured distributions, including applications to portfolio theory, is presented in the monograph [5] by Gupta et al. Combining the advantages of several estimators, semiparametric density estimators for elliptical distributions were derived in the papers [6,7,8] by Stute and Werner, by Cui and He, and by Liebscher. In [9], Battey and Linton considered a density estimator for elliptical distributions based on Gaussian mixture sieves. The performance of their estimator depends heavily on how well the density can be approximated by a mixture of normal distributions. Scatter plots of multivariate data sets, however, motivate modeling of star-shaped distributions beyond elliptically contoured ones.
The more flexible star-shaped densities were studied in [10] and later in [11]. The general structure of their normalizing constant given a density generating function was discovered, a geometric measure representation and, based upon it, a stochastic representation were derived, and a survey of applications of such densities was given in [12]. Moreover, two-dimensional non-concentric elliptically contoured distributions are introduced there and, based upon two-dimensional star-shaped densities, a universal star-shaped generalization of the univariate von Mises density is derived. These results are studied further in detail in [13] for several particular classes. The large classes of norm- and antinorm-contoured distributions, being particular cases of star-shaped distributions, are considered in [14,15] for dimension two and for arbitrary finite dimension, respectively. In this paper we study several of those classes of distributions for arbitrary finite dimension and introduce a particular new class of distributions for dimension two. The rather general class of distributions considered in the present paper covers distributions with convex as well as non-convex contours.
The main goal of this paper is to develop estimation procedures for fitting multivariate generalized star-shaped distributions. The semiparametric procedure combines the flexibility of nonparametric estimators and the simple estimation and interpretation of parametric estimators. Since we apply nonparametric estimation to a univariate function, we avoid the disadvantages of nonparametric estimators connected with the curse of dimensionality. The semiparametric approach of this paper is based on that of the earlier paper [7] on elliptical distributions but uses partially weaker assumptions. Alternatively, we consider a pure parametric method. In both cases, a parametric model is assumed for the density contours given by the star body and its Minkowski functional. For the semiparametric method, we assume that the contours are smooth; more precisely, that the Minkowski functional is continuously differentiable. The parameters are estimated using a method of moments. The star-generalized radius density is estimated nonparametrically by use of a kernel density estimator, or parametrically.
The paper is structured as follows. The class of continuous star-shaped distributions and several of its subclasses are considered in Section 2. Section 3 deals with the estimation of the density and the star-generalized radius distribution. There, statements on convergence rates and on the asymptotic normality of the density estimator, as well as on the convergence rate of the estimated distribution function of the generalized radius, are provided. First the case of a given star body is considered; later the more general case of a parametrized star body is taken into consideration. Section 3.4 surveys, on the one hand, examples where different subclasses of star-shaped distributions appear in practice and deals, on the other hand, with applications of the methods developed here to the analysis of two-dimensional agricultural and financial data. The proofs can be found in Section 4.

2. Continuous Star-Shaped Distributions

2.1. The General Distribution Class

Throughout this paper, $K \subset \mathbb{R}^d$ denotes a star body, i.e., a non-empty star-shaped set that is compact and is equal to the closure of its interior, having the origin as an interior point. The Minkowski functional of K is defined by
$$h_K(x) = \inf\{\lambda \geq 0 : x \in \lambda K\} \quad \text{for } x \in \mathbb{R}^d.$$
The boundary of K is just the set $\{x \in \mathbb{R}^d : h_K(x) = 1\}$. Further, we find a ball $\{y \in \mathbb{R}^d : \|y\| \leq r\}$ which covers K, where $\|\cdot\|$ denotes the Euclidean norm. Hence $h_K\bigl(x\,\|x\|^{-1}\bigr) \geq 1/r$ and
$$h_K(x) \geq \frac{1}{r}\,\|x\|.$$
The function $h_K$ is assumed to be homogeneous of degree one,
$$h_K(\lambda x) = \lambda\, h_K(x) \quad \text{for } x \in \mathbb{R}^d,\ \lambda > 0,$$
and to satisfy a further assumption.
A countable collection $\mathcal{F} = \{C_1, C_2, \dots\}$ of pairwise disjoint sectors (closed convex cones $C_j$ containing no half-space, with non-empty interior and vertex at the origin $0_d$) such that $\mathbb{R}^d = \bigcup_j C_j$ will be called a fan. By $\mathfrak{B}^d$ we denote the Borel σ-field in $\mathbb{R}^d$ and by S the boundary of K. We denote $S_j = S \cap C_j$, $S_j \cap \mathfrak{B}^d = \mathfrak{B}_{S,j}$ and $\mathfrak{B}_S = \sigma\{\mathfrak{B}_{S,1}, \mathfrak{B}_{S,2}, \dots\}$. We shall consider only star bodies K and sets $A \in \mathfrak{B}_S$ satisfying the following condition.
Assumption 1.
The star body K and the set $A \in \mathfrak{B}_S$ are chosen such that for every j the set
$$G(A \cap S_j) = \bigl\{\vartheta \in \mathbb{R}^{d-1} : \exists\,\eta \text{ with } \theta = (\vartheta^T, \eta)^T \in A \cap S_j\bigr\}$$
is well defined and such that for every $\vartheta = (\vartheta_1, \dots, \vartheta_{d-1})^T \in G(A \cap S_j)$ there is a uniquely determined $\eta > 0$ satisfying $h_K\bigl((\vartheta_1, \dots, \vartheta_{d-1}, \eta)^T\bigr) = 1$.
A star body K satisfying this assumption will be called, for short, an $A_1$-star-body.
Let $g : [0, +\infty) \to [0, +\infty)$ be a nonnegative function which fulfills the condition
$$0 < \int_{\mathbb{R}^d} g\bigl(h_K(x)\bigr)\,dx < \infty.$$
Such a function is called a density generating function (dgf).
We consider the class of continuous star-shaped distributions of random vectors X taking values in $\mathbb{R}^d$:
$$\mathrm{CStSh}(d) = \bigl\{\Phi_{g,K,\mu} : \mu \in \mathbb{R}^d,\ K \text{ is an } A_1\text{-star-body with } 0_d \in \operatorname{int} K,\ g \text{ is a dgf}\bigr\},$$
where $\operatorname{int} K$ denotes the interior of K. Suppose that the distribution law $\Phi_{g,K,\mu}$ has the density
$$\varphi_{g,K,\mu}(x) = C(g,K)\; g\bigl(h_K(x-\mu)\bigr) \quad \text{for } x \in \mathbb{R}^d, \tag{2}$$
where $C(g,K)$ is a suitable normalizing constant. Moreover, K is called the contour defining star body of $\varphi_{g,K,\mu}$.
We consider the random vector X having the density (2) (in symbols $X \sim \Phi_{g,K,\mu}$). According to Theorem 8 in [12], this random vector has the representation
$$X - \mu \stackrel{d}{=} R\,U, \tag{3}$$
where the star-generalized radius variable $R = h_K(X - \mu)$ and the star-generalized uniform basis vector $U = \frac{1}{R}\,(X - \mu)$ are independent.
Moreover, R has the density
$$f_R(r) = \frac{1}{I(g,d)}\; r^{d-1} g(r) \tag{4}$$
with $I(g,d) = \int_0^\infty r^{d-1} g(r)\,dr$, and U has a star-generalized uniform probability distribution on the boundary of K, i.e., $P(U \in A) = O_S(A)/O_S(S)$ for $A \in \mathfrak{B}_S$. According to [12], $O_S$ means the star-generalized surface measure, which is a non-Euclidean one unless S is the Euclidean sphere, and which is well defined if Assumption 1 is fulfilled. Note that $O_S(S) = d \cdot \operatorname{vol}(K)$. If $\lim_{r \to 0+} g(r) > 0$ is finite, then in view of (4), R takes values in the neighbourhood of zero with a rather small probability. This behaviour is called the volcano effect and is the stronger the higher the dimension is. The density (2) may be written as
$$\varphi_{g,K,\mu}(x) = \frac{1}{O_S(S)}\; h_K(x - \mu)^{1-d}\; f_R\bigl(h_K(x - \mu)\bigr), \quad x \in \mathbb{R}^d. \tag{5}$$
Estimating such a density may be studied under various assumptions concerning the degree of knowledge of the groups of parameters $K$, $h_K$, $O_S(S)$ and $f_R$, as well as μ. Let $X = (X^{(1)}, \dots, X^{(d)})^T$ and $U = (U_1, \dots, U_d)^T$. The next lemma gives helpful information about the mean and the covariances. Here and in what follows, $\mathbf{1}\{A\}$ denotes the indicator function of an event A.
Lemma 1.
If $E R^2 < +\infty$ and K is symmetric w.r.t. the origin, then
$$EX = \mu, \quad \text{and} \quad \operatorname{Cov}\bigl(X^{(j)}, X^{(k)}\bigr) = E(U_j U_k)\; E R^2 \quad \text{for } j,k = 1, \dots, d.$$
Proof. 
In view of (3), we have first to show that $EU = 0$. Because of the symmetry of K, U has the same distribution as $-U$, and $U_j = 0$ holds only with probability 0 for each j. Thus we obtain, for each j,
$$E X^{(j)} - \mu_j = ER\; EU_j = ER\,\Bigl(E\,U_j\,\mathbf{1}\{U_j > 0\} + E\,U_j\,\mathbf{1}\{U_j < 0\}\Bigr) = ER\,\Bigl(E\,U_j\,\mathbf{1}\{U_j > 0\} - E\,U_j\,\mathbf{1}\{U_j > 0\}\Bigr) = 0.$$
Moreover, it follows that
$$\operatorname{Cov}\bigl(X^{(j)}, X^{(k)}\bigr) = E\bigl(R^2\, U_j U_k\bigr) = E(U_j U_k)\; E R^2. \qquad \square$$
The general approach followed here includes non-convex bodies which can occur in applications. Obviously, h K ( U ) = 1 and the distribution of the random vector U is concentrated on the set { u R d : h K ( u ) = 1 } .

2.2. A Class of Two-Dimensional Distributions Whose Contour Defining Star Bodies Are Squared Sine Transformed Euclidean Circles

We define $\alpha(u,v) \in (-\pi, \pi]$ to be the angle in radians between the positive x-axis and the line through the point $(u,v)$ and the origin: $\alpha(u,v) = \arctan(v/u)$ for $u > 0$, $\alpha(u,v) = \arctan(v/u) + \operatorname{sgn}(v)\,\pi$ for $u < 0$, $\alpha(0,v) = \frac{\pi}{2} \cdot \operatorname{sgn}(v)$. The Minkowski functional of any two-dimensional star body K can then be written as
$$h_K\bigl((u,v)^T\bigr) = \sqrt{u^2 + v^2}\; H\bigl(\alpha(u,v)\bigr),$$
where $H : (-\pi, \pi] \to (0, \infty)$ is a bounded function. In the following examples we consider two-dimensional star bodies with smooth boundaries; i.e., H is differentiable. Here, the following generator function is used,
$$g_0(r) = \frac{1}{3}\,(1 + r)\, e^{-r},$$
which corresponds to the star-generalized radius density of mixed Erlang type
$$f_0^R(r) = \frac{1}{3}\,\bigl(r + r^2\bigr)\, e^{-r}.$$
Example 1.
Here we consider the Minkowski functional
$$h_K\bigl((u,v)^T\bigr) = \sqrt{u^2 + v^2}\,\left(1 + a\,\frac{v^2}{u^2 + v^2}\right) = \sqrt{u^2 + v^2}\,\bigl(1 + a \sin^2 \alpha(u,v)\bigr),$$
where $a \in (-1, +\infty)$ is a parameter. Figure 1, Figure 2, Figure 3 and Figure 4 show the contour lines of the boundary of the body for several values of a and the resulting density for one choice of a. These figures show that the distribution class includes densities with convex as well as non-convex contours.
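As a quick numerical illustration (ours, not part of the paper), the Minkowski functional of Example 1 can be evaluated directly; the sketch below checks its homogeneity of degree one and scales a point onto the boundary $S = \{x : h_K(x) = 1\}$. The parameter value is arbitrary.

```python
import numpy as np

def h_K(u, v, a):
    # Minkowski functional of Example 1:
    # h_K((u,v)^T) = sqrt(u^2 + v^2) * (1 + a * v^2 / (u^2 + v^2))
    r2 = u**2 + v**2
    if r2 == 0.0:
        return 0.0
    return np.sqrt(r2) * (1.0 + a * v**2 / r2)

a = 0.8              # illustrative parameter, a > -1
x = (0.7, -1.2)

# homogeneity of degree one: h_K(lambda * x) = lambda * h_K(x)
lam = 3.5
assert np.isclose(h_K(lam * x[0], lam * x[1], a), lam * h_K(*x, a))

# scaling any x != 0 by 1/h_K(x) lands on the boundary S = {h_K = 1}
s = 1.0 / h_K(*x, a)
assert np.isclose(h_K(s * x[0], s * x[1], a), 1.0)
```

For $a < 0$ close to $-1$ the contours become non-convex, which is exactly the behaviour shown in the figures.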
Example 2.
We consider the star body K with Minkowski functional
$$h_K\bigl((u,v)^T\bigr) = \sqrt{u^2 + v^2}\,\left(1 + a_1\,\frac{(u - a_2 v)^2}{u^2 + v^2}\right),$$
where $a_1, a_2 \in \mathbb{R}$ are parameters such that $1 + a_1\bigl(1 + a_2^2\bigr) > 0$, and
$$H(\alpha) = 1 + a_1\,\frac{(1 - a_2 \tan\alpha)^2}{1 + \tan^2\alpha}.$$
This star body arises from a rotation of K in Example 1 by an angle $\alpha_0$ where $a_2 = 1/\tan\alpha_0$. In Figure 5 and Figure 6 the boundaries of (multiples of) K are depicted.
A specific motivation for considering a star body K as in Example 2 arises when studying dataset 5 of [38]; see Figure 10 below.

2.3. Norm-Contoured Distributions

Specific norm-contoured distributions were studied in several papers which are in part surveyed in Richter [14]. A geometric measure representation of arbitrary norm contoured distributions is proved in [15]. The class of all norm-contoured distributions is denoted, according to these papers, by NC . The class NC is a subclass of the class StSh of star-shaped distributions. Here, we consider the subclass of continuous norm-contoured distributions CNC.
It is well known that there is a one-to-one correspondence between the class of convex bodies which are symmetric w.r.t. the origin, where $x \in K$ implies $-x \in K$, and the class of norms in $\mathbb{R}^d$. If K is any such symmetric convex body then $h_K(x) = \|x\|$ where $\|\cdot\|$ is the uniquely determined norm. On the other hand, if $\|\cdot\|$ is any norm, then $K = \{x : \|x\| \leq 1\}$ is the corresponding convex symmetric body having the origin as interior point, and $\|x\| = h_K(x)$.
Throughout this section, let $\|\cdot\|$ be any norm and $K = \{x : \|x\| \leq 1\}$, and let the density of a norm-contoured distribution be
$$\varphi\bigl(x;\, g, \|\cdot\|, O, \mu\bigr) = C(g, \|\cdot\|)\; g\bigl(\|O^{-1}(x - \mu)\|\bigr) \quad \text{for } x \in \mathbb{R}^d,$$
where O is any orthogonal $d \times d$-matrix. Because any rotated or mirrored norm ball is again a norm ball, we shall restrict our attention to the case of O being the unit matrix and will then write $X \sim \mathrm{CNC}(g, \|\cdot\|, \mu)$. In the present situation, $S = \{x : \|x\| = 1\}$ is considered to be the unit sphere in the Minkowski space $(\mathbb{R}^d, \|\cdot\|)$.
In the following we consider several specific cases of norms and the corresponding norm-contoured distributions.
Example 3.
If K is the Euclidean unit ball then $h_K$ is the Euclidean norm and (2) defines a shifted spherical distribution.
Example 4.
Let $A \in \mathbb{R}^{d,d}$ be a $d \times d$-matrix satisfying $\det(A) > 0$, $\|\cdot\|$ any norm, and
$$\|x\|^* = \|A x\|$$
another norm. If X is $\|\cdot\|$-contoured distributed and A is symmetric and positive definite, then we call the distribution of $AX$ an elliptically generalized $\|\cdot\|$-contoured distribution.
Example 5.
Let $a = (a_1, \dots, a_d)^T$ be a vector with $a_i > 0$, $i = 1, \dots, d$, and $A = \operatorname{diag}(1/a_1, \dots, 1/a_d)$. If, in Example 4, $\|\cdot\|$ is the p-norm, $p \geq 1$, then the corresponding norm $\|\cdot\|^*$ is
$$\|x\|^* = \left(\sum_{i=1}^d \left|\frac{x_i}{a_i}\right|^p\right)^{1/p}.$$
The distribution of X is called an axes-aligned p-generalized elliptically contoured distribution.
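The $(a,p)$-norm of Example 5 is easy to compute; the following sketch (our illustration, with arbitrary values of a and p) verifies the two defining norm properties numerically, homogeneity and the triangle inequality for $p \geq 1$.

```python
import numpy as np

def apnorm(x, a, p):
    # ||x||* = (sum_i |x_i / a_i|^p)^(1/p), the axes-aligned p-generalized norm
    return np.sum(np.abs(np.asarray(x) / np.asarray(a)) ** p) ** (1.0 / p)

a = np.array([1.0, 2.0, 0.5])     # illustrative scale vector, a_i > 0
p = 1.7
x = np.array([0.3, -1.1, 0.2])
y = np.array([-0.6, 0.4, 0.9])

assert np.isclose(apnorm(3.0 * x, a, p), 3.0 * apnorm(x, a, p))  # homogeneity
assert apnorm(x + y, a, p) <= apnorm(x, a, p) + apnorm(y, a, p)  # triangle inequality
```

For $p \in (0,1)$ the same functional satisfies the reverse triangle inequality in the sectors of the coordinate fan and becomes the antinorm of Example 9 below is not claimed here; only the norm case $p \geq 1$ is checked.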
Example 6.
Assume that data are grouped into k groups, and let $k > 1$, $n = n_1 + \dots + n_k$, $n_i \geq 1$, $p_i \geq 1$, $i = 1, \dots, k$, and
$$h_K(x) = \sum_{j=1}^{k} \left(\sum_{i=n_1+\dots+n_{j-1}+1}^{n_1+\dots+n_j} \left|\frac{x_i}{a_i}\right|^{p_j}\right)^{1/p_j}.$$
K may then be called an $(a, p_1, \dots, p_k)$-generalized axis-aligned ellipsoid, and we will say that X follows a grouped $(a, p_1, \dots, p_k)$-generalized axis-aligned elliptically contoured distribution in $\mathbb{R}^n$.
Example 7.
In the case of two-dimensional observations, let $P_n$ denote the polygon having the n vertices $I_{n,i} = \left(\cos\frac{2\pi}{n}(i-1),\; \sin\frac{2\pi}{n}(i-1)\right)^T$, $i = 1, \dots, n$, $n \geq 3$. The convex body which is circumscribed by $P_n$ will be denoted by K. Then $h_K$ is a norm defined in $\mathbb{R}^2$ and $\varphi_{g,K,0}$ a polygonally contoured density, which was used implicitly in [13] to construct a corresponding geometric generalization of the von Mises density. For the more general class of multivariate polyhedral star-shaped distributions, see [16].
Example 8.
Given a homogeneous polynomial p of degree k with $p(|x_1|, \dots, |x_d|) \geq 0$, the function $N(x) := p(|x_1|, \dots, |x_d|)^{1/k}$ defines a norm in $\mathbb{R}^d$ if it is subadditive. An example for a homogeneous polynomial of degree 3 and $d = 2$ is $p(x_1, x_2) = x_1^3 + x_2^3 + 2 x_1^2 x_2$.

2.4. Antinorm-Contoured Distributions

A function $g : \mathbb{R}^n \to [0, \infty)$ which is continuous, positively homogeneous, non-degenerate and superadditive in some fan is called an antinorm in [17]. Thereby, g is called superadditive in a sector C or in the fan $\mathcal{F}$ if it satisfies the reverse triangle inequality in C or in every sector of the fan $\mathcal{F}$, respectively.
Example 9.
If the $(a,p)$-functional $|\cdot|_{a,p}$ is defined as $\|\cdot\|^*$ in Example 5 but with $p \in (0,1)$, then it is an antinorm.
For geometric measure representations of elements from a large class of continuous antinorm-contoured distributions we refer to [14,15]. For figures of two-dimensional antinorm balls, see [17].

2.5. Continuous Non-Concentric Elliptically Contoured Distributions

Let $a = (a_1, \dots, a_d)^T$, $a_i > 0$, $i = 1, \dots, d$, and $K_a = \bigl\{x \in \mathbb{R}^d : \sum_{i=1}^d (x_i/a_i)^2 \leq 1\bigr\}$. If $e = (e_1, \dots, e_d)^T$ satisfies $\sum_{i=1}^d (e_i/a_i)^2 < 1$ then $K_{a,e} = K_a - e$ is a star body having the origin as an interior point, and
$$r\,K_{a,e} = \left\{x \in \mathbb{R}^d : \sum_{i=1}^d \left(\frac{x_i + r e_i}{r a_i}\right)^2 \leq 1\right\} = K_{ra,\,re}.$$
Moreover, $r_1 K_{a,e} \subseteq r_2 K_{a,e}$ for $r_1 \leq r_2$. A Minkowski functional $h_{K_{a,e}}$ which is homogeneous of degree one will be called a non-concentric elliptically contoured function and $\varphi_{g,K_{a,e},\mu}$ a non-concentric elliptically contoured density. If $O : \mathbb{R}^d \to \mathbb{R}^d$ denotes an arbitrary orthogonal transformation then $h_{O K_{a,e}}$ is also a non-concentric elliptically contoured function which is homogeneous of degree one. For the special case $d = 2$ see [12,13].

3. Estimation for Continuous Star-Shaped Distributions

3.1. Parametric Estimators

Let $X_1, \dots, X_n$ be a sample of independent random vectors, where $X_i \sim \Phi_{g,K,\mu}$ and $X_i = (X_{i1}, \dots, X_{id})^T$. Assume that the star body K is given and Assumption 1 is satisfied. From now on, we suppose that K is symmetric w.r.t. the origin. We consider a model family $\{f_\theta : \theta \in \Theta_1\}$ of continuously differentiable densities for the star-generalized radius R on $[0, \infty)$, see (4). $\Theta = \Theta_1 \times \Theta_2$, $\Theta_1 \subset \mathbb{R}^q$, $\Theta_2 \subset \mathbb{R}^d$ is the parameter space, which is assumed to be compact. Suppose that $h_K(\cdot)$ is a continuous function.
Next we give two reasonable model classes for f θ :
(1) Modified exponential model: $\theta = \tau \in (0, +\infty)$,
$$f_\tau(r) = \frac{1}{(d+1)\,(d-1)!\;\tau^d}\; r^{d-1}\left(1 + \frac{r}{\tau}\right) e^{-r/\tau} \quad \text{for } r > 0$$
with the expectation
$$\int_0^\infty r\, f_\theta(r)\,dr = \frac{d\,(d+2)\,\tau}{d+1}. \tag{6}$$
(2) Weibull model: $\theta = (\tau, a) \in (0, +\infty) \times (1, +\infty)$,
$$f_\theta(r) = \frac{a}{\tau^d\,\Gamma(d/a)}\; r^{d-1}\, e^{-(r/\tau)^a} \quad \text{for } r > 0$$
with the expectation
$$\int_0^\infty r\, f_\theta(r)\,dr = \frac{\Gamma\!\left(\frac{d+1}{a}\right)\tau}{\Gamma\!\left(\frac{d}{a}\right)}.$$
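Both model densities and their stated expectations can be cross-checked numerically; the sketch below (our illustration, with arbitrary parameter values) compares the closed-form first moments with numerical integrals.

```python
import math
import numpy as np

def f_mod_exp(r, tau, d):
    # modified exponential model (1)
    c = 1.0 / ((d + 1) * math.factorial(d - 1) * tau**d)
    return c * r**(d - 1) * (1.0 + r / tau) * np.exp(-r / tau)

def f_weibull(r, tau, a, d):
    # Weibull model (2)
    c = a / (tau**d * math.gamma(d / a))
    return c * r**(d - 1) * np.exp(-(r / tau)**a)

def trapezoid(y, dr):
    # simple composite trapezoid rule
    return dr * (y.sum() - 0.5 * (y[0] + y[-1]))

d, tau, a = 2, 1.1, 1.5
r = np.linspace(0.0, 60.0, 400_001)
dr = r[1] - r[0]

# both densities integrate to one
assert abs(trapezoid(f_mod_exp(r, tau, d), dr) - 1.0) < 1e-6
assert abs(trapezoid(f_weibull(r, tau, a, d), dr) - 1.0) < 1e-6

# the closed-form expectations match the numerical first moments
m1 = trapezoid(r * f_mod_exp(r, tau, d), dr)
m2 = trapezoid(r * f_weibull(r, tau, a, d), dr)
assert abs(m1 - d * (d + 2) * tau / (d + 1)) < 1e-6
assert abs(m2 - math.gamma((d + 1) / a) * tau / math.gamma(d / a)) < 1e-6
```

The expectation formula (6) is what the moment estimator of τ in Section 3.2.5 relies on.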
Let $f_R \in \{f_\theta : \theta \in \Theta_1\}$. In this section the aim is to fit the specific parametric model for the density $\varphi_{g,K,\mu}$ to the data by estimating the parameters θ and μ, where $\varphi_{g,K,\mu}$ is given according to (5) and (4) with $f_R = f_\theta$. Note that the two models (1) and (2) fulfill the condition $\lim_{r \to 0+} g'(r) = 0$, which ensures the differentiability of the density $\varphi_{g,K,\mu}$ at zero.
For the statistical analysis we suppose that the data $X_1, \dots, X_n$ are given and comprise independent random vectors having density $\varphi_{g,K,\mu}$. Suppose that θ and μ are interior points of $\Theta_1$ and $\Theta_2$, respectively. The concentrated log-likelihood function (constant addends can be omitted) reads as follows:
$$L(\theta, \mu) = \sum_{i=1}^n \Bigl(\ln f_\theta\bigl(h_K(X_i - \mu)\bigr) + (1-d)\,\ln h_K(X_i - \mu)\Bigr).$$
We introduce the maximum likelihood estimators $\hat\theta_n, \hat\mu_n$ of θ and μ as joint maximizers of the likelihood function:
$$L(\hat\theta_n, \hat\mu_n) = \max_{(\theta,\mu) \in \Theta} L(\theta, \mu).$$
Under appropriate assumptions, the maximum likelihood estimators are asymptotically normally distributed (cf. Theorem 5.1 in [18], p. 463):
$$\sqrt{n}\,\bigl(\hat\theta_n - \theta,\; \hat\mu_n - \mu\bigr)^T \stackrel{d}{\longrightarrow} N\bigl(0,\; I(\theta,\mu)^{-1}\bigr) \quad \text{for } n \to \infty,$$
where $\stackrel{d}{\longrightarrow}$ is the symbol for convergence in distribution and the information matrix is given by $I(\theta,\mu) = \bigl(I_{ij}(\delta)\bigr)_{i,j=1,\dots,d+q}$ with $\delta^T = (\delta_1, \dots, \delta_{d+q}) = (\theta^T, \mu^T)$ and
$$I_{ij}(\delta) = -E\,\frac{\partial^2}{\partial\delta_i\,\partial\delta_j}\Bigl(\ln f_\theta\bigl(h_K(X_k - \mu)\bigr) + (1-d)\,\ln h_K(X_k - \mu)\Bigr).$$
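The joint maximization of $L(\theta,\mu)$ can be sketched numerically. The example below is our illustration under simplifying assumptions: K is the Euclidean unit ball (so $h_K = \|\cdot\|_2$ and U is a uniformly distributed direction), the radius follows the Weibull model, and R is generated as $\tau\,G^{1/a}$ with $G \sim \mathrm{Gamma}(d/a, 1)$, which reproduces the Weibull-model density.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(1)
d, n = 2, 5000
tau0, a0, mu0 = 1.1, 1.5, np.array([0.5, -0.3])   # illustrative true values

# sample X = mu + R*U for K the Euclidean unit ball
U = rng.standard_normal((n, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)
R = tau0 * rng.gamma(d / a0, 1.0, size=n) ** (1.0 / a0)   # Weibull-model radius
X = mu0 + R[:, None] * U

def neg_log_lik(params):
    tau, a, mu = params[0], params[1], params[2:]
    if tau <= 0.0 or a <= 1.0:
        return np.inf
    r = np.linalg.norm(X - mu, axis=1)
    # ln f_theta(r) + (1 - d) ln r, summed over the sample
    log_f = (np.log(a) - d * np.log(tau) - gammaln(d / a)
             + (d - 1) * np.log(r) - (r / tau) ** a)
    return -np.sum(log_f + (1.0 - d) * np.log(r))

# the sample mean is a natural starting value for mu (cf. Lemma 1)
x0 = [1.0, 1.3, *X.mean(axis=0)]
res = minimize(neg_log_lik, x0=x0, method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})
print(res.x)   # close to (tau0, a0, mu0) up to sampling error
```

Standard errors could then be obtained from the inverse of a numerical Hessian of the negative log-likelihood, in line with the asymptotic normality statement above.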

3.2. Nonparametric Estimators without Scale Fit

In the present section we deal with nonparametric estimators in the context of star-shaped distributions. This type of estimator is of special interest if no suitable parametric model can be found. The cdf of R will be denoted by $F_R$.

3.2.1. Estimating μ and F R

Let X 1 , , X n be the sample as in Section 3.1. In the following the focus is on the estimation of the parameter μ and the distribution function of the generalized radius R.
First we choose an estimator for μ. For this purpose we assume that $E\|X\| < +\infty$. In view of Lemma 1,
$$\hat\mu_n = \frac{1}{n}\sum_{i=1}^n X_i \tag{7}$$
is an unbiased estimator for the unknown parameter μ. Define $R_i = h_K(X_i - \mu)$ and $\hat R_i = h_K(X_i - \hat\mu_n)$ for $i = 1, \dots, n$. Using this definition, an estimator for the cdf of $R = h_K(X - \mu)$ (cf. (3)) is given by the formula
$$\hat F_n^R(r) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\bigl\{\hat R_i \leq r\bigr\}$$
for $r \geq 0$. At first glance, $\hat F_n^R(r)$ just approximates the empirical distribution function
$$F_n^R(r) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\bigl\{R_i \leq r\bigr\},$$
which is not available from the data because of the unknown μ. We can prove that $\hat F_n^R$ converges to $F_R$ almost surely, in fact at the same rate at which every common empirical distribution function converges to the cdf. This is the assertion of the following theorem.
Theorem 1.
Suppose that Assumption 1 is satisfied, $h_K(\cdot)$ is Lipschitz continuous on $\mathbb{R}^d$, and
$$\int_0^\infty r^{d+1}\, g(r)\,dr < +\infty. \tag{9}$$
If further $f_R$ is bounded on $[0, +\infty)$, then, for $n \to \infty$,
$$\sup_{r \geq 0}\,\bigl|\hat F_n^R(r) - F_R(r)\bigr| = O\!\left(\sqrt{\frac{\ln\ln n}{n}}\,\right) \quad a.s.$$
Here the condition (9) ensures that $E R^2 < +\infty$, which in turn is an assumption for the law of the iterated logarithm for $\hat\mu_n$.

3.2.2. Density Estimation

In the remainder of Section 3.2, we establish an estimator for the density $\varphi_{g,K,\mu}$ in the case of a bounded generator function g, and provide statements on convergence properties of the estimator. An estimator for μ is available by Formula (7); the estimation of g is still an open problem. If we want to estimate g, then it is necessary that this function is identifiable. In (2), however, the function g is determined only up to a constant factor. Therefore, we require $I(g,d) = 1$ to obtain uniqueness and identifiability. As a consequence, we get, according to [12],
$$C(g,K) = \frac{1}{O_S(S)}.$$
In the following we adapt the approach introduced in Section 2 of [7] to the present, much more general situation. This approach combines the advantages of two estimators and avoids their disadvantages. Let $\psi : [0,\infty) \to [0,+\infty)$ be a function having a derivative $\psi'$ with $\psi'(y) > 0$ for $y \geq 0$, and the property $\psi(0) = 0$. We introduce the random variable $Y = \psi\bigl(h_K(X - \mu)\bigr)$ and denote the inverse function of ψ by Ψ. The transformation using ψ is applied to adjust the volcano effect described above. In view of (4), the density χ of $Y = \psi(R)$ is given by
$$\chi(y) = \Psi(y)^{d-1}\; g\bigl(\Psi(y)\bigr) \cdot \Psi'(y)$$
for $y \geq 0$. This equation implies the following formula for g:
$$g(z) = z^{1-d}\; \psi'(z)\; \chi\bigl(\psi(z)\bigr).$$
The next step is to establish the estimator for χ. Nonparametric estimators have the advantage that they are flexible and there is no need to assume a specific model. Let us consider the transformed sample $Y_{1n}, \dots, Y_{nn}$ with $Y_{in} = \psi(\hat R_i)$. Further, we apply the following kernel density estimator for χ:
$$\hat\chi_n(y) = n^{-1} b^{-1} \sum_{i=1}^n \Bigl(k\bigl((y - Y_{in})\,b^{-1}\bigr) + k\bigl((y + Y_{in})\,b^{-1}\bigr)\Bigr) \quad \text{for } y \geq 0, \tag{10}$$
where $b = b(n)$ is the bandwidth and k the kernel function. Note that $\hat\chi_n$ represents the usual kernel density estimator for χ based upon the $Y_{in}$'s, including a boundary correction at zero (the second addend in the outer parentheses of (10)). The mirror rule is used as a simple boundary correction; other, more elegant corrections can be applied at the price of a higher technical effort. The properties of $\hat\chi_n$ are essentially influenced by the bandwidth b. Since the kernel estimator shows reasonable properties only in the case of bounded χ, we have to guarantee by suitable assumptions that $\lim_{z \to 0+} z^{1-d}\,\psi'(z) > 0$ in order to get the boundedness of χ (see below). On the basis of $\hat\chi_n$, we can establish the following estimator for $\varphi_{g,K,\mu}$:
$$\hat\varphi_n(x) = O_S(S)^{-1}\; \hat g_n\bigl(h_K(x - \hat\mu_n)\bigr),$$
where
$$\hat g_n(z) = z^{1-d}\; \psi'(z)\; \hat\chi_n\bigl(\psi(z)\bigr).$$
This approach has the property that the theory of kernel density estimators applies (cf. [19]). Kernel estimators are a very popular type of nonparametric density estimator because of their comparatively simple structure. In the literature, the reader can find many hints concerning the choice of the bandwidth.
Let us add some words on the comparison between this paper and [7]. Although the main idea for the construction of the estimators is the same, there is a difference in the definition of the generator functions (say g and $g_L$). Considering the special case $h_K(x) = \|\Sigma^{-1/2} x\|$, the identity $g(t) = g_L(t^2)$ can be established for $t \geq 0$. This causes some changes in the formulas. For more details in a particular case, see Section 3 in [20].
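Putting the pieces together, the semiparametric estimator $\hat g_n$ can be sketched in a few lines. The example is our illustration under simplifying assumptions: K is the Euclidean unit ball, the generator is the mixed Erlang $g(r) = (1+r)e^{-r}/3$ of Section 2.2 (so $I(g,2) = 1$), ψ is taken from Example 10 with $a = 1$, and the Epanechnikov kernel is used.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, a = 2, 10_000, 1.0

# simulate X = R*U with mu = 0 and f_R(r) = (r + r^2) e^{-r} / 3
U = rng.standard_normal((n, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)
R = rng.gamma(np.where(rng.random(n) < 1.0 / 3.0, 2.0, 3.0), 1.0)
X = R[:, None] * U

mu_hat = X.mean(axis=0)
R_hat = np.linalg.norm(X - mu_hat, axis=1)

def psi(z):                    # Example 10: psi(z) = -a + (a^d + z^d)^(1/d)
    return -a + (a**d + z**d) ** (1.0 / d)

def dpsi(z):                   # psi'(z) = z^{d-1} (a^d + z^d)^{1/d - 1}
    return z**(d - 1) * (a**d + z**d) ** (1.0 / d - 1.0)

def k(t):                      # Epanechnikov kernel of order 2
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)

b = n ** (-1.0 / 5.0)          # bandwidth of the order permitted by Assumption 2
Y = psi(R_hat)

def chi_hat(y):                # kernel estimator (10) with mirror correction at 0
    return np.mean(k((y - Y) / b) + k((y + Y) / b)) / b

def g_hat(z):                  # hat g_n(z) = z^{1-d} psi'(z) chi_hat(psi(z))
    return z ** (1.0 - d) * dpsi(z) * chi_hat(psi(z))

z = 1.0
print(g_hat(z), (1.0 + z) * np.exp(-z) / 3.0)   # estimate vs. true g(1)
```

Dividing $\hat g_n(h_K(x - \hat\mu_n))$ by $O_S(S)$ (here $2\pi$ for the Euclidean disc) then yields the density estimator $\hat\varphi_n(x)$.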

3.2.3. Assumptions Ensuring Convergence Properties of Estimators

Next we provide the assumptions for the theorems below. Assumption 2 concerns the parameter b = b ( n ) and the function k of the kernel estimator whereas Assumption 3 is posed on function ψ.
Assumption 2.
(a) We assume that
$$\lim_{n \to \infty} b \ln\ln n = 0 \quad \text{and} \quad \bar b \geq b \geq C_1 \cdot n^{-1/5}$$
with constants $\bar b, C_1 > 0$.
(b) Suppose that the kernel function $k : \mathbb{R} \to \mathbb{R}$ is continuous, vanishes outside the interval $[-1,1]$, and has a Lipschitz continuous derivative on $[-1,1]$. Moreover, assume that $k(t) = k(-t)$ holds for $t \in [-1,1]$,
$$\int_{-1}^1 k(t)\,dt = 1 \quad \text{and} \quad \int_{-1}^1 t^j\, k(t)\,dt = 0 \tag{12}$$
for even $j$, $0 < j < p$, where $p \geq 2$ is an integer.
Note that continuity of the derivative at an enclosed boundary point means that the one-sided derivative exists and is the limit of the derivatives in a neighbourhood of this point. Symmetric kernel functions k satisfying (12) are called kernels of order p. Assumption 2 ensures that the bias of the density estimator $\hat\chi_n$ converges to zero at a certain rate. Under Assumption 2 with $p = 2$ and $k(t) \geq 0$, the estimator $\hat\chi_n$ is indeed a density. The case $p > 2$ is included to complete the presentation and is of minor practical importance unless we have a very large sample size (cf. the discussion in [21]). From the asymptotic theory for density estimators, it is known that the Epanechnikov kernel
$$k_{\mathrm{epa}}(t) = \begin{cases}\frac{3}{4}\,\bigl(1 - t^2\bigr) & \text{for } t \in [-1,1],\\[1mm] 0 & \text{otherwise}\end{cases}$$
is an optimal kernel of order 2 (i.e., in the case $p = 2$ in Assumption 2) with respect to the asymptotic mean square error (cf. [19]). This kernel function is simple in structure and leads to fast computations. The consideration of optimal kernels can be extended to higher-order kernels. It turns out that their use is advantageous only in the case of sufficiently large sample sizes (for instance, for a size greater than 1000).
Assumption 3.
The $(p+1)$-th order derivative of $\Psi^d$ exists and is continuous on $[0,\infty)$, $\psi'$ is positive and bounded on $(0,+\infty)$ for some integer $p \geq 2$, and $\psi''$ is bounded on $(0,+\infty)$. The functions $z \mapsto z^{d-1}\,\psi'(z)^{-1}$ and $z \mapsto z^{-1}\,\psi'(z)$ have bounded derivatives on $[0, M_1]$ with some $M_1 > 0$. Moreover,
$$\lim_{t \to 0+}\,\bigl(\Psi^d\bigr)'(t) > 0. \tag{13}$$
Notice that in Assumption 3 we require that the right-sided limit of the $(p+1)$-th order derivative of $\Psi^d$ is finite at zero. Hence Assumption 3 implies that
$$\Psi^d(t) = \bigl(\Psi^d\bigr)'(\tilde t\,)\; t \ \text{ for some } \tilde t \in (0,t), \qquad \lim_{t \to 0+} t^{-1/d}\,\Psi(t) = C_2,$$
and
$$\lim_{z \to 0+} z^{-d}\,\psi(z) = C_2^{-d}$$
with a finite constant $C_2 > 0$. On the other hand, it follows from (13) that
$$\lim_{z \to 0+} z^{1-d}\,\psi'(z) = \lim_{t \to 0+} \Psi(t)^{1-d}\,\Psi'(t)^{-1} = d\,\lim_{t \to 0+} \Bigl(\bigl(\Psi^d\bigr)'(t)\Bigr)^{-1} = C_3$$
with a finite constant $C_3 > 0$. Therefore, χ is bounded under Assumption 3.
Example 10.
Let
$$\psi(z) = -a + \bigl(a^d + z^d\bigr)^{1/d}$$
with a constant $a > 0$. Then $\Psi^d(t) = (t+a)^d - a^d$, $\bigl(\Psi^d\bigr)'(t) = d\,(t+a)^{d-1}$,
$$z^{-1}\psi'(z) = z^{d-2}\bigl(a^d + z^d\bigr)^{-1+1/d}, \qquad z^{d-1}\psi'(z)^{-1} = \bigl(a^d + z^d\bigr)^{1-1/d},$$
$$\bigl(z^{-1}\psi'(z)\bigr)' = \bigl(a^d + z^d\bigr)^{1/d-2}\; z^{d-3}\,\bigl(a^d(d-2) - z^d\bigr), \quad \text{and} \quad \bigl(z^{d-1}\psi'(z)^{-1}\bigr)' = (d-1)\; z^{d-1}\bigl(a^d + z^d\bigr)^{-1/d}.$$
Hence, Assumption 3 is satisfied for every $p \geq 2$.
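The closed-form expressions in Example 10 are easy to verify numerically; the sketch below (our illustration, with arbitrary values of a and d) checks the inverse relation $\Psi^d(t) = (t+a)^d - a^d$, the derivative formula for $\psi'$, and the small-z behaviour $z^{-d}\psi(z) \to C_2^{-d} = 1/(d\,a^{d-1})$.

```python
import math

a, d = 1.5, 3

def psi(z):
    return -a + (a**d + z**d) ** (1.0 / d)

def Psi(y):
    # inverse of psi: Psi^d(y) = (y + a)^d - a^d
    return ((y + a)**d - a**d) ** (1.0 / d)

def dpsi(z):
    # closed form from the example: psi'(z) = z^{d-1} (a^d + z^d)^{1/d - 1}
    return z**(d - 1) * (a**d + z**d) ** (1.0 / d - 1.0)

# Psi inverts psi
assert math.isclose(Psi(psi(0.7)), 0.7, rel_tol=1e-12)

# the closed-form derivative agrees with a central difference
z, h = 0.9, 1e-6
assert math.isclose(dpsi(z), (psi(z + h) - psi(z - h)) / (2.0 * h), rel_tol=1e-8)

# z^{-d} psi(z) -> 1/(d a^{d-1}) as z -> 0+, i.e. C_2 = (d a^{d-1})^{1/d}
assert math.isclose(psi(1e-3) / 1e-9, 1.0 / (d * a**(d - 1)), rel_tol=1e-4)
```

The constant $C_2$ enters the boundedness argument for χ above; for this ψ it is strictly positive and finite for every $a > 0$.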
Another condition is required now for h K .
Assumption 4.
For any bounded subset Q of $\mathbb{R}^d$ with $0 \notin Q$, the partial derivatives $G_1, \dots, G_d$ of $h_K(\cdot)$ exist and are bounded on Q, and $x \mapsto \psi'\bigl(h_K(x)\bigr)\,G_j(x)$ is Hölder continuous of order $\alpha > 0.2$ on Q for each $j \in \{1, \dots, d\}$.
Assumption 5.
For any bounded subset Q of $\mathbb{R}^d$ with $0 \notin Q$, $h_K$ is Hölder continuous of order $\bar\alpha > 0.2$.
If Assumption 3 is fulfilled, the function $h_K$ has second-order derivatives $\frac{\partial^2}{\partial x_j \partial x_J} h_K(x) = G_{jJ}(x)$, and these are bounded on bounded subsets Q of $\mathbb{R}^d$ with $0 \notin Q$, then Assumption 4 is satisfied.
Example 11.
We consider the q-norm/antinorm $h_K(x) = \|x\|_q$. Then
$$G_j(x) = \frac{x_j\,|x_j|^{q-2}}{\|x\|_q^{q-1}} \quad \text{for } j = 1, \dots, d,$$
$$G_{jJ}(x) = \begin{cases} (1-q)\,\dfrac{x_j\,|x_j|^{q-2}\; x_J\,|x_J|^{q-2}}{\|x\|_q^{2q-1}} & \text{for } j \neq J,\\[3mm] (q-1)\,\|x\|_q^{1-2q}\,|x_j|^{q-2}\, \displaystyle\sum_{\nu \neq j} |x_\nu|^q & \text{for } j = J. \end{cases}$$
Therefore, Assumption 4 is fulfilled in the case $q > 1.2$, and Assumption 5 is fulfilled in the case $q > 0.2$.
Examples 1 and 2 (continued): One can show that $G_j(x)$ exists for $x \neq 0$, $j \in \{1, \dots, d\}$, and is bounded. Moreover, $\psi'\bigl(h_K(\cdot)\bigr)\,G_j$ is Lipschitz continuous on $\mathbb{R}^d \setminus \{0\}$. Hence, Assumption 4 is satisfied.

3.2.4. Properties of the Density Estimator

First we provide the result on strong convergence of the density estimator.
Theorem 2.
Suppose that the p-th order derivative $g^{(p)}$ of g exists and is bounded on $[0,\infty)$ for some even integer $p \geq 2$. Moreover, assume that condition (9) as well as Assumptions 1 to 3 are satisfied for the given p. Let Assumption 4 or Assumption 5 be satisfied; in the first case define $r_n := \ln n\,(nb)^{-1/2}$, in the latter case $r_n := n^{-\bar\alpha/2}\, b^{-1}$. Then, for any compact set D with $\mu \notin D$ and $n \to \infty$,
$$\sup_{x \in D}\,\bigl|\hat\varphi_n(x) - \varphi_{g,K,\mu}(x)\bigr| = O\bigl(r_n + b^p\bigr) \quad a.s. \tag{15}$$
For any compact set D with $\mu \in D$ and $n \to \infty$,
$$\sup_{x \in D}\,\bigl|\hat\varphi_n(x) - \varphi_{g,K,\mu}(x)\bigr| = O\bigl(r_n + b^{1/d}\bigr) \quad a.s.$$
Theorem 2 applies in particular to the Euclidean case $h_K = \|\cdot\|_2$. Since Assumption 2 is weaker than the corresponding assumption on the kernel in [7], Theorem 2 extends Theorem 3.1 in [7] even in the case $h_K = \|\cdot\|_2$. The convergence rate in (15) is the same as that known for one-dimensional kernel density estimators and cannot be improved under the assumptions imposed here (cf. [22]).
The next theorem represents the result about the asymptotic normality of the estimator φ ^ n .
Theorem 3.
Suppose that the assumptions of Theorem 2 and Assumption 4 are satisfied. Let $x \in \mathbb{R}^d$, $x \neq \mu$, be such that $g^{(p)}$ is continuous at $\tilde x := h_K(x - \mu)$.
(i) 
Define
$$\bar\sigma^2(\tilde x) = O_S(S)^{-2}\; \tilde x^{1-d}\, \psi'(\tilde x)\, g(\tilde x) \int_{-1}^1 k^2(t)\,dt, \qquad \Lambda(\tilde x) = O_S(S)^{-1}\; \tilde x^{1-d}\, \psi'(\tilde x)\, \frac{1}{p!}\, \chi^{(p)}\bigl(\psi(\tilde x)\bigr) \int_{-1}^1 t^p\, k(t)\,dt.$$
Then
$$\hat\varphi_n(x) - \varphi_{g,K,\mu}(x) = Z_n + e_n,$$
where $e_n = \Lambda(\tilde x)\, b^p + o(b^p)$ and
$$\sqrt{nb}\; Z_n \stackrel{d}{\longrightarrow} N\bigl(0,\; \bar\sigma^2(\tilde x)\bigr) \quad \text{for } n \to \infty.$$
(ii) 
If additionally $\lim_{n \to \infty} n^{1/(2p+1)}\, b = C_4$ holds true with a constant $C_4 \geq 0$, then, for $n \to \infty$,
$$\sqrt{nb}\,\bigl(\hat\varphi_n(x) - \varphi_{g,K,\mu}(x)\bigr) \stackrel{d}{\longrightarrow} N\bigl(C_4^{(2p+1)/2}\, \Lambda(\tilde x),\; \bar\sigma^2(\tilde x)\bigr).$$
The assertion of Theorem 3 can be used to construct an asymptotic confidence region for φ g , K , μ ( x ) . Term e n describes the asymptotic behaviour of the the bias of the estimator φ ^ n whereas the fluctuations of the estimator are represented by Z n . In view of Theorem 3, n b Z n converges in distribution to Z N ( 0 , σ ¯ 2 ( x ˜ ) ) . The mean squared deviation of the leading terms in the asymptotic expansion of φ ^ n is thus given by
E n 1 / 2 b 1 / 2 Z + Λ ( x ˜ ) b p 2 = n 1 b 1 σ ¯ 2 ( x ˜ ) + Λ 2 ( x ˜ ) b 2 p .
The minimization of this function w.r.t. b leads to the asymptotically optimal bandwidth
$$b^* = \left(\frac{\bar{\sigma}^2(\tilde{x})}{2p\, \Lambda^2(\tilde{x})\, n}\right)^{1/(2p+1)}. \tag{16}$$
The bandwidth $b^*$ converges to zero at the rate $n^{-1/(2p+1)}$. Under the conditions of Theorem 3(ii), $\hat{\varphi}_n(x) - \varphi_{g,K,\mu}(x)$ has the convergence rate $n^{-p/(2p+1)}$. This convergence rate of $\hat{\varphi}_n$ is better than that of a nonparametric multivariate density estimator but slower than the usual rate $n^{-1/2}$ of parametric estimators. In principle, Formula (16) could be used for the optimal choice of the bandwidth. However, one would then need an estimator for $\chi^{(p)}$, and estimators of derivatives of densities typically do not perform well unless $n$ is very large. As a practical alternative, one can consider a bandwidth which makes reference to a specific radius distribution.
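Once plug-in values for the variance constant $\bar{\sigma}^2(\tilde{x})$ and the bias constant $\Lambda^2(\tilde{x})$ are available, evaluating Formula (16) is immediate. A minimal sketch (our illustration with made-up input values, not the authors' code):

```python
def optimal_bandwidth(sigma_bar_sq, Lambda_sq, n, p=2):
    """Asymptotically optimal bandwidth from Formula (16):
    b* = (sigma_bar^2 / (2 p Lambda^2 n))^(1/(2p+1)).
    sigma_bar_sq and Lambda_sq are plug-in values of the
    asymptotic variance and squared bias constants at the point x."""
    return (sigma_bar_sq / (2.0 * p * Lambda_sq * n)) ** (1.0 / (2 * p + 1))

# illustrative plug-in values only (not taken from the paper's data)
b_star = optimal_bandwidth(sigma_bar_sq=0.5, Lambda_sq=0.2, n=1000, p=2)
```

For $p = 2$ the resulting rate is $n^{-1/5}$, the classical one-dimensional kernel density rate.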
To illustrate how the estimators work in practice, we simulated data from a $q$-norm distribution with $q = 1.3$, taking the radius distribution to be the modified exponential distribution with $\tau = 1.1$. Figures 7 and 8 show graphs of the underlying function $g$ and its estimator in two cases.

3.2.5. Reference Bandwidth

Let us consider an estimator φ ^ n ( x ) with Epanechnikov kernel, function ψ as in Example 10, and modified exponential radius density in the case p = 2 . According to (16), the reference bandwidth is then
$$b^* = \left(\frac{15\,(d+1)\,(d-1)!\; e^{\tilde{x}/\tau}\; \tilde{x}^{4-d}\; \tau^{5/d}\; \left(1+\tilde{x}^{d}\right)^{1+5/d}\left(\tilde{x}+\tau\right)\, |O_S(S)|}{D^2\, n}\right)^{1/5},$$
where
$$D = \tilde{x}^2\left(\tilde{x} + (d-2)\tau\right) + \tilde{x}^{d+2}\left(2\tilde{x} - (d+1)\tau\right) + \tilde{x}^{2d}\left(\tilde{x}^3 + (d-2)(d-1)(\tilde{x}+\tau)\tau^2 + \tilde{x}^2 \tau (1-2d)\right).$$
This formula was generated using the computer algebra system Mathematica. The parameter τ can be estimated by utilizing the above Formula (6) for the expectation of the radius.

3.3. Semiparametric Estimators Involving a Scale and a Parameter Fit

In this section we consider the situation where the contour of the body $K$ depends on scale parameters $\sigma_1, \dots, \sigma_d$. Suppose that $I(g,d) = 1$. We introduce the diagonal matrix $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_d)$ and a master body $K_0$ which is symmetric w.r.t. the origin. Define
$$K = \Sigma K_0 := \{\Sigma x : x \in K_0\},$$
and $\tilde{U} = \Sigma^{-1} U$. The distribution of $\tilde{U}$ is concentrated on the boundary $S_0$ of $K_0$. We assume that $K_0$ is given such that
$$E\tilde{U}_j^2 = 1 \quad \text{for } j = 1, \dots, d. \tag{17}$$
Otherwise, $K_0$ is rescaled. Suppose that $h_{K_0}$ depends on a further parameter vector $\theta \in \Theta$, where the parameter space $\Theta \subset \mathbb{R}^q$ is a compact set. Then
$$h_K(x) = h_{K_0}(\theta, \Sigma^{-1} x) \quad \text{for } x \in \mathbb{R}^d.$$
The parameter vector $\theta$ describes the shape of the boundary of the body $K$; see Examples 1 and 2 (parameters $a_1$ and $a_2$). From Lemma 1, we obtain
$$V(X^{(j)}) = \sigma_j^2\, E\tilde{U}_j^2\, E R^2 = \sigma_j^2\, E R^2, \qquad \mathrm{Cov}(X^{(j)}, X^{(k)}) = \sigma_j \sigma_k\, E\!\left(\tilde{U}_j \tilde{U}_k\right) E R^2,$$
and
$$\rho_{jk} = \mathrm{Corr}(X^{(j)}, X^{(k)}) = E\!\left(\tilde{U}_j \tilde{U}_k\right) \quad \text{for } j,k = 1, \dots, d.$$
Here we see that (17) results in $V(X^{(j)}) = \sigma_j^2\, E R^2$. The density is given by
$$\varphi_{g,K,\mu}(x) = |O_{S_0}(S_0)|^{-1} \det(\Sigma)^{-1}\, g\!\left(h_{K_0}(\theta, \Sigma^{-1}(x - \mu))\right), \quad x \in \mathbb{R}^d.$$
In this context, a scaling problem arises concerning $g$. Assume that $g$ is a suitably given generator function satisfying $I(g,d) = 1$. Then $x \mapsto g_t^*(x) := t^{d}\, g(tx)$ is a modified generator with $I(g_t^*, d) = 1$ for every $t > 0$. For any such $t$, we obtain the same model when $g$ is replaced by $g_t^*$ and $\sigma_j$ is replaced by $\sigma_j t$ for $j = 1, \dots, d$. To obtain uniqueness, we choose $t$ such that
$$E R^2 = \int_0^\infty r^{d+1}\, g_t^*(r)\, dr = t^{-2} \int_0^\infty r^{d+1}\, g(r)\, dr = 1.$$
Let $\hat{\mu}_n = (\hat{\mu}_{n1}, \dots, \hat{\mu}_{nd})^T$ be as above. Then $\sigma_j^2$ represents the variance of the $j$-th component of $X$. Based on this property, the sample variances of the components of $X$ can be used as estimators for $\sigma_j^2$:
$$\hat{\sigma}_{nj}^2 = \frac{1}{n-1} \sum_{i=1}^n \left(X_{ij} - \hat{\mu}_{nj}\right)^2 \quad \text{for } j = 1, \dots, d.$$
Moreover, we have the sample correlations
$$\hat{\rho}_{njk} = \hat{\sigma}_{nj}^{-1}\, \hat{\sigma}_{nk}^{-1}\, \frac{1}{n-1} \sum_{i=1}^n \left(X_{ij} - \hat{\mu}_{nj}\right)\left(X_{ik} - \hat{\mu}_{nk}\right) \quad \text{for } j,k = 1, \dots, d.$$
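The estimators $\hat{\mu}_n$, $\hat{\sigma}_{nj}$ and $\hat{\rho}_{njk}$ are the ordinary sample means, standard deviations and correlations. A minimal numerical sketch (our illustration, not the authors' code):

```python
import numpy as np

def location_scale_correlation(X):
    """Sample estimators of Section 3.3: componentwise means,
    standard deviations (denominator n-1) and the correlation
    matrix, computed from an n x d data matrix X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mu_hat = X.mean(axis=0)
    centered = X - mu_hat
    sigma_hat = np.sqrt((centered ** 2).sum(axis=0) / (n - 1))
    rho_hat = (centered.T @ centered) / (n - 1)        # sample covariance
    rho_hat /= np.outer(sigma_hat, sigma_hat)          # normalize to correlations
    return mu_hat, sigma_hat, rho_hat
```

The correlation matrix returned here agrees with `numpy.corrcoef` applied to the transposed data matrix.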
In the following we use the notation Σ ^ n = diag ( σ ^ n 1 , , σ ^ n d ) . If θ is unknown, we consider moment estimators based on the correlations. For this we need the following assumptions.
Assumption 6.
Let $I$ be a subset of $\{(j,k) : j,k = 1,\dots,d,\; j < k\}$ with cardinality $q$. There is a vector $\rho = (\rho_{jk})_{(j,k)\in I} \in \mathbb{R}^q$ such that, for $l = 1, \dots, q$,
$$\theta_l = \gamma_l(\rho).$$
Assume that $\gamma_l : C \to \Theta$ has bounded partial derivatives, $\rho \in C$, and $\theta$ is an interior point of $\Theta$.
Assumption 7.
For any bounded subset $Q$ of $\mathbb{R}^d$ with $0 \notin Q$, the partial derivatives $\tilde{G}_1, \dots, \tilde{G}_q, G_1, \dots, G_d$ of $(\theta, x) \mapsto h_{K_0}(\theta, x)$ exist and are bounded for $\theta \in \Theta$, $x \in Q$, and $(\theta, x) \mapsto \psi'(h_{K_0}(\theta, x))\, G_j(\theta, x)$ and $(\theta, x) \mapsto \psi'(h_{K_0}(\theta, x))\, \tilde{G}_j(\theta, x)$ are Hölder continuous of order $\alpha > 0.2$ on $\Theta \times Q$ for each $j \in \{1, \dots, d\}$.
Assumption 8.
For any bounded subset Q of R d , 0 Q , h K 0 is Hölder continuous of order α ¯ > 0.2 .
Examples 1 and 2: (Continued) Similarly as above, it can be proven that Assumption 7 is fulfilled.
Let $\hat{\rho}$ be the sample version of $\rho$. Then
$$\hat{\theta}_{nl} = \gamma_l(\hat{\rho}) \quad \text{for } l = 1, \dots, q$$
is the estimator for $\theta$, $\hat{\theta}_n = (\hat{\theta}_{nl})_{l=1}^q$. Define $\hat{R}_i = \hat{\Sigma}_n^{-1}(X_i - \hat{\mu}_n)$. With this definition, $\hat{F}_n^R$ is determined according to Formula (8). The following result on the convergence rate of $\hat{F}_n^R$ can be proven:
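The empirical radius distribution function is obtained by standardizing each observation and evaluating the Minkowski functional of the master body. The following sketch assumes a user-supplied functional `h_K0` (e.g. a norm); the helper names are ours, not the authors':

```python
import numpy as np

def radius_ecdf(X, mu_hat, sigma_hat, h_K0):
    """Empirical distribution function of the estimated
    star-generalized radii h_K0(Sigma^-1 (X_i - mu_hat)).
    h_K0 maps a d-vector to a nonnegative scalar."""
    Z = (np.asarray(X, dtype=float) - mu_hat) / sigma_hat  # rows: standardized X_i
    radii = np.sort([h_K0(z) for z in Z])
    def F_hat(r):
        # proportion of estimated radii <= r
        return np.searchsorted(radii, r, side="right") / len(radii)
    return radii, F_hat
```

With `h_K0 = np.linalg.norm` this reduces to the Euclidean (elliptically contoured) special case.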
Theorem 4.
Suppose that Assumptions 1 and 6 are satisfied, and
$$\int_0^\infty r^{d+3}\, g(r)\, dr < +\infty. \tag{18}$$
Let $r \mapsto r f_R(r)$ be bounded on $[0, +\infty)$. Then, for $n \to \infty$,
$$\sup_{r \geq 0}\left|\hat{F}_n^R(r) - F_R(r)\right| = O\!\left(\sqrt{\frac{\ln\ln n}{n}}\right) \quad a.s.$$
In this section the transformed sample $Y_{1n}, \dots, Y_{nn}$ is given by $Y_{in} = \psi(h_{K_0}(\hat{\theta}_n, \hat{R}_i))$ with $\psi$ as in Section 3.2. The estimator $\hat{g}_n$ for the generator $g$ is calculated using Formulas (10) and (11) from the previous section. This establishes the following estimator for the density:
$$\hat{\varphi}_n(x) = |O_{S_0}(S_0)|^{-1} \det(\hat{\Sigma}_n)^{-1}\, \hat{g}_n\!\left(h_{K_0}(\hat{\theta}_n, \hat{\Sigma}_n^{-1}(x - \hat{\mu}_n))\right). \tag{19}$$
The next two theorems show the results concerning strong convergence and asymptotic normality of the density estimator:
Theorem 5.
Suppose that the $p$-th order derivative $g^{(p)}$ of $g$ exists and is bounded on $[0,\infty)$ for some even integer $p \geq 2$, and that, with this $p$, Assumptions 1, 2, 3 and 6 as well as conditions (1) and (18) are satisfied. Let Assumption 7 or Assumption 8 be satisfied, and define in the first case $r_n := \ln n\,(nb)^{-1/2}$ and in the latter case $r_n := n^{-\bar{\alpha}/2} b^{-1}$. Then the claim of Theorem 2 holds true for the estimator $\hat{\varphi}_n$ defined in (19).
Theorem 6.
Suppose that the assumptions of Theorem 5 are satisfied. Let $x \in \mathbb{R}^d$, $x \neq \mu$, be such that $g^{(p)}$ is continuous at $\tilde{x} := h_{K_0}(\theta, \Sigma^{-1}(x - \mu))$. Assume that $\lim_{n\to\infty} n^{1/(2p+1)} b = C_4$ holds true with a constant $C_4 \geq 0$. Then, for $n \to \infty$,
$$\sqrt{nb}\left(\hat{\varphi}_n(x) - \varphi_{g,K,\mu}(x)\right) \xrightarrow{d} N\!\left(C_4^{(2p+1)/2}\, \Lambda(\tilde{x}),\; \bar{\sigma}^2(\tilde{x})\right),$$
where $\hat{\varphi}_n$ is defined in (19),
$$\bar{\sigma}^2(\tilde{x}) = |O_{S_0}(S_0)|^{-2} \det(\Sigma)^{-2}\, \tilde{x}^{1-d}\, \psi'(\tilde{x})\, g(\tilde{x}) \int_{-1}^{1} k^2(t)\,dt, \qquad \Lambda(\tilde{x}) = |O_{S_0}(S_0)|^{-1} \det(\Sigma)^{-1}\, \tilde{x}^{1-d}\, \psi'(\tilde{x})\, \frac{1}{p!}\, \chi^{(p)}(\psi(\tilde{x})) \int_{-1}^{1} t^p k(t)\,dt.$$
The remarks following Theorems 2 and 3 are valid similarly.

3.4. Applications

For many decades, statistical applications of multivariate distribution theory were mainly based upon Gaussian and elliptically contoured distributions. Studies using non-elliptically contoured star-shaped distributions have essentially emerged during the last two decades and deal in most cases with $p$-generalized normal distributions. Such distributions are convex or radially concave contoured if $p \geq 1$ or $0 < p \leq 1$, respectively, and are also called power exponential distributions. Moreover, common elliptically contoured power exponential (ecpe) distributions form a particular class within the wide class of star-shaped distributions, which allows modeling much more flexible contours than elliptical ones.
The class of ecpe distributions is used in a crossover trial on insulin applied to rabbits in [23], in image denoising in [24], and in colour texture retrieval in [25]. Applications of multivariate g-and-h distributions to jointly modeling body mass index and lean body mass are demonstrated in [26] and accompanied by star-shaped contoured density illustrations. The $l_{n,p}$-elliptically contoured distributions form another large class of star-shaped distributions and are used in [27] to explore to which extent orientation selectivity and contrast gain control can be used to model the statistics of natural images. Mixtures of ecpe distributions are considered for bioinformatics data sets in [28]. Texture retrieval using the $p$-generalized Gaussian densities is studied in [29]. A random vector modeling data from quantitative genetics presented in [30] is shown in [31] to be more likely to follow a power exponential distribution different from a normal one. The reconstruction of the signal induced by cosmic strings in the cosmic microwave background from radio-interferometric data is carried out in [32] based upon generalized Gaussian distributions. These distributions are also used in [33] for voice detection.
More recently, the considerations in [11] opened a new field of financial applications of more general star-shaped asymptotic distributions, where suitably scaled sample clouds converge onto a deterministic set.
Figure 3 in [34] shows a sample cloud which might be modeled with a density that is star-shaped w.r.t. a fan having six cones that contain sample points and other cones that do not. Note, however, that Figures 1d–f in the same paper do not reflect a homogeneous density but might be compared in some sense to the level sets of the characteristic functions of certain polyhedral star-shaped distributions in [16], Figure 5.2.
When modeling lymphoma data, [35] analyze sample clouds of points (see their Figures 2 and 3) which might be interpreted as mixtures of densities having contours that in part look similar to those in [36], where flow cytometric data, Australian Institute of Sport data and Iris data are analyzed, or to those of certain skewed densities as they were analytically derived and drawn in [37]. In a similar manner, Figures 2 and 5 in [20] indicate that mixtures of different types of star-shaped distributions might be suitable for modeling residuals of certain stock exchange indices. Studying possible connections between all of these models more closely could be of interest for future work.
The numerical examples of the present section aim to illustrate agricultural and financial applications of the estimators described in this paper. To this end, we make use of the new particular non-elliptically contoured but star-shaped distribution class introduced in Section 2.2.
Example 12.
Example 2 of Section 2.2 continued. We consider the class of bodies K of Example 2. Let a 2 = 1 . Figure 9 shows the dependence of the correlation on the parameter a 1 .
Here we apply the methods described above to dataset 5 of [38]. The two variables are the yields of grain and straw. The sample correlation is 0.73077. Starting from that value, we can calculate the moment estimator for $a_1$: $\hat{a}_1 = 0.83641$. Moreover, we obtain
$$|O_{S_0}(S_0)| = 2.6406, \quad \hat{\mu}_1 = 3.9480, \quad \hat{\mu}_2 = 6.5148, \quad \hat{\sigma}_1 = 0.45798, \quad \hat{\sigma}_2 = 0.89831.$$
The data and the shape of the estimated multivariate density are depicted in Figure 10 and Figure 11.
Example 13.
We want to illustrate the potential of our approach for applications to financial data and consider daily index data from Morgan Stanley Capital International (MSCI) for Germany and the UK over the period August 2011 to June 2016. The data are continuous daily returns, computed as the logarithm of the ratio of two subsequent index values. The modelling of MSCI data using elliptical models is considered in [5]. The data are depicted in Figure 12. A visual inspection seems to give some preference to our model from Section 2.2 over the elliptically contoured model. Figure 13 and Figure 14 show the estimated model for the data. The basic numerical results are:
$$|O_{S_0}(S_0)| = 2.1966, \quad \hat{a}_1 = 1.2088, \quad \hat{\mu}_1 = 0.00026725, \quad \hat{\mu}_2 = 0.00026855, \quad \hat{\sigma}_1 = 0.013519, \quad \hat{\sigma}_2 = 0.011715.$$
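The continuous daily returns used above are simply differences of logarithmic index values; a one-line sketch (our illustration, not the authors' code):

```python
import numpy as np

def log_returns(prices):
    """Continuous (log) daily returns: r_t = ln(P_t / P_{t-1})."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))
```

Applying this to the two index series and pairing the resulting columns yields the bivariate sample to which the star-shaped model is fitted.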
We now proceed to prove the results.

4. Proofs

Throughout the remainder of the paper, suppose that Assumptions 1–3 are satisfied for some integer $p \geq 2$. First, we prove auxiliary statements which are used in the proof of strong convergence of $\hat{\varphi}_n$ and later on.

4.1. Proof of Auxiliary Statements

The following Lemma 2 clarifies the asymptotic behaviour of χ in the neighbourhood of zero.
Lemma 2.
Suppose that $g'$ exists and is bounded. Then
$$\sup_{t,u \in [0,\bar{M}],\; t \neq u} \frac{\left|\chi(t) - \chi(u)\right|}{|t-u|^{1/d}} < +\infty$$
for any $\bar{M} > 0$.
Proof. 
Observe that, by the Lipschitz continuity of the functions $g$ and $z \mapsto z^{d-1} \psi'(z)^{-1}$ in view of Assumption 3,
$$\left|\chi(t) - \chi(u)\right| \leq C\left(\Psi(t)^{d-1}\, \psi'(\Psi(t))^{-1} \left|g(\Psi(t)) - g(\Psi(u))\right| + \left|\Psi(t)^{d-1}\, \psi'(\Psi(t))^{-1} - \Psi(u)^{d-1}\, \psi'(\Psi(u))^{-1}\right| g(\Psi(u))\right) \leq C \left|\Psi(t) - \Psi(u)\right|$$
uniformly for $t, u \in [0, \bar{M}]$. Here and in the following, $C$ is a generic constant which may differ from formula to formula. By assumption, we have
$$\left|\Psi^d(t) - \Psi^d(u)\right| \leq C\, |t-u|.$$
On the other hand,
$$\sup_{t,u \geq 0,\; t \neq u} \frac{\left|t^{1/d} - u^{1/d}\right|}{|t-u|^{1/d}} < +\infty.$$
Hence
$$\left|\Psi(t) - \Psi(u)\right| \leq C \left|\Psi^d(t) - \Psi^d(u)\right|^{1/d} \leq C\, |t-u|^{1/d}. \tag{20}$$
In view of (20), the proof is complete. ☐
Lemma 3.
Let (9) be fulfilled. Then, for $n \to \infty$,
$$\|\hat{\mu}_n - \mu\|_2 = O\!\left(\sqrt{\frac{\ln\ln n}{n}}\right) \quad a.s.$$
If, in addition, (18) is satisfied then, for $n \to \infty$,
$$\max_{j=1,\dots,d} \left|\hat{\sigma}_{nj} - \sigma_j\right| = O\!\left(\sqrt{\frac{\ln\ln n}{n}}\right) \quad a.s.$$
Proof. 
Because of (9), the law of iterated logarithm applies and leads to
$$\limsup_{n\to\infty} \sqrt{\frac{n}{\ln\ln n}}\, \left|\hat{\mu}_{nj} - \mu_j\right| = \sqrt{2\, V(X_j)} \quad a.s. \tag{21}$$
Since there is a constant $C > 0$ such that $\|y\|_2 \leq C \max_{j=1,\dots,d} |y_j|$ for all $y \in \mathbb{R}^d$ in view of the norm equivalence property, the first part of the lemma follows from (21). The second part can be shown similarly. ☐
In several places, we will use the following property:
Lemma 4.
Suppose that $\Lambda : \mathbb{R}^d \to \mathbb{R}$ is a measurable function with $\Lambda(-x) = -\Lambda(x)$. Then
$$E\, \Lambda(X - \mu) = 0.$$
Proof. 
Since $\mu - X$ has the same distribution as $X - \mu$, we have
$$E\, \Lambda(X-\mu) = E\, \Lambda(X-\mu)\, 1\{X_1 - \mu_1 < 0\} + E\, \Lambda(X-\mu)\, 1\{X_1 - \mu_1 > 0\} = -E\, \Lambda(X-\mu)\, 1\{X_1 - \mu_1 > 0\} + E\, \Lambda(X-\mu)\, 1\{X_1 - \mu_1 > 0\} = 0,$$
which proves the lemma. ☐

4.2. Proving Convergence of F ^ n R

In this section we prove Theorem 1. The law of the iterated logarithm for the empirical process says (cf. [39], p. 268, for example)
$$\Delta_n := \sup_{r \in \mathbb{R}} \left|F_n^R(r) - F_R(r)\right| = O\!\left(\sqrt{\frac{\ln\ln n}{n}}\right) \quad a.s.$$
By Lipschitz-continuity of h K and Lemma 3,
$$\bar{\Delta}_n := \sup_{x \in \mathbb{R}^d} \left|h_K(x - \hat{\mu}_n) - h_K(x - \mu)\right| \leq C\, \|\hat{\mu}_n - \mu\|_2 = O\!\left(\sqrt{\frac{\ln\ln n}{n}}\right) \quad a.s.$$
Moreover,
$$\hat{F}_n^R(r) - F_R(r) = \frac{1}{n}\sum_{i=1}^n \left(1\{h_K(X_i - \hat{\mu}_n) \leq r\} - 1\{h_K(X_i - \mu) \leq r\}\right) + F_n^R(r) - F_R(r)$$
$$= \frac{1}{n}\sum_{i=1}^n \left(1\{h_K(X_i - \hat{\mu}_n) \leq r,\; h_K(X_i - \mu) > r\} - 1\{h_K(X_i - \mu) \leq r,\; h_K(X_i - \hat{\mu}_n) > r\}\right) + F_n^R(r) - F_R(r)$$
$$\leq \frac{1}{n}\sum_{i=1}^n 1\{r < h_K(X_i - \mu) \leq r + \bar{\Delta}_n\} + \Delta_n,$$
and
$$\hat{F}_n^R(r) - F_R(r) \geq -\frac{1}{n}\sum_{i=1}^n 1\{r - \bar{\Delta}_n < h_K(X_i - \mu) \leq r\} - \Delta_n.$$
Hence, by the boundedness of f R ,
$$\sup_{r \geq 0}\left|\hat{F}_n^R(r) - F_R(r)\right| \leq \frac{1}{n}\sup_{r \geq 0}\sum_{i=1}^n 1\{r - \bar{\Delta}_n < h_K(X_i - \mu) \leq r + \bar{\Delta}_n\} + \Delta_n = \sup_{r \geq 0}\left(F_n^R(r + \bar{\Delta}_n) - F_n^R(r - \bar{\Delta}_n)\right) + \Delta_n$$
$$\leq \sup_{r \geq 0}\left(F_R(r + \bar{\Delta}_n) - F_R(r - \bar{\Delta}_n)\right) + 3\Delta_n \leq 2\sup_{r \geq 0} f_R(r)\, \bar{\Delta}_n + O\!\left(\sqrt{\frac{\ln\ln n}{n}}\right) = O\!\left(\sqrt{\frac{\ln\ln n}{n}}\right) \quad a.s.,$$
which leads to the theorem. ☐

4.3. Proving Strong Convergence of the Density Estimator

Let $\tilde{Y}_i = \psi(h_K(X_i - \mu))$ for $i = 1, \dots, n$, $K_b(y,t) = k((y-t)/b) + k((y+t)/b)$ for $y, t \geq 0$, $Y_{in} = \psi(h_K(X_i - \hat{\mu}_n))$, and
$$\tilde{\chi}_n(y) = \frac{1}{nb} \sum_{i=1}^n K_b(y, \tilde{Y}_i)$$
for $y \geq 0$. Then (cf. (10))
$$\hat{\chi}_n(y) = \frac{1}{nb} \sum_{i=1}^n K_b(y, Y_{in}).$$
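The kernel $K_b(y,t) = k((y-t)/b) + k((y+t)/b)$ is a standard reflection device that removes boundary bias of the density estimator at zero. A minimal sketch with the Epanechnikov kernel of Section 3.2.5 (function names are ours; not the authors' code):

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel k(t) = 0.75 (1 - t^2) on [-1, 1]."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)

def chi_hat(y, Y, b):
    """Reflected kernel density estimator for a nonnegative sample Y:
    chi_hat(y) = (n b)^-1 sum_i [k((y - Y_i)/b) + k((y + Y_i)/b)].
    The reflection term k((y + Y_i)/b) folds mass that would leak
    below 0 back onto [0, infinity)."""
    Y = np.asarray(Y, dtype=float)
    return (epanechnikov((y - Y) / b) + epanechnikov((y + Y) / b)).sum() / (len(Y) * b)
```

Because the reflected mass is folded back, the estimate integrates to one over $[0, \infty)$.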
Next we prove strong convergence rates for $\hat{\chi}_n$ and, later, for $\hat{\varphi}_n$. Throughout this section we suppose that Assumptions 1 to 3 and (9) are fulfilled for some even integer $p \geq 2$. The compact interval $[m, M]$ with arbitrary $m$ and $M$, $0 \leq m < M$, can be covered with closed intervals $U_1, \dots, U_n$ of length $(M-m)n^{-1}$ and with centres $u_1, \dots, u_n$ such that $\bigcup_{l=1}^n U_l = [m, M]$. The constants $m$ and $M$ will be specified later. Note that
$$\sup_{y \in [m,M]} \left|\hat{\chi}_n(y) - \chi(y)\right| \leq \max_{l=1,\dots,n}\left(\sup_{y \in U_l}\left|\hat{\chi}_n(y) - \hat{\chi}_n(u_l)\right| + \left|\hat{\chi}_n(u_l) - \tilde{\chi}_n(u_l)\right| + \left|\tilde{\chi}_n(u_l) - \chi(u_l)\right| + \sup_{y \in U_l}\left|\chi(u_l) - \chi(y)\right|\right). \tag{22}$$
The asymptotic behaviour of the right hand side in (22) as n is analyzed term by term in the next lemmas.
Lemma 5.
Assume that the $p$-th order derivative $\chi^{(p)}$ of $\chi$ exists for some even integer $p \geq 2$ and is bounded on every finite closed subinterval of $(0, \infty)$. Let $g'$ be bounded. Then
$$\max_{l=1,\dots,n} \left|\tilde{\chi}_n(u_l) - \chi(u_l)\right| = O\!\left(\ln n\,(nb)^{-1/2} + \beta_n\right) \quad a.s.,$$
where $\beta_n = b^p$ if $m > 0$ and $\beta_n = b^{1/d}$ if $m = 0$.
The proof of this lemma is omitted since, with minor changes, this lemma can be proven in the same way as Lemma 4.4 in [7]. The following lemma is used later several times in proofs of almost sure convergence rates. We provide it without proof. The proof is almost identical to that of Lemma 4.6 in [7].
Lemma 6.
Assume that χ is bounded. Let k ¯ , λ : R R be bounded measurable functions with k ¯ ( t ) = 0 for t : | t | > 1 . Then
$$\max_{l=1,\dots,n}\left|\sum_{i=1}^n \left(U_{nil} - E U_{nil}\right)\right| = O\!\left(\sqrt{nb\ln n}\right)\; a.s., \quad \max_{l=1,\dots,n}\left|\sum_{i=1}^n \left(\bar{U}_{nil} - E \bar{U}_{nil}\right)\right| = O\!\left(\sqrt{nb\ln n}\right)\; a.s., \quad \max_{l=1,\dots,n}\left|\sum_{i=1}^n \left(V_{nil} - E V_{nil}\right)\right| = O\!\left(\sqrt{nb\ln n}\right)\; a.s.,$$
where $U_{nil} := \bar{k}((u_l - \tilde{Y}_i)/b)\, 1\{|\tilde{Y}_i - u_l| \leq b - w_n\}\, \lambda(X_i)$, $\bar{U}_{nil} := \bar{k}((u_l + \tilde{Y}_i)/b)\, 1\{|\tilde{Y}_i - u_l| \leq b - w_n\}\, \lambda(X_i)$, and $V_{nil} := 1\{b - w_n < |\tilde{Y}_i - u_l| < b + w_n\}$.
We proceed with proving convergence rates of the terms in (22).
Lemma 7.
Suppose that Assumption 4 or 5 is satisfied. Then, as n ,
$$\max_{l=1,\dots,n} \left|\hat{\chi}_n(u_l) - \tilde{\chi}_n(u_l)\right| = \begin{cases} o\!\left((nb)^{-1/2}\right) & \text{under Assumption 4}, \\ O\!\left(b^{-1}\left(n^{-1}\ln\ln n\right)^{\bar{\alpha}/2}\right) & \text{under Assumption 5} \end{cases} \quad a.s.$$
Proof. 
(a) Let Assumption 4 be satisfied. In view of Lemma 3, we obtain
$$\left|Y_{in} - \tilde{Y}_i\right| \leq \sup_{t \in [0,\infty)} \left|\psi'(t)\right| \cdot \left|h_K(X_i - \hat{\mu}_n) - h_K(X_i - \mu)\right| \leq C_5\, \sqrt{\frac{\ln\ln n}{n}} =: w_n$$
with a suitable constant C 5 > 0 for n n 1 ( ω ) . We introduce
κ n ( u , t ) = k ( ( u t ) / b ) k ( ( u + t ) / b ) 1 u t b w n .
Let ψ ¯ ( z ) : = z 1 ψ ( z ) . Observe that k is bounded and Lipschitz continuous on [ 1 , 1 ] , ψ , ψ ¯ and ψ ¯ are bounded on [ 0 , + ) , functions G j are bounded, and functions ψ ( h K ( . ) ) G j are Hölder continuous of order α > 0.2 . We have then by Taylor expansion
k u Y i n b + k u + Y i n b k u Y ˜ i b k u + Y ˜ i b = 1 b k ( ( u Y ˜ i ) / b ) k ( ( u + Y ˜ i ) / b ) ψ ( h K ( X ¯ i ) ) × j = 1 d G j ( X ¯ i ) μ ^ n j μ j 1 Y ˜ i u b w n + W n i ( u ) ,
where
W n i ( u ) C b 2 w n 2 + b 1 w n 1 + α 1 Y ˜ i u b w n + C b 1 w n 1 b w n < Y ˜ i u < b + w n a . s .
uniformly w.r.t. u [ 0 , M ] . Here we have used Assumption 4 and Lipschitz continuity of k on R . This leads to
max l = 1 , , n χ ^ n ( u l ) χ ˜ n ( u l ) C j = 1 d B 1 n j μ ^ n j μ j + B 2 n + B 3 n ,
where
B 1 n j = n 1 b 2 max l = 1 , , n i = 1 n κ n ( u l , Y ˜ i ) ψ ( h K ( X ¯ i ) ) G j ( X ¯ i ) , B 2 n = C b 3 w n 2 + b 2 w n 1 + α max l = 1 , , n i = 1 n 1 Y ˜ i u l b w n , B 3 n = C n 1 b 2 w n max l = 1 , , n i = 1 n 1 b w n < Y ˜ i u l < b + w n a . s .
Note that
B 1 n j = n 1 b 2 max l = 1 , , n i = 1 n κ n ( u l , Y ˜ i ) ψ ( h K ( X ¯ i ) ) G j ( X ¯ i ) i = 1 n E κ n ( u l , ψ ( h K ( X ¯ i ) ) ) ψ ( h K ( X ¯ i ) ) G j ( X ¯ i )
since the expectation in the last term is zero in view of Lemma 4 ( G j ( x ) = G j ( x ) holds for all x R d ). Applying Lemma 6, it follows that
j = 1 d B 1 n j μ ^ n j μ j C n 3 / 2 b 2 ln ln n · n b ln n = o ( n b ) 1 / 2 .
On the other hand, we obtain
B 2 n C ln ln n n 2 b 3 1 + b n ( 1 α ) / 2 max l = 1 , , n i = 1 n I Y ˜ i u l b w n C ln ln n n 2 b 3 1 + b n ( 1 α ) / 2 max l = 1 , , n i = 1 n 1 Y ˜ i u l b w n P Y ˜ i u l b w n + n · sup v [ 0 , ) P v Y ˜ i v + 2 b 2 w n C ln ln n n 2 b 3 1 + n ( 1 α ) / 2 b n b ln n + n b C ln ln n n 1 b 2 + n ( 1 + α ) / 2 b 1 = o ( n b ) 1 / 2 a . s . ,
by utilizing Lemma 6 and taking α > 0.2 into account. Similarly, it follows that
B 3 n C ln ln n n 3 b 4 max l = 1 , , n i = 1 n 1 b w n Y ˜ i u l b + w n C ln ln n n 3 b 4 max l = 1 , , n i = 1 n 1 b w n Y ˜ i u l b + w n P b w n Y ˜ i u l b + w n + n · P u l + b w n < Y ˜ i u l + b + w n + n · P u l b w n < Y ˜ i u l b + w n = C ln ln n n 3 b 4 n b ln n + n w n = o ( n b ) 1 / 2 a . s .
Therefore, an application of (23)–(26) leads to the lemma under Assumption 4.
(b) Let Assumption 5 be satisfied. We obtain
$$\left|Y_{in} - \tilde{Y}_i\right| \leq \sup_{t \in [0,\infty)} \left|\psi'(t)\right| \cdot \left|h_K(X_i - \hat{\mu}_n) - h_K(X_i - \mu)\right| \leq C_5 \left(\frac{\ln\ln n}{n}\right)^{\bar{\alpha}/2} =: w_n.$$
Further, by Lipschitz continuity of k,
k u Y i n b + k u + Y i n b k u Y ˜ i b k u + Y ˜ i b C b 1 w n 1 Y ˜ i u b + w n
uniformly w.r.t. u [ 0 , M ] . Hence
max l = 1 , , n χ ^ n ( u l ) χ ˜ n ( u l ) C n 1 b 2 w n max l = 1 , , n i = 1 n 1 Y ˜ i u l b + w n P Y ˜ i u l b + w n + n · P u l b w n Y ˜ i u l + b + w n = n 1 b 2 w n n ( b + w n ) ln n + n ( b + w n ) = O ( w n b 1 ) a . s .
Lemma 8.
Suppose that the assumptions of Lemma 5 are satisfied. Then
$$\text{(a)}\; \max_{l=1,\dots,n}\sup_{y \in U_l}\left|\hat{\chi}_n(y) - \hat{\chi}_n(u_l)\right| = O(n^{-1}b^{-2})\; a.s., \quad \text{(b)}\; \max_{l=1,\dots,n}\sup_{y \in U_l}\left|\tilde{\chi}_n(y) - \tilde{\chi}_n(u_l)\right| = O(n^{-1}b^{-2})\; a.s., \quad \text{(c)}\; \sup_{y \in [m,M]}\left|\hat{\chi}_n(y) - \chi(y)\right| = O\!\left(\ln n\,(nb)^{-1/2} + \beta_n\right)\; a.s.,$$
β n as in Lemma 5.
Proof. 
In view of Lemma 2, we have
$$\max_{l=1,\dots,n}\sup_{y \in U_l}\left|\chi(u_l) - \chi(y)\right| = \begin{cases} O(n^{-1}) & \text{if } m > 0, \\ O(n^{-1/d}) & \text{if } m = 0. \end{cases}$$
Moreover, by the Lipschitz continuity of k, we obtain
max l = 1 , , n sup y U l χ ^ n ( y ) χ ^ n ( u l ) = n 1 b 1 max l = 1 , , n sup y U l i = 1 n K b ( y , Y i n ) K b ( u l , Y i n ) C b 2 max l = 1 , , n sup y U l | y u l | = O ( n 1 b 2 ) a . s .
which proves assertion (a). Analogously, the validity of assertion (b) can be shown. In view of (22), the lemma follows by Lemma 5 and 7. ☐
We are now in a position to prove the result on strong convergence of φ ^ n .
Proof of Theorem 2:
(i) Case $\mu \notin D$: By Lemma 3, there are $M_0 > m_0 > 0$ such that $h_K(x - \hat{\mu}_n) \in [m_0, M_0]$ for all $x \in D$ and $n \geq n_2(\omega)$. In view of (11), we obtain
sup x D φ ^ n ( x ) φ ( x ) O S ( S ) 1 sup x D g ^ n h K ( x μ ^ n ) g h K ( x μ ^ n ) + sup z 0 | g ( z ) | sup x D h K ( x μ ^ n ) h K ( x μ ) O S ( S ) 1 sup z [ m 0 , M 0 ] g ^ n ( z ) g ( z ) + C ln ln n n O S ( S ) 1 sup z 0 z 1 d ψ ( z ) sup z [ ψ ( m 0 ) , ψ ( M 0 ) ] χ ^ n ( z ) χ ( z ) + C ln ln n n a . s .
for n n 3 ( ω ) . Lemma 8 applies to complete the proof of part (i).
(ii) Case $\mu \in D$: The proof is analogous to that of part (i), taking $m_0 = 0$ into account. ☐

4.4. Proving Asymptotic Normality of φ ^ n ( x )

Throughout this subsection, assume that Assumptions 1–3 and (9) are fulfilled for some integer $p \geq 2$. First, an auxiliary result is proven. Define $\hat{x} := h_K(x - \hat{\mu}_n)$ and $\tilde{x} := h_K(x - \mu)$.
Lemma 9.
Under Assumption 4, we have
$$\text{(a)}\; \left|\hat{\chi}_n(\psi(\tilde{x})) - \tilde{\chi}_n(\psi(\tilde{x}))\right| = o\!\left((nb)^{-1/2}\right)\; a.s., \qquad \text{(b)}\; \left|\hat{\chi}_n(\psi(\hat{x})) - \hat{\chi}_n(\psi(\tilde{x}))\right| = o\!\left((nb)^{-1/2}\right)\; a.s. \quad \text{as } n \to \infty.$$
Proof. 
Let U l and u l as in Section 4.3. We can choose m , M : 0 < m < M such that ψ ( x ^ ) , ψ ( x ˜ ) [ m , M ] for n n 4 ( ω ) . By Lemmas 7 and 8,
sup y [ m , M ] χ ^ n ( y ) χ ˜ n ( y ) max l = 1 , , n sup y U l χ ^ n ( y ) χ ^ n ( u l ) + max l = 1 , , n χ ^ n ( u l ) χ ˜ n ( u l ) + max l = 1 , , n sup y U l χ ˜ n ( y ) χ ˜ n ( u l ) C n 1 b 2 + o ( n b ) 1 / 2 = o ( n b ) 1 / 2 a . s .
which yields immediately assertion (a). Since
$$\left|\hat{x} - \tilde{x}\right| = O(w_n) \quad a.s. \tag{27}$$
by Lemma 3, we obtain the inequality
$$\left|\tilde{\chi}_n(\psi(\hat{x})) - \tilde{\chi}_n(\psi(\tilde{x}))\right| \leq D_{1n} + D_{2n} + D_{3n}$$
by Taylor expansion, where
D 1 n = n 1 b 2 i = 1 n j = 1 d k ( ( ψ ( x ˜ ) ) Y ˜ i ) b 1 ) 1 ψ ( x ˜ ) Y ˜ i b w n ψ ( x ˜ ) G j ( x μ ) μ ^ n j μ j , D 2 n C n 1 b 3 · w n 2 i = 1 n 1 ψ ( x ˜ ) Y ˜ i b w n , D 3 n C n 1 b 2 · w n i = 1 n 1 b w n < ψ ( x ˜ ) Y ˜ i < b + w n
a.s. Observe that k ( t ) = k ( t ) and
E k ( ( ψ ( x ˜ ) Y ˜ i ) b 1 ) 1 ψ ( x ˜ ) Y ˜ i b w n = b 1 + w n b 1 1 w n b 1 k ( t ) χ ( ψ ( x ˜ ) t b ) d t = b 0 1 w n b 1 k ( t ) χ ( ψ ( x ˜ ) t b ) χ ( ψ ( x ˜ ) + t b ) d t = O ( b 2 ) .
Analogously to Lemma 6, we can deduce
D 1 n C n 1 b 2 · w n · j = 1 d i = 1 n k ( ( ψ ( x ˜ ) Y ˜ i ) b 1 ) 1 ψ ( x ˜ ) Y ˜ i b w n ψ ( x ˜ ) G j ( x μ ) C n 3 / 2 b 2 ln ln n · n b ln n + n b 2 = o ( n b ) 1 / 2 a . s .
since the expectation is zero due to Lemma 4. Analogously to the examination of $B_{2n}$ and $B_{3n}$ in Lemma 7, we obtain
$$D_{2n} + D_{3n} = o\!\left((nb)^{-1/2}\right) \quad a.s.$$
Hence
$$\tilde{\chi}_n(\psi(\hat{x})) - \tilde{\chi}_n(\psi(\tilde{x})) = o\!\left((nb)^{-1/2}\right) \quad a.s.$$
In view of (27), the proof of part (b) is complete. ☐
The following lemma can be taken from kernel density estimation theory; see [40]. Subsequently, we prove the asymptotic normality of $\hat{\varphi}_n$.
Lemma 10.
Suppose that χ is continuous at y. Then
$$\sqrt{nb}\left(\tilde{\chi}_n(y) - E\tilde{\chi}_n(y)\right) \xrightarrow{d} N(0, \sigma_1^2), \qquad \sigma_1^2 = \chi(y) \int_{-1}^{1} k^2(t)\,dt.$$
Proof of Theorem 3.
Note that z z 1 d ψ ( z ) has a bounded derivative on every finite subinterval of ( 0 , ) . By Lemmas 3 and 9, we obtain
$$\left|h_K(x - \hat{\mu}_n)^{1-d}\, \psi'(h_K(x - \hat{\mu}_n)) - \tilde{x}^{1-d}\, \psi'(\tilde{x})\right| = O\!\left(\sqrt{\frac{\ln\ln n}{n}}\right)$$
and hence
φ ^ n ( x ) φ ( x ) = O S ( S ) 1 h K ( x μ ^ n ) 1 d ψ ( h K ( x μ ^ n ) ) χ ^ n ( ψ ( h K ( x μ ^ n ) ) ) χ ^ n ( ψ ( x ˜ ) ) + O S ( S ) 1 x ˜ 1 d ψ ( x ˜ ) χ ^ n ( ψ ( x ˜ ) ) χ ( ψ ( x ˜ ) ) + h K ( x μ ^ n ) 1 d ψ ( h K ( x μ ^ n ) ) x ˜ 1 d ψ ( x ˜ ) O S ( S ) 1 χ ^ n ( ψ ( x ˜ ) ) = o ( n b ) 1 / 2 + O S ( S ) 1 x ˜ 1 d ψ ( x ˜ ) + o ( ( n b ) 1 / 2 ) χ ^ n ( ψ ( x ˜ ) ) χ ( ψ ( x ˜ ) ) = Z n + e n + o ( n b ) 1 / 2 a . s . ,
where
$$Z_n = |O_S(S)|^{-1}\, \tilde{x}^{1-d}\, \psi'(\tilde{x}) \left(\hat{\chi}_n(\psi(\tilde{x})) - E\hat{\chi}_n(\psi(\tilde{x}))\right), \qquad e_n = |O_S(S)|^{-1}\, \tilde{x}^{1-d}\, \psi'(\tilde{x}) \left(E\hat{\chi}_n(\psi(\tilde{x})) - \chi(\psi(\tilde{x}))\right).$$
Using Lemma 10, we have
$$\sqrt{nb}\; Z_n \xrightarrow{d} N(0, \bar{\sigma}^2),$$
where $\bar{\sigma}^2(\tilde{x}) = |O_S(S)|^{-2}\, \tilde{x}^{2-2d}\, \psi'(\tilde{x})^2\, \sigma_1^2$. By Taylor expansion, we obtain
e n = O S ( S ) 1 x ˜ 1 d ψ ( x ˜ ) 0 k ψ ( x ˜ ) y b χ ( y ) d y χ ( ψ ( x ˜ ) ) = O S ( S ) 1 x ˜ 1 d ψ ( x ˜ ) 1 1 k ( t ) χ ( ψ ( x ˜ ) t b ) χ ( ψ ( x ˜ ) ) d t = O S ( S ) 1 x ˜ 1 d ψ ( x ˜ ) · 1 1 k ( t ) j = 1 p 1 ( j ! ) 1 χ ( j ) ( ψ ( x ˜ ) ) t b j + ( p ! ) 1 χ ( p ) ( x ˜ n ) t p b p d t = O S ( S ) 1 ( p ! ) 1 x ˜ 1 d ψ ( x ˜ ) χ ( p ) ( ψ ( x ˜ ) ) 1 1 t p k ( t ) d t b p + o ( b p ) ,
and x ˜ n lies between ψ ( x ˜ ) t b and ψ ( x ˜ ) . This completes the proof. ☐

4.5. Proofs When Additional Scale Fit Is Involved

When proving Theorem 4 we shall make use of the following lemma.
Lemma 11.
Let (18) be satisfied. Then
$$\left|\hat{\theta}_{nl} - \theta_l\right| = O\!\left(\sqrt{\frac{\ln\ln n}{n}}\right) \quad a.s. \quad \text{for } l = 1, \dots, q.$$
Proof. 
By the law of iterated logarithm and Lemma 3, we obtain
ρ ^ n j k ρ j k = σ ^ n j 1 σ ^ n k 1 1 n 1 i = 1 n X i j X i k E X j X k n n 1 μ ^ n j μ j μ ^ n k + μ j μ ^ n k μ k + O ( n 1 ) + E X j X k σ ^ n j σ ^ n k σ j σ k 1 σ j σ k σ ^ n k + σ ^ n k σ j σ ^ n j = O ln ln n n a . s .
for all j , k = 1 , , d . Since the partial derivatives of functions γ l are bounded, it follows that
θ ^ n l θ l C max j , k = 1 , , d ρ ^ n j k ρ j k = O ln ln n n a . s .
Proof of Theorem 4.
Let $u(x) = h_{K_0}(\theta, \Sigma^{-1}(x - \mu))$. Note that $u(X_i) = R_i$. By the Lipschitz continuity of $h_{K_0}$, (1) and Lemma 3, we have
Δ ¯ n : = sup x R d h K 0 ( θ ^ n , Σ ^ n 1 ( x μ ^ n ) ) u ( x ) 1 + u ( x ) 1 C sup x R d θ ^ n θ + Σ ^ n 1 μ ^ n μ + j = 1 d σ ^ n j 1 σ j 1 x j μ j · 1 + u ( x ) 1 C ln ln n n · sup x R d 1 + x μ · 1 + u ( x ) 1 = O ln ln n n a . s .
We obtain the result as follows:
sup r 0 F ^ n R ( r ) F R ( r ) 1 n sup r 0 i = 1 n 1 r Δ ¯ n ( 1 + u ( X i ) ) u ( X i ) r + Δ ¯ n ( 1 + u ( X i ) ) + Δ n = 1 n sup r 0 i = 1 n 1 r Δ ¯ n 1 + Δ ¯ n R i r + Δ ¯ n 1 Δ ¯ n + Δ n = sup r 0 F n R ( ( r + Δ ¯ n ) / ( 1 Δ ¯ n )