Second Order Chebyshev–Edgeworth-Type Approximations for Statistics Based on Random Size Samples

Christoph, Gerd; Ulyanov, Vladimir V.

doi:10.3390/math11081848

by Gerd Christoph ¹,*,† and Vladimir V. Ulyanov ²,³,†

¹ Department of Mathematics, Otto-von-Guericke University Magdeburg, 39016 Magdeburg, Germany

² Faculty of Computer Science, HSE University, 101000 Moscow, Russia

³ Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, 119991 Moscow, Russia

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Mathematics 2023, 11(8), 1848; https://doi.org/10.3390/math11081848

Submission received: 27 February 2023 / Revised: 3 April 2023 / Accepted: 11 April 2023 / Published: 13 April 2023

(This article belongs to the Special Issue Limit Theorems of Probability Theory)

Abstract:

This article completes our studies on the formal construction of asymptotic approximations for statistics based on a random number of observations. Second order Chebyshev–Edgeworth expansions of asymptotically normally or chi-squared distributed statistics from samples with negative binomial or Pareto-like distributed random sample sizes are obtained. The results can have applications for a wide spectrum of asymptotically normally or chi-square distributed statistics. Random, non-random, and mixed scaling factors for each of the studied statistics produce three different limit distributions. In addition to the expected normal or chi-squared distributions, Student’s t-, Laplace, Fisher, gamma, and weighted sums of generalized gamma distributions also occur.

Keywords:

second order Chebyshev–Edgeworth expansions; negative binomially distributed sample sizes; Pareto-like distributed sample sizes; asymptotically normally distributed statistics; asymptotically chi-square distributed statistics; scaled Student’s t-distribution; normal distribution; discrete Pareto distribution; generalized Laplace distribution; weighted sums of generalized gamma distributions

MSC:

62E17; 62H10; 60E05

1. Introduction

To improve the convergence properties of sums of independent identically distributed random variables in the Central Limit Theorem, asymptotic expansions of distribution functions of normalized sums were considered. The history of asymptotic expansions in nonparametric statistics is presented in detail in Wallace [1], Bickel [2], and Hall [3], among others. Chebyshev–Edgeworth expansions, with which we are concerned here, are presented in great detail in Bhattacharya and Rao [4] for random vectors and in Petrov [5] for one-dimensional random variables. For instance, Pfanzagl [6] and Bentkus et al. [7] emphasize that asymptotic expansions can provide more effective approximations for asymptotic studies in statistical theory. Second order approximations of distribution functions of sums of random variables are of great importance because they take into account the skewness and kurtosis of the random variable, in addition to the expected value and the variance that already enter the Central Limit Theorem. In Burnashev [8], second order expansions are proved for the asymptotically normally distributed sample median

M_{m}

on a sample of size m and its MSE. Based on this, for a Laplace population with density

e^{- | x |} / 2

, the exact MSE is compared numerically with its approximations. For the normal approximation, the influence of the remaining term is below 10% only for

m > 250

, while for the approximation with the second order expansion, the influence of the remaining term is below 10% already from

m = 8

. For a Cauchy population with smooth and heavy tailed density

1 / (π (1 + x^{2}))

, for the normal approximation, the influence of the remaining term is below 10% for

m \geq 23

, while for the approximation with the second order expansion, the influence of the remaining term is below 10% already from

m = 11

. Consequently, as Burnashev [8] pointed out, asymptotic expansions can significantly improve the exactness of statistical conclusions, even in the case of a small number of observations. The results in the abovementioned papers are based on non-random sample sizes or non-random number of observations.

When planning statistical studies, situations often arise where the sample sizes are unknown in advance and are modeled as realizations of random variables. Many models from medicine, finance, risk theory, physics, and reliability lead to samples with random dimensions. For instance, in the papers by Nunes et al. [9,10,11], random size samples were investigated for different models in medical research in order to prevent false conclusions. In Esquível et al. [12], the authors give an informative overview of statistical inference with a random number of observations and some applications. Results for the mean and variance of normally distributed samples, the calculation of quantiles, and interval estimates with random sample size were also proved. Döbler [13] gives a detailed review of the literature on random sums as well as recent results on approximation in various metrics. In Schluter and Trede [14] (Theorem 1, Proposition 1), the authors show, using the convergence of a negative binomial random sum, that the growth rate of cities is Student t-distributed with 2 degrees of freedom. Their empirical investigations verify the result. The references in the above-cited papers provide further applications for random dimension sampling.

Bening et al. [15,16] proved convergence rates and asymptotic expansions for distributions of statistics

T_{N_{n}}

based on samples with random dimension

N_{n} \geq 1

. Here,

T_{m}

is a statistic based on a non-random number

m \geq 1

of independent observations. The random variables

N_{n} \geq 1

form a sequence of integer random sample sizes that depends on a natural parameter n with

N_{n} \to \infty

in probability for

n \to \infty

. Inequalities with a convergence rate are assumed for the approximations of the distribution functions of both the normalized statistics

T_{m}

and the normalized random sample sizes

N_{n}

. As examples, convergence rates and first order asymptotic expansions are derived for the statistics

T_{N_{n}}

, where

T_{m}

is an asymptotically normal statistic and the random sample size

N_{n}

is either negative binomial or Pareto-like distributed.

In Christoph et al. [17], inequalities for the second order approximations of the distribution functions of normalized negative binomial and Pareto-like sample sizes were proved. Consequently, second order Chebyshev–Edgeworth approximations and the corresponding Cornish–Fisher expansions could be obtained for the distribution of the normalized arithmetic mean of a sample with normalized negative binomial or Pareto-like sample sizes where the remainders are of order

n^{- 3 / 2}

.

The present work provides a supplement to our paper, Christoph and Ulyanov [18], where we have developed a formal second order design for asymptotic Chebyshev–Edgeworth approximations. We considered asymptotically normal statistics with sample size having negative binomial distribution as well as asymptotically chi-squared statistics with Pareto-like distributed sample sizes. In addition to the distributions of statistic

T_{m}

and random sample size

N_{n}

, three scaling factors for

T_{N_{n}}

were also introduced, leading to different expansions. That paper was the first to consider approximations for asymptotically chi-squared statistics based on random sample sizes. Some further applications of sampling with random sample sizes were also mentioned there.

In the present paper, we provide similar results for asymptotically normal statistics of samples with Pareto-like distributed sample sizes and for asymptotically chi-squared statistics with sample size having negative binomial distribution.

For the reader's convenience, we list in Section 2 some notations, conditions, and statements that were also used in Christoph and Ulyanov [18]. Section 3 states the necessary approximations for the statistics

T_{m}

and the sample sizes

N_{n}

. The dependence of the limit distributions of the scaled statistic

T_{N_{n}}

on the distributions of the statistic

T_{m}

and the sample size

N_{n}

, as well as the scaling factors, is discussed in Section 4. Section 5 then presents the main results. As examples, we consider the same statistic

T_{m}

as in Christoph and Ulyanov [18] (Corollaries 1 and 2), but with changed sample sizes. Section 6 provides the proofs of the main results, leaving three auxiliary lemmas to Appendix A. Conclusions are presented in Section 7.

2. Notation and Preliminaries

Let

(Ω, A, P)

be a probability space on which all occurring random variables are given.

Denote the set of positive integers, the real axis, the integer part

[y]

of a real number y, and the indicator function as follows:

N_{+} = \{1, 2, \dots\}, \quad R = (- \infty, \infty), \quad y - 1 < [y] \leq y, \quad \text{and} \quad I_{A} = I_{A} (x) = \begin{cases} 1, & x \in A \subset R, \\ 0, & x \notin A \subset R. \end{cases}

Let

X_{1}, X_{2}, X_{3} \dots \in R

be independent identically distributed random variables. Define the statistic

T_{m} := T_{m} (X_{1}, \dots, X_{m}) \quad \text{with} \quad m \in N_{+},

based on the random sample

{X_{1}, X_{2}, \dots, X_{m}}

with a non-random sample size

m \in N_{+}

.

Consider the sequence of discrete random variables

N_{1}, N_{2}, \dots

, depending on an integer parameter

n \geq 1

. This integer

N_{n} \geq 1

indicates the random dimension of the observations

X_{1}, \dots, X_{N_{n}}

. Let us assume that the sample size

N_{n}

does not depend on

X_{1}, X_{2}, X_{3} \dots

, where

N_{n} \to \infty

in probability when

n \to \infty

. Define for each

n \in N_{+}

the statistic

T_{N_{n}}

obtained from a random sample

{X_{1}, X_{2}, \dots, X_{N_{n}}}

by

T_{N_{n}} (ω) : = T_{N_{n} (ω)} (X_{1} (ω), X_{2} (ω), \dots, X_{N_{n} (ω)} (ω)) for each ω \in Ω .

(1)

It follows from Esquível et al. [12] (Theorem 2.1.1) that the statistic

T_{N_{n}}

is well-defined in (1).
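The construction in (1) can be mimicked in a simulation: first a realization of the sample size is drawn, then that many observations are drawn and the statistic is evaluated. The following Python sketch is only an illustration. The function names, the geometric-type size distribution, and the choice of the sample mean as statistic are assumptions made here and are not taken from the paper.

```python
import random
import statistics


def draw_random_size_statistic(stat, draw_obs, draw_size, rng):
    """Return (m, T_m(X_1, ..., X_m)) where m is a realization of N_n.

    The size is drawn first and independently of the observations,
    mirroring the assumption that N_n does not depend on X_1, X_2, ...
    """
    m = draw_size(rng)
    if m < 1:
        raise ValueError("the sample size must be a positive integer")
    xs = [draw_obs(rng) for _ in range(m)]
    return m, stat(xs)


def geometric_size(rng, p=0.05):
    """Hypothetical size model: number of Bernoulli(p) trials up to the first success."""
    m = 1
    while rng.random() >= p:
        m += 1
    return m


rng = random.Random(2023)
m, t = draw_random_size_statistic(
    stat=statistics.fmean,                 # T_m: sample mean (illustrative choice)
    draw_obs=lambda r: r.gauss(0.0, 1.0),  # X_j: standard normal observations
    draw_size=geometric_size,              # N_n: hypothetical size distribution
    rng=rng,
)
print(m, t)
```

Passing the random number generator explicitly keeps the sketch reproducible and makes the independence of the size and the observations visible in the code.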

Since we want to prove second order approximations for the statistic

T_{N_{n}}

in the form of inequalities, we need the corresponding assumptions for the statistic

T_{m}

and for the random sample size

N_{n}

as well.

For the statistic

T_{m}

with

E T_{m} = 0

and the random sample sizes

N_{n} \in N_{+}

we suppose conditions on the structure of the approximating functions as well as on the convergence rate:

Assumption 1.

There are a distribution function

F (x)

, bounded functions

f_{1} (x)

,

f_{2} (x)

which are differentiable for all

x \neq 0

,

γ \in {- 1, - 1 / 2, 0, 1 / 2, 1}

,

a > 1 / 2

as well as

0 < C_{1} < \infty

such that

\sup_{x} | P (m^{γ} T_{m} \leq x) - F (x) - m^{- 1 / 2} f_{1} (x) - I_{a > 1} (a) m^{- 1} f_{2} (x) | \leq C_{1} m^{- a}, \quad m \in N_{+} .

(2)

Assumption 2.

There exists a distribution function

H (y)

with

H (0 +) = 0

, a bounded variation function

h_{2} (y)

, a sequence of numbers

0 < g_{n} ↑ \infty

,

b > 0

, and

0 < C_{2} < \infty

such that for

n \in N_{+}

\left. \begin{array}{ll} \sup_{y \geq 0} |P (g_{n}^{- 1} N_{n} \leq y) - H (y)| \leq C_{2} n^{- b}, & \text{for } 0 < b \leq 1, \\ \sup_{y \geq 0} |P (g_{n}^{- 1} N_{n} \leq y) - H (y) - n^{- 1} h_{2} (y)| \leq C_{2} n^{- b}, & \text{for } b > 1 . \end{array} \right\}

(3)

Remark 1.

Assumptions 1 and 2 require inequalities for the approximations of

T_{m}

and

N_{n}

for all

m, n \in N_{+}

, leading to inequalities for the approximations of

T_{N_{n}}

. See also Remark 5 below on Poisson and binomial random variables

N_{n}

. For these sample sizes, we are so far only aware of estimates of the remaining terms with small-o or large-

O

convergence rates. About the differences between inequalities and

O

order bounds, see, e.g., Fujikoshi and Ulyanov [19] (Chapter 1).

Remark 2.

In Bening et al. [16], these conditions are formulated more generally. Assumption 1 requires the existence of

f_{1}

,...,

f_{l}

with

a > l / 2

and Assumption 2 that of

h_{1}

,...,

h_{k}

with

b > k / 2

. We restrict ourselves here, as in Christoph and Ulyanov [18], to the required approximation functions.

Assumptions 1 and 2 lead to the approximations for the distribution functions of statistics

T_{N_{n}}

:

Proposition 1.

(Christoph and Ulyanov [18], Proposition 1) Let

γ \in {- 1, - 1 / 2, 0, 1 / 2, 1}

. The statistic

T_{m}

and the sample size

N_{n}

are supposed to satisfy Assumptions 1 and 2, respectively. Then,

\begin{matrix} {sup}_{x \in R} | P (g_{n}^{γ} T_{N_{n}} \leq x) - G_{n} (x, 1 / g_{n}) | \leq C_{1} E (N_{n}^{- a}) + (C_{3} D_{n} + C_{4}) n^{- b}, \end{matrix}

(4)

where

a > 0, b > 0

are the convergence rates in (2) and (3),

\begin{matrix} G_{n} (x, 1 / g_{n}) & = & \int_{1 / g_{n}}^{\infty} (F (x y^{γ}) + \frac{f_{1} (x y^{γ})}{\sqrt{g_{n} y}} + \frac{f_{2} (x y^{γ})}{g_{n} y}) d (H (y) + \frac{h_{2} (y)}{n}), \end{matrix}

(5)

\begin{matrix} D_{n} & = & sup_{x} \int_{1 / g_{n}}^{\infty} |\frac{\partial}{\partial y} (F (x y^{γ}) + \frac{f_{1} (x y^{γ})}{\sqrt{g_{n} y}} + \frac{f_{2} (x y^{γ})}{y g_{n}})| d y, \end{matrix}

(6)

and

f_{1} (z), f_{2} (z), h_{2} (y)

are given in (2) and (3). The constants

C_{1}, C_{3}, C_{4}

do not depend on n.

Bening et al. [16] proved general transfer theorems under the conditions indicated in Remark 2 only for case

γ \geq 0

. Therefore, the proof is repeated in Christoph and Ulyanov [20] (Appendix A.1).

3. Second Order Estimates for Both the Statistics $T_{m}$ and the Sample Sizes $N_{n}$

First we consider the following statistics

T_{m}

with non-random sample size m and

E T_{m} = 0

with the corresponding second order approximations. Let the asymptotically normal statistic

T_{m}

satisfy the following inequality:

|P (\sqrt{m} T_{m} \leq x) - Φ (x) - (m^{- 1 / 2} (p_{0} + p_{2} x^{2}) + m^{- 1} (p_{1} x + p_{3} x^{3} + p_{5} x^{5}) I_{a > 1} (a)) φ (x)| \leq C m^{- a}

(7)

with

a > 0

and

Φ (x)

refers to the standard normal distribution function with density function

φ (y)

:

Φ (x) = \int_{- \infty}^{x} φ (y) d y, x \in R, and φ (y) = \frac{1}{\sqrt{2 π}} e^{- y^{2} / 2}, y \in R .

Asymptotically chi-squared distributed statistics

T_{m}

satisfy the following inequality:

|P (m T_{m} \leq x) - G_{d} (x) - m^{- 1} (q_{1} x + q_{2} x^{2}) g_{d} (x)| \leq C m^{- 2},

(8)

where

G_{d} (x)

,

d \in N_{+}

, denotes the chi-squared distribution function with d degrees of freedom and the density function

g_{d} (y)

:

\begin{matrix} g_{d} (y) = \frac{1}{2^{d / 2} Γ (d / 2)} y^{(d - 2) / 2} e^{- y / 2}, y > 0, and G_{d} (x) = P (χ_{d}^{2} \leq x) = \int_{0}^{x} g_{d} (y) d y, x > 0 . \end{matrix}

In Christoph and Ulyanov [18] (Sections 3.1 and 3.2), some examples of such statistics

T_{m}

are given that satisfy (7) or (8) and consequently, Assumption 1.

As already announced, we consider the following random sample sizes

N_{n}

with the corresponding second order approximations.

The Pareto-like random sample sizes

N_{n} (s)

are defined as follows:

Let

Y_{j} (s) \in N_{+}

,

j = 1, 2, \dots

be independent discrete Pareto II random variables with parameter

s > 0

, which are discretized from continuous Lomax (Pareto II) random variables on

N_{+}

, for a review, see, e.g., Buddana and Kozubowski [21]. For

s > 0

, there are defined

P (Y_{j} (s) \leq k) = \frac{k}{s + k}, N_{n} (s) = max_{1 \leq j \leq n} Y_{j} (s) and P (N_{n} (s) \leq k) = {(\frac{k}{s + k})}^{n}, n, k \in N_{+} .

(9)
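Since the distribution function of the Pareto-like sample size in (9) is explicit, realizations can be generated by inverse transform sampling. The sketch below is a hedged illustration with hypothetical function names. It solves (k / (s + k))^n >= u for the smallest integer k and ignores floating-point rounding exactly at a jump point.

```python
import math
import random


def pareto_like_cdf(k, s, n):
    """P(N_n(s) <= k) = (k / (s + k))**n for integer k >= 1, as in (9)."""
    return (k / (s + k)) ** n


def pareto_like_quantile(u, s, n):
    """Smallest integer k >= 1 with P(N_n(s) <= k) >= u, for 0 <= u < 1."""
    if u <= 0.0:
        return 1
    # 1 - u**(1/n), computed via expm1 to stay accurate when u is close to 1.
    one_minus_v = -math.expm1(math.log(u) / n)
    v = 1.0 - one_minus_v
    # (k / (s + k)) >= v is equivalent to k >= s * v / (1 - v).
    return max(1, math.ceil(s * v / one_minus_v))


def sample_pareto_like(s, n, rng):
    """Draw one realization of N_n(s) from a uniform random number."""
    return pareto_like_quantile(rng.random(), s, n)


rng = random.Random(11)
print([sample_pareto_like(2.0, 5, rng) for _ in range(5)])
```

Sampling the maximum directly avoids drawing the n individual variables, which is convenient for large n.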

Proposition 2.

(Christoph and Ulyanov [18], Proposition 4) Let

N_{n} (s)

be the discrete Pareto-like random variable whose distribution function is given in (9); then, for all integers

n \geq 1

and fixed positive

s > 0

, we have

{sup}_{y > 0} |P (\frac{N_{n} (s)}{n} \leq y) - W_{s} (y) - \frac{h_{2; s} (y)}{n}| \leq \frac{C_{2} (s)}{n^{2}}

(10)

where

W_{s} (y) = e^{- s / y}, \quad y > 0, \qquad h_{2; s} (y) = \frac{s e^{- s / y}}{2 y^{2}} (s - 1 + 2 Q_{1} (n y)), \quad y > 0,

(11)

with jump correcting function

Q_{1} (y) = 1 / 2 - (y - [y])

and

C_{2} (s) > 0

does not depend on n. Furthermore,

E {(N_{n} (s))}^{- a} \leq C (a, s) n^{- min {a, 2}},

(12)

with optimal bound in (12) for

0 < a \leq 2

, where a is the convergence rate in (7).
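The approximation (10) can be explored numerically, because the exact distribution function in (9) is available in closed form. The following sketch compares the first order approximation W_s(y) and the second order approximation W_s(y) + h_{2;s}(y)/n on a finite grid. The grid, its upper end, and the function names are assumptions of this illustration, so the computed maxima only approximate the suprema.

```python
import math


def exact_cdf(y, s, n):
    """P(N_n(s) / n <= y) = ([n y] / (s + [n y]))**n, from (9)."""
    k = math.floor(n * y)
    return (k / (s + k)) ** n if k >= 1 else 0.0


def limit_cdf(y, s):
    """W_s(y) = exp(-s / y) for y > 0."""
    return math.exp(-s / y)


def h2(y, s, n):
    """Second order term h_{2;s}(y) from (11), with Q_1(t) = 1/2 - (t - [t])."""
    t = n * y
    q1 = 0.5 - (t - math.floor(t))
    return s * math.exp(-s / y) / (2.0 * y * y) * (s - 1.0 + 2.0 * q1)


def sup_errors(s, n, y_max=50.0, points=20000):
    """Grid maxima of the errors of the first and second order approximations."""
    err1 = err2 = 0.0
    for i in range(1, points + 1):
        y = y_max * i / points
        exact = exact_cdf(y, s, n)
        w = limit_cdf(y, s)
        err1 = max(err1, abs(exact - w))
        err2 = max(err2, abs(exact - w - h2(y, s, n) / n))
    return err1, err2


for n in (25, 50, 100):
    e1, e2 = sup_errors(2.0, n)
    print(n, e1, e2, e2 * n * n)
```

If the scaled error in the last column stays bounded as n grows, this is consistent with the order n^{-2} in (10). Such a run is of course no substitute for the proof.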

Remark 3.

The inverse exponential random variable

W (s)

with distribution function

H_{s} (y) = P (W (s) \leq y) = e^{- s / y} I_{(0, \infty)} (y)

and rate parameter

s > 0

is “heavy tailed” with shape parameter 1 as is

P (N_{n} (s) \leq y)

. Thus, the expected values of these two random variables do not exist.

Suppose the positive integer

N_{n} (r)

has a (shifted by 1) negative binomial distribution with probability of success

1 / n

,

n \in N_{+}

, parameter

r > 0

, probabilities

P (N_{n} (r) = j) = \frac{Γ (j + r - 1)}{Γ (j) Γ (r)} {(\frac{1}{n})}^{r} {(1 - \frac{1}{n})}^{j - 1}, \quad j \in N_{+}, \quad \text{and} \quad g_{n} = E (N_{n} (r)) = r (n - 1) + 1 .

(13)
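The probabilities in (13) and the stated expected value can be checked numerically. The sketch below evaluates the probabilities on the log scale with the log-gamma function and sums them up to a truncation point. The function names and the truncation point are assumptions of this illustration, and the code requires n >= 2 because for n = 1 the distribution is concentrated at j = 1.

```python
import math


def neg_binomial_pmf(j, r, n):
    """P(N_n(r) = j) for j = 1, 2, ... as in (13); requires n >= 2."""
    log_p = (
        math.lgamma(j + r - 1.0) - math.lgamma(j) - math.lgamma(r)
        + r * math.log(1.0 / n)
        + (j - 1) * math.log1p(-1.0 / n)
    )
    return math.exp(log_p)


def truncated_moments(r, n, j_max):
    """Total mass and mean of the probabilities summed over j = 1, ..., j_max."""
    total = mean = 0.0
    for j in range(1, j_max + 1):
        p = neg_binomial_pmf(j, r, n)
        total += p
        mean += j * p
    return total, mean


r, n = 2.5, 10
total, mean = truncated_moments(r, n, 5000)
print(total, mean, r * (n - 1) + 1)
```

Working on the log scale avoids overflow of the gamma functions for large j.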

In statistical studies, for counting models, the negative binomial and Poisson distributions are the two most important ones. In Schluter and Trede [14] (Section 2.1), the authors emphasize that the negative binomial distribution, with its two parameters, can capture the over-dispersion typically observed in count data, while this is not the case with the one-parameter Poisson distribution. They proved in a more general framework

{lim}_{n \to \infty} {sup}_{y} |P (N_{n} (r) / g_{n} \leq y) - G_{r, r} (y)| = 0,

(14)

where

G_{r, r} (y)

denotes the gamma distribution with identical shape and rate parameters

r > 0

, whose density is

g_{r, r} (y) = \frac{r^{r}}{Γ (r)} y^{r - 1} e^{- r y} I_{(0, \infty)} (y), y \in R .

In Bening and Korolev [22] (Lemma 2.2), the result (14) was also obtained.

Proposition 3.

(Christoph and Ulyanov [18], Proposition 3) Let

r > 0

. The discrete random variable

N_{n} (r)

has probabilities and expected value

g_{n}

given in (13). Then, for all

n \in N_{+}

:

{sup}_{y \geq 0} |P (\frac{N_{n} (r)}{g_{n}} \leq y) - G_{r, r} (y) - \frac{h_{2; r} (y)}{n}| \leq C_{2} (r) n^{- min {r, 2}},

(15)

where

C_{2} (r) > 0

does not depend on n and with the jump correcting function

Q_{1} (y) = 1 / 2 - (y - [y])

,

h_{2; r} (y) = \begin{cases} 0, & \text{for } r \leq 1, \\ \frac{g_{r, r} (y)}{2 r} ((y - 1) (2 - r) + 2 Q_{1} (g_{n} y)), & \text{for } r > 1 . \end{cases}

(16)

Moreover, negative moments

E {(N_{n} (r))}^{- α}

satisfy the following estimate for all

r > 0

,

α > 0

E {(N_{n} (r))}^{- α} \leq C (r) \begin{cases} n^{- \min \{r, α\}}, & r \neq α, \\ \ln (n) \, n^{- α}, & r = α, \end{cases}

(17)

and the convergence rate in case

r = α

cannot be improved.

Remark 4.

Second order Chebyshev–Edgeworth expansions (10) and (15) with

r > 1

were first proved in Christoph et al. [17] (Theorems 4 and 1). Approximations in (10) and (15) with remainder estimations

C_{s} / n

or

C_{r} n^{- min {r, 1}}

are given, e.g., in Bening et al. [16] and Gavrilenko et al. [23]. In Christoph et al. [24] (Corollaries 5.4 and 6.5), leading terms for the negative moments of

N_{n} (r)

and

N_{n} (s)

are derived that lead to (17) and (12).

Remark 5.

The negative binomial distribution belongs to the class of Panjer distributions, which also includes the Poisson and binomial distributions. Samples with binomial or Poisson distributed sample sizes were studied among others in the above-cited papers [9,10,11,12]. Convergence rate bounds for statistics based on such samples are given in Döbler [13], Korolev [25], Bulinski and Slepov [26]. Döbler [13], Korolev and Shevtsova [27], Sunklodas [28] obtained Berry–Esseen bounds for sums based on samples with binomial and Poisson sample sizes. To the best of the authors’ knowledge, Chebyshev–Edgeworth expansions for these lattice distributed random variables have only been proven so far with bounds of small-o or large-

O

rates, see, e.g., Petrov [29] (Chapter 6, Theorem 6) or Kolassa and McCullagh [30]. Therefore, inequality (3) in Assumption 2 is not fulfilled.

4. Limit Distributions of Statistics with Random Size Samples using Different Scaling Factors

We now consider the statistics

T_{m}

and the sample sizes

N_{n}

, which are supposed to satisfy the inequalities (2) and (3) in Assumptions 1 and 2, respectively. Let us investigate the scaled statistics

g_{n}^{γ} N_{n}^{γ^{*} - γ} T_{N_{n}}

with the sequence

g_{n} ↑ \infty

as

n \to \infty

. We analyze the two cases

Φ

and

G_{d}

as limiting distributions F in Assumption 1 with respect to the exponents

γ^{*}

and

γ

: If

F = Φ

, then

γ^{*} = 1 / 2

and

γ \in {- 1 / 2, 0, 1 / 2}

, while if

F = G_{d}

, then

γ^{*} = 1

and

γ \in {- 1, 0, 1}

. Then, conditioning on

N_{n}

and using (2) and (3), we have

\begin{aligned} P (g_{n}^{γ} N_{n}^{γ^{*} - γ} T_{N_{n}} \leq x) &= P (N_{n}^{γ^{*}} T_{N_{n}} \leq x {(N_{n} / g_{n})}^{γ}) = \sum_{m = 1}^{\infty} P (m^{γ^{*}} T_{m} \leq x {(m / g_{n})}^{γ}) P (N_{n} = m) \\ &\overset{(2)}{\approx} E (F (x {(N_{n} / g_{n})}^{γ})) = \int_{1 / g_{n}}^{\infty} F (x y^{γ}) d P (N_{n} / g_{n} \leq y) \overset{(3)}{\approx} \int_{1 / g_{n}}^{\infty} F (x y^{γ}) d H (y) . \end{aligned}

(18)

Consequently, the limit distribution of the scaled statistic

g_{n}^{γ} N_{n}^{γ^{*} - γ} T_{N_{n}}

is a scale mixture of underlying F with mixing distribution H:

P (g_{n}^{γ} N_{n}^{γ^{*} - γ} T_{N_{n}} \leq x) \to \int_{0}^{\infty} F (x y^{γ}) d H (y)

, as

n \to \infty .

Refer to, e.g., Choy and Chan [31], Fujikoshi et al. [32] (Chapter 13), and Fujikoshi and Ulyanov [19] (Chapter 2) and the references therein.

The limiting distributions

\int_{1 / g_{n}}^{\infty} F (x y^{γ}) d H (y)

therefore only arise from the leading distributions

F (x)

and

H (y)

in the inequalities (2) and (3) and also depend on the parameter

γ

.

In Christoph and Ulyanov [18] (Sections 5 and 6), the cases

F (x) = Φ (x)

with

H (y) = G_{r, r} (y)

as well as

F (x) = G_{d} (x)

with

H (y) = W_{s} (y)

were considered. Now, we interchange the distributions of random sample sizes

N_{n}

. We first study the limiting distributions of asymptotically normally distributed statistics with Pareto-like distributed sample sizes

N_{n} (s)

and also asymptotically chi-squared distributed statistics with negative binomial distributed sample sizes

N_{n} (r)

. Since

W_{s} (1 / n) = e^{- s n}

and

G_{r, r} (1 / g_{n}) \leq \frac{r^{r - 1}}{Γ (r)} g_{n}^{- r}

hold, the integral range in the last integral in (18) can be extended from

(1 / g_{n}, \infty)

to

(0, \infty)

for further investigations.

4.1. The Case $F (x) = Φ (x)$ and $H (y) = W_{s} (y)$

In Christoph and Ulyanov [20,33], asymptotically normally distributed statistics

T_{m}

for samples of m-dimensional normally distributed vectors were considered: correlation coefficient as well as the three geometric features: the length of a vector, the distance, and the angle between two vectors. Inequalities for second order approximations for statistic

T_{m}

are derived when the dimension m is replaced by Pareto-like distributed random dimension

N_{n} (s)

. For the median of a sample with random sample size

N_{n} (s)

analogous results are shown in Christoph et al. [24] (Section 6). All these asymptotically normally distributed statistics

T_{N_{n} (s)}

with Pareto-like random dimensions or sample sizes have the same limiting distribution.

Let

γ \in {1 / 2, 0, - 1 / 2}

. Since

E N_{n} (s) = \infty

, we choose

g_{n} = n

. Then, the limit laws for

P (n^{γ} N_{n} {(s)}^{1 / 2 - γ} T_{N_{n} (s)} \leq x) \quad \text{are} \quad V_{γ} (x, s) = \int_{0}^{\infty} Φ (x y^{γ}) d H_{s} (y) = \int_{0}^{\infty} Φ (x y^{γ}) \frac{s}{y^{2}} e^{- s / y} d y

with corresponding densities

v_{γ} (x, s) = \frac{s}{\sqrt{2 π}} \int_{0}^{\infty} y^{γ - 2} e^{- (x^{2} y^{2 γ} / 2 + s / y)} d y = \begin{cases} l_{1 / \sqrt{s}} (x) = \frac{\sqrt{2 s}}{2} e^{- \sqrt{2 s} | x |}, & γ = \frac{1}{2}, \\ φ (x) = \frac{1}{\sqrt{2 π}} e^{- x^{2} / 2}, & γ = 0, \\ s_{2}^{*} (x; \sqrt{s}) = \frac{1}{2 \sqrt{2 s}} {(1 + \frac{x^{2}}{2 s})}^{- 3 / 2}, & γ = - \frac{1}{2} . \end{cases}

(19)

Therefore, the limit distributions

V_{γ} (x, s)

are the Laplace law

L_{1 / \sqrt{s}} (x)

with density

l_{1 / \sqrt{s}} (x)

and scale parameter

λ = 1 / \sqrt{s}

for

γ = 1 / 2

, the standard normal law

Φ (x)

and density

φ (x)

for

γ = 0

and for

γ = - 1 / 2

the scaled Student’s t-distribution

S_{2}^{*} (x; \sqrt{s})

with 2 degrees of freedom and density

s_{2}^{*} (x; \sqrt{s})

. These scale mixtures

V_{γ} (x, s)

are discussed in more detail in Christoph and Ulyanov [20] (Section 4.2).
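The three closed forms in (19) can be cross-checked against the integral representation by elementary numerical quadrature. The sketch below substitutes y = exp(t) and applies the trapezoidal rule. The integration range, the step size, and the function names are assumptions of this illustration, not part of the paper.

```python
import math


def mixture_density(x, s, gam, t_min=-15.0, t_max=40.0, step=0.01):
    """Integral in (19), evaluated by the trapezoidal rule in t = log(y)."""
    steps = int(round((t_max - t_min) / step))
    total = 0.0
    for i in range(steps + 1):
        y = math.exp(t_min + i * step)
        # y**(gam - 2) * exp(-(x**2 * y**(2*gam) / 2 + s / y)), times dy/dt = y
        value = y ** (gam - 1.0) * math.exp(-(x * x * y ** (2.0 * gam) / 2.0 + s / y))
        total += (0.5 if i in (0, steps) else 1.0) * value
    return s / math.sqrt(2.0 * math.pi) * total * step


def closed_form_density(x, s, gam):
    """Right-hand sides of (19) for gam in {1/2, 0, -1/2}."""
    if gam == 0.5:    # Laplace density
        return math.sqrt(2.0 * s) / 2.0 * math.exp(-math.sqrt(2.0 * s) * abs(x))
    if gam == 0.0:    # standard normal density
        return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    if gam == -0.5:   # scaled Student t density with 2 degrees of freedom
        return (1.0 + x * x / (2.0 * s)) ** -1.5 / (2.0 * math.sqrt(2.0 * s))
    raise ValueError("gam must be 1/2, 0 or -1/2")


for gam in (0.5, 0.0, -0.5):
    print(gam, mixture_density(0.7, 2.0, gam), closed_form_density(0.7, 2.0, gam))
```

The logarithmic substitution is used because the integrand varies over many orders of magnitude in y, while it is smooth and rapidly decaying in t.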

4.2. The Case $F (x) = G_{d} (x)$ and $H (y) = G_{r, r} (y)$

Asymptotically chi-squared distributed statistics of samples with random sample size were considered for the first time in Christoph and Ulyanov [18] in case of

H (y) = W_{s} (y) = e^{- s / y}

,

y > 0

.

Now, negative binomially distributed sample sizes

N_{n} (r)

are considered. With

γ \in {1, 0, - 1}

and

g_{n} = E N_{n} (r) = r (n - 1) + 1

, the limit distributions for

P (g_{n}^{γ} N_{n} {(r)}^{1 - γ} T_{N_{n} (r)} \leq x) are V_{γ} (x; d, r) = \int_{0}^{\infty} G_{d} (x y^{γ}) d G_{r, r} (y) = \int_{0}^{\infty} G_{d} (x y^{γ}) \frac{r^{r}}{Γ (r)} y^{r - 1} e^{- r y} d y .

The corresponding densities are

\begin{aligned} v_{γ} (x; d, r) &= \frac{r^{r} x^{d / 2 - 1}}{Γ (r) 2^{d / 2} Γ (d / 2)} \int_{0}^{\infty} y^{r + γ d / 2 - 1} e^{- (x y^{γ} / 2 + r y)} d y \\ &= \begin{cases} f^{*} (x; d, 2 r) = \frac{Γ (d / 2 + r) x^{d / 2 - 1}}{Γ (d / 2) Γ (r) 2^{d / 2} r^{d / 2}} {(1 + \frac{x}{2 r})}^{- (d + 2 r) / 2}, & γ = 1, \\ g_{d} (x) = \frac{1}{2^{d / 2} Γ (d / 2)} x^{d / 2 - 1} e^{- x / 2}, & γ = 0, \\ w_{r - d / 2} (x; d, r) = \frac{r}{Γ (r) Γ (d / 2)} {(\frac{x r}{2})}^{r / 2 + d / 4 - 1} K_{r - d / 2} (\sqrt{2 r x}), & γ = - 1 . \end{cases} \end{aligned}

(20)

We prove (20) for

γ = \pm 1

in Section 6 in the proof of Theorem 2.
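Independently of the proof, the closed forms in (20) can be compared numerically with the integral representation. The sketch below does this for d = 3 and r = 2, where r - d/2 = 1/2 and the Macdonald function has the elementary form K_{1/2}(u) = sqrt(π/(2u)) e^{-u}, so that no special-function library is needed. The quadrature settings and function names are assumptions of this illustration.

```python
import math


def chi2_mixture_density(x, d, r, gam, t_min=-40.0, t_max=10.0, step=0.01):
    """Integral representation in (20), trapezoidal rule in t = log(y)."""
    steps = int(round((t_max - t_min) / step))
    total = 0.0
    for i in range(steps + 1):
        y = math.exp(t_min + i * step)
        # y**(r + gam*d/2 - 1) * exp(-(x * y**gam / 2 + r * y)), times dy/dt = y
        value = y ** (r + gam * d / 2.0) * math.exp(-(x * y ** gam / 2.0 + r * y))
        total += (0.5 if i in (0, steps) else 1.0) * value
    const = r ** r * x ** (d / 2.0 - 1.0) / (
        math.gamma(r) * 2.0 ** (d / 2.0) * math.gamma(d / 2.0)
    )
    return const * total * step


def closed_form(x, gam):
    """Closed forms of (20) for the special case d = 3, r = 2."""
    d, r = 3, 2.0
    if gam == 1:    # scaled F density f*(x; d, 2r)
        c = math.gamma(d / 2.0 + r) / (math.gamma(d / 2.0) * math.gamma(r) * (2.0 * r) ** (d / 2.0))
        return c * x ** (d / 2.0 - 1.0) * (1.0 + x / (2.0 * r)) ** (-(d + 2.0 * r) / 2.0)
    if gam == 0:    # chi-squared density g_d(x)
        return x ** (d / 2.0 - 1.0) * math.exp(-x / 2.0) / (2.0 ** (d / 2.0) * math.gamma(d / 2.0))
    if gam == -1:   # w_{1/2}(x; 3, 2) = sqrt(4x) * exp(-sqrt(4x))
        return math.sqrt(4.0 * x) * math.exp(-math.sqrt(4.0 * x))
    raise ValueError("gam must be 1, 0 or -1")


for gam in (1, 0, -1):
    print(gam, chi2_mixture_density(1.3, 3, 2.0, gam), closed_form(1.3, gam))
```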

The scale mixtures

V_{γ} (x; d, r)

are the (scaled by d) F-distribution

F^{*} (x; d, 2 r) = F (x / d; d, 2 r)

with parameters

d \in N_{+}

and

r > 0

and density

f^{*} (x; d, 2 r) = \frac{1}{d} f (\frac{x}{d}; d, 2 r)

for

γ = 1

, the chi-squared distribution

G_{d} (x)

with d degrees of freedom and density

g_{d} (x)

for

γ = 0

and a gamma distribution of generalized type

W_{r - d / 2} (x; d, r)

occurs with density

w_{r - d / 2} (x; d, r)

for

γ = - 1

. The modified Bessel functions of the third kind, also called Macdonald functions,

K_{λ} (u)

also occurred in Christoph and Ulyanov [18,20] in generalized gamma and Laplace densities.

Remark 6.

The Macdonald function satisfies the order-reflection formula

K_{- λ} (u) = K_{λ} (u)

and

K_{λ} (u)

may be expressed for

λ = m + 1 / 2

with integer m in closed form. In Oldham et al. [34] (Formulas 51:4:1 and 26:13:3), the Macdonald functions

K_{- λ} (u) = K_{λ} (u)

for

λ = 1 / 2, 3 / 2, 5 / 2, 7 / 2, 9 / 2

are explicitly given. Using Prudnikov et al. [35] (Formulas 2.3.16.1-3), the densities

w_{r - d / 2} (x; d, r) = w_{m + 1 / 2} (x; d, r)

can be calculated:

w_{m + 1 / 2} (x; d, r) = \frac{r^{r} x^{d / 2 - 1}}{Γ (r) 2^{d / 2} Γ (d / 2)} \begin{cases} {(- 1)}^{m} \sqrt{π} \frac{\partial^{m}}{\partial r^{m}} (r^{- 1 / 2} e^{- \sqrt{2 r x}}), & m = 0, 1, 2, \dots, \\ {(- 2)}^{- m} \sqrt{\frac{π}{r}} \frac{\partial^{- m}}{\partial x^{- m}} e^{- \sqrt{2 r x}}, & m = 0, - 1, - 2, \dots \end{cases}

(21)

Example 1.

Some densities

w_{m + 1 / 2} (x; d, r)

for

m = r - (d + 1) / 2 = - 2, - 1, 0, 1, 2

:

\begin{array}{lll}
m = - 2, & d = 7, \ r = 2, & w_{- 3 / 2} (x; 7, 2) = \frac{4 x}{15} (1 + \sqrt{4 x}) e^{- \sqrt{4 x}}, \\
m = - 1, & d = 4, \ r = 3 / 2, & w_{- 1 / 2} (x; 4, 3 / 2) = \frac{3}{4} \sqrt{3 x} e^{- \sqrt{3 x}}, \\
m = 0, & d = 4, \ r = 5 / 2, & w_{1 / 2} (x; 4, 5 / 2) = \frac{25 x}{12} e^{- \sqrt{5 x}}, \\
m = 0, & d = 3, \ r = 2, & w_{1 / 2} (x; 3, 2) = \sqrt{4 x} e^{- \sqrt{4 x}}, \\
m = 1, & d = 3, \ r = 3, & w_{3 / 2} (x; 3, 3) = \frac{3}{8} (6 x + \sqrt{6 x}) e^{- \sqrt{6 x}}, \\
m = 2, & d = 3, \ r = 4, & w_{5 / 2} (x; 3, 4) = \frac{1}{12} ({(8 x)}^{3 / 2} + 24 x + 3 \sqrt{8 x}) e^{- \sqrt{8 x}} .
\end{array}
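As a worked check of one entry, take d = 3 and r = 2, so that r - d/2 = 1/2. Inserting these values into the case γ = -1 of (20) and using the well-known closed form K_{1/2}(u) = \sqrt{π / (2u)} e^{-u} gives the following short computation. It is meant as an illustration of how the entries arise, not as the derivation used by the authors.

```latex
\begin{aligned}
w_{1/2}(x; 3, 2)
  &= \frac{2}{Γ(2)\, Γ(3/2)} \, x^{3/4} \, K_{1/2}(\sqrt{4x}) \\
  &= \frac{4}{\sqrt{π}} \, x^{3/4} \, \sqrt{\frac{π}{2 \sqrt{4x}}} \; e^{-\sqrt{4x}} \\
  &= 2 \sqrt{x} \, e^{-\sqrt{4x}} = \sqrt{4x} \, e^{-\sqrt{4x}} .
\end{aligned}
```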

Remark 7.

If

m = r - (d + 1) / 2

is an integer, the distribution functions

W_{m + 1 / 2} (x; d, r)

of the densities

w_{m + 1 / 2} (x; d, r)

can also be calculated explicitly by substitution and partial integration.

Example 2.

Distribution functions

W_{λ} (x; d, r)

for given densities

w_{λ} (x; d, r)

with

λ = \pm 1 / 2

:

w_{- 1 / 2} (x; 4, \frac{3}{2}) = \frac{3}{4} \sqrt{3 x} e^{- \sqrt{3 x}} \quad \text{and} \quad W_{- 1 / 2} (x; 4, \frac{3}{2}) = 1 - \frac{1}{2} (2 \sqrt{3 x} + 3 x + 2) e^{- \sqrt{3 x}}

(22)

w_{1 / 2} (x; 4, \frac{5}{2}) = \frac{25 x}{12} e^{- \sqrt{5 x}} \quad \text{and} \quad W_{1 / 2} (x; 4, \frac{5}{2}) = 1 - (\frac{{(5 x)}^{3 / 2}}{6} + \frac{5 x}{2} + \sqrt{5 x} + 1) e^{- \sqrt{5 x}}

(23)

w_{1 / 2} (x; 3, 2) = \sqrt{4 x} e^{- \sqrt{4 x}} \quad \text{and} \quad W_{1 / 2} (x; 3, 2) = 1 - (2 x + 2 \sqrt{x} + 1) e^{- \sqrt{4 x}} .

(24)
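Pairs of densities and distribution functions such as (22) and (24) can be verified numerically by integrating the density. The sketch below uses the composite Simpson rule after the substitution t = u^2, which removes the square-root behaviour at the origin. The helper names and the number of subintervals are assumptions of this illustration.

```python
import math


def w_22(x):
    """Density in (22): d = 4, r = 3/2."""
    return 0.75 * math.sqrt(3.0 * x) * math.exp(-math.sqrt(3.0 * x))


def W_22(x):
    """Distribution function in (22)."""
    u = math.sqrt(3.0 * x)
    return 1.0 - 0.5 * (2.0 * u + u * u + 2.0) * math.exp(-u)


def w_24(x):
    """Density in (24): d = 3, r = 2."""
    return math.sqrt(4.0 * x) * math.exp(-math.sqrt(4.0 * x))


def W_24(x):
    """Distribution function in (24)."""
    return 1.0 - (2.0 * x + 2.0 * math.sqrt(x) + 1.0) * math.exp(-math.sqrt(4.0 * x))


def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0


def cdf_by_quadrature(w, x):
    """Integrate the density w over [0, x] after the substitution t = u**2."""
    return simpson(lambda u: w(u * u) * 2.0 * u, 0.0, math.sqrt(x))


for x in (0.5, 2.0, 10.0):
    print(x, cdf_by_quadrature(w_22, x), W_22(x), cdf_by_quadrature(w_24, x), W_24(x))
```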

Remark 8.

The generalized gamma distribution

G^{*} (x; β, α, λ)

has two shape parameters α and β, a scale parameter λ, and the density

g^{*} (x; β, α, λ) = \frac{| α | λ^{β}}{Γ (β)} x^{α β - 1} e^{- λ x^{α}}, x \geq 0, | α | > 0, β > 0, λ > 0 .

(25)

The density (25) is given in Korolev and Zeifman [36] and Korolev and Gorshenin [37] and summarizes many known densities. Generalized gamma distributions are defined in many different ways, but they do not correspond to the ones that occur above.

Remark 9.

The densities

w_{m + 1 / 2} (x; d, r)

with integer

m = r - (d + 1) / 2

are generalized gamma densities

g^{*} (x; β, α, λ)

given in formula (25) or may be represented as linear combinations of such densities. The parameters

α = 1 / 2

and

λ = \sqrt{2 r}

apply in all densities

g^{*} (x; β, α, λ)

. The parameter β also depends on the number of derivatives

m = r - (d + 1) / 2

in the densities

(21)

.

Example 3.

Some linear combinations of generalized gamma densities:

\begin{matrix} w_{1 / 2} (x; 3, 2) & = & g^{*} (x; 3, 1 / 2, \sqrt{4}) \\ w_{3 / 2} (x; 3, 3) & = & \frac{3}{4} g^{*} (x; 4, 1 / 2, \sqrt{6}) + \frac{1}{4} g^{*} (x; 3, 1 / 2, \sqrt{6}) \\ w_{5 / 2} (x; 3, 4) & = & \frac{1}{2} g^{*} (x; 5, 1 / 2, \sqrt{8}) + \frac{3}{8} g^{*} (x; 4, 1 / 2, \sqrt{8}) + \frac{1}{8} g^{*} (x; 3, 1 / 2, \sqrt{8}) . \end{matrix}
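The representations in Example 3 can be checked pointwise by evaluating the generalized gamma density (25) directly. The following sketch does this for the second and third line of the example. The function names are hypothetical, and the closed forms of the densities are those listed in Example 1.

```python
import math


def gen_gamma_density(x, beta, alpha, lam):
    """Generalized gamma density g*(x; beta, alpha, lambda) from (25), for x > 0."""
    return (
        abs(alpha) * lam ** beta / math.gamma(beta)
        * x ** (alpha * beta - 1.0) * math.exp(-lam * x ** alpha)
    )


def w_3_3(x):
    """w_{3/2}(x; 3, 3) from Example 1."""
    u = math.sqrt(6.0 * x)
    return 0.375 * (u * u + u) * math.exp(-u)


def w_3_4(x):
    """w_{5/2}(x; 3, 4) from Example 1."""
    u = math.sqrt(8.0 * x)
    return (u ** 3 + 3.0 * u * u + 3.0 * u) * math.exp(-u) / 12.0


def combination_3_3(x):
    lam = math.sqrt(6.0)
    return 0.75 * gen_gamma_density(x, 4, 0.5, lam) + 0.25 * gen_gamma_density(x, 3, 0.5, lam)


def combination_3_4(x):
    lam = math.sqrt(8.0)
    return (
        0.5 * gen_gamma_density(x, 5, 0.5, lam)
        + 0.375 * gen_gamma_density(x, 4, 0.5, lam)
        + 0.125 * gen_gamma_density(x, 3, 0.5, lam)
    )


for x in (0.1, 1.0, 5.0):
    print(x, w_3_3(x), combination_3_3(x), w_3_4(x), combination_3_4(x))
```

In both cases the weights are nonnegative and sum to one, so the linear combinations are finite mixtures of generalized gamma densities.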

5. Main Results

Inequalities for approximations to scaled statistics

P (g_{n}^{γ} N_{n}^{γ^{*} - γ} T_{N_{n}} \leq x)

for

γ \in {0, \pm 1 / 2, \pm 1}

will be presented. Here,

γ^{*} = 1 / 2

and

γ \in {0, \pm 1 / 2}

when the statistic

T_{m}

is asymptotically normally distributed, or

γ^{*} = 1

and

γ \in {0, \pm 1}

when normalized

T_{m}

has chi-squared limit distribution.

5.1. Asymptotically Normal Statistics $T_{m}$ and Pareto-like Sample Sizes $N_{n} (s)$

Let asymptotically normal statistic

T_{m}

satisfy inequality (7) with coefficients

p_{k}

and the rate of convergence

a > 0

. The Pareto-like sample size

N_{n} = N_{n} (s)

,

s > 0

, is given in (9), which fulfills the inequality (10). For the scaling factors, select

γ^{*} = 1 / 2

and

γ \in {0, \pm 1 / 2}

in formula (18).

Theorem 1.

Under the conditions given above, the following approximations apply:

i:: Let $γ = 1 / 2$ . The non-random scaling factor $\sqrt{n}$ for the statistic $T_{N_{n} (s)}$ leads to approximations by the Laplace distribution $L_{1 / \sqrt{s}} (x)$ with the density $l_{1 / \sqrt{s}} (x)$ stated in (19) for $γ = 1 / 2$ :

${sup}_{x} |P (\sqrt{n} T_{N_{n} (s)} \leq x) - L_{1 / \sqrt{s}; n} (x)| \leq C_{s} n^{- min {a, 2}}$

where $a > 0$ is the rate of convergence in (7) and

$\begin{matrix} L_{1 / \sqrt{s}; n} (x) & = & L_{1 / \sqrt{s}} (x) + l_{1 / \sqrt{s}} (x) (\frac{I_{{a > 1 / 2}} (a)}{\sqrt{n}} [p_{2} x^{2} + p_{0} (\frac{| x |}{\sqrt{2 s}} + \frac{1}{2 s})] \\ + \frac{I_{{a > 1}} (a)}{n} [p_{5} x^{3} | x | \sqrt{2 s} + p_{3} x^{3} + (p_{1} + \frac{s - 1}{4}) x (\frac{| x |}{\sqrt{2 s}} + \frac{1}{2 s})]) . \end{matrix}$
ii:: Let $γ = 0$ . The random scaling factor $\sqrt{N_{n} (s)}$ with $T_{N_{n} (s)}$ leads to the normal approximation $Φ (x)$ :

${sup}_{x} |P (\sqrt{N_{n} (s)} T_{N_{n} (s)} \leq x) - Φ (x) - φ_{n, 2} (x)| \leq C_{s} n^{- min {a, 2}},$

where $a > 0$ is the rate of convergence in (7) and

$φ_{n, 2} (x) = φ (x) (\frac{\sqrt{π} (p_{0} + p_{2} x^{2})}{2 \sqrt{s n}} I_{{a > 1 / 2}} (a) + \frac{p_{1} x + p_{3} x^{3} + p_{5} x^{5}}{s n} I_{{a > 1}} (a)) .$
iii:: Let $γ = - 1 / 2$ . The mixed scaling factor $n^{- 1 / 2} N_{n} (s)$ at $T_{N_{n} (s)}$ results in Scaled Student’s t-distribution $S_{2}^{*} (x; \sqrt{s})$ with density $s_{2}^{*} (x; \sqrt{s})$ given in (19) for $γ = - 1 / 2$ :

${sup}_{x} |P (n^{- 1 / 2} N_{n} (s) T_{N_{n} (s)} \leq x) - S_{n; 2}^{*} (x)| \leq C_{s} n^{- min {a, 2}},$

where $a > 0$ is the rate of convergence in (7) and

$\begin{matrix} S_{n; 2}^{*} (x; \sqrt{s}) & = & S_{2}^{*} (x; \sqrt{s}) + s_{2}^{*} (x; \sqrt{s}) (\frac{I_{{a > 1 / 2}} (a)}{\sqrt{n}} [p_{0} + \frac{3 p_{2} x^{2}}{(x^{2} + 2 s)}] \\ + \frac{I_{{a > 1}} (a)}{n} [\frac{3 p_{1} x}{x^{2} + 2 s} + \frac{15 p_{3} x^{3}}{{(x^{2} + 2 s)}^{2}} + \frac{105 p_{5} x^{5}}{{(x^{2} + 2 s)}^{3}} + \frac{3 (s - 1) x}{4 (x^{2} + 2 s)}]) . \end{matrix}$

As applications of Theorem 1, we now examine the Student t-distribution, the Student t-test statistic, and the sample mean as asymptotically normal statistics

T_{m}

considered in Christoph and Ulyanov [18] (Section 3.1 and Corollary 1) for the case of negative binomial sample sizes

N_{n} = N_{n} (r)

.

Corollary 1.

Let the conditions of Theorem 1 be satisfied:

i:: Let $γ = 1 / 2$ . In case of the Student’s t-statistic $T_{m} = Z / \sqrt{χ_{m}^{2}}$ with m degrees of freedom estimated in [18] (Formula (18)), inequality (7) is valid with $p_{0} = p_{2} = p_{5} = 0$ , $p_{1} = p_{3} = 1 / 4$ and $a = 2$ . The non-random scaling factor $\sqrt{n}$ and Pareto-like $N_{n} (s)$ sample sizes lead to:

${sup}_{x} |P (\frac{\sqrt{n} Z}{\sqrt{χ_{N_{n} (s)}^{2}}} \leq x) - L_{1 / \sqrt{s}} (x) - \frac{l_{1 / \sqrt{s}} (x)}{8 n} (2 x^{3} + x (1 + | x | \sqrt{2 s}))| \leq C_{s} n^{- 2} .$
ii:: Let $γ = 0$ . Let $T_{m} = ({\bar{X}}_{m} - μ) / {\hat{σ}}_{m}$ be the Student’s t-statistic with sample mean ${\bar{X}}_{m}$ and sample variance ${\hat{σ}}_{m}$ , which was considered in [18] (Formulas (21) and (20)). The first order approximation (7) with $p_{0} = λ_{3} / 6$ , $p_{2} = λ_{3} / 3$ , $a = 1$ , the Pareto-like random sample sizes $N_{n} (s)$ and the random scaling factor $\sqrt{N_{n} (s)}$ result in:

${sup}_{x} |P (\sqrt{N_{n} (s)} T_{N_{n} (s)} \leq x) - Φ (x) - φ (x) \frac{\sqrt{π} (λ_{3} + 2 λ_{3} x^{2})}{12 \sqrt{s n}}| \leq C_{s} n^{- 1},$
iii:: Let $γ = - 1 / 2$ . Considering sample mean $T_{m} = {\bar{X}}_{m}$ estimated in [18] (Formulas (15) and (16)), one has (7) with $p_{0} = - p_{2} = λ_{3} / 6$ , $p_{1} = λ_{4} / 8 - 5 λ_{3}^{2} / 24$ , $p_{3} = - λ_{4} / 24 + 5 λ_{3}^{2} / 36$ , $p_{5} = - λ_{3}^{2} / 72$ , $a = 3 / 2$ , Pareto-like random sample sizes $N_{n} (s)$ and mixed scaling factor $n^{- 1 / 2} N_{n} (s)$ , then

${sup}_{x} |P (n^{- 1 / 2} N_{n} (s) T_{N_{n} (s)} \leq x) - S_{2}^{*} (x; \sqrt{s}) - s_{n; 2}^{*} (x; \sqrt{s})| \leq C_{s} n^{- 3 / 2},$

with

$\begin{matrix} s_{n; 2}^{*} (x; \sqrt{s}) & = & s_{2}^{*} (x; \sqrt{s}) (\frac{1}{\sqrt{n}} (\frac{λ_{3}}{6} - \frac{λ_{3} x^{2}}{2 (x^{2} + 2 s)}) \\ + \frac{1}{n} (\frac{(3 λ_{4} - 5 λ_{3}^{2}) x}{8 (x^{2} + 2 s)} - \frac{5 (3 λ_{4} - 10 λ_{3}^{2}) x^{3}}{24 {(x^{2} + 2 s)}^{2}} - \frac{35 λ_{3}^{2} x^{5}}{24 {(x^{2} + 2 s)}^{3}} + \frac{3 (s - 1) x}{4 (x^{2} + 2 s)})) . \end{matrix}$
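The coefficient substitutions behind Corollary 1 can be checked numerically. The following Python sketch evaluates the $1 / n$ brackets of Theorem 1(i) and Theorem 1(iii) for the coefficients stated in Corollary 1(i) and (iii) and compares them with the closed forms of the corollary. All function names are illustrative and not taken from the paper; the $x^{3}$ term in case (iii) is evaluated as $15 p_{3}$ with $p_{3} = - λ_{4} / 24 + 5 λ_{3}^{2} / 36$.

```python
import math

def theorem1_i_bracket(x, s, p1, p3, p5):
    """1/n bracket of Theorem 1(i): Laplace limit, gamma = 1/2."""
    tail = abs(x) / math.sqrt(2 * s) + 1 / (2 * s)
    return (p5 * x**3 * abs(x) * math.sqrt(2 * s)
            + p3 * x**3
            + (p1 + (s - 1) / 4) * x * tail)

def theorem1_iii_bracket(x, s, p1, p3, p5):
    """1/n bracket of Theorem 1(iii): scaled Student limit, gamma = -1/2."""
    denom = x * x + 2 * s
    return (3 * p1 * x / denom
            + 15 * p3 * x**3 / denom**2
            + 105 * p5 * x**5 / denom**3
            + 3 * (s - 1) * x / (4 * denom))

def corollary1_i_closed_form(x, s):
    """Factor of l(x)/n in Corollary 1(i), with the constant 1/8 included."""
    return (2 * x**3 + x * (1 + abs(x) * math.sqrt(2 * s))) / 8

def sample_mean_coefficients(lam3, lam4):
    """Coefficients p1, p3, p5 of the sample mean as listed in Corollary 1(iii)."""
    p1 = lam4 / 8 - 5 * lam3**2 / 24
    p3 = -lam4 / 24 + 5 * lam3**2 / 36
    p5 = -lam3**2 / 72
    return p1, p3, p5

def corollary1_iii_closed_form(x, s, lam3, lam4):
    """1/n bracket of s*_{n;2} in Corollary 1(iii)."""
    denom = x * x + 2 * s
    return ((3 * lam4 - 5 * lam3**2) * x / (8 * denom)
            - 5 * (3 * lam4 - 10 * lam3**2) * x**3 / (24 * denom**2)
            - 35 * lam3**2 * x**5 / (24 * denom**3)
            + 3 * (s - 1) * x / (4 * denom))
```

For the Student t-statistic of Corollary 1(i) one would call `theorem1_i_bracket(x, s, 0.25, 0.25, 0.0)`; for the sample mean, the coefficients come from `sample_mean_coefficients`.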

5.2. Asymptotically Chi-Squared Distributed $T_{m}$ with Negative Binomially Distributed Sample Sizes $N_{n} (r)$

Let the asymptotically chi-squared distributed statistics

T_{m}

satisfy inequality (8) with coefficients

q_{1}

,

q_{2}

and the rate of convergence

a = 2

. The negative binomially distributed sample sizes

N_{n} = N_{n} (r)

with parameter

r > 0

and success probability

1 / n

are given in (13) and fulfill the inequality (15). For the scaling factors, choose

γ^{*} = 1

and

γ \in {0, \pm 1}

in formula (18).

Theorem 2.

Under the conditions given above, the following approximations apply.

i:: Let $γ = 1$ . The non-random scaling factor $g_{n} = E N_{n} (r) = r (n - 1) + 1$ at statistics $T_{N_{n} (r)}$ leads to approximations by the scaled F-distribution $F^{*} (x; d, 2 r) = F (x / d; d, 2 r)$ having parameters $d \in N_{+}$ and $r > 0$ and density $f^{*} (x; d; 2 r) = \frac{1}{d} f (\frac{x}{d}; d; 2 r)$ given in (20) with $γ = 1$ :

${sup}_{x} |P (g_{n} T_{N_{n} (r)} \leq x) - F^{*} (x; d, 2 r) - f_{n}^{*} (x; d, 2 r)| \leq C_{r} \{\begin{matrix} n^{- min {r, 2}}, & r \neq 2, \\ n^{- 2} ln n, & r = 2, \end{matrix}$

where

$f_{n}^{*} (x; d, 2 r) = \frac{f^{*} (x; d, 2 r)}{g_{n}} I_{{r > 1}} (r) ((q_{1} - \frac{2 - r}{2}) \frac{x (2 r + x)}{2 r + d - 2} + q_{2} x^{2} + \frac{x (2 - r)}{2}) .$

(26)
ii:: For $γ = 0$ and random scaling factor $N_{n} (r)$ at $T_{N_{n} (r)}$ , the approximation $G_{d} (x)$ does not change:

${sup}_{x} |P (N_{n} (r) T_{N_{n} (r)} \leq x) - G_{d} (x; n)| \leq C_{r} \{\begin{matrix} n^{- min {r, 2}}, & r \neq 2, \\ n^{- 2} ln n, & r = 2, \end{matrix}$

where

$G_{d} (x; n) = G_{d} (x) + \frac{g_{d} (x)}{g_{n}} I_{{r > 1}} (r) (q_{1} x + q_{2} x^{2}) \frac{r}{r - 1} .$
iii:: Let $γ = - 1$ and $r \geq 2$ . The mixed scaling factor $g_{n}^{- 1} N_{n}^{2} (r)$ at $T_{N_{n} (r)}$ results in a gamma distribution of generalized type $W_{r - d / 2} (x; d, r)$ with density $w_{r - d / 2} (x; d, r)$ given in (20) for $γ = - 1$ :

$sup_{x} |P (\frac{N_{n}^{2} (r)}{g_{n}} T_{N_{n} (r)} \leq x) - W_{r - d / 2; n} (x; d, r)| \leq C_{r} \{\begin{matrix} n^{- 2}, & r > 2, \\ n^{- 2} ln n, & r = 2, \end{matrix},$

where

$\begin{matrix} W_{r - d / 2; n} (x; d, r) & = & W_{r - d / 2} (x; d, r) + \frac{w_{r - d / 2} (x; d, r)}{g_{n}} I_{{r > 1}} (r) (2 q_{2} r x + \frac{(r - 2) x}{2} \\ + & \frac{\sqrt{2 r x}}{2} (2 q_{1} + 2 q_{2} (d + 2 - 2 r) + 2 - r) \frac{K_{r - d / 2 - 1} (\sqrt{2 r x})}{K_{r - d / 2} (\sqrt{2 r x})}) . \end{matrix}$

The restriction $r \geq 2$ in Theorem 2(iii) has a purely proof-technical character. In Proposition 4, a result is shown with $r = 3 / 2$ .

Remark 10.

The function

R (u; d, r) = \frac{K_{λ - 1} (u)}{K_{λ} (u)}

can be calculated explicitly for

λ = m + 1 / 2

with integer

m = r - (d + 1) / 2

. Then, for example,

R (\sqrt{3 x}; 4, 3 / 2) = 1 + \frac{1}{\sqrt{3 x}}

and

R (\sqrt{4 x}; 3, 2) = 1

.
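The explicit evaluation mentioned in Remark 10 can be illustrated with a short Python sketch. It assumes only the standard closed form $K_{1 / 2} (u) = \sqrt{π / (2 u)} e^{- u}$, the reflection $K_{- ν} (u) = K_{ν} (u)$ and the three-term recursion $K_{ν + 1} (u) = (2 ν / u) K_{ν} (u) + K_{ν - 1} (u)$; the function names are hypothetical.

```python
import math

def macdonald_half_integer(twice_order, u):
    """K_nu(u) for nu = twice_order / 2 with odd twice_order and u > 0."""
    if twice_order % 2 == 0:
        raise ValueError("order must be a half-integer (odd twice_order)")
    k = abs(twice_order)                  # reflection: K_{-nu} = K_nu
    k_prev = math.sqrt(math.pi / (2 * u)) * math.exp(-u)   # K_{-1/2}
    k_curr = k_prev                                        # K_{1/2}
    # Upward recursion K_{nu+1} = (2 nu / u) K_nu + K_{nu-1}, nu = 1/2, 3/2, ...
    for twice_nu in range(1, k, 2):
        k_prev, k_curr = k_curr, (twice_nu / u) * k_curr + k_prev
    return k_curr

def bessel_ratio(u, d, r):
    """R(u; d, r) = K_{lambda-1}(u) / K_lambda(u) with lambda = r - d/2."""
    twice_lambda = 2 * r - d
    if abs(twice_lambda - round(twice_lambda)) > 1e-12:
        raise ValueError("2r - d must be an odd integer")
    twice_lambda = int(round(twice_lambda))
    return (macdonald_half_integer(twice_lambda - 2, u)
            / macdonald_half_integer(twice_lambda, u))
```

With these definitions, `bessel_ratio(math.sqrt(3 * x), 4, 1.5)` reproduces $1 + 1 / \sqrt{3 x}$ and `bessel_ratio(math.sqrt(4 * x), 3, 2)` returns 1, in agreement with the two examples of the remark.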

Example 4.

Let

γ = - 1

in (20),

r = 2

and

d = 3

. Then, for an asymptotically chi-squared distributed test variable

T_{m}

satisfying (8), with scale factor

\frac{N_{n}^{2} (2)}{2 n - 1}

, the estimation holds:

sup_{x > 0} |P (\frac{N_{n}^{2} (2)}{2 n - 1} T_{N_{n} (2)} \leq x) - W_{1 / 2} (x; 3, 2) - \frac{w_{1 / 2} (x; 3, 2)}{2 n - 1} \sqrt{4 x} (q_{2} \sqrt{4 x} + q_{1} + q_{2})| \leq C_{2} \frac{ln n}{n^{2}},

where

W_{1 / 2} (x; 3, 2)

and

w_{1 / 2} (x; 3, 2)

are specified in (24).

As applications of Theorem 2, we now examine Hotelling’s

T_{0}^{2}

distribution and normalized quotients of two independent chi-square distributions as asymptotic chi-square distributions, considered in Christoph and Ulyanov [18] (Section 3.2 and Corollary 2) where the sample sizes

N_{n} = N_{n} (s)

had Pareto-like distribution.

Corollary 2.

Let the conditions of Theorem 2 be satisfied:

i:: Let $γ = 1$ . Consider Hotelling’s generalized $T_{0}^{2}$ -statistic $T_{0}^{2} = T_{m} = tr (S_{q} S_{m}^{- 1})$ with independently distributed random matrices $S_{q}$ and $S_{m}$ having Wishart distributions $W_{p} (q, I_{p})$ and $W_{p} (m, I_{p})$ , respectively. Then, inequality (8) holds with limit distribution $G_{d} (x)$ , $d = p q$ , $q_{1} = (p + 1 - q) / 2$ and $q_{2} = (p + 1 + q) / (2 d + 4)$ . The non-random scaling factor $g_{n} = E N_{n} (r)$ by $T_{N_{n} (r)}$ leads to

${sup}_{x} |P (g_{n} T_{N_{n} (r)} \leq x) - F^{*} (x; p q, 2 r) - f_{n}^{*} (x; p q, 2 r)| \leq C_{r} \{\begin{matrix} n^{- min {r, 2}}, & r \neq 2, \\ n^{- 2} ln n, & r = 2, \end{matrix}$

(27)

where the scaled F-distribution $F^{*} (x; p q, 2 r)$ with density $f^{*} (x; p q, 2 r)$ is given in (20) for $γ = 1$

$\begin{matrix} f_{n}^{*} (x; p q, 2 r) & = & \frac{f^{*} (x; p q, 2 r)}{g_{n}} I_{{r > 1}} (r) ((\frac{p + 1 - q}{2} - \frac{2 - r}{2}) \frac{x (2 r + x)}{2 r + p q - 2} \\ + \frac{(p + 1 + q) x^{2}}{2 p q + 4} + \frac{x (2 - r)}{2}) . \end{matrix}$

(28)
ii:: Let $γ = 0$ , $χ_{d}^{2}$ and $χ_{m}^{2}$ be independent and $T_{m} = χ_{d}^{2} / χ_{m}^{2}$ be scale mixtures satisfying inequality (8) with coefficients $q_{1} = (d - 2) / 2$ and $q_{2} = - 1 / 2$ . Random degrees of freedom $N_{n} (r)$ instead of m and random scaling factor $N_{n} (r)$ lead to

$sup_{x > 0} |P (N_{n} (r) T_{N_{n} (r)} \leq x) - G_{d} (x; n)| \leq C_{r} \{\begin{matrix} n^{- min {r, 2}}, & r \neq 2, \\ n^{- 2} ln n, & r = 2, \end{matrix}$

where

$G_{d} (x; n) = G_{d} (x) + \frac{g_{d} (x)}{2 g_{n}} I_{{r > 1}} (r) ((d - 2) x - x^{2}) \frac{r}{r - 1} .$
iii:: Let $γ = - 1$ . The statistics $T_{m} = χ_{4}^{2} / χ_{m}^{2}$ satisfy the inequality (8) with the limiting distribution $G_{4} (x)$ and the coefficients $q_{1} = 1$ and $q_{2} = - 1 / 2$ . The mixed scaling factor $g_{n}^{- 1} N_{n}^{2} (r)$ at $T_{N_{n} (r)}$ results in a limiting gamma distribution of generalized type $W_{r - d / 2} (x; d, r)$ . The Macdonald functions $K_{r - d / 2} (\sqrt{2 r x})$ involved can be calculated explicitly only if $m = r - (d + 1) / 2$ is an integer. Since $d = 4$ , we choose $r = 5 / 2$ and find $r - (d + 1) / 2 = 0$ . Then, uniformly in $x > 0$ :

$|P (\frac{N_{n}^{2} (5 / 2)}{(5 n - 3) / 2} \frac{χ_{4}^{2}}{χ_{N_{n} (5 / 2)}^{2}} \leq x) - W_{1 / 2} (x; 4, 5 / 2) + \frac{w_{1 / 2} (x; 4, 5 / 2)}{2 (5 n - 3)} (9 x - \sqrt{5 x})| \leq \frac{C_{3 / 2}}{n^{3 / 2}},$

where $W_{1 / 2} (x; 4, 5 / 2)$ and $w_{1 / 2} (x; 4, 5 / 2)$ are specified in (23).

Remark 11.

Monahkov [38] proves an estimate analogous to (27), but with 11 approximation terms in the formula corresponding to (28). Instead of (8) with

q_{1} = (p + 1 - q) / 2

,

q_{2} = (p + 1 + q) / (2 d + 4)

and

d = p q

, the following equivalent inequality is used; see Fujikoshi et al. [39] (Theorem 4.1(ii)):

{sup}_{x} |P (m tr (S_{q} S_{m}^{- 1}) \leq x) - G_{d} (x) - \frac{d}{4 m} (a_{0} G_{d} (x) + a_{1} G_{d + 2} (x) + a_{2} G_{d + 4} (x))| \leq \frac{C}{m^{2}}

where

a_{0} = q - p - 1

,

a_{1} = - 2 q

,

a_{2} = q + p + 1

with

a_{0} + a_{1} + a_{2} = 0

and

d = p q

.

Proposition 4.

Let

γ = - 1

. Consider the statistics

T_{m} = χ_{4}^{2} / χ_{m}^{2}

, satisfying the inequality (8) with the limiting distribution

G_{4} (x)

, the coefficients

q_{1} = 1

and

q_{2} = - 1 / 2

and the mixed scaling factor

g_{n}^{- 1} N_{n}^{2} (r)

at

T_{N_{n} (r)}

. If

r = 3 / 2

and

d = 4

, then

r - (d + 1) / 2 = - 1

,

g_{n} = (3 n - 1) / 2

and, uniformly in

x > 0

:

|P (\frac{N_{n}^{2} (3 / 2)}{(3 n - 1) / 2} \frac{χ_{4}^{2}}{χ_{N_{n} (3 / 2)}^{2}} \leq x) - W_{- 1 / 2} (x; 4, 3 / 2) + \frac{w_{- 1 / 2} (x; 4, 3 / 2)}{2 (3 n - 1)} (7 x + \sqrt{3 x} + 1)| \leq \frac{C_{3 / 2}}{n^{3 / 2}},

where

W_{- 1 / 2} (x; 4, 3 / 2)

and

w_{- 1 / 2} (x; 4, 3 / 2)

are specified in (22).

6. Proofs

For the proofs of Theorems 1 and 2, we use Proposition 1. The statistics

T_{m}

and the sample size

N_{n}

are either asymptotically normally and discretely Pareto-like distributed (i.e.,

F = Φ

and

H = W_{s}

) or asymptotically chi-squared and negatively binomially distributed (i.e.,

F = G_{d}

and

H = G_{r, r}

). In both cases, the quantity

D_{n}

defined in (6) is uniformly bounded for all

n \in N_{+}

, see Christoph and Ulyanov [18] (Lemma A1). Next, the bounds that are required in (4) for the negative moments of sample sizes

E N_{n} {(s)}^{- a}

and

E N_{n} {(r)}^{- a}

are provided by (12) and (17). Furthermore, it follows from Christoph and Ulyanov [18] (Proposition 2 and Lemma A2) that in both cases the domain of integration of the integrals in the function

G_{n} (x, 1 / g_{n})

defined in (5) can be extended from

(1 / g_{n}, \infty)

to

(0, \infty)

:

{sup}_{x} | G_{n} (x, 1 / g_{n}) - G_{n, 2} (x) | \leq C g_{n}^{- b},

where

b = 2

if

F = Φ

and

H = W_{s}

or

b = min {r, 2}

if

F = G_{d}

and

H = G_{r, r}

, respectively, and

G_{n, 2} (x) = \{\begin{matrix} \int_{0}^{\infty} F (x y^{γ}) d H (y), & \text{for } 0 < b \leq 1 / 2, \\ \int_{0}^{\infty} (F (x y^{γ}) + \frac{f_{1} (x y^{γ})}{\sqrt{g_{n} y}}) d H (y) = : G_{n, 1} (x), & \text{for } 1 / 2 < b \leq 1, \\ G_{n, 1} (x) + \int_{0}^{\infty} \frac{f_{2} (x y^{γ})}{g_{n} y} d H (y) + \int_{0}^{\infty} \frac{F (x y^{γ})}{n} d h_{2} (y), & \text{for } b > 1 . \end{matrix}

(29)

We still have to calculate the integrals in (29) that contain

f_{1}

,

f_{2}

, and

h_{2}

, respectively.

Proof of Theorem 1.

We now consider

F = Φ

,

H = W_{s}

and

γ \in {0; \pm 1 / 2}

. Here,

f_{1} (x y^{γ}) = (p_{0} + p_{2} x^{2} y^{2 γ}) φ (x y^{γ})

,

f_{2} (x y^{γ}) = (p_{1} x y^{γ} + p_{3} x^{3} y^{3 γ} + p_{5} x^{5} y^{5 γ}) φ (x y^{γ})

and we divide the function

h_{2} (y) = h_{2; s} (y)

given in (11) into two parts:

h_{2; s}^{*} (y) = s (s - 1) e^{- s / y} / (2 y^{2})

and

h_{2; s}^{* *} (y) = s Q_{1} (n y) y^{- 2} e^{- s / y}

. The densities of the limit distributions

V_{γ} (x; d, r) = \int_{0}^{\infty} Φ (x y^{γ}) d W_{s} (y)

were given in (20). If

γ = 1 / 2

, then to calculate the integrals in (29) involving

f_{1} (x \sqrt{y})

,

f_{2} (x \sqrt{y})

and

h_{2; s}^{*} (y)

we use Prudnikov et al. [35] (Formulas 2.3.16.2 and 2.3.16.3):

\int_{0}^{\infty} y^{- m - 1 / 2} e^{- p y - q / y} d y = \{\begin{matrix} {(- 1)}^{- m} \sqrt{π} \frac{\partial^{- m}}{\partial p^{- m}} (p^{- 1 / 2} e^{- 2 \sqrt{p q}}), & m = 0, - 1, - 2, \dots \\ {(- 1)}^{m} \frac{\sqrt{π}}{\sqrt{p}} \frac{\partial^{m}}{\partial q^{m}} (e^{- 2 \sqrt{p q}}), & m = 0, 1, 2, \dots \end{matrix}, p, q > 0,

(30)

for

p = x^{2} / 2 > 0

,

q = s > 0

and

m = 0, 1, 2

, respectively. The corresponding integral with

h_{2; s}^{* *} (y)

was estimated in Christoph et al. [17] (see Proof of Theorem 5) by

c (s) e^{- \sqrt{π s n} / 2} \leq C (s) n^{- 2}

.
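Formula (30) can be checked numerically for the values $m = 0, 1, 2$ used here. The Python sketch below compares a trapezoid approximation of the integral (after the substitution $y = e^{t}$) with the closed forms obtained from the second line of (30) by differentiating $e^{- 2 \sqrt{p q}}$ with respect to q. The quadrature parameters and function names are assumptions made for this illustration.

```python
import math

def integral_30(m, p, q, t_min=-30.0, t_max=30.0, steps=60000):
    """Trapezoid rule for int_0^inf y^(-m-1/2) exp(-p*y - q/y) dy.

    With y = exp(t) one has dy = y dt, so the integrand in t is
    exp((1/2 - m) * t - p * exp(t) - q * exp(-t)).
    """
    h = (t_max - t_min) / steps
    total = 0.0
    for i in range(steps + 1):
        t = t_min + i * h
        value = math.exp((0.5 - m) * t - p * math.exp(t) - q * math.exp(-t))
        total += 0.5 * value if i in (0, steps) else value
    return total * h

def closed_form_30(m, p, q):
    """Right-hand side of (30) for m = 0, 1, 2, written out explicitly."""
    decay = math.exp(-2 * math.sqrt(p * q))
    if m == 0:
        return math.sqrt(math.pi / p) * decay
    if m == 1:
        return math.sqrt(math.pi / q) * decay
    if m == 2:
        return math.sqrt(math.pi) * (math.sqrt(p) / q + 0.5 * q ** -1.5) * decay
    raise ValueError("only m = 0, 1, 2 are written out in this sketch")
```

In the proof, the parameters are $p = x^{2} / 2$ and $q = s$, so for instance `closed_form_30(1, x * x / 2, s)` gives the integral that produces the Laplace-type factor $e^{- | x | \sqrt{2 s}}$.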

In case of

γ = 0

, we obtain

\int_{0}^{\infty} Φ (x) d h_{2} (y) = Φ (x) (h_{2} (\infty) - {lim}_{y \to 0} h_{2} (y)) = 0

. To calculate the integrals with

f_{1} (x)

and

f_{2} (x)

we use [35] (Formula 2.3.3.1) with

α = 3 / 2, 2

and

q = s

:

\int_{0}^{\infty} y^{- α - 1} e^{- q / y} d y \overset{1 / y = z}{=} \int_{0}^{\infty} z^{α - 1} e^{- q z} d z = Γ (α) q^{- α}, α > 0, q > 0 .

(31)
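As a numerical illustration of (31), the following Python sketch compares the closed form $Γ (α) q^{- α}$ with a trapezoid approximation and then recovers the factors $\sqrt{π} / (2 \sqrt{s})$ and $1 / s$ appearing in Theorem 1(ii). It assumes that the limit law $W_{s}$ has density $s y^{- 2} e^{- s / y}$ on $(0, \infty)$, which is consistent with the choice $q = s$ above; the function names are hypothetical.

```python
import math

def formula_31(alpha, q):
    """Closed form of int_0^inf y^(-alpha-1) exp(-q/y) dy for alpha, q > 0."""
    return math.gamma(alpha) * q ** (-alpha)

def numeric_31(alpha, q, t_min=-10.0, t_max=60.0, steps=70000):
    """Trapezoid rule for the same integral after the substitution y = exp(t)."""
    h = (t_max - t_min) / steps
    total = 0.0
    for i in range(steps + 1):
        t = t_min + i * h
        # y^(-alpha-1) dy = exp(-alpha * t) dt
        value = math.exp(-alpha * t - q * math.exp(-t))
        total += 0.5 * value if i in (0, steps) else value
    return total * h

def negative_moment(power, s):
    """E[Y^(-power)] for the assumed density s * y^(-2) * exp(-s/y)."""
    return s * formula_31(power + 1, s)
```

Here `negative_moment(0.5, s)` corresponds to $α = 3 / 2$ and `negative_moment(1.0, s)` to $α = 2$.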

If

γ = - 1 / 2

, the integrals with

f_{1} (x / \sqrt{y})

,

f_{2} (x / \sqrt{y})

and

h_{2; s}^{*} (y)

are calculated using (31) with

α = 3 / 2, 5 / 2, 7 / 2, 9 / 2

and

q = s + x^{2} / 2

. From Christoph and Ulyanov [20] (see Proof of Theorem 8), it follows that

n^{- 1} {sup}_{x} |\int_{0}^{\infty} Φ (x / \sqrt{y}) d h_{2; s}^{* *} (y)| \leq C (s) n^{- 2}

and Theorem 1 is proved. □

Proof of Theorem 2.

Now, we consider the case

F (x) = G_{d} (x)

,

H (y) = G_{r, r} (y)

and

γ \in {0; \pm 1}

. This combination has not yet been studied in the literature, except for

γ = 1

, where there is a result by Monahkov [38]; see Remark 11 above. Then,

f_{1} (x y^{γ}) = 0

,

f_{2} (x y^{γ}) = (q_{1} x y^{γ} + q_{2} x^{2} y^{2 γ}) g_{d} (x y^{γ})

and we divide the function

h_{2} (y) = h_{2; r} (y)

given in (16) into two parts:

h_{2; r}^{*} (y) = {(2 r)}^{- 1} g_{r, r} (y) (y - 1) (2 - r)

and

h_{2; r}^{* *} (y) = r^{- 1} g_{r, r} (y) Q_{1} (g_{n} y)

.

For

γ = 1

, the density

v_{1} (x; d, r)

in (20) and the integrals in (29) with

f_{2} (x y)

and

h_{2; r}^{*} (y)

are computed with (31) for

α = r + d / 2, r + d / 2 - 1

. The integral with

h_{2; r}^{* *} (y)

is estimated in (A1) in Lemma A1. Together with the inequality

| 1 / g_{n} - 1 / (r n) | \leq max {2, r} (r - 1) {(r n)}^{- 2}

, we get (26).
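The inequality between $1 / g_{n}$ and $1 / (r n)$ follows from $g_{n} = r (n - 1) + 1$ and $r n / g_{n} \leq max {2, r}$ for $r > 1$ . A minimal Python sketch that checks it on a grid of parameters (the function names and the small rounding slack are assumptions of this illustration):

```python
def g_n(n, r):
    """Scaling factor g_n = E N_n(r) = r (n - 1) + 1."""
    return r * (n - 1) + 1

def gap(n, r):
    """Left-hand side |1 / g_n - 1 / (r n)|."""
    return abs(1 / g_n(n, r) - 1 / (r * n))

def bound(n, r):
    """Right-hand side max{2, r} (r - 1) (r n)^(-2)."""
    return max(2, r) * (r - 1) / (r * n) ** 2

def inequality_holds(n, r, slack=1e-9):
    """Check the inequality up to a relative rounding slack.

    Equality is attained for n = 1 and r >= 2, hence the slack.
    """
    return gap(n, r) <= bound(n, r) * (1 + slack)
```

For example, `inequality_holds(1, 2.5)` is a boundary case in which both sides equal 0.6.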

In case of

γ = 0

, we obtain

\int_{0}^{\infty} G_{d} (x) d h_{2} (y) = G_{d} (x) (h_{2, r} (\infty) - {lim}_{y \to 0} h_{2, r} (y)) = 0

. To calculate the integrals with

f_{2} (x)

, we use (31) with

α = r - 1

and

q = r

.
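The factor $r / (r - 1)$ in Theorem 2(ii) is the negative first moment of the gamma limit law. The Python sketch below assumes the density $g_{r, r} (y) = r^{r} y^{r - 1} e^{- r y} / Γ (r)$ and the usual chi-squared density $g_{d} (x)$ ; the function names are illustrative.

```python
import math

def chi2_density(x, d):
    """Chi-squared density g_d(x) with d degrees of freedom, x > 0."""
    return x ** (d / 2 - 1) * math.exp(-x / 2) / (2 ** (d / 2) * math.gamma(d / 2))

def inverse_moment_gamma_rr(r):
    """E[1/Y] for the density g_{r,r}: the gamma integral with alpha = r - 1, q = r."""
    if r <= 1:
        raise ValueError("the moment is finite only for r > 1")
    return r ** r / math.gamma(r) * math.gamma(r - 1) * r ** (-(r - 1))

def theorem2_ii_correction(x, d, r, n, q1, q2):
    """Second order term G_d(x; n) - G_d(x) of Theorem 2(ii)."""
    if r <= 1:
        return 0.0          # indicator I_{r > 1}(r)
    g_n = r * (n - 1) + 1
    return chi2_density(x, d) / g_n * (q1 * x + q2 * x * x) * inverse_moment_gamma_rr(r)
```

With $q_{1} = (d - 2) / 2$ and $q_{2} = - 1 / 2$ the function `theorem2_ii_correction` reproduces the term of Corollary 2(ii).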

If

γ = - 1

the density

v_{- 1} (x; d, r)

in (20) and the integrals with

f_{2} (x / y)

and

h_{2, r}^{*} (y)

are calculated using Prudnikov et al. [35] (Formula 2.3.16.1):

\int_{0}^{\infty} y^{α - 1} e^{- p y - q / y} d y = 2 {(q / p)}^{α / 2} K_{α} (2 \sqrt{p q}), p, q > 0,

with

α = r - d / 2, r - d / 2 - 1, r - d / 2 - 2

,

p = r

and

q = x / 2

. We use the order-reflection formula

K_{α} (u) = K_{- α} (u)

and the recursion formula; see Oldham et al. [34] (Chapter 51.5):

\begin{matrix} K_{r - d / 2 - 2} (\sqrt{2 r x}) = K_{d / 2 + 2 - r} (\sqrt{2 r x}) = \frac{2 (d / 2 - r + 1)}{\sqrt{2 r x}} K_{d / 2 - r + 1} (\sqrt{2 r x}) + K_{d / 2 - r} (\sqrt{2 r x}) . \end{matrix}
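The Macdonald integral and the recursion can be verified numerically. The Python sketch below computes $K_{ν} (u)$ from the standard representation $K_{ν} (u) = \int_{0}^{\infty} e^{- u cosh t} cosh (ν t) d t$ and compares $\int_{0}^{\infty} y^{α - 1} e^{- p y - q / y} d y$ with $2 {(q / p)}^{α / 2} K_{α} (2 \sqrt{p q})$ , where q is the coefficient of $1 / y$ . The quadrature settings and function names are assumptions of this illustration.

```python
import math

def macdonald_k(nu, u, t_max=20.0, steps=20000):
    """K_nu(u) via trapezoid rule for int_0^inf exp(-u cosh t) cosh(nu t) dt."""
    h = t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        value = math.exp(-u * math.cosh(t)) * math.cosh(nu * t)
        total += 0.5 * value if i in (0, steps) else value
    return total * h

def exp_integral(alpha, p, q, t_min=-30.0, t_max=30.0, steps=60000):
    """Trapezoid rule for int_0^inf y^(alpha-1) exp(-p*y - q/y) dy with y = exp(t)."""
    h = (t_max - t_min) / steps
    total = 0.0
    for i in range(steps + 1):
        t = t_min + i * h
        value = math.exp(alpha * t - p * math.exp(t) - q * math.exp(-t))
        total += 0.5 * value if i in (0, steps) else value
    return total * h

def macdonald_form(alpha, p, q):
    """2 (q/p)^(alpha/2) K_alpha(2 sqrt(p q))."""
    return 2 * (q / p) ** (alpha / 2) * macdonald_k(alpha, 2 * math.sqrt(p * q))
```

Because $cosh (ν t)$ is even in ν, the reflection $K_{α} (u) = K_{- α} (u)$ holds automatically in `macdonald_k`.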

The integral with

h_{2; r}^{* *} (y)

is estimated in (A4) in Lemma A2 and Theorem 2 is proved. □

Proof of Proposition 4.

We consider

γ = - 1

,

r = 3 / 2

,

d = 4

and

g_{n} = (3 n - 1) / 2

. The integrals in (29) with

f_{2} (x / y)

and

h_{2, r}^{*} (y)

are calculated using (30) with

m = - 1, - 2, - 3

,

p = r

and

q = x / 2

. The integral with

h_{2, r}^{* *}

is estimated in (A5) in Lemma A3 and Proposition 4 is proved. □

7. Conclusions

The common goal of the present work and that of Christoph and Ulyanov [18] is to develop formal second order Chebyshev–Edgeworth expansions for sample statistics with random sample sizes. Corresponding expansions are assumed for the statistics with non-random sample sizes as well as for the random sample sizes. The statistics examined are asymptotically normally distributed and, for the first time in this setting, also asymptotically chi-squared distributed. The random sample sizes have negative binomial or Pareto-like distributions. The formal construction of the approximating functions allows the results to be used for a whole family of asymptotically normal or chi-squared distributed statistics. The Student t-distribution with m degrees of freedom, the one-sample Student t-test statistic, and the sample mean are considered as examples of asymptotic normal statistics. Hotelling’s generalized

T_{0}^{2}

statistic and scale mixture of a normalized quotient of two independent chi-squared random variables were studied as examples of the asymptotic chi-squared distributions. In addition, random, non-random, and mixed scaling factors for the statistics are considered, which have a significant influence on the limit distributions. The limit laws are scale mixtures of the normal with mixing gamma or chi-squared with mixing inverse exponential distributions. In addition to the normal distribution and the chi-square distribution, there are a variety of limit distributions: the Laplace, the scaled Student t-, the scaled Fisher, the generalized gamma, and linear combinations of generalized gamma distributions.

The remaining terms in the approximations of the scaled statistics are estimated by inequalities.

Author Contributions

Conceptualization, G.C. and V.V.U.; methodology, V.V.U. and G.C.; formal analysis, G.C. and V.V.U.; investigation, G.C. and V.V.U.; writing—original draft, G.C. and V.V.U.; writing—review and editing, V.V.U. and G.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. It was carried out within the project, “Analysis of the quality of approximations in the statistical analysis of multivariate observations” of the Magdeburg University, the program of the Moscow Center for Fundamental and Applied Mathematics, Lomonosov Moscow State University, and HSE University Basic Research Programs.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors thank the Editor for his support and the Reviewers for their appropriate comments which have improved the quality of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Auxiliary Lemmas

Lemma A1.

Let

r > 1

then

| J_{1} (x) | = |\int_{0}^{\infty} G_{d} (x y) d h_{2; r}^{* *} (y)| \leq \frac{c (r, d)}{g_{n}^{r - 1}} \quad \text{with} \quad h_{2; r}^{* *} (y) = r^{- 1} g_{r, r} (y) Q_{1} (g_{n} y) .

(A1)

Proof of Lemma A1.

We use the Fourier series expansion of the jump correcting function

Q_{1} (y)

at all non-integer points y; see Prudnikov et al. [35] (Formula 5.4.2.9 for

a = 0

):

Q_{1} (y) = \frac{1}{2} - (y - [y]) = \sum_{k = 1}^{\infty} \frac{sin (2 π k y)}{k π}, y \neq [y],

(A2)

and Prudnikov et al. [35] (Formula 2.5.31.4):

\int_{0}^{\infty} y^{α - 1} e^{- p y} sin (b y) d y = \frac{Γ (α)}{{(b^{2} + p^{2})}^{α / 2}} sin (α arctan (b / p)) with α > - 1, b, p > 0 .

(A3)
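Both auxiliary formulas can be illustrated in Python. The sketch below evaluates partial sums of the Fourier series (A2) at non-integer points and checks the right-hand side of (A3) against the elementary cases $α = 1$ and $α = 2$ , for which the integral equals $b / (p^{2} + b^{2})$ and $2 p b / {(p^{2} + b^{2})}^{2}$ , respectively. The function names and the number of terms are assumptions of this illustration.

```python
import math

def q1_exact(y):
    """Jump correcting function Q_1(y) = 1/2 - (y - [y]) at non-integer y."""
    return 0.5 - (y - math.floor(y))

def q1_fourier(y, terms=20000):
    """Partial sum of the Fourier series in (A2)."""
    return sum(math.sin(2 * math.pi * k * y) / (k * math.pi)
               for k in range(1, terms + 1))

def a3_right_hand_side(alpha, p, b):
    """Gamma(alpha) (b^2 + p^2)^(-alpha/2) sin(alpha arctan(b / p))."""
    return (math.gamma(alpha) / (b * b + p * p) ** (alpha / 2)
            * math.sin(alpha * math.atan(b / p)))
```

The series converges only conditionally, so the partial sums approach $Q_{1} (y)$ slowly, at a rate of order $1 / terms$ away from the integers.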

Integration by parts in the integral

J_{1} (x)

, using (A2), interchanging sum and integral and applying (A3) with

α = r + d / 2 - 1

,

p = (r + x / 2)

and

b = 2 π k g_{n}

leads to

\begin{matrix} J_{1} (x) & = & - \frac{r^{r - 1} x^{d / 2}}{Γ (r) 2^{d / 2} Γ (d / 2)} \int_{0}^{\infty} y^{r + d / 2 - 2} Q_{1} (g_{n} y) e^{- (r + x / 2) y} d y \\ = & - \frac{r^{r - 1} x^{d / 2}}{π Γ (r) 2^{d / 2} Γ (d / 2)} \sum_{k = 1}^{\infty} \frac{1}{k} \int_{0}^{\infty} y^{r + d / 2 - 2} e^{- (r + x / 2) y} sin (2 π k g_{n} y) d y \\ = & - \frac{r^{r - 1} Γ (r + d / 2 - 1)}{π Γ (r) 2^{d / 2} Γ (d / 2)} \sum_{k = 1}^{\infty} \frac{a_{k} (x; n)}{k} \end{matrix}

with

a_{k} (x; n) = \frac{x^{d / 2} sin ((r + d / 2 - 1) arctan (2 π k g_{n} / (r + x / 2)))}{{({(2 π k g_{n})}^{2} + {(r + x / 2)}^{2})}^{(r + d / 2 - 1) / 2}} .

Now, we split the exponent

(r + d / 2 - 1) / 2 = (r - 1) / 2 + d / 4

and obtain

\begin{matrix} | a_{k} (x; n) | & \leq & \frac{x^{d / 2}}{{(2 π k g_{n})}^{r - 1} {(r + x / 2)}^{d / 2}} \leq \frac{2^{d / 2}}{{(2 π k g_{n})}^{r - 1}} . \end{matrix}

Since

r > 1

, we find, uniformly in

x \geq 0

| J_{1} (x) | \leq \frac{c_{1} (r, d)}{g_{n}^{r - 1}} \sum_{k = 1}^{\infty} k^{- r} = \frac{c (r, d)}{g_{n}^{r - 1}}

and Lemma A1 is proved. □

Lemma A2.

Let

r \geq 2

, then

| J_{- 1} (x) | = |\int_{0}^{\infty} G_{d} (x / y) d h_{2; r}^{* *} (y)| \leq \frac{c (r, d)}{g_{n}} \quad \text{with} \quad h_{2; r}^{* *} (y) = r^{- 1} g_{r, r} (y) Q_{1} (g_{n} y) .

(A4)

Proof of Lemma A2.

Integration by parts in the integral

J_{- 1} (x)

, using the Fourier series expansion (A2), interchanging sum and integral, we find

J_{- 1} (x) = \frac{r^{r - 1} x^{d / 2}}{Γ (r) 2^{d / 2} Γ (d / 2)} \int_{0}^{\infty} y^{r - d / 2 - 2} Q_{1} (g_{n} y) e^{- (r y + x / (2 y))} d y = \frac{r^{r - 1}}{π Γ (r) 2^{d / 2} Γ (d / 2)} \sum_{k = 1}^{\infty} \frac{J_{k, n} (x)}{k}

with

J_{k, n} (x) = \int_{0}^{\infty} x^{d / 2} y^{r - d / 2 - 2} e^{- (r y + x / (2 y))} sin (2 π k g_{n} y) d y .

In the literature, we have only found integrals

J_{k, n} (x)

with power functions

y^{- 1 / 2}

and

y^{- 3 / 2}

. Therefore, we integrate by parts in the integral

J_{k, n} (x)

:

J_{k, n} (x) = \frac{- 1}{2} \int_{0}^{\infty} ((d - 2 r + 4) f_{1} (x, y) + 2 r f_{2} (x, y) - f_{3} (x, y)) e^{- (r y + x / (2 y))} \frac{cos (2 π k g_{n} y)}{2 π k g_{n}} d y,

where

f_{1} (x, y) = x^{d / 2} y^{r - d / 2 - 3}

,

f_{2} (x, y) = x^{d / 2} y^{r - d / 2 - 2}

and

f_{3} (x, y) = x^{d / 2 + 1} y^{r - d / 2 - 4} .

Since

r \geq 2

and

d \geq 1

we obtain

y^{r - 2} e^{- r y / 2} \leq c_{r}

and

{(x / y)}^{(d - 1) / 2} e^{- x / (4 y)} \leq c_{d}

. Using (30) with

m = 0, 1, 2,

p = r / 2

, and

q = x / 4

we find

\int_{0}^{\infty} f_{1} (x, y) d y \leq c_{r} c_{d} x^{1 / 2} \int_{0}^{\infty} y^{- 3 / 2} e^{- (r y / 2 + x / (4 y))} d y = c_{r} c_{d} 2 \sqrt{π} e^{- \sqrt{r x / 2}} \leq C_{1} (r, d),

\int_{0}^{\infty} f_{2} (x, y) d y \leq c_{r} c_{d} x^{1 / 2} \int_{0}^{\infty} \frac{y^{- 1 / 2}}{e^{(r y / 2 + x / (4 y))}} d y = c_{r} c_{d} \sqrt{2 π x / r} e^{- \sqrt{2 r x} / 2} \leq C_{2} (r, d),

\int_{0}^{\infty} f_{3} (x, y) d y \leq c_{r} c_{d} x^{3 / 2} \int_{0}^{\infty} \frac{y^{- 5 / 2}}{e^{(r y / 2 + x / (4 y))}} d y = c_{r} c_{d} 2 \sqrt{π} (\sqrt{2 r x} + 2) e^{- \sqrt{r x / 2}} \leq C_{3} (r, d)

and

| J_{k, n} | \leq \frac{1}{4 π k g_{n}} (| d - 2 r + 4 | C_{1} (r, d) + 2 r C_{2} (r, d) + C_{3} (r, d)) \leq \frac{C^{*} (r, d)}{k g_{n}} .

Hence,

| J_{- 1} (x) | \leq \frac{r^{r - 1}}{π Γ (r) 2^{d / 2} Γ (d / 2)} \frac{π^{2}}{6 g_{n}} C^{*} (r, d) \leq \frac{c (r, d)}{g_{n}} .

Lemma A2 is proved. □

Lemma A3.

Let

γ = - 1

,

r = 3 / 2

,

d = 4

and

g_{n} = (3 n - 1) / 2

, then

| J_{- 1}^{*} (x) | = |\int_{0}^{\infty} G_{d} (x / y) d h_{2; 3 / 2}^{* *} (y)| \leq \frac{c (3 / 2, 4)}{\sqrt{g_{n}}} \quad \text{with} \quad h_{2; 3 / 2}^{* *} (y) = (2 / 3) g_{3 / 2, 3 / 2} (y) Q_{1} (g_{n} y) .

(A5)

Proof of Lemma A3.

Integration by parts in the integral

J_{- 1}^{*} (x)

, using the Fourier series expansion (A2), interchanging sum and integral, we find

J_{- 1}^{*} (x) = \frac{\sqrt{3 / 2} x^{2}}{4 Γ (3 / 2)} \int_{0}^{\infty} y^{- 5 / 2} Q_{1} (g_{n} y) e^{- (3 y / 2 + x / (2 y))} d y = \frac{\sqrt{3 / 2}}{\sqrt{π}} \sum_{k = 1}^{\infty} \frac{J_{k, n}^{*} (x)}{k}

with

J_{k, n}^{*} (x) = x^{2} \int_{0}^{\infty} y^{- 5 / 2} e^{- (3 y / 2 + x / (2 y))} sin (2 π k g_{n} y) d y .

Using Prudnikov et al. [35] (Formula 2.5.37.3), with the real constants

p > 0

,

q > 0

and

b > 0

, we obtain

\int_{0}^{\infty} y^{- 3 / 2} e^{- p y - q / y} sin (b y) d y = \frac{\sqrt{π}}{\sqrt{q}} e^{- 2 \sqrt{q} z_{+}} sin (2 \sqrt{q} z_{-}) \quad \text{and} \quad 2 z_{\pm}^{2} = \sqrt{p^{2} + b^{2}} \pm p .

(A6)

It was shown in Christoph et al. [17] (Proof of Theorem 5) that Leibniz’s integral rule allows differentiation with respect to q under the integral sign in (A6). Therefore,

\begin{matrix} \int_{0}^{\infty} y^{- 5 / 2} e^{- p y - q / y} sin (b y) d y & = & (\sqrt{π} / 2) e^{- 2 \sqrt{q} z_{+}} (q^{- 3 / 2} sin (2 \sqrt{q} z_{-}) \\ & & + 2 q^{- 1} z_{+} sin (2 \sqrt{q} z_{-}) - 2 q^{- 1} z_{-} cos (2 \sqrt{q} z_{-})) . \end{matrix}
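Formula (A6) and its derivative with respect to q can be checked numerically. The following Python sketch uses the substitution $y = e^{t}$ and a trapezoid rule; the parameter values, quadrature settings and function names are assumptions chosen for this illustration.

```python
import math

def z_plus_minus(p, b):
    """z_+ and z_- defined by 2 z_{+-}^2 = sqrt(p^2 + b^2) +- p."""
    root = math.hypot(p, b)
    return math.sqrt((root + p) / 2), math.sqrt((root - p) / 2)

def a6_right_hand_side(p, q, b):
    """Closed form of int_0^inf y^(-3/2) exp(-p*y - q/y) sin(b*y) dy."""
    zp, zm = z_plus_minus(p, b)
    return (math.sqrt(math.pi / q) * math.exp(-2 * math.sqrt(q) * zp)
            * math.sin(2 * math.sqrt(q) * zm))

def a6_derivative_right_hand_side(p, q, b):
    """Closed form of int_0^inf y^(-5/2) exp(-p*y - q/y) sin(b*y) dy."""
    zp, zm = z_plus_minus(p, b)
    arg = 2 * math.sqrt(q) * zm
    return (math.sqrt(math.pi) / 2 * math.exp(-2 * math.sqrt(q) * zp)
            * (q ** -1.5 * math.sin(arg)
               + 2 / q * zp * math.sin(arg)
               - 2 / q * zm * math.cos(arg)))

def oscillatory_integral(power, p, q, b, t_min=-12.0, t_max=6.0, steps=18000):
    """Trapezoid rule for int_0^inf y^power exp(-p*y - q/y) sin(b*y) dy, y = exp(t)."""
    h = (t_max - t_min) / steps
    total = 0.0
    for i in range(steps + 1):
        t = t_min + i * h
        y = math.exp(t)
        value = math.exp((power + 1) * t - p * y - q / y) * math.sin(b * y)
        total += 0.5 * value if i in (0, steps) else value
    return total * h
```

For $p = 3 / 2$ , $q = 1 / 2$ and $b = 2$ one has $z_{+} = \sqrt{2}$ and $z_{-} = 1 / \sqrt{2}$ , which makes the closed forms easy to evaluate by hand.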

Since

0 < z_{-} \leq z_{+}

,

p = 3 / 2

,

q = x / 2

,

b = 2 π k g_{n}

,

k \geq 1

and

g_{n} \geq 1

we find

z_{+} \geq \sqrt{π k g_{n}}

,

| J_{k, n}^{*} (x) | \leq x^{2} \frac{\sqrt{π}}{2} e^{- \sqrt{2 x} z_{+}} (\frac{2 \sqrt{2}}{x^{3 / 2}} + \frac{8}{x} z_{+}) = \frac{\sqrt{π}}{z_{+}} e^{- \sqrt{2 x} z_{+}} (\sqrt{2 x} z_{+} + 4 x z_{+}^{2}) \leq \frac{e^{- 1} + 8 e^{- 2}}{\sqrt{k g_{n}}}

and

| J_{- 1}^{*} (x) | \leq \frac{\sqrt{3 / 2}}{\sqrt{π}} \sum_{k = 1}^{\infty} \frac{e^{- 1} + 8 e^{- 2}}{k^{3 / 2} \sqrt{g_{n}}} .

Lemma A3 is proved. □
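The constant $e^{- 1} + 8 e^{- 2}$ in the proof of Lemma A3 comes from the elementary bounds $t e^{- t} \leq e^{- 1}$ and $2 t^{2} e^{- t} \leq 8 e^{- 2}$ for $t = \sqrt{2 x} z_{+} \geq 0$ . A minimal Python sketch that confirms both maxima on a grid (grid size and function names are assumptions of this illustration):

```python
import math

def grid_max(f, t_max=50.0, steps=200000):
    """Maximum of f over the equidistant grid 0, h, 2h, ..., t_max."""
    return max(f(i * t_max / steps) for i in range(steps + 1))

def lemma_a3_envelope(x, z_plus):
    """(sqrt(2x) z_+ + 4 x z_+^2) exp(-sqrt(2x) z_+), the factor bounded in the proof."""
    t = math.sqrt(2 * x) * z_plus
    return (t + 2 * t * t) * math.exp(-t)

# Constant used in the proof: sup of t e^(-t) plus sup of 2 t^2 e^(-t)
BOUND = math.exp(-1) + 8 * math.exp(-2)
```

The two suprema are attained at $t = 1$ and $t = 2$ ; since they are attained at different points, the constant is not sharp for the sum, but it suffices for the estimate.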

References

Wallace, D.L. Asymptotic approximations to distributions. Ann. Math. Statist. 1958, 29, 635–654.
Bickel, P.J. Edgeworth expansions in nonparametric statistics. Ann. Statist. 1974, 2, 1–20.
Hall, P. The Bootstrap and Edgeworth Expansion; Springer Series in Statistics; Springer: New York, NY, USA, 1992.
Bhattacharya, R.N.; Ranga Rao, R. Normal Approximation and Asymptotic Expansions; Wiley: New York, NY, USA, 1976.
Petrov, V.V. Limit Theorems of Probability Theory, Sequences of Independent Random Variables; Clarendon Press: Oxford, UK, 1995.
Pfanzagl, J. Asymptotic expansions related to minimum contrast estimators. Ann. Statist. 1973, 1, 993–1026.
Bentkus, V.; Götze, F.; van Zwet, W.R. An Edgeworth expansion for symmetric statistics. Ann. Statist. 1997, 25, 851–896.
Burnashev, M.V. Asymptotic expansions for median estimate of a parameter. Theory Probab. Appl. 1997, 41, 632–645.
Nunes, C.; Capistrano, G.; Ferreira, D.; Ferreira, S.S.; Mexia, J.T. Exact critical values for one-way fixed effects models with random sample sizes. J. Comput. Appl. Math. 2019, 354, 112–122.
Nunes, C.; Capistrano, G.; Ferreira, D.; Ferreira, S.S.; Mexia, J.T. Random sample sizes in orthogonal mixed models with stability. Comp. Math. Methods 2019, 1, e1050.
Nunes, C.; Mário, A.; Ferreira, D.; Moreira, E.M.; Ferreira, S.S.; Mexia, J.T. An algorithm for simulation in mixed models with crossed factors considering the sample sizes as random. J. Comput. Appl. Math. 2022, 404, 113463.
Esquível, M.L.; Mota, P.P.; Mexia, J.T. On some statistical models with a random number of observations. J. Stat. Theory Pract. 2016, 10, 805–823.
Döbler, C. New Berry-Esseen and Wasserstein bounds in the CLT for non-randomly centered random sums by probabilistic methods. ALEA Lat. Am. J. Probab. Math. Stat. 2015, 12, 863–902.
Schluter, C.; Trede, M. Weak convergence to the Student and Laplace distributions. J. Appl. Probab. 2016, 53, 121–129.
Bening, V.E.; Galieva, N.K.; Korolev, V.Y. On rate of convergence in distribution of asymptotically normal statistics based on samples of random size. Ann. Math. Inform. 2012, 39, 17–28.
Bening, V.E.; Galieva, N.K.; Korolev, V.Y. Asymptotic expansions for the distribution functions of statistics constructed from samples with random sizes. Inform. Appl. 2013, 7, 75–83. (In Russian)
Christoph, G.; Monakhov, M.M.; Ulyanov, V.V. Second-order Chebyshev-Edgeworth and Cornish-Fisher expansions for distributions of statistics constructed with respect to samples of random size. J. Math. Sci. 2020, 244, 811–839, Translated from Zap. Nauchnykh Semin. POMI 2017, 466, 167–207.
Christoph, G.; Ulyanov, V.V. Chebyshev–Edgeworth-type approximations for statistics based on samples with random sizes. Mathematics 2021, 9, 775.
Fujikoshi, Y.; Ulyanov, V.V. Non-Asymptotic Analysis of Approximations for Multivariate Statistics; Springer: Singapore, 2020.
Christoph, G.; Ulyanov, V.V. Second order expansions for high-dimension low-sample-size data statistics in random setting. Mathematics 2020, 8, 1151, Reprinted in Special Issue: Stability Problems for Stochastic Models: Theory and Applications; Zeifman, A.; Korolev, V.; Sipin, A., Eds.; MDPI: Basel, Switzerland, 2021; pp. 259–286.
Buddana, A.; Kozubowski, T.J. Discrete Pareto distributions. Econ. Qual. Control. 2014, 29, 143–156.
Bening, V.E.; Korolev, V.Y. On the use of Student’s distribution in problems of probability theory and mathematical statistics. Theory Probab. Appl. 2005, 49, 377–391.
Gavrilenko, S.V.; Zubov, V.N.; Korolev, V.Y. The rate of convergence of the distributions of regular statistics constructed from samples with negatively binomially distributed random sizes to the Student distribution. J. Math. Sci. 2017, 220, 701–713.
Christoph, G.; Ulyanov, V.V.; Bening, V.E. Second order expansions for sample median with random sample size. ALEA Lat. Am. J. Probab. Math. Stat. 2022, 19, 339–365.
Korolev, V. Bounds for the rate of convergence in the generalized Rényi theorem. Mathematics 2022, 10, 4252.
Bulinski, A.; Slepov, N. Sharp estimates for proximity of geometric and related sums distributions to limit laws. Mathematics 2022, 10, 4747.
Korolev, V.; Shevtsova, I. An improvement of the Berry-Esseen inequality with applications to Poisson and mixed Poisson random sums. Scand. Actuar. J. 2012, 2012, 81–105.
Sunklodas, J.K. On the normal approximation of a binomial random sum. Lith. Math. J. 2014, 54, 356–365.
Petrov, V.V. Sums of Independent Random Variables; Akademie-Verlag: Berlin, Germany, 1975.
Kolassa, J.E.; McCullagh, P. Edgeworth series for lattice distributions. Ann. Statist. 1990, 18, 981–985.
Choy, T.B.; Chan, J.E. Scale mixtures distributions in statistical modelling. Aust. N. Z. J. Stat. 2008, 50, 135–146.
Fujikoshi, Y.; Ulyanov, V.V.; Shimizu, R. Multivariate Statistics. High-Dimensional and Large-Sample Approximations; Wiley Series in Probability and Statistics; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2010.
Christoph, G.; Ulyanov, V.V. Random dimension low sample size asymptotics. In Recent Developments in Stochastic Methods and Applications; Shiryaev, A.N., Samouylov, K.E., Kozyrev, D.V., Eds.; Springer Proceedings in Mathematics & Statistics; Springer International Publishing: Cham, Switzerland, 2021; Volume 371, pp. 215–228. ISBN 978-3-030-83266-7 and 978-3-030-83266-0.
Oldham, K.B.; Myland, J.C.; Spanier, J. An Atlas of Functions, 2nd ed.; Springer Science + Business Media: New York, NY, USA, 2009.
Prudnikov, A.P.; Brychkov, Y.A.; Marichev, O.I. Integrals and Series, Vol. 1: Elementary Functions, 3rd ed.; Gordon & Breach Science Publishers: New York, NY, USA, 1992.
Korolev, V.Y.; Zeifman, A.I. Generalized negative binomial distributions as mixed geometric laws and related limit theorems. Lith. Math. J. 2019, 59, 366–388.
Korolev, V.Y.; Gorshenin, A. Probability models and statistical tests for extreme precipitation based on generalized negative binomial distributions. Mathematics 2020, 8, 604.
Monakhov, M.M. Chebyshev–Edgeworth expansions for distributions of generalized Hotelling-type statistics based on random size samples. Inform. Primen. [Informatics Its Appl.] 2021, 15, 72–81. (In Russian) [Google Scholar] [CrossRef]
Fujikoshi, Y.; Ulyanov, V.V.; Shimizu, R. L₁-norm error bounds for asymptotic expansions of multivariate scale mixtures and their applications to Hotelling’s generalized T02. J. Multivar. Anal. 2005, 96, 1–19. [Google Scholar] [CrossRef] [Green Version]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Christoph, G.; Ulyanov, V.V. Second Order Chebyshev–Edgeworth-Type Approximations for Statistics Based on Random Size Samples. Mathematics 2023, 11, 1848. https://doi.org/10.3390/math11081848
