Article

Cumulant-Based Goodness-of-Fit Tests for the Tweedie, Bar-Lev and Enis Class of Distributions

1
Faculty of Industrial Engineering and Technology Management, Holon Institute of Technology, Holon 6810201, Israel
2
Department of Mathematics, University of Ioannina, 45110 Ioannina, Greece
3
Department of Mathematical Sciences and Research Methods Centre, Durham University, Durham DH13LE, UK
4
School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2023, 11(7), 1603; https://doi.org/10.3390/math11071603
Submission received: 2 March 2023 / Revised: 20 March 2023 / Accepted: 22 March 2023 / Published: 26 March 2023
(This article belongs to the Special Issue Advances in Applied Probability and Statistical Inference)

Abstract

The class of natural exponential families (NEFs) of distributions having power variance functions (NEF-PVFs) is huge (uncountable), with numerous applications in various fields. Based on a characterization property that holds for the cumulants of the members of this class, we developed a novel goodness-of-fit (gof) test for testing whether a given random sample fits fixed members of this class. We derived the asymptotic null distribution of the test statistic and developed an appropriate bootstrap scheme. As the content of the paper is mainly theoretical, we exemplify its applicability to only a few elements of the NEF-PVF class, specifically, the gamma and modified Bessel-type NEFs. A Monte Carlo study was conducted to examine the performance of both the asymptotic test and its bootstrap counterpart in controlling the type I error rate and to evaluate their power in the special case of the gamma NEF, while real data examples demonstrate the applicability of the gof test to the modified Bessel distribution.

1. Introduction

Let $\nu$ be a positive Radon measure on $\mathbb{R}$, let $L(\theta) = \int_{\mathbb{R}} e^{\theta x}\, d\nu(x)$ be its Laplace transform and let $D = \{\theta \in \mathbb{R} : L(\theta) < \infty\}$ be its effective domain. Let us assume that $\Theta \equiv \operatorname{int} D \neq \emptyset$; then, the natural exponential family (NEF) generated by $\nu$ is the set $\mathcal{F} = \{F_\theta : \theta \in \Theta \subset \mathbb{R}\}$ of probability distributions defined by
$$dF_\theta(x) = \exp\{\theta x + c(\theta)\}\, d\nu(x), \quad \theta \in \Theta \subset \mathbb{R}, \tag{1}$$
where $c(\theta)$ is real, analytic and strictly concave on $\Theta$, and $k_j(\theta) = -\,d^j c(\theta)/d\theta^j$, $j \in \mathbb{N}$, is the $j$-th cumulant of $F_\theta$. In particular, the mean and variance of $F_\theta$ are given by $\mu = \mu(\theta) = k_1(\theta)$ and $\sigma^2 = \sigma^2(\theta) = k_2(\theta)$, respectively. As $c(\theta)$ is strictly concave on $\Theta$, the mean domain $\Omega = \mu(\operatorname{int} \Theta)$ of $\mathcal{F}$ is an open interval, the map $\theta \mapsto \mu(\theta)$ is one to one, and its inverse function $\psi : \Omega \to \Theta$ is well defined. Let us denote $V(\mu) = \sigma^2(\psi(\mu))$; then, the pair $(V, \Omega)$ is called the variance function (VF) of $\mathcal{F}$. The VF uniquely determines $\mathcal{F}$ within the class of NEFs (see an appropriate survey in [1]).
An NEF $\mathcal{F}$ is said to have a power variance function (hereafter, NEF-PVF) if $V$ has the form $V(\mu) = a\mu^{\gamma}$, $\mu \in \Omega$, for some constant $a > 0$ and power parameter $\gamma \in \mathbb{R}$. The class of NEF-PVFs was introduced independently and in different contexts by [2,3,4]; comprehensive details can be found in [5]. Although all of the NEF-PVFs are often referred to as the Tweedie class, for the reasons explained in [5] (and already adopted in [6,7]), we shall henceforth call them the Tweedie, Bar-Lev and Enis models and use the notation $X \sim TBE_{\gamma}(\mu, a)$ to indicate that random variable $X$ has a distribution that belongs to the $TBE_{\gamma}(\mu, a)$ model with mean $\mu$, scale parameter $a$ and power parameter $\gamma$. Finally, the argument $(\mu, a)$ will be omitted when unnecessary, and we will write $TBE_{\gamma}$.
$TBE_{\gamma}$ models constitute a huge (uncountable) class: to each power parameter $\gamma \in \mathbb{R} \setminus (0, 1)$ there corresponds a natural exponential family member, while for $\gamma \in (0, 1)$, no NEF exists. Depending on the value of the power parameter $\gamma$, NEF-PVFs include, as special cases (see for instance [8]), normal ($\gamma = 0$), Poisson-type ($\gamma = 1$) and gamma ($\gamma = 2$) NEFs; the family of compound Poisson distributions generated by gamma variates ($1 < \gamma < 2$); NEFs generated by positive stable distributions with stable index in $(0, 1)$ and supported on $\mathbb{R}^{+}$ ($\gamma > 2$); and NEFs generated by extreme stable distributions with stable index in $(1, 2)$ and supported on $\mathbb{R}$ ($\gamma < 0$). Among the NEFs with $\gamma > 2$, we find the inverse Gaussian ($\gamma = 3$), modified Bessel-type ($\gamma = 2.5$) and Whittaker-type ($\gamma = 4$) distributions.
Since $TBE_{\gamma}$ models have been utilized in various fields, such as actuarial studies, assay analysis, survival analysis, time spent splicing telephone cables, ecology and meteorology (see [9,10] and the references cited therein), it is important to test the hypothesis that a given sample stems from a $TBE_{\gamma}$ model with some $\gamma$.
In this paper, we propose a novel goodness-of-fit (gof) test for $TBE_{\gamma}$ distributions for any fixed power parameter $\gamma \in \mathbb{R} \setminus [0, 1)$. The test is based on a cumulant-based relationship existing among all members of $TBE_{\gamma}$ models, for $\gamma \in \mathbb{R} \setminus [0, 1)$. These cumulant-based relationships were obtained by [11,12] as a characterization of any member of the $TBE_{\gamma}$ class with $\gamma \in \mathbb{R} \setminus [0, 1)$. To the best of our knowledge, this is the first attempt to utilize these characterization properties for developing gof tests that hold for all members of $TBE_{\gamma}$ with $\gamma \in \mathbb{R} \setminus [0, 1)$. Admittedly, a large number of tests have been proposed for $TBE_{0}$ (normal), $TBE_{1}$ (Poisson), $TBE_{2}$ (gamma) and $TBE_{3}$ (inverse Gaussian). On the other hand, we do not know of any test for $TBE_{\gamma}$ with $\gamma < 0$ or $\gamma > 2$ (except for $\gamma = 3$). The latter $TBE_{\gamma}$ are generated either by extreme stable distributions ($\gamma < 0$), supported on $\mathbb{R}$, or by positive stable distributions ($\gamma > 2$), supported on $\mathbb{R}^{+}$. In both cases, the respective densities are unimodal, leptokurtic and absolutely continuous with respect to the Lebesgue measure. Therefore, they are suitable candidates for modeling continuous data. Unlike the normal, gamma and inverse Gaussian NEFs, however, these stable densities cannot be expressed in terms of elementary functions but only as series expansions. For instance, for any fixed $\gamma > 2$, the density of the corresponding $TBE_{\gamma}$ model is given by (see [3])
$$dF_\theta(x) = -\frac{1}{\pi} \sum_{k=0}^{\infty} \frac{(-1)^k}{k!} \sin(\pi \rho k)\, \frac{(1-\rho)^{k(1-\rho)}\, \Gamma(\rho k + 1)}{\rho^{k}\, a^{k(1-\rho)}\, x^{\rho k + 1}} \times \exp\left\{\theta x + \frac{1-\rho}{a\rho} \left(\frac{a\theta}{\rho - 1}\right)^{\rho}\right\} dx, \quad x > 0,\ \theta < 0,\ 0 < \rho \equiv \frac{2-\gamma}{1-\gamma} < 1. \tag{2}$$
For some rational values of $\rho \in (0, 1)$ (or $\gamma \in (2, \infty)$), the corresponding densities in (2) can be expressed in terms of transcendental functions, e.g., the modified Bessel distribution ($\rho = 1/3$ or $\gamma = 2.5$) and the Whittaker-type distribution ($\rho = 2/3$ or $\gamma = 4$). Series expansions similar to (2) are also available for stable densities of $TBE_{\gamma}$ with $\rho \in (1, 2)$ (or $\gamma \in (-\infty, 0)$), which are supported on $\mathbb{R}$. The complexity of the series expansion form of the $TBE_{\gamma}$ models generated by stable densities could be the reason why these have not been used for statistical modeling purposes. Fortunately, powerful software now makes it feasible to carry out the cumbersome calculations of various functionals related to these densities. Indeed, our proposed gof test for TBE models might be an important step towards employing them in the statistical modeling and analysis of continuous data sets.
The paper is organized as follows: Section 2 presents the cumulant-based relationships existing among $TBE_{\gamma}$ models, for $\gamma \in \mathbb{R} \setminus [0, 1)$, and also some basic tools needed for constructing the proposed gof test. It should be noted that for any fixed permissible power parameter $\gamma$, the cumulant relation that we use for constructing our proposed gof tests characterizes the corresponding $TBE_{\gamma}$ model. Section 3 introduces the proposed gof test. In particular, it presents the test statistic, its asymptotic null distribution and a bootstrap approximation. As the goal of the paper is mainly theoretical, i.e., introducing gof tests for all $TBE_{\gamma}$ models with $\gamma \in \mathbb{R} \setminus [0, 1)$, we exemplify its applicability to only two models of the TBE class. These two models are the gamma and modified Bessel NEFs. Specifically, Section 4.1 exemplifies its applicability to the gamma NEF with respect to various alternatives and existing tests. The performance of the gof test is investigated with a simulation study. In particular, its performance in terms of controlling the type I error rate is examined, while its power performance is also evaluated. In Section 4.2, we demonstrate its applicability to the modified Bessel NEF, and we investigate the nominal level attainment and compute the respective p-values for two real data sets. Similar applications can be carried out for all other $TBE_{\gamma}$ models with $\gamma > 2$ or $\gamma < 0$. Concluding remarks and some open problems are presented in Section 5. All proofs of statements (theorems and corollaries) in this paper are relegated to Appendix A.
In the sequel, we use the following notation: Let $\mathbf{X} = (X_1, \ldots, X_n)$ be a random sample of size $n$ taken from a population with distribution $F$, where $X_1, \ldots, X_n$ are i.i.d.; let $\bar{X}_n$ be the sample mean and $L_j = \sum_{i=1}^{n} X_i^{j}$, $j \in \mathbb{N}$. We also denote by $k_i$ the $i$-th cumulant, $i = 1, 2, \ldots$, associated with $F$. Let us recall that cumulant $k_i$ can be obtained by differentiating the cumulant-generating function $K(t) = \log E(e^{tX})$ $i$ times and evaluating the result at zero.

2. TBE Cumulant Relationships among $TBE_{\gamma}$ Models and Some Testing Tools

A common approach to constructing gof tests is to utilize a characterization of the members of the family of distributions being considered. In this frame, the members of the $TBE_{\gamma}$ models satisfy the following propositions, some of which characterize these models (see [11,12]). The proof of Proposition 1 can be found in [11] for $\gamma \geq 1$ and in Section 3 of [12] for $\gamma < 0$. The proof of Proposition 2 can be conducted by utilizing tools available in [11,12,13]. We omit the proof of Proposition 2, as it is long and not essential for the development of the results of this paper.
Proposition 1.
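If $X \sim TBE_{\gamma}$, with $\gamma \in \mathbb{R} \setminus [0, 1)$, then
$$S_r(\gamma) \equiv k_{r+2}\, k_r - \beta_r(\gamma)\, k_{r+1}^{2} = 0, \quad r \in \mathbb{N}, \tag{3}$$
where $k_j$ is the $j$-th cumulant of the corresponding $TBE_{\gamma}$ and
$$\beta_r(\gamma) = \frac{r\gamma - (r-1)}{(r-1)\gamma - (r-2)}.$$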
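As a quick illustration of relation (3) beyond $r = 1$, the following worked check is added here for concreteness. It relies on the standard fourth cumulants $k_4 = 6\alpha/\beta^4$ of the gamma distribution (shape $\alpha$, rate $\beta$) and $k_4 = 15\mu^7/\lambda^3$ of the inverse Gaussian distribution (mean $\mu$, shape $\lambda$), which are assumptions brought in from outside the text. Taking $r = 2$, for which $\beta_2(\gamma) = (2\gamma - 1)/\gamma$, gives
$$\text{gamma } (\gamma = 2):\ \frac{k_4 k_2}{k_3^{2}} = \frac{(6\alpha/\beta^4)(\alpha/\beta^2)}{(2\alpha/\beta^3)^2} = \frac{3}{2} = \beta_2(2), \qquad \text{inverse Gaussian } (\gamma = 3):\ \frac{k_4 k_2}{k_3^{2}} = \frac{(15\mu^7/\lambda^3)(\mu^3/\lambda)}{(3\mu^5/\lambda^2)^2} = \frac{5}{3} = \beta_2(3),$$
so that $S_2(\gamma) = 0$ in both cases, in agreement with Proposition 1.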
Proposition 2.
Let us assume that $X$ is an r.v. with a distribution in the NEF class of distributions. Then, for any $\gamma \in \mathbb{R} \setminus [0, 1)$, property (3) in Proposition 1 holds if and only if $X \sim TBE_{\gamma}$.
Remark 1.
Note that $\gamma = 0$, representing the normal distribution, is excluded from the statement of Proposition 1 as it does not satisfy relation (3). Other cumulant-based relations hold for the normal NEF, such as $k_p k_q = 0$, where $p, q \in \mathbb{N}$ and $p \geq 3$. Accordingly, our gof test does not hold for the normal NEF.
Proposition 1 implies that if relation (3) does not hold for $\gamma = \gamma_0$, then the sample is not taken from $TBE_{\gamma_0}$. For instance, for the gamma distribution with shape parameter $\alpha$ and rate parameter $\beta$, the first three cumulants are given by
$$k_1 = \alpha/\beta, \quad k_2 = \alpha/\beta^2, \quad k_3 = 2\alpha/\beta^3,$$
which results in
$$S_1(\gamma) = k_3 k_1 - \gamma k_2^{2} = (2 - \gamma)\, \frac{\alpha^2}{\beta^4}.$$
Thus, $S_1(\gamma)$ is equal to 0 if and only if $\gamma = 2$. Moreover, for the inverse Gaussian distribution with mean $\mu$ and shape parameter $\lambda$, one has
$$k_1 = \mu, \quad k_2 = \mu^3/\lambda, \quad k_3 = 3\mu^5/\lambda^2,$$
and thus
$$S_1(\gamma) = k_3 k_1 - \gamma k_2^{2} = (3 - \gamma)\, \frac{\mu^6}{\lambda^2}.$$
Thus, $S_1(\gamma) = 0$ if and only if $\gamma = 3$. Finally, for the Poisson distribution with parameter $\lambda$, since $k_1 = k_2 = k_3 = \lambda$, we have
$$S_1(\gamma) = k_3 k_1 - \gamma k_2^{2} = (1 - \gamma)\, \lambda^2.$$
Thus, $S_1(\gamma) = 0$ if and only if $\gamma = 1$.
The converse statement is not true if $F$ is not an NEF, i.e., there exist distributions $F$ not in the NEF class for which $S_1(\gamma) = 0$. For instance, let us consider $F$ to be a lognormal distribution $LN(\mu, \sigma)$ with p.d.f. of the form
$$f(x; \mu, \sigma) = \exp\left\{-(\ln(x) - \mu)^2 / (2\sigma^2)\right\} / \left(\sigma x \sqrt{2\pi}\right), \quad \sigma > 0,\ x > 0.$$
Then (see [14]),
$$k_1 = \exp(\mu)\exp(\sigma^2/2), \quad k_2 = \exp(2\mu)\exp(\sigma^2)\left(\exp(\sigma^2) - 1\right)$$
and
$$k_3 = \exp(3\mu)\exp(3\sigma^2/2)\left(\exp(\sigma^2) - 1\right)^2\left(\exp(\sigma^2) + 2\right),$$
in which case,
$$S_1(\gamma) = \exp(4\mu)\exp(2\sigma^2)\left(\exp(\sigma^2) - 1\right)^2\left(\exp(\sigma^2) + 2 - \gamma\right).$$
Therefore, $S_1(\gamma) = 0$ if $\gamma = e^{\sigma^2} + 2$. A special case of this situation is $F \sim LN(\mu, \sqrt{\ln 2})$, i.e., $\sigma^2 = \ln 2$. Then, $S_1(4) = 0$, while the lognormal family is not an NEF (note that $S_1(4) = 0$ holds for Whittaker-type NEFs). This example illustrates that if $F$ is not an NEF, then the relationship $S_1(\gamma) = 0$ does not characterize the distribution involved.
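The four examples above can be checked numerically with a few lines of code. The following is a minimal sketch in Python (the study itself used R); the function names `s1`, `gamma_cumulants`, `inverse_gaussian_cumulants`, `poisson_cumulants` and `lognormal_cumulants` are hypothetical helpers introduced only for this illustration, and the parameter values are arbitrary.

```python
import math

def s1(k1, k2, k3, gamma):
    """S_1(gamma) = k3*k1 - gamma*k2**2, built from the first three cumulants."""
    return k3 * k1 - gamma * k2 ** 2

def gamma_cumulants(alpha, beta):
    # Gamma distribution with shape alpha and rate beta.
    return alpha / beta, alpha / beta ** 2, 2 * alpha / beta ** 3

def inverse_gaussian_cumulants(mu, lam):
    # Inverse Gaussian distribution with mean mu and shape lam.
    return mu, mu ** 3 / lam, 3 * mu ** 5 / lam ** 2

def poisson_cumulants(lam):
    # All cumulants of the Poisson distribution equal lam.
    return lam, lam, lam

def lognormal_cumulants(mu, sigma):
    # Lognormal LN(mu, sigma); w = exp(sigma^2).
    w = math.exp(sigma ** 2)
    k1 = math.exp(mu) * math.sqrt(w)
    k2 = math.exp(2 * mu) * w * (w - 1)
    k3 = math.exp(3 * mu) * w ** 1.5 * (w - 1) ** 2 * (w + 2)
    return k1, k2, k3

# Each pair is (cumulants, value of gamma at which S_1 should vanish).
cases = {
    "gamma(2, 3)": (gamma_cumulants(2.0, 3.0), 2.0),
    "inverse Gaussian(1.5, 2)": (inverse_gaussian_cumulants(1.5, 2.0), 3.0),
    "Poisson(4)": (poisson_cumulants(4.0), 1.0),
    # Not an NEF, yet S_1(4) = 0 when sigma^2 = ln 2.
    "lognormal(0.3, sqrt(ln 2))": (lognormal_cumulants(0.3, math.sqrt(math.log(2))), 4.0),
}
for name, (cumulants, g) in cases.items():
    print(name, "S_1 at gamma =", g, "->", s1(*cumulants, g))
```

Up to floating-point rounding, every printed value is zero, including the lognormal case, which is why $S_1(\gamma) = 0$ alone cannot identify a TBE model outside the NEF class.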
As relation (3), $r \in \mathbb{N}$, characterizes an NEF within the class of NEFs, a test for a null hypothesis in which $F \in TBE_{\gamma}$ against a general alternative can be based on any estimator of $S_r(\gamma)$, $r \in \mathbb{N}$. Nonetheless, as an unbiased estimator of $S_r(\gamma)$, $r > 1$, has a cumbersome form, we shall restrict our study to the case $r = 1$. For ease of notation, we shall henceforth denote $S_1(\gamma)$ by $S(\gamma)$ and its unbiased estimator by $\hat{S}(\gamma)$.
Indeed, an unbiased estimator $\hat{S}(\gamma)$ of $S(\gamma) = k_3 k_1 - \gamma k_2^{2}$ has the following polynomial structure (see [11]):
$$\hat{S}(\gamma) = \hat{T}_{1,3} - \gamma\, \hat{T}_{2,2}, \quad \gamma \in \mathbb{R} \setminus [0, 1), \tag{9}$$
where
$$\hat{T}_{1,3} = \frac{1}{n_{(2)}} \sum_{j \neq k} X_j^{3} X_k - \frac{3}{n_{(3)}} \sum_{j \neq k \neq l} X_j^{2} X_k X_l + \frac{2}{n_{(4)}} \sum_{j \neq k \neq l \neq m} X_j X_k X_l X_m \tag{10}$$
and
$$\hat{T}_{2,2} = \frac{1}{n_{(2)}} \sum_{j \neq k} X_j^{2} X_k^{2} - \frac{2}{n_{(3)}} \sum_{j \neq k \neq l} X_j^{2} X_k X_l + \frac{1}{n_{(4)}} \sum_{j \neq k \neq l \neq m} X_j X_k X_l X_m, \tag{11}$$
with
$$n_{(k)} = n(n-1)\cdots(n-(k-1)), \quad k = 1, 2, \ldots$$
Note that the summations in (10) and (11) are taken over all distinct indices $j, k, l, m \in \{1, \ldots, n\}$. Here, $\sum_{j \neq k}$, $\sum_{j \neq k \neq l}$ and $\sum_{j \neq k \neq l \neq m}$ stand for double, triple and quadruple summations, respectively.
An alternative form of (9) in terms of $L_j$, $j \in \mathbb{N}$, is the following [11]:
$$\hat{S}(\gamma) = \frac{1}{n_{(4)}} \Big\{ (n^2 + n + 4) L_3 L_1 - (n^2 + n) L_4 - 3(n+1) L_2 L_1^{2} + 3(n-1) L_2^{2} + 2 L_1^{4} - \gamma \big[ (n^2 - 3n + 3) L_2^{2} - (n^2 - n) L_4 - 2n L_2 L_1^{2} + 4(n-1) L_3 L_1 + L_1^{4} \big] \Big\}.$$
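The agreement between the two forms of the estimator can be checked directly. The sketch below, in Python, computes $\hat{S}(\gamma)$ once from the power sums $L_1, \ldots, L_4$ (an $O(n)$ computation) and once by brute-force summation over all tuples of distinct indices as in (10) and (11). The function names `falling`, `s_hat_power_sums` and `s_hat_brute_force` and the small data vector are hypothetical and serve only as an illustration.

```python
from itertools import permutations

def falling(n, k):
    """n_(k) = n (n-1) ... (n-(k-1))."""
    out = 1
    for i in range(k):
        out *= n - i
    return out

def s_hat_power_sums(x, gamma):
    """S_hat(gamma) from the power sums L_j = sum_i x_i**j."""
    n = len(x)
    L1, L2, L3, L4 = (sum(v ** j for v in x) for j in (1, 2, 3, 4))
    t13 = ((n * n + n + 4) * L3 * L1 - (n * n + n) * L4
           - 3 * (n + 1) * L2 * L1 ** 2 + 3 * (n - 1) * L2 ** 2 + 2 * L1 ** 4)
    t22 = ((n * n - 3 * n + 3) * L2 ** 2 - (n * n - n) * L4
           - 2 * n * L2 * L1 ** 2 + 4 * (n - 1) * L3 * L1 + L1 ** 4)
    return (t13 - gamma * t22) / falling(n, 4)

def s_hat_brute_force(x, gamma):
    """S_hat(gamma) = T_13 - gamma * T_22 via sums over distinct indices."""
    n = len(x)
    idx = range(n)
    s31 = sum(x[j] ** 3 * x[k] for j, k in permutations(idx, 2))
    s22 = sum(x[j] ** 2 * x[k] ** 2 for j, k in permutations(idx, 2))
    s211 = sum(x[j] ** 2 * x[k] * x[l] for j, k, l in permutations(idx, 3))
    s1111 = sum(x[j] * x[k] * x[l] * x[m] for j, k, l, m in permutations(idx, 4))
    t13 = s31 / falling(n, 2) - 3 * s211 / falling(n, 3) + 2 * s1111 / falling(n, 4)
    t22 = s22 / falling(n, 2) - 2 * s211 / falling(n, 3) + s1111 / falling(n, 4)
    return t13 - gamma * t22

data = [0.5, 1.2, 2.0, 3.1, 0.7, 1.9]  # arbitrary illustrative values
for g in (1.0, 2.0, 2.5):
    print(g, s_hat_power_sums(data, g), s_hat_brute_force(data, g))
```

The brute-force version is only practical for very small $n$, since the quadruple sum has $n_{(4)}$ terms; the power-sum form is the one a real implementation would use.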
Based on the above, the following theorem, which follows from [9] and [4], provides an unbiased estimator of $S(\gamma)$ as well as a characterization for $TBE_{\gamma}$ models with $\gamma \in \mathbb{R} \setminus [0, 1)$.
Theorem 1.
Let us assume that a distribution $F$ possesses a finite third moment and let $(X_1, \ldots, X_n)$ be a random sample of size $n \geq 4$ taken from $F$. Then, for any fixed $\gamma \in \mathbb{R} \setminus [0, 1)$, the following two properties hold:
(i) The polynomial statistic $\hat{S}(\gamma)$ given in (9) is an unbiased estimator of $S(\gamma)$.
(ii) $\hat{S}(\gamma)$ has zero regression on $L_1$ iff $F$ is a $TBE_{\gamma}$ model.
Remark 2.
Part (ii) of Theorem 1 provides equivalent conditions under which a general family of distributions $F$ is a TBE model. If, however, one confines $F$ to be an NEF, then for any fixed $\gamma \in \mathbb{R} \setminus [0, 1)$, $F$ is a $TBE_{\gamma}$ iff $S(\gamma) = 0$ (this can be proved by using the tools in [11,12,13]).
In the next section, we utilize the properties of $\hat{S}(\gamma)$ for constructing a gof test for TBE models.

3. The Proposed Gof Test: Test Statistic, Asymptotic Null Distribution and Bootstrap Approximation

In this section, we propose and study a novel gof test for $TBE_{\gamma}$ distributions for any fixed $\gamma \in \mathbb{R} \setminus [0, 1)$.
In this frame, let $X_1, \ldots, X_n$ be a sample of size $n$, $n \geq 4$, from a distribution with c.d.f. $F$ with finite third moment and positive mean (first cumulant). We propose a general method for testing the null hypothesis that the sample stems from a $TBE_{\gamma}$, with fixed $\gamma = \gamma_0 \in \mathbb{R} \setminus [0, 1)$, versus the alternative that the sample is not taken from a $TBE_{\gamma_0}$, i.e.,
$$H_0 : F = TBE_{\gamma_0} \tag{12}$$
versus the alternative
$$H_1 : F \neq TBE_{\gamma_0}. \tag{13}$$
Clearly, various gof tests for $\gamma_0 = 1, 2, 3$ (i.e., Poisson, gamma and inverse Gaussian, respectively) are available in the literature, whereas none exist for any $TBE_{\gamma}$ with $\gamma \in \mathbb{R} \setminus ([0, 1) \cup \{1, 2, 3\})$.
As [15] pointed out, characterization theorems or properties can be natural and effective starting points for constructing gof tests and are essential for assessing the validity of distributional models. It seems that the first idea of constructing gof tests based on a characterization of a distribution in the realm of the null hypotheses is due to [16] (see [17]). However, the earliest explicit use of a characterization theorem for constructing a gof test was presented by [18], who used Shannon’s maximum entropy characterization to construct a test for a composite hypothesis of normality. There is now an extensive literature on gof tests based on various types of characterizations. We mention only a few relevant papers; see, for example, [15,19,20,21,22,23,24,25] and the references therein.

3.1. Test Statistic

Here, we utilize relation (3) and the characterization properties given in Theorem 1 to construct a gof test. The test deals with a composite $TBE_{\gamma_0}$ hypothesis. More specifically, if $X \sim TBE_{\gamma_0}$, then for testing (12) versus (13), we expect the values of the estimator $\hat{S}(\gamma_0)$ of $S(\gamma_0)$ to be close to 0. Accordingly, one should reject (12) for large absolute values of $\hat{S}(\gamma_0)$ or, equivalently, for large values of
$$S_n(\gamma_0) \equiv \hat{S}^{2}(\gamma_0). \tag{14}$$
The justification of such a criterion is given in the next subsection.

3.2. Asymptotic Behavior of the Test Statistic

Here, we investigate the asymptotic behavior of test statistic $S_n(\gamma_0)$ as $n \to \infty$.
Theorem 2.
Let $X$ be a random variable with finite third moment and positive first cumulant and let $X_1, \ldots, X_n$ be $n$ independent copies of $X$. Then,
$$S_n(\gamma_0) \xrightarrow{a.s.} S^{2}(\gamma_0), \tag{15}$$
where $\xrightarrow{a.s.}$ denotes almost sure convergence.
Note that $S_n(\gamma_0) \geq 0$, so under the null hypothesis, we have the following:
Corollary 1.
Let $X_1, \ldots, X_n$ be i.i.d. r.v.s taken from $TBE_{\gamma_0}$; then
$$S_n(\gamma_0) \xrightarrow{a.s.} 0. \tag{16}$$
Hence, the null hypothesis that $F$ is $TBE_{\gamma_0}$ should be rejected for large values of $S_n(\gamma_0)$. As the exact distribution of $S_n(\gamma_0)$ is rather intricate, we derive its asymptotic distribution. The next theorem determines the asymptotic null distribution of $n S_n(\gamma_0)$.
Theorem 3.
Suppose that $F$ has finite sixth moments. Then, under the null hypothesis (12),
$$n S_n(\gamma_0) \xrightarrow{d} M \chi_1^{2}, \tag{17}$$
where $\xrightarrow{d}$ denotes convergence in distribution, $\chi_1^{2}$ denotes a chi-squared distribution with one degree of freedom and
$$\begin{aligned} M ={}& \mu_1^{2}\mu_6 + 2(2\gamma_0 - 3)\mu_1^{3}\mu_5 - 4\gamma_0 \mu_1\mu_2\mu_5 + (2\gamma_0 - 5)^{2}\mu_1^{4}\mu_4 - 4(\gamma_0 - 1)(2\gamma_0 - 3)\mu_1^{2}\mu_2\mu_4 + 4\gamma_0^{2}\mu_2^{2}\mu_4 \\ &+ 2\mu_1\mu_3\mu_4 + (1 - 4\gamma_0)\mu_2\mu_3^{2} - 4(4\gamma_0^{2} - 10\gamma_0 + 3)\mu_1\mu_2^{2}\mu_3 + 32(\gamma_0 - 1)(\gamma_0 - 2)\mu_1^{3}\mu_2\mu_3 \\ &+ 2(2\gamma_0 - 5)\mu_1^{2}\mu_3^{2} + 8(2 - \gamma_0)(2\gamma_0 - 3)\mu_1^{5}\mu_3 + 12(\gamma_0 - 1)(2\gamma_0 - 3)\mu_1^{2}\mu_2^{3} \\ &+ (35 - 18\gamma_0)(2\gamma_0 - 3)\mu_1^{4}\mu_2^{2} - 4\gamma_0^{2}\mu_2^{4} + 16(2 - \gamma_0)^{2}\mu_1^{6}\mu_2, \end{aligned} \tag{18}$$
with $\mu_i = E(X^{i})$, $i = 1, \ldots, 6$.
The power of the test depends on the value of $S(\gamma_0)$, which in turn depends on the particular combination of the true $F$ and the value $\gamma_0$ of the null hypothesis. If the null hypothesis is true, i.e., if $F \in TBE_{\gamma_0}$, then $S(\gamma_0) = 0$. If it is not true and $F$ is still an NEF distribution, then $S(\gamma_0) \neq 0$, so that the limit $S^{2}(\gamma_0)$ in Theorem 2 is strictly positive, implying that the test is consistent (see Proposition 2). This particularly holds in the special case where $F \in TBE_{\gamma}$ for some $\gamma \neq \gamma_0$. However, if $F$ is not an NEF, then $S(\gamma_0)$ may still be zero for some specific combinations of truth and null hypothesis.
Some examples representing different scenarios are explicitly derived in Section 2 following Propositions 1 and 2. For $F$ corresponding to a Poisson, gamma or inverse Gaussian distribution (which are all in TBE), it is shown that the only $\gamma$ for which $S(\gamma) = 0$ is $\gamma = 1$, $\gamma = 2$ or $\gamma = 3$, respectively. In the case where $F$ is lognormal (which is not in TBE), it is shown that $S(e^{\sigma^2} + 2) = 0$. Hence, in the first three cases, the test is consistent, but it is not for the lognormal with parameters $\mu$ and $\sigma^2 = \ln 2$, as the latter would result in a low-power test when testing $H_0 : \gamma_0 = 4$ (the Whittaker-type NEF), since $S(4) = 0$.
Remark 3.
Theorem 3 presents the general result concerning the asymptotic null distribution of $n S_n(\gamma_0)$ for testing the goodness-of-fit when the random sample is from a $TBE_{\gamma_0}$. Note that the limiting distribution depends on $M$, given in (18), where $M$ depends on $\gamma_0$ and the first six moments of $TBE_{\gamma_0}$. The latter moments depend on $\nu$, the vector of unknown parameters of $TBE_{\gamma_0}$. Hence, we write $M = M(\nu, \gamma_0)$. Special cases of $M(\nu, \gamma_0)$ can easily be obtained for each of the specific values of $\gamma_0$. For example, for $\gamma_0 = 2$, $TBE_{\gamma_0}$ is the family of gamma distributions with shape parameter $a$ and rate parameter $b$, in which case $\nu = (a, b)$ and $\mu_i = \Gamma(i + a)\, b^{-i}\, \Gamma(a)^{-1}$; thus,
$$M(a, b, \gamma_0 = 2) = \frac{2a^{3}(a+1)(3a+10)}{b^{8}}. \tag{19}$$
The computation of $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_6)$ and thus of $M$ given in (18) can be simply conducted for any $TBE_{\gamma}$ model as follows: For simplicity, let us assume $\gamma > 2$ (i.e., the corresponding TBE is an NEF-PVF generated by a positive stable distribution with VF $V(\mu) = a\mu^{\gamma}$, $a > 0$, $\gamma > 2$). Thus, for any fixed $\gamma > 2$, such $TBE_{\gamma}$ depends on $\nu = (a, \mu)$. Let $X \sim TBE_{\gamma}$; then, the moment-generating function of $X$ is derived (as shown in [3] (Equation 2.4)) as
$$G(t) = E(e^{tX}) = \exp\left\{\frac{1}{a(2-\gamma)}\left[\left(a(1-\gamma)(\theta + t)\right)^{\frac{2-\gamma}{1-\gamma}} - \left(a(1-\gamma)\theta\right)^{\frac{2-\gamma}{1-\gamma}}\right]\right\},$$
where $a > 0$, $\theta = \mu^{1-\gamma} / (a(1-\gamma)) < 0$, $\gamma > 2$ and $\mu \equiv \mu_1$. The components $\mu_i$ of $\boldsymbol{\mu}$ can now be computed using $\mu_i = d^{i}G(t)/dt^{i}\,|_{t=0} \equiv g_i(\nu)$, $i = 1, \ldots, 6$, where $\nu = (a, \mu)$ and $g_i$ is some $\mathbb{R}^{2} \to \mathbb{R}$ mapping.
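For the gamma case, the general expression (18) and the closed form (19) can be compared numerically. The sketch below, in Python, evaluates (18) at the raw gamma moments $\mu_i = a(a+1)\cdots(a+i-1)/b^{i}$ and compares the result with $2a^{3}(a+1)(3a+10)/b^{8}$. The function names `gamma_raw_moments`, `m_general` and `m_gamma_closed_form` are hypothetical, and the parameter values are arbitrary.

```python
def gamma_raw_moments(a, b):
    """First six raw moments of Gamma(shape a, rate b): Gamma(i+a) / (b**i Gamma(a))."""
    moments, prod = [], 1.0
    for i in range(1, 7):
        prod *= (a + i - 1) / b
        moments.append(prod)
    return moments

def m_general(mu, g):
    """Scaling factor M of relation (18) for raw moments mu = (mu_1..mu_6) and gamma_0 = g."""
    m1, m2, m3, m4, m5, m6 = mu
    return (m1 ** 2 * m6
            + 2 * (2 * g - 3) * m1 ** 3 * m5
            - 4 * g * m1 * m2 * m5
            + (2 * g - 5) ** 2 * m1 ** 4 * m4
            - 4 * (g - 1) * (2 * g - 3) * m1 ** 2 * m2 * m4
            + 4 * g ** 2 * m2 ** 2 * m4
            + 2 * m1 * m3 * m4
            + (1 - 4 * g) * m2 * m3 ** 2
            - 4 * (4 * g ** 2 - 10 * g + 3) * m1 * m2 ** 2 * m3
            + 32 * (g - 1) * (g - 2) * m1 ** 3 * m2 * m3
            + 2 * (2 * g - 5) * m1 ** 2 * m3 ** 2
            + 8 * (2 - g) * (2 * g - 3) * m1 ** 5 * m3
            + 12 * (g - 1) * (2 * g - 3) * m1 ** 2 * m2 ** 3
            + (35 - 18 * g) * (2 * g - 3) * m1 ** 4 * m2 ** 2
            - 4 * g ** 2 * m2 ** 4
            + 16 * (2 - g) ** 2 * m1 ** 6 * m2)

def m_gamma_closed_form(a, b):
    """Closed form (19): M(a, b, gamma_0 = 2) = 2 a^3 (a+1) (3a+10) / b^8."""
    return 2 * a ** 3 * (a + 1) * (3 * a + 10) / b ** 8

for a, b in [(1.0, 1.0), (2.0, 1.0), (2.5, 3.0)]:
    print((a, b), m_general(gamma_raw_moments(a, b), 2.0), m_gamma_closed_form(a, b))
```

The two columns agree up to rounding, which also serves as a check on the signs in (18).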
The use of the asymptotic null distribution given in Theorem 3 for testing purposes requires a consistent estimator of $M$ (see Remark 3). Such an estimator can be obtained by estimating $\boldsymbol{\mu} = (g_1(\nu), \ldots, g_6(\nu))$ with the maximum likelihood estimators (MLEs) of $\nu$. We denote such MLEs by $\hat{\nu}$, $\hat{\boldsymbol{\mu}}$ and $\hat{M} = M(\hat{\nu}, \gamma_0)$. Alternatively, $M$ could be estimated using its consistent moment estimator. However, based on a small simulation, we found that this moment estimator is unstable and lacks efficiency, due to the use of six empirical moments. Therefore, in what follows, we only consider the MLE as an estimator of $M$.
Based on observed sample data $X_1, \ldots, X_n$, the previous asymptotic results can be used to obtain p-values for the proposed test, as outlined by the following procedure:
1. Use relations (14) and (9) to compute test statistic $S_n(\gamma_0)$. Denote its observed value by $S_n^{obs}(\gamma_0)$.
2. Under the null hypothesis of $TBE_{\gamma_0}$, compute the MLEs $\hat{\nu}$ and $\hat{M} = M(\hat{\nu}, \gamma_0)$ of $\nu$ and $M(\nu, \gamma_0)$, respectively.
3. Approximate the p-value of the test using the relation
$$\hat{p} = 1 - F_{\chi_1^{2}}\left(n \hat{M}^{-1} S_n^{obs}(\gamma_0)\right),$$
where $F_{\chi_1^{2}}$ denotes the cumulative distribution function of the chi-squared distribution with one degree of freedom.
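The three steps above can be sketched for the gamma null hypothesis ($\gamma_0 = 2$) as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in the study (which relied on R): the names `s_hat`, `digamma`, `gamma_mle` and `gamma_asymptotic_pvalue` are hypothetical, the digamma function is approximated by a recurrence plus an asymptotic series, the shape MLE is found by simple bisection on the likelihood equation $\log a - \psi(a) = \log \bar{x} - \overline{\log x}$, and the data are simulated.

```python
import math
import random

def s_hat(x, gamma0):
    """Unbiased estimator of S(gamma0) in terms of the power sums L_j."""
    n = len(x)
    L1, L2, L3, L4 = (sum(v ** j for v in x) for j in (1, 2, 3, 4))
    t13 = ((n * n + n + 4) * L3 * L1 - (n * n + n) * L4
           - 3 * (n + 1) * L2 * L1 ** 2 + 3 * (n - 1) * L2 ** 2 + 2 * L1 ** 4)
    t22 = ((n * n - 3 * n + 3) * L2 ** 2 - (n * n - n) * L4
           - 2 * n * L2 * L1 ** 2 + 4 * (n - 1) * L3 * L1 + L1 ** 4)
    return (t13 - gamma0 * t22) / (n * (n - 1) * (n - 2) * (n - 3))

def digamma(x):
    """Approximate digamma: shift x up to >= 6, then use the asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def gamma_mle(x):
    """MLE (shape a, rate b) of the gamma distribution via bisection on the shape."""
    n = len(x)
    mean = sum(x) / n
    s = math.log(mean) - sum(math.log(v) for v in x) / n
    lo, hi = 1e-6, 1e6
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        # log(a) - digamma(a) decreases in a; too large a value means a is too small.
        if math.log(mid) - digamma(mid) > s:
            lo = mid
        else:
            hi = mid
    a = math.sqrt(lo * hi)
    return a, a / mean

def gamma_asymptotic_pvalue(x):
    n = len(x)
    s_obs = s_hat(x, 2.0) ** 2                              # step 1: S_n = S_hat^2
    a, b = gamma_mle(x)                                     # step 2: MLE of (a, b)
    m_hat = 2 * a ** 3 * (a + 1) * (3 * a + 10) / b ** 8    # step 2: M(a, b, 2)
    # step 3: chi-squared(1) survival function, 1 - F(t) = erfc(sqrt(t / 2))
    return math.erfc(math.sqrt(n * s_obs / (2 * m_hat)))

rng = random.Random(2023)
# random.gammavariate takes (shape, scale), so rate 3 corresponds to scale 1/3.
sample = [rng.gammavariate(2.0, 1.0 / 3.0) for _ in range(200)]
print(gamma_mle(sample), gamma_asymptotic_pvalue(sample))
```

For other $TBE_{\gamma_0}$ models, only `gamma_mle` and the expression for `m_hat` would need to be replaced by the corresponding MLE and by (18) evaluated at the fitted moments.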

3.3. Bootstrap Approximation

One can also approximate the p-value and the critical points using a parametric bootstrap approach. More specifically, we apply this approach with the following procedure:
1. Follow steps 1 and 2 of the previous procedure.
2. For some large integer $B$, repeat the following steps for every $b \in \{1, \ldots, B\}$:
(a) Generate a bootstrap sample $X_1^{*b}, \ldots, X_n^{*b}$ from $X^{*} \sim TBE_{\gamma_0}$ with parameter $\hat{\nu}$.
(b) Based on the bootstrap sample, calculate the bootstrap version $S_n(\gamma_0)^{*b}$ of test statistic $S_n(\gamma_0)$.
3. Approximate the p-value with $\hat{p} = \frac{1}{B} \sum_{b=1}^{B} I\{S_n(\gamma_0)^{*b} \geq S_n^{obs}(\gamma_0)\}$ and the critical point with $S_{c:B,n}(\gamma_0)^{*}$, the $c$-th order statistic of the bootstrap values, where $c = \lceil (1 - \alpha) B \rceil$ and $\lceil \cdot \rceil$ is the ceiling function.
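The bootstrap procedure above can be sketched for the gamma null hypothesis ($\gamma_0 = 2$) as follows. As before, this is a hedged Python illustration rather than the code used in the study: `s_hat`, `digamma`, `gamma_mle` and `gamma_bootstrap_test` are hypothetical names, the MLE is obtained by bisection with an approximate digamma function, and the data are simulated.

```python
import math
import random

def s_hat(x, gamma0):
    """Unbiased estimator of S(gamma0) in terms of the power sums L_j."""
    n = len(x)
    L1, L2, L3, L4 = (sum(v ** j for v in x) for j in (1, 2, 3, 4))
    t13 = ((n * n + n + 4) * L3 * L1 - (n * n + n) * L4
           - 3 * (n + 1) * L2 * L1 ** 2 + 3 * (n - 1) * L2 ** 2 + 2 * L1 ** 4)
    t22 = ((n * n - 3 * n + 3) * L2 ** 2 - (n * n - n) * L4
           - 2 * n * L2 * L1 ** 2 + 4 * (n - 1) * L3 * L1 + L1 ** 4)
    return (t13 - gamma0 * t22) / (n * (n - 1) * (n - 2) * (n - 3))

def digamma(x):
    """Approximate digamma: shift x up to >= 6, then use the asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def gamma_mle(x):
    """MLE (shape a, rate b) of the gamma distribution via bisection on the shape."""
    n = len(x)
    mean = sum(x) / n
    s = math.log(mean) - sum(math.log(v) for v in x) / n
    lo, hi = 1e-6, 1e6
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if math.log(mid) - digamma(mid) > s:
            lo = mid
        else:
            hi = mid
    a = math.sqrt(lo * hi)
    return a, a / mean

def gamma_bootstrap_test(x, B=500, alpha=0.05, seed=0):
    """Return (bootstrap p-value, bootstrap critical point) for H0: gamma NEF."""
    rng = random.Random(seed)
    n = len(x)
    s_obs = s_hat(x, 2.0) ** 2                   # step 1: observed S_n
    a, b = gamma_mle(x)                          # step 1: MLE under H0
    boot = sorted(                               # step 2: B bootstrap statistics
        s_hat([rng.gammavariate(a, 1.0 / b) for _ in range(n)], 2.0) ** 2
        for _ in range(B)
    )
    p_value = sum(s >= s_obs for s in boot) / B  # step 3: p-value
    c = math.ceil((1 - alpha) * B)               # step 3: c-th order statistic
    return p_value, boot[c - 1]

rng = random.Random(7)
sample = [rng.gammavariate(2.0, 1.0 / 3.0) for _ in range(100)]
print(gamma_bootstrap_test(sample))
```

Because $S_n(\gamma_0)$ depends only on the data and $\gamma_0$, no parameter re-estimation is needed inside the bootstrap loop in this sketch; the null hypothesis is rejected at level $\alpha$ when the observed statistic exceeds the returned critical point.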

4. Numerical Studies of Gamma and Modified Bessel-Type NEFs

While the goal of the paper is mainly theoretical, i.e., introducing gof tests for $TBE_{\gamma}$ models, we illustrate, in this section, its applicability to two models of the TBE class: the gamma ($\gamma = 2$) and modified Bessel-type ($\gamma = 2.5$) NEFs. This is dealt with in the next two subsections. Applications to other TBE members will be discussed in a future study.

4.1. A Simulation Study of the Gamma NEF

Many gof tests have been proposed for the widely used gamma distribution. In this subsection, we assess the performance of our proposed gof test and compare it with some other existing tests. The comparison is made in terms of type I error rate and test power. For this, we executed simulations using the statistical computing environment R, and the respective analysis was conducted at a 5% nominal level. Appropriate alternative distributions and competitive tests are outlined in the sequel.
The density of the gamma distribution $Gamma(\alpha, \beta)$, with shape parameter $\alpha > 0$ and rate parameter $\beta > 0$, is
$$f(x; \alpha, \beta) = \frac{1}{\Gamma(\alpha)}\, \beta^{\alpha} x^{\alpha - 1} e^{-\beta x}, \quad x > 0,$$
with mean $\mu = \alpha/\beta$ and variance $\sigma^2 = \alpha/\beta^2$.
Based on the previous section, we considered the test statistic $S_n(\gamma_0)$ obtained using relations (14) and (9) in the case of $\gamma_0 = 2$, while $M$, determined with relation (19), was unknown and estimated using the MLE.
In order to assess the performance of our proposed gof test in terms of type I error rate, 10,000 samples, each of size $n = 50, 100$, were drawn from a $Gamma(\alpha, \beta)$ distribution. The chosen parameter values were $(\alpha, \beta) = (2, 3), (2, 2), (2, 1), (0.5, 1), (0.5, 2), (2, 5)$. The power performance of the test was investigated by generating 10,000 samples, each of size $n = 50, 100$, from the following alternatives:
  • The inverse Gaussian distribution, denoted by $IG(\mu, \lambda)$, with density
    $$f(x; \mu, \lambda) = \left(\frac{\lambda}{2\pi x^{3}}\right)^{1/2} \exp\left\{-\frac{\lambda (x - \mu)^2}{2\mu^2 x}\right\}, \quad x > 0,$$
    where $\mu > 0$, $\lambda > 0$ and $\mu^3/\lambda$ are the mean, shape parameter and variance, respectively. The $IG(\mu, \lambda)$ distribution is extensively used for modeling non-negative, right-skewed data in different fields of applied research (see, for instance, [26,27] and references therein). In this frame, data from $IG(1, \lambda)$, for $\lambda = 0.5, 1, 2, 4, 8, 10$, were considered.
  • Lognormal distribution $LN(\mu, \phi)$ with density
    $$f(x; \mu, \phi) = \exp\left\{-(\ln(x) - \mu)^2 / (2\phi^2)\right\} / \left(\phi x \sqrt{2\pi}\right), \quad \mu \in \mathbb{R},\ \phi > 0,\ x > 0.$$
    Data from $LN(\mu, \phi)$ with $(\mu, \phi) = (0, 0.5), (0, 0.6), (0, 1), (0, 1.4), (0, 2), (0, 3), (0, 5), (0.5, 1)$ were considered, where the last setting corresponds to the lognormal distribution with mean $e$ and variance $e^3 - e^2$.
  • Half-Cauchy distribution $HC(0, 1)$ with density
    $$f(x) = \frac{2}{\pi}\, \frac{1}{1 + x^2}, \quad x > 0.$$
  • Beta distribution $Beta(a, b)$ with density
    $$f(x; a, b) = \frac{x^{a-1}(1 - x)^{b-1}}{B(a, b)}, \quad 0 < x < 1,\ a > 0,\ b > 0,$$
    for parameter values $(a, b) \in \{(2, 0.5), (2, 2)\}$.
  • Pareto distribution $Pa(a, b)$ with density
    $$f(x; a, b) = a\, b^{a}\, x^{-(a+1)}, \quad b > 0,\ a > 0,\ x \geq b,$$
    for parameter values $(a, b) \in \{(1, 1), (2, 1)\}$.
  • Shifted-Pareto distribution $SP(\nu)$ with density $\nu / (1 + x)^{1+\nu}$.
Our proposed gof test was compared with the following existing test statistics:
  • Test statistic $G_{n,a}$, where $n$ is the sample size and $a > 0$ is a tuning parameter. This test statistic was recently proposed by [28]. The corresponding test belongs to a class of weighted $L^2$-type tests of fit to the gamma distribution. They are based on a fixed point property of a transformation connected to a Steinian characterization of the family of gamma distributions.
  • Test statistics $T_{n,a}^{(1)}$ and $T_{n,a}^{(2)}$, where $n$ is the sample size and $a > 0$ is a tuning parameter, proposed by [29]. The corresponding tests belong to a class of gof tests for the gamma distribution that utilizes the empirical Laplace transform.
  • The test statistic proposed by [30], which is based on the ratio of two variance estimators. It is denoted in the sequel by $V_n$.
By taking into account the recommendations given by [28,29], we chose $G_{n,0.5}$, $T_{n,1}^{(1)}$ and $T_{n,4}^{(2)}$ as representatives for the simulation study, where the test proposed by [30] was implemented using the function gamma_test in the R package goft. Finally, each of the tests under discussion was implemented using parametric bootstrap with $B = 1000$ bootstrap samples.
The results are listed in Table 1 and Table 2, where the best performing tests with respect to power, for each distribution and sample size, are highlighted in boldface for easy reference. Graphical representations of these results are provided in Figure 1 and Figure 2. In these tables and graphs, we use the abbreviations Asym and Btstr for asymptotic and bootstrap, respectively.
According to the results, we conclude the following:
  • The empirical size of the test proposed in this paper got closer to the nominal level of 0.05 as the sample size increased.
  • The empirical power of the tests increased, as expected, as the sample size increased.
  • No test yielded the highest power against all alternatives analyzed, i.e., no test showed uniform superiority over the others, as indeed was expected according to the theoretical results in [31].
  • For the $IG(1, \lambda)$ model, the larger $\lambda$ was, the better the performance of our gof test was with respect to the other tests considered in the simulation study. However, for the remaining alternative distributions, tests $G_{n,0.5}$ and $T_{n,1}^{(1)}$ performed better than the proposed test.

4.2. Numerical Examples for the Modified Bessel-Type NEF

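In this subsection, we consider some numerical studies in which we applied the gof test to the modified Bessel NEF. These include a simulation study to assess nominal level attainment as well as the analysis of two real data sets. We re-emphasize that, to the best of our knowledge, this is the first time that a gof test has been proposed for such an NEF.
The modified Bessel densities can be expressed either with the modified Bessel function of the second kind of order $1/3$ or with the series expansion given in (2).
The series expansion of the density is given by
$$f(x; a, \theta) = \exp\left\{\theta x + \frac{2}{a}\left(-\frac{3}{2} a\theta\right)^{1/3}\right\} \times \left(-\frac{1}{\pi}\right) \sum_{k=0}^{\infty} \frac{(-1)^k}{k!} \sin\left(\frac{\pi k}{3}\right) 3^{k} \left(\frac{3}{2} a\right)^{-\frac{2k}{3}} \frac{1}{x^{k/3+1}}\, \Gamma\left(\frac{k}{3} + 1\right), \quad x > 0,\ \theta < 0,\ a > 0,$$
and its log-likelihood based on a random sample $(x_1, \ldots, x_n)$ of size $n$ has the form
$$L(a, \theta) = \theta \sum_{i=1}^{n} x_i + n\, \frac{2}{a}\left(-\frac{3}{2} a\theta\right)^{1/3} + \sum_{i=1}^{n} \log\left\{-\frac{1}{\pi} \sum_{k=0}^{\infty} \frac{(-1)^k}{k!} \sin\left(\frac{\pi k}{3}\right) 3^{k} \left(\frac{3}{2} a\right)^{-\frac{2k}{3}} \frac{1}{x_i^{k/3+1}}\, \Gamma\left(\frac{k}{3} + 1\right)\right\}.$$
The maximum likelihood estimators of parameters $\theta$ and $a$ are obtained as the solution of a system of two equations based on the partial derivatives of the log-likelihood function,
$$\frac{\partial L(a, \theta)}{\partial \theta} = 0 \quad \text{and} \quad \frac{\partial L(a, \theta)}{\partial a} = 0,$$
where
$$\frac{\partial L(a, \theta)}{\partial \theta} = \sum_{i=1}^{n} x_i - n\left(-\frac{3}{2} a\theta\right)^{-\frac{2}{3}},$$
$$\frac{\partial L(a, \theta)}{\partial a} = -\frac{2n}{a^{2}}\left(-\frac{3}{2} a\theta\right)^{\frac{1}{3}} - \frac{n\theta}{a}\left(-\frac{3}{2} a\theta\right)^{-\frac{2}{3}} - \sum_{i=1}^{n} \frac{\sum_{k=0}^{\infty} \frac{(-1)^k}{k!} \sin\left(\frac{\pi k}{3}\right) 3^{k} \left(\frac{3}{2}\right)^{-\frac{2k}{3}} \frac{2k}{3}\, \frac{1}{a^{\frac{2k}{3}+1}}\, \frac{1}{x_i^{k/3+1}}\, \Gamma\left(\frac{k}{3} + 1\right)}{\sum_{k=0}^{\infty} \frac{(-1)^k}{k!} \sin\left(\frac{\pi k}{3}\right) 3^{k} \left(\frac{3}{2} a\right)^{-\frac{2k}{3}} \frac{1}{x_i^{k/3+1}}\, \Gamma\left(\frac{k}{3} + 1\right)}.$$
For references on this density, see p. 155 in [32] and [33]. This density has various applications in diffusion and queuing theories (cf. [34]).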
We begin with a small-scale simulation study to assess type I error attainment (for nominal level $\alpha = 0.05$) under the modified Bessel distribution. We considered six different choices of $Bessel(\mu, \phi)$, where $\mu$ is the mean and $\phi$ is the dispersion. In the bootstrap setting, $B = 1000$ Monte Carlo samples were generated. The results are provided in Table 3. We observed that the nominal level attainment improved as $n$ increased and was better for the bootstrap version than for the asymptotic version. We leave a detailed assessment of the test power to further research and turn instead to real data examples.
The first considered data set represents the marks of slow-pace students in mathematics in the final 2003 examination at IIT Kanpur [35]. The second data set, used by [36], is the vinyl chloride data obtained from clean-up gradient-monitoring wells in mg/L. The data sets, which were recently analyzed by [37], are displayed in Table 4. Some basic summary statistics are provided in Table 5, indicating considerable skewness to the right.
The p-values for the test statistic in the special case of $\gamma_0 = 2.5$ for these two data sets are given in Table 6. These p-values indicate that, for these data sets, the null hypothesis of the modified Bessel distribution should not be rejected.
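The bootstrap p-values above rest on resampling under the fitted null model. The following is a minimal sketch of a generic parametric bootstrap for a statistic of the form $n\hat{S}^2(\gamma_0)$, and it may differ in detail from the scheme used by the authors. Because no sampler for the modified Bessel NEF is assumed to be available, the gamma family ($\gamma_0 = 2$) with a method-of-moments fit serves as a stand-in null model. The names `s_hat`, `fit_gamma_mom`, `sample_gamma` and `bootstrap_p_value` are hypothetical helpers, and the statistic is left unscaled (no division by an estimate of $M$) for simplicity.

```python
import random
from math import perm
from statistics import fmean, variance


def s_hat(x, gamma0):
    """U-statistic estimate of S(gamma0), computed from power sums."""
    n = len(x)
    p1, p2, p3, p4 = (sum(v ** r for v in x) for r in (1, 2, 3, 4))
    return ((p3 * p1 - p4) / perm(n, 2)
            - gamma0 * (p2 ** 2 - p4) / perm(n, 2)
            + (2 * gamma0 - 3)
            * (p2 * p1 ** 2 - 2 * p3 * p1 - p2 ** 2 + 2 * p4) / perm(n, 3)
            + (2 - gamma0)
            * (p1 ** 4 - 6 * p1 ** 2 * p2 + 3 * p2 ** 2 + 8 * p1 * p3 - 6 * p4)
            / perm(n, 4))


def fit_gamma_mom(x):
    """Method-of-moments fit (assumption: the paper uses the MLE instead)."""
    m, v = fmean(x), variance(x)
    return m * m / v, v / m  # (shape, scale)


def sample_gamma(params, n, rng):
    """Draw n observations from the fitted gamma model."""
    shape, scale = params
    return [rng.gammavariate(shape, scale) for _ in range(n)]


def bootstrap_p_value(x, gamma0, fit, sample, n_boot=1000, seed=0):
    """Parametric bootstrap p-value for the unscaled statistic n * S_hat^2."""
    rng = random.Random(seed)
    n = len(x)
    t_obs = n * s_hat(x, gamma0) ** 2
    params = fit(x)  # parameters estimated once from the observed data
    exceed = sum(
        n * s_hat(sample(params, n, rng), gamma0) ** 2 >= t_obs
        for _ in range(n_boot)
    )
    # The +1 correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_boot + 1)


# Usage on simulated gamma data (illustrative only).
demo_rng = random.Random(42)
data = [demo_rng.gammavariate(2.0, 1.5) for _ in range(50)]
print(bootstrap_p_value(data, 2.0, fit_gamma_mom, sample_gamma, n_boot=500))
```

Passing the fitting and sampling routines as arguments keeps the resampling loop independent of the null family, so a modified Bessel sampler could be plugged in without changing `bootstrap_p_value`.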

5. Conclusions

In this manuscript, we were interested in testing the hypothesis that a given distribution corresponds to a specific $TBE_{\gamma}$ model, for fixed $\gamma \in \mathbb{R} \setminus [0, 1)$ and unspecified population parameters. Goodness-of-fit tests are typically based on characterization properties. In our developments, the property $S_r(\gamma) = 0$, as displayed in Propositions 1 and 2, is based on existing relations among the first three cumulants (when using $r = 1$) of the null distribution. We demonstrated how $S(\gamma) \equiv S_1(\gamma)$ can be estimated and that the squared estimate yields a statistic that converges to 0 under the null hypothesis and whose null distribution, after scaling, is asymptotically chi-squared with one degree of freedom. The scaling factor depends on moments of the true distribution and needs to be estimated from the data at hand, for instance, with maximum likelihood. While the asymptotic chi-squared property allows theoretical critical values to be computed, we also developed a bootstrap version of the test and demonstrated in our simulation study that it leads to better power and nominal level attainment than its asymptotic counterpart. In all scenarios, nominal level attainment and power improved as the sample size increased.
When the null distribution was gamma, we found that the empirical size of the proposed test approached the nominal level of 0.05 as the sample size increased, while its empirical power increased with the sample size, as expected. Compared with existing gof tests, the proposed test compares more favorably against the $IG(1, \lambda)$ alternative as $\lambda$ gets larger. However, for the remaining alternative distributions considered, the existing tests perform better than the test proposed here. Finally, a detailed simulation study for the special cases of the modified Bessel and Whittaker distributions, i.e., when $\gamma_0 = 2.5$ and $\gamma_0 = 4$, is left for further research.
We do not claim that our proposed gof test performs uniformly better than every other test; no test can make or fulfil such a claim. Our gof test, however, is applicable to any member of the huge, uncountable class of $TBE_{\gamma}$ models, the vast majority of which have not yet been considered in the literature for statistical modeling. The gamma distribution example in this paper is presented only to demonstrate the wide applicability of the proposed gof test.
It is our deep belief that $TBE_{\gamma}$ models will be well utilized in the near future for various statistical modeling purposes. Our proposed gof test is at least one step in this direction.

Author Contributions

Conceptualization, S.K.B.-L., A.B. and J.E.; methodology, S.K.B.-L., A.B., J.E. and X.L.; software, X.L. and P.R.; validation, X.L. and P.R.; formal analysis, X.L. and P.R.; investigation, A.B., J.E., X.L. and P.R.; resources, A.B. and X.L.; data curation, A.B. and P.R.; writing—original draft preparation, A.B., J.E. and X.L.; writing—review and editing, S.K.B.-L., A.B., J.E. and X.L.; visualization, J.E. and P.R.; supervision, S.K.B.-L. and X.L.; project administration, S.K.B.-L. and A.B.; funding acquisition, S.K.B.-L. and A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the project Establishment of capacity building infrastructures in Biomedical Research (BIOMED-20) (MIS 5047236), which is implemented under Action Reinforcement of the Research and Innovation Infrastructure, funded by Operational Programme Competitiveness, Entrepreneurship and Innovation (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund). It was also funded by the National Natural Science Foundation of China (12271329).

Data Availability Statement

All real data sets used in this manuscript are explicitly displayed in the paper.

Acknowledgments

We thank two reviewers for helpful comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
gof: Goodness-of-fit
NEF: Natural exponential family
PVF: Power variance functions
TBE: Tweedie, Bar-Lev and Enis
VF: Variance function

Appendix A. Proofs

Proof of Theorem 2.
By (9)–(11), we have that
$$
\hat{S}(\gamma_0) = \frac{1}{n_{(2)}} \sum_{j \neq k} X_j^3 X_k - \frac{\gamma_0}{n_{(2)}} \sum_{j \neq k} X_j^2 X_k^2 + \frac{2\gamma_0 - 3}{n_{(3)}} \sum_{j \neq k \neq l} X_j^2 X_k X_l + \frac{2 - \gamma_0}{n_{(4)}} \sum_{j \neq k \neq l \neq m} X_j X_k X_l X_m .
$$
Let
$$
\begin{aligned}
h_1(X_1, X_2, X_3, X_4) &= X_1^3 X_2 + X_1^3 X_3 + X_1^3 X_4 + X_2^3 X_1 + X_2^3 X_3 + X_2^3 X_4 \\
&\quad + X_3^3 X_1 + X_3^3 X_2 + X_3^3 X_4 + X_4^3 X_1 + X_4^3 X_2 + X_4^3 X_3, \\
h_2(X_1, X_2, X_3, X_4) &= X_1^2 X_2^2 + X_1^2 X_3^2 + X_1^2 X_4^2 + X_2^2 X_3^2 + X_2^2 X_4^2 + X_3^2 X_4^2, \\
h_3(X_1, X_2, X_3, X_4) &= X_1^2 X_2 X_3 + X_1^2 X_2 X_4 + X_1^2 X_3 X_4 + X_2^2 X_1 X_3 + X_2^2 X_1 X_4 + X_2^2 X_3 X_4 \\
&\quad + X_3^2 X_1 X_2 + X_3^2 X_1 X_4 + X_3^2 X_2 X_4 + X_4^2 X_1 X_2 + X_4^2 X_1 X_3 + X_4^2 X_2 X_3, \\
h_4(X_1, X_2, X_3, X_4) &= X_1 X_2 X_3 X_4 .
\end{aligned}
$$
Then, $\hat{S}(\gamma_0)$ can be rewritten as
$$
\begin{aligned}
\hat{S}(\gamma_0) &= \frac{1}{n_{(2)}} \sum_{I} \frac{2\, h_1(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4})}{(n-2)_{(2)}} - \frac{\gamma_0}{n_{(2)}} \sum_{I} \frac{4\, h_2(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4})}{(n-2)_{(2)}} \\
&\quad + \frac{2\gamma_0 - 3}{n_{(3)}} \sum_{I} \frac{2\, h_3(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4})}{n-3} + \frac{2 - \gamma_0}{n_{(4)}} \sum_{I} 24\, h_4(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4}) \\
&= \frac{24}{n_{(4)}} \sum_{I} \left[ \frac{h_1(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4})}{12} - \frac{\gamma_0\, h_2(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4})}{6} \right] \\
&\quad + \frac{24}{n_{(4)}} \sum_{I} \left[ \frac{(2\gamma_0 - 3)\, h_3(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4})}{12} + (2 - \gamma_0)\, h_4(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4}) \right] \\
&\equiv \frac{24}{n_{(4)}} \sum_{I} h(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4}),
\end{aligned}
$$
where $I$ is the set of all combinations of quadruplets selected from $n$ elements and $(i_1, i_2, i_3, i_4)$ ranges over all the combinations in $I$.
Since $h_1(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4})$, $h_2(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4})$, $h_3(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4})$ and $h_4(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4})$ are all symmetric functions of $X_{i_1}$, $X_{i_2}$, $X_{i_3}$ and $X_{i_4}$, we have that $h(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4})$ is also symmetric. Thus, $\hat{S}(\gamma_0) = \frac{24}{n_{(4)}} \sum_{I} h(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4})$ is a U-statistic. By the assumption that the variable $X$ has finite third moment, we have $E(|X|) \le C_1$, $E(|X|^2) \le C_2$ and $E(|X|^3) \le C_3$, where $C_1$, $C_2$ and $C_3$ are all finite constants.
It is straightforward that
$$
\begin{aligned}
E(|h_1(X_1, X_2, X_3, X_4)|) &\le 12\, C_1 C_3 < \infty, & E(|h_2(X_1, X_2, X_3, X_4)|) &\le 6\, C_2^2 < \infty, \\
E(|h_3(X_1, X_2, X_3, X_4)|) &\le 12\, C_1^2 C_2 < \infty, & E(|h_4(X_1, X_2, X_3, X_4)|) &\le C_1^4 < \infty,
\end{aligned}
$$
which implies that $E(|h(X_1, X_2, X_3, X_4)|) < \infty$. Let us assume that $E(X^i) = \mu_i$, $i = 1, 2, 3$. We have
$$
\begin{aligned}
E(h(X_1, X_2, X_3, X_4)) &= \frac{1}{12} E\left( h_1(X_1, X_2, X_3, X_4) - 2\gamma_0\, h_2(X_1, X_2, X_3, X_4) \right) \\
&\quad + \frac{2\gamma_0 - 3}{12} E(h_3(X_1, X_2, X_3, X_4)) + (2 - \gamma_0) E(h_4(X_1, X_2, X_3, X_4)) \\
&= \frac{1}{12} \cdot 12\, \mu_1 \mu_3 - \frac{\gamma_0}{6} \cdot 6\, \mu_2^2 + \frac{2\gamma_0 - 3}{12} \cdot 12\, \mu_1^2 \mu_2 + (2 - \gamma_0)\, \mu_1^4 \\
&= \mu_1 \mu_3 - \gamma_0 \mu_2^2 + (2\gamma_0 - 3)\, \mu_1^2 \mu_2 + (2 - \gamma_0)\, \mu_1^4 .
\end{aligned}
$$
According to the law of large numbers for U-statistics [38], we have
$$
\hat{S}(\gamma_0) = \frac{24}{n_{(4)}} \sum_{I} h(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4}) \xrightarrow{a.s.} E(h(X_1, X_2, X_3, X_4)) = \mu_1 \mu_3 - \gamma_0 \mu_2^2 + (2\gamma_0 - 3)\, \mu_1^2 \mu_2 + (2 - \gamma_0)\, \mu_1^4 .
$$
By Theorem 1, we have $E(\hat{S}(\gamma_0)) = S(\gamma_0)$, which, together with $E(\hat{S}(\gamma_0)) = \mu_1 \mu_3 - \gamma_0 \mu_2^2 + (2\gamma_0 - 3)\, \mu_1^2 \mu_2 + (2 - \gamma_0)\, \mu_1^4$, implies that
$$
\hat{S}(\gamma_0) \xrightarrow{a.s.} S(\gamma_0) .
$$
By the continuous mapping theorem, we have
$$
S_n(\gamma_0) = \hat{S}^2(\gamma_0) \xrightarrow{a.s.} S^2(\gamma_0),
$$
which concludes the proof. □
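The estimator $\hat{S}(\gamma_0)$ in the proof above sums over all tuples of pairwise distinct indices, which costs $O(n^4)$ if evaluated literally. The sketch below shows one way to obtain the same value in $O(n)$ from the power sums $p_r = \sum_i X_i^r$, and checks it against the literal definition on a small sample. The function names are hypothetical and the power-sum identities are standard algebra supplied here, not formulas quoted from the paper.

```python
from itertools import permutations
from math import perm, prod


def s_hat_power_sums(x, gamma0):
    """U-statistic estimate of S(gamma0) in O(n), via power sums p_r."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least four observations")
    p1, p2, p3, p4 = (sum(v ** r for v in x) for r in (1, 2, 3, 4))
    # Sums over pairwise distinct indices, rewritten with power sums.
    s31 = p3 * p1 - p4                                    # x_j^3 x_k
    s22 = p2 ** 2 - p4                                    # x_j^2 x_k^2
    s211 = p2 * p1 ** 2 - 2 * p3 * p1 - p2 ** 2 + 2 * p4  # x_j^2 x_k x_l
    s1111 = (p1 ** 4 - 6 * p1 ** 2 * p2 + 3 * p2 ** 2
             + 8 * p1 * p3 - 6 * p4)                      # x_j x_k x_l x_m
    # perm(n, k) = n (n - 1) ... (n - k + 1), the normalising constants.
    return (s31 / perm(n, 2)
            - gamma0 * s22 / perm(n, 2)
            + (2 * gamma0 - 3) * s211 / perm(n, 3)
            + (2 - gamma0) * s1111 / perm(n, 4))


def s_hat_brute_force(x, gamma0):
    """Literal evaluation over all ordered tuples of distinct indices."""
    n = len(x)

    def distinct_sum(powers):
        return sum(
            prod(x[i] ** p for i, p in zip(idx, powers))
            for idx in permutations(range(n), len(powers))
        )

    return (distinct_sum((3, 1)) / perm(n, 2)
            - gamma0 * distinct_sum((2, 2)) / perm(n, 2)
            + (2 * gamma0 - 3) * distinct_sum((2, 1, 1)) / perm(n, 3)
            + (2 - gamma0) * distinct_sum((1, 1, 1, 1)) / perm(n, 4))


sample = [1.0, 2.5, 0.7, 3.2, 1.9, 0.4]
print(s_hat_power_sums(sample, 2.5), s_hat_brute_force(sample, 2.5))
```

The two printed values should agree up to floating-point rounding; the brute-force version is only practical for very small $n$ and is included purely as a cross-check.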
Proof of Corollary 1.
Since $X_1, \ldots, X_n$ are i.i.d. from a $TBE_{\gamma_0}$ distribution, we have $S(\gamma_0) = 0$. Therefore, the result is directly derived from Theorem 2. □
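For readers without Propositions 1 and 2 at hand, the identity $S(\gamma_0) = 0$ can be sketched as follows. This is our paraphrase under two assumptions, not the paper's proof: the null NEF has variance function $V(\mu) = a\mu^{\gamma_0}$ for some constant $a > 0$, and the cumulants $\kappa_r$ obey the standard NEF recursion $\kappa_{r+1} = V(\mu)\, \mathrm{d}\kappa_r / \mathrm{d}\mu$. Then

$$
\kappa_1 = \mu, \qquad \kappa_2 = a \mu^{\gamma_0}, \qquad \kappa_3 = V(\mu) V'(\mu) = \gamma_0\, a^2 \mu^{2\gamma_0 - 1},
\qquad \text{so that} \qquad \kappa_1 \kappa_3 - \gamma_0 \kappa_2^2 = 0 .
$$

Substituting $\kappa_2 = \mu_2 - \mu_1^2$ and $\kappa_3 = \mu_3 - 3\mu_1\mu_2 + 2\mu_1^3$ gives

$$
\kappa_1 \kappa_3 - \gamma_0 \kappa_2^2 = \mu_1 \mu_3 - \gamma_0 \mu_2^2 + (2\gamma_0 - 3)\, \mu_1^2 \mu_2 + (2 - \gamma_0)\, \mu_1^4 ,
$$

which is exactly the almost-sure limit of $\hat{S}(\gamma_0)$ obtained in the proof of Theorem 2.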
Before we proceed to the proof of Theorem 3, we state the following lemma.
Lemma A1.
Let us assume that $F$ possesses finite sixth moment. Let $E(X^i) = \mu_i$, $i = 1, 2, \ldots, 6$, with all $\mu_i$ being finite constants. Let us define $\phi(x_1) = E(h(X_1, X_2, X_3, X_4) \mid X_1 = x_1)$ and $\eta_1 = Var(\phi(X_1))$. Then, under the null hypothesis,
$$
\eta_1 = \frac{M}{16},
$$
where $M$ is as in Theorem 3.
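Proof.
Firstly, we find that
$$
\begin{aligned}
\phi(x_1) &= E\left( h(X_1, X_2, X_3, X_4) \mid X_1 = x_1 \right) \\
&= E\left( \tfrac{1}{12} h_1(X_1, X_2, X_3, X_4) - \tfrac{\gamma_0}{6} h_2(X_1, X_2, X_3, X_4) \,\middle|\, X_1 = x_1 \right) \\
&\quad + E\left( \tfrac{2\gamma_0 - 3}{12} h_3(X_1, X_2, X_3, X_4) + (2 - \gamma_0)\, h_4(X_1, X_2, X_3, X_4) \,\middle|\, X_1 = x_1 \right) \\
&= \tfrac{1}{12} \left( 3 x_1^3 \mu_1 + 3 x_1 \mu_3 + 6 \mu_1 \mu_3 \right) - \tfrac{\gamma_0}{6} \left( 3 x_1^2 \mu_2 + 3 \mu_2^2 \right) \\
&\quad + \tfrac{2\gamma_0 - 3}{12} \left( 3 x_1^2 \mu_1^2 + 6 x_1 \mu_1 \mu_2 + 3 \mu_1^2 \mu_2 \right) + (2 - \gamma_0)\, x_1 \mu_1^3 \\
&= \tfrac{1}{4} \mu_1 x_1^3 + \left( \tfrac{2\gamma_0 - 3}{4} \mu_1^2 - \tfrac{\gamma_0}{2} \mu_2 \right) x_1^2 + \left( \tfrac{1}{4} \mu_3 + \tfrac{2\gamma_0 - 3}{2} \mu_1 \mu_2 + (2 - \gamma_0)\, \mu_1^3 \right) x_1 \\
&\quad + \tfrac{1}{2} \mu_1 \mu_3 - \tfrac{\gamma_0}{2} \mu_2^2 + \tfrac{2\gamma_0 - 3}{4} \mu_1^2 \mu_2 \\
&= \frac{\mu_1}{4} x_1^3 + \frac{(2\gamma_0 - 3) \mu_1^2 - 2\gamma_0 \mu_2}{4} x_1^2 + \frac{\mu_3 + 2(2\gamma_0 - 3) \mu_2 \mu_1 + 4(2 - \gamma_0) \mu_1^3}{4} x_1 \\
&\quad + \tfrac{1}{2} \mu_1 \mu_3 - \tfrac{\gamma_0}{2} \mu_2^2 + \tfrac{2\gamma_0 - 3}{4} \mu_1^2 \mu_2 .
\end{aligned}
$$
Since $E(\phi(X_1)) = E(h(X_1, X_2, X_3, X_4)) = S(\gamma_0) = 0$ under the null hypothesis, it follows that
$$
E\left\{ \mu_1 X_1^3 + \left[ (2\gamma_0 - 3) \mu_1^2 - 2\gamma_0 \mu_2 \right] X_1^2 + \left[ \mu_3 + 2(2\gamma_0 - 3) \mu_2 \mu_1 + 4(2 - \gamma_0) \mu_1^3 \right] X_1 \right\} = 2\gamma_0 \mu_2^2 - 2 \mu_1 \mu_3 - (2\gamma_0 - 3)\, \mu_1^2 \mu_2 . \tag{A1}
$$
Therefore, we have
$$
\begin{aligned}
\eta_1 &= Var\left\{ \frac{\mu_1}{4} X_1^3 + \frac{(2\gamma_0 - 3) \mu_1^2 - 2\gamma_0 \mu_2}{4} X_1^2 + \frac{\mu_3 + 2(2\gamma_0 - 3) \mu_2 \mu_1 + 4(2 - \gamma_0) \mu_1^3}{4} X_1 \right\} \\
&= \frac{1}{16} Var\left\{ \mu_1 X_1^3 + \left[ (2\gamma_0 - 3) \mu_1^2 - 2\gamma_0 \mu_2 \right] X_1^2 + \left[ \mu_3 + 2(2\gamma_0 - 3) \mu_2 \mu_1 + 4(2 - \gamma_0) \mu_1^3 \right] X_1 \right\} \\
&= \frac{1}{16} E\left\{ \left( \mu_1 X_1^3 + \left[ (2\gamma_0 - 3) \mu_1^2 - 2\gamma_0 \mu_2 \right] X_1^2 + \left[ \mu_3 + 2(2\gamma_0 - 3) \mu_2 \mu_1 + 4(2 - \gamma_0) \mu_1^3 \right] X_1 \right)^2 \right\} \\
&\quad - \frac{1}{16} \left( E\left\{ \mu_1 X_1^3 + \left[ (2\gamma_0 - 3) \mu_1^2 - 2\gamma_0 \mu_2 \right] X_1^2 + \left[ \mu_3 + 2(2\gamma_0 - 3) \mu_2 \mu_1 + 4(2 - \gamma_0) \mu_1^3 \right] X_1 \right\} \right)^2
\end{aligned}
$$
or equivalently, using relation (A1),
$$
\begin{aligned}
\eta_1 &= \frac{1}{16} \left\{ \mu_1^2 \mu_6 + \left[ (2\gamma_0 - 3) \mu_1^2 - 2\gamma_0 \mu_2 \right]^2 \mu_4 + \left[ \mu_3 + 2(2\gamma_0 - 3) \mu_2 \mu_1 + 4(2 - \gamma_0) \mu_1^3 \right]^2 \mu_2 \right\} \\
&\quad + \frac{1}{16} \left\{ 2 \mu_1 \left[ (2\gamma_0 - 3) \mu_1^2 - 2\gamma_0 \mu_2 \right] \mu_5 + 2 \mu_1 \left[ \mu_3 + 2(2\gamma_0 - 3) \mu_2 \mu_1 + 4(2 - \gamma_0) \mu_1^3 \right] \mu_4 \right\} \\
&\quad + \frac{1}{16} \cdot 2 \left[ (2\gamma_0 - 3) \mu_1^2 - 2\gamma_0 \mu_2 \right] \left[ \mu_3 + 2(2\gamma_0 - 3) \mu_2 \mu_1 + 4(2 - \gamma_0) \mu_1^3 \right] \mu_3 \\
&\quad - \frac{1}{16} \left\{ 2\gamma_0 \mu_2^2 - 2 \mu_1 \mu_3 - (2\gamma_0 - 3)\, \mu_1^2 \mu_2 \right\}^2 .
\end{aligned}
$$
Thus,
$$
\begin{aligned}
\eta_1 &= \frac{1}{16} \left[ \mu_1^2 \mu_6 + 2(2\gamma_0 - 3)\, \mu_1^3 \mu_5 - 4\gamma_0\, \mu_1 \mu_2 \mu_5 \right] \\
&\quad + \frac{1}{16} \left[ (2\gamma_0 - 5)^2 \mu_1^4 \mu_4 - 4(\gamma_0 - 1)(2\gamma_0 - 3)\, \mu_1^2 \mu_2 \mu_4 + 4\gamma_0^2\, \mu_2^2 \mu_4 + 2 \mu_1 \mu_3 \mu_4 \right] \\
&\quad + \frac{1}{16} \left[ (1 - 4\gamma_0)\, \mu_2 \mu_3^2 - 4(4\gamma_0^2 - 10\gamma_0 + 3)\, \mu_1 \mu_2^2 \mu_3 + 32(\gamma_0 - 1)(\gamma_0 - 2)\, \mu_1^3 \mu_2 \mu_3 \right] \\
&\quad + \frac{1}{16} \left[ 2(2\gamma_0 - 5)\, \mu_1^2 \mu_3^2 + 8(2 - \gamma_0)(2\gamma_0 - 3)\, \mu_1^5 \mu_3 + 12(\gamma_0 - 1)(2\gamma_0 - 3)\, \mu_1^2 \mu_2^3 \right] \\
&\quad + \frac{1}{16} \left[ (35 - 18\gamma_0)(2\gamma_0 - 3)\, \mu_1^4 \mu_2^2 - 4\gamma_0^2\, \mu_2^4 + 16(2 - \gamma_0)^2 \mu_1^6 \mu_2 \right] \\
&= \frac{1}{16} M .
\end{aligned}
$$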
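The last two displays of this proof express $16\eta_1 = M$ first in a compact form and then fully expanded. The sketch below is an illustration added here, not part of the paper: it codes both forms and checks that they agree exactly, using integer and rational arithmetic. The function names and the choice of a Gamma(shape 2, scale 1) example, whose raw moments are $\mu_i = 2 \cdot 3 \cdots (i+1)$, are assumptions made for illustration.

```python
from fractions import Fraction


def m_compact(mu, g):
    """M under H0: second-moment form with relation (A1) substituted."""
    m1, m2, m3, m4, m5, m6 = (mu[i] for i in range(1, 7))
    a = (2 * g - 3) * m1 ** 2 - 2 * g * m2                      # coefficient of X^2
    b = m3 + 2 * (2 * g - 3) * m2 * m1 + 4 * (2 - g) * m1 ** 3  # coefficient of X
    second = (m1 ** 2 * m6 + a ** 2 * m4 + b ** 2 * m2
              + 2 * m1 * a * m5 + 2 * m1 * b * m4 + 2 * a * b * m3)
    first = 2 * g * m2 ** 2 - 2 * m1 * m3 - (2 * g - 3) * m1 ** 2 * m2  # (A1)
    return second - first ** 2


def m_expanded(mu, g):
    """M as the fully expanded polynomial in the raw moments."""
    m1, m2, m3, m4, m5, m6 = (mu[i] for i in range(1, 7))
    return (m1 ** 2 * m6 + 2 * (2 * g - 3) * m1 ** 3 * m5 - 4 * g * m1 * m2 * m5
            + (2 * g - 5) ** 2 * m1 ** 4 * m4
            - 4 * (g - 1) * (2 * g - 3) * m1 ** 2 * m2 * m4
            + 4 * g ** 2 * m2 ** 2 * m4 + 2 * m1 * m3 * m4
            + (1 - 4 * g) * m2 * m3 ** 2
            - 4 * (4 * g ** 2 - 10 * g + 3) * m1 * m2 ** 2 * m3
            + 32 * (g - 1) * (g - 2) * m1 ** 3 * m2 * m3
            + 2 * (2 * g - 5) * m1 ** 2 * m3 ** 2
            + 8 * (2 - g) * (2 * g - 3) * m1 ** 5 * m3
            + 12 * (g - 1) * (2 * g - 3) * m1 ** 2 * m2 ** 3
            + (35 - 18 * g) * (2 * g - 3) * m1 ** 4 * m2 ** 2
            - 4 * g ** 2 * m2 ** 4
            + 16 * (2 - g) ** 2 * m1 ** 6 * m2)


def gamma_raw_moments(shape, order=6):
    """Raw moments of Gamma(shape, scale=1): mu_i = shape (shape+1) ... (shape+i-1)."""
    mu, value = {}, 1
    for i in range(1, order + 1):
        value *= shape + i - 1
        mu[i] = value
    return mu


mu = gamma_raw_moments(2)
print(m_compact(mu, 2), m_expanded(mu, 2))  # both print 768

# The two forms are algebraically identical, so they also agree for
# arbitrary inputs, here with the exact rational gamma0 = 5/2.
arbitrary = {1: Fraction(3, 2), 2: Fraction(7, 3), 3: 5, 4: 11, 5: 23, 6: 47}
assert m_compact(arbitrary, Fraction(5, 2)) == m_expanded(arbitrary, Fraction(5, 2))
```

In practice the moments $\mu_i$ are unknown and would be replaced by estimates, for instance the moments of the fitted null model.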
Proof of Theorem 3.
From the proof of Theorem 2, we obtain that
$$
\hat{S}(\gamma_0) = \frac{24}{n_{(4)}} \sum_{I} h(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4})
$$
is a U-statistic, where
$$
h(X_1, X_2, X_3, X_4) = \frac{1}{12} h_1(X_1, X_2, X_3, X_4) - \frac{\gamma_0}{6} h_2(X_1, X_2, X_3, X_4) + \frac{2\gamma_0 - 3}{12} h_3(X_1, X_2, X_3, X_4) + (2 - \gamma_0)\, h_4(X_1, X_2, X_3, X_4) .
$$
Under the null hypothesis, it is straightforward that $E(h(X_1, X_2, X_3, X_4)) = S(\gamma_0) = 0$. Writing $E(h^2) = E\left[ \left( \frac{1}{12} h_1 - \frac{\gamma_0}{6} h_2 + \frac{2\gamma_0 - 3}{12} h_3 + (2 - \gamma_0) h_4 \right)^2 \right]$, expanding the square term by term and assuming that $F$ has finite sixth moment, we obtain, after some algebra, that $E(h(X_1, X_2, X_3, X_4)^2) < \infty$.
With $\phi(x_1) = E(h(X_1, X_2, X_3, X_4) \mid X_1 = x_1)$ and $\eta_1 = Var(\phi(X_1))$, Lemma A1 implies that $\eta_1 = \frac{1}{16} M$, and because $X_1$ is not a constant almost everywhere, $\eta_1 > 0$. Therefore, by Theorem 4.2.1 of [38], we have
$$
\sqrt{n}\, \hat{S}(\gamma_0) \xrightarrow{d} N(0, 4^2 \eta_1) = N(0, M),
$$
that is,
$$
\sqrt{n}\, M^{-1/2}\, \hat{S}(\gamma_0) \xrightarrow{d} N(0, 1) .
$$
Thus, we conclude the result of this theorem by noting that
$$
n M^{-1} S_n(\gamma_0) = n M^{-1} \hat{S}^2(\gamma_0) = \left( \sqrt{n}\, M^{-1/2}\, \hat{S}(\gamma_0) \right)^2 \xrightarrow{d} \chi^2_1 . \qquad \square
$$
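The limit above turns into an asymptotic p-value once $\hat{S}(\gamma_0)$ and an estimate $\hat{M}$ of $M$ are available. A minimal sketch follows; the function name `asymptotic_p_value` and the numerical inputs are hypothetical. It uses the fact that for $Z \sim N(0,1)$ one has $P(Z^2 > t) = \operatorname{erfc}(\sqrt{t/2})$, so no statistics library is needed.

```python
from math import erfc, sqrt


def asymptotic_p_value(s_hat, m_hat, n):
    """Return (T, p) with T = n * s_hat^2 / m_hat referred to chi-squared(1)."""
    if m_hat <= 0:
        raise ValueError("m_hat must be positive")
    t = n * s_hat ** 2 / m_hat
    # Survival function of chi-squared with one degree of freedom:
    # P(Z^2 > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)) for Z ~ N(0, 1).
    return t, erfc(sqrt(t / 2.0))


# Illustrative inputs only (not values from the paper).
t_stat, p_value = asymptotic_p_value(s_hat=0.8, m_hat=30.0, n=50)
print(t_stat, p_value)
```

The null hypothesis is rejected at level $\alpha$ when the returned p-value falls below $\alpha$, which for $\alpha = 0.05$ corresponds to $T$ exceeding roughly 3.84.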

References

  1. Bar-Lev, S.K.; Kokonendji, C.C. On the mean value parametrization of natural exponential families - A revisited review. Math. Methods Stat. 2017, 26, 159–175.
  2. Tweedie, M.C.K. An index which distinguishes between some important exponential families. In Statistics: Applications and New Directions, Proceedings of the Indian Statistical Institute Golden Jubilee International Conference; Ghosh, J.K., Roy, J., Eds.; Indian Statistical Institute: Calcutta, India, 1984; pp. 579–604.
  3. Bar-Lev, S.K.; Enis, P. Reproducibility and Natural Exponential Families with Power Variance Functions. Ann. Statist. 1986, 4, 1507–1522.
  4. Jørgensen, B. Exponential Dispersion Models. J. R. Stat. Society. Ser. B (Methodol.) 1987, 49, 127–162.
  5. Bar-Lev, S.K. Independent, Tough Identical Results: The Class of Tweedie on Power Variance Functions and the Class of Bar-Lev and Enis on Reproducible Natural Exponential Families. Int. J. Stat. Probab. 2020, 9, 30–35.
  6. Cohen, J.E.; Huillet, T.E. Taylor’s law for some infinitely divisible probability distributions from population models. J. Stat. Phys. 2022, 188, 33.
  7. Kokonendji, C.C.; Touré, A.Y.; Abid, R. On general exponential Weight functions and variation phenomenon. Sankhya A 2022, 84, 924–940.
  8. Bar-Lev, S.K.; Casalis, M. A classification of reproducible natural exponential families in the broad sense. J. Theor. Probab. 2003, 16, 175–195.
  9. Dunn, P.K.; Smyth, G.K. Series evaluation of Tweedie exponential dispersion model densities. Stat. Comput. 2005, 15, 267–280.
  10. Bar-Lev, S.K.; Ridder, A. Monte Carlo Methods for Insurance Risk Computation. Int. J. Stat. Probab. 2019, 8, 55–74.
  11. Bar-Lev, S.K.; Stramer, O. Characterizations of natural exponential families with power variance functions by zero regression properties. Probab. Theory Relat. Fields 1987, 76, 509–522.
  12. Bar-Lev, S.K.; Bshouty, D.; Van der Duyn Schouten, F.A. Zero regression characterizations of natural exponential families generated by Lévy stable laws - a complementary. Math. Methods Stat. 2004, 13, 356–367.
  13. Bar-Lev, S.K. Discussion on paper by B. Jorgensen, “Exponential dispersion models”. J. Roy. Stat. Soc. Ser. B 1987, 49, 153–154.
  14. Good, I.J. C164. The cumulants of the lognormal distribution, including some conjectures. J. Stat. Simul. 1983, 17, 321–328.
  15. Wilding, G.E.; Mudholkar, G.S. A gamma goodness-of-fit test based on characteristic independence of the mean and coefficient of variation. J. Stat. Inference 2008, 138, 3813–3821.
  16. Linnik, Y.V. Linear forms and statistical criteria. I, II. Ukrain. Mat. Zh. 1953, 5, 207–243, 247–290. English translation in Sel. Transl. Math. Statist. Prob. 1963, 3, 1–90. (In Russian)
  17. Nikitin, Y.Y. Tests based on characterizations, and their efficiencies: A survey. Acta Comment. Univ. Tartu. Math. 2017, 21, 3–24.
  18. Vasicek, O. A test of normality based on sample entropy. J. Roy. Statist. Soc. B 1976, 38, 54–59.
  19. Lin, C.T.; Mudholkar, G.S. A simple test for normality against asymmetric alternatives. Biometrika 1980, 67, 455–461.
  20. Mudholkar, G.S.; Lin, C.T. On two applications of characterization theorems to goodness-of-fit. Colloq. Math. Soc. Janos Bolyai 1984, 45, 395–414.
  21. Mudholkar, G.S.; Natarajan, R.; Chaubey, Y.P. A goodness-of-fit test for the inverse Gaussian distribution using its independence characterization. Sankhya B 2001, 63, 362–374.
  22. Marchetti, C.E.; Mudholkar, G.S. Characterization theorems and goodness-of-fit test. In Goodness-of-Fit Tests and Model Validity, Statistics for Industry and Technology; Huber-Carol, C., Balakrishnan, N., Nikulin, M.S., Mesbah, M., Eds.; Birkhäuser: Boston, MA, USA, 2002.
  23. Mudholkar, G.S.; Tian, L. An entropy characterization of the inverse Gaussian distribution and related goodness-of-fit test. J. Statist. Plann. Inference 2002, 102, 211–221.
  24. Jiménez-Gamero, M.D.; Milošević, B.; Obradović, M. Exponentiality tests based on Basu characterization. Statistics 2020, 54, 714–736.
  25. Milošević, B. Asymptotic efficiency of goodness-of-fit tests based on Too-Lin characterization. Commun. Stat.-Simul. Comput. 2020, 49, 2082–2101.
  26. Chhikara, R.S.; Folks, J.L. The Inverse Gaussian Distribution: Theory, Methodology, and Applications; Marcel Dekker: New York, NY, USA, 1989.
  27. Seshadri, V. The Inverse Gaussian Distribution Statistical Theory and Applications; Springer: New York, NY, USA, 1999.
  28. Betsch, S.; Ebner, B. A new characterization of the Gamma distribution and associated goodness-of-fit tests. Metrika 2019, 82, 779–806.
  29. Henze, N.; Meintanis, S.G.; Ebner, B. Goodness-of-Fit Tests for the Gamma Distribution Based on the Empirical Laplace Transform. Commun. Stat. Theory and Methods 2012, 41, 1543–1556.
  30. Villasenor, J.A.; Gonzalez-Estrada, E. A variance ratio test of fit for Gamma distributions. Stat. Probab. Lett. 2015, 96, 281–286.
  31. Janssen, A. Global power functions of goodness of fit tests. Ann. Stat. 2000, 28, 239–253.
  32. Oberhettinger, F.; Badii, L. Tables of Laplace Transforms; Springer: New York, NY, USA, 1973.
  33. Zolotarev, V.M. Expressions of the density of a stable distribution with exponent a greater than one by means of a frequency with exponent 1/a. Dokl. Akad. Nauk SSSR 1954, 98, 735–738.
  34. Feller, W. An Introduction to Probability Theory and Its Applications 2; Wiley: New York, NY, USA, 1966.
  35. Gupta, R.D.; Kundu, D. A New Class of Weighted Exponential Distributions. Statistics 2009, 43, 621–634.
  36. Bhaumik, D.K.; Kapur, K.; Gibbons, R.D. Testing Parameters of a Gamma Distribution for Small Samples. Technometrics 2009, 51, 326–334.
  37. Bar-Lev, S.K.; Batsidis, A.; Economou, P. Tweedie, Bar-Lev, and Enis class of leptokurtic distributions as a candidate for modeling real data. Commun. Stat. Case Stud. Data Anal. Appl. 2021, 7, 229–248.
  38. Korolyuk, V.S.; Borovskikh, Y.V. Theory of U-Statistics; Kluwer: Dordrecht, The Netherlands, 1994.
Figure 1. Power curves of the test statistic for 21 alternative distributions, where the null distribution is gamma. The variance $M$ was estimated using the MLE. This figure corresponds to Table 1 in the main text ($n = 50$; $\alpha = 0.05$).
Figure 2. Power curves of the test statistic for 21 alternative distributions, where the null distribution is gamma. The variance $M$ was estimated using the MLE. This figure corresponds to Table 2 in the main text ($n = 100$; $\alpha = 0.05$).
Table 1. Percentage of 10,000 Monte Carlo samples declared to be significant by various tests for the gamma distribution ($n = 50$, $\alpha = 0.05$). The MLE $\hat{\nu}$ was used.

| Alternative | Asym | Btstr | $G_{n,0.5}$ | $T_{n,1}^{(1)}$ | $T_{n,4}^{(2)}$ | Goft |
|---|---|---|---|---|---|---|
| $\mathrm{Gamma}(2,3)$ | 3.15 | 4.00 | 5.12 | 5.28 | 4.84 | 1.56 |
| $\mathrm{Gamma}(2,2)$ | 3.28 | 4.28 | 5.34 | 5.33 | 5.24 | 1.46 |
| $\mathrm{Gamma}(2,1)$ | 3.23 | 4.17 | 5.12 | 5.31 | 5.31 | 1.80 |
| $\mathrm{Gamma}(0.5,1)$ | 1.78 | 2.90 | 4.56 | 5.37 | 4.69 | 1.27 |
| $\mathrm{Gamma}(0.5,2)$ | 1.57 | 2.82 | 4.21 | 5.05 | 4.56 | 1.12 |
| $\mathrm{Gamma}(2,5)$ | 3.16 | 4.04 | 5.64 | 5.52 | 5.56 | 1.62 |
| $\mathrm{Gamma}(1,1)$ | 2.57 | 3.59 | 5.00 | 5.50 | 5.31 | 1.32 |
| $IG(1,0.5)$ | 30.80 | 35.29 | 83.47 | **88.62** | 84.22 | 49.31 |
| $IG(1,1)$ | 27.18 | 30.89 | 64.42 | **66.77** | 64.51 | 34.00 |
| $IG(1,2)$ | 23.40 | 26.31 | 41.09 | **42.08** | 32.27 | 22.32 |
| $IG(1,4)$ | 19.15 | 20.69 | 20.85 | **22.40** | 6.68 | 13.19 |
| $IG(1,8)$ | 15.05 | **15.73** | 7.76 | 12.75 | 0.04 | 8.07 |
| $IG(1,10)$ | 13.87 | **14.21** | 5.05 | 10.54 | 0.01 | 6.76 |
| $LN(0,0.5)$ | 22.45 | 24.10 | 21.82 | **24.18** | 8.14 | 16.03 |
| $LN(0,0.6)$ | 25.48 | 27.98 | 31.16 | **32.31** | 19.24 | 21.47 |
| $LN(0,1)$ | 33.39 | 37.68 | 61.40 | **64.00** | 62.02 | 40.58 |
| $LN(0,1.4)$ | 35.75 | 40.74 | 76.55 | **82.31** | 78.36 | 53.79 |
| $LN(0,2)$ | 37.94 | 43.83 | 72.60 | **92.81** | 88.46 | 63.86 |
| $LN(0,3)$ | 77.09 | 80.80 | 1.24 | **95.38** | 91.10 | 66.60 |
| $LN(0,5)$ | 93.02 | **94.62** | 0.00 | 94.43 | 89.16 | 56.50 |
| $LN(0.5,1)$ | 33.51 | 37.93 | 61.50 | **63.96** | 62.18 | 40.60 |
| $\mathrm{Beta}(2,0.5)$ | 78.96 | 81.64 | 99.63 | **99.82** | 48.00 | 99.72 |
| $\mathrm{Beta}(2,2)$ | 0.00 | 0.01 | **64.88** | 64.49 | 45.64 | 44.75 |
| $Pa(1,1)$ | 81.80 | 84.91 | 96.43 | **99.98** | 70.99 | 98.43 |
| $Pa(1,2)$ | 95.97 | 96.44 | **99.94** | 99.89 | 6.60 | 98.17 |
| $HC(0,1)$ | 57.89 | 63.01 | 83.22 | **90.19** | 87.04 | 79.14 |
| $SP(1)$ | 51.95 | 57.19 | 77.58 | **92.49** | 91.34 | 76.93 |
| $SP(2)$ | 33.99 | 38.51 | 58.67 | **59.07** | 59.05 | 44.37 |

For all alternatives (i.e., except the gamma), the best performing result in each row is given in bold face.
Table 2. Percentage of 10,000 Monte Carlo samples declared to be significant by various tests for the gamma distribution ($n = 100$, $\alpha = 0.05$). The MLE $\hat{\nu}$ was used.

| Alternative | Asym | Btstr | $G_{n,0.5}$ | $T_{n,1}^{(1)}$ | $T_{n,4}^{(2)}$ | Goft |
|---|---|---|---|---|---|---|
| $\mathrm{Gamma}(2,3)$ | 3.54 | 4.31 | 5.14 | 5.03 | 4.94 | 2.06 |
| $\mathrm{Gamma}(2,2)$ | 3.73 | 4.62 | 4.96 | 5.11 | 5.47 | 2.04 |
| $\mathrm{Gamma}(2,1)$ | 4.17 | 5.18 | 5.21 | 5.16 | 5.11 | 2.33 |
| $\mathrm{Gamma}(0.5,1)$ | 2.25 | 3.69 | 4.84 | 5.38 | 5.24 | 1.97 |
| $\mathrm{Gamma}(0.5,2)$ | 2.62 | 4.15 | 4.52 | 5.04 | 4.68 | 2.07 |
| $\mathrm{Gamma}(2,5)$ | 4.05 | 4.74 | 5.53 | 5.50 | 5.53 | 2.31 |
| $\mathrm{Gamma}(1,1)$ | 3.07 | 4.18 | 4.78 | 5.34 | 5.48 | 2.01 |
| $IG(1,0.5)$ | 47.21 | 52.27 | 99.01 | **99.67** | 92.70 | 84.76 |
| $IG(1,1)$ | 41.67 | 46.02 | 92.50 | **94.77** | 84.11 | 67.27 |
| $IG(1,2)$ | 37.00 | 40.15 | 71.20 | **74.37** | 50.81 | 47.06 |
| $IG(1,4)$ | 30.66 | 32.73 | 41.35 | **45.33** | 5.08 | 29.07 |
| $IG(1,8)$ | 23.39 | **24.15** | 17.39 | 23.86 | 0.01 | 16.69 |
| $IG(1,10)$ | 21.10 | **21.68** | 12.27 | 19.75 | 0.00 | 13.89 |
| $LN(0,0.5)$ | 35.94 | 38.17 | 40.71 | **45.37** | 6.72 | 33.24 |
| $LN(0,0.6)$ | 42.02 | 44.92 | 55.53 | **59.12** | 27.88 | 44.23 |
| $LN(0,1)$ | 54.04 | 57.73 | 89.24 | **91.49** | 77.23 | 73.54 |
| $LN(0,1.4)$ | 57.54 | 62.26 | 97.10 | **98.49** | 87.61 | 86.47 |
| $LN(0,2)$ | 58.01 | 64.14 | 77.64 | **99.77** | 91.71 | 93.12 |
| $LN(0,3)$ | 92.18 | 93.59 | 0.05 | **99.97** | 99.51 | 95.82 |
| $LN(0,5)$ | 98.36 | 98.77 | 0.00 | **99.96** | 99.61 | 93.74 |
| $LN(0.5,1)$ | 54.00 | 58.51 | 89.12 | **91.41** | 77.58 | 73.87 |
| $\mathrm{Beta}(2,0.5)$ | 99.03 | 99.25 | **100.00** | **100.00** | 54.70 | **100.00** |
| $\mathrm{Beta}(2,2)$ | 0.63 | 5.42 | 92.12 | **92.43** | 78.20 | 90.08 |
| $Pa(1,1)$ | 95.05 | 95.93 | 94.74 | **100.00** | 44.17 | **100.00** |
| $Pa(1,2)$ | 99.91 | 99.92 | **100.00** | **100.00** | 0.49 | **100.00** |
| $HC(0,1)$ | 81.95 | 84.71 | 88.58 | **99.31** | 71.40 | 98.00 |
| $SP(1)$ | 75.69 | 79.69 | 78.12 | **99.77** | 80.91 | 97.55 |
| $SP(2)$ | 57.81 | 62.56 | **85.02** | 84.67 | 72.94 | 76.30 |

For all alternatives (i.e., except the gamma), the best performing result in each row is given in bold face.
Table 3. Percentage of 10,000 Monte Carlo samples declared to be significant by various tests for the modified Bessel distribution $\mathrm{Bessel}(\mu, \phi)$. The MLE $\hat{\nu}$ was used, and $\alpha = 0.05$.

| Alternative | Asym, $n = 20$ | Asym, $n = 50$ | Btstr, $n = 20$ | Btstr, $n = 50$ |
|---|---|---|---|---|
| $\mathrm{Bessel}(1,1)$ | 1.15 | 2.70 | 2.10 | 3.70 |
| $\mathrm{Bessel}(2,1)$ | 0.61 | 1.71 | 1.10 | 2.70 |
| $\mathrm{Bessel}(0.5,1)$ | 1.63 | 2.64 | 2.50 | 3.20 |
| $\mathrm{Bessel}(1,2)$ | 0.35 | 3.26 | 0.40 | 3.70 |
| $\mathrm{Bessel}(0.5,2)$ | 0.66 | 2.09 | 1.20 | 3.10 |
| $\mathrm{Bessel}(1,0.5)$ | 2.30 | 3.28 | 3.60 | 4.10 |

Table 4. IIT Kanpur and vinyl chloride data sets investigated in Section 4.2.

IIT Kanpur Data: 29, 25, 50, 15, 13, 27, 15, 18, 7, 7, 8, 19, 12, 18, 5, 21, 15, 86, 21, 15, 14, 39, 15, 14, 70, 44, 6, 23, 58, 19, 50, 23, 11, 6, 34, 18, 28, 34, 12, 37, 4, 60, 20, 23, 40, 65, 19, 31

Vinyl Chloride Data: 5.1, 1.2, 1.3, 0.6, 0.5, 2.4, 0.5, 1.1, 8, 0.8, 0.4, 0.6, 0.9, 0.4, 2, 0.5, 5.3, 3.2, 2.7, 2.9, 2.5, 2.3, 1, 0.2, 0.1, 0.1, 1.8, 0.9, 2, 4, 6.8, 1.2, 0.4, 0.2

Table 5. Summary statistics for IIT Kanpur and vinyl chloride data sets.

| Statistic | IIT Kanpur Data | Vinyl Chloride Data |
|---|---|---|
| Sample size | 48 | 34 |
| Mean | 25.90 | 1.88 |
| Median | 19.5 | 1.15 |
| Standard deviation | 18.60 | 1.95 |
| Inter-quartile range | 20 | 1.98 |

Table 6. Results of p-values for the test statistic in the special case of $\gamma_0 = 2.5$ for the two data sets.

| Test | IIT Kanpur Data | Vinyl Chloride Data |
|---|---|---|
| Asym | 0.767 | 0.849 |
| Btstr | 0.493 | 0.431 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Bar-Lev, S.K.; Batsidis, A.; Einbeck, J.; Liu, X.; Ren, P. Cumulant-Based Goodness-of-Fit Tests for the Tweedie, Bar-Lev and Enis Class of Distributions. Mathematics 2023, 11, 1603. https://doi.org/10.3390/math11071603
