Article

A New Class of Generalized Probability-Weighted Moment Estimators for the Pareto Distribution

NOVA School of Science and Technology (FCT NOVA) and CMA, Campus de Caparica, NOVA University Lisbon, 2829-516 Caparica, Portugal
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2023, 11(5), 1076; https://doi.org/10.3390/math11051076
Submission received: 16 December 2022 / Revised: 15 February 2023 / Accepted: 16 February 2023 / Published: 21 February 2023
(This article belongs to the Special Issue Computational Statistical Methods and Extreme Value Theory)

Abstract

Estimation based on probability-weighted moments is a well-established method and an excellent alternative to the classic method of moments or the maximum likelihood method, especially for small sample sizes. In this research, we developed a new class of estimators for the parameters of the Pareto type I distribution. A generalization of the probability-weighted moments approach is the foundation for this new class of estimators. It has the advantage of being valid in the entire parameter space of the Pareto distribution. We established the asymptotic normality of the new estimators and applied them to simulated and real datasets in order to illustrate their finite sample behavior. The results of comparisons with the most used estimation methods were also analyzed.

1. Introduction

The Pareto distribution has resulted from the work of the economist Vilfredo Pareto [1]. Pareto observed that the number of taxpayers with an income greater than x could be approximated by $b x^{-a}$, where a and b are positive parameters. This fact led to the introduction of several variants of the Pareto distribution, with a survival function proportional to $x^{-a}$. The most common Pareto distribution, often referred to as Pareto type I, will be investigated in this work. Given a random variable X with a Pareto type I distribution, the distribution function (d.f.) is as follows:
$$F(x \mid a, c) = 1 - \left(\frac{x}{c}\right)^{-a}, \quad x > c,\ a > 0,\ c > 0, \tag{1}$$
where a and c are the shape and scale parameters, respectively. The corresponding probability density function is
$$f(x \mid a, c) = \frac{a c^{a}}{x^{a+1}}, \quad x > c,\ a > 0,\ c > 0. \tag{2}$$
The parameter c corresponds to the lower bound for the support of the random variable, whereas the parameter a quantifies the heaviness of the right tail and is also referred to as the tail or Pareto index [2,3]. As a decreases, the tail becomes heavier. The d.f. in (1) is inverted to produce the associated quantile function of X, which is represented by
$$Q(p \mid a, c) = c\,(1 - p)^{-1/a}, \quad 0 < p < 1,\ a > 0,\ c > 0, \tag{3}$$
where the lower tail probability is denoted by p. Despite the simple analytic expressions in Equations (1)–(3), this model has been successfully applied in a large number of different fields, such as bibliometrics, demography, economy, geology, insurance and finance, among others. An alternative form of the Pareto model results from the change in location $X - c$. This alternative form is known as the Pareto type II or the Lomax ([4,5]) distribution. Another related model is the generalized Pareto distribution [6]. Under a semiparametric framework, the Pareto type I distribution is often used in the analysis of extreme events. Under such a framework, we use Equation (1) as an upper tail model and work with the reciprocal parameter $\xi = 1/a$, the so-called extreme value index. Detailed discussions and reviews of $\xi$ estimation for Pareto-tailed models can be found in works by Beirlant et al. [7,8,9], Gomes and Guillou [10] and Peng and Qi [11], among others.
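As an illustration of Equations (1)–(3), the following minimal Python sketch evaluates the d.f., the density and the quantile function, and draws a sample by applying the quantile function to uniform random numbers. The function names and the parameter values are illustrative assumptions and are not taken from the article.

```python
import random

def pareto_cdf(x, a, c):
    """Distribution function (1): F(x | a, c) = 1 - (x / c) ** (-a) for x > c."""
    return 1.0 - (x / c) ** (-a) if x > c else 0.0

def pareto_pdf(x, a, c):
    """Density (2): f(x | a, c) = a * c**a / x**(a + 1) for x > c."""
    return a * c ** a / x ** (a + 1) if x > c else 0.0

def pareto_quantile(p, a, c):
    """Quantile function (3): Q(p | a, c) = c * (1 - p) ** (-1 / a), 0 <= p < 1."""
    return c * (1.0 - p) ** (-1.0 / a)

def pareto_sample(n, a, c, rng):
    """Inverse-transform sampling: apply Q to uniform draws on [0, 1)."""
    return [pareto_quantile(rng.random(), a, c) for _ in range(n)]

# Illustrative values only: shape a = 2, scale c = 1.5.
rng = random.Random(1)
xs = pareto_sample(5, a=2.0, c=1.5, rng=rng)
print(xs)
# Round trip: F(Q(p)) should return p up to floating-point error.
print(pareto_cdf(pareto_quantile(0.9, 2.0, 1.5), 2.0, 1.5))
```

The round trip in the last line is a quick way to confirm that the quantile function really is the inverse of the d.f.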
The estimation of the shape and scale parameters a and c is an important and popular research topic. Although maximum likelihood estimators have optimal properties, such properties are only guaranteed asymptotically. Thus, different estimation methods, which performed better than the maximum likelihood method for small or moderate sample sizes, have been proposed in the literature by many authors. Quandt [12] compared the performance of the maximum likelihood estimator with that of the moments estimator, a least squares estimator and four quantile estimators. Different least squares estimators were examined by Lu and Tau [13], Caeiro et al. [14], Kantar [15] and Kim et al. [16]. Robust estimators of the shape parameter were introduced by Brazauskas and Serfling [17] and Vandewalle et al. [18]. Bayesian estimators can be found in Arnold and Press [19], Rasheed and Al-Gazi [20] and Han [21]. Singh and Guo [22], Caeiro and Gomes [23,24] and Munir et al. [25] considered probability-weighted moment estimators. Bhatti et al. [26,27] proposed modified maximum likelihood estimators and Chen et al. [28] dealt with the estimation of the Pareto parameters with a modification of ranked set sampling.
The purpose of this article is to examine a new method for estimating the shape and scale parameters of a Pareto model. The remainder of the paper is structured as follows. In Section 2, we review the most common estimators for the parameters of the Pareto distribution and introduce the new class of estimators. These estimators, called log-generalized probability-weighted moment estimators, are derived from a modification of the classic probability-weighted moments method. In Section 3, we study the asymptotic results for the new class of estimators. A Monte Carlo simulation study and two real data applications are provided in Section 4 to illustrate the performance of the estimators. Some concluding remarks are given in Section 5.

2. Traditional and New Techniques for Estimating the Parameters of the Pareto Distribution

This section covers some common estimation methods for the shape and scale parameters from the Pareto distribution in (1) and introduces a new estimation procedure. Assume that $X_1, X_2, \ldots, X_n$ is a sample of independent and identically distributed (i.i.d.) random variables, from a Pareto distribution, as defined in (1), with both parameters, a and c, unknown. The sample of non-decreasing order statistics is denoted as $X_{1:n}, X_{2:n}, \ldots, X_{n:n}$.

2.1. Maximum Likelihood Estimators

The maximum likelihood (ML) estimators are found by maximizing the log-likelihood function and have the closed-form expressions
$$\hat{a}_{ML} = \left(\frac{1}{n}\sum_{i=1}^{n} \ln X_i - \ln X_{1:n}\right)^{-1}, \qquad \hat{c}_{ML} = X_{1:n}. \tag{4}$$
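The closed-form ML estimators in (4) translate directly into code. The sketch below is a minimal Python version; the function name `pareto_ml` and the toy sample are assumptions made for illustration only.

```python
import math

def pareto_ml(sample):
    """ML estimates (4): c_hat is the sample minimum, a_hat the reciprocal of
    the mean log-excess over the minimum."""
    n = len(sample)
    c_hat = min(sample)
    mean_log = sum(math.log(x) for x in sample) / n
    # The difference is zero only if all observations coincide.
    a_hat = 1.0 / (mean_log - math.log(c_hat))
    return a_hat, c_hat

# Toy sample whose logarithms are 0, 1 and 2.
a_hat, c_hat = pareto_ml([1.0, math.e, math.e ** 2])
print(a_hat, c_hat)
```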

2.2. Moment Estimators

It is well known that the non-central moments of order k for the Pareto model are expressed as follows:
$$E(X^{k}) = \frac{a c^{k}}{a - k}, \quad \text{if } a > k.$$
In applications, the method of moments based on the first two moments is unpopular because the second moment only exists for $a > 2$, and other moment-based estimators have emerged in the literature. To extend the domain of validity of the estimators based on moments, Quandt [12] considered the first non-central moment of X, $E(X)$, and the first moment of the sample minimum, $E(X_{1:n})$. The sample minimum of a Pareto distribution has a Pareto distribution whose scale and shape parameters are c and $an$, respectively. Quandt obtained the moment (M) estimators by equating the two aforementioned theoretical moments to the corresponding sample moments and solving the system of equations for the parameters of the distribution. The estimators obtained are consistent for $a > 1$ and given by
$$\hat{a}_{M} = \frac{n\bar{X} - X_{1:n}}{n(\bar{X} - X_{1:n})}, \qquad \hat{c}_{M} = \left(1 - \frac{1}{n\hat{a}_{M}}\right) X_{1:n}, \tag{5}$$
where $\bar{X}$ denotes the arithmetic sample mean.
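A minimal Python sketch of the moment estimators in (5) follows. The function name and the three-point sample are illustrative assumptions; the sample is chosen only so that the arithmetic can be followed by hand.

```python
def pareto_moment(sample):
    """Moment (M) estimates (5), built from the sample mean and the sample minimum."""
    n = len(sample)
    x_min = min(sample)
    x_bar = sum(sample) / n
    a_hat = (n * x_bar - x_min) / (n * (x_bar - x_min))
    c_hat = (1.0 - 1.0 / (n * a_hat)) * x_min
    return a_hat, c_hat

# n = 3, mean 2, minimum 1: a_hat = (6 - 1) / (3 * 1) and c_hat = (1 - 1/5) * 1.
a_hat, c_hat = pareto_moment([1.0, 2.0, 3.0])
print(a_hat, c_hat)
```

Note that the scale estimate is always strictly below the sample minimum whenever the shape estimate is positive, which is consistent with c being the lower bound of the support.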

2.3. Probability Weighted Moment Estimators

The probability-weighted moment (PWM) method (Greenwood et al. [29]) is currently a well-established estimation procedure in the field of hydrology. Studies using Monte Carlo simulations demonstrated that, for small sample sizes, PWM estimators outperform other estimation techniques (Hosking et al. [30]). The PWMs of a random variable X, with d.f. F, are defined as
$$M_{k,r,s} = E\left(X^{k} (F(X))^{r} (1 - F(X))^{s}\right), \tag{6}$$
where k, r and s are real numbers. If the mean value $M_{1,0,0}$ exists, then $M_{1,r,s}$ exists for any real positive values r and s. The PWM method generalizes the classic method of moments: when $r = s = 0$, the PWMs $M_{k,0,0}$ are the non-central moments of order k. For models that have a closed-form quantile function, Q, it may be more convenient to compute the PWMs as
$$M_{k,r,s} = \int_{0}^{1} (Q(u))^{k} u^{r} (1 - u)^{s} \, du.$$
More recently, this method was modified for models without an analytic d.f. and quantile function (see Jing et al. [31]). The PWM estimators are derived by equating $M_{k,r,s}$ with their respective sample moments and then solving those equations for the parameters of the distribution. Greenwood et al. [29] and Hosking et al. [30] recommend using $M_{1,r,s}$, since the relations between parameters and moments are usually much simpler. The empirical estimate of $M_{1,r,s}$ is usually less sensitive to outliers and has good properties when the sample size is small. For convenience, several authors chose to use $k = 1$ and non-negative integer values for r and s. This approach will be referred to as the classic PWM method. In addition, when r and s are non-negative integers, it is more convenient to work with the PWMs
$$\alpha_{r} = M_{1,0,r} = E\left(X (1 - F(X))^{r}\right), \quad r = 0, 1, \ldots, \tag{7}$$
or
$$\beta_{r} = M_{1,r,0} = E\left(X (F(X))^{r}\right), \quad r = 0, 1, \ldots. \tag{8}$$
It should be noted that $(F(X))^{r} (1 - F(X))^{s}$ can be represented as a linear combination of powers of $F(X)$ or $1 - F(X)$ for non-negative integers r and s. As a result, we may use the following equations to relate $\alpha_r$ and $\beta_r$:
$$\alpha_{r} = \sum_{j=0}^{r} (-1)^{j} \binom{r}{j} \beta_{j} \quad \text{and} \quad \beta_{r} = \sum_{j=0}^{r} (-1)^{j} \binom{r}{j} \alpha_{j},$$
where $\binom{r}{j}$ denotes the binomial coefficient. Using $\alpha_r$ or $\beta_r$ is equivalent as long as the values for r are non-negative integers that are as small as possible. For non-negative integer values of r, the unbiased estimators of the PWMs $\alpha_r$ and $\beta_r$, defined in (7) and (8), are, respectively (Landwehr et al. [32]),
$$\hat{\alpha}_{r} = \frac{1}{n} \sum_{i=1}^{n-r} \frac{\binom{n-i}{r}}{\binom{n-1}{r}} X_{i:n} \quad \text{and} \quad \hat{\beta}_{r} = \frac{1}{n} \sum_{i=r+1}^{n} \frac{\binom{i-1}{r}}{\binom{n-1}{r}} X_{i:n}. \tag{9}$$
Instead of the unbiased estimators, one may prefer to use the biased estimators
$$\tilde{\alpha}_{r} = \frac{1}{n} \sum_{i=1}^{n} (1 - p_{i:n})^{r} X_{i:n} \quad \text{and} \quad \tilde{\beta}_{r} = \frac{1}{n} \sum_{i=1}^{n} p_{i:n}^{r} X_{i:n}, \tag{10}$$
where r can be a real number and $p_{i:n}$ are the plotting positions; that is, empirical estimates of $F(X_{i:n})$. The options that are most frequently used for plotting positions are
$$p_{i:n} = \frac{i - b}{n}, \quad 0 \le b \le 1,$$
or
$$p_{i:n} = \frac{i - b}{n + 1 - 2b}, \quad -0.5 \le b \le 0.5,$$
where b is a continuity correction factor. Landwehr et al. [33] concluded empirically that moderately biased estimators of the PWMs could produce more accurate estimates of upper quantiles.
For the Pareto distribution in (1), the PWMs in (6) are given by
$$M_{k,r,s} = c^{k}\, B\!\left(s + 1 - \frac{k}{a},\ r + 1\right), \quad s - k/a > -1,\ r > -1,$$
where $B$ stands for the complete beta function. By setting the exponents $(k, r) = (1, 0)$, we obtain the classical PWMs for the Pareto distribution, valid for $a > (1 + s)^{-1}$ and given by
$$\alpha_{s} = M_{1,0,s} = \frac{c}{s + 1 - 1/a}, \quad s > \frac{1}{a} - 1.$$
Singh and Guo [22], Caeiro and Gomes [23,34], Munir et al. [25] and Caeiro et al. [35] took the PWMs $\alpha_0$ and $\alpha_1$ into account and deduced the associated PWM estimators for the shape and scale parameters of the Pareto distribution. Those estimators are
$$\hat{a}_{PWM} = \frac{\hat{\alpha}_{0} - \hat{\alpha}_{1}}{\hat{\alpha}_{0} - 2\hat{\alpha}_{1}} \quad \text{and} \quad \hat{c}_{PWM} = \frac{\hat{\alpha}_{0}\, \hat{\alpha}_{1}}{\hat{\alpha}_{0} - \hat{\alpha}_{1}}, \tag{11}$$
with $\hat{\alpha}_0$ and $\hat{\alpha}_1$ given in (9). As stated earlier, the PWM estimators in (11) are only defined for a Pareto model with finite mean value ($a > 1$).
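The PWM estimators in (11) can be sketched in a few lines of Python using the unbiased estimator of $\alpha_r$ in (9). The function names and the toy sample are assumptions chosen for illustration, not part of the article.

```python
from math import comb

def alpha_hat(sample, r):
    """Unbiased estimator (9) of alpha_r for a non-negative integer r."""
    xs = sorted(sample)
    n = len(xs)
    # Order statistic X_{i:n} is xs[i - 1]; the binomial weight vanishes for i > n - r.
    total = sum(comb(n - i, r) * xs[i - 1] for i in range(1, n - r + 1))
    return total / (n * comb(n - 1, r))

def pareto_pwm(sample):
    """PWM estimates (11), based on alpha_0 and alpha_1."""
    a0 = alpha_hat(sample, 0)
    a1 = alpha_hat(sample, 1)
    a_hat = (a0 - a1) / (a0 - 2.0 * a1)
    c_hat = a0 * a1 / (a0 - a1)
    return a_hat, c_hat

# For the sample (1, 2, 3): alpha_0 = 2 and alpha_1 = (1 * 1 + 0.5 * 2) / 3 = 2/3.
a_hat, c_hat = pareto_pwm([1.0, 2.0, 3.0])
print(a_hat, c_hat)
```

Nothing in (11) forces the scale estimate below the sample minimum, so in practice it is worth comparing `c_hat` with `min(sample)` after fitting.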

2.4. Extended Class of PWM Estimators

The theoretical PWMs defined in (6) can have any real values for the exponents k, r and s; however, early applications only considered non-negative integer exponents. Rasmussen [36] explored PWMs with real exponents and referred to this method as generalized PWM (GPWM) to distinguish it from the classic PWM approach. He found that, in most cases, the GPWM method outperforms the classic PWM method. To simplify the GPWM method, it is recommended to limit the class of GPWMs by setting $(k, r) = (1, 0)$ or $(k, s) = (1, 0)$. This restriction leads to the use of simpler analytical formulas for GPWMs. The GPWM estimators are the ones in (10) for any real value of r. Another version of the PWM method was introduced by Caeiro and Prata Gomes [37]. The authors worked in the context of Pareto-type tails and considered a different type of PWM, specified by
$$M^{*}_{g,r,s} = E\left(g(X) (F(X))^{r} (1 - F(X))^{s}\right) \tag{12}$$
with $g(x) = \ln(x)$, $r = 0$ and $s = 0, 1$. Such a class of PWMs was named log PWM (LPWM) and has the advantage of extending the domain of validity of the estimators to the complete parameter space for the Pareto model. Caeiro and Mateus [38] considered the LPWMs in (12) with $r = 0$ and studied the corresponding LPWMs for the Pareto model:
$$l_{s} = M^{*}_{\ln,0,s} = \frac{\ln(c)}{1 + s} + \frac{1}{a (1 + s)^{2}}, \quad s > -1. \tag{13}$$
If we take into consideration the LPWMs $l_0$ and $l_1$, the respective LPWM estimators of the shape and scale parameters of the Pareto distribution in (1) are, respectively,
$$\hat{a}_{LPWM} = \frac{1}{2\hat{l}_{0} - 4\hat{l}_{1}} \quad \text{and} \quad \hat{c}_{LPWM} = \exp\left(4\hat{l}_{1} - \hat{l}_{0}\right), \tag{14}$$
where $\hat{l}_s$, $s = 0, 1$, are the unbiased empirical estimators of $l_s$ given by
$$\hat{l}_{s} = \frac{1}{n} \sum_{i=1}^{n-s} \frac{\binom{n-i}{s}}{\binom{n-1}{s}} \ln X_{i:n}. \tag{15}$$
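The LPWM estimators in (14) only require the two log-moments $\hat{l}_0$ and $\hat{l}_1$ from (15). A minimal Python sketch, with function names and a toy sample that are purely illustrative assumptions, is:

```python
import math
from math import comb

def l_hat(sample, s):
    """Unbiased empirical log-PWM (15) for a non-negative integer s."""
    xs = sorted(sample)
    n = len(xs)
    total = sum(comb(n - i, s) * math.log(xs[i - 1]) for i in range(1, n - s + 1))
    return total / (n * comb(n - 1, s))

def pareto_lpwm(sample):
    """LPWM estimates (14), based on l_0 and l_1."""
    l0 = l_hat(sample, 0)
    l1 = l_hat(sample, 1)
    a_hat = 1.0 / (2.0 * l0 - 4.0 * l1)
    c_hat = math.exp(4.0 * l1 - l0)
    return a_hat, c_hat

# Toy sample with logarithms 1, 2, 3, so that l_0 = 2 and l_1 = 2/3.
sample = [math.exp(1.0), math.exp(2.0), math.exp(3.0)]
a_hat, c_hat = pareto_lpwm(sample)
print(a_hat, c_hat)
```

Because only logarithms of the data enter the estimators, no moment of X needs to exist, which is why this approach covers the whole parameter space.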
Recently, Chen [39] introduced an extended class of GPWMs by evaluating the PWMs in (12) with g a suitable continuous function and r and s any real values. Mateus and Caeiro [40] considered the extended class of GPWMs with $g(x) = \ln(x)$ for a rescaled sample of the Pareto model. This approach, called log-generalized probability-weighted (LGPWM), uses one theoretical moment and only provides an estimator for the shape parameter of the Pareto distribution. For the estimation of the scale parameter, Mateus and Caeiro [40] used an estimator similar to the moment estimator, $\hat{c}_{M}$.

2.5. New Class of LGPWM Estimators

We now introduce a new LGPWM class of estimators for the Pareto distribution that provides shape and scale estimators and generalizes the LPWM estimators in (14). The new LGPWM estimators are built using the moments $l_s$ in (13) for any real value of $s > -1$. Then, for each real s, the corresponding empirical (biased) estimator is provided by
$$\tilde{l}_{s} = \frac{1}{n} \sum_{i=1}^{n} (1 - p_{i:n})^{s} \ln X_{i:n}, \tag{16}$$
where $p_{i:n}$ are the plotting positions. To estimate the two parameters of the Pareto distribution, we shall consider the theoretical moments $l_{s_1}$ and $l_{s_2}$ in (13) with $s_1 < s_2$. Equating the moments $l_{s_1}$ and $l_{s_2}$ to the corresponding empirical estimates in (16) and solving the system of equations for the parameters a and c, we obtain the following estimators:
$$\hat{a}_{LGPWM} = \frac{s_2 - s_1}{(1 + s_1)(1 + s_2)\left[(1 + s_1)\tilde{l}_{s_1} - (1 + s_2)\tilde{l}_{s_2}\right]}, \tag{17}$$
and
$$\hat{c}_{LGPWM} = \exp\left(-\frac{(1 + s_1)^{2}\, \tilde{l}_{s_1} - (1 + s_2)^{2}\, \tilde{l}_{s_2}}{s_2 - s_1}\right), \tag{18}$$
where $s_1 < s_2$. The tuning parameters $s_1$ and $s_2$ should be chosen carefully in order to obtain a good fit of the sample data. A possible selection of $s_1$ and $s_2$ will be presented in Section 4.
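A minimal Python sketch of the LGPWM estimators (17) and (18) is given below. It assumes plotting positions of the form $p_{i:n} = (i - b)/n$ with a default b = 0.35 (the value used later in Section 4); the function names, the random seed and the simulated parameter values are illustrative assumptions.

```python
import math
import random

def l_tilde(sample, s, b=0.35):
    """Biased empirical log-PWM (16) with plotting positions p = (i - b) / n."""
    xs = sorted(sample)
    n = len(xs)
    return sum((1.0 - (i - b) / n) ** s * math.log(x)
               for i, x in enumerate(xs, start=1)) / n

def pareto_lgpwm(sample, s1, s2, b=0.35):
    """LGPWM estimates (17)-(18) for real tuning parameters -1 < s1 < s2."""
    if not (-1.0 < s1 < s2):
        raise ValueError("tuning parameters must satisfy -1 < s1 < s2")
    l1 = l_tilde(sample, s1, b)
    l2 = l_tilde(sample, s2, b)
    d1 = (1 + s1) * l1 - (1 + s2) * l2            # estimates (s2 - s1) / (a (1 + s1)(1 + s2))
    d2 = (1 + s1) ** 2 * l1 - (1 + s2) ** 2 * l2  # estimates -(s2 - s1) ln c
    a_hat = (s2 - s1) / ((1 + s1) * (1 + s2) * d1)
    c_hat = math.exp(-d2 / (s2 - s1))
    return a_hat, c_hat

# Simulated check with illustrative values a = 0.75, c = 0.5 (infinite mean).
rng = random.Random(7)
data = [0.5 * (1.0 - rng.random()) ** (-1.0 / 0.75) for _ in range(2000)]
a_hat, c_hat = pareto_lgpwm(data, s1=0.0, s2=1.0)
print(a_hat, c_hat)
```

With $(s_1, s_2) = (0, 1)$ the formulas reduce to the LPWM form in (14), apart from the use of the biased moment estimator (16) instead of (15).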

3. Distributional Behavior of the LGPWM Estimators

To better understand the behavior of the estimators under consideration, and in order to compare their relative performance with other established estimators from the literature, it is important to study their sampling distribution. Unfortunately, for the estimators depending on a weighted average of the complete set of order statistics, the exact distribution cannot be derived analytically. As a compromise, we will study the asymptotic sampling distribution of the estimators considered here. Such asymptotic distributions can be used as an approximation to the exact distribution for large values of n and usually provide a good approximation for samples of sizes larger than 50.
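In the following, $\xrightarrow{d}$ and $\overset{d}{=}$ stand, respectively, for convergence and equality in distribution. Next, we present, without proof, in Proposition 1 and Proposition 2, the non-degenerate asymptotic distribution of the commonly known estimators from the literature given in (4), (5) and (11).
Proposition 1
(Mateus and Caeiro [40,41]). Suppose that $(X_1, X_2, \ldots, X_n)$ is an i.i.d. sample from the Pareto population with d.f. in (1). Then,
$$\sqrt{n}\left(\hat{a}_{ML} - a\right) \xrightarrow[n \to \infty]{d} N\left(0, a^{2}\right), \tag{19}$$
$$\sqrt{n}\left(\hat{a}_{M} - a\right) \xrightarrow[n \to \infty]{d} N\left(0, \frac{a (a - 1)^{2}}{a - 2}\right), \quad \text{if } a > 2, \tag{20}$$
and
$$\sqrt{n}\left(\hat{a}_{PWM} - a\right) \xrightarrow[n \to \infty]{d} N\left(0, \frac{a (a - 1)(2a - 1)^{2}}{(a - 2)(3a - 2)}\right), \quad \text{if } a > 2, \tag{21}$$
where $N(\mu, \sigma^2)$ represents a normal random variable with mean value μ and variance $\sigma^2$.
Proposition 2.
Under the conditions of Proposition 1, we have
$$\left((1 - n^{-1})^{-1/a} - 1\right)^{-1} \left(\frac{\hat{c}_{ML}}{c} - 1\right) \xrightarrow[n \to \infty]{d} Exp(1), \tag{22}$$
$$\left((1 - n^{-1})^{-1/a} - 1\right)^{-1} \left(\frac{\hat{c}_{M}}{c} - 1\right) \xrightarrow[n \to \infty]{d} Exp(1), \quad \text{if } a > 2, \tag{23}$$
and
$$\sqrt{n}\left(\frac{\hat{c}_{PWM}}{c} - 1\right) \xrightarrow[n \to \infty]{d} N\left(0, \frac{a - 1}{a (3a - 2)(a - 2)}\right), \quad \text{if } a > 2, \tag{24}$$
where $Exp(1)$ refers to a standard exponential random variable with d.f.
$$F_{E}(x) = 1 - e^{-x}, \quad x > 0. \tag{25}$$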
The following lemma and proposition are required to state the non-degenerate asymptotic limit behavior of the LGPWM estimators.
Lemma 1.
Let X be a Pareto random variable with d.f. given in (1) and E a standard exponential random variable with d.f. given in Equation (25). Then,  ln X  has a shifted and re-scaled standard exponential distribution (Arnold [42]):
$$\ln X \overset{d}{=} \ln c + \frac{1}{a} E.$$
Moreover, since the previous relation between the Pareto and exponential distributions is strictly increasing, it follows that, for a sample of size n,
$$\ln X_{i:n} \overset{d}{=} \ln c + \frac{1}{a} E_{i:n},$$
where $E_{1:n} \le E_{2:n} \le \cdots \le E_{n:n}$ are the non-decreasing order statistics from n mutually independent and identically distributed standard exponential random variables.
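The first relation in Lemma 1 can be checked directly from the d.f. in (1). A short derivation, in the notation used above, is:

$$P(\ln X \le t) = F(e^{t} \mid a, c) = 1 - \left(\frac{e^{t}}{c}\right)^{-a} = 1 - e^{-a (t - \ln c)}, \quad t > \ln c,$$

so that $a(\ln X - \ln c)$ has the standard exponential d.f. in (25), which is the same statement as $\ln X \overset{d}{=} \ln c + E/a$.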
Proposition 3.
Consider a sample of size n from a Pareto population and define
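$$D_{s_1,s_2}(\omega) = \frac{1}{n} \sum_{i=1}^{n} \left[(1 + s_1)^{\omega} \left(1 - \frac{i}{n}\right)^{s_1} - (1 + s_2)^{\omega} \left(1 - \frac{i}{n}\right)^{s_2}\right] \ln X_{i:n}, \tag{26}$$
with $-0.5 < s_1 < s_2$ and any real ω. The asymptotic limit distribution
$$\sqrt{n}\left(D_{s_1,s_2}(\omega) - \mu_{D(\omega)}\right) \xrightarrow[n \to \infty]{d} N\left(0, \sigma^{2}_{D(\omega)}\right) \tag{27}$$
holds true, with
$$\mu_{D(\omega)} = \ln(c)\left[(1 + s_1)^{\omega - 1} - (1 + s_2)^{\omega - 1}\right] + \frac{(1 + s_1)^{\omega - 2} - (1 + s_2)^{\omega - 2}}{a},$$
and
$$\sigma^{2}_{D(\omega)} = \frac{1}{a^{2}} \left[\frac{(1 + s_1)^{2(\omega - 1)}}{1 + 2 s_1} + \frac{(1 + s_2)^{2(\omega - 1)}}{1 + 2 s_2} - \frac{2 (1 + s_1)^{\omega - 1} (1 + s_2)^{\omega - 1}}{1 + s_1 + s_2}\right].$$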
Proof of Proposition 3.
Using Lemma 1 we can write
$$D_{s_1,s_2}(\omega) = \ln(c)\, T_{0} + \frac{1}{a} T_{n},$$
with
$$T_{0} = \frac{1}{n} \sum_{i=1}^{n} J(i/n), \qquad T_{n} = \frac{1}{n} \sum_{i=1}^{n} J(i/n)\, E_{i:n},$$
and
$$J\left(\frac{i}{n}\right) = (1 + s_1)^{\omega} \left(1 - \frac{i}{n}\right)^{s_1} - (1 + s_2)^{\omega} \left(1 - \frac{i}{n}\right)^{s_2}.$$
Hence, note that $T_0$ converges toward $\int_{0}^{1} J(x)\, dx = (1 + s_1)^{\omega - 1} - (1 + s_2)^{\omega - 1}$. By utilizing the asymptotic result in the study by Arnold et al. [43] (p. 229), for linear functions of order statistics, we obtain
$$\sqrt{n}\left(T_{n} - \mu_{T_n}\right) \xrightarrow[n \to \infty]{d} N\left(0, \sigma^{2}_{T_n}\right), \quad -0.5 < s_1 < s_2,$$
with
$$\mu_{T_n} = \int_{0}^{\infty} x\, J(1 - e^{-x})\, e^{-x}\, dx = (1 + s_1)^{\omega - 2} - (1 + s_2)^{\omega - 2}$$
and
$$\sigma^{2}_{T_n} = 2 \int_{0}^{\infty} J(1 - e^{-x}) (1 - e^{-x}) \int_{x}^{\infty} J(1 - e^{-y})\, e^{-y}\, dy\, dx = \frac{(1 + s_1)^{2(\omega - 1)}}{1 + 2 s_1} + \frac{(1 + s_2)^{2(\omega - 1)}}{1 + 2 s_2} - \frac{2 (1 + s_1)^{\omega - 1} (1 + s_2)^{\omega - 1}}{1 + s_1 + s_2}.$$
Combining the asymptotic results for $T_0$ and $T_n$, the limit distribution in (27) follows straightforwardly. □
Next, we establish the non-degenerate asymptotic behavior of the LGPWM estimators in (17) and (18).
Proposition 4.
Let us consider the conditions of Proposition 3. Then,
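$$\sqrt{n}\left(\hat{a}_{LGPWM} - a\right) \xrightarrow[n \to \infty]{d} N\left(0, \frac{2 a^{2} (1 + s_1)^{2} (1 + s_2)^{2}}{(1 + 2 s_1)(1 + 2 s_2)(1 + s_1 + s_2)}\right), \tag{29}$$
and
$$\sqrt{n}\left(\hat{c}_{LGPWM} - c\right) \xrightarrow[n \to \infty]{d} N\left(0, \frac{c^{2}}{a^{2} (s_2 - s_1)^{2}} \left[\frac{(1 + s_1)^{2}}{1 + 2 s_1} + \frac{(1 + s_2)^{2}}{1 + 2 s_2} - \frac{2 (1 + s_1)(1 + s_2)}{1 + s_1 + s_2}\right]\right), \tag{30}$$
with $-0.5 < s_1 < s_2$.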
Proof. 
First, notice that we can write the LGPWM estimators in (17) and (18) as
$$\hat{a}_{LGPWM} = \frac{s_2 - s_1}{(1 + s_1)(1 + s_2)\, D_{s_1,s_2}(1)}$$
and
$$\hat{c}_{LGPWM} = \exp\left(-\frac{D_{s_1,s_2}(2)}{s_2 - s_1}\right),$$
with $D_{s_1,s_2}(\omega)$ in (26).
Let $\hat{\xi}_n = \frac{(1 + s_1)(1 + s_2)}{s_2 - s_1} D_{s_1,s_2}(1)$. Then, invoking Proposition 3 with $\omega = 1$, we obtain
$$\sqrt{n}\left(\hat{\xi}_{n} - \frac{1}{a}\right) \xrightarrow[n \to \infty]{d} N\left(0, \frac{2 (1 + s_1)^{2} (1 + s_2)^{2}}{a^{2} (1 + 2 s_1)(1 + 2 s_2)(1 + s_1 + s_2)}\right).$$
Noticing that $\hat{a}_{LGPWM} = 1/\hat{\xi}_n$ and applying the delta method, the asymptotic result in (29) is established. Then, defining $\hat{\gamma}_n = -\frac{D_{s_1,s_2}(2)}{s_2 - s_1}$ and using the result from Proposition 3 again, with $\omega = 2$, we obtain
$$\sqrt{n}\left(\hat{\gamma}_{n} - \ln(c)\right) \xrightarrow[n \to \infty]{d} N\left(0, \frac{\frac{(1 + s_1)^{2}}{1 + 2 s_1} + \frac{(1 + s_2)^{2}}{1 + 2 s_2} - \frac{2 (1 + s_1)(1 + s_2)}{1 + s_1 + s_2}}{a^{2} (s_2 - s_1)^{2}}\right).$$
Applying the delta method to $\sqrt{n}\left(\exp(\hat{\gamma}_n) - c\right)$, we obtain the limit distribution in (30). □
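The passage from Proposition 3 to (29) relies on one algebraic simplification that the proof leaves implicit. As a check, for $\omega = 1$ the bracket in the variance of Proposition 3 reduces to

$$\frac{1}{1 + 2 s_1} + \frac{1}{1 + 2 s_2} - \frac{2}{1 + s_1 + s_2} = \frac{2 (s_2 - s_1)^{2}}{(1 + 2 s_1)(1 + 2 s_2)(1 + s_1 + s_2)},$$

because, over the common denominator, the numerator is $2 (1 + s_1 + s_2)^{2} - 2 (1 + 2 s_1)(1 + 2 s_2) = 2 (s_2 - s_1)^{2}$. Multiplying by $(1 + s_1)^{2} (1 + s_2)^{2} / \left(a^{2} (s_2 - s_1)^{2}\right)$ gives the asymptotic variance of the rescaled statistic whose reciprocal is $\hat{a}_{LGPWM}$, and the delta method with $g(\xi) = 1/\xi$, for which $g'(1/a) = -a^{2}$, multiplies that variance by $a^{4}$ and yields the variance in (29).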
Remark 1.
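Since $\hat{l}_s$ and $\tilde{l}_s$ defined in (15) and (16), respectively, are asymptotically equivalent, applying Proposition 4 with $(s_1, s_2) = (0, 1)$ gives, after straightforward computations, the following asymptotic limit distribution for the LPWM estimators in (14):
$$\sqrt{n}\left(\hat{a}_{LPWM} - a\right) \xrightarrow[n \to \infty]{d} N\left(0, \frac{4 a^{2}}{3}\right) \tag{31}$$
and
$$\sqrt{n}\left(\hat{c}_{LPWM} - c\right) \xrightarrow[n \to \infty]{d} N\left(0, \frac{c^{2}}{3 a^{2}}\right). \tag{32}$$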
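For the reader who wants to verify the two variances in Remark 1, substituting $s_1 = 0$ and $s_2 = 1$ into the variances of the LGPWM estimators in (29) and (30) gives

$$\frac{2 a^{2} (1 + 0)^{2} (1 + 1)^{2}}{(1 + 0)(1 + 2)(1 + 0 + 1)} = \frac{8 a^{2}}{6} = \frac{4 a^{2}}{3}, \qquad \frac{c^{2}}{a^{2} (1 - 0)^{2}} \left[\frac{1}{1} + \frac{4}{3} - \frac{2 \cdot 1 \cdot 2}{2}\right] = \frac{c^{2}}{3 a^{2}}.$$

The first value is larger than the ML variance $a^{2}$ in (19) by a factor of 4/3.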

4. Numerical Results

In this section, we analyze simulated and real datasets to assess the performance of the estimation procedures discussed in Section 2. For the LGPWM estimators in (17) and (18), we used the empirical values of (16) with plotting positions $p_{i:n} = (i - 0.35)/n$, where $1 \le i \le n$. Since the LGPWM estimation method requires two tuning parameters, we first present a data-driven algorithm to determine these parameters.

4.1. Data-Driven Tuning Parameter Selection for the LGPWM Estimator

Consider the LGPWM estimators $\hat{a}$ and $\hat{c}$ with tuning parameters $s_1$ and $s_2$ taking values in $(-0.5, 4]$, with $s_1 < s_2$, discretized in small steps of length 0.1. For each pair of values ($s_1$, $s_2$), we analyze the fit of the Pareto model by comparing the empirical cumulative distribution function, $F_n(x)$, with the fitted cumulative distribution function, $\hat{F}(x) = F(x \mid \hat{a}, \hat{c})$, as defined in (1), using an appropriate goodness-of-fit statistic. Lastly, we select the set of parameters that provides the best fit. To measure the agreement between the observations and the model, the following goodness-of-fit statistics were considered:
  • Kolmogorov–Smirnov (KS) statistic:
    $$D_{n} = \sup_{x} \left|F_{n}(x) - \hat{F}(x)\right| = \max\left\{D_{n}^{+}, D_{n}^{-}\right\} \tag{33}$$
    with
    $$D_{n}^{+} = \max_{1 \le i \le n} \left[\frac{i}{n} - F(X_{i:n} \mid \hat{a}, \hat{c})\right],$$
    and
    $$D_{n}^{-} = \max_{1 \le i \le n} \left[F(X_{i:n} \mid \hat{a}, \hat{c}) - \frac{i - 1}{n}\right].$$
  • Cramér–von Mises (CvM) statistic:
    $$W_{n}^{2} = \sum_{i=1}^{n} \left[\frac{2i - 1}{2n} - F(X_{i:n} \mid \hat{a}, \hat{c})\right]^{2} + \frac{1}{12 n}. \tag{34}$$
  • Modified Anderson–Darling (MAD) statistic (Ahmad et al. [44]):
    $$AU_{n}^{2} = \frac{n}{2} - \sum_{i=1}^{n} \left[2 - \frac{2i - 1}{n}\right] \log\left(1 - F(X_{i:n} \mid \hat{a}, \hat{c})\right) - 2 \sum_{i=1}^{n} F(X_{i:n} \mid \hat{a}, \hat{c}). \tag{35}$$
Relative to the usual Anderson–Darling statistic, the $AU_{n}^{2}$ statistic in (35) gives more weight to the data in the upper tail. Smaller values of the statistics in (33)–(35) correspond to a better fit of the Pareto model. For a statistical power comparison between some of the aforementioned statistical tests, see Razali and Wah [45] or Singla et al. [46].
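The selection procedure described above can be sketched as a plain grid search. The Python code below is one possible reading of it, not the authors' implementation (the article reports that computations were done in R): it evaluates the LGPWM estimates on the grid −0.4, −0.3, …, 4.0, computes a chosen statistic from (33)–(35) and keeps the pair with the smallest value. All function names, the rule for skipping unusable pairs and the synthetic sample are assumptions.

```python
import math

def pareto_cdf(x, a, c):
    """Pareto type I d.f. (1); zero at or below the scale c."""
    return 1.0 - (x / c) ** (-a) if x > c else 0.0

def lgpwm(xs, s1, s2, b=0.35):
    """LGPWM estimates (17)-(18) from an ascending sample, p = (i - b) / n."""
    n = len(xs)

    def l_tilde(s):
        return sum((1.0 - (i - b) / n) ** s * math.log(x)
                   for i, x in enumerate(xs, start=1)) / n

    l1, l2 = l_tilde(s1), l_tilde(s2)
    d1 = (1 + s1) * l1 - (1 + s2) * l2
    d2 = (1 + s1) ** 2 * l1 - (1 + s2) ** 2 * l2
    return (s2 - s1) / ((1 + s1) * (1 + s2) * d1), math.exp(-d2 / (s2 - s1))

def ks(u):
    """Statistic (33); u holds the fitted d.f. at the ordered observations."""
    n = len(u)
    d_plus = max(i / n - ui for i, ui in enumerate(u, start=1))
    d_minus = max(ui - (i - 1) / n for i, ui in enumerate(u, start=1))
    return max(d_plus, d_minus)

def cvm(u):
    """Statistic (34)."""
    n = len(u)
    return sum(((2 * i - 1) / (2 * n) - ui) ** 2
               for i, ui in enumerate(u, start=1)) + 1.0 / (12 * n)

def mad(u):
    """Statistic (35), which weights the upper tail more heavily."""
    n = len(u)
    tail = sum((2 - (2 * i - 1) / n) * math.log(1.0 - ui)
               for i, ui in enumerate(u, start=1))
    return n / 2 - 2 * sum(u) - tail

def select_tuning(sample, stat):
    """Return (statistic, s1, s2, a_hat, c_hat) for the best pair on the grid."""
    xs = sorted(sample)
    grid = [k / 10 for k in range(-4, 41)]  # -0.4, -0.3, ..., 4.0
    best = None
    for j, s1 in enumerate(grid):
        for s2 in grid[j + 1:]:
            try:
                a_hat, c_hat = lgpwm(xs, s1, s2)
            except (ZeroDivisionError, OverflowError):
                continue  # assumption: silently skip degenerate pairs
            if a_hat <= 0:
                continue
            u = [pareto_cdf(x, a_hat, c_hat) for x in xs]
            if max(u) >= 1.0:
                continue  # log(1 - u) would be undefined
            value = stat(u)
            if best is None or value < best[0]:
                best = (value, s1, s2, a_hat, c_hat)
    return best

# Synthetic, deterministic sample: Pareto quantiles for a = 2, c = 1.5.
sample = [1.5 * (1 - (i - 0.5) / 50) ** (-1 / 2.0) for i in range(1, 51)]
best = select_tuning(sample, mad)
print(best)
```

Passing `ks` or `cvm` instead of `mad` gives the other two variants of the selection rule.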

4.2. Simulation Study

In this subsection, we conduct a Monte Carlo simulation experiment to illustrate the performance of the aforementioned estimation methods for the shape and scale parameters of the Pareto model. We refer to the LGPWM estimators as LGPWM-KS, LGPWM-CvM and LGPWM-MAD when the tuning parameters $s_1$ and $s_2$ are selected using the data-driven method described in Section 4.1 based on the statistics in (33), (34) and (35), respectively. All computations were performed in R. We simulated $r = 200$ samples of sizes $n = 15$, 20, 30, 40, 50, 75, 100, 150 and 200 from the Pareto distribution, taking the following combinations of shape and scale parameters: $(a, c) = (0.1, 0.25)$, $(0.25, 0.5)$, $(0.75, 0.5)$ and $(1, 1)$. To evaluate the accuracy and efficiency of the various estimators, we computed the simulated bias and the root mean squared error (RMSE) for each sample size, each set of parameters and the estimator under study.
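The bias and RMSE computation can be sketched as follows. This Python fragment is only an outline of the Monte Carlo loop for one estimator (the ML shape estimator) and one design point; the seed, the function names and the choice of design point are assumptions, and the numbers it prints are not the values reported in the article's tables.

```python
import math
import random

def pareto_sample(n, a, c, rng):
    """Inverse-transform sampling from the Pareto d.f. (1)."""
    return [c * (1.0 - rng.random()) ** (-1.0 / a) for _ in range(n)]

def ml_shape(sample):
    """ML estimator of the shape parameter, Equation (4)."""
    n = len(sample)
    mean_log = sum(math.log(x) for x in sample) / n
    return 1.0 / (mean_log - math.log(min(sample)))

def bias_rmse(estimator, a, c, n, reps, rng):
    """Simulated bias and RMSE of a shape estimator over `reps` samples."""
    errors = [estimator(pareto_sample(n, a, c, rng)) - a for _ in range(reps)]
    bias = sum(errors) / reps
    rmse = math.sqrt(sum(e * e for e in errors) / reps)
    return bias, rmse

rng = random.Random(2023)
bias, rmse = bias_rmse(ml_shape, a=0.75, c=0.5, n=50, reps=200, rng=rng)
print(bias, rmse)
```

Any other estimator of a with the same call signature can be passed in place of `ml_shape`, and the same loop applies to the scale parameter.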
The simulated results are summarized in Table 1 and Table 2. As can be seen from these tables, the estimated biases and root mean squared errors generally tend toward zero for all estimation methods as the sample size increases, except for the M and PWM. This can be explained by the fact that the M estimator of a and both PWM estimators are not consistent if $a \le 1$. Moreover, most of the estimators usually overestimate the target parameter. Regarding the LGPWM estimator, the optimal selection of tuning parameters is obtained through the data-driven method outlined in Section 4.1 using the MAD statistic.
For the estimation of the shape parameter a, the LGPWM-MAD estimator always has the smallest absolute bias and the smallest RMSE if the sample size is small. For larger sample sizes, the ML estimators have the lowest RMSE. In addition, the performance of the ML estimator is always quite close to the LGPWM-MAD estimator. Comparing the performance of all of the estimators for the scale parameter c, it is observed that the M estimator usually has the smallest RMSE. The LGPWM-MAD provides generally good results in terms of absolute bias.

4.3. Real Data Analysis

We now analyze the fit of a Pareto model to two real datasets: the population of the 150 largest metropolitan areas in the world and the estimated number of deaths from major earthquakes.

4.3.1. Population of the Largest Metropolitan Areas in the World

This dataset has the 150 largest cities in the world, by population, and was retrieved from the worldatlas website [47]. Since the webpage with the dataset is no longer available, data can be retrieved using the Wayback Machine website (https://web.archive.org/ (accessed on 15 May 2021)) or in Appendix A. Values were converted to millions ($\times 10^{6}$). In Table 3 we provide the descriptive statistics obtained with the function summary in R software.
If data come from a Pareto distribution, high-order moments might not exist. Therefore, to assess the skewness, we computed the Bowley [48] coefficient of skewness,
$$S_{b} = \frac{q_{3} + q_{1} - 2 q_{2}}{q_{3} - q_{1}} = 0.483,$$
where $q_1$, $q_2$ and $q_3$ are the first, second and third empirical quartiles, respectively. This measure of skewness is robust against extreme values. For other robust measures of skewness, see, among others, Horn [49], Kim and White [50] and Brys et al. [51]. Since $S_b > 0$ and the median is smaller than the mean, we conclude that the underlying model is positively skewed. The histogram and the boxplot of these observations, in Figure 1, confirm the skewness of the data.
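The Bowley coefficient only needs the three sample quartiles. The Python sketch below uses `statistics.quantiles` with the "inclusive" method; which quartile definition the article used is not stated, so this choice, the function name and the toy data are assumptions, and a different definition can change the value slightly.

```python
import statistics

def bowley_skewness(data):
    """Bowley coefficient: (q3 + q1 - 2 * q2) / (q3 - q1)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)

# Toy right-skewed data with quartiles 2, 4 and 8 under the inclusive method.
s_b = bowley_skewness([1, 2, 4, 8, 16])
print(s_b)
```

The coefficient lies between −1 and 1 and is positive when the upper quartile is further from the median than the lower quartile.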
Figure 2 suggests a Paretian behavior of the data. For more details regarding the construction of the Pareto Q-Q plot, see refs. [7,52].
The parameter estimates for the fitted Pareto distribution, provided by the ML, L, PWM and LGPWM estimators, and the values of the $D_n$, $W_n^2$ and $AU_n^2$ test statistics in (33), (34) and (35), respectively, are shown in Table 4. The smallest value of each test statistic is presented in bold. We took all possible combinations of values $(s_1, s_2)$ and chose the three combinations that provided the smallest values for each of the aforementioned test statistics. The values of the test statistics show that the new LGPWM estimators are relatively better than any other considered estimators. The choice of parameters $s_1 = 0.9$ and $s_2 = 1.0$ for the LGPWM estimators provides the smallest or second smallest value of the test statistics $D_n$, $W_n^2$ and $AU_n^2$. Note that not all methods perform well: the $\hat{c}_{PWM}$ estimator produced an inadequate estimate ($\hat{c}_{PWM} > x_{1:150}$).

4.3.2. Estimated Number of Deaths in Major Earthquakes

The second data set is available in Clark [53] and contains the estimated number of deaths in international earthquakes (from 1900 to 2011). The values of the data are as follows: 316,000, 242,769, 227,898, 200,000, 142,800, 110,000, 87,587, 86,000, 72,000, 70,000, 50,000, 40,900, 32,700, 32,610, 31,000, 30,000, 28,000, 25,000, 23,000, 20,896, 20,085. Values were converted to thousands ($\times 10^{3}$). Table 5 shows the descriptive statistics of the data.
The Bowley coefficient of skewness is 0.5. Figure 3 shows the histogram and the boxplot, which are clearly right skewed.
The Q-Q plot in Figure 4 suggests a Paretian behavior of the data.
The parameter estimates of the Pareto model and the empirical value of the Kolmogorov–Smirnov, Cramér–von Mises and modified Anderson–Darling criteria are shown in Table 6. Overall, the LGPWM method provides a good fit. From Table 6, it is seen that there is no significant difference between using the Cramér–von Mises or modified Anderson–Darling criteria. In addition, notice that the scale PWM estimate is again invalid, since it is greater than the sample minimum.
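Because the earthquake data are listed in full above, the remark about the PWM scale estimate can be checked directly. The Python sketch below, with illustrative function names, fits the ML estimators (4) and the PWM estimators (11) to the values in thousands and prints both scale estimates next to the sample minimum; it is not the code used for Table 6, and no rounding convention from the article is assumed.

```python
import math
from math import comb

# Estimated deaths in thousands, as listed in Section 4.3.2.
deaths = [316.0, 242.769, 227.898, 200.0, 142.8, 110.0, 87.587, 86.0, 72.0,
          70.0, 50.0, 40.9, 32.7, 32.61, 31.0, 30.0, 28.0, 25.0, 23.0,
          20.896, 20.085]

def pareto_ml(sample):
    """ML estimates (4)."""
    n = len(sample)
    c_hat = min(sample)
    a_hat = 1.0 / (sum(math.log(x) for x in sample) / n - math.log(c_hat))
    return a_hat, c_hat

def alpha_hat(sample, r):
    """Unbiased PWM estimator (9) of alpha_r."""
    xs = sorted(sample)
    n = len(xs)
    total = sum(comb(n - i, r) * xs[i - 1] for i in range(1, n - r + 1))
    return total / (n * comb(n - 1, r))

def pareto_pwm(sample):
    """PWM estimates (11)."""
    a0, a1 = alpha_hat(sample, 0), alpha_hat(sample, 1)
    return (a0 - a1) / (a0 - 2.0 * a1), a0 * a1 / (a0 - a1)

a_ml, c_ml = pareto_ml(deaths)
a_pwm, c_pwm = pareto_pwm(deaths)
print("ML :", a_ml, c_ml)
print("PWM:", a_pwm, c_pwm)
print("sample minimum:", min(deaths))
```

A scale estimate above the sample minimum assigns zero density to the smallest observations, which is why such an estimate is regarded as invalid.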

5. Conclusions

In this research, we propose a new class of estimators for the shape and scale parameters of a Pareto distribution, named the log-generalized probability-weighted moment. This new class can be viewed as a generalization of the well-known probability-weighted moments and offers the advantage of extending the domain of the validity of the estimators to the complete parameter space of the Pareto distribution. Additionally, the asymptotic sampling distribution of the estimators provided by this method can be used as an approximation of the exact distribution for large sample sizes. The usefulness of the new estimation method was illustrated through a simulation study and two real data applications. It is concluded that, with appropriate choices of the tuning parameters $s_1$ and $s_2$, the proposed LGPWM estimators are capable of competing with the most commonly used estimation methods. As future research, we plan to examine the utilization of other goodness-of-fit statistics in the data-driven method for selecting the tuning parameters.

Author Contributions

Conceptualization, F.C.; methodology, F.C. and A.M.; validation, A.M.; Investigation, F.C. and A.M.; data curation, F.C.; writing—original draft preparation, F.C. and A.M.; writing—review and editing, F.C. and A.M. All authors have read and agreed to the published version of the manuscript.

Funding

Research partially supported by National Funds through FCT—Fundação para a Ciência e a Tecnologia, projects UIDB/00297/2020 and UIDP/00297/2020 (Centro de Matemática e Aplicações).

Data Availability Statement

The data supporting the findings in Section 4.3 of this study are available within the article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Population of the largest metropolitan areas in the world.
| Rank | City | Country | Population | Rank | City | Country | Population |
|---|---|---|---|---|---|---|---|
| 1 | Tokyo | Japan | 38,001,000 | 76 | Abidjan | Cote d’Ivoire | 4,859,798 |
| 2 | Delhi | India | 25,703,168 | 77 | Guadalajara | Mexico | 4,843,241 |
| 3 | Shanghai | China | 23,740,778 | 78 | Yangon | Myanmar | 4,801,930 |
| 4 | São Paulo | Brazil | 21,066,245 | 79 | Alexandria | Egypt | 4,777,677 |
| 5 | Mumbai | India | 21,042,538 | 80 | Ankara | Turkey | 4,749,968 |
| 6 | Mexico City | Mexico | 20,998,543 | 81 | Kabul | Afghanistan | 4,634,875 |
| 7 | Beijing | China | 20,383,994 | 82 | Qingdao | China | 4,565,549 |
| 8 | Osaka | Japan | 20,237,645 | 83 | Chittagong | Bangladesh | 4,539,393 |
| 9 | Cairo | Egypt | 18,771,769 | 84 | Monterrey | Mexico | 4,512,572 |
| 10 | New York | United States | 18,593,220 | 85 | Sydney | Australia | 4,505,341 |
| 11 | Dhaka | Bangladesh | 17,598,228 | 86 | Dalian | China | 4,489,380 |
| 12 | Karachi | Pakistan | 16,617,644 | 87 | Xiamen | China | 4,430,081 |
| 13 | Buenos Aires | Argentina | 15,180,176 | 88 | Zhengzhou | China | 4,387,118 |
| 14 | Kolkata | India | 14,864,919 | 89 | Boston | United States | 4,249,036 |
| 15 | Istanbul | Turkey | 14,163,989 | 90 | Melbourne | Australia | 4,203,416 |
| 16 | Chongqing | China | 13,331,579 | 91 | Brasília | Brazil | 4,155,476 |
| 17 | Lagos | Nigeria | 13,122,829 | 92 | Jiddah | Saudi Arabia | 4,075,803 |
| 18 | Manila | Philippines | 12,946,263 | 93 | Phoenix | United States | 4,062,605 |
| 19 | Rio de Janeiro | Brazil | 12,902,306 | 94 | Ji’nan | China | 4,032,150 |
| 20 | Guangzhou | China | 12,458,130 | 95 | Montréal | Canada | 3,980,708 |
| 21 | Los Angeles | United States | 12,309,530 | 96 | Shantou | China | 3,948,813 |
| 22 | Moscow | Russia | 12,165,704 | 97 | Nairobi | Kenya | 3,914,791 |
| 23 | Kinshasa | D. Rep. Congo | 11,586,914 | 98 | Medellín | Colombia | 3,910,989 |
| 24 | Tianjin | China | 11,210,329 | 99 | Fortaleza | Brazil | 3,880,202 |
| 25 | Paris | France | 10,843,285 | 100 | Kunming | China | 3,779,558 |
| 26 | Shenzhen | China | 10,749,473 | 101 | Changchun | China | 3,762,390 |
| 27 | Jakarta | Indonesia | 10,323,142 | 102 | Changsha | China | 3,761,018 |
| 28 | London | United Kingdom | 10,313,307 | 103 | Recife | Brazil | 3,738,526 |
| 29 | Bangalore | India | 10,087,132 | 104 | Rome | Italy | 3,717,956 |
| 30 | Lima | Peru | 9,897,033 | 105 | Zhongshan | China | 3,691,360 |
| 31 | Chennai | India | 9,890,427 | 106 | Cape Town | South Africa | 3,660,447 |
| 32 | Seoul | South Korea | 9,773,746 | 107 | Detroit | United States | 3,639,050 |
| 33 | Bogotá | Colombia | 9,764,769 | 108 | Hanoi | Vietnam | 3,629,493 |
| 34 | Nagoya | Japan | 9,406,264 | 109 | Tel Aviv | Israel | 3,608,265 |
| 35 | Johannesburg | South Africa | 9,398,698 | 110 | Porto Alegre | Brazil | 3,602,526 |
| 36 | Bangkok | Thailand | 9,269,823 | 111 | Kano | Nigeria | 3,587,049 |
| 37 | Hyderabad | India | 8,943,523 | 112 | Salvador | Brazil | 3,582,967 |
| 38 | Chicago | United States | 8,744,835 | 113 | Faisalabad | Pakistan | 3,566,952 |
| 39 | Lahore | Pakistan | 8,741,365 | 114 | Berlin | Germany | 3,563,194 |
| 40 | Tehran | Iran | 8,432,196 | 115 | Aleppo | Syria | 3,561,796 |
| 41 | Wuhan | China | 7,905,572 | 116 | Dakar | Senegal | 3,520,215 |
| 42 | Chengdu | China | 7,555,705 | 117 | Casablanca | Morocco | 3,514,958 |
| 43 | Dongguan | China | 7,434,935 | 118 | Urumqi | China | 3,498,591 |
| 44 | Nanjing | China | 7,369,157 | 119 | Taiyuan | China | 3,481,810 |
| 45 | Ahmadabad | India | 7,342,850 | 120 | Curitiba | Brazil | 3,473,681 |
| 46 | Hong Kong | Hong Kong | 7,313,557 | 121 | Jaipur | India | 3,460,701 |
| 47 | Ho Chi Minh City | Vietnam | 7,297,780 | 122 | Shizuoka | Japan | 3,368,988 |
| 48 | Foshan | Foshan | 7,035,945 | 123 | Hefei | China | 3,347,591 |
| 49 | Kuala Lumpur | Malaysia | 6,836,911 | 124 | San Francisco | United States | 3,300,075 |
| 50 | Baghdad | Iraq | 6,642,848 | 125 | Fuzhou | China | 3,282,932 |
| 51 | Santiago | Chile | 6,507,400 | 126 | Shijiazhuang | China | 3,264,498 |
| 52 | Hangzhou | China | 6,390,637 | 127 | Seattle | United States | 3,248,724 |
53RiyadhSaudi Arabia6,369,710128Addis AbabaEthiopia3,237,525
54ShenyangChina6,315,470129NanningChina3,234,379
55MadridSpain6,199,254130LucknowIndia3,221,817
56Xi’anChina6,043,700131BusanSouth Korea3,216,298
57TorontoCanada5,992,739132WenzhouChina3,207,846
58MiamiUnited States5,817,221133IbadanNigeria3,160,190
59PuneIndia5,727,530134NingboChina3,131,921
60Belo HorizonteBrazil5,716,422135San DiegoUnited States3,107,034
61DallasUnited States5,702,641136MilanItaly3,098,974
62SuratIndia5,650,011137YaoundeCameroon3,065,692
63HoustonUnited States5,638,045138AthensGreece3,051,899
64SingaporeSingapore5,618,866139WuxiChina3,049,042
65PhiladelphiaUnited States5,585,211140CampinasBrazil3,047,102
66KitakyushuJapan5,510,478141IzmirTurkey3,040,416
67LuandaAngola5,506,000142KanpurIndia3,020,795
68SuzhouChina5,472,033143MashhadIran3,014,424
69HaerbinChina5,457,414144PueblaMexico2,984,048
70BarcelonaSpain5,258,319145Sana’aYemen2,961,934
71AtlantaUnited States5,142,140146Santo DomingoDomican Rep.2,945,353
72KhartoumSudan5,129,358147DoualaCameroon2,943,318
73Dar es SalaamTanzania5,115,670148KievUkraine2,941,884
74Saint PetersburgRussia4,992,991149Guatemala CityGuatemala2,918,337
75Washington D.C.United States4,955,139150CaracasVenezuela2,916,183

References

  1. Pareto, V. Cours d’Economie Politique; Librairie Droz: Lausanne, Switzerland, 1897; Volume 2. [Google Scholar]
  2. Kleiber, C.; Kotz, S. Statistical Size Distributions in Economics and Actuarial Sciences; John Wiley & Sons: Hoboken, NJ, USA, 2003; Volume 470. [Google Scholar] [CrossRef]
  3. Finkelstein, M.; Tucker, H.G.; Veeh, J.A. Pareto Tail Index Estimation Revisited. N. Am. Actuar. J. 2006, 10, 1–10. [Google Scholar] [CrossRef]
  4. Lomax, K.S. Business Failures: Another Example of the Analysis of Failure Data. J. Am. Stat. Assoc. 1954, 49, 847–852. [Google Scholar] [CrossRef]
  5. Bourguignon, M.; Gallardo, D.I.; Gómez, H.J. A Note on Pareto-Type Distributions Parameterized by Its Mean and Precision Parameters. Mathematics 2022, 10, 528. [Google Scholar] [CrossRef]
  6. Charpentier, A.; Flachaire, E. Pareto models for top incomes and wealth. J. Econ. Inequal. 2022, 20, 1–25. [Google Scholar] [CrossRef]
  7. Beirlant, J.; Goegebeur, Y.; Segers, J.; Teugels, J.L. Statistics of Extremes: Theory and Applications; John Wiley & Sons: Chichester, UK, 2004. [Google Scholar]
  8. Beirlant, J.; Caeiro, F.; Gomes, M.I. An overview and open research topics in statistics of univariate extremes. Revstat-Stat. J. 2012, 10, 1–31. [Google Scholar] [CrossRef]
  9. Albrecher, H.; Beirlant, J.; Teugels, J.L. Reinsurance: Actuarial and Statistical Aspects; John Wiley & Sons, Ltd.: Chichester, UK, 2017. [Google Scholar] [CrossRef]
  10. Gomes, M.I.; Guillou, A. Extreme value theory and statistics of univariate extremes: A review. Int. Stat. Rev. 2015, 83, 263–292. [Google Scholar] [CrossRef] [Green Version]
  11. Peng, L.; Qi, Y. Inference for Heavy-Tailed Data: Applications in Insurance and Finance; Academic Press: Cambridge, MA, USA, 2017. [Google Scholar] [CrossRef]
  12. Quandt, R.E. Old and new methods of estimation and the Pareto distribution. Metrika 1966, 10, 55–82. [Google Scholar] [CrossRef]
  13. Lu, H.L.; Tao, S.H. The Estimation of Pareto Distribution by a Weighted Least Square Method. Qual. Quant. 2007, 41, 913–926. [Google Scholar] [CrossRef]
  14. Caeiro, F.; Martins, A.P.; Sequeira, I.J. Finite sample behaviour of classical and quantile regression estimators for the Pareto distribution. AIP Conf. Proc. 2015, 1648, 540007. [Google Scholar] [CrossRef]
  15. Kantar, Y.M. Generalized least squares and weighted least squares estimation methods for distributional parameters. REVSTAT-Stat. J. 2015, 13, 263–282. [Google Scholar] [CrossRef]
  16. Kim, J.H.; Ahn, S.; Ahn, S. Parameter estimation of the Pareto distribution using a pivotal quantity. J. Korean Stat. Soc. 2017, 46, 438–450. [Google Scholar] [CrossRef]
  17. Brazauskas, V.; Serfling, R. Robust and Efficient Estimation of the Tail Index of a Single-Parameter Pareto Distribution. N. Am. Actuar. J. 2000, 4, 12–27. [Google Scholar] [CrossRef]
  18. Vandewalle, B.; Beirlant, J.; Christmann, A.; Hubert, M. A robust estimator for the tail index of Pareto-type distributions. Comput. Stat. Data Anal. 2007, 51, 6252–6268. [Google Scholar] [CrossRef] [Green Version]
  19. Arnold, B.C.; Press, S.J. Bayesian estimation and prediction for Pareto data. J. Am. Stat. Assoc. 1989, 84, 1079–1084. [Google Scholar] [CrossRef]
  20. Rasheed, H.A.; Al-Gazi, N.A.A. Bayes estimators for the shape parameter of Pareto type I distribution under generalized square error loss function. Math. Theory Model. 2014, 4, 20–32. [Google Scholar]
  21. Han, M. The E-Bayesian estimation and its E-MSE of Pareto distribution parameter under different loss functions. J. Stat. Comput. Simul. 2020, 90, 1834–1848. [Google Scholar] [CrossRef]
  22. Singh, V.P.; Guo, H. Parameter estimations for 2-parameter Pareto distribution by pome. Water Resour. Manag. 1995, 9, 81–93. [Google Scholar] [CrossRef]
  23. Caeiro, F.; Gomes, M.I. Semi-parametric tail inference through probability-weighted moments. J. Stat. Plan. Inference 2011, 141, 937–950. [Google Scholar] [CrossRef]
  24. Caeiro, F.; Gomes, M.I. A Class of Semi-parametric Probability Weighted Moment Estimators. In Recent Developments in Modeling and Applications in Statistics; Springer: Berlin/Heidelberg, Germany, 2013; pp. 139–147. [Google Scholar] [CrossRef]
  25. Munir, R.; Saleem, M.; Aslam, M.; Ali, S. Comparison of different methods of parameters estimation for Pareto Model. Casp. J. Appl. Sci. Res. 2013, 2, 45–56. [Google Scholar]
  26. Bhatti, S.H.; Hussain, S.; Ahmad, T.; Aslam, M.; Aftab, M.; Raza, M.A. Efficient estimation of Pareto model: Some modified percentile estimators. PLoS ONE 2018, 13, e0196456. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  27. Bhatti, S.H.; Hussain, S.; Ahmad, T.; Aftab, M.; Ali Raza, M.; Tahir, M. Efficient estimation of Pareto model using modified maximum likelihood estimators. Sci. Iran. 2019, 26, 605–614. [Google Scholar] [CrossRef] [Green Version]
  28. Chen, W.; Yang, R.; Yao, D.; Long, C. Pareto parameters estimation using moving extremes ranked set sampling. Stat. Pap. 2019, 62, 1195–1211. [Google Scholar] [CrossRef]
  29. Greenwood, J.A.; Landwehr, J.M.; Matalas, N.C.; Wallis, J.R. Probability weighted moments: Definition and relation to parameters of several distributions expressable in inverse form. Water Resour. Res. 1979, 15, 1049–1054. [Google Scholar] [CrossRef] [Green Version]
  30. Hosking, J.R.M.; Wallis, J.R.; Wood, E.F. Estimation of the generalized extreme-value distribution by the method of probability-weighted moments. Technometrics 1985, 27, 251–261. [Google Scholar] [CrossRef]
  31. Jing, D.; Dedun, S.; Ronfu, Y.; Yu, H. Expressions relating probability weighted moments to parameters of several distributions inexpressible in inverse form. J. Hydrol. 1989, 110, 259–270. [Google Scholar] [CrossRef]
  32. Landwehr, J.M.; Matalas, N.; Wallis, J. Probability weighted moments compared with some traditional techniques in estimating Gumbel parameters and quantiles. Water Resour. Res. 1979, 15, 1055–1064. [Google Scholar] [CrossRef]
  33. Landwehr, J.M.; Matalas, N.; Wallis, J. Estimation of parameters and quantiles of Wakeby distributions: 1. Known lower bounds. Water Resour. Res. 1979, 15, 1361–1372. [Google Scholar] [CrossRef]
  34. Caeiro, F.; Gomes, M.I. Computational Study of the Adaptive Estimation of the Extreme Value Index with Probability Weighted Moments. In Proceedings of the Recent Developments in Statistics and Data Science: SPE2021, Évora, Portugal, 13–16 October 2021; Bispo, R., Henriques-Rodrigues, L., Alpizar-Jara, R., de Carvalho, M., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 29–39. [Google Scholar] [CrossRef]
  35. Caeiro, F.; Gomes, M.I.; Vandewalle, B. Semi-parametric probability-weighted moments estimation revisited. Methodol. Comput. Appl. Probab. 2014, 16, 1–29. [Google Scholar] [CrossRef]
  36. Rasmussen, P.F. Generalized probability weighted moments: Application to the generalized Pareto distribution. Water Resour. Res. 2001, 37, 1745–1751. [Google Scholar] [CrossRef]
  37. Caeiro, F.; Prata Gomes, D. A Log Probability Weighted Moment Estimator of Extreme Quantiles. In Theory and Practice of Risk Assessment; Springer Proceedings in Mathematics & Statistics; Kitsos, C., Oliveira, T., Rigas, A., Gulati, S., Eds.; Springer: Cham, Switzerland, 2015; Volume 136, pp. 293–303. [Google Scholar] [CrossRef] [Green Version]
  38. Caeiro, F.; Mateus, A. Log Probability Weighted Moments Method for Pareto distribution. In Proceedings of the 17th Applied Stochastic Models and Data Analysis International Conference with the 6th Demographics Workshop, London, UK, 6–9 June 2017; Skiadas, C.H., Ed.; 2017; pp. 211–218. [Google Scholar]
  39. Chen, H.; Cheng, W.; Zhao, J.; Zhao, X. Parameter estimation for generalized Pareto distribution by generalized probability weighted moment-equations. Commun. Stat.-Simul. Comput. 2017, 46, 7761–7776. [Google Scholar] [CrossRef]
  40. Mateus, A.; Caeiro, F. A new class of estimators for the shape parameter of a Pareto model. Comput. Math. Methods 2021, 3, e1133. [Google Scholar] [CrossRef]
  41. Mateus, A.; Caeiro, F. Confidence intervals for the shape parameter of a Pareto distribution. AIP Conf. Proc. 2022, 2425, 320003. [Google Scholar] [CrossRef]
  42. Arnold, B.C. Pareto and Generalized Pareto Distributions. In Modeling Income Distributions and Lorenz Curves; Chotikapanich, D., Ed.; Springer: New York, NY, USA, 2008; pp. 119–145. [Google Scholar] [CrossRef]
  43. Arnold, B.C.; Balakrishnan, N.; Nagaraja, H.N. A First Course in Order Statistics; Siam: Philadelphia, PA, USA, 1992; Volume 54. [Google Scholar] [CrossRef]
  44. Ahmad, M.I.; Sinclair, C.D.; Spurr, B.D. Assessment of flood frequency models using empirical distribution function statistics. Water Resour. Res. 1988, 24, 1323–1328. [Google Scholar] [CrossRef]
  45. Razali, N.M.; Wah, Y.B. Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling Tests. J. Stat. Model. Anal. 2011, 2, 21–33. [Google Scholar]
  46. Singla, N.; Jain, K.; Kumar Sharma, S. Goodness of Fit Tests and Power Comparisons for Weighted Gamma Distribution. REVSTAT-Stat. J. 2016, 14, 29–48. [Google Scholar] [CrossRef]
  47. The 150 Largest Cities in the World. Available online: https://www.worldatlas.com/citypops.htm (accessed on 15 May 2021).
  48. Bowley, A.L. Elements of Statistics; PS King & Son: London, UK, 1901. [Google Scholar]
  49. Horn, P.S. Robust quantile estimators for skewed populations. Biometrika 1990, 77, 631–636. [Google Scholar] [CrossRef]
  50. Kim, T.H.; White, H. On more robust estimation of skewness and kurtosis. Financ. Res. Lett. 2004, 1, 56–73. [Google Scholar] [CrossRef]
  51. Brys, G.; Hubert, M.; Struyf, A. A Robust Measure of Skewness. J. Comput. Graph. Stat. 2004, 13, 996–1017. [Google Scholar] [CrossRef]
  52. Cirillo, P. Are your data really Pareto distributed? Phys. A Stat. Mech. Its Appl. 2013, 392, 5947–5962. [Google Scholar] [CrossRef] [Green Version]
  53. Clark, D. A Note on the Upper-truncated Pareto distribution. Casualty Actuar. Soc. E-Forum 2013, Winter 1, 1–22. [Google Scholar]
Figure 1. Histogram and boxplot for the population data.
Figure 2. Pareto Q-Q Plot for the population data.
Figure 3. Histogram and boxplot for the estimated number of deaths.
Figure 4. Pareto Q-Q Plot for the estimated number of deaths.
Table 1. Bias and RMSE of the estimators of the shape parameter a for the Pareto distribution.

| n | ML | M | PWM | LGPWM-KS | LGPWM-CvM | LGPWM-MAD |
|---|---|---|---|---|---|---|
| | Bias/RMSE | Bias/RMSE | Bias/RMSE | Bias/RMSE | Bias/RMSE | Bias/RMSE |
| Pareto model with a = 0.1 and c = 0.25 | | | | | | |
| 15 | 0.016/0.037 | 0.900/0.900 | 0.906/0.906 | 0.014/0.042 | 0.015/0.041 | 0.008/0.036 |
| 20 | 0.012/0.030 | 0.900/0.900 | 0.904/0.904 | 0.010/0.030 | 0.010/0.032 | 0.006/0.029 |
| 30 | 0.007/0.020 | 0.900/0.900 | 0.903/0.903 | 0.006/0.023 | 0.005/0.022 | 0.003/0.021 |
| 40 | 0.006/0.019 | 0.900/0.900 | 0.902/0.902 | 0.006/0.022 | 0.005/0.022 | 0.003/0.020 |
| 50 | 0.004/0.014 | 0.900/0.900 | 0.902/0.902 | 0.004/0.016 | 0.004/0.016 | 0.002/0.015 |
| 75 | 0.003/0.011 | 0.900/0.900 | 0.901/0.901 | 0.003/0.013 | 0.002/0.013 | 0.001/0.012 |
| 100 | 0.002/0.010 | 0.900/0.900 | 0.901/0.901 | 0.003/0.011 | 0.002/0.012 | 0.001/0.011 |
| 150 | 0.001/0.008 | 0.900/0.900 | 0.901/0.901 | 0.002/0.009 | 0.001/0.009 | 0.000/0.009 |
| 200 | 0.001/0.007 | 0.900/0.900 | 0.900/0.900 | 0.001/0.008 | 0.001/0.008 | 0.000/0.008 |
| Pareto model with a = 0.25 and c = 0.5 | | | | | | |
| 15 | 0.040/0.093 | 0.753/0.753 | 0.776/0.778 | 0.036/0.106 | 0.037/0.103 | 0.020/0.089 |
| 20 | 0.029/0.074 | 0.751/0.751 | 0.770/0.770 | 0.025/0.076 | 0.026/0.081 | 0.016/0.072 |
| 30 | 0.017/0.050 | 0.750/0.750 | 0.763/0.764 | 0.015/0.058 | 0.012/0.055 | 0.008/0.051 |
| 40 | 0.015/0.047 | 0.750/0.750 | 0.759/0.759 | 0.014/0.054 | 0.013/0.054 | 0.009/0.049 |
| 50 | 0.011/0.036 | 0.750/0.750 | 0.757/0.757 | 0.011/0.041 | 0.010/0.041 | 0.006/0.038 |
| 75 | 0.007/0.027 | 0.750/0.750 | 0.754/0.754 | 0.008/0.032 | 0.006/0.033 | 0.003/0.030 |
| 100 | 0.006/0.025 | 0.750/0.750 | 0.753/0.753 | 0.006/0.029 | 0.005/0.029 | 0.002/0.027 |
| 150 | 0.003/0.020 | 0.750/0.750 | 0.752/0.752 | 0.004/0.023 | 0.003/0.023 | 0.001/0.022 |
| 200 | 0.002/0.017 | 0.750/0.750 | 0.752/0.752 | 0.003/0.020 | 0.002/0.020 | 0.000/0.020 |
| Pareto model with a = 0.75 and c = 0.5 | | | | | | |
| 15 | 0.120/0.280 | 0.429/0.467 | 0.524/0.577 | 0.118/0.318 | 0.115/0.309 | 0.069/0.277 |
| 20 | 0.088/0.223 | 0.397/0.422 | 0.479/0.524 | 0.089/0.238 | 0.086/0.252 | 0.052/0.219 |
| 30 | 0.052/0.151 | 0.363/0.375 | 0.435/0.463 | 0.051/0.180 | 0.040/0.166 | 0.025/0.155 |
| 40 | 0.046/0.141 | 0.346/0.357 | 0.401/0.421 | 0.048/0.167 | 0.042/0.163 | 0.027/0.146 |
| 50 | 0.033/0.107 | 0.330/0.338 | 0.379/0.398 | 0.036/0.122 | 0.033/0.124 | 0.019/0.114 |
| 75 | 0.021/0.080 | 0.315/0.319 | 0.354/0.364 | 0.026/0.099 | 0.020/0.098 | 0.009/0.090 |
| 100 | 0.018/0.075 | 0.310/0.314 | 0.347/0.355 | 0.021/0.087 | 0.015/0.087 | 0.008/0.081 |
| 150 | 0.010/0.059 | 0.301/0.304 | 0.333/0.340 | 0.014/0.071 | 0.008/0.069 | 0.003/0.067 |
| 200 | 0.007/0.052 | 0.297/0.300 | 0.327/0.332 | 0.010/0.060 | 0.006/0.060 | 0.001/0.059 |
| Pareto model with a = 1 and c = 1 | | | | | | |
| 15 | 0.160/0.373 | 0.363/0.462 | 0.475/0.593 | 0.138/0.410 | 0.143/0.410 | 0.077/0.354 |
| 20 | 0.118/0.298 | 0.318/0.393 | 0.416/0.519 | 0.097/0.299 | 0.098/0.318 | 0.063/0.289 |
| 30 | 0.069/0.202 | 0.269/0.316 | 0.356/0.429 | 0.054/0.227 | 0.047/0.218 | 0.029/0.206 |
| 40 | 0.061/0.188 | 0.244/0.291 | 0.309/0.370 | 0.054/0.216 | 0.052/0.216 | 0.034/0.195 |
| 50 | 0.044/0.143 | 0.217/0.256 | 0.276/0.335 | 0.041/0.160 | 0.041/0.164 | 0.024/0.152 |
| 75 | 0.028/0.107 | 0.195/0.221 | 0.244/0.282 | 0.029/0.129 | 0.025/0.130 | 0.011/0.119 |
| 100 | 0.024/0.100 | 0.190/0.213 | 0.236/0.270 | 0.024/0.115 | 0.019/0.116 | 0.010/0.108 |
| 150 | 0.013/0.078 | 0.173/0.193 | 0.214/0.245 | 0.016/0.092 | 0.011/0.094 | 0.004/0.090 |
| 200 | 0.009/0.069 | 0.167/0.186 | 0.205/0.232 | 0.011/0.079 | 0.008/0.080 | 0.001/0.080 |
Table 2. Bias and RMSE of the estimators of the scale parameter c for the Pareto distribution.

| n | ML | M | PWM | LGPWM-KS | LGPWM-CvM | LGPWM-MAD |
|---|---|---|---|---|---|---|
| | Bias/RMSE | Bias/RMSE | Bias/RMSE | Bias/RMSE | Bias/RMSE | Bias/RMSE |
| Pareto model with a = 0.1 and c = 0.25 | | | | | | |
| 15 | 0.517/2.489 | 0.466/2.319 | */* | 0.428/1.439 | 0.525/1.852 | 0.366/1.248 |
| 20 | 0.199/0.350 | 0.176/0.326 | */* | 0.345/1.376 | 0.328/1.362 | 0.294/1.193 |
| 30 | 0.111/0.200 | 0.099/0.189 | */* | 0.097/0.302 | 0.112/0.470 | 0.100/0.384 |
| 40 | 0.077/0.131 | 0.069/0.124 | */* | 0.064/0.221 | 0.066/0.262 | 0.066/0.279 |
| 50 | 0.059/0.087 | 0.053/0.082 | */* | 0.058/0.204 | 0.060/0.219 | 0.051/0.216 |
| 75 | 0.036/0.052 | 0.032/0.050 | */* | 0.025/0.113 | 0.022/0.127 | 0.025/0.217 |
| 100 | 0.025/0.035 | 0.022/0.033 | */* | 0.018/0.088 | 0.013/0.089 | 0.011/0.117 |
| 150 | 0.016/0.021 | 0.014/0.020 | */* | 0.012/0.069 | 0.009/0.084 | 0.008/0.103 |
| 200 | 0.012/0.016 | 0.010/0.015 | */* | 0.009/0.062 | 0.004/0.061 | 0.001/0.079 |
| Pareto model with a = 0.25 and c = 0.5 | | | | | | |
| 15 | 0.179/0.335 | 0.134/0.296 | */* | 0.122/0.345 | 0.133/0.381 | 0.086/0.332 |
| 20 | 0.112/0.168 | 0.081/0.144 | */* | 0.095/0.303 | 0.093/0.299 | 0.071/0.301 |
| 30 | 0.070/0.108 | 0.051/0.094 | */* | 0.043/0.149 | 0.039/0.173 | 0.028/0.180 |
| 40 | 0.052/0.078 | 0.039/0.068 | */* | 0.029/0.119 | 0.026/0.128 | 0.019/0.147 |
| 50 | 0.042/0.059 | 0.031/0.051 | */* | 0.025/0.114 | 0.025/0.121 | 0.013/0.133 |
| 75 | 0.026/0.037 | 0.019/0.032 | */* | 0.011/0.078 | 0.007/0.083 | 0.001/0.111 |
| 100 | 0.019/0.026 | 0.014/0.022 | */* | 0.009/0.063 | 0.004/0.065 | −0.001/0.081 |
| 150 | 0.012/0.016 | 0.009/0.014 | */* | 0.007/0.052 | 0.003/0.057 | −0.001/0.071 |
| 200 | 0.009/0.013 | 0.007/0.011 | */* | 0.004/0.046 | 0.000/0.047 | −0.005/0.060 |
| Pareto model with a = 0.75 and c = 0.5 | | | | | | |
| 15 | 0.048/0.073 | 0.016/0.054 | 0.514/2.059 | 0.034/0.084 | 0.034/0.084 | 0.018/0.090 |
| 20 | 0.033/0.046 | 0.009/0.033 | 0.356/0.720 | 0.028/0.072 | 0.026/0.074 | 0.014/0.078 |
| 30 | 0.021/0.031 | 0.006/0.023 | 0.370/0.682 | 0.015/0.045 | 0.012/0.046 | 0.006/0.054 |
| 40 | 0.016/0.023 | 0.004/0.017 | 0.321/0.456 | 0.011/0.037 | 0.008/0.038 | 0.003/0.045 |
| 50 | 0.013/0.018 | 0.004/0.013 | 0.316/0.409 | 0.009/0.036 | 0.007/0.036 | 0.002/0.043 |
| 75 | 0.009/0.012 | 0.002/0.008 | 0.302/0.346 | 0.004/0.026 | 0.002/0.027 | −0.001/0.034 |
| 100 | 0.006/0.008 | 0.001/0.006 | 0.306/0.349 | 0.004/0.021 | 0.002/0.021 | −0.001/0.026 |
| 150 | 0.004/0.005 | 0.001/0.004 | 0.299/0.321 | 0.002/0.017 | 0.001/0.017 | −0.001/0.023 |
| 200 | 0.003/0.004 | 0.001/0.003 | 0.308/0.329 | 0.002/0.015 | −0.000/0.016 | −0.002/0.020 |
| Pareto model with a = 1 and c = 1 | | | | | | |
| 15 | 0.070/0.105 | 0.015/0.075 | 0.293/0.625 | 0.033/0.120 | 0.035/0.122 | 0.011/0.131 |
| 20 | 0.048/0.068 | 0.010/0.047 | 0.238/0.343 | 0.026/0.102 | 0.024/0.103 | 0.009/0.114 |
| 30 | 0.032/0.046 | 0.004/0.032 | 0.233/0.328 | 0.012/0.063 | 0.009/0.068 | 0.002/0.080 |
| 40 | 0.024/0.034 | 0.003/0.024 | 0.198/0.262 | 0.008/0.053 | 0.007/0.056 | 0.001/0.067 |
| 50 | 0.020/0.027 | 0.003/0.018 | 0.183/0.239 | 0.007/0.052 | 0.007/0.054 | −0.000/0.064 |
| 75 | 0.013/0.018 | 0.001/0.012 | 0.173/0.215 | 0.002/0.037 | 0.000/0.040 | −0.004/0.052 |
| 100 | 0.009/0.012 | 0.001/0.008 | 0.173/0.209 | 0.002/0.030 | 0.000/0.031 | −0.003/0.040 |
| 150 | 0.006/0.008 | 0.000/0.005 | 0.165/0.193 | 0.001/0.025 | −0.000/0.027 | −0.003/0.034 |
| 200 | 0.005/0.006 | 0.000/0.004 | 0.163/0.184 | 0.001/0.023 | −0.001/0.024 | −0.004/0.030 |
* value greater than 10.
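
The ML columns of Tables 1 and 2 can be approximated with a short Monte Carlo experiment. The following Python sketch is illustrative only. It assumes the standard Pareto type I maximum likelihood estimators (the sample minimum for c, and n divided by the sum of the log-ratios for a). The function names, the number of replicates, and the seed are hypothetical choices, not the settings of the simulation study.

```python
import math
import random


def pareto_sample(a, c, n, rng):
    """Draw n Pareto type I values by inverting F(x) = 1 - (c / x)**a, x >= c."""
    # 1 - rng.random() lies in (0, 1], so every draw is finite and >= c.
    return [c * (1.0 - rng.random()) ** (-1.0 / a) for _ in range(n)]


def pareto_ml(x):
    """Assumed ML estimators: c_hat = min(x), a_hat = n / sum(log(x_i / c_hat))."""
    c_hat = min(x)
    a_hat = len(x) / sum(math.log(xi / c_hat) for xi in x)
    return a_hat, c_hat


def bias_rmse(estimates, true_value):
    """Empirical bias and root mean squared error of a list of estimates."""
    errors = [e - true_value for e in estimates]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(err * err for err in errors) / len(errors))
    return bias, rmse


def simulate(a, c, n, replicates=20000, seed=1):
    """Return ((bias_a, rmse_a), (bias_c, rmse_c)) for the ML estimators."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    fits = [pareto_ml(pareto_sample(a, c, n, rng)) for _ in range(replicates)]
    return (bias_rmse([f[0] for f in fits], a),
            bias_rmse([f[1] for f in fits], c))


(bias_a, rmse_a), (bias_c, rmse_c) = simulate(a=1.0, c=1.0, n=15)
print(f"a: bias={bias_a:.3f}, rmse={rmse_a:.3f}")
print(f"c: bias={bias_c:.3f}, rmse={rmse_c:.3f}")
```

For a = 1, c = 1 and n = 15, the printed values should fall in the neighbourhood of the ML entries of Tables 1 and 2 (0.160/0.373 for a and 0.070/0.105 for c), up to Monte Carlo error.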
Table 3. Summary statistics for the population data (in millions of inhabitants).

| Min. | 1st Quart. | Median | Mean | 3rd Quart. | Max. |
|---|---|---|---|---|---|
| 2.916 | 3.571 | 4.907 | 7.082 | 8.744 | 38.001 |
Table 4. Parameter estimates and goodness-of-fit statistics for the population data.

| Method | $\hat{a}$ | $\hat{c}$ | $D_n$ | $W_n^2$ | $AU_n^2$ |
|---|---|---|---|---|---|
| ML | 1.4599 | 2.9162 | 0.0595 | 0.0979 | 0.4462 |
| M | 1.6953 | 2.9047 | 0.1167 | 0.5929 | 2.1937 |
| PWM | 1.8835 | 3.3222 | 0.1800 | 0.9749 | 2.1406 |
| LGPWM ($s_1 = 0.7$, $s_2 = 1.1$) | 1.3933 | 2.9129 | 0.0437 | 0.0529 | 0.3408 |
| LGPWM ($s_1 = 1.4$, $s_2 = 1.5$) | 1.3325 | 2.8694 | 0.0502 | 0.0408 | 0.3686 |
| LGPWM ($s_1 = 0.9$, $s_2 = 1.0$) | 1.3824 | 2.9056 | 0.0447 | 0.0488 | 0.3398 |
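
The first two goodness-of-fit columns can be sketched in a few lines of Python. The sketch assumes that $D_n$ and $W_n^2$ denote the usual Kolmogorov-Smirnov and Cramér-von Mises statistics, computed against the fitted Pareto type I distribution function F(x) = 1 − (c/x)^a for x ≥ c. The function names are hypothetical, and $AU_n^2$ is omitted because its definition is not restated here.

```python
import math


def pareto_cdf(x, a, c):
    """Pareto type I distribution function with shape a and scale c."""
    return 1.0 - (c / x) ** a if x >= c else 0.0


def ks_statistic(x, a, c):
    """Kolmogorov-Smirnov distance between the empirical and fitted CDFs."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, xi in enumerate(xs, start=1):
        f = pareto_cdf(xi, a, c)
        # The empirical CDF jumps from (i - 1) / n to i / n at the i-th order statistic.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def cvm_statistic(x, a, c):
    """Cramér-von Mises statistic, computational form with plotting positions (2i - 1) / (2n)."""
    xs = sorted(x)
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        (pareto_cdf(xi, a, c) - (2 * i - 1) / (2 * n)) ** 2
        for i, xi in enumerate(xs, start=1)
    )


# Usage: a sample placed exactly at the fitted quantiles of order (2i - 1) / (2n)
# attains the smallest possible values, 1 / (2n) and 1 / (12n).
a, c, n = 1.5, 2.0, 4
sample = [c * (1 - (2 * i - 1) / (2 * n)) ** (-1 / a) for i in range(1, n + 1)]
print(ks_statistic(sample, a, c), cvm_statistic(sample, a, c))
```

Smaller values indicate a closer fit, which is how the rows of the table are compared.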
Table 5. Summary statistics for the estimated number of deaths.

| Min. | 1st Quart. | Median | Mean | 3rd Quart. | Max. |
|---|---|---|---|---|---|
| 20.09 | 30.00 | 50.00 | 89.96 | 110.00 | 316.00 |
Table 6. Parameter estimates and goodness-of-fit statistics for the estimated number of deaths.

| Method | $\hat{a}$ | $\hat{c}$ | $D_n$ | $W_n^2$ | $AU_n^2$ |
|---|---|---|---|---|---|
| ML | 0.9034 | 20.0850 | 0.1525 | 0.0615 | 0.2479 |
| M | 1.2737 | 19.3341 | 0.2820 | 0.4347 | 1.7498 |
| PWM | 1.5046 | 30.1704 | 0.3145 | 0.5197 | 1.1743 |
| LGPWM ($s_1 = 0.9$, $s_2 = 1.5$) | 0.7536 | 18.0595 | 0.1159 | 0.0520 | 0.2421 |
| LGPWM ($s_1 = 0.7$, $s_2 = 0.8$) | 0.8149 | 19.0483 | 0.1300 | 0.0467 | 0.2179 |
| LGPWM ($s_1 = 0.6$, $s_2 = 0.7$) | 0.8323 | 19.3380 | 0.1334 | 0.0468 | 0.2161 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Caeiro, F.; Mateus, A. A New Class of Generalized Probability-Weighted Moment Estimators for the Pareto Distribution. Mathematics 2023, 11, 1076. https://doi.org/10.3390/math11051076
