Article

An Alternative Lambert-Type Distribution for Bounded Data

Departamento de Matemáticas, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2023, 11(3), 667; https://doi.org/10.3390/math11030667
Submission received: 5 January 2023 / Revised: 22 January 2023 / Accepted: 25 January 2023 / Published: 28 January 2023

Abstract

In this article, we propose a new two-parameter distribution for bounded data such as rates, proportions, or percentages. The density function of the proposed distribution can present monotonic, unimodal, and reverse-unimodal shapes. It tends to a positive finite value at the lower end of its support, which can lead to a better fit of the lower empirical quantiles. We derive some of the main structural properties of the new distribution and describe its skewness and kurtosis. We discuss parameter estimation by the maximum likelihood method and develop a simulation study to evaluate the behavior of the estimators. Finally, we present two applications to real data that provide evidence that the proposed distribution can perform better than the popular beta and Kumaraswamy distributions.

1. Introduction

The beta (B) and Kumaraswamy (K) distributions [1,2] play an important role in the analysis of bounded data such as rates, proportions, and percentages. Although they have only two shape parameters, these distributions have a very flexible probability density function (pdf), which can present monotonic, unimodal, and reverse-unimodal shapes. Structurally, the analytical expressions of their pdf's are similar. The random variable $X$ follows the B distribution with shape parameters $\alpha, \beta > 0$, denoted as $X \sim \mathrm{B}(\alpha, \beta)$, if its pdf is given by
$$f(x; \alpha, \beta) = [B(\alpha, \beta)]^{-1} x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad x \in (0, 1),$$
where $B(\alpha, \beta) = \int_0^1 u^{\alpha - 1} (1 - u)^{\beta - 1}\, du$ is the beta function. On the other hand, if $X$ has the K distribution with shape parameters $\alpha, \beta > 0$, denoted as $X \sim \mathrm{K}(\alpha, \beta)$, its pdf is given by
$$f(x; \alpha, \beta) = \alpha \beta x^{\alpha - 1} (1 - x^{\alpha})^{\beta - 1}, \quad x \in (0, 1).$$
Strictly speaking, the support of the above pdf's is the closed interval $[0, 1]$. However, the endpoints 0 and 1 tend to be omitted to avoid difficulties in obtaining the maximum likelihood (ML) estimators of $\alpha$ and $\beta$. When the minimum observation in the data is exactly 0 or the maximum observation is exactly 1, the likelihood is null or indeterminate, depending on whether $\alpha$ and $\beta$ are greater or less than 1. Consequently, the ML estimates of $\alpha$ and $\beta$ cannot be obtained.
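The following R sketch makes the boundary problem concrete by evaluating the B and K pdf's at $x = 0$. It is an illustration only and is not part of the original analysis. `dbeta` is base R. `dkumar` is a hypothetical helper coded directly from the K pdf above, because base R has no Kumaraswamy density.

```r
# B pdf at x = 0 (base R dbeta): 0 when shape1 > 1, Inf when shape1 < 1
b_zero <- dbeta(0, shape1 = 2.0, shape2 = 3)
b_inf  <- dbeta(0, shape1 = 0.5, shape2 = 3)

# K pdf coded from its formula (hypothetical helper, not a package function)
dkumar <- function(x, alpha, beta) alpha * beta * x^(alpha - 1) * (1 - x^alpha)^(beta - 1)
k_zero <- dkumar(0, alpha = 2.0, beta = 3)
k_inf  <- dkumar(0, alpha = 0.5, beta = 3)

# A sample containing an exact 0 gives a log-likelihood of -Inf or +Inf
x  <- c(0, 0.1, 0.25, 0.4)
ll <- c(sum(dbeta(x, 2.0, 3, log = TRUE)), sum(dbeta(x, 0.5, 3, log = TRUE)))
print(c(b_zero, b_inf, k_zero, k_inf))
print(ll)
```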
Note that whether the value 0 or 1 appears in the data depends on the nature of the phenomenon under study. For example, suppose the monthly proportion of the household budget allocated to some type of expense (food, clothing, transportation, etc.) is studied. Some observations can be expected to be exactly 0, since some households may not buy clothing in a given month; see, for example, Blundell et al. [3]. In a scenario like this, one of the following two paths is frequently chosen to overcome this difficulty:
1. Use another estimation method that provides consistent and efficient estimators; the method of moments is a good alternative in this case. Details about the moment estimators of the shape parameters of the B and K distributions can be found in Johnson et al. [1] and Dey et al. [4].
2. Use the four-parameter version of the B distribution, which has two additional location parameters. This version is defined by the transformation $Y = (b - a)X + a$, where $X \sim \mathrm{B}(\alpha, \beta)$ and $a < b \in \mathbb{R}$ are the location parameters. Thus, $Y$ follows the B distribution over the interval $[a, b]$. The four-parameter version of the K distribution is defined analogously. When $a$ and $b$ are known, or specified under some criterion by the analyst, ML estimates of the shape parameters of the B and K distributions can be obtained following the standard procedures of the ML method. When $a$ and $b$ are unknown, obtaining the estimates can be problematic because the regularity conditions are not satisfied: the support of the distributions depends on the location parameters. See Wang [5] and Smith [6] for details on classes of non-regular distributions in ML estimation.
In this article, we propose a new two-parameter distribution for fitting data bounded to the unit interval. Depending on its parameters, the pdf of the new distribution can present monotonic, unimodal, and reverse-unimodal shapes. The pdf tends to a positive finite value at the lower end of its support. Consequently, the likelihood is neither null nor indeterminate when fitting a data set whose minimum observation is exactly 0, and the ML estimates of its parameters can be obtained.
The proposed distribution arises from the Lambert-F distribution generator [7], defined by the cumulative distribution function (cdf)
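$$G(x; \eta, \alpha) = 1 - [1 - F(x; \eta)]\, \alpha^{F(x; \eta)}, \qquad G^{-1}(u; \eta, \alpha) = F^{-1}\!\left(1 + \frac{W_0\left(\log(\alpha)(u - 1)/\alpha\right)}{\log(\alpha)}; \eta\right)\ (\alpha \neq 1), \qquad \alpha \in (0, e), \tag{1}$$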
where $e$ is Euler's number, $F(\cdot; \eta)$ is an arbitrary continuous cdf with parameter vector $\eta$, $F^{-1}(\cdot; \eta)$ is the corresponding quantile function (qf), and $W_0(\cdot)$ denotes the principal branch of the Lambert W function. See Corless et al. [8] and Brito et al. [9] for details on the Lambert W function.
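The minimal R sketch below implements the Lambert-F cdf of Equation (1), together with the qf obtained by solving $G(x; \eta, \alpha) = u$ with the Lambert W function (valid for $\alpha \neq 1$). The names `W0`, `pLambertF`, and `qLambertF` are hypothetical. `W0` is a `uniroot`-based stand-in for `lamW::lambertW0`, assumed here only so that the block runs in base R. The round trip $G(G^{-1}(u)) = u$ is checked for a uniform baseline (the LU case) and for an exponential baseline.

```r
# Hypothetical base-R stand-in for lamW::lambertW0: principal branch of Lambert W,
# found by solving w * exp(w) = z on [-1, max(z, 0) + 1]; requires z >= -1/e
W0 <- function(z) vapply(z, function(zi) {
  if (zi == 0) return(0)
  uniroot(function(w) w * exp(w) - zi, lower = -1, upper = max(zi, 0) + 1,
          tol = 1e-12)$root
}, numeric(1))

# Lambert-F cdf: G(x) = 1 - [1 - F(x)] * alpha^F(x), alpha in (0, e)
pLambertF <- function(x, alpha, pbase, ...) {
  Fx <- pbase(x, ...)
  1 - (1 - Fx) * alpha^Fx
}

# Lambert-F qf for alpha != 1: F^{-1}(1 + W0(log(alpha) * (u - 1) / alpha) / log(alpha))
qLambertF <- function(u, alpha, qbase, ...) {
  qbase(1 + W0(log(alpha) * (u - 1) / alpha) / log(alpha), ...)
}

u <- c(0.1, 0.5, 0.9)
x_lu  <- qLambertF(u, alpha = 1.7, qbase = qunif)            # uniform baseline (LU)
x_exp <- qLambertF(u, alpha = 0.4, qbase = qexp, rate = 2)   # exponential baseline
print(pLambertF(x_lu, alpha = 1.7, pbase = punif))            # should recover u
print(pLambertF(x_exp, alpha = 0.4, pbase = pexp, rate = 2))  # should recover u
```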
Under a uniform (U) baseline distribution, Iriarte et al. [10] propose the Lambert-uniform (LU) distribution, defined by the pdf $f(x; \alpha) = \alpha^{x} [1 - \log(\alpha)(1 - x)]$, $x \in [0, 1]$, $\alpha \in (0, e)$. This pdf has a monotonic shape, increasing or decreasing depending on $\alpha$, and converges to finite values at the ends of its support. Iriarte et al. [10] show that the LU distribution can perform better than the B and K distributions when the histogram of the data exhibits increasing or decreasing behavior.
In our case, we use in Equation (1) a proportional hazard uniform (PHU) baseline distribution [11], defined by the cdf $F(x; \beta) = 1 - (1 - x)^{\beta}$, $x \in [0, 1)$, $\beta > 0$. This distribution is characterized by a hazard rate function (hrf) proportional to the hrf of the U distribution, and it has a monotonic pdf. In this way, we obtain a new two-parameter distribution whose pdf tends to a positive finite value at the lower end of its support (similar to the LU pdf). In addition to monotonic shapes, this pdf can also present unimodal and reverse-unimodal shapes.
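The proportional-hazard property of the PHU baseline can be checked numerically. This small R sketch uses an illustrative value $\beta = 2.5$. It divides the PHU hrf $\beta/(1 - x)$ by the U hrf $1/(1 - x)$ and should obtain the constant $\beta$.

```r
# PHU cdf and pdf for an illustrative beta
beta <- 2.5
p_phu <- function(x) 1 - (1 - x)^beta
d_phu <- function(x) beta * (1 - x)^(beta - 1)

x <- c(0.1, 0.4, 0.8)
h_phu  <- d_phu(x) / (1 - p_phu(x))   # PHU hrf, equal to beta / (1 - x)
h_unif <- dunif(x) / (1 - punif(x))   # U hrf, equal to 1 / (1 - x)
print(h_phu / h_unif)                 # constant ratio equal to beta
```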
The article is organized as follows. In Section 2, we propose the new distribution and derive its main structural properties, such as the pdf, cdf, and hrf. We also derive the raw moments, which are used to describe the behavior of the skewness and kurtosis of the distribution. In Section 3, we address parameter estimation via the ML method and develop a simulation study to evaluate the behavior of the estimators. In Section 4, we present two applications to real data that illustrate the usefulness of the new distribution. Concluding remarks are given in Section 5.

2. The New Distribution

In this section, we propose the new distribution, derive some of its main structural properties, and describe the behavior of skewness and kurtosis.

2.1. LPHU Random Variable

In what follows, we define the LPHU random variable and derive some of its main properties.
Definition 1.
A random variable $X$ follows the Lambert proportional hazard uniform (LPHU) distribution with shape parameters $\alpha \in (0, e)$ and $\beta > 0$, denoted as $X \sim \mathrm{LPHU}(\alpha, \beta)$, if it can be represented as
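$$X = \begin{cases} 1 - \left[-\dfrac{1}{\log(\alpha)} W_0\!\left(\dfrac{\log(\alpha)(U - 1)}{\alpha}\right)\right]^{1/\beta}, & \text{if } \alpha \in (0, 1) \cup (1, e), \\[2ex] 1 - (1 - U)^{1/\beta}, & \text{if } \alpha = 1, \end{cases} \tag{2}$$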
where $e$ is Euler's number, $W_0(\cdot)$ is the principal branch of the Lambert W function, and $U$ is a uniform$(0, 1)$ random variable.
Using the identities $W_0(-\log(\alpha)/\alpha) = -\log(\alpha)$, valid for $\alpha \in (0, e)$, and $W_0(0) = 0$, it can be directly verified that $\lim_{U \to 0^+} X = 0$ and $\lim_{U \to 1^-} X = 1$. Since $W_0(\cdot)$ is a monotonic function, $X$ is a one-to-one transformation of $U$ that maps the interval $(0, 1)$ onto itself. Consequently, the distribution of $X$ inherits the support of the distribution of $U$, but it presents a greater variety of shapes due to the extra parameter $\alpha$.
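The limits and the monotonicity claimed above can be checked numerically. The sketch assumes a hypothetical `uniroot`-based `W0` as a stand-in for `lamW::lambertW0`, and `qLPHU` is a hypothetical name for the map $U \mapsto X$ of Equation (2). It checks the identity $W_0(-\log(\alpha)/\alpha) = -\log(\alpha)$ for a few values of $\alpha \in (0, e)$. It then evaluates $X$ on a grid of $U$ values that includes points close to 0 and 1.

```r
# Hypothetical base-R stand-in for lamW::lambertW0 (principal branch, z >= -1/e)
W0 <- function(z) vapply(z, function(zi) {
  if (zi == 0) return(0)
  uniroot(function(w) w * exp(w) - zi, lower = -1, upper = max(zi, 0) + 1,
          tol = 1e-12)$root
}, numeric(1))

# Equation (2): X as a function of U (hypothetical helper name)
qLPHU <- function(u, alpha, beta) {
  if (alpha == 1) return(1 - (1 - u)^(1 / beta))
  1 - (-W0(log(alpha) * (u - 1) / alpha) / log(alpha))^(1 / beta)
}

# Identity behind the lower limit: W0(-log(alpha) / alpha) = -log(alpha), alpha in (0, e)
alphas <- c(0.5, 1.5, 2.5)
print(cbind(W0(-log(alphas) / alphas), -log(alphas)))

# X increases from about 0 to about 1 as U goes from 0 to 1
u <- c(1e-8, 0.25, 0.5, 0.75, 1 - 1e-8)
x <- qLPHU(u, alpha = 0.5, beta = 0.6)
print(x)
```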
Proposition 1.
Let $X \sim \mathrm{LPHU}(\alpha, \beta)$. Then, the cdf of $X$ is given by
$$G(x; \alpha, \beta) = 1 - (1 - x)^{\beta} \alpha^{1 - (1 - x)^{\beta}}, \quad x \in (0, 1),\ \alpha \in (0, e),\ \beta > 0. \tag{3}$$
Proof. 
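From Equation (2), for $\alpha \in (1, e)$, we have that
$$G(x; \alpha, \beta) = P(X \le x) = P\left[W_0\!\left(\frac{\log(\alpha)(U - 1)}{\alpha}\right) \le -\log(\alpha)(1 - x)^{\beta}\right].$$
Then, since $W_0(\cdot)$ is increasing and, by definition of the Lambert W function, $W_0(z)\, e^{W_0(z)} = z$, it follows that
$$P(X \le x) = P\left[\frac{\log(\alpha)(U - 1)}{\alpha} \le -\log(\alpha)(1 - x)^{\beta} \alpha^{-(1 - x)^{\beta}}\right] = P\left[U \le 1 - (1 - x)^{\beta} \alpha^{1 - (1 - x)^{\beta}}\right].$$
For $\alpha \in (0, 1)$, the first two inequalities are reversed because $\log(\alpha) < 0$, and the same final expression is obtained. The result follows from $P(U \le u) = u$, since $U$ has a uniform$(0, 1)$ distribution. Finally, note that the expression obtained is also valid for $\alpha = 1$, since $G(x; \alpha = 1, \beta) = 1 - (1 - x)^{\beta}$ is the PHU cdf.    □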
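Proposition 1 can also be checked by simulation: samples generated through Equation (2) should have an empirical cdf close to Equation (3). This R sketch uses the Scenario A values ($\alpha = 1.5$, $\beta = 2$). As before, it relies on a hypothetical `uniroot`-based `W0` in place of `lamW::lambertW0`. The Kolmogorov–Smirnov statistic from base R's `ks.test` is printed only as a rough distance measure.

```r
set.seed(123)
# Hypothetical base-R stand-in for lamW::lambertW0 (principal branch, z >= -1/e)
W0 <- function(z) vapply(z, function(zi) {
  if (zi == 0) return(0)
  uniroot(function(w) w * exp(w) - zi, lower = -1, upper = max(zi, 0) + 1,
          tol = 1e-12)$root
}, numeric(1))
pLPHU <- function(x, alpha, beta) 1 - (1 - x)^beta * alpha^(1 - (1 - x)^beta)  # Equation (3)

alpha <- 1.5; beta <- 2
u <- runif(5000)
x <- 1 - (-W0(log(alpha) * (u - 1) / alpha) / log(alpha))^(1 / beta)           # Equation (2)

grid <- c(0.1, 0.3, 0.5, 0.7, 0.9)
print(rbind(empirical = ecdf(x)(grid), theoretical = pLPHU(grid, alpha, beta)))
print(ks.test(x, pLPHU, alpha = alpha, beta = beta)$statistic)  # KS distance to Equation (3)
```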
The pdf of X can be obtained in a straightforward way from Proposition 1.
Corollary 1.
Let $X \sim \mathrm{LPHU}(\alpha, \beta)$. Then, the pdf of $X$ is given by
$$g(x; \alpha, \beta) = \beta (1 - x)^{\beta - 1} \alpha^{1 - (1 - x)^{\beta}} \left[1 - \log(\alpha)(1 - x)^{\beta}\right]. \tag{4}$$
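Here is a quick numerical sanity check of Corollary 1, written as an R sketch with the illustrative values $\alpha = 1.5$ and $\beta = 2$. The pdf in Equation (4) should integrate to 1. It should also agree with a central finite difference of the cdf in Equation (3) and take the value $\beta[1 - \log(\alpha)]$ at $x = 0$.

```r
pLPHU <- function(x, alpha, beta) 1 - (1 - x)^beta * alpha^(1 - (1 - x)^beta)   # Equation (3)
dLPHU <- function(x, alpha, beta)                                               # Equation (4)
  beta * (1 - x)^(beta - 1) * alpha^(1 - (1 - x)^beta) * (1 - log(alpha) * (1 - x)^beta)

alpha <- 1.5; beta <- 2
total <- integrate(dLPHU, 0, 1, alpha = alpha, beta = beta)$value   # should be 1

x <- c(0.2, 0.5, 0.8); h <- 1e-6
num_deriv <- (pLPHU(x + h, alpha, beta) - pLPHU(x - h, alpha, beta)) / (2 * h)
print(total)
print(cbind(numerical = num_deriv, equation4 = dLPHU(x, alpha, beta)))
print(dLPHU(0, alpha, beta) - beta * (1 - log(alpha)))            # g(0) = beta[1 - log(alpha)]
```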
Note that the functions given in Equations (2)–(4) have closed-form expressions, so they are easy to implement computationally. We use the lamW package [12] of the R programming language [13] to compute the principal branch of the Lambert W function. Regarding the shapes of the LPHU pdf, we observe that:
  • The LPHU pdf is not null at the lower end of its support: $g(0; \alpha, \beta) = \beta[1 - \log(\alpha)] > 0$. Thus, the LPHU pdf behaves similarly to the LU and PHU pdf's at the lower end, with the advantage that it can also present unimodal and reverse-unimodal shapes;
  • Equation (4) reduces to the PHU, LU, and uniform$(0, 1)$ pdf's when $\alpha = 1$, $\beta = 1$, and $\alpha = \beta = 1$, respectively. Thus, for such parameter choices, the LPHU pdf inherits the shapes of the PHU, LU, and U pdf's;
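  • For $\alpha \neq 1$ and $\beta \neq 1$, the equation $\partial g(x; \alpha, \beta)/\partial x = 0$ reduces to $\beta x_1^2 - (3\beta - 1)x_1 + \beta - 1 = 0$, with $x_1 = \log(\alpha)(1 - x)^{\beta}$. Hence, the LPHU pdf may have a critical point at
    $$x_0 = 1 - \left[\frac{3\beta - 1 - \sqrt{\beta(5\beta - 2) + 1}}{2\beta \log(\alpha)}\right]^{1/\beta},$$
    provided that the term in brackets lies in $(0, 1)$. The point $(x_0, g(x_0; \alpha, \beta))$ is a maximum or a minimum if $d(x_0) < 0$ or $d(x_0) > 0$, respectively, where
    $$d(x) = 2 - \beta(3 - \beta) - x_1\left\{2 - \beta(9 - 7\beta) + \beta x_1\left[\beta(x_1 - 6) + 3\right]\right\}, \quad x_1 = \log(\alpha)(1 - x)^{\beta}.$$
    Indeed, $\partial^2 g(x; \alpha, \beta)/\partial x^2 = g(x; \alpha, \beta)\, d(x) / [(1 - x)^2 (1 - x_1)]$, so $d(x)$ has the same sign as the second derivative of the pdf.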
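The critical point $x_0$ and the sign rule can be compared with a direct numerical search. In this R sketch, `x0_LPHU` and `d_LPHU` are hypothetical helpers that code the expressions above. Here $d(x)$ is taken to satisfy $\partial^2 g/\partial x^2 = g(x)\,d(x)/[(1 - x)^2(1 - x_1)]$. Base R's `optimize` locates the mode in Scenario A ($\alpha = 1.5$, $\beta = 2$) and the antimode in Scenario B ($\alpha = 0.5$, $\beta = 0.6$). A second-difference approximation of $\partial^2 g/\partial x^2$ is also compared with that identity.

```r
dLPHU <- function(x, alpha, beta)
  beta * (1 - x)^(beta - 1) * alpha^(1 - (1 - x)^beta) * (1 - log(alpha) * (1 - x)^beta)

# Candidate critical point x0 and sign function d(x), for alpha != 1 and beta != 1
x0_LPHU <- function(alpha, beta)
  1 - ((3 * beta - 1 - sqrt(beta * (5 * beta - 2) + 1)) / (2 * beta * log(alpha)))^(1 / beta)
d_LPHU <- function(x, alpha, beta) {
  x1 <- log(alpha) * (1 - x)^beta
  2 - beta * (3 - beta) - x1 * (2 - beta * (9 - 7 * beta) + beta * x1 * (beta * (x1 - 6) + 3))
}

# Scenario A (unimodal pdf): x0 should be the mode, with d(x0) < 0
xa <- x0_LPHU(1.5, 2)
xa_opt <- optimize(dLPHU, c(0, 1), alpha = 1.5, beta = 2, maximum = TRUE)$maximum
# Scenario B (reverse-unimodal pdf): x0 should be the antimode, with d(x0) > 0
xb <- x0_LPHU(0.5, 0.6)
xb_opt <- optimize(dLPHU, c(0, 1), alpha = 0.5, beta = 0.6)$minimum
print(rbind(A = c(x0 = xa, optimize = xa_opt, d = d_LPHU(xa, 1.5, 2)),
            B = c(x0 = xb, optimize = xb_opt, d = d_LPHU(xb, 0.5, 0.6))))

# Second derivative: finite differences versus g(x) d(x) / [(1 - x)^2 (1 - x1)]
x <- 0.4; h <- 1e-4
g2_num <- (dLPHU(x + h, 0.5, 0.6) - 2 * dLPHU(x, 0.5, 0.6) + dLPHU(x - h, 0.5, 0.6)) / h^2
x1 <- log(0.5) * (1 - x)^0.6
g2_formula <- dLPHU(x, 0.5, 0.6) * d_LPHU(x, 0.5, 0.6) / ((1 - x)^2 * (1 - x1))
print(c(g2_num, g2_formula))
```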
The upper left panel of Figure 1 shows some LPHU pdf curves for different choices of $\alpha$ and $\beta$; the LPHU pdf can present monotonic, unimodal, and reverse-unimodal shapes. The upper right and lower panels of the same figure show the histograms of three sets of LPHU pseudo-random numbers, together with the B and K pdf's evaluated at the ML estimates. The pseudo-random numbers were generated from Equation (2) using a uniform$(0, 1)$ random input. The B and K pdf's deviate from the relative frequencies exhibited by the histograms, especially in the lower quantiles, since they tend to $\infty$ or 0 at the lower end of the support. This suggests that the LPHU distribution could perform better than the B and K distributions when fitting real data whose histogram behaves similarly to those shown in this figure.
R codes for the computation of Equations (3) and (4) and for the generation of pseudo-random numbers from the LPHU distribution are provided in Appendix A.

2.2. Related Distributions

By choosing suitable values for the shape parameters of the LPHU distribution, the following special cases can be distinguished:
1. If $\alpha = 1$, the LPHU distribution reduces to the PHU distribution;
2. If $\beta = 1$, the LPHU distribution reduces to the LU distribution;
3. If $\alpha = \beta = 1$, the LPHU distribution reduces to the U distribution.
It is well known that some distributions, such as the exponential, Rayleigh, and power distributions, among others, can be derived as transformations of a U random variable. Applying analogous transformations to an LPHU random variable, we derive the following distributions:
  • Let $Y = -\lambda \log(1 - X)$, where $X \sim \mathrm{LPHU}(\alpha, \beta = \lambda)$ and $\lambda > 0$. Then, $Y$ follows the nonscaled Lambert-exponential distribution. See Iriarte et al. [7];
  • Let $Y = \sigma \sqrt{-\log(1 - X)}$, where $X \sim \mathrm{LPHU}(\alpha, \beta = 1/2)$ and $\sigma > 0$. Then, $Y$ follows the Lambert–Rayleigh distribution. See Iriarte et al. [7];
  • Let $Y = X^{1/\delta}$, where $X \sim \mathrm{LPHU}(\alpha, \beta)$ and $\delta > 0$. Then, $Y$ follows a three-parameter distribution that reduces to the K distribution when $\alpha = 1$. In this case, the cdf of $Y$ is given by $F(y; \alpha, \beta, \delta) = 1 - (1 - y^{\delta})^{\beta} \alpha^{1 - (1 - y^{\delta})^{\beta}}$, where $y \in [0, 1)$. Thus, we refer to this distribution as the Lambert–Kumaraswamy distribution.
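Each transformation can be checked through $P(Y \le y) = G(t^{-1}(y); \alpha, \beta)$, where $t$ is the (increasing) transformation. The R sketch below uses illustrative parameter values. It assumes the Rayleigh transformation $Y = \sigma\sqrt{-\log(1 - X)}$ and compares each composed cdf with the corresponding closed form: the Lambert-F cdf with an Exp(1) or Rayleigh($\sigma$) baseline, and the K cdf at $\alpha = 1$.

```r
pLPHU <- function(x, alpha, beta) 1 - (1 - x)^beta * alpha^(1 - (1 - x)^beta)
alpha <- 1.8
y <- c(0.3, 1, 2.5)

# Y = -lambda * log(1 - X), X ~ LPHU(alpha, beta = lambda): P(Y <= y) = G(1 - exp(-y / lambda))
lambda <- 0.7
p_exp <- pLPHU(1 - exp(-y / lambda), alpha, lambda)
p_exp_closed <- 1 - exp(-y) * alpha^(1 - exp(-y))       # Lambert-F cdf, Exp(1) baseline

# Y = sigma * sqrt(-log(1 - X)), X ~ LPHU(alpha, 1/2): P(Y <= y) = G(1 - exp(-(y / sigma)^2))
sigma <- 1.3
p_ray <- pLPHU(1 - exp(-(y / sigma)^2), alpha, 1 / 2)
r <- exp(-y^2 / (2 * sigma^2))                            # Rayleigh(sigma) survival function
p_ray_closed <- 1 - r * alpha^(1 - r)                     # Lambert-F cdf, Rayleigh baseline

# Y = X^(1/delta): P(Y <= y) = G(y^delta), which is the K cdf when alpha = 1
delta <- 2.2; beta <- 1.4; yy <- c(0.2, 0.5, 0.9)
p_lk <- pLPHU(yy^delta, 1, beta)
p_k  <- 1 - (1 - yy^delta)^beta
print(rbind(p_exp, p_exp_closed)); print(rbind(p_ray, p_ray_closed)); print(rbind(p_lk, p_k))
```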
Other distributions in the literature can be derived from appropriate transformations of LPHU random variables; for illustration, this section considers only the three transformations described above. Finally, we highlight that the linear transformation $a + (b - a)X$, where $X \sim \mathrm{LPHU}(\alpha, \beta)$ and $a < b \in \mathbb{R}$, follows an LPHU distribution on the interval $(a, b)$. Therefore, the LPHU distribution can easily be used to fit data bounded to any finite interval of the real line.

2.3. Hazard Rate Function

The reliability function (rf) and the hazard rate function (hrf) play an important role in the analysis of lifetime data in reliability studies. In the following statement the rf and hrf of the LPHU distribution are derived.
Proposition 2.
Let $T \sim \mathrm{LPHU}(\alpha, \beta)$. Then, the rf and the hrf of $T$ are given by
$$R(t; \alpha, \beta) = (1 - t)^{\beta} \alpha^{1 - (1 - t)^{\beta}} \tag{5}$$
and
$$h(t; \alpha, \beta) = \frac{\beta}{1 - t}\left[1 - \log(\alpha)(1 - t)^{\beta}\right]. \tag{6}$$
Proof. 
If $T$ is a random variable representing the failure time of a mechanical unit, the rf of $T$, defined as $R(t) := P(T > t)$, $t > 0$, gives the probability that the unit survives beyond time $t$. If $T \sim \mathrm{LPHU}(\alpha, \beta)$, the result in (5) is obtained as $P(T > t) = 1 - P(T \le t) = 1 - G(t; \alpha, \beta)$, where $G(\cdot; \cdot, \cdot)$ is as in Equation (3).
On the other hand, the hrf of $T$, defined as $h(t) := f(t)/R(t)$ (where $f(\cdot)$ is the pdf of $T$), measures the propensity of a mechanical unit to fail or die depending on the age it has reached. If $T \sim \mathrm{LPHU}(\alpha, \beta)$, the result in (6) is a consequence of Equations (3) and (4).    □
Proposition 2 shows that the LPHU hrf is a multiplicative modification of the PHU hrf, $\beta/(1 - t)$. Furthermore, we observe that
$$\lim_{t \to 1^-} \frac{h(t; \alpha = 1, \beta)}{h(t; \alpha, \beta)} = 1,$$
which means that the LPHU hrf can be understood as a modification of the baseline PHU hrf at early times. Figure 2 shows some LPHU hrf curves for different choices of $\alpha$ and $\beta$. Like the B and K hrf's, the LPHU hrf can present monotonic and reverse-unimodal shapes.
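The following R sketch checks Proposition 2 numerically for the illustrative values $\alpha = 2.2$ and $\beta = 0.8$. Equation (6) should coincide with $g(t)/[1 - G(t)]$. The ratio of the PHU hrf to the LPHU hrf should approach 1 as $t \to 1^-$. `hLPHU` is a hypothetical helper name.

```r
pLPHU <- function(x, alpha, beta) 1 - (1 - x)^beta * alpha^(1 - (1 - x)^beta)
dLPHU <- function(x, alpha, beta)
  beta * (1 - x)^(beta - 1) * alpha^(1 - (1 - x)^beta) * (1 - log(alpha) * (1 - x)^beta)
hLPHU <- function(t, alpha, beta) beta / (1 - t) * (1 - log(alpha) * (1 - t)^beta)  # Equation (6)

alpha <- 2.2; beta <- 0.8
t <- c(0.1, 0.5, 0.9)
print(cbind(equation6 = hLPHU(t, alpha, beta),
            pdf_over_rf = dLPHU(t, alpha, beta) / (1 - pLPHU(t, alpha, beta))))

# Ratio of the PHU hrf beta / (1 - t) to the LPHU hrf as t approaches 1
t_near1 <- 1 - 10^-(1:6)
ratio <- (beta / (1 - t_near1)) / hLPHU(t_near1, alpha, beta)
print(ratio)
```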

2.4. Skewness and Kurtosis Behavior

In this section, we describe the skewness and kurtosis behavior of the LPHU distribution by analyzing Fisher’s skewness and kurtosis coefficients. For this, we first derive the raw moments.
Proposition 3.
Let $X \sim \mathrm{LPHU}(\alpha, \beta)$. Then, for $r = 1, 2, \ldots$, the $r$th raw moment of $X$ is given by
$$\mu_r = E(X^r) = 1 + \alpha \sum_{k=1}^{r} \binom{r}{k} a_k(\alpha, \beta),$$
where $a_k(\alpha, \beta) = (-1)^k \int_0^1 u^{k/\beta} \alpha^{-u} [1 - \log(\alpha) u]\, du$.
Proof. 
From Equation (4), with the change of variable $u = (1 - x)^{\beta}$, we obtain $E(X^r) = \int_0^1 (1 - u^{1/\beta})^r \alpha^{1 - u} [1 - \log(\alpha) u]\, du$. Thus, the binomial theorem leads to the expression
$$E(X^r) = \alpha \sum_{k=0}^{r} \binom{r}{k} (-1)^k \int_0^1 u^{k/\beta} \alpha^{-u} [1 - \log(\alpha) u]\, du,$$
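and the result follows by writing $(-1)^k$ times the previous integral as $a_k(\alpha, \beta)$ and noting that $a_0(\alpha, \beta) = 1/\alpha$, since $\alpha^{-u}[1 - \log(\alpha) u]$ is the derivative of $u \alpha^{-u}$.    □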
Note that the function $a_k(\alpha, \beta)$ in Proposition 3 must be calculated by numerical integration; a good alternative is the integrate function of the R language.
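Following that suggestion, here is a minimal R sketch of Proposition 3 using `integrate`; `a_k` and `mu_r` are hypothetical helper names. As a check, the moments are compared with direct numerical integration of $x^r g(x; \alpha, \beta)$, and $a_0(\alpha, \beta) = 1/\alpha$ is verified for the illustrative values $\alpha = 1.5$, $\beta = 2$.

```r
dLPHU <- function(x, alpha, beta)
  beta * (1 - x)^(beta - 1) * alpha^(1 - (1 - x)^beta) * (1 - log(alpha) * (1 - x)^beta)

# a_k(alpha, beta) of Proposition 3 by numerical integration
a_k <- function(k, alpha, beta)
  (-1)^k * integrate(function(u) u^(k / beta) * alpha^(-u) * (1 - log(alpha) * u),
                     0, 1, rel.tol = 1e-10)$value
# r-th raw moment from Proposition 3
mu_r <- function(r, alpha, beta)
  1 + alpha * sum(choose(r, 1:r) * sapply(1:r, a_k, alpha = alpha, beta = beta))

alpha <- 1.5; beta <- 2
via_prop3 <- sapply(1:4, mu_r, alpha = alpha, beta = beta)
direct <- sapply(1:4, function(r)
  integrate(function(x) x^r * dLPHU(x, alpha, beta), 0, 1, rel.tol = 1e-10)$value)
print(rbind(via_prop3, direct))
print(c(a0 = a_k(0, alpha, beta), one_over_alpha = 1 / alpha))   # a_0 = 1/alpha
```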
Corollary 2.
Let $X \sim \mathrm{LPHU}(\alpha, \beta)$. Then, the mean and variance of $X$ are $E(X) = 1 + \alpha a_1$ and $V(X) = \alpha(a_2 - \alpha a_1^2)$, where $a_j = a_j(\alpha, \beta)$, with $j = 1, 2$, is as in Proposition 3.
Corollary 3.
Let $X \sim \mathrm{LPHU}(\alpha, \beta)$. Then, Fisher's skewness and kurtosis coefficients of $X$ are given by
$$\beta_1 = \frac{E[X - E(X)]^3}{[V(X)]^{3/2}} = \frac{a_3 - 3\alpha a_1 a_2 + 2\alpha^2 a_1^3}{\alpha^{1/2}(a_2 - \alpha a_1^2)^{3/2}} \quad \text{and} \quad \beta_2 = \frac{E[X - E(X)]^4}{[V(X)]^2} = \frac{a_4 - 4\alpha a_1 a_3 + 6\alpha^2 a_1^2 a_2 - 3\alpha^3 a_1^4}{\alpha(a_2 - \alpha a_1^2)^2},$$
where $a_j = a_j(\alpha, \beta)$, with $j = 1, 2, 3, 4$, is as in Proposition 3.
Due to their analytical complexity, closed-form expressions for the critical points of Fisher's skewness and kurtosis coefficients cannot be obtained. However, by numerically maximizing and minimizing the coefficients in R, we obtain the approximate ranges $\beta_1 < 4.418$ and $\beta_2 \in (1.717, 46.855)$.
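Corollary 3 can be evaluated with the same numerical integration approach. The R sketch below uses the hypothetical helpers `a_k`, `skew_kurt`, and `direct_skew_kurt` and the illustrative values $\alpha = 1.5$, $\beta = 2$. It compares the expressions of Corollary 3 with skewness and kurtosis computed directly from central moments of the pdf. It does not reproduce the search for the ranges reported above.

```r
dLPHU <- function(x, alpha, beta)
  beta * (1 - x)^(beta - 1) * alpha^(1 - (1 - x)^beta) * (1 - log(alpha) * (1 - x)^beta)
a_k <- function(k, alpha, beta)
  (-1)^k * integrate(function(u) u^(k / beta) * alpha^(-u) * (1 - log(alpha) * u),
                     0, 1, rel.tol = 1e-10)$value

# Skewness and kurtosis from the expressions of Corollary 3
skew_kurt <- function(alpha, beta) {
  a <- sapply(1:4, a_k, alpha = alpha, beta = beta)
  v <- a[2] - alpha * a[1]^2                      # V(X) / alpha
  c(skewness = (a[3] - 3 * alpha * a[1] * a[2] + 2 * alpha^2 * a[1]^3) /
      (sqrt(alpha) * v^(3 / 2)),
    kurtosis = (a[4] - 4 * alpha * a[1] * a[3] + 6 * alpha^2 * a[1]^2 * a[2] -
                  3 * alpha^3 * a[1]^4) / (alpha * v^2))
}
# Direct check from central moments of the pdf
direct_skew_kurt <- function(alpha, beta) {
  m  <- integrate(function(x) x * dLPHU(x, alpha, beta), 0, 1, rel.tol = 1e-10)$value
  cm <- function(r) integrate(function(x) (x - m)^r * dLPHU(x, alpha, beta), 0, 1,
                              rel.tol = 1e-10)$value
  c(skewness = cm(3) / cm(2)^(3 / 2), kurtosis = cm(4) / cm(2)^2)
}
print(rbind(corollary3 = skew_kurt(1.5, 2), direct = direct_skew_kurt(1.5, 2)))
```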
Figure 3 shows the behavior of Fisher's skewness and kurtosis coefficients for the LPHU distribution. The skewness coefficient behaves monotonically with respect to both parameters. This can be understood as an indicator of identifiability: different parameter values lead to different skewness levels and, consequently, to different members of the LPHU family. On the other hand, the kurtosis coefficient can behave non-monotonically, so two values of $\alpha$ (or $\beta$) can be associated with the same kurtosis level. This can be interpreted as follows: the kurtosis level reflects the weight of the left tail when the distribution is negatively skewed and the weight of the right tail when it is positively skewed.

3. Parameter Estimation

In this section, we discuss the parameter estimation for the LPHU distribution via the maximum likelihood (ML) method and develop a simulation study to evaluate the behavior of the estimators.

3.1. ML Estimation

Given an observed sample $x_1, \ldots, x_n$ from the random variable $X \sim \mathrm{LPHU}(\alpha, \beta)$, the log-likelihood function is given by
$$\ell(\alpha, \beta) = c(\alpha, \beta) + (\beta - 1) \sum_{i=1}^{n} \log(z_i) - \log(\alpha) \sum_{i=1}^{n} z_i^{\beta} + \sum_{i=1}^{n} \log\left[1 - \log(\alpha) z_i^{\beta}\right], \tag{7}$$
where $z_i = 1 - x_i$ and $c(\alpha, \beta) = n \log(\beta) + n \log(\alpha)$.
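As a consistency check, Equation (7) can be coded directly and compared with the sum of the log-pdf values from Equation (4). This illustrative R sketch uses a hypothetical helper `loglik_eq7` and arbitrary values $\alpha = 1.3$, $\beta = 0.9$. The sample deliberately contains an exact 0, and the log-likelihood stays finite, unlike the B and K cases discussed in Section 1.

```r
dLPHU <- function(x, alpha, beta)
  beta * (1 - x)^(beta - 1) * alpha^(1 - (1 - x)^beta) * (1 - log(alpha) * (1 - x)^beta)

# Equation (7) written out term by term
loglik_eq7 <- function(alpha, beta, x) {
  z <- 1 - x; n <- length(x)
  n * log(beta) + n * log(alpha) + (beta - 1) * sum(log(z)) -
    log(alpha) * sum(z^beta) + sum(log(1 - log(alpha) * z^beta))
}

x <- c(0, 0.05, 0.2, 0.45, 0.7)        # includes an exact zero
ll7 <- loglik_eq7(1.3, 0.9, x)
ll_pdf <- sum(log(dLPHU(x, 1.3, 0.9)))
print(c(ll7, ll_pdf))                  # finite and equal
```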
The ML estimators $\hat\alpha_{ML}$ and $\hat\beta_{ML}$ of $\alpha$ and $\beta$ can be obtained by setting the partial derivatives of Equation (7) equal to zero and solving the resulting system of equations:
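$$n = \sum_{i=1}^{n} z_i^{\beta} + \sum_{i=1}^{n} \frac{z_i^{\beta}}{1 - \log(\alpha) z_i^{\beta}} \tag{8}$$
and
$$\frac{n}{\beta} = \log(\alpha) \sum_{i=1}^{n} z_i^{\beta} \log(z_i) - \sum_{i=1}^{n} \log(z_i) + \log(\alpha) \sum_{i=1}^{n} \frac{z_i^{\beta} \log(z_i)}{1 - \log(\alpha) z_i^{\beta}}. \tag{9}$$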
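Equations (8) and (9) are rearranged score equations. One way to check them is to code the score vector and compare it with central finite differences of the log-likelihood at an arbitrary parameter point. In this R sketch, `score_LPHU` and `loglik_eq7` are hypothetical helper names, and the data values are illustrative.

```r
loglik_eq7 <- function(alpha, beta, x) {
  z <- 1 - x; n <- length(x)
  n * log(beta) + n * log(alpha) + (beta - 1) * sum(log(z)) -
    log(alpha) * sum(z^beta) + sum(log(1 - log(alpha) * z^beta))
}
# Score vector; setting the alpha component to 0 gives (8) and the beta component gives (9)
score_LPHU <- function(alpha, beta, x) {
  z <- 1 - x; n <- length(x); L <- log(alpha); zb <- z^beta; lz <- log(z)
  c(alpha = (n - sum(zb) - sum(zb / (1 - L * zb))) / alpha,
    beta  = n / beta + sum(lz) - L * sum(zb * lz) - L * sum(zb * lz / (1 - L * zb)))
}

x <- c(0.02, 0.1, 0.3, 0.35, 0.6, 0.8)
a <- 1.4; b <- 1.7; h <- 1e-6
numeric_score <- c((loglik_eq7(a + h, b, x) - loglik_eq7(a - h, b, x)) / (2 * h),
                   (loglik_eq7(a, b + h, x) - loglik_eq7(a, b - h, x)) / (2 * h))
print(rbind(analytic = score_LPHU(a, b, x), numeric = numeric_score))
```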
Under regularity conditions, the asymptotic distribution of the ML estimator of $\theta = (\alpha, \beta)$ is $N_2(\theta, I(\theta)^{-1})$, where $I(\theta)$ is the expected information matrix. Given the structure of Equation (7), it is not easy to derive the analytical expression of this matrix. Thus, we approximate it by the observed information matrix. Its elements are minus the second partial derivatives of the log-likelihood function with respect to the parameters, evaluated at the ML estimates.
The observed information matrix is given by
$$I(\theta) = \begin{pmatrix} I_{\alpha\alpha} & I_{\alpha\beta} \\ I_{\alpha\beta} & I_{\beta\beta} \end{pmatrix},$$
where
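$$I_{\alpha\alpha} = \frac{n}{\alpha^2} - \frac{1}{\alpha^2}\sum_{i=1}^{n} z_i^{\beta} - \frac{1}{\alpha^2}\sum_{i=1}^{n} \frac{z_i^{\beta}}{1 - \log(\alpha) z_i^{\beta}} + \frac{1}{\alpha^2}\sum_{i=1}^{n} \frac{z_i^{2\beta}}{[1 - \log(\alpha) z_i^{\beta}]^2},$$
$$I_{\alpha\beta} = \frac{1}{\alpha}\sum_{i=1}^{n} z_i^{\beta}\log(z_i) + \frac{1}{\alpha}\sum_{i=1}^{n} \frac{z_i^{\beta}\log(z_i)}{1 - \log(\alpha) z_i^{\beta}} + \frac{\log(\alpha)}{\alpha}\sum_{i=1}^{n} \frac{z_i^{2\beta}\log(z_i)}{[1 - \log(\alpha) z_i^{\beta}]^2}$$
and
$$I_{\beta\beta} = \frac{n}{\beta^2} + \log(\alpha)\sum_{i=1}^{n} z_i^{\beta}\log^2(z_i) + \log(\alpha)\sum_{i=1}^{n} \frac{z_i^{\beta}\log^2(z_i)}{1 - \log(\alpha) z_i^{\beta}} + \log^2(\alpha)\sum_{i=1}^{n} \frac{z_i^{2\beta}\log^2(z_i)}{[1 - \log(\alpha) z_i^{\beta}]^2}.$$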
Then, approximate $100(1 - \varphi)\%$ confidence intervals for $\alpha$ and $\beta$ are given by $\hat\alpha \pm z_{\varphi/2}\, s_{\hat\alpha}$ and $\hat\beta \pm z_{\varphi/2}\, s_{\hat\beta}$, respectively. Here, $z_{\varphi/2}$ is the upper $\varphi/2$ quantile of the standard normal distribution, and $s_{\hat\alpha}$ and $s_{\hat\beta}$ are the square roots of the diagonal elements of $[I(\theta)]^{-1}$ evaluated at the ML estimates.
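The observed information matrix can be cross-checked against a numerical Hessian. In this R sketch, `obs_info_LPHU` is a hypothetical helper that codes $I_{\alpha\alpha}$, $I_{\alpha\beta}$, and $I_{\beta\beta}$ as minus the second derivatives of Equation (7). `optimHess`, from R's stats package, differentiates the negative log-likelihood numerically at the same arbitrary point, and the data are illustrative.

```r
loglik_eq7 <- function(alpha, beta, x) {
  z <- 1 - x; n <- length(x)
  n * log(beta) + n * log(alpha) + (beta - 1) * sum(log(z)) -
    log(alpha) * sum(z^beta) + sum(log(1 - log(alpha) * z^beta))
}
# Observed information: minus the Hessian of Equation (7)
obs_info_LPHU <- function(alpha, beta, x) {
  z <- 1 - x; n <- length(x); L <- log(alpha); zb <- z^beta
  lz <- log(z); den <- 1 - L * zb
  i_aa <- (n - sum(zb) - sum(zb / den) + sum(zb^2 / den^2)) / alpha^2
  i_ab <- (sum(zb * lz) + sum(zb * lz / den) + L * sum(zb^2 * lz / den^2)) / alpha
  i_bb <- n / beta^2 + L * sum(zb * lz^2) + L * sum(zb * lz^2 / den) +
    L^2 * sum(zb^2 * lz^2 / den^2)
  matrix(c(i_aa, i_ab, i_ab, i_bb), 2, 2,
         dimnames = list(c("alpha", "beta"), c("alpha", "beta")))
}
negloglik <- function(p, x) -loglik_eq7(p[1], p[2], x)

x <- c(0.02, 0.1, 0.3, 0.35, 0.6, 0.8)
th <- c(1.4, 1.7)
analytic <- obs_info_LPHU(th[1], th[2], x)
num_hess <- optimHess(th, negloglik, x = x)   # numerical Hessian of -loglik
print(analytic)
print(num_hess)
```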

3.2. Computational Guidelines

Equations (8) and (9) show that the ML estimators have no explicit form, so the estimates must be obtained using numerical procedures. One option is the multiroot function [14] of the R programming language, which implements the Newton–Raphson method for finding roots of systems of nonlinear equations.
In practice, we consider the following points:
  • As an alternative to solving Equations (8) and (9), the ML estimates can be obtained by solving the optimization problem $\max_{(\alpha, \beta)} \ell(\alpha, \beta)$ subject to $\alpha \in (0, e)$ and $\beta > 0$, where $\ell(\cdot, \cdot)$ is given in Equation (7).
  • For this, we use the optim function of the R programming language. An R code is provided in Appendix B.
  • Specifically, we consider the L-BFGS-B algorithm [15], which allows the parameter space to be specified. This algorithm requires an initial value in the parameter space to start the iterative process. Since the PHU distribution is a special case of the LPHU distribution, we use $\alpha_0 = 1$ and $\beta_0 = -n / \sum_{i=1}^{n} \log(1 - x_i)$, where $\beta_0$ is the ML estimator of the shape parameter of the PHU distribution.
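The minimal end-to-end R sketch below combines these guidelines. It simulates an LPHU(1.5, 2) sample through Equation (2), starts L-BFGS-B at $\alpha_0 = 1$ and the PHU estimate $\beta_0$, and builds 95% Wald intervals from the Hessian returned by `optim`. The upper bound $e - 10^{-8}$ for $\alpha$ is an assumption made here to keep $\alpha$ inside the open interval $(0, e)$. `W0` is again a hypothetical `uniroot`-based stand-in for `lamW::lambertW0`.

```r
set.seed(2023)
# Hypothetical base-R stand-in for lamW::lambertW0 (principal branch, z >= -1/e)
W0 <- function(z) vapply(z, function(zi) {
  if (zi == 0) return(0)
  uniroot(function(w) w * exp(w) - zi, lower = -1, upper = max(zi, 0) + 1,
          tol = 1e-12)$root
}, numeric(1))
dLPHU <- function(x, alpha, beta)
  beta * (1 - x)^(beta - 1) * alpha^(1 - (1 - x)^beta) * (1 - log(alpha) * (1 - x)^beta)
negloglik <- function(p, x) -sum(log(dLPHU(x, p[1], p[2])))

# Simulated LPHU(1.5, 2) sample via Equation (2)
alpha <- 1.5; beta <- 2; u <- runif(1000)
x <- 1 - (-W0(log(alpha) * (u - 1) / alpha) / log(alpha))^(1 / beta)

# Starting values from the PHU special case, then bounded L-BFGS-B
beta0 <- -length(x) / sum(log(1 - x))
fit <- optim(c(1, beta0), negloglik, x = x, method = "L-BFGS-B", hessian = TRUE,
             lower = c(1e-8, 1e-8), upper = c(exp(1) - 1e-8, Inf))
se <- sqrt(diag(solve(fit$hessian)))     # asymptotic standard errors
ci <- cbind(lower = fit$par - qnorm(0.975) * se, upper = fit$par + qnorm(0.975) * se)
rownames(ci) <- c("alpha", "beta")
print(fit$par)
print(ci)
```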

3.3. Simulation Study

For each sample size $n = 10, 20, 30, \ldots, 1500$, we generate 1000 random samples from the LPHU distribution under three scenarios:
  • Scenario A: $\alpha = 1.5$ and $\beta = 2.0$, so the pseudo-random numbers come from a unimodal LPHU distribution;
  • Scenario B: $\alpha = 0.5$ and $\beta = 0.6$, so they come from an LPHU distribution with a reverse-unimodal pdf;
  • Scenario C: $\alpha = 2.0$ and $\beta = 0.7$, so they come from an LPHU distribution with an increasing pdf.
Pseudo-random numbers were generated from Equation (2) considering the following steps:
  • Generate $u \sim \mathrm{uniform}(0, 1)$;
  • Compute $x = 1 - \left[-\frac{1}{\log(\alpha)} W_0\left(\frac{\log(\alpha)(u - 1)}{\alpha}\right)\right]^{1/\beta}$.
For each simulated sample, we obtain the ML estimates following the guidelines of Section 3.2. The R code used in Scenario A is provided in Appendix C; for Scenarios B and C, it is enough to modify the true parameter values set at the beginning of the code.
Figure 4, Figure 5 and Figure 6 illustrate the behavior of five summaries, computed for each set of 1000 estimates under the different sample sizes and scenarios:
  • the average estimate (AE);
  • the standard deviation (SD);
  • the square root of the simulated mean squared error (RMSE);
  • the average asymptotic standard error (SE);
  • the coverage probability of the 95% asymptotic confidence interval (CP).
The AE's approach the true parameter values as the sample size increases. The parameters are overestimated when the estimates are obtained from small samples ($n = 10$), but this overestimation decreases rapidly as the sample size increases. The SD's, RMSE's, and SE's are close to each other and decrease towards 0 as the sample size increases, as expected from standard asymptotic theory. Similarly, the CP's converge to the nominal level used to construct the confidence intervals as the sample size increases.

4. Data Analysis

In this section, we present two applications of the LPHU distribution in which its performance in fitting real data is contrasted with that of the B and K distributions.

4.1. Firm’s Risk Management Cost Effectiveness

We consider 73 observations on the measure of the firm’s risk management cost effectiveness presented by Schmit and Roth [16]. Some descriptive statistics of the data are as follows: Minimum, 0.0020; Maximum, 0.9755; Fisher’s skewness coefficient, 3.7154; Fisher’s kurtosis coefficient, 17.957.
Table 1 reports the ML estimates, the values associated with the Akaike Information Criterion (AIC) [17], and the Bayesian Information Criterion (BIC) [18], and the p-values of the traditional Anderson–Darling (AD) and Cramer–von Mises (CvM) goodness of fit tests [19] for the LPHU, B, and K distributions fitted to the firm’s risk management cost effectiveness data. Looking at the table, under the 0.05 significance level, we note that the p-values indicate that only the LPHU distribution appropriately models the firm’s risk management cost effectiveness data. In addition, it can be seen that the LPHU distribution is the one with the lowest AIC and BIC values among the fitted distributions, suggesting that this distribution should be selected for modeling the firm’s risk management cost effectiveness data.
We observe that the LPHU pdf, evaluated at the ML estimates provided in Table 1, takes the value $\hat\beta[1 - \log(\hat\alpha)] = 2.846$ at the lower end of its support, allowing more accurate modeling of the lower empirical quantiles. Figure 7 contrasts the empirical distribution with the fitted LPHU, B, and K distributions. The lower empirical quantiles are closest to the LPHU quantiles.

4.2. Household Shared Budget for Transportation

Studies on household budget shares are highly valued for tracking the evolution of the price level faced by households in a given locality; see, for example, Blundell et al. [3]. The BudgetUk database [20] of the R language provides information on the proportion of the budget that British households spend on different items, such as food, clothing, transport, fuel, and alcohol.
In this application, we compare the performance of the LPHU, B, and K distributions in modeling the budget share of households that allocate part of their budget to transportation. These data consist of 1473 observations bounded to the interval $(0, 1)$, and their histogram exhibits unimodal behavior. From now on, we refer to these data simply as the household shared budget proportion data.
For comparison, we use the sample function of R to draw 1000 random samples of size 100 from the household shared budget proportion data. We then compute two summaries for the LPHU, B, and K distributions:
  • the non-rejection rate: the proportion of samples for which each distribution fits the data adequately according to the modified AD and CvM goodness-of-fit tests [21] at the 0.05 significance level;
  • the hit rate: the proportion of samples in which each distribution presents the lowest AIC and BIC values, that is, the proportion of samples in which it performs best.
Table 2 reports the non-rejection and hit rates for the three distributions fitted to the 1000 samples. The LPHU distribution adequately fits a larger proportion of samples than the B and K distributions, and it performs best in 83.6% of the samples.
Table 3 shows the parameter estimates and the AIC, BIC, $W^*$, and $A^*$ values for the LPHU, B, and K distributions fitted to a single sample drawn from the household shared budget proportion data. The LPHU distribution provides a better fit than the B and K distributions. The fitted LPHU pdf takes the value $\hat\beta[1 - \log(\hat\alpha)] = 2.846$ at the lower end of its support. This allows the LPHU distribution to fit the lower empirical quantiles more closely than the B and K distributions, as illustrated in the upper left panel of Figure 8. The upper right and lower panels of Figure 8 present the qq-plots for the LPHU, B, and K distributions, where the LPHU distribution shows a better fit of the lower quantiles.

5. Concluding Remarks

We propose a new two-parameter distribution for fitting bounded data. The pdf of the new distribution, called the LPHU distribution, presents shapes similar to those of the popular B and K distributions. Unlike them, it tends to a positive finite value at the lower end of its support, which may lead to a better fit of the lower empirical quantiles of certain datasets. The pdf, cdf, and hrf have closed-form expressions, so they are easy to compute. The description of Fisher's skewness and kurtosis coefficients shows that the LPHU distribution can present both positive and negative skewness and can capture high levels of kurtosis. The ML estimators for the LPHU distribution have no closed form, so numerical procedures are necessary to obtain the estimates; we use the optim function of the R programming language for this task. Simulation studies show that the ML method provides acceptable estimates of the parameters of the LPHU distribution. Finally, two applications illustrate that the LPHU distribution can fit real data better than the popular B and K distributions.

Author Contributions

Conceptualization, Y.A.I. and M.A.R.; methodology, J.R. and H.V.; software, Y.A.I.; validation, Y.A.I., M.A.R., J.R. and H.V.; formal analysis, Y.A.I., M.A.R., J.R. and H.V.; investigation, Y.A.I.; supervision, Y.A.I. All authors contributed significantly to this research article. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the internal project SEMILLERO UA-2022 (Chile).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

    The following abbreviations are used in this manuscript:
B: beta distribution
K: Kumaraswamy distribution
pdf: probability density function
ML: maximum likelihood
cdf: cumulative distribution function
U: uniform distribution
LU: Lambert uniform distribution
PHU: proportional hazard uniform distribution
rf: reliability function
hrf: hazard rate function
LPHU: Lambert proportional hazard uniform distribution
AE: average estimate
SD: standard deviation
RMSE: root mean square error
SE: standard error
CP: coverage probability
AD: Anderson–Darling
CvM: Cramer–von Mises
AIC: Akaike information criterion
BIC: Bayesian information criterion

Appendix A. R Codes

R codes for computing the cdf and pdf and for obtaining pseudo-random numbers from the LPHU distribution.
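# cdf (Equation (3)) and pdf (Equation (4)) of the LPHU distribution, with a = alpha and b = beta
pLPHU <- function(x, a, b){1 - (1 - x)^b * a^(1 - (1 - x)^b)}
dLPHU <- function(x, a, b){b * (1 - x)^(b - 1) * a^(1 - (1 - x)^b) * (1 - log(a) * (1 - x)^b)}

# Pseudo-random numbers from Equation (2), valid for a != 1 (lambertW0() is from the lamW package)
library(lamW)
n <- 1000 ; a <- 1.7 ; b <- 2 ; u <- runif(n)
x <- 1 - (-1/log(a) * lambertW0(log(a) * (u - 1)/a))^(1/b)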

Appendix B. R Code

R code to obtain ML estimates for the parameters of the LPHU distribution.
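# Negative log-likelihood of the LPHU distribution (optim minimizes); uses dLPHU() from Appendix A
loglikLPHU <- function(p, x){
  -sum(log(dLPHU(x, p[1], p[2])))
}
beta0 <- -length(data)/sum(log(1 - data))  # data: numeric vector of observations in [0, 1)
# alpha must lie in the open interval (0, e), so the upper bound is set just below exp(1)
optim(par = c(1, beta0), fn = loglikLPHU, method = "L-BFGS-B",
      hessian = TRUE, lower = c(1e-12, 1e-12), upper = c(exp(1) - 1e-12, Inf), x = data)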

Appendix C. R Code

R code used in the simulation study of Section 3.3, scenario A. It requires the lamW package and the functions dLPHU (Appendix A) and loglikLPHU (Appendix B).
library(lamW)
n <- 150             # number of sample sizes considered: 10, 20, ..., 1500
alpha <- 1.5         # true value of alpha
beta <- 2            # true value of beta
ejex <- 10*(1:n)     # sample sizes, used as the horizontal axis of the plots
media.alpha <- desEs.alpha <- erEst.alpha <- erCua.alpha <- numeric(n)
probC.alpha <- numeric(n)
media.beta <- desEs.beta <- erEst.beta <- erCua.beta <- numeric(n)
probC.beta <- numeric(n)
for(j in 1:n){
  alpha.mv <- se.alpha <- rmse.alpha <- count.alpha <- numeric(1000)
  beta.mv <- se.beta <- rmse.beta <- count.beta <- numeric(1000)
  for(i in 1:1000){
    count2 <- 0
    # Repeat until the fit succeeds and both estimated variances are positive
    while(count2==0){
      count1 <- 0
      while(count1==0){
        # Sample of size 10*j from LPHU(alpha, beta) by inverting the cdf
        u <- runif(10*j)
        if(alpha==1){
          x <- 1-(1-u)^{1/beta}    # alpha = 1 reduces to the PHU(beta) case
        }else{
          x <- 1-(-1/log(alpha)*lambertW0(log(alpha)*
                 (u-1)/alpha))^{1/beta}
        }
        alpha0 <- 1
        beta0 <- -length(x)/sum(log(1-x))
        fit <- try(optim(par=c(alpha0,beta0),fn=loglikLPHU,hessian=TRUE,
                 method="L-BFGS-B",lower=c(1e-15,1e-15),
                 upper=c(exp(1),Inf),x=x),silent=TRUE)
        fit2 <- try(fit$par,silent=TRUE)
        if(is.numeric(fit2)==FALSE){count1=0}else{count1=1}
      }
      # Estimated variances: diagonal of the inverse observed information
      var <- try(diag(solve(fit$hessian)),silent=TRUE)
      if(inherits(var,"try-error") || any(!is.finite(var)) || any(var<=0)){
        count2 <- 0
      }else{
        count2 <- 1
      }
    }
    # 95% Wald confidence limits
    LIalpha <- fit$par[1]-1.96*sqrt(var[1])
    LSalpha <- fit$par[1]+1.96*sqrt(var[1])
    LIbeta <- fit$par[2]-1.96*sqrt(var[2])
    LSbeta <- fit$par[2]+1.96*sqrt(var[2])
    alpha.mv[i] <- fit$par[1]
    se.alpha[i] <- sqrt(var[1])
    rmse.alpha[i] <- (alpha-fit$par[1])^2    # squared error; the RMSE is formed below
    count.alpha[i] <- if(alpha > LIalpha & alpha < LSalpha){1}else{0}
    beta.mv[i] <- fit$par[2]
    se.beta[i] <- sqrt(var[2])
    rmse.beta[i] <- (beta-fit$par[2])^2
    count.beta[i] <- if(beta > LIbeta & beta < LSbeta){1}else{0}
  }
  # AE, SD, SE, RMSE and CP over the 1000 replicates for sample size 10*j
  media.alpha[j] <- mean(alpha.mv)
  desEs.alpha[j] <- sd(alpha.mv)
  erEst.alpha[j] <- mean(se.alpha)
  erCua.alpha[j] <- sqrt(mean(rmse.alpha))
  probC.alpha[j] <- mean(count.alpha)
  media.beta[j] <- mean(beta.mv)
  desEs.beta[j] <- sd(beta.mv)
  erEst.beta[j] <- mean(se.beta)
  erCua.beta[j] <- sqrt(mean(rmse.beta))
  probC.beta[j] <- mean(count.beta)
}
# Plots for alpha
plot(ejex,media.alpha,type="l",col="red",lwd=2,xlab="n",ylab="AE",cex.lab=2)
abline(h=alpha,lty=2)
plot(ejex,desEs.alpha,type="l",col="red",lwd=3,xlab="n",ylab="SD, SE and RMSE",
     ylim=c(0,1),cex.lab=2)
lines(ejex,erEst.alpha,type="l",col="green",lwd=2)
lines(ejex,erCua.alpha,type="l",col="blue",lwd=1)
abline(h=0,lty=2)
legend("topright",c("SD","SE","RMSE"),lty=1,lwd=c(3,2,1),col=c("red","green",
       "blue"),bty="n",cex=1.5)
plot(ejex,probC.alpha,type="l",col="red",lwd=2,xlab="n",ylab="CP",
     cex.lab=2,ylim=c(0.9,1))
abline(h=0.95,lty=2)
# Plots for beta
plot(ejex,media.beta,type="l",col="red",lwd=2,xlab="n",ylab="AE",cex.lab=2)
abline(h=beta,lty=2)
plot(ejex,desEs.beta,type="l",col="red",lwd=3,xlab="n",ylab="SD, SE and RMSE",
     ylim=c(0,1),cex.lab=2)
lines(ejex,erEst.beta,type="l",col="green",lwd=2)
lines(ejex,erCua.beta,type="l",col="blue",lwd=1)
abline(h=0,lty=2)
legend("topright",c("SD","SE","RMSE"),lty=1,lwd=c(3,2,1),col=c("red","green",
       "blue"),bty="n",cex=1.5)
plot(ejex,probC.beta,type="l",col="red",lwd=2,xlab="n",ylab="CP",
     cex.lab=2,ylim=c(0.9,1))
abline(h=0.95,lty=2)
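The five summaries plotted in Figures 4 to 6 are computed inside the outer loop of the simulation code. They can be collected in a single helper, sketched below as a refactoring rather than the authors' code; sim_summary is a hypothetical name, and the Wald coverage uses qnorm(0.975) in place of the rounded 1.96.

```r
# est: ML estimates over the replicates; se: their standard errors;
# true: the parameter value used to simulate; z: normal quantile for 95% Wald intervals
sim_summary <- function(est, se, true, z = qnorm(0.975)) {
  c(AE = mean(est),                                       # average estimate
    SD = sd(est),                                         # standard deviation of the estimates
    SE = mean(se),                                        # average standard error
    RMSE = sqrt(mean((est - true)^2)),                    # root mean square error
    CP = mean(est - z * se < true & true < est + z * se)) # coverage probability
}

# Toy input: AE = 2, SD = 1, SE = 1, RMSE = sqrt(2/3), CP = 1
sim_summary(c(1, 2, 3), c(1, 1, 1), true = 2)
```

Inside the loop over sample sizes, a call such as sim_summary(alpha.mv, se.alpha, alpha) would return the five values stored in media.alpha, desEs.alpha, erEst.alpha, erCua.alpha and probC.alpha.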

References

  1. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions; John Wiley & Sons: Hoboken, NJ, USA, 1995; Volume 2.
  2. Kumaraswamy, P. A generalized probability density function for double-bounded random processes. J. Hydrol. 1980, 46, 79–88.
  3. Blundell, R.; Duncan, A.; Pendakur, K. Semiparametric estimation and consumer demand. J. Appl. Econom. 1998, 13, 435–461.
  4. Dey, S.; Mazucheli, J.; Nadarajah, S. Kumaraswamy distribution: Different methods of estimation. Comput. Appl. Math. 2018, 37, 2094–2111.
  5. Wang, J.Z. A note on estimation in the four-parameter beta distribution. Commun. Stat. Simul. Comput. 2005, 34, 495–501.
  6. Smith, R.L. Maximum likelihood estimation in a class of nonregular cases. Biometrika 1985, 72, 67–90.
  7. Iriarte, Y.A.; de Castro, M.; Gómez, H.W. The Lambert-F distributions class: An alternative family for positive data analysis. Mathematics 2020, 8, 1398.
  8. Corless, R.M.; Gonnet, G.H.; Hare, D.E.; Jeffrey, D.J.; Knuth, D.E. On the Lambert W function. Adv. Comput. Math. 1996, 5, 329–359.
  9. Brito, P.; Fabiao, F.; Staubyn, A. Euler, Lambert, and the Lambert W function today. Math. Sci. 2008, 33.
  10. Iriarte, Y.A.; de Castro, M.; Gómez, H.W. An alternative one-parameter distribution for bounded data modeling generated from the Lambert transformation. Symmetry 2021, 13, 1190.
  11. Martínez-Florez, G.; Moreno-Arenas, G.; Vergara-Cardozo, S. Properties and inference for proportional hazard models. Rev. Colomb. Estad. 2013, 36, 95–114.
  12. Adler, A. lamW: Lambert-W Function. 2015. R Package Version 2.1.1. Available online: https://doi.org/10.5281/zenodo.5874874 (accessed on 5 January 2023).
  13. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021.
  14. Soetaert, K. rootSolve: Nonlinear Root Finding, Equilibrium and Steady-State Analysis of Ordinary Differential Equations. 2009. R Package Version 1.6. Available online: https://CRAN.R-project.org/package=rootSolve (accessed on 5 January 2023).
  15. Byrd, R.H.; Lu, P.; Nocedal, J.; Zhu, C. A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 1995, 16, 1190–1208.
  16. Schmit, J.T.; Roth, K. Cost effectiveness of risk management practices. J. Risk Insur. 1990, 57, 455–470.
  17. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
  18. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
  19. Faraway, J.; Marsaglia, G.; Marsaglia, J.; Baddeley, A. goftest: Classical Goodness-of-Fit Tests for Univariate Distributions. R Package Version 1.2-3. 2021. Available online: https://CRAN.R-project.org/package=goftest (accessed on 5 January 2023).
  20. Croissant, Y.; Graves, S. Ecdat: Data Sets for Econometrics. R Package Version 0.4-2. 2022. Available online: https://CRAN.R-project.org/package=Ecdat (accessed on 5 January 2023).
  21. Chen, G.; Balakrishnan, N. A general purpose approximate goodness-of-fit test. J. Qual. Technol. 1995, 27, 154–161.
Figure 1. Top left: Some LPHU pdf curves for different values of α and β. Top right and bottom: Histograms of three sets of 1000 pseudo-random numbers from the LPHU distribution with (α, β) = (1.7, 2), (0.5, 1.5), and (0.2, 0.5), respectively, fitted with the B and K distributions via the ML method.
Figure 2. Some LPHU hrf curves for β = 5 and different values of α .
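The hazard rate curves of Figure 2 follow from the pdf and cdf in Appendix A. Dividing f(x) by the reliability function S(x) = 1 - F(x) = (1 - x)^β α^(1 - (1 - x)^β), the power of α cancels and h(x) = β[1 - log(α)(1 - x)^β]/(1 - x). The sketch below codes both functions and traces curves for β = 5; the names rfLPHU and hrfLPHU and the grid of α values are our own hypothetical choices, since the caption does not list the values used in the figure.

```r
# LPHU pdf and cdf, as in Appendix A
dLPHU <- function(x, a, b) b * (1 - x)^(b - 1) * a^(1 - (1 - x)^b) * (1 - log(a) * (1 - x)^b)
pLPHU <- function(x, a, b) 1 - (1 - x)^b * a^(1 - (1 - x)^b)

# Reliability function S(x) = 1 - F(x)
rfLPHU <- function(x, a, b) (1 - x)^b * a^(1 - (1 - x)^b)
# Hazard rate h(x) = f(x) / S(x); the factor a^(1 - (1 - x)^b) cancels
hrfLPHU <- function(x, a, b) b * (1 - log(a) * (1 - x)^b) / (1 - x)

xs <- seq(0.01, 0.95, by = 0.01)
alphas <- c(0.1, 0.5, 1, 2, 2.7)   # hypothetical alpha values
h <- sapply(alphas, function(a) hrfLPHU(xs, a, b = 5))
matplot(xs, h, type = "l", lty = 1, col = seq_along(alphas), xlab = "x", ylab = "hrf")
legend("topleft", legend = paste("alpha =", alphas), lty = 1,
       col = seq_along(alphas), bty = "n")
```

Because 1 - log(α)(1 - x)^β is positive for 0 < α < e, the hazard is positive on (0, 1) and grows without bound as x approaches 1.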
Figure 3. Fisher’s skewness and kurtosis coefficients for the LPHU distribution.
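This section does not restate how the coefficients in Figure 3 were obtained. A simple numerical route, sketched here under that caveat, integrates x^k f(x) over (0, 1) with R's integrate() to get the first four raw moments and converts them to central moments. The kurtosis is returned as μ4/σ^4; if Figure 3 uses the excess form, subtract 3. The names rawMomentLPHU and shapeLPHU are hypothetical.

```r
# LPHU pdf, as in Appendix A
dLPHU <- function(x, a, b) b * (1 - x)^(b - 1) * a^(1 - (1 - x)^b) * (1 - log(a) * (1 - x)^b)

# k-th raw moment E[X^k], by numerical integration over (0, 1)
rawMomentLPHU <- function(k, a, b) {
  integrate(function(x) x^k * dLPHU(x, a, b), lower = 0, upper = 1)$value
}

# Mean, variance, skewness mu3 / sigma^3 and kurtosis mu4 / sigma^4
shapeLPHU <- function(a, b) {
  m <- vapply(1:4, rawMomentLPHU, numeric(1), a = a, b = b)
  mu <- m[1]
  s2 <- m[2] - mu^2
  mu3 <- m[3] - 3 * mu * m[2] + 2 * mu^3
  mu4 <- m[4] - 4 * mu * m[3] + 6 * mu^2 * m[2] - 3 * mu^4
  c(mean = mu, var = s2, skewness = mu3 / s2^1.5, kurtosis = mu4 / s2^2)
}

shapeLPHU(a = 1.7, b = 2)
```

Two closed-form checks are available: with α = 1 and β = 1 the pdf is uniform (skewness 0, kurtosis 1.8), and with α = 1 and β = 2 it is the beta(1, 2) density 2(1 - x), whose skewness is 2√2/5.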
Figure 4. AE, SD, RMSE, SE, and CP computed from the 1000 estimates of α (top) and β (bottom) obtained in scenario A for each sample size.
Figure 5. AE, SD, RMSE, SE, and CP computed from the 1000 estimates of α (top) and β (bottom) obtained in scenario B for each sample size.
Figure 6. AE, SD, RMSE, SE, and CP computed from the 1000 estimates of α (top) and β (bottom) obtained in scenario C for each sample size.
Figure 7. (Left): Histogram for the firm’s risk management cost effectiveness data and the fitted pdf curves via the ML method. (Right): Empirical cdf for the firm’s risk management cost effectiveness data and the fitted cdf curves.
Figure 8. Top left: Histogram for a single sample obtained from the household shared budget proportion data and the fitted pdf curves via the ML method. Top right and bottom: QQ-plots for the LPHU (red), B (green), and K (blue) distributions.
Table 1. The parameter estimates with standard errors in parentheses, the AIC and BIC values, and the p-values of the AD and CvM goodness-of-fit tests for each distribution fitted to the firm’s risk management cost effectiveness data.
| Distribution | α̂ | β̂ | AIC | BIC | AD p-value | CvM p-value |
|---|---|---|---|---|---|---|
| LPHU | 0.012 (0.016) | 1.889 (0.496) | −174.3 | −169.7 | 0.484 | 0.616 |
| K | 0.664 (0.071) | 3.440 (0.620) | −153.3 | −148.7 | 0.025 | 0.039 |
| B | 0.612 (0.085) | 3.797 (0.715) | −148.2 | −143.7 | 0.009 | 0.013 |
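The AIC and BIC columns of Table 1 can be cross-checked. With AIC = -2ℓ + 2k and BIC = -2ℓ + k log n for maximized log-likelihood ℓ and k = 2 parameters, BIC - AIC = 2 log n - 4 for every model. The reported gaps of about 4.6 correspond to n = 73, which the sketch below assumes as the size of the cost effectiveness data (the sample size is not restated in this section). It recovers ℓ from each reported AIC and recomputes BIC.

```r
# Information criteria for maximized log-likelihood ll, k parameters, n observations
aic <- function(ll, k) -2 * ll + 2 * k
bic <- function(ll, k, n) -2 * ll + k * log(n)

# AIC and BIC as reported in Table 1 (k = 2 for every model)
aic_tab <- c(LPHU = -174.3, K = -153.3, B = -148.2)
bic_tab <- c(LPHU = -169.7, K = -148.7, B = -143.7)
k <- 2
n <- 73                         # assumed sample size, implied by the BIC - AIC gap

ll <- (2 * k - aic_tab) / 2     # invert AIC = -2 ll + 2k
bic_implied <- bic(ll, k, n)
round(bic_implied, 1)           # -169.7, -148.7, -143.6
```

The B entry differs from the table by 0.1, which is consistent with both criteria being rounded to one decimal.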
Table 2. Non-rejection rates based on the modified AD (A*) and CvM (W*) statistics and hit rates based on AIC and BIC values for the LPHU, B, and K distributions fitted to the 1000 samples obtained from the household shared budget proportion data.
| Distribution | Non-rejection rate (W*) | Non-rejection rate (A*) | Hit rate (AIC) | Hit rate (BIC) |
|---|---|---|---|---|
| LPHU | 0.734 | 0.718 | 0.836 | 0.836 |
| B | 0.502 | 0.479 | 0.130 | 0.130 |
| K | 0.562 | 0.531 | 0.034 | 0.034 |
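Table 2 summarizes the 1000 samples with two rates: the hit rate of a criterion is the share of samples in which a model attains the smallest criterion value, and the non-rejection rate is the share in which a modified statistic stays below its critical value. Because the three models all have k = 2 parameters and are fitted to the same sample, BIC - AIC = 2 log n - 4 is common to them, so both criteria rank the models identically; this explains why the AIC and BIC hit rates coincide. The toy matrices below are hypothetical, not data from the study, and the 5% critical value 0.126 for W* is our assumption, which we believe is the value tabulated for the normal case with both parameters estimated.

```r
models <- c("LPHU", "B", "K")

# Hypothetical AIC values: one row per sample, one column per model
aic_mat <- matrix(c(-10,  -8,  -9,
                     -7,  -9,  -6,
                    -12, -11, -10,
                     -5,  -4,  -6),
                  ncol = 3, byrow = TRUE, dimnames = list(NULL, models))

# Hit rate: share of samples in which each model has the smallest AIC
winner <- models[apply(aic_mat, 1, which.min)]
hit_rate <- table(factor(winner, levels = models)) / nrow(aic_mat)

# Hypothetical W* values and an assumed 5% critical value
w_star <- matrix(c(0.10, 0.20, 0.15,
                   0.09, 0.13, 0.30,
                   0.12, 0.11, 0.14,
                   0.05, 0.40, 0.12),
                 ncol = 3, byrow = TRUE, dimnames = list(NULL, models))
w_crit <- 0.126
# Non-rejection rate: share of samples whose statistic is below the critical value
nonrej_rate <- colMeans(w_star < w_crit)

hit_rate      # LPHU 0.50, B 0.25, K 0.25
nonrej_rate   # LPHU 1.00, B 0.25, K 0.25
```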
Table 3. ML estimates (with SE in parentheses), AIC and BIC values, and modified CvM (W*) and AD (A*) statistics for the LPHU, B, and K distributions fitted to a single sample obtained from the household shared budget proportion data.
| Distribution | α̂ | β̂ | AIC | BIC | W* | A* |
|---|---|---|---|---|---|---|
| LPHU | 1.996 (0.259) | 9.222 (0.933) | −209.5 | −204.3 | 0.112 | 0.715 |
| B | 1.302 (0.166) | 8.217 (1.228) | −203.1 | −197.8 | 0.213 | 1.308 |
| K | 1.273 (0.116) | 10.539 (2.385) | −205.1 | −199.9 | 0.177 | 1.092 |
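The modified statistics W* and A* come from the approximate goodness-of-fit procedure of Chen and Balakrishnan [21]. As we understand it, the fitted cdf values v_i = F(x_i; θ̂) are mapped to normal scores y_i = Φ^(-1)(v_i), standardized, mapped back through Φ, and the normal-case Cramér–von Mises and Anderson–Darling statistics are then multiplied by (1 + 0.5/n) and (1 + 0.75/n + 2.25/n^2), respectively. The function cb_gof below is a hypothetical sketch of that procedure, not the authors' code; in practice its argument would be pLPHU(x, alpha_hat, beta_hat) from a fit such as the one in Appendix B.

```r
# Modified CvM (W*) and AD (A*) statistics from fitted cdf values v
cb_gof <- function(v) {
  n <- length(v)
  y <- qnorm(v)                               # normal scores of the fitted cdf values
  u <- sort(pnorm((y - mean(y)) / sd(y)))     # standardized, back on the unit scale
  i <- seq_len(n)
  W2 <- sum((u - (2 * i - 1) / (2 * n))^2) + 1 / (12 * n)
  A2 <- -n - mean((2 * i - 1) * log(u) + (2 * n + 1 - 2 * i) * log(1 - u))
  c(W_star = W2 * (1 + 0.5 / n),
    A_star = A2 * (1 + 0.75 / n + 2.25 / n^2))
}

# Illustration with the true parameters in place of ML estimates:
# with alpha = 1 the LPHU cdf is 1 - (1 - x)^beta, so x can be drawn directly
pLPHU <- function(x, a, b) 1 - (1 - x)^b * a^(1 - (1 - x)^b)
set.seed(42)
x <- 1 - (1 - runif(100))^(1 / 2)
gof <- cb_gof(pLPHU(x, a = 1, b = 2))
gof
```

Since the normal scores are standardized before the statistics are computed, the result does not change if the y_i are shifted or rescaled.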
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Varela, H.; Rojas, M.A.; Reyes, J.; Iriarte, Y.A. An Alternative Lambert-Type Distribution for Bounded Data. Mathematics 2023, 11, 667. https://doi.org/10.3390/math11030667

AMA Style

Varela H, Rojas MA, Reyes J, Iriarte YA. An Alternative Lambert-Type Distribution for Bounded Data. Mathematics. 2023; 11(3):667. https://doi.org/10.3390/math11030667

Chicago/Turabian Style

Varela, Héctor, Mario A. Rojas, Jimmy Reyes, and Yuri A. Iriarte. 2023. "An Alternative Lambert-Type Distribution for Bounded Data" Mathematics 11, no. 3: 667. https://doi.org/10.3390/math11030667

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
