Article

The Arctan Power Distribution: Properties, Quantile and Modal Regressions with Applications to Biomedical Data

1 Department of Statistics and Actuarial Science, School of Mathematical Sciences, C. K. Tedam University of Technology and Applied Sciences, Kassena-Nankana Navrongo-Kologo Road, Navrongo P.O. Box 24, Upper East, Ghana
2 Department of Mathematics, LMNO, CNRS-Université de Caen, Campus II, Science 3, 14032 Caen, France
* Author to whom correspondence should be addressed.
Math. Comput. Appl. 2023, 28(1), 25; https://doi.org/10.3390/mca28010025
Submission received: 22 December 2022 / Revised: 9 February 2023 / Accepted: 10 February 2023 / Published: 14 February 2023
(This article belongs to the Special Issue Statistical Inference in Linear Models)

Abstract

The usefulness of (probability) distributions in biomedical science cannot be overstated, and several distributions have been used in this field to perform statistical analyses and make inferences. In this study, we develop the arctan power (AP) distribution and illustrate its application using biomedical data. The distribution is flexible in the sense that its probability density function can be left-skewed, right-skewed, J-shaped, or reversed-J-shaped. The shapes of the corresponding hazard rate function also suggest that the distribution can model data with monotonic and non-monotonic failure rates. A bivariate extension of the AP distribution is also constructed to model the interdependence of two random variables or pairs of data. The application reveals that the AP distribution fits the biomedical data better than other existing distributions. The parameters of the distribution can also be estimated fairly accurately using a Bayesian approach, which is also elaborated. Finally, the quantile and modal regression models based on the AP distribution provide better fits to the biomedical data than other existing regression models.

1. Introduction

Parametric statistical techniques have been used in biomedical studies to conduct analyses and draw conclusions. These parametric analyses, however, rest on assumptions about the underlying (probability) distributions. Thus, selecting an appropriate distribution for such analyses is essential. It is also nontrivial, as the use of an incorrect distribution results in misleading inferences. Knowing which distribution to use in biomedical modeling has become increasingly important, because distributions are used to develop new parametric regression models for the relationship between endogenous variables and a set of exogenous variables. These new regression models often provide a good fit with minimal loss of information compared to existing ones. This has triggered new interest in developing regression models based on extended or modified forms of existing distributions.
Among the distributions used for developing the regression models, those that are defined on the unit interval have received much attention due to the small loss of information they offer in modeling data on this interval. Some of these distributions include the unit folded normal distribution [1], bounded truncated Cauchy power exponential distribution [2], unit exponentiated Fréchet distribution [3], log XLindley (LXL) distribution [4], unit Chen distribution [5], unit Burr XII distribution (UBXII) [6], unit generalized half-normal distribution [7], unit Burr III (UBIII) distribution [8], unit Lindley distribution [9], unit Gompertz distribution [10], unit improved second degree Lindley (UISDL) distribution [11], unit Weibull distribution [12], and exponentiated Topp–Leone distribution [13].
Despite the existence of these distributions, it is worth noting that the behavior of humans or organisms is nondeterministic, and a single distribution cannot be selected in all situations to describe or model these traits. Therefore, we develop a new distribution called the arctan power (AP) distribution for modeling data on the unit interval based on the following motivations:
  • Develop a flexible unit distribution that can model data with left-skewed, right-skewed, symmetric, J, and reversed-J shapes.
  • Develop a unit distribution capable of modeling data with increasing, bathtub, and modified upside-down bathtub hazard rate functions (HRFs).
  • Develop a quantile regression model for response variables that are skewed or contain extreme values.
  • Develop a modal regression model for response variables that are asymmetric or heavy-tailed.
The article is organized into eight sections. Section 2 describes the development of the AP distribution. Section 3 presents its statistical properties. Section 4 constructs a bivariate extension of the AP distribution. Section 5 proposes nine frequentist approaches to estimating the parameters and compares them by simulation. Section 6 gives the frequentist and Bayesian univariate applications of the distribution. Section 7 is devoted to the quantile and modal regressions based on the AP distribution and their applications. Section 8 concludes the study.

2. Development of AP Distribution

Suppose that a random variable $X$ follows the arctan uniform (AU) distribution. Then, according to [14], the cumulative distribution function (CDF) and probability density function (PDF) of $X$ are, respectively, given by
$$F_X(x;\alpha)=\frac{\arctan(\alpha x)}{\arctan(\alpha)},\quad \alpha>0,\ x\in(0,1)$$
and
$$f_X(x;\alpha)=\frac{\alpha}{\arctan(\alpha)\,(1+\alpha^{2}x^{2})},\quad x\in(0,1).$$
The proposed AP distribution is obtained using the power transformation $Y=X^{1/\beta}$, $\beta>0$. The power parameter $\beta$ is introduced to improve the tail properties of the new distribution, making it capable of handling both monotonic and non-monotonic HRFs. Other researchers have used the power transformation approach to modify existing continuous distributions; see, for instance, [15,16,17]. Since $P(Y\le y)=P(X\le y^{\beta})$, the CDF of $Y$ is obtained as
$$F_Y(y;\alpha,\beta)=F_X(y^{\beta};\alpha)=\frac{\arctan(\alpha y^{\beta})}{\arctan(\alpha)},\quad \alpha>0,\ \beta>0,\ y\in(0,1).$$
The PDF and HRF are, respectively, given by
$$f_Y(y;\alpha,\beta)=\frac{\alpha\beta y^{\beta-1}}{\arctan(\alpha)\,(1+\alpha^{2}y^{2\beta})},\quad y\in(0,1)$$
and
$$h_Y(y;\alpha,\beta)=\frac{\alpha\beta y^{\beta-1}}{\left(\arctan(\alpha)-\arctan(\alpha y^{\beta})\right)(1+\alpha^{2}y^{2\beta})},\quad y\in(0,1).$$
When $\alpha\to0^{+}$, the PDF of the AP distribution reduces to that of the power distribution with CDF $y^{\beta}$. When $\alpha\to0^{+}$ and $\beta=1$, it reduces to the PDF of the standard uniform distribution. Furthermore, when $\beta=1$, it reduces to the PDF of the AU distribution.
The expanded form of the PDF is often useful when deriving the statistical properties of the distribution. Using the arctangent series expansion $\arctan(z)=\sum_{k=0}^{\infty}(-1)^{k}\frac{z^{2k+1}}{2k+1}$, $|z|<1$ (see [18]), and taking $\alpha\in(0,1)$ so that $\alpha y^{\beta}<1$ for all $y\in(0,1)$, the CDF of $Y$ can be expressed as
$$F_Y(y;\alpha,\beta)=\sum_{k=0}^{\infty}\frac{(-1)^{k}\alpha^{2k+1}y^{(2k+1)\beta}}{(2k+1)\arctan(\alpha)},\quad y\in(0,1).\tag{6}$$
Differentiating the expanded form of the CDF in Equation (6), the corresponding PDF is given by
$$f_Y(y;\alpha,\beta)=\sum_{k=0}^{\infty}\frac{(-1)^{k}\beta\alpha^{2k+1}y^{(2k+1)\beta-1}}{\arctan(\alpha)},\quad y\in(0,1).$$
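The closed forms and the series expansion can be cross-checked numerically. The following is a minimal sketch in plain Python (standard library only); the function names are hypothetical, and truncating the series at 200 terms is an assumption that works well when α is not close to 1.

```python
import math


def ap_cdf(y, alpha, beta):
    """Closed-form AP CDF: arctan(alpha * y**beta) / arctan(alpha)."""
    return math.atan(alpha * y**beta) / math.atan(alpha)


def ap_pdf(y, alpha, beta):
    """Closed-form AP PDF."""
    return alpha * beta * y**(beta - 1) / (math.atan(alpha) * (1 + alpha**2 * y**(2 * beta)))


def ap_hrf(y, alpha, beta):
    """Hazard rate f / (1 - F), written out as in the closed form."""
    num = alpha * beta * y**(beta - 1)
    den = (math.atan(alpha) - math.atan(alpha * y**beta)) * (1 + alpha**2 * y**(2 * beta))
    return num / den


def ap_cdf_series(y, alpha, beta, terms=200):
    """Truncated arctangent-series form of the CDF (requires 0 < alpha < 1)."""
    total = sum((-1)**k * alpha**(2 * k + 1) * y**((2 * k + 1) * beta) / (2 * k + 1)
                for k in range(terms))
    return total / math.atan(alpha)


def ap_pdf_series(y, alpha, beta, terms=200):
    """Term-by-term derivative of the truncated series CDF."""
    total = sum((-1)**k * beta * alpha**(2 * k + 1) * y**((2 * k + 1) * beta - 1)
                for k in range(terms))
    return total / math.atan(alpha)
```

For α = 0.7 and β = 1.5, the truncated series and the closed forms agree to within about 1e-12 at y = 0.6, and `ap_cdf(y, 1e-8, beta)` is essentially equal to `y**beta`, in line with the α → 0⁺ limit.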
The PDF and HRF plots for some given parameter values are shown in Figure 1. The PDF exhibits left-skewed, right-skewed, J, and reversed-J shapes, which makes the AP distribution more flexible than the AU distribution, whose PDF exhibits only J shapes. The HRF, in turn, displays increasing, bathtub, and modified upside-down bathtub shapes.

3. Some Statistical Properties

In this section, some statistical properties of the AP distribution are presented.

3.1. Mode

The mode of a distribution is a useful measure of central tendency. It can be used for data measured on the nominal, ordinal, interval, or ratio scale. The AP distribution has a unique mode when $\beta>1$, as stated in the result below.
Proposition 1.
The mode of the AP distribution is given by
$$\text{mode}=\left(\frac{\beta-1}{\alpha^{2}(\beta+1)}\right)^{\frac{1}{2\beta}},\quad \beta>1.$$
Proof. 
To establish this expression, we locate the critical point(s) of the PDF, that is, the points where the derivative of the PDF, or equivalently of its logarithm, is zero or undefined. Taking the logarithm of the PDF and differentiating, we have
$$\frac{d\log f_Y(y;\alpha,\beta)}{dy}=\frac{\beta-1-\alpha^{2}(\beta+1)y^{2\beta}}{y\,(1+\alpha^{2}y^{2\beta})}.$$
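Equating the derivative to zero and solving for $y$ yields the mode. Because the numerator $\beta-1-\alpha^{2}(\beta+1)y^{2\beta}$ is positive below this point and negative above it, the critical point is a maximum; it lies in $(0,1)$ provided that $\alpha^{2}(\beta+1)>\beta-1$, and otherwise the PDF is increasing on the whole unit interval. This completes the proof. □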
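Proposition 1 can be checked against a brute-force search over a fine grid. In this sketch (hypothetical function names, arbitrary grid size), `ap_mode` returns `None` when β ≤ 1 or when the formula falls outside (0, 1), following the sign of the derivative of the log-PDF in the proof.

```python
import math


def ap_pdf(y, alpha, beta):
    """AP probability density function."""
    return alpha * beta * y**(beta - 1) / (math.atan(alpha) * (1 + alpha**2 * y**(2 * beta)))


def ap_mode(alpha, beta):
    """Interior mode from Proposition 1, or None if the PDF is monotone on (0, 1)."""
    if beta <= 1:
        return None  # derivative of log f is negative everywhere: PDF decreasing
    m = ((beta - 1) / (alpha**2 * (beta + 1))) ** (1 / (2 * beta))
    return m if m < 1 else None  # m >= 1: PDF increasing on the whole interval


def grid_argmax(f, n=100_000):
    """Brute-force maximiser of f over the grid points i / n, i = 1, ..., n - 1."""
    return max((i / n for i in range(1, n)), key=f)
```

For example, with α = 3 and β = 2.5 the formula gives a mode near 0.544, and the grid search lands on the same point.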

3.2. Quantile Function

The quantile function can be used to generate random observations from the AP distribution and to compute shape-related metrics like skewness and kurtosis.
Proposition 2.
The quantile function of the AP distribution is given by
$$Q(u;\alpha,\beta)=\left[\frac{\tan(u\arctan(\alpha))}{\alpha}\right]^{\frac{1}{\beta}},\quad u\in(0,1).$$
Proof. 
The quantile function is the solution $Q(u;\alpha,\beta)$ of the nonlinear equation $F_Y(Q(u;\alpha,\beta);\alpha,\beta)=u$ for all $u\in(0,1)$. Setting $y=Q(u;\alpha,\beta)$ in the CDF, equating the CDF to $u\in(0,1)$, and simplifying yields the quantile function. This completes the proof. □
Note that the quantile function of the AP distribution is available in closed form, involving only simple trigonometric and power functions.
The median $Q(0.5;\alpha,\beta)$, lower quartile $Q(0.25;\alpha,\beta)$, and upper quartile $Q(0.75;\alpha,\beta)$ are obtained by substituting 0.5, 0.25, and 0.75, respectively, into the quantile function. Bowley's measure of skewness (BS) and Moors' measure of kurtosis (MK) can then be calculated from the quantiles. They are, respectively, given by
$$\text{BS}=\frac{Q(0.75;\alpha,\beta)+Q(0.25;\alpha,\beta)-2Q(0.5;\alpha,\beta)}{Q(0.75;\alpha,\beta)-Q(0.25;\alpha,\beta)},$$
and
$$\text{MK}=\frac{Q(0.375;\alpha,\beta)-Q(0.125;\alpha,\beta)+Q(0.875;\alpha,\beta)-Q(0.625;\alpha,\beta)}{Q(0.75;\alpha,\beta)-Q(0.25;\alpha,\beta)}.$$
The plots of Bowley's coefficient of skewness and Moors' coefficient of kurtosis are displayed in Figure 2. Both the skewness and the kurtosis change with the values of the parameters. This figure shows that the AP distribution can be left-skewed or right-skewed.
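Because Q has a closed form, inverse-transform sampling and the quantile-based shape measures take only a few lines. The sketch below is illustrative; the names `ap_quantile`, `ap_sample`, `bowley_skewness`, and `moors_kurtosis` are hypothetical, and Python's `random.Random` is simply one convenient source of uniform variates.

```python
import math
import random


def ap_cdf(y, alpha, beta):
    """AP CDF, used here only to verify the quantile function."""
    return math.atan(alpha * y**beta) / math.atan(alpha)


def ap_quantile(u, alpha, beta):
    """Closed-form quantile function Q(u; alpha, beta) of Proposition 2."""
    return (math.tan(u * math.atan(alpha)) / alpha) ** (1 / beta)


def ap_sample(n, alpha, beta, rng):
    """Inverse-transform sampling: Q(U) with U ~ Uniform(0, 1)."""
    return [ap_quantile(rng.random(), alpha, beta) for _ in range(n)]


def bowley_skewness(alpha, beta):
    """Bowley's quartile-based skewness coefficient BS."""
    q1, q2, q3 = (ap_quantile(u, alpha, beta) for u in (0.25, 0.5, 0.75))
    return (q3 + q1 - 2 * q2) / (q3 - q1)


def moors_kurtosis(alpha, beta):
    """Moors' octile-based kurtosis coefficient MK."""
    e1, e3, e5, e7 = (ap_quantile(u, alpha, beta) for u in (0.125, 0.375, 0.625, 0.875))
    q1, q3 = ap_quantile(0.25, alpha, beta), ap_quantile(0.75, alpha, beta)
    return (e3 - e1 + e7 - e5) / (q3 - q1)
```

In the near-uniform limit (α → 0⁺, β = 1), BS is close to 0 and MK is close to 1, which are the values for the standard uniform distribution.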

3.3. Moments and Generating Function

The moments are useful for computing measures of central tendency, dispersion, and shape. When the generating functions exist in the mathematical sense, they can be used to obtain the moments.
Proposition 3.
For $\alpha\in(0,1)$, the $r$th raw moment of an AP random variable $Y$ is given by
$$\mu_r=\sum_{k=0}^{\infty}\frac{(-1)^{k}\beta\alpha^{2k+1}}{(r+(2k+1)\beta)\arctan(\alpha)},\quad r=1,2,\ldots$$
Proof. 
By definition, the $r$th raw moment is $\mu_r=E(Y^{r})=\int_0^{1}y^{r}f_Y(y;\alpha,\beta)\,dy$. Substituting the expanded PDF, we obtain
$$\mu_r=\sum_{k=0}^{\infty}\frac{(-1)^{k}\beta\alpha^{2k+1}}{\arctan(\alpha)}\int_0^{1}y^{r+(2k+1)\beta-1}\,dy.$$
After some algebraic simplifications, the raw moment of the AP random variable is obtained. This completes the proof. □
The incomplete moment is very useful when computing measures of inequality, such as the Lorenz and Bonferroni curves.
Proposition 4.
For $\alpha\in(0,1)$, the $r$th incomplete moment of an AP random variable $Y$ is given by
$$\vartheta_r(y)=\sum_{k=0}^{\infty}\frac{(-1)^{k}\beta\alpha^{2k+1}y^{r+(2k+1)\beta}}{(r+(2k+1)\beta)\arctan(\alpha)},\quad r=1,2,\ldots$$
Proof. 
By definition, $\vartheta_r(y)=E(Y^{r}\mathbf{1}\{Y<y\})=\int_0^{y}z^{r}f_Y(z;\alpha,\beta)\,dz$. Substituting the expanded PDF into this definition and simplifying completes the proof. □
The Lorenz and Bonferroni curves are obtained, respectively, as
$$L_F(y)=\frac{1}{\mu}\int_0^{y}zf_Y(z;\alpha,\beta)\,dz$$
and
$$B_F(y)=\frac{1}{\mu F_Y(y;\alpha,\beta)}\int_0^{y}zf_Y(z;\alpha,\beta)\,dz,$$
where $\mu=\mu_1$ is the mean.
Figure 3 displays the plots of the Lorenz and Bonferroni curves of the AP distribution for some selected parameter values. For the Lorenz curve, the minimal point of inequality is obtained when $L_F(y)=y$. When $B_F(y)=y$, the so-called equidistributional line for the Bonferroni curve is obtained.
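The series in Propositions 3 and 4 can be compared with direct numerical integration, and the Lorenz and Bonferroni curves then follow from the first incomplete moment. The sketch below assumes 0 < α < 1 (required by the series), truncates the series at 400 terms, and uses a hand-written composite Simpson rule; all names are hypothetical.

```python
import math


def ap_cdf(y, alpha, beta):
    return math.atan(alpha * y**beta) / math.atan(alpha)


def ap_pdf(y, alpha, beta):
    return alpha * beta * y**(beta - 1) / (math.atan(alpha) * (1 + alpha**2 * y**(2 * beta)))


def simpson(f, a, b, m=2000):
    """Composite Simpson's rule with an even number m of sub-intervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def incomplete_moment(r, y, alpha, beta, terms=400):
    """Series of Proposition 4 (requires 0 < alpha < 1)."""
    total = sum((-1)**k * beta * alpha**(2 * k + 1) * y**(r + (2 * k + 1) * beta)
                / (r + (2 * k + 1) * beta) for k in range(terms))
    return total / math.atan(alpha)


def raw_moment(r, alpha, beta, terms=400):
    """Proposition 3: the incomplete moment evaluated at y = 1."""
    return incomplete_moment(r, 1.0, alpha, beta, terms)


def lorenz(y, alpha, beta):
    """L_F(y): first incomplete moment divided by the mean."""
    return incomplete_moment(1, y, alpha, beta) / raw_moment(1, alpha, beta)


def bonferroni(y, alpha, beta):
    """B_F(y) = L_F(y) / F_Y(y)."""
    return lorenz(y, alpha, beta) / ap_cdf(y, alpha, beta)
```

Since truncating a distribution from above lowers its mean, the sketch returns Bonferroni values below 1 for y < 1, while the Lorenz value at y = 1 equals 1.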
When non-central moments of a random variable exist, they can be found using the moment-generating function (MGF).
Proposition 5.
For $\alpha\in(0,1)$, the MGF of an AP random variable $Y$ is given by
$$M_Y(t)=\sum_{r=0}^{\infty}\sum_{k=0}^{\infty}\frac{(-1)^{k}t^{r}\beta\alpha^{2k+1}}{r!\,(r+(2k+1)\beta)\arctan(\alpha)}.$$
Proof. 
Using the definition $M_Y(t)=E(e^{tY})=\int_0^{1}e^{ty}f_Y(y;\alpha,\beta)\,dy$ and applying the Taylor series expansion of $e^{ty}$, we get
$$M_Y(t)=\sum_{r=0}^{\infty}\frac{t^{r}}{r!}\mu_r.$$
Hence, substituting the $r$th non-central moment completes the proof. □
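A quick numerical check of Proposition 5 truncates both sums and compares the result with the integral of e^{ty} f_Y(y) over (0, 1). The truncation points and the Simpson-rule helper are assumptions of this sketch, not part of the original derivation, and the function names are hypothetical.

```python
import math


def ap_pdf(y, alpha, beta):
    return alpha * beta * y**(beta - 1) / (math.atan(alpha) * (1 + alpha**2 * y**(2 * beta)))


def simpson(f, a, b, m=2000):
    """Composite Simpson's rule with an even number m of sub-intervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def raw_moment(r, alpha, beta, terms=300):
    """Series of Proposition 3; r = 0 returns the total probability, 1."""
    total = sum((-1)**k * beta * alpha**(2 * k + 1) / (r + (2 * k + 1) * beta)
                for k in range(terms))
    return total / math.atan(alpha)


def mgf_series(t, alpha, beta, r_max=40):
    """Proposition 5 truncated after r_max terms: sum of t**r / r! * mu_r."""
    return sum(t**r / math.factorial(r) * raw_moment(r, alpha, beta) for r in range(r_max))
```

At t = 0 only the r = 0 term survives, so the series returns 1, as any MGF must.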

3.4. Order Statistics

Order statistics are very useful in extreme value analysis; for example, they can be used to determine the behavior of the minimum and maximum values. Consider the order statistics $Y_{1:n}\le Y_{2:n}\le\cdots\le Y_{n:n}$ from the AP distribution. Then, the PDF of $Y_{k:n}$, $k=1,2,\ldots,n$, is
$$f_{k:n}(y;\alpha,\beta)=C_{k:n}[F_Y(y;\alpha,\beta)]^{k-1}[1-F_Y(y;\alpha,\beta)]^{n-k}f_Y(y;\alpha,\beta),$$
where the normalizing constant is given by
$$C_{k:n}=\frac{n!}{(k-1)!\,(n-k)!}.$$
Using the standard binomial expansion, we can express this PDF as
$$f_{k:n}(y;\alpha,\beta)=C_{k:n}\sum_{j=0}^{n-k}(-1)^{j}\binom{n-k}{j}[F_Y(y;\alpha,\beta)]^{k+j-1}f_Y(y;\alpha,\beta).$$
Hence, we obtain
$$f_{k:n}(y;\alpha,\beta)=\frac{\alpha\beta y^{\beta-1}C_{k:n}}{\arctan(\alpha)(1+\alpha^{2}y^{2\beta})}\sum_{j=0}^{n-k}(-1)^{j}\binom{n-k}{j}\left[\frac{\arctan(\alpha y^{\beta})}{\arctan(\alpha)}\right]^{k+j-1}.$$
The minimum ($Y_{1:n}$) and maximum ($Y_{n:n}$) order statistics can be used to investigate the minimum and maximum failure times of a system, respectively. The PDF of $Y_{1:n}$ is given by
$$f_{1:n}(y;\alpha,\beta)=nf_Y(y;\alpha,\beta)[1-F_Y(y;\alpha,\beta)]^{n-1}=\frac{n\alpha\beta y^{\beta-1}\left(\arctan(\alpha)-\arctan(\alpha y^{\beta})\right)^{n-1}}{(1+\alpha^{2}y^{2\beta})(\arctan(\alpha))^{n}}$$
and the PDF of $Y_{n:n}$ is
$$f_{n:n}(y;\alpha,\beta)=nf_Y(y;\alpha,\beta)[F_Y(y;\alpha,\beta)]^{n-1}=\frac{n\alpha\beta y^{\beta-1}(\arctan(\alpha y^{\beta}))^{n-1}}{(1+\alpha^{2}y^{2\beta})(\arctan(\alpha))^{n}}.$$
The minimum and maximum (min-max) plot of the order statistics can be used to describe whether the distribution is symmetric or skewed. The min-max plot is based on $E(Y_{1:n})$ and $E(Y_{n:n})$. The min-max plots for some chosen parameter values of the AP distribution are shown in Figure 4. This figure reveals that the AP distribution can be right-skewed, left-skewed, or symmetric.
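The order-statistic densities can be evaluated directly or through the binomial expansion, and E(Y_{1:n}) and E(Y_{n:n}), the quantities behind the min-max plot, follow by numerical integration. This is a hypothetical sketch using Simpson's rule; the function names are illustrative only.

```python
import math


def ap_cdf(y, alpha, beta):
    return math.atan(alpha * y**beta) / math.atan(alpha)


def ap_pdf(y, alpha, beta):
    return alpha * beta * y**(beta - 1) / (math.atan(alpha) * (1 + alpha**2 * y**(2 * beta)))


def simpson(f, a, b, m=2000):
    """Composite Simpson's rule with an even number m of sub-intervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def order_stat_pdf(y, k, n, alpha, beta):
    """Density of Y_{k:n} in its direct form."""
    c = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    F = ap_cdf(y, alpha, beta)
    return c * F**(k - 1) * (1 - F)**(n - k) * ap_pdf(y, alpha, beta)


def order_stat_pdf_binomial(y, k, n, alpha, beta):
    """Same density via the binomial expansion of (1 - F)**(n - k)."""
    c = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    F = ap_cdf(y, alpha, beta)
    s = sum((-1)**j * math.comb(n - k, j) * F**(k + j - 1) for j in range(n - k + 1))
    return c * s * ap_pdf(y, alpha, beta)


def order_stat_mean(k, n, alpha, beta):
    """E(Y_{k:n}) by Simpson integration over (0, 1)."""
    return simpson(lambda y: y * order_stat_pdf(y, k, n, alpha, beta), 0.0, 1.0)
```

As a sanity check, in the near-uniform limit (α → 0⁺, β = 1) the expectations reduce to k/(n + 1), the familiar result for uniform order statistics.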

4. Bivariate AP Distribution

The development of bivariate distributions is very useful for investigating the joint relationship between two random variables. For example, one may be interested in studying the relationship between the human development index and the literacy rate of a country, the maternal mortality rate and the literacy rate, or rainfall and temperature, among others. There are different methods of developing bivariate distributions. One way is to use copula functions (see [19]). However, in this study, we follow the approach used by [20,21]. Let $(X,Y)$ be a bivariate continuous random vector. The CDF of the bivariate AP (BAP) distribution with parameters $\alpha,\beta,\rho_1,\rho_2,\rho_3$, where $\alpha>0$, $\beta>0$, $-1<\rho_1+\rho_3<1$, $-1<\rho_2+\rho_3<1$, $x\in(0,1)$, and $y\in(0,1)$, is given by
$$F_{XY}(x,y;\varsigma)=\frac{\arctan(\alpha x^{\beta})\arctan(\alpha y^{\beta})}{(\arctan(\alpha))^{2}}\left[1+(\rho_1+\rho_3)\left(\frac{\arctan(\alpha)-\arctan(\alpha x^{\beta})}{\arctan(\alpha)}\right)+(\rho_2+\rho_3)\left(\frac{\arctan(\alpha)-\arctan(\alpha y^{\beta})}{\arctan(\alpha)}\right)\right]^{-1},$$
where $\varsigma=(\alpha,\beta,\rho_1,\rho_2,\rho_3)$. The plots of the CDF of the BAP distribution for the following parameter values are shown in Figure 5:
(a) $\alpha=8.5$, $\beta=2.5$, $\rho_1=0.4$, $\rho_2=0.1$, $\rho_3=0.2$;
(b) $\alpha=4.5$, $\beta=8.2$, $\rho_1=0.3$, $\rho_2=0.4$, $\rho_3=0.2$; and
(c) $\alpha=3.4$, $\beta=6.2$, $\rho_1=0.3$, $\rho_2=0.4$, $\rho_3=0.6$.
These plots reveal different concave and convex shapes for the chosen parameter values.
The PDF of the BAP distribution is given by
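$$f_{XY}(x,y;\varsigma)=\frac{(\alpha\beta)^{2}(xy)^{\beta-1}}{(\arctan(\alpha))^{2}}\left[1+(\alpha x^{\beta})^{2}+(\alpha y^{\beta})^{2}+\alpha^{4}(xy)^{2\beta}\right]^{-1}\left[1+(\rho_1+\rho_3)\left(\frac{\arctan(\alpha)-\arctan(\alpha x^{\beta})}{\arctan(\alpha)}\right)+(\rho_2+\rho_3)\left(\frac{\arctan(\alpha)-\arctan(\alpha y^{\beta})}{\arctan(\alpha)}\right)\right]^{-1}.$$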
The PDF plots of the BAP distribution for the following selected parameter values are displayed in Figure 6:
(a) $\alpha=8.5$, $\beta=2.5$, $\rho_1=0.4$, $\rho_2=0.1$, $\rho_3=0.2$;
(b) $\alpha=4.5$, $\beta=8.2$, $\rho_1=0.3$, $\rho_2=0.4$, $\rho_3=0.2$; and
(c) $\alpha=3.4$, $\beta=2.5$, $\rho_1=0.3$, $\rho_2=0.4$, $\rho_3=0.6$.
These plots display left-skewed, right-skewed, and approximate symmetrical shapes.

5. Estimation Methods and Simulations

This section presents nine frequentist procedures for estimating the parameters of the AP distribution: maximum likelihood (ML) estimation, ordinary least squares (OLS), weighted least squares (WLS), Cramér–von Mises (CVM) estimation, Anderson–Darling (AD) estimation, percentile estimation (PE), and three product spacing methods, namely maximum product spacing (MPS), minimum spacing absolute distance (MSAD), and minimum spacing absolute-logarithm distance (MSALD) estimation.

5.1. Maximum Likelihood Estimation

Let $y_1,y_2,\ldots,y_n$ be independent and identically distributed observations of sample size $n$ from the AP distribution, and let $\xi=(\alpha,\beta)$ be the vector of parameters. Then, the total log-likelihood function is
$$\ell(\xi)=n\log(\alpha\beta)-n\log(\arctan(\alpha))+(\beta-1)\sum_{i=1}^{n}\log(y_i)-\sum_{i=1}^{n}\log(1+\alpha^{2}y_i^{2\beta}).\tag{16}$$
The log-likelihood function can be maximized directly with respect to the parameters $\alpha$ and $\beta$ to obtain the ML estimates of the parameters. Alternatively, these estimates can be obtained by equating the score functions to zero and solving the resulting system of equations simultaneously. The score functions, obtained by differentiating Equation (16) with respect to the parameters, are given by
$$\frac{\partial\ell(\xi)}{\partial\alpha}=\frac{n}{\alpha}-\frac{n}{(1+\alpha^{2})\arctan(\alpha)}-\sum_{i=1}^{n}\frac{2\alpha y_i^{2\beta}}{1+\alpha^{2}y_i^{2\beta}}$$
and
$$\frac{\partial\ell(\xi)}{\partial\beta}=\frac{n}{\beta}+\sum_{i=1}^{n}\log(y_i)-\sum_{i=1}^{n}\frac{2\alpha^{2}\log(y_i)\,y_i^{2\beta}}{1+\alpha^{2}y_i^{2\beta}}.$$
The score equations have no closed-form solution; thus, the resulting system of equations is solved numerically to obtain the estimates $\hat{\alpha}$ and $\hat{\beta}$.
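The score equations above can be solved with any general-purpose optimizer. Since the text does not name a specific solver, this sketch swaps in a small hand-written Nelder–Mead search on (log α, log β), which keeps both parameters positive; the function names, starting point, and tolerances are assumptions. It also compares the analytic score functions with finite differences of the log-likelihood.

```python
import math
import random


def ap_quantile(u, alpha, beta):
    """Quantile function, used here to simulate data by inverse transform."""
    return (math.tan(u * math.atan(alpha)) / alpha) ** (1 / beta)


def loglik(alpha, beta, ys):
    """Total log-likelihood, Equation (16)."""
    n = len(ys)
    return (n * math.log(alpha * beta) - n * math.log(math.atan(alpha))
            + (beta - 1) * sum(math.log(y) for y in ys)
            - sum(math.log1p(alpha**2 * y**(2 * beta)) for y in ys))


def score(alpha, beta, ys):
    """Analytic score functions (d ell / d alpha, d ell / d beta)."""
    n = len(ys)
    w = [y**(2 * beta) / (1 + alpha**2 * y**(2 * beta)) for y in ys]
    d_alpha = n / alpha - n / ((1 + alpha**2) * math.atan(alpha)) - sum(2 * alpha * wi for wi in w)
    d_beta = (n / beta + sum(math.log(y) for y in ys)
              - sum(2 * alpha**2 * math.log(y) * wi for y, wi in zip(ys, w)))
    return d_alpha, d_beta


def nelder_mead(f, x0, step=0.5, tol=1e-9, max_iter=2000):
    """Minimal Nelder-Mead minimiser: reflection, expansion, contraction, shrink."""
    dim = len(x0)
    pts = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(dim)]
                        for i in range(dim)]
    vals = [f(p) for p in pts]
    for _ in range(max_iter):
        order = sorted(range(dim + 1), key=vals.__getitem__)
        pts, vals = [pts[i] for i in order], [vals[i] for i in order]
        if vals[-1] - vals[0] < tol:
            break
        cen = [sum(p[j] for p in pts[:-1]) / dim for j in range(dim)]
        ref = [c + (c - w) for c, w in zip(cen, pts[-1])]
        f_ref = f(ref)
        if f_ref < vals[0]:
            expd = [c + 2 * (c - w) for c, w in zip(cen, pts[-1])]
            f_exp = f(expd)
            pts[-1], vals[-1] = (expd, f_exp) if f_exp < f_ref else (ref, f_ref)
        elif f_ref < vals[-2]:
            pts[-1], vals[-1] = ref, f_ref
        else:
            con = [c + 0.5 * (w - c) for c, w in zip(cen, pts[-1])]
            f_con = f(con)
            if f_con < vals[-1]:
                pts[-1], vals[-1] = con, f_con
            else:  # shrink every vertex towards the best one
                best = pts[0]
                pts = [best] + [[b + 0.5 * (p - b) for b, p in zip(best, q)] for q in pts[1:]]
                vals = [vals[0]] + [f(p) for p in pts[1:]]
    i_best = min(range(dim + 1), key=vals.__getitem__)
    return pts[i_best]


def fit_ml(ys):
    """ML estimates from a Nelder-Mead search over (log alpha, log beta)."""
    def neg_loglik(t):
        if max(abs(t[0]), abs(t[1])) > 30:  # keep exp() within a safe range
            return math.inf
        return -loglik(math.exp(t[0]), math.exp(t[1]), ys)

    t_hat = nelder_mead(neg_loglik, [0.0, 0.0])
    return math.exp(t_hat[0]), math.exp(t_hat[1])
```

Working on the log scale is a design choice: it turns the constrained problem α, β > 0 into an unconstrained one, and the maximizer is unchanged because the transformation is one-to-one.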

5.2. Ordinary and Weighted Least Squares Estimation

Consider an ordered random sample $y_{(1)},y_{(2)},\ldots,y_{(n)}$ of size $n$ from the AP distribution; then, the OLS estimates, $\hat{\alpha}_{OLS}$ and $\hat{\beta}_{OLS}$, of the parameters are obtained by minimizing the function
$$\text{OLS}=\sum_{i=1}^{n}\left(\frac{\arctan(\alpha y_{(i)}^{\beta})}{\arctan(\alpha)}-\frac{i}{n+1}\right)^{2},$$
with respect to the parameters $\alpha$ and $\beta$. The OLS estimates can also be obtained by numerically solving the nonlinear equations
$$\sum_{i=1}^{n}\left(\frac{\arctan(\alpha y_{(i)}^{\beta})}{\arctan(\alpha)}-\frac{i}{n+1}\right)\pi_s(y_{(i)};\alpha,\beta)=0,\quad s=1,2,$$
where
$$\pi_1(y;\alpha,\beta)=\frac{2y^{\beta}}{\arctan(\alpha)(1+\alpha^{2}y^{2\beta})}-\frac{2\arctan(\alpha y^{\beta})}{(\arctan(\alpha))^{2}(1+\alpha^{2})}\tag{21}$$
and
$$\pi_2(y;\alpha,\beta)=\frac{2\alpha y^{\beta}\log(y)}{\arctan(\alpha)(1+\alpha^{2}y^{2\beta})}.\tag{22}$$
The WLS estimates, $\hat{\alpha}_{WLS}$ and $\hat{\beta}_{WLS}$, of the parameters are obtained by minimizing the function
$$\text{WLS}=\sum_{i=1}^{n}\frac{(n+1)^{2}(n+2)}{i(n-i+1)}\left(\frac{\arctan(\alpha y_{(i)}^{\beta})}{\arctan(\alpha)}-\frac{i}{n+1}\right)^{2},$$
with respect to the parameters $\alpha$ and $\beta$. Alternatively, the WLS estimates are obtained by numerically solving the nonlinear equations
$$\sum_{i=1}^{n}\frac{(n+1)^{2}(n+2)}{i(n-i+1)}\left(\frac{\arctan(\alpha y_{(i)}^{\beta})}{\arctan(\alpha)}-\frac{i}{n+1}\right)\pi_s(y_{(i)};\alpha,\beta)=0,\quad s=1,2,$$
where $\pi_s(y;\alpha,\beta)$, $s=1,2$, are defined in Equations (21) and (22).
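The OLS and WLS criteria translate directly into code, and the resulting functions can be handed to any numerical minimizer. In this sketch (hypothetical names), `pi1` and `pi2` are twice the partial derivatives of the CDF with respect to α and β, matching the factor 2 used in the text, and a finite-difference check confirms them.

```python
import math
import random


def ap_cdf(y, alpha, beta):
    return math.atan(alpha * y**beta) / math.atan(alpha)


def ap_quantile(u, alpha, beta):
    return (math.tan(u * math.atan(alpha)) / alpha) ** (1 / beta)


def ols_objective(alpha, beta, ys):
    """OLS criterion: squared gaps between F(y_(i)) and i / (n + 1)."""
    ys = sorted(ys)
    n = len(ys)
    return sum((ap_cdf(y, alpha, beta) - i / (n + 1)) ** 2 for i, y in enumerate(ys, start=1))


def wls_objective(alpha, beta, ys):
    """WLS criterion with weights (n + 1)^2 (n + 2) / (i (n - i + 1))."""
    ys = sorted(ys)
    n = len(ys)
    return sum((n + 1) ** 2 * (n + 2) / (i * (n - i + 1))
               * (ap_cdf(y, alpha, beta) - i / (n + 1)) ** 2
               for i, y in enumerate(ys, start=1))


def pi1(y, alpha, beta):
    """Twice the partial derivative of the CDF with respect to alpha."""
    return (2 * y**beta / (math.atan(alpha) * (1 + alpha**2 * y**(2 * beta)))
            - 2 * math.atan(alpha * y**beta) / (math.atan(alpha) ** 2 * (1 + alpha**2)))


def pi2(y, alpha, beta):
    """Twice the partial derivative of the CDF with respect to beta."""
    return (2 * alpha * y**beta * math.log(y)
            / (math.atan(alpha) * (1 + alpha**2 * y**(2 * beta))))
```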

5.3. Cramér–Von Mises Estimation

Given ordered observations $y_{(1)},y_{(2)},\ldots,y_{(n)}$ of size $n$ from the AP distribution, the CVM estimates, $\hat{\alpha}_{CVM}$ and $\hat{\beta}_{CVM}$, of the parameters are obtained by minimizing the function
$$\text{CVM}=\frac{1}{12n}+\sum_{i=1}^{n}\left(\frac{\arctan(\alpha y_{(i)}^{\beta})}{\arctan(\alpha)}-\frac{2i-1}{2n}\right)^{2},$$
with respect to the parameters $\alpha$ and $\beta$. The CVM estimates can also be obtained by solving the nonlinear equations
$$\sum_{i=1}^{n}\left(\frac{\arctan(\alpha y_{(i)}^{\beta})}{\arctan(\alpha)}-\frac{2i-1}{2n}\right)\pi_s(y_{(i)};\alpha,\beta)=0,\quad s=1,2,$$
where $\pi_s(y;\alpha,\beta)$, $s=1,2$, are given in Equations (21) and (22).
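The CVM criterion is similarly short. This sketch (hypothetical names) evaluates it on the ordered sample; as a check, a single observation placed at the true median gives 1/(12n) = 1/12.

```python
import math
import random


def ap_cdf(y, alpha, beta):
    return math.atan(alpha * y**beta) / math.atan(alpha)


def ap_quantile(u, alpha, beta):
    return (math.tan(u * math.atan(alpha)) / alpha) ** (1 / beta)


def cvm_objective(alpha, beta, ys):
    """Cramer-von Mises criterion on the ordered sample."""
    ys = sorted(ys)
    n = len(ys)
    return 1 / (12 * n) + sum((ap_cdf(y, alpha, beta) - (2 * i - 1) / (2 * n)) ** 2
                              for i, y in enumerate(ys, start=1))
```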

5.4. Anderson–Darling Estimation

Let $y_{(1)},y_{(2)},\ldots,y_{(n)}$ be ordered observations of size $n$ from the AP distribution. The AD estimates, $\hat{\alpha}_{AD}$ and $\hat{\beta}_{AD}$, of the parameters of the AP distribution are obtained by minimizing the function
$$\text{AD}=-n-\frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\log\left(\frac{\arctan(\alpha y_{(i)}^{\beta})}{\arctan(\alpha)}\right)+\log\left(\frac{\arctan(\alpha)-\arctan(\alpha y_{(n+1-i)}^{\beta})}{\arctan(\alpha)}\right)\right],$$
with respect to the parameters $\alpha$ and $\beta$.
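A minimal sketch of the Anderson–Darling criterion follows. It assumes the standard form of the statistic, in which the second logarithm is evaluated at the reversed order statistic y_(n+1−i); the function names are hypothetical.

```python
import math
import random


def ap_cdf(y, alpha, beta):
    return math.atan(alpha * y**beta) / math.atan(alpha)


def ap_quantile(u, alpha, beta):
    return (math.tan(u * math.atan(alpha)) / alpha) ** (1 / beta)


def ad_objective(alpha, beta, ys):
    """Anderson-Darling criterion in its standard form."""
    ys = sorted(ys)
    n = len(ys)
    F = [ap_cdf(y, alpha, beta) for y in ys]
    # F[i - 1] is F(y_(i)); F[n - i] is F(y_(n+1-i)) in 0-based indexing
    s = sum((2 * i - 1) * (math.log(F[i - 1]) + math.log1p(-F[n - i])) for i in range(1, n + 1))
    return -n - s / n
```

With a single observation at the true median, the criterion equals 2 log 2 − 1.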

5.5. Percentile Estimation

Let $y_{(1)},y_{(2)},\ldots,y_{(n)}$ be ordered observations of size $n$ from the AP distribution, and let $u_i=i/(n+1)$. The percentile estimates, $\hat{\alpha}_{PE}$ and $\hat{\beta}_{PE}$, of the parameters of the AP distribution are obtained by minimizing the function
$$\text{PE}=\sum_{i=1}^{n}\left[y_{(i)}-\left(\frac{\tan(u_i\arctan(\alpha))}{\alpha}\right)^{1/\beta}\right]^{2},$$
with respect to the parameters $\alpha$ and $\beta$.
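The percentile criterion only needs the closed-form quantile function. In this sketch (hypothetical names), a sample placed exactly at the theoretical quantiles Q(u_i) gives a criterion of zero at the true parameters.

```python
import math


def ap_quantile(u, alpha, beta):
    """Closed-form quantile function of the AP distribution."""
    return (math.tan(u * math.atan(alpha)) / alpha) ** (1 / beta)


def pe_objective(alpha, beta, ys):
    """Percentile criterion: squared gaps between y_(i) and Q(i / (n + 1))."""
    ys = sorted(ys)
    n = len(ys)
    return sum((y - ap_quantile(i / (n + 1), alpha, beta)) ** 2
               for i, y in enumerate(ys, start=1))
```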

5.6. Product Spacing Estimations

In this subsection, the maximum product spacing (MPS) and minimum spacing distance (MSD) estimation methods are discussed. The MPS estimation method is based on the Kullback–Leibler information measure. Consider the uniform spacings
$$D_i(\alpha,\beta)=F_Y(y_{(i)};\alpha,\beta)-F_Y(y_{(i-1)};\alpha,\beta)=\frac{\arctan(\alpha y_{(i)}^{\beta})}{\arctan(\alpha)}-\frac{\arctan(\alpha y_{(i-1)}^{\beta})}{\arctan(\alpha)},\quad i=1,2,\ldots,n+1,$$
where $F_Y(y_{(0)};\alpha,\beta)=0$, $F_Y(y_{(n+1)};\alpha,\beta)=1$, and $D_1(\alpha,\beta)+D_2(\alpha,\beta)+\cdots+D_{n+1}(\alpha,\beta)=1$. The MPS estimates, $\hat{\alpha}_{MPS}$ and $\hat{\beta}_{MPS}$, of the parameters are obtained by directly maximizing the logarithm of the geometric mean of the spacings, given by
$$\text{MPS}=\frac{1}{n+1}\sum_{i=1}^{n+1}\log D_i(\alpha,\beta),$$
with respect to the parameters $\alpha$ and $\beta$.
The MSD estimates, $\hat{\alpha}_{MSD}$ and $\hat{\beta}_{MSD}$, of the parameters of the AP distribution are obtained by minimizing the function
$$\text{MSD}=\sum_{i=1}^{n}\Delta\left(D_i(\alpha,\beta),\frac{1}{n+1}\right),$$
where $\Delta(a,b)$ represents an appropriate distance. Several choices of $\Delta(a,b)$ exist; in this study, we employ the absolute distance $|a-b|$ and the absolute-logarithm distance $|\log(a)-\log(b)|$. Hence, the minimum spacing absolute distance (MSAD) and minimum spacing absolute-logarithm distance (MSALD) estimates of the parameters are obtained by minimizing the functions
$$\text{MSAD}=\sum_{i=1}^{n}\left|D_i(\alpha,\beta)-\frac{1}{n+1}\right|$$
and
$$\text{MSALD}=\sum_{i=1}^{n}\left|\log(D_i(\alpha,\beta))-\log\left(\frac{1}{n+1}\right)\right|,$$
where D i ( α , β ) 1 n + 1 and log ( D i ( α , β ) ) log ( 1 n + 1 ) .
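The spacing-based criteria share one ingredient, the vector of uniform spacings. The sketch below (hypothetical names) builds D_1, …, D_{n+1} and evaluates the MPS, MSAD, and MSALD criteria, with the MSAD and MSALD sums running over i = 1, …, n as in the displayed formulas.

```python
import math


def ap_cdf(y, alpha, beta):
    return math.atan(alpha * y**beta) / math.atan(alpha)


def ap_quantile(u, alpha, beta):
    return (math.tan(u * math.atan(alpha)) / alpha) ** (1 / beta)


def spacings(alpha, beta, ys):
    """D_1, ..., D_{n+1}, with F(y_(0)) = 0 and F(y_(n+1)) = 1."""
    cdf = [0.0] + [ap_cdf(y, alpha, beta) for y in sorted(ys)] + [1.0]
    return [cdf[i] - cdf[i - 1] for i in range(1, len(cdf))]


def mps_objective(alpha, beta, ys):
    """Mean log-spacing; the MPS estimates maximise this."""
    d = spacings(alpha, beta, ys)
    return sum(math.log(di) for di in d) / len(d)


def msad_objective(alpha, beta, ys):
    """MSAD criterion over the first n spacings."""
    n = len(ys)
    d = spacings(alpha, beta, ys)
    return sum(abs(di - 1 / (n + 1)) for di in d[:n])


def msald_objective(alpha, beta, ys):
    """MSALD criterion over the first n spacings."""
    n = len(ys)
    d = spacings(alpha, beta, ys)
    return sum(abs(math.log(di) - math.log(1 / (n + 1))) for di in d[:n])
```

When the sample sits exactly at the theoretical quantiles i/(n + 1), every spacing equals 1/(n + 1), so MSAD and MSALD vanish and MPS attains its largest possible value, log(1/(n + 1)).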

5.7. Monte Carlo Simulation

In this section, we conduct Monte Carlo simulation studies to investigate how the various estimation techniques perform in estimating the parameters of the AP distribution. Two sets of parameter values are used: $\alpha=0.8$, $\beta=0.4$ and $\alpha=4.5$, $\beta=6.2$. The simulation experiments are repeated 5000 times for the sample sizes $n=25,50,100,250$, and $350$. The average estimates (AE), average absolute biases (AB), and root mean square errors (RMSE) of the parameters are reported in Table 1 and Table 2. As the sample size increases, the AEs of the parameters approach the true parameter values, and the ABs and RMSEs decrease for all the estimation methods used. Thus, the various estimation methods produce consistent estimates of the parameters of the AP distribution. However, none of the estimation methods proves to be uniformly superior to the others.
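The AE, AB, and RMSE summaries can be reproduced on a small scale. This sketch is illustrative only: it uses ML estimation via a hand-written Nelder–Mead search (an assumption; any optimizer would do), far fewer replications than the 5000 used in the study, and hypothetical function names.

```python
import math
import random


def ap_quantile(u, alpha, beta):
    return (math.tan(u * math.atan(alpha)) / alpha) ** (1 / beta)


def loglik(alpha, beta, ys):
    n = len(ys)
    return (n * math.log(alpha * beta) - n * math.log(math.atan(alpha))
            + (beta - 1) * sum(math.log(y) for y in ys)
            - sum(math.log1p(alpha**2 * y**(2 * beta)) for y in ys))


def nelder_mead(f, x0, step=0.5, tol=1e-9, max_iter=2000):
    """Minimal Nelder-Mead minimiser: reflection, expansion, contraction, shrink."""
    dim = len(x0)
    pts = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(dim)]
                        for i in range(dim)]
    vals = [f(p) for p in pts]
    for _ in range(max_iter):
        order = sorted(range(dim + 1), key=vals.__getitem__)
        pts, vals = [pts[i] for i in order], [vals[i] for i in order]
        if vals[-1] - vals[0] < tol:
            break
        cen = [sum(p[j] for p in pts[:-1]) / dim for j in range(dim)]
        ref = [c + (c - w) for c, w in zip(cen, pts[-1])]
        f_ref = f(ref)
        if f_ref < vals[0]:
            expd = [c + 2 * (c - w) for c, w in zip(cen, pts[-1])]
            f_exp = f(expd)
            pts[-1], vals[-1] = (expd, f_exp) if f_exp < f_ref else (ref, f_ref)
        elif f_ref < vals[-2]:
            pts[-1], vals[-1] = ref, f_ref
        else:
            con = [c + 0.5 * (w - c) for c, w in zip(cen, pts[-1])]
            f_con = f(con)
            if f_con < vals[-1]:
                pts[-1], vals[-1] = con, f_con
            else:  # shrink every vertex towards the best one
                best = pts[0]
                pts = [best] + [[b + 0.5 * (p - b) for b, p in zip(best, q)] for q in pts[1:]]
                vals = [vals[0]] + [f(p) for p in pts[1:]]
    i_best = min(range(dim + 1), key=vals.__getitem__)
    return pts[i_best]


def fit_ml(ys):
    """ML estimates from a Nelder-Mead search over (log alpha, log beta)."""
    def neg_loglik(t):
        if max(abs(t[0]), abs(t[1])) > 30:  # keep exp() within a safe range
            return math.inf
        return -loglik(math.exp(t[0]), math.exp(t[1]), ys)

    t_hat = nelder_mead(neg_loglik, [0.0, 0.0])
    return math.exp(t_hat[0]), math.exp(t_hat[1])


def monte_carlo(alpha, beta, n, reps, seed=0):
    """AE, AB (average absolute bias) and RMSE of the ML estimates of (alpha, beta)."""
    rng = random.Random(seed)
    fits = [fit_ml([ap_quantile(rng.random(), alpha, beta) for _ in range(n)])
            for _ in range(reps)]
    summary = {}
    for name, true, col in (("alpha", alpha, 0), ("beta", beta, 1)):
        est = [f[col] for f in fits]
        ae = sum(est) / reps
        ab = sum(abs(e - true) for e in est) / reps
        rmse = math.sqrt(sum((e - true) ** 2 for e in est) / reps)
        summary[name] = (ae, ab, rmse)
    return summary
```

By construction the RMSE is never smaller than the AB, since the root mean square of the errors bounds their mean absolute value.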

6. Empirical Application

In this section, we present frequentist and Bayesian applications of the AP distribution using biomedical data.

6.1. Frequentist Application

In this subsection, the univariate application of the AP distribution is illustrated using the ML estimation approach. The illustration uses data on the recovery rates for viable CD34+ cells of 239 patients who agreed to an autologous peripheral blood stem cell (PBSC) transplant after myeloablative doses of chemotherapy between 2003 and 2008 at the Edmonton Hematopoietic Stem Cell Lab in the Cross Cancer Institute-Alberta Health Services. The data can be found in the simplexreg package developed by [22]. Ref. [6] recently fitted the unit Burr XII (UBXII) distribution to the recovery rates for viable CD34+ cells. In this study, the AP distribution is fitted to the recovery rates, and its performance is compared with the AU distribution [14], unit power Weibull (UPW) distribution [23], log-XLindley (LXL) distribution [4], unit Lindley (UL) distribution [9], unit improved second degree Lindley (UISDL) distribution [11], bounded Marshall–Olkin extended exponential (BMOEE) distribution [24], unit Burr III (UBIII) distribution [8], unit Gompertz (UG) distribution [10], unit Weibull (UW) distribution [12], exponentiated Topp–Leone (ETL) distribution [13], Kumaraswamy distribution [25], and beta distribution. The performances of the distributions are compared using the log-likelihood ($\ell$), Akaike information criterion (AIC), AIC difference (DAIC), Bayesian information criterion (BIC), Anderson–Darling (AD) test, Cramér–von Mises (CVM) test, and Kolmogorov–Smirnov (KS) test. The distribution with the highest value of $\ell$ and the lowest values of AIC, BIC, AD, CVM, and KS is considered the best. The DAIC is computed as $\text{DAIC}_i=\text{AIC}_i-\text{AIC}_{\min}$, $i=1,2,\ldots,S$, where $S$ is the number of distributions under comparison. The best distribution satisfies $\text{DAIC}=0$; if $\text{DAIC}>2$, then the difference in performance between that model and the best one is significant. Before fitting the models to the recovery rates for viable CD34+ cells, we explore the characteristics of the data.
From the kernel density, boxplot, and violin plots shown in Figure 7, we observe that the recovery rate for viable CD34+ cells is left-skewed. Hence, a distribution capable of modeling left-skewed data is required, which is the case for the AP distribution.
Table 3 presents the ML estimates of the parameters with their respective standard errors in brackets. The AP distribution appears to be the best model, since it has the highest log-likelihood value and the smallest values of the AIC, BIC, AD, CVM, and KS statistics. The p-values of the AD, CVM, and KS tests, given in parentheses, also indicate that the AP distribution is the best. Furthermore, the DAIC values show that the AP distribution performs significantly better than the other fitted distributions. Comparing the goodness-of-fit statistics of the AP and AU distributions, it can be concluded that the introduction of the new parameter has greatly improved the fit, making the AP distribution superior to the AU distribution.
Figure 8 displays the histogram of the data with the estimated PDF of the AP distribution, and the empirical CDF with the estimated CDF of the AP distribution, both computed from the parameter estimates. This figure suggests that the AP distribution provides a good fit to the data.
Figure 9 displays the probability-probability (P-P) plots of the fitted distributions. This figure suggests that the AP distribution provides a good fit to the data as its expected and observed probabilities cluster along the diagonal line.
The profile log-likelihood plots of the estimated parameters of the AP distribution are shown in Figure 10. These plots suggest that the ML estimates of the parameters are unique and correspond to the maxima of the log-likelihood function.

6.2. Bayesian Application

In this subsection, we demonstrate how to use the Bayesian approach to estimate the parameters of the AP distribution. We first need to specify the prior distributions of the parameters, since they are essential in Bayesian estimation. In this study, we use non-informative gamma priors, an approach recommended by numerous studies (see [26,27]). Thus, the prior distributions of the parameters are
$$\alpha\sim\text{Gamma}(a_1,b_1):\quad\pi(\alpha)=\frac{b_1^{a_1}}{\Gamma(a_1)}\alpha^{a_1-1}e^{-b_1\alpha},\quad a_1>0,\ b_1>0,\ \alpha>0$$
and
$$\beta\sim\text{Gamma}(a_2,b_2):\quad\pi(\beta)=\frac{b_2^{a_2}}{\Gamma(a_2)}\beta^{a_2-1}e^{-b_2\beta},\quad a_2>0,\ b_2>0,\ \beta>0.$$
Assuming that the priors are independent, the joint prior PDF of the parameters is given by
$$\pi(\alpha,\beta)=\pi(\alpha)\,\pi(\beta).$$
The joint posterior PDF is therefore given by
$$P(\alpha,\beta\mid\mathbf{y})\propto\prod_{i=1}^{n}f_Y(y_i;\alpha,\beta)\times\pi(\alpha,\beta),$$
where $\prod_{i=1}^{n}f_Y(y_i;\alpha,\beta)$ is the likelihood function of the AP distribution. The joint posterior PDF is not analytically tractable; hence, we employ the Markov chain Monte Carlo (MCMC) approach to obtain samples from which features of the marginal posterior distributions can be inferred. The hyperparameter values $a_1=a_2=b_1=b_2=0.001$ are used in the analysis, which is performed using the R2jags package in R (see [28]) and the data described in Section 6.1. We use three parallel chains, each with 40,000 iterations, a burn-in of 5000, and a thinning interval of 5, which gives (40,000 − 5000)/5 = 7000 retained posterior draws per chain. Table 4 presents the mean estimate, Monte Carlo standard error (SE), posterior standard deviation (SD), and other numerical summaries of the posterior distribution. The results indicate that the MCMC algorithm has converged, because the potential scale reduction factor ($\hat{R}$) is approximately 1 and the effective sample size (neff) is greater than 400. The estimated deviance information criterion (DIC) is 385.2000. The Bayesian and ML estimates of the parameters are quite close.
We investigate the convergence of the chains visually using the trace, ergodic mean, and autocorrelation plots. The trace plots shown in Figure 11 suggest a stationary pattern and thus convergence of the chains.
The ergodic mean plots (Figure 12) of the parameters clearly show that the chains have converged after 3000 iterations.
The rapid decay of the autocorrelation plots, as shown in Figure 13, suggests good mixing of the chains and the convergence of the MCMC algorithm.
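The study runs JAGS through R2jags; as a self-contained stand-in, the sketch below swaps in a plain random-walk Metropolis sampler on (log α, log β) with the same independent Gamma(0.001, 0.001) priors. The step size, chain length, and burn-in are assumptions, the function names are hypothetical, and the log-Jacobian of the exponential map is added so that the priors remain Gamma on the original scale.

```python
import math
import random


def ap_quantile(u, alpha, beta):
    return (math.tan(u * math.atan(alpha)) / alpha) ** (1 / beta)


def make_log_posterior(ys, a1=0.001, b1=0.001, a2=0.001, b2=0.001):
    """Log posterior of theta = (log alpha, log beta) under independent Gamma priors."""
    n = len(ys)
    log_ys = [math.log(y) for y in ys]
    sum_log = sum(log_ys)

    def log_post(theta):
        if max(abs(theta[0]), abs(theta[1])) > 30:  # outside a safe range for exp()
            return -math.inf
        alpha, beta = math.exp(theta[0]), math.exp(theta[1])
        ll = (n * math.log(alpha * beta) - n * math.log(math.atan(alpha))
              + (beta - 1) * sum_log
              - sum(math.log1p(alpha**2 * math.exp(2 * beta * ly)) for ly in log_ys))
        # Gamma(a, b) log-density of alpha and beta plus the log-Jacobian of the exp map
        log_prior = a1 * theta[0] - b1 * alpha + a2 * theta[1] - b2 * beta
        return ll + log_prior

    return log_post


def rw_metropolis(log_post, theta0, n_iter, step, rng):
    """Random-walk Metropolis on the log scale; returns (alpha, beta) draws and acceptance rate."""
    theta, lp = list(theta0), log_post(theta0)
    draws, accepted = [], 0
    for _ in range(n_iter):
        prop = [t + rng.gauss(0.0, step) for t in theta]
        lp_prop = log_post(prop)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = prop, lp_prop
            accepted += 1
        draws.append((math.exp(theta[0]), math.exp(theta[1])))
    return draws, accepted / n_iter
```

Unlike the three-chain JAGS run described above, this is a single chain; convergence diagnostics such as R̂ would require running several chains from dispersed starting points.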

7. Regression Models

In this section, quantile and modal regression models are developed for investigating the relationship between a dependent variable and one or more independent variables.

7.1. Quantile Regression Model

When investigating the influence of covariates on a skewed, bounded response variable, the beta regression model may not produce reliable results, since it models the conditional mean of the response variable, and the mean is not an appropriate measure of central tendency when the data are skewed. A regression model that is robust to outliers is therefore required, and quantile regression is appropriate when dealing with skewed response variables. In this subsection, the AP quantile regression model is developed. To this aim, we re-parameterize the PDF of the AP distribution in terms of its quantile function. Let $\eta=Q(u;\alpha,\beta)$, $\eta\in(0,1)$, for a fixed quantile level $u\in(0,1)$. Solving the quantile function for $\beta$ gives $\beta=(\log(\eta))^{-1}\log\left(\alpha^{-1}\tan(u\arctan(\alpha))\right)$. Hence, the PDF re-parameterized in terms of the quantile is given by
$$f_Y(y;\alpha,\eta)=\frac{\alpha(\log(\eta))^{-1}\lambda\,y^{(\log(\eta))^{-1}\lambda-1}}{\arctan(\alpha)\left(1+\alpha^{2}y^{2(\log(\eta))^{-1}\lambda}\right)},$$
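where $\lambda=\log\left(\alpha^{-1}\tan(u\arctan(\alpha))\right)$ and $\eta$ is the quantile parameter. Since $\tan(u\arctan(\alpha))<\alpha$ for $u\in(0,1)$, both $\lambda$ and $\log(\eta)$ are negative, so the implied $\beta$ is positive. Suppose that $y_1,y_2,\ldots,y_n$ are random observations from the AP distribution and $\mathbf{z}_i$ is a vector of non-random covariates. The AP quantile regression model is thus given by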
$$\eta_i=g^{-1}(\mathbf{z}_i^{T}\boldsymbol{\delta}),$$
where $\boldsymbol{\delta}=(\delta_0,\delta_1,\delta_2,\ldots,\delta_p)^{T}$ is the vector of coefficients of the covariates to be estimated, $\mathbf{z}_i^{T}=(1,z_{i1},z_{i2},\ldots,z_{ip})$ is the known $i$th vector of independent variables, and $g(\cdot)$ is an appropriate link function that relates the independent variables to the conditional quantile of the dependent variable. When $u=0.5$, the median regression is obtained. Although different link functions exist for modeling bounded response variables, in this study, the logit link function is used because its parameters are easy to interpret. Hence, we have
$$
\log\left(\frac{\eta_i}{1-\eta_i}\right) = \delta_0 + \delta_1 z_{i1} + \delta_2 z_{i2} + \cdots + \delta_p z_{ip}.
$$
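The re-parameterization can be checked numerically. The following is a minimal Python/NumPy sketch, not the authors' code: the function names (`ap_cdf`, `ap_pdf`, `ap_quantile`, `beta_from_quantile`, `ap_pdf_quantile`) are hypothetical, and the CDF $F(y) = \arctan(\alpha y^{\beta})/\arctan(\alpha)$ is the one implied by the PDF above (whose derivative it is), assuming $\alpha > 0$. It confirms that after substituting $\beta = \lambda/\log(\eta)$, the $u$th quantile of the distribution is exactly $\eta$.

```python
import numpy as np

# Illustrative sketch; all function names are hypothetical, not from the paper.


def ap_cdf(y, alpha, beta):
    """AP CDF F(y) = arctan(alpha * y**beta) / arctan(alpha) on (0, 1)."""
    return np.arctan(alpha * y**beta) / np.arctan(alpha)


def ap_pdf(y, alpha, beta):
    """AP PDF f(y) = alpha*beta*y**(beta-1) / (arctan(alpha) * (1 + alpha**2 * y**(2*beta)))."""
    return alpha * beta * y ** (beta - 1) / (np.arctan(alpha) * (1 + alpha**2 * y ** (2 * beta)))


def ap_quantile(u, alpha, beta):
    """Quantile function Q(u) = (tan(u * arctan(alpha)) / alpha) ** (1 / beta)."""
    return (np.tan(u * np.arctan(alpha)) / alpha) ** (1.0 / beta)


def beta_from_quantile(eta, u, alpha):
    """Solve eta = Q(u; alpha, beta) for beta, giving beta = lambda / log(eta).

    lambda = log(tan(u * arctan(alpha)) / alpha) is negative for 0 < u < 1,
    and so is log(eta) for 0 < eta < 1, hence beta > 0.
    """
    lam = np.log(np.tan(u * np.arctan(alpha)) / alpha)
    return lam / np.log(eta)


def ap_pdf_quantile(y, alpha, eta, u=0.5):
    """Re-parameterized PDF f_Y(y; alpha, eta), with eta the u-th quantile."""
    return ap_pdf(y, alpha, beta_from_quantile(eta, u, alpha))


# Round trip: with beta = lambda / log(eta), the median (u = 0.5) equals eta.
alpha, eta, u = 1.5, 0.7, 0.5
beta = beta_from_quantile(eta, u, alpha)
print(ap_quantile(u, alpha, beta), ap_cdf(eta, alpha, beta))  # ≈ 0.7 and ≈ 0.5
```

Working with $\beta$ as a derived quantity keeps a single density routine (`ap_pdf`) for both parameterizations.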
The log-likelihood for estimating the parameters of the regression model is
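$$
\ell = n\log(\alpha) - n\log(\arctan(\alpha)) + n\log(-\lambda) - \sum_{i=1}^{n}\log\left(-\log(\eta_i)\right) + \sum_{i=1}^{n}\left((\log(\eta_i))^{-1}\lambda - 1\right)\log(y_i) - \sum_{i=1}^{n}\log\left(1 + \alpha^{2}\, y_i^{2(\log(\eta_i))^{-1}\lambda}\right). \tag{34}
$$
Since $0 < \alpha^{-1}\tan(u\arctan(\alpha)) < 1$ for $u \in (0,1)$, we have $\lambda < 0$, and likewise $\log(\eta_i) < 0$. The corresponding terms are therefore written with $-\lambda$ and $-\log(\eta_i)$ so that all logarithms are well defined.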
Maximizing the log-likelihood function in Equation (34) with respect to $\delta$ and $\alpha$ gives the maximum likelihood (ML) estimates of the model parameters. For more information on the development of parametric quantile regressions, we refer readers to [2,3,6].
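To show how Equation (34) can be maximized in practice, the following Python sketch (NumPy and SciPy assumed) codes the log-likelihood with the logit link and fits it by Nelder–Mead on simulated median-regression data. It is not the authors' implementation. The parameter layout, the log scale for $\alpha$ (which assumes $\alpha > 0$), the optimizer, the seed, and the names `qr_loglik` and `fit_ap_quantile` are all illustrative choices. The terms of Equation (34) involving $\lambda$ and $\log(\eta_i)$ are computed together as $\sum_i \log(\beta_i)$ with $\beta_i = \lambda/\log(\eta_i)$.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # inverse logit

# Illustrative sketch; names, parameter layout, and optimizer are assumptions.


def qr_loglik(params, y, Z, u=0.5):
    """Log-likelihood (34); params = (delta_0, ..., delta_p, log(alpha))."""
    delta, alpha = params[:-1], np.exp(params[-1])
    eta = expit(Z @ delta)                               # eta_i = g^{-1}(z_i^T delta)
    lam = np.log(np.tan(u * np.arctan(alpha)) / alpha)   # lambda < 0
    beta = lam / np.log(eta)                             # beta_i = lambda / log(eta_i) > 0
    n = len(y)
    return (n * np.log(alpha) - n * np.log(np.arctan(alpha))
            + np.sum(np.log(beta))                       # n log(-lambda) - sum log(-log eta_i)
            + np.sum((beta - 1) * np.log(y))
            - np.sum(np.log1p(alpha**2 * y ** (2 * beta))))


def fit_ap_quantile(y, Z, u=0.5):
    """Maximize (34) numerically; returns (delta_hat, alpha_hat, max log-likelihood)."""
    def nll(p):
        val = qr_loglik(p, y, Z, u)
        return -val if np.isfinite(val) else np.inf      # guard against overflow
    start = np.zeros(Z.shape[1] + 1)                     # delta = 0, alpha = 1
    res = minimize(nll, start, method="Nelder-Mead",
                   options={"maxiter": 20000, "xatol": 1e-6, "fatol": 1e-8})
    return res.x[:-1], np.exp(res.x[-1]), -res.fun


# Simulated median-regression data (u = 0.5) generated by inversion.
rng = np.random.default_rng(1)
n = 2000
Z = np.column_stack([np.ones(n), rng.standard_normal(n)])
delta_true, alpha_true = np.array([0.8, 0.3]), 1.5
eta = expit(Z @ delta_true)
beta = np.log(np.tan(0.5 * np.arctan(alpha_true)) / alpha_true) / np.log(eta)
y = (np.tan(rng.uniform(size=n) * np.arctan(alpha_true)) / alpha_true) ** (1 / beta)

delta_hat, alpha_hat, loglik = fit_ap_quantile(y, Z)
print(delta_hat, alpha_hat, loglik)
```

A derivative-free optimizer is used here only because it tolerates the infinite values returned when $\eta_i$ reaches 0 or 1 numerically. A gradient-based method with analytic scores would be a natural refinement.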

7.2. Modal Regression

When the response variable is heavy-tailed or asymmetric, modal regression is known to give a better fit than conditional mean or median regression [29]. It has also been established that prediction intervals from modal regression have a higher coverage probability than mean-based prediction intervals (see [29,30]). This subsection presents the modal regression based on the AP distribution. Suppose that the transformation $(\alpha,\beta) \mapsto (\eta,\varphi)$ is one-to-one, where $\eta \in (0,1)$ is the mode and $\varphi > 1$ is a precision/shape parameter. Then the PDF of the AP distribution can be re-parameterized in terms of the mode (see [29]). Letting $\beta = \varphi$ gives $\alpha = \eta^{-\varphi}(\varphi+1)^{-1/2}(\varphi-1)^{1/2}$, and the PDF of the AP distribution in terms of the mode is given by
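$$
f_Y(y;\eta,\varphi) = \frac{\eta^{-\varphi}\,\varphi\,(\varphi+1)^{-1/2}(\varphi-1)^{1/2}\; y^{\varphi-1}}{\arctan\left(\eta^{-\varphi}(\varphi+1)^{-1/2}(\varphi-1)^{1/2}\right)\left(1 + \eta^{-2\varphi}(\varphi+1)^{-1}(\varphi-1)\, y^{2\varphi}\right)}, \quad 0 < y < 1.
$$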
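The expression for $\alpha$ comes from locating the maximum of the AP density. Here is a short sketch of the derivation, written with the original parameters, for which $f_Y(y;\alpha,\beta) \propto y^{\beta-1}/(1+\alpha^{2}y^{2\beta})$:

$$
\frac{\partial}{\partial y}\log f_Y(y;\alpha,\beta) = \frac{\beta-1}{y} - \frac{2\alpha^{2}\beta\, y^{2\beta-1}}{1+\alpha^{2}y^{2\beta}} = 0
\iff (\beta-1)\left(1+\alpha^{2}y^{2\beta}\right) = 2\beta\,\alpha^{2}y^{2\beta}
\iff y^{2\beta} = \frac{\beta-1}{\alpha^{2}(\beta+1)}.
$$

Writing $t = \alpha^{2}y^{2\beta}$, which increases with $y$, we have $y\,\partial_y \log f_Y = (\beta-1) - 2\beta t/(1+t)$. This expression is positive at $t = 0$ when $\beta > 1$ and decreases to $-(\beta+1)$ as $t \to \infty$, so the stationary point is the unique maximum. The mode is therefore $\eta = \left[(\beta-1)/(\alpha^{2}(\beta+1))\right]^{1/(2\beta)}$. Setting $\beta = \varphi$ and solving for the positive root gives $\alpha^{2} = \eta^{-2\varphi}(\varphi-1)/(\varphi+1)$, that is, $\alpha = \eta^{-\varphi}(\varphi+1)^{-1/2}(\varphi-1)^{1/2}$.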
The modal regression is given by
$$
\eta_i = h^{-1}\left(z_i^{T}\delta\right),
$$
where $\delta = (\delta_0, \delta_1, \delta_2, \ldots, \delta_p)^{T}$ is the vector of unknown parameters to be estimated, $z_i^{T} = (1, z_{i1}, z_{i2}, \ldots, z_{ip})$ is the known $i$th vector of covariates, and $h(\cdot)$ is an appropriate link function that links the covariates to the conditional mode of the response variable. The logit link function is adopted because the mode of the AP distribution lies in $(0, 1)$. Thus, we have
$$
\log\left(\frac{\eta_i}{1-\eta_i}\right) = \delta_0 + \delta_1 z_{i1} + \delta_2 z_{i2} + \cdots + \delta_p z_{ip}.
$$
The log-likelihood for estimating the parameters of the model is given by
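$$
\ell = n\log\left(\varphi(\varphi+1)^{-1/2}(\varphi-1)^{1/2}\right) - \varphi\sum_{i=1}^{n}\log(\eta_i) + (\varphi-1)\sum_{i=1}^{n}\log(y_i) - \sum_{i=1}^{n}\log\left(\arctan\left(\eta_i^{-\varphi}(\varphi+1)^{-1/2}(\varphi-1)^{1/2}\right)\right) - \sum_{i=1}^{n}\log\left(1 + \eta_i^{-2\varphi}(\varphi+1)^{-1}(\varphi-1)\, y_i^{2\varphi}\right). \tag{36}
$$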
The estimates of the parameters of the modal regression are obtained by maximizing Equation (36) with respect to $\delta$ and $\varphi$.
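Here is a minimal Python sketch of Equation (36), assuming NumPy and SciPy. The names `ap_pdf`, `alpha_from_mode`, and `modal_loglik`, the parameter layout, and the $\log(\varphi - 1)$ scale used to keep $\varphi > 1$ are illustrative assumptions, not the authors' code. The sketch also checks numerically that the density with $\alpha = \eta^{-\varphi}(\varphi+1)^{-1/2}(\varphi-1)^{1/2}$ peaks at $\eta$. The negative of `modal_loglik` can then be passed to `scipy.optimize.minimize` to obtain the estimates.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit

# Illustrative sketch; names and the log(phi - 1) parameterization are assumptions.


def ap_pdf(y, alpha, beta):
    """AP PDF with the original parameters (alpha, beta)."""
    return alpha * beta * y ** (beta - 1) / (np.arctan(alpha) * (1 + alpha**2 * y ** (2 * beta)))


def alpha_from_mode(eta, phi):
    """alpha = eta**(-phi) * (phi + 1)**(-1/2) * (phi - 1)**(1/2), valid for phi > 1."""
    return eta ** (-phi) * np.sqrt((phi - 1) / (phi + 1))


def modal_loglik(params, y, Z):
    """Log-likelihood (36); params = (delta_0, ..., delta_p, log(phi - 1))."""
    delta, phi = params[:-1], 1.0 + np.exp(params[-1])   # phi > 1 by construction
    eta = expit(Z @ delta)                               # logit link for the mode
    c = np.sqrt((phi - 1) / (phi + 1))                   # (phi+1)^(-1/2) (phi-1)^(1/2)
    n = len(y)
    return (n * np.log(phi * c) - phi * np.sum(np.log(eta))
            + (phi - 1) * np.sum(np.log(y))
            - np.sum(np.log(np.arctan(c * eta ** (-phi))))
            - np.sum(np.log1p(c**2 * eta ** (-2 * phi) * y ** (2 * phi))))


# Numerical check: with alpha = alpha_from_mode(eta, phi), the density peaks at eta.
eta0, phi0 = 0.6, 3.0
res = minimize_scalar(lambda t: -ap_pdf(t, alpha_from_mode(eta0, phi0), phi0),
                      bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)  # ≈ 0.6
```

Because each observation contributes $\log f_Y(y_i;\eta_i,\varphi)$, Equation (36) equals the sum of the log-densities evaluated at $\alpha_i = \eta_i^{-\varphi}(\varphi+1)^{-1/2}(\varphi-1)^{1/2}$ and $\beta = \varphi$. This gives a simple way to unit-test an implementation.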

7.3. Residual Analysis

Assessing how well a model fits a given data set is essential, and the adequacy of a regression model is often examined through the residuals of the fitted model. In this study, the Cox–Snell and randomized quantile residuals are used to assess the regression models.
The Cox–Snell residuals (see [31]) are defined as
$$
e_i = -\log\left(1 - F_Y(y_i;\hat{\delta})\right), \quad i = 1, 2, \ldots, n,
$$
where $\hat{\delta}$ is the vector of the estimated parameters of the regression model. The Cox–Snell residuals are expected to follow the standard exponential distribution if the model fits the data well.
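The exponential reference distribution follows from the probability integral transform. If the model is correctly specified and $F_Y$ is continuous, then $U_i = F_Y(Y_i)$ is uniform on $(0,1)$. With estimated parameters this holds only approximately. Then, for $t > 0$,

$$
P(e_i > t) = P\left\{-\log\left(1 - U_i\right) > t\right\} = P\left\{U_i > 1 - e^{-t}\right\} = e^{-t}.
$$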
The randomized quantile residuals provide another way to examine the adequacy of the regression model. They are given by
$$
e_i = \Phi^{-1}\left(F_Y(y_i;\hat{\delta})\right), \quad i = 1, 2, \ldots, n,
$$
where $\Phi^{-1}(\cdot)$ is the quantile function of the standard normal distribution. If the regression model fits the data well, the randomized quantile residuals are expected to follow the standard normal distribution (see [32]).
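Both residuals can be computed directly from the fitted CDF. The Python sketch below (NumPy/SciPy assumed; the function name `ap_median_residuals` is hypothetical) does this for the AP median regression. It uses the CDF $F(y) = \arctan(\alpha y^{\beta_i})/\arctan(\alpha)$ with $\beta_i = \lambda/\log(\eta_i)$. Because the response is continuous, the randomized quantile residual of [32] reduces to $\Phi^{-1}(F_Y(y_i;\hat{\delta}))$, and no randomization step is needed. As a sanity check, residuals computed at the true parameters on simulated data should have a mean near 1 (Cox–Snell) and a mean near 0 with a standard deviation near 1 (quantile residuals).

```python
import numpy as np
from scipy.special import expit
from scipy.stats import norm

# Illustrative sketch; the helper name and the simulated setup are assumptions.


def ap_median_residuals(y, Z, delta, alpha, u=0.5):
    """Cox-Snell and quantile residuals for the AP quantile regression."""
    eta = expit(Z @ delta)
    beta = np.log(np.tan(u * np.arctan(alpha)) / alpha) / np.log(eta)
    F = np.arctan(alpha * y**beta) / np.arctan(alpha)   # fitted CDF at y_i
    cox_snell = -np.log1p(-F)                            # -log(1 - F)
    quantile_res = norm.ppf(F)                           # Phi^{-1}(F)
    return cox_snell, quantile_res


# Sanity check on data simulated from the model, with true parameters plugged in.
rng = np.random.default_rng(7)
n = 5000
Z = np.column_stack([np.ones(n), rng.standard_normal(n)])
delta, alpha = np.array([0.8, 0.3]), 1.5
eta = expit(Z @ delta)
beta = np.log(np.tan(0.5 * np.arctan(alpha)) / alpha) / np.log(eta)
y = (np.tan(rng.uniform(size=n) * np.arctan(alpha)) / alpha) ** (1 / beta)

cs, rq = ap_median_residuals(y, Z, delta, alpha)
print(cs.mean(), rq.mean(), rq.std())  # close to 1, 0, and 1
```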

7.4. Monte Carlo Simulation for Regression Models

In this section, Monte Carlo simulation experiments are carried out to assess how well the ML estimators perform in estimating the parameters of the AP quantile and modal regressions. The simulations for the quantile regression use the conditional median ($u = 0.5$), that is, the median of the response variable given the values of the covariates. Each experiment is replicated 5000 times for each sample size $n = 50, 150, 250, 350, 450$, and $550$. In the first scenario, the following parameter combinations are used for the quantile and modal regressions, respectively: $(\delta_0, \delta_1, \delta_2, \alpha) = (0.8, 0.3, 0.6, 1.5)$ and $(\delta_0, \delta_1, \delta_2, \varphi) = (0.8, 0.3, 0.6, 1.5)$. In the second scenario, the following parameter combinations are used, respectively, for the quantile and modal regressions: $(\delta_0, \delta_1, \delta_2, \alpha) = (0.1, 0.4, 0.8, 1.3)$ and $(\delta_0, \delta_1, \delta_2, \varphi) = (0.1, 0.4, 0.8, 1.3)$. The following regression structure with two covariates is employed in the simulations for both regression models:
$$
\log\left(\frac{\eta_i}{1-\eta_i}\right) = \delta_0 + \delta_1 z_{i1} + \delta_2 z_{i2}, \quad i = 1, 2, \ldots, n.
$$
The covariate $z_{i1}$ is generated from a standard normal distribution and $z_{i2}$ from a $t$ distribution with four degrees of freedom. The covariates are held fixed throughout the simulation. The observations of the response variable are generated using the inversion method for both the quantile and modal regressions. The performance of the estimation method is assessed using the average estimate (AE), absolute bias (AB), and root mean square error (RMSE). The results in Table 5 and Table 6 show that, as the sample size increases, the AEs generally approach the true parameter values and the ABs and RMSEs generally decrease. This suggests that the ML estimators of the parameters of both models are consistent.
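The data-generating step of this design can be sketched in Python (NumPy/SciPy assumed). The helper names, the seed, and the single-replicate setup are illustrative, and the estimation loop over 5000 replicates is not shown. Responses are drawn by inversion, $y_i = Q(U_i)$ with $U_i \sim U(0,1)$. For the median model, $\beta_i = \lambda/\log(\eta_i)$ with $u = 0.5$. For the modal model, $\beta = \varphi$ and $\alpha_i = \eta_i^{-\varphi}(\varphi+1)^{-1/2}(\varphi-1)^{1/2}$. The `mc_summary` helper computes AE, AB, and RMSE over replicates. Treating AB as the mean absolute error $\frac{1}{N}\sum|\hat{\theta} - \theta|$ is an assumption, although it is consistent with Table 5, where AB can exceed $|\text{AE} - \theta|$.

```python
import numpy as np
from scipy.special import expit

# Illustrative sketch; helper names, the seed, and the AB definition are assumptions.


def ap_inverse(U, alpha, beta):
    """Inversion method: Q(U) = (tan(U * arctan(alpha)) / alpha) ** (1 / beta)."""
    return (np.tan(U * np.arctan(alpha)) / alpha) ** (1.0 / beta)


def simulate_median(Z, delta, alpha, rng):
    """Responses whose conditional median (u = 0.5) is eta_i = expit(z_i^T delta)."""
    eta = expit(Z @ delta)
    beta = np.log(np.tan(0.5 * np.arctan(alpha)) / alpha) / np.log(eta)
    return ap_inverse(rng.uniform(size=len(eta)), alpha, beta)


def simulate_modal(Z, delta, phi, rng):
    """Responses whose conditional mode is eta_i = expit(z_i^T delta), with phi > 1."""
    eta = expit(Z @ delta)
    alpha = eta ** (-phi) * np.sqrt((phi - 1) / (phi + 1))
    return ap_inverse(rng.uniform(size=len(eta)), alpha, phi)


def mc_summary(estimates, true_value):
    """AE, AB (mean absolute error, assumed), and RMSE over Monte Carlo replicates."""
    est = np.asarray(estimates, dtype=float)
    return (est.mean(),
            np.mean(np.abs(est - true_value)),
            np.sqrt(np.mean((est - true_value) ** 2)))


rng = np.random.default_rng(2023)
n = 550
# Fixed covariates: intercept, z_i1 ~ N(0, 1), z_i2 ~ t with 4 degrees of freedom.
Z = np.column_stack([np.ones(n), rng.standard_normal(n), rng.standard_t(4, size=n)])
y_median = simulate_median(Z, np.array([0.8, 0.3, 0.6]), 1.5, rng)  # first scenario
y_modal = simulate_modal(Z, np.array([0.8, 0.3, 0.6]), 1.5, rng)
print(y_median[:5], y_modal[:5])
```

In a full experiment, `Z` would be generated once and reused, new responses would be drawn in each replicate, and the fitted coefficients would be collected and passed to `mc_summary`.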

7.5. Application of Regression Models

The use of the quantile and modal regressions is demonstrated in this subsection. The quantile regression is illustrated through the conditional median regression, obtained by setting $u = 0.5$. The models are applied by regressing the recovery rates of viable CD34+ cells of the 239 patients described in Section 6 on the following covariates: gender ($z_{i1}$: 0 for female and 1 for male), chemotherapy ($z_{i2}$: 0 for a one-day protocol and 1 for a three-day protocol), and adjusted age ($z_{i3}$: the patient's current age minus 40). Ref. [6] fitted the UBXII median regression to these data and obtained AIC = −384.2649 and BIC = −366.8826. The authors showed that the UBXII median regression performs better than the Kumaraswamy median regression (AIC = −375.6599 and BIC = −358.2775) and the beta mean regression (AIC = −381.7912 and BIC = −364.4089). The exploratory analysis in Section 6.1 suggests that the response variable is left-skewed or contains some extreme values. This indicates that robust regression models are required for modeling the data, which justifies the use of median and modal regressions. We adopt the following regression structure:
$$
\log\left(\frac{\eta_i}{1-\eta_i}\right) = \delta_0 + \delta_1 z_{i1} + \delta_2 z_{i2} + \delta_3 z_{i3}, \quad i = 1, 2, \ldots, 239
$$
to model the data. Table 7 displays the estimates of the model parameters, their standard errors and p-values, and the information criteria. Based on the information criteria, the AP regressions (median and modal) perform better than the UBXII median, Kumaraswamy median, and beta mean regressions. Since the AIC differences (DAIC) exceed 2 in each of these comparisons, the AP regressions perform substantially better than the compared regressions. Between the two AP models, the median regression has the lower AIC and BIC values and therefore performs better than the modal regression. From Table 7, the parameter $\delta_1$ is not statistically significant at the 5% level, so gender has no significant effect on the recovery rate. The parameters $\delta_2$ and $\delta_3$ are statistically significant at the 5% level, and both estimates are positive. This implies that the conditional median and mode of the recovery rate are higher for older patients than for younger ones. They are also higher for patients who receive chemotherapy on a three-day protocol than for those on a one-day protocol.
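The information criteria and p-values in Table 7 can be reproduced from the reported quantities. The Python sketch below uses only the standard library, and its helper names are hypothetical. It assumes $k = 5$ parameters ($\delta_0, \ldots, \delta_3$ plus $\alpha$ or $\varphi$), $n = 239$, $\mathrm{AIC} = -2\ell + 2k$, $\mathrm{BIC} = -2\ell + k\log n$, and two-sided Wald $z$-tests with $z = \text{estimate}/\text{SE}$. The Wald form is an assumption, but it reproduces the reported p-values. Small differences in the last digits arise because $\ell$ is reported to two decimal places.

```python
import math

# Illustrative sketch; helper names are hypothetical, and the Wald test is an assumption.


def aic(loglik, k):
    """Akaike information criterion: -2 * loglik + 2k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion: -2 * loglik + k log(n)."""
    return -2.0 * loglik + k * math.log(n)


def wald_p(estimate, se):
    """Two-sided p-value 2 * (1 - Phi(|estimate / se|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(estimate / se) / math.sqrt(2.0))


n, k = 239, 5
for name, loglik in [("AP median", 201.14), ("AP modal", 199.73)]:
    print(name, round(aic(loglik, k), 2), round(bic(loglik, k, n), 2))

# AIC gap between the UBXII median regression of [6] and the AP median regression
print("DAIC vs UBXII:", round(-384.2649 - aic(201.14, k), 2))
print("p-value for delta_2 (median):", round(wald_p(0.2392, 0.0940), 4))
```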
The adequacy of the fitted regression models is assessed by examining the residuals of the fitted models. The P-P plots and half-normal plots with simulated envelopes of the randomized quantile residuals in Figure 14 indicate that the models are adequate.
The P-P and quantile-quantile (Q-Q) plots with simulated envelopes of the Cox–Snell residuals shown in Figure 15 again affirm that the fitted models are adequate.

8. Conclusions

In this study, the AP distribution and its associated quantile and modal regressions were developed. The PDF of the AP distribution exhibits flexible shapes such as left-skewed, right-skewed, J, and reversed-J shapes, which makes the distribution a suitable candidate for fitting data with such characteristics. The corresponding HRF also suggests that the distribution is capable of fitting data with monotonic and non-monotonic failure rates. We explored the performance of nine frequentist estimation procedures for estimating the parameters of the distribution using Monte Carlo simulations, and the results revealed that most of the procedures are consistent. A biomedical application showed that the distribution provides a good fit to the data, and a Bayesian illustration showed that this approach estimates the parameters of the distribution well. The applications of the quantile and modal regressions demonstrated that the new regression models outperformed some existing regression models. Future work will demonstrate Bayesian applications of the quantile and modal regressions.

Author Contributions

Conceptualization, S.N., A.G.A. and C.C.; Data curation, S.N., A.G.A. and C.C.; Methodology, S.N., A.G.A. and C.C.; Supervision, S.N. and C.C.; Validation, S.N. and C.C.; Visualization, S.N. and A.G.A.; Writing, S.N. and A.G.A.; Review and editing, S.N. and C.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used in this study can be found in the simplexreg package of the R software developed by [22].

Acknowledgments

We express our sincere gratitude to the editor and reviewers whose constructive criticism improved the content of the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Korkmaz, M.Ç.; Chesneau, C.; Korkmaz, Z.S. The unit folded normal distribution: A new unit probability distribution with the estimation procedures, quantile regression modeling and educational attainment applications. J. Reliab. Stat. Stud. 2022, 15, 261–298.
2. Nasiru, S.; Abubakari, A.G.; Chesneau, C. New lifetime distribution for modeling data on the unit interval: Properties, application and quantile regression. Math. Comput. Appl. 2022, 27, 105.
3. Abubakari, A.G.; Luguterah, A.; Nasiru, S. Unit exponentiated Fréchet distribution: Actuarial measures, quantile regression and applications. J. Indian Soc. Probab. Stat. 2022, 23, 387–424.
4. Eliwa, M.S.; Ahsan-ul-Haq, M.; Al-Bossly, A.; El-Morshedy, M. Properties and estimation techniques with application to model data from SC16 and P3 algorithms. Math. Probl. Eng. 2022, 2022, 9289721.
5. Korkmaz, M.Ç.; Emrah, A.; Chesneau, C.; Yousof, H.M. On the unit-Chen distribution with associated quantile regression and applications. Math. Slovaca 2022, 72, 765–786.
6. Korkmaz, M.Ç.; Chesneau, C. On the unit Burr XII distribution with the quantile regression modeling and applications. Comput. Appl. Math. 2021, 40, 29.
7. Korkmaz, M.Ç. The unit generalized half normal distribution: A new bounded distribution with inference and application. UPB Sci. Bull. Ser. A 2020, 82, 133–140.
8. Modi, K.; Gill, V. Unit Burr-III distribution with application. J. Stat. Manag. Syst. 2019, 23, 579–592.
9. Mazucheli, J.; Menezes, A.F.B.; Chakraborty, S. On the one parameter unit-Lindley distribution and its associated regression model for proportion data. J. Appl. Stat. 2019, 46, 700–714.
10. Mazucheli, J.; Menezes, A.F.; Dey, S. Unit-Gompertz distribution with applications. Statistica 2019, 79, 25–43.
11. Altun, E.; Cordeiro, G.M. The unit-improved second-degree Lindley distribution: Inference and regression modeling. Comput. Stat. 2019, 35, 259–279.
12. Mazucheli, J.; Menezes, A.F.; Ghitany, M.E. The unit Weibull distribution and associated inference. J. Appl. Probab. Stat. 2018, 13, 1–22.
13. Pourdarvish, A.; Mirmostafaee, S.M.T.K.; Naderi, K. The exponentiated Topp-Leone distribution: Properties and application. J. Appl. Environ. Biol. Sci. 2015, 5, 251–256.
14. Kharazmi, O.; Alizadeh, M.; Contreras-Reyes, J.E.; Haghbin, H. Arctan-based family of distributions: Properties, survival regression, Bayesian analysis and applications. Axioms 2022, 11, 399.
15. Al-Mofleh, H.; Afify, A.Z.; Ibrahim, N.A. A new extended two-parameter distribution: Properties, estimation methods, and applications in medicine and geology. Mathematics 2020, 8, 1578.
16. Iqbal, Z.; Tahir, M.M.; Riaz, N.; Ali, S.A.; Ahmad, M. Generalized inverted Kumaraswamy distribution: Properties and application. Open J. Stat. 2017, 7, 645–662.
17. Iqbal, Z.; Hasnain, S.A.; Salman, M.; Ahmad, M.; Hamedani, G.G. Generalized exponentiated moment exponential distribution. Pak. J. Stat. 2014, 30, 537–554.
18. Gradshteyn, I.S.; Ryzhik, I.M. Tables of Integrals, Series and Products, 7th ed.; Elsevier/Academic Press: Amsterdam, The Netherlands, 2007.
19. Sklar, A. Random variables, joint distribution functions and copulas. Kybernetika 1973, 9, 449–460.
20. Elhassanein, A. On statistical properties of a new bivariate modified Lindley distribution with an application to financial data. Complexity 2022, 2022, 2328831.
21. Ganji, M.; Bevrani, H.; Hami, N. A new method for generating continuous bivariate families. J. Iran. Stat. Soc. 2018, 17, 109–129.
22. Zhang, P.; Qiu, Z.; Shi, C. Simplexreg: An R package for regression analysis of proportional data using the simplex distribution. J. Stat. Softw. 2016, 71, 1–21.
23. Bantan, R.A.R.; Shafiq, S.; Tahir, M.H.; Elhassanein, A.; Jamal, F.; Almutiry, W.; Elgarhy, M. Statistical analysis of COVID-19 data: Using a new univariate and bivariate statistical model. J. Funct. Spaces 2022, 2022, 2851352.
24. Ghosh, I.; Dey, S.; Kumar, D. Bounded M-O extended exponential distribution with applications. Stoch. Qual. Control. 2019, 34, 35–51.
25. Kumaraswamy, P. A generalized probability density function for double-bounded random processes. J. Hydrol. 1980, 46, 79–88.
26. Muse, A.H.; Chesneau, C.; Ngesa, O.; Mwalili, S. Flexible parametric accelerated hazard model: Simulation and application to censored lifetime data with crossing survival curves. Math. Comput. Appl. 2022, 27, 104.
27. Khan, S.A. Exponentiated Weibull regression for time-to-event data. Lifetime Data Anal. 2018, 24, 328–354.
28. Su, Y.S.; Yajima, M. R2jags: A Package for Running Jags from R. 2012. Available online: https://CRAN.R-project.org/package=R2jags (accessed on 21 December 2022).
29. Menezes, A.F.B.; Mazucheli, J.; Chakraborty, S. A collection of parametric modal regression models for bounded data. J. Biopharm. Stat. 2021, 31, 490–506.
30. Yao, W.; Li, L. A new regression model. Scand. J. Stat. 2014, 41, 656–671.
31. Cox, D.R.; Snell, E.J. A general definition of residuals. J. R. Stat. Soc. Ser. B 1968, 30, 248–275.
32. Dunn, P.K.; Smyth, G.K. Randomized quantile residuals. J. Comput. Graph. Stat. 1996, 5, 236–244.
Figure 1. PDF (left) and HRF (right) plots.
Figure 2. Skewness (left) and Kurtosis (right) plots.
Figure 3. Plots of Lorenz curve (left) and Bonferroni curve (right).
Figure 4. Min-max plots for the AP distribution.
Figure 5. CDF plots of the BAP distribution.
Figure 6. PDF plots of the BAP distribution.
Figure 7. Kernel density, boxplot, and violin plots.
Figure 8. Histogram and estimated PDF (left), and empirical CDF and estimated CDF (right).
Figure 9. P-P plots of the fitted distributions.
Figure 10. Profile log-likelihood plots of the estimated parameters of the AP distribution.
Figure 11. The AP distribution posterior parameters trace plots.
Figure 12. The AP distribution posterior parameters ergodic mean plots.
Figure 13. The AP distribution posterior parameters autocorrelation plots.
Figure 14. P-P (top) and half-normal (bottom) plots of the randomized quantile residuals.
Figure 15. P-P (top) and Q-Q (bottom) plots of the Cox–Snell residuals.
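Table 1. AE, AB, and RMSE for α = 0.8 and β = 0.4.

| Measure | Parameter | n | ML | MPS | OLS | WLS | AD | CVM | PE | MADS | MALDS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AE | α | 25 | 0.7609 | 1.1013 | 0.4303 | 0.5079 | 0.5634 | 0.6210 | 0.8969 | 0.1730 | 0.5673 |
| | | 50 | 0.8989 | 1.1131 | 0.6865 | 0.7679 | 0.7794 | 0.8387 | 0.9400 | 0.1718 | 0.5865 |
| | | 100 | 0.5186 | 0.6330 | 0.5285 | 0.5408 | 0.5316 | 0.6020 | 0.7364 | 0.3013 | 0.4153 |
| | | 250 | 0.7563 | 0.8212 | 0.6438 | 0.6947 | 0.6821 | 0.6737 | 0.6598 | 0.4516 | 0.5850 |
| | | 350 | 0.8082 | 0.8765 | 0.7720 | 0.8039 | 0.7933 | 0.7969 | 0.6947 | 0.5602 | 0.7547 |
| | β | 25 | 0.4217 | 0.4674 | 0.3992 | 0.4005 | 0.4065 | 0.4237 | 0.4895 | 0.3323 | 0.4021 |
| | | 50 | 0.4294 | 0.4580 | 0.4086 | 0.4158 | 0.4174 | 0.4258 | 0.4584 | 0.2995 | 0.3967 |
| | | 100 | 0.3903 | 0.4039 | 0.3947 | 0.3944 | 0.3926 | 0.4016 | 0.4371 | 0.3582 | 0.3858 |
| | | 250 | 0.4035 | 0.4115 | 0.3938 | 0.3975 | 0.3966 | 0.3974 | 0.4061 | 0.3673 | 0.3940 |
| | | 350 | 0.3949 | 0.4026 | 0.3907 | 0.3944 | 0.3931 | 0.3936 | 0.3904 | 0.3719 | 0.3899 |
| AB | α | 25 | 0.5584 | 0.6872 | 0.6047 | 0.5382 | 0.5453 | 0.6459 | 0.7676 | 0.6845 | 0.6637 |
| | | 50 | 0.5308 | 0.6270 | 0.5159 | 0.5405 | 0.4941 | 0.5491 | 0.9510 | 0.6712 | 0.6083 |
| | | 100 | 0.6628 | 0.6447 | 0.7083 | 0.6909 | 0.6867 | 0.6793 | 0.8618 | 0.5800 | 0.6805 |
| | | 250 | 0.2803 | 0.2719 | 0.3670 | 0.3164 | 0.3256 | 0.3616 | 0.4728 | 0.5443 | 0.4994 |
| | | 350 | 0.2584 | 0.2666 | 0.2306 | 0.2376 | 0.2389 | 0.2336 | 0.4586 | 0.4518 | 0.3332 |
| | β | 25 | 0.0701 | 0.1000 | 0.0807 | 0.0724 | 0.0686 | 0.0844 | 0.1327 | 0.2182 | 0.1001 |
| | | 50 | 0.0442 | 0.0643 | 0.0495 | 0.0435 | 0.0428 | 0.0580 | 0.1059 | 0.1275 | 0.0445 |
| | | 100 | 0.0504 | 0.0530 | 0.0493 | 0.0493 | 0.0490 | 0.0500 | 0.0657 | 0.0640 | 0.0480 |
| | | 250 | 0.0270 | 0.0286 | 0.0352 | 0.0306 | 0.0314 | 0.0358 | 0.0534 | 0.0557 | 0.0356 |
| | | 350 | 0.0226 | 0.0222 | 0.0243 | 0.0176 | 0.0192 | 0.0243 | 0.0520 | 0.0428 | 0.0268 |
| RMSE | α | 25 | 0.6832 | 0.8824 | 0.6642 | 0.6196 | 0.6373 | 0.7498 | 0.9374 | 0.7249 | 0.7684 |
| | | 50 | 0.6291 | 0.7570 | 0.6603 | 0.6831 | 0.5963 | 0.7164 | 1.4860 | 0.7176 | 0.6671 |
| | | 100 | 0.7322 | 0.7492 | 0.7848 | 0.7744 | 0.7611 | 0.7921 | 0.9537 | 0.6576 | 0.7420 |
| | | 250 | 0.3359 | 0.3366 | 0.4614 | 0.3988 | 0.4108 | 0.4615 | 0.5893 | 0.6260 | 0.5625 |
| | | 350 | 0.3129 | 0.3093 | 0.3154 | 0.3086 | 0.3098 | 0.3142 | 0.5602 | 0.5355 | 0.4107 |
| | β | 25 | 0.0910 | 0.1217 | 0.1029 | 0.0918 | 0.0880 | 0.1174 | 0.1684 | 0.2464 | 0.1214 |
| | | 50 | 0.0542 | 0.0782 | 0.0607 | 0.0559 | 0.0493 | 0.0712 | 0.1646 | 0.1592 | 0.0603 |
| | | 100 | 0.0612 | 0.0655 | 0.0627 | 0.0618 | 0.0606 | 0.0652 | 0.0875 | 0.0981 | 0.0604 |
| | | 250 | 0.0337 | 0.0362 | 0.0402 | 0.0364 | 0.0374 | 0.0411 | 0.0679 | 0.0696 | 0.0446 |
| | | 350 | 0.0259 | 0.0259 | 0.0293 | 0.0242 | 0.0249 | 0.0289 | 0.0619 | 0.0560 | 0.0337 |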
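Table 2. AE, AB, and RMSE for α = 4.5 and β = 6.2.

| Measure | Parameter | n | ML | MPS | OLS | WLS | AD | CVM | PE | MADS | MALDS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AE | α | 25 | 7.0765 | 10.3643 | 5.9141 | 5.8055 | 6.6186 | 7.5983 | 4.8574 | 1.2794 | 8.3329 |
| | | 50 | 5.0499 | 5.9801 | 4.8062 | 4.7651 | 4.7680 | 5.3690 | 4.1797 | 3.3758 | 5.4587 |
| | | 100 | 4.3862 | 4.8383 | 4.1504 | 4.2629 | 4.2891 | 4.3589 | 3.9500 | 3.6863 | 4.3552 |
| | | 250 | 4.3660 | 4.5560 | 4.2758 | 4.3155 | 4.3307 | 4.3597 | 4.1551 | 3.9716 | 4.4893 |
| | | 350 | 4.3334 | 4.4767 | 4.2076 | 4.2748 | 4.2766 | 4.2668 | 4.2163 | 4.1250 | 4.3294 |
| | β | 25 | 6.4914 | 7.3170 | 5.9496 | 5.9510 | 6.2163 | 6.5382 | 5.5927 | 3.3139 | 5.9368 |
| | | 50 | 6.1885 | 6.6336 | 5.9530 | 6.0059 | 6.0516 | 6.2226 | 5.7082 | 4.6987 | 6.1925 |
| | | 100 | 6.2534 | 6.5278 | 6.0770 | 6.1657 | 6.1849 | 6.2094 | 5.9914 | 5.5851 | 6.2811 |
| | | 250 | 6.1297 | 6.2481 | 6.0714 | 6.1025 | 6.1135 | 6.1240 | 6.0026 | 5.7696 | 6.1201 |
| | | 350 | 6.0608 | 6.1514 | 5.9857 | 6.0232 | 6.0258 | 6.0232 | 5.9824 | 5.8618 | 6.0932 |
| AB | α | 25 | 3.4127 | 5.9293 | 3.3920 | 3.1570 | 3.4268 | 4.2622 | 2.7449 | 3.2862 | 5.8446 |
| | | 50 | 1.8288 | 2.1741 | 2.1320 | 1.9167 | 1.7383 | 2.2757 | 1.7767 | 2.4817 | 2.5227 |
| | | 100 | 1.0012 | 0.9566 | 1.0738 | 1.0249 | 1.0781 | 1.0474 | 1.0521 | 1.5290 | 1.2026 |
| | | 250 | 0.8031 | 0.8054 | 0.8103 | 0.7709 | 0.7570 | 0.7912 | 0.8309 | 1.2029 | 1.0822 |
| | | 350 | 0.6395 | 0.6136 | 0.6138 | 0.6133 | 0.6086 | 0.6041 | 0.6972 | 0.8890 | 0.5945 |
| | β | 25 | 1.2038 | 1.4981 | 1.3240 | 1.2379 | 1.1823 | 1.3926 | 1.2174 | 2.9029 | 1.2698 |
| | | 50 | 0.9340 | 0.9660 | 1.0599 | 0.9933 | 0.9327 | 1.0433 | 1.0666 | 2.1079 | 1.2164 |
| | | 100 | 0.5449 | 0.5436 | 0.5723 | 0.5544 | 0.5715 | 0.5383 | 0.5769 | 0.9975 | 0.6254 |
| | | 250 | 0.4017 | 0.4156 | 0.4049 | 0.4016 | 0.3959 | 0.4026 | 0.4575 | 0.7574 | 0.6456 |
| | | 350 | 0.3707 | 0.3538 | 0.3835 | 0.3678 | 0.3652 | 0.3723 | 0.4190 | 0.5258 | 0.3588 |
| RMSE | α | 25 | 9.0289 | 16.6588 | 7.7515 | 7.0825 | 8.9366 | 10.7903 | 5.1325 | 3.5862 | 19.9363 |
| | | 50 | 3.1101 | 4.1306 | 3.7004 | 2.9429 | 2.7048 | 4.3787 | 2.2720 | 3.1047 | 4.0033 |
| | | 100 | 1.2746 | 1.4424 | 1.3415 | 1.3020 | 1.3645 | 1.3743 | 1.1958 | 2.0602 | 1.7619 |
| | | 250 | 1.0203 | 1.0631 | 1.0172 | 1.0052 | 0.9906 | 1.0217 | 1.0439 | 1.6323 | 1.3097 |
| | | 350 | 0.7575 | 0.7559 | 0.7539 | 0.7476 | 0.7376 | 0.7427 | 0.8050 | 1.2130 | 0.7278 |
| | β | 25 | 1.5369 | 2.0307 | 1.6441 | 1.5388 | 1.5357 | 1.7984 | 1.4325 | 3.3678 | 1.7998 |
| | | 50 | 1.2005 | 1.3372 | 1.3614 | 1.2314 | 1.1733 | 1.3964 | 1.2318 | 2.6988 | 1.5320 |
| | | 100 | 0.6942 | 0.7728 | 0.7270 | 0.6891 | 0.7131 | 0.7296 | 0.6689 | 1.5722 | 0.8371 |
| | | 250 | 0.5388 | 0.5432 | 0.5306 | 0.5343 | 0.5215 | 0.5232 | 0.5916 | 0.9666 | 0.7900 |
| | | 350 | 0.4264 | 0.4122 | 0.4673 | 0.4368 | 0.4343 | 0.4534 | 0.4743 | 0.6624 | 0.4570 |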
Table 3. Parameter estimates, standard errors, and goodness-of-fit tests. Standard errors are given in parentheses after the estimates, and p-values in parentheses after the test statistics.

| Model | Estimates (SE) | ℓ | AIC | DAIC | BIC | AD (p-value) | CVM (p-value) | K-S (p-value) |
|---|---|---|---|---|---|---|---|---|
| AP | α = 5.0250 (0.9841); β = 8.1856 (0.6324) | 194.5900 | −385.1756 | 0.0000 | −378.2227 | 0.3670 (0.8806) | 0.0461 (0.8999) | 0.0430 (0.7694) |
| AU | α = 2.5208 × 10⁻¹⁴ (0.0828) | 0.0000 | 2.0000 | 387.1756 | 5.4765 | 131.0700 (<0.0001) | 28.2090 (<0.0001) | 0.5572 (<0.0001) |
| Beta | α = 8.6671 (0.8063); β = 2.2859 (0.1962) | 191.8700 | −379.7345 | 5.4411 | −372.7816 | 0.8732 (0.4310) | 0.1402 (0.4213) | 0.0650 (0.2647) |
| Kumaraswamy | α = 6.6942 (0.4546); β = 2.4355 (0.2411) | 190.7600 | −377.5820 | 7.5936 | −370.5751 | 1.1438 (0.2899) | 0.1916 (0.2845) | 0.0723 (0.1646) |
| UBIII | α = 6.4356 (0.5341); β = 1.5532 (0.0695) | 192.5000 | −381.0031 | 4.1725 | −374.0501 | 0.7758 (0.4987) | 0.1191 (0.4996) | 0.0535 (0.4997) |
| BMOEE | α = 7.6885 (1.7248); β = 9.6771 (0.7554) | 192.4200 | −380.8355 | 4.3401 | −373.8825 | 0.6848 (0.5715) | 0.0866 (0.6551) | 0.0489 (0.6182) |
| UG | α = 1.0457 (0.2360); β = 2.3734 (0.3237) | 177.0300 | −350.0612 | 35.1144 | −343.1082 | 4.9419 (0.0031) | 0.7829 (0.0080) | 0.1106 (0.0058) |
| UW | α = 8.0560 (0.8314); β = 1.6182 (0.0791) | 192.0200 | −380.0314 | 5.1442 | −373.0785 | 0.8636 (0.4373) | 0.1328 (0.4467) | 0.0557 (0.4486) |
| ETL | α = 14.9326 (1.3241); β = 0.8641 (0.0718) | 192.6800 | −381.3601 | 3.8155 | −374.4072 | 0.6705 (0.5838) | 0.0996 (0.5873) | 0.0520 (0.5370) |
| UBXII | α = 10.0760 (1.0039); β = 1.7321 (0.0787) | 193.5000 | −383.0054 | 2.1702 | −376.0525 | 0.5806 (0.6664) | 0.0887 (0.6437) | 0.0522 (0.5321) |
| UISDL | α = 0.3571 (0.0134) | 54.2900 | −106.5865 | 278.5891 | −103.1101 | 34.4330 (<0.0001) | 20.1010 (<0.0001) | 0.2851 (<0.0001) |
| UL | α = 0.2424 (0.0112) | 97.6400 | −193.2741 | 191.9015 | −189.7976 | 20.1010 (<0.0001) | 4.0961 (<0.0001) | 0.2365 (<0.0001) |
| LXL | α = 4.2040 (0.2569) | 154.6800 | −307.3564 | 77.8192 | −303.8799 | 15.7970 (<0.0001) | 3.0033 (<0.0001) | 0.2010 (<0.0001) |
| UPW | α = 500.0000 (8.1076 × 10⁻⁶); β = 2.4183 (9.9309 × 10⁻²); λ = 0.0372 (3.5461 × 10⁻³) | 168.2600 | −330.5111 | 54.6645 | −320.0817 | 5.3084 (0.0021) | 0.8375 (0.0059) | 0.1152 (0.0035) |
Table 4. Posterior summaries of the parameters of the AP distribution.

| Parameter | Estimate | SE | SD | 2.50% | 50% | 97.50% | R̂ | Neff |
|---|---|---|---|---|---|---|---|---|
| α | 5.0600 | 0.0107 | 1.0150 | 3.3760 | 4.9540 | 7.3560 | 1.0010 | 5500 |
| β | 8.1600 | 0.0066 | 0.6300 | 6.9640 | 8.1490 | 9.4110 | 1.0010 | 6200 |
Table 5. Simulation results for the first scenario.

| Parameter (AP quantile regression) | n | AE | AB | RMSE | Parameter (AP modal regression) | AE | AB | RMSE |
|---|---|---|---|---|---|---|---|---|
| δ0 | 50 | 0.7659 | 0.2028 | 0.2533 | δ0 | 0.6495 | 0.5931 | 0.6372 |
| | 150 | 0.7870 | 0.1286 | 0.1586 | | 0.7551 | 0.5240 | 0.5771 |
| | 250 | 0.7837 | 0.1041 | 0.1304 | | 0.7015 | 0.4583 | 0.5226 |
| | 350 | 0.7953 | 0.0896 | 0.1104 | | 0.7526 | 0.4226 | 0.4880 |
| | 450 | 0.7990 | 0.0868 | 0.1071 | | 0.7674 | 0.3745 | 0.4419 |
| | 550 | 0.7990 | 0.0681 | 0.0844 | | 0.7668 | 0.3499 | 0.4195 |
| δ1 | 50 | 0.4010 | 0.3256 | 0.3983 | δ1 | 0.7202 | 0.6676 | 0.7959 |
| | 150 | 0.3266 | 0.1974 | 0.2407 | | 0.6208 | 0.5630 | 0.7027 |
| | 250 | 0.3308 | 0.1737 | 0.2122 | | 0.6470 | 0.5746 | 0.7074 |
| | 350 | 0.3119 | 0.1443 | 0.1742 | | 0.5695 | 0.5176 | 0.6518 |
| | 450 | 0.3012 | 0.1403 | 0.1711 | | 0.5439 | 0.4813 | 0.6098 |
| | 550 | 0.2951 | 0.1044 | 0.1309 | | 0.4965 | 0.4450 | 0.5669 |
| δ2 | 50 | 0.6015 | 0.0893 | 0.1157 | δ2 | 0.5921 | 0.3502 | 0.4263 |
| | 150 | 0.6045 | 0.0480 | 0.0614 | | 0.6143 | 0.2171 | 0.2787 |
| | 250 | 0.6057 | 0.0381 | 0.0469 | | 0.6090 | 0.1694 | 0.2232 |
| | 350 | 0.6006 | 0.0325 | 0.0410 | | 0.6183 | 0.1563 | 0.2020 |
| | 450 | 0.6001 | 0.0291 | 0.0371 | | 0.6174 | 0.1259 | 0.1659 |
| | 550 | 0.6017 | 0.0272 | 0.0336 | | 0.6187 | 0.1193 | 0.1569 |
| α | 50 | 1.8184 | 0.7279 | 0.8795 | φ | 1.6644 | 0.2465 | 0.2948 |
| | 150 | 1.6469 | 0.4111 | 0.5266 | | 1.5793 | 0.1477 | 0.1879 |
| | 250 | 1.5957 | 0.3058 | 0.3971 | | 1.5376 | 0.1026 | 0.1333 |
| | 350 | 1.5689 | 0.2526 | 0.3190 | | 1.5289 | 0.0840 | 0.1100 |
| | 450 | 1.5586 | 0.2227 | 0.2891 | | 1.5216 | 0.0721 | 0.0931 |
| | 550 | 1.5412 | 0.2047 | 0.2602 | | 1.5085 | 0.0693 | 0.0870 |
Table 6. Simulation results for the second scenario.

| Parameter (AP quantile regression) | n | AE | AB | RMSE | Parameter (AP modal regression) | AE | AB | RMSE |
|---|---|---|---|---|---|---|---|---|
| δ0 | 50 | 0.1667 | 0.1496 | 0.1906 | δ0 | 0.3746 | 0.3802 | 0.6027 |
| | 150 | 0.1484 | 0.1207 | 0.1502 | | 0.3336 | 0.3336 | 0.5220 |
| | 250 | 0.1136 | 0.0907 | 0.1097 | | 0.2376 | 0.2422 | 0.3747 |
| | 350 | 0.1171 | 0.0845 | 0.1021 | | 0.2302 | 0.2282 | 0.3570 |
| | 450 | 0.1164 | 0.0842 | 0.1028 | | 0.2165 | 0.2085 | 0.3172 |
| | 550 | 0.1122 | 0.0714 | 0.0856 | | 0.1841 | 0.1748 | 0.2572 |
| δ1 | 50 | 0.4049 | 0.3025 | 0.3523 | δ1 | 0.5759 | 0.5815 | 0.6773 |
| | 150 | 0.3681 | 0.1882 | 0.2312 | | 0.4728 | 0.4831 | 0.5746 |
| | 250 | 0.4042 | 0.1654 | 0.2011 | | 0.4892 | 0.4385 | 0.5127 |
| | 350 | 0.3862 | 0.1498 | 0.1808 | | 0.4187 | 0.3793 | 0.4540 |
| | 450 | 0.3912 | 0.1453 | 0.1771 | | 0.4457 | 0.3684 | 0.4586 |
| | 550 | 0.3730 | 0.1047 | 0.1324 | | 0.3974 | 0.3408 | 0.4147 |
| δ2 | 50 | 0.7935 | 0.1038 | 0.1363 | δ2 | 0.8970 | 0.3344 | 0.4124 |
| | 150 | 0.8057 | 0.0546 | 0.0699 | | 0.8773 | 0.2046 | 0.2720 |
| | 250 | 0.8013 | 0.0426 | 0.0519 | | 0.8651 | 0.1441 | 0.2004 |
| | 350 | 0.8008 | 0.0364 | 0.0457 | | 0.8471 | 0.1296 | 0.1734 |
| | 450 | 0.7987 | 0.0327 | 0.0414 | | 0.8440 | 0.1052 | 0.1468 |
| | 550 | 0.8050 | 0.0326 | 0.0394 | | 0.8339 | 0.1025 | 0.1397 |
| α | 50 | 1.2087 | 0.3183 | 0.4361 | φ | 1.4403 | 0.2164 | 0.2713 |
| | 150 | 1.2667 | 0.2281 | 0.2932 | | 1.3604 | 0.1242 | 0.1589 |
| | 250 | 1.2719 | 0.1967 | 0.2448 | | 1.3258 | 0.0870 | 0.1127 |
| | 350 | 1.2930 | 0.1702 | 0.2034 | | 1.3211 | 0.0700 | 0.0911 |
| | 450 | 1.2871 | 0.1632 | 0.1971 | | 1.3153 | 0.0609 | 0.0785 |
| | 550 | 1.2919 | 0.1546 | 0.1845 | | 1.3063 | 0.0588 | 0.0739 |
Table 7. Estimates, standard errors, and information criteria for the regression models.

| Parameter (AP quantile regression) | Estimate | Standard error | p-value | Parameter (AP modal regression) | Estimate | Standard error | p-value |
|---|---|---|---|---|---|---|---|
| δ0 | 1.0119 | 0.1226 | <0.0001 | δ0 | 0.8903 | 0.1715 | <0.0001 |
| δ1 | 0.0533 | 0.0912 | 0.5585 | δ1 | 0.0921 | 0.1235 | 0.4560 |
| δ2 | 0.2392 | 0.0940 | 0.0110 | δ2 | 0.3153 | 0.1559 | 0.0432 |
| δ3 | 0.0169 | 0.0049 | 0.0006 | δ3 | 0.0253 | 0.0082 | 0.0020 |
| α | 5.6100 | 1.1128 | <0.0001 | φ | 8.4244 | 0.6471 | <0.0001 |
| ℓ | 201.1400 | | | ℓ | 199.7300 | | |
| AIC | −392.2835 | | | AIC | −389.4540 | | |
| BIC | −374.9012 | | | BIC | −372.0717 | | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nasiru, S.; Abubakari, A.G.; Chesneau, C. The Arctan Power Distribution: Properties, Quantile and Modal Regressions with Applications to Biomedical Data. Math. Comput. Appl. 2023, 28, 25. https://doi.org/10.3390/mca28010025

AMA Style

Nasiru S, Abubakari AG, Chesneau C. The Arctan Power Distribution: Properties, Quantile and Modal Regressions with Applications to Biomedical Data. Mathematical and Computational Applications. 2023; 28(1):25. https://doi.org/10.3390/mca28010025

Chicago/Turabian Style

Nasiru, Suleman, Abdul Ghaniyyu Abubakari, and Christophe Chesneau. 2023. "The Arctan Power Distribution: Properties, Quantile and Modal Regressions with Applications to Biomedical Data" Mathematical and Computational Applications 28, no. 1: 25. https://doi.org/10.3390/mca28010025
