Article

A New Regression Model on the Unit Interval: Properties, Estimation, and Application

by Yury R. Benites 1,†, Vicente G. Cancho 1,†, Edwin M. M. Ortega 2,*,†, Roberto Vila 3,† and Gauss M. Cordeiro 4,†

1 Department of Applied Mathematics and Statistics, University of São Paulo, São Carlos 13566-590, Brazil
2 Department of Exact Sciences, University of São Paulo, Piracicaba 13418-900, Brazil
3 Department of Statistics, University of Brasilia, Brasilia 70910-900, Brazil
4 Department of Statistics, Federal University of Pernambuco, Recife 50670-901, Brazil
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2022, 10(17), 3198; https://doi.org/10.3390/math10173198
Submission received: 29 July 2022 / Revised: 29 August 2022 / Accepted: 1 September 2022 / Published: 4 September 2022
(This article belongs to the Section Probability and Statistics)

Abstract

A new and flexible distribution for modeling proportional data is introduced, based on the quantile function of the generalized extreme value distribution. We obtain explicit expressions for the moments, quantiles, and other structural properties. An extended regression model is constructed as a competitive alternative to the beta regression. Simulation studies are conducted from a Bayesian perspective, and an illustrative application to real data, involving model comparison and influence diagnostics, is also presented.

1. Introduction

In recent years, there has been considerable interest in formulating new families of distributions for modeling proportional data, for example, illiteracy and mortality rates, the proportion of eggs hatched in the production of cuttings, and the percentage of defective items. One of the most common distributions in this context is the beta distribution [1], which was also extended to a regression model [2]. Several extensions of the beta regression are discussed by Simas et al. [3], Ospina and Ferrari [4,5], Carrasco et al. [6], and Figueroa-Zúñiga et al. [7].
Other alternatives have appeared in the literature. For example, Qiu et al. [8] defined a simplex regression [9], Bayes et al. [10] introduced a parametric quantile regression based on the Kumaraswamy distribution [11], Lemonte and Bazán [12] proposed a regression based on an extended Johnson $S_B$ distribution [13], and Smithson and Shou [14] introduced a family by compounding the cumulative distribution function (cdf) of one model with the quantile function (qf) of another. Cancho et al. [15] constructed a regression model by extending the Johnson $S_B$ distribution with a shape parameter controlling the asymmetry.
Let $F_X(x)$ be the baseline cdf of a random variable (rv) $X$ with support $\mathbb{R}$, and consider the transformation
$$ Y = G\left(\frac{X-\gamma}{\delta}\right), \tag{1} $$
where $G(\cdot)$ is the cdf of an rv with support $\mathbb{R}$, $\gamma \in \mathbb{R}$, and $\delta > 0$. Then, $X = \gamma + \delta Q(Y)$ for $Y \in (0,1)$, where $Q(y) = G^{-1}(y)$. The probability density function (pdf) of $Y$ follows from (1) as
$$ f_Y(y) = \delta\, f_X\big(\gamma + \delta Q(y)\big)\,\frac{dQ(y)}{dy}, \tag{2} $$
where $f_X(x) = dF_X(x)/dx$.
Let $X \sim N(0,1)$ have the standard normal cdf $\Phi(x)$, and let $G(y) = 1/(1 + e^{-y})$ be the standard logistic cdf. Then the Johnson $S_B$ density follows from (2): for $y \in (0,1)$,
$$ f_Y(y;\gamma,\delta) = \frac{\delta\,\phi\big(\gamma + \delta Q(y)\big)}{y(1-y)}, \tag{3} $$
where $\phi(\cdot)$ is the standard normal density and
$$ Q(y) = \log\left(\frac{y}{1-y}\right) \tag{4} $$
is the logistic qf [14].
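As an illustration of how construction (2) produces the Johnson $S_B$ density (3), the following Python sketch plugs a standard normal baseline and the logistic qf (4) into a generic compounding function and checks numerically that the result integrates to one. The function names are hypothetical, and the midpoint-rule integration is only a rough check, not the authors' code.

```python
import math


def std_normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def logistic_qf(y):
    """Logistic quantile function Q(y) = log(y / (1 - y)), Eq. (4)."""
    return math.log(y / (1.0 - y))


def logistic_qf_deriv(y):
    """Derivative dQ/dy = 1 / (y (1 - y)) of the logistic qf."""
    return 1.0 / (y * (1.0 - y))


def compounded_pdf(y, gamma, delta, f_x, qf, qf_deriv):
    """Generic density (2): delta * f_X(gamma + delta * Q(y)) * dQ(y)/dy."""
    return delta * f_x(gamma + delta * qf(y)) * qf_deriv(y)


def johnson_sb_pdf(y, gamma, delta):
    """Johnson S_B density (3): normal baseline compounded with the logistic qf."""
    return compounded_pdf(y, gamma, delta, std_normal_pdf, logistic_qf, logistic_qf_deriv)


def midpoint_integral(f, a, b, n=20000):
    """Composite midpoint rule on (a, b); the endpoints are never evaluated."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


if __name__ == "__main__":
    # Should print a value very close to 1 for any gamma and delta > 0.
    print(midpoint_integral(lambda y: johnson_sb_pdf(y, 0.5, 1.2), 0.0, 1.0))
```

Because the logistic qf maps $(0,1)$ onto the whole real line, $\Phi(\gamma + \delta Q(y))$ runs from 0 to 1 and the printed integral should be essentially 1.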
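We define a new family by compounding a baseline standard normal with the qf of the generalized extreme value (GEV) distribution. Let $Z$ be an rv following a GEV distribution [16] with zero location and unit scale, whose cdf is
$$ G(z;\lambda) = \begin{cases} \exp\left\{-(1+\lambda z)^{-1/\lambda}\right\}, & \lambda \neq 0,\\ \exp\left(-e^{-z}\right), & \lambda = 0, \end{cases} \tag{5} $$
where $z \in [-1/\lambda, +\infty)$ for $\lambda > 0$, $z \in (-\infty, -1/\lambda]$ for $\lambda < 0$, and $z \in (-\infty, +\infty)$ for $\lambda = 0$.
The qf of $Z$ becomes
$$ Q(y;\lambda) = \begin{cases} \dfrac{[-\log(y)]^{-\lambda} - 1}{\lambda}, & \lambda > 0 \text{ and } y \in [0,1), \text{ or } \lambda < 0 \text{ and } y \in (0,1],\\[1.5ex] -\log[-\log(y)], & \lambda = 0 \text{ and } y \in (0,1). \end{cases} \tag{6} $$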
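The GEV cdf (5) and qf (6) translate directly into code. The sketch below is a hedged Python transcription with our own function names; outside the GEV support it returns the limiting cdf values 0 or 1, which is a standard convention rather than something stated in the text.

```python
import math


def gev_cdf(z, lam):
    """Standard GEV cdf G(z; lambda) of Eq. (5) (zero location, unit scale).

    Outside the support (1 + lam * z <= 0) the cdf equals 0 when lam > 0
    (below the lower endpoint -1/lam) and 1 when lam < 0 (above the upper endpoint).
    """
    if lam == 0.0:
        return math.exp(-math.exp(-z))
    base = 1.0 + lam * z
    if base <= 0.0:
        return 0.0 if lam > 0.0 else 1.0
    return math.exp(-base ** (-1.0 / lam))


def gev_qf(y, lam):
    """GEV quantile function Q(y; lambda) of Eq. (6), for 0 < y < 1."""
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie in (0, 1)")
    t = -math.log(y)  # t > 0
    if lam == 0.0:
        return -math.log(t)
    return (t ** (-lam) - 1.0) / lam


if __name__ == "__main__":
    # Round trip: G(Q(y)) recovers y for each shape value.
    for lam in (-0.5, 0.0, 0.4):
        print(lam, gev_cdf(gev_qf(0.3, lam), lam))
```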
This article is organized into six sections. Section 2 defines a new model, called the normal-generalized extreme value (Normal-GEV) distribution, and provides some of its structural properties. Section 3 constructs a new regression model based on this distribution and describes the Bayesian inferential procedures. Section 4 reports simulations under different scenarios to study the behavior of the estimators. An application to colorectal cancer data in Section 5 illustrates the usefulness of the proposed regression. Section 6 offers some conclusions.

2. The Normal-GEV Distribution

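The pdf of $Y \sim \text{Normal-GEV}(\gamma,\delta,\lambda)$ follows from Equation (2) with $f_X = \phi$ and $Q(\cdot)$ replaced by the GEV qf (6):
$$ f_Y(y;\gamma,\delta,\lambda) = \frac{\delta\,\phi\big(\gamma + \delta Q(y;\lambda)\big)}{y\,[-\log(y)]^{\lambda+1}}, \quad y \in (0,1), \tag{7} $$
since $dQ(y;\lambda)/dy = 1/\{y\,[-\log(y)]^{\lambda+1}\}$. Note that this pdf covers both cases of Equation (6), i.e., $\lambda \neq 0$ and $\lambda = 0$.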
Proposition 1.
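The following limits hold:
$$ \lim_{y\to 0^+} f_Y(y;\gamma,\delta,\lambda) = \begin{cases} 0, & \lambda < 0,\\ \infty, & \lambda \geq 0, \end{cases} \qquad\text{and}\qquad \lim_{y\to 1^-} f_Y(y;\gamma,\delta,\lambda) = \begin{cases} \delta\,\phi(\gamma+\delta), & \lambda = -1,\\ 0, & \lambda \neq -1. \end{cases} $$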
Proof. 
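Setting $x = -\log(y)$, the Normal-GEV pdf (7) can be expressed as
$$ f_Y(y;\gamma,\delta,\lambda) = \frac{\delta}{\sqrt{2\pi}} \times \begin{cases} \exp\left\{-\dfrac{1}{2}\left[\gamma + \dfrac{\delta}{\lambda}\left(x^{-\lambda} - 1\right)\right]^2\right\} \dfrac{\exp(x)}{x^{\lambda+1}}, & \lambda \neq 0,\\[2ex] \exp\left\{-\dfrac{1}{2}\left[\gamma - \delta\log(x)\right]^2\right\} \dfrac{\exp(x)}{x}, & \lambda = 0. \end{cases} \tag{8} $$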
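By using the well-known inequality
$$ \exp(x) \geq 1 + x, \quad x \in \mathbb{R}, \tag{9} $$
the right-hand side of (8) is bounded below by
$$ \frac{\delta}{\sqrt{2\pi}} \times \begin{cases} \left\{1 - \dfrac{1}{2}\left[\gamma + \dfrac{\delta}{\lambda}\left(x^{-\lambda} - 1\right)\right]^2\right\} \dfrac{\exp(x)}{x^{\lambda+1}}, & \lambda \neq 0,\\[2ex] \left\{1 - \dfrac{1}{2}\left[\gamma - \delta\log(x)\right]^2\right\} \dfrac{\exp(x)}{x}, & \lambda = 0, \end{cases} \;=\; g(x;\gamma,\delta,\lambda). $$
That is, $f_Y(y;\gamma,\delta,\lambda) \geq g(x;\gamma,\delta,\lambda)$. We have $\lim_{x\to\infty} g(x;\gamma,\delta,\lambda) = \infty$ for $\lambda \geq 0$, since the exponential grows faster than the polynomial. Then, since $y \to 0^+ \iff x \to \infty$, the squeeze (or sandwich) theorem gives $\infty = \lim_{x\to\infty} g(x;\gamma,\delta,\lambda) \leq \lim_{y\to 0^+} f_Y(y;\gamma,\delta,\lambda)$. This proves that $\lim_{y\to 0^+} f_Y(y;\gamma,\delta,\lambda) = \infty$ for $\lambda \geq 0$.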
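Again, by using (8) and inequality (9), we have
$$ f_Y(y;\gamma,\delta,\lambda) \leq \frac{\delta}{\sqrt{2\pi}} \times \begin{cases} \dfrac{1}{(1-x)\,x^{\lambda+1}\exp\left\{\frac{1}{2}\left[\gamma + \frac{\delta}{\lambda}\left(x^{-\lambda}-1\right)\right]^2\right\}}, & \lambda \neq 0,\\[2ex] \dfrac{1}{(1-x)\,x\,\exp\left\{\frac{1}{2}\left[\gamma - \delta\log(x)\right]^2\right\}}, & \lambda = 0, \end{cases} \;=\; h(x;\gamma,\delta,\lambda). $$
For $\lambda < 0$, from the inequality above and the rapid growth of the exponential compared with polynomials, we have $\lim_{y\to 0^+} f_Y(y;\gamma,\delta,\lambda) \leq \lim_{x\to\infty} h(x;\gamma,\delta,\lambda) = 0$. Therefore, $\lim_{y\to 0^+} f_Y(y;\gamma,\delta,\lambda) = 0$ for $\lambda < 0$.
On the other hand, note that $y \to 1^- \iff x \to 0^+$. Again, from the inequality $f_Y(y;\gamma,\delta,\lambda) \leq h(x;\gamma,\delta,\lambda)$ and the rapid growth of the exponential, for $\lambda \neq -1$ we get $\lim_{y\to 1^-} f_Y(y;\gamma,\delta,\lambda) \leq \lim_{x\to 0^+} h(x;\gamma,\delta,\lambda) = 0$. Hence, $\lim_{y\to 1^-} f_Y(y;\gamma,\delta,\lambda) = 0$.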
Finally, if $\lambda = -1$, then $f_Y$ in (7) becomes
$$ f_Y(y;\gamma,\delta,-1) = \frac{\delta\,\phi\big(\gamma + \delta[\log(y) + 1]\big)}{y}, \quad y \in (0,1). $$
Hence, by the continuity of $\phi$, $\lim_{y\to 1^-} f_Y(y;\gamma,\delta,-1) = \delta\,\phi(\gamma+\delta)$. □
The proof of the next result is immediate and hence omitted.
Proposition 2.
Let $Y \sim \text{Normal-GEV}(\gamma,\delta,\lambda)$ and $T_{\gamma,\delta,\lambda}(y) = \gamma + \delta Q(y;\lambda)$. The cdf of $Y$ is
$$ F_Y(y) = \Phi\big(T_{\gamma,\delta,\lambda}(y)\big), \quad y \in (0,1), $$
where $Q(y;\lambda)$ is given in (6).
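A direct way to check that the pdf (7) and the cdf of Proposition 2 agree is to compare the pdf with a finite-difference derivative of the cdf. The Python sketch below does this with the standard library's `statistics.NormalDist` for $\phi$ and $\Phi$; the helper names and parameter values are illustrative assumptions, not taken from the paper.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # provides .pdf (phi) and .cdf (Phi)


def gev_qf(y, lam):
    """GEV quantile function Q(y; lambda), Eq. (6)."""
    t = -math.log(y)
    return -math.log(t) if lam == 0.0 else (t ** (-lam) - 1.0) / lam


def normal_gev_pdf(y, gamma, delta, lam):
    """Normal-GEV density (7) for 0 < y < 1."""
    t = -math.log(y)
    return delta * STD_NORMAL.pdf(gamma + delta * gev_qf(y, lam)) / (y * t ** (lam + 1.0))


def normal_gev_cdf(y, gamma, delta, lam):
    """Normal-GEV cdf of Proposition 2: Phi(gamma + delta * Q(y; lambda))."""
    return STD_NORMAL.cdf(gamma + delta * gev_qf(y, lam))


def numerical_derivative(f, y, h=1e-6):
    """Central finite difference (f(y + h) - f(y - h)) / (2h)."""
    return (f(y + h) - f(y - h)) / (2.0 * h)


if __name__ == "__main__":
    gamma, delta, lam = 0.5, 1.5, 0.4
    y = 0.5
    print(normal_gev_pdf(y, gamma, delta, lam),
          numerical_derivative(lambda u: normal_gev_cdf(u, gamma, delta, lam), y))
```

The two printed numbers should agree to several decimal places, since (7) is exactly $d\Phi(T_{\gamma,\delta,\lambda}(y))/dy$.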

2.1. Behavior of the Normal-GEV Distribution

In this subsection, some distributional properties such as unimodality and monotonicity of the Normal-GEV pdf are analyzed.
To determine the number of modes of a pdf $f$, we need to locate its critical points. By definition, a critical point of $f$ is a point in its domain at which the derivative is zero or does not exist.
Proposition 3.
All critical points $y$ of the pdf (7) satisfy
$$ [\lambda + 1 + \log(y)]\,[-\log(y)]^{\lambda} - \delta\,[\gamma + \delta Q(y;\lambda)] = 0, $$
where $Q(y;\lambda)$ is given in (6).
Proof. 
Adopting the notation of Proposition 2, write $f_Y(y) = f_Y(y;\gamma,\delta,\lambda)$ and $T(y) = T_{\gamma,\delta,\lambda}(y) = \gamma + \delta Q(y;\lambda)$, so that (primes denote derivatives) $f_Y(y) = \phi(T(y))\,T'(y)$. Differentiating $f_Y(y)$ with respect to $y$ and using $\phi'(t) = -t\,\phi(t)$ gives
$$ f_Y'(y) = \phi(T(y))\left\{T''(y) - T(y)\,[T'(y)]^2\right\}, \tag{10} $$
where
$$ T'(y) = \frac{\delta\,[-\log(y)]^{-(\lambda+1)}}{y} \quad\text{and}\quad T''(y) = \frac{T'(y)\,[-\log(y)]^{-1}\,[\lambda + 1 + \log(y)]}{y}. \tag{11} $$
By combining (10) and (11), we obtain
$$ f_Y'(y) = \frac{f_Y(y)\,[-\log(y)]^{-(\lambda+1)}}{y}\left\{[\lambda + 1 + \log(y)]\,[-\log(y)]^{\lambda} - \delta\,[\gamma + \delta Q(y;\lambda)]\right\}. $$
Since $f_Y(y) > 0$ on $(0,1)$, $f_Y'(y) = 0$ if and only if the expression in braces vanishes, which completes the proof. □
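Proposition 3 turns mode finding into a one-dimensional root-finding problem. The following sketch, with hypothetical helper names, brackets a root of the left-hand side of the critical-point equation by bisection for an illustrative unimodal case ($\lambda = -2$, $\gamma = -1$, $\delta = 1$, which satisfies $\gamma < \delta/\lambda$ in Theorem 1) and checks that a numerical derivative of the pdf vanishes there.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def gev_qf(y, lam):
    """GEV quantile function Q(y; lambda), Eq. (6)."""
    t = -math.log(y)
    return -math.log(t) if lam == 0.0 else (t ** (-lam) - 1.0) / lam


def normal_gev_pdf(y, gamma, delta, lam):
    """Normal-GEV density (7)."""
    t = -math.log(y)
    return delta * PHI.pdf(gamma + delta * gev_qf(y, lam)) / (y * t ** (lam + 1.0))


def critical_eq(y, gamma, delta, lam):
    """Left-hand side of the critical-point equation in Proposition 3."""
    t = -math.log(y)  # t = -log(y) > 0
    return (lam + 1.0 + math.log(y)) * t ** lam - delta * (gamma + delta * gev_qf(y, lam))


def bisect_root(g, lo, hi, iters=200):
    """Bisection; requires g(lo) and g(hi) to have opposite signs."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0.0:
        raise ValueError("no sign change on the bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0.0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)


def numerical_derivative(f, y, h=1e-6):
    """Central finite difference approximation of f'(y)."""
    return (f(y + h) - f(y - h)) / (2.0 * h)


if __name__ == "__main__":
    gamma, delta, lam = -1.0, 1.0, -2.0  # gamma < delta / lam, as in Theorem 1
    y0 = bisect_root(lambda y: critical_eq(y, gamma, delta, lam), 1e-6, 1.0 - 1e-6)
    print(y0, numerical_derivative(lambda y: normal_gev_pdf(y, gamma, delta, lam), y0))
```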
Theorems 1 and 2 show that λ governs the shape of the new distribution.
Theorem 1.
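If $Y \sim \text{Normal-GEV}(\gamma,\delta,\lambda)$ with $\lambda \neq 0$, the pdf of $Y$ is:
  • Decreasing-increasing-decreasing (DID) or decreasing (D) whenever $\lambda \in \mathbb{N}$.
  • Unimodal whenever $\lambda \in \mathbb{Z} \setminus (\mathbb{N} \cup \{-1, 0\})$ and $\gamma < \delta/\lambda$.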
Proof. 
By substituting $Q(y;\lambda)$ with $\lambda \neq 0$ into the equation of Proposition 3 and multiplying by $x^{\lambda}$, all critical points $y$ of the pdf of $Y$ satisfy
$$ p(x) = -x^{2\lambda+1} + (\lambda+1)\,x^{2\lambda} + \delta\left(\frac{\delta}{\lambda} - \gamma\right)x^{\lambda} - \frac{\delta^2}{\lambda} = 0, \quad \text{with } x = -\log(y). $$
If $\lambda \in \mathbb{N}$, then $p(x)$ is a polynomial of degree $2\lambda + 1$. By Descartes' rule of signs [17], the coefficient sequence of $p(x)$ has two sign changes (regardless of the sign of $\delta/\lambda - \gamma$), so $p(x)$ has either two or zero positive roots. If $p(x)$ has two positive roots $x_1$ and $x_2$, the pdf of $Y$ has two critical points $y_1 = \exp(-x_1)$ and $y_2 = \exp(-x_2)$ in $(0,1)$. Otherwise, it has no critical points in $(0,1)$. Finally, since $\lim_{y\to 0^+} f_Y(y) = \infty$ and $\lim_{y\to 1^-} f_Y(y) = 0$, the statement of Item 1 follows.
If $\lambda \in \mathbb{Z} \setminus (\mathbb{N} \cup \{-1, 0\})$, $p(x)$ can be written in terms of $w = x^{-1} = [-\log(y)]^{-1}$ as
$$ q(w) = -w^{-2\lambda-1} + (\lambda+1)\,w^{-2\lambda} + \delta\left(\frac{\delta}{\lambda} - \gamma\right)w^{-\lambda} - \frac{\delta^2}{\lambda} = 0. $$
In this case, $q(w)$ is a polynomial of degree $-2\lambda$. Again, by Descartes' rule of signs, the coefficient sequence of $q(w)$ has only one sign change, so this polynomial has exactly one positive root, say $w_0$. Then, the pdf of $Y$ has only one critical point $y_0 = \exp(-w_0^{-1})$ in $(0,1)$. Since $\lim_{y\to 0^+} f_Y(y) = 0$ and $\lim_{y\to 1^-} f_Y(y) = \delta\,\phi(\gamma+\delta)\,\delta_{\lambda,-1} = 0$ (here $\lambda \leq -2$), where $\delta_{i,j}$ is the Kronecker delta, the unimodality stated in Item 2 follows. □
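Descartes' rule of signs only requires counting sign changes in a coefficient sequence, which is easy to automate. The sketch below lists the coefficients of $p(x)$ (positive integer $\lambda$) and $q(w)$ (integer $\lambda \leq -2$) by decreasing power, as in the proof, and counts the sign changes; the helper names are ours.

```python
def p_coefficients(gamma, delta, lam):
    """Coefficients of p(x) in Theorem 1 by decreasing power of x.

    For a positive integer lam the powers are 2*lam + 1, 2*lam, lam and 0.
    """
    return [-1.0, lam + 1.0, delta * (delta / lam - gamma), -delta ** 2 / lam]


def q_coefficients(gamma, delta, lam):
    """Coefficients of q(w) = p(1/w) by decreasing power of w, for integer lam <= -2.

    The powers are -2*lam, -2*lam - 1, -lam and 0.
    """
    return [lam + 1.0, -1.0, delta * (delta / lam - gamma), -delta ** 2 / lam]


def sign_changes(coefs):
    """Number of sign changes in a coefficient sequence, ignoring zeros.

    By Descartes' rule, the number of positive roots equals this count
    or is smaller than it by an even number.
    """
    signs = [c > 0 for c in coefs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


if __name__ == "__main__":
    print(sign_changes(p_coefficients(0.0, 1.5, 2)))     # positive integer lambda
    print(sign_changes(q_coefficients(-1.25, 1.5, -2)))  # lambda = -2, gamma < delta/lambda
```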
Theorem 2 gives explicit critical points of the Normal-GEV pdf for $\lambda = 0$, under a constraint on the parameters $\gamma$ and $\delta$. Further, it shows that the pdf is monotone on three disjoint intervals.
Theorem 2.
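If $Y \sim \text{Normal-GEV}(\gamma,\delta,\lambda)$ and $\lambda = 0$, the pdf of $Y$ is decreasing-increasing-decreasing (DID) whenever
$$ \gamma < [2\log(\delta) - 1]\,\delta + \frac{1}{\delta}. \tag{12} $$
Moreover,
$$ y_1 = \exp\left[\delta^2\, W_{-1}\!\left(-\frac{\exp\{(\gamma\delta - 1)/\delta^2\}}{\delta^2}\right)\right] \quad\text{and}\quad y_2 = \exp\left[\delta^2\, W_{0}\!\left(-\frac{\exp\{(\gamma\delta - 1)/\delta^2\}}{\delta^2}\right)\right] \tag{13} $$
are the minimum and maximum points of the pdf of $Y$, respectively, where $W_k(\cdot)$, $k \in \{0, -1\}$, denotes the $k$th real branch of the Lambert W function.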
Proof. 
By substituting $Q(y;\lambda)$ with $\lambda = 0$ into the equation of Proposition 3, all critical points $y$ of the pdf of $Y$ satisfy
$$ 1 - x + \delta^2\log(x) - \delta\gamma = 0, \quad \text{with } x = -\log(y). $$
Equivalently,
$$ -\frac{x}{\delta^2}\exp\left(-\frac{x}{\delta^2}\right) = -\frac{\exp\{(\gamma\delta - 1)/\delta^2\}}{\delta^2}. $$
Since the function $t \mapsto t\exp(t)$ has a global minimum $-1/e$ at $t = -1$, the above equation can be solved for $-x/\delta^2$ only if $z = -\exp\{(\gamma\delta-1)/\delta^2\}/\delta^2 \geq -1/e$, and it has two distinct solutions, $-x/\delta^2 = W_{-1}(z)$ and $-x/\delta^2 = W_0(z)$, when $-1/e < z < 0$. Since $z < 0$ always, and condition (12) is equivalent to $z > -1/e$, the two values $y_1$ and $y_2$ in (13) follow from $y = \exp(-x) = \exp[\delta^2 W_k(z)]$. Finally, $y_1$ and $y_2$ are the minimum and maximum points of the pdf of $Y$, respectively, because $\lim_{y\to 0^+} f_Y(y) = \infty$, $\lim_{y\to 1^-} f_Y(y) = 0$, and $0 < y_1 < y_2 < 1$. □
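Python's standard library has no Lambert W function (SciPy offers `scipy.special.lambertw`), so the sketch below solves $w e^w = z$ by bisection on each real branch and then evaluates (13). It is a minimal illustration that assumes $-1/e < z < 0$, i.e., condition (12); the function names are hypothetical.

```python
import math


def lambert_w_real(z, branch):
    """Real Lambert W for -1/e < z < 0, by bisection on w * exp(w) = z.

    branch=0 gives W_0(z) in (-1, 0); branch=-1 gives W_{-1}(z) < -1.
    """
    if not -1.0 / math.e < z < 0.0:
        raise ValueError("this sketch only handles -1/e < z < 0")
    if branch == 0:
        lo, hi = -1.0, 0.0  # w * exp(w) increases from -1/e to 0 on [-1, 0]
    elif branch == -1:
        lo, hi = -2.0, -1.0  # w * exp(w) decreases on (-inf, -1]
        while lo * math.exp(lo) <= z:  # move lo left until lo * exp(lo) > z
            lo *= 2.0
    else:
        raise ValueError("branch must be 0 or -1")

    def f(w):
        return w * math.exp(w) - z

    f_lo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def gev0_critical_points(gamma, delta):
    """Minimum y1 and maximum y2 of the Normal-GEV pdf for lambda = 0, Eq. (13)."""
    z = -math.exp((gamma * delta - 1.0) / delta ** 2) / delta ** 2
    y1 = math.exp(delta ** 2 * lambert_w_real(z, -1))
    y2 = math.exp(delta ** 2 * lambert_w_real(z, 0))
    return y1, y2


if __name__ == "__main__":
    # gamma = 0, delta = 2 satisfies condition (12): 0 < (2 log 2 - 1) * 2 + 1/2.
    print(gev0_critical_points(0.0, 2.0))
```

A convenient check is that $x = -\log(y)$ at each returned point satisfies $1 - x + \delta^2\log(x) - \delta\gamma = 0$; for $\gamma = 0$ the maximum is always at $x = 1$, i.e., $y_2 = e^{-1}$.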
Figures 1 and 2 reveal different types of asymmetry. For negative values of $\lambda$, the pdf is positively skewed (and unimodal), whereas for positive values of $\lambda$ it is decreasing-increasing-decreasing.
Figure 3 shows that the Normal-GEV pdf can also be strictly decreasing.

2.2. Related Distributions

Proposition 4.
If $Y \sim \text{Normal-GEV}(\gamma,\delta,\lambda)$, then, for any $a > 0$ and $b \in \mathbb{R}$:
  • Normal distribution: $a\delta Q(Y;\lambda) + a\gamma + b \sim N(b, a^2)$.
  • Log-normal distribution: $\exp[a\delta Q(Y;\lambda) + a\gamma + b] \sim \text{Log-normal}(b, a^2)$.
  • Folded normal distribution: $|a\delta Q(Y;\lambda) + a\gamma + b| \sim N_f(b, a^2)$.
  • $\chi$ distribution with one degree of freedom (df): $|a\delta Q(Y;\lambda) + a\gamma|/a \sim \chi_1$.
  • Noncentral $\chi^2$ distribution: $[a\delta Q(Y;\lambda) + a\gamma + b]^2/a^2 \sim \chi^2_1(b^2/a^2)$.
  • Lévy distribution: $[a\delta Q(Y;\lambda) + a\gamma]^{-2} \sim \text{Lévy}(0, a^{-2})$.
Proof. 
The proof follows from well-known properties of the normal distribution together with Equation (1), since $\gamma + \delta Q(Y;\lambda) \sim N(0,1)$ by Proposition 2. Hence, details are omitted. □
Proposition 5.
  • $\chi^2$ distribution with $n$ df: If $Y_k \sim \text{Normal-GEV}(\gamma_k,\delta_k,\lambda_k)$, $k = 1,\ldots,n$, are independent rvs, then
    $[\delta_1 Q(Y_1;\lambda_1) + \gamma_1]^2 + \cdots + [\delta_n Q(Y_n;\lambda_n) + \gamma_n]^2 \sim \chi^2_n$.
  • Student $t$-distribution with $n-1$ df: If $Y_k \sim \text{Normal-GEV}(\gamma_k,\delta_k,\lambda_k)$, $k = 1,\ldots,n$, are independent rvs, then
    $$ \frac{\bar X}{\sqrt{\frac{1}{n(n-1)}\left[(\delta_1 Q(Y_1;\lambda_1) + \gamma_1 - \bar X)^2 + \cdots + (\delta_n Q(Y_n;\lambda_n) + \gamma_n - \bar X)^2\right]}} \sim t_{n-1}, $$
    where $\bar X = \{[\delta_1 Q(Y_1;\lambda_1) + \cdots + \delta_n Q(Y_n;\lambda_n)] + (\gamma_1 + \cdots + \gamma_n)\}/n$.
  • $F$-distribution with $(n, m)$ df: If $Y_k \sim \text{Normal-GEV}(\gamma_k,\delta_k,\lambda_k)$, $k = 1,\ldots,n$, and $Y'_j \sim \text{Normal-GEV}(\gamma'_j,\delta'_j,\lambda'_j)$, $j = 1,\ldots,m$, are independent rvs, then
    $$ \frac{\{[\delta_1 Q(Y_1;\lambda_1) + \gamma_1]^2 + \cdots + [\delta_n Q(Y_n;\lambda_n) + \gamma_n]^2\}/n}{\{[\delta'_1 Q(Y'_1;\lambda'_1) + \gamma'_1]^2 + \cdots + [\delta'_m Q(Y'_m;\lambda'_m) + \gamma'_m]^2\}/m} \sim F_{n,m}. $$
Proof. 
The proof follows by combining known properties of the normal distribution with Equation (1). □

2.3. The New Model as a Limit Distribution

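Let $\bar X$ be the sample mean of $n$ observations drawn from a population with mean $\mu$ and standard deviation $\sigma \in (0,\infty)$. The central limit theorem yields the well-known result
$$ \frac{\bar X - \mu}{\sigma/\sqrt{n}} \xrightarrow{D} Z \sim N(0,1), $$
where $\xrightarrow{D}$ denotes convergence in distribution. Let $T_{\gamma,\delta,\lambda}$ be the transformation defined in Proposition 2. Since $T^{-1}_{\gamma,\delta,\lambda}$ is a continuous map, the continuous mapping theorem gives
$$ T^{-1}_{\gamma,\delta,\lambda}\left(\frac{\bar X - \mu}{\sigma/\sqrt{n}}\right) \xrightarrow{D} Y := T^{-1}_{\gamma,\delta,\lambda}(Z). $$
From Equation (1), $Y = T^{-1}_{\gamma,\delta,\lambda}(Z) \sim \text{Normal-GEV}(\gamma,\delta,\lambda)$, where $T^{-1}_{\gamma,\delta,\lambda}(z) = G((z-\gamma)/\delta;\lambda)$ and $G(\cdot)$ is the GEV cdf given in (5).
Then, for $\lambda \neq 0$,
$$ \exp\left\{-\left[1 + \frac{\lambda}{\delta}\left(\frac{\bar X - \mu}{\sigma/\sqrt{n}} - \gamma\right)\right]^{-1/\lambda}\right\} \xrightarrow{D} Y \sim \text{Normal-GEV}(\gamma,\delta,\lambda); $$
and, for $\lambda = 0$,
$$ \exp\left\{-\exp\left[-\frac{1}{\delta}\left(\frac{\bar X - \mu}{\sigma/\sqrt{n}} - \gamma\right)\right]\right\} \xrightarrow{D} Y \sim \text{Normal-GEV}(\gamma,\delta,0). $$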

2.4. Moments, Quantile and Other Measures

Proposition 6.
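The $r$th moment of $Y \sim \text{Normal-GEV}(\gamma,\delta,\lambda)$ is (whenever it exists)
$$ E[Y^r] = M_{\varphi_\lambda(Z)}(-r), \quad Z \sim N(0,1), \tag{14} $$
where
$$ \varphi_\lambda(z) = \begin{cases} \left(1 - \dfrac{\lambda\gamma}{\delta} + \dfrac{\lambda}{\delta}z\right)^{-1/\lambda}, & \lambda \neq 0,\\[1.5ex] \exp\left(\dfrac{\gamma - z}{\delta}\right), & \lambda = 0, \end{cases} $$
and $M_X(\cdot)$ is the moment generating function of $X$. For example, for $\lambda = -1$, $E[Y^r] = \exp\left(-\dfrac{r\gamma}{\delta} + \dfrac{r^2}{2\delta^2} - r\right)$.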
Proof. 
The proof is immediate since Y follows Equation (1) with G ( · ) given in (5). □
The moments of $Y$ are finite since its support is bounded, and the integral in (14) can be computed numerically with software such as R, Mathematica, and Maple. Specifically, we calculate the mean and variance with the integrate function of R, which is based on adaptive quadrature of functions of one variable over a finite interval; for more details, see Piessens et al. [18].
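As a rough Python analogue of the numerical integration described above (the paper used R's integrate; a plain midpoint rule is used here instead), the sketch below computes $E[Y^r]$ in two ways for $\lambda = 0$: as $\int_0^1 y_q^r\,dq$ with the quantile function given in Proposition 7, and as the normal expectation in (14). The parameter values are illustrative, and $\delta$ is assumed not too small so that the exponentials stay within floating-point range.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def quantile_lambda0(q, gamma, delta):
    """Proposition 7 with lambda = 0: y_q = exp(-exp(-(x_q - gamma) / delta))."""
    x_q = PHI.inv_cdf(q)
    return math.exp(-math.exp(-(x_q - gamma) / delta))


def moment_via_quantiles(r, gamma, delta, n=20000):
    """E[Y^r] as the integral over q in (0, 1) of y_q^r (midpoint rule)."""
    return sum(quantile_lambda0((k + 0.5) / n, gamma, delta) ** r for k in range(n)) / n


def moment_via_mgf(r, gamma, delta, n=20000, zmax=10.0):
    """E[Y^r] = E[exp(-r * phi_0(Z))], phi_0(z) = exp((gamma - z) / delta), as in (14).

    The N(0, 1) expectation is approximated by the midpoint rule on [-zmax, zmax].
    """
    h = 2.0 * zmax / n
    total = 0.0
    for k in range(n):
        z = -zmax + (k + 0.5) * h
        total += math.exp(-r * math.exp((gamma - z) / delta)) * PHI.pdf(z)
    return total * h


if __name__ == "__main__":
    gamma, delta = 0.3, 1.2
    m1 = moment_via_mgf(1, gamma, delta)
    m2 = moment_via_mgf(2, gamma, delta)
    print("mean", m1, "variance", m2 - m1 ** 2)
    print("check", moment_via_quantiles(1, gamma, delta))
```

Agreement between the two routes is a cheap consistency check on both (14) and Proposition 7; an adaptive routine such as R's integrate would normally need far fewer function evaluations.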
Table 1 reports the mean and variance of $Y$ for some parameter values, obtained numerically with the integrate function of R. The second and third columns show that the mean and variance do not change for different values of $\lambda$. However, the variance changes for different values of $\gamma$ when the other parameters are fixed. Thus, the parameter $\gamma$ is responsible for the location of the model and the parameter $\delta$ for the dispersion.
Proposition 7.
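Let $Y \sim \text{Normal-GEV}(\gamma,\delta,\lambda)$. Then, the $q$th quantile of $Y$ is
$$ y_q = \begin{cases} \exp\left\{-\left[1 + \dfrac{\lambda}{\delta}(x_q - \gamma)\right]^{-1/\lambda}\right\}, & \lambda \neq 0,\\[1.5ex] \exp\left\{-\exp\left(-\dfrac{x_q - \gamma}{\delta}\right)\right\}, & \lambda = 0, \end{cases} $$
where $x_q$ is the $q$th standard normal quantile ($0 < q < 1$).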
Proof. 
The proof is immediate and hence omitted. □
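The median $\nu$ of the Normal-GEV distribution follows from Proposition 7 with $q = 0.5$:
$$ \nu = \begin{cases} \exp\left\{-\left[1 - \dfrac{\lambda\gamma}{\delta}\right]^{-1/\lambda}\right\}, & \lambda \neq 0,\\[1.5ex] \exp\left\{-\exp\left(\dfrac{\gamma}{\delta}\right)\right\}, & \lambda = 0, \end{cases} \tag{15} $$
since $x_{0.5} = 0$ is the median of the standard normal distribution. Furthermore, Proposition 7 makes it easy to generate random values of $Y$ by inversion.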
Further, Bowley's skewness of $Y$ is
$$ B = \frac{y_{0.75} + y_{0.25} - 2\nu}{y_{0.75} - y_{0.25}}, $$
where $y_{0.25}$, $\nu$, and $y_{0.75}$ are the quartiles of $Y$.
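Proposition 7 supplies both the quartiles needed for Bowley's skewness and an inversion sampler. The sketch below writes the quantile as $y_q = G((x_q - \gamma)/\delta;\lambda)$, which coincides with Proposition 7 whenever $1 + \lambda(x_q - \gamma)/\delta > 0$; otherwise, as an assumption of this sketch, it returns the limiting GEV cdf value (0 for $\lambda > 0$, 1 for $\lambda < 0$). Function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()


def gev_cdf(z, lam):
    """Standard GEV cdf (5); 0 below / 1 above the support endpoint."""
    if lam == 0.0:
        return math.exp(-math.exp(-z))
    base = 1.0 + lam * z
    if base <= 0.0:
        return 0.0 if lam > 0.0 else 1.0
    return math.exp(-base ** (-1.0 / lam))


def normal_gev_quantile(q, gamma, delta, lam):
    """q-th quantile (Proposition 7) written as G((x_q - gamma) / delta; lambda)."""
    return gev_cdf((PHI.inv_cdf(q) - gamma) / delta, lam)


def bowley_skewness(gamma, delta, lam):
    """Bowley's skewness B = (y_0.75 + y_0.25 - 2 nu) / (y_0.75 - y_0.25)."""
    y25, nu, y75 = (normal_gev_quantile(q, gamma, delta, lam) for q in (0.25, 0.5, 0.75))
    return (y75 + y25 - 2.0 * nu) / (y75 - y25)


def sample_normal_gev(n, gamma, delta, lam, seed=None):
    """Inversion sampling: evaluate the quantile function at U ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:  # inv_cdf is undefined at 0; random() can return exactly 0.0
            out.append(normal_gev_quantile(u, gamma, delta, lam))
    return out


if __name__ == "__main__":
    gamma, delta, lam = 0.3, 1.5, 0.4
    print("median", normal_gev_quantile(0.5, gamma, delta, lam))
    print("Bowley", bowley_skewness(gamma, delta, lam))
    print("draws", sample_normal_gev(5, gamma, delta, lam, seed=1))
```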
Figure 4 displays the skewness of $Y$ for some parameter values and indicates that the distribution is approximately symmetric when $\lambda$ and $\gamma$ are close to zero. Thus, the parameters $\lambda$ and $\gamma$ govern the skewness of $Y$.

3. Bayesian Inference for the Normal-GEV Regression

The Normal-GEV median can be expressed from (15) as
$$ \gamma = -\delta\, Q(\nu;\lambda), $$
where $Q(y;\lambda)$ is given by (6). This simple form is convenient for constructing a regression model. Substituting this expression into Equation (7) gives the reparameterized density of $Y$:
$$ f(y;\lambda,\nu,\delta) = \frac{\delta\,\phi\big(\delta[Q(y;\lambda) - Q(\nu;\lambda)]\big)}{y\,[-\log(y)]^{\lambda+1}}, \quad y \in (0,1), \tag{16} $$
where $\lambda \in \mathbb{R}$, $\nu \in (0,1)$, and $\delta > 0$ works as a dispersion parameter.
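To see that (16) is only a reparameterization of (7), the sketch below evaluates both forms with $\gamma = -\delta Q(\nu;\lambda)$ and confirms that the cdf equals 0.5 at $\nu$. This is an illustrative Python check with hypothetical function names, not code from the paper.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def gev_qf(y, lam):
    """GEV quantile function Q(y; lambda), Eq. (6)."""
    t = -math.log(y)
    return -math.log(t) if lam == 0.0 else (t ** (-lam) - 1.0) / lam


def pdf_original(y, gamma, delta, lam):
    """Density (7) in the (gamma, delta, lambda) parameterization."""
    denom = y * (-math.log(y)) ** (lam + 1.0)
    return delta * PHI.pdf(gamma + delta * gev_qf(y, lam)) / denom


def pdf_median(y, lam, nu, delta):
    """Reparameterized density (16), indexed by the median nu."""
    denom = y * (-math.log(y)) ** (lam + 1.0)
    return delta * PHI.pdf(delta * (gev_qf(y, lam) - gev_qf(nu, lam))) / denom


def cdf_median(y, lam, nu, delta):
    """Cdf in the median parameterization: Phi(delta * [Q(y) - Q(nu)])."""
    return PHI.cdf(delta * (gev_qf(y, lam) - gev_qf(nu, lam)))


if __name__ == "__main__":
    lam, nu, delta = 0.4, 0.3, 2.0
    gamma = -delta * gev_qf(nu, lam)  # gamma = -delta * Q(nu; lambda)
    print(pdf_median(0.45, lam, nu, delta), pdf_original(0.45, gamma, delta, lam))
    print(cdf_median(nu, lam, nu, delta))  # 0.5 by construction
```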
Let $\mathbf{y} = (y_1,\ldots,y_n)^\top$ be $n$ observations from $Y_i \sim \text{Normal-GEV}(\lambda,\nu_i,\delta_i)$, where systematic components are specified for both the median $\nu_i$ and the dispersion $\delta_i$. The Normal-GEV regression model is defined by (16) and the systematic components
$$ \eta_{1i} = h_1(\nu_i) = \mathbf{w}_i^\top\boldsymbol{\beta} \quad\text{and}\quad \eta_{2i} = h_2(\delta_i) = \mathbf{z}_i^\top\boldsymbol{\tau}, $$
where $\mathbf{w}_i = (w_{1i},\ldots,w_{pi})^\top$ and $\mathbf{z}_i = (z_{1i},\ldots,z_{qi})^\top$ are vectors of covariates, $\boldsymbol\beta \in \mathbb{R}^p$ and $\boldsymbol\tau \in \mathbb{R}^q$ are vectors of unknown coefficients ($p + q < n$), and $h_1: (0,1) \to \mathbb{R}$ and $h_2: (0,\infty) \to \mathbb{R}$ are strictly monotonic, twice-differentiable link functions. There are several possible choices for $h_1$ and $h_2$. Some useful link functions for the median are: logit, $h_1(\nu) = \log[\nu/(1-\nu)]$; probit, $h_1(\nu) = \Phi^{-1}(\nu)$, where $\Phi^{-1}(\cdot)$ is the standard normal qf; complementary log–log, $h_1(\nu) = \log[-\log(1-\nu)]$; log–log, $h_1(\nu) = -\log[-\log(\nu)]$; and Cauchy, $h_1(\nu) = \tan[\pi(\nu - 0.5)]$. Some possible dispersion links are: logarithmic, $h_2(\delta) = \log(\delta)$; square root, $h_2(\delta) = \sqrt{\delta}$; and identity, $h_2(\delta) = \delta$ (with $\delta > 0$). The relationship between $\nu_i$ and $\boldsymbol\beta$, and between $\delta_i$ and $\boldsymbol\tau$, mirrors the link structure used for the location and dispersion parameters in generalized linear models.
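The link functions listed above, together with the inverses needed to map linear predictors back to $\nu_i$ and $\delta_i$, can be collected in lookup tables. The dictionary names below are our own; the inverses are derived algebraically from the stated links, and the square-root and identity dispersion links additionally require a positive linear predictor.

```python
import math
from statistics import NormalDist

PHI = NormalDist()

# Median links h1: (0, 1) -> R, each paired with its inverse h1^{-1}: R -> (0, 1).
MEDIAN_LINKS = {
    "logit": (lambda nu: math.log(nu / (1.0 - nu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit": (PHI.inv_cdf, PHI.cdf),
    "cloglog": (lambda nu: math.log(-math.log(1.0 - nu)),
                lambda eta: 1.0 - math.exp(-math.exp(eta))),
    "loglog": (lambda nu: -math.log(-math.log(nu)),
               lambda eta: math.exp(-math.exp(-eta))),
    "cauchit": (lambda nu: math.tan(math.pi * (nu - 0.5)),
                lambda eta: 0.5 + math.atan(eta) / math.pi),
}

# Dispersion links h2: (0, inf) -> R and their inverses (sqrt/identity need eta > 0).
DISPERSION_LINKS = {
    "log": (math.log, math.exp),
    "sqrt": (math.sqrt, lambda eta: eta * eta),
    "identity": (lambda d: d, lambda eta: eta),
}

if __name__ == "__main__":
    h1, h1_inv = MEDIAN_LINKS["cloglog"]
    print(h1(0.3), h1_inv(h1(0.3)))  # the round trip recovers 0.3 up to rounding
```

The complementary log–log entry corresponds to the link selected for the application in Section 5.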
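Further, let $\mathbf{W} = (\mathbf{w}_1,\ldots,\mathbf{w}_n)^\top$ and $\mathbf{Z} = (\mathbf{z}_1,\ldots,\mathbf{z}_n)^\top$ be matrices of full rank $p$ and $q$, respectively. With $\nu_i = h_1^{-1}(\mathbf{w}_i^\top\boldsymbol\beta)$ and $\delta_i = h_2^{-1}(\mathbf{z}_i^\top\boldsymbol\tau)$, the likelihood function of the parameters given the observed data $D = (\mathbf{y}, \mathbf{W}, \mathbf{Z})$ has the form
$$ L(\lambda,\boldsymbol\beta,\boldsymbol\tau \mid D) = \prod_{i=1}^n \delta_i\,\phi\big(\delta_i[Q(y_i;\lambda) - Q(\nu_i;\lambda)]\big) \prod_{i=1}^n y_i^{-1} \prod_{i=1}^n [-\log(y_i)]^{-(\lambda+1)}. \tag{17} $$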
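The logarithm of (17) is what an MCMC sampler evaluates at each step. The following sketch assumes, for illustration only, the logit link for the median and the log link for the dispersion, and it works with $\log\phi(u) = -u^2/2 - \log\sqrt{2\pi}$ directly to avoid underflow. The names and the data layout (lists of covariate rows) are hypothetical.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def gev_qf(y, lam):
    """GEV quantile function Q(y; lambda), Eq. (6)."""
    t = -math.log(y)
    return -math.log(t) if lam == 0.0 else (t ** (-lam) - 1.0) / lam


def inv_logit(eta):
    """Inverse of the logit median link (assumed choice for this sketch)."""
    return 1.0 / (1.0 + math.exp(-eta))


def log_likelihood(lam, beta, tau, y, W, Z, inv_h1=inv_logit, inv_h2=math.exp):
    """Log of the likelihood (17); inv_h2=math.exp corresponds to the log link."""
    total = 0.0
    for yi, wi, zi in zip(y, W, Z):
        nu_i = inv_h1(sum(a * b for a, b in zip(wi, beta)))    # median nu_i
        delta_i = inv_h2(sum(a * b for a, b in zip(zi, tau)))  # dispersion delta_i
        u = delta_i * (gev_qf(yi, lam) - gev_qf(nu_i, lam))
        log_phi = -0.5 * u * u - LOG_SQRT_2PI                 # log phi(u), no underflow
        total += (math.log(delta_i) + log_phi
                  - math.log(yi) - (lam + 1.0) * math.log(-math.log(yi)))
    return total


if __name__ == "__main__":
    y = [0.2, 0.35, 0.5]
    W = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.5]]  # intercept plus one covariate
    print(log_likelihood(0.4, [-1.0, 0.5], [0.5, 0.2], y, W, W))
```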
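Maximizing (17) provides the maximum likelihood estimates (MLEs) of the parameters. However, we adopt the Bayesian approach with the following proper prior distributions:
$$ \beta_j \sim N(0,100),\ j = 1,\ldots,p, \qquad \tau_j \sim N(0,100),\ j = 1,\ldots,q, \qquad \text{and} \qquad \lambda \sim N(0,1), \tag{18} $$
where $\boldsymbol\beta$, $\boldsymbol\tau$, and $\lambda$ are assumed independent. Combining (17) and (18), the joint posterior density of $\boldsymbol\vartheta = (\lambda, \boldsymbol\beta^\top, \boldsymbol\tau^\top)^\top \in \mathbb{R}^{p+q+1}$ reduces to
$$ \pi(\boldsymbol\vartheta \mid D) \propto \prod_{i=1}^n \delta_i\,\phi\big(\delta_i[Q(y_i;\lambda) - Q(\nu_i;\lambda)]\big) \prod_{i=1}^n [-\log(y_i)]^{-(\lambda+1)}\, \pi(\lambda)\,\pi(\boldsymbol\beta)\,\pi(\boldsymbol\tau). $$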
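The Metropolis–Hastings algorithm consists of the following steps:
(1) Initialize with a trial value $\boldsymbol\vartheta^{(0)}$ and set $j = 0$;
(2) Generate a candidate point $\boldsymbol\vartheta^*$ from the transition kernel $K(\boldsymbol\vartheta^*, \boldsymbol\vartheta^{(j)}) = N_{p+q+1}(\boldsymbol\vartheta^{(j)}, \tilde\Sigma)$, where $\tilde\Sigma$ is evaluated at $\boldsymbol\vartheta^{(j)}$;
(3) Set $\boldsymbol\vartheta^{(j+1)} = \boldsymbol\vartheta^*$ with probability $p_j = \min\{1, \pi(\boldsymbol\vartheta^* \mid D)/\pi(\boldsymbol\vartheta^{(j)} \mid D)\}$, or $\boldsymbol\vartheta^{(j+1)} = \boldsymbol\vartheta^{(j)}$ with probability $1 - p_j$, and then set $j = j + 1$;
(4) Repeat steps (2) and (3) until the chain becomes stationary.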
The script can be obtained from the authors upon request. For more details on the Metropolis–Hastings algorithm, we refer to Chib et al. [19].
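The steps above describe a random-walk Metropolis–Hastings sampler. Since the authors' script is not reproduced here, the following is only a generic sketch: it assumes a fixed diagonal proposal covariance (a simplification of $\tilde\Sigma$ in step (2)) and a user-supplied log posterior, for example the log of (17) plus the log priors (18), known up to an additive constant. Working on the log scale, a candidate is accepted when $\log U < \log\pi(\boldsymbol\vartheta^* \mid D) - \log\pi(\boldsymbol\vartheta^{(j)} \mid D)$, which is equivalent to step (3).

```python
import math
import random


def random_walk_mh(log_post, theta0, step_sd, n_iter, seed=None):
    """Random-walk Metropolis-Hastings with a Gaussian proposal.

    Assumptions of this sketch: diagonal proposal covariance with fixed
    standard deviations step_sd, and log_post(theta0) finite; log_post may
    return -math.inf outside the parameter space (such proposals are rejected).
    """
    rng = random.Random(seed)
    theta = list(theta0)
    current = log_post(theta)
    chain, accepted = [], 0
    for _ in range(n_iter):
        # Step (2): propose theta* ~ N(theta^(j), diag(step_sd^2)).
        proposal = [t + rng.gauss(0.0, s) for t, s in zip(theta, step_sd)]
        cand = log_post(proposal)
        # Step (3): accept with probability min{1, pi(theta*|D) / pi(theta^(j)|D)}.
        if math.log(1.0 - rng.random()) < cand - current:
            theta, current = proposal, cand
            accepted += 1
        chain.append(list(theta))
    return chain, accepted / n_iter


if __name__ == "__main__":
    # Sanity check on a bivariate standard normal target.
    log_post = lambda th: -0.5 * sum(t * t for t in th)
    chain, acc = random_walk_mh(log_post, [3.0, -3.0], [1.0, 1.0], 30000, seed=42)
    kept = chain[5000:]  # discard burn-in
    print("acceptance", acc, "mean", sum(c[0] for c in kept) / len(kept))
```

Testing on a target with known moments, as in the usage block, is a cheap way to validate the sampler before plugging in the Normal-GEV posterior.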

4. Simulation Study

We assess the accuracy of the Bayesian estimates in the new regression model. One thousand samples of sizes $n = 50$, 100, 200, and 400 are generated from $y_i \sim \text{Normal-GEV}(\lambda,\nu_i,\delta_i)$ with systematic components $h_1(\nu_i) = \log[\nu_i/(1-\nu_i)] = \beta_0 + \beta_1 w_i$ and $h_2(\delta_i) = \log(\delta_i) = \tau_0 + \tau_1 w_i$, and $\lambda = -0.4, 0.4$. The covariate $w_i$ is generated from the uniform $U(0,1)$ distribution, with $\beta_0 = 3$, $\beta_1 = 2$, $\tau_0 = 1$, and $\tau_1 = 1$.
For each trial, we obtain the posterior summaries and 95% highest posterior density (HPD) intervals of the parameters. We generate 25,000 MCMC posterior samples for the parameters, of which the first 5000 are discarded to eliminate the effect of the initial values. To reduce the correlation between the generated values, we take a spacing of size 5, leading to a final sample of size 2000; convergence of the chains is monitored as in [20]. For each configuration, we perform 1000 replicates and compute, from the estimates, the average (MC mean), standard deviation (SD), root mean square error (MC RMSE), and coverage probability (CP).
Table 2 reports the simulation results, which reveal that the RMSEs decrease as $n$ increases (as expected) and that the coverage probabilities approach the nominal level.
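The quantities used in the simulation study can be computed from MCMC output with a few helpers. The sketch below assumes the common shortest-interval construction of an HPD interval from posterior draws (appropriate for unimodal posteriors) and the usual definitions of MC mean, SD, RMSE about the true value, and empirical coverage; the function names are hypothetical.

```python
import math


def hpd_interval(draws, prob=0.95):
    """Shortest interval containing a fraction `prob` of the posterior draws."""
    xs = sorted(draws)
    n = len(xs)
    m = max(1, math.ceil(prob * n - 1e-9))  # draws inside the interval (guard vs. rounding)
    widths = [xs[i + m - 1] - xs[i] for i in range(n - m + 1)]
    i = min(range(len(widths)), key=widths.__getitem__)
    return xs[i], xs[i + m - 1]


def mc_summary(estimates, intervals, true_value):
    """MC mean, SD, RMSE and coverage probability over simulation replicates."""
    r = len(estimates)
    mean = sum(estimates) / r
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (r - 1))
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / r)
    cp = sum(lo <= true_value <= hi for lo, hi in intervals) / r
    return {"mean": mean, "sd": sd, "rmse": rmse, "cp": cp}


if __name__ == "__main__":
    print(hpd_interval([0, 1, 2, 3, 4, 5, 6, 7, 8, 100], prob=0.8))
    print(mc_summary([1.0, 2.0, 3.0], [(0, 2), (1.5, 2.5), (2.5, 4)], 2.0))
```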

5. Application: Colorectal Cancer Data

We analyze data on patients with colorectal cancer [21] from the 50 American states, where the mortality rate is the response variable. We consider $n = 220$ observations after deleting states with incomplete data. The following variables were collected:
  • $y_i$: mortality rate ($i = 1,\ldots,220$);
  • $x_{1i}$: sex (0 = man, 1 = woman);
  • $x_{2i}$: race (non-Hispanic white, non-Hispanic black, Hispanic).
Figure 5 displays boxplots of the mortality rate by sex (left panel) and race (right panel). They indicate that the mortality rate differs between men and women, and that Hispanic patients have a lower mortality rate than the other patients.
We fit the regression model described in Section 3 to these data, with all covariates in both the median ($\nu$) and the dispersion ($\delta$) submodels, using the logit, probit, and complementary log–log links for the median and the logarithmic link for the dispersion, i.e.,
$$ h_1(\nu_i) = \beta_0 + \beta_1 x_{1i} + \beta_{21} x_{21i} + \beta_{22} x_{22i} \quad\text{and}\quad h_2(\delta_i) = \log(\delta_i) = \tau_0 + \tau_1 x_{1i} + \tau_{21} x_{21i} + \tau_{22} x_{22i}, $$
where the race covariate ($x_2$) requires two dummy variables:
$$ x_{21i} = \begin{cases} 1, & \text{if non-Hispanic white},\\ 0, & \text{otherwise}, \end{cases} \quad\text{and}\quad x_{22i} = \begin{cases} 1, & \text{if non-Hispanic black},\\ 0, & \text{otherwise}. \end{cases} $$
We generated 250,000 MCMC posterior samples, of which the first 50,000 were discarded to eliminate the effect of the initial values. The autocorrelation of the sampled values was reduced by taking a spacing of size five, yielding a final sample of size 4000. The trace plots of the parameters of the new regression model with the complementary log–log link, reported in Figure 6, indicate convergence of the chains [20].
For model comparison, we consider the deviance information criterion (DIC [22]), the expected Akaike information criterion (EAIC [23]), the expected Bayesian (or Schwarz) information criterion (EBIC [24]), and the log pseudo marginal likelihood (LPML [25]). The last criterion is derived from the conditional predictive ordinate (CPO) [26]. Smaller values of DIC, EAIC, and EBIC and larger values of LPML indicate a better fit. The Monte Carlo estimates of these criteria in Table 3 indicate that the proposed regression with the complementary log–log link (Normal-GEV-CLL) is the best model.
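The text does not spell out the formulas for these criteria, so the following sketch uses common definitions, which should be read as assumptions: $\bar D$ is the posterior mean of the deviance $D(\boldsymbol\vartheta) = -2\sum_i \log f(y_i;\boldsymbol\vartheta)$, DIC $= \bar D + p_D$ with $p_D = \bar D - D(\bar{\boldsymbol\vartheta})$ (the Spiegelhalter et al. form), EAIC $= \bar D + 2k$ and EBIC $= \bar D + k\log n$ for $k$ parameters, and $\mathrm{CPO}_i$ is the harmonic mean of $f(y_i;\boldsymbol\vartheta)$ over the posterior draws, with LPML $= \sum_i \log \mathrm{CPO}_i$.

```python
import math


def log_mean_exp(values):
    """log(mean(exp(values))), computed stably by factoring out the maximum."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def bayes_criteria(loglik, loglik_at_mean, n_params):
    """DIC, EAIC, EBIC and LPML from MCMC output.

    loglik[s][i] = log f(y_i | theta_s) for posterior draw s and observation i;
    loglik_at_mean = sum_i log f(y_i | posterior mean of theta).
    """
    S, n = len(loglik), len(loglik[0])
    deviances = [-2.0 * sum(row) for row in loglik]
    d_bar = sum(deviances) / S                  # posterior mean deviance
    p_d = d_bar - (-2.0 * loglik_at_mean)       # effective number of parameters
    dic = d_bar + p_d
    eaic = d_bar + 2.0 * n_params
    ebic = d_bar + n_params * math.log(n)
    # CPO_i is the harmonic mean of f(y_i | theta_s): log CPO_i = -log mean exp(-loglik).
    lpml = sum(-log_mean_exp([-loglik[s][i] for s in range(S)]) for i in range(n))
    return {"DIC": dic, "EAIC": eaic, "EBIC": ebic, "LPML": lpml}


if __name__ == "__main__":
    draws = [[-1.1, -2.0], [-0.9, -2.2], [-1.0, -1.9]]  # 3 draws, 2 observations
    print(bayes_criteria(draws, -3.0, 2))
```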
The Bayesian estimates of the parameters of the Normal-GEV-CLL and Johnson's $S_B$ regression models under quadratic and absolute losses, together with the 95% HPD intervals, are reported in Table 4. All covariates are significant at the 5% level for all models. Figure 7 (left panel) displays the marginal posterior density of $\lambda$ in the Normal-GEV-CLL regression model, which is symmetric. Table 4 reveals that the posterior mean of $\lambda$ is 1.088, with a 95% HPD interval of (0.0110, 2.103). We also fit the GJS-Student-t regression model [12] with four degrees of freedom and the log–log link to these data. Table 5 reports the Monte Carlo estimates of DIC, EAIC, EBIC, and LPML for the Johnson's $S_B$ and GJS-Student-t regression models. They indicate that the latter fits these data better than the former, but not better than the Normal-GEV-CLL regression model. The quantile–quantile (QQ) plot of the posterior normalized randomized quantile residuals for the last regression in Figure 7 (right panel) indicates an acceptable fit [27,28].
We consider a Bayesian global influence methodology based on the general divergence measure [29] to identify outliers and/or influential observations. Let $D_\psi(\pi, \pi_{(i)})$ be the $\psi$-divergence between $\pi$ and $\pi_{(i)}$, where $\pi$ denotes the posterior distribution of $\boldsymbol\vartheta$ for the full dataset and $\pi_{(i)}$ the posterior distribution of $\boldsymbol\vartheta$ without the $i$th observation, namely
$$ D_\psi(\pi, \pi_{(i)}) = \int \psi\left(\frac{\pi(\boldsymbol\vartheta \mid D_{(i)})}{\pi(\boldsymbol\vartheta \mid D)}\right)\pi(\boldsymbol\vartheta \mid D)\, d\boldsymbol\vartheta = E_{\boldsymbol\vartheta \mid D}\left[\psi\left(\frac{\mathrm{CPO}_i}{f(y_i;\boldsymbol\vartheta)}\right)\right], $$
where $\psi$ is a convex function with $\psi(1) = 0$. Different choices of $\psi$ are discussed by Dey and Birmiwal [30] and Pardo [31]. Here, $\psi(z) = -\log(z)$ defines the Kullback–Leibler (K-L) divergence, $\psi(z) = (z-1)\log(z)$ gives the J-distance, $\psi(z) = 0.5|z-1|$ provides the $L_1$ norm, and $\psi(z) = z(1/z - 1)^2$ yields the $\chi^2$ divergence. Whether a small subset of observations is influential is assessed with the criterion of Peng and Dey [29] and Weiss [32]; see also Cancho et al. [15,33,34].
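Given posterior draws $\boldsymbol\vartheta_s$, the expectation in $D_\psi$ can be approximated by averaging $\psi(\mathrm{CPO}_i/f(y_i;\boldsymbol\vartheta_s))$ over the draws, with $\mathrm{CPO}_i$ estimated as the harmonic mean of $f(y_i;\boldsymbol\vartheta_s)$. The sketch below does this for the four $\psi$ functions listed above, working on the log scale for stability; the input format (one list of log-likelihood values per observation) and the function name are assumptions.

```python
import math


def divergences(loglik_i):
    """Monte Carlo K-L, J, L1 and chi-square divergences for one observation i.

    loglik_i[s] = log f(y_i | theta_s) over posterior draws s. Uses
    D_psi = E[psi(CPO_i / f(y_i | theta))] with CPO_i the harmonic mean of f.
    """
    S = len(loglik_i)
    m = min(loglik_i)
    # log CPO_i = -log(mean_s exp(-loglik_i[s])), computed stably.
    log_cpo = m - math.log(sum(math.exp(-(l - m)) for l in loglik_i) / S)
    ratios = [math.exp(log_cpo - l) for l in loglik_i]  # CPO_i / f(y_i | theta_s)
    psi = {
        "KL": lambda z: -math.log(z),
        "J": lambda z: (z - 1.0) * math.log(z),
        "L1": lambda z: 0.5 * abs(z - 1.0),
        "chi2": lambda z: z * (1.0 / z - 1.0) ** 2,
    }
    return {name: sum(f(z) for z in ratios) / S for name, f in psi.items()}


if __name__ == "__main__":
    print(divergences([-1.2, -0.8, -1.5, -1.0]))
```

Observations whose divergences stand out from the rest, as cases 39, 54, and 122 do in Figure 8, are the natural candidates for the refitting exercise described next.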
We calculate the Monte Carlo estimates of the K-L, J, $L_1$, and $\chi^2$ divergence measures for the posterior distribution of the parameters of the Normal-GEV-CLL regression model to detect possible influential points.
They are plotted in Figure 8 and identify cases 39, 54, and 122 as possibly influential observations in the posterior distribution.
Table 6 lists the observations with large K-L, J, $L_1$, and $\chi^2$ values.
For these cases, we refit the regression model to determine the impact of these observations on the posterior distribution of the parameters [15]. We remove each case individually, and then two and three cases jointly. The relative change (RC) of each estimate is $\mathrm{RC}_{\vartheta_j} = [(\hat\vartheta_j - \hat\vartheta_{j(I)})/\hat\vartheta_j] \times 100\%$, where $\hat\vartheta_{j(I)}$ is the posterior mean of $\vartheta_j$ ($j = 1,\ldots,9$) when the set $I$ of observations is removed.
Table 7 reports the RCs after removing these observations, together with the lower (L) and upper (U) limits of the 95% HPD intervals of the new estimates. In general, the significance of the parameter estimates at the 5% level does not change after removing the set $I$.
We estimate the median mortality rate for six patients, A, B, C, D, E, and F, with the characteristics specified in Table 8. These numbers are the Bayes estimates under the quadratic and absolute loss functions and the 95% HPD intervals for the median mortality. For example, the median mortality rates are 0.147 and 0.276 for patient A of gender male and race Hispanic and patient E of gender male and race Hispanic, respectively. This difference can be seen in Figure 9, which also shows the posterior distributions of the median mortality rate for the other patients.

6. Conclusions

We derived some mathematical properties of the new normal-generalized extreme value (Normal-GEV) distribution and proposed a new, flexible regression model for proportional variables as an alternative to the well-known beta regression model [2]. Simulation studies and Bayesian procedures were developed, and the analysis of a real dataset showed that the proposed regression is competitive and useful for inferential and diagnostic problems involving bounded response variables and covariates.

Author Contributions

Conceptualization, Y.R.B., V.G.C., E.M.M.O., R.V., G.M.C.; methodology, Y.R.B., V.G.C., E.M.M.O., R.V., G.M.C.; software, Y.R.B., V.G.C., E.M.M.O., R.V., G.M.C.; validation, Y.R.B., V.G.C., E.M.M.O., R.V., G.M.C.; formal analysis, Y.R.B., V.G.C., E.M.M.O., R.V., G.M.C.; investigation, Y.R.B., V.G.C., E.M.M.O., R.V., G.M.C.; data curation, Y.R.B., V.G.C., E.M.M.O., R.V., G.M.C.; writing—original draft preparation, Y.R.B., V.G.C., E.M.M.O., R.V., G.M.C.; writing—review and editing, Y.R.B., V.G.C., E.M.M.O., R.V., G.M.C.; visualization, Y.R.B., V.G.C., E.M.M.O., R.V., G.M.C.; supervision, Y.R.B., V.G.C., E.M.M.O., R.V., G.M.C. All authors have read and agreed to the current version of the manuscript.

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES) (Finance Code 001).

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors confirm that the data supporting the findings of this study are available within the article.

Acknowledgments

We thank the anonymous reviewers whose comments/suggestions helped improve and clarify this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Krysicki, W. On some new properties of the beta distribution. Stat. Probab. Lett. 1999, 42, 131–137.
  2. Ferrari, S.; Cribari-Neto, F. Beta regression for modelling rates and proportions. J. Appl. Stat. 2004, 31, 799–815.
  3. Simas, A.B.; Barreto-Souza, W.; Rocha, A.V. Improved estimators for a general class of beta regression models. Comput. Stat. Data Anal. 2010, 54, 348–366.
  4. Ospina, R.; Ferrari, S.L. Inflated beta distributions. Stat. Pap. 2010, 51, 111.
  5. Ospina, R.; Ferrari, S.L. A general class of zero-or-one inflated beta regression models. Comput. Stat. Data Anal. 2012, 56, 1609–1623.
  6. Carrasco, J.M.; Ferrari, S.L.; Arellano-Valle, R.B. Errors-in-variables beta regression models. J. Appl. Stat. 2014, 41, 1530–1547.
  7. Figueroa-Zúñiga, J.I.; Arellano-Valle, R.B.; Ferrari, S.L. Mixed beta regression: A Bayesian perspective. Comput. Stat. Data Anal. 2013, 61, 137–147.
  8. Qiu, Z.; Song, P.X.-K.; Tan, M. Simplex mixed-effects models for longitudinal proportional data. Scand. J. Stat. 2008, 35, 577–596.
  9. Barndorff-Nielsen, O.E.; Jørgensen, B. Some parametric models on the simplex. J. Multivar. Anal. 1991, 39, 106–116.
  10. Bayes, C.; Bazán, J.L.; de Castro, M. A quantile parametric mixed regression model for bounded response variables. Stat. Its Interface 2017, 10, 483–493.
  11. Kumaraswamy, P. A generalized probability density function for double-bounded random processes. J. Hydrol. 1980, 46, 79–88.
  12. Lemonte, A.J.; Bazán, J.L. New class of Johnson SB distributions and its associated regression model for rates and proportions. Biom. J. 2016, 41, 727–746.
  13. Johnson, N.L. Systems of frequency curves generated by methods of translation. Biometrika 1949, 36, 149–176.
  14. Smithson, M.; Shou, Y. CDF-quantile distributions for modelling random variables on the unit interval. Br. J. Math. Stat. Psychol. 2017, 70, 412–438.
  15. Cancho, V.; Bazán, J.L.; Dey, D.K. A new class of regression model for a bounded response with application in the study of the incidence rate of colorectal cancer. Stat. Methods Med. Res. 2020, 29, 2015–2033.
  16. Jenkinson, A.F. The frequency distribution of the annual maximum (or minimum) values of meteorological elements. Q. J. R. Meteorol. Soc. 1955, 81, 158–171.
  17. Griffiths, L. Introduction to the Theory of Equations; J. Wiley: New York, NY, USA, 1947.
  18. Piessens, R.; de Doncker-Kapenga, E.; Uberhuber, C.W.; Kahaner, D.K. QUADPACK: A Subroutine Package for Automatic Integration; Springer: Berlin, Germany, 1983.
  19. Chib, S.; Greenberg, E. Understanding the Metropolis–Hastings algorithm. Am. Stat. 1995, 49, 327–335.
  20. Cowles, M.K.; Carlin, B.P. Markov chain Monte Carlo convergence diagnostics: A comparative review. J. Am. Stat. Assoc. 1996, 91, 883–904.
  21. Siegel, R.; DeSantis, C.; Jemal, A. Colorectal cancer statistics. CA A Cancer J. Clin. 2014, 64, 104–117.
  22. Spiegelhalter, D.J.; Best, N.G.; Carlin, B.P.; van der Linde, A. Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B 2002, 64, 583–639.
  23. Brooks, S.P. Discussion on the paper by Spiegelhalter, Best, Carlin, and van der Linde. J. R. Stat. Soc. Ser. B 2002, 64, 616–618.
  24. Carlin, B.P.; Louis, T.A. Bayes and Empirical Bayes Methods for Data Analysis, 2nd ed.; Chapman & Hall/CRC: Boca Raton, FL, USA, 2001.
  25. Ibrahim, J.G.; Chen, M.-H.; Sinha, D. Bayesian Survival Analysis; Springer: New York, NY, USA, 2001.
  26. Gelfand, A.; Dey, D.; Chang, H. Model determination using predictive distributions with implementation via sampling based methods (with discussion). In Bayesian Statistics 4; Bernardo, J.M., Berger, J.O., Dawid, A.P., Smith, A.F.M., Eds.; Oxford University Press: Oxford, UK, 1992; Volume 1, pp. 7–167.
  27. Dunn, P.K.; Smyth, G.K. Randomized quantile residuals. J. Comput. Graph. Stat. 1996, 5, 236–244.
  28. Rigby, R.A.; Stasinopoulos, D.M. Generalized additive models for location, scale and shape (with discussion). Appl. Stat. 2005, 54, 507–554.
  29. Peng, F.; Dey, D. Bayesian analysis of outlier problems using divergence measures. Can. J. Stat. 1995, 23, 199–213.
  30. Dey, D.; Birmiwal, L.R. Robust Bayesian analysis using divergence measures. Stat. Probab. Lett. 1994, 20, 287–294.
  31. Pardo, L. Statistical Inference Based on Divergence Measures; Chapman & Hall/CRC: Boca Raton, FL, USA, 2006.
  32. Weiss, R. An approach to Bayesian sensitivity analysis. J. R. Stat. Soc. Ser. B 1996, 58, 739–750.
  33. Cancho, V.; Ortega, E.; Paula, G. On estimation and influence diagnostics for log-Birnbaum-Saunders Student-t regression models: Full Bayesian analysis. J. Stat. Plan. Inference 2010, 140, 2486–2496.
  34. Cancho, V.; Dey, D.; Lachos, V.; Andrade, M. Bayesian nonlinear regression models with scale mixtures of skew-normal distributions: Estimation and case influence diagnostics. Comput. Stat. Data Anal. 2011, 55, 588–602. [Google Scholar]
Figure 1. Normal-GEV density with γ = 0.2, δ = 2, and λ varying.
Figure 2. Normal-GEV density with γ = 0.2, δ = 2, and λ varying.
Figure 3. Normal-GEV density with δ = 1, λ = 1, and γ varying.
Figure 4. Skewness of Y for (a) δ = 2, γ = 0.5; (b) δ = 2, γ = 0.2; (c) δ = 4, γ = 0; (d) δ = 2, γ = 0.2.
Figure 5. Boxplots of the mortality rates by sex (left panel) and race (right panel).
Figure 6. Trace plots for the parameters of the Normal-GEV-CLL regression model fitted to the colorectal cancer data.
Figure 7. Marginal posterior density for λ (left panel) and QQ plot of the posterior normalized randomized quantile residuals (right panel) for the Normal-GEV-CLL regression model.
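Figure 7 uses normalized randomized quantile residuals [27]. For a continuous response, the randomized quantile residual of Dunn and Smyth reduces to r_i = Φ⁻¹(F(y_i | θ)), where F is the fitted CDF, so no randomization is needed. The sketch below makes two assumptions. First, the fitted CDF values F(y_i | θ⁽ᵐ⁾) have already been computed for each posterior draw m, because the Normal-GEV CDF is not reproduced in this section. Second, one posterior residual per observation is obtained by averaging the per-draw residuals; this is not necessarily the authors' exact procedure. The function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def posterior_quantile_residuals(cdf_values):
    """Posterior normalized quantile residuals (hypothetical helper).

    cdf_values: array of shape (M, n) holding F(y_i | theta_m) for
    M posterior draws and n observations (assumed precomputed).
    Returns an array of n residuals, each averaged over the M draws.
    """
    u = np.asarray(cdf_values, dtype=float)
    # Keep probabilities strictly inside (0, 1) so the normal quantile is finite.
    eps = 1e-12
    u = np.clip(u, eps, 1.0 - eps)
    inv_cdf = np.vectorize(NormalDist().inv_cdf, otypes=[float])
    # r_{m,i} = Phi^{-1}(F(y_i | theta_m)) for every draw and observation.
    r = inv_cdf(u)
    # Average over posterior draws (axis 0) to get one residual per case.
    return r.mean(axis=0)
```

You can plot these residuals against standard normal quantiles to get a QQ plot like the right panel of Figure 7. Under a well-specified model, the points should lie close to the identity line.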
Figure 8. Index plot of ψ-divergence measures.
Figure 9. Posterior density of the median mortality rate for six hypothetical patients.
Table 1. Mean and variance of the Normal-GEV distribution.
| (λ, γ, δ) | Mean | Variance | (λ, γ, δ) | Mean | Variance | (λ, γ, δ) | Mean | Variance |
|---|---|---|---|---|---|---|---|---|
| (−0.5, 0.2, 2) | 0.3212 | 0.0279 | (−0.1, 2, 2) | 0.0921 | 0.0100 | (−0.5, 0.2, 1) | 0.3047 | 0.0622 |
| (−0.1, 0.2, 2) | 0.3341 | 0.0266 | (−0.1, 1, 2) | 0.2105 | 0.0213 | (−0.5, 0.2, 2) | 0.3212 | 0.0279 |
| (0, 0.2, 2) | 0.3373 | 0.0265 | (−0.1, 0.5, 2) | 0.2856 | 0.0254 | (−0.5, 0.2, 3) | 0.3359 | 0.0144 |
| (0.2, 0.2, 2) | 0.3437 | 0.0267 | (−0.1, 0.2, 2) | 0.3341 | 0.0266 | (−0.5, 0.2, 6) | 0.3532 | 0.0038 |
| (0.5, 0.2, 2) | 0.3538 | 0.0281 | (−0.1, 0, 2) | 0.3666 | 0.0269 | (−0.5, 0.2, 8) | 0.3573 | 0.0021 |
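Moments such as those in Table 1 can be obtained by numerically integrating the density over (0, 1). QUADPACK [18] is available in Python through `scipy.integrate.quad`. The Normal-GEV density is not reproduced in this section, so this minimal sketch takes the density as an argument. It is exercised with a beta density as a stand-in. To reproduce Table 1 you would substitute the Normal-GEV pdf, which is not shown here. Both function names are hypothetical.

```python
import math
from scipy.integrate import quad  # adaptive quadrature built on QUADPACK

def unit_interval_moments(density):
    """Mean and variance of a density supported on (0, 1), via quadrature."""
    mean, _ = quad(lambda y: y * density(y), 0.0, 1.0)
    second, _ = quad(lambda y: y * y * density(y), 0.0, 1.0)
    return mean, second - mean ** 2

def beta_pdf(y, a=2.0, b=3.0):
    """Stand-in density (beta(a, b)); this is NOT the Normal-GEV density."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * y ** (a - 1) * (1 - y) ** (b - 1)

mean, var = unit_interval_moments(beta_pdf)
print(mean, var)
```

For the beta(2, 3) stand-in, the exact values are a/(a + b) = 0.4 and ab/((a + b)²(a + b + 1)) = 0.04, which gives a quick check of the quadrature.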
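Table 2. Simulation results for the Normal-GEV regression model based on 1000 Monte Carlo replicates (SD: standard deviation; RMSE: root mean squared error; CP: coverage probability).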
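| n | Parameter | MC Mean (λ = −0.4) | SD | Bias | MC RMSE | CP | MC Mean (λ = 0.4) | SD | Bias | MC RMSE | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | λ | −0.368 | 0.820 | 0.032 | 0.821 | 0.983 | 0.345 | 0.292 | −0.055 | 0.297 | 0.961 |
| 50 | β₀ | −3.008 | 0.190 | −0.008 | 0.190 | 0.935 | −3.140 | 0.545 | −0.140 | 0.562 | 0.938 |
| 50 | β₁ | −1.990 | 0.272 | 0.010 | 0.272 | 0.945 | −1.899 | 0.826 | 0.101 | 0.832 | 0.947 |
| 50 | τ₀ | 1.032 | 0.953 | 0.032 | 0.953 | 0.980 | 0.942 | 0.417 | −0.059 | 0.421 | 0.959 |
| 50 | τ₁ | 1.030 | 0.558 | 0.030 | 0.558 | 0.965 | 0.966 | 0.408 | −0.034 | 0.409 | 0.952 |
| 100 | λ | −0.36 | 0.575 | 0.040 | 0.576 | 0.969 | 0.370 | 0.206 | −0.031 | 0.208 | 0.954 |
| 100 | β₀ | −3.00 | 0.131 | 0.004 | 0.131 | 0.940 | −3.057 | 0.366 | −0.057 | 0.370 | 0.943 |
| 100 | β₁ | −2.00 | 0.194 | −0.005 | 0.194 | 0.936 | −1.969 | 0.583 | 0.031 | 0.583 | 0.939 |
| 100 | τ₀ | 1.05 | 0.672 | 0.049 | 0.673 | 0.953 | 0.969 | 0.292 | −0.031 | 0.294 | 0.945 |
| 100 | τ₁ | 1.01 | 0.391 | 0.015 | 0.391 | 0.955 | 0.981 | 0.283 | −0.019 | 0.283 | 0.932 |
| 200 | λ | −0.355 | 0.388 | 0.045 | 0.391 | 0.963 | 0.383 | 0.133 | −0.017 | 0.134 | 0.953 |
| 200 | β₀ | −3.002 | 0.090 | −0.003 | 0.090 | 0.944 | −3.031 | 0.255 | −0.031 | 0.256 | 0.944 |
| 200 | β₁ | −1.996 | 0.131 | 0.004 | 0.131 | 0.941 | −1.986 | 0.390 | 0.014 | 0.390 | 0.949 |
| 200 | τ₀ | 1.047 | 0.450 | 0.047 | 0.452 | 0.966 | 0.980 | 0.187 | −0.020 | 0.188 | 0.965 |
| 200 | τ₁ | 1.022 | 0.263 | 0.022 | 0.264 | 0.960 | 0.987 | 0.192 | −0.013 | 0.192 | 0.950 |
| 400 | λ | −0.394 | 0.289 | 0.006 | 0.289 | 0.939 | 0.388 | 0.101 | −0.012 | 0.101 | 0.941 |
| 400 | β₀ | −3.004 | 0.067 | −0.004 | 0.067 | 0.930 | −3.022 | 0.177 | −0.023 | 0.178 | 0.949 |
| 400 | β₁ | −1.995 | 0.095 | 0.005 | 0.095 | 0.943 | −1.983 | 0.271 | 0.017 | 0.271 | 0.955 |
| 400 | τ₀ | 1.008 | 0.337 | 0.008 | 0.337 | 0.938 | 0.981 | 0.142 | −0.019 | 0.143 | 0.939 |
| 400 | τ₁ | 1.000 | 0.189 | 0.000 | 0.189 | 0.945 | 1.003 | 0.129 | 0.003 | 0.129 | 0.952 |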
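Each row of Table 2 summarizes one parameter's estimates over the simulated samples. This minimal sketch assumes that SD is the Monte Carlo standard deviation of the estimates, computed with divisor R so that RMSE² = SD² + Bias². It also assumes that CP is the share of replicates whose interval contains the true value. The rows of Table 2 satisfy RMSE² ≈ SD² + Bias², which is consistent with this reading. The function name is hypothetical.

```python
import numpy as np

def mc_summary(estimates, lower, upper, true_value):
    """Monte Carlo summaries of one parameter over R simulated samples."""
    est = np.asarray(estimates, dtype=float)
    lo = np.asarray(lower, dtype=float)
    hi = np.asarray(upper, dtype=float)
    mc_mean = est.mean()
    sd = est.std()  # divisor R, so rmse**2 == sd**2 + bias**2
    bias = mc_mean - true_value
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    # Coverage: share of replicates whose interval contains the true value.
    cp = np.mean((lo <= true_value) & (true_value <= hi))
    return {"mc_mean": mc_mean, "sd": sd, "bias": bias, "rmse": rmse, "cp": cp}
```

For example, the λ row for n = 50 and λ = −0.4 gives √(0.820² + 0.032²) ≈ 0.821, which matches the reported MC RMSE.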
Table 3. Model comparison criteria for the Normal-GEV regression model under different link functions.
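| Link function | DIC | EAIC | EBIC | LPML |
|---|---|---|---|---|
| Logistic | −995.926 | −986.771 | −956.228 | 497.039 |
| Probit | −993.660 | −984.603 | −954.061 | 496.038 |
| Cauchy | −995.063 | −985.563 | −955.020 | 496.579 |
| Complementary log-log | −996.674 | −987.814 | −957.271 | 497.283 |
| Log-log | −991.124 | −981.603 | −951.060 | 494.889 |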
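Tables 3 and 5 report DIC, EAIC, EBIC, and LPML. This minimal sketch shows how these criteria are commonly computed from MCMC output. It takes a matrix of pointwise log-likelihood values log f(y_i | θ⁽ᵐ⁾) for M draws and n observations. The definitions used are standard:

- D̄ is the posterior mean of the deviance D(θ) = −2 Σᵢ log f(y_i | θ).
- EAIC = D̄ + 2p and EBIC = D̄ + p log n, where p is the number of parameters.
- DIC = D̄ + p_D, with p_D = D̄ − D(θ̄) [22].
- LPML = Σᵢ log CPOᵢ, where CPOᵢ is the harmonic mean of f(y_i | θ⁽ᵐ⁾) over the draws [26].

This section does not state which version of p_D the authors used. The caller must supply the deviance at the posterior mean, and the function names are hypothetical.

```python
import numpy as np

def _log_mean_exp(a, axis=0):
    """Numerically stable log(mean(exp(a))) along an axis."""
    amax = np.max(a, axis=axis, keepdims=True)
    out = amax + np.log(np.mean(np.exp(a - amax), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)

def bayes_criteria(loglik, n_params, deviance_at_mean):
    """DIC, EAIC, EBIC and LPML from an (M, n) pointwise log-likelihood matrix."""
    ll = np.asarray(loglik, dtype=float)
    n_obs = ll.shape[1]
    deviance = -2.0 * ll.sum(axis=1)        # D(theta_m) for each draw
    d_bar = deviance.mean()                 # posterior mean deviance
    p_d = d_bar - deviance_at_mean          # effective number of parameters
    # CPO_i = [mean_m 1 / f(y_i | theta_m)]^{-1}, computed on the log scale.
    log_cpo = -_log_mean_exp(-ll, axis=0)
    return {
        "DIC": d_bar + p_d,
        "EAIC": d_bar + 2 * n_params,
        "EBIC": d_bar + n_params * np.log(n_obs),
        "LPML": log_cpo.sum(),
    }
```

Under these definitions, smaller DIC, EAIC, and EBIC and larger LPML are preferred. In Table 3, the complementary log-log link is best on all four criteria.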
Table 4. Posterior estimates and 95% HPD intervals for the Johnson's SB and Normal-GEV regression models with the complementary log-log link function.
| Parameter | Normal-GEV: Mean | Median | 95% HPD interval | Johnson's SB: Mean | Median | 95% HPD interval |
|---|---|---|---|---|---|---|
| β₀ | −1.879 | −1.877 | (−1.966, −1.796) | −1.905 | −1.906 | (−1.993, −1.826) |
| β₁ | −0.398 | −0.397 | (−0.444, −0.363) | −0.391 | −0.391 | (−0.430, −0.348) |
| β₂₁ | 0.343 | 0.342 | (0.265, 0.430) | 0.359 | 0.361 | (0.277, 0.440) |
| β₂₂ | 0.797 | 0.795 | (0.716, 0.894) | 0.801 | 0.802 | (0.713, 0.888) |
| τ₀ | 2.641 | 2.644 | (1.906, 3.352) | 1.045 | 1.051 | (0.810, 1.266) |
| τ₁ | 0.671 | 0.674 | (0.376, 0.968) | 0.336 | 0.339 | (0.123, 0.513) |
| τ₂₁ | 0.545 | 0.544 | (0.225, 0.846) | 0.846 | 0.842 | (0.596, 1.099) |
| τ₂₂ | 0.545 | 0.544 | (0.225, 0.846) | 0.846 | 0.842 | (0.596, 1.099) |
| λ | 1.088 | 1.088 | (0.011, 2.103) | | | |
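Tables 4, 7, and 8 report 95% HPD intervals. For a unimodal posterior, a common way to estimate an HPD interval from MCMC draws is to take the shortest interval that contains the required fraction of the sorted draws. This is the same idea as the `HPDinterval` function in the R package coda. The sketch below follows that approach, which is not necessarily the routine the authors used. The function name is hypothetical.

```python
import numpy as np

def hpd_interval(draws, prob=0.95):
    """Shortest interval containing about `prob` of the posterior draws.

    Appropriate for unimodal posteriors; a multimodal posterior may need
    a union of intervals instead.
    """
    x = np.sort(np.asarray(draws, dtype=float))
    n = x.size
    # Number of sorted positions spanned by the interval.
    gap = max(1, min(n - 1, int(round(n * prob))))
    # Widths of all candidate intervals [x[i], x[i + gap]].
    widths = x[gap:] - x[: n - gap]
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + gap])
```

For a skewed posterior such as that of λ, the HPD interval is generally shorter than the equal-tailed interval with the same coverage.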
Table 5. Monte Carlo estimates of DIC, EAIC, EBIC, and LPML for the Johnson's SB and GJS-Student-t regression models.
| Model | DIC | EAIC | EBIC | LPML |
|---|---|---|---|---|
| GJS-Student-t | −984.764 | −976.786 | −949.637 | 492.665 |
| Johnson's SB | −986.343 | −978.375 | −951.226 | 491.750 |
Table 6. ψ-divergence measures for the Normal-GEV-CLL regression model.
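| Case | Mortality rate | Sex | Race | State | K-L | J | L₁ | χ² |
|---|---|---|---|---|---|---|---|---|
| 39 | 0.11 | Men | non-Hispanic white | District of Columbia | 1.081 | 2.422 | 0.563 | 5.004 |
| 54 | 0.05 | Women | Hispanic | Georgia | 0.449 | 1.057 | 0.392 | 1.193 |
| 122 | 0.29 | Women | non-Hispanic black | Nebraska | 0.769 | 2.301 | 0.575 | 2.861 |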
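Table 6 and Figure 8 use ψ-divergence measures between the posterior with and without case i [29,30]. In the formulation commonly used for this diagnostic (for example, by Cancho et al. [33]), d_ψ(i) = E[ψ(CPOᵢ / f(y_i | θ))], with the expectation taken over the full posterior. The choice of ψ determines the measure:

- ψ(z) = −log z gives the Kullback–Leibler (K-L) divergence.
- ψ(z) = (z − 1) log z gives the J-distance.
- ψ(z) = ½|z − 1| gives the L₁ distance.
- ψ(z) = (z − 1)² gives the χ² divergence.

This minimal sketch estimates each measure by averaging over posterior draws. It assumes an (M, n) matrix of pointwise log-likelihood values as input. It illustrates the standard estimator rather than reproducing the authors' code, and the function name is hypothetical.

```python
import numpy as np

def psi_divergences(loglik):
    """K-L, J, L1 and chi-square divergences for each case.

    loglik: (M, n) matrix of log f(y_i | theta_m) over M posterior draws.
    Returns a dict of length-n arrays.
    """
    ll = np.asarray(loglik, dtype=float)
    # log CPO_i via the harmonic-mean identity, computed stably.
    neg = -ll
    amax = neg.max(axis=0)
    log_cpo = -(amax + np.log(np.mean(np.exp(neg - amax), axis=0)))
    # z_{m,i} = CPO_i / f(y_i | theta_m), so log z = log CPO_i - log f.
    log_z = log_cpo - ll
    z = np.exp(log_z)
    return {
        "K-L": np.mean(-log_z, axis=0),
        "J": np.mean((z - 1.0) * log_z, axis=0),
        "L1": np.mean(0.5 * np.abs(z - 1.0), axis=0),
        "chi2": np.mean((z - 1.0) ** 2, axis=0),
    }
```

If every draw gives the same likelihood for a case, then z = 1 and all four measures are zero. Larger values flag cases, such as 39, 54, and 122 in Table 6, whose removal shifts the posterior more.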
Table 7. Relative changes (RCs, in %) in the posterior means, and the lower (L) and upper (U) limits of the 95% HPD intervals, after removing the indicated cases.
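| Dropped | Statistic | λ | β₀ | β₁ | β₂₁ | β₂₂ | τ₀ | τ₁ | τ₂₁ | τ₂₂ |
|---|---|---|---|---|---|---|---|---|---|---|
| none | Mean | 1.088 | −1.879 | −0.398 | 0.343 | 0.797 | 2.641 | 0.671 | 0.545 | −0.424 |
| | L | 0.011 | −1.966 | −0.444 | 0.265 | 0.716 | 1.906 | 0.376 | 0.225 | −0.951 |
| | U | 2.103 | −1.796 | −0.363 | 0.430 | 0.894 | 3.352 | 0.968 | 0.846 | 0.101 |
| {39} | RC | −39.0 | −0.0 | 0.9 | 0.9 | −0.3 | −9.7 | −21.4 | 25.2 | −44.0 |
| | L | −0.391 | −1.960 | −0.439 | 0.266 | 0.704 | 1.609 | 0.217 | 0.341 | −0.744 |
| | U | 1.822 | −1.797 | −0.364 | 0.425 | 0.879 | 3.135 | 0.819 | 0.974 | 0.309 |
| {54} | RC | −35.8 | 0.1 | 0.8 | 1.5 | −0.1 | −8.6 | −20.7 | 22.4 | −41.5 |
| | L | −0.343 | −1.964 | −0.437 | 0.261 | 0.715 | 1.751 | 0.232 | 0.382 | −0.777 |
| | U | 1.763 | −1.802 | −0.363 | 0.423 | 0.882 | 3.196 | 0.828 | 0.998 | 0.225 |
| {122} | RC | 23.3 | −0.2 | 1.2 | 0.0 | −1.4 | 5.2 | 17.2 | −7.5 | 7.3 |
| | L | 0.305 | −1.961 | −0.442 | 0.260 | 0.692 | 2.040 | 0.475 | 0.195 | −0.950 |
| | U | 2.324 | −1.798 | −0.363 | 0.420 | 0.867 | 3.435 | 1.059 | 0.801 | 0.050 |
| {39, 54} | RC | −50.7 | −0.7 | 0.5 | −3.2 | −2.4 | −10.3 | −20.6 | 13.9 | −37.8 |
| | L | −0.528 | −1.938 | −0.436 | 0.251 | 0.693 | 1.693 | 0.262 | 0.333 | −0.777 |
| | U | 1.602 | −1.790 | −0.363 | 0.402 | 0.859 | 3.179 | 0.835 | 0.929 | 0.243 |
| {39, 122} | RC | −17.0 | −0.4 | 2.1 | 0.1 | −1.8 | −4.7 | −5.2 | 17.1 | −35.6 |
| | L | −0.277 | −1.955 | −0.444 | 0.268 | 0.700 | 1.779 | 0.328 | 0.321 | −0.807 |
| | U | 1.927 | −1.794 | −0.370 | 0.426 | 0.869 | 3.253 | 0.924 | 0.937 | 0.202 |
| {54, 122} | RC | 12.1 | −1.1 | 0.8 | −5.6 | −4.0 | 4.4 | 16.9 | −17.5 | 11.8 |
| | L | 0.246 | −1.933 | −0.439 | 0.248 | 0.686 | 2.077 | 0.492 | 0.149 | −0.978 |
| | U | 2.201 | −1.789 | −0.359 | 0.388 | 0.842 | 3.449 | 1.052 | 0.734 | −0.034 |
| {39, 54, 122} | RC | −29.7 | −1.0 | 1.8 | −3.7 | −3.6 | −5.5 | −5.6 | 4.7 | −31.0 |
| | L | −0.388 | −1.930 | −0.444 | 0.261 | 0.694 | 1.782 | 0.338 | 0.270 | −0.830 |
| | U | 1.848 | −1.787 | −0.370 | 0.402 | 0.846 | 3.323 | 0.947 | 0.883 | 0.223 |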
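Table 7 reports relative changes (RCs) after dropping cases, but this section does not state the formula. The signs in the table are consistent with RC = 100 × (θ̂₍I₎ − θ̂)/θ̂. Here θ̂ is the posterior mean from the full data and θ̂₍I₎ is the posterior mean after removing the set I. For example, dropping case 122 shifts the λ interval upward, matching its positive RC for λ. Treat this definition as an assumption. The function name is hypothetical.

```python
def relative_change(full_mean, dropped_mean):
    """Relative change (in %) of a posterior mean after dropping cases.

    Assumed definition: 100 * (dropped - full) / full.
    """
    if full_mean == 0:
        raise ValueError("relative change is undefined when the full-data mean is 0")
    return 100.0 * (dropped_mean - full_mean) / full_mean
```

The denominator is the full-data mean, so RCs can be large for parameters whose posterior mean is close to zero, such as λ and τ₂₂ here. This can happen even when the HPD intervals before and after deletion overlap substantially.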
Table 8. Mortality rate estimates and 95% HPD intervals for six colorectal cancer patients.
| Patient | Sex | Race | Mortality rate: Mean | Median | 95% HPD interval |
|---|---|---|---|---|---|
| A | Men | Hispanic | 0.142 | 0.142 | (0.131, 0.153) |
| B | Women | Hispanic | 0.098 | 0.098 | (0.090, 0.105) |
| C | Men | non-Hispanic white | 0.194 | 0.194 | (0.188, 0.200) |
| D | Women | non-Hispanic white | 0.135 | 0.135 | (0.131, 0.138) |
| E | Men | non-Hispanic black | 0.287 | 0.287 | (0.275, 0.299) |
| F | Women | non-Hispanic black | 0.204 | 0.204 | (0.195, 0.212) |
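The means in Table 8 can be checked against the Normal-GEV posterior means in Table 4. This sketch assumes that the median μ is linked to the covariates through the complementary log-log link:

log(−log(1 − μ)) = β₀ + β₁·women + β₂₁·(non-Hispanic white) + β₂₂·(non-Hispanic black)

Men and Hispanic are the reference categories. Under this model, the posterior means from Table 4 reproduce the means for patients A to F to within 0.001. The covariate coding is inferred from that agreement rather than stated in this section, and the names below are hypothetical. Applied to each MCMC draw of the β's instead of their means, the same function would yield posterior samples of μ like those summarized in Figure 9.

```python
import numpy as np

# Posterior means for the Normal-GEV model with CLL link (Table 4).
BETA_MEANS = {"b0": -1.879, "b1": -0.398, "b21": 0.343, "b22": 0.797}

# Covariate profiles for patients A-F in Table 8: (women, nh_white, nh_black).
PATIENTS = {
    "A": (0, 0, 0), "B": (1, 0, 0),
    "C": (0, 1, 0), "D": (1, 1, 0),
    "E": (0, 0, 1), "F": (1, 0, 1),
}

def median_rate(beta, women, nh_white, nh_black):
    """Median mortality rate under an assumed complementary log-log link.

    beta: dict with keys b0, b1, b21, b22 (scalars or arrays of MCMC draws).
    women, nh_white, nh_black: 0/1 indicators (assumed coding, inferred
    from Table 8, with men and Hispanic as reference categories).
    """
    eta = (beta["b0"] + beta["b1"] * women
           + beta["b21"] * nh_white + beta["b22"] * nh_black)
    # Inverse CLL link: mu = 1 - exp(-exp(eta)).
    return 1.0 - np.exp(-np.exp(eta))

for label, (w, nw, nb) in PATIENTS.items():
    print(label, round(float(median_rate(BETA_MEANS, w, nw, nb)), 3))
```

Plugging posterior means into the nonlinear link yields only an approximation to the posterior mean of μ. It agrees closely here because the posteriors of the β's are concentrated.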
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MDPI and ACS Style

Benites, Y.R.; Cancho, V.G.; Ortega, E.M.M.; Vila, R.; Cordeiro, G.M. A New Regression Model on the Unit Interval: Properties, Estimation, and Application. Mathematics 2022, 10, 3198. https://doi.org/10.3390/math10173198

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
