Article

A Flexible Class of Two-Piece Normal Distribution with a Regression Illustration to Biaxial Fatigue Data

by Hugo Salinas 1, Hassan Bakouch 2,3, Najla Qarmalah 4,* and Guillermo Martínez-Flórez 5
1 Departamento de Matemática, Facultad de Ingeniería, Universidad de Atacama, Copiapó 7500015, Chile
2 Department of Mathematics, College of Science, Qassim University, Buraydah 51452, Saudi Arabia
3 Department of Mathematics, Faculty of Science, Tanta University, Tanta 31111, Egypt
4 Department of Mathematical Sciences, Princess Nourah Bint Abdulrahman University, Riyadh 11671, Saudi Arabia
5 Departamento de Matemáticas y Estadística, Facultad de Ciencias Básicas, Universidad de Córdoba, Montería 230002, Colombia
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(5), 1271; https://doi.org/10.3390/math11051271
Submission received: 30 January 2023 / Revised: 25 February 2023 / Accepted: 3 March 2023 / Published: 6 March 2023

Abstract

The two-piece normal distribution is notably effective for modeling univariate data that exhibit symmetry and uni/bimodality, with the value of the shape parameter determining whether the density is unimodal or bimodal. This paper proposes a flexible uni/bimodal distribution with platykurtic density, which can be used to model a variety of data. The construction is based on transforming a random variable into a folded distribution, and the proposed class includes the normal distribution as a sub-model. The main structural properties of the class are derived, and the maximum likelihood method is considered for the estimation of parameters. In addition, simulation experiments are presented to evaluate the behavior of the estimators. Finally, fitting and regression applications are presented to illustrate the usefulness of the proposed distribution for data modeling in different real-life scenarios.

1. Introduction

The standard normal distribution is frequently used in statistics but falls short in explaining various random phenomena in nature. To address this limitation, researchers have developed extensions of the normal distribution that offer greater flexibility in modeling data. For instance, Azzalini [1] introduced the skew-normal distribution, which shares some characteristics with the normal model but has a larger range of skewness and kurtosis coefficients. Other researchers have proposed new classes of distributions, such as Elal-Olivero’s [2] bimodal-symmetric skew-normal distributions, the skewed sinh-normal distribution investigated by Martínez-Flórez et al. [3], and the extended bimodal-normal distribution presented by Cortés et al. [4]. Additionally, Hoxhaj and Khattree [5] conducted a geometrical study of the skew-normal family of distributions, highlighting the normal distribution’s ability to describe rare events with continuous support. More recently, Elal-Olivero et al. [6] introduced the bimodal skew-normal distribution and presented its characteristics and several inferential investigations, while Martínez-Flórez et al. [7] extended the sinh-normal distribution to address skewness and bimodality. In their most recent work, Martínez-Flórez et al. [8] introduced flexible classes of normal distributions, including unimodal, bimodal, and trimodal distributions.
Given the challenges of data modeling in data science, it remains necessary to investigate alternatives to the normal distribution.
In this paper, we propose a flexible class of distributions that offers a practical alternative to normal distributions for modeling two types of data: those with bimodality and those with less kurtosis than a normal distribution. The normal distribution often falls short in accurately modeling real-life data with bimodal distributions or a less pointed shape and thinner tails. While two-component mixing distributions are commonly used to model bimodal data, they can be challenging due to parameter non-identifiability. Our proposed distribution offers a more tractable alternative with the same flexibility as mixture distributions. We validate our approach with simulations and practical data applications.
Our proposed class of distributions has properties similar to those of a normal distribution, producing a family of uni/bimodal distributions. It has a kurtosis coefficient between 1 and 3, and when the shape parameter is between 0 and 1, it is unimodal and platykurtic. When the shape parameter is greater than 1, it is bimodal. This family provides an alternative to mixture distributions for modeling data with bimodality and to the normal distribution for modeling data with less pointed density and thinner tails. We build upon the folded distribution, which is generated by taking the absolute value of a normal random variable and has been studied by several authors in the literature, including Tsagris et al. [9] and Reig et al. [10]. We outline the folded distribution’s main theoretical aspects before introducing our proposed family, which is of interest to both the theoretical and applied communities.
The folded normal distribution was originally proposed by Leone et al. [11] and is defined by the parameters $(\mu, \sigma)$. It represents the distribution of the absolute value of a normally distributed random variable $U$ with mean $\mu$ and variance $\sigma^2$. The probability density function (PDF) of $Y = |U|$ can be expressed as follows:
$$f_Y(y; \mu, \sigma) = \sqrt{\frac{2}{\pi}}\, \sigma^{-1}\, e^{-(y^2 + \mu^2)/(2\sigma^2)} \cosh\!\left(\frac{\mu}{\sigma^2}\, y\right), \quad y > 0.$$
Reparameterizing $(\mu, \sigma)$ as $(\lambda, \sigma)$, where $\lambda = \mu/\sigma$, the PDF can be written in an alternative form, denoted $FN(\lambda, \sigma)$:
$$f_Y(y; \lambda, \sigma) = \frac{2}{\sigma}\, \phi\!\left(\frac{y}{\sigma}\right) e^{-\lambda^2/2} \cosh\!\left(\frac{\lambda y}{\sigma}\right), \quad y > 0,$$
where $\phi$ is the PDF of the standard normal distribution.
The proposed family of distributions extends the FN distribution across the entire real line, offering a reflected version that can analyze both positive and negative valued data. This family is a weighted version of the normal distribution, utilizing the weight function cosh ( · ) to imbue the distribution with additional properties, including bimodality and a reduction in biased data. Additionally, the proposed family offers a solution to Beneš stochastic partial differential equation for the Fokker–Planck–Kolmogorov equation under certain parameter values, as demonstrated by Sarkka and Solin [12]. While this solution has yet to be discussed from the perspective of distribution theory, this paper aims to undertake a statistical study of the proposed family, including the mentioned solution as a special case and exploring practical applications.
With the above in mind, the rest of this paper is structured as follows. In Section 2, we derive the proposed class of distributions, while Section 3 highlights several properties of the class. Section 4 delves into the development of likelihood-based estimation for the class, along with a simulation study. In Section 5, we present real-life data applications of the proposed distributions, including fitting and regression examples. Lastly, in Section 6, we provide concluding remarks.

2. The Family of Uni/Bimodal Densities

In this section, we introduce the two-piece normal (TN) distribution and derive its density. Additionally, we present some results on the uni/bimodality properties associated with the distribution.
Definition 1. 
A random variable $Z$ is said to have a TN distribution with parameter $\lambda$, denoted by $Z \sim TN(\lambda)$, if its PDF is given by
$$f_Z(z; \lambda) = \phi(z)\, e^{-\lambda^2/2} \cosh(\lambda z), \quad z \in \mathbb{R},\ \lambda \ge 0, \tag{2}$$
where $\lambda$ is the shape parameter.
The TN distribution can be regarded as a reflected version of the FN distribution, meaning it extends the FN distribution over the entire real line. Additionally, the TN distribution can be viewed as a weighted version of the normal density $\phi(z)$, where the weight function $\cosh(\lambda z)$ is used to obtain $f_Z(z; \lambda) = \cosh(\lambda z)\, \phi(z) / E(\cosh(\lambda U))$. The normalizing constant $E(\cosh(\lambda U))$, with the expectation taken for $U \sim N(0, 1)$, equals $e^{\lambda^2/2}$. Later, we will show how the parameter $\lambda$ is critical in determining the distribution’s unimodality and bimodality. Notably, the PDF (2) can be represented as a sum of two functions, that is, $f_Z(z; \lambda) = (\phi(z - \lambda) + \phi(z + \lambda))/2$.
Since $\cosh(\lambda z) \le e^{\lambda^2 z^2/2}$, we have $e^{-\lambda^2/2} \cosh(\lambda z) \le 1$ whenever $|z| \le 1$, and therefore $f_Z(z; \lambda) \le \phi(z)$ on that interval; the ordering cannot hold for every $z$ when $\lambda > 0$, because both densities integrate to one. Furthermore, by using the inequality $\cosh(z) \le e^{z^2/2}$, we find that $f_Z(z; 1) \le 1/\sqrt{2\pi e}$. In other words, $f_Z(z; 1)$ is bounded above by $1/\sqrt{2\pi e}$.
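The equivalence between the weighted form (2) and the two-component representation can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `tn_pdf_cosh` and `tn_pdf_mixture` are illustrative and not part of the original paper.

```python
import math

def phi(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def tn_pdf_cosh(z, lam):
    """TN density in the weighted form phi(z) * exp(-lam^2 / 2) * cosh(lam * z)."""
    return phi(z) * math.exp(-0.5 * lam * lam) * math.cosh(lam * z)

def tn_pdf_mixture(z, lam):
    """TN density written as (phi(z - lam) + phi(z + lam)) / 2."""
    return 0.5 * (phi(z - lam) + phi(z + lam))

# Compare the two forms on a small grid of (lam, z) values.
max_gap = max(
    abs(tn_pdf_cosh(z, lam) - tn_pdf_mixture(z, lam))
    for lam in (0.0, 0.5, 1.0, 2.0)
    for z in (-3.0, -1.0, 0.0, 0.7, 2.5)
)
print("largest absolute difference between the two forms:", max_gap)
```

The mixture form is usually the safer one to evaluate, since `math.cosh` can overflow for very large arguments while the two shifted normal densities cannot.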
Figure 1 illustrates the shapes of Equation (2) for various values of $\lambda$. It can be observed that the TN distribution can be unimodal or bimodal depending on the value of the parameter.
Additionally, basic properties of the $TN(\lambda)$ family can be obtained directly from (2).
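Proposition 1. 
If $Z \sim TN(\lambda)$, then the cumulative distribution function (CDF) is given by
$$F_Z(z; \lambda) = (\Phi(z - \lambda) + \Phi(z + \lambda))/2,$$
where $\Phi$ is the CDF of the standard normal distribution.
Proof. 
From the definition of the CDF, we get
$$\begin{aligned}
P(Z \le z) &= \frac{1}{2\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^2/2}\, e^{-\lambda^2/2} \left(e^{\lambda u} + e^{-\lambda u}\right) du \\
&= \frac{1}{2\sqrt{2\pi}} \left[ \int_{-\infty}^{z} e^{-(u - \lambda)^2/2}\, du + \int_{-\infty}^{z} e^{-(u + \lambda)^2/2}\, du \right] \\
&= \frac{1}{2} \left[ \int_{-\infty}^{z - \lambda} \phi(u)\, du + \int_{-\infty}^{z + \lambda} \phi(u)\, du \right],
\end{aligned}$$
which gives the desired outcome.  □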
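As a numerical illustration of Proposition 1, the closed-form CDF can be compared with a direct integration of the density. This is a sketch under stated assumptions: the helper names, the lower integration limit of -12, and the trapezoidal rule are illustrative choices, not taken from the paper.

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def tn_pdf(z, lam):
    """TN density as the average of two shifted normal densities."""
    return 0.5 * (phi(z - lam) + phi(z + lam))

def tn_cdf(z, lam):
    """Closed-form TN CDF from Proposition 1."""
    return 0.5 * (Phi(z - lam) + Phi(z + lam))

def tn_cdf_numeric(z, lam, lower=-12.0, steps=20000):
    """Trapezoidal approximation of the integral of the TN density on [lower, z]."""
    h = (z - lower) / steps
    total = 0.5 * (tn_pdf(lower, lam) + tn_pdf(z, lam))
    for i in range(1, steps):
        total += tn_pdf(lower + i * h, lam)
    return total * h

print("closed form:", tn_cdf(1.0, 2.0))
print("numerical  :", tn_cdf_numeric(1.0, 2.0))
```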

3. Some Properties of the TN Distribution

In this section, we explore various statistical properties of the TN distribution, including its basic properties, stochastic representation, and moment-generating function.

3.1. Basic Properties

The following properties can be obtained directly from Definition 1.
Proposition 2. 
Let $X \sim N(\lambda, 1)$ and $Z \sim TN(\lambda)$. Then the following properties hold:
(a) $f_Z(z; \lambda = 0) = \phi(z)$.
(b) $f_Z(-z; \lambda) = f_Z(z; \lambda)$.
(c) $|Z| \overset{d}{=} |X| \sim FN(\lambda)$.
(d) $-Z \sim TN(\lambda)$.
(e) $F_Z(-z; \lambda) = 1 - F_Z(z; \lambda)$.
(f) $F_Z(0; \lambda) = 1/2$.
(g) $\lim_{\lambda \to 0} f_Z(z; \lambda) = \phi(z)$. In contrast, as $\lambda \to \infty$, $f_Z(z; \lambda)$ tends to 0 for every fixed $z$.
(h) For $\lambda > 1$, the density (2) is bimodal; that is, in each of the regions $z \in (-\infty, 0]$ and $z \in [0, \infty)$, $f_Z(z; \lambda)$ has exactly one local maximum.
(i) For $\lambda > 1$, the two modes of (2) are located at $z = \pm z_0$, $z_0 > 0$, where $z_0$ satisfies
$$z_0 = \lambda \tanh(\lambda z_0).$$
(j) For $0 \le \lambda \le 1$, the single mode of (2) is located at $z = 0$, because $f_Z'(z; \lambda) < 0$ for $z > 0$ and $f_Z'(z; \lambda) > 0$ for $z < 0$.
Proposition 2 collects several important properties of the TN distribution. Property (a) shows that the TN distribution includes the normal distribution as a special case when $\lambda = 0$ and is therefore a generalization of it. Property (b) indicates that the TN density is an even function, that is, symmetric about zero. Properties (c) and (d) give the distributions of the absolute value and of the negative of a TN random variable, respectively. Property (e) is the corresponding symmetry relation for the CDF, which follows from the evenness of the density, and property (f) shows that the median is always zero. Property (g) shows that the TN density converges to the standard normal density as $\lambda$ approaches zero, whereas as $\lambda$ approaches infinity the density vanishes at every fixed point because the two modes move off to $\pm\infty$. Finally, properties (h), (i), and (j) demonstrate that the TN density is unimodal or bimodal depending on the value of $\lambda$. These properties are used in subsequent sections for statistical inference and applications.
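The mode equation in property (i) has no closed-form solution, but it is easy to solve numerically. The sketch below uses bisection on $g(z) = \lambda \tanh(\lambda z) - z$ over $(0, \lambda]$; the function name `tn_mode`, the bracketing interval, and the iteration count are assumptions made for illustration.

```python
import math

def tn_pdf(z, lam):
    """TN density phi(z) * exp(-lam^2 / 2) * cosh(lam * z)."""
    return (math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
            * math.exp(-0.5 * lam * lam) * math.cosh(lam * z))

def tn_mode(lam, iterations=200):
    """Return the positive mode z0 solving z0 = lam * tanh(lam * z0).

    For 0 <= lam <= 1 the density is unimodal and the mode is 0.
    For lam > 1, g(z) = lam * tanh(lam * z) - z is positive just above 0
    and negative at z = lam, so bisection on (0, lam] brackets the root.
    """
    if lam <= 1.0:
        return 0.0
    lo, hi = 1e-9, lam
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if lam * math.tanh(lam * mid) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for lam in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"lambda = {lam}: modes at +/- {tn_mode(lam):.6f}")
```

By symmetry, the second mode is at $-z_0$, so only the positive root needs to be computed.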

3.2. Stochastic Representation of the TN Random Variable

The following proposition presents the mechanism for generating random numbers that follow the TN distribution.
Proposition 3. 
$Z \sim TN(\lambda)$ if and only if there exist independent random variables $S$ and $Y \sim FN(\lambda, 1)$, with $P(S = 1) = P(S = -1) = 1/2$, such that $Z \overset{d}{=} S Y$.
Proof. 
For $z \ge 0$,
$$P(Z \le z) = P(S Y \le z) = P(S = -1) + P(Y \le z)\, P(S = 1) = (1 + F_Y(z))/2, \tag{4}$$
while for $z < 0$,
$$P(Z \le z) = P(S Y \le z) = P(-Y \le z)\, P(S = -1) = P(Y \ge -z)/2 = (1 - F_Y(-z))/2. \tag{5}$$
Differentiating $P(Z \le z)$ in (4) and (5) gives the PDF of $Z$ in (2), and hence $Z$ has a $TN(\lambda)$ distribution.  □

3.3. Moments

By Proposition 2(b), the odd moments of $Z$ are equal to zero. In order to compute the even moments, we first find the moment-generating function (MGF) of the $TN(\lambda)$ distribution.
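Proposition 4. 
The MGF of $Z \sim TN(\lambda)$ is
$$M_Z(t) = e^{t^2/2} \cosh(\lambda t). \tag{6}$$
Proof. 
By definition of the MGF, we find that
$$\begin{aligned}
E(e^{tZ}) &= \frac{1}{2\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tz}\, e^{-z^2/2}\, e^{-\lambda^2/2} \left(e^{\lambda z} + e^{-\lambda z}\right) dz \\
&= \frac{1}{2\sqrt{2\pi}} \left[ \int_{-\infty}^{\infty} e^{tz}\, e^{-(z - \lambda)^2/2}\, dz + \int_{-\infty}^{\infty} e^{tz}\, e^{-(z + \lambda)^2/2}\, dz \right] \\
&= \frac{1}{2} \left[ \int_{-\infty}^{\infty} e^{tu + \lambda t}\, \phi(u)\, du + \int_{-\infty}^{\infty} e^{tu - \lambda t}\, \phi(u)\, du \right] \\
&= \frac{1}{2} \left(e^{\lambda t} + e^{-\lambda t}\right) \int_{-\infty}^{\infty} e^{tu}\, \phi(u)\, du,
\end{aligned}$$
which gives the desired outcome.  □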
Consider $Z \sim TN(\lambda)$ and the location-scale transformation $X = \xi + \eta Z \sim TN(\theta)$, with density given by
$$f_X(x; \theta) = \eta^{-1}\, \phi((x - \xi)/\eta)\, e^{-\lambda^2/2} \cosh(\lambda (x - \xi)/\eta), \quad x \in \mathbb{R},\ \lambda \ge 0,$$
where $\theta = (\xi, \eta, \lambda)$, $\xi \in \mathbb{R}$, and $\eta > 0$. The moments of $X$ are given in Proposition 5 below.
Proposition 5. 
Let $Z \sim TN(\lambda)$. The $r$-th moment of $X \sim TN(\theta)$ is given by
$$E(X^r) = E((\xi + \eta Z)^r) = \sum_{k=0}^{r} \binom{r}{k}\, \xi^{r-k}\, \eta^{k}\, E(Z^k)$$
and
$$E(Z^k) = \{1 + (-1)^k\}\, \frac{2^{k/2 - 1}}{\sqrt{\pi}}\, \Gamma\!\left(\frac{k + 1}{2}\right) {}_1F_1\!\left(-\frac{k}{2}; \frac{1}{2}; -\frac{\lambda^2}{2}\right),$$
where ${}_1F_1(a; b; z)$ is the Kummer confluent hypergeometric function [13], given by
$${}_1F_1(a; b; z) := \sum_{n=0}^{\infty} \frac{a^{(n)}\, z^n}{b^{(n)}\, n!},$$
with $a^{(0)} = 1$ and $a^{(n)} = a(a + 1)(a + 2) \cdots (a + n - 1)$ the rising factorial.
Proof. 
Using the stochastic representation given in Proposition 3 with $Y \sim FN(\lambda, 1)$ and the properties of the expectation, the expression for $E(X^r)$ follows from the binomial theorem. For the moments of $Z$,
$$\begin{aligned}
E(Z^k) &= E\!\left(\frac{Y^k}{2} + (-1)^k \frac{Y^k}{2}\right) = \frac{1}{2} \{1 + (-1)^k\}\, E(Y^k) \\
&= \frac{1}{2} \{1 + (-1)^k\} \int_{0}^{\infty} 2\, y^k\, \phi(y)\, e^{-\lambda^2/2} \cosh(\lambda y)\, dy \\
&= \frac{2^{k/2 - 1}}{\sqrt{\pi}} \{1 + (-1)^k\}\, \Gamma\!\left(\frac{k + 1}{2}\right) {}_1F_1\!\left(-\frac{k}{2}, \frac{1}{2}, -\frac{\lambda^2}{2}\right).
\end{aligned}$$
In particular, this confirms that $E(Z^k) = 0$ for all odd $k$.  □
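The moment formula of Proposition 5 can be evaluated with a short series implementation of ${}_1F_1$. The sketch below is illustrative only: `hyp1f1` is a plain truncated power series (adequate here because the series terminates when $-k/2$ is a non-positive integer), and the function names are not from the paper.

```python
import math

def hyp1f1(a, b, x, max_terms=200):
    """Truncated power series for Kummer's function 1F1(a; b; x).

    Each term is built from the previous one using the rising-factorial
    recurrence; the series terminates exactly when a is a non-positive integer.
    """
    term = 1.0
    total = 1.0
    for n in range(max_terms):
        term *= (a + n) * x / ((b + n) * (n + 1))
        total += term
        if term == 0.0:
            break
    return total

def tn_raw_moment(k, lam):
    """E(Z^k) for Z ~ TN(lam), following the formula in Proposition 5."""
    if k % 2 == 1:
        return 0.0  # the factor {1 + (-1)^k} vanishes for odd k
    return (2.0 * 2.0 ** (k / 2 - 1) / math.sqrt(math.pi)
            * math.gamma((k + 1) / 2)
            * hyp1f1(-k / 2, 0.5, -0.5 * lam * lam))

lam = 1.5
print("E(Z^2):", tn_raw_moment(2, lam), "vs 1 + lam^2 =", 1 + lam ** 2)
print("E(Z^4):", tn_raw_moment(4, lam), "vs lam^4 + 6 lam^2 + 3 =", lam ** 4 + 6 * lam ** 2 + 3)
```

The two printed comparisons use the second and fourth moments implied by the variance and kurtosis expressions for $\xi = 0$, $\eta = 1$.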
Corollary 1. 
The MGF of $X \sim TN(\theta)$ is
$$M_X(t) = e^{\eta^2 t^2/2 + \xi t} \cosh(\lambda \eta t).$$
Proof. 
Using Equation (6) and $X = \xi + \eta Z \sim TN(\theta)$, we obtain
$$E(e^{tX}) = e^{t\xi}\, E(e^{t \eta Z}) = e^{t\xi}\, M_Z(\eta t) = e^{\xi t} \cosh(\lambda \eta t)\, e^{\eta^2 t^2/2},$$
which completes the proof.  □
From the MGF, it can be observed that as $\lambda \to 0$, the MGF of the TN distribution tends to that of $N(\xi, \eta^2)$. The mean, variance, and kurtosis ($\kappa_X$) of $X \sim TN(\theta)$ are given by:
$$E(X) = \xi, \quad Var(X) = \eta^2 (1 + \lambda^2) \quad \text{and} \quad \kappa_X = \frac{\lambda^4 + 6\lambda^2 + 3}{(1 + \lambda^2)^2}.$$
These results can be obtained using the properties of the MGF and moment formula, which can be derived from the stochastic representation given in Proposition 3.
Remark 1. 
The kurtosis of $Z \sim TN(\lambda)$ tends to 3 when $\lambda \to 0$, corresponding to the standard normal distribution $N(0, 1)$, and tends to 1 as $\lambda \to \infty$. This behavior is illustrated in Figure 2, which shows the kurtosis for different values of $\lambda$. It can be shown that $1 < \kappa_X \le 3$, indicating that the TN distribution is platykurtic for $\lambda > 0$. Note that the variance increases with $\lambda$ and that the kurtosis decreases toward its limiting value of 1 as $\lambda \to \infty$.
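The kurtosis bounds can be seen from the identity $\kappa_X = 1 + (4\lambda^2 + 2)/(1 + \lambda^2)^2$, which follows by expanding the denominator. A small sketch that tabulates the variance and kurtosis expressions (the function names are illustrative):

```python
def tn_variance(eta, lam):
    """Var(X) = eta^2 * (1 + lam^2) for X ~ TN(xi, eta, lam)."""
    return eta ** 2 * (1.0 + lam ** 2)

def tn_kurtosis(lam):
    """Kurtosis (lam^4 + 6 lam^2 + 3) / (1 + lam^2)^2; it does not depend on xi or eta."""
    return (lam ** 4 + 6.0 * lam ** 2 + 3.0) / (1.0 + lam ** 2) ** 2

for lam in (0.0, 0.5, 1.0, 2.0, 5.0, 100.0):
    print(f"lambda = {lam:6.1f}  variance (eta = 1) = {tn_variance(1.0, lam):10.2f}  "
          f"kurtosis = {tn_kurtosis(lam):.4f}")
```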
Remark 2. 
The mean deviation of $Z \sim TN(\lambda)$ is $\sqrt{2/\pi}\, e^{-\lambda^2/2} + 2\lambda \Phi(\lambda) - \lambda$, so that
$$\frac{\text{Mean deviation}}{\text{Standard deviation}} = \frac{\sqrt{2/\pi}\, e^{-\lambda^2/2} + 2\lambda \Phi(\lambda) - \lambda}{\sqrt{1 + \lambda^2}}.$$
When $\lambda > 0$, the ratio is larger than the corresponding ratio for $N(0, 1)$, and it approaches $\sqrt{2/\pi} \approx 0.7979$ as $\lambda$ approaches 0, as shown in Table 1. It is worth noting that the ratio approaches 1 as $\lambda$ approaches infinity.
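The ratio in Remark 2 is straightforward to evaluate. The sketch below computes it for a few values of $\lambda$; the function name `md_sd_ratio` is an illustrative choice.

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def md_sd_ratio(lam):
    """Mean deviation divided by standard deviation for Z ~ TN(lam)."""
    mean_dev = (math.sqrt(2.0 / math.pi) * math.exp(-0.5 * lam * lam)
                + 2.0 * lam * Phi(lam) - lam)
    return mean_dev / math.sqrt(1.0 + lam * lam)

for lam in (0.0, 0.5, 1.0, 2.0, 5.0, 50.0):
    print(f"lambda = {lam:5.1f}  ratio = {md_sd_ratio(lam):.4f}")
```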

4. Estimation with Inference and a Simulation Study

This section discusses the maximum likelihood estimation (MLE) of the parameters $\xi$, $\eta$, and $\lambda$ of the TN distribution, along the lines of [14]. A simulation study was conducted to gain insight into the behavior of the resulting estimators. To find the MLE of each parameter, we used the R programming language (R Core Team [15]) with the limited-memory bound-constrained optimization algorithm of Byrd et al. [16]. Additionally, we present the observed information matrix for the distribution.

4.1. The Maximum Likelihood Estimation

Let $x = (x_1, x_2, \ldots, x_n)$ be a realization of the random sample $X = (X_1, X_2, \ldots, X_n)$, where $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables that follow the $TN(\theta)$ distribution. The log-likelihood based on the random sample can be written as follows:
$$l(\theta, z) = -n \log(2\pi)/2 - n \log \eta - n \lambda^2/2 - \sum_{i=1}^{n} z_i^2/2 + \sum_{i=1}^{n} \log(\cosh(\lambda z_i)),$$
where $z_i = (x_i - \xi)/\eta$, which is a continuous function of each parameter. As a result, the elements of the score vector $S(\theta) = (\partial l(\theta, z)/\partial \xi,\ \partial l(\theta, z)/\partial \eta,\ \partial l(\theta, z)/\partial \lambda)$ are given by
$$\begin{aligned}
\frac{\partial l(\theta, z)}{\partial \xi} &= \frac{1}{\eta} \sum_{i=1}^{n} z_i - \frac{\lambda}{\eta} \sum_{i=1}^{n} \tanh(\lambda z_i), \\
\frac{\partial l(\theta, z)}{\partial \eta} &= -\frac{n}{\eta} + \frac{1}{\eta} \sum_{i=1}^{n} z_i^2 - \frac{\lambda}{\eta} \sum_{i=1}^{n} z_i \tanh(\lambda z_i), \\
\frac{\partial l(\theta, z)}{\partial \lambda} &= -n \lambda + \sum_{i=1}^{n} z_i \tanh(\lambda z_i).
\end{aligned}$$
The MLE θ ^ solves the score equations S ( θ ) = 0 . However, closed-form expressions for the MLE do not exist, and numerical maximization of the log-likelihood function using nonlinear optimization algorithms is necessary. In this study, we used the optim function in the R programming language to achieve the maximization of the log-likelihood function, although other numerical methods, such as Nelder–Mead [17] can also be used.
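To make the log-likelihood and score concrete, the following sketch implements both in plain Python and checks the analytic score against central finite differences. It is not the authors' R code: the function names, the toy data, and the numerically stable `log_cosh` helper are assumptions made for illustration. In practice the negative log-likelihood would be passed to a numerical optimizer such as R's `optim`, as described above.

```python
import math

def log_cosh(u):
    """Numerically stable log(cosh(u)) that avoids overflow for large |u|."""
    a = abs(u)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def tn_loglik(theta, x):
    """Log-likelihood of TN(xi, eta, lam) for the observations in x."""
    xi, eta, lam = theta
    n = len(x)
    z = [(v - xi) / eta for v in x]
    return (-0.5 * n * math.log(2.0 * math.pi) - n * math.log(eta)
            - 0.5 * n * lam ** 2 - 0.5 * sum(t * t for t in z)
            + sum(log_cosh(lam * t) for t in z))

def tn_score(theta, x):
    """Analytic score vector (d/d xi, d/d eta, d/d lam) of the log-likelihood."""
    xi, eta, lam = theta
    n = len(x)
    z = [(v - xi) / eta for v in x]
    th = [math.tanh(lam * t) for t in z]
    z_th = sum(t * h for t, h in zip(z, th))
    d_xi = (sum(z) - lam * sum(th)) / eta
    d_eta = (-n + sum(t * t for t in z) - lam * z_th) / eta
    d_lam = -n * lam + z_th
    return (d_xi, d_eta, d_lam)

def numeric_score(theta, x, h=1e-6):
    """Central finite-difference approximation of the score vector."""
    grad = []
    for j in range(3):
        up = list(theta)
        down = list(theta)
        up[j] += h
        down[j] -= h
        grad.append((tn_loglik(up, x) - tn_loglik(down, x)) / (2.0 * h))
    return tuple(grad)

# Toy data and parameter values, chosen only to exercise the functions.
data = [-2.1, -1.4, 0.3, 1.8, 2.2, -1.9]
theta0 = (0.1, 1.2, 1.5)
print("analytic score:", tn_score(theta0, data))
print("numeric score :", numeric_score(theta0, data))
```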
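To obtain the standard errors of the MLE, we computed the information matrix $I(\theta)$. The elements of this matrix are given by $I(\theta) = -E(\partial^2 l(\theta, Z)/\partial \theta_i\, \partial \theta_j)$, where $i, j = 1, 2, 3$ and $\theta = (\xi, \eta, \lambda)$. The expected values of the second-order derivatives were computed as follows:
$$\begin{aligned}
-E\!\left(\frac{\partial^2 l(\theta, Z)}{\partial \xi^2}\right) &= \frac{n}{\eta^2} (1 - \lambda^2 a_0), &
-E\!\left(\frac{\partial^2 l(\theta, Z)}{\partial \xi\, \partial \eta}\right) &= 0, &
-E\!\left(\frac{\partial^2 l(\theta, Z)}{\partial \xi\, \partial \lambda}\right) &= 0, \\
-E\!\left(\frac{\partial^2 l(\theta, Z)}{\partial \eta^2}\right) &= \frac{n}{\eta^2} \left(2 + \lambda^2 (1 - a_2)\right), &
-E\!\left(\frac{\partial^2 l(\theta, Z)}{\partial \eta\, \partial \lambda}\right) &= \frac{n}{\eta}\, \lambda (1 + a_2), &
-E\!\left(\frac{\partial^2 l(\theta, Z)}{\partial \lambda^2}\right) &= n (1 - a_2),
\end{aligned}$$
where $a_r := E(Z^r\, \mathrm{sech}^2(\lambda Z))$ for $r = 0, 2$.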
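The expected-information entries rest on a handful of expectations under $Z \sim TN(\lambda)$. The following is a brief sketch of the reasoning, written here for illustration and not reproduced from the paper; it differentiates the normalizing identity $\int \cosh(\lambda z)\, \phi(z)\, dz = e^{\lambda^2/2}$ with respect to $\lambda$ and uses the evenness of the density.

```latex
\begin{aligned}
E\{Z \tanh(\lambda Z)\}
  &= e^{-\lambda^2/2} \int_{-\infty}^{\infty} z \sinh(\lambda z)\, \phi(z)\, dz
   = e^{-\lambda^2/2}\, \frac{d}{d\lambda} \int_{-\infty}^{\infty} \cosh(\lambda z)\, \phi(z)\, dz \\
  &= e^{-\lambda^2/2}\, \frac{d}{d\lambda}\, e^{\lambda^2/2} = \lambda, \\
E(Z) &= E\{\tanh(\lambda Z)\} = E\{Z\, \mathrm{sech}^2(\lambda Z)\} = 0
  \quad \text{(odd integrands against an even density)}, \\
E(Z^2) &= 1 + \lambda^2 .
\end{aligned}
```

For example, since $\partial^2 l / \partial \lambda^2 = -n + \sum_i z_i^2\, \mathrm{sech}^2(\lambda z_i)$, taking expectations gives $-E(\partial^2 l/\partial \lambda^2) = n(1 - a_2)$, and the vanishing odd expectations explain why the entries pairing $\xi$ with $\eta$ and with $\lambda$ are zero.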
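The values $a_0$ and $a_2$ were obtained using numerical integration. If necessary, this matrix can be approximated by the observed information matrix $I(\hat{\theta})$, which is defined as minus the Hessian matrix evaluated at $\hat{\theta}$, that is, $I(\hat{\theta}) \approx \left(-\partial^2 l(\theta, z)/\partial \theta_i\, \partial \theta_j\right)_{i,j}$ at $\theta = \hat{\theta}$, where the second derivatives are as given below:
$$\begin{aligned}
\frac{\partial^2 l(\theta, z)}{\partial \xi^2} &= -\frac{n}{\eta^2} + \frac{\lambda^2}{\eta^2} \sum_{i=1}^{n} \mathrm{sech}^2(\lambda z_i), \\
\frac{\partial^2 l(\theta, z)}{\partial \xi\, \partial \eta} &= -\frac{2}{\eta^2} \sum_{i=1}^{n} z_i + \frac{\lambda}{\eta^2} \sum_{i=1}^{n} \tanh(\lambda z_i) + \frac{\lambda^2}{\eta^2} \sum_{i=1}^{n} z_i\, \mathrm{sech}^2(\lambda z_i), \\
\frac{\partial^2 l(\theta, z)}{\partial \xi\, \partial \lambda} &= -\frac{1}{\eta} \sum_{i=1}^{n} \tanh(\lambda z_i) - \frac{\lambda}{\eta} \sum_{i=1}^{n} z_i\, \mathrm{sech}^2(\lambda z_i), \\
\frac{\partial^2 l(\theta, z)}{\partial \eta^2} &= \frac{n}{\eta^2} - \frac{3}{\eta^2} \sum_{i=1}^{n} z_i^2 + \frac{2\lambda}{\eta^2} \sum_{i=1}^{n} z_i \tanh(\lambda z_i) + \frac{\lambda^2}{\eta^2} \sum_{i=1}^{n} z_i^2\, \mathrm{sech}^2(\lambda z_i), \\
\frac{\partial^2 l(\theta, z)}{\partial \eta\, \partial \lambda} &= -\frac{1}{\eta} \sum_{i=1}^{n} z_i \tanh(\lambda z_i) - \frac{\lambda}{\eta} \sum_{i=1}^{n} z_i^2\, \mathrm{sech}^2(\lambda z_i), \\
\frac{\partial^2 l(\theta, z)}{\partial \lambda^2} &= -n + \sum_{i=1}^{n} z_i^2\, \mathrm{sech}^2(\lambda z_i).
\end{aligned}$$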
Thus, the information matrix is used for computing the standard errors for the remainder of the paper.

4.2. Simulation Study

To evaluate the performance of the MLEs $\hat{\xi}$, $\hat{\eta}$, and $\hat{\lambda}$, we conducted a numerical experiment using the R programming language. We considered 5000 Monte Carlo replications, with sample sizes $n \in \{50, 100, 150, 200, 300, 500\}$ and shape parameter values $\lambda \in \{0.75, 1.25, 1.75, 2.25, 2.75, 3.50, 5.50, 7.50\}$. For simplicity, we fixed $(\xi, \eta)$ at $(0, 1)$ in all experiments, as these are location and scale parameters, respectively. Table 2 presents the empirical mean, absolute value of the bias, and root mean squared error (RMSE) of the estimates of the TN distribution parameters.
Random numbers $X \sim TN(\xi, \eta, \lambda)$ can be generated by the following algorithm:
(a) Choose the values $\xi$, $\eta$, $\lambda$, and the sample size $n$.
(b) Generate $U \sim N(\lambda, 1)$.
(c) Compute $Y = |U|$.
(d) Generate $S \sim Ber(1/2)$.
(e) If $S = 1$, compute $Z = Y$; else compute $Z = -Y$.
(f) Compute $X = \xi + \eta Z$,
where $Ber(1/2)$ is the Bernoulli distribution with probability of success $p = 1/2$.
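Steps (a) to (f) translate directly into code. The sketch below is a plain-Python rendering of the algorithm, not the authors' R implementation; the function name `rtn` and the optional `rng` argument are illustrative.

```python
import random
import statistics

def rtn(n, xi=0.0, eta=1.0, lam=1.0, rng=None):
    """Draw n values from TN(xi, eta, lam) using the folded-normal representation."""
    rng = rng or random.Random()
    sample = []
    for _ in range(n):
        u = rng.gauss(lam, 1.0)            # step (b): U ~ N(lam, 1)
        y = abs(u)                         # step (c): Y = |U|
        s = 1 if rng.random() < 0.5 else 0 # step (d): S ~ Ber(1/2)
        z = y if s == 1 else -y            # step (e): attach a random sign
        sample.append(xi + eta * z)        # step (f): location-scale transform
    return sample

draws = rtn(20000, xi=0.0, eta=1.0, lam=2.0, rng=random.Random(2023))
print("sample mean    :", statistics.fmean(draws))
print("sample variance:", statistics.pvariance(draws), "(theory: eta^2 (1 + lam^2) = 5)")
```

Passing a seeded `random.Random` instance keeps a simulation reproducible, which is convenient when replicating a Monte Carlo study.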
Table 2 shows that, in general, the bias decreases as n increases, indicating that the MLE λ ^ is asymptotically unbiased. Additionally, we observe that the RMSE decreases as n increases in all cases, indicating that the MLE of the TN distribution parameters has good precision, as expected.

5. Practical Data Illustrations

5.1. Illustration 1: Data Fitting

As a first illustration, the TN distribution was fitted to the variable X = b.weight, the ultrasound (pre-birth) fetal weight in grams, using a sample of size 500. The data are available at http://www.mat.uda.cl/hsalinas/cursos/2011/R/weight.rar. Some descriptive statistics for this variable are given in Table 3.
The analysis of the variable in question reveals that its distribution is approximately symmetric. Furthermore, as depicted in Figure 3a, the histogram of the variable exhibits a bimodal pattern, much like the behavior of ultrasound weight. Consequently, the TN distribution seems to be a suitable choice for modeling this set of observations.
Interestingly, these findings stand in contrast to the results of previous studies that utilized other distributions, such as the bimodal normal (BN) distribution used by Elal-Olivero [2], the two-piece skew-normal (TSN) distributions shown in Kim [18], and a mixture of normals (MN). In this study, we used various information criteria to identify the best-fitting model for the data.
(a) Akaike information criterion (AIC), given by $\mathrm{AIC} = -2\hat{l} + 2k$.
(b) Bayesian information criterion (BIC), given by $\mathrm{BIC} = -2\hat{l} + k \log(n)$.
(c) Corrected AIC (AICc), given by $\mathrm{AICc} = \mathrm{AIC} + \dfrac{2k(k + 1)}{n - k - 1}$,
where $\hat{l}$ is the estimated log-likelihood value, $n$ is the number of data points, and $k$ is the number of parameters in the model.
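The three criteria are simple functions of the maximized log-likelihood. A minimal sketch follows; the function name and the input values are hypothetical and are not results from the paper.

```python
import math

def information_criteria(loglik, k, n):
    """Return AIC, BIC, and AICc for a model with maximized log-likelihood loglik,
    k estimated parameters, and n observations."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n)
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    return {"AIC": aic, "BIC": bic, "AICc": aicc}

# Hypothetical inputs, for illustration only.
print(information_criteria(loglik=-100.0, k=3, n=50))
```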
A good fitting model should have a small information criterion. The results of the fitted distributions are presented in Table 3. The TN distribution with the smallest AIC, BIC, and AICc values outperformed all of the considered competing distributions, indicating that it is the best fit for the data. This is further supported by Figure 3c, which clearly shows how well the CDF of the estimated TN distribution matches the empirical CDF. Table 4 contains the MLE of the BN, TSN, MN, and TN models for comparison.
The AIC, AICc, and BIC consistently suggest that the TN model outperforms the other models considered in fitting the ultrasound weight distribution. This conclusion is supported by the histogram in Figure 3a, which displays the fitted BN, MN, TSN, and TN distributions based on their respective MLEs. Moreover, the Q-Q plot of the TN distribution and the CDF plots of both the TN and MN models in Figure 3b,c provide further evidence that the TN model is the best fit for the set of ultrasound weight observations. In addition, the Kolmogorov–Smirnov goodness-of-fit test yielded a test statistic of K = 0.034 with a p-value of 0.9347, so the test provides no evidence against the TN model as a representation of the data.

5.2. Illustration 2: Regression Analysis

The TN distribution was extended to the case in which covariates explain the response variable $Y$, for example, through a linear regression model
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \epsilon_i, \quad i = 1, 2, \ldots, n,$$
where $x_1, x_2, \ldots, x_p$ is a set of covariates, $\beta_0, \beta_1, \ldots, \beta_p$ is a set of unknown parameters, and $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ are random variables representing the model errors. The most common assumption is that the $\epsilon_i$ are independent and identically distributed random variables, having a normal distribution with zero mean and constant variance $\sigma^2$.
However, this assumption is often not satisfied in practice. Therefore, it was assumed that the $\epsilon_i$ are independent and identically distributed as $TN(0, \eta, \lambda)$, so that $E(\epsilon_i) = 0$ and $Var(\epsilon_i) = \eta^2 (1 + \lambda^2)$. In this case, the following applies:
$$E(Y_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} = x_i^{\top} \beta.$$
For this illustration, a set of observations corresponding to biaxial fatigue data were analyzed, as used by Rieck and Nederman [19], using a Birnbaum–Saunders log-linear model. These authors fit the linear regression model as follows:
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, 3, \ldots, 46.$$
The data presented in this study represent the results of fatigue tests conducted on cylindrical specimens of 1% Cr-Mo-V steel. The specimens were subjected to combined axial and torsional loads, using constant amplitude cycles until failure occurred. The study focuses on two variables: X, which corresponds to the logarithm of the work per cycle, and Y, which represents the logarithm of the number of cycles to failure. To demonstrate the effectiveness of various regression models, we have employed the TN, N, SNC, and SHN models. The SHN model is a nonlinear transformation of a normal distribution and was introduced by Rieck and Nederman [19]. By comparing the results of these models, we can better understand their applicability and effectiveness for this type of analysis.
Based on the results presented in Table 5, we observe that the TN and SHN regression models provide a good fit for the set of biaxial fatigue observations, outperforming the N and SNC models. Furthermore, the TN model is found to be the most suitable based on the AIC and AICc criteria. Although the BIC of the TN model is slightly higher than that of the SHN model, the difference is small and is likely due to the TN model having more parameters, which the BIC penalizes more heavily. Nevertheless, the TN regression model proves to be a promising alternative to models discussed in previous research.
In Figure 4, we can see the scatter plot of the response variable, y, plotted against the explanatory variable x (logarithm of the work per cycle), along with the fitted model line. The plot clearly shows a good fit, which ensures that the model can be effectively used for making predictions. The TN distribution accounts for the random part, while the trend line explains the systematic part of the model. Thus, we can have confidence in the model’s ability to explain and predict the relationship between the variables.
In order to identify atypical observations and/or model misspecification, the transformation of the martingale residual, $r_{MT_i}$, as proposed by Barros et al. [20], was analyzed. These residuals can be defined as follows:
$$r_{MT_i} = \mathrm{sgn}(r_{M_i}) \sqrt{-2 \left[ r_{M_i} + \kappa_i \log(\kappa_i - r_{M_i}) \right]}, \quad i = 1, 2, \ldots, n,$$
where $r_{M_i} = \kappa_i + \log(S(e_i; \hat{\theta}))$ is the martingale residual proposed by Ortega et al. [21], $\kappa_i = 0, 1$ indicates whether the $i$-th observation is censored or not, respectively, $\mathrm{sgn}(r_{M_i})$ denotes the sign of $r_{M_i}$, and $S(e_i; \hat{\theta})$ represents the survival function evaluated at $e_i$ (standardized classical residuals), where $\hat{\theta}$ is the MLE of $\theta$.
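The residual definition can be sketched in code as follows. This is an illustration under explicit assumptions: the survival function is taken to be that of the standardized TN distribution, $S(e; \lambda) = 1 - F_Z(e; \lambda)$, and the function names are hypothetical; the paper does not give an implementation.

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tn_survival(e, lam):
    """Assumed survival function 1 - F_Z(e; lam) of the standardized TN distribution."""
    return 1.0 - 0.5 * (Phi(e - lam) + Phi(e + lam))

def martingale_residual(e, lam, kappa=1):
    """r_M = kappa + log S(e); kappa = 1 for an uncensored observation, 0 for censored."""
    return kappa + math.log(tn_survival(e, lam))

def martingale_type_residual(e, lam, kappa=1):
    """r_MT = sgn(r_M) * sqrt(-2 * [r_M + kappa * log(kappa - r_M)])."""
    r = martingale_residual(e, lam, kappa)
    # For kappa = 0 the second term is dropped (it is multiplied by zero).
    inner = r + (kappa * math.log(kappa - r) if kappa else 0.0)
    # Guard against tiny positive values of inner caused by rounding.
    return math.copysign(math.sqrt(max(0.0, -2.0 * inner)), r)

for e in (-1.5, 0.0, 1.5):
    print(f"e = {e:5.1f}  r_MT = {martingale_type_residual(e, lam=1.5):.4f}")
```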
In order to verify the model’s assumptions, as well as error distribution, fit problems, and the presence of possible outliers, confidence bands were generated using simulations taken from the literature of diagnostic analysis and used as envelopes. The plots of r M T i with generated confidence envelopes are presented in Figure 5. From this figure, it can be seen clearly that the TN regression model is a better fit for the data than the other models, and the distribution of the set of points with regard to the boundary bands supports this finding.

6. Conclusions

Although the normal distribution is widely used in statistics, it falls short in describing many natural phenomena, especially when it comes to kurtosis and multimodality. To address these issues, a new family of distributions called the two-piece normal (TN) has been introduced in this paper. The TN distribution is a symmetric and continuous distribution with a shape parameter that controls whether the density is unimodal or bimodal, and the normal distribution is a special case. The TN distribution is a weighted version of the normal distribution based on the “cosh” weight function, which minimizes bias and adds features, such as bimodality, that are not present in the normal distribution. Moreover, the TN distribution is a solution to the stochastic partial differential equation of Beneš and has several desirable properties. The TN parameters can be estimated by maximum likelihood, and the behavior of the estimators was assessed by Monte Carlo simulation. The advantages of the TN model over other models in the literature were demonstrated through its application to two practical datasets. Furthermore, the TN distribution has promising applications in various fields, and it can be extended to tri-modality to model real-world data with up to three modes. Overall, the TN distribution is a strong alternative for modeling symmetric data with one or two modes, and it is expected to become increasingly popular for modeling non-Gaussian time series.

Author Contributions

Conceptualization, H.S., H.B. and G.M.-F.; methodology, H.S. and G.M.-F.; software, H.S. and G.M.-F.; validation, H.S., H.B., N.Q. and G.M.-F.; writing—original draft preparation, H.S. and G.M.-F.; writing—review and editing, H.S., H.B., N.Q. and G.M.-F.; visualization, H.S. and G.M.-F.; funding acquisition, N.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

We make use of publicly available data, which can be downloaded from http://www.mat.uda.cl/hsalinas/cursos/2011/R/weight.rar for the first application. For the second application, the data is available from the corresponding author upon reasonable request.

Acknowledgments

The authors gratefully acknowledge Princess Nourah Bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R376), Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia for the financial support for this project.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178.
  2. Elal-Olivero, D. Alpha-skew-normal distribution. Proyecc. J. Math. 2010, 29, 224–240.
  3. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. The Log-Linear Birnbaum-Saunders Power Model. Methodol. Comput. Appl. Probab. 2017, 19, 913–933.
  4. Cortés, M.A.; Elal-Olivero, D.; Olivares-Pacheco, J.F. A new class of distributions generated by the extended bimodal-normal distribution. J. Probab. Stat. 2018, 2018, 9753439.
  5. Hoxhaj, V.; Khattree, R. A study of geometric and statistical curvatures for the skew-normal family of distributions. Commun. Stat.-Simul. Comput. 2018, 47, 2010–2022.
  6. Elal-Olivero, D.; Olivares-Pacheco, J.F.; Venegas, O.; Bolfarine, H.; Gomez, H.W. On properties of the bimodal skew-normal distribution and an application. Mathematics 2020, 8, 703.
  7. Martínez-Flórez, G.; Elal-Olivero, D.; Barrera-Causil, C. Extended Generalized Sinh-Normal Distribution. Mathematics 2021, 9, 2793.
  8. Martinez-Florez, G.; Tovar-Falon, R.; Elal-Olivero, D. Some new flexible classes of normal distribution for fitting multimodal data. Statistics 2022, 56, 182–205.
  9. Tsagris, M.; Beneki, C.; Hassani, H. On the folded normal distribution. Mathematics 2014, 2, 12–28.
  10. Reig, J.; Rodrigo Peñarrocha, V.M.; Rubio Arjona, L.; Martínez-Inglés, M.T.; Molina-García-Pardo, J.M. The Folded Normal Distribution: A New Model for the Small-Scale Fading in Line-of-Sight (LOS) Condition. IEEE Access 2019, 7, 77328–77339.
  11. Leone, F.C.; Nelson, L.S.; Nottingham, R.B. The folded normal distribution. Technometrics 1961, 3, 543–550.
  12. Sarkka, S.; Solin, A. Applied Stochastic Differential Equations; Cambridge University Press: Cambridge, UK, 2003; ISBN 9781108186735.
  13. Abad, J.; Sesma, J. Computation of the regular confluent hypergeometric function. Math. J. 1995, 5, 74–76.
  14. Wang, S.; Chen, W.X.; Chen, M.; Zhou, Y.W. Maximum likelihood estimation of the parameters of the inverse Gaussian distribution using maximum rank set sampling with unequal samples. Math. Popul. Stud. 2023, 30, 1–12.
  15. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022; Available online: https://www.R-project.org/ (accessed on 29 January 2023).
  16. Byrd, R.H.; Lu, P.; Nocedal, J.; Zhu, C. A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 1995, 16, 1190–1208.
  17. Nelder, J.A.; Mead, R. A simplex algorithm for function minimization. Comput. J. 1965, 7, 308–313.
  18. Kim, H.J. On a class of two-piece skew-normal distributions. Statistics 2005, 39, 537–553.
  19. Rieck, J.R.; Nedelman, J.R. A log-linear model for the Birnbaum-Saunders distribution. Technometrics 1991, 33, 51–60.
  20. Barros, M.; Galea, M.; Gonzalez, M.; Leiva, V. Influence diagnostics in the tobit censored response model. Stat. Methods Appl. 2010, 19, 379–397.
  21. Ortega, E.M.; Bolfarine, H.; Paula, G.A. Influence diagnostics in generalized log-gamma regression models. Comput. Stat. Data Anal. 2003, 42, 165–186.

Figure 1. Plot of density function of the TN for different values of λ.

Figure 2. Plot of kurtosis coefficient of TN for different values of λ.

Figure 3. (a) Histogram of ultrasound weight variable and models fitted with the MLEs for TN, MN, TSN, and BN, respectively. (b) qq-plot for the TN model. (c) Empirical CDF and CDFs of the MN and TN models.

Figure 4. Plot of Y against X, TN model (black solid line).

Figure 5. Confidence envelope plots for the fitted models (a) N, (b) SHN, and (c) TN.


Table 1. Ratio of mean deviation to standard deviation for the TN distribution.

| λ | 10 | 3 | 2.5 | 2 | 1.5 | 1 | 0.8 | 0.6 | 0.4 | 0.2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Ratio | 0.9950 | 0.9489 | 0.9299 | 0.9020 | 0.8645 | 0.8249 | 0.8124 | 0.8037 | 0.7993 | 0.7979 |

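
As a quick reading aid for Table 1, the ratio of mean deviation to standard deviation of a normal distribution has the closed form $\sqrt{2/\pi}$. The minimal Python sketch below evaluates it and compares it with the table entry for λ = 0.2. The variable names are illustrative and not taken from the paper, and the comparison is only a numerical observation about the tabulated value.

```python
import math

# For N(mu, sigma^2) the mean absolute deviation is sigma * sqrt(2/pi),
# so the ratio to the standard deviation is free of mu and sigma.
normal_ratio = math.sqrt(2.0 / math.pi)

# Entry of Table 1 for lambda = 0.2 (copied from the table).
tn_ratio_lambda_02 = 0.7979

print(round(normal_ratio, 4))
print(abs(normal_ratio - tn_ratio_lambda_02) < 5e-5)
```

To four decimals the two values agree, which is in line with the normal distribution being a sub-model of the proposed class.
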
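
Table 2. Empirical mean, $\lvert\mathrm{Bias}\rvert$, and RMSE for the TN model.

| λ | n | $\hat{\xi}$ Mean | $\hat{\xi}$ $\lvert\mathrm{Bias}\rvert$ | $\hat{\xi}$ RMSE | $\hat{\eta}$ Mean | $\hat{\eta}$ $\lvert\mathrm{Bias}\rvert$ | $\hat{\eta}$ RMSE | $\hat{\lambda}$ Mean | $\hat{\lambda}$ $\lvert\mathrm{Bias}\rvert$ | $\hat{\lambda}$ RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.75 | 50 | 0.0024 | 0.0024 | 0.1827 | 0.9620 | 0.0380 | 0.1998 | 0.7869 | 0.0369 | 0.4608 |
| 0.75 | 100 | 0.0018 | 0.0018 | 0.1271 | 0.9888 | 0.0112 | 0.163 | 0.7103 | 0.0397 | 0.3770 |
| 0.75 | 150 | 0.0012 | 0.0012 | 0.1013 | 1.0038 | 0.0038 | 0.1428 | 0.7147 | 0.0353 | 0.3358 |
| 0.75 | 200 | −0.0009 | 0.0009 | 0.0893 | 1.0038 | 0.0038 | 0.1310 | 0.7167 | 0.0333 | 0.3053 |
| 0.75 | 300 | 0.0007 | 0.0007 | 0.0721 | 1.0028 | 0.0028 | 0.1142 | 0.7177 | 0.0323 | 0.2667 |
| 0.75 | 500 | −0.0005 | 0.0005 | 0.0559 | 1.0024 | 0.0024 | 0.0953 | 0.7208 | 0.0292 | 0.2183 |
| 1.25 | 50 | −0.0026 | 0.0026 | 0.2059 | 0.9909 | 0.0091 | 0.1757 | 1.2736 | 0.0236 | 0.3674 |
| 1.25 | 100 | −0.0024 | 0.0024 | 0.1427 | 0.9932 | 0.0068 | 0.1200 | 1.2663 | 0.0163 | 0.2426 |
| 1.25 | 150 | 0.0022 | 0.0022 | 0.1147 | 0.9935 | 0.0065 | 0.0937 | 1.2645 | 0.0145 | 0.1873 |
| 1.25 | 200 | −0.0022 | 0.0022 | 0.1005 | 0.9963 | 0.0037 | 0.0800 | 1.2580 | 0.0080 | 0.1596 |
| 1.25 | 300 | 0.0007 | 0.0007 | 0.0802 | 0.9960 | 0.0040 | 0.0628 | 1.2579 | 0.0079 | 0.1256 |
| 1.25 | 500 | 0.0002 | 0.0002 | 0.0630 | 0.9999 | 0.0001 | 0.0494 | 1.2505 | 0.0005 | 0.0982 |
| 1.75 | 50 | −0.0029 | 0.0029 | 0.1885 | 0.9782 | 0.0218 | 0.1245 | 1.8084 | 0.0584 | 0.2971 |
| 1.75 | 100 | 0.0024 | 0.0024 | 0.1259 | 0.9852 | 0.0148 | 0.0841 | 1.7871 | 0.0371 | 0.1956 |
| 1.75 | 150 | 0.0014 | 0.0014 | 0.1030 | 0.9912 | 0.0088 | 0.0684 | 1.7734 | 0.0234 | 0.1606 |
| 1.75 | 200 | 0.0007 | 0.0007 | 0.0881 | 0.9931 | 0.0069 | 0.0595 | 1.7696 | 0.0196 | 0.1387 |
| 1.75 | 300 | 0.0007 | 0.0007 | 0.0722 | 0.9959 | 0.0041 | 0.0483 | 1.7614 | 0.0114 | 0.1108 |
| 1.75 | 500 | 0.0005 | 0.0005 | 0.0561 | 0.9973 | 0.0027 | 0.0370 | 1.7568 | 0.0068 | 0.0860 |
| 2.25 | 50 | 0.0017 | 0.0017 | 0.1603 | 0.9749 | 0.0251 | 0.1086 | 2.3349 | 0.0849 | 0.3094 |
| 2.25 | 100 | −0.0011 | 0.0011 | 0.1095 | 0.9885 | 0.0115 | 0.0762 | 2.2870 | 0.0370 | 0.2115 |
| 2.25 | 150 | 0.0010 | 0.0010 | 0.0913 | 0.9913 | 0.0087 | 0.0610 | 2.2772 | 0.0272 | 0.1663 |
| 2.25 | 200 | 0.0003 | 0.0003 | 0.0799 | 0.9940 | 0.0060 | 0.0530 | 2.2690 | 0.0190 | 0.1438 |
| 2.25 | 300 | −0.0003 | 0.0003 | 0.0652 | 0.9961 | 0.0039 | 0.0431 | 2.2620 | 0.0120 | 0.1174 |
| 2.25 | 500 | 0.0000 | 0.0000 | 0.0495 | 0.9979 | 0.0021 | 0.0337 | 2.2571 | 0.0071 | 0.0896 |
| 2.75 | 50 | −0.0010 | 0.0010 | 0.1504 | 0.9766 | 0.0234 | 0.1045 | 2.8469 | 0.0969 | 0.3530 |
| 2.75 | 100 | 0.0013 | 0.0013 | 0.1046 | 0.9877 | 0.0123 | 0.0729 | 2.7987 | 0.0487 | 0.2368 |
| 2.75 | 150 | 0.0012 | 0.0012 | 0.0849 | 0.9927 | 0.0073 | 0.0596 | 2.7795 | 0.0295 | 0.1881 |
| 2.75 | 200 | 0.0002 | 0.0002 | 0.0739 | 0.9937 | 0.0063 | 0.0514 | 2.7747 | 0.0247 | 0.1622 |
| 2.75 | 300 | 0.0000 | 0.0000 | 0.0594 | 0.9954 | 0.0046 | 0.0418 | 2.7671 | 0.0171 | 0.1305 |
| 2.75 | 500 | −0.0001 | 0.0001 | 0.0464 | 0.9979 | 0.0021 | 0.0322 | 2.7596 | 0.0096 | 0.1012 |
| 3.50 | 50 | −0.0020 | 0.0020 | 0.1430 | 0.9766 | 0.0234 | 0.1030 | 3.6225 | 0.1225 | 0.4255 |
| 3.50 | 100 | 0.0006 | 0.0006 | 0.1003 | 0.9878 | 0.0122 | 0.0705 | 3.5596 | 0.0596 | 0.2760 |
| 3.50 | 150 | 0.0006 | 0.0006 | 0.0824 | 0.9910 | 0.0090 | 0.0589 | 3.5447 | 0.0447 | 0.2319 |
| 3.50 | 200 | −0.0004 | 0.0004 | 0.0711 | 0.9934 | 0.0066 | 0.0508 | 3.5315 | 0.0315 | 0.1980 |
| 3.50 | 300 | −0.0001 | 0.0001 | 0.0579 | 0.9965 | 0.0035 | 0.0409 | 3.5188 | 0.0188 | 0.1561 |
| 3.50 | 500 | −0.0001 | 0.0001 | 0.0452 | 0.9979 | 0.0021 | 0.0316 | 3.5118 | 0.0118 | 0.1205 |
| 5.50 | 50 | 0.0014 | 0.0014 | 0.1425 | 0.9763 | 0.0237 | 0.1021 | 5.6955 | 0.1955 | 0.6362 |
| 5.50 | 100 | −0.0013 | 0.0013 | 0.0997 | 0.9876 | 0.0124 | 0.0719 | 5.5985 | 0.0985 | 0.4298 |
| 5.50 | 150 | 0.0013 | 0.0013 | 0.0827 | 0.9910 | 0.0090 | 0.0591 | 5.5679 | 0.0679 | 0.3476 |
| 5.50 | 200 | 0.0009 | 0.0009 | 0.0709 | 0.9934 | 0.0066 | 0.0509 | 5.5489 | 0.0489 | 0.2961 |
| 5.50 | 300 | −0.0004 | 0.0004 | 0.0575 | 0.9958 | 0.0042 | 0.0405 | 5.5321 | 0.0321 | 0.2330 |
| 5.50 | 500 | 0.0003 | 0.0003 | 0.0450 | 0.9980 | 0.0020 | 0.0318 | 5.5174 | 0.0174 | 0.1812 |
| 7.50 | 50 | 0.0024 | 0.0024 | 0.1426 | 0.9746 | 0.0254 | 0.1031 | 7.7772 | 0.2772 | 0.8682 |
| 7.50 | 100 | −0.0024 | 0.0024 | 0.1012 | 0.9888 | 0.0112 | 0.0712 | 7.6243 | 0.1243 | 0.5719 |
| 7.50 | 150 | 0.0014 | 0.0014 | 0.0813 | 0.9921 | 0.0079 | 0.0586 | 7.5866 | 0.0866 | 0.4630 |
| 7.50 | 200 | −0.0008 | 0.0008 | 0.0716 | 0.9936 | 0.0064 | 0.0502 | 7.5636 | 0.0636 | 0.3925 |
| 7.50 | 300 | 0.0003 | 0.0003 | 0.0569 | 0.9949 | 0.0051 | 0.0407 | 7.5504 | 0.0504 | 0.3168 |
| 7.50 | 500 | −0.0001 | 0.0001 | 0.0447 | 0.9976 | 0.0024 | 0.0319 | 7.5263 | 0.0263 | 0.2460 |
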
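
The three summaries reported in Table 2 can be computed from a set of replicate estimates as sketched below. This is a minimal illustration that assumes the usual definitions (bias as the absolute difference between the empirical mean and the true value, RMSE taken around the true value). The function name `summarize_estimates` and the toy inputs are hypothetical and are not the authors' simulation code or data.

```python
import math

def summarize_estimates(estimates, true_value):
    """Return (empirical mean, |bias|, RMSE) of replicate estimates."""
    n_rep = len(estimates)
    mean = sum(estimates) / n_rep
    abs_bias = abs(mean - true_value)
    # RMSE measured around the true parameter value (assumed definition).
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n_rep)
    return mean, abs_bias, rmse

# Hypothetical replicate estimates of lambda (NOT values from the paper).
toy_estimates = [0.70, 0.80, 0.90]
mean, abs_bias, rmse = summarize_estimates(toy_estimates, true_value=0.75)
print(mean, abs_bias, rmse)
```

The tabulated values follow the same pattern: for λ = 0.75 and n = 50, the empirical mean of the λ estimates is 0.7869 and the reported absolute bias is 0.0369, which is |0.7869 − 0.75| after rounding.
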

Table 3. Descriptive statistics for the variable b.weight.

| n | Mean | Variance | Median | Skewness | Kurtosis |
|---|---|---|---|---|---|
| 500 | 3210.356 | 695710.6 | 3175 | 0.0714 | 2.068 |


Table 4. MLEs for the ultrasound weight data with the corresponding standard errors (in parentheses), and AIC, AICc, and BIC values.

| Estimates | BN | TSN | MN | TN |
|---|---|---|---|---|
| $\hat{\xi}$; $\hat{\mu}_1$ | 3201.66 (5.620) | 3207.42 (26.03) | 2514.5 (58.2) | 3223.49 (27.59) |
| $\hat{\eta}$; $\hat{\sigma}_1^2$ | 481.088 (8.782) | 772.68 (25.76) | 196,301.1 (36,787.4) | 466.23 (20.09) |
| $\hat{\mu}_2$ | | | 3896.8 (65.6) | |
| $\hat{\sigma}_2^2$ | | | 236,577.6 (42,231.6) | |
| $\hat{\lambda}$; $\hat{p}$ | | 1.771 (0.539) | 0.4967 (0.0442) | 1.4815 (0.0904) |
| AIC | 8238.39 | 8109.67 | 8097.09 | 8094.22 |
| AICc | 8240.43 | 8111.75 | 8099.26 | 8096.34 |
| BIC | 8246.82 | 8122.31 | 8118.16 | 8106.86 |

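
One way to read the criteria in Table 4 is through the standard definitions AIC = −2ℓ + 2k and BIC = −2ℓ + k ln n, which give BIC − AIC = k(ln n − 2). The Python sketch below uses this identity to recover the implied number of parameters k for each model and to rank the models. It assumes that the n = 500 reported in Table 3 is the sample size behind Table 4. The helper `implied_num_params` is a hypothetical name introduced for illustration.

```python
import math

n = 500  # sample size reported in Table 3 (assumed to apply to Table 4)

# (AIC, BIC) pairs copied from Table 4.
criteria = {
    "BN": (8238.39, 8246.82),
    "TSN": (8109.67, 8122.31),
    "MN": (8097.09, 8118.16),
    "TN": (8094.22, 8106.86),
}

def implied_num_params(aic, bic, n):
    """Solve BIC - AIC = k * (ln(n) - 2) for k."""
    return (bic - aic) / (math.log(n) - 2.0)

implied_k = {m: implied_num_params(a, b, n) for m, (a, b) in criteria.items()}
for model, k in implied_k.items():
    print(model, round(k, 2))

# Smaller values indicate the preferred model under each criterion.
best_by_aic = min(criteria, key=lambda m: criteria[m][0])
best_by_bic = min(criteria, key=lambda m: criteria[m][1])
print(best_by_aic, best_by_bic)
```

The implied parameter counts round to 2, 3, 5, and 3 for BN, TSN, MN, and TN, matching the number of estimates listed for each model, and TN attains the smallest AIC and BIC among the four.
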

Table 5. MLEs for the biaxial fatigue data with the corresponding standard errors (in parentheses), and AIC, AICc, and BIC values.

| Estimates | N | SNC | SHN | TN |
|---|---|---|---|---|
| $\hat{\beta}_0$ | 12.289 (0.405) | 12.419 (0.320) | 12.279 (0.389) | 12.235 (0.309) |
| $\hat{\beta}_1$ | −1.672 (0.112) | −1.706 (0.088) | −1.670 (0.108) | −1.665 (0.086) |
| $\hat{\eta}$ | 0.4088 (0.042) | 0.416 (0.050) | | 0.245 (0.034) |
| $\hat{\lambda}$ | | −0.698 (0.294) | 0.410 (0.042) | 1.309 (0.302) |
| AIC | 53.25 | 54.00 | 52.74 | 51.39 |
| AICc | 56.22 | 57.50 | 55.71 | 54.84 |
| BIC | 58.73 | 59.32 | 58.22 | 58.70 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Salinas, H.; Bakouch, H.; Qarmalah, N.; Martínez-Flórez, G. A Flexible Class of Two-Piece Normal Distribution with a Regression Illustration to Biaxial Fatigue Data. Mathematics 2023, 11, 1271. https://doi.org/10.3390/math11051271
