Article

A New Generalization of the Student’s t Distribution with an Application in Quantile Regression

Departamento de Matemáticas, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1270300, Chile
*
Author to whom correspondence should be addressed.
Symmetry 2021, 13(12), 2444; https://doi.org/10.3390/sym13122444
Submission received: 14 November 2021 / Revised: 5 December 2021 / Accepted: 11 December 2021 / Published: 17 December 2021

Abstract

In this work, we present a new generalization of the Student's t distribution. The new distribution is obtained as the quotient of two independent random variables: a standard normal random variable divided by a power of a chi-square random variable divided by its degrees of freedom. The resulting symmetric distribution has heavier tails than the Student's t distribution and than extensions of the slash distribution. We develop a procedure for using quantile regression when the response variable or the residuals have high kurtosis. We give the density function as an integral, derive some important properties, and present procedures for inference, such as moment and maximum likelihood estimators. By way of illustration, we carry out two applications using real data. In the first, we provide maximum likelihood estimates for the parameters of the generalized Student's t distribution, the Student's t distribution, the extended slash distribution, the modified slash distribution, the generalized slash Student's t distribution, and the double slash distribution. In the second, we perform quantile regression to fit a model in which the response variable presents high kurtosis.

1. Introduction

The slash distribution is the distribution of the quotient of two independent random variables, one with a standard normal distribution and the other based on a uniform distribution on the interval (0, 1). It has the stochastic representation
$$Y = \sigma \frac{X}{U^{1/q}} + \mu, \tag{1}$$
where $X \sim N(0,1)$ and $U \sim U(0,1)$ are independent, $\mu \in \mathbb{R}$ is the location parameter, $\sigma > 0$ is the scale parameter, and $q > 0$ is the parameter related to kurtosis. We denote this by $Y \sim S(\mu, \sigma, q)$. For $y \neq \mu$, its density function is
$$f_Y(y) = \frac{q\,2^{\frac{q}{2}-1}}{\sigma\sqrt{\pi}} \left|\frac{y-\mu}{\sigma}\right|^{-(q+1)} \left[\Gamma\!\left(\frac{q+1}{2}\right) - \Gamma\!\left(\frac{q+1}{2}, \frac{(y-\mu)^2}{2\sigma^2}\right)\right],$$
where $\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t}\,dt$ is the gamma function and $\Gamma(a, x) = \int_x^{\infty} t^{a-1} e^{-t}\,dt$ is the upper incomplete gamma function. This distribution has heavier tails than the normal distribution, that is, it has more kurtosis. Properties of this family are discussed in Rogers and Tukey [1] and Mosteller and Tukey [2].
Maximum likelihood estimators for the location and scale parameters are discussed in Kafadar [3]. Wang and Genton [4] described multivariate symmetric and skew-multivariate extensions of the slash distribution, while Gómez et al. [5] (and Erratum in Gómez and Venegas, 2008) extended the slash distribution by introducing the slash-elliptical family; an asymmetric version of this family is discussed by Arslan [6]. Genc [7] discussed a symmetric generalization of the slash distribution. More recently, Gómez et al. [8] used the slash-elliptical family to extend the Birnbaum–Saunders distribution.
Setting $\mu = 0$ and $\sigma = 1$ in (1) gives the standard slash distribution. If, in addition, $q = 1$, we obtain the canonical slash distribution. When $q$ tends to infinity, the standard normal distribution is recovered.
When $U \sim \mathrm{Exp}(2)$ in (1), the resulting distribution is called the modified slash distribution, studied by Reyes et al. [9]. In the standard case its density function is
$$f_X(x) = \frac{2}{\sqrt{2\pi}} \int_0^{\infty} v^{\frac{1}{q}}\, e^{-\frac{1}{2} x^2 v^{\frac{2}{q}} - 2v}\,dv, \quad q > 0,\ x \in \mathbb{R},$$
and it is denoted by $X \sim MS(0, 1, q)$, where $q$ is the kurtosis parameter.
When $U \sim B(\alpha, \beta)$ and $q = 1$ in (1), the resulting distribution is called the extended slash (ES) distribution, studied by Rojas et al. [10]. Its density function is
$$f_Y(y; \mu, \sigma, \alpha, \beta) = \frac{1}{\sigma B(\alpha, \beta)} \int_0^1 \phi\!\left(\frac{y-\mu}{\sigma}\,t\right) t^{\alpha} (1-t)^{\beta-1}\,dt,$$
and it is denoted by $Y \sim ES(\mu, \sigma, \alpha, \beta)$ with $\mu \in \mathbb{R}$ and $\sigma, \alpha, \beta > 0$, where $\phi$ denotes the pdf of the standard normal distribution (see Johnson et al. [11]) and $B(\cdot, \cdot)$ denotes the beta function.
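We say that $X$ has a Student's t distribution with $\nu$ degrees of freedom, location parameter $\mu$, and scale parameter $\sigma$, denoted by $X \sim T(\mu, \sigma, \nu)$, if it has the stochastic representation
$$X = \sigma \frac{W}{(V/\nu)^{1/2}} + \mu,$$
where $W \sim N(0,1)$ and $V \sim \chi^2_{(\nu)}$ are independent. Its probability density function is
$$f_X(x) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sigma\,\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}} \left[1 + \frac{1}{\nu}\left(\frac{x-\mu}{\sigma}\right)^2\right]^{-\frac{\nu+1}{2}},$$
with support on $(-\infty, \infty)$.
The moments of order $r$ of a random variable $X$ with Student's t distribution can be expressed in terms of the gamma function. If $X \sim T(0, 1, \nu)$, then for $r$ even
$$\mu_r = E[X^r] = \frac{\nu^{r/2}\, a_{r/2}}{2^{r/2}} \frac{\Gamma\!\left(\frac{\nu-r}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}, \quad \nu > r,$$
where $a_{r/2} = \int_{-\infty}^{\infty} x^r \phi(x)\,dx$. In particular,
$$E[X] = 0, \quad \nu > 1,$$
$$V(X) = \frac{\nu}{\nu-2}, \quad \nu > 2.$$
If $Y \sim T(\mu, \sigma, \nu)$, then
$$E(Y^r) = \sum_{k=0}^{r} \binom{r}{k} \sigma^k \mu^{r-k} \mu_k.$$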
Li and Nadarajah [12] review the generalizations of the Student's t distribution published to date and show that the main motivation for these extensions is to model heavy tails or data with high kurtosis.
In the study of symmetric distributions with heavy tails, El-Bassiouny and El-Morshedy [13] present the generalized slash Student's t distribution. We say that $X \sim GLST(\mu, \sigma, \alpha, \beta, \nu, q)$, with parameter $q > 0$, if its pdf is
$$f_X(x) = \frac{q\,\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sigma\sqrt{\pi\nu}\,\Gamma\!\left(\frac{\nu}{2}\right) B(\alpha, \beta)} \int_0^1 w^{\alpha q} \left(1 - w^q\right)^{\beta-1} \left[1 + \frac{1}{\nu}\left(\frac{x-\mu}{\sigma}\,w\right)^2\right]^{-\frac{\nu+1}{2}} dw, \quad q > 0,\ x \in \mathbb{R},$$
where $q$ is the kurtosis parameter and $B(\cdot, \cdot)$ denotes the beta function.
Another recent extension of the slash model was proposed by El-Morshedy et al. [14]. These authors introduced the double slash (DSL) distribution, with density function
$$f_Y(y) = \frac{q_1 q_2}{\sigma} \int_0^1 \int_0^1 \phi\!\left(\frac{y-\mu}{\sigma}\,w t\right) t^{q_1}\,dt\; w^{q_2}\,dw,$$
with $\mu \in \mathbb{R}$ and $\sigma, q_1, q_2 > 0$.
When $U \sim Ga(2\beta, \beta)$ and $q = 1$ in (1), we obtain the generalized modified slash distribution, denoted $GMS(\mu, \sigma, \beta)$, studied by Reyes et al. [15]. Its density function is
$$f_Y(y; \mu, \sigma, \beta) = \begin{cases} \dfrac{1}{\sigma\sqrt{8\pi}}, & y = \mu, \\[2ex] \dfrac{2^{\beta/2}}{\sqrt{2\pi}} \dfrac{\sigma^{\beta+1} \beta^{\beta+2}}{|y-\mu|^{\beta+2}}\; U\!\left(1 + \dfrac{\beta}{2}, \dfrac{3}{2}, \dfrac{2\sigma^2\beta^2}{(y-\mu)^2}\right), & y \neq \mu, \end{cases}$$
where $\mu \in \mathbb{R}$, $\sigma, \beta > 0$, and
$$U(a, b, z) = \frac{1}{\Gamma(a)} \int_0^{\infty} t^{a-1} (1+t)^{b-a-1} e^{-zt}\,dt$$
is the confluent hypergeometric function of the second kind. Details about this function can be found in Abramowitz and Stegun, p. 505.
Motivated by the search for a generalization of the Student's t distribution with heavier tails than the distributions found so far in the literature, in this article we introduce a new generalization of the Student's t distribution (GT), whose stochastic representation is
$$Y = \sigma \frac{W}{(V/\nu)^{1/q}} + \mu,$$
where $W \sim N(0, 1)$ and $V \sim \chi^2_{(\nu)}$ are independent, with $\nu > 0$ and $q > 0$. We denote it by $Y \sim GT(\mu, \sigma, \nu, q)$.
The paper is organized as follows. In Section 2 the probability density function (pdf) is given, some properties of the $GT$ distribution are presented, and it is shown that the Student's t distribution is a particular case of the $GT$ distribution. Additionally, moments of order $r$ are obtained, including the kurtosis coefficient. In Section 3 the moment and maximum likelihood estimators are derived, and a simulation study illustrates the behavior of the estimators of $\mu$, $\sigma$, and $q$ for $\nu = 8$. Section 4 reports the results of using the proposed model in two real applications. Section 5 presents quantile regression. Section 6 presents the main conclusions.

2. The Generalized Student’s t Distribution

We present the generalized Student's t distribution, which has heavier tails than similar distributions. We begin with its density function.

2.1. Density Function

We will use the stochastic representation
$$Y = \sigma \frac{W}{(V/\nu)^{1/q}} + \mu, \tag{13}$$
where $W$ has a standard normal distribution, $V$ has a chi-square distribution with $\nu$ degrees of freedom, $W$ and $V$ are independent random variables, $\mu$ and $\sigma$ are the location and scale parameters, respectively, $\nu$ is the degrees of freedom, and $q > 0$ is the parameter related to the kurtosis of the distribution.
We use the notation $Y \sim GT(\mu, \sigma, \nu, q)$, and for the standard case we write $X \sim GT(0, 1, \nu, q)$.
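Proposition 1.
Let $Y \sim GT(\mu, \sigma, \nu, q)$. Then, the pdf of $Y$ is given by
$$f_Y(y; \mu, \sigma, \nu, q) = \begin{cases} \dfrac{1}{\sigma\, 2^{\nu/2}\, \nu^{1/q}\, \Gamma(\nu/2) \sqrt{2\pi}} \displaystyle\int_0^{\infty} t^{\frac{\nu-2}{2} + \frac{1}{q}}\, e^{-\frac{1}{2}\left[\left(\frac{y-\mu}{\sigma}\right)^2 (t/\nu)^{2/q} + t\right]} dt, & y \neq \mu, \\[2ex] \dfrac{\Gamma\!\left(\frac{\nu}{2} + \frac{1}{q}\right)}{\sigma\, (\nu/2)^{1/q}\, \Gamma(\nu/2) \sqrt{2\pi}}, & y = \mu. \end{cases} \tag{14}$$
Proof. 
Since $W$ and $V$ are two independent random variables such that $W \sim N(0, 1)$ and $V \sim \chi^2_{(\nu)}$, the joint pdf of $(Y, T)$, with $Y = \sigma W / (V/\nu)^{1/q} + \mu$ and $T = V$, is
$$f_{(Y,T)}(y, t; \mu, \sigma, \nu, q) = \frac{1}{\sigma\, 2^{\nu/2}\, \nu^{1/q}\, \Gamma(\nu/2) \sqrt{2\pi}}\; t^{\frac{\nu-2}{2} + \frac{1}{q}}\, e^{-\frac{1}{2}\left[\left(\frac{y-\mu}{\sigma}\right)^2 (t/\nu)^{2/q} + t\right]},$$
where $y \in \mathbb{R}$ and $t > 0$. Integrating out $t$ gives the result for $y \neq \mu$. Setting $y = \mu$, the integral reduces to a gamma integral and the other expression is obtained. □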
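The integral form of the density can be evaluated numerically. The following is a minimal sketch, assuming SciPy's general-purpose `quad` routine is adequate for the parameter values of interest; the function name `gt_pdf` and its defaults are illustrative and not part of the paper.

```python
import numpy as np
from scipy import integrate, special


def gt_pdf(y, mu=0.0, sigma=1.0, nu=5.0, q=1.0):
    """Density of GT(mu, sigma, nu, q) from its integral form (hypothetical helper)."""
    z2 = ((y - mu) / sigma) ** 2
    power = (nu - 2.0) / 2.0 + 1.0 / q  # exponent of t in the integrand

    def integrand(t):
        return t ** power * np.exp(-0.5 * (z2 * (t / nu) ** (2.0 / q) + t))

    integral, _ = integrate.quad(integrand, 0.0, np.inf)
    # Normalizing constant assembled on the log scale for numerical stability.
    log_const = (
        -np.log(sigma)
        - (nu / 2.0) * np.log(2.0)
        - np.log(nu) / q
        - special.gammaln(nu / 2.0)
        - 0.5 * np.log(2.0 * np.pi)
    )
    return np.exp(log_const) * integral
```

Two sanity checks are available without further computation: at $q = 2$ the value should agree with the Student's t density, and at $y = \mu$ it should agree with the closed-form expression for that case.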
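Corollary 1.
If $q = 1$ in (14), the distribution of $Y$ is called the canonical generalized Student's t distribution, and its pdf is
$$f_Y(y; \mu, \sigma, \nu, 1) = \begin{cases} \dfrac{1}{2^{3\left(1 + \frac{\nu}{4}\right)} \sigma \sqrt{2\pi}} \left|\dfrac{y-\mu}{\sigma\nu}\right|^{-\left(\frac{\nu}{2} + 2\right)} U\!\left(1 + \dfrac{\nu}{4}, \dfrac{3}{2}, \dfrac{\nu^2}{8}\left(\dfrac{y-\mu}{\sigma}\right)^{-2}\right), & y \neq \mu, \\[2ex] \dfrac{1}{\sigma\sqrt{2\pi}}, & y = \mu, \end{cases}$$
where $U(a, b, x) = \frac{1}{\Gamma(a)} \int_0^{\infty} e^{-xt} t^{a-1} (1+t)^{b-a-1}\,dt$ is the confluent hypergeometric function of the second kind.
Proof. 
If $q = 1$ in (14), the pdf of $Y$ is
$$f_Y(y; \mu, \sigma, \nu, 1) = \begin{cases} \dfrac{1}{\sigma\, 2^{\nu/2}\, \nu\, \Gamma(\nu/2) \sqrt{2\pi}} \displaystyle\int_0^{\infty} t^{\frac{\nu}{2}}\, e^{-\frac{1}{2}\left[\left(\frac{y-\mu}{\sigma}\right)^2 (t/\nu)^2 + t\right]} dt, & y \neq \mu, \\[2ex] \dfrac{1}{\sigma\sqrt{2\pi}}, & y = \mu. \end{cases}$$
Setting $a = \nu/2$, making the change of variables $w = t/(4a)$, and applying the result obtained in Reyes et al. [9],
$$\int_0^{\infty} t^{a}\, e^{-\frac{x^2}{2} t^2 - 2at}\,dt = \frac{a\,\Gamma(a+1)}{2^{a/2}\, |x|^{a+2}}\; U\!\left(1 + \frac{a}{2}, \frac{3}{2}, \frac{2a^2}{x^2}\right),$$
with $x = 2\,\dfrac{y-\mu}{\sigma}$, the result is obtained. □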
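The canonical case $q = 1$ can be cross-checked numerically. The sketch below assumes the closed form is written as $2^{-3(1+\nu/4)} \left|\frac{y-\mu}{\sigma\nu}\right|^{-(\nu/2+2)} U\!\left(1+\frac{\nu}{4}, \frac{3}{2}, \frac{\nu^2}{8}\left(\frac{y-\mu}{\sigma}\right)^{-2}\right) / (\sigma\sqrt{2\pi})$ for $y \neq \mu$, and compares it with direct quadrature of the integral form. It relies on `scipy.special.hyperu` for the function $U$; both function names are illustrative.

```python
import numpy as np
from scipy import integrate, special


def gt_canonical_pdf(y, mu=0.0, sigma=1.0, nu=5.0):
    """Closed form of the GT density for q = 1 via the confluent function U."""
    z = abs((y - mu) / sigma)
    if z == 0.0:
        return 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    a = 1.0 + nu / 4.0
    # log of |z / nu|^{-(nu/2 + 2)} * 2^{-3 (1 + nu/4)}
    log_front = -(nu / 2.0 + 2.0) * np.log(z / nu) - 3.0 * a * np.log(2.0)
    u_value = special.hyperu(a, 1.5, nu ** 2 / (8.0 * z ** 2))
    return np.exp(log_front) * u_value / (sigma * np.sqrt(2.0 * np.pi))


def gt_canonical_pdf_quad(y, mu=0.0, sigma=1.0, nu=5.0):
    """Same density from the integral form with q = 1, used as a cross-check."""
    z2 = ((y - mu) / sigma) ** 2

    def integrand(t):
        return t ** (nu / 2.0) * np.exp(-0.5 * (z2 * (t / nu) ** 2 + t))

    integral, _ = integrate.quad(integrand, 0.0, np.inf)
    const = sigma * 2.0 ** (nu / 2.0) * nu * special.gamma(nu / 2.0) * np.sqrt(2.0 * np.pi)
    return integral / const
```

Agreement between the two functions at a few points is a quick way to detect a sign or exponent slip in either expression.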
The left panel of Figure 1 shows the pdf of the generalized Student's t distribution for $q = 1$ compared with the Student's t distribution for $\nu = 5$, the normal distribution, the generalized slash Student's t distribution, and the double slash distribution. It can be seen that, as the variable moves to the right (or to the left), the new model assigns more probability to the tails than the other distributions. Furthermore, the smaller $q$ is, the greater the kurtosis of the distribution.

2.2. Tails Comparison of GT and Student’s t Distributions

In this part, we compare the upper tails of the $GT$ distribution and the Student's t distribution. For this, we consider the canonical version ($q = 1$) of the $GT$ distribution and the Student's t distribution, both with $\nu = 5$ degrees of freedom. Table 1 shows $P(Y > y)$ for different values of $y$ in the two distributions. The $GT$ distribution has much heavier tails than the Student's t distribution.
Remark 1.
Table 1 illustrates the fact that the generalized Student's t distribution has heavier tails than the Student's t distribution.

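2.3. Comparison of GT Quantiles with T Quantiles

Figure 2 shows the quantile function of the generalized Student's t distribution compared with the quantile function of the Student's t distribution for different values of $q$ and $\nu = 5$.
Proposition 2.
Let $Y \sim GT(0, 1, \nu, q)$. Then an approximation of the quantile $p$ of $Y$ is
$$y_p \approx \begin{cases} \dfrac{t_p}{2} \left(\dfrac{j_p}{\nu}\right)^{\frac{q-2}{2q}} \left[1 + \left(\dfrac{j_p}{\nu}\right)^{-\frac{q-2}{q}}\right], & q < 2, \\[2ex] t_p \left(\dfrac{j_p}{\nu}\right)^{\frac{q-2}{2q}}, & q > 2, \end{cases}$$
where $t_p$ and $j_p$ denote the quantiles $p$ of the Student's t and chi-square distributions with $\nu$ degrees of freedom.
Proof. 
Writing $Z \sim N(0,1)$ and $J \sim \chi^2_{(\nu)}$,
$$Y = \frac{Z}{(J/\nu)^{1/q}} = \frac{Z}{(J/\nu)^{1/2}} \cdot \frac{(J/\nu)^{1/2}}{(J/\nu)^{1/q}} = \frac{T}{(J/\nu)^{\frac{2-q}{2q}}},$$
so that
$$y_p \approx \frac{t_p}{(j_p/\nu)^{\frac{2-q}{2q}}}.$$
If $q < 2$,
$$y_p \approx \frac{t_p}{2} \left[\left(\frac{j_p}{\nu}\right)^{\frac{2-q}{2q}} + \left(\frac{j_p}{\nu}\right)^{\frac{q-2}{2q}}\right].$$
If $q > 2$,
$$y_p \approx t_p \left(\frac{j_p}{\nu}\right)^{\frac{q-2}{2q}}. \qquad □$$
Figure 3 shows the quantiles of the generalized Student's t distribution compared with the quantile approximation of Proposition 2 for $q = 1$ and $\nu = 5$.
Properties:
  • If $q = 2$, then $y_p = t_p$;
  • if $\nu \to \infty$, then $y_p = z_p$, where $z_p$ is the quantile $p$ of the standard normal distribution.
In Table 2 we present quantiles of the generalized Student's t distribution for $\nu$ degrees of freedom and $q = 1$.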

2.4. Properties of the Generalized Student’s t Distribution

In this section, we present some properties of the generalized Student's t distribution.
Proposition 3.
Let $Y \sim GT(\mu, \sigma, \nu, q)$. Then:
1. $\lim_{q \to \infty} f_Y(y; \mu, \sigma, \nu, q) = \dfrac{1}{\sigma} \phi\!\left(\dfrac{y-\mu}{\sigma}\right)$.
2. If $Y \mid V = v \sim N\!\left(\mu, (v/\nu)^{-2/q} \sigma^2\right)$ and $V \sim \chi^2_{(\nu)}$, then $Y \sim GT(\mu, \sigma, \nu, q)$.
3. If $Y \sim GT(0, 1, \nu, 2)$, then $Y \sim t_{(\nu)}$.
Proof. 
1. Letting $q$ tend to infinity in representation (13), the result follows immediately.
2. We have
$$f_Y(y; \mu, \sigma, \nu, q) = \int_0^{\infty} \phi\!\left(y; \mu, (v/\nu)^{-1/q} \sigma\right) f_V(v)\,dv = \int_0^{\infty} \frac{(v/\nu)^{1/q}}{\sigma}\, \phi\!\left(\frac{y-\mu}{\sigma} (v/\nu)^{1/q}\right) f_V(v)\,dv,$$
where $f_V$ is the pdf of the chi-square distribution with $\nu$ degrees of freedom. The result follows by substituting $f_V$ and direct computation.
3. Setting $q = 2$, we obtain the Student's t density with $\nu$ degrees of freedom. □
Remark 2.
Proposition 3 shows first that the generalized Student's t distribution contains the normal distribution as a limiting case ($q \to \infty$). It also shows that the generalized Student's t distribution is a scale mixture of the normal distribution with a chi-square mixing distribution with $\nu$ degrees of freedom. The third property shows that for $q = 2$ the density function of the generalized Student's t distribution coincides with the density function of the Student's t distribution with $\nu$ degrees of freedom.

2.5. Moments

In this subsection the moments of the generalized Student's t distribution are derived.
Proposition 4.
Let $X \sim GT(0, 1, \nu, q)$ and $Y \sim GT(\mu, \sigma, \nu, q)$. Then, for $r = 1, 2, 3, \ldots$ and $q > 4r/\nu$, we have
$$\mu_{2r} = E\!\left[X^{2r}\right] = \left(\frac{\nu}{2}\right)^{\frac{2r}{q}} \frac{(2r)!}{2^{r}\, r!} \frac{\Gamma\!\left(\frac{\nu}{2} - \frac{2r}{q}\right)}{\Gamma(\nu/2)}, \qquad \mu_{2r-1} = E\!\left[X^{2r-1}\right] = 0,$$
and
$$E(Y^r) = \sum_{k=0}^{r} \binom{r}{k} \sigma^k \mu^{r-k} \mu_k.$$
Proof. 
Using representation (13) with $\mu = 0$ and $\sigma = 1$, and since $W$ and $V$ are independent, we have
$$\mu_{2r} = E(X^{2r}) = E\!\left[\left(\frac{W}{(V/\nu)^{1/q}}\right)^{2r}\right] = E\!\left[W^{2r}\right] E\!\left[(V/\nu)^{-2r/q}\right].$$
Moreover,
$$E\!\left[(V/\nu)^{-2r/q}\right] = \nu^{2r/q} E\!\left[V^{-2r/q}\right] = \nu^{2r/q}\, \frac{\Gamma\!\left(\frac{\nu}{2} - \frac{2r}{q}\right)}{2^{2r/q}\, \Gamma(\nu/2)}, \quad q > 4r/\nu,$$
and $E\!\left[W^{2r}\right] = \dfrac{(2r)!}{2^r r!}$ are the even moments of the standard normal distribution. The odd moments vanish by symmetry. The expression for $E(Y^r)$ follows by applying the binomial formula to the stochastic representation (13). □
Corollary 2.
Let $Y \sim GT(\mu, \sigma, \nu, q)$. Then
$$E(Y) = \mu \quad \text{and} \quad Var(Y) = \sigma^2 \left(\frac{\nu}{2}\right)^{\frac{2}{q}} \frac{\Gamma\!\left(\frac{\nu}{2} - \frac{2}{q}\right)}{\Gamma(\nu/2)}, \quad q > 4/\nu. \tag{17}$$
Proposition 5.
Let $Y \sim GT(\mu, \sigma, \nu, q)$. Then the coefficients of skewness and kurtosis are
$$\gamma_1 = 0$$
and
$$\beta_2 = \frac{3\,\Gamma(\nu/2)\,\Gamma\!\left(\frac{\nu}{2} - \frac{4}{q}\right)}{\Gamma^2\!\left(\frac{\nu}{2} - \frac{2}{q}\right)}, \quad q > 8/\nu. \tag{19}$$
Proof. 
The standardized coefficients of skewness and kurtosis are
$$\gamma_1 = \frac{\mu_3 - 3\mu_1\mu_2 + 2\mu_1^3}{(\mu_2 - \mu_1^2)^{3/2}}$$
and
$$\beta_2 = \frac{\mu_4 - 4\mu_1\mu_3 + 6\mu_1^2\mu_2 - 3\mu_1^4}{(\mu_2 - \mu_1^2)^2},$$
and the result follows after substituting the moments derived in Proposition 4. □
Figure 4 shows the kurtosis of the $GT$ distribution compared with that of the $T$ distribution for different values of $q$ and $\nu = 8$.
It can be seen that the generalized Student's t distribution has greater kurtosis than the Student's t distribution for $q$ less than 2. Therefore, for data with high kurtosis, the generalized Student's t distribution is recommended.
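The variance and kurtosis can be evaluated with log-gamma functions. The sketch below assumes the variance is written in the form $\sigma^2 (\nu/2)^{2/q}\, \Gamma(\nu/2 - 2/q)/\Gamma(\nu/2)$, which reduces to the Student's t variance $\nu/(\nu-2)$ at $q = 2$, together with the kurtosis $3\,\Gamma(\nu/2)\Gamma(\nu/2 - 4/q)/\Gamma^2(\nu/2 - 2/q)$; the function names are illustrative.

```python
import numpy as np
from scipy.special import gammaln


def gt_variance(sigma, nu, q):
    """Variance of GT(mu, sigma, nu, q); requires q > 4 / nu."""
    if q <= 4.0 / nu:
        raise ValueError("variance requires q > 4 / nu")
    log_ratio = gammaln(nu / 2.0 - 2.0 / q) - gammaln(nu / 2.0)
    return sigma ** 2 * (nu / 2.0) ** (2.0 / q) * np.exp(log_ratio)


def gt_kurtosis(nu, q):
    """Kurtosis coefficient of the GT distribution; requires q > 8 / nu."""
    if q <= 8.0 / nu:
        raise ValueError("kurtosis requires q > 8 / nu")
    log_b2 = (
        np.log(3.0)
        + gammaln(nu / 2.0)
        + gammaln(nu / 2.0 - 4.0 / q)
        - 2.0 * gammaln(nu / 2.0 - 2.0 / q)
    )
    return np.exp(log_b2)
```

For $\nu = 8$ and $q = 2$ the kurtosis equals $3\,\Gamma(4)\Gamma(2)/\Gamma(3)^2 = 4.5$, the Student's t value $3 + 6/(\nu - 4)$; values of $q$ below 2 give larger kurtosis.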

3. Inference

3.1. Moment Estimators

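In the following proposition we present the moment estimators of $\mu$, $\sigma$, and $q$ for $\nu = 8$.
Proposition 6.
Let $Y_1, \ldots, Y_n$ be a random sample from the distribution of the random variable $Y \sim GT(\mu, \sigma, \nu, q)$. Then the moment estimators of $\theta = (\mu, \sigma, \nu, q)$ for $q > 1$ are given by
$$\hat{\mu}_M = \bar{Y}, \qquad \hat{\sigma}_M = \left[\frac{\Gamma(\nu/2)\, S^2}{(\nu/2)^{2/\hat{q}_M}\, \Gamma\!\left(\frac{\nu}{2} - \frac{2}{\hat{q}_M}\right)}\right]^{1/2}, \qquad \text{and} \qquad \gamma_2 = \frac{3\,\Gamma(\nu/2)\,\Gamma\!\left(\frac{\nu}{2} - \frac{4}{\hat{q}_M}\right)}{\Gamma^2\!\left(\frac{\nu}{2} - \frac{2}{\hat{q}_M}\right)}, \quad \nu > \frac{8}{\hat{q}_M},$$
where $\bar{Y}$, $S$, and $\gamma_2$ are the sample mean, standard deviation, and kurtosis coefficient. The condition $\nu > 8/\hat{q}_M$ holds, in particular, when $\nu \geq 8$ and $\hat{q}_M > 1$.
Proof. 
Using (17) it follows that
$$\mu = E(Y) \quad \text{and} \quad \sigma^2 = \frac{\Gamma(\nu/2)\, Var(Y)}{(\nu/2)^{2/q}\, \Gamma\!\left(\frac{\nu}{2} - \frac{2}{q}\right)}. \tag{20}$$
Replacing the population kurtosis in (19) by the sample kurtosis $\gamma_2$, one obtains the equation
$$\gamma_2 = \frac{3\,\Gamma(\nu/2)\,\Gamma\!\left(\frac{\nu}{2} - \frac{4}{\hat{q}_M}\right)}{\Gamma^2\!\left(\frac{\nu}{2} - \frac{2}{\hat{q}_M}\right)}, \tag{21}$$
and solving (21) numerically for $q$ and $\nu$ one obtains $\hat{q}_M$ and $\hat{\nu}_M$. Further, replacing in (20) $q$ by $\hat{q}_M$, $\nu$ by $\hat{\nu}_M$, $E(Y)$ by $\bar{Y}$, and $Var(Y)$ by the sample variance $S^2$, we obtain the moment estimators $(\hat{\mu}_M, \hat{\sigma}_M, \hat{\nu}_M, \hat{q}_M)$ for $(\mu, \sigma, \nu, q)$. □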
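The kurtosis equation has no closed-form solution in $q$, so a root finder is needed. The following is a minimal sketch, assuming $\nu$ is held fixed (for example $\nu = 8$, as in the text) and that the kurtosis is decreasing in $q$ on the bracket used; `solve_q_from_kurtosis`, `gt_moment_estimates`, and the upper bracket `q_max` are hypothetical names and choices, not the authors' procedure.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gammaln


def gt_kurtosis(nu, q):
    """Kurtosis 3 G(nu/2) G(nu/2 - 4/q) / G(nu/2 - 2/q)^2, valid for q > 8 / nu."""
    return np.exp(
        np.log(3.0)
        + gammaln(nu / 2.0)
        + gammaln(nu / 2.0 - 4.0 / q)
        - 2.0 * gammaln(nu / 2.0 - 2.0 / q)
    )


def solve_q_from_kurtosis(kurtosis, nu=8.0, q_max=200.0):
    """Solve gt_kurtosis(nu, q) = kurtosis for q with nu fixed."""
    q_min = (8.0 / nu) * (1.0 + 1e-6)  # just inside the region where kurtosis exists
    if not gt_kurtosis(nu, q_max) < kurtosis < gt_kurtosis(nu, q_min):
        raise ValueError("kurtosis outside the range attainable for this nu")
    return brentq(lambda q: gt_kurtosis(nu, q) - kurtosis, q_min, q_max)


def gt_moment_estimates(y, nu=8.0):
    """Moment estimates (mu, sigma, q) for a sample y, with nu fixed (assumption)."""
    y = np.asarray(y, dtype=float)
    mean = y.mean()
    centered = y - mean
    sample_kurtosis = np.mean(centered ** 4) / np.mean(centered ** 2) ** 2
    q_hat = solve_q_from_kurtosis(sample_kurtosis, nu)
    log_ratio = gammaln(nu / 2.0) - gammaln(nu / 2.0 - 2.0 / q_hat)
    sigma_hat = np.sqrt(y.var(ddof=1) * np.exp(log_ratio) / (nu / 2.0) ** (2.0 / q_hat))
    return mean, sigma_hat, q_hat
```

Sample kurtosis is a noisy statistic for heavy-tailed data, so estimates obtained this way are best treated as starting values for a likelihood-based fit rather than as final estimates.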

3.2. Maximum Likelihood Estimation

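Given a random sample $Y_i \sim GT(\mu, \sigma, \nu, q)$, for $i = 1, \ldots, n$, the log-likelihood function can be written as
$$l(\mu, \sigma, \nu, q) = -n \log(\sigma) - \frac{n\nu}{2} \log(2) - \frac{n}{q} \log(\nu) - n \log\!\left(\Gamma(\nu/2)\right) - \frac{n}{2} \log(2\pi) + \sum_{i=1}^{n} \log G(y_i),$$
where
$$G(y_i) = G(y_i; \mu, \sigma, \nu, q) = \int_0^{\infty} v^{\frac{\nu-2}{2} + \frac{1}{q}}\, e^{-\frac{1}{2}\left[\left(\frac{y_i-\mu}{\sigma}\right)^2 \left(\frac{v}{\nu}\right)^{\frac{2}{q}} + v\right]} dv,$$
and hence the maximum likelihood equations are given by
$$\sum_{i=1}^{n} \frac{G_1(y_i)}{G(y_i)} = 0, \tag{27}$$
$$\sum_{i=1}^{n} \frac{G_2(y_i)}{G(y_i)} = \frac{n}{\sigma}, \tag{28}$$
$$\sum_{i=1}^{n} \frac{G_3(y_i)}{G(y_i)} = \frac{n \log(2)}{2} + \frac{n}{q\nu} + \frac{n \Psi(\nu/2)}{2}, \tag{29}$$
$$\sum_{i=1}^{n} \frac{G_4(y_i)}{G(y_i)} = -\frac{n \log(\nu)}{q^2}, \tag{30}$$
where $\Psi$ denotes the digamma function, $G_1(y_i) = \frac{\partial}{\partial\mu} G(y_i)$, $G_2(y_i) = \frac{\partial}{\partial\sigma} G(y_i)$, $G_3(y_i) = \frac{\partial}{\partial\nu} G(y_i)$, and $G_4(y_i) = \frac{\partial}{\partial q} G(y_i)$. Differentiating under the integral sign, these are given by
$$G_1(y_i) = \frac{1}{\sigma^2 \nu^{2/q}} \int_0^{\infty} (y_i - \mu)\, v^{2/q}\, t_i(v)\,dv,$$
$$G_2(y_i) = \frac{1}{\sigma^3 \nu^{2/q}} \int_0^{\infty} (y_i - \mu)^2\, v^{2/q}\, t_i(v)\,dv,$$
$$G_3(y_i) = \frac{1}{q \sigma^2 \nu} \int_0^{\infty} \left[\left(\frac{v}{\nu}\right)^{2/q} (y_i - \mu)^2 + \frac{q \sigma^2 \nu}{2} \log(v)\right] t_i(v)\,dv,$$
$$G_4(y_i) = -\frac{1}{\sigma^2 q^2} \int_0^{\infty} \left[\sigma^2 \log(v) - \log\!\left(\frac{v}{\nu}\right) \left(\frac{v}{\nu}\right)^{2/q} (y_i - \mu)^2\right] t_i(v)\,dv,$$
where $t_i(v) = v^{\frac{\nu-2}{2} + \frac{1}{q}}\, e^{-\frac{1}{2}\left[\left(\frac{y_i-\mu}{\sigma}\right)^2 \left(\frac{v}{\nu}\right)^{\frac{2}{q}} + v\right]}$.
Equations (27)–(30) can be solved using numerical procedures.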
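In practice it is often simpler to maximize the log-likelihood directly than to solve the score equations. The sketch below takes that route under stated assumptions: each $G(y_i)$ is computed by quadrature, the positive parameters are optimized on the log scale, and a derivative-free Nelder-Mead search is used. This is one possible numerical procedure, not necessarily the one used by the authors, and the function names are illustrative.

```python
import numpy as np
from scipy import integrate, optimize, special


def gt_logpdf(y, mu, sigma, nu, q):
    """Log-density of GT(mu, sigma, nu, q) with G(y) evaluated by quadrature."""
    z2 = ((y - mu) / sigma) ** 2
    power = (nu - 2.0) / 2.0 + 1.0 / q

    def integrand(v):
        return v ** power * np.exp(-0.5 * (z2 * (v / nu) ** (2.0 / q) + v))

    g, _ = integrate.quad(integrand, 0.0, np.inf)
    return (
        -np.log(sigma)
        - (nu / 2.0) * np.log(2.0)
        - np.log(nu) / q
        - special.gammaln(nu / 2.0)
        - 0.5 * np.log(2.0 * np.pi)
        + np.log(g)
    )


def neg_loglik(theta, y):
    """Negative log-likelihood; theta = (mu, log sigma, log nu, log q)."""
    mu, log_sigma, log_nu, log_q = theta
    sigma, nu, q = np.exp([log_sigma, log_nu, log_q])
    return -sum(gt_logpdf(yi, mu, sigma, nu, q) for yi in y)


def fit_gt(y, start):
    """Maximize the likelihood from start = (mu0, sigma0, nu0, q0)."""
    mu0, sigma0, nu0, q0 = start
    theta0 = np.array([mu0, np.log(sigma0), np.log(nu0), np.log(q0)])
    result = optimize.minimize(neg_loglik, theta0, args=(y,), method="Nelder-Mead")
    mu, log_sigma, log_nu, log_q = result.x
    return mu, np.exp(log_sigma), np.exp(log_nu), np.exp(log_q), -result.fun
```

A call such as `fit_gt(y, start=(y.mean(), y.std(), 8.0, 1.5))` returns the estimates and the maximized log-likelihood. The cost grows with the sample size because every likelihood evaluation performs one quadrature per observation, and extreme parameter values may cause overflow in the integrand.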
Proposition 7.
Let $Y_1, \ldots, Y_n$ be a random sample from the distribution of the random variable $Y \sim GT(\mu, \sigma, \nu, q)$. Then,
$$Y^{*} = \frac{\bar{Y} - \mu}{S^{2/q}\, \sigma^{1-2/q} / \sqrt{n}} \sim GT(0, 1, \nu, q).$$
Proof. 
Consider the random variables $Z$ and $T$ given by
$$Z = \frac{\bar{Y} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1),$$
$$T = \frac{(n-1) S^2}{\sigma^2} \sim \chi^2_{(n-1)}.$$
Then
$$Y^{*} = \frac{Z}{(T/(n-1))^{1/q}} \sim GT(0, 1, \nu, q),$$
and substituting $Z$ and $T$ gives the result. □
Proposition 8.
Let $Y_1, \ldots, Y_n$ be a random sample from the distribution of the random variable $Y \sim GT(\mu, \sigma, \nu, q)$. Then, a level $(1 - \alpha)$ confidence interval for the population mean is
$$\left[\bar{Y} - t_{1-\alpha/2} \frac{S^{2/q}\, \sigma^{1-2/q}}{\sqrt{n}},\ \ \bar{Y} + t_{1-\alpha/2} \frac{S^{2/q}\, \sigma^{1-2/q}}{\sqrt{n}}\right],$$
where $t_{1-\alpha/2}$ is the percentile of order $1 - \frac{\alpha}{2}$ of the GT distribution.
Proof. 
The result is obtained from the previous proposition. □

3.3. Simulation Study

To generate random numbers from the $GT(\mu, \sigma, 8, q)$ distribution, we use the stochastic representation given in (13) and the following algorithm:
  • Simulate $Z \sim N(0, 1)$;
  • Simulate $V \sim \chi^2_{(\nu)}$;
  • Compute $Y = \sigma \dfrac{Z}{(V/\nu)^{1/q}} + \mu$.
It then follows that $Y \sim GT(\mu, \sigma, \nu, q)$.
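The three steps above translate directly into a vectorized sampler. This is a minimal sketch using NumPy's `Generator` API; the function name `rgt` and the default parameter values are illustrative.

```python
import numpy as np


def rgt(n, mu=0.0, sigma=1.0, nu=8.0, q=1.0, rng=None):
    """Draw n values from GT(mu, sigma, nu, q) via the stochastic representation."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(n)        # Z ~ N(0, 1)
    v = rng.chisquare(nu, size=n)     # V ~ chi-square with nu degrees of freedom
    return sigma * z / (v / nu) ** (1.0 / q) + mu
```

Passing a seeded generator, for example `rgt(1000, q=1.0, rng=np.random.default_rng(1))`, makes a simulation reproducible. With $q = 2$ the draws follow a location-scale Student's t distribution, which gives a simple check of the sampler.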
Table 3 shows the parameter estimates obtained by maximum likelihood estimation (MLE) over 1000 replicates of samples of sizes 50, 100, 150, and 200, with their corresponding standard errors, mean interval lengths, and empirical coverages.

4. Two Illustrative Datasets

Illustrative Dataset 1

We consider the data first presented in Jander [16], from an entomology experiment with ants. A total of n = 730 ants were individually placed in the center of an arena. The measurements correspond to the initial direction in which they moved relative to a visual stimulus located at an angle of 180 degrees from the zero direction, rounded to the nearest 10 degrees. Figure 5 depicts the histogram of these data, including the estimated densities under the $T$, $ES$, $MS$, $SGT$, $DSL$, and $GT$ models, fitted by maximum likelihood. Figure 6 shows the Q-Q plots for the $T$, $ES$, $MS$, and $GT$ models. We use the AIC (Akaike Information Criterion), which penalizes the maximized likelihood function by the number of model parameters (AIC = −2log(lik) + 2k, where k is the number of unknown parameters being estimated, see Akaike [17]). Table 4 shows the descriptive statistics of the dataset, while Table 5 presents the Kolmogorov-Smirnov (KSS) statistics for the four given models, which also indicate that the best fit is given by the $GT$ model. Table 6 shows a 95% confidence interval for the population mean using generalized Student's t quantiles. Moreover, Figure 7 depicts the empirical cumulative distribution function (cdf) and the estimated cdfs for the $T$, $ES$, $MS$, and $GT$ models.
The moment estimates for the dataset are:
  • $\hat{\mu}_M = 170.438$;
  • $\hat{\sigma}_M = 47.551$;
  • $\hat{\nu}_M = 9.3458$;
  • $\hat{q}_M = 0.4868$,
which are used as starting points for obtaining the maximum likelihood estimates (MLEs).
Figure 8 depicts the histogram of these data, including the estimated densities under the $SGT$, $DSL$, and $GT$ models, fitted by maximum likelihood. We use the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), see Schwarz [18], which is defined as BIC = −2log(lik) + k log(n), where k is the number of estimated parameters and n is the sample size. Table 7 shows these results.

5. Quantile Regression

Quantile regression is used when the objective of the study is to estimate different percentiles (such as the median) of a population of interest. An advantage of using quantile regression to estimate the median, rather than ordinary least squares regression (which estimates the mean), is that quantile regression is more robust in the presence of outliers. Quantile regression can be seen as a natural analogue, in regression analysis, of using different measures of central tendency and dispersion in order to obtain a more complete and robust analysis of the data. Another advantage of this type of regression is the possibility of estimating any quantile, which makes it possible to assess what happens at extreme values of the population.

5.1. Univariate Quantile Regression

Translating the concept of a quantile to the regression line, we obtain linear quantile regression.
If we assume that
$$Y_i = \beta_{0,\tau} + \beta_{1,\tau} X_i + \epsilon_{i,\tau},$$
$i \in \{1, \ldots, n\}$, with $\tau \in (0, 1)$, and that the conditional expected value of the error is not necessarily zero but the $\tau$-th quantile of the error given the regressor is zero ($Q_\tau(\epsilon_{i,\tau} \mid X) = 0$), then the $\tau$-th quantile of $Y_i$ given $X$ can be written as
$$Q_\tau(Y_i \mid X) = \beta_{0,\tau} + \beta_{1,\tau} X_i.$$
The estimates of $\beta_{0,\tau}$ and $\beta_{1,\tau}$ are found by
$$\hat{\beta}_\tau = \arg\min_{\beta_\tau \in \mathbb{R}^2} \left[\sum_{Y_i \geq A} \tau \left|Y_i - \beta_{0,\tau} - \beta_{1,\tau} X_i\right| + \sum_{Y_i < A} (1 - \tau) \left|Y_i - \beta_{0,\tau} - \beta_{1,\tau} X_i\right|\right],$$
where $\beta_\tau = (\beta_{0,\tau}, \beta_{1,\tau})$ and $A = \beta_{0,\tau} + \beta_{1,\tau} X_i$.
To estimate the parameters, the objective function above must be minimized. The minimization can be posed as a linear programming problem, which yields the regression line for any chosen quantile. This addresses the first limitation of classical simple linear regression, namely that it describes only the conditional mean. Furthermore, since quantiles are robust, it also addresses the second limitation, the sensitivity of the classical regression line to outliers.
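The linear programming formulation can be sketched as follows. Each residual is split into a positive part $u_i^{+}$ and a negative part $u_i^{-}$, and the weighted sum $\tau \sum u_i^{+} + (1-\tau) \sum u_i^{-}$ is minimized subject to $\beta_0 + \beta_1 X_i + u_i^{+} - u_i^{-} = Y_i$. The code assumes SciPy's `linprog` with the HiGHS solver; the function name `quantile_regression` is illustrative.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_regression(x, y, tau=0.5):
    """Fit Q_tau(Y | X) = b0 + b1 * X by linear programming; returns (b0, b1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    design = np.column_stack([np.ones(n), x])
    # Decision vector: [b0, b1, u_plus (n entries), u_minus (n entries)].
    cost = np.concatenate([np.zeros(2), tau * np.ones(n), (1.0 - tau) * np.ones(n)])
    # Equality constraints: design @ beta + u_plus - u_minus = y.
    a_eq = np.hstack([design, np.eye(n), -np.eye(n)])
    # Coefficients are free; residual parts are non-negative.
    bounds = [(None, None)] * 2 + [(0.0, None)] * (2 * n)
    result = linprog(cost, A_eq=a_eq, b_eq=y, bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[:2]
```

The dense constraint matrix has $n \times (2n + 2)$ entries, which is fine for samples of a few hundred observations; for larger samples a sparse formulation or a dedicated quantile regression routine would be preferable.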

5.2. Quantile Regression Student’s t

In this case, in the regression equation
$$Y_i = \beta_{0,\tau} + \beta_{1,\tau} X_i + \epsilon_{i,\tau},$$
$i \in \{1, \ldots, n\}$, the response variable is $Y \sim T(\mu, \sigma, \nu)$. It is possible to generate random numbers from the $T(\mu, \sigma, \nu)$ distribution, where the parameters $\mu$, $\sigma$, and $\nu$ are estimated from the data by maximum likelihood. One way to obtain the quantiles of $Y$ is then to use the stochastic representation:
  • Simulate $W \sim N(0, 1)$;
  • Simulate $T \sim \chi^2_{(\nu)}$;
  • Compute $Y_1 = \sigma \dfrac{W}{(T/\nu)^{1/2}} + \mu$.
Using this new variable $Y_1$, quantile regression is applied to the data $(X, Y_1)$.

5.3. Quantile Regression Slash Logistic

In this case, in the regression equation
$$Y_i = \beta_{0,\tau} + \beta_{1,\tau} X_i + \epsilon_{i,\tau},$$
$i \in \{1, \ldots, n\}$, the response variable is $Y \sim SLOG(\mu, \sigma, q)$. It is possible to generate random numbers from the $SLOG(\mu, \sigma, q)$ distribution, where the parameters $\mu$, $\sigma$, and $q$ are estimated from the data by maximum likelihood. One way to obtain the quantiles of $Y$ is then to use the stochastic representation:
  • Simulate $W \sim U(0, 1)$;
  • Compute $T = \mu + \sigma \log\!\left(\dfrac{W}{1 - W}\right)$;
  • Simulate $U \sim U(0, 1)$;
  • Compute $Y_2 = \dfrac{T}{U^{1/q}}$.
Using this new variable $Y_2$, quantile regression is applied to the data $(X, Y_2)$.
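The four simulation steps for the slash logistic variable can be sketched as below. The logistic draw uses the inverse-cdf transform $\mu + \sigma \log(W/(1-W))$, exactly as in the list; the function name `rslog` and its defaults are illustrative.

```python
import numpy as np


def rslog(n, mu=0.0, sigma=1.0, q=1.0, rng=None):
    """Draw n slash logistic values following the steps listed in the text."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.uniform(size=n)                      # W ~ U(0, 1)
    t = mu + sigma * np.log(w / (1.0 - w))       # logistic variate by inversion
    u = rng.uniform(size=n)                      # U ~ U(0, 1), independent of W
    return t / u ** (1.0 / q)
```

NumPy's `uniform` samples from the half-open interval [0, 1), so a draw of exactly zero is possible in principle but extremely unlikely; a production version might clip `w` and `u` away from zero.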

5.4. Quantile Regression Generalized Student’s t

In this case, in the regression equation
$$Y_i = \beta_{0,\tau} + \beta_{1,\tau} X_i + \epsilon_{i,\tau},$$
$i \in \{1, \ldots, n\}$, the response variable is $Y \sim GT(\mu, \sigma, \nu, q)$. It is possible to generate random numbers from the $GT(\mu, \sigma, \nu, q)$ distribution, where the parameters $\mu$, $\sigma$, $\nu$, and $q$ are estimated from the data by maximum likelihood. One way to obtain the quantiles of $Y$ is then to use the stochastic representation given in (13):
  • Simulate $W \sim N(0, 1)$;
  • Simulate $T \sim \chi^2_{(\nu)}$;
  • Compute $Y_3 = \sigma \dfrac{W}{(T/\nu)^{1/q}} + \mu$.
Using this new variable $Y_3$, quantile regression is applied to the data $(X, Y_3)$.

5.5. Application 2

We now consider data on the body mass index (BMI) and lean body mass (LBM) of 202 Australian athletes. The data are available for download at http://azzalini.stat.unipd.it/SN/index.html (accessed on 15 October 2021). Table 8 shows descriptive statistics for these data. The maximum likelihood estimates of $(\beta_0, \beta_1)$ and the corresponding AIC and BIC values of the fitted models are shown in Table 9 and Table 10, respectively.
Figure 9 shows the quantile regression of the data using the $T$, $SLOG$, and $GT$ models.

6. Discussion

We have introduced a new distribution called the generalized Student's t distribution (GT). The main idea is to replace the exponent $1/2$ of the chi-square term by an exponent $1/q$, where $q > 0$ is the kurtosis parameter. We derived the density function of the distribution and studied some of its properties, as well as its moments. Parameter estimation was addressed using the method of moments and maximum likelihood. We presented two illustrations. In the first, a real dataset is studied and we show that the GT distribution fits the data better than the $T$, $ES$, $MS$, $SGT$, and $DSL$ distributions. In the second, we use quantile regression to fit a linear model to a paired dataset in which the response variable shows high kurtosis, and we show that the $GT$ distribution models the residuals better than the $T$ and $SLOG$ distributions.

Author Contributions

Data curation, J.R., M.A.R. and J.A.; formal analysis, J.R., M.A.R. and J.A.; investigation, J.R., M.A.R. and J.A.; methodology, J.R., M.A.R. and J.A.; writing—original draft, J.R., M.A.R. and J.A.; writing—review and editing, M.A.R. and J.A.; Funding Acquisition, J.R., M.A.R. and J.A. All authors have read and agreed to the published version of the manuscript.

Funding

Research of J.R., M.R. and J.A. was supported by Universidad de Antofagasta through project SEMILLERO UA 2021.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Acknowledgments

The authors would like to thank the referee for his/her constructive suggestions that improved the final version of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Rogers, W.H.; Tukey, J.W. Understanding Some Long-Tailed Symmetrical Distributions. Stat. Neerl. 1972, 26, 211–226.
  2. Mosteller, F.; Tukey, J.W. Data Analysis and Regression; Addison-Wesley: Boston, MA, USA, 1977.
  3. Kafadar, K.A. Biweight Approach to the One-Sample Problem. J. Am. Stat. Assoc. 1982, 77, 416–424.
  4. Wang, J.; Genton, M.G. The multivariate skew-slash distribution. J. Stat. Plan. Inference 2006, 136, 209–220.
  5. Gómez, H.W.; Quintana, F.A.; Torres, F.J. A New Family of Slash-Distributions with Elliptical Contours. Stat. Probab. Lett. 2008, 77, 717–725, Erratum in Gómez, H.W.; Venegas, O. Stat. Probab. Lett. 2008, 78, 2273–2274.
  6. Arslan, O. An Alternative Multivariate Skew-Slash Distribution. Stat. Probab. Lett. 2008, 78, 2756–2761.
  7. Genc, A.I. A Generalization of the Univariate Slash by a Scale-Mixture Exponential Power Distribution. Commun. Stat. Simul. Comput. 2007, 36, 937–947.
  8. Gómez, H.W.; Olivares-Pacheco, J.F.; Bolfarine, H. An Extension of the Generalized Birnbaum-Saunders Distribution. Stat. Probab. Lett. 2009, 79, 331–338.
  9. Reyes, J.; Gómez, H.W.; Bolfarine, H. Modified slash distribution. Statistics 2013, 47, 929–941.
  10. Rojas, M.A.; Bolfarine, H.; Gómez, H.W. An extension of the slash-elliptical distribution. Stat. Oper. Res. Trans. (SORT) 2014, 38, 215–230.
  11. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions, 2nd ed.; Wiley: New York, NY, USA, 1988.
  12. Li, R.; Nadarajah, S. A review of Student’s t distribution and its generalizations. Empir. Econ. 2020, 58, 1461–1490.
  13. El-Bassiouny, A.H.; El-Morshedy, M. The Univarite and Multivariate Generalized Slash Student Distribution. Int. J. Math. Its Appl. 2015, 3, 3547.
  14. El-Morshedy, M.; EL-Bassiouny, A.H.; Tahir, M.H.; Eliwa, M.S. Univariate and Multivariate Double Slash Distribution. J. Stat. Appl. Probab. 2020, 9, 459–471.
  15. Reyes, J.; Barranco-Chamorro, I.; Gómez, H.W. Generalized modified slash distribution with applications. Commun. Stat.-Theory Methods 2020, 49, 2025–2048.
  16. Jander, R. Die Optische Richtungsorientierung der Roten Waldameise (Formica rufa L.). Z. Vgl. Physiol. 1957, 40, 162–238.
  17. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
  18. Schwarz, G.E. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
Figure 1. Generalized student’s pdf with q = 1 (solid line), student’s for ν = 5 pdf (dotted line), Normal pdf (dashed line), G S L T (dashed and dotted line) and D S L (thick dashed line) (left), and tails comparison (right).
Figure 1. Generalized student’s pdf with q = 1 (solid line), student’s for ν = 5 pdf (dotted line), Normal pdf (dashed line), G S L T (dashed and dotted line) and D S L (thick dashed line) (left), and tails comparison (right).
Symmetry 13 02444 g001
Figure 2. Quantile function of the generalized student’s t distribution compared to quantile function of the student’s t for ν = 5 for p = 0.975 (left) and p = 0.95 (right).
Figure 2. Quantile function of the generalized student’s t distribution compared to quantile function of the student’s t for ν = 5 for p = 0.975 (left) and p = 0.95 (right).
Symmetry 13 02444 g002
Figure 3. Densidad de G T evaluate in quantile theoretical compared to quantile, proposition 2 (upper), and qqplot (under).
Figure 3. Densidad de G T evaluate in quantile theoretical compared to quantile, proposition 2 (upper), and qqplot (under).
Symmetry 13 02444 g003
Figure 4. Kurtosis of the G T distribution compared with T distribution for ν = 8 .
Figure 4. Kurtosis of the G T distribution compared with T distribution for ν = 8 .
Symmetry 13 02444 g004
Figure 5. Histogram (left) and comparison of the tails (right) for the ants dataset. Overlaid are the generalized student’s t density with parameters estimated via ML (solid line), the modified slash density (dashed line), the extended slash density (dotted line), and the student’s t density (dashed line).
Figure 6. Q-Q plots: student’s t (a), modified slash (b), extended slash (c), generalized student’s t (d).
Figure 7. Empirical cdf with the estimated T cdf (yellow), the estimated MS cdf (red), the estimated ES cdf (green), and the estimated GT cdf (blue).
Figure 8. Histogram (left) and comparison of the tails (right) for the ants dataset. Overlaid are the generalized student’s t density with parameters estimated via ML (solid line), the modified slash density (dashed line), the extended slash density (dotted line), and the student’s t density (dashed line).
Figure 9. Quantile regression for the BMI and LBM data with the student’s t distribution (left), the slash logistic distribution (center), and the generalized student’s t distribution (right).
Table 1. Comparison of the tails of the GT distribution and the student’s t distribution.
Distribution   P(Y > 3)   P(Y > 4)   P(Y > 5)   P(Y > 10)
T(5)           0.0150     0.0052     0.0021     0.0001
GT(5)          0.0301     0.0103     0.0041     0.0002
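The T(5) row of Table 1 can be checked independently. The following is a minimal sketch, not the authors' code: it integrates the standard student's t density numerically with Simpson's rule (the function names `t_pdf` and `t_upper_tail` are illustrative) and should agree with the tabulated upper-tail probabilities up to rounding. The GT row is not reproduced here, because that would require the GT density defined earlier in the paper.

```python
import math

def t_pdf(x, nu):
    """Density of the student's t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_upper_tail(x, nu, steps=20000):
    """P(T > x) for x >= 0: Simpson's rule on [0, x], then symmetry about 0."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

# Upper-tail probabilities at the thresholds used in Table 1.
for threshold in (3, 4, 5, 10):
    print(threshold, round(t_upper_tail(threshold, 5), 4))
```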
Table 2. Quantiles of the generalized student’s t distribution for ν degrees of freedom and q = 1.
ν    GT_0.60  GT_0.70  GT_0.80  GT_0.90  GT_0.95  GT_0.975  GT_0.99  GT_0.995
1    0.330    0.727    1.419    3.467    7.798    17.074    47.159   100.682
2    0.289    0.620    1.091    2.052    3.371    5.252     9.096    13.578
3    0.277    0.587    1.002    1.749    2.628    3.710     5.583    7.453
4    0.271    0.571    0.960    1.619    2.334    3.149     4.442    5.631
5    0.267    0.562    0.936    1.546    2.176    2.861     3.891    4.791
6    0.265    0.556    0.920    1.499    2.078    2.687     3.569    4.314
7    0.263    0.551    0.909    1.467    2.011    2.570     3.358    4.006
8    0.262    0.548    0.900    1.443    1.962    2.486     3.209    3.792
9    0.261    0.545    0.894    1.425    1.925    2.422     3.098    3.634
10   0.260    0.543    0.889    1.410    1.896    2.373     3.012    3.513
11   0.260    0.542    0.884    1.398    1.872    2.333     2.944    3.418
12   0.259    0.540    0.881    1.389    1.853    2.300     2.888    3.340
13   0.259    0.539    0.878    1.380    1.836    2.273     2.842    3.276
14   0.258    0.538    0.875    1.373    1.823    2.250     2.803    3.221
15   0.258    0.537    0.873    1.367    1.811    2.230     2.769    3.175
16   0.258    0.536    0.871    1.362    1.800    2.213     2.740    3.135
17   0.257    0.536    0.870    1.357    1.791    2.197     2.715    3.100
18   0.257    0.535    0.868    1.353    1.783    2.184     2.692    3.070
20   0.257    0.534    0.865    1.346    1.769    2.161     2.655    3.018
21   0.257    0.534    0.864    1.343    1.763    2.152     2.639    2.996
22   0.256    0.533    0.863    1.340    1.758    2.143     2.624    2.976
23   0.256    0.533    0.862    1.338    1.753    2.135     2.611    2.958
24   0.256    0.532    0.861    1.335    1.748    2.127     2.599    2.942
25   0.256    0.532    0.861    1.333    1.744    2.121     2.588    2.927
26   0.256    0.532    0.860    1.331    1.740    2.114     2.577    2.913
27   0.256    0.532    0.859    1.329    1.737    2.109     2.568    2.900
28   0.256    0.531    0.859    1.328    1.734    2.103     2.559    2.888
29   0.256    0.531    0.858    1.326    1.730    2.098     2.551    2.877
30   0.256    0.531    0.858    1.325    1.728    2.094     2.544    2.867
Table 3. Simulation of 1000 iterations of the model GT(μ, σ, 8, q).
n    μ    σ    q    μ̂       sd(μ̂)   ali(μ̂)  c(μ̂)   σ̂       sd(σ̂)   ali(σ̂)  c(σ̂)   q̂       sd(q̂)   ali(q̂)  c(q̂)
50   0.5  1    1    0.4992  0.1665  0.6527  96.10  0.9958  0.1760  0.6899  94.80  1.1558  0.5080  1.9914  92.80
100  0.5  1    1    0.5018  0.1148  0.4502  94.50  1.0012  0.1237  0.4851  94.30  1.0961  0.3319  1.3009  94.20
150  0.5  1    1    0.5045  0.0965  0.3785  95.50  1.0016  0.0967  0.3791  95.20  1.0542  0.2576  1.0098  95.00
200  0.5  1    1    0.5003  0.0801  0.3138  95.80  1.0018  0.0822  0.3221  95.10  1.0442  0.1908  0.7481  94.70
50   1    1    1    1.0002  0.1649  0.6462  95.90  0.9963  0.1723  0.6756  94.90  1.1580  0.5084  1.9931  92.80
100  1    1    1    1.0007  0.1190  0.4664  95.10  1.0003  0.1277  0.5005  94.80  1.0948  0.3340  1.3094  94.10
150  1    1    1    1.0045  0.0966  0.3785  95.50  1.0016  0.0967  0.3790  95.20  1.0540  0.2575  1.0093  95.00
200  1    1    1    1.0003  0.0801  0.3138  95.80  1.0018  0.0822  0.3222  95.10  1.0442  0.1908  0.7479  94.70
50   1    2    1    0.9998  0.3311  1.2979  96.00  1.9893  0.3506  1.3743  94.90  1.1511  0.4939  1.9363  92.90
100  1    2    1    1.0037  0.2290  0.8978  94.60  2.0035  0.2467  0.9670  94.70  1.0964  0.3247  1.2729  93.80
150  1    2    1    1.0094  0.1929  0.7560  95.50  2.0043  0.1935  0.7584  94.90  1.0552  0.2548  0.9989  95.20
200  1    2    1    1.0004  0.1601  0.6276  95.80  2.0044  0.1639  0.6425  94.90  1.0445  0.1893  0.7420  94.70
50   1    3    1    0.9337  0.5396  2.1153  97.70  2.7991  0.8856  3.4714  93.60  1.0815  0.5529  2.1675  94.80
100  1    3    1    1.0042  0.3448  1.3516  94.70  3.0016  0.3818  1.4966  94.90  1.0970  0.3373  1.3222  94.10
150  1    3    1    1.0135  0.2893  1.1342  95.10  3.0082  0.2903  1.1381  95.10  1.0568  0.2563  1.0045  95.00
200  1    3    1    0.9997  0.2416  0.9472  95.70  3.0059  0.2630  1.0308  96.60  1.0456  0.1950  0.7643  95.20
50   0.5  0.5  1    0.5000  0.0825  0.3235  95.90  0.4986  0.0873  0.3422  94.70  1.1629  0.5318  2.0848  93.60
100  0.5  0.5  1    0.5003  0.0577  0.2261  94.70  0.5006  0.0620  0.2431  94.40  1.0954  0.3285  1.2878  94.10
150  0.5  0.5  1    0.5020  0.0482  0.1889  95.30  0.5007  0.0484  0.1899  94.80  1.0544  0.2573  1.0086  95.00
200  0.5  0.5  1    0.5000  0.0400  0.1567  96.00  0.5011  0.0412  0.1617  95.00  1.0445  0.1935  0.7585  94.60
50   1    0.5  1    1.0000  0.0825  0.3236  95.90  0.4987  0.0872  0.3420  94.70  1.1662  0.5413  2.1219  93.60
100  1    0.5  1    1.0004  0.0576  0.2260  94.60  0.5006  0.0620  0.2431  94.40  1.0960  0.3286  1.2880  94.10
150  1    0.5  1    1.0020  0.0482  0.1890  95.50  0.5008  0.0483  0.1895  94.70  1.0547  0.2572  1.0081  95.00
200  1    0.5  1    0.9999  0.0400  0.1567  95.90  0.5010  0.0414  0.1621  94.90  1.0461  0.1955  0.7663  94.70
50   1    0.5  0.5  1.0002  0.0695  0.2725  95.40  0.5045  0.1138  0.4462  94.70  0.5252  0.1355  0.5312  95.10
100  1    0.5  0.5  1.0005  0.0479  0.1879  94.00  0.5003  0.0798  0.3127  94.80  0.5131  0.0750  0.2941  94.80
150  1    0.5  0.5  1.0019  0.0389  0.1527  95.70  0.5022  0.0609  0.2388  94.10  0.5070  0.0587  0.2301  94.90
200  1    0.5  0.5  1.0003  0.0327  0.1281  95.20  0.5016  0.0544  0.2131  96.10  0.5070  0.0520  0.2038  94.60
50   0.5  0.5  0.5  0.5001  0.0695  0.2724  95.40  0.5040  0.1139  0.4467  94.70  0.5249  0.1355  0.5313  95.10
100  0.5  0.5  0.5  0.5007  0.0481  0.1885  94.20  0.5011  0.0800  0.3137  94.80  0.5135  0.0752  0.2949  94.80
150  0.5  0.5  0.5  0.5020  0.0390  0.1529  95.70  0.5021  0.0610  0.2393  94.20  0.5070  0.0587  0.2303  94.90
200  0.5  0.5  0.5  0.5001  0.0329  0.1290  95.10  0.5016  0.0544  0.2132  96.00  0.5073  0.0520  0.2038  94.60
Here sd is the standard deviation, ali is the average length of the confidence interval, and c is the empirical coverage (in percent) of the 95% confidence interval for the respective ML estimate.
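The sd and ali columns of Table 3 appear to be linked in a simple way: the tabulated lengths are consistent with Wald-type 95% intervals, whose length is 2 z_{0.975} times the standard error. The sketch below checks this on a few (sd, ali) pairs copied from the table. The helper name `wald_length` is illustrative, and the Wald construction is an inference from the numbers rather than something stated in this section.

```python
from statistics import NormalDist

def wald_length(sd, level=0.95):
    """Length of a symmetric Wald interval: 2 * z * sd."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sd

# (sd, ali) pairs copied from Table 3.
pairs = [
    (0.1665, 0.6527),
    (0.1760, 0.6899),
    (0.5080, 1.9914),
    (0.3311, 1.2979),
    (0.8856, 3.4714),
    (0.1355, 0.5312),
]

for sd, ali in pairs:
    # The difference stays within the rounding error of the table entries.
    print(sd, ali, round(wald_length(sd) - ali, 4))
```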
Table 4. Descriptive statistics for the dataset.
n     X̄         S         b_1      b_2
730   176.438   62.6434   0.2057   4.6071
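Summary statistics of this kind are straightforward to compute from raw data. The following is a minimal sketch that assumes the moment-based definitions of skewness, m3 / m2^(3/2), and kurtosis, m4 / m2^2. The table does not say which estimators the authors used, and the function name `describe` is illustrative.

```python
import math

def describe(data):
    """Sample size, mean, standard deviation, skewness and kurtosis."""
    n = len(data)
    mean = sum(data) / n
    # Central moments with divisor n (moment estimators).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "n": n,
        "mean": mean,
        "sd": math.sqrt(m2 * n / (n - 1)),  # usual n - 1 divisor for S
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }

summary = describe([1, 2, 3, 4, 5])
print(summary)
```

A kurtosis well above 3, the value for the Normal distribution, points to heavier tails than the Normal model allows.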
Table 5. Parameter estimates, AIC and KSS values for the T, MS, ES, and GT models for the ants dataset.
Parameter   T                MS               ES                GT
μ           181.58 (1.265)   181.67 (1.217)   181.321 (0.094)   181.4824 (1.1466)
σ           26.142 (1.712)   16.7 (0.878)     1.336 (0.108)     33.4038 (1.5802)
ν           1.47 (0.134)     -                -                 18.7203 (0.0029)
q           -                1.50 (0.034)     -                 0.4085 (0.0013)
α           -                -                1.907 (0.094)     -
β           -                -                40.084 (4.719)    -
AIC         7928.448         7921.282         7914.642          7899.405
KSS         0.1174           0.0781           0.1000            0.0644
p-value     0.0005           0.0117           0.0007            0.4850
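Assuming KSS denotes the Kolmogorov-Smirnov statistic, that is, the largest vertical distance between the empirical cdf and the fitted cdf, it can be computed as in the sketch below. The function name `ks_statistic` and the toy data are illustrative, and a fitted GT cdf would have to be supplied by the reader.

```python
def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between a sample and a model cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical cdf jumps from (i - 1) / n to i / n at x,
        # so both sides of the jump are compared with the model.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Toy check against the uniform cdf on [0, 1].
d_uniform = ks_statistic([0.1, 0.4, 0.7], lambda x: x)
print(d_uniform)
```

A smaller value means a closer fit, which is how the KSS row of Table 5 is read.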
Table 6. The 95 percent confidence interval for the mean of the dataset using T and GT quantiles.
Distribution   Lower Limit   Upper Limit
T              170.5121      182.4633
GT             166.8242      186.1511
Table 7. Parameter estimates, AIC and BIC values for the GSLT, DSL and GT models for the ants dataset.
Parameter   DSL                  GSLT               GT
μ           181.6341 (1.2443)    180.0680 (0.0169)  181.4824 (1.1466)
σ           11.9447 (1.10722)    2.5871 (0.0168)    33.4038 (1.5802)
ν           -                    2.2523 (0.0168)    18.7203 (0.0029)
q_1         1.6916 (0.2390)      0.4774 (0.0069)    0.4085 (0.0013)
q_2         1.6911 (0.2788)      -                  0.4085 (0.0013)
α           -                    12.9451 (0.0169)   -
β           -                    28.0256 (0.0170)   -
AIC         7931.313             7915.774           7899.405
BIC         7949.745             7943.333           7902.14
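AIC and BIC differ only in how they penalize the number of parameters k: AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L, so BIC - AIC = k (ln(n) - 2). The sketch below implements both criteria and recovers k from a tabulated (AIC, BIC) pair. It takes n = 730 from Table 4, on the assumption that Table 4 describes the same data, and the function names are illustrative.

```python
import math

def aic(loglik, k):
    """Akaike information criterion."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian (Schwarz) information criterion."""
    return k * math.log(n) - 2 * loglik

def implied_k(aic_value, bic_value, n):
    """Number of parameters implied by an (AIC, BIC) pair."""
    return (bic_value - aic_value) / (math.log(n) - 2)

n = 730  # assumed sample size, taken from Table 4
k_dsl = implied_k(7931.313, 7949.745, n)   # DSL column of Table 7
k_gslt = implied_k(7915.774, 7943.333, n)  # GSLT column of Table 7
print(round(k_dsl, 2), round(k_gslt, 2))
```

Under this assumption the implied values are close to 4 and 6, which matches the number of parameters listed for the DSL and GSLT models in Table 7.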
Table 8. Summary statistics for the dataset of the body mass index (BMI) and lean body mass (LBM) of 202 Australian athletes.
Data   n     W̄         S_W       β_1      β_2
BMI    202   22.9264   2.8664    0.9395   5.1323
LBM    202   64.8767   13.0702   0.3558   2.7326
Table 9. AIC and BIC of the fitted models for the dataset of the body mass index and lean body mass of 202 Australian athletes: quantile regression student’s t (T), quantile regression slash logistic (SLOG), and quantile regression generalized student’s t (GT).
Coef.   T         SLOG       GT
AIC     915.309   1252.004   904.573
BIC     925.234   1261.928   914.498
Table 10. Parameter estimates and standard deviations for the quantile regression coefficients (50% quantile) of the student’s t (T) and generalized student’s t (GT) models for the dataset.
Distribution   Coef.   Est.      SD       t-Value   P(>|t|)
T              β_0     17.5068   1.1938   14.6641   0.0000
T              β_1     0.0742    0.0172   14.6641   0.0002
SLOG           β_0     8.7411    1.6237   5.3818    0.0000
SLOG           β_1     0.2795    0.0279   9.9866    0.0000
GT             β_0     17.1050   1.2414   13.7781   0.0000
GT             β_1     0.0802    0.0172   4.6665    0.0001
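For readers unfamiliar with quantile regression, the sketch below shows the classical distribution-free formulation, in which the coefficients minimize the Koenker-Bassett check (pinball) loss. This is not the likelihood-based procedure with T, SLOG or GT errors that produced Table 10. It is a swapped-in illustration of what a fitted 50% quantile line targets, and all names and data are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss of a single residual at quantile level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def total_loss(y, x, b0, b1, tau):
    """Summed check loss of the line b0 + b1 * x at quantile level tau."""
    return sum(check_loss(yi - (b0 + b1 * xi), tau) for xi, yi in zip(x, y))

# With an intercept-only model the minimizer is a sample quantile:
# at tau = 0.5 it is the median, which the outlier 100 does not move.
y = [1, 2, 3, 4, 100]
x = [0, 0, 0, 0, 0]
best_b0 = min(y, key=lambda c: total_loss(y, x, c, 0.0, 0.5))
print(best_b0)
```

The outlier leaves the median fit unchanged, which is one reason quantile regression suits responses with high kurtosis.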
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Reyes, J.; Rojas, M.A.; Arrué, J. A New Generalization of the Student’s t Distribution with an Application in Quantile Regression. Symmetry 2021, 13, 2444. https://doi.org/10.3390/sym13122444

