Article

Adaptive Significance Levels in Tests for Linear Regression Models: The e-Value and P-Value Cases

by Alejandra E. Patiño Hoyos 1,2,*, Victor Fossaluza 2,*, Luís Gustavo Esteves 2 and Carlos Alberto de Bragança Pereira 2
1 Facultad de Ingeniería, Institución Universitaria Pascual Bravo, Medellín 050034, Colombia
2 Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo 05508-090, Brazil
* Authors to whom correspondence should be addressed.
Entropy 2023, 25(1), 19; https://doi.org/10.3390/e25010019
Submission received: 27 September 2022 / Revised: 17 November 2022 / Accepted: 16 December 2022 / Published: 22 December 2022
(This article belongs to the Special Issue Data Science: Measuring Uncertainties II)

Abstract

The full Bayesian significance test (FBST) for precise hypotheses is a Bayesian alternative to the traditional significance tests based on p-values. The FBST is characterized by the e-value as an evidence index in favor of the null hypothesis (H). An important practical issue for the implementation of the FBST is to establish how small the evidence against H must be in order to decide for its rejection. In this work, we present a method to find a cutoff value for the e-value in the FBST by minimizing the linear combination of the averaged type-I and type-II error probabilities for a given sample size and also for a given dimensionality of the parameter space. Furthermore, we compare our methodology with the results obtained from the test with adaptive significance level, which presents the capital-P P-value as a decision-making evidence measure. For this purpose, the scenario of linear regression models with unknown variance under the Bayesian approach is considered.

1. Introduction

The full Bayesian significance test (FBST) for precise hypotheses is presented in [1] as a Bayesian alternative to the traditional significance tests based on p-values. With the FBST, the authors introduce the e-value as an evidence index in favor of the null hypothesis (H). An important practical issue for the implementation of the FBST is to establish how small the evidence must be to decide to reject H ([2,3]). In that sense, the authors of [4] present loss functions such that the minimization of their posterior expected values characterizes the FBST as a Bayes test under a decision-theoretic approach. This procedure provides a cutoff point for the evidence that depends on the severity of the error for deciding whether to reject or accept H.
In the frequentist significance-test context, it is known that under certain conditions the p-value decreases as the sample size increases, in such a way that by setting a single significance level, the comparison of the p-value with the fixed significance level usually leads to rejection of the null hypothesis ([5,6,7,8,9]). In the FBST procedure, the e-value exhibits similar behavior to the p-value when the sample size increases, which suggests that the cutoff point to define the rejection of H should depend on the sample size and (possibly) on other characteristics of the statistical model under consideration. However, in the proposal of [4], a loss function that explicitly takes into account the sample size is not studied.
In order to solve the problem of testing hypotheses in the usual way, in which changing the sample size influences the probability of rejecting or accepting the null hypothesis, the authors of [10], motivated by [11], suggest that the level of significance in hypothesis testing should be a function of the sample size. Instead of setting a single level of significance, the authors of [10] propose fixing the ratio of severity between type-I and type-II error probabilities based on the losses incurred in each case, and thus, given a sample size, defining the level of significance that minimizes the linear combination of the decision error probabilities. The authors of [10] show that, proceeding this way, increasing the sample size makes the probabilities of both kinds of errors and their linear combination decrease, while in most cases, when a single level of significance is set independently of the sample size, only the type-II error probability decreases. The tests proposed in [10] rest on the same conceptual grounds as the usual tests for simple hypotheses based on the minimization of a linear combination of probabilities of decision errors, as presented in [12]. The authors of [10] then extend, in a sense, the idea in [12] to composite and sharp hypotheses, following the initial work in [11].
Following the same line of work, the authors of [13,14] present a new hypothesis-testing procedure formulated from the ideas developed in previous works ([11,15,16,17]) and using a mixture of frequentist and Bayesian tools. This procedure introduces the capital-P P-value as a decision-making evidence measure and also includes an adaptive significance level, i.e., a significance level that is a function of sample size. Such an adaptive significance level is obtained from the minimization of the linear combination of generalized type-I and type-II error probabilities. According to the authors of [14], the resulting hypothesis tests do not violate the likelihood principle and do not require any constraints on the dimensionalities of the sample space and parameter space. It should be noticed that the new test procedure is precisely the optimal decision rule for the problem of testing the simple hypotheses $f_H$ against $f_A$. For this reason, such a procedure overcomes the drawback of increasing the sample size resulting in the rejection of a null precise hypothesis ([12]). Another important way of successfully dealing with this question is to take into account meaningful deviations from the parameter value that specifies the null precise hypothesis in the formulation of the hypothesis testing problem ([18,19]).
On the other hand, linear models are probably the most widely used statistical models for establishing the influence of a set of covariates on a response variable. In that sense, the proper identification of the relevant variables in the model is an important issue in any scientific investigation, and it is an even more challenging task in the context of Big-Data problems. In addition to high dimensionality, in recent statistical learning problems it is common to find large datasets with thousands of observations. When the significance level is fixed, such large sample sizes may cause the hypotheses of nullity of the regression coefficients to be rejected most of the time.
The main goal of our work is to determine, in the setting of linear regression models, how small the Bayesian evidence in the FBST should be in order to reject the null hypothesis, and thus to protect a decision-maker from the abovementioned drawbacks. Therefore, taking into account the concepts in [11,12] associated with optimal hypothesis tests, the conclusions in [10] about the relationship between significance levels and the sample size, and the ideas recently developed by the authors of [13,14] related to adaptive significance levels, we present a method to find a cutoff point for the e-value by minimizing a linear combination of the averaged type-I and type-II error probabilities for a given sample size and also for a given dimensionality of the parameter space. For that purpose, the scenario of linear regression models with unknown variance under the Bayesian approach is considered. So, by providing an adaptive level for decision making and controlling the probabilities of both kinds of errors, we intend to avoid the problems associated with the rejection of the hypotheses on the regression coefficients when the sample size is very large. In addition to the e-value, we calculate the P-value as well as its corresponding adaptive significance levels in order to compare the decisions that can be made by performing the tests with each of these measures.

2. The Linear Regression Model with Unknown Variance

The identification of the relevant variables in linear models can be done through hypothesis-testing procedures involving the respective regression coefficients. In the conjugate Bayesian analysis of the normal linear regression model with unknown variance, it is possible to obtain expressions for the posterior distributions of the parameters and their respective marginals. Therefore, in this setting, the FBST can be used to test whether one or more of the regression coefficients are null, which is the basis of one possible model-selection procedure. We first review the normal linear regression model
$$ y = X\theta + \varepsilon, \qquad \varepsilon \sim N_n(0, \sigma^2 I_n), $$
where $y = (y_1, \ldots, y_n)'$ is an $n \times 1$ vector of observations, $X = (x_1, \ldots, x_n)'$ is an $n \times p$ matrix of covariates, also called the design matrix, with $x_i = (1, x_{i1}, \ldots, x_{i\,p-1})'$, $\theta = (\theta_1, \ldots, \theta_p)'$ is a $p \times 1$ vector of parameters (regression coefficients), and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$ is an $n \times 1$ vector of random errors. The model simply states that the conditional distribution of $y$ given the parameters $(\theta, \sigma^2)$ is the multivariate normal distribution $N_n(X\theta, \sigma^2 I_n)$. Therefore, the likelihood becomes
$$ f(y \mid \theta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} (y - X\theta)'(y - X\theta) \right\}. $$
The natural conjugate prior distribution of $(\theta, \sigma^2)$ is a $p$-variate normal-inverse-gamma distribution with hyperparameters $m_0$, $V_0$, $a_0$, and $b_0$, denoted by $(\theta, \sigma^2) \sim N_pIG(m_0, V_0, a_0, b_0)$. Combining it with the likelihood (2) gives the posterior distribution ([20,21,22]):
$$ f(\theta, \sigma^2 \mid y) \propto (\sigma^2)^{-\left(a_0 + \frac{n}{2} + \frac{p}{2} + 1\right)} \exp\left\{ -\frac{1}{2\sigma^2} \left[ (\theta - m_*)' V_*^{-1} (\theta - m_*) + 2 b_1 \right] \right\}, $$
where
$$ V_* = \left( V_0^{-1} + X'X \right)^{-1}, \qquad m_* = V_* \left( V_0^{-1} m_0 + X'y \right), $$
$$ a_1 = a_0 + \frac{n}{2}, \qquad b_1 = b_0 + \frac{m_0' V_0^{-1} m_0 + y'y - m_*' V_*^{-1} m_*}{2}. $$
If $X'X$ is non-singular, we can write
$$ m_* = V_* \left( V_0^{-1} m_0 + X'X \hat{\theta} \right), $$
where $\hat{\theta} = (X'X)^{-1} X'y$ is the classical maximum likelihood (least squares) estimator of $\theta$. Therefore, the posterior distribution of $(\theta, \sigma^2)$ is
$$ (\theta, \sigma^2) \mid y \sim N_pIG(m_*, V_*, a_1, b_1). $$
See Appendix A for further explanation of the priors, posteriors, and conditional distributions for the linear regression models with unknown variance.
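To make the conjugate update above concrete, the following is a minimal sketch in Python with NumPy; the function and variable names are our own illustration, not code from the paper, and it simply evaluates the expressions for $V_*$, $m_*$, $a_1$, and $b_1$ given above.

```python
import numpy as np

def nig_posterior(X, y, m0, V0, a0, b0):
    """Posterior hyperparameters of the normal-inverse-gamma conjugate model.

    Evaluates V* = (V0^{-1} + X'X)^{-1}, m* = V*(V0^{-1} m0 + X'y),
    a1 = a0 + n/2 and b1 = b0 + (m0'V0^{-1}m0 + y'y - m*'V*^{-1}m*)/2.
    """
    n = len(y)
    V0_inv = np.linalg.inv(V0)
    V_star = np.linalg.inv(V0_inv + X.T @ X)
    m_star = V_star @ (V0_inv @ m0 + X.T @ y)
    a1 = a0 + n / 2
    b1 = b0 + (m0 @ V0_inv @ m0 + y @ y
               - m_star @ np.linalg.solve(V_star, m_star)) / 2
    return m_star, V_star, a1, b1
```

In practice one would prefer a Cholesky-based solve to the explicit inverses; the explicit form is kept here only to mirror the formulas.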

3. Adaptive Significance Levels in Linear Regression Coefficient Hypothesis Testing

In this section, we present the methodology to find a cutoff value for the evidence in the FBST as an adaptive significance level and we also develop the procedure to calculate the P-value with its corresponding adaptive significance level, all this in the context of linear regression coefficient hypothesis testing in models with unknown variance under the Bayesian point of view. For that purpose, first of all, it is necessary to show how the Bayesian prior predictive densities under the null and alternative hypotheses are defined.

3.1. Prior Predictive Densities in Regression-Coefficient Hypothesis Testing

Let $\theta = (\theta_1', \theta_2')'$, with $\theta_1 = (\theta_1, \ldots, \theta_s)'$ and $\theta_2 = (\theta_{s+1}, \ldots, \theta_p)'$, so that $\theta_1$ has $s$ elements and $\theta_2$ has $r$ elements. Let $\xi = (\theta, \sigma^2) = (\theta_1, \theta_2, \sigma^2)$; then $Y \mid \xi \sim N_n(X\theta, \sigma^2 I_n)$, where $\xi \in \Xi$. We are interested in testing the hypotheses
$$ H: \theta_2 = 0 \qquad \text{versus} \qquad A: \theta_2 \neq 0. $$
Let $\Xi_H$ and $\Xi_A$ be the partition of the parameter space defined by the competing hypotheses $H$ and $A$. Consider the prior density $g(\xi)$ defined over the entire parameter space $\Xi$, and let $f_H$ and $f_A$ be the Bayesian prior predictive densities under the respective hypotheses. Both are probability density functions over the sample space $\Omega$, as follows:
$$ f_H(y) = t_n\!\left( 2a_0 + r;\; XC\, m_{0\,1.2}(0),\; \frac{b_0 + \tfrac{1}{2}\, m_{0\,2}' V_{0\,22}^{-1} m_{0\,2}}{a_0 + \tfrac{r}{2}} \left[ I_n + (XC)\, V_{0\,11.2}\, (XC)' \right] \right), $$
where $C_{(s+r) \times s} = [I_s \;\; 0_{s \times r}]'$.
Additionally,
$$ f_A(y) = t_n\!\left( 2a_0;\; X m_0,\; \frac{b_0}{a_0} \left[ I_n + X V_0 X' \right] \right), $$
where $P_H$ and $P_A$ denote the prior probability measures of $\xi$ restricted to the sets $\Xi_H$ and $\Xi_A$, respectively (more details can be found in Appendix B).
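As an illustration of how the prior predictive densities (4) and (5) can be evaluated numerically, here is a hedged sketch using scipy.stats.multivariate_t (available in SciPy 1.6 or later). The partition assumes that $\theta_1$ corresponds to the first $s$ columns of $X$, and the function name is ours, not the authors'.

```python
import numpy as np
from scipy.stats import multivariate_t

def prior_predictives(X, m0, V0, a0, b0, s):
    """Frozen prior predictive distributions f_H (theta_2 = 0) and f_A.

    s is the number of coefficients kept free under H; the remaining
    r = p - s coefficients (the last columns of X) are set to zero under H.
    """
    n, p = X.shape
    r = p - s
    m01, m02 = m0[:s], m0[s:]
    V011, V012 = V0[:s, :s], V0[:s, s:]
    V021, V022 = V0[s:, :s], V0[s:, s:]
    V022_inv = np.linalg.inv(V022)
    m0_12 = m01 - V012 @ V022_inv @ m02          # m_{0 1.2}(0)
    V0_112 = V011 - V012 @ V022_inv @ V021       # V_{0 11.2}
    XC = X[:, :s]                                # X C, with C = [I_s  0]'

    scale_H = (b0 + m02 @ V022_inv @ m02 / 2) / (a0 + r / 2)
    f_H = multivariate_t(loc=XC @ m0_12,
                         shape=scale_H * (np.eye(n) + XC @ V0_112 @ XC.T),
                         df=2 * a0 + r)
    f_A = multivariate_t(loc=X @ m0,
                         shape=(b0 / a0) * (np.eye(n) + X @ V0 @ X.T),
                         df=2 * a0)
    return f_H, f_A
```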

3.2. Evidence Index: e-Value

The full Bayesian significance test (FBST) was proposed in [1] for precise or “sharp” hypotheses (subsets of the parameter space with smaller dimension than the dimension of the whole parameter space and, therefore, with null Lebesgue measure), based on the evidence in favor of the null hypothesis, calculated as the posterior probability of the complement of the highest posterior density (HPD) region (here we consider the usual HPD region with respect to the Lebesgue measure, even though it could be built by choosing any other dominating measure instead) tangent to the set that defines the null hypothesis. Considering the concepts in [10,11] and the recent works [13,14] related to adaptive significance levels, we propose to establish a cutoff value $k^*$ for the e-value $ev(H; y_0)$ in the FBST as a function of the sample size $n$ and the dimensionality of the parameter space $d$, i.e., $k^* = k^*(n, d)$ with $k^* \in [0, 1]$, such that $k^*$ minimizes the linear combination of the averaged type-I and type-II error probabilities, $a\alpha + b\beta$. To describe the procedure in the context of the coefficient hypothesis testing of the linear regression model we are addressing, consider the tangential set to the null hypothesis, which is defined as
$$ T_{y_0} = \left\{ \xi \in \Xi : f(\xi \mid y_0) > \sup_H f(\xi \mid y_0) \right\} = \left\{ (\theta_1, \theta_2, \sigma^2) \in \Xi : f(\theta_1, \theta_2, \sigma^2 \mid y_0) > \sup_H f(\theta_1, \theta_2, \sigma^2 \mid y_0) \right\}. $$
The posterior distribution of $(\theta_1, \sigma^2)$ given $\theta_2$ is an $s$-variate normal-inverse gamma, that is,
$$ (\theta_1, \sigma^2 \mid \theta_2, y_0) \sim N_sIG\!\left( m_{*\,1.2}(\theta_2),\; V_{*\,11.2},\; a_1 + \frac{r}{2},\; b_1 + \frac{(\theta_2 - m_{*\,2})' V_{*\,22}^{-1} (\theta_2 - m_{*\,2})}{2} \right), $$
and the point under $H$ at which the posterior attains its maximum value can be calculated as follows:
$$ \arg\sup_H f(\theta_1, \theta_2, \sigma^2 \mid y_0) = \arg\sup_{\theta_1, \sigma^2} f(\theta_1, \theta_2 = 0, \sigma^2 \mid y_0) = \arg\sup_{\theta_1, \sigma^2} \frac{f(\theta_1, \theta_2 = 0, \sigma^2 \mid y_0)}{\int_{\mathbb{R}^s \times \mathbb{R}^+} f(\theta_1, \theta_2 = 0, \sigma^2 \mid y_0)\, d\theta_1\, d\sigma^2} $$
$$ = \arg\sup_{\theta_1, \sigma^2} f(\theta_1, \sigma^2 \mid \theta_2 = 0, y_0) = \operatorname{Mode}\left[ f(\theta_1, \sigma^2 \mid \theta_2 = 0, y_0) \right] = \left( m_{*\,1.2}(0),\; 0,\; \frac{b_1 + \tfrac{1}{2}\, m_{*\,2}' V_{*\,22}^{-1} m_{*\,2}}{a_1 + \frac{r}{2} + 1 + \frac{s}{2}} \right) = \left( \hat{\theta}_1, 0, \hat{\sigma}^2 \right). $$
Thus, we get the tangential set
$$ T_{y_0} = \left\{ (\theta_1, \theta_2, \sigma^2) \in \Xi : f(\theta_1, \theta_2, \sigma^2 \mid y_0) > f(\hat{\theta}_1, 0, \hat{\sigma}^2 \mid y_0) \right\}. $$
The evidence in favor of $H$ is calculated as the posterior probability of the complement of $T_{y_0}$, that is,
$$ ev(H; y_0) = 1 - P\left( \xi \in T_{y_0} \mid y_0 \right). $$
The evidence index, e-value, in favor of a precise hypothesis considers all points of the parameter space which are less “probable” than some point in $\Xi_H$. A large value of $ev(H; y_0)$ means that the subset $\Xi_H$ lies in a high-probability region of $\Xi$, and, therefore, the data support the null hypothesis; on the other hand, a small value of $ev(H; y_0)$ means that $\Xi_H$ is in a low-probability region of $\Xi$ and the data would make us discredit the null hypothesis ([23]).
The evidence in (8) can be approximately determined via Monte Carlo simulation. Generating $M$ samples from the posterior distribution of $\xi$, such that $\xi \mid y \sim N_pIG(m_*, V_*, a_1, b_1)$, we estimate the evidence through the expression
$$ ev(H; y_0) \approx 1 - \frac{1}{M} \sum_{j=1}^{M} \mathbb{1}\left( \xi^{(j)} \in T_{y_0} \right). $$
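This estimator can be coded directly. The sketch below uses our own naming and the unnormalized $N_pIG$ log-density, which is enough for the comparison that defines $T_{y_0}$; it assumes the partition $H: \theta_2 = 0$ with $\theta_2$ occupying the last $r$ positions of $\theta$.

```python
import numpy as np
from scipy.stats import invgamma, multivariate_normal

def log_nig(theta, sig2, m, V, a, b):
    """Unnormalized log-density of the N_pIG(m, V, a, b) distribution."""
    p = len(m)
    quad = (theta - m) @ np.linalg.solve(V, theta - m)
    return -(a + p / 2 + 1) * np.log(sig2) - (quad + 2 * b) / (2 * sig2)

def evalue_mc(m_star, V_star, a1, b1, s, M=10_000, seed=None):
    """Monte Carlo estimate of ev(H; y0) for H: theta_2 = 0."""
    rng = np.random.default_rng(seed)
    p = len(m_star)
    r = p - s
    # Point where the posterior attains its supremum under H.
    V22_inv = np.linalg.inv(V_star[s:, s:])
    m1_2 = m_star[:s] - V_star[:s, s:] @ V22_inv @ m_star[s:]
    sig2_hat = (b1 + m_star[s:] @ V22_inv @ m_star[s:] / 2) / (a1 + r / 2 + 1 + s / 2)
    log_sup_H = log_nig(np.concatenate([m1_2, np.zeros(r)]), sig2_hat,
                        m_star, V_star, a1, b1)
    # Sample the posterior: sigma^2 | y ~ IG(a1, b1), theta | sigma^2, y ~ N_p(m*, sigma^2 V*).
    in_T = 0
    for _ in range(M):
        sig2 = invgamma.rvs(a1, scale=b1, random_state=rng)
        theta = multivariate_normal.rvs(mean=m_star, cov=sig2 * V_star,
                                        random_state=rng)
        in_T += log_nig(np.atleast_1d(theta), sig2, m_star, V_star, a1, b1) > log_sup_H
    return 1 - in_T / M
```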
Now, consider the test such that
$$ \varphi_e(y) = \begin{cases} 0 & \text{if } ev(H; y) > k, \\ 1 & \text{if } ev(H; y) \leq k. \end{cases} $$
The averaged error probabilities, expressed in terms of the predictive densities, can be estimated by Monte Carlo simulation through the expressions
$$ \alpha_{\varphi_e} = \int_{y \in \Psi_e} f_H(y)\, dy \qquad \text{and} \qquad \beta_{\varphi_e} = \int_{y \notin \Psi_e} f_A(y)\, dy, $$
where $\Psi_e$ is the set
$$ \Psi_e = \left\{ y \in \Omega : ev(H; y) \leq k \right\}. $$
So, the adaptive cutoff value $k^*$ for $ev(H; y)$ will be the $k$ that minimizes $a\alpha_{\varphi_e} + b\beta_{\varphi_e}$. The $a$ and $b$ values represent the relative seriousness of the errors of the two types or, equivalently, relative prior preferences for the competing hypotheses. For example, if $b/a = 1$, then $\beta_{\varphi_e}$ and $\alpha_{\varphi_e}$ are treated as equally severe, whereas if $b/a < 1$, then $\alpha_{\varphi_e}$ undergoes a more intense minimization than $\beta_{\varphi_e}$, which means that the type-I error is considered more serious than the type-II error and also indicates a prior preference for $H$.
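A sketch of the resulting cutoff search (our own illustration): e-values are first computed on datasets simulated from $f_H$ and $f_A$ (for instance with the evalue_mc sketch above), and $k^*$ is then found on a grid.

```python
import numpy as np

def adaptive_cutoff(ev_H, ev_A, a=1.0, b=1.0, grid=1001):
    """k* minimizing a*alpha(k) + b*beta(k) over a grid of k in [0, 1].

    ev_H, ev_A: arrays of e-values computed on datasets drawn from f_H and f_A.
    """
    ks = np.linspace(0.0, 1.0, grid)
    alpha = np.array([(ev_H <= k).mean() for k in ks])   # P(reject H | H)
    beta = np.array([(ev_A > k).mean() for k in ks])     # P(accept H | A)
    i = int(np.argmin(a * alpha + b * beta))
    return ks[i], alpha[i], beta[i]
```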

3.3. Significance Index: P-Value

The authors of [13,14] present a new hypothesis-testing procedure using a mixture of frequentist and Bayesian tools. On the one hand, the procedure resembles a frequentist test, as it is based on the comparison of the P-value, as a decision-making evidence measure, with an adaptive significance level. On the other hand, such an adaptive significance level is obtained from the minimization of a linear combination of generalized type-I and type-II error probabilities under a Bayesian perspective. As a result, it generally depends on both the null and alternative hypotheses and on the sample size n, as opposed to standard fixed significance levels. The new proposal may also be seen as a test for the simple hypotheses characterized by the predictive distributions $f_H$ and $f_A$ in Section 3.1 that minimizes a specific linear combination of probabilities of errors of decision. It is then formally characterized by a cutoff for the Bayes factor (which takes the place of the likelihood ratio here) and therefore may prevent a decision-maker from rejecting the null hypothesis when the data seem to be clear evidence in its favor ([12]). It should be stressed that, under the new proposal, a cutoff value for the Bayes factor (the “likelihood ratio” here) is chosen in advance and, consequently, no constraint is imposed exclusively on the probability of the error of the first kind. In this sense, the test in [13,14] completely departs from regular frequentist tests. From another angle, the Bayes factor may be seen as the ratio between the posterior odds in favor of the null hypothesis and its prior odds ([24]). Note that the quantity defined here is written as the capital-P “P-value” to distinguish it from the small-p “p-value”. In the scenario of the linear regression model with unknown variance, the Bayes factor is the ratio between the two prior predictive densities (4) and (5):
$$ BF(y) = \frac{f_H(y)}{f_A(y)}. $$
Now, consider the test
$$ \varphi_*(y) = \begin{cases} 0 & \text{if } BF(y) > b/a, \\ 1 & \text{if } BF(y) \leq b/a. \end{cases} $$
For any other test $\varphi$, $a\alpha_{\varphi_*} + b\beta_{\varphi_*} \leq a\alpha_{\varphi} + b\beta_{\varphi}$; that is, $\varphi_*$ minimizes the linear combination of the type-I and type-II error probabilities. Here again, the $a$ and $b$ values represent the relative seriousness of the errors of the two types. To obtain the P-value at the point $y_0 \in \Omega$, define the set $\Psi_0$ of sample points $y$ for which the Bayes factor is smaller than or equal to the Bayes factor of the observed sample point $y_0$, that is,
$$ \Psi_0 = \left\{ y \in \Omega : BF(y) \leq BF(y_0) \right\}. $$
Then, the P-value is the integral of the predictive density under $H$, $f_H$, over $\Psi_0$:
$$ \text{P-value}(y_0) = \int_{\Psi_0} f_H(y)\, dy. $$
Defining the set $\Psi_*$ of sample points $y$ with Bayes factors smaller than or equal to $b/a$, i.e.,
$$ \Psi_* = \left\{ y \in \Omega : BF(y) \leq \frac{b}{a} \right\}, $$
the optimal averaged error probabilities from the generalized Neyman–Pearson lemma, which depend on the sample size, are given by
$$ \alpha_{\varphi_*} = \int_{y \in \Psi_*} f_H(y)\, dy \qquad \text{and} \qquad \beta_{\varphi_*} = \int_{y \notin \Psi_*} f_A(y)\, dy. $$
In order to make a decision, the P-value is compared to the optimal adaptive significance level $\alpha_{\varphi_*}$. Then, when $y_0$ is observed, the hypothesis $H$ will be rejected if $\text{P-value}(y_0) < \alpha_{\varphi_*}$.
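These quantities can be approximated by simple Monte Carlo, ordering simulated datasets by their Bayes factors. The following sketch (our own naming) takes the frozen predictive distributions from the earlier sketch and datasets drawn from each of them.

```python
import numpy as np

def pvalue_and_adaptive_level(f_H, f_A, y0, y_H, y_A, a=1.0, b=1.0):
    """P-value(y0), alpha*, beta* based on the Bayes factor BF = f_H / f_A.

    f_H, f_A: frozen prior predictive distributions (e.g., scipy multivariate_t);
    y_H, y_A: arrays of datasets simulated from f_H and f_A (one dataset per row).
    """
    log_bf = lambda y: f_H.logpdf(y) - f_A.logpdf(y)
    bf0 = log_bf(y0)
    bf_H = np.array([log_bf(y) for y in y_H])
    bf_A = np.array([log_bf(y) for y in y_A])
    p_value = (bf_H <= bf0).mean()          # P(BF <= BF(y0) | f_H)
    cut = np.log(b / a)
    alpha_star = (bf_H <= cut).mean()       # P(BF <= b/a | f_H)
    beta_star = (bf_A > cut).mean()         # P(BF >  b/a | f_A)
    return p_value, alpha_star, beta_star
```

H is then rejected whenever p_value falls below alpha_star, exactly as in the decision rule above.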

4. Simulation Study

We developed a simulation study considering two models. The first model was
$$ y = X\theta + \varepsilon, \qquad \varepsilon \sim N_n(0, \sigma^2 I_n), $$
where $X = \mathbf{1}_n$ and $\theta = \theta_1$. The hypotheses to be tested were
$$ H: \theta_1 = 0 \qquad \text{versus} \qquad A: \theta_1 \neq 0. $$
The second model studied was
$$ y = X\theta + \varepsilon, \qquad \varepsilon \sim N_n(0, \sigma^2 I_n), $$
where $X = (x_1, \ldots, x_n)'$ is an $n \times p$ matrix of covariates with $x_i = (1, x_{i1}, \ldots, x_{i\,p-1})'$ and $\theta = (\theta_1, \theta_2)'$ is the $p \times 1$ vector of coefficients. In this case, the hypotheses of interest were
$$ H: \theta_2 = 0 \qquad \text{versus} \qquad A: \theta_2 \neq 0. $$
The averaged error probabilities, $\alpha_{\varphi_*}$ and $\beta_{\varphi_*}$, were calculated using the Monte Carlo method with values generated from the following distributions (a code sketch follows the list):
  • Model (11) under H:
$$ \theta_1^{(j)} = 0, \qquad \sigma^{2\,(j)} \mid \theta_1^{(j)} = 0 \sim IG\!\left( a_0 + \tfrac{1}{2},\; b_0 + \tfrac{1}{2} (\theta_1^{(j)} - m_0)' V_0^{-1} (\theta_1^{(j)} - m_0) \right), \qquad Y^{(j)} \mid \sigma^{2\,(j)}, \theta_1^{(j)} \sim N_n\!\left( \mathbf{1}_n \theta_1^{(j)},\; \sigma^{2\,(j)} I_n \right). $$
  • Model (11) under A:
$$ \sigma^{2\,(j)} \sim IG(a_0, b_0), \qquad \theta_1^{(j)} \mid \sigma^{2\,(j)} \sim N\!\left( m_0,\; \sigma^{2\,(j)} V_0 \right), \qquad Y^{(j)} \mid \sigma^{2\,(j)}, \theta_1^{(j)} \sim N_n\!\left( \mathbf{1}_n \theta_1^{(j)},\; \sigma^{2\,(j)} I_n \right). $$
  • Model (12) under H:
$$ \theta_2^{(j)} = 0, \qquad \theta_1^{(j)} \mid \theta_2^{(j)} = 0 \sim t_s\!\left( 2a_0 + 1;\; m_{0\,1.2}(\theta_2^{(j)}),\; \frac{2 b_0 + (\theta_2^{(j)} - m_{0\,2})' V_{0\,22}^{-1} (\theta_2^{(j)} - m_{0\,2})}{2 a_0 + 1}\, V_{0\,11.2} \right), $$
$$ \sigma^{2\,(j)} \mid \theta_1^{(j)}, \theta_2^{(j)} = 0 \sim IG\!\left( a_0 + 1,\; b_0 + \tfrac{1}{2} (\theta^{(j)} - m_0)' V_0^{-1} (\theta^{(j)} - m_0) \right), \qquad Y^{(j)} \mid \sigma^{2\,(j)}, \theta_1^{(j)}, \theta_2^{(j)} = 0 \sim N_n\!\left( X\theta^{(j)},\; \sigma^{2\,(j)} I_n \right). $$
  • Model (12) under A:
$$ \sigma^{2\,(j)} \sim IG(a_0, b_0), \qquad \theta^{(j)} \mid \sigma^{2\,(j)} \sim N_p\!\left( m_0,\; \sigma^{2\,(j)} V_0 \right), \qquad Y^{(j)} \mid \sigma^{2\,(j)}, \theta^{(j)} \sim N_n\!\left( X\theta^{(j)},\; \sigma^{2\,(j)} I_n \right). $$
Then, $y^{(j)} = (y_1^{(j)}, \ldots, y_n^{(j)})'$ is a random sample from the conditional distribution of $Y$, for $j = 1, \ldots, M$.
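For reference, a minimal sketch of the simulation scheme for model (11) as described in the list above (our own code; the scalar hyperparameters m0 and V0 refer to the prior of $\theta_1$):

```python
import numpy as np
from scipy.stats import invgamma

def simulate_model11(n, M, m0, V0, a0, b0, under="H", seed=None):
    """Simulate M datasets of size n from model (11) under H or under A."""
    rng = np.random.default_rng(seed)
    ys = np.empty((M, n))
    for j in range(M):
        if under == "H":
            theta1 = 0.0
            sig2 = invgamma.rvs(a0 + 0.5,
                                scale=b0 + (theta1 - m0) ** 2 / (2 * V0),
                                random_state=rng)
        else:
            sig2 = invgamma.rvs(a0, scale=b0, random_state=rng)
            theta1 = rng.normal(m0, np.sqrt(sig2 * V0))
        ys[j] = rng.normal(theta1, np.sqrt(sig2), size=n)
    return ys
```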
In a first stage, we considered model (11), where $\theta = \theta_1$, and model (12), with $\theta = (\theta_1, \theta_2)'$. Note that the dimensionality of the parameter space, denoted by $d$, is different in the two models: for model (11) the dimensionality is $d = 2$, and for model (12) it is $d = 3$. Samples of size $M = 1000$ were generated for each model under the respective hypotheses and also for different sample sizes between $n = 10$ and $n = 5000$. In model (12), the covariate $x_{i1}$, $i = 1, \ldots, n$, was generated from a standard normal distribution. Finally, to obtain the adaptive values $\alpha_{\varphi_*}$ and $\beta_{\varphi_*}$, the two types of errors were considered equally severe, that is, $a = b = 1$.
Figure 1 shows the averaged error probabilities for the FBST as functions of $k$ for a sample size $n = 100$. This was replicated for all sample sizes in order to numerically find the corresponding $k^*$ value that minimizes $\alpha_{\varphi_e} + \beta_{\varphi_e}$. Table 1 and Table 2 and Figure 2 and Figure 3 present the $k^*$ and $\alpha^*_{\varphi_P}$ values as functions of $n$ for each model. As can be seen, both values have a decreasing trend as the sample size increases. In the case of the cutoff value for the evidence, it is possible to notice the differences in the results when the dimensionality of the parameter space changes. The $k^*$ value thus depends not only on the sample size but also on the dimensionality of the parameter space; more specifically, it is greater when $d$ is higher. However, this does not occur with $\alpha^*_{\varphi_P}$, which maintains almost the same values even if $d$ increases. On the other hand, Figure 4 and Figure 5 illustrate that, in all these models, the optimal averaged error probabilities and their linear combination also decrease with increasing sample size.
We chose a single random sample $y_0$ to calculate the e-value and P-value for the models. Table 3 displays the results, with the cases where $H$ is rejected shown in boldface. It can be observed that the decision remains the same regardless of the index used.
As the second stage in our simulation study, we set two sample sizes, $n = 60$ and $n = 120$, to perform the tests for model (12), increasing the dimensionality of the parameter space. In that scenario, the vector of coefficients was $\theta = (\theta_1, \theta_2)'$ and the hypotheses to be tested were
$$ H: \theta_2 = 0 \qquad \text{versus} \qquad A: \theta_2 \neq 0. $$
So, by varying the dimension of the vector $\theta_1$, the different models considered for each test were obtained. Table 4 and Table 5 and Figure 6 and Figure 7 show the $k^*$ and $\alpha^*_{\varphi_P}$ values as functions of $d$. For $d = 2$, the values correspond to model (11). We can say that, for a fixed hypothesis, the larger the dimensionality of the parameter space, the greater the value of $k^*$. In the case of the $\alpha^*_{\varphi_P}$ value, it does not change significantly when the dimensionality of the parameter space increases, except when the number of parameters is very large in relation to the sample size.
Table 6 presents the e-value and P-value calculated for a single random sample $y_0$. Here, with the e-value, the null hypothesis is rejected less often. This may be due to approximation error arising from the simulation process, or to the fact that the evidence apparently converges to 1 as the dimensionality of the parameter space increases, in which case a more detailed study is required.

5. Numerical Examples

In this section, we present two applications with real datasets. We choose $a_0 = 3$ and $b_0 = 2$ as the parameters of the inverse gamma prior distribution for $\sigma^2$. Additionally, in the normal prior for $\theta$ given $\sigma^2$, $m_0 = 0_{p \times 1}$ and $V_0 = I_p$ are taken as parameters. The Monte Carlo approximations were made by generating samples of size $M = 10{,}000$.
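As an illustrative wiring of the earlier sketches under these prior settings (hypothetical; the helpers nig_posterior, evalue_mc, prior_predictives and pvalue_and_adaptive_level come from our previous sketches, and X and y are assumed to hold the design matrix with intercept and the response of the dataset under study):

```python
import numpy as np

def test_last_coefficient(X, y, M=10_000):
    """e-value, P-value and adaptive level for H: the last coefficient is zero."""
    n, p = X.shape
    a0, b0 = 3.0, 2.0                       # prior settings used in this section
    m0, V0 = np.zeros(p), np.eye(p)
    s = p - 1
    m_star, V_star, a1, b1 = nig_posterior(X, y, m0, V0, a0, b0)
    ev = evalue_mc(m_star, V_star, a1, b1, s, M=M)
    f_H, f_A = prior_predictives(X, m0, V0, a0, b0, s)
    y_H, y_A = f_H.rvs(size=M), f_A.rvs(size=M)
    p_val, alpha_star, _ = pvalue_and_adaptive_level(f_H, f_A, y, y_H, y_A)
    return ev, p_val, alpha_star
```

The e-value would then be compared with the adaptive cutoff $k^*$ from the grid-search sketch, and the P-value with alpha_star.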

5.1. Budget Shares of British Households Dataset

We select a dataset that draws 1519 observations from the 1980–1982 British Family Expenditure Surveys (FES) ([25]). In our application, we want to fit the model
$$ y_i = \theta_1 + \theta_2 x_{i1} + \theta_3 x_{i2} + \theta_4 x_{i3} + \theta_5 x_{i4} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2). $$
We consider as explanatory variables, respectively, the total net household income, rounded to the nearest 10 UK pounds sterling ($x_1$), the budget share for alcohol expenditure ($x_2$), the budget share for fuel expenditure ($x_3$), and the age of the household head ($x_4$). We take the budget share for food expenditure as the dependent variable ($y$). All the expenditures and income are measured in pounds sterling per week.
Table 7 summarizes the results for the hypotheses $H: \theta_j = 0$, $j = 1, \ldots, 5$, obtained by performing the test with the p-value at the 0.05 significance level and also with the e-value and the P-value with their respective adaptive significance levels. The cases where $H$ is rejected are represented by the cells in boldface. $\hat{\theta}_{Freq}$ and $\hat{\theta}_{Bayes}$ are, respectively, the classical maximum likelihood estimator and the Bayes estimator of $\theta$. It can be seen that, unlike the p-value, the e-value and the P-value do not reject the hypothesis of nullity of the coefficient associated with the age of the household head.
Table 8 shows the optimal averaged error probabilities using the e-value and the P-value. It can be noted that the values are very similar for both methodologies.

5.2. Boston Housing Dataset

We also take a dataset that contains information about housing values obtained from census tracts in the Boston Standard Metropolitan Statistical Area (SMSA) in 1970 ([26]). These data are composed of 506 samples and 14 variables. The regression model we use is
$$ y_i = \theta_1 + \theta_2 x_{i1} + \theta_3 x_{i2} + \theta_4 x_{i3} + \theta_5 x_{i4} + \theta_6 x_{i5} + \theta_7 x_{i6} + \theta_8 x_{i7} + \theta_9 x_{i8} + \theta_{10} x_{i9} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2). $$
We choose the following explanatory variables to fit our model: the per capita crime rate by town ($x_1$), the proportion of residential land zoned for lots over 25,000 sq. ft ($x_2$), the proportion of non-retail business acres per town ($x_3$), the proportion of non-retail business acres per town ($x_4$), the average number of rooms per dwelling ($x_5$), the proportion of owner-occupied units built prior to 1940 ($x_6$), the weighted mean of the distances to five Boston employment centers ($x_7$), the full-value property tax rate per USD 10,000 ($x_8$), the pupil–teacher ratio by town, and $1000(B_k - 0.63)^2$, where $B_k$ is the proportion of black people by town ($x_9$). The dependent variable is the median value of the owner-occupied homes (in thousands of USD) in the census tract ($y$).
The results for the hypotheses $H: \theta_j = 0$, $j = 1, \ldots, 10$, obtained by performing the test with the p-value, the e-value and the P-value, are summarized in Table 9. In this case, the e-value rejects fewer of the null hypotheses. The e-value does not reject the hypotheses of nullity of the coefficients associated with the proportion of residential land zoned for lots over 25,000 sq. ft and the proportion of non-retail business acres per town, while the p-value does. On the other hand, the P-value, unlike the p-value, does not reject the hypothesis for the proportion of residential land zoned for lots over 25,000 sq. ft, but it does reject it for the intercept. As can be observed in Table 10, for these data, the optimal averaged error probabilities are also very close.

6. Conclusions

In this work, we present a method to find a cutoff value $k^*$ for the Bayesian evidence in the FBST by minimizing the linear combination of the averaged type-I and type-II error probabilities for a given sample size $n$ and a given dimensionality $d$ of the parameter space, in the context of linear regression models with unknown variance under the Bayesian perspective. In that sense, we provide a solution to the existing problem in the usual approach of hypothesis-testing procedures based on fixed cutoffs for measures of evidence: the increase of the sample size leads to the rejection of the null hypothesis. Furthermore, we compare our results with those obtained by using the test proposed by the authors of [13,14]. With our suggested cutoff value for the evidence in the FBST, and also with the procedure proposed by the authors of [13,14], increasing the sample size implies that the probabilities of both kinds of optimal averaged errors and their linear combination decrease, unlike most cases, where, by setting a single level of significance independent of the sample size, only the type-II error probability decreases.
A detailed study is still needed for more complex models, so that the methodology we propose to determine the adaptive cutoff value for the evidence in the FBST can be extended to models with different prior specifications, which would involve, among other things, using approximate methods to find the prior predictive densities under the null and alternative hypotheses.

Author Contributions

Conceptualization, A.E.P.H., V.F., L.G.E. and C.A.d.B.P.; Methodology, A.E.P.H., V.F., L.G.E. and C.A.d.B.P.; Formal analysis, A.E.P.H., V.F. and L.G.E.; Investigation, A.E.P.H. and V.F.; Writing—review & editing, A.E.P.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The real datasets are freely available in the Ecdat package ([27]) (BudgetUK dataset) and the MASS package ([28]) (Boston dataset) of R software ([29]).

Acknowledgments

The first author gratefully acknowledges financial support from the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) from Brazil and the Ministerio de Ciencia Tecnología e Innovación (Minciencias) from Colombia. The authors are grateful to the editor and referees for helpful comments and suggestions which have led to an improvement of this article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

As stated in Section 2, the normal linear regression model in (1) shows that the conditional distribution of $y$ given the parameters $(\theta, \sigma^2)$ is the multivariate normal distribution $N_n(X\theta, \sigma^2 I_n)$. Therefore, the likelihood becomes
$$ f(y \mid \theta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} (y - X\theta)'(y - X\theta) \right\}. $$
The natural conjugate prior distribution of $(\theta, \sigma^2)$ is a $p$-variate normal-inverse-gamma distribution with hyperparameters $m_0$, $V_0$, $a_0$, and $b_0$, denoted by $(\theta, \sigma^2) \sim N_pIG(m_0, V_0, a_0, b_0)$ ([20,21,22]):
$$ g(\theta, \sigma^2) = \frac{b_0^{a_0}}{(2\pi)^{p/2} |V_0|^{1/2}\, \Gamma(a_0)} (\sigma^2)^{-\left(a_0 + \frac{p}{2} + 1\right)} \exp\left\{ -\frac{1}{2\sigma^2} \left[ (\theta - m_0)' V_0^{-1} (\theta - m_0) + 2 b_0 \right] \right\}, $$
such that the conditional prior distribution of $\theta$ given $\sigma^2$ is
$$ g(\theta \mid \sigma^2) = (2\pi)^{-p/2} |V_0|^{-1/2} (\sigma^2)^{-p/2} \exp\left\{ -\frac{1}{2\sigma^2} (\theta - m_0)' V_0^{-1} (\theta - m_0) \right\}, $$
and the prior marginal distribution of $\sigma^2$ is
$$ g(\sigma^2) = \frac{b_0^{a_0}}{\Gamma(a_0)} (\sigma^2)^{-(a_0 + 1)} \exp\left\{ -\frac{b_0}{\sigma^2} \right\}, $$
denoted, respectively, by
$$ \theta \mid \sigma^2 \sim N_p(m_0, \sigma^2 V_0), \qquad \sigma^2 \sim IG(a_0, b_0). $$
Both distributions are equivalent to the following new pair of distributions:
$$ g(\sigma^2 \mid \theta) = \frac{\left[ b_0 + \tfrac{1}{2} (\theta - m_0)' V_0^{-1} (\theta - m_0) \right]^{a_0 + \frac{p}{2}}}{\Gamma\!\left( a_0 + \frac{p}{2} \right)} (\sigma^2)^{-\left(a_0 + \frac{p}{2} + 1\right)} \exp\left\{ -\frac{1}{2\sigma^2} \left[ (\theta - m_0)' V_0^{-1} (\theta - m_0) + 2 b_0 \right] \right\}, $$
and
$$ g(\theta) = \frac{(2 b_0)^{a_0}\, \Gamma\!\left( a_0 + \frac{p}{2} \right)}{\pi^{p/2} |V_0|^{1/2}\, \Gamma(a_0)} \left[ (\theta - m_0)' V_0^{-1} (\theta - m_0) + 2 b_0 \right]^{-\left(a_0 + \frac{p}{2}\right)} \propto \left[ 1 + (\theta - m_0)' (2 b_0 V_0)^{-1} (\theta - m_0) \right]^{-\left(a_0 + \frac{p}{2}\right)}. $$
The density in (A7) is a $p$-variate $t$ distribution with $2a_0$ degrees of freedom and hyperparameters $m_0$ and $(b_0 / a_0) V_0$. Then, the distributions in (A6) and (A7) are denoted by
$$ \sigma^2 \mid \theta \sim IG\!\left( a_0 + \frac{p}{2},\; b_0 + \frac{(\theta - m_0)' V_0^{-1} (\theta - m_0)}{2} \right), \qquad \theta \sim t_p\!\left( 2a_0;\; m_0,\; \frac{b_0}{a_0} V_0 \right). $$
Now suppose that the $N_pIG(m_0, V_0, a_0, b_0)$ distribution (A2) is adopted as the prior distribution for $(\theta, \sigma^2)$. Combining it with the likelihood (A1) gives the posterior distribution ([20,21,22]):
$$ f(\theta, \sigma^2 \mid y) \propto (\sigma^2)^{-\left(a_0 + \frac{n}{2} + \frac{p}{2} + 1\right)} \exp\left\{ -\frac{1}{2\sigma^2} \left[ (\theta - m_*)' V_*^{-1} (\theta - m_*) + 2 b_1 \right] \right\}, $$
where
$$ V_* = \left( V_0^{-1} + X'X \right)^{-1}, \qquad m_* = V_* \left( V_0^{-1} m_0 + X'y \right), $$
$$ a_1 = a_0 + \frac{n}{2}, \qquad b_1 = b_0 + \frac{m_0' V_0^{-1} m_0 + y'y - m_*' V_*^{-1} m_*}{2}. $$
If $X'X$ is non-singular, we can write
$$ m_* = V_* \left( V_0^{-1} m_0 + X'X \hat{\theta} \right), $$
where $\hat{\theta} = (X'X)^{-1} X'y$ is the classical maximum likelihood (least squares) estimator of $\theta$. Therefore, the posterior distribution of $(\theta, \sigma^2)$ is
$$ (\theta, \sigma^2) \mid y \sim N_pIG(m_*, V_*, a_1, b_1). $$
Consequently,
$$ \theta \mid \sigma^2, y \sim N_p(m_*, \sigma^2 V_*), \qquad \sigma^2 \mid y \sim IG(a_1, b_1), $$
which is equivalent to
$$ \sigma^2 \mid \theta, y \sim IG\!\left( a_1 + \frac{p}{2},\; b_1 + \frac{(\theta - m_*)' V_*^{-1} (\theta - m_*)}{2} \right), $$
$$ \theta \mid y \sim t_p\!\left( 2a_1;\; m_*,\; \frac{b_1}{a_1} V_* \right). $$
Consider now conditional distributions given a partial specification of $\theta$. First, let $\theta = (\theta_1', \theta_2')'$ and consider distributions conditional on $\theta_2$. Suppose that $(\theta, \sigma^2) \sim N_pIG(m_0, V_0, a_0, b_0)$. The corresponding posterior distributions result if we change $a_0$ to $a_1$, $b_0$ to $b_1$, $m_0$ to $m_*$, and $V_0$ to $V_*$. If $\theta_1$ has $s$ elements and $\theta_2$ has $r$ elements, write
$$ m_0 = \begin{pmatrix} m_{0\,1} \\ m_{0\,2} \end{pmatrix}, \qquad V_0 = \begin{pmatrix} V_{0\,11} & V_{0\,12} \\ V_{0\,21} & V_{0\,22} \end{pmatrix}, $$
where $m_{0\,1}$ is $s \times 1$, $V_{0\,11}$ is $s \times s$, $m_{0\,2}$ is $r \times 1$, and $V_{0\,22}$ is $r \times r$, with $r = p - s$. Now, since $\theta$ given $\sigma^2$ is distributed as $N_p(m_0, \sigma^2 V_0)$, using general results on multivariate normal distributions (see [30]), we have the following distributions:
$$ \theta_2 \mid \sigma^2 \sim N_r(m_{0\,2}, \sigma^2 V_{0\,22}), $$
$$ (\theta_1 \mid \theta_2, \sigma^2) \sim N_s\!\left( m_{0\,1.2}(\theta_2),\; \sigma^2 V_{0\,11.2} \right), $$
where $m_{0\,1.2}(\theta_2) = m_{0\,1} + V_{0\,12} V_{0\,22}^{-1} (\theta_2 - m_{0\,2})$ and $V_{0\,11.2} = V_{0\,11} - V_{0\,12} V_{0\,22}^{-1} V_{0\,21}$.
From (A13) and the prior distribution of $\sigma^2$, we have that
$$ (\theta_2, \sigma^2) \sim N_rIG(m_{0\,2}, V_{0\,22}, a_0, b_0), $$
and hence
$$ \theta_2 \sim t_r\!\left( 2a_0;\; m_{0\,2},\; \frac{b_0}{a_0} V_{0\,22} \right), $$
$$ \sigma^2 \mid \theta_2 \sim IG\!\left( a_0 + \frac{r}{2},\; b_0 + \frac{(\theta_2 - m_{0\,2})' V_{0\,22}^{-1} (\theta_2 - m_{0\,2})}{2} \right). $$
Now, (A14) and (A17) together give
$$ (\theta_1, \sigma^2 \mid \theta_2) \sim N_sIG\!\left( m_{0\,1.2}(\theta_2),\; V_{0\,11.2},\; a_0 + \frac{r}{2},\; b_0 + \frac{(\theta_2 - m_{0\,2})' V_{0\,22}^{-1} (\theta_2 - m_{0\,2})}{2} \right), $$
and finally
$$ \theta_1 \mid \theta_2 \sim t_s\!\left( 2a_0 + r;\; m_{0\,1.2}(\theta_2),\; \frac{2 b_0 + (\theta_2 - m_{0\,2})' V_{0\,22}^{-1} (\theta_2 - m_{0\,2})}{2 a_0 + r}\, V_{0\,11.2} \right). $$
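A quick Monte Carlo sanity check (our own, not part of the paper) that the $\theta$-marginal stated in (A8) is consistent with samples drawn from the hierarchical prior:

```python
import numpy as np
from scipy.stats import invgamma

# Draw (theta, sigma^2) from the N_pIG prior and compare the sample covariance of
# theta with the covariance implied by t_p(2a0; m0, (b0/a0) V0), which is
# (2a0 / (2a0 - 2)) * (b0 / a0) * V0 for a0 > 1.
rng = np.random.default_rng(0)
p, a0, b0 = 2, 3.0, 2.0
m0, V0 = np.zeros(p), np.eye(p)
M = 200_000
sig2 = invgamma.rvs(a0, scale=b0, size=M, random_state=rng)
z = rng.multivariate_normal(np.zeros(p), V0, size=M)
theta = m0 + np.sqrt(sig2)[:, None] * z
print(np.cov(theta.T))                              # empirical covariance
print((2 * a0 / (2 * a0 - 2)) * (b0 / a0) * V0)     # covariance implied by (A8)
```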

Appendix B

Let $f_H$ and $f_A$ be the Bayesian prior predictive densities under the respective hypotheses $H$ and $A$ described in Section 3.1. Both are probability density functions over the sample space $\Omega$, and they are calculated as the following conditional expectations:
$$ f_H(y) = E_{\xi}\left[ f(y \mid \xi) \mid H \right] = \int_{\Xi_H} f(y \mid \xi)\, dP_H(\xi) = \int_{\Xi_H} f(y \mid \theta_1, \theta_2, \sigma^2)\, g_H(\theta_1, \theta_2, \sigma^2)\, d\theta_1\, d\theta_2\, d\sigma^2, $$
where $g_H(\theta_1, \theta_2, \sigma^2)$ is the prior density under $H$, calculated as
$$ g_H(\theta_1, \theta_2, \sigma^2) = \frac{g(\theta_1, \theta_2, \sigma^2)\, \mathbb{1}(\theta_2 = 0)}{\int_{\Xi_H} g(\theta_1, \theta_2, \sigma^2)\, d\theta_1\, d\theta_2\, d\sigma^2} = \frac{g(\theta_1, \theta_2, \sigma^2)\, \mathbb{1}(\theta_2 = 0)}{\int_{\mathbb{R}^s \times \mathbb{R}^+} g(\theta_1, \theta_2 = 0, \sigma^2)\, d\theta_1\, d\sigma^2} = g(\theta_1, \sigma^2 \mid \theta_2 = 0). $$
Thus, $f_H(y)$ is given by
$$ f_H(y) = \int_{\Xi_H} f(y \mid \theta_1, \theta_2, \sigma^2)\, g_H(\theta_1, \theta_2, \sigma^2)\, d\theta_1\, d\theta_2\, d\sigma^2 = \int_{\mathbb{R}^s \times \mathbb{R}^+} f(y \mid \theta_1, \theta_2 = 0, \sigma^2)\, g(\theta_1, \sigma^2 \mid \theta_2 = 0)\, d\theta_1\, d\sigma^2 $$
$$ = \int_{\mathbb{R}^s \times \mathbb{R}^+} N_n(XC\theta_1, \sigma^2 I_n) \times N_sIG\!\left( m_{0\,1.2}(0),\; V_{0\,11.2},\; a_0 + \frac{r}{2},\; b_0 + \frac{m_{0\,2}' V_{0\,22}^{-1} m_{0\,2}}{2} \right) d\theta_1\, d\sigma^2 $$
$$ = t_n\!\left( 2a_0 + r;\; XC\, m_{0\,1.2}(0),\; \frac{b_0 + \tfrac{1}{2}\, m_{0\,2}' V_{0\,22}^{-1} m_{0\,2}}{a_0 + \tfrac{r}{2}} \left[ I_n + (XC)\, V_{0\,11.2}\, (XC)' \right] \right), $$
where $C_{(s+r) \times s} = [I_s \;\; 0_{s \times r}]'$.
The prior predictive density under $A$ can be obtained as follows:
$$ f_A(y) = E_{\xi}\left[ f(y \mid \xi) \mid A \right] = \int_{\Xi_A} f(y \mid \xi)\, dP_A(\xi) = \int_{\Xi_A} f(y \mid \theta, \sigma^2)\, g_A(\theta, \sigma^2)\, d\theta\, d\sigma^2 = \int_{\Xi_A} f(y \mid \theta, \sigma^2)\, g(\theta, \sigma^2)\, d\theta\, d\sigma^2 $$
$$ = \int N_n(X\theta, \sigma^2 I_n) \times N_pIG(m_0, V_0, a_0, b_0)\, d\theta\, d\sigma^2 = t_n\!\left( 2a_0;\; X m_0,\; \frac{b_0}{a_0} \left[ I_n + X V_0 X' \right] \right), $$
where $P_H$ and $P_A$ are the prior probability measures of $\xi$ restricted to the sets $\Xi_H$ and $\Xi_A$, respectively.

References

  1. Pereira, C.A.B.; Stern, J.M. Evidence and Credibility: Full Bayesian Significance Test for Precise Hypotheses. Entropy 1999, 1, 99–110.
  2. Borges, W.; Stern, J.M. The Rules of Logic Composition for the Bayesian Epistemic e-Values. Logic J. IGPL 2007, 15, 401–420.
  3. Stern, J.M. Cognitive constructivism, eigen-solutions, and sharp statistical hypotheses. Cybern. Hum. Knowing 2007, 14, 9–46.
  4. Madruga, R.; Esteves, L.G.; Wechsler, S. On the Bayesianity of Pereira-Stern tests. TEST 2001, 10, 291–299.
  5. Kempthorne, O.; Folks, L. Probability, Statistics, and Data Analysis; Iowa State University Press: Ames, IA, USA, 1971.
  6. Cox, D.R.; Spjøtvoll, E.; Johansen, S.; van Zwet, W.R.; Bithell, J.F.; Barndorff-Nielsen, O.; Keuls, M. The Role of Significance Tests [with Discussion and Reply]. Scand. J. Stat. 1978, 4, 49–70.
  7. Lindley, D.V.; Barndorff-Nielsen, O.; Gustav, E.; Harsaae, E.; Thorburn, D.; Hald, A.; Spjötvoll, E. The Bayesian Approach [with Discussion and Reply]. Scand. J. Stat. 1978, 5, 1–26.
  8. Cox, D.R. Statistical significance tests. Br. J. Clin. Pharmacol. 1982, 14, 325–331.
  9. Johnstone, D.J.; Lindley, D.V. Bayesian inference given data ‘significant at α’: Tests of point hypotheses. Theor. Decis. 1995, 38, 51–60.
  10. Oliveira, M.C. Definition of the Level of Significance as a Function of the Sample Size. Master's Dissertation, Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil, 2014. (In Portuguese)
  11. Pereira, C.A.B. Test of Hypotheses Defined in Spaces of Different Dimensions: Bayesian View and Classical Interpretation. Ph.D. Thesis, Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil, 1985. (In Portuguese)
  12. DeGroot, M.H. Probability and Statistics, 2nd ed.; Addison-Wesley: Boston, MA, USA, 1986.
  13. Pereira, C.A.B.; Nakano, E.Y.; Fossaluza, V.; Esteves, L.G.; Gannon, M.A.; Polpo, A. Hypothesis tests for Bernoulli experiments: Ordering the sample space by Bayes factors and using adaptive significance levels for decisions. Entropy 2017, 19, 696.
  14. Gannon, M.A.; Pereira, C.A.B.; Polpo, A. Blending Bayesian and classical tools to define optimal sample-size-dependent significance levels. Am. Stat. 2019, 73 (Suppl. S1), 213–222.
  15. Montoya-Delgado, L.E.; Irony, T.Z.; Pereira, C.A.B.; Whittle, M.R. An unconditional exact test for the Hardy–Weinberg equilibrium law: Sample-space ordering using the Bayes factor. Genetics 2001, 158, 875–883.
  16. Irony, T.Z.; Pereira, C.A.B. Bayesian hypothesis test: Using surface integrals to distribute prior information among the hypotheses. Resenhas Inst. Matemática Estatística Univ. São Paulo 1995, 2, 27–46.
  17. Pereira, C.A.B.; Wechsler, S. On the concept of P-value. Braz. J. Probab. Stat. 1993, 7, 159–177.
  18. Esteves, L.G.; Izbicki, R.; Stern, J.M.; Stern, R.B. Pragmatic Hypotheses in the Evolution of Science. Entropy 2019, 21, 883.
  19. Schervish, M.J. Theory of Statistics; Springer: New York, NY, USA, 1995.
  20. O'Hagan, A.; Forster, J.J. Kendall's Advanced Theory of Statistics, Volume 2B: Bayesian Inference; Arnold: London, UK, 2004.
  21. Box, G.E.P.; Tiao, G.C. Bayesian Inference in Statistical Analysis; Wiley: New York, NY, USA, 1973.
  22. DeGroot, M.H. Optimal Statistical Decisions; McGraw-Hill: New York, NY, USA, 1970.
  23. Madruga, R.; Pereira, C.A.B. Power of FBST: Standard examples. Estadística 2005, 57, 1–9.
  24. Berger, J.O.; Delampady, M. Testing Precise Hypotheses. Stat. Sci. 1987, 2, 317–335.
  25. Blundell, R.; Duncan, A.; Pendakur, K. Semiparametric estimation and consumer demand. J. Appl. Econom. 1998, 13, 435–461.
  26. Harrison, D.; Rubinfeld, D.L. Hedonic Housing Prices and the Demand for Clean Air. J. Environ. Econ. Manag. 1978, 5, 81–102.
  27. Croissant, Y.; Graves, S. Ecdat: Data Sets for Econometrics. R Package Version 0.4-2. 2022. Available online: https://CRAN.R-project.org/package=Ecdat (accessed on 26 September 2022).
  28. Venables, W.N.; Ripley, B.D. Modern Applied Statistics with S, 4th ed.; Springer: New York, NY, USA, 2002. Available online: https://www.stats.ox.ac.uk/pub/MASS4/ (accessed on 26 September 2022).
  29. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022. Available online: https://www.R-project.org/ (accessed on 26 September 2022).
  30. Mardia, K.V.; Kent, J.T.; Bibby, J.M. Multivariate Analysis; Academic Press: London, UK, 1979.
Figure 1. Averaged error probabilities ($\alpha_{\varphi_e}$, $\beta_{\varphi_e}$ and $\alpha_{\varphi_e} + \beta_{\varphi_e}$) as functions of k. Sample size n = 100.
Figure 2. Cutoff values $k^*$ for $ev(H; y)$ as a function of n, with d = 2 and d = 3.
Figure 3. Optimal averaged type-I error probability ($\alpha_{\varphi_*}$) as a function of n, with d = 2 and d = 3.
Figure 4. Unknown-variance model optimal averaged error probabilities ($\alpha^{**}_{\varphi_e}$, $\beta^{**}_{\varphi_e}$ and $\alpha^{**}_{\varphi_e} + \beta^{**}_{\varphi_e}$) as functions of n.
Figure 5. Optimal averaged error probabilities ($\alpha_{\varphi_*}$, $\beta_{\varphi_*}$ and $\alpha_{\varphi_*} + \beta_{\varphi_*}$) as functions of n.
Figure 6. Unknown-variance model cutoff values $k^*$ for $ev(H; y)$ as a function of d, with n = 60 and n = 120.
Figure 7. Optimal averaged type-I error probability ($\alpha_{\varphi_*}$) as a function of d, with n = 60 and n = 120.
Table 1. Cutoff values $k^*$ for $ev(H; y)$ as a function of n, with d = 2 and d = 3.

n      k* (d = 2)   k* (d = 3)
10     0.32530      0.51220
50     0.12534      0.22442
100    0.11705      0.21081
150    0.10889      0.19735
200    0.10092      0.18416
250    0.09323      0.17132
300    0.08587      0.15894
350    0.07893      0.14713
400    0.07243      0.13598
450    0.06641      0.12560
500    0.06091      0.11606
1000   0.03035      0.06689
1500   0.02223      0.07086
2000   0.01892      0.07173
Table 2. Optimal averaged type-I error probability ($\alpha_{\varphi_*}$) as a function of n, with d = 2 and d = 3.

n      α_φ* (d = 2)   α_φ* (d = 3)
10     0.12400        0.09200
50     0.04515        0.04327
100    0.03899        0.03775
150    0.03327        0.03252
200    0.02817        0.02772
250    0.02380        0.02341
300    0.02018        0.01963
350    0.01732        0.01642
400    0.01513        0.01376
450    0.01353        0.01163
500    0.01241        0.01002
1000   0.00941        0.00683
1500   0.00827        0.00398
2000   0.00681        0.00524
Table 3. Cutoff values $k^*$, $ev(H; y_0)$ and P-value$(y_0)$ as functions of n, with d = 2 and d = 3.

       d = 2                                 d = 3
n      k*      ev      α*_φP   Pv            k*      ev      α*_φP   Pv
10     0.3253  0.9838  0.1240  0.7510        0.5122  0.9696  0.0920  0.4850
50     0.1253  0.0820  0.0451  0.0190        0.2244  0.9261  0.0433  0.3570
100    0.1171  0.0000  0.0390  0.0000        0.2108  0.4176  0.0377  0.0650
150    0.1089  0.0973  0.0333  0.0200        0.1974  0.2965  0.0325  0.0510
200    0.1009  0.0036  0.0282  0.0000        0.1842  0.0466  0.0277  0.0040
250    0.0932  0.0001  0.0238  0.0000        0.1713  0.0620  0.0234  0.0050
300    0.0859  0.0000  0.0202  0.0000        0.1589  0.0119  0.0196  0.0010
350    0.0789  0.0000  0.0173  0.0000        0.1471  0.0282  0.0164  0.0030
400    0.0724  0.0000  0.0151  0.0000        0.1360  0.0347  0.0138  0.0020
450    0.0664  0.0000  0.0135  0.0000        0.1256  0.0628  0.0116  0.0040
500    0.0609  0.0000  0.0124  0.0000        0.1161  0.0181  0.0100  0.0010
1000   0.0303  0.0000  0.0094  0.0000        0.0669  0.0000  0.0068  0.0010
1500   0.0222  0.0000  0.0083  0.0000        0.0709  0.0000  0.0040  0.0010
2000   0.0189  0.0000  0.0068  0.0000        0.0717  0.0000  0.0052  0.0010
Table 4. Unknown-variance model cutoff values $k^*$ for $ev(H; y)$ as a function of d, with n = 60 and n = 120.

d    k* (n = 60)   k* (n = 120)
2    0.18500       0.08560
3    0.20420       0.19480
4    0.31510       0.39630
5    0.47790       0.49500
6    0.57670       0.53040
7    0.79970       0.67400
8    0.82970       0.70490
9    0.91250       0.80310
10   0.94540       0.92770
11   0.97300       0.92940
21   0.99990       0.99960
31   0.99990       0.99970
41   0.99990       0.99990
51   0.99990       0.99990
Table 5. Optimal averaged type-I error probability ($\alpha_{\varphi_*}$) as a function of d, with n = 60 and n = 120.

d    α_φ* (n = 60)   α_φ* (n = 120)
2    0.03700         0.02100
3    0.03300         0.03800
4    0.03700         0.03600
5    0.04100         0.03800
6    0.04800         0.03300
7    0.04400         0.03500
8    0.04600         0.03100
9    0.05000         0.03600
10   0.04500         0.03900
11   0.04600         0.04000
21   0.05100         0.03700
31   0.05300         0.03700
41   0.07200         0.03600
51   0.12600         0.04100
Table 6. Cutoff values $k^*$, $ev(H; y_0)$ and P-value$(y_0)$ as functions of d, with n = 60 and n = 120.

     n = 60                                  n = 120
d    k*      ev      α*_φP   Pv              k*      ev      α*_φP   Pv
2    0.1850  0.6865  0.0370  0.3660          0.0856  0.0082  0.0210  0.0010
3    0.2042  0.5849  0.0330  0.1360          0.1948  0.7199  0.0380  0.1760
4    0.3151  0.8119  0.0370  0.1820          0.3963  0.9230  0.0360  0.2470
5    0.4779  0.0000  0.0410  0.0000          0.4950  0.0000  0.0380  0.0010
6    0.5767  0.5672  0.0480  0.0290          0.5304  0.7002  0.0330  0.0360
7    0.7997  0.8854  0.0440  0.0820          0.6740  0.9992  0.0350  0.2860
8    0.8297  0.3267  0.0460  0.0050          0.7049  0.7858  0.0310  0.0260
9    0.9125  0.1919  0.0500  0.0020          0.8031  0.0009  0.0360  0.0010
10   0.9454  0.0006  0.0450  0.0010          0.9277  0.0001  0.0390  0.0010
11   0.9730  0.0000  0.0460  0.0000          0.9294  0.0000  0.0400  0.0000
21   0.9999  0.0000  0.0510  0.0000          0.9996  0.0000  0.0370  0.0000
31   0.9999  1.0000  0.0530  0.0240          0.9997  0.0495  0.0370  0.0010
41   0.9999  0.9998  0.0720  0.0010          0.9999  0.0004  0.0360  0.0010
51   0.9999  1.0000  0.1260  0.0000          0.9999  0.0000  0.0410  0.0000
Table 7. Budget shares of British households dataset hypothesis-testing summary.

Coefficient   θ̂_Freq    α        pv       θ̂_Bayes   k*       ev       α*_φP    Pv
Intercept     0.3758    0.0500   0.0000   0.3700    0.7078   0.0000   0.0382   0.0000
x_i1          −0.0004   0.0500   0.0000   −0.0004   0.0113   0.0000   0.0001   0.0000
x_i2          −0.1533   0.0500   0.0003   −0.1283   0.9410   0.1890   0.1278   0.0172
x_i3          0.1717    0.0500   0.0007   0.1487    0.9520   0.1957   0.1468   0.0143
x_i4          0.0009    0.0500   0.0119   0.0010    0.0764   0.3048   0.0004   0.0666
Table 8. Budget shares of British households dataset optimal averaged error probabilities.

Coefficient   α**_φe   α*_φP    β**_φe   β*_φP
Intercept     0.0466   0.0382   0.2157   0.2193
x_i1          0.0000   0.0001   0.0006   0.0006
x_i2          0.1521   0.1278   0.4146   0.4145
x_i3          0.1508   0.1468   0.4679   0.4410
x_i4          0.0004   0.0004   0.0080   0.0083
Table 9. Boston housing dataset hypothesis-testing summary.

Coefficient   θ̂_Freq    α        pv       θ̂_Bayes   k*       ev       α*_φP    Pv
Intercept     1.7035    0.0500   0.6958   1.2035    0.9998   1.0000   0.1916   0.0085
x_i1          −0.1244   0.0500   0.0006   −0.1244   0.5780   0.3365   0.0010   0.0001
x_i2          0.0359    0.0500   0.0224   0.0362    0.4089   0.9012   0.0004   0.0025
x_i3          −0.1489   0.0500   0.0235   −0.1473   0.6390   0.9114   0.0025   0.0023
x_i4          6.7165    0.0500   0.0000   6.7336    0.9296   0.0000   0.0143   0.0000
x_i5          −0.0655   0.0500   0.0000   −0.0648   0.3275   0.0141   0.0001   0.0000
x_i6          −1.3198   0.0500   0.0000   −1.3091   0.8146   0.0001   0.0095   0.0000
x_i7          −0.0030   0.0500   0.2324   −0.0030   0.0124   0.9996   0.0002   0.0198
x_i8          −0.7652   0.0500   0.0000   −0.7528   0.8223   0.0003   0.0053   0.0000
x_i9          0.0145    0.0500   0.0000   0.0147    0.0297   0.0113   0.0001   0.0000
Table 10. Boston housing dataset optimal averaged error probabilities.

Coefficient   α**_φe   α*_φP    β**_φe   β*_φP
Intercept     0.1321   0.1916   0.6494   0.4946
x_i1          0.0018   0.0010   0.0165   0.0173
x_i2          0.0006   0.0004   0.0075   0.0079
x_i3          0.0030   0.0025   0.0286   0.0292
x_i4          0.0222   0.0143   0.1123   0.1181
x_i5          0.0000   0.0001   0.0068   0.0068
x_i6          0.0091   0.0095   0.0825   0.0808
x_i7          0.0000   0.0002   0.0016   0.0015
x_i8          0.0081   0.0053   0.0494   0.0521
x_i9          0.0000   0.0001   0.0019   0.0017