Article

A Semiparametric Tilt Optimality Model

by Chathurangi H. Pathiravasan 1 and Bhaskar Bhattacharya 2,*

1 Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
2 School of Mathematical and Statistical Sciences, Southern Illinois University Carbondale, Carbondale, IL 62901, USA
* Author to whom correspondence should be addressed.
Stats 2023, 6(1), 1-16; https://doi.org/10.3390/stats6010001
Submission received: 14 November 2022 / Revised: 11 December 2022 / Accepted: 14 December 2022 / Published: 22 December 2022
(This article belongs to the Section Statistical Methods)

Abstract

Practitioners often face the situation of comparing any set of k distributions, which may satisfy neither normality nor equality of variances. We propose a semiparametric model to compare such distributions using an exponential tilt method. This extends the classical analysis of variance models, when all distributions are unknown, by relaxing its assumptions. The proposed model is optimal when one of the distributions is known. Large-sample estimates of the model parameters are derived, and the hypotheses for the equality of the distributions are tested for one-at-a-time and simultaneous comparison cases. Real data examples from NASA meteorology experiments and credit card limits are analyzed to illustrate our approach. The proposed approach is shown to be preferable in a simulated power comparison with existing parametric and nonparametric methods.

1. Introduction

Classical applied statistical techniques often depend heavily on the assumption that observations are normally distributed. The benefit of this assumption is that it helps to produce exact inferences in many popular methods, such as the t-test, F-test, chi-squared tests, analysis of variance (ANOVA) models and multivariate analysis. In reality, however, observations often show departures from normality (or near-normality). Statistical texts address this issue and discuss remedies, such as the Box-Cox transformations of data to normality. Developing techniques that use fewer assumptions (normality or others) is an important area of statistical research. When comparing different populations, this paper proposes an alternative to the one-way ANOVA by relaxing some of its assumptions.
A previous study described experimental radar reflectivity data obtained from independent radars deployed during NASA's Tropical Rainfall Measuring Mission Kwajalein Experiment in the Republic of the Marshall Islands from 15 July to 12 September 1999 [1]. The data are skewed, and we investigate whether the data from the two radars are from identical populations. A credit limit data set [2] with three different education levels (graduate school, university and high school) was obtained from the University of California Irvine repository. The data are skewed with high variability, and we study whether the three credit limit populations are identical. Such data have unknown statistical distributions, and standard statistical procedures may fail to work properly.
If $g_i(x)$ is the probability density function (pdf) of $N(\mu_i, \sigma^2)$ for $i = 1, \ldots, k$ [3], then one can write
$$g_i(x) = g_k(x)\exp(\alpha_i + \beta_i x), \qquad i = 1, \ldots, k-1, \tag{1}$$
where
$$\alpha_i = \frac{\mu_k^2 - \mu_i^2}{2\sigma^2}, \qquad \beta_i = \frac{\mu_i - \mu_k}{\sigma^2}, \qquad i = 1, \ldots, k-1. \tag{2}$$
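As a quick numerical check of (1)–(2), the following R snippet (our illustration, not from the paper; the values $\mu_1 = 1$, $\mu_2 = 0$, $\sigma^2 = 4$ and the grid are arbitrary) confirms that the ratio of two normal densities is exactly the stated exponential tilt:

```r
# Check of (1)-(2) for k = 2: g_1(x) = g_2(x) * exp(alpha + beta * x),
# with illustrative values mu1 = 1, mu2 = 0, sigma^2 = 4.
mu1 <- 1; mu2 <- 0; s2 <- 4
alpha <- (mu2^2 - mu1^2) / (2 * s2)   # alpha_1 from (2)
beta  <- (mu1 - mu2) / s2             # beta_1 from (2)
x <- seq(-4, 4, by = 0.5)
max(abs(dnorm(x, mu1, sqrt(s2)) - dnorm(x, mu2, sqrt(s2)) * exp(alpha + beta * x)))
# effectively zero: the two sides agree up to rounding
```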
From (1), denoting $g_k(x)$ as a reference distribution, one can think of $g_i(x)$ as an exponential distortion, or tilt, of the reference. Furthermore, using (2), the test of equality of the $\mu_i$'s (as needed in the one-way ANOVA) reduces to the test of equality of the $\beta_i$'s to zero, or
$$H_0: \mu_1 = \cdots = \mu_k \iff H_0: \beta_1 = \cdots = \beta_{k-1} = 0. \tag{3}$$
Motivated by (3), the same authors proposed a generalization of (1) by replacing $g_k(x)$ with any pdf $g(x)$, replacing $x$ in the exponent of (1) with any known function $h(x)$, and considering all pdfs $g_i(x)$ that can be expressed as exponential tilts of $g(x)$. In this way, (1)–(2) are updated as
$$g_i(x) = g(x)\exp\big(\alpha_i + \beta_i h(x)\big), \tag{4}$$
where
$$\alpha_i = -\ln\int e^{\beta_i h(x)}\, g(x)\,dx, \qquad i = 1, \ldots, k-1, \tag{5}$$
for some $\beta_i \in \mathbb{R}$. Then, $H_0: \beta_1 = \cdots = \beta_{k-1} = 0$ can also be used to test for the equality of any $k$ pdfs $g(x), g_i(x), 1 \le i \le k-1$ satisfying (4), which are not necessarily normal.
In classical ANOVA, the parameters of the normal distributions are estimated using the maximum likelihood method. To estimate the $\beta_i$ parameters in (4), the authors of [3] considered the restricted class of distributions obtained by multiplicative exponential distortions with $g_k = g$ as a reference and, based on $k$ independent samples, used the profile maximum likelihood method to estimate $g, g_i, \forall i$ in that class. This paper, instead, considers the class $C$ of all distributions (which are restricted by a given mean for $h(X)$) and estimates the $\beta_i$ by minimizing the Kullback–Leibler divergence between $g(x)$ and the class $C$.
Often the criterion of comparison between distributions is clear from the context of the data, which helps to formulate the constraint set $C$. In the radar data, we wish to know whether the mean rain rate is equal for the two radars. In the credit limit data, we wish to know whether the mean credit limit is the same across the three education groups. The constraints can involve multiple criteria as well. Once the constraints are fixed in $C$, only those aspects are considered in the comparison between distributions.
This approach matches the maximum entropy (ME) principle, which may be stated as follows: when selecting a model for a given situation, it is often appropriate to express the prior information in terms of constraints. However, one must be careful that no information other than these specified constraints is used in model selection. That is, beyond the constraints that we have, the uncertainty associated with the probability distribution to be selected should be kept at its maximum [4]. In this paper, we extend the ME principle to general information projection using the Kullback–Leibler divergence.
We show in Section 2 that the solution (4) is optimal under specified constraints using $h$ when $g$ is known; in this way, the proposed approach yields an optimality interpretation for the exponential tilt models. The proposed approach extends the comparisons of means in ANOVA to comparisons of means and variances for the normal and other known distributions using duality. In Section 3, we develop a semiparametric approach when $g(x)$ is unknown and derive asymptotic test statistics for testing the equality of populations for the cases when sample sizes are equal or different.
In Section 4, we present simulation studies that evaluate the performance of the $\beta_i$ estimates with respect to the classical ANOVA methods. We also compare the test statistics developed in Section 3 with existing parametric and nonparametric procedures. Section 5 gives the details of the proposed methods for the applications with the radar data and credit limit data sets. Section 6 contains a discussion of the choice of the function(s) $h$ for particular reference distributions $g(x)$. The Appendix contains additional results and proofs.

2. Tilt Optimality Models

The Kullback–Leibler (KL) discriminant information, or divergence, is a measure of 'distance' between two probability distributions. For pdfs $f(x), g(x)$, the KL discriminant information for $f(x)$ against $g(x)$ is given by $\int f(x)\ln\frac{f(x)}{g(x)}\,dx$, which is always nonnegative, and equals 0 if and only if $f \equiv g$.
Let $C$ be a convex set of pdfs with $g \notin C$. If $f^*(x)$ is the solution to
$$\inf_{f(x) \in C}\int f(x)\ln\frac{f(x)}{g(x)}\,dx, \tag{6}$$
then $f^*(x)$ (the information projection) is the closest to $g(x)$ among all pdfs in $C$ with respect to the KL distance. Let $h_i(x)$ be arbitrary but known functions of $x$. Define $C$ to be the class of all pdfs $f$ where $E_f(h_i(X)) = 0, \forall i$; that is,
$$C = \Big\{ f(x) : \int h_i(x) f(x)\,dx = 0,\ \forall i \Big\}. \tag{7}$$
In order to solve (6), Fenchel's duality theorem can be applied as shown by [5]. The corresponding dual cone is $C^* = \{\sum_{i=1}^k \beta_i h_i(x) : \beta_i \in \mathbb{R}\}$.
The dual problem can be shown to be equivalent to
$$\inf_{\beta_i \in \mathbb{R}, \forall i}\ \int \exp\Big(\sum_{i=1}^k \beta_i h_i(x)\Big)\, g(x)\,dx. \tag{8}$$
As the dual problem (8) is a function of scalars only, it can be substantially easier to solve than the primal problem (6), depending on the form of the $h_i(x)$. In particular, setting the derivative of the integral in (8) with respect to $\beta_i$ equal to zero, we obtain
$$\int h_i(x)\exp\Big(\sum_{i=1}^k \beta_i h_i(x)\Big)\, g(x)\,dx = 0, \qquad 1 \le i \le k, \tag{9}$$
which can be solved easily for the $\beta_i$'s by the Newton–Raphson method. If the solution of (9) is $\hat\beta_i$, then, from [5], the solution of the primal problem (6), say $f^*(x)$, is of the form
$$f^*(x) = \frac{g(x)\exp\big(\sum_{i=1}^k \hat\beta_i h_i(x)\big)}{\int g(x)\exp\big(\sum_{i=1}^k \hat\beta_i h_i(x)\big)\,dx}. \tag{10}$$
When $k = 1$, setting $\hat\beta = \hat\beta_i$ and $f^*(x) = g_i(x)$, (10) can be simplified as (4). The above derivation explains the exponential structure of $g_i(x)$, and (9) verifies that the exponential tilt model $g_i(x)$ is in $C$ (see (7)). These developments are summarized in the following theorem.
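To illustrate the dual route (8)–(10), the following R sketch (ours; the reference $g = N(0,1)$ and the single constraint $h(x) = x - 1$, i.e., a fixed mean of 1, are assumptions for the example) minimizes (8) numerically; by the remark following Theorem A1 in Appendix A, the exact solution here is $\hat\beta = (\mu_1 - \mu_2)/\sigma^2 = 1$:

```r
# Minimal sketch of the dual problem (8) with one constraint h(x) = x - 1
# and reference g = N(0, 1); the solution tilts g to the mean-1 member of C.
g <- function(x) dnorm(x)                      # reference pdf
h <- function(x) x - 1                         # constraint: E_f h(X) = 0
dual_obj <- function(beta)                     # objective in (8)
  integrate(function(x) exp(beta * h(x)) * g(x), -Inf, Inf)$value
opt <- optimize(dual_obj, interval = c(-10, 10))
opt$minimum   # ~ 1, matching the closed form beta-hat = (mu1 - mu2)/sigma^2
# By (10), f*(x) is proportional to g(x) * exp(opt$minimum * h(x)), i.e., N(1, 1).
```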
Theorem 1.
When $g(x)$ is known, the exponential tilt model (4) is the optimum model in the sense that it is the closest to $g(x)$, among all probability distributions in $C$ from (7), in the KL distance.
The model (10) will be referred to in the sequel as the tilt optimality (TO) model. Note that, when $C$ in (7) specifies $h(x) = x - \mu_i$ and $g(x) = g_k(x) \sim N(\mu_k, \sigma^2)$, then the solution $g_i(x)$ is found to be $N(\mu_i, \sigma^2)$, as was seen in (1)–(2). When $g(x)$ in (6) is uniform (or Lebesgue measure), the minimizing $f^*$ is known as the maximum entropy model (distribution) in $C$ [4].
Theorem A1 in Appendix A shows that closed form solutions for the dual problem are obtained for normal distributions with constraints on both the mean and variance. However, this may not always be the case. The final solution depends on the form of $g(x)$ and the restrictions in $C$.
While (1) compares each $g_i(x)$ with $g_k(x)$ one at a time, another approach would be to compare all $k$ distributions simultaneously. For $\mathbf{x} = (x_1, \ldots, x_k)'$, let $\mathbf{X} = (X_1, \ldots, X_k)' \sim g(x) = N_k(\mu, \Sigma)$ with known $\mu = (\mu_1, \ldots, \mu_k)'$ and covariance matrix $\Sigma = (\sigma_{ij})$ with $\sigma_{ii} = \sigma_i^2$ and $\sigma_{ij} = 0$ elsewhere. Considering equality of the $k$ means (unspecified) with possibly unequal variances, we define
$$C = \{ f(x) : E(X_i) = E(X_{i+1}),\ 1 \le i \le k-1 \}. \tag{11}$$
Following (10), the solution is given by
$$f^*(x) = \frac{g(x)\, e^{\sum_{i=1}^{k-1}\hat\beta_i(x_i - x_{i+1})}}{\int g(x)\, e^{\sum_{i=1}^{k-1}\hat\beta_i(x_i - x_{i+1})}\,dx}. \tag{12}$$
However, a closed form expression is only available when $k = 3, 4$.
Theorem 2 (proof in Appendix A) considers equality of means as in the classical ANOVA (which deals with unknown means and unknown but equal variances) but with unrestricted variances.
Theorem 2.
For $k = 3$ with known reference pdf $g(x) = N_3(\mu, \Sigma)$ with the above $\Sigma$, we consider the minimization problem
$$\inf_{f(x) \in C}\int_{\mathbf{x}} f(x)\ln\frac{f(x)}{g(x)}\,dx \tag{13}$$
with $C$ in (11). The solution to (13) is given by
$$f^*(x) = g(x)\, e^{\hat\alpha_1 + \hat\beta_1(x_1 - x_2) + \hat\beta_2(x_2 - x_3)} = N_3(\mu^* \mathbf{1}, \Sigma), \tag{14}$$
where $\hat\alpha_1 = -\ln\int g(x)\, e^{\hat\beta_1(x_1 - x_2) + \hat\beta_2(x_2 - x_3)}\,dx$,
$$\mu^* = \frac{\mu_1\sigma_2^2\sigma_3^2 + \mu_2\sigma_1^2\sigma_3^2 + \mu_3\sigma_1^2\sigma_2^2}{\sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 + \sigma_2^2\sigma_3^2}, \qquad \mathbf{1} = (1, 1, 1)', \tag{15}$$
$$\hat\beta_1 = -\frac{\mu_1\sigma_2^2 + \mu_1\sigma_3^2 - \mu_2\sigma_3^2 - \mu_3\sigma_2^2}{\sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 + \sigma_2^2\sigma_3^2}, \qquad \hat\beta_2 = -\frac{\mu_1\sigma_2^2 + \mu_2\sigma_1^2 - \mu_3\sigma_1^2 - \mu_3\sigma_2^2}{\sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 + \sigma_2^2\sigma_3^2}. \tag{16}$$
Note that the solution $f^*(x)$ has the same covariance $\Sigma$ as the reference $g(x)$; nonetheless, $\Sigma$ influences the mean $\mu^*$ in the solution (15) as a weighted average of its elements. When $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma^2$, as in one-way ANOVA, then $\mu^* = \frac{\mu_1 + \mu_2 + \mu_3}{3}$,
$$\hat\beta_1 = -\frac{2\mu_1 - \mu_2 - \mu_3}{3\sigma^2}, \qquad \hat\beta_2 = -\frac{\mu_1 + \mu_2 - 2\mu_3}{3\sigma^2}. \tag{17}$$
Theorem 2 can be extended to $k = 4$ with closed form solutions for $\beta, \mu^*$, which are tedious [6]. Unique solutions exist for higher $k$, but obtaining their closed forms seems intractable. This is also the case for the extension to a general $\Sigma$ with $\sigma_{ij} \ne 0, i \ne j$.
Beyond the one-way ANOVA, the above approach allows us to simultaneously compare both the means and variances of $k$ independent normal distributions (Theorem A2).

3. Semiparametric Approach

For an unknown data generating process, however, the true form of $g(x)$ in (13) (with $C$ in (11)) may not be known. Then, $g_i(x)$ in the solution (4) is no longer well-defined (with $x$ replaced by the vector $\mathbf{x}$). Note that the model (10) now becomes a 'semiparametric tilt optimality restricted model' because, along with the parametric component $\beta$, there is also the nonparametric component $g(x)$, about which no distributional assumption is made. Using the sample, we define a discrete version of $C$ expressed as moment constraints. Assuming that these sample (moment) constraints represent the corresponding population (moment) constraints efficiently and consistently, the resulting model is expected to perform well.
The dual problem corresponding to (13) is
$$\inf_{\beta_i \in \mathbb{R}, \forall i}\ \int_{\mathbf{x}} e^{\sum_{i=1}^{k-1}\beta_i(x_i - x_{i+1})}\, g(x)\,dx. \tag{18}$$
The relevant score equations are
$$\int_{\mathbf{x}} (x_i - x_{i+1})\, e^{\sum_{i=1}^{k-1}\beta_i(x_i - x_{i+1})}\, g(x)\,dx = 0, \qquad 1 \le i \le k-1. \tag{19}$$
To study the asymptotic properties of the model (similar in spirit to [7]), we consider the cases when the sample sizes are equal and when they are different.

3.1. Equal Sample Sizes

Suppose $k$ independent random samples $\{x_{ij} : 1 \le j \le n\}$, $1 \le i \le k$, each of size $n$, are available from independent populations with unknown means $\mu_i$ and unknown variances $\sigma_i^2$, $1 \le i \le k$, respectively. If we rearrange all the $nk$ sample values as $(\mathbf{x}_1, \ldots, \mathbf{x}_n)$, where $\mathbf{x}_j = (x_{1j}, \ldots, x_{kj})'$, $1 \le j \le n$, then the $\mathbf{x}_j$'s form a random sample from a multivariate distribution (say, with pdf $g(x)$) with mean $\mu = (\mu_1, \ldots, \mu_k)'$ and covariance $\mathrm{diag}(\sigma_1^2, \ldots, \sigma_k^2)$.
Let $\hat p = (\hat p_1, \ldots, \hat p_n)$ be the empirical distribution that has mass $\hat p_j = 1/n$ at each $\mathbf{x}_j$, $1 \le j \le n$. The constraint of equality of the $k$ means, $C$ in (11), is discretized below as $K$, appropriately, using the probability mass function (pmf) $q = (q_1, \ldots, q_n)$ and the sample values $x_{ij}$ as
$$K = \Big\{ q : \sum_{j=1}^n (x_{ij} - x_{i+1,j})\, q_j = 0,\ 1 \le i \le k-1 \Big\}.$$
Here, the primal problem (6) becomes (replacing pdfs with pmfs) $\inf_{q \in K}\sum_{j=1}^n q_j \ln\frac{q_j}{\hat p_j}$, and the dual problem is $\inf_{\beta_i \in \mathbb{R}, \forall i}\sum_{j=1}^n \hat p_j \exp\big(\sum_{i=1}^{k-1}\beta_i(x_{ij} - x_{i+1,j})\big)$. The score equations for the dual problem are
$$\sum_{j=1}^n \hat p_j\,(x_{ij} - x_{i+1,j})\exp\Big(\sum_{i=1}^{k-1}\beta_i(x_{ij} - x_{i+1,j})\Big) = 0, \qquad 1 \le i \le k-1. \tag{20}$$
Suppose $\beta_n = (\beta_{ni},\ 1 \le i \le k-1)$ solves the score Equation (20). Then, the primal solution $\hat q_j^*$ is given by
$$\hat q_j^* = \frac{\hat p_j \exp\big(\sum_{i=1}^{k-1}\beta_{ni}(x_{ij} - x_{i+1,j})\big)}{\sum_{j=1}^n \hat p_j \exp\big(\sum_{i=1}^{k-1}\beta_{ni}(x_{ij} - x_{i+1,j})\big)}, \qquad j = 1, \ldots, n. \tag{21}$$
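As a sketch of how (20)–(21) can be computed (ours; the three normal samples mirror the first setting of Table 1, and all names are illustrative):

```r
# Solving the sample dual problem and recovering the primal solution (21), k = 3.
set.seed(1)
n <- 200; k <- 3
x <- cbind(rnorm(n, 0, 1), rnorm(n, 0.2, sqrt(1.1)), rnorm(n, 0.8, sqrt(1.2)))
d <- x[, 1:(k - 1), drop = FALSE] - x[, 2:k, drop = FALSE]  # d[j, i] = x_ij - x_{i+1,j}
dual <- function(beta) mean(exp(d %*% beta))   # dual objective (each p_j = 1/n)
beta_n <- optim(rep(0, k - 1), dual, method = "BFGS")$par   # solves (20)
w <- as.vector(exp(d %*% beta_n))
q_star <- w / sum(w)                           # primal solution (21)
colSums(d * q_star)                            # ~ (0, 0): the constraints in K hold
```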
For arbitrary $\delta = (\delta_1, \ldots, \delta_{k-1})'$, define
$$h_j(\delta) = \sum_{i=1}^{k-1}\delta_i(x_{ij} - x_{i+1,j}), \qquad 1 \le j \le n,$$
$$m_n(\delta) = \big(m_{nr}(\delta) : 1 \le r \le k-1\big), \qquad \Sigma_n(\delta) = \big(\sigma_{nrs}(\delta) : 1 \le r, s \le k-1\big),$$
$$m_{nr}(\delta) = \frac{\frac1n\sum_{j=1}^n (x_{rj} - x_{r+1,j})\, e^{h_j(\delta)}}{\frac1n\sum_{j=1}^n e^{h_j(\delta)}}, \quad\text{and}\quad \sigma_{nrs}(\delta) = \frac{\frac1n\sum_{j=1}^n \big[(x_{rj} - x_{r+1,j}) - m_{nr}(\delta)\big]\big[(x_{sj} - x_{s+1,j}) - m_{ns}(\delta)\big]\, e^{h_j(\delta)}}{\frac1n\sum_{j=1}^n e^{h_j(\delta)}}.$$
By (20) and (21), $m_n(\beta_n) = 0$, a vector of zeroes of length $k-1$. Furthermore, by Taylor's expansion of $m_n(\beta_n)$ around $\beta = (\beta_1, \ldots, \beta_{k-1})'$,
$$0 = m_n(\beta_n) = m_n(\beta) + \Sigma_n(w_n)(\beta_n - \beta), \tag{22}$$
where $w_n$ satisfies $\max\{|w_n - \beta_n|, |w_n - \beta|\} \le |\beta_n - \beta|$. As $n \to \infty$, by the strong law of large numbers, $\beta_n \to \beta$, $w_n \to \beta$, and
$$\frac1n\sum_{j=1}^n (x_{rj} - x_{r+1,j})^a (x_{sj} - x_{s+1,j})^b\, e^{h_j(w_n)} \to \int (x_r - x_{r+1})^a (x_s - x_{s+1})^b\, e^{h(\beta)}\, g(x)\,dx, \tag{23}$$
$a, b = 0, 1$, $1 \le r, s \le k-1$, with probability 1, where $h(\beta) = \sum_{i=1}^{k-1}\beta_i(x_i - x_{i+1})$.
By (21) and (23), it follows that, when $\mu_1 = \cdots = \mu_k$,
$$m_{nr}(w_n) \to \frac{\int (x_r - x_{r+1})\, e^{h(\beta)} g(x)\,dx}{\int e^{h(\beta)} g(x)\,dx} = 0, \qquad \sigma_{nrs}(w_n) \to \sigma_{rs} = \frac{\int (x_r - x_{r+1})(x_s - x_{s+1})\, e^{h(\beta)} g(x)\,dx}{\int e^{h(\beta)} g(x)\,dx}, \tag{24}$$
$1 \le r, s \le k-1$, with probability 1. Let $\Sigma = (\sigma_{rs})$. As $n \to \infty$, by the central limit theorem,
$$\sqrt{n}\,\big(m_n(\beta) - 0\big) \to N(0, \Sigma^*), \tag{25}$$
where $\Sigma^* = (\sigma^*_{rs})$,
$$\sigma^*_{rs} = \frac{\int (x_r - x_{r+1})(x_s - x_{s+1})\, e^{2h(\beta)}\, g(x)\,dx}{\big(\int e^{h(\beta)} g(x)\,dx\big)^2}. \tag{26}$$
By (22)–(26),
$$\sqrt{n}\,(\beta_n - \beta) = \Sigma_n(w_n)^{-1}\sqrt{n}\,\big(m_n(\beta_n) - m_n(\beta)\big) \xrightarrow{D} N\big(0,\ \Sigma^{-1}\Sigma^*\Sigma^{-1}\big),$$
as n . The above developments are summarized in the following theorem. This theorem establishes the asymptotic normality of the parameters of the proposed model when sample sizes are equal.
Theorem 3.
For a general reference pdf $g$, assume that the solution of (13), $f^*(x)$, exists and $\int e^{h(\beta)} f^*(x)\,dx < \infty$, where $h(\beta) = \sum_{i=1}^{k-1}\beta_i(x_i - x_{i+1})$, for $\beta$ in an open neighborhood of $0$. When $\int (x_i - x_{i+1})(x_j - x_{j+1})\, e^{2h(\beta)} g(x)\,dx < \infty$, $\forall i, j$, and all sample sizes are equal to $n$,
$$\sqrt{n}\,(\beta_n - \beta) \xrightarrow{D} N\big(0, \Sigma^{-1}\Sigma^*\Sigma^{-1}\big)$$
as $n \to \infty$, where $\Sigma, \Sigma^*$ are defined in (24) and (26), respectively.
When $\mu_1 = \cdots = \mu_k$, or equivalently, $\beta = 0$, the quantity $n\,\beta_n'(\Sigma^{-1}\Sigma^*\Sigma^{-1})^{-1}\beta_n$ has an asymptotic chi-square distribution with $k-1$ degrees of freedom as $n \to \infty$.
Thus, the test statistic
$$\chi_1 = n\,\beta_n'\big(\hat\Sigma^{-1}\hat\Sigma^*\hat\Sigma^{-1}\big)^{-1}\beta_n \tag{27}$$
can be used for testing the hypothesis $H_0: \beta = 0$, where $\sigma_{rs}, \sigma^*_{rs}$ are estimated by $\sigma_{nrs}(\beta_n), \sigma^*_{nrs}(\beta_n)$, respectively (since $\Sigma = \Sigma^*$ under $H_0$, we replaced each of $\sigma_{nrs}(\beta_n), \sigma^*_{nrs}(\beta_n)$ by $(\sigma_{nrs}(\beta_n) + \sigma^*_{nrs}(\beta_n))/2$):
$$\sigma_{nrs}(\beta_n) = \frac{\frac1n\sum_{j=1}^n (x_{rj} - x_{r+1,j})(x_{sj} - x_{s+1,j})\, e^{\sum_{i=1}^{k-1}\beta_{ni}(x_{ij} - x_{i+1,j})}}{\frac1n\sum_{j=1}^n e^{\sum_{i=1}^{k-1}\beta_{ni}(x_{ij} - x_{i+1,j})}}, \qquad \sigma^*_{nrs}(\beta_n) = \frac{\frac1n\sum_{j=1}^n (x_{rj} - x_{r+1,j})(x_{sj} - x_{s+1,j})\, e^{2\sum_{i=1}^{k-1}\beta_{ni}(x_{ij} - x_{i+1,j})}}{\big(\frac1n\sum_{j=1}^n e^{\sum_{i=1}^{k-1}\beta_{ni}(x_{ij} - x_{i+1,j})}\big)^2}. \tag{28}$$
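Continuing the sketch after (21), $\chi_1$ in (27) with the estimates (28) can be obtained as follows (again our illustration; it reuses d, beta_n, k and n from the previous snippet):

```r
# chi_1 from (27)-(28); under H0 Sigma = Sigma*, so a common estimate is used.
e1 <- as.vector(exp(d %*% beta_n))
S  <- crossprod(d * e1, d) / n / mean(e1)          # Sigma-hat, as in (28)
Ss <- crossprod(d * e1, d * e1) / n / mean(e1)^2   # Sigma-hat-star, as in (28)
V  <- (S + Ss) / 2                                 # common estimate under H0
chi1 <- n * drop(t(beta_n) %*% V %*% beta_n)       # since (V^-1 V V^-1)^-1 = V
pchisq(chi1, df = k - 1, lower.tail = FALSE)       # asymptotic p-value
```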
Clearly, the above developments can be extended for simultaneous mean and variance comparisons for k populations by modifying the C in (11).

3.2. Different Sample Sizes

The simultaneous approach developed above for equal sample sizes does not allow the sample sizes to be different. To that end, we consider $k-1$ independent one-at-a-time population optimization problems by adopting the development in Section 2, reversing the roles of $g(x) = g_k(x)$ and $g_i(x)$ in (4), setting $h(x) = x$, and, finally, combining the $k-1$ results. Both procedures work for equal sample sizes.
For $1 \le i \le k-1$, consider the $i$th problem as finding the pdf $f(x)$ in
$$C = \{ f(x) : E_f(X) = \mu_k \}, \tag{29}$$
which is the closest to $g_i(x)$, assuming that $\mu_k$ is known. Following similar steps as in Section 2, the pdf in (29) which is the closest to $g_i(x)$ is given by
$$f_i^*(x) = g_i(x)\exp(\gamma_i + \eta_i x), \qquad \gamma_i = -\ln\int e^{\eta_i x}\, g_i(x)\,dx, \tag{30}$$
$i = 1, \ldots, k-1$, where $\eta_i$ solves
$$\int (x - \mu_k)\exp(\eta_i x)\, g_i(x)\,dx = 0. \tag{31}$$
If $\eta_i = 0$, then $f_i^*(x) = g_i(x)$, $\forall x, \forall i$.
To develop the corresponding $k-1$ sample optimization problems, suppose independent random samples $\{x_{ij} : 1 \le j \le n_i\}$, $1 \le i \le k$, are available from $k$ populations with means $\mu_i$, $1 \le i \le k$, respectively. Although we assume that $\mu_k$ is known, in reality it may be unknown. Thus, we suggest choosing the $k$th sample as the one that is the largest in size. Let $\bar x_k = \sum_{j=1}^{n_k} x_{kj}/n_k$ be the mean of the $k$th sample. Then, take $\mu_k = \bar x_k$ (see Section 6).
For the $i$th ($1 \le i \le k-1$) sample optimization problem, let $\hat p_i = (\hat p_{i1}, \ldots, \hat p_{i,n_i})$ be the empirical distribution that has mass $\hat p_{ij} = 1/n_i$ at each $x_{ij}$, $1 \le j \le n_i$. Let the $i$th sample version of $C$ in (29), say $K_i$, containing $q_i = (q_{i1}, \ldots, q_{i,n_i})$, be defined as
$$K_i = \Big\{ q_i : \sum_{j=1}^{n_i} (x_{ij} - \bar x_k)\, q_{ij} = 0 \Big\}, \qquad 1 \le i \le k-1.$$
The $i$th ($1 \le i \le k-1$) sample version of (6) and its dual problem become
$$\inf_{q_i \in K_i}\ \sum_{j=1}^{n_i} q_{ij}\ln\frac{q_{ij}}{\hat p_{ij}} \qquad\text{and}\qquad \inf_{\eta_i \in \mathbb{R}}\ \sum_{j=1}^{n_i}\hat p_{ij}\exp\big(\eta_i(x_{ij} - \bar x_k)\big),$$
respectively. The $i$th score equation is
$$\sum_{j=1}^{n_i}\hat p_{ij}\,(x_{ij} - \bar x_k)\exp\big(\eta_i(x_{ij} - \bar x_k)\big) = 0. \tag{32}$$
Suppose $\eta_{ni}$ solves (32). Then, the primal solution $\hat q_{ij}^*$ is given by
$$\hat q_{ij}^* = \frac{\hat p_{ij}\exp\big(\eta_{ni}(x_{ij} - \bar x_k)\big)}{\sum_{j=1}^{n_i}\hat p_{ij}\exp\big(\eta_{ni}(x_{ij} - \bar x_k)\big)}, \qquad j = 1, \ldots, n_i. \tag{33}$$
For arbitrary $\psi_i \in \mathbb{R}$, $1 \le i \le k-1$, define
$$h_{ij}(\psi_i) = \psi_i(x_{ij} - \bar x_k), \qquad m_{n_i}(\psi_i) = \frac{\frac{1}{n_i}\sum_{j=1}^{n_i}(x_{ij} - \bar x_k)\, e^{h_{ij}(\psi_i)}}{\frac{1}{n_i}\sum_{j=1}^{n_i} e^{h_{ij}(\psi_i)}}, \qquad \sigma^2_{n_i}(\psi_i) = \frac{\frac{1}{n_i}\sum_{j=1}^{n_i}\big[(x_{ij} - \bar x_k) - m_{n_i}(\psi_i)\big]^2\, e^{h_{ij}(\psi_i)}}{\frac{1}{n_i}\sum_{j=1}^{n_i} e^{h_{ij}(\psi_i)}}. \tag{34}$$
By (32), $m_{n_i}(\eta_{ni}) = 0$, $\forall i$. Using Taylor's expansion of $m_{n_i}(\eta_{ni})$ around $\eta_i$,
$$0 = m_{n_i}(\eta_{ni}) = m_{n_i}(\eta_i) + \sigma^2_{n_i}(w_{ni})(\eta_{ni} - \eta_i),$$
where $w_{ni}$ is such that $\max\{|w_{ni} - \eta_{ni}|, |w_{ni} - \eta_i|\} \le |\eta_{ni} - \eta_i|$. With $h_i = \eta_i(x - \bar x_k)$, by the strong law of large numbers, as $n_i \to \infty$, $\eta_{ni} \to \eta_i$, $w_{ni} \to \eta_i$, $\forall i$, and
$$\frac{1}{n_i}\sum_{j=1}^{n_i}(x_{ij} - \bar x_k)^a\, e^{h_{ij}(w_{ni})} \to \int (x - \bar x_k)^a\, e^{h_i}\, g_i(x)\,dx, \tag{35}$$
$a = 0, 1, 2$, $1 \le i \le k-1$, with probability 1.
From (31), $\int (x - \bar x_k)\, e^{h_i} g_i(x)\,dx = 0$. When $\mu_i = \mu_k$, by (33) and (35), it follows that, as $n_i \to \infty$,
$$m_{n_i}(w_{ni}) \to \frac{\int (x - \bar x_k)\, e^{h_i} g_i(x)\,dx}{\int e^{h_i} g_i(x)\,dx} = 0, \qquad \sigma^2_{n_i}(w_{ni}) \to \sigma_i^2 = \frac{\int (x - \bar x_k)^2\, e^{h_i} g_i(x)\,dx}{\int e^{h_i} g_i(x)\,dx}, \tag{36}$$
$1 \le i \le k-1$, with probability 1. As $n_i \to \infty$, by the central limit theorem,
$$\sqrt{n_i}\,\big(m_{n_i}(\eta_i) - 0\big) \to N\big(0, \sigma_i^{*2}\big), \tag{37}$$
where
$$\sigma_i^{*2} = \frac{\int (x - \bar x_k)^2\, e^{2h_i}\, g_i(x)\,dx}{\big(\int e^{h_i} g_i(x)\,dx\big)^2}. \tag{38}$$
By (34)–(38),
$$\sqrt{n_i}\,(\eta_{ni} - \eta_i) = \frac{\sqrt{n_i}\,\big(m_{n_i}(\eta_{ni}) - m_{n_i}(\eta_i)\big)}{\sigma^2_{n_i}(w_{ni})} \xrightarrow{D} N\Big(0,\ \frac{\sigma_i^{*2}}{\sigma_i^4}\Big), \tag{39}$$
as $n_i \to \infty$, $1 \le i \le k-1$.
Next, we combine the results from the $k-1$ sample optimization problems.
Theorem 4.
For a general reference pdf $g$, assume that the solution of (13) subject to (29), $f_i^*(x)$, exists and $\int e^{\eta_i x} f_i^*(x)\,dx < \infty$ for some $\eta_i$ in an open neighborhood of $0$, $1 \le i \le k-1$. Assume $\int x^2 e^{\eta_i x} f_i^*(x)\,dx < \infty$, $\forall i$. Since all $k$ samples are independent, $\sqrt{n_i}\,(\eta_{ni} - \eta_i)\big/(\sigma_i^*/\sigma_i^2) \xrightarrow{D} N(0, 1)$ as $n_i \to \infty$.
Then, under $H_0: \eta_i = 0$, $1 \le i \le k-1$, the quantity $\sum_{i=1}^{k-1} n_i\,\eta_{ni}^2\big/(\sigma_i^{*2}/\sigma_i^4)$ has an asymptotic chi-square distribution with $k-1$ degrees of freedom as $n_i \to \infty$, $\forall i$. Thus, the test statistic
$$\sum_{i=1}^{k-1}\frac{n_i\,\eta_{ni}^2}{\sigma^{*2}_{n_i}(\eta_{ni})/\sigma^4_{n_i}(\eta_{ni})} \tag{40}$$
can be used for testing the hypothesis $H_0: \eta_i = 0$, $1 \le i \le k-1$, where we replaced $\sigma_i^{*2}, \sigma_i^2$ by $\sigma^{*2}_{n_i}(\eta_{ni}), \sigma^2_{n_i}(\eta_{ni})$, respectively, with
$$\sigma^{*2}_{n_i}(\eta_{ni}) = \frac{\frac{1}{n_i}\sum_{j=1}^{n_i}(x_{ij} - \bar x_k)^2\, e^{2\eta_{ni}(x_{ij} - \bar x_k)}}{\big(\frac{1}{n_i}\sum_{j=1}^{n_i} e^{\eta_{ni}(x_{ij} - \bar x_k)}\big)^2}, \qquad \sigma^2_{n_i}(\eta_{ni}) = \frac{\frac{1}{n_i}\sum_{j=1}^{n_i}(x_{ij} - \bar x_k)^2\, e^{\eta_{ni}(x_{ij} - \bar x_k)}}{\frac{1}{n_i}\sum_{j=1}^{n_i} e^{\eta_{ni}(x_{ij} - \bar x_k)}}. \tag{41}$$
Since $\sigma_i^{*2} = \sigma_i^2$ under $H_0$, we replace each of $\sigma^{*2}_{n_i}(\eta_{ni}), \sigma^2_{n_i}(\eta_{ni})$ by a common estimate $\sigma^{**2}_{n_i}(\eta_{ni}) = \big(\sigma^{*2}_{n_i}(\eta_{ni}) + \sigma^2_{n_i}(\eta_{ni})\big)/2$, and then calculate a pooled estimate of variance $\hat\sigma^2 = \sum_{i=1}^{k-1} n_i\,\sigma^{**2}_{n_i}(\eta_{ni})\big/\sum_{i=1}^{k-1} n_i$ over the $k-1$ populations. With this substitution, the test statistic from (40) is simplified as
$$\chi_2 = \hat\sigma^2\sum_{i=1}^{k-1} n_i\,\eta_{ni}^2. \tag{42}$$
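A self-contained R sketch of the whole different-sample-size procedure, (32) through (42) (ours; the gamma samples and the root-search interval are illustrative assumptions):

```r
# eta_ni from the score equation (32), variances from (41), chi_2 from (42), k = 3.
set.seed(2)
x1 <- rgamma(100, 3.0, 1); x2 <- rgamma(150, 3.2, 1)
x3 <- rgamma(200, 3.5, 1)                   # largest sample: reference group
xbar_k <- mean(x3)                          # plays the role of mu_k (Section 3.2)
one_group <- function(x) {
  d <- x - xbar_k
  score <- function(eta) sum(d * exp(eta * d))       # score equation (32)
  eta_n <- uniroot(score, c(-2, 2))$root
  e1 <- exp(eta_n * d)
  s2  <- sum(d^2 * e1) / sum(e1)                     # sigma_ni^2 in (41)
  s2s <- length(x) * sum(d^2 * e1^2) / sum(e1)^2     # sigma_ni*^2 in (41)
  c(eta = eta_n, v = (s2 + s2s) / 2, n = length(x))  # common estimate under H0
}
r <- rbind(one_group(x1), one_group(x2))
sig2 <- sum(r[, "n"] * r[, "v"]) / sum(r[, "n"])     # pooled variance estimate
chi2 <- sig2 * sum(r[, "n"] * r[, "eta"]^2)          # test statistic (42)
pchisq(chi2, df = 2, lower.tail = FALSE)             # asymptotic p-value
```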
The above developments can be extended for simultaneous mean and variance comparisons for k populations by modifying the C in (29).
The following information criterion (IC) can be used to compare different choices of the constraint function(s) $h$ (cf. Section 6). For one $h$:
$$IC = \frac1n\sum_{j=1}^n e^{\beta_n h(x_j)} - \frac{1}{2n}\,\frac{\sum_{j=1}^n h^2(x_j)\, e^{2\beta_n h(x_j)}}{\sum_{j=1}^n h^2(x_j)\, e^{\beta_n h(x_j)}}.$$
For more than one $h$, let $h = (h_1, \ldots, h_t)$ represent $t$ constraints.
(i) For equal sample sizes, recall $\mathbf{x}_j = (x_{1j}, \ldots, x_{kj})'$, $1 \le j \le n$, and recall $\Sigma, \Sigma^*$, estimated as $\hat\Sigma = (\sigma_{nrs}(\beta_n) : 1 \le r, s \le t)$, $\hat\Sigma^* = (\sigma^*_{nrs}(\beta_n) : 1 \le r, s \le t)$, with
$$\sigma_{nrs}(\beta_n) = \frac{\frac1n\sum_{j=1}^n h_r(\mathbf{x}_j) h_s(\mathbf{x}_j)\, e^{\sum_{l=1}^t \beta_{nl} h_l(\mathbf{x}_j)}}{\frac1n\sum_{j=1}^n e^{\sum_{l=1}^t \beta_{nl} h_l(\mathbf{x}_j)}}, \qquad \sigma^*_{nrs}(\beta_n) = \frac{\frac1n\sum_{j=1}^n h_r(\mathbf{x}_j) h_s(\mathbf{x}_j)\, e^{2\sum_{l=1}^t \beta_{nl} h_l(\mathbf{x}_j)}}{\big(\frac1n\sum_{j=1}^n e^{\sum_{l=1}^t \beta_{nl} h_l(\mathbf{x}_j)}\big)^2}.$$
Then, the IC is
$$IC = \frac1n\sum_{j=1}^n e^{\sum_{l=1}^t \beta_{nl} h_l(\mathbf{x}_j)} - \frac{1}{2n}\sum_{l=1}^t \hat\lambda_l,$$
where $\hat\lambda_l$ is the $l$th eigenvalue of $\hat\Sigma^{-1/2}\hat\Sigma^*\hat\Sigma^{-1/2}$. Find $\hat\Sigma^{-1/2}$ by the Cholesky decomposition.
(ii) For different sample sizes, define $\Sigma_i, \Sigma_i^*$, estimated as $\hat\Sigma_i = (\sigma_{n_i rs}(\eta_{ni}) : 1 \le r, s \le t)$, $\hat\Sigma_i^* = (\sigma^*_{n_i rs}(\eta_{ni}) : 1 \le r, s \le t)$, with
$$\sigma_{n_i rs}(\eta_{ni}) = \frac{\frac{1}{n_i}\sum_{j=1}^{n_i} h_r(x_{ij}) h_s(x_{ij})\, e^{\sum_{l=1}^t \eta_{nl} h_l(x_{ij})}}{\frac{1}{n_i}\sum_{j=1}^{n_i} e^{\sum_{l=1}^t \eta_{nl} h_l(x_{ij})}}, \qquad \sigma^*_{n_i rs}(\eta_{ni}) = \frac{\frac{1}{n_i}\sum_{j=1}^{n_i} h_r(x_{ij}) h_s(x_{ij})\, e^{2\sum_{l=1}^t \eta_{nl} h_l(x_{ij})}}{\big(\frac{1}{n_i}\sum_{j=1}^{n_i} e^{\sum_{l=1}^t \eta_{nl} h_l(x_{ij})}\big)^2}.$$
Then, the IC is
$$IC = \frac{1}{k-1}\sum_{i=1}^{k-1} IC_i, \qquad IC_i = \frac{1}{n_i}\sum_{j=1}^{n_i} e^{\sum_{l=1}^t \eta_{nl} h_l(x_{ij})} - \frac{1}{2n_i}\sum_{l=1}^t \hat\lambda_{il},$$
where $\hat\lambda_{il}$ is the $l$th eigenvalue of $\hat\Sigma_i^{-1/2}\hat\Sigma_i^*\hat\Sigma_i^{-1/2}$. Find $\hat\Sigma_i^{-1/2}$ by the Cholesky decomposition.
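A small R helper for the multi-constraint IC (ours; dual_value, Sig, Sigs and n are assumed to be precomputed: the attained value of the dual objective, the t x t estimates of Sigma-hat and Sigma-hat-star, and the sample size):

```r
# IC for t constraints: dual objective value minus (1/2n) * sum of eigenvalues
# of Sigma-hat^{-1/2} Sigma-hat-star Sigma-hat^{-1/2}, via Cholesky as in the text.
ic_value <- function(dual_value, Sig, Sigs, n) {
  R <- chol(solve(Sig))               # upper triangular, t(R) %*% R = Sig^{-1}
  M <- R %*% Sigs %*% t(R)            # same eigenvalues as Sig^{-1/2} Sigs Sig^{-1/2}
  lam <- eigen(M, symmetric = TRUE, only.values = TRUE)$values
  dual_value - sum(lam) / (2 * n)
}
```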

4. Simulation Studies

In this section, we study the effect of violations of the assumptions of the one-way ANOVA on the performance of the $\beta$ estimates using mean square errors (MSE). We also compare the power of the proposed methods with that of existing parametric and nonparametric methods. We used the R program for all simulation studies. The proposed methods are applicable to any type of data, including under-dispersed or over-dispersed data or data containing outliers, as we do not assume any distribution $g(x)$ in our analysis.

4.1. Mean Square Errors of β

Although the one-way ANOVA does not have a $\beta$ parameter as such, (14) with $\hat\beta_i$ replaced by $\beta_i$ may be considered as the one-way ANOVA model, which shows the $\beta$ parameters when $k = 3$. To implement the assumption of equal variance, we used $\sigma^2 = \frac{\sigma_1^2 + \sigma_2^2 + \sigma_3^2}{3}$ as the common variance in ANOVA. The true $\beta_1 = \hat\beta_1$, $\beta_2 = \hat\beta_2$ are in (17). In the simulation, the true $\beta_1, \beta_2$ are kept unchanged when the population distribution differs from normal, to maintain the normality assumption of the one-way ANOVA.
There are four beta parameters ($\beta_1, \ldots, \beta_4$) under the simultaneous constraints of equality of means and equality of variances. For the one-way ANOVA, we set $\beta_3 = 0$ and $\beta_4 = 0$, as there are no variance constraints. In the TO model (10), however, the true $\beta$ changes depending on the population and constraints, as shown in Section 3 for selected distributions as solutions of the corresponding score equations.
For the simulation in Table 1, we considered independent normal, uniform and beta populations and formed $g(x)$ by multiplying the pdfs of independent components as listed, which were chosen with varying means and/or variances. Various sample sizes of 50, 200 and 500, with 1000 runs for each case, were considered, of which only selected results are reported in Table 1 (see [6] for more details). We found that the total mean square error of our method decreased for larger sample sizes.
As a consequence of the way the ANOVA model is defined in (14) using the $\beta$'s, both the ANOVA and TO solutions (or methods) become identical when the populations are identical or when the means are the same, and thus they produce the same values for the true $\beta$'s, solutions and MSEs. Both methods have a relatively higher TMSE (the total of all MSEs) for the beta reference distributions compared to the normal reference distributions. For the cases considered, the TO method has a smaller MSE compared with the one-way ANOVA method, and more so when the mean and variance constraints are simultaneously considered.

4.2. Power Comparison: Equal Sample Sizes

When testing the equality of means with equal sample sizes, we conducted simulation studies to compare the performance of our proposed test statistic $\chi_1$ (in (27)) with $\chi_2$ in (42) developed for different sample sizes, the statistic $\chi$ of [3], Hotelling's tests ($T_1^2$ or $T_2^2$, depending on whether the variances are known (chi-square) or unknown (F-distribution)), classical ANOVA, and the nonparametric Kruskal–Wallis (KW) test statistics.
For $k = 3$, with $\mu = (0, \mu_2, \mu_3)'$, we considered either $g(x) \sim N_3(\mu, \Sigma = I_3)$ or $g(x) \sim \mathrm{gamma}(3, 1) \times \mathrm{gamma}(3 + \mu_2, 1) \times \mathrm{gamma}(3 + \mu_3, 1)$. The hypothesis of equality of means becomes equivalent to testing $H_0: \mu_2 = \mu_3 = 0$. Equal sample sizes are taken as $n = 30$ or $n = 50$. For all simulations, 1000 runs were used with a nominal level of $\alpha = 0.05$.
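To give the flavor of this design, here is a condensed R sketch (ours; for brevity it estimates power only for the built-in ANOVA and Kruskal–Wallis tests under the gamma alternative at one $(\mu_2, \mu_3)$ setting):

```r
# Empirical power at mu2 = mu3 = 0.5, gamma populations, n = 30, 1000 runs.
set.seed(3)
mu2 <- 0.5; mu3 <- 0.5; n <- 30; runs <- 1000
rej <- replicate(runs, {
  y   <- c(rgamma(n, 3, 1), rgamma(n, 3 + mu2, 1), rgamma(n, 3 + mu3, 1))
  grp <- factor(rep(1:3, each = n))
  c(anova = summary(aov(y ~ grp))[[1]][["Pr(>F)"]][1] < 0.05,
    kw    = kruskal.test(y, grp)$p.value < 0.05)
})
rowMeans(rej)   # compare with the ANOVA and KW columns of Table 2
```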
The power results as a function of $\mu_2, \mu_3$ are shown in Table 2. Both the $\chi_1$ and $\chi_2$ tests have higher power than the other tests under the gamma distribution, including the nonparametric test, with the $\chi_2$ test performing slightly better than the $\chi_1$ test. However, none of the $\chi, \chi_1, \chi_2$ tests was dominated by the ANOVA test under the normal case, as expected (also observed by [3] for the $\chi$ test).

4.3. Power Comparison: Different Sample Sizes

We compared the performance of the proposed semiparametric test statistic $\chi_2$ in (42) with the test statistic $\chi$ of [3], classical ANOVA and the Kruskal–Wallis (KW) test.
We used the same $g(x)$, $H_0$, number of runs and nominal level as in the equal sample size case. The power results as a function of $\mu_2, \mu_3$ are shown in Table 3 for different sample sizes $n_1, n_2, n_3$. For the calculation of $\chi_2$, the value of the sample mean of the group with the largest sample size is taken as fixed in all cases.
It can be seen that the $\chi_2$ test performed better than the other tests. Moreover, it is not dominated by the ANOVA test under the normal distribution, as expected.

5. Applications

We considered two applications. Equal sample sizes were used in the radar meteorology example, and the sample sizes were different in the credit limit example.

5.1. Radar Meteorology

We considered radar reflectivity data [1] from two independent radars deployed during NASA's Tropical Rainfall Measuring Mission Kwajalein Experiment. An S-band radar was located on Kwajalein Island at the southern end of the Kwajalein Atoll, and a C-band radar was on board the NOAA ship Ronald H. Brown. Different calibrations were applied to the two radars, and their spherical data were recorded to the same 1 km cube Cartesian grid.
Histograms of samples 1 and 2, both taken from the C-band population, are essentially identical (Figure 1). Histograms of samples 1 and 3 illustrate that the shapes of the distributions are different, with sample 1 taken from the C-band population and sample 3 taken from the S-band population (Figure 1).
First, two independent samples were taken from the C-band radar population to test whether they were from the same population. The population size was around 3987, and the sample sizes were 500 each. To test that the populations were equal, the method of Section 3.1 (in particular, the test statistic $\chi_1$ in (27)) was used with $k = 2$, and the results are given in Table 4. Using only the equality of means constraint, the data fail to reject $H_0: \beta = 0$, with a p-value of 0.71. Using the equality of means and equality of variances constraints simultaneously, the data again fail to reject $H_0: \beta_1 = 0, \beta_2 = 0$, with a p-value of 0.18, and thus we concluded that both samples are from the same population (with respect to the means and variances).
Next, we considered two independent samples of size 500 each from the C- and S-band populations to test whether they were from the same population. With the C-band radar data as the reference and using only the equality of means constraint, the data rejected $H_0: \beta = 0$ with a p-value of 0.0003. Using the equality of means and equality of variances constraints simultaneously, the data again rejected $H_0: \beta_1 = 0, \beta_2 = 0$ with a p-value $\approx 0$; therefore, we concluded that the samples are from different populations (with respect to the means and variances).
From the histograms, the data appear to be skewed to the left (as was noted by [3]). Thus, we considered the log transformation of the data and repeated the analysis for equality of means as reported in Table 4; however, our conclusions remain unchanged.

5.2. Credit Limit Data

A previous study considered credit limit information for three different education levels, namely, graduate school, university and high school [2]. We want to investigate whether the populations are identical using the constraint that the mean credit limits of the three education levels are equal. The sample sizes from the three populations are 10,585, 14,030 and 4917, respectively.
The credit limit data of each group follow positively skewed distributions (Figure 2).
To test that the populations were identical with respect to their means, the method of Section 3.2 for different sample sizes (in particular, the test statistic $\chi_2$ in (42)) was used with $k = 3$. As the 'university' group had the largest sample size, the sample mean of that group was taken as fixed to be used in (29). Comparing the other groups with the university group, we found $\hat\eta_1 = 0.4435 \times 10^{-5}$ and $\hat\eta_2 = 0.1396 \times 10^{-5}$. The test statistic value was $\chi_2 = 1272.92$ with p-value $\approx 0$. Thus, we reject $H_0: \eta_1 = \eta_2 = 0$ and conclude that the populations are different. As the data were skewed, a log transformation was again considered. Then, the dual problem solutions were $\hat\eta_1 = 0.563$, $\hat\eta_2 = 0.1988$, and the test statistic was $\chi_2 = 928.75$ (p-value $\approx 0$), rejecting $H_0$.

6. Discussions

Often, prior information is known about the population. However, the sample collected may not reveal this information due to sampling variability. Hence, it is worthwhile to build a model that satisfies the prior information and is the closest to the observed data. In Theorem 1, $g(x)$ serves as the observed data, $C$ serves as the prior information, and the distance between $g(x)$ and $C$ is measured using the KL distance.
This paper proposed a method to compare different populations based on a set of restrictions specified by the investigator. The restrictions were set in the form of moment constraints through one or more functions $h$. Setting different types of $h$ compares different aspects of the distributions under consideration; e.g., $h(x) = x - c_1$ in (4) compares $g(x)$ and $g_i(x)$ regarding their means. When $\beta_i = 0$ in (4), then $g_i(x) = g(x)$, $\forall x$. However, when $\beta_i \ne 0$, then $g_i(x)$ and $g(x)$ might differ in aspects other than only their values of $E(h(X))$.
For real data, one can obtain basic information from the data, including the shape. If the distributions under consideration are known to be approximately symmetric, using $h_1(x) = x - c_1$ and/or $h_2(x) = x^2 - c_2$ may be the first step to determine whether the distributions differ regarding their means and/or variances. However, if the distributions under consideration are known to be skewed, then using $h(x) = \ln x$ would be more appropriate. In general, the reference distribution in (4) may be any of the $k$ distributions, leaving the exponential distortion intact but with shifted parameters. When using the one-at-a-time method for different sample sizes, we chose the sample with the largest size as the reference, considering it to be the most trusted, and used its mean as $\mu_k$.

Author Contributions

Both authors developed the new optimality models, and drafted the manuscript. Pathiravasan conducted the simulation, analyzed the data and critically reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Credit limit data are available from the University of California Irvine repository. Requests for the radar meteorology data can be made to the first author.

Acknowledgments

The authors thank the referees for their comments that helped to improve the presentation of this paper.

Conflicts of Interest

The authors declare no competing interests.

Appendix A

Theorem A1.
Suppose the reference pdf is $g_2(x) = g(x) \sim N(\mu_2, \sigma_2^2)$, where $\mu_2, \sigma_2^2$ are known. Then, the solution $g_1(x) = f^*(x)$ to the minimization problem (6) with $C = \{f(x) : E_f(X) = \mu_1, \mathrm{Var}_f(X) = \sigma_1^2\}$ is
$$f^*(x) = g_1(x) = g(x)\, e^{\hat\alpha_1 + \hat\beta_1 x + \hat\beta_2 x^2} = N(\mu_1, \sigma_1^2), \tag{A1}$$
where $\hat\alpha_1 = -\ln\int g(x)\, e^{\hat\beta_1 x + \hat\beta_2 x^2}\,dx$, $\hat\beta_1 = \frac{\mu_1}{\sigma_1^2} - \frac{\mu_2}{\sigma_2^2}$, and $\hat\beta_2 = \frac{1}{2\sigma_2^2} - \frac{1}{2\sigma_1^2}$.
Proof. 
See [6]. □
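As a quick check (our sketch, not the original proof), the ratio of the two normal densities is itself of the exponential tilt form in (A1):
$$\frac{g_1(x)}{g(x)} = \frac{\sigma_2}{\sigma_1}\exp\Big(\frac{(x - \mu_2)^2}{2\sigma_2^2} - \frac{(x - \mu_1)^2}{2\sigma_1^2}\Big) = \exp\Big(\hat\alpha_1 + \Big(\frac{\mu_1}{\sigma_1^2} - \frac{\mu_2}{\sigma_2^2}\Big)x + \Big(\frac{1}{2\sigma_2^2} - \frac{1}{2\sigma_1^2}\Big)x^2\Big),$$
where $\hat\alpha_1$ collects the constant terms, reproducing $\hat\beta_1$ and $\hat\beta_2$ above.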
When $\sigma_1^2 = \sigma_2^2$, the $C$ above reduces to $C = \{f(x) : E_f(X) = \mu_1\}$, and then it can be shown that the solution of the dual problem is $\hat\beta = \frac{\mu_1 - \mu_2}{\sigma_2^2}$. Recall that the same $\beta = \hat\beta$ was also observed as $\beta_i$ in model (1) for $k = 2$, which justifies the optimality of the exponential tilt form of $N(\mu_1, \sigma_1^2)$ with $g(x) = N(\mu_2, \sigma_2^2)$ as the reference distribution for the one-way ANOVA model in Section 2.
Proof of Theorem 2.
The cone dual to $C$ in (11) is $C^* = \{\beta_1(x_1 - x_2) + \beta_2(x_2 - x_3) : \beta_1, \beta_2 \in \mathbb{R}\}$. Therefore, the corresponding dual problem is
$$\inf_{\beta_1, \beta_2 \in \mathbb{R}}\ \int_{\mathbf{x}} e^{\beta_1(x_1 - x_2) + \beta_2(x_2 - x_3)}\, g(x)\,dx. \tag{A2}$$
In order to find the solution to (A2), we differentiate it with respect to $\beta_1$ and $\beta_2$ separately. The relevant score equations are
$$\int_{\mathbf{x}} (x_i - x_{i+1})\, e^{\beta_1(x_1 - x_2) + \beta_2(x_2 - x_3)}\, g(x)\,dx = 0, \qquad i = 1, 2. \tag{A3}$$
Using the expression for $g(x)$ and combining similar terms in the exponent, (A3) can be simplified as $E(X_i - X_{i+1}) = 0$, where $(X_1, X_2, X_3)$ has a trivariate normal distribution with mean vector $\big(\mu_1 + \beta_1\sigma_1^2,\ \mu_2 + (\beta_2 - \beta_1)\sigma_2^2,\ \mu_3 - \beta_2\sigma_3^2\big)$ and covariance matrix $\Sigma$.
Setting the equations $E(X_i - X_{i+1}) = 0$, $i = 1, 2$, as
$$\mu_1 + \beta_1\sigma_1^2 = \mu_2 + (\beta_2 - \beta_1)\sigma_2^2, \qquad \mu_2 + (\beta_2 - \beta_1)\sigma_2^2 = \mu_3 - \beta_2\sigma_3^2, \tag{A4}$$
we find the solution of the corresponding dual problem as given in (16). Using these solutions, one obtains the solution $f^*(x)$ in (14) with the expression for $\mu^*$ as stated in (15). □
For $k > 3$, closed form solutions for $\hat\beta_i, \mu^*$ exist but are tedious or intractable [6]. For nonnormal $g(x)$, unique solutions for the $\beta_i$'s exist but may not be in closed form, and/or the final solution $f^*(x)$ may not be a known distribution.
Theorem A2.
For $k = 3$ with the same pdf $g(x) = N_3(\mu, \Sigma)$ as in Theorem 2, consider the minimization problem (13) with $C = \{f(x) : E_f(X_i) = E_f(X_{i+1}),\ \mathrm{Var}_f(X_i) = \mathrm{Var}_f(X_{i+1}),\ i = 1, 2\}$. The solution is given by
$$f^*(x) = g(x)\, e^{\hat\alpha_1 + \hat\beta_1(x_1 - x_2) + \hat\beta_2(x_2 - x_3) + \hat\beta_3(x_1^2 - x_2^2) + \hat\beta_4(x_2^2 - x_3^2)} = N_3(\mu^*\mathbf{1}, \sigma^{2*} I_3), \tag{A5}$$
where $\hat\alpha_1$ is the normalizing factor, $\mu^*$ is as in (15), $\sigma^{2*} = \frac{3\sigma_1^2\sigma_2^2\sigma_3^2}{\sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 + \sigma_2^2\sigma_3^2}$,
$$\hat\beta_1 = -\frac13\,\frac{2\mu_1\sigma_2^2\sigma_3^2 - \mu_2\sigma_1^2\sigma_3^2 - \mu_3\sigma_1^2\sigma_2^2}{\sigma_1^2\sigma_2^2\sigma_3^2}, \qquad \hat\beta_2 = -\frac13\,\frac{\mu_1\sigma_2^2\sigma_3^2 + \mu_2\sigma_1^2\sigma_3^2 - 2\mu_3\sigma_1^2\sigma_2^2}{\sigma_1^2\sigma_2^2\sigma_3^2},$$
$$\hat\beta_3 = -\frac16\,\frac{\sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 - 2\sigma_2^2\sigma_3^2}{\sigma_1^2\sigma_2^2\sigma_3^2}, \qquad \hat\beta_4 = -\frac16\,\frac{2\sigma_1^2\sigma_2^2 - \sigma_1^2\sigma_3^2 - \sigma_2^2\sigma_3^2}{\sigma_1^2\sigma_2^2\sigma_3^2},$$
and $I_3$ is the $3 \times 3$ identity matrix.
Proof. 
See [6]. □

References

1. Kedem, B.; Wolff, D.B.; Fokianos, K. Statistical comparison of algorithms. IEEE Trans. Instrum. Meas. 2004, 53, 770–776.
2. Yeh, I.-C.; Lien, C.-H. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Syst. Appl. 2009, 36, 2473–2480.
3. Fokianos, K.; Kedem, B.; Qin, J.; Short, D.A. A semiparametric approach to the one-way layout. Technometrics 2001, 43, 56–65.
4. Jaynes, E.T. Information theory and statistical mechanics. Phys. Rev. 1957, 106, 620.
5. Bhattacharya, B.; Dykstra, R. A general duality approach to I-projections. J. Stat. Plan. Inference 1995, 47, 203–216.
6. Pathiravasan, C.H. Generalized Semiparametric Approach to the Analysis of Variance. Ph.D. Dissertation, Department of Mathematics, Southern Illinois University Carbondale, Carbondale, IL, USA, 2019.
7. Haberman, S.J. Adjustment by minimum discriminant information. Ann. Stat. 1984, 12, 971–988.
Figure 1. Radar reflectivity data: histograms, normal QQ-plots and boxplots. Sample 1 and sample 2 were taken from the C-band radar population, each with sample size n = 500. Sample 3 was taken from the S-band radar population with the same sample size n = 500.
Figure 2. Credit limit data: histograms, normal QQ-plots and boxplots for the graduate (n = 10,585), university (n = 14,030) and high school (n = 4917) student groups.
Table 1. Mean square errors of the β parameters.

Constraints: E(X1) = E(X2) = E(X3)

| Components of g(x1, x2, x3) | Mean | Var | Sample size | Parameter | True value (TO) | True value (ANOVA) | MSE (TO) | MSE (ANOVA) | TMSE (TO) | TMSE (ANOVA) |
|---|---|---|---|---|---|---|---|---|---|---|
| X1 ~ N(0, 1) | 0 | 1 | 200 | β1 | 0.3094 | 0.3030 | 0.0044 | 0.0041 | 0.0109 | 0.0125 |
| X2 ~ N(0.2, 1.1) | 0.2 | 1.1 | 200 | β2 | 0.4088 | 0.4242 | 0.0065 | 0.0084 | | |
| X3 ~ N(0.8, 1.2) | 0.8 | 1.2 | 200 | | | | | | | |
| X1 ~ Beta(1, 2) | 0.33 | 0.06 | 200 | β1 | 1.3433 | 1.4000 | 0.0726 | 0.0787 | 0.1442 | 0.1516 |
| X2 ~ Beta(1.33, 1.99) | 0.4 | 0.06 | 200 | β2 | 1.5871 | 1.6000 | 0.0716 | 0.0729 | | |
| X3 ~ Beta(1.75, 1.75) | 0.5 | 0.06 | 200 | | | | | | | |
| X1 ~ Beta(1, 2) | 0.33 | 0.06 | 200 | β1 | 1.2424 | 1.2245 | 0.0642 | 0.0639 | 0.1205 | 0.1245 |
| X2 ~ Beta(1.08, 1.62) | 0.4 | 0.065 | 200 | β2 | 1.3543 | 1.3994 | 0.0562 | 0.0607 | | |
| X3 ~ Beta(1.29, 1.29) | 0.5 | 0.07 | 200 | | | | | | | |

Constraints: E(X1) = E(X2) = E(X3), V(X1) = V(X2) = V(X3)

| Components of g(x1, x2, x3) | Mean | Var | Sample size | Parameter | True value (TO) | True value (ANOVA) | MSE (TO) | MSE (ANOVA) | TMSE (TO) | TMSE (ANOVA) |
|---|---|---|---|---|---|---|---|---|---|---|
| X1 ~ N(0, 1) | 0 | 1 | 200 | β1 | 0.2828 | 0.3030 | 0.0059 | 0.0081 | 0.0217 | 0.0408 |
| X2 ~ N(0.2, 1.1) | 0.2 | 1.1 | 200 | β2 | 0.3838 | 0.4242 | 0.0094 | 0.0166 | | |
| X3 ~ N(0.8, 1.2) | 0.8 | 1.2 | 200 | β3 | 0.0429 | 0 | 0.0038 | 0.0091 | | |
| | | | | β4 | 0.0404 | 0 | 0.0026 | 0.0069 | | |
| X1 ~ Beta(1, 2) | 0.33 | 0.067 | 200 | β1 | 2.8090 | 1.4000 | 0.9763 | 2.1442 | 4.5389 | 9.7559 |
| X2 ~ Beta(1.33, 1.99) | 0.4 | 0.067 | 200 | β2 | 2.9256 | 1.6000 | 1.1376 | 2.1325 | | |
| X3 ~ Beta(1.75, 1.75) | 0.5 | 0.067 | 200 | β3 | −1.6512 | 0 | 1.2599 | 2.9440 | | |
| | | | | β4 | −1.4902 | 0 | 1.1651 | 2.5353 | | |
| X1 ~ Beta(1.33, 1.99) | 0.4 | 0.067 | 500 | β1 | −1.4640 | 0 | 0.6133 | 2.3176 | 3.9619 | 12.9005 |
| X2 ~ Beta(1.2, 1.8) | 0.4 | 0.06 | 200 | β2 | −1.7934 | 0 | 1.1961 | 3.3782 | | |
| X3 ~ Beta(0.97, 1.46) | 0.4 | 0.07 | 100 | β3 | 1.6611 | 0 | 0.7408 | 2.9568 | | |
| | | | | β4 | 2.0318 | 0 | 1.4117 | 4.2478 | | |

Here TO = proposed tilt optimality model, ANOVA = analysis of variance, MSE = mean square error, TMSE = total mean square error.
Table 2. Power comparison with equal sample sizes.

Normal:

| n | μ2 | μ3 | χ | χ1 | χ2 | T1² | T2² | ANOVA | KW |
|---|---|---|---|---|---|---|---|---|---|
| 30 | 0 | 0 | 0.054 | 0.080 | 0.096 | 0.043 | 0.058 | 0.040 | 0.042 |
| 30 | 0.2 | 0.2 | 0.126 | 0.184 | 0.368 | 0.101 | 0.145 | 0.109 | 0.099 |
| 30 | 0.1 | 0.4 | 0.309 | 0.381 | 0.587 | 0.267 | 0.330 | 0.284 | 0.278 |
| 30 | 0.2 | 0.5 | 0.428 | 0.495 | 0.805 | 0.379 | 0.440 | 0.398 | 0.377 |
| 30 | 0.5 | 0.5 | 0.527 | 0.598 | 0.963 | 0.474 | 0.546 | 0.499 | 0.480 |
| 30 | 0.7 | 0.5 | 0.730 | 0.774 | 0.992 | 0.664 | 0.730 | 0.703 | 0.663 |
| 50 | 0 | 0 | 0.056 | 0.084 | 0.083 | 0.050 | 0.060 | 0.046 | 0.047 |
| 50 | 0.2 | 0.2 | 0.161 | 0.198 | 0.483 | 0.139 | 0.164 | 0.150 | 0.153 |
| 50 | 0.1 | 0.4 | 0.457 | 0.508 | 0.769 | 0.430 | 0.463 | 0.438 | 0.397 |
| 50 | 0.2 | 0.5 | 0.621 | 0.648 | 0.938 | 0.577 | 0.618 | 0.600 | 0.565 |
| 50 | 0.5 | 0.5 | 0.739 | 0.768 | 0.995 | 0.717 | 0.741 | 0.730 | 0.689 |
| 50 | 0.7 | 0.5 | 0.897 | 0.917 | 1.000 | 0.882 | 0.903 | 0.896 | 0.887 |

Gamma:

| n | μ2 | μ3 | χ | χ1 | χ2 | T1² | T2² | ANOVA | KW |
|---|---|---|---|---|---|---|---|---|---|
| 30 | 0 | 0 | 0.069 | 0.087 | 0.126 | 0.045 | 0.068 | 0.041 | 0.038 |
| 30 | 0.2 | 0.2 | 0.057 | 0.140 | 0.104 | 0.074 | 0.101 | 0.060 | 0.059 |
| 30 | 0.1 | 0.4 | 0.083 | 0.182 | 0.175 | 0.107 | 0.145 | 0.120 | 0.110 |
| 30 | 0.2 | 0.5 | 0.103 | 0.239 | 0.250 | 0.145 | 0.188 | 0.143 | 0.138 |
| 30 | 0.5 | 0.5 | 0.115 | 0.267 | 0.418 | 0.161 | 0.218 | 0.170 | 0.183 |
| 30 | 0.7 | 0.5 | 0.158 | 0.346 | 0.573 | 0.229 | 0.281 | 0.241 | 0.259 |
| 50 | 0 | 0 | 0.063 | 0.091 | 0.112 | 0.051 | 0.070 | 0.054 | 0.051 |
| 50 | 0.2 | 0.2 | 0.058 | 0.113 | 0.116 | 0.075 | 0.095 | 0.078 | 0.080 |
| 50 | 0.1 | 0.4 | 0.117 | 0.216 | 0.254 | 0.151 | 0.179 | 0.156 | 0.173 |
| 50 | 0.2 | 0.5 | 0.144 | 0.268 | 0.398 | 0.202 | 0.231 | 0.203 | 0.217 |
| 50 | 0.5 | 0.5 | 0.183 | 0.345 | 0.635 | 0.246 | 0.278 | 0.254 | 0.261 |
| 50 | 0.7 | 0.5 | 0.306 | 0.482 | 0.831 | 0.383 | 0.424 | 0.386 | 0.422 |

Here χ = test statistic of [3], χ1 = proposed test statistic (27) for equal sample sizes, χ2 = proposed test statistic (42) for different sample sizes, T1² or T2² = Hotelling's tests depending on known (chi-square) or unknown (F-distribution) variances, ANOVA = F test statistic of analysis of variance, KW = Kruskal–Wallis test statistic.
Table 3. Power comparison with different sample sizes.

Normal:

| Sample sizes | μ2 | μ3 | χ2 | χ | ANOVA | KW |
|---|---|---|---|---|---|---|
| (n1, n2, n3) = (200, 100, 40) | 0.0 | 0.0 | 0.068 | 0.050 | 0.048 | 0.048 |
| | 0.2 | 0.2 | 0.594 | 0.333 | 0.330 | 0.306 |
| | 0.1 | 0.4 | 0.719 | 0.531 | 0.520 | 0.498 |
| | 0.2 | 0.5 | 0.916 | 0.785 | 0.772 | 0.755 |
| | 0.5 | 0.5 | 1.000 | 0.987 | 0.988 | 0.981 |
| | 0.7 | 0.5 | 1.000 | 1.000 | 1.000 | 1.000 |
| (n1, n2, n3) = (200, 100, 100) | 0.0 | 0.0 | 0.071 | 0.055 | 0.054 | 0.055 |
| | 0.2 | 0.2 | 0.714 | 0.370 | 0.363 | 0.349 |
| | 0.1 | 0.4 | 0.960 | 0.835 | 0.828 | 0.813 |
| | 0.2 | 0.5 | 0.999 | 0.960 | 0.959 | 0.962 |
| | 0.5 | 0.5 | 1.000 | 0.998 | 0.998 | 0.997 |
| | 0.7 | 0.5 | 1.000 | 1.000 | 1.000 | 1.000 |

Gamma:

| Sample sizes | μ2 | μ3 | χ2 | χ | ANOVA | KW |
|---|---|---|---|---|---|---|
| (n1, n2, n3) = (200, 100, 40) | 0.0 | 0.0 | 0.085 | 0.055 | 0.040 | 0.040 |
| | 0.2 | 0.2 | 0.182 | 0.113 | 0.161 | 0.159 |
| | 0.1 | 0.4 | 0.244 | 0.120 | 0.202 | 0.208 |
| | 0.2 | 0.5 | 0.382 | 0.222 | 0.337 | 0.327 |
| | 0.5 | 0.5 | 0.807 | 0.514 | 0.602 | 0.629 |
| | 0.7 | 0.5 | 0.968 | 0.766 | 0.839 | 0.854 |
| (n1, n2, n3) = (200, 100, 100) | 0.0 | 0.0 | 0.078 | 0.053 | 0.046 | 0.043 |
| | 0.2 | 0.2 | 0.219 | 0.105 | 0.141 | 0.153 |
| | 0.1 | 0.4 | 0.475 | 0.269 | 0.346 | 0.347 |
| | 0.2 | 0.5 | 0.735 | 0.443 | 0.542 | 0.569 |
| | 0.5 | 0.5 | 0.933 | 0.606 | 0.673 | 0.703 |
| | 0.7 | 0.5 | 0.990 | 0.819 | 0.867 | 0.893 |

Here χ2 = proposed test statistic (42) for different sample sizes, χ = test statistic of [3], ANOVA = F test statistic of analysis of variance, KW = Kruskal–Wallis test statistic.
Table 4. Hypothesis testing when samples come from the same or different populations.

| Populations | H0 | Transformation | Dual solutions | χ1 | p-value |
|---|---|---|---|---|---|
| C-band, C-band | H0: β = 0 | h(x) = x1 − x2 | β̂ = −0.00086 | 0.1382 | 0.71 |
| | H0: β1 = 0, β2 = 0 | h1(x) = x1 − x2, h2(x) = x1² − x2² | β̂1 = 0.0175, β̂2 = 0.0003 | 3.3577 | 0.18 |
| | H0: β = 0 | h(x) = ln x1 − ln x2 | β̂ = −0.08541 | −0.74671 | 0.4552 |
| C-band, S-band | H0: β = 0 | h(x) = x1 − x2 | β̂ = −0.0087 | 13.24 | 0.0003 |
| | H0: β1 = 0, β2 = 0 | h1(x) = x1 − x2, h2(x) = x1² − x2² | β̂1 = 0.0048, β̂2 = 0.0002 | 23.4 | 0.0 |
| | H0: β = 0 | h(x) = ln x1 − ln x2 | β̂ = 0.4414 | 4.8307 | 0.0 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
