1. Introduction
Classical applied statistical techniques often depend heavily on the assumption that observations are normally distributed. The benefit of this assumption is that it produces exact inferences in many popular methods, such as the t-test, the F-test, chi-squared tests, analysis of variance (ANOVA) models, and multivariate analysis. In reality, however, observations often show departures from normality (or near-normality). Statistical texts address this issue and discuss remedies, such as the Box–Cox transformations of data to normality. Developing techniques that use fewer assumptions (normality or others) is an important area of statistical research. For comparing different populations, this paper proposes an alternative to the one-way ANOVA that relaxes some of these assumptions.
A previous study described experimental radar reflectivity data obtained from independent radars deployed during NASA's Tropical Rainfall Measuring Mission Kwajalein Experiment in the Republic of the Marshall Islands from 15 July to 12 September 1999 [1]. The data are skewed, and we investigate whether the data from the two radars come from identical populations. A credit limit data set [2] with three different education levels (graduate school, university, and high school) was obtained from the University of California Irvine repository. The data are skewed with high variability, and we study whether the three credit limit populations are identical. Such data have unknown statistical distributions, and standard statistical procedures may fail to work properly.
If $f_i$ is the probability density function (pdf) of $N(\mu_i, \sigma^2)$ for $i = 1, \dots, k$ [3], then one can write
$$ f_i(x) = \exp(\alpha_i + \beta_i x)\, g(x), \quad i = 1, \dots, k-1, \qquad (1) $$
where $g$ is the pdf of $N(\mu_k, \sigma^2)$ and
$$ \alpha_i = \frac{\mu_k^2 - \mu_i^2}{2\sigma^2}, \qquad \beta_i = \frac{\mu_i - \mu_k}{\sigma^2}. \qquad (2) $$
From (1), denoting $g$ as a reference distribution, one can think of $f_i$ as an exponential distortion or tilt of the reference. Furthermore, using (2), the test of equality of the $\mu_i$'s (as needed in the one-way ANOVA) reduces to the test of equality of the $\beta_i$'s to zero, or
$$ H_0:\ \beta_1 = \beta_2 = \dots = \beta_{k-1} = 0. \qquad (3) $$
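As a quick numerical illustration of (1)–(2) (a minimal sketch, assuming the normal reference and the forms of $\alpha_i$ and $\beta_i$ written above), the log-ratio of two normal densities with a common variance is linear in $x$:

```python
import numpy as np
from scipy.stats import norm

# Two normal populations with a common variance; g is the reference pdf of N(mu_k, sigma^2).
mu_i, mu_k, sigma = 1.5, 0.0, 2.0
alpha_i = (mu_k**2 - mu_i**2) / (2 * sigma**2)   # intercept of the tilt
beta_i = (mu_i - mu_k) / sigma**2                # slope of the tilt

x = np.linspace(-5.0, 5.0, 11)
log_ratio = norm.logpdf(x, mu_i, sigma) - norm.logpdf(x, mu_k, sigma)

# The log density ratio equals alpha_i + beta_i * x up to floating-point error.
assert np.allclose(log_ratio, alpha_i + beta_i * x)
print(beta_i)  # beta_i = 0 exactly when mu_i = mu_k, which is what (3) tests
```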
Motivated by (3), the same authors proposed a generalization of (1) by replacing the normal reference with any pdf $g$, and $x$ in the exponent of (1) with any known function $h(x)$, and considered all pdfs $f_1, \dots, f_{k-1}$ that can be expressed as exponential tilt pdfs of $g$. In this way, (1)–(2) are updated as
$$ f_i(x) = \exp\{\alpha_i + \beta_i\, h(x)\}\, g(x), \quad i = 1, \dots, k-1, \qquad (4) $$
where each $\alpha_i$ is the normalizing constant determined by $\int f_i(x)\,dx = 1$ for some $\beta_i$. Then, (3) can also be used to test for the equality of any $k$ pdfs satisfying (4), which are not necessarily normal.
In classical ANOVA, the parameters of the normal distributions are estimated using the maximum likelihood method. To estimate the parameters in (4), the authors of [3] considered the restricted class of distributions obtained by multiplicative exponential distortions with $g$ as a reference and, based on $k$ independent samples, used the profile maximum likelihood method to estimate the parameters in that class. This paper, instead, considers the class $\mathcal{C}$ of all distributions (restricted only by a given mean of $h$) and estimates the $\beta_i$'s by minimizing the Kullback–Leibler divergence between $g$ and the class $\mathcal{C}$.
Often the criterion of comparison between distributions is clear from the context of the data, which helps to formulate the constraint set $\mathcal{C}$. In the radar data, we are interested in whether the mean rain rate is equal for the two radars. In the credit limit data, we are interested in whether the mean credit limit is the same in the three education groups. The constraints can involve multiple criteria as well. Once the constraints are fixed in $\mathcal{C}$, only those aspects are considered in the comparison between distributions.
This approach matches the maximum entropy (ME) principle, which may be stated as follows: when selecting a model for a given situation, it is often appropriate to express the prior information in terms of constraints. However, one must be careful that no information other than these specified constraints is used in model selection. That is, apart from the constraints that we have, the uncertainty associated with the probability distribution to be selected should be kept at its maximum [4]. In this paper, we extend the ME principle to general information projection using the Kullback–Leibler divergence.
We show in Section 2 that the solution (4) is optimum under specified constraints using $h$ when $g$ is known; in this way, the proposed approach yields an optimality interpretation for the exponential tilt models. The proposed approach extends the comparisons of means in ANOVA to comparisons of means and variances for the normal and other known distributions using duality. In Section 3, we develop a semi-parametric approach when $g$ is unknown and derive asymptotic test statistics for testing the equality of populations for the cases when sample sizes are equal or different. In Section 4, we present simulation studies that evaluate the performance of the $\beta$-parameters with respect to the classical ANOVA methods. We also compare the test statistics developed in Section 3 with existing parametric and nonparametric procedures. Section 5 shows the details of the proposed methods for the applications with the radar data and credit limit data sets. Section 6 contains discussions on the choice of the function(s) $h$ for particular cases of reference distributions $g$. The Appendix contains additional results and proofs.
2. Tilt Optimality Models
Kullback–Leibler (KL) discriminant information, or divergence, is a measure of 'distance' between two probability distributions. For pdfs $f$ and $g$, the KL discriminant information for $f$ against $g$ is given by
$$ D(f\,\|\,g) = \int f(x)\,\log\frac{f(x)}{g(x)}\,dx, $$
which is always nonnegative, and $D(f\,\|\,g) = 0$ if and only if $f = g$.
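For instance (a small numerical sketch, not taken from the paper), the KL divergence between two normal densities can be evaluated by quadrature and compared with its closed form:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# D(f || g) = integral of f(x) log(f(x)/g(x)) dx for f = N(1, 1) and g = N(0, 1).
f, g = norm(1.0, 1.0), norm(0.0, 1.0)
integrand = lambda x: f.pdf(x) * (f.logpdf(x) - g.logpdf(x))
kl_numeric, _ = quad(integrand, -np.inf, np.inf)

# Closed form for equal unit variances: (mu_f - mu_g)^2 / 2.
print(kl_numeric, 0.5 * (1.0 - 0.0) ** 2)  # both are approximately 0.5; D = 0 iff f = g
```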
Let $\mathcal{C}$ be a convex set of pdfs with $D(f\,\|\,g) < \infty$ for some $f \in \mathcal{C}$. If $f^*$ is the solution to
$$ \min_{f \in \mathcal{C}} D(f\,\|\,g), \qquad (6) $$
then $f^*$ (the information projection of $g$ onto $\mathcal{C}$) is the closest to $g$ among all pdfs in $\mathcal{C}$ with respect to the KL distance. Let $h_1(x), \dots, h_t(x)$ be arbitrary but known functions of $x$. Define $\mathcal{C}$ to be the class of all pdfs $f$ for which the means of $h_1(X), \dots, h_t(X)$ are fixed at given values $\theta_1, \dots, \theta_t$, that is,
$$ \mathcal{C} = \Big\{ f : \int h_j(x)\, f(x)\, dx = \theta_j, \ \ j = 1, \dots, t \Big\}. \qquad (7) $$
In order to solve (6), Fenchel's duality theorem can be applied as shown by [5]. The corresponding dual cone consists of the exponential tilts of $g$ generated by linear combinations of the centered constraint functions $h_j(x) - \theta_j$, $j = 1, \dots, t$. The dual problem can be shown to be equivalent to
$$ \min_{\beta_1, \dots, \beta_t} \int \exp\Big\{\sum_{j=1}^{t}\beta_j\,(h_j(x) - \theta_j)\Big\}\, g(x)\, dx. \qquad (8) $$
As the dual problem (8) is a function of scalars only, it could be substantially easier to solve than the primal problem (6), depending on the form of $g$. In particular, setting the derivative of the integral quantity in (8) with respect to $\beta_j$ equal to zero, we obtain
$$ \int (h_j(x) - \theta_j)\, \exp\Big\{\sum_{l=1}^{t}\beta_l\,(h_l(x) - \theta_l)\Big\}\, g(x)\, dx = 0, \quad j = 1, \dots, t, \qquad (9) $$
which can be solved easily for the $\beta_j$'s by the Newton–Raphson method. If the solution of (9) is $(\hat\beta_1, \dots, \hat\beta_t)$, then, from [5], the solution of the primal problem (6), say $f^*$, is of the form
$$ f^*(x) = \exp\Big\{\alpha + \sum_{j=1}^{t}\hat\beta_j\, h_j(x)\Big\}\, g(x), \qquad (10) $$
where $\alpha$ is the normalizing constant. When $t = 1$ with $h_1 = h$, setting $\beta = \hat\beta_1$, (10) can be simplified as (4). The above derivation explains the exponential structure of $f^*$, and (9) verifies that the exponential tilt model $f^*$ is in $\mathcal{C}$ (see (7)). These developments are summarized in the following theorem.
Theorem 1. When $g$ is known, the exponential tilt model (4) is the optimum model in the sense that it is the closest to $g$, among all probability distributions in $\mathcal{C}$ from (7), in the KL distance.

The model (10) will be referred to in the sequel as the tilt optimality (TO) model. Note that, when $\mathcal{C}$ in (7) specifies $h(x) = x$ and $g$ is the pdf of $N(\mu_k, \sigma^2)$, then the solution $f^*$ is found to be of the form (1), as was seen in (1)–(2). When $g$ in (6) is uniform (or Lebesgue measure), then the minimizing $f^*$ is known as the maximum entropy model (distribution) in $\mathcal{C}$ [4].
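As a standard illustration of this last point (a textbook example, not taken from the paper): if $g$ is Lebesgue measure on $(0, \infty)$ and $\mathcal{C}$ specifies the single constraint $\int x\, f(x)\, dx = \theta$, then (10) gives $f^*(x) = \exp\{\alpha + \beta x\}$ with $\beta < 0$, and matching the constraint yields
$$ f^*(x) = \frac{1}{\theta}\, e^{-x/\theta}, \qquad x > 0, $$
the maximum entropy density on $(0, \infty)$ with mean $\theta$.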
Theorem A1 in the Appendix A shows that closed form solutions for the dual problem are obtained for the normal distributions with constraints on both the mean and variance. However, this may not always be the case. The final solution depends on the form of $g$ and the restrictions in $\mathcal{C}$.
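When no closed form is available, the score Equation (9) can be solved numerically. The sketch below is our illustration only, under the notation reconstructed above, with a single constraint $h(x) = x$, a target mean $\theta$, and a standard normal reference; the function name `score` is ours, and a bracketing root finder is used in place of Newton–Raphson:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm

theta = 1.2          # constrained mean for h(x) = x
g = norm(0.0, 1.0)   # reference pdf

def score(beta):
    """Left-hand side of the score equation (9) for the single constraint h(x) = x."""
    val, _ = quad(lambda x: (x - theta) * np.exp(beta * (x - theta)) * g.pdf(x),
                  -np.inf, np.inf)
    return val

# Any root finder works here; the paper suggests Newton-Raphson, we use Brent's method.
beta_hat = brentq(score, -5.0, 5.0)

# Tilted density f*(x) proportional to exp(beta_hat * x) g(x); check its mean equals theta.
num = quad(lambda x: x * np.exp(beta_hat * (x - theta)) * g.pdf(x), -np.inf, np.inf)[0]
den = quad(lambda x: np.exp(beta_hat * (x - theta)) * g.pdf(x), -np.inf, np.inf)[0]
print(beta_hat, num / den)   # for a standard normal reference, beta_hat equals theta
```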
While (1) compares each $f_i$ with the reference $g$ one-at-a-time, another approach would be to compare all $k$ distributions simultaneously. For $\mathbf{x} = (x_1, \dots, x_k)'$, let $g(\mathbf{x})$ be the pdf of a $k$-variate normal distribution with known mean vector $(\mu_1, \dots, \mu_k)'$ and covariance matrix $\Sigma$ with diagonal entries $\sigma_1^2, \dots, \sigma_k^2$ and zeroes elsewhere. Considering equality of the $k$ means (unspecified) with possibly unequal variances, we define
$$ \mathcal{C} = \Big\{ f : \int (x_i - x_k)\, f(\mathbf{x})\, d\mathbf{x} = 0, \ \ i = 1, \dots, k-1 \Big\}. \qquad (11) $$
Following (10), the solution (12) is an exponential tilt of $g$ in the contrasts $x_i - x_k$, $i = 1, \dots, k-1$. However, a closed form expression is only available in special cases.
Theorem 2 (proof in Appendix A) considers equality of means as in classical ANOVA (which deals with unknown means and unknown but equal variances), but with unrestricted variances.
Theorem 2. For $\mathbf{x} = (x_1, \dots, x_k)'$ with known reference pdf $g$ with the above $\Sigma$, consider the minimization problem
$$ \min_{f \in \mathcal{C}} D(f\,\|\,g), \qquad (13) $$
with $\mathcal{C}$ in (11). The solution to (13) is given by (14), with the parameters given in (15).

Note that the solution has the same covariance $\Sigma$ as the reference $g$; nonetheless, $\Sigma$ influences the mean in solution (15) as a weighted average of its elements. When $\sigma_1^2 = \cdots = \sigma_k^2$, as in one-way ANOVA, the weighted average reduces to a simple average.
Theorem 2 can be extended, with closed form solutions for the parameters that are tedious to derive [6]. Unique solutions exist for higher $k$, but obtaining their closed forms seems intractable. This is also the case for the extension to a general $\Sigma$ with nonzero off-diagonal entries.
Beyond the one-way ANOVA, the above approach allows us to simultaneously compare both the means and variances of k independent normal distributions (Theorem A2).
3. Semiparametric Approach
For an unknown data generating process, however, the true form of $g$ in (13) (with $\mathcal{C}$ in (11)) may not be known. Then, the solution (4) (with $x$ replaced by $h(x)$ in the exponent) is not well-defined. Note that the model (10) now becomes a 'semiparametric tilt optimality restricted model' because, along with the parametric component given by the exponential tilt, there is also the nonparametric component $g$, about which no distributional assumption is made. Using the sample, we define a discrete version of $\mathcal{C}$ expressed as moment constraints. Assuming that these sample (moment) constraints represent the corresponding population (moment) constraints efficiently and consistently, the resulting model is expected to perform well.
The dual problem corresponding to (13) is
$$ \min_{\beta_1, \dots, \beta_{k-1}} \int \exp\Big\{\sum_{i=1}^{k-1}\beta_i\,(x_i - x_k)\Big\}\, g(\mathbf{x})\, d\mathbf{x}, $$
and the relevant score equations are
$$ \int (x_i - x_k)\, \exp\Big\{\sum_{l=1}^{k-1}\beta_l\,(x_l - x_k)\Big\}\, g(\mathbf{x})\, d\mathbf{x} = 0, \quad i = 1, \dots, k-1. $$
To study the asymptotic properties of the model (similar in spirit to [7]), we consider the cases when the sample sizes are equal and when they are different.
3.1. Equal Sample Sizes
Suppose $k$ independent random samples, each of size $n$, are available from independent populations with unknown means $\mu_1, \dots, \mu_k$ and unknown variances $\sigma_1^2, \dots, \sigma_k^2$, respectively. If we rearrange all the sample values as $\mathbf{t}_j = (x_{1j}, \dots, x_{kj})'$, where $j = 1, \dots, n$, then the $\mathbf{t}_j$'s form a random sample from a multivariate distribution (say, pdf $g$), with mean $(\mu_1, \dots, \mu_k)'$ and covariance $\mathrm{diag}(\sigma_1^2, \dots, \sigma_k^2)$.
Let $\tilde{g}_n$ be the empirical distribution that has mass $1/n$ at each $\mathbf{t}_j$. The constraint of equality of $k$ means, $\mathcal{C}$ in (11), is discretized below as $\mathcal{C}_n$, appropriately, using the probability mass function (pmf) $\mathbf{p} = (p_1, \dots, p_n)$ and the sample values $\mathbf{t}_j$'s as
$$ \mathcal{C}_n = \Big\{ \mathbf{p} : \sum_{j=1}^{n} p_j\,(x_{ij} - x_{kj}) = 0, \ i = 1, \dots, k-1, \ \ p_j \ge 0, \ \sum_{j=1}^{n} p_j = 1 \Big\}. $$
Here, the primal problem (6) becomes (replacing pdfs with pmfs)
$$ \min_{\mathbf{p} \in \mathcal{C}_n} \sum_{j=1}^{n} p_j \log\frac{p_j}{1/n}, $$
and the dual problem is
$$ \min_{\beta_1, \dots, \beta_{k-1}} \frac{1}{n}\sum_{j=1}^{n} \exp\Big\{\sum_{i=1}^{k-1}\beta_i\,(x_{ij} - x_{kj})\Big\}. $$
The score equations for the dual problem are
$$ \frac{1}{n}\sum_{j=1}^{n} (x_{ij} - x_{kj})\, \exp\Big\{\sum_{l=1}^{k-1}\beta_l\,(x_{lj} - x_{kj})\Big\} = 0, \quad i = 1, \dots, k-1. \qquad (20) $$
Suppose $\hat\beta_1, \dots, \hat\beta_{k-1}$ solve the score Equation (20). Then, the primal solution $\hat{\mathbf{p}}$ is given by
$$ \hat{p}_j = \frac{\exp\big\{\sum_{i=1}^{k-1}\hat\beta_i\,(x_{ij} - x_{kj})\big\}}{\sum_{l=1}^{n}\exp\big\{\sum_{i=1}^{k-1}\hat\beta_i\,(x_{il} - x_{kl})\big\}}, \quad j = 1, \dots, n. \qquad (21) $$
For arbitrary $\boldsymbol{\beta} = (\beta_1, \dots, \beta_{k-1})'$, define the sample score vector $G_n(\boldsymbol{\beta})$ from the left-hand side of (20), together with its first and second derivative matrices. By (20) and (21), $G_n(\hat{\boldsymbol{\beta}}) = \mathbf{0}$, a vector of zeroes of length $k-1$. Furthermore, a Taylor expansion (22) of $G_n(\boldsymbol{\beta})$ around $\boldsymbol{\beta} = \mathbf{0}$ (under $H_0$) holds with an intermediate point $\boldsymbol{\beta}^*$ satisfying $\max_i |\beta_i^*| \le \max_i |\hat\beta_i|$. As $n \to \infty$, by the strong law of large numbers, the sample score and its derivatives converge, with probability 1, to their population counterparts in (23). By (21) and (23), it follows that, when $H_0$ holds, $\hat{\boldsymbol{\beta}} \to \mathbf{0}$ with probability 1. As $n \to \infty$, by the central limit theorem, $\sqrt{n}\,G_n(\mathbf{0})$ converges in distribution to a multivariate normal (24), with mean zero and covariance defined in (25) and (26). By (22)–(26), $\sqrt{n}\,\hat{\boldsymbol{\beta}}$ converges in distribution to a multivariate normal as $n \to \infty$. The above developments are summarized in the following theorem, which establishes the asymptotic normality of the parameters of the proposed model when the sample sizes are equal.
Theorem 3. For a general reference pdf $g$, assume that the solution of (13), $f^*$, exists and that the required moment conditions hold for $\boldsymbol{\beta}$ in an open neighborhood of $\mathbf{0}$. When all sample sizes are equal to $n$ and $n \to \infty$, $\sqrt{n}\,(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$ converges in distribution to a multivariate normal whose covariance is built from the matrices defined in (24) and (26), respectively. When $H_0$ holds, or equivalently, $\boldsymbol{\beta} = \mathbf{0}$, the corresponding quadratic form in $\hat{\boldsymbol{\beta}}$ has an asymptotic chi-square distribution with $k-1$ degrees of freedom as $n \to \infty$.
Thus, the test statistic $n\,\hat{\boldsymbol{\beta}}'\,\hat{V}^{-1}\hat{\boldsymbol{\beta}}$, with $\hat{V}$ an estimate of the asymptotic covariance of $\sqrt{n}\,\hat{\boldsymbol{\beta}}$, can be used for testing the hypothesis $H_0: \beta_1 = \cdots = \beta_{k-1} = 0$, where the matrices in (24) and (26) are estimated by their sample counterparts (since $\boldsymbol{\beta} = \mathbf{0}$ under $H_0$, we replaced $\boldsymbol{\beta}$ by $\mathbf{0}$).
Clearly, the above developments can be extended for simultaneous mean and variance comparisons for $k$ populations by modifying the constraint set $\mathcal{C}$ in (11).
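To make the procedure of this subsection concrete, the following sketch is our illustration only: the function names are ours, the covariance estimate is a generic sandwich plug-in rather than the exact expressions in (24) and (26), and $h(x) = x$ is assumed. It estimates $\hat{\boldsymbol{\beta}}$ from the empirical score Equation (20) and computes a Wald-type chi-square statistic for $H_0$:

```python
import numpy as np
from scipy.optimize import minimize

def equal_means_tilt_test(samples):
    """Semiparametric tilt test for equality of k means, equal sample sizes n."""
    X = np.asarray(samples, dtype=float)          # shape (k, n)
    k, n = X.shape
    d = (X[:-1] - X[-1]).T                        # contrasts x_ij - x_kj, shape (n, k-1)

    # Dual objective: the score equation (20) is the gradient of this convex function.
    def dual(beta):
        return np.mean(np.exp(d @ beta))

    beta_hat = minimize(dual, np.zeros(k - 1), method="BFGS").x

    # Generic sandwich covariance plug-in for sqrt(n) * beta_hat (our choice).
    w = np.exp(d @ beta_hat)
    A = (d * w[:, None]).T @ d / n                # derivative of the score
    B = (d * (w**2)[:, None]).T @ d / n           # variance of the score terms
    cov = np.linalg.inv(A) @ B @ np.linalg.inv(A)

    stat = n * beta_hat @ np.linalg.solve(cov, beta_hat)   # ~ chi-square(k-1) under H0
    return beta_hat, stat

rng = np.random.default_rng(0)
samples = rng.gamma(shape=2.0, scale=1.5, size=(3, 200))   # 3 skewed populations, equal means
beta_hat, stat = equal_means_tilt_test(samples)
print(beta_hat, stat)
```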
3.2. Different Sample Sizes
The simultaneous approach developed above for equal sample sizes does not allow the sample sizes to be different. To that end, we consider $k-1$ independent one-at-a-time population optimization problems by adopting the development in Section 2, reversing the roles of $f_i$ and $g$ in (4) and fixing a common constrained mean $\theta$, and, finally, we combine the $k-1$ results. Both procedures work for equal sample sizes.
For $i = 1, \dots, k-1$, consider the $i$th problem as finding the pdf in
$$ \mathcal{C}_i = \Big\{ f : \int h(x)\, f(x)\, dx = \theta \Big\} \qquad (29) $$
which is the closest to $f_i$, assuming that $\theta$ is known. Following similar steps as in Section 2, the pdf in (29) which is the closest to $f_i$ is given by an exponential tilt of $f_i$, where $\beta_i$ solves
$$ \int (h(x) - \theta)\, \exp\{\beta_i\,(h(x) - \theta)\}\, f_i(x)\, dx = 0. \qquad (31) $$
If $\beta_i = 0$, then $f_i$ itself satisfies the constraint in (29).
To develop the corresponding sample optimization problems, suppose independent random samples of sizes $n_1, \dots, n_k$ are available from the $k$ populations with means $\mu_1, \dots, \mu_k$, respectively. Although we assume that $\theta$ is known, in reality, it may be unknown. Thus, we suggest choosing the $k$th sample as the one that is the largest in size. Let $\bar{x}_k$ be the mean of the $k$th sample. Then, take $\theta = \bar{x}_k$ (see Section 6).
For the $i$th ($i = 1, \dots, k-1$) sample optimization problem, let $\tilde{g}_i$ be an empirical distribution that has mass $1/n_i$ at each $x_{ij}$. Let the $i$th sample version of $\mathcal{C}_i$ in (29), say $\mathcal{C}_{n_i}$, containing the pmf $\mathbf{p}_i = (p_{i1}, \dots, p_{in_i})$, be defined as
$$ \mathcal{C}_{n_i} = \Big\{ \mathbf{p}_i : \sum_{j=1}^{n_i} p_{ij}\,(h(x_{ij}) - \theta) = 0, \ \ p_{ij} \ge 0, \ \sum_{j=1}^{n_i} p_{ij} = 1 \Big\}. $$
The $i$th ($i = 1, \dots, k-1$) sample version of (6) and its dual problem become
$$ \min_{\mathbf{p}_i \in \mathcal{C}_{n_i}} \sum_{j=1}^{n_i} p_{ij} \log\frac{p_{ij}}{1/n_i} \quad \text{and} \quad \min_{\beta_i} \frac{1}{n_i}\sum_{j=1}^{n_i} \exp\{\beta_i\,(h(x_{ij}) - \theta)\}, $$
respectively. The $i$th score equation is
$$ \frac{1}{n_i}\sum_{j=1}^{n_i} (h(x_{ij}) - \theta)\, \exp\{\beta_i\,(h(x_{ij}) - \theta)\} = 0. \qquad (32) $$
Suppose $\hat\beta_i$ solves (32). Then, the primal solution $\hat{\mathbf{p}}_i$ is given by
$$ \hat{p}_{ij} = \frac{\exp\{\hat\beta_i\,(h(x_{ij}) - \theta)\}}{\sum_{l=1}^{n_i}\exp\{\hat\beta_i\,(h(x_{il}) - \theta)\}}, \quad j = 1, \dots, n_i. \qquad (33) $$
For arbitrary $\beta_i$, define the sample score function from the left-hand side of (32), together with its first and second derivatives (34). By (32), the sample score vanishes at $\hat\beta_i$. A Taylor expansion (35) of the sample score around $\beta_i = 0$ holds with an intermediate point $\beta_i^*$ satisfying $|\beta_i^*| \le |\hat\beta_i|$. With $n_i \to \infty$, by the strong law of large numbers, the sample score and its derivative converge, with probability 1, to their population counterparts. From (31), the population score at $\beta_i = 0$ is zero under $H_0$. When $H_0$ holds, by (33) and (35), it follows that, as $n_i \to \infty$, $\hat\beta_i \to 0$ with probability 1. As $n_i \to \infty$, by the central limit theorem, $\sqrt{n_i}$ times the sample score at $\beta_i = 0$ converges in distribution to a normal with mean zero. Combining these results, $\sqrt{n_i}\,\hat\beta_i$ converges in distribution to a normal as $n_i \to \infty$.
Next, we combine the results from the $k-1$ sample optimization problems.
Theorem 4. For a general reference pdf $g$, assume that the solution of (13) subject to (29) exists and that the required moment conditions hold for each $\beta_i$ in an open neighborhood of $0$. Since all $k$ samples are independent, the $\hat\beta_i$'s are asymptotically independent normal as $\min_i n_i \to \infty$. Then, under $H_0$, the sum of the $k-1$ standardized quadratic terms has an asymptotic chi-square distribution with $k-1$ degrees of freedom as $\min_i n_i \to \infty$.

Thus, the test statistic (40) can be used for testing the hypothesis $H_0: \beta_1 = \cdots = \beta_{k-1} = 0$, where the unknown asymptotic variances are replaced by their sample estimates. Since the populations share a common mean under $H_0$, we replace each of the individual estimates by a common estimate and then calculate a pooled estimate of variance over the $k$ populations. With this substitution, the test statistic from (40) is simplified as (41).

The above developments can be extended for simultaneous mean and variance comparisons for $k$ populations by modifying the constraint set in (29).
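As an illustration of the one-at-a-time procedure (our sketch only; the variance plug-ins and function names follow a generic construction rather than the exact expressions in (40) and (41), $h(x) = x$ is assumed, and the estimation error in $\theta$ is ignored), each $\hat\beta_i$ is obtained from its own score equation and the independent pieces are combined into a chi-square statistic with $k-1$ degrees of freedom:

```python
import numpy as np
from scipy.optimize import brentq

def one_at_a_time_tilt_test(samples):
    """Tilt test for equality of k means with unequal sample sizes; h(x) = x."""
    samples = sorted(samples, key=len)            # put the largest sample last
    ref = np.asarray(samples[-1], dtype=float)
    theta = ref.mean()                            # reference constraint value

    stat = 0.0
    for x in samples[:-1]:
        d = np.asarray(x, dtype=float) - theta    # centered values h(x_ij) - theta
        score = lambda b: np.mean(d * np.exp(b * d))
        beta_hat = brentq(score, -5.0, 5.0)       # solve the i-th score equation (32)

        # One-dimensional sandwich variance plug-in for sqrt(n_i) * beta_hat (our choice).
        w = np.exp(beta_hat * d)
        a = np.mean(w * d**2)
        b_var = np.mean(w**2 * d**2)
        stat += len(d) * beta_hat**2 * a**2 / b_var   # contribution of population i

    return stat                                    # ~ chi-square(k-1) under H0

rng = np.random.default_rng(1)
samples = [rng.exponential(2.0, size=n) for n in (80, 120, 300)]  # skewed, equal means
print(one_at_a_time_tilt_test(samples))
```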
For more than one $h$, let $h_1, \dots, h_t$ represent $t$ constraints.

(i) For equal sample sizes, recall the quantities defined in Section 3.1. Then, the IC is built from these quantities, where $\lambda_i$ is the $i$th eigenvalue of the relevant covariance matrix; the required factor of that matrix is found by Cholesky decomposition.

(ii) For different sample sizes, define the analogous quantities from Section 3.2. Then, the IC is built in the same way, where $\lambda_l$ is the $l$th eigenvalue of the corresponding matrix, again found by Cholesky decomposition.
6. Discussions
Often prior information is known about the population. However, the sample collected may not reveal this information due to sampling variability. Hence, it is worthwhile to build a model that satisfies the prior information and is the closest to the observed data. In Theorem 1, $g$ serves as the observed data, $\mathcal{C}$ serves as the prior information, and the distance between $f \in \mathcal{C}$ and $g$ is measured using the KL distance.
This paper proposed a method to compare different populations based on a set of restrictions specified by the investigator. The restrictions were set in the form of moment constraints through one or more functions $h$. Setting different types of $h$ compares different aspects of the distributions under consideration; e.g., $h(x) = x$ in (4) compares $f_i$ and $g$ regarding their means. When $\beta_i = 0$ in (4), then $f_i = g$. However, when $\beta_i \neq 0$, then $f_i$ and $g$ might differ in aspects other than only their values of $E[h(X)]$.
For real data, one can obtain basic information from the data, including the shape. If the distributions under consideration are known to be approximately symmetric, using $h(x) = x$ and/or $h(x) = x^2$ may be the first steps to determine whether the distributions differ regarding their means and/or variances. However, if the distributions under consideration are known to be skewed, then other choices of $h$ would be more appropriate. In general, the reference distribution in (4) may be any of the $k$ distributions, leaving the exponential distortion intact but with shifted parameters. When using the one-at-a-time method for different sample sizes, we chose the sample with the largest size as the reference, considering it to be the most trusted, and used its mean as $\theta$.