Article

Representative Points from a Mixture of Two Normal Distributions

1 Department of Statistics and Data Science, Beijing Normal University–Hong Kong Baptist University United International College, Zhuhai 519087, China
2 Department of Mathematics, Hong Kong Baptist University, Kowloon, Hong Kong
3 The Key Lab of Random Complex Structures and Data Analysis, The Chinese Academy of Sciences, Beijing 100045, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(21), 3952; https://doi.org/10.3390/math10213952
Submission received: 29 August 2022 / Revised: 17 October 2022 / Accepted: 19 October 2022 / Published: 24 October 2022
(This article belongs to the Special Issue Distribution Theory and Application)

Abstract

In recent years, the mixture of two-component normal distributions (MixN) has attracted considerable interest due to its flexibility in capturing a variety of density shapes. In this paper, we investigate the problem of discretizing a MixN by a fixed number of points under the minimum mean squared error criterion (the resulting points are called MSE-RPs). Motivated by the Fang-He algorithm, we provide an effective computational procedure with high precision for generating numerical approximations of MSE-RPs from a MixN. We explore the properties of the nonlinear system used to generate MSE-RPs and demonstrate the convergence of the procedure. In numerical studies, the proposed computational procedure is compared with the k-means algorithm. From an application perspective, MSE-RPs have potential advantages in statistical inference. Our numerical studies show that MSE-RPs can significantly improve kernel density estimation.

1. Introduction

A finite mixture of distributions allows for great flexibility in capturing a variety of density shapes. During the past few years, the Gaussian mixture family has attracted considerable attention due to its ability to approximate any continuous distribution by a finite Gaussian mixture model. Mixtures of normal distributions have numerous applications across a variety of disciplines, including physics, engineering, economics, biology, and finance. Andrew et al. [1] apply a two-component Gaussian mixture model for fast neutron detection with a pulse shape discriminating scintillator. Kong et al. [2] model wireless channels using a mixture of normal distributions in order to analyze physical layer security over fading channels. Shen et al. [3] employ a mixture of two-component normal distributions to model the knock intensity in the gasoline engine combustion control problem. Mazzeo et al. [4] and Ouarda and Charron [5] use a mixture of two truncated normal distributions for modeling wind speed. The mixture of normal distributions is capable of capturing the leptokurtic, skewed, and multimodal characteristics of financial data; Venkataraman [6] applies the mixture of two-component normal distributions to construct Value at Risk (VaR) measures. In public health and biomedical studies, the admixture model is a two-component mixture model with one component indexed by an unknown parameter and the other component's parameter known. Admixture models are widely used to account for population heterogeneity in genetic linkage analysis [7,8].
It is important to keep in mind that although the mixture model is flexible, it also presents certain statistical challenges. The number of parameters increases more than k-fold when k-component normal mixtures are applied. The mixture of normal distributions is not always an identifiable distribution: different values of its parameters may generate the same probability distribution of the observable variables. The loss of identifiability breaks the asymptotic normality and the chi-squared approximation of a likelihood ratio test statistic constructed from independent and identically distributed samples from a mixture model. Hartigan [9] showed that the likelihood ratio test statistic for homogeneity is stochastically unbounded and therefore fails to converge to the classical chi-squared distribution. Due to the non-identifiability problem, the asymptotic normality and chi-squared approximation of the likelihood ratio test statistic do not hold for the admixture models [8]. In addition to the loss of identifiability, the additive density form of a mixture model results in an unbounded likelihood function [10]. This singularity problem breaks down the consistency of the MLE and affects its asymptotic normality due to the occurrence of a degenerate Fisher information matrix. Recently, Panić et al. [11] and Li and Fang [12] developed improved methods for estimating the parameters of mixtures of normal distributions to overcome these difficulties. More discussion on the inconsistency of the MLE under a mixture of two-component normal distributions can be found in Chen [13].
In this paper, we consider the problem of discretization of a mixture of two-component normal distributions (MixN) under the minimum mean squared error measure. Quantization is a process employed in source coding and information theory for converting continuous analog signals into discrete digital signals. Analog quantities must be quantized when they are represented, processed, stored, or transmitted by a digital system [14]. Quantization is also required for data compression. More information about quantization can be found in Graf and Luschgy [15], Gersho and Gray [16]. Max [17] considered the problem of minimizing the distortion of a signal by a quantizer when the number of output levels of the quantizer is fixed. The distortion can be defined as the squared error between the input signal and the output of the quantizer. Assume the input signal follows a continuous distribution. The optimal output levels are the representative points of the underlying distribution with minimum mean squared error. Our paper refers to the representative points as MSE-RPs for short. A more detailed discussion of MSE-RPs and their associated properties is provided in Section 3.
Due to the complexity of the density function of a MixN, selecting MSE-RPs from a MixN presents a challenge. Flury [18] provided analytic solutions for the two principal points (MSE-RPs) of univariate symmetric distributions, such as the uniform distribution and the normal distribution. However, the additive density form and the uncertainty concerning the uniqueness of MSE-RPs make it difficult to obtain an analytic solution for two MSE-RPs from a MixN. A log-concavity condition is commonly used in the literature to guarantee a unique solution for MSE-RPs [19,20]. Later, Yamamoto and Shinozaki [21] derived a sufficient condition for the existence of unique two principal points (MSE-RPs) for the mirrored mixture of normal distributions; mirrored mixtures of two-component normals have a symmetric distribution and a mixture proportion of 0.5. Using Trushkin's result [19], we derive in Section 3 a sufficient condition for location mixtures of two normal distributions with a mixture proportion ranging from 0 to 1. This condition can be used with any number of MSE-RPs.
To obtain numerical approximations to the MSE-RPs from a MixN, the Lloyd I procedure, also known as the k-means algorithm, is most commonly applied [22,23]. Rowe [24] proposed an algorithm for finding solutions to a loss function of arbitrary order; in our case, we consider the order of 2. The algorithm searches for one of the n points at a time and then adjusts the other values based on the success of the corresponding search, which may not be efficient when generating large numbers of points. MSE-RPs of univariate distributions can be obtained by solving a system of non-linear equations, formulated by taking the first-order partial derivatives of the mean squared error function with respect to each point. Recently, Chakraborty et al. [25] applied the iterative Newton's method to solve the nonlinear system and demonstrated that their method calculates MSE-RPs with high precision for many univariate distributions. However, in their method, the initial values for the iterative Newton's method are assigned at random, so convergence of the algorithm is not always guaranteed. Fang and He [26] proposed a computation procedure (the Fang-He algorithm), also for solving the nonlinear system, and proved the convergence of the proposed algorithm for the normal distribution. Later, many authors proved the convergence of the Fang-He algorithm for other univariate distributions, for example, the Student t-distribution [27], the Gamma distribution [28], the S-type distributions [29], the Weibull distribution [30], and the Pearson distributions [31]. Recently, Fang et al. [32] applied the Fang-He algorithm to generate MSE-RPs from a MixN with a bimodal density. However, further investigation is necessary to determine whether the Fang-He algorithm achieves convergence for a MixN.
Taking advantage of the high precision and effectiveness of the Fang-He algorithm, we employ the same algorithm for finding numerical approximations of MSE-RPs from a MixN. The convergence of the algorithm and the properties of the nonlinear system based on the underlying MixN distribution are discussed in Section 4.2. Our numerical studies indicate that the Fang-He algorithm is capable of providing more accurate MSE-RPs from the mixture of two-component normal distributions than the well-known k-means algorithm.
The rest of this paper is organized as follows. Section 2 includes the preliminaries of two-component normal mixtures. Section 3 discusses the MSE-RPs from a MixN. Section 4 gives the proposed computation procedures for MSE-RP selection. Section 5 presents numerical studies examining the accuracy of the algorithms; a simulation study of kernel density estimation for signal transmission is also performed in this section. Finally, Section 6 gives conclusions and remarks.

2. Mixtures of Two-Component Normal Distributions

Consider a random variable $X$ that follows a mixture of two independent normal distributions, denoted as $X \sim \mathrm{MixN}(\alpha, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$. The parameter $\alpha$ ranges from 0 to 1 and represents the contribution of the first normal component $N(\mu_1, \sigma_1^2)$. The probability density function (pdf) of $X$ is defined as
$$f(x) = \alpha\,\phi(x; \mu_1, \sigma_1^2) + (1-\alpha)\,\phi(x; \mu_2, \sigma_2^2),$$
and the cumulative distribution function (cdf) of X is given by
$$F(x) = \alpha\,\Phi(x; \mu_1, \sigma_1^2) + (1-\alpha)\,\Phi(x; \mu_2, \sigma_2^2),$$
where $\alpha \in (0,1)$, and $\phi(\cdot\,; \mu, \sigma^2)$ and $\Phi(\cdot\,; \mu, \sigma^2)$ denote the pdf and cdf of $N(\mu, \sigma^2)$, respectively. For the remainder of this paper, we denote MixN as a mixture of two-component normal distributions.
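For concreteness, the pdf and cdf above can be evaluated with a short Python sketch based on SciPy (the function names below are ours, introduced only for illustration):

```python
import numpy as np
from scipy.stats import norm

def mixn_pdf(x, alpha, mu1, var1, mu2, var2):
    """f(x) = alpha * phi(x; mu1, var1) + (1 - alpha) * phi(x; mu2, var2)."""
    return (alpha * norm.pdf(x, loc=mu1, scale=np.sqrt(var1))
            + (1.0 - alpha) * norm.pdf(x, loc=mu2, scale=np.sqrt(var2)))

def mixn_cdf(x, alpha, mu1, var1, mu2, var2):
    """F(x) = alpha * Phi(x; mu1, var1) + (1 - alpha) * Phi(x; mu2, var2)."""
    return (alpha * norm.cdf(x, loc=mu1, scale=np.sqrt(var1))
            + (1.0 - alpha) * norm.cdf(x, loc=mu2, scale=np.sqrt(var2)))
```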
The normal distribution belongs to the location-scale family. The convex combination of two normal distributions, however, is no longer a location-scale distribution. There are two special sub-classes of MixN, arising when the two normal components have the same location or the same scale. Let $X \sim \mathrm{MixN}(\alpha, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$.
  • If $\mu_1 = \mu_2$ and $\sigma_1^2 \ne \sigma_2^2$, $X$ follows a scale mixture, denoted as $\mathrm{MixN.S}(\alpha, \mu, \sigma_1^2, \sigma_2^2)$;
  • If $\mu_1 \ne \mu_2$ and $\sigma_1^2 = \sigma_2^2$, $X$ follows a location mixture, denoted as $\mathrm{MixN.L}(\alpha, \mu_1, \mu_2, \sigma^2)$.
Mirrored mixtures are a special case of location mixtures with $\alpha = 0.5$ and symmetric component means $\pm\mu$, denoted as $\mathrm{MixN.M}(\mu, \sigma^2)$. The corresponding density function is
$$f(x) = \frac{0.5}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} + \frac{0.5}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x+\mu)^2}{2\sigma^2}}. \tag{1}$$
The density of a MixN is either unimodal or bimodal. Statistical analyses based on a MixN are potentially influenced by the number of modes. For instance, when the densities of the two mixture components overlap heavily, parameter estimation of a MixN is difficult, because the correct component for each observation is hard to identify. In the case of a bimodal MixN, the two normal components can be separated more easily, resulting in more accurate parameter estimation; in contrast, parameter estimation for a unimodal MixN is challenging. In our study, the number of modes also influences the generation of MSE-RPs. It remains unexplored whether a bimodal MixN admits a unique set of MSE-RPs. Hence, all numerical approximations of MSE-RPs from a bimodal MixN are only guaranteed to be stationary points.
It is advantageous to know the conditions on the number of modes prior to setting the five parameters of a MixN in statistical simulations. Next, we review and discuss some sufficient conditions for unimodal and bimodal densities, respectively. Eisenberger [33] derived a sufficient condition for a MixN to be unimodal for all $\alpha$, $0 < \alpha < 1$, namely
$$(\mu_1 - \mu_2)^2 < \frac{27\,\sigma_1^2\sigma_2^2}{4\left(\sigma_1^2 + \sigma_2^2\right)}, \tag{2}$$
and a sufficient condition for a MixN to be bimodal, that is, there exist values of $\alpha$, $0 < \alpha < 1$, such that the density has two modes if
$$(\mu_2 - \mu_1)^2 > \frac{8\,\sigma_1^2\sigma_2^2}{\sigma_1^2 + \sigma_2^2}. \tag{3}$$
According to Behboodian [34], a sufficient condition for a MixN to be unimodal is
$$|\mu_1 - \mu_2| \le 2\min\{\sigma_1, \sigma_2\}.$$
Consider the special sub-classes of MixN. If $\mu_1 = \mu_2$ is assumed, a scale mixture (MixN.S) is unimodal for any $\alpha \in (0,1)$, according to the sufficient conditions (2) and (3). When $\sigma_1^2 = \sigma_2^2 = \sigma^2$ is assumed, Behboodian [34] further obtained a sufficient condition for a location mixture (MixN.L) to be unimodal, that is,
$$|\mu_1 - \mu_2| \le 2\sigma\sqrt{1 + \frac{\left|\log\alpha - \log(1-\alpha)\right|}{2}}, \quad \alpha \in (0,1). \tag{4}$$
Note that (4) gives a dynamic upper bound on $|\mu_1 - \mu_2|$ for a unimodal location mixture, whereas the condition derived from (2) is $|\mu_1 - \mu_2| < 3\sqrt{6}\,\sigma/4$ for all $\alpha \in (0,1)$. Behboodian [34] further proved that a location mixture with $\sigma_1^2 = \sigma_2^2$ and $\alpha = 0.5$ is unimodal if and only if $|\mu_1 - \mu_2| \le 2\sigma$; hence, such a mixture is bimodal if and only if $|\mu_2 - \mu_1| > 2\sigma$.
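As an illustration, the following Python sketch (our own, not from the paper) evaluates conditions (2) and (3) and Behboodian's bound for a given set of component parameters; the function name and return convention are ours:

```python
import numpy as np

def modality_hints(mu1, var1, mu2, var2):
    """Check the sufficient conditions (2)-(3) and Behboodian's bound."""
    d2 = (mu1 - mu2) ** 2
    # (2): unimodal for every mixing proportion alpha in (0, 1)
    unimodal_all_alpha = d2 < 27.0 * var1 * var2 / (4.0 * (var1 + var2))
    # (3): bimodal for some mixing proportion alpha in (0, 1)
    bimodal_some_alpha = d2 > 8.0 * var1 * var2 / (var1 + var2)
    # Behboodian's sufficient condition for unimodality
    behboodian = abs(mu1 - mu2) <= 2.0 * min(np.sqrt(var1), np.sqrt(var2))
    return unimodal_all_alpha, bimodal_some_alpha, behboodian
```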
The MixN models can achieve a great deal of flexibility using only a few components. Several examples of MixN densities are presented in Figure 1 to demonstrate this flexibility. The upper left panel of Figure 1 shows the density of $\mathrm{MixN}(0.1, 0, 100, 0, 1)$, in which one mixture component is a standard normal and the other component has the same mean but a 100-fold larger variance. This type of mixture is often used as a theoretical model for identifying outliers in a single sample or in regression problems [35]. The upper right panel of Figure 1 displays an example of a right-skewed unimodal density. The lower left panel of Figure 1 shows a mirrored mixture with one mode, whose density is nearly flat around the mode. The lower right panel of Figure 1 presents a bimodal example.

3. MSE-RPs from a MixN

Suppose $X \sim \mathrm{MixN}(\alpha, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$ with cdf $F(x)$ and pdf $f(x)$. Consider defining a discrete distribution $\hat{F}$ with $n$ support points, where $n \in \mathbb{N}^+$, to approximate the MixN. To provide the best representation of the MixN, we select representative points that have the least mean squared error from $F(x)$. Assume $E(X^2) < \infty$. Let $Y_n$ follow a discrete distribution $F_{mse,n}$ defined as
$$F_{mse,n}(y) = \sum_{j=1}^{n} p_j^{(n)}\, 1\!\left\{a_j^{(n)} \le y\right\}$$
with the probability mass function
$$P\!\left(Y_n = a_i^{(n)}\right) = p_i^{(n)}, \quad i = 1, \dots, n,$$
where $a_1^{(n)} < \dots < a_n^{(n)}$ are MSE-RPs of $X$ and $p_1^{(n)}, \dots, p_n^{(n)}$ are the corresponding probabilities, with
$$\mathrm{MSE}\!\left(X \mid a_1^{(n)}, \dots, a_n^{(n)}\right) = \int_{-\infty}^{+\infty} \min_{i=1,\dots,n} \left(x - a_i^{(n)}\right)^2 f(x)\, dx, \tag{5}$$
and
$$p_1^{(n)} = \int_{-\infty}^{(a_1^{(n)}+a_2^{(n)})/2} f(x)\, dx, \qquad p_i^{(n)} = \int_{(a_{i-1}^{(n)}+a_i^{(n)})/2}^{(a_i^{(n)}+a_{i+1}^{(n)})/2} f(x)\, dx, \quad i = 2, \dots, n-1, \qquad p_n^{(n)} = \int_{(a_{n-1}^{(n)}+a_n^{(n)})/2}^{+\infty} f(x)\, dx.$$
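In code, these probabilities are simply cdf increments over the mid-point intervals. A minimal sketch (the helper name rp_probabilities is ours), reusing mixn_cdf from the sketch in Section 2:

```python
import numpy as np

def rp_probabilities(a, cdf):
    """p_i = F(m_i) - F(m_{i-1}), with mid-points m_i and m_0 = -inf, m_n = +inf."""
    a = np.asarray(a, dtype=float)
    edges = np.concatenate(([-np.inf], (a[:-1] + a[1:]) / 2.0, [np.inf]))
    return np.diff(cdf(edges))
```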
The approximating distribution $F_{mse,n}$ has many useful properties. Given $n \in \mathbb{N}^+$, $F_{mse,n}$ and $F$ have the same population mean, and $F_{mse,n}$ has a smaller variance. Graf and Luschgy [15] (Remark 4.6 and Lemma 6.1) and Fei [36] proved that
$$E(Y_n) = E(X), \qquad \lim_{n \to \infty} E\!\left[(X - Y_n)^2\right] = \lim_{n \to \infty} \left[\mathrm{var}(X) - \mathrm{var}(Y_n)\right] = 0, \tag{6}$$
which imply that $Y_n$ converges to $X$ in mean square as $n$ approaches infinity. Hence, $Y_n$ converges to $X$ in distribution. Flury [37] proved that all MSE-RPs are self-consistent, that is,
$$a_i^{(n)} = E\!\left(X \mid Y_n = a_i^{(n)}\right), \quad i = 1, \dots, n.$$
This property explains that each representative point can best represent a region from the population based on the mean squared error criterion.
Next, we discuss the uniqueness of a locally optimal solution, that is, a stationary point of the mean squared error function (5) from a MixN. According to Li and Flury [38], the set of $n$ MSE-RPs, $n \in \mathbb{N}^+$, is unique if $|\mu|/\sigma \le 1$ for a mirrored mixture of two normal distributions ($\mathrm{MixN.M}(\mu, \sigma^2)$) defined in (1). Lemma 1 gives a sufficient condition for the uniqueness of MSE-RPs.
Lemma 1.
(Trushkin [19]). Under Conditions (1) and (2) below, if $X$ has a log-concave density $f(x)$, then, for a given $n$, there exists a unique set of MSE-RPs $Y_n = \{y_1, \dots, y_n\}$ that minimizes the mean squared error function of $Y_n$ from the distribution of $X$.
Condition (1): there exists an interval $I = (V, W)$ with $V < W$ such that
$$f(x) > 0 \ \text{for } x \in I, \qquad f(x) = 0 \ \text{for } x \notin I,$$
and $f(x)$ is continuous and positive inside $I$.
Condition (2):
$$\int_V^W (y - x)^2 f(x)\, dx < +\infty \quad \text{for any } y \in Y_n.$$
A MixN with finite variance satisfies Conditions (1) and (2) in Lemma 1. All log-concave densities are unimodal, but not necessarily symmetric. As a result, all bimodal MixN fail to satisfy the sufficient condition given in Lemma 1. Additionally, not all unimodal densities are log-concave; a counter-example is the Student-$t_r$ distribution with $r > 0$ degrees of freedom [39]. A density function $f$ on $\mathbb{R}$ is log-concave if and only if $f$ is strongly unimodal, i.e., a Pólya frequency function of order 2 [20,40]. Although normal distributions are log-concave, a convex combination of two normal distributions cannot easily satisfy this property; further conditions on the five parameters of a MixN are required. In Theorem 1, we establish a sufficient condition for the uniqueness of MSE-RPs from a MixN.
Theorem 1.
(Location mixture of two normal densities). Suppose $X \sim \mathrm{MixN.L}(\alpha, \mu_1, \mu_2, \sigma^2)$. For any $n \in \mathbb{N}^+$, the set of $n$ MSE-RPs of $X$ is unique if, for all $\alpha \in (0,1)$,
$$|\mu_1 - \mu_2| \le \sqrt{2}\,\sigma.$$
The proof of Theorem 1 is presented in Appendix A. The log-concavity condition for uniqueness is restrictive for a mixture of two normal distributions: when the strong unimodality condition is satisfied, the two mixture components overlap severely, which limits the practical application of this particular type of model.
For the purpose of determining the number of principal components in principal component analysis, the proportion of variance explained by each component is considered in practice. Similarly, the information gain (IG) defined in (7) can be used to evaluate the representativeness of a set of MSE-RPs in practice. The IG ranges from 0 to 1 ($0 \le \mathrm{IG} \le 1$), and a set of MSE-RPs is considered valid as long as its IG meets practical expectations:
$$\mathrm{IG} = 1 - \frac{\mathrm{MSE}\!\left(X \mid a_1^{(n)}, \dots, a_n^{(n)}\right)}{\mathrm{var}(X)}. \tag{7}$$
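A sketch of how (5) and (7) can be evaluated numerically is given below; the truncation bounds lo and hi and the function name are our assumptions, and the bounds should be chosen to cover essentially all of the mass of the MixN:

```python
import numpy as np
from scipy.integrate import quad

def information_gain(a, pdf, var_x, lo=-50.0, hi=50.0):
    """IG = 1 - MSE / var(X), with the MSE from (5) evaluated by quadrature."""
    a = np.asarray(a, dtype=float)
    edges = np.concatenate(([lo], (a[:-1] + a[1:]) / 2.0, [hi]))
    mse = sum(quad(lambda x, c=c: (x - c) ** 2 * pdf(x), edges[i], edges[i + 1])[0]
              for i, c in enumerate(a))
    return 1.0 - mse / var_x
```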

4. Numerical Approximations to MSE-RPs from a MixN

Let $X \sim \mathrm{MixN}(\alpha, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$ with density $f(x)$. Given a set of $n$ MSE-RPs of $X$, denoted as $-\infty < a_1^{(n)} < \dots < a_n^{(n)} < +\infty$, the objective function (5) can be expressed as
$$\mathrm{MSE}\!\left(X \mid a_1^{(n)}, \dots, a_n^{(n)}\right) = \int_{-\infty}^{+\infty} \min_{i=1,\dots,n} \left(x - a_i^{(n)}\right)^2 f(x)\, dx = \sum_{i=1}^{n} \int_{\Delta_i} \left(x - a_i^{(n)}\right)^2 f(x)\, dx, \tag{8}$$
where $\Delta_1 = \left(-\infty, \left(a_1^{(n)}+a_2^{(n)}\right)/2\right]$, $\Delta_j = \left(\left(a_{j-1}^{(n)}+a_j^{(n)}\right)/2,\ \left(a_j^{(n)}+a_{j+1}^{(n)}\right)/2\right]$ for $j = 2, \dots, n-1$, and $\Delta_n = \left(\left(a_{n-1}^{(n)}+a_n^{(n)}\right)/2, +\infty\right)$.

4.1. The k-Means Algorithm

Due to the complexity of the MixN distribution, it is challenging to generate MSE-RPs from a MixN. MSE-RPs are the theoretical counterparts of the cluster means obtained by a k-means algorithm, and the k-means algorithm is the most popular method to approximate MSE-RPs. We summarize the computation procedure of the k-means algorithm for approximating MSE-RPs from a MixN.
Step 1. Generate training samples $x_1, x_2, \dots, x_N$ with a large size $N$ from the underlying MixN.
Step 2. Initialize cluster centers $m_1^{(0)}, \dots, m_n^{(0)}$, usually by the Monte Carlo method.
Step 3. Assign each observation x in the training sample to the cluster with the nearest mean measured by the least squared Euclidean distance.
$$S_i^{(t)} = \left\{x : \left\|x - m_i^{(t)}\right\|^2 \le \left\|x - m_j^{(t)}\right\|^2 \ \forall j,\ 1 \le j \le n\right\}.$$
Points with equal distances to different cluster centers are arbitrarily assigned to one of them.
Step 4. Recalculate the mean of the observations assigned to each cluster to form a new set of cluster centers. For $i = 1, \dots, n$, calculate
$$m_i^{(t+1)} = \frac{1}{n_i^{(t)}} \sum_{x \in S_i^{(t)}} x,$$
where $n_i^{(t)}$ is the number of observations falling in $S_i^{(t)}$.
Step 5. Repeat Steps 3 and 4 until the cluster centers no longer change.
According to the k-means algorithm, the estimated $F_{mse,n}$ is formed by the k-means centers $m_i$ and the corresponding probabilities $p_i = n_i / N$ for $i = 1, \dots, n$.
Although we can generate as many training samples from the MixN as we wish, the k-means algorithm, as a non-parametric method, is not always reliable. In particular, its performance is strongly influenced by the initial values.
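For reference, a compact sketch of Steps 1–5 built on scikit-learn's KMeans is given below; the sample size, the MixN parameters, and the seed are illustrative only:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_rps(sample, n_points, seed=0):
    """Approximate n MSE-RPs by k-means centers, with p_i = n_i / N."""
    km = KMeans(n_clusters=n_points, n_init=10, random_state=seed)
    labels = km.fit_predict(sample.reshape(-1, 1))
    order = np.argsort(km.cluster_centers_.ravel())
    centers = km.cluster_centers_.ravel()[order]
    counts = np.bincount(labels, minlength=n_points)[order]
    return centers, counts / sample.size

# Step 1: a large training sample from MixN(0.3, 1.2, 4, 0.2, 1)
rng = np.random.default_rng(0)
N = 100_000
first = rng.random(N) < 0.3
sample = np.where(first, rng.normal(1.2, np.sqrt(4.0), N), rng.normal(0.2, 1.0, N))
centers, probs = kmeans_rps(sample, n_points=10)
```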

4.2. The Fang-He Algorithm

In order to generate reliable numerical solutions of MSE-RPs from a MixN, the objective function (8) can be minimized by taking its first-order partial derivatives with respect to $a_1^{(n)}, \dots, a_n^{(n)}$. The optimal solutions to (8) can then be obtained by solving a system of $n$ non-linear equations: for $i = 1, \dots, n$,
$$\alpha \int_{\Delta_i} \left(x - a_i^{(n)}\right) \phi(x; \mu_1, \sigma_1^2)\, dx + (1-\alpha) \int_{\Delta_i} \left(x - a_i^{(n)}\right) \phi(x; \mu_2, \sigma_2^2)\, dx = 0. \tag{9}$$
For a normal distribution $\phi(x; \mu, \sigma^2)$, we have $\sigma^2 \phi'(x; \mu, \sigma^2) = (\mu - x)\,\phi(x; \mu, \sigma^2)$, which implies $\int_{\Delta} x\,\phi(x; \mu, \sigma^2)\, dx = \mu \int_{\Delta} \phi(x; \mu, \sigma^2)\, dx - \sigma^2\, \phi(x; \mu, \sigma^2)\big|_{\Delta}$. Then, the system of Equations (9) can be expressed as: for $i = 1, \dots, n$,
$$\alpha\left[\mu_1\, \Phi(x; \mu_1, \sigma_1^2)\big|_{\Delta_i} - \sigma_1^2\, \phi(x; \mu_1, \sigma_1^2)\big|_{\Delta_i}\right] + (1-\alpha)\left[\mu_2\, \Phi(x; \mu_2, \sigma_2^2)\big|_{\Delta_i} - \sigma_2^2\, \phi(x; \mu_2, \sigma_2^2)\big|_{\Delta_i}\right] - a_i^{(n)}\, F(x)\big|_{\Delta_i} = 0, \tag{10}$$
where $F$ is the cdf of $X$ and $g(x)\big|_{\Delta_i}$ denotes the increment of $g$ over the interval $\Delta_i$. It is very difficult to determine MSE-RPs analytically by solving (10) for a MixN. The strategy of Fang and He [26] is summarized below:
  • Step 1. Set an initial value $a_1^{(n)}$.
  • Step 2. Solve the 1st equation in (10) to obtain $a_2^{(n)}$ by the bisection method or the iterative Newton's method.
  • Step 3. Solve the 2nd equation in (10) to calculate $a_3^{(n)}$ based on the values of $a_1^{(n)}$ and $a_2^{(n)}$ obtained in Steps 1 and 2.
  • Step 4. Given the values of $a_{i-1}^{(n)}$ and $a_i^{(n)}$, solve the $i$th equation in (10) to get $a_{i+1}^{(n)}$ for $i = 3, \dots, n-1$.
  • Step 5. Given the solution $a_{n-1}^{(n)}$ of the $(n-2)$th equation, solve the $n$th equation in (10) to obtain another solution $a_n^{(n)*}$.
  • Step 6. Modify the initial value of $a_1^{(n)}$ and repeat the above procedure until
    $$\left|a_n^{(n)} - a_n^{(n)*}\right| < \epsilon,$$
    where $\epsilon$ represents the error tolerance, which is a very small number.
Next, we prove the convergence of the Fang-He algorithm for a MixN. The equations in (10) can be classified into three types. The Type I equation is the first equation of (10), i.e.,
$$\alpha\left[\mu_1\Phi\!\left(\frac{a_1^{(n)}+a_2^{(n)}}{2}; \mu_1, \sigma_1^2\right) - \sigma_1^2\phi\!\left(\frac{a_1^{(n)}+a_2^{(n)}}{2}; \mu_1, \sigma_1^2\right)\right] + (1-\alpha)\left[\mu_2\Phi\!\left(\frac{a_1^{(n)}+a_2^{(n)}}{2}; \mu_2, \sigma_2^2\right) - \sigma_2^2\phi\!\left(\frac{a_1^{(n)}+a_2^{(n)}}{2}; \mu_2, \sigma_2^2\right)\right] - a_1^{(n)}\, F\!\left(\frac{a_1^{(n)}+a_2^{(n)}}{2}\right) = 0. \tag{11}$$
The Type II equation is the last equation of (10), i.e.,
$$\alpha\left[\mu_1\left(1 - \Phi\!\left(\frac{a_{n-1}^{(n)}+a_n^{(n)}}{2}; \mu_1, \sigma_1^2\right)\right) + \sigma_1^2\phi\!\left(\frac{a_{n-1}^{(n)}+a_n^{(n)}}{2}; \mu_1, \sigma_1^2\right)\right] + (1-\alpha)\left[\mu_2\left(1 - \Phi\!\left(\frac{a_{n-1}^{(n)}+a_n^{(n)}}{2}; \mu_2, \sigma_2^2\right)\right) + \sigma_2^2\phi\!\left(\frac{a_{n-1}^{(n)}+a_n^{(n)}}{2}; \mu_2, \sigma_2^2\right)\right] - a_n^{(n)}\left(1 - F\!\left(\frac{a_{n-1}^{(n)}+a_n^{(n)}}{2}\right)\right) = 0. \tag{12}$$
The Type III equations are the 2nd to the $(n-1)$th equations of (10), i.e., for $i = 2, \dots, n-1$,
$$\begin{aligned} &\alpha\mu_1\left[\Phi\!\left(\frac{a_i^{(n)}+a_{i+1}^{(n)}}{2}; \mu_1, \sigma_1^2\right) - \Phi\!\left(\frac{a_{i-1}^{(n)}+a_i^{(n)}}{2}; \mu_1, \sigma_1^2\right)\right] + \alpha\sigma_1^2\left[\phi\!\left(\frac{a_{i-1}^{(n)}+a_i^{(n)}}{2}; \mu_1, \sigma_1^2\right) - \phi\!\left(\frac{a_i^{(n)}+a_{i+1}^{(n)}}{2}; \mu_1, \sigma_1^2\right)\right] \\ &\quad + (1-\alpha)\mu_2\left[\Phi\!\left(\frac{a_i^{(n)}+a_{i+1}^{(n)}}{2}; \mu_2, \sigma_2^2\right) - \Phi\!\left(\frac{a_{i-1}^{(n)}+a_i^{(n)}}{2}; \mu_2, \sigma_2^2\right)\right] + (1-\alpha)\sigma_2^2\left[\phi\!\left(\frac{a_{i-1}^{(n)}+a_i^{(n)}}{2}; \mu_2, \sigma_2^2\right) - \phi\!\left(\frac{a_i^{(n)}+a_{i+1}^{(n)}}{2}; \mu_2, \sigma_2^2\right)\right] \\ &\quad - a_i^{(n)}\left[F\!\left(\frac{a_i^{(n)}+a_{i+1}^{(n)}}{2}\right) - F\!\left(\frac{a_{i-1}^{(n)}+a_i^{(n)}}{2}\right)\right] = 0. \end{aligned} \tag{13}$$
Next, we discuss the properties of the Type I–III equations, respectively. Theorem 2 shows that there is a unique and non-decreasing solution $a_2^{(n)}$ within a specific searching range of $a_1^{(n)}$. Part (i) of Theorem 2 proves that the Type I equation has a unique solution $a_2^{(n)}$ if we set the initial searching range of $a_1^{(n)}$ with an upper bound of $E(X)$. Note that $E(X)$ is the MSE-RP when $n = 1$, so it is feasible to search for the solution $a_2^{(n)}$ given $a_1^{(n)}$ in (14) for $n \ge 2$. Based on Part (ii), the solution $a_2^{(n)}$ is strictly increasing with respect to $a_1^{(n)}$ under condition (15). In practice, condition (15) is easy to meet and convenient to verify.
Theorem 2.
Given $n \ge 2$, denote $E(X)$ as the population mean of $X \sim \mathrm{MixN}(\alpha, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$.
(i) The Type I Equation (11) in (10) has a unique solution $a_2^{(n)} = h_1(a_1^{(n)})$, if and only if,
$$-\infty < a_1^{(n)} < E(X). \tag{14}$$
(ii) $a_2^{(n)} = h_1(a_1^{(n)})$ is a strictly increasing function with respect to $a_1^{(n)}$ if
$$\frac{a_2^{(n)} - a_1^{(n)}}{4} \cdot f\!\left(\frac{a_1^{(n)}+a_2^{(n)}}{2}\right) - F\!\left(\frac{a_1^{(n)}+a_2^{(n)}}{2}\right) < 0 \tag{15}$$
for $a_1^{(n)} \in (-\infty, E(X))$.
The proof of Theorem 2 is presented in Appendix A. Due to the complexity of the density of MixN, it is difficult to prove that the inequality (15) holds over the entire searching range of a 1 ( n ) ( , E ( X ) ) . Remarks 1 and 2 give sufficient conditions for (15) to be true.
Remark 1.
When $\mu_1 = \mu_2$, $a_2^{(n)} = h_1(a_1^{(n)})$ is a strictly increasing function with respect to $a_1^{(n)}$ if
$$E(X) - \sqrt{2}\max\{\sigma_1, \sigma_2\} < a_1^{(n)} < E(X).$$
Remark 2.
When $\mu_2 < \mu_1 < \mu_2 + \sigma_1/(1-\alpha)$, $a_2^{(n)} = h_1(a_1^{(n)})$ is a strictly increasing function with respect to $a_1^{(n)}$ if
$$\mu_2 < a_1^{(n)} < E(X).$$
Theorem 3 proves that the last equation in the non-linear system (10) has a unique and non-decreasing solution $a_n^{(n)}$ given $a_{n-1}^{(n)}$. The proof of Theorem 3 is presented in Appendix A.
Theorem 3.
Given $n \ge 2$ and $a_1^{(n)} \in (-\infty, E(X))$, the Type II Equation (12) in (10) has a unique solution $a_n^{(n)} = h^*(a_{n-1}^{(n)})$, and $h^*(a_{n-1}^{(n)})$ is a strictly increasing function with respect to $a_{n-1}^{(n)}$.
In the case of $n = 2$, the system of Equations (10) reduces to
$$\begin{aligned} &\alpha\left[\mu_1\Phi\!\left(\frac{a_1^{(2)}+a_2^{(2)}}{2}; \mu_1, \sigma_1^2\right) - \sigma_1^2\phi\!\left(\frac{a_1^{(2)}+a_2^{(2)}}{2}; \mu_1, \sigma_1^2\right)\right] + (1-\alpha)\left[\mu_2\Phi\!\left(\frac{a_1^{(2)}+a_2^{(2)}}{2}; \mu_2, \sigma_2^2\right) - \sigma_2^2\phi\!\left(\frac{a_1^{(2)}+a_2^{(2)}}{2}; \mu_2, \sigma_2^2\right)\right] - a_1^{(2)}\, F\!\left(\frac{a_1^{(2)}+a_2^{(2)}}{2}\right) = 0, \\ &\alpha\left[\mu_1\left(1 - \Phi\!\left(\frac{a_1^{(2)}+a_2^{(2)}}{2}; \mu_1, \sigma_1^2\right)\right) + \sigma_1^2\phi\!\left(\frac{a_1^{(2)}+a_2^{(2)}}{2}; \mu_1, \sigma_1^2\right)\right] + (1-\alpha)\left[\mu_2\left(1 - \Phi\!\left(\frac{a_1^{(2)}+a_2^{(2)}}{2}; \mu_2, \sigma_2^2\right)\right) + \sigma_2^2\phi\!\left(\frac{a_1^{(2)}+a_2^{(2)}}{2}; \mu_2, \sigma_2^2\right)\right] - a_2^{(2)}\left(1 - F\!\left(\frac{a_1^{(2)}+a_2^{(2)}}{2}\right)\right) = 0. \end{aligned} \tag{16}$$
For any given $a_1^{(2)}$ that satisfies the conditions in Theorem 2, we can obtain $a_2^{(2)} = h_1(a_1^{(2)})$ from the first equation in (16) and $a_2^{(2)*} = h^*(a_1^{(2)})$ from the second equation in (16), respectively. Following the procedure of the Fang-He algorithm, the iteration stops once $|a_2^{(2)} - a_2^{(2)*}| < \epsilon$. For example, for $X \sim \mathrm{MixN}(\alpha = 0.7, \mu_1 = 4.6, \sigma_1^2 = 2, \mu_2 = 1.5, \sigma_2^2 = 1)$, we obtain a set of 2 MSE-RPs $a_1^{(2)} = 0.552945$ and $a_2^{(2)} = 1.989754$ with probabilities $p_1^{(2)} = 0.696014$ and $p_2^{(2)} = 0.303986$, respectively. The IG of this set of two MSE-RPs is about $59.18\%$.
When $n \ge 3$, we wish to obtain $a_{i+1}^{(n)} = h_i(a_1^{(n)})$ from the Type III equations in (10), based on $a_1^{(n)}$ and $a_2^{(n)} = h_1(a_1^{(n)})$, for $i = 2, \dots, n-1$. Theorem 4 proves that, when $n = 3$, the Type III equation has a unique solution $a_3^{(3)}$ given $a_1^{(3)}$ in the range $(-\infty, a_1^{(2)})$, where $a_1^{(2)}$ is the first point of the set of two MSE-RPs.
Theorem 4.
When $n = 3$, the Type III Equation (13) in (10) has a unique solution $a_3^{(3)} = h_2(a_1^{(3)})$, if and only if,
$$a_1^{(3)} < a_1^{(2)}.$$
The proof of Theorem 4 is presented in Appendix A. According to Theorem 4, for a given $a_1^{(n)}$ with $n > 3$, we obtain $a_2^{(n)} = h_1(a_1^{(n)}),\ a_3^{(n)} = h_2(a_1^{(n)}),\ \dots,\ a_n^{(n)} = h_{n-1}(a_1^{(n)})$ from the 1st, 2nd, …, $(n-1)$th equations of (10), in turn. Then, the Type III Equation (13) has a unique solution $a_j^{(n)} = h_{j-1}(a_1^{(n)})$ if and only if $a_1^{(n)} < a_1^{(j-1)}$ for $j = 3, \dots, n$.
As discussed above, given an initial value of $a_1^{(n)}$, we can calculate the values of $a_{i+1}^{(n)} = h_i(a_1^{(n)})$ for $i = 2, \dots, n-1$, as well as the value of $a_n^{(n)*} = h^*(a_{n-1}^{(n)})$ from the Type II equation. Based on Theorems 2–4, the non-linear system (10) has a unique solution. Given $a_1^{(n)}$, when $a_n^{(n)*}$ is significantly smaller than $a_n^{(n)}$, we can reduce the initial value of $a_1^{(n)}$ to narrow the difference between $a_n^{(n)}$ and $a_n^{(n)*}$; when $a_n^{(n)*}$ is significantly larger than $a_n^{(n)}$, we can increase $a_1^{(n)}$ to shrink the difference between the two approximations of the last point.
In order to improve the efficiency of the algorithm, we need more specific rules for the searching range of the initial value $a_1^{(n)}$ and for the reduction of this range. Set an initial searching region of $a_1^{(n)}$, denoted as $(\mathrm{LB}, \mathrm{UB})$. According to Theorem 2, we can set the upper bound $\mathrm{UB} = E(X)$ to ensure a unique solution of the Type I equation and the monotonicity of $h_1(a_1^{(n)})$. For finding a fixed number of MSE-RPs, we recommend an initial lower bound of $\mathrm{LB} = \min\{\mu_1 - 4\sigma_1, \mu_2 - 4\sigma_2\}$. The initial value given in Step 1 is $a_1^{(n)} = 0.5(\mathrm{LB} + \mathrm{UB})$. Then, in Step 6, we follow the rules below for the iteration:
  • If $\left|a_n^{(n)} - a_n^{(n)*}\right| < \epsilon$, the desired solution set is obtained.
  • If $a_n^{(n)} < a_n^{(n)*} - \epsilon$, the initial value of $a_1^{(n)}$ is too small. Let $\mathrm{LB} = a_1^{(n)}$, set $a_1^{(n)} = 0.5(\mathrm{LB} + \mathrm{UB})$, and go back to Step 1.
  • If $a_n^{(n)} > a_n^{(n)*} + \epsilon$, the initial value of $a_1^{(n)}$ is too large. Let $\mathrm{UB} = a_1^{(n)}$, set $a_1^{(n)} = 0.5(\mathrm{LB} + \mathrm{UB})$, and go back to Step 1.
Therefore, the algorithm converges, as the searching interval for $a_1^{(n)}$ is halved at each iteration. When finding a set of $n = 3$ MSE-RPs, the initial upper bound $E(X)$ is reduced to $a_1^{(2)}$; iteratively, the initial upper bound for $a_1^{(n+1)}$ is adjusted to $a_1^{(n)}$ when computing a set of $n+1$ MSE-RPs. As a result, the Fang-He algorithm effectively finds the unique solution to the nonlinear system (10). A self-contained sketch of the full procedure is given below.
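The following Python sketch implements the procedure under the assumptions above. The class and function names are ours, and the bracket-expansion step and the treatment of a failed chain are our own implementation choices, not prescribed by the paper:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

class MixN:
    """Helper for MixN(alpha, mu1, sigma1^2, mu2, sigma2^2)."""
    def __init__(self, alpha, mu1, var1, mu2, var2):
        self.w = np.array([alpha, 1.0 - alpha])
        self.mu = np.array([mu1, mu2], dtype=float)
        self.sd = np.sqrt([var1, var2])

    def cdf(self, x):
        return sum(w * norm.cdf(x, m, s) for w, m, s in zip(self.w, self.mu, self.sd))

    def partial_mean(self, l, r):
        # integral of x f(x) dx over (l, r), via mu*(Phi(r)-Phi(l)) - sigma^2*(phi(r)-phi(l)); cf. (10)
        return sum(w * (m * (norm.cdf(r, m, s) - norm.cdf(l, m, s))
                        - s ** 2 * (norm.pdf(r, m, s) - norm.pdf(l, m, s)))
                   for w, m, s in zip(self.w, self.mu, self.sd))

def fang_he(dist, n, tol=1e-8, max_iter=500):
    """Numerical MSE-RPs of a MixN for n >= 2 by the Fang-He bisection scheme."""
    ub = float(dist.w @ dist.mu)                       # UB = E(X) (Theorem 2)
    lb = float(np.min(dist.mu - 4.0 * dist.sd))        # recommended initial LB
    for _ in range(max_iter):
        a1 = 0.5 * (lb + ub)
        a, chain_failed = [a1], False
        for i in range(n - 1):
            l = -np.inf if i == 0 else 0.5 * (a[i - 1] + a[i])
            # i-th equation of (10): integral over (l, r) of (x - a_i) f(x) dx = 0
            # in r, with r = (a_i + a_{i+1}) / 2; then a_{i+1} = 2r - a_i
            g = lambda r: dist.partial_mean(l, r) - a[i] * (dist.cdf(r) - dist.cdf(l))
            hi = a[i] + float(np.max(dist.sd))
            while g(hi) < 0.0:                         # expand bracket to the right
                hi = a[i] + 2.0 * (hi - a[i])
                if hi - a[i] > 1e8:                    # no root: a_1 was too large
                    chain_failed = True
                    break
            if chain_failed:
                break
            r = brentq(g, a[i], hi)
            a.append(2.0 * r - a[i])
        if chain_failed:
            ub = a1
            continue
        ln = 0.5 * (a[-2] + a[-1])
        a_star = dist.partial_mean(ln, np.inf) / (1.0 - dist.cdf(ln))  # Type II
        if abs(a[-1] - a_star) < tol:
            return np.array(a)
        if a[-1] < a_star:
            lb = a1                                    # a_1 too small
        else:
            ub = a1                                    # a_1 too large
    raise RuntimeError("Fang-He bisection did not converge")

# illustrative call: five MSE-RPs of a mirrored mixture MixN.M(1, 1)
rps = fang_he(MixN(0.5, -1.0, 1.0, 1.0, 1.0), n=5)
```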
The first 30 MSE-RPs of $\mathrm{MixN}(0.7, 8, 10, 1.5, 1)$ generated by the above computation procedure are presented in Table 1. The first column of Table 1 gives the information gain (IG). The corresponding probabilities of the MSE-RPs are given in Table 2. The density of $\mathrm{MixN}(0.7, 8, 10, 1.5, 1)$ is presented in Figure 2(C5).

5. Numerical Studies

Based on the six underlying distributions listed in Table 3, we compare the Fang-He algorithm with the k-means algorithm and apply MSE-RPs to the kernel density estimation. The corresponding densities of the six distributions are presented in Figure 2.
The distribution C1 is a location mixture that satisfies the log-concavity condition of a unique set of MSE-RPs. The distributions C2 and C3 represent classic mirrored mixtures with unimodal and bimodal distributions, respectively. The distributions C4–C6 are general location–scale mixtures. In particular, C5 and C6 have large population variances.

5.1. Algorithm Comparisons

In this subsection, the Fang-He algorithm is compared with the k-means algorithm. We generate MSE-RPs from the distributions C1–C6 for point sizes of $n = 2, 3, 4, 5, 10, 15$, and 30, respectively. An error tolerance of $\epsilon = 10^{-5}$ is applied to both algorithms. The training sample size for the k-means algorithm is 100,000.
An optimal solution satisfies the non-linear system (10), so the total error of a numerical solution can be calculated by substituting the numerical MSE-RPs into the left-hand side of each equation in (10) and summing the resulting residuals. Table 4 summarizes the total errors of the two algorithms in checking the non-linear system (10). It is evident that the Fang-He algorithm provides more accurate numerical approximations of the MSE-RPs from a MixN than the k-means algorithm.
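A sketch of this check, reusing the MixN helper from Section 4.2, is given below; summing absolute residuals is our reading of the total error, as the paper does not spell out the exact aggregation:

```python
import numpy as np

def total_error(dist, a):
    """Sum of absolute left-hand-side residuals of (10) at the points a."""
    a = np.asarray(a, dtype=float)
    edges = np.concatenate(([-np.inf], (a[:-1] + a[1:]) / 2.0, [np.inf]))
    res = [dist.partial_mean(edges[i], edges[i + 1])
           - a[i] * (dist.cdf(edges[i + 1]) - dist.cdf(edges[i]))
           for i in range(a.size)]
    return float(np.sum(np.abs(res)))
```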
Table 5 summarizes the information gain (7) in percentage (IG%) of the numerical solutions generated by the Fang-He algorithm and the k-means algorithm, respectively. In all distributions and point sizes, the Fang-He algorithm generates MSE-RPs with higher IG.
Our numerical studies indicate that the Fang-He algorithm generates MSE-RPs from MixN with a high level of accuracy and efficiency.

5.2. Kernel Density Estimations

In many data-transmission systems, analog input signals are first converted to digital form at the transmitter, transmitted in digital form, and finally reconstituted at the receiver as analog signals [17]. Suppose the input signal $X \sim \mathrm{MixN}(\alpha, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$ with cdf $F(x)$ and density $f(x)$, and the output signal is a set of $n$ points. Three methods for selecting points from a MixN signal are compared in this section: the Monte Carlo method (MC), the quasi-Monte Carlo method (QMC), and the minimum mean squared error method (MSE).
In the MC setting, the cdf $F(x)$ of $X$ can be estimated by the empirical cdf of $X_1, \dots, X_n$:
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} 1\{X_i \le x\}.$$
Every support point in F n has the same probability of 1 / n . The MC method is often criticized for its slow convergence. In particular, given a fixed number of points n, random samples from a MixN are usually not inclusive enough and often do not accurately represent the full mixture population.
The quasi-Monte Carlo (QMC) methods have an asymptotically faster convergence rate than MC, as demonstrated in the field of numerical integration. In the univariate case, the QMC methods sample points that are uniformly scattered on the interval $[0,1]$ and aim to select support points from $F(x)$ with minimum $F$-discrepancy. Define a set of $n$ QMC points, $n \ge 1$, as
$$Q = \left\{\frac{2j-1}{2n},\ j = 1, \dots, n\right\}.$$
In the number-theoretic method, this set of QMC points is proved to have minimal discrepancy among all possible sets of $n$ points lying in the interval $(0,1)$ [41]. Assuming the inverse function of $F$ exists, the point set
$$\left\{F^{-1}\!\left(\frac{2j-1}{2n}\right),\ j = 1, \dots, n\right\}$$
is proved to have the minimal $F$-discrepancy of $1/(2n)$ from $F(x)$ [42]. Hence, the approximating distribution of $F(x)$ in the QMC setting is defined as
$$F_{qmc,n}(x) = \frac{1}{n} \sum_{i=1}^{n} 1\!\left\{F^{-1}(u_i) \le x\right\},$$
where $u_1, \dots, u_n$ are the QMC points. Each support point in $F_{qmc,n}$ is not random but still has the same probability $1/n$.
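A sketch of the QMC construction for a MixN, inverting the cdf numerically with brentq and reusing the MixN helper from Section 4.2 (the bracket width of ten standard deviations is a heuristic of ours):

```python
import numpy as np
from scipy.optimize import brentq

def qmc_points(dist, n):
    """F^{-1}((2j-1)/(2n)) for j = 1..n, for the MixN helper class."""
    u = (2.0 * np.arange(1, n + 1) - 1.0) / (2.0 * n)
    lo = float(np.min(dist.mu - 10.0 * dist.sd))
    hi = float(np.max(dist.mu + 10.0 * dist.sd))
    return np.array([brentq(lambda x, t=t: dist.cdf(x) - t, lo, hi) for t in u])
```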
For the best representation of MixN, the minimized mean squared method generates MSE-RPs from F with minimum mean squared error.
Next, we perform kernel density estimation to reconstitute the output signal at the receiver as an analog signal. The kernel density estimation method was proposed in [43,44]. Given a fixed number of points $x_1, \dots, x_n$ from the original signal, the density estimate of $f(x)$ is given by
$$\hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} k_h(x - x_i) = \frac{1}{nh} \sum_{i=1}^{n} k\!\left(\frac{x - x_i}{h}\right),$$
where $k_h(y) = \frac{1}{h} k(y/h)$. We apply the standard normal kernel $k(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2}$ in the simulation studies. In the MSE method, the points $x_1, \dots, x_n$ may have different probabilities, and the density estimate of $f(x)$ can be extended to
$$\hat{f}_h(x) = \sum_{i=1}^{n} k_h(x - x_i)\, p_i = \frac{1}{h} \sum_{i=1}^{n} k\!\left(\frac{x - x_i}{h}\right) p_i.$$
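A minimal sketch of this weighted kernel estimate follows; for MC and QMC points set $p_i = 1/n$, and the bandwidth $h$ must be supplied by the user (the function name is ours):

```python
import numpy as np

def weighted_kde(x, points, probs, h):
    """Weighted kernel density estimate with a standard normal kernel."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    z = (x - np.asarray(points, dtype=float)[None, :]) / h
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return (k * np.asarray(probs, dtype=float)[None, :]).sum(axis=1) / h
```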
We compare the three sampling methods at point sizes of 10, 15, and 30 for kernel density estimation over the C1–C6 distributions. Our simulation results indicate that MSE-RP-based kernel estimation is highly effective, followed by the QMC-based estimation.
For the underlying distribution C6, the density comparisons are given in Figure 3, and the corresponding $L_2$-distances between the estimated kernel densities and the population density $f(x)$ are summarized in Table 6. The density estimate obtained from 30 Monte Carlo sampling points is inaccurate: its $L_2$-distance of 0.2082 shows that it is not sufficient to reconstitute the density of C6. Our simulation results also show that the MC method is affected by randomness; increasing the sample size from 10 to 15 results in a larger $L_2$-distance. In comparison with the MC method, the QMC method significantly improves the estimation results, and the MSE method offers better performance in density fitting than the QMC method. According to Table 6, kernel density estimation based on 15 MSE points is more accurate than that based on 30 QMC points. Consistent results are observed for the estimation of the distribution C3 in Figure 4 and Table 7.
Based on the underlying distributions C1, C2, C4, and C5, Figure 5 and Table 8 present comparisons of the three methods in density fitting at a point size of $n = 30$. The MSE method is capable of reconstituting the four different MixN densities with low $L_2$-distances.

5.3. Real Data Example

The proposed algorithm and the application of MSE-RPs in density estimation are demonstrated through the analysis of the cell nucleus data from the diagnostic database of the University of Wisconsin Clinical Sciences Center [45]. Ref. [12] fitted the perimeter of the cell nucleus using MixN models based on the revised penalized maximum likelihood estimation. The MixN model presented below is taken from [12] and is used for further analysis:
$$\mathrm{MixN}(0.5806, 75.4189, 89.4452, 117.1346, 667.5068). \tag{17}$$
We generate 50 MSE-RPs from the MixN model (17) using both the Fang-He algorithm and the k-means algorithm. Considering the large variance in the model (17), five million training samples are used for the k-means algorithm. Table 9 presents the comparisons of the two algorithms. The MSE-RPs generated by the k-means algorithm have a total error of 0.0273, which does not satisfy the non-linear system (10). By using the Fang-He algorithm, we are able to generate more accurate numerical approximations of the 50 MSE-RPs, with a smaller mean squared error.
We also generate 50 QMC points and 50 MC points from the model (17). Table 10 summarizes the estimates of the mean, variance, skewness, and kurtosis of the distribution (17) based on the MC, QMC, and MSE methods. The first row of Table 10 displays the first four moments of the underlying distribution (17). The MSE-RPs generated by the Fang-He algorithm have the same mean as the population expectation, which is consistent with the properties of MSE-RPs described in (6). The MSE-RPs obtained using the k-means algorithm have a bias of 0.0054 from the population expectation; additionally, they have a higher bias in terms of variance, skewness, and kurtosis than the MSE-RPs obtained by the Fang-He algorithm. Therefore, the Fang-He algorithm is considered the more effective computational procedure for numerical approximations of MSE-RPs from a MixN. When comparing MSE-RPs with QMC points and MC points, the MSE method estimates the moments of the model more accurately.
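The weighted moment estimates in Table 10 can be computed from a point set and its probabilities as follows (a sketch; for MC and QMC points the weights are simply $1/n$):

```python
import numpy as np

def weighted_moments(a, p):
    """Mean, variance, skewness, and kurtosis of a weighted point set."""
    a = np.asarray(a, dtype=float)
    p = np.asarray(p, dtype=float)
    m = float(p @ a)
    v = float(p @ (a - m) ** 2)
    skew = float(p @ (a - m) ** 3) / v ** 1.5
    kurt = float(p @ (a - m) ** 4) / v ** 2
    return m, v, skew, kurt
```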
A comparison of the kernel density estimates based on MC, QMC, and MSE methods are presented in Figure 6. The corresponding L 2 -distances between the kernel estimates and the density of the model (17) are 0.0789 for MC, 0.0382 for QMC and 0.0215 for MSE.

6. Discussion

The Fang-He algorithm is recommended for numerical approximations of MSE-RPs from a mixture of two-component normal distributions. In this paper, we investigated the properties of the non-linear system used for generating MSE-RPs from a MixN. The Fang-He algorithm is effective in finding the numerical solution of the non-linear system (10), as the upper bound of the searching range of $a_1^{(n)}$ is gradually narrowed down from the population expectation $E(X)$ as the number of points $n$ increases. The results of our numerical studies confirm that the Fang-He algorithm is capable of providing accurate and reliable MSE-RPs.
Mixtures of two-component normal distributions have applications across many disciplines. As demonstrated in this paper, MSE-RPs from MixN provide outstanding performance for kernel density estimation. Further applications of MSE-RPs based on a Gaussian mixture model are being investigated for future research.

Author Contributions

Conceptualization, K.-T.F., P.H. and H.P.; methodology, K.-T.F., Y.L. and P.H.; software, Y.L.; validation, P.H.; writing—original draft preparation, Y.L.; writing—review and editing, K.-T.F., P.H. and H.P. All authors have read and agreed to the published version of the manuscript.

Funding

Our work was supported in part by the Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College (2022B1212010006) and in part by the Internal Research Grant (R202010) of BNU-HKBU United International College.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The cell nucleus data from the diagnostic database of the University of Wisconsin Clinical Sciences Center is available at https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29 (accessed on 28 August 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In this appendix, we provide the proofs of Theorems 1–4. For simplicity, we denote $\phi_1(x)$ and $\Phi_1(x)$ as the pdf and cdf of $N(\mu_1, \sigma_1^2)$, respectively, and $\phi_2(x)$ and $\Phi_2(x)$ as the pdf and cdf of $N(\mu_2, \sigma_2^2)$, respectively.
Proof of Theorem 1.
Let a random variable $X \sim \mathrm{MixN}(\alpha, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$ with the pdf
$$f(x) = \alpha\,\phi_1(x) + (1-\alpha)\,\phi_2(x).$$
According to the second-order concavity test, $f(x)$ is log-concave if
$$f''(x)\,f(x) - \left(f'(x)\right)^2 < 0$$
for all $x \in \mathbb{R}$. The first-order derivative of $f(x)$ is
$$f'(x) = -\alpha\,\frac{x-\mu_1}{\sigma_1^2}\,\phi_1(x) - (1-\alpha)\,\frac{x-\mu_2}{\sigma_2^2}\,\phi_2(x).$$
The second-order derivative of $f(x)$ is
$$f''(x) = \alpha\left[\frac{(x-\mu_1)^2}{\sigma_1^4} - \frac{1}{\sigma_1^2}\right]\phi_1(x) + (1-\alpha)\left[\frac{(x-\mu_2)^2}{\sigma_2^4} - \frac{1}{\sigma_2^2}\right]\phi_2(x).$$
Accordingly, we have
$$f''(x)\,f(x) = \alpha^2\phi_1^2(x)\left[\frac{(x-\mu_1)^2}{\sigma_1^4} - \frac{1}{\sigma_1^2}\right] + (1-\alpha)^2\phi_2^2(x)\left[\frac{(x-\mu_2)^2}{\sigma_2^4} - \frac{1}{\sigma_2^2}\right] + \alpha(1-\alpha)\phi_1(x)\phi_2(x)\left[\frac{(x-\mu_1)^2}{\sigma_1^4} + \frac{(x-\mu_2)^2}{\sigma_2^4} - \frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2}\right], \tag{A1}$$
and
$$\left(f'(x)\right)^2 = \alpha^2\phi_1^2(x)\,\frac{(x-\mu_1)^2}{\sigma_1^4} + (1-\alpha)^2\phi_2^2(x)\,\frac{(x-\mu_2)^2}{\sigma_2^4} + \alpha(1-\alpha)\phi_1(x)\phi_2(x)\,\frac{2(x-\mu_1)(x-\mu_2)}{\sigma_1^2\sigma_2^2}. \tag{A2}$$
By (A1) and (A2), we would like to have
$$-\frac{\alpha^2\phi_1^2(x)}{\sigma_1^2} - \frac{(1-\alpha)^2\phi_2^2(x)}{\sigma_2^2} - \alpha(1-\alpha)\phi_1(x)\phi_2(x)\left[\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} - \left(\frac{x-\mu_1}{\sigma_1^2} - \frac{x-\mu_2}{\sigma_2^2}\right)^2\right] < 0. \tag{A3}$$
If $\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} - \left(\frac{x-\mu_1}{\sigma_1^2} - \frac{x-\mu_2}{\sigma_2^2}\right)^2 \ge 0$ holds for all $x \in \mathbb{R}$, then (A3) holds. Assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$; the expression reduces to $\frac{2}{\sigma^2} - \left(\frac{\mu_2-\mu_1}{\sigma^2}\right)^2$, which is non-negative if $|\mu_1 - \mu_2| \le \sqrt{2}\,\sigma$, given any $\alpha \in (0,1)$. □
Proof of Theorem 2.
Part (i). Denote the left-hand side of (11) as a function of two variables, i.e.,
$$G_{u,v}(u,v) = \alpha\left[\mu_1\Phi_1\!\left(\frac{u+v}{2}\right) - \sigma_1^2\phi_1\!\left(\frac{u+v}{2}\right)\right] + (1-\alpha)\left[\mu_2\Phi_2\!\left(\frac{u+v}{2}\right) - \sigma_2^2\phi_2\!\left(\frac{u+v}{2}\right)\right] - u\,F\!\left(\frac{u+v}{2}\right),$$
where $u \in \mathbb{R}$ and $v \in (u, +\infty)$ represent $a_1^{(n)}$ and $a_2^{(n)}$, respectively, for short. Define an intermediate function of $u$ by setting $v = u$ in $G_{u,v}(u,v)$ as
$$G_u(u) = \alpha\left[\mu_1\Phi_1(u) - \sigma_1^2\phi_1(u)\right] + (1-\alpha)\left[\mu_2\Phi_2(u) - \sigma_2^2\phi_2(u)\right] - u\,F(u). \tag{A4}$$
From (A4), it is easy to verify that $G_u(u) \to 0$ as $u \to -\infty$ and $G_u'(u) = -F(u) < 0$ for $u \in \mathbb{R}$. Hence, $G_u(u) < 0$ for $u \in \mathbb{R}$, which implies
$$\lim_{v \to u^+} G_{u,v}(u,v) < 0. \tag{A5}$$
The first-order partial derivative of $G_{u,v}$ with respect to $v$ is positive, as $-\infty < u < v < \infty$,
$$\frac{\partial G_{u,v}(u,v)}{\partial v} = \frac{v-u}{4}\, f\!\left(\frac{u+v}{2}\right) > 0. \tag{A6}$$
By (A5) and (A6), the function $G_{u,v}(u,v) = 0$ has a unique solution of $v$, if and only if,
$$\lim_{v \to +\infty} G_{u,v}(u,v) = E(X) - u > 0,$$
where $E(X)$ is the population mean of $X \sim \mathrm{MixN}(\alpha, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$. Therefore, when $a_1^{(n)} < E(X)$, the Type I Equation (11) has a unique solution of $a_2^{(n)}$.
Part (ii). Given $v = h_1(u)$, according to the implicit function theorem, we have
$$\frac{\partial G_{u,v}}{\partial u}(u, h_1(u)) + \frac{\partial G_{u,v}}{\partial v}(u, h_1(u))\, h_1'(u) = 0,$$
which implies
$$h_1'(u) = -\frac{\partial G_{u,v}(u,v)/\partial u}{\partial G_{u,v}(u,v)/\partial v}. \tag{A7}$$
Given (A6), if (A8) $< 0$ for all $u \in (-\infty, E(X))$, then (A7) $> 0$ and $v = h_1(u)$ is a strictly increasing function with respect to $u$, where
$$\frac{\partial G_{u,v}(u,v)}{\partial u} = \frac{v-u}{4}\, f\!\left(\frac{u+v}{2}\right) - F\!\left(\frac{u+v}{2}\right). \tag{A8}$$
Next, the conditions under which (A8) is negative are discussed. Denote $g_{u,v}(u,v) = \partial G_{u,v}(u,v)/\partial u$. Given $u \in (-\infty, E(X))$, we have $\lim_{v \to u^+} g_{u,v}(u,v) = -F(u) < 0$, and the partial derivative of $g_{u,v}(u,v)$ with respect to $v$ is
$$\frac{\partial g_{u,v}(u,v)}{\partial v} = \frac{v-u}{8}\, f'\!\left(\frac{u+v}{2}\right) - \frac{1}{4}\, f\!\left(\frac{u+v}{2}\right) = -\frac{\alpha}{4}\,\phi_1\!\left(\frac{u+v}{2}\right)\left[\frac{(v-u)(u+v-2\mu_1)}{4\sigma_1^2} + 1\right] - \frac{1-\alpha}{4}\,\phi_2\!\left(\frac{u+v}{2}\right)\left[\frac{(v-u)(u+v-2\mu_2)}{4\sigma_2^2} + 1\right]. \tag{A9}$$
Hence, $g_{u,v}(u,v)$ is strictly decreasing in $v$, i.e., (A9) $< 0$, if both (A10) and (A11) hold:
$$\frac{(v-u)(u+v-2\mu_1)}{4\sigma_1^2} + 1 > 0, \tag{A10}$$
$$\frac{(v-u)(u+v-2\mu_2)}{4\sigma_2^2} + 1 > 0. \tag{A11}$$
Remark A1.
Assume that $\mu_1 = \mu_2 = E(X)$. If $v \ge E(X)$, (A10) and (A11) hold. Now suppose $c < u < v < E(X)$, where $c > -\infty$ is a lower bound of the range of $u$. Let $\sigma^2 = \max\{\sigma_1^2, \sigma_2^2\}$; if $(v-u)\left(u+v-2E(X)\right) + 4\sigma^2 > 0$, which is equivalent to
$$u > v - \frac{4\sigma^2}{2E(X) - (u+v)}, \tag{A12}$$
then (A10) and (A11) hold. Since $u + v > 2u$ and $v < E(X)$, we have
$$E(X) - \frac{2\sigma^2}{E(X)-u} > v - \frac{2\sigma^2}{E(X)-u} > v - \frac{4\sigma^2}{2E(X)-(u+v)}.$$
Then, if $u > E(X) - \frac{2\sigma^2}{E(X)-u}$, which is equivalent to $u > E(X) - \sqrt{2}\,\sigma$, (A12) holds. Therefore, when $\mu_1 = \mu_2$, $a_2^{(n)} = h_1(a_1^{(n)})$ is a strictly increasing function with respect to $a_1^{(n)}$ for $a_1^{(n)} \in \left(E(X) - \sqrt{2}\,\sigma,\ E(X)\right)$.
Remark A2.
Assume that $\mu_2 < \mu_1$, so that $\mu_2 < E(X) < \mu_1$. Suppose $c < u < E(X)$, where $c > -\infty$ is a lower bound for the range of $u$. If $c \ge \mu_2$, then $u + v - 2\mu_2 > 2(c - \mu_2) \ge 0$, and (A11) holds.
When $u + v - 2\mu_1 \ge 0$, (A10) holds, and we have $v \ge 2\mu_1 - u > \mu_1 > E(X) > u > c \ge \mu_2$ because $v \ge 2\mu_1 - u > 2\mu_1 - E(X)$ and $\mu_1 > E(X)$. Therefore, when $u + v - 2\mu_1 \ge 0$, $v = h_1(u)$ is strictly increasing with respect to $u$ if $u \in (c, E(X)) \subseteq (\mu_2, E(X))$, where $c \ge \mu_2$.
When $u + v - 2\mu_1 < 0$, we have $2\mu_1 - u > v > u$, and (A10) holds if
$$0 < v - u < \frac{4\sigma_1^2}{2\mu_1 - (u+v)}. \tag{A14}$$
Since $u + v > 2u$, we have $\frac{2\sigma_1^2}{\mu_1 - u} < \frac{4\sigma_1^2}{2\mu_1 - (u+v)}$. Then, if $v - u < \frac{2\sigma_1^2}{\mu_1 - u}$, that is,
$$u > v - \frac{2\sigma_1^2}{\mu_1 - u}, \tag{A15}$$
the inequality (A14) holds. Since $2\mu_1 - u > v$, we have $2\mu_1 - u - \frac{2\sigma_1^2}{\mu_1 - u} > v - \frac{2\sigma_1^2}{\mu_1 - u}$. Then, if $u > \mu_1 - \sigma_1$ and $\mu_2 < \mu_1 < \mu_2 + \sigma_1/(1-\alpha)$, we have
$$E(X) > u > \mu_1 - \frac{\sigma_1^2}{\mu_1 - u},$$
which implies that the inequality (A15) holds. When $\mu_2 < \mu_1 < \mu_2 + \sigma_1/(1-\alpha)$, $\mu_1 - \sigma_1 < \mu_2$. Therefore, $v = h_1(u)$ is strictly increasing with respect to $u$ if $\mu_2 < u < E(X)$, given $\mu_2 < \mu_1 < \mu_2 + \sigma_1/(1-\alpha)$.
Therefore, for a general location-scale mixture of two normal distributions with $\mu_2 < \mu_1 < \mu_2 + \sigma_1/(1-\alpha)$, $a_2^{(n)} = h_1(a_1^{(n)})$ is a strictly increasing function with respect to $a_1^{(n)}$ for $a_1^{(n)} \in (\mu_2, E(X))$.
Proof of Theorem 3.
Denote the left-hand side of (12) as a function of two variables, i.e.,
$$D_{u,v}(u,v) = \alpha\left[\mu_1\left(1 - \Phi_1\!\left(\frac{u+v}{2}\right)\right) + \sigma_1^2\phi_1\!\left(\frac{u+v}{2}\right)\right] + (1-\alpha)\left[\mu_2\left(1 - \Phi_2\!\left(\frac{u+v}{2}\right)\right) + \sigma_2^2\phi_2\!\left(\frac{u+v}{2}\right)\right] - v\left[1 - F\!\left(\frac{u+v}{2}\right)\right], \tag{A17}$$
where $-\infty < u < v < +\infty$. In (A17), the variables $u$ and $v$ represent $a_{n-1}^{(n)}$ and $a_n^{(n)}$, respectively. Define an intermediate function of $u$ by letting $v = u$ as
$$D_u(u) = \alpha\left[\mu_1\left(1 - \Phi_1(u)\right) + \sigma_1^2\phi_1(u)\right] + (1-\alpha)\left[\mu_2\left(1 - \Phi_2(u)\right) + \sigma_2^2\phi_2(u)\right] - u\left[1 - F(u)\right]. \tag{A18}$$
From (A18), we have $D_u(u) \to +\infty$ as $u \to -\infty$, $D_u(u) \to 0$ as $u \to +\infty$, and $D_u'(u) = F(u) - 1 < 0$ for $u \in \mathbb{R}$. Hence, we have, for $u \in \mathbb{R}$,
$$\lim_{v \to u^+} D_{u,v}(u,v) > 0, \qquad \lim_{v \to +\infty} D_{u,v}(u,v) = 0. \tag{A19}$$
$D_{u,v}(u,v) = 0$ has only one solution of $v$ if $D_{u,v}(u,v)$ first decreases from a positive value to a negative value and then increases and approaches 0 as $v$ increases.
Next, we prove that the first-order partial derivative of $D_{u,v}(u,v)$ with respect to $v$ is first negative and then positive as $v$ increases, where $v \in (u, +\infty)$ and $u \in \mathbb{R}$. The first-order partial derivative of $D_{u,v}(u,v)$ with respect to $v$ is
$$\frac{\partial D_{u,v}(u,v)}{\partial v} = \frac{v-u}{4}\, f\!\left(\frac{u+v}{2}\right) + F\!\left(\frac{u+v}{2}\right) - 1. \tag{A20}$$
For $u \in \mathbb{R}$, we have
$$\lim_{v \to u^+} \frac{\partial D_{u,v}(u,v)}{\partial v} = F(u) - 1 < 0, \qquad \lim_{v \to +\infty} \frac{\partial D_{u,v}(u,v)}{\partial v} = 0. \tag{A21}$$
The second-order partial derivative of $D_{u,v}(u,v)$ with respect to $v$ is
$$\frac{\partial^2 D_{u,v}(u,v)}{\partial v^2} = \frac{3}{4}\, f\!\left(\frac{u+v}{2}\right) + \frac{v-u}{8}\, f'\!\left(\frac{u+v}{2}\right) = \frac{1}{4}\left[\alpha\,\phi_1\!\left(\frac{u+v}{2}\right) M_1(u,v) + (1-\alpha)\,\phi_2\!\left(\frac{u+v}{2}\right) M_2(u,v)\right], \tag{A22}$$
where
$$M_1(u,v) = 3 - \frac{(v-u)(u+v-2\mu_1)}{4\sigma_1^2}, \qquad M_2(u,v) = 3 - \frac{(v-u)(u+v-2\mu_2)}{4\sigma_2^2}.$$
Both $M_1(u,v)$ and $M_2(u,v)$ are quadratic functions with respect to $v$. For $i = 1, 2$, $\lim_{v \to u^+} M_i(u,v) = 3$ and $\lim_{v \to +\infty} M_i(u,v) = -\infty$; each $M_i(u,v)$ first increases from 3 and then decreases from a positive number to $-\infty$ with respect to $v$. Moreover,
$$\lim_{v \to u^+} \frac{\partial^2 D_{u,v}(u,v)}{\partial v^2} = \frac{3}{4}\, f(u) > 0, \qquad \lim_{v \to +\infty} \frac{\partial^2 D_{u,v}(u,v)}{\partial v^2} = 0. \tag{A23}$$
Based on (A23), the second-order partial derivative (A22) is first positive and then negative as $v$ increases. When $v$ approaches $+\infty$, (A22) approaches 0 from a negative value, as $\phi_1$ and $\phi_2$ converge to 0 faster than $M_1$ and $M_2$ diverge.
From (A21)–(A23), the first-order partial derivative (A20) increases from a negative number to a positive number and then decreases and approaches 0 with respect to $v$. Therefore, $D_{u,v}(u,v) = 0$ has a unique solution of $v$ given $u \in \mathbb{R}$.
When $(u,v)$ is in a neighborhood of $(u, h^*(u))$, we have
$$\frac{\partial D_{u,v}(u,v)}{\partial v} < 0, \qquad \frac{\partial D_{u,v}(u,v)}{\partial u} = \frac{v-u}{4}\, f\!\left(\frac{u+v}{2}\right) > 0.$$
According to the implicit function theorem, we have $h^{*\prime}(u) > 0$. Therefore, $v = h^*(u)$ is a strictly increasing function with respect to $u$. □
Proof of Theorem 4.
Denote the left-hand side of (13) as a function of three variables, i.e.,
$$\begin{aligned} H_{u,v,z}(u,v,z) ={}& \alpha\mu_1\left[\Phi_1\!\left(\frac{v+z}{2}\right) - \Phi_1\!\left(\frac{u+v}{2}\right)\right] + \alpha\sigma_1^2\left[\phi_1\!\left(\frac{u+v}{2}\right) - \phi_1\!\left(\frac{v+z}{2}\right)\right] \\ &+ (1-\alpha)\mu_2\left[\Phi_2\!\left(\frac{v+z}{2}\right) - \Phi_2\!\left(\frac{u+v}{2}\right)\right] + (1-\alpha)\sigma_2^2\left[\phi_2\!\left(\frac{u+v}{2}\right) - \phi_2\!\left(\frac{v+z}{2}\right)\right] \\ &- v\left[F\!\left(\frac{v+z}{2}\right) - F\!\left(\frac{u+v}{2}\right)\right], \end{aligned} \tag{A24}$$
where $-\infty < u < v < z < +\infty$. The variables $u$, $v$, $z$ represent $a_{i-1}^{(n)}$, $a_i^{(n)}$ and $a_{i+1}^{(n)}$ in the $i$th equation ($i > 1$) of (10), respectively.
Similar to the proofs of Theorems 2 and 3, denote an intermediate function $H_{u,v}(u,v)$ by letting $z = v$ in (A24). We have
$$\frac{\partial H_{u,v}(u,v)}{\partial u} = \frac{v-u}{4}\, f\!\left(\frac{u+v}{2}\right) > 0, \tag{A25}$$
$$\lim_{u \to v^-} H_{u,v}(u,v) = 0, \tag{A26}$$
$$\lim_{u \to -\infty} H_{u,v}(u,v) < 0. \tag{A27}$$
To see (A27), note that $\lim_{u \to -\infty} H_{u,v}(u,v) = R(v)$, where
$$R(v) = \alpha\left[\mu_1\Phi_1(v) - \sigma_1^2\phi_1(v)\right] + (1-\alpha)\left[\mu_2\Phi_2(v) - \sigma_2^2\phi_2(v)\right] - v\,F(v).$$
Since $R'(v) = -F(v) < 0$ and $R(v) \to 0$ as $v \to -\infty$, we have $R(v) < 0$ for all $v$. Therefore, (A27) holds.
By (A25)–(A27), $H_{u,v}(u,v)$ strictly increases from a negative number and approaches zero as $u$ increases from $-\infty$ to $v$. Hence,
$$\lim_{z \to v^+} H_{u,v,z}(u,v,z) < 0. \tag{A28}$$
The first-order partial derivative of $H_{u,v,z}(u,v,z)$ with respect to $z$ is positive, i.e.,
$$\frac{\partial H_{u,v,z}}{\partial z} = \frac{z-v}{4}\, f\!\left(\frac{v+z}{2}\right) > 0, \quad \text{as } z > v. \tag{A29}$$
Based on (A28) and (A29), $H_{u,v,z}(u,v,z) = 0$ has a unique solution of $z$ if and only if
$$\lim_{z \to +\infty} H_{u,v,z}(u,v,z) = D_{u,v}(u,v) > 0,$$
where $D_{u,v}(u,v)$ is the Type II function defined in (A17). From the proof of Theorem 3 and the discussion of a set of $n = 2$ MSE-RPs, $D_{u,v}(u,v)$ is positive if and only if $v < h^*(u)$, where $h^*(u)$ is a strictly increasing function with respect to $u$. Given a set of $n = 3$ MSE-RPs, $H(a_1^{(3)}, a_2^{(3)}, a_3^{(3)}) > 0$ as $a_3^{(3)} \to +\infty$, if and only if, $a_1^{(3)} < a_1^{(2)}$. □

References

  1. Andrew, G.; Qi, C.; Alan, D.K.; Ron, W. Pulse pileup rejection methods using a two-component Gaussian Mixture Model for fast neutron detection with pulse shape discriminating scintillator. Nucl. Instrum. Methods Phys. Res. A Accel. Spectrom. Detect. Assoc. Equip. 2021, 988, 164905.
  2. Kong, L.; Chatzinotas, S.; Öttersten, B. Unified framework for secrecy characteristics with mixture of Gaussian (MoG) distribution. IEEE Wirel. Commun. 2020, 10, 1625–1628.
  3. Shen, X.; Zhang, Y.; Sata, K.; Shen, T. Gaussian mixture model clustering-based knock threshold learning in automotive engines. IEEE ASME Trans. Mechatron. 2020, 6, 2981–2991.
  4. Mazzeo, D.; Oliveti, G.; Labonia, E. Estimation of wind speed probability density function using a mixture of two truncated normal distributions. Renew. Energy 2018, 115, 1260–1280.
  5. Ouarda, T.B.M.J.; Charron, C. On the mixture of wind speed distribution in a Nordic region. Energy Convers. Manag. 2018, 174, 33–44.
  6. Venkataraman, S. Value at risk for a mixture of normal distributions: The use of quasi-Bayesian estimation techniques. Econ. Perspect. Fed. Reserve Bank Chic. 1997, 21, 2–13.
  7. Duan, R.; Ning, Y.; Wang, S.; Lindsay, B.G.; Carroll, R.J.; Chen, Y. A fast score test for generalized mixture models. Biometrics 2020, 76, 811–820.
  8. Di, C.Z.; Liang, K.Y. Likelihood ratio testing for admixture models with application to genetic linkage analysis. Biometrics 2011, 67, 1249–1259.
  9. Hartigan, J.A. A failure of likelihood asymptotics for normal mixtures. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer; Wadsworth: Belmont, CA, USA, 1985; Volume 2, pp. 807–810.
  10. Day, N.E. Estimating the components of a mixture of normal distributions. Biometrika 1969, 56, 463–474.
  11. Panić, B.; Klemenc, J.; Nagode, M. Improved initialization of the EM algorithm for mixture model parameter estimation. Mathematics 2020, 8, 373.
  12. Li, Y.; Fang, K.T. A new approach to parameter estimation of mixture of two normal distributions. Commun. Stat. Simul. Comput. 2022, 1–27.
  13. Chen, J. Consistency of the MLE under mixture models. Stat. Sci. 2017, 32, 47–63.
  14. Wu, X. Optimal quantization by matrix searching. J. Algorithms 1991, 12, 663–673.
  15. Graf, S.; Luschgy, H. Foundations of Quantization for Probability Distributions; Springer-Verlag: Berlin/Heidelberg, Germany, 2007.
  16. Gersho, A.; Gray, R.M. Vector Quantization and Signal Compression; Kluwer: Boston, MA, USA, 1992.
  17. Max, J. Quantizing for minimum distortion. IEEE Trans. Inf. Theory 1960, 6, 7–12.
  18. Flury, B. Principal points. Biometrika 1990, 77, 33–41.
  19. Trushkin, A. Sufficient conditions for uniqueness of a locally optimal quantizer for a class of convex error weighting functions. IEEE Trans. Inf. Theory 1982, 28, 187–198.
  20. Tarpey, T. Two principal points of symmetric, strongly unimodal distributions. Stat. Probab. Lett. 1994, 20, 253–257.
  21. Yamamoto, W.; Shinozaki, N. On uniqueness of two principal points for univariate location mixtures. Stat. Probab. Lett. 2000, 46, 33–42.
  22. Lloyd, S.P. Least squares quantization in PCM. IEEE Trans. Inf. Theory 1982, 28, 129–137.
  23. Kieffer, J. Exponential rate of convergence for Lloyd's method. IEEE Trans. Inf. Theory 1982, 28, 205–210.
  24. Rowe, S. An algorithm for computing principal points with respect to a loss function in the unidimensional case. Stat. Comput. 1996, 6, 187–190.
  25. Chakraborty, S.; Roychowdhury, M.K.; Sifuentes, J. High precision numerical computation of principal points for univariate distributions. Sankhya B 2021, 83, 558–584.
  26. Fang, K.T.; He, S.D. The Problem of Selecting a Given Number of Representative Points in a Normal Population and a Generalized Mills' Ratio; Technical Report SOL ONR 327; Department of Statistics, Stanford University: Stanford, CA, USA, 1964.
  27. Zhou, M.; Wang, W. Representative points of Student's $t_n$ distribution and their applications in statistical simulation. Acta Math. Appl. Sin. 2016, 39, 620–640.
  28. Fu, H. The problem of selecting a specified number of representative points from a Gamma population. J. China Univ. Min. Technol. 1985, 4, 107–117.
  29. Wang, H. Problems of choosing representative points of given data in S-type distributions. J. Fuzhou Univ. 1995, 23, 7–13.
  30. Fu, H. The problem of selecting a specified number of representative points from a Weibull population. J. Wuxi Inst. Light Ind. 1993, 22, 78–83.
  31. Fei, R. The problem of selecting representative points from Pearson distribution populations. J. Wuxi Inst. Light Ind. 1990, 9, 71–78.
  32. Fang, K.T.; He, P.; Yang, J. Set of representative points of statistical distributions and their applications. Sci. Sin. Math. 2020, 50, 1149–1168.
  33. Eisenberger, I. Genesis of bimodal distributions. Technometrics 1964, 6, 357–363.
  34. Behboodian, J. On the modes of a mixture of two normal distributions. Technometrics 1970, 12, 131–139.
  35. Aitkin, M.; Wilson, G.T. Mixture models, outliers, and the EM algorithm. Technometrics 1980, 22, 325–331.
  36. Fei, R. Statistical relationship between the representative point and the population. J. Wuxi Inst. Light Ind. 1991, 10, 78–81.
  37. Flury, B. Estimation of principal points. J. R. Stat. Soc. C Appl. Stat. 1993, 42, 139–151.
  38. Li, L.; Flury, B. Uniqueness of principal points for univariate distributions. Stat. Probab. Lett. 1995, 25, 323–327.
  39. Bagnoli, M.; Bergstrom, T. Log-concave probability and its applications. Econ. Theory 2005, 26, 445–469.
  40. Saumard, A.; Wellner, J.A. Log-concavity and strong log-concavity: A review. Stat. Surv. 2014, 8, 45–114.
  41. Fang, K.T.; Wang, Y. Number-Theoretic Methods in Statistics, 1st ed.; Chapman and Hall: London, UK, 1994.
  42. Fang, K.T.; Wang, Y.; Bentler, P.M. Some applications of number-theoretic methods in statistics. Stat. Sci. 1994, 9, 416–428.
  43. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076.
  44. Rosenblatt, M. Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 1956, 27, 832–837.
  45. Wolberg, W.; Street, W.; Mangasarian, O. Breast Cancer Wisconsin (Diagnostic). UCI Machine Learning Repository, 1995. Available online: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29 (accessed on 28 August 2022).
Figure 1. Examples of MixN densities. Density (A) represents MixN(0.1, 0, 100, 0, 1); density (B) represents MixN(0.3, 1.2, 4, 0.2, 1); density (C) represents MixN(0.5, 1, 1, −1, 1); density (D) represents MixN(0.5, 1, 0.25, −1, 0.25). The solid lines are MixN densities. The dashed lines stand for the standard normal density for comparison.
Figure 2. Densities of underlying distributions for simulation. The parameter settings of distributions (C1)–(C6) are given in Table 3.
Figure 3. Results of kernel density estimation comparisons (based on the distribution C6). The solid lines are fitted densities. The dashed lines stand for the density of C6.
Figure 4. Results of kernel density estimation comparisons (based on the distribution C3). The solid lines are fitted densities. The dashed lines stand for the density of C3.
Figure 5. Results of kernel density estimation comparisons (based on the point size of n = 30). The solid lines are fitted densities. The dashed lines stand for the density of the corresponding underlying distribution.
Figure 6. Results of kernel density estimation comparisons. The solid lines are fitted densities. The dashed lines stand for the density of the distribution (17).
Table 1. The MSE-RPs from MixN(0.7, 8, 10, 1.5, 1) generated by the Fang-He algorithm.
Size | IG (%) | n = 1 | n = 2 | n = 3 | n = 4 | n = 5 | n = 6 | n = 7 | n = 8 | n = 9 | n = 10 | n = 11 | n = 12 | n = 13 | n = 14 | n = 15
n = 106.05              
n = 273.968673462.4236135439.348758993             
n = 388.054911791.7910941546.95843564811.2740348            
n = 492.635999871.5789478315.6862647358.87527512812.41300952           
n = 594.80414421.3593209394.4115470257.2470438179.88732581713.08347976          
n = 696.276014910.8741424842.7777670935.6896052668.0728843410.4703644213.48934091         
n = 797.193556850.7215473182.4329894854.9782706977.0926977439.07736048311.2178926114.02854444        
n = 897.780918150.5905654542.1691710194.2832397156.2551182868.0210712789.78915976611.7699749514.43927428       
n = 998.205001430.3801622321.7906780643.3474243275.3186477167.0049638048.60076179310.2498831712.1362764914.71782742      
n = 1098.524225360.2398195891.5621454862.8806671444.7109475186.305732227.7673005719.21223331910.7477008412.5392668515.02845211     
n = 1198.760432130.1335661731.3994446152.5855630184.1938967555.7193366287.086145388.3977054659.73258633711.1803366712.895311615.30709701    
n = 1298.943038360.0085998291.2180335722.2844404543.6151344735.1100294356.4276499777.6595667628.87428453810.1338338811.5187038613.1767878615.52993185   
n = 1399.08923755−0.1083777811.0569994062.0365263033.1617856344.5875011855.8670691177.0394989118.1692440539.30626528510.5027401311.8331071913.4408044615.74095873  
n = 1499.20635891−0.2030114830.9324425941.8550309212.8556036534.1536207965.3962104516.5199305457.5842111168.63281779.70526210810.8475927612.1302914513.6926735315.94396209 
n = 1599.30172785−0.299800570.8100636731.6842626512.5866582133.7230950494.9371628246.0293827257.048190638.0348229979.022475110.0445898611.1439726212.3879099713.912964816.12264763
n = 1699.38099587−0.3974123090.6916263921.5252711822.3508104893.3386729494.5067949645.57685936.5623132137.5030424418.4291581589.36766401310.3481577911.4105641312.6205274714.11295278
n = 1799.44737405−0.4851924980.5891769961.3921284372.1625513683.0426658754.1298133895.1777728086.135254777.0387123287.916635568.7922742679.68860576110.6327860811.6625455812.84233566
n = 1899.50344049−0.5718288370.4918122171.2691721711.9952955682.7916232213.7738429044.7990515075.7361505846.6114287867.4524414148.280573589.1151396199.9762913710.8894549411.89121853
n = 1999.55136666−0.660085550.3965257871.1521054071.8412990972.570761083.4469661794.4378201575.3598047616.2136720757.0262048857.8176750838.6052414599.40539328110.2366521911.1229878
n = 2099.59264581−0.7445578740.3086850161.0467383291.7064643122.385022813.1736735574.1085611795.0151501855.8514455766.6405281497.4020975758.1522227978.9051096059.675432110.48030964
n = 2199.62688834−0.8073938850.2457053510.9728036541.6140161282.2615066032.9958368193.8772674054.7710007585.5966620686.3714022717.1148851477.8423549928.566908919.30156789910.06021893
n = 2299.65267792−0.8702298960.184673630.9024174611.5274230292.1482517492.8367127773.6614648824.5387542695.3557474626.1190013916.8472988817.5555951058.25640448.9613044629.682558988
n = 2399.68714839−1.002643250.0615783230.7636756191.3605716921.936474532.5495503823.2625415564.0822288124.8821461885.626537076.3296865397.0060235077.6676007768.3242516358.985487835
n = 2499.71006407−1.0654792610.0060704630.7027033281.2889919461.847844572.4334524323.1025729663.8842307224.6740215925.4110913436.1047870356.7691536967.4158359258.0543676288.693504163
n = 2599.72667915−1.128315272−0.0476905070.6445522821.2215357471.7656447912.3276865032.9594531993.7005824684.4771862415.2080306485.8936407026.5476585517.1815136397.8045995688.425070077
n = 2699.75125918−1.253987293−0.150143840.5361267241.0977715711.6175237432.1416908642.7127509353.3747073294.1130545794.831940395.50541936.1434655746.757174937.3555745117.946356234
n = 2799.7644673−1.316823304−0.1988923310.4857164191.041329811.5512528322.0603715332.6077121313.235592883.9487759494.6602670195.3289219485.9607579266.5664866617.1549796247.733845897
n = 2899.78349091−1.442495326−0.2922620940.3908132080.936284431.429467241.9130780332.4209231892.9902434833.6461756024.3376562224.9975983385.6192094466.2117486486.7839252997.343011187
n = 2999.79899701−1.576585624−0.3864349720.2972033470.8343130391.3132037171.7752552312.2507023022.7708843963.3663074024.0243909424.6737325745.2868339445.8682864016.4267979216.969342579
n = 3099.81170752−1.682670291−0.4572777530.2281828590.7601486221.229852851.6780731872.13318012.6224951323.1763835583.8008061474.438734925.0460325865.6208448866.1707575856.702586356
Size | IG (%) | n = 16 | n = 17 | n = 18 | n = 19 | n = 20 | n = 21 | n = 22 | n = 23 | n = 24 | n = 25 | n = 26 | n = 27 | n = 28 | n = 29 | n = 30
n = 16 16.28634947              
n = 17 14.3036748116.44282304             
n = 18 13.0444638814.479294716.58833697            
n = 19 12.1003750813.230063514.6412697916.7219755           
n = 20 11.3428549612.2978072413.4052561714.7937959616.84910295          
n = 21 10.860743811.7278017212.7005859713.8493517515.3370775517.28340665         
n = 22 10.433977111.2343619412.1108310613.1079830814.3113260215.9348266117.34624266        
n = 23 9.66153852110.3636258111.1068890211.9127413512.8131289113.8672527215.2005100317.19050787       
n = 24 9.34211810310.0103795110.7100056611.4568905612.2742755413.1997139314.3040262715.750771717.54149202      
n = 25 9.0510607319.6907799210.354482911.0545825411.8091324512.6433071913.6022894614.7747431116.3804496117.60432804     
n = 26 8.5363786769.1327151089.74263360210.3747203311.0404118111.7544738912.5383696313.4287542614.4923088315.8815644517.55395655    
n = 27 8.3096442388.8888333789.47800935210.0845815210.7181312111.389902412.1171271812.9256417813.8591446815.0036372116.5789371817.79283607   
n = 28 7.8952392648.4463102919.0017922959.56795790510.150889410.7596992911.4049642312.1004083912.8698369513.7503432114.8109456616.2163305317.74246458  
n = 29 7.5016857268.0293559228.5572034929.0901173129.63372451610.1937005910.7772462411.3945177712.0590199412.7882898113.6142199714.5933711115.846808217.73883763 
n = 30 7.2223289327.7352206388.2456569648.7581846019.2772341449.80803692610.3563224210.9289317611.53566512.190343312.9103917113.726717914.6962014215.93885917.81798303
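For readers who wish to reproduce points of the kind listed in Table 1, the following sketch runs a generic Lloyd-type self-consistency iteration (cf. [22,23]) for a MixN. It is offered purely as an illustration: it is not the Fang-He recursion used in the paper, and the function name, initialization, and iteration count are our own choices.

```python
import numpy as np
from scipy.stats import norm

def lloyd_mixn(k, alpha, mu1, var1, mu2, var2, iters=2000):
    """Lloyd-type fixed-point iteration: replace each point by the
    conditional mean of its Voronoi cell until the system stabilizes."""
    def mass_and_partial_mean(edges, mu, s):
        # P(a < X < b) and E[X * 1{a < X < b}] per cell for X ~ N(mu, s^2),
        # using the identity E[X 1{a<X<b}] = mu * mass - s^2 * (pdf(b) - pdf(a)).
        mass = np.diff(norm.cdf(edges, mu, s))
        ex = mu * mass - s ** 2 * np.diff(norm.pdf(edges, mu, s))
        return mass, ex

    s1, s2 = np.sqrt(var1), np.sqrt(var2)
    mean = alpha * mu1 + (1 - alpha) * mu2
    x = np.sort(mean + np.linspace(-2.0, 2.0, k))     # crude initial points
    for _ in range(iters):
        edges = np.concatenate(([-np.inf], (x[:-1] + x[1:]) / 2, [np.inf]))
        m1, e1 = mass_and_partial_mean(edges, mu1, s1)
        m2, e2 = mass_and_partial_mean(edges, mu2, s2)
        p = alpha * m1 + (1 - alpha) * m2             # cell probabilities
        x = (alpha * e1 + (1 - alpha) * e2) / p       # conditional cell means
    return x, p

# MixN(0.7, 8, 10, 1.5, 1) as in Table 1; with k = 2 the iteration should
# approach the two points reported there (about 2.4236 and 9.3488).
points, probs = lloyd_mixn(2, 0.7, 8, 10, 1.5, 1)
print(points, probs)
```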
Table 2. The corresponding probabilities of the MSE-RPs from MixN(0.7, 8, 10, 1.5, 1) generated by the Fang-He algorithm.
Size | n = 1 | n = 2 | n = 3 | n = 4 | n = 5 | n = 6 | n = 7 | n = 8 | n = 9 | n = 10 | n = 11 | n = 12 | n = 13 | n = 14 | n = 15
n = 11              
n = 20.4763452650.523654735             
n = 30.3874643180.3591002390.253435443            
n = 40.3535936470.2334349930.2718961310.141075228           
n = 50.3121393750.1602116050.2274694270.2055448280.094634765          
n = 60.2061422130.1746955980.1723881780.206115810.16778930.0728689         
n = 70.174025890.1829245930.1301051090.1704523090.1685242950.1236571990.050310606        
n = 80.1483649630.1848876440.1024635750.1391012970.1540344620.1383523150.0955275680.037268176       
n = 90.1118392470.1754683560.0981758660.1108824030.1362362590.1391321920.1188480930.0792725550.030145028      
n = 100.0910474490.1620274940.1079629810.0897112810.1154642910.1268651450.1209881090.0986948190.0636490470.023589383     
n = 110.0772499030.1495072140.1151273650.0755807390.0972567320.1125127980.1150787710.1046397720.0824699250.0517866670.018790114    
n = 120.063111530.1334958110.1199143780.0703619040.0812762080.0986518510.1067323650.1044360710.092009110.0708010660.0436253890.015584317   
n = 130.0518192530.1182238320.1198786860.0746591130.0685971170.0854880510.0964870450.0994114650.0939836980.0806944160.060861640.0368963320.012999352  
n = 140.0439561820.1061949260.1169818050.0804194520.0600172080.0738921670.0860642060.0920549590.0913523860.0840042770.0705981140.0523279670.0312606960.010875655 
n = 150.0369918320.0945227940.1119976020.0857132670.0561205930.0637842170.076279410.0842424260.0868159070.0838312150.0755015540.0624059670.0456120560.0269159140.009265246
n = 160.0309707850.083597980.1055616280.0891293930.0574768980.0555478140.0672625330.0763430830.0811162350.0812772460.0768174240.0680088530.0554384450.0400652250.023405491
n = 170.0263297520.0745777240.0990462040.0902655960.0612127070.0498197950.0592704970.0687258310.074834510.0771535650.0755606530.0701504950.0612087780.0492954360.035243746
n = 180.0223888710.0664774080.0923291380.0897446450.0653835290.0469693070.0523270920.0616410250.0685665260.0723872820.0729159430.0701240710.0641529890.055308250.04408941
n = 190.0189541480.059063340.0854941760.0878819860.0689752120.0472490920.0466435910.0551335220.0624931360.0673465770.0694250590.0686459520.0650446220.0587850660.050171849
n = 200.0161482510.052696260.0790997010.0851671180.0713360310.0495407760.0426084530.0493361440.0567601870.0622397450.0654047130.0661321610.0643892530.0602452830.053877025
n = 210.0143319580.0484309580.0745655440.0827675460.0723372210.051794050.0408355270.0455122770.0527523760.0585344450.0623031390.0638932070.0632593370.0604175430.055480215
n = 220.0127214580.0445402510.0702426580.080139620.0727601870.0541046580.0402118320.0422214550.0490590460.0550109510.0592105550.0614774110.0617369970.0599892530.05627992
n = 230.009905610.037376580.0618229340.0742113240.0720975870.0581513820.0418632210.0372258310.0421595730.0480875110.0528458730.0561133220.0577794450.0577801530.056134082
n = 240.0088058460.0344558170.0582323040.071376540.0712294460.0594877070.0434225010.0359407720.0393752560.0450845010.0499790470.0535555990.0556889750.0563108470.055408925
n = 250.0078352270.0318013350.0548791260.0685902540.0701141730.0604616310.0451379180.0353920090.036985070.04231960.0472619380.0510670980.0535777420.0547156070.054452555
n = 260.0062242180.0272053750.0488509980.0632297920.0673808760.0613439960.0483292670.0361259540.0334904640.0374944570.0423373810.0464213680.0494550620.051350210.052054826
n = 270.0055588470.0252195720.0461809420.0607329590.0658938640.0613855240.0496601220.0370786510.0324294430.0354589090.04014110.0442903230.0474962750.0496611250.05073257
n = 280.0044535380.0217512470.0413532490.0560010640.0627143170.0607938760.0516637560.039470530.031638360.032122980.0361554250.0403030730.0437479310.0463193410.047957622
n = 290.003540880.0186637360.0368802820.0513838050.0592436550.0595037410.0528722410.042010320.0323823290.0297609120.0325502750.0364791520.0400324960.0428907160.044945877
n = 300.0029707550.0165927420.0337782070.048056650.0565507820.0581809360.0532604270.0437219640.0336650970.0288359290.0302383790.0338126160.0373612670.0403406410.042626351
Size | n = 16 | n = 17 | n = 18 | n = 19 | n = 20 | n = 21 | n = 22 | n = 23 | n = 24 | n = 25 | n = 26 | n = 27 | n = 28 | n = 29 | n = 30
n = 160.007980967              
n = 170.0204024110.0069023             
n = 180.0312384770.0179359450.006020091            
n = 190.0396437050.0278685530.0158831390.005297276           
n = 200.0455646070.0357095740.0249315370.0141283090.00468487          
n = 210.0486480710.0402095810.0305724190.0203787590.0099690090.003006819         
n = 220.0507500190.0436221320.0351905430.0258846260.0163449240.0063006530.002200855        
n = 230.0528723610.0481021780.0420061490.03477450.0267440560.0183625380.01024060.00334319       
n = 240.0530251960.0492127530.0440741260.0377893910.0305945040.0228252650.0149316380.0070040330.002189011      
n = 250.0527859980.0497735240.045503690.0401111870.0337425950.0266576650.0192115710.0118250.0042363890.001561098     
n = 260.0515575430.0498670760.0470166120.043105010.0382285170.0325052940.026161320.0194581270.0127408190.0060223890.002043046    
n = 270.0506831840.0495148080.0472492490.0439623510.0397127340.0346222850.0288622510.0226190730.0161628280.0098796870.0035251940.001286131   
n = 280.0486273580.0483124460.0470344480.04479280.0416635350.0377281360.0330236650.0277368780.022044550.0161323840.0103322580.0045439420.001581292  
n = 290.0461717770.0465500190.0460604520.0447310830.0425724850.039614380.0359588160.0316925460.0268605410.0216416850.0162512990.0109077420.0059472530.001899505 
n = 300.0441857740.044977090.0449833090.0442066370.0426680360.0403972210.0374195860.0338191880.0296983010.0250943180.0201496480.0150819240.0100925360.0054867280.00174696
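The probabilities in Table 2 are tied to the points in Table 1 through their Voronoi cells: each MSE-RP receives the mass of its cell under the MixN distribution function. A minimal sketch (our own code, not the authors') that recovers a row of Table 2 from the corresponding points:

```python
import numpy as np
from scipy.stats import norm

def mixn_cdf(x, alpha, mu1, var1, mu2, var2):
    """CDF of MixN(alpha, mu1, var1, mu2, var2)."""
    return (alpha * norm.cdf(x, mu1, np.sqrt(var1))
            + (1 - alpha) * norm.cdf(x, mu2, np.sqrt(var2)))

def cell_probabilities(points, *params):
    """p_i = F((x_i + x_{i+1})/2) - F((x_{i-1} + x_i)/2) for sorted points."""
    pts = np.sort(np.asarray(points, dtype=float))
    mids = (pts[:-1] + pts[1:]) / 2                 # Voronoi cell boundaries
    edges = np.concatenate(([-np.inf], mids, [np.inf]))
    return np.diff(mixn_cdf(edges, *params))        # the p_i sum to 1

# n = 2 points of Table 1 for MixN(0.7, 8, 10, 1.5, 1):
print(cell_probabilities([2.423613543, 9.348758993], 0.7, 8, 10, 1.5, 1))
# -> approximately (0.476345265, 0.523654735), the n = 2 row of Table 2
```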
Table 3. Parameters of underlying distributions for simulation.
Distribution | α | μ₁ | σ₁² | μ₂ | σ₂² | E(X) | Var(X)
C1 | 0.80 | 0.0 | 1.00 | 1.0 | 1.00 | 0.20 | 1.16
C2 | 0.50 | 1.0 | 1.00 | −1.0 | 1.00 | 0.00 | 2.00
C3 | 0.50 | 1.0 | 0.25 | −1.0 | 0.25 | 0.00 | 1.25
C4 | 0.70 | 4.6 | 2.00 | 1.5 | 1.00 | 3.67 | 3.72
C5 | 0.70 | 8.0 | 10.00 | 1.5 | 1.00 | 6.05 | 16.17
C6 | 0.65 | −4.6 | 15.00 | 10.0 | 9.00 | 0.51 | 61.39
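The E(X) and Var(X) columns of Table 3 follow directly from the mixture parameters via the standard identities E(X) = αμ₁ + (1−α)μ₂ and Var(X) = ασ₁² + (1−α)σ₂² + α(μ₁−E(X))² + (1−α)(μ₂−E(X))². A one-function sketch (illustrative code, not taken from the paper):

```python
def mixn_mean_var(alpha, mu1, var1, mu2, var2):
    """Mean and variance of MixN(alpha, mu1, var1, mu2, var2)."""
    mean = alpha * mu1 + (1 - alpha) * mu2
    var = (alpha * var1 + (1 - alpha) * var2
           + alpha * (mu1 - mean) ** 2 + (1 - alpha) * (mu2 - mean) ** 2)
    return mean, var

print(mixn_mean_var(0.70, 8.0, 10.00, 1.5, 1.00))    # C5 -> (6.05, 16.1725)
print(mixn_mean_var(0.65, -4.6, 15.00, 10.0, 9.00))  # C6 -> (0.51, 61.3939)
```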
Table 4. Comparisons of the Fang-He algorithm and the k-means algorithm in terms of the total error on the non-linear system.
Size | Fang-He (C1) | k-means (C1) | Fang-He (C2) | k-means (C2)
2 | 0.00000 | 0.00240 | 0.00000 | 0.00334
3 | 0.00000 | 0.00215 | 0.00000 | 0.00217
4 | 0.00000 | 0.00158 | 0.00000 | 0.00198
5 | 0.00000 | 0.00142 | 0.00000 | 0.00180
10 | 0.00000 | 0.00054 | 0.00000 | 0.00064
15 | 0.00000 | 0.00037 | 0.00000 | 0.00052
30 | 0.00000 | 0.00020 | 0.00000 | 0.00023
Size | Fang-He (C3) | k-means (C3) | Fang-He (C4) | k-means (C4)
2 | 0.00000 | 0.00184 | 0.00000 | 0.00223
3 | 0.00000 | 0.00113 | 0.00000 | 0.00255
4 | 0.00000 | 0.00152 | 0.00000 | 0.00292
5 | 0.00000 | 0.00093 | 0.00000 | 0.00259
10 | 0.00000 | 0.00053 | 0.00000 | 0.00111
15 | 0.00000 | 0.00030 | 0.00000 | 0.00072
30 | 0.00000 | 0.00013 | 0.00000 | 0.00036
Size | Fang-He (C5) | k-means (C5) | Fang-He (C6) | k-means (C6)
2 | 0.00000 | 0.00511 | 0.00000 | 0.01290
3 | 0.00000 | 0.00674 | 0.00000 | 0.00797
4 | 0.00000 | 0.00490 | 0.00000 | 0.00742
5 | 0.00000 | 0.00383 | 0.00000 | 0.00764
10 | 0.00000 | 0.00146 | 0.00000 | 0.00309
15 | 0.00000 | 0.00102 | 0.00000 | 0.00206
30 | 0.00000 | 0.00072 | 0.00000 | 0.00096
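Tables 4 and 5 benchmark against the k-means algorithm. A hedged sketch of one natural implementation of that baseline, in which the sample size, seed, and initialization are our illustrative choices rather than the paper's settings: draw a large sample from the MixN and take the sorted one-dimensional cluster centers as approximate MSE-RPs. Sampling noise is why the total error of k-means on the non-linear system stays above zero in Table 4.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def sample_mixn(n, alpha, mu1, var1, mu2, var2):
    """Draw n points from MixN by choosing a component per observation."""
    pick = rng.random(n) < alpha
    return np.where(pick,
                    rng.normal(mu1, np.sqrt(var1), n),
                    rng.normal(mu2, np.sqrt(var2), n))

# C5 = MixN(0.7, 8, 10, 1.5, 1); k = 5 representative points.
sample = sample_mixn(100_000, 0.7, 8.0, 10.0, 1.5, 1.0).reshape(-1, 1)
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(sample)
print(np.sort(km.cluster_centers_.ravel()))  # compare with the n = 5 row of Table 1
```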
Table 5. Comparisons of the Fang-He algorithm and the k-means algorithm in terms of the information gain (IG) in percentage.
Size | Fang-He (C1) | k-means (C1) | Fang-He (C2) | k-means (C2)
2 | 63.6508 | 63.6483 | 68.0514 | 68.0489
3 | 80.9919 | 80.9881 | 83.4876 | 83.4856
4 | 88.2662 | 88.2637 | 89.9059 | 89.9039
5 | 92.0203 | 92.0189 | 93.1672 | 93.1661
10 | 97.7140 | 97.7132 | 98.0585 | 98.0576
15 | 98.9306 | 98.9290 | 99.0936 | 99.0892
30 | 99.7174 | 99.7154 | 99.7608 | 99.7602
Size | Fang-He (C3) | k-means (C3) | Fang-He (C4) | k-means (C4)
2 | 81.3643 | 81.3636 | 70.0302 | 70.0296
3 | 88.4198 | 88.4181 | 84.7832 | 84.7818
4 | 93.5704 | 93.5684 | 90.6243 | 90.6219
5 | 95.3915 | 95.3903 | 93.6827 | 93.6815
10 | 98.7165 | 98.7152 | 98.2091 | 98.2072
15 | 99.3997 | 99.3988 | 99.1646 | 99.1608
30 | 99.8417 | 99.8410 | 99.7797 | 99.7790
Size | Fang-He (C5) | k-means (C5) | Fang-He (C6) | k-means (C6)
2 | 73.9687 | 73.9680 | 80.6102 | 80.6094
3 | 88.0549 | 88.0521 | 89.8819 | 89.8816
4 | 92.6360 | 92.6350 | 93.2878 | 93.2874
5 | 94.8041 | 94.8034 | 95.6431 | 95.6424
10 | 98.5242 | 98.5234 | 98.7264 | 98.7249
15 | 99.3017 | 99.3007 | 99.4036 | 99.4024
30 | 99.8117 | 99.8082 | 99.8426 | 99.8418
Table 6. Under the distribution C6, the L2-distance between the kernel estimate f̂_h(x) and the underlying mixture density f(x).
Size | MC | QMC | MSE
10 | 0.2937 | 0.1352 | 0.1084
15 | 0.3033 | 0.0966 | 0.0478
30 | 0.2082 | 0.0611 | 0.0207
Table 7. Under the distribution C3, the L2-distance between the kernel estimate f̂_h(x) and the underlying mixture density f(x).
Size | MC | QMC | MSE
10 | 1.8123 | 0.6206 | 0.4688
15 | 1.8441 | 0.5819 | 0.3040
30 | 1.5683 | 0.3890 | 0.2157
Table 8. The L2-distance between the kernel estimate f̂_h(x) and the corresponding underlying density f(x), where f̂_h(x) is based on the point size of n = 30.
Distribution | MC | QMC | MSE
C1 | 0.8201 | 0.3625 | 0.0694
C2 | 0.5901 | 0.1639 | 0.0607
C4 | 0.5237 | 0.1271 | 0.0480
C5 | 0.4474 | 0.2605 | 0.0992
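The L2-distances of Tables 6–8 can be computed by fitting a kernel density estimate on the n support points and integrating the squared deviation from the true MixN density over a grid. A sketch under our own assumptions (a Gaussian kernel with scipy's default bandwidth and a fixed grid; the paper's bandwidth and integration settings may differ):

```python
import numpy as np
from scipy.stats import norm, gaussian_kde
from scipy.integrate import trapezoid

def mixn_pdf(x, alpha, mu1, var1, mu2, var2):
    return (alpha * norm.pdf(x, mu1, np.sqrt(var1))
            + (1 - alpha) * norm.pdf(x, mu2, np.sqrt(var2)))

def l2_distance(points, params, lo=-30.0, hi=30.0, m=6001):
    """L2 distance between the kernel estimate on `points` and the MixN pdf."""
    grid = np.linspace(lo, hi, m)
    f_hat = gaussian_kde(points)(grid)   # kernel estimate f^_h
    f = mixn_pdf(grid, *params)          # underlying density
    return np.sqrt(trapezoid((f_hat - f) ** 2, grid))

# Monte Carlo support points versus the C3 density, point size n = 30:
rng = np.random.default_rng(1)
pick = rng.random(30) < 0.5
mc = np.where(pick, rng.normal(1.0, 0.5, 30), rng.normal(-1.0, 0.5, 30))
print(l2_distance(mc, (0.5, 1.0, 0.25, -1.0, 0.25)))
```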
Table 9. Comparisons of the Fang-He algorithm and the k-means algorithm in terms of the total error on the non-linear system and the mean squared error (MSE).
Algorithm | Total Error | MSE
Fang-He | 0.0000 | 0.5943
k-means | 0.0273 | 1.2921
Table 10. Estimates of the first four moments of the fitted MixN model based on 50 sampling points from the model.
Method | Mean | Variance | Skewness | Kurtosis
Model | 92.9145 | 755.6292 | 0.9853 | 0.2471
MSE (Fang-He) | 92.9145 | 755.0325 | 0.9851 | 0.2392
MSE (k-means) | 92.9091 | 754.2046 | 0.9833 | 0.2066
QMC | 92.8866 | 741.1019 | 0.9538 | 0.0196
MC | 96.9411 | 781.8546 | 0.6179 | −0.6027
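With MSE-RPs, the natural moment estimators behind Table 10 are the weighted moments of the discrete distribution formed by the points and their probabilities. A sketch of that computation (our own code; reading the kurtosis column as excess kurtosis is our assumption):

```python
import numpy as np

def weighted_moments(x, p):
    """Mean, variance, skewness and excess kurtosis of the discrete law {(x_i, p_i)}."""
    x, p = np.asarray(x, float), np.asarray(p, float)
    mean = np.sum(p * x)
    var = np.sum(p * (x - mean) ** 2)
    skew = np.sum(p * (x - mean) ** 3) / var ** 1.5
    kurt = np.sum(p * (x - mean) ** 4) / var ** 2 - 3.0
    return mean, var, skew, kurt

# Two-point example from Tables 1 and 2 for MixN(0.7, 8, 10, 1.5, 1): the mean
# is reproduced exactly (6.05), while the variance falls below Var(X) = 16.17
# because discretization removes the within-cell variance.
print(weighted_moments([2.423613543, 9.348758993], [0.476345265, 0.523654735]))
```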
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
