Estimating $\bar{X}$ Statistical Control Limits for Any Arbitrary Probability Distribution Using Re-Expressed Truncated Cumulants

Braden, Paul; Matis, Timothy; Benneyan, James C.; Chen, Binchao

doi:10.3390/math10071044


¹

Department of Industrial, Manufacturing, and Systems Engineering, Texas Tech University, Lubbock, TX 79409, USA

²

Department of Mechanical and Industrial Engineering, Northeastern University, Boston, MA 02115, USA

³

Amazon.com Inc., Seattle, WA 98170, USA

^*

Author to whom correspondence should be addressed.

Mathematics 2022, 10(7), 1044; https://doi.org/10.3390/math10071044

Submission received: 18 February 2022 / Revised: 14 March 2022 / Accepted: 17 March 2022 / Published: 24 March 2022

(This article belongs to the Special Issue Advances in Statistical Process Control and Their Applications)


Abstract

Shewhart $\bar{X}$ control charts commonly used for monitoring the mean of a process may be inaccurate or perform poorly when the subgroup size is small or the distribution of the process variable is skewed. Truncated saddlepoint distributions can increase the accuracy of estimated control limits by including higher order moments/cumulants in their approximation, yet this distribution may not exist in the lower tail, and thus the lower control limit may not exist. We introduce a novel modification in which some usually truncated higher-order cumulants are re-expressed as functions of lower-order cumulants estimated from data in a manner that ensures the existence of the truncated saddlepoint distribution over the complete domain of the random variable. The accuracy of this approach is tested in cases where the cumulants are assumed either known or estimated from sample data, and demonstrated in a healthcare application.

Keywords:

statistical process control; control chart; cumulant generating function; saddlepoint approximation; skewed probability distributions

MSC:

62-08

1. Introduction

Shewhart control charts have been the most commonly used tool for monitoring quality in a production process since their inception in the 1920s, with the mean of a quality characteristic being a key parameter of interest (Bradford and Miranti [1]). Specifying control limits as the expectation of a statistic plus or minus a multiple of its standard deviation is straightforward, but the interpretation of the in-control and out-of-control error rates of Shewhart $\bar{X}$ charts is complicated by both inaccuracies in estimating these measures from data and the degree to which the process deviates from normality. Indeed, quality control applications in which a process variable X has a non-Gaussian probability distribution frequently arise in practice. Examples include processes that involve human behavior, are related to reliability measures, or those in which the process variable itself is bounded. In such cases, it may be beneficial to specify control limits in an alternate way.

It follows that the problem of accurately specifying the control limits of a

\bar{X}

-chart for a non-Gaussian process equates to that of accurately describing the tails of the probability distribution of

\bar{X}

for a given subgroup size n. If the probability distribution of the process variable were known, then the distribution of

\bar{X}

would be the n-fold convolution of the distribution with itself, yet in many practical applications this is either not the case or is intractable. Instead, the distribution of the process mean is typically approximated by some probability distribution that is believed will accurately represent the tail behavior, whose parameters may be known or estimated from pre-sampled “phase-1” data in a retrospective analysis.

The most common approach, based on the central limit theorem, assumes $\bar{X}$ has a reasonably Gaussian distribution and proceeds using the standard formulae as per Shewhart. This approach requires only knowledge or estimation of the first two moments from sampled data and inherently assumes that higher-order moments (e.g., skewness, kurtosis) of non-Gaussian process variables vanish in the convolution. In practice, however, the veracity of this assumption is difficult to establish with standard statistical testing such as the Shapiro–Wilk test (Razali and Wah [2]), as the size of pre-sampled data must be large to achieve adequate statistical power, leaving only judgement through graphical tools such as normal probability plots. In applications where the subgroup size is small and/or the distribution of the process variable itself has a high degree of skewness or kurtosis, the Gaussian approximation for $\bar{X}$ may be inaccurate (Chang et al. [3]). While accuracy may be improved in some cases by applying an appropriate transformation to the data, the order of convergence of the central limit theorem is $O(n^{-1/2})$ in general.

As an alternative, approximate probability distributions that include higher-order moments, or equivalently cumulants, may be used to describe

\bar{X}

, given that a sufficient quantity of sampled data exist to accurately estimate these. One such approach is to use a jth order truncated saddlepoint approximation [4] whose order of convergence is

O (n^{- 1})

as

j \to \infty

, yet a drawback is that this approximate density function may not exist in the lower tail [5]. We thus propose a modification to the basic truncated saddlepoint approach that ensures the complete definition of this approximation over the entire domain of the random variable and thus allows for estimation of both control limits. This is achieved by re-expressing some higher-order cumulants, which would otherwise be truncated, as functions of lower-order cumulants that are either known or estimated from data so as to ensure the second derivative of the cumulant generating function is positive over the full domain of definition.

Although other authors have studied the problem of defining a fully supported truncated saddlepoint distribution, including Wang [6], Gillespie and Renshaw [7], and Fasiolo et al. [8], our approach differs in that the probability distribution exists in closed algebraic form when incorporating only skewness into the approximation and requires only numerical root finding of a polynomial function when incorporating both skewness and kurtosis. Lower and upper control limits for the mean are then easily calculated by inversion of the cumulative density at the desired type-1 error probability

α

. The remainder of this paper first reviews the theory of truncated saddlepoint approximations, then defines truncated saddlepoint approximations with either a re-expressed fourth cumulant or re-expressed fifth and sixth cumulants, illustrates their use within a healthcare application, and finally performs numerical analysis of the resulting in-control and out-of-control average run length (ARL) performance when the cumulants are either assumed to be known or estimated from sampled data.

2. Truncated Saddlepoint Approximations

Let

K (θ)

be a cumulant generating function (CGF) for a random variable

X \in S \subseteq R

with moments

ν_{i}

and cumulants

κ_{i}

for

i = 1, 2, \dots, \infty

. It follows that

\begin{aligned} K (θ) &= κ_{1} θ + κ_{2} \frac{θ^{2}}{2!} + κ_{3} \frac{θ^{3}}{3!} + κ_{4} \frac{θ^{4}}{4!} + κ_{5} \frac{θ^{5}}{5!} + \dots \\ &= κ_{1} θ + κ_{2} \frac{θ^{2}}{2!} + ν_{3} κ_{2}^{3 / 2} \frac{θ^{3}}{3!} + ν_{4} κ_{2}^{2} \frac{θ^{4}}{4!} + κ_{5} \frac{θ^{5}}{5!} + \dots \\ &= μ θ + σ^{2} \frac{θ^{2}}{2!} + ν_{3} σ^{3} \frac{θ^{3}}{3!} + ν_{4} σ^{4} \frac{θ^{4}}{4!} + κ_{5} \frac{θ^{5}}{5!} + \dots \end{aligned}

(1)

where

ν_{3} = Skewness = E [{(\frac{X - μ}{σ})}^{3}]

and

ν_{4} = Excess Kurtosis = E [{(\frac{X - μ}{σ})}^{4} - 3] .

It was shown by Daniels [9] that an approximate probability distribution of X based on the CGF may be found using the method of steepest descent as

\hat{f} (x) = \frac{exp [K (θ_{0}) - θ_{0} x]}{\sqrt{2 π K^{″} (θ_{0})}},

(2)

where

θ = θ_{0}

is the real root of

K^{'} (θ) - x = 0

, which both exists and is unique. This work was extended by Lugannani and Rice [10] in specifying the corresponding approximate cumulative distribution function for

x \neq μ

as

\begin{aligned} \hat{F} (x) &= Φ (w (x)) + ϕ (w (x)) \left(\frac{1}{w (x)} - \frac{1}{u (x)}\right) \\ w (x) &= \operatorname{sign} (θ_{0}) \sqrt{2 (θ_{0} x - K (θ_{0}))} \\ u (x) &= θ_{0} \sqrt{K^{″} (θ_{0})}, \end{aligned}

where

Φ (x)

and

ϕ (x)

denote the cumulative and density functions of the standard normal distribution, respectively.
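The Lugannani and Rice formula above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function names are hypothetical, the saddlepoint `theta0` is assumed to be supplied by the caller, and the Gaussian CGF is used only as a sanity check because the correction term vanishes in that case.

```python
import math

def std_normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lugannani_rice_cdf(x, theta0, K, K2):
    """Approximate F(x) for x != mean.

    theta0 is the root of K'(theta) = x; K and K2 are callables returning
    the CGF and its second derivative (hypothetical argument names).
    """
    w = math.copysign(math.sqrt(2.0 * (theta0 * x - K(theta0))), theta0)
    u = theta0 * math.sqrt(K2(theta0))
    return std_normal_cdf(w) + std_normal_pdf(w) * (1.0 / w - 1.0 / u)

# Sanity check with a Gaussian CGF K(t) = mu t + sigma^2 t^2 / 2 (illustrative values).
mu, sigma2 = 10.0, 4.0

def K_gauss(t):
    return mu * t + 0.5 * sigma2 * t * t

def K2_gauss(t):
    return sigma2

x = 13.0
theta0 = (x - mu) / sigma2          # closed-form saddlepoint for the Gaussian case
approx = lugannani_rice_cdf(x, theta0, K_gauss, K2_gauss)
exact = std_normal_cdf((x - mu) / math.sqrt(sigma2))
print(approx, exact)
```

For the Gaussian case $w(x) = u(x)$, so the approximation reduces to $Φ(w(x))$ and should agree with the exact value up to rounding.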

Using the expanded form of the CGF in Equation (1) truncated at the jth order, i.e.,

{κ_{i} = 0 : i > j}

, yields an approximation to the probability density function of Equation (2):

\hat{f} (x) = \frac{\exp (\sum_{i = 1}^{j} \frac{κ_{i} θ_{0}^{i}}{i!} - θ_{0} x)}{\sqrt{2 π \sum_{i = 0}^{j - 2} \frac{κ_{i + 2} θ_{0}^{i}}{i!}}},

(3)

with cumulative density function

\begin{aligned} \hat{F} (x) &= Φ (w (x)) + ϕ (w (x)) \left(\frac{1}{w (x)} - \frac{1}{u (x)}\right) \\ w (x) &= \operatorname{sign} (θ_{0}) \sqrt{2 \left(θ_{0} x - \sum_{i = 1}^{j} \frac{κ_{i} θ_{0}^{i}}{i!}\right)} \\ u (x) &= θ_{0} \sqrt{\sum_{i = 0}^{j - 2} \frac{κ_{i + 2} θ_{0}^{i}}{i!}} . \end{aligned}

(4)

The order of truncation is commonly set to j = 3 or j = 4 based on the sufficiency of retrospectively sampled data to accurately estimate the skewness and kurtosis measures as per

\begin{matrix} \hat{ν_{3}} & = & \frac{\sum_{j = 1}^{n} {(x_{j} - \bar{x})}^{3}}{s^{3} n} \\ \hat{ν_{4}} & = & \frac{\sum_{j = 1}^{n} {(x_{j} - \bar{x})}^{4}}{s^{4} n} - 3 . \end{matrix}
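As a rough illustration of these two estimators, the sketch below computes the sample skewness and excess kurtosis from a list of observations. The function name is hypothetical, and the use of the $n - 1$ divisor for $s$ is an assumption, since the text does not state which divisor is intended.

```python
import math

def sample_skewness_kurtosis(xs):
    """Return (nu3_hat, nu4_hat) following the estimators in the text.

    Assumption: s is the sample standard deviation with an n - 1 divisor.
    """
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    nu3 = sum((x - xbar) ** 3 for x in xs) / (n * s ** 3)
    nu4 = sum((x - xbar) ** 4 for x in xs) / (n * s ** 4) - 3.0
    return nu3, nu4

# Symmetric toy data: the skewness estimate is zero.
nu3_hat, nu4_hat = sample_skewness_kurtosis([1.0, 2.0, 3.0, 4.0, 5.0])
print(nu3_hat, nu4_hat)
```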

It follows that for j = 3, the root

θ_{0}

in Equations (3) and (4) is given by

θ_{0} = - \frac{1}{κ_{3}} (κ_{2} - \sqrt{κ_{2}^{2} + 2 κ_{3} (x - κ_{1})}),

(5)

and for j = 4 by

\begin{aligned} θ_{0} &= - \frac{1}{κ_{4}} \left(κ_{3} + \sqrt[3]{p - q} + \sqrt[3]{p + q}\right) \\ p &= κ_{3}^{3} - 3 κ_{2} κ_{3} κ_{4} - 3 κ_{4}^{2} (x - κ_{1}) \\ q &= \sqrt{- {(κ_{3}^{2} - 2 κ_{2} κ_{4})}^{3} + {(κ_{3}^{3} - 3 κ_{2} κ_{3} κ_{4} - 3 κ_{4}^{2} (x - κ_{1}))}^{2}} . \end{aligned}

(6)

The probability distributions defined for a random variable X in Equations (3) and (4) may be readily extended to

\bar{X}

by noting the relationship

κ_{i, \bar{X}} = \frac{κ_{i, X}}{n^{i - 1}}

(7)

between their cumulants. Thus, approximate upper (UCL) and lower (LCL) control limits of an

\bar{X}

-chart with subgroup size n may be found by substituting cumulant estimates in Equation (7) into Equation (4) and inverting at the desired type-1 error probability

α

as

\mathrm{UCL} = {\hat{F}}^{- 1} (1 - α / 2)

(8)

\mathrm{LCL} = {\hat{F}}^{- 1} (α / 2) .

(9)
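The two steps just described, scaling the cumulants per Equation (7) and inverting a cumulative distribution function per Equations (8) and (9), can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, inversion is done by plain bisection, and a Gaussian CDF stands in for $\hat{F}$ so that the result can be checked against the familiar three-sigma limits.

```python
import math

def mean_cumulants(kappa_x, n):
    """Equation (7): kappa_{i, Xbar} = kappa_{i, X} / n**(i - 1) for i = 1, 2, ..."""
    return [k / n ** i for i, k in enumerate(kappa_x)]

def invert_cdf(F, p, lo, hi, iterations=200):
    """Bisection for F(x) = p, assuming F(lo) < p <= F(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def control_limits(F, alpha, lo, hi):
    """Equations (8) and (9): return (LCL, UCL) at type-1 error alpha."""
    return invert_cdf(F, alpha / 2.0, lo, hi), invert_cdf(F, 1.0 - alpha / 2.0, lo, hi)

# Illustrative process cumulants (mean 50, variance 16) and subgroup size 4.
k1, k2 = mean_cumulants([50.0, 16.0], 4)
sd = math.sqrt(k2)

def gaussian_cdf(x):
    # Stand-in for F-hat; any approximate CDF of Xbar could be passed instead.
    return 0.5 * (1.0 + math.erf((x - k1) / (sd * math.sqrt(2.0))))

lcl, ucl = control_limits(gaussian_cdf, 0.0027, k1 - 10.0 * sd, k1 + 10.0 * sd)
print(lcl, ucl)
```

With $α = 0.0027$ the Gaussian stand-in returns limits very close to $κ_{1} \pm 3 \sqrt{κ_{2}}$, which is a convenient check before swapping in a saddlepoint CDF.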

A major drawback of this approach, however, is that the probability distributions of the truncated saddlepoint in Equations (3) and (4) may not exist over a lower subset of the domain of the random variable for which they are defined, thereby prohibiting calculation of the LCL. Note that the square root of the second derivative of the CGF in Equation (2), a jth order truncated version of which is given in Equation (3), must be positive for the density function to exist. The second derivative of the truncated CGF will be zero, however, at the point

x^{*} = κ_{1} - \frac{κ_{2}^{2}}{2 κ_{3}}

for

j = 3

, and for

j = 4

at

\begin{matrix} x^{*} = (κ_{1} + \frac{(\sqrt{κ_{3}^{2} - 2 κ_{2} κ_{4}} - κ_{3}) (κ_{3} \sqrt{κ_{3}^{2} - 2 κ_{2} κ_{4}} + 4 κ_{2} κ_{4} - κ_{3}^{2})}{6 κ_{4}^{2}}, \\ κ_{1} + \frac{(\sqrt{κ_{3}^{2} - 2 κ_{2} κ_{4}} + κ_{3}) (κ_{3} \sqrt{κ_{3}^{2} - 2 κ_{2} κ_{4}} - 4 κ_{2} κ_{4} + κ_{3}^{2})}{6 κ_{4}^{2}}) . \end{matrix}

Thus, if $x^{*}$ exists and is in the domain of the random variable, the probability distribution will cease to exist in a lower subset of that domain given by $\{x \in S : x < x^{*}\}$. We therefore present in the next two sections an extension of the truncated saddlepoint approach that ensures the existence of the cumulative density by ensuring that the second derivative of a modified/re-expressed CGF exists over all $X \in R$.
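To make the $j = 3$ case concrete, the sketch below computes $x^{*}$ and checks on which side of it a real saddlepoint exists, using the sign of the radicand $κ_{2}^{2} + 2 κ_{3} (x - κ_{1})$ that appears in Equation (5). The function names and the numerical cumulants are illustrative assumptions, not values taken from the article's tables.

```python
def j3_singular_point(k1, k2, k3):
    """Point x* at which the second derivative of the j = 3 truncated CGF is zero."""
    return k1 - k2 ** 2 / (2.0 * k3)

def j3_saddlepoint_exists(x, k1, k2, k3):
    """True when K'(theta) = x has a real root, i.e. the radicand is positive."""
    return k2 ** 2 + 2.0 * k3 * (x - k1) > 0.0

# Illustrative cumulants of Xbar with positive skewness (assumed values).
k1, k2, k3 = 6.0, 1.8, 1.08
x_star = j3_singular_point(k1, k2, k3)
print(x_star)                                       # boundary of the region of existence
print(j3_saddlepoint_exists(5.0, k1, k2, k3))       # above x*: density defined
print(j3_saddlepoint_exists(4.0, k1, k2, k3))       # below x*: density undefined
```

Any lower control limit that would fall below `x_star` cannot be obtained from the plain $j = 3$ truncation, which is the situation the re-expressed cumulants are meant to avoid.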

3. Truncated Saddlepoint Approximations with a Re-Expressed Fourth Order Cumulant, TS( $κ_{4}^{*}$ )

Consider a random variable X whose first three cumulants are known or estimated from data. Assume that the skewness is not equal to zero, i.e.,

κ_{3} \neq 0

, and let the fourth cumulant,

κ_{4}^{*}

, be referred to as a re-expressed cumulant (denoted with an asterisk). All other cumulants of higher orders are set to zero. It follows that a re-expressed fourth order cumulant generating function of X is given by

K {(θ)}^{*} = κ_{1} θ + \frac{κ_{2}}{2!} θ^{2} + \frac{κ_{3}}{3!} θ^{3} + \frac{κ_{4}^{*}}{4!} θ^{4}

(10)

with second derivative

K^{″} {(θ)}^{*} = κ_{2} + κ_{3} θ + \frac{κ_{4}^{*}}{2} θ^{2} .

(11)

Letting

κ_{4}^{*} = \frac{κ_{3}^{2}}{2 κ_{2}},

it follows from Equation (11) that

\begin{aligned} K^{″} {(θ)}^{*} &= κ_{2} + κ_{3} θ + \frac{κ_{4}^{*}}{2} θ^{2} \\ &= κ_{2} + κ_{3} θ + \frac{κ_{3}^{2}}{4 κ_{2}} θ^{2} \\ &= {\left(\sqrt{κ_{2}} + \frac{κ_{3}}{2 \sqrt{κ_{2}}} θ\right)}^{2}, \end{aligned}

(12)

which both exists and is non-negative for

κ_{2} > 0

and

κ_{3}

,

θ \in R

. Thus, the re-expressed cumulant generating function of Equation (10) may be written as

\begin{aligned} K {(θ)}^{*} &= κ_{1} θ + κ_{2} \frac{θ^{2}}{2!} + κ_{3} \frac{θ^{3}}{3!} + κ_{4}^{*} \frac{θ^{4}}{4!} \\ &= κ_{1} θ + κ_{2} \frac{θ^{2}}{2!} + κ_{3} \frac{θ^{3}}{3!} + \left(\frac{κ_{3}^{2}}{2 κ_{2}}\right) \frac{θ^{4}}{4!} \\ &= κ_{1} θ + \frac{κ_{2}}{2} θ^{2} + \frac{κ_{3}}{6} θ^{3} + \frac{κ_{3}^{2}}{48 κ_{2}} θ^{4} \end{aligned}

with first derivative

K^{'} {(θ)}^{*} = κ_{1} + κ_{2} θ + \frac{κ_{3}}{2} θ^{2} + \frac{κ_{3}^{2}}{12 κ_{2}} θ^{3} .

It follows that the roots of

\begin{matrix} K^{'} {(θ_{0})}^{*} - x = κ_{1} + κ_{2} θ_{0} + \frac{κ_{3}}{2} θ_{0}^{2} + \frac{κ_{3}^{2}}{12 κ_{2}} θ_{0}^{3} - x = 0 \end{matrix}

have the closed form solution

θ_{0} = \frac{\sqrt[3]{4 κ_{2} κ_{3}^{3} (2 κ_{2}^{2} + 3 κ_{3} (x - κ_{1}))} - 2 κ_{2} κ_{3}}{κ_{3}^{2}}

(13)

with discriminant given by

Δ = \frac{- (κ_{3}^{2} {(2 κ_{2}^{2} - 3 κ_{1} κ_{3} + 3 κ_{3} x)}^{2})}{48 κ_{2}^{2}} .

Because the discriminant $Δ \leq 0$ for all $x \in R$ regardless of the specification of the cumulants ($κ_{1}, κ_{2}, κ_{3}$), there exists a single real root $θ_{0}$ in Equation (13) over all x, noting that this root becomes non-differentiable (its derivative with respect to x is unbounded) at the point $x^{*} = κ_{1} - \frac{2 κ_{2}^{2}}{3 κ_{3}}$ where the radicand of the cube root is zero. An approximate cumulative density function for X is found upon substituting this root into Equation (4), albeit with some degree of irregularity in the neighborhood of $x^{*}$. Alternatively, using the cumulants of Equation (7) yields an approximate cumulative density function for $\bar{X}$, from which the UCL and LCL of an $\bar{X}$-chart may be found upon inversion per Equations (8) and (9). A computational implementation of this complete procedure in the R programming language may be found in Matis et al. [11].
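The procedure of this section can be sketched end to end in Python. This is an illustrative reading of Equations (4), (8), (9), (12), and (13), not a port of the R implementation in [11]: all function names are hypothetical, the cumulants are assumed example values for $\bar{X}$, and inversion uses simple bisection on brackets that exclude the mean (where the Lugannani and Rice formula is undefined).

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def real_cbrt(v):
    """Real cube root, valid for negative arguments."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)

def make_ts_k4(k1, k2, k3):
    """Build (theta0, K1, cdf) for TS(kappa_4*); requires k2 > 0 and k3 != 0."""
    r = math.sqrt(k2)

    def K(t):    # re-expressed CGF, Equation (10) with kappa_4* = k3^2 / (2 k2)
        return k1 * t + k2 * t ** 2 / 2 + k3 * t ** 3 / 6 + k3 ** 2 * t ** 4 / (48 * k2)

    def K1(t):   # first derivative of the re-expressed CGF
        return k1 + k2 * t + k3 * t ** 2 / 2 + k3 ** 2 * t ** 3 / (12 * k2)

    def theta0(x):   # closed-form root, Equation (13)
        radicand = 4 * k2 * k3 ** 3 * (2 * k2 ** 2 + 3 * k3 * (x - k1))
        return (real_cbrt(radicand) - 2 * k2 * k3) / k3 ** 2

    def cdf(x):      # Equation (4) with the re-expressed CGF, for x != k1
        t = theta0(x)
        w = math.copysign(math.sqrt(max(2.0 * (t * x - K(t)), 0.0)), t)
        u = t * abs(r + k3 * t / (2.0 * r))   # theta0 * sqrt(K''), Equation (12)
        if w == 0.0 or u == 0.0:
            raise ValueError("formula undefined at x = kappa_1 and at x*")
        return Phi(w) + phi(w) * (1.0 / w - 1.0 / u)

    return theta0, K1, cdf

def bisect(F, p, lo, hi, iterations=200):
    """Find x with F(x) = p, assuming F(lo) < p <= F(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative cumulants of Xbar (assumed values) and type-1 error.
k1, k2, k3 = 6.0, 1.8, 1.08
alpha = 0.0027
theta0, K1, cdf = make_ts_k4(k1, k2, k3)
sd = math.sqrt(k2)
lcl = bisect(cdf, alpha / 2, k1 - 10 * sd, k1 - 0.01 * sd)
ucl = bisect(cdf, 1 - alpha / 2, k1 + 0.01 * sd, k1 + 10 * sd)
print(lcl, ucl)
```

Bisection is used here only because it tolerates the irregular behaviour of the approximate CDF near $x^{*}$. A production implementation would likely want explicit handling of that neighbourhood and of $x = κ_{1}$.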

4. Truncated Saddlepoint Approximations with Re-Expressed Fifth and Sixth Order Cumulants, TS( $κ_{5}^{*}, κ_{6}^{*}$ )

Extending this same approach, now consider a random variable X whose first four cumulants are known or estimated from data. Similar to above, assume that both the estimated skewness and kurtosis are not equal to zero, i.e.,

κ_{3}, κ_{4} \neq 0

, and let the fifth and sixth cumulants,

κ_{5}^{*}

and

κ_{6}^{*}

, be referred to as re-expressed cumulants. All other cumulants of higher orders will be set to zero. It follows that a re-expressed sixth order cumulant generating function of X is given by

K {(θ)}^{*} = κ_{1} θ + \frac{κ_{2}}{2!} θ^{2} + \frac{κ_{3}}{3!} θ^{3} + \frac{κ_{4}}{4!} θ^{4} + \frac{κ_{5}^{*}}{5!} θ^{5} + \frac{κ_{6}^{*}}{6!} θ^{6}

(14)

with second derivative

K^{″} {(θ)}^{*} = κ_{2} + κ_{3} θ + \frac{κ_{4}}{2} θ^{2} + \frac{κ_{5}^{*}}{6} θ^{3} + \frac{κ_{6}^{*}}{24} θ^{4} .

(15)

Letting

\begin{aligned} κ_{5}^{*} &= \frac{3 (2 κ_{2} κ_{3} κ_{4} - κ_{3}^{3})}{4 κ_{2}^{2}} \\ κ_{6}^{*} &= \frac{3 {(2 κ_{2} κ_{4} - κ_{3}^{2})}^{2}}{8 κ_{2}^{3}}, \end{aligned}

it follows from Equation (15) that

\begin{aligned} K^{″} {(θ)}^{*} &= κ_{2} + κ_{3} θ + \frac{κ_{4}}{2} θ^{2} + \frac{κ_{5}^{*}}{6} θ^{3} + \frac{κ_{6}^{*}}{24} θ^{4} \\ &= {\left(\sqrt{κ_{2}} + \left(\frac{κ_{3}}{2 \sqrt{κ_{2}}}\right) θ + \left(\frac{2 κ_{2} κ_{4} - κ_{3}^{2}}{8 κ_{2} \sqrt{κ_{2}}}\right) θ^{2}\right)}^{2}, \end{aligned}

(16)

which both exists and is non-negative for

κ_{2} > 0

and

κ_{3}, κ_{4}, θ \in R

. Thus, the re-expressed cumulant generating function of Equation (14) may be written as

\begin{aligned} K {(θ)}^{*} &= κ_{1} θ + κ_{2} \frac{θ^{2}}{2!} + κ_{3} \frac{θ^{3}}{3!} + κ_{4} \frac{θ^{4}}{4!} + κ_{5}^{*} \frac{θ^{5}}{5!} + κ_{6}^{*} \frac{θ^{6}}{6!} \\ &= κ_{1} θ + κ_{2} \frac{θ^{2}}{2!} + κ_{3} \frac{θ^{3}}{3!} + κ_{4} \frac{θ^{4}}{4!} + \left(\frac{3 (2 κ_{2} κ_{3} κ_{4} - κ_{3}^{3})}{4 κ_{2}^{2}}\right) \frac{θ^{5}}{5!} + \left(\frac{3 {(2 κ_{2} κ_{4} - κ_{3}^{2})}^{2}}{8 κ_{2}^{3}}\right) \frac{θ^{6}}{6!} \\ &= κ_{1} θ + \frac{κ_{2}}{2} θ^{2} + \frac{κ_{3}}{6} θ^{3} + \frac{κ_{4}}{24} θ^{4} + \left(\frac{2 κ_{2} κ_{3} κ_{4} - κ_{3}^{3}}{160 κ_{2}^{2}}\right) θ^{5} + \left(\frac{{(2 κ_{2} κ_{4} - κ_{3}^{2})}^{2}}{1920 κ_{2}^{3}}\right) θ^{6} \end{aligned}

(17)

with first derivative

K^{'} {(θ_{0})}^{*} = κ_{1} + κ_{2} θ_{0} + \frac{κ_{3}}{2} θ_{0}^{2} + \frac{κ_{4}}{6} θ_{0}^{3} + \left(\frac{2 κ_{2} κ_{3} κ_{4} - κ_{3}^{3}}{32 κ_{2}^{2}}\right) θ_{0}^{4} + \left(\frac{{(2 κ_{2} κ_{4} - κ_{3}^{2})}^{2}}{320 κ_{2}^{3}}\right) θ_{0}^{5} .

(18)
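A quick numerical check of the re-expression may be useful here. The sketch below computes $κ_{5}^{*}$ and $κ_{6}^{*}$ and compares the polynomial form of the second derivative in Equation (15) with the squared form in Equation (16). The function names and cumulant values are illustrative assumptions. The squared term in $κ_{6}^{*}$ is written with $κ_{3}^{2}$, which is what expanding the square in Equation (16) produces.

```python
import math

def reexpressed_k5_k6(k2, k3, k4):
    """Re-expressed fifth and sixth cumulants as functions of k2, k3, k4."""
    k5 = 3.0 * (2.0 * k2 * k3 * k4 - k3 ** 3) / (4.0 * k2 ** 2)
    k6 = 3.0 * (2.0 * k2 * k4 - k3 ** 2) ** 2 / (8.0 * k2 ** 3)
    return k5, k6

def K2_polynomial(t, k2, k3, k4, k5, k6):
    """Second derivative of the re-expressed CGF, Equation (15)."""
    return k2 + k3 * t + k4 * t ** 2 / 2 + k5 * t ** 3 / 6 + k6 * t ** 4 / 24

def K2_square(t, k2, k3, k4):
    """Same quantity written as a perfect square, Equation (16)."""
    r = math.sqrt(k2)
    return (r + k3 * t / (2 * r) + (2 * k2 * k4 - k3 ** 2) * t ** 2 / (8 * k2 * r)) ** 2

# Illustrative cumulants (assumed values).
k2, k3, k4 = 1.8, 1.08, 0.972
k5, k6 = reexpressed_k5_k6(k2, k3, k4)
differences = [
    abs(K2_polynomial(t, k2, k3, k4, k5, k6) - K2_square(t, k2, k3, k4))
    for t in (-5.0, -1.0, 0.0, 0.5, 3.0)
]
print(k5, k6, max(differences))
```

Agreement of the two forms at arbitrary $θ$ confirms that the second derivative is a perfect square and therefore non-negative.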

In this case, the roots of

K^{'} {(θ_{0})}^{*} - x = 0

(19)

do not have a closed form algebraic solution, per the Abel–Ruffini theorem applied to the quintic polynomial function of Equation (18), and hence must be found through numerical search. Because the second derivative in Equation (16) is non-negative, the first derivative in Equation (18) is non-decreasing in $θ_{0}$, so there is a single real root, which when substituted into Equation (4) yields a proper cumulative density function; yet, similar to the TS($κ_{4}^{*}$) approximation, this root may be non-differentiable at a point within the domain of definition, thereby causing irregularity in the neighborhood. Alternatively, using the cumulants of Equation (7) yields an approximate cumulative density for $\bar{X}$, from which the UCL and LCL for the mean may be found by inversion at the desired type-1 error, as per Equations (8) and (9).
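One possible numerical search for the root of Equation (19) is sketched below. It relies on the observation that the second derivative in Equation (16) is a perfect square, so the first derivative is non-decreasing and a bracketing bisection is safe. The function names, the bracket-doubling strategy, and the cumulant values are assumptions for illustration, and the coefficients of the quartic and quintic terms are obtained by integrating Equation (16).

```python
def make_K1(k1, k2, k3, k4):
    """First derivative of the re-expressed sixth-order CGF, Equation (18)."""
    c4 = (2 * k2 * k3 * k4 - k3 ** 3) / (32 * k2 ** 2)
    c5 = (2 * k2 * k4 - k3 ** 2) ** 2 / (320 * k2 ** 3)

    def K1(t):
        return k1 + k2 * t + k3 * t ** 2 / 2 + k4 * t ** 3 / 6 + c4 * t ** 4 + c5 * t ** 5

    return K1

def solve_saddlepoint(K1, x, iterations=200):
    """Solve K1(theta) = x by bisection, assuming K1 is non-decreasing."""
    lo, hi = -1.0, 1.0
    while K1(lo) > x:      # widen the bracket downward
        lo *= 2.0
    while K1(hi) < x:      # widen the bracket upward
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if K1(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative cumulants of Xbar (assumed values).
K1 = make_K1(6.0, 1.8, 1.08, 0.972)
roots = {x: solve_saddlepoint(K1, x) for x in (3.0, 6.0, 9.0)}
print(roots)
```

The root at $x = κ_{1}$ is zero, as expected for any saddlepoint equation, which gives a simple check on the solver.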

5. Healthcare Example

Applications of quality control in healthcare frequently involve the monitoring of processes that are non-Gaussian (Woodall [12]), for which simplistic

\bar{X}

-charts that may be used by line staff members are desired (Finson et al. [13]). One such application is the surveillance of time delays that patients must wait between receiving care in the emergency department (ED) and being transferred to an available inpatient room, often referred to as “ED boarding times”. This is an important hospital measure of patient safety, flow, care quality, and efficiency and the focus of many hospital quality improvement efforts, with many hospitals reporting noon ED boarding times as one of their key performance indicators. It has been empirically observed across several mid-sized teaching hospitals that these times follow log-normal distributions, with an average rate of 4–6 patient transfers per hour (Ding et al. [14]).

Consider a boarding time process whose variable is assumed to follow a lognormal $(μ = 5.1, σ = 0.75)$ distribution, for which the (10%, 50%, 90%) percentiles are given by (62.7, 164.0, and 428.9) minutes, exhibiting significant skew. An experiment in which $m = 500$ random variates were generated from this process distribution, assumed stable, resulted in the following sample statistics ($\bar{x} = 228.58, s^{2} = 27{,}385.68, {\hat{ν}}_{3} = 1.91$). Assuming these data were collected from an in-control process in rational subgroups of size $n = 4$, the estimated cumulants ($κ_{1} = 222.58, κ_{2} = 6846.62, κ_{3} = 540{,}406.32$) of $\bar{X}$ may be used in specifying the Gaussian and TS($κ_{4}^{*}$) approximations of the density function, which are displayed together with a histogram of the subgroup means in Figure 1a. From these, the $α = 0.0027$ control limits for an $\bar{X}$ chart may be determined for the Gaussian ($\mathrm{LCL} = -33.35, \mathrm{UCL} = 463.10$) and TS($κ_{4}^{*}$) ($\mathrm{LCL} = 76.82, \mathrm{UCL} = 553.74$) approximations, shown on the control chart for subgroup means in Figure 1b, suggesting significantly better fit and performance.
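The quoted percentiles of the lognormal boarding time distribution can be reproduced with the standard library. This is a small check under the usual parameterization, in which $μ$ and $σ$ are the mean and standard deviation of the logarithm of the variable. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def lognormal_percentile(p, mu, sigma):
    """p-th quantile of a lognormal variable with log-scale parameters mu and sigma."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))

percentiles = [lognormal_percentile(p, 5.1, 0.75) for p in (0.10, 0.50, 0.90)]
print([round(v, 1) for v in percentiles])
```

The wide gap between the median and the 90th percentile, compared with the gap between the 10th percentile and the median, reflects the right skew noted in the text.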

Specifying a TS(

κ_{5}^{*}, κ_{6}^{*}

) approximation of

\bar{X}

is not practical in this example due to the difficulty of numerically finding the root

θ_{0}

in Equations (18) and (19), as explored further in Section 6.2. Note that including an estimate of only the sample skewness, however, greatly improves the accuracy of the approximated limits. In Figure 1b, several of the plotted subgroup means from the pre-sample would be (incorrectly) deemed out-of-control by the Gaussian approximated limits, but not using the TS(

κ_{4}^{*}

) limits. In a retrospective analysis, this also may result in the iterative elimination, recalculation, and tightening of control limits (here erroneously, as the generated data are i.i.d), further inflating the type-1 error of the Gaussian limits and increasing false alarm rates when used in prospective monitoring. In addition, the LCL of the TS(

κ_{4}^{*}

) approximation is greater than zero, allowing for the detection of assignable cause variability that indicates process improvement or possibly undesirable events, which is not possible using the Gaussian approximation.

6. Performance Analysis

For convenience, we assess the accuracy of the approximate upper and lower control limits of an

\bar{X}

-chart for a range of

\mathrm{Gamma} (γ, 1 / λ)

distributed process variables

X \in R^{+}

, where

γ

denotes the shape and

(1 / λ)

the scale parameters. Saralees [15] has observed that the Gamma distribution is commonly used to describe a broad array of real-life datasets. These parameters may be specified such that this distribution takes on a broad range of asymmetric shapes commonly encountered in practice, with cases for $γ = 1, 2, 4$ and $\frac{1}{λ} = 3$ depicted in Figure 2 and examined below.

The corresponding distributions of

\bar{X}

are known to be distributed as a

\mathrm{Gamma} (n γ, 1 / (n λ))

, from which the exact control limits may be calculated. For a desired type-1 error probability of

α = 0.0027

and subgroup sizes of

n = 5, 10, 15

, approximate control limits based on Gaussian, re-expressed fourth (TS(

κ_{4}^{*}

)), and re-expressed fifth and sixth (TS(

κ_{5}^{*}, κ_{6}^{*}

)) representations of the distribution of

\bar{X}

are compared to the exact limits for the cases in which the cumulants are either known or estimated from m in-control pre-sampled data.

6.1. Known Cumulants

The known cumulants of a

\mathrm{Gamma} (n γ, 1 / (n λ))

distribution for

\bar{X}

are given by

\begin{aligned} κ_{1} &= \frac{γ}{λ} \\ κ_{2} &= \frac{γ}{n λ^{2}} \\ κ_{3} &= \frac{2 γ}{n^{2} λ^{3}} \\ κ_{4} &= \frac{6 γ}{n^{3} λ^{4}} . \end{aligned}

(20)
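These cumulants follow from the general gamma result that the $i$th cumulant equals $(i - 1)!$ times the shape times the scale raised to the power $i$, applied to a shape of $n γ$ and a scale of $1 / (n λ)$. A short sketch, with a hypothetical function name and example parameters chosen from the cases considered in this section:

```python
import math

def gamma_mean_cumulants(shape, scale, n, order=4):
    """First `order` cumulants of Xbar when X ~ Gamma(shape, scale).

    Xbar ~ Gamma(n * shape, scale / n), so
    kappa_i = (i - 1)! * (n * shape) * (scale / n) ** i.
    """
    return [
        math.factorial(i - 1) * n * shape * (scale / n) ** i
        for i in range(1, order + 1)
    ]

# Example: shape 2, scale 3, subgroup size 10.
k1, k2, k3, k4 = gamma_mean_cumulants(2.0, 3.0, 10)
print(k1, k2, k3, k4)
```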

It follows that the exact and approximate control limits for an

\bar{X}

-chart, together with the resulting in-control average run lengths (ARL) above and below these limits are given in Table 1. The control limits for TS(

κ_{4}^{*}

) and TS(

κ_{5}^{*}, κ_{6}^{*}

) are calculated by substituting the cumulants from Equation (20) together with their respective root from Equation (13) or (19) into Equation (4), and then inverting as per Equations (8) and (9).
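For readers who wish to reproduce the run length figures, the sketch below shows one way to compute an in-control ARL for a given limit. It assumes the standard result that, for independent subgroups, the run length is geometric so that the ARL is the reciprocal of the signal probability. Because $n γ$ is an integer in every case considered, the exact gamma tail can be evaluated with the finite Erlang sum. The function names are hypothetical, and the three-sigma Gaussian limits are used only as an example.

```python
import math

def erlang_sf(x, k, scale):
    """P(X > x) for a gamma variable with integer shape k and the given scale."""
    if x <= 0.0:
        return 1.0
    r = x / scale
    return math.exp(-r) * sum(r ** i / math.factorial(i) for i in range(k))

def arl(signal_probability):
    """Average run length for a per-subgroup signal probability."""
    return math.inf if signal_probability <= 0.0 else 1.0 / signal_probability

# Gamma(1, 3) process with subgroups of size 5: Xbar ~ Gamma(5, 0.6).
shape, scale, n = 1, 3.0, 5
k, s = n * shape, scale / n
mean, sd = k * s, math.sqrt(k) * s
lcl_gauss, ucl_gauss = mean - 3.0 * sd, mean + 3.0 * sd

arl_upper = arl(erlang_sf(ucl_gauss, k, s))          # signals above the Gaussian UCL
arl_lower = arl(1.0 - erlang_sf(lcl_gauss, k, s))    # signals below the Gaussian LCL
arl_nominal = arl(0.0027 / 2.0)                      # target ARL for each limit
print(arl_upper, arl_lower, arl_nominal)
```

In this example the Gaussian LCL is negative, so no subgroup mean can fall below it and the lower ARL is infinite, while the upper ARL falls well short of the nominal value.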

For each of the approximate density functions, both the LCL and UCL converge in type-1 error performance to the exact limits as the shape parameter

γ

and the subgroup size n are increased, as would be intuitive. In the case of a

\mathrm{Gamma} (4, 3)

distribution with n = 15, both the upper and lower control limits of the TS(

κ_{5}^{*}, κ_{6}^{*}

) distribution have almost fully converged to the exact limits. In the case of a

\mathrm{Gamma} (1, 3)

distribution with n = 5, the LCL of the Gaussian approximation is negative and beyond the domain of the distribution, yet note that the LCL exists for both the TS(

κ_{4}^{*}

) and TS(

κ_{5}^{*}, κ_{6}^{*}

) distributions. Focusing on only the TS(

κ_{4}^{*}

) and TS(

κ_{5}^{*}, κ_{6}^{*}

) approximations, the approximations of the UCLs are more accurate than the LCLs across all subgroup sizes n and values of

γ, 1 / λ

. In the case of small n and $γ$, this is in part due to the domain of the approximating distribution spanning beyond zero to negative infinity, yet more so to the non-differentiability of the root $θ_{0}$ at a point $x^{*}$, which lies below $κ_{1}$ in these cases. As an example, the root and corresponding cumulative density function of the TS($κ_{4}^{*}$) approximation of a $\mathrm{Gamma} (2, 3)$ population distribution with subgroups of size n = 10 are displayed in Figure 3a,b.

Note that the root becomes non-differentiable at the point $x^{*} = 4$ where the radicand of Equation (13) is equal to zero, leading to irregularity and a ‘kink’ in the lower tail of the cumulative density, and ultimately error in approximating the LCL. Although not directly observable for the root of the TS($κ_{5}^{*}, κ_{6}^{*}$) approximation that does not have a closed form, a similar result is obtained.
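The location of this point can be verified directly from the cumulants of $\bar{X}$. The sketch below, with a hypothetical function name, evaluates the point at which the radicand of Equation (13) vanishes for a gamma population with shape 2, scale 3, and subgroups of size 10.

```python
def ts_k4_kink_point(k1, k2, k3):
    """Point x* at which the radicand of the cube root in Equation (13) is zero."""
    return k1 - 2.0 * k2 ** 2 / (3.0 * k3)

# Cumulants of Xbar for a gamma population (shape 2, scale 3) and n = 10.
shape, scale, n = 2.0, 3.0, 10
k1 = shape * scale
k2 = shape * scale ** 2 / n
k3 = 2.0 * shape * scale ** 3 / n ** 2

x_star = ts_k4_kink_point(k1, k2, k3)
print(x_star)
```

Substituting the gamma cumulants symbolically gives $x^{*} = 2 γ / (3 λ)$, so for gamma populations this point sits at two thirds of the mean regardless of the subgroup size.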

The type-2 error performance of the approximate distributions was assessed by varying the values of the parameters

γ

and

1 / λ

and generating the corresponding operating characteristic (OC) curves in Figure 4 and Figure 5.

Note that when shifting the scale parameter, both the mean and variance of

\bar{X}

vary, yet the skewness and kurtosis remain constant, whereas when shifting the shape parameter, all measures of

\bar{X}

vary.

Collectively observing in-control performance in Table 1 and out-of-control performance in Figure 4 and Figure 5, the Gaussian approximation does not have power in detecting downward shifts in the LCL and has high false alarm rates in the UCL, across all considered cases. The TS(

κ_{4}^{*}

) approximation has a high false alarm rate in the LCL, yet reasonably approximates the exact rates in the UCL, and the TS(

κ_{5}^{*}, κ_{6}^{*}

) approximation reasonably approximates the ARL performance of the exact distribution. Consider the specific case of a

\mathrm{Gamma} (2, 3)

distribution with subgroups of size

n = 5

, whose in- and out-of-control ARL performance is given in Table 2.

In this case, the distribution of

\bar{X}

exhibits considerable asymmetry, yet the in- and out-of-control performance of the TS(

κ_{4}^{*}

) and the TS(

κ_{5}^{*}, κ_{6}^{*}

) reasonably approximates that of the exact distribution for signals above the UCL. For signals below the LCL, it is observed that the Gaussian approximation lacks power, the TS(

κ_{4}^{*}

) has a high false alarm rate, and the TS(

κ_{5}^{*}, κ_{6}^{*}

) most accurately approximates the exact error performance. Across all considered cases and generally speaking, the in- and out-of-control performance of the Gaussian approximation deviates by more than two-fold for upward signals and several-fold for downward signals, whereas both the TS(

κ_{4}^{*}

) and TS(

κ_{5}^{*}, κ_{6}^{*}

) approximations deviate less than two-fold for upward signals while varying in downward signal performance, with the TS(

κ_{5}^{*}, κ_{6}^{*}

) exhibiting greater accuracy than the TS(

κ_{4}^{*}

).

6.2. Estimated Cumulants

As the cumulants of

\bar{X}

are generally not known in practice, we also consider the case for which these are estimated from

m = 100, 1000, 10, 000

sampled data points from a

\mathrm{Gamma} (γ, 1 / λ)

population distribution. The in-control

\mathrm{ARL}

is simulated for

J = 1000

replications and reported in terms of the median (Med) accuracy and interquartile range (IQR) precision. The resulting

\mathrm{ARL}

values for the lower and upper control limits are given in Table 3 and Table 4.

As shown, the median accuracy of the UCL/LCL approximations with estimated cumulants converges to that with known cumulants as the shape parameter

γ

, subgroup size n, and quantity of sampled data m are increased, as would be intuitive. In comparison to the exact result, the median accuracy of TS(

κ_{5}^{*}, κ_{6}^{*}

) approximations generally exceeds that of TS(

κ_{4}^{*}

), yet the IQR of TS(

κ_{4}^{*}

) approximations is often narrower than that of TS(

κ_{5}^{*}, κ_{6}^{*}

). In general, the performance of TS(

κ_{5}^{*}, κ_{6}^{*}

) approximations hinges on how accurately the kurtosis, and hence fourth cumulant, may be estimated from the sampled data. In the case of a highly skewed process distribution when both the subgroup size n and number of pre-samples m are small, a TS(

κ_{4}^{*}

) approximation may be preferable to a TS(

κ_{5}^{*}, κ_{6}^{*}

) in both computational effort and accuracy, yet this advantage quickly diminishes with decreasing asymmetry in the distribution of

\bar{X}

and/or increasing accuracy of the estimated kurtosis.

Related to earlier comments, also note that in some cases, most notably when

m = 100

,

n = 5

, and

γ = 1, 2

, it is difficult to numerically find the root of the TS(

κ_{5}^{*}, κ_{6}^{*}

) approximation in Equations (18) and (19) with reasonable effort. This is due to the instability of these equations arising from inaccuracies in estimating the kurtosis, and hence the fourth cumulant. The root of the TS(

κ_{4}^{*}

) approximation, however, may be readily found through the closed algebraic expression in Equation (13) regardless of inaccuracies in estimating the skewness, and hence third cumulant.

7. Discussion and Conclusions

This paper proposed a new and practical approach to estimating more accurate control limits for non-Gaussian data based on a novel extension of truncated saddlepoint distributions, in which some cumulants are re-expressed in such a manner to ensure the resulting approximated probability distribution exists across the complete domain of the random variable. The relative accuracy of the approximated probability limits across a range of skewed Gamma distributions suggests that this approach is highly accurate and robust, especially relative to the traditional central limit theorem approximation, with the percent error from presumed lower and upper tail probabilities being often an order of magnitude smaller.

Deciding whether to approximate the distribution of the process mean with a Gaussian, truncated saddlepoint with a fourth re-expressed cumulant (TS(

κ_{4}^{*}

)), or truncated saddlepoint with fifth and sixth re-expressed cumulants (TS(

κ_{5}^{*}, κ_{6}^{*}

)) approach has implications on the in- and out-of-control average run length (

\mathrm{ARL}

) performance of the corresponding upper (UCL) and lower (LCL) control limits. For even a small number of pre-sampled data, the inclusion of skewness in a TS(

κ_{4}^{*}

) approximation was shown to have considerably better overall

\mathrm{ARL}

accuracy than the Gaussian approximation, and for small subgroup sizes in particular, the TS(

κ_{4}^{*}

) produces an LCL that has statistical power in detecting downward shifts of the mean. Furthermore, the TS(

κ_{4}^{*}

) approximation has a closed form expression, from which both the LCL and UCL may be readily obtained through algebraic inversion. If the quantity of pre-sampled data is sufficiently large to reasonably estimate the population kurtosis, a TS(

κ_{5}^{*}, κ_{6}^{*}

) approximation will provide even greater accuracy in

\mathrm{ARL}

performance.

Future work might investigate several practical issues, including comparison to various non-parametric methods and application to attribute control charts (Chakraborti [16], Chakraborti et al. [17], Qiu and Li [18]), likelihood ratio test control charts, individuals charts, and sums of heterogeneous random variables (Chen et al. [19]). Given the ability of cumulant generating functions to capture correlations between random variables, this approach also might be extended to a multivariate version of the re-expressed truncated saddlepoint distribution. Finally, applications beyond quality control could be investigated for which approximating convolutions of sums or averages is important [19].

Author Contributions

Study conceptualization: P.B., B.C. and T.M.; Research: P.B., B.C. and T.M.; Technical review and manuscript revision: J.C.B. and B.C.; Supervision: T.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

$ν_{i}$	ith Moment
$κ_{i}$	ith Cumulant
$α$	Type I Error Rate, Producer’s Risk
$K(θ)$	Cumulant Generating Function
$ϕ$	Standard Normal Probability Density Function
$Φ$	Standard Normal Cumulative Distribution Function
$Δ$	Discriminant
$1/λ$	Scale Parameter of $Gamma(γ, 1/λ)$ distribution
$γ$	Shape Parameter of $Gamma(γ, 1/λ)$ distribution
$IQR$	Interquartile Range
$ARL$	Average Run Length
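As a small illustration of the cumulant notation, the sketch below evaluates the cumulants of a $Gamma(γ, 1/λ)$ variable using the standard result $κ_{i} = γ (1/λ)^{i} (i-1)!$, and then scales them to the subgroup mean using the general property that the $i$th cumulant of the mean of $n$ independent, identically distributed observations is $κ_{i} / n^{i-1}$. The function names are hypothetical and chosen only for this example.

```python
import math


def gamma_cumulants(shape, scale, order=6):
    """First `order` cumulants of a Gamma(shape, scale) variable: shape * scale^i * (i-1)!."""
    return [shape * scale**i * math.factorial(i - 1) for i in range(1, order + 1)]


def mean_cumulants(cumulants, n):
    """Cumulants of the mean of n i.i.d. observations: kappa_i / n^(i-1)."""
    # enumerate() starts at 0, so the index j equals i - 1 for the ith cumulant.
    return [kappa / n**j for j, kappa in enumerate(cumulants)]


kappa_x = gamma_cumulants(2, 3.0, order=4)   # Gamma(2, 3) process variable
kappa_bar = mean_cumulants(kappa_x, n=5)     # subgroup mean with n = 5
skewness_bar = kappa_bar[2] / kappa_bar[1] ** 1.5
print(kappa_x, kappa_bar, round(skewness_bar, 4))
```

The skewness of the subgroup mean shrinks only at rate $1/\sqrt{n}$, which is why a Gaussian approximation can remain poor for small subgroups drawn from a skewed process.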

References

1. Bradford, P.; Miranti, P. Information in an Industrial Culture: Walter A. Shewhart and the Evolution of the Control Chart, 1917–1954. Inf. Cult. 2019, 54, 179–219.
2. Razali, N.; Wah, Y.B. Power comparisons of Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors and Anderson–Darling tests. J. Stat. Model. Anal. 2011, 2, 21–33.
3. Chang, H.J.; Ho, J.F.; Wu, C.H.; Chen, P. On sample size in using central limit theorem for gamma distribution. Inf. Manag. Sci. 2008, 19, 153–174.
4. Butler, R.W.; Wood, A.T. Saddlepoint approximation for moment generating function of truncated random variables. Ann. Stat. 2004, 32, 2712–2730.
5. Chen, B.; Matis, T.; Benneyan, J. Improved one-sided control charts for the mean of a positively skewed population using truncated saddlepoint approximations. Qual. Reliab. Eng. Int. 2011, 27, 1043–1058.
6. Wang, S. General saddlepoint approximations in the bootstrap. Prob. Stat. Lett. 1992, 13, 61–66.
7. Gillespie, C.; Renshaw, E. An improved saddlepoint approximation. Math. Biosci. 2007, 208, 359–374.
8. Fasiolo, M.; Wood, S.; Harting, F.; Bravington, M. An extended empirical saddlepoint approximation for intractable likelihoods. Electron. J. Stat. 2018, 12, 1544–1578.
9. Daniels, H.E. Saddlepoint approximations in statistics. Ann. Math. Statist. 1954, 25, 631–650.
10. Lugannani, R.; Rice, S. Saddlepoint approximation for the distribution of the sum of independent random variables. Adv. Appl. Probab. 1980, 12, 475–490.
11. Matis, T.; Braden, P.; Chen, B.; Benneyan, J. Truncated Saddlepoint Approximation: ReExpressed Fourth Cumulant. 2022. Available online: https://rpubs.com/tmatis/ReExpressedFourth/ (accessed on 10 August 2021).
12. Woodall, W. The Use of Control Charts in Health-Care and Public-Health Surveillance. J. Qual. Technol. 2006, 38, 89–104.
13. Finson, L.; Finson, K.; Bliersbach, C. The Use of Control Charts to Improve Healthcare. J. Healthc. Qual. 1993, 15, 9–23.
14. Ding, R.; McCarthy, M.; Desmond, J.; Lee, J.; Aronsky, D.; Zeger, S. Characterizing Waiting Room Time, Treatment Time, and Boarding Time in the Emergency Department Using Quantile Regression. Acad. Emerg. Med. 2010, 17, 813–823.
15. Saralees, N. Some gamma distributions. Stat. A J. Theor. Appl. Stat. 2008, 42, 77–94.
16. Chakraborti, S. Nonparametric (distribution-free) quality control charts. In Encyclopedia of Statistical Sciences; Wiley Online Library: Hoboken, NJ, USA, 2011; pp. 1–27.
17. Chakraborti, S.; Qiu, P.; Mukherjee, A. Editorial to the special issue: Nonparametric statistical process control charts. Qual. Reliab. Eng. Int. 2015, 31, 1–2.
18. Qiu, P.; Li, Z. On nonparametric statistical process control of univariate processes. Technometrics 2011, 53, 390–405.
19. Chen, B.; Matis, T.; Benneyan, J. Computing exact bundle compliance control charts via probability generating functions. Health Care Manag. Sci. 2016, 19, 103–110.

Figure 1. (a) Histogram of 125 subgroup means of size n = 4 corresponding to m = 500 pre-samples from a lognormal(μ = 5.1, σ = 0.75) distributed process with approximate Gaussian and fourth re-expressed density functions fitted with sample cumulants. (b) Corresponding $\bar{X}$-chart displaying subgroup means and approximate control limits. Gaussian (blue) and re-expressed fourth (red) approximations. Sampled data consist of m = 500 data points, yielding 125 subgroup means of size n = 4.

Figure 2. $Gamma(γ, \frac{1}{λ} = 3)$ probability distributions of varied shape and skew used in performance analysis and representing the range of many control chart applications. $γ = 1$ (blue), $γ = 2$ (red), and $γ = 4$ (yellow).

Figure 3. (a) Plot of the root $θ_{0}$ of a TS($κ_{4}^{*}$) approximation of $\bar{X}$ for subgroups of size $n = 10$ for an $X \sim Gamma(2, 3)$ process variable. (b) Plot of the corresponding TS($κ_{4}^{*}$) cumulative distribution function.

Figure 4. Operating characteristic (OC) curves of an $\bar{X}$ chart with subgroups of size n for a $Gamma(γ, \frac{1}{λ} = 3)$ population distribution with shifted shape parameter $γ = γ_{1}$, where $γ = γ_{0}$ is the in-control case.

Figure 5. Operating characteristic (OC) curves of an $\bar{X}$ chart with subgroups of size n for a $Gamma(γ, \frac{1}{λ})$ population distribution with shifted scale parameter $\frac{1}{λ} = {(\frac{1}{λ})}_{1}$, where $\frac{1}{λ} = {(\frac{1}{λ})}_{0} = 3$ is the in-control case.

Table 1. Upper and lower control limits and in-control average run lengths for an $\bar{X}$-chart on a $Gamma(γ, \frac{1}{λ})$ distributed process with subgroups of size n and desired type-1 error of $α = 0.0027$.

	Gamma (1,3)
	n = 5				n = 10				n = 15
	ARL↓	LCL	UCL	ARL↑	ARL↓	LCL	UCL	ARL↑	ARL↓	LCL	UCL	ARL↑
Exact	740.74	0.48	8.64	740.74	740.74	0.93	6.65	740.74	740.74	1.20	5.86	740.74
Gaussian	∞	(·)	7.03	107.41	4.56 × 10⁹	0.15	5.85	148.85	3.53 × 10⁵	0.68	5.23	179.09
TS($κ_{4}^{*}$)	75.55	0.82	8.22	441.50	42.11	1.42	6.48	520.29	46.83	1.64	5.76	562.75
TS($κ_{5}^{*}, κ_{6}^{*}$)	4.39 × 10⁶	0.24	8.52	638.16	1.42 × 10⁴	0.631	6.62	668.23	2595.35	1.05	5.84	704.23
	Gamma (2,3)
	n = 5				n = 10				n = 15
	ARL↓	LCL	UCL	ARL↑	ARL↓	LCL	UCL	ARL↑	ARL↓	LCL	UCL	ARL↑
Exact	740.74	1.85	13.31	740.74	740.74	2.76	10.83	740.74	740.74	3.24	9.82	740.74
Gaussian	4.56 × 10⁹	0.31	11.69	148.85	5.20 × 10⁴	1.98	10.03	203.11	1.25 × 10⁴	2.71	9.29	240.16
TS($κ_{4}^{*}$)	42.10	2.85	12.96	520.02	58.01	3.52	10.69	590.32	96.66	3.76	9.74	624.22
TS($κ_{5}^{*}, κ_{6}^{*}$)	1.27 × 10⁴	1.28	13.23	683.06	1217.24	2.64	10.81	711.24	926.45	3.19	9.81	722.54
	Gamma (4,3)
	n = 5				n = 10				n = 15
	ARL↓	LCL	UCL	ARL↑	ARL↓	LCL	UCL	ARL↑	ARL↓	LCL	UCL	ARL↑
Exact	740.74	5.52	21.66	740.74	740.74	7.10	18.50	740.74	740.74	7.88	17.18	740.74
Gaussian	5.20 × 10⁴	3.95	20.05	203.11	6765.87	6.31	17.69	268.31	3755.31	7.35	16.65	309.84
TS($κ_{4}^{*}$)	58.00	7.04	21.39	589.97	167.48	7.76	18.39	644.75	568.14	7.99	17.12	669.34
TS($κ_{5}^{*}, κ_{6}^{*}$)	2832.83	4.93	21.64	726.74	841.35	7.05	18.48	726.74	773.22	7.87	17.18	732.06

(·) = observation outside domain of definition; ARL↓ = 1/Pr{$\bar{X}$ < LCL} for $\bar{X} \sim Gamma(nγ, 1/nλ)$; ARL↑ = 1/Pr{$\bar{X}$ > UCL} for $\bar{X} \sim Gamma(nγ, 1/nλ)$.
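The Exact row of Table 1 can be approximated with a short sketch that inverts the distribution of the subgroup mean by bisection. This is an illustration under stated assumptions, not the authors' implementation: the shape $γ$ is taken to be an integer so that the Erlang closed form applies, the type-1 error $α = 0.0027$ is split equally between the two tails, and the names `erlang_cdf`, `erlang_quantile`, and `exact_limits` are hypothetical.

```python
import math


def erlang_cdf(x, shape, scale):
    """Pr{X <= x} for a Gamma variable with INTEGER shape (Erlang)."""
    if x <= 0:
        return 0.0
    z = x / scale
    term, total = 1.0, 1.0
    for k in range(1, shape):  # sum_{k=0}^{shape-1} z^k / k!
        term *= z / k
        total += term
    return 1.0 - math.exp(-z) * total


def erlang_quantile(p, shape, scale, tol=1e-10):
    """Solve erlang_cdf(x) = p for x by bisection."""
    lo, hi = 0.0, shape * scale
    while erlang_cdf(hi, shape, scale) < p:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, shape, scale) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_limits(gamma_shape, scale, n, alpha=0.0027):
    """Probability limits for the subgroup mean, with alpha/2 in each tail (assumed)."""
    shape_bar, scale_bar = n * gamma_shape, scale / n  # Xbar ~ Gamma(n*shape, scale/n)
    lcl = erlang_quantile(alpha / 2.0, shape_bar, scale_bar)
    ucl = erlang_quantile(1.0 - alpha / 2.0, shape_bar, scale_bar)
    return lcl, ucl


for g in (1, 2, 4):
    lcl, ucl = exact_limits(g, 3.0, n=5)
    print(f"Gamma({g},3), n=5: LCL={lcl:.2f}, UCL={ucl:.2f}")
```

Bisection is used here only because it needs nothing beyond the standard library; any root finder or a library quantile function would serve equally well.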

Table 2. In- and out-of-control ARLs for a $Gamma(γ = 2, \frac{1}{λ} = 3)$ distributed process with subgroups of size $n = 5$.

	$1/λ$	Exact	Gaussian	TS($κ_{4}^{*}$)	TS($κ_{5}^{*}, κ_{6}^{*}$)
ARL↓	1	2.24	8.13 × 10⁴	1.11	8.76
	2	49.48	8.13 × 10⁴	5.46	567.57
	3	740.74	4.56 × 10⁹	42.10	1.27 × 10⁴
ARL↑	3	740.74	148.58	520.02	683.06
	4	31.69	12.00	25.51	30.15
	5	6.87	3.70	5.94	6.61

ARL↓ = 1/Pr{$\bar{X}$ < LCL} for $\bar{X} \sim Gamma(nγ, 1/nλ)$; ARL↑ = 1/Pr{$\bar{X}$ > UCL} for $\bar{X} \sim Gamma(nγ, 1/nλ)$.
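The out-of-control ARLs in the Exact column of Table 2 follow from holding the control limits fixed and re-evaluating the tail probabilities under a shifted scale parameter. The sketch below illustrates this with the rounded Exact limits reported in Table 1 for $Gamma(2, 3)$ and $n = 5$ (LCL = 1.85, UCL = 13.31), so its output agrees with the table only up to that rounding. The helper names are hypothetical, and an integer shape is assumed so that the Erlang closed form applies.

```python
import math


def erlang_sf(x, shape, scale):
    """Pr{X > x} for a Gamma variable with INTEGER shape (Erlang)."""
    if x <= 0:
        return 1.0
    z = x / scale
    term, total = 1.0, 1.0
    for k in range(1, shape):  # sum_{k=0}^{shape-1} z^k / k!
        term *= z / k
        total += term
    return math.exp(-z) * total


def shifted_scale_arls(lcl, ucl, gamma_shape, n, scales):
    """Map each process scale 1/lambda to (ARL_down, ARL_up) for fixed control limits."""
    result = {}
    for scale in scales:
        shape_bar, scale_bar = n * gamma_shape, scale / n
        p_low = 1.0 - erlang_sf(lcl, shape_bar, scale_bar)   # Pr{Xbar < LCL}
        p_high = erlang_sf(ucl, shape_bar, scale_bar)        # Pr{Xbar > UCL}
        result[scale] = (1.0 / p_low, 1.0 / p_high)
    return result


# Rounded exact limits taken from Table 1 (Gamma(2,3), n = 5)
arls = shifted_scale_arls(lcl=1.85, ucl=13.31, gamma_shape=2, n=5,
                          scales=(1.0, 2.0, 3.0, 4.0, 5.0))
for scale, (down, up) in arls.items():
    print(f"1/lambda={scale:.0f}: ARL_down={down:.2f}, ARL_up={up:.2f}")
```

Downward shifts of the scale are detected through the LCL (small ARL↓) and upward shifts through the UCL (small ARL↑), which is why Table 2 reports each ARL only on the side of the shift it is meant to detect.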

Table 3. In-control ARLs for the LCL of an $\bar{X}$-chart on a $Gamma(γ, 1/λ)$ distributed process.

		Gamma (1,3)
		n = 5			n = 10			n = 15
	m =	100	1000	10,000	100	1000	10,000	100	1000	10,000
Exact		740.74	740.74	740.74	740.74	740.74	740.74	740.74	740.74	740.74
Gaussian	Known	·	·	·	4.56 × 10⁹	4.56 × 10⁹	4.56 × 10⁹	3.53 × 10⁵	3.53 × 10⁵	3.53 × 10⁵
	Sim(Med)	·	·	·	·	·	·	2.44 × 10⁵	3.29 × 10⁵	3.51 × 10⁵
	Sim(IQR)	·	·	·	·	·	·	4.24 × 10⁵	6.90 × 10⁵	1.95 × 10⁵
TS($κ_{4}^{*}$)	Known	75.55	75.55	75.55	42.11	42.11	42.11	46.83	46.83	46.83
	Sim(Med)	78.15	75.64	75.78	66.25	42.21	42.58	87.89	50.98	47.59
	Sim(IQR)	137.89	39.62	13.68	85.01	14.24	2.67	127.40	20.59	5.69
TS($κ_{5}^{*}, κ_{6}^{*}$)	Known	4.39 × 10⁶	4.39 × 10⁶	4.39 × 10⁶	1.42 × 10⁴	1.42 × 10⁴	1.42 × 10⁴	2595.35	2595.35	2595.35
	Sim(Med)	·	·	·	1528.59	7150.84	1.28 × 10⁴	1163.14	1917.10	2497.07
	Sim(IQR)	·	·	·	7930.54	1.91 × 10⁴	1.21 × 10⁴	3717.40	2622.79	1092.10
		Gamma (2,3)
		n = 5			n = 10			n = 15
	m =	100	1000	10,000	100	1000	10,000	100	1000	10,000
Exact		740.74	740.74	740.74	740.74	740.74	740.74	740.74	740.74	740.74
Gaussian	Known	4.56 × 10⁹	4.56 × 10⁹	4.56 × 10⁹	5.20 × 10⁴	5.20 × 10⁴	5.20 × 10⁴	1.25 × 10⁴	1.25 × 10⁴	1.25 × 10⁴
	Sim(Med)	·	·	·	3.78 × 10⁴	5.00 × 10⁴	5.29 × 10⁴	1.02 × 10⁴	1.23 × 10⁴	1.27 × 10⁴
	Sim(IQR)	·	·	·	3.20 × 10⁵	6.15 × 10⁴	1.81 × 10⁴	3.26 × 10⁴	1.24 × 10⁴	1092.90
TS($κ_{4}^{*}$)	Known	42.10	42.10	42.10	58.01	58.01	58.01	96.66	96.66	96.66
	Sim(Med)	65.80	44.79	42.52	105.88	61.57	59.86	205.69	107.85	99.02
	Sim(IQR)	67.39	11.24	2.71	231.89	28.00	10.98	445.78	74.53	23.44
TS($κ_{5}^{*}, κ_{6}^{*}$)	Known	1.27 × 10⁴	1.27 × 10⁴	1.27 × 10⁴	1217.24	1217.24	1217.24	926.45	926.45	926.45
	Sim(Med)	1112.05	3513.44	3681.22	511.59	1019.68	1179.71	733.28	851.65	917.78
	Sim(IQR)	1.7 × 10⁴	1.10 × 10⁴	5449.54	1069.44	725.47	334.16	1713.71	598.51	203.71
		Gamma (4,3)
		n = 5			n = 10			n = 15
	m =	100	1000	10,000	100	1000	10,000	100	1000	10,000
Exact		740.74	740.74	740.74	740.74	740.74	740.74	740.74	740.74	740.74
Gaussian	Known	5.20 × 10⁴	5.20 × 10⁴	5.20 × 10⁴	6765.87	6765.87	6765.87	3755.31	3755.31	3755.31
	Sim(Med)	3.80 × 10⁴	4.98 × 10⁴	5.13 × 10⁴	6776.40	6859.42	6796.99	3490.03	3787.28	3777.15
	Sim(IQR)	2.15 × 10⁵	5.08 × 10⁴	1.58 × 10⁴	2.01 × 10⁴	4749.77	1580.69	1.13 × 10⁵	3102.52	901.67
TS($κ_{4}^{*}$)	Known	58.00	58.00	58.00	167.48	167.48	167.48	568.14	568.14	568.14
	Sim(Med)	103.9	62.28	58.56	307.74	190.32	168.54	467.25	509.13	582.37
	Sim(IQR)	163.83	29.53	8.34	730.82	196.32	44.46	1089.74	428.27	255.89
TS($κ_{5}^{*}, κ_{6}^{*}$)	Known	2832.83	2832.83	2832.83	841.35	841.35	841.35	773.22	773.22	773.22
	Sim(Med)	1493.64	2452.38	2850.68	568.38	681.44	1014.18	797.73	733.89	748.5
	Sim(IQR)	8689.35	3560.64	1216.48	1229.17	991.06	484.76	1710.11	261.54	123.87

· = observation infinite or undefined; n = subgroup size, m = number of pre-samples from a $Gamma(γ, 1/λ)$ distribution; Exact ARL = 1/0.00135 = 740.74, Calculated ARL = 1/Pr($\bar{X}$ < LCL) for $\bar{X} \sim Gamma(γn, 1/nλ)$; Simulated results based on J = 1000 replications, Med = median, IQR = interquartile range.

Table 4. In-control ARLs for the UCL of an $\bar{X}$-chart on a $Gamma(γ, 1/λ)$ distributed process.

		Gamma (1,3)
		n = 5			n = 10			n = 15
	m =	100	1000	10,000	100	1000	10,000	100	1000	10,000
Exact		740.74	740.74	740.74	740.74	740.74	740.74	740.74	740.74	740.74
Gaussian	Known	107.41	107.41	107.41	148.85	148.85	148.85	179.09	179.09	179.09
	Sim(Med)	93.8	103.72	107.2	131.49	148.13	147.73	166.51	174.83	177.68
	Sim(IQR)	89.33	45.7	13.05	243.43	83.48	24.57	336.02	144.55	33.91
TS($κ_{4}^{*}$)	Known	441.50	441.50	441.50	520.29	520.29	520.29	562.75	562.75	562.75
	Sim(Med)	313.19	423.55	438.88	368.39	401.2	523.16	413.22	527.98	566.73
	Sim(IQR)	631.07	267.62	81.67	901.68	371.98	52.63	1296.71	441.37	56.13
TS($κ_{5}^{*}, κ_{6}^{*}$)	Known	638.16	638.16	638.16	668.23	668.23	668.23	704.23	704.23	704.23
	Sim(Med)	218.77	298.26	730.46	419.02	659.20	681.89	485.79	709.72	698.32
	Sim(IQR)	2261.47	153.95	106.18	1144.96	498.79	177.53	1393.70	601.20	191.97
		Gamma (2,3)
		n = 5			n = 10			n = 15
	m =	100	1000	10,000	100	1000	10,000	100	1000	10,000
Exact		740.74	740.74	740.74	740.74	740.74	740.74	740.74	740.74	740.74
Gaussian	Known	148.85	148.85	148.85	203.11	203.11	203.11	240.16	240.16	240.16
	Sim(Med)	144.83	146.9	148.71	180.25	204.12	203.29	209.89	241.84	242.1
	Sim(IQR)	183.8	55.16	18.7	347.12	101.2	32.13	491.22	149.39	48.1
TS($κ_{4}^{*}$)	Known	520.02	520.02	520.02	590.32	590.32	590.32	624.22	624.22	624.22
	Sim(Med)	389.56	493.58	514.8	476.2	576.53	590.02	514.67	628.93	626.91
	Sim(IQR)	745.35	285.45	89.21	1103.42	386.10	61.56	1830.20	470.02	65.80
TS($κ_{5}^{*}, κ_{6}^{*}$)	Known	683.06	683.06	683.06	711.24	711.24	711.24	722.54	722.54	722.54
	Sim(Med)	641.03	752.44	702.74	330.69	694.93	716.33	530.79	666.00	722.54
	Sim(IQR)	1518.73	474.39	127.12	626.62	501.72	170.65	1916.38	506.81	174.22
		Gamma (4,3)
		n = 5			n = 10			n = 15
	m =	100	1000	10,000	100	1000	10,000	100	1000	10,000
Exact		740.74	740.74	740.74	740.74	740.74	740.74	740.74	740.74	740.74
Gaussian	Known	203.11	203.11	203.11	268.31	268.31	268.31	309.84	309.84	309.84
	Sim(Med)	195.1	205.38	203.75	245.46	272.38	266.95	292.83	303.21	307.41
	Sim(IQR)	281.58	75.88	23.93	464.72	138.95	46.10	656.67	198.34	60.53
TS($κ_{4}^{*}$)	Known	589.97	589.97	589.97	644.75	644.75	644.75	669.34	669.34	669.34
	Sim(Med)	471.14	578.78	586.33	567.05	641.44	645.16	568.67	652.74	666.89
	Sim(IQR)	764.32	322.57	94.12	1285.5	396.98	140.40	1546.21	463.59	157.92
TS($κ_{5}^{*}, κ_{6}^{*}$)	Known	726.74	726.74	726.74	726.74	726.74	726.74	732.06	732.06	732.06
	Sim(Med)	507.10	687.29	727.01	497.51	568.71	811.31	453.82	711.74	732.60
	Sim(IQR)	1026.60	453.33	139.96	1325.29	341.06	119.31	1103.56	545.36	166.51

n = subgroup size, m = number of pre-samples from a $Gamma(γ, 1/λ)$ distribution; Exact ARL = 1/0.00135 = 740.74, Calculated ARL = 1/Pr($\bar{X}$ > UCL) for $\bar{X} \sim Gamma(γn, 1/nλ)$; Simulated results based on J = 1000 replications, Med = median, IQR = interquartile range.
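The simulated entries can be imitated for the Gaussian UCL with a rough sketch that draws m pre-samples, estimates the mean and standard deviation, sets a three-sigma UCL, and evaluates the resulting ARL under the true distribution of the subgroup mean. Several details are assumptions rather than facts from the paper: the use of the unbiased sample standard deviation, the quartile convention of `statistics.quantiles`, the reduced number of replications (200 instead of J = 1000, to keep the run short), and the hypothetical function names. An integer shape is again assumed so that the Erlang closed form applies.

```python
import math
import random
import statistics


def erlang_sf(x, shape, scale):
    """Pr{X > x} for a Gamma variable with INTEGER shape (Erlang)."""
    if x <= 0:
        return 1.0
    z = x / scale
    term, total = 1.0, 1.0
    for k in range(1, shape):  # sum_{k=0}^{shape-1} z^k / k!
        term *= z / k
        total += term
    return math.exp(-z) * total


def simulate_gaussian_ucl_arl(gamma_shape, scale, n, m, reps, seed=0):
    """Median and IQR of the in-control upper ARL when the Gaussian UCL is estimated."""
    rng = random.Random(seed)
    arls = []
    for _ in range(reps):
        # random.gammavariate takes (shape, scale) as its two arguments.
        sample = [rng.gammavariate(gamma_shape, scale) for _ in range(m)]
        mean = statistics.fmean(sample)
        sd = statistics.stdev(sample)  # assumed estimator; the paper does not specify
        ucl = mean + 3.0 * sd / math.sqrt(n)
        arls.append(1.0 / erlang_sf(ucl, n * gamma_shape, scale / n))
    q1, med, q3 = statistics.quantiles(arls, n=4)
    return med, q3 - q1


med, iqr = simulate_gaussian_ucl_arl(2, 3.0, n=5, m=1000, reps=200, seed=1)
print(f"Sim(Med)={med:.2f}, Sim(IQR)={iqr:.2f}")
```

Because the seed, replication count, and estimator choices differ from whatever the authors used, the output should be compared with the Gaussian Sim(Med) and Sim(IQR) entries of Table 4 only in order of magnitude.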

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

