Review

Setting Alarm Thresholds in Measurements with Systematic and Random Errors

1
SGIM/Nuclear Fuel Cycle Information Analysis, International Atomic Energy Agency, A1220 Vienna, Austria
2
Statistics Department, Colorado State University, Fort Collins, CO 80523, USA
*
Author to whom correspondence should be addressed.
Stats 2019, 2(2), 259-271; https://doi.org/10.3390/stats2020020
Submission received: 14 March 2019 / Revised: 26 April 2019 / Accepted: 28 April 2019 / Published: 7 May 2019

Abstract

For statistical evaluations that involve within-group and between-group variance components (denoted $\sigma_W^2$ and $\sigma_B^2$, respectively), there is sometimes a need to monitor for a shift in the mean of time-ordered data. Uncertainty in the estimates $\hat{\sigma}_W^2$ and $\hat{\sigma}_B^2$ should be accounted for when setting alarm thresholds to check for a mean shift, as both $\sigma_W^2$ and $\sigma_B^2$ must be estimated. One-way random effects analysis of variance (ANOVA) is the main tool for analysing such grouped data. Nearly all ANOVA applications assume that both the within-group and between-group components are normally distributed. However, depending on the application, the within-group and/or between-group probability distributions might not be well approximated by a normal distribution. This review paper uses the same example throughout to illustrate the possible approaches to setting alarm limits in grouped data, depending on what is assumed about the within-group and between-group probability distributions. The example involves measurement data, for which systematic errors are assumed to remain constant within a group and to change between groups. The false alarm probability depends on the assumed measurement error model and its within-group and between-group error variances, which are estimated using historical data, usually with ample within-group data but with a small number of groups (three to 10 typically). This paper illustrates the parametric, semi-parametric, and non-parametric options for setting alarm thresholds in such grouped data.

1. Introduction

Many applications involve testing for a shift in the mean of a probability distribution for a random quantity. Any such test will have a false alarm probability (FAP) and a failure-to-detect-the-shift probability. This review paper uses the same example application throughout to illustrate the possible approaches to setting alarm thresholds in measurement data that have both systematic and random error components. The example is the International Atomic Energy Agency (IAEA) verification data that are collected to monitor for possible nuclear material diversion [1,2]. Such verification data from IAEA inspections often consist of paired data (usually operators’ declarations and inspectors’ verification results) that are analyzed to detect significant differences. Any significant difference could arise due to a problem with the operator and/or inspector measurement systems, or due to the operator falsifying the data in an attempt to mask diversion; such a falsification would cause a mean shift between the operator declaration and the corresponding inspector measurement. Paired data from past inspections are used to estimate the alarm limits, and each inspection period is regarded as a group within which the measurements are assumed to have the same systematic error due to calibration and other effects [1,2,3]. The corresponding FAP depends on the assumed measurement error model and its random (within-group) and systematic (between-group) error variances, which are estimated using data from previous inspections [3].
This paper reviews the parametric, semi-parametric, and non-parametric options for setting alarm thresholds in such grouped data. If both the within-group and between-group measurement errors have approximately normal distributions, then a parametric option that involves tolerance intervals [4,5,6,7,8,9,10,11] for one-way ANOVA can be used. If either or both error distributions are not close to normal, then a semi-parametric method that is based on a Dirichlet process mixture [12] can be applied. A non-parametric method [13] could be used if there is enough data.

2. Materials and Methods

An effective measurement error model for operator and inspector data must account for variation within and between groups, where a group is, for example, a calibration period, inspection period, item, or laboratory. In this paper, a group is an inspection period. A typical model that assumes additive errors for the inspector (I) (and similarly for the operator O) is
$$I_{jk} = \mu_{jk} + S_{Ij} + R_{Ijk} \qquad (1)$$
where, for item $k$ (from 1 to $n$) in group $j$ (from 1 to $g$), $I_{jk}$ is the inspector’s measured value of the true but unknown value $\mu_{jk}$, $R_{Ijk} \sim N(0, \sigma_{RI}^2)$ is a random error, and $S_{Ij} \sim N(0, \sigma_{SI}^2)$ is a short-term systematic error that arises due to metrology changes, the most important of which is recalibration between inspection periods [1,2,3,14]. For a fixed value of $\mu_{jk}$, the total variance of the inspector measurement is $\sigma_I^2 = \sigma_{SI}^2 + \sigma_{RI}^2$.
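As a minimal sketch of Equation (1), the following Python snippet simulates grouped inspector measurements and checks the variance decomposition numerically. The function name and the parameter values ($\mu = 100$, $\sigma_{SI} = 0.6$, $\sigma_{RI} = 0.8$) are hypothetical choices for illustration only; the analyses reported in this paper were done in R, not with this code.

```python
import random
from statistics import fmean

def simulate_inspector(mu, sigma_s, sigma_r, g, n, rng):
    """Return g groups of n values I_jk = mu + S_Ij + R_Ijk (Equation (1))."""
    groups = []
    for _ in range(g):
        s_j = rng.gauss(0.0, sigma_s)  # systematic error: drawn once per group
        groups.append([mu + s_j + rng.gauss(0.0, sigma_r) for _ in range(n)])
    return groups

rng = random.Random(1)
# Hypothetical inputs: sigma_S = 0.6 and sigma_R = 0.8, so sigma_I^2 should be near 1.0
groups = simulate_inspector(mu=100.0, sigma_s=0.6, sigma_r=0.8, g=20000, n=2, rng=rng)

values = [x for grp in groups for x in grp]
mean_all = fmean(values)
total_variance = fmean((x - mean_all) ** 2 for x in values)

# Two items in the same group share S_Ij, so their covariance estimates sigma_S^2
shared_covariance = fmean((a - mean_all) * (b - mean_all) for a, b in groups)

print(round(total_variance, 2), round(shared_covariance, 2))
```

The covariance between two items of the same group is what makes the error "systematic": it does not average out within an inspection period.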
The measurement error model in Equation (1) sets the stage for applying ANOVA with random effects [2,15,16,17,18,19,20]. Neither $R_{Ijk}$ nor $S_{Ij}$ is observable. If the errors tend to scale with the true value, then a typical model for multiplicative errors (with relative standard deviations (RSDs) $\delta_S$ and $\delta_R$) is
$$I_{jk} = \mu_{jk}\,(1 + \tilde{S}_{Ij} + \tilde{R}_{Ijk}) \qquad (2)$$
where $\tilde{S}_{Ij} \sim N(0, \delta_{SI}^2)$ and $\tilde{R}_{Ijk} \sim N(0, \delta_{RI}^2)$. As explained below, for a technical reason, the data model in Equation (2) is slightly modified from normal distributions to truncated normal distributions in the IAEA application. Let $\delta_R^2 = \delta_{RO}^2 + \delta_{RI}^2$ and $\delta_S^2 = \delta_{SO}^2 + \delta_{SI}^2$. For a fixed value of $\mu_{jk}$, the total variance of the inspector measurement is $\sigma_I^2 = \sigma_{SI}^2 + \sigma_{RI}^2 = \mu_{jk}^2(\delta_{SI}^2 + \delta_{RI}^2)$. Let $O_{jk} = \mu_{jk}(1 + \tilde{S}_{Oj} + \tilde{R}_{Ojk})$ and $I_{jk} = \mu_{jk}(1 + \tilde{S}_{Ij} + \tilde{R}_{Ijk})$. Then, the assumed model for the relative difference between operator and inspector is
$$\frac{\mu_{jk}(1 + \tilde{S}_{Oj} + \tilde{R}_{Ojk}) - \mu_{jk}(1 + \tilde{S}_{Ij} + \tilde{R}_{Ijk})}{\mu_{jk}} = \tilde{S}_j + \tilde{R}_{jk} \qquad (3)$$
where, for the operator’s declared value of item $k$ from group $j$, $\tilde{R}_{jk} = \tilde{R}_{Ojk} - \tilde{R}_{Ijk} \sim N(0, \delta_R^2)$ is the net random error and $\tilde{S}_j = \tilde{S}_{Oj} - \tilde{S}_{Ij} \sim N(0, \delta_S^2)$ is the net short-term systematic error. In practice, assuming no data falsification by the operator, Equation (3) can be calculated using the relative differences $d_{jk} = (o_{jk} - i_{jk})/o_{jk}$, where $o_{jk}$ is used in the denominator to estimate $\mu_{jk}$ because, typically, $\delta_{SO}^2 + \delta_{RO}^2 < \delta_{SI}^2 + \delta_{RI}^2$, with $\delta_{TO} = \sqrt{\delta_{SO}^2 + \delta_{RO}^2}$ always being very small, 0.02 or less, in the IAEA application. Formally, a ratio of normal random variables has infinite variance [3,21]. To define a ratio that has finite variance, a truncated normal can be used as the data model in Equation (2) for $o_{jk}$ in $d_{jk} = 1 - i_{jk}/o_{jk}$, which is equal in distribution to $1 - \frac{\mu_{jk}(1 + \delta_{TI} z_1)}{\mu_{jk}(1 + \delta_{TO} z_2)} = 1 - R$, which involves a ratio $R = (1 + \delta_{TI} z_1)/(1 + \delta_{TO} z_2)$ of the independent normal random variables $z_1$ and $z_2$ (for the case of one measurement per group; multiple measurements per group are treated similarly). Note that the unknown true value $\mu_{jk}$ cancels in $R$. Specifically, replace $\delta_{TO} z_2$ with, for example, the conditional normal random variable $\delta_{TO}\tilde{z}_2 = \delta_{TO} z_2 \mid \{\delta_{TO} z_2 > -0.5\}$. The value −0.5 is an arbitrary but reasonable truncation value that is at least 25 standard deviations from the mean (because $\delta_{TO} \le 0.02$), and so such truncation would only take effect for a random realized value that is 25 or more standard deviations $\delta_{TO}$ from the mean (an event that occurs with almost zero probability). The modified ratio then has finite moments of all orders, because the numerator has a normal distribution and the denominator is an independent truncated normal that is no smaller than 0.5 (so the mean of the ratio cannot exceed twice the mean of the numerator, and similarly for other moments).
Therefore, because $\delta_{TO} \le 0.02$ and $\delta_{TI} \le 0.05$ (typically), the distribution of the values $d_{jk} = (o_{jk} - i_{jk})/o_{jk}$ is guaranteed to have finite moments when the data model for $o_{jk}$ is a truncated normal, such as the one just described. In addition, provided $\delta_{TO} \le 0.02$ and $\delta_{TI} \le 0.05$, the distribution of the truncated version of the ratio $d_{jk} = (o_{jk} - i_{jk})/o_{jk}$ is extremely close to a normal distribution (see Supplementary 1, and also see Section 5 and Section 6 for cases in which the distribution of $d_{jk}$ does not appear to be approximately normal). Additionally, the expression for the approximate variance $\delta_d^2 = \delta_{SO}^2 + \delta_{RO}^2 + \delta_{SI}^2 + \delta_{RI}^2$ of $d_{\mathrm{rel}} = (O - I)/O$ arising from linear error variance propagation is accurate for quite large $\delta_T$, up to approximately $\delta_T = \sqrt{\delta_{TO}^2 + \delta_{TI}^2} = 0.20$, where $\delta_{TO} = \sqrt{\delta_{RO}^2 + \delta_{SO}^2}$ (as confirmed by simulations in R [20]; see also [3,21]). In addition, the simulation-based tolerance intervals in Section 4, Section 5 and Section 6 are constructed by simulating the ratio of a normal to a truncated normal, so all of the simulation results are accurate for $R = (1 + \delta_{TI} z_1)/(1 + \delta_{TO}\tilde{z}_2)$ (with a truncated data model for the denominator, as described above) to within the very small simulation error arising from performing a finite but large number of simulations. To summarize this issue regarding a ratio of normal random variables, the parametric option using simulation assumes, as discussed, that the truncated ratio $R = (1 + \delta_{TI} z_1)/(1 + \delta_{TO}\tilde{z}_2)$ involves a normal distribution for the numerator and a truncated normal for the denominator; in many real data sets, this assumption is reasonable on the basis of normality checks.
However, because $R = (1 + \delta_{TI} z_1)/(1 + \delta_{TO}\tilde{z}_2)$ might not always appear to be approximately normally distributed, this paper also considers the semi-parametric and non-parametric alternatives for calculating alarm thresholds for the truncated version of $d_{jk}$ in Section 5 and Section 6, respectively.
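The truncated-ratio construction discussed above can be sketched in a few lines of Python. This is an illustrative simulation only (the paper's own simulations were done in R); the helper names are hypothetical, and the values $\delta_{TO} = 0.02$ and $\delta_{TI} = 0.05$ are taken from the typical upper values quoted in the text. It compares the simulated standard deviation of $d = 1 - R$ with the linear propagation value $\sqrt{\delta_{TO}^2 + \delta_{TI}^2}$.

```python
import random
from statistics import fmean

def truncated_gauss(sd, lower, rng):
    """Draw from N(0, sd^2) conditioned on exceeding `lower` (simple rejection)."""
    while True:
        x = rng.gauss(0.0, sd)
        if x > lower:
            return x

def relative_difference(delta_to, delta_ti, rng):
    """d = 1 - (1 + delta_TI*z1) / (1 + delta_TO*z2~); mu_jk cancels in the ratio."""
    inspector = 1.0 + rng.gauss(0.0, delta_ti)
    operator = 1.0 + truncated_gauss(delta_to, -0.5, rng)  # denominator stays above 0.5
    return 1.0 - inspector / operator

rng = random.Random(2)
delta_to, delta_ti = 0.02, 0.05
d = [relative_difference(delta_to, delta_ti, rng) for _ in range(50000)]

mean_d = fmean(d)
sd_simulated = fmean((x - mean_d) ** 2 for x in d) ** 0.5
sd_propagated = (delta_to ** 2 + delta_ti ** 2) ** 0.5

print(round(sd_simulated, 4), round(sd_propagated, 4))
```

With RSDs this small the truncation essentially never triggers, so its only role is to guarantee finite moments.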
In some applications, all four error variance components must be estimated [1,2,3,17,18,19], but in this application, only the aggregate variances $\delta_S^2$ and $\delta_R^2$ need to be estimated. This paper’s focus is on the total relative variance, $\delta_T^2 = \delta_S^2 + \delta_R^2$, because $d_{jk} \sim N(0, \delta_T^2)$ (approximately, due to using $d_{jk} = (o_{jk} - i_{jk})/o_{jk}$ rather than $d_{jk} = (o_{jk} - i_{jk})/\mu_{jk}$). The estimated variances $\hat{\delta}_R^2$ and $\hat{\delta}_S^2$ are used to compute $\hat{\delta}_T^2 = \hat{\delta}_S^2 + \hat{\delta}_R^2$, so that $\hat{\delta}_T$ can be used to set an alarm threshold for future $d_{\mathrm{rel}}$ values. Specifically, for future values of the operator-inspector difference statistic $d_i = (o_i - i_i)/o_i$, if $|d_i| > k\hat{\delta}_T$ (in two-sided testing), then the $i$-th item selected for verification leads to an alarm, where $\hat{\delta}_T = \sqrt{\hat{\delta}_R^2 + \hat{\delta}_S^2}$ (with $\delta_T$ the total RSD, $\delta_S$ the between-period short-term systematic error RSD, and $\delta_R$ the within-period reproducibility), and $k = 3$ is a common current choice that corresponds to a small $\alpha$ of approximately 0.003 if $\hat{\delta}_T = \delta_T$. Therefore, the focus of this paper is a one-way random effects ANOVA [16]. Regarding jargon, note that the short-term systematic errors are fixed within an inspection period, but they are random across periods, so this is called a random effects ANOVA model [15]. Due to the estimation error in $\hat{\delta}_T$, the actual FAP can be considerably larger than 0.05, as shown in Section 4.
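The alarm rule and its nominal false alarm probability can be illustrated with a short Python sketch. It assumes that $d$ is exactly $N(0, \delta_T^2)$ and that $\hat{\delta}_T = \delta_T$, which is the idealized case described above; the function names and the three example $d$ values are hypothetical.

```python
from statistics import NormalDist

def nominal_fap(k, two_sided=True):
    """FAP of the rule |d| > k*delta_T (or d > k*delta_T) when d ~ N(0, delta_T^2)."""
    upper_tail = 1.0 - NormalDist().cdf(k)
    return 2.0 * upper_tail if two_sided else upper_tail

def alarm_flags(d_values, delta_t_hat, k=3.0):
    """Two-sided alarm rule: flag item i when |d_i| > k * delta_t_hat."""
    return [abs(d) > k * delta_t_hat for d in d_values]

print(round(nominal_fap(3.0), 4))  # 0.0027
# Hypothetical future differences checked against the threshold 3 * 0.012 = 0.036
print(alarm_flags([0.010, -0.050, 0.030], delta_t_hat=0.012))  # [False, True, False]
```

The nominal value only holds when the estimate equals the true RSD, which is why the remainder of the paper inflates $k$ to account for estimation error.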
The usual ANOVA decomposition is
$$\sum_{j=1}^{g}\sum_{k=1}^{n}(d_{jk} - \bar{d}_{\cdot\cdot})^2 = \sum_{j=1}^{g}\sum_{k=1}^{n}(d_{jk} - \bar{d}_{j\cdot})^2 + n\sum_{j=1}^{g}(\bar{d}_{j\cdot} - \bar{d}_{\cdot\cdot})^2 = SSW + SSB = S_W^2 + S_B^2 \qquad (4)$$
where $d_{jk} = o_{jk} - i_{jk}$ for additive models and $d_{jk} = (o_{jk} - i_{jk})/o_{jk}$ for multiplicative models, as assumed from now on (to avoid cluttering the notation, the “rel” subscript in $d_{\mathrm{rel}} = (O - I)/O$ is omitted). In Equation (4), SSW is the within-group sum of squares, and SSB is the between-group sum of squares. For simplicity, Equation (4) assumes that each group has the same number of measurements $n$. As usual in one-way (one grouping variable) random effects ANOVA, $E\left(\sum_{j=1}^{g}\sum_{k=1}^{n}(d_{jk} - \bar{d}_{j\cdot})^2 / (g(n-1))\right) = \delta_R^2$ and $E\left(\frac{n}{g-1}\sum_{j=1}^{g}(\bar{d}_{j\cdot} - \bar{d}_{\cdot\cdot})^2\right) = n\delta_S^2 + \delta_R^2$, so $\hat{\delta}_R^2 = \frac{S_W^2}{g(n-1)}$ and $\hat{\delta}_S^2 = \frac{S_B^2/(g-1) - S_W^2/(g(n-1))}{n}$.
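The estimators above translate directly into code. The following Python function is a minimal sketch for balanced data (equal $n$ in every group); the function name and the toy data are hypothetical and chosen so that the arithmetic can be checked by hand. Note that $\hat{\delta}_S^2$ can come out negative when the between-group mean square is smaller than the within-group mean square; the sketch returns the raw value and leaves any truncation at zero to the caller.

```python
def anova_variance_components(groups):
    """One-way random effects ANOVA estimates for balanced data.

    groups: list of g lists, each holding n relative differences d_jk.
    Returns (delta_r2_hat, delta_s2_hat); the second can be negative.
    """
    g = len(groups)
    n = len(groups[0])
    group_means = [sum(grp) / n for grp in groups]
    grand_mean = sum(group_means) / g

    ssw = sum((d - m) ** 2 for grp, m in zip(groups, group_means) for d in grp)
    ssb = n * sum((m - grand_mean) ** 2 for m in group_means)

    msw = ssw / (g * (n - 1))  # estimates delta_R^2
    msb = ssb / (g - 1)        # estimates n*delta_S^2 + delta_R^2
    return msw, (msb - msw) / n

# Toy data (hypothetical): SSW = 6 and SSB = 54, so the total sum of squares is 60
toy = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
delta_r2_hat, delta_s2_hat = anova_variance_components(toy)
print(delta_r2_hat, round(delta_s2_hat, 4))  # 1.0 8.6667
```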
For data sets in which $d_{jk}$ appears to have approximately a normal distribution, key properties (such as the variances, see Supplementary 1) of the estimators $\hat{\delta}_R^2$ and $\hat{\delta}_S^2$ are approximately known [15,16]. However, biased estimators can have smaller mean squared error (MSE), so other estimators should be considered. Additionally, again assuming normally distributed $R$ and $S$ values, an exact confidence interval (CI) can be constructed for $\delta_R^2$ using the $\chi^2$ distribution with $g(n-1)$ degrees of freedom, but there are only approximate methods to construct CIs for $\delta_S^2$, because $\hat{\delta}_S^2$ is distributed as a difference of two independent scaled $\chi^2$ random variables. Kraemer [11] proposes a modified CI construction method for $\delta_S^2$ and investigates impacts of non-normality.
Figure 1 plots 10 realistic simulated $d = (o - i)/o$ values in each of three groups of paired (O, I) data. All of the simulations and analyses are done in R [20]. The alarm limit on the right side of the plot in Figure 1 is estimated on the basis of the $g = 3$ sets of $n = 10$ values of $d$. Burr et al. [3] evaluate the impacts on alarm probabilities of using estimates of $\delta_R^2$ and $\delta_S^2$ instead of the true quantities. As in Figure 1, in some data sets, the between-group variation is noticeable when compared to the within-group variation. Figure 2 is the same type of plot as Figure 1, but it is for real data consisting of $n = 5$ paired comparisons in each of four inspection periods (groups). For the 30 simulated $d = (o - i)/o$ values in Figure 1, the estimated RSDs are $\hat{\delta}_R = 0.009$, $\hat{\delta}_S = 0.007$, $\hat{\delta}_T = 0.012$, and the true values used in the simulation are $\delta_R = 0.01$, $\delta_S = 0.01$, $\delta_T = 0.014$. For the 20 $d = (o - i)/o$ values in Figure 2, the estimated RSDs are $\hat{\delta}_R = 0.07$, $\hat{\delta}_S = 0.11$, $\hat{\delta}_T = 0.13$.

3. Tolerance Interval Approach to Setting Alarm Thresholds

Many readers are probably more familiar with confidence intervals (CIs) than tolerance intervals (TIs). A CI is defined as an interval that, on average, includes a model parameter, such as a population mean, with a stated confidence, often 95%. A TI is similar to a CI, but it is defined as an interval that bounds a percentage of the population with a stated confidence, often bounding 95% of the population with confidence 99%. In the IAEA application, an alarm threshold is used that is assumed to correspond to a small false alarm probability (FAP), such as 5%, so the TI-based threshold bounds the lower (one-sided testing) or middle (two-sided testing) 95% of the population. Therefore, TIs are needed to control the FAP with high confidence (such as 99%) to be 5% or less.
Historical differences, such as $d_1, d_2, \ldots, d_{ng}$, are often used to estimate an alarm threshold for future measurements that has a small nominal $\alpha$, such as $\alpha = 0.05$. Accordingly, instead of requiring a CI for the true value $\mu$, the need is to estimate a threshold, denoted $T_{0.95}$, which is the 0.95 quantile of the probability distribution of $d$ and corresponds to $\alpha = 0.05$ in one-sided testing. In contrast to a CI, a TI is an interval that bounds a fraction of a probability distribution with a specified confidence (frequentist approach) or probability (Bayesian approach) [22,23,24]. Section 4 presents the frequentist and Bayesian TI approaches for the model $X = \mu + S + R$, where, in this paper, $X = (O - I)/O$ as computed with paired (O, I) data.
It is helpful to first review inference in a simpler setting without data being grouped by an inspection period. Suppose that data $x_1, x_2, \ldots, x_n, x_{n+1}$ are collected from a distribution that is approximately normal with unknown mean $\mu$ and standard deviation $\sigma$, so $X \sim N(\mu, \sigma)$. Assume that $x_{n+1}$ is test data and that $x_1, x_2, \ldots, x_n$ are the training data used to estimate $\mu$ and $\sigma$, using the usual estimates $\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\hat{\sigma} = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, respectively. When constructing intervals of the form $\bar{x} \pm k\hat{\sigma}$, the multiplier $k$ can be chosen in order to have any user-desired confidence that the interval $\bar{x} \pm k\hat{\sigma}$ will include the true parameter $\mu$. Specifically, for the commonly used t-based CI, $k = t_{1-\alpha/2}(n-1)$, where $1-\alpha$ is the desired confidence and $t_{1-\alpha/2}(n-1)$ denotes the $1-\alpha/2$ quantile of the t distribution with $n-1$ degrees of freedom. For example, if $n$ = 10, 20, or 30, then $k = t_{1-\alpha/2}(n-1)$ = 2.26, 2.09, or 2.05, respectively. Note that the well-known t-based CI is appropriate for ungrouped data. If $\sigma$ is known, then $k = z_{1-\alpha/2} = 1.96$, where $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of the normal distribution (the commonly used z-based CI).
The previous paragraph adopted a frequentist viewpoint ($\mu$ and $\sigma$ are unknown constants), so the intervals are referred to as CIs. In repeated applications of training on $n$ observations of $X$, a fraction of approximately $1-\alpha$ of these CIs for $\mu$ will include the true value of $\mu$ (and similarly for $\sigma$). The Bayesian viewpoint regards $\mu$ and $\sigma$ as random variables. A prior distribution is assumed for both $\mu$ and $\sigma$, and the training data are used via Bayes’ theorem to update the prior to produce a posterior distribution [24], which is used to produce an interval that includes the true parameter with any user-desired probability (assuming that the data $x_1, x_2, \ldots, x_n$ are approximately normal and the prior distributions for $\mu$ and $\sigma$ are appropriate). The Bayesian approach is subjective, unless there is some objective means to choose the prior probability [24].
Moving from inference about $\mu$ and $\sigma$, this paper reviews methods that use $x_1, x_2, \ldots, x_n$ to calculate a threshold $\hat{T}_{1-\alpha}$ such that $P(\hat{T}_{1-\alpha} \ge T_{1-\alpha}) = p$, where $T_{1-\alpha}$ is the true $1-\alpha$ threshold of the distribution of $x$. The methods are frequentist TIs and Bayesian prediction intervals, which include a specified fraction of at least $1-\alpha$ of future data, with $p$ being the specified confidence in the frequentist TI approach and the posterior probability in a Bayesian approach [22,23]. The frequentist TI estimators that are presented have the form $\hat{T}_{0.95} = \hat{\mu} + k\hat{\sigma}$, where $k$ is the coverage factor that depends on the sample size. In any Bayesian approach, probabilities are calculated with respect to the joint posterior distribution $f_{\mathrm{posterior}}(\mu, \sigma)$ for given $x_1, x_2, \ldots, x_n$ [22]. In this context, in the Bayesian approach, $\mu$ and $\sigma$ are random unknown parameters, so $P(\hat{T}_{0.95} \ge T_{0.95}) = P_{\mu,\sigma}(\hat{T}_{0.95} \ge T_{0.95})$ is computed with respect to $\mu$ and $\sigma$ for given $x_1, x_2, \ldots, x_n$. The Bayesian approach that was used in the IAEA application generates observations of $\mu$ and $\sigma$ from the posterior distribution, which can be used to compute the posterior means $\hat{\mu}$ and $\hat{\sigma}$ (the hat notation is somewhat non-standard in Bayesian literature, but it denotes the respective point estimate), which can then be used to numerically search for a suitable value of $k$ in $\hat{T}_{0.95} = \hat{\mu} + k\hat{\sigma}$. In the frequentist approach, $\hat{\mu}$ and $\hat{\sigma}$ are random, while $\mu$ and $\sigma$ are fixed unknowns, so $P(\hat{T}_{0.95} \ge T_{0.95}) = P_{X_1, X_2, \ldots, X_n}(\hat{T}_{0.95} \ge T_{0.95})$ is computed with respect to random samples of size $n$. A frequentist TI has an associated confidence, which is the long-run relative frequency that an interval such as $(0, \hat{T}_{0.95} = \hat{\mu} + k\hat{\sigma})$ will include a future observation $X$ from the same distribution as the training data used to estimate $\hat{\mu}$ and $\hat{\sigma}$. An exact expression for a TI is only available in the one-sided Gaussian case [4,23].
However, good approximate expressions for many other cases are available [4,5,6,7,8,9,10,11]. Alternatively, TIs can be estimated well by using a simulation to approximate an alarm threshold that is designed to contain at least a fraction $1-\alpha$ of future observations with a specified coverage probability $p$.
For the case $X \sim N(\mu, \sigma)$, in one-sided testing, the exact upper bound [4] of a TI that covers a fraction $1-\alpha$ of the population with confidence $p$ is $U = \bar{x} + ks$, where $k = t_{n-1}(p;\ \mathrm{ncp} = \lambda)/\sqrt{n}$ is the $p$th quantile ($p = 0.99$ here) of the noncentral t distribution with $n-1$ degrees of freedom and noncentrality parameter $\lambda = z_{1-\alpha}\sqrt{n}$, divided by $\sqrt{n}$, and $z_{1-\alpha}$ denotes the $(1-\alpha)$ quantile ($1-\alpha = 0.95$ here) of the standard normal (mean 0, variance 1). For example, with $n$ = 10, 20, or 30, $k$ = 3.74, 2.81, and 2.52, respectively. Note that these values of $k$ are larger than the corresponding values in a $1-\alpha$ CI for $\mu$ (2.26, 2.09, or 2.05, from above). Supplementary 1 provides example simulation results that can be compared to these exact results in order to illustrate the simulation approach.
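The simulation approach for the ungrouped Gaussian case can be sketched as follows. This is an illustrative Python version, not the R code behind Supplementary 1; the function name, seed, and number of replicates are arbitrary assumptions. It uses the fact that $\bar{x} + ks \ge T_{0.95}$ exactly when $k \ge (T_{0.95} - \bar{x})/s$, so the required $k$ is the 0.99 quantile of that ratio over repeated training samples (taking $\mu = 0$ and $\sigma = 1$ without loss of generality).

```python
import math
import random
from statistics import NormalDist

def simulated_tolerance_factor(n, coverage=0.95, confidence=0.99, reps=20000, seed=3):
    """Monte Carlo estimate of k such that P(xbar + k*s >= T_coverage) = confidence."""
    rng = random.Random(seed + n)
    true_threshold = NormalDist().inv_cdf(coverage)  # T for mu = 0, sigma = 1
    ratios = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(x) / n
        s = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))
        ratios.append((true_threshold - xbar) / s)
    ratios.sort()
    # The confidence-level quantile of the ratios is the smallest adequate k
    return ratios[min(reps - 1, math.ceil(confidence * reps) - 1)]

factors = {n: simulated_tolerance_factor(n) for n in (10, 20, 30)}
for n, k in factors.items():
    print(n, round(k, 2))
```

Up to Monte Carlo error, the printed factors should be close to the exact values 3.74, 2.81, and 2.52 quoted above.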

4. Parametric Approaches

Most TI literature assumes an additive model, such as Equation (1) or Equation (3). The verification data analysis assumes the model $X = \mu + S + R$, where, for example, $X = (O - I)/O \approx (O - I)/\mu$, so the only difference between an additive and multiplicative model in this review is notation. Therefore, either the notation $\sigma_S^2$ and $\sigma_R^2$ or $\delta_S^2$ and $\delta_R^2$ can and will be used, with $R_{jk} \sim N(0, \sigma_R^2)$ the net random error, and $S_j \sim N(0, \sigma_S^2)$ the net systematic error. For readers who are concerned about undefined moments of the ratio $(O - I)/O$, or the adequacy of the normality assumption, recall from Section 2 that the simulations use the truncated version of $(O - I)/O$ (which has finite moments) rather than directly simulating from a corresponding normal.

4.1. Exact Tolerance Interval if $\sigma_S^2/\sigma_R^2$ is Known

It is useful to first assume that $\sigma_S^2/\sigma_R^2$ is known.

4.1.1. Frequentist Approach

The relative differences between inspector measurements and operator declarations of $n$ items that are sampled for verification are modelled as $X_i = \mu + S_i + R_i$, as in Equation (3). The null hypothesis is $H_0: \mu = 0$, and $\delta_T = \sqrt{\delta_R^2 + \delta_S^2}$ can be estimated by applying ANOVA, as described in Section 2, but $\tau = \sigma_S^2/\sigma_R^2$ is assumed to be known. If one assumes $\hat{\delta}_T = \delta_T$, then choosing $k = 1.96$ corresponds to $\alpha = 0.05$ in two-sided testing (Gaussian approximation), and $k = 1.65$ corresponds to $\alpha = 0.05$ in one-sided testing; however, as an example, if $n = 10$ paired measurements in each of $g = 3$ prior inspection periods are available, and $\sigma_S = \sigma_R$, then choosing $k = 1.65$ leads to an actual FAP of 0.05 or less with a probability of only 0.38. Therefore, one must choose a larger value of $k$ than $k = 1.65$ in order to ensure a large probability $p$ that $\alpha \le 0.05$ [23].
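The probability of 0.38 quoted above can be checked with a short simulation. The Python sketch below is an illustration under stated assumptions, not the authors' R code: it assumes Gaussian errors, $\mu = 0$ known, one-sided testing, and it samples the two ANOVA mean squares directly from their scaled $\chi^2$ distributions (a $\chi^2$ with $\nu$ degrees of freedom is a gamma variate with shape $\nu/2$ and scale 2). The function name, seed, and replicate count are hypothetical.

```python
import random
from statistics import NormalDist

def prob_fap_within_nominal(k, g, n, sigma_s, sigma_r, alpha=0.05, reps=50000, seed=4):
    """Estimate P(actual one-sided FAP <= alpha) when the threshold is k * sigma_T_hat."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha)
    sigma_t = (sigma_s ** 2 + sigma_r ** 2) ** 0.5
    df_w, df_b = g * (n - 1), g - 1
    hits = 0
    for _ in range(reps):
        msw = sigma_r ** 2 * rng.gammavariate(df_w / 2.0, 2.0) / df_w
        msb = (n * sigma_s ** 2 + sigma_r ** 2) * rng.gammavariate(df_b / 2.0, 2.0) / df_b
        # sigma_T_hat^2 = sigma_S_hat^2 + sigma_R_hat^2 = (msb - msw)/n + msw, always positive
        sigma_t_hat = ((msb - msw) / n + msw) ** 0.5
        # Actual FAP = 1 - Phi(k*sigma_t_hat/sigma_t), which is <= alpha iff k*sigma_t_hat >= z*sigma_t
        if k * sigma_t_hat >= z * sigma_t:
            hits += 1
    return hits / reps

p_ok = prob_fap_within_nominal(k=1.65, g=3, n=10, sigma_s=0.01, sigma_r=0.01)
print(round(p_ok, 2))
```

With only $g = 3$ groups, the between-group mean square has two degrees of freedom, so $\hat{\sigma}_T$ falls below $\sigma_T$ more often than not, which explains why the probability is well under one half.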
Unlike the single-component Gaussian case above, these values of $k$ depend on the ratio $\sigma_S/\sigma_R$, which is unknown in practice, so approximate methods are needed (see Section 4.2). If one desires a high probability $p = 0.99$ that the actual FAP is as small as the nominal FAP, then simulation in R [20] indicates that, instead of $k = 1.96$ in two-sided testing, for $\sigma_S = 0.25\sigma_R$, $\sigma_R$, or $4\sigma_R$, one must choose $k$ = 3.02, 3.15, or 5.04, respectively, for $g = 3$ and $n = 10$. In one-sided testing, the required $k$ values are 2.52, 2.94, or 4.23, respectively. These values of $k$ are demonstrated by simulation to be accurate to the number of digits shown.
The simulation approach uses three steps, as follows: (1) simulate $n$ scaled differences in each of $g$ groups from Equation (3); (2) estimate $\hat{\sigma}_T = \sqrt{\hat{\sigma}_S^2 + \hat{\sigma}_R^2}$ using ANOVA, as described in Section 2; and (3) use a grid search to find the value of $k$ such that, for $\hat{T}_{0.95} = \hat{\mu} + k\hat{\sigma}_T$, the requirement $P(\hat{T}_{0.95} \ge T_{0.95}) = p$ is met, where $p$ is a user-specified probability (the frequentist confidence level), such as $p = 0.99$. In the IAEA verification data, it is reasonable to assume that $\mu = 0$, so $\hat{T}_{0.95} = 0 + k\hat{\sigma}_T$.
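The three steps can be sketched in Python as follows. This is an illustrative implementation under stated assumptions (Gaussian errors simulated directly as $d_{jk} = S_j + R_{jk}$, $\mu = 0$, one-sided testing, a grid step of 0.01 for $k$); the function names, seed, and replicate count are hypothetical, and the paper's reported values come from R simulations with the truncated ratio model.

```python
import bisect
import math
import random
from statistics import NormalDist

def sigma_t_hat(g, n, sigma_s, sigma_r, rng):
    """Steps 1 and 2: simulate one training set and return the ANOVA estimate of sigma_T."""
    groups = []
    for _ in range(g):
        s_j = rng.gauss(0.0, sigma_s)  # one systematic error per group
        groups.append([s_j + rng.gauss(0.0, sigma_r) for _ in range(n)])
    means = [sum(grp) / n for grp in groups]
    grand = sum(means) / g
    msw = sum((d - m) ** 2 for grp, m in zip(groups, means) for d in grp) / (g * (n - 1))
    msb = n * sum((m - grand) ** 2 for m in means) / (g - 1)
    return math.sqrt((msb - msw) / n + msw)  # equals sqrt(msb/n + (1 - 1/n)*msw) > 0

def find_k(g, n, sigma_s, sigma_r, alpha=0.05, p=0.99, reps=20000, seed=5):
    """Step 3: grid search for the smallest k with P(k*sigma_T_hat >= T_{1-alpha}) >= p."""
    rng = random.Random(seed)
    true_threshold = NormalDist().inv_cdf(1.0 - alpha) * math.hypot(sigma_s, sigma_r)
    hats = sorted(sigma_t_hat(g, n, sigma_s, sigma_r, rng) for _ in range(reps))
    for step in range(100, 2001):  # k = 1.00, 1.01, ..., 20.00
        k = step / 100.0
        covered = reps - bisect.bisect_left(hats, true_threshold / k)
        if covered / reps >= p:
            return k
    raise ValueError("no k in the grid meets the requirement")

k_one_sided = find_k(g=3, n=10, sigma_s=0.01, sigma_r=0.01)
print(k_one_sided)
```

For $\sigma_S = \sigma_R$ with $g = 3$ and $n = 10$, the result should land near the one-sided value of 2.94 reported above, up to Monte Carlo error and the modelling simplifications noted in the lead-in.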
Reference [10] provides an exact expression for a one-sided tolerance limit in the Gaussian case if $\sigma_S^2/\sigma_R^2$ is known, namely $\bar{x} + t_{ng-1}(\lambda, p = 0.99)\, S/\sqrt{\omega}$, where $S^2 = (S_B^2 + S_W^2)/(ng-1)$, $S_B^2 = \sum_{j=1}^{g}\omega_j(\bar{x}_j - \bar{\bar{x}})^2$, $S_W^2 = \sum_{j=1}^{g}\sum_{k=1}^{n}(x_{jk} - \bar{x}_j)^2$, $\omega_j = \frac{1}{1/n + \sigma_B^2/\sigma_W^2}$, $\omega = \sum_{j=1}^{g}\omega_j$, and $\lambda = z_{1-\alpha}\sqrt{(1 + \sigma_B^2/\sigma_W^2)\,\omega}$ is the noncentrality parameter of the t distribution with $ng-1$ degrees of freedom. For verification, the simulation results above were compared to the exact expression (using $\hat{T}_{0.95} = \bar{x} + kS/\sqrt{\omega}$ instead of $\hat{T}_{0.95} = 0 + k\hat{\sigma}_T$). The estimation approach in [10] using $\hat{T}_{0.95} = \bar{x} + kS/\sqrt{\omega}$ requires $\sigma_S^2/\sigma_R^2$ to be known. The values of $k$ reported above assume that $\tau = \sigma_B^2/\sigma_W^2$ is known for the purpose of estimating $k$ in $\hat{T}_{0.95} = 0 + k\hat{\sigma}_T$. As $\tau$ is assumed known, the expected value of $S_B^2$ (here denoting the sample variance of the $g$ group means) can be expressed as $E(S_B^2) = \sigma_W^2(\tau + 1/n)$, so an unbiased estimate of $\sigma_W^2$ is $\hat{\sigma}_{W2}^2 = S_B^2/(\tau + 1/n)$. The usual within-group estimate, denoted $S_{W1}^2$, is also an unbiased estimate of $\sigma_W^2$ (and it is independent of $\hat{\sigma}_{W2}^2$ for Gaussian data [15]). Therefore, a weighted average $c_1 S_{W1}^2 + c_2 S_{W2}^2$ (writing $S_{W2}^2$ for $\hat{\sigma}_{W2}^2$), with weights $c_1$ and $c_2$ inversely proportional to the variances of the two estimates, can be used to estimate $\sigma_R^2$, and then $\hat{\sigma}_S^2 = \tau\hat{\sigma}_R^2$.
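One way to realize a weighted average of the two unbiased estimates when $\tau$ is known is to weight each by its degrees of freedom, which is inverse-variance weighting for Gaussian data and reduces to the pooled quantity $(S_B^2 + S_W^2)/(ng-1)$ appearing in the expression from [10]. The Python sketch below follows that reading; the specific weights are an assumption made for illustration and may differ from the weights used by the authors. The function name and toy data are hypothetical.

```python
def pooled_variances_known_tau(groups, tau):
    """Estimate sigma_R^2 and sigma_S^2 for balanced data when tau = sigma_S^2/sigma_R^2 is known.

    Assumption: degrees-of-freedom (inverse-variance) weighting of the within-group
    estimate SSW/(g(n-1)) and the between-group estimate SSB/((n*tau + 1)(g-1)).
    """
    g = len(groups)
    n = len(groups[0])
    means = [sum(grp) / n for grp in groups]
    grand = sum(means) / g
    ssw = sum((x - m) ** 2 for grp, m in zip(groups, means) for x in grp)
    ssb = n * sum((m - grand) ** 2 for m in means)
    # SSB/(n*tau + 1) and SSW are both sigma_R^2 times chi-square variates (g-1 and g(n-1) df)
    sigma_r2_hat = (ssw + ssb / (n * tau + 1.0)) / (n * g - 1)
    return sigma_r2_hat, tau * sigma_r2_hat

# Toy data (hypothetical): SSW = 6, SSB = 54, n = 3, g = 3, tau = 1
toy = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
sigma_r2_hat, sigma_s2_hat = pooled_variances_known_tau(toy, tau=1.0)
print(sigma_r2_hat, sigma_s2_hat)  # (6 + 54/4) / 8 = 2.4375 for both
```

Because the between-group information is pooled in, the estimate rests on $ng - 1$ degrees of freedom rather than $g - 1$ for the systematic component, which is consistent with the smaller $k$ reported for the weighted option.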
For $g = 3$, $n = 10$, and $\delta_R = 0.01$, Figure 3 plots the required value of $k$ versus $\delta_S$, such that $P(d \ge k\hat{\delta}_T) \le 0.05$ with probability of at least $p = 0.99$.
Figure 4 plots the required value of $k$ for $g$ = 3, 5, or 10, and $n = 10$ versus $\tau = \sigma_B^2/\sigma_W^2$. Note that the required value of $k$ increases quite substantially as $\tau$ increases.
For the data in Figure 1, for a two-sided interval expressed as $0 \pm k\hat{\sigma}_T$, if one assumes (correctly) that $\tau = 1$, then the option 2 estimate is $k = 2.34$ if the weighted least squares option that is described above is used. If the weighted least squares option is not used, but usual ANOVA estimation is used, then $k = 2.92$.

4.1.2. Bayesian Approach

Bayesian ANOVA, such as the one applied to data generated from Equation (3), has been well studied [15,24]; however, Bayesian ANOVA using approximate Bayesian computation (ABC) has not been well studied. ABC has some robustness to misspecification of the true probability distribution for the data, so it is the Bayesian choice that is evaluated here [17,18].
In any Bayesian approach, prior information regarding the magnitudes and/or relative magnitudes of $\delta_R^2$ and $\delta_S^2$ can be provided. If the prior is “conjugate” for the likelihood, then the posterior is in the same family as the prior, in which case analytical methods are available to compute posterior prediction intervals for quantities of interest. As a wide variety of priors and likelihoods can be accommodated, modern Bayesian methods do not rely on conjugate priors, but they use numerical methods to obtain samples of $\delta_R^2$ and $\delta_S^2$ from their approximate posterior distributions [24]. For numerical methods, such as Markov Chain Monte Carlo [23], the user specifies a prior distribution for $\delta_R^2$ and $\delta_S^2$ and a likelihood (which need not be normal). ABC does not require a likelihood for the data (and, as in any Bayesian approach, ABC accommodates the constraints on variances through prior distributions) [17,18,25,26,27,28,29].
The “output” of any Bayesian analysis is the posterior distribution for each model parameter, and so the output of ABC for data generated from Equation (3) is an estimate of the posterior distributions of $\delta_R^2$ and $\delta_S^2$. No matter what type of Bayesian approach is used, a well-calibrated Bayesian approach satisfies several requirements. One requirement is that, in repeated applications of ABC, the middle 95% of the posterior distribution for each of $\delta_R^2$ and $\delta_S^2$ should contain the respective true value approximately 95% of the time. That is, the actual coverage should be closely approximated by the nominal coverage. A second requirement is that the true mean squared error (MSE) of the ABC-based estimates of $\delta_R^2$ and $\delta_S^2$ should be closely approximated by the variance of the ABC-based posterior distributions of $\delta_R^2$ and $\delta_S^2$. Inference using ABC can be briefly summarized, as follows:
ABC Inference
For i in 1, 2, …, N, do these 3 steps:
(1) Sample $\theta$ from the prior, $\theta \sim f_{\mathrm{prior}}(\theta)$.
(2) Simulate data $x'$ from the model, $x' \sim P(x \mid \theta)$.
(3) Denote the real data as $x$. If the distance $d(S(x'), S(x)) \le \varepsilon$, accept $\theta$ as an observation from $f_{\mathrm{posterior}}(\theta \mid x)$.
Experience with ABC suggests that the ABC approximation to $f_{\mathrm{posterior}}(\theta \mid x)$ improves if step (3) is modified to include a weighting factor, so that trial values of $\theta$ simulated from $f_{\mathrm{prior}}(\theta)$ that lead to a very small distance $d(S(x'), S(x))$ are more heavily weighted in the estimated posterior [27,28,29].
In ABC, the model has input parameters $\theta$ and outputs data $x(\theta)$, and there is corresponding real data $x_{\mathrm{obs}}$. For example, the model could be Equation (3), which specifies how to generate synthetic $d$ data (denoted $x$ here), and it does provide a likelihood; however, the true likelihood that is used to generate the data need not be known to the user. Synthetic data are generated from the model for many trial values of $\theta$, and trial $\theta$ values are accepted as contributing to the estimated posterior distribution for $\theta \mid x_{\mathrm{obs}}$ if the distance $d(x_{\mathrm{obs}}, x(\theta))$ between $x_{\mathrm{obs}}$ and $x(\theta)$ is reasonably small. For most applications, it is instead necessary to reduce the dimension of $x_{\mathrm{obs}}$ to a small set of summary statistics $S$ and to accept trial values of $\theta$ if $d(S(x_{\mathrm{obs}}), S(x(\theta))) < \varepsilon$, where $\varepsilon$ is a user-chosen threshold. Here, for example, $x_{\mathrm{obs}}$ is the $d = (O - I)/O$ data in each inspection group, and $S(x_{\mathrm{obs}})$ includes the within-group and between-group sums of squares. Specifically, the ANOVA-based estimator of $\delta_R^2$ in Equation (3) is $\hat{\delta}_R^2 = \frac{1}{g(n-1)}\left\{\sum_{j=1}^{g}\sum_{k=1}^{n}(d_{jk} - \bar{d}_j)^2\right\}$, and the usual estimate of $\delta_S^2$ is $\hat{\delta}_S^2 = \frac{\sum_{j=1}^{g}(\bar{d}_j - \bar{\bar{d}})^2}{g-1} - \frac{\hat{\delta}_R^2}{n}$. Therefore, the quantities $\hat{\delta}_R^2$ and $\hat{\delta}_S^2$ are good choices for the summary statistics for ABC. Recall that, because trial values of $\theta$ are accepted if $d(S(x_{\mathrm{obs}}), S(x(\theta))) < \varepsilon$, an approximation error to the posterior distribution arises that several ABC options attempt to mitigate. Additionally, recall that such options weight the accepted $\theta$ values by the actual distance $d(S(x_{\mathrm{obs}}), S(x(\theta)))$ (abctools package in R [20]).
To summarize, ABC applied to data following Equation (3) consists of three steps: (1) sample parameter values of $\delta_R^2$ and $\delta_S^2$ from their prior distribution $p_{\mathrm{prior}}(\theta)$; (2) for each simulated value of $\theta$ in (1), simulate data from Equation (2); and (3) accept a fraction of the sampled prior values in (1) by checking whether the summary statistics that are computed from the data in (2) satisfy $d(S(x_{\mathrm{obs}}), S(x(\theta))) < \varepsilon$. If desired, aiming to improve the approximation to the posterior, adjust the accepted $\theta$ values on the basis of the actual $d(S(x_{\mathrm{obs}}), S(x(\theta)))$ value. ABC requires the user to make three choices: the summary statistics, the threshold $\varepsilon$, and the measure of distance $d$. Reference [25] introduced a method to choose the summary statistics that uses the estimated posterior means of the parameters based on pilot simulation runs. Reference [25] used an estimate of the change in the posterior $p_{\mathrm{posterior}}(\theta)$ when a candidate summary statistic is added to the current set of summary statistics. Reference [26] illustrated a method to evaluate whether a candidate set of summary statistics leads to a well-calibrated posterior, in the same sense that is used in this paper; that is, nominal posterior probability intervals should have approximately the same actual coverage probability, and the posterior variance should agree with the observed MSE in testing.
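The three ABC steps can be sketched with plain rejection sampling in Python. This is a simplified illustration, not the abctools-based R implementation used for the results in this paper. Its assumptions are: data are simulated directly as $d_{jk} = \tilde{S}_j + \tilde{R}_{jk}$ with Gaussian errors; the summary statistics are the within-group and between-group mean squares; the distance is Euclidean on the log scale; $\varepsilon = 0.3$; and the prior is uniform on (0, 0.20) for $\delta_R$ and $\delta_S$. The observed summaries are derived from the rounded Figure 2 estimates ($\hat{\delta}_R = 0.07$, $\hat{\delta}_S = 0.11$, $g = 4$, $n = 5$). All function names are hypothetical.

```python
import math
import random

def mean_squares(groups):
    """Summary statistics: within-group and between-group mean squares."""
    g, n = len(groups), len(groups[0])
    means = [sum(grp) / n for grp in groups]
    grand = sum(means) / g
    msw = sum((d - m) ** 2 for grp, m in zip(groups, means) for d in grp) / (g * (n - 1))
    msb = n * sum((m - grand) ** 2 for m in means) / (g - 1)
    return msw, msb

def simulate_d(delta_r, delta_s, g, n, rng):
    """Synthetic relative differences d_jk = S_j + R_jk (Gaussian assumption)."""
    groups = []
    for _ in range(g):
        s_j = rng.gauss(0.0, delta_s)
        groups.append([s_j + rng.gauss(0.0, delta_r) for _ in range(n)])
    return groups

def abc_rejection(obs_msw, obs_msb, g, n, n_draws=20000, eps=0.3, prior_upper=0.20, seed=6):
    """Return the accepted (delta_R, delta_S) pairs, an approximate posterior sample."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        delta_r = rng.uniform(0.0, prior_upper)  # step 1: sample from the prior
        delta_s = rng.uniform(0.0, prior_upper)
        msw, msb = mean_squares(simulate_d(delta_r, delta_s, g, n, rng))  # step 2
        if msw <= 0.0 or msb <= 0.0:
            continue  # guard against log(0) for degenerate draws
        distance = math.hypot(math.log(msw / obs_msw), math.log(msb / obs_msb))
        if distance < eps:  # step 3: accept if the summaries are close
            accepted.append((delta_r, delta_s))
    return accepted

# Observed summaries implied by the rounded Figure 2 estimates
g_obs, n_obs = 4, 5
obs_msw = 0.07 ** 2
obs_msb = n_obs * 0.11 ** 2 + 0.07 ** 2
posterior = abc_rejection(obs_msw, obs_msb, g_obs, n_obs)
post_mean_r = sum(r for r, _ in posterior) / len(posterior)
post_mean_s = sum(s for _, s in posterior) / len(posterior)
print(len(posterior), round(post_mean_r, 3), round(post_mean_s, 3))
```

The accepted $\delta_R$ values cluster fairly tightly, whereas the accepted $\delta_S$ values remain widely spread, reflecting how little information a handful of groups carries about the between-group component.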
Note that any Bayesian method can be regarded as approximate, because one almost never knows the exact prior probability distribution. However, from Section 2, $S_W^2$ and $S_B^2$ contain useful information regarding $\delta_R^2$ and $\delta_S^2$, so this paper's ABC implementation uses $S_W^2$ and $S_B^2$ as summary statistics. There is also the need to choose the threshold $\varepsilon$, which can be done by a grid search to find a suitably small $\varepsilon$, using two assessments to check whether the approximate posterior is sufficiently accurate. First, the quantiles of the estimated posterior should approximately provide the nominal coverage, which is analogous to the frequentist assessment that, for example, approximately 95% of 95% confidence intervals should contain the true parameter. Second, the estimated MSE (which is the variance of the posterior) should be approximately the actual MSE. In all cases considered here, ABC performs well on the two assessments, with actual coverage and MSE within 1% relative of the nominal coverage and MSE. Furthermore, ABC exhibits modest robustness, still showing good agreement between nominal and actual coverage (and MSE) when, for example, the Gaussian is replaced with a t distribution with small degrees of freedom, such as five degrees of freedom. As with any Bayesian method, sensitivity to the assumed prior can be an important topic; the results reported below assumed a uniform prior on (0, 0.20) for $\delta_R$ and $\delta_S$, but the results are very similar for other moderately wide priors, such as a gamma prior with appropriate parameters.
As in the frequentist option (Section 4.1.1), for the data in Figure 1, for a two-sided interval expressed as $0 \pm k\hat{\sigma}_T$, if one assumes (correctly) that $\tau = 1$, then with $p = 0.99$ for $\alpha = 0.05$, the ABC estimate of $k$ is $k = 3.22$ if the weighted least squares option that is described in Section 4.1.1 is used, or $k = 3.60$ if it is not used. Note that these values of $k$ are not directly comparable to the frequentist values of $k$ (2.34 and 2.92, respectively, in Section 4.1.1), because the Bayesian estimate of $\sigma_T$ behaves differently from the frequentist estimate of $\sigma_T$.

4.2. Approximate Tolerance Interval If $\sigma_S^2/\sigma_R^2$ Is Unknown

In nearly all applications, $\sigma_S^2/\sigma_R^2$ is unknown, and the resulting tolerance interval width is considerably larger than if $\sigma_S^2/\sigma_R^2$ were known. As the tolerance interval width depends on the unknown $\sigma_S^2/\sigma_R^2$, all of the methods are approximate [4,5,6,7,8,9,10,11,23], including those that are based on simulation.

4.2.1. Frequentist Approach

There are several published approximate TI construction methods for the case in which $\sigma_S^2/\sigma_R^2$ is unknown. Option 1 is to first construct a CI for $\sigma_S^2/\sigma_R^2$ and then assume some upper value for the true unknown $\sigma_S^2/\sigma_R^2$. Option 2 uses a generalized confidence interval, which leads to an approximation that is based on the non-central t distribution, as in [8,9]. Option 3 uses a bootstrap adjustment to a conservative approximation [5]. Option 4 uses generalized pivot quantities [7]. Option 5 uses variance recovery estimation [6]. There are more approximations; not all are reviewed here, but see [9] for a partial review.
Reference [10] suggests first constructing a CI for $\tau$ and then treating $\tau$ as known, set to an upper quantile of the CI (or of the prediction interval if using ABC). An interval that bounds $\tau$ with high confidence can be easily constructed [15]. However, the CI for $\tau$ can be prohibitively wide in this context, extending from 0.036 to 176 for the data in Figure 1. For option 1, in the inspector verification data example, it is reasonable to assume an upper limit for $\sigma_S^2/\sigma_R^2$, so large values in a CI for $\sigma_S^2/\sigma_R^2$ can be excluded from consideration. The largest feasible value for $\sigma_S^2/\sigma_R^2$ (based on assumption or on a CI) is then used exactly as in the case where $\sigma_S^2/\sigma_R^2$ is known.
The approximations for options 2–5 are given in [4,5,6,7,8,9,10,11]. For example, option 2 from [8,9] is $\bar{x} + t_{g-1}(p, \eta)\sqrt{SSB/(g(g-1)n)}$, where $\eta = z_{1-\alpha}\sqrt{g + (g-1)\,F_{g-1,\,g(n-1)}(1-p)\,SSE/SSB}$.
In the verification data example, it is reasonable to assume that $\mu = 0$, so the term $\bar{x}$ is not needed in the approximation or in corresponding simulations. For the data in Figure 1, for a two-sided interval expressed as $0 \pm k\hat{\sigma}_T$, not knowing $\tau$ leads to a much larger value, $k = 11.2$ for 99% confidence in 95% coverage. For 95% confidence in 90% coverage, $k = 4.2$.
Regarding the accuracy of the approximation, options 2–5 are reasonably accurate for Gaussian data, with each option being better or worse than other options in some regions of the parameter space. For example, option 2 has nearly 0.99 actual frequency (with a nominal frequency also of 0.99) of containing a fraction $1 - \alpha$ of the $d$ values for $\tau = \delta_B^2/\delta_W^2$ near 1. However, if $\tau$ is close to zero, then option 2 can have much lower actual frequency, such as 0.7 for $\tau = 0.01$.
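An accuracy check of this kind can be carried out by simulation. The sketch below is not option 2 itself; it estimates, for a given multiplier $k$, how often the two-sided interval $0 \pm k\hat{\sigma}_T$ from the running example contains at least the stated fraction of future $d$ values. It assumes Gaussian data of the form $d_{jk} = S_j + R_{jk}$ with known mean zero, and it truncates a negative between-group variance estimate at zero; both are assumptions of this illustration, and the function name `actual_confidence` is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def actual_confidence(k, delta_s, delta_r, g, n, coverage=0.95,
                      n_sim=20000, seed=0):
    """Fraction of simulated training sets for which 0 +/- k*sigma_T_hat
    contains at least `coverage` of the true N(0, sigma_T^2) distribution."""
    rng = np.random.default_rng(seed)
    sigma_t = np.hypot(delta_s, delta_r)  # true total standard deviation
    # The interval covers at least `coverage` exactly when
    # k * sigma_T_hat >= z * sigma_T, with z the (1 + coverage)/2 normal quantile.
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)

    s = rng.normal(0.0, delta_s, size=(n_sim, g, 1))
    r = rng.normal(0.0, delta_r, size=(n_sim, g, n))
    d = s + r

    group_means = d.mean(axis=2)
    grand_means = group_means.mean(axis=1, keepdims=True)
    msw = ((d - group_means[:, :, None]) ** 2).sum(axis=(1, 2)) / (g * (n - 1))
    msb = n * ((group_means - grand_means) ** 2).sum(axis=1) / (g - 1)

    var_s_hat = np.maximum((msb - msw) / n, 0.0)  # truncated at zero (assumption)
    sigma_t_hat = np.sqrt(var_s_hat + msw)
    return float(np.mean(k * sigma_t_hat >= z * sigma_t))


# Usage: compare two multipliers for g = 3 groups of n = 10, tau = 1.
for k in (3.0, 11.2):
    print(k, actual_confidence(k, 0.01, 0.01, g=3, n=10))
```

Scanning $k$ over a grid and taking the smallest value whose estimated frequency reaches the target confidence gives a simulation-based multiplier for a specified $\tau$.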

4.2.2. Bayesian Approach

To illustrate ABC, Figure 5 plots the estimated posterior assuming that $\tau = \delta_B^2/\delta_W^2 = \sigma_B^2/\sigma_W^2$ is either known or unknown, for a uniform prior on $\delta_T$ (from 0 to 0.20) with n = 10, g = 5, and data simulated from Equation (3) using $\delta_S = \delta_R = 0.01$. The two diagnostic assessments indicate that the ABC-based posterior distributions are adequate approximations.
For the data in Figure 1, a probability interval using ABC that bounds $\tau$ with probability $p = 0.99$ ranges from 0.06 to 4.5. This probability interval is much narrower than the wide CI that is given in Section 4.2.1 for the frequentist approach (0.036 to 176), because a somewhat narrow prior was assumed, with a maximum assumed value of $\tau = 4$, leading to $0 \pm 5.2\hat{\sigma}_T$ (refer to Figure 4 for g = 3, n = 10 with $\tau = 4.5$).

5. Semi-Parametric Approach

A semiparametric approach lies between the parametric and nonparametric extremes. For example, one semiparametric option is bspmma, which uses a Dirichlet process prior and can be compared to the parametric case; for a two-sided interval expressed as $0 \pm k\hat{\sigma}_T$, it requires $k = 4.2$ for the data in Figure 1. The R code bspmma [12] can be used in one-way random effects ANOVA to estimate the posterior distribution of S. The acronym bspmma stands for Bayesian semiparametric models for meta-analysis [12], and the R code uses model selection that is based on the Radon–Nikodym derivative (a key concept in measure theory). The posterior distribution of R can be obtained by any of several common approaches. The $k = 4.2$ result for the S + R distribution used a parametric bootstrap [30], in which sample i from the posterior distribution of S + R used bspmma to generate the S value and $N(0, \hat{\sigma}_{R_i})$ to generate the R value. The standard deviation $\hat{\sigma}_{R_i}$ is sampled from the posterior distribution of $\sigma_{R_i}$.
The semi-parametric approach that was used in this context assumes that R has a normal distribution and S has an arbitrary (unknown) distribution. Interestingly, in this case, the covariance between MSB and MSW is 0, because $\mathrm{cov}(MSB, MSW) = \frac{\mu_{4,R} - 3\sigma_R^4}{ng}$, where $\mu_{4,R}$ is the fourth central moment, $\mu_{4,R} = E(R^4)$, and then because $\mu_{4,R} = 3\sigma_R^4$ for a normal distribution [15], $\mathrm{cov}(MSB, MSW) = 0$.
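The covariance expression can be checked numerically. The sketch below is an illustration only: it draws S from a centered uniform distribution (standing in for an arbitrary S), and compares normal R with Laplace R of the same variance, for which $\mu_{4,R} = 6\sigma_R^4$ and the expression gives $3\sigma_R^4/(ng)$. The Laplace case is added here purely as a contrast and is not part of the approach described above; the function names and the settings g = 4, n = 5 are hypothetical.

```python
import numpy as np


def draw_normal(rng, size):
    """Normal random errors with unit variance."""
    return rng.normal(0.0, 1.0, size=size)


def draw_laplace(rng, size):
    """Laplace random errors scaled to unit variance (variance = 2 * scale^2)."""
    return rng.laplace(0.0, 1.0 / np.sqrt(2.0), size=size)


def cov_msb_msw(draw_r, g=4, n=5, sigma_s=0.5, n_sim=100000, seed=0):
    """Monte Carlo estimate of cov(MSB, MSW) for d_jk = S_j + R_jk."""
    rng = np.random.default_rng(seed)
    half_width = sigma_s * np.sqrt(3.0)  # uniform(-a, a) has variance a^2 / 3
    s = rng.uniform(-half_width, half_width, size=(n_sim, g, 1))
    d = s + draw_r(rng, (n_sim, g, n))

    group_means = d.mean(axis=2)
    grand_means = group_means.mean(axis=1, keepdims=True)
    msw = ((d - group_means[:, :, None]) ** 2).sum(axis=(1, 2)) / (g * (n - 1))
    msb = n * ((group_means - grand_means) ** 2).sum(axis=1) / (g - 1)
    return float(np.cov(msb, msw)[0, 1])


print("normal R :", cov_msb_msw(draw_normal))   # expression gives 0
print("Laplace R:", cov_msb_msw(draw_laplace))  # expression gives 3 / (5 * 4) = 0.15
```

The printed values are Monte Carlo estimates, so they should be close to, but not exactly equal to, the values given by the expression.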

6. Non-Parametric Approach

The nonparametric approach requires one or more observations from each of 130 groups (inspection periods), which is far too many groups and observations to be practical. The basis for the nonparametric result of n = 130 is an application of order statistics [13]. For a sample of size n from the S + R distribution in model 2, the probability that the interval $(X_{(1)}, X_{(n)}) = (\min, \max)$ contains at least a fraction p of future values is $1 - np^{n-1} + (n-1)p^n$, and the required n is the smallest value for which this probability reaches the desired level. If the requirement is 95% probability that the interval contains at least 95% of future values, then n = 93. Raising the requirement from 95% probability to 99% probability increases the required n to n = 130, as given.
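The sample sizes quoted above follow directly from the order-statistics expression. A minimal sketch that searches for the smallest n is given below; the function names `confidence` and `required_n` are hypothetical.

```python
def confidence(n, p):
    """Probability that (min, max) of n independent continuous draws
    contains at least a fraction p of the distribution."""
    return 1.0 - n * p ** (n - 1) + (n - 1) * p ** n


def required_n(p, conf):
    """Smallest n whose (min, max) interval covers fraction p with probability conf."""
    n = 2
    while confidence(n, p) < conf:
        n += 1
    return n


print(required_n(0.95, 0.95))  # 93
print(required_n(0.95, 0.99))  # 130
```

The result does not depend on the shape of the S + R distribution, only on it being continuous, which is why no distributional assumption enters the calculation.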

7. Summary

This paper reviewed TI-based options to set alarm thresholds in random effects one-way ANOVA, using an example that involves inspector verification in which the inspections are the groups. In the example, within-group variance is random measurement error variance; between-group variance is systematic measurement error variance. There are no exact TI methods for two-sided testing in such data, even if both the random and systematic components are Gaussian.
Although the current practice of using $0 \pm 3\hat{\delta}_T$ has acceptably small FAP for a modest number of groups and observations per group, the main purpose of this review paper is to investigate when more formal TI-based methods might be preferred. Semi- and non-parametric options were also investigated. Future work will consider TI-based options when the metrology groups must be inferred from data, as in [31].

Supplementary Materials

The following are available online at https://www.mdpi.com/2571-905X/2/2/20/s1.

Author Contributions

Conceptualization, all four authors; Methodology, T.B. and K.K.; Software, T.B.; Validation, T.B. and E.B.; Formal Analysis, T.B.; Writing-Original Draft Preparation, all four authors; Writing-Review & Editing, all four authors.

Funding

This research received no external funding.

Acknowledgments

The authors acknowledge the careful reviews by both reviewers.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bonner, E.; Burr, T.; Guzzardo, T.; Norman, C.; Zhao, K.; Beddingfield, D.; Geist, W.; Laughter, M.; Lee, T. Ensuring the effectiveness of safeguards through comprehensive uncertainty quantification. J. Nucl. Mater. Manag. 2016, 44, 53–61.
  2. Walsh, S.; Burr, T.; Martin, K. The IAEA error approach to variance estimation for use in material balance evaluation and the international target values, and comparison to metrological definitions of precision. J. Nucl. Mater. Manag. 2017, 45, 4–14.
  3. Burr, T.; Krieger, T.; Norman, C.; Zhao, K. The impact of metrology study sample size on verification samples calculations in IAEA safeguards. Eur. J. Nucl. Sci. Technol. 2016, 2, 36.
  4. Young, D. Tolerance: An R package for estimating tolerance intervals. J. Stat. Softw. 2010, 36, 1–39.
  5. Zou, G.; Donner, A. Construction of confidence limits about effect measures: A general approach. Stat. Med. 2008, 27, 1693–1702.
  6. Hoffman, D. One-sided tolerance limits for balanced and unbalanced random effects models. Technometrics 2010, 52, 303–312.
  7. Liao, C.; Lin, T.; Iyer, H. One- and two-sided tolerance intervals for general balanced mixed models and unbalanced one-way random models. Technometrics 2005, 47, 323–335.
  8. Krishnamoorthy, K.; Mathew, T. One-sided tolerance limits in balanced and unbalanced one-way random models based on generalized confidence intervals. Technometrics 2004, 46, 44–52.
  9. Krishnamoorthy, K.; Peng, J. Approximate one-sided tolerance limits in random effects model and in some mixed models and comparisons. J. Stat. Comput. Simul. 2015, 85, 1651–1666.
  10. Bhaumik, D.; Kulkarni, P. A simple and exact method of constructing tolerance intervals for the one-way ANOVA with random effects. Am. Stat. 1996, 50, 31–323.
  11. Kraemer, K. Confidence Interval for Variance Components and Functions of Variance Components in the Random Effects Model under Non-Normality. Ph.D. Thesis, Iowa State University, Ames, IA, USA, 2012.
  12. Burr, D. bspmma: An R package for Bayesian semiparametric models for meta-analysis. J. Stat. Softw. 2012, 50, 1–23.
  13. Hogg, R.; Craig, A. Introduction to Mathematical Statistics, 4th ed.; Macmillan: New York, NY, USA, 1978.
  14. Burr, T.; Knepper, P. A study of the effect of measurement error in predictor variables in nondestructive assay. Appl. Radiat. Isot. 2000, 53, 547–555.
  15. Miller, R. Beyond ANOVA: Basics of Applied Statistics; Chapman & Hall: Boca Raton, FL, USA, 1998.
  16. Martin, K.; Böckenhoff, A. Analysis of short-term systematic measurement error variance for the difference of paired data without repetition of measurement. Adv. Stat. Anal. 2007, 91, 291–310.
  17. Burr, T.; Croft, S.; Krieger, T.; Martin, K.; Norman, C.; Walsh, S. Uncertainty quantification for radiation measurements: Bottom-up error variance estimation using calibration information. Appl. Radiat. Isot. 2015, 108, 49–57.
  18. Burr, T.; Croft, S.; Jarman, K.; Nicholson, A.; Norman, C.; Walsh, S. Improved uncertainty quantification in nondestructive assay for nonproliferation. Chemometrics 2016, 159, 164–173.
  19. Zhao, K.; Penkin, P.; Norman, C.; Balsely, S.; Mayer, K.; Peerani, P.; Pietri, P.; Tapodi, S.; Tsutaki, Y.; Boella, M.; et al. STR-368 International target values 2010 for measurement uncertainties in safeguarding nuclear materials, IAEA, Vienna. 2010. Available online: www.inmm.org (accessed on 17 December 2017).
  20. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2012; ISBN 3-900051-07-0. Available online: http://www.R-project.org (accessed on 17 February 2015).
  21. Marsaglia, G. Ratios of normal variables. J. Stat. Softw. 2006, 16, 2.
  22. Hamada, M.; Johnson, V.; Moore, L.; Wendelberger, J. Bayesian prediction intervals and their relation to tolerance intervals. Technometrics 2004, 46, 452–459.
  23. Agboraw, E.; Bonner, E.; Burr, T.; Croft, S.; Kirkpatrick, J.; Krieger, T.; Norman, C.; Santi, P. Revisiting Currie’s minimum detectable activity for nondestructive assay by gamma detection using tolerance intervals. Esarda Bull. 2017, 54, 14–22.
  24. Carlin, B.; John, B.; Stern, H.; Rubin, D. Bayesian Data Analysis, 1st ed.; Chapman and Hall: Boca Raton, FL, USA, 1995.
  25. Fearnhead, P.; Prangle, D. Constructing summary statistics for approximate Bayesian computation: Semi-automatic approximate Bayesian computation. J. R. Stat. Soc. B 2012, 74, 419–474.
  26. Joyce, P.; Marjoram, P. Approximately sufficient statistics and Bayesian computation. Stat. Appl. Genet. Mol. Biol. 2008, 7, 26.
  27. Burr, T.; Skurikhin, A. Selecting summary statistics in approximate Bayesian computation for calibrating stochastic models. Biomed. Res. Int. 2013, 2013, 210646.
  28. Blum, M.; Nunes, M.; Prangle, D.; Sisson, S. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat. Sci. 2013, 28, 189–208.
  29. Nunes, M.; Prangle, D. abctools: An R package for tuning approximate Bayesian computation analyses. R. J. 2016, 7, 189–205.
  30. Efron, B.; Hastie, T. Computer Age Statistical Inference; Cambridge University Press: Cambridge, UK, 2016.
  31. Burr, T.; Martin, K.; Norman, C.; Zhao, K. Analysis of variance for item differences in verification data with unknown groups. Sci. Technol. Nucl. Install. 2019, 2019, 1769149.
Figure 1. Simulated verification measurement data with $\delta_S = \delta_R = 0.01$. The relative difference $d = (o - i)/o$ is plotted for each of 10 paired (o, i) measurements in each of three groups, for a total of 30 relative differences. The mean relative difference within each group (inspection period) is indicated by a horizontal line through the respective group means of the paired relative differences. The 3 sets of 10 d values (multiplied by 100 for ease of reading) are {(−0.52, 1.72, −0.66, −0.15, −0.06, 0.35, 1.70, −0.35, −0.17, 2.35), (−1.14, −2.38, 0.40, −1.12, −1.36, −1.28, −1.67, 0.52, −1.65, −0.03), (−1.12, −1.04, −0.63, −1.19, 0.19, −1.63, 0.07, −0.34, −1.43, −1.50)}.
Figure 2. Example with n = 5 real $d = (o - i)/o$ values in each of the four inspection periods.
Figure 3. The value of k needed such that $P(d \geq k\hat{\delta}_T) \leq 0.05$, with probability at least p = 0.99, versus $\delta_S$ for $\delta_R = 0.01$.
Figure 4. The required value of k for g = 3, 5, or 10, and n = 10 versus $\tau = \sigma_B^2/\sigma_W^2$.
Figure 5. The estimated pdf when $\tau = \delta_B^2/\delta_W^2$ is known and unknown.

Share and Cite

MDPI and ACS Style

Burr, T.; Bonner, E.; Krzysztoszek, K.; Norman, C. Setting Alarm Thresholds in Measurements with Systematic and Random Errors. Stats 2019, 2, 259-271. https://doi.org/10.3390/stats2020020
