# Comparing the Effectiveness of Robust Statistical Estimators of Proficiency Testing Schemes in Outlier Detection

*Standards*

## Abstract


## 1. Introduction

…$_{PT}$, and standard deviation $\sigma_{PT}$ constitute the summary statistics of the population's results. The corresponding performance statistic for the result $X_i$ of each participating laboratory $i$ is the Z-score given by Equation (1), according to EN ISO/IEC 17043:2010 [1].
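Equation (1) itself is missing from this text. The sketch below assumes it has the standard ISO 13528:2015 form $z_i = (X_i - X_{PT})/\sigma_{PT}$. It computes Z-scores and applies the conventional ISO/IEC 17043 interpretation:

- |z| ≤ 2 is satisfactory;
- 2 < |z| < 3 is questionable;
- |z| ≥ 3 is unsatisfactory.

The results, assigned value, and $\sigma_{PT}$ are hypothetical numbers chosen only for illustration.

```python
def z_score(x: float, x_pt: float, sigma_pt: float) -> float:
    """Z-score of one result, assuming the ISO 13528 form (x - x_pt) / sigma_pt."""
    if sigma_pt <= 0:
        raise ValueError("sigma_pt must be positive")
    return (x - x_pt) / sigma_pt


def classify(z: float) -> str:
    """Conventional ISO/IEC 17043 interpretation of a Z-score."""
    a = abs(z)
    if a <= 2.0:
        return "satisfactory"
    if a < 3.0:
        return "questionable"
    return "unsatisfactory"


# Hypothetical 28-day strengths (MPa); x_pt and sigma_pt are assumed values
results = [52.1, 49.8, 50.6, 57.4, 44.9]
x_pt, sigma_pt = 50.5, 1.5
for x in results:
    z = z_score(x, x_pt, sigma_pt)
    print(f"x = {x:5.1f} MPa  z = {z:+.2f}  {classify(z)}")
```

The outlier-detection question studied here is which robust estimates of $X_{PT}$ and $\sigma_{PT}$ should feed this formula.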

## 2. Kernel Density Plots

…$_{ij} = x_{ij} - x_{pt,j}$, where $i$ is the participant and $j$ is the round. The compressive strength at different ages constitutes the most significant tests. For these in particular, the software created the kernel plots from the results of nine moving rounds over the last three years of the PT scheme's operation. The Generalized Reduced Gradient nonlinear regression technique fitted the mix of three normal distributions to each KDP. It adjusted the parameters shown in Table 1 to minimize the distance between the two curves. This processing produced 144 kernel density plots, covering multiple kinds of tests over the long-term operation of the PT scheme. Each was efficiently simulated by a mix of central and contaminating populations.

Figure 1 depicts an example of a kernel density plot of the 28-day strength results of nine rounds. The two contaminating distributions lie to the left and to the right of the central one. The coefficient of determination, $R^2$, between the mix of the three distributions and the kernel density plot is 0.998. The parameter values are as follows: $m_1 = 0.12$ MPa; $s_1 = 1.47$ MPa; $m_2 = 4.54$ MPa; $s_2 = 0.68$ MPa; $m_3 = -3.47$ MPa; $s_3 = 1.18$ MPa; $fr_1 = 0.87$; $fr_2 = 0.024$; $fr_3 = 0.106$; $n_2 = 3.01$; $n_3 = -2.45$.

The average coefficient of determination of the fits over all 144 KDPs is 0.9917, with a standard deviation of 0.0051. This demonstrates that simulating kernel density plots with a sum of three normal distributions is effective.
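The three-component model of Table 1 can be written down explicitly. The sketch below:

- evaluates the mixture density with the parameters reported above;
- recomputes $n_2$ and $n_3$ from their definitions in Table 1;
- checks that the weighted density integrates to one.

It does not reproduce the Generalized Reduced Gradient fit itself. The `r_squared` helper only illustrates how the coefficient of determination between a KDP and the fitted curve could be computed on a common grid. All function names are illustrative.

```python
import math


def normal_pdf(x: float, m: float, s: float) -> float:
    """Density of N(m, s^2) at x."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def mixture_pdf(x: float, m, s, fr) -> float:
    """Weighted sum of the main and two contaminating normal densities."""
    return sum(f * normal_pdf(x, mk, sk) for mk, sk, f in zip(m, s, fr))


def r_squared(observed, fitted) -> float:
    """Coefficient of determination between a KDP and the fitted mixture."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Parameters reported for the 28-day strength example (MPa)
m = (0.12, 4.54, -3.47)
s = (1.47, 0.68, 1.18)
fr = (0.87, 0.024, 0.106)

# Table 1: algebraic distances in s1 units
n2 = (m[1] - m[0]) / s[0]
n3 = (m[2] - m[0]) / s[0]

# Trapezoidal check that the mixture integrates to ~1 on [-10, 10] MPa
step = 0.01
grid = [-10.0 + step * k for k in range(2001)]
dens = [mixture_pdf(x, m, s, fr) for x in grid]
area = sum(0.5 * (dens[k] + dens[k + 1]) * step for k in range(len(grid) - 1))

print(f"n2 = {n2:.2f}, n3 = {n3:.2f}, area = {area:.4f}")
```

Recomputing from the rounded published values gives $n_3 \approx -2.44$ rather than −2.45. The likely cause is that the reported $m$ and $s$ values are themselves rounded.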

…$m_1$ to $m_2$ and to $m_3$ in $s_1$ units, $n_2$ and $n_3$, respectively.

…$n_2$, $n_3$ so that $|n_2| \le |n_3|$ always holds. The data are divided into three regions:

- (a) $|n_3| < 3.5$;
- (b) $3.5 \le |n_3| < 5.5$;
- (c) $|n_3| \ge 5.5$.

The computations give the following distribution over the three regions: (a) 68%; (b) 27.8%; (c) 4.2%.

After rounding $n_2$ and $n_3$ to the nearest integer, the three regions have the following meaning:

- (a) 68% of the rounded values lie in the interval [−3, 3];
- (b) 27.8% of the data have $n_3 = 4, 5$ or $n_3 = -4, -5$;
- (c) only 4.2% of the population has $n_3 \ge 6$ or $n_3 \le -6$.
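The raw |n3| thresholds (3.5, 5.5) correspond to the rounded integer classes (≤ 3, 4 to 5, ≥ 6). A short sketch can check this correspondence.

It assumes that halves are rounded away from zero, which the text does not specify. The helpers `round_half_away`, `region_raw`, and `region_rounded` are hypothetical names.

```python
import math


def round_half_away(x: float) -> int:
    """Round to the nearest integer with halves away from zero (3.5 -> 4, -5.5 -> -6)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def region_raw(n3: float) -> str:
    """Regions defined on the unrounded |n3|."""
    a = abs(n3)
    if a < 3.5:
        return "a"
    if a < 5.5:
        return "b"
    return "c"


def region_rounded(n3: float) -> str:
    """Same regions expressed on the rounded |n3|: <=3, 4-5, >=6."""
    r = abs(round_half_away(n3))
    if r <= 3:
        return "a"
    if r <= 5:
        return "b"
    return "c"


for n3 in (-2.45, 3.49, 3.5, -5.2, 5.5, 7.1):
    print(f"n3 = {n3:+.2f}: rounded {round_half_away(n3):+d}, "
          f"region {region_raw(n3)} / {region_rounded(n3)}")
```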

…$f_2$, $f_3$, and the sum $f_2 + f_3$ of the surfaces of the contaminating populations. In 73.6% of cases, the points lie below the diagonal, meaning that $f_2 \ge f_3$. Thus, for most data, the second population (the one with $|n_2| \le |n_3|$) has a considerably higher fraction than the third.

Figure 3b indicates that $f_2 + f_3 \le 0.20$ in 84.7% of the cases. Additionally, 76.4% of the values of $f_2 + f_3$ lie between 0.05 and 0.20. This interval is therefore the main range of the total fraction of the two secondary distributions.

…$_3$ values range from 2.5% to 4%, while specific surface values range from 3000 cm²/g to 4500 cm²/g. Such a normalizing variable is the coefficient of variation, CV. For each test, CV is computed relative to the average of the assigned values of the included rounds, AvVal, as given by Equation (2).
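Equation (2) is not reproduced in this text. The sketch below assumes the usual definition, $CV_k = 100 \cdot s_k / AvVal$, applied to each of the three fitted distributions.

- The standard deviations are those of the Figure 1 example.
- The nine assigned values are hypothetical.
- `cv_percent` is an illustrative name.

```python
from statistics import mean


def cv_percent(s_k: float, assigned_values) -> float:
    """Assumed form of Equation (2): CV_k = 100 * s_k / AvVal."""
    av_val = mean(assigned_values)
    return 100.0 * s_k / av_val


# Hypothetical assigned values (MPa) of nine rounds; s_k from the Figure 1 example
assigned = [49.2, 50.1, 51.3, 50.8, 49.7, 50.4, 51.0, 49.9, 50.6]
for k, s_k in enumerate((1.47, 0.68, 1.18), start=1):
    cv = cv_percent(s_k, assigned)
    print(f"CV{k} = {cv:.2f}% -> {round(cv)}")
```

With these numbers, the rounded triad is 3, 1, 2, which is one of the combinations listed in the CV frequency table.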

## 3. Model Development Using Monte Carlo Simulation and Initial Simulations

#### 3.1. Monte Carlo Simulation

- number of participating laboratories, $N_{lab}$;
- number of replicate analyses per laboratory, $N_{rep}$;
- repeatability standard deviation, $s_r$;
- number of iterations, $N_{iter}$;
- number of simulations, $N_s$;
- number of bins to create histograms, $N_b$.

…$f_2 + f_3 < 0.10$, and mean values of the Zu% distribution. Because robust estimators are used to detect the outliers, the approach is nonparametric. Therefore, using the entire Zu% distribution instead of its average substantially improves the method presented in [24]. The improved algorithm applies Monte Carlo simulation in the following steps:

- (i) It creates a main normal distribution $D_1$ with mean value $m_1$ and standard deviation $s_1$. It also creates two contaminating distributions, $D_2$ and $D_3$, with mean values $m_2$, $m_3$ and standard deviations $s_2$, $s_3$.
- (ii) The fractions of the contaminating distributions are $fr_2$ and $fr_3$. Depending on these two values, the total distribution can be unimodal, bimodal, or trimodal.
- (iii) The mean values $m_2$ and $m_3$ differ from $m_1$ by integer numbers of standard deviations $s_1$, namely $n_2$ and $n_3$, as shown in Equation (3). In the case of a trimodal distribution, if $n_2 \cdot n_3 > 0$, then $D_2$ and $D_3$ are both on the same side of $D_1$. Otherwise, one is to the left and the other to the right of $D_1$.
$$m_2 = m_1 + n_2 \cdot s_1, \quad m_3 = m_1 + n_3 \cdot s_1, \quad \text{with } |n_2| \le |n_3|,\ n_3 \ge 0, \text{ and } n_2 \ge 0 \text{ or } n_2 < 0 \tag{3}$$
- (iv) From the values of $fr_2$, $fr_3$, $n_2$, and $n_3$, the software calculates Zu%. This is the percentage of results that are unsatisfactory with respect to the normal distribution with mean $m_1$ and standard deviation $s_1$. These values are the initial values. For example, take $m_1 = 0$, $s_1 = 2$, $s_2 = s_3 = 1$, $n_2 = -5$, $n_3 = 7$, $fr_2 = 0.05$, and $fr_3 = 0.05$. Then Zu% = 0.24 (from $D_1$) + 5.0 (from $D_2$) + 5.0 (from $D_3$) = 10.24.
- (v) The algorithm calculates all the estimators of the mean and standard deviation listed in Table 4. It also calculates the Zu% for the absolute values of the four Z-factors presented in the same table, using $N_{lab} = 1000$. For this number of participants, all estimators converge to their final values. As demonstrated in [24], $N_{lab} = 400$ is already adequate for such convergence.
- (vi) The study in [24] found that Z_MADe gave the estimate of unacceptable Z-factors closest to the one based on the main normal distribution of step (iv). Its values for $N_{lab} = 1000$ therefore serve as the reference values, $Z_{ref}$.
- (vii) The simulations implemented all the settings shown in Table 5 for up to 100 participants. The populations correspond to unimodal, bimodal, and trimodal distributions with a total fraction of secondary distributions of at most 0.2.
- (viii) For each mix of the three normal distributions, the software performs $N_{iter}$ iterations and $N_s$ simulations for each $N_{lab}$. It then calculates the differential distribution of each of the four Zu% in at least twenty bins. The algorithm compares these results with the reference value, using the distance of the distribution's points from $Z_{ref}$ given by Equation (4). The Z-factor with the smallest distance from the reference value is optimal.
$$ZM_j = \sum_{i=1}^{N_{max}} \left| ZD_{ij} - Z_{ref} \right| \cdot X_{ij}, \qquad ZM_{opt} = \min_{j = 1, \ldots, 4} ZM_j \tag{4}$$

…$ZM_j$, $X_{ij}$, $ZD_{ij}$, $Z_{ref}$, and $N_{max}$. These are defined as follows:

- $ZM_j$ denotes the mean distance between the distribution of estimator $j$ and $Z_{ref}$;
- $X_{ij}$ is the fraction of the differential distribution at point $i$, whose X-axis value is $ZD_{ij}$;
- $N_{max}$ is the largest $i$ with a non-zero $X_{ij}$.

The best estimator is the one whose index $j$ gives the minimum $ZM_j$. If the ZM value of another estimator differs by less than 1% from the minimum ZM, that estimator is also considered optimal. Figure 4 illustrates this method of finding the most suitable estimator with the following model parameters: $N_{lab} = 20$; $s_1 = 2$; $n_2 = -2$; $n_3 = 5$; $s_2 = 1$; $s_3 = 1$; $fr_2 = 0.1$; $fr_3 = 0.05$.
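Step (iv) and the selection rule of Equation (4) can be sketched in Python.

- **Initial Zu%.** A result counts as unsatisfactory when it falls outside $m_1 \pm 3s_1$ (|Z| ≥ 3). This reproduces the worked example of step (iv): the first `print` outputs `Zu% = 10.24`.
- **Selection.** The differential distributions and $Z_{ref}$ in the second part are hypothetical. The function names are illustrative, not the author's software.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def initial_zu(m1, s1, n2, n3, s2, s3, fr2, fr3, limit=3.0):
    """Step (iv): percentage of the mixture outside m1 +/- limit * s1."""
    lo, hi = m1 - limit * s1, m1 + limit * s1
    components = [(m1, s1, 1.0 - fr2 - fr3),    # D1
                  (m1 + n2 * s1, s2, fr2),      # D2, Equation (3)
                  (m1 + n3 * s1, s3, fr3)]      # D3, Equation (3)
    share = 0.0
    for m, s, fr in components:
        outside = norm_cdf((lo - m) / s) + (1.0 - norm_cdf((hi - m) / s))
        share += fr * outside
    return 100.0 * share


def mean_distance(zd, x, z_ref):
    """Equation (4): ZM_j = sum_i |ZD_ij - Z_ref| * X_ij over non-zero X_ij."""
    return sum(abs(d - z_ref) * f for d, f in zip(zd, x) if f > 0)


def optimal_estimators(zm, tol=0.01):
    """Estimators whose ZM is the minimum or within 1% of it."""
    best = min(zm.values())
    return sorted(k for k, v in zm.items() if v == best or v - best < tol * best)


print(f"Zu% = {initial_zu(0, 2, -5, 7, 1, 1, 0.05, 0.05):.2f}")

# Hypothetical differential distributions (N_lab = 20, so ZD moves in 5% steps)
zd = [0, 5, 10, 15, 20]
dists = {"Z_MADe": [0.05, 0.30, 0.45, 0.15, 0.05],
         "Z_nIQR": [0.10, 0.35, 0.35, 0.15, 0.05],
         "Z_A":    [0.15, 0.40, 0.35, 0.10, 0.00],
         "Z_Hamp": [0.05, 0.25, 0.50, 0.15, 0.05]}
z_ref = 10.0   # hypothetical reference value
zm = {k: mean_distance(zd, x, z_ref) for k, x in dists.items()}
print(zm, optimal_estimators(zm))
```

Weighting each distance by $X_{ij}$ lets a wide or skewed distribution be penalized even when its mean lies close to $Z_{ref}$.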

#### 3.2. Initial Simulations

…$N_b$ bins, using the $N_{iter} \cdot N_s$ results of each statistic. These histograms were designed to investigate the symmetry of the distributions. They are highly reliable because they incorporate up to 25,000 simulations ($N_{iter} = 1000$ times up to $N_s = 25$).

Figure 5 depicts such an example for MED and MADe. Both distributions are asymmetric, which indicates that this skewness should be explored further. The skewness metric must be nonparametric, because the approach used to detect outliers is nonparametric. Additionally, all studied distributions are unimodal, sometimes with only a few frequency fractions above 0.01.

In this study, we implement the simple formula proposed by Groeneveld and Meeden [38], which uses the mean value, μ, and the median value, ν, of the distribution. The measure of dispersion in this formula, E(|X−ν|), is the mean of the absolute differences between the distribution values and the median. In this way, we avoid the more complicated formula described in many textbooks [39], which involves second- and third-order moments. Equation (5) expresses the selected skewness measure, SK.
$$SK = \frac{\mu - \nu}{E(|X - \nu|)} \tag{5}$$

…$_i$, for $i = 1$ to $N_b$, with the average $X_i$ of each interval of the histogram X-axis, using Equation (6). The median value, ν, was calculated by linear interpolation between the points $X_p$ and $X_{p+1}$, where the cumulative fractions satisfy $f_p \le 0.5$ and $f_{p+1} \ge 0.5$. Finally, Equation (7) provides the estimate of E(|X−ν|).
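The histogram-based calculation can be sketched in Python, implementing the Groeneveld and Meeden measure $SK = (\mu - \nu)/E(|X - \nu|)$.

Equations (6) and (7) are not reproduced here, so the following details are assumptions:

- bin fractions are normalized to sum to one;
- the cumulative fraction of each bin is assigned to its centre ($\sum_{k<i} f_k + f_i/2$) before the linear interpolation for ν.

The histogram values are hypothetical.

```python
def histogram_skewness(x, f):
    """Groeneveld-Meeden skewness SK = (mu - nu) / E|X - nu| from a histogram.

    x: bin averages X_i; f: bin fractions (normalised here to sum to 1).
    Returns (mu, nu, SK).
    """
    total = sum(f)
    f = [fi / total for fi in f]
    mu = sum(fi * xi for fi, xi in zip(f, x))

    # Assumption: cumulative fraction assigned to each bin centre
    cum, running = [], 0.0
    for fi in f:
        cum.append(running + fi / 2.0)
        running += fi

    # Median by linear interpolation between the X_p and X_p+1 that bracket 0.5
    nu = x[0]
    for p in range(len(x) - 1):
        if cum[p] <= 0.5 <= cum[p + 1]:
            span = cum[p + 1] - cum[p]
            nu = x[p] if span == 0 else x[p] + (0.5 - cum[p]) * (x[p + 1] - x[p]) / span
            break

    mad = sum(fi * abs(xi - nu) for fi, xi in zip(f, x))   # E(|X - nu|)
    sk = (mu - nu) / mad if mad > 0 else 0.0
    return mu, nu, sk


bins = [1, 2, 3, 4, 5]
right = [0.40, 0.30, 0.15, 0.10, 0.05]      # hypothetical right-skewed histogram
for name, freq in (("right", right), ("mirrored", right[::-1])):
    mu, nu, sk = histogram_skewness(bins, freq)
    print(f"{name:8s} mu = {mu:.3f}  nu = {nu:.3f}  SK = {sk:+.3f}")
```

Since |E(X − ν)| ≤ E|X − ν| for any ν, SK always lies in [−1, 1]. A symmetric histogram gives SK = 0, and its mirror image flips the sign.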

…$N_{lab} = 10, 20, 40, 80$; $s_1 = 2$; $n_3 = 7$; $n_2 = -7$ to $+7$; $s_2 = 1$; $s_3 = 1$; $fr_2 = 0.1$; $fr_3 = 0.05$, from which one can conclude the following:

- (i) The distributions of the mean estimators are consistently positively skewed because of the assumption $|n_2| < |n_3|$. If $n_3 = -7$, under the same condition between $n_2$ and $n_3$, the distributions are mirror images of the ones shown, with μ < ν and SK < 0.
- (ii) We classified SK values into four regions: (a) $0 \le |SK| < 0.2$; (b) $0.2 \le |SK| < 0.4$; (c) $0.4 \le |SK| < 0.7$; (d) $0.7 \le |SK| \le 1$. As a rough guide, the departure from symmetry is low if |SK| < 0.2 [41]. The skew is moderate for |SK| values in (b) and high for values in (c). Finally, the skew is very high for |SK| ≥ 0.7.
- (iii) The skewness of the mean estimators increases as $n_2$ increases from −7 to 7, and decreases as the number of participants increases. The distributions are highly symmetric for $N_{lab} \ge 40$ and $n_2 \le 0$.
- (iv) For the same $N_{lab}$ and $n_2$, the estimators of the mean are less skewed than those of the standard deviation. The distributions of the latter are most symmetric for $N_{lab} \ge 40$ and $n_2$ between −1 and 1.
- (v) For low and high $n_2$ values, the skewness of the standard deviation estimators sometimes becomes very high.
- (vi) The asymmetry of both kinds of distributions indicates where the search for the best outlier-detection method should focus: the outlier distribution of each performance statistic listed in Table 4.

## 4. Optimal Robust Estimators for Outlier Detection

#### 4.1. Shape of the Outliers Distribution

…$N_{lab}$, each value ZD on the X-axis corresponds to an integer number of unsatisfactory results. For example, for $N_{lab} = 40$ and ZD = 5%, the number of outliers is 5/100 · 40 = 2. Figure 8 illustrates this using the four performance statistics of Table 4 and six sets of model parameters.

…$Z_{ref}$. Figure 8 shows the mean distances ZM between each performance estimator and $Z_{ref}$. Z_Hamp is the most effective estimator in the examples presented. An intriguing case appears in Figure 8b: the Z_Hamp and Z_A estimators are both optimal despite their distinct distributions. This results from using the objective criterion of the mean distance of the distribution from $Z_{ref}$, which is particularly effective for strongly non-normal distributions.

#### 4.2. Implementation of the Simulator

…$s_1$, $s_2$, and $s_3$ provides 4·3·2 = 24 triads. The simulation uses 12 of the 24, denoted $[s_1, s_2, s_3]$ and written compactly as the three integer standard deviations in sequence. For example, $[s_1, s_2, s_3] = 111$ means that $s_1 = 1$, $s_2 = 1$, and $s_3 = 1$. The applied $[s_1, s_2, s_3]$ set is: 111, 211, 212, 222, 311, 321, 312, 322, 421, 412, 422, 432.

The combination of $fr_2$ and $fr_3$ provides ten $[fr_2, fr_3]$ couples: [0, 0.05], [0, 0.10], [0, 0.15], [0, 0.20], [0.05, 0.05], [0.05, 0.10], [0.05, 0.15], [0.10, 0.05], [0.10, 0.10], [0.15, 0.05]. The simulator used all these combinations to obtain various unimodal, bimodal, and trimodal distributions.

The maximum selected value of $n_3$, $n_{3MAX}$, depends on $s_1$ as follows: (a) for $s_1 = 1, 2$, $n_{3MAX} = 8$; (b) for $s_1 = 3$, $n_{3MAX} = 6$; (c) for $s_1 = 4$, $n_{3MAX} = 5$.
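A short enumeration of the simulation grid confirms the counts:

- 24 possible triads, of which 12 are used;
- 10 fraction couples;
- therefore 12 × 10 = 120 combinations, the number of matrices mentioned in Section 4.3.

The script prints `24 12 10 120`. Variable names are illustrative.

```python
from itertools import product

S1_VALUES, S2_VALUES, S3_VALUES = (1, 2, 3, 4), (1, 2, 3), (1, 2)
all_triads = list(product(S1_VALUES, S2_VALUES, S3_VALUES))   # 4*3*2 = 24

# The 12 triads used, written as [s1, s2, s3] digit strings
USED = ["111", "211", "212", "222", "311", "321",
        "312", "322", "421", "412", "422", "432"]
used_triads = [tuple(int(c) for c in code) for code in USED]

# The ten [fr2, fr3] couples
FR_COUPLES = [(0, 0.05), (0, 0.10), (0, 0.15), (0, 0.20),
              (0.05, 0.05), (0.05, 0.10), (0.05, 0.15),
              (0.10, 0.05), (0.10, 0.10), (0.15, 0.05)]

# Maximum n3 as a function of s1
N3_MAX = {1: 8, 2: 8, 3: 6, 4: 5}

scenarios = [(t, c) for t in used_triads for c in FR_COUPLES]
print(len(all_triads), len(used_triads), len(FR_COUPLES), len(scenarios))
```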

…$[s_1, s_2, s_3] = 222$. The main conclusions from these results are the following:

- (i) For $n_3 \le 3$ and $|n_2| \le 3$, i.e., for relatively small differences between the mean values of the contaminating and central distributions, Z_Hamp is almost always the best estimator.
- (ii) Considering all combinations of $[s_1, s_2, s_3]$ and $[fr_2, fr_3]$, Z_Hamp is almost always the best in this range of $n_2$, $n_3$. Therefore, when at most half of each contaminating population lies outside the central distribution, to its left or right, there is only one optimal estimator.
- (iii) For higher values of $n_3$ and $|n_2|$, the most suitable estimator depends on (a) the number of participants and (b) the distribution of the results, expressed by $f_2$, $f_3$, $n_2$, and $n_3$.

#### 4.3. Simulation Results

…$[s_1, s_2, s_3] \times [fr_2, fr_3]$, resulting in 120 matrices similar to those in Figure A1 and Figure A2. These matrices constitute the exact solution of the optimization problem under the criterion of Equation (4). Instead of this exact solution, we chose a practical, sub-optimal one by grouping and classifying the parameters as follows:

- (i) The algorithm first generates two groups: (a) a small population of participants, $N_{LOW}$, with $N_{lab}$ = 10, 15, 20, and 30; (b) a large one, $N_{HIGH}$, with $N_{lab}$ = 40, 60, 80, and 100.
- (ii) It then creates at most seven regions of $n_2$, keeping $|n_2| < n_3$: (a) $-8 \le n_2 \le -7$; (b) $-6 \le n_2 \le -5$; (c) $-4 \le n_2 \le -3$; (d) $-2 \le n_2 \le 2$; (e) $3 \le n_2 \le 4$; (f) $5 \le n_2 \le 6$; (g) $7 \le n_2 \le 8$. There are seven regions when $n_{3MAX}$ is 8, but five when $n_{3MAX}$ is smaller.
- (iii) Afterwards, it counts the occurrences of each estimator as optimal in each region and calculates their percentages.
- (iv) The optimal estimators are the one with the highest percentage and any others whose percentage of appearance is within 10% of the maximum.
- (v) Finally, it creates tables with the results per $n_3$ and $[s_1, s_2, s_3]$.
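Steps (i) to (iv) can be sketched in Python under the following assumptions:

- the per-cell winners are hypothetical;
- each cell reports a single winner;
- "within 10%" is read as 10 percentage points, since the text does not state whether the margin is absolute or relative.

`group_of`, `n2_zone`, and `optimal_in_zone` are illustrative names.

```python
from collections import Counter

N_LOW = {10, 15, 20, 30}
N_HIGH = {40, 60, 80, 100}
ZONES = [(-8, -7, "a"), (-6, -5, "b"), (-4, -3, "c"), (-2, 2, "d"),
         (3, 4, "e"), (5, 6, "f"), (7, 8, "g")]


def group_of(n_lab: int) -> str:
    if n_lab in N_LOW:
        return "N_LOW"
    if n_lab in N_HIGH:
        return "N_HIGH"
    raise ValueError(f"N_lab = {n_lab} not simulated")


def n2_zone(n2: int) -> str:
    for lo, hi, name in ZONES:
        if lo <= n2 <= hi:
            return name
    raise ValueError(f"n2 = {n2} outside the simulated range")


def optimal_in_zone(winners, margin: float = 10.0):
    """Steps (iii)-(iv): keep estimators within `margin` points of the top share."""
    counts = Counter(winners)
    total = sum(counts.values())
    pct = {k: 100.0 * v / total for k, v in counts.items()}
    top = max(pct.values())
    return sorted(k for k, v in pct.items() if top - v <= margin)


# Hypothetical per-cell winners for one n3 and one [s1, s2, s3]: (N_lab, n2, estimator)
cells = [(10, -2, "Z_Hamp"), (15, 0, "Z_Hamp"), (20, 1, "Z_A"), (30, 2, "Z_Hamp"),
         (40, 5, "Z_MADe"), (60, 6, "Z_MADe"), (80, 5, "Z_A"), (100, 6, "Z_MADe")]

by_region = {}
for n_lab, n2, winner in cells:
    by_region.setdefault((group_of(n_lab), n2_zone(n2)), []).append(winner)

for key in sorted(by_region):
    print(key, optimal_in_zone(by_region[key]))
```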

…$[s_1, s_2, s_3]$ = 111, 211, 212, 222 and $7 \le n_3 \le 8$, $5 \le n_3 \le 6$, $n_3 = 4$. Appendix B contains all the remaining results for $s_1 = 3$ and 4. We do not show the results for $n_3 = 2, 3$: there Z_Hamp is always far superior to the second-best estimator, i.e., Z_Hamp is the best. In Figure 9, the four estimators are numbered and color-coded. Each $[s_1, s_2, s_3]$ has two columns because two optimal estimators exist for some parameter sets.

- (i) …
- (ii) For the same $n_3$, the distribution of the most accurate performance statistics as a function of $n_2$ is not symmetric about the center. For all $n_3$, the estimators for $-8 \le n_2 \le -7$ and $-6 \le n_2 \le -5$ differ significantly from those for $7 \le n_2 \le 8$ and $5 \le n_2 \le 6$. More symmetric patterns appear in the zones $-4 \le n_2 \le -3$ and $3 \le n_2 \le 4$.
- (iii) With the same model parameters, the results for the two groups $N_{LOW}$ and $N_{HIGH}$ are similar, but not in all cases. For example, for $7 \le n_3 \le 8$ and $-8 \le n_2 \le -7$, Z_Hamp appears as the optimal performance statistic much more often in the $N_{HIGH}$ group than in $N_{LOW}$. In contrast, this estimator is found significantly more often in $N_{LOW}$ than in $N_{HIGH}$ when $n_3 = 4$.
- (iv) The selection of the most appropriate estimator is not very sensitive to the choice of the mixture of standard deviations, $[s_1, s_2, s_3]$. Comparing Figure 10, Figure 11, Figure A3, and Figure A4 shows that in many cases, the optimal statistic is the same for the same group of labs, $n_3$, $n_2$, $f_2$, and $f_3$. Hence, the impact of $[s_1, s_2, s_3]$ is weaker than that of the other parameters.
- (v) Z_Hamp is most often optimal when the contaminating fraction $f_3$ is 0.05 and $-2 \le n_2 \le 2$, so that the distribution diverges relatively little from a bimodal one. However, this rule is not absolute. Figure A4 demonstrates that Z_A is the most suitable for the $N_{HIGH}$ group, $n_3 = 4$, and $s_1 = 3$ or 4.
- (vi) In some cases, there are extended zones where one estimator outperforms the others, which increases the robustness of the suggested solution. For example, in Figure 10 and Figure A3, the first choice for $5 \le n_2 \le 6$ is Z_MADe. In Figure 10, Figure 11, Figure A3, and Figure A4, the correct selection for $-2 \le n_2 \le 2$ and $f_3 = 0.10$ is Z_A.
- (vii) For the $N_{LOW}$ and $N_{HIGH}$ groups, Figure 12 illustrates a rough tendency of the preferred statistics as a function of the zones of $n_2$, $n_3$, and $f_3$. For each region, the algorithm counts the occurrences of each estimator in the maps depicted in Figure 9, Figure 10, Figure 11, Figure A3, and Figure A4. It calculates their percentages and considers as optimal those statistics that are up to 10% below the maximum.
- (viii) The results of Figure 12 are only a rough guide to choosing the most appropriate estimator. They demonstrate that there is no unique solution and that the best selection depends on the data distribution.

…$N_{lab}$ is small. The next step is to normalize the parameters of the fitted normal distributions with respect to the mean value of the central one, using Equation (8).

…$m_{i,Act}$ and $s_{i,Act}$, for $i = 1, 2, 3$, denote the mean value and the standard deviation of the three normal distributions simulating the kernel density plot. The parameters $fr_1$, $fr_2$, $fr_3$, $n_2$, and $n_3$ in Table 1 are independent of the normalization.

…$n_2$, $n_3$, and the three standard deviations to the nearest integer follows. Likewise, $fr_2$ and $fr_3$ are rounded to the nearest multiple of 0.05. The next step checks whether all parameters lie within the ranges given in Figure 9, Figure 10, Figure 11, Figure A3, and Figure A4.

- If $n_3 < 0$ and $|n_3| > |n_2|$, the symmetric case obtained by changing the signs of $n_2$ and $n_3$ yields the same optimal estimator.
- If the check is positive, the estimator shown in the figures is selected.
- If $f_2 + f_3 \ge 0.25$ but $|n_2| \le 3$ and $|n_3| \le 3$, Z_Hamp is the most appropriate selection, as already shown in Section 4.2. Otherwise, the figures do not apply to such a high percentage of contaminating populations.
- The figures likewise do not apply if $|n_3| > n_{3MAX}$, as defined in Section 4.2.
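The decision steps above can be sketched as a small function. This is an illustration under stated assumptions, not the author's implementation:

- parameters are taken as already normalized by Equation (8), which is not reproduced here;
- halves are rounded away from zero;
- an out-of-range $s_1$ is clamped to the nearest simulated value, one reading of "closest to the actual ones";
- `lookup` is a hypothetical callback standing in for reading Figures 9 to 11, A3, and A4;
- the reordering to $|n_2| \le |n_3|$ follows the convention of Section 2.

```python
import math

N3_MAX = {1: 8, 2: 8, 3: 6, 4: 5}


def round_half_away(x: float) -> int:
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def round_to_step(x: float, step: float = 0.05) -> float:
    """Round a fraction to the nearest multiple of `step`."""
    return round(round_half_away(x / step) * step, 2)


def select_estimator(n2, n3, fr2, fr3, s1, lookup):
    """Return an estimator name, or None when the figures are not applicable."""
    if abs(n2) > abs(n3):                       # keep |n2| <= |n3|
        n2, n3, fr2, fr3 = n3, n2, fr3, fr2
    n2, n3 = round_half_away(n2), round_half_away(n3)
    fr2, fr3 = round_to_step(fr2), round_to_step(fr3)
    s1 = min(max(round_half_away(s1), 1), 4)    # assumption: nearest simulated s1
    if n3 < 0:                                  # mirror case: same estimator
        n2, n3 = -n2, -n3
    if abs(n2) <= 3 and abs(n3) <= 3:
        return "Z_Hamp"
    if fr2 + fr3 >= 0.25 - 1e-9 or n3 > N3_MAX[s1]:
        return None
    return lookup(n2, n3, fr2, fr3, s1)


def hypothetical_lookup(n2, n3, fr2, fr3, s1):
    # Placeholder for reading the figures; NOT the paper's results
    return "Z_A" if -2 <= n2 <= 2 and fr3 >= 0.10 else "Z_MADe"


print(select_estimator(3.01, -2.45, 0.024, 0.106, 2, hypothetical_lookup))
print(select_estimator(-2.2, 5.3, 0.08, 0.11, 2.1, hypothetical_lookup))
print(select_estimator(1.0, 6.6, 0.05, 0.05, 3.8, hypothetical_lookup))
```

The first call reuses the $n$ and $fr$ values of the Figure 1 example. After reordering and rounding, they give $n_2 = -2$ and $n_3 = 3$, so Z_Hamp is returned without consulting the figures. The $s_1$ arguments are arbitrary illustrative values.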

…$[s_1, s_2, s_3]$ sets are not present in the simulation results, we selected from the figures the $[s_1, s_2, s_3]$ closest to the actual ones. The group chosen is always $N_{LOW}$, as 11–14 labs participate in each test. Table 6 summarizes the best estimators for detecting outliers for model parameters computed from the actual results of multiple tests.

…$|n_3| \ge 6$ shows that participants try to improve their performance and that the PT scheme is well organized and mature. Only in 4.9% of the 144 cases are the parameters of the kernel density plots outside the studied range. Z_Hamp significantly outperforms the other estimators when applied to the multiple result distributions of a well-performing PT scheme. The conclusion is not general. However, one can assume that in continuous and mature PT schemes with a limited number of participants, the Z_Hamp estimator is a suitable choice.
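As a quick check of Table 6, the "All $n_3$" column is, to rounding, the weighted average of the three region columns. The weights are the shares of the $|n_3|$ regions (68%, 27.8%, 4.2%). A short Python sketch reproduces it; the dictionary keys are illustrative labels for the table's regions.

```python
# Share of each |n3| region (%), last row of Table 6
REGION_WEIGHTS = {"le3": 68.0, "4to5": 27.8, "ge6": 4.2}

# Per-region percentages of each outcome, Table 6
TABLE6 = {
    "Z_MADe": {"le3": 0.0, "4to5": 2.5, "ge6": 0.0},
    "Z_nIQR": {"le3": 0.0, "4to5": 0.0, "ge6": 0.0},
    "Z_A": {"le3": 0.0, "4to5": 30.0, "ge6": 0.0},
    "Z_Hamp": {"le3": 100.0, "4to5": 55.0, "ge6": 66.7},
    "Not applicable": {"le3": 0.0, "4to5": 12.5, "ge6": 33.3},
}


def overall(row):
    """Region-weighted average, in percent."""
    return sum(row[r] * w for r, w in REGION_WEIGHTS.items()) / 100.0


for name, row in TABLE6.items():
    print(f"{name:15s} {overall(row):5.1f}")
```

The printed values (0.7, 0.0, 8.3, 86.1, 4.9) match the last column of Table 6.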

## 5. Conclusions

…$Z_{ref}$. The estimator with the minimum value is the best.

- direct use of the kernel density plots in determining the best statistic;
- estimators' comparisons for Z-factors with absolute values between two and three.


## Appendix A

…$[s_1, s_2, s_3] = 222$. The algorithm also calculates the sum per estimator and $n_3$ value. If a cell contains multiple findings, its color is that of the highest-ranking estimator in the row.

## Appendix B

…$N_{LOW}$, $N_{HIGH}$, $[s_1, s_2, s_3]$ = 311, 321, 312, 322, 421, 412, 422, 432 and $5 \le n_3 \le 6$, $n_3 = 4$.

**Figure A3.** Best estimators for $5 \le n_3 \le 6$ and $[s_1, s_2, s_3]$ = 311, 321, 312, 322, 421, 412, 422, 432.

**Figure A4.** Best estimators for $n_3 = 4$ and $[s_1, s_2, s_3]$ = 311, 321, 312, 322, 421, 412, 422, 432.

## References

- EN ISO/IEC 17043:2010; Conformity Assessment—General Requirements for Proficiency Testing. CEN Management Centre: Brussels, Belgium, 2010; pp. 2–3, 8–9, 30–33.
- Hampel, F.R.; Ronchetti, E.M.; Rousseeuw, P.J.; Stahel, W.A. Robust Statistics: The Approach Based on Influence Functions; John Wiley & Sons, Inc.: New York, NY, USA, 1986.
- Huber, P.J.; Ronchetti, E.M. Robust Statistics, 2nd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2009.
- Wilcox, R. Introduction to Robust Estimation and Hypothesis Testing, 3rd ed.; Elsevier, Inc.: Waltham, MA, USA, 2013.
- Maronna, R.A.; Martin, R.D.; Yohai, V.J.; Salibián-Barrera, M. Robust Statistics: Theory and Methods (with R), 2nd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2018.
- Hund, E.; Massart, D.L.; Smeyers-Verbeke, J. Inter-laboratory Studies in Analytical Chemistry. Anal. Chim. Acta **2000**, 424, 145–165.
- Daszykowski, M.; Kaczmarek, K.; Vander Heyden, Y.; Walczak, B. Robust statistics in data analysis—A review: Basic concepts. Chemom. Intell. Lab. Syst. **2007**, 85, 203–219.
- Shevlyakov, G. Highly Efficient Robust and Stable M-Estimates of Location. Mathematics **2021**, 9, 105.
- Ghosh, I.; Fleming, K. On the Robustness and Sensitivity of Several Nonparametric Estimators via the Influence Curve Measure: A Brief Study. Mathematics **2022**, 10, 3100.
- Zimek, A.; Filzmoser, P. There and back again: Outlier detection between statistical reasoning and data mining algorithms. WIREs Data Min. Knowl. Discov. **2018**, 8, e1280.
- Roelant, E.; Van Aelst, S.; Willems, G. The minimum weighted covariance determinant estimator. Metrika **2009**, 70, 177–204.
- Cerioli, A. Multivariate outlier detection with high-breakdown estimators. J. Am. Stat. Assoc. **2010**, 105, 147–156.
- Rousseeuw, P.J.; Leroy, A.M. Robust Regression and Outlier Detection; John Wiley & Sons, Inc.: New York, NY, USA, 1987.
- Kalina, J.; Tichavský, J. On Robust Estimation of Error Variance in (Highly) Robust Regression. Meas. Sci. Rev. **2020**, 20, 6–14.
- ISO 13528:2015; Statistical Methods for Use in Proficiency Testing by Interlaboratory Comparison, 2nd ed.; ISO: Geneva, Switzerland, 2015; pp. 10–12, 32–33, 44–51, 52–62.
- ISO 5725-2:1994; Accuracy (Trueness and Precision) of Measurement Methods and Results—Part 2: Basic Method for the Determination of Repeatability and Reproducibility of a Standard Measurement Method, 1st ed.; ISO: Geneva, Switzerland, 1994; pp. 10–14, 21–22.
- ISO 5725-5:1998; Accuracy (Trueness and Precision) of Measurement Methods and Results—Part 5: Alternative Methods for the Determination of the Precision of a Standard Measurement Method, 1st ed.; ISO: Geneva, Switzerland, 1998; pp. 35–36.
- Rosário, P.; Martínez, J.L.; Silván, J.M. Evaluation of Proficiency Test Data by Different Statistical Methods Comparison. In Proceedings of the First International Proficiency Testing Conference, Sinaia, Romania, 11–13 October 2007.
- Srnková, J.; Zbíral, J. Comparison of different approaches to the statistical evaluation of proficiency tests. Accredit. Qual. Assur. **2009**, 14, 373–378.
- Tripathy, S.S.; Saxena, R.K.; Gupta, P.K. Comparison of Statistical Methods for Outlier Detection in Proficiency Testing Data on Analysis of Lead in Aqueous Solution. Am. J. Theor. Appl. Stat. **2013**, 2, 233–242.
- Skrzypczak, I.; Lesniak, A.; Ochab, P.; Górka, M.; Kokoszka, W.; Sikora, A. Interlaboratory Comparative Tests in Ready-Mixed Concrete Quality Assessment. Materials **2021**, 14, 3475.
- De Oliveira, C.C.; Tiglea, P.; Olivieri, J.C.; Carvalho, M.; Buzzo, M.L.; Sakuma, A.M.; Duran, M.C.; Caruso, M.; Granato, D. Comparison of Different Statistical Approaches Used to Evaluate the Performance of Participants in a Proficiency Testing Program. Available online: https://www.researchgate.net/publication/290293736_Comparison_of_different_statistical_approaches_used_to_evaluate_the_performance_of_participants_in_a_proficiency_testing_program (accessed on 2 November 2022).
- Kojima, I.; Kakita, K. Comparative Study of Robustness of Statistical Methods for Laboratory Proficiency Testing. Anal. Sci. **2014**, 30, 1165–1168.
- Tsamatsoulis, D. Comparing the Robustness of Statistical Estimators of Proficiency Testing Schemes for a Limited Number of Participants. Computation **2022**, 10, 44.
- Yohai, V. High Break-Down Point and High Efficiency Robust Estimates for Regression. Ann. Stat. **1987**, 15, 642–656.
- Gervini, D.; Yohai, V. A Class of Robust and Fully Efficient Regression Estimators. Ann. Stat. **2002**, 30, 583–616.
- Pitselis, G. A Review on Robust Estimators Applied to Regression Credibility. J. Comput. Appl. Math. **2013**, 239, 231–249.
- Yu, C.; Yao, W.; Bai, X. Robust Linear Regression: A Review and Comparison. Available online: https://arxiv.org/abs/1404.6274 (accessed on 4 November 2022).
- Kong, D.; Bondell, H.D.; Wu, Y. Fully Efficient Robust Estimation Outlier Detection and Variable Selection via penalized Regression. Stat. Sin. **2018**, 28, 1031–1052. Available online: https://www3.stat.sinica.edu.tw/statistica/oldpdf/A28n222.pdf (accessed on 2 March 2023).
- Marazzi, A. Improving the Efficiency of Robust Estimators for the Generalized Linear Model. Stats **2021**, 4, 88–107.
- EN 197-1:2011; Cement. Part 1: Composition, Specifications and Conformity Criteria for Common Cements. CEN Management Centre: Brussels, Belgium, 2011.
- Stancu, C.; Michalak, J. Interlaboratory Comparison as a Source of Information for the Product Evaluation Process. Case Study of Ceramic Tiles Adhesives. Materials **2022**, 15, 253.
- Humbert, P.; Le Bars, B.; Minvielle, L.; Vayatis, N. Robust Kernel Density Estimation with Median-of-Means principle. Available online: https://arxiv.org/pdf/2006.16590.pdf (accessed on 2 March 2023).
- Gallego, J.A.; González, F.A.; Nasraoui, O. Robust kernels for robust location estimation. Neurocomputing **2021**, 429, 174–186.
- EN 196-6:2010; Methods of Testing Cement—Part 6: Determination of Fineness. CEN Management Centre: Brussels, Belgium, 2010.
- EN 196-2:2013; Methods of Testing Cement—Part 2: Chemical Analysis of Cement. CEN Management Centre: Brussels, Belgium, 2013.
- EN 196-1:2016; Methods of Testing Cement—Part 1: Determination of Strength. CEN Management Centre: Brussels, Belgium, 2016.
- Groeneveld, R.A.; Meeden, G. Measuring Skewness and Kurtosis. J. R. Stat. Soc. Ser. D **1984**, 33, 391–399.
- Bulmer, M.G. Principles of Statistics, 3rd ed.; Dover Publications, Inc.: New York, NY, USA, 1979; pp. 61–63.
- Hotelling, H.; Solomons, L.M. The Limits of a Measure of Skewness. Ann. Math. Statist. **1932**, 3, 141–142.
- Nonparametric Skew. Available online: https://en.wikipedia.org/wiki/Nonparametric_skew (accessed on 14 November 2022).
- Vapnik, V.N. Robust Statistics: Statistical Learning Theory; John Wiley & Sons, Inc.: New York, NY, USA, 1998.

**Figure 1.** Kernel density plot and simulation with a mix of three normal distributions. Dashed lines: main and contaminating groups; solid line: total population.

**Figure 3.** Fraction of contaminating populations: (**a**) Full set of $f_2$, $f_3$; (**b**) Distribution of $f_2 + f_3$.

**Figure 4.** Illustration of the method for calculating the optimal estimator: (**a**) Frequency distributions of the outliers detected by the four estimators; (**b**) Average distances and optimal estimator.

**Figure 5.** Example distributions of robust estimators used in calculating the best estimator: (**a**) MED; (**b**) MADe. Model parameters: $N_{lab} = 40$; $s_1 = 2$; $n_2 = 6$; $n_3 = 7$; $s_2 = 1$; $s_3 = 1$; $fr_2 = 0.1$; $fr_3 = 0.05$.

**Figure 7.**Nonparametric skewness values of the robust estimators of population’s standard deviation.

**Figure 8.** Distribution of outliers using the performance statistics of Table 4 and parameters: $s_1 = 2$; $s_2 = 1$; $s_3 = 1$; $fr_2 = 0.1$; $fr_3 = 0.05$; and (**a**) $N_{lab} = 20$, $n_2 = -3$, $n_3 = 4$; (**b**) $N_{lab} = 40$, $n_2 = -3$, $n_3 = 4$; (**c**) $N_{lab} = 20$, $n_2 = -2$, $n_3 = 5$; (**d**) $N_{lab} = 40$, $n_2 = -2$, $n_3 = 5$; (**e**) $N_{lab} = 20$, $n_2 = -2$, $n_3 = 6$; (**f**) $N_{lab} = 40$, $n_2 = -2$, $n_3 = 6$.

**Figure 12.** Trend maps of robust performance estimators as a function of the $n_3$, $n_2$, and $f_3$ regions.

Parameter | Symbol |
---|---|
Main distribution mean value | $m_1$ |
Main distribution standard deviation | $s_1$ |
Second distribution mean value | $m_2$ |
Second distribution standard deviation | $s_2$ |
Third distribution mean value | $m_3$ |
Third distribution standard deviation | $s_3$ |
Fraction of surface of the main distribution | $fr_1$ |
Fraction of surface of the second distribution | $fr_2$ |
Fraction of surface of the third distribution | $fr_3 = 1 - fr_1 - fr_2$ |
Algebraic distance from $m_1$ to $m_2$ | $n_2 = (m_2 - m_1)/s_1$ |
Algebraic distance from $m_1$ to $m_3$ | $n_3 = (m_3 - m_1)/s_1$ |

Test | Method |
---|---|
Specific surface, Blaine method | EN 196-6:2010 [35] |
Loss on ignition, LOI | EN 196-2:2013 [36] |
Sulphates content, SO$_3$ | EN 196-2:2013 |
2-day, 7-day, and 28-day compressive strength | EN 196-1:2016 [37] |

No | $CV_1$ | $CV_2$ | $CV_3$ | Frequency % | No | $CV_1$ | $CV_2$ | $CV_3$ | Frequency % |
---|---|---|---|---|---|---|---|---|---|
1 | 3 | 2 | 1 | 9.0 | 9 | 2 | 2 | 1 | 4.2 |
2 | 2 | 1 | 1 | 8.3 | 10 | 4 | 1 | 2 | 4.2 |
3 | 2 | 1 | 2 | 8.3 | 11 | 1 | 1 | 2 | 3.5 |
4 | 3 | 2 | 2 | 7.6 | 12 | 3 | 1 | 1 | 3.5 |
5 | 2 | 2 | 2 | 6.3 | 13 | 3 | 3 | 1 | 2.8 |
6 | 3 | 1 | 2 | 6.3 | 14 | 4 | 3 | 2 | 2.8 |
7 | 4 | 2 | 1 | 5.6 | 15 | 1 | 1 | 1 | 2.1 |
8 | 4 | 2 | 2 | 4.9 | 16 | 2 | 2 | 3 | 2.1 |

Performance Statistic | Mean Value | Standard Deviation | Variable Name |
---|---|---|---|
Z-factor | Median value, MED (ISO 13528:2015, C.2.1) | Scaled median absolute deviation, MADe (ISO 13528:2015, C.2.2) | Z_MADe |
Z-factor | Median value, MED (ISO 13528:2015, C.2.1) | Normalized interquartile range, nIQR (ISO 13528:2015, C.2.3) | Z_nIQR |
Z-factor | Robust mean, Algorithm A with iterated scale, Ax* (ISO 13528:2015, C.3.1) | Robust standard deviation, Algorithm A with iterated scale, As* (ISO 13528:2015, C.3.1) | Z_A |
Z-factor | Hampel estimator for mean, Hx* (ISO 13528:2015, C.5.3.2) | Robust standard deviation, Q method, Hs* (ISO 13528:2015, C.5.2.2) | Z_Hamp |

Setting | Value |
---|---|
$N_{lab}$ | 10, 15, 20, 30, 40, 60, 80, 100 and 1000 ¹ |
$N_{rep}$ | 2 |
$s_r$ | 0.01 |
$m_1$ | 100 |
$s_1$ | 1, 2, 3, 4 |
$m_2$ | $m_2 = m_1 + n_2 \cdot s_1$, $n_2 = -8$ to 8, step 1 |
$fr_2$ | 0, 0.05, 0.1, 0.15, 0.20 ² |
$m_3$ | $m_3 = m_1 + n_3 \cdot s_1$, $n_3 = 1$ to 8, step 1 ³ |
$fr_3$ | 0, 0.05, 0.1, 0.15 |
$s_2$ | 1, 2, 3 |
$s_3$ | 1, 2 |
$N_{iter}$ | 1000 |
$N_s$ | Up to 25 ⁴ |
$N_b$ | 20 ⁵ |

¹ If $N_{lab} = 1000$, then $N_s = 5$.

² If $fr_3 = 0$, the maximum value of $fr_2$ is 0.20; otherwise, $fr_{2,max} = 0.15$ and $fr_2 + fr_3 \le 0.20$.

³ If $n_3 < 0$, the distribution is the mirror image of the one with $n_3 > 0$ and $n_2$ of opposite sign and equal absolute value; therefore, the simulation results are the same.

⁴ The algorithm stops before 25 simulations if the new average of each unacceptable Z-factor differs from the previous one by less than 0.1%.

⁵ If the distribution has more than 20 non-zero fractions, then $N_b$ = 30 or 40.

Estimator | $\lvert n_3 \rvert \le 3$ (%) | $4 \le \lvert n_3 \rvert \le 5$ (%) | $\lvert n_3 \rvert \ge 6$ (%) | All $n_3$ (%) |
---|---|---|---|---|
Z_MADe | 0 | 2.5 | 0 | 0.7 |
Z_nIQR | 0 | 0 | 0 | 0 |
Z_A | 0 | 30 | 0 | 8.3 |
Z_Hamp | 100 | 55 | 66.7 | 86.1 |
Not Applicable | 0 | 12.5 | 33.3 | 4.9 |
Percentage of each $\lvert n_3 \rvert$ region | 68 | 27.8 | 4.2 | 100 |

Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Tsamatsoulis, D.
Comparing the Effectiveness of Robust Statistical Estimators of Proficiency Testing Schemes in Outlier Detection. *Standards* **2023**, *3*, 110-132.
https://doi.org/10.3390/standards3020010
