Article

Homogeneity Test of the First-Order Agreement Coefficient in a Stratified Design

College of Mathematics and System Science, Xinjiang University, Urumqi 830046, China
*
Author to whom correspondence should be addressed.
Entropy 2023, 25(3), 536; https://doi.org/10.3390/e25030536
Submission received: 22 January 2023 / Revised: 13 March 2023 / Accepted: 16 March 2023 / Published: 20 March 2023

Abstract

Gwet’s first-order agreement coefficient (AC 1 ) is widely used to assess the agreement between raters. This paper proposes several asymptotic statistics for the homogeneity test of stratified AC 1 coefficients in large sample sizes. These statistics may perform unsatisfactorily, especially for small samples and high values of AC 1 . We therefore further propose three exact methods for small sample sizes. Based on the numerical results, the likelihood ratio statistic is recommended for large sample sizes, whereas the exact E approaches based on the likelihood ratio and score statistics are more robust in small sample scenarios. Moreover, the exact E method remains effective for high values of AC 1 . Two real examples are used to illustrate the proposed methods.

1. Introduction

In the medical field, it is necessary to judge the accuracy and the interchangeability of different diagnostics. Inter-rater agreement is widely used to quantify the closeness of ratings for subjects by two raters. The recommendation of an efficient and economical method should guarantee a high degree of agreement between its result and the gold-standard method. A simple example is that independent raters A and B assess each subject with binary outcomes (e.g., $+/-$ or Yes/No). Let $n_{ij}$ ($i,j=1,2$) be the numbers of independent subjects judged by the two raters as $(+,+)$, $(-,+)$, $(+,-)$, and $(-,-)$, and let $P_{ij}$ ($i,j=1,2$) be the corresponding probabilities. Denote $n = n_{1+} + n_{2+} = n_{+1} + n_{+2} = \sum_{i=1}^{2}\sum_{j=1}^{2} n_{ij}$. The data can be arranged in a $2\times 2$ table (Table 1).
Researchers have developed several indices by which to measure the degree of agreement between raters on a nominal scale, where the unordered categories are independent, mutually exclusive, and exhaustive. Denote $P_i = \Pr(\text{a subject is classified as ``+'' by rater } i)$, $i=A,B$; we call $P_i$ ($i=A,B$) the marginal probability. Cohen [1] showed that the $\chi^2$ test was indefensible because its null hypothesis is independence, not agreement. He therefore presented the kappa coefficient to quantify the extent of agreement between raters. For the problem of nominal scale agreement between raters A and B in Table 1, there are only two relevant quantities: the overall agreement probability $p_a$ and the chance-agreement probability $p_c$. Cohen’s kappa coefficient is defined by $\kappa = (p_a - p_c)/(1-p_c)$, where $p_a = P_{11} + P_{22}$ and $p_c = P_A P_B + (1-P_A)(1-P_B)$. That is to say, $\kappa$ is the proportion of agreement remaining after the removal of chance agreement. Suppose that the distribution of proportions over the categories for the population is known and is taken to be equal for the judges. Scott [2] accordingly proposed the $\pi$ coefficient $\pi = (p_a - p_\pi)/(1-p_\pi)$, where $p_\pi$ is the percent agreement expected on the basis of chance, $p_\pi = \left(\frac{P_A+P_B}{2}\right)^2 + \left(1-\frac{P_A+P_B}{2}\right)^2$. Despite their wide range of applications, these coefficients have two main limitations: (i) they depend strongly on the marginal probabilities [3], and (ii) they are affected by the composition of the population, i.e., whether subjects are easy or difficult to agree upon [4]. For example, Cicchetti and Feinstein [3] illustrated the first limitation with the data $n_{11}=118$, $n_{12}=5$, $n_{21}=2$, and $n_{22}=0$. A simple calculation gives $p_a = 118/125 + 0/125 = 0.944$ and $p_c = (123/125)\times(120/125) + (5/125)\times(2/125) = 0.9453$. Thus, the estimator $\hat{\kappa} = (0.944-0.9453)/(1-0.9453) \approx -0.0234 < 0$. For Scott’s $\pi$ coefficient, $p_\pi = 0.9456$ and hence $\hat{\pi} = -0.0288$. It is unreasonable that such a high observed agreement yields low $\kappa$ and $\pi$ coefficients. To solve the problem, alternative indices have been derived to measure consistency, such as Holley and Guilford’s G index [5], Aickin’s $\alpha$ agreement parameter [6], and Andrés and Marzo’s delta measure [7]. Gwet [8] revealed the origin of these limitations and proposed the first-order agreement coefficient (AC 1 ) as an alternative index. The definition of this coefficient is based on two premises: (a) chance agreement occurs when at least one rater rates an individual randomly, and (b) only an unknown portion of the observed ratings is subject to randomness. Define two events
$$G = \{\text{both raters agree}\}, \qquad R = \{\text{at least one rater performs a random rating}\}.$$
Thus, the probability of agreement expected by chance can be defined as $p_e \equiv P(G \cap R) = P(R)P(G\mid R)$. Generally, a random rating classifies an individual into either category with the same probability $1/2$. Since agreement may occur in either category, we have $P(G\mid R) = 2 \times (1/2)^2 = 1/2$. As for the probability of random rating $P(R)$, a normalized measure of randomness ($\Psi$) is used to approximate it as follows,
$$P(R) \approx \Psi = \frac{\pi_+(1-\pi_+)}{(1/2)(1-1/2)} = 4\pi_+(1-\pi_+),$$
where $\pi_+$ represents the probability that a random rater classifies a randomly chosen individual into the “+” category. That is to say, $p_e$ can be quantified by $p_e^* = P(G\mid R)\Psi = 2\pi_+(1-\pi_+)$. Then, the AC 1 coefficient can be expressed as $\gamma = (p_a - p_e^*)/(1-p_e^*)$, where $p_a$ denotes the agreement probability. In the above example, $\hat{\pi}_+ = (123/125 + 120/125)/2 = 0.9720$ and $p_e^* = 0.0544$. By the definition of AC 1 , we have $\hat{\gamma} = 0.9408$. Thus, the AC 1 coefficient is more consistent with the observed extent of agreement than Cohen’s $\kappa$ and Scott’s $\pi$ coefficients. There is a sizable literature on agreement coefficients [9,10,11].
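To make the comparison concrete, the following short Python snippet (a sketch added here for illustration; the variable names are ours, not from the authors’ code) recomputes Cohen’s $\kappa$, Scott’s $\pi$, and AC 1 for the Cicchetti–Feinstein counts $n_{11}=118$, $n_{12}=5$, $n_{21}=2$, $n_{22}=0$.

```python
# Worked check of the 2 x 2 example: kappa and Scott's pi come out negative
# while AC1 stays close to the observed agreement.
n11, n12, n21, n22 = 118, 5, 2, 0
n = n11 + n12 + n21 + n22

p_a = (n11 + n22) / n                    # observed agreement probability
P_A = (n11 + n12) / n                    # rater A's "+" marginal
P_B = (n11 + n21) / n                    # rater B's "+" marginal

p_c = P_A * P_B + (1 - P_A) * (1 - P_B)  # chance agreement for kappa
kappa = (p_a - p_c) / (1 - p_c)

pi_plus = (P_A + P_B) / 2                # common marginal probability
p_pi = pi_plus**2 + (1 - pi_plus)**2     # chance agreement for Scott's pi
scott_pi = (p_a - p_pi) / (1 - p_pi)

p_e_star = 2 * pi_plus * (1 - pi_plus)   # chance agreement for AC1
ac1 = (p_a - p_e_star) / (1 - p_e_star)

print(round(kappa, 4), round(scott_pi, 4), round(ac1, 4))
# approximately -0.0234, -0.0288, 0.9408
```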
As with Scott’s $\pi$ coefficient, Ohyama [12] assumed that the two raters have a common marginal probability, that is, $P_A = P_B \equiv \pi_+$. Thus, $P_{12} = P_{21}$, and Table 1 can be simplified to Table 2. Define
$$X_{ij} = \begin{cases} 1, & \text{if rater } i \text{ classifies subject } j \text{ into category ``+''}, \\ 0, & \text{otherwise}, \end{cases}$$
for $i = A, B$ and $j = 1, 2, \ldots, n$. Suppose that the underlying probability of classifying a subject depends not on the raters but on the subjects, i.e., $P(X_{ij}=1 \mid j) = p_j$. We can obtain the overall agreement probability $p_a$ based on the idea of Vanbelle and Albert [13]. Agreement can occur in $(+,+)$ and $(-,-)$, and the corresponding probabilities for the $j$th subject are $P_{11j} = p_j^2$ and $P_{22j} = (1-p_j)^2$, respectively. Thus, the agreement probability of the two raters for the $j$th subject is $p_{aj} = p_j^2 + (1-p_j)^2$. We denote the mean of the positive classification probability as $E(p_j) = \sum_{j=1}^{n} p_j/n \equiv \pi_+$, and the corresponding variance as $\mathrm{Var}(p_j) = \sum_{j=1}^{n} (p_j - \pi_+)^2/n \equiv \sigma^2$, where $n$ is the size of the population. Then, the probabilities of $(+,+)$ and $(-,-)$ ratings over the population are $P_{11} = E(p_j^2) = \mathrm{Var}(p_j) + (E(p_j))^2 = \sigma^2 + \pi_+^2$ and $P_{22} = E((1-p_j)^2) = 1 - 2\pi_+ + \sigma^2 + \pi_+^2$. Finally, the agreement probability over the population is $p_a = P_{11} + P_{22} = 1 + 2(\sigma^2 - \pi_+(1-\pi_+))$. The AC 1 coefficient $\gamma$ for a binary outcome judged by two raters can then be rewritten as
$$\gamma = \frac{p_a - p_e^*}{1 - p_e^*} = \frac{1 + 2\sigma^2 - 4\pi_+(1-\pi_+)}{1 - 2\pi_+(1-\pi_+)}.$$
Up until now, both the applications [14,15] and the statistical inference [12] of AC 1 have concentrated on the situation without stratification. However, ignoring confounding variables or covariates may lead to biased conclusions. Researchers often stratify the data into multiple strata to control the influence of such factors. A stratified analysis is applied to evaluate the relationship between agreement and the nontreatment factors of a clinical trial (age, gender, severity of disease, etc.). A test of homogeneity is the first step of a stratified analysis; it is essential to analyze the factors that lead to heterogeneity when the homogeneity hypothesis is rejected. Suppose $K$ levels of a subject covariate are introduced into Table 2 for two raters with a binary outcome; the data can then be arranged in a $3 \times K$ table of observed cell counts. Generally speaking, a sample can be classified as large or small by its sample size. Hannah et al. [16] analyzed data on the alcohol-drinking status of twins. A subject is categorised as a nondrinker if he/she consumes less than 30 g of alcohol per week, and otherwise as a drinker. Thus, the binary outcome is the drinking status (drinker or nondrinker). A number of same-sex twins are stratified by zygosity, namely monozygotic (MZ) and dizygotic (DZ). Nam [17] used the kappa index to investigate the agreement of alcohol-drinking status between twins. The data structure for the male twins is shown in Table 3. Large-sample inference has been performed for this data type, including score, likelihood ratio, and Wald-type statistics [18]. Honda and Ohyama [19] proposed score and goodness-of-fit tests for the homogeneity test of stratified AC 1 . Unfortunately, both tests performed poorly because of conservative or liberal type I error rates, especially for small sample sizes. Meanwhile, a high AC 1 may lead to conservative type I error rates for small and moderate sample sizes.
In practice, we often encounter small-sample agreement data, for example, in a clinical trial for coronavirus disease 2019 (COVID-19) [20]. In this trial, the enzyme-linked immunosorbent assay (ELISA) and a gold-standard method were used to detect the novel coronavirus IgG and IgM antibodies, classifying each sample as either positive $(+)$ or negative $(-)$. The ELISA positive criterion is that the sample’s optical density (OD) value is greater than or equal to the critical value; the positive criterion of the gold-standard method is the appearance of two colored bands. Table 4 lists the data stratified by the IgG and the IgM antibodies $(K=2)$, with 17 patients in each group. Similar to Table 3, the “One” entry corresponds to the number of $(+,-)$ and $(-,+)$ ratings.
Unfortunately, asymptotic test statistics do not apply to small samples. Exact approaches are effective for small samples, such as Fisher’s exact test [21,22,23] and its extensions [24,25,26]. The conservative performance of Fisher’s exact method motivated the development of other exact approaches. We note that there exist nuisance parameters in the model of the AC 1 coefficient. Significant progress has been made in the elimination of nuisance parameters over the past decades [27,28,29,30,31]. By fixing the marginal totals in the contingency table, Mehta [27] extensively used the conditional test (referred to as the C approach) to analyze various classical categorical data. Liddell [28] derived a test based on the exact distribution of the difference in sample proportions. As an alternative, Storer and Kim [29] modified Liddell’s exact test, abbreviated as the E approach. Basu [30] provided a new procedure that maximizes the tail probability over the whole range of the parameters, called the M approach. The global maximum is a challenge when the parameter space is not finite. Lloyd [31] pointed out the weakness of the M approach and suggested a so-called E+M approach, which defines the tail area with the E approach and then maximizes the tail probability over the parameter space. Generally, the E, M, and E+M approaches are called unconditional tests. Tang et al. [32] showed that the exact conditional approach is generally inferior to the exact unconditional approach for small samples. Shan and Wilding [33] compared asymptotic and exact procedures for the kappa coefficient in a $2\times2$ table. However, little work has been carried out on extending exact approaches to test the homogeneity of the AC 1 coefficients across several independent strata.
This paper proposes asymptotic and exact methods for the homogeneity test of stratified AC 1 . The novelty and contribution lie in three main aspects. (i) For large sample sizes, we propose two asymptotic statistics, the likelihood ratio and Wald-type tests, which extend the study of the homogeneity test in Honda and Ohyama [19]. Our results show that the likelihood ratio test is more robust than the other tests regarding type I error rates, while the powers of the tests are close to each other. Thus, we recommend the likelihood ratio test for the homogeneity test of stratified AC 1 in large samples. (ii) Based on the asymptotic statistics, we derive three exact approaches (the E, M, and E+M methods) to investigate small sample cases ($n = 10, 25$). These exact methods can effectively improve the performance of the homogeneity test with respect to type I error rates. Among them, the exact E approaches based on the likelihood ratio and score tests are more robust in small samples. (iii) We investigate the strengths and weaknesses of the asymptotic and exact methods through extensive numerical analyses, and some useful conclusions are obtained from the analyses of real examples. The rest of this paper is organized as follows. In Section 2, we review the AC 1 coefficient in a stratified setting and establish a probability model; the maximum likelihood method and an iterative algorithm are used to estimate the unknown parameters. We further review the score statistic and derive two asymptotic test statistics for large samples in Section 3. Based on these statistics, several exact methods for small sample sizes are presented in Section 4. In Section 5, we conduct numerical studies to investigate the performance of all the derived methods in terms of type I error rates and powers. In Section 6, we study the aforementioned real examples with large and small samples to illustrate these methods. Finally, a brief conclusion is given in Section 7.

2. A Probability Model and Homogeneity Test

Following Ohyama [12], we introduce $K$ covariates into Table 2 and establish a probability model. Suppose that $N$ subjects are divided into $K$ independent strata. In the $k$th ($k=1,2,\ldots,K$) stratum, there are $n_{1k}$, $n_{2k}$, and $n_{3k}$ subjects in the three categories. Denote $n_k = \sum_{l=1}^{3} n_{lk}$ as the total number of subjects in the $k$th stratum. Table 5 shows the data structure across the strata.
For the stratified analysis, we need to construct AC 1 for each stratum. Let $X_{kij}$ be an indicator of the $i$th ($i=1,2$) rater’s judgement for the $j$th ($j=1,2,\ldots,n_k$) subject in the $k$th ($k=1,2,\ldots,K$) stratum: if the classification is positive $(+)$, then $X_{kij}=1$, and otherwise $X_{kij}=0$. Ohyama [12] assumed that the underlying probability of classifying a subject does not depend on the raters but on the subjects, that is, $\Pr(X_{kij}=1 \mid j) = p_{kj}$. The $N$ subjects are classified into the $K$ strata based on the covariates, and every stratum contains different subjects; thus, the data of the strata are independent of each other. Denote $E(p_{kj}) = \sum_{j=1}^{n_k} p_{kj}/n_k \equiv \pi_k$ and $\mathrm{Var}(p_{kj}) = \sum_{j=1}^{n_k} (p_{kj}-\pi_k)^2/n_k \equiv \sigma_k^2$. Then, the AC 1 of the $k$th stratum is
$$\gamma_k = \frac{1 + 2[\sigma_k^2 - 2\pi_k(1-\pi_k)]}{1 - 2\pi_k(1-\pi_k)}, \qquad k=1,2,\ldots,K.$$
Suppose that $P_{1k}(\gamma_k,\pi_k)$, $P_{2k}(\gamma_k,\pi_k)$, and $P_{3k}(\gamma_k,\pi_k)$ are the corresponding cell probabilities in the $k$th stratum, where $\pi_k$ and $\gamma_k$ are the common positive classification probability and the AC 1 coefficient, respectively. As the AC 1 coefficient of the $k$th stratum, $\gamma_k$ contains the information of $\pi_k$ and $\sigma_k$; obviously, there is no one-to-one correspondence between $\gamma_k$ and $\pi_k$. Denote $\boldsymbol{n}_k = (n_{1k}, n_{2k}, n_{3k})^{T}$ and $\boldsymbol{P}_k = (P_{1k}(\gamma_k,\pi_k), P_{2k}(\gamma_k,\pi_k), P_{3k}(\gamma_k,\pi_k))^{T}$. For the $k$th stratum, $\boldsymbol{n}_k$ ($k=1,2,\ldots,K$) follows a trinomial distribution. Thus, the probability mass function of $\boldsymbol{n}_k$ is expressed as follows:
$$f(\boldsymbol{P}_k \mid \boldsymbol{n}_k) = \frac{n_k!}{n_{1k}!\,n_{2k}!\,n_{3k}!}\, P_{1k}^{n_{1k}} P_{2k}^{n_{2k}} P_{3k}^{n_{3k}}.$$
Through calculation, the probabilities $P_{lk}$ ($l=1,2,3$, $k=1,2,\ldots,K$) are obtained as
$$\begin{aligned} P_{1k}(\gamma_k,\pi_k) &= \pi_k(2-\pi_k) - 1/2 + \gamma_k\left[1 - 2\pi_k(1-\pi_k)\right]/2, \\ P_{2k}(\gamma_k,\pi_k) &= \left[1 - 2\pi_k(1-\pi_k)\right](1-\gamma_k), \\ P_{3k}(\gamma_k,\pi_k) &= (1-\pi_k)(1+\pi_k) - 1/2 + \gamma_k\left[1 - 2\pi_k(1-\pi_k)\right]/2, \end{aligned}$$
where $0 \le P_{lk}(\gamma_k,\pi_k) \le 1$ and $\sum_{l=1}^{3} P_{lk}(\gamma_k,\pi_k) = 1$. Figure 1 shows the admissible range of $\gamma_k$, which satisfies
$$\frac{2 - (1-|1-2\pi_k|)(3+|1-2\pi_k|)}{2 - (1-|1-2\pi_k|)(1+|1-2\pi_k|)} \le \gamma_k \le 1.$$
We are interested in testing whether the AC 1 coefficients $\gamma_k$ ($k=1,2,\ldots,K$) are homogeneous across the $K$ independent strata, that is,
$$H_0: \gamma_1 = \cdots = \gamma_K \equiv \gamma \quad \text{vs.} \quad H_a: \text{the } \gamma_k \text{ are not all equal}.$$
Denote $\boldsymbol{\pi} = (\pi_1,\ldots,\pi_K)$ and $\boldsymbol{\gamma} = (\gamma_1,\ldots,\gamma_K)$. First, we estimate the unknown parameters under the alternative hypothesis $H_a$. The corresponding log-likelihood function of the observed data $N = (\boldsymbol{n}_1, \boldsymbol{n}_2, \ldots, \boldsymbol{n}_K)$ is
$$\begin{aligned} l(\boldsymbol{\gamma}, \boldsymbol{\pi} \mid N) &= \log\left( \prod_{k=1}^{K} f(\boldsymbol{P}_k \mid \boldsymbol{n}_k) \right) = \log\left( \prod_{k=1}^{K} \frac{n_k!}{n_{1k}!\,n_{2k}!\,n_{3k}!}\, P_{1k}^{n_{1k}} P_{2k}^{n_{2k}} P_{3k}^{n_{3k}} \right) \\ &= \sum_{k=1}^{K} \left[ n_{1k}\log P_{1k}(\gamma_k,\pi_k) + n_{2k}\log P_{2k}(\gamma_k,\pi_k) + n_{3k}\log P_{3k}(\gamma_k,\pi_k) \right] + C \equiv \sum_{k=1}^{K} l_k(\gamma_k, \pi_k \mid \boldsymbol{n}_k) + C, \end{aligned}$$
where $C = \log\left( \prod_{k=1}^{K} \frac{n_k!}{n_{1k}!\,n_{2k}!\,n_{3k}!} \right)$ is a constant, and $l_k(\gamma_k,\pi_k\mid\boldsymbol{n}_k)$ is the log-likelihood function of the $k$th stratum under $H_a$. Let $\hat{\gamma}_k$ and $\hat{\pi}_k$ ($k=1,2,\ldots,K$) be the unconstrained maximum likelihood estimates (MLEs) of $\gamma_k$ and $\pi_k$ under $H_a$. By solving the following equations,
$$\begin{aligned} \frac{\partial l_k}{\partial \pi_k} &= \frac{2n_{2k}(2\pi_k-1)}{2\pi_k^2 - 2\pi_k + 1} - \frac{2n_{3k}(\gamma_k + 2\pi_k - 2\pi_k\gamma_k)}{1 - 2\pi_k^2 + \gamma_k(2\pi_k^2 - 2\pi_k + 1)} + \frac{2n_{1k}(2 - 2\pi_k - \gamma_k + 2\pi_k\gamma_k)}{4\pi_k + \gamma_k(2\pi_k^2 - 2\pi_k + 1) - 2\pi_k^2 - 1} = 0, \\ \frac{\partial l_k}{\partial \gamma_k} &= \frac{n_{2k}}{\gamma_k - 1} + \frac{n_{1k}(2\pi_k^2 - 2\pi_k + 1)}{4\pi_k + \gamma_k(2\pi_k^2 - 2\pi_k + 1) - 2\pi_k^2 - 1} + \frac{n_{3k}(2\pi_k^2 - 2\pi_k + 1)}{1 - 2\pi_k^2 + \gamma_k(2\pi_k^2 - 2\pi_k + 1)} = 0, \end{aligned}$$
we have
$$\hat{\pi}_k = \frac{2n_{1k} + n_{2k}}{2n_k}, \qquad \hat{\gamma}_k = 1 - \frac{2 n_k n_{2k}}{n_k^2 + (n_{1k} - n_{3k})^2}, \qquad k = 1,2,\ldots,K.$$
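As a small illustration (our own sketch, not the authors’ code), the closed-form unconstrained MLEs can be computed stratum by stratum from the $3\times K$ table of counts; the counts in the usage line are hypothetical.

```python
import numpy as np

def unconstrained_mles(counts):
    """Closed-form unconstrained MLEs per stratum.

    counts: array of shape (3, K); rows are n_1k, n_2k, n_3k
            ((+,+), "One", (-,-)) and columns are the strata.
    Returns (gamma_hat, pi_hat), each of length K.
    """
    counts = np.asarray(counts, dtype=float)
    n1, n2, n3 = counts
    nk = counts.sum(axis=0)
    pi_hat = (2 * n1 + n2) / (2 * nk)
    gamma_hat = 1 - 2 * nk * n2 / (nk**2 + (n1 - n3)**2)
    return gamma_hat, pi_hat

# Hypothetical two-stratum table with 17 subjects per stratum:
gamma_hat, pi_hat = unconstrained_mles([[9, 7], [3, 7], [5, 3]])
print(gamma_hat, pi_hat)
```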
Next, we estimate the parameters $\gamma$ and $\boldsymbol{\pi} = (\pi_1,\ldots,\pi_K)$ under the null hypothesis $H_0: \gamma_1 = \gamma_2 = \cdots = \gamma_K \equiv \gamma$. The log-likelihood function becomes
$$\begin{aligned} l_0(\gamma, \boldsymbol{\pi} \mid N) = \sum_{k=1}^{K} \Big\{ & n_{1k}\log\left[ \pi_k(2-\pi_k) - \tfrac{1}{2} + \tfrac{\gamma}{2}\left(1 - 2\pi_k(1-\pi_k)\right) \right] + n_{2k}\log\left[ \left(1 - 2\pi_k(1-\pi_k)\right)(1-\gamma) \right] \\ & + n_{3k}\log\left[ (1-\pi_k)(1+\pi_k) - \tfrac{1}{2} + \tfrac{\gamma}{2}\left(1 - 2\pi_k(1-\pi_k)\right) \right] \Big\} + C \equiv \sum_{k=1}^{K} l_{0k}(\gamma, \pi_k \mid \boldsymbol{n}_k) + C, \end{aligned}$$
where $l_{0k}(\gamma, \pi_k \mid \boldsymbol{n}_k)$ is the log-likelihood function of the $k$th stratum under $H_0$. Let $\tilde{\gamma}$ and $\tilde{\pi}_k$ ($k=1,2,\ldots,K$) be the constrained MLEs of $\gamma$ and $\pi_k$ under $H_0$. Similarly, we differentiate $l_0(\gamma, \boldsymbol{\pi} \mid N)$ with respect to $\gamma$ and $\pi_k$ and set the derivatives to zero:
$$\begin{aligned} \frac{\partial l_0}{\partial \pi_k} &= \frac{2n_{2k}(2\pi_k-1)}{2\pi_k^2 - 2\pi_k + 1} - \frac{2n_{3k}(\gamma + 2\pi_k - 2\pi_k\gamma)}{1 - 2\pi_k^2 + \gamma(2\pi_k^2 - 2\pi_k + 1)} + \frac{2n_{1k}(2 - 2\pi_k - \gamma + 2\pi_k\gamma)}{4\pi_k + \gamma(2\pi_k^2 - 2\pi_k + 1) - 2\pi_k^2 - 1} = 0, \\ \frac{\partial l_0}{\partial \gamma} &= \sum_{k=1}^{K} \left[ \frac{n_{2k}}{\gamma - 1} + \frac{n_{1k}(2\pi_k^2 - 2\pi_k + 1)}{4\pi_k + \gamma(2\pi_k^2 - 2\pi_k + 1) - 2\pi_k^2 - 1} + \frac{n_{3k}(2\pi_k^2 - 2\pi_k + 1)}{1 - 2\pi_k^2 + \gamma(2\pi_k^2 - 2\pi_k + 1)} \right] = 0. \end{aligned}$$
However, these equations have no closed-form solutions, so the Fisher scoring algorithm is used to obtain the constrained MLEs. Three steps describe the iteration process as follows (a small numerical sketch is given after the steps).
(i)
Given the initial values $\gamma^{(0)} = 0.5$ and $\pi_k^{(0)} = (2n_{1k} + n_{2k})/(2n_k)$ in the $k$th stratum.
(ii)
The $(t+1)$th approximations of $\gamma$ and $\boldsymbol{\pi}$ are updated by
$$\begin{pmatrix} \gamma^{(t+1)} \\ \boldsymbol{\pi}^{(t+1)} \end{pmatrix} = \begin{pmatrix} \gamma^{(t)} \\ \boldsymbol{\pi}^{(t)} \end{pmatrix} + I_1^{-1}(\gamma, \boldsymbol{\pi}) \times \begin{pmatrix} \dfrac{\partial l_0}{\partial \gamma} \\[4pt] \dfrac{\partial l_0}{\partial \boldsymbol{\pi}} \end{pmatrix} \Bigg|_{\gamma = \gamma^{(t)},\ \boldsymbol{\pi} = \boldsymbol{\pi}^{(t)}},$$
where $\boldsymbol{\pi} = (\pi_1,\ldots,\pi_K)^{T}$, $\dfrac{\partial l_0}{\partial \boldsymbol{\pi}} = \left( \dfrac{\partial l_0}{\partial \pi_1}, \ldots, \dfrac{\partial l_0}{\partial \pi_K} \right)^{T}$, and $I_1$ is the $(K+1)\times(K+1)$ Fisher information matrix (Appendix A.1).
(iii)
Repeat the processes (i)–(ii) until the results converge.
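The following Python sketch (ours, for illustration) obtains the constrained MLEs by directly maximizing the constrained log-likelihood $l_0$ with a general-purpose optimizer instead of the Fisher scoring iteration described above; the cell probabilities follow the expressions for $P_{1k}$, $P_{2k}$, $P_{3k}$, and all function names are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import minimize

def cell_probs(gamma, pi):
    """Trinomial cell probabilities (P_1k, P_2k, P_3k) for one stratum."""
    a = 1.0 - 2.0 * pi * (1.0 - pi)
    return np.array([pi * (2.0 - pi) - 0.5 + gamma * a / 2.0,
                     a * (1.0 - gamma),
                     (1.0 - pi) * (1.0 + pi) - 0.5 + gamma * a / 2.0])

def constrained_mles(counts):
    """Constrained MLEs (gamma_tilde, pi_tilde) under H0: common gamma."""
    counts = np.asarray(counts, dtype=float)
    pi0 = (2 * counts[0] + counts[1]) / (2 * counts.sum(axis=0))

    def neg_l0(theta):
        gamma, pis = theta[0], theta[1:]
        if not (-1.0 < gamma < 1.0) or np.any(pis <= 0) or np.any(pis >= 1):
            return 1e10                      # outside the parameter space
        P = np.column_stack([cell_probs(gamma, p) for p in pis])  # 3 x K
        if np.any(P <= 0):
            return 1e10                      # outside the admissible region
        return -np.sum(counts * np.log(P))   # multinomial constant omitted

    start = np.concatenate([[0.5], pi0])     # initial values as in step (i)
    res = minimize(neg_l0, start, method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-10, "maxiter": 20000})
    return res.x[0], res.x[1:]
```

Any optimizer that respects the admissible region will do here; Fisher scoring as in steps (i)–(iii) typically converges faster once the information matrix in Appendix A.1 is available.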

3. Asymptotic Methods

3.1. Likelihood Ratio Statistic T L

The unconstrained and constrained MLEs construct a likelihood ratio test statistic. It is defined by
$$T_L = 2\left[ l(\hat{\boldsymbol{\gamma}}, \hat{\boldsymbol{\pi}} \mid N) - l_0(\tilde{\gamma}, \tilde{\boldsymbol{\pi}} \mid N) \right] = 2\sum_{k=1}^{K} \left[ l_k(\hat{\gamma}_k, \hat{\pi}_k \mid \boldsymbol{n}_k) - l_{0k}(\tilde{\gamma}, \tilde{\pi}_k \mid \boldsymbol{n}_k) \right],$$
where $N = (\boldsymbol{n}_1, \boldsymbol{n}_2, \ldots, \boldsymbol{n}_K)$ is the observed data, $\hat{\boldsymbol{\gamma}} = (\hat{\gamma}_1, \hat{\gamma}_2, \ldots, \hat{\gamma}_K)$ and $\hat{\boldsymbol{\pi}} = (\hat{\pi}_1, \hat{\pi}_2, \ldots, \hat{\pi}_K)$ are the unconstrained MLEs, and $\tilde{\gamma}$ and $\tilde{\boldsymbol{\pi}} = (\tilde{\pi}_1, \tilde{\pi}_2, \ldots, \tilde{\pi}_K)$ are the constrained MLEs.
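A minimal sketch (ours) of $T_L$ and its asymptotic p-value, assuming the unconstrained and constrained MLEs have already been obtained (e.g., with the sketches in Section 2); the multinomial constant $C$ cancels in the difference.

```python
import numpy as np
from scipy.stats import chi2

def cell_probs(gamma, pi):
    a = 1.0 - 2.0 * pi * (1.0 - pi)
    return np.array([pi * (2.0 - pi) - 0.5 + gamma * a / 2.0,
                     a * (1.0 - gamma),
                     (1.0 - pi) * (1.0 + pi) - 0.5 + gamma * a / 2.0])

def likelihood_ratio_stat(counts, gamma_hat, pi_hat, gamma_tilde, pi_tilde):
    """T_L from a (3, K) count table; constants cancel between l and l_0."""
    counts = np.asarray(counts, dtype=float)
    K = counts.shape[1]
    ll_a = sum(counts[:, k] @ np.log(cell_probs(gamma_hat[k], pi_hat[k]))
               for k in range(K))
    ll_0 = sum(counts[:, k] @ np.log(cell_probs(gamma_tilde, pi_tilde[k]))
               for k in range(K))
    return 2.0 * (ll_a - ll_0)

def asymptotic_pvalue(t_value, K):
    """Upper-tail chi-square p-value with K - 1 degrees of freedom."""
    return chi2.sf(t_value, df=K - 1)
```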

3.2. Score Statistic T S C

Honda and Ohyama [19] proposed the score statistic. Denote
$$U = \left( \frac{\partial l_1}{\partial \gamma_1}, \frac{\partial l_2}{\partial \gamma_2}, \ldots, \frac{\partial l_K}{\partial \gamma_K}, 0, 0, \ldots, 0 \right)_{1\times 2K}.$$
Under H 0 , the score test statistic can be represented as
$$T_{SC} = U I_2^{-1} U^{T} \Big|_{\gamma_1 = \gamma_2 = \cdots = \gamma_K = \tilde{\gamma},\ \boldsymbol{\pi} = \tilde{\boldsymbol{\pi}}},$$
where $\tilde{\gamma}$ and $\tilde{\boldsymbol{\pi}} = (\tilde{\pi}_1, \tilde{\pi}_2, \ldots, \tilde{\pi}_K)^{T}$ are the constrained MLEs. The $2K\times 2K$ Fisher information matrix $I_2$ is given in Appendix A.2. Through calculation, its simplified form is
$$T_{SC} = \sum_{k=1}^{K} \frac{r_k^2\, d_k}{n_k (b_k d_k - c_k^2)} \Bigg|_{\gamma_k = \tilde{\gamma},\ \pi_k = \tilde{\pi}_k},$$
where
$$\begin{aligned} r_k &= \frac{n_{1k}}{P_{1k}(\gamma_k,\pi_k)} - \frac{2n_{2k}}{P_{2k}(\gamma_k,\pi_k)} + \frac{n_{3k}}{P_{3k}(\gamma_k,\pi_k)}, \\ b_k &= \frac{1}{P_{1k}(\gamma_k,\pi_k)} + \frac{4}{P_{2k}(\gamma_k,\pi_k)} + \frac{1}{P_{3k}(\gamma_k,\pi_k)}, \\ c_k &= \frac{1}{P_{1k}(\gamma_k,\pi_k)} - \frac{1}{P_{3k}(\gamma_k,\pi_k)} + (1-\gamma_k)(1-2\pi_k)\, b_k, \\ d_k &= \frac{1}{P_{1k}(\gamma_k,\pi_k)} + \frac{1}{P_{3k}(\gamma_k,\pi_k)} + (1-\gamma_k)(1-2\pi_k)\left[ \frac{1}{P_{1k}(\gamma_k,\pi_k)} - \frac{1}{P_{3k}(\gamma_k,\pi_k)} + c_k \right] \end{aligned}$$
for $k = 1, 2, \ldots, K$.
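For completeness, a short sketch (ours) of the simplified $T_{SC}$; the constrained MLEs are supplied by the caller, and the local variables mirror $r_k$, $b_k$, $c_k$, $d_k$ above.

```python
import numpy as np

def cell_probs(gamma, pi):
    a = 1.0 - 2.0 * pi * (1.0 - pi)
    return np.array([pi * (2.0 - pi) - 0.5 + gamma * a / 2.0,
                     a * (1.0 - gamma),
                     (1.0 - pi) * (1.0 + pi) - 0.5 + gamma * a / 2.0])

def score_stat(counts, gamma_tilde, pi_tilde):
    """Simplified score statistic T_SC evaluated at the constrained MLEs."""
    counts = np.asarray(counts, dtype=float)
    t_sc = 0.0
    for k in range(counts.shape[1]):
        n1k, n2k, n3k = counts[:, k]
        nk = n1k + n2k + n3k
        p1, p2, p3 = cell_probs(gamma_tilde, pi_tilde[k])
        w = (1.0 - gamma_tilde) * (1.0 - 2.0 * pi_tilde[k])
        r = n1k / p1 - 2.0 * n2k / p2 + n3k / p3
        b = 1.0 / p1 + 4.0 / p2 + 1.0 / p3
        c = 1.0 / p1 - 1.0 / p3 + w * b
        d = 1.0 / p1 + 1.0 / p3 + w * (1.0 / p1 - 1.0 / p3 + c)
        t_sc += r**2 * d / (nk * (b * d - c**2))
    return t_sc
```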

3.3. Wald-Type Statistic T W

Denote $\boldsymbol{\beta} = (\gamma_1, \gamma_2, \ldots, \gamma_K, \pi_1, \pi_2, \ldots, \pi_K)_{1\times 2K}$ and
$$C = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & -1 & \cdots & 0 & 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 & -1 & 0 & 0 & \cdots & 0 \end{pmatrix}_{(K-1)\times 2K}.$$
The null hypothesis $H_0$ is equivalent to $C\boldsymbol{\beta}^{T} = \boldsymbol{0}$, where $\boldsymbol{0}$ is a zero vector. Thus, we define the Wald-type statistic as
$$T_W = (\boldsymbol{\beta} C^{T}) (C I_3^{-1} C^{T})^{-1} (C \boldsymbol{\beta}^{T}) \Big|_{\gamma_k = \hat{\gamma}_k,\ \pi_k = \hat{\pi}_k},$$
where γ ^ k and π ^ k are the unconstrained MLEs. The Fisher information matrix I 3 is the same as that of the score test. We obtain the simplified form of T W as
$$T_W = \sum_{i=1}^{K-1} \sum_{j=1}^{K-1} (\hat{\gamma}_i - \hat{\gamma}_{i+1})(\hat{\gamma}_j - \hat{\gamma}_{j+1}) \left( C I_3^{-1} C^{T} \right)^{-1}_{i,j}.$$
Appendix A.3 provides the detailed process.
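A corresponding sketch (ours) of the simplified $T_W$, using the $g_k$ of Appendix A.3 evaluated at the unconstrained MLEs; for $K=2$ it reduces to $(\hat{\gamma}_1-\hat{\gamma}_2)^2/(g_1+g_2)$.

```python
import numpy as np

def cell_probs(gamma, pi):
    a = 1.0 - 2.0 * pi * (1.0 - pi)
    return np.array([pi * (2.0 - pi) - 0.5 + gamma * a / 2.0,
                     a * (1.0 - gamma),
                     (1.0 - pi) * (1.0 + pi) - 0.5 + gamma * a / 2.0])

def wald_stat(counts, gamma_hat, pi_hat):
    """Simplified Wald-type statistic T_W at the unconstrained MLEs."""
    counts = np.asarray(counts, dtype=float)
    K = counts.shape[1]
    g = np.empty(K)
    for k in range(K):
        nk = counts[:, k].sum()
        p1, p2, p3 = cell_probs(gamma_hat[k], pi_hat[k])
        a = 1.0 - 2.0 * pi_hat[k] * (1.0 - pi_hat[k])
        w = (1.0 - gamma_hat[k]) * (1.0 - 2.0 * pi_hat[k])
        b = 1.0 / p1 + 4.0 / p2 + 1.0 / p3
        c = 1.0 / p1 - 1.0 / p3 + w * b
        d = 1.0 / p1 + 1.0 / p3 + w * (1.0 / p1 - 1.0 / p3 + c)
        g[k] = 4.0 * d / (nk * a**2 * (b * d - c**2))
    # C I_3^{-1} C^T is tridiagonal: g_i + g_{i+1} on the diagonal, -g_{i+1} off it
    V = np.diag(g[:-1] + g[1:])
    for i in range(K - 2):
        V[i, i + 1] = V[i + 1, i] = -g[i + 1]
    diff = np.asarray(gamma_hat)[:-1] - np.asarray(gamma_hat)[1:]   # C beta^T
    return float(diff @ np.linalg.solve(V, diff))
```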
Under $H_0$, the three statistics $T_L$, $T_{SC}$, and $T_W$ are asymptotically distributed as a chi-square distribution with $K-1$ degrees of freedom [34]. Given a significance level $\alpha$, $H_0$ is rejected if $T_\theta \ge \chi^2_{(K-1),(1-\alpha)}$, $\theta = L, SC, W$, where $\chi^2_{(K-1),(1-\alpha)}$ is the $100(1-\alpha)$th percentile of the chi-square distribution with $K-1$ degrees of freedom. For an observed data set $N^* = (\boldsymbol{n}_1, \boldsymbol{n}_2, \ldots, \boldsymbol{n}_K)$, the p-values of these statistics are defined as
$$p_\theta^A(N^*) = P\left( \chi^2_{K-1} \ge T_\theta(N^*) \right), \quad \theta = L, SC, W,$$
where $T_\theta(N^*)$ is the value of the statistic for the observed data $N^*$ and $\chi^2_{K-1}$ denotes a chi-square random variable with $K-1$ degrees of freedom. For convenience, $p_L^A$, $p_{SC}^A$, and $p_W^A$ are called the asymptotic (A) approaches. Generally, asymptotic tests work well in large sample scenarios; however, they can be conservative or liberal for small sample sizes. Thus, we propose several exact methods based on the above statistics.

4. Exact Methods

Researchers often use the p-value to summarise the evidence against a null hypothesis, so the key to an exact method is the calculation of the exact p-value. We uniformly denote the aforementioned test statistics $T_L$, $T_{SC}$, and $T_W$ as $T_\theta$ ($\theta = L, SC, W$). Instead of relying on the chi-square approximation, an exact test uses the true sampling distribution of $T_\theta$ to compute an exact p-value. The calculation proceeds as follows. First, we need to generate all possible tables. For given observed data $N^*$, the column (stratum) totals $n_1, n_2, \ldots, n_K$ are fixed, and we enumerate all possible tables by varying the cell values. The detailed process is described as follows.
(i) Produce all possible values of each stratum, formed by all combinations $(n_{1k}, n_{2k}, n_{3k})$ such that $n_{1k} + n_{2k} + n_{3k} = n_k$ ($k=1,\ldots,K$), with $n_k$ fixed. Take $K=2$ and $n_1 = n_2 = 2$ as an example. There are six combinations in each stratum: $(0,0,2)$, $(0,1,1)$, $(0,2,0)$, $(1,0,1)$, $(1,1,0)$, and $(2,0,0)$.
(ii) Enumerate all possible tables determined by the combination of all strata. For K = 2 and n 1 = n 2 = 2 , we can obtain 36 possible tables in Table 6.
Note that each column corresponds to a categorical table with K strata.
Through steps (i)–(ii), we can enumerate all possible tables for any observed data $N^* = (\boldsymbol{n}_1, \boldsymbol{n}_2, \ldots, \boldsymbol{n}_K)$ (a short enumeration sketch in code is given below). We then identify the tail area from this reference set; the tail area contains all tables whose statistic values equal or exceed the statistic of the observed data $N^*$. Finally, the exact p-value is obtained by summing the probabilities of all tables in the tail area. This calculation requires eliminating the unknown parameters introduced in the previous section, and the following exact methods use different ways of eliminating them.
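The enumeration in steps (i)–(ii) can be written in a few lines; this is a sketch of ours using itertools, which reproduces the 36 tables of the $K=2$, $n_1=n_2=2$ example.

```python
import itertools
import numpy as np

def stratum_compositions(nk):
    """All (n_1k, n_2k, n_3k) with n_1k + n_2k + n_3k = nk."""
    return [(i, j, nk - i - j)
            for i in range(nk + 1)
            for j in range(nk + 1 - i)]

def all_tables(stratum_sizes):
    """Yield every 3 x K table whose column sums equal the stratum sizes."""
    per_stratum = [stratum_compositions(nk) for nk in stratum_sizes]
    for combo in itertools.product(*per_stratum):
        yield np.array(combo).T          # columns are the strata

print(sum(1 for _ in all_tables([2, 2])))   # 36 possible tables
```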

4.1. E Approach

The E approach eliminates the unknown parameters by replacing them with the constrained MLEs. We first generate all possible tables. Define the tail area $\Omega_E(N^*) = \{N: T_\theta(N) \ge T_\theta(N^*)\}$ based on the test statistic $T_\theta$. The exact p-value of the observed data $N^*$ is expressed by
$$p_\theta^E(N^*) = P\left( T_\theta(N) \ge T_\theta(N^*) \mid \tilde{\gamma}^*, \tilde{\boldsymbol{\pi}}^* \right) = \sum_{N \in \Omega_E(N^*)} L(\tilde{\gamma}^*, \tilde{\boldsymbol{\pi}}^* \mid N), \quad \theta = L, SC, W,$$
where $\tilde{\gamma}^*$ and $\tilde{\boldsymbol{\pi}}^* = (\tilde{\pi}_1^*, \tilde{\pi}_2^*, \ldots, \tilde{\pi}_K^*)$ are the constrained MLEs of $\gamma$ and $\boldsymbol{\pi} = (\pi_1, \pi_2, \ldots, \pi_K)$. Meanwhile, the probability of a table in the tail area is $L(\tilde{\gamma}^*, \tilde{\boldsymbol{\pi}}^* \mid N) = \exp\{ l_0(\tilde{\gamma}^*, \tilde{\boldsymbol{\pi}}^* \mid N) \}$, the likelihood function under the null hypothesis. For convenience, $p_L^E$, $p_{SC}^E$, and $p_W^E$ are collectively called the E approach.
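A sketch (ours) of the exact E p-value built on the enumeration above: `stat` is any function mapping a $3\times K$ table to $T_\theta(N)$ (computing whatever MLEs it needs for that table internally), and `all_tables`, `constrained_mles`, and `cell_probs` are the helpers sketched earlier; boundary tables whose MLEs lie on the edge of the parameter space may need special handling in `stat`.

```python
import numpy as np
from math import factorial

def table_log_prob(table, gamma, pis, cell_probs):
    """Log-probability of a 3 x K table under H0 at (gamma, pi_1, ..., pi_K)."""
    table = np.asarray(table)
    logp = 0.0
    for k in range(table.shape[1]):
        n1, n2, n3 = (int(v) for v in table[:, k])
        coeff = factorial(n1 + n2 + n3) // (factorial(n1) * factorial(n2) * factorial(n3))
        logp += np.log(coeff) + table[:, k] @ np.log(cell_probs(gamma, pis[k]))
    return logp

def e_approach_pvalue(observed, stat, all_tables, constrained_mles, cell_probs):
    """Exact E p-value: tail probability evaluated at the constrained MLEs."""
    observed = np.asarray(observed)
    t_obs = stat(observed)
    gamma_t, pi_t = constrained_mles(observed)       # nuisance-parameter plug-in
    stratum_sizes = [int(s) for s in observed.sum(axis=0)]
    p_value = 0.0
    for table in all_tables(stratum_sizes):
        if stat(table) >= t_obs:                     # tail area Omega_E
            p_value += np.exp(table_log_prob(table, gamma_t, pi_t, cell_probs))
    return p_value
```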

4.2. M Approach

In Basu [30], the size of a test is understood as the maximum probability of the type I error rate. Thus, the M approach eliminates the unknown parameters by searching, over the whole range of $\boldsymbol{\gamma}$ and $\boldsymbol{\pi}$, for the parameter values that maximize the sum of the probabilities of all tables in the tail area; this maximum is the p-value of the M approach. Denote $\Theta = \{\boldsymbol{\pi}: \pi_k \in [0,1], k=1,2,\ldots,K\}$ and
$$\Lambda = \left\{ \boldsymbol{\gamma}: \frac{2 - (1-|1-2\pi_k|)(3+|1-2\pi_k|)}{2 - (1-|1-2\pi_k|)(1+|1-2\pi_k|)} \le \gamma_k \le 1 \right\},$$
where $\boldsymbol{\pi} = (\pi_1, \pi_2, \ldots, \pi_K)$ and $\boldsymbol{\gamma} = (\gamma_1, \gamma_2, \ldots, \gamma_K)$. Similar to the E approach, the tail area is $\Omega_M(N^*) = \{N: T_\theta(N) \ge T_\theta(N^*)\}$. Under these conditions, the exact p-value of the M approach is defined as
$$p_\theta^M(N^*) = \sup_{\boldsymbol{\gamma} \in \Lambda,\ \boldsymbol{\pi} \in \Theta} \sum_{N \in \Omega_M(N^*)} L(\boldsymbol{\gamma}, \boldsymbol{\pi} \mid N), \quad \theta = L, SC, W,$$
where $L(\boldsymbol{\gamma}, \boldsymbol{\pi} \mid N) = \exp\{ l(\boldsymbol{\gamma}, \boldsymbol{\pi} \mid N) \}$ is the likelihood function under $H_a$. The M approaches based on the three statistics are denoted as $p_L^M$, $p_{SC}^M$, and $p_W^M$.

4.3. E+M Approach

The E approach is not always effective because of unsatisfactory type I error rates. Lloyd [31] used an additional maximization step to improve it, which is called the E+M approach. First, the p-value of the E approach is used as a test statistic to define the tail area. Then, the sum of the probabilities of all tables in the tail area is maximized over the parameter space to obtain the exact p-value. Accordingly, the tail area of the E+M approach is defined as $\Omega_{E+M}(N^*) = \{N: p_\theta^E(N) \le p_\theta^E(N^*)\}$. The exact p-value of the E+M approach is expressed as
$$p_\theta^{E+M}(N^*) = \sup_{\boldsymbol{\gamma} \in \Lambda,\ \boldsymbol{\pi} \in \Theta} \sum_{N \in \Omega_{E+M}(N^*)} L(\boldsymbol{\gamma}, \boldsymbol{\pi} \mid N), \quad \theta = L, SC, W,$$
where $L(\boldsymbol{\gamma}, \boldsymbol{\pi} \mid N)$ is the same likelihood function as in the M approach. The E+M approach includes $p_L^{E+M}$, $p_{SC}^{E+M}$, and $p_W^{E+M}$.

5. Numerical Simulation

This section investigates the performance of the asymptotic and exact methods in terms of type I error rates and powers. Given a significance level of 0.05, the type I error rate is the probability of rejecting $H_0$ when $H_0$ is true. According to Tang et al. [35], a test is considered liberal when its type I error rate is larger than 0.06 and conservative when it is less than 0.04; otherwise, it is robust. In several tables of this paper, the type I error rates falling in the robust region (0.04–0.06) are shown in bold to illustrate the performance of the statistics. The power is defined by
$$\mathrm{power} = 1 - \beta = 1 - P(H_0 \text{ is accepted} \mid H_0 \text{ is false}).$$
A test is optimal if it is robust and has greater power.

5.1. Simulations of Asymptotic Methods

In the simulation, we first compare the performance of the test statistics $T_L$, $T_{SC}$, and $T_W$ in terms of empirical type I error rates under different parameter settings. Under $H_0: \gamma_1 = \gamma_2 = 0.1$, we take $K=2$, $n_1 = n_2 = 50$, and $\pi_1 = \pi_2 = 0.3$ as an example to describe the detailed calculation of the empirical type I error rate.
(i)
Bring the given values of $\gamma_k$, $n_k$, and $\pi_k$ ($k=1,2$) into the cell probability formulas (3). Let $F$ be the matrix of cumulative probabilities over the three types of ratings ($l=1,2,3$) for the two strata. Through calculation, we have
$$\begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \\ P_{31} & P_{32} \end{pmatrix} = \begin{pmatrix} 0.0390 & 0.0390 \\ 0.5220 & 0.5220 \\ 0.4390 & 0.4390 \end{pmatrix}, \qquad F = (F_{lk}) = \begin{pmatrix} 0.0390 & 0.0390 \\ 0.5610 & 0.5610 \\ 1.0000 & 1.0000 \end{pmatrix}.$$
(ii)
We produce an $n_1 \times K$ pseudorandom matrix drawn from the standard uniform distribution on the open interval $(0,1)$, denoted by $r = (r_{ik})$ ($i=1,2,\ldots,n_1$, $k=1,2,\ldots,K$). Define
$$n_{1k} = \sum_{i=1}^{n_1} I\{ r_{ik} < F_{1k} \}, \qquad n_{3k} = \sum_{i=1}^{n_1} I\{ r_{ik} > F_{2k} \}, \qquad k = 1,2,\ldots,K.$$
Then, $n_{2k} = n_1 - n_{1k} - n_{3k}$ for $k=1,2,\ldots,K$. When $K=2$, we obtain a sample (or table) with two strata, for example:
$$\begin{pmatrix} n_{11} & n_{12} \\ n_{21} & n_{22} \\ n_{31} & n_{32} \end{pmatrix} = \begin{pmatrix} 2 & 2 \\ 23 & 26 \\ 25 & 22 \end{pmatrix}.$$
Given 10,000 pseudorandom matrices, 10,000 samples are randomly produced under the null hypothesis $H_0: \gamma_1 = \gamma_2 = 0.1$.
(iii)
For each sample, we calculate the corresponding MLEs and construct the three statistics $T_L$, $T_{SC}$, and $T_W$. Given a significance level $\alpha$, $H_0$ is rejected if $T_\theta \ge \chi^2_{(K-1),(1-\alpha)}$, $\theta = L, SC, W$, where $\chi^2_{(K-1),(1-\alpha)}$ is the $100(1-\alpha)$th percentile of the chi-square distribution with $K-1$ degrees of freedom.
(iv)
The empirical type I error rate is the proportion of samples in which $H_0$ is rejected, i.e., the number of rejections divided by 10,000 (a simulation sketch in code follows these steps).
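The four steps can be coded directly; below is a sketch of ours that replaces the uniform-draw classification of step (ii) with numpy’s equivalent multinomial sampler. Here `stat` is any of the $T_L$/$T_{SC}$/$T_W$ sketches (taking a $3\times K$ table and returning the statistic value); all names are assumptions of this sketch.

```python
import numpy as np
from scipy.stats import chi2

def cell_probs(gamma, pi):
    a = 1.0 - 2.0 * pi * (1.0 - pi)
    return np.array([pi * (2.0 - pi) - 0.5 + gamma * a / 2.0,
                     a * (1.0 - gamma),
                     (1.0 - pi) * (1.0 + pi) - 0.5 + gamma * a / 2.0])

def empirical_type1(stat, gamma, pis, sizes, reps=10_000, alpha=0.05, seed=2023):
    """Proportion of simulated tables (generated under H0) that reject H0."""
    rng = np.random.default_rng(seed)
    K = len(sizes)
    critical = chi2.ppf(1.0 - alpha, df=K - 1)
    rejections = 0
    for _ in range(reps):
        table = np.column_stack([
            rng.multinomial(sizes[k], cell_probs(gamma, pis[k]))
            for k in range(K)])                 # one simulated 3 x K table
        rejections += stat(table) >= critical
    return rejections / reps

# Example setting from the text: K = 2, n1 = n2 = 50, gamma = 0.1, pi = 0.3:
# empirical_type1(stat=some_statistic, gamma=0.1, pis=[0.3, 0.3], sizes=[50, 50])
```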
Through steps (i)–(iv), we can calculate the empirical type I error rates of the asymptotic test statistics $T_L$, $T_{SC}$, and $T_W$ under different parameter settings. In practice, the AC 1 coefficient is usually positive. Under $H_0: \gamma_1 = \cdots = \gamma_K \equiv \gamma$, Table 7 shows the empirical type I error rates of the asymptotic statistics for $K=2$ under balanced and unbalanced $\pi$ settings. The corresponding results for $K=3,4$ are shown in Table A1 and Table A2 of Appendix A.4. The tables show that the type I error rates of all the statistics move closer to the significance level of 0.05 as the sample size increases. When the sample sizes are relatively small, the type I error rates of the likelihood ratio and score statistics are smaller than 0.05, while the Wald-type test has several liberal type I error rates. All three test statistics have conservative type I error rates for small and moderate samples when $\gamma$ is close to 1. As the number of strata increases, $T_W$ becomes more liberal. For unbalanced $\pi$, some type I error rates under $\gamma = 0.1$ exceed 0.06 for $K=3$ and $K=4$. Overall, $T_L$ should be recommended because it has the most robust type I error rates among the three statistics.
We use three-dimensional figures to investigate the type I error rates of the asymptotic methods. For convenience, the sample sizes are set to $n_1 = n_2 = \cdots = n_K \equiv n = 10, 50, 100$. The parameters are $\pi_k = \pi$ and $\gamma_k = \gamma$ ($k=1,2,\ldots,K$, $K=2,3,4$). For each sample size, $\pi$ increases from 0.1 to 0.9 in steps of 0.04, and $\gamma$ increases from $-0.9$ to 0.9 in steps of 0.04. Figure 2 shows the surfaces of type I error rates for all tests under $K=2$; the cases $K=3,4$ are displayed in Figure A1 and Figure A2 of Appendix A.4. From these figures, we observe that the type I error rates of the statistics are smaller than 0.05 when $\gamma$ is close to 1 or $-1$. For a given sample size, the type I error rates of the Wald-type statistic tend to become larger as the number of strata increases; thus, it is more liberal than the other two statistics. The empirical type I error rates of the likelihood ratio and score statistics move closer to 0.05 as the sample size increases. In large sample scenarios, $T_L$, $T_{SC}$, and $T_W$ are usually robust, and the type I error rates of $T_L$ are the most concentrated around 0.05. Overall, the likelihood ratio statistic performs best under all configurations. However, when the sample sizes are small, most type I error rates of $T_L$ and $T_{SC}$ are smaller than 0.04, and those of $T_W$ are greater than 0.06.
Next, we analyze the powers of the proposed tests. The calculation is similar to that of the empirical type I error rate, except that the samples are generated under the alternative hypothesis $H_a$. Take $n_1 = n_2 = \cdots = n_K \equiv n = 10, 50, 100$. The following parameter settings are considered for each sample size: (i) $K=2$, $\boldsymbol{\pi} = (0.5, 0.5)$, $\gamma_1 = 0.1$, $\gamma_2 = -0.9{:}0.05{:}0.95$; (ii) $K=3$, $\boldsymbol{\pi} = (0.5, 0.5, 0.5)$, $\gamma_1 = \gamma_3 = 0.1$, $\gamma_2 = -0.9{:}0.05{:}0.95$; (iii) $K=4$, $\boldsymbol{\pi} = (0.5, 0.5, 0.5, 0.5)$, $\gamma_1 = \gamma_3 = \gamma_4 = 0.1$, $\gamma_2 = -0.9{:}0.05{:}0.95$. Here, $a{:}b{:}c$ means increasing from $a$ to $c$ in steps of $b$. Under the alternative hypothesis $H_a$, we randomly generate 10,000 samples for each design. The empirical power equals the proportion of samples in which $H_0$ is rejected. Figure 3 shows the empirical powers of the three asymptotic tests. The Wald-type test has higher empirical powers than the other two statistics, especially in small samples. The powers of the test statistics become higher as the difference between $\gamma_2$ and $\gamma_1$ ($\gamma_3$) increases. The powers of the three statistics become closer to each other as the sample size increases.
In summary, $T_L$ is recommended for the homogeneity test of stratified AC 1 in large sample sizes because of its robust type I error rates and satisfactory powers.

5.2. Exact Methods Results

Considering the unsatisfactory performance of the asymptotic methods for small sample sizes, we introduce the exact E, M, and E+M methods to improve effectiveness. The type I error rates and powers of these three methods are compared with the asymptotic approaches to investigate the advantages of the exact methods. The algorithm for the exact p-value is usually computationally intensive and time consuming, and sometimes exceeds the memory limits of the computer. Thus, running time is an important determinant of the feasible numbers of strata and sample sizes in the numerical study of the exact methods. For simplicity, we focus on $n_1 = \cdots = n_K \equiv n$, $\pi_k = 0.5$, and $\gamma_k = 0.1$ ($k=1,2,\ldots,K$). The average over 100 runs is used as the running time of the exact p-value. We study the running times for the following parameter settings: (i) $K=2$, $n=2,\ldots,11$, and (ii) $n=3$, $K=2,3,4,5$. The running times of the different methods are shown in Table 8. From the results, it is obvious that the running time increases exponentially with the number of strata $K$ and the stratum sizes $n$, which challenges a computer’s storage and clock speed. Thus, the cases $n=10$ and $K=2,3$ are considered in our work. Take $\pi_k = \pi$ and $\gamma_k = \gamma$ ($k=1,2,\ldots,K$), where $\pi = 0{:}0.02{:}1$ and $\gamma = -1{:}0.02{:}1$. Figure 4 shows the surfaces of type I error rates for $K=2$; the case $K=3$ is provided in Figure A3 of Appendix A.4. The small diagrams in the upper right corners show the curves of the type I error rates under $\pi = 0.5$ and $\gamma = -1{:}0.02{:}1$. In the large diagrams, $p_L^A$ and $p_W^A$ have liberal type I error rates, while the type I error rates of $p_{SC}^A$ are smaller than 0.05. The M and E+M approaches produce conservative type I error rates. The E approaches under the likelihood ratio and score statistics are better than that under the Wald-type test when $K=2$. For $K=3$, the surfaces of the E approaches under the three statistics are closer to the significance level for positive $\gamma$. In the small diagrams, the curves of the type I error rates have bimodal shapes. To reveal the reason, we consider the case of $K=2$, $\gamma_k = \gamma$, and $\pi_k = 0.5$. As a part of the tail probability,
$$LL = \exp\left\{ \sum_{k=1}^{K} l_k(\gamma_k, \pi_k \mid \boldsymbol{n}_k) \right\}$$
takes the same value for each $\gamma$ when the sum of the $n_{2k}$ is fixed. Table 9 shows how $LL$ changes as $\gamma$ and $(n_{21}+n_{22})$ increase: for a given sum of the $n_{2k}$, increasing $\gamma$ shifts the peak, and the peak also moves as the sum of the $n_{2k}$ increases for a given $\gamma$. Meanwhile, each method’s tail area determines the location and height of the bimodal shape. Next, we compare the type I error rates of the exact and asymptotic methods under several parameter settings. Let $K=2$ and $n_1 = n_2 = 10, 25$. From Table 10, the type I error rates of $p_L^E$ and $p_{SC}^E$ are closer to 0.05 than those of $p_W^E$. The E approach works better as the sample size increases within the range of small sample sizes; its type I error rates are close to 0.05 when $n_1 = n_2 = 25$. This reveals that the E approach is more effective than the asymptotic methods for high $\gamma$.
According to the relationship between $\gamma_k$ and $\pi_k$, the following parameter settings are considered under $H_a$: (i) $n_k = n = 10$, $K=2$, $\boldsymbol{\pi} = (0.5, 0.5)$, $\gamma_1 = 0.1$, $\gamma_2 = -0.9{:}0.05{:}0.9$; (ii) $n_k = n = 10$, $K=3$, $\boldsymbol{\pi} = (0.5, 0.5, 0.5)$, $\gamma_1 = \gamma_3 = 0.1$, $\gamma_2 = -0.9{:}0.05{:}0.9$ ($k=1,2,\ldots,K$). Figure 5 provides the power curves of the exact methods. For the A approach, $p_W^A$ has higher powers under the different parameter settings. The power becomes larger as the absolute value of $\gamma_2 - \gamma_1$ ($\gamma_3$) becomes larger; on the contrary, the powers of each method tend to 0.05 as $\gamma_2$ approaches 0.1. The power curves of the E+M approaches are close to each other. Then, we compare the powers of the asymptotic and exact methods under different parameter settings. For $K=2$ and $n_1 = n_2 = \cdots = n_K = 10$, the values of $\gamma_k$ are taken as (i) $\gamma_1 = 0.1$, $\gamma_2 = 0.5{:}0.1{:}0.9$; (ii) $\gamma_1 = 0.3$, $\gamma_2 = 0.6{:}0.1{:}0.9$; and (iii) $\gamma_1 = 0.5$, $\gamma_2 = 0.8, 0.9$. When $K=3$, let $\gamma_1 = \gamma_3$ and the other settings be the same. Table 11 provides the power comparisons for $K=2$ under the balanced $\pi$ conditions; Table A3, Table A4, and Table A5 show the comparisons under the other settings. The powers of the exact methods are generally smaller than those of the asymptotic methods; however, $p_{SC}^E$ has higher powers than $p_{SC}^A$. Among the exact methods, the E approach has higher power than the other two, and the E approach under the Wald-type statistic has the highest power.
In summary, the exact E method can effectively improve the homogeneity test of stratified AC 1 under small sample sizes and high $\gamma$. The E approaches under the likelihood ratio and score statistics perform better than that under the Wald-type statistic.

6. Applications

We now revisit the two real examples from the introduction. Table 3 shows the data structure among the 83 twins. For this large sample, the hypotheses of the homogeneity test for AC 1 across the two strata are
$$H_0: \gamma_1 = \gamma_2 \equiv \gamma \quad \text{vs.} \quad H_a: \gamma_1 \neq \gamma_2.$$
By computation, the unconstrained MLEs of $\boldsymbol{\gamma}$ and $\boldsymbol{\pi}$ are $\hat{\boldsymbol{\gamma}} = (0.4615, 0.0312)$ and $\hat{\boldsymbol{\pi}} = (0.5000, 0.5161)$, and the constrained MLEs are $\tilde{\gamma} = 0.2788$ and $\tilde{\boldsymbol{\pi}} = (0.5000, 0.5351)$. Table 12 provides the values of the statistics and the p-values. Given a significance level $\alpha = 0.05$, the values of the test statistics $T_L$, $T_{SC}$, and $T_W$ are all larger than $\chi^2_{1,0.95} = 3.8415$, and all the p-values are smaller than 0.05. Thus, the null hypothesis $H_0$ is rejected at the 0.05 significance level. There is a significant difference between the two AC 1 coefficients, and we cannot merge the data of the two strata to compute a common coefficient. We then need to investigate how zygosity affects the consistency of the alcohol-drinking status of male twins.
Table 4 shows the small data structure of the clinical COVID-19 trial. The A, E, M, and E+M methods are applied to test $H_0: \gamma_1 = \gamma_2 \equiv \gamma$ vs. $H_a: \gamma_1 \neq \gamma_2$. By calculation, the unconstrained MLEs of $\boldsymbol{\gamma}$ and $\boldsymbol{\pi}$ are $\hat{\boldsymbol{\gamma}} = (0.6656, 0.2197)$ and $\hat{\boldsymbol{\pi}} = (0.6176, 0.6176)$, and the constrained MLEs are $\tilde{\gamma} = 0.4537$ and $\tilde{\boldsymbol{\pi}} = (0.5882, 0.6666)$. The values of the statistics are $T_L(N^*) = 2.0150$, $T_{SC}(N^*) = 1.9674$, and $T_W(N^*) = 2.0805$. Table 13 shows the corresponding p-values of the asymptotic and exact methods. The running time is about 8 min because there are two strata with $n_1 = n_2 = 17$ (cf. Table 8). No approach finds a significant difference in the stratified AC 1 . Thus, we can merge the data of the two strata and estimate the common AC 1 by its MLE, which is 0.4537.

7. Concluding Remarks

This article takes the stratified AC 1 coefficients as the object of study and constructs the likelihood function of the observed data. The primary purpose is to derive various statistics for testing the homogeneity of stratified AC 1 in the case of two raters with a binary outcome. We construct asymptotic and exact methods for large and small sample sizes, respectively. Two asymptotic test statistics and their explicit expressions are derived for large sample sizes, namely the likelihood ratio statistic ($T_L$) and the Wald-type statistic ($T_W$); the score statistic ($T_{SC}$) proposed by Honda and Ohyama [19] is also reviewed. The asymptotic p-values $p_\theta^A$ ($\theta = L, SC, W$) under the above statistics are denoted as the A approach. Three exact methods (E, M, and E+M) based on $T_L$, $T_{SC}$, and $T_W$ are proposed for small sample sizes.
We conduct numerical studies to compare the performance of the above methods in terms of type I error rates and powers. For large samples, the type I error rates of the likelihood ratio statistic are closest to the predetermined significance level of 0.05, and the powers of the statistics improve as the sample size increases. Overall, the likelihood ratio statistic is optimal among the three statistics for large sample sizes. However, the asymptotic tests may produce unsatisfactory type I error rates in small sample scenarios and at high $\gamma$. For small sample sizes with $K=2$, $p_L^A$ and $p_W^A$ are liberal, and $p_{SC}^A$ has conservative type I error rates under different parameter settings. The type I error rates of the E approach are closer to the significance level of 0.05, whereas the M and E+M approaches have conservative type I error rates; moreover, $p_L^E$ and $p_{SC}^E$ are more robust than $p_W^E$. When $K=3$ and $\gamma$ is positive, the E approach has the most robust type I error rates among the three exact methods, with $p_L^E$ and $p_{SC}^E$ performing better. The type I error rates of the exact E method move closer to 0.05 as the sample size increases, and the E approach can improve the effectiveness of homogeneity tests at high $\gamma$. In terms of power, the A approach has larger powers than the exact methods, and under the Wald-type statistic each exact method has higher power. Thus, the E approaches based on the likelihood ratio and score statistics are robust for small sample sizes.
The proposed methods can be used not only in medical research but also in biometrics and psychological measurement. Meanwhile, the exact methods can be applied to other data types, such as binary outcomes with multiple raters. This work focuses on constructing parametric statistics through unconstrained and constrained MLEs. However, many problems remain to be solved. For example, how can optimal tests be constructed? How can the exact methods be improved for a larger $K$ or sample size $n_k$? How can the heavy computations caused by enumerating all possible tables be reduced? More studies should be conducted on these problems in the future.

Author Contributions

Methodology, M.X.; software, M.X. and K.M.; writing—original draft preparation, M.X.; writing—review and editing, Z.L., K.M. and K.M.S.; supervision, Z.L.; visualization, M.X.; funding acquisition, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (Grant No.: 12061070) and the Science and Technology Department of Xinjiang Uygur Autonomous Region (Grant No.: 2021D01E13).

Institutional Review Board Statement

Not applicable, because the study involved the development of statistical methods. The clinical examples use retrospective data published in Hannah et al. [16] and Hang et al. [20].

Informed Consent Statement

Not applicable.

Data Availability Statement

Clinical data referred to are from Hannah et al. [16] and Hang et al. [20].

Acknowledgments

We thank reviewers and editors for constructive and valuable advice for improving this article.

Conflicts of Interest

The authors declare that they have no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MLEs: maximum likelihood estimates
$T_L$: likelihood ratio statistic
$T_{SC}$: score statistic
$T_W$: Wald-type statistic
A: asymptotic approach
E: E approach
M: M approach
E+M: E+M approach
PVR: proliferative vitreoretinopathy

Appendix A

Appendix A.1. The Fisher Information Matrix of MLEs under H0

Let $I_1$ be the $(K+1)\times(K+1)$ Fisher information matrix,
$$I_1 = \begin{pmatrix} I_{111} & I_{112} \\ I_{112}^{T} & I_{122} \end{pmatrix}, \quad I_{111} = -E\!\left(\frac{\partial^2 l_0}{\partial\gamma^2}\right) = \sum_{k=1}^{K}\frac{n_k a_k^2 b_k}{4}, \quad I_{112} = \left(-E\!\left(\frac{\partial^2 l_0}{\partial\gamma\,\partial\pi_1}\right), \ldots, -E\!\left(\frac{\partial^2 l_0}{\partial\gamma\,\partial\pi_K}\right)\right) = \left(\frac{n_1 a_1 c_1}{2}, \ldots, \frac{n_K a_K c_K}{2}\right),$$
$$I_{122} = \operatorname{diag}\!\left(-E\!\left(\frac{\partial^2 l_0}{\partial\pi_1^2}\right), \ldots, -E\!\left(\frac{\partial^2 l_0}{\partial\pi_K^2}\right)\right) = \operatorname{diag}\!\left(n_1 d_1, \ldots, n_K d_K\right),$$
where
$$\begin{aligned} a_k &= 1 - 2\pi_k(1-\pi_k), \\ b_k &= \frac{1}{P_{1k}(\gamma,\pi_k)} + \frac{4}{P_{2k}(\gamma,\pi_k)} + \frac{1}{P_{3k}(\gamma,\pi_k)}, \\ c_k &= \frac{1}{P_{1k}(\gamma,\pi_k)} - \frac{1}{P_{3k}(\gamma,\pi_k)} + (1-\gamma)(1-2\pi_k)\, b_k, \\ d_k &= \frac{1}{P_{1k}(\gamma,\pi_k)} + \frac{1}{P_{3k}(\gamma,\pi_k)} + (1-\gamma)(1-2\pi_k)\left[ \frac{1}{P_{1k}(\gamma,\pi_k)} - \frac{1}{P_{3k}(\gamma,\pi_k)} + c_k \right] \end{aligned}$$
for $k = 1, 2, \ldots, K$.

Appendix A.2. The Fisher Information Matrix I2 of the Score Statistic

For the observed data $N = (\boldsymbol{n}_1, \boldsymbol{n}_2, \ldots, \boldsymbol{n}_K)$, the Fisher information matrix of the score statistic is
$$I_2 = \begin{pmatrix} I_{211} & I_{212} \\ I_{221} & I_{222} \end{pmatrix}, \quad I_{211} = \operatorname{diag}\!\left(-E\!\left(\frac{\partial^2 l_k}{\partial\gamma_k^2}\right)\right), \quad I_{212} = I_{221} = \operatorname{diag}\!\left(-E\!\left(\frac{\partial^2 l_k}{\partial\gamma_k\,\partial\pi_k}\right)\right), \quad I_{222} = \operatorname{diag}\!\left(-E\!\left(\frac{\partial^2 l_k}{\partial\pi_k^2}\right)\right),$$
where each block is a $K\times K$ diagonal matrix ($k=1,\ldots,K$).
Since $E(n_{lk}) = n_k P_{lk}(\gamma_k,\pi_k)$ ($l=1,2,3$), we obtain $-E\!\left(\frac{\partial^2 l_k}{\partial\gamma_k^2}\right) = \frac{n_k a_k^2 b_k}{4}$, $-E\!\left(\frac{\partial^2 l_k}{\partial\gamma_k\,\partial\pi_k}\right) = \frac{n_k a_k c_k}{2}$, and $-E\!\left(\frac{\partial^2 l_k}{\partial\pi_k^2}\right) = n_k d_k$. Thus, the Fisher information matrix $I_2$ simplifies to
$$I_2 = \frac{1}{4}\begin{pmatrix} \operatorname{diag}\!\left(n_k a_k^2 b_k\right) & \operatorname{diag}\!\left(2 n_k a_k c_k\right) \\ \operatorname{diag}\!\left(2 n_k a_k c_k\right) & \operatorname{diag}\!\left(4 n_k d_k\right) \end{pmatrix}, \qquad k = 1, \ldots, K.$$

Appendix A.3. The Simplification of TW

To obtain the simplified form of $T_W$, we calculate $C I_3^{-1} C^{T}$ and obtain
$$C I_3^{-1} C^{T} = \begin{pmatrix} g_1 + g_2 & -g_2 & 0 & \cdots & 0 & 0 \\ -g_2 & g_2 + g_3 & -g_3 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & \cdots & -g_{K-2} & g_{K-2} + g_{K-1} & -g_{K-1} \\ 0 & 0 & \cdots & 0 & -g_{K-1} & g_{K-1} + g_K \end{pmatrix},$$
where $g_k = \dfrac{4 d_k}{n_k a_k^2 (b_k d_k - c_k^2)}$. Obviously, $(C I_3^{-1} C^{T})^{-1}$ is a symmetric matrix, in which
$$\left( C I_3^{-1} C^{T} \right)^{-1}_{i,j} = \frac{g_{i+1}\cdots g_j \, h_{j+1}\cdots h_{K-1}}{s_i \cdots s_{K-1}}, \quad j > i, \qquad \left( C I_3^{-1} C^{T} \right)^{-1}_{i,i} = \frac{h_{i+1}\cdots h_{K-1}}{s_i \cdots s_{K-1}}, \quad \forall\, i,$$
where
$$\begin{aligned} & h_K = 1, \quad h_{K-1} = g_{K-1} + g_K, \quad h_i = (g_i + g_{i+1}) - \frac{g_{i+1}^2}{h_{i+1}}, \quad i = 1, 2, \ldots, K-2, \\ & s_1 = g_1 + g_2, \quad s_i = (g_i + g_{i+1}) - \frac{g_i^2}{s_{i-1}}, \quad i = 2, \ldots, K-1. \end{aligned}$$
By computation, the simplified form of T W is given as
$$T_W = \sum_{i=1}^{K-1} \sum_{j=1}^{K-1} (\hat{\gamma}_i - \hat{\gamma}_{i+1})(\hat{\gamma}_j - \hat{\gamma}_{j+1}) \left( C I_3^{-1} C^{T} \right)^{-1}_{i,j}.$$

Appendix A.4. Other Tables and Figures

Table A1. Empirical type I error rates of asymptotic statistics for K = 3 (balanced sample sizes).
Balanced π Conditions | Unbalanced π Conditions
n 1 = n 2 = n 3 γ π 1 = π 2 = π 3 T L T SC T W n 1 = n 2 = n 3 γ π 1 π 2 π 3 T L T SC T W
100.10.3 0.0527 0.0454 0.1192100.10.30.50.5 0.0515 0.0420 0.1030
0.3 0.0576 0.0462 0.10690.3 0.0510 0.03930.0958
0.5 0.0421 0.03370.06550.50.03380.02590.0627
0.70.01520.01380.01670.70.01240.01140.0147
0.90.00140.00140.00000.90.00050.00070.0001
0.10.6 0.0512 0.0443 0.10150.10.60.40.2 0.0525 0.0439 0.1115
0.3 0.0444 0.03410.08900.3 0.0529 0.0406 0.0974
0.50.03070.0241 0.0574 0.5 0.0412 0.03150.0671
0.70.01290.01180.01570.70.01610.01440.0204
0.90.00060.00080.00010.90.00050.00080.0000
0.10.5 0.0508 0.0413 0.09900.10.50.40.3 0.0578 0.0472 0.1078
0.3 0.0437 0.03170.08570.3 0.0524 0.0416 0.0992
0.50.03130.0223 0.0574 0.50.03790.02900.0662
0.70.00860.00820.01270.70.01220.01150.0145
0.90.00020.00030.00020.90.00040.00040.0001
500.10.3 0.0547 0.0528 0.0681500.10.30.50.5 0.0494 0.0480 0.0600
0.3 0.0498 0.0482 0.06000.3 0.0507 0.0477 0.0602
0.5 0.0533 0.0514 0.06320.5 0.0513 0.0471 0.0586
0.7 0.0548 0.0495 0.06060.7 0.0537 0.0479 0.0634
0.90.03380.02890.02660.90.02720.02420.0221
0.10.6 0.0480 0.0464 0.0583 0.10.60.40.20.06550.06460.0823
0.3 0.0470 0.0450 0.0563 0.3 0.0563 0.0541 0.0671
0.5 0.0517 0.0489 0.0590 0.5 0.0510 0.0480 0.0600
0.7 0.0528 0.0457 0.06130.7 0.0577 0.0505 0.0628
0.90.02370.02160.02110.90.03180.02860.0249
0.10.5 0.0502 0.0476 0.0588 0.10.50.40.3 0.0485 0.0468 0.0598
0.3 0.0483 0.0455 0.0586 0.3 0.0528 0.0499 0.0627
0.5 0.0514 0.0477 0.06220.5 0.0531 0.0489 0.0601
0.7 0.0526 0.0456 0.0588 0.7 0.0538 0.0470 0.0609
0.90.02290.02200.02360.90.02810.02690.0248
1000.10.3 0.0520 0.0515 0.0588 1000.10.30.50.5 0.0509 0.0506 0.0562
0.3 0.0518 0.0507 0.0571 0.3 0.0501 0.0491 0.0545
0.5 0.0487 0.0480 0.0530 0.5 0.0495 0.0470 0.0531
0.7 0.0523 0.0503 0.0542 0.7 0.0481 0.0455 0.0550
0.9 0.0563 0.0473 0.06050.9 0.0524 0.0402 0.0618
0.10.6 0.0492 0.0489 0.0543 0.10.60.40.20.07740.07720.0877
0.3 0.0500 0.0484 0.0551 0.3 0.0542 0.0528 0.0619
0.5 0.0579 0.0559 0.06310.5 0.0520 0.0509 0.0565
0.7 0.0514 0.0499 0.0569 0.7 0.0529 0.0513 0.0539
0.9 0.0529 0.0431 0.06100.9 0.0529 0.0430 0.0578
0.10.5 0.0499 0.0487 0.0545 0.10.50.40.3 0.0497 0.0479 0.0546
0.3 0.0475 0.0469 0.0519 0.3 0.0466 0.0461 0.0511
0.5 0.0518 0.0496 0.0555 0.5 0.0478 0.0470 0.0513
0.7 0.0525 0.0496 0.0581 0.7 0.0506 0.0478 0.0568
0.9 0.0563 0.0428 0.06530.9 0.0555 0.0471 0.0617
Note: The type I error rates of robust region (0.04–0.06) are shown in bold.
Table A2. Empirical type I error rates of homogeneity tests for K = 4 (balanced sample sizes).
Balanced π Conditions | Unbalanced π Conditions
n 1 = n 2 = n 3 = n 4 γ π 1 = π 2 = π 3 = π 4 T L T SC T W n 1 = n 2 = n 3 = n 4 γ π 1 π 2 π 3 π 4 T L T SC T W
100.10.3 0.0559 0.0437 0.1519100.10.30.50.50.2 0.0541 0.0429 0.1287
0.3 0.0595 0.0465 0.13490.3 0.0466 0.03490.1166
0.5 0.0408 0.03250.08570.50.03530.02770.0763
0.70.01260.01370.01590.70.01040.01140.0147
0.90.00050.00150.00010.90.00050.00160.0001
0.10.6 0.0543 0.0455 0.12810.10.60.40.20.4 0.0594 0.0482 0.1387
0.3 0.0462 0.03450.12060.3 0.0495 0.03760.1175
0.50.02650.02040.06710.50.03750.02900.0783
0.70.00840.00960.01200.70.01080.01220.0150
0.90.00020.00060.00010.90.00010.00050.0000
0.10.5 0.0491 0.0404 0.11580.10.50.40.30.3 0.0583 0.0475 0.1433
0.3 0.0436 0.03400.10490.3 0.0533 0.03960.1233
0.50.02340.0193 0.0572 0.50.03280.02500.0768
0.70.00730.00930.01090.70.00950.01120.0145
0.90.00020.00080.00000.90.00050.00110.0002
500.10.3 0.0531 0.0515 0.0694500.10.30.50.50.20.06380.06170.0831
0.3 0.0504 0.0485 0.06590.3 0.0525 0.0501 0.0671
0.5 0.0532 0.0499 0.06910.5 0.0508 0.0472 0.0633
0.7 0.0577 0.0497 0.06820.7 0.0537 0.0466 0.0705
0.90.03040.03010.02560.90.02640.02650.0239
0.10.6 0.0511 0.0477 0.06620.10.60.40.20.40.06210.06080.0810
0.3 0.0525 0.0495 0.06600.3 0.0523 0.0497 0.0692
0.5 0.0566 0.0513 0.06880.5 0.0525 0.0485 0.0658
0.7 0.0549 0.0475 0.07260.7 0.0532 0.0457 0.0674
0.90.01910.02080.01850.90.02630.02600.0236
0.10.5 0.0433 0.0416 0.0578 0.10.50.40.30.3 0.0486 0.0461 0.0633
0.3 0.0477 0.0444 0.0593 0.3 0.0499 0.0473 0.0651
0.5 0.0502 0.0458 0.06010.5 0.0519 0.0472 0.0631
0.7 0.0549 0.0452 0.07390.7 0.0549 0.0476 0.0688
0.90.02230.02440.02060.90.02870.02880.0256
1000.10.3 0.0508 0.0499 0.0585 1000.10.30.50.50.20.08050.07940.0921
0.3 0.0494 0.0481 0.0571 0.3 0.0551 0.0551 0.0631
0.5 0.0498 0.0485 0.0568 0.5 0.0471 0.0457 0.0531
0.7 0.0504 0.0488 0.0556 0.7 0.0531 0.0505 0.0595
0.9 0.0568 0.0464 0.07120.9 0.0513 0.0408 0.0743
0.10.6 0.0462 0.0454 0.0523 0.10.60.40.20.40.07940.07890.0960
0.3 0.0484 0.0473 0.0547 0.3 0.0551 0.0535 0.0647
0.5 0.0493 0.0479 0.0556 0.5 0.0498 0.0473 0.0566
0.7 0.0498 0.0472 0.0599 0.7 0.0484 0.0465 0.0548
0.9 0.0494 0.0401 0.07370.9 0.0586 0.0482 0.0804
0.10.5 0.0476 0.0464 0.0531 0.10.50.40.30.3 0.0499 0.0491 0.0579
0.3 0.0493 0.0469 0.0561 0.3 0.0499 0.0486 0.0582
0.5 0.0484 0.0459 0.0571 0.5 0.0495 0.0476 0.0577
0.7 0.0536 0.0471 0.06200.7 0.0523 0.0493 0.0584
0.9 0.0482 0.0407 0.07170.9 0.0587 0.0467 0.0783
Note: The type I error rates of robust region (0.04–0.06) are shown in bold.
Figure A1. The surfaces of empirical type I error rates for asymptotic tests under K = 3 .
Figure A2. The surfaces of empirical type I error rates for asymptotic tests under K = 4 .
Figure A3. The surfaces and curves of type I error rates for exact methods under K = 3 .
Table A3. Powers of exact and asymptotic methods for K = 2 (unbalanced π conditions).
γ 1 | γ 2 | π 1 | π 2 | Exact Methods | Asymptotic Methods
p L E p SC E p W E p L M p SC M p W M p L EM p SC EM p W EM p L A p SC A p W A
0.10.50.30.50.16680.16560.17690.14660.16180.13410.15230.13830.15650.18080.16790.1945
0.60.24000.23740.25200.21560.23180.18940.22080.20320.22630.25790.23860.2660
0.70.33940.33410.35260.31100.32540.26290.31510.29370.32170.36190.33270.3599
0.80.47200.46120.48440.44080.44690.35850.44310.41690.44930.49980.45400.4807
0.90.64590.62530.65410.61460.60080.48160.61520.58070.61670.68070.60610.6341
0.30.60.13100.13100.14290.11120.12020.08500.11410.10380.12070.14960.12400.1483
0.70.20270.20010.21610.17760.18190.12920.17990.16270.18790.22970.18620.2181
0.80.31100.30300.32420.28060.27220.19460.28210.25280.28990.35080.27660.3186
0.90.47070.45190.47920.43660.40010.28900.43740.38680.44020.52980.40380.4584
0.50.80.16360.15640.16920.14210.12280.07640.14400.12080.14620.20860.12550.1583
0.90.26900.25180.27030.23780.19580.12330.24060.19870.24000.34270.19870.2456
0.10.50.50.40.13990.15010.16290.11430.13820.11100.12360.12090.13350.16120.14550.1823
0.60.20890.22310.24060.17820.20560.16680.18750.18450.20130.23860.21400.2604
0.70.30680.32480.34710.27180.29920.24620.28030.27630.29780.34620.30860.3645
0.80.44280.46190.48680.40490.42490.35580.41250.40560.43120.49100.43490.4967
0.90.62870.64030.66240.58990.58860.50420.59800.58360.61090.68030.59830.6557
0.30.60.10970.11940.13090.09270.10150.07470.09670.09430.10490.13320.10690.1373
0.70.17680.18820.20360.15330.15940.11940.15850.15350.16820.21060.16640.2062
0.80.28450.29470.31370.25170.24810.18950.25930.24850.26850.33160.25730.3066
0.90.45270.45320.47340.40620.37950.29590.41890.39610.42180.51440.39080.4450
0.50.80.15420.15320.16330.12680.10940.07320.13480.12270.13440.19080.11600.1438
0.90.26420.25400.26680.21820.18040.12320.23260.20920.22690.31950.19020.2265
0.10.50.60.40.14780.15520.16780.12100.14530.11630.13070.12550.13910.16790.15310.1878
0.60.21840.22830.24530.18650.21410.17210.19610.18940.20760.24580.22270.2650
0.70.31700.32940.35040.28130.30850.25000.28960.28070.30410.35290.31770.3672
0.80.45170.46450.48760.41500.43380.35600.42120.40780.43640.49590.44300.4970
0.90.63250.63980.66020.59920.59500.49750.60400.58080.61300.68270.60290.6544
0.30.60.11400.12250.13370.09710.10650.07710.10080.09690.10790.13730.11190.1407
0.70.18160.19180.20660.15950.16590.12170.16390.15620.17180.21580.17260.2101
0.80.28840.29820.31610.25980.25620.19060.26540.25020.27190.33770.26420.3108
0.90.45270.45550.47410.41600.38850.29400.42450.39480.42360.52120.39730.4495
0.50.80.15410.15560.16480.13130.11420.07390.13740.12340.13490.19600.11990.1468
0.90.26180.25680.26810.22480.18730.12350.23600.20890.22660.32810.19520.2305
Table A4. Powers of exact and asymptotic methods for K = 3 (balanced π conditions).
γ 1 = γ 3 | γ 2 | π 1 = π 2 = π 3 | Exact Methods | Asymptotic Methods
p L E p SC E p W E p L M p SC M p W M p L EM p SC EM p W EM p L A p SC A p W A
0.10.50.30.13510.13890.14210.10490.09290.10890.12160.12740.11500.15960.13180.2366
0.60.19830.20020.20890.16060.13830.15830.18030.18450.17350.23160.19010.3167
0.70.29050.28600.30750.24600.20470.23280.26770.26610.26350.33420.27290.4258
0.80.42220.40280.45040.37370.29910.34330.39590.37990.39940.47620.38720.5682
0.90.60670.55720.65330.56070.42960.50510.58070.53490.60110.66840.54070.7462
0.30.60.10550.10440.11830.08620.06620.06860.09510.09590.09490.13280.09670.1829
0.70.16420.15480.18580.13910.10080.10810.14990.14360.15340.20350.14390.2632
0.80.26020.23190.29890.22750.15430.17440.24040.21740.25310.31710.21660.3880
0.90.41240.34590.48270.37070.23450.28220.38590.32820.41770.49430.32520.5742
0.50.80.13510.11690.15500.10870.06660.05650.12050.10850.11770.17990.10060.2119
0.90.21980.17870.25950.17850.10270.09510.19700.16700.19910.28970.15380.3369
0.10.50.50.12410.12730.13550.10020.08550.10080.11310.11860.11190.14700.11460.2295
0.60.18510.18470.20350.15490.12700.15440.17100.17350.17290.21640.16750.3148
0.70.27480.26560.30350.23770.18640.23520.25730.25150.26550.31660.24300.4281
0.80.40320.37620.44660.36000.26910.35350.38220.35900.40220.45690.34760.5725
0.90.58290.52310.64620.53660.38090.52300.55880.50330.59950.64830.48880.7493
0.30.60.10040.09560.12070.08080.05350.07230.09040.08780.09810.12980.08080.1884
0.70.15460.14080.18770.12740.08000.11380.14070.13020.15560.19690.12020.2698
0.80.24060.20830.29600.20260.11980.18100.22100.19380.24980.30260.17980.3930
0.90.37270.30570.46500.32010.17770.28610.34510.28660.39930.46340.26750.5732
0.50.80.11850.10560.15320.08620.04810.05770.10510.09540.10790.16340.08010.2225
0.90.18410.15660.23930.13520.07220.09100.16430.14260.17100.25170.12020.3353
0.10.50.60.12780.13070.13890.10210.08920.10280.11560.12140.11280.15020.12000.2328
0.60.18950.18910.20660.15800.13290.15600.17430.17710.17380.22010.17460.3180
0.70.28020.27130.30630.24270.19580.23610.26150.25650.26640.32080.25230.4312
0.80.41010.38370.44920.36780.28350.35340.38820.36620.40360.46160.35980.5753
0.90.59170.53290.64890.54860.40250.52130.56760.51360.60190.65360.50430.7510
0.30.60.10230.09840.11970.08330.05830.07110.09220.09030.09730.13000.08530.1879
0.70.15830.14520.18680.13240.08760.11260.14420.13450.15560.19830.12690.2695
0.80.24780.21560.29640.21230.13200.18040.22800.20140.25230.30630.19010.3933
0.90.38650.31800.46920.33810.19700.28730.35850.29950.40690.47210.28310.5745
0.50.80.12470.10880.15370.09360.05420.05800.11040.09960.11250.16770.08500.2203
0.90.19580.16270.24460.14840.08190.09320.17440.14990.18130.26160.12800.3366
Table A5. Powers of exact and asymptotic methods for K = 3 (unbalanced π conditions).
γ1 = γ3   γ2   π1   π2   π3   Exact methods: p_L^E  p_SC^E  p_W^E  p_L^M  p_SC^M  p_W^M  p_L^EM  p_SC^EM  p_W^EM   Asymptotic methods: p_L^A  p_SC^A  p_W^A
0.10.50.30.50.50.16680.16560.17690.14660.16180.13410.15230.13830.15650.16200.12780.2409
0.60.24000.23740.25200.21560.23180.18940.22080.20320.22630.23330.18320.3249
0.70.33940.33410.35260.31100.32540.26290.31510.29370.32170.33410.26070.4357
0.80.47200.46120.48440.44080.44690.35850.44310.41690.44930.47290.36620.5766
0.90.64590.62530.65410.61460.60080.48160.61520.58070.61670.65980.50640.7498
0.30.60.13100.13100.14290.11120.12020.08500.11410.10380.12070.13730.08920.1970
0.70.20270.20010.21610.17760.18190.12920.17990.16270.18790.20670.13090.2800
0.80.31100.30300.32420.28060.27220.19460.28210.25280.28990.31440.19310.4037
0.90.47070.45190.47920.43660.40010.28900.43740.38680.44020.47670.28310.5823
0.50.80.16360.15640.16920.14210.12280.07640.14400.12080.14620.17120.08530.2283
0.90.26900.25180.27030.23780.19580.12330.24060.19870.24000.26310.12660.3448
0.10.50.50.40.30.13990.15010.16290.11430.13820.11100.12360.12090.13350.15760.12520.2352
0.60.20890.22310.24060.17820.20560.16680.18750.18450.20130.22820.18060.3183
0.70.30680.32480.34710.27180.29920.24620.28030.27630.29780.32880.25880.4292
0.80.44280.46190.48680.40490.42490.35580.41250.40560.43120.46860.36650.5713
0.90.62870.64030.66240.58990.58860.50420.59800.58360.61090.65830.51070.7468
0.30.60.10970.11940.13090.09270.10150.07470.09670.09430.10490.13370.08890.1911
0.70.17680.18820.20360.15330.15940.11940.15850.15350.16820.20270.13140.2729
0.80.28450.29470.31370.25170.24810.18950.25930.24850.26850.31130.19540.3966
0.90.45270.45320.47340.40620.37950.29590.41890.39610.42180.47710.28910.5771
0.50.80.15420.15320.16330.12680.10940.07320.13480.12270.13440.17090.08720.2225
0.90.26420.25400.26680.21820.18040.12320.23260.20920.22690.26600.13070.3400
0.10.50.60.40.30.14780.15520.16780.12100.14530.11630.13070.12550.13910.16160.12920.2398
0.60.21840.22830.24530.18650.21410.17210.19610.18940.20760.23300.18530.3234
0.70.31700.32940.35040.28130.30850.25000.28960.28070.30410.33420.26420.4344
0.80.45170.46450.48760.41500.43380.35600.42120.40780.43640.47400.37220.5760
0.90.63250.63980.66020.59920.59500.49750.60400.58080.61300.66240.51610.7501
0.30.60.11400.12250.13370.09710.10650.07710.10080.09690.10790.13580.09140.1931
0.70.18160.19180.20660.15950.16590.12170.16390.15620.17180.20570.13470.2757
0.80.28840.29820.31610.25980.25620.19060.26540.25020.27190.31510.19980.4000
0.90.45270.45550.47410.41600.38850.29400.42450.39480.42360.48170.29470.5806
0.50.80.15410.15560.16480.13130.11420.07390.13740.12340.13490.17370.08930.2241
0.90.26180.25680.26810.22480.18730.12350.23600.20890.22660.27040.13350.3431

Figure 1. The range of γk.
Figure 2. The surfaces of empirical type I error rates for asymptotic tests under K = 2.
Figure 3. Empirical power curves of asymptotic tests for K = 2, 3, 4.
Figure 4. The surfaces and curves of type I error rates for exact methods under K = 2.
Figure 5. Power curves of exact methods for K = 2, 3.
Table 1. A 2 × 2 original table.
Rater B       Rater A: +       Rater A: −        Total
+             n11 (P11)        n12 (P12)         n1+ (PB)
−             n21 (P21)        n22 (P22)         n2+ (1 − PB)
Total         n+1 (PA)         n+2 (1 − PA)      n
Table 2. A 2 × 2 table under PA = PB.
Category   Ratings             Frequency    Probability
1          (+, +)              n11          P11
2          (+, −) or (−, +)    n12 + n21    2P12
3          (−, −)              n22          P22
Total                          n            1
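The collapsed layout of Table 2 is all that is needed to compute the sample AC1 within a stratum. As a minimal illustration (a reconstruction assuming the standard binary-outcome form of Gwet's estimator with a common marginal probability, not the authors' own code), the estimate can be obtained as follows:

```python
def ac1_estimate(n1, n2, n3):
    """Sample AC1 from the collapsed counts of Table 2:
    n1 = #(+, +), n2 = #[(+, -) or (-, +)], n3 = #(-, -).
    Assumes the binary form of Gwet's AC1 with equal marginal probabilities."""
    n = n1 + n2 + n3
    pa = (n1 + n3) / n            # observed agreement
    pi = (2 * n1 + n2) / (2 * n)  # estimated probability of a "+" rating
    pe = 2 * pi * (1 - pi)        # chance-agreement term of AC1
    return (pa - pe) / (1 - pe)

# Example: the MZ stratum of Table 3 below gives roughly 0.46.
print(round(ac1_estimate(19, 14, 19), 3))
```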
Table 3. Agreement of alcohol drinking status between male twins stratified by zygosity.
Alcohol Drinking    Zygosity
                    MZ    DZ
Both                19     8
One                 14    16
Neither             19     7
Total               52    31
Note: Both: (twin 1, twin 2) = (drinker, drinker); One: (twin 1, twin 2) = (drinker, nondrinker) or (nondrinker, drinker); Neither: (twin 1, twin 2) = (nondrinker, nondrinker).
Table 4. Agreement between ELISA and gold-standard methods stratified by antibody type.
Number of Agreement    Antibody Type
                       IgG    IgM
Both                     9      7
One                      3      7
Neither                  5      3
Total                   17     17
Note: Both: (ELISA, gold-standard) = (+, +); One: (ELISA, gold-standard) = (+, −) or (−, +); Neither: (ELISA, gold-standard) = (−, −).
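Applying the same estimator stratum by stratum to these two data sets gives the quantities whose homogeneity is tested later (Tables 12 and 13). The sketch below repeats the hedged AC1 formula from the previous example and evaluates it on the counts of Tables 3 and 4; the printed values are those implied by this reconstruction, not figures quoted from the paper.

```python
def ac1_estimate(n1, n2, n3):
    # Binary-outcome AC1 from collapsed counts, as in the sketch after Table 2.
    n = n1 + n2 + n3
    pa = (n1 + n3) / n
    pi = (2 * n1 + n2) / (2 * n)
    pe = 2 * pi * (1 - pi)
    return (pa - pe) / (1 - pe)

strata = {
    "MZ (Table 3)":  (19, 14, 19),
    "DZ (Table 3)":  (8, 16, 7),
    "IgG (Table 4)": (9, 3, 5),
    "IgM (Table 4)": (7, 7, 3),
}
for name, counts in strata.items():
    print(name, round(ac1_estimate(*counts), 3))
# Roughly: MZ 0.46, DZ -0.03, IgG 0.67, IgM 0.22 under this reconstruction.
```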
Table 5. Frequencies and probabilities of ratings in K strata.
Category   Ratings             Frequency of Subjects                                                      Total
                               Stratum 1            Stratum 2            …   Stratum K
1          (+, +)              n11 (P11(γ1, π1))    n12 (P12(γ2, π2))    …   n1K (P1K(γK, πK))             S1
2          (+, −) or (−, +)    n21 (P21(γ1, π1))    n22 (P22(γ2, π2))    …   n2K (P2K(γK, πK))             S2
3          (−, −)              n31 (P31(γ1, π1))    n32 (P32(γ2, π2))    …   n3K (P3K(γK, πK))             S3
Total                          n1                   n2                   …   nK                            N
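The simulation results reported below (Tables 7–11 and Tables A3–A5) are generated under configurations of (γk, πk) as laid out in Table 5. One plausible way to turn such a configuration into trinomial cell probabilities and to draw a stratum's frequencies is sketched here; it assumes the binary AC1 relation pa = γ + (1 − γ)·2π(1 − π) with equal marginal probabilities, which is an illustrative parameterization rather than a statement of the authors' exact data-generating model.

```python
import numpy as np

def cell_probs(gamma, pi):
    """Map (gamma_k, pi_k) to cell probabilities (P1k, P2k, P3k) of Table 5.
    Illustrative assumption: pa = gamma + (1 - gamma) * 2 * pi * (1 - pi),
    with equal marginal probabilities pi for both raters."""
    pa = gamma + (1 - gamma) * 2 * pi * (1 - pi)
    p2 = 1 - pa             # discordant ratings (+, -) or (-, +)
    p1 = pi - p2 / 2        # concordant (+, +)
    p3 = 1 - pi - p2 / 2    # concordant (-, -)
    if min(p1, p2, p3) < 0:
        raise ValueError("(gamma, pi) outside the admissible range of gamma_k (cf. Figure 1)")
    return p1, p2, p3

def sample_stratum(n_k, gamma_k, pi_k, rng):
    """Draw the stratum frequencies (n1k, n2k, n3k) from a trinomial distribution."""
    return rng.multinomial(n_k, cell_probs(gamma_k, pi_k))

rng = np.random.default_rng(2023)
print(sample_stratum(50, 0.5, 0.3, rng))
```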
Table 6. All the possible tables for K = 2 and n1 = n2 = 2.
Stratum 1 (n1 = 2)   n11   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
                     n21   0 0 0 0 0 0 1 1 1 1 1 1 2 2 2 2 2 2 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0
                     n31   2 2 2 2 2 2 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
Stratum 2 (n2 = 2)   n12   0 0 0 1 1 2 0 0 0 1 1 2 0 0 0 1 1 2 0 0 0 1 1 2 0 0 0 1 1 2 0 0 0 1 1 2
                     n22   0 1 2 0 1 0 0 1 2 0 1 0 0 1 2 0 1 0 0 1 2 0 1 0 0 1 2 0 1 0 0 1 2 0 1 0
                     n32   2 1 0 1 0 0 2 1 0 1 0 0 2 1 0 1 0 0 2 1 0 1 0 0 2 1 0 1 0 0 2 1 0 1 0 0
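The exact E, M, and E + M approaches evaluate a test statistic over every table with the observed stratum sizes, so the enumeration illustrated in Table 6 is the basic ingredient. A minimal sketch of generating that sample space for arbitrary K and stratum sizes is shown below (the helper names are ours, chosen for illustration):

```python
from itertools import product

def stratum_tables(n_k):
    # All (n1k, n2k, n3k) with n1k + n2k + n3k = n_k for a single stratum.
    return [(a, b, n_k - a - b) for a in range(n_k + 1) for b in range(n_k + 1 - a)]

def all_tables(sizes):
    # Cartesian product across strata; for sizes (2, 2) this yields the
    # 36 tables listed in Table 6 (possibly in a different order).
    return list(product(*(stratum_tables(n) for n in sizes)))

tables = all_tables((2, 2))
print(len(tables))  # 36
print(tables[0])    # ((0, 0, 2), (0, 0, 2))
```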
Table 7. Empirical type I error rates of asymptotic statistics for K = 2 (balanced sample sizes).
Balanced π conditions: n1 = n2   γ   π1 = π2   T_L   T_SC   T_W   |   Unbalanced π conditions: n1 = n2   γ   π1   π2   T_L   T_SC   T_W
100.10.3 0.0536 0.0456 0.0789 100.10.30.5 0.0526 0.0425 0.0787
0.3 0.0562 0.0483 0.07530.3 0.0506 0.0403 0.0702
0.5 0.0434 0.0363 0.0540 0.5 0.0445 0.0326 0.0541
0.70.02330.01860.02280.70.01920.01290.0225
0.90.00180.00160.00180.90.00140.00070.0013
0.10.6 0.0522 0.0452 0.07640.10.60.4 0.0521 0.0431 0.0750
0.3 0.0515 0.03970.06970.3 0.0490 0.03620.0645
0.50.03840.0264 0.0461 0.5 0.0440 0.0271 0.0513
0.70.01890.01100.02270.70.02150.01270.0250
0.90.00050.00020.00060.90.00090.00050.0013
0.10.5 0.0483 0.03860.07270.10.50.4 0.0504 0.0422 0.0770
0.3 0.0513 0.03350.06530.3 0.0504 0.03420.0670
0.5 0.0424 0.0238 0.0496 0.5 0.0453 0.0276 0.0530
0.70.01610.00800.01940.70.01400.00710.0173
0.90.00060.00030.00120.90.00110.00050.0013
500.10.3 0.0496 0.0480 0.0558 500.10.30.5 0.0539 0.0526 0.0601
0.3 0.0525 0.0515 0.0554 0.3 0.0544 0.0528 0.0576
0.5 0.0524 0.0512 0.0542 0.5 0.0547 0.0537 0.0565
0.7 0.0531 0.0493 0.0500 0.7 0.0529 0.0489 0.0515
0.90.03940.03090.02800.90.03300.02190.0221
0.10.6 0.0511 0.0504 0.0567 0.10.60.4 0.0498 0.0495 0.0549
0.3 0.0480 0.0470 0.0521 0.3 0.0539 0.0525 0.0572
0.5 0.0512 0.0490 0.0533 0.5 0.0518 0.0503 0.0544
0.7 0.0552 0.0515 0.0545 0.7 0.0550 0.0505 0.0539
0.90.03660.02690.03100.90.03750.02970.0330
0.10.5 0.0493 0.0482 0.0517 0.10.50.4 0.0500 0.0491 0.0539
0.3 0.0472 0.0464 0.0508 0.3 0.0516 0.0500 0.0540
0.5 0.0506 0.0490 0.0532 0.5 0.0506 0.0487 0.0531
0.7 0.0516 0.0471 0.0519 0.7 0.0535 0.0485 0.0531
0.90.03260.02560.03100.90.03450.02480.0291
1000.10.3 0.0493 0.0484 0.0520 1000.10.30.5 0.0514 0.0508 0.0545
0.3 0.0494 0.0492 0.0506 0.3 0.0473 0.0466 0.0490
0.5 0.0516 0.0512 0.0524 0.5 0.0507 0.0499 0.0520
0.7 0.0489 0.0476 0.0478 0.7 0.0527 0.0504 0.0523
0.9 0.0547 0.0463 0.0440 0.9 0.0564 0.0493 0.0503
0.10.6 0.0521 0.0518 0.0541 0.10.60.4 0.0483 0.0482 0.0514
0.3 0.0493 0.0486 0.0518 0.3 0.0499 0.0495 0.0514
0.5 0.0522 0.0514 0.0528 0.5 0.0509 0.0498 0.0520
0.7 0.0530 0.0505 0.0522 0.7 0.0517 0.0496 0.0514
0.9 0.0576 0.0463 0.0476 0.9 0.0568 0.0455 0.0466
0.10.5 0.0473 0.0472 0.0489 0.10.50.4 0.0512 0.0510 0.0536
0.3 0.0463 0.0461 0.0475 0.3 0.0552 0.0544 0.0568
0.5 0.0523 0.0513 0.0533 0.5 0.0513 0.0503 0.0520
0.7 0.0510 0.0489 0.0508 0.7 0.0568 0.0539 0.0568
0.9 0.0579 0.0407 0.0457 0.9 0.0583 0.0478 0.0494
Note: Type I error rates within the robust region (0.04–0.06) are shown in bold.
Table 8. Running times (seconds) for γ = 0.1 and π = 0.5.
Value     n = 2    n = 3    n = 4    n = 5    n = 6    n = 7    n = 8     n = 9     n = 10    n = 11     K = 2    K = 3    K = 4      K = 5
p_L^E     0.0088   0.0278   0.0540   0.1416   0.3046   0.6356   1.1910    2.2842    3.9241    7.6825     0.0278   1.0149   33.7988    3743.0157
p_SC^E    0.0115   0.0324   0.0692   0.1778   0.3651   0.7224   1.2988    2.4110    4.3842    8.4029     0.0324   0.9989   37.1695    3814.3481
p_W^E     0.0125   0.0364   0.0837   0.2092   0.4009   0.8020   1.4019    2.5217    4.5078    8.4676     0.0364   0.8794   39.2842    3797.6383
p_L^M     0.2946   0.4833   0.8561   1.7686   4.1932   8.1121   11.5237   16.2005   22.9323   34.0228    0.4833   5.9487   86.4241    1584.3279
p_SC^M    0.2867   0.5865   0.9794   2.1689   4.5669   7.8500   11.4646   17.1822   23.6388   42.0095    0.5865   5.3177   58.3290    1237.8588
p_W^M     0.2608   0.5477   0.9320   2.2620   4.1170   6.6121   10.8523   14.0822   24.4280   37.1586    0.5477   6.4484   109.0489   990.4258
p_L^EM    0.3457   0.7130   1.1282   2.3653   4.3037   8.1787   13.6343   18.7686   30.1321   49.2284    0.7130   7.9202   159.2139   5696.1438
p_SC^EM   0.3446   0.6777   1.0412   2.2872   4.3691   8.0231   12.8295   20.0811   30.1819   45.1356    0.6777   6.2942   131.3634   4930.5572
p_W^EM    0.3745   0.7553   1.1046   2.6653   4.6391   8.3511   12.5336   17.4634   29.6501   47.6976    0.7553   6.9710   186.0982   6903.8451
Table 9. The values of L_L in different settings for K = 2.
n21 + n22   γ = −0.9, −0.8, −0.7, −0.6, −0.5, −0.4, −0.3, −0.2, −0.1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
21.31 ×  10 29 3.09 ×  10 24 4.07 ×  10 21 6.40 ×  10 19 3.12 ×  10 17 7.24 ×  10 16 1.00 ×  10 14 9.44 ×  10 14 6.61 ×  10 13 3.64 ×  10 12 1.64 ×  10 11 6.20 ×  10 11 2.00 ×  10 10 5.59 ×  10 10 1.34 ×  10 09 2.75 ×  10 09 4.60 ×  10 09 5.73 ×  10 09 3.79 ×  10 09
34.99 ×  10 28 5.56 ×  10 23 4.62 ×  10 20 5.12 ×  10 18 1.87 ×  10 16 3.38 ×  10 15 3.72 ×  10 14 2.83 ×  10 13 1.62 ×  10 12 7.28 ×  10 12 2.68 ×  10 11 8.27 ×  10 11 2.16 ×  10 10 4.79 ×  10 10 8.96 ×  10 10 1.37 ×  10 09 1.63 ×  10 09 1.27 ×  10 09 3.99 ×  10 10
41.90 ×  10 26 1.00 ×  10 21 5.23 ×  10 19 4.10 ×  10 17 1.12 ×  10 15 1.58 ×  10 14 1.38 ×  10 13 8.49 ×  10 13 3.95 ×  10 12 1.46 ×  10 11 4.39 ×  10 11 1.10 ×  10 10 2.32 ×  10 10 4.11 ×  10 10 5.97 ×  10 10 6.87 ×  10 10 5.74 ×  10 10 2.83 ×  10 10 4.20 ×  10 11
57.21 ×  10 25 1.80 ×  10 20 5.93 ×  10 18 3.28 ×  10 16 6.74 ×  10 15 7.36 ×  10 14 5.13 ×  10 13 2.55 ×  10 12 9.65 ×  10 12 2.91 ×  10 11 7.18 ×  10 11 1.47 ×  10 10 2.50 ×  10 10 3.52 ×  10 10 3.98 ×  10 10 3.44 ×  10 10 2.02 ×  10 10 6.28 ×  10 11 4.42 ×  10 12
62.74 ×  10 23 3.24 ×  10 19 6.72 ×  10 17 2.62 ×  10 15 4.05 ×  10 14 3.43 ×  10 13 1.91 ×  10 12 7.64 ×  10 12 2.36 ×  10 11 5.82 ×  10 11 1.17 ×  10 10 1.96 ×  10 10 2.70 ×  10 10 3.02 ×  10 10 2.66 ×  10 10 1.72 ×  10 10 7.14 ×  10 11 1.40 ×  10 11 4.65 ×  10 13
71.04 ×  10 21 5.84 ×  10 18 7.62 ×  10 16 2.10 ×  10 14 2.43 ×  10 13 1.60 ×  10 12 7.08 ×  10 12 2.29 ×  10 11 5.77 ×  10 11 1.16 ×  10 10 1.92 ×  10 10 2.61 ×  10 10 2.90 ×  10 10 2.59 ×  10 10 1.77 ×  10 10 8.59 ×  10 11 2.52 ×  10 11 3.10 ×  10 12 4.90 ×  10 14
83.95 ×  10 20 1.05 ×  10 16 8.63 ×  10 15 1.68 ×  10 13 1.46 ×  10 12 7.48 ×  10 12 2.63 ×  10 11 6.88 ×  10 11 1.41 ×  10 10 2.33 ×  10 10 3.15 ×  10 10 3.48 ×  10 10 3.13 ×  10 10 2.22 ×  10 10 1.18 ×  10 10 4.30 ×  10 11 8.90 ×  10 12 6.90 ×  10 13 5.15 ×  10 15
91.50 ×  10 18 1.89 ×  10 15 9.78 ×  10 14 1.34 ×  10 12 8.74 ×  10 12 3.49 ×  10 11 9.76 ×  10 11 2.06 ×  10 10 3.45 ×  10 10 4.66 ×  10 10 5.15 ×  10 10 4.64 ×  10 10 3.37 ×  10 10 1.90 ×  10 10 7.87 ×  10 11 2.15 ×  10 11 3.14 ×  10 12 1.53 ×  10 13 5.42 ×  10 16
105.71 ×  10 17 3.41 ×  10 14 1.11 ×  10 12 1.07 ×  10 11 5.24 ×  10 11 1.63 ×  10 10 3.63 ×  10 10 6.19 ×  10 10 8.42 ×  10 10 9.31 ×  10 10 8.42 ×  10 10 6.19 ×  10 10 3.63 ×  10 10 1.63 ×  10 10 5.24 ×  10 11 1.07 ×  10 11 1.11 ×  10 12 3.41 ×  10 14 5.71 ×  10 17
112.17 ×  10 15 6.13 ×  10 13 1.26 ×  10 11 8.59 ×  10 11 3.15 ×  10 10 7.60 ×  10 10 1.35 ×  10 09 1.86 ×  10 09 2.06 ×  10 09 1.86 ×  10 09 1.38 ×  10 09 8.26 ×  10 10 3.91 ×  10 10 1.40 ×  10 10 3.50 ×  10 11 5.37 ×  10 12 3.91 ×  10 13 7.57 ×  10 15 6.01 ×  10 18
128.25 ×  10 14 1.10 ×  10 11 1.42 ×  10 10 6.87 ×  10 10 1.89 ×  10 09 3.55 ×  10 09 5.00 ×  10 09 5.57 ×  10 09 5.03 ×  10 09 3.73 ×  10 09 2.26 ×  10 09 1.10 ×  10 09 4.21 ×  10 10 1.20 ×  10 10 2.33 ×  10 11 2.68 ×  10 12 1.38 ×  10 13 1.68 ×  10 15 6.33 ×  10 19
133.13 ×  10 12 1.99 ×  10 10 1.61 ×  10 09 5.50 ×  10 09 1.13 ×  10 08 1.66 ×  10 08 1.86 ×  10 08 1.67 ×  10 08 1.23 ×  10 08 7.45 ×  10 09 3.69 ×  10 09 1.47 ×  10 09 4.53 ×  10 10 1.03 ×  10 10 1.55 ×  10 11 1.34 ×  10 12 4.87 ×  10 14 3.74 ×  10 16 6.66 ×  10 20
141.19 ×  10 10 3.57 ×  10 09 1.83 ×  10 08 4.40 ×  10 08 6.80 ×  10 08 7.73 ×  10 08 6.90 ×  10 08 5.02 ×  10 08 3.01 ×  10 08 1.49 ×  10 08 6.04 ×  10 09 1.96 ×  10 09 4.88 ×  10 10 8.79 ×  10 11 1.04 ×  10 11 6.71 ×  10 13 1.72 ×  10 14 8.30 ×  10 17 7.01 ×  10 21
154.52 ×  10 09 6.43 ×  10 08 2.07 ×  10 07 3.52 ×  10 07 4.08 ×  10 07 3.61 ×  10 07 2.56 ×  10 07 1.50 ×  10 07 7.35 ×  10 08 2.98 ×  10 08 9.88 ×  10 09 2.61 ×  10 09 5.25 ×  10 10 7.54 ×  10 11 6.91 ×  10 12 3.36 ×  10 13 6.07 ×  10 15 1.85 ×  10 17 7.38 ×  10 22
161.72 ×  10 07 1.16 ×  10 06 2.35 ×  10 06 2.81 ×  10 06 2.45 ×  10 06 1.68 ×  10 06 9.52 ×  10 07 4.51 ×  10 07 1.80 ×  10 07 5.96 ×  10 08 1.62 ×  10 08 3.48 ×  10 09 5.66 ×  10 10 6.46 ×  10 11 4.60 ×  10 12 1.68 ×  10 13 2.14 ×  10 15 4.10 ×  10 18 7.77 ×  10 23
176.53 ×  10 06 2.08 ×  10 05 2.66 ×  10 05 2.25 ×  10 05 1.47 ×  10 05 7.85 ×  10 06 3.54 ×  10 06 1.35 ×  10 06 4.39 ×  10 07 1.19 ×  10 07 2.65 ×  10 08 4.64 ×  10 09 6.09 ×  10 10 5.54 ×  10 11 3.07 ×  10 12 8.39 ×  10 14 7.56 ×  10 16 9.11 ×  10 19 8.18 ×  10 24
Table 10. Empirical type I error rates of exact and asymptotic methods for K = 2.
n1 = n2   γ   π1 = π2   Exact methods: p_L^E  p_SC^E  p_W^E  p_L^M  p_SC^M  p_W^M  p_L^EM  p_SC^EM  p_W^EM   Asymptotic methods: p_L^A  p_SC^A  p_W^A
100.10.3 0.0485 0.0497 0.0514 0.0366 0.0483 0.0404 0.0413 0.0327 0.0419 0.0569 0.0506 0.0855
0.3 0.0482 0.0511 0.0528 0.0358 0.0491 0.0334 0.0405 0.0351 0.0425 0.0584 0.0514 0.0752
0.5 0.0528 0.0544 0.0586 0.0428 0.0478 0.0267 0.0455 0.0394 0.0464 0.0686 0.0502 0.0632
0.7 0.0580 0.0523 0.0546 0.0435 0.03540.0156 0.0481 0.0365 0.0414 0.07650.0383 0.0427
0.8 0.0434 0.03710.03660.02900.02040.00730.03410.02430.0266 0.0582 0.02240.0256
0.90.01490.01250.01060.00790.00480.00110.01050.00710.00720.02030.00550.0069
0.10.5 0.0404 0.0425 0.0457 0.03400.03710.03540.03580.03320.0381 0.0485 0.03830.0749
0.3 0.0445 0.0481 0.0559 0.03630.03790.03410.03830.0357 0.0431 0.06000.03890.0726
0.5 0.0553 0.0550 0.06490.03990.03590.0301 0.0449 0.0371 0.0476 0.07510.03800.0694
0.7 0.0549 0.0473 0.0542 0.03160.02220.0170 0.0404 0.02820.03720.07160.0243 0.0525
0.80.03580.02920.03310.01770.01060.00720.02460.01570.0212 0.0469 0.01160.0314
0.90.00990.00780.00860.00400.00190.00100.00620.00360.00500.01320.00210.0080
0.10.6 0.0433 0.0451 0.0466 0.0334 0.0404 0.03660.03670.03270.0385 0.0506 0.0423 0.0783
0.3 0.0458 0.0494 0.0537 0.0365 0.0414 0.03420.03870.0362 0.0427 0.0584 0.0426 0.0732
0.5 0.0542 0.0556 0.0618 0.0420 0.03980.0295 0.0449 0.0387 0.0471 0.0727 0.0420 0.0671
0.7 0.0547 0.0495 0.0528 0.03620.02610.0168 0.0418 0.03160.03790.07270.0285 0.0497
0.80.03700.03190.03300.02140.01330.00740.02620.01860.0222 0.0496 0.01460.0300
0.90.01090.00920.00880.00510.00260.00100.00680.00460.00540.01490.00290.0078
250.10.3 0.0493 0.0493 0.0493 0.03580.03580.0358 0.0409 0.0416 0.0393 0.0527 0.0516 0.0516
0.3 0.0496 0.0496 0.0496 0.03440.03440.0344 0.0417 0.0430 0.0399 0.0523 0.0502 0.0501
0.5 0.0494 0.0494 0.0494 0.03530.03530.0353 0.0408 0.0424 0.0395 0.0546 0.0514 0.0507
0.7 0.0484 0.0483 0.0483 0.04000.04000.0400 0.0408 0.0442 0.0407 0.0599 0.0511 0.0477
0.8 0.0562 0.0556 0.0556 0.0478 0.0478 0.0478 0.0483 0.0488 0.0440 0.0726 0.0481 0.0431
0.9 0.0503 0.0497 0.0497 0.03700.03700.0370 0.0411 0.03780.02770.07020.02590.0207
0.10.5 0.0495 0.0495 0.0495 0.03030.03030.03030.03890.03980.0358 0.0502 0.0477 0.0477
0.3 0.0484 0.0484 0.0484 0.03120.03120.03120.0399 0.0412 0.0376 0.0496 0.0477 0.0477
0.5 0.0486 0.0486 0.0486 0.03460.03460.03460.0392 0.0409 0.0386 0.0513 0.0492 0.0491
0.7 0.0521 0.0521 0.0521 0.0401 0.0401 0.0401 0.0419 0.0451 0.0437 0.0612 0.0507 0.0507
0.8 0.0593 0.0593 0.0593 0.0431 0.0431 0.0431 0.0467 0.0482 0.0487 0.0795 0.0457 0.0455
0.9 0.0414 0.0414 0.0414 0.02230.02230.02230.02800.02760.02950.06900.01880.0186
0.10.6 0.0493 0.0493 0.0493 0.03250.03250.0325 0.0404 0.0417 0.0386 0.0505 0.0481 0.0481
0.3 0.0492 0.0492 0.0492 0.03280.03280.0328 0.0404 0.0417 0.0388 0.0509 0.0486 0.0486
0.5 0.0486 0.0486 0.0486 0.03480.03480.03480.0399 0.0416 0.0392 0.0525 0.0498 0.0497
0.7 0.0503 0.0502 0.0502 0.0408 0.0408 0.0408 0.0425 0.0448 0.0428 0.0607 0.0502 0.0496
0.8 0.0584 0.0581 0.0581 0.0463 0.0463 0.0463 0.0495 0.0495 0.0473 0.0769 0.0454 0.0443
0.9 0.0438 0.0436 0.0436 0.02800.02800.02800.03390.03240.02900.06850.02010.0187
Note: Type I error rates within the robust region (0.04–0.06) are shown in bold.
Table 11. Powers of exact and asymptotic methods for K = 2 (balanced π conditions).
γ1   γ2   π1 = π2   Exact methods: p_L^E  p_SC^E  p_W^E  p_L^M  p_SC^M  p_W^M  p_L^EM  p_SC^EM  p_W^EM   Asymptotic methods: p_L^A  p_SC^A  p_W^A
0.10.50.30.14030.14370.15490.12680.13680.12090.13230.13250.13340.17190.15930.1929
0.60.21080.21300.22810.19450.20410.17470.20060.20030.19960.24720.23040.2641
0.70.31140.31060.33170.29240.29900.24890.29790.29560.29430.35180.32760.3572
0.80.45090.44390.47230.43010.42830.34780.43360.42530.42650.49320.45560.4772
0.90.64080.62170.65700.61920.59910.47610.62010.59680.60760.68120.61800.6308
0.30.60.11490.11870.12840.10190.10720.07810.10480.10420.10690.13890.12170.1378
0.70.18520.18690.20160.16830.16980.12080.17090.16710.17160.21730.18630.2040
0.80.29760.29370.31510.27640.26710.18590.27790.26550.27500.34090.28400.3029
0.90.47270.45550.48460.44700.41320.28240.44740.41420.43570.53070.42610.4461
0.50.80.16140.15700.16800.14560.12900.07030.14630.13330.14010.20000.13720.1444
0.90.28180.26560.28280.25680.21760.11810.25810.22820.24230.34530.22550.2342
0.10.50.50.14420.15330.16840.11860.13310.11420.12730.13100.13670.16710.14630.1852
0.60.21370.22650.24650.18300.19860.17110.19170.19670.20520.24550.21440.2647
0.70.31150.32770.35210.27610.29040.25110.28430.28990.30210.35320.30780.3696
0.80.44620.46310.48940.40710.41480.36070.41510.41860.43500.49690.43220.5015
0.90.62880.63820.66110.58760.57860.50760.59710.59220.61260.68350.59270.6587
0.30.60.11240.12100.13400.09460.09690.07770.09910.09990.10760.13830.10580.1413
0.70.17970.18950.20650.15470.15330.12330.16090.16030.17170.21670.16430.2113
0.80.28610.29420.31470.25070.24000.19360.26020.25540.27180.33760.25290.3119
0.90.45010.44830.47000.39960.36900.29920.41530.40020.42290.51810.38230.4488
0.50.80.15350.15080.16270.12320.10380.07540.13340.12490.13610.19500.11130.1476
0.90.25890.24710.26230.20890.17170.12520.22670.21000.22660.32190.18130.2295
0.10.50.60.14780.15490.16780.12100.13780.11630.13070.13410.14130.16790.15310.1878
0.60.21830.22790.24530.18650.20480.17210.19600.20060.21090.24580.22270.2650
0.70.31690.32870.35040.28130.29830.25000.28960.29420.30880.35290.31770.3672
0.80.45160.46330.48760.41500.42440.35600.42120.42230.44250.49590.44300.4970
0.90.63230.63760.66020.59920.58910.49750.60400.59290.61970.68270.60290.6544
0.30.60.11390.12180.13370.09710.10140.07710.10080.10160.11140.13730.11190.1407
0.70.18140.19050.20660.15950.15990.12170.16380.16260.17700.21580.17260.2101
0.80.28810.29590.31610.25980.25000.19060.26530.25820.27920.33770.26420.3108
0.90.45210.45150.47410.41600.38390.29400.42430.40330.43290.52120.39730.4495
0.50.80.15360.15290.16480.13130.11160.07390.13730.12750.14100.19600.11990.1468
0.90.26080.25210.26810.22480.18530.12350.23570.21500.23570.32810.19520.2305
Table 12. Values of test statistics and p-values.
Value              T_L      T_SC     T_W
Statistic value    5.0377   5.0762   5.1107
p-value            0.0248   0.0243   0.0238
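The p-values in Table 12 are consistent with referring each statistic to a chi-square distribution with K − 1 = 1 degree of freedom (two strata in this example). The following short check reproduces them under that assumption (not the authors' code):

```python
from scipy.stats import chi2

# Refer each homogeneity statistic to chi-square with K - 1 = 1 degree of freedom.
for name, stat in [("T_L", 5.0377), ("T_SC", 5.0762), ("T_W", 5.1107)]:
    print(name, round(chi2.sf(stat, df=1), 4))
# Prints approximately 0.0248, 0.0243, 0.0238, matching Table 12.
```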
Table 13. Comparison of asymptotic and exact p-values.
Method      A approach                     E approach                     M approach                     E + M approach
            p_L^A    p_SC^A   p_W^A       p_L^E    p_SC^E   p_W^E       p_L^M    p_SC^M   p_W^M       p_L^EM   p_SC^EM  p_W^EM
p-value     0.1558   0.1607   0.1492      0.1953   0.1952   0.0854      0.2194   0.2076   0.2039      0.1989   0.1999   0.2127