# Four-Parameter Guessing Model and Related Item Response Models


## Abstract


## 1. Introduction

## 2. Item Response Models

#### 2.1. Two-Parameter Model (2PL)
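As a computational illustration (not part of the article's own code), the standard 2PL item response function $P(X_i = 1 \mid \theta) = \Psi(a_i(\theta - b_i))$ with the logistic function $\Psi$ can be sketched in Python; the helper name `irf_2pl` is illustrative:

```python
import math

def irf_2pl(theta, a, b):
    """2PL item response function: P(X=1 | theta) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

At $\theta = b_i$ the correct-response probability equals 0.5, and the discrimination $a_i$ controls the slope of the curve at that point.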

#### 2.2. Three-Parameter Model (3PL)
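The 3PL adds a lower asymptote $c_i$ (the pseudo-guessing parameter) to the 2PL curve, $P(X_i = 1 \mid \theta) = c_i + (1 - c_i)\Psi(a_i(\theta - b_i))$; a minimal sketch with an illustrative helper name:

```python
import math

def irf_3pl(theta, a, b, c):
    """3PL: lower asymptote c (pseudo-guessing) added to the 2PL curve."""
    p2pl = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (1.0 - c) * p2pl
```

Even examinees with very low $\theta$ answer correctly with probability approaching $c_i$, which is the usual motivation for the parameter with multiple-choice items.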

#### 2.3. Four-Parameter Model (4PL)
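The 4PL additionally bounds the curve from above by $d_i$ (one minus the slipping probability), $P(X_i = 1 \mid \theta) = c_i + (d_i - c_i)\Psi(a_i(\theta - b_i))$; a minimal sketch:

```python
import math

def irf_4pl(theta, a, b, c, d):
    """4PL: lower asymptote c (pseudo-guessing) and upper asymptote d (1 - slipping)."""
    p2pl = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (d - c) * p2pl
```

At $\theta = b_i$ the probability equals $(c_i + d_i)/2$, and high-ability examinees can still fail the item with probability about $1 - d_i$.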

#### 2.4. Four-Parameter Guessing Model (4PGL)
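One reading of the 4PGL, consistent with the footnote of Table 1 ($g_i$ = probability of guessers, constant guessing probability $\pi_i$), treats the population as a mixture of guessers, who solve the item with probability $\pi_i$, and non-guessers, who follow the 2PL. The following sketch rests on that assumed form and is not a verbatim reproduction of the paper's equations:

```python
import math

def irf_4pgl(theta, a, b, g, pi):
    """4PGL (assumed mixture form): a guessing class of proportion g succeeds with
    probability pi; the remaining proportion 1 - g follows the 2PL."""
    p2pl = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return g * pi + (1.0 - g) * p2pl
```

In the simulation design of Table 1, $\pi_i$ is fixed at 0.25 for the four-option MC items, so only $g_i$ is estimated per item.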

#### 2.5. Reparametrized Four-Parameter Model (R4PL)
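The R4PL can be read as the 4PL with its asymptotes reparametrized in terms of a guesser proportion $g_i$ and a guesser success probability $\pi_i$ via $c_i = g_i\pi_i$ and $d_i = 1 - g_i(1 - \pi_i)$, with $\pi_i$ now estimated rather than fixed; this correspondence is stated here as an assumption for illustration. Algebraically, the resulting curve equals $g_i\pi_i + (1 - g_i)\Psi(a_i(\theta - b_i))$, which the sketch checks numerically:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def irf_4pl(theta, a, b, c, d):
    # 4PL with lower asymptote c and upper asymptote d
    return c + (d - c) * logistic(a * (theta - b))

def irf_r4pl(theta, a, b, g, pi):
    """R4PL (assumed reparametrization): c = g*pi, d = 1 - g*(1 - pi)."""
    return irf_4pl(theta, a, b, g * pi, 1.0 - g * (1.0 - pi))
```

For example, $g = 0.2$ and $\pi = 0.3$ imply the asymptotes $c = 0.06$ and $d = 0.86$.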

#### 2.6. Three-Parameter Model with Residual Heterogeneity (3PLRH)

#### 2.7. Summary

## 3. Simulation Study

#### 3.1. Method

Model estimation was carried out with the `xxirt()` function in the R package sirt [68]. In each of the four cells of the simulation (i.e., the four factor levels of the sample size N), 1500 replications were conducted.
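The models themselves were estimated with the `xxirt()` function of the R package sirt. As a language-neutral illustration of the data-generating design in Table 1 (CR items follow the 2PL; MC items follow an assumed mixture form of the 4PGL with $\pi_i = 0.25$), a response matrix can be simulated as follows; function and variable names are illustrative:

```python
import math
import random

def simulate_responses(n_persons, cr_items, mc_items, pi_guess=0.25, seed=1):
    """Simulate dichotomous responses: cr_items is a list of (a, b) 2PL parameters,
    mc_items a list of (a, b, g) 4PGL parameters with common guessing probability pi_guess."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)  # standard normal ability
        row = []
        for a, b in cr_items:
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            row.append(1 if rng.random() < p else 0)
        for a, b, g in mc_items:
            p2pl = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            p = g * pi_guess + (1.0 - g) * p2pl  # assumed 4PGL mixture form
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data
```

With the Table 1 parameters, one call per replication yields an N × 30 item response matrix that can then be passed to the estimation routine.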

#### 3.2. Results

## 4. Empirical Example: PIRLS 2016 Reading

#### 4.1. Method

#### 4.2. Results

## 5. Discussion

## Funding

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| 2PL | two-parameter logistic model |
| 3PL | three-parameter logistic model |
| 3PLRH | three-parameter logistic model with residual heterogeneity |
| 4PL | four-parameter logistic model |
| 4PGL | four-parameter logistic guessing model |
| AIC | Akaike information criterion |
| BIC | Bayesian information criterion |
| GHP | Gilula–Haberman penalty |
| R4PL | reparametrized four-parameter logistic model |
| RMSE | root mean square error |

## Appendix A. Selected Countries in Empirical Example PIRLS 2016 Reading

## References

- Bock, R.D.; Moustaki, I. Item response theory in a general framework. In Handbook of Statistics, Volume 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 469–513. [Google Scholar] [CrossRef]
- van der Linden, W.J.; Hambleton, R.K. (Eds.) Handbook of Modern Item Response Theory; Springer: New York, NY, USA, 1997. [Google Scholar] [CrossRef]
- van der Linden, W.J. Unidimensional logistic response models. In Handbook of Item Response Theory, Volume 1: Models; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 11–30. [Google Scholar] [CrossRef]
- Rutkowski, L.; von Davier, M.; Rutkowski, D. (Eds.) A Handbook of International Large-Scale Assessment: Background, Technical Issues, and Methods of Data Analysis; Chapman Hall/CRC Press: London, UK, 2013. [Google Scholar] [CrossRef]
- OECD. PISA 2018. Technical Report; OECD: Paris, France, 2020; Available online: https://bit.ly/3zWbidA (accessed on 2 November 2022).
- Foy, P.; Yin, L. Scaling the PIRLS 2016 achievement data. In Methods and Procedures in PIRLS 2016; Martin, M.O., Mullis, I.V., Hooper, M., Eds.; IEA, Boston College: Newton, MA, USA, 2017. [Google Scholar]
- Haladyna, T.M.; Downing, S.M.; Rodriguez, M.C. A review of multiple-choice item-writing guidelines for classroom assessment. Appl. Meas. Educ. **2002**, 15, 309–333. [Google Scholar] [CrossRef]
- Haladyna, T.M. Developing and Validating Multiple-Choice Test Items; Routledge: London, UK, 2004. [Google Scholar]
- Haladyna, T.M.; Rodriguez, M.C.; Stevens, C. Are multiple-choice items too fat? Appl. Meas. Educ. **2019**, 32, 350–364. [Google Scholar] [CrossRef]
- Kubinger, K.D.; Holocher-Ertl, S.; Reif, M.; Hohensinn, C.; Frebort, M. On minimizing guessing effects on multiple-choice items: Superiority of a two solutions and three distractors item format to a one solution and five distractors item format. Int. J. Sel. Assess. **2010**, 18, 111–115. [Google Scholar] [CrossRef]
- Andrich, D.; Marais, I.; Humphry, S. Using a theorem by Andersen and the dichotomous Rasch model to assess the presence of random guessing in multiple choice items. J. Educ. Behav. Stat. **2012**, 37, 417–442. [Google Scholar] [CrossRef]
- Andrich, D.; Marais, I.; Humphry, S.M. Controlling guessing bias in the dichotomous Rasch model applied to a large-scale, vertically scaled testing program. Educ. Psychol. Meas. **2016**, 76, 412–435. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Jiao, H. Comparison of different approaches to dealing with guessing in Rasch modeling. Psych. Test Assess. Model. **2022**, 64, 65–86. Available online: https://bit.ly/3CJQECj (accessed on 2 November 2022).
- Lord, F.M.; Novick, M.R. Statistical Theories of Mental Test Scores; Addison-Wesley: Reading, MA, USA, 1968. [Google Scholar]
- Aitkin, M.; Aitkin, I. Investigation of the Identifiability of the 3PL Model in the NAEP 1986 Math Survey; Technical Report; US Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics: Washington, DC, USA, 2006. Available online: https://bit.ly/3T6t9sl (accessed on 2 November 2022).
- von Davier, M. Is there need for the 3PL model? Guess what? Meas. Interdiscip. Res. Persp. **2009**, 7, 110–114. [Google Scholar] [CrossRef]
- Aitkin, M.; Aitkin, I. New Multi-Parameter Item Response Models; Technical Report; US Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics: Washington, DC, USA, 2008. Available online: https://bit.ly/3ypA0oK (accessed on 2 November 2022).
- Yen, W.M.; Fitzpatrick, A.R. Item response theory. In Educational Measurement; Brennan, R.L., Ed.; Praeger Publishers: Westport, CT, USA, 2006; pp. 111–154. [Google Scholar]
- Casabianca, J.M.; Lewis, C. IRT item parameter recovery with marginal maximum likelihood estimation using loglinear smoothing models. J. Educ. Behav. Stat. **2015**, 40, 547–578. [Google Scholar] [CrossRef] [Green Version]
- Steinfeld, J.; Robitzsch, A. Item parameter estimation in multistage designs: A comparison of different estimation approaches for the Rasch model. Psych **2021**, 3, 279–307. [Google Scholar] [CrossRef]
- Woods, C.M. Empirical histograms in item response theory with ordinal data. Educ. Psychol. Meas. **2007**, 67, 73–87. [Google Scholar] [CrossRef]
- Xu, X.; von Davier, M. Fitting the Structured General Diagnostic Model to NAEP Data; Research Report No. RR-08-28; Educational Testing Service: Princeton, NJ, USA, 2008. [Google Scholar] [CrossRef]
- Yen, W.M. Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Appl. Psychol. Meas. **1984**, 8, 125–145. [Google Scholar] [CrossRef]
- Bock, R.D.; Aitkin, M. Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika **1981**, 46, 443–459. [Google Scholar] [CrossRef]
- Aitkin, M. Expectation maximization algorithm and extensions. In Handbook of Item Response Theory, Volume 2: Statistical Tools; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 217–236. [Google Scholar] [CrossRef]
- Robitzsch, A. A note on a computationally efficient implementation of the EM algorithm in item response models. Quant. Comput. Methods Behav. Sc. **2021**, 1, e3783. [Google Scholar] [CrossRef]
- Frey, A.; Hartig, J.; Rupp, A.A. An NCME instructional module on booklet designs in large-scale assessments of student achievement: Theory and practice. Educ. Meas. **2009**, 28, 39–53. [Google Scholar] [CrossRef]
- von Davier, M. Imputing proficiency data under planned missingness in population models. In A Handbook of International Large-Scale Assessment: Background, Technical Issues, and Methods of Data Analysis; Rutkowski, L., von Davier, M., Rutkowski, D., Eds.; Chapman Hall/CRC Press: London, UK, 2013; pp. 175–201. [Google Scholar] [CrossRef]
- Birnbaum, A. Some latent trait models and their use in inferring an examinee’s ability. In Statistical Theories of Mental Test Scores; Lord, F.M., Novick, M.R., Eds.; MIT Press: Reading, MA, USA, 1968; pp. 397–479. [Google Scholar]
- Rasch, G. Probabilistic Models for Some Intelligence and Attainment Tests; Danish Institute for Educational Research: Copenhagen, Denmark, 1960. [Google Scholar]
- Debelak, R.; Strobl, C.; Zeigenfuse, M.D. An Introduction to the Rasch Model with Examples in R; CRC Press: Boca Raton, FL, USA, 2022. [Google Scholar] [CrossRef]
- Battauz, M.; Bellio, R. Shrinkage estimation of the three-parameter logistic model. Br. J. Math. Stat. Psychol. **2021**, 74, 591–609. [Google Scholar] [CrossRef]
- de Gruijter, D.N.M. Small N does not always justify Rasch model. Appl. Psychol. Meas. **1986**, 10, 187–194. [Google Scholar] [CrossRef] [Green Version]
- Kubinger, K.D.; Draxler, C. A comparison of the Rasch model and constrained item response theory models for pertinent psychological test data. In Multivariate and Mixture Distribution Rasch Models—Extensions and Applications; von Davier, M., Carstensen, C.H., Eds.; Springer: New York, NY, USA, 2006; pp. 295–312. [Google Scholar] [CrossRef]
- Maris, G.; Bechger, T. On interpreting the model parameters for the three parameter logistic model. Meas. Interdiscip. Res. Persp. **2009**, 7, 75–88. [Google Scholar] [CrossRef]
- San Martín, E.; González, J.; Tuerlinckx, F. On the unidentifiability of the fixed-effects 3PL model. Psychometrika **2015**, 80, 450–467. [Google Scholar] [CrossRef] [PubMed]
- von Davier, M.; Bezirhan, U. A robust method for detecting item misfit in large scale assessments. Educ. Psychol. Meas. **2022**. [Google Scholar] [CrossRef]
- Loken, E.; Rulison, K.L. Estimation of a four-parameter item response theory model. Br. J. Math. Stat. Psychol. **2010**, 63, 509–525. [Google Scholar] [CrossRef]
- Barnard-Brak, L.; Lan, W.Y.; Yang, Z. Differences in mathematics achievement according to opportunity to learn: A 4PL item response theory examination. Stud. Educ. Eval. **2018**, 56, 1–7. [Google Scholar] [CrossRef]
- Culpepper, S.A. The prevalence and implications of slipping on low-stakes, large-scale assessments. J. Educ. Behav. Stat. **2017**, 42, 706–725. [Google Scholar] [CrossRef]
- Robitzsch, A. On the choice of the item response model for scaling PISA data: Model selection based on information criteria and quantifying model uncertainty. Entropy **2022**, 24, 760. [Google Scholar] [CrossRef]
- Aitkin, M.; Aitkin, I. Statistical Modeling of the National Assessment of Educational Progress; Springer: New York, NY, USA, 2011. [Google Scholar] [CrossRef]
- Bürkner, P.C. Analysing standard progressive matrices (SPM-LS) with Bayesian item response models. J. Intell. **2020**, 8, 5. [Google Scholar] [CrossRef] [Green Version]
- Meng, X.; Xu, G.; Zhang, J.; Tao, J. Marginalized maximum a posteriori estimation for the four-parameter logistic model under a mixture modelling framework. Br. J. Math. Stat. Psychol. **2020**, 73, 51–82. [Google Scholar] [CrossRef]
- Battauz, M. Regularized estimation of the four-parameter logistic model. Psych **2020**, 2, 269–278. [Google Scholar] [CrossRef]
- Bazán, J.L.; Bolfarine, H.; Branco, M.D. A skew item response model. Bayesian Anal. **2006**, 1, 861–892. [Google Scholar] [CrossRef]
- Goldstein, H. Consequences of using the Rasch model for educational assessment. Br. Educ. Res. J. **1979**, 5, 211–220. [Google Scholar] [CrossRef]
- Shim, H.; Bonifay, W.; Wiedermann, W. Parsimonious asymmetric item response theory modeling with the complementary log-log link. Behav. Res. Methods **2022**. [Google Scholar] [CrossRef] [PubMed]
- Zhang, J.; Zhang, Y.Y.; Tao, J.; Chen, M.H. Bayesian item response theory models with flexible generalized logit links. Appl. Psychol. Meas. **2022**. [Google Scholar] [CrossRef]
- Molenaar, D.; Dolan, C.V.; De Boeck, P. The heteroscedastic graded response model with a skewed latent trait: Testing statistical and substantive hypotheses related to skewed item category functions. Psychometrika **2012**, 77, 455–478. [Google Scholar] [CrossRef] [PubMed]
- Molenaar, D. Heteroscedastic latent trait models for dichotomous data. Psychometrika **2015**, 80, 625–644. [Google Scholar] [CrossRef]
- Bolt, D.M.; Deng, S.; Lee, S. IRT model misspecification and measurement of growth in vertical scaling. J. Educ. Meas. **2014**, 51, 141–162. [Google Scholar] [CrossRef]
- Liao, X.; Bolt, D.M. Item characteristic curve asymmetry: A better way to accommodate slips and guesses than a four-parameter model? J. Educ. Behav. Stat. **2021**, 46, 753–775. [Google Scholar] [CrossRef]
- Bolt, D.M.; Lee, S.; Wollack, J.; Eckerly, C.; Sowles, J. Application of asymmetric IRT modeling to discrete-option multiple-choice test items. Front. Psychol. **2018**, 9, 2175. [Google Scholar] [CrossRef] [PubMed]
- Lee, S.; Bolt, D.M. An alternative to the 3PL: Using asymmetric item characteristic curves to address guessing effects. J. Educ. Meas. **2018**, 55, 90–111. [Google Scholar] [CrossRef]
- Douglas, J.; Cohen, A. Nonparametric item response function estimation for assessing parametric model fit. Appl. Psychol. Meas. **2001**, 25, 234–243. [Google Scholar] [CrossRef]
- Sueiro, M.J.; Abad, F.J. Assessing goodness of fit in item response theory with nonparametric models: A comparison of posterior probabilities and kernel-smoothing approaches. Educ. Psychol. Meas. **2011**, 71, 834–848. [Google Scholar] [CrossRef]
- Chakraborty, S. Generating discrete analogues of continuous probability distributions—A survey of methods and constructions. J. Stat. Distr. Appl. **2015**, 2, 6. [Google Scholar] [CrossRef] [Green Version]
- Chalmers, R.P.; Ng, V. Plausible-value imputation statistics for detecting item misfit. Appl. Psychol. Meas. **2017**, 41, 372–387. [Google Scholar] [CrossRef]
- Khorramdel, L.; Shin, H.J.; von Davier, M. GDM software mdltm including parallel EM algorithm. In Handbook of Diagnostic Classification Models; von Davier, M., Lee, Y.S., Eds.; Springer: Cham, Switzerland, 2019; pp. 603–628. [Google Scholar] [CrossRef]
- Robitzsch, A. Statistical properties of estimators of the RMSD item fit statistic. Foundations **2022**, 2, 488–503. [Google Scholar] [CrossRef]
- Tijmstra, J.; Bolsinova, M.; Liaw, Y.L.; Rutkowski, L.; Rutkowski, D. Sensitivity of the RMSD for detecting item-level misfit in low-performing countries. J. Educ. Meas. **2020**, 57, 566–583. [Google Scholar] [CrossRef]
- Köhler, C.; Robitzsch, A.; Hartig, J. A bias-corrected RMSD item fit statistic: An evaluation and comparison to alternatives. J. Educ. Behav. Stat. **2020**, 45, 251–273. [Google Scholar] [CrossRef]
- Kang, T.; Cohen, A.S. IRT model selection methods for dichotomous items. Appl. Psychol. Meas. **2007**, 31, 331–358. [Google Scholar] [CrossRef]
- Myung, I.J.; Pitt, M.A.; Kim, W. Model evaluation, testing and selection. In Handbook of Cognition; Lamberts, K., Goldstone, R.L., Eds.; Sage: Thousand Oaks, CA, USA; Mahwah, NJ, USA, 2005; pp. 422–436. [Google Scholar] [CrossRef]
- von Davier, M.; Yamamoto, K.; Shin, H.J.; Chen, H.; Khorramdel, L.; Weeks, J.; Davis, S.; Kong, N.; Kandathil, M. Evaluating item response theory linking and model fit for data from PISA 2000–2012. Assess. Educ. **2019**, 26, 466–488. [Google Scholar] [CrossRef]
- R Core Team. R: A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2022; Available online: https://www.R-project.org/ (accessed on 11 January 2022).
- Robitzsch, A. sirt: Supplementary Item Response Theory Models. R Package Version 3.12-66. 2022. Available online: https://CRAN.R-project.org/package=sirt (accessed on 17 May 2022).
- Gilula, Z.; Haberman, S.J. Prediction functions for categorical panel data. Ann. Stat. **1995**, 23, 1130–1142. [Google Scholar] [CrossRef]
- Haberman, S.J. The Information a Test Provides on an Ability Parameter; Research Report No. RR-07-18; Educational Testing Service: Princeton, NJ, USA, 2007. [Google Scholar] [CrossRef]
- van Rijn, P.W.; Sinharay, S.; Haberman, S.J.; Johnson, M.S. Assessment of fit of item response theory models used in large-scale educational survey assessments. Large-Scale Assess. Educ. **2016**, 4, 10. [Google Scholar] [CrossRef] [Green Version]
- George, A.C.; Robitzsch, A. Validating theoretical assumptions about reading with cognitive diagnosis models. Int. J. Test. **2021**, 21, 105–129. [Google Scholar] [CrossRef]
- Robitzsch, A. On the treatment of missing item responses in educational large-scale assessment data: An illustrative simulation study and a case study using PISA 2018 mathematics data. Eur. J. Investig. Health Psychol. Educ. **2021**, 11, 1653–1687. [Google Scholar] [CrossRef]
- Robitzsch, A.; Lüdtke, O. Some thoughts on analytical choices in the scaling model for test scores in international large-scale assessment studies. Meas. Instrum. Soc. Sci. **2022**, 4, 9. [Google Scholar] [CrossRef]
- Camilli, G. IRT scoring and test blueprint fidelity. Appl. Psychol. Meas. **2018**, 42, 393–400. [Google Scholar] [CrossRef] [PubMed]
- Brennan, R.L. Misconceptions at the intersection of measurement theory and practice. Educ. Meas. **1998**, 17, 5–9. [Google Scholar] [CrossRef]
- Uher, J. Psychometrics is not measurement: Unraveling a fundamental misconception in quantitative psychology and the complex network of its underlying fallacies. J. Theor. Philos. Psychol. **2021**, 41, 58–84. [Google Scholar] [CrossRef]
- Haberman, S.J. Identifiability of Parameters in Item Response Models with Unconstrained Ability Distributions; Research Report No. RR-05-24; Educational Testing Service: Princeton, NJ, USA, 2009. [Google Scholar] [CrossRef]
- Kolen, M.J.; Brennan, R.L. Test Equating, Scaling, and Linking; Springer: New York, NY, USA, 2014. [Google Scholar] [CrossRef]
- Holland, P.W.; Wainer, H. (Eds.) Differential Item Functioning: Theory and Practice; Lawrence Erlbaum: Hillsdale, NJ, USA, 1993. [Google Scholar] [CrossRef]
- Suh, Y.; Bolt, D.M. A nested logit approach for investigating distractors as causes of differential item functioning. J. Educ. Meas. **2011**, 48, 188–205. [Google Scholar] [CrossRef]
- Chiu, T.W.; Camilli, G. Comment on 3PL IRT adjustment for guessing. Appl. Psychol. Meas. **2013**, 37, 76–86. [Google Scholar] [CrossRef]
- San Martín, E.; Del Pino, G.; De Boeck, P. IRT models for ability-based guessing. Appl. Psychol. Meas. **2006**, 30, 183–203. [Google Scholar] [CrossRef] [Green Version]
- Jiang, Y.; Yu, X.; Cai, Y.; Tu, D. A multidimensional IRT model for ability-item-based guessing: The development of a two-parameter logistic extension model. Commun. Stat. Simul. Comput. **2022**. [Google Scholar] [CrossRef]
- Formann, A.K.; Kohlmann, T. Three-parameter linear logistic latent class analysis. In Applied Latent Class Analysis; Hagenaars, J.A., McCutcheon, A.L., Eds.; Cambridge University Press: Cambridge, MA, USA, 2002; pp. 183–210. [Google Scholar]
- Huang, H.Y.; Wang, W.C. The random-effect DINA model. J. Educ. Meas. **2014**, 51, 75–97. [Google Scholar] [CrossRef]
- Raiche, G.; Magis, D.; Blais, J.G.; Brochu, P. Taking atypical response patterns into account: A multidimensional measurement model from item response theory. In Improving Large-Scale Assessment in Education; Simon, M., Ercikan, K., Rousseau, M., Eds.; Routledge: New York, NY, USA, 2012; pp. 238–259. [Google Scholar] [CrossRef]
- Ferrando, P.J. A comprehensive IRT approach for modeling binary, graded, and continuous responses with error in persons and items. Appl. Psychol. Meas. **2019**, 43, 339–359. [Google Scholar] [CrossRef]
- Levine, M.V.; Drasgow, F. Appropriateness measurement: Review, critique and validating studies. Br. J. Math. Stat. Psychol. **1982**, 35, 42–56. [Google Scholar] [CrossRef]

**Figure 1.** PIRLS 2016 reading: histogram of the proportion-of-guessers parameters ${g}_{i}$ in the 4PGL model.

**Figure 2.** PIRLS 2016 reading: histograms of the pseudo-guessing parameters ${c}_{i}$ (**left panel**) and the slipping parameters ${d}_{i}$ (**right panel**) in the 4PL model.

| Item | ${a}_{i}$ | ${b}_{i}$ | ${g}_{i}$ |
|---|---|---|---|
| C01 | 1.3 | −2.1 | — |
| C02 | 2.3 | −1.7 | — |
| C03 | 1.3 | −1.2 | — |
| C04 | 1.7 | −0.9 | — |
| C05 | 2.0 | −0.8 | — |
| C06 | 2.1 | −0.7 | — |
| C07 | 1.9 | −0.5 | — |
| C08 | 1.3 | −0.3 | — |
| C09 | 0.9 | −0.2 | — |
| C10 | 1.7 | −0.1 | — |
| C11 | 1.4 | 0.1 | — |
| C12 | 1.7 | 0.3 | — |
| C13 | 1.1 | 0.6 | — |
| C14 | 1.1 | 0.7 | — |
| C15 | 1.6 | 0.9 | — |
| M01 | 1.0 | −0.6 | 0.20 |
| M02 | 2.1 | −1.6 | 0.10 |
| M03 | 2.1 | −3.0 | 0.20 |
| M04 | 1.5 | −2.0 | 0.15 |
| M05 | 2.1 | −1.0 | 0.20 |
| M06 | 1.3 | 0.2 | 0.30 |
| M07 | 0.9 | −0.4 | 0.05 |
| M08 | 1.3 | −0.7 | 0.10 |
| M09 | 1.3 | −0.7 | 0.20 |
| M10 | 1.2 | −0.6 | 0.05 |
| M11 | 1.4 | −0.4 | 0.10 |
| M12 | 1.3 | −0.4 | 0.30 |
| M13 | 1.5 | −2.1 | 0.15 |
| M14 | 1.3 | −0.2 | 0.30 |
| M15 | 1.4 | 0.2 | 0.20 |

Note: ${a}_{i}$ = item discrimination; ${b}_{i}$ = item intercept; ${g}_{i}$ = probability of guessers. The items C01 to C15 are CR items and follow the 2PL model. The items M01 to M15 are MC items, follow the 4PGL model, and have a constant guessing probability ${\pi}_{i}$ of 0.25.

**Table 2.** Simulation study: average absolute bias (ABias) and root mean square error (RMSE) of estimated item parameters in the 4PGL and R4PL models as a function of sample size N.

| Type | Parm | Model | ABias (N = 1000) | ABias (N = 2000) | ABias (N = 5000) | ABias (N = 10,000) | RMSE (N = 1000) | RMSE (N = 2000) | RMSE (N = 5000) | RMSE (N = 10,000) |
|---|---|---|---|---|---|---|---|---|---|---|
| CR | ${a}_{i}$ | 4PGL | 0.011 | 0.004 | 0.002 | 0.001 | 0.133 | 0.093 | 0.059 | 0.041 |
| CR | ${a}_{i}$ | R4PL | 0.016 | 0.007 | 0.003 | 0.001 | 0.134 | 0.094 | 0.059 | 0.041 |
| CR | ${b}_{i}$ | 4PGL | 0.006 | 0.002 | 0.002 | 0.001 | 0.101 | 0.070 | 0.045 | 0.032 |
| CR | ${b}_{i}$ | R4PL | 0.005 | 0.002 | 0.002 | 0.001 | 0.101 | 0.070 | 0.045 | 0.032 |
| MC | ${a}_{i}$ | 4PGL | 0.069 | 0.028 | 0.008 | 0.004 | 0.395 | 0.275 | 0.173 | 0.120 |
| MC | ${a}_{i}$ | R4PL | 0.262 | 0.141 | 0.060 | 0.027 | 0.637 | 0.413 | 0.249 | 0.172 |
| MC | ${b}_{i}$ | 4PGL | 0.050 | 0.019 | 0.007 | 0.004 | 0.361 | 0.255 | 0.161 | 0.113 |
| MC | ${b}_{i}$ | R4PL | 0.062 | 0.026 | 0.011 | 0.004 | 0.429 | 0.285 | 0.175 | 0.121 |
| MC | ${g}_{i}$ | 4PGL | 0.017 | 0.014 | 0.007 | 0.004 | 0.092 | 0.073 | 0.049 | 0.035 |
| MC | ${g}_{i}$ | R4PL | 0.034 | 0.027 | 0.015 | 0.011 | 0.133 | 0.109 | 0.079 | 0.061 |
| MC | ${\pi}_{i}$ | R4PL | 0.035 | 0.028 | 0.026 | 0.028 | 0.245 | 0.216 | 0.178 | 0.151 |

**Table 3.** Simulation study: root integrated square error (RISE) and root mean square deviation (RMSD) statistics as a function of sample size N.

| Model | RISE (N = 1000) | RISE (N = 2000) | RISE (N = 5000) | RISE (N = 10,000) | RMSD (N = 1000) | RMSD (N = 2000) | RMSD (N = 5000) | RMSD (N = 10,000) |
|---|---|---|---|---|---|---|---|---|
| *Constructed response items* | | | | | | | | |
| 2PL | 0.019 | 0.014 | 0.009 | 0.007 | 0.014 | 0.010 | 0.007 | 0.005 |
| 3PL | 0.019 | 0.014 | 0.009 | 0.007 | 0.014 | 0.010 | 0.007 | 0.005 |
| 4PGL | 0.019 | 0.013 | 0.008 | 0.006 | 0.014 | 0.010 | 0.006 | 0.004 |
| R4PL | 0.019 | 0.013 | 0.008 | 0.006 | 0.014 | 0.010 | 0.006 | 0.004 |
| 3PLRH | 0.019 | 0.013 | 0.009 | 0.006 | 0.014 | 0.010 | 0.006 | 0.004 |
| *Multiple-choice items* | | | | | | | | |
| 2PL | 0.033 | 0.029 | 0.027 | 0.026 | 0.022 | 0.019 | 0.016 | 0.014 |
| 3PL | 0.034 | 0.030 | 0.027 | 0.026 | 0.022 | 0.018 | 0.015 | 0.014 |
| 4PGL | 0.024 | 0.018 | 0.011 | 0.008 | 0.015 | 0.010 | 0.006 | 0.005 |
| R4PL | 0.028 | 0.020 | 0.013 | 0.009 | 0.013 | 0.009 | 0.005 | 0.004 |
| 3PLRH | 0.029 | 0.024 | 0.019 | 0.017 | 0.017 | 0.013 | 0.010 | 0.008 |

**Table 4.** PIRLS 2016 reading: model comparison of different scaling models based on the Akaike information criterion (AIC), Bayesian information criterion (BIC), and Gilula–Haberman penalty (GHP).

| Model | #pars | AIC | BIC | GHP | $\mathbf{\Delta}\mathbf{GHP}$ |
|---|---|---|---|---|---|
| 2PL | 282 | 1,001,341 | 1,003,773 | 0.5229 | 0.0006 |
| 3PL | 339 | 1,000,569 | 1,003,492 | 0.5225 | 0.0001 |
| 4PGL | 317 | 1,001,171 | 1,003,904 | 0.5228 | 0.0005 |
| R4PL | 407 | 1,000,287 | 1,003,796 | 0.5223 | 0.0000 |
| 3PLRH | 352 | 1,000,780 | 1,003,815 | 0.5226 | 0.0003 |
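The AIC and BIC values in Table 4 follow the standard definitions AIC = −2 log L + 2p and BIC = −2 log L + p log n, where p is the number of parameters and n the number of observations; a minimal sketch (the GHP additionally normalizes the log-likelihood, and its exact normalization is not reproduced here):

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion: -2*logL + 2*p."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2*logL + p*log(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)
```

Because the BIC penalty grows with log n, it favors more parsimonious models than the AIC in large samples, which is visible in the different model rankings in Table 4.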

**Table 5.** PIRLS 2016 reading: mean (M) and standard deviation (SD) of RMSD item fit statistics in different scaling models.

| Model | M (CR) | SD (CR) | M (MC) | SD (MC) |
|---|---|---|---|---|
| 2PL | 0.015 | 0.008 | 0.014 | 0.007 |
| 3PL | 0.014 | 0.008 | 0.007 | 0.005 |
| 4PGL | 0.015 | 0.009 | 0.012 | 0.007 |
| R4PL | 0.014 | 0.008 | 0.005 | 0.003 |
| 3PLRH | 0.014 | 0.008 | 0.009 | 0.005 |

Note: CR = constructed response items; MC = multiple-choice items.
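The RMSD item fit statistic summarized in Table 5 compares observed and model-implied item characteristic curves over a grid of trait values, weighting squared discrepancies by the trait distribution; this sketch uses that common definition as an assumption about the exact estimator applied in the paper:

```python
def rmsd_item_fit(p_obs, p_model, weights):
    """Weighted root mean square deviation between observed and model-implied
    correct-response probabilities evaluated on a theta grid."""
    total = sum(weights)
    s = sum(w * (po - pm) ** 2 for po, pm, w in zip(p_obs, p_model, weights))
    return (s / total) ** 0.5
```

An RMSD of zero indicates the model-implied curve reproduces the observed curve exactly; larger values flag item misfit.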

**Table 6.** PIRLS 2016 reading: means (diagonal entries) and correlations (non-diagonal entries) of estimated item parameters of multiple-choice items in different scaling models.

| Parameter | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1: ${a}_{i}$ 2PL | 1.32 | 0.90 | 0.99 | 0.78 | 0.91 | −0.69 | −0.68 | −0.67 | −0.61 | −0.61 | −0.15 | −0.03 | −0.09 | −0.31 | −0.29 | 0.26 | −0.43 |
| 2: ${a}_{i}$ 3PL | 0.90 | 1.57 | 0.88 | 0.85 | 0.74 | −0.49 | −0.41 | −0.45 | −0.33 | −0.38 | −0.51 | 0.29 | 0.19 | −0.39 | −0.04 | 0.43 | −0.42 |
| 3: ${a}_{i}$ 3PLRH | 0.99 | 0.88 | 0.92 | 0.77 | 0.94 | −0.70 | −0.71 | −0.69 | −0.65 | −0.64 | −0.07 | −0.11 | −0.17 | −0.26 | −0.35 | 0.18 | −0.40 |
| 4: ${a}_{i}$ 4PL | 0.78 | 0.85 | 0.77 | 1.92 | 0.78 | −0.38 | −0.33 | −0.35 | −0.33 | −0.36 | −0.30 | 0.18 | 0.22 | −0.02 | 0.20 | 0.23 | 0.00 |
| 5: ${a}_{i}$ 4PGL | 0.91 | 0.74 | 0.94 | 0.78 | 1.43 | −0.70 | −0.74 | −0.71 | −0.74 | −0.72 | 0.17 | −0.25 | −0.26 | −0.01 | −0.33 | −0.03 | −0.20 |
| 6: ${b}_{i}$ 2PL | −0.69 | −0.49 | −0.70 | −0.38 | −0.70 | −1.00 | 0.97 | 1.00 | 0.94 | 0.97 | −0.27 | 0.05 | 0.09 | 0.33 | 0.35 | −0.05 | 0.53 |
| 7: ${b}_{i}$ 3PL | −0.68 | −0.41 | −0.71 | −0.33 | −0.74 | 0.97 | −0.74 | 0.98 | 0.98 | 0.97 | −0.42 | 0.26 | 0.27 | 0.21 | 0.46 | 0.08 | 0.45 |
| 8: ${b}_{i}$ 3PLRH | −0.67 | −0.45 | −0.69 | −0.35 | −0.71 | 1.00 | 0.98 | −0.68 | 0.96 | 0.98 | −0.33 | 0.11 | 0.14 | 0.29 | 0.37 | 0.00 | 0.50 |
| 9: ${b}_{i}$ 4PL | −0.61 | −0.33 | −0.65 | −0.33 | −0.74 | 0.94 | 0.98 | 0.96 | −0.89 | 0.98 | −0.54 | 0.32 | 0.33 | 0.10 | 0.45 | 0.20 | 0.34 |
| 10: ${b}_{i}$ 4PGL | −0.61 | −0.38 | −0.64 | −0.36 | −0.72 | 0.97 | 0.97 | 0.98 | 0.98 | −1.13 | −0.44 | 0.18 | 0.20 | 0.17 | 0.38 | 0.12 | 0.40 |
| 11: ${\delta}_{i}$ 3PLRH | −0.15 | −0.51 | −0.07 | −0.30 | 0.17 | −0.27 | −0.42 | −0.33 | −0.54 | −0.44 | −0.23 | −0.76 | −0.68 | 0.38 | −0.48 | −0.73 | 0.20 |
| 12: ${c}_{i}$ 3PL | −0.03 | 0.29 | −0.11 | 0.18 | −0.25 | 0.05 | 0.26 | 0.11 | 0.32 | 0.18 | −0.76 | 0.12 | 0.92 | −0.41 | 0.68 | 0.64 | −0.23 |
| 13: ${c}_{i}$ 4PL | −0.09 | 0.19 | −0.17 | 0.22 | −0.26 | 0.09 | 0.27 | 0.14 | 0.33 | 0.20 | −0.68 | 0.92 | 0.15 | −0.21 | 0.86 | 0.65 | 0.00 |
| 14: ${g}_{i}$ 4PGL | −0.31 | −0.39 | −0.26 | −0.02 | −0.01 | 0.33 | 0.21 | 0.29 | 0.10 | 0.17 | 0.38 | −0.41 | −0.21 | 0.03 | 0.27 | −0.50 | 0.89 |
| 15: ${g}_{i}$ R4PL | −0.29 | −0.04 | −0.35 | 0.20 | −0.33 | 0.35 | 0.46 | 0.37 | 0.45 | 0.38 | −0.48 | 0.68 | 0.86 | 0.27 | 0.20 | 0.37 | 0.50 |
| 16: ${\pi}_{i}$ R4PL | 0.26 | 0.43 | 0.18 | 0.23 | −0.03 | −0.05 | 0.08 | 0.00 | 0.20 | 0.12 | −0.73 | 0.64 | 0.65 | −0.50 | 0.37 | 0.72 | −0.39 |
| 17: ${d}_{i}$ 4PL | −0.43 | −0.42 | −0.40 | 0.00 | −0.20 | 0.53 | 0.45 | 0.50 | 0.34 | 0.40 | 0.20 | −0.23 | 0.00 | 0.89 | 0.50 | −0.39 | 0.04 |


© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Robitzsch, A.
Four-Parameter Guessing Model and Related Item Response Models. *Math. Comput. Appl.* **2022**, *27*, 95.
https://doi.org/10.3390/mca27060095
