# A Comparison of Linking Methods for Two Groups for the Two-Parameter Logistic Item Response Model in the Presence and Absence of Random Differential Item Functioning


## Abstract


## 1. Introduction

## 2. Linking Two Groups with the 2PL Model

#### 2.1. 2PL Model
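The 2PL model specifies the probability of a correct response to item $i$ as a logistic function of the ability $\theta$ with item discrimination $a_i$ and item difficulty $b_i$. A minimal sketch in Python (the function name is illustrative; the scaling constant $D = 1.7$ sometimes used to approximate the normal-ogive metric is omitted here):

```python
import math

def irf_2pl(theta, a, b):
    """2PL item response function: P(X=1 | theta) = 1 / (1 + exp(-a*(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the success probability is exactly 0.5, regardless of a;
# a controls how steeply the curve rises around b.
p = irf_2pl(theta=0.0, a=1.3, b=0.0)
```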

#### 2.2. Linking Design

#### 2.3. Random Differential Item Functioning

#### 2.3.1. Identified Item Parameters in Separate Calibrations in the Two Groups

#### 2.3.2. The Role of Normally Distributed Random DIF in Educational Assessment

## 3. Linking Methods

#### 3.1. Log-Mean-Mean Linking

**Proposition 1.**

**Proof.**

#### 3.2. Mean-Mean Linking

**Proposition 2.**

**Proof.**
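Both moment methods estimate a linear transformation $\theta \mapsto A\theta + B$ of the second group's ability scale from the common-item parameter estimates; they differ only in whether the slope $A$ is aggregated from logarithmized (logMM) or untransformed (MM) item discriminations. A sketch under one common parameterization, in which the focal group's parameters are mapped onto the reference scale via $a \mapsto a/A$ and $b \mapsto Ab + B$ (the helper name and argument names are illustrative, not from any package):

```python
import math

def moment_linking(a_ref, b_ref, a_foc, b_foc, log_slopes=True):
    """Moment-based linking constants A, B for theta_ref = A * theta_foc + B.

    log_slopes=True gives log-mean-mean (logMM) linking, which averages the
    logarithmized discriminations; False gives mean-mean (MM) linking, which
    averages the untransformed discriminations.
    """
    n_ref, n_foc = len(a_ref), len(a_foc)
    if log_slopes:
        # logMM: ratio of geometric means of the common-item discriminations
        A = math.exp(sum(map(math.log, a_foc)) / n_foc
                     - sum(map(math.log, a_ref)) / n_ref)
    else:
        # MM: ratio of arithmetic means of the common-item discriminations
        A = (sum(a_foc) / n_foc) / (sum(a_ref) / n_ref)
    B = sum(b_ref) / len(b_ref) - A * sum(b_foc) / len(b_foc)
    return A, B
```

The two slopes coincide when all common items have the same discrimination ratio across groups and diverge under multiplicative DIF in the discriminations.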

#### 3.3. Haberman Linking (HAB and HAB-nolog)

#### 3.4. Invariance Alignment with $p=2$

#### 3.5. Haebara Linking Methods (HAE-asymm, HAE-symm, HAE-joint)
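Haebara-type methods choose $A$ and $B$ to minimize squared differences between the item response functions of the common items, evaluated on a grid of ability values, rather than matching parameter moments. A sketch of the asymmetric criterion (grid choice and function names are illustrative; the symmetric variant adds the analogous term for the reverse transformation direction):

```python
import math

def p2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def haebara_loss(A, B, a_ref, b_ref, a_foc, b_foc, grid=None):
    """Asymmetric Haebara criterion (sketch): squared IRF differences between
    reference-group parameters and focal-group parameters transformed onto the
    reference scale (a -> a/A, b -> A*b + B), summed over a theta grid."""
    if grid is None:
        grid = [g / 10.0 for g in range(-40, 41)]  # equally spaced theta in [-4, 4]
    loss = 0.0
    for ar, br, af, bf in zip(a_ref, b_ref, a_foc, b_foc):
        a_t, b_t = af / A, A * bf + B  # focal item on the reference scale
        loss += sum((p2pl(t, ar, br) - p2pl(t, a_t, b_t)) ** 2 for t in grid)
    return loss
```

In practice, this criterion is minimized numerically over $(A, B)$, e.g., with a general-purpose optimizer; without DIF, the minimizing constants recover the true scale transformation.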

#### 3.6. Recalibration Linking (RC1, RC2, and RC3)

#### 3.7. Anchored Item Parameters

#### 3.8. Concurrent Calibration

## 4. Simulation Study

#### 4.1. Purpose

#### 4.2. Design

#### 4.3. Analysis Methods

#### 4.4. Results

## 5. Empirical Example: Linking PISA 2006 and PISA 2009 for Austria

#### 5.1. Method

#### 5.2. Results

## 6. Discussion

## Funding

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning |
---|---|
1PL | one-parameter logistic model |
2PL | two-parameter logistic model |
ANCH | anchored item parameters |
CC | concurrent calibration |
DIF | differential item functioning |
HAB | Haberman linking with logarithmized item discriminations |
HAB-nolog | Haberman linking with untransformed item discriminations |
HAE | Haebara linking |
HAE-asymm | asymmetric Haebara linking |
HAE-joint | Haebara linking with joint item parameters |
HAE-symm | symmetric Haebara linking |
IA2 | invariance alignment with power $p=2$ |
IRF | item response function |
IRT | item response theory |
logMM | log-mean-mean linking |
LSA | large-scale assessment |
MM | mean-mean linking |
MML | marginal maximum likelihood |
MSE | mean-squared error |
NUDIF | nonuniform differential item functioning |
PIRLS | Progress in International Reading Literacy Study |
PISA | Programme for International Student Assessment |
RC | recalibration linking |
RMSE | root-mean-squared error |
SD | standard deviation |
TIMSS | Trends in International Mathematics and Science Study |
UDIF | uniform differential item functioning |

## Appendix A. Nonidentifiability of DIF Effects Distributions

#### Appendix A.1. DIF Effects for Item Difficulties

#### Appendix A.2. DIF Effects for Item Discriminations

## Appendix B. Proof of Proposition 1

#### Appendix B.1. Consistency of Additive DIF Effects $f_i$ with Condition (I)

#### Appendix B.2. Consistency for Multiplicative DIF Effects $f_i$ with Condition (II)

## Appendix C. Proof of Proposition 2

#### Appendix C.1. Consistency for Additive DIF Effects $f_i$ with Condition (I)

#### Appendix C.2. Consistency for Multiplicative DIF Effects $f_i$ with Condition (II)

## Appendix D. Estimates in Haberman Linking

## Appendix E. Estimates in Invariance Alignment

## Appendix F. Item Parameters Used in the Simulation Study

Item | $a_i$ | $b_i$ |
---|---|---|
1 | 0.95 | −0.97 |
2 | 0.88 | 0.59 |
3 | 0.75 | 0.75 |
4 | 1.29 | −0.79 |
5 | 1.28 | 1.23 |
6 | 1.29 | −1.10 |
7 | 1.25 | −0.67 |
8 | 0.97 | 0.20 |
9 | 0.73 | 1.26 |
10 | 1.27 | 0.05 |
11 | 1.42 | 1.22 |
12 | 0.75 | −0.01 |
13 | 0.50 | 0.20 |
14 | 0.81 | 1.39 |
15 | 1.12 | 0.61 |
16 | 0.78 | −1.00 |
17 | 1.30 | −1.58 |
18 | 0.70 | −1.62 |
19 | 1.29 | 1.06 |
20 | 0.74 | −0.81 |
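For illustration, item responses under a design like this can be generated from the 2PL model with the tabled parameters. The following Python sketch uses only the first five $(a_i, b_i)$ pairs from the table; the sample size, group distribution, and seed are arbitrary choices for the example, not the study's actual design:

```python
import math
import random

# First five (a_i, b_i) pairs from the table above; the remaining
# items would be added in the same way.
ITEMS = [(0.95, -0.97), (0.88, 0.59), (0.75, 0.75), (1.29, -0.79), (1.28, 1.23)]

def simulate_2pl(n_persons, items, mu=0.0, sigma=1.0, rng=None):
    """Draw abilities from N(mu, sigma^2) and Bernoulli responses from the 2PL IRF."""
    rng = rng or random.Random(1)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(mu, sigma)
        data.append([int(rng.random() < 1.0 / (1.0 + math.exp(-a * (theta - b))))
                     for a, b in items])
    return data

# A second group would use its own mu and sigma (and, under random DIF,
# item parameters perturbed by group-specific DIF effects).
X = simulate_2pl(1000, ITEMS)
```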

## References


**Figure 1.** Linking design for two groups with common items ${I}_{0}$ and group-specific unique items ${I}_{1}$ and ${I}_{2}$.

**Table 1.** Variance proportions of different factors in the simulation study for the bias and RMSE for the estimated mean ${\widehat{\mu}}_{2}$ and estimated SD ${\widehat{\sigma}}_{2}$ for the second group.

| Source | ${\widehat{\mu}}_{2}$: Bias | ${\widehat{\mu}}_{2}$: RMSE | ${\widehat{\sigma}}_{2}$: Bias | ${\widehat{\sigma}}_{2}$: RMSE |
|---|---|---|---|---|
| N | 0.3 | 1.1 | 0.6 | 3.9 |
| I | 0.0 | 0.3 | 0.0 | 0.0 |
| Meth | 10.2 | 14.9 | 19.1 | 0.0 |
| ${\tau}_{b}$ | 13.0 | 0.0 | 0.8 | 1.8 |
| ${\tau}_{a}$ | 4.3 | 9.0 | 12.3 | 0.0 |
| N × I | 0.0 | 0.0 | 0.0 | 0.0 |
| N × Meth | 0.0 | 3.7 | 0.8 | 0.0 |
| N × ${\tau}_{b}$ | 0.0 | 2.4 | 0.0 | 0.0 |
| N × ${\tau}_{a}$ | 0.0 | 0.6 | 0.0 | 5.0 |
| I × Meth | 0.4 | 0.1 | 0.1 | 0.0 |
| I × ${\tau}_{b}$ | 0.0 | 0.1 | 0.0 | 0.0 |
| I × ${\tau}_{a}$ | 0.0 | 0.0 | 0.0 | 0.6 |
| Meth × ${\tau}_{b}$ | 58.1 | 13.1 | 17.5 | 14.2 |
| Meth × ${\tau}_{a}$ | 8.2 | 12.1 | 47.7 | 13.2 |
| ${\tau}_{a}$ × ${\tau}_{b}$ | 0.0 | 4.1 | 0.0 | 17.7 |
| N × I × Meth | 0.0 | 0.0 | 0.1 | 0.0 |
| N × I × ${\tau}_{b}$ | 0.0 | 0.2 | 0.0 | 0.0 |
| N × I × ${\tau}_{a}$ | 0.0 | 0.1 | 0.0 | 0.4 |
| N × Meth × ${\tau}_{b}$ | 0.2 | 7.5 | 0.0 | 4.2 |
| N × Meth × ${\tau}_{a}$ | 0.0 | 4.0 | 0.0 | 9.1 |
| N × ${\tau}_{a}$ × ${\tau}_{b}$ | 0.1 | 10.0 | 0.0 | 13.8 |
| I × Meth × ${\tau}_{b}$ | 0.5 | 0.0 | 0.0 | 0.4 |
| I × Meth × ${\tau}_{a}$ | 0.1 | 0.0 | 0.2 | 1.1 |
| I × ${\tau}_{a}$ × ${\tau}_{b}$ | 0.1 | 0.3 | 0.0 | 0.7 |
| Meth × ${\tau}_{a}$ × ${\tau}_{b}$ | 1.0 | 10.1 | 0.1 | 8.2 |
| Residual | 3.7 | 6.4 | 0.6 | 5.7 |
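The variance proportions in Table 1 can be read as an eta-squared-style ANOVA decomposition: the share of between-condition variability in bias or RMSE that is attributable to each simulation factor and interaction. As a minimal illustration (a hypothetical helper, not the study's code; shown for a two-factor full factorial with one value per cell, whereas the study crosses five factors):

```python
from statistics import mean

def variance_proportions(y):
    """Eta-squared-style decomposition for a two-factor full factorial with
    one outcome (e.g., bias) per cell: y[i][j] is the outcome for level i of
    factor A (rows) and level j of factor B (columns)."""
    grand = mean(v for row in y for v in row)
    row_means = [mean(row) for row in y]        # factor A (e.g., Meth)
    col_means = [mean(col) for col in zip(*y)]  # factor B (e.g., tau_b)
    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_a = len(y[0]) * sum((m - grand) ** 2 for m in row_means)
    ss_b = len(y) * sum((m - grand) ** 2 for m in col_means)
    ss_ab = ss_total - ss_a - ss_b              # interaction (one obs per cell)
    return {name: 100 * ss / ss_total
            for name, ss in (("A", ss_a), ("B", ss_b), ("AxB", ss_ab))}
```

For example, a table whose rows differ but whose columns are constant attributes 100% of the variance to factor A; in Table 1, interactions such as Meth × ${\tau}_{b}$ carry most of the variance in the bias of ${\widehat{\mu}}_{2}$.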

**Table 2.** Summary of the satisfactory performance of linking methods for the absolute bias and RMSE across parameters (mean ${\widehat{\mu}}_{2}$ and standard deviation ${\widehat{\sigma}}_{2}$) and conditions.

| Method | Bias: NODIF | Bias: UDIF | Bias: NUDIF | RMSE: NODIF | RMSE: UDIF | RMSE: NUDIF |
|---|---|---|---|---|---|---|
| logMM | 100 | 97 | 94 | 100 | 100 | 45 |
| HAB | 100 | 97 | 94 | 100 | 100 | 44 |
| MM | 100 | 94 | 95 | 92 | 100 | 72 |
| HAB-nolog | 100 | 94 | 96 | 100 | 100 | 78 |
| IA2 | 75 | 78 | 8 | 100 | 100 | 4 |
| HAE-asymm | 100 | 42 | 42 | 100 | 61 | 78 |
| HAE-symm | 100 | 97 | 94 | 100 | 61 | 81 |
| HAE-joint | 100 | 42 | 60 | 100 | 42 | 61 |
| RC1 | 83 | 78 | 16 | 100 | 61 | 29 |
| RC2 | 83 | 78 | 8 | 100 | 61 | 48 |
| RC3 | 100 | 94 | 96 | 100 | 61 | 79 |
| ANCH | 83 | 78 | 13 | 100 | 61 | 48 |
| CC | 100 | 50 | 45 | 100 | 33 | 46 |
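Several of the better-performing methods in Table 2 are simple moment-matching procedures. As a reading aid, log-mean-mean (logMM) linking can be sketched in a few lines, assuming the identification in which group 1 is standard normal and the separate group-2 calibration yields $a_{i2}=\sigma_2 a_i$ and $b_{i2}=(b_i-\mu_2)/\sigma_2$ for the common items (the function name is illustrative, not taken from the paper):

```python
from math import exp, log
from statistics import mean

def log_mean_mean_link(a1, b1, a2, b2):
    """logMM linking of two separately calibrated 2PL parameter sets for the
    common items: returns the linked mean and SD (mu2, sigma2) of group 2.
    a1, b1: discriminations/difficulties from group 1; a2, b2: from group 2."""
    # The ratio of geometric mean discriminations identifies sigma2.
    sigma2 = exp(mean(log(a) for a in a2) - mean(log(a) for a in a1))
    # From b_i1 = sigma2 * b_i2 + mu2, averaging over items identifies mu2.
    mu2 = mean(b1) - sigma2 * mean(b2)
    return mu2, sigma2
```

Mean-mean (MM) linking replaces the geometric mean of the discriminations with the arithmetic mean; the difficulty step is the same.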

**Table 3.** Bias and RMSE for mean ${\widehat{\mu}}_{2}$ and standard deviation ${\widehat{\sigma}}_{2}$ for the second group for a sample size $N=1000$ and $I=40$ items as a function of the type of differential item functioning and linking method.

| Method | Bias: NODIF (${\tau}_{b}=0$, ${\tau}_{a}=0$) | Bias: UDIF (${\tau}_{b}=0.5$, ${\tau}_{a}=0$) | Bias: NUDIF (${\tau}_{b}=0.5$, ${\tau}_{a}=0.25$) | RMSE: NODIF (${\tau}_{b}=0$, ${\tau}_{a}=0$) | RMSE: UDIF (${\tau}_{b}=0.5$, ${\tau}_{a}=0$) | RMSE: NUDIF (${\tau}_{b}=0.5$, ${\tau}_{a}=0.25$) |
|---|---|---|---|---|---|---|
| **Mean ${\widehat{\mu}}_{2}$** | | | | | | |
| logMM | 0.000 | 0.007 | 0.008 | 108.2 | 104.4 | 106.1 |
| HAB | 0.000 | 0.007 | 0.008 | 108.2 | 104.4 | 106.1 |
| MM | 0.000 | 0.007 | 0.007 | 108.1 | 103.7 | 104.7 |
| HAB-nolog | 0.001 | 0.007 | 0.007 | 108.5 | 103.5 | 104.5 |
| IA2 | −0.001 | 0.001 | 0.045 | 103.2 | 107.5 | 133.3 |
| HAE-asymm | −0.002 | −0.030 | −0.032 | 102.3 | 100.0 | 100.0 |
| HAE-symm | −0.001 | 0.002 | 0.005 | 102.7 | 105.0 | 105.2 |
| HAE-joint | −0.002 | 0.067 | 0.064 | 100.9 | 136.1 | 132.4 |
| RC1 | −0.001 | 0.001 | 0.028 | 100.2 | 104.8 | 120.5 |
| RC2 | −0.006 | −0.004 | −0.022 | 100.0 | 104.0 | 100.1 |
| RC3 | −0.003 | −0.001 | 0.002 | 100.1 | 103.9 | 109.4 |
| ANCH | −0.003 | −0.004 | −0.021 | 101.4 | 104.2 | 103.9 |
| CC | −0.002 | 0.095 | 0.109 | 101.3 | 149.2 | 157.7 |
| **Standard deviation ${\widehat{\sigma}}_{2}$** | | | | | | |
| logMM | 0.000 | 0.003 | 0.008 | 110.2 | 112.6 | 128.9 |
| HAB | 0.000 | 0.003 | 0.008 | 110.2 | 112.6 | 129.4 |
| MM | −0.001 | 0.001 | 0.005 | 108.5 | 109.4 | 107.7 |
| HAB-nolog | 0.001 | 0.002 | 0.007 | 100.0 | 100.0 | 100.0 |
| IA2 | 0.009 | 0.009 | 0.147 | 113.2 | 111.6 | 197.9 |
| HAE-asymm | −0.002 | −0.120 | −0.134 | 107.2 | 378.8 | 185.6 |
| HAE-symm | 0.001 | −0.003 | 0.003 | 108.3 | 233.7 | 119.9 |
| HAE-joint | −0.001 | 0.020 | 0.029 | 107.5 | 317.0 | 146.6 |
| RC1 | 0.006 | 0.008 | 0.105 | 109.8 | 243.8 | 174.5 |
| RC2 | −0.009 | −0.008 | −0.097 | 108.5 | 217.2 | 148.3 |
| RC3 | −0.002 | 0.000 | 0.002 | 106.6 | 228.3 | 110.2 |
| ANCH | −0.009 | −0.008 | −0.097 | 108.5 | 217.2 | 148.3 |
| CC | −0.001 | 0.015 | 0.029 | 107.4 | 220.4 | 129.0 |
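The RMSE entries in Table 3 appear to be reported relative to the best-performing method within each column (each column attains a minimum of 100.0). A minimal sketch of computing bias and such a relative RMSE from replicated estimates (hypothetical helper names, not the study's code):

```python
from math import sqrt
from statistics import mean

def bias_and_relative_rmse(estimates, true_value):
    """estimates: dict mapping linking method -> list of estimates of the same
    target (e.g., mu2) across simulation replications. Returns per-method bias
    and RMSE, the latter rescaled so the best (smallest) method equals 100."""
    bias = {m: mean(e) - true_value for m, e in estimates.items()}
    rmse = {m: sqrt(mean((x - true_value) ** 2 for x in e))
            for m, e in estimates.items()}
    best = min(rmse.values())
    return bias, {m: 100 * r / best for m, r in rmse.items()}
```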

**Table 4.** Descriptive statistics (sample size $N$, number of items $I$, mean $M$, and standard deviation $SD$) for Austrian students in PISA 2006 (P06) and PISA 2009 (P09) for the domains Mathematics, Reading, and Science.

| Domain | $N$: P06 | $N$: P09 | $I$: P06 | $I$: P09 | $M$: P06 | $M$: P09 | $SD$: P06 | $SD$: P09 |
|---|---|---|---|---|---|---|---|---|
| Mathematics | 3784 | 4575 | 48 | 35 | 506.8 | 495.9 | 96.8 | 96.1 |
| Reading | 2646 | 6585 | 27 | 99 | 491.2 | 470.3 | 107.7 | 100.1 |
| Science | 4927 | 4577 | 103 | 53 | 511.7 | 494.3 | 97.3 | 101.8 |

**Table 5.** Difference in means between PISA 2009 and PISA 2006 for Austrian students for the domains Mathematics, Reading, and Science for the 1PL and the 2PL model as a function of the linking method.

| Method | Mathematics: 1PL | Mathematics: 2PL | Reading: 1PL | Reading: 2PL | Science: 1PL | Science: 2PL |
|---|---|---|---|---|---|---|
| logMM | −15.5 | −12.4 | −5.8 | −6.3 | −14.7 | −16.8 |
| HAB | −15.5 | −12.4 | −5.8 | −6.3 | −14.7 | −16.8 |
| MM | −15.5 | −12.4 | −5.8 | −6.3 | −14.7 | −16.7 |
| HAB-nolog | −15.5 | −12.3 | −6.0 | −6.3 | −14.5 | −16.6 |
| IA2 | −15.5 | −15.9 | −5.8 | −6.1 | −14.7 | −11.6 |
| HAE-asymm | −14.4 | −14.6 | −4.9 | −6.4 | −14.2 | −15.9 |
| HAE-symm | −14.6 | −15.0 | −5.0 | −6.6 | −14.2 | −15.7 |
| HAE-joint | −13.5 | −14.1 | −4.1 | −5.0 | −13.9 | −14.0 |
| RC1 | −14.3 | −14.5 | −4.4 | −5.1 | −14.0 | −13.2 |
| RC2 | −14.3 | −14.3 | −4.3 | −5.0 | −14.2 | −12.9 |
| RC3 | −14.3 | −14.4 | −4.4 | −5.0 | −14.1 | −13.1 |
| ANCH | −14.4 | −15.7 | −4.5 | −5.4 | −14.5 | −14.1 |
| CC | −14.3 | −14.9 | −4.3 | −5.3 | −14.2 | −13.6 |
| M | −14.8 | −14.1 | −5.0 | −5.8 | −14.3 | −14.7 |
| SD | 0.7 | 1.3 | 0.7 | 0.6 | 0.3 | 1.8 |
| Min | −15.5 | −15.9 | −6.0 | −6.6 | −14.7 | −16.8 |
| Max | −13.5 | −12.3 | −4.1 | −5.0 | −13.9 | −11.6 |

**Table 6.** Standard deviation for Austrian students in PISA 2009 for the domains Mathematics, Reading, and Science for the 1PL and the 2PL model as a function of the linking method.

| Method | Mathematics: 1PL | Mathematics: 2PL | Reading: 1PL | Reading: 2PL | Science: 1PL | Science: 2PL |
|---|---|---|---|---|---|---|
| logMM | 97.7 | 98.3 | 98.6 | 103.2 | 103.2 | 106.8 |
| HAB | 97.7 | 98.3 | 98.6 | 103.2 | 103.2 | 106.8 |
| MM | 97.7 | 98.7 | 98.6 | 103.8 | 103.2 | 106.9 |
| HAB-nolog | 97.9 | 99.3 | 94.6 | 102.0 | 103.9 | 108.1 |
| IA2 | 97.7 | 99.5 | 98.6 | 104.6 | 103.2 | 109.2 |
| HAE-asymm | 94.1 | 95.0 | 102.6 | 105.4 | 105.0 | 107.5 |
| HAE-symm | 95.0 | 96.2 | 103.1 | 105.9 | 105.3 | 107.8 |
| HAE-joint | 95.0 | 95.7 | 105.1 | 107.5 | 104.7 | 107.4 |
| RC1 | 96.0 | 96.9 | 103.1 | 107.2 | 103.9 | 108.6 |
| RC2 | 96.0 | 95.6 | 99.9 | 106.2 | 104.7 | 105.9 |
| RC3 | 96.0 | 96.3 | 101.5 | 106.7 | 104.3 | 107.2 |
| ANCH | 96.0 | 95.6 | 99.9 | 106.2 | 104.7 | 105.9 |
| CC | 95.9 | 96.7 | 101.3 | 106.4 | 104.1 | 107.5 |
| M | 96.3 | 97.1 | 100.4 | 105.2 | 104.1 | 107.4 |
| SD | 1.2 | 1.5 | 2.7 | 1.7 | 0.7 | 0.9 |
| Min | 94.1 | 95.0 | 94.6 | 102.0 | 103.2 | 105.9 |
| Max | 97.9 | 99.5 | 105.1 | 107.5 | 105.3 | 109.2 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Robitzsch, A.
A Comparison of Linking Methods for Two Groups for the Two-Parameter Logistic Item Response Model in the Presence and Absence of Random Differential Item Functioning. *Foundations* **2021**, *1*, 116-144.
https://doi.org/10.3390/foundations1010009
