# Comparing Robust Linking and Regularized Estimation for Linking Two Groups in the 1PL and 2PL Models in the Presence of Sparse Uniform Differential Item Functioning


## Abstract


## 1. Introduction

## 2. Two-Group Comparison under Sparse DIF

#### 2.1. Concurrent Calibration

#### 2.1.1. 1PL Model

#### 2.1.2. 2PL Model

#### 2.2. Regularization Approaches

#### 2.2.1. 1PL Model

#### 2.2.2. 2PL Model

#### 2.3. Robust Linking Approaches

#### 2.3.1. 1PL Model

#### Robust Linking Using the ${L}_{p}$ Loss Function

#### Robust Linking Using the MAD Statistic

#### 2.3.2. 2PL Model

#### Robust Linking Using ${L}_{p}$ Loss Function or MAD Statistic

#### Joint Haberman Linking Using Common Item Discriminations

#### Haberman Linking Based on Separate Calibration

#### 2.4. On the Relation of Robust Linking and Regularized Estimation

## 3. Simulation Study 1: DIF Effects in the 1PL Model

#### 3.1. Method

`xxirt` function in the sirt package [72]. Replication material can be found at https://osf.io/tma3f/ (accessed on 8 December 2022).

#### 3.2. Results

## 4. Focused Simulation Study 1A: Optimal Choice of Two Tuning Parameters for the SCAD Penalty

#### 4.1. Method

#### 4.2. Results

## 5. Simulation Study 2: Uniform DIF Effects in the 2PL Model

#### 5.1. Method

#### 5.2. Results

## 6. Discussion

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning |
---|---|
1PL | one-parameter logistic |
2PL | two-parameter logistic |
AIC | Akaike information criterion |
BIC | Bayesian information criterion |
CC | concurrent calibration |
DIF | differential item functioning |
DWLS | diagonally weighted least squares |
FIPC | fixed item parameter calibration |
IPD | item parameter drift |
IRT | item response theory |
JK | jackknife |
LE | linking error |
LSA | large-scale assessment studies |
MAD | median absolute deviation |
PISA | programme for international student assessment |
RMSE | root mean square error |
SCAD | smoothly clipped absolute deviation |

## References

- Van der Linden, W.J.; Hambleton, R.K. (Eds.) Handbook of Modern Item Response Theory; Springer: New York, NY, USA, 1997.
- Van der Linden, W.J. (Ed.) Unidimensional logistic response models. In Handbook of Item Response Theory, Volume 1: Models; CRC Press: Boca Raton, FL, USA, 2016; pp. 11–30.
- Lietz, P.; Cresswell, J.C.; Rust, K.F.; Adams, R.J. (Eds.) Implementation of Large-scale Education Assessments; Wiley: New York, NY, USA, 2017.
- Rutkowski, L.; von Davier, M.; Rutkowski, D. (Eds.) A Handbook of International Large-scale Assessment: Background, Technical Issues, and Methods of Data Analysis; Chapman Hall/CRC Press: London, UK, 2013.
- OECD. PISA 2018. Technical Report; OECD: Paris, France, 2020; Available online: https://bit.ly/3zWbidA (accessed on 8 December 2022).
- Yen, W.M.; Fitzpatrick, A.R. Item response theory. In Educational Measurement; Brennan, R.L., Ed.; Praeger Publishers: Westport, CT, USA, 2006; pp. 111–154.
- Rasch, G. Probabilistic Models for Some Intelligence and Attainment Tests; Danish Institute for Educational Research: Copenhagen, Denmark, 1960.
- Birnbaum, A. Some latent trait models and their use in inferring an examinee’s ability. In Statistical Theories of Mental Test Scores; Lord, F.M., Novick, M.R., Eds.; MIT Press: Reading, MA, USA, 1968; pp. 397–479.
- Mellenbergh, G.J. Item bias and item response theory. Int. J. Educ. Res. **1989**, 13, 127–143.
- Millsap, R.E. Statistical Approaches to Measurement Invariance; Routledge: New York, NY, USA, 2011; p. 3821961.
- Holland, P.W.; Wainer, H. (Eds.) Differential Item Functioning: Theory and Practice; Lawrence Erlbaum: Hillsdale, NJ, USA, 1993.
- Penfield, R.D.; Camilli, G. Differential item functioning and item bias. In Handbook of Statistics, Vol. 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 125–167.
- Robitzsch, A. A comparison of linking methods for two groups for the two-parameter logistic item response model in the presence and absence of random differential item functioning. Foundations **2021**, 1, 116–144.
- De Boeck, P. Random item IRT models. Psychometrika **2008**, 73, 533–559.
- Frederickx, S.; Tuerlinckx, F.; De Boeck, P.; Magis, D. RIM: A random item mixture model to detect differential item functioning. J. Educ. Meas. **2010**, 47, 432–457.
- Byrne, B.M.; Shavelson, R.J.; Muthén, B. Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychol. Bull. **1989**, 105, 456–466.
- Lee, S.S.; von Davier, M. Improving measurement properties of the PISA home possessions scale through partial invariance modeling. Psychol. Test Assess. Model. **2020**, 62, 55–83. Available online: https://bit.ly/3FRN6Qf (accessed on 8 December 2022).
- Magis, D.; Tuerlinckx, F.; De Boeck, P. Detection of differential item functioning using the lasso approach. J. Educ. Behav. Stat. **2015**, 40, 111–135.
- Tutz, G.; Schauberger, G. A penalty approach to differential item functioning in Rasch models. Psychometrika **2015**, 80, 21–43.
- Chen, Y.; Li, C.; Xu, G. DIF statistical inference and detection without knowing anchoring items. arXiv **2021**, arXiv:2110.11112.
- Halpin, P.F. Differential item functioning via robust scaling. arXiv **2022**, arXiv:2207.04598.
- Magis, D.; De Boeck, P. Identification of differential item functioning in multiple-group settings: A multivariate outlier detection approach. Multivar. Behav. Res. **2011**, 46, 733–755.
- Magis, D.; De Boeck, P. A robust outlier approach to prevent type I error inflation in differential item functioning. Educ. Psychol. Meas. **2012**, 72, 291–311.
- Wang, W.; Liu, Y.; Liu, H. Testing differential item functioning without predefined anchor items using robust regression. J. Educ. Behav. Stat. **2022**, 47, 666–692.
- Robitzsch, A. Robust and nonrobust linking of two groups for the Rasch model with balanced and unbalanced random DIF: A comparative simulation study and the simultaneous assessment of standard errors and linking errors with resampling techniques. Symmetry **2021**, 13, 2198.
- Robitzsch, A.; Lüdtke, O. A review of different scaling approaches under full invariance, partial invariance, and noninvariance for cross-sectional country comparisons in large-scale assessments. Psychol. Test Assess. Model. **2020**, 62, 233–279. Available online: https://bit.ly/3ezBB05 (accessed on 8 December 2022).
- Fan, J.; Li, R.; Zhang, C.H.; Zou, H. Statistical Foundations of Data Science; Chapman and Hall/CRC: Boca Raton, FL, USA, 2020.
- Hastie, T.; Tibshirani, R.; Wainwright, M. Statistical Learning with Sparsity: The Lasso and Generalizations; CRC Press: Boca Raton, FL, USA, 2015.
- Chen, Y.; Li, X.; Liu, J.; Ying, Z. Robust measurement via a fused latent and graphical item response theory model. Psychometrika **2018**, 83, 538–562.
- Sun, J.; Chen, Y.; Liu, J.; Ying, Z.; Xin, T. Latent variable selection for multidimensional item response theory models via ${L}_{1}$ regularization. Psychometrika **2016**, 81, 921–939.
- Geminiani, E.; Marra, G.; Moustaki, I. Single- and multiple-group penalized factor analysis: A trust-region algorithm approach with integrated automatic multiple tuning parameter selection. Psychometrika **2021**, 86, 65–95.
- Huang, P.H.; Chen, H.; Weng, L.J. A penalized likelihood method for structural equation modeling. Psychometrika **2017**, 82, 329–354.
- Jacobucci, R.; Grimm, K.J.; McArdle, J.J. Regularized structural equation modeling. Struct. Equ. Model. **2016**, 23, 555–566.
- Chen, Y.; Li, X.; Liu, J.; Ying, Z. Regularized latent class analysis with application in cognitive diagnosis. Psychometrika **2017**, 82, 660–692.
- Robitzsch, A.; George, A.C. The R package CDM for diagnostic modeling. In Handbook of Diagnostic Classification Models; von Davier, M., Lee, Y.S., Eds.; Springer: Cham, Switzerland, 2019; pp. 549–572.
- Robitzsch, A. Regularized latent class analysis for polytomous item responses: An application to SPM-LS data. J. Intell. **2020**, 8, 30.
- Fop, M.; Murphy, T.B. Variable selection methods for model-based clustering. Stat. Surv. **2018**, 12, 18–65.
- Robitzsch, A. Regularized mixture Rasch model. Information **2022**, 13, 534.
- Belzak, W.C. The multidimensionality of measurement bias in high-stakes testing: Using machine learning to evaluate complex sources of differential item functioning. Educ. Meas. 2022; epub ahead of print.
- Belzak, W.; Bauer, D.J. Improving the assessment of measurement invariance: Using regularization to select anchor items and identify differential item functioning. Psychol. Methods **2020**, 25, 673–690.
- Bauer, D.J.; Belzak, W.C.M.; Cole, V.T. Simplifying the assessment of measurement invariance over multiple background variables: Using regularized moderated nonlinear factor analysis to detect differential item functioning. Struct. Equ. Model. **2020**, 27, 43–55.
- Gürer, C.; Draxler, C. Penalization approaches in the conditional maximum likelihood and Rasch modelling context. Brit. J. Math. Stat. Psychol. 2022; epub ahead of print.
- Liang, X.; Jacobucci, R. Regularized structural equation modeling to detect measurement bias: Evaluation of lasso, adaptive lasso, and elastic net. Struct. Equ. Model. **2020**, 27, 722–734.
- Schauberger, G.; Mair, P. A regularization approach for the detection of differential item functioning in generalized partial credit models. Behav. Res. Methods **2020**, 52, 279–294.
- Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. **2001**, 96, 1348–1360.
- Chen, Y.; Liu, J.; Xu, G.; Ying, Z. Statistical analysis of Q-matrix based diagnostic classification models. J. Am. Stat. Assoc. **2015**, 110, 850–866.
- Umezu, Y.; Shimizu, Y.; Masuda, H.; Ninomiya, Y. AIC for the non-concave penalized likelihood method. Ann. Inst. Stat. Math. **2019**, 71, 247–274.
- Zhang, H.; Li, S.J.; Zhang, H.; Yang, Z.Y.; Ren, Y.Q.; Xia, L.Y.; Liang, Y. Meta-analysis based on nonconvex regularization. Sci. Rep. **2020**, 10, 5755.
- Breheny, P.; Huang, J. Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. **2011**, 5, 232–253.
- Xiao, H.; Sun, Y. On tuning parameter selection in model selection and model averaging: A Monte Carlo study. J. Risk Financ. Manag. **2019**, 12, 109.
- Williams, D.R. Beyond lasso: A survey of nonconvex regularization in Gaussian graphical models. PsyArXiv **2020**.
- Battauz, M. Regularized estimation of the nominal response model. Multivar. Behav. Res. **2020**, 55, 811–824.
- Oelker, M.R.; Tutz, G. A uniform framework for the combination of penalties in generalized structured models. Adv. Data Anal. Classif. **2017**, 11, 97–120.
- Tutz, G.; Gertheiss, J. Regularized regression for categorical data. Stat. Model. **2016**, 16, 161–200.
- Kolen, M.J.; Brennan, R.L. Test Equating, Scaling, and Linking; Springer: New York, NY, USA, 2014.
- Lee, W.C.; Lee, G. IRT linking and equating. In The Wiley Handbook of Psychometric Testing: A Multidisciplinary Reference on Survey, Scale and Test; Irwing, P., Booth, T., Hughes, D.J., Eds.; Wiley: New York, NY, USA, 2018; pp. 639–673.
- Sansivieri, V.; Wiberg, M.; Matteucci, M. A review of test equating methods with a special focus on IRT-based approaches. Statistica **2017**, 77, 329–352.
- Robitzsch, A. Robust Haebara linking for many groups: Performance in the case of uniform DIF. Psych **2020**, 2, 155–173.
- Pokropek, A.; Lüdtke, O.; Robitzsch, A. An extension of the invariance alignment method for scale linking. Psychol. Test Assess. Model. **2020**, 62, 303–334. Available online: https://bit.ly/2UEp9GH (accessed on 8 December 2022).
- Robitzsch, A. ${L}_{p}$ loss functions in invariance alignment and Haberman linking with few or many groups. Stats **2020**, 3, 246–283.
- Manna, V.F.; Gu, L. Different Methods of Adjusting for form Difficulty under the Rasch Model: Impact on Consistency of Assessment Results; (Research Report No. RR-19-08); Educational Testing Service: Princeton, NJ, USA, 2019.
- Asparouhov, T.; Muthén, B. Multiple-group factor analysis alignment. Struct. Equ. Model. **2014**, 21, 495–508.
- Muthén, B.; Asparouhov, T. IRT studies of many groups: The alignment method. Front. Psychol. **2014**, 5, 978.
- von Davier, M.; Bezirhan, U. A robust method for detecting item misfit in large scale assessments. Educ. Psychol. Meas. 2022; epub ahead of print.
- Huynh, H.; Meyer, P. Use of robust z in detecting unstable items in item response theory models. Pract. Assess. Res. Eval. **2010**, 15, 2.
- Liu, C.; Jurich, D. Outlier detection using t-test in Rasch IRT equating under NEAT design. Appl. Psychol. Meas. 2022; epub ahead of print.
- Battauz, M. Multiple equating of separate IRT calibrations. Psychometrika **2017**, 82, 610–636.
- Haberman, S.J. Linking Parameter Estimates Derived from an Item Response Model through Separate Calibrations; (Research Report No. RR-09-40); Educational Testing Service: Princeton, NJ, USA, 2009.
- Liu, X.; Wallin, G.; Chen, Y.; Moustaki, I. Rotation to sparse loadings using ${L}^{p}$ losses and related inference problems. arXiv **2022**, arXiv:2206.02263.
- R Core Team. R: A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2022. Available online: https://www.R-project.org/ (accessed on 11 January 2022).
- Robitzsch, A.; Kiefer, T.; Wu, M. TAM: Test Analysis Modules. R Package Version 4.1-4. 2022. Available online: https://CRAN.R-project.org/package=TAM (accessed on 28 August 2022).
- Robitzsch, A. sirt: Supplementary Item Response Theory Models. R Package Version 3.12-66. 2022. Available online: https://CRAN.R-project.org/package=sirt (accessed on 17 May 2022).
- Frey, A.; Hartig, J.; Rupp, A.A. An NCME instructional module on booklet designs in large-scale assessments of student achievement: Theory and practice. Educ. Meas. **2009**, 28, 39–53.
- Robitzsch, A.; Lüdtke, O. Mean comparisons of many groups in the presence of DIF: An evaluation of linking and concurrent scaling approaches. J. Educ. Behav. Stat. **2022**, 47, 36–68.
- Camilli, G. The case against item bias detection techniques based on internal criteria: Do item bias procedures obscure test fairness issues? In Differential Item Functioning: Theory and Practice; Holland, P.W., Wainer, H., Eds.; Erlbaum: Hillsdale, NJ, USA, 1993; pp. 397–417.
- El Masri, Y.H.; Andrich, D. The trade-off between model fit, invariance, and validity: The case of PISA science assessments. Appl. Meas. Educ. **2020**, 33, 174–188.
- Robitzsch, A.; Lüdtke, O. Some thoughts on analytical choices in the scaling model for test scores in international large-scale assessment studies. Meas. Instrum. Soc. Sci. **2022**, 4, 9.
- Brennan, R.L. Misconceptions at the intersection of measurement theory and practice. Educ. Meas. **1998**, 17, 5–9.

**Figure 1.** SCAD penalty function ${\mathcal{P}}_{\mathrm{SCAD}}$ for different values of $a$ for $\lambda =0.2$.
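The curves described in the caption of Figure 1 can be reproduced numerically from the standard piecewise definition of the SCAD penalty given by Fan and Li (2001, listed in the references). The following is a minimal Python sketch under that assumption, not the code used to draw the figure. The function name `scad_penalty` and the evaluation grid are illustrative; the values of $a$ are taken from the column headers of Table 3.

```python
def scad_penalty(x, lam, a=3.7):
    """Standard SCAD penalty (Fan & Li, 2001); requires a > 2 and lam > 0."""
    if a <= 2:
        raise ValueError("SCAD requires a > 2")
    ax = abs(x)
    if ax <= lam:
        # Lasso-like linear part near zero
        return lam * ax
    if ax <= a * lam:
        # Quadratic transition that flattens the penalty
        return -(ax ** 2 - 2 * a * lam * ax + lam ** 2) / (2 * (a - 1))
    # Constant part: large effects are not penalized further
    return (a + 1) * lam ** 2 / 2


if __name__ == "__main__":
    lam = 0.2  # value of lambda stated in the caption of Figure 1
    for a in (2.2, 2.5, 3, 3.7, 4.5, 6, 9):  # a values listed in Table 3
        values = [round(scad_penalty(x / 10, lam, a), 4) for x in range(0, 21, 5)]
        print(f"a = {a}: {values}")
```

Smaller values of $a$ reach the constant plateau $(a+1)\lambda^2/2$ earlier, so large DIF effects stop being shrunk sooner; larger values of $a$ make the penalty behave more like the lasso over a wider range.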

**Table 1.** Simulation Study 1: Bias of estimated group means for balanced and unbalanced DIF effects as a function of the size of DIF effects $\delta $ and sample size N.

$\mathit{\delta}$ | N | MAD | $\lambda$: AIC | $\lambda$: BIC | $\lambda = 0.05$ | $\lambda = 0.10$ | $\lambda = 0.15$ | ${\mathit{L}}_{0.5}$ | ${\mathit{L}}_{1}$ | ${\mathit{L}}_{2}$ | CC |
---|---|---|---|---|---|---|---|---|---|---|---|
Balanced DIF | | | | | | | | | | | |
0.5 | 500 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
0.5 | 1000 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
0.5 | 2500 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
0.5 | 5000 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | −0.01 | 0.00 | 0.00 | 0.00 | 0.00 |
1.0 | 500 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
1.0 | 1000 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
1.0 | 2500 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
1.0 | 5000 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Unbalanced DIF | | | | | | | | | | | |
0.5 | 500 | −0.06 | −0.02 | −0.03 | −0.04 | −0.02 | −0.06 | −0.03 | −0.05 | −0.10 | −0.09 |
0.5 | 1000 | −0.03 | −0.02 | −0.01 | −0.01 | −0.01 | −0.08 | −0.02 | −0.04 | −0.10 | −0.10 |
0.5 | 2500 | 0.00 | −0.02 | −0.01 | −0.01 | −0.01 | −0.09 | −0.01 | −0.02 | −0.10 | −0.10 |
0.5 | 5000 | 0.00 | −0.01 | −0.01 | −0.01 | −0.01 | −0.09 | −0.01 | −0.02 | −0.10 | −0.10 |
1.0 | 500 | −0.01 | −0.02 | 0.00 | −0.05 | 0.00 | 0.00 | −0.02 | −0.05 | −0.20 | −0.18 |
1.0 | 1000 | 0.00 | −0.02 | 0.00 | −0.02 | 0.00 | 0.00 | −0.01 | −0.04 | −0.20 | −0.18 |
1.0 | 2500 | 0.00 | −0.01 | 0.00 | −0.02 | 0.00 | 0.00 | −0.01 | −0.02 | −0.20 | −0.18 |
1.0 | 5000 | 0.00 | −0.01 | 0.00 | −0.03 | 0.00 | 0.00 | −0.01 | −0.02 | −0.20 | −0.18 |

${L}_{p}$ = linking employing the ${L}_{p}$ loss function with p = 0.5, 1.0, or 2.0; CC = concurrent calibration assuming invariant item parameters; Absolute biases larger than 0.03 are printed in bold.
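To make the MAD and ${L}_{p}$ columns of Table 1 more concrete, the following Python sketch illustrates the general idea of robust linking in the 1PL model: a location estimate is computed from the item difficulty differences between two separately calibrated groups, so that a few items with large uniform DIF have little influence. This is an illustration under stated assumptions, not the estimator implemented in the study. The function names, the grid search, the cutoff of 2 scaled MADs, and the example differences are all hypothetical, and the sign convention of the differences is left open.

```python
import statistics


def lp_linking(diffs, p, step=0.001):
    """Grid-search minimizer of sum(|d_i - mu|^p) over mu (hypothetical helper)."""
    lo, hi = min(diffs), max(diffs)
    n_steps = int(round((hi - lo) / step))
    best_mu, best_loss = lo, float("inf")
    for k in range(n_steps + 1):
        mu = lo + k * step
        loss = sum(abs(d - mu) ** p for d in diffs)
        if loss < best_loss:
            best_mu, best_loss = mu, loss
    return best_mu


def mad_linking(diffs, cutoff=2.0):
    """Mean of differences within `cutoff` scaled MADs of the median.

    The cutoff is an assumed value chosen for illustration only.
    """
    med = statistics.median(diffs)
    # 1.4826 makes the MAD consistent with the SD under normality
    mad = 1.4826 * statistics.median([abs(d - med) for d in diffs])
    kept = [d for d in diffs if abs(d - med) <= cutoff * mad]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Hypothetical difficulty differences: eight items near 0.30,
    # two items with an additional uniform DIF effect of 1.0 (unbalanced DIF)
    diffs = [0.28, 0.31, 0.33, 0.27, 0.30, 0.32, 0.29, 0.30, 1.30, 1.30]
    for p in (0.5, 1.0, 2.0):
        print(f"L_{p}: {lp_linking(diffs, p):.3f}")
    print(f"MAD: {mad_linking(diffs):.3f}")
```

With $p = 2$ the minimizer is the arithmetic mean, which is pulled toward the DIF items, whereas $p \le 1$ and the MAD-based rule stay close to the bulk of the items. This mirrors the pattern in the unbalanced DIF rows of Table 1, where ${L}_{2}$ shows a bias similar to concurrent calibration.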

**Table 2.** Simulation Study 1: Relative root mean square error (RMSE) of estimated group means for balanced and unbalanced DIF effects as a function of the size of DIF effects $\delta $ and sample size N.

$\mathit{\delta}$ | N | MAD | $\lambda$: AIC | $\lambda$: BIC | $\lambda = 0.05$ | $\lambda = 0.10$ | $\lambda = 0.15$ | ${\mathit{L}}_{0.5}$ | ${\mathit{L}}_{1}$ | ${\mathit{L}}_{2}$ | CC |
---|---|---|---|---|---|---|---|---|---|---|---|
Balanced DIF | | | | | | | | | | | |
0.5 | 500 | 111 | 115 | 111 | 120 | 110 | 111 | 122 | 110 | 101 | 100 |
0.5 | 1000 | 111 | 112 | 106 | 109 | 108 | 118 | 115 | 108 | 101 | 100 |
0.5 | 2500 | 104 | 133 | 103 | 122 | 118 | 135 | 114 | 108 | 101 | 100 |
0.5 | 5000 | 103 | 147 | 129 | 139 | 126 | 153 | 111 | 107 | 100 | 100 |
1.0 | 500 | 107 | 111 | 105 | 115 | 106 | 104 | 118 | 109 | 102 | 100 |
1.0 | 1000 | 104 | 110 | 103 | 108 | 103 | 103 | 115 | 108 | 102 | 100 |
1.0 | 2500 | 105 | 113 | 104 | 112 | 103 | 103 | 114 | 109 | 102 | 100 |
1.0 | 5000 | 104 | 111 | 103 | 126 | 103 | 103 | 112 | 108 | 102 | 100 |
Unbalanced DIF | | | | | | | | | | | |
0.5 | 500 | 120 | 108 | 104 | 120 | 100 | 117 | 113 | 110 | 142 | 138 |
0.5 | 1000 | 124 | 126 | 100 | 122 | 108 | 164 | 117 | 119 | 196 | 187 |
0.5 | 2500 | 102 | 203 | 185 | 194 | 161 | 258 | 114 | 120 | 288 | 274 |
0.5 | 5000 | 100 | 249 | 240 | 243 | 217 | 374 | 114 | 124 | 408 | 383 |
1.0 | 500 | 108 | 109 | 100 | 141 | 101 | 100 | 122 | 121 | 270 | 247 |
1.0 | 1000 | 100 | 113 | 100 | 125 | 100 | 100 | 117 | 122 | 370 | 336 |
1.0 | 2500 | 101 | 113 | 101 | 244 | 103 | 100 | 113 | 121 | 572 | 519 |
1.0 | 5000 | 100 | 115 | 100 | 352 | 103 | 100 | 111 | 124 | 808 | 730 |

${L}_{p}$ = linking employing the ${L}_{p}$ loss function with p = 0.5, 1.0, or 2.0; CC = concurrent calibration assuming invariant item parameters; Relative RMSE values larger than 125 are printed in bold.
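The bias and relative RMSE reported in Tables 1 and 2 are summaries over simulation replications. The sketch below shows how such summaries are commonly computed. It assumes that the relative RMSE is 100 times the ratio of a method's RMSE to the RMSE of a reference method; the choice of reference, the function names, and the replication values are hypothetical and are not taken from the study.

```python
import math


def bias(estimates, true_value):
    """Average estimate across replications minus the true value."""
    return sum(estimates) / len(estimates) - true_value


def rmse(estimates, true_value):
    """Root mean square error across replications."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


def relative_rmse(rmse_by_method, reference):
    """RMSE of each method as a percentage of the reference method's RMSE."""
    ref = rmse_by_method[reference]
    return {method: 100 * value / ref for method, value in rmse_by_method.items()}


if __name__ == "__main__":
    true_mean = 0.3  # hypothetical true group mean
    # Hypothetical estimates from four replications for two methods
    replications = {
        "robust": [0.28, 0.32, 0.30, 0.30],
        "nonrobust": [0.18, 0.22, 0.20, 0.20],
    }
    rmses = {m: rmse(est, true_mean) for m, est in replications.items()}
    for m, est in replications.items():
        print(m, round(bias(est, true_mean), 3), round(rmses[m], 4))
    print(relative_rmse(rmses, reference="robust"))
```

In this toy example the biased method has a much larger relative RMSE than the reference even though both methods have the same sampling variability, which is the same mechanism that drives the large values for ${L}_{2}$ and CC under unbalanced DIF.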

**Table 3.** Focused Simulation Study 1A: Relative root mean square error (RMSE) of estimated group means for unbalanced DIF effects as a function of the size of DIF effects $\delta $ and sample size N for different values $a$ of the SCAD penalty.

$\mathit{\delta}$ | N | Best $\mathit{a}$ | Best $\mathit{\lambda}$ | AIC, $a=2.2$ | AIC, $a=2.5$ | AIC, $a=3$ | AIC, $a=3.7$ | AIC, $a=4.5$ | AIC, $a=6$ | AIC, $a=9$ | AIC, ${\mathit{a}}_{\mathbf{opt}}$ | BIC, $a=2.2$ | BIC, $a=2.5$ | BIC, $a=3$ | BIC, $a=3.7$ | BIC, $a=4.5$ | BIC, $a=6$ | BIC, $a=9$ | BIC, ${\mathit{a}}_{\mathbf{opt}}$ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.5 | 500 | 9 | 0.04 | 110.8 | 111.2 | 110.9 | 111.8 | 110.2 | 110.3 | 109.6 | 108.0 | 103.4 | 103.4 | 103.3 | 103.4 | 103.4 | 103.5 | 104.0 | 104.0 |
0.5 | 1000 | 3.7 | BIC | 128.7 | 130.1 | 129.5 | 127.7 | 129.4 | 126.1 | 122.7 | 121.4 | 100.2 | 100.0 | 100.1 | 100.0 | 100.2 | 100.1 | 100.4 | 100.2 |
0.5 | 2500 | 2.2 | 0.19 | 139.2 | 138.6 | 137.7 | 136.3 | 138.1 | 137.8 | 132.3 | 129.4 | 126.0 | 126.6 | 125.2 | 122.7 | 125.1 | 124.6 | 117.8 | 111.3 |
1 | 500 | 3.7 | 0.13 | 111.4 | 110.4 | 110.5 | 108.7 | 108.7 | 107.9 | 109.9 | 108.6 | 102.5 | 102.4 | 102.5 | 100.3 | 100.2 | 100.2 | 100.4 | 100.6 |
1 | 1000 | 3.7 | 0.13 | 113.1 | 112.5 | 112.5 | 112.9 | 112.1 | 111.2 | 108.9 | 110.3 | 100.6 | 100.6 | 100.7 | 100.6 | 100.7 | 100.6 | 100.9 | 100.8 |
1 | 2500 | 9 | 0.08 | 126.9 | 119.8 | 120.4 | 119.5 | 120.4 | 114.4 | 116.4 | 122.8 | 111.3 | 103.4 | 103.4 | 105.8 | 103.4 | 103.4 | 103.3 | 111.2 |

The AIC and BIC columns refer to the choice of $\lambda$ based on AIC or BIC for the given value of $a$.

${a}_{\mathrm{opt}}$ = choice of the optimal $a$ parameter based on AIC or BIC with the corresponding optimal $\lambda $ parameter.

**Table 4.** Simulation Study 2: Bias of estimated group means for balanced and unbalanced DIF effects as a function of the size of DIF effects $\delta $ and sample size N.

$\mathit{\delta}$ | N | MAD | $\lambda$: AIC | $\lambda$: BIC | $\lambda = 0.05$ | $\lambda = 0.10$ | $\lambda = 0.15$ | JHL, $p=0.5$ | JHL, $p=1$ | JHL, $p=2$ | HL, $p=0.5$ | HL, $p=1$ | HL, $p=2$ | ${\mathit{L}}_{0.5}$ | ${\mathit{L}}_{1}$ | ${\mathit{L}}_{2}$ | CC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Balanced DIF | | | | | | | | | | | | | | | | | |
0.5 | 500 | −0.01 | −0.01 | −0.01 | −0.02 | −0.01 | −0.04 | −0.01 | −0.01 | −0.01 | 0.00 | 0.00 | 0.01 | −0.01 | −0.01 | −0.01 | −0.04 |
0.5 | 1000 | 0.01 | 0.00 | 0.01 | 0.00 | 0.01 | −0.03 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | −0.04 |
0.5 | 2500 | −0.01 | −0.01 | −0.01 | −0.01 | 0.00 | −0.05 | −0.01 | −0.01 | −0.01 | 0.00 | 0.00 | 0.00 | −0.01 | −0.01 | −0.01 | −0.04 |
0.5 | 5000 | 0.00 | 0.00 | 0.01 | 0.01 | 0.01 | −0.04 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | −0.04 |
1.0 | 500 | −0.01 | −0.02 | −0.01 | −0.02 | −0.01 | −0.01 | −0.02 | −0.02 | −0.01 | 0.00 | 0.00 | 0.01 | −0.02 | −0.02 | −0.01 | −0.06 |
1.0 | 1000 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | −0.06 |
1.0 | 2500 | −0.01 | −0.02 | −0.01 | −0.01 | −0.01 | −0.01 | −0.02 | −0.01 | −0.01 | 0.00 | 0.00 | 0.00 | −0.02 | −0.01 | −0.01 | −0.06 |
1.0 | 5000 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | −0.06 |
Unbalanced DIF | | | | | | | | | | | | | | | | | |
0.5 | 500 | −0.07 | −0.03 | −0.04 | −0.04 | −0.03 | −0.07 | −0.04 | −0.06 | −0.10 | −0.04 | −0.06 | −0.10 | −0.04 | −0.05 | −0.10 | −0.10 |
0.5 | 1000 | −0.03 | −0.01 | 0.00 | −0.01 | −0.01 | −0.07 | −0.02 | −0.04 | −0.10 | −0.03 | −0.05 | −0.10 | −0.02 | −0.03 | −0.10 | −0.10 |
0.5 | 2500 | −0.01 | −0.03 | −0.03 | −0.03 | −0.02 | −0.10 | −0.02 | −0.04 | −0.10 | −0.02 | −0.04 | −0.10 | −0.01 | −0.03 | −0.10 | −0.10 |
0.5 | 5000 | 0.01 | −0.01 | 0.00 | −0.01 | 0.00 | −0.09 | 0.00 | −0.02 | −0.10 | −0.01 | −0.03 | −0.10 | 0.00 | −0.01 | −0.10 | −0.10 |
1.0 | 500 | −0.03 | −0.03 | −0.02 | −0.06 | −0.02 | −0.02 | −0.03 | −0.07 | −0.21 | −0.03 | −0.07 | −0.20 | −0.04 | −0.06 | −0.21 | −0.17 |
1.0 | 1000 | 0.00 | −0.01 | 0.00 | −0.02 | 0.00 | 0.00 | −0.01 | −0.04 | −0.20 | −0.02 | −0.05 | −0.20 | −0.01 | −0.03 | −0.20 | −0.17 |
1.0 | 2500 | −0.01 | −0.02 | −0.02 | −0.05 | −0.02 | −0.02 | −0.02 | −0.04 | −0.21 | −0.01 | −0.04 | −0.20 | −0.02 | −0.03 | −0.21 | −0.17 |
1.0 | 5000 | 0.00 | 0.00 | 0.00 | −0.02 | 0.00 | 0.00 | 0.00 | −0.02 | −0.20 | −0.01 | −0.03 | −0.20 | 0.00 | −0.01 | −0.20 | −0.17 |

${L}_{p}$ = linking employing the unweighted ${L}_{p}$ loss function with p = 0.5, 1.0, or 2.0 using joint item discriminations; CC = concurrent calibration assuming invariant item parameters; Absolute biases larger than 0.03 are printed in bold.

**Table 5.** Simulation Study 2: Relative root mean square error (RMSE) of estimated group means for balanced and unbalanced DIF effects as a function of the size of DIF effects $\delta $ and sample size N.

$\mathit{\delta}$ | N | MAD | $\lambda$: AIC | $\lambda$: BIC | $\lambda = 0.05$ | $\lambda = 0.10$ | $\lambda = 0.15$ | JHL, $p=0.5$ | JHL, $p=1$ | JHL, $p=2$ | HL, $p=0.5$ | HL, $p=1$ | HL, $p=2$ | ${\mathit{L}}_{0.5}$ | ${\mathit{L}}_{1}$ | ${\mathit{L}}_{2}$ | CC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Balanced DIF | | | | | | | | | | | | | | | | | |
0.5 | 500 | 108 | 119 | 109 | 130 | 111 | 120 | 112 | 103 | 100 | 127 | 115 | 125 | 115 | 105 | 100 | 113 |
0.5 | 1000 | 109 | 113 | 106 | 111 | 106 | 127 | 107 | 102 | 100 | 120 | 112 | 118 | 112 | 104 | 100 | 121 |
0.5 | 2500 | 104 | 185 | 117 | 183 | 139 | 178 | 105 | 103 | 100 | 113 | 110 | 116 | 111 | 105 | 100 | 152 |
0.5 | 5000 | 103 | 120 | 112 | 115 | 104 | 209 | 103 | 102 | 100 | 110 | 109 | 113 | 110 | 104 | 100 | 194 |
1.0 | 500 | 105 | 109 | 101 | 117 | 101 | 100 | 109 | 103 | 100 | 127 | 117 | 127 | 113 | 105 | 100 | 127 |
1.0 | 1000 | 104 | 108 | 102 | 107 | 102 | 101 | 109 | 103 | 100 | 124 | 116 | 123 | 112 | 106 | 100 | 149 |
1.0 | 2500 | 105 | 113 | 100 | 127 | 100 | 100 | 106 | 104 | 102 | 114 | 111 | 116 | 112 | 107 | 102 | 192 |
1.0 | 5000 | 103 | 108 | 100 | 108 | 100 | 100 | 102 | 101 | 100 | 113 | 113 | 118 | 110 | 104 | 100 | 258 |
Unbalanced DIF | | | | | | | | | | | | | | | | | |
0.5 | 500 | 118 | 118 | 101 | 130 | 100 | 121 | 103 | 107 | 140 | 119 | 118 | 148 | 109 | 106 | 140 | 135 |
0.5 | 1000 | 126 | 136 | 100 | 133 | 113 | 163 | 107 | 118 | 190 | 126 | 137 | 201 | 116 | 115 | 190 | 192 |
0.5 | 2500 | 105 | 293 | 272 | 299 | 212 | 269 | 107 | 134 | 288 | 118 | 146 | 284 | 113 | 123 | 288 | 276 |
0.5 | 5000 | 102 | 276 | 279 | 279 | 253 | 356 | 100 | 123 | 375 | 115 | 156 | 391 | 109 | 110 | 375 | 384 |
1.0 | 500 | 109 | 115 | 100 | 146 | 107 | 105 | 110 | 125 | 265 | 128 | 141 | 271 | 120 | 123 | 265 | 231 |
1.0 | 1000 | 100 | 114 | 105 | 131 | 105 | 105 | 105 | 122 | 359 | 124 | 145 | 366 | 113 | 118 | 359 | 315 |
1.0 | 2500 | 101 | 179 | 169 | 308 | 163 | 171 | 108 | 144 | 536 | 113 | 146 | 529 | 112 | 131 | 536 | 459 |
1.0 | 5000 | 103 | 117 | 100 | 345 | 100 | 100 | 104 | 135 | 776 | 116 | 161 | 778 | 112 | 118 | 776 | 677 |

${L}_{p}$ = linking employing the unweighted ${L}_{p}$ loss function with p = 0.5, 1.0, or 2.0 using joint item discriminations; CC = concurrent calibration assuming invariant item parameters; Relative RMSE values larger than 125 are printed in bold.


© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Robitzsch, A.
Comparing Robust Linking and Regularized Estimation for Linking Two Groups in the 1PL and 2PL Models in the Presence of Sparse Uniform Differential Item Functioning. *Stats* **2023**, *6*, 192-208.
https://doi.org/10.3390/stats6010012
