# Explanatory Cognitive Diagnosis Models Incorporating Item Features


## Abstract


## 1. Introduction

## 2. Theoretical Framework

#### 2.1. Explanatory CDM

#### 2.2. Linking Item Features to Item Psychometric Properties

## 3. The Proposed Model

#### 3.1. Model Specification

#### 3.2. Model Constraints and Identification

## 4. Empirical Data Analysis

#### 4.1. Data

#### 4.2. Feature Engineering

#### 4.2.1. Text Preprocessing

#### 4.2.2. Feature Extraction

#### 4.3. Model Estimation

## 5. Results

**Model fit.** A posterior predictive model check (Guttman 1967; Rubin 1981, 1984) was conducted to evaluate data–model fit. The posterior predictive p-value (PPP) of the sum of squares of standardized residuals, a discrepancy measure between the data and the model, was calculated. An extremely small PPP value indicates poor fit, and this study regards PPP < 0.05 as a sign of bad model–data fit. Additionally, the deviance information criterion (DIC; Spiegelhalter et al. 2002) was used to evaluate relative model fit. According to the PPP values shown in Table 1, all five data-fitting models show acceptable model–data fit. The DIC results indicate that the IE-HO-DINA models (i.e., those without a residual term) fit the data worse than the HO-DINA model or the IE-HO-DINA-R models, possibly because the item features predict the item parameters imperfectly. In contrast, the IE-HO-DINA-R models (i.e., those with a residual term) fit the data better than the HO-DINA model. A possible reason is that, while the likelihoods of the HO-DINA model and the IE-HO-DINA-R models were expected to be comparable, the IE-HO-DINA-R models contain fewer parameters than the HO-DINA model and are thus penalized less for model complexity.
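As a concrete illustration of the discrepancy measure behind the PPP, the sketch below computes a posterior predictive p-value from the sum of squared standardized residuals. This is not the authors' implementation (their analyses were run in JAGS via R2jags); the function name and array shapes are illustrative assumptions.

```python
import numpy as np

def ppp_value(y, p_draws, seed=0):
    """Posterior predictive p-value (illustrative helper, not the paper's code).

    y       : (N, J) observed 0/1 response matrix
    p_draws : (D, N, J) posterior draws of P(y_ij = 1) under the model
    """
    rng = np.random.default_rng(seed)
    exceed = 0
    for p in p_draws:
        sd = np.sqrt(p * (1 - p))
        d_obs = np.sum(((y - p) / sd) ** 2)      # discrepancy for the observed data
        y_rep = rng.binomial(1, p)               # replicated data drawn from the model
        d_rep = np.sum(((y_rep - p) / sd) ** 2)  # discrepancy for the replicated data
        exceed += int(d_rep >= d_obs)
    return exceed / len(p_draws)  # PPP < 0.05 would flag bad model-data fit
```

A PPP near 0.5 indicates that the observed discrepancy is typical of data the model itself generates, which is what the values around 0.44–0.48 in Table 1 reflect.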

**The relationship between item features and item parameters.** The ${\gamma}_{m}$ and ${\phi}_{m}$ coefficients (Table 2 and Table 3) quantify the relationships between the item features and item parameters. In this study, the item features explained around 26% and 30% of the variance in the logit of the guessing and slipping parameters, respectively. The Wald test was performed to examine the null hypothesis that a coefficient, ${\gamma}_{m}$ or ${\phi}_{m}$, equals 0. Only the “proportion of tokens with six or more letters” feature is statistically significant across all the models. Specifically, this feature is negatively related to the guessing parameter but positively related to the slipping parameter.
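The Wald test here amounts to comparing each estimate to its standard error against a standard normal reference. A minimal sketch (the helper name is an assumption; the example input is the IE-HO-DINA-g coefficient for the "proportion of tokens with six or more letters" feature from Table 2, estimate −2.44, SE 0.30):

```python
import math

def wald_test(estimate, se):
    """Two-sided Wald z-test of H0: coefficient = 0 (illustrative helper)."""
    z = estimate / se
    # two-sided p-value under the standard normal: p = 2 * (1 - Phi(|z|))
    p = 1 - math.erf(abs(z) / math.sqrt(2))
    return z, p

z, p = wald_test(-2.44, 0.30)
# |z| is about 8.1, so this coefficient is significant at any conventional level
```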

**Consistency of item parameter estimates and attribute profile classifications.** The estimated guessing and slipping parameters from the HO-DINA model are highly correlated (correlation coefficients close to 1) with the predicted guessing and slipping parameters from the IE-HO-DINA-R models, but only moderately correlated (correlation coefficients ranging from 0.4 to 0.7) with those from the IE-HO-DINA models (i.e., those without residual terms). Accordingly, the attribute profile classifications from the HO-DINA model are highly consistent (consistency rate > 0.95) with those from the IE-HO-DINA-R models but relatively inconsistent (consistency rate around 0.6) with those from the IE-HO-DINA models. The item parameter correlations and attribute classification consistency rates among the models are listed in Tables S5 and S6 in the Online Supplementary Materials.
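The consistency rate used here can be read as the proportion of examinees who receive the same full attribute profile under two models. A minimal sketch, assuming each model's classifications are stored as an (N examinees × K attributes) binary matrix (the function name is illustrative):

```python
import numpy as np

def profile_consistency(profiles_a, profiles_b):
    """Proportion of examinees whose entire attribute profile agrees
    between two models (illustrative helper)."""
    return float(np.mean(np.all(profiles_a == profiles_b, axis=1)))

# Toy example: two of three examinees get identical profiles
a = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 1]])
b = np.array([[1, 0, 1], [0, 1, 1], [0, 1, 1]])
profile_consistency(a, b)  # 2 of 3 profiles match, i.e., about 0.667
```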

## 6. A Simulation Study

#### 6.1. Results

## 7. Summary and Discussion

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Note

1. The computing time is based on analyses run on a desktop with an Intel Core i7 CPU and a 3.2 GHz processor. Multiple MCMC chains were run in parallel on multiple cores. The sample size and number of items were set up similarly to those in the empirical data analysis section.

## References

- Ayers, Elizabeth, Sophia Rabe-Hesketh, and Rebecca Nugent. 2013. Incorporating Student Covariates in Cognitive Diagnosis Models. Journal of Classification 30: 195–224. [Google Scholar] [CrossRef]
- Bird, Steven, and Edward Loper. 2004. NLTK: The natural language toolkit. In Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions. Barcelona: Association for Computational Linguistics, p. 31. [Google Scholar]
- Brooks, Stephen P., and Andrew Gelman. 1998. General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 7: 434–55. [Google Scholar]
- Craney, Trevor A., and James G. Surles. 2002. Model-dependent variance inflation factor cutoff values. Quality Engineering 14: 391–403. [Google Scholar] [CrossRef]
- Dale, Edgar, and Jeanne S. Chall. 1955. A formula for predicting readability: Instructions. Educational Research Bulletin 27: 37–54. [Google Scholar]
- De Boeck, Paul. 2008. Random Item IRT Models. Psychometrika 73: 533–59. [Google Scholar] [CrossRef]
- De Boeck, Paul, and Mark Wilson, eds. 2004. Explanatory Item Response Models. New York: Springer. [Google Scholar]
- de la Torre, Jimmy. 2011. The generalized DINA model framework. Psychometrika 76: 179–99. [Google Scholar] [CrossRef]
- de la Torre, Jimmy, and Jeffrey A. Douglas. 2004. Higher-order latent trait models for cognitive diagnosis. Psychometrika 69: 333–53. [Google Scholar] [CrossRef]
- de la Torre, Jimmy, Yuan Hong, and Weiling Deng. 2010. Factors affecting the item parameter estimation and classification accuracy of the DINA model. Journal of Educational Measurement 47: 227–49. [Google Scholar] [CrossRef]
- Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis: Association for Computational Linguistics, pp. 4171–86. [Google Scholar] [CrossRef]
- Drum, Priscilla A., Robert C. Calfee, and Linda K. Cook. 1981. The effects of surface structure variables on performance in reading comprehension tests. Reading Research Quarterly 16: 486–514. [Google Scholar] [CrossRef]
- Embretson, Susan E., and C. Douglas Wetzel. 1987. Component latent trait models for paragraph comprehension tests. Applied Psychological Measurement 11: 175–93. [Google Scholar] [CrossRef]
- Farrar, Donald E., and Robert R. Glauber. 1967. Multicollinearity in regression analysis: The problem revisited. The Review of Economic and Statistics 49: 92–107. [Google Scholar] [CrossRef]
- Fischer, Gerhard H. 1973. The linear logistic test model as an instrument in educational research. Acta Psychologica 37: 359–74. [Google Scholar] [CrossRef]
- Foy, Pierre, Alka Arora, and Gabrielle Stanco. 2013. TIMSS 2011 User Guide for the International Database: Released Items. Boston: TIMSS and PIRLS International Study Center. [Google Scholar]
- Guttman, Irwin. 1967. The use of the concept of a future observation in goodness-of-fit problems. Journal of the Royal Statistical Society: Series B (Methodological) 29: 83–100. [Google Scholar] [CrossRef]
- Haertel, Edward H. 1989. Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement 26: 301–21. [Google Scholar] [CrossRef]
- Henson, Robert A., Jonathan L. Templin, and John T. Willse. 2009. Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika 74: 191–210. [Google Scholar] [CrossRef]
- Hoerl, Arthur E., and Robert W. Kennard. 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12: 55–67. [Google Scholar] [CrossRef]
- Iacobucci, Dawn, Matthew J. Schneider, Deidre L. Popovich, and Georgios A. Bakamitsos. 2016. Mean centering helps alleviate “micro” but not “macro” multicollinearity. Behavior Research Methods 48: 1308–17. [Google Scholar] [CrossRef]
- Janssen, Rianne, Jan Schepers, and Deborah Peres. 2004. Models with item and item group predictors. In Explanatory Item Response Models. New York: Springer, pp. 189–212. [Google Scholar]
- Jerman, Max E., and Sanford Mirman. 1973. Linguistic and computational variables in problem solving in elementary mathematics. Educational Studies in Mathematics 5: 317–62. [Google Scholar] [CrossRef]
- Junker, Brian W., and Klaas Sijtsma. 2001. Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement 25: 258–72. [Google Scholar] [CrossRef]
- Kaplan, Mehmet. 2016. New Item Selection and Test Administration Procedures for Cognitive Diagnosis Computerized Adaptive Testing. New Brunswick: Rutgers University, Graduate School. Available online: https://rucore.libraries.rutgers.edu/rutgers-lib/49244/ (accessed on 1 August 2017).
- Lepik, Madis. 1990. Algebraic word problems: Role of linguistic and structural variables. Educational Studies in Mathematics 21: 83–90. [Google Scholar] [CrossRef]
- Ma, Wenchao, Charles Iaconangelo, and Jimmy de la Torre. 2016. Model similarity, model selection, and attribute classification. Applied Psychological Measurement 40: 200–17. [Google Scholar] [CrossRef] [PubMed]
- Macready, George B., and C. Mitchell Dayton. 1977. The use of probabilistic models in the assessment of mastery. Journal of Educational and Behavioral Statistics 2: 99–120. [Google Scholar] [CrossRef]
- O’Shea, James, Zuhair Bandar, and Keeley Crockett. 2012. A multi-classifier approach to dialogue act classification using function words. In Transactions on Computational Collective Intelligence VII. Berlin and Heidelberg: Springer, pp. 119–43. [Google Scholar]
- Paap, Muirne C. S., Qiwei He, and Bernard P. Veldkamp. 2015. Selecting testlet features with predictive value for the testlet effect: An empirical study. SAGE Open 5: 215824401558186. [Google Scholar] [CrossRef]
- Park, Yoon Soo, and Young-Sun Lee. 2014. An extension of the DINA model using covariates examining factors affecting response probability and latent classification. Applied Psychological Measurement 38: 376–90. [Google Scholar] [CrossRef]
- Park, Yoon Soo, and Young-Sun Lee. 2019. Explanatory cognitive diagnostic models. In Handbook of Diagnostic Classification Models. Cham: Springer, pp. 207–22. [Google Scholar]
- Park, Yoon Soo, K. Xing, and Young-Sun Lee. 2018. Explanatory cognitive diagnostic models: Incorporating latent and observed predictors. Applied Psychological Measurement 42: 376–92. [Google Scholar] [CrossRef] [PubMed]
- Plummer, Martyn. 2015. JAGS Version 4.0. 0 User Manual. Available online: https://sourceforge.net/projects/mcmc-jags/files/Manuals/4.x (accessed on 1 August 2017).
- Python Software Foundation. 2015. Python (2.7.10). Available online: https://www.python.org/downloads/release/python-2710/ (accessed on 1 August 2017).
- R Development Core Team. 2013. R: A Language and Environment for Statistical Computing. Available online: https://www.R-project.org/ (accessed on 1 August 2017).
- Rubin, Donald B. 1981. Estimation in parallel randomized experiments. Journal of Educational Statistics 6: 377–401. [Google Scholar] [CrossRef]
- Rubin, Donald B. 1984. Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics 12: 1151–72. [Google Scholar] [CrossRef]
- Rupp, Andre A., Jonathan L. Templin, and Robert A. Henson. 2010. Diagnostic Assessment: Theory, Methods, and Applications. New York: Guilford. [Google Scholar]
- Settles, Burr, Geoffrey T. LaFlair, and Masato Hagiwara. 2020. Machine Learning–Driven Language Assessment. Transactions of the Association for Computational Linguistics 8: 247–63. [Google Scholar] [CrossRef]
- Sorrel, Miguel A., Francisco J. Abad, Julio Olea, Jimmy de la Torre, and Juan Ramón Barrada. 2017. Inferential item-fit evaluation in cognitive diagnosis modeling. Applied Psychological Measurement 41: 614–31. [Google Scholar] [CrossRef]
- Spiegelhalter, David J., Nicola G. Best, Bradley P. Carlin, and Angelika Van Der Linde. 2002. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64: 583–639. [Google Scholar] [CrossRef]
- Su, Yu-Sung, and Masanao Yajima. 2015. R2jags: Using R to Run ‘JAGS’. R Package Version 0.5-7. Available online: https://cran.r-project.org/web/packages/R2jags/index.html (accessed on 1 August 2017).
- Templin, Jonathan L. 2004. Generalized Linear Mixed Proficiency Models. Unpublished. Doctoral dissertation, University of Illinois at Urbana-Champaign, Champaign, IL, USA. Available online: http://jtemplin.coe.uga.edu/files/presentations/jtemplin_uiuc2004c.pdf (accessed on 1 August 2017).
- von Davier, Matthias. 2005. A General Diagnostic Model Applied to Language Testing Data. RR-05-16, ETS Research Series; Princeton: Educational Testing Service. [Google Scholar]

**Figure 1.** True and estimated item feature coefficients with the guessing parameter as outcome. Two-step-4 = two-step procedure with the 4 data-generating features; Two-step-8 = two-step procedure with all 8 simulated features; IE-HO-DINA/IE-HO-DINA-R-8 = IE-HO-DINA/IE-HO-DINA-R model with all 8 simulated features; IE-HO-DINA-2-strong = IE-HO-DINA with only the 2 strong data-generating features; IE-HO-DINA-2-weak = IE-HO-DINA with only the 2 weak data-generating features.

**Figure 3.** Bias and root mean squared error of guessing parameter estimates. Items are ordered by ascending item quality (i.e., the true value of 1-s-g). The vertical gray dashed line separates the low- and high-quality items: items to its left are of low quality (1-s-g < 0.65), while items to its right are of high quality (1-s-g ≥ 0.65).

**Figure 4.** Bias and root mean squared error of slipping parameter estimates. Items are ordered by ascending item quality (i.e., the true value of 1-s-g). The vertical gray dashed line separates the low- and high-quality items: items to its left are of low quality (1-s-g < 0.65), while items to its right are of high quality (1-s-g ≥ 0.65).

**Table 1.** Model–data fit indices for the five data-fitting models.

Model | Item Parameter Linked to Item Features | Contains a Residual Term | # of Parameters | PPP | DIC
---|---|---|---|---|---
HO-DINA | - | - | 80 | 0.455 | 44,515.42
IE-HO-DINA-g | Guessing | No | 52 | 0.478 | 46,540.49
IE-HO-DINA-g-R | Guessing | Yes | 53 | 0.454 | 44,499.50
IE-HO-DINA-s | Slipping | No | 52 | 0.215 | 45,717.00
IE-HO-DINA-s-R | Slipping | Yes | 53 | 0.441 | 44,394.05

**Table 2.** Estimated item feature coefficients with the guessing parameter as outcome.

Coefficient | HO-DINA with Two-Step: Estimate | SE | IE-HO-DINA-g: Estimate | SE | IE-HO-DINA-g-R: Estimate | SE
---|---|---|---|---|---|---
Word token | −0.01 | 0.03 | −0.01 | <0.01 | −0.01 | 0.02
Number of adjectives | −0.11 | 0.18 | 0.01 | 0.02 | −0.12 | 0.17
Number of adverbs | −0.25 | 0.43 | −0.18 * | 0.06 | −0.25 | 0.40
Story or not | 0.09 | 0.45 | 0.18 * | 0.05 | 0.09 | 0.42
Item type | 0.49 | 0.45 | 0.72 * | 0.06 | 0.48 | 0.38
Proportion of tokens with six or more letters | −4.76 * | 2.34 | −2.44 * | 0.30 | −4.64 * | 2.12
Number of non-Dale–Chall words | 0.01 | 0.10 | 0.03 | 0.02 | 0.01 | 0.08
Brown News popularity | <0.01 | 0.01 | <0.01 | <0.01 | <0.01 | 0.01

**Table 3.** Estimated item feature coefficients with the slipping parameter as outcome.

Coefficient | HO-DINA with Two-Step: Estimate | SE | IE-HO-DINA-s: Estimate | SE | IE-HO-DINA-s-R: Estimate | SE
---|---|---|---|---|---|---
Word token | <0.01 | 0.03 | <0.01 | <0.01 | <0.01 | 0.03
Number of adjectives | 0.17 | 0.18 | 0.18 * | 0.03 | 0.17 | 0.15
Number of adverbs | 0.29 | 0.42 | 0.34 * | 0.05 | 0.38 | 0.40
Story or not | −0.46 | 0.43 | −0.79 * | 0.08 | −0.55 | 0.39
Item type | −0.46 | 0.45 | −0.44 * | 0.07 | −0.50 | 0.37
Proportion of tokens with six or more letters | 4.73 * | 2.28 | 7.14 * | 0.33 | 4.31 * | 1.87
Number of non-Dale–Chall words | <0.01 | 0.10 | 0.09 * | 0.02 | 0.02 | 0.09
Brown News popularity | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01

Feature Label | Properties | Distribution | True ${\mathit{\psi}}_{\mathit{m}}$ ^{a} | True ${\mathit{\phi}}_{\mathit{m}}$ ^{a}
---|---|---|---|---
Feature 1 | Continuous | Normal (0, 1) | 0.6 | −0.6
Feature 2 | Continuous | Normal (0, 1) | 0.3 | −0.3
Feature 3 | Continuous | Normal (0, 1) | 0.3 | −0.3
Feature 4 | Continuous | Normal (0, 1) | 0 | 0
Feature 5 | Continuous | Normal (0, 1) | 0 | 0
Feature 6 | Continuous | Normal (0, 1) | 0 | 0
Feature 7 | Dichotomous | Bernoulli (p = 0.5) | 0.6 | −0.6
Feature 8 | Dichotomous | Bernoulli (p = 0.5) | 0 | 0

^{a} ${\mathit{\psi}}_{\mathit{m}}$ denotes the coefficient from regressing the guessing parameter on the feature; ${\mathit{\phi}}_{\mathit{m}}$ denotes the coefficient from regressing the slipping parameter on the feature.

Research Question | Correctly Specified Model | Misspecified Models
---|---|---
RQ1 (Over-specified) | Two-step-4 (or HO-DINA) | Two-step-8; IE-HO-DINA-g-8; IE-HO-DINA-s-8; IE-HO-DINA-g-R-8; IE-HO-DINA-s-R-8
RQ2 (Under-specified) | Two-step-4 (or HO-DINA) | IE-HO-DINA-2-g-strong; IE-HO-DINA-2-s-strong; IE-HO-DINA-2-g-weak; IE-HO-DINA-2-s-weak

Explanatory Component Specification Type | Model ^{a} | Guessing Coefficients: Bias ^{b} | Guessing Coefficients: RMSE | Slipping Coefficients: Bias ^{b} | Slipping Coefficients: RMSE
---|---|---|---|---|---
Correctly specified | Two-step-4 | - | 0.03 | 0.03 | 0.03
Over-specified | Two-step-8 | - | 0.03 | - | 0.04
Over-specified | IE-HO-DINA-8 | −0.02 | 0.04 | - | 0.05
Over-specified | IE-HO-DINA-R-8 | - | 0.04 | −0.01 | 0.04
Under-specified | IE-HO-DINA-2-strong | −0.04 | 0.04 | 0.02 | 0.04
Under-specified | IE-HO-DINA-2-weak | 0.05 | 0.03 | −0.02 | 0.04

^{a} Two-step-4 = two-step procedure with the 4 data-generating features; Two-step-8 = two-step procedure with all 8 simulated features; IE-HO-DINA/IE-HO-DINA-R-8 = IE-HO-DINA/IE-HO-DINA-R model with all 8 simulated features; IE-HO-DINA-2-strong = IE-HO-DINA with only the 2 strong data-generating features; IE-HO-DINA-2-weak = IE-HO-DINA with only the 2 weak data-generating features.

^{b} The recovery of the guessing feature coefficients applies only to the IE-HO-DINA-g/IE-HO-DINA-g-R models; the recovery of the slipping feature coefficients applies only to the IE-HO-DINA-s/IE-HO-DINA-s-R models. Bias values that approach 0 (i.e., −0.01 < Bias < 0.01) are represented with “-”.

Explanatory Component Specification Type | Model | Guessing: Bias ^{b} | Guessing: RMSE | Slipping: Bias ^{b} | Slipping: RMSE
---|---|---|---|---|---
- | HO-DINA ^{a} | 0.002 | <0.001 | 0.002 | <0.001
Over-specified | IE-HO-DINA-8 | 0.001 | 0.014 | 0.001 | 0.008
Over-specified | IE-HO-DINA-R-8 | 0.002 | <0.001 | <0.001 | <0.001
Under-specified | IE-HO-DINA-2-strong | 0.004 | 0.022 | 0.003 | 0.013
Under-specified | IE-HO-DINA-2-weak | −0.002 | 0.029 | 0.006 | 0.023

^{a} The guessing/slipping parameters from the HO-DINA model were estimated rather than predicted, and the recovery of these parameter estimates was used as the baseline.

^{b} The recovery of the predicted guessing probabilities applies only to the IE-HO-DINA-g/IE-HO-DINA-g-R models; the recovery of the predicted slipping probabilities applies only to the IE-HO-DINA-s/IE-HO-DINA-s-R models.

Explanatory Component Specification Type | Model | PCCR | ACCR: A1 | ACCR: A2 | ACCR: A3
---|---|---|---|---|---
- | HO-DINA | 0.932 | 0.933 | 0.999 | 1.000
Over-specified | IE-HO-DINA-g-8 | 0.915 | 0.916 | 0.998 | 1.000
Over-specified | IE-HO-DINA-s-8 | 0.922 | 0.923 | 0.998 | 1.000
Over-specified | IE-HO-DINA-g-R-8 | 0.931 | 0.932 | 0.999 | 1.000
Over-specified | IE-HO-DINA-s-R-8 | 0.931 | 0.932 | 0.999 | 1.000
Under-specified | IE-HO-DINA-2-g-strong | 0.913 | 0.915 | 0.999 | 1.000
Under-specified | IE-HO-DINA-2-s-strong | 0.914 | 0.916 | 0.998 | 1.000
Under-specified | IE-HO-DINA-2-g-weak | 0.870 | 0.871 | 0.998 | 1.000
Under-specified | IE-HO-DINA-2-s-weak | 0.908 | 0.910 | 0.998 | 1.000

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Liao, M.; Jiao, H.; He, Q.
Explanatory Cognitive Diagnosis Models Incorporating Item Features. *J. Intell.* **2024**, *12*, 32.
https://doi.org/10.3390/jintelligence12030032
