1. Introduction
When multiple groups are compared with respect to factor variables in confirmatory factor analysis (CFA), some identifying assumptions have to be made. It is frequently assumed that item parameters are equal across groups, a property denoted as measurement invariance [1]. The concept of invariance has been very prominent in psychology and the social sciences in general [2,3]. For example, in international large-scale assessment studies in education, like the Programme for International Student Assessment (PISA), the necessity of invariance is strongly emphasized [4].
For situations in which measurement invariance is violated, the invariance alignment (IA) method [5,6] (also referred to as alignment optimization [7,8]) has been proposed. The IA method tries to make item parameters as invariant as possible while allowing a few deviations from invariance. By doing so, group comparisons can be made more robust against violations of measurement invariance.
Nowadays, the IA method is frequently applied in the social sciences for analyzing questionnaire data [9,10,11,12]. Unfortunately, most methodological developments of IA (but see [13,14,15] for exceptions) are strongly coupled to the popular but commercial (and closed-source) Mplus software [16]. Previous simulation studies for one-dimensional factor models investigated the case of continuous items [5,8,17,18,19], dichotomous items [20,21], and polytomous items [14,22]. The application of IA to multidimensional factor models with continuous items has been investigated in [23,24]. Moreover, IA was studied in longitudinal models in [25,26,27]. The optimization function used in IA has also been extended to a general framework for penalized structural equation models [28].
Besides the Mplus software, there exists an alternative implementation in the R package sirt [29]. However, several researchers have pointed out that there could be subtle differences in IA results between Mplus and sirt. So far, there has been no systematic comparison of the performance of the IA implementations in Mplus and sirt. This article tries to shed some light on the subtleties of these implementation differences. It turns out that different identification constraints are likely the cause of the different results of the software packages. After changing the default identification constraint in sirt, Mplus and sirt provide much more similar results. Moreover, the results from a simulation study also question the default choices of tuning parameters in the software packages.
The rest of this article is structured as follows. In Section 2, the background of IA is reviewed. Section 3 discusses the syntax code and estimation options of IA in Mplus and sirt. In Section 4, the two software packages are compared by means of a simulation study. An empirical example is presented in Section 5. Finally, the paper closes with a discussion in Section 6.
2. Invariance Alignment
Let the random variable $X_{ig}$ denote item $i$ ($i = 1, \ldots, I$) in group $g$ ($g = 1, \ldots, G$). A one-dimensional factor model [30] is defined as
$$X_{ig} = \nu_{ig} + \lambda_{ig}\,\theta_g + e_{ig}, \tag{1}$$
where $\lambda_{ig}$ are item loadings, and $\nu_{ig}$ are item intercepts. Item loadings can be assumed to be positive. If some loading is negative, the corresponding random variable $X_{ig}$ must be multiplied by $-1$. The factor variables $\theta_g$ and all residual variables $e_{ig}$ are independent and univariate normally distributed. The factor variable $\theta_g$ has a factor mean $\mu_g$ and a factor standard deviation $\sigma_g$.
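Without additional assumptions, the parameters in (1) are not identified. An identified model is obtained by assuming a standardized latent variable $\theta_g^{*}$ (i.e., with a mean of 0 and a standard deviation of 1):
$$X_{ig} = \nu_{ig}^{*} + \lambda_{ig}^{*}\,\theta_g^{*} + e_{ig}. \tag{2}$$
The parameters in (1) and (2) are related to each other by
$$\lambda_{ig}^{*} = \lambda_{ig}\,\sigma_g \quad \text{and} \quad \nu_{ig}^{*} = \nu_{ig} + \lambda_{ig}\,\mu_g. \tag{3}$$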
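The link between the two parameterizations follows from standardizing the factor variable. The derivation below is only a sketch in assumed notation (the symbols are illustrative choices): $\theta_g$ is the factor with mean $\mu_g$ and standard deviation $\sigma_g$, $\theta_g^{*} = (\theta_g - \mu_g)/\sigma_g$ is its standardized version, $e_{ig}$ is the residual, and starred item parameters belong to the identified model.

```latex
\begin{align*}
X_{ig} &= \nu_{ig} + \lambda_{ig}\,\theta_g + e_{ig} \\
       &= \nu_{ig} + \lambda_{ig}\,(\mu_g + \sigma_g\,\theta_g^{*}) + e_{ig} \\
       &= \underbrace{(\nu_{ig} + \lambda_{ig}\,\mu_g)}_{\nu_{ig}^{*}}
        + \underbrace{(\lambda_{ig}\,\sigma_g)}_{\lambda_{ig}^{*}}\,\theta_g^{*} + e_{ig}
\end{align*}
```

Reading off the coefficients shows why only the products $\lambda_{ig}\sigma_g$ and the sums $\nu_{ig} + \lambda_{ig}\mu_g$ can be estimated from a single group without further constraints.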
In many applications, the factor means $\mu_g$ and factor standard deviations $\sigma_g$ should be compared across groups. To achieve this, a typical assumption in the social sciences is the property of measurement invariance [1,3]. Measurement invariance presupposes that item loadings $\lambda_{ig}$ and item intercepts $\nu_{ig}$ are equal across groups. That is, there exist common item loadings $\lambda_i$ such that $\lambda_{ig} = \lambda_i$ for all $g = 1, \ldots, G$ and common item intercepts $\nu_i$ such that $\nu_{ig} = \nu_i$ for all $g = 1, \ldots, G$, for all items $i = 1, \ldots, I$. The absence of measurement invariance is also labeled as differential item functioning (DIF; [2,31]) in the item response theory literature. If measurement invariance holds, (3) can be rewritten as
$$\lambda_{ig}^{*} = \lambda_i\,\sigma_g \quad \text{and} \quad \nu_{ig}^{*} = \nu_i + \lambda_i\,\mu_g. \tag{4}$$
The IA method of Asparouhov and Muthén [5,6] tackles situations with sparse violations of measurement invariance. In this case, a few item loadings or item intercepts are allowed to differ across groups, while the majority of items (approximately) fulfills the invariance assumption [32]. This situation is called partial invariance in the literature [33].
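The IA estimation method proceeds in two steps. In the first step, the one-dimensional factor model (2) is separately estimated by the maximum likelihood method for all groups. In the second step, the estimated item parameters $\hat{\lambda}_{ig}^{*}$ and $\hat{\nu}_{ig}^{*}$ ($i = 1, \ldots, I$; $g = 1, \ldots, G$) are used as the input of the IA. By rewriting (3) and inserting the estimated item loadings and item intercepts, we obtain
$$\lambda_{ig} = \frac{\hat{\lambda}_{ig}^{*}}{\sigma_g} \quad \text{and} \quad \nu_{ig} = \hat{\nu}_{ig}^{*} - \frac{\mu_g}{\sigma_g}\,\hat{\lambda}_{ig}^{*}. \tag{5}$$
These relations motivate the minimization of the following linking function in IA to determine group means $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_G)$ and standard deviations $\boldsymbol{\sigma} = (\sigma_1, \ldots, \sigma_G)$:
$$H(\boldsymbol{\mu}, \boldsymbol{\sigma}) = \sum_{i=1}^{I} \sum_{g<h} w_{i1,gh}\, \rho\!\left( \frac{\hat{\lambda}_{ig}^{*}}{\sigma_g} - \frac{\hat{\lambda}_{ih}^{*}}{\sigma_h} \right) + \sum_{i=1}^{I} \sum_{g<h} w_{i2,gh}\, \rho\!\left( \hat{\nu}_{ig}^{*} - \frac{\mu_g}{\sigma_g}\,\hat{\lambda}_{ig}^{*} - \hat{\nu}_{ih}^{*} + \frac{\mu_h}{\sigma_h}\,\hat{\lambda}_{ih}^{*} \right), \tag{6}$$
where the weights $w_{i1,gh}$ and $w_{i2,gh}$ are known, and $\rho$ is a nonnegative, symmetric loss function with $\rho(0) = 0$ that is monotonically increasing for nonnegative $x$ values. Asparouhov and Muthén [5] proposed using $\rho(x) = \sqrt{|x|}$ and $w_{i1,gh} = w_{i2,gh} = \sqrt{N_g N_h}$, where $N_g$ denotes the sample size of group $g$.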
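To make the structure of the linking function concrete, the following base-R sketch evaluates it for given candidate means and standard deviations. It is an illustration under stated assumptions, not the code used by Mplus or sirt: `align_H()` is a hypothetical helper, the inputs are G x I matrices of group-wise estimates from the standardized model, the loss is the square-root loss, and the pair weights are the square root of the product of the two group sample sizes.

```r
# Loss function for p = 0.5: rho(x) = sqrt(|x|)
rho <- function(x, p = 0.5) abs(x)^p

# align_H() is a hypothetical helper, not a function of sirt or Mplus.
# lambda0, nu0: G x I matrices of group-wise estimates from the standardized model
# mu, sigma:    candidate factor means and SDs (length G)
# N:            group sample sizes (length G)
align_H <- function(mu, sigma, lambda0, nu0, N, p = 0.5) {
  G   <- nrow(lambda0)
  lam <- lambda0 / sigma               # row g is divided by sigma[g]
  nu  <- nu0 - lambda0 * (mu / sigma)  # row g uses mu[g] / sigma[g]
  H <- 0
  for (g in seq_len(G - 1)) {
    for (h in (g + 1):G) {
      w <- sqrt(N[g] * N[h])           # weight for the pair (g, h)
      H <- H + w * sum(rho(lam[g, ] - lam[h, ], p) + rho(nu[g, ] - nu[h, ], p))
    }
  }
  H
}

# Toy input under exact invariance (made-up values)
lambda <- c(1.0, 0.8, 1.2); nu <- c(0.0, 0.5, -0.3)
mu <- c(0, 0.3, 0.8); sigma <- c(1, 1.225, 1.095); N <- c(500, 500, 500)
lambda0 <- outer(sigma, lambda)                             # lambda_i * sigma_g
nu0 <- matrix(nu, 3, 3, byrow = TRUE) + outer(mu, lambda)   # nu_i + lambda_i * mu_g

H_true  <- align_H(mu, sigma, lambda0, nu0, N)              # close to zero
H_wrong <- align_H(c(0, 0, 0), c(1, 1, 1), lambda0, nu0, N) # clearly larger
```

Under exact invariance, the aligned parameters coincide across groups at the data-generating means and standard deviations, so the function is (numerically) zero there and positive elsewhere.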
In the minimization of (6), additional identification constraints must be imposed. As a first alternative, the distribution parameters of the first (or any other) group can be fixed. That is, we set $\mu_1 = 0$ and $\sigma_1 = 1$. As a second alternative, one can simultaneously constrain all estimated parameters. Then, the following identification constraints can be imposed:
$$\sum_{g=1}^{G} \mu_g = 0 \quad \text{and} \quad \prod_{g=1}^{G} \sigma_g = 1. \tag{7}$$
The constraints in (7) state that the arithmetic mean of the factor means equals zero, and the geometric mean of the factor standard deviations equals one.
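The arithmetic behind the second set of constraints can be sketched in a few lines of base R. The example assumes made-up estimates on the metric of the first group and applies a linear change of the latent scale; it only illustrates what the constraints mean and is not a statement about how either package applies them internally.

```r
# Hypothetical estimates identified by fixing the first group (mean 0, SD 1)
mu    <- c(0, 0.3, 0.8)
sigma <- c(1, 1.225, 1.095)

# Linear change of the latent scale: theta_new = (theta - a) / b
b <- exp(mean(log(sigma)))   # geometric mean of the SDs
a <- mean(mu)                # arithmetic mean of the means
mu_c    <- (mu - a) / b
sigma_c <- sigma / b

mean(mu_c)      # arithmetic mean of the factor means: 0 up to rounding
prod(sigma_c)   # product of the SDs: 1 up to rounding
```

A geometric mean of one is equivalent to a product of one, which is why the constraint is often called the product constraint.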
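Note that the optimization function $H$ of IA defined in (6) can be rewritten as
$$H(\boldsymbol{\mu}, \boldsymbol{\sigma}) = H_1(\boldsymbol{\sigma}) + H_2(\boldsymbol{\mu}, \boldsymbol{\sigma}), \tag{8}$$
where $H_1$ collects the item loading terms and $H_2$ the item intercept terms of (6). However, the function $H_1$ can be conveniently substituted by an alternative. Note that Equation (3) can be rewritten as
$$\log \lambda_{ig}^{*} = \log \lambda_{ig} + \log \sigma_g.$$
This motivates the alternative optimization function $H_{1,\log}$ for determining standard deviations, which employs logarithmized item loadings (see [34,35])
$$H_{1,\log}(\boldsymbol{\tau}) = \sum_{i=1}^{I} \sum_{g<h} w_{i1,gh}\, \rho\!\left( \log \hat{\lambda}_{ig}^{*} - \tau_g - \log \hat{\lambda}_{ih}^{*} + \tau_h \right), \tag{11}$$
where $\tau_g = \log \sigma_g$ for $g = 1, \ldots, G$. Due to the required identification constraints, we fix $\tau_1 = 0$ (i.e., $\sigma_1 = 1$). By minimizing $H_{1,\log}$, a vector of standard deviations $\hat{\boldsymbol{\tau}}$ on the logarithm metric is obtained; that is, $\hat{\boldsymbol{\tau}} = \arg\min_{\boldsymbol{\tau}} H_{1,\log}(\boldsymbol{\tau})$. The vector of estimated standard deviations $\hat{\boldsymbol{\sigma}}$ can be obtained by exponentiating all entries in $\hat{\boldsymbol{\tau}}$.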
2.1. Numerical Optimization
As mentioned above, IA uses the loss function $\rho(x) = \sqrt{|x|}$ as the default in the Mplus software package [16]. However, the loss function $\rho(x) = |x|^{1/4}$ is also available in Mplus [16]. The more general $L_p$ loss function $\rho(x) = |x|^p$ for $p > 0$ has been studied for IA in [13,35]. It has been shown that values of the power $p$ smaller than 0.5 can be advantageous in some situations [13].
In the practical minimization of $H$ involved in IA, the nondifferentiable $L_p$ loss function $\rho$ (for $p \le 1$) is replaced by a differentiable approximation $\rho_D$ (see [5,35])
$$\rho_D(x) = (x^2 + \varepsilon)^{p/2}, \tag{12}$$
where $\varepsilon > 0$ is a tuning parameter that controls the approximation error of $\rho_D$ for $\rho$. The approximation error becomes smaller with $\varepsilon$ values close to zero. However, the minimization of $H$ in IA becomes more difficult when choosing too small values of $\varepsilon$. Practical experience led to the proposals $\varepsilon = 0.01$ [5] or $\varepsilon = 0.001$ [35]. The choice $\varepsilon = 0.01$ is the default in Mplus (see [13]).
2.2. A More in-Depth Look into the Identification Constraint for Standard Deviations for Many Groups
The IA method measures the similarity between item loadings in the optimization function $H_1$ by
$$H_1(\boldsymbol{\sigma}) = \sum_{i=1}^{I} \sum_{g<h} w_{i1,gh}\, \rho\!\left( \frac{\hat{\lambda}_{ig}^{*}}{\sigma_g} - \frac{\hat{\lambda}_{ih}^{*}}{\sigma_h} \right). \tag{13}$$
As mentioned above, an identification constraint could be to fix the standard deviation of the first group to 1 or to fix the product of standard deviations to 1. Regarding the choice of the identification constraint in their Mplus software, Asparouhov and Muthén [5] state that “[…] in Mplus by default the parameters are indeed reported in that metric, however, the alignment optimization is carried out using Equation (10) [i.e., the product identification constraint in (7)] to ensure full symmetry between the different groups”. To illustrate this motivation, we rewrite (13) as
$$H_1(\boldsymbol{\sigma}) = \sum_{i=1}^{I} \sum_{h=2}^{G} w_{i1,1h}\, \rho\!\left( \frac{\hat{\lambda}_{i1}^{*}}{\sigma_1} - \frac{\hat{\lambda}_{ih}^{*}}{\sigma_h} \right) + \sum_{i=1}^{I} \sum_{2 \le g<h \le G} w_{i1,gh}\, \rho\!\left( \frac{\hat{\lambda}_{ig}^{*}}{\sigma_g} - \frac{\hat{\lambda}_{ih}^{*}}{\sigma_h} \right), \tag{14}$$
where we decomposed the terms that do involve and do not involve the first group, respectively. If the optimization were carried out based only on the second term in (14), the optimization value would tend to zero if the standard deviations tend to infinity. Hence, fixing the standard deviation $\sigma_1$ to 1 prevents obtaining infinite estimates of $\sigma_g$ for $g = 2, \ldots, G$. If $\sigma_1 = 1$ is specified in the minimization of (14), it becomes clear that the first term, which involves the first group, becomes less relevant if the number of groups increases: per item, it contains only $G - 1$ group pairs, whereas the second term contains $(G-1)(G-2)/2$ pairs. Hence, there is a danger that estimated standard deviations are larger if more groups are involved in the analysis. For this reason, the identification constraint $\sigma_1 = 1$ is likely not appropriate in the case of many groups. In contrast, the constraint $\prod_{g=1}^{G} \sigma_g = 1$ would be preferable in this case. The behavior of IA for many groups is analyzed in a simulation study in Section 4 and an empirical example in Section 5.
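The argument that the part of the loading criterion without the first group can be driven to zero by inflating the standard deviations can be checked numerically. The base-R sketch below uses made-up standardized loadings for the groups other than the first, unit weights, and the square-root loss; `H1_rest()` is a hypothetical helper introduced only for this illustration.

```r
rho <- function(x, p = 0.5) abs(x)^p

# Made-up standardized loadings for groups 2, 3, 4 (rows) and three items (columns)
lambda0 <- rbind(c(1.2, 1.0, 0.9), c(1.1, 1.3, 1.0), c(0.8, 1.1, 1.2))

# Loading criterion restricted to group pairs that exclude the first group
H1_rest <- function(sigma) {
  lam <- lambda0 / sigma            # row g is divided by sigma[g]
  G <- nrow(lam); H <- 0
  for (g in seq_len(G - 1)) for (h in (g + 1):G)
    H <- H + sum(rho(lam[g, ] - lam[h, ]))
  H
}

s_grid <- c(1, 10, 100, 1000)       # common inflation factor for all SDs
vals <- sapply(s_grid, function(s) H1_rest(s * c(1, 1, 1)))
vals                                # decreases towards zero as s grows
```

With the square-root loss, multiplying all standard deviations by s divides this part of the criterion by sqrt(s), so it has no finite minimizer on its own. Only the terms involving the reference group (or a product constraint) keep the estimates bounded.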
3. Implementation of Invariance Alignment in Mplus and sirt
We now describe how IA can be estimated with the commercial Mplus software (Version 8.9; [16]) and the R (Version 4.3; [36]) package sirt [29].
Listing 1 contains command-line syntax for the specification of IA in Mplus (see [16,37]). The dataset is saved locally in the file mydata.dat (see Line 4 in Listing 1) in an appropriate working directory. The IA method is applied to five items I1, …, I5 (see Line 6 in Listing 1). The numeric grouping variable group is included in the dataset. The grouping variable has to be specified as a known class variable in Mplus (see Lines 8 and 9 in Listing 1).
Listing 1. Specification of invariance alignment in Mplus software.
 1  TITLE:
 2    Invariance Alignment;
 3  DATA:
 4    FILE IS mydata.dat;
 5  VARIABLE:
 6    NAMES ARE group I1 I2 I3 I4 I5;
 7    USEVARIABLES ARE group I1 I2 I3 I4 I5;
 8    CLASSES = c(3);
 9    KNOWNCLASS = c(group = 1 group = 2 group = 3);
10  ANALYSIS:
11    TYPE = MIXTURE;
12    ESTIMATOR = MLR;
13    ALIGNMENT = FIXED(2); ! group=2 is reference group with zero mean;
14    ! ALIGNMENT = FREE for method 'FREE';
15
16    TOLERANCE = 0.01; ! epsilon value;
17    SIMPLICITY = SQRT; ! for p=0.5;
18    ! SIMPLICITY = FOURTHRT for p=0.25;
19  MODEL:
20    %overall%
21    f1 BY I1 I2 I3 I4 I5;
22  OUTPUT:
23    alignment;
Mplus has only implemented the product constraint $\prod_{g=1}^{G} \sigma_g = 1$ for standard deviations. The method FIXED (i.e., Line 13 in Listing 1 that states ALIGNMENT=FIXED) utilizes the zero constraint of the factor mean of the first group; that is, $\mu_1 = 0$. The reference to the first group can be changed using the command
ALIGNMENT=FIXED(2) (see Line 13 in Listing 1). In this case, Group 2 is used as the reference group. Alternatively, “the
FREE alignment optimization estimates
as an additional parameter” [
5]. This specification seems to be overparametrized, and Mplus must have implemented some fix to prevent nonconvergence of the IA optimization problem. The Mplus manual states, “In the
FREE setting, all factor means are estimated.
FREE is the most general approach” [
16]. This statement certainly does not provide enough details for an independent implementation of the black-box algorithms in the Mplus software. Furthermore, the TOLERANCE argument in Line 16 in Listing 1 specifies the tuning parameter $\varepsilon$ that appears in the differentiable approximation (12). The default in Mplus is $\varepsilon = 0.01$. Finally, the SIMPLICITY argument can either choose the power $p = 0.5$ (i.e., square root SQRT) or $p = 0.25$ (i.e., fourth root FOURTHRT).
Listing 2 shows how IA can be estimated in the R package sirt [29,38,39]. In the first step, group-specific estimation of the one-dimensional factor models can be carried out with the function sirt::invariance_alignment_cfa_config() (see Line 5 in Listing 2). The group-specific estimated item loadings lambda and item intercepts nu can be extracted from the output of this function (see Lines 9 and 10 in Listing 2). Moreover, the weights in IA (see Equation (6)) are specified in Line 14 in Listing 2. The specification in this listing ensures the same weights as in Mplus. The function sirt::invariance.alignment() performs IA based on the estimated item loadings lambda and item intercepts nu (see Line 17 in Listing 2). The power $p$ in IA can be separately chosen for item loadings (first entry in align.pow) and item intercepts (second entry in align.pow). If the power $p = 0.25$ instead of the default $p = 0.5$ should be used in the analysis, users have to specify the argument align.pow=c(0.25,0.25) in the sirt::invariance.alignment() function. The tuning parameter $\varepsilon$ in Equation (12) can be specified with the argument eps.
Listing 2. Specification of invariance alignment in the R package sirt.
 1  #* define items
 2  items <- paste0("I", 1:5)
 3
 4  #* separate estimation of the factor model in groups
 5  prep <- sirt::invariance_alignment_cfa_config(dat=dat[,items],
 6              group=dat$group)
 7
 8  #- extract item loadings and item intercepts
 9  lambda <- prep$lambda
10  nu <- prep$nu
11
12  #- define weights
13  Ng <- prep$N
14  wgts <- matrix(sqrt(Ng), length(Ng), ncol(nu))
15
16  #* perform invariance alignment
17  res <- sirt::invariance.alignment(lambda=lambda, nu=nu,
18             align.pow=c(.5, .5), eps=0.01, wgt=wgts, meth=3)
19
20  #- extract estimated means and standard deviations
21  res$pars
The IA function in the sirt package has four different estimation methods that can be requested with the argument meth. The default meth = 1 uses the optimization function (6) with the identification constraints $\mu_1 = 0$ and $\sigma_1 = 1$. The method meth = 2 performs IA on logarithmized item loadings (see Equation (11)), also using the constraints $\mu_1 = 0$ and $\sigma_1 = 1$. The method meth = 3 implements the product constraint $\prod_{g=1}^{G} \sigma_g = 1$ for standard deviations and the zero mean constraint for the first group (i.e., $\mu_1 = 0$). Hence, this method is expected to perform similarly to Mplus’ FIXED alignment method. Finally, meth = 4 also utilizes the product constraint for standard deviations but freely estimates the first group mean $\mu_1$. To identify the model, a penalty term
is added to the optimization function, where
W is the sum of the involved weights in the IA optimization function and
is a small factor to achieve convergence in optimization. Likely, this method has only conceptual similarity with Mplus’
FREE method, and no equivalent performance can be expected.
The estimated distribution parameters can be requested by the list entry $pars (see Line 21 in Listing 2).
4. Simulation Study
4.1. Method
The datasets in this simulation study were simulated from a one-dimensional factor model consisting of items and $G = 3$, 6, 9, or 12 groups. In the case of three groups, the group means were 0, 0.3, and 0.8, and the group standard deviations were 1, 1.225, and 1.095, respectively. With more than three groups, all parameters (i.e., distribution and item parameters) were replicated accordingly. For example, for six groups, the parameters were replicated twice.
All measurement error variances were set to 1 in all groups and uncorrelated with each other. The factor variable and residual variables were normally distributed. There was noninvariance in item intercepts and item loadings. All item intercepts had a value of zero except for a few cases. In the first group, the fifth item intercept was . In the second group, the first item intercept was , while the second item had an intercept of in the third group. All item loadings had a value of one except for a few cases. In the first group, the third item loading was . In the second group, the fifth item loading was , while the fourth item loading was in the third group. These parameters were duplicated with more than three groups as described above.
The sample size per group was chosen as , , , , or (i.e., infinite sample size). In the case of an infinite sample size, there was no sampling error, and the population parameters were the data-generating parameters. The mean vectors and the covariance matrices are sufficient statistics for the IA method. Datasets with a sample size of , whose empirical means and covariances equaled the population means and covariances, respectively, were simulated in this case.
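Datasets whose empirical moments reproduce the population moments exactly can be generated by centering and whitening random draws before imposing the target mean and covariance. The base-R sketch below is one way to do this and is not necessarily the procedure used in the study; `exact_mvn()` is a hypothetical helper, and the number of items, the loadings, and the sample size are illustrative values.

```r
# exact_mvn() is a hypothetical helper: it returns n rows whose sample mean and
# sample covariance (denominator n - 1, as in cov()) equal mu and Sigma exactly.
exact_mvn <- function(n, mu, Sigma) {
  k <- length(mu)
  Z <- matrix(rnorm(n * k), n, k)
  Z <- sweep(Z, 2, colMeans(Z), "-")   # sample means exactly zero
  Z <- Z %*% solve(chol(cov(Z)))       # sample covariance exactly the identity
  X <- Z %*% chol(Sigma)               # impose the target covariance
  sweep(X, 2, mu, "+")                 # impose the target mean
}

# Model-implied moments for one group (illustrative values)
lambda <- rep(1, 5); nu <- rep(0, 5)
mu_g <- 0.3; sigma_g <- 1.225
m <- nu + lambda * mu_g
S <- sigma_g^2 * tcrossprod(lambda) + diag(5)   # residual variances of 1

set.seed(1)
X <- exact_mvn(1000, m, S)
max(abs(colMeans(X) - m))   # numerically zero
max(abs(cov(X) - S))        # numerically zero
```

The function MASS::mvrnorm() offers the same behavior through its argument empirical = TRUE. Whether the covariance should match with denominator n or n - 1 depends on the estimator and is an assumption here.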
The IA method was applied using the Mplus software (Version 8.9; [16]) and the function invariance.alignment() in the R package sirt (Version 4.1-15; [29]). Both software packages utilized the power $p = 0.5$ and the tuning parameter choices $\varepsilon = 0.01$ and $\varepsilon = 0.001$. Mplus was used with the FIXED or the FREE methods, while the method meth in sirt was specified as meth = 1, meth = 2, meth = 3, or meth = 4. To compare the performance across methods, the estimates were linearly transformed such that the mean and the SD of the first group were 0 and 1, respectively.
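The linear transformation to the metric of the first group can be sketched as follows. The helper `to_reference()` and the numeric estimates are hypothetical; the point is only that means are shifted and rescaled while standard deviations are rescaled.

```r
# to_reference() is a hypothetical helper: theta_new = (theta - mu[ref]) / sigma[ref]
to_reference <- function(mu, sigma, ref = 1) {
  list(mu = (mu - mu[ref]) / sigma[ref], sigma = sigma / sigma[ref])
}

# Made-up estimates reported on some other metric
est <- to_reference(mu = c(-0.35, -0.08, 0.43), sigma = c(0.91, 1.11, 0.99))
est$mu[1]      # 0
est$sigma[1]   # 1
```

Because standardized mean differences and ratios of standard deviations are unchanged by this transformation, estimates obtained under different identification constraints become directly comparable.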
In total,
replications were conducted for each cell of the simulation study. Bias, standard deviation (SD), and root mean square error (RMSE) were computed to assess the performance of the different estimators for factor means and factor standard deviations. To ease the comparability between the different estimation methods, we also computed a relative RMSE, which was defined as the quotient of the RMSE for a particular method and the RMSE of a reference method, multiplied by 100. The reference method was Mplus’ FIXED method with $p = 0.5$ and $\varepsilon = 0.01$, which is the default in this software package. We also computed the mean absolute difference between estimates of Mplus and sirt to determine possible differences between software packages. Information about model specifications can be found in the material located at https://osf.io/84ne5 (accessed on 17 February 2024).
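The performance measures can be computed from the replicated estimates of one simulation cell as sketched below. The helper `perf()`, the number of replications, and the simulated estimates are stand-ins chosen for illustration; they are not results of the study.

```r
# perf() is a hypothetical helper operating on a vector of replicated estimates
perf <- function(est, true) {
  c(bias = mean(est) - true,
    sd   = sd(est),
    rmse = sqrt(mean((est - true)^2)))
}

set.seed(1)
true_value <- 0.3
est_ref <- rnorm(1000, mean = 0.33, sd = 0.05)  # stand-in for the reference method
est_alt <- rnorm(1000, mean = 0.31, sd = 0.05)  # stand-in for a competing method

p_ref <- perf(est_ref, true_value)
p_alt <- perf(est_alt, true_value)

# Relative RMSE: values below 100 favor the competing method
rel_rmse <- unname(100 * p_alt["rmse"] / p_ref["rmse"])
```

Since the squared RMSE is the sum of the squared bias and the variance (up to the factor (n - 1)/n from the SD denominator), a larger bias can dominate the relative RMSE even when the SDs of two methods are similar.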
4.2. Results
In this section, we only present results for the distribution parameters for the second group. The findings for the other groups were very similar.
Table 1 contains the bias of the estimated factor mean of the second group for the different estimation methods in Mplus and sirt. Overall, noticeable bias occurred for $\varepsilon$ = 0.01 and p = 0.5. The bias decreased with increasing sample size but was still present for infinite sample sizes. Moreover, note that the bias did not disappear with an increasing number of groups. Interestingly, bias was substantially reduced with the tuning parameter $\varepsilon$ = 0.001, particularly for sample sizes of at least 1000. For three, six, or nine groups, the method meth = 1 in sirt performed best in terms of bias. In general, the bias of both Mplus methods FIXED and FREE was similar to that obtained from the four methods implemented in sirt. Interestingly, sirt’s method
meth = 1 had issues with an increasing number of groups. For
and
N = 250, there was a large bias in estimated factor means, which showed that
meth = 1 failed for a large number of groups.
Table 2 shows the relative RMSE of the estimated factor mean in the second group. The FREE method in Mplus was slightly inferior to the FIXED method in Mplus for more than three groups. The tuning parameter $\varepsilon = 0.001$ outperformed $\varepsilon = 0.01$ in terms of relative RMSE. This observation was primarily an effect of the larger bias for $\varepsilon = 0.01$. The simulation study also highlighted that the SD for the different estimates was larger for
than for
.
Table 3 presents the average absolute difference between the Mplus and sirt estimates of the factor mean in the second group. It can be seen that Mplus’ FIXED method was closest to the sirt method meth = 3. The differences to sirt’s meth = 1, which is the default in the R package sirt, were larger. Furthermore, the FREE method of Mplus turned out to perform most similarly to sirt’s meth = 4. However, the differences between the two methods were noticeable. Hence, it can be concluded that there is no equivalent implementation of the Mplus FREE method in the sirt package.
Table 4 shows the bias for the factor SD of the second group for p = 0.5. As for the factor mean, the tuning parameter $\varepsilon$ = 0.001 had superior performance compared to $\varepsilon$ = 0.01. For the SD, the Mplus methods
FIXED and
FREE as well as sirt’s
meth = 3 and
meth = 4 coincide. Overall, the sirt method
meth = 1 was preferable for three or six groups, while its performance deteriorated for a larger number of groups. It should be emphasized that the bias did not even disappear in infinite sample sizes for
.
Table 5 presents the relative RMSE for the factor SD in the second group. The specifications with $\varepsilon = 0.001$ were generally preferable over $\varepsilon = 0.01$ in terms of RMSE. The Mplus and sirt methods performed very similarly. The bias issues of sirt’s meth = 1 for many groups (i.e., 9 or 12 groups) also translated into substantially increased RMSE values.
Table 6 displays the mean absolute difference for the estimate of the factor SD in the second group between Mplus and sirt. The Mplus method FIXED had a similar performance to the sirt method meth = 3, while Mplus’ FREE method had comparable performance to sirt’s meth = 4.
To conclude, this simulation study demonstrated that the performance of IA estimates in Mplus can be similar to sirt if an appropriate estimation method meth in sirt is chosen. The default sirt method meth = 1 resulted in larger differences to Mplus. However, sirt’s meth = 1 can be preferred over Mplus and the other sirt methods for three or six groups but cannot be recommended for many groups (i.e., at least nine groups). Overall, the tuning parameter $\varepsilon$ = 0.001 should be preferred over $\varepsilon$ = 0.01 in terms of bias and RMSE.
5. Empirical Example: Asparouhov and Muthén (2014) Dataset
This empirical example uses a dataset that was previously analyzed in [5,40,41]. The dataset came from the European Social Survey (ESS) conducted in the year 2005 (ESS 2005), which included subjects from 26 countries. The factor variable of tradition and conformity was assessed by four items presented in portrait format, where the scale of the items is such that a high value represents a low level of tradition and conformity. The wording of the four items was as follows (see [5]): It is important for him to be humble and modest. He tries not to draw attention to himself (item TR9); Tradition is important to him. He tries to follow the customs handed down by his religion or family (item TR20); He believes that people should do what they’re told. He thinks people should follow rules at all times, even when no one is watching (item CO7); and It is important for him to always behave properly. He wants to avoid doing anything people would say is wrong (item CO16). The dataset for this empirical example (and used in [5]) was downloaded from https://www.statmodel.com/Alignment.shtml (accessed on 17 February 2024).
5.1. Original Data
We analyzed the original ESS dataset but included only subjects with no missing values on the four items. The dataset used in this article can be found at
https://osf.io/84ne5 (accessed on 17 February 2024). In the 26 countries, the sample sizes ranged between 1031 and 2963 persons with a mean of 1869.5 and an SD of 454.7. The IA method was applied with the specifications
and
in Mplus and sirt. The same six estimation methods (i.e.,
FIXED and
FREE in Mplus as well as
meth = 1,
meth = 2,
meth = 3, and
meth = 4 in sirt) were applied to the dataset.
Table 7 shows the estimated factor means and SDs for the 26 countries and the six estimation methods. It can be seen that sirt’s default meth = 1 provides implausible estimates in this example with many groups. However, the sirt methods meth = 2, meth = 3, and meth = 4 performed comparably to Mplus’ FIXED and FREE methods. Mplus’ FIXED method was relatively close to sirt’s meth = 3 in terms of absolute differences in estimated factor means (M = 0.010, SD = 0.013, Min = 0.000, Max = 0.070). In addition, estimated factor means were also similar between the Mplus FIXED method and the sirt meth = 2 method (absolute differences: M = 0.012, SD = 0.014, Min = 0.000, Max = 0.068). Moreover, Mplus’ FREE method also performed similarly to sirt’s meth = 4 for estimated factor means (absolute differences: M = 0.010, SD = 0.016, Min = 0.000, Max = 0.086). There was also a close resemblance for estimated factor standard deviations between the Mplus FIXED and sirt meth = 3 methods (absolute differences: M = 0.007, SD = 0.006, Min = 0.000, Max = 0.020). However, the differences between the estimation methods FIXED and FREE in Mplus (or meth = 3 and meth = 4 in sirt) are noteworthy.
5.2. Pseudo-Datasets
In this section, the original ESS dataset is used to create pseudo-datasets that should provide more insight into the different behavior of the estimation methods implemented in Mplus and sirt. The first five countries from the original dataset, with sample sizes of 1525, 1695, 2320, 1468, and 1031 subjects, are used in the creation of the datasets. It is investigated whether the size of the estimates depends on the number of groups. To enable clean but idealized settings, we varied the number of included groups by replicating the original dataset accordingly. For example, with G = 10 groups, the first five groups were the original five countries, while groups six to ten were the same five countries but labeled as unique groups in the IA estimation. One would expect that the results for the first five groups do not change if the same dataset appears as duplications in the pseudo-dataset.
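The construction of such pseudo-datasets amounts to stacking copies of the data and giving each copy new group labels. The base-R sketch below uses a small made-up data frame instead of the ESS data; `make_pseudo()` is a hypothetical helper and assumes that the grouping variable is coded as consecutive integers starting at 1.

```r
# make_pseudo() is a hypothetical helper; it assumes groups are coded 1, ..., G0.
make_pseudo <- function(dat, times) {
  G0 <- length(unique(dat$group))
  copies <- lapply(seq_len(times), function(k) {
    d <- dat
    d$group <- dat$group + (k - 1) * G0   # copy k receives new group labels
    d
  })
  do.call(rbind, copies)
}

set.seed(1)
toy <- data.frame(group = rep(1:5, each = 4), I1 = rnorm(20), I2 = rnorm(20))
pseudo <- make_pseudo(toy, times = 2)     # 10 groups; group 6 duplicates group 1
table(pseudo$group)
```

Because group 6 contains exactly the same observations as group 1, any difference between their aligned estimates must stem from the estimation method rather than from the data.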
Table 8 presents estimated factor means and SDs for the third and the sixth group in the pseudo-datasets involving
G = 5, 10, 15, 20, 25, or 30 groups. Note that the sixth group coincided with the first group in the pseudo-datasets and the first country in the original dataset. The distribution parameter estimates were transformed such that the mean and the SD of the first group were 0 and 1, respectively.
The factor mean estimates changed as a function of the number of groups for the Mplus FIXED and all sirt methods. Only for the Mplus FREE method were the estimates invariant with respect to the number of groups. In particular, large differences in the estimates were observed when comparing results in a model with 25 and 30 groups. Because the first group had a (transformed) mean of 0, it would also be expected that Group 6 would have factor mean estimates of 0. However, this was not the case for the estimation methods, except for Mplus’ FREE and sirt’s meth = 4 methods. Overall, this pattern is surprising because it implies that the choice of the reference group (i.e., the first group in our case) and the number of groups strongly affect the estimates of factor means. For the SD, only sirt’s meth = 1 had estimates that depended on the number of groups.
6. Discussion
In this article, we compared the performance of IA estimates of the Mplus software and the R package sirt. There are two alternative identification constraints for estimating the standard deviations. Mplus uses the product constraint (i.e., the product of the standard deviations across groups equals one), which is also used in the sirt methods meth = 3 and meth = 4. However, one can alternatively fix the standard deviation of the first group to 1. This is the default in the R package sirt (i.e., meth = 1). The differences between Mplus and the IA function in the sirt package can primarily be traced back to the different identification constraints for standard deviations. The difference between Mplus and sirt can be made smaller by choosing meth = 3, which mimics the identification constraint used in Mplus. Notably, the latter method is preferred for a large(r) number of groups (say, more than eight), while the default of meth = 1 might be preferable for at most six groups. The simulation study and the empirical example demonstrated that the default meth = 1 in the sirt package does not provide trustworthy results with many groups, and users are then strongly advised to switch to meth = 2 or meth = 3.
Overall, it turned out in the simulation study that the tuning parameter $\varepsilon = 0.001$ generally outperforms the default Mplus choice $\varepsilon = 0.01$. A previous study indicated that the choice of $\varepsilon$ is more critical than the choice between the power $p = 0.5$ or $p = 0.25$ [15]. Minor reductions regarding bias can be obtained with the power $p = 0.25$ instead of $p = 0.5$. However, for reasonably large sample sizes (e.g., more than 500 subjects per group), an
loss function [
42] can even outperform the
loss function for
or
[
15].
Regardless of the use of a particular estimation method in Mplus or sirt, we wonder whether the optimization function of IA is suitable in the case of many groups. The pairwise differences between model parameters in the optimization might lead to less stable estimates than a linear model specification that does not involve pairwise differences. There is some evidence that Haberman linking with the
IA loss function could be superior in the estimation of many groups (say, more than 20 groups) in IA (see [
35]). Further research is needed to explore possible adaptations of the IA method in the case of many groups.
In this article, we only examined estimation differences between Mplus and sirt for normally distributed data. It can be expected that estimation differences due to different identification constraints would similarly be present for ordinal data [
6] because it uses item loadings and item thresholds from item response theory models instead of item loadings and item intercepts from a one-dimensional factor model based on the multivariate normal distribution as the model input.
The IA method can provide consistent estimation of factor means and standard deviations if there is a sparse pattern of parameters that are noninvariant across groups. It is debatable whether such a sparse pattern of noninvariant effects can be theoretically assumed in empirical datasets [
43,
44]. However, if researchers believe in such a sparsity assumption, IA can be deemed an effective data-driven method.
The simulation study conducted in this article assumed a sparse structure of noninvariant parameters. It could be that the differences between Mplus and the IA function in the sirt package were larger under different data-generating models. Future research could further investigate the software differences for more data-generating models and could also involve scenarios of a large number of groups.
As a cautionary remark, we would like to add that enough implementation details must appear in publications for commercial black-box software like Mplus to enable independent judgment, evaluation, and reimplementation of existing methods. We believe that undocumented or sparsely documented modeling approaches in commercial software, like the IA method in Mplus, should not be used in substantive and methodological publications because their use fundamentally contradicts the principles of open science.