# Estimation Methods of the Multiple-Group One-Dimensional Factor Model: Implied Identification Constraints in the Violation of Measurement Invariance

1. Department of Educational Measurement, IPN, Leibniz Institute for Science and Mathematics Education, 24118 Kiel, Germany
2. Centre for International Student Assessment (ZIB), 24118 Kiel, Germany
Axioms 2022, 11(3), 119; https://doi.org/10.3390/axioms11030119
Submission received: 20 February 2022 / Revised: 2 March 2022 / Accepted: 7 March 2022 / Published: 9 March 2022
(This article belongs to the Section Mathematical Analysis)

## Abstract

Factor analysis is one of the most important statistical tools for analyzing multivariate data (i.e., items) in the social sciences. An essential case is the comparison of multiple groups on a one-dimensional factor variable that can be interpreted as a summary of the items. Measurement invariance is a frequently employed assumption that enables the comparison of the factor variable across groups. This article discusses different estimation methods of the multiple-group one-dimensional factor model under violations of measurement invariance (i.e., measurement noninvariance). In detail, joint estimation, linking methods, and regularized estimation approaches are treated. It is argued that linking approaches and regularization approaches can be equivalent to joint estimation approaches if appropriate (robust) loss functions are employed. Each of the estimation approaches defines identification constraints of parameters that quantify violations of measurement invariance. We argue in the discussion section that the fitted multiple-group one-dimensional factor analysis will likely be misspecified due to the violation of measurement invariance. Hence, because there is always indeterminacy in determining group comparisons of the factor variable under noninvariance, the preference of particular fitting strategies such as partial invariance over alternatives is unjustified. Instead, researchers should purposely define fitting functions such that the chosen (robust) loss function minimizes the extent of model misspecification.

## 1. Introduction

Factor analysis is one of the most important statistical tools for analyzing multivariate data (i.e., items) in the social sciences [1,2]. An important case is the comparison of multiple groups on a one-dimensional factor variable that can be interpreted as a summary of the multivariate input data. To enable a comparison on the factor variable, identification constraints for model estimation must be posed [3,4,5].
A popular and heavily discussed identification constraint is the assumption of measurement invariance (MI, [6,7]) that assumes the existence of invariant (i.e., equal) item parameters across groups. Noninvariant item parameters occur if not all parameters are equal across groups. Practitioners and applied methodologists frequently claim that MI or only weak violations of MI (i.e., partial invariance) are necessary to enable group comparisons on the factor variable [8,9,10]. In this article, we discuss different estimation methods of the one-dimensional factor model and their implied identification constraints in the violation of MI. In more detail, we focus on joint estimation (maximum likelihood estimation), linking approaches (Haberman linking, invariance alignment), and regularized estimation (lasso-type regularization, fused regularization, Bayesian approximate invariance). We derive identification constraints on parameters that quantify violations of MI under different estimation methods. By doing so, it turns out that joint estimation, linking, and regularization can be interpreted quite similarly under certain specifications. Therefore, this work discusses competing approaches for handling violations of measurement invariance in a unified framework and provides conditions under which these approaches provide similar results. We also try to convince the reader that there cannot be a single right choice of an approach for handling violations of measurement invariance because an overidentified minimization problem is only made identifiable by selecting a fitting function. We derive the identification constraints implied by the different fitting functions and discuss their similarities in the remainder of this article.
The article is structured as follows: In Section 2, we discuss two important one-dimensional factor models and their estimation: the tau-equivalent and the tau-congeneric measurement model. Section 3 treats the tau-equivalent model with noninvariant item intercepts. Section 4 treats the tau-congeneric model with noninvariant item intercepts and invariant item loadings, while Section 5 also allows noninvariant item loadings. Finally, Section 6 closes with a discussion.

## 2. One-Dimensional Factor Model

Assume that there are I random variables $X 1 , … , X I$. These variables are also referred to as items. Denote by $X = ( X 1 , … , X I )$ the vector of all items. Denote by $μ = E ( X )$ the vector of means containing the entries $E ( X i )$ and by $Σ = Var ( X )$ the covariance matrix containing entries $σ i j = Cov ( X i , X j )$ for $i ≠ j$. In one-dimensional factor analysis, we represent the I items by a one-dimensional factor variable F. Hence, the covariances among items are presented by a rank-one matrix. In the following, we discuss two main measurement models of one-dimensional factor analysis: the tau-equivalent and the tau-congeneric models [11,12,13,14].

#### 2.1. Tau-Equivalent Model

We now assume a one-dimensional factor F in the tau-equivalent model [14]:
$X i = ν i + F + ε i , Var ( ε i ) = ϕ ,$
where the residuals $ε i$ are uncorrelated. Note that all item loadings are assumed to be equal (and fixed to one in this parametrization) and that all residual variances are equal to a common value $ϕ$, while the item intercepts $ν i$ are item-specific parameters. For identification, we assume $E ( F ) = 0$, and $ψ = Var ( F )$ is estimated. Denote by $I$ the $I × I$ identity matrix and by $1$ an $I × 1$ vector of ones. Then, the covariance matrix $Σ$ of the items $X$ is represented by a model-implied covariance matrix $Σ 0$
$Σ 0 = ψ 11 ⊤ + ϕ I .$
Note that the covariance matrix is parsimoniously represented by only two parameters. The mean vector $μ = E ( X ) = ν$ is estimated without constraints.
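As a small numerical illustration (not part of the article; the parameter values are assumed for the example), the model-implied covariance structure can be constructed directly:

```python
import numpy as np

# Illustrative sketch (psi and phi are assumed example values): the
# model-implied covariance matrix Sigma_0 = psi * 11^T + phi * I for I = 4 items.
I_items, psi, phi = 4, 0.8, 0.5
ones = np.ones((I_items, 1))
Sigma0 = psi * (ones @ ones.T) + phi * np.eye(I_items)

# Every off-diagonal covariance equals psi; every variance equals psi + phi,
# and subtracting phi * I leaves a rank-one matrix.
print(Sigma0[0, 1], Sigma0[0, 0])
```

Only the two parameters $ψ$ and $ϕ$ generate the whole covariance matrix, which is the parsimony noted above.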

#### 2.2. Tau-Congeneric Model

In the tau-congeneric measurement model [11], item-specific loadings and item-specific residual variances are allowed:
$X i = ν i + λ i F + ε i , Var ( ε i ) = ϕ i ,$
where residuals $ε i$ are uncorrelated across items. Denote by $λ = ( λ 1 , … , λ I )$ the vector of item loadings $λ i$ and $Φ = diag ( ϕ 1 , … , ϕ I )$ the diagonal matrix containing residual variances $ϕ i$. For reasons of identification, we set $E ( F ) = 0$ and $Var ( F ) = 1$. The covariance matrix $Σ$ is modelled as
$Σ 0 = λ λ ⊤ + Φ .$

#### 2.3. Overview of Estimation Methods

The tau-equivalent and the tau-congeneric model are special cases of structural equation models that impose restrictions on the mean vector and the covariance matrix [15]. In maximum likelihood (ML) estimation assuming multivariate normality of $X$, the empirical mean vector $x ¯$ and empirical covariance matrix $S$ are sufficient statistics. We denote by $θ$ the vector of all estimated parameters and define the fitting function
$F ML ( θ ; x ¯ , S ) = − N 2 [ I log ( 2 π ) + log | Σ 0 ( θ ) | + tr ( S Σ 0 ( θ ) − 1 ) + ( x ¯ − μ 0 ( θ ) ) ⊤ Σ 0 ( θ ) − 1 ( x ¯ − μ 0 ( θ ) ) ] .$
In this article, we are only concerned with the statistical behavior of parameter estimates in the population (i.e., infinitely large sample sizes). Then, the sample quantities $x ¯$ and $S$ are replaced by the population parameters $μ$ and $Σ$. The fitting function in Equation (1) can then be rewritten as
$F ML ( θ ; μ , Σ ) = − log | Σ 0 ( θ ) | − tr ( Σ Σ 0 ( θ ) − 1 ) − ( μ − μ 0 ( θ ) ) ⊤ Σ 0 ( θ ) − 1 ( μ − μ 0 ( θ ) ) .$
In practice, the model-implied covariance matrix will be misspecified [16], and $θ$ is a pseudo-true parameter that is defined as the maximizer of the fitting function $F ML$.
A more general class of fitting functions is weighted least squares (WLS) estimation [15]. The parameter vector $θ$ is determined as the minimizer of
$F WLS ( θ ; μ , σ ) = ( μ − μ 0 ( θ ) ) ⊤ W 1 ( μ − μ 0 ( θ ) ) + ( σ − σ 0 ( θ ) ) ⊤ W 2 ( σ − σ 0 ( θ ) )$
with known weight matrices $W 1$ and $W 2$. The vectors $σ$ and $σ 0$ contain the nonduplicated elements from covariance matrices $Σ$ and $Σ 0 ( θ )$. Diagonally weighted least squares (DWLS) estimation results by choosing diagonal weight matrices $W 1$ and $W 2$. If these matrices are identity matrices, unweighted least squares (ULS) estimation is obtained. Interestingly, the minimization in (2) can be interpreted as a nonlinear least squares estimation problem with sufficient statistics $μ$ and $Σ$ as input data [17].
It has been shown that ML estimation can be approximately written as DWLS estimation [18] with particular weight matrices. DWLS can be generally written as
$F DWLS ( θ ; μ , σ ) = ∑ i = 1 I w 1 i ( μ i − μ 0 , i ( θ ) ) 2 + ∑ i = 1 I ∑ j = i I w 2 i j ( σ i j − σ 0 , i j ( θ ) ) 2 ,$
where $μ i$ etc. indicate the corresponding elements of the vectors defined in (3). In ML estimation, the weights are approximately determined by $w 1 i = 1 / u i 2$ and $w 2 i j = 1 / ( u i 2 u j 2 )$, where the $u i 2$ are standardized unique variances given by $u i 2 = ϕ i / σ i i$. With smaller residual variances $ϕ i$, more trust is put on a mean $μ i$ or a covariance $σ i j$ in the fitting function. This kind of weighting seems questionable in the case of misspecified models [18].
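The following sketch illustrates ULS estimation (identity weight matrices) for the tau-equivalent model at the population level; the true parameter values and the use of a generic numerical optimizer are assumptions made purely for the illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative sketch (all parameter values assumed): recover the
# tau-equivalent parameters (psi, phi, nu_1, ..., nu_I) from population
# moments by minimizing the least-squares fitting function with identity
# weight matrices (ULS).
I_items = 4
psi_true, phi_true = 0.9, 0.4
nu_true = np.array([0.0, 0.3, -0.2, 0.5])
mu = nu_true.copy()                          # E(F) = 0 implies mu = nu
Sigma = psi_true * np.ones((I_items, I_items)) + phi_true * np.eye(I_items)
iu = np.triu_indices(I_items)                # nonduplicated covariance elements

def f_uls(theta):
    psi, phi, nu = theta[0], theta[1], theta[2:]
    Sigma0 = psi * np.ones((I_items, I_items)) + phi * np.eye(I_items)
    return np.sum((mu - nu) ** 2) + np.sum((Sigma[iu] - Sigma0[iu]) ** 2)

res = minimize(f_uls, x0=np.concatenate(([1.0, 1.0], np.zeros(I_items))))
psi_hat, phi_hat = res.x[0], res.x[1]
print(round(psi_hat, 3), round(phi_hat, 3))  # close to psi_true and phi_true
```

Because the weights are identity matrices, all mean and covariance residuals contribute equally; multiplying each term by weights $w 1 i$ and $w 2 i j$ would give the DWLS variant described above.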
The model deviations $μ i − μ 0 , i ( θ )$ and $σ i j − σ 0 , i j ( θ )$ can be differently weighted by replacing the least squares functions with robust fitting functions $ρ$ [19,20]:
$F rob ( θ ; μ , σ ) = ∑ i = 1 I w 1 i ρ ( μ i − μ 0 , i ( θ ) ) + ∑ i = 1 I ∑ j = i I w 2 i j ρ ( σ i j − σ 0 , i j ( θ ) ) .$
Siemsen and Bollen [19] proposed the absolute value function $ρ ( x ) = | x |$ for fitting the factor analysis model. This fitting function is robust to a few model violations such as unmodelled correlations of residuals $ε i$. Alternative robust loss functions such as $ρ ( x ) = | x | p$ with $0 < p < 1$ can ensure even more model-robust estimates [21].
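A minimal sketch of such a robust loss, using the differentiable approximation $ρ ( x ) = ( x 2 + ε ) p / 2$ with assumed values for p and $ε$, shows how large deviations are penalized much less severely than under the square loss:

```python
import numpy as np

# Differentiable approximation of the robust loss rho(x) = |x|^p
# (p and eps are assumed example values, not taken from the article).
def rho(x, p=0.5, eps=1e-6):
    return (x ** 2 + eps) ** (p / 2)

# Compared with the square loss x**2, a large model deviation (x = 5)
# contributes far less to the fitting function, limiting its influence.
for x in (0.1, 1.0, 5.0):
    print(x, round(rho(x), 4), round(x ** 2, 4))
```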

#### 2.4. Estimation in the Presence of Slight Model Misspecifications

We now study the behavior of the estimate $θ$ as the minimizer of $F rob$ in (4) (see [16,22]). A nondifferentiable loss function $ρ$ is substituted by a differentiable approximation (e.g., $ρ ( x ) = | x |$ is replaced by $ρ ( x ) = ( x 2 + ε ) 1 / 2$ for a small $ε > 0$; see [21]) in the following derivation.
We investigate slight model misspecifications for the mean and the covariance structures. For the mean structure, we define residuals $γ i ( θ ) = μ i − μ 0 , i ( θ )$ for $i = 1 , … , I$. Furthermore, we define $δ i j ( θ ) = σ i j − σ 0 , i j ( θ )$ for model deviations in the covariance structure. The estimate $θ$ is obtained by taking the derivative of $F rob$ with respect to $θ$ and setting it to zero:
$H ( μ , σ , θ ) = ∑ i = 1 I w 1 i ρ ′ ( μ i − μ 0 , i ( θ ) ) a i ( θ ) + ∑ i = 1 I ∑ j = i I w 2 i j ρ ′ ( σ i j − σ 0 , i j ( θ ) ) b i j ( θ ) = 0 ,$
where $a i ( θ ) = ∂ μ 0 , i ∂ θ$ and $b i j ( θ ) = ∂ σ 0 , i j ∂ θ$. A parameter estimate $θ$ is obtained by computing the root of $H$ in Equation (5). Note that $θ$ is a function of the mean vector $μ$ and the stacked covariance matrix $σ$.
Assume that there is a true parameter $θ 0$ that would be obtained if all model deviations $γ i$ and $δ i j$ were zero. That is, we assume
$H ( μ 0 ( θ 0 ) , σ 0 ( θ 0 ) , θ 0 ) = 0 .$
Now, use the notation $γ = μ − μ 0 ( θ 0 )$ and $δ = σ − σ 0 ( θ 0 )$. A first-order Taylor expansion of $H$ using (6) provides
$H ( μ 0 ( θ 0 ) + γ , σ 0 ( θ 0 ) + δ , θ ) ≃ H μ ( μ 0 ( θ 0 ) ) γ + H σ ( σ 0 ( θ 0 ) ) δ + H θ ( θ 0 ) ( θ − θ 0 ) = 0 ,$
where $H μ ( μ 0 ( θ 0 ) ) = ∂ H ∂ μ | μ = μ 0 ( θ 0 )$,$H σ ( σ 0 ( θ 0 ) ) = ∂ H ∂ σ | σ = σ 0 ( θ 0 )$ and $H θ ( θ 0 ) = ∂ H ∂ θ | θ = θ 0$. Note that we suppress arguments in $H μ$, $H σ$ and $H θ$ in our abbreviated notation. Using the approximation (7), we obtain
$θ = θ 0 − H θ − 1 ( θ 0 ) [ H μ ( μ 0 ( θ 0 ) ) γ + H σ ( σ 0 ( θ 0 ) ) δ ] .$
Model deviations $γ$ and $δ$ enter the computation according to
$H μ ( μ 0 ( θ 0 ) ) γ + H σ ( σ 0 ( θ 0 ) ) δ = ∑ i = 1 I w 1 i ρ ″ ( γ i ) a i ( θ 0 ) γ i + ∑ i = 1 I ∑ j = i I w 2 i j ρ ″ ( δ i j ) b i j ( θ 0 ) δ i j .$
When using the square loss function $ρ ( x ) = x 2$ in ULS estimation, $ρ ″ ( x ) = 2$ and all model deviations contribute equally in the adapted parameter estimate $θ$. In contrast, when using a robust loss function $ρ ( x ) = | x | p$ for $p ≤ 1$, model deviations $γ i$ and $δ i j$ are differentially weighted according to $ρ ″ ( γ i )$ and $ρ ″ ( δ i j )$ in Equation (9), respectively [21].
The result in Equation (8) highlights that the model deviations $γ$ and $δ$ enter the computation of the model parameter $θ$. With a suitable loss function $ρ$, the influence of model deviations can be reduced if the second derivative $ρ ″$ is sufficiently small for gross model misspecifications.
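A one-parameter analog with assumed toy numbers illustrates this influence reduction: minimizing $∑ i ρ ( x i − c )$ over a location parameter c yields the mean under the square loss, which is pulled toward a gross deviation, whereas a robust loss with p < 1 essentially ignores it:

```python
import numpy as np

# Assumed toy residuals; the entry 3.0 plays the role of a gross
# model misspecification among otherwise small deviations.
x = np.array([0.0, 0.1, -0.1, 0.05, -0.05, 3.0])

def fit_location(p, eps=1e-6):
    # minimize sum_i rho(x_i - c) over a grid of candidate values c
    grid = np.linspace(-1.0, 4.0, 5001)
    loss = [np.sum(((x - c) ** 2 + eps) ** (p / 2)) for c in grid]
    return grid[int(np.argmin(loss))]

print(fit_location(2.0))   # square loss: pulled to the mean 0.5
print(fit_location(0.5))   # robust loss: stays near the bulk around 0
```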

## 3. Group Comparisons in the Tau-Equivalent Model with Noninvariant Item Intercepts

Suppose that there are a fixed number of G groups. In each of the G groups, there exists a mean vector $μ g = ( μ 1 g , … , μ I g )$ and a covariance matrix $Σ g = ( σ i j g ) i j$ for items $X g = ( X 1 g , … , X I g )$. A latent variable $F g$ in a one-dimensional factor model summarizes the distribution of items in each group $g = 1 , … , G$. Define $α g = E ( F g )$ and $ψ g = Var ( F g )$. In the sequel, we discuss the identification of $α g$ and $ψ g$ for various fitting functions (i.e., estimation methods).
We now model the mean structure $μ g$ and the covariance structure $Σ g$ with the tau-equivalent one-dimensional factor model in each group g. Assuming the identification constraint $E ( F g ) = 0$ and estimating $ψ g = Var ( F g )$ poses the restrictions
$μ g = ν g and Σ g = ψ g 11 ⊤ + ϕ g I .$
It can be seen from (10) that the mean structure is estimated without constraints, while severe constraints on the covariance structure are imposed.
If the mean $α g$ and the variance $ψ g$ of the factor variable $F g$ are also to be determined in each group to enable a comparison of the factor variable across groups, additional identification constraints have to be posed. This can be seen by including these parameters in Equation (10):
$μ g = α g 1 + ν g and Σ g = ψ g 11 ⊤ + ϕ g I .$
A popular identification constraint is to assume invariant item intercepts $ν 0$ across groups. In this case of MI, (11) simplifies to
$μ g = α g 1 + ν 0 and Σ g = ψ g 11 ⊤ + ϕ g I .$
The group means and group variances can be identified by assuming $α 1 = 0$. The condition (12) can be characterized as scalar invariance [7]. The MI assumption (12) can be statistically tested [7]. If the MI hypothesis (12) is not rejected, $α g$ and $ψ g$ can be uniquely determined. In the violation of MI (i.e., measurement noninvariance; MNI), there is indeterminacy in defining group means and group variances. Identification constraints are implicitly posed by assuming particular fitting functions. We discuss several alternative fitting functions and draw relations among the different approaches below. In the following treatment, we allow group-specific item intercepts in the data-generating model:
$μ g = α g 1 + ν 0 + ν g ∗ and Σ g = ψ g 11 ⊤ + ϕ g I .$
The group-specific item intercepts $ν g ∗$ are residuals that describe differences from the common item intercepts $ν 0$. Hence, violation of MI (i.e., MNI) is represented in $ν g ∗$. Condition (13) is also characterized as metric invariance [7].

#### 3.1. Joint Estimation

In joint estimation, group means, group variances and common item parameters are estimated. However, MNI effects are not explicitly modeled as additional parameters. Group-specific means $μ g$ and covariances $Σ g$ are used for determining the vector of model parameters $θ = ( ν 0 , α 2 , … , α G , ψ 1 , … , ψ G , ϕ 1 , … , ϕ G )$. In Section 2.3, it was argued that many estimation methods like ML estimation could be (approximately) characterized as DWLS estimation. DWLS uses the square loss function, but one can stick to the more general case of a loss function $ρ$ (see Equation (4)). Using a set of known weights $w 1 i g$ and $w 2 i j g$ for $i , j = 1 , … , I$ and $g = 1 , … , G$, the following fitting function is minimized:
$F ( θ ) = ∑ g = 1 G ∑ i = 1 I w 1 i g ρ ( μ i g − μ 0 , i g ( θ ) ) + ∑ g = 1 G ∑ i = 1 I ∑ j = i I w 2 i j g ρ ( σ i j g − σ 0 , i j g ( θ ) ) .$
Note that the order in the summation in (14) across groups (index g) and items (indices i and j) can be swapped. The model assumption (13) for $μ 0 , i g ( θ )$ and $σ 0 , i j g ( θ )$ can be included in (14), and we obtain
$F ( θ ) = ∑ g = 1 G ∑ i = 1 I w 1 i g ρ ( μ i g − α g − ν 0 i ) + ∑ g = 1 G ∑ i = 1 I ∑ j = i I w 2 i j g ρ ( σ i j g − ψ g − ϕ g 1 { i = j } ) ,$
where $1 A$ denotes the indicator function for a set A. Because of the invariance assumption for item loadings in (13), the second term in (15) will be exactly zero, and $ψ 1 , … , ψ G$ and $ϕ 1 , … , ϕ G$ can be uniquely determined from the data. One can choose $ψ g = σ i j g$ for any $i ≠ j$. Then, $ϕ g = σ i i g − ψ g$ for any i. The group means $α g$ and common item intercepts $ν i 0$ can be estimated by minimizing
$F 1 ( θ ) = ∑ g = 1 G ∑ i = 1 I w 1 i g ρ ( μ i g − α g − ν i 0 ) .$
It can be seen that (16) corresponds to an analysis of variance model in which the two-way data $μ i g$ for items $i = 1 , … , I$ and groups $g = 1 , … , G$ are represented by two sets of main effects $α g$ and $ν i 0$ [23]. For the item intercepts $ν i 0$, we obtain the estimating equations
$∂ F 1 ∂ ν i 0 = − ∑ g = 1 G w 1 i g ρ ′ ( μ i g − α g − ν i 0 ) = 0 for i = 1 , … , I .$
For group means $α g$, we similarly obtain
$∂ F 1 ∂ α g = − ∑ i = 1 I w 1 i g ρ ′ ( μ i g − α g − ν i 0 ) = 0 for g = 2 , … , G .$
Due to the assumption (13), we have $μ i g − α g − ν i 0 = ν i g ∗$ for the group-specific item intercepts. Hence, it follows from (18) that
$∑ i = 1 I w 1 i g ρ ′ ( ν i g ∗ ) = 0 for all g = 2 , … , G .$
From (17), we get
$∑ g = 1 G ∑ i = 1 I w 1 i g ρ ′ ( μ i g − α g − ν i 0 ) = 0 .$
From (19) and (20), we finally obtain
$∑ i = 1 I w 1 i g ρ ′ ( ν i g ∗ ) = 0 for all g = 1 , … , G .$
The finding in Equation (21) demonstrates that an assumption about the group-specific residual item intercepts $ν i g ∗$ is implicitly made when fitting a multiple-group factor model under violation of MI. Hence, the group means $α g$ depend on the chosen set of weights $w 1 i g$ and the loss function $ρ$.
When choosing $w 1 i g ≡ 1$, the condition of partial invariance (PI; [24]) is obtained for the loss function $ρ ( x ) = | x | p$ with $p → 0$, which takes the value 0 iff $x = 0$ and 1 otherwise. In PI, it is typically assumed that there is a small subset of items for which $ν i g ∗ ≠ 0$ (i.e., $ρ ( ν i g ∗ ) = 1$), while for the majority of items it holds that $ν i g ∗ = 0$ (i.e., $ρ ( ν i g ∗ ) = 0$). The loss function in (16) then minimizes the number of group-specific residual item intercepts that differ from zero [25].
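The consequence of the identification constraint (21) can be illustrated with a toy example (all numbers assumed): two groups, five items, and one noninvariant item intercept. Under the square loss, the estimated group mean absorbs part of the noninvariance, whereas a near-L0 loss reproduces the partial-invariance behavior:

```python
import numpy as np

# Hypothetical example: two groups, five items, one noninvariant intercept
# (nu*_{52} = 1).  We minimize F_1 = sum_{g,i} rho(mu_ig - alpha_g - nu_i0)
# with alpha_1 = 0, profiling the nu_i0 out on a grid.
nu0 = np.array([0.0, 0.2, -0.1, 0.4, 0.1])
alpha2_true = 0.3
mu = np.vstack([nu0, alpha2_true + nu0 + np.array([0.0, 0.0, 0.0, 0.0, 1.0])])

def rho(x, p, eps=1e-8):
    return (x ** 2 + eps) ** (p / 2)

def fit_alpha2(p):
    a_grid = np.linspace(0.0, 1.0, 101)        # candidates for alpha_2
    n_grid = np.linspace(-0.5, 1.5, 401)       # candidates for each nu_i0
    best_a, best_loss = None, np.inf
    for a2 in a_grid:
        loss = sum(np.min(rho(mu[0, i] - n_grid, p)
                          + rho(mu[1, i] - a2 - n_grid, p)) for i in range(5))
        if loss < best_loss:
            best_a, best_loss = a2, loss
    return best_a

print(round(fit_alpha2(2.0), 2))   # square loss: biased toward 0.5
print(round(fit_alpha2(0.1), 2))   # near-L0 loss: close to the true 0.3
```

With the square loss, the single noninvariant item shifts the group mean by the average of the residual intercepts; with $p → 0$, the noninvariant item is effectively discarded, as in PI.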

#### 3.2. Linking

In linking methods [26], the one-dimensional factor model is firstly estimated in each of the groups. In the tau-equivalent model, the group variance $ψ g$ can be identified, but the group-specific estimation only provides identified item intercepts $ν i g$ that are given as (see (13))
$ν i g = α g + ν i 0 + ν i g ∗ .$
Note that item intercepts $ν i g$ coincide with group-specific item means $μ i g = E ( X i g )$.
In a second step of the linking approach, the identified intercepts are used to determine group means $α g$ and common item intercepts $ν i 0$ [21]. By defining $θ = ( α 2 , … , α G , ν 10 , … , ν I 0 )$, a linking function H is defined by
$H ( θ ) = ∑ g = 1 G ∑ i = 1 I w 1 i g ρ ( ν i g − α g − ν i 0 )$
using some set of weights $w 1 i g$ that are chosen equal to one in many applications; H is minimized with respect to $θ$. Again, the order of summation in (22) across groups (index g) and items (index i) can be swapped. The linking function in (22) can be considered as Haberman linking (HL; [21,27,28]). In [28], the loss function $ρ ( x ) = x 2$ was used, while [21] treated the robust loss function $ρ ( x ) = | x | p$ for $p ∈ [ 0 , 2 ]$. Note that for the tau-equivalent model, the minimization problem (22) is exactly the same as the minimization problem (16) in joint estimation if the covariance structure is correctly specified, because the item intercepts then coincide with the observed group-specific item means. Hence, the same condition as in (21) holds for the group-specific residual item intercepts $ν i g ∗$.
An alternative linking approach has been proposed that avoids estimating common item intercepts $ν i 0$. In invariance alignment (IA; [29]), the following function G is minimized for determining $θ = ( α 2 , … , α G )$ while setting $α 1 = 0$:
$G ( θ ) = ∑ g = 1 G ∑ h = 1 G ∑ i = 1 I w 1 i g w 1 i h ρ ( ν i g − ν i h − α g + α h ) ,$
where the loss function $ρ ( x ) = | x | p$ for $p ≥ 0$ is utilized [21,30]. The power $p = 0.5$ is most frequently chosen because it is the default in the software package Mplus [30]. It was empirically found that the IA minimization in (23) provides very similar group mean estimates as the minimization in (22) that also involves the estimation of common item intercepts [21]. Indeed, the loss function $ρ ( x ) = | x | p$ is a subadditive function for $p ≤ 1$ [31] which means that
$ρ ( x + y ) ≤ ρ ( x ) + ρ ( y ) for all x , y ∈ R .$
By defining $x = ν i g − α g − ν i 0$ and $y = − ( ν i h − α h − ν i 0 )$, we get from (23) by using (24)
$G ( θ ) ≤ ∑ g = 1 G ∑ i = 1 I w ˜ 1 i g ρ ( ν i g − α g − ν i 0 ) = H ˜ ( θ ) ,$
where the weights $w ˜ 1 i g$ are defined as
$w ˜ 1 i g = 2 w 1 i g ∑ h = 1 G w 1 i h .$
Hence, the majorizing function $H ˜$ in (25) is exactly given by the minimization function H in (22) when using properly defined weights $w ˜ 1 i g$ but the same loss function $ρ$. As a conclusion, joint estimation and linking methods can be regarded as exchangeable in the tau-equivalent model. They pose the same identification constraints on group-specific residual item intercepts.
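The majorization $G ( θ ) ≤ H ˜ ( θ )$ can be checked numerically; the sketch below uses randomly generated inputs (an assumption made purely for illustration) together with the weights $w ˜ 1 i g$ from (26):

```python
import numpy as np

# Numerical check of G(theta) <= H_tilde(theta) for rho(x) = |x|^p with
# p = 0.5; all inputs are randomly generated (illustrative assumption).
rng = np.random.default_rng(0)
n_groups, n_items, p = 3, 6, 0.5
nu = rng.normal(size=(n_groups, n_items))      # identified intercepts nu_ig
w = rng.uniform(0.5, 1.5, size=(n_groups, n_items))
alpha = rng.normal(size=n_groups)              # arbitrary group means
nu0 = rng.normal(size=n_items)                 # arbitrary common intercepts

rho = lambda x: np.abs(x) ** p
G_val = sum(w[g, i] * w[h, i] * rho(nu[g, i] - nu[h, i] - alpha[g] + alpha[h])
            for g in range(n_groups) for h in range(n_groups)
            for i in range(n_items))
w_tilde = 2.0 * w * w.sum(axis=0)              # w~_1ig = 2 w_1ig sum_h w_1ih
H_val = sum(w_tilde[g, i] * rho(nu[g, i] - alpha[g] - nu0[i])
            for g in range(n_groups) for i in range(n_items))
print(G_val <= H_val)                          # True, by subadditivity of rho
```

Because subadditivity holds for every pair of arguments, the bound holds for arbitrary $α g$ and $ν i 0$, not only at the minimizer.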
Note that the two-step linking approach can be rewritten as a one-step estimation approach with overidentified parameters. Identification is ensured by posing side constraints implied by the linking function [32]. A joint optimization problem can be formulated by using Lagrange multipliers. Assume that there are weights $w 1 i g ∗$ associated with group-wise ML estimation and different weights $w 1 i g$ in the linking method. Suppose that we parametrize $ν i g = ν i 0 + α g + ν ˜ i g$. The reformulated one-step fitting function $F Lagrange$ of the two-step linking approach, using Lagrange multipliers $ℓ 1 g$ and $ℓ 2 i$, is given by (see [32])
$F Lagrange ( θ ) = ∑ g = 1 G ∑ i = 1 I w 1 i g ∗ ( μ i g − α g − ν i 0 − ν ˜ i g ) 2 + ∑ g = 2 G ℓ 1 g ∑ i = 1 I w 1 i g ρ ′ ( ν ˜ i g ) + ∑ i = 1 I ℓ 2 i ∑ g = 1 G w 1 i g ρ ′ ( ν ˜ i g ) ,$
where $θ$ now also includes the $G − 1 + I$ Lagrange multipliers $ℓ 1 g$ and $ℓ 2 i$. Note that the second and third terms in (27) correspond to the estimating equations obtained from the linking method for determining group means $α g$ and common item intercepts $ν i 0$.

#### 3.3. Regularization

MNI has also been tackled by regularization methods [33,34,35]. The main idea is to introduce nonidentified group-specific residual item intercepts $ν ˜ i g$ in ML estimation. The estimation problem becomes identifiable by adding a penalty function $Ƥ$ to the negative log-likelihood function [36]. Again, the covariance structure is assumed to be correctly specified. The estimated model parameters are collected in the vector $θ = ( ν 10 , … , ν I 0 , α 2 , … , α G , … , ν ˜ i g , … )$. Using the result (3), the fitting function can be approximately written as
$F reg ( θ ) = ∑ g = 1 G ∑ i = 1 I w 1 i g ( μ i g − α g − ν i 0 − ν ˜ i g ) 2 + ∑ g = 1 G ∑ i = 1 I Ƥ ( κ , ν ˜ i g ) ,$
where $κ > 0$ is a tuning parameter. A popular penalty function is the lasso penalty $Ƥ ( κ , x ) = κ | x |$, but alternative lasso-type penalty functions with similar behavior around $x = 0$ but more desirable statistical properties have been proposed [36]. Alternatively, the ridge penalty $Ƥ ( κ , x ) = κ x 2$ can be used, which controls the variability of MNI effects.
When using lasso-type penalty functions, the residual intercepts $ν ˜ i g$ can be interpreted as outliers. Indeed, for the lasso penalty, it has been shown that the minimization of $F reg$ in (28) using regularization is equivalent to robust regression with outlier detection [37,38]. By defining $θ ˜ = ( ν 10 , … , ν I 0 , α 2 , … , α G )$ and an appropriate loss function $ρ ˜$, the minimization problem (28) can be rewritten as
$F rob ( θ ˜ ) = ∑ g = 1 G ∑ i = 1 I w 1 i g ρ ˜ ( μ i g − α g − ν i 0 ) .$
Hence, regularized ML estimation can be equivalently recognized as joint estimation using a particular loss function that enables the efficient detection of outliers.
We now characterize the solution of (28) in more detail. From the behavior of the lasso penalty, it is reasonable to assume that the estimated residual item intercepts $ν ˜ i g$ equal zero iff $| ν i g ∗ | < κ$ holds for the true residual item intercepts $ν i g ∗$. On the other hand, we can assume that $ν ˜ i g = ν i g ∗$ iff $| ν i g ∗ | > κ$.
$∂ F reg ∂ α g = − 2 ∑ i = 1 I w 1 i g ( μ i g − α g − ν i 0 − ν ˜ i g ) = 0 .$
By relying on the just mentioned properties for estimated residual item intercepts $ν ˜ i g$, we obtain the condition
$∑ i = 1 I w 1 i g ν i g ∗ 1 { | ν i g ∗ | < κ } = 0 for all g = 1 , … , G .$
The result in (30) indicates that MNI cancels out on average for small effects $ν i g ∗$ that fulfill $| ν i g ∗ | < κ$. Note that this set of effects is implicitly estimated in regularized ML estimation.
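For a single residual intercept, the lasso-penalized problem in (28) has a closed-form soft-thresholding solution; the sketch below (a standalone scalar problem with assumed values, using the exact threshold $κ / ( 2 w )$ rather than the rough threshold $κ$ used in the text) verifies it against a brute-force grid search:

```python
import numpy as np

# Scalar lasso problem:  minimize over t:  w * (nu_star - t)^2 + kappa * |t|.
# Its closed-form solution is soft thresholding at kappa / (2 * w); small
# residual intercepts are therefore set exactly to zero.
def soft_threshold(nu_star, w, kappa):
    thr = kappa / (2.0 * w)
    return np.sign(nu_star) * max(abs(nu_star) - thr, 0.0)

def brute_force(nu_star, w, kappa):
    t = np.linspace(-2.0, 2.0, 400001)
    return t[np.argmin(w * (nu_star - t) ** 2 + kappa * np.abs(t))]

w, kappa = 1.0, 0.4
for nu_star in (0.1, -0.15, 0.8):
    print(nu_star, soft_threshold(nu_star, w, kappa),
          round(brute_force(nu_star, w, kappa), 5))
```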
In the case of a general penalty function $Ƥ$, define $Ƥ 1 = ∂ Ƥ ∂ x$. Note that we replace a nondifferentiable penalty function with a differentiable approximation $Ƥ$ [33,39]. For determining the group mean $α g$, the condition (29) does not change. For determining $ν ˜ i g$, we get the estimating equation
$− w 1 i g ( μ i g − α g − ν i 0 − ν ˜ i g ) + Ƥ 1 ( κ , ν ˜ i g ) = − w 1 i g ( ν i g ∗ − ν ˜ i g ) + Ƥ 1 ( κ , ν ˜ i g ) = 0 .$
By using (31), there exists a function $Q$ such that $ν ˜ i g = Q ( ν i g ∗ )$. Moreover, by summing (31) across items $i = 1 , … , I$, we receive
$∑ i = 1 I [ − w 1 i g ( ν i g ∗ − ν ˜ i g ) + Ƥ 1 ( κ , ν ˜ i g ) ] = 0 .$
Because (29) holds, we get
$∑ i = 1 I w 1 i g ( ν i g ∗ − ν ˜ i g ) = 0 and ∑ i = 1 I Ƥ 1 ( κ , ν ˜ i g ) = 0 .$
This means that estimated effects $ν ˜ i g = Q ( ν i g ∗ )$ somehow vanish on average according to their contribution in $Ƥ 1$.
Instead of introducing group-specific residual item intercepts $ν i g ∗$ in regularized ML estimation, one can employ fused regularized ML estimation [40] that relies on overidentified group-specific item intercepts $ν ˘ i g$. The fitting function of fused regularized ML estimation is defined as
$F fusedreg ( θ ) = ∑ g = 1 G ∑ i = 1 I w 1 i g ( μ i g − α g − ν ˘ i g ) 2 + ∑ g = 1 G − 1 ∑ h = g + 1 G ∑ i = 1 I Ƥ ( κ , ν ˘ i g − ν ˘ i h ) .$
In this case, the parameter vector $θ$ does not include common item intercepts $ν 0$. The nonidentification issue of ML estimation is solved by defining a penalty function $Ƥ$ on deviations $ν ˘ i g − ν ˘ i h$. By using lasso-type penalty functions in (32), clusters of $ν ˘ i g$ coefficients will be obtained. If there are only a few outlying parameters for each item, estimated group means $α g$ from fused regularized ML using the fitting function in (32) will often be similar to those in regularized ML estimation using the fitting function in (28).
Another popular approach to handling MNI is the Bayesian approximate measurement invariance model (BAMI; [41,42,43]). The tau-equivalent model is estimated with an overidentified parameter vector that includes all item intercepts $ν i g$. To ensure the identification of the model, a normal prior with known variance on all pairwise deviations $ν i g − ν i h$ is posed. The normal prior distribution on $ν i g − ν i h$ can be regarded as a ridge penalty function of the form $κ ˜ ( ν i g − ν i h ) 2$ (see, e.g., [44]). Hence, BAMI can be recognized as fused regularized ML estimation with a particular penalty function in (32).
Interestingly, Battauz [39] showed for regularized estimation in the four-parameter item response model that the ridge penalty on differences $ν i g − ν i h$ can be rewritten as a penalty for residual item intercepts $ν ˜ i g = ν i g − ν i 0$:
$κ ˜ ∑ g = 1 G ∑ h = 1 G ∑ i = 1 I ( ν i g − ν i h ) 2 = κ ∑ g = 1 G ∑ i = 1 I ν ˜ i g 2$
using an appropriate tuning parameter $κ$. By replacing the Markov chain Monte Carlo estimation method of the BAMI model with regularized ML estimation, we obtain the fitting function
$F BAMI ( θ ) = ∑ g = 1 G ∑ i = 1 I w 1 i g ( μ i g − α g − ν i 0 − ν ˜ i g ) 2 + κ ∑ g = 1 G ∑ i = 1 I ν ˜ i g 2 ,$
which is regularized ML estimation using a ridge penalty function. For determining group means $α g$, we get the estimating equation
$∑ i = 1 I w 1 i g ( μ i g − α g − ν i 0 − ν ˜ i g ) = 0 .$
For determining $ν ˜ i g$, we get
$− w 1 i g ( μ i g − α g − ν i 0 − ν ˜ i g ) + κ ν ˜ i g = 0 .$
Hence, the estimated group-specific residual item intercepts are shrunken estimates of $ν i g ∗$
$ν ˜ i g = w 1 i g / ( w 1 i g + κ ) ( μ i g − α g − ν i 0 ) = w 1 i g / ( w 1 i g + κ ) ν i g ∗$
that fulfill $∑ i = 1 I ν ˜ i g = 0$. By using (34), we get from (33)
$∑ i = 1 I w 1 i g ( ν i g ∗ − ν ˜ i g ) = ∑ i = 1 I κ w 1 i g / ( w 1 i g + κ ) ν i g ∗ = 0 .$
With equal weights $w 1 i g$ for all items within a group g, this shows that BAMI and ML pose the same identification constraints on $ν i g ∗$; that is, $∑ i = 1 I ν i g ∗ = 0$.
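The shrinkage formula (34) can be verified directly; the following sketch with assumed toy values confirms that $ν ˜ i g = w 1 i g / ( w 1 i g + κ ) ν i g ∗$ solves the estimating equation:

```python
import numpy as np

# Toy check (assumed values) that nu_tilde = w / (w + kappa) * nu_star solves
# -w * (nu_star - nu_tilde) + kappa * nu_tilde = 0 for each item.
w, kappa = 1.2, 0.5
nu_star = np.array([0.3, -0.1, 0.7, -0.9])

nu_tilde = w / (w + kappa) * nu_star
residual = -w * (nu_star - nu_tilde) + kappa * nu_tilde
print(np.allclose(residual, 0.0))       # the closed form satisfies the equation
print(nu_tilde / nu_star)               # uniform shrinkage factor w / (w + kappa)
```

The shrinkage factor is the same for all items within a group, which is why the ridge penalty leaves the weighted identification constraint on $ν i g ∗$ unchanged.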
A variant of IA has been proposed that uses the output of BAMI for determining the alignment solution [45,46,47]. BAMI produces adjusted group-specific item means $μ ˜ i g = α g + ν i 0 + ν ˜ i g$, which are subsequently used as input data for Bayesian IA. Note that
$μ ˜ i g = α g + ν i 0 + ν ˜ i g = μ i g + ν ˜ i g − ν i g ∗ = μ i g − κ / ( w 1 i g + κ ) ν i g ∗ .$
We now substitute the quantity $μ ˜ i g$ obtained in (35) in the IA fitting function (see (25) and weights $w ˜ 1 i g$ defined in (26)):
$H ˜ ( θ ) = ∑ g = 1 G ∑ i = 1 I w ˜ 1 i g ρ ( μ ˜ i g − α g − ν i 0 ) = ∑ g = 1 G ∑ i = 1 I w ˜ 1 i g ρ ( μ i g − κ / ( w 1 i g + κ ) ν i g ∗ − α g − ν i 0 ) .$
The estimating equation for $α g$ is then given by
$∑ i = 1 I w ˜ 1 i g ρ ′ ( μ i g − κ / ( w 1 i g + κ ) ν i g ∗ − α g − ν i 0 ) = 0 .$
Using the definition $μ i g = α g + ν i 0 + ν i g ∗$, we get the identification constraint
$∑ i = 1 I w ˜ 1 i g ρ ′ ( w 1 i g / ( w 1 i g + κ ) ν i g ∗ ) = ∑ i = 1 I w ˜ 1 i g ρ ′ ( ν ˜ i g ) = 0 .$
When equal weights $w 1 i g$ (and $w ˜ 1 i g$) are used in (36), the identification constraints $∑ i = 1 I ρ ′ ( ω ν i g ∗ ) = 0$ with a scaling factor $ω > 0$ are obtained. For $ρ ( x ) = x 2$, it holds that $ρ ′ ( x ) = 2 x$, and we receive the same identification constraints as those obtained with ULS estimation or DWLS estimation with equal weights.

## 4. Group Comparisons in the Tau-Congeneric Model with Noninvariant Item Intercepts

In the following, we discuss group comparisons in the tau-congeneric model. We investigate the consequences of MNI in item intercepts but assume MI in item loadings; that is, item loadings are invariant, and metric invariance holds [6,7]. It will be shown that the derivations for the tau-equivalent model only have to be slightly modified.
For the following examinations, we assume common loadings $\lambda_0$, where the first loading $\lambda_{10}$ is fixed to 1 for reasons of identification in the data-generating model
$\mu_g = \alpha_g \lambda_0 + \nu_0 + \nu_g^\ast \quad \text{and} \quad \Sigma_g = \psi_g \lambda_0 \lambda_0^\top + \Phi_g ,$
where $\Phi_g$ is a diagonal matrix of group-specific residual variances. The violation of MI only pertains to the mean structure due to the presence of group-specific residual item intercepts $\nu_g^\ast$.

#### 4.1. Joint Estimation

In the joint estimation of the tau-congeneric measurement model, the parameter of interest is given as $\theta = ( \nu_0, \lambda_0, \alpha_2, \ldots, \alpha_G, \psi_1, \ldots, \psi_G, \phi_1, \ldots, \phi_G )$, where $\phi_g$ contains the diagonal entries of $\Phi_g$. The general fitting function is defined by
$F(\theta) = \sum_{g=1}^{G} \sum_{i=1}^{I} w_{1ig}\, \rho( \mu_{ig} - \alpha_g \lambda_{i0} - \nu_{i0} ) + \sum_{g=1}^{G} \sum_{i=1}^{I} \sum_{j=i}^{I} w_{2ijg}\, \rho( \sigma_{ijg} - \psi_g \lambda_{i0} \lambda_{j0} - \phi_{ig} \mathbf{1}\{ i = j \} ) .$
Because the covariance structure is correctly specified, common item loadings $\lambda_0$ and group-specific residual variance matrices $\Phi_g$ can be uniquely determined by minimizing the second term in (37). Hence, for determining group means $\alpha_g$, only the first term in (37) must be considered, which results in the fitting function
$F_1(\theta) = \sum_{g=1}^{G} \sum_{i=1}^{I} w_{1ig}\, \rho( \mu_{ig} - \alpha_g \lambda_{i0} - \nu_{i0} ) .$
Taking the same steps as in Section 3.1 (see Equation (21)), we get the identification constraints
$\sum_{i=1}^{I} w_{1ig}\, \lambda_{i0}\, \rho'( \nu_{ig}^\ast ) = 0 \quad \text{for all } g = 1, \ldots, G .$
Comparing (39) with (21), it is important to note that the common item loadings $\lambda_{i0}$ now also enter the identification constraint (see also [48]).
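For $\rho(x) = x^2$, the constraint (39) reduces to a weighted least-squares normal equation in which the loadings $\lambda_{i0}$ act as additional weights. A small Python sketch (all parameter values hypothetical) verifies that the fitted residual intercepts satisfy the loading-weighted zero-sum constraint:

```python
# Sketch with hypothetical values: square loss => weighted least squares.
# Data follow mu_ig = alpha_g * lam_i0 + nu_i0 + nu*_ig (invariant loadings).

def alpha_wls(mu, lam, nu0, w):
    """Closed-form minimizer of sum_i w_i (mu_i - alpha*lam_i - nu0_i)^2."""
    num = sum(wi * li * (m - n) for wi, li, m, n in zip(w, lam, mu, nu0))
    den = sum(wi * li * li for wi, li in zip(w, lam))
    return num / den

lam = [1.0, 0.8, 1.2]
nu0 = [0.0, 0.2, -0.1]
nu_star = [0.3, -0.1, 0.0]        # residual intercepts (MNI)
alpha_gen = 0.5                   # generating group mean
mu = [alpha_gen * l + n + s for l, n, s in zip(lam, nu0, nu_star)]
w = [1.0, 1.0, 1.0]

alpha_hat = alpha_wls(mu, lam, nu0, w)
resid = [m - alpha_hat * l - n for m, l, n in zip(mu, lam, nu0)]
# implied constraint (39) with rho'(x) = 2x: loading-weighted residuals vanish
constraint = sum(wi * li * r for wi, li, r in zip(w, lam, resid))
```

The estimated mean absorbs part of the noninvariance ($\hat{\alpha}_g \neq 0.5$ here), while the implied constraint holds by construction.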

#### 4.2. Linking

We now investigate HL in the tau-congeneric model [21] (see [48] for a similar approach). Again, identified parameters (i.e., item intercepts $\nu_{ig}$ and item loadings $\lambda_{ig}$) are first obtained in group-wise estimations
$\nu_{ig} = \alpha_g \lambda_{i0} + \nu_{i0} + \nu_{ig}^\ast \quad \text{and} \quad \lambda_{ig} = \psi_g^{1/2} \lambda_{i0} .$
From (40) and assuming $\lambda_{10} = 1$ for the first item for reasons of identification, it can be seen that all group variances $\psi_g$ can be uniquely identified; we directly get $\psi_g = \lambda_{1g}^2$. For determining group means $\alpha_g$, we define a linking function $H$ for the parameter $\theta = ( \nu_{10}, \ldots, \nu_{I0}, \alpha_2, \ldots, \alpha_G )$ of interest:
$H(\theta) = \sum_{g=1}^{G} \sum_{i=1}^{I} w_{1ig}\, \rho( \nu_{ig} - \alpha_g \lambda_{i0} - \nu_{i0} ) .$
As in Section 3.2, it can be seen that the minimization problem in (41) corresponds to the problem in (38). Hence, the same identification constraints as in (39) are obtained.
As also discussed in Section 3.2, one can argue that IA is very similar to the linking problem in (41) and therefore resembles joint estimation when an appropriate loss function $\rho$ is used.

#### 4.3. Regularization

The fitting function for the mean structure of regularized ML estimation for the tau-congeneric model with invariant item loadings can be written as
$F_{\mathrm{reg}}(\theta) = \sum_{g=1}^{G} \sum_{i=1}^{I} w_{1ig} ( \mu_{ig} - \alpha_g \lambda_{i0} - \nu_{i0} - \tilde{\nu}_{ig} )^2 + \sum_{g=1}^{G} \sum_{i=1}^{I} P( \kappa, \tilde{\nu}_{ig} ) .$
Arguments similar to those in Section 3.3 can be made for demonstrating the equivalence of regularized ML estimation using (42) and joint estimation using a robust loss function $\rho$ (see (38)).
Findings similar to those in Section 3.3 regarding the equivalence of fused regularized ML estimation and BAMI can also be obtained. The similarity of linking methods and regularized ML estimation was also pointed out by [48] in item response models for dichotomous items.
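The mechanism behind this equivalence can be sketched concretely: under a lasso penalty $P(\kappa, \tilde{\nu}) = \kappa |\tilde{\nu}|$, the update of each residual intercept for fixed $\alpha_g$ is a one-dimensional soft-thresholding step, so only strongly noninvariant items receive a nonzero $\tilde{\nu}_{ig}$, mimicking a robust loss [37]. A schematic Python sketch with made-up residuals:

```python
# Schematic sketch (residuals invented): with the lasso penalty
# P(kappa, v) = kappa * |v|, the residual intercept nu~_ig given alpha_g
# is the soft-threshold of the raw residual mu_ig - alpha_g - nu_i0.

def soft_threshold(x, kappa):
    """argmin_v of (x - v)^2 / 2 + kappa * |v| (unit weight)."""
    if x > kappa:
        return x - kappa
    if x < -kappa:
        return x + kappa
    return 0.0

residuals = [0.05, -0.02, 0.90]   # mu_ig - alpha_g - nu_i0 per item
kappa = 0.2
nu_tilde = [soft_threshold(r, kappa) for r in residuals]
# small residuals are set exactly to zero; the large one is shrunk by kappa
```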

## 5. Group Comparisons in the Tau-Congeneric Model with Noninvariant Item Intercepts and Noninvariant Item Loadings

Finally, we consider the estimation of the tau-congeneric measurement model. The data-generating model allows for noninvariant item intercepts and noninvariant item loadings.
Let $x \circ y$ denote the Hadamard product (i.e., the element-wise multiplication of vectors $x$ and $y$). We assume
$\mu_g = \alpha_g\, \lambda_0 \circ \lambda_g^\ast + \nu_0 + \nu_g^\ast \quad \text{and} \quad \Sigma_g = \psi_g ( \lambda_0 \circ \lambda_g^\ast ) ( \lambda_0 \circ \lambda_g^\ast )^\top + \Phi_g .$
For the entries of $\mu_g$ and $\Sigma_g$, we get from (43)
$\mu_{ig} = \alpha_g \lambda_{i0} \lambda_{ig}^\ast + \nu_{i0} + \nu_{ig}^\ast \quad \text{and} \quad \sigma_{ijg} = \psi_g \lambda_{i0} \lambda_{j0} \lambda_{ig}^\ast \lambda_{jg}^\ast + \phi_{ig} \mathbf{1}\{ i = j \} .$
Then, MNI is represented in $\lambda_g^\ast$ and $\nu_g^\ast$. Note that a value of $\lambda_{ig}^\ast$ equal to 1 indicates MI of a particular item parameter, while MNI is represented by values different from 1.
It is important to emphasize that deviations from MI in item loadings are modeled as multiplicative effects. This assumption implies an additive representation for the logarithmized identified group-specific item loadings $l_{ig} = \log \lambda_{ig}$:
$l_{ig} = f_g + l_{i0} + l_{ig}^\ast ,$
where $l_{i0} = \log \lambda_{i0}$, $l_{ig}^\ast = \log \lambda_{ig}^\ast$, and $f_g = \frac{1}{2} \log \psi_g$. Instead of treating deviations as multiplicative errors, one could also assume additive deviations ([49]; see also [50]) such that $\lambda_{ig} = \psi_g^{1/2} ( \lambda_{i0} + \lambda_{ig}^\ast )$. In this case, the group-specific means and covariances can be written as
$\mu_{ig} = \alpha_g ( \lambda_{i0} + \lambda_{ig}^\ast ) + \nu_{i0} + \nu_{ig}^\ast \quad \text{and} \quad \sigma_{ijg} = \psi_g ( \lambda_{i0} + \lambda_{ig}^\ast ) ( \lambda_{j0} + \lambda_{jg}^\ast ) + \phi_{ig} \mathbf{1}\{ i = j \} .$
Note the difference to the parameterization in (44).
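The two parameterizations are connected by $\lambda^\ast_{\mathrm{add}} = \lambda_{i0} ( \lambda^\ast_{\mathrm{mult}} - 1 )$, so both reproduce the same identified loading. A quick numeric check with hypothetical values:

```python
# Numeric check (hypothetical values): multiplicative and additive
# residual loadings describe the same identified loading when
# lam_add = lam_i0 * (lam_mult - 1).
lam_i0 = 0.8
lam_mult = 1.25                       # multiplicative MNI effect
lam_add = lam_i0 * (lam_mult - 1.0)   # equivalent additive MNI effect
psi_g = 1.44

lam_ig_mult = psi_g ** 0.5 * lam_i0 * lam_mult   # multiplicative form (cf. (44))
lam_ig_add = psi_g ** 0.5 * (lam_i0 + lam_add)   # additive form (cf. (45))
```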
We now study the effects of MNI in intercepts and loadings in different estimation approaches for determining group means and group variances in the tau-congeneric model.

#### 5.1. Joint Estimation

In joint estimation, the parameter of interest is given by $\theta = ( \nu_0, \lambda_0, \alpha_2, \ldots, \alpha_G, \psi_2, \ldots, \psi_G, \phi_1, \ldots, \phi_G )$, while we set $\alpha_1 = 0$ and $\psi_1 = 1$ in the estimation. We consider the general fitting function
$F(\theta) = \sum_{g=1}^{G} \sum_{i=1}^{I} w_{1ig}\, \rho( \mu_{ig} - \alpha_g \lambda_{i0} - \nu_{i0} ) + \sum_{g=1}^{G} \sum_{i=1}^{I} \sum_{j=i}^{I} w_{2ijg}\, \rho( \sigma_{ijg} - \psi_g \lambda_{i0} \lambda_{j0} - \phi_{ig} \mathbf{1}\{ i = j \} ) .$
We first investigate the determination of group variances $\psi_g$. Assume that common item loadings $\lambda_0$ have already been determined. Then, we get the estimating equation by taking $\partial F / \partial \psi_g$:
$\sum_{i=1}^{I-1} \sum_{j=i+1}^{I} w_{2ijg}\, \lambda_{i0} \lambda_{j0}\, \rho'( \sigma_{ijg} - \psi_g \lambda_{i0} \lambda_{j0} ) = 0 .$
Note that $\phi_{ig}$ can be uniquely determined, and there is a vanishing contribution of the terms for $i = j$. Using (44), this implies the identification constraint
$\sum_{i=1}^{I-1} \sum_{j=i+1}^{I} w_{2ijg}\, \lambda_{i0} \lambda_{j0}\, \rho'\!\left( \psi_g \lambda_{i0} \lambda_{j0} ( \lambda_{ig}^\ast \lambda_{jg}^\ast - 1 ) \right) = 0 .$
For the loss function $\rho(x) = |x|^p$, it holds that $\rho'( \lambda x ) \propto \rho'( \lambda )\, \rho'( x )$ for any $\lambda \geq 0$, and we thus get from (46)
$\sum_{i=1}^{I-1} \sum_{j=i+1}^{I} w_{2ijg}\, \lambda_{i0} \lambda_{j0}\, \rho'( \psi_g \lambda_{i0} \lambda_{j0} )\, \rho'( \lambda_{ig}^\ast \lambda_{jg}^\ast - 1 ) = 0 .$
Group-specific residual item loadings $\lambda_{ig}^\ast$ vanish on average according to the identification constraint (47). We can further specialize (47) for DWLS estimation (which can also approximate ML estimation), which employs the square loss function $\rho(x) = x^2$:
$\sum_{i=1}^{I-1} \sum_{j=i+1}^{I} w_{2ijg}\, \lambda_{i0}^2 \lambda_{j0}^2\, \psi_g ( \lambda_{ig}^\ast \lambda_{jg}^\ast - 1 ) = 0 .$
Now, we determine group means $\alpha_g$. Assume that common item loadings $\lambda_0$ and intercepts $\nu_0$ have already been determined. By taking $\partial F / \partial \alpha_g$, we get the identification constraints
$\sum_{i=1}^{I} w_{1ig}\, \lambda_{i0}\, \rho'\!\left( \alpha_g \lambda_{i0} ( \lambda_{ig}^\ast - 1 ) + \nu_{ig}^\ast \right) = 0 .$
For DWLS estimation, we get the identification constraint
$\sum_{i=1}^{I} w_{1ig}\, \lambda_{i0} \left( \alpha_g \lambda_{i0} ( \lambda_{ig}^\ast - 1 ) + \nu_{ig}^\ast \right) = 0 .$
It can be seen that both MNI in loadings due to $\lambda_{ig}^\ast$ and MNI in intercepts due to $\nu_{ig}^\ast$ determine the group mean $\alpha_g$.
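This can be made concrete for the square loss: solving the estimating equation for $\hat{\alpha}_g$ with data generated under one noninvariant loading, but fully invariant intercepts, already shifts the estimated group mean away from the generating value. A Python sketch with assumed values:

```python
# Sketch with assumed values: under the square loss, the estimating
# equation sum_i w_i lam_i0 (mu_ig - alpha*lam_i0 - nu_i0) = 0 has the
# closed-form solution below. One noninvariant loading (lam*_2 = 1.3)
# biases alpha_hat even though all intercepts are invariant (nu* = 0).

lam_i0 = [1.0, 0.8, 1.2]
lam_star = [1.0, 1.3, 1.0]     # multiplicative loading MNI for item 2
nu_i0 = [0.1, -0.2, 0.0]
alpha_gen = 1.0                # generating group mean
mu = [alpha_gen * l0 * ls + n0 for l0, ls, n0 in zip(lam_i0, lam_star, nu_i0)]

num = sum(l0 * (m - n0) for l0, m, n0 in zip(lam_i0, mu, nu_i0))
den = sum(l0 * l0 for l0 in lam_i0)
alpha_hat = num / den          # larger than alpha_gen because lam*_2 > 1
```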
For additive deviations from MI that follow (45), the condition (46) is replaced by
$\sum_{i=1}^{I-1} \sum_{j=i+1}^{I} w_{2ijg}\, \lambda_{i0} \lambda_{j0}\, \rho'\!\left( \psi_g ( \lambda_{i0} \lambda_{jg}^\ast + \lambda_{ig}^\ast \lambda_{j0} + \lambda_{ig}^\ast \lambda_{jg}^\ast ) \right) = 0 .$
The condition (49) is replaced by
$\sum_{i=1}^{I} w_{1ig}\, \lambda_{i0}\, \rho'( \alpha_g \lambda_{ig}^\ast + \nu_{ig}^\ast ) = 0 .$
Due to the arbitrariness of using either the multiplicative or the additive representation of MNI effects $\lambda_{ig}^\ast$, the different identification conditions obtained should not be interpreted as conflicting findings but rather as different ways of representing the same identification condition.

#### 5.2. Linking

This subsection discusses the estimation of group means and group variances for linking approaches. We assume that item loadings follow the multiplicative representation of MNI in (44). This corresponds to an additivity assumption for logarithmized item loadings.
At first, the tau-congeneric measurement model is estimated separately in each group. By assuming $\mathrm{E}( F_g ) = 0$ and $\mathrm{Var}( F_g ) = 1$, identified group-specific item intercepts $\nu_{ig}$ and group-specific item loadings $\lambda_{ig}$ are given as
$\nu_{ig} = \alpha_g \lambda_{i0} \lambda_{ig}^\ast + \nu_{i0} + \nu_{ig}^\ast \quad \text{and} \quad \lambda_{ig} = \psi_g^{1/2} \lambda_{i0} \lambda_{ig}^\ast .$
Note that (50) can be equivalently written as
$\nu_{ig} = \alpha_g \lambda_{i0} \lambda_{ig}^\ast + \nu_{i0} + \nu_{ig}^\ast \quad \text{and} \quad \log \lambda_{ig} = \frac{1}{2} \log \psi_g + \log \lambda_{i0} + \log \lambda_{ig}^\ast .$
In HL [21,28], common logarithmized item loadings $l_0 = ( l_{10}, \ldots, l_{I0} ) = \log \lambda_0$ and logarithmized group variances $f_g = \frac{1}{2} \log \psi_g$ are computed in the first step. Note that $\psi_g = \exp( 2 f_g )$. The linking function $H_2$ for $\theta = ( f_2, \ldots, f_G, l_0 )$ under the identification constraint $f_1 = 0$ (i.e., $\psi_1 = 1$) is defined as
$H_2(\theta) = \sum_{g=1}^{G} \sum_{i=1}^{I} w_{2ig}\, \rho( l_{ig} - f_g - l_{i0} ) ,$
where $l_{ig} = \log \lambda_{ig}$. For determining $f_g$ (and, hence, the group variance $\psi_g$), applying $\partial H_2 / \partial f_g$ and considering (51) provides the identification constraint
$\sum_{i=1}^{I} w_{2ig}\, \rho'( \log \lambda_{ig}^\ast ) = 0 .$
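For $\rho(x) = x^2$, this HL step reduces to averaging $l_{ig} - l_{i0}$ across items; when the log-residual loadings average to zero, $\psi_g$ is recovered exactly. An illustrative Python sketch with assumed values:

```python
# Illustrative sketch (assumed values): HL step 1 with rho(x) = x^2
# estimates f_g = (1/2) log psi_g as the mean of l_ig - l_i0; psi_g is
# then recovered as exp(2 f_g). The multiplicative residual loadings
# below have log-mean zero, so psi_g is recovered exactly.
import math

lam_i0 = [1.0, 0.8, 1.2, 0.9]          # common loadings
psi_g = 1.69                           # generating group variance
lam_star = [1.0, 1.1, 1.0 / 1.1, 1.0]  # multiplicative MNI, log-mean zero

l_ig = [math.log(psi_g ** 0.5 * l0 * ls) for l0, ls in zip(lam_i0, lam_star)]
l_i0 = [math.log(l0) for l0 in lam_i0]
f_g = sum(a - b for a, b in zip(l_ig, l_i0)) / len(lam_i0)
psi_hat = math.exp(2.0 * f_g)
```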
In the second step of HL, group means $\alpha_g$ are determined based on the identified item intercepts $\nu_{ig}$, the identified item loadings $\lambda_{ig}$ (see Equation (50)), and the group variances $\psi_g$ obtained in the first step. We discuss a variant of HL (see Equation (25) in [21])
$H_1(\theta) = \sum_{g=1}^{G} \sum_{i=1}^{I} w_{1ig}\, \rho\!\left( \nu_{ig} - \nu_{i0} - \psi_g^{-1/2} \lambda_{ig}\, \alpha_g \right)$
for $\theta = ( \alpha_2, \ldots, \alpha_G, \nu_0 )$ and employing the identification constraint $\alpha_1 = 0$. For determining group means $\alpha_g$, we consider $\partial H_1 / \partial \alpha_g$ and get from (52) the identification constraints
$\sum_{i=1}^{I} w_{1ig}\, \psi_g^{-1/2} \lambda_{ig}\, \rho'\!\left( \nu_{ig} - \nu_{i0} - \psi_g^{-1/2} \lambda_{ig}\, \alpha_g \right) = \sum_{i=1}^{I} w_{1ig}\, \lambda_{i0} \lambda_{ig}^\ast\, \rho'( \nu_{ig}^\ast ) = 0 .$
Another popular linking approach is IA [29]. Although it was originally discussed as a simultaneous linking method for determining group means and group variances, it has been shown in [21] that alignment is equivalent to a two-step linking approach.
In the first step of IA, group standard deviations $p_g$, defined by $\psi_g = p_g^2$, are determined by minimizing the linking function
$G_2(\theta) = \sum_{g=1}^{G} \sum_{h=1}^{G} \sum_{i=1}^{I} w_{2igh}\, \rho\!\left( p_g^{-1} \lambda_{ig} - p_h^{-1} \lambda_{ih} \right) ,$
where $\theta = ( p_2, \ldots, p_G )$ and the identification constraint $p_1 = 1$ (and, hence, $\psi_1 = 1$) is used. For determining $p_g$ (and subsequently $\psi_g$), we receive the following identification condition by taking $\partial G_2 / \partial p_g$:
$\sum_{h=1}^{G} \sum_{i=1}^{I} w_{2igh}\, \lambda_{i0} \lambda_{ig}^\ast\, \rho'\!\left( \lambda_{i0} ( \lambda_{ig}^\ast - \lambda_{ih}^\ast ) \right) = 0 .$
In the second step of IA, group means $\alpha_g$ under the identification constraint $\alpha_1 = 0$ are obtained by minimizing the linking function for $\theta = ( \alpha_2, \ldots, \alpha_G )$:
$G_1(\theta) = \sum_{g=1}^{G} \sum_{h=1}^{G} \sum_{i=1}^{I} w_{1igh}\, \rho\!\left( \nu_{ig} - \nu_{ih} - \psi_g^{-1/2} \lambda_{ig}\, \alpha_g + \psi_h^{-1/2} \lambda_{ih}\, \alpha_h \right) .$
Then, we get the following identification condition by taking $\partial G_1 / \partial \alpha_g$:
$\sum_{h=1}^{G} \sum_{i=1}^{I} w_{1igh}\, \psi_g^{-1/2} \lambda_{ig}\, \rho'\!\left( \nu_{ig} - \nu_{ih} - \psi_g^{-1/2} \lambda_{ig}\, \alpha_g + \psi_h^{-1/2} \lambda_{ih}\, \alpha_h \right) = \sum_{h=1}^{G} \sum_{i=1}^{I} w_{1igh}\, \lambda_{i0} \lambda_{ig}^\ast\, \rho'( \nu_{ig}^\ast - \nu_{ih}^\ast ) = 0 .$
It has been shown in [21] that HL and IA provide very similar results for the tau-congeneric model if the same loss function $\rho(x) = |x|^p$ for $p \in [0, 1]$ is utilized. We can rely on the subadditivity property (24) for finding a majorizing function $\tilde{H}_1$ in (53):
$G_1(\theta) \leq \sum_{g=1}^{G} \sum_{i=1}^{I} \tilde{w}_{1ig}\, \rho\!\left( \nu_{ig} - \nu_{i0} - \psi_g^{-1/2} \lambda_{ig}\, \alpha_g \right) = \tilde{H}_1( \tilde{\theta} ) ,$
where $\tilde{\theta}$ now also includes common item intercepts $\nu_0$, and $\tilde{w}_{1ig}$ are appropriately defined weights. By minimizing $\tilde{H}_1$ for determining the group mean $\alpha_g$, we receive the identification constraint
$\sum_{i=1}^{I} \tilde{w}_{1ig}\, \lambda_{i0} \lambda_{ig}^\ast\, \rho'( \nu_{ig}^\ast ) = 0 .$

#### 5.3. Regularization

We now discuss identification constraints in regularized ML estimation of the multiple-group tau-congeneric measurement model. Deviations from MI in item loadings can be modeled either in a multiplicative (see (44)) or an additive (see (45)) form. In the following, we assume additive effects because this specification appears more frequently in practical applications.
In regularized ML estimation, we collect in the parameter vector $\theta$ the parameters $\alpha_g$, $\nu_{i0}$, $\tilde{\nu}_{ig}$, $\psi_g$, $\lambda_{i0}$, and $\tilde{\lambda}_{ig}$. Moreover, we set $\alpha_1 = 0$ and $\psi_1 = 1$ for reasons of identification and define the fitting function
$F_{\mathrm{reg}}(\theta) = \sum_{g=1}^{G} \sum_{i=1}^{I} w_{1ig} \left( \mu_{ig} - \alpha_g ( \lambda_{i0} + \tilde{\lambda}_{ig} ) - \nu_{i0} - \tilde{\nu}_{ig} \right)^2 + \sum_{g=1}^{G} \sum_{i=1}^{I} \sum_{j=i}^{I} w_{2ijg} \left( \sigma_{ijg} - \psi_g ( \lambda_{i0} + \tilde{\lambda}_{ig} ) ( \lambda_{j0} + \tilde{\lambda}_{jg} ) - \phi_{ig} \mathbf{1}\{ i = j \} \right)^2 + \sum_{g=1}^{G} \sum_{i=1}^{I} P( \kappa_1, \tilde{\nu}_{ig} ) + \sum_{g=1}^{G} \sum_{i=1}^{I} P( \kappa_2, \tilde{\lambda}_{ig} ) ,$
where $\kappa_1$ and $\kappa_2$ are regularization parameters for group-specific residual intercepts $\tilde{\nu}_{ig}$ and residual loadings $\tilde{\lambda}_{ig}$, respectively. Using the additive representation of MNI (45), we get the following identification constraint for determining the group mean $\alpha_g$:
$\sum_{i=1}^{I} w_{1ig} \left( \nu_{ig}^\ast - \tilde{\nu}_{ig} + \alpha_g ( \lambda_{ig}^\ast - \tilde{\lambda}_{ig} ) \right) = 0 .$
The condition (55) means that the MNI effects $\nu_{ig}^\ast$ and $\lambda_{ig}^\ast$ cancel out on average, where the average is mainly computed over those effects that are set to zero in regularized ML.
As an alternative approach, fused regularized ML estimation can be employed. In this approach, all item intercepts and item loadings are estimated group-wise, and identification is ensured using a fused penalty function. The fitting function is defined as
$F_{\mathrm{fusedreg}}(\theta) = \sum_{g=1}^{G} \sum_{i=1}^{I} w_{1ig} \left( \mu_{ig} - \alpha_g \breve{\lambda}_{ig} - \breve{\nu}_{ig} \right)^2 + \sum_{g=1}^{G} \sum_{i=1}^{I} \sum_{j=i}^{I} w_{2ijg} \left( \sigma_{ijg} - \psi_g \breve{\lambda}_{ig} \breve{\lambda}_{jg} - \phi_{ig} \mathbf{1}\{ i = j \} \right)^2 + \sum_{g=1}^{G} \sum_{h=g+1}^{G} \sum_{i=1}^{I} P( \kappa_1, \breve{\nu}_{ig} - \breve{\nu}_{ih} ) + \sum_{g=1}^{G} \sum_{h=g+1}^{G} \sum_{i=1}^{I} P( \kappa_2, \breve{\lambda}_{ig} - \breve{\lambda}_{ih} ) .$
With additive MNI effects (see (45)), the identification constraint for the group mean is given by
$\sum_{i=1}^{I} w_{1ig} \left( \alpha_g ( \lambda_{i0} + \lambda_{ig}^\ast - \breve{\lambda}_{ig} ) + \nu_{i0} + \nu_{ig}^\ast - \breve{\nu}_{ig} \right) = 0 .$
In the tau-congeneric measurement model, BAMI [41,47] uses normal prior distributions for intercept differences $\nu_{ig} - \nu_{ih}$ and loading differences $\lambda_{ig} - \lambda_{ih}$ [51]. Therefore, BAMI can be viewed as fused regularized ML estimation with a ridge penalty. Using the finding of Battauz [39] and the same reasoning as in Section 3.3, it is evident that BAMI as fused regularization with a ridge penalty function is equivalent to regularized ML estimation in (54) that involves the estimation of common item intercepts $\nu_0$ and common item loadings $\lambda_0$. As argued in Section 3.3, regularized ML estimation with a ridge penalty can be quite close to DWLS with appropriate weights (and, hence, ML) in terms of estimated group means and estimated variances because shrinkage is only introduced for otherwise nonidentified residual MNI effects for intercepts and loadings. Consequently, BAMI will provide similar results compared to ML estimation.
As also discussed in Section 3.3 for the tau-equivalent model, the output of BAMI can be used in a subsequent IA estimation [45]. Using the same steps as in Section 3.3 for the derivations, an identification constraint similar to (36) can be obtained.
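The ridge-fused shrinkage underlying the BAMI interpretation above can be sketched for a single item intercept observed in $G$ groups. With hypothetical values, minimizing $\sum_g ( \nu_g - x_g )^2 + \kappa \sum_{g < h} ( x_g - x_h )^2$ pulls the group-specific values toward each other while preserving their sum. The toy coordinate-descent solver below is purely illustrative and is not the BAMI algorithm:

```python
# Toy sketch (not the BAMI algorithm): ridge-fused shrinkage of one
# item's group-specific intercepts. Minimizes
#   sum_g (nu_g - x_g)^2 + kappa * sum_{g<h} (x_g - x_h)^2
# by cyclic coordinate descent; the objective is strictly convex.

def ridge_fused(nu, kappa, sweeps=500):
    x = list(nu)
    G = len(nu)
    for _ in range(sweeps):
        for g in range(G):
            others = sum(x[h] for h in range(G) if h != g)
            # first-order condition in x_g given the other coordinates
            x[g] = (nu[g] + kappa * others) / (1.0 + kappa * (G - 1))
    return x

nu = [0.0, 0.1, 1.0]          # third group's intercept is noninvariant
shrunk = ridge_fused(nu, kappa=0.5)
# exact solution for these values: x_g = (nu_g + 0.55) / 2.5
```

Summing the first-order conditions shows that the pairwise penalty terms cancel, so the total $\sum_g x_g$ equals $\sum_g \nu_g$ at the optimum; only the between-group spread is shrunk.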

## 6. Discussion

In this article, we have argued that joint estimation, linking, and regularized ML estimation in the tau-equivalent and the tau-congeneric model can provide similar, if not identical, estimates under violations of MI if an appropriate loss function $\rho$ is used in joint estimation or linking. Under violations of MI, it is important to emphasize that researchers can use arbitrary identification constraints to determine group means. The resulting estimates depend on the chosen weights and loss function, or on the penalty functions used in regularized ML estimation. Therefore, by choosing a particular fitting function, researchers implicitly define identification constraints on the effects that quantify MNI. The wisdom among applied researchers that partial invariance is necessary for determining group-mean comparisons [24] is unsound because it would imply that a particular loss function should always be preferred in practice.
It is important to emphasize that the choice of a particular loss function weighs discrepancies between sample input data (i.e., means and covariances in factor analysis) and assumed population parameters. Two error types must be distinguished: sampling error and model error. Sampling error can be reduced in large samples, while model error (i.e., MNI in multiple-group factor analysis) does not vanish in large samples. Certain types of misspecification can be downweighted by utilizing a model-robust loss function $\rho$. Consequently, estimated model parameters are not influenced by some misspecifications. We think that the preference for ML estimation in factor analysis (and in structural equation modeling in general) is misguided because ML can only be the most efficient estimation method in (very) large samples and if the model of interest is correctly specified. Because MI is as rare as unicorns in applications of multiple-group factor analysis, we cannot imagine many situations in which a robust loss function should not be preferred. Note that we are discussing robustness in the sense of model misspecification in modeled means and covariances. In contrast, robust factor analysis is mainly devoted to misspecification of the multivariate normal distribution [52].
Joint estimation seems to be the most frequently used approach in the case of measurement invariance. In contrast to regularization approaches, joint estimation does not include additional parameters for model deviations. Hence, joint estimation is less computationally demanding than regularization. Linking approaches are typically implemented as a two-step method. Because the one-dimensional factor model is estimated separately for each group in the first step, the linking approach might be less prone to convergence issues or ill-defined parameter estimates. However, group-wise estimation of the one-dimensional factor model might require a sufficiently large sample size. Hence, linking methods could result in less stable estimates than joint estimation or regularization.
Our arguments in this paper are based on fitting vectors of means and covariances that are computed for factor analysis of continuous items (i.e., assuming a multivariate normal distribution). The arguments likely generalize to the fitting of vectors of thresholds and polychoric correlation matrices for factor analysis of ordinal data [53]. Future research could investigate violations of measurement invariance in the one-dimensional factor model for a continuous covariate instead of a finite number of groups $g = 1 , … , G$ [54].
We limited our discussion to the most popular estimation methods of the multiple-group one-dimensional factor model (i.e., joint estimation, linking, and regularized estimation) under violations of MI. More flexible handling of MNI has recently been proposed using deep learning methods [55]. These fitting functions also imply identification constraints for MNI effects. Because it is our conviction that all fitted models are grossly misspecified (and not only misspecified to a certain degree), there is no reason to believe that more complex models will provide more valid estimates for group comparisons. In contrast, researchers purposely choose fitting functions that describe a complex dataset and define a pseudo-true parameter through the choice of this fitting function. It is likely not reasonable to talk about true parameters (i.e., true group means and true group variances) without explicitly mentioning identification constraints.

## 7. Conclusions

This article presented a formal analysis of different estimation methods in the violation of measurement invariance. We have shown how different fitting functions result in implied identification constraints on parameters that characterize the extent of measurement invariance. In our view, the choice of fitting functions should be mainly made regarding the weighing of model deviations because it is unlikely in practical applications that the doctrine of measurement invariance exactly holds.

## Funding

This research received no external funding.


## Conflicts of Interest

The author declares no conflict of interest.

## Abbreviations

The following abbreviations are used in this manuscript:
- BAMI: Bayesian approximate measurement invariance
- DWLS: diagonally weighted least squares
- HL: Haberman linking
- IA: invariance alignment
- ML: maximum likelihood
- MI: measurement invariance
- MNI: measurement noninvariance
- PI: partial invariance
- ULS: unweighted least squares
- WLS: weighted least squares

## References

1. Bartholomew, D.J. The foundations of factor analysis. Biometrika 1984, 71, 221–232. [Google Scholar] [CrossRef]
2. Jöreskog, K.G. A general approach to confirmatory maximum likelihood factor analysis. Psychometrika 1969, 34, 183–202. [Google Scholar] [CrossRef]
3. Bechger, T.M.; Maris, G. A statistical test for differential item pair functioning. Psychometrika 2015, 80, 317–340. [Google Scholar] [CrossRef] [PubMed]
4. Schulze, D.; Pohl, S. Finding clusters of measurement invariant items for continuous covariates. Struct. Equ. Model. A Multidiscip. J. 2021, 28, 219–228. [Google Scholar] [CrossRef]
5. Robitzsch, A. Robust and nonrobust linking of two groups for the Rasch model with balanced and unbalanced random DIF: A comparative simulation study and the simultaneous assessment of standard errors and linking errors with resampling techniques. Symmetry 2021, 13, 2198. [Google Scholar] [CrossRef]
6. Meredith, W. Measurement invariance, factor analysis and factorial invariance. Psychometrika 1993, 58, 525–543. [Google Scholar] [CrossRef]
7. Millsap, R.E. Statistical Approaches to Measurement Invariance; Routledge: New York, NY, USA, 2011. [Google Scholar] [CrossRef]
8. Davidov, E.; Meuleman, B. Measurement invariance analysis using multiple group confirmatory factor analysis and alignment optimisation. In Invariance Analyses in Large-Scale Studies; van de Vijver, F.J.R., Ed.; OECD: Paris, France, 2019; pp. 13–20. [Google Scholar]
9. Vandenberg, R.J.; Lance, C.E. A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organ. Res. Methods 2000, 3, 4–70. [Google Scholar] [CrossRef]
10. Wicherts, J.M.; Dolan, C.V. Measurement invariance in confirmatory factor analysis: An illustration using IQ test performance of minorities. Educ. Meas. Issues Pract. 2010, 29, 39–47. [Google Scholar] [CrossRef]
11. Jöreskog, K.G. Statistical analysis of sets of congeneric tests. Psychometrika 1971, 36, 109–133. [Google Scholar] [CrossRef]
12. Lewis, C. Selected topics in classical test theory. In Handbook of Statistics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2006; Volume 26, pp. 29–43. [Google Scholar] [CrossRef]
13. Mellenbergh, G.J. A unidimensional latent trait model for continuous item responses. Multivariate Behav. Res. 1994, 29, 223–236. [Google Scholar] [CrossRef]
14. Steyer, R. Models of classical psychometric test theory as stochastic measurement models: Representation, uniqueness, meaningfulness, identifiability, and testability. Methodika 1989, 3, 25–60. Available online: https://bit.ly/3Js7N3S (accessed on 20 February 2022).
15. Jöreskog, K.G.; Olsson, U.H.; Wallentin, F.Y. Multivariate Analysis with LISREL; Springer: Basel, Switzerland, 2016. [Google Scholar] [CrossRef]
16. Kolenikov, S. Biases of parameter estimates in misspecified structural equation models. Sociol. Methodol. 2011, 41, 119–157. [Google Scholar] [CrossRef]
17. Savalei, V. Understanding robust corrections in structural equation modeling. Struct. Equ. Model. A Multidiscip. J. 2014, 21, 149–160. [Google Scholar] [CrossRef]
18. MacCallum, R.C.; Browne, M.W.; Cai, L. Factor analysis models as approximations. In Factor Analysis at 100; Cudeck, R., MacCallum, R.C., Eds.; Lawrence Erlbaum: Mahwah, NJ, USA, 2007; pp. 153–175. [Google Scholar] [CrossRef]
19. Siemsen, E.; Bollen, K.A. Least absolute deviation estimation in structural equation modeling. Sociol. Methods Res. 2007, 36, 227–265. [Google Scholar] [CrossRef]
20. Van Kesteren, E.J.; Oberski, D.L. Flexible extensions to structural equation models using computation graphs. Struct. Equ. Model. A Multidiscip. J. 2021. [Google Scholar] [CrossRef]
21. Robitzsch, A. Lp loss functions in invariance alignment and Haberman linking with few or many groups. Stats 2020, 3, 246–283. [Google Scholar] [CrossRef]
22. Yuan, K.H.; Marshall, L.L.; Bentler, P.M. Assessing the effect of model misspecifications on parameter estimates in structural equation models. Sociol. Methodol. 2003, 33, 241–265. [Google Scholar] [CrossRef]
23. Davies, P.L. Data Analysis and Approximate Models; CRC Press: Boca Raton, FL, USA, 2014. [Google Scholar] [CrossRef]
24. Byrne, B.M.; Shavelson, R.J.; Muthén, B. Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychol. Bull. 1989, 105, 456–466. [Google Scholar] [CrossRef]
25. Davies, P.L.; Terbeck, W. Interactions and outliers in the two-way analysis of variance. Ann. Statist. 1998, 26, 1279–1305. [Google Scholar] [CrossRef]
26. Kolen, M.J.; Brennan, R.L. Test Equating, Scaling, and Linking; Springer: New York, NY, USA, 2014. [Google Scholar] [CrossRef]
27. Battauz, M. Multiple equating of separate IRT calibrations. Psychometrika 2017, 82, 610–636. [Google Scholar] [CrossRef]
28. Haberman, S.J. Linking Parameter Estimates Derived from An Item Response Model through Separate Calibrations; Research Report No. RR-09-40; Educational Testing Service: Princeton, NJ, USA, 2009. [Google Scholar] [CrossRef]
29. Asparouhov, T.; Muthén, B. Multiple-group factor analysis alignment. Struct. Equ. Model. A Multidiscip. J. 2014, 21, 495–508. [Google Scholar] [CrossRef]
30. Pokropek, A.; Lüdtke, O.; Robitzsch, A. An extension of the invariance alignment method for scale linking. Psychol. Test Assess. Model. 2020, 62, 303–334. Available online: https://bit.ly/2UEp9GH (accessed on 20 February 2020).
31. Schechter, E. Handbook of Analysis and Its Foundations; Academic Press: San Diego, CA, USA, 1996. [Google Scholar] [CrossRef]
32. Von Davier, M.; von Davier, A.A. A unified approach to IRT scale linking and scale transformations. Methodology 2007, 3, 115–124. [Google Scholar] [CrossRef]
33. Geminiani, E.; Marra, G.; Moustaki, I. Single- and multiple-group penalized factor analysis: A trust-region algorithm approach with integrated automatic multiple tuning parameter selection. Psychometrika 2021, 86, 65–95. [Google Scholar] [CrossRef]
34. Huang, P.H. A penalized likelihood method for multi-group structural equation modelling. Br. J. Math. Stat. Psychol. 2018, 71, 499–522. [Google Scholar] [CrossRef]
35. Li, X.; Jacobucci, R.; Ammerman, B.A. Tutorial on the use of the regsem package in R. Psych 2021, 3, 579–592. [Google Scholar] [CrossRef]
36. Hastie, T.; Tibshirani, R.; Wainwright, M. Statistical Learning with Sparsity: The Lasso and Generalizations; CRC Press: Boca Raton, FL, USA, 2015. [Google Scholar] [CrossRef]
37. She, Y.; Owen, A.B. Outlier detection using nonconvex penalized regression. J. Am. Stat. Assoc. 2011, 106, 626–639. [Google Scholar] [CrossRef]
38. Yu, C.; Yao, W. Robust linear regression: A review and comparison. Commun. Stat. Simul. Comput. 2017, 46, 6261–6282. [Google Scholar] [CrossRef]
39. Battauz, M. Regularized estimation of the four-parameter logistic model. Psych 2020, 2, 269–278. [Google Scholar] [CrossRef]
40. Tibshirani, R.; Saunders, M.; Rosset, S.; Zhu, J.; Knight, K. Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B 2005, 67, 91–108. [Google Scholar] [CrossRef]
41. Muthén, B.; Asparouhov, T. Bayesian structural equation modeling: A more flexible representation of substantive theory. Psychol. Methods 2012, 17, 313–335. [Google Scholar] [CrossRef]
42. Pokropek, A.; Schmidt, P.; Davidov, E. Choosing priors in Bayesian measurement invariance modeling: A Monte Carlo simulation study. Struct. Equ. Model. 2020, 27, 750–764. [Google Scholar] [CrossRef]
43. Van de Schoot, R.; Kluytmans, A.; Tummers, L.; Lugtig, P.; Hox, J.; Muthén, B. Facing off with scylla and charybdis: A comparison of scalar, partial, and the novel possibility of approximate measurement invariance. Front. Psychol. 2013, 4, 770. [Google Scholar] [CrossRef]
44. Van Erp, S.; Oberski, D.L.; Mulder, J. Shrinkage priors for Bayesian penalized regression. J. Math. Psychol. 2019, 89, 31–50. [Google Scholar] [CrossRef]
45. Arts, I.; Fang, Q.; Meitinger, K.; van de Schoot, R. Approximate measurement invariance of willingness to sacrifice for the environment across 30 countries: The importance of prior distributions and their visualization. Front. Psychol. 2021, 12, 624032. [Google Scholar] [CrossRef]
46. De Bondt, N.; Van Petegem, P. Psychometric evaluation of the overexcitability questionnaire-two applying Bayesian structural equation modeling (BSEM) and multiple-group BSEM-based alignment with approximate measurement invariance. Front. Psychol. 2015, 6, 1963. [Google Scholar] [CrossRef]
47. Muthén, B.; Asparouhov, T. Recent methods for the study of measurement invariance with many groups: Alignment and random effects. Sociol. Methods Res. 2018, 47, 637–664. [Google Scholar] [CrossRef]
48. Chen, Y.; Li, C.; Xu, G. DIF statistical inference and detection without knowing anchoring items. arXiv 2021, arXiv:2110.11112. [Google Scholar]
49. Carroll, R.J.; Ruppert, D.; Stefanski, L.A.; Crainiceanu, C.M. Measurement Error in Nonlinear Models: A Modern Perspective; Chapman and Hall: New York, NY, USA; CRC: Boca Raton, FL, USA, 2006. [Google Scholar] [CrossRef]
50. Robitzsch, A. A comparison of linking methods for two groups for the two-parameter logistic item response model in the presence and absence of random differential item functioning. Foundations 2021, 1, 116–144. [Google Scholar] [CrossRef]
51. Lek, K.; van de Schoot, R. Bayesian approximate measurement invariance. In Invariance Analyses in Large-Scale Studies; van de Vijver, F.J.R., Ed.; OECD: Paris, France, 2019; pp. 21–35. [Google Scholar]
52. Yuan, K.H.; Bentler, P.M. Robust procedures in structural equation modeling. In Handbook of Latent Variable and Related Models; Lee, S.Y., Ed.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 367–397. [Google Scholar] [CrossRef]
53. Cai, L.; Moustaki, I. Estimation methods in latent variable models for categorical outcome variables. In The Wiley Handbook of Psychometric Testing: A Multidisciplinary Reference on Survey, Scale and Test; Irwing, P., Booth, T., Hughes, D.J., Eds.; Wiley: New York, NY, USA, 2018; pp. 253–277. [Google Scholar] [CrossRef]
54. Hildebrandt, A.; Wilhelm, O.; Robitzsch, A. Complementary and competing factor analytic approaches for the investigation of measurement invariance. Sociol. Methods Res. 2009, 16, 87–102. [Google Scholar]
55. Pokropek, A.; Pokropek, E. Deep neural networks for detecting statistical model misspecifications. The case of measurement invariance. arXiv 2022, arXiv:2107.12757. [Google Scholar] [CrossRef]
 Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
