Measurement Invariance Testing Using the Structural Equation Modeling (SEM) Module in JASP

Many research questions in the social and behavioral sciences rely on between-group comparisons of scores on questionnaire scales. But how do we know that a questionnaire measures the same thing across different groups? Such comparisons are only appropriate if measurement invariance holds. Multi-group modeling, an analytical approach from the family of Structural Equation Modeling (SEM), provides the toolbox we need to assess whether, and to what extent, measurement invariance holds. In this tutorial, we show you how to evaluate measurement invariance using JASP’s newly updated SEM module.

Why is Measurement Invariance Important?

Researchers typically rely on measurements in the form of a psychological test that consists of several items. Such tests are of course not perfect, but they are often a reasonable representation of the constructs of interest, the so-called latent variables. A common task in research is to compare these constructs across relevant groups, such as male and female participants, respondents from different countries, or people from different ethnic backgrounds. In these cases, we need to ensure that our test actually measures the latent variable in the same way across the groups, a condition called “measurement invariance”. If measurement invariance is violated, the group differences we find in the (latent) outcomes may merely reflect differences in measurement rather than differences in the outcome itself, which can lead to highly misleading conclusions.

Measurement invariance cannot be evaluated with a single test or model. Rather, there are different forms of measurement invariance, each of which permits a different degree of comparison between groups. To assess measurement invariance, we fit a sequence of increasingly restrictive models, each of which constrains an additional set of model parameters to be equal across groups (see also van de Schoot, Lugtig & Hox, 2012). Let’s dive into an example to illustrate!

Example: Holzinger and Swineford

The Holzinger and Swineford (1939) dataset is a classic SEM dataset that ships with lavaan, the R package for SEM (Rosseel, 2012). It contains scores on nine items (x1-x9) that measure three kinds of mental ability: the first three items (x1-x3) measure visual perception, the next three (x4-x6) measure textual processing, and the last three (x7-x9) measure processing speed.

A natural confirmatory factor model for the students’ mental abilities is therefore a three-factor model with the three types of mental ability as factors.

On closer inspection of the data, however, we notice that the students come from two schools: Pasteur and Grant-White. If we want to compare the students of the two schools on their mental abilities, we first need to establish to what extent measurement invariance holds.

We open the SEM module by clicking on SEM in the top bar. To perform multi-group modeling, we specify our grouping variable, school, in the ‘Multigroup SEM’ tab at the very bottom of the input panel.

In the Model Options tab, make sure that ‘Include mean structure’ is checked. This ensures that the item intercepts are estimated and included in the output.

We use the white box on the top of the input panel to specify our models, which is done using lavaan syntax (Rosseel, 2012). Here, we specify that the three factors (visual, textual, and speed) are measured by (=~) a combination (+) of three items each. If you want to learn more about this syntax, follow this user-friendly tutorial.
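The model syntax itself is not reproduced in the text above. As a minimal sketch, the Python snippet below writes out the three-factor specification in standard lavaan syntax. Python is used here only as a convenient way to generate the text, and the names `FACTORS` and `configural_model` are hypothetical. The printed output is the kind of specification one would type into the model box. Because it contains no labels, fitting it with a grouping variable corresponds to the configural model.

```python
# Hypothetical helper (not part of JASP or lavaan): writes the three-factor
# Holzinger-Swineford model in lavaan syntax, one "=~" line per factor.
FACTORS = {
    "visual": ["x1", "x2", "x3"],
    "textual": ["x4", "x5", "x6"],
    "speed": ["x7", "x8", "x9"],
}


def configural_model(factors: dict[str, list[str]]) -> str:
    """Return lavaan syntax in which each factor is measured by (=~) a sum (+) of its items."""
    lines = [f"{factor} =~ " + " + ".join(items) for factor, items in factors.items()]
    return "\n".join(lines)


print(configural_model(FACTORS))
# visual =~ x1 + x2 + x3
# textual =~ x4 + x5 + x6
# speed =~ x7 + x8 + x9
```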

In the next three sections, we show how to test a factor model for (a) configural invariance, which does not yet allow any between-group comparisons, (b) metric invariance, which allows between-group comparisons of structural relationships, and (c) scalar (or strong) invariance, which allows comparisons of means (van de Schoot, Lugtig & Hox, 2012).
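To make the three levels concrete, the measurement model can be written in common CFA notation. This notation is ours, not the tutorial’s. Here x_pjg is the score of person p on item j in group g, ν is an intercept, λ a loading, η the latent score on the factor that item j measures, and κ_g the latent mean of group g. Each level adds an equality constraint to the previous one:

```latex
\begin{aligned}
x_{pjg} &= \nu_{jg} + \lambda_{jg}\,\eta_{pg} + \varepsilon_{pjg}
  && \text{configural: same factor pattern in every group}\\
\lambda_{j1} &= \lambda_{j2} = \lambda_j
  && \text{metric: equal loadings}\\
\nu_{j1} &= \nu_{j2} = \nu_j
  && \text{scalar: equal intercepts, in addition to equal loadings}\\
\mathbb{E}[x_{pj1}] - \mathbb{E}[x_{pj2}] &= \lambda_j\,(\kappa_1 - \kappa_2)
  && \text{under scalar invariance, with } \kappa_g = \mathbb{E}[\eta_{pg}]
\end{aligned}
```

Without equal intercepts, the difference in expected item scores would also contain ν_j1 − ν_j2. A latent mean difference could then not be separated from a difference in measurement.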

Configural Invariance

Initially, we simply let all item parameters of the model be estimated separately for each school. This is called configural invariance, and it is what JASP does by default once a grouping variable has been specified. This model is mainly fitted to check that the factor model applies to both groups in general, i.e., that the model is qualitatively the same across groups (the same factors with the same pattern of item-factor relations). However, configural invariance is only a starting point that does not yet allow any between-group comparisons.

After pressing Ctrl+Enter to fit the model, we first see the model fit in the output. We do not focus on the global fit of this model here, although it should be noted that the fit was not ideal. In practice, it would be wise to adjust and improve the model at this point, before proceeding to test for metric and scalar invariance. For the purpose of this tutorial, however, we can continue evaluating measurement invariance.

Metric Invariance

Metric invariance means that the factor loadings are equal across groups. In other words, the relationship between the score on the latent variable and the item scores is the same in each group. Consequently, metric (also called weak) invariance allows us to statistically compare structural relationships, such as correlations or regression coefficients, between groups.

To test for metric invariance, we fit a model in which the factor loadings are constrained to be equal across groups, and we check whether it fits the data significantly worse than the configural invariance model. To do so, we first click on the green plus sign next to Model 1 in the model box. This opens a new white box in which the next model can be specified. In the new JASP SEM module, all models created in this way are automatically compared statistically in the output.

To constrain the factor loadings to be equal across groups, we “label” them. In lavaan syntax, this means prepending the item name with a label and an asterisk (*). When an item receives only a single label, lavaan applies that same label in both groups. Alternatively, we could give a separate label per group, e.g., “c(v1pasteur, v1grantwhite)*x1”, which would leave the loadings free to differ. Because parameters that share a label are constrained to be equal, the loadings are now equal in the two schools.
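The labeled version can be sketched with another Python helper. The helper is hypothetical and its output is not taken from the tutorial. Each loading gets one label: v1 to v3 for visual, t1 to t3 for textual, and s1 to s3 for speed (the label names are our choice). Because the same label is used in both groups, lavaan constrains the corresponding loadings to be equal.

```python
# Hypothetical helper: same three-factor model, but every loading is prefixed
# with a label. A single label per item is shared by both groups in lavaan.
FACTORS = {
    "visual": ["x1", "x2", "x3"],
    "textual": ["x4", "x5", "x6"],
    "speed": ["x7", "x8", "x9"],
}


def metric_model(factors: dict[str, list[str]]) -> str:
    """Return lavaan syntax with labeled loadings, e.g. 'visual =~ v1*x1 + v2*x2 + v3*x3'."""
    lines = []
    for factor, items in factors.items():
        # Label = first letter of the factor name + position of the item (v1, v2, ...)
        terms = [f"{factor[0]}{k}*{item}" for k, item in enumerate(items, start=1)]
        lines.append(f"{factor} =~ " + " + ".join(terms))
    return "\n".join(lines)


print(metric_model(FACTORS))
# visual =~ v1*x1 + v2*x2 + v3*x3
# textual =~ t1*x4 + t2*x5 + t3*x6
# speed =~ s1*x7 + s2*x8 + s3*x9
```

Replacing a label such as v2 with c(v2pasteur, v2grantwhite) would give the two groups different labels and free that loading again.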

Indeed, the output shows that the loadings are now equal in the two groups. The other item parameters (intercepts and residual variances) are still estimated separately for each group.
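Why does the metric model gain degrees of freedom? The sketch below does the bookkeeping by counting observed moments minus free parameters. It rests on several assumptions:

- each item loads on one factor;
- each factor is scaled by fixing its first loading to 1;
- factors are allowed to correlate;
- in the scalar model, the latent means are fixed to 0 in the first group and estimated in the second.

The function `cfa_df` is a hypothetical illustration, not something JASP reports.

```python
def cfa_df(n_groups: int, items_per_factor: list[int],
           equal_loadings: bool = False, equal_intercepts: bool = False) -> int:
    """Degrees of freedom of a multi-group CFA with a mean structure.

    Assumes simple structure, marker-variable scaling (first loading per factor
    fixed to 1), correlated factors, and, when intercepts are equal, latent
    means fixed to 0 in the first group and free in all other groups.
    """
    p = sum(items_per_factor)   # number of observed items
    m = len(items_per_factor)   # number of latent factors
    # Observed moments per group: p(p+1)/2 variances/covariances plus p means
    moments = n_groups * (p * (p + 1) // 2 + p)
    free_loadings = p - m       # one marker loading per factor is fixed
    loadings = free_loadings if equal_loadings else n_groups * free_loadings
    residual_variances = n_groups * p
    factor_covariance = n_groups * (m * (m + 1) // 2)  # factor variances + covariances
    intercepts = p if equal_intercepts else n_groups * p
    latent_means = (n_groups - 1) * m if equal_intercepts else 0
    free = loadings + residual_variances + factor_covariance + intercepts + latent_means
    return moments - free


hs = [3, 3, 3]  # three factors with three items each
print(cfa_df(2, hs))                                              # configural: 48
print(cfa_df(2, hs, equal_loadings=True))                         # metric: 54
print(cfa_df(2, hs, equal_loadings=True, equal_intercepts=True))  # scalar: 60
```

Under these assumptions, each step adds 6 degrees of freedom, which is the df of the corresponding chi-square difference test. If the latent means were instead kept at 0 in both groups, the scalar model would have 63 df, and its test would additionally constrain the latent means to be equal.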

But how do we actually test whether metric invariance holds? For this, let’s look again at the model fit table. Here, we are interested in the chi-square difference test, which formally compares the second model (metric invariance) with the first (configural invariance). It tests whether the two models differ significantly in fit, in other words, whether the imposed constraints made the model fit significantly worse. This can also be read as a test of the null hypothesis that the constraints imposed in the second model hold. In this case, we are thus testing the null hypothesis that the factor loadings are equal across schools.

We find that p = .224. At a significance level of .05, we therefore cannot reject the null hypothesis that the factor loadings are equal in the two schools, and we conclude that metric invariance is tenable. Metric invariance does not yet allow us to compare the scores on the latent factors between the groups. It does, however, allow us to compare structural relationships between the latent variables across groups. For instance, we could now statistically compare the correlation between visual and textual processing across the two schools. If we found that this correlation is significantly higher or lower in one of the groups, we could be confident that the difference is not merely a consequence of differences in measurement.
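The chi-square difference test itself is simple arithmetic. Δχ² = χ²(restricted model) − χ²(less restricted model) is referred to a chi-square distribution with Δdf degrees of freedom. The minimal, dependency-free Python sketch below illustrates this; the function names are hypothetical, and JASP performs this computation for you. Assuming the default scaling in which one loading per factor is fixed to 1, equating the nine loadings adds six constraints, so Δdf = 6. A Δχ² of about 8.2 on 6 df corresponds to p ≈ .224.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma function
    P(df/2, x/2); adequate for the moderate values met in difference tests.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def difference_test(delta_chisq: float, delta_df: int, alpha: float = 0.05):
    """Return the p-value and whether the added equality constraints are rejected."""
    p = chi2_sf(delta_chisq, delta_df)
    return p, p < alpha


p, reject = difference_test(8.2, 6)
print(f"p = {p:.3f}, reject equal loadings: {reject}")  # p = 0.224, reject equal loadings: False
```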


Scalar Invariance

Next, we fit a third model, which has the property of scalar (or strong) invariance, by specifying it in a new model box and pressing Ctrl+Enter. Scalar invariance means that not only the factor loadings but also the item intercepts are equal across groups. This, in turn, allows us to statistically compare the means of the latent constructs. In the context of our example, it would allow us to compare the mean scores on the three mental ability factors without measurement bias.

This time, we explicitly mention the intercepts in the model specification. We did not need to do this earlier because the intercepts were already included through the ‘Include mean structure’ option. Now, however, we include them so that we can give each intercept an identical label in the two groups and thereby constrain it to be equal.
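Here is a sketch of a scalar-invariance specification, generated once more by a hypothetical Python helper because the tutorial’s own syntax is not reproduced here. In lavaan syntax, `x1 ~ 1` denotes the intercept of x1, so `x1 ~ i1*1` labels it, and sharing the labels i1 to i9 across groups equates the intercepts. The last three lines rest on an assumption worth flagging. As we understand lavaan’s defaults, latent means are fixed to 0 in every group, so when intercepts are equated by hand the latent means are usually freed in all but one group. `c(0, NA)*1` fixes the mean to 0 in the first group and estimates it in the second, which is what lavaan’s `group.equal = c("loadings", "intercepts")` option does automatically.

```python
# Hypothetical helper: labeled loadings plus labeled intercepts, each label
# shared by both groups, so loadings and intercepts are constrained to be equal.
FACTORS = {
    "visual": ["x1", "x2", "x3"],
    "textual": ["x4", "x5", "x6"],
    "speed": ["x7", "x8", "x9"],
}


def scalar_model(factors: dict[str, list[str]]) -> str:
    lines = []
    for factor, items in factors.items():
        terms = [f"{factor[0]}{k}*{item}" for k, item in enumerate(items, start=1)]
        lines.append(f"{factor} =~ " + " + ".join(terms))
    for items in factors.values():
        for item in items:
            # "~ 1" is an intercept; the shared label i1, i2, ... equates it across groups
            lines.append(f"{item} ~ i{item[1:]}*1")
    for factor in factors:
        # Assumption: latent mean fixed to 0 in the first group, estimated in the second
        lines.append(f"{factor} ~ c(0, NA)*1")
    return "\n".join(lines)


print(scalar_model(FACTORS))
```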

Indeed, the Means table in the output now shows that the item intercepts are equal in both schools.

However, the model fit table shows that we must reject the null hypothesis that the item intercepts are equal across schools (p < .001). This means that, in their current form, the items cannot be used to compare the two schools on their students’ average mental abilities without risking bias from differences in measurement.
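The decision logic of the last three sections can be summarized in a small sketch: walk through the nested models in order and stop at the first significant chi-square difference test. The stopping rule and α = .05 follow the tutorial, while the names `LEVELS` and `highest_supported_level` are hypothetical. Because the scalar test is only reported as p < .001, the sketch uses 0.0005 as a stand-in; any value below .05 leads to the same decision.

```python
# Invariance levels in the order they are tested, with the comparison each one permits
LEVELS = [
    ("configural", "no between-group comparisons yet"),
    ("metric", "structural relationships such as correlations and regression coefficients"),
    ("scalar", "latent means"),
]


def highest_supported_level(p_values: list[float], alpha: float = 0.05) -> str:
    """Stop at the first significant chi-square difference test.

    p_values[i] is the p-value of the test comparing LEVELS[i + 1] with LEVELS[i].
    """
    supported = LEVELS[0][0]
    for (level, _), p in zip(LEVELS[1:], p_values):
        if p < alpha:
            break
        supported = level
    return supported


level = highest_supported_level([0.224, 0.0005])  # metric p = .224, scalar p < .001
print(level, "->", dict(LEVELS)[level])
```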


In this tutorial, we introduced the new SEM module and showed how it can be used to evaluate measurement invariance. Do you want to try it out yourself? Download JASP now to get started.


[1] It is also possible to constrain factor loadings, intercepts, and residual variances in the point-and-click interface. However, the output table in which the sequence of models is compared with chi-square difference tests cannot be produced this way.


Holzinger, K., & Swineford, F. (1939). A study in factor analysis: The stability of a bifactor solution. Supplementary Educational Monograph, no. 48. Chicago: University of Chicago Press.

Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48, 1-36.

van de Schoot, R., Lugtig, P., & Hox, J. (2012). A checklist for testing measurement invariance. European Journal of Developmental Psychology, 9, 486-492. doi:10.1080/17405629.2012.686740

About the authors

Erik-Jan van Kesteren

Erik-Jan van Kesteren is a PhD candidate at Utrecht University. At JASP, he is responsible for adding plots, functions, and UI elements, and for interfacing R and C++.

Michael Koch

Michael is a Research Master student in Methodology and Statistics at Utrecht University.