Network analysis is a relatively new and promising method for modeling interactions between large numbers of variables. Instead of trying to reduce the structure of the variables to their shared information, as is done in latent variable modeling, we estimate the relations between all variables directly. In this blog post, we provide a short tutorial on how to do network analyses in JASP. Most of the analyses shown are based on the bootnet package in R (Epskamp, Borsboom, & Fried, 2017). The network graphs that JASP produces are based on the R package qgraph (Epskamp et al., 2012).
Example Analysis: The Big Five Inventory
The data set for this example can be downloaded here, and the annotated .jasp file here. The items in the data set are from the International Personality Item Pool (IPIP; Goldberg, 1999), whereas the data were obtained by the SAPA project (Revelle, Wilt, & Rosenthal, 2010; http://sapa-project.org). The Big Five is a classical theory in personality psychology that assumes personality to consist of five latent variables: “Agreeableness”, “Conscientiousness”, “Extraversion”, “Neuroticism”, and “Openness”. Each person’s personality is thought to be some combination of these five variables. We cannot observe the five variables directly, but we can develop a set of items that each measure one of the dimensions. In this tutorial we reanalyze the data to see if the Big Five structure is borne out by a network analysis.
In network analysis jargon, we will refer to the observed variables as nodes and to the estimated relations between variables as edges.
To conduct a network analysis in JASP, the most important options are:
- Estimator, where you select the type of network you want to estimate.
- Dependent Variables: the data of interest.
- Depending on your choice for Estimator you might want to set additional options. Available options for each estimator are shown under Analysis Options.
First we change the estimator to ‘cor’ to obtain some descriptive correlation networks. In addition, we click on Graphical Options and then under nodes we drag the variable ‘group’ to Color nodes by.
This will color each node in the network according to the factor it supposedly belongs to. Finally, we click on Network plot and obtain the figure below:
The nodes are positioned using the Fruchterman-Reingold algorithm which organizes the network based on the strength of the connections between nodes. This algorithm uses pseudo-random numbers, so if you redo the analyses on your own computer the figure might look slightly different. However, the estimated edges should be identical. You can view their raw values by clicking on Weights matrix under Tables. This should give:
The raw network can provide an overload of information. To facilitate interpretation of the network, a number of statistics have been developed. A subset of these are called centrality measures. For the correlation network, we tick Centrality plot to inspect the centrality measures:
Two of the three centrality measures are defined in terms of the shortest paths of the network. A shortest path is the minimum number of steps you need to take to get from one node to another. Of course, you cannot go from one node to the next if the edge between them is missing. Edges are missing whenever their estimated value is zero. In this network an edge of zero implies a correlation of zero, but the meaning of a zero edge can differ across network estimators. The shortest paths are usually computed from every node to every other node.
The first centrality measure is called Betweenness: the number of shortest paths that pass through the node of interest. For instance, the betweenness of E5 is relatively high compared to that of node O5. This means that more shortest paths pass through E5 than through O5, so E5 more often lies on the most direct route between two other nodes. The second centrality measure is called Closeness: the inverse of the sum of all shortest paths from the node of interest to all other nodes. The last centrality measure is called Degree: the sum of the absolute input weights of that node. In general, a higher centrality measure indicates that this node is more central to the network.
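The three definitions above can be made concrete on a toy network. The sketch below is not the code JASP or qgraph runs. It assumes that the length of an edge is the inverse of its absolute weight (so strong edges are short), it counts a pair of nodes once if the node of interest lies on any shortest path between them, and the four-node weights matrix is made up for illustration.

```python
from itertools import combinations


def shortest_path_lengths(weights):
    """All-pairs shortest path lengths (Floyd-Warshall).

    Assumption: edge length = 1 / |weight|; a weight of zero means the
    edge is missing and cannot be traversed.
    """
    n = len(weights)
    inf = float("inf")
    dist = [[0.0 if i == j
             else (1.0 / abs(weights[i][j]) if weights[i][j] != 0 else inf)
             for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def degree(weights, i):
    """Sum of the absolute weights of all edges attached to node i."""
    return sum(abs(w) for j, w in enumerate(weights[i]) if j != i)


def closeness(dist, i):
    """Inverse of the summed shortest path lengths from node i."""
    return 1.0 / sum(d for j, d in enumerate(dist[i]) if j != i)


def betweenness(dist, i):
    """Number of node pairs with a shortest path that runs through i."""
    count = 0
    for s, t in combinations(range(len(dist)), 2):
        if i in (s, t):
            continue
        if abs(dist[s][i] + dist[i][t] - dist[s][t]) < 1e-12:
            count += 1
    return count


# Hypothetical symmetric weights matrix for nodes A, B, C, D.
W = [[0.0, 0.5, 0.1, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.1, 0.5, 0.0, 0.25],
     [0.0, 0.0, 0.25, 0.0]]
D = shortest_path_lengths(W)

for idx, name in enumerate("ABCD"):
    print(name, betweenness(D, idx), round(closeness(D, idx), 3),
          round(degree(W, idx), 2))
```

In this toy network the weak direct edge between A and C (0.1) is longer than the detour through B, so B lies on the shortest path from A to C and gains betweenness.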
For the correlation network, we tick “Clustering plot” to inspect the clustering measures.
The Watts-Strogatz measure is zero for all nodes. This measure only considers whether the neighbors of a node are themselves connected, not how strong those connections are. In this correlation network every node is connected directly to every other node, so the measure cannot distinguish between nodes and the values are all equal.
Although the correlation network provides a lot of information about the data, all edges are shown, not just the significant ones. This poses a problem, as we might be interpreting noise instead of actual correlations. Looking at significant edges only does not solve this, because it introduces a multiplicity problem: we would be conducting significance tests on very many correlations, namely P(P-1) / 2, where P is the number of variables. In this case there would be 300 significance tests.
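The count of 300 follows directly from the formula. The short sketch below assumes P = 25 items, which is the value implied by the 300 tests mentioned above. The significance level of 0.05 is an illustrative assumption, not something taken from the analysis.

```python
def n_pairwise_tests(p):
    """Number of distinct pairs among p variables: p(p-1)/2."""
    return p * (p - 1) // 2


def expected_false_positives(p, alpha):
    """Expected number of significant edges if every true correlation
    were zero and each test were run at level alpha."""
    return n_pairwise_tests(p) * alpha


P = 25        # assumed number of items (implied by the 300 tests)
ALPHA = 0.05  # illustrative significance level

print(n_pairwise_tests(P))
print(expected_false_positives(P, ALPHA))
```

Under these assumptions, roughly 15 edges would be expected to reach significance by chance alone, which is why testing each correlation separately is unattractive.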
Instead of using correlations, we can use a regularized estimation method, such as the Extended Bayesian Information Criterion Graphical Least Absolute Shrinkage and Selection Operator, or EBICglasso for short. The EBICglasso estimates the partial correlations between all variables and shrinks the absolute weights toward zero. As a consequence, edge weights are slightly biased, but small edge weights are shrunken to be exactly zero. Hence, they do not have to be tested against zero anymore, which alleviates the multiplicity issue. In other applications of the lasso you often see that the outcomes depend heavily on how a certain hyperparameter is chosen. In the EBICglasso this hyperparameter is chosen using the extended BIC (EBIC), an information criterion that takes both model complexity and model fit into account. The EBICglasso network is shown in the figure below.
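To see how partial correlations differ from ordinary correlations, the sketch below computes them from a small, made-up correlation matrix by inverting it and rescaling the off-diagonal elements. It deliberately leaves out the lasso penalty and the EBIC-based selection, so it illustrates only the unregularized quantity that the EBICglasso shrinks, not the EBICglasso itself.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Partial correlation of each pair, controlling for all other variables.

    Uses the inverse K of the correlation matrix:
    pcor[i][j] = -K[i][j] / sqrt(K[i][i] * K[j][j]).
    The diagonal is set to zero because a network has no self-loops.
    """
    k = invert(corr)
    n = len(k)
    return [[0.0 if i == j else -k[i][j] / (k[i][i] * k[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]


# Hypothetical correlations among three variables X, Y, Z.
R = [[1.00, 0.50, 0.25],
     [0.50, 1.00, 0.50],
     [0.25, 0.50, 1.00]]
PCOR = partial_correlations(R)

for row in PCOR:
    print([round(v, 3) for v in row])
```

In this example X and Z correlate 0.25, but their partial correlation is zero: the association runs entirely through Y. In a partial correlation network the edge between X and Z would therefore be absent.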
In the network above, you see that the layout has changed compared to the correlation network. This happens because the Fruchterman-Reingold algorithm now creates a new layout. However, when comparing multiple networks, it is best to keep the layout the same across analyses. This can be done by including some additional information about the x and y coordinates of the layout. For example, if you want to plot the EBICglasso network using the layout of the correlation network, you first need to add information about the layout to the data set. This can be done as follows. First estimate the network whose layout you want to reuse. Next, click on Layout Matrix and select Show Variable Names.
We can copy the data from the layout table into the data set, creating two new variables, x and y. Next we go to Graphical Options, then to Layout, and select Data. This opens up a new box where we can enter the two variables. At first, when you click Data, the layout of the network will change to a circle. This is a sign that the Data layout did not work, here because no variables have been entered yet. Also, if JASP somehow does not understand the data you entered as x or y variable, a footnote will appear under the ‘Summary of network’ table.
Once we drag the variables x and y to x and y, the EBICglasso network should have the exact same layout as the correlation network.
This type of column metadata is also important for another estimator, called ‘mixed graphical models’ or ‘mgm’. Mgm is used when the data are a mixture of various types (e.g., continuous, counts, categorical). The analysis requires that you specify the type of each variable. The available types are: Gaussian (‘g’), Poisson (‘p’), or categorical (‘c’). Similar to how the layout is specified, the metadata are supplied as a new variable with entries of the form ‘variableName = type’.
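If you have many variables, it can be convenient to generate these entries rather than type them by hand. The sketch below builds entries of the form ‘variableName = type’ described above. The helper function and the variable names are hypothetical, and the exact spacing JASP expects is an assumption based on that description.

```python
# Type codes named in the text: Gaussian, Poisson, categorical.
VALID_TYPES = {"g": "Gaussian", "p": "Poisson", "c": "categorical"}


def mgm_type_entries(variable_types):
    """Turn {variable name: type code} into 'name = code' strings."""
    entries = []
    for name, code in variable_types.items():
        if code not in VALID_TYPES:
            raise ValueError(f"unknown type {code!r} for variable {name!r}")
        entries.append(f"{name} = {code}")
    return entries


# Hypothetical variables: one continuous, one count, one categorical.
ENTRIES = mgm_type_entries({"age": "g", "n_siblings": "p", "gender": "c"})

for entry in ENTRIES:
    print(entry)
```

The printed lines can then be pasted into a new column of the data set, one entry per row.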
Now that we have discussed the most relevant methodology, we can try to interpret the EBICglasso network. Below, we plotted the network with the new layout and the corresponding centrality plot. Perhaps the most striking observation is the large variance within nodes that are supposed to belong to the same group, in particular for Conscientiousness. If we take for example node C3, we see that it has a very low Betweenness and Degree, whereas those centrality measures are much higher for node C2 (see footnote 1). Another observation is that the indicators of Neuroticism are more alike, and all seem to relate positively to one another. This stands in contrast to all other groups, where there are both positive and negative relations among the nodes. We also see that some groups are more closely related to each other than others. It appears that Extraversion and Agreeableness as groups of variables are more closely connected to each other (most apparent from the strong connection between the nodes E4 and A5) than Neuroticism and Agreeableness are.
These observations, in particular those about relations with mixed signs within groups, are at odds with the latent variable perspective (except for Neuroticism). At the very least, they raise the question whether all items within a group can be adequately described by a single latent variable. However, we should not forget that the items in this questionnaire were designed with this factor structure in mind; there certainly is some structure present in the data.
Sometimes the interest centers on multiple networks, for example if we want to compare relations or centrality across groups. This can be done by dragging a grouping variable to the box Split By. For example, to estimate a correlation network separately for men and women, we can drag the variable ‘gender’ to Split By and obtain the output below.
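Conceptually, splitting by a grouping variable amounts to estimating the same network once per group. The sketch below illustrates this for a correlation network with a handful of made-up observations. The group labels and the scores are hypothetical, and the code is not how JASP implements Split By.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_network_by_group(rows, variables, group_key):
    """Return {group: {(var_a, var_b): correlation}} for every variable pair."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    networks = {}
    for group, members in groups.items():
        columns = {v: [r[v] for r in members] for v in variables}
        networks[group] = {
            (a, b): pearson(columns[a], columns[b])
            for i, a in enumerate(variables)
            for b in variables[i + 1:]
        }
    return networks


# Hypothetical data: two items scored by three people in each group.
ROWS = [
    {"gender": "group1", "A1": 1, "E1": 2},
    {"gender": "group1", "A1": 2, "E1": 4},
    {"gender": "group1", "A1": 3, "E1": 6},
    {"gender": "group2", "A1": 1, "E1": 6},
    {"gender": "group2", "A1": 2, "E1": 4},
    {"gender": "group2", "A1": 3, "E1": 2},
]
NETWORKS = correlation_network_by_group(ROWS, ["A1", "E1"], "gender")

for group, edges in NETWORKS.items():
    print(group, {pair: round(r, 2) for pair, r in edges.items()})
```

In this constructed example the edge between A1 and E1 is positive in one group and negative in the other, a difference that a single network estimated on the pooled data would hide.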
This concludes our tutorial. We have briefly shown how to estimate correlation and EBICglasso networks in JASP. However, JASP supports other network analyses as well! Feel free to check them out and reanalyze the data set with a different estimator. Post your results on Twitter. Need help with the interpretation? Ask us on the JASP Forum. Is there any method for network analysis missing? Let us know on GitHub.
Footnotes
1 Nodes C2 and C3 respectively refer to the items “Continue until everything is perfect” and “Do things according to a plan”.
References
Epskamp, S., Borsboom, D., & Fried, E. I. (2017). Estimating psychological networks and their accuracy: a tutorial paper. Behavior Research Methods.
Epskamp, S., Cramer, A. O., Waldorp, L. J., Schmittmann, V. D., & Borsboom, D. (2012). qgraph: Network visualizations of relationships in psychometric data. Journal of Statistical Software, 48(4), 1-18.
Goldberg, L. R. (1999). A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models. Personality psychology in Europe, 7(1), 7-28.
Revelle, W., Wilt, J., & Rosenthal, A. (2010). Individual differences in cognition: New methods for examining the personality-cognition link. In Handbook of individual differences in cognition (pp. 27-49). Springer New York.