“Hell is Other People’s Data”: Introducing the JASP Data Documentation Format

The JASP Data Library contains over 50 data sets that can be used to illustrate particular statistical analyses. We have now started to document these data sets systematically. The first version of the book, “The JASP Data Library: Version 1” (The JASP Team, 2019), contains a preface and two chapters, which describe the “Sleep” data set and Andy Field’s “Fear of Statistics” data set. You can find the book here. We will update the book as we add more chapters. Here, however, we want to call attention to the general format that the book uses to document each data set.

The Need for Documentation Guidelines

Our experience with the analysis of publicly available data suggests that there is an urgent need for standard guidelines that encourage systematic and comprehensive documentation. Without such guidelines, the analysis often becomes a frustrating exercise in decryption. In other words, “hell is other people’s data” (for empirical support, see for instance Hardwicke et al., 2018). This undesirable state of affairs is why we have developed the JASP Data Documentation Format (JDDF). The format specifies a minimal set of requirements for how data ought to be documented; for particular kinds of data (such as those from neuroscience), these requirements may need to be expanded. The proposed format is concrete and focuses purely on documentation (see the FAIR guidelines for complementary advice concerning data storage; Wilkinson et al., 2016).

Eight Sections for Documentation

The JDDF features eight sections, elaborated upon and exemplified in the book. These are:

  1. Description. A brief summary of the data set in order to provide context.
  2. Purpose. A brief statement of the purpose that the data may serve.
  3. Data Screenshot. A visual impression of the data structure, showing at least some of the columns and rows in which the data have been organized.
  4. Variables. A description of each variable in the data set. At a minimum, the description features (a) the variable name; (b) the variable’s measurement level and the values that the variable can take on; and (c) a verbal summary of the variable.
  5. Source. A description of where the data were obtained, and where the data can be accessed online.
  6. Analysis code. Anything that allows a third party to reproduce a result. This can be R code, Stan code, or a .jasp file, for instance. We strongly recommend against using commercial software for analysis, as an interested third party might be unwilling or unable to buy an expensive license (e.g., for Matlab, JMP, SAS, Stata, or SPSS) in order to reproduce the result.
  7. Example analysis. A demonstration that the provided code, when applied to the provided data, actually produces a result.
  8. Want to know more? Any additional information of interest.
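Because the eight sections work as a checklist, it can help to see them as a simple data structure. The following is a minimal sketch under stated assumptions: the Python classes `JDDFEntry` and `Variable` and the helper functions `missing_sections` and `incomplete_variables` are hypothetical illustrations and are not part of JASP or the official JDDF templates. The sketch mirrors the list above, including the three minimal fields of each variable description, and reports which parts are still empty.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical record for one entry in the "Variables" section:
# (a) name, (b) measurement level and admissible values, (c) verbal summary.
@dataclass
class Variable:
    name: str
    measurement_level: str  # e.g. "nominal", "ordinal", "continuous" (assumed labels)
    values: str             # the values the variable can take on
    summary: str            # short verbal description of the variable

# Hypothetical container for the eight JDDF sections; the field names are
# illustrative only and do not come from any official JASP tooling.
@dataclass
class JDDFEntry:
    description: str = ""
    purpose: str = ""
    data_screenshot: str = ""  # e.g. a path to an image of the data
    variables: List[Variable] = field(default_factory=list)
    source: str = ""
    analysis_code: str = ""    # e.g. a path to an R script or a .jasp file
    example_analysis: str = ""
    want_to_know_more: str = ""

# The eight sections, in the order in which the JDDF lists them.
SECTIONS = ["description", "purpose", "data_screenshot", "variables",
            "source", "analysis_code", "example_analysis", "want_to_know_more"]

def missing_sections(entry: JDDFEntry) -> List[str]:
    """Return the names of sections that are still empty."""
    return [name for name in SECTIONS if not getattr(entry, name)]

def incomplete_variables(entry: JDDFEntry) -> List[str]:
    """Return the names of variables that lack any of the minimal fields (a)-(c)."""
    return [v.name for v in entry.variables
            if not (v.name and v.measurement_level and v.values and v.summary)]

# Usage with a hypothetical, partially documented entry.
entry = JDDFEntry(
    description="Hypothetical example data set.",
    purpose="Illustrate a paired-samples t-test.",
    variables=[Variable("score", "continuous", "any real number", "")],
)
print(missing_sections(entry))
# ['data_screenshot', 'source', 'analysis_code', 'example_analysis', 'want_to_know_more']
print(incomplete_variables(entry))
# ['score']
```

This checklist treats every section as required; in practice, a section such as “Want to know more?” may legitimately be brief.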

Templates Available on the OSF

If you would like to document your own data using the JDDF and you work in LaTeX, you can find a template here. If you wish to propose a contribution to the JASP Data Library, please use the Tufte style file (available, for instance, on Overleaf).

Future Prospects

We will use the JDDF to describe the data sets in the JASP Data Library. Perhaps the format is also useful as a guideline for researchers who wish to document their data in a systematic way.

References

Hardwicke, T., Mathur, M., MacDonald, K., Nilsonne, G., Banks, G., Kidwell, M., Hofelich Mohr, A., Clayton, E., Yoon, E., Tessler, M., Lenne, R., Altman, S., Long, B., and Frank, M. (2018). Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition. Royal Society Open Science, 5, 180448.

The JASP Team (2019). The JASP Data Library: Version 1.

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., Gonzalez-Beltran, A., Gray, A. J. G., Groth, P., Goble, C., Grethe, J. S., Heringa, J., ’t Hoen, P. A. C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S. J., Martone, M. E., Mons, A., Packer, A. L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M. A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., and Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018.

About the authors

Eric-Jan Wagenmakers

Eric-Jan (EJ) Wagenmakers is a professor in the Psychological Methods Group at the University of Amsterdam. EJ guides the development of JASP.

Šimon Kucharský

Šimon is responsible for JASP examples and educational material.