Probability distributions lie at the heart of most statistical analyses and are therefore crucial for properly understanding and using statistics. To help researchers, students, and lecturers work easily with various probability distributions, we created the ‘Distribution’ module, which is one of the new features of the upcoming version of JASP.
JASP currently covers 12 basic distributions, each available as a standalone analysis panel:
- Student’s t
- Inverse gamma
- Negative binomial
There are many other distributions that deserve to be added to this list, and we will indeed include more of them in the future.
For each of the distributions, the functionality is divided into four sections. Each section highlights a different aspect of the probability distribution, and we demonstrate them here using the Normal distribution as an example.
Section 1: Show Distribution
This section displays the theoretical distribution and enables the user to interact with it.
For distributions that have multiple parameterizations, we can choose which parameterization to use throughout the analysis. We can then change the values of the distribution’s parameters and see how they affect the distribution. The probability density function (or the probability mass function for discrete distributions), the cumulative distribution function, and the quantile function can each be displayed with one click, and the plots update in real time.
Additionally, we can display various quantities implied by the distribution, such as the probability mass between two points or the density at specific points. For example, Figure 1 shows a setup for displaying the Normal distribution. Both plots highlight the density and probability associated with the interval between 0 and 1. The upper plot shows the probability density function, from which we can read off the density at the points 0 and 1 as about 0.40 and 0.24, respectively. The probability mass between those two points is the colored area under the density function and amounts to about 0.34. The lower plot shows the cumulative distribution function. Here, we can immediately read off the probability that a random number generated from this distribution falls below 0 (0.5) or below 1 (about 0.84). The difference between the two probabilities (about 0.34) is the probability that a random number falls between 0 and 1, which is the same quantity as the colored area in the upper plot. The densities at the selected points 0 and 1 appear in the cumulative distribution plot as the slopes of the tangent lines.
Figure 1. Probability density and cumulative distribution plots of the normal distribution with mean 0 and variance 1.
All of these options can be adjusted freely in JASP, which makes it easy to get acquainted with a distribution.
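The numbers read off the plots in this section can be checked outside JASP. The following is a minimal sketch in plain Python, using only the standard library's `statistics.NormalDist` as a stand-in for the JASP panel; the variable names are our own and nothing here reflects how JASP computes these values internally.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1 (so variance 1).
dist = NormalDist(mu=0.0, sigma=1.0)
lower, upper = 0.0, 1.0

# Density (height of the probability density function) at the two points.
density_lower = dist.pdf(lower)  # about 0.40
density_upper = dist.pdf(upper)  # about 0.24

# Cumulative probabilities P(X <= x) at the same points.
cdf_lower = dist.cdf(lower)      # 0.5
cdf_upper = dist.cdf(upper)      # about 0.84

# The probability mass between the points is the difference of the CDF values.
mass_between = cdf_upper - cdf_lower  # about 0.34

# The quantile function inverts the CDF, so this recovers the upper point.
quantile_back = dist.inv_cdf(cdf_upper)

# The slope of the CDF at a point equals the density there (tangent lines).
h = 1e-5
slope_at_upper = (dist.cdf(upper + h) - dist.cdf(upper - h)) / (2 * h)

print(f"density at 0: {density_lower:.2f}, density at 1: {density_upper:.2f}")
print(f"P(X <= 0): {cdf_lower:.2f}, P(X <= 1): {cdf_upper:.2f}")
print(f"P(0 < X <= 1): {mass_between:.2f}")
print(f"CDF slope at 1: {slope_at_upper:.2f}")
```

Changing `mu` and `sigma` and re-running the script mimics, in a crude way, what the interactive panel does when a parameter value is edited.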
Section 2: Generate and Display Data
This section generates data from the theoretical distribution, computes descriptive statistics for the generated data, and displays descriptive plots.
The section helps build intuition about sampling variability. We can simulate and visualise data and then re-simulate the data to see how the output changes.
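To get a feel for sampling variability without the interface, we can repeat the simulate-and-describe loop in a few lines of code. This is only an illustrative sketch: the helper names `simulate_normal` and `describe`, the sample size of 50, and the seeds are all our own assumptions, not JASP settings.

```python
import random
from statistics import mean, stdev

def simulate_normal(n, mu, sigma, seed):
    """Draw n values from a Normal(mu, sigma) distribution, reproducibly."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

def describe(sample):
    """Basic descriptive statistics of a sample."""
    return {
        "n": len(sample),
        "mean": mean(sample),
        "sd": stdev(sample),  # sample standard deviation (divides by n - 1)
        "min": min(sample),
        "max": max(sample),
    }

# Re-simulate three times from the same distribution; only the seed changes.
summaries = [describe(simulate_normal(50, 0.0, 1.0, seed)) for seed in (1, 2, 3)]

for s in summaries:
    print(f"n={s['n']}  mean={s['mean']:.3f}  sd={s['sd']:.3f}  "
          f"range=[{s['min']:.2f}, {s['max']:.2f}]")
```

Each run draws from the same Normal distribution with mean 0 and standard deviation 1, yet the sample means and standard deviations differ from one draw to the next. That spread is the sampling variability this section is meant to make visible.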
Section 3: Estimate Parameters
This section lets us fit the theoretical distribution to the data generated in the previous section.
Currently, JASP estimates the parameters of the distribution using maximum likelihood and computes the standard errors and confidence intervals using the delta method. Changing the parameterization of the distribution or re-simulating the data updates the estimates accordingly, which again helps build intuition for the theoretical distribution and for sampling variability.
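For the Normal distribution, the idea behind these estimates can be sketched by hand. The code below is an illustration under our own assumptions, not JASP's implementation: it uses the closed-form maximum likelihood estimates, standard errors from the asymptotic (Fisher information) approximation, Wald-type intervals, and the delta method to move from the standard deviation to the variance parameterization. The function name `fit_normal_mle` is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def fit_normal_mle(sample, confidence=0.95):
    """Maximum likelihood fit of a Normal distribution with Wald-type intervals."""
    n = len(sample)
    mu_hat = fmean(sample)
    # The ML estimate of sigma divides by n, not n - 1.
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in sample) / n)

    # Asymptotic standard errors from the inverse Fisher information.
    se_mu = sigma_hat / math.sqrt(n)
    se_sigma = sigma_hat / math.sqrt(2 * n)

    # Delta method: for g(sigma) = sigma^2, se(g) is roughly |g'(sigma)| * se(sigma).
    var_hat = sigma_hat ** 2
    se_var = abs(2 * sigma_hat) * se_sigma

    # Two-sided critical value of the standard Normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)

    def row(estimate, se):
        return {"estimate": estimate, "se": se,
                "lower": estimate - z * se, "upper": estimate + z * se}

    return {"mu": row(mu_hat, se_mu),
            "sigma": row(sigma_hat, se_sigma),
            "variance": row(var_hat, se_var)}

# Simulated data standing in for the data generated in Section 2.
rng = random.Random(2024)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
fit = fit_normal_mle(sample)

for name, r in fit.items():
    print(f"{name:9s} est={r['estimate']:.3f}  se={r['se']:.3f}  "
          f"CI=[{r['lower']:.3f}, {r['upper']:.3f}]")
```

The `sigma` and `variance` rows describe the same fitted distribution in two parameterizations. Only the scale of the estimate and of its standard error changes, which is what switching the parameterization in the panel illustrates.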
Section 4: Assess Fit
Having estimated the parameters of the distribution, it is important to assess whether the theoretical distribution indeed fits the data. This section lets us display various statistics and plots that indicate the extent to which the distribution misfits the data.
Specifically, we can compute the Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling statistics for the continuous distributions, and a chi-squared goodness-of-fit test for the discrete distributions.
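All three statistics for continuous distributions compare the empirical distribution function of the data with the fitted cumulative distribution function. As a rough sketch, they can be computed from their textbook formulas as follows. The function name `edf_statistics` and the simulated data are our own, only the test statistics are computed (no p-values), and this is not a description of JASP's code.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

def edf_statistics(sample, cdf):
    """Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling statistics."""
    xs = sorted(sample)
    n = len(xs)
    u = [cdf(x) for x in xs]  # fitted CDF evaluated at the ordered data

    # Kolmogorov-Smirnov: largest vertical gap between the empirical and fitted CDF.
    d_plus = max((i + 1) / n - u[i] for i in range(n))
    d_minus = max(u[i] - i / n for i in range(n))
    ks = max(d_plus, d_minus)

    # Cramér-von Mises: summed squared gaps at the midpoints (2i - 1) / (2n).
    cvm = 1 / (12 * n) + sum((u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n))

    # Anderson-Darling: like Cramér-von Mises, but with more weight on the tails.
    # Assumes 0 < u < 1 for every observation, otherwise the logarithm fails.
    ad = -n - sum((2 * i + 1) * (math.log(u[i]) + math.log(1 - u[n - 1 - i]))
                  for i in range(n)) / n

    return {"ks": ks, "cvm": cvm, "ad": ad}

rng = random.Random(42)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Distribution fitted by maximum likelihood versus a deliberately wrong one.
fitted = NormalDist(fmean(sample), pstdev(sample))
wrong = NormalDist(2.0, 1.0)

good = edf_statistics(sample, fitted.cdf)
bad = edf_statistics(sample, wrong.cdf)
print("fitted:", {k: round(v, 3) for k, v in good.items()})
print("wrong: ", {k: round(v, 3) for k, v in bad.items()})
```

All three statistics are small when the fitted curve tracks the data and grow when it does not, so the deliberately shifted distribution yields much larger values than the fitted one.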
Next, we can display a histogram of the data versus the fitted probability density function, the empirical cumulative distribution function versus the fitted cumulative distribution function, and the Q-Q and P-P plots.
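The coordinates behind Q-Q and P-P plots are simple to compute, which may help when interpreting them. The sketch below makes two assumptions of its own: the plotting positions (i − 0.5)/n, which are one common convention among several, and the choice to put the theoretical values on the horizontal axis. JASP may use different conventions, and `qq_pp_points` is a hypothetical helper name.

```python
import random
from statistics import NormalDist, fmean, pstdev

def qq_pp_points(sample, dist):
    """Return (Q-Q points, P-P points) comparing a sample with a fitted distribution."""
    xs = sorted(sample)
    n = len(xs)
    # Plotting positions (i - 0.5) / n for i = 1..n (an assumed convention).
    positions = [(i + 0.5) / n for i in range(n)]

    # Q-Q: theoretical quantile against the observed (ordered) value.
    qq = [(dist.inv_cdf(p), x) for p, x in zip(positions, xs)]
    # P-P: fitted cumulative probability against the empirical probability.
    pp = [(dist.cdf(x), p) for p, x in zip(positions, xs)]
    return qq, pp

rng = random.Random(11)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
fitted = NormalDist(fmean(sample), pstdev(sample))

qq, pp = qq_pp_points(sample, fitted)

# For a well-fitting distribution, both sets of points lie close to the diagonal.
max_pp_gap = max(abs(theoretical - empirical) for theoretical, empirical in pp)
print(f"first Q-Q point: ({qq[0][0]:.2f}, {qq[0][1]:.2f})")
print(f"largest P-P deviation from the diagonal: {max_pp_gap:.3f}")
```

Points that drift away from the diagonal in the middle of a P-P plot, or at the ends of a Q-Q plot, indicate where the fitted distribution and the data disagree.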