How to Conduct a Classical One-Sample T-Test in JASP and Interpret the Results

The one-sample t-test is used to answer the question of whether a population mean equals a specified number, also called the test value. This blog post shows how to perform the classical version of the one-sample t-test in JASP. Let’s consider an example.

Testing the Effect of Overeating on Weight Gain

Our example dataset stems from a study on the effect of overeating on weight gain (Levine, Eberhardt, & Jensen, 1999, as reported in Moore et al., 2012, p. 425).

Below we show how to test two null hypotheses in JASP: (1) that overeating does not affect one’s weight, and (2) that an excess of 3,500 calories translates into a weight gain of 1 pound. To test these hypotheses, Levine, Eberhardt, and Jensen (1999) set up an experiment with 16 nonobese adults, aged 25 to 36, who consumed 1,000 calories per day in excess of the calories needed to maintain a stable body weight. The subjects maintained this diet for 8 weeks, so they consumed a total of 56,000 extra calories.

Here we have a (small) sample of 16 participants, but we wish to generalize to the population. The participants were weighed at the beginning and at the end of the study, so that we can compute the difference in weight between the two sampling points for every participant. That difference is what we’re putting to the test.

These are the data:

Weight before   Weight after   Difference
122.54          135.74         13.20
120.78          129.36          8.58
131.12          145.20         14.08
137.06          145.64          8.58
163.24          173.80         10.56
166.32          181.06         14.74
155.54          163.46          7.92
117.26          130.46         13.20
161.26          174.02         12.76
139.48          145.20          5.72
149.82          161.48         11.66
162.14          169.18          7.04
201.74          204.82          3.08
122.98          138.60         15.62
135.74          150.04         14.30
127.16          132.66          5.50

Let’s go ahead and test our hypothesis: whether people with that kind of eating behavior experience a change in weight at all. So, what we want to know is whether the population mean difference in weight of all people who eat 1,000 excess calories per day for 8 weeks is equal to the specified number 0. Our null hypothesis is that this mean difference is indeed 0 in the population, which says that the weight of people with this eating behavior does not change. However, even if the null hypothesis is true, we cannot expect the mean difference score calculated for our sample to be exactly zero, because we only collected a small sample from a larger population. The t-test takes this uncertainty due to having only a limited number of participants into account by assuming normally distributed error (see below for more on the assumptions), and helps us to come to a conclusion about the population.

Note: to follow along with the explanation, you can either download the dataset and follow the steps on your own or download the annotated JASP file to see exactly what was done in JASP. The links above will send you to the OSF, where you can click “Download” in the top right corner of the page to download the files.

Performing the Classical One-Sample t-Test in JASP

First, we open the dataset in JASP. In the “Common” analysis menu in the ribbon we select “T-Tests” and then “One-Sample T-Test”. We then drag the “Difference” variable from the left into the right input field. Immediately, JASP performs the analysis, presented in an APA-style table that can be copied directly into your word processor.

Additionally, we have the option to inspect the Location parameter, the Effect size, Descriptives, a Descriptives plot and the (mysterious) Vovk-Sellke maximum p-ratio.

The descriptives table provides the number of observations (N), the mean of all the observations in the data, as well as the standard deviation and the standard error.

From the t-test table we can see that our t-statistic is 10.84. As the formula below shows, the t-statistic is the difference between the observed mean (calculated in our sample of participants) and the test value as specified by the null hypothesis (zero in this case), divided by the quotient of the standard deviation of our sample and the square root of the sample size:

    \[t = \frac{\bar{x} - \mu_{0} }{s / \sqrt{n}}\]

Let’s go through this quickly to make sure we understand what the t-statistic tells us. We can obtain all values we need from the Descriptives table and the information we have about the data. In the numerator, we get 10.41 – 0 = 10.41. In the denominator, we get 3.841 / 4 (the square root of 16, as we have 16 participants) = 0.96025. Now we can divide 10.41 by 0.96025, and we get a rounded t-statistic of 10.84. This gives us an indication of how far away the observed mean is from the test value, while taking into account the precision of our measurement. Moving on, as we have 16 observations in the data, this t-test has 15 degrees of freedom.
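
For readers who like to verify this arithmetic outside of JASP, here is a minimal sketch in Python (not part of the original JASP workflow; it assumes NumPy is installed and uses the difference scores from the table above):

    import numpy as np

    # Difference scores (weight after minus weight before) from the table above
    diff = np.array([13.20, 8.58, 14.08, 8.58, 10.56, 14.74, 7.92, 13.20,
                     12.76, 5.72, 11.66, 7.04, 3.08, 15.62, 14.30, 5.50])

    n = len(diff)                 # 16 participants
    mean = diff.mean()            # roughly 10.41
    sd = diff.std(ddof=1)         # sample standard deviation, roughly 3.841
    se = sd / np.sqrt(n)          # standard error, roughly 0.96
    t_stat = (mean - 0) / se      # test value 0; t is roughly 10.84
    print(mean, sd, se, t_stat)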

Most classical analyses report a p-value. In our example, we can see that the p-value is below 0.001, which indicates that the probability of obtaining this result or something more extreme, given that our null hypothesis is true, is lower than 0.001. So, if it were true that people with this eating behavior do not experience a change in weight, then the chance of randomly drawing a sample at least as extreme as this one is lower than 0.1%. Note that this statement reasons from the population to the sample: it presupposes that the null hypothesis of no difference in the population holds true. From this we derive implications for our observed data, but also for more extreme data that we did not observe, such as a t-value of 37. Another more extreme, but not observed, t-value in this case is t = -12, because we conducted a two-sided test.
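
To see where that p-value comes from, one can compute the two-sided tail probability from the t-distribution directly. This is a sketch, assuming SciPy is available and plugging in the rounded values reported above (t = 10.84, df = 15):

    from scipy import stats

    t_stat, df = 10.84, 15                      # values reported above
    p_value = 2 * stats.t.sf(abs(t_stat), df)   # P(|T| >= |t|) under the null
    print(p_value)                              # far below 0.001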

Usually, p-values below 0.05 are considered “statistically significant”, although many researchers believe this threshold is too lenient. In any case, a p-value below 0.001 is often considered strong evidence against the null hypothesis. However, it is always a good idea to broaden one’s inferential perspective and consider information other than the p-value (e.g., confidence intervals). The mean difference was 10.41 pounds, which supports the idea that people actually gain substantial weight.
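
As a rough illustration of that broader perspective, a 95% confidence interval for the mean difference can be computed from the descriptives reported above. This is only a sketch assuming SciPy; JASP also offers a confidence-interval option in the t-test analysis:

    import numpy as np
    from scipy import stats

    mean, sd, n = 10.41, 3.841, 16            # descriptives reported above
    se = sd / np.sqrt(n)
    t_crit = stats.t.ppf(0.975, df=n - 1)     # two-sided 95% critical value
    print(mean - t_crit * se, mean + t_crit * se)   # roughly 8.4 to 12.5 pounds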

The location parameter (called “Mean Difference” in the table) gives you the difference between the sample mean (of the differences) and the test value, 10.41 - 0 = 10.41, which is the numerator of the t-statistic that we calculated above.

The effect size that is given is Cohen’s d. Cohen’s d is a standardized effect size, obtained by dividing the mean difference by the observed standard deviation, that is,

    \[d = \frac{\bar{x} - \mu_{0} }{s}\]

which for our example implies d = 10.41/3.841 = 2.710. There is no strict rule for interpreting Cohen’s d, but a rough guideline accompanied by some explanation can be found here. In our example, the effect size (d = 2.71) is indicative of a very large effect. What that means is that the observed sample mean is very far off from what we would expect to see if our null hypothesis were true.
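
The same number can be reproduced by hand; the sketch below simply plugs the descriptives reported above into the formula (illustrative only, not JASP output):

    mean, sd, test_value = 10.41, 3.841, 0
    d = (mean - test_value) / sd    # Cohen's d: standardized mean difference
    print(d)                        # roughly 2.71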

Testing Against Another Test Value

Suppose a friend of ours, Anna, received the dataset as well, and performed a different one-sample t-test. In formulating her hypothesis, she actually took a theory into account that came with the dataset. The theory says that “3500 extra calories will translate into a weight gain of 1 pound. Therefore, we expect each of these subjects to gain 56,000/3,500 = 16 pounds.” (Moore et al. 2012, p. 425).

Anna tested a different null hypothesis than we did. We tested the hypothesis that this eating behavior does not result in any change of weight, that is, we tried to determine whether the population mean difference in weight after eating 56,000 excess calories is 0. Anna took into account a theory that says people should gain 16 pounds on average after eating that much in excess. So while our test value was 0, Anna’s was 16. To simulate what she did in JASP, all we need to do is to change the number in the Test value field from 0 to 16.
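
Outside of JASP, Anna’s analysis corresponds to running the same one-sample t-test with a different test value. A minimal sketch, assuming SciPy and using the difference scores from the table above:

    import numpy as np
    from scipy import stats

    diff = np.array([13.20, 8.58, 14.08, 8.58, 10.56, 14.74, 7.92, 13.20,
                     12.76, 5.72, 11.66, 7.04, 3.08, 15.62, 14.30, 5.50])

    # Same data, different null hypothesis: population mean difference = 16 pounds
    result = stats.ttest_1samp(diff, popmean=16)
    print(result.statistic, result.pvalue)   # t roughly -5.82, p far below 0.001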

Note: In addition to doing a two-sided test, one can do a directional test as well. Depending on whether you think the population mean might be larger than, smaller than, or just different from your test value, you can go for the respective one-sided t-test or stay with the two-sided version. More on the difference between one-sided and two-sided tests can be found here. For this example, let’s stick to the two-sided t-test.

We can see that the t-statistic, the location parameter and the effect size all changed to negative values. Both the t-statistic (t = -5.823) and the effect size (d = -1.456) suggest that the observed mean is quite far off from what we would expect to see if the null hypothesis were true.

The p-value in Anna’s results is below 0.001, which tells us again that observing these data or something more extreme, given that Anna’s null hypothesis is true, is less probable than 0.1% (click here to see a GIF that explains how to show the exact p-value). These results are generally taken to suggest that the data provide strong evidence against Anna’s null hypothesis.

The participants gained a lot less weight than what Anna would have expected to see under her hypothesis. Therefore, she rejects the null hypothesis that people who eat 56,000 excess calories gain 16 pounds of weight.

Checking the Assumption

As mentioned above, the one-sample t-test assumes that the dependent variable is normally distributed. We may check if that is actually the case here. In JASP you can do this by clicking Normality under Assumption Checks. JASP will then perform a Shapiro-Wilk test of Normality, which tests the null hypothesis that the dependent variable is normally distributed.

The p-value is not significant, which means that we fail to reject the null hypothesis; most people would conclude that there is no statistical reason to doubt the normality assumption.
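
The same check can be reproduced outside of JASP; here is a sketch assuming SciPy, using the difference scores from the table above:

    import numpy as np
    from scipy import stats

    diff = np.array([13.20, 8.58, 14.08, 8.58, 10.56, 14.74, 7.92, 13.20,
                     12.76, 5.72, 11.66, 7.04, 3.08, 15.62, 14.30, 5.50])

    w, p = stats.shapiro(diff)   # Shapiro-Wilk test of normality
    print(w, p)                  # a non-significant p gives no reason to doubt normality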

The Non-Parametric Alternative

In case the Shapiro-Wilk test turns out significant or we have prior grounds to believe that our data might not be normally distributed, we can perform a Wilcoxon signed-rank test by selecting it under Tests. For more information on how to interpret the Wilcoxon signed-rank test, click here.
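
For reference, this is roughly what the non-parametric alternative looks like outside of JASP (a sketch assuming SciPy; the signed-rank test is applied to the differences after subtracting the test value):

    import numpy as np
    from scipy import stats

    diff = np.array([13.20, 8.58, 14.08, 8.58, 10.56, 14.74, 7.92, 13.20,
                     12.76, 5.72, 11.66, 7.04, 3.08, 15.62, 14.30, 5.50])

    stat0, p0 = stats.wilcoxon(diff)         # test value 0
    stat16, p16 = stats.wilcoxon(diff - 16)  # Anna's test value of 16
    print(p0, p16)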

Thanks to Eric-Jan Wagenmakers and Alexander Ly for their comments on an earlier draft that helped me to write this blog post.





References

Levine, J. A., Eberhardt, N. L., & Jensen, M. D. (1999). Role of nonexercise activity thermogenesis in resistance to fat gain in humans. Science, 283, 212-214.

Moore, D. S., McCabe, G. P., & Craig, B. A. (2012). Introduction to the practice of statistics. New York: Freeman.

About the author

Tim Draws

Tim Draws is a PhD candidate in the Web Information Systems group at Delft University of Technology. At JASP, he is contributing to the Machine Learning Module.