John Tukey is famous. In his youth he coined the term “bit” (as an abbreviation of “binary information digit”), later he promoted exploratory data analysis, and throughout his entire life he worked on a broad range of statistical techniques, some of which carry his name.
In 1991, Tukey published the article “The philosophy of multiple comparisons”. The paper starts with an accusation, one that has gained much traction in recent years. Here it is:
“Statisticians classically asked the wrong question—and were willing to answer with a lie, one that was often a downright lie. They asked “Are the effects of A and B different?” and they were willing to answer “no.”
All we know about the world teaches us that the effects of A and B are always different—in some decimal place—for any A and B. Thus asking “Are the effects different?” is foolish.
What we should be answering first is “Can we tell the direction in which the effects of A differ from the effects of B?” In other words, can we be confident about the direction from A to B? Is it “up,” “down” or “uncertain”?
The third answer to this first question is that we are “uncertain about the direction”—it is not, and never should be, that we “accept the null hypothesis.” (Tukey, 1991, p. 100)
In other words, Tukey claims that the dominant statistical practice (p-value hypothesis testing) answers the wrong question, namely whether an effect is present or absent. The right question, according to Tukey, concerns the direction of the effect.
From a Bayesian perspective, Tukey’s statement is ironic. It has been known for a long time (at least since Jeffreys, 1937 – hat tip to Alexander Etz!) that the one-sided p-value is an approximation to the posterior area to the left of zero, assuming that the alternative hypothesis is true and the null hypothesis can be ignored. A recent Bayesian Spectacles blog post discusses one of our recent papers (Marsman & Wagenmakers, 2017), which also contains references to the statistical literature.
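To make the correspondence concrete, here is a minimal sketch in Python. It assumes the simplest textbook setting, which is our choice for illustration and not something taken from Jeffreys (1937): normally distributed data with a known standard deviation and a flat prior on the mean. In that setting the match between the one-sided p-value and the posterior mass below zero is exact; in other settings it is only approximate. The function names and the example numbers are hypothetical.

```python
from statistics import NormalDist


def one_sided_p_value(sample_mean: float, sigma: float, n: int) -> float:
    """One-sided p-value for H0: mean = 0 against a positive effect.

    Assumes normal data with known standard deviation `sigma`.
    """
    standard_error = sigma / n ** 0.5
    z = sample_mean / standard_error
    # Probability of a z-statistic at least this large when the mean is zero.
    return 1.0 - NormalDist().cdf(z)


def posterior_mass_below_zero(sample_mean: float, sigma: float, n: int) -> float:
    """Posterior probability that the mean is negative.

    Assumes a flat prior on the mean, so the posterior is
    Normal(sample_mean, sigma / sqrt(n)).
    """
    standard_error = sigma / n ** 0.5
    posterior = NormalDist(mu=sample_mean, sigma=standard_error)
    return posterior.cdf(0.0)


# Illustrative numbers only (not from any real data set).
p = one_sided_p_value(sample_mean=0.3, sigma=1.0, n=30)
mass = posterior_mass_below_zero(sample_mean=0.3, sigma=1.0, n=30)
print(f"one-sided p-value:         {p:.4f}")
print(f"posterior mass below zero: {mass:.4f}")
```

The two printed numbers agree because both equal the standard normal tail area beyond the observed z-statistic.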
So, from a Bayesian point of view, one would address Tukey’s lament as follows: “If you do not care about the null hypothesis and you only want to test the direction of the effect, then please use the one-sided p-value; in many situations, it is a good approximation to the posterior mass on one side of zero.” For instance, suppose the one-sided p-value is .05. This means that approximately 5% of the posterior mass is lower than zero, and 95% is higher than zero. With a prior distribution symmetric around zero, the Bayes factor for the model that stipulates a positive-only effect versus the model that stipulates a negative-only effect is then .95/.05 = 19. That is, when we see a one-sided p-value of .05, we can draw the Bayesian conclusion that the data are approximately 19 times more likely under the positive-effect hypothesis than under the negative-effect hypothesis.
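The arithmetic in this example is easy to reproduce. The sketch below is only an illustration: the function name is hypothetical, and it relies on the two assumptions stated above, namely that the one-sided p-value approximates the posterior mass below zero and that the prior is symmetric around zero (so the prior odds of a positive versus a negative effect are 1 and the Bayes factor equals the posterior odds).

```python
def directional_bayes_factor(one_sided_p: float) -> float:
    """Approximate Bayes factor for a positive-only versus a negative-only effect.

    Treats the one-sided p-value as the posterior mass below zero and assumes
    a prior symmetric around zero, so the Bayes factor equals the posterior odds.
    """
    if not 0.0 < one_sided_p < 1.0:
        raise ValueError("one_sided_p must lie strictly between 0 and 1")
    return (1.0 - one_sided_p) / one_sided_p


# A few one-sided p-values and their approximate directional Bayes factors.
for p in (0.05, 0.025, 0.01):
    print(f"p = {p}: BF(+ vs -) is roughly {directional_bayes_factor(p):.0f}")
```

Note that this quantifies evidence about direction only; it says nothing about whether the effect is present at all.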
In other words, in diametric opposition to Tukey’s claim, researchers who use p-values inadvertently give an approximate, poor man’s Bayesian answer to the “right” question of direction, not an exact answer to the “wrong” question of presence. Perhaps this is one of the reasons that p-values have survived for so long.
As an aside, it is doubly ironic that Tukey is responsible for the following statistical aphorism:
“Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.” (Tukey, 1962, pp. 13-14).
From a Bayesian perspective, we can draw another important conclusion. If researchers do care about a test of the presence of an effect, and if they do take the null hypothesis seriously, then the p-value is not appropriate. The p-value is merely an approximate test of direction, assuming from the outset that the alternative hypothesis is the only game in town. Therefore, using the p-value to test a point-null hypothesis is begging the question – that is, it rules out a priori the very model one seeks to disprove.
Jeffreys, H. (1937). On the relation between direct and inverse methods in statistics. Proceedings of the Royal Society of London: Series A, Mathematical and Physical Sciences, 160, 325-348.
Marsman, M., & Wagenmakers, E.-J. (2017). Three insights from a Bayesian interpretation of the one-sided P value. Educational and Psychological Measurement, 77, 529-539.
Tukey, J. W. (1962). The future of data analysis. Annals of Mathematical Statistics, 33, 1-67.
Tukey, J. W. (1991). The philosophy of multiple comparisons. Statistical Science, 6, 100-116.