This is a guest post by Calvin Deans-Browne (UCL) and Henrik Singmann (UCL). The supplemental materials are linked at the end of this post.
This article introduces the different indices measured in the signal detection theory (SDT) framework, puts them into context with a case study, and shows how to compute them in JASP.
An Introduction to SDT Indices
Signal Detection Theory (SDT) is one of the most popular formal theoretical approaches in cognitive psychology (for introductions see Kellen & Klauer, 2018; Macmillan & Creelman, 2005). SDT can be used to disentangle the cognitive processes underlying simple decision making under uncertainty. In the simplest SDT experiment, a yes-no experiment, the decision maker is presented with a series of stimuli from two distinct stimulus classes. The task of the decision maker is to decide, for each stimulus individually, whether the stimulus carries a “signal” or whether it is just “noise”. For example, we will introduce an experiment in which participants are presented with a series of images, some of which are digitally manipulated and some of which are not. The task of the participants is to decide, for each image, whether it is manipulated (i.e., has the signal) or not. Because the information to make such a decision is not perfect, the decision maker can make errors. In SDT terminology, the different possible observations are classified as follows:
- Hits (H): detecting a signal when it is present (e.g., responding “manipulated” to a manipulated image)
- False alarms (F): detecting a signal when it is not present (e.g., responding “manipulated” to a non-manipulated image)
- Misses (M): not detecting a signal when it is present (e.g., responding “not manipulated” to a manipulated image)
- Correct rejections (CR): not detecting a signal when it is not present (e.g., responding “not manipulated” to a non-manipulated image)
Here, we assume that all these measures are proportions (i.e., values between 0 and 1) indicating the rate with which each of the observations happens in the data. For example, H refers to the hit rate for a specific participant or study (below we describe how H can be calculated from the observed response frequencies).
Furthermore, we only focus on hits (H) and false alarms (F) here. The reason for this is that misses (M) and correct rejections (CR) are redundant, as M = 1 – H and CR = 1 – F (i.e., M and CR do not provide independent or additional information to H and F).
One of the key insights of the SDT framework is that two latent cognitive processes underlie these four observations: discriminability (often also called sensitivity) and response bias. If one decision maker is better at discriminating between the two stimulus classes than another, overall performance is better for the former, as indicated by an increased H and a decreased F (i.e., discriminability affects H and F in opposite directions). Likewise, if a particular subset of stimuli or a specific condition is easier to discriminate than another set of stimuli or condition, performance will be better for the former stimuli/condition, again indicated by an increased H and a decreased F.
If a decision maker has a specific propensity for one of the two response options, this is captured by the response bias. For example, a decision maker with a strong bias towards responding “signal” will have high values of both H and F (i.e., response bias affects H and F in the same direction). Just as discriminability can depend on both the decision maker and the stimuli/condition, so can response bias. For example, if participants are paid for each hit they make, but not punished for each false alarm, we will likely see high values of H and F compared to a situation without such payoffs.
The data collected in SDT experiments, H and F rates, are not process-pure. The SDT framework is what allows us to convert H and F into process-pure measurements of discriminability and response bias, via several underlying assumptions. SDT assumes that for a given stimulus, decision makers have access to a signal on a (usually continuous) latent strength dimension. For the example we will flesh out further, the latent strength of this signal would be the visible evidence that the image has been manipulated. Additionally, noise stimuli in which no signal is present also elicit some value on this latent strength dimension. In our example, this would be aspects of an untouched image that make it look as if it were manipulated. However, SDT does not assume that each signal or noise stimulus has the same value on the latent strength dimension. Instead, SDT assumes that the latent strength information itself is noisy: some signal items have a higher value than others, and the same is true for noise items. For our example this means that some manipulated images look more manipulated than other manipulated images, and some non-manipulated images also look more manipulated than other non-manipulated images. Formally, the variant of SDT we focus on here assumes that the latent strength for each stimulus type follows a normal (i.e., Gaussian) distribution, with the signal distribution having a higher mean than the noise distribution. This is shown graphically in Figure 1.
However, as also shown in Figure 1, in many situations the two distributions overlap, so there exists a region in which the true status of an item is not easy to detect and stimuli are confusable (e.g., for some images there is uncertainty about whether or not they are manipulated). In other words, the latent strength value alone is not enough to determine whether an item comes from the signal or the noise distribution. SDT assumes that the decision maker sets a response criterion (i.e., a threshold) that determines whether a particular strength value is labelled ‘signal’ or ‘no signal’: if the latent strength value exceeds the criterion the response ‘signal’ is given, and otherwise the response is ‘no signal’.
Figure 1. Graphical representation of the Gaussian SDT model. The distribution on the left represents the latent strength values for noise items and the distribution on the right represents the latent strength values for signal items. The parts of the distributions to the left of the response criterion represent responses where participants responded ‘no signal’ (i.e., thought it was a noise stimulus), whereas the parts of the distributions to the right of the response criterion represent responses where participants responded ‘signal’. The area under the noise distribution curve corresponds to the proportions of CR and F responses (for the areas to the left and right of the response criterion, respectively), whilst the area under the signal distribution curve corresponds to the proportions of M and H responses (again, for the areas to the left and right of the response criterion, respectively). In this graph, the considerable overlap between the noise and signal distributions suggests that discrimination between the stimulus classes is limited. The position of the response criterion also suggests that people are slightly biased towards responding ‘no signal’, regardless of whether a signal is present.
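The generative story behind Figure 1 can be illustrated with a short simulation (a sketch of our own, not part of the original study; the distribution means and the criterion value below are made up): draw latent strengths from two normal distributions, apply a fixed criterion, and recover discriminability and bias from the resulting hit and false-alarm rates.

```python
import random
from statistics import NormalDist

random.seed(1)
n = 100_000

# Hypothetical model parameters (illustrative only): noise strengths come from
# N(0, 1), signal strengths from N(1.5, 1), and the criterion sits at 0.5.
d_true, criterion = 1.5, 0.5

signal = [random.gauss(d_true, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]

# A 'signal' response is given whenever the latent strength exceeds the criterion.
H = sum(s > criterion for s in signal) / n  # hit rate
F = sum(x > criterion for x in noise) / n   # false-alarm rate

z = NormalDist().inv_cdf  # z(), i.e. qnorm() in R
d_prime = z(H) - z(F)
c = -0.5 * (z(H) + z(F))

print(d_prime, c)  # d' recovers roughly 1.5; c is roughly -0.25
```

With enough trials, d’ approximates the true separation of the two distributions, and the negative c reflects that the criterion lies below the midpoint between them (a slight bias towards ‘signal’).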
The most commonly used discriminability and response bias indices (taken from Macmillan and Creelman, 2005) are:
- d-prime: d’ = z(H) – z(F)
- where 0 indicates chance performance for discriminating the signal from noise and values above 0 indicate increasingly stronger abilities to discriminate the signal from the noise
- Criterion location: c = –½ [z(H) + z(F)]
- where 0 indicates an unbiased decision maker, values below 0 indicate a decision maker biased towards detecting the signal (independent of its presence), and values above 0 indicate a decision maker biased towards not detecting the signal (independent of its presence)
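These two formulas are straightforward to apply directly. As a minimal sketch in Python (the rates H = 0.8 and F = 0.3 below are made up for illustration, not data from the study), using the normal quantile function in place of R’s qnorm():

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # z(): inverse of the normal CDF (qnorm in R)

H, F = 0.8, 0.3  # hypothetical hit and false-alarm rates

d_prime = z(H) - z(F)     # discriminability d'
c = -0.5 * (z(H) + z(F))  # criterion location c

print(round(d_prime, 2), round(c, 2))  # d' ≈ 1.37, c ≈ -0.16
```

Here d’ well above 0 indicates above-chance discrimination, and the slightly negative c indicates a mild bias towards responding ‘signal’.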
There exist alternative measurements of response bias. Most prominently:
- Relative criterion location: c’ = c/d’ = –½ [z(H) + z(F)] / [z(H) – z(F)]
- This measurement of response bias takes the discriminability index into account. This is especially relevant when d-prime has extreme values. For example, easier discrimination tasks (yielding a higher d-prime) require a larger value of the criterion c to indicate an equivalent level of bias than more difficult discrimination tasks yielding lower d-prime values. c’ adjusts for such situations better than the criterion c does. As for c, 0 indicates an unbiased decision maker, values below 0 indicate a response bias towards detecting the signal, and values above 0 indicate a response bias towards not detecting the signal.
- Log of the likelihood ratio: ln(b) = cd’ = –½ [z(H)² – z(F)²]
- According to Macmillan and Creelman (2005), the likelihood ratio (b) is the most general of the response bias indices, and is meaningful for representations of any complexity as its scale does not depend on discriminability. The likelihood ratio is given by e raised to the product of d’ and c, where a value of 1 represents the neutral criterion point, values below 1 indicate a response bias towards detecting the signal, and values above 1 indicate a response bias towards not detecting the signal. Here we present the log of the likelihood ratio, ln(b), which centres this neutral point at 0. Therefore, in the given equation, 0 represents the neutral criterion point, values below 0 indicate a response bias towards detecting the signal, and values above 0 indicate a response bias towards not detecting the signal.
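Both alternative bias measures follow directly from d’ and c. A small Python sketch (again using made-up rates H = 0.8 and F = 0.3, not study data) also verifies the identity ln(b) = cd’:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # z(), i.e. qnorm() in R

H, F = 0.8, 0.3  # hypothetical hit and false-alarm rates

d_prime = z(H) - z(F)
c = -0.5 * (z(H) + z(F))

c_rel = c / d_prime                # relative criterion location c'
ln_b = -0.5 * (z(H)**2 - z(F)**2)  # log of the likelihood ratio ln(b)

# ln(b) equals c * d', because -1/2 (zH^2 - zF^2) = [-1/2 (zH + zF)] (zH - zF):
assert abs(ln_b - c * d_prime) < 1e-12
```

Both c’ and ln(b) come out negative for these rates, agreeing with c about the direction of the bias (towards ‘signal’) while scaling it differently.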
For all these equations, z() represents the inverse of the cumulative distribution function of the normal (i.e., Gaussian) distribution. The inverse of the cumulative distribution function is also known as the quantile function. As we will see later, the quantile function for the normal distribution is called qnorm() in R. The case study that follows will put these into context, and will illustrate how to compute each one in JASP.
Case study: Using SDT to Investigate the Contextual Effects of Instagram on the Recognition of Digitally Manipulated Images
Whereas airbrushing and photoshopping the perfect body used to be the privilege of celebrities, new low-cost digital manipulation technology has now made it possible for everyone. Combined with the advent of social media such as Instagram, this means anyone can post ‘perfect’ images online with the help of tools such as Photoshop and Facetune. The literature generally concludes that people are bad at discerning between images that have and have not been manipulated (e.g., Nightingale, Wade & Watson, 2017; Farid & Bravo, 2010). However, given that it is well known that some people digitally manipulate their images on Instagram, people might be better at detecting manipulated images when they are shown on Instagram. If this is not the case, there could be consequences for the mental health of those who view them. One can imagine the potential effect on self-esteem if someone were to compare how they look in images of themselves with images of other people that, unbeknownst to them, have been digitally manipulated (e.g., Kleemans et al., 2018).
We investigated whether an Instagram context affects the recognition of digitally manipulated images in a sample of 18- to 25-year-olds at the University of Warwick. Participants (n = 83) took part in a yes-no discrimination task: they were shown 50 images of the kind one would typically find on Instagram. Half of these images were digitally manipulated (by us) and half were not (each image existed in a manipulated and a non-manipulated form, but each participant only saw one randomly chosen version of each image). Participants then had to decide whether each image was digitally manipulated or not. This left us with individual-level response frequencies of hits, false alarms, misses and correct rejections, from which we could calculate discriminability and response bias indices for each participant.
The context of Instagram was operationalised as whether the images participants saw were surrounded by an Instagram-feed frame or not; half of the participants experienced each condition, allowing us to measure group differences in discriminability and response bias. The difference between the Instagram-feed frame present and absent conditions is illustrated in Figure 2.
Figure 2. Example images in the Instagram absent (left) and Instagram present (right) conditions. In either of these conditions, participants will have seen an image that either had or had not been digitally manipulated. To maintain the confidentiality of those who submitted images to us, this image is not one that was shown to participants, but is instead a royalty free image from https://pixabay.com/photos/romantic-rosa-beauty-linda-2748701/.
Calculating Discriminability and Response Bias in JASP
After we finished collecting our data, our prepared anonymised data with the frequencies of hits, misses, false alarms and correct rejections looked something like what we have below:
The ‘Insta-absent’ condition refers to the Instagram-feed frame absent condition, whilst the ‘Insta-present’ condition refers to the Instagram-feed frame present condition. These frequency data need to be transformed into the proportions of hits, H, and false alarms, F. Not only is this information needed in this form to calculate the discriminability and response bias indices, it is also in itself more informative about each participant’s performance.
The proportions were calculated using the following equations:
- H = HN/(HN + MN)
- F = FN/(CRN + FN)
In these formulas, XN indicates the frequency (count) of response option X, as opposed to its proportion. To calculate these in JASP, you will have to compute a new column: click on the plus ‘+’ symbol next to the column headings, and then click the ‘R’ symbol to indicate you would like to define the column using R code.
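Outside JASP, the same transformation from frequencies to proportions can be sketched in a few lines of Python (the counts below are made up for illustration, not taken from the study data; HN, MN, FN, and CRN follow the naming in the formulas above):

```python
# Hypothetical frequencies for one participant in a 50-image experiment
# (25 signal and 25 noise trials; illustrative values only).
HN, MN = 20, 5    # hits and misses (manipulated images)
FN, CRN = 10, 15  # false alarms and correct rejections (non-manipulated images)

H = HN / (HN + MN)   # hit rate
F = FN / (CRN + FN)  # false-alarm rate

print(H, F)  # 0.8 0.4
```

Note that the denominators are the total numbers of signal and noise trials, respectively, so H and F are always between 0 and 1.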
You will need to create new columns for both H and F. After creating each column, directly above the column headings there should be a field where you can define the column in R. Assuming the frequency columns are named HN, MN, FN, and CRN as in the formulas above, you will need to enter HN / (HN + MN) for the H column and FN / (CRN + FN) for the F column.
You will then need to compute columns for the SDT indices in the same way, this time entering the following R code for each index:
- d’: qnorm(H) - qnorm(F)
- c: -0.5 * (qnorm(H) + qnorm(F))
- c’: -0.5 * ((qnorm(H) + qnorm(F)) / (qnorm(H) - qnorm(F)))
- ln(b): -0.5 * (qnorm(H)^2 - qnorm(F)^2)
As mentioned above, qnorm() is simply the R function corresponding to z() in the SDT formulas above, the inverse of the cumulative distribution function of the normal distribution (or quantile function of the normal distribution). Thus, these are simply the formulas from above. Once each column is computed, you should see the column fill with numbers automatically, providing you with the indices you require.
For those interested, we analysed our results using a Bayesian t-test of the between-condition differences in both d’ and c. We found support for the null hypothesis (i.e., no difference between conditions), suggesting that Instagram does not provide contextual cues for detecting digital manipulation in images. Below we have attached a JASP file with only the original response frequencies from our data, as well as a JASP file with all the computed SDT indices and the data analysis, so you can try the calculations for yourself and see if you get the same results as us. We wish you the best of luck conducting your own SDT analyses in JASP; the possibilities are endless!
Materials to recreate the analyses can be found at https://osf.io/wujnr/.
References
Farid, H., & Bravo, M. J. (2010). Image forensic analyses that elude the human visual system. Media Forensics and Security II, 7541, 1–19.
Kellen, D., & Klauer, K. C. (2018). Elementary signal detection and threshold theory. In J. T. Wixted (Ed.), Stevens’ Handbook of Experimental Psychology and Cognitive Neuroscience (pp. 1–39). John Wiley & Sons, Inc. https://doi.org/10.1002/9781119170174.epcn505
Kleemans, M., Daalmans, S., Carbaat, I., & Anschütz, D. (2018). Picture perfect: The direct effect of manipulated Instagram photos on body image in adolescent girls. Media Psychology, 21(1), 93–110. https://doi.org/10.1080/15213269.2016.1257392
Macmillan, N. A., & Creelman, C. D. (1990). Response bias: Characteristics of detection theory, threshold theory, and “nonparametric” indexes. Psychological Bulletin, 107(3), 401.
Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user’s guide. Psychology Press.
Nightingale, S. J., Wade, K. A., & Watson, D. G. (2017). Can people identify original and manipulated photos of real-world scenes? Cognitive Research: Principles and Implications, 2, 30.