How to Predict with Machine Learning Models in JASP: Classification

This blog post will demonstrate how a machine learning model trained in JASP can be used to generate predictions for new data. The procedure we follow is standardized for all the supervised machine learning analyses in JASP, so the demonstration here generalizes to all of them. Please note that we use the latest version of JASP (version 0.16.2). For our demonstration we use the model trained in our previous blog post on classification. First, we provide some background information about making predictions with a machine learning model.

Background on Predictions with a Machine Learning Model

Recall that a classification problem has three components: (a) features/predictors x=(x_{1}, x_{2}, \ldots, x_{p}), (b) a categorical outcome/target variable y, and (c) a true but unknown function f that relates the features to the target variable. For the problem at hand, the features x include a customer’s phone contract details and what the customer pays in monthly and total charges. The categorical outcome variable y is named Churn and indicates whether a customer cancels or unsubscribes from a company’s services (“Yes”) or not (“No”). The third component, the true function f, is unknown.

The prediction problem is as follows: Provided with a (new) customer’s features x_{\text{new}}=(x_{1, \text{new}}, \ldots, x_{p, \text{new}}), can we predict this customer’s outcome variable y? In other words, can we predict whether this customer is going to churn or not, given features such as contract and billing information? Ideally, we would address this problem by applying the true function f to the customer’s features x_{\text{new}}. However, because the true function f is unknown, we have to use an estimate \hat{f} instead, resulting in the prediction

    \[y_{\text{pred}} = \hat{f}(x_{1, \text{new}}, \ldots, x_{p, \text{new}}) .\]

An estimate \hat{f} is obtained by feeding a machine learning algorithm a training (i.e., old) data set in which both the features x_{1}, \ldots, x_{p} and the corresponding outcomes y (“Yes” or “No”) are known, such as the publicly available Telco Customer Churn data set from Kaggle. In our previous blog post, this data set was fed into a “K-Nearest Neighbors” algorithm. To apply the resulting classification model to a new data set, we first have to save the model in JASP.

Saving Trained Machine Learning Models in JASP

Since JASP 0.16, all supervised (i.e., regression and classification) machine learning analyses include options to save a trained model. These options can be found in the input panel on the left-hand side of the corresponding analysis (see Figure 1).

Figure 1. The options in the interface used for saving a trained model.

To follow along, go to “Open”, “Data Library”, “10. Machine Learning”, Telco Customer Churn, and click on the spreadsheet icon with the JASP logo.

To save the K-Nearest Neighbors Classification model trained with the Telco Customer Churn data set, we click the “Browse” button next to “Save as” and choose a location and file name with the .jaspML extension in our desired directory. After ticking the checkbox “Save trained model”, the trained model is stored there.

Predicting Customer Churn

We now show how such a saved .jaspML model is used to make predictions by applying it to a (new) data set consisting of only features. The example data set used for this demonstration can be downloaded as a .csv file from our OSF repository.

After opening this data set, we see that it consists of customers’ features with the same column names as in the Telco Customer Churn data set. To apply the trained model to these new features we go to “Prediction” in the Machine Learning module.

To load a trained model, click on the “Browse” button and select the saved .jaspML model. JASP automatically detects which machine learning algorithm was used in the training phase. In this case, it is a K-Nearest Neighbors Classification model with the number of neighbors set to K=42 during training, based on 4500 training samples, as shown in Table 1. The note under Table 1 indicates which features are required, but still missing, to make predictions using the trained model.

Table 1. Summary of the loaded model.
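The check behind the note under Table 1 can be illustrated outside JASP. The following is only a minimal Python sketch of the idea, not JASP’s implementation: the feature names and the in-memory stand-in for the new .csv file are hypothetical, chosen for illustration.

```python
import csv
import io


def missing_features(required, available):
    """Return the required feature names that are absent from the available columns."""
    available_set = set(available)
    return [name for name in required if name not in available_set]


# Hypothetical feature names that a trained model might require.
required = ["Contract", "MonthlyCharges", "TotalCharges"]

# Tiny in-memory stand-in for a new .csv file (hypothetical content).
new_data = io.StringIO("Contract,MonthlyCharges\nOne year,56.95\n")

# The first row of the file holds the column names.
header = next(csv.reader(new_data))

# Features the model needs but the new data set does not provide.
missing = missing_features(required, header)
print(missing)
```

Predictions can only be generated once this list is empty, which is why the new data set must use the same column names as the training data.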

To apply the trained model to the new data set, we simply select all required features (in this case, all columns in the data set) and drag them into the “Predictors” field. This action fills the initially empty Table 2 with the predicted classes for the first 20 rows of the new data set. The features corresponding to these predictions can be added to the table by enabling the “Add predictors” option. In addition, the “From” and “To” options in the input panel control which rows are displayed in the table.

Table 2. Predicted classes for the first 20 rows of the new data set after applying the trained model.

The predictions for all rows in the data set can be added as a new column to the data set by clicking the option “Add predictions to data”, and by providing a name for this column in the field behind “Column name”. When the actual labels for these customers are eventually known, the predictive accuracy of the model can then be re-evaluated by comparing the predicted labels to the true labels. The model can then be retrained using all available data and applied to a new data set.
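Once the true labels are available, re-evaluating the model amounts to comparing two columns. The sketch below shows one way to do this in Python, assuming the predicted and true labels have been exported as two equally long lists; the function names and the five example labels are made up for illustration and are not results from the Telco data.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Proportion of rows for which the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(predicted)


def confusion_counts(predicted, actual):
    """Count each (true label, predicted label) combination."""
    return Counter(zip(actual, predicted))


# Made-up labels for five customers (illustration only).
predicted = ["No", "Yes", "No", "No", "Yes"]
actual = ["No", "Yes", "Yes", "No", "No"]

acc = accuracy(predicted, actual)
counts = confusion_counts(predicted, actual)
print(acc)
print(counts)
```

The counts separate the two kinds of error: customers who churned but were predicted to stay, and customers who stayed but were predicted to churn.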

Explaining the Algorithm

We now briefly elaborate on how the K-Nearest Neighbors Classification predictions are obtained. The key idea is as follows: Provided with a new customer’s vector of features x_{\text{new}}, the prediction \hat{y}_{\text{new}} (written y_{\text{pred}} above), i.e., “Yes” or “No”, is based on a (weighted) average of the K known labels y_{1}, y_{2}, \ldots, y_{K} of the observations in the training (i.e., old) data set whose features are closest to x_{\text{new}} in feature space.
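One common way to write this idea as a formula is a weighted vote. The weights w_{i}, the class index c, and the indicator 1(\cdot) are notation introduced here for illustration and do not appear elsewhere in this post:

    \[\hat{y}_{\text{new}} = \arg\max_{c \in \{\text{Yes}, \text{No}\}} \sum_{i=1}^{K} w_{i} \, 1(y_{i} = c) ,\]

where y_{1}, \ldots, y_{K} are the labels of the K nearest training observations, w_{i} \geq 0 is the weight given to the i-th neighbor, and 1(y_{i} = c) equals 1 if y_{i} = c and 0 otherwise. When all weights are equal, this reduces to a simple majority vote among the K neighbors.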

The parameters of the algorithm include (i) the weighting method, (ii) the distance used to define what is meant by “closest to x_{\text{new}} in feature space”, and (iii) the number of neighbors K used to predict \hat{y}_{\text{new}}. These were set and trained to be (i) “Rectangular”, (ii) “Euclidean”, and (iii) K=42, respectively. For a more elaborate discussion on these parameters, see the corresponding section in our previous blog post.
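To make these three parameters concrete, here is a minimal Python sketch of a nearest-neighbors prediction with equal (“Rectangular”) weights and Euclidean distance. It is an illustration under simplifying assumptions, not the code JASP runs: the function names and the toy data are hypothetical, all features are assumed to be numeric and on comparable scales, and K=3 is used instead of K=42 to keep the example small.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_x, train_y, x_new, k):
    """Predict the label of x_new by an equal-weight vote among its k nearest training points."""
    # Order the training observations by their distance to x_new.
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], x_new))
    # Collect the labels of the k closest observations.
    nearest_labels = [train_y[i] for i in order[:k]]
    # Rectangular weighting: every neighbor counts equally, so take the most frequent label.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy training data with two numeric features per customer.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["No", "No", "No", "Yes", "Yes", "Yes"]

prediction_low = knn_predict(train_x, train_y, (1.1, 1.0), k=3)
prediction_high = knn_predict(train_x, train_y, (5.1, 5.0), k=3)
print(prediction_low, prediction_high)
```

Other weighting methods would replace the equal vote with one in which closer neighbors count more, and other distances would replace the `euclidean` function. Categorical features, such as contract type, would first need a numeric encoding, which this sketch leaves out.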

Each classification algorithm (such as “Boosting Classification”, “K-Nearest Neighbors Classification”, “Linear Discriminant Classification”, and “Random Forest Classification”) defines its own heuristic and parameters to predict the categorical outcome y from the features x. However, prediction using these trained models in JASP follows the same procedure as was demonstrated here. We hope that this standardized way of generating predictions in JASP is to the liking of our users.

What is next for the Machine Learning module?

Since our previous blog post about machine learning, we have implemented four additional analyses: “Neural Network” regression and classification, “Decision Tree” regression and classification, “Support Vector Machines” regression and classification, and “K-Medoids” clustering. Since we like to remain in close contact with our users, we invite you to share any feedback or feature requests via our issue page. Also keep an eye out for our upcoming blog post about the neural network analysis applied in a regression context.

About the authors

Alexander Ly

Alexander Ly is the CTO of JASP and responsible for guiding JASP’s scientific and technological strategy as well as the development of some Bayesian tests.

Koen Derks

Koen Derks is a PhD candidate at Nyenrode Business University and at the Psychological Methods group at the University of Amsterdam. At JASP, he is creating JfA, an add-on module for Auditing.