Altair SmartWorks Analytics

 

Model Evaluation

The Model Evaluation process node allows you to calculate, visualize, and export metrics for deployed models using batches of prediction data. It works best with the built-in feedback loop. You can use the Model Evaluation plugin to detect concept drift and determine whether a deployed model should be retrained on newer data.

The simplest (and ideal) model evaluation workflow is a three-step process.

  1. Import request and response data equipped with ground-truth labels or values.

  2. Consume the data in the plugin to calculate and visualize one or more metrics over time.

  3. Export the metrics to a data store of choice.

 

You could also create a model evaluation workflow with a five-step process if your actuals are stored separately from your predictions.

  1. Import the first batch of actual data.

  2. Import the second batch of prediction data.

  3. Join these two batches of data (see the join sketch after this list).

  4. Consume the joined data in the plugin to calculate and visualize one or more metrics over time.

  5. Export the metrics to a data store of choice.
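
The join in step 3 is a key-based merge of the actuals batch with the predictions batch. The pandas sketch below is only an illustration of that operation; the request_id key and all sample values are assumptions made for this example, and in a SmartWorks Analytics workflow the join itself is performed on the Workflow Canvas rather than in code.

  import pandas as pd

  # Hypothetical batches: the request_id key and all sample values are
  # illustrative assumptions, not SmartWorks Analytics requirements.
  predictions = pd.DataFrame({
      "request_id":  [101, 102, 103],
      "timestamp":   pd.to_datetime(["2023-01-05 10:00",
                                     "2023-01-05 10:05",
                                     "2023-01-05 10:10"]),
      "prediction":  ["yes", "no", "no"],
      "probability": [0.91, 0.12, 0.47],
  })
  actuals = pd.DataFrame({
      "request_id": [101, 102, 103],
      "actual":     ["yes", "no", "yes"],
  })

  # Inner join, so that only rows with both a prediction and a
  # ground-truth label reach the Model Evaluation node.
  joined = predictions.merge(actuals, on="request_id", how="inner")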

 

Prerequisites

  • The model(s) being evaluated solve a supervised learning problem with a single dependent variable.

  • The problem being solved is either a binary classification, multiclass classification, or regression problem.

  • There exists a single dataframe containing at least the following columns (a minimal example dataframe follows this list):

    • timestamp, a string or datetime column containing the prediction timestamps, and

    • actual, a column of numeric or string type containing the ground-truth labels or values.

The dataframe should also contain:

    • in the case of a binary classification problem:

      • a prediction column, of string or integer type, for calculating the accuracy, confusion matrix, F1 score, precision or recall; or

      • a probability or probability_{{ positive_class }} column of continuous, numeric type taking on values in the interval [0, 1], where positive_class is the label representing the positive class (such as 1, “yes” and “anomaly”), for calculating the AUC–ROC, cumulative gains, Gini coefficient, KS plot, KS statistic, lift and ROC.

    • in the case of a multiclass classification problem with k classes:

      • a prediction column, of string or integer type, for calculating the accuracy, confusion matrix, weighted F1 score, weighted precision or weighted recall; or

      • a sequence of k columns, probability_{{ class_1 }}, probability_{{ class_2 }}, …, probability_{{ class_k }} (one per class), of continuous, numeric type, taking on values in the interval [0, 1] and summing to 1.00 row-wise, for calculating the weighted, OvO AUC–ROC.

    • in the case of a regression problem:

      • a prediction column taking on continuous, numeric values.
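
As a concrete illustration of these requirements, the pandas sketch below builds a dataframe that satisfies the prerequisites for a binary classification problem whose positive class is “yes”; all column values are invented for this example.

  import pandas as pd

  # Minimal dataframe meeting the binary classification prerequisites:
  # timestamp, actual, prediction, and a probability column for the
  # positive class ("yes"). All values are invented for illustration.
  df = pd.DataFrame({
      "timestamp": pd.to_datetime(["2023-01-05 10:00:00",
                                   "2023-01-05 11:00:00",
                                   "2023-01-05 12:00:00",
                                   "2023-01-05 13:00:00"]),
      "actual":      ["yes", "no", "yes", "no"],
      "prediction":  ["yes", "no", "no",  "no"],
      "probability": [0.93, 0.08, 0.41, 0.22],  # values in [0, 1]
  })

  # For a multiclass problem, supply one probability_{{ class }} column
  # per class, summing to 1.00 per row; for regression, a numeric
  # prediction column is sufficient.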

Calculating and Visualizing Metrics for Models

The steps below describe how to configure the Model Evaluation node to calculate and visualize metrics for models.

Steps:

  1. Import data from a database or text file into the Workflow Canvas.

  2. Drag and drop the Model Evaluation node from the Machine Learning group of the Node tab of the Palette into the Workflow Canvas.

  3. Connect the output socket of the Import Database/Text dataframe to the input socket of the Model Evaluation node.

  4. Configure Evaluation Settings by opening the Model Evaluation Node Viewer.

  5. Review the Information section, which provides details of the model, such as the prediction type, deployment type, and champion model.

  6. If the model type is "Binary Classification," select a Positive Label from the dropdown provided in the Configure section.

  7. Under the Define Window Size section (a sketch of the resulting windowing follows these steps):

    • Enter a positive integer into the Multiple text box and

    • Select a value from the Base Frequency dropdown.

    The possible base frequencies are:

      • second(s)

      • minute(s)

      • hour(s)

      • day(s)

      • week(s)

      • month(s)

      • year(s)

  8. In the Select Metrics section, select one or more metrics by checking the appropriate boxes. The available metrics are determined by the prediction type and the columns present in the input dataframe (a sketch showing how several of these metrics are calculated follows these steps):

    • Binary Classification

      • Accuracy

      • Area under the curve, receiver operating characteristic (AUC–ROC)

      • Confusion matrix

      • Cumulative gains

      • F1 score

      • Gini coefficient

      • Kolmogorov–Smirnov (KS) plot

      • KS statistic

      • Lift (deciles)

      • Precision

      • Recall (true positive rate, sensitivity)

      • ROC

    • Multiclass Classification

      • Accuracy

      • AUC–ROC, weighted, one-versus-one (OvO)

      • Confusion matrix

      • F1 score, weighted

      • Precision, weighted

      • Recall, weighted

    • Regression

      • Mean absolute error (MAE)

      • Mean squared error (MSE)

      • Root-mean-square error (RMSE)

      • R-squared

    When a metric is selected, a new tab for that metric is created next to the Data Preview tab. Under this tab, you can view an interactive chart of the metric.

     

  9. For all metrics except the confusion matrix, cumulative gains, lift (deciles), KS plot, and ROC, you can:

    • Set “From” and “To” dates using a date and time picker widget, and

    • Zoom into and out of the chart.

  10. For all metrics, the Model Selection dropdown can be used to plot metrics for selected models.

  11. Save your evaluation settings and then run the node to export the calculated metrics.  
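
The window defined in step 7 is the Multiple times the Base Frequency; for example, a Multiple of 2 with a Base Frequency of hour(s) buckets the predictions into two-hour windows so the selected metrics can be tracked over time. The pandas sketch below mimics that bucketing by computing a per-window accuracy; it is only an analogy for the windowing behaviour, not the node's implementation, and all sample values are invented.

  import pandas as pd

  df = pd.DataFrame({
      "timestamp": pd.to_datetime(["2023-01-05 10:15", "2023-01-05 10:45",
                                   "2023-01-05 12:30", "2023-01-05 13:10"]),
      "actual":     ["yes", "no", "yes", "no"],
      "prediction": ["yes", "no", "no",  "no"],
  })

  # Window size = Multiple (2) x Base Frequency (hour) -> "2h" buckets.
  window = pd.Grouper(key="timestamp", freq="2h")
  accuracy_per_window = (
      df.assign(correct=df["actual"] == df["prediction"])
        .groupby(window)["correct"]
        .mean()
  )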
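For the probability-based binary classification metrics in step 8, the scikit-learn sketch below shows how the AUC–ROC, Gini coefficient, and KS statistic can be derived from the probability column (the Gini coefficient as 2 × AUC − 1, and the KS statistic as the largest gap between the true positive and false positive rates). The node performs these calculations for you; the sample values here are invented.

  import numpy as np
  from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve

  actual      = np.array(["yes", "no", "yes", "no", "yes", "no"])
  prediction  = np.array(["yes", "no", "no",  "no", "yes", "yes"])
  probability = np.array([0.93, 0.08, 0.41, 0.22, 0.77, 0.55])

  y_true = (actual == "yes").astype(int)   # positive class = "yes"

  accuracy = accuracy_score(actual, prediction)   # from the prediction column
  auc_roc  = roc_auc_score(y_true, probability)   # from the probability column
  gini     = 2 * auc_roc - 1                      # Gini coefficient from AUC
  fpr, tpr, _ = roc_curve(y_true, probability)
  ks_statistic = (tpr - fpr).max()                # KS statistic from the ROC curve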

 

RELATED READING:

Model Evaluation Explained