Model Evaluation
The Model Evaluation process node allows you to calculate, visualize, and export metrics for deployed models using batches of prediction data. It works best with the built-in feedback loop. You can use the Model Evaluation plugin to detect concept drift and determine whether a deployed model should be retrained on newer data.
The simplest (and ideal) model evaluation workflow is a three-step process:

Import request and response data equipped with ground-truth labels or values.

Consume the data in the plugin to calculate and visualize one or more metrics over time.

Export the metrics to a data store of choice.
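Outside the plugin, the same three steps can be sketched with pandas. The data and column names below are hypothetical, chosen to match the schema described under Prerequisites:

```python
import pandas as pd

# Step 1: hypothetical request/response data with ground-truth labels attached.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:05", "2024-01-01 00:20",
        "2024-01-01 01:10", "2024-01-01 01:40",
    ]),
    "prediction": [1, 0, 1, 1],
    "actual":     [1, 0, 0, 1],
})

# Step 2: calculate a metric (accuracy here) over hourly time windows.
metrics = (
    df.set_index("timestamp")
      .groupby(pd.Grouper(freq="1h"))
      .apply(lambda w: (w["actual"] == w["prediction"]).mean())
)

# Step 3: export the metrics to a data store of choice (a CSV file here).
metrics.rename("accuracy").to_csv("accuracy_over_time.csv")
```

The plugin performs the windowing and metric calculation for you; this sketch only illustrates the shape of the workflow.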
You can also create a model evaluation workflow with a five-step process if your actuals are stored separately from your predictions:

Import the first batch of actual data.

Import the second batch of prediction data.

Join these two batches of data.

Consume the joined data in the plugin to calculate and visualize one or more metrics over time.

Export the metrics to a data store of choice.
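The join in step 3 pairs each prediction with its actual. A minimal sketch with pandas, assuming the two batches share a common key (the column name "request_id" is a hypothetical choice for illustration):

```python
import pandas as pd

# Batch 1: hypothetical actuals, keyed by an assumed "request_id" column.
actuals = pd.DataFrame({
    "request_id": [101, 102, 103],
    "actual": [0, 1, 0],
})

# Batch 2: hypothetical predictions with their timestamps.
predictions = pd.DataFrame({
    "request_id": [101, 102, 103],
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "prediction": [0, 1, 1],
})

# Join the two batches so each prediction is paired with its ground truth.
joined = predictions.merge(actuals, on="request_id", how="inner")
```

An inner join keeps only predictions whose actuals have already arrived, which is usually what you want when actuals lag behind predictions.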
Prerequisites

Each model in question solves a supervised learning problem with exactly one dependent variable.

The problem being solved is a binary classification, multiclass classification, or regression problem.

There exists a single dataframe containing at least the following columns:

timestamp, a string or datetime column containing the prediction timestamps, and

actual, a column of numeric or string type containing the ground-truth labels or values.
The dataframe should also contain:

in the case of a binary classification problem:

a prediction column, of string or integer type, for calculating the accuracy, confusion matrix, F1 score, precision or recall; or

a probability or probability_{{ positive_class }} column of continuous, numeric type taking on values in the interval [0, 1], where positive_class is the label representing the positive class (such as 1, “yes”, or “anomaly”), for calculating the AUC–ROC, cumulative gains, Gini coefficient, KS plot, KS statistic, lift, and ROC.

in the case of a multiclass classification problem with k classes:

a prediction column, of string or integer type, for calculating the accuracy, confusion matrix, weighted F1 score, weighted precision or weighted recall; or

a sequence of columns probability_{{ class_0 }}, probability_{{ class_1 }}, …, probability_{{ class_{k-1} }}, of continuous, numeric type, taking on values in the interval [0, 1] and summing to 1 row-wise, for calculating the weighted, one-versus-one (OvO) AUC–ROC.

in the case of a regression problem:

a prediction column taking on continuous, numeric values.
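For a binary classification problem, a dataframe satisfying this schema might look like the following. All data and the positive class label ("yes") are hypothetical:

```python
import pandas as pd

# A minimal input dataframe for a binary classification problem, following
# the required schema: timestamp, actual, and either a prediction column,
# a probability column for the positive class, or both.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01 09:00", "2024-01-01 09:30"]),
    "actual": ["yes", "no"],
    "prediction": ["yes", "yes"],     # enables accuracy, F1, precision, recall
    "probability_yes": [0.91, 0.62],  # enables AUC-ROC, KS, lift, Gini, ROC
})
```

Supplying both prediction and probability columns unlocks the full set of binary classification metrics at once.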
Calculating and Visualizing Metrics for Models
The steps below describe how to configure the Model Evaluation node to calculate and visualize metrics for models.
Steps:

Import data from a database or text file into the Workflow Canvas.

Drag and drop the Model Evaluation node from the Machine Learning group of the Node tab of the Palette into the Workflow Canvas.

Connect the output socket of the Import Database/Text dataframe to the input socket of the Model Evaluation node.

Configure Evaluation Settings by opening the Model Evaluation Node Viewer.

If the model type is "Binary Classification," select a Positive Label from the dropdown provided in the Configure section.

Under the Define Window Size section:

Enter a positive integer into the Multiple text box and

Select a value from the Base Frequency dropdown. The possible base frequencies are:

second(s)

minute(s)

hour(s)

day(s)

week(s)

month(s)

year(s)
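The effective window is the Multiple times the Base Frequency; for example, Multiple = 15 with a base frequency of minute(s) yields 15-minute windows. A minimal sketch of this composition using pandas-style offset strings (the alias mapping is an assumption for illustration, not the plugin's internal representation):

```python
# Map the plugin's base frequencies to pandas-style offset aliases.
# (An assumed correspondence for illustration only.)
BASE_FREQ = {
    "second(s)": "s", "minute(s)": "min", "hour(s)": "h",
    "day(s)": "D", "week(s)": "W", "month(s)": "ME", "year(s)": "YE",
}

def window_rule(multiple: int, base_frequency: str) -> str:
    """Compose a window rule such as '15min' from Multiple and Base Frequency."""
    if multiple < 1:
        raise ValueError("Multiple must be a positive integer")
    return f"{multiple}{BASE_FREQ[base_frequency]}"

rule = window_rule(15, "minute(s)")  # "15min"
```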

In the Select Metrics section, select one or more metrics by checking the appropriate boxes. The available metrics are determined by the prediction type and columns present in the input dataframe:

Binary Classification

Accuracy

Area under the curve, receiver operating characteristic (AUC–ROC)

Confusion matrix

Cumulative gains

F1 score

Gini coefficient

Kolmogorov–Smirnov (KS) plot

KS statistic

Lift (deciles)

Precision

Recall (true positive rate, sensitivity)

ROC

Multiclass Classification

Accuracy

AUC–ROC, weighted, one-versus-one (OvO)

Confusion matrix

F1 score, weighted

Precision, weighted

Recall, weighted

Regression

Mean absolute error (MAE)

Mean squared error (MSE)

Root-mean-square error (RMSE)

R-squared
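Two of the binary classification metrics above are simple functions of the model's scores: the Gini coefficient is 2 × AUC − 1, and the KS statistic is the maximum gap between the score CDFs of the positive and negative classes. A from-scratch sketch of both (it assumes no tied scores; the plugin's own implementation may differ):

```python
import numpy as np

def auc_roc(y_true, scores):
    """AUC-ROC via the rank-sum (Mann-Whitney U) formulation; assumes no ties."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    ranks = scores.argsort().argsort() + 1  # ranks starting at 1
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def ks_statistic(y_true, scores):
    """Maximum distance between positive- and negative-class score CDFs."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    thresholds = np.unique(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    cdf_pos = np.array([(pos <= t).mean() for t in thresholds])
    cdf_neg = np.array([(neg <= t).mean() for t in thresholds])
    return np.abs(cdf_pos - cdf_neg).max()

# Toy data: two negatives, two positives.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
auc = auc_roc(y, p)   # 0.75 for this toy data
gini = 2 * auc - 1    # 0.5
```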

For all metrics except the confusion matrix, cumulative gains, lift (deciles), KS plot, and ROC, you can:

Set “From” and “To” dates using a date and time picker widget and

Zoom into and out of the chart.

For all metrics, use the Model Selection dropdown menu to plot metrics for selected models.

Save your evaluation settings and then run the node to export the calculated metrics.
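The exported metrics can land in any data store. A minimal sketch writing a hypothetical metrics table to a local SQLite database (the table and column names are assumptions for illustration):

```python
import sqlite3

import pandas as pd

# Hypothetical exported metrics: one row per (window, model, metric).
metrics = pd.DataFrame({
    "window_start": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00"]),
    "model": ["churn_v2", "churn_v2"],
    "metric": ["accuracy", "accuracy"],
    "value": [0.92, 0.88],
})

# Append the batch to a local SQLite table (created on first write).
with sqlite3.connect("model_metrics.db") as conn:
    metrics.to_sql("evaluation_metrics", conn, if_exists="append", index=False)
```

Appending each run to the same table builds up the history needed to spot metric decay, and hence concept drift, over time.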
The Information section provides details of the model, such as the prediction, deployment type, and champion model.
When a metric is selected, a new tab for that metric is created next to the Data Preview tab. Under this tab, you can view an interactive chart of the metric.