Altair SmartWorks™ Analytics

 

Using the Auto ML Node

The Auto ML node trains a classifier by using built-in feature transformations and optional automated hyperparameter tuning of logistic regression and decision tree algorithms.
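
Conceptually, the node's fitting options map onto a standard scikit-learn workflow. The sketch below is only an illustration of that mapping, assuming hypothetical column names and parameter grids; it is not the code the node actually generates.

    # A conceptual sketch of the two fitting options in plain scikit-learn.
    # The column names and parameter grids are illustrative assumptions only.
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler
    from sklearn.tree import DecisionTreeClassifier

    # Built-in feature transformations: scale numeric columns, one-hot encode categorical columns.
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), ["age", "balance"]),            # hypothetical numeric columns
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["job"]),  # hypothetical categorical column
    ])

    # Default: a single logistic regression model with the built-in transformations.
    default_model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])

    # Grid Search: a 5-fold cross-validated search over logistic regression and decision tree.
    search = GridSearchCV(
        Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))]),
        param_grid=[
            {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1.0, 10.0]},
            {"clf": [DecisionTreeClassifier()], "clf__max_depth": [3, 5, 10]},
        ],
        cv=5,
        scoring="roc_auc",
    )
    # Fit either variant on training data: default_model.fit(X_train, y_train) or search.fit(X_train, y_train)

Either variant can then be fitted to the training data; with the grid search, search.best_estimator_ holds the winning pipeline.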

Prerequisites

  • An Execution Profile with an active session linked to the workflow

  • A Data Frame node

  • ML Flow configuration settings added as an internal connection

 

Steps

  1. From the Machine Learning group of the Nodes tabbed page, drag and drop the Auto ML node onto the Workflow Canvas.

  2. Connect the output socket of the Data Frame node to the input socket of the Auto ML node.

  3. Configure the Auto ML node by double-clicking the node or by using the Open option in the node menu.

  4. Specify the following settings:

     Property

     Description

     Dependent variable

     Select a dependent variable of text data type only (the column on which the prediction will be based) from the list of available variables of the input data source.

     Column names

     Select the columns of the input data source to be included in the resultant model.

     Fitting option

     Specify the fitting option or model (Default, Grid Search).

     • Default indicates that you can create a logistic regression model with built-in transformations.

     • Grid Search indicates that you can perform a 5-fold cross-validated grid search to find the optimal parameters for logistic regression and decision tree models with built-in transformations. You can then select which machine learning models to train and test, including Logistic Regression, Decision Tree, KNN, Random Forest, Gradient Boosting, and Bagging (applicable only for the Pandas engine).

     Application Connection profile

     Select the application connection profile from the Connection list and then select the configured ML Flow server added as an internal connection.

  5. Select the previously configured ML Flow connection from the connection profile dropdown.

  6. Check the code that will be executed for your specified Auto ML configuration by saving your specifications and then clicking the Code tab of the Auto ML Node Viewer. You can refine the code further in this tab.

  7. The auto-generated code is displayed in the Code tabbed page. The Input Properties section is read-only and pre-populated with the input table name; however, you can modify the object name in the Output Properties section.

     

  8. Complete the Auto ML configuration by clicking Save. Otherwise, cancel your changes and return to the Workflow Canvas by clicking Discard or simply closing the Auto ML Viewer. Execute your configuration by clicking the Run button. When the operation is complete, the Workflow Canvas is populated with the generated Model object node.

  9. After the Auto ML node runs, a record is created in the ML Flow server that displays the details of the experiment (a tracking sketch is given after these steps).

  10. Click any Experiment ID to view more details of the experiment, such as its parameters, metrics, and artifacts.

  11. For more details about obtaining the configured ML Flow URL, see Using ML Flow.
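
The experiment record described in the last steps is standard ML Flow tracking. As a rough illustration of what gets logged, the sketch below calls the open-source MLflow API directly; the tracking URI, experiment name, parameters, and metric values are assumptions, not the node's actual output.

    # A minimal sketch of MLflow experiment tracking; the Auto ML node performs
    # equivalent logging automatically. All names and values below are illustrative.
    import mlflow
    import mlflow.sklearn

    mlflow.set_tracking_uri("http://mlflow-server:5000")  # hypothetical ML Flow server URL
    mlflow.set_experiment("automl-demo")                   # hypothetical experiment name

    with mlflow.start_run():
        mlflow.log_param("fitting_option", "Grid Search")  # how the model was fit
        mlflow.log_param("cv_folds", 5)
        mlflow.log_metric("roc_auc", 0.91)                  # illustrative metric value
        # `search` is the fitted grid search from the earlier sketch
        mlflow.sklearn.log_model(search.best_estimator_, "model")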

 

Working with the AutoML Model Object Node

The output of the Auto ML process node is an object node called a Model. This object node is an interactive report containing charts and tables. Users can select which models are displayed in the charts and tables, as well as rank models according to different metrics. In this way, users know right away how well their models perform, how these models compare with each other, and what to do next. The models generated from the AutoML node are explained using:

  • Pipelines - A diagram view of model pipeline components and their parameters

  • Predictions - A sample view of the predictions of the model computed on a dataset

  • Evaluations - Defined charts and metrics to evaluate the performance of the predictions of the model on the dataset against the true values.

Pipelines

The visualizations for the Pipelines item vary according to the configuration of the AutoML node. When configuring the node, two fitting options are possible: Default and Grid Search. The Default option trains a single machine learning model, whereas the Grid Search option performs a search to train multiple models and selects one as the final output. Consequently, the Grid Search option includes an additional visualization that provides details about the search results. The example below shows the pipeline visualization obtained from the Default configuration.

 

Pipeline Details displays a diagram of every pipeline component. These components can be expanded to obtain details about the component’s parameter values. If more than one model is generated by the AutoML node, you can select which model to view Pipeline Details for from the left-hand panel of the node.
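
The same component and parameter information can be read off a fitted scikit-learn pipeline programmatically. The sketch below assumes the hypothetical default_model pipeline from the earlier sketch, not an object exposed by the node itself.

    # A minimal sketch of inspecting pipeline components and their parameters,
    # assuming the hypothetical `default_model` pipeline from the earlier sketch.
    for name, step in default_model.named_steps.items():
        print(name, "->", type(step).__name__)   # e.g. prep -> ColumnTransformer, clf -> LogisticRegression

    # Every parameter of every component, equivalent to expanding the diagram nodes.
    for param, value in default_model.get_params(deep=True).items():
        print(param, "=", value)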

Evaluations

The visualizations for the Evaluations item vary according to the problem type. A machine learning model can solve different types of problems; the problem types currently available in the platform are binary classification and multi-class classification. Binary classification problems include an additional visualization. The visualizations below show an ROC curve (with its associated ROC AUC metric) and a confusion matrix (with its associated metrics).
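
For reference, the metrics behind these charts can be reproduced with scikit-learn. The sketch below assumes a fitted binary classifier named model and held-out data X_test and y_test, which are placeholders rather than objects created by the node.

    # A minimal sketch of the evaluation metrics behind the charts, assuming a
    # fitted binary classifier `model` and held-out data `X_test`, `y_test`.
    from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

    proba = model.predict_proba(X_test)[:, 1]   # probability of the positive class
    preds = model.predict(X_test)

    print("ROC AUC:", roc_auc_score(y_test, proba))
    print("Confusion matrix:\n", confusion_matrix(y_test, preds))
    print(classification_report(y_test, preds))  # precision, recall, F1 per class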

 

 

Selecting a fitting option of Grid Search in the AutoML node and then running the node performs one grid search for logistic regression and one grid search for decision tree. Each grid search picks the best-performing model for its algorithm, so the Grid Search option generates two models: one logistic regression and one decision tree. Since two models are now generated from the Grid Search option, you may want to compare the performance of these models against each other in a single view.

Although the current example shows only one model, the ROC curve visualization actually allows you to compare multiple models in a single view via a drop-down list from which you can select models to compare.
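
As a rough illustration of such a comparison, the sketch below overlays the ROC curves of two fitted classifiers; logreg_model and tree_model are hypothetical placeholders for the two models produced by the Grid Search option.

    # A minimal sketch of comparing two models' ROC curves in a single plot,
    # assuming hypothetical fitted classifiers `logreg_model` and `tree_model`
    # and held-out data `X_test`, `y_test`.
    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_auc_score, roc_curve

    for label, clf in [("Logistic Regression", logreg_model), ("Decision Tree", tree_model)]:
        proba = clf.predict_proba(X_test)[:, 1]
        fpr, tpr, _ = roc_curve(y_test, proba)
        plt.plot(fpr, tpr, label=f"{label} (AUC = {roc_auc_score(y_test, proba):.2f})")

    plt.plot([0, 1], [0, 1], linestyle="--", label="Chance")  # diagonal reference line
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()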

 

Predictions

The Predictions visualization provides a view of a sample of the training dataset with predictions and probability columns for every row in the sample. This sample represents the first 200 rows of the training dataset.
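
To make the shape of this view concrete, the sketch below builds an equivalent table with pandas and scikit-learn; train_df, the target column name, and model are hypothetical placeholders.

    # A minimal sketch of the predictions sample, assuming a fitted classifier
    # `model`, a pandas DataFrame `train_df`, and a hypothetical target column "target".
    sample = train_df.head(200).copy()                  # first 200 rows of the training dataset
    features = sample.drop(columns=["target"])          # hypothetical dependent-variable column
    sample["prediction"] = model.predict(features)      # predicted class per row
    sample["probability"] = model.predict_proba(features).max(axis=1)  # probability of the predicted class
    print(sample.head())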