Altair SmartWorks Analytics

Registering Models

This page describes how to register each supported model type in SmartWorks Analytics.

Registering Sklearn Models

To register an Sklearn model, you must upload a single .pkl file.

Supported Versions for Sklearn

In this release, only the following package versions are supported:

  • 0.21.3

  • 0.22.2.post1

  • 0.23.2

  • 0.24.1
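As a minimal sketch, a fitted model could be checked against the versions listed above and then written to a single .pkl file with Python's built-in pickle module. The helper names (is_supported_sklearn_version, save_model_as_pkl) are hypothetical, and the sketch assumes that the platform accepts a standard pickle file; the version string passed in would normally be sklearn.__version__.

```python
import pickle

# Versions listed as supported in this release.
SUPPORTED_SKLEARN_VERSIONS = {"0.21.3", "0.22.2.post1", "0.23.2", "0.24.1"}


def is_supported_sklearn_version(version):
    """Return True if the version string is in the supported list."""
    return version.strip() in SUPPORTED_SKLEARN_VERSIONS


def save_model_as_pkl(model, path, sklearn_version):
    """Pickle a fitted model to a single .pkl file after checking the version.

    Hypothetical helper: 'model' is any fitted, picklable estimator.
    """
    if not is_supported_sklearn_version(sklearn_version):
        raise ValueError("Unsupported scikit-learn version: " + sklearn_version)
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path
```

For example, save_model_as_pkl(clf, "model.pkl", sklearn.__version__) would write model.pkl only when the installed version is one of the four listed.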

Registering Knowledge Studio Models

To register a Knowledge Studio model, you must upload a single .py file.

The specific format for registering Knowledge Studio models is as follows:

  • If you have a Single Model, then it must have a top-level Python class that has a single predict() function.

    • If the class has an IVs attribute, then the predict() function should take the variables as-is, in the same order as the IVs list.

    • If the class does not have the IVs attribute, then the predict() function should take a Pandas row as input.

    • The predict() function can return a numpy array, dictionary, or list.

  • If you have an Ensemble, then it must have a top-level Python class with sub-classes.

    • Each sub-class is a Single Model with the IVs attribute and a predict() function that takes the variables as-is, in the same order as the IVs list.

    • Each Single Model has a predict() function that returns data in the following format:

      • For classification models: node_id, node_number, probabilities (multiple items), node_size

      • For regression models: node_id, node_number, prediction, standard deviation, node_size

    • DVCategories must be defined for all sub-classes if the model is for classification. If the model is for regression, DVCategories must not be defined in any sub-class.
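The format rules above can be illustrated with a rough sketch. All class names, variable names, and scoring rules here are hypothetical. The sketch also assumes that "sub-classes" means classes nested inside the top-level class, that predict() returns a flat list, and that DVCategories is a list of category labels; none of these details is confirmed by this page, so treat the code as an illustration of the shape rather than a file exported by Knowledge Studio.

```python
class SingleModelWithIVs:
    """Hypothetical Single Model that declares its input variables."""

    # predict() receives the variables as-is, in this order.
    IVs = ["age", "income"]

    def predict(self, age, income):
        # Toy scoring rule for illustration only; returns a list.
        return [2 * age + income // 1000]


class SingleModelWithoutIVs:
    """Hypothetical Single Model without IVs; predict() receives one row."""

    def predict(self, row):
        # 'row' is expected to be a Pandas row; values are read by column name.
        return {"prediction": 2 * row["age"] + row["income"] // 1000}


class MyEnsemble:
    """Hypothetical Ensemble: a top-level class holding Single Model sub-classes."""

    class Tree1:
        IVs = ["age", "income"]
        DVCategories = ["no", "yes"]  # present because this is a classification model

        def predict(self, age, income):
            # node_id, node_number, one probability per category, node_size
            if age > 40:
                return [3, 2, 0.2, 0.8, 120]
            return [2, 1, 0.7, 0.3, 80]

    class Tree2:
        IVs = ["age", "income"]
        DVCategories = ["no", "yes"]

        def predict(self, age, income):
            if income > 60000:
                return [3, 2, 0.4, 0.6, 150]
            return [2, 1, 0.9, 0.1, 50]
```

A regression ensemble would follow the same layout, but each sub-class would omit DVCategories and return node_id, node_number, prediction, standard deviation, and node_size.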

Supported versions of Knowledge Studio models with MLOps

  • 2021.1.0

Supported Knowledge Studio Models

Predictive Models

  • Linear Regression

  • Logistic Regression

  • Deep Learning - Regression

  • Deep Learning - Classification

  • Factor Analysis

  • PCA

  • Regularization

  • Strategy Tree

  • Decision Tree - Regression

  • Decision Tree - Classification

  • Reject Inference

Ensembles

  • Bagging - Regression

  • Bagging - Classification

  • Boosting - Classification

  • Random Forest - Regression

  • Random Forest - Classification

Note: Boosting is valid only for binary classification, so there is no regression variant.

Registering Python Models

You must upload a single .py file to register a Python model.

The specific format for registering Python models is as follows:

  • The model must be a class that extends the mlflow.pyfunc.PythonModel class from MLflow. The class must define a predict(self, context, model_input) function.

  • In the predict() function, the model_input variable is a Pandas DataFrame. The function must return either a Python list or a NumPy ndarray.

  • The class can define additional functions to perform pre-processing, post-processing, and so on.

  • The container that is deployed for this model has pandas==1.1.3 and numpy==1.20.3 installed, so you can use those libraries to process your data.

Example code is shown below.

import mlflow


class MyModel(mlflow.pyfunc.PythonModel):
    def preprocess(self):
        # Optional hook for pre-processing logic
        pass

    def postprocess(self):
        # Optional hook for post-processing logic
        pass

    def predict(self, context, model_input):
        # model_input is a Pandas DataFrame; build one result per row
        out = list()
        for i, x in model_input.iterrows():
            if x["size"] > 50:
                out.append([True, 10])
            else:
                out.append([False, 20])
        return out

Supported versions of Python models with MLOps

  • 3.7

Registering PySpark Models

PySpark models can be registered in two ways:

  • Training the model within the platform using the Spark AutoML Node and then registering the model through the MLflow UI.

  • Training the model outside the platform, zipping it, uploading it to the platform, and then registering it through the MLOps Node UI.

Training a PySpark Model in SmartWorks Analytics

You can train your own PySpark models in the platform and have them managed (versioning, organization, etc.) via the Model Registry.

Steps:

  1. Train your model using the Spark-based AutoML Node in the SmartWorks platform.

  2. Navigate to the MLflow UI where your model is tracked. This is the “Connection Profile” you set when you configured the AutoML node.

  3. Click your model to view it.

    • If you scroll down to the Tags section, you’ll see some information about your model, such as createdBy, packageVersion, type, and modelMetadata.

    • If you wish to add your own custom tags to this model, you can edit the modelMetadata JSON to do so. See the Training a PySpark Model Outside of SmartWorks Analytics section below for details on how the JSON should be formatted.

  4. Register your model by clicking on the Register Model button in the MLflow UI.

  5. Once all of the above steps are complete, your model will be visible in the Model Registry of the MLOps Node with all of the appropriate information.

Training a PySpark Model Outside of SmartWorks Analytics

You can also bring in your own PySpark models that you’ve trained outside the platform to leverage SmartWorks Analytics' Model Registry and Model Serving features.

Steps:

  1. Train your PySpark model outside of the SmartWorks platform and then save it.

  2. Package the metadata and stages folders of your model into a single zip file.

  3. Upload this zip file to the SmartWorks Analytics library.

  4. In the MLOps Node, click the + Register New Model button.

  5. Set the Model Type to PYSPARK.

  6. Browse and select your zip file to set the Model File Location.

  7. Fill out the rest of the settings for your model and then click the Save button to register it.
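Step 2 above could be scripted roughly as follows, using only the Python standard library. The function name zip_pyspark_model is hypothetical, and the sketch assumes that the metadata and stages folders should sit at the root of the archive, which this page does not state explicitly.

```python
import os
import zipfile


def zip_pyspark_model(model_dir, zip_path):
    """Zip the metadata and stages folders of a saved PySpark model.

    Assumption: both folders are placed at the root of the archive.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for folder in ("metadata", "stages"):
            folder_path = os.path.join(model_dir, folder)
            if not os.path.isdir(folder_path):
                raise FileNotFoundError("Missing folder: " + folder_path)
            for root, _dirs, files in os.walk(folder_path):
                for name in files:
                    full_path = os.path.join(root, name)
                    # Store paths relative to the model folder,
                    # for example metadata/part-00000
                    arcname = os.path.relpath(full_path, model_dir)
                    zf.write(full_path, arcname)
    return zip_path
```

Raising an error when either folder is missing catches the case where the path points at the wrong directory before anything is uploaded.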

Registering TensorFlow Models

TensorFlow models can be registered in several ways, depending on the API that will be used to reload the model (i.e., tf or tf.keras) and the save format of the model (i.e., H5 or SavedModel).

With the tf.keras API and H5 Format

Steps:

  1. Upload your .h5 model file to the library.

  2. Select tf.keras from the API dropdown.

  3. Select h5 from the Model Format dropdown.

  4. Select your .h5 model file from the Model File Location browser.

When you deploy a model with this configuration through the MLOps application, it will be loaded using the tf.keras.models.load_model() function from TensorFlow.

With the tf.keras API and SavedModel Format

Steps:

  1. On your local file system, navigate to your SavedModel model’s folder.

  2. Select all of the files in the folder and create a .zip file from them.

  3. Upload the .zip file to the library.

  4. Select tf.keras from the API dropdown.

  5. Select SavedModel from the Model Format dropdown.

  6. Select your .zip file from the Model File Location browser.

When you deploy a model with this configuration through the MLOps application, it will be loaded using the tf.keras.models.load_model() function from TensorFlow.
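The zip step for a SavedModel folder could be scripted as in the sketch below, which uses shutil.make_archive from the Python standard library. The function name zip_saved_model is hypothetical. Archiving the contents of the folder (rather than the folder itself) mirrors the instruction to select all of the files in the folder, so that the files end up at the root of the archive.

```python
import shutil


def zip_saved_model(saved_model_dir, zip_base_name):
    """Zip the contents of a SavedModel folder and return the archive path.

    zip_base_name is the output path without the .zip extension. It should
    point outside saved_model_dir so that the archive does not include itself.
    """
    return shutil.make_archive(zip_base_name, "zip", root_dir=saved_model_dir)
```

For example, zip_saved_model("my_model", "uploads/my_model") would produce uploads/my_model.zip, assuming the uploads folder already exists.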

With the tf API and SavedModel Format

Steps:

  1. On your local file system, navigate to your SavedModel model’s folder.

  2. Select all of the files in the folder and create a .zip file from them.

  3. Upload the .zip file to the library.

  4. Select tf from the API dropdown.

  5. Select SavedModel from the Model Format dropdown.

  6. Provide the tag or sequence of comma-separated tags identifying the MetaGraph to load with the model using the MetaGraph Tags input. For example: init,serve.

  7. Provide the string for the SignatureDef Key that you wish to use for the model. For example: serving_default.

  8. Select your .zip file from the Model File Location browser.

When you deploy a model with this configuration through the MLOps application, it will be loaded using the tf.saved_model.load() function from TensorFlow, where your MetaGraph Tags will be passed to the tags parameter.

Supported Versions for TensorFlow

  • 2.5.2

  • 2.6.2

Registering Models from MLflow UI

Models that were created by the AutoML node will appear in the MLflow UI. You can register them from the MLflow UI so that they show up in the MLOps Node’s Model Registry.

When a model is successfully registered from the MLflow UI, it appears on the Model Registry page.

This newly registered model will also appear in the MLflow UI in the Models and Experiments sections.

You can modify the tags of a model via a Code node.

NOTE: If you wish to eventually deploy your model, the packageVersion that you tag for the model must be supported for that MODEL_TYPE in MLOps.

from mlflow.tracking import MlflowClient

MODEL_NAME = "workflow-iris-automl-model"
MODEL_VERSION = "2"
CREATED_BY = "George"
MODEL_TYPE = "SKLEARN"
SKLEARN_VERSION = "0.23.2"

# Tags to apply to the registered model version
TAGS = {"type": MODEL_TYPE, "createdBy": CREATED_BY, "packageVersion": SKLEARN_VERSION}

client = MlflowClient()
for k, v in TAGS.items():
    # Set tag for the specific model version
    client.set_model_version_tag(name=MODEL_NAME, version=MODEL_VERSION, key=k, value=v)