Altair SmartWorks Analytics

 

Deployments

The Deployments page is the second of the two tabs that display when the MLOps node is opened; it shows a list of the models that have been deployed.

From this page, users can deploy models to obtain an endpoint to which they can send API requests and receive prediction responses.

 

The main page includes a Deployments tab with a table containing the following columns:

  • Deployment Name – the name of the deployment

  • Created by – the user who created the deployment

  • Last Updated – the date and time when the deployment was last updated

  • Status – the current status of the deployment, as explained below:

    • LIVE – The deployment is up and running. The user can send requests to the model endpoint. If the requests are formatted correctly, then they should receive a response.

    • DEPLOYING – The deployment is being made for the first time.

    • UPDATING – The deployment is being updated. This will happen if the deployment was previously in the LIVE state and the user changes some settings, such as the model, resource allocation, number of pods, etc. While the deployment is in this state, the user can send requests to the model endpoint. If the requests are formatted correctly, then they should receive a response.

    • FAILED – Kubernetes failed to create the underlying resources (deployments, services, and pods) for the deployment.

NOTE:

Deployments may sometimes hang in the pending state on Kubernetes. This can happen in situations where, for example, there are not enough resources on the cluster for the requested pods. In this case, the deployment state will display as either DEPLOYING or UPDATING, depending on what was happening at the time.
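
If a deployment appears stuck, you can inspect its pods directly on the Kubernetes cluster (assuming you have kubectl access): running kubectl describe pod <pod_name> -n <your_namespace> shows the pod's scheduling events, where problems such as insufficient CPU or memory typically appear.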

 

You can sort any of these columns by clicking the sort buttons located to the right of each column header. Typing the first few characters of a deployment name into the Search box at the upper right-hand corner of the page displays all deployments matching this string. You can view different deployment profiles by selecting the appropriate profile from the drop-down list beside the Search bar.

You can also add a new deployment by clicking the Add Deployment button.

Adding a New Deployment

Steps:

  1. Click on the Add Deployment button in the Deployments page.

  2. The Add New Deployment page displays.

     

  3. Provide the following properties:

    • Deployment Profile – the Internal Connection for the Seldon cluster and namespace that you wish to deploy your model(s) to. If you select a Deployment Profile that has the Has GPU Nodes checkbox checked, an additional Use GPU checkbox displays on the Deployments page. Checking it adds the Additional GPU Configurations to the deployment manifest, allowing you to deploy your model endpoints to specific GPU nodes.

    • Deployment Name – the name assigned to the deployment. This name displays in the Deployments page.

    • Deployment Type – one of three possible deployment types:

      • Regular – a single model receiving all of the traffic

      • A/B/n Testing – multiple models with a user-specified traffic distribution (the traffic percentages across all models must add up to 100%; for example, 70%/30% across two models)

      • Champion/Challenger – two models, each receiving exactly the same traffic, but only the champion sends its response back to the requester

    • Model Repository – the MLflow Internal Connection under which the model is registered

    • Feedback Loop Database – an SQL database (configured as an Internal Connection) in which to store all requests, responses, and model and deployment metadata

    • Model Name – the model to deploy; the model must already have been added to the Model Registry

    • Model Version – the version of the selected model to use

    • Scaling Properties – one of two scaling types:

      • Autoscaling – sets the minimum and maximum pod counts for the Kubernetes Horizontal Pod Autoscaler, along with the target average % utilization of CPU and RAM

      • Manual – sets a fixed number of pods and the Kubernetes resource requests for CPU and RAM

    When the scaling type has been selected, specify the resources to provide for the deployment:

    • CPU

    • RAM

    • Number of Pods (Manual scaling)

    • CPU Target % (Autoscaling)

    • RAM Target % (Autoscaling)

    • Min Pods Count (Autoscaling)

    • Max Pods Count (Autoscaling)

    CPU and RAM inputs must use the Kubernetes resource quantity format; for example:

    • CPU = 100m

    • RAM = 32Mi

    Notes:

    • The CPU and RAM settings correspond to the Kubernetes Requests settings. The Limits are automatically set to the Requests + 50% (for example, a CPU request of 100m yields a CPU limit of 150m).

    • When setting your desired deployment resources, be sure to set the RAM high enough to allow your model to load into memory. We recommend a minimum of 100Mi (larger models, such as TensorFlow models, will need more). Allocating fewer resources than this will likely result in a memory crash, and your deployment will never reach the LIVE state.

  4. Click Deploy when you are finished.

The new deployment displays in the Deployments page.

 

Viewing a Deployment

The properties of a deployment display when you click on a deployment from the Deployments page.

 

Editing a Deployment

When a specific deployment is selected in the Deployments page, several buttons display at the top of the page.

 

You can use these buttons to open, edit, or remove (delete) the selected deployment. Selecting Edit displays the deployment properties page, where you can, for example, change the model to use, the deployment type, the resource allocation/scaling type, or the deployment profile.

Inspecting Seldon Deployments

When you make a deployment through SmartWorks Analytics, a Seldon deployment is created. The names of all of your Seldon deployments can be retrieved via the command kubectl get sdep -n <your_namespace>.

The pods of your deployment, by default, will be named according to the following schema: <deployment_name>-<model_name>-<model_version>-<podspec_idx>-<container_name>. However, if the length of this name is greater than 63 characters (the max length for DNS compliance), then the pod names will be changed to the schema seldon-<md5_hash_of_original_name>.
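
As a rough illustration of the hashed schema, the name can be reproduced in Python. Note that this is a sketch: it assumes the md5 digest is computed over the original name string exactly as written, which is an assumption about Seldon's internals rather than documented behavior.

import hashlib

# Assumed reconstruction of the hashed pod-name schema described above.
# The original name follows <deployment_name>-<model_name>-<model_version>-<podspec_idx>-<container_name>.
original_name = "my-very-long-deployment-my-very-long-model-name-3-0-classifier"
hashed_name = "seldon-" + hashlib.md5(original_name.encode("utf-8")).hexdigest()
print(hashed_name)  # prints "seldon-" followed by a 32-character hex digest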

To discover which pods match which deployments when the pod names have been md5 hashed, you can inspect the result of the following command: kubectl get sdep <deployment_name> -o yaml -n <your_namespace>.

Request-Response Formats

Authentication

Before sending any requests to a deployed model endpoint, you must retrieve an access token from Keycloak. You can do so by executing the following script:

 

curl --data "username=UserAdmin&password=MyPassword&grant_type=password&client_id=istio-gateway" \
     https://smartworks.altair.com/auth/realms/smartworks/protocol/openid-connect/token

 

 

Where, in the above code:

  • "UserAdmin" is the Keycloak username

  • "MyPassword" is the Keycloak password

  • "istio-gateway" is the name of a Keycloak client that has been configured for authentication on the Seldon cluster using an AuthorizationPolicy

Example using Python:

 

import requests

# Keycloak token endpoint for the SmartWorks Analytics realm
AUTH_URL = "https://smartworksanalytics.altair.com/auth/realms/smartworksanalytics/protocol/openid-connect/token"
AUTH_DATA = {
    "username": "UserAdmin",
    "password": "MyPassword",
    "grant_type": "password",
    "client_id": "istio-gateway"
}

# Request an access token and extract it from the JSON response
r = requests.post(url=AUTH_URL, data=AUTH_DATA)
token = r.json()["access_token"]

 

 

You can then store the token in an environment variable called TOKEN and follow the rest of the instructions below for sending requests.
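
For example, continuing the Python snippet above, you can set the variable within the same process:

import os

# Make the retrieved token available to the request examples below,
# which read it from the TOKEN environment variable.
os.environ["TOKEN"] = token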

Requests

The data required for sending requests varies slightly depending on the type of model you have deployed; see the 'Basic Serving for X' subsections for the specific details. Generally speaking, however, the request payload is composed as follows:

  • data OR jsonData – Required. This is your input data. Use jsonData for TensorFlow models and data for all other types of models.

  • names – Required only for Sklearn (pipelines), KStudio, and PySpark models. This is a list of column names to apply to the input data prior to passing it to the model.

  • meta – Optional. This is a dictionary of model type-specific runtime options (see the sketch after this list).
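
For instance, a payload combining these fields might look as follows. The meta key shown here ("return_probabilities") is purely illustrative, not a documented option name; the available options depend on the model type.

# Hypothetical request payload: "data" and "names" as described above,
# plus an illustrative "meta" runtime option (not a documented option name).
REQUEST_DATA = {
    "data": {
        "ndarray": [[33, "Male", 50]],
        "names": ["age", "sex", "hours-per-week"]
    },
    "meta": { "return_probabilities": True }
}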

The following example shows a request using curl:

 

curl -X POST https://seldon.smartworks.altair.com/seldon/seldon-ns/seldon-model/api/v1.0/predictions \
     -H "Authorization: Bearer $TOKEN" \
     -H 'Content-Type: application/json' \
     -d '{ "data": { "ndarray": [[33, "Male", 50], [32, "Female", 60]], "names": ["age", "sex", "hours-per-week"] } }'

 

 

The following example shows another request in Python:

 

import json
import os

import requests

DEPLOYMENT_NAME = "my-deployment"
MODEL_ENDPOINT = "https://seldon.smartworks.altair.com/seldon/seldon-ns/{}/api/v1.0/predictions".format(DEPLOYMENT_NAME)

REQUEST_DATA = { "data": { "ndarray": [[33, "Male", 50], [32, "Female", 60]], "names": ["age", "sex", "hours-per-week"] } }
# The access token is read from the TOKEN environment variable set earlier
REQUEST_HEADERS = {"Authorization": "Bearer {}".format(os.getenv("TOKEN")), "Content-Type": "application/json"}

r = requests.post(url=MODEL_ENDPOINT, data=json.dumps(REQUEST_DATA), headers=REQUEST_HEADERS)

print(r)
print(r.text)

 

 

Responses

The response from Seldon always follows the same format:

  • data – the output predicted by the model

  • meta – any metadata or tags returned by the ML model

The example below shows a response. Note that the format of the data may differ slightly depending on the model type.

 

{
    "data": {
        "names": [0, 1, 2],
        "ndarray": [[0.9612952553605776, 0.03840242822212667, 0.0003023164172958162],
                    [0.0019173336702884656, 0.2109311182562371, 0.7871515480734745]]
    },
    "meta": {}
}
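
As a usage sketch, a classification response shaped like the example above can be post-processed in Python as follows (assuming r is the response object from the earlier request example):

# Map each row of class probabilities in data.ndarray to the most likely
# class label from data.names.
response = r.json()
names = response["data"]["names"]
for row in response["data"]["ndarray"]:
    best = max(range(len(row)), key=row.__getitem__)
    print("predicted class:", names[best], "with probability", row[best])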