Altair SmartWorks Analytics


Basic Serving of Knowledge Studio Models

This topic describes how Knowledge Studio models are handled inside the Docker containers deployed on Seldon Core.

Preprocessing

All input data are converted into a pandas DataFrame before being processed by the model. For Knowledge Studio models specifically, the user must also pass in the list of column names (names), which is used to label the DataFrame columns.

Additionally, every column whose values can all be parsed as numbers is converted to a numeric dtype so that Python models can process it; a column that contains any non-numeric value is left unchanged.

Both steps are performed by the code below.

import pandas as pd
X = pd.DataFrame(X, columns=names).apply(pd.to_numeric, errors='ignore')
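
The sketch below reproduces this preprocessing on a small, made-up payload. The helper names preprocess and to_numeric_or_keep, and the sample data, are hypothetical. Instead of passing errors='ignore' (deprecated since pandas 2.2), it catches parsing errors explicitly, which has the same effect: numeric strings become numbers, and a column with any non-numeric value is returned unchanged.

```python
import pandas as pd


def preprocess(X, names):
    """Hypothetical helper: build a DataFrame and convert numeric columns.

    Mirrors pd.DataFrame(X, columns=names).apply(pd.to_numeric, errors='ignore')
    without relying on the deprecated errors='ignore' option.
    """

    def to_numeric_or_keep(col):
        try:
            return pd.to_numeric(col)
        except (ValueError, TypeError):
            # The column holds at least one non-numeric value: keep it as is
            return col

    return pd.DataFrame(X, columns=names).apply(to_numeric_or_keep)


# Hypothetical request payload in which every value arrives as a string
X = [["1.5", "red", "10"], ["2.0", "blue", "20"]]
names = ["height", "color", "count"]
df = preprocess(X, names)
print(df.dtypes)
```

Here height becomes a float column, count becomes an integer column, and color keeps its original string values.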


Predictions

Knowledge Studio single models and ensembles are processed slightly differently.

Single Models

For single-model predictions, the predict method of the model class is executed on each row of the DataFrame:

  • If the model class has an IVs attribute (a list of input column names), the row values are unpacked in the order given by IVs and passed to predict as separate positional arguments.

  • If the model class has no IVs attribute, the row (a pandas Series) is passed to predict as is.

This logic is shown in the code below:

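unpack_input = hasattr(model, "IVs")

if unpack_input:
    # One positional argument per IVs entry, in the order given by IVs
    r = model.predict(*[x[column] for column in model.IVs])
else:
    # No IVs attribute: pass the whole row (a pandas Series) as is
    r = model.predict(x)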

The raw outputs of predict for all rows of the pandas DataFrame are returned to the user in the response as a 2D ndarray.
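
To see both branches end to end, the sketch below runs the per-row logic over a small DataFrame and stacks the outputs into a 2D ndarray. The two model classes, their predict signatures and return values, the predict_rows helper, and the use of DataFrame.iterrows to walk the rows are all assumptions for illustration; they are not the served code or real Knowledge Studio models.

```python
import numpy as np
import pandas as pd


class ModelWithIVs:
    """Hypothetical model with an IVs list; predict takes one argument per IV."""

    IVs = ["count", "height"]

    def predict(self, count, height):
        return [height + count, height * count]


class ModelWithoutIVs:
    """Hypothetical model without IVs; predict receives the whole row."""

    def predict(self, row):
        return [row["height"] + row["count"], row["height"] * row["count"]]


def predict_rows(model, X):
    """Run model.predict on every row of X and stack the outputs into a 2D array."""
    unpack_input = hasattr(model, "IVs")
    results = []
    for _, x in X.iterrows():
        if unpack_input:
            # Arguments follow the order of model.IVs, not the column order of X
            r = model.predict(*[x[column] for column in model.IVs])
        else:
            r = model.predict(x)
        results.append(r)
    return np.array(results)


X = pd.DataFrame({"height": [1.5, 2.0], "count": [10, 20]})
print(predict_rows(ModelWithIVs(), X))
print(predict_rows(ModelWithoutIVs(), X))
```

Because ModelWithIVs lists count before height, its arguments arrive in IVs order rather than in the DataFrame's column order, and both models produce the same 2-by-2 array.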

Ensembles

A Knowledge Studio ensemble is a collection of single-model decision trees, so the results of all of the trees must be combined to produce the final output.
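
In terms of the code in the two bullets that follow, and using notation introduced here only for illustration (w_t for tree.Weight of tree t, v_t for the list returned by tree.predict(row), indexed from 0 as in the code, and K for num_classes), the two combination rules can be written as:

```latex
\begin{aligned}
\hat{p}_c &= \frac{\sum_{t} w_t \, v_{t,\,2+c}}{\sum_{t} w_t}, \qquad c = 0, \dots, K-1 \quad \text{(classification)} \\
\hat{y} &= \frac{\sum_{t} w_t \, v_{t,4} \, v_{t,2}}{\sum_{t} w_t \, v_{t,4}} \quad \text{(regression)}
\end{aligned}
```

The classification rule is a plain weighted mean of the per-tree probabilities; in the regression rule each tree's effective weight is w_t v_{t,4}, so a tree whose values[4] is zero contributes nothing.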

  • Classification ensembles: for each row, the probability of each dependent-variable value is averaged over all trees in the ensemble, with each tree's probabilities weighted by its Weight attribute.

    from operator import add

    # Hypothetical wrapper: the function name and parameters are illustrative.
    # ensemble is an iterable of trees, row is one DataFrame row, and
    # num_classes is the number of dependent-variable values.
    def predict_classification(ensemble, row, num_classes):
        avg_values = [0] * num_classes
        factor = 0
        # Loop through all of the trees and predict
        # Add up the resulting probabilities scaled by the sub-tree weight
        for tree in ensemble:
            weight = tree.Weight
            values = tree.predict(row)
            tree_probas = values[2:2 + num_classes]  # Grab all of the probabilities
            tree_probas_weighted = [tree_probas[i] * weight for i in range(num_classes)]
            avg_values = list(map(add, avg_values, tree_probas_weighted))
            factor += weight
        # Divide by the total weight to get the average probability for each class
        probas = [avg_values[n] / factor for n in range(num_classes)]
        return probas

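  • Regression ensembles: for each row, the prediction is a weighted average of values[2] over all trees in the ensemble, where values is the list returned by the tree's predict and each tree's weight is its Weight attribute multiplied by values[4].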

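    # Hypothetical wrapper: the function name and parameters are illustrative.
    # ensemble is an iterable of trees and row is one DataFrame row.
    def predict_regression(ensemble, row):
        factor = 0
        avg_prediction = 0
        # Loop through all of the trees and predict
        # Add up the resulting predictions scaled by Weight * values[4]
        for tree in ensemble:
            values = tree.predict(row)
            avg_prediction += tree.Weight * values[2] * values[4]
            factor += tree.Weight * values[4]
        prediction = avg_prediction / factor
        return prediction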
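
The two ensemble combination rules above can be checked by hand with a small, self-contained sketch. StubTree is a hypothetical stand-in whose predict ignores the row and returns a fixed list; the helper names combine_classification and combine_regression are also illustrative, and they restate the weighted averages compactly rather than reproducing the served code line for line. Unused list positions are filled with None because the document does not say what they hold.

```python
class StubTree:
    """Hypothetical stand-in for a Knowledge Studio tree.

    predict() ignores the row and returns a fixed output list, so the
    arithmetic of the combination rules can be checked by hand.
    """

    def __init__(self, weight, outputs):
        self.Weight = weight
        self._outputs = outputs

    def predict(self, row):
        return self._outputs


def combine_classification(ensemble, row, num_classes):
    # Weighted average of values[2:2 + num_classes], weighted by tree.Weight
    totals = [0.0] * num_classes
    factor = 0.0
    for tree in ensemble:
        probas = tree.predict(row)[2:2 + num_classes]
        for i in range(num_classes):
            totals[i] += probas[i] * tree.Weight
        factor += tree.Weight
    return [t / factor for t in totals]


def combine_regression(ensemble, row):
    # Weighted average of values[2], each tree weighted by Weight * values[4]
    numerator = 0.0
    factor = 0.0
    for tree in ensemble:
        values = tree.predict(row)
        numerator += tree.Weight * values[2] * values[4]
        factor += tree.Weight * values[4]
    return numerator / factor


# Two-class example: positions 0 and 1 are unused placeholders
clf_trees = [StubTree(1, [None, None, 0.75, 0.25]),
             StubTree(3, [None, None, 0.25, 0.75])]
print(combine_classification(clf_trees, row=None, num_classes=2))  # prints [0.375, 0.625]

# Regression example: positions 0, 1 and 3 are unused placeholders
reg_trees = [StubTree(1, [None, None, 10.0, None, 2]),
             StubTree(2, [None, None, 4.0, None, 1])]
print(combine_regression(reg_trees, row=None))  # prints 7.0
```

For the classification trees, (1 × 0.75 + 3 × 0.25) / 4 = 0.375 and (1 × 0.25 + 3 × 0.75) / 4 = 0.625. For the regression trees, (1 × 10 × 2 + 2 × 4 × 1) / (1 × 2 + 2 × 1) = 28 / 4 = 7.0. In both rules the total weight must be nonzero, or the division fails.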