Basic Serving of Knowledge Studio Models
This topic describes how Knowledge Studio models are handled inside the Docker containers deployed on Seldon Core.
Preprocessing
All data are converted into a pandas DataFrame before being processed by the model. For Knowledge Studio models specifically, the user must pass in the list of column names (names).
Additionally, all numbers are converted to numeric datatypes so that the Python models can process them.
Both steps are shown in the single line of code below.
X = pd.DataFrame(X, columns=names).apply(pd.to_numeric, errors='ignore')
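As a rough, self-contained illustration of this preprocessing, the sketch below runs the same line on a made-up payload; the values and column names here are only for the example and are not part of the serving code.

import pandas as pd

# Hypothetical raw payload and column names, roughly as Seldon Core would pass them in.
X = [["3.5", "1", "red"], ["2.0", "4", "blue"]]
names = ["width", "count", "color"]

# Build the DataFrame and convert every column that parses cleanly as numbers;
# with errors='ignore', non-numeric columns such as "color" are left unchanged.
X = pd.DataFrame(X, columns=names).apply(pd.to_numeric, errors='ignore')

print(X.dtypes)  # width and count become numeric, color stays as object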
Predictions
With Knowledge Studio models, Single Models are processed slightly differently from Ensembles.
Single Models
For single model predictions, the predict method of the model class is executed on each DataFrame row.
- If the model class contains the IVs attribute list, the input values are unpacked from the DataFrame row in the order given by IVs.
- If the model class does not contain the IVs attribute list, the predict function is executed on the pandas row as is.
This functionality is illustrated in the code below:
unpack_input = hasattr(model, "IVs")
if unpack_input:
    r = model.predict(*[x[column] for column in model.IVs])
else:
    r = model.predict(x)
The raw output from calling predict on every row of the DataFrame is returned to the user in the response as a 2D ndarray.
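As a rough sketch of how this per-row loop could be put together, assuming a model object with an optional IVs attribute and a preprocessed DataFrame X (names and structure here are placeholders; the actual container code may differ):

import numpy as np

def predict_single_model(model, X):
    # Run model.predict on every row of the DataFrame X and collect the results.
    # If the model exposes an IVs attribute, each row is unpacked into positional
    # arguments in the order given by IVs; otherwise the row is passed as-is.
    unpack_input = hasattr(model, "IVs")
    results = []
    for _, x in X.iterrows():
        if unpack_input:
            r = model.predict(*[x[column] for column in model.IVs])
        else:
            r = model.predict(x)
        results.append(r)
    # Wrap the raw per-row predictions into a 2D ndarray for the response.
    return np.asarray(results).reshape(len(results), -1)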
Ensembles
Ensembles from Knowledge Studio are collections of Single Model decision trees, so the results from all of the trees must be combined to produce the final output.
- Classification Ensembles – For each row, average the probabilities for the dependent variable values over all trees in the ensemble, weighted by the tree weights (first code block below).
- Regression Ensembles – For each row, compute a weighted average of the prediction values over all trees in the ensemble (second code block below).
# Classification ensemble: weighted average of the per-class probabilities across all trees
avg_values = [0] * num_classes
probas = []
factor = 0
# Loop through all of the trees and predict
# Add up the resulting probabilities scaled based on the sub-tree weight
for tree in ensemble:
    weight = tree.Weight
    values = tree.predict(row)
    tree_probas = values[2:2+num_classes]  # Grab all of the probabilities
    tree_probas_weighted = [tree_probas[i] * weight for i in range(num_classes)]
    avg_values = list(map(add, avg_values, tree_probas_weighted))
    factor += weight
# Get the average probability for each class across all of the sub-trees
for n in range(num_classes):
    avg_values[n] /= factor
    probas.append(avg_values[n])
return probas
# Regression ensemble: weighted average of the per-tree predictions
factor = 0
avg_prediction = 0
# Loop through all of the trees and predict
# Add up the resulting predictions scaled based on the sub-tree weight
for tree in ensemble:
    values = tree.predict(row)
    avg_prediction += tree.Weight * values[2] * values[4]
    factor += tree.Weight * values[4]
prediction = avg_prediction / factor
return prediction
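To make the weighted averaging concrete, here is a small, self-contained numeric sketch of the classification case with two hypothetical trees; the weights and per-class probabilities are made up, and tree.predict and the values layout from the blocks above are left out.

from operator import add

# Two made-up trees: per-class probabilities and the corresponding tree weights.
tree_probas = [[0.8, 0.2], [0.5, 0.5]]
tree_weights = [2.0, 1.0]

num_classes = 2
avg_values = [0.0] * num_classes
factor = 0.0
for probas, weight in zip(tree_probas, tree_weights):
    weighted = [p * weight for p in probas]
    avg_values = list(map(add, avg_values, weighted))
    factor += weight

# Normalize by the total weight:
# [(2.0*0.8 + 1.0*0.5) / 3.0, (2.0*0.2 + 1.0*0.5) / 3.0], i.e. approximately [0.7, 0.3]
print([v / factor for v in avg_values])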