Altair SmartWorks Analytics

 

Deployments

General

PySpark models can be deployed just like any other model that has been registered in the Model Registry.

Steps:

  1. Navigate to the Deployments section of the MLOps Node

  2. When configuring your models, select your PySpark model

  3. After filling out the rest of the settings, deploy it

Once the model is deployed, you can send requests to it through the REST API. Here is an example of how to do so via curl from the command line:

 

curl -X POST http://localhost:5000/api/v1.0/predictions -H 'Content-Type: application/json' \
    -d '{ "data": { "ndarray": [[5.1,3.5,1.4,0.2], [7.7,3.0,6.1,2.3]], "names": ["sepal_length", "sepal_width", "petal_length", "petal_width"] }, "meta": {"method": "predict_proba"} }'
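The same request can be sketched in Python using only the standard library. The URL, payload layout, and column names are copied from the curl example above; the helper name build_prediction_request is hypothetical and not part of the product. The request is only built here, not sent, so the sketch runs without a live deployment.

```python
import json
import urllib.request

# Assumed endpoint, copied from the curl example; adjust host and port for your deployment.
URL = "http://localhost:5000/api/v1.0/predictions"


def build_prediction_request(rows, names, method="predict", url=URL):
    """Build (but do not send) a POST request shaped like the curl example."""
    payload = {
        "data": {"ndarray": rows, "names": names},
        "meta": {"method": method},
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_prediction_request(
    rows=[[5.1, 3.5, 1.4, 0.2], [7.7, 3.0, 6.1, 2.3]],
    names=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    method="predict_proba",
)

# To send it against a running deployment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```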

 

 

Preprocessing

All data is first converted into a pandas DataFrame with column names attached, then converted to a Spark DataFrame, and finally processed by the model. If the request includes a list of names, those names are applied to the columns of the DataFrame. For PySpark models specifically, the list of names is required.

 

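df = pd.DataFrame(X, columns=names)  # X: rows from the request; names: column names from the request

data = spark_session.createDataFrame(df)  # pandas DataFrame -> Spark DataFrame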
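Because PySpark models require the names list, it can help to check the payload on the client side before sending it. The following is a minimal sketch of such a check; check_names is a hypothetical helper, not part of the product, and the deployed model may validate its input differently.

```python
def check_names(rows, names):
    """Raise ValueError unless `names` has one entry per column of `rows`."""
    if not names:
        raise ValueError("PySpark models require a list of column names")
    for i, row in enumerate(rows):
        if len(row) != len(names):
            raise ValueError(
                f"row {i} has {len(row)} values but {len(names)} names were given"
            )
    return True


rows = [[5.1, 3.5, 1.4, 0.2], [7.7, 3.0, 6.1, 2.3]]
names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
ok = check_names(rows, names)
```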

 

 

Whether the model returns the final predicted labels or the probability of each class is determined by the meta passed in with the request. The meta dictionary is checked for the method key:

If the value of method is predict, the final predicted labels are returned.

If the value of method is predict_proba, the probability of each class is returned.

If meta is missing or does not contain a string value for method, the default method, predict, is used.

 

method = self.default_method # The default is "predict"

if meta and isinstance(meta.get("method", 0), str):

    method = meta["method"]
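The selection logic above can be tried in isolation. This sketch wraps the same check in a standalone function; resolve_method is a hypothetical name, and the default of "predict" is taken from the comment in the snippet above.

```python
def resolve_method(meta, default_method="predict"):
    """Return meta["method"] when it is a string, otherwise the default."""
    method = default_method
    if meta and isinstance(meta.get("method", 0), str):
        method = meta["method"]
    return method


no_meta = resolve_method(None)
explicit = resolve_method({"method": "predict_proba"})
non_string = resolve_method({"method": 1})
```

A missing meta, an empty meta, or a non-string method value all fall back to the default.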

 

 

Predictions

Depending on the chosen method, the result is read from either the probability column or the prediction column of the transformed DataFrame. In either case, the final result is reshaped and returned as a 2D numpy array.

 

preds = self.model.transform(data)

if method == "predict_proba":

    res = [x.probability for x in preds.select("probability").collect()]

else:

    res = [x.prediction for x in preds.select("prediction").collect()]

res_np = np.asarray(res)
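The reshaping step is not shown in the snippet above. As a rough sketch only, one way to guarantee a 2D array is to turn a flat list of labels into a single column; to_2d is a hypothetical helper, and the column orientation is an assumption, not confirmed product behavior.

```python
import numpy as np


def to_2d(res):
    """Convert a list of results to a 2D array (assumed layout: one row per input)."""
    arr = np.asarray(res)
    if arr.ndim == 1:
        # Flat list of labels -> one column, one row per input record
        arr = arr.reshape(-1, 1)
    return arr


labels = to_2d([0.0, 2.0])
probs = to_2d([[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]])
```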