Challenge 5: Make it work and make it scale
Introduction
Having a model is only the first step; we can now use that model to make predictions. This is typically called inferencing (or scoring) and can be done:
- In an online fashion with an HTTP endpoint that can generate predictions for incoming data in real-time,
- Or in batch by running the model on a large set of files or a database table.
From this challenge onwards you’ll have the option to either do online inferencing or batch inferencing. Please choose your path:
Online Inferencing
Description
So, you’ve chosen online inferencing. In order to use the model to serve predictions in an online fashion it has to be deployed to an endpoint. Luckily Vertex AI has exactly what we need: Vertex AI Endpoints provide a managed service for serving predictions.
Create a new Vertex AI Endpoint and deploy the freshly trained model. Use the smallest instance size but make sure that it can scale to more than 1 instance.
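A minimal sketch of what the deployment could look like with the Vertex AI Python SDK is shown below. The project ID, region, model display name, and machine type are placeholders, not values from this challenge; adjust them to your own environment (the same deployment can also be done from the Cloud Console).

```python
from google.cloud import aiplatform

# Placeholders - replace with your own project, region and model name.
aiplatform.init(project="your-project-id", location="us-central1")

# Look up the model trained in the previous challenge by its display name.
model = aiplatform.Model.list(filter='display_name="my-trained-model"')[0]

# Deploy to a new endpoint on a small machine type, allowing it to
# scale out to more than one instance under load.
endpoint = model.deploy(
    deployed_model_display_name="my-trained-model",
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=3,
)

print(endpoint.resource_name)
```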
Note
The deployment of the model will take ~10 minutes to complete.
Warning
Note that the Qwiklab environment we’re using has a quota on endpoint throughput (30K requests per minute); do not exceed it.
Success Criteria
- The model has been deployed to an endpoint and can serve requests
- Show that the Endpoint has scaled to more than 1 instance under load
- No code change is needed for this challenge
Tips
- To generate load you can use any tool you want; the easiest approach is to install apache-bench on Cloud Shell or in your notebook environment. A Python-based alternative is sketched below.
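If you’d rather stay in Python than use apache-bench, a rough load-generation sketch using the SDK could look like the following. The endpoint ID and the instance payload are assumptions that depend on your model’s input schema; tune the request count and concurrency so you stay well below the 30K requests-per-minute quota.

```python
from concurrent.futures import ThreadPoolExecutor

from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# Placeholder endpoint ID and instance payload - adjust to your own
# endpoint and to the feature schema your model expects.
endpoint = aiplatform.Endpoint("1234567890")
instance = {"feature_a": 1.0, "feature_b": "some-category"}

def send_request(_):
    # Each call sends a single online prediction request.
    return endpoint.predict(instances=[instance])

# Fire a burst of concurrent requests to push the endpoint above
# the autoscaling threshold.
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(send_request, range(2000)))

print(f"Sent {len(results)} requests")
```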
Learning Resources
- Documentation on Vertex AI Endpoints
- More info on the request data format
Batch Inferencing
Description
So, you’ve chosen the batch inferencing path. We’re going to use Vertex AI Batch Predictions to get predictions for data in a BigQuery table. First, go ahead and create a new table with at most 10K rows that’s going to be used for generating the predictions. Once the table is created, create a new Batch Prediction job with that table as the input and another BigQuery table as the output, using the previously created model. Choose a small machine type and 2 compute nodes. Don’t turn on Model Monitoring yet as that’s for the next challenge.
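As a sketch of the first step, the input table could be created with the BigQuery client library. The dataset, table, and source query below are hypothetical; adapt them to the data your model was trained on, keeping the same input columns.

```python
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")

# Hypothetical dataset/table names and source query - adapt to your
# own data, keeping the exact input columns the model expects.
query = """
CREATE OR REPLACE TABLE `your-project-id.inference_data.input_table` AS
SELECT feature_a, feature_b, feature_c
FROM `your-project-id.training_data.source_table`
LIMIT 10000
"""

client.query(query).result()  # wait for the table creation to finish
```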
Note
The batch inferencing job will take roughly 25 minutes; most of that is the overhead of starting the cluster, so increasing the number of instances won’t help with the small table we’re using.
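A minimal sketch for submitting the Batch Prediction job via the Vertex AI Python SDK might look like the following; the table names and model display name are placeholders, and the same job can also be created from the Cloud Console.

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# Look up the previously trained model (placeholder display name).
model = aiplatform.Model.list(filter='display_name="my-trained-model"')[0]

# Run batch predictions on the BigQuery input table and write the
# results to a new table under the given output dataset.
batch_job = model.batch_predict(
    job_display_name="batch-inference-job",
    instances_format="bigquery",
    predictions_format="bigquery",
    bigquery_source="bq://your-project-id.inference_data.input_table",
    bigquery_destination_prefix="bq://your-project-id.inference_data",
    machine_type="n1-standard-2",
    starting_replica_count=2,
    max_replica_count=2,
    sync=True,  # block until the job completes (~25 minutes)
)

print(batch_job.output_info)
```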
Success Criteria
- There’s a properly structured input table in BigQuery with 10K rows
- There’s a successful Batch Prediction job
- There are predictions in a new BigQuery table
Tips
- The pipeline that we’ve used in the previous challenge contains a task to prepare the data using BigQuery; have a look at that for inspiration
- Make sure that the input table has the exact same number of input columns as required by the model.
Learning Resources
- Creating BigQuery datasets
- Creating BigQuery tables
- BigQuery public datasets
- Vertex AI Batch Predictions