Challenge 5: Make it work and make it scale
Introduction
Having a model is only the first step; we can now use that model to make predictions. This is typically called inferencing (or scoring) and can be done:
- In an online fashion with an HTTP endpoint that can generate predictions for incoming data in real-time,
- Or in batch by running the model on a large set of files or a database table.
From this challenge onwards you’ll have the option to either do online inferencing or batch inferencing. Please choose your path:
Online Inferencing
Description
So, you’ve chosen online inferencing. In order to use the model to serve predictions in an online fashion it has to be deployed to an endpoint. Luckily Vertex AI has exactly what we need: Vertex AI Endpoints provide a managed service for serving predictions.
Create a new Vertex AI Endpoint and deploy the freshly trained model. Use the smallest instance size but make sure that it can scale to more than 1 instance.
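A minimal sketch of what the deployment could look like with the Vertex AI Python SDK is shown below. The project ID, region, model display name, and machine type are placeholders, not values from this challenge; adjust them to your own environment (the same deployment can also be done from the Cloud Console).

```python
from google.cloud import aiplatform

# Placeholders - replace with your own project, region and model name.
aiplatform.init(project="your-project-id", location="us-central1")

# Look up the model trained in the previous challenge by its display name.
model = aiplatform.Model.list(filter='display_name="my-trained-model"')[0]

# Deploy to a new endpoint on a small machine type, allowing it to
# scale out to more than one instance under load.
endpoint = model.deploy(
    deployed_model_display_name="my-trained-model",
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=3,
)

print(endpoint.resource_name)
```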
Note
The deployment of the model will take ~10 minutes to complete.
Warning
Note that the Qwiklab environment we’re using has a quota on endpoint throughput (30K requests per minute); do not exceed it.
Success Criteria
- The model has been deployed to an endpoint and can serve requests
- Show that the Endpoint has scaled to more than 1 instance under load
- No code change is needed for this challenge
Tips
- To generate load you can use any tool you want; the easiest approach is to install apache-bench on Cloud Shell or in your notebook environment. A Python-based alternative is sketched below.
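If you’d rather stay in Python than use apache-bench, a rough load-generation sketch using the SDK could look like the following. The endpoint ID and the instance payload are assumptions that depend on your model’s input schema; tune the request count and concurrency so you stay well below the 30K requests-per-minute quota.

```python
from concurrent.futures import ThreadPoolExecutor

from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# Placeholder endpoint ID and instance payload - adjust to your own
# endpoint and to the feature schema your model expects.
endpoint = aiplatform.Endpoint("1234567890")
instance = {"feature_a": 1.0, "feature_b": "some-category"}

def send_request(_):
    # Each call sends a single online prediction request.
    return endpoint.predict(instances=[instance])

# Fire a burst of concurrent requests to push the endpoint above
# the autoscaling threshold.
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(send_request, range(2000)))

print(f"Sent {len(results)} requests")
```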
Learning Resources
- Documentation on Vertex AI Endpoints
- More info on the request data format
Batch Inferencing
Description
So, you’ve chosen the batch inferencing path. We’re going to use Vertex AI Batch Predictions to get predictions for data in a BigQuery table. First, go ahead and create a new table with at most 10K rows that’s going to be used for generating the predictions. Once the table is created, create a new Batch Prediction job with that table as the input and another BigQuery table as the output, using the previously created model. Choose a small machine type and 2 compute nodes. Don’t turn on Model Monitoring yet as that’s for the next challenge.
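As a sketch of the first step, the input table could be created with the BigQuery client library. The dataset, table, and source query below are hypothetical; adapt them to the data your model was trained on, keeping the same input columns.

```python
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")

# Hypothetical dataset/table names and source query - adapt to your
# own data, keeping the exact input columns the model expects.
query = """
CREATE OR REPLACE TABLE `your-project-id.inference_data.input_table` AS
SELECT feature_a, feature_b, feature_c
FROM `your-project-id.training_data.source_table`
LIMIT 10000
"""

client.query(query).result()  # wait for the table creation to finish
```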
Note
The batch inferencing job will take roughly 25 minutes; most of that is the overhead of starting the cluster, so increasing the number of instances won’t help with the small table we’re using.
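A minimal sketch for submitting the Batch Prediction job via the Vertex AI Python SDK might look like the following; the table names and model display name are placeholders, and the same job can also be created from the Cloud Console.

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# Look up the previously trained model (placeholder display name).
model = aiplatform.Model.list(filter='display_name="my-trained-model"')[0]

# Run batch predictions on the BigQuery input table and write the
# results to a new table under the given output dataset.
batch_job = model.batch_predict(
    job_display_name="batch-inference-job",
    instances_format="bigquery",
    predictions_format="bigquery",
    bigquery_source="bq://your-project-id.inference_data.input_table",
    bigquery_destination_prefix="bq://your-project-id.inference_data",
    machine_type="n1-standard-2",
    starting_replica_count=2,
    max_replica_count=2,
    sync=True,  # block until the job completes (~25 minutes)
)

print(batch_job.output_info)
```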
Success Criteria
- There’s a properly structured input table in BigQuery with 10K rows
- There’s a successful Batch Prediction job
- There are predictions in a new BigQuery table
Tips
- The pipeline that we’ve used in the previous challenge contains a task to prepare the data using BigQuery; have a look at that for inspiration
- Make sure that the input table has the exact same number of input columns as required by the model.
Learning Resources
- Creating BigQuery datasets
- Creating BigQuery tables
- BigQuery public datasets
- Vertex AI Batch Predictions