Challenge 7: Close the loop


Introduction

If you’ve completed all of the previous challenges, you’re now ready to bring it all together. This task is all about automating the whole process, so that when Model Monitoring raises an alert, a new model is trained and deployed.

As in the previous challenges, if you’ve chosen the online inferencing path, continue to Online Loop; otherwise, skip to the Batch Loop section.

Note
For this challenge we’ll keep things simple: we’ll reuse the original training data to retrain, and we won’t do anything if the new model is not better. In the real world you’d retrain on a combination of existing and new data, and take manual action if automatic retraining doesn’t yield better results.

Online Loop

Description

Use the provided build pipeline (clouddeploy.yaml) to create a new build configuration. Make sure that it’s only triggered when a webhook is called. Also provide the necessary variables, such as the model training code version, endpoint name etc. Name this trigger CT-CD (or continous-training-and-delivery).

Configure log-based alerts for Model Monitoring, and use a webhook as the notification channel to trigger the build.
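To check that the trigger reacts to webhook calls before wiring it to an alert, it can help to call the webhook yourself. The following is a minimal sketch, not part of the provided materials. It assumes the usual Cloud Build webhook URL shape (project, trigger name, API key and secret), and every value passed in is a placeholder you’d replace with your own.

```python
import json
import urllib.parse
import urllib.request


def build_webhook_url(project_id: str, trigger_name: str, api_key: str, secret: str) -> str:
    """Return the URL that fires a Cloud Build webhook trigger.

    Assumption: the trigger was created with a webhook event, an API key
    and a secret; all four arguments are placeholders.
    """
    base = (
        "https://cloudbuild.googleapis.com/v1/"
        f"projects/{project_id}/triggers/{trigger_name}:webhook"
    )
    query = urllib.parse.urlencode({"key": api_key, "secret": secret})
    return f"{base}?{query}"


def fire_trigger(url: str, payload: dict) -> int:
    """POST a JSON payload to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder values only; nothing is sent unless fire_trigger() is called.
    print(build_webhook_url("my-project", "CT-CD", "my-api-key", "my-secret"))
```

The same URL is what you’d paste into the webhook notification channel, so a successful manual call is a reasonable sign that an alert will be able to start the build too.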

Success Criteria

  1. There’s a correctly configured build pipeline that can be triggered through webhooks only, named CT-CD (or continous-training-and-delivery).
  2. Model Monitoring alerts can trigger this build through log-based alerts.
  3. There’s at least one successful build.

Tips

  • Cloud Build supports inline YAML as well
  • You can create or update a Monitoring Job with the gcloud CLI, which has more configuration options than the UI

Learning Resources

Batch Loop

Description

Batch predictions are typically asynchronous and scheduled to run periodically (daily, weekly, etc.). There are different ways to trigger batch jobs; for this challenge we’ll use Cloud Build pipelines in combination with Vertex AI pipelines. Create a new Cloud Build trigger using the provided batchdeploy.yaml file, and don’t forget to set the required variables. Call this trigger CD (or continous-delivery) and make sure that this build pipeline is triggered through webhook events. Then create a new Cloud Scheduler job that runs every Sunday at 3:30 and uses the webhook event URL as the execution method.
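As an illustration of the schedule, here is a rough sketch of what the Cloud Scheduler job definition could look like when expressed as a REST-style job body. It assumes that 3:30 means 03:30 in the morning, which gives the cron expression `30 3 * * 0`. The project, location, job name, time zone and empty JSON payload are all placeholders, not values from the challenge.

```python
import base64
import json


def scheduler_job_body(project_id, location, job_id, webhook_url, time_zone="Etc/UTC"):
    """Build a Cloud Scheduler job body that POSTs to a webhook URL.

    Assumptions: "3:30" is read as 03:30 (24-hour clock), and the build
    needs no particular payload, so an empty JSON object is sent.
    """
    payload = json.dumps({}).encode("utf-8")
    return {
        "name": f"projects/{project_id}/locations/{location}/jobs/{job_id}",
        # minute hour day-of-month month day-of-week; 0 is Sunday
        "schedule": "30 3 * * 0",
        "timeZone": time_zone,
        "httpTarget": {
            "uri": webhook_url,
            "httpMethod": "POST",
            "headers": {"Content-Type": "application/json"},
            # The REST representation carries the body as base64 text.
            "body": base64.b64encode(payload).decode("ascii"),
        },
    }


if __name__ == "__main__":
    job = scheduler_job_body(
        "my-project", "us-central1", "weekly-batch", "https://example.com/webhook"
    )
    print(json.dumps(job, indent=2))
```

If you create the job in the console instead, the same cron expression goes into the frequency field, and the time zone you pick decides when "3:30" actually happens.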

Running the batch predictions periodically only gets us halfway. We also need to watch for Model Monitoring alerts and act on them. Another Cloud Build pipeline definition, provided in clouddeploy.yaml, is responsible for retraining. Configure it in a new Cloud Build trigger, call it CT (or continous-training), and set the required variables. Remember to set ENDPOINT to [none]; the other variables should be familiar, and when in doubt, have a look at the yaml file. Use Pub/Sub messages as the trigger event and pick the batch-monitoring topic.

Note
At the time of this writing, batch monitoring results are written to Cloud Storage, but no additional events or logs are generated when there are monitoring alerts. This project includes some custom services that we prepared to periodically poll for batch monitoring results and publish a message to Pub/Sub whenever a new batch has produced alerts.
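The polling idea described in this note can be sketched roughly as follows. This is not the code of the provided services; it is a hypothetical illustration in which `list_alerting_batches` stands in for whatever reads the monitoring results from Cloud Storage and `publish` stands in for a Pub/Sub publish call. The message format is made up for the example.

```python
import json
from typing import Callable, Iterable, Set


def publish_new_alerts(
    list_alerting_batches: Callable[[], Iterable[str]],
    publish: Callable[[bytes], None],
    already_seen: Set[str],
) -> int:
    """Publish one message per alerting batch that has not been reported yet.

    Both callables are hypothetical stand-ins: one lists the IDs of batches
    whose monitoring results contain alerts, the other sends bytes to a topic.
    Returns the number of messages published in this polling round.
    """
    published = 0
    for batch_id in list_alerting_batches():
        if batch_id in already_seen:
            continue  # already reported in an earlier polling round
        publish(json.dumps({"batch": batch_id}).encode("utf-8"))
        already_seen.add(batch_id)
        published += 1
    return published


if __name__ == "__main__":
    seen: Set[str] = set()
    outbox = []
    publish_new_alerts(lambda: ["batch-1", "batch-2"], outbox.append, seen)
    publish_new_alerts(lambda: ["batch-1", "batch-2", "batch-3"], outbox.append, seen)
    print(len(outbox), "messages published")
```

Tracking which batches have already been reported keeps each alerting batch from starting more than one retraining build.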

Success Criteria

  1. There’s a correctly configured build pipeline for batch predictions that can be triggered with webhooks, called CD (or continous-delivery).
  2. There’s a Cloud Scheduler job configured to run every Sunday at 3:30 that triggers the batch predictions build pipeline.
  3. There’s a correctly configured build pipeline for retraining that can be triggered with Pub/Sub messages, called CT (or continous-training).
  4. All the components have run at least once.

Tips

  • Cloud Build supports inline YAML as well
  • You can force a Cloud Scheduler job to run immediately

Learning Resources