Preparing for the Databricks Machine Learning Professional certification exam

Victor Bonnet
5 min read · Nov 24, 2023


Databricks offers a scalable, end-to-end solution for big data analytics and MLOps, including the ability to track, version, and manage machine learning experiments and the lifecycle of machine learning models.

Databricks allows the development of machine learning models using a variety of machine learning libraries, leveraging the Spark architecture for accelerated model training. It facilitates the seamless automation of training and deployment processes for new model versions in production, as well as the implementation of automated tests and ongoing monitoring of deployed models.

The Databricks Machine Learning Professional certification exam assesses one’s ability to use these capabilities to perform advanced production machine learning tasks.

The official Databricks pages related to this exam are:

  • The official certification information page.
  • The recommended official video learning path to prepare for this certification.

In this article, following the official Machine Learning Professional Exam Guide, I share some of the tests and hands-on exercises I did while preparing for the certification. Hopefully this helps anyone who wants to pass the certification prepare for the exam.

Note that some of the examples provided in the following notebooks may be inspired by the demos shown in the official video learning path.

Section 1: Experimentation

Data Management

● Read and write a Delta table
● View Delta table history and load a previous version of a Delta table
● Create, overwrite, merge, and read Feature Store tables in machine learning workflows
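
To make these three objectives concrete, here is a minimal sketch of the calls involved. It assumes a Databricks notebook where `spark` is the active SparkSession and `fs` is a `FeatureStoreClient()` from `databricks.feature_store`. The helper function names, table names, and paths are my own illustrations, not part of any Databricks API.

```python
# Assumed setup on Databricks (not executed here):
#   spark -> the active SparkSession
#   fs    -> FeatureStoreClient() from databricks.feature_store
# All helper names below are hypothetical wrappers for illustration.


def write_delta(df, path, mode="overwrite"):
    """Write a Spark DataFrame as a Delta table stored at `path`."""
    df.write.format("delta").mode(mode).save(path)


def read_delta(spark, path, version=None):
    """Read a Delta table; pass `version` to load a previous version."""
    reader = spark.read.format("delta")
    if version is not None:
        # Time travel: read the table as it was at this version number.
        reader = reader.option("versionAsOf", version)
    return reader.load(path)


def delta_history(spark, path):
    """Return the table history (one row per commit) as a DataFrame."""
    return spark.sql(f"DESCRIBE HISTORY delta.`{path}`")


def create_feature_table(fs, name, df, primary_keys, description=""):
    """Create a Feature Store table and populate it with `df`."""
    return fs.create_table(
        name=name, primary_keys=primary_keys, df=df, description=description
    )


def write_features(fs, name, df, mode="merge"):
    """Upsert rows by primary key ("merge") or replace the table ("overwrite")."""
    if mode not in ("merge", "overwrite"):
        raise ValueError("mode must be 'merge' or 'overwrite'")
    fs.write_table(name=name, df=df, mode=mode)


def read_features(fs, name):
    """Read a Feature Store table back as a Spark DataFrame."""
    return fs.read_table(name=name)
```

In a notebook, something like `read_delta(spark, "/tmp/my_table", version=0)` would load the first version of the table, and `delta_history(spark, "/tmp/my_table").show()` would list the commits to choose from. The path is a placeholder.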

Streaming

● Describe Structured Streaming as a common processing tool for ETL pipelines
● Identify structured streaming as a continuous inference solution on incoming data
● Describe why complex business logic must be handled in streaming deployments
● Identify that data can arrive out-of-order with structured streaming
● Identify continuous predictions in time-based prediction store as a scenario for streaming deployments
● Convert a batch deployment pipeline inference to a streaming deployment pipeline
● Convert a batch deployment pipeline writing to a streaming deployment pipeline
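
The last two objectives, converting a batch inference pipeline to streaming on both the read and the write side, can be sketched as follows. This is a simplified illustration under a few assumptions: the input and output are Delta tables, `model_udf` is a Spark UDF such as the one returned by `mlflow.pyfunc.spark_udf(spark, model_uri)`, and the function names, paths, and column names are hypothetical.

```python
# Assumption: `model_udf` is a Spark UDF wrapping the model, for example
# the result of mlflow.pyfunc.spark_udf(spark, model_uri).
# Function names, paths, and column names are hypothetical.


def batch_inference(spark, model_udf, input_path, output_path, feature_cols):
    """Batch version: read everything, predict, write once."""
    df = spark.read.format("delta").load(input_path)
    preds = df.withColumn("prediction", model_udf(*feature_cols))
    preds.write.format("delta").mode("overwrite").save(output_path)


def streaming_inference(
    spark, model_udf, input_path, output_path, checkpoint_path, feature_cols
):
    """Streaming version of the same pipeline.

    Only the read and write ends change: read -> readStream and
    write -> writeStream, plus a checkpoint location so the query
    can resume where it stopped after a restart.
    """
    stream_df = spark.readStream.format("delta").load(input_path)
    preds = stream_df.withColumn("prediction", model_udf(*feature_cols))
    return (
        preds.writeStream.format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .start(output_path)
    )
```

The prediction step in the middle is identical in both versions, which is the point to remember for the exam: the model is applied the same way, and only the source and sink become streaming.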

Comprehensive Drift Solutions

● Describe a common workflow for measuring concept drift and feature drift
● Identify when retraining and deploying an updated model is a probable solution to drift
● Test whether the updated model performs better on the more recent data
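
One possible shape for this workflow is sketched below: compare a numeric feature between a reference window and a recent window, then check whether a retrained model beats the current one on the recent data. The two-sample Kolmogorov-Smirnov test is my choice for illustration, as is the RMSE metric; it is written in plain Python here, whereas in practice a library routine such as `scipy.stats.ks_2samp` would be the usual option. The critical value uses the large-sample approximation, and all function names are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the empirical CDFs of the two samples."""
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    d = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / n
        cdf_cur = bisect_right(cur, x) / m
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def feature_has_drifted(reference, current, alpha=0.05):
    """Flag drift when the KS statistic exceeds the approximate critical value."""
    n, m = len(reference), len(current)
    # Large-sample approximation: c(alpha) * sqrt((n + m) / (n * m)).
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


def rmse(y_true, y_pred):
    """Root mean squared error between labels and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def retrained_model_is_better(y_recent, preds_current, preds_retrained):
    """Compare both models on the most recent labelled data only."""
    return rmse(y_recent, preds_retrained) < rmse(y_recent, preds_current)
```

A test on the input features like this one only covers feature drift. Concept drift, where the relationship between features and label changes, generally shows up through the model's error on recent labelled data, which is why the comparison in `retrained_model_is_better` is made on the recent window rather than on the original test set.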

Based on my November 2023 certification experience, a solid understanding of the content presented in the official video learning path, combined with practical hands-on tests in Databricks, proved sufficient to pass. The MLflow part is definitely not to be neglected.


Victor Bonnet

Learning data engineering, cloud computing and data science after 10 years as a structural engineer in aeronautics