The Databricks Data Engineer Associate is organised into five topic areas. This is a plain-English summary; the official Databricks exam guide is authoritative. Databricks does not publish a percentage weight for each area, so the table below lists the areas without weights. Prepare across all five.
| # | Topic area | Official weight |
|---|---|---|
| 1 | Databricks Data Intelligence Platform | Not published |
| 2 | Development and Ingestion | Not published |
| 3 | Data Processing and Transformations | Not published |
| 4 | Productionizing Data Pipelines | Not published |
| 5 | Data Governance and Quality | Not published |
1 - Databricks Data Intelligence Platform
The lakehouse foundations: the workspace, clusters and compute, notebooks, Databricks SQL, and how data is organised with the medallion (bronze → silver → gold) design. This is the mental model the other areas build on.
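To make the medallion idea concrete, here is a minimal sketch in plain Python rather than Spark. The records, field names, and the `to_silver` and `to_gold` helpers are all hypothetical; on Databricks each layer would normally be a table, not an in-memory list.

```python
from collections import defaultdict

# Hypothetical raw events as they might land in a bronze layer: kept as-is,
# including a duplicate and a record with a missing amount.
bronze = [
    {"order_id": 1, "country": "DE", "amount": "10.50"},
    {"order_id": 1, "country": "DE", "amount": "10.50"},  # duplicate
    {"order_id": 2, "country": "FR", "amount": None},     # incomplete
    {"order_id": 3, "country": "DE", "amount": "4.50"},
]


def to_silver(rows):
    """Clean and conform: drop incomplete rows, deduplicate, cast types."""
    seen, cleaned = set(), []
    for row in rows:
        if row["amount"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        cleaned.append({**row, "amount": float(row["amount"])})
    return cleaned


def to_gold(rows):
    """Aggregate into a business-level summary: revenue per country."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'DE': 15.0}
```

Bronze stays untouched so it can be reprocessed later; only silver and gold apply cleaning and business logic.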
2 - Development and Ingestion
Getting data in and creating tables. Delta Lake tables (ACID, schema enforcement, time travel), the difference between managed and external tables, basic reads and writes, and incremental file ingestion with Auto Loader.
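Schema enforcement and time travel are easier to remember once you have seen the behaviour. The sketch below is a toy model in plain Python, not Delta Lake itself: the `ToyVersionedTable` class is hypothetical, and its exact-match schema rule is a simplification of what Delta actually accepts. Delta Lake achieves this with a transaction log; the toy only imitates the observable effect.

```python
class ToyVersionedTable:
    """Toy stand-in for a table with schema enforcement and version history."""

    def __init__(self, schema):
        self.schema = schema            # column name -> Python type
        self.versions = [[]]            # version 0 is the empty table

    def append(self, rows):
        # Schema enforcement: reject the whole write if any row does not match.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"column mismatch: {sorted(row)}")
            for column, expected in self.schema.items():
                if not isinstance(row[column], expected):
                    raise ValueError(f"{column} is not {expected.__name__}")
        # All-or-nothing commit: a new version appears only after validation.
        self.versions.append(self.versions[-1] + list(rows))

    def read(self, version=None):
        # "Time travel": read an older snapshot by version number.
        return self.versions[-1 if version is None else version]


table = ToyVersionedTable({"id": int, "name": str})
table.append([{"id": 1, "name": "a"}])
table.append([{"id": 2, "name": "b"}])

try:
    # The second row has a string id, so nothing from this write is committed.
    table.append([{"id": 3, "name": "c"}, {"id": "4", "name": "d"}])
except ValueError as error:
    print("rejected:", error)

print(len(table.read()), len(table.read(version=1)))  # 2 1
```

The rejected write leaves no partial data behind, which is the atomicity part of ACID in miniature.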
3 - Data Processing and Transformations
Transforming data with Spark SQL and PySpark, and the basics of Structured Streaming for incremental processing, including how a streaming query differs from a one-off batch run.
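The batch-versus-streaming distinction can be sketched without Spark. In the toy below, the `RunningCount` class and the event list are hypothetical: a batch run rereads the whole input, while the incremental query remembers how far it got and processes only new records. Structured Streaming persists this kind of progress and state in a checkpoint location; here it simply lives in memory.

```python
class RunningCount:
    """Toy incremental query: keeps state and an offset between triggers."""

    def __init__(self):
        self.offset = 0      # how much of the source has been consumed
        self.counts = {}     # running state carried across micro-batches

    def trigger(self, source):
        """Process only records that arrived since the previous trigger."""
        new_records = source[self.offset:]
        for key in new_records:
            self.counts[key] = self.counts.get(key, 0) + 1
        self.offset = len(source)
        return len(new_records)


def batch_count(source):
    """One-off batch run: reads the entire input every time."""
    counts = {}
    for key in source:
        counts[key] = counts.get(key, 0) + 1
    return counts


source = ["click", "view", "click"]     # hypothetical append-only event log
query = RunningCount()
first = query.trigger(source)           # processes all three records

source += ["view", "view"]              # new data arrives
second = query.trigger(source)          # processes only the two new records

print(first, second, query.counts == batch_count(source))  # 3 2 True
```

Both approaches reach the same answer; the incremental one does less work per run because it never rereads old records.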
4 - Productionizing Data Pipelines
Building declarative pipelines with Lakeflow Declarative Pipelines (formerly Delta Live Tables), including data-quality expectations, and orchestrating and scheduling work with Databricks Jobs/Workflows.
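An expectation is a named rule plus a decision about what happens to rows that break it. The pipeline Python API exposes this as decorators such as `expect`, `expect_or_drop`, and `expect_or_fail`. The sketch below imitates those three behaviours in plain Python so it runs anywhere. The `apply_expectation` helper, its `on_violation` parameter, and the sample rows are hypothetical and are not part of any Databricks library.

```python
def apply_expectation(rows, name, predicate, on_violation="warn"):
    """Apply one named rule; on_violation is 'warn', 'drop', or 'fail'."""
    if on_violation not in {"warn", "drop", "fail"}:
        raise ValueError(f"unknown action: {on_violation}")
    failed = [row for row in rows if not predicate(row)]
    if failed and on_violation == "fail":
        # Stop the whole run instead of letting bad data through.
        raise ValueError(f"expectation {name!r} failed for {len(failed)} row(s)")
    if on_violation == "drop":
        kept = [row for row in rows if predicate(row)]
    else:
        kept = list(rows)  # 'warn' keeps every row and only records metrics
    return kept, {"expectation": name, "violations": len(failed)}


rows = [{"id": 1, "price": 9.99}, {"id": 2, "price": -1.0}]  # hypothetical data


def valid_price(row):
    return row["price"] >= 0


warned, warn_metrics = apply_expectation(rows, "valid_price", valid_price)
dropped, drop_metrics = apply_expectation(
    rows, "valid_price", valid_price, on_violation="drop"
)
print(len(warned), len(dropped), drop_metrics["violations"])  # 2 1 1
```

For the exam, the useful distinction is which action keeps invalid rows, which removes them, and which stops the update.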
5 - Data Governance and Quality
Governing data with Unity Catalog (the three-level catalog.schema.table namespace, permissions, and lineage) and applying data-quality checks within pipelines so bad data is caught early.
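The three-level namespace also shapes how permissions work: reading a table generally requires usage rights on its catalog and schema as well as `SELECT` on the table. The sketch below models that chain in plain Python. The `parse_table_name` and `can_select` helpers, the principals, and the `main.sales.orders` table are hypothetical, and the model deliberately ignores privilege inheritance, ownership, and quoted names containing dots.

```python
def parse_table_name(full_name):
    """Split 'catalog.schema.table' into its three levels."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return tuple(parts)


def can_select(grants, principal, full_name):
    """Check the privilege chain needed to read a table in this toy model."""
    catalog, schema, _ = parse_table_name(full_name)
    held = grants.get(principal, set())
    required = {
        ("USE CATALOG", catalog),
        ("USE SCHEMA", f"{catalog}.{schema}"),
        ("SELECT", full_name),
    }
    return required <= held  # every required privilege must be held


# Hypothetical grants: principal -> set of (privilege, securable) pairs.
grants = {
    "analysts": {
        ("USE CATALOG", "main"),
        ("USE SCHEMA", "main.sales"),
        ("SELECT", "main.sales.orders"),
    },
    "interns": {("SELECT", "main.sales.orders")},
}

print(can_select(grants, "analysts", "main.sales.orders"))  # True
print(can_select(grants, "interns", "main.sales.orders"))   # False
```

The second principal holds `SELECT` but cannot reach the table, because access is checked at every level of the namespace.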