Plain-English definitions of the core Databricks terms for the Data Engineer Associate exam. Simplified for learning; the Databricks documentation is authoritative.

| Term | Definition |
|---|---|
| Databricks Data Intelligence Platform | The lakehouse platform (built on Apache Spark and Delta Lake) for data engineering, analytics and ML. |
| Lakehouse | A single platform combining a data lake’s open storage with a warehouse’s reliability and governance. |
| Workspace | The Databricks environment where you organise notebooks, jobs, data and compute. |
| Cluster / compute | The Spark compute resource that runs your code: either an all-purpose (interactive) cluster or a job cluster created for a single job run. |
| Notebook | A document of runnable cells (SQL, Python and more) used to develop and run code. |
| Databricks SQL | The SQL interface and warehouses for querying lakehouse data and building dashboards. |
| Apache Spark | The distributed engine underneath Databricks for processing large data in parallel. |
| PySpark | The Python API for Spark, used to read, transform and write data in code. |
| Spark SQL | Spark’s module for running SQL queries over structured data, including tables in the lakehouse. |
| Delta Lake | The open table format that adds ACID transactions, schema enforcement and time travel on top of Parquet data files. |
| Delta table | A table stored in Delta Lake format; the default table type on Databricks. |
| Managed table | A table whose data and metadata Databricks manages; dropping it deletes the underlying files. |
| External (unmanaged) table | A table pointing at data in a location you manage; dropping it leaves the files in place. |
| Time travel | Querying a previous version of a Delta table by version number or timestamp. |
| Medallion architecture | The bronze → silver → gold layering: raw, then cleaned/conformed, then business-level data. |
| Auto Loader | Incremental ingestion of new files from cloud storage (the `cloudFiles` source). |
| Structured Streaming | Spark’s engine for incremental, continuous processing of data as it arrives. |
| Lakeflow Declarative Pipelines (DLT) | The declarative pipeline framework (formerly Delta Live Tables) that builds and maintains tables for you. |
| Expectations | Data-quality rules in a declarative pipeline; rows that break a rule are either kept and recorded (warn), dropped, or cause the update to fail. |
| Databricks Jobs / Workflows | The orchestration tool that schedules and runs tasks (notebooks, pipelines, scripts) on a timetable or trigger. |
| Unity Catalog | The governance layer: a three-level `catalog.schema.table` namespace with central permissions and lineage. |
| Catalog | The top level of the Unity Catalog namespace, containing schemas. |
| Schema (database) | A grouping of tables and views within a catalog. |
| Partitioning | Splitting a table’s files by a column’s values so that queries filtering on that column can skip irrelevant files. |
| Lineage | The tracked flow of data from source to table to dashboard, surfaced by Unity Catalog. |
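
The Expectations and Medallion architecture entries can be made concrete with a small sketch. The code below is plain Python, not the Databricks pipeline API: the `Expectation` class, the `apply_expectation` function and the sample rows are all hypothetical names invented for illustration. It only mimics the three behaviours an expectation can have (warn, drop, fail) as rows move from a bronze list to a silver list to a gold total.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class Expectation:
    """Hypothetical stand-in for a pipeline expectation (not a Databricks class)."""
    name: str
    predicate: Callable[[Row], bool]
    on_violation: str = "warn"  # one of "warn", "drop", "fail"


def apply_expectation(rows: List[Row], exp: Expectation) -> Tuple[List[Row], int]:
    """Return the rows that continue downstream and the number of violations."""
    if exp.on_violation not in ("warn", "drop", "fail"):
        raise ValueError(f"unknown action: {exp.on_violation!r}")
    violations = [r for r in rows if not exp.predicate(r)]
    if exp.on_violation == "fail" and violations:
        # "fail": stop the whole update instead of passing bad rows on
        raise ValueError(f"expectation {exp.name!r} failed on {len(violations)} row(s)")
    if exp.on_violation == "drop":
        # "drop": only rows that satisfy the rule continue
        kept = [r for r in rows if exp.predicate(r)]
    else:
        # "warn": every row continues; the violation count is just recorded
        kept = list(rows)
    return kept, len(violations)


def has_amount(row: Row) -> bool:
    """Illustrative rule: the amount column must not be null."""
    return row["amount"] is not None


# Bronze: raw rows as ingested, including one bad record (made-up sample data)
bronze = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 5.5},
]

# Warn keeps all rows and only counts the violation
warned, warn_count = apply_expectation(bronze, Expectation("has_amount", has_amount, "warn"))

# Silver: cleaned rows, produced here with the "drop" behaviour
silver, drop_count = apply_expectation(bronze, Expectation("has_amount", has_amount, "drop"))

# Gold: a business-level aggregate over the cleaned rows
gold_total = sum(r["amount"] for r in silver)
```

In a real pipeline the rules are declared on the table definition and the framework records the violation counts for you; this sketch only shows why the choice of action changes which rows reach the next layer.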