Databricks DE Associate Glossary: Core Terms Defined

Plain-English definitions of the core Databricks terms for the Data Engineer Associate exam. Simplified for learning; the Databricks documentation is authoritative.

Term	Definition
Databricks Data Intelligence Platform	The lakehouse platform (built on Apache Spark and Delta Lake) for data engineering, analytics and ML.
Lakehouse	A single platform combining a data lake’s open storage with a warehouse’s reliability and governance.
Workspace	The Databricks environment where you organise notebooks, jobs, data and compute.
Cluster / compute	The Spark compute resource that runs your code: either an all-purpose (interactive) cluster used for development, or a job cluster created for a job run and terminated when it finishes.
Notebook	A document of runnable cells (SQL, Python and more) used to develop and run code.
Databricks SQL	The SQL interface, backed by SQL warehouses (compute for SQL workloads), for querying lakehouse data and building dashboards.
Apache Spark	The distributed engine underneath Databricks for processing large datasets in parallel across a cluster.
PySpark	The Python API for Spark, used to read, transform and write data in code.
Spark SQL	Running SQL queries over data in the Spark engine and the lakehouse.
Delta Lake	The open table format that adds a transaction log to Parquet data files, giving ACID transactions, schema enforcement and time travel.
Delta table	A table stored in Delta Lake format; the default table type on Databricks.
Managed table	A table whose data and metadata Databricks manages; dropping it deletes the underlying files.
External (unmanaged) table	A table pointing at data in a storage location you manage (set with LOCATION); dropping it leaves the files in place.
Time travel	Querying a previous version of a Delta table by version number or timestamp (VERSION AS OF / TIMESTAMP AS OF).
Medallion architecture	The bronze → silver → gold layering: raw, then cleaned/conformed, then business-level data.
Auto Loader	Incremental ingestion of new files from cloud storage (the cloudFiles source).
Structured Streaming	Spark’s engine for incremental, continuous processing of data as it arrives.
Lakeflow Declarative Pipelines (DLT)	The declarative pipeline framework (formerly Delta Live Tables) that builds and maintains tables for you.
Expectations	Data-quality constraints on a declarative pipeline's tables; rows that violate one are kept with a warning (the default), dropped, or cause the pipeline update to fail.
Databricks Jobs / Workflows	The orchestration tool that schedules and runs tasks (notebooks, pipelines, scripts) on a timetable or trigger.
Unity Catalog	The governance layer: a three-level catalog.schema.table namespace with central permissions and lineage.
Catalog	The top level of the Unity Catalog namespace, containing schemas.
Schema (database)	A grouping of tables and views within a catalog.
Partitioning	Splitting a table's files by a column's values so that queries filtering on that column can skip non-matching partitions.
Lineage	The tracked flow of data from source to table to dashboard, surfaced by Unity Catalog.
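To make the Time travel entry concrete, here is a toy, in-memory model of how a versioned table can answer "as of version N" queries: each commit records the data files it added and removed, and the table's state at version N is rebuilt by replaying commits 0..N. This is a simplification for learning, not Delta Lake's implementation or API (the real log is a series of JSON commit files plus checkpoints in the table's _delta_log directory); `ToyDeltaLog`, `Commit` and the file names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Commit:
    version: int
    timestamp: datetime
    added: frozenset[str]    # data files added by this commit
    removed: frozenset[str]  # data files logically removed (still on disk until cleaned up)


@dataclass
class ToyDeltaLog:
    """Hypothetical toy transaction log; not Delta Lake's real API."""
    commits: list[Commit] = field(default_factory=list)

    def commit(self, timestamp: datetime, added=(), removed=()) -> int:
        version = len(self.commits)
        self.commits.append(Commit(version, timestamp, frozenset(added), frozenset(removed)))
        return version

    def files_as_of_version(self, version: int) -> set[str]:
        """Replay commits 0..version to find the live data files."""
        if not 0 <= version < len(self.commits):
            raise ValueError(f"version {version} does not exist")
        live: set[str] = set()
        for c in self.commits[: version + 1]:
            live -= c.removed
            live |= c.added
        return live

    def files_as_of_timestamp(self, ts: datetime) -> set[str]:
        """Use the latest commit made at or before ts."""
        eligible = [c.version for c in self.commits if c.timestamp <= ts]
        if not eligible:
            raise ValueError("timestamp is before the first commit")
        return self.files_as_of_version(max(eligible))


log = ToyDeltaLog()
log.commit(datetime(2024, 1, 1), added=["part-0.parquet"])
log.commit(datetime(2024, 1, 2), added=["part-1.parquet"])
# An overwrite: new file added, both old files logically removed.
log.commit(datetime(2024, 1, 3), added=["part-2.parquet"],
           removed=["part-0.parquet", "part-1.parquet"])
print(log.files_as_of_version(1))  # the two files live at version 1
```

On Databricks the equivalent query on a real table (here a hypothetical `orders`) is `SELECT * FROM orders VERSION AS OF 1`. Old versions stay queryable only while their files have not been removed by VACUUM.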
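The Medallion architecture entry can be illustrated with plain Python lists standing in for tables. In this sketch (hypothetical order data and function names), bronze keeps the raw JSON strings exactly as received, silver parses, standardises and de-duplicates them, and gold aggregates revenue per country. On Databricks each layer would normally be a Delta table written by Spark or a declarative pipeline.

```python
import json
from collections import defaultdict

# Bronze: raw records as ingested, including a duplicate and a malformed line.
bronze = [
    '{"order_id": 1, "country": "uk", "amount": "10.50"}',
    '{"order_id": 2, "country": "UK", "amount": "4.50"}',
    '{"order_id": 2, "country": "UK", "amount": "4.50"}',  # duplicate delivery
    '{"order_id": 3, "country": "fr", "amount": "7.00"}',
    'not valid json',
]


def to_silver(raw_rows):
    """Parse, clean (types, casing) and de-duplicate bronze rows."""
    seen = set()
    silver = []
    for raw in raw_rows:
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            continue  # a real pipeline might quarantine bad rows instead
        if rec["order_id"] in seen:
            continue
        seen.add(rec["order_id"])
        silver.append({
            "order_id": int(rec["order_id"]),
            "country": rec["country"].upper(),
            "amount": float(rec["amount"]),
        })
    return silver


def to_gold(silver_rows):
    """Business-level aggregate: total revenue per country."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


print(to_gold(to_silver(bronze)))
```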
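The key property behind the Auto Loader entry is that each run picks up only files it has not processed before, remembering what it has already seen in the stream's checkpoint. The toy class below models just that bookkeeping over an in-memory "bucket" (a dict of path to contents); the class, the dict and the file names are hypothetical. In real code the equivalent is a Structured Streaming read with format "cloudFiles" and a cloudFiles.format option, written out with a checkpointLocation.

```python
from __future__ import annotations


class ToyIncrementalIngest:
    """Hypothetical toy model of incremental ingestion: each file is processed at most once."""

    def __init__(self, bucket: dict[str, str], prefix: str):
        self.bucket = bucket              # path -> contents, stands in for cloud storage
        self.prefix = prefix              # landing "directory" to watch
        self.processed: set[str] = set()  # stands in for the checkpoint state

    def run_once(self) -> list[str]:
        """Ingest every not-yet-seen file under the prefix and return its lines."""
        new_paths = sorted(
            p for p in self.bucket
            if p.startswith(self.prefix) and p not in self.processed
        )
        rows: list[str] = []
        for path in new_paths:
            rows.extend(self.bucket[path].splitlines())
            self.processed.add(path)
        return rows


bucket = {"landing/a.json": '{"id": 1}\n{"id": 2}', "landing/b.json": '{"id": 3}'}
ingest = ToyIncrementalIngest(bucket, "landing/")
first = ingest.run_once()   # picks up a.json and b.json
second = ingest.run_once()  # nothing new, so nothing is reprocessed
bucket["landing/c.json"] = '{"id": 4}'
third = ingest.run_once()   # only the newly arrived c.json
```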
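The three outcomes in the Expectations entry can be sketched as one function with a violation policy. This is a hypothetical model of the behaviour, not the pipeline API: in a real declarative pipeline you would write, for example, `CONSTRAINT valid_id EXPECT (id IS NOT NULL) ON VIOLATION DROP ROW` in SQL, or use the `@dlt.expect`, `@dlt.expect_or_drop` and `@dlt.expect_or_fail` decorators in Python.

```python
from enum import Enum


class OnViolation(Enum):
    WARN = "warn"  # default: keep the row, record the violation count
    DROP = "drop"  # like ON VIOLATION DROP ROW
    FAIL = "fail"  # like ON VIOLATION FAIL UPDATE


class ExpectationFailed(Exception):
    pass


def apply_expectation(rows, name, predicate, action=OnViolation.WARN):
    """Hypothetical helper: return (output_rows, violation_count) for one expectation."""
    violations = [r for r in rows if not predicate(r)]
    if violations and action is OnViolation.FAIL:
        raise ExpectationFailed(f"expectation {name!r} failed on {len(violations)} row(s)")
    if action is OnViolation.DROP:
        kept = [r for r in rows if predicate(r)]
    else:
        kept = list(rows)
    return kept, len(violations)


rows = [{"id": 1}, {"id": None}, {"id": 3}]
id_not_null = lambda r: r["id"] is not None
kept, bad = apply_expectation(rows, "valid_id", id_not_null, OnViolation.DROP)
```

Keeping WARN as the default mirrors the glossary's ordering: by default a violation is only recorded, and you opt in to dropping rows or failing the update.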
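The Unity Catalog entry's three-level namespace means a short name is expanded using the session's current catalog and schema (set with USE CATALOG and USE SCHEMA). A minimal sketch of that expansion, assuming plain unquoted names (it ignores backtick-quoted identifiers that contain dots); `resolve_name` and the example names are hypothetical.

```python
def resolve_name(name: str, current_catalog: str, current_schema: str) -> str:
    """Expand a 1-, 2- or 3-part name to catalog.schema.table (hypothetical helper)."""
    parts = name.split(".")
    if any(p == "" for p in parts):
        raise ValueError(f"empty name part in {name!r}")
    if len(parts) == 1:   # table -> current catalog and schema
        return f"{current_catalog}.{current_schema}.{parts[0]}"
    if len(parts) == 2:   # schema.table -> current catalog
        return f"{current_catalog}.{parts[0]}.{parts[1]}"
    if len(parts) == 3:   # already fully qualified
        return name
    raise ValueError(f"too many name parts in {name!r}")


print(resolve_name("orders", "main", "sales"))
```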
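The speed-up in the Partitioning entry comes from pruning: a partitioned table stores files in one directory per value (for example order_date=2024-01-01/), so a filter on that column only has to read the matching directory. The sketch below models that with dicts; the function names and data are hypothetical, and a real table would be created with PARTITIONED BY (order_date).

```python
from collections import defaultdict


def write_partitioned(rows, partition_col):
    """Group rows into one 'directory' per distinct value, e.g. 'order_date=2024-01-01'."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[f"{partition_col}={row[partition_col]}"].append(row)
    return dict(partitions)


def scan(partitions, partition_col, value=None):
    """Return (matching rows, partitions read).

    With a filter value, non-matching partitions are pruned without being read;
    value=None simulates a query with no filter on the partition column (full scan).
    """
    if value is None:
        to_read = list(partitions)
    else:
        key = f"{partition_col}={value}"
        to_read = [key] if key in partitions else []
    rows = [r for k in to_read for r in partitions[k]]
    return rows, len(to_read)


orders = [
    {"order_id": 1, "order_date": "2024-01-01"},
    {"order_id": 2, "order_date": "2024-01-01"},
    {"order_id": 3, "order_date": "2024-01-02"},
    {"order_id": 4, "order_date": "2024-01-03"},
]
parts = write_partitioned(orders, "order_date")
filtered, read_filtered = scan(parts, "order_date", "2024-01-02")  # reads 1 partition
everything, read_all = scan(parts, "order_date")                   # reads all 3
```

Queries that do not filter on the partition column get no benefit, which is why the choice of partition column matters.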


Sources