
Databricks Certified Data Engineer Associate Glossary

Level: intermediate

A free Databricks Data Engineer Associate glossary: core terms (Delta Lake, Auto Loader, Unity Catalog, medallion, DLT, Structured Streaming) in plain English.

By The Exam Atlas Editorial Team · Verified 2026-06-06

Plain-English definitions of the core Databricks terms for the Data Engineer Associate exam. Simplified for learning; the Databricks documentation is authoritative.

| Term | Definition |
| --- | --- |
| Databricks Data Intelligence Platform | The lakehouse platform (built on Apache Spark and Delta Lake) for data engineering, analytics and ML. |
| Lakehouse | A single platform combining a data lake’s open storage with a warehouse’s reliability and governance. |
| Workspace | The Databricks environment where you organise notebooks, jobs, data and compute. |
| Cluster / compute | The Spark compute resource that runs your code; can be all-purpose (interactive) or job clusters. |
| Notebook | A document of runnable cells (SQL, Python and more) used to develop and run code. |
| Databricks SQL | The SQL interface and warehouses for querying lakehouse data and building dashboards. |
| Apache Spark | The distributed engine underneath Databricks for processing large data in parallel. |
| PySpark | The Python API for Spark, used to read, transform and write data in code. |
| Spark SQL | Running SQL queries over data in the Spark engine and the lakehouse. |
| Delta Lake | The open table format adding ACID transactions, schema enforcement and time travel over files. |
| Delta table | A table stored in Delta Lake format; the default table type on Databricks. |
| Managed table | A table whose data and metadata Databricks manages; dropping it deletes the underlying files. |
| External (unmanaged) table | A table pointing at data in a location you manage; dropping it leaves the files in place. |
| Time travel | Querying a previous version of a Delta table by version number or timestamp. |
| Medallion architecture | The bronze → silver → gold layering: raw, then cleaned/conformed, then business-level data. |
| Auto Loader | Incremental ingestion of new files from cloud storage (the cloudFiles source). |
| Structured Streaming | Spark’s engine for incremental, continuous processing of data as it arrives. |
| Lakeflow Declarative Pipelines (DLT) | The declarative pipeline framework (formerly Delta Live Tables) that builds and maintains tables for you. |
| Expectations | Data-quality rules in a declarative pipeline that validate, drop or fail rows that break them. |
| Databricks Jobs / Workflows | The orchestration tool that schedules and runs tasks (notebooks, pipelines, scripts) on a timetable or trigger. |
| Unity Catalog | The governance layer: a catalog.schema.table namespace with central permissions and lineage. |
| Catalog | The top level of the Unity Catalog namespace, containing schemas. |
| Schema (database) | A grouping of tables and views within a catalog. |
| Partitioning | Splitting a table’s files by a column’s values to speed up some queries. |
| Lineage | The tracked flow of data from source to table to dashboard, surfaced by Unity Catalog. |
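
To make the medallion and expectations entries above more concrete, here is a minimal sketch in plain Python. It is a toy model only: it uses lists of dictionaries instead of Delta tables, and every name in it (`bronze`, `amount_is_positive`, `to_silver`, `to_gold`) is hypothetical, not a Databricks or Spark API. It assumes the simplest expectation behaviour, where rows that break the rule are dropped.

```python
# Toy model of the bronze -> silver -> gold layering.
# Plain Python only; nothing here is a Databricks or Spark API.

# Bronze: raw records exactly as ingested, including bad and duplicate rows.
bronze = [
    {"order_id": "1", "amount": "19.99", "country": "us"},
    {"order_id": "2", "amount": "-5.00", "country": "DE"},  # breaks the rule below
    {"order_id": "3", "amount": "42.50", "country": "de"},
    {"order_id": "3", "amount": "42.50", "country": "de"},  # duplicate of order 3
]


def amount_is_positive(row):
    """Expectation-style rule (hypothetical): valid only if amount > 0."""
    return float(row["amount"]) > 0


def to_silver(rows, rule):
    """Clean and conform: drop invalid and duplicate rows, cast types, normalise values."""
    seen_ids = set()
    silver_rows = []
    for row in rows:
        if not rule(row) or row["order_id"] in seen_ids:
            continue  # "drop" behaviour: the offending row is skipped
        seen_ids.add(row["order_id"])
        silver_rows.append({
            "order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "country": row["country"].upper(),
        })
    return silver_rows


def to_gold(rows):
    """Business-level aggregate: total order amount per country."""
    totals = {}
    for row in rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals


silver = to_silver(bronze, amount_is_positive)
gold = to_gold(silver)
print(gold)
```

On Databricks each layer would normally be a Delta table and the rule would be declared as a pipeline expectation; the sketch only shows the shape of the flow from raw to cleaned to aggregated data.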
