Databricks DE Associate Flashcards

Intermediate · 25 cards

Free flashcards for Databricks DE Associate: flip each card to reveal the definition. Built from the glossary as a study aid, these are concept checks, not real exam questions.

By The Exam Atlas Editorial Team · Verified 2026-06-06

All 25 terms

Databricks Data Intelligence Platform
The lakehouse platform (built on Apache Spark and Delta Lake) for data engineering, analytics and ML.
Lakehouse
A single platform combining a data lake's open storage with a warehouse's reliability and governance.
Workspace
The Databricks environment where you organise notebooks, jobs, data and compute.
Cluster / compute
The Spark compute resource that runs your code: either an all-purpose cluster for interactive work or a job cluster created for a job run and terminated when it finishes.
Notebook
A document of runnable cells (SQL, Python and more) used to develop and run code.
Databricks SQL
The SQL interface and SQL warehouses (dedicated SQL compute) for querying lakehouse data and building dashboards.
Apache Spark
The distributed engine underneath Databricks for processing large data in parallel.
PySpark
The Python API for Spark, used to read, transform and write data in code.
Spark SQL
Spark's module for running SQL queries against tables and DataFrames in the lakehouse.
Delta Lake
The open table format that adds ACID transactions, schema enforcement and time travel on top of Parquet data files.
Delta table
A table stored in Delta Lake format; the default table type on Databricks.
Managed table
A table whose data and metadata Databricks manages; dropping it deletes the underlying files.
External (unmanaged) table
A table pointing at data in a location you manage; dropping it leaves the files in place.
Time travel
Querying a previous version of a Delta table by version number or timestamp.
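To make the idea concrete, here is a toy model in plain Python, not Delta Lake itself. It assumes a table that keeps one full snapshot per commit (real Delta Lake uses a transaction log over data files instead), and the class name ToyVersionedTable is hypothetical. On Databricks the equivalent read is typically written as spark.read.option("versionAsOf", 0).table(...).

```python
class ToyVersionedTable:
    """Toy stand-in for a versioned table: one stored snapshot per commit."""

    def __init__(self):
        self._snapshots = []  # position in this list is the version number

    def commit(self, rows):
        # Store a copy so later changes to `rows` cannot rewrite history.
        self._snapshots.append(list(rows))
        return len(self._snapshots) - 1  # the version just written

    def read(self, version_as_of=None):
        # With no version given, read the latest snapshot.
        if version_as_of is None:
            version_as_of = len(self._snapshots) - 1
        return list(self._snapshots[version_as_of])


orders = ToyVersionedTable()
v0 = orders.commit([{"id": 1, "status": "new"}])
v1 = orders.commit([{"id": 1, "status": "shipped"}])

latest = orders.read()                   # current state of the table
original = orders.read(version_as_of=0)  # "time travel" to version 0
```

Reading with version_as_of=0 returns the row as it was first written, even though a later commit changed it.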
Medallion architecture
The bronze → silver → gold layering: raw, then cleaned/conformed, then business-level data.
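A minimal sketch of the three layers in plain Python, using lists of dictionaries in place of Delta tables. The sample rows, the cleaning rules and the function names to_silver and to_gold are all assumptions made for illustration.

```python
# Bronze: raw records exactly as ingested (strings, nulls, mixed casing).
bronze = [
    {"order_id": "1", "amount": "10.50", "country": "de"},
    {"order_id": "2", "amount": None, "country": "DE"},
    {"order_id": "3", "amount": "4.50", "country": "fr"},
]


def to_silver(rows):
    """Clean and conform: drop rows without an amount, cast types, normalise codes."""
    cleaned = []
    for row in rows:
        if row["amount"] is None:
            continue
        cleaned.append({
            "order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "country": row["country"].upper(),
        })
    return cleaned


def to_gold(rows):
    """Business-level aggregate: total order amount per country."""
    totals = {}
    for row in rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals


silver = to_silver(bronze)
gold = to_gold(silver)
```

Each layer reads only from the one before it, which is the point of the pattern: raw data stays untouched in bronze while quality improves downstream.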
Auto Loader
Incremental ingestion of new files from cloud storage (the cloudFiles source).
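"Incremental" here means each run picks up only files it has not seen before. The sketch below imitates that behaviour in plain Python with a set acting as the checkpoint; it is not how Auto Loader is implemented, and the function name and file names are hypothetical. In a real pipeline the source is declared with spark.readStream.format("cloudFiles").

```python
def discover_new_files(listing, checkpoint):
    """Return files in `listing` not yet recorded in `checkpoint`, then record them."""
    new_files = sorted(f for f in listing if f not in checkpoint)
    checkpoint.update(new_files)  # remember them so the next run skips them
    return new_files


checkpoint = set()  # stands in for the stream's checkpoint state

# First run: everything in the folder is new.
first_run = discover_new_files(["a.json", "b.json"], checkpoint)

# Second run: one more file has landed, and only that file is returned.
second_run = discover_new_files(["a.json", "b.json", "c.json"], checkpoint)
```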
Structured Streaming
Spark's engine for incremental, continuous processing of data as it arrives.
Lakeflow Declarative Pipelines (DLT)
The declarative pipeline framework (formerly Delta Live Tables) that builds and maintains tables for you.
Expectations
Data-quality rules in a declarative pipeline. When a row breaks a rule, the pipeline can keep the row and record the violation, drop the row, or fail the update.
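The three outcomes can be imitated in plain Python. This is a sketch of the behaviour only; the function apply_expectation and its action strings are hypothetical. In pipeline code the corresponding decorators are dlt.expect, dlt.expect_or_drop and dlt.expect_or_fail.

```python
def apply_expectation(rows, predicate, action):
    """Apply one rule to `rows`; return (rows kept, number of violations)."""
    passed = [r for r in rows if predicate(r)]
    failed = [r for r in rows if not predicate(r)]
    if action == "warn":
        return list(rows), len(failed)  # keep every row, just count violations
    if action == "drop":
        return passed, len(failed)      # keep only the rows that satisfy the rule
    if action == "fail":
        if failed:
            raise ValueError(f"{len(failed)} row(s) violated the expectation")
        return list(rows), 0
    raise ValueError(f"unknown action: {action}")


rows = [{"id": 1, "amount": 20}, {"id": 2, "amount": -5}]


def is_valid(row):
    return row["amount"] >= 0


warned, warn_violations = apply_expectation(rows, is_valid, "warn")
kept, drop_violations = apply_expectation(rows, is_valid, "drop")
```

With the "fail" action, the same input raises an error instead of returning rows, which mirrors a pipeline update stopping on bad data.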
Databricks Jobs / Workflows
The orchestration tool that runs tasks (notebooks, pipelines, scripts) on a schedule or in response to a trigger.
Unity Catalog
The governance layer: a catalog.schema.table namespace with central permissions and lineage.
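A small sketch of how a three-level name breaks down. The helper parse_table_name and the example name main.sales.orders are hypothetical, and the parser assumes plain identifiers with no backtick-quoted parts containing dots.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    catalog: str
    schema: str
    table: str


def parse_table_name(full_name):
    """Split 'catalog.schema.table' into its three levels."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return TableName(*parts)


name = parse_table_name("main.sales.orders")
```

Here name.catalog is "main", name.schema is "sales" and name.table is "orders".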
Catalog
The top level of the Unity Catalog namespace, containing schemas.
Schema (database)
A grouping of tables and views within a catalog.
Partitioning
Splitting a table's files into directories by a column's values, so queries that filter on that column can skip the files they do not need.
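Why this helps only some queries can be seen in a toy version. The sketch below stores rows in a dictionary keyed by partition value, standing in for directories of files; the function names and sample data are assumptions for illustration.

```python
from collections import defaultdict


def write_partitioned(rows, column):
    """Group rows into one bucket per distinct value of `column`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[f"{column}={row[column]}"].append(row)
    return dict(partitions)


def read_with_pruning(partitions, column, value):
    """Read only the bucket that can match the filter; every other bucket is skipped."""
    return partitions.get(f"{column}={value}", [])


events = [
    {"event_date": "2024-01-01", "user": "a"},
    {"event_date": "2024-01-01", "user": "b"},
    {"event_date": "2024-01-02", "user": "c"},
]

partitions = write_partitioned(events, "event_date")
jan_first = read_with_pruning(partitions, "event_date", "2024-01-01")
```

A filter on event_date touches one bucket, while a filter on user would still have to scan all of them.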
Lineage
The tracked flow of data from source to table to dashboard, surfaced by Unity Catalog.