Databricks DE Associate Flashcards

Intermediate · 25 cards

Free flashcards for Databricks DE Associate: flip each card to reveal the definition. Built from the glossary as a study aid, these are concept checks, not real exam questions.

By The Exam Atlas Editorial Team · Verified 2026-06-06

All 25 terms

Databricks Data Intelligence Platform: The lakehouse platform (built on Apache Spark and Delta Lake) for data engineering, analytics and ML.
Lakehouse: A single platform combining a data lake's open storage with a warehouse's reliability and governance.
Workspace: The Databricks environment where you organise notebooks, jobs, data and compute.
Cluster / compute: The Spark compute resource that runs your code; either an all-purpose (interactive) cluster or a job cluster.
Notebook: A document of runnable cells (SQL, Python and more) used to develop and run code.
Databricks SQL: The SQL interface and warehouses for querying lakehouse data and building dashboards.
Apache Spark: The distributed engine underneath Databricks for processing large data in parallel.
PySpark: The Python API for Spark, used to read, transform and write data in code.
Spark SQL: Spark's module for running SQL queries over structured data in the lakehouse.
Delta Lake: The open table format that adds ACID transactions, schema enforcement and time travel on top of Parquet data files.
Delta table: A table stored in Delta Lake format; the default table type on Databricks.
Managed table: A table whose data and metadata Databricks manages; dropping it deletes the underlying files.
External (unmanaged) table: A table pointing at data in a location you manage; dropping it leaves the files in place.
Time travel: Querying a previous version of a Delta table by version number or timestamp.
Medallion architecture: The bronze → silver → gold layering: raw, then cleaned/conformed, then business-level data.
Auto Loader: Incremental ingestion of new files from cloud storage (the cloudFiles source).
Structured Streaming: Spark's engine for incremental, continuous processing of data as it arrives.
Lakeflow Declarative Pipelines (DLT): The declarative pipeline framework (formerly Delta Live Tables) that builds and maintains tables for you.
Expectations: Data-quality rules in a declarative pipeline; a row that breaks a rule can be kept and logged, dropped, or made to fail the update.
Databricks Jobs / Workflows: The orchestration tool that runs tasks (notebooks, pipelines, scripts) on a schedule or in response to a trigger.
Unity Catalog: The governance layer: a catalog.schema.table namespace with central permissions and lineage.
Catalog: The top level of the Unity Catalog namespace, containing schemas.
Schema (database): A grouping of tables and views within a catalog.
Partitioning: Splitting a table's files by a column's values so that queries filtering on that column can skip irrelevant files.
Lineage: The tracked flow of data from source to table to dashboard, surfaced by Unity Catalog.
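
Worked example: expectations. The three outcomes for a row that breaks a data-quality rule (keep and log, drop, or fail) are easier to remember once you have seen them side by side. The sketch below is a plain-Python toy model, not the Databricks API: `Expectation`, `apply_expectations`, `ExpectationFailed` and the sample rows are all hypothetical names made up for illustration. In a real pipeline you would declare expectations through the pipeline framework itself.

```python
from dataclasses import dataclass
from typing import Callable


class ExpectationFailed(Exception):
    """Raised when a row breaks a rule whose action is 'fail' (toy model)."""


@dataclass
class Expectation:
    # Hypothetical structure for illustration; not a Databricks class.
    name: str
    predicate: Callable[[dict], bool]  # returns True when the row is valid
    action: str = "warn"               # one of "warn", "drop", "fail"


def apply_expectations(rows, expectations):
    """Return (kept_rows, violation_counts) after checking every rule."""
    kept = []
    violations = {e.name: 0 for e in expectations}
    for row in rows:
        keep = True
        for e in expectations:
            if e.predicate(row):
                continue  # row satisfies this rule
            violations[e.name] += 1
            if e.action == "fail":
                # Stop processing entirely, like failing the update.
                raise ExpectationFailed(f"{e.name} violated by {row}")
            if e.action == "drop":
                keep = False  # exclude the row from the output
            # "warn": the row is kept and the violation is only counted
        if keep:
            kept.append(row)
    return kept, violations


# Made-up sample data for the sketch.
sample_rows = [
    {"id": 1, "amount": 10},
    {"id": None, "amount": 5},   # breaks valid_id, so it is dropped
    {"id": 3, "amount": -2},     # breaks non_negative_amount, kept and counted
]

rules = [
    Expectation("valid_id", lambda r: r["id"] is not None, action="drop"),
    Expectation("non_negative_amount", lambda r: r["amount"] >= 0, action="warn"),
]

kept_rows, violation_counts = apply_expectations(sample_rows, rules)
```

Here `kept_rows` holds the rows with ids 1 and 3, and each rule records one violation. Changing either rule's action to "fail" raises `ExpectationFailed` at the first bad row instead of returning a result.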