Databricks DE Associate Flashcards
Free flashcards for Databricks DE Associate: flip each card to reveal the definition. Built from the glossary as a study aid, these are concept checks, not real exam questions.
All 25 terms
- Databricks Data Intelligence Platform
- The lakehouse platform (built on Apache Spark and Delta Lake) for data engineering, analytics and ML.
- Lakehouse
- A single platform combining a data lake's open storage with a warehouse's reliability and governance.
- Workspace
- The Databricks environment where you organise notebooks, jobs, data and compute.
- Cluster / compute
- The Spark compute resource that runs your code; either an all-purpose (interactive) cluster or a job cluster.
- Notebook
- A document of runnable cells (SQL, Python and more) used to develop and run code.
- Databricks SQL
- The SQL interface and warehouses for querying lakehouse data and building dashboards.
- Apache Spark
- The distributed engine underneath Databricks for processing large data in parallel.
- PySpark
- The Python API for Spark, used to read, transform and write data in code.
- Spark SQL
- Spark's module for running SQL queries over lakehouse data on the Spark engine.
- Delta Lake
- The open table format adding ACID transactions, schema enforcement and time travel over files.
- Delta table
- A table stored in Delta Lake format; the default table type on Databricks.
- Managed table
- A table whose data and metadata Databricks manages; dropping it deletes the underlying files.
- External (unmanaged) table
- A table pointing at data in a location you manage; dropping it leaves the files in place.
- Time travel
- Querying a previous version of a Delta table by version number or timestamp.
- Medallion architecture
- The bronze → silver → gold layering: raw, then cleaned/conformed, then business-level data.
- Auto Loader
- Incremental ingestion of new files from cloud storage (the cloudFiles source).
- Structured Streaming
- Spark's engine for incremental, continuous processing of data as it arrives.
- Lakeflow Declarative Pipelines (DLT)
- The declarative pipeline framework (formerly Delta Live Tables) that builds and maintains tables for you.
- Expectations
- Data-quality rules in a declarative pipeline; rows that break a rule can be kept and recorded as violations, dropped, or made to fail the update.
- Databricks Jobs / Workflows
- The orchestration tool that schedules and runs tasks (notebooks, pipelines, scripts) on a timetable or trigger.
- Unity Catalog
- The governance layer: a catalog.schema.table namespace with central permissions and lineage.
- Catalog
- The top level of the Unity Catalog namespace, containing schemas.
- Schema (database)
- A grouping of tables and views within a catalog.
- Partitioning
- Splitting a table's files by a column's values to speed up queries that filter on that column.
- Lineage
- The tracked flow of data from source to table to dashboard, surfaced by Unity Catalog.
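
To make the Medallion architecture and Expectations cards concrete, here is a toy model in plain Python. It is only a sketch and does not use the Databricks or Lakeflow APIs: `apply_expectation`, `ExpectationFailed` and the sample order rows are hypothetical names and data invented for illustration. It shows raw (bronze) rows being typed and filtered into silver by a "drop" rule, then aggregated into a business-level gold result.

```python
from collections import defaultdict


class ExpectationFailed(Exception):
    """Toy stand-in for an update that fails on a rule violation."""


def apply_expectation(rows, predicate, on_violation="warn"):
    """Apply one data-quality rule and return (kept_rows, violation_count).

    on_violation (hypothetical parameter):
      "warn" keeps violating rows and only counts them,
      "drop" removes violating rows,
      "fail" raises ExpectationFailed on the first violation.
    """
    if on_violation not in {"warn", "drop", "fail"}:
        raise ValueError(f"unknown action: {on_violation}")
    kept, violations = [], 0
    for row in rows:
        if predicate(row):
            kept.append(row)
            continue
        violations += 1
        if on_violation == "fail":
            raise ExpectationFailed(f"row violates rule: {row}")
        if on_violation == "warn":
            kept.append(row)  # keep the row, only record the violation
        # "drop": the row is simply not kept
    return kept, violations


def is_valid_order(row):
    """Rule: an order needs an id and a non-negative amount."""
    return row["order_id"] is not None and row["amount"] >= 0


# Bronze: raw rows as ingested (amounts are still strings).
bronze = [
    {"order_id": 1, "region": "EU", "amount": "10.5"},
    {"order_id": 2, "region": "US", "amount": "-3"},
    {"order_id": 3, "region": "EU", "amount": "4.5"},
    {"order_id": None, "region": "US", "amount": "7"},
]

# Silver: typed rows with invalid ones dropped by the rule.
typed = [{**row, "amount": float(row["amount"])} for row in bronze]
silver, dropped = apply_expectation(typed, is_valid_order, "drop")

# Gold: business-level aggregate (total amount per region).
totals = defaultdict(float)
for row in silver:
    totals[row["region"]] += row["amount"]
gold = dict(totals)

print(dropped, gold)
```

Switching the third argument to `"warn"` would keep all four rows while still counting two violations, and `"fail"` would stop at the first bad row; the three modes mirror the keep, drop and fail behaviours described on the Expectations card.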