A team needs an interactive cluster to develop and test notebook code together during the day. Which compute is the best fit?

An all-purpose (interactive) cluster

A job cluster that is created and terminated per scheduled run

Why is the lakehouse described as combining a data lake with a data warehouse?

It keeps open, low-cost storage like a lake while adding warehouse-style reliability and governance

It stores data only in a proprietary warehouse format with no open files

It replaces SQL entirely with Python

It removes the need for any data modelling

Databricks SQL is primarily used to:

Run SQL queries and build dashboards on lakehouse data

Schedule multi-task production jobs

Define row-level pipeline dependencies automatically

Manage cluster auto-termination policies

You drop a managed table in Databricks. What happens to the underlying data files?

They are deleted, because Databricks manages both the metadata and the data

Nothing; the files always remain regardless of table type

Only the most recent version is deleted, older versions stay

The files are moved to another catalog automatically

You need to continuously load only newly arrived files from a cloud storage folder, without reprocessing old ones. Which Databricks feature is built for this?

Auto Loader (the cloudFiles source)

A static one-time CREATE TABLE from the folder
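
The "only newly arrived files" behaviour can be illustrated without Spark. The following is a minimal plain-Python sketch of incremental file discovery, not the Auto Loader API: the function `discover_new_files`, the in-memory `loaded` set and the file names are all hypothetical, and a production implementation would need to persist this state rather than hold it in memory.

```python
import os
import tempfile
from typing import List, Set


def discover_new_files(folder: str, already_loaded: Set[str]) -> List[str]:
    """Return files in the folder that have not been seen before, and remember them."""
    new_files = sorted(name for name in os.listdir(folder) if name not in already_loaded)
    already_loaded.update(new_files)
    return new_files


landing = tempfile.mkdtemp()   # stands in for a cloud storage folder
loaded: Set[str] = set()       # state: which files were already ingested

# Two files arrive, then the first load runs.
for name in ("2024-01.csv", "2024-02.csv"):
    open(os.path.join(landing, name), "w").close()
first_batch = discover_new_files(landing, loaded)

# One more file arrives; the second load picks up only that file.
open(os.path.join(landing, "2024-03.csv"), "w").close()
second_batch = discover_new_files(landing, loaded)
```

A static one-time load has no such state, so rerunning it would read every file in the folder again.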

What is the key difference between a managed table and an external (unmanaged) table?

For external tables you control the storage location, and dropping the table leaves the files in place

Managed tables cannot be queried with SQL

External tables cannot store more than one version of data

Managed tables do not support time travel

Which SQL command would you use to load query results into a new managed Delta table from existing data?

CREATE TABLE ... AS SELECT (CTAS)

When ingesting raw source files into the bronze layer, the usual goal is to:

Capture the data as-is with minimal transformation, so nothing is lost

Apply all business aggregations immediately

Enforce final report formatting before storing

Delete any columns you will not show to end users

How does Structured Streaming differ from a standard batch query?

It processes data incrementally as it arrives, rather than in one complete run

It must run on a single machine

It cannot write to Delta tables

When transforming bronze data into a silver table, a typical step is to:

Clean and standardise the data, for example fixing types and removing bad records

Grant catalog admin rights to all users

Schedule the workspace to shut down

Convert the table into a dashboard

A streaming query that reads from a source and writes to a Delta sink uses a checkpoint location mainly to:

Track progress so the stream can resume exactly where it left off

Store the cluster's billing information

Define which users can read the output

Format the output for dashboards
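
A checkpoint can be thought of as durable progress state. The sketch below is plain Python rather than Structured Streaming, and the function name `process_new_records` and the JSON file layout are made up for illustration: it saves the last processed offset to a file so that a later run continues from that point instead of starting over.

```python
import json
import os
import tempfile
from typing import List


def process_new_records(source: List[str], sink: List[str], checkpoint_path: str) -> int:
    """Append unprocessed records to the sink and persist the new offset."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            offset = json.load(f)["offset"]  # resume from the saved position

    new_records = source[offset:]
    sink.extend(new_records)

    with open(checkpoint_path, "w") as f:
        json.dump({"offset": len(source)}, f)  # record progress after writing
    return len(new_records)


checkpoint_file = os.path.join(tempfile.mkdtemp(), "offset.json")

source = ["a", "b", "c"]
sink: List[str] = []

first_run = process_new_records(source, sink, checkpoint_file)   # processes a, b, c
source.extend(["d", "e"])                                        # new data arrives
second_run = process_new_records(source, sink, checkpoint_file)  # processes only d, e
```

This simplified version ignores failures between writing to the sink and saving the offset; it is meant only to show why progress tracking lets a stream resume where it left off.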

In Spark, a transformation such as filter() is described as 'lazy'. This means it:

Is only planned and does not run until an action triggers execution

Runs immediately and returns rows at once

Permanently deletes rows from the source table

Can only be written in SQL, not Python
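
Lazy evaluation can be illustrated outside Spark. The following is a minimal plain-Python sketch, not Spark code: the hypothetical `LazyPipeline` class only records each filter step, and no row is inspected until `collect()` (standing in for a Spark action) is called.

```python
from typing import Callable, Iterable, List


class LazyPipeline:
    """Toy stand-in for a DataFrame: records steps, runs them only on an action."""

    def __init__(self, rows: Iterable[int]) -> None:
        self._rows = list(rows)
        self._steps: List[Callable[[int], bool]] = []
        self.rows_examined = 0  # counts how many rows were actually inspected

    def filter(self, predicate: Callable[[int], bool]) -> "LazyPipeline":
        # Transformation: only extend the plan, do not look at any row yet.
        new = LazyPipeline(self._rows)
        new._steps = self._steps + [predicate]
        return new

    def collect(self) -> List[int]:
        # Action: execute every recorded step now.
        result = []
        for row in self._rows:
            self.rows_examined += 1
            if all(step(row) for step in self._steps):
                result.append(row)
        return result


pipeline = LazyPipeline(range(10)).filter(lambda x: x % 2 == 0).filter(lambda x: x > 4)
examined_before_action = pipeline.rows_examined  # still 0: nothing has run
evens_above_four = pipeline.collect()            # the action triggers execution
```

Note that filtering returns a new object and leaves the underlying rows untouched, which mirrors why a filter never deletes rows from the source table.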

Which Databricks feature lets you define target tables and transformations declaratively while the platform manages dependencies and execution?

Lakeflow Declarative Pipelines (formerly Delta Live Tables)

In a Lakeflow Declarative Pipeline (DLT), what do 'expectations' do?

Define data-quality rules that validate rows and can drop records that break them or fail the update

Set the cluster size for the pipeline

Grant table permissions to users

Schedule the pipeline to run nightly
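
The outcomes an expectation can have (keep the row but count the violation, drop the row, or fail the update) can be sketched in plain Python. This is not the pipeline API; `apply_expectation` and the action names `"warn"`, `"drop"` and `"fail"` are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def apply_expectation(
    rows: List[Row], rule: Callable[[Row], bool], on_violation: str
) -> Tuple[List[Row], int]:
    """Return (rows kept, number of violations) for one data-quality rule."""
    if on_violation not in ("warn", "drop", "fail"):
        raise ValueError(f"unknown action: {on_violation}")
    kept: List[Row] = []
    violations = 0
    for row in rows:
        if rule(row):
            kept.append(row)
            continue
        violations += 1
        if on_violation == "fail":
            raise RuntimeError(f"expectation violated by row: {row}")
        if on_violation == "warn":
            kept.append(row)  # keep the bad row, only count it
        # "drop": the bad row is left out of the result
    return kept, violations


rows = [{"id": 1}, {"id": None}, {"id": 3}]


def id_not_null(row: Row) -> bool:
    return row["id"] is not None


warned, warn_count = apply_expectation(rows, id_not_null, "warn")
dropped, drop_count = apply_expectation(rows, id_not_null, "drop")

try:
    apply_expectation(rows, id_not_null, "fail")
    failed = False
except RuntimeError:
    failed = True  # a null key stops the whole update
```

The `"fail"` branch corresponds to the strictest choice: a single null in a key column stops the run instead of letting bad data through.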

How does a Lakeflow Declarative Pipeline differ from a Databricks Job?

The pipeline declares and maintains target tables; a Job schedules and orchestrates arbitrary tasks

They are identical and interchangeable

A Job can only run SQL, never Python

A pipeline cannot apply any data-quality checks

You want a production job to run on fresh, automatically terminated compute rather than a shared interactive cluster. You should configure it to use:

A job cluster created for the run

The driver node only, with no workers

If one task in a multi-task Databricks Job fails, a sensible production practice is to:

Configure dependencies and alerts/retries so downstream tasks do not run on bad data and someone is notified

Always ignore the failure and mark the job successful

Delete the entire pipeline immediately

Move the table to a different catalog

To let an analyst read a specific table but not modify it, which approach fits Unity Catalog?

GRANT SELECT on that table to the analyst (or their group)

Give them workspace admin rights

Drop and recreate the table for each user

Within a Lakeflow Declarative Pipeline, you add a rule that fails the update if a primary-key column contains nulls. This is best described as:

A Unity Catalog lineage diagram

Your organisation wants the same table permissions and names to apply consistently when users work from several different Databricks workspaces. Unity Catalog supports this because it provides:

A shared metastore so catalogs, names and access governance are consistent across workspaces

A separate, unconnected metastore per notebook

Permissions that only ever apply inside one cluster

A copy of every table duplicated into each workspace

Practice questions · Data & Analytics

Databricks Certified Data Engineer Associate: Practice Questions

intermediate · 30 questions

Original practice questions for the Databricks Certified Data Engineer Associate. Each answer is explained, including why each other option is wrong. Filter by topic area or difficulty. These are concept checks - not questions from the certification.

By The Exam Atlas Editorial Team · Verified 2026-06-06 · ~38 min

Domain Difficulty

Databricks Data Intelligence Platform easy

Which open table format gives Databricks tables ACID transactions, schema enforcement and time travel on top of files in cloud storage?
Databricks Data Intelligence Platform easy

In the medallion architecture, which layer holds raw data ingested as-is from the source?
Databricks Data Intelligence Platform medium

A team needs an interactive cluster to develop and test notebook code together during the day. Which compute is the best fit?
Databricks Data Intelligence Platform medium

Why is the lakehouse described as combining a data lake with a data warehouse?
Databricks Data Intelligence Platform medium

Databricks SQL is primarily used to:
Databricks Data Intelligence Platform medium

You query a Delta table 'AS OF' an earlier version to recover data that was overwritten this morning. Which Delta Lake capability is this?
Development and Ingestion medium

You drop a managed table in Databricks. What happens to the underlying data files?
Development and Ingestion medium

You need to continuously load only newly arrived files from a cloud storage folder, without reprocessing old ones. Which Databricks feature is built for this?
Development and Ingestion medium

What is the key difference between a managed table and an external (unmanaged) table?
Development and Ingestion medium

Which SQL command would you use to load query results into a new managed Delta table from existing data?
Development and Ingestion medium

When ingesting raw source files into the bronze layer, the usual goal is to:
Development and Ingestion hard

A write to an existing Delta table fails because the incoming data has a column type that does not match the table definition. Which Delta Lake behaviour caused this?
Data Processing and Transformations easy

Which library provides the Python API for working with Spark DataFrames in Databricks?
Data Processing and Transformations medium

How does Structured Streaming differ from a standard batch query?
Data Processing and Transformations medium

You want to combine two DataFrames by matching rows on a shared key column, adding columns from both. In PySpark you would use a:
Data Processing and Transformations medium

When transforming bronze data into a silver table, a typical step is to:
Data Processing and Transformations hard

A streaming query that reads from a source and writes to a Delta sink uses a checkpoint location mainly to:
Data Processing and Transformations hard

In Spark, a transformation such as filter() is described as 'lazy'. This means it:
Productionizing Data Pipelines medium

Which Databricks feature lets you define target tables and transformations declaratively while the platform manages dependencies and execution?
Productionizing Data Pipelines medium

In a Lakeflow Declarative Pipeline (DLT), what do 'expectations' do?
Productionizing Data Pipelines easy

Which tool do you use to schedule and orchestrate several dependent tasks - notebooks, a pipeline and a script - to run in order?
Productionizing Data Pipelines hard

How does a Lakeflow Declarative Pipeline differ from a Databricks Job?
Productionizing Data Pipelines medium

You want a production job to run on fresh, automatically terminated compute rather than a shared interactive cluster. You should configure it to use:
Productionizing Data Pipelines hard

If one task in a multi-task Databricks Job fails, a sensible production practice is to:
Data Governance and Quality easy

Which Databricks component provides centralised governance - permissions, lineage and a unified namespace - across workspaces?
Data Governance and Quality medium

What is the correct three-level namespace used to reference an object in Unity Catalog?
Data Governance and Quality medium

To let an analyst read a specific table but not modify it, which approach fits Unity Catalog?
Data Governance and Quality hard

Within a Lakeflow Declarative Pipeline, you add a rule that fails the update if a primary-key column contains nulls. This is best described as:
Data Governance and Quality medium

A regulator asks how a particular gold table was built and which source tables fed it. Which Unity Catalog feature answers this fastest?
Data Governance and Quality hard

Your organisation wants the same table permissions and names to apply consistently when users work from several different Databricks workspaces. Unity Catalog supports this because it provides:

Practice questions FAQ

Are these real Databricks DE Associate exam questions?: No. These are original study questions written to test understanding. They are not real exam questions, exam dumps, or copied from any provider.
How should I use these practice questions?: Answer each one, read the explanation (including why the wrong options are wrong), and use the per-domain score below to focus your revision on weak areas. Revisit before exam day.
How many questions should I do before the exam?: Enough to score consistently across every domain, alongside full-length practice from official or reputable providers. Understanding why each answer is right matters more than raw volume.
What score means I am ready?: A good signal is consistently scoring around 80% or higher across all domains on questions you have not seen before, and being able to explain why the wrong options are wrong.
Should I use exam dumps?: No. Dumps (real or leaked questions) breach provider policy, can void your certification, and do not build the understanding the exam actually tests.

Sources

Databricks - Certified Data Engineer Associate