Microsoft Fabric Data Engineer (DP-700): Study Guide

intermediate

A practical, step-by-step plan to take DP-700 from "interested" to exam-ready - the mechanics, what to study in what order, how to practise, and how to know you are ready.

By The Exam Atlas Editorial Team · Verified 2026-06-07

Study plans by timeline

4-week intensive
With Fabric experience (~14-16 hrs/week): work the three skills measured hands-on in a Fabric workspace, then mocks.
6-week balanced
The default (~9-10 hrs/week): roughly two weeks per skill area, hands-on in Fabric, mocks at the end.
8-week steady
For those newer to Fabric (~6 hrs/week): start with the Lakehouse and notebooks, then ingestion, then security and optimisation.

What to study, in order

Weeks 1–2
Implement and manage: workspace settings, OneLake security, RLS/CLS, deployment pipelines and orchestration
Weeks 3–4
Ingest and transform: loading patterns, Dataflows Gen2, notebooks (PySpark), pipelines, OneLake shortcuts and mirroring
Weeks 5–6
Streaming (Eventstream, KQL, structured streaming); then monitor and optimise Lakehouse, warehouse and Spark
Weeks 7–8
Full-length timed reviews and the free Microsoft Learn practice assessment

DP-700 certifies that you can build and run an analytics solution end to end on Microsoft Fabric, and it is hands-on in spirit. Microsoft expects you to manipulate and transform data with three languages (SQL, PySpark and KQL) across the full Fabric workflow, and the exam may include interactive items that ask you to do something rather than recall it. That makes a Fabric workspace your most important study tool: every concept in this guide is one you should perform, not just read.

This is a full self-study course organised around the three skill areas Microsoft measures. They are weighted almost evenly at roughly 30 to 35 percent each, so none can be skipped. The course is original teaching material with no real or simulated exam questions. Always confirm the current skills-measured list and the free practice assessment on the official certification page before you book, since Microsoft revises its exams regularly and may test commonly used Preview features.

Chapter 1: Exam overview and how to use this guide

What the exam measures and how it is structured

DP-700 is the associate-level data-engineering credential for the Microsoft Fabric stack. The exam runs 100 minutes, is delivered through Pearson VUE either at a centre or online with a proctor, and may mix interactive question types with multiple-choice. Microsoft does not publish the question count. The pass mark is 700 out of 1000 on a scaled score, which is not a simple percentage of questions correct, so the right target is broad competence rather than a raw count. The credential is valid for one year, but renewal is free: a short, unproctored, open-book assessment on Microsoft Learn in the six months before it expires.

The three skill areas this course follows are implement and manage an analytics solution, ingest and transform data, and monitor and optimize an analytics solution. Because each is worth roughly a third, your study time should be split fairly evenly, with extra hands-on attention wherever you are weakest.

Fabric, OneLake, and the platform you are being tested on

The exam is built around Microsoft Fabric and its unified data lake, OneLake. Fabric brings together the data-engineering items you will work with: principally the Lakehouse (files and Delta tables you transform with Spark notebooks) and the Warehouse (a full T-SQL engine for set-based work), alongside pipelines, Dataflows Gen2, and the real-time items built on KQL. Knowing what each item is for is the backbone of the exam, so this guide introduces them early and returns to the tool-choice decisions throughout.

DP-700 versus DP-203, and versus PL-300

Two comparisons clear up what DP-700 is. First, it is the Fabric-era successor to the retired DP-203 (Azure Data Engineer Associate, retired 31 March 2025). DP-203 was built on Azure Synapse and Azure Data Factory; DP-700 is built on Fabric and OneLake instead, so the data-engineering ideas overlap but the tooling is different. Second, it is not the Power BI analyst exam: PL-300 covers preparing, modelling and visualising data for reports, while DP-700 covers engineering the data that feeds those reports. The two are complementary on the same stack, and the deciding factor between them is your role, not difficulty.

How to use this course

Read the chapters in order at least once. The platform foundations make the ingestion and optimisation chapters concrete, and the security material recurs across them. Treat the bold terms as a checklist of things you can both explain and perform in a workspace. The final chapters turn the content into a schedule, a final-week routine, and a description of exam day. Short illustrations appear where a concept is easy to misread; none of them are exam questions.

Chapter 2: Fabric foundations and choosing the right item

Before the three skill areas, you need a clear map of the Fabric items and when to use each, because so many questions are really “which item is correct here.” This chapter builds that map.

Lakehouse versus Warehouse

The most important decision in Fabric is Lakehouse versus Warehouse. A Lakehouse stores files and Delta tables over OneLake; you load and transform it with Spark notebooks and read it through a SQL analytics endpoint, which is read-only T-SQL over the Delta tables. A Warehouse is a full T-SQL engine that supports SQL writes, stored procedures and set-based transformation. The clean rule: choose a Lakehouse when your work is file-based or Spark-driven, and a Warehouse when you need T-SQL writes and procedural SQL. A scenario that says “transform large files with PySpark” points to a Lakehouse; one that says “run stored procedures and INSERT/UPDATE in T-SQL” points to a Warehouse.

OneLake and shortcuts

OneLake is the single, tenant-wide data lake that every Fabric item reads from and writes to, which is what lets different engines work on one copy of the data. A shortcut is a pointer to data in another location (another Lakehouse, another domain, or external storage like ADLS or S3) that appears in your item without copying the data. The exam-relevant judgement is recognising when a shortcut is better than a copy: when you want a single source of truth reused across items, or want to reference external data in place, a shortcut avoids duplication and the staleness that copies introduce.
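The staleness point can be made concrete with a small analogy in plain Python. This is only a sketch of the idea, not of how OneLake implements shortcuts: the "shortcut" is modelled as a second name for the same in-memory data, the "copy" as a snapshot, and the table and column names are hypothetical.

```python
import copy

# Hypothetical source table living in another Lakehouse.
source_orders = [{"order_id": 1, "amount": 40.0}]

# A copy is a snapshot taken at one moment in time.
copied_orders = copy.deepcopy(source_orders)

# A shortcut is modelled here as a reference to the same data, not a second copy.
shortcut_orders = source_orders

# New data lands in the source after both were created.
source_orders.append({"order_id": 2, "amount": 15.5})

shortcut_rows = len(shortcut_orders)  # the reference sees the new row
copied_rows = len(copied_orders)      # the snapshot does not
print(shortcut_rows, copied_rows)
```

The copy only catches up when something reloads it, which is the extra moving part a shortcut avoids.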

The three transform and orchestration tools

Three tools do most of the work, and knowing which to choose is heavily tested. Dataflows Gen2 is low-code transformation built on Power Query, suited to analysts and to transformations expressed as steps rather than code. Notebooks are code-first Spark (PySpark or Spark SQL), suited to large-scale or complex transformation. Pipelines orchestrate and move data: they copy data and run other activities (including notebooks and Dataflows) on a schedule or trigger. The shorthand to internalise: Dataflow Gen2 for low-code transform, notebook for code transform, pipeline for orchestration and copy. Many questions hand you a requirement and expect you to pick the right one of these three.

Direct Lake, briefly

You should recognise Direct Lake, a Power BI semantic-model mode that reads Delta tables in OneLake directly, giving import-like query speed without a scheduled import, as long as the data stays in supported Delta form. It sits at the boundary between engineering and reporting and is worth knowing as the reason your gold Delta tables can serve fast reports without a separate import step.

Chapter 3: Implement and manage an analytics solution

This first skill area, roughly 30 to 35 percent, covers configuring the workspace, lifecycle management, security and governance, and orchestration. It is broad, and security in particular is several distinct topics rather than one.

Configuring the workspace and lifecycle

You should be able to configure Fabric workspace settings, including Spark settings, the domain a workspace belongs to, OneLake options, and Dataflows Gen2 settings. On top of that sits lifecycle management: using version control (Git integration) so your items are tracked as code, working with database projects for the Warehouse, and using deployment pipelines to promote content through development, test and production stages. The exam wants you to understand that mature Fabric work is versioned and promoted, not edited directly in production.

Security is not one topic

The biggest trap in this area is treating security as a single subject. Fabric layers several controls, and the exam separates them. Workspace-level and item-level access decides who can open or manage a workspace or a specific item. Row-level security (RLS) restricts which rows a user sees, column-level security (CLS) restricts which columns, and object-level security (OLS) hides whole tables or objects. Dynamic data masking obscures sensitive values in results without removing access to the row. Sensitivity labels classify and protect data for governance, and OneLake security governs access at the lake. The skill being tested is matching the control to the requirement: “users in different regions must see only their region’s rows” is RLS; “hide the salary column from analysts” is CLS; “mask the middle digits of a card number” is dynamic data masking.
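To see how three of these controls differ in effect, here is a minimal sketch in plain Python. It does not use any Fabric API; in Fabric these controls are defined on the item rather than written by hand like this. The rows, column names, and helper functions are hypothetical and exist only to show what each control does to a result set.

```python
# Hypothetical rows; names and values are illustrative only.
rows = [
    {"region": "EU", "customer": "Ana", "salary": 52000, "card": "4111111111111111"},
    {"region": "US", "customer": "Ben", "salary": 61000, "card": "5500000000000004"},
]

def apply_rls(rows, user_region):
    """Row-level security: keep only the rows the user may see."""
    return [r for r in rows if r["region"] == user_region]

def apply_cls(rows, hidden_columns):
    """Column-level security: drop the columns the user may not see."""
    return [{k: v for k, v in r.items() if k not in hidden_columns} for r in rows]

def mask_card(card):
    """Dynamic data masking: keep the first and last four digits, hide the middle."""
    return card[:4] + "*" * (len(card) - 8) + card[-4:]

def apply_masking(rows):
    """Return the rows with the card value obscured; the row itself stays visible."""
    return [{**r, "card": mask_card(r["card"])} if "card" in r else dict(r) for r in rows]

# An EU analyst: sees only EU rows, no salary column, and a masked card number.
visible = apply_masking(apply_cls(apply_rls(rows, "EU"), {"salary"}))
print(visible)
```

RLS removes rows, CLS removes columns, and masking leaves both in place but changes what the value looks like. That difference is usually the deciding detail in a requirement.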

Orchestration

Finally, this area covers orchestrating processes with pipelines and notebooks, and setting up schedules and event-based triggers. You should know that a pipeline can run notebooks and Dataflows as activities, and that work can be triggered on a schedule or by an event rather than only run manually. Build a pipeline that runs a notebook on a schedule so the orchestration concepts are concrete.

Chapter 4: Ingest and transform data

This is the heart of the exam, another 30 to 35 percent, and it is where most candidates underestimate how many questions hinge on the right tool and the right loading pattern.

Loading patterns

Start with patterns, because they frame everything else. A full load reloads the entire dataset each run, which is simple but expensive at scale. An incremental load brings only new or changed data since the last run, which is what production pipelines usually need. You should also understand preparing data for a dimensional model (shaping facts and dimensions) and streaming loads for continuously arriving data. Practise both a full and an incremental load so you can reason about why incremental is preferred for large, frequently updated sources.
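The difference between the two patterns is easiest to see side by side. The sketch below uses plain Python rather than PySpark or a pipeline, and it assumes the source carries a last-modified timestamp that can serve as a watermark. The row shape and function names are hypothetical.

```python
from datetime import datetime

# Hypothetical source rows with a last-modified timestamp.
source = [
    {"id": 1, "value": "a", "modified": datetime(2025, 1, 1, 9, 0)},
    {"id": 2, "value": "b", "modified": datetime(2025, 1, 2, 9, 0)},
    {"id": 3, "value": "c", "modified": datetime(2025, 1, 3, 9, 0)},
]

def full_load(source):
    """Full load: rebuild the target from everything in the source."""
    return {row["id"]: row for row in source}

def incremental_load(target, source, watermark):
    """Incremental load: upsert only rows modified after the watermark."""
    changed = [row for row in source if row["modified"] > watermark]
    for row in changed:
        target[row["id"]] = row  # inserts new ids, overwrites changed ones
    new_watermark = max((row["modified"] for row in changed), default=watermark)
    return len(changed), new_watermark

target = full_load(source[:2])           # first run loaded rows 1 and 2
watermark = datetime(2025, 1, 2, 9, 0)   # highest timestamp loaded so far
moved, watermark = incremental_load(target, source, watermark)
print(moved, sorted(target), watermark)
```

The second run moves one row instead of three. On a large, frequently updated source that gap is the whole argument for incremental loading, at the cost of having to store and trust the watermark.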

Batch ingestion and transformation

For batch work, the exam expects two decisions and then the transformation itself. First, choose an appropriate data store (Lakehouse or Warehouse, per Chapter 2). Second, choose the transform tool among Dataflows Gen2, notebooks, KQL and T-SQL. Then transform: with PySpark, SQL and KQL you should be able to denormalise, group and aggregate, and handle the messy realities of duplicate, missing and late-arriving data. Build a small medallion layering (bronze raw, silver cleaned and conformed, gold business-ready) so you genuinely understand preparing data through stages, and so the “which store, which tool” questions are grounded in something you have done. OneLake shortcuts and mirroring sit here too: mirroring continuously replicates an external database into OneLake, which is the better pattern when you need a live copy of an operational source available for analytics.
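As a rough illustration of the bronze, silver and gold stages, the following plain-Python sketch handles one duplicate, one missing value and one late-arriving row. In a workspace you would do this with PySpark or SQL over Delta tables. The data, the "unknown" default, and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Bronze: raw events exactly as they arrived (hypothetical data).
bronze = [
    {"order_id": 1, "store": "north", "amount": 10.0, "event_day": "2025-01-01"},
    {"order_id": 1, "store": "north", "amount": 10.0, "event_day": "2025-01-01"},  # duplicate
    {"order_id": 2, "store": None,    "amount": 5.0,  "event_day": "2025-01-01"},  # missing store
    {"order_id": 3, "store": "south", "amount": 7.5,  "event_day": "2025-01-02"},
    {"order_id": 4, "store": "north", "amount": 2.5,  "event_day": "2025-01-01"},  # arrived late
]

def to_silver(rows):
    """Silver: deduplicate on order_id and fill missing store values."""
    seen, silver = set(), []
    for row in rows:
        if row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        silver.append({**row, "store": row["store"] or "unknown"})
    return silver

def to_gold(rows):
    """Gold: total by the day the event happened, not the day it arrived."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["event_day"], row["store"])] += row["amount"]
    return dict(totals)

gold = to_gold(to_silver(bronze))
print(gold)
```

Keying the aggregate on the event day is what lets the late row for 1 January land in the correct total even though it arrived after a row for 2 January.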

Streaming and KQL

The streaming portion is real and inside this weighted area, so do not skip it as niche. Eventstreams capture and route real-time events; Spark structured streaming processes streams in notebooks; and KQL (Kusto Query Language) queries high-volume event and telemetry data held in an Eventhouse. A specific skill is windowing: grouping streaming events into time windows (for example, counts per five-minute window) so you can aggregate an unbounded stream. Process a small stream with an Eventstream and query it with KQL, including a windowing function, so the real-time concepts move from abstract to familiar.
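A tumbling window is simple to reason about once you see the bucketing step. The sketch below counts events per five-minute window in plain Python, with hypothetical timestamps. KQL expresses the same idea by summarising over a time bin, and Spark structured streaming has its own windowing functions, so treat this as the logic rather than the syntax you would use in Fabric.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)

def window_start(ts, window=WINDOW):
    """Floor a timestamp to the start of its tumbling window."""
    epoch = datetime(1970, 1, 1)
    return ts - ((ts - epoch) % window)

# Hypothetical event timestamps from a stream.
events = [
    datetime(2025, 1, 1, 12, 0, 30),
    datetime(2025, 1, 1, 12, 4, 59),
    datetime(2025, 1, 1, 12, 5, 0),
    datetime(2025, 1, 1, 12, 11, 15),
]

counts = Counter(window_start(ts) for ts in events)
for start in sorted(counts):
    print(start.time(), counts[start])
```

Each event belongs to exactly one window, and an event at 12:05:00 opens a new window rather than closing the previous one. Getting that boundary right is the usual source of off-by-one counts.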

The decision skill the exam rewards

If you can read a requirement and quickly say which store, which transform tool, and whether it is batch or streaming, with a one-line justification, you are practising exactly what this area tests. Lakehouse versus Warehouse, Dataflow Gen2 versus notebook versus pipeline, full versus incremental, shortcut versus mirroring versus copy: these are the recurring forks, and fluency with them is worth more than any single piece of syntax.
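One way to drill these forks is to write the rules of thumb down as a function and check your own answers against it. The sketch below encodes only the simplified rules stated in this guide. The parameter names are hypothetical, and real scenarios carry more nuance than four flags can capture.

```python
def recommend(needs_tsql_writes, code_first, continuous_events, orchestrate_only=False):
    """Return (store, tool, mode) using this guide's simplified rules of thumb."""
    mode = "streaming" if continuous_events else "batch"

    if continuous_events:
        store = "Eventhouse"   # event and telemetry data queried with KQL
    elif needs_tsql_writes:
        store = "Warehouse"    # T-SQL writes and stored procedures
    else:
        store = "Lakehouse"    # file-based or Spark-driven work

    if orchestrate_only:
        tool = "pipeline"      # orchestration and copy
    elif continuous_events:
        tool = "Eventstream"   # capture and route real-time events
    elif needs_tsql_writes:
        tool = "T-SQL"
    elif code_first:
        tool = "notebook"      # code-first Spark transform
    else:
        tool = "Dataflow Gen2" # low-code transform
    return store, tool, mode

large_files_with_pyspark = recommend(needs_tsql_writes=False, code_first=True, continuous_events=False)
stored_procedures = recommend(needs_tsql_writes=True, code_first=False, continuous_events=False)
telemetry_feed = recommend(needs_tsql_writes=False, code_first=False, continuous_events=True)
print(large_files_with_pyspark, stored_procedures, telemetry_feed)
```

The order of the checks matters: the requirement that rules out the other options (a real-time feed, a T-SQL write) is tested first, which mirrors how the deciding phrase in a scenario should be read.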

Chapter 5: Monitor and optimize an analytics solution

The final skill area, again roughly 30 to 35 percent, is about keeping the solution healthy: monitoring, diagnosing errors, and optimising performance. The marks reward knowing where a problem surfaces and which lever fixes it.

Monitoring and alerts

You should be able to monitor ingestion, transformation and semantic-model refresh, and configure alerts so failures and anomalies are surfaced rather than discovered late. Fabric’s monitoring surfaces show the status and history of pipeline runs, Dataflow refreshes, and semantic-model refreshes; the skill is knowing where to look for each and how to set an alert on it.

Diagnosing errors across items

A listed skill is to identify and resolve errors across the full range of items: pipelines, Dataflows Gen2, notebooks, Eventhouses, Eventstreams, T-SQL, and OneLake shortcuts. Each surfaces failures differently, so the exam wants you to recognise the typical failure of each: a pipeline activity failing and how to read its run output, a notebook cell error, a Dataflow refresh failure, a broken shortcut. Practise reading the error each item produces rather than guessing at causes.

Optimisation levers

Then optimise. The exam covers tuning Lakehouse tables (for example, file layout and table maintenance), pipelines, the data warehouse, Eventstreams and Eventhouses, Spark, and query performance generally. The judgement being tested is matching the lever to the symptom: a slow Spark notebook points to Spark-level tuning and the right cluster configuration; a slow warehouse query points to warehouse and query optimisation; many small files in a Lakehouse table point to table maintenance. You do not need deep performance-engineering expertise at the associate level, but you should know which optimisation applies where.
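To make the small-files symptom tangible, here is a conceptual sketch of what compaction achieves: grouping many small files into fewer, larger ones. It is plain Python and not how Fabric or Delta performs table maintenance. The 128 MB target and the file sizes are assumptions chosen only for illustration.

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily group small files into batches of at most roughly target_mb each."""
    batches, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb):
        # Close the current batch when adding this file would exceed the target.
        if current and current_size + size > target_mb:
            batches.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches

small_files = [4, 8, 16, 2, 64, 32, 8, 100]  # hypothetical file sizes in MB
batches = plan_compaction(small_files)
print(len(small_files), "files ->", len(batches), "files")
```

The data volume is unchanged. What drops is the number of files a query engine has to open, which is why table maintenance is the lever for this symptom rather than more compute.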

The pitfall to avoid here

The mistake is studying optimisation as a list of features instead of a set of responses to symptoms. Reframe it: for each common slowness or failure, know the one or two levers you would reach for. That framing is how the questions are written.

Chapter 6: Study plan and hands-on practice

With the three skill areas understood, pace them so the hands-on time DP-700 rewards is not crowded out by passive reading.

Set up a Fabric workspace first

A Fabric trial capacity is available, and Microsoft Learn provides free training paths plus a free practice assessment. Get a workspace on day one so every later session can include building. The free practice assessment is the best early calibration tool, so take it once near the start to see where you stand.

Choose a timeline by experience

A working data engineer new to Fabric typically needs about 50 to 70 hours over six to eight weeks: a couple of weeks on implement-and-manage, a couple on ingest-and-transform, then streaming and the monitor-and-optimise area, finishing with full-length practice. Someone newer to data engineering should plan for roughly 90 to 120 hours, starting with the Lakehouse and notebooks before ingestion, then security and optimisation. To turn a chosen length into dated weeks for your start date, use the free study-plan generator.
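The weekly commitment behind those totals is simple division. A short sketch, using the 50 to 70 hour range and the six and eight week lengths from the paragraph above, and rounding up to whole hours:

```python
import math

def weekly_hours(total_hours, weeks):
    """Hours per week needed to fit total_hours into weeks, rounded up."""
    return math.ceil(total_hours / weeks)

six_week_range = (weekly_hours(50, 6), weekly_hours(70, 6))
eight_week_range = (weekly_hours(50, 8), weekly_hours(70, 8))
print(six_week_range, eight_week_range)
```

The same function applies to the 90 to 120 hour range for whatever number of weeks you choose. If the result is more than you can realistically sustain, lengthen the plan rather than trimming the hands-on time.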

Build the full loop once

The highest-value exercise spans all three areas. Create a Lakehouse, ingest data with a pipeline and a Dataflow Gen2, transform it with a PySpark notebook and with T-SQL, add a OneLake shortcut, build a small medallion layering, set up row-level security, then process a stream with an Eventstream and query it with KQL, and finally monitor a refresh and set an alert. Doing this loop once exercises the whole exam and directly prepares you for any interactive items.

Chapter 7: Final preparation, exam day, and format

Final preparation

In the closing week, consolidate rather than learn new material. Revisit the tool-choice decisions (Lakehouse versus Warehouse, Dataflow Gen2 versus notebook versus pipeline, shortcut versus mirroring), the security sub-types (RLS, CLS, OLS, dynamic data masking, sensitivity labels, OneLake security), the full-versus-incremental loading patterns, and the optimisation levers by symptom. Re-take the free practice assessment and sit one or two full-length mocks, treating each as a diagnosis of weak areas and reviewing the reasoning behind every miss. Aim to score comfortably above the pass mark on fresh questions before booking, and avoid sites recycling copied exam content, which breaches Microsoft policy.

Exam day and format

On the day, the exam runs 100 minutes through Pearson VUE, at a centre or online, and may include interactive question types as well as multiple-choice. Microsoft does not publish the question count, and you need a scaled 700 out of 1000 to pass. Pace yourself so the interactive items, which can take longer, do not run away from you, and read each tool-choice question for the requirement that decides the answer (a T-SQL write requirement, a real-time requirement, an incremental requirement). Confirm the current skills-measured list, fee, and policy on the official certification page when you book. Having built the workflow yourself in a Fabric workspace is the advantage that makes the format feel familiar rather than abstract.

Domain by domain: what to master

Implement and manage an analytics solution (30-35%)
Configure Fabric workspace settings (Spark, domain, OneLake, Dataflows Gen2) · Lifecycle management: version control, database projects, deployment pipelines · Security and governance: workspace and item access, RLS/CLS/OLS, sensitivity labels, OneLake security · Orchestrate processes with pipelines and notebooks, schedules and event triggers
Ingest and transform data (30-35%)
Loading patterns: full and incremental loads, dimensional model prep, streaming loads · Batch ingestion and transformation with Dataflows Gen2, notebooks, KQL and T-SQL · OneLake shortcuts, mirroring and pipelines · Streaming with Eventstreams, Spark structured streaming and KQL windowing
Monitor and optimize an analytics solution (30-35%)
Monitor ingestion, transformation and semantic model refresh; configure alerts · Identify and resolve pipeline, Dataflow Gen2, notebook, Eventhouse and T-SQL errors · Optimise Lakehouse tables, warehouses and pipelines · Optimise Spark, Eventstreams/Eventhouses and query performance

Key concepts to master

Lakehouse
A Fabric item that stores files and Delta tables over OneLake. You load and transform here with notebooks (Spark) and read it with the SQL analytics endpoint.
Warehouse
A Fabric item with a full T-SQL engine for set-based transformation and serving. Choose it when you want SQL writes and stored procedures, not just read access.
OneLake & shortcuts
OneLake is the single, tenant-wide data lake. A shortcut points to data in another location without copying it, so one copy can be reused across items.
Dataflows Gen2 vs pipelines vs notebooks
Low-code transform (Dataflows Gen2), orchestration and copy (pipelines), and code-first Spark transform (notebooks). Knowing when to use which is heavily tested.
Direct Lake
A Power BI semantic-model mode that reads Delta tables in OneLake directly - import-like speed without scheduled import, as long as the data stays in supported Delta form.
Medallion architecture
Bronze (raw), silver (cleaned/conformed), gold (business-ready) layering. In Fabric you build it across Lakehouse/Warehouse tables to prepare a dimensional model.
Eventhouse & KQL
Real-Time Intelligence storage (Eventhouse, holding KQL databases) queried with Kusto Query Language for high-volume event and telemetry data.

What you should be able to do

By exam day, you should be able to:

  • Configure a Fabric workspace (Spark, domain, OneLake, Dataflows Gen2) and set up deployment pipelines
  • Choose correctly between a Lakehouse and a Warehouse, and between a Dataflow Gen2, a notebook and a pipeline
  • Implement full and incremental loads and prepare data for a dimensional model
  • Transform data with PySpark, T-SQL and KQL, and create OneLake shortcuts and mirroring
  • Process streaming data with Eventstreams, KQL windowing and Spark structured streaming
  • Apply security (workspace, item, row/column/object level, OneLake), and monitor and optimise items

How to practise

Practise in a Microsoft Fabric workspace (a trial capacity is available), since the exam expects hands-on ingestion and transformation and may include interactive items. Create a Lakehouse, ingest data with a pipeline and a Dataflow Gen2, transform with a PySpark notebook and with T-SQL, add a OneLake shortcut, build a small medallion layering, set up row-level security, then process a stream with an Eventstream and query it with KQL. Use the free Microsoft Learn practice assessment and review weak areas before booking.

  • Practise actively from early on - recall and apply, don't just re-read.
  • Each week, review the previous week's weak spots before moving on.
  • Do at least one full-length, timed mock near the end, then a second after fixing weak areas.
  • Warm up with our original DP-700 practice questions (concept checks, not exam dumps).

We never publish exam dumps or "real" questions. Use official practice and reputable providers for question banks.

Are you ready? (readiness checklist)

  • You score at or above the pass mark (700 / 1000) on full-length, timed mocks - consistently, not once.
  • No more than one or two weak domains remain, and you know exactly which.
  • You can explain why the wrong options are wrong, not just spot the right one.
  • You've completed at least one full-length mock under real time pressure.
  • You could pass next week, not only on the day you crammed.

On exam day

Scheduled through Pearson VUE, at a centre or online. The exam runs 100 minutes and may include interactive question types as well as multiple-choice. Microsoft does not publish the question count; confirm current details when you book.

  • Arrive early, or run the online-proctoring system check well ahead; have valid ID ready.
  • Budget your time per question and keep moving - don't sink minutes into one item.
  • Where the format allows, flag hard questions and return to them rather than stalling.
  • Read scenario and performance-based questions twice: work out what is actually asked first.
  • Taper in the final days - light review and rest beat an all-nighter.

Common mistakes to avoid

  • Studying without a Fabric workspace open; DP-700 expects hands-on ingestion and transformation, so reading alone is not enough.
  • Not learning when to choose a Lakehouse vs a Warehouse, or a Dataflow Gen2 vs a notebook vs a pipeline - these decision questions are common.
  • Treating security as one topic; DP-700 separates workspace-level, item-level, row/column/object-level and OneLake security, plus dynamic data masking and sensitivity labels.
  • Skipping streaming (Eventstream, KQL windowing, Spark structured streaming) because it feels niche - it sits inside a 30–35% area.
  • Confusing this with DP-203 (Azure Synapse / Data Factory); DP-700 is Microsoft Fabric and OneLake, and DP-203 is retired.

Resource stack

Start with the free and official resources above. Paid courses and question banks help if you want structure, but they are optional, not required to pass.

What to study next

DP-700 is the core Fabric data-engineering cert. From here, pair it with PL-300 (Power BI Data Analyst) for the reporting side of the stack, or compare it with the Google Cloud Professional Data Engineer and SnowPro Core if you are choosing a data platform.

FAQ

How long does it take to study for DP-700?
Most working data engineers need 50–70 hours over 6 to 8 weeks. Hands-on time in a Fabric workspace, especially notebooks, pipelines and the Lakehouse, shortens this considerably.
Do I need to know PySpark and KQL for DP-700?
Yes. Microsoft expects you to transform data with SQL, PySpark and KQL. You do not need to be an expert in all three, but you must be able to read and write transformations and choose the right tool for batch versus streaming.
Is DP-700 the same as the old DP-203?
No. DP-203 (Azure Data Engineer Associate) retired on 31 March 2025 and was built on Azure Synapse and Azure Data Factory. DP-700 is the Fabric-era equivalent, built on Microsoft Fabric and OneLake. The data-engineering ideas overlap, but the tooling is different.
Which DP-700 area is the hardest?
It varies, but the three areas are weighted almost evenly (30–35% each), so you cannot skip one. Many candidates find the 'Ingest and transform' decisions (which store, which transform tool, batch vs streaming) and the security sub-types the trickiest.
How many practice tests should I do?
Start with Microsoft Learn's free practice assessment to calibrate, then sit several full-length practice tests in the final weeks. Use each to find weak skill areas and aim to be comfortably above the pass mark on fresh questions before booking.
