Microsoft Fabric Data Engineer (DP-700): Practice Questions
Original practice questions for Microsoft Fabric Data Engineer (DP-700). Each answer is explained, including why each of the other options is wrong. These are concept checks, not questions from the certification exam.
In Microsoft Fabric, which single, tenant-wide data lake do all workspaces and items store their data in?
Correct answer: A. OneLake is the single data lake provisioned once per tenant, and every Fabric workspace and item stores data in it, which is what enables shortcuts and reuse without copies. Azure Data Lake Storage Gen1 is a retired Azure service, not the Fabric lake; there is not a separate lake per workspace - workspaces are folders within the one OneLake; and the Spark cluster disk is transient compute storage, not the durable data lake.
You need to promote a tested set of Fabric items from a development workspace to test and then production. Which Fabric feature is designed for this?
Correct answer: C. Deployment pipelines move content across development, test and production stages, which is exactly the lifecycle-management task described. A data pipeline orchestrates data movement and activities inside a solution, not promotion between stages; a OneLake shortcut points to data without copying it; and an Eventstream captures and routes streaming data - none promote items across environments.
A team wants each regional manager to see only the sales rows for their own region when they open the same semantic model. Which control should you implement?
Correct answer: B. Row-level security filters the rows returned based on the user's role, so each manager sees only their region while everyone uses one model. Object-level security hides entire tables or columns from a role, not specific rows; a sensitivity label classifies and protects an item but does not filter rows; and dynamic data masking obscures values in a column rather than restricting which rows are visible.
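The filtering idea behind row-level security can be sketched in plain Python. This is only a conceptual illustration, not how Fabric or a semantic model implements it; the `SALES` rows, the `USER_REGION` mapping and the function name are all hypothetical.

```python
# Hypothetical sales rows; in a real model these would live in a table.
SALES = [
    {"region": "North", "amount": 120},
    {"region": "South", "amount": 80},
    {"region": "North", "amount": 45},
]

# Hypothetical mapping from a user to the region their role allows.
USER_REGION = {"alice@example.com": "North", "bob@example.com": "South"}


def rows_visible_to(user, rows):
    """Return only the rows whose region matches the user's assigned region."""
    region = USER_REGION.get(user)
    # A user with no mapping gets region None, which matches no row,
    # so unknown users see nothing rather than everything.
    return [row for row in rows if row["region"] == region]
```

Every user queries the same `SALES` data, and the filter decides which rows come back, which mirrors the "one model, different rows per manager" requirement in the question.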
Which workspace setting in Fabric controls the compute configuration used by notebooks and Spark jobs?
Correct answer: D. Spark workspace settings define the pool, runtime and compute that notebooks and Spark jobs use. Dataflows Gen2 settings control the low-code dataflow engine, not Spark; domain settings group workspaces for governance; and OneLake settings govern the data-lake layer - none configure the Spark compute.
You must hide the exact values in a credit-card column from most users, showing only masked characters, while keeping the rows available. Which feature does this at query time?
Correct answer: B. Dynamic data masking obscures the values in a sensitive column for unauthorised users at query time while the rows themselves remain available. Row-level security removes whole rows from a user's view rather than masking a column's values; a deployment pipeline promotes content across stages; and a OneLake shortcut references data elsewhere - neither masks column values.
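To make the contrast with row filtering concrete, here is a minimal Python sketch of masking applied at read time. It is an illustration of the concept only; the function names, the `can_unmask` flag and the choice to keep the last four digits are assumptions, not Fabric's masking functions.

```python
def mask_card_number(card_number, visible_digits=4):
    """Replace all but the last `visible_digits` digits with 'X', keeping separators."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    digits_to_mask = max(total_digits - visible_digits, 0)
    masked = []
    for ch in card_number:
        if ch.isdigit() and digits_to_mask > 0:
            masked.append("X")
            digits_to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)


def read_row(row, can_unmask):
    """Return the row unchanged for privileged users, masked for everyone else."""
    if can_unmask:
        return dict(row)
    # The row is still returned; only the sensitive column's value is obscured.
    return {**row, "card_number": mask_card_number(row["card_number"])}
```

The stored value is never changed: masking happens when the row is read, and the row count is the same for every user.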
For lifecycle management of Fabric items using Git, which capability lets you track and version changes to your items?
Correct answer: A. Version control through Git integration tracks changes to Fabric items so you can review history and collaborate, which is the lifecycle-management capability asked for. A scheduled refresh updates data on a timetable; a KQL database stores event data for Real-Time Intelligence; and conditional formatting styles a report visual - none version your items.
You need to run a notebook every night, but also start a second notebook immediately whenever a file lands in a folder. Which combination meets both needs?
Correct answer: D. Orchestration in Fabric supports both time-based schedules and event-based triggers, so a schedule handles the nightly run and a file-arrival event triggers the second notebook. A manual run cannot satisfy either automated requirement; RLS and sensitivity labels are security controls, not orchestration; and shortcuts and mirroring reference or replicate data but do not trigger runs.
Which access control limits a specific user or group to a single Fabric item, such as one Lakehouse, rather than the whole workspace?
Correct answer: C. Item-level access grants permissions on an individual item like a single Lakehouse, independent of the workspace roles. Workspace-level access applies broadly to everything in the workspace; tenant-level admin settings govern the whole tenant; and Spark pool settings configure compute, not access - none scope permission to one item.
Endorsing a Fabric item by marking it as 'Certified' or 'Promoted' primarily helps to:
Correct answer: A. Endorsement (Promoted or Certified) is a governance signal that tells other users an item is trustworthy and ready to reuse. It does not encrypt files, has no effect on Spark pool sizing, and does not change an item's type from Lakehouse to Warehouse - those are unrelated storage, compute and item concerns.
Where do you configure OneLake security to govern access at the data-lake layer for a Fabric item?
Correct answer: B. OneLake security is configured on the item within the workspace so access is enforced at the OneLake data layer. A report's format pane only styles visuals; editing notebook code transforms data but does not set lake-layer security; and the Pearson VUE portal is for scheduling exams - none configure OneLake security.
You need a Fabric store you can transform with PySpark notebooks and that holds files plus Delta tables. Which item fits best?
Correct answer: D. A Lakehouse stores both files and Delta tables over OneLake and is the natural target for Spark notebook transformation. A Warehouse is a T-SQL engine, not a files-plus-Spark store; a Power BI dashboard is a reporting surface, not a data store; and an Eventstream routes streaming data rather than storing files and Delta tables for batch Spark work.
Your transformation logic is set-based T-SQL with stored procedures and you need full read/write SQL support. Which Fabric store is the better choice?
Correct answer: A. A Fabric Warehouse provides a full T-SQL engine with read/write support and stored procedures, which suits set-based SQL transformation. A Lakehouse's SQL analytics endpoint is read-only, so it cannot run write transformations; an Eventhouse is for KQL event data, not T-SQL writes; and a Dataflow Gen2 is a low-code transform tool, not a SQL store.
You want to reference data that already exists in another Lakehouse without duplicating or copying it. Which feature do you use?
Correct answer: C. A OneLake shortcut points to existing data in another location so it can be queried in place, with no copy. Mirroring continuously replicates an external database into OneLake (it does copy); a full data load physically copies all rows; and a deployment pipeline promotes items across stages - none give an in-place reference without duplication.
A source table is large and only a small number of rows change daily. The most efficient loading pattern is:
Correct answer: B. An incremental load processes only the new or changed rows, which is efficient when most data is unchanged. A full load reprocesses everything each run, wasting compute; re-typing rows by hand is impractical and error-prone; and disabling refresh leaves the data stale rather than loading the changes.
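One common way to find "the new or changed rows" is a high-water mark on a modification timestamp. The sketch below shows that pattern in plain Python under stated assumptions: the source exposes a `modified_at` value per row, rows have a unique `id`, and the target is modelled as a dictionary. None of these names come from Fabric.

```python
from datetime import datetime


def incremental_load(source_rows, target, last_watermark):
    """Upsert rows modified after `last_watermark` into `target`.

    Returns the new watermark to store for the next run.
    """
    new_watermark = last_watermark
    for row in source_rows:
        if row["modified_at"] > last_watermark:
            # Insert new rows and overwrite changed ones, keyed by id.
            target[row["id"]] = row
            new_watermark = max(new_watermark, row["modified_at"])
    return new_watermark
```

This sketch still iterates over every source row for simplicity; in practice the watermark comparison would normally be pushed into the source query so that unchanged rows are never read.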
Which Fabric tool gives a low-code, Power Query-style experience for ingesting and transforming data?
Correct answer: D. Dataflows Gen2 provide the low-code, Power Query-based way to ingest and shape data without writing code. A Spark notebook is code-first (PySpark/Spark SQL); a KQL queryset queries event data with Kusto Query Language; and a T-SQL stored procedure is code in the Warehouse - none are the low-code Power Query experience.
You are processing a high-volume telemetry stream and need to count events in fixed five-minute, non-overlapping intervals. Which concept do you apply?
Correct answer: C. A tumbling window groups streaming events into fixed, non-overlapping time intervals, which is exactly what counting events per five-minute interval needs. A star schema relationship is a batch-modelling concept; a deployment pipeline stage promotes content; and a OneLake shortcut references data - none define time windows over a stream.
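The defining property of a tumbling window, that every event falls into exactly one fixed interval, can be shown with a small Python sketch. It assumes timezone-aware UTC timestamps and an in-memory list of events rather than a real stream; the function names are illustrative only.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 5 * 60  # five-minute windows


def window_start(event_time):
    """Floor a timezone-aware event time to the start of its five-minute window."""
    epoch = int(event_time.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def count_per_window(event_times):
    """Count events per window; each event belongs to exactly one window."""
    return Counter(window_start(t) for t in event_times)
```

An event at 12:04:59 lands in the 12:00 window and an event at 12:05:00 lands in the 12:05 window, so the windows neither overlap nor leave gaps.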
To continuously replicate an external operational database into OneLake as Delta tables with minimal setup, you use:
Correct answer: A. Mirroring continuously replicates a supported external database into OneLake as Delta tables, keeping it in sync with little configuration. A single CSV export is a one-off snapshot, not continuous replication; row-level security restricts row visibility; and a report bookmark saves a page view - none replicate a source database.
While transforming data, you must collapse multiple normalised tables into one wide table to simplify reporting. This operation is called:
Correct answer: B. Denormalisation combines normalised tables into a wider table to make querying and reporting simpler, which is the transformation described. Mirroring replicates a source database; row-level security filters rows by user; and scheduling controls when a job runs - none merge normalised tables into one wide table.
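As a rough illustration of denormalisation, the Python sketch below joins a hypothetical orders table to customer and product lookup tables and emits one wide row per order. The table shapes and column names are invented for the example; in Fabric the same idea would usually be expressed as joins in Spark, T-SQL or a dataflow.

```python
def denormalise(orders, customers, products):
    """Join orders to customer and product lookups, one wide row per order."""
    customers_by_id = {c["customer_id"]: c for c in customers}
    products_by_id = {p["product_id"]: p for p in products}
    wide_rows = []
    for order in orders:
        # Missing lookups yield None values, like a left join.
        customer = customers_by_id.get(order["customer_id"], {})
        product = products_by_id.get(order["product_id"], {})
        wide_rows.append({
            "order_id": order["order_id"],
            "quantity": order["quantity"],
            "customer_name": customer.get("name"),
            "customer_region": customer.get("region"),
            "product_name": product.get("name"),
            "unit_price": product.get("unit_price"),
        })
    return wide_rows
```

The wide result repeats customer and product attributes on every order row, trading storage and update simplicity for reports that no longer need joins.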
You need code-first streaming processing inside a Fabric notebook, reading a stream and writing results to a Delta table. Which approach fits?
Correct answer: D. Spark structured streaming is the code-first API in notebooks for reading a stream continuously and writing results to a Delta table. A static T-SQL SELECT run once is batch, not streaming; a Power BI slicer is a report filter; and a conditional-formatting rule styles a visual - none provide continuous stream processing in a notebook.
When ingesting data you encounter records arriving after their event time has passed. Handling these correctly is described as managing:
Correct answer: C. Records that arrive after their event time are late-arriving data, and handling them (alongside duplicate and missing data) is a transformation skill the exam's skills outline lists explicitly. Workspace roles are access control; sensitivity labels are governance tags; and Spark pool sizing is compute configuration - none describe handling delayed records.
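One widely used way to handle late-arriving records is a watermark: track the latest event time seen so far and accept a record only if it is no older than that time minus an allowed lateness. The Python sketch below illustrates the idea on an in-memory list in arrival order; the function name, the `event_time` field and the lateness policy are assumptions for the example, not a Fabric API.

```python
from datetime import datetime, timedelta


def split_late_events(events, allowed_lateness):
    """Separate events into on-time and too-late lists using a simple watermark."""
    on_time, too_late = [], []
    max_event_time = None
    for event in events:  # events are processed in arrival order
        event_time = event["event_time"]
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        # Anything older than the watermark is considered too late.
        watermark = max_event_time - allowed_lateness
        if event_time >= watermark:
            on_time.append(event)
        else:
            too_late.append(event)
    return on_time, too_late
```

What happens to the too-late records (dropping them, routing them to a correction table, or reprocessing the affected window) is a design decision this sketch deliberately leaves open.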
A nightly pipeline fails intermittently. Your first monitoring step in Fabric to find the cause is to:
Correct answer: A. Fabric's monitoring view shows pipeline run history and error details, which is where you start diagnosing intermittent failures. Deleting and rebuilding from memory discards the evidence you need; changing a report's colour theme is unrelated to pipeline errors; and exam passing scores have nothing to do with operating a pipeline.
Reports on a Lakehouse have become slow because the underlying Delta table has many tiny files. The appropriate optimisation is to:
Correct answer: B. Compacting or optimising the Delta table consolidates many small files into fewer larger ones, which improves read performance. Adding slicers changes the report UI, not table performance; renaming the workspace is cosmetic; and disabling row-level security weakens access control without fixing the small-file problem.
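The effect of compaction, fewer and larger files, can be pictured with a toy planner that groups small files until a target size is reached. This is purely illustrative: it is not how Delta Lake selects files to rewrite, and the function name and the 128 MB default target are assumptions.

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily group consecutive files so each group stays within `target_mb`."""
    groups, current, current_size = [], [], 0
    for size in file_sizes_mb:
        # Start a new output file when adding this one would exceed the target.
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups
```

Each group would become one rewritten file, so a reader opens far fewer files to scan the same data.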
You want to be notified automatically when a semantic model refresh fails. In Fabric you should:
Correct answer: D. Configuring an alert means Fabric notifies you automatically when a monitored event such as a failed refresh occurs. Checking manually every hour is not automatic notification; adding a calculated column changes the model, not monitoring; and a OneLake shortcut references data - neither alerts you to a failure.
A Spark notebook job is slow and you see many small partitions and repeated reads of the same data. A reasonable Spark optimisation is to:
Correct answer: C. Tuning partition sizes and caching data that is read repeatedly are standard Spark performance improvements that directly address the symptoms described. The exam time limit is unrelated to job performance; switching to CSV removes Delta's columnar and statistics advantages and usually hurts performance; and deleting the Lakehouse destroys the data rather than optimising it.
Which monitoring task specifically checks that data is being brought into Fabric as expected?
Correct answer: A. Monitoring data ingestion is the task that confirms data is landing in Fabric as expected, separate from transformation or refresh monitoring. Applying a sensitivity label is governance; building a slicer is report design; and configuring a Spark pool is compute setup - none monitor whether ingestion is succeeding.
A KQL query against an Eventhouse is timing out on a huge dataset. A sensible first optimisation is to:
Correct answer: D. Filtering earlier and narrowing the time range reduces how much data the KQL query scans, which is the usual way to speed up Eventhouse queries. Moving data to a dashboard does not change query efficiency; removing security is a governance risk, not an optimisation; and rebuilding visuals addresses presentation, not the slow query.
A Dataflow Gen2 refresh is failing on a type-conversion step. The most direct way to resolve it is to:
Correct answer: B. Opening the Dataflow Gen2, reading the failing step's error and fixing that transformation resolves the error at its source. Increasing warehouse capacity does not fix a logic error in a transform step; adding a report bookmark is unrelated; and re-registering for the exam has nothing to do with the dataflow.
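When a type conversion fails, the useful information is which values could not be converted. The Python sketch below shows that diagnostic idea in general terms by collecting failures instead of stopping at the first one. It is not Power Query or Dataflow Gen2 code, and the function and column names are hypothetical.

```python
def convert_column(rows, column, converter):
    """Apply `converter` to one column, collecting failures instead of stopping."""
    converted, errors = [], []
    for index, row in enumerate(rows):
        try:
            converted.append({**row, column: converter(row[column])})
        except (TypeError, ValueError) as exc:
            # Record the row position, the offending value and the message.
            errors.append((index, row[column], str(exc)))
    return converted, errors
```

Seeing that row 1 holds `"n/a"` in a numeric column, for example, tells you whether to clean the value, replace it with a null, or change the target type in the failing step.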
Queries against a Fabric Warehouse are slow. Which optimisation is most relevant?
Correct answer: C. Optimising the warehouse - improving query patterns and ensuring statistics are current - directly targets slow Warehouse queries. Adding report pages changes the UI, not query speed; disabling the gateway breaks connectivity to on-premises sources; and a warehouse cannot become a slicer, which is a report control.
After a notebook fails mid-run, which Fabric capability helps you find the exact cell and error message that caused it?
Correct answer: D. The notebook run details and logs in the monitoring view show the failing cell and its error message, which is what you need to resolve a notebook error. The report format pane styles visuals; a sensitivity label classifies data; and a OneLake shortcut references data - none surface notebook run errors.
An Eventstream feeding an Eventhouse cannot keep up with peak event volume and you see growing latency. Which action most directly addresses the throughput problem?
Correct answer: C. Scaling and tuning the Eventstream and Eventhouse increases the throughput available to a high-volume stream, which directly addresses the growing latency described. Adding a calculated column changes a batch model, not stream throughput; replacing the stream with a static CSV import abandons real-time processing entirely; and applying a sensitivity label is governance, not a performance fix.
Practice questions FAQ
- Are these real DP-700 exam questions?
- No. These are original study questions written to test understanding. They are not real exam questions, exam dumps, or copied from any provider.
- How should I use these practice questions?
- Answer each one, read the explanation (including why the wrong options are wrong), and use the per-domain score below to focus your revision on weak areas. Revisit before exam day.
- How many questions should I do before the exam?
- Enough to score consistently across every domain, alongside full-length practice from official or reputable providers. Understanding why each answer is right matters more than raw volume.
- What score means I am ready?
- A good signal is consistently scoring around 80% or higher across all domains on questions you have not seen before, and being able to explain why the wrong options are wrong.
- Should I use exam dumps?
- No. Dumps (real or leaked questions) breach provider policy, can void your certification, and do not build the understanding the exam actually tests.