PDE Flashcards

Level: expert · 26 cards

Free flashcards for the Google Cloud Professional Data Engineer (PDE) exam: flip each card to reveal the definition. Built from the glossary as a study aid, these are concept checks, not real exam questions.

By The Exam Atlas Editorial Team · Verified 2026-06-06

All 26 terms

BigQuery
Serverless, columnar data warehouse for SQL analytics over large datasets.
BigQuery slot
A unit of compute capacity (a virtual CPU) that BigQuery uses to run queries; on-demand pricing bills by bytes processed, while capacity pricing bills for slots held in reservations.
Partitioning
Splitting a BigQuery table by a time-unit column, ingestion time, or integer range so queries that filter on the partitioning column scan less data.
Clustering
Sorting data within a table (or within each partition) by up to four chosen columns to cut bytes scanned.
Materialised view
A precomputed, auto-refreshed query result that speeds up frequent queries.
Bigtable
Wide-column NoSQL store for high-throughput, low-latency key-based access.
Row key
Bigtable's only index and primary access path; its design decides how evenly reads and writes spread across nodes, and so read/write performance.
Cloud Storage
Object storage for files and as a data-lake layer; has storage classes and lifecycle rules.
Cloud SQL
Managed relational database (MySQL, PostgreSQL, SQL Server) for transactional workloads.
Spanner
Globally distributed, strongly consistent relational database that scales horizontally.
Dataflow
Serverless managed service that runs Apache Beam batch and streaming pipelines.
Apache Beam
The unified programming model for batch and streaming pipelines that Dataflow runs.
Windowing
Grouping streaming data into time-based windows for aggregation in Beam/Dataflow.
Pub/Sub
Global messaging service, at-least-once delivery by default, for ingesting and decoupling event streams.
Dataproc
Managed Hadoop and Spark for running existing open-source big-data jobs.
Dataprep
A visual, no-code tool for exploring and cleaning data (Cloud Dataprep by Trifacta).
Dataform
A tool for managing SQL-based ELT transformations and workflows in BigQuery.
Cloud Composer
Managed Apache Airflow for orchestrating and scheduling data pipelines.
Airflow DAG
A directed graph of tasks defining a pipeline's steps and dependencies in Composer.
Dataplex
A data fabric for organising, governing and discovering data across lakes and warehouses.
Data Catalog
The metadata and discovery service (part of Dataplex) for finding and tagging data.
Streaming vs batch
Processing events continuously as they arrive vs in scheduled bulk loads.
ETL / ELT
Extract-Transform-Load (transform before loading) vs Extract-Load-Transform (load raw data, then transform in the warehouse).
IAM
Identity and Access Management - roles and permissions that control who can do what.
Service account
A non-human identity that pipelines and services use to authenticate to GCP.
Cloud Monitoring / Logging
The services for metrics, dashboards, alerts and logs used to operate pipelines.
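
Worked example for the Windowing card
A minimal sketch of fixed (tumbling) windows in plain Python, to make the idea concrete. It does not use the Apache Beam API; the function name fixed_windows, the (timestamp, value) event shape and the 60-second window size are all assumptions chosen for illustration.

```python
from collections import defaultdict


def fixed_windows(events, size_s):
    """Sum values per fixed window.

    events: iterable of (timestamp_seconds, value) pairs (hypothetical shape).
    size_s: window length in seconds.
    Returns a dict mapping each window's start time to the sum of its values.
    """
    sums = defaultdict(int)
    for ts, value in events:
        # Every event belongs to exactly one window: the one whose start is
        # the largest multiple of size_s that is <= the event timestamp.
        window_start = (ts // size_s) * size_s
        sums[window_start] += value
    return dict(sorted(sums.items()))


# Four events aggregated into 60-second windows.
events = [(0, 1), (59, 2), (60, 3), (125, 4)]
totals = fixed_windows(events, 60)
print(totals)
```

Events at 0 s and 59 s fall in the same window, while the event at 60 s starts the next one. A real streaming pipeline also has to handle late and out-of-order data, which this sketch ignores.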