Google Cloud Professional Data Engineer (PDE) Glossary

A free Google Cloud Professional Data Engineer (PDE) glossary: core terms (BigQuery, Dataflow, Pub/Sub, Bigtable, partitioning) defined in plain English for study.

By The Exam Atlas Editorial Team · Verified 2026-06-06

Plain-English definitions of the core Google Cloud terms for Professional Data Engineer study. Simplified for learning; Google Cloud documentation is authoritative.

| Term | Definition |
| --- | --- |
| BigQuery | Serverless, columnar data warehouse for SQL analytics over large datasets. |
| BigQuery slot | A unit of compute capacity that queries consume. On-demand pricing bills by bytes processed; capacity-based pricing bills for slots, typically through reservations. |
| Partitioning | Splitting a BigQuery table into segments by a date/timestamp column, ingestion time, or integer range so that queries filtering on the partitioning column scan less data. |
| Clustering | Sorting a table's data (within each partition, if the table is partitioned) by up to four chosen columns to cut bytes scanned. |
| Materialised view | A precomputed, auto-refreshed query result that speeds up frequent queries. |
| Bigtable | Wide-column NoSQL store for high-throughput, low-latency key-based access. |
| Row key | Bigtable's primary access path. Rows are stored sorted by row key, so its design decides read/write performance. |
| Cloud Storage | Object storage for files, also used as a data-lake layer; has storage classes and lifecycle rules. |
| Cloud SQL | Managed relational database (MySQL, PostgreSQL, SQL Server) for transactional workloads. |
| Spanner | Globally distributed, strongly consistent relational database that scales horizontally. |
| Dataflow | Serverless managed service that runs Apache Beam batch and streaming pipelines. |
| Apache Beam | The unified programming model for batch and streaming pipelines that Dataflow runs. |
| Windowing | Grouping streaming data into time-based windows for aggregation in Beam/Dataflow. |
| Pub/Sub | Global messaging service with at-least-once delivery by default, used for ingesting and decoupling event streams. |
| Dataproc | Managed Hadoop and Spark for running existing open-source big-data jobs. |
| Dataprep | A visual, no-code tool for exploring and cleaning data (Cloud Dataprep by Trifacta). |
| Dataform | A tool for managing SQL-based ELT transformations and workflows in BigQuery. |
| Cloud Composer | Managed Apache Airflow for orchestrating and scheduling data pipelines. |
| Airflow DAG | A directed acyclic graph of tasks defining a pipeline's steps and dependencies in Composer. |
| Dataplex | A data fabric for organising, governing and discovering data across lakes and warehouses. |
| Data Catalog | The metadata and discovery service (part of Dataplex) for finding and tagging data. |
| Streaming vs batch | Processing events continuously as they arrive vs processing accumulated data in scheduled bulk runs. |
| ETL / ELT | Extract-Transform-Load (transform before loading) vs Extract-Load-Transform (load first, then transform in the warehouse). |
| IAM | Identity and Access Management: roles and permissions that control who can do what. |
| Service account | A non-human identity that pipelines and services use to authenticate to Google Cloud. |
| Cloud Monitoring / Logging | The services for metrics, dashboards, alerts and logs used to operate pipelines. |
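
To make the Windowing entry concrete, here is a minimal pure-Python sketch of fixed (tumbling) windows. It deliberately does not use the Beam SDK: the function names `window_start` and `count_per_window`, the 60-second window size, and the sample timestamps are all assumptions chosen for illustration, not part of any Google Cloud API.

```python
from collections import Counter


def window_start(event_ts: int, size_s: int = 60) -> int:
    """Return the start (epoch seconds) of the fixed window containing event_ts."""
    return event_ts - (event_ts % size_s)


def count_per_window(event_timestamps, size_s: int = 60) -> dict:
    """Count events per fixed window, keyed by window start time."""
    return dict(Counter(window_start(ts, size_s) for ts in event_timestamps))


# Hypothetical event times in seconds: two fall in [0, 60), two in [60, 120), one in [120, 180).
counts = count_per_window([5, 59, 60, 61, 125])
print(counts)
```

Each event is assigned to exactly one window by its own timestamp, which is the idea behind fixed windows. A real streaming pipeline also has to decide when a window is complete and how to treat late data, which this sketch ignores.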

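The Row key entry says key design decides performance. One way to see why: Bigtable keeps rows sorted lexicographically by row key, so keys that share a prefix sit next to each other. The sketch below builds a key from an entity ID followed by a reversed timestamp, a commonly described pattern for reading the newest rows first without starting every key with a timestamp. The helper `make_row_key`, the `#` separator, the `MAX_TS_MS` bound, and the device names are hypothetical; the code only manipulates strings and does not call the Bigtable client.

```python
MAX_TS_MS = 10**13 - 1  # assumed upper bound on millisecond timestamps (13 digits)


def make_row_key(device_id: str, ts_ms: int) -> str:
    """Build 'device#reversed_ts' so one device's rows are contiguous, newest first."""
    reversed_ts = MAX_TS_MS - ts_ms
    # Zero-pad so that string order matches numeric order.
    return f"{device_id}#{reversed_ts:013d}"


# Hypothetical readings as (device, timestamp in ms).
readings = [("dev-a", 1000), ("dev-a", 2000), ("dev-b", 1500)]

# Sorting the strings mimics the lexicographic order rows would be stored in.
keys = sorted(make_row_key(device, ts) for device, ts in readings)
print(keys)
```

In the sorted output, both `dev-a` rows are adjacent and the later reading comes first, so a prefix scan on `dev-a#` would return the most recent data at the start. Whether this layout suits a workload depends on its actual read and write patterns.
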
Sources