Data & Analytics glossary

163 key terms and acronyms from across Data & Analytics certifications, in plain English. Definitions are simplified for learning; the official exam outlines are authoritative.

Aggregation
Rolling values up with SUM, AVG, COUNT and similar functions.
Airflow DAG
A directed acyclic graph of tasks that defines a pipeline's steps and dependencies in Cloud Composer.
Apache Beam
The unified programming model for batch and streaming pipelines that Dataflow runs.
Apache Spark
The distributed engine underneath Databricks for processing large data in parallel.
Append
Stacking the rows of two queries with the same columns.
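A minimal Python sketch of appending. It assumes each query result is a list of dicts standing in for a table (this is not Power Query's own API): the rows are stacked only when both sides have the same columns.

```python
def append(top: list[dict], bottom: list[dict]) -> list[dict]:
    """Stack the rows of two query results that share the same columns."""
    # Compare the column names of the first row on each side (when both have rows).
    if top and bottom and set(top[0]) != set(bottom[0]):
        raise ValueError("both queries must have the same columns")
    # Rows keep their order: all of the first query, then all of the second.
    return top + bottom


jan = [{"region": "North", "sales": 100}]
feb = [{"region": "North", "sales": 120}, {"region": "South", "sales": 90}]
combined = append(jan, feb)
print(len(combined))  # 3
```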
Applied Steps
The ordered list of transformations Power Query records and replays on refresh.
Auto Loader
Incremental ingestion of new files from cloud storage (the cloudFiles source).
BigQuery
Serverless, columnar data warehouse for SQL analytics over large datasets.
BigQuery slot
A unit of compute capacity that queries consume; you pay either on-demand (by bytes processed) or by reserving slot capacity.
Bigtable
Wide-column NoSQL store for high-throughput, low-latency key-based access.
Bookmark
A saved state of a report page (filters, selection, visibility).
Calculated column
A DAX value computed per row and stored at refresh.
Calculated field
A new field defined by a formula.
Cardinality
The relationship type: one-to-many, one-to-one or many-to-many.
Catalog
The top level of the Unity Catalog namespace, containing schemas.
Cloud Composer
Managed Apache Airflow for orchestrating and scheduling data pipelines.
Cloud Monitoring / Logging
The services for metrics, dashboards, alerts and logs used to operate pipelines.
Cloud services layer
The Snowflake layer that coordinates the system, handling authentication, metadata, query optimisation and security.
Cloud SQL
Managed relational database (MySQL, PostgreSQL, SQL Server) for transactional workloads.
Cloud Storage
Object storage for files and as a data-lake layer; has storage classes and lifecycle rules.
Cluster / compute
The Spark compute resource that runs your code; either an all-purpose (interactive) cluster or a job cluster.
Clustering
Sorting data within partitions by chosen columns to cut bytes scanned.
Clustering key
A chosen column set that co-locates related rows so pruning is more effective on large tables.
Compute layer
The virtual warehouses that run queries and data loads.
Continuous
A field shown along an axis; the pill is green.
Continuous Data Protection (CDP)
The umbrella for Time Travel, Fail-safe and cloning that guards data automatically.
COPY INTO
The command that bulk-loads files from a stage into a table, or unloads data out.
Credit
The unit Snowflake uses to bill compute; a running warehouse consumes credits by size and time.
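A rough Python sketch of how warehouse credits add up. It assumes the commonly published Snowflake rates for standard warehouses (X-Small = 1 credit per hour, doubling with each size up) and per-second billing with a 60-second minimum each time a warehouse starts; check current Snowflake pricing before relying on the numbers.

```python
# Assumed credits per hour for standard warehouses: X-Small is 1, each size doubles.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}


def credits_used(size: str, seconds_running: int) -> float:
    """Estimate credits for one run of a warehouse (per-second billing, 60 s minimum)."""
    billed_seconds = max(seconds_running, 60)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


print(credits_used("M", 1800))  # 2.0 (half an hour on a Medium)
print(credits_used("XS", 10))   # billed as 60 seconds, not 10
```

Doubling the size doubles the hourly rate, so a query that finishes in half the time on the larger warehouse costs roughly the same.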
Dashboard
A single view that combines multiple sheets, filters and actions.
Data blending
Linking two separate data sources on a shared field.
Data Catalog
The metadata and discovery service (part of Dataplex) for finding and tagging data.
Data model
The connected set of tables, relationships and measures behind a report.
Data pipeline
An orchestration item that copies data and runs activities in sequence.
Databricks Data Intelligence Platform
The lakehouse platform (built on Apache Spark and Delta Lake) for data engineering, analytics and ML.
Databricks Jobs / Workflows
The orchestration tool that schedules and runs tasks (notebooks, pipelines, scripts) on a timetable or trigger.
Databricks SQL
The SQL interface and warehouses for querying lakehouse data and building dashboards.
Dataflow
Serverless, managed service that runs Apache Beam batch and streaming pipelines.
Dataflows Gen2
The low-code, Power Query-based transform item for ingesting and shaping data.
Dataform
A tool for managing SQL-based ELT transformations and workflows in BigQuery.
Dataplex
A data fabric for organising, governing and discovering data across lakes and warehouses.
Dataprep
A visual, no-code tool for exploring and cleaning data (Cloud Dataprep by Trifacta).
Dataproc
Managed Hadoop and Spark for running existing open-source big-data jobs.
Dataset (semantic model)
The published data model that reports connect to.
DAX
Data Analysis Expressions - the formula language for measures and columns.
Delta (Delta Lake)
The open table format Fabric uses, adding transactions and versioning over Parquet files.
Delta Lake
The open table format adding ACID transactions, schema enforcement and time travel over files.
Delta table
A table stored in Delta Lake format; the default table type on Databricks.
Deployment pipeline
A tool to promote content across development, test and production stages.
Dimension
A field that slices the data (often blue, discrete), such as Region or Category.
Dimension table
A table of descriptive attributes (who, what, when, where).
Dimensional model
Fact and dimension tables shaped for analytics and reporting.
Direct Lake
A semantic-model mode that reads OneLake Delta tables directly, at import-like speed.
Discrete
A field shown as distinct headers; the pill is blue.
Drill-through
Navigation to a detail page filtered to the selected item.
Dynamic data masking
Hiding sensitive column values from unauthorised users at query time.
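A minimal Python sketch of the idea, assuming a hypothetical rule that only an `ANALYST_PII` role may see email addresses in clear text. Real platforms attach a masking policy to the column; the stored value never changes, only what the query returns.

```python
def mask_email(value: str, role: str) -> str:
    """Return the real email for the authorised role, a masked version for everyone else."""
    if role == "ANALYST_PII":  # hypothetical role allowed to see PII
        return value
    user, _, domain = value.partition("@")
    return user[:1] + "***@" + domain


def query_emails(rows: list[dict], role: str) -> list[str]:
    # Masking happens at read time; the stored rows are untouched.
    return [mask_email(r["email"], role) for r in rows]


customers = [{"email": "alice@example.com"}]
print(query_emails(customers, "ANALYST_PII"))  # ['alice@example.com']
print(query_emails(customers, "REPORTING"))    # ['a***@example.com']
```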
Edition
A Snowflake service tier (e.g. Standard, Enterprise) with different features and limits.
ETL / ELT
Extract-Transform-Load vs Extract-Load-Transform: ETL transforms data before loading it, while ELT loads raw data first and transforms it inside the warehouse.
Eventhouse
Real-Time Intelligence storage that holds KQL databases.
Eventstream
A no-code item for capturing, transforming and routing streaming data.
Expectations
Data-quality rules in a declarative pipeline; rows that break a rule can be kept (and counted), dropped, or made to fail the update.
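The three possible outcomes can be sketched in plain Python. This is not the Lakeflow/DLT API, which declares expectations on the table definition; the `apply_expectation` helper and its `mode` names are hypothetical, chosen to mirror the keep, drop and fail actions.

```python
def apply_expectation(rows, rule, mode):
    """Apply one data-quality rule to a batch of rows.

    mode="warn": keep every row but count violations.
    mode="drop": remove rows that break the rule.
    mode="fail": stop the whole update if any row breaks the rule.
    """
    bad = [r for r in rows if not rule(r)]
    if mode == "warn":
        return rows, len(bad)
    if mode == "drop":
        return [r for r in rows if rule(r)], len(bad)
    if mode == "fail":
        if bad:
            raise ValueError(f"{len(bad)} row(s) failed the expectation")
        return rows, 0
    raise ValueError(f"unknown mode: {mode}")


def positive_amount(row):
    return row["amount"] > 0


orders = [{"id": 1, "amount": 50}, {"id": 2, "amount": -5}]
print(apply_expectation(orders, positive_amount, "warn")[1])       # 1 (both rows kept)
print(len(apply_expectation(orders, positive_amount, "drop")[0]))  # 1 row kept
```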
External (unmanaged) table
A table pointing at data in a location you manage; dropping it leaves the files in place.
Extract
A saved snapshot (.hyper) of the data for faster, offline use.
Fact table
A table of events or transactions (the numbers you measure).
Fail-safe
A separate, Snowflake-managed 7-day recovery period after Time Travel ends.
File format
A named set of options (e.g. CSV, JSON) describing how files in a stage are parsed.
Filter context
The set of filters acting on a measure when it is evaluated.
Filters shelf
Where fields are placed to limit what the view shows.
FLATTEN
A function that expands nested arrays or objects into separate rows.
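A plain-Python sketch of what FLATTEN does to one nested array: each element becomes its own output row alongside the parent's other columns, with its position and value. The field names are illustrative; in Snowflake you would write this with LATERAL FLATTEN over a VARIANT column.

```python
def flatten(rows: list[dict], array_field: str) -> list[dict]:
    """Expand the array in `array_field` so each element gets its own row."""
    out = []
    for row in rows:
        parent = {k: v for k, v in row.items() if k != array_field}
        # Rows with a missing or empty array produce no output rows.
        for index, element in enumerate(row.get(array_field) or []):
            # Keep the parent columns, plus the element and its position.
            out.append({**parent, "index": index, "value": element})
    return out


orders = [{"order_id": 7, "items": ["pen", "ink"]}, {"order_id": 8, "items": ["pad"]}]
for r in flatten(orders, "items"):
    print(r)
# {'order_id': 7, 'index': 0, 'value': 'pen'}
# {'order_id': 7, 'index': 1, 'value': 'ink'}
# {'order_id': 8, 'index': 0, 'value': 'pad'}
```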
Full vs incremental load
Reloading all data versus loading only new or changed rows.
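A minimal sketch of an incremental load using a high-water mark, assuming each source row carries an `updated_at` value. A full load would replace the target with every source row; this version copies only rows changed since the last run and upserts them by key. All names are hypothetical.

```python
from datetime import datetime


def incremental_load(source, target, watermark):
    """Upsert rows changed after `watermark` into `target` (a dict keyed by id)."""
    changed = [r for r in source if r["updated_at"] > watermark]
    for r in changed:
        target[r["id"]] = r  # insert new ids, overwrite changed ones
    # The new watermark is the latest change seen so far.
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return len(changed), new_watermark


source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
]
target = {}
print(incremental_load(source, target, datetime.min)[0])  # 2 rows on the first run
source.append({"id": 3, "updated_at": datetime(2024, 1, 9)})
print(incremental_load(source, target, datetime(2024, 1, 5))[0])  # 1 (only row 3)
```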
Gateway
A bridge that lets the Power BI service reach on-premises data.
Group
Combining selected members of a field into one category.
IAM
Identity and Access Management - roles and permissions that control who can do what.
Join
Combining tables at the row level on a shared key.
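A minimal Python sketch of an inner join, assuming both tables are lists of dicts that share a key column. It builds a lookup on one side (the idea behind a hash join) and keeps only rows whose key appears in both tables.

```python
from collections import defaultdict


def inner_join(left, right, key):
    """Combine rows from two tables where `key` matches (inner join)."""
    index = defaultdict(list)
    for rrow in right:
        index[rrow[key]].append(rrow)
    joined = []
    for lrow in left:
        # Rows with no match on the right are dropped, as in an inner join.
        for rrow in index.get(lrow[key], []):
            joined.append({**lrow, **rrow})
    return joined


orders = [{"customer_id": 1, "amount": 30}, {"customer_id": 2, "amount": 15}, {"customer_id": 9, "amount": 5}]
customers = [{"customer_id": 1, "name": "Ana"}, {"customer_id": 2, "name": "Ben"}]
print(inner_join(orders, customers, "customer_id"))
# [{'customer_id': 1, 'amount': 30, 'name': 'Ana'}, {'customer_id': 2, 'amount': 15, 'name': 'Ben'}]
```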
KQL
Kusto Query Language, used to query high-volume event and telemetry data.
Lakeflow Declarative Pipelines (DLT)
The declarative pipeline framework (formerly Delta Live Tables) that builds and maintains tables for you.
Lakehouse
A Fabric item storing files and Delta tables; loaded with notebooks, read via a SQL endpoint.
Level of Detail (LOD)
An expression (FIXED / INCLUDE / EXCLUDE) that sets the granularity of a calculation.
Lineage
The tracked flow of data from source to table to dashboard, surfaced by Unity Catalog.
Live connection
A connection that queries the source data directly each time, rather than using an extract.
M
The language behind Power Query transformations (mostly generated for you).
Managed table
A table whose data and metadata Databricks manages; dropping it deletes the underlying files.
Marks card
The panel that controls colour, size, label, detail, shape and tooltip.
Materialised view
A precomputed, auto-refreshed query result that speeds up frequent queries.
Materialized view
A precomputed, automatically maintained result set for faster repeated queries.
Measure
A DAX calculation evaluated at query time, used for aggregations.
Medallion architecture
Bronze (raw), silver (cleaned), gold (business-ready) data layering.
Merge
Joining two queries on a matching key, like a SQL join.
Metadata cache
Statistics held in the cloud services layer that answer some queries (such as COUNT, MIN or MAX) without scanning data.
Micro-partition
A small, immutable columnar storage unit Snowflake creates automatically for table data.
Microsoft Fabric
The unified analytics platform that holds items such as lakehouses, warehouses and notebooks over one data lake (OneLake).
Mirroring
Continuously replicating an external database into OneLake as Delta tables.
Multi-cluster shared-data architecture
The design that separates storage, compute and cloud services into independent layers.
Network policy
A rule that allows or blocks account access by IP address range.
Notebook
A code-first item (PySpark, Spark SQL) for transforming data at scale.
OneLake
The single, tenant-wide data lake every Fabric workspace and item shares.
OneLake shortcut
A pointer to data in another location, reused without copying it.
Pages shelf
Splits a view into a sequence of pages, one per value of a field, that you can step through.
Parameter
A user-controllable value that can feed calculations, filters or reference lines.
Partitioning
Splitting a table's files by a column's values to speed up some queries.
Pearson VUE
The testing provider that delivers the SnowPro Core exam, online or at a test centre.
Power BI Desktop
The free authoring app where you prepare, model and build reports.
Power Query
The data-preparation engine for cleaning, transforming and combining data.
Privilege
A specific permission (e.g. SELECT, INSERT) granted on an object to a role.
Pruning
Skipping micro-partitions that cannot match a query, using their metadata, to speed it up.
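A simplified Python sketch of min/max pruning. Each partition here is a hypothetical record holding the minimum and maximum of one column; the query filter is checked against that metadata first, and only partitions whose range could contain a match are read.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    min_date: str  # ISO dates compare correctly as strings
    max_date: str
    rows: list


def scan_with_pruning(partitions, start, end):
    """Return matching rows and how many partitions were actually read."""
    scanned, result = 0, []
    for p in partitions:
        # Metadata check: skip partitions whose range cannot overlap [start, end].
        if p.max_date < start or p.min_date > end:
            continue
        scanned += 1
        result.extend(r for r in p.rows if start <= r <= end)
    return result, scanned


parts = [
    Partition("2024-01-01", "2024-01-31", ["2024-01-03", "2024-01-20"]),
    Partition("2024-02-01", "2024-02-29", ["2024-02-14"]),
    Partition("2024-03-01", "2024-03-31", ["2024-03-02"]),
]
rows, scanned = scan_with_pruning(parts, "2024-02-01", "2024-02-15")
print(rows, scanned)  # ['2024-02-14'] 1
```

This is also why clustering helps: when related values sit together, each partition's min/max range is narrow and more partitions can be skipped.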
Pub/Sub
Global, at-least-once messaging for ingesting and decoupling event streams.
PySpark
The Python API for Spark, used to read, transform and write data in code.
Reader account
A Snowflake-managed account a provider creates so a non-customer can read shared data.
Relationship
A link between two tables, defining how filters flow.
Results cache
Reuses the result of an identical earlier query for 24 hours, without recomputing it or using a warehouse, provided the underlying data has not changed.
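A toy Python model of a result cache, assuming results are keyed on the exact query text and become invalid when the underlying data changes or after 24 hours. Snowflake's real cache has further conditions, so treat this only as an illustration of the reuse idea; the class, its methods and `fake_warehouse` are hypothetical.

```python
import time


class ResultCache:
    TTL_SECONDS = 24 * 60 * 60  # in this model, results are reusable for 24 hours

    def __init__(self):
        self._entries = {}  # query text -> (result, data_version, stored_at)

    def run(self, sql, data_version, execute, now=None):
        """Return (result, from_cache); call `execute` (the warehouse) only on a miss."""
        now = time.time() if now is None else now
        hit = self._entries.get(sql)
        if hit and hit[1] == data_version and now - hit[2] < self.TTL_SECONDS:
            return hit[0], True  # served from cache, no compute used
        result = execute(sql)
        self._entries[sql] = (result, data_version, now)
        return result, False


calls = []


def fake_warehouse(sql):
    calls.append(sql)
    return [("total", 42)]


cache = ResultCache()
print(cache.run("SELECT SUM(x) FROM t", 1, fake_warehouse, now=0)[1])    # False: computed
print(cache.run("SELECT SUM(x) FROM t", 1, fake_warehouse, now=60)[1])   # True: reused
print(cache.run("SELECT SUM(x) FROM t", 2, fake_warehouse, now=120)[1])  # False: data changed
```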
Role
A container of privileges; RBAC grants privileges to roles, and roles to users.
Role-based access control (RBAC)
Snowflake's security model: privileges flow through a hierarchy of roles.
Row key
Bigtable's primary access path; its design decides read/write performance.
Row-level security (RLS)
Restricting which rows a user can see, by role.
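A minimal Python sketch of row-level security, assuming a hypothetical mapping from each role to the regions it may see. The filter runs before any rows are returned, so two users opening the same report see different data.

```python
# Hypothetical security mapping: role -> regions that role may see (None = everything).
ROLE_REGIONS = {"emea_sales": {"UK", "DE"}, "us_sales": {"US"}, "admin": None}


def visible_rows(rows, role):
    """Return only the rows the given role is allowed to see."""
    if role not in ROLE_REGIONS:
        return []  # unknown roles see nothing
    allowed = ROLE_REGIONS[role]
    if allowed is None:  # no restriction for this role
        return list(rows)
    return [r for r in rows if r["region"] in allowed]


sales = [{"region": "UK", "amount": 10}, {"region": "US", "amount": 20}, {"region": "DE", "amount": 5}]
print([r["region"] for r in visible_rows(sales, "emea_sales")])  # ['UK', 'DE']
print(len(visible_rows(sales, "admin")))                          # 3
```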
Rows / Columns shelves
Where fields are placed to define the structure of the view.
Scaling out
Adding clusters (multi-cluster warehouse) to handle more concurrent queries.
Scaling up
Increasing a warehouse's size (e.g. XS to L) to give more power to large or complex queries.
Scheduled refresh
Automatic updates of a dataset from its source on a timetable.
Schema (database)
A grouping of tables and views within a catalog.
Secure data sharing
Giving another account live, read-only access to objects with no data copied.
Secure view
A view that hides its definition and underlying detail, used for sharing sensitive data.
Semantic model
The published data model (dataset) that Power BI reports connect to.
Semi-structured data
Flexible data (JSON, Avro, Parquet) with no fixed table schema, queryable in Snowflake.
Sensitivity label
A governance tag that classifies and protects an item's data.
Sequence
An object that generates unique, increasing numbers, often for surrogate keys.
Service account
A non-human identity that pipelines and services use to authenticate to GCP.
Set
A custom subset of data, which can be dynamic and used in calculations.
Share
The object that defines what is shared and with which accounts.
Show Me
A helper that suggests chart types for the fields you have selected.
Slicer
An on-canvas control that filters the report for the viewer.
Snowflake AI Data Cloud
Snowflake's cloud platform for data storage, processing and sharing across clouds.
Snowflake Marketplace
A catalogue where providers publish data and services for others to access via sharing.
Snowpipe
Continuous, automated loading of files as they arrive, rather than in scheduled batches.
Spanner
Globally distributed, strongly consistent relational database that scales horizontally.
Spark
The distributed engine behind Fabric notebooks for large-scale transformation.
Spark SQL
Running SQL queries over data in the Spark engine and the lakehouse.
Spark structured streaming
Spark's API for processing streaming data in notebooks.
Stage
A location for data files; internal (in Snowflake) or external (e.g. an S3 bucket).
Star schema
The recommended model shape: fact tables surrounded by dimension tables.
Storage layer
Where table data is held as compressed, columnar micro-partitions in cloud storage.
Stored procedure
Procedural code that runs operations and logic on the server side.
Story
An ordered sequence of sheets or dashboards that tells a narrative.
Streaming vs batch
Processing events continuously as they arrive vs in scheduled bulk loads.
Structured Streaming
Spark's engine for incremental, continuous processing of data as it arrives.
System-defined role
A built-in role such as ACCOUNTADMIN, SYSADMIN, SECURITYADMIN or PUBLIC.
Table calculation
A calculation across the values already in the view (e.g. % of total).
Tableau Public
The free version of Tableau Desktop that publishes to the public web.
Time intelligence
DAX patterns for periods (year-to-date, prior year, and similar).
Time travel
Querying a previous version of a table: in Delta Lake by version number or timestamp; in Snowflake at a point within the Time Travel retention period.
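A toy Python model of versioned tables, loosely in the spirit of Delta Lake's transaction log: every write records a new version with a timestamp, so a read can ask for a specific version or for the state as of a time. The class is hypothetical and stores full snapshots for simplicity; in Databricks you would write `VERSION AS OF` or `TIMESTAMP AS OF` in SQL.

```python
from bisect import bisect_right


class VersionedTable:
    """Keeps every committed version so earlier states can be queried."""

    def __init__(self):
        self._versions = []  # list of (timestamp, rows); list index = version number

    def write(self, rows, timestamp):
        # Assumes timestamps increase with each commit.
        self._versions.append((timestamp, list(rows)))
        return len(self._versions) - 1  # the new version number

    def as_of_version(self, version):
        return self._versions[version][1]

    def as_of_timestamp(self, timestamp):
        # Latest version committed at or before the requested time.
        times = [t for t, _ in self._versions]
        i = bisect_right(times, timestamp) - 1
        if i < 0:
            raise ValueError("no version exists at that time")
        return self._versions[i][1]


t = VersionedTable()
t.write([1, 2], timestamp=100)     # version 0
t.write([1, 2, 3], timestamp=200)  # version 1
print(t.as_of_version(0))          # [1, 2]
print(t.as_of_timestamp(150))      # [1, 2]
print(t.as_of_timestamp(250))      # [1, 2, 3]
```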
Tooltip
The pop-up detail shown when hovering over a mark.
Union
Stacking rows from tables that share the same columns.
Unity Catalog
The governance layer: a catalog.schema.table namespace with central permissions and lineage.
Unloading
Exporting data from Snowflake to files in a stage with COPY INTO.
User-defined function (UDF)
A custom function (SQL, JavaScript, Python and more) that returns a value.
VARIANT
A data type that stores semi-structured data such as JSON within a column.
View
A saved query that presents data as a virtual table.
Virtual warehouse
An independent compute cluster that runs queries and loads, billed in credits while running.
Visual
A chart, table, card or map placed on a report page.
Warehouse
A Fabric item with a full T-SQL engine for set-based transformation and serving.
Warehouse cache
Local cached data on a running warehouse that speeds up repeated queries.
Windowing
Grouping streaming data into time-based windows for aggregation in Beam/Dataflow.
Windowing function
A time-based grouping (such as tumbling or hopping) over a stream.
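A small Python sketch of tumbling and hopping windows over timestamped events, assuming integer times in seconds and windows aligned to time 0 (the sizes are illustrative). Tumbling windows do not overlap, so each event lands in exactly one; hopping windows overlap, so one event can be counted in several.

```python
from collections import Counter


def tumbling_counts(event_times, size):
    """Count events per fixed, non-overlapping window [start, start + size)."""
    return Counter((t // size) * size for t in event_times)


def hopping_counts(event_times, size, hop):
    """Count events per window of length `size` that starts every `hop` seconds."""
    counts = Counter()
    for t in event_times:
        # Window start s (a multiple of hop) contains t when s <= t < s + size.
        first = ((t - size) // hop + 1) * hop
        for start in range(max(first, 0), t + 1, hop):
            counts[start] += 1
    return counts


events = [1, 4, 7, 12]
print(sorted(tumbling_counts(events, 5).items()))     # [(0, 2), (5, 1), (10, 1)]
print(sorted(hopping_counts(events, 10, 5).items()))  # [(0, 3), (5, 2), (10, 1)]
```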
Workspace
A container for Fabric items where a team collaborates and sets permissions.
Zero-copy clone
An instant copy of a table, schema or database that shares storage until data changes.
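A toy Python model of why a clone is instant: the clone starts with references to the same storage blocks as the source, and a block that one side changes is written as a new block (copy-on-write). The `Table` class and its block layout are hypothetical; Snowflake applies the same idea to micro-partitions.

```python
class Table:
    """A table as a list of references to immutable storage blocks (tuples)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # copies the list of references, not the data

    def clone(self):
        # Instant: the new table points at exactly the same blocks.
        return Table(self.blocks)

    def update_block(self, i, new_rows):
        # Copy-on-write: write a new block; the other table keeps the old one.
        self.blocks[i] = tuple(new_rows)


prod = Table([(1, 2), (3, 4)])
dev = prod.clone()
print(all(a is b for a, b in zip(prod.blocks, dev.blocks)))  # True: storage shared
dev.update_block(1, [3, 5])
print(prod.blocks[1], dev.blocks[1])    # (3, 4) (3, 5)
print(prod.blocks[0] is dev.blocks[0])  # True: unchanged block still shared
```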